diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 7898822e0e11d..95729f845ff5c 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -1,515 +1,24 @@ Contributing to pandas ====================== -Where to start? ---------------- - -All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome. - -If you are simply looking to start working with the *pandas* codebase, navigate to the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) and start looking through interesting issues. There are a number of issues listed under [Docs](https://github.com/pandas-dev/pandas/issues?labels=Docs&sort=updated&state=open) and [Difficulty Novice](https://github.com/pandas-dev/pandas/issues?q=is%3Aopen+is%3Aissue+label%3A%22Difficulty+Novice%22) where you could start out. - -Or maybe through using *pandas* you have an idea of you own or are looking for something in the documentation and thinking 'this can be improved'...you can do something about it! - -Feel free to ask questions on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). - -Bug reports and enhancement requests ------------------------------------- - -Bug reports are an important part of making *pandas* more stable. Having a complete bug report will allow others to reproduce the bug and provide insight into fixing. Because many versions of *pandas* are supported, knowing version information will also identify improvements made since previous versions. Trying the bug-producing code out on the *master* branch is often a worthwhile exercise to confirm the bug still exists. It is also worth searching existing bug reports and pull requests to see if the issue has already been reported and/or fixed. - -Bug reports must: - -1. Include a short, self-contained Python snippet reproducing the problem. You can format the code nicely by using [GitHub Flavored Markdown](http://github.github.com/github-flavored-markdown/): - - ```python - >>> from pandas import DataFrame - >>> df = DataFrame(...) - ... - ``` - -2. Include the full version string of *pandas* and its dependencies. In versions of *pandas* after 0.12 you can use a built in function: - - >>> from pandas.util.print_versions import show_versions - >>> show_versions() - - and in *pandas* 0.13.1 onwards: - - >>> pd.show_versions() - -3. Explain why the current behavior is wrong/not desired and what you expect instead. - -The issue will then show up to the *pandas* community and be open to comments/ideas from others. - -Working with the code ---------------------- - -Now that you have an issue you want to fix, enhancement to add, or documentation to improve, you need to learn how to work with GitHub and the *pandas* code base. - -### Version control, Git, and GitHub - -To the new user, working with Git is one of the more daunting aspects of contributing to *pandas*. It can very quickly become overwhelming, but sticking to the guidelines below will help keep the process straightforward and mostly trouble free. As always, if you are having difficulties please feel free to ask for help. - -The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas). To contribute you will need to sign up for a [free GitHub account](https://github.com/signup/free). We use [Git](http://git-scm.com/) for version control to allow many people to work together on the project. 
- -Some great resources for learning Git: - -- the [GitHub help pages](http://help.github.com/). -- the [NumPy's documentation](http://docs.scipy.org/doc/numpy/dev/index.html). -- Matthew Brett's [Pydagogue](http://matthew-brett.github.com/pydagogue/). - -### Getting started with Git - -[GitHub has instructions](http://help.github.com/set-up-git-redirect) for installing git, setting up your SSH key, and configuring git. All these steps need to be completed before you can work seamlessly between your local repository and GitHub. - -### Forking - -You will need your own fork to work on the code. Go to the [pandas project page](https://github.com/pandas-dev/pandas) and hit the `Fork` button. You will want to clone your fork to your machine: - - git clone git@github.com:your-user-name/pandas.git pandas-yourname - cd pandas-yourname - git remote add upstream git://github.com/pandas-dev/pandas.git - -This creates the directory pandas-yourname and connects your repository to the upstream (main project) *pandas* repository. - -The testing suite will run automatically on Travis-CI once your pull request is submitted. However, if you wish to run the test suite on a branch prior to submitting the pull request, then Travis-CI needs to be hooked up to your GitHub repository. Instructions for doing so are [here](http://about.travis-ci.org/docs/user/getting-started/). - -### Creating a branch - -You want your master branch to reflect only production-ready code, so create a feature branch for making your changes. For example: - - git branch shiny-new-feature - git checkout shiny-new-feature - -The above can be simplified to: - - git checkout -b shiny-new-feature - -This changes your working directory to the shiny-new-feature branch. Keep any changes in this branch specific to one bug or feature so it is clear what the branch brings to *pandas*. You can have many shiny-new-features and switch in between them using the git checkout command. - -To update this branch, you need to retrieve the changes from the master branch: - - git fetch upstream - git rebase upstream/master - -This will replay your commits on top of the lastest pandas git master. If this leads to merge conflicts, you must resolve these before submitting your pull request. If you have uncommitted changes, you will need to `stash` them prior to updating. This will effectively store your changes and they can be reapplied after updating. - -### Creating a development environment - -An easy way to create a *pandas* development environment is as follows. - -- Install either Anaconda <install.anaconda> or miniconda <install.miniconda> -- Make sure that you have cloned the repository <contributing.forking> -- `cd` to the *pandas* source directory - -Tell conda to create a new environment, named `pandas_dev`, or any other name you would like for this environment, by running: - - conda create -n pandas_dev --file ci/requirements_dev.txt - -For a python 3 environment: - - conda create -n pandas_dev python=3 --file ci/requirements_dev.txt - -If you are on Windows, then you will also need to install the compiler linkages: - - conda install -n pandas_dev libpython - -This will create the new environment, and not touch any of your existing environments, nor any existing python installation. It will install all of the basic dependencies of *pandas*, as well as the development and testing tools. 
If you would like to install other dependencies, you can install them as follows: - - conda install -n pandas_dev -c pandas pytables scipy - -To install *all* pandas dependencies you can do the following: - - conda install -n pandas_dev -c pandas --file ci/requirements_all.txt - -To work in this environment, Windows users should `activate` it as follows: - - activate pandas_dev - -Mac OSX and Linux users should use: - - source activate pandas_dev - -You will then see a confirmation message to indicate you are in the new development environment. - -To view your environments: - - conda info -e - -To return to you home root environment: - - deactivate - -See the full conda docs [here](http://conda.pydata.org/docs). - -At this point you can easily do an *in-place* install, as detailed in the next section. - -### Making changes - -Before making your code changes, it is often necessary to build the code that was just checked out. There are two primary methods of doing this. - -1. The best way to develop *pandas* is to build the C extensions in-place by running: - - python setup.py build_ext --inplace - - If you startup the Python interpreter in the *pandas* source directory you will call the built C extensions - -2. Another very common option is to do a `develop` install of *pandas*: - - python setup.py develop - - This makes a symbolic link that tells the Python interpreter to import *pandas* from your development directory. Thus, you can always be using the development version on your system without being inside the clone directory. - -Contributing to the documentation ---------------------------------- - -If you're not the developer type, contributing to the documentation is still of huge value. You don't even have to be an expert on *pandas* to do so! Something as simple as rewriting small passages for clarity as you reference the docs is a simple but effective way to contribute. The next person to read that passage will be in your debt! - -In fact, there are sections of the docs that are worse off after being written by experts. If something in the docs doesn't make sense to you, updating the relevant section after you figure it out is a simple way to ensure it will help the next person. - -### About the *pandas* documentation - -The documentation is written in **reStructuredText**, which is almost like writing in plain English, and built using [Sphinx](http://sphinx.pocoo.org/). The Sphinx Documentation has an excellent [introduction to reST](http://sphinx.pocoo.org/rest.html). Review the Sphinx docs to perform more complex changes to the documentation as well. - -Some other important things to know about the docs: - -- The *pandas* documentation consists of two parts: the docstrings in the code itself and the docs in this folder `pandas/doc/`. - - The docstrings provide a clear explanation of the usage of the individual functions, while the documentation in this folder consists of tutorial-like overviews per topic together with some other information (what's new, installation, etc). - -- The docstrings follow the **Numpy Docstring Standard**, which is used widely in the Scientific Python community. This standard specifies the format of the different sections of the docstring. See [this document](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt) for a detailed explanation, or look at some of the existing functions to extend it in a similar manner. 
-- The tutorials make heavy use of the [ipython directive](http://matplotlib.org/sampledoc/ipython_directive.html) sphinx extension. This directive lets you put code in the documentation which will be run during the doc build. For example: - - .. ipython:: python - - x = 2 - x**3 - - will be rendered as: - - In [1]: x = 2 - - In [2]: x**3 - Out[2]: 8 - - Almost all code examples in the docs are run (and the output saved) during the doc build. This approach means that code examples will always be up to date, but it does make the doc building a bit more complex. - -> **note** -> -> The `.rst` files are used to automatically generate Markdown and HTML versions of the docs. For this reason, please do not edit `CONTRIBUTING.md` directly, but instead make any changes to `doc/source/contributing.rst`. Then, to generate `CONTRIBUTING.md`, use [pandoc](http://johnmacfarlane.net/pandoc/) with the following command: -> -> pandoc doc/source/contributing.rst -t markdown_github > CONTRIBUTING.md - -The utility script `scripts/api_rst_coverage.py` can be used to compare the list of methods documented in `doc/source/api.rst` (which is used to generate the [API Reference](http://pandas.pydata.org/pandas-docs/stable/api.html) page) and the actual public methods. This will identify methods documented in in `doc/source/api.rst` that are not actually class methods, and existing methods that are not documented in `doc/source/api.rst`. - -### How to build the *pandas* documentation - -#### Requirements - -To build the *pandas* docs there are some extra requirements: you will need to have `sphinx` and `ipython` installed. [numpydoc](https://github.com/numpy/numpydoc) is used to parse the docstrings that follow the Numpy Docstring Standard (see above), but you don't need to install this because a local copy of numpydoc is included in the *pandas* source code. - -It is easiest to create a development environment <contributing.dev\_env>, then install: - - conda install -n pandas_dev sphinx ipython - -Furthermore, it is recommended to have all [optional dependencies](http://pandas.pydata.org/pandas-docs/dev/install.html#optional-dependencies) installed. This is not strictly necessary, but be aware that you will see some error messages when building the docs. This happens because all the code in the documentation is executed during the doc build, and so code examples using optional dependencies will generate errors. Run `pd.show_versions()` to get an overview of the installed version of all dependencies. - -> **warning** -> -> You need to have `sphinx` version 1.2.2 or newer, but older than version 1.3. Versions before 1.1.3 should also work. - -#### Building the documentation - -So how do you build the docs? Navigate to your local `pandas/doc/` directory in the console and run: - - python make.py html - -Then you can find the HTML output in the folder `pandas/doc/build/html/`. - -The first time you build the docs, it will take quite a while because it has to run all the code examples and build all the generated docstring pages. In subsequent evocations, sphinx will try to only build the pages that have been modified. - -If you want to do a full clean build, do: - - python make.py clean - python make.py build - -Starting with *pandas* 0.13.1 you can tell `make.py` to compile only a single section of the docs, greatly reducing the turn-around time for checking your changes. You will be prompted to delete `.rst` files that aren't required. 
This is okay because the prior versions of these files can be checked out from git. However, you must make sure not to commit the file deletions to your Git repository!
-
-    #omit autosummary and API section
-    python make.py clean
-    python make.py --no-api
-
-    # compile the docs with only a single
-    # section, that which is in indexing.rst
-    python make.py clean
-    python make.py --single indexing
-
-For comparison, a full documentation build may take 10 minutes, a `-no-api` build may take 3 minutes and a single section may take 15 seconds. Subsequent builds, which only process portions you have changed, will be faster. Open the following file in a web browser to see the full documentation you just built:
-
-    pandas/docs/build/html/index.html
+Whether you are a novice or experienced software developer, all contributions and suggestions are welcome!
 
-And you'll have the satisfaction of seeing your new and improved documentation!
+Our main contribution docs can be found [here](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst), but if you do not want to read them in their entirety, we will summarize the main ways in which you can contribute and point to relevant places in the docs for further information.
 
-#### Building master branch documentation
-
-When pull requests are merged into the *pandas* `master` branch, the main parts of the documentation are also built by Travis-CI. These docs are then hosted [here](http://pandas-docs.github.io/pandas-docs-travis).
-
-Contributing to the code base
------------------------------
-
-### Code standards
-
-*pandas* uses the [PEP8](http://www.python.org/dev/peps/pep-0008/) standard. There are several tools to ensure you abide by this standard.
-
-We've written a tool to check that your commits are PEP8 great, [pip install pep8radius](https://github.com/hayd/pep8radius). Look at PEP8 fixes in your branch vs master with:
-
-    pep8radius master --diff
-
-and make these changes with:
-
-    pep8radius master --diff --in-place
-
-Alternatively, use the [flake8](http://pypi.python.org/pypi/flake8) tool for checking the style of your code. Additional standards are outlined on the [code style wiki page](https://github.com/pandas-dev/pandas/wiki/Code-Style-and-Conventions).
-
-Please try to maintain backward compatibility. *pandas* has lots of users with lots of existing code, so don't break it if at all possible. If you think breakage is required, clearly state why as part of the pull request. Also, be careful when changing method signatures and add deprecation warnings where needed.
-
-### Test-driven development/code writing
-
-*pandas* is serious about testing and strongly encourages contributors to embrace [test-driven development (TDD)](http://en.wikipedia.org/wiki/Test-driven_development). This development process "relies on the repetition of a very short development cycle: first the developer writes an (initially failing) automated test case that defines a desired improvement or new function, then produces the minimum amount of code to pass that test." So, before actually writing any code, you should write your tests. Often the test can be taken from the original GitHub issue. However, it is always worth considering additional use cases and writing corresponding tests.
-
-Adding tests is one of the most common requests after code is pushed to *pandas*. Therefore, it is worth getting in the habit of writing tests ahead of time so this is never an issue.
- -Like many packages, *pandas* uses the [Nose testing system](https://nose.readthedocs.io/en/latest/index.html) and the convenient extensions in [numpy.testing](http://docs.scipy.org/doc/numpy/reference/routines.testing.html). - -#### Writing tests - -All tests should go into the `tests` subdirectory of the specific package. This folder contains many current examples of tests, and we suggest looking to these for inspiration. If your test requires working with files or network connectivity, there is more information on the [testing page](https://github.com/pandas-dev/pandas/wiki/Testing) of the wiki. - -The `pandas.util.testing` module has many special `assert` functions that make it easier to make statements about whether Series or DataFrame objects are equivalent. The easiest way to verify that your code is correct is to explicitly construct the result you expect, then compare the actual result to the expected correct result: - - def test_pivot(self): - data = { - 'index' : ['A', 'B', 'C', 'C', 'B', 'A'], - 'columns' : ['One', 'One', 'One', 'Two', 'Two', 'Two'], - 'values' : [1., 2., 3., 3., 2., 1.] - } - - frame = DataFrame(data) - pivoted = frame.pivot(index='index', columns='columns', values='values') - - expected = DataFrame({ - 'One' : {'A' : 1., 'B' : 2., 'C' : 3.}, - 'Two' : {'A' : 1., 'B' : 2., 'C' : 3.} - }) - - assert_frame_equal(pivoted, expected) - -#### Running the test suite - -The tests can then be run directly inside your Git clone (without having to install *pandas*) by typing: - - nosetests pandas - -The tests suite is exhaustive and takes around 20 minutes to run. Often it is worth running only a subset of tests first around your changes before running the entire suite. This is done using one of the following constructs: - - nosetests pandas/tests/[test-module].py - nosetests pandas/tests/[test-module].py:[TestClass] - nosetests pandas/tests/[test-module].py:[TestClass].[test_method] - -#### Running the performance test suite - -Performance matters and it is worth considering whether your code has introduced performance regressions. *pandas* is in the process of migrating to the [asv library](https://github.com/spacetelescope/asv) to enable easy monitoring of the performance of critical *pandas* operations. These benchmarks are all found in the `pandas/asv_bench` directory. asv supports both python2 and python3. - -> **note** -> -> The asv benchmark suite was translated from the previous framework, vbench, so many stylistic issues are likely a result of automated transformation of the code. - -To use asv you will need either `conda` or `virtualenv`. For more details please check the [asv installation webpage](https://asv.readthedocs.io/en/latest/installing.html). - -To install asv: - - pip install git+https://github.com/spacetelescope/asv - -If you need to run a benchmark, change your directory to `/asv_bench/` and run the following if you have been developing on `master`: - - asv continuous master - -If you are working on another branch, either of the following can be used: - - asv continuous master HEAD - asv continuous master your_branch - -This will check out the master revision and run the suite on both master and your commit. Running the full test suite can take up to one hour and use up to 3GB of RAM. Usually it is sufficient to paste only a subset of the results into the pull request to show that the committed changes do not cause unexpected performance regressions. - -You can run specific benchmarks using the `-b` flag, which takes a regular expression. 
For example, this will only run tests from a `pandas/asv_bench/benchmarks/groupby.py` file: - - asv continuous master -b groupby - -If you want to only run a specific group of tests from a file, you can do it using `.` as a separator. For example: - - asv continuous master -b groupby.groupby_agg_builtins1 - -will only run a `groupby_agg_builtins1` test defined in a `groupby` file. - -It can also be useful to run tests in your current environment. You can simply do it by: - - asv dev - -This command is equivalent to: - - asv run --quick --show-stderr --python=same - -This will launch every test only once, display stderr from the benchmarks, and use your local `python` that comes from your `$PATH`. - -Information on how to write a benchmark can be found in the [asv documentation](https://asv.readthedocs.io/en/latest/writing_benchmarks.html). - -#### Running the vbench performance test suite (phasing out) - -Historically, *pandas* used [vbench library](https://github.com/pydata/vbench) to enable easy monitoring of the performance of critical *pandas* operations. These benchmarks are all found in the `pandas/vb_suite` directory. vbench currently only works on python2. - -To install vbench: - - pip install git+https://github.com/pydata/vbench - -Vbench also requires `sqlalchemy`, `gitpython`, and `psutil`, which can all be installed using pip. If you need to run a benchmark, change your directory to the *pandas* root and run: - - ./test_perf.sh -b master -t HEAD - -This will check out the master revision and run the suite on both master and your commit. Running the full test suite can take up to one hour and use up to 3GB of RAM. Usually it is sufficient to paste a subset of the results into the Pull Request to show that the committed changes do not cause unexpected performance regressions. - -You can run specific benchmarks using the `-r` flag, which takes a regular expression. - -See the [performance testing wiki](https://github.com/pandas-dev/pandas/wiki/Performance-Testing) for information on how to write a benchmark. - -### Documenting your code - -Changes should be reflected in the release notes located in `doc/source/whatsnew/vx.y.z.txt`. This file contains an ongoing change log for each release. Add an entry to this file to document your fix, enhancement or (unavoidable) breaking change. Make sure to include the GitHub issue number when adding your entry (using `` :issue:`1234` `` where 1234 is the issue/pull request number). - -If your code is an enhancement, it is most likely necessary to add usage examples to the existing documentation. This can be done following the section regarding documentation above <contributing.documentation>. Further, to let users know when this feature was added, the `versionadded` directive is used. The sphinx syntax for that is: - -``` sourceCode -.. versionadded:: 0.17.0 -``` - -This will put the text *New in version 0.17.0* wherever you put the sphinx directive. This should also be put in the docstring when adding a new function or method ([example](https://github.com/pandas-dev/pandas/blob/v0.16.2/pandas/core/generic.py#L1959)) or a new keyword argument ([example](https://github.com/pandas-dev/pandas/blob/v0.16.2/pandas/core/frame.py#L1171)). - -Contributing your changes to *pandas* -------------------------------------- - -### Committing your code - -Keep style fixes to a separate commit to make your pull request more readable. 
- -Once you've made changes, you can see them by typing: - - git status - -If you have created a new file, it is not being tracked by git. Add it by typing: - - git add path/to/file-to-be-added.py - -Doing 'git status' again should give something like: - - # On branch shiny-new-feature - # - # modified: /relative/path/to/file-you-added.py - # - -Finally, commit your changes to your local repository with an explanatory message. *Pandas* uses a convention for commit message prefixes and layout. Here are some common prefixes along with general guidelines for when to use them: - -> - ENH: Enhancement, new functionality -> - BUG: Bug fix -> - DOC: Additions/updates to documentation -> - TST: Additions/updates to tests -> - BLD: Updates to the build process/scripts -> - PERF: Performance improvement -> - CLN: Code cleanup - -The following defines how a commit message should be structured. Please reference the relevant GitHub issues in your commit message using GH1234 or \#1234. Either style is fine, but the former is generally preferred: - -> - a subject line with < 80 chars. -> - One blank line. -> - Optionally, a commit message body. - -Now you can commit your changes in your local repository: - - git commit -m - -### Combining commits - -If you have multiple commits, you may want to combine them into one commit, often referred to as "squashing" or "rebasing". This is a common request by package maintainers when submitting a pull request as it maintains a more compact commit history. To rebase your commits: - - git rebase -i HEAD~# - -Where \# is the number of commits you want to combine. Then you can pick the relevant commit message and discard others. - -To squash to the master branch do: - - git rebase -i master - -Use the `s` option on a commit to `squash`, meaning to keep the commit messages, or `f` to `fixup`, meaning to merge the commit messages. - -Then you will need to push the branch (see below) forcefully to replace the current commits with the new ones: - - git push origin shiny-new-feature -f - -### Pushing your changes - -When you want your changes to appear publicly on your GitHub page, push your forked feature branch's commits: - - git push origin shiny-new-feature - -Here `origin` is the default name given to your remote repository on GitHub. You can see the remote repositories: - - git remote -v - -If you added the upstream repository as described above you will see something like: - - origin git@github.com:yourname/pandas.git (fetch) - origin git@github.com:yourname/pandas.git (push) - upstream git://github.com/pandas-dev/pandas.git (fetch) - upstream git://github.com/pandas-dev/pandas.git (push) - -Now your code is on GitHub, but it is not yet a part of the *pandas* project. For that to happen, a pull request needs to be submitted on GitHub. - -### Review your code - -When you're ready to ask for a code review, file a pull request. Before you do, once again make sure that you have followed all the guidelines outlined in this document regarding code style, tests, performance tests, and documentation. You should also double check your branch changes against the branch it was based on: - -1. Navigate to your repository on GitHub -- -2. Click on `Branches` -3. Click on the `Compare` button for your feature branch -4. Select the `base` and `compare` branches, if necessary. This will be `master` and `shiny-new-feature`, respectively. - -### Finally, make the pull request - -If everything looks good, you are ready to make a pull request. 
A pull request is how code from a local repository becomes available to the GitHub community and can be looked at and eventually merged into the master version. This pull request and its associated changes will eventually be committed to the master branch and available in the next release. To submit a pull request:
-
-1. Navigate to your repository on GitHub
-2. Click on the `Pull Request` button
-3. You can then click on `Commits` and `Files Changed` to make sure everything looks okay one last time
-4. Write a description of your changes in the `Preview Discussion` tab
-5. Click `Send Pull Request`.
-
-This request then goes to the repository maintainers, and they will review the code. If you need to make more changes, you can make them in your branch, push them to GitHub, and the pull request will be automatically updated. Pushing them to GitHub again is done by:
-
-    git push -f origin shiny-new-feature
-
-This will automatically update your pull request with the latest code and restart the Travis-CI tests.
-
-### Delete your merged branch (optional)
-
-Once your feature branch is accepted into upstream, you'll probably want to get rid of the branch. First, merge upstream master into your branch so git knows it is safe to delete your branch:
-
-    git fetch upstream
-    git checkout master
-    git merge upstream/master
+Getting Started
+---------------
+If you are looking to contribute to the *pandas* codebase, the best place to start is the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues). This is also a great place for filing bug reports and making suggestions for ways in which we can improve the code and documentation.
 
-Then you can just do:
+If you have additional questions, feel free to ask them on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). Further information can also be found in the [Getting Started](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#where-to-start) section of our main contribution doc.
 
-    git branch -d shiny-new-feature
+Filing Issues
+-------------
+If you notice a bug in the code or in the docs, or have suggestions for how we can improve either, feel free to create an issue on the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) using [GitHub's "issue" form](https://github.com/pandas-dev/pandas/issues/new). The form contains some questions that will help us best address your issue. For more information regarding how to file issues against *pandas*, please refer to the [Bug reports and enhancement requests](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#bug-reports-and-enhancement-requests) section of our main contribution doc.
 
-Make sure you use a lower-case `-d`, or else git won't warn you if your feature branch has not actually been merged.
+Contributing to the Codebase
+----------------------------
+The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](http://git-scm.com/) to clone the project and make changes to the codebase. Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment.
For more information, please refer to the [Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#working-with-the-code) section of our main contribution docs.
 
-The branch will still exist on GitHub, so to delete it there do:
+Before submitting your changes for review, make sure to check that your changes do not break any tests. More information about our test suites can be found [here](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#test-driven-development-code-writing). We also have guidelines regarding coding style that will be enforced during testing. Details about coding style can be found [here](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#code-standards).
 
-    git push origin --delete shiny-new-feature
+Once your changes are ready to be submitted, make sure to push them to GitHub before creating a pull request. Details about how to do that can be found in the [Contributing your changes to pandas](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#contributing-your-changes-to-pandas) section of our main contribution docs. We will review your changes, and you will most likely be asked to make additional changes before your pull request is finally ready to merge. However, once it is ready, we will merge it, and you will have successfully contributed to the codebase!
diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md
index 1f614b54b1f71..e33835c462511 100644
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -8,11 +8,22 @@
 
 [this should explain **why** the current behaviour is a problem and why the expected output is a better solution.]
 
+**Note**: We receive a lot of issues on our GitHub tracker, so it is very possible that your issue has been posted before. Please check first before submitting so that we do not have to handle and close duplicates!
+
+**Note**: Many problems can be resolved by simply upgrading `pandas` to the latest version. Before submitting, please check if that solution works for you. If possible, you may want to check if `master` addresses this issue, but that is not necessary.
+
+For documentation-related issues, you can check the latest versions of the docs on `master` here:
+
+https://pandas-docs.github.io/pandas-docs-travis/
+
+If the issue has not been resolved there, go ahead and file it in the issue tracker.
+
 #### Expected Output
 
 #### Output of ``pd.show_versions()``
-
-# Paste the output here pd.show_versions() here
+
+[paste the output of ``pd.show_versions()`` below this line]
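> *Editor's aside:* the version-reporting call requested by the updated issue template is part of the public pandas API (`pd.show_versions()`, available since pandas 0.13.1 per the contributing guide above). A minimal sketch of what a filled-in report might contain follows; the failing snippet and its data are hypothetical placeholders, not part of the template itself:

```python
# Hypothetical minimal, self-contained reproduction for a bug report.
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})   # placeholder data for the report
print(df.describe())                  # the problematic output would go here

# The template then asks for the environment details:
pd.show_versions()                    # paste this output below the marker line
```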
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 918d427ee4f4c..4e1e9ce017408 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,4 +1,4 @@ - - [ ] closes #xxxx - - [ ] tests added / passed - - [ ] passes ``git diff upstream/master | flake8 --diff`` - - [ ] whatsnew entry +- [ ] closes #xxxx +- [ ] tests added / passed +- [ ] passes `git diff upstream/master -u -- "*.py" | flake8 --diff` +- [ ] whatsnew entry diff --git a/.gitignore b/.gitignore index a77e780f3332d..ff0a6aef47163 100644 --- a/.gitignore +++ b/.gitignore @@ -7,6 +7,7 @@ *$ *.bak *flymake* +*.iml *.kdev4 *.log *.swp @@ -19,6 +20,7 @@ .noseids .ipynb_checkpoints .tags +.cache/ # Compiled source # ################### @@ -56,6 +58,8 @@ dist **/wheelhouse/* # coverage .coverage +coverage.xml +coverage_html_report # OS generated files # ###################### @@ -100,3 +104,5 @@ doc/source/index.rst doc/build/html/index.html # Windows specific leftover: doc/tmp.sv +doc/source/styled.xlsx +doc/source/templates/ diff --git a/.pep8speaks.yml b/.pep8speaks.yml new file mode 100644 index 0000000000000..299b76c8922cc --- /dev/null +++ b/.pep8speaks.yml @@ -0,0 +1,10 @@ +# File : .pep8speaks.yml + +scanner: + diff_only: True # If True, errors caused by only the patch are shown + +pycodestyle: + max-line-length: 79 + ignore: # Errors and warnings to ignore + - E731 + - E402 diff --git a/.travis.yml b/.travis.yml index be2058950d8ec..034e2a32bb75c 100644 --- a/.travis.yml +++ b/.travis.yml @@ -1,22 +1,25 @@ sudo: false language: python +# Default Python version is usually 2.7 +python: 3.5 -# To turn off cached miniconda, cython files and compiler cache comment out the -# USE_CACHE=true line for the build in the matrix below. 
To delete caches go to -# https://travis-ci.org/OWNER/REPOSITORY/caches or run +# To turn off cached cython files and compiler cache +# set NOCACHE-true +# To delete caches go to https://travis-ci.org/OWNER/REPOSITORY/caches or run # travis cache --delete inside the project directory from the travis command line client -# The cash directories will be deleted if anything in ci/ changes in a commit +# The cache directories will be deleted if anything in ci/ changes in a commit cache: + ccache: true directories: - - $HOME/miniconda # miniconda cache - $HOME/.cache # cython cache - $HOME/.ccache # compiler cache env: global: - - # pandas-docs-travis GH - - secure: "YvvTc+FrSYHgdxqoxn9s8VOaCWjvZzlkaf6k55kkmQqCYR9dPiLMsot1F96/N7o3YlD1s0znPQCak93Du8HHi/8809zAXloTaMSZrWz4R4qn96xlZFRE88O/w/Z1t3VVYpKX3MHlCggBc8MtXrqmvWKJMAqXyysZ4TTzoiJDPvE=" + # create a github personal access token + # cd pandas-dev/pandas + # travis encrypt 'PANDAS_GH_TOKEN=personal_access_token' -r pandas-dev/pandas + - secure: "EkWLZhbrp/mXJOx38CHjs7BnjXafsqHtwxPQrqWy457VDFWhIY1DMnIR/lOWG+a20Qv52sCsFtiZEmMfUjf0pLGXOqurdxbYBGJ7/ikFLk9yV2rDwiArUlVM9bWFnFxHvdz9zewBH55WurrY4ShZWyV+x2dWjjceWG5VpWeI6sA=" git: # for cloning @@ -24,312 +27,112 @@ git: matrix: fast_finish: true + exclude: + # Exclude the default Python 3.5 build + - python: 3.5 include: - - language: objective-c - os: osx - compiler: clang - osx_image: xcode6.4 + - os: osx + language: generic env: - - PYTHON_VERSION=3.5 - - JOB_NAME: "35_osx" - - NOSE_ARGS="not slow and not network and not disabled" - - BUILD_TYPE=conda - - JOB_TAG=_OSX - - TRAVIS_PYTHON_VERSION=3.5 - - CACHE_NAME="35_osx" - - USE_CACHE=true - - python: 2.7 + - JOB="3.5_OSX" TEST_ARGS="--skip-slow --skip-network" + - dist: trusty env: - - PYTHON_VERSION=2.7 - - JOB_NAME: "27_slow_nnet_LOCALE" - - NOSE_ARGS="slow and not network and not disabled" - - LOCALE_OVERRIDE="zh_CN.UTF-8" - - FULL_DEPS=true - - JOB_TAG=_LOCALE - - CACHE_NAME="27_slow_nnet_LOCALE" - - USE_CACHE=true + - JOB="2.7_LOCALE" LOCALE_OVERRIDE="zh_CN.UTF-8" SLOW=true addons: apt: packages: - language-pack-zh-hans - - python: 2.7 + - dist: trusty env: - - PYTHON_VERSION=2.7 - - JOB_NAME: "27_nslow" - - NOSE_ARGS="not slow and not disabled" - - FULL_DEPS=true - - CLIPBOARD_GUI=gtk2 - - LINT=true - - CACHE_NAME="27_nslow" - - USE_CACHE=true + - JOB="2.7" TEST_ARGS="--skip-slow" LINT=true addons: apt: packages: - python-gtk2 - - python: 3.5 + - dist: trusty env: - - PYTHON_VERSION=3.5 - - JOB_NAME: "35_nslow" - - NOSE_ARGS="not slow and not network and not disabled" - - FULL_DEPS=true - - CLIPBOARD=xsel - - COVERAGE=true - - CACHE_NAME="35_nslow" -# - USE_CACHE=true # Don't use cache for 35_nslow + - JOB="3.5" TEST_ARGS="--skip-slow --skip-network" COVERAGE=true addons: apt: packages: - xsel - - python: 3.6 + - dist: trusty env: - - PYTHON_VERSION=3.6 - - JOB_NAME: "36" - - NOSE_ARGS="not slow and not network and not disabled" - - PANDAS_TESTING_MODE="deprecate" - addons: - apt: - packages: - - libatlas-base-dev - - gfortran -# In allow_failures - - python: 2.7 - env: - - PYTHON_VERSION=2.7 - - JOB_NAME: "27_nslow_nnet_COMPAT" - - NOSE_ARGS="not slow and not network and not disabled" - - LOCALE_OVERRIDE="it_IT.UTF-8" - - INSTALL_TEST=true - - JOB_TAG=_COMPAT - - CACHE_NAME="27_nslow_nnet_COMPAT" - - USE_CACHE=true - addons: - apt: - packages: - - language-pack-it -# In allow_failures - - python: 2.7 + - JOB="3.6" TEST_ARGS="--skip-slow --skip-network" PANDAS_TESTING_MODE="deprecate" CONDA_FORGE=true + # In allow_failures + - dist: trusty env: 
- - PYTHON_VERSION=2.7 - - JOB_NAME: "27_slow" - - JOB_TAG=_SLOW - - NOSE_ARGS="slow and not network and not disabled" - - FULL_DEPS=true - - CACHE_NAME="27_slow" - - USE_CACHE=true -# In allow_failures - - python: 2.7 + - JOB="2.7_SLOW" SLOW=true + # In allow_failures + - dist: trusty env: - - PYTHON_VERSION=2.7 - - JOB_NAME: "27_build_test_conda" - - JOB_TAG=_BUILD_TEST - - NOSE_ARGS="not slow and not disabled" - - FULL_DEPS=true - - BUILD_TEST=true - - CACHE_NAME="27_build_test_conda" - - USE_CACHE=true -# In allow_failures - - python: 3.4 + - JOB="2.7_BUILD_TEST" TEST_ARGS="--skip-slow" BUILD_TEST=true + # In allow_failures + - dist: trusty env: - - PYTHON_VERSION=3.4 - - JOB_NAME: "34_nslow" - - LOCALE_OVERRIDE="zh_CN.UTF-8" - - NOSE_ARGS="not slow and not disabled" - - FULL_DEPS=true - - CLIPBOARD=xsel - - CACHE_NAME="34_nslow" - - USE_CACHE=true - addons: - apt: - packages: - - xsel - - language-pack-zh-hans -# In allow_failures - - python: 3.4 + - JOB="3.6_NUMPY_DEV" TEST_ARGS="--skip-slow --skip-network" PANDAS_TESTING_MODE="deprecate" + # In allow_failures + - dist: trusty env: - - PYTHON_VERSION=3.4 - - JOB_NAME: "34_slow" - - JOB_TAG=_SLOW - - NOSE_ARGS="slow and not network and not disabled" - - FULL_DEPS=true - - CLIPBOARD=xsel - - CACHE_NAME="34_slow" - - USE_CACHE=true + - JOB="3.6_DOC" DOC=true addons: apt: packages: - xsel -# In allow_failures - - python: 3.5 - env: - - PYTHON_VERSION=3.5 - - JOB_NAME: "35_numpy_dev" - - JOB_TAG=_NUMPY_DEV - - NOSE_ARGS="not slow and not network and not disabled" - - PANDAS_TESTING_MODE="deprecate" - - CACHE_NAME="35_numpy_dev" - - USE_CACHE=true - addons: - apt: - packages: - - libatlas-base-dev - - gfortran -# In allow_failures - - python: 3.5 - env: - - PYTHON_VERSION=3.5 - - JOB_NAME: "35_ascii" - - JOB_TAG=_ASCII - - NOSE_ARGS="not slow and not network and not disabled" - - LOCALE_OVERRIDE="C" - - CACHE_NAME="35_ascii" - - USE_CACHE=true -# In allow_failures - - python: 3.5 - env: - - PYTHON_VERSION=3.5 - - JOB_NAME: "doc_build" - - FULL_DEPS=true - - DOC_BUILD=true - - JOB_TAG=_DOC_BUILD - - CACHE_NAME="doc_build" - - USE_CACHE=true allow_failures: - - python: 2.7 - env: - - PYTHON_VERSION=2.7 - - JOB_NAME: "27_slow" - - JOB_TAG=_SLOW - - NOSE_ARGS="slow and not network and not disabled" - - FULL_DEPS=true - - CACHE_NAME="27_slow" - - USE_CACHE=true - - python: 3.4 + - dist: trusty env: - - PYTHON_VERSION=3.4 - - JOB_NAME: "34_slow" - - JOB_TAG=_SLOW - - NOSE_ARGS="slow and not network and not disabled" - - FULL_DEPS=true - - CLIPBOARD=xsel - - CACHE_NAME="34_slow" - - USE_CACHE=true - addons: - apt: - packages: - - xsel - - python: 2.7 + - JOB="2.7_SLOW" SLOW=true + - dist: trusty env: - - PYTHON_VERSION=2.7 - - JOB_NAME: "27_build_test_conda" - - JOB_TAG=_BUILD_TEST - - NOSE_ARGS="not slow and not disabled" - - FULL_DEPS=true - - BUILD_TEST=true - - CACHE_NAME="27_build_test_conda" - - USE_CACHE=true - - python: 3.4 + - JOB="2.7_BUILD_TEST" TEST_ARGS="--skip-slow" BUILD_TEST=true + - dist: trusty env: - - PYTHON_VERSION=3.4 - - JOB_NAME: "34_nslow" - - LOCALE_OVERRIDE="zh_CN.UTF-8" - - NOSE_ARGS="not slow and not disabled" - - FULL_DEPS=true - - CLIPBOARD=xsel - - CACHE_NAME="34_nslow" - - USE_CACHE=true - addons: - apt: - packages: - - xsel - - language-pack-zh-hans - - python: 3.5 - env: - - PYTHON_VERSION=3.5 - - JOB_NAME: "35_numpy_dev" - - JOB_TAG=_NUMPY_DEV - - NOSE_ARGS="not slow and not network and not disabled" - - PANDAS_TESTING_MODE="deprecate" - - CACHE_NAME="35_numpy_dev" - - USE_CACHE=true - addons: - apt: - 
packages: - - libatlas-base-dev - - gfortran - - python: 2.7 - env: - - PYTHON_VERSION=2.7 - - JOB_NAME: "27_nslow_nnet_COMPAT" - - NOSE_ARGS="not slow and not network and not disabled" - - LOCALE_OVERRIDE="it_IT.UTF-8" - - INSTALL_TEST=true - - JOB_TAG=_COMPAT - - CACHE_NAME="27_nslow_nnet_COMPAT" - - USE_CACHE=true - addons: - apt: - packages: - - language-pack-it - - python: 3.5 - env: - - PYTHON_VERSION=3.5 - - JOB_NAME: "35_ascii" - - JOB_TAG=_ASCII - - NOSE_ARGS="not slow and not network and not disabled" - - LOCALE_OVERRIDE="C" - - CACHE_NAME="35_ascii" - - USE_CACHE=true - - python: 3.5 + - JOB="3.6_NUMPY_DEV" TEST_ARGS="--skip-slow --skip-network" PANDAS_TESTING_MODE="deprecate" + - dist: trusty env: - - PYTHON_VERSION=3.5 - - JOB_NAME: "doc_build" - - FULL_DEPS=true - - DOC_BUILD=true - - JOB_TAG=_DOC_BUILD - - CACHE_NAME="doc_build" - - USE_CACHE=true + - JOB="3.6_DOC" DOC=true before_install: - echo "before_install" - source ci/travis_process_gbq_encryption.sh - - echo $VIRTUAL_ENV - export PATH="$HOME/miniconda3/bin:$PATH" - df -h - - date - pwd - uname -a - - python -V -# git info & get tags - git --version - git tag - ci/before_install_travis.sh - - export DISPLAY=:99.0 + - export DISPLAY=":99.0" install: - echo "install start" - - ci/check_cache.sh - ci/prep_cython_cache.sh - ci/install_travis.sh - ci/submit_cython_cache.sh - echo "install done" before_script: - - source activate pandas && pip install codecov - - ci/install_db.sh + - ci/install_db_travis.sh script: - echo "script start" - ci/run_build_docs.sh - - ci/script.sh + - ci/script_single.sh + - ci/script_multi.sh - ci/lint.sh - echo "script done" after_success: - - source activate pandas && codecov + - ci/upload_coverage.sh after_script: - echo "after_script start" - - ci/install_test.sh - - source activate pandas && python -c "import pandas; pandas.show_versions();" - - ci/print_skipped.py /tmp/nosetests.xml + - source activate pandas && pushd /tmp && python -c "import pandas; pandas.show_versions();" && popd + - if [ -e /tmp/single.xml ]; then + ci/print_skipped.py /tmp/single.xml; + fi + - if [ -e /tmp/multiple.xml ]; then + ci/print_skipped.py /tmp/multiple.xml; + fi - echo "after_script done" diff --git a/AUTHORS.md b/AUTHORS.md new file mode 100644 index 0000000000000..dcaaea101f4c8 --- /dev/null +++ b/AUTHORS.md @@ -0,0 +1,57 @@ +About the Copyright Holders +=========================== + +* Copyright (c) 2008-2011 AQR Capital Management, LLC + + AQR Capital Management began pandas development in 2008. Development was + led by Wes McKinney. AQR released the source under this license in 2009. +* Copyright (c) 2011-2012, Lambda Foundry, Inc. + + Wes is now an employee of Lambda Foundry, and remains the pandas project + lead. +* Copyright (c) 2011-2012, PyData Development Team + + The PyData Development Team is the collection of developers of the PyData + project. This includes all of the PyData sub-projects, including pandas. The + core team that coordinates development on GitHub can be found here: + http://github.com/pydata. + +Full credits for pandas contributors can be found in the documentation. + +Our Copyright Policy +==================== + +PyData uses a shared copyright model. Each contributor maintains copyright +over their contributions to PyData. However, it is important to note that +these contributions are typically only changes to the repositories. Thus, +the PyData source code, in its entirety, is not the copyright of any single +person or institution. 
Instead, it is the collective copyright of the +entire PyData Development Team. If individual contributors want to maintain +a record of what changes/contributions they have specific copyright on, +they should indicate their copyright in the commit message of the change +when they commit the change to one of the PyData repositories. + +With this in mind, the following banner should be used in any source code +file to indicate the copyright and license terms: + +``` +#----------------------------------------------------------------------------- +# Copyright (c) 2012, PyData Development Team +# All rights reserved. +# +# Distributed under the terms of the BSD Simplified License. +# +# The full license is in the LICENSE file, distributed with this software. +#----------------------------------------------------------------------------- +``` + +Other licenses can be found in the LICENSES directory. + +License +======= + +pandas is distributed under a 3-clause ("Simplified" or "New") BSD +license. Parts of NumPy, SciPy, numpydoc, bottleneck, which all have +BSD-compatible licenses, are included. Their licenses follow the pandas +license. + diff --git a/LICENSE b/LICENSE index c9b8834e8774b..924de26253bf4 100644 --- a/LICENSE +++ b/LICENSE @@ -1,87 +1,29 @@ -======= -License -======= +BSD 3-Clause License -pandas is distributed under a 3-clause ("Simplified" or "New") BSD -license. Parts of NumPy, SciPy, numpydoc, bottleneck, which all have -BSD-compatible licenses, are included. Their licenses follow the pandas -license. - -pandas license -============== - -Copyright (c) 2011-2012, Lambda Foundry, Inc. and PyData Development Team -All rights reserved. - -Copyright (c) 2008-2011 AQR Capital Management, LLC +Copyright (c) 2008-2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team All rights reserved. Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are -met: - - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above - copyright notice, this list of conditions and the following - disclaimer in the documentation and/or other materials provided - with the distribution. - - * Neither the name of the copyright holder nor the names of any - contributors may be used to endorse or promote products derived - from this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS -"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. 
+ +* Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +* Neither the name of the copyright holder nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -About the Copyright Holders -=========================== - -AQR Capital Management began pandas development in 2008. Development was -led by Wes McKinney. AQR released the source under this license in 2009. -Wes is now an employee of Lambda Foundry, and remains the pandas project -lead. - -The PyData Development Team is the collection of developers of the PyData -project. This includes all of the PyData sub-projects, including pandas. The -core team that coordinates development on GitHub can be found here: -http://github.com/pydata. - -Full credits for pandas contributors can be found in the documentation. - -Our Copyright Policy -==================== - -PyData uses a shared copyright model. Each contributor maintains copyright -over their contributions to PyData. However, it is important to note that -these contributions are typically only changes to the repositories. Thus, -the PyData source code, in its entirety, is not the copyright of any single -person or institution. Instead, it is the collective copyright of the -entire PyData Development Team. If individual contributors want to maintain -a record of what changes/contributions they have specific copyright on, -they should indicate their copyright in the commit message of the change -when they commit the change to one of the PyData repositories. - -With this in mind, the following banner should be used in any source code -file to indicate the copyright and license terms: - -#----------------------------------------------------------------------------- -# Copyright (c) 2012, PyData Development Team -# All rights reserved. -# -# Distributed under the terms of the BSD Simplified License. -# -# The full license is in the LICENSE file, distributed with this software. -#----------------------------------------------------------------------------- - -Other licenses can be found in the LICENSES directory. 
\ No newline at end of file diff --git a/MANIFEST.in b/MANIFEST.in index 2d26fbfd6adaf..1a6b831c1b975 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -3,11 +3,11 @@ include LICENSE include RELEASE.md include README.rst include setup.py +include pyproject.toml graft doc prune doc/build -graft examples graft pandas global-exclude *.so @@ -26,3 +26,4 @@ global-exclude *.png # recursive-include LICENSES * include versioneer.py include pandas/_version.py +include pandas/io/formats/templates/*.tpl diff --git a/Makefile b/Makefile index 9a768932b8bea..194a8861715b7 100644 --- a/Makefile +++ b/Makefile @@ -1,4 +1,4 @@ -tseries: pandas/lib.pyx pandas/tslib.pyx pandas/hashtable.pyx +tseries: pandas/_libs/lib.pyx pandas/_libs/tslib.pyx pandas/_libs/hashtable.pyx python setup.py build_ext --inplace .PHONY : develop build clean clean_pyc tseries doc @@ -9,9 +9,6 @@ clean: clean_pyc: -find . -name '*.py[co]' -exec rm {} \; -sparse: pandas/src/sparse.pyx - python setup.py build_ext --inplace - build: clean_pyc python setup.py build_ext --inplace diff --git a/README.md b/README.md index 4293d7294d5e0..ac043f5586498 100644 --- a/README.md +++ b/README.md @@ -12,7 +12,7 @@ latest release - latest release + latest release Package Status @@ -30,10 +30,19 @@ + + + + + circleci build status + + + + - - appveyor build status + + appveyor build status @@ -44,8 +53,16 @@ Conda - - conda downloads + + conda default downloads + + + + + Conda-forge + + + conda-forge downloads @@ -106,31 +123,31 @@ Here are just a few of the things that pandas does well: moving window linear regressions, date shifting and lagging, etc. - [missing-data]: http://pandas.pydata.org/pandas-docs/stable/missing_data.html#working-with-missing-data - [insertion-deletion]: http://pandas.pydata.org/pandas-docs/stable/dsintro.html#column-selection-addition-deletion - [alignment]: http://pandas.pydata.org/pandas-docs/stable/dsintro.html?highlight=alignment#intro-to-data-structures - [groupby]: http://pandas.pydata.org/pandas-docs/stable/groupby.html#group-by-split-apply-combine - [conversion]: http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe - [slicing]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#slicing-ranges - [fancy-indexing]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#advanced-indexing-with-ix - [subsetting]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing - [merging]: http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging - [joining]: http://pandas.pydata.org/pandas-docs/stable/merging.html#joining-on-index - [reshape]: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#reshaping-and-pivot-tables - [pivot-table]: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables-and-cross-tabulations - [mi]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#hierarchical-indexing-multiindex - [flat-files]: http://pandas.pydata.org/pandas-docs/stable/io.html#csv-text-files - [excel]: http://pandas.pydata.org/pandas-docs/stable/io.html#excel-files - [db]: http://pandas.pydata.org/pandas-docs/stable/io.html#sql-queries - [hdfstore]: http://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables - [timeseries]: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-series-date-functionality + [missing-data]: https://pandas.pydata.org/pandas-docs/stable/missing_data.html#working-with-missing-data + [insertion-deletion]: 
https://pandas.pydata.org/pandas-docs/stable/dsintro.html#column-selection-addition-deletion + [alignment]: https://pandas.pydata.org/pandas-docs/stable/dsintro.html?highlight=alignment#intro-to-data-structures + [groupby]: https://pandas.pydata.org/pandas-docs/stable/groupby.html#group-by-split-apply-combine + [conversion]: https://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe + [slicing]: https://pandas.pydata.org/pandas-docs/stable/indexing.html#slicing-ranges + [fancy-indexing]: https://pandas.pydata.org/pandas-docs/stable/indexing.html#advanced-indexing-with-ix + [subsetting]: https://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing + [merging]: https://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging + [joining]: https://pandas.pydata.org/pandas-docs/stable/merging.html#joining-on-index + [reshape]: https://pandas.pydata.org/pandas-docs/stable/reshaping.html#reshaping-and-pivot-tables + [pivot-table]: https://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables-and-cross-tabulations + [mi]: https://pandas.pydata.org/pandas-docs/stable/indexing.html#hierarchical-indexing-multiindex + [flat-files]: https://pandas.pydata.org/pandas-docs/stable/io.html#csv-text-files + [excel]: https://pandas.pydata.org/pandas-docs/stable/io.html#excel-files + [db]: https://pandas.pydata.org/pandas-docs/stable/io.html#sql-queries + [hdfstore]: https://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables + [timeseries]: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-series-date-functionality ## Where to get it The source code is currently hosted on GitHub at: -http://github.com/pandas-dev/pandas +https://github.com/pandas-dev/pandas Binary installers for the latest released version are available at the [Python -package index](http://pypi.python.org/pypi/pandas/) and on conda. +package index](https://pypi.python.org/pypi/pandas) and on conda. ```sh # conda @@ -144,11 +161,11 @@ pip install pandas ## Dependencies - [NumPy](http://www.numpy.org): 1.7.0 or higher -- [python-dateutil](http://labix.org/python-dateutil): 1.5 or higher -- [pytz](http://pytz.sourceforge.net) +- [python-dateutil](https://labix.org/python-dateutil): 1.5 or higher +- [pytz](https://pythonhosted.org/pytz) - Needed for time zone support with ``pandas.date_range`` -See the [full installation instructions](http://pandas.pydata.org/pandas-docs/stable/install.html#dependencies) +See the [full installation instructions](https://pandas.pydata.org/pandas-docs/stable/install.html#dependencies) for recommended and optional dependencies. ## Installation from sources @@ -180,20 +197,13 @@ mode](https://pip.pypa.io/en/latest/reference/pip_install.html#editable-installs pip install -e . ``` -On Windows, you will need to install MinGW and execute: - -```sh -python setup.py build --compiler=mingw32 -python setup.py install -``` - -See http://pandas.pydata.org/ for more information. +See the full instructions for [installing from source](https://pandas.pydata.org/pandas-docs/stable/install.html#installing-from-source). ## License -BSD +[BSD 3](LICENSE) ## Documentation -The official documentation is hosted on PyData.org: http://pandas.pydata.org/ +The official documentation is hosted on PyData.org: https://pandas.pydata.org/pandas-docs/stable The Sphinx documentation should provide a good starting point for learning how to use the library. Expect the docs to continue to expand as time goes on. @@ -202,10 +212,21 @@ to use the library. 
Expect the docs to continue to expand as time goes on. Work on ``pandas`` started at AQR (a quantitative hedge fund) in 2008 and has been under active development since then. +## Getting Help + +For usage questions, the best place to go is [StackOverflow](https://stackoverflow.com/questions/tagged/pandas). +Further, general questions and discussions can also take place on the [pydata mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata). + ## Discussion and Development -Since pandas development is related to a number of other scientific -Python projects, questions are welcome on the scipy-user mailing -list. Specialized discussions or design issues should take place on -the PyData mailing list / Google group: +Most development discussion takes place on GitHub in this repo. Further, the [pandas-dev mailing list](https://mail.python.org/mailman/listinfo/pandas-dev) can also be used for specialized discussions or design issues, and a [Gitter channel](https://gitter.im/pydata/pandas) is available for quick development-related questions. + +## Contributing to pandas +All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome. + +A detailed overview of how to contribute can be found in the **[contributing guide](https://pandas.pydata.org/pandas-docs/stable/contributing.html).** + +If you are simply looking to start working with the pandas codebase, navigate to the [GitHub “issues” tab](https://github.com/pandas-dev/pandas/issues) and start looking through interesting issues. There are a number of issues listed under [Docs](https://github.com/pandas-dev/pandas/issues?labels=Docs&sort=updated&state=open) and [Difficulty Novice](https://github.com/pandas-dev/pandas/issues?q=is%3Aopen+is%3Aissue+label%3A%22Difficulty+Novice%22) where you could start out. + +Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’...you can do something about it! -https://groups.google.com/forum/#!forum/pydata +Feel free to ask questions on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). diff --git a/RELEASE.md b/RELEASE.md index a181412be2719..efd075dabcba9 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,6 +1,6 @@ Release Notes ============= -The list of changes to pandas between each release can be found +The list of changes to Pandas between each release can be found [here](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html). For full details, see the commit logs at http://github.com/pandas-dev/pandas.
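Much of what follows updates the CI configuration and the asv benchmark suite under asv_bench/. As a rough sketch of the asv conventions those files rely on (class and method names below are invented for illustration and are not part of this diff): asv runs `setup` before timing each `time_*` method, and `params`/`param_names` expand a single class into a matrix of benchmark cases, as the rewritten `FrameOps` in stat_ops.py does.

```python
# Hypothetical asv benchmark module, for illustration only (not part of
# this diff). asv discovers classes like this one, calls `setup` before
# each timed run, and times every method whose name starts with `time_`.
import numpy as np
import pandas as pd


class ExampleRolling(object):
    goal_time = 0.2        # target sampling time per benchmark
    params = [10, 1000]    # each value becomes its own benchmark case
    param_names = ['window']

    def setup(self, window):
        # setup is re-run per case and excluded from the measurement
        self.s = pd.Series(np.random.randn(100000))

    def time_rolling_mean(self, window):
        self.s.rolling(window).mean()
```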
diff --git a/appveyor.yml b/appveyor.yml index 2499e7069843d..65e62f887554e 100644 --- a/appveyor.yml +++ b/appveyor.yml @@ -14,28 +14,22 @@ environment: # /E:ON and /V:ON options are not enabled in the batch script intepreter # See: http://stackoverflow.com/a/13751649/163740 CMD_IN_ENV: "cmd /E:ON /V:ON /C .\\ci\\run_with_env.cmd" + clone_folder: C:\projects\pandas matrix: - - CONDA_ROOT: "C:\\Miniconda3.5_64" + - CONDA_ROOT: "C:\\Miniconda3_64" PYTHON_VERSION: "3.6" PYTHON_ARCH: "64" CONDA_PY: "36" - CONDA_NPY: "111" + CONDA_NPY: "112" - - CONDA_ROOT: "C:\\Miniconda3.5_64" + - CONDA_ROOT: "C:\\Miniconda3_64" PYTHON_VERSION: "2.7" PYTHON_ARCH: "64" CONDA_PY: "27" CONDA_NPY: "110" - - CONDA_ROOT: "C:\\Miniconda3.5_64" - PYTHON_VERSION: "3.5" - PYTHON_ARCH: "64" - CONDA_PY: "35" - CONDA_NPY: "111" - - # We always use a 64-bit machine, but can build x86 distributions # with the PYTHON_ARCH variable (which is used by CMD_IN_ENV). platform: @@ -65,8 +59,7 @@ install: # install our build environment - cmd: conda config --set show_channel_urls true --set always_yes true --set changeps1 false - #- cmd: conda update -q conda - - cmd: conda install conda=4.2.15 + - cmd: conda update -q conda - cmd: conda config --set ssl_verify false # add the pandas channel *before* defaults to have defaults take priority @@ -78,22 +71,19 @@ install: # this is now the downloaded conda... - cmd: conda info -a - # build em using the local source checkout in the correct windows env - - cmd: '%CMD_IN_ENV% conda build ci\appveyor.recipe -q' - # create our env - - cmd: conda create -q -n pandas python=%PYTHON_VERSION% nose + - cmd: conda create -n pandas python=%PYTHON_VERSION% cython pytest>=3.1.0 pytest-xdist - cmd: activate pandas - - SET REQ=ci\requirements-%PYTHON_VERSION%-%PYTHON_ARCH%.run + - SET REQ=ci\requirements-%PYTHON_VERSION%_WIN.run - cmd: echo "installing requirements from %REQ%" - - cmd: conda install -n pandas -q --file=%REQ% + - cmd: conda install -n pandas --file=%REQ% - cmd: conda list -n pandas - cmd: echo "installing requirements from %REQ% - done" - - ps: conda install -n pandas (conda build ci\appveyor.recipe -q --output) + + # build em using the local source checkout in the correct windows env + - cmd: '%CMD_IN_ENV% python setup.py build_ext --inplace' test_script: # tests - - cd \ - cmd: activate pandas - - cmd: conda list - - cmd: nosetests --exe -A "not slow and not network and not disabled" pandas + - cmd: test.bat diff --git a/asv_bench/asv.conf.json b/asv_bench/asv.conf.json index 155deb5bdbd1f..9c333f62810f4 100644 --- a/asv_bench/asv.conf.json +++ b/asv_bench/asv.conf.json @@ -26,7 +26,7 @@ // The Pythons you'd like to test against. If not provided, defaults // to the current version of Python used to run `asv`. // "pythons": ["2.7", "3.4"], - "pythons": ["2.7"], + "pythons": ["3.6"], // The matrix of dependencies to test. Each key is the name of a // package (in PyPI) and the values are version numbers. An empty @@ -46,11 +46,14 @@ "numexpr": [], "pytables": [null, ""], // platform dependent, see excludes below "tables": [null, ""], - "libpython": [null, ""], "openpyxl": [], "xlsxwriter": [], "xlrd": [], - "xlwt": [] + "xlwt": [], + "pytest": [], + // If using Windows with python 2.7 and want to build using the + // mingw toolchain (rather than MSVC), uncomment the following line. 
+ // "libpython": [], }, // Combinations of libraries/python versions can be excluded/included @@ -79,10 +82,6 @@ {"environment_type": "conda", "pytables": null}, {"environment_type": "(?!conda).*", "tables": null}, {"environment_type": "(?!conda).*", "pytables": ""}, - // On conda&win32, install libpython - {"sys_platform": "(?!win32).*", "libpython": ""}, - {"environment_type": "conda", "sys_platform": "win32", "libpython": null}, - {"environment_type": "(?!conda).*", "libpython": ""} ], "include": [], @@ -118,8 +117,10 @@ // with results. If the commit is `null`, regression detection is // skipped for the matching benchmark. // - // "regressions_first_commits": { - // "some_benchmark": "352cdf", // Consider regressions only after this commit - // "another_benchmark": null, // Skip regression detection altogether - // } + "regressions_first_commits": { + ".*": "v0.20.0" + }, + "regression_thresholds": { + ".*": 0.05 + } } diff --git a/asv_bench/benchmarks/algorithms.py b/asv_bench/benchmarks/algorithms.py index fe657936c403e..40cfec1bcd4c7 100644 --- a/asv_bench/benchmarks/algorithms.py +++ b/asv_bench/benchmarks/algorithms.py @@ -1,7 +1,16 @@ +from importlib import import_module + import numpy as np + import pandas as pd from pandas.util import testing as tm +for imp in ['pandas.util', 'pandas.tools.hashing']: + try: + hashing = import_module(imp) + break + except: + pass class Algorithms(object): goal_time = 0.2 @@ -103,13 +112,13 @@ def setup(self): self.df.iloc[10:20] = np.nan def time_frame(self): - self.df.hash() + hashing.hash_pandas_object(self.df) def time_series_int(self): - self.df.E.hash() + hashing.hash_pandas_object(self.df.E) def time_series_string(self): - self.df.B.hash() + hashing.hash_pandas_object(self.df.B) def time_series_categorical(self): - self.df.C.hash() + hashing.hash_pandas_object(self.df.C) diff --git a/asv_bench/benchmarks/attrs_caching.py b/asv_bench/benchmarks/attrs_caching.py index 9210f1f2878d4..b7610037bed4d 100644 --- a/asv_bench/benchmarks/attrs_caching.py +++ b/asv_bench/benchmarks/attrs_caching.py @@ -1,5 +1,9 @@ from .pandas_vb_common import * -from pandas.util.decorators import cache_readonly + +try: + from pandas.util import cache_readonly +except ImportError: + from pandas.util.decorators import cache_readonly class DataFrameAttributes(object): diff --git a/asv_bench/benchmarks/binary_ops.py b/asv_bench/benchmarks/binary_ops.py index 53cb1cf465698..0ca21b929ea17 100644 --- a/asv_bench/benchmarks/binary_ops.py +++ b/asv_bench/benchmarks/binary_ops.py @@ -1,5 +1,8 @@ from .pandas_vb_common import * -import pandas.computation.expressions as expr +try: + import pandas.core.computation.expressions as expr +except ImportError: + import pandas.computation.expressions as expr class Ops(object): @@ -107,4 +110,4 @@ def setup(self): self.s = Series(date_range('20010101', periods=self.N, freq='T', tz='US/Eastern')) self.ts = self.s[self.halfway] - self.s2 = Series(date_range('20010101', periods=self.N, freq='s', tz='US/Eastern')) \ No newline at end of file + self.s2 = Series(date_range('20010101', periods=self.N, freq='s', tz='US/Eastern')) diff --git a/asv_bench/benchmarks/categoricals.py b/asv_bench/benchmarks/categoricals.py index cca652c68cf15..6432ccfb19efe 100644 --- a/asv_bench/benchmarks/categoricals.py +++ b/asv_bench/benchmarks/categoricals.py @@ -1,8 +1,11 @@ from .pandas_vb_common import * try: - from pandas.types.concat import union_categoricals + from pandas.api.types import union_categoricals except ImportError: - pass + try: + from 
pandas.types.concat import union_categoricals + except ImportError: + pass class Categoricals(object): @@ -63,3 +66,37 @@ def time_value_counts_dropna(self): def time_rendering(self): str(self.sel) + + +class Categoricals3(object): + goal_time = 0.2 + + def setup(self): + N = 100000 + ncats = 100 + + self.s1 = Series(np.array(tm.makeCategoricalIndex(N, ncats))) + self.s1_cat = self.s1.astype('category') + self.s1_cat_ordered = self.s1.astype('category', ordered=True) + + self.s2 = Series(np.random.randint(0, ncats, size=N)) + self.s2_cat = self.s2.astype('category') + self.s2_cat_ordered = self.s2.astype('category', ordered=True) + + def time_rank_string(self): + self.s1.rank() + + def time_rank_string_cat(self): + self.s1_cat.rank() + + def time_rank_string_cat_ordered(self): + self.s1_cat_ordered.rank() + + def time_rank_int(self): + self.s2.rank() + + def time_rank_int_cat(self): + self.s2_cat.rank() + + def time_rank_int_cat_ordered(self): + self.s2_cat_ordered.rank() diff --git a/asv_bench/benchmarks/eval.py b/asv_bench/benchmarks/eval.py index a0819e33dc254..6f33590ee9e33 100644 --- a/asv_bench/benchmarks/eval.py +++ b/asv_bench/benchmarks/eval.py @@ -1,6 +1,9 @@ from .pandas_vb_common import * import pandas as pd -import pandas.computation.expressions as expr +try: + import pandas.core.computation.expressions as expr +except ImportError: + import pandas.computation.expressions as expr class Eval(object): diff --git a/asv_bench/benchmarks/frame_ctor.py b/asv_bench/benchmarks/frame_ctor.py index 05c1a27fdf8ca..dec4fcba0eb5e 100644 --- a/asv_bench/benchmarks/frame_ctor.py +++ b/asv_bench/benchmarks/frame_ctor.py @@ -20,12 +20,12 @@ def setup(self): self.data = self.frame.to_dict() except: self.data = self.frame.toDict() - self.some_dict = self.data.values()[0] + self.some_dict = list(self.data.values())[0] self.dict_list = [dict(zip(self.columns, row)) for row in self.frame.values] self.data2 = dict( ((i, dict(((j, float(j)) for j in range(100)))) for i in - xrange(2000))) + range(2000))) def time_frame_ctor_list_of_dict(self): DataFrame(self.dict_list) diff --git a/asv_bench/benchmarks/frame_methods.py b/asv_bench/benchmarks/frame_methods.py index 9f491302a4d6f..af72ca1e9a6ab 100644 --- a/asv_bench/benchmarks/frame_methods.py +++ b/asv_bench/benchmarks/frame_methods.py @@ -56,7 +56,7 @@ def time_reindex_both_axes_ix(self): self.df.ix[(self.idx, self.idx)] def time_reindex_upcast(self): - self.df2.reindex(permutation(range(1200))) + self.df2.reindex(np.random.permutation(range(1200))) #---------------------------------------------------------------------- @@ -583,7 +583,7 @@ class frame_assign_timeseries_index(object): goal_time = 0.2 def setup(self): - self.idx = date_range('1/1/2000', periods=100000, freq='D') + self.idx = date_range('1/1/2000', periods=100000, freq='H') self.df = DataFrame(randn(100000, 1), columns=['A'], index=self.idx) def time_frame_assign_timeseries_index(self): diff --git a/asv_bench/benchmarks/gil.py b/asv_bench/benchmarks/gil.py index 1c5e59672cb57..78a94976e732d 100644 --- a/asv_bench/benchmarks/gil.py +++ b/asv_bench/benchmarks/gil.py @@ -1,11 +1,17 @@ from .pandas_vb_common import * -from pandas.core import common as com + +from pandas.core.algorithms import take_1d try: from cStringIO import StringIO except ImportError: from io import StringIO +try: + from pandas._libs import algos +except ImportError: + from pandas import algos + try: from pandas.util.testing import test_parallel @@ -167,11 +173,11 @@ def time_nogil_take1d_float64(self): 
@test_parallel(num_threads=2) def take_1d_pg2_int64(self): - com.take_1d(self.df.int64.values, self.indexer) + take_1d(self.df.int64.values, self.indexer) @test_parallel(num_threads=2) def take_1d_pg2_float64(self): - com.take_1d(self.df.float64.values, self.indexer) + take_1d(self.df.float64.values, self.indexer) class nogil_take1d_int64(object): @@ -193,11 +199,11 @@ def time_nogil_take1d_int64(self): @test_parallel(num_threads=2) def take_1d_pg2_int64(self): - com.take_1d(self.df.int64.values, self.indexer) + take_1d(self.df.int64.values, self.indexer) @test_parallel(num_threads=2) def take_1d_pg2_float64(self): - com.take_1d(self.df.float64.values, self.indexer) + take_1d(self.df.float64.values, self.indexer) class nogil_kth_smallest(object): @@ -226,7 +232,7 @@ class nogil_datetime_fields(object): def setup(self): self.N = 100000000 - self.dti = pd.date_range('1900-01-01', periods=self.N, freq='D') + self.dti = pd.date_range('1900-01-01', periods=self.N, freq='T') self.period = self.dti.to_period('D') if (not have_real_test_parallel): raise NotImplementedError diff --git a/asv_bench/benchmarks/groupby.py b/asv_bench/benchmarks/groupby.py index 03ff62568b405..13b5cd2b06032 100644 --- a/asv_bench/benchmarks/groupby.py +++ b/asv_bench/benchmarks/groupby.py @@ -108,16 +108,34 @@ def setup(self): self.N = 10000 self.labels = np.random.randint(0, 2000, size=self.N) self.labels2 = np.random.randint(0, 3, size=self.N) - self.df = DataFrame({'key': self.labels, 'key2': self.labels2, 'value1': randn(self.N), 'value2': (['foo', 'bar', 'baz', 'qux'] * (self.N / 4)), }) - - def f(self, g): + self.df = DataFrame({ + 'key': self.labels, + 'key2': self.labels2, + 'value1': np.random.randn(self.N), + 'value2': (['foo', 'bar', 'baz', 'qux'] * (self.N // 4)), + }) + + @staticmethod + def scalar_function(g): return 1 - def time_groupby_frame_apply(self): - self.df.groupby(['key', 'key2']).apply(self.f) + def time_groupby_frame_apply_scalar_function(self): + self.df.groupby(['key', 'key2']).apply(self.scalar_function) + + def time_groupby_frame_apply_scalar_function_overhead(self): + self.df.groupby('key').apply(self.scalar_function) + + @staticmethod + def df_copy_function(g): + # ensure that the group name is available (see GH #15062) + g.name + return g.copy() - def time_groupby_frame_apply_overhead(self): - self.df.groupby('key').apply(self.f) + def time_groupby_frame_df_copy_function(self): + self.df.groupby(['key', 'key2']).apply(self.df_copy_function) + + def time_groupby_frame_apply_df_copy_overhead(self): + self.df.groupby('key').apply(self.df_copy_function) #---------------------------------------------------------------------- @@ -313,7 +331,7 @@ def setup(self): def get_test_data(self, ngroups=100, n=100000): self.unique_groups = range(self.ngroups) - self.arr = np.asarray(np.tile(self.unique_groups, (n / self.ngroups)), dtype=object) + self.arr = np.asarray(np.tile(self.unique_groups, int(n / self.ngroups)), dtype=object) if (len(self.arr) < n): self.arr = np.asarray((list(self.arr) + self.unique_groups[:(n - len(self.arr))]), dtype=object) random.shuffle(self.arr) @@ -350,6 +368,11 @@ def setup(self): self.dates = (np.datetime64('now') + self.offsets) self.df = DataFrame({'key1': np.random.randint(0, 500, size=self.n), 'key2': np.random.randint(0, 100, size=self.n), 'value1': np.random.randn(self.n), 'value2': np.random.randn(self.n), 'value3': np.random.randn(self.n), 'dates': self.dates, }) + N = 1000000 + self.draws = pd.Series(np.random.randn(N)) + labels = pd.Series(['foo', 'bar', 
'baz', 'qux'] * (N // 4)) + self.cats = labels.astype('category') + def time_groupby_multi_size(self): self.df.groupby(['key1', 'key2']).size() @@ -359,6 +382,10 @@ def time_groupby_dt_size(self): def time_groupby_dt_timegrouper_size(self): self.df.groupby(TimeGrouper(key='dates', freq='M')).size() + def time_groupby_size(self): + self.draws.groupby(self.cats).size() + + #---------------------------------------------------------------------- # groupby with a variable value for ngroups @@ -492,6 +519,43 @@ def time_groupby_sum(self): self.df.groupby(['a'])['b'].sum() +class groupby_categorical(object): + goal_time = 0.2 + + def setup(self): + N = 100000 + arr = np.random.random(N) + + self.df = DataFrame(dict( + a=Categorical(np.random.randint(10000, size=N)), + b=arr)) + self.df_ordered = DataFrame(dict( + a=Categorical(np.random.randint(10000, size=N), ordered=True), + b=arr)) + self.df_extra_cat = DataFrame(dict( + a=Categorical(np.random.randint(100, size=N), + categories=np.arange(10000)), + b=arr)) + + def time_groupby_sort(self): + self.df.groupby('a')['b'].count() + + def time_groupby_nosort(self): + self.df.groupby('a', sort=False)['b'].count() + + def time_groupby_ordered_sort(self): + self.df_ordered.groupby('a')['b'].count() + + def time_groupby_ordered_nosort(self): + self.df_ordered.groupby('a', sort=False)['b'].count() + + def time_groupby_extra_cat_sort(self): + self.df_extra_cat.groupby('a')['b'].count() + + def time_groupby_extra_cat_nosort(self): + self.df_extra_cat.groupby('a', sort=False)['b'].count() + + class groupby_period(object): # GH 14338 goal_time = 0.2 diff --git a/asv_bench/benchmarks/hdfstore_bench.py b/asv_bench/benchmarks/hdfstore_bench.py index 78de5267a2969..7d490180e8af6 100644 --- a/asv_bench/benchmarks/hdfstore_bench.py +++ b/asv_bench/benchmarks/hdfstore_bench.py @@ -31,16 +31,12 @@ def setup(self): self.remove(self.f) self.store = HDFStore(self.f) - self.store.put('df1', self.df) - self.store.put('df_mixed', self.df_mixed) - - self.store.append('df5', self.df_mixed) - self.store.append('df7', self.df) - - self.store.append('df9', self.df_wide) - - self.store.append('df11', self.df_wide2) - self.store.append('df12', self.df2) + self.store.put('fixed', self.df) + self.store.put('fixed_mixed', self.df_mixed) + self.store.append('table', self.df2) + self.store.append('table_mixed', self.df_mixed) + self.store.append('table_wide', self.df_wide) + self.store.append('table_wide2', self.df_wide2) def teardown(self): self.store.close() @@ -52,45 +48,56 @@ def remove(self, f): pass def time_read_store(self): - self.store.get('df1') + self.store.get('fixed') def time_read_store_mixed(self): - self.store.get('df_mixed') + self.store.get('fixed_mixed') def time_write_store(self): - self.store.put('df2', self.df) + self.store.put('fixed_write', self.df) def time_write_store_mixed(self): - self.store.put('df_mixed2', self.df_mixed) + self.store.put('fixed_mixed_write', self.df_mixed) def time_read_store_table_mixed(self): - self.store.select('df5') + self.store.select('table_mixed') def time_write_store_table_mixed(self): - self.store.append('df6', self.df_mixed) + self.store.append('table_mixed_write', self.df_mixed) def time_read_store_table(self): - self.store.select('df7') + self.store.select('table') def time_write_store_table(self): - self.store.append('df8', self.df) + self.store.append('table_write', self.df) def time_read_store_table_wide(self): - self.store.select('df9') + self.store.select('table_wide') def time_write_store_table_wide(self): - 
self.store.append('df10', self.df_wide) + self.store.append('table_wide_write', self.df_wide) def time_write_store_table_dc(self): - self.store.append('df15', self.df, data_columns=True) + self.store.append('table_dc_write', self.df_dc, data_columns=True) def time_query_store_table_wide(self): - self.store.select('df11', [('index', '>', self.df_wide2.index[10000]), - ('index', '<', self.df_wide2.index[15000])]) + start = self.df_wide2.index[10000] + stop = self.df_wide2.index[15000] + self.store.select('table_wide', where="index > start and index < stop") def time_query_store_table(self): - self.store.select('df12', [('index', '>', self.df2.index[10000]), - ('index', '<', self.df2.index[15000])]) + start = self.df2.index[10000] + stop = self.df2.index[15000] + self.store.select('table', where="index > start and index < stop") + + def time_store_repr(self): + repr(self.store) + + def time_store_str(self): + str(self.store) + + def time_store_info(self): + self.store.info() class HDF5Panel(object): diff --git a/asv_bench/benchmarks/indexing.py b/asv_bench/benchmarks/indexing.py index 27cd320c661e0..d941ef20dc7ac 100644 --- a/asv_bench/benchmarks/indexing.py +++ b/asv_bench/benchmarks/indexing.py @@ -1,8 +1,4 @@ from .pandas_vb_common import * -try: - import pandas.computation.expressions as expr -except: - expr = None class Int64Indexing(object): @@ -23,6 +19,9 @@ def time_getitem_list_like(self): def time_getitem_array(self): self.s[np.arange(10000)] + def time_getitem_lists(self): + self.s[np.arange(10000).tolist()] + def time_iloc_array(self): self.s.iloc[np.arange(10000)] @@ -88,7 +87,7 @@ def setup(self): def time_getitem_scalar(self): self.ts[self.dt] - + class DataFrameIndexing(object): goal_time = 0.2 @@ -189,6 +188,27 @@ def setup(self): self.eps_C = 5 self.eps_D = 5000 self.mdt2 = self.mdt.set_index(['A', 'B', 'C', 'D']).sortlevel() + self.miint = MultiIndex.from_product( + [np.arange(1000), + np.arange(1000)], names=['one', 'two']) + + import string + + self.mi_large = MultiIndex.from_product( + [np.arange(1000), np.arange(20), list(string.ascii_letters)], + names=['one', 'two', 'three']) + self.mi_med = MultiIndex.from_product( + [np.arange(1000), np.arange(10), list('A')], + names=['one', 'two', 'three']) + self.mi_small = MultiIndex.from_product( + [np.arange(100), list('A'), list('A')], + names=['one', 'two', 'three']) + + rng = np.random.RandomState(4) + size = 1 << 16 + self.mi_unused_levels = pd.MultiIndex.from_arrays([ + rng.randint(0, 1 << 13, size), + rng.randint(0, 1 << 10, size)])[rng.rand(size) < 0.1] def time_series_xs_mi_ix(self): self.s.ix[999] @@ -197,7 +217,65 @@ def time_frame_xs_mi_ix(self): self.df.ix[999] def time_multiindex_slicers(self): - self.mdt2.loc[self.idx[(self.test_A - self.eps_A):(self.test_A + self.eps_A), (self.test_B - self.eps_B):(self.test_B + self.eps_B), (self.test_C - self.eps_C):(self.test_C + self.eps_C), (self.test_D - self.eps_D):(self.test_D + self.eps_D)], :] + self.mdt2.loc[self.idx[ + (self.test_A - self.eps_A):(self.test_A + self.eps_A), + (self.test_B - self.eps_B):(self.test_B + self.eps_B), + (self.test_C - self.eps_C):(self.test_C + self.eps_C), + (self.test_D - self.eps_D):(self.test_D + self.eps_D)], :] + + def time_multiindex_get_indexer(self): + self.miint.get_indexer( + np.array([(0, 10), (0, 11), (0, 12), + (0, 13), (0, 14), (0, 15), + (0, 16), (0, 17), (0, 18), + (0, 19)], dtype=object)) + + def time_multiindex_large_get_loc(self): + self.mi_large.get_loc((999, 19, 'Z')) + + def time_multiindex_large_get_loc_warm(self): 
+ for _ in range(1000): + self.mi_large.get_loc((999, 19, 'Z')) + + def time_multiindex_med_get_loc(self): + self.mi_med.get_loc((999, 9, 'A')) + + def time_multiindex_med_get_loc_warm(self): + for _ in range(1000): + self.mi_med.get_loc((999, 9, 'A')) + + def time_multiindex_string_get_loc(self): + self.mi_small.get_loc((99, 'A', 'A')) + + def time_multiindex_small_get_loc_warm(self): + for _ in range(1000): + self.mi_small.get_loc((99, 'A', 'A')) + + def time_is_monotonic(self): + self.miint.is_monotonic + + def time_remove_unused_levels(self): + self.mi_unused_levels.remove_unused_levels() + + +class IntervalIndexing(object): + goal_time = 0.2 + + def setup(self): + self.monotonic = Series(np.arange(1000000), + index=IntervalIndex.from_breaks(np.arange(1000001))) + + def time_getitem_scalar(self): + self.monotonic[80000] + + def time_loc_scalar(self): + self.monotonic.loc[80000] + + def time_getitem_list(self): + self.monotonic[80000:] + + def time_loc_list(self): + self.monotonic.loc[80000:] class PanelIndexing(object): diff --git a/asv_bench/benchmarks/inference.py b/asv_bench/benchmarks/inference.py index 3635438a7f76b..dc1d6de73f8ae 100644 --- a/asv_bench/benchmarks/inference.py +++ b/asv_bench/benchmarks/inference.py @@ -113,5 +113,5 @@ def setup(self): self.na_values = set() def time_convert(self): - pd.lib.maybe_convert_numeric(self.data, self.na_values, - coerce_numeric=False) + lib.maybe_convert_numeric(self.data, self.na_values, + coerce_numeric=False) diff --git a/asv_bench/benchmarks/join_merge.py b/asv_bench/benchmarks/join_merge.py index d9c631fa92efd..3b0e33b72ddc1 100644 --- a/asv_bench/benchmarks/join_merge.py +++ b/asv_bench/benchmarks/join_merge.py @@ -6,7 +6,7 @@ from pandas import ordered_merge as merge_ordered -#---------------------------------------------------------------------- +# ---------------------------------------------------------------------- # Append class Append(object): @@ -35,7 +35,7 @@ def time_append_mixed(self): self.mdf1.append(self.mdf2) -#---------------------------------------------------------------------- +# ---------------------------------------------------------------------- # Concat class Concat(object): @@ -120,7 +120,7 @@ def time_f_ordered_axis1(self): concat(self.frames_f, axis=1, ignore_index=True) -#---------------------------------------------------------------------- +# ---------------------------------------------------------------------- # Joins class Join(object): @@ -202,7 +202,7 @@ def time_join_non_unique_equal(self): (self.fracofday * self.temp[self.fracofday.index]) -#---------------------------------------------------------------------- +# ---------------------------------------------------------------------- # Merges class Merge(object): @@ -257,7 +257,31 @@ def time_i8merge(self): merge(self.left, self.right, how='outer') -#---------------------------------------------------------------------- +class MergeCategoricals(object): + goal_time = 0.2 + + def setup(self): + self.left_object = pd.DataFrame( + {'X': np.random.choice(range(0, 10), size=(10000,)), + 'Y': np.random.choice(['one', 'two', 'three'], size=(10000,))}) + + self.right_object = pd.DataFrame( + {'X': np.random.choice(range(0, 10), size=(10000,)), + 'Z': np.random.choice(['jjj', 'kkk', 'sss'], size=(10000,))}) + + self.left_cat = self.left_object.assign( + Y=self.left_object['Y'].astype('category')) + self.right_cat = self.right_object.assign( + Z=self.right_object['Z'].astype('category')) + + def time_merge_object(self): + merge(self.left_object, 
self.right_object, on='X') + + def time_merge_cat(self): + merge(self.left_cat, self.right_cat, on='X') + + +# ---------------------------------------------------------------------- # Ordered merge class MergeOrdered(object): @@ -290,12 +314,12 @@ def setup(self): self.df1 = pd.DataFrame( {'time': np.random.randint(0, one_count / 20, one_count), - 'key': np.random.choice(list(string.uppercase), one_count), + 'key': np.random.choice(list(string.ascii_uppercase), one_count), 'key2': np.random.randint(0, 25, one_count), 'value1': np.random.randn(one_count)}) self.df2 = pd.DataFrame( {'time': np.random.randint(0, two_count / 20, two_count), - 'key': np.random.choice(list(string.uppercase), two_count), + 'key': np.random.choice(list(string.ascii_uppercase), two_count), 'key2': np.random.randint(0, 25, two_count), 'value2': np.random.randn(two_count)}) @@ -332,7 +356,7 @@ def time_multiby(self): merge_asof(self.df1e, self.df2e, on='time', by=['key', 'key2']) -#---------------------------------------------------------------------- +# ---------------------------------------------------------------------- # data alignment class Align(object): diff --git a/asv_bench/benchmarks/packers.py b/asv_bench/benchmarks/packers.py index cd43e305ead8f..24f80cc836dd4 100644 --- a/asv_bench/benchmarks/packers.py +++ b/asv_bench/benchmarks/packers.py @@ -153,18 +153,20 @@ def time_packers_read_stata_with_validation(self): class packers_read_sas(_Packers): def setup(self): - self.f = os.path.join(os.path.dirname(__file__), '..', '..', - 'pandas', 'io', 'tests', 'sas', 'data', - 'test1.sas7bdat') - self.f2 = os.path.join(os.path.dirname(__file__), '..', '..', - 'pandas', 'io', 'tests', 'sas', 'data', - 'paxraw_d_short.xpt') + + testdir = os.path.join(os.path.dirname(__file__), '..', '..', + 'pandas', 'tests', 'io', 'sas') + if not os.path.exists(testdir): + testdir = os.path.join(os.path.dirname(__file__), '..', '..', + 'pandas', 'io', 'tests', 'sas') + self.f = os.path.join(testdir, 'data', 'test1.sas7bdat') + self.f2 = os.path.join(testdir, 'data', 'paxraw_d_short.xpt') def time_read_sas7bdat(self): pd.read_sas(self.f, format='sas7bdat') def time_read_xport(self): - pd.read_sas(self.f, format='xport') + pd.read_sas(self.f2, format='xport') class CSV(_Packers): diff --git a/asv_bench/benchmarks/pandas_vb_common.py b/asv_bench/benchmarks/pandas_vb_common.py index 25b0b5dd4d1b0..b1a58e49fe86c 100644 --- a/asv_bench/benchmarks/pandas_vb_common.py +++ b/asv_bench/benchmarks/pandas_vb_common.py @@ -1,23 +1,27 @@ from pandas import * import pandas as pd -from datetime import timedelta from numpy.random import randn from numpy.random import randint -from numpy.random import permutation import pandas.util.testing as tm import random import numpy as np import threading +from importlib import import_module + try: from pandas.compat import range except ImportError: pass np.random.seed(1234) -try: - import pandas._tseries as lib -except: - import pandas.lib as lib + +# try em until it works! 
+for imp in ['pandas._libs.lib', 'pandas.lib', 'pandas_tseries']: + try: + lib = import_module(imp) + break + except: + pass try: Panel = Panel diff --git a/asv_bench/benchmarks/panel_ctor.py b/asv_bench/benchmarks/panel_ctor.py index faedce6c574ec..cc6071b054662 100644 --- a/asv_bench/benchmarks/panel_ctor.py +++ b/asv_bench/benchmarks/panel_ctor.py @@ -1,4 +1,5 @@ from .pandas_vb_common import * +from datetime import timedelta class Constructors1(object): @@ -24,7 +25,7 @@ class Constructors2(object): def setup(self): self.data_frames = {} for x in range(100): - self.dr = np.asarray(DatetimeIndex(start=datetime(1990, 1, 1), end=datetime(2012, 1, 1), freq=datetools.Day(1))) + self.dr = np.asarray(DatetimeIndex(start=datetime(1990, 1, 1), end=datetime(2012, 1, 1), freq='D')) self.df = DataFrame({'a': ([0] * len(self.dr)), 'b': ([1] * len(self.dr)), 'c': ([2] * len(self.dr)), }, index=self.dr) self.data_frames[x] = self.df @@ -36,7 +37,7 @@ class Constructors3(object): goal_time = 0.2 def setup(self): - self.dr = np.asarray(DatetimeIndex(start=datetime(1990, 1, 1), end=datetime(2012, 1, 1), freq=datetools.Day(1))) + self.dr = np.asarray(DatetimeIndex(start=datetime(1990, 1, 1), end=datetime(2012, 1, 1), freq='D')) self.data_frames = {} for x in range(100): self.df = DataFrame({'a': ([0] * len(self.dr)), 'b': ([1] * len(self.dr)), 'c': ([2] * len(self.dr)), }, index=self.dr) diff --git a/asv_bench/benchmarks/panel_methods.py b/asv_bench/benchmarks/panel_methods.py index ebe278f6e68b5..6609305502011 100644 --- a/asv_bench/benchmarks/panel_methods.py +++ b/asv_bench/benchmarks/panel_methods.py @@ -21,4 +21,4 @@ def time_shift(self): self.panel.shift(1) def time_shift_minor(self): - self.panel.shift(1, axis='minor') \ No newline at end of file + self.panel.shift(1, axis='minor') diff --git a/asv_bench/benchmarks/plotting.py b/asv_bench/benchmarks/plotting.py index 757c3e27dd333..dda684b35e301 100644 --- a/asv_bench/benchmarks/plotting.py +++ b/asv_bench/benchmarks/plotting.py @@ -4,7 +4,10 @@ except ImportError: def date_range(start=None, end=None, periods=None, freq=None): return DatetimeIndex(start, end, periods=periods, offset=freq) -from pandas.tools.plotting import andrews_curves +try: + from pandas.plotting import andrews_curves +except ImportError: + from pandas.tools.plotting import andrews_curves class TimeseriesPlotting(object): diff --git a/asv_bench/benchmarks/reindex.py b/asv_bench/benchmarks/reindex.py index 8db0cd7629332..537d275e7c727 100644 --- a/asv_bench/benchmarks/reindex.py +++ b/asv_bench/benchmarks/reindex.py @@ -16,8 +16,8 @@ def setup(self): data=np.random.rand(10000, 30), columns=range(30)) # multi-index - N = 1000 - K = 20 + N = 5000 + K = 200 level1 = tm.makeStringIndex(N).values.repeat(K) level2 = np.tile(tm.makeStringIndex(K).values, N) index = MultiIndex.from_arrays([level1, level2]) @@ -132,6 +132,9 @@ def setup(self): self.K = 10000 self.key1 = np.random.randint(0, self.K, size=self.N) self.df_int = DataFrame({'key1': self.key1}) + self.df_bool = DataFrame({i: np.random.randint(0, 2, size=self.K, + dtype=bool) + for i in range(10)}) def time_frame_drop_dups(self): self.df.drop_duplicates(['key1', 'key2']) @@ -154,6 +157,8 @@ def time_series_drop_dups_string(self): def time_frame_drop_dups_int(self): self.df_int.drop_duplicates() + def time_frame_drop_dups_bool(self): + self.df_bool.drop_duplicates() #---------------------------------------------------------------------- # blog "pandas escaped the zoo" diff --git a/asv_bench/benchmarks/replace.py 
b/asv_bench/benchmarks/replace.py index 66b8af53801ac..63562f90eab2b 100644 --- a/asv_bench/benchmarks/replace.py +++ b/asv_bench/benchmarks/replace.py @@ -1,6 +1,4 @@ from .pandas_vb_common import * -from pandas.compat import range -from datetime import timedelta class replace_fillna(object): diff --git a/asv_bench/benchmarks/reshape.py b/asv_bench/benchmarks/reshape.py index a3ecfff52c794..177e3e7cb87fa 100644 --- a/asv_bench/benchmarks/reshape.py +++ b/asv_bench/benchmarks/reshape.py @@ -1,5 +1,5 @@ from .pandas_vb_common import * -from pandas.core.reshape import melt, wide_to_long +from pandas import melt, wide_to_long class melt_dataframe(object): @@ -59,6 +59,27 @@ def time_reshape_unstack_simple(self): self.df.unstack(1) + +class reshape_unstack_large_single_dtype(object): + goal_time = 0.2 + + def setup(self): + m = 100 + n = 1000 + + levels = np.arange(m) + index = pd.MultiIndex.from_product([levels]*2) + columns = np.arange(n) + values = np.arange(m*m*n).reshape(m*m, n) + self.df = pd.DataFrame(values, index, columns) + self.df2 = self.df.iloc[:-1] + + def time_unstack_full_product(self): + self.df.unstack() + + def time_unstack_with_mask(self): + self.df2.unstack() + + class unstack_sparse_keyspace(object): goal_time = 0.2
diff --git a/asv_bench/benchmarks/rolling.py b/asv_bench/benchmarks/rolling.py new file mode 100644 index 0000000000000..899349cd21f84 --- /dev/null +++ b/asv_bench/benchmarks/rolling.py @@ -0,0 +1,185 @@ +from .pandas_vb_common import * +import pandas as pd +import numpy as np + + +class DataframeRolling(object): + goal_time = 0.2 + + def setup(self): + self.N = 100000 + self.Ns = 10000 + self.df = pd.DataFrame({'a': np.random.random(self.N)}) + self.dfs = pd.DataFrame({'a': np.random.random(self.Ns)}) + self.wins = 10 + self.winl = 1000 + + def time_rolling_quantile_0(self): + (self.df.rolling(self.wins).quantile(0.0)) + + def time_rolling_quantile_1(self): + (self.df.rolling(self.wins).quantile(1.0)) + + def time_rolling_quantile_median(self): + (self.df.rolling(self.wins).quantile(0.5)) + + def time_rolling_median(self): + (self.df.rolling(self.wins).median()) + + def time_rolling_mean(self): + (self.df.rolling(self.wins).mean()) + + def time_rolling_max(self): + (self.df.rolling(self.wins).max()) + + def time_rolling_min(self): + (self.df.rolling(self.wins).min()) + + def time_rolling_std(self): + (self.df.rolling(self.wins).std()) + + def time_rolling_count(self): + (self.df.rolling(self.wins).count()) + + def time_rolling_skew(self): + (self.df.rolling(self.wins).skew()) + + def time_rolling_kurt(self): + (self.df.rolling(self.wins).kurt()) + + def time_rolling_sum(self): + (self.df.rolling(self.wins).sum()) + + def time_rolling_corr(self): + (self.dfs.rolling(self.wins).corr()) + + def time_rolling_cov(self): + (self.dfs.rolling(self.wins).cov()) + + def time_rolling_quantile_0_l(self): + (self.df.rolling(self.winl).quantile(0.0)) + + def time_rolling_quantile_1_l(self): + (self.df.rolling(self.winl).quantile(1.0)) + + def time_rolling_quantile_median_l(self): + (self.df.rolling(self.winl).quantile(0.5)) + + def time_rolling_median_l(self): + (self.df.rolling(self.winl).median()) + + def time_rolling_mean_l(self): + (self.df.rolling(self.winl).mean()) + + def time_rolling_max_l(self): + (self.df.rolling(self.winl).max()) + + def time_rolling_min_l(self): + (self.df.rolling(self.winl).min()) + + def time_rolling_std_l(self): + (self.df.rolling(self.winl).std()) + + def time_rolling_count_l(self): + (self.df.rolling(self.winl).count()) + + def time_rolling_skew_l(self): + (self.df.rolling(self.winl).skew()) + + def time_rolling_kurt_l(self): + (self.df.rolling(self.winl).kurt()) + + def time_rolling_sum_l(self): + (self.df.rolling(self.winl).sum()) + + +class SeriesRolling(object): + goal_time = 0.2 + + def setup(self): + self.N = 100000 + self.Ns = 10000 + self.df = pd.DataFrame({'a': np.random.random(self.N)}) + self.dfs = pd.DataFrame({'a': np.random.random(self.Ns)}) + self.sr = self.df.a + self.srs = self.dfs.a + self.wins = 10 + self.winl = 1000 + + def time_rolling_quantile_0(self): + (self.sr.rolling(self.wins).quantile(0.0)) + + def time_rolling_quantile_1(self): + (self.sr.rolling(self.wins).quantile(1.0)) + + def time_rolling_quantile_median(self): + (self.sr.rolling(self.wins).quantile(0.5)) + + def time_rolling_median(self): + (self.sr.rolling(self.wins).median()) + + def time_rolling_mean(self): + (self.sr.rolling(self.wins).mean()) + + def time_rolling_max(self): + (self.sr.rolling(self.wins).max()) + + def time_rolling_min(self): + (self.sr.rolling(self.wins).min()) + + def time_rolling_std(self): + (self.sr.rolling(self.wins).std()) + + def time_rolling_count(self): + (self.sr.rolling(self.wins).count()) + + def time_rolling_skew(self): + (self.sr.rolling(self.wins).skew()) + + def time_rolling_kurt(self): + (self.sr.rolling(self.wins).kurt()) + + def time_rolling_sum(self): + (self.sr.rolling(self.wins).sum()) + + def time_rolling_corr(self): + (self.srs.rolling(self.wins).corr()) + + def time_rolling_cov(self): + (self.srs.rolling(self.wins).cov()) + + def time_rolling_quantile_0_l(self): + (self.sr.rolling(self.winl).quantile(0.0)) + + def time_rolling_quantile_1_l(self): + (self.sr.rolling(self.winl).quantile(1.0)) + + def time_rolling_quantile_median_l(self): + (self.sr.rolling(self.winl).quantile(0.5)) + + def time_rolling_median_l(self): + (self.sr.rolling(self.winl).median()) + + def time_rolling_mean_l(self): + (self.sr.rolling(self.winl).mean()) + + def time_rolling_max_l(self): + (self.sr.rolling(self.winl).max()) + + def time_rolling_min_l(self): + (self.sr.rolling(self.winl).min()) + + def time_rolling_std_l(self): + (self.sr.rolling(self.winl).std()) + + def time_rolling_count_l(self): + (self.sr.rolling(self.winl).count()) + + def time_rolling_skew_l(self): + (self.sr.rolling(self.winl).skew()) + + def time_rolling_kurt_l(self): + (self.sr.rolling(self.winl).kurt()) + + def time_rolling_sum_l(self): + (self.sr.rolling(self.winl).sum())
diff --git a/asv_bench/benchmarks/series_methods.py b/asv_bench/benchmarks/series_methods.py index 413c4e044fd3a..3c0e2869357ae 100644 --- a/asv_bench/benchmarks/series_methods.py +++ b/asv_bench/benchmarks/series_methods.py @@ -68,8 +68,8 @@ def setup(self): self.s4 = self.s3.astype('object') def time_series_nlargest1(self): - self.s1.nlargest(3, take_last=True) - self.s1.nlargest(3, take_last=False) + self.s1.nlargest(3, keep='last') + self.s1.nlargest(3, keep='first') class series_nlargest2(object): @@ -83,8 +83,8 @@ def setup(self): self.s4 = self.s3.astype('object') def time_series_nlargest2(self): - self.s2.nlargest(3, take_last=True) - self.s2.nlargest(3, take_last=False) + self.s2.nlargest(3, keep='last') + self.s2.nlargest(3, keep='first') class series_nsmallest2(object): @@ -98,8 +98,8 @@ def setup(self): self.s4 = self.s3.astype('object') def time_series_nsmallest2(self): - self.s2.nsmallest(3, take_last=True) - self.s2.nsmallest(3, take_last=False) + self.s2.nsmallest(3, keep='last') + self.s2.nsmallest(3, keep='first') class series_dropna_int64(object):
@@ -111,6 +111,7 @@ def setup(self): def time_series_dropna_int64(self): self.s.dropna() + class series_dropna_datetime(object): goal_time = 0.2 @@ -120,3 +121,13 @@ def setup(self): def time_series_dropna_datetime(self): self.s.dropna() + + +class series_clip(object): + goal_time = 0.2 + + def setup(self): + self.s = pd.Series(np.random.randn(50)) + + def time_series_clip(self): + self.s.clip(0, 1)
diff --git a/asv_bench/benchmarks/sparse.py b/asv_bench/benchmarks/sparse.py index 717fe7218ceda..7259e8cdb7d61 100644 --- a/asv_bench/benchmarks/sparse.py +++ b/asv_bench/benchmarks/sparse.py @@ -1,8 +1,8 @@ +from itertools import repeat + from .pandas_vb_common import * -import pandas.sparse.series import scipy.sparse -from pandas.core.sparse import SparseSeries, SparseDataFrame -from pandas.core.sparse import SparseDataFrame +from pandas import SparseSeries, SparseDataFrame class sparse_series_to_frame(object): @@ -29,6 +29,12 @@ class sparse_frame_constructor(object): def time_sparse_frame_constructor(self): SparseDataFrame(columns=np.arange(100), index=np.arange(1000)) + def time_sparse_from_scipy(self): + SparseDataFrame(scipy.sparse.rand(1000, 1000, 0.005)) + + def time_sparse_from_dict(self): + SparseDataFrame(dict(zip(range(1000), repeat([0])))) + class sparse_series_from_coo(object): goal_time = 0.2 @@ -37,7 +43,7 @@ def setup(self): self.A = scipy.sparse.coo_matrix(([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(100, 100)) def time_sparse_series_from_coo(self): - self.ss = pandas.sparse.series.SparseSeries.from_coo(self.A) + self.ss = SparseSeries.from_coo(self.A) class sparse_series_to_coo(object):
diff --git a/asv_bench/benchmarks/stat_ops.py b/asv_bench/benchmarks/stat_ops.py index 12fbb2478c2a5..1e1eb167b46bf 100644 --- a/asv_bench/benchmarks/stat_ops.py +++ b/asv_bench/benchmarks/stat_ops.py @@ -1,92 +1,36 @@ from .pandas_vb_common import * -class stat_ops_frame_mean_float_axis_0(object): - goal_time = 0.2 - - def setup(self): - self.df = DataFrame(np.random.randn(100000, 4)) - self.dfi = DataFrame(np.random.randint(1000, size=self.df.shape)) - - def time_stat_ops_frame_mean_float_axis_0(self): - self.df.mean() - - -class stat_ops_frame_mean_float_axis_1(object): - goal_time = 0.2 - - def setup(self): - self.df = DataFrame(np.random.randn(100000, 4)) - self.dfi = DataFrame(np.random.randint(1000, size=self.df.shape)) - - def time_stat_ops_frame_mean_float_axis_1(self): - self.df.mean(1) - - -class stat_ops_frame_mean_int_axis_0(object): - goal_time = 0.2 - - def setup(self): - self.df = DataFrame(np.random.randn(100000, 4)) - self.dfi = DataFrame(np.random.randint(1000, size=self.df.shape)) - - def time_stat_ops_frame_mean_int_axis_0(self): - self.dfi.mean() - - -class stat_ops_frame_mean_int_axis_1(object): - goal_time = 0.2 +def _set_use_bottleneck_False(): + try: + pd.options.compute.use_bottleneck = False + except: + from pandas.core import nanops + nanops._USE_BOTTLENECK = False - def setup(self): - self.df = DataFrame(np.random.randn(100000, 4)) - self.dfi = DataFrame(np.random.randint(1000, size=self.df.shape)) - - def time_stat_ops_frame_mean_int_axis_1(self): - self.dfi.mean(1) - - -class stat_ops_frame_sum_float_axis_0(object): - goal_time = 0.2 - def setup(self): - self.df = DataFrame(np.random.randn(100000, 4)) - self.dfi = DataFrame(np.random.randint(1000, size=self.df.shape)) - - def time_stat_ops_frame_sum_float_axis_0(self): - self.df.sum() - - -class stat_ops_frame_sum_float_axis_1(object): +class FrameOps(object): goal_time = 0.2 - def setup(self):
- self.df = DataFrame(np.random.randn(100000, 4)) - self.dfi = DataFrame(np.random.randint(1000, size=self.df.shape)) + param_names = ['op', 'use_bottleneck', 'dtype', 'axis'] + params = [['mean', 'sum', 'median'], + [True, False], + ['float', 'int'], + [0, 1]] - def time_stat_ops_frame_sum_float_axis_1(self): - self.df.sum(1) + def setup(self, op, use_bottleneck, dtype, axis): + if dtype == 'float': + self.df = DataFrame(np.random.randn(100000, 4)) + elif dtype == 'int': + self.df = DataFrame(np.random.randint(1000, size=(100000, 4))) + if not use_bottleneck: + _set_use_bottleneck_False() -class stat_ops_frame_sum_int_axis_0(object): - goal_time = 0.2 - - def setup(self): - self.df = DataFrame(np.random.randn(100000, 4)) - self.dfi = DataFrame(np.random.randint(1000, size=self.df.shape)) - - def time_stat_ops_frame_sum_int_axis_0(self): - self.dfi.sum() - - -class stat_ops_frame_sum_int_axis_1(object): - goal_time = 0.2 - - def setup(self): - self.df = DataFrame(np.random.randn(100000, 4)) - self.dfi = DataFrame(np.random.randint(1000, size=self.df.shape)) + self.func = getattr(self.df, op) - def time_stat_ops_frame_sum_int_axis_1(self): - self.dfi.sum(1) + def time_op(self, op, use_bottleneck, dtype, axis): + self.func(axis=axis) class stat_ops_level_frame_sum(object): diff --git a/asv_bench/benchmarks/timeseries.py b/asv_bench/benchmarks/timeseries.py index 6e9ef4b10273c..b7151ad2eaa99 100644 --- a/asv_bench/benchmarks/timeseries.py +++ b/asv_bench/benchmarks/timeseries.py @@ -1,7 +1,9 @@ -from pandas.tseries.converter import DatetimeConverter +try: + from pandas.plotting._converter import DatetimeConverter +except ImportError: + from pandas.tseries.converter import DatetimeConverter from .pandas_vb_common import * import pandas as pd -from datetime import timedelta import datetime as dt try: import pandas.tseries.holiday @@ -51,10 +53,14 @@ def setup(self): self.rng6 = date_range(start='1/1/1', periods=self.N, freq='B') self.rng7 = date_range(start='1/1/1700', freq='D', periods=100000) - self.a = self.rng7[:50000].append(self.rng7[50002:]) + self.no_freq = self.rng7[:50000].append(self.rng7[50002:]) + self.d_freq = self.rng7[:50000].append(self.rng7[50000:]) + + self.rng8 = date_range(start='1/1/1700', freq='B', periods=100000) + self.b_freq = self.rng8[:50000].append(self.rng8[50000:]) def time_add_timedelta(self): - (self.rng + timedelta(minutes=2)) + (self.rng + dt.timedelta(minutes=2)) def time_add_offset_delta(self): (self.rng + self.delta_offset) @@ -92,8 +98,14 @@ def time_infer_dst(self): def time_timeseries_is_month_start(self): self.rng6.is_month_start - def time_infer_freq(self): - infer_freq(self.a) + def time_infer_freq_none(self): + infer_freq(self.no_freq) + + def time_infer_freq_daily(self): + infer_freq(self.d_freq) + + def time_infer_freq_business(self): + infer_freq(self.b_freq) class TimeDatetimeConverter(object): @@ -292,7 +304,10 @@ def setup(self): self.rng3 = date_range(start='1/1/2000', periods=1500000, freq='S') self.ts3 = Series(1, index=self.rng3) - def time_sort_index(self): + def time_sort_index_monotonic(self): + self.ts2.sort_index() + + def time_sort_index_non_monotonic(self): self.ts.sort_index() def time_timeseries_slice_minutely(self): @@ -495,3 +510,17 @@ def time_begin_incr_rng(self): def time_begin_decr_rng(self): self.rng - self.semi_month_begin + + +class DatetimeAccessor(object): + def setup(self): + self.N = 100000 + self.series = pd.Series( + pd.date_range(start='1/1/2000', periods=self.N, freq='T') + ) + + def time_dt_accessor(self): + 
self.series.dt + + def time_dt_accessor_normalize(self): + self.series.dt.normalize() diff --git a/asv_bench/vbench_to_asv.py b/asv_bench/vbench_to_asv.py index c3041ec2b1ba1..2a4ce5d183ea2 100644 --- a/asv_bench/vbench_to_asv.py +++ b/asv_bench/vbench_to_asv.py @@ -114,7 +114,7 @@ def translate_module(target_module): l_vars = {} exec('import ' + target_module) in g_vars - print target_module + print(target_module) module = eval(target_module, g_vars) benchmarks = [] @@ -157,7 +157,7 @@ def translate_module(target_module): mod = os.path.basename(module) if mod in ['make.py', 'measure_memory_consumption.py', 'perf_HEAD.py', 'run_suite.py', 'test_perf.py', 'generate_rst_files.py', 'test.py', 'suite.py']: continue - print - print mod + print('') + print(mod) translate_module(mod.replace('.py', '')) diff --git a/bench/alignment.py b/bench/alignment.py deleted file mode 100644 index bc3134f597ee0..0000000000000 --- a/bench/alignment.py +++ /dev/null @@ -1,22 +0,0 @@ -# Setup -from pandas.compat import range, lrange -import numpy as np -import pandas -import la -N = 1000 -K = 50 -arr1 = np.random.randn(N, K) -arr2 = np.random.randn(N, K) -idx1 = lrange(N) -idx2 = lrange(K) - -# pandas -dma1 = pandas.DataFrame(arr1, idx1, idx2) -dma2 = pandas.DataFrame(arr2, idx1[::-1], idx2[::-1]) - -# larry -lar1 = la.larry(arr1, [idx1, idx2]) -lar2 = la.larry(arr2, [idx1[::-1], idx2[::-1]]) - -for i in range(100): - result = lar1 + lar2 diff --git a/bench/bench_dense_to_sparse.py b/bench/bench_dense_to_sparse.py deleted file mode 100644 index e1dcd3456e88d..0000000000000 --- a/bench/bench_dense_to_sparse.py +++ /dev/null @@ -1,14 +0,0 @@ -from pandas import * - -K = 100 -N = 100000 -rng = DatetimeIndex('1/1/2000', periods=N, offset=datetools.Minute()) - -rng2 = np.asarray(rng).astype('M8[us]').astype('i8') - -series = {} -for i in range(1, K + 1): - data = np.random.randn(N)[:-i] - this_rng = rng2[:-i] - data[100:] = np.nan - series[i] = SparseSeries(data, index=this_rng) diff --git a/bench/bench_get_put_value.py b/bench/bench_get_put_value.py deleted file mode 100644 index 427e0b1b10a22..0000000000000 --- a/bench/bench_get_put_value.py +++ /dev/null @@ -1,56 +0,0 @@ -from pandas import * -from pandas.util.testing import rands -from pandas.compat import range - -N = 1000 -K = 50 - - -def _random_index(howmany): - return Index([rands(10) for _ in range(howmany)]) - -df = DataFrame(np.random.randn(N, K), index=_random_index(N), - columns=_random_index(K)) - - -def get1(): - for col in df.columns: - for row in df.index: - _ = df[col][row] - - -def get2(): - for col in df.columns: - for row in df.index: - _ = df.get_value(row, col) - - -def put1(): - for col in df.columns: - for row in df.index: - df[col][row] = 0 - - -def put2(): - for col in df.columns: - for row in df.index: - df.set_value(row, col, 0) - - -def resize1(): - buf = DataFrame() - for col in df.columns: - for row in df.index: - buf = buf.set_value(row, col, 5.) - return buf - - -def resize2(): - from collections import defaultdict - - buf = defaultdict(dict) - for col in df.columns: - for row in df.index: - buf[col][row] = 5. 
- - return DataFrame(buf) diff --git a/bench/bench_groupby.py b/bench/bench_groupby.py deleted file mode 100644 index d7a2853e1e7b2..0000000000000 --- a/bench/bench_groupby.py +++ /dev/null @@ -1,66 +0,0 @@ -from pandas import * -from pandas.util.testing import rands -from pandas.compat import range - -import string -import random - -k = 20000 -n = 10 - -foo = np.tile(np.array([rands(10) for _ in range(k)], dtype='O'), n) -foo2 = list(foo) -random.shuffle(foo) -random.shuffle(foo2) - -df = DataFrame({'A': foo, - 'B': foo2, - 'C': np.random.randn(n * k)}) - -import pandas._sandbox as sbx - - -def f(): - table = sbx.StringHashTable(len(df)) - ret = table.factorize(df['A']) - return ret - - -def g(): - table = sbx.PyObjectHashTable(len(df)) - ret = table.factorize(df['A']) - return ret - -ret = f() - -""" -import pandas._tseries as lib - -f = np.std - - -grouped = df.groupby(['A', 'B']) - -label_list = [ping.labels for ping in grouped.groupings] -shape = [len(ping.ids) for ping in grouped.groupings] - -from pandas.core.groupby import get_group_index - - -group_index = get_group_index(label_list, shape, - sort=True, xnull=True).astype('i4') - -ngroups = np.prod(shape) - -indexer = lib.groupsort_indexer(group_index, ngroups) - -values = df['C'].values.take(indexer) -group_index = group_index.take(indexer) - -f = lambda x: x.std(ddof=1) - -grouper = lib.Grouper(df['C'], np.ndarray.std, group_index, ngroups) -result = grouper.get_result() - -expected = grouped.std() -""" diff --git a/bench/bench_join_panel.py b/bench/bench_join_panel.py deleted file mode 100644 index f3c3f8ba15f70..0000000000000 --- a/bench/bench_join_panel.py +++ /dev/null @@ -1,85 +0,0 @@ -# reasonably efficient - - -def create_panels_append(cls, panels): - """ return an append list of panels """ - panels = [a for a in panels if a is not None] - # corner cases - if len(panels) == 0: - return None - elif len(panels) == 1: - return panels[0] - elif len(panels) == 2 and panels[0] == panels[1]: - return panels[0] - # import pdb; pdb.set_trace() - # create a joint index for the axis - - def joint_index_for_axis(panels, axis): - s = set() - for p in panels: - s.update(list(getattr(p, axis))) - return sorted(list(s)) - - def reindex_on_axis(panels, axis, axis_reindex): - new_axis = joint_index_for_axis(panels, axis) - new_panels = [p.reindex(**{axis_reindex: new_axis, - 'copy': False}) for p in panels] - return new_panels, new_axis - # create the joint major index, dont' reindex the sub-panels - we are - # appending - major = joint_index_for_axis(panels, 'major_axis') - # reindex on minor axis - panels, minor = reindex_on_axis(panels, 'minor_axis', 'minor') - # reindex on items - panels, items = reindex_on_axis(panels, 'items', 'items') - # concatenate values - try: - values = np.concatenate([p.values for p in panels], axis=1) - except Exception as detail: - raise Exception("cannot append values that dont' match dimensions! 
-> [%s] %s" - % (','.join(["%s" % p for p in panels]), str(detail))) - # pm('append - create_panel') - p = Panel(values, items=items, major_axis=major, - minor_axis=minor) - # pm('append - done') - return p - - -# does the job but inefficient (better to handle like you read a table in -# pytables...e.g create a LongPanel then convert to Wide) -def create_panels_join(cls, panels): - """ given an array of panels's, create a single panel """ - panels = [a for a in panels if a is not None] - # corner cases - if len(panels) == 0: - return None - elif len(panels) == 1: - return panels[0] - elif len(panels) == 2 and panels[0] == panels[1]: - return panels[0] - d = dict() - minor, major, items = set(), set(), set() - for panel in panels: - items.update(panel.items) - major.update(panel.major_axis) - minor.update(panel.minor_axis) - values = panel.values - for item, item_index in panel.items.indexMap.items(): - for minor_i, minor_index in panel.minor_axis.indexMap.items(): - for major_i, major_index in panel.major_axis.indexMap.items(): - try: - d[(minor_i, major_i, item)] = values[item_index, major_index, minor_index] - except: - pass - # stack the values - minor = sorted(list(minor)) - major = sorted(list(major)) - items = sorted(list(items)) - # create the 3d stack (items x columns x indicies) - data = np.dstack([np.asarray([np.asarray([d.get((minor_i, major_i, item), np.nan) - for item in items]) - for major_i in major]).transpose() - for minor_i in minor]) - # construct the panel - return Panel(data, items, major, minor) -add_class_method(Panel, create_panels_join, 'join_many') diff --git a/bench/bench_khash_dict.py b/bench/bench_khash_dict.py deleted file mode 100644 index 054fc36131b65..0000000000000 --- a/bench/bench_khash_dict.py +++ /dev/null @@ -1,89 +0,0 @@ -""" -Some comparisons of khash.h to Python dict -""" -from __future__ import print_function - -import numpy as np -import os - -from vbench.api import Benchmark -from pandas.util.testing import rands -from pandas.compat import range -import pandas._tseries as lib -import pandas._sandbox as sbx -import time - -import psutil - -pid = os.getpid() -proc = psutil.Process(pid) - - -def object_test_data(n): - pass - - -def string_test_data(n): - return np.array([rands(10) for _ in range(n)], dtype='O') - - -def int_test_data(n): - return np.arange(n, dtype='i8') - -N = 1000000 - -#---------------------------------------------------------------------- -# Benchmark 1: map_locations - - -def map_locations_python_object(): - arr = string_test_data(N) - return _timeit(lambda: lib.map_indices_object(arr)) - - -def map_locations_khash_object(): - arr = string_test_data(N) - - def f(): - table = sbx.PyObjectHashTable(len(arr)) - table.map_locations(arr) - return _timeit(f) - - -def _timeit(f, iterations=10): - start = time.time() - for _ in range(iterations): - foo = f() - elapsed = time.time() - start - return elapsed - -#---------------------------------------------------------------------- -# Benchmark 2: lookup_locations - - -def lookup_python(values): - table = lib.map_indices_object(values) - return _timeit(lambda: lib.merge_indexer_object(values, table)) - - -def lookup_khash(values): - table = sbx.PyObjectHashTable(len(values)) - table.map_locations(values) - locs = table.lookup_locations(values) - # elapsed = _timeit(lambda: table.lookup_locations2(values)) - return table - - -def leak(values): - for _ in range(100): - print(proc.get_memory_info()) - table = lookup_khash(values) - # table.destroy() - -arr = string_test_data(N) - 
-#---------------------------------------------------------------------- -# Benchmark 3: unique - -#---------------------------------------------------------------------- -# Benchmark 4: factorize diff --git a/bench/bench_merge.R b/bench/bench_merge.R deleted file mode 100644 index 3ed4618494857..0000000000000 --- a/bench/bench_merge.R +++ /dev/null @@ -1,161 +0,0 @@ -library(plyr) -library(data.table) -N <- 10000 -indices = rep(NA, N) -indices2 = rep(NA, N) -for (i in 1:N) { - indices[i] <- paste(sample(letters, 10), collapse="") - indices2[i] <- paste(sample(letters, 10), collapse="") -} -left <- data.frame(key=rep(indices[1:8000], 10), - key2=rep(indices2[1:8000], 10), - value=rnorm(80000)) -right <- data.frame(key=indices[2001:10000], - key2=indices2[2001:10000], - value2=rnorm(8000)) - -right2 <- data.frame(key=rep(right$key, 2), - key2=rep(right$key2, 2), - value2=rnorm(16000)) - -left.dt <- data.table(left, key=c("key", "key2")) -right.dt <- data.table(right, key=c("key", "key2")) -right2.dt <- data.table(right2, key=c("key", "key2")) - -# left.dt2 <- data.table(left) -# right.dt2 <- data.table(right) - -## left <- data.frame(key=rep(indices[1:1000], 10), -## key2=rep(indices2[1:1000], 10), -## value=rnorm(100000)) -## right <- data.frame(key=indices[1:1000], -## key2=indices2[1:1000], -## value2=rnorm(10000)) - -timeit <- function(func, niter=10) { - timing = rep(NA, niter) - for (i in 1:niter) { - gc() - timing[i] <- system.time(func())[3] - } - mean(timing) -} - -left.join <- function(sort=FALSE) { - result <- base::merge(left, right, all.x=TRUE, sort=sort) -} - -right.join <- function(sort=FALSE) { - result <- base::merge(left, right, all.y=TRUE, sort=sort) -} - -outer.join <- function(sort=FALSE) { - result <- base::merge(left, right, all=TRUE, sort=sort) -} - -inner.join <- function(sort=FALSE) { - result <- base::merge(left, right, all=FALSE, sort=sort) -} - -left.join.dt <- function(sort=FALSE) { - result <- right.dt[left.dt] -} - -right.join.dt <- function(sort=FALSE) { - result <- left.dt[right.dt] -} - -outer.join.dt <- function(sort=FALSE) { - result <- merge(left.dt, right.dt, all=TRUE, sort=sort) -} - -inner.join.dt <- function(sort=FALSE) { - result <- merge(left.dt, right.dt, all=FALSE, sort=sort) -} - -plyr.join <- function(type) { - result <- plyr::join(left, right, by=c("key", "key2"), - type=type, match="first") -} - -sort.options <- c(FALSE, TRUE) - -# many-to-one - -results <- matrix(nrow=4, ncol=3) -colnames(results) <- c("base::merge", "plyr", "data.table") -rownames(results) <- c("inner", "outer", "left", "right") - -base.functions <- c(inner.join, outer.join, left.join, right.join) -plyr.functions <- c(function() plyr.join("inner"), - function() plyr.join("full"), - function() plyr.join("left"), - function() plyr.join("right")) -dt.functions <- c(inner.join.dt, outer.join.dt, left.join.dt, right.join.dt) -for (i in 1:4) { - base.func <- base.functions[[i]] - plyr.func <- plyr.functions[[i]] - dt.func <- dt.functions[[i]] - results[i, 1] <- timeit(base.func) - results[i, 2] <- timeit(plyr.func) - results[i, 3] <- timeit(dt.func) -} - - -# many-to-many - -left.join <- function(sort=FALSE) { - result <- base::merge(left, right2, all.x=TRUE, sort=sort) -} - -right.join <- function(sort=FALSE) { - result <- base::merge(left, right2, all.y=TRUE, sort=sort) -} - -outer.join <- function(sort=FALSE) { - result <- base::merge(left, right2, all=TRUE, sort=sort) -} - -inner.join <- function(sort=FALSE) { - result <- base::merge(left, right2, all=FALSE, sort=sort) -} - 
-left.join.dt <- function(sort=FALSE) { - result <- right2.dt[left.dt] -} - -right.join.dt <- function(sort=FALSE) { - result <- left.dt[right2.dt] -} - -outer.join.dt <- function(sort=FALSE) { - result <- merge(left.dt, right2.dt, all=TRUE, sort=sort) -} - -inner.join.dt <- function(sort=FALSE) { - result <- merge(left.dt, right2.dt, all=FALSE, sort=sort) -} - -sort.options <- c(FALSE, TRUE) - -# many-to-one - -results <- matrix(nrow=4, ncol=3) -colnames(results) <- c("base::merge", "plyr", "data.table") -rownames(results) <- c("inner", "outer", "left", "right") - -base.functions <- c(inner.join, outer.join, left.join, right.join) -plyr.functions <- c(function() plyr.join("inner"), - function() plyr.join("full"), - function() plyr.join("left"), - function() plyr.join("right")) -dt.functions <- c(inner.join.dt, outer.join.dt, left.join.dt, right.join.dt) -for (i in 1:4) { - base.func <- base.functions[[i]] - plyr.func <- plyr.functions[[i]] - dt.func <- dt.functions[[i]] - results[i, 1] <- timeit(base.func) - results[i, 2] <- timeit(plyr.func) - results[i, 3] <- timeit(dt.func) -} - diff --git a/bench/bench_merge.py b/bench/bench_merge.py deleted file mode 100644 index 330dba7b9af69..0000000000000 --- a/bench/bench_merge.py +++ /dev/null @@ -1,105 +0,0 @@ -import random -import gc -import time -from pandas import * -from pandas.compat import range, lrange, StringIO -from pandas.util.testing import rands - -N = 10000 -ngroups = 10 - - -def get_test_data(ngroups=100, n=N): - unique_groups = lrange(ngroups) - arr = np.asarray(np.tile(unique_groups, n / ngroups), dtype=object) - - if len(arr) < n: - arr = np.asarray(list(arr) + unique_groups[:n - len(arr)], - dtype=object) - - random.shuffle(arr) - return arr - -# aggregate multiple columns -# df = DataFrame({'key1' : get_test_data(ngroups=ngroups), -# 'key2' : get_test_data(ngroups=ngroups), -# 'data1' : np.random.randn(N), -# 'data2' : np.random.randn(N)}) - -# df2 = DataFrame({'key1' : get_test_data(ngroups=ngroups, n=N//10), -# 'key2' : get_test_data(ngroups=ngroups//2, n=N//10), -# 'value' : np.random.randn(N // 10)}) -# result = merge.merge(df, df2, on='key2') - -N = 10000 - -indices = np.array([rands(10) for _ in range(N)], dtype='O') -indices2 = np.array([rands(10) for _ in range(N)], dtype='O') -key = np.tile(indices[:8000], 10) -key2 = np.tile(indices2[:8000], 10) - -left = DataFrame({'key': key, 'key2': key2, - 'value': np.random.randn(80000)}) -right = DataFrame({'key': indices[2000:], 'key2': indices2[2000:], - 'value2': np.random.randn(8000)}) - -right2 = right.append(right, ignore_index=True) - - -join_methods = ['inner', 'outer', 'left', 'right'] -results = DataFrame(index=join_methods, columns=[False, True]) -niter = 10 -for sort in [False, True]: - for join_method in join_methods: - f = lambda: merge(left, right, how=join_method, sort=sort) - gc.disable() - start = time.time() - for _ in range(niter): - f() - elapsed = (time.time() - start) / niter - gc.enable() - results[sort][join_method] = elapsed -# results.columns = ['pandas'] -results.columns = ['dont_sort', 'sort'] - - -# R results -# many to one -r_results = read_table(StringIO(""" base::merge plyr data.table -inner 0.2475 0.1183 0.1100 -outer 0.4213 0.1916 0.2090 -left 0.2998 0.1188 0.0572 -right 0.3102 0.0536 0.0376 -"""), sep='\s+') - -presults = results[['dont_sort']].rename(columns={'dont_sort': 'pandas'}) -all_results = presults.join(r_results) - -all_results = all_results.div(all_results['pandas'], axis=0) - -all_results = all_results.ix[:, ['pandas', 
'data.table', 'plyr', - 'base::merge']] - -sort_results = DataFrame.from_items([('pandas', results['sort']), - ('R', r_results['base::merge'])]) -sort_results['Ratio'] = sort_results['R'] / sort_results['pandas'] - - -nosort_results = DataFrame.from_items([('pandas', results['dont_sort']), - ('R', r_results['base::merge'])]) -nosort_results['Ratio'] = nosort_results['R'] / nosort_results['pandas'] - -# many to many - -# many to one -r_results = read_table(StringIO("""base::merge plyr data.table -inner 0.4610 0.1276 0.1269 -outer 0.9195 0.1881 0.2725 -left 0.6559 0.1257 0.0678 -right 0.6425 0.0522 0.0428 -"""), sep='\s+') - -all_results = presults.join(r_results) -all_results = all_results.div(all_results['pandas'], axis=0) -all_results = all_results.ix[:, ['pandas', 'data.table', 'plyr', - 'base::merge']] diff --git a/bench/bench_merge_sqlite.py b/bench/bench_merge_sqlite.py deleted file mode 100644 index 3ad4b810119c3..0000000000000 --- a/bench/bench_merge_sqlite.py +++ /dev/null @@ -1,87 +0,0 @@ -import numpy as np -from collections import defaultdict -import gc -import time -from pandas import DataFrame -from pandas.util.testing import rands -from pandas.compat import range, zip -import random - -N = 10000 - -indices = np.array([rands(10) for _ in range(N)], dtype='O') -indices2 = np.array([rands(10) for _ in range(N)], dtype='O') -key = np.tile(indices[:8000], 10) -key2 = np.tile(indices2[:8000], 10) - -left = DataFrame({'key': key, 'key2': key2, - 'value': np.random.randn(80000)}) -right = DataFrame({'key': indices[2000:], 'key2': indices2[2000:], - 'value2': np.random.randn(8000)}) - -# right2 = right.append(right, ignore_index=True) -# right = right2 - -# random.shuffle(key2) -# indices2 = indices.copy() -# random.shuffle(indices2) - -# Prepare Database -import sqlite3 -create_sql_indexes = True - -conn = sqlite3.connect(':memory:') -conn.execute( - 'create table left( key varchar(10), key2 varchar(10), value int);') -conn.execute( - 'create table right( key varchar(10), key2 varchar(10), value2 int);') -conn.executemany('insert into left values (?, ?, ?)', - zip(key, key2, left['value'])) -conn.executemany('insert into right values (?, ?, ?)', - zip(right['key'], right['key2'], right['value2'])) - -# Create Indices -if create_sql_indexes: - conn.execute('create index left_ix on left(key, key2)') - conn.execute('create index right_ix on right(key, key2)') - - -join_methods = ['inner', 'left outer', 'left'] # others not supported -sql_results = DataFrame(index=join_methods, columns=[False]) -niter = 5 -for sort in [False]: - for join_method in join_methods: - sql = """CREATE TABLE test as select * - from left - %s join right - on left.key=right.key - and left.key2 = right.key2;""" % join_method - sql = """select * - from left - %s join right - on left.key=right.key - and left.key2 = right.key2;""" % join_method - - if sort: - sql = '%s order by key, key2' % sql - f = lambda: list(conn.execute(sql)) # list fetches results - g = lambda: conn.execute(sql) # list fetches results - gc.disable() - start = time.time() - # for _ in range(niter): - g() - elapsed = (time.time() - start) / niter - gc.enable() - - cur = conn.execute("DROP TABLE test") - conn.commit() - - sql_results[sort][join_method] = elapsed - sql_results.columns = ['sqlite3'] # ['dont_sort', 'sort'] - sql_results.index = ['inner', 'outer', 'left'] - - sql = """select * - from left - inner join right - on left.key=right.key - and left.key2 = right.key2;""" diff --git a/bench/bench_pivot.R b/bench/bench_pivot.R deleted file 
mode 100644 index 06dc6a105bc43..0000000000000 --- a/bench/bench_pivot.R +++ /dev/null @@ -1,27 +0,0 @@ -library(reshape2) - - -n <- 100000 -a.size <- 5 -b.size <- 5 - -data <- data.frame(a=sample(letters[1:a.size], n, replace=T), - b=sample(letters[1:b.size], n, replace=T), - c=rnorm(n), - d=rnorm(n)) - -timings <- numeric() - -# acast(melt(data, id=c("a", "b")), a ~ b, mean) -# acast(melt(data, id=c("a", "b")), a + b ~ variable, mean) - -for (i in 1:10) { - gc() - tim <- system.time(acast(melt(data, id=c("a", "b")), a ~ b, mean, - subset=.(variable=="c"))) - timings[i] = tim[3] -} - -mean(timings) - -acast(melt(data, id=c("a", "b")), a ~ b, mean, subset=.(variable=="c")) diff --git a/bench/bench_pivot.py b/bench/bench_pivot.py deleted file mode 100644 index 007bd0aaebc2f..0000000000000 --- a/bench/bench_pivot.py +++ /dev/null @@ -1,16 +0,0 @@ -from pandas import * -import string - - -n = 100000 -asize = 5 -bsize = 5 - -letters = np.asarray(list(string.letters), dtype=object) - -data = DataFrame(dict(foo=letters[:asize][np.random.randint(0, asize, n)], - bar=letters[:bsize][np.random.randint(0, bsize, n)], - baz=np.random.randn(n), - qux=np.random.randn(n))) - -table = pivot_table(data, xby=['foo', 'bar']) diff --git a/bench/bench_take_indexing.py b/bench/bench_take_indexing.py deleted file mode 100644 index 5fb584bcfe45f..0000000000000 --- a/bench/bench_take_indexing.py +++ /dev/null @@ -1,55 +0,0 @@ -from __future__ import print_function -import numpy as np - -from pandas import * -import pandas._tseries as lib - -from pandas import DataFrame -import timeit -from pandas.compat import zip - -setup = """ -from pandas import Series -import pandas._tseries as lib -import random -import numpy as np - -import random -n = %d -k = %d -arr = np.random.randn(n, k) -indexer = np.arange(n, dtype=np.int32) -indexer = indexer[::-1] -""" - -sizes = [100, 1000, 10000, 100000] -iters = [1000, 1000, 100, 1] - -fancy_2d = [] -take_2d = [] -cython_2d = [] - -n = 1000 - - -def _timeit(stmt, size, k=5, iters=1000): - timer = timeit.Timer(stmt=stmt, setup=setup % (sz, k)) - return timer.timeit(n) / n - -for sz, its in zip(sizes, iters): - print(sz) - fancy_2d.append(_timeit('arr[indexer]', sz, iters=its)) - take_2d.append(_timeit('arr.take(indexer, axis=0)', sz, iters=its)) - cython_2d.append(_timeit('lib.take_axis0(arr, indexer)', sz, iters=its)) - -df = DataFrame({'fancy': fancy_2d, - 'take': take_2d, - 'cython': cython_2d}) - -print(df) - -from pandas.rpy.common import r -r('mat <- matrix(rnorm(50000), nrow=10000, ncol=5)') -r('set.seed(12345)') -r('indexer <- sample(1:10000)') -r('mat[indexer,]') diff --git a/bench/bench_unique.py b/bench/bench_unique.py deleted file mode 100644 index 87bd2f2df586c..0000000000000 --- a/bench/bench_unique.py +++ /dev/null @@ -1,278 +0,0 @@ -from __future__ import print_function -from pandas import * -from pandas.util.testing import rands -from pandas.compat import range, zip -import pandas._tseries as lib -import numpy as np -import matplotlib.pyplot as plt - -N = 50000 -K = 10000 - -groups = np.array([rands(10) for _ in range(K)], dtype='O') -groups2 = np.array([rands(10) for _ in range(K)], dtype='O') - -labels = np.tile(groups, N // K) -labels2 = np.tile(groups2, N // K) -data = np.random.randn(N) - - -def timeit(f, niter): - import gc - import time - gc.disable() - start = time.time() - for _ in range(niter): - f() - elapsed = (time.time() - start) / niter - gc.enable() - return elapsed - - -def algo1(): - unique_labels = np.unique(labels) - result =
np.empty(len(unique_labels)) - for i, label in enumerate(unique_labels): - result[i] = data[labels == label].sum() - - -def algo2(): - unique_labels = np.unique(labels) - indices = lib.groupby_indices(labels) - result = np.empty(len(unique_labels)) - - for i, label in enumerate(unique_labels): - result[i] = data.take(indices[label]).sum() - - -def algo3_nosort(): - rizer = lib.DictFactorizer() - labs, counts = rizer.factorize(labels, sort=False) - k = len(rizer.uniques) - out = np.empty(k) - lib.group_add(out, counts, data, labs) - - -def algo3_sort(): - rizer = lib.DictFactorizer() - labs, counts = rizer.factorize(labels, sort=True) - k = len(rizer.uniques) - out = np.empty(k) - lib.group_add(out, counts, data, labs) - -import numpy as np -import random - - -# dict to hold results -counts = {} - -# a hack to generate random key, value pairs. -# 5k keys, 100k values -x = np.tile(np.arange(5000, dtype='O'), 20) -random.shuffle(x) -xarr = x -x = [int(y) for y in x] -data = np.random.uniform(0, 1, 100000) - - -def f(): - # groupby sum - for k, v in zip(x, data): - try: - counts[k] += v - except KeyError: - counts[k] = v - - -def f2(): - rizer = lib.DictFactorizer() - labs, counts = rizer.factorize(xarr, sort=False) - k = len(rizer.uniques) - out = np.empty(k) - lib.group_add(out, counts, data, labs) - - -def algo4(): - rizer = lib.DictFactorizer() - labs1, _ = rizer.factorize(labels, sort=False) - k1 = len(rizer.uniques) - - rizer = lib.DictFactorizer() - labs2, _ = rizer.factorize(labels2, sort=False) - k2 = len(rizer.uniques) - - group_id = labs1 * k2 + labs2 - max_group = k1 * k2 - - if max_group > 1e6: - rizer = lib.Int64Factorizer(len(group_id)) - group_id, _ = rizer.factorize(group_id.astype('i8'), sort=True) - max_group = len(rizer.uniques) - - out = np.empty(max_group) - counts = np.zeros(max_group, dtype='i4') - lib.group_add(out, counts, data, group_id) - -# cumtime percall filename:lineno(function) -# 0.592 0.592 :1() - # 0.584 0.006 groupby_ex.py:37(algo3_nosort) - # 0.535 0.005 {method 'factorize' of DictFactorizer' objects} - # 0.047 0.000 {pandas._tseries.group_add} - # 0.002 0.000 numeric.py:65(zeros_like) - # 0.001 0.000 {method 'fill' of 'numpy.ndarray' objects} - # 0.000 0.000 {numpy.core.multiarray.empty_like} - # 0.000 0.000 {numpy.core.multiarray.empty} - -# UNIQUE timings - -# N = 10000000 -# K = 500000 - -# groups = np.array([rands(10) for _ in range(K)], dtype='O') - -# labels = np.tile(groups, N // K) -data = np.random.randn(N) - -data = np.random.randn(N) - -Ks = [100, 1000, 5000, 10000, 25000, 50000, 100000] - -# Ks = [500000, 1000000, 2500000, 5000000, 10000000] - -import psutil -import os -import gc - -pid = os.getpid() -proc = psutil.Process(pid) - - -def dict_unique(values, expected_K, sort=False, memory=False): - if memory: - gc.collect() - before_mem = proc.get_memory_info().rss - - rizer = lib.DictFactorizer() - result = rizer.unique_int64(values) - - if memory: - result = proc.get_memory_info().rss - before_mem - return result - - if sort: - result.sort() - assert(len(result) == expected_K) - return result - - -def khash_unique(values, expected_K, size_hint=False, sort=False, - memory=False): - if memory: - gc.collect() - before_mem = proc.get_memory_info().rss - - if size_hint: - rizer = lib.Factorizer(len(values)) - else: - rizer = lib.Factorizer(100) - - result = [] - result = rizer.unique(values) - - if memory: - result = proc.get_memory_info().rss - before_mem - return result - - if sort: - result.sort() - assert(len(result) == expected_K) - - -def 
khash_unique_str(values, expected_K, size_hint=False, sort=False, - memory=False): - if memory: - gc.collect() - before_mem = proc.get_memory_info().rss - - if size_hint: - rizer = lib.StringHashTable(len(values)) - else: - rizer = lib.StringHashTable(100) - - result = [] - result = rizer.unique(values) - - if memory: - result = proc.get_memory_info().rss - before_mem - return result - - if sort: - result.sort() - assert(len(result) == expected_K) - - -def khash_unique_int64(values, expected_K, size_hint=False, sort=False): - if size_hint: - rizer = lib.Int64HashTable(len(values)) - else: - rizer = lib.Int64HashTable(100) - - result = [] - result = rizer.unique(values) - - if sort: - result.sort() - assert(len(result) == expected_K) - - -def hash_bench(): - numpy = [] - dict_based = [] - dict_based_sort = [] - khash_hint = [] - khash_nohint = [] - for K in Ks: - print(K) - # groups = np.array([rands(10) for _ in range(K)]) - # labels = np.tile(groups, N // K).astype('O') - - groups = np.random.randint(0, long(100000000000), size=K) - labels = np.tile(groups, N // K) - dict_based.append(timeit(lambda: dict_unique(labels, K), 20)) - khash_nohint.append(timeit(lambda: khash_unique_int64(labels, K), 20)) - khash_hint.append(timeit(lambda: khash_unique_int64(labels, K, - size_hint=True), 20)) - - # memory, hard to get - # dict_based.append(np.mean([dict_unique(labels, K, memory=True) - # for _ in range(10)])) - # khash_nohint.append(np.mean([khash_unique(labels, K, memory=True) - # for _ in range(10)])) - # khash_hint.append(np.mean([khash_unique(labels, K, size_hint=True, memory=True) - # for _ in range(10)])) - - # dict_based_sort.append(timeit(lambda: dict_unique(labels, K, - # sort=True), 10)) - # numpy.append(timeit(lambda: np.unique(labels), 10)) - - # unique_timings = DataFrame({'numpy.unique' : numpy, - # 'dict, no sort' : dict_based, - # 'dict, sort' : dict_based_sort}, - # columns=['dict, no sort', - # 'dict, sort', 'numpy.unique'], - # index=Ks) - - unique_timings = DataFrame({'dict': dict_based, - 'khash, preallocate': khash_hint, - 'khash': khash_nohint}, - columns=['khash, preallocate', 'khash', 'dict'], - index=Ks) - - unique_timings.plot(kind='bar', legend=False) - plt.legend(loc='best') - plt.title('Unique on 100,000 values, int64') - plt.xlabel('Number of unique labels') - plt.ylabel('Mean execution time') - - plt.show() diff --git a/bench/bench_with_subset.R b/bench/bench_with_subset.R deleted file mode 100644 index 69d0f7a9eec63..0000000000000 --- a/bench/bench_with_subset.R +++ /dev/null @@ -1,53 +0,0 @@ -library(microbenchmark) -library(data.table) - - -data.frame.subset.bench <- function (n=1e7, times=30) { - df <- data.frame(a=rnorm(n), b=rnorm(n), c=rnorm(n)) - print(microbenchmark(subset(df, a <= b & b <= (c ^ 2 + b ^ 2 - a) & b > c), - times=times)) -} - - -# data.table allows something very similar to query with an expression -# but we have chained comparisons AND we're faster BOO YAH! 
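The chained comparison this comment refers to is the `DataFrame.query` syntax that the companion bench_with_subset.py (further down) times; a minimal illustration of the pandas side, reusing the same expression those benchmarks use:

```python
import numpy as np
from pandas import DataFrame

df = DataFrame(np.random.randn(10, 3), columns=list('abc'))

# query() accepts a Python-style chained comparison in a single expression
result = df.query('a <= b <= c ** 2 + b ** 2 - a and b > c')
```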
-data.table.subset.expression.bench <- function (n=1e7, times=30) { - dt <- data.table(a=rnorm(n), b=rnorm(n), c=rnorm(n)) - print(microbenchmark(dt[, a <= b & b <= (c ^ 2 + b ^ 2 - a) & b > c], - times=times)) -} - - -# compare against subset with data.table for good measure -data.table.subset.bench <- function (n=1e7, times=30) { - dt <- data.table(a=rnorm(n), b=rnorm(n), c=rnorm(n)) - print(microbenchmark(subset(dt, a <= b & b <= (c ^ 2 + b ^ 2 - a) & b > c), - times=times)) -} - - -data.frame.with.bench <- function (n=1e7, times=30) { - df <- data.frame(a=rnorm(n), b=rnorm(n), c=rnorm(n)) - - print(microbenchmark(with(df, a + b * (c ^ 2 + b ^ 2 - a) / (a * c) ^ 3), - times=times)) -} - - -data.table.with.bench <- function (n=1e7, times=30) { - dt <- data.table(a=rnorm(n), b=rnorm(n), c=rnorm(n)) - print(microbenchmark(with(dt, a + b * (c ^ 2 + b ^ 2 - a) / (a * c) ^ 3), - times=times)) -} - - -bench <- function () { - data.frame.subset.bench() - data.table.subset.expression.bench() - data.table.subset.bench() - data.frame.with.bench() - data.table.with.bench() -} - - -bench() diff --git a/bench/bench_with_subset.py b/bench/bench_with_subset.py deleted file mode 100644 index 017401df3f7f3..0000000000000 --- a/bench/bench_with_subset.py +++ /dev/null @@ -1,116 +0,0 @@ -#!/usr/bin/env python - -""" -Microbenchmarks for comparison with R's "with" and "subset" functions -""" - -from __future__ import print_function -import numpy as np -from numpy import array -from timeit import repeat as timeit -from pandas.compat import range, zip -from pandas import DataFrame - - -setup_common = """from pandas import DataFrame -from numpy.random import randn -df = DataFrame(randn(%d, 3), columns=list('abc')) -%s""" - - -setup_with = "s = 'a + b * (c ** 2 + b ** 2 - a) / (a * c) ** 3'" - - -def bench_with(n, times=10, repeat=3, engine='numexpr'): - return np.array(timeit('df.eval(s, engine=%r)' % engine, - setup=setup_common % (n, setup_with), - repeat=repeat, number=times)) / times - - -setup_subset = "s = 'a <= b <= c ** 2 + b ** 2 - a and b > c'" - - -def bench_subset(n, times=10, repeat=3, engine='numexpr'): - return np.array(timeit('df.query(s, engine=%r)' % engine, - setup=setup_common % (n, setup_subset), - repeat=repeat, number=times)) / times - - -def bench(mn=1, mx=7, num=100, engines=('python', 'numexpr'), verbose=False): - r = np.logspace(mn, mx, num=num).round().astype(int) - - ev = DataFrame(np.empty((num, len(engines))), columns=engines) - qu = ev.copy(deep=True) - - ev['size'] = qu['size'] = r - - for engine in engines: - for i, n in enumerate(r): - if verbose: - print('engine: %r, i == %d' % (engine, i)) - ev.loc[i, engine] = bench_with(n, times=1, repeat=1, engine=engine) - qu.loc[i, engine] = bench_subset(n, times=1, repeat=1, - engine=engine) - - return ev, qu - - -def plot_perf(df, engines, title, filename=None): - from matplotlib.pyplot import figure, rc - - try: - from mpltools import style - except ImportError: - pass - else: - style.use('ggplot') - - rc('text', usetex=True) - - fig = figure(figsize=(4, 3), dpi=100) - ax = fig.add_subplot(111) - - for engine in engines: - ax.plot(df.size, df[engine], label=engine, lw=2) - - ax.set_xlabel('Number of Rows') - ax.set_ylabel('Time (s)') - ax.set_title(title) - ax.legend(loc='best') - ax.tick_params(top=False, right=False) - - fig.tight_layout() - - if filename is not None: - fig.savefig(filename) - - -if __name__ == '__main__': - import os - import pandas as pd - - pandas_dir = 
os.path.dirname(os.path.abspath(os.path.dirname(__file__))) - static_path = os.path.join(pandas_dir, 'doc', 'source', '_static') - - join = lambda p: os.path.join(static_path, p) - - fn = join('eval-query-perf-data.h5') - - engines = 'python', 'numexpr' - - if not os.path.exists(fn): - ev, qu = bench(verbose=True) - ev.to_hdf(fn, 'eval') - qu.to_hdf(fn, 'query') - else: - ev = pd.read_hdf(fn, 'eval') - qu = pd.read_hdf(fn, 'query') - - plot_perf(ev, engines, 'DataFrame.eval()', filename=join('eval-perf.png')) - plot_perf(qu, engines, 'DataFrame.query()', - filename=join('query-perf.png')) - - plot_perf(ev[ev.size <= 50000], engines, 'DataFrame.eval()', - filename=join('eval-perf-small.png')) - plot_perf(qu[qu.size <= 500000], engines, 'DataFrame.query()', - filename=join('query-perf-small.png')) diff --git a/bench/better_unique.py b/bench/better_unique.py deleted file mode 100644 index e03a4f433ce66..0000000000000 --- a/bench/better_unique.py +++ /dev/null @@ -1,80 +0,0 @@ -from __future__ import print_function -from pandas import DataFrame -from pandas.compat import range, zip -import timeit - -setup = """ -from pandas import Series -import pandas._tseries as _tseries -from pandas.compat import range -import random -import numpy as np - -def better_unique(values): - uniques = _tseries.fast_unique(values) - id_map = _tseries.map_indices_buf(uniques) - labels = _tseries.get_unique_labels(values, id_map) - return uniques, labels - -tot = 100000 - -def get_test_data(ngroups=100, n=tot): - unique_groups = range(ngroups) - random.shuffle(unique_groups) - arr = np.asarray(np.tile(unique_groups, n / ngroups), dtype=object) - - if len(arr) < n: - arr = np.asarray(list(arr) + unique_groups[:n - len(arr)], - dtype=object) - - return arr - -arr = get_test_data(ngroups=%d) -""" - -group_sizes = [10, 100, 1000, 10000, - 20000, 30000, 40000, - 50000, 60000, 70000, - 80000, 90000, 100000] - -numbers = [100, 100, 50] + [10] * 10 - -numpy = [] -wes = [] - -for sz, n in zip(group_sizes, numbers): - # wes_timer = timeit.Timer(stmt='better_unique(arr)', - # setup=setup % sz) - wes_timer = timeit.Timer(stmt='_tseries.fast_unique(arr)', - setup=setup % sz) - - numpy_timer = timeit.Timer(stmt='np.unique(arr)', - setup=setup % sz) - - print(n) - numpy_result = numpy_timer.timeit(number=n) / n - wes_result = wes_timer.timeit(number=n) / n - - print('Groups: %d, NumPy: %s, Wes: %s' % (sz, numpy_result, wes_result)) - - wes.append(wes_result) - numpy.append(numpy_result) - -result = DataFrame({'wes': wes, 'numpy': numpy}, index=group_sizes) - - -def make_plot(numpy, wes): - pass - -# def get_test_data(ngroups=100, n=100000): -# unique_groups = range(ngroups) -# random.shuffle(unique_groups) -# arr = np.asarray(np.tile(unique_groups, n / ngroups), dtype=object) - -# if len(arr) < n: -# arr = np.asarray(list(arr) + unique_groups[:n - len(arr)], -# dtype=object) - -# return arr - -# arr = get_test_data(ngroups=1000) diff --git a/bench/duplicated.R b/bench/duplicated.R deleted file mode 100644 index eb2376df2932a..0000000000000 --- a/bench/duplicated.R +++ /dev/null @@ -1,22 +0,0 @@ -N <- 100000 - -k1 = rep(NA, N) -k2 = rep(NA, N) -for (i in 1:N){ - k1[i] <- paste(sample(letters, 1), collapse="") - k2[i] <- paste(sample(letters, 1), collapse="") -} -df <- data.frame(a=k1, b=k2, c=rep(1:100, N / 100)) -df2 <- data.frame(a=k1, b=k2) - -timings <- numeric() -timings2 <- numeric() -for (i in 1:50) { - gc() - timings[i] = system.time(deduped <- df[!duplicated(df),])[3] - gc() - timings2[i] = system.time(deduped <- 
df[!duplicated(df[,c("a", "b")]),])[3] -} - -mean(timings) -mean(timings2) diff --git a/bench/io_roundtrip.py b/bench/io_roundtrip.py deleted file mode 100644 index d87da0ec6321a..0000000000000 --- a/bench/io_roundtrip.py +++ /dev/null @@ -1,116 +0,0 @@ -from __future__ import print_function -import time -import os -import numpy as np - -import la -import pandas -from pandas.compat import range -from pandas import datetools, DatetimeIndex - - -def timeit(f, iterations): - start = time.clock() - - for i in range(iterations): - f() - - return time.clock() - start - - -def rountrip_archive(N, K=50, iterations=10): - # Create data - arr = np.random.randn(N, K) - # lar = la.larry(arr) - dma = pandas.DataFrame(arr, - DatetimeIndex('1/1/2000', periods=N, - offset=datetools.Minute())) - dma[201] = 'bar' - - # filenames - filename_numpy = '/Users/wesm/tmp/numpy.npz' - filename_larry = '/Users/wesm/tmp/archive.hdf5' - filename_pandas = '/Users/wesm/tmp/pandas_tmp' - - # Delete old files - try: - os.unlink(filename_numpy) - except: - pass - try: - os.unlink(filename_larry) - except: - pass - - try: - os.unlink(filename_pandas) - except: - pass - - # Time a round trip save and load - # numpy_f = lambda: numpy_roundtrip(filename_numpy, arr, arr) - # numpy_time = timeit(numpy_f, iterations) / iterations - - # larry_f = lambda: larry_roundtrip(filename_larry, lar, lar) - # larry_time = timeit(larry_f, iterations) / iterations - - pandas_f = lambda: pandas_roundtrip(filename_pandas, dma, dma) - pandas_time = timeit(pandas_f, iterations) / iterations - print('pandas (HDF5) %7.4f seconds' % pandas_time) - - pickle_f = lambda: pandas_roundtrip(filename_pandas, dma, dma) - pickle_time = timeit(pickle_f, iterations) / iterations - print('pandas (pickle) %7.4f seconds' % pickle_time) - - # print('Numpy (npz) %7.4f seconds' % numpy_time) - # print('larry (HDF5) %7.4f seconds' % larry_time) - - # Delete old files - try: - os.unlink(filename_numpy) - except: - pass - try: - os.unlink(filename_larry) - except: - pass - - try: - os.unlink(filename_pandas) - except: - pass - - -def numpy_roundtrip(filename, arr1, arr2): - np.savez(filename, arr1=arr1, arr2=arr2) - npz = np.load(filename) - arr1 = npz['arr1'] - arr2 = npz['arr2'] - - -def larry_roundtrip(filename, lar1, lar2): - io = la.IO(filename) - io['lar1'] = lar1 - io['lar2'] = lar2 - lar1 = io['lar1'] - lar2 = io['lar2'] - - -def pandas_roundtrip(filename, dma1, dma2): - # What's the best way to code this? 
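As a hedged aside on the question in that comment (a sketch, not the code that was benchmarked): `HDFStore` works as a context manager, so one tidy option is to let `with` handle the open/close bookkeeping. The name `pandas_roundtrip_cm` is hypothetical:

```python
import pandas as pd

def pandas_roundtrip_cm(filename, dma1, dma2):
    # hypothetical variant: the store is closed even if an assignment fails
    with pd.HDFStore(filename) as store:
        store['dma1'] = dma1
        store['dma2'] = dma2
        return store['dma1'], store['dma2']
```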
- from pandas.io.pytables import HDFStore - store = HDFStore(filename) - store['dma1'] = dma1 - store['dma2'] = dma2 - dma1 = store['dma1'] - dma2 = store['dma2'] - - -def pandas_roundtrip_pickle(filename, dma1, dma2): - dma1.save(filename) - dma1 = pandas.DataFrame.load(filename) - dma2.save(filename) - dma2 = pandas.DataFrame.load(filename) - -if __name__ == '__main__': - rountrip_archive(10000, K=200) diff --git a/bench/serialize.py b/bench/serialize.py deleted file mode 100644 index b0edd6a5752d2..0000000000000 --- a/bench/serialize.py +++ /dev/null @@ -1,89 +0,0 @@ -from __future__ import print_function -from pandas.compat import range, lrange -import time -import os -import numpy as np - -import la -import pandas - - -def timeit(f, iterations): - start = time.clock() - - for i in range(iterations): - f() - - return time.clock() - start - - -def roundtrip_archive(N, iterations=10): - - # Create data - arr = np.random.randn(N, N) - lar = la.larry(arr) - dma = pandas.DataFrame(arr, lrange(N), lrange(N)) - - # filenames - filename_numpy = '/Users/wesm/tmp/numpy.npz' - filename_larry = '/Users/wesm/tmp/archive.hdf5' - filename_pandas = '/Users/wesm/tmp/pandas_tmp' - - # Delete old files - try: - os.unlink(filename_numpy) - except: - pass - try: - os.unlink(filename_larry) - except: - pass - try: - os.unlink(filename_pandas) - except: - pass - - # Time a round trip save and load - numpy_f = lambda: numpy_roundtrip(filename_numpy, arr, arr) - numpy_time = timeit(numpy_f, iterations) / iterations - - larry_f = lambda: larry_roundtrip(filename_larry, lar, lar) - larry_time = timeit(larry_f, iterations) / iterations - - pandas_f = lambda: pandas_roundtrip(filename_pandas, dma, dma) - pandas_time = timeit(pandas_f, iterations) / iterations - - print('Numpy (npz) %7.4f seconds' % numpy_time) - print('larry (HDF5) %7.4f seconds' % larry_time) - print('pandas (HDF5) %7.4f seconds' % pandas_time) - - -def numpy_roundtrip(filename, arr1, arr2): - np.savez(filename, arr1=arr1, arr2=arr2) - npz = np.load(filename) - arr1 = npz['arr1'] - arr2 = npz['arr2'] - - -def larry_roundtrip(filename, lar1, lar2): - io = la.IO(filename) - io['lar1'] = lar1 - io['lar2'] = lar2 - lar1 = io['lar1'] - lar2 = io['lar2'] - - -def pandas_roundtrip(filename, dma1, dma2): - from pandas.io.pytables import HDFStore - store = HDFStore(filename) - store['dma1'] = dma1 - store['dma2'] = dma2 - dma1 = store['dma1'] - dma2 = store['dma2'] - - -def pandas_roundtrip_pickle(filename, dma1, dma2): - dma1.save(filename) - dma1 = pandas.DataFrame.load(filename) - dma2.save(filename) - dma2 = pandas.DataFrame.load(filename) diff --git a/bench/test.py b/bench/test.py deleted file mode 100644 index 2339deab313a1..0000000000000 --- a/bench/test.py +++ /dev/null @@ -1,70 +0,0 @@ -import numpy as np -import itertools -import collections -import scipy.ndimage as ndi -from pandas.compat import zip, range - -N = 10000 - -lat = np.random.randint(0, 360, N) -lon = np.random.randint(0, 360, N) -data = np.random.randn(N) - - -def groupby1(lat, lon, data): - indexer = np.lexsort((lon, lat)) - lat = lat.take(indexer) - lon = lon.take(indexer) - sorted_data = data.take(indexer) - - keys = 1000. 
* lat + lon - unique_keys = np.unique(keys) - bounds = keys.searchsorted(unique_keys) - - result = group_agg(sorted_data, bounds, lambda x: x.mean()) - - decoder = keys.searchsorted(unique_keys) - - return dict(zip(zip(lat.take(decoder), lon.take(decoder)), result)) - - -def group_mean(lat, lon, data): - indexer = np.lexsort((lon, lat)) - lat = lat.take(indexer) - lon = lon.take(indexer) - sorted_data = data.take(indexer) - - keys = 1000 * lat + lon - unique_keys = np.unique(keys) - - result = ndi.mean(sorted_data, labels=keys, index=unique_keys) - decoder = keys.searchsorted(unique_keys) - - return dict(zip(zip(lat.take(decoder), lon.take(decoder)), result)) - - -def group_mean_naive(lat, lon, data): - grouped = collections.defaultdict(list) - for lt, ln, da in zip(lat, lon, data): - grouped[(lt, ln)].append(da) - - averaged = dict((ltln, np.mean(da)) for ltln, da in grouped.items()) - - return averaged - - -def group_agg(values, bounds, f): - N = len(values) - result = np.empty(len(bounds), dtype=float) - for i, left_bound in enumerate(bounds): - if i == len(bounds) - 1: - right_bound = N - else: - right_bound = bounds[i + 1] - - result[i] = f(values[left_bound: right_bound]) - - return result - -# for i in range(10): -# groupby1(lat, lon, data) diff --git a/bench/zoo_bench.R b/bench/zoo_bench.R deleted file mode 100644 index 294d55f51a9ab..0000000000000 --- a/bench/zoo_bench.R +++ /dev/null @@ -1,71 +0,0 @@ -library(zoo) -library(xts) -library(fts) -library(tseries) -library(its) -library(xtable) - -## indices = rep(NA, 100000) -## for (i in 1:100000) -## indices[i] <- paste(sample(letters, 10), collapse="") - - - -## x <- zoo(rnorm(100000), indices) -## y <- zoo(rnorm(90000), indices[sample(1:100000, 90000)]) - -## indices <- as.POSIXct(1:100000) - -indices <- as.POSIXct(Sys.Date()) + seq(1, 100000000, 100) - -sz <- 500000 - -## x <- xts(rnorm(sz), sample(indices, sz)) -## y <- xts(rnorm(sz), sample(indices, sz)) - -zoo.bench <- function(){ - x <- zoo(rnorm(sz), sample(indices, sz)) - y <- zoo(rnorm(sz), sample(indices, sz)) - timeit(function() {x + y}) -} - -xts.bench <- function(){ - x <- xts(rnorm(sz), sample(indices, sz)) - y <- xts(rnorm(sz), sample(indices, sz)) - timeit(function() {x + y}) -} - -fts.bench <- function(){ - x <- fts(rnorm(sz), sort(sample(indices, sz))) - y <- fts(rnorm(sz), sort(sample(indices, sz))) - timeit(function() {x + y}) -} - -its.bench <- function(){ - x <- its(rnorm(sz), sort(sample(indices, sz))) - y <- its(rnorm(sz), sort(sample(indices, sz))) - timeit(function() {x + y}) -} - -irts.bench <- function(){ - x <- irts(sort(sample(indices, sz)), rnorm(sz)) - y <- irts(sort(sample(indices, sz)), rnorm(sz)) - timeit(function() {x + y}) -} - -timeit <- function(f){ - timings <- numeric() - for (i in 1:10) { - gc() - timings[i] = system.time(f())[3] - } - mean(timings) -} - -bench <- function(){ - results <- c(xts.bench(), fts.bench(), its.bench(), zoo.bench()) - names <- c("xts", "fts", "its", "zoo") - data.frame(results, names) -} - -result <- bench() diff --git a/bench/zoo_bench.py b/bench/zoo_bench.py deleted file mode 100644 index 74cb1952a5a2a..0000000000000 --- a/bench/zoo_bench.py +++ /dev/null @@ -1,36 +0,0 @@ -from pandas import * -from pandas.util.testing import rands - -n = 1000000 -# indices = Index([rands(10) for _ in xrange(n)]) - - -def sample(values, k): - sampler = np.random.permutation(len(values)) - return values.take(sampler[:k]) -sz = 500000 -rng = np.arange(0, 10000000000000, 10000000) -stamps = np.datetime64(datetime.now()).view('i8') +
rng -idx1 = np.sort(sample(stamps, sz)) -idx2 = np.sort(sample(stamps, sz)) -ts1 = Series(np.random.randn(sz), idx1) -ts2 = Series(np.random.randn(sz), idx2) - - -# subsample_size = 90000 - -# x = Series(np.random.randn(100000), indices) -# y = Series(np.random.randn(subsample_size), -# index=sample(indices, subsample_size)) - - -# lx = larry(np.random.randn(100000), [list(indices)]) -# ly = larry(np.random.randn(subsample_size), [list(y.index)]) - -# Benchmark 1: Two 1-million length time series (int64-based index) with -# randomly chosen timestamps - -# Benchmark 2: Join two 5-variate time series DataFrames (outer and inner join) - -# df1 = DataFrame(np.random.randn(1000000, 5), idx1, columns=range(5)) -# df2 = DataFrame(np.random.randn(1000000, 5), idx2, columns=range(5, 10)) diff --git a/ci/appveyor.recipe/bld.bat b/ci/appveyor.recipe/bld.bat deleted file mode 100644 index 284926fae8c04..0000000000000 --- a/ci/appveyor.recipe/bld.bat +++ /dev/null @@ -1,2 +0,0 @@ -@echo off -%PYTHON% setup.py install diff --git a/ci/appveyor.recipe/build.sh b/ci/appveyor.recipe/build.sh deleted file mode 100644 index f341bce6fcf96..0000000000000 --- a/ci/appveyor.recipe/build.sh +++ /dev/null @@ -1,2 +0,0 @@ -#!/bin/sh -$PYTHON setup.py install diff --git a/ci/appveyor.recipe/meta.yaml b/ci/appveyor.recipe/meta.yaml deleted file mode 100644 index 777fd9d682d48..0000000000000 --- a/ci/appveyor.recipe/meta.yaml +++ /dev/null @@ -1,37 +0,0 @@ -package: - name: pandas - version: 0.20.0 - -build: - number: {{environ.get('APPVEYOR_BUILD_NUMBER', 0)}} # [win] - string: np{{ environ.get('CONDA_NPY') }}py{{ environ.get('CONDA_PY') }}_{{ environ.get('APPVEYOR_BUILD_NUMBER', 0) }} # [win] - -source: - - # conda-build needs a full clone - # rather than a shallow git_url type clone - # https://github.com/conda/conda-build/issues/780 - path: ../../ - -requirements: - build: - - python - - cython - - numpy x.x - - setuptools - - pytz - - python-dateutil - - run: - - python - - numpy x.x - - python-dateutil - - pytz - -test: - imports: - - pandas - -about: - home: http://pandas.pydata.org - license: BSD diff --git a/ci/before_install_travis.sh b/ci/before_install_travis.sh index f90427f97d3b7..2d0b4da6120dc 100755 --- a/ci/before_install_travis.sh +++ b/ci/before_install_travis.sh @@ -1,15 +1,10 @@ #!/bin/bash -# If envars.sh determined we're running in an authorized fork -# and the user opted in to the network cache,and that cached versions -# are available on the cache server, download and deploy the cached -# files to the local filesystem - echo "inside $0" -# overview if [ "${TRAVIS_OS_NAME}" == "linux" ]; then sh -e /etc/init.d/xvfb start fi -true # never fail because bad things happened here +# Never fail because bad things happened here. +true diff --git a/ci/build_docs.sh b/ci/build_docs.sh index 4dc9a203f1978..a038304fe0f7a 100755 --- a/ci/build_docs.sh +++ b/ci/build_docs.sh @@ -17,14 +17,11 @@ if [ "$?" 
!= "0" ]; then fi -if [ x"$DOC_BUILD" != x"" ]; then +if [ "$DOC" ]; then echo "Will build docs" source activate pandas - conda install -n pandas -c r r rpy2 --yes - - time sudo apt-get $APT_ARGS install dvipng texlive-latex-base texlive-latex-extra mv "$TRAVIS_BUILD_DIR"/doc /tmp cd /tmp/doc @@ -43,10 +40,10 @@ if [ x"$DOC_BUILD" != x"" ]; then cd /tmp/doc/build/html git config --global user.email "pandas-docs-bot@localhost.foo" git config --global user.name "pandas-docs-bot" - git config --global credential.helper cache # create the repo git init + touch README git add README git commit -m "Initial commit" --allow-empty @@ -55,9 +52,22 @@ if [ x"$DOC_BUILD" != x"" ]; then touch .nojekyll git add --all . git commit -m "Version" --allow-empty + git remote remove origin - git remote add origin "https://${PANDAS_GH_TOKEN}@github.com/pandas-docs/pandas-docs-travis.git" + git remote add origin "https://${PANDAS_GH_TOKEN}@github.com/pandas-dev/pandas-docs-travis.git" + git fetch origin + git remote -v + git push origin gh-pages -f + + echo "Running doctests" + cd "$TRAVIS_BUILD_DIR" + pytest --doctest-modules \ + pandas/core/reshape/concat.py \ + pandas/core/reshape/pivot.py \ + pandas/core/reshape/reshape.py \ + pandas/core/reshape/tile.py + fi exit 0 diff --git a/ci/check_cache.sh b/ci/check_cache.sh index cd7a6e8f6b6f9..b83144fc45ef4 100755 --- a/ci/check_cache.sh +++ b/ci/check_cache.sh @@ -1,5 +1,9 @@ #!/bin/bash +# currently not used +# script to make sure that cache is clean +# Travis CI now handles this + if [ "$TRAVIS_PULL_REQUEST" == "false" ] then echo "Not a PR: checking for changes in ci/ from last 2 commits" @@ -12,14 +16,12 @@ else ci_changes=$(git diff PR_HEAD~2 --numstat | grep -E "ci/"| wc -l) fi -MINICONDA_DIR="$HOME/miniconda/" CACHE_DIR="$HOME/.cache/" CCACHE_DIR="$HOME/.ccache/" if [ $ci_changes -ne 0 ] then echo "Files have changed in ci/ deleting all caches" - rm -rf "$MINICONDA_DIR" rm -rf "$CACHE_DIR" rm -rf "$CCACHE_DIR" -fi \ No newline at end of file +fi diff --git a/ci/install_circle.sh b/ci/install_circle.sh new file mode 100755 index 0000000000000..29ca69970104b --- /dev/null +++ b/ci/install_circle.sh @@ -0,0 +1,85 @@ +#!/usr/bin/env bash + +home_dir=$(pwd) +echo "[home_dir: $home_dir]" + +echo "[ls -ltr]" +ls -ltr + +echo "[Using clean Miniconda install]" +rm -rf "$MINICONDA_DIR" + +# install miniconda +wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -q -O miniconda.sh || exit 1 +bash miniconda.sh -b -p "$MINICONDA_DIR" || exit 1 + +export PATH="$MINICONDA_DIR/bin:$PATH" + +echo "[update conda]" +conda config --set ssl_verify false || exit 1 +conda config --set always_yes true --set changeps1 false || exit 1 +conda update -q conda + +# add the pandas channel to take priority +# to add extra packages +echo "[add channels]" +conda config --add channels pandas || exit 1 +conda config --remove channels defaults || exit 1 +conda config --add channels defaults || exit 1 + +# Useful for debugging any issues with conda +conda info -a || exit 1 + +# support env variables passed +export ENVS_FILE=".envs" + +# make sure that the .envs file exists. 
it is ok if it is empty +touch $ENVS_FILE + +# assume all command line arguments are environmental variables +for var in "$@" +do + echo "export $var" >> $ENVS_FILE +done + +echo "[environmental variable file]" +cat $ENVS_FILE +source $ENVS_FILE + +export REQ_BUILD=ci/requirements-${JOB}.build +export REQ_RUN=ci/requirements-${JOB}.run +export REQ_PIP=ci/requirements-${JOB}.pip + +# edit the locale override if needed +if [ -n "$LOCALE_OVERRIDE" ]; then + echo "[Adding locale to the first line of pandas/__init__.py]" + rm -f pandas/__init__.pyc + sedc="3iimport locale\nlocale.setlocale(locale.LC_ALL, '$LOCALE_OVERRIDE')\n" + sed -i "$sedc" pandas/__init__.py + echo "[head -4 pandas/__init__.py]" + head -4 pandas/__init__.py + echo +fi + +# create envbuild deps +echo "[create env: ${REQ_BUILD}]" +time conda create -n pandas -q --file=${REQ_BUILD} || exit 1 +time conda install -n pandas pytest>=3.1.0 || exit 1 + +source activate pandas + +# build but don't install +echo "[build em]" +time python setup.py build_ext --inplace || exit 1 + +# we may have run installations +echo "[conda installs: ${REQ_RUN}]" +if [ -e ${REQ_RUN} ]; then + time conda install -q --file=${REQ_RUN} || exit 1 +fi + +# we may have additional pip installs +echo "[pip installs: ${REQ_PIP}]" +if [ -e ${REQ_PIP} ]; then + pip install -r $REQ_PIP +fi diff --git a/ci/install_db_circle.sh b/ci/install_db_circle.sh new file mode 100755 index 0000000000000..a00f74f009f54 --- /dev/null +++ b/ci/install_db_circle.sh @@ -0,0 +1,8 @@ +#!/bin/bash + +echo "installing dbs" +mysql -e 'create database pandas_nosetest;' +psql -c 'create database pandas_nosetest;' -U postgres + +echo "done" +exit 0 diff --git a/ci/install_db.sh b/ci/install_db_travis.sh similarity index 100% rename from ci/install_db.sh rename to ci/install_db_travis.sh diff --git a/ci/install_test.sh b/ci/install_test.sh deleted file mode 100755 index e01ad7b94a349..0000000000000 --- a/ci/install_test.sh +++ /dev/null @@ -1,17 +0,0 @@ -#!/bin/bash - -echo "inside $0" - -if [ "$INSTALL_TEST" ]; then - source activate pandas - echo "Starting installation test." - conda uninstall cython || exit 1 - python "$TRAVIS_BUILD_DIR"/setup.py sdist --formats=zip,gztar || exit 1 - pip install "$TRAVIS_BUILD_DIR"/dist/*tar.gz || exit 1 - nosetests --exe -A "$NOSE_ARGS" pandas/tests/test_series.py --with-xunit --xunit-file=/tmp/nosetests_install.xml -else - echo "Skipping installation test." -fi -RET="$?" - -exit "$RET" diff --git a/ci/install_travis.sh b/ci/install_travis.sh index ded428c677f17..d26689f2e6b4b 100755 --- a/ci/install_travis.sh +++ b/ci/install_travis.sh @@ -1,18 +1,6 @@ #!/bin/bash -# There are 2 distinct pieces that get zipped and cached -# - The venv site-packages dir including the installed dependencies -# - The pandas build artifacts, using the build cache support via -# scripts/use_build_cache.py -# -# if the user opted in to use the cache and we're on a whitelisted fork -# - if the server doesn't hold a cached version of venv/pandas build, -# do things the slow way, and put the results on the cache server -# for the next time. -# - if the cache files are available, instal some necessaries via apt -# (no compiling needed), then directly goto script and collect 200$. 
-# - +# edit the locale file if needed function edit_init() { if [ -n "$LOCALE_OVERRIDE" ]; then @@ -26,143 +14,173 @@ function edit_init() fi } +echo echo "[install_travis]" edit_init home_dir=$(pwd) -echo "[home_dir: $home_dir]" +echo +echo "[home_dir]: $home_dir" +# install miniconda MINICONDA_DIR="$HOME/miniconda3" -if [ -d "$MINICONDA_DIR" ] && [ -e "$MINICONDA_DIR/bin/conda" ] && [ "$USE_CACHE" ]; then - echo "[Miniconda install already present from cache: $MINICONDA_DIR]" - - conda config --set always_yes yes --set changeps1 no || exit 1 - echo "[update conda]" - conda update -q conda || exit 1 - - # Useful for debugging any issues with conda - conda info -a || exit 1 - - # set the compiler cache to work - if [ "${TRAVIS_OS_NAME}" == "linux" ]; then - echo "[Using ccache]" - export PATH=/usr/lib/ccache:/usr/lib64/ccache:$PATH - gcc=$(which gcc) - echo "[gcc: $gcc]" - ccache=$(which ccache) - echo "[ccache: $ccache]" - export CC='ccache gcc' - fi +echo +echo "[Using clean Miniconda install]" -else - echo "[Using clean Miniconda install]" - echo "[Not using ccache]" +if [ -d "$MINICONDA_DIR" ]; then rm -rf "$MINICONDA_DIR" - # install miniconda - if [ "${TRAVIS_OS_NAME}" == "osx" ]; then - wget http://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh || exit 1 - else - wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh || exit 1 - fi - bash miniconda.sh -b -p "$MINICONDA_DIR" || exit 1 - - echo "[update conda]" - conda config --set ssl_verify false || exit 1 - conda config --set always_yes true --set changeps1 false || exit 1 - conda update -q conda - - # add the pandas channel to take priority - # to add extra packages - echo "[add channels]" - conda config --add channels pandas || exit 1 - conda config --remove channels defaults || exit 1 - conda config --add channels defaults || exit 1 - - conda install anaconda-client +fi - # Useful for debugging any issues with conda - conda info -a || exit 1 +# install miniconda +if [ "${TRAVIS_OS_NAME}" == "osx" ]; then + time wget http://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh || exit 1 +else + time wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh || exit 1 +fi +time bash miniconda.sh -b -p "$MINICONDA_DIR" || exit 1 + +echo +echo "[show conda]" +which conda + +echo +echo "[update conda]" +conda config --set ssl_verify false || exit 1 +conda config --set quiet true --set always_yes true --set changeps1 false || exit 1 +conda update -q conda + +echo +echo "[add channels]" +conda config --remove channels defaults || exit 1 +conda config --add channels defaults || exit 1 + +if [ "$CONDA_FORGE" ]; then + # add conda-forge channel as priority + conda config --add channels conda-forge || exit 1 fi -# may have installation instructions for this build -INSTALL="ci/install-${PYTHON_VERSION}${JOB_TAG}.sh" -if [ -e ${INSTALL} ]; then - time bash $INSTALL || exit 1 +# Useful for debugging any issues with conda +conda info -a || exit 1 + +# set the compiler cache to work +echo +if [ -z "$NOCACHE" ] && [ "${TRAVIS_OS_NAME}" == "linux" ]; then + echo "[Using ccache]" + export PATH=/usr/lib/ccache:/usr/lib64/ccache:$PATH + gcc=$(which gcc) + echo "[gcc]: $gcc" + ccache=$(which ccache) + echo "[ccache]: $ccache" + export CC='ccache gcc' +elif [ -z "$NOCACHE" ] && [ "${TRAVIS_OS_NAME}" == "osx" ]; then + echo "[Install ccache]" + brew install ccache > /dev/null 2>&1 + echo "[Using ccache]" + export 
PATH=/usr/local/opt/ccache/libexec:$PATH + gcc=$(which gcc) + echo "[gcc]: $gcc" + ccache=$(which ccache) + echo "[ccache]: $ccache" else + echo "[Not using ccache]" +fi - # create new env - time conda create -n pandas python=$PYTHON_VERSION nose || exit 1 +echo +echo "[create env]" - if [ "$COVERAGE" ]; then - pip install coverage - fi - if [ "$LINT" ]; then - conda install flake8 - pip install cpplint - fi -fi +# create our environment +REQ="ci/requirements-${JOB}.build" +time conda create -n pandas --file=${REQ} || exit 1 -# build deps -echo "[build installs]" -REQ="ci/requirements-${PYTHON_VERSION}${JOB_TAG}.build" -if [ -e ${REQ} ]; then - time conda install -n pandas --file=${REQ} || exit 1 -fi +source activate pandas # may have addtl installation instructions for this build +echo echo "[build addtl installs]" -REQ="ci/requirements-${PYTHON_VERSION}${JOB_TAG}.build.sh" +REQ="ci/requirements-${JOB}.build.sh" if [ -e ${REQ} ]; then time bash $REQ || exit 1 fi -source activate pandas +time conda install -n pandas pytest>=3.1.0 +time pip install pytest-xdist -if [ "$BUILD_TEST" ]; then +if [ "$LINT" ]; then + conda install flake8 + pip install cpplint +fi - # build testing - pip uninstall --yes cython - pip install cython==0.23 - ( python setup.py build_ext --inplace && python setup.py develop ) || true +if [ "$COVERAGE" ]; then + pip install coverage pytest-cov +fi -else +echo +if [ -z "$BUILD_TEST" ]; then # build but don't install echo "[build em]" time python setup.py build_ext --inplace || exit 1 - # we may have run installations - echo "[conda installs]" - REQ="ci/requirements-${PYTHON_VERSION}${JOB_TAG}.run" - if [ -e ${REQ} ]; then - time conda install -n pandas --file=${REQ} || exit 1 - fi +fi - # we may have additional pip installs - echo "[pip installs]" - REQ="ci/requirements-${PYTHON_VERSION}${JOB_TAG}.pip" - if [ -e ${REQ} ]; then - pip install --upgrade -r $REQ - fi +# we may have run installations +echo +echo "[conda installs]" +REQ="ci/requirements-${JOB}.run" +if [ -e ${REQ} ]; then + time conda install -n pandas --file=${REQ} || exit 1 +fi - # may have addtl installation instructions for this build - echo "[addtl installs]" - REQ="ci/requirements-${PYTHON_VERSION}${JOB_TAG}.sh" - if [ -e ${REQ} ]; then - time bash $REQ || exit 1 - fi +# we may have additional pip installs +echo +echo "[pip installs]" +REQ="ci/requirements-${JOB}.pip" +if [ -e ${REQ} ]; then + pip install -r $REQ +fi - # remove any installed pandas package - # w/o removing anything else - echo "[removing installed pandas]" - conda remove pandas --force +# may have addtl installation instructions for this build +echo +echo "[addtl installs]" +REQ="ci/requirements-${JOB}.sh" +if [ -e ${REQ} ]; then + time bash $REQ || exit 1 +fi + +# remove any installed pandas package +# w/o removing anything else +echo +echo "[removing installed pandas]" +conda remove pandas -y --force +pip uninstall -y pandas + +if [ "$BUILD_TEST" ]; then + + # remove any installation + pip uninstall -y pandas + conda list pandas + pip list --format columns |grep pandas + + # build & install testing + echo ["building release"] + bash scripts/build_dist_for_release.sh + conda uninstall -y cython + time pip install dist/*tar.gz || exit 1 + +else # install our pandas + echo echo "[running setup.py develop]" python setup.py develop || exit 1 fi +echo +echo "[show pandas]" +conda list pandas + +echo echo "[done]" exit 0 diff --git a/ci/lint.sh b/ci/lint.sh index 2cbfdadf486b8..ed3af2568811c 100755 --- a/ci/lint.sh +++ b/ci/lint.sh @@ -8,9 
+8,9 @@ RET=0 if [ "$LINT" ]; then - # pandas/src is C code, so no need to search there. + # pandas/_libs/src is C code, so no need to search there. echo "Linting *.py" - flake8 pandas --filename=*.py --exclude pandas/src + flake8 pandas --filename=*.py --exclude pandas/_libs/src if [ $? -ne "0" ]; then RET=1 fi @@ -46,8 +46,8 @@ if [ "$LINT" ]; then echo "Linting *.c and *.h" for path in '*.h' 'period_helper.c' 'datetime' 'parser' 'ujson' do - echo "linting -> pandas/src/$path" - cpplint --quiet --extensions=c,h --headers=h --filter=-readability/casting,-runtime/int,-build/include_subdir --recursive pandas/src/$path + echo "linting -> pandas/_libs/src/$path" + cpplint --quiet --extensions=c,h --headers=h --filter=-readability/casting,-runtime/int,-build/include_subdir --recursive pandas/_libs/src/$path if [ $? -ne "0" ]; then RET=1 fi @@ -55,7 +55,7 @@ if [ "$LINT" ]; then echo "Linting *.c and *.h DONE" echo "Check for invalid testing" - grep -r -E --include '*.py' --exclude nosetester.py --exclude testing.py '(numpy|np)\.testing' pandas + grep -r -E --include '*.py' --exclude testing.py '(numpy|np)\.testing' pandas if [ $? = "0" ]; then RET=1 fi diff --git a/ci/prep_cython_cache.sh b/ci/prep_cython_cache.sh index e091bb00ccedc..18d9388327ddc 100755 --- a/ci/prep_cython_cache.sh +++ b/ci/prep_cython_cache.sh @@ -22,7 +22,7 @@ fi home_dir=$(pwd) -if [ -f "$CACHE_File" ] && [ "$USE_CACHE" ] && [ -d "$PYX_CACHE_DIR" ]; then +if [ -f "$CACHE_File" ] && [ -z "$NOCACHE" ] && [ -d "$PYX_CACHE_DIR" ]; then echo "Cache available - checking pyx diff" @@ -57,16 +57,16 @@ if [ -f "$CACHE_File" ] && [ "$USE_CACHE" ] && [ -d "$PYX_CACHE_DIR" ]; then fi -if [ $clear_cache -eq 0 ] && [ "$USE_CACHE" ] +if [ $clear_cache -eq 0 ] && [ -z "$NOCACHE" ] then - # No and use_cache is set + # No and nocache is not set echo "Will reuse cached cython file" cd / tar xvmf $CACHE_File cd $home_dir else echo "Rebuilding cythonized files" - echo "Use cache (Blank if not set) = $USE_CACHE" + echo "No cache = $NOCACHE" echo "Clear cache (1=YES) = $clear_cache" fi diff --git a/ci/print_skipped.py b/ci/print_skipped.py index 9fb05df64bcea..dd2180f6eeb19 100755 --- a/ci/print_skipped.py +++ b/ci/print_skipped.py @@ -30,20 +30,21 @@ def parse_results(filename): i += 1 assert i - 1 == len(skipped) assert i - 1 == len(skipped) - assert len(skipped) == int(root.attrib['skip']) + # assert len(skipped) == int(root.attrib['skip']) return '\n'.join(skipped) def main(args): print('SKIPPED TESTS:') - print(parse_results(args.filename)) + for fn in args.filename: + print(parse_results(fn)) return 0 def parse_args(): import argparse parser = argparse.ArgumentParser() - parser.add_argument('filename', help='XUnit file to parse') + parser.add_argument('filename', nargs='+', help='XUnit file to parse') return parser.parse_args() diff --git a/ci/requirements-2.7.build b/ci/requirements-2.7.build index 836385671d603..415df13179fcf 100644 --- a/ci/requirements-2.7.build +++ b/ci/requirements-2.7.build @@ -1,4 +1,6 @@ +python=2.7* python-dateutil=2.4.1 pytz=2013b +nomkl numpy cython=0.23 diff --git a/ci/requirements-2.7.pip b/ci/requirements-2.7.pip index d16b932c8be4f..876d9e978fa84 100644 --- a/ci/requirements-2.7.pip +++ b/ci/requirements-2.7.pip @@ -1,9 +1,10 @@ blosc -httplib2 -google-api-python-client==1.2 -python-gflags==2.0 -oauth2client==1.5.0 +pandas-gbq +html5lib +beautifulsoup4 pathlib backports.lzma py PyCrypto +mock +ipython diff --git a/ci/requirements-2.7.run b/ci/requirements-2.7.run index 2bfb8a3777fdf..7152cb2c8b605 100644 
--- a/ci/requirements-2.7.run +++ b/ci/requirements-2.7.run @@ -10,14 +10,11 @@ xlrd=0.9.2 sqlalchemy=0.9.6 lxml=3.2.1 scipy -xlsxwriter=0.4.6 +xlsxwriter=0.5.2 s3fs bottleneck -psycopg2=2.5.2 +psycopg2 patsy pymysql=0.6.3 -html5lib=1.0b2 -beautiful-soup=4.2.1 -statsmodels jinja2=2.8 -xarray +xarray=0.8.0 diff --git a/ci/requirements-2.7.sh b/ci/requirements-2.7.sh index 64d470e5c6e0e..e3bd5e46026c5 100644 --- a/ci/requirements-2.7.sh +++ b/ci/requirements-2.7.sh @@ -4,4 +4,4 @@ source activate pandas echo "install 27" -conda install -n pandas -c conda-forge feather-format +conda install -n pandas -c conda-forge feather-format pyarrow=0.4.1 fastparquet diff --git a/ci/requirements-2.7_BUILD_TEST.build b/ci/requirements-2.7_BUILD_TEST.build index faf1e3559f7f1..aadec00cb7ebf 100644 --- a/ci/requirements-2.7_BUILD_TEST.build +++ b/ci/requirements-2.7_BUILD_TEST.build @@ -1,4 +1,6 @@ +python=2.7* dateutil pytz +nomkl numpy cython diff --git a/ci/requirements-2.7_BUILD_TEST.pip b/ci/requirements-2.7_BUILD_TEST.pip new file mode 100644 index 0000000000000..a0fc77c40bc00 --- /dev/null +++ b/ci/requirements-2.7_BUILD_TEST.pip @@ -0,0 +1,7 @@ +xarray +geopandas +seaborn +pandas_gbq +pandas_datareader +statsmodels +scikit-learn diff --git a/ci/requirements-2.7_BUILD_TEST.sh b/ci/requirements-2.7_BUILD_TEST.sh new file mode 100755 index 0000000000000..78941fd0944e5 --- /dev/null +++ b/ci/requirements-2.7_BUILD_TEST.sh @@ -0,0 +1,7 @@ +#!/bin/bash + +source activate pandas + +echo "install 27 BUILD_TEST" + +conda install -n pandas -c conda-forge pyarrow dask diff --git a/ci/requirements-2.7_COMPAT.build b/ci/requirements-2.7_COMPAT.build index 95e3da03f161b..d9c932daa110b 100644 --- a/ci/requirements-2.7_COMPAT.build +++ b/ci/requirements-2.7_COMPAT.build @@ -1,4 +1,5 @@ -numpy=1.7.1 +python=2.7* +numpy=1.9.2 cython=0.23 dateutil=1.5 pytz=2013b diff --git a/ci/requirements-2.7_COMPAT.pip b/ci/requirements-2.7_COMPAT.pip index 9533a630d06a4..13cd35a923124 100644 --- a/ci/requirements-2.7_COMPAT.pip +++ b/ci/requirements-2.7_COMPAT.pip @@ -1,2 +1,4 @@ +html5lib==1.0b2 +beautifulsoup4==4.2.0 openpyxl argparse diff --git a/ci/requirements-2.7_COMPAT.run b/ci/requirements-2.7_COMPAT.run index 32d71beb24388..39bf720140733 100644 --- a/ci/requirements-2.7_COMPAT.run +++ b/ci/requirements-2.7_COMPAT.run @@ -1,17 +1,14 @@ -numpy=1.7.1 +numpy=1.9.2 dateutil=1.5 pytz=2013b -scipy=0.11.0 +scipy=0.14.0 xlwt=0.7.5 xlrd=0.9.2 -statsmodels=0.4.3 -bottleneck=0.8.0 -numexpr=2.2.2 -pytables=3.0.0 -html5lib=1.0b2 -beautiful-soup=4.2.0 -psycopg2=2.5.1 +bottleneck=1.0.0 +numexpr=2.4.4 # we test that we correctly don't use an unsupported numexpr +pytables=3.2.2 +psycopg2 pymysql=0.6.0 sqlalchemy=0.7.8 -xlsxwriter=0.4.6 +xlsxwriter=0.5.2 jinja2=2.8 diff --git a/ci/requirements-2.7_LOCALE.build b/ci/requirements-2.7_LOCALE.build index c17730b912651..96cb184ec2665 100644 --- a/ci/requirements-2.7_LOCALE.build +++ b/ci/requirements-2.7_LOCALE.build @@ -1,4 +1,5 @@ +python=2.7* python-dateutil pytz=2013b -numpy=1.7.1 +numpy=1.9.2 cython=0.23 diff --git a/ci/requirements-2.7_LOCALE.pip b/ci/requirements-2.7_LOCALE.pip index cf8e6b8b3d3a6..1b825bbf492ca 100644 --- a/ci/requirements-2.7_LOCALE.pip +++ b/ci/requirements-2.7_LOCALE.pip @@ -1 +1,3 @@ +html5lib==1.0b2 +beautifulsoup4==4.2.1 blosc diff --git a/ci/requirements-2.7_LOCALE.run b/ci/requirements-2.7_LOCALE.run index 9bb37ee10f8db..00006106f7009 100644 --- a/ci/requirements-2.7_LOCALE.run +++ b/ci/requirements-2.7_LOCALE.run @@ -1,17 +1,12 @@ python-dateutil pytz=2013b 
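The `numexpr=2.4.4` pin above, per its comment, exercises pandas' minimum-version guard for numexpr. A hypothetical sketch of that kind of guard; the constant, helper name, and threshold here are illustrative, not pandas' actual internals:

```python
# Hypothetical version gate of the sort the pinned numexpr above is meant
# to exercise: below some minimum version, fall back to plain evaluation.
from distutils.version import LooseVersion

MIN_NUMEXPR_VERSION = '2.4.6'   # illustrative; check pandas for the real value


def numexpr_usable():
    try:
        import numexpr
    except ImportError:
        return False
    return LooseVersion(numexpr.__version__) >= LooseVersion(MIN_NUMEXPR_VERSION)
```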
-numpy=1.7.1 +numpy=1.9.2 xlwt=0.7.5 openpyxl=1.6.2 -xlsxwriter=0.4.6 +xlsxwriter=0.5.2 xlrd=0.9.2 -bottleneck=0.8.0 -matplotlib=1.2.1 -patsy=0.1.0 +bottleneck=1.0.0 +matplotlib=1.4.3 sqlalchemy=0.8.1 -html5lib=1.0b2 lxml=3.2.1 -scipy=0.11.0 -beautiful-soup=4.2.1 -statsmodels=0.4.3 -bigquery=2.0.17 +scipy diff --git a/ci/requirements-2.7_SLOW.build b/ci/requirements-2.7_SLOW.build index 664e8b418def7..a665ab9edd585 100644 --- a/ci/requirements-2.7_SLOW.build +++ b/ci/requirements-2.7_SLOW.build @@ -1,4 +1,5 @@ +python=2.7* python-dateutil pytz -numpy=1.8.2 +numpy=1.10* cython diff --git a/ci/requirements-2.7_SLOW.run b/ci/requirements-2.7_SLOW.run index 630d22636f284..f7708283ad04a 100644 --- a/ci/requirements-2.7_SLOW.run +++ b/ci/requirements-2.7_SLOW.run @@ -1,10 +1,9 @@ python-dateutil pytz -numpy=1.8.2 -matplotlib=1.3.1 +numpy=1.10* +matplotlib=1.4.3 scipy patsy -statsmodels xlwt openpyxl xlsxwriter @@ -14,7 +13,6 @@ pytables sqlalchemy lxml s3fs -bottleneck psycopg2 pymysql html5lib diff --git a/ci/requirements-2.7-64.run b/ci/requirements-2.7_WIN.run similarity index 100% rename from ci/requirements-2.7-64.run rename to ci/requirements-2.7_WIN.run diff --git a/ci/requirements-3.4-64.run b/ci/requirements-3.4-64.run deleted file mode 100644 index 106cc5b7168ba..0000000000000 --- a/ci/requirements-3.4-64.run +++ /dev/null @@ -1,12 +0,0 @@ -python-dateutil -pytz -numpy=1.9* -openpyxl -xlsxwriter -xlrd -xlwt -scipy -numexpr -pytables -bottleneck -jinja2=2.8 diff --git a/ci/requirements-3.4.build b/ci/requirements-3.4.build deleted file mode 100644 index e6e59dcba63fe..0000000000000 --- a/ci/requirements-3.4.build +++ /dev/null @@ -1,3 +0,0 @@ -numpy=1.8.1 -cython=0.24.1 -libgfortran=1.0 diff --git a/ci/requirements-3.4.pip b/ci/requirements-3.4.pip deleted file mode 100644 index 55986a0220bf0..0000000000000 --- a/ci/requirements-3.4.pip +++ /dev/null @@ -1,5 +0,0 @@ -python-dateutil==2.2 -blosc -httplib2 -google-api-python-client -oauth2client diff --git a/ci/requirements-3.4.run b/ci/requirements-3.4.run deleted file mode 100644 index 3e12adae7dd9f..0000000000000 --- a/ci/requirements-3.4.run +++ /dev/null @@ -1,18 +0,0 @@ -pytz=2015.7 -numpy=1.8.1 -openpyxl -xlsxwriter -xlrd -xlwt -html5lib -patsy -beautiful-soup -scipy -numexpr -pytables -lxml -sqlalchemy -bottleneck -pymysql=0.6.3 -psycopg2 -jinja2=2.8 diff --git a/ci/requirements-3.4_SLOW.sh b/ci/requirements-3.4_SLOW.sh deleted file mode 100644 index 24f1e042ed69e..0000000000000 --- a/ci/requirements-3.4_SLOW.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash - -source activate pandas - -echo "install 34_slow" - -conda install -n pandas -c conda-forge matplotlib diff --git a/ci/requirements-3.5.build b/ci/requirements-3.5.build index 9558cf00ddf5c..76227e106e1fd 100644 --- a/ci/requirements-3.5.build +++ b/ci/requirements-3.5.build @@ -1,4 +1,6 @@ +python=3.5* python-dateutil pytz -numpy +nomkl +numpy=1.11.3 cython diff --git a/ci/requirements-3.5.pip b/ci/requirements-3.5.pip new file mode 100644 index 0000000000000..6e4f7b65f9728 --- /dev/null +++ b/ci/requirements-3.5.pip @@ -0,0 +1,2 @@ +xarray==0.9.1 +pandas-gbq diff --git a/ci/requirements-3.5.run b/ci/requirements-3.5.run index e15ca6079b4fe..52828b5220997 100644 --- a/ci/requirements-3.5.run +++ b/ci/requirements-3.5.run @@ -1,6 +1,5 @@ -python-dateutil pytz -numpy +numpy=1.11.3 openpyxl xlsxwriter xlrd @@ -16,6 +15,6 @@ bottleneck sqlalchemy pymysql psycopg2 -xarray s3fs beautifulsoup4 +ipython diff --git a/ci/requirements-3.5.sh b/ci/requirements-3.5.sh index 
d0f0b81802dc6..33db9c28c78a9 100644 --- a/ci/requirements-3.5.sh +++ b/ci/requirements-3.5.sh @@ -4,4 +4,8 @@ source activate pandas echo "install 35" -conda install -n pandas -c conda-forge feather-format +# pip install python-dateutil to get latest +conda remove -n pandas python-dateutil --force +pip install python-dateutil + +conda install -n pandas -c conda-forge feather-format pyarrow=0.4.1 diff --git a/ci/requirements-3.5_ASCII.build b/ci/requirements-3.5_ASCII.build index 9558cf00ddf5c..f7befe3b31865 100644 --- a/ci/requirements-3.5_ASCII.build +++ b/ci/requirements-3.5_ASCII.build @@ -1,4 +1,6 @@ +python=3.5* python-dateutil pytz +nomkl numpy cython diff --git a/ci/requirements-3.5_DOC_BUILD.sh b/ci/requirements-3.5_DOC_BUILD.sh deleted file mode 100644 index ca18ad976d46d..0000000000000 --- a/ci/requirements-3.5_DOC_BUILD.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash - -source activate pandas - -echo "install DOC_BUILD" - -conda install -n pandas -c conda-forge feather-format diff --git a/ci/requirements-3.5_NUMPY_DEV.build.sh b/ci/requirements-3.5_NUMPY_DEV.build.sh deleted file mode 100644 index 91fa15491bbf7..0000000000000 --- a/ci/requirements-3.5_NUMPY_DEV.build.sh +++ /dev/null @@ -1,13 +0,0 @@ -#!/bin/bash - -source activate pandas - -echo "install numpy master wheel" - -# remove the system installed numpy -pip uninstall numpy -y - -# install numpy wheel from master -pip install --pre --upgrade --no-index --timeout=60 --trusted-host travis-dev-wheels.scipy.org -f http://travis-dev-wheels.scipy.org/ numpy scipy - -true diff --git a/ci/requirements-3.5_NUMPY_DEV.run b/ci/requirements-3.5_NUMPY_DEV.run deleted file mode 100644 index 0aa987baefb1d..0000000000000 --- a/ci/requirements-3.5_NUMPY_DEV.run +++ /dev/null @@ -1,2 +0,0 @@ -python-dateutil -pytz diff --git a/ci/requirements-3.5_OSX.build b/ci/requirements-3.5_OSX.build index a201be352b8e4..f5bc01b67a20a 100644 --- a/ci/requirements-3.5_OSX.build +++ b/ci/requirements-3.5_OSX.build @@ -1,2 +1,4 @@ +python=3.5* +nomkl numpy=1.10.4 cython diff --git a/ci/requirements-3.5_OSX.sh b/ci/requirements-3.5_OSX.sh index cfbd2882a8a2d..c2978b175968c 100644 --- a/ci/requirements-3.5_OSX.sh +++ b/ci/requirements-3.5_OSX.sh @@ -4,4 +4,4 @@ source activate pandas echo "install 35_OSX" -conda install -n pandas -c conda-forge feather-format +conda install -n pandas -c conda-forge feather-format==0.3.1 fastparquet diff --git a/ci/requirements-3.6-64.run b/ci/requirements-3.6-64.run deleted file mode 100644 index 58ba103504b2c..0000000000000 --- a/ci/requirements-3.6-64.run +++ /dev/null @@ -1,13 +0,0 @@ -python-dateutil -pytz -numpy -openpyxl -xlsxwriter -xlrd -#xlwt -scipy -feather-format -numexpr -pytables -matplotlib -blosc diff --git a/ci/requirements-3.6.build b/ci/requirements-3.6.build index 9558cf00ddf5c..1c4b46aea3865 100644 --- a/ci/requirements-3.6.build +++ b/ci/requirements-3.6.build @@ -1,4 +1,6 @@ +python=3.6* python-dateutil pytz +nomkl numpy cython diff --git a/ci/requirements-3.6.pip b/ci/requirements-3.6.pip index e69de29bb2d1d..753a60d6c119a 100644 --- a/ci/requirements-3.6.pip +++ b/ci/requirements-3.6.pip @@ -0,0 +1 @@ +brotlipy diff --git a/ci/requirements-3.6.run b/ci/requirements-3.6.run index 5d9cb05a7b402..822144a80bc9a 100644 --- a/ci/requirements-3.6.run +++ b/ci/requirements-3.6.run @@ -14,7 +14,12 @@ html5lib jinja2 sqlalchemy pymysql -# psycopg2 (not avail on defaults ATM) +feather-format +pyarrow +psycopg2 +python-snappy +fastparquet beautifulsoup4 s3fs xarray +ipython diff --git a/ci/requirements-3.6.sh 
b/ci/requirements-3.6.sh deleted file mode 100644 index 7d88ede751ec8..0000000000000 --- a/ci/requirements-3.6.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash - -source activate pandas - -echo "install 36" - -conda install -n pandas -c conda-forge feather-format diff --git a/ci/requirements-3.5_DOC_BUILD.build b/ci/requirements-3.6_DOC.build similarity index 73% rename from ci/requirements-3.5_DOC_BUILD.build rename to ci/requirements-3.6_DOC.build index 9558cf00ddf5c..bdcfe28105866 100644 --- a/ci/requirements-3.5_DOC_BUILD.build +++ b/ci/requirements-3.6_DOC.build @@ -1,3 +1,4 @@ +python=3.6* python-dateutil pytz numpy diff --git a/ci/requirements-3.5_DOC_BUILD.run b/ci/requirements-3.6_DOC.run similarity index 71% rename from ci/requirements-3.5_DOC_BUILD.run rename to ci/requirements-3.6_DOC.run index 644a16f51f4b6..6c45e3371e9cf 100644 --- a/ci/requirements-3.5_DOC_BUILD.run +++ b/ci/requirements-3.6_DOC.run @@ -1,16 +1,19 @@ ipython ipykernel -sphinx +ipywidgets +sphinx=1.5* nbconvert nbformat notebook matplotlib +seaborn scipy lxml beautifulsoup4 html5lib pytables -openpyxl=1.8.5 +python-snappy +openpyxl xlrd xlwt xlsxwriter @@ -18,4 +21,5 @@ sqlalchemy numexpr bottleneck statsmodels -pyqt=4.11.4 +xarray +pyqt diff --git a/ci/requirements-3.6_DOC.sh b/ci/requirements-3.6_DOC.sh new file mode 100644 index 0000000000000..aec0f62148622 --- /dev/null +++ b/ci/requirements-3.6_DOC.sh @@ -0,0 +1,11 @@ +#!/bin/bash + +source activate pandas + +echo "[install DOC_BUILD deps]" + +pip install pandas-gbq + +conda install -n pandas -c conda-forge feather-format pyarrow nbsphinx pandoc fastparquet + +conda install -n pandas -c r r rpy2 --yes diff --git a/ci/requirements-3.4_SLOW.build b/ci/requirements-3.6_LOCALE.build similarity index 53% rename from ci/requirements-3.4_SLOW.build rename to ci/requirements-3.6_LOCALE.build index c05a68a14b402..1c4b46aea3865 100644 --- a/ci/requirements-3.4_SLOW.build +++ b/ci/requirements-3.6_LOCALE.build @@ -1,4 +1,6 @@ +python=3.6* python-dateutil pytz -numpy=1.10* +nomkl +numpy cython diff --git a/bench/larry.py b/ci/requirements-3.6_LOCALE.pip similarity index 100% rename from bench/larry.py rename to ci/requirements-3.6_LOCALE.pip diff --git a/ci/requirements-3.4_SLOW.run b/ci/requirements-3.6_LOCALE.run similarity index 53% rename from ci/requirements-3.4_SLOW.run rename to ci/requirements-3.6_LOCALE.run index 215f840381ada..ad54284c6f7e3 100644 --- a/ci/requirements-3.4_SLOW.run +++ b/ci/requirements-3.6_LOCALE.run @@ -1,21 +1,22 @@ python-dateutil pytz -numpy=1.10* +numpy +scipy openpyxl xlsxwriter xlrd xlwt -html5lib -patsy -beautiful-soup -scipy -numexpr=2.4.4 +numexpr pytables matplotlib lxml +html5lib +jinja2 sqlalchemy -bottleneck pymysql +# feather-format (not available on defaults ATM) psycopg2 -statsmodels -jinja2=2.8 +beautifulsoup4 +s3fs +xarray +ipython diff --git a/ci/requirements-3.5_NUMPY_DEV.build b/ci/requirements-3.6_LOCALE_SLOW.build similarity index 53% rename from ci/requirements-3.5_NUMPY_DEV.build rename to ci/requirements-3.6_LOCALE_SLOW.build index d15edbfa3d2c1..1c4b46aea3865 100644 --- a/ci/requirements-3.5_NUMPY_DEV.build +++ b/ci/requirements-3.6_LOCALE_SLOW.build @@ -1,3 +1,6 @@ +python=3.6* python-dateutil pytz +nomkl +numpy cython diff --git a/pandas/api/tests/__init__.py b/ci/requirements-3.6_LOCALE_SLOW.pip similarity index 100% rename from pandas/api/tests/__init__.py rename to ci/requirements-3.6_LOCALE_SLOW.pip diff --git a/ci/requirements-3.6_LOCALE_SLOW.run b/ci/requirements-3.6_LOCALE_SLOW.run new file mode 100644 
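Several of the run dependencies added above (`feather-format`, `pyarrow`, `fastparquet`, `python-snappy`) back pandas' binary columnar I/O. A quick round-trip sketch, assuming a pandas version that ships `to_parquet`/`read_parquet` and `to_feather`/`read_feather`:

```python
# Round-trip through the formats the new CI dependencies enable.
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

df.to_parquet('tmp.parquet', engine='pyarrow')   # or engine='fastparquet'
assert pd.read_parquet('tmp.parquet', engine='pyarrow').equals(df)

df.to_feather('tmp.feather')                     # backed by pyarrow/feather-format
assert pd.read_feather('tmp.feather').equals(df)
```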
index 0000000000000..ad54284c6f7e3 --- /dev/null +++ b/ci/requirements-3.6_LOCALE_SLOW.run @@ -0,0 +1,22 @@ +python-dateutil +pytz +numpy +scipy +openpyxl +xlsxwriter +xlrd +xlwt +numexpr +pytables +matplotlib +lxml +html5lib +jinja2 +sqlalchemy +pymysql +# feather-format (not available on defaults ATM) +psycopg2 +beautifulsoup4 +s3fs +xarray +ipython diff --git a/ci/requirements-3.6_NUMPY_DEV.build b/ci/requirements-3.6_NUMPY_DEV.build new file mode 100644 index 0000000000000..900c050f1cc9e --- /dev/null +++ b/ci/requirements-3.6_NUMPY_DEV.build @@ -0,0 +1,3 @@ +python=3.6* +pytz +cython diff --git a/ci/requirements-3.6_NUMPY_DEV.build.sh b/ci/requirements-3.6_NUMPY_DEV.build.sh new file mode 100644 index 0000000000000..90ed04f8f0c17 --- /dev/null +++ b/ci/requirements-3.6_NUMPY_DEV.build.sh @@ -0,0 +1,17 @@ +#!/bin/bash + +source activate pandas + +echo "install numpy master wheel" + +# remove the system installed numpy +pip uninstall numpy -y + +# install numpy wheel from master +PRE_WHEELS="https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com" +pip install --pre --upgrade --timeout=60 -f $PRE_WHEELS numpy scipy + +# install dateutil from master +pip install -U git+git://github.com/dateutil/dateutil.git + +true diff --git a/ci/requirements-3.6_NUMPY_DEV.run b/ci/requirements-3.6_NUMPY_DEV.run new file mode 100644 index 0000000000000..af44f198c687e --- /dev/null +++ b/ci/requirements-3.6_NUMPY_DEV.run @@ -0,0 +1 @@ +pytz diff --git a/ci/requirements-3.5-64.run b/ci/requirements-3.6_WIN.run similarity index 67% rename from ci/requirements-3.5-64.run rename to ci/requirements-3.6_WIN.run index 905c2ff3625bd..226caa458f6ee 100644 --- a/ci/requirements-3.5-64.run +++ b/ci/requirements-3.6_WIN.run @@ -1,13 +1,17 @@ python-dateutil pytz -numpy +numpy=1.12* +bottleneck openpyxl xlsxwriter xlrd xlwt scipy feather-format +pyarrow numexpr pytables matplotlib blosc +fastparquet +pyarrow diff --git a/ci/requirements_all.txt b/ci/requirements_all.txt index bc97957bff2b7..b153b6989df86 100644 --- a/ci/requirements_all.txt +++ b/ci/requirements_all.txt @@ -1,6 +1,9 @@ -nose +pytest>=3.1.0 +pytest-cov +pytest-xdist flake8 -sphinx +sphinx=1.5* +nbsphinx ipython python-dateutil pytz @@ -17,6 +20,7 @@ scipy numexpr pytables matplotlib +seaborn lxml sqlalchemy bottleneck diff --git a/ci/requirements_dev.txt b/ci/requirements_dev.txt index 7396fba6548d9..c7190c506ba18 100644 --- a/ci/requirements_dev.txt +++ b/ci/requirements_dev.txt @@ -2,5 +2,6 @@ python-dateutil pytz numpy cython -nose +pytest>=3.1.0 +pytest-cov flake8 diff --git a/ci/run_circle.sh b/ci/run_circle.sh new file mode 100755 index 0000000000000..0e46d28ab6fc4 --- /dev/null +++ b/ci/run_circle.sh @@ -0,0 +1,9 @@ +#!/usr/bin/env bash + +echo "[running tests]" +export PATH="$MINICONDA_DIR/bin:$PATH" + +source activate pandas + +echo "pytest --junitxml=$CIRCLE_TEST_REPORTS/reports/junit.xml $@ pandas" +pytest --junitxml=$CIRCLE_TEST_REPORTS/reports/junit.xml $@ pandas diff --git a/ci/script.sh b/ci/script.sh deleted file mode 100755 index e2ba883b81883..0000000000000 --- a/ci/script.sh +++ /dev/null @@ -1,32 +0,0 @@ -#!/bin/bash - -echo "inside $0" - -source activate pandas - -# don't run the tests for the doc build -if [ x"$DOC_BUILD" != x"" ]; then - exit 0 -fi - -if [ -n "$LOCALE_OVERRIDE" ]; then - export LC_ALL="$LOCALE_OVERRIDE"; - echo "Setting LC_ALL to $LOCALE_OVERRIDE" - - pycmd='import pandas; print("pandas detected console encoding: %s" % pandas.get_option("display.encoding"))' - python -c "$pycmd" -fi - 
-if [ "$BUILD_TEST" ]; then - echo "We are not running nosetests as this is simply a build test." -elif [ "$COVERAGE" ]; then - echo nosetests --exe -A "$NOSE_ARGS" pandas --with-coverage --with-xunit --xunit-file=/tmp/nosetests.xml - nosetests --exe -A "$NOSE_ARGS" pandas --with-coverage --cover-package=pandas --cover-tests --with-xunit --xunit-file=/tmp/nosetests.xml -else - echo nosetests --exe -A "$NOSE_ARGS" pandas --doctest-tests --with-xunit --xunit-file=/tmp/nosetests.xml - nosetests --exe -A "$NOSE_ARGS" pandas --doctest-tests --with-xunit --xunit-file=/tmp/nosetests.xml -fi - -RET="$?" - -exit "$RET" diff --git a/ci/script_multi.sh b/ci/script_multi.sh new file mode 100755 index 0000000000000..ee9fbcaad5ef5 --- /dev/null +++ b/ci/script_multi.sh @@ -0,0 +1,52 @@ +#!/bin/bash + +echo "[script multi]" + +source activate pandas + +if [ -n "$LOCALE_OVERRIDE" ]; then + export LC_ALL="$LOCALE_OVERRIDE"; + echo "Setting LC_ALL to $LOCALE_OVERRIDE" + + pycmd='import pandas; print("pandas detected console encoding: %s" % pandas.get_option("display.encoding"))' + python -c "$pycmd" +fi + +# Workaround for pytest-xdist flaky collection order +# https://github.com/pytest-dev/pytest/issues/920 +# https://github.com/pytest-dev/pytest/issues/1075 +export PYTHONHASHSEED=$(python -c 'import random; print(random.randint(1, 4294967295))') +echo PYTHONHASHSEED=$PYTHONHASHSEED + +if [ "$BUILD_TEST" ]; then + echo "[build-test]" + + echo "[env]" + pip list --format columns |grep pandas + + echo "[running]" + cd /tmp + unset PYTHONPATH + python -c 'import pandas; pandas.test(["-n 2", "--skip-slow", "--skip-network", "-r xX", "-m not single"])' + +elif [ "$DOC" ]; then + echo "We are not running pytest as this is a doc-build" + +elif [ "$COVERAGE" ]; then + echo pytest -s -n 2 -m "not single" --cov=pandas --cov-report xml:/tmp/cov-multiple.xml --junitxml=/tmp/multiple.xml $TEST_ARGS pandas + pytest -s -n 2 -m "not single" --cov=pandas --cov-report xml:/tmp/cov-multiple.xml --junitxml=/tmp/multiple.xml $TEST_ARGS pandas + +elif [ "$SLOW" ]; then + TEST_ARGS="--only-slow --skip-network" + echo pytest -r xX -m "not single and slow" -v --junitxml=/tmp/multiple.xml $TEST_ARGS pandas + pytest -r xX -m "not single and slow" -v --junitxml=/tmp/multiple.xml $TEST_ARGS pandas + +else + echo pytest -n 2 -r xX -m "not single" --junitxml=/tmp/multiple.xml $TEST_ARGS pandas + pytest -n 2 -r xX -m "not single" --junitxml=/tmp/multiple.xml $TEST_ARGS pandas # TODO: doctest + +fi + +RET="$?" + +exit "$RET" diff --git a/ci/script_single.sh b/ci/script_single.sh new file mode 100755 index 0000000000000..375e9879e950f --- /dev/null +++ b/ci/script_single.sh @@ -0,0 +1,37 @@ +#!/bin/bash + +echo "[script_single]" + +source activate pandas + +if [ -n "$LOCALE_OVERRIDE" ]; then + export LC_ALL="$LOCALE_OVERRIDE"; + echo "Setting LC_ALL to $LOCALE_OVERRIDE" + + pycmd='import pandas; print("pandas detected console encoding: %s" % pandas.get_option("display.encoding"))' + python -c "$pycmd" +fi + +if [ "$SLOW" ]; then + TEST_ARGS="--only-slow --skip-network" +fi + +if [ "$BUILD_TEST" ]; then + echo "We are not running pytest as this is a build test." 
+ +elif [ "$DOC" ]; then + echo "We are not running pytest as this is a doc-build" + +elif [ "$COVERAGE" ]; then + echo pytest -s -m "single" --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas + pytest -s -m "single" --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas + +else + echo pytest -m "single" -r xX --junitxml=/tmp/single.xml $TEST_ARGS pandas + pytest -m "single" -r xX --junitxml=/tmp/single.xml $TEST_ARGS pandas # TODO: doctest + +fi + +RET="$?" + +exit "$RET" diff --git a/ci/show_circle.sh b/ci/show_circle.sh new file mode 100755 index 0000000000000..bfaa65c1d84f2 --- /dev/null +++ b/ci/show_circle.sh @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +echo "[installed versions]" + +export PATH="$MINICONDA_DIR/bin:$PATH" +source activate pandas + +python -c "import pandas; pandas.show_versions();" diff --git a/ci/submit_cython_cache.sh b/ci/submit_cython_cache.sh index cfbced4988357..b87acef0ba11c 100755 --- a/ci/submit_cython_cache.sh +++ b/ci/submit_cython_cache.sh @@ -9,7 +9,7 @@ rm -rf $PYX_CACHE_DIR home_dir=$(pwd) -mkdir $PYX_CACHE_DIR +mkdir -p $PYX_CACHE_DIR rsync -Rv $pyx_file_list $PYX_CACHE_DIR echo "pyx files:" diff --git a/ci/upload_coverage.sh b/ci/upload_coverage.sh new file mode 100755 index 0000000000000..a7ef2fa908079 --- /dev/null +++ b/ci/upload_coverage.sh @@ -0,0 +1,12 @@ +#!/bin/bash + +if [ -z "$COVERAGE" ]; then + echo "coverage is not selected for this build" + exit 0 +fi + +source activate pandas + +echo "uploading coverage" +bash <(curl -s https://codecov.io/bash) -Z -c -F single -f /tmp/cov-single.xml +bash <(curl -s https://codecov.io/bash) -Z -c -F multiple -f /tmp/cov-multiple.xml diff --git a/circle.yml b/circle.yml new file mode 100644 index 0000000000000..9d49145af54e3 --- /dev/null +++ b/circle.yml @@ -0,0 +1,38 @@ +machine: + environment: + # these are globally set + MINICONDA_DIR: /home/ubuntu/miniconda3 + + +database: + override: + - ./ci/install_db_circle.sh + + +checkout: + post: + # since circleci does a shallow fetch + # we need to populate our tags + - git fetch --depth=1000 + + +dependencies: + override: + - > + case $CIRCLE_NODE_INDEX in + 0) + sudo apt-get install language-pack-it && ./ci/install_circle.sh JOB="2.7_COMPAT" LOCALE_OVERRIDE="it_IT.UTF-8" ;; + 1) + sudo apt-get install language-pack-zh-hans && ./ci/install_circle.sh JOB="3.6_LOCALE" LOCALE_OVERRIDE="zh_CN.UTF-8" ;; + 2) + sudo apt-get install language-pack-zh-hans && ./ci/install_circle.sh JOB="3.6_LOCALE_SLOW" LOCALE_OVERRIDE="zh_CN.UTF-8" ;; + 3) + ./ci/install_circle.sh JOB="3.5_ASCII" LOCALE_OVERRIDE="C" ;; + esac + - ./ci/show_circle.sh + + +test: + override: + - case $CIRCLE_NODE_INDEX in 0) ./ci/run_circle.sh --skip-slow --skip-network ;; 1) ./ci/run_circle.sh --only-slow --skip-network ;; 2) ./ci/run_circle.sh --skip-slow --skip-network ;; 3) ./ci/run_circle.sh --skip-slow --skip-network ;; esac: + parallel: true diff --git a/codecov.yml b/codecov.yml index 45a6040c6a50d..512bc2e82a736 100644 --- a/codecov.yml +++ b/codecov.yml @@ -1,9 +1,13 @@ +codecov: + branch: master + coverage: status: project: default: + enabled: no target: '82' patch: default: + enabled: no target: '50' - branches: null diff --git a/doc/README.rst b/doc/README.rst index a3733846d9ed1..0ea3234dec348 100644 --- a/doc/README.rst +++ b/doc/README.rst @@ -81,7 +81,9 @@ have ``sphinx`` and ``ipython`` installed. 
`numpydoc `_ is used to parse the docstrings that follow the Numpy Docstring Standard (see above), but you don't need to install this because a local copy of ``numpydoc`` is included in the pandas source -code. +code. `nbsphinx `_ is used to convert +Jupyter notebooks. You will need to install it if you intend to modify any of +the notebooks included in the documentation. Furthermore, it is recommended to have all `optional dependencies `_ diff --git a/doc/_templates/api_redirect.html b/doc/_templates/api_redirect.html index 24bdd8363830f..c04a8b58ce544 100644 --- a/doc/_templates/api_redirect.html +++ b/doc/_templates/api_redirect.html @@ -1,15 +1,10 @@ -{% set pgn = pagename.split('.') -%} -{% if pgn[-2][0].isupper() -%} - {% set redirect = ["pandas", pgn[-2], pgn[-1], 'html']|join('.') -%} -{% else -%} - {% set redirect = ["pandas", pgn[-1], 'html']|join('.') -%} -{% endif -%} +{% set redirect = redirects[pagename.split("/")[-1]] %} - + This API page has moved -

-<body>
-  <p>
-    This API page has moved <a href="{{ redirect }}">here</a>.
-  </p>
-</body>
+<body>
+  <p>
+    This API page has moved <a href="{{ redirect }}.html">here</a>.
+  </p>
+</body>
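The template diff above replaces a naming heuristic with an explicit mapping. Rough Python equivalents of the two pieces of Jinja logic, for readability; the functions are editorial sketches, not code from the repository:

```python
# Old template: guess the target page from the dotted page name.
def old_redirect(pagename):
    pgn = pagename.split('.')
    if pgn[-2][0].isupper():                     # looks like Class.method
        return '.'.join(['pandas', pgn[-2], pgn[-1], 'html'])
    return '.'.join(['pandas', pgn[-1], 'html'])


# New template: look the page up in an explicit `redirects` mapping
# supplied by the Sphinx configuration.
def new_redirect(pagename, redirects):
    return redirects[pagename.split('/')[-1]]


print(old_redirect('generated/pandas.DataFrame.mean'))
# -> pandas.DataFrame.mean.html
```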
- \ No newline at end of file + diff --git a/doc/cheatsheet/Pandas_Cheat_Sheet.pdf b/doc/cheatsheet/Pandas_Cheat_Sheet.pdf index d504926d22580..0492805a1408b 100644 Binary files a/doc/cheatsheet/Pandas_Cheat_Sheet.pdf and b/doc/cheatsheet/Pandas_Cheat_Sheet.pdf differ diff --git a/doc/cheatsheet/Pandas_Cheat_Sheet.pptx b/doc/cheatsheet/Pandas_Cheat_Sheet.pptx index 76ae8f1e39d4e..6cca9ac4647f7 100644 Binary files a/doc/cheatsheet/Pandas_Cheat_Sheet.pptx and b/doc/cheatsheet/Pandas_Cheat_Sheet.pptx differ diff --git a/doc/cheatsheet/README.txt b/doc/cheatsheet/README.txt index e2f6ec042e9cc..d32fe5bcd05a6 100644 --- a/doc/cheatsheet/README.txt +++ b/doc/cheatsheet/README.txt @@ -2,3 +2,7 @@ The Pandas Cheat Sheet was created using Microsoft Powerpoint 2013. To create the PDF version, within Powerpoint, simply do a "Save As" and pick "PDF' as the format. +This cheat sheet was inspired by the RstudioData Wrangling Cheatsheet[1], written by Irv Lustig, Princeton Consultants[2]. + +[1]: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf +[2]: http://www.princetonoptimization.com/ diff --git a/doc/make.py b/doc/make.py index d46be2611ce3d..acef563f301e4 100755 --- a/doc/make.py +++ b/doc/make.py @@ -34,39 +34,52 @@ SPHINX_BUILD = 'sphinxbuild' -def upload_dev(user='pandas'): +def _process_user(user): + if user is None or user is False: + user = '' + else: + user = user + '@' + return user + + +def upload_dev(user=None): 'push a copy to the pydata dev directory' - if os.system('cd build/html; rsync -avz . {0}@pandas.pydata.org' + user = _process_user(user) + if os.system('cd build/html; rsync -avz . {0}pandas.pydata.org' ':/usr/share/nginx/pandas/pandas-docs/dev/ -essh'.format(user)): raise SystemExit('Upload to Pydata Dev failed') -def upload_dev_pdf(user='pandas'): +def upload_dev_pdf(user=None): 'push a copy to the pydata dev directory' - if os.system('cd build/latex; scp pandas.pdf {0}@pandas.pydata.org' + user = _process_user(user) + if os.system('cd build/latex; scp pandas.pdf {0}pandas.pydata.org' ':/usr/share/nginx/pandas/pandas-docs/dev/'.format(user)): raise SystemExit('PDF upload to Pydata Dev failed') -def upload_stable(user='pandas'): +def upload_stable(user=None): 'push a copy to the pydata stable directory' - if os.system('cd build/html; rsync -avz . {0}@pandas.pydata.org' + user = _process_user(user) + if os.system('cd build/html; rsync -avz . {0}pandas.pydata.org' ':/usr/share/nginx/pandas/pandas-docs/stable/ -essh'.format(user)): raise SystemExit('Upload to stable failed') -def upload_stable_pdf(user='pandas'): +def upload_stable_pdf(user=None): 'push a copy to the pydata dev directory' - if os.system('cd build/latex; scp pandas.pdf {0}@pandas.pydata.org' + user = _process_user(user) + if os.system('cd build/latex; scp pandas.pdf {0}pandas.pydata.org' ':/usr/share/nginx/pandas/pandas-docs/stable/'.format(user)): raise SystemExit('PDF upload to stable failed') -def upload_prev(ver, doc_root='./', user='pandas'): +def upload_prev(ver, doc_root='./', user=None): 'push a copy of older release to appropriate version directory' + user = _process_user(user) local_dir = doc_root + 'build/html' remote_dir = '/usr/share/nginx/pandas/pandas-docs/version/%s/' % ver - cmd = 'cd %s; rsync -avz . %s@pandas.pydata.org:%s -essh' + cmd = 'cd %s; rsync -avz . 
%spandas.pydata.org:%s -essh' cmd = cmd % (local_dir, user, remote_dir) print(cmd) if os.system(cmd): @@ -74,7 +87,7 @@ def upload_prev(ver, doc_root='./', user='pandas'): 'Upload to %s from %s failed' % (remote_dir, local_dir)) local_dir = doc_root + 'build/latex' - pdf_cmd = 'cd %s; scp pandas.pdf %s@pandas.pydata.org:%s' + pdf_cmd = 'cd %s; scp pandas.pdf %spandas.pydata.org:%s' pdf_cmd = pdf_cmd % (local_dir, user, remote_dir) if os.system(pdf_cmd): raise SystemExit('Upload PDF to %s from %s failed' % (ver, doc_root)) @@ -106,106 +119,55 @@ def clean(): @contextmanager -def cleanup_nb(nb): - try: - yield - finally: - try: - os.remove(nb + '.executed') - except OSError: - pass - - -def get_kernel(): - """Find the kernel name for your python version""" - return 'python%s' % sys.version_info.major - - -def execute_nb(src, dst, allow_errors=False, timeout=1000, kernel_name=''): - """ - Execute notebook in `src` and write the output to `dst` - - Parameters - ---------- - src, dst: str - path to notebook - allow_errors: bool - timeout: int - kernel_name: str - defualts to value set in notebook metadata - - Returns - ------- - dst: str - """ - import nbformat - from nbconvert.preprocessors import ExecutePreprocessor - - with io.open(src, encoding='utf-8') as f: - nb = nbformat.read(f, as_version=4) - - ep = ExecutePreprocessor(allow_errors=allow_errors, - timeout=timeout, - kernel_name=kernel_name) - ep.preprocess(nb, resources={}) - - with io.open(dst, 'wt', encoding='utf-8') as f: - nbformat.write(nb, f) - return dst - - -def convert_nb(src, dst, to='html', template_file='basic'): +def maybe_exclude_notebooks(): """ - Convert a notebook `src`. - - Parameters - ---------- - src, dst: str - filepaths - to: {'rst', 'html'} - format to export to - template_file: str - name of template file to use. Default 'basic' + Skip building the notebooks if pandoc is not installed. + This assumes that nbsphinx is installed. """ - from nbconvert import HTMLExporter, RSTExporter - - dispatch = {'rst': RSTExporter, 'html': HTMLExporter} - exporter = dispatch[to.lower()](template_file=template_file) + base = os.path.dirname(__file__) + notebooks = [os.path.join(base, 'source', nb) + for nb in ['style.ipynb']] + contents = {} + + def _remove_notebooks(): + for nb in notebooks: + with open(nb, 'rt') as f: + contents[nb] = f.read() + os.remove(nb) + + # Skip notebook conversion if + # 1. nbconvert isn't installed, or + # 2. nbconvert is installed, but pandoc isn't + try: + import nbconvert + except ImportError: + print("Warning: nbconvert not installed. Skipping notebooks.") + _remove_notebooks() + else: + try: + nbconvert.utils.pandoc.get_pandoc_version() + except nbconvert.utils.pandoc.PandocMissing: + print("Warning: Pandoc is not installed. 
Skipping notebooks.") + _remove_notebooks() - (body, resources) = exporter.from_filename(src) - with io.open(dst, 'wt', encoding='utf-8') as f: - f.write(body) - return dst + yield + for nb, content in contents.items(): + with open(nb, 'wt') as f: + f.write(content) def html(): check_build() - notebooks = [ - 'source/html-styling.ipynb', - ] - - for nb in notebooks: - with cleanup_nb(nb): - try: - print("Converting %s" % nb) - kernel_name = get_kernel() - executed = execute_nb(nb, nb + '.executed', allow_errors=True, - kernel_name=kernel_name) - convert_nb(executed, nb.rstrip('.ipynb') + '.html') - except (ImportError, IndexError) as e: - print(e) - print("Failed to convert %s" % nb) - - if os.system('sphinx-build -P -b html -d build/doctrees ' - 'source build/html'): - raise SystemExit("Building HTML failed.") - try: - # remove stale file - os.system('rm source/html-styling.html') - os.system('cd build; rm -f html/pandas.zip;') - except: - pass + with maybe_exclude_notebooks(): + if os.system('sphinx-build -P -b html -d build/doctrees ' + 'source build/html'): + raise SystemExit("Building HTML failed.") + try: + # remove stale file + os.remove('build/html/pandas.zip') + except: + pass def zip_html(): @@ -222,7 +184,7 @@ def latex(): check_build() if sys.platform != 'win32': # LaTeX format. - if os.system('sphinx-build -b latex -d build/doctrees ' + if os.system('sphinx-build -j 2 -b latex -d build/doctrees ' 'source build/latex'): raise SystemExit("Building LaTeX failed.") # Produce pdf. @@ -245,7 +207,7 @@ def latex_forced(): check_build() if sys.platform != 'win32': # LaTeX format. - if os.system('sphinx-build -b latex -d build/doctrees ' + if os.system('sphinx-build -j 2 -b latex -d build/doctrees ' 'source build/latex'): raise SystemExit("Building LaTeX failed.") # Produce pdf. diff --git a/doc/source/10min.rst b/doc/source/10min.rst index 0612e86134cf2..def49a641a0ff 100644 --- a/doc/source/10min.rst +++ b/doc/source/10min.rst @@ -84,29 +84,28 @@ will be completed: @verbatim In [1]: df2. - df2.A df2.boxplot - df2.abs df2.C - df2.add df2.clip - df2.add_prefix df2.clip_lower - df2.add_suffix df2.clip_upper - df2.align df2.columns - df2.all df2.combine - df2.any df2.combineAdd + df2.A df2.bool + df2.abs df2.boxplot + df2.add df2.C + df2.add_prefix df2.clip + df2.add_suffix df2.clip_lower + df2.align df2.clip_upper + df2.all df2.columns + df2.any df2.combine df2.append df2.combine_first - df2.apply df2.combineMult - df2.applymap df2.compound - df2.as_blocks df2.consolidate - df2.asfreq df2.convert_objects - df2.as_matrix df2.copy - df2.astype df2.corr - df2.at df2.corrwith - df2.at_time df2.count - df2.axes df2.cov - df2.B df2.cummax - df2.between_time df2.cummin - df2.bfill df2.cumprod - df2.blocks df2.cumsum - df2.bool df2.D + df2.apply df2.compound + df2.applymap df2.consolidate + df2.as_blocks df2.convert_objects + df2.asfreq df2.copy + df2.as_matrix df2.corr + df2.astype df2.corrwith + df2.at df2.count + df2.at_time df2.cov + df2.axes df2.cummax + df2.B df2.cummin + df2.between_time df2.cumprod + df2.bfill df2.cumsum + df2.blocks df2.D As you can see, the columns ``A``, ``B``, ``C``, and ``D`` are automatically tab completed. ``E`` is there as well; the rest of the attributes have been @@ -374,7 +373,7 @@ To get the boolean mask where values are ``nan`` .. 
ipython:: python - pd.isnull(df1) + pd.isna(df1) Operations diff --git a/doc/source/_static/ci.png b/doc/source/_static/ci.png new file mode 100644 index 0000000000000..4570ed2155586 Binary files /dev/null and b/doc/source/_static/ci.png differ diff --git a/doc/source/_static/style-excel.png b/doc/source/_static/style-excel.png new file mode 100644 index 0000000000000..f946949e8bcf9 Binary files /dev/null and b/doc/source/_static/style-excel.png differ diff --git a/doc/source/advanced.rst b/doc/source/advanced.rst index 21ae9f1eb8409..711c3e9a95d05 100644 --- a/doc/source/advanced.rst +++ b/doc/source/advanced.rst @@ -46,7 +46,7 @@ data with an arbitrary number of dimensions in lower dimensional data structures like Series (1d) and DataFrame (2d). In this section, we will show what exactly we mean by "hierarchical" indexing -and how it integrates with the all of the pandas indexing functionality +and how it integrates with all of the pandas indexing functionality described above and in prior sections. Later, when discussing :ref:`group by ` and :ref:`pivoting and reshaping data `, we'll show non-trivial applications to illustrate how it aids in structuring data for @@ -59,7 +59,7 @@ Creating a MultiIndex (hierarchical index) object The ``MultiIndex`` object is the hierarchical analogue of the standard ``Index`` object which typically stores the axis labels in pandas objects. You -can think of ``MultiIndex`` an array of tuples where each tuple is unique. A +can think of ``MultiIndex`` as an array of tuples where each tuple is unique. A ``MultiIndex`` can be created from a list of arrays (using ``MultiIndex.from_arrays``), an array of tuples (using ``MultiIndex.from_tuples``), or a crossed set of iterables (using @@ -136,7 +136,7 @@ can find yourself working with hierarchically-indexed data without creating a may wish to generate your own ``MultiIndex`` when preparing the data set. Note that how the index is displayed can be controlled using the -``multi_sparse`` option in ``pandas.set_printoptions``: +``multi_sparse`` option in ``pandas.set_options()``: .. ipython:: python @@ -175,35 +175,40 @@ completely analogous way to selecting a column in a regular DataFrame: See :ref:`Cross-section with hierarchical index ` for how to select on a deeper level. -.. note:: +.. _advanced.shown_levels: + +Defined Levels +~~~~~~~~~~~~~~ - The repr of a ``MultiIndex`` shows ALL the defined levels of an index, even - if the they are not actually used. When slicing an index, you may notice this. - For example: +The repr of a ``MultiIndex`` shows ALL the defined levels of an index, even +if they are not actually used. When slicing an index, you may notice this. +For example: + +.. ipython:: python - .. ipython:: python + # original multi-index + df.columns - # original multi-index - df.columns + # sliced + df[['foo','qux']].columns - # sliced - df[['foo','qux']].columns +This is done to avoid a recomputation of the levels in order to make slicing +highly performant. If you want to see only the used levels: - This is done to avoid a recomputation of the levels in order to make slicing - highly performant. If you want to see the actual used levels. +.. ipython:: python - .. 
ipython:: python + df[['foo','qux']].columns.values - df[['foo','qux']].columns.values + # for a specific level + df[['foo','qux']].columns.get_level_values(0) - # for a specific level - df[['foo','qux']].columns.get_level_values(0) +To reconstruct the multiindex with only the used levels - To reconstruct the multiindex with only the used levels +.. versionadded:: 0.20.0 - .. ipython:: python +.. ipython:: python - pd.MultiIndex.from_tuples(df[['foo','qux']].columns.values) + df[['foo','qux']].columns.remove_unused_levels() Data alignment and using ``reindex`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -288,7 +293,7 @@ As usual, **both sides** of the slicers are included as this is label indexing. .. code-block:: python - df.loc[(slice('A1','A3'),.....),:] + df.loc[(slice('A1','A3'),.....), :] rather than this: @@ -317,43 +322,43 @@ Basic multi-index slicing using slices, lists, and labels. .. ipython:: python - dfmi.loc[(slice('A1','A3'),slice(None), ['C1','C3']),:] + dfmi.loc[(slice('A1','A3'), slice(None), ['C1', 'C3']), :] You can use a ``pd.IndexSlice`` to have a more natural syntax using ``:`` rather than using ``slice(None)`` .. ipython:: python idx = pd.IndexSlice - dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']] + dfmi.loc[idx[:, :, ['C1', 'C3']], idx[:, 'foo']] It is possible to perform quite complicated selections using this method on multiple axes at the same time. .. ipython:: python - dfmi.loc['A1',(slice(None),'foo')] - dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']] + dfmi.loc['A1', (slice(None), 'foo')] + dfmi.loc[idx[:, :, ['C1', 'C3']], idx[:, 'foo']] Using a boolean indexer you can provide selection related to the *values*. .. ipython:: python - mask = dfmi[('a','foo')]>200 - dfmi.loc[idx[mask,:,['C1','C3']],idx[:,'foo']] + mask = dfmi[('a', 'foo')] > 200 + dfmi.loc[idx[mask, :, ['C1', 'C3']], idx[:, 'foo']] You can also specify the ``axis`` argument to ``.loc`` to interpret the passed slicers on a single axis. .. ipython:: python - dfmi.loc(axis=0)[:,:,['C1','C3']] + dfmi.loc(axis=0)[:, :, ['C1', 'C3']] Furthermore you can *set* the values using these methods .. ipython:: python df2 = dfmi.copy() - df2.loc(axis=0)[:,:,['C1','C3']] = -10 + df2.loc(axis=0)[:, :, ['C1', 'C3']] = -10 df2 You can use a right-hand-side of an alignable object as well. @@ -361,7 +366,7 @@ You can use a right-hand-side of an alignable object as well. .. ipython:: python df2 = dfmi.copy() - df2.loc[idx[:,:,['C1','C3']],:] = df2*1000 + df2.loc[idx[:, :, ['C1', 'C3']], :] = df2 * 1000 df2 .. _advanced.xs: @@ -845,6 +850,39 @@ Of course if you need integer based selection, then use ``iloc`` dfir.iloc[0:5] +.. _indexing.intervallindex: + +IntervalIndex +~~~~~~~~~~~~~ + +.. versionadded:: 0.20.0 + +.. warning:: + + These indexing behaviors are provisional and may change in a future version of pandas. + +.. ipython:: python + + df = pd.DataFrame({'A': [1, 2, 3, 4]}, + index=pd.IntervalIndex.from_breaks([0, 1, 2, 3, 4])) + df + +Label based indexing via ``.loc`` along the edges of an interval works as you would expect, +selecting that particular interval. + +.. ipython:: python + + df.loc[2] + df.loc[[2, 3]] + +If you select a lable *contained* within an interval, this will also select the interval. + +.. ipython:: python + + df.loc[2.5] + df.loc[[2.5, 3.5]] + + Miscellaneous indexing FAQ -------------------------- @@ -880,7 +918,7 @@ normal Python ``list``. Monotonicity of an index can be tested with the ``is_mon .. 
ipython:: python - df = pd.DataFrame(index=[2,3,3,4,5], columns=['data'], data=range(5)) + df = pd.DataFrame(index=[2,3,3,4,5], columns=['data'], data=list(range(5))) df.index.is_monotonic_increasing # no rows 0 or 1, but still returns rows 2, 3 (both of them), and 4: @@ -894,7 +932,7 @@ On the other hand, if the index is not monotonic, then both slice bounds must be .. ipython:: python - df = pd.DataFrame(index=[2,3,1,4,3,5], columns=['data'], data=range(6)) + df = pd.DataFrame(index=[2,3,1,4,3,5], columns=['data'], data=list(range(6))) df.index.is_monotonic_increasing # OK because 2 and 4 are in the index @@ -910,6 +948,16 @@ On the other hand, if the index is not monotonic, then both slice bounds must be In [11]: df.loc[2:3, :] KeyError: 'Cannot get right slice bound for non-unique label: 3' +:meth:`Index.is_monotonic_increasing` and :meth:`Index.is_monotonic_decreasing` only check that +an index is weakly monotonic. To check for strict monotonicity, you can combine one of those with +:meth:`Index.is_unique`. + +.. ipython:: python + + weakly_monotonic = pd.Index(['a', 'b', 'c', 'c']) + weakly_monotonic + weakly_monotonic.is_monotonic_increasing + weakly_monotonic.is_monotonic_increasing & weakly_monotonic.is_unique Endpoints are inclusive ~~~~~~~~~~~~~~~~~~~~~~~ @@ -965,6 +1013,7 @@ The different indexing operation can potentially change the dtype of a ``Series` res .. ipython:: python + series2 = pd.Series([True]) series2.dtype res = series2.reindex_like(series1) diff --git a/doc/source/api.rst b/doc/source/api.rst index 92f290b5ee0a9..12e6c7ad7f630 100644 --- a/doc/source/api.rst +++ b/doc/source/api.rst @@ -5,6 +5,22 @@ API Reference ************* +This page gives an overview of all public pandas objects, functions and +methods. In general, all classes and functions exposed in the top-level +``pandas.*`` namespace are regarded as public. + +Further, some of the subpackages are public, including ``pandas.errors``, +``pandas.plotting``, and ``pandas.testing``. Certain functions in the +``pandas.io`` and ``pandas.tseries`` submodules are public as well (those +mentioned in the documentation). Further, the ``pandas.api.types`` subpackage +holds some public functions related to data types in pandas. + + +.. warning:: + + The ``pandas.core``, ``pandas.compat``, and ``pandas.util`` top-level modules are considered to be PRIVATE. Stability of functionality in those modules is not guaranteed. + + .. _api.functions: Input/Output @@ -60,6 +76,7 @@ JSON :toctree: generated/ json_normalize + build_table_schema .. currentmodule:: pandas @@ -82,6 +99,7 @@ HDFStore: PyTables (HDF5) HDFStore.append HDFStore.get HDFStore.select + HDFStore.info Feather ~~~~~~~ @@ -91,6 +109,14 @@ Feather read_feather +Parquet +~~~~~~~ + +.. autosummary:: + :toctree: generated/ + + read_parquet + SAS ~~~ @@ -111,16 +137,11 @@ SQL Google BigQuery ~~~~~~~~~~~~~~~ -.. currentmodule:: pandas.io.gbq .. autosummary:: :toctree: generated/ read_gbq - to_gbq - - -.. currentmodule:: pandas STATA @@ -165,6 +186,7 @@ Data manipulations concat get_dummies factorize + unique wide_to_long Top-level missing data @@ -173,7 +195,9 @@ Top-level missing data .. 
autosummary:: :toctree: generated/ + isna isnull + notna notnull Top-level conversions @@ -256,9 +280,10 @@ Conversion :toctree: generated/ Series.astype + Series.infer_objects Series.copy - Series.isnull - Series.notnull + Series.isna + Series.notna Indexing, iteration ~~~~~~~~~~~~~~~~~~~ @@ -313,6 +338,8 @@ Function application, GroupBy & Window :toctree: generated/ Series.apply + Series.aggregate + Series.transform Series.map Series.groupby Series.rolling @@ -393,6 +420,7 @@ Reindexing / Selection / Label manipulation Series.reset_index Series.sample Series.select + Series.set_axis Series.take Series.tail Series.truncate @@ -599,7 +627,6 @@ strings and apply several methods to it. These can be accessed like Series.cat Series.dt Index.str - CategoricalIndex.str MultiIndex.str DatetimeIndex.str TimedeltaIndex.str @@ -652,7 +679,7 @@ adding ordering information or special categories is need at creation time of th Categorical.from_codes ``np.asarray(categorical)`` works by implementing the array interface. Be aware, that this converts -the Categorical back to a numpy array, so levels and order information is not preserved! +the Categorical back to a numpy array, so categories and order information is not preserved! .. autosummary:: :toctree: generated/ @@ -710,9 +737,10 @@ Serialization / IO / Conversion Series.to_dense Series.to_string Series.to_clipboard + Series.to_latex -Sparse methods -~~~~~~~~~~~~~~ +Sparse +~~~~~~ .. autosummary:: :toctree: generated/ @@ -761,9 +789,10 @@ Conversion DataFrame.astype DataFrame.convert_objects + DataFrame.infer_objects DataFrame.copy - DataFrame.isnull - DataFrame.notnull + DataFrame.isna + DataFrame.notna Indexing, iteration ~~~~~~~~~~~~~~~~~~~ @@ -830,6 +859,8 @@ Function application, GroupBy & Window DataFrame.apply DataFrame.applymap + DataFrame.aggregate + DataFrame.transform DataFrame.groupby DataFrame.rolling DataFrame.expanding @@ -933,6 +964,7 @@ Reshaping, sorting, transposing DataFrame.swaplevel DataFrame.stack DataFrame.unstack + DataFrame.melt DataFrame.T DataFrame.to_panel DataFrame.to_xarray @@ -1030,6 +1062,13 @@ Serialization / IO / Conversion DataFrame.to_string DataFrame.to_clipboard +Sparse +~~~~~~ +.. autosummary:: + :toctree: generated/ + + SparseDataFrame.to_coo + .. _api.panel: Panel @@ -1070,8 +1109,8 @@ Conversion Panel.astype Panel.copy - Panel.isnull - Panel.notnull + Panel.isna + Panel.notna Getting and setting ~~~~~~~~~~~~~~~~~~~ @@ -1237,57 +1276,6 @@ Serialization / IO / Conversion Panel.to_xarray Panel.to_clipboard -.. _api.panel4d: - -Panel4D -------- - -Constructor -~~~~~~~~~~~ -.. autosummary:: - :toctree: generated/ - - Panel4D - -Serialization / IO / Conversion -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. autosummary:: - :toctree: generated/ - - Panel4D.to_xarray - -Attributes and underlying data -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -**Axes** - - * **labels**: axis 1; each label corresponds to a Panel contained inside - * **items**: axis 2; each item corresponds to a DataFrame contained inside - * **major_axis**: axis 3; the index (rows) of each of the DataFrames - * **minor_axis**: axis 4; the columns of each of the DataFrames - -.. autosummary:: - :toctree: generated/ - - Panel4D.values - Panel4D.axes - Panel4D.ndim - Panel4D.size - Panel4D.shape - Panel4D.dtypes - Panel4D.ftypes - Panel4D.get_dtype_counts - Panel4D.get_ftype_counts - -Conversion -~~~~~~~~~~ -.. autosummary:: - :toctree: generated/ - - Panel4D.astype - Panel4D.copy - Panel4D.isnull - Panel4D.notnull - .. 
_api.index: Index @@ -1321,6 +1309,7 @@ Attributes Index.nbytes Index.ndim Index.size + Index.empty Index.strides Index.itemsize Index.base @@ -1356,8 +1345,16 @@ Modifying and Computations Index.unique Index.nunique Index.value_counts + +Missing Values +~~~~~~~~~~~~~~ +.. autosummary:: + :toctree: generated/ + Index.fillna Index.dropna + Index.isna + Index.notna Conversion ~~~~~~~~~~ @@ -1417,6 +1414,7 @@ CategoricalIndex .. autosummary:: :toctree: generated/ + :template: autosummary/class_without_autosummary.rst CategoricalIndex @@ -1438,6 +1436,28 @@ Categorical Components CategoricalIndex.as_ordered CategoricalIndex.as_unordered +.. _api.intervalindex: + +IntervalIndex +------------- + +.. autosummary:: + :toctree: generated/ + :template: autosummary/class_without_autosummary.rst + + IntervalIndex + +IntervalIndex Components +~~~~~~~~~~~~~~~~~~~~~~~~ + +.. autosummary:: + :toctree: generated/ + + IntervalIndex.from_arrays + IntervalIndex.from_tuples + IntervalIndex.from_breaks + IntervalIndex.from_intervals + .. _api.multiindex: MultiIndex @@ -1447,6 +1467,7 @@ MultiIndex :toctree: generated/ MultiIndex + IndexSlice MultiIndex Components ~~~~~~~~~~~~~~~~~~~~~~ @@ -1465,6 +1486,7 @@ MultiIndex Components MultiIndex.droplevel MultiIndex.swaplevel MultiIndex.reorder_levels + MultiIndex.remove_unused_levels .. _api.datetimeindex: @@ -1697,6 +1719,7 @@ Computations / Descriptive Stats GroupBy.mean GroupBy.median GroupBy.min + GroupBy.ngroup GroupBy.nth GroupBy.ohlc GroupBy.prod @@ -1731,6 +1754,7 @@ application to columns of a specific data type. DataFrameGroupBy.diff DataFrameGroupBy.ffill DataFrameGroupBy.fillna + DataFrameGroupBy.filter DataFrameGroupBy.hist DataFrameGroupBy.idxmax DataFrameGroupBy.idxmin @@ -1767,7 +1791,7 @@ The following methods are available only for ``DataFrameGroupBy`` objects. Resampling ---------- -.. currentmodule:: pandas.tseries.resample +.. currentmodule:: pandas.core.resample Resampler objects are returned by resample calls: :func:`pandas.DataFrame.resample`, :func:`pandas.Series.resample`. @@ -1827,7 +1851,7 @@ Computations / Descriptive Stats Style ----- -.. currentmodule:: pandas.formats.style +.. currentmodule:: pandas.io.formats.style ``Styler`` objects are returned by :attr:`pandas.DataFrame.style`. @@ -1892,3 +1916,97 @@ Working with options get_option set_option option_context + +Testing functions +~~~~~~~~~~~~~~~~~ + +.. autosummary:: + :toctree: generated/ + + testing.assert_frame_equal + testing.assert_series_equal + testing.assert_index_equal + + +Exceptions and warnings +~~~~~~~~~~~~~~~~~~~~~~~ + +.. autosummary:: + :toctree: generated/ + + errors.DtypeWarning + errors.EmptyDataError + errors.OutOfBoundsDatetime + errors.ParserError + errors.ParserWarning + errors.PerformanceWarning + errors.UnsortedIndexError + errors.UnsupportedFunctionCall + + +Data types related functionality +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. autosummary:: + :toctree: generated/ + + api.types.union_categoricals + api.types.infer_dtype + api.types.pandas_dtype + +Dtype introspection + +.. 
autosummary:: + :toctree: generated/ + + api.types.is_bool_dtype + api.types.is_categorical_dtype + api.types.is_complex_dtype + api.types.is_datetime64_any_dtype + api.types.is_datetime64_dtype + api.types.is_datetime64_ns_dtype + api.types.is_datetime64tz_dtype + api.types.is_extension_type + api.types.is_float_dtype + api.types.is_int64_dtype + api.types.is_integer_dtype + api.types.is_interval_dtype + api.types.is_numeric_dtype + api.types.is_object_dtype + api.types.is_period_dtype + api.types.is_signed_integer_dtype + api.types.is_string_dtype + api.types.is_timedelta64_dtype + api.types.is_timedelta64_ns_dtype + api.types.is_unsigned_integer_dtype + api.types.is_sparse + +Iterable introspection + +.. autosummary:: + :toctree: generated/ + + api.types.is_dict_like + api.types.is_file_like + api.types.is_list_like + api.types.is_named_tuple + api.types.is_iterator + +Scalar introspection + +.. autosummary:: + :toctree: generated/ + + api.types.is_bool + api.types.is_categorical + api.types.is_complex + api.types.is_datetimetz + api.types.is_float + api.types.is_hashable + api.types.is_integer + api.types.is_interval + api.types.is_number + api.types.is_period + api.types.is_re + api.types.is_re_compilable + api.types.is_scalar diff --git a/doc/source/basics.rst b/doc/source/basics.rst index f5f7c73223595..fe20a7eb2b786 100644 --- a/doc/source/basics.rst +++ b/doc/source/basics.rst @@ -93,7 +93,7 @@ Accelerated operations ---------------------- pandas has support for accelerating certain types of binary numerical and boolean operations using -the ``numexpr`` library (starting in 0.11.0) and the ``bottleneck`` libraries. +the ``numexpr`` library and the ``bottleneck`` libraries. These libraries are especially useful when dealing with large data sets, and provide large speedups. ``numexpr`` uses smart chunking, caching, and multiple cores. ``bottleneck`` is @@ -114,6 +114,15 @@ Here is a sample (using 100 column x 100,000 row ``DataFrames``): You are highly encouraged to install both libraries. See the section :ref:`Recommended Dependencies ` for more installation info. +These are both enabled to be used by default, you can control this by setting the options: + +.. versionadded:: 0.20.0 + +.. code-block:: python + + pd.set_option('compute.use_bottleneck', False) + pd.set_option('compute.use_numexpr', False) + .. _basics.binop: Flexible binary operations @@ -435,7 +444,7 @@ So, for instance, to reproduce :meth:`~DataFrame.combine_first` as above: .. ipython:: python - combiner = lambda x, y: np.where(pd.isnull(x), y, x) + combiner = lambda x, y: np.where(pd.isna(x), y, x) df1.combine(df2, combiner) .. _basics.stats: @@ -502,7 +511,7 @@ optional ``level`` parameter which applies only if the object has a :header: "Function", "Description" :widths: 20, 80 - ``count``, Number of non-null observations + ``count``, Number of non-NA observations ``sum``, Sum of values ``mean``, Mean of values ``mad``, Mean absolute deviation @@ -532,7 +541,7 @@ will exclude NAs on Series input by default: np.mean(df['one'].values) ``Series`` also has a method :meth:`~Series.nunique` which will return the -number of unique non-null values: +number of unique non-NA values: .. ipython:: python @@ -702,7 +711,8 @@ on an entire ``DataFrame`` or ``Series``, row- or column-wise, or elementwise. 1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe` 2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply` -3. Elementwise_ function application: :meth:`~DataFrame.applymap` +3. 
`Aggregation API`_: :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform` +4. `Applying Elementwise Functions`_: :meth:`~DataFrame.applymap` .. _basics.pipe: @@ -778,6 +788,13 @@ statistics methods, take an optional ``axis`` argument: df.apply(np.cumsum) df.apply(np.exp) +``.apply()`` will also dispatch on a string method name. + +.. ipython:: python + + df.apply('mean') + df.apply('mean', axis=1) + Depending on the return type of the function passed to :meth:`~DataFrame.apply`, the result will either be of lower dimension or the same dimension. @@ -827,16 +844,226 @@ set to True, the passed function will instead receive an ndarray object, which has positive performance implications if you do not need the indexing functionality. -.. seealso:: +.. _basics.aggregate: + +Aggregation API +~~~~~~~~~~~~~~~ + +.. versionadded:: 0.20.0 + +The aggregation API allows one to express possibly multiple aggregation operations in a single concise way. +This API is similar across pandas objects, see :ref:`groupby API `, the +:ref:`window functions API `, and the :ref:`resample API `. +The entry point for aggregation is the method :meth:`~DataFrame.aggregate`, or the alias :meth:`~DataFrame.agg`. + +We will use a starting frame similar to the one above: + +.. ipython:: python + + tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'], + index=pd.date_range('1/1/2000', periods=10)) + tsdf.iloc[3:7] = np.nan + tsdf + +Using a single function is equivalent to :meth:`~DataFrame.apply`; you can also pass named methods as strings. +These will return a ``Series`` of the aggregated output: + +.. ipython:: python + + tsdf.agg(np.sum) + + tsdf.agg('sum') + + # these are equivalent to a ``.sum()`` because we are aggregating on a single function + tsdf.sum() + +A single aggregation on a ``Series`` will result in a scalar value: + +.. ipython:: python + + tsdf.A.agg('sum') + + +Aggregating with multiple functions ++++++++++++++++++++++++++++++++++++ + +You can pass multiple aggregation arguments as a list. +The results of each of the passed functions will be a row in the resultant ``DataFrame``. +These are naturally named from the aggregation function. + +.. ipython:: python - The section on :ref:`GroupBy ` demonstrates related, flexible - functionality for grouping by some criterion, applying, and combining the - results into a Series, DataFrame, etc. + tsdf.agg(['sum']) + +Multiple functions yield multiple rows: + +.. ipython:: python + + tsdf.agg(['sum', 'mean']) + +On a ``Series``, multiple functions return a ``Series``, indexed by the function names: + +.. ipython:: python + + tsdf.A.agg(['sum', 'mean']) + +Passing a ``lambda`` function will yield a ```` named row: + +.. ipython:: python + + tsdf.A.agg(['sum', lambda x: x.mean()]) + +Passing a named function will yield that name for the row: + +.. ipython:: python + + def mymean(x): + return x.mean() + + tsdf.A.agg(['sum', mymean]) + +Aggregating with a dict ++++++++++++++++++++++++ + +Passing a dictionary of column names to a scalar or a list of scalars, to ``DataFrame.agg`` +allows you to customize which functions are applied to which columns. Note that the results +are not in any particular order; you can use an ``OrderedDict`` instead to guarantee ordering. + +.. ipython:: python + + tsdf.agg({'A': 'mean', 'B': 'sum'}) + +Passing a list-like will generate a ``DataFrame`` output. You will get a matrix-like output +of all of the aggregators. The output will consist of all unique functions. Those that are +
+Passing a list-like will generate a ``DataFrame`` output. You will get a matrix-like output +of all of the aggregators. The output will consist of all unique functions. Those that are +not noted for a particular column will be ``NaN``: + +.. ipython:: python + + tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'}) + +.. _basics.aggregation.mixed_dtypes: + +Mixed Dtypes +++++++++++++ + +When presented with mixed dtypes that cannot aggregate, ``.agg`` will only take the valid +aggregations. This is similar to how groupby ``.agg`` works. + +.. ipython:: python + + mdf = pd.DataFrame({'A': [1, 2, 3], + 'B': [1., 2., 3.], + 'C': ['foo', 'bar', 'baz'], + 'D': pd.date_range('20130101', periods=3)}) + mdf.dtypes + +.. ipython:: python -.. _Elementwise: + mdf.agg(['min', 'sum']) -Applying elementwise Python functions -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. _basics.aggregation.custom_describe: + +Custom describe ++++++++++++++++ + +With ``.agg()`` it is possible to easily create a custom describe function, similar +to the built-in :ref:`describe function `. + +.. ipython:: python + + from functools import partial + + q_25 = partial(pd.Series.quantile, q=0.25) + q_25.__name__ = '25%' + q_75 = partial(pd.Series.quantile, q=0.75) + q_75.__name__ = '75%' + + tsdf.agg(['count', 'mean', 'std', 'min', q_25, 'median', q_75, 'max']) + +.. _basics.transform: + +Transform API +~~~~~~~~~~~~~ + +.. versionadded:: 0.20.0 + +The :meth:`~DataFrame.transform` method returns an object that is indexed the same (same size) +as the original. This API allows you to provide *multiple* operations at the same +time rather than one-by-one. Its API is quite similar to the ``.agg`` API. + +We use a frame similar to the one in the sections above. + +.. ipython:: python + + tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'], + index=pd.date_range('1/1/2000', periods=10)) + tsdf.iloc[3:7] = np.nan + tsdf + +Transform the entire frame. ``.transform()`` allows input functions as: a numpy function, a string +function name, or a user-defined function. + +.. ipython:: python + :okwarning: + + tsdf.transform(np.abs) + tsdf.transform('abs') + tsdf.transform(lambda x: x.abs()) + +Here ``.transform()`` received a single function; this is equivalent to a ufunc application: + +.. ipython:: python + + np.abs(tsdf) + +Passing a single function to ``.transform()`` with a ``Series`` will yield a single ``Series`` in return. + +.. ipython:: python + + tsdf.A.transform(np.abs) + + +Transform with multiple functions ++++++++++++++++++++++++++++++++++ + +Passing multiple functions will yield a column multi-indexed DataFrame. +The first level will be the original frame column names; the second level +will be the names of the transforming functions. + +.. ipython:: python + + tsdf.transform([np.abs, lambda x: x+1]) + +Passing multiple functions to a Series will yield a DataFrame. The +resulting column names will be the transforming functions. + +.. ipython:: python + + tsdf.A.transform([np.abs, lambda x: x+1]) + + +Transforming with a dict +++++++++++++++++++++++++ + + +Passing a dict of functions will allow selective transforming per column. + +.. ipython:: python + + tsdf.transform({'A': np.abs, 'B': lambda x: x+1}) + +Passing a dict of lists will generate a multi-indexed DataFrame with these +selective transforms. + +.. ipython:: python + :okwarning: + + tsdf.transform({'A': np.abs, 'B': [lambda x: x+1, 'sqrt']})
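To underline the difference from aggregation, a small sketch reusing the ``tsdf`` frame from above: ``.transform()`` preserves the shape of its input, while ``.agg()`` reduces it:

.. code-block:: python

   result = tsdf.transform(np.abs)
   result.shape == tsdf.shape   # True: transform is indexed the same as the input

   tsdf.agg('sum').shape        # (3,): one aggregated value per column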
.. _basics.elementwise: + +Applying Elementwise Functions +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Since not all functions can be vectorized (accept NumPy arrays and return another array or value), the methods :meth:`~DataFrame.applymap` on DataFrame @@ -1797,7 +2024,29 @@ object conversion ~~~~~~~~~~~~~~~~~ pandas offers various functions to try to force conversion of types from the ``object`` dtype to other types. -The following functions are available for one dimensional object arrays or scalars: +In cases where the data is already of the correct type, but stored in an ``object`` array, the +:meth:`DataFrame.infer_objects` and :meth:`Series.infer_objects` methods can be used to soft convert +to the correct type. + + .. ipython:: python + + import datetime + df = pd.DataFrame([[1, 2], + ['a', 'b'], + [datetime.datetime(2016, 3, 2), datetime.datetime(2016, 3, 2)]]) + df = df.T + df + df.dtypes + +Because the data was transposed, the original inference stored all columns as object, which +``infer_objects`` will correct. + + .. ipython:: python + + df.infer_objects().dtypes + +The following functions are available for one-dimensional object arrays or scalars to perform +hard conversion of objects to a specified type: - :meth:`~pandas.to_numeric` (conversion to numeric dtypes) @@ -1889,7 +2138,7 @@ gotchas Performing selection operations on ``integer`` type data can easily upcast the data to ``floating``. The dtype of the input data will be preserved in cases where ``nans`` are not introduced (starting in 0.11.0) -See also :ref:`Support for integer ``NA`` ` +See also :ref:`Support for integer NA ` .. ipython:: python @@ -2002,7 +2251,3 @@ All numpy dtypes are subclasses of ``numpy.generic``: Pandas also defines the types ``category`` and ``datetime64[ns, tz]``, which are not integrated into the normal numpy hierarchy and won't show up with the above function. - -.. note:: - - The ``include`` and ``exclude`` parameters must be non-string sequences. diff --git a/doc/source/categorical.rst b/doc/source/categorical.rst index 18e429cfc92fa..02d7920bc4a84 100644 --- a/doc/source/categorical.rst +++ b/doc/source/categorical.rst @@ -230,6 +230,15 @@ Categories must be unique or a `ValueError` is raised: except ValueError as e: print("ValueError: " + str(e)) +Categories must also not be ``NaN`` or a `ValueError` is raised: + +.. ipython:: python + + try: + s.cat.categories = [1,2,np.nan] + except ValueError as e: + print("ValueError: " + str(e)) + Appending new categories ~~~~~~~~~~~~~~~~~~~~~~~~ @@ -444,6 +453,14 @@ the original values: np.asarray(cat) > base +When you compare two unordered categoricals with the same categories, the order is not considered: + +.. ipython:: python + + c1 = pd.Categorical(['a', 'b'], categories=['a', 'b'], ordered=False) + c2 = pd.Categorical(['a', 'b'], categories=['b', 'a'], ordered=False) + c1 == c2 + Operations ---------- @@ -617,6 +634,7 @@ Assigning a `Categorical` to parts of a column of other types will use the value df df.dtypes +.. _categorical.merge: Merging ~~~~~~~ @@ -646,6 +664,9 @@ In this case the categories are not the same and so an error is raised: The same applies to ``df.append(df_different)``. +See also the section on :ref:`merge dtypes` for notes about preserving merge dtypes and performance. + + .. _categorical.union: Unioning ~~~~~~~~ @@ -660,7 +681,7 @@ will be the union of the categories being combined.
.. ipython:: python - from pandas.types.concat import union_categoricals + from pandas.api.types import union_categoricals a = pd.Categorical(["b", "c"]) b = pd.Categorical(["a", "b"]) union_categoricals([a, b]) @@ -693,6 +714,17 @@ The below raises ``TypeError`` because the categories are ordered and not identi Out[3]: TypeError: to union ordered Categoricals, all categories must be the same +.. versionadded:: 0.20.0 + +Ordered categoricals with different categories or orderings can be combined by +using the ``ignore_order=True`` argument. + +.. ipython:: python + + a = pd.Categorical(["a", "b", "c"], ordered=True) + b = pd.Categorical(["c", "b", "a"], ordered=True) + union_categoricals([a, b], ignore_order=True) + ``union_categoricals`` also works with a ``CategoricalIndex`` or ``Series`` containing categorical data, but note that the resulting array will always be a plain ``Categorical`` @@ -831,14 +863,14 @@ a code of ``-1``. s.cat.codes -Methods for working with missing data, e.g. :meth:`~Series.isnull`, :meth:`~Series.fillna`, +Methods for working with missing data, e.g. :meth:`~Series.isna`, :meth:`~Series.fillna`, :meth:`~Series.dropna`, all work normally: .. ipython:: python s = pd.Series(["a", "b", np.nan], dtype="category") s - pd.isnull(s) + pd.isna(s) s.fillna("a") Differences to R's `factor` diff --git a/doc/source/comparison_with_r.rst b/doc/source/comparison_with_r.rst index aa0cbab4df10b..194e022e34c7c 100644 --- a/doc/source/comparison_with_r.rst +++ b/doc/source/comparison_with_r.rst @@ -206,14 +206,6 @@ of its first argument in its second: s <- 0:4 match(s, c(2,4)) -The :meth:`~pandas.core.groupby.GroupBy.apply` method can be used to replicate -this: - -.. ipython:: python - - s = pd.Series(np.arange(5),dtype=np.float32) - pd.Series(pd.match(s,[2,4],np.nan)) - For more details and examples see :ref:`the reshaping documentation `. diff --git a/doc/source/comparison_with_sas.rst b/doc/source/comparison_with_sas.rst index 7ec91d251f15d..1f2424d8a22f3 100644 --- a/doc/source/comparison_with_sas.rst +++ b/doc/source/comparison_with_sas.rst @@ -357,6 +357,146 @@ takes a list of columns to sort by. tips = tips.sort_values(['sex', 'total_bill']) tips.head() + +String Processing +----------------- + +Length +~~~~~~ + +SAS determines the length of a character string with the +`LENGTHN `__ +and `LENGTHC `__ +functions. ``LENGTHN`` excludes trailing blanks and ``LENGTHC`` includes trailing blanks. + +.. code-block:: none + + data _null_; + set tips; + put(LENGTHN(time)); + put(LENGTHC(time)); + run; + +Python determines the length of a character string with the ``len`` function. +``len`` includes trailing blanks. Use ``len`` and ``rstrip`` to exclude +trailing blanks. + +.. ipython:: python + + tips['time'].str.len().head() + tips['time'].str.rstrip().str.len().head() + + +Find +~~~~ + +SAS determines the position of a character in a string with the +`FINDW `__ function. +``FINDW`` takes the string defined by the first argument and searches for the first position of the substring +you supply as the second argument. + +.. code-block:: none + + data _null_; + set tips; + put(FINDW(sex,'ale')); + run; + +Python determines the position of a character in a string with the +``find`` function. ``find`` searches for the first position of the +substring. If the substring is found, the function returns its +position. Keep in mind that Python indexes are zero-based and +the function will return -1 if it fails to find the substring. + +.. ipython:: python + + tips['sex'].str.find("ale").head()
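A small additional sketch (with made-up values) of the -1-on-failure behavior described above:

.. code-block:: python

   import pandas as pd

   s = pd.Series(['Male', 'Female', 'unknown'])
   s.str.find('ale')   # returns 1, 3, and -1 where the substring is absent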
+Substring +~~~~~~~~~ + +SAS extracts a substring from a string based on its position with the +`SUBSTR `__ function. + +.. code-block:: none + + data _null_; + set tips; + put(substr(sex,1,1)); + run; + +With pandas you can use ``[]`` notation to extract a substring +from a string by position locations. Keep in mind that Python +indexes are zero-based. + +.. ipython:: python + + tips['sex'].str[0:1].head() + + +Scan +~~~~ + +The SAS `SCAN `__ +function returns the nth word from a string. The first argument is the string you want to parse and the +second argument specifies which word you want to extract. + +.. code-block:: none + + data firstlast; + input String $60.; + First_Name = scan(string, 1); + Last_Name = scan(string, -1); + datalines2; + John Smith; + Jane Cook; + ;;; + run; + +Python extracts a substring from a string based on its text +by using regular expressions. There are much more powerful +approaches, but this just shows a simple approach. + +.. ipython:: python + + firstlast = pd.DataFrame({'String': ['John Smith', 'Jane Cook']}) + firstlast['First_Name'] = firstlast['String'].str.split(" ", expand=True)[0] + firstlast['Last_Name'] = firstlast['String'].str.rsplit(" ", expand=True)[1] + firstlast + + +Upcase, Lowcase, and Propcase +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The SAS `UPCASE `__ +`LOWCASE `__ and +`PROPCASE `__ +functions change the case of the argument. + +.. code-block:: none + + data firstlast; + input String $60.; + string_up = UPCASE(string); + string_low = LOWCASE(string); + string_prop = PROPCASE(string); + datalines2; + John Smith; + Jane Cook; + ;;; + run; + +The equivalent Python functions are ``upper``, ``lower``, and ``title``. + +.. ipython:: python + + firstlast = pd.DataFrame({'String': ['John Smith', 'Jane Cook']}) + firstlast['string_up'] = firstlast['String'].str.upper() + firstlast['string_low'] = firstlast['String'].str.lower() + firstlast['string_prop'] = firstlast['String'].str.title() + firstlast + Merging ------- @@ -444,13 +584,13 @@ For example, in SAS you could do this to filter missing values. if value_x ^= .; run; -Which doesn't work in pandas. Instead, the ``pd.isna`` or ``pd.notna`` functions +should be used for comparisons. .. ipython:: python - outer_join[pd.isnull(outer_join['value_x'])] - outer_join[pd.notnull(outer_join['value_x'])] + outer_join[pd.isna(outer_join['value_x'])] + outer_join[pd.notna(outer_join['value_x'])] pandas also provides a variety of methods to work with missing data - some of which would be challenging to express in SAS. For example, there are methods to @@ -570,15 +710,14 @@ machine's memory, but also that the operations on that data may be faster. If out of core processing is needed, one possibility is the `dask.dataframe `_ -library (currently in development) which +library (currently in development) which provides a subset of pandas functionality for an on-disk ``DataFrame``. Data Interop ~~~~~~~~~~~~ pandas provides a :func:`read_sas` method that can read SAS data saved in -the XPORT format. The ability to read SAS's binary format is planned for a -future release. +the XPORT or SAS7BDAT binary format. .. code-block:: none .. code-block:: python df = pd.read_sas('transport-file.xpt') + df = pd.read_sas('binary-file.sas7bdat') + +You can also specify the file format directly.
By default, pandas will try +to infer the file format based on its extension. + +.. code-block:: python + + df = pd.read_sas('transport-file.xpt', format='xport') + df = pd.read_sas('binary-file.sas7bdat', format='sas7bdat') XPORT is a relatively limited format and the parsing of it is not as optimized as some of the other pandas readers. An alternative way diff --git a/doc/source/comparison_with_sql.rst b/doc/source/comparison_with_sql.rst index 7962e0e69faa1..2112c7de8c897 100644 --- a/doc/source/comparison_with_sql.rst +++ b/doc/source/comparison_with_sql.rst @@ -101,7 +101,7 @@ Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame usi # tips by parties of at least 5 diners OR bill total was more than $45 tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)] -NULL checking is done using the :meth:`~pandas.Series.notnull` and :meth:`~pandas.Series.isnull` +NULL checking is done using the :meth:`~pandas.Series.notna` and :meth:`~pandas.Series.isna` methods. .. ipython:: python @@ -121,9 +121,9 @@ where ``col2`` IS NULL with the following query: .. ipython:: python - frame[frame['col2'].isnull()] + frame[frame['col2'].isna()] -Getting items where ``col1`` IS NOT NULL can be done with :meth:`~pandas.Series.notnull`. +Getting items where ``col1`` IS NOT NULL can be done with :meth:`~pandas.Series.notna`. .. code-block:: sql @@ -133,7 +133,7 @@ Getting items where ``col1`` IS NOT NULL can be done with :meth:`~pandas.Series. .. ipython:: python - frame[frame['col1'].notnull()] + frame[frame['col1'].notna()] GROUP BY diff --git a/doc/source/computation.rst b/doc/source/computation.rst index 57480a244f308..76a030d355e33 100644 --- a/doc/source/computation.rst +++ b/doc/source/computation.rst @@ -459,6 +459,48 @@ default of the index) in a DataFrame. dft dft.rolling('2s', on='foo').sum() +.. _stats.rolling_window.endpoints: + +Rolling Window Endpoints +~~~~~~~~~~~~~~~~~~~~~~~~ + +.. versionadded:: 0.20.0 + +The inclusion of the interval endpoints in rolling window calculations can be specified with the ``closed`` +parameter: + +.. csv-table:: + :header: "``closed``", "Description", "Default for" + :widths: 20, 30, 30 + + ``right``, close right endpoint, time-based windows + ``left``, close left endpoint, + ``both``, close both endpoints, fixed windows + ``neither``, open endpoints, + +For example, having the right endpoint open is useful in many problems that require that there is no contamination +from present information back to past information. This allows the rolling window to compute statistics +"up to that point in time", but not including that point in time. + +.. ipython:: python + + df = pd.DataFrame({'x': 1}, + index = [pd.Timestamp('20130101 09:00:01'), + pd.Timestamp('20130101 09:00:02'), + pd.Timestamp('20130101 09:00:03'), + pd.Timestamp('20130101 09:00:04'), + pd.Timestamp('20130101 09:00:06')]) + + df["right"] = df.rolling('2s', closed='right').x.sum() # default + df["both"] = df.rolling('2s', closed='both').x.sum() + df["left"] = df.rolling('2s', closed='left').x.sum() + df["neither"] = df.rolling('2s', closed='neither').x.sum() + + df + +Currently, this feature is only implemented for time-based windows. +For fixed windows, the closed parameter cannot be set and the rolling window will always have both endpoints closed. + .. _stats.moments.ts-versus-resampling: Time-aware Rolling vs. 
Resampling @@ -505,13 +547,18 @@ two ``Series`` or any combination of ``DataFrame/Series`` or - ``DataFrame/DataFrame``: by default compute the statistic for matching column names, returning a DataFrame. If the keyword argument ``pairwise=True`` is passed then computes the statistic for each pair of columns, returning a - ``Panel`` whose ``items`` are the dates in question (see :ref:`the next section + ``MultiIndexed DataFrame`` whose ``index`` contains the dates in question (see :ref:`the next section `). For example: .. ipython:: python + df = pd.DataFrame(np.random.randn(1000, 4), + index=pd.date_range('1/1/2000', periods=1000), + columns=['A', 'B', 'C', 'D']) + df = df.cumsum() + df2 = df[:20] df2.rolling(window=5).corr(df2['B']) @@ -520,11 +567,16 @@ For example: Computing rolling pairwise covariances and correlations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. warning:: + + Prior to version 0.20.0, if ``pairwise=True`` was passed, a ``Panel`` would be returned. + This will now return a 2-level MultiIndexed DataFrame; see the whatsnew :ref:`here ` + In financial data analysis and other fields it's common to compute covariance and correlation matrices for a collection of time series. Often one is also interested in moving-window covariance and correlation matrices. This can be done by passing the ``pairwise`` keyword argument, which in the case of -``DataFrame`` inputs will yield a ``Panel`` whose ``items`` are the dates in +``DataFrame`` inputs will yield a MultiIndexed ``DataFrame`` whose ``index`` contains the dates in question. In the case of a single DataFrame argument the ``pairwise`` argument can even be omitted: @@ -539,15 +591,15 @@ can even be omitted: .. ipython:: python covs = df[['B','C','D']].rolling(window=50).cov(df[['A','B','C']], pairwise=True) - covs[df.index[-50]] + covs.loc['2002-09-22':] .. ipython:: python correls = df.rolling(window=50).corr() - correls[df.index[-50]] + correls.loc['2002-09-22':] You can efficiently retrieve the time series of correlations between two -columns using ``.loc`` indexing: +columns by reshaping and indexing: .. ipython:: python :suppress: .. ipython:: python @savefig rolling_corr_pairwise_ex.png - correls.loc[:, 'A', 'C'].plot() + correls.unstack(1)[('A', 'C')].plot() .. _stats.aggregate: Aggregation ----------- Once the ``Rolling``, ``Expanding`` or ``EWM`` objects have been created, several methods are available to -perform multiple computations on the data. This is very similar to a ``.groupby(...).agg`` seen :ref:`here `. +perform multiple computations on the data. These operations are similar to the :ref:`aggregating API `, +:ref:`groupby API `, and :ref:`resample API `. + .. ipython:: python @@ -590,24 +644,16 @@ columns if none are selected. .. _stats.aggregate.multifunc: -Applying multiple functions at once -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Applying multiple functions +~~~~~~~~~~~~~~~~~~~~~~~~~~~ -With windowed Series you can also pass a list or dict of functions to do +With windowed ``Series`` you can also pass a list of functions to do aggregation with, outputting a DataFrame: .. ipython:: python r['A'].agg([np.sum, np.mean, np.std]) -If a dict is passed, the keys will be used to name the columns. Otherwise the -function's name (stored in the function object) will be used. - -..
ipython:: python - - r['A'].agg({'result1' : np.sum, - 'result2' : np.mean}) - On a windowed DataFrame, you can pass a list of functions to apply to each column, which produces an aggregated result with a hierarchical index: @@ -622,7 +668,7 @@ Applying different functions to DataFrame columns ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ By passing a dict to ``aggregate`` you can apply a different aggregation to the -columns of a DataFrame: +columns of a ``DataFrame``: .. ipython:: python :okexcept: diff --git a/doc/source/conf.py b/doc/source/conf.py index 1e82dfca87d17..6eb12324ee461 100644 --- a/doc/source/conf.py +++ b/doc/source/conf.py @@ -14,8 +14,22 @@ import os import re import inspect +import importlib from pandas.compat import u, PY3 +try: + raw_input # Python 2 +except NameError: + raw_input = input # Python 3 + +# https://github.com/sphinx-doc/sphinx/pull/2325/files +# Workaround for sphinx-build recursion limit overflow: +# pickle.dump(doctree, f, pickle.HIGHEST_PROTOCOL) +# RuntimeError: maximum recursion depth exceeded while pickling an object +# +# Python's default allowed recursion depth is 1000. +sys.setrecursionlimit(5000) + # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. @@ -44,14 +58,16 @@ 'numpydoc', # used to parse numpy-style docstrings for autodoc 'ipython_sphinxext.ipython_directive', 'ipython_sphinxext.ipython_console_highlighting', + 'IPython.sphinxext.ipython_console_highlighting', # lowercase didn't work 'sphinx.ext.intersphinx', 'sphinx.ext.coverage', - 'sphinx.ext.pngmath', + 'sphinx.ext.mathjax', 'sphinx.ext.ifconfig', 'sphinx.ext.linkcode', + 'nbsphinx', ] - +exclude_patterns = ['**.ipynb_checkpoints'] with open("index.rst") as f: index_rst_lines = f.readlines() @@ -62,15 +78,16 @@ # JP: added from sphinxdocs autosummary_generate = False -if any([re.match("\s*api\s*",l) for l in index_rst_lines]): +if any([re.match("\s*api\s*", l) for l in index_rst_lines]): autosummary_generate = True files_to_delete = [] for f in os.listdir(os.path.dirname(__file__)): - if not f.endswith('.rst') or f.startswith('.') or os.path.basename(f) == 'index.rst': + if (not f.endswith(('.ipynb', '.rst')) or + f.startswith('.') or os.path.basename(f) == 'index.rst'): continue - _file_basename = f.split('.rst')[0] + _file_basename = os.path.splitext(f)[0] _regex_to_match = "\s*{}\s*$".format(_file_basename) if not any([re.match(_regex_to_match, line) for line in index_rst_lines]): files_to_delete.append(f) @@ -215,20 +232,70 @@ # Additional templates that should be rendered to pages, maps page names to # template names.
-# Add redirect for previously existing API pages (which are now included in -# the API pages as top-level functions) based on a template (GH9911) +# Add redirect for previously existing API pages +# each item is like `(from_old, to_new)` +# To redirect a class and all its methods, see below +# https://github.com/pandas-dev/pandas/issues/16186 + moved_api_pages = [ - 'pandas.core.common.isnull', 'pandas.core.common.notnull', 'pandas.core.reshape.get_dummies', - 'pandas.tools.merge.concat', 'pandas.tools.merge.merge', 'pandas.tools.pivot.pivot_table', - 'pandas.tseries.tools.to_datetime', 'pandas.io.clipboard.read_clipboard', 'pandas.io.excel.ExcelFile.parse', - 'pandas.io.excel.read_excel', 'pandas.io.html.read_html', 'pandas.io.json.read_json', - 'pandas.io.parsers.read_csv', 'pandas.io.parsers.read_fwf', 'pandas.io.parsers.read_table', - 'pandas.io.pickle.read_pickle', 'pandas.io.pytables.HDFStore.append', 'pandas.io.pytables.HDFStore.get', - 'pandas.io.pytables.HDFStore.put', 'pandas.io.pytables.HDFStore.select', 'pandas.io.pytables.read_hdf', - 'pandas.io.sql.read_sql', 'pandas.io.sql.read_frame', 'pandas.io.sql.write_frame', - 'pandas.io.stata.read_stata'] - -html_additional_pages = {'generated/' + page: 'api_redirect.html' for page in moved_api_pages} + ('pandas.core.common.isnull', 'pandas.isna'), + ('pandas.core.common.notnull', 'pandas.notna'), + ('pandas.core.reshape.get_dummies', 'pandas.get_dummies'), + ('pandas.tools.merge.concat', 'pandas.concat'), + ('pandas.tools.merge.merge', 'pandas.merge'), + ('pandas.tools.pivot.pivot_table', 'pandas.pivot_table'), + ('pandas.tseries.tools.to_datetime', 'pandas.to_datetime'), + ('pandas.io.clipboard.read_clipboard', 'pandas.read_clipboard'), + ('pandas.io.excel.ExcelFile.parse', 'pandas.ExcelFile.parse'), + ('pandas.io.excel.read_excel', 'pandas.read_excel'), + ('pandas.io.gbq.read_gbq', 'pandas.read_gbq'), + ('pandas.io.html.read_html', 'pandas.read_html'), + ('pandas.io.json.read_json', 'pandas.read_json'), + ('pandas.io.parsers.read_csv', 'pandas.read_csv'), + ('pandas.io.parsers.read_fwf', 'pandas.read_fwf'), + ('pandas.io.parsers.read_table', 'pandas.read_table'), + ('pandas.io.pickle.read_pickle', 'pandas.read_pickle'), + ('pandas.io.pytables.HDFStore.append', 'pandas.HDFStore.append'), + ('pandas.io.pytables.HDFStore.get', 'pandas.HDFStore.get'), + ('pandas.io.pytables.HDFStore.put', 'pandas.HDFStore.put'), + ('pandas.io.pytables.HDFStore.select', 'pandas.HDFStore.select'), + ('pandas.io.pytables.read_hdf', 'pandas.read_hdf'), + ('pandas.io.sql.read_sql', 'pandas.read_sql'), + ('pandas.io.sql.read_frame', 'pandas.read_frame'), + ('pandas.io.sql.write_frame', 'pandas.write_frame'), + ('pandas.io.stata.read_stata', 'pandas.read_stata'), +] + +# Again, tuples of (from_old, to_new) +moved_classes = [ + ('pandas.tseries.resample.Resampler', 'pandas.core.resample.Resampler'), + ('pandas.formats.style.Styler', 'pandas.io.formats.style.Styler'), +] + +for old, new in moved_classes: + # the class itself... + moved_api_pages.append((old, new)) + + mod, classname = new.rsplit('.', 1) + klass = getattr(importlib.import_module(mod), classname) + methods = [x for x in dir(klass) + if not x.startswith('_') or x in ('__iter__', '__array__')] + + for method in methods: + # ... 
and each of its public methods + moved_api_pages.append( + ("{old}.{method}".format(old=old, method=method), + "{new}.{method}".format(new=new, method=method)) + ) + +html_additional_pages = { + 'generated/' + page[0]: 'api_redirect.html' + for page in moved_api_pages +} + +html_context = { + 'redirects': {old: new for old, new in moved_api_pages} +} # If false, no module index is generated. html_use_modindex = True @@ -253,6 +320,9 @@ # Output file base name for HTML help builder. htmlhelp_basename = 'pandas' +# -- Options for nbsphinx ------------------------------------------------ + +nbsphinx_allow_errors = True # -- Options for LaTeX output -------------------------------------------- diff --git a/doc/source/contributing.rst b/doc/source/contributing.rst index ecc2a5e723c45..e172d0d2a71a2 100644 --- a/doc/source/contributing.rst +++ b/doc/source/contributing.rst @@ -51,14 +51,9 @@ Bug reports must: ... ``` -#. Include the full version string of *pandas* and its dependencies. In versions - of *pandas* after 0.12 you can use a built in function:: - - >>> from pandas.util.print_versions import show_versions - >>> show_versions() - - and in *pandas* 0.13.1 onwards:: +#. Include the full version string of *pandas* and its dependencies. You can use the built in function:: + >>> import pandas as pd >>> pd.show_versions() #. Explain why the current behavior is wrong/not desired and what you expect instead. @@ -113,13 +108,6 @@ want to clone your fork to your machine:: This creates the directory `pandas-yourname` and connects your repository to the upstream (main project) *pandas* repository. -The testing suite will run automatically on Travis-CI and Appveyor once your -pull request is submitted. However, if you wish to run the test suite on a -branch prior to submitting the pull request, then Travis-CI and/or AppVeyor -need to be hooked up to your GitHub repository. Instructions for doing so -are `here `__ for -Travis-CI and `here `__ for AppVeyor. - Creating a branch ----------------- @@ -183,7 +171,7 @@ other dependencies, you can install them as follows:: To install *all* pandas dependencies you can do the following:: - conda install -n pandas_dev -c pandas --file ci/requirements_all.txt + conda install -n pandas_dev -c conda-forge --file ci/requirements_all.txt To work in this environment, Windows users should ``activate`` it as follows:: @@ -216,7 +204,7 @@ At this point you can easily do an *in-place* install, as detailed in the next s Creating a Windows development environment ------------------------------------------ -To build on Windows, you need to have compilers installed to build the extensions. You will need to install the appropriate Visual Studio compilers, VS 2008 for Python 2.7, VS 2010 for 3.4, and VS 2015 for Python 3.5. +To build on Windows, you need to have compilers installed to build the extensions. You will need to install the appropriate Visual Studio compilers, VS 2008 for Python 2.7, VS 2010 for 3.4, and VS 2015 for Python 3.5 and 3.6. For Python 2.7, you can install the ``mingw`` compiler which will work equivalently to VS 2008:: @@ -226,7 +214,7 @@ or use the `Microsoft Visual Studio VC++ compiler for Python `__. Read the references below as there may be various gotchas during the installation. -For Python 3.5, you can download and install the `Visual Studio 2015 Community Edition `__. +For Python 3.5 and 3.6, you can download and install the `Visual Studio 2015 Community Edition `__. 
Here are some references and blogs: @@ -359,15 +347,14 @@ have ``sphinx`` and ``ipython`` installed. `numpydoc `_ is used to parse the docstrings that follow the Numpy Docstring Standard (see above), but you don't need to install this because a local copy of numpydoc is included in the *pandas* source -code. -`nbconvert `_ and -`nbformat `_ are required to build +code. `nbsphinx `_ is required to build the Jupyter notebooks included in the documentation. If you have a conda environment named ``pandas_dev``, you can install the extra requirements with:: conda install -n pandas_dev sphinx ipython nbconvert nbformat + conda install -n pandas_dev -c conda-forge nbsphinx Furthermore, it is recommended to have all :ref:`optional dependencies ` installed. This is not strictly necessary, but be aware that you will see some error @@ -432,7 +419,8 @@ Building master branch documentation When pull requests are merged into the *pandas* ``master`` branch, the main parts of the documentation are also built by Travis-CI. These docs are then hosted `here -`__. +`__, see also +the :ref:`Continuous Integration ` section. Contributing to the code base ============================= @@ -444,8 +432,9 @@ Code standards -------------- Writing good code is not just about what you write. It is also about *how* you -write it. During testing on Travis-CI, several tools will be run to check your -code for stylistic errors. Generating any warnings will cause the test to fail. +write it. During :ref:`Continuous Integration ` testing, several +tools will be run to check your code for stylistic errors. +Generating any warnings will cause the test to fail. Thus, good style is a requirement for submitting code to *pandas*. In addition, because a lot of people use our library, it is important that we @@ -461,13 +450,14 @@ C (cpplint) *pandas* uses the `Google `_ standard. Google provides an open source style checker called ``cpplint``, but we -use a fork of it that can be found `here `_. +use a fork of it that can be found `here `__. Here are *some* of the more common ``cpplint`` issues: - we restrict line-length to 80 characters to promote readability - every header file must include a header guard to avoid name collisions if re-included -Travis-CI will run the `cpplint `_ tool +:ref:`Continuous Integration ` will run the +`cpplint `_ tool and report any stylistic errors in your code. Therefore, it is helpful before submitting code to run the check yourself:: @@ -479,7 +469,7 @@ You can also run this command on an entire directory if necessary:: To make your commits compliant with this standard, you can install the `ClangFormat `_ tool, which can be -downloaded `here `_. To configure, in your home directory, +downloaded `here `__. To configure, in your home directory, run the following command:: clang-format style=google -dump-config > .clang-format @@ -514,20 +504,42 @@ the more common ``PEP8`` issues: - we restrict line-length to 79 characters to promote readability - passing arguments should have spaces after commas, e.g. ``foo(arg1, arg2, kw1='bar')`` -Travis-CI will run the `flake8 `_ tool +:ref:`Continuous Integration ` will run +the `flake8 `_ tool and report any stylistic errors in your code. Therefore, it is helpful before submitting code to run the check yourself on the diff:: - git diff master | flake8 --diff + git diff master -u -- "*.py" | flake8 --diff + +This command will catch any stylistic errors in your changes specifically, but +beware it may not catch all of them. For example, if you delete the only
For example, if you delete the only +usage of an imported function, it is stylistically incorrect to import an +unused function. However, style-checking the diff will not catch this because +the actual import is not part of the diff. Thus, for completeness, you should +run this command, though it will take longer:: + + git diff master --name-only -- "*.py" | grep "pandas/" | xargs -r flake8 -Furthermore, we've written a tool to check that your commits are PEP8 great, `pip install pep8radius -`_. Look at PEP8 fixes in your branch vs master with:: +Note that on OSX, the ``-r`` flag is not available, so you have to omit it and +run this slightly modified command:: - pep8radius master --diff + git diff master --name-only -- "*.py" | grep "pandas/" | xargs flake8 -and make these changes with:: +Note that on Windows, these commands are unfortunately not possible because +commands like ``grep`` and ``xargs`` are not available natively. To imitate the +behavior with the commands above, you should run:: - pep8radius master --diff --in-place + git diff master --name-only -- "*.py" + +This will list all of the Python files that have been modified. The only ones +that matter during linting are any whose directory filepath begins with "pandas." +For each filepath, copy and paste it after the ``flake8`` command as shown below: + + flake8 + +Alternatively, you can install the ``grep`` and ``xargs`` commands via the +`MinGW `__ toolchain, and it will allow you to run the +commands above. Backwards Compatibility ~~~~~~~~~~~~~~~~~~~~~~~ @@ -537,6 +549,35 @@ existing code, so don't break it if at all possible. If you think breakage is r clearly state why as part of the pull request. Also, be careful when changing method signatures and add deprecation warnings where needed. +.. _contributing.ci: + +Testing With Continuous Integration +----------------------------------- + +The *pandas* test suite will run automatically on `Travis-CI `__, +`Appveyor `__, and `Circle CI `__ continuous integration +services, once your pull request is submitted. +However, if you wish to run the test suite on a branch prior to submitting the pull request, +then the continuous integration services need to be hooked to your GitHub repository. Instructions are here +for `Travis-CI `__, +`Appveyor `__ , and `CircleCI `__. + +A pull-request will be considered for merging when you have an all 'green' build. If any tests are failing, +then you will get a red 'X', where you can click through to see the individual failed tests. +This is an example of a green build. + +.. image:: _static/ci.png + +.. note:: + + Each time you push to *your* fork, a *new* run of the tests will be triggered on the CI. Appveyor will auto-cancel + any non-currently-running tests for that same pull-request. You can enable the auto-cancel feature for + `Travis-CI here `__ and + for `CircleCI here `__. + +.. _contributing.tdd: + + Test-driven development/code writing ------------------------------------ @@ -552,11 +593,15 @@ use cases and writing corresponding tests. Adding tests is one of the most common requests after code is pushed to *pandas*. Therefore, it is worth getting in the habit of writing tests ahead of time so this is never an issue. -Like many packages, *pandas* uses the `Nose testing system -`_ and the convenient +Like many packages, *pandas* uses `pytest +`_ and the convenient extensions in `numpy.testing `_. +.. note:: + + The earliest supported pytest version is 3.1.0. 
+ Writing tests ~~~~~~~~~~~~~ @@ -589,23 +634,154 @@ the expected correct result:: assert_frame_equal(pivoted, expected) +Transitioning to ``pytest`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The existing *pandas* test structure is *mostly* class based, meaning that you will typically find tests wrapped in a class. + +.. code-block:: python + + class TestReallyCoolFeature(object): + .... + +Going forward, we are moving to a more *functional* style using the `pytest `__ framework, which offers a richer testing +framework that will facilitate testing and developing. Thus, instead of writing test classes, we will write test functions like this: + +.. code-block:: python + + def test_really_cool_feature(): + .... + +Using ``pytest`` +~~~~~~~~~~~~~~~~ + +Here is an example of a self-contained set of tests that illustrate multiple features that we like to use. + +- functional style: tests are like ``test_*`` and *only* take arguments that are either fixtures or parameters +- ``pytest.mark`` can be used to set metadata on test functions, e.g. ``skip`` or ``xfail``. +- using ``parametrize``: allow testing of multiple cases +- to set a mark on a parameter, ``pytest.param(..., marks=...)`` syntax should be used +- ``fixture``, code for object construction, on a per-test basis +- using bare ``assert`` for scalars and truth-testing +- ``tm.assert_series_equal`` (and its counterpart ``tm.assert_frame_equal``), for pandas object comparisons. +- the typical pattern of constructing an ``expected`` and comparing versus the ``result`` + +We would name this file ``test_cool_feature.py`` and put it in an appropriate place in the ``pandas/tests/`` structure. + +.. code-block:: python + + import pytest + import numpy as np + import pandas as pd + from pandas.util import testing as tm + + @pytest.mark.parametrize('dtype', ['int8', 'int16', 'int32', 'int64']) + def test_dtypes(dtype): + assert str(np.dtype(dtype)) == dtype + + @pytest.mark.parametrize('dtype', ['float32', + pytest.param('int16', marks=pytest.mark.skip), + pytest.param('int32', + marks=pytest.mark.xfail(reason='to show how it works'))]) + def test_mark(dtype): + assert str(np.dtype(dtype)) == 'float32' + + @pytest.fixture + def series(): + return pd.Series([1, 2, 3]) + + @pytest.fixture(params=['int8', 'int16', 'int32', 'int64']) + def dtype(request): + return request.param + + def test_series(series, dtype): + result = series.astype(dtype) + assert result.dtype == dtype + + expected = pd.Series([1, 2, 3], dtype=dtype) + tm.assert_series_equal(result, expected) + + +A test run of this yields: + +.. code-block:: shell + + ((pandas) bash-3.2$ pytest test_cool_feature.py -v + =========================== test session starts =========================== + platform darwin -- Python 3.6.2, pytest-3.2.1, py-1.4.31, pluggy-0.4.0 + collected 11 items + + tester.py::test_dtypes[int8] PASSED + tester.py::test_dtypes[int16] PASSED + tester.py::test_dtypes[int32] PASSED + tester.py::test_dtypes[int64] PASSED + tester.py::test_mark[float32] PASSED + tester.py::test_mark[int16] SKIPPED + tester.py::test_mark[int32] xfail + tester.py::test_series[int8] PASSED + tester.py::test_series[int16] PASSED + tester.py::test_series[int32] PASSED + tester.py::test_series[int64] PASSED + +Tests that we have ``parametrized`` are now accessible via the test name, for example we could run these with ``-k int8`` to sub-select *only* those tests which match ``int8``.
.. code-block:: shell + + ((pandas) bash-3.2$ pytest test_cool_feature.py -v -k int8 + =========================== test session starts =========================== + platform darwin -- Python 3.6.2, pytest-3.2.1, py-1.4.31, pluggy-0.4.0 + collected 11 items + + test_cool_feature.py::test_dtypes[int8] PASSED + test_cool_feature.py::test_series[int8] PASSED + + Running the test suite -~~~~~~~~~~~~~~~~~~~~~~ +---------------------- The tests can then be run directly inside your Git clone (without having to install *pandas*) by typing:: - nosetests pandas + pytest pandas The test suite is exhaustive and takes around 20 minutes to run. Often it is worth running only a subset of tests first around your changes before running the -entire suite. This is done using one of the following constructs:: +entire suite. + +The easiest way to do this is with:: + + pytest pandas/path/to/test.py -k regex_matching_test_name + +Or with one of the following constructs:: + + pytest pandas/tests/[test-module].py + pytest pandas/tests/[test-module].py::[TestClass] + pytest pandas/tests/[test-module].py::[TestClass]::[test_method] + +Using `pytest-xdist `_, one can +speed up local testing on multicore machines. To use this feature, you will +need to install `pytest-xdist` via:: + + pip install pytest-xdist - nosetests pandas/tests/[test-module].py - nosetests pandas/tests/[test-module].py:[TestClass] - nosetests pandas/tests/[test-module].py:[TestClass].[test_method] +Two scripts are provided to assist with this. These scripts distribute +testing across 4 threads. - .. versionadded:: 0.18.0 +On Unix variants, one can type:: + + test_fast.sh + +On Windows, one can type:: + + test_fast.bat + +This can significantly reduce the time it takes to locally run tests before +submitting a pull request. + +For more, see the `pytest `_ documentation. + + .. versionadded:: 0.20.0 Furthermore one can run @@ -616,7 +792,8 @@ Furthermore one can run with an imported pandas to run tests similarly. Running the performance test suite -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +---------------------------------- + Performance matters and it is worth considering whether your code has introduced performance regressions. *pandas* is in the process of migrating to `asv benchmarks `__ @@ -624,12 +801,6 @@ to enable easy monitoring of the performance of critical *pandas* operations. These benchmarks are all found in the ``pandas/asv_bench`` directory. asv supports both python2 and python3. -.. note:: - - The asv benchmark suite was translated from the previous framework, vbench, - so many stylistic issues are likely a result of automated transformation of the - code. - To use all features of asv, you will need either ``conda`` or ``virtualenv``. For more details please check the `asv installation webpage `_. @@ -689,73 +860,6 @@ This will display stderr from the benchmarks, and use your local Information on how to write a benchmark and how to use asv can be found in the `asv documentation `_. -.. _contributing.gbq_integration_tests: - -Running Google BigQuery Integration Tests -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -You will need to create a Google BigQuery private key in JSON format in -order to run Google BigQuery integration tests on your local machine and -on Travis-CI. The first step is to create a `service account -`__.
- -Integration tests for ``pandas.io.gbq`` are skipped in pull requests because -the credentials that are required for running Google BigQuery integration -tests are `encrypted `__ -on Travis-CI and are only accessible from the pandas-dev/pandas repository. The -credentials won't be available on forks of pandas. Here are the steps to run -gbq integration tests on a forked repository: - -#. Go to `Travis CI `__ and sign in with your GitHub - account. -#. Click on the ``+`` icon next to the ``My Repositories`` list and enable - Travis builds for your fork. -#. Click on the gear icon to edit your travis build, and add two environment - variables: - - - ``GBQ_PROJECT_ID`` with the value being the ID of your BigQuery project. - - - ``SERVICE_ACCOUNT_KEY`` with the value being the contents of the JSON key - that you downloaded for your service account. Use single quotes around - your JSON key to ensure that it is treated as a string. - - For both environment variables, keep the "Display value in build log" option - DISABLED. These variables contain sensitive data and you do not want their - contents being exposed in build logs. -#. Your branch should be tested automatically once it is pushed. You can check - the status by visiting your Travis branches page which exists at the - following location: https://travis-ci.org/your-user-name/pandas/branches . - Click on a build job for your branch. Expand the following line in the - build log: ``ci/print_skipped.py /tmp/nosetests.xml`` . Search for the - term ``test_gbq`` and confirm that gbq integration tests are not skipped. - -Running the vbench performance test suite (phasing out) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Historically, *pandas* used `vbench library `_ -to enable easy monitoring of the performance of critical *pandas* operations. -These benchmarks are all found in the ``pandas/vb_suite`` directory. vbench -currently only works on python2. - -To install vbench:: - - pip install git+https://github.com/pydata/vbench - -Vbench also requires ``sqlalchemy``, ``gitpython``, and ``psutil``, which can all be installed -using pip. If you need to run a benchmark, change your directory to the *pandas* root and run:: - - ./test_perf.sh -b master -t HEAD - -This will check out the master revision and run the suite on both master and -your commit. Running the full test suite can take up to one hour and use up -to 3GB of RAM. Usually it is sufficient to paste a subset of the results into the Pull Request to show that the committed changes do not cause unexpected -performance regressions. - -You can run specific benchmarks using the ``-r`` flag, which takes a regular expression. - -See the `performance testing wiki `_ for information -on how to write a benchmark. - Documenting your code --------------------- @@ -915,12 +1019,8 @@ updated. Pushing them to GitHub again is done by:: git push -f origin shiny-new-feature This will automatically update your pull request with the latest code and restart the -Travis-CI tests. +:ref:`Continuous Integration ` tests. -If your pull request is related to the ``pandas.io.gbq`` module, please see -the section on :ref:`Running Google BigQuery Integration Tests -` to configure a Google BigQuery service -account for your pull request on Travis-CI. 
Delete your merged branch (optional) ------------------------------------ diff --git a/doc/source/cookbook.rst b/doc/source/cookbook.rst index 841195de3da47..32e7a616fe856 100644 --- a/doc/source/cookbook.rst +++ b/doc/source/cookbook.rst @@ -776,11 +776,17 @@ Resampling The :ref:`Resample ` docs. -`TimeGrouping of values grouped across time -`__ +`Using Grouper instead of TimeGrouper for time grouping of values +`__ -`TimeGrouping #2 -`__ +`Time grouping with some missing values +`__ + +`Valid frequency arguments to Grouper +`__ + +`Grouping using a MultiIndex +`__ `Using TimeGrouper and another grouping to create subgroups, then apply a custom function `__ @@ -905,14 +911,11 @@ CSV The :ref:`CSV ` docs -`read_csv in action `__ +`read_csv in action `__ `appending to a csv `__ -`how to read in multiple files, appending to create a single dataframe -`__ - `Reading a csv chunk-by-chunk `__ @@ -943,6 +946,42 @@ using that handle to read. `Write a multi-row index CSV without writing duplicates `__ +.. _cookbook.csv.multiple_files: + +Reading multiple files to create a single DataFrame +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The best way to combine multiple files into a single DataFrame is to read the individual frames one by one, put all +of the individual frames into a list, and then combine the frames in the list using :func:`pd.concat`: + +.. ipython:: python + + for i in range(3): + data = pd.DataFrame(np.random.randn(10, 4)) + data.to_csv('file_{}.csv'.format(i)) + + files = ['file_0.csv', 'file_1.csv', 'file_2.csv'] + result = pd.concat([pd.read_csv(f) for f in files], ignore_index=True) + +You can use the same approach to read all files matching a pattern. Here is an example using ``glob``: + +.. ipython:: python + + import glob + files = glob.glob('file_*.csv') + result = pd.concat([pd.read_csv(f) for f in files], ignore_index=True) + +Finally, this strategy will work with the other ``pd.read_*(...)`` functions described in the :ref:`io docs`. + +.. ipython:: python + :suppress: + + for i in range(3): + os.remove('file_{}.csv'.format(i)) + +Parsing date components in multi-columns +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + Parsing date components in multi-columns is faster with a format .. code-block:: python diff --git a/doc/source/developer.rst b/doc/source/developer.rst new file mode 100644 index 0000000000000..78c12b7e23b37 --- /dev/null +++ b/doc/source/developer.rst @@ -0,0 +1,135 @@ +.. _developer: + +.. currentmodule:: pandas + +.. ipython:: python + :suppress: + + import numpy as np + np.random.seed(123456) + np.set_printoptions(precision=4, suppress=True) + import pandas as pd + pd.options.display.max_rows = 15 + +********* +Developer +********* + +This section will focus on downstream applications of pandas. + +.. _apache.parquet: + +Storing pandas DataFrame objects in Apache Parquet format +--------------------------------------------------------- + +The `Apache Parquet `__ format +provides key-value metadata at the file and column level, stored in the footer +of the Parquet file: + +.. code-block:: shell + + 5: optional list key_value_metadata + +where ``KeyValue`` is + +.. code-block:: shell + + struct KeyValue { + 1: required string key + 2: optional string value + } + +So that a ``pandas.DataFrame`` can be faithfully reconstructed, we store a +``pandas`` metadata key in the ``FileMetaData`` with the value stored as:
.. code-block:: text + + {'index_columns': ['__index_level_0__', '__index_level_1__', ...], + 'columns': [<c0>, <c1>, ...], + 'pandas_version': $VERSION} + +Here, ``<c0>`` and so forth are dictionaries containing the metadata for each +column. This has JSON form: + +.. code-block:: text + + {'name': column_name, + 'pandas_type': pandas_type, + 'numpy_type': numpy_type, + 'metadata': type_metadata} + +``pandas_type`` is the logical type of the column, and is one of: + +* Boolean: ``'bool'`` +* Integers: ``'int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64'`` +* Floats: ``'float16', 'float32', 'float64'`` +* Date and Time Types: ``'datetime', 'datetimetz'``, ``'timedelta'`` +* String: ``'unicode', 'bytes'`` +* Categorical: ``'categorical'`` +* Other Python objects: ``'object'`` + +The ``numpy_type`` is the physical storage type of the column, which is the +result of ``str(dtype)`` for the underlying NumPy array that holds the data. So +for ``datetimetz`` this is ``datetime64[ns]`` and for categorical, it may be +any of the supported integer categorical types. + +The ``type_metadata`` is ``None`` except for: + +* ``datetimetz``: ``{'timezone': zone, 'unit': 'ns'}``, e.g. ``{'timezone': + 'America/New_York', 'unit': 'ns'}``. The ``'unit'`` is optional, and if + omitted it is assumed to be nanoseconds. +* ``categorical``: ``{'num_categories': K, 'ordered': is_ordered, 'type': $TYPE}`` + + * Here ``'type'`` is optional, and can be a nested pandas type specification + (but not categorical) + +* ``unicode``: ``{'encoding': encoding}`` + + * The encoding is optional, and if not present is UTF-8 + +* ``object``: ``{'encoding': encoding}``. Objects can be serialized and stored + in ``BYTE_ARRAY`` Parquet columns. The encoding can be one of: + + * ``'pickle'`` + * ``'msgpack'`` + * ``'bson'`` + * ``'json'`` + +* ``timedelta``: ``{'unit': 'ns'}``. The ``'unit'`` is optional, and if omitted + it is assumed to be nanoseconds. This metadata is optional altogether + +For types other than these, the ``'metadata'`` key can be +omitted. Implementations can assume ``None`` if the key is not present. + +As an example of fully-formed metadata: + +.. code-block:: text + + {'index_columns': ['__index_level_0__'], + 'columns': [ + {'name': 'c0', + 'pandas_type': 'int8', + 'numpy_type': 'int8', + 'metadata': None}, + {'name': 'c1', + 'pandas_type': 'bytes', + 'numpy_type': 'object', + 'metadata': None}, + {'name': 'c2', + 'pandas_type': 'categorical', + 'numpy_type': 'int16', + 'metadata': {'num_categories': 1000, 'ordered': False}}, + {'name': 'c3', + 'pandas_type': 'datetimetz', + 'numpy_type': 'datetime64[ns]', + 'metadata': {'timezone': 'America/Los_Angeles'}}, + {'name': 'c4', + 'pandas_type': 'object', + 'numpy_type': 'object', + 'metadata': {'encoding': 'pickle'}}, + {'name': '__index_level_0__', + 'pandas_type': 'int64', + 'numpy_type': 'int64', + 'metadata': None} + ], + 'pandas_version': '0.20.0'}
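As an illustration only, here is a hypothetical sketch of how a writer could assemble such a metadata blob for a simple frame; the helper below is not part of any library, and its dtype mapping is simplified to numeric columns with a default integer index:

.. code-block:: python

   import json
   import pandas as pd

   def make_pandas_metadata(df):
       # simplified: assumes numeric columns and a single unnamed index
       columns = [{'name': str(col),
                   'pandas_type': str(df[col].dtype),  # e.g. 'int64', 'float64'
                   'numpy_type': str(df[col].dtype),
                   'metadata': None}
                  for col in df.columns]
       columns.append({'name': '__index_level_0__',
                       'pandas_type': str(df.index.dtype),
                       'numpy_type': str(df.index.dtype),
                       'metadata': None})
       return json.dumps({'index_columns': ['__index_level_0__'],
                          'columns': columns,
                          'pandas_version': pd.__version__})

   df = pd.DataFrame({'c0': [1, 2, 3]})
   print(make_pandas_metadata(df))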
diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst index cc69367017aed..3c6572229802d 100644 --- a/doc/source/dsintro.rst +++ b/doc/source/dsintro.rst @@ -153,7 +153,7 @@ Vectorized operations and label alignment with Series ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When doing data analysis, as with raw NumPy arrays looping through Series -value-by-value is usually not necessary. Series can be also be passed into most +value-by-value is usually not necessary. Series can also be passed into most NumPy methods expecting an ndarray. @@ -763,6 +763,11 @@ completion mechanism so they can be tab-completed: Panel ----- +.. warning:: + + In 0.20.0, ``Panel`` is deprecated and will be removed in + a future version. See the section :ref:`Deprecate Panel `. + Panel is a somewhat less-used, but still important container for 3-dimensional data. The term `panel data `__ is derived from econometrics and is partially responsible for the name pandas: @@ -783,6 +788,7 @@ From 3D ndarray with optional axis labels ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. ipython:: python + :okwarning: wp = pd.Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'], major_axis=pd.date_range('1/1/2000', periods=5), @@ -794,6 +800,7 @@ From dict of DataFrame objects ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. ipython:: python + :okwarning: data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)), 'Item2' : pd.DataFrame(np.random.randn(4, 2))} @@ -816,6 +823,7 @@ dictionary of DataFrames as above, and the following named parameters: For example, compare to the construction above: .. ipython:: python + :okwarning: pd.Panel.from_dict(data, orient='minor') @@ -824,6 +832,7 @@ DataFrame objects with mixed-type columns, all of the data will get upcasted to ``dtype=object`` unless you pass ``orient='minor'``: .. ipython:: python + :okwarning: df = pd.DataFrame({'a': ['foo', 'bar', 'baz'], 'b': np.random.randn(3)}) @@ -851,6 +860,7 @@ This method was introduced in v0.7 to replace ``LongPanel.to_long``, and convert a DataFrame with a two-level index to a Panel. .. ipython:: python + :okwarning: midx = pd.MultiIndex(levels=[['one', 'two'], ['x','y']], labels=[[1,1,0,0],[1,0,1,0]]) df = pd.DataFrame({'A' : [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index=midx) @@ -880,6 +890,7 @@ A Panel can be rearranged using its ``transpose`` method (which does not make a copy by default unless the data are heterogeneous): .. ipython:: python + :okwarning: wp.transpose(2, 0, 1) @@ -909,6 +920,7 @@ Squeezing Another way to change the dimensionality of an object is to ``squeeze`` a 1-len object, similar to ``wp['Item1']`` .. ipython:: python + :okwarning: wp.reindex(items=['Item1']).squeeze() wp.reindex(items=['Item1'], minor=['B']).squeeze() @@ -923,12 +935,56 @@ for more on this. To convert a Panel to a DataFrame, use the ``to_frame`` method: .. ipython:: python + :okwarning: panel = pd.Panel(np.random.randn(3, 5, 4), items=['one', 'two', 'three'], major_axis=pd.date_range('1/1/2000', periods=5), minor_axis=['a', 'b', 'c', 'd']) panel.to_frame() + +.. _dsintro.deprecate_panel: + +Deprecate Panel +--------------- + +Over the last few years, pandas has increased in both breadth and depth, with new features, +datatype support, and manipulation routines. As a result, supporting efficient indexing and functional +routines for ``Series``, ``DataFrame`` and ``Panel`` has contributed to an increasingly fragmented and +difficult-to-understand codebase. + +The 3-D structure of a ``Panel`` is much less common for many types of data analysis +than the 1-D of the ``Series`` or the 2-D of the ``DataFrame``. Going forward, it makes sense for +pandas to focus on these areas exclusively. + +Oftentimes, one can simply use a MultiIndex ``DataFrame`` for easily working with higher dimensional data. + +In addition, the ``xarray`` package was built from the ground up, specifically in order to +support the multi-dimensional analysis that is one of ``Panel``'s main use cases. +`Here is a link to the xarray panel-transition documentation `__.
ipython:: python + :okwarning: + + p = tm.makePanel() + p + +Convert to a MultiIndex DataFrame + +.. ipython:: python + :okwarning: + + p.to_frame() + +Alternatively, one can convert to an xarray ``DataArray``. + +.. ipython:: python + :okwarning: + + p.to_xarray() + +You can see the full documentation for the `xarray package `__. + .. _dsintro.panelnd: .. _dsintro.panel4d: diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index 5a7d6a11d293d..2348a3d10c54f 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -93,8 +93,8 @@ targets the IPython Notebook environment. `Plotly’s `__ `Python API `__ enables interactive figures and web shareability. Maps, 2D, 3D, and live-streaming graphs are rendered with WebGL and `D3.js `__. The library supports plotting directly from a pandas DataFrame and cloud-based collaboration. Users of `matplotlib, ggplot for Python, and Seaborn `__ can convert figures into interactive web-based plots. Plots can be drawn in `IPython Notebooks `__ , edited with R or MATLAB, modified in a GUI, or embedded in apps and dashboards. Plotly is free for unlimited sharing, and has `cloud `__, `offline `__, or `on-premise `__ accounts for private use. -Visualizing Data in Qt applications -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +`QtPandas `__ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Spun off from the main pandas library, the `qtpandas `__ library enables DataFrame visualization and manipulation in PyQt4 and PySide applications. @@ -173,13 +173,15 @@ This package requires valid credentials for this API (non free). `pandaSDMX `__ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -pandaSDMX is an extensible library to retrieve and acquire statistical data +pandaSDMX is a library to retrieve and acquire statistical data and metadata disseminated in -`SDMX `_ 2.1. This standard is currently supported by -the European statistics office (Eurostat) -and the European Central Bank (ECB). Datasets may be returned as pandas Series -or multi-indexed DataFrames. - +`SDMX `_ 2.1, an ISO-standard +widely used by institutions such as statistics offices, central banks, +and international organisations. pandaSDMX can expose datasets and related +structural metadata including dataflows, code-lists, +and datastructure definitions as pandas Series +or multi-indexed DataFrames. + `fredapi `__ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fredapi is a Python interface to the `Federal Reserve Economic Data (FRED) `__ @@ -237,3 +239,14 @@ pandas own ``read_csv`` for CSV IO and leverages many existing packages such as PyTables, h5py, and pymongo to move data between non pandas formats. Its graph based approach is also extensible by end users for custom formats that may be too specific for the core of odo. + +.. _ecosystem.data_validation: + +Data validation +--------------- + +`Engarde `__ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Engarde is a lightweight library used to explicitly state your assumptions about your datasets +and check that they're *actually* true.
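To make the idea of explicit assumption-checking concrete, here is a rough plain-pandas sketch of the kind of validation step such a library automates (the frame and column names are hypothetical, and this is not Engarde's actual API):

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({'age': [25, 32, 47],
                      'salary': [50000.0, 62000.0, 80000.0]})

   # state the assumptions explicitly, then verify them before further processing
   assert df['age'].notnull().all(), 'age contains missing values'
   assert (df['age'] > 0).all(), 'age must be strictly positive'
   assert df['salary'].dtype == 'float64', 'salary should be float64'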
diff --git a/doc/source/gotchas.rst b/doc/source/gotchas.rst index 11827fe2776cf..a3062b4086673 100644 --- a/doc/source/gotchas.rst +++ b/doc/source/gotchas.rst @@ -144,7 +144,7 @@ To evaluate single-element pandas objects in a boolean context, use the method ` Bitwise boolean ~~~~~~~~~~~~~~~ -Bitwise boolean operators like ``==`` and ``!=`` will return a boolean ``Series``, +Bitwise boolean operators like ``==`` and ``!=`` return a boolean ``Series``, which is almost always what you want anyways. .. code-block:: python @@ -194,7 +194,7 @@ For lack of ``NA`` (missing) support from the ground up in NumPy and Python in general, we were given the difficult choice between either - A *masked array* solution: an array of data and an array of boolean values - indicating whether a value + indicating whether a value is there or is missing - Using a special sentinel value, bit pattern, or set of sentinel values to denote ``NA`` across the dtypes @@ -202,7 +202,7 @@ For many reasons we chose the latter. After years of production use it has proven, at least in my opinion, to be the best decision given the state of affairs in NumPy and Python in general. The special value ``NaN`` (Not-A-Number) is used everywhere as the ``NA`` value, and there are API -functions ``isnull`` and ``notnull`` which can be used across the dtypes to +functions ``isna`` and ``notna`` which can be used across the dtypes to detect NA values. However, it comes with it a couple of trade-offs which I most certainly have @@ -247,16 +247,16 @@ dtype in order to store the NAs. These are summarized by this table: ``integer``, cast to ``float64`` ``boolean``, cast to ``object`` -While this may seem like a heavy trade-off, I have found very few -cases where this is an issue in practice. Some explanation for the motivation -here in the next section. +While this may seem like a heavy trade-off, I have found very few cases where +this is an issue in practice i.e. storing values greater than 2**53. Some +explanation for the motivation is in the next section. Why not make NumPy like R? ~~~~~~~~~~~~~~~~~~~~~~~~~~ Many people have suggested that NumPy should simply emulate the ``NA`` support present in the more domain-specific statistical programming language `R -`__. Part of the reason is the NumPy type hierarchy: +`__. Part of the reason is the NumPy type hierarchy: .. csv-table:: :header: "Typeclass","Dtypes" @@ -305,7 +305,7 @@ the ``DataFrame.copy`` method. If you are doing a lot of copying of DataFrame objects shared among threads, we recommend holding locks inside the threads where the data copying occurs. -See `this link `__ +See `this link `__ for more information. @@ -332,5 +332,5 @@ using something similar to the following: s = pd.Series(newx) See `the NumPy documentation on byte order -`__ for more +`__ for more details. diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst index 8484ccd69a983..937d682d238b3 100644 --- a/doc/source/groupby.rst +++ b/doc/source/groupby.rst @@ -439,7 +439,9 @@ Aggregation ----------- Once the GroupBy object has been created, several methods are available to -perform a computation on the grouped data. +perform a computation on the grouped data. These operations are similar to the +:ref:`aggregating API `, :ref:`window functions API `, +and :ref:`resample API `. An obvious one is aggregation via the ``aggregate`` or equivalently ``agg`` method: @@ -502,7 +504,7 @@ index are the group names and whose values are the sizes of each group. 
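Before the multiple-function examples below, a minimal sketch of this basic aggregation workflow may help (the frame here is invented and is not the ``df`` used elsewhere in this patch):

.. code-block:: python

   import numpy as np
   import pandas as pd

   df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                      'C': [1.0, 2.0, 3.0, 4.0]})

   grouped = df.groupby('A')
   grouped['C'].agg(np.sum)   # one reduced value per group
   grouped.size()             # a Series of group sizes, indexed by group name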
Applying multiple functions at once ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -With grouped Series you can also pass a list or dict of functions to do +With grouped ``Series`` you can also pass a list or dict of functions to do aggregation with, outputting a DataFrame: .. ipython:: python @@ -510,23 +512,35 @@ aggregation with, outputting a DataFrame: grouped = df.groupby('A') grouped['C'].agg([np.sum, np.mean, np.std]) -If a dict is passed, the keys will be used to name the columns. Otherwise the -function's name (stored in the function object) will be used. +On a grouped ``DataFrame``, you can pass a list of functions to apply to each +column, which produces an aggregated result with a hierarchical index: .. ipython:: python - grouped['D'].agg({'result1' : np.sum, - 'result2' : np.mean}) + grouped.agg([np.sum, np.mean, np.std]) -On a grouped DataFrame, you can pass a list of functions to apply to each -column, which produces an aggregated result with a hierarchical index: + +The resulting aggregations are named for the functions themselves. If you +need to rename, then you can add in a chained operation for a ``Series`` like this: .. ipython:: python - grouped.agg([np.sum, np.mean, np.std]) + (grouped['C'].agg([np.sum, np.mean, np.std]) + .rename(columns={'sum': 'foo', + 'mean': 'bar', + 'std': 'baz'}) + ) + +For a grouped ``DataFrame``, you can rename in a similar manner: + +.. ipython:: python + + (grouped.agg([np.sum, np.mean, np.std]) + .rename(columns={'sum': 'foo', + 'mean': 'bar', + 'std': 'baz'}) + ) -Passing a dict of functions has different behavior by default, see the next -section. Applying different functions to DataFrame columns ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -580,9 +594,21 @@ Transformation -------------- The ``transform`` method returns an object that is indexed the same (same size) -as the one being grouped. Thus, the passed transform function should return a -result that is the same size as the group chunk. For example, suppose we wished -to standardize the data within each group: +as the one being grouped. The transform function must: + +* Return a result that is either the same size as the group chunk or + broadcastable to the size of the group chunk (e.g., a scalar, + ``grouped.transform(lambda x: x.iloc[-1])``). +* Operate column-by-column on the group chunk. The transform is applied to + the first group chunk using ``chunk.apply``. +* Not perform in-place operations on the group chunk. Group chunks should + be treated as immutable, and changes to a group chunk may produce unexpected + results. For example, when using ``fillna``, ``inplace`` must be ``False`` + (``grouped.transform(lambda x: x.fillna(inplace=False))``). +* (Optionally) operate on the entire group chunk. If this is supported, a + fast path is used starting from the *second* chunk. + +For example, suppose we wished to standardize the data within each group: .. ipython:: python @@ -620,6 +646,21 @@ We can also visually compare the original and transformed data sets. @savefig groupby_transform_plot.png compare.plot() +Transformation functions that have lower dimension outputs are broadcast to +match the shape of the input array. + +.. ipython:: python + + data_range = lambda x: x.max() - x.min() + ts.groupby(key).transform(data_range) + +Alternatively, the built-in methods could be used to produce the same +outputs. + +.. 
ipython:: python + + ts.groupby(key).transform('max') - ts.groupby(key).transform('min') + Another common data transform is to replace missing data with the group mean. .. ipython:: python @@ -664,8 +705,9 @@ and that the transformed data contains no NAs. .. note:: - Some functions when applied to a groupby object will automatically transform the input, returning - an object of the same shape as the original. Passing ``as_index=False`` will not affect these transformation methods. + Some functions when applied to a groupby object will automatically transform + the input, returning an object of the same shape as the original. Passing + ``as_index=False`` will not affect these transformation methods. For example: ``fillna, ffill, bfill, shift``. @@ -891,7 +933,7 @@ The dimension of the returned result can also change: d = pd.DataFrame({"a":["x", "y"], "b":[1,2]}) def identity(df): - print df + print(df) return df d.groupby("a").apply(identity) @@ -1080,12 +1122,36 @@ To see the order in which each row appears within its group, use the .. ipython:: python - df = pd.DataFrame(list('aaabba'), columns=['A']) - df + dfg = pd.DataFrame(list('aaabba'), columns=['A']) + dfg - df.groupby('A').cumcount() + dfg.groupby('A').cumcount() - df.groupby('A').cumcount(ascending=False) # kwarg only + dfg.groupby('A').cumcount(ascending=False) + +.. _groupby.ngroup: + +Enumerate groups +~~~~~~~~~~~~~~~~ + +.. versionadded:: 0.20.2 + +To see the ordering of the groups (as opposed to the order of rows +within a group given by ``cumcount``) you can use the ``ngroup`` +method. + +Note that the numbers given to the groups match the order in which the +groups would be seen when iterating over the groupby object, not the +order they are first observed. + +.. ipython:: python + + dfg = pd.DataFrame(list('aaabba'), columns=['A']) + dfg + + dfg.groupby('A').ngroup() + + dfg.groupby('A').ngroup(ascending=False) Plotting ~~~~~~~~ @@ -1134,14 +1200,41 @@ Regroup columns of a DataFrame according to their sum, and sum the aggregated on df df.groupby(df.sum(), axis=1).sum() +.. _groupby.multicolumn_factorization: + +Multi-column factorization +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +By using ``.ngroup()``, we can extract information about the groups in +a way similar to :func:`factorize` (as described further in the +:ref:`reshaping API `) but which applies +naturally to multiple columns of mixed type and different +sources. This can be useful as an intermediate categorical-like step +in processing, when the relationships between the group rows are more +important than their content, or as input to an algorithm which only +accepts the integer encoding. (For more information about support in +pandas for full categorical data, see the :ref:`Categorical +introduction ` and the +:ref:`API documentation `.) + +.. ipython:: python + + dfg = pd.DataFrame({"A": [1, 1, 2, 3, 2], "B": list("aaaba")}) + + dfg + + dfg.groupby(["A", "B"]).ngroup() + + dfg.groupby(["A", [0, 0, 0, 1, 1]]).ngroup() + Groupby by Indexer to 'resample' data ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Resampling produces new hypothetical samples(resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples. +Resampling produces new hypothetical samples (resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples. 
In order for resample to work on indices that are non-datetimelike, the following procedure can be utilized. -In the following examples, **df.index // 5** returns a binary array which is used to determine what get's selected for the groupby operation. +In the following examples, **df.index // 5** returns a binary array which is used to determine what gets selected for the groupby operation. .. note:: The below example shows how we can downsample by consolidation of samples into fewer samples. Here by using **df.index // 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation thereby reducing the number of samples. diff --git a/doc/source/index.rst.template b/doc/source/index.rst.template index 67072ff9fb224..f5c65e175b0db 100644 --- a/doc/source/index.rst.template +++ b/doc/source/index.rst.template @@ -116,7 +116,6 @@ See the package overview for more detail about what's in the library. whatsnew install contributing - faq overview 10min tutorials @@ -152,6 +151,7 @@ See the package overview for more detail about what's in the library. api {% endif -%} {%if not single -%} + developer internals release {% endif -%} diff --git a/doc/source/indexing.rst b/doc/source/indexing.rst index bc8997b313053..53a259ad6eb15 100644 --- a/doc/source/indexing.rst +++ b/doc/source/indexing.rst @@ -69,7 +69,7 @@ Different Choices for Indexing .. versionadded:: 0.11.0 Object selection has had a number of user-requested additions in order to -support more explicit location based indexing. pandas now supports three types +support more explicit location based indexing. Pandas now supports three types of multi-axis indexing. - ``.loc`` is primarily label based, but may also be used with a boolean array. ``.loc`` will raise ``KeyError`` when the items are not found. Allowed inputs are: - A single label, e.g. ``5`` or ``'a'``, (note that ``5`` is interpreted as a *label* of the index. This use is **not** an integer position along the index) - A list or array of labels ``['a', 'b', 'c']`` - - A slice object with labels ``'a':'f'``, (note that contrary to usual python - slices, **both** the start and the stop are included!) + - A slice object with labels ``'a':'f'`` (note that contrary to usual python + slices, **both** the start and the stop are included, when present in the + index! - also see :ref:`Slicing with labels + <indexing.slicing_with_labels>`) - A boolean array - A ``callable`` function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing (one of the above) @@ -225,10 +227,6 @@ as an attribute: dfa.A panel.one -You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful; -if you try to use attribute access to create a new column, it fails silently, creating a new attribute rather than a -new column. - .. ipython:: python sa.a = 5 @@ -265,6 +263,37 @@ You can also assign a ``dict`` to a row of a ``DataFrame``: x.iloc[1] = dict(x=9, y=99) x +You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful; +if you try to use attribute access to create a new column, it creates a new attribute rather than a +new column. In 0.21.0 and later, this will raise a ``UserWarning``: + +.. 
code-block:: ipython + + In[1]: df = pd.DataFrame({'one': [1., 2., 3.]}) + In[2]: df.two = [4, 5, 6] + UserWarning: Pandas doesn't allow Series to be assigned into nonexistent columns - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute_access + In[3]: df + Out[3]: + one + 0 1.0 + 1 2.0 + 2 3.0 + +Similarly, it is possible to create a column with a name which collides with one of Pandas's +built-in methods or attributes, which can cause confusion later when attempting to access +that column as an attribute. This behavior now warns: + +.. code-block:: ipython + + In[4]: df['sum'] = [5., 7., 9.] + UserWarning: Column name 'sum' collides with a built-in method, which will cause unexpected attribute behavior + In[5]: df.sum + Out[5]: + <bound method DataFrame.sum of    one  sum + 0  1.0  5.0 + 1  2.0  7.0 + 2  3.0  9.0> + Slicing ranges -------------- @@ -330,13 +359,16 @@ Selection By Label dfl.loc['20130102':'20130104'] pandas provides a suite of methods in order to have **purely label based indexing**. This is a strict inclusion based protocol. -**At least 1** of the labels for which you ask, must be in the index or a ``KeyError`` will be raised! When slicing, the start bound is *included*, **AND** the stop bound is *included*. Integers are valid labels, but they refer to the label **and not the position**. +**At least 1** of the labels for which you ask must be in the index or a ``KeyError`` will be raised! When slicing, both the start bound **AND** the stop bound are *included*, if present in the index. Integers are valid labels, but they refer to the label **and not the position**. The ``.loc`` attribute is the primary access method. The following are valid inputs: - A single label, e.g. ``5`` or ``'a'``, (note that ``5`` is interpreted as a *label* of the index. This use is **not** an integer position along the index) - A list or array of labels ``['a', 'b', 'c']`` -- A slice object with labels ``'a':'f'`` (note that contrary to usual python slices, **both** the start and the stop are included!) +- A slice object with labels ``'a':'f'`` (note that contrary to usual python + slices, **both** the start and the stop are included, when present in the + index! - also see :ref:`Slicing with labels + <indexing.slicing_with_labels>`) - A boolean array - A ``callable``, see :ref:`Selection By Callable ` @@ -390,6 +422,34 @@ For getting a value explicitly (equiv to deprecated ``df.get_value('a','A')``) # this is also equivalent to ``df1.at['a','A']`` df1.loc['a', 'A'] +.. _indexing.slicing_with_labels: + +Slicing with labels +~~~~~~~~~~~~~~~~~~~ + +When using ``.loc`` with slices, if both the start and the stop labels are +present in the index, then elements *located* between the two (including them) +are returned: + +.. ipython:: python + + s = pd.Series(list('abcde'), index=[0,3,2,5,4]) + s.loc[3:5] + +If at least one of the two is absent, but the index is sorted, and can be +compared against start and stop labels, then slicing will still work as +expected, by selecting labels which *rank* between the two: + +.. ipython:: python + + s.sort_index() + s.sort_index().loc[1:6] + +However, if at least one of the two is absent *and* the index is not sorted, an +error will be raised (since doing otherwise would be computationally expensive, +as well as potentially ambiguous for mixed type indexes). For instance, in the +above example, ``s.loc[1:6]`` would raise ``KeyError``. + .. _indexing.integer: Selection By Position --------------------- @@ -401,7 +461,7 @@ Selection By Position This is sometimes called ``chained assignment`` and should be avoided. 
See :ref:`Returning a View versus Copy ` -pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely python and numpy slicing. These are ``0-based`` indexing. When slicing, the start bounds is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label will raise a ``IndexError``. +Pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely python and numpy slicing. These are ``0-based`` indexing. When slicing, the start bounds is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label will raise an ``IndexError``. The ``.iloc`` attribute is the primary access method. The following are valid inputs: diff --git a/doc/source/install.rst b/doc/source/install.rst index 158a6e5562b7a..f92c43839ee31 100644 --- a/doc/source/install.rst +++ b/doc/source/install.rst @@ -23,18 +23,6 @@ Officially Python 2.7, 3.4, 3.5, and 3.6 Installing pandas ----------------- -Trying out pandas, no installation required! -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The easiest way to start experimenting with pandas doesn't involve installing -pandas at all. - -`Wakari `__ is a free service that provides a hosted -`IPython Notebook `__ service in the cloud. - -Simply create an account, and have access to pandas from within your brower via -an `IPython Notebook `__ in a few minutes. - .. _install.anaconda: Installing pandas with Anaconda @@ -188,8 +176,8 @@ Running the test suite pandas is equipped with an exhaustive set of unit tests covering about 97% of the codebase as of this writing. To run it on your machine to verify that everything is working (and you have all of the dependencies, soft and hard, -installed), make sure you have `nose -`__ and run: +installed), make sure you have `pytest +`__ and run: :: @@ -214,8 +202,8 @@ installed), make sure you have `nose Dependencies ------------ -* `setuptools `__ -* `NumPy `__: 1.7.1 or higher +* `setuptools `__ +* `NumPy `__: 1.9.0 or higher * `python-dateutil `__: 1.5 or higher * `pytz `__: Needed for time zone support @@ -226,10 +214,11 @@ Recommended Dependencies * `numexpr `__: for accelerating certain numerical operations. ``numexpr`` uses multiple cores as well as smart chunking and caching to achieve large speedups. - If installed, must be Version 2.1 or higher (excluding a buggy 2.4.4). Version 2.4.6 or higher is highly recommended. + If installed, must be Version 2.4.6 or higher. * `bottleneck `__: for accelerating certain types of ``nan`` - evaluations. ``bottleneck`` uses specialized cython routines to achieve large speedups. + evaluations. ``bottleneck`` uses specialized cython routines to achieve large speedups. If installed, + must be Version 1.0.0 or higher. .. note:: @@ -244,17 +233,18 @@ Optional Dependencies * `Cython `__: Only necessary to build development version. Version 0.23 or higher. -* `SciPy `__: miscellaneous statistical functions +* `SciPy `__: miscellaneous statistical functions, Version 0.14.0 or higher * `xarray `__: pandas like handling for > 2 dims, needed for converting Panels to xarray objects. Version 0.7.0 or higher is recommended. * `PyTables `__: necessary for HDF5-based storage. Version 3.0.0 or higher required, Version 3.2.1 or higher highly recommended. * `Feather Format `__: necessary for feather-based storage, version 0.3.1 or higher. 
+* `Apache Parquet `__, either `pyarrow `__ (>= 0.4.1) or `fastparquet `__ (>= 0.0.6) for parquet-based storage. The `snappy `__ and `brotli `__ are available for compression support. * `SQLAlchemy `__: for SQL database support. Version 0.8.1 or higher recommended. Besides SQLAlchemy, you also need a database specific driver. You can find an overview of supported drivers for each SQL dialect in the `SQLAlchemy docs `__. Some common drivers are: - - `psycopg2 `__: for PostgreSQL - - `pymysql `__: for MySQL. - - `SQLite `__: for SQLite, this is included in Python's standard library by default. + * `psycopg2 `__: for PostgreSQL + * `pymysql `__: for MySQL. + * `SQLite `__: for SQLite, this is included in Python's standard library by default. -* `matplotlib `__: for plotting +* `matplotlib `__: for plotting, Version 1.4.3 or higher. * For Excel I/O: * `xlrd/xlwt `__: Excel reading (xlrd) and writing (xlwt) @@ -272,11 +262,8 @@ Optional Dependencies `__, or `xclip `__: necessary to use :func:`~pandas.read_clipboard`. Most package managers on Linux distributions will have ``xclip`` and/or ``xsel`` immediately available for installation. -* Google's `python-gflags <`__ , - `oauth2client `__ , - `httplib2 `__ - and `google-api-python-client `__ - : Needed for :mod:`~pandas.io.gbq` +* For Google BigQuery I/O - see `here `__ + * `Backports.lzma `__: Only for Python 2, for writing to and/or reading from an xz compressed DataFrame in CSV; Python 3 support is built into the standard library. * One of the following combinations of libraries is needed to use the top-level :func:`~pandas.read_html` function: @@ -285,7 +272,7 @@ Optional Dependencies okay.) * `BeautifulSoup4`_ and `lxml`_ * `BeautifulSoup4`_ and `html5lib`_ and `lxml`_ - * Only `lxml`_, although see :ref:`HTML Table Parsing ` + * Only `lxml`_, although see :ref:`HTML Table Parsing ` for reasons as to why you should probably **not** take this approach. .. warning:: diff --git a/doc/source/io.rst b/doc/source/io.rst index 4c78758a0e2d2..e338407361705 100644 --- a/doc/source/io.rst +++ b/doc/source/io.rst @@ -29,36 +29,27 @@ IO Tools (Text, CSV, HDF5, ...) =============================== The pandas I/O API is a set of top level ``reader`` functions accessed like ``pd.read_csv()`` that generally return a ``pandas`` -object. - - * :ref:`read_csv` - * :ref:`read_excel` - * :ref:`read_hdf` - * :ref:`read_feather` - * :ref:`read_sql` - * :ref:`read_json` - * :ref:`read_msgpack` - * :ref:`read_html` - * :ref:`read_gbq` - * :ref:`read_stata` - * :ref:`read_sas` - * :ref:`read_clipboard` - * :ref:`read_pickle` - -The corresponding ``writer`` functions are object methods that are accessed like ``df.to_csv()`` - - * :ref:`to_csv` - * :ref:`to_excel` - * :ref:`to_hdf` - * :ref:`to_feather` - * :ref:`to_sql` - * :ref:`to_json` - * :ref:`to_msgpack` - * :ref:`to_html` - * :ref:`to_gbq` - * :ref:`to_stata` - * :ref:`to_clipboard` - * :ref:`to_pickle` +object. The corresponding ``writer`` functions are object methods that are accessed like ``df.to_csv()`` + +.. 
csv-table:: + :header: "Format Type", "Data Description", "Reader", "Writer" + :widths: 30, 100, 60, 60 + :delim: ; + + text;`CSV `__;:ref:`read_csv`;:ref:`to_csv` + text;`JSON `__;:ref:`read_json`;:ref:`to_json` + text;`HTML `__;:ref:`read_html`;:ref:`to_html` + text; Local clipboard;:ref:`read_clipboard`;:ref:`to_clipboard` + binary;`MS Excel `__;:ref:`read_excel`;:ref:`to_excel` + binary;`HDF5 Format `__;:ref:`read_hdf`;:ref:`to_hdf` + binary;`Feather Format `__;:ref:`read_feather`;:ref:`to_feather` + binary;`Parquet Format `__;:ref:`read_parquet`;:ref:`to_parquet` + binary;`Msgpack `__;:ref:`read_msgpack`;:ref:`to_msgpack` + binary;`Stata `__;:ref:`read_stata`;:ref:`to_stata` + binary;`SAS `__;:ref:`read_sas`; + binary;`Python Pickle Format `__;:ref:`read_pickle`;:ref:`to_pickle` + SQL;`SQL `__;:ref:`read_sql`;:ref:`to_sql` + SQL;`Google Big Query `__;:ref:`read_gbq`;:ref:`to_gbq` :ref:`Here ` is an informal performance comparison for some of these IO methods. @@ -91,11 +82,12 @@ filepath_or_buffer : various locations), or any object with a ``read()`` method (such as an open file or :class:`~python:io.StringIO`). sep : str, defaults to ``','`` for :func:`read_csv`, ``\t`` for :func:`read_table` - Delimiter to use. If sep is ``None``, - will try to automatically determine this. Separators longer than 1 character - and different from ``'\s+'`` will be interpreted as regular expressions, will - force use of the python parsing engine and will ignore quotes in the data. - Regex example: ``'\\r\\t'``. + Delimiter to use. If sep is ``None``, the C engine cannot automatically detect + the separator, but the Python parsing engine can, meaning the latter will be + used automatically. In addition, separators longer than 1 character and + different from ``'\s+'`` will be interpreted as regular expressions and + will also force the use of the Python parsing engine. Note that regex + delimiters are prone to ignoring quoted data. Regex example: ``'\\r\\t'``. delimiter : str, default ``None`` Alternative argument name for sep. delim_whitespace : boolean, default False @@ -146,8 +138,9 @@ usecols : array-like or callable, default ``None`` Using this parameter results in much faster parsing time and lower memory usage. as_recarray : boolean, default ``False`` - DEPRECATED: this argument will be removed in a future version. Please call - ``pd.read_csv(...).to_records()`` instead. + .. deprecated:: 0.18.2 + + Please call ``pd.read_csv(...).to_records()`` instead. Return a NumPy recarray instead of a DataFrame after parsing the data. If set to ``True``, this option takes precedence over the ``squeeze`` parameter. @@ -200,7 +193,10 @@ skiprows : list-like or integer, default ``None`` skipfooter : int, default ``0`` Number of lines at bottom of file to skip (unsupported with engine='c'). skip_footer : int, default ``0`` - DEPRECATED: use the ``skipfooter`` parameter instead, as they are identical + .. deprecated:: 0.19.0 + + Use the ``skipfooter`` parameter instead, as they are identical + nrows : int, default ``None`` Number of rows of file to read. Useful for reading pieces of large files. low_memory : boolean, default ``True`` @@ -211,16 +207,22 @@ low_memory : boolean, default ``True`` use the ``chunksize`` or ``iterator`` parameter to return the data in chunks. (Only valid with C parser) buffer_lines : int, default None - DEPRECATED: this argument will be removed in a future version because its - value is not respected by the parser + .. 
deprecated:: 0.19.0 + + Argument removed because its value is not respected by the parser + compact_ints : boolean, default False - DEPRECATED: this argument will be removed in a future version + .. deprecated:: 0.19.0 + + Argument moved to ``pd.to_numeric`` + If ``compact_ints`` is ``True``, then for any column that is of integer dtype, the parser will attempt to cast it as the smallest integer ``dtype`` possible, either signed or unsigned depending on the specification from the ``use_unsigned`` parameter. use_unsigned : boolean, default False - DEPRECATED: this argument will be removed in a future version + .. deprecated:: 0.18.2 + + Argument moved to ``pd.to_numeric`` + If integer columns are being compacted (i.e. ``compact_ints=True``), specify whether the column should be compacted to the smallest signed or unsigned integer dtype. @@ -234,9 +236,9 @@ NA and Missing Data Handling na_values : scalar, str, list-like, or dict, default ``None`` Additional strings to recognize as NA/NaN. If dict passed, specific per-column - NA values. By default the following values are interpreted as NaN: - ``'-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A', 'NA', - '#NA', 'NULL', 'NaN', '-NaN', 'nan', '-nan', ''``. + NA values. See :ref:`na values const ` below + for a list of the values interpreted as NaN by default. + keep_default_na : boolean, default ``True`` If na_values are specified and keep_default_na is ``False`` the default NaN values are overridden, otherwise they're appended to. @@ -351,11 +353,11 @@ error_bad_lines : boolean, default ``True`` Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If ``False``, then these "bad lines" will be dropped from the DataFrame that is - returned (only valid with C parser). See :ref:`bad lines ` + returned. See :ref:`bad lines ` below. warn_bad_lines : boolean, default ``True`` If error_bad_lines is ``False``, and warn_bad_lines is ``True``, a warning for - each "bad line" will be output (only valid with C parser). + each "bad line" will be output. .. _io.dtypes: @@ -721,6 +723,16 @@ index column inference and discard the last column, pass ``index_col=False``: pd.read_csv(StringIO(data)) pd.read_csv(StringIO(data), index_col=False) +If a subset of data is being parsed using the ``usecols`` option, the +``index_col`` specification is based on that subset, not the original data. + +.. ipython:: python + + data = 'a,b,c\n4,apple,bat,\n8,orange,cow,' + print(data) + pd.read_csv(StringIO(data), usecols=['b', 'c']) + pd.read_csv(StringIO(data), usecols=['b', 'c'], index_col=0) + .. _io.parse_dates: Date Handling @@ -1029,10 +1041,11 @@ the corresponding equivalent values will also imply a missing value (in this cas ``[5.0,5]`` are recognized as ``NaN``. To completely override the default values that are recognized as missing, specify ``keep_default_na=False``. + +.. _io.navaluesconst: + +The default ``NaN`` recognized values are ``['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A', +'n/a', 'NA', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', '']``. .. 
code-block:: python @@ -1241,7 +1254,8 @@ Files with Fixed Width Columns While ``read_csv`` reads delimited data, the :func:`read_fwf` function works with data files that have known and fixed column widths. The function parameters -to ``read_fwf`` are largely the same as `read_csv` with two extra parameters: +to ``read_fwf`` are largely the same as `read_csv` with two extra parameters, and +a different usage of the ``delimiter`` parameter: - ``colspecs``: A list of pairs (tuples) giving the extents of the fixed-width fields of each line as half-open intervals (i.e., [from, to[ ). @@ -1250,6 +1264,9 @@ to ``read_fwf`` are largely the same as `read_csv` with two extra parameters: behaviour, if not specified, is to infer. - ``widths``: A list of field widths which can be used instead of 'colspecs' if the intervals are contiguous. + - ``delimiter``: Characters to consider as filler characters in the fixed-width file. + Can be used to specify the filler character of the fields + if it is not spaces (e.g., '~'). .. ipython:: python :suppress: @@ -1448,6 +1465,14 @@ class of the csv module. For this, you have to specify ``sep=None``. print(open('tmp2.sv').read()) pd.read_csv('tmp2.sv', sep=None, engine='python') +.. _io.multiple_files: + +Reading multiple files to create a single DataFrame +''''''''''''''''''''''''''''''''''''''''''''''''''' + +It's best to use :func:`~pandas.concat` to combine multiple files. +See the :ref:`cookbook` for an example. + .. _io.chunking: Iterating through files chunk by chunk @@ -1995,6 +2020,13 @@ into a flat table. .. ipython:: python from pandas.io.json import json_normalize + data = [{'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk'}}, + {'name': {'given': 'Mose', 'family': 'Regner'}}, + {'id': 2, 'name': 'Faye Raker'}] + json_normalize(data) + +.. ipython:: python + data = [{'state': 'Florida', 'shortname': 'FL', 'info': { @@ -2033,6 +2065,126 @@ using Hadoop or Spark. df df.to_json(orient='records', lines=True) + +.. _io.table_schema: + +Table Schema +'''''''''''' + +.. versionadded:: 0.20.0 + +`Table Schema`_ is a spec for describing tabular datasets as a JSON +object. The JSON includes information on the field names, types, and +other attributes. You can use the orient ``table`` to build +a JSON string with two fields, ``schema`` and ``data``. + +.. ipython:: python + + df = pd.DataFrame( + {'A': [1, 2, 3], + 'B': ['a', 'b', 'c'], + 'C': pd.date_range('2016-01-01', freq='d', periods=3), + }, index=pd.Index(range(3), name='idx')) + df + df.to_json(orient='table', date_format="iso") + +The ``schema`` field contains the ``fields`` key, which itself contains +a list of column name to type pairs, including the ``Index`` or ``MultiIndex`` +(see below for a list of types). +The ``schema`` field also contains a ``primaryKey`` field if the (Multi)index +is unique. + +The second field, ``data``, contains the serialized data with the ``records`` +orient. +The index is included, and any datetimes are ISO 8601 formatted, as required +by the Table Schema spec. + +The full list of types supported are described in the Table Schema +spec. This table shows the mapping from pandas types: + +=============== ================= +Pandas type Table Schema type +=============== ================= +int64 integer +float64 number +bool boolean +datetime64[ns] datetime +timedelta64[ns] duration +categorical any +object str +=============== ================= + +A few notes on the generated table schema: + +- The ``schema`` object contains a ``pandas_version`` field. 
This contains + the version of pandas' dialect of the schema, and will be incremented + with each revision. +- All dates are converted to UTC when serializing. Even timezone naïve values, + which are treated as UTC with an offset of 0. + + .. ipython:: python + + from pandas.io.json import build_table_schema + s = pd.Series(pd.date_range('2016', periods=4)) + build_table_schema(s) + +- datetimes with a timezone (before serializing), include an additional field + ``tz`` with the time zone name (e.g. ``'US/Central'``). + + .. ipython:: python + + s_tz = pd.Series(pd.date_range('2016', periods=12, + tz='US/Central')) + build_table_schema(s_tz) + +- Periods are converted to timestamps before serialization, and so have the + same behavior of being converted to UTC. In addition, periods will contain + an additional field ``freq`` with the period's frequency, e.g. ``'A-DEC'`` + + .. ipython:: python + + s_per = pd.Series(1, index=pd.period_range('2016', freq='A-DEC', + periods=4)) + build_table_schema(s_per) + +- Categoricals use the ``any`` type and an ``enum`` constraint listing + the set of possible values. Additionally, an ``ordered`` field is included + + .. ipython:: python + + s_cat = pd.Series(pd.Categorical(['a', 'b', 'a'])) + build_table_schema(s_cat) + +- A ``primaryKey`` field, containing an array of labels, is included + *if the index is unique*: + + .. ipython:: python + + s_dupe = pd.Series([1, 2], index=[1, 1]) + build_table_schema(s_dupe) + +- The ``primaryKey`` behavior is the same with MultiIndexes, but in this + case the ``primaryKey`` is an array: + + .. ipython:: python + + s_multi = pd.Series(1, index=pd.MultiIndex.from_product([('a', 'b'), + (0, 1)])) + build_table_schema(s_multi) + +- The default naming roughly follows these rules: + + + For series, the ``object.name`` is used. If that's ``None``, then the + name is ``values`` + + For DataFrames, the stringified version of the column name is used + + For ``Index`` (not ``MultiIndex``), ``index.name`` is used, with a + fallback to ``index`` if that is None. + + For ``MultiIndex``, ``mi.names`` is used. If any level has no name, + then ``level_<i>`` is used. + + +.. _Table Schema: http://specs.frictionlessdata.io/json-table-schema/ + HTML ---- @@ -2043,7 +2195,7 @@ Reading HTML Content .. warning:: - We **highly encourage** you to read the :ref:`HTML Table Parsing gotchas` + We **highly encourage** you to read the :ref:`HTML Table Parsing gotchas ` below regarding the issues surrounding the BeautifulSoup4/html5lib/lxml parsers. .. versionadded:: 0.12.0 @@ -2111,9 +2263,10 @@ Read a URL and match a table that contains specific text match = 'Metcalf Bank' df_list = pd.read_html(url, match=match) -Specify a header row (by default ``<th>`` elements are used to form the column -index); if specified, the header row is taken from the data minus the parsed -header elements (``<th>`` elements). +Specify a header row (by default ``<th>`` or ``<td>`` elements located within a +``<thead>`` are used to form the column index, if multiple rows are contained within +``<thead>`` then a MultiIndex is created); if specified, the header row is taken +from the data minus the parsed header elements (``<th>`` elements). .. code-block:: python @@ -2441,12 +2594,12 @@ Reading Excel Files ''''''''''''''''''' In the most basic use-case, ``read_excel`` takes a path to an Excel -file, and the ``sheetname`` indicating which sheet to parse. +file, and the ``sheet_name`` indicating which sheet to parse. .. 
code-block:: python # Returns a DataFrame - read_excel('path_to_file.xls', sheetname='Sheet1') + read_excel('path_to_file.xls', sheet_name='Sheet1') .. _io.excel.excelfile_class: @@ -2514,12 +2667,12 @@ of sheet names can simply be passed to ``read_excel`` with no loss in performanc Specifying Sheets +++++++++++++++++ -.. note :: The second argument is ``sheetname``, not to be confused with ``ExcelFile.sheet_names`` +.. note :: The second argument is ``sheet_name``, not to be confused with ``ExcelFile.sheet_names`` .. note :: An ExcelFile's attribute ``sheet_names`` provides access to a list of sheets. -- The arguments ``sheetname`` allows specifying the sheet or sheets to read. -- The default value for ``sheetname`` is 0, indicating to read the first sheet +- The arguments ``sheet_name`` allows specifying the sheet or sheets to read. +- The default value for ``sheet_name`` is 0, indicating to read the first sheet - Pass a string to refer to the name of a particular sheet in the workbook. - Pass an integer to refer to the index of a sheet. Indices follow Python convention, beginning at 0. @@ -2550,18 +2703,18 @@ Using None to get all sheets: .. code-block:: python # Returns a dictionary of DataFrames - read_excel('path_to_file.xls',sheetname=None) + read_excel('path_to_file.xls',sheet_name=None) Using a list to get multiple sheets: .. code-block:: python # Returns the 1st and 4th sheet, as a dictionary of DataFrames. - read_excel('path_to_file.xls',sheetname=['Sheet1',3]) + read_excel('path_to_file.xls',sheet_name=['Sheet1',3]) .. versionadded:: 0.16 -``read_excel`` can read more than one sheet, by setting ``sheetname`` to either +``read_excel`` can read more than one sheet, by setting ``sheet_name`` to either a list of sheet names, a list of sheet positions, or ``None`` to read all sheets. .. versionadded:: 0.13 @@ -2619,11 +2772,6 @@ should be passed to ``index_col`` and ``header`` import os os.remove('path_to_file.xlsx') -.. warning:: - - Excel files saved in version 0.16.2 or prior that had index names will still able to be read in, - but the ``has_index_names`` argument must specified to ``True``. - Parsing Specific Columns ++++++++++++++++++++++++ @@ -2646,6 +2794,20 @@ indices to be parsed. read_excel('path_to_file.xls', 'Sheet1', parse_cols=[0, 2, 3]) + +Parsing Dates ++++++++++++++ + +Datetime-like values are normally automatically converted to the appropriate +dtype when reading the excel file. But if you have a column of strings that +*look* like dates (but are not actually formatted as dates in excel), you can +use the `parse_dates` keyword to parse those strings to datetimes: + +.. code-block:: python + + read_excel('path_to_file.xls', 'Sheet1', parse_dates=['date_strings']) + + Cell Converters +++++++++++++++ @@ -2777,6 +2939,7 @@ Added support for Openpyxl >= 2.2 ``'xlsxwriter'`` will produce an Excel 2007-format workbook (xlsx). If omitted, an Excel 2007-formatted workbook is produced. + .. _io.excel.writers: Excel writer engines @@ -2823,6 +2986,18 @@ argument to ``to_excel`` and to ``ExcelWriter``. The built-in engines are: df.to_excel('path_to_file.xlsx', sheet_name='Sheet1') +.. _io.excel.style: + +Style and Formatting +'''''''''''''''''''' + +The look and feel of Excel worksheets created from pandas can be modified using the following parameters on the ``DataFrame``'s ``to_excel`` method. 
+ +- ``float_format`` : Format string for floating point numbers (default None) +- ``freeze_panes`` : A tuple of two integers representing the bottommost row and rightmost column to freeze. Each of these parameters is one-based, so (1, 1) will freeze the first row and first column (default None) + + + .. _io.clipboard: Clipboard --------- @@ -2909,9 +3084,66 @@ any pickled pandas object (or any other pickled object) from file: See `this question `__ for a detailed explanation. -.. note:: +.. _io.pickle.compression: + +Compressed pickle files +''''''''''''''''''''''' + +.. versionadded:: 0.20.0 + +:func:`read_pickle`, :meth:`DataFrame.to_pickle` and :meth:`Series.to_pickle` can read +and write compressed pickle files. The compression types of ``gzip``, ``bz2``, ``xz`` are supported for reading and writing. +``zip`` files support reading only and must contain only one data file +to be read in. + +The compression type can be an explicit parameter or be inferred from the file extension. +If 'infer', then use ``gzip``, ``bz2``, ``zip``, or ``xz`` if filename ends in ``'.gz'``, ``'.bz2'``, ``'.zip'``, or +``'.xz'``, respectively. + +.. ipython:: python + + df = pd.DataFrame({ + 'A': np.random.randn(1000), + 'B': 'foo', + 'C': pd.date_range('20130101', periods=1000, freq='s')}) + df + +Using an explicit compression type + +.. ipython:: python + + df.to_pickle("data.pkl.compress", compression="gzip") + rt = pd.read_pickle("data.pkl.compress", compression="gzip") + rt + +Inferring compression type from the extension + +.. ipython:: python + + df.to_pickle("data.pkl.xz", compression="infer") + rt = pd.read_pickle("data.pkl.xz", compression="infer") + rt - These methods were previously ``pd.save`` and ``pd.load``, prior to 0.12.0, and are now deprecated. +The default is to ``'infer'`` + +.. ipython:: python + + df.to_pickle("data.pkl.gz") + rt = pd.read_pickle("data.pkl.gz") + rt + + df["A"].to_pickle("s1.pkl.bz2") + rt = pd.read_pickle("s1.pkl.bz2") + rt + +.. ipython:: python + :suppress: + + import os + os.remove("data.pkl.compress") + os.remove("data.pkl.xz") + os.remove("data.pkl.gz") + os.remove("s1.pkl.bz2") .. _io.msgpack: @@ -2969,7 +3201,7 @@ You can pass ``iterator=True`` to iterate over the unpacked results .. ipython:: python for o in pd.read_msgpack('foo.msg',iterator=True): - print o + print(o) You can pass ``append=True`` to the writer to append to an existing pack @@ -3197,7 +3429,7 @@ Fixed Format This was prior to 0.13.0 the ``Storer`` format. The examples above show storing using ``put``, which write the HDF5 to ``PyTables`` in a fixed array format, called -the ``fixed`` format. These types of stores are are **not** appendable once written (though you can simply +the ``fixed`` format. These types of stores are **not** appendable once written (though you can simply remove them and rewrite). Nor are they **queryable**; they must be retrieved in their entirety. They also do not support dataframes with non-unique column names. The ``fixed`` format stores offer very fast writing and slightly faster reading than ``table`` stores. @@ -3625,7 +3857,7 @@ be data_columns # on-disk operations store.append('df_dc', df_dc, data_columns = ['B', 'C', 'string', 'string2']) - store.select('df_dc', [ pd.Term('B>0') ]) + store.select('df_dc', where='B>0') # getting creative store.select('df_dc', 'B > 0 & C > 0 & string == foo') @@ -3687,7 +3919,7 @@ chunks. 
evens = [2,4,6,8,10] coordinates = store.select_as_coordinates('dfeq','number=evens') for c in chunks(coordinates, 2): - print store.select('dfeq',where=c) + print(store.select('dfeq',where=c)) Advanced Queries ++++++++++++++++ @@ -3857,26 +4089,64 @@ Compression +++++++++++ ``PyTables`` allows the stored data to be compressed. This applies to -all kinds of stores, not just tables. +all kinds of stores, not just tables. Two parameters are used to +control compression: ``complevel`` and ``complib``. + +``complevel`` specifies if and how hard data is to be compressed. + ``complevel=0`` and ``complevel=None`` disables + compression and ``0<complevel<10`` enables compression. + +``complib`` specifies which compression library to use. If nothing is + specified the default library ``zlib`` is used. A + compression library usually optimizes for either good + compression rates or speed and the results will depend on + the type of data. Which type of + compression to choose depends on your specific needs and + data. The list of supported compression libraries: + + - `zlib `_: The default compression library. A classic in terms of compression, achieves good compression rates but is somewhat slow. + - `lzo `_: Fast compression and decompression. + - `bzip2 `_: Good compression rates. + - `blosc `_: Fast compression and decompression. + + .. versionadded:: 0.20.2 + + Support for alternative blosc compressors: + + - `blosc:blosclz `_ This is the + default compressor for ``blosc`` + - `blosc:lz4 + `_: + A compact, very popular and fast compressor. + - `blosc:lz4hc + `_: + A tweaked version of LZ4, produces better + compression ratios at the expense of speed. + - `blosc:snappy `_: + A popular compressor used in many places. + - `blosc:zlib `_: A classic; + somewhat slower than the previous ones, but + achieving better compression ratios. + - `blosc:zstd `_: An + extremely well balanced codec; it provides the best + compression ratios among the others above, and at + reasonably fast speed. + + If ``complib`` is defined as something other than the + listed libraries a ``ValueError`` exception is issued. -- Pass ``complevel=int`` for a compression level (1-9, with 0 being no - compression, and the default) -- Pass ``complib=lib`` where lib is any of ``zlib, bzip2, lzo, blosc`` for - whichever compression library you prefer. +.. note:: -``HDFStore`` will use the file based compression scheme if no overriding -``complib`` or ``complevel`` options are provided. ``blosc`` offers very -fast compression, and is my most used. Note that ``lzo`` and ``bzip2`` -may not be installed (by Python) by default. + If the library specified with the ``complib`` option is missing on your platform, + compression defaults to ``zlib`` without further ado. -Compression for all objects within the file +Enable compression for all objects within the file: .. code-block:: python - store_compressed = pd.HDFStore('store_compressed.h5', complevel=9, complib='blosc') + store_compressed = pd.HDFStore('store_compressed.h5', complevel=9, complib='blosc:blosclz') -Or on-the-fly compression (this only applies to tables). You can turn -off file compression for a specific table by passing ``complevel=0`` +Or on-the-fly compression (this only applies to tables) in stores where compression is not enabled: .. code-block:: python @@ -4211,32 +4481,6 @@ Performance `Here `__ for more information and some solutions. -Experimental -'''''''''''' - -HDFStore supports ``Panel4D`` storage. - -.. ipython:: python - :okwarning: - - p4d = pd.Panel4D({ 'l1' : wp }) - p4d - store.append('p4d', p4d) - store - -These, by default, index the three axes ``items, major_axis, -minor_axis``. On an ``AppendableTable`` it is possible to setup with the -first append a different indexing scheme, depending on how you want to -store your data. Pass the ``axes`` keyword with a list of dimensions -(currently must by exactly 1 less than the total dimensions of the -object). 
This cannot be changed after table creation. - -.. ipython:: python - :okwarning: - - store.append('p4d2', p4d, axes=['labels', 'major_axis', 'minor_axis']) - store - store.select('p4d2', [ pd.Term('labels=l1'), pd.Term('items=Item1'), pd.Term('minor_axis=A_big_strings') ]) .. ipython:: python :suppress: @@ -4289,6 +4533,7 @@ See the `Full Documentation `__ Write to a feather file. .. ipython:: python + :okwarning: df.to_feather('example.feather') @@ -4308,6 +4553,79 @@ Read from a feather file. import os os.remove('example.feather') + +.. _io.parquet: + +Parquet +------- + +.. versionadded:: 0.21.0 + +`Apache Parquet `__ provides a partitioned binary columnar serialization for data frames. It is designed to +make reading and writing data frames efficient, and to make sharing data across data analysis +languages easy. Parquet can use a variety of compression techniques to shrink the file size as much as possible +while still maintaining good read performance. + +Parquet is designed to faithfully serialize and de-serialize ``DataFrame`` s, supporting all of the pandas +dtypes, including extension dtypes such as datetime with tz. + +Several caveats. + +- The format will NOT write an ``Index``, or ``MultiIndex`` for the ``DataFrame`` and will raise an + error if a non-default one is provided. You can simply ``.reset_index(drop=True)`` in order to store the index. +- Duplicate column names and non-string column names are not supported +- Categorical dtypes are currently not supported (for ``pyarrow``). +- Unsupported types include ``Period`` and actual python object types. These will raise a helpful error message + on an attempt at serialization. + +You can specify an ``engine`` to direct the serialization. This can be one of ``pyarrow``, or ``fastparquet``, or ``auto``. +If the engine is NOT specified, then the ``pd.options.io.parquet.engine`` option is checked; if this is also ``auto``, +then ``pyarrow`` is tried, falling back to ``fastparquet``. + +See the documentation for `pyarrow `__ and `fastparquet `__. + +.. note:: + + These engines are very similar and should read/write nearly identical parquet format files. + These libraries differ by having different underlying dependencies (``fastparquet`` by using ``numba``, while ``pyarrow`` uses a c-library). + +.. ipython:: python + + df = pd.DataFrame({'a': list('abc'), + 'b': list(range(1, 4)), + 'c': np.arange(3, 6).astype('u1'), + 'd': np.arange(4.0, 7.0, dtype='float64'), + 'e': [True, False, True], + 'f': pd.date_range('20130101', periods=3), + 'g': pd.date_range('20130101', periods=3, tz='US/Eastern'), + 'h': pd.date_range('20130101', periods=3, freq='ns')}) + + df + df.dtypes + +Write to a parquet file. + +.. ipython:: python + + df.to_parquet('example_pa.parquet', engine='pyarrow') + df.to_parquet('example_fp.parquet', engine='fastparquet') + +Read from a parquet file. + +.. ipython:: python + + result = pd.read_parquet('example_pa.parquet', engine='pyarrow') + result = pd.read_parquet('example_fp.parquet', engine='fastparquet') + + result.dtypes + +.. ipython:: python + :suppress: + + import os + os.remove('example_pa.parquet') + os.remove('example_fp.parquet') + .. _io.sql: SQL Queries ----------- @@ -4639,260 +4957,18 @@ And then issue the following queries: Google BigQuery --------------- -.. versionadded:: 0.13.0 - -The :mod:`pandas.io.gbq` module provides a wrapper for Google's BigQuery -analytics web service to simplify retrieving results from BigQuery tables -using SQL-like queries. 
Result sets are parsed into a pandas -DataFrame with a shape and data types derived from the source table. -Additionally, DataFrames can be inserted into new BigQuery tables or appended -to existing tables. - -You will need to install some additional dependencies: - -- Google's `python-gflags `__ -- `httplib2 `__ -- `google-api-python-client `__ - -.. warning:: - - To use this module, you will need a valid BigQuery account. Refer to the - `BigQuery Documentation `__ for details on the service itself. - -The key functions are: - -.. currentmodule:: pandas.io.gbq - -.. autosummary:: - :toctree: generated/ - - read_gbq - to_gbq - -.. currentmodule:: pandas - -.. _io.bigquery_reader: - -.. _io.bigquery_authentication: - -Authentication -'''''''''''''' - -.. versionadded:: 0.18.0 - -Authentication to the Google ``BigQuery`` service is via ``OAuth 2.0``. -Is possible to authenticate with either user account credentials or service account credentials. - -Authenticating with user account credentials is as simple as following the prompts in a browser window -which will be automatically opened for you. You will be authenticated to the specified -``BigQuery`` account using the product name ``pandas GBQ``. It is only possible on local host. -The remote authentication using user account credentials is not currently supported in Pandas. -Additional information on the authentication mechanism can be found -`here `__. - -Authentication with service account credentials is possible via the `'private_key'` parameter. This method -is particularly useful when working on remote servers (eg. jupyter iPython notebook on remote host). -Additional information on service accounts can be found -`here `__. - -You will need to install an additional dependency: `oauth2client `__. - -Authentication via ``application default credentials`` is also possible. This is only valid -if the parameter ``private_key`` is not provided. This method also requires that -the credentials can be fetched from the environment the code is running in. -Otherwise, the OAuth2 client-side authentication is used. -Additional information on -`application default credentials `__. - -.. versionadded:: 0.19.0 - -.. note:: - - The `'private_key'` parameter can be set to either the file path of the service account key - in JSON format, or key contents of the service account key in JSON format. - -.. note:: - - A private key can be obtained from the Google developers console by clicking - `here `__. Use JSON key type. - - -Querying -'''''''' - -Suppose you want to load all data from an existing BigQuery table : `test_dataset.test_table` -into a DataFrame using the :func:`~pandas.io.gbq.read_gbq` function. - -.. code-block:: python - - # Insert your BigQuery Project ID Here - # Can be found in the Google web console - projectid = "xxxxxxxx" - - data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table', projectid) - - -You can define which column from BigQuery to use as an index in the -destination DataFrame as well as a preferred column order as follows: - -.. code-block:: python - - data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table', - index_col='index_column_name', - col_order=['col1', 'col2', 'col3'], projectid) - - -Starting with 0.20.0, you can specify the query config as parameter to use additional options of your job. -For more information about query configuration parameters see -`here `__. - -.. 
code-block:: python - - configuration = { - 'query': { - "useQueryCache": False - } - } - data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table', - configuration=configuration, projectid) - - -.. note:: - - You can find your project id in the `Google developers console `__. - - -.. note:: - - You can toggle the verbose output via the ``verbose`` flag which defaults to ``True``. - -.. note:: - - The ``dialect`` argument can be used to indicate whether to use BigQuery's ``'legacy'`` SQL - or BigQuery's ``'standard'`` SQL (beta). The default value is ``'legacy'``. For more information - on BigQuery's standard SQL, see `BigQuery SQL Reference - `__ - -.. _io.bigquery_writer: - - -Writing DataFrames -'''''''''''''''''' - -Assume we want to write a DataFrame ``df`` into a BigQuery table using :func:`~pandas.DataFrame.to_gbq`. - -.. ipython:: python - - df = pd.DataFrame({'my_string': list('abc'), - 'my_int64': list(range(1, 4)), - 'my_float64': np.arange(4.0, 7.0), - 'my_bool1': [True, False, True], - 'my_bool2': [False, True, False], - 'my_dates': pd.date_range('now', periods=3)}) - - df - df.dtypes - -.. code-block:: python - - df.to_gbq('my_dataset.my_table', projectid) - -.. note:: - - The destination table and destination dataset will automatically be created if they do not already exist. - -The ``if_exists`` argument can be used to dictate whether to ``'fail'``, ``'replace'`` -or ``'append'`` if the destination table already exists. The default value is ``'fail'``. - -For example, assume that ``if_exists`` is set to ``'fail'``. The following snippet will raise -a ``TableCreationError`` if the destination table already exists. - -.. code-block:: python - - df.to_gbq('my_dataset.my_table', projectid, if_exists='fail') - -.. note:: - - If the ``if_exists`` argument is set to ``'append'``, the destination dataframe will - be written to the table using the defined table schema and column types. The - dataframe must match the destination table in structure and data types. - If the ``if_exists`` argument is set to ``'replace'``, and the existing table has a - different schema, a delay of 2 minutes will be forced to ensure that the new schema - has propagated in the Google environment. See - `Google BigQuery issue 191 `__. - -Writing large DataFrames can result in errors due to size limitations being exceeded. -This can be avoided by setting the ``chunksize`` argument when calling :func:`~pandas.DataFrame.to_gbq`. -For example, the following writes ``df`` to a BigQuery table in batches of 10000 rows at a time: - -.. code-block:: python - - df.to_gbq('my_dataset.my_table', projectid, chunksize=10000) - -You can also see the progress of your post via the ``verbose`` flag which defaults to ``True``. -For example: - -.. code-block:: python - - In [8]: df.to_gbq('my_dataset.my_table', projectid, chunksize=10000, verbose=True) - - Streaming Insert is 10% Complete - Streaming Insert is 20% Complete - Streaming Insert is 30% Complete - Streaming Insert is 40% Complete - Streaming Insert is 50% Complete - Streaming Insert is 60% Complete - Streaming Insert is 70% Complete - Streaming Insert is 80% Complete - Streaming Insert is 90% Complete - Streaming Insert is 100% Complete - -.. note:: - - If an error occurs while streaming data to BigQuery, see - `Troubleshooting BigQuery Errors `__. - -.. note:: - - The BigQuery SQL query language has some oddities, see the - `BigQuery Query Reference Documentation `__. - -.. 
note::
-
-    While BigQuery uses SQL-like syntax, it has some important differences from traditional
-    databases both in functionality, API limitations (size and quantity of queries or uploads),
-    and how Google charges for use of the service. You should refer to `Google BigQuery documentation `__
-    often as the service seems to be changing and evolving. BiqQuery is best for analyzing large
-    sets of data quickly, but it is not a direct replacement for a transactional database.
-
-Creating BigQuery Tables
-''''''''''''''''''''''''
-
 .. warning::

-   As of 0.17, the function :func:`~pandas.io.gbq.generate_bq_schema` has been deprecated and will be
-   removed in a future version.
-
-As of 0.15.2, the gbq module has a function :func:`~pandas.io.gbq.generate_bq_schema` which will
-produce the dictionary representation schema of the specified pandas DataFrame.
+   Starting in 0.20.0, pandas has split off Google BigQuery support into the
+   separate package ``pandas-gbq``. You can ``pip install pandas-gbq`` to get it.

-.. code-block:: ipython
-
-   In [10]: gbq.generate_bq_schema(df, default_type='STRING')
+The ``pandas-gbq`` package provides functionality to read/write from Google BigQuery.

-   Out[10]: {'fields': [{'name': 'my_bool1', 'type': 'BOOLEAN'},
-                        {'name': 'my_bool2', 'type': 'BOOLEAN'},
-                        {'name': 'my_dates', 'type': 'TIMESTAMP'},
-                        {'name': 'my_float64', 'type': 'FLOAT'},
-                        {'name': 'my_int64', 'type': 'INTEGER'},
-                        {'name': 'my_string', 'type': 'STRING'}]}
-
-.. note:: 
+pandas integrates with this external package. If ``pandas-gbq`` is installed, you can
+use the pandas methods ``pd.read_gbq`` and ``DataFrame.to_gbq``, which will call the
+respective functions from ``pandas-gbq``.

-   If you delete and re-create a BigQuery table with the same name, but different table schema,
-   you must wait 2 minutes before streaming data into the table. As a workaround, consider creating
-   the new table with a different name. Refer to
-   `Google BigQuery issue 191 `__.
+Full documentation can be found `here `__.

 .. _io.stata:

diff --git a/doc/source/merging.rst b/doc/source/merging.rst
index f732f0a4cc749..d956f1ca54e6b 100644
--- a/doc/source/merging.rst
+++ b/doc/source/merging.rst
@@ -13,7 +13,7 @@
    import matplotlib.pyplot as plt
    plt.close('all')

-   import pandas.util.doctools as doctools
+   import pandas.util._doctools as doctools
    p = doctools.TablePlotter()

@@ -513,7 +513,8 @@ standard database join operations between DataFrame objects:

     pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
              left_index=False, right_index=False, sort=True,
-             suffixes=('_x', '_y'), copy=True, indicator=False)
+             suffixes=('_x', '_y'), copy=True, indicator=False,
+             validate=None)

 - ``left``: A DataFrame object
 - ``right``: Another DataFrame object
@@ -551,6 +552,20 @@ standard database join operations between DataFrame objects:

   .. versionadded:: 0.17.0

+- ``validate`` : string, default None.
+  If specified, checks if merge is of specified type.
+
+  * "one_to_one" or "1:1": checks if merge keys are unique in both
+    left and right datasets.
+  * "one_to_many" or "1:m": checks if merge keys are unique in left
+    dataset.
+  * "many_to_one" or "m:1": checks if merge keys are unique in right
+    dataset.
+  * "many_to_many" or "m:m": allowed, but does not result in checks.
+
+  .. versionadded:: 0.21.0
+
+
 The return type will be the same as ``left``. If ``left`` is a ``DataFrame``
 and ``right`` is a subclass of DataFrame, the return type will still be ``DataFrame``.
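As a quick, minimal sketch of how the ``validate`` checks listed above can guard a
join (the ``households`` and ``cities`` frames here are hypothetical illustrations,
not part of the worked examples that follow):

.. code-block:: python

   import pandas as pd

   # Many households map to one city, so the merge keys must be unique
   # in the *right* frame; validate='m:1' asserts exactly that and raises
   # a MergeError instead of silently multiplying rows.
   households = pd.DataFrame({'city': ['NYC', 'NYC', 'SF'], 'size': [2, 4, 1]})
   cities = pd.DataFrame({'city': ['NYC', 'SF'], 'state': ['NY', 'CA']})

   merged = pd.merge(households, cities, on='city', validate='m:1')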
@@ -711,10 +726,40 @@ Here is another example with duplicate join keys in DataFrames:
          labels=['left', 'right'], vertical=False);
    plt.close('all');

+
 .. warning::

-   Joining / merging on duplicate keys can cause a returned frame that is the multiplication of the row dimensions,
-   may result in memory overflow. It is the user' s responsibility to manage duplicate values in keys before joining large DataFrames.
+   Joining / merging on duplicate keys can cause a returned frame that is the multiplication of the row dimensions, which may result in memory overflow. It is the user's responsibility to manage duplicate values in keys before joining large DataFrames.
+
+.. _merging.validation:
+
+Checking for duplicate keys
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.21.0
+
+Users can use the ``validate`` argument to automatically check whether there are unexpected duplicates in their merge keys. Key uniqueness is checked before merge operations and so should protect against memory overflows. Checking key uniqueness is also a good way to ensure user data structures are as expected.
+
+In the following example, there are duplicate values of ``B`` in the right DataFrame. As this is not a one-to-one merge -- as specified in the ``validate`` argument -- an exception will be raised.
+
+
+.. ipython:: python
+
+   left = pd.DataFrame({'A' : [1,2], 'B' : [1, 2]})
+   right = pd.DataFrame({'A' : [4,5,6], 'B': [2, 2, 2]})
+
+.. code-block:: ipython
+
+   In [53]: result = pd.merge(left, right, on='B', how='outer', validate="one_to_one")
+   ...
+   MergeError: Merge keys are not unique in right dataset; not a one-to-one merge
+
+If the user is aware of the duplicates in the right ``DataFrame`` but wants to ensure there are no duplicates in the left ``DataFrame``, one can use the ``validate='one_to_many'`` argument instead, which will not raise an exception.
+
+.. ipython:: python
+
+   pd.merge(left, right, on='B', how='outer', validate="one_to_many")
+
 .. _merging.indicator:

@@ -746,6 +791,79 @@ The ``indicator`` argument will also accept string arguments, in which case the

    pd.merge(df1, df2, on='col1', how='outer', indicator='indicator_column')

+.. _merging.dtypes:
+
+Merge Dtypes
+~~~~~~~~~~~~
+
+.. versionadded:: 0.19.0
+
+Merging will preserve the dtype of the join keys.
+
+.. ipython:: python
+
+   left = pd.DataFrame({'key': [1], 'v1': [10]})
+   left
+   right = pd.DataFrame({'key': [1, 2], 'v1': [20, 30]})
+   right
+
+We are able to preserve the join keys:
+
+.. ipython:: python
+
+   pd.merge(left, right, how='outer')
+   pd.merge(left, right, how='outer').dtypes
+
+Of course, if you have missing values that are introduced, then the
+resulting dtype will be upcast.
+
+.. ipython:: python
+
+   pd.merge(left, right, how='outer', on='key')
+   pd.merge(left, right, how='outer', on='key').dtypes
+
+.. versionadded:: 0.20.0
+
+Merging will preserve ``category`` dtypes of the merge operands. See also the section on :ref:`categoricals `.
+
+The left frame.
+
+.. ipython:: python
+
+   X = pd.Series(np.random.choice(['foo', 'bar'], size=(10,)))
+   X = X.astype('category', categories=['foo', 'bar'])
+
+   left = pd.DataFrame({'X': X,
+                        'Y': np.random.choice(['one', 'two', 'three'], size=(10,))})
+   left
+   left.dtypes
+
+The right frame.
+
+.. ipython:: python
+
+   right = pd.DataFrame({'X': pd.Series(['foo', 'bar']).astype('category', categories=['foo', 'bar']),
+                         'Z': [1, 2]})
+   right
+   right.dtypes
+
+The merged result:
+
+.. ipython:: python
+
+   result = pd.merge(left, right, how='outer')
+   result
+   result.dtypes
+
+..
note:: + + The category dtypes must be *exactly* the same, meaning the same categories and the ordered attribute. + Otherwise the result will coerce to ``object`` dtype. + +.. note:: + + Merging on ``category`` dtypes that are the same can be quite performant compared to ``object`` dtype merging. + .. _merging.join.index: Joining on index diff --git a/doc/source/missing_data.rst b/doc/source/missing_data.rst index 37930775885e3..d54288baa389b 100644 --- a/doc/source/missing_data.rst +++ b/doc/source/missing_data.rst @@ -36,7 +36,7 @@ When / why does data become missing? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Some might quibble over our usage of *missing*. By "missing" we simply mean -**null** or "not present for whatever reason". Many data sets simply arrive with +**NA** ("not available") or "not present for whatever reason". Many data sets simply arrive with missing data, either because it exists and was not collected or it never existed. For example, in a collection of financial time series, some of the time series might start on different dates. Thus, values prior to the start date @@ -63,27 +63,27 @@ to handling missing data. While ``NaN`` is the default missing value marker for reasons of computational speed and convenience, we need to be able to easily detect this value with data of different types: floating point, integer, boolean, and general object. In many cases, however, the Python ``None`` will -arise and we wish to also consider that "missing" or "null". +arise and we wish to also consider that "missing" or "not available" or "NA". .. note:: Prior to version v0.10.0 ``inf`` and ``-inf`` were also - considered to be "null" in computations. This is no longer the case by - default; use the ``mode.use_inf_as_null`` option to recover it. + considered to be "NA" in computations. This is no longer the case by + default; use the ``mode.use_inf_as_na`` option to recover it. -.. _missing.isnull: +.. _missing.isna: To make detecting missing values easier (and across different array dtypes), -pandas provides the :func:`~pandas.core.common.isnull` and -:func:`~pandas.core.common.notnull` functions, which are also methods on +pandas provides the :func:`isna` and +:func:`notna` functions, which are also methods on ``Series`` and ``DataFrame`` objects: .. ipython:: python df2['one'] - pd.isnull(df2['one']) - df2['four'].notnull() - df2.isnull() + pd.isna(df2['one']) + df2['four'].notna() + df2.isna() .. warning:: @@ -206,7 +206,7 @@ with missing data. Filling missing values: fillna ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The **fillna** function can "fill in" NA values with non-null data in a couple +The **fillna** function can "fill in" NA values with non-NA data in a couple of ways, which we illustrate: **Replace NA with a scalar value** @@ -220,7 +220,7 @@ of ways, which we illustrate: **Fill gaps forward or backward** Using the same filling arguments as :ref:`reindexing `, we -can propagate non-null values forward or backward: +can propagate non-NA values forward or backward: .. ipython:: python @@ -288,7 +288,7 @@ a Series in this case. .. ipython:: python - dff.where(pd.notnull(dff), dff.mean(), axis='columns') + dff.where(pd.notna(dff), dff.mean(), axis='columns') .. _missing_data.dropna: @@ -540,7 +540,7 @@ String/Regular Expression Replacement `__ if this is unclear. -Replace the '.' with ``nan`` (str -> str) +Replace the '.' with ``NaN`` (str -> str) .. 
ipython:: python

diff --git a/doc/source/options.rst b/doc/source/options.rst
index 77cac6d495d13..51d02bc89692a 100644
--- a/doc/source/options.rst
+++ b/doc/source/options.rst
@@ -28,7 +28,7 @@ You can get/set options directly as attributes of the top-level ``options`` attr
    pd.options.display.max_rows = 999
    pd.options.display.max_rows

-There is also an API composed of 5 relevant functions, available directly from the ``pandas``
+The API is composed of 5 relevant functions, available directly from the ``pandas``
 namespace:

 - :func:`~pandas.get_option` / :func:`~pandas.set_option` - get/set the value of a single option.
@@ -40,7 +40,7 @@ namespace:
 **Note:** developers can check out pandas/core/config.py for more info.

 All of the functions above accept a regexp pattern (``re.search`` style) as an argument,
-and so passing in a substring will work - as long as it is unambiguous :
+and so passing in a substring will work - as long as it is unambiguous:

 .. ipython:: python

@@ -241,7 +241,7 @@ suggestion.

    df

 ``display.chop_threshold`` sets at what level pandas rounds to zero when
-it displays a Series of DataFrame. Note, this does not effect the
+it displays a Series or DataFrame. Note, this does not affect the
 precision at which the number is stored.

 .. ipython:: python

@@ -273,151 +273,165 @@ Options are 'right', and 'left'.

 Available Options
 -----------------

-========================== ============ ==================================
-Option                     Default      Function
-========================== ============ ==================================
-display.chop_threshold     None         If set to a float value, all float
-                                        values smaller then the given
-                                        threshold will be displayed as
-                                        exactly 0 by repr and friends.
-display.colheader_justify  right        Controls the justification of
-                                        column headers. used by DataFrameFormatter.
-display.column_space       12           No description available.
-display.date_dayfirst      False        When True, prints and parses dates
-                                        with the day first, eg 20/01/2005
-display.date_yearfirst     False        When True, prints and parses dates
-                                        with the year first, eg 2005/01/20
-display.encoding           UTF-8        Defaults to the detected encoding
-                                        of the console. Specifies the encoding
-                                        to be used for strings returned by
-                                        to_string, these are generally strings
-                                        meant to be displayed on the console.
-display.expand_frame_repr  True         Whether to print out the full DataFrame
-                                        repr for wide DataFrames across
-                                        multiple lines, `max_columns` is
-                                        still respected, but the output will
-                                        wrap-around across multiple "pages"
-                                        if its width exceeds `display.width`.
-display.float_format       None         The callable should accept a floating
-                                        point number and return a string with
-                                        the desired format of the number.
-                                        This is used in some places like
-                                        SeriesFormatter.
-                                        See core.format.EngFormatter for an example.
-display.height             60           Deprecated. Use `display.max_rows` instead.
-display.large_repr         truncate     For DataFrames exceeding max_rows/max_cols,
-                                        the repr (and HTML repr) can show
-                                        a truncated table (the default from 0.13),
-                                        or switch to the view from df.info()
-                                        (the behaviour in earlier versions of pandas).
-                                        allowable settings, ['truncate', 'info']
-display.latex.repr         False        Whether to produce a latex DataFrame
-                                        representation for jupyter frontends
-                                        that support it.
-display.latex.escape       True         Escapes special caracters in Dataframes, when
-                                        using the to_latex method.
-display.latex.longtable    False        Specifies if the to_latex method of a Dataframe
-                                        uses the longtable format.
-display.line_width         80           Deprecated. Use `display.width` instead.
-display.max_columns 20 max_rows and max_columns are used - in __repr__() methods to decide if - to_string() or info() is used to - render an object to a string. In - case python/IPython is running in - a terminal this can be set to 0 and - pandas will correctly auto-detect - the width the terminal and swap to - a smaller format in case all columns - would not fit vertically. The IPython - notebook, IPython qtconsole, or IDLE - do not run in a terminal and hence - it is not possible to do correct - auto-detection. 'None' value means - unlimited. -display.max_colwidth 50 The maximum width in characters of - a column in the repr of a pandas - data structure. When the column overflows, - a "..." placeholder is embedded in - the output. -display.max_info_columns 100 max_info_columns is used in DataFrame.info - method to decide if per column information - will be printed. -display.max_info_rows 1690785 df.info() will usually show null-counts - for each column. For large frames - this can be quite slow. max_info_rows - and max_info_cols limit this null - check only to frames with smaller - dimensions then specified. -display.max_rows 60 This sets the maximum number of rows - pandas should output when printing - out various output. For example, - this value determines whether the - repr() for a dataframe prints out - fully or just a summary repr. - 'None' value means unlimited. -display.max_seq_items 100 when pretty-printing a long sequence, - no more then `max_seq_items` will - be printed. If items are omitted, - they will be denoted by the addition - of "..." to the resulting string. - If set to None, the number of items - to be printed is unlimited. -display.memory_usage True This specifies if the memory usage of - a DataFrame should be displayed when the - df.info() method is invoked. -display.multi_sparse True "Sparsify" MultiIndex display (don't - display repeated elements in outer - levels within groups) -display.notebook_repr_html True When True, IPython notebook will - use html representation for - pandas objects (if it is available). -display.pprint_nest_depth 3 Controls the number of nested levels - to process when pretty-printing -display.precision 6 Floating point output precision in - terms of number of places after the - decimal, for regular formatting as well - as scientific notation. Similar to - numpy's ``precision`` print option -display.show_dimensions truncate Whether to print out dimensions - at the end of DataFrame repr. - If 'truncate' is specified, only - print out the dimensions if the - frame is truncated (e.g. not display - all rows and/or columns) -display.width 80 Width of the display in characters. - In case python/IPython is running in - a terminal this can be set to None - and pandas will correctly auto-detect - the width. Note that the IPython notebook, - IPython qtconsole, or IDLE do not run in a - terminal and hence it is not possible - to correctly detect the width. -html.border 1 A ``border=value`` attribute is - inserted in the ```` tag - for the DataFrame HTML repr. -io.excel.xls.writer xlwt The default Excel writer engine for - 'xls' files. -io.excel.xlsm.writer openpyxl The default Excel writer engine for - 'xlsm' files. Available options: - 'openpyxl' (the default). -io.excel.xlsx.writer openpyxl The default Excel writer engine for - 'xlsx' files. 
-io.hdf.default_format      None         default format writing format, if
-                                        None, then put will default to
-                                        'fixed' and append will default to
-                                        'table'
-io.hdf.dropna_table        True         drop ALL nan rows when appending
-                                        to a table
-mode.chained_assignment    warn         Raise an exception, warn, or no
-                                        action if trying to use chained
-                                        assignment, The default is warn
-mode.sim_interactive       False        Whether to simulate interactive mode
-                                        for purposes of testing
-mode.use_inf_as_null       False        True means treat None, NaN, -INF,
-                                        INF as null (old way), False means
-                                        None and NaN are null, but INF, -INF
-                                        are not null (new way).
-========================== ============ ==================================
+=================================== ============ ==================================
+Option                              Default      Function
+=================================== ============ ==================================
+display.chop_threshold              None         If set to a float value, all float
+                                                 values smaller than the given
+                                                 threshold will be displayed as
+                                                 exactly 0 by repr and friends.
+display.colheader_justify           right        Controls the justification of
+                                                 column headers. Used by DataFrameFormatter.
+display.column_space                12           No description available.
+display.date_dayfirst               False        When True, prints and parses dates
+                                                 with the day first, e.g. 20/01/2005
+display.date_yearfirst              False        When True, prints and parses dates
+                                                 with the year first, e.g. 2005/01/20
+display.encoding                    UTF-8        Defaults to the detected encoding
+                                                 of the console. Specifies the encoding
+                                                 to be used for strings returned by
+                                                 to_string, these are generally strings
+                                                 meant to be displayed on the console.
+display.expand_frame_repr           True         Whether to print out the full DataFrame
+                                                 repr for wide DataFrames across
+                                                 multiple lines, `max_columns` is
+                                                 still respected, but the output will
+                                                 wrap-around across multiple "pages"
+                                                 if its width exceeds `display.width`.
+display.float_format                None         The callable should accept a floating
+                                                 point number and return a string with
+                                                 the desired format of the number.
+                                                 This is used in some places like
+                                                 SeriesFormatter.
+                                                 See core.format.EngFormatter for an example.
+display.large_repr                  truncate     For DataFrames exceeding max_rows/max_cols,
+                                                 the repr (and HTML repr) can show
+                                                 a truncated table (the default from 0.13),
+                                                 or switch to the view from df.info()
+                                                 (the behaviour in earlier versions of pandas).
+                                                 allowable settings, ['truncate', 'info']
+display.latex.repr                  False        Whether to produce a latex DataFrame
+                                                 representation for jupyter frontends
+                                                 that support it.
+display.latex.escape                True         Escapes special characters in DataFrames when
+                                                 using the to_latex method.
+display.latex.longtable             False        Specifies if the to_latex method of a DataFrame
+                                                 uses the longtable format.
+display.latex.multicolumn           True         Combines columns when using a MultiIndex
+display.latex.multicolumn_format    'l'          Alignment of multicolumn labels
+display.latex.multirow              False        Combines rows when using a MultiIndex.
+                                                 Centered instead of top-aligned,
+                                                 separated by clines.
+display.max_columns                 20           max_rows and max_columns are used
+                                                 in __repr__() methods to decide if
+                                                 to_string() or info() is used to
+                                                 render an object to a string. In
+                                                 case python/IPython is running in
+                                                 a terminal this can be set to 0 and
+                                                 pandas will correctly auto-detect
+                                                 the width of the terminal and swap to
+                                                 a smaller format in case all columns
+                                                 would not fit vertically. The IPython
+                                                 notebook, IPython qtconsole, or IDLE
+                                                 do not run in a terminal and hence
+                                                 it is not possible to do correct
+                                                 auto-detection. 'None' value means
+                                                 unlimited.
+display.max_colwidth                50           The maximum width in characters of
+                                                 a column in the repr of a pandas
+                                                 data structure. When the column overflows,
+                                                 a "..." placeholder is embedded in
+                                                 the output.
+display.max_info_columns            100          max_info_columns is used in DataFrame.info
+                                                 method to decide if per column information
+                                                 will be printed.
+display.max_info_rows               1690785      df.info() will usually show null-counts
+                                                 for each column. For large frames
+                                                 this can be quite slow. max_info_rows
+                                                 and max_info_cols limit this null
+                                                 check only to frames with smaller
+                                                 dimensions than specified.
+display.max_rows                    60           This sets the maximum number of rows
+                                                 pandas should output when printing
+                                                 out various output. For example,
+                                                 this value determines whether the
+                                                 repr() for a dataframe prints out
+                                                 fully or just a summary repr.
+                                                 'None' value means unlimited.
+display.max_seq_items               100          When pretty-printing a long sequence,
+                                                 no more than `max_seq_items` will
+                                                 be printed. If items are omitted,
+                                                 they will be denoted by the addition
+                                                 of "..." to the resulting string.
+                                                 If set to None, the number of items
+                                                 to be printed is unlimited.
+display.memory_usage                True         This specifies if the memory usage of
+                                                 a DataFrame should be displayed when the
+                                                 df.info() method is invoked.
+display.multi_sparse                True         "Sparsify" MultiIndex display (don't
+                                                 display repeated elements in outer
+                                                 levels within groups)
+display.notebook_repr_html          True         When True, IPython notebook will
+                                                 use html representation for
+                                                 pandas objects (if it is available).
+display.pprint_nest_depth           3            Controls the number of nested levels
+                                                 to process when pretty-printing
+display.precision                   6            Floating point output precision in
+                                                 terms of number of places after the
+                                                 decimal, for regular formatting as well
+                                                 as scientific notation. Similar to
+                                                 numpy's ``precision`` print option
+display.show_dimensions             truncate     Whether to print out dimensions
+                                                 at the end of DataFrame repr.
+                                                 If 'truncate' is specified, only
+                                                 print out the dimensions if the
+                                                 frame is truncated (e.g. not display
+                                                 all rows and/or columns)
+display.width                       80           Width of the display in characters.
+                                                 In case python/IPython is running in
+                                                 a terminal this can be set to None
+                                                 and pandas will correctly auto-detect
+                                                 the width. Note that the IPython notebook,
+                                                 IPython qtconsole, or IDLE do not run in a
+                                                 terminal and hence it is not possible
+                                                 to correctly detect the width.
+display.html.table_schema           False        Whether to publish a Table Schema
+                                                 representation for frontends that
+                                                 support it.
+display.html.border                 1            A ``border=value`` attribute is
+                                                 inserted in the ``<table>`` tag
+                                                 for the DataFrame HTML repr.
+io.excel.xls.writer                 xlwt         The default Excel writer engine for
+                                                 'xls' files.
+io.excel.xlsm.writer                openpyxl     The default Excel writer engine for
+                                                 'xlsm' files. Available options:
+                                                 'openpyxl' (the default).
+io.excel.xlsx.writer                openpyxl     The default Excel writer engine for
+                                                 'xlsx' files.
+io.hdf.default_format               None         Default format for writing; if
+                                                 None, then put will default to
+                                                 'fixed' and append will default to
+                                                 'table'
+io.hdf.dropna_table                 True         drop ALL nan rows when appending
+                                                 to a table
+io.parquet.engine                   None         The engine to use as a default for
+                                                 parquet reading and writing. If None
+                                                 then try 'pyarrow' and 'fastparquet'
+mode.chained_assignment             warn         Raise an exception, warn, or no
+                                                 action if trying to use chained
+                                                 assignment. The default is warn.
+mode.sim_interactive                False        Whether to simulate interactive mode
+                                                 for purposes of testing.
+mode.use_inf_as_na                  False        True means treat None, NaN, -INF,
+                                                 INF as NA (old way), False means
+                                                 None and NaN are NA, but INF, -INF
+                                                 are not NA (new way).
+compute.use_bottleneck              True         Use the bottleneck library to accelerate
+                                                 computation if it is installed.
+compute.use_numexpr                 True         Use the numexpr library to accelerate
+                                                 computation if it is installed.
+=================================== ============ ==================================
+

 .. _basics.console_output:

@@ -507,3 +521,26 @@ Enabling ``display.unicode.ambiguous_as_wide`` lets pandas to figure these chara

    pd.set_option('display.unicode.east_asian_width', False)
    pd.set_option('display.unicode.ambiguous_as_wide', False)
+
+.. _options.table_schema:
+
+Table Schema Display
+--------------------
+
+.. versionadded:: 0.20.0
+
+``DataFrame`` and ``Series`` can publish a Table Schema representation.
+This is disabled by default, but can be enabled globally with the
+``display.html.table_schema`` option:
+
+.. ipython:: python
+
+   pd.set_option('display.html.table_schema', True)
+
+Only ``'display.max_rows'`` rows are serialized and published.
+
+
+.. ipython:: python
+   :suppress:
+
+   pd.reset_option('display.html.table_schema')
diff --git a/doc/source/overview.rst b/doc/source/overview.rst
index 92caeec319169..00a71603e1261 100644
--- a/doc/source/overview.rst
+++ b/doc/source/overview.rst
@@ -6,7 +6,11 @@
 Package overview
 ****************

-:mod:`pandas` consists of the following things
+:mod:`pandas` is an open source, BSD-licensed library providing high-performance,
+easy-to-use data structures and data analysis tools for the `Python `__
+programming language.
+
+:mod:`pandas` consists of the following elements:

  * A set of labeled array data structures, the primary of which are
    Series and DataFrame
@@ -21,27 +25,23 @@ Package overview
  * Memory-efficient "sparse" versions of the standard data structures for storing
    data that is mostly missing or mostly constant (some fixed value)
  * Moving window statistics (rolling mean, rolling standard deviation, etc.)
- * Static and moving window linear and `panel regression
-   `__

-Data structures at a glance
----------------------------
+Data Structures
+---------------

 ..
csv-table::
     :header: "Dimensions", "Name", "Description"
     :widths: 15, 20, 50

-    1, Series, "1D labeled homogeneously-typed array"
-    2, DataFrame, "General 2D labeled, size-mutable tabular structure with
-       potentially heterogeneously-typed columns"
-    3, Panel, "General 3D labeled, also size-mutable array"
+    1, "Series", "1D labeled homogeneously-typed array"
+    2, "DataFrame", "General 2D labeled, size-mutable tabular structure with potentially heterogeneously-typed columns"

-Why more than 1 data structure?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Why more than one data structure?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 The best way to think about the pandas data structures is as flexible
 containers for lower dimensional data. For example, DataFrame is a container
-for Series, and Panel is a container for DataFrame objects. We would like to be
+for Series, and Series is a container for scalars. We would like to be
 able to insert and remove objects from these containers in a dictionary-like
 fashion.

@@ -85,36 +85,41 @@ The first stop for pandas issues and ideas is the `Github Issue Tracker
 pandas community experts can answer through `Stack Overflow `__.

-Longer discussions occur on the `developer mailing list
-`__, and commercial support
-inquiries for Lambda Foundry should be sent to: support@lambdafoundry.com
+Community
+---------

-Credits
--------
+pandas is actively supported today by a community of like-minded individuals around
+the world who contribute their valuable time and energy to help make open source
+pandas possible. Thanks to `all of our contributors `__.
+
+If you're interested in contributing, please
+visit the `Contributing to pandas webpage `__.

-pandas development began at `AQR Capital Management `__ in
-April 2008. It was open-sourced at the end of 2009. AQR continued to provide
-resources for development through the end of 2011, and continues to contribute
-bug reports today.
+pandas is a `NumFOCUS `__ sponsored project.
+This helps ensure the success of the development of pandas as a world-class open-source
+project, and makes it possible to `donate `__ to the project.

-Since January 2012, `Lambda Foundry `__, has
-been providing development resources, as well as commercial support,
-training, and consulting for pandas.
+Project Governance
+------------------

-pandas is only made possible by a group of people around the world like you
-who have contributed new code, bug reports, fixes, comments and ideas. A
-complete list can be found `on Github `__.
+The governance process that the pandas project has used informally since its inception in 2008 is formalized in `Project Governance documents `__.
+The documents clarify how decisions are made and how the various elements of our community interact, including the relationship between open source collaborative development and work that may be funded by for-profit or non-profit entities.
+
+Wes McKinney is the Benevolent Dictator for Life (BDFL).

 Development Team
------------------
+-----------------
+
+The list of the Core Team members and more detailed information can be found on the `people’s page `__ of the governance repo.
+

-pandas is a part of the PyData project. The PyData Development Team is a
-collection of developers focused on the improvement of Python's data
-libraries. The core team that coordinates development can be found on `Github
-`__. If you're interested in contributing, please
-visit the `project website `__.
+Institutional Partners +---------------------- + +The information about current institutional partners can be found on `pandas website page `__ License ------- .. literalinclude:: ../../LICENSE + diff --git a/doc/source/r_interface.rst b/doc/source/r_interface.rst index b5d699cad69d5..88634d7f75c63 100644 --- a/doc/source/r_interface.rst +++ b/doc/source/r_interface.rst @@ -41,15 +41,17 @@ In the remainder of this page, a few examples of explicit conversion is given. T Transferring R data sets into Python ------------------------------------ -The ``pandas2ri.ri2py`` function retrieves an R data set and converts it to the -appropriate pandas object (most likely a DataFrame): +Once the pandas conversion is activated (``pandas2ri.activate()``), many conversions +of R to pandas objects will be done automatically. For example, to obtain the 'iris' dataset as a pandas DataFrame: .. ipython:: python r.data('iris') - df_iris = pandas2ri.ri2py(r['iris']) - df_iris.head() + r['iris'].head() +If the pandas conversion was not activated, the above could also be accomplished +by explicitly converting it with the ``pandas2ri.ri2py`` function +(``pandas2ri.ri2py(r['iris'])``). Converting DataFrames into R objects ------------------------------------ @@ -65,7 +67,6 @@ DataFrames into the equivalent R object (that is, **data.frame**): print(type(r_dataframe)) print(r_dataframe) - The DataFrame's index is stored as the ``rownames`` attribute of the data.frame instance. diff --git a/doc/source/release.rst b/doc/source/release.rst index f89fec9fb86e6..bf272e243e0dd 100644 --- a/doc/source/release.rst +++ b/doc/source/release.rst @@ -37,6 +37,298 @@ analysis / manipulation tool available in any language. * Binary installers on PyPI: http://pypi.python.org/pypi/pandas * Documentation: http://pandas.pydata.org +pandas 0.20.2 +------------- + +**Release date:** June 4, 2017 + +This is a minor bug-fix release in the 0.20.x series and includes some small regression fixes, +bug fixes and performance improvements. +We recommend that all users upgrade to this version. + +See the :ref:`v0.20.2 Whatsnew ` overview for an extensive list +of all enhancements and bugs that have been fixed in 0.20.2. + +Thanks +~~~~~~ + +- Aaron Barber +- Andrew 亮 +- Becky Sweger +- Christian Prinoth +- Christian Stade-Schuldt +- DSM +- Erik Fredriksen +- Hugues Valois +- Jeff Reback +- Jeff Tratner +- JimStearns206 +- John W. O'Brien +- Joris Van den Bossche +- JosephWagner +- Keith Webber +- Mehmet Ali "Mali" Akmanalp +- Pankaj Pandey +- Patrick Luo +- Patrick O'Melveny +- Pietro Battiston +- RobinFiveWords +- Ryan Hendrickson +- SimonBaron +- Tom Augspurger +- WBare +- bpraggastis +- chernrick +- chris-b1 +- economy +- gfyoung +- jaredsnyder +- keitakurita +- linebp +- lloydkirk + +pandas 0.20.0 / 0.20.1 +---------------------- + +**Release date:** May 5, 2017 + + +This is a major release from 0.19.2 and includes a number of API changes, deprecations, new features, +enhancements, and performance improvements along with a large number of bug fixes. We recommend that all +users upgrade to this version. + +Highlights include: + +- New ``.agg()`` API for Series/DataFrame similar to the groupby-rolling-resample API's, see :ref:`here ` +- Integration with the ``feather-format``, including a new top-level ``pd.read_feather()`` and ``DataFrame.to_feather()`` method, see :ref:`here `. 
+- The ``.ix`` indexer has been deprecated, see :ref:`here ` +- ``Panel`` has been deprecated, see :ref:`here ` +- Addition of an ``IntervalIndex`` and ``Interval`` scalar type, see :ref:`here ` +- Improved user API when grouping by index levels in ``.groupby()``, see :ref:`here ` +- Improved support for ``UInt64`` dtypes, see :ref:`here ` +- A new orient for JSON serialization, ``orient='table'``, that uses the Table Schema spec and that gives the possibility for a more interactive repr in the Jupyter Notebook, see :ref:`here ` +- Experimental support for exporting styled DataFrames (``DataFrame.style``) to Excel, see :ref:`here ` +- Window binary corr/cov operations now return a MultiIndexed ``DataFrame`` rather than a ``Panel``, as ``Panel`` is now deprecated, see :ref:`here ` +- Support for S3 handling now uses ``s3fs``, see :ref:`here ` +- Google BigQuery support now uses the ``pandas-gbq`` library, see :ref:`here ` + +See the :ref:`v0.20.1 Whatsnew ` overview for an extensive list +of all enhancements and bugs that have been fixed in 0.20.1. + + +.. note:: + + This is a combined release for 0.20.0 and and 0.20.1. + Version 0.20.1 contains one additional change for backwards-compatibility with downstream projects using pandas' ``utils`` routines. (:issue:`16250`) + +Thanks +~~~~~~ + +- abaldenko +- Adam J. Stewart +- Adrian +- adrian-stepien +- Ajay Saxena +- Akash Tandon +- Albert Villanova del Moral +- Aleksey Bilogur +- alexandercbooth +- Alexis Mignon +- Amol Kahat +- Andreas Winkler +- Andrew Kittredge +- Anthonios Partheniou +- Arco Bast +- Ashish Singal +- atbd +- bastewart +- Baurzhan Muftakhidinov +- Ben Kandel +- Ben Thayer +- Ben Welsh +- Bill Chambers +- bmagnusson +- Brandon M. Burroughs +- Brian +- Brian McFee +- carlosdanielcsantos +- Carlos Souza +- chaimdemulder +- Chris +- chris-b1 +- Chris Ham +- Christopher C. Aycock +- Christoph Gohlke +- Christoph Paulik +- Chris Warth +- Clemens Brunner +- DaanVanHauwermeiren +- Daniel Himmelstein +- Dave Willmer +- David Cook +- David Gwynne +- David Hoffman +- David Krych +- dickreuter +- Diego Fernandez +- Dimitris Spathis +- discort +- Dmitry L +- Dody Suria Wijaya +- Dominik Stanczak +- Dr-Irv +- Dr. Irv +- dr-leo +- D.S. McNeil +- dubourg +- dwkenefick +- Elliott Sales de Andrade +- Ennemoser Christoph +- Francesc Alted +- Fumito Hamamura +- funnycrab +- gfyoung +- Giacomo Ferroni +- goldenbull +- Graham R. Jeffries +- Greg Williams +- Guilherme Beltramini +- Guilherme Samora +- Hao Wu +- Harshit Patni +- hesham.shabana@hotmail.com +- Ilya V. Schurov +- Iván Vallés Pérez +- Jackie Leng +- Jaehoon Hwang +- James Draper +- James Goppert +- James McBride +- James Santucci +- Jan Schulz +- Jeff Carey +- Jeff Reback +- JennaVergeynst +- Jim +- Jim Crist +- Joe Jevnik +- Joel Nothman +- John +- John Tucker +- John W. O'Brien +- John Zwinck +- jojomdt +- Jonathan de Bruin +- Jonathan Whitmore +- Jon Mease +- Jon M. 
Mease +- Joost Kranendonk +- Joris Van den Bossche +- Joshua Bradt +- Julian Santander +- Julien Marrec +- Jun Kim +- Justin Solinsky +- Kacawi +- Kamal Kamalaldin +- Kerby Shedden +- Kernc +- Keshav Ramaswamy +- Kevin Sheppard +- Kyle Kelley +- Larry Ren +- Leon Yin +- linebp +- Line Pedersen +- Lorenzo Cestaro +- Luca Scarabello +- Lukasz +- Mahmoud Lababidi +- manu +- manuels +- Mark Mandel +- Matthew Brett +- Matthew Roeschke +- mattip +- Matti Picus +- Matt Roeschke +- maxalbert +- Maximilian Roos +- mcocdawc +- Michael Charlton +- Michael Felt +- Michael Lamparski +- Michiel Stock +- Mikolaj Chwalisz +- Min RK +- Miroslav Šedivý +- Mykola Golubyev +- Nate Yoder +- Nathalie Rud +- Nicholas Ver Halen +- Nick Chmura +- Nolan Nichols +- nuffe +- Pankaj Pandey +- paul-mannino +- Pawel Kordek +- pbreach +- Pete Huang +- Peter +- Peter Csizsek +- Petio Petrov +- Phil Ruffwind +- Pietro Battiston +- Piotr Chromiec +- Prasanjit Prakash +- Robert Bradshaw +- Rob Forgione +- Robin +- Rodolfo Fernandez +- Roger Thomas +- Rouz Azari +- Sahil Dua +- sakkemo +- Sam Foo +- Sami Salonen +- Sarah Bird +- Sarma Tangirala +- scls19fr +- Scott Sanderson +- Sebastian Bank +- Sebastian Gsänger +- Sébastien de Menten +- Shawn Heide +- Shyam Saladi +- sinhrks +- Sinhrks +- Stephen Rauch +- stijnvanhoey +- Tara Adiseshan +- themrmax +- the-nose-knows +- Thiago Serafim +- Thoralf Gutierrez +- Thrasibule +- Tobias Gustafsson +- Tom Augspurger +- tomrod +- Tong Shen +- Tong SHEN +- TrigonaMinima +- tzinckgraf +- Uwe +- wandersoncferreira +- watercrossing +- wcwagner +- Wes Turner +- Wiktor Tomczak +- WillAyd +- xgdgsc +- Yaroslav Halchenko +- Yimeng Zhang +- yui-knk + pandas 0.19.2 ------------- diff --git a/doc/source/reshaping.rst b/doc/source/reshaping.rst index eccaa9474bf6d..3dce73b302c7c 100644 --- a/doc/source/reshaping.rst +++ b/doc/source/reshaping.rst @@ -265,8 +265,8 @@ the right thing: Reshaping by Melt ----------------- -The :func:`~pandas.melt` function is useful to massage a -DataFrame into a format where one or more columns are identifier variables, +The top-level :func:`melt` and :func:`~DataFrame.melt` functions are useful to +massage a DataFrame into a format where one or more columns are identifier variables, while all other columns, considered measured variables, are "unpivoted" to the row axis, leaving just two non-identifier columns, "variable" and "value". The names of those columns can be customized by supplying the ``var_name`` and @@ -281,10 +281,11 @@ For instance, 'height' : [5.5, 6.0], 'weight' : [130, 150]}) cheese - pd.melt(cheese, id_vars=['first', 'last']) - pd.melt(cheese, id_vars=['first', 'last'], var_name='quantity') + cheese.melt(id_vars=['first', 'last']) + cheese.melt(id_vars=['first', 'last'], var_name='quantity') -Another way to transform is to use the ``wide_to_long`` panel data convenience function. +Another way to transform is to use the ``wide_to_long`` panel data convenience +function. .. ipython:: python @@ -516,7 +517,15 @@ Alternatively we can specify custom bin-edges: .. ipython:: python - pd.cut(ages, bins=[0, 18, 35, 70]) + c = pd.cut(ages, bins=[0, 18, 35, 70]) + c + +.. versionadded:: 0.20.0 + +If the ``bins`` keyword is an ``IntervalIndex``, then these will be +used to bin the passed data. + + pd.cut([25, 20, 50], bins=c.categories) .. _reshaping.dummies: @@ -627,7 +636,7 @@ When a column contains only one level, it will be omitted in the result. pd.get_dummies(df, drop_first=True) - +.. 
_reshaping.factorize: Factorizing values ------------------ diff --git a/doc/source/sparse.rst b/doc/source/sparse.rst index 2bc5d3f6dd0f5..b4884cf1c4141 100644 --- a/doc/source/sparse.rst +++ b/doc/source/sparse.rst @@ -186,7 +186,37 @@ the correct dense result. Interaction with scipy.sparse ----------------------------- -Experimental api to transform between sparse pandas and scipy.sparse structures. +SparseDataFrame +~~~~~~~~~~~~~~~ + +.. versionadded:: 0.20.0 + +Pandas supports creating sparse dataframes directly from ``scipy.sparse`` matrices. + +.. ipython:: python + + from scipy.sparse import csr_matrix + + arr = np.random.random(size=(1000, 5)) + arr[arr < .9] = 0 + + sp_arr = csr_matrix(arr) + sp_arr + + sdf = pd.SparseDataFrame(sp_arr) + sdf + +All sparse formats are supported, but matrices that are not in :mod:`COOrdinate ` format will be converted, copying data as needed. +To convert a ``SparseDataFrame`` back to sparse SciPy matrix in COO format, you can use the :meth:`SparseDataFrame.to_coo` method: + +.. ipython:: python + + sdf.to_coo() + +SparseSeries +~~~~~~~~~~~~ + +.. versionadded:: 0.16.0 A :meth:`SparseSeries.to_coo` method is implemented for transforming a ``SparseSeries`` indexed by a ``MultiIndex`` to a ``scipy.sparse.coo_matrix``. diff --git a/doc/source/html-styling.ipynb b/doc/source/style.ipynb similarity index 71% rename from doc/source/html-styling.ipynb rename to doc/source/style.ipynb index 1a97378fd30b1..c250787785e14 100644 --- a/doc/source/html-styling.ipynb +++ b/doc/source/style.ipynb @@ -2,45 +2,38 @@ "cells": [ { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "collapsed": true + }, "source": [ + "# Styling\n", + "\n", "*New in version 0.17.1*\n", "\n", - "
*Provisional: This is a new feature and still under development. We'll be adding features and possibly making breaking changes in future releases. We'd love to hear your [feedback](https://github.com/pandas-dev/pandas/issues).*
\n", + "*Provisional: This is a new feature and still under development. We'll be adding features and possibly making breaking changes in future releases. We'd love to hear your feedback.*\n", "\n", - "This document is written as a Jupyter Notebook, and can be viewed or downloaded [here](http://nbviewer.ipython.org/github/pandas-dev/pandas/blob/master/doc/source/html-styling.ipynb).\n", + "This document is written as a Jupyter Notebook, and can be viewed or downloaded [here](http://nbviewer.ipython.org/github/pandas-dev/pandas/blob/master/doc/source/style.ipynb).\n", "\n", "You can apply **conditional formatting**, the visual styling of a DataFrame\n", "depending on the data within, by using the ``DataFrame.style`` property.\n", - "This is a property that returns a ``pandas.Styler`` object, which has\n", + "This is a property that returns a ``Styler`` object, which has\n", "useful methods for formatting and displaying DataFrames.\n", "\n", "The styling is accomplished using CSS.\n", "You write \"style functions\" that take scalars, `DataFrame`s or `Series`, and return *like-indexed* DataFrames or Series with CSS `\"attribute: value\"` pairs for the values.\n", - "These functions can be incrementally passed to the `Styler` which collects the styles before rendering.\n", - "\n", - "### Contents\n", - "\n", - "- [Building Styles](#Building-Styles)\n", - "- [Finer Control: Slicing](#Finer-Control:-Slicing)\n", - "- [Builtin Styles](#Builtin-Styles)\n", - "- [Other options](#Other-options)\n", - "- [Sharing Styles](#Sharing-Styles)\n", - "- [Limitations](#Limitations)\n", - "- [Terms](#Terms)\n", - "- [Extensibility](#Extensibility)" + "These functions can be incrementally passed to the `Styler` which collects the styles before rendering." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Building Styles\n", + "## Building Styles\n", "\n", "Pass your style functions into one of the following methods:\n", "\n", - "- `Styler.applymap`: elementwise\n", - "- `Styler.apply`: column-/row-/table-wise\n", + "- ``Styler.applymap``: elementwise\n", + "- ``Styler.apply``: column-/row-/table-wise\n", "\n", "Both of those methods take a function (and some other keyword arguments) and applies your function to the DataFrame in a certain way.\n", "`Styler.applymap` works through the DataFrame elementwise.\n", @@ -58,7 +51,21 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true, + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot\n", + "# We have this here to trigger matplotlib's font cache stuff.\n", + "# This cell is hidden from the output" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true }, "outputs": [], "source": [ @@ -82,9 +89,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "df.style" @@ -94,7 +99,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "*Note*: The `DataFrame.style` attribute is a propetry that returns a `Styler` object. `Styler` has a `_repr_html_` method defined on it so they are rendered automatically. If you want the actual HTML back for further processing or for writing to file call the `.render()` method which returns a string.\n", + "*Note*: The `DataFrame.style` attribute is a property that returns a `Styler` object. `Styler` has a `_repr_html_` method defined on it so they are rendered automatically. 
If you want the actual HTML back for further processing or for writing to file call the `.render()` method which returns a string.\n", "\n", "The above output looks very similar to the standard DataFrame HTML representation. But we've done some work behind the scenes to attach CSS classes to each cell. We can view these by calling the `.render` method." ] @@ -102,9 +107,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "df.style.highlight_null().render().split('\\n')[:10]" @@ -155,9 +158,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "s = df.style.applymap(color_negative_red)\n", @@ -203,9 +204,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "df.style.apply(highlight_max)" @@ -229,9 +228,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "df.style.\\\n", @@ -245,7 +242,7 @@ "source": [ "Above we used `Styler.apply` to pass in each column one at a time.\n", "\n", - "
*Debugging Tip*: If you're having trouble writing your style function, try just passing it into DataFrame.apply. Internally, Styler.apply uses DataFrame.apply so the result should be the same.
\n", + "*Debugging Tip*: If you're having trouble writing your style function, try just passing it into DataFrame.apply. Internally, Styler.apply uses DataFrame.apply so the result should be the same.\n", "\n", "What if you wanted to highlight just the maximum value in the entire table?\n", "Use `.apply(function, axis=None)` to indicate that your function wants the entire table, not one column or row at a time. Let's try that next.\n", @@ -285,9 +282,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "df.style.apply(highlight_max, color='darkorange', axis=None)" @@ -335,9 +330,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "df.style.apply(highlight_max, subset=['B', 'C', 'D'])" @@ -353,9 +346,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "df.style.applymap(color_negative_red,\n", @@ -388,9 +379,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "df.style.format(\"{:.2%}\")" @@ -406,9 +395,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "df.style.format({'B': \"{:0<4.0f}\", 'D': '{:+.2f}'})" @@ -424,9 +411,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "df.style.format({\"B\": lambda x: \"±{:.2f}\".format(abs(x))})" @@ -449,9 +434,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "df.style.highlight_null(null_color='red')" @@ -467,9 +450,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "import seaborn as sns\n", @@ -490,9 +471,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "# Uses the full color range\n", @@ -502,12 +481,10 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ - "# Compreess the color range\n", + "# Compress the color range\n", "(df.loc[:4]\n", " .style\n", " .background_gradient(cmap='viridis', low=.5, high=0)\n", @@ -518,67 +495,128 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You can include \"bar charts\" in your DataFrame." + "There's also `.highlight_min` and `.highlight_max`." ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ - "df.style.bar(subset=['A', 'B'], color='#d65f5f')" + "df.style.highlight_max(axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "There's also `.highlight_min` and `.highlight_max`." + "Use `Styler.set_properties` when the style doesn't actually depend on the values." 
] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ - "df.style.highlight_max(axis=0)" + "df.style.set_properties(**{'background-color': 'black',\n", + " 'color': 'lawngreen',\n", + " 'border-color': 'white'})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Bar charts" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can include \"bar charts\" in your DataFrame." ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ - "df.style.highlight_min(axis=0)" + "df.style.bar(subset=['A', 'B'], color='#d65f5f')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Use `Styler.set_properties` when the style doesn't actually depend on the values." + "New in version 0.20.0 is the ability to customize further the bar chart: You can now have the `df.style.bar` be centered on zero or midpoint value (in addition to the already existing way of having the min value at the left side of the cell), and you can pass a list of `[color_negative, color_positive]`.\n", + "\n", + "Here's how you can change the above with the new `align='mid'` option:" ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ - "df.style.set_properties(**{'background-color': 'black',\n", - " 'color': 'lawngreen',\n", - " 'border-color': 'white'})" + "df.style.bar(subset=['A', 'B'], align='mid', color=['#d65f5f', '#5fba7d'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following example aims to give a highlight of the behavior of the new align options:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "from IPython.display import HTML\n", + "\n", + "# Test series\n", + "test1 = pd.Series([-100,-60,-30,-20], name='All Negative')\n", + "test2 = pd.Series([10,20,50,100], name='All Positive')\n", + "test3 = pd.Series([-10,-5,0,90], name='Both Pos and Neg')\n", + "\n", + "head = \"\"\"\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + "\"\"\"\n", + "\n", + "aligns = ['left','zero','mid']\n", + "for align in aligns:\n", + " row = \"\".format(align)\n", + " for serie in [test1,test2,test3]:\n", + " s = serie.copy()\n", + " s.name=''\n", + " row += \"\".format(s.to_frame().style.bar(align=align, \n", + " color=['#d65f5f', '#5fba7d'], \n", + " width=100).render()) #testn['width']\n", + " row += ''\n", + " head += row\n", + " \n", + "head+= \"\"\"\n", + "\n", + "
AlignAll NegativeAll PositiveBoth Neg and Pos
{}{}
\"\"\"\n", + " \n", + "\n", + "HTML(head)" ] }, { @@ -598,9 +636,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "df2 = -df\n", @@ -611,9 +647,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "style2 = df2.style\n", @@ -632,7 +666,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Other options\n", + "## Other Options\n", "\n", "You've seen a few methods for data-driven styling.\n", "`Styler` also provides a few other options for styles that don't depend on the data.\n", @@ -643,7 +677,7 @@ "\n", "Each of these can be specified in two ways:\n", "\n", - "- A keyword argument to `pandas.core.Styler`\n", + "- A keyword argument to `Styler.__init__`\n", "- A call to one of the `.set_` methods, e.g. `.set_caption`\n", "\n", "The best method to use depends on the context. Use the `Styler` constructor when building many styled DataFrames that should all share the same properties. For interactive use, the`.set_` methods are more convenient." @@ -653,7 +687,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Precision" + "### Precision" ] }, { @@ -666,9 +700,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "with pd.option_context('display.precision', 2):\n", @@ -688,9 +720,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "df.style\\\n", @@ -723,9 +753,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "df.style.set_caption('Colormaps, with a caption.')\\\n", @@ -751,9 +779,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "from IPython.display import HTML\n", @@ -792,7 +818,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# CSS Classes\n", + "### CSS Classes\n", "\n", "Certain CSS classes are attached to cells.\n", "\n", @@ -813,7 +839,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Limitations\n", + "### Limitations\n", "\n", "- DataFrame only `(use Series.to_frame().style)`\n", "- The index and columns must be unique\n", @@ -828,7 +854,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Terms\n", + "### Terms\n", "\n", "- Style function: a function that's passed into `Styler.apply` or `Styler.applymap` and returns values like `'css attribute: value'`\n", "- Builtin style functions: style functions that are methods on `Styler`\n", @@ -849,9 +875,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "from IPython.html import widgets\n", @@ -867,7 +891,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ @@ -887,18 +911,16 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "np.random.seed(25)\n", "cmap = cmap=sns.diverging_palette(5, 250, as_cmap=True)\n", - "df = pd.DataFrame(np.random.randn(20, 25)).cumsum()\n", + "bigdf = pd.DataFrame(np.random.randn(20, 25)).cumsum()\n", "\n", - "df.style.background_gradient(cmap, axis=1)\\\n", + 
"bigdf.style.background_gradient(cmap, axis=1)\\\n", " .set_properties(**{'max-width': '80px', 'font-size': '1pt'})\\\n", - " .set_caption(\"Hover to magify\")\\\n", + " .set_caption(\"Hover to magnify\")\\\n", " .set_precision(2)\\\n", " .set_table_styles(magnify())" ] @@ -907,41 +929,235 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Extensibility\n", + "## Export to Excel\n", "\n", - "The core of pandas is, and will remain, its \"high-performance, easy-to-use data structures\".\n", - "With that in mind, we hope that `DataFrame.style` accomplishes two goals\n", + "*New in version 0.20.0*\n", "\n", - "- Provide an API that is pleasing to use interactively and is \"good enough\" for many tasks\n", - "- Provide the foundations for dedicated libraries to build on\n", + "*Experimental: This is a new feature and still under development. We'll be adding features and possibly making breaking changes in future releases. We'd love to hear your feedback.*\n", "\n", - "If you build a great library on top of this, let us know and we'll [link](http://pandas.pydata.org/pandas-docs/stable/ecosystem.html) to it.\n", + "Some support is available for exporting styled `DataFrames` to Excel worksheets using the `OpenPyXL` engine. CSS2.2 properties handled include:\n", "\n", - "## Subclassing\n", + "- `background-color`\n", + "- `border-style`, `border-width`, `border-color` and their {`top`, `right`, `bottom`, `left` variants}\n", + "- `color`\n", + "- `font-family`\n", + "- `font-style`\n", + "- `font-weight`\n", + "- `text-align`\n", + "- `text-decoration`\n", + "- `vertical-align`\n", + "- `white-space: nowrap`\n", "\n", - "This section contains a bit of information about the implementation of `Styler`.\n", - "Since the feature is so new all of this is subject to change, even more so than the end-use API.\n", + "Only CSS2 named colors and hex colors of the form `#rgb` or `#rrggbb` are currently supported." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "df.style.\\\n", + " applymap(color_negative_red).\\\n", + " apply(highlight_max).\\\n", + " to_excel('styled.xlsx', engine='openpyxl')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A screenshot of the output:\n", "\n", - "As users apply styles (via `.apply`, `.applymap` or one of the builtins), we don't actually calculate anything.\n", - "Instead, we append functions and arguments to a list `self._todo`.\n", - "When asked (typically in `.render` we'll walk through the list and execute each function (this is in `self._compute()`.\n", - "These functions update an internal `defaultdict(list)`, `self.ctx` which maps DataFrame row / column positions to CSS attribute, value pairs.\n", + "![Excel spreadsheet with styled DataFrame](_static/style-excel.png)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Extensibility\n", "\n", - "We take the extra step through `self._todo` so that we can export styles and set them on other `Styler`s.\n", + "The core of pandas is, and will remain, its \"high-performance, easy-to-use data structures\".\n", + "With that in mind, we hope that `DataFrame.style` accomplishes two goals\n", "\n", - "Rendering uses [Jinja](http://jinja.pocoo.org/) templates.\n", - "The `.translate` method takes `self.ctx` and builds another dictionary ready to be passed into `Styler.template.render`, the Jinja template.\n", + "- Provide an API that is pleasing to use interactively and is \"good enough\" for many tasks\n", + "- Provide the foundations for dedicated libraries to build on\n", "\n", + "If you build a great library on top of this, let us know and we'll [link](http://pandas.pydata.org/pandas-docs/stable/ecosystem.html) to it.\n", "\n", - "## Alternate templates\n", + "### Subclassing\n", "\n", - "We've used [Jinja](http://jinja.pocoo.org/) templates to build up the HTML.\n", - "The template is stored as a class variable ``Styler.template.``. Subclasses can override that.\n", + "If the default template doesn't quite suit your needs, you can subclass Styler and extend or override the template.\n", + "We'll show an example of extending the default template to insert a custom header before each table." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "from jinja2 import Environment, ChoiceLoader, FileSystemLoader\n", + "from IPython.display import HTML\n", + "from pandas.io.formats.style import Styler" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%mkdir templates" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This next cell writes the custom template.\n", + "We extend the template `html.tpl`, which comes with pandas." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%file templates/myhtml.tpl\n", + "{% extends \"html.tpl\" %}\n", + "{% block table %}\n", + "
<h1>{{ table_title|default(\"My Table\") }}</h1>
\n", + "{{ super() }}\n", + "{% endblock table %}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we've created a template, we need to set up a subclass of ``Styler`` that\n", + "knows about it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "class MyStyler(Styler):\n", + " env = Environment(\n", + " loader=ChoiceLoader([\n", + " FileSystemLoader(\"templates\"), # contains ours\n", + " Styler.loader, # the default\n", + " ])\n", + " )\n", + " template = env.get_template(\"myhtml.tpl\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice that we include the original loader in our environment's loader.\n", + "That's because we extend the original template, so the Jinja environment needs\n", + "to be able to find it.\n", "\n", - "```python\n", - "class CustomStyle(Styler):\n", - " template = Template(\"\"\"...\"\"\")\n", - "```" + "Now we can use that custom styler. It's `__init__` takes a DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "MyStyler(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Our custom template accepts a `table_title` keyword. We can provide the value in the `.render` method." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "HTML(MyStyler(df).render(table_title=\"Extending Example\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For convenience, we provide the `Styler.from_custom_template` method that does the same as the custom subclass." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "EasyStyler = Styler.from_custom_template(\"templates\", \"myhtml.tpl\")\n", + "EasyStyler(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here's the template structure:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open(\"template_structure.html\") as f:\n", + " structure = f.read()\n", + " \n", + "HTML(structure)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "See the template in the [GitHub repo](https://github.com/pandas-dev/pandas) for more details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Hack to get the same style in the notebook as the\n", + "# main site. This is hidden in the docs.\n", + "from IPython.display import HTML\n", + "with open(\"themes/nature_with_gtoc/static/nature.css_t\") as f:\n", + " css = f.read()\n", + " \n", + "HTML(''.format(css))" ] } ], @@ -961,9 +1177,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.5.1" + "version": "3.6.1" } }, "nbformat": 4, - "nbformat_minor": 0 + "nbformat_minor": 1 } diff --git a/doc/source/style.rst b/doc/source/style.rst deleted file mode 100644 index 506b38bf06e65..0000000000000 --- a/doc/source/style.rst +++ /dev/null @@ -1,10 +0,0 @@ -.. _style: - -.. currentmodule:: pandas - -***** -Style -***** - -.. 
raw:: html - :file: html-styling.html diff --git a/doc/source/template_structure.html b/doc/source/template_structure.html new file mode 100644 index 0000000000000..0778d8e2e6f18 --- /dev/null +++ b/doc/source/template_structure.html @@ -0,0 +1,57 @@ + + + +
+before_style
+style
+    <style type="text/css">
+    table_styles
+    before_cellstyle
+    cellstyle
+    </style>
+
+before_table
+
+table
+    <table ...>
+    caption
+
+    thead
+        before_head_rows
+        head_tr (loop over headers)
+        after_head_rows
+
+    tbody
+        before_rows
+        tr (loop over data rows)
+        after_rows
+
+    </table>
+
+after_table
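One hedged way to cross-check the outline above against the shipped template: each name is a Jinja block, and a compiled `jinja2` template exposes its blocks as a plain dict, so listing them should reproduce the structure. The import path assumes pandas 0.20+, where `Styler` lives in `pandas.io.formats.style`:

```python
# List the block names defined by pandas' default html.tpl.
# `Template.blocks` is a standard attribute of any compiled
# jinja2 template: a dict mapping block name -> render function.
from pandas.io.formats.style import Styler

print(sorted(Styler.template.blocks))
```

A custom template that extends `html.tpl` can override any of these names, e.g. `{% block before_table %}...{% endblock before_table %}`.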
diff --git a/doc/source/text.rst b/doc/source/text.rst index 52e05c5d511bc..e3e4b24d17f44 100644 --- a/doc/source/text.rst +++ b/doc/source/text.rst @@ -146,8 +146,8 @@ following code will cause trouble because of the regular expression meaning of # We need to escape the special character (for >1 len patterns) dollars.str.replace(r'-\$', '-') -The ``replace`` method can also take a callable as replacement. It is called -on every ``pat`` using :func:`re.sub`. The callable should expect one +The ``replace`` method can also take a callable as replacement. It is called +on every ``pat`` using :func:`re.sub`. The callable should expect one positional argument (a regex object) and return a string. .. versionadded:: 0.20.0 @@ -164,6 +164,27 @@ positional argument (a regex object) and return a string. repl = lambda m: m.group('two').swapcase() pd.Series(['Foo Bar Baz', np.nan]).str.replace(pat, repl) +The ``replace`` method also accepts a compiled regular expression object +from :func:`re.compile` as a pattern. All flags should be included in the +compiled regular expression object. + +.. versionadded:: 0.20.0 + +.. ipython:: python + + import re + regex_pat = re.compile(r'^.a|dog', flags=re.IGNORECASE) + s3.str.replace(regex_pat, 'XX-XX ') + +Including a ``flags`` argument when calling ``replace`` with a compiled +regular expression object will raise a ``ValueError``. + +.. ipython:: + + @verbatim + In [1]: s3.str.replace(regex_pat, 'XX-XX ', flags=re.IGNORECASE) + --------------------------------------------------------------------------- + ValueError: case and flags cannot be set when pat is a compiled regex Indexing with ``.str`` ---------------------- @@ -351,33 +372,20 @@ You can check whether elements contain a pattern: .. ipython:: python - pattern = r'[a-z][0-9]' + pattern = r'[0-9][a-z]' pd.Series(['1', '2', '3a', '3b', '03c']).str.contains(pattern) or match a pattern: - .. ipython:: python - pd.Series(['1', '2', '3a', '3b', '03c']).str.match(pattern, as_indexer=True) + pd.Series(['1', '2', '3a', '3b', '03c']).str.match(pattern) The distinction between ``match`` and ``contains`` is strictness: ``match`` relies on strict ``re.match``, while ``contains`` relies on ``re.search``. -.. warning:: - - In previous versions, ``match`` was for *extracting* groups, - returning a not-so-convenient Series of tuples. The new method ``extract`` - (described in the previous section) is now preferred. - - This old, deprecated behavior of ``match`` is still the default. As - demonstrated above, use the new behavior by setting ``as_indexer=True``. - In this mode, ``match`` is analogous to ``contains``, returning a boolean - Series. The new behavior will become the default behavior in a future - release. - Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take - an extra ``na`` argument so missing values can be considered True or False: +an extra ``na`` argument so missing values can be considered True or False: .. ipython:: python diff --git a/doc/source/themes/nature_with_gtoc/layout.html b/doc/source/themes/nature_with_gtoc/layout.html index ddf1e861f5f81..a2106605c5562 100644 --- a/doc/source/themes/nature_with_gtoc/layout.html +++ b/doc/source/themes/nature_with_gtoc/layout.html @@ -94,4 +94,15 @@

{{ _('Search') }}

}); }); + {% endblock %} \ No newline at end of file diff --git a/doc/source/themes/nature_with_gtoc/static/nature.css_t b/doc/source/themes/nature_with_gtoc/static/nature.css_t index 2948f0d68b402..b61068ee28bef 100644 --- a/doc/source/themes/nature_with_gtoc/static/nature.css_t +++ b/doc/source/themes/nature_with_gtoc/static/nature.css_t @@ -299,20 +299,45 @@ td.field-body blockquote { padding-left: 30px; } -.rendered_html table { +// Adapted from the new Jupyter notebook style +// https://github.com/jupyter/notebook/blob/c8841b68c4c0739bbee1291e0214771f24194079/notebook/static/notebook/less/renderedhtml.less#L59 +table { margin-left: auto; margin-right: auto; - border-right: 1px solid #cbcbcb; - border-bottom: 1px solid #cbcbcb; + border: none; + border-collapse: collapse; + border-spacing: 0; + color: @rendered_html_border_color; + table-layout: fixed; +} +thead { + border-bottom: 1px solid @rendered_html_border_color; + vertical-align: bottom; +} +tr, th, td { + vertical-align: middle; + padding: 0.5em 0.5em; + line-height: normal; + white-space: normal; + max-width: none; + border: none; +} +th { + font-weight: bold; +} +th.col_heading { + text-align: right; +} +tbody tr:nth-child(odd) { + background: #f5f5f5; } -.rendered_html td, th { - border-left: 1px solid #cbcbcb; - border-top: 1px solid #cbcbcb; - margin: 0; - padding: 0.5em .75em; +table td.data, table th.row_heading table th.col_heading { + font-family: monospace; + text-align: right; } + /** * See also */ diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst index e09d240ed91b7..ce4a920ad77b5 100644 --- a/doc/source/timeseries.rst +++ b/doc/source/timeseries.rst @@ -113,7 +113,7 @@ For example: pd.Period('2012-05', freq='D') ``Timestamp`` and ``Period`` can be the index. Lists of ``Timestamp`` and -``Period`` are automatically coerce to ``DatetimeIndex`` and ``PeriodIndex`` +``Period`` are automatically coerced to ``DatetimeIndex`` and ``PeriodIndex`` respectively. .. ipython:: python @@ -247,12 +247,15 @@ Return NaT for input when unparseable Out[6]: DatetimeIndex(['2009-07-31', 'NaT'], dtype='datetime64[ns]', freq=None) +.. _timeseries.converting.epoch: + Epoch Timestamps ~~~~~~~~~~~~~~~~ It's also possible to convert integer or float epoch times. The default unit for these is nanoseconds (since these are how ``Timestamp`` s are stored). However, -often epochs are stored in another ``unit`` which can be specified: +often epochs are stored in another ``unit`` which can be specified. These are computed +from the starting point specified by the :ref:`Origin Parameter `. Typical epoch stored units @@ -264,17 +267,63 @@ Typical epoch stored units pd.to_datetime([1349720105100, 1349720105200, 1349720105300, 1349720105400, 1349720105500 ], unit='ms') -These *work*, but the results may be unexpected. +.. note:: + + Epoch times will be rounded to the nearest nanosecond. + +.. warning:: + + Conversion of float epoch times can lead to inaccurate and unexpected results. + :ref:`Python floats ` have about 15 digits precision in + decimal. Rounding during conversion from float to high precision ``Timestamp`` is + unavoidable. The only way to achieve exact precision is to use a fixed-width + types (e.g. an int64). + + .. ipython:: python + + pd.to_datetime([1490195805.433, 1490195805.433502912], unit='s') + pd.to_datetime(1490195805433502912, unit='ns') + +.. 
_timeseries.converting.epoch_inverse: + +From Timestamps to Epoch +~~~~~~~~~~~~~~~~~~~~~~~~ + +To invert the operation from above, namely, to convert from a ``Timestamp`` to a 'unix' epoch: .. ipython:: python - pd.to_datetime([1]) + stamps = pd.date_range('2012-10-08 18:15:05', periods=4, freq='D') + stamps - pd.to_datetime([1, 3.14], unit='s') +We convert the ``DatetimeIndex`` to an ``int64`` array, then divide by the conversion unit. -.. note:: +.. ipython:: python - Epoch times will be rounded to the nearest nanosecond. + stamps.view('int64') // pd.Timedelta(1, unit='s') + +.. _timeseries.origin: + +Using the Origin Parameter +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. versionadded:: 0.20.0 + +Using the ``origin`` parameter, one can specify an alternative starting point for creation +of a ``DatetimeIndex``. + +Start with 1960-01-01 as the starting date + +.. ipython:: python + + pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1960-01-01')) + +The default is set at ``origin='unix'``, which defaults to ``1970-01-01 00:00:00``. +Commonly called 'unix epoch' or POSIX time. + +.. ipython:: python + + pd.to_datetime([1, 2, 3], unit='D') .. _timeseries.daterange: @@ -607,10 +656,10 @@ There are several time/date properties that one can access from ``Timestamp`` or dayofyear,"The ordinal day of year" weekofyear,"The week ordinal of the year" week,"The week ordinal of the year" - dayofweek,"The numer of the day of the week with Monday=0, Sunday=6" + dayofweek,"The number of the day of the week with Monday=0, Sunday=6" weekday,"The number of the day of the week with Monday=0, Sunday=6" weekday_name,"The name of the day in a week (ex: Friday)" - quarter,"Quarter of the date: Jan=Mar = 1, Apr-Jun = 2, etc." + quarter,"Quarter of the date: Jan-Mar = 1, Apr-Jun = 2, etc." days_in_month,"The number of days in the month of the datetime" is_month_start,"Logical indicating if first day of month (defined by frequency)" is_month_end,"Logical indicating if last day of month (defined by frequency)" @@ -1043,10 +1092,10 @@ frequencies. We will refer to these aliases as *offset aliases* "BQ", "business quarter endfrequency" "QS", "quarter start frequency" "BQS", "business quarter start frequency" - "A", "year end frequency" - "BA", "business year end frequency" - "AS", "year start frequency" - "BAS", "business year start frequency" + "A, Y", "year end frequency" + "BA, BY", "business year end frequency" + "AS, YS", "year start frequency" + "BAS, BYS", "business year start frequency" "BH", "business hour frequency" "H", "hourly frequency" "T, min", "minutely frequency" @@ -1470,11 +1519,13 @@ We can instead only resample those groups where we have points as follows: ts.groupby(partial(round, freq='3T')).sum() +.. _timeseries.aggregate: + Aggregation ~~~~~~~~~~~ -Similar to :ref:`groupby aggregates ` and the :ref:`window functions `, a ``Resampler`` can be selectively -resampled. +Similar to the :ref:`aggregating API `, :ref:`groupby API `, and the :ref:`window functions API `, +a ``Resampler`` can be selectively resampled. Resampling a ``DataFrame``, the default will be to act on all columns with the same function. @@ -1500,14 +1551,6 @@ You can pass a list or dict of functions to do aggregation with, outputting a Da r['A'].agg([np.sum, np.mean, np.std]) -If a dict is passed, the keys will be used to name the columns. Otherwise the -function's name (stored in the function object) will be used. - -.. 
ipython:: python - - r['A'].agg({'result1' : np.sum, - 'result2' : np.mean}) - On a resampled DataFrame, you can pass a list of functions to apply to each column, which produces an aggregated result with a hierarchical index: @@ -1879,7 +1922,7 @@ then you can use a ``PeriodIndex`` and/or ``Series`` of ``Periods`` to do comput span = pd.period_range('1215-01-01', '1381-01-01', freq='D') span -To convert from a ``int64`` based YYYYMMDD representation. +To convert from an ``int64`` based YYYYMMDD representation. .. ipython:: python diff --git a/doc/source/tutorials.rst b/doc/source/tutorials.rst index c1c1c81915c46..9a97294d4e6d6 100644 --- a/doc/source/tutorials.rst +++ b/doc/source/tutorials.rst @@ -177,3 +177,5 @@ Various Tutorials - `Intro to pandas data structures, by Greg Reda `_ - `Pandas and Python: Top 10, by Manish Amde `_ - `Pandas Tutorial, by Mikhail Semeniuk `_ +- `Pandas DataFrames Tutorial, by Karlijn Willems `_ +- `A concise tutorial with real life examples `_ diff --git a/doc/source/visualization.rst b/doc/source/visualization.rst index 2b2012dbf0b8a..fb799c642131d 100644 --- a/doc/source/visualization.rst +++ b/doc/source/visualization.rst @@ -152,7 +152,7 @@ You can also create these other plots using the methods ``DataFrame.plot.` In addition to these ``kind`` s, there are the :ref:`DataFrame.hist() `, and :ref:`DataFrame.boxplot() ` methods, which use a separate interface. -Finally, there are several :ref:`plotting functions ` in ``pandas.tools.plotting`` +Finally, there are several :ref:`plotting functions ` in ``pandas.plotting`` that take a :class:`Series` or :class:`DataFrame` as an argument. These include @@ -823,7 +823,7 @@ before plotting. Plotting Tools -------------- -These functions can be imported from ``pandas.tools.plotting`` +These functions can be imported from ``pandas.plotting`` and take a :class:`Series` or :class:`DataFrame` as an argument. .. _visualization.scatter_matrix: @@ -834,7 +834,7 @@ Scatter Matrix Plot .. versionadded:: 0.7.3 You can create a scatter plot matrix using the -``scatter_matrix`` method in ``pandas.tools.plotting``: +``scatter_matrix`` method in ``pandas.plotting``: .. ipython:: python :suppress: @@ -843,7 +843,7 @@ You can create a scatter plot matrix using the .. ipython:: python - from pandas.tools.plotting import scatter_matrix + from pandas.plotting import scatter_matrix df = pd.DataFrame(np.random.randn(1000, 4), columns=['a', 'b', 'c', 'd']) @savefig scatter_matrix_kde.png @@ -896,7 +896,7 @@ of the same class will usually be closer together and form larger structures. .. ipython:: python - from pandas.tools.plotting import andrews_curves + from pandas.plotting import andrews_curves data = pd.read_csv('data/iris.data') @@ -918,7 +918,7 @@ represents one data point. Points that tend to cluster will appear closer togeth .. ipython:: python - from pandas.tools.plotting import parallel_coordinates + from pandas.plotting import parallel_coordinates data = pd.read_csv('data/iris.data') @@ -948,7 +948,7 @@ implies that the underlying data are not random. .. ipython:: python - from pandas.tools.plotting import lag_plot + from pandas.plotting import lag_plot plt.figure() @@ -983,7 +983,7 @@ confidence band. .. ipython:: python - from pandas.tools.plotting import autocorrelation_plot + from pandas.plotting import autocorrelation_plot plt.figure() @@ -1016,7 +1016,7 @@ are what constitutes the bootstrap plot. .. 
ipython:: python - from pandas.tools.plotting import bootstrap_plot + from pandas.plotting import bootstrap_plot data = pd.Series(np.random.rand(1000)) @@ -1048,7 +1048,7 @@ be colored differently. .. ipython:: python - from pandas.tools.plotting import radviz + from pandas.plotting import radviz data = pd.read_csv('data/iris.data') @@ -1228,14 +1228,14 @@ Using the ``x_compat`` parameter, you can suppress this behavior: plt.close('all') If you have more than one plot that needs to be suppressed, the ``use`` method -in ``pandas.plot_params`` can be used in a `with statement`: +in ``pandas.plotting.plot_params`` can be used in a `with statement`: .. ipython:: python plt.figure() @savefig ser_plot_suppress_context.png - with pd.plot_params.use('x_compat', True): + with pd.plotting.plot_params.use('x_compat', True): df.A.plot(color='r') df.B.plot(color='g') df.C.plot(color='b') @@ -1245,6 +1245,18 @@ in ``pandas.plot_params`` can be used in a `with statement`: plt.close('all') +Automatic Date Tick Adjustment +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. versionadded:: 0.20.0 + +``TimedeltaIndex`` now uses the native matplotlib +tick locator methods, it is useful to call the automatic +date tick adjustment from matplotlib for figures whose ticklabels overlap. + +See the :meth:`autofmt_xdate ` method and the +`matplotlib documentation `__ for more. + Subplots ~~~~~~~~ @@ -1438,11 +1450,11 @@ Also, you can pass different :class:`DataFrame` or :class:`Series` for ``table`` plt.close('all') -Finally, there is a helper function ``pandas.tools.plotting.table`` to create a table from :class:`DataFrame` and :class:`Series`, and add it to an ``matplotlib.Axes``. This function can accept keywords which matplotlib table has. +Finally, there is a helper function ``pandas.plotting.table`` to create a table from :class:`DataFrame` and :class:`Series`, and add it to an ``matplotlib.Axes``. This function can accept keywords which matplotlib table has. .. ipython:: python - from pandas.tools.plotting import table + from pandas.plotting import table fig, ax = plt.subplots(1, 1) table(ax, np.round(df.describe(), 2), diff --git a/doc/source/whatsnew.rst b/doc/source/whatsnew.rst index d6fb1c6a8f9cc..3385bafc26467 100644 --- a/doc/source/whatsnew.rst +++ b/doc/source/whatsnew.rst @@ -18,6 +18,12 @@ What's New These are new features and improvements of note in each release. +.. include:: whatsnew/v0.21.0.txt + +.. include:: whatsnew/v0.20.3.txt + +.. include:: whatsnew/v0.20.2.txt + .. include:: whatsnew/v0.20.0.txt .. include:: whatsnew/v0.19.2.txt diff --git a/doc/source/whatsnew/v0.10.0.txt b/doc/source/whatsnew/v0.10.0.txt index fed3ba3ce3a84..f0db1d82252c1 100644 --- a/doc/source/whatsnew/v0.10.0.txt +++ b/doc/source/whatsnew/v0.10.0.txt @@ -128,15 +128,45 @@ labeled the aggregated group with the end of the interval: the next day). ``notnull``. That they ever were was a relic of early pandas. This behavior can be re-enabled globally by the ``mode.use_inf_as_null`` option: -.. ipython:: python +.. 
code-block:: ipython - s = pd.Series([1.5, np.inf, 3.4, -np.inf]) - pd.isnull(s) - s.fillna(0) - pd.set_option('use_inf_as_null', True) - pd.isnull(s) - s.fillna(0) - pd.reset_option('use_inf_as_null') + In [6]: s = pd.Series([1.5, np.inf, 3.4, -np.inf]) + + In [7]: pd.isnull(s) + Out[7]: + 0 False + 1 False + 2 False + 3 False + Length: 4, dtype: bool + + In [8]: s.fillna(0) + Out[8]: + 0 1.500000 + 1 inf + 2 3.400000 + 3 -inf + Length: 4, dtype: float64 + + In [9]: pd.set_option('use_inf_as_null', True) + + In [10]: pd.isnull(s) + Out[10]: + 0 False + 1 True + 2 False + 3 True + Length: 4, dtype: bool + + In [11]: s.fillna(0) + Out[11]: + 0 1.5 + 1 0.0 + 2 3.4 + 3 0.0 + Length: 4, dtype: float64 + + In [12]: pd.reset_option('use_inf_as_null') - Methods with the ``inplace`` option now all return ``None`` instead of the calling object. E.g. code written like ``df = df.fillna(0, inplace=True)`` @@ -303,11 +333,10 @@ Updated PyTables Support store.append('wp',wp) # selecting via A QUERY - store.select('wp', - [ Term('major_axis>20000102'), Term('minor_axis', '=', ['A','B']) ]) + store.select('wp', "major_axis>20000102 and minor_axis=['A','B']") # removing data from tables - store.remove('wp', Term('major_axis>20000103')) + store.remove('wp', "major_axis>20000103") store.select('wp') # deleting a store diff --git a/doc/source/whatsnew/v0.10.1.txt b/doc/source/whatsnew/v0.10.1.txt index edc628fe85027..d5880e44e46c6 100644 --- a/doc/source/whatsnew/v0.10.1.txt +++ b/doc/source/whatsnew/v0.10.1.txt @@ -58,7 +58,7 @@ perform queries on a table, by passing a list to ``data_columns`` # on-disk operations store.append('df', df, data_columns = ['B','C','string','string2']) - store.select('df',[ 'B > 0', 'string == foo' ]) + store.select('df', "B>0 and string=='foo'") # this is in-memory version of this type of selection df[(df.B > 0) & (df.string == 'foo')] @@ -110,7 +110,7 @@ columns, this is equivalent to passing a store.select('mi') # the levels are automatically included as data columns - store.select('mi', Term('foo=bar')) + store.select('mi', "foo='bar'") Multi-table creation via ``append_to_multiple`` and selection via ``select_as_multiple`` can create/select from multiple tables and return a diff --git a/doc/source/whatsnew/v0.13.0.txt b/doc/source/whatsnew/v0.13.0.txt index 118632cc2c0ee..f440be1ddd56e 100644 --- a/doc/source/whatsnew/v0.13.0.txt +++ b/doc/source/whatsnew/v0.13.0.txt @@ -357,11 +357,11 @@ HDFStore API Changes .. ipython:: python path = 'test.h5' - df = DataFrame(randn(10,2)) + df = pd.DataFrame(np.random.randn(10,2)) df.to_hdf(path,'df_table',format='table') df.to_hdf(path,'df_table2',append=True) df.to_hdf(path,'df_fixed') - with get_store(path) as store: + with pd.HDFStore(path) as store: print(store) .. ipython:: python @@ -790,7 +790,7 @@ Experimental .. ipython:: python for o in pd.read_msgpack('foo.msg',iterator=True): - print o + print(o) .. 
ipython:: python :suppress: diff --git a/doc/source/whatsnew/v0.13.1.txt b/doc/source/whatsnew/v0.13.1.txt index d5d54ba43b622..5e5653945fefa 100644 --- a/doc/source/whatsnew/v0.13.1.txt +++ b/doc/source/whatsnew/v0.13.1.txt @@ -125,7 +125,7 @@ API changes df = DataFrame({'col':['foo', 0, np.nan]}) df2 = DataFrame({'col':[np.nan, 0, 'foo']}, index=[2,1,0]) df.equals(df2) - df.equals(df2.sort()) + df.equals(df2.sort_index()) import pandas.core.common as com com.array_equivalent(np.array([0, np.nan]), np.array([0, np.nan])) diff --git a/doc/source/whatsnew/v0.15.0.txt b/doc/source/whatsnew/v0.15.0.txt index aff8ec9092cdc..6282f15b6faeb 100644 --- a/doc/source/whatsnew/v0.15.0.txt +++ b/doc/source/whatsnew/v0.15.0.txt @@ -80,7 +80,7 @@ For full docs, see the :ref:`categorical introduction ` and the # Reorder the categories and simultaneously add the missing categories df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"]) df["grade"] - df.sort("grade") + df.sort_values("grade") df.groupby("grade").size() - ``pandas.core.group_agg`` and ``pandas.core.factor_agg`` were removed. As an alternative, construct diff --git a/doc/source/whatsnew/v0.17.0.txt b/doc/source/whatsnew/v0.17.0.txt index 9cb299593076d..a3bbaf73c01ca 100644 --- a/doc/source/whatsnew/v0.17.0.txt +++ b/doc/source/whatsnew/v0.17.0.txt @@ -329,7 +329,7 @@ has been changed to make this keyword unnecessary - the change is shown below. Google BigQuery Enhancements ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added ability to automatically create a table/dataset using the :func:`pandas.io.gbq.to_gbq` function if the destination table/dataset does not exist. (:issue:`8325`, :issue:`11121`). -- Added ability to replace an existing table and schema when calling the :func:`pandas.io.gbq.to_gbq` function via the ``if_exists`` argument. See the :ref:`docs ` for more details (:issue:`8325`). +- Added ability to replace an existing table and schema when calling the :func:`pandas.io.gbq.to_gbq` function via the ``if_exists`` argument. See the `docs `__ for more details (:issue:`8325`). - ``InvalidColumnOrder`` and ``InvalidPageToken`` in the gbq module will raise ``ValueError`` instead of ``IOError``. - The ``generate_bq_schema()`` function is now deprecated and will be removed in a future version (:issue:`11121`) - The gbq module will now support Python 3 (:issue:`11094`). diff --git a/doc/source/whatsnew/v0.17.1.txt b/doc/source/whatsnew/v0.17.1.txt index 17496c84b7181..1ad7279ea79f7 100644 --- a/doc/source/whatsnew/v0.17.1.txt +++ b/doc/source/whatsnew/v0.17.1.txt @@ -58,7 +58,7 @@ We can render the HTML to get the following table. :file: whatsnew_0171_html_table.html :class:`~pandas.core.style.Styler` interacts nicely with the Jupyter Notebook. -See the :ref:`documentation - - - {% if caption %} - - {% endif %} - - - {% for r in head %} - - {% for c in r %} - {% if c.is_visible != False %} - <{{c.type}} class="{{c.class}}" {{ c.attributes|join(" ") }}> - {{c.value}} - {% endif %} - {% endfor %} - - {% endfor %} - - - {% for r in body %} - - {% for c in r %} - {% if c.is_visible != False %} - <{{c.type}} id="T_{{uuid}}{{c.id}}" - class="{{c.class}}" {{ c.attributes|join(" ") }}> - {{ c.display_value }} - {% endif %} - {% endfor %} - - {% endfor %} - -
{{caption}}
- """) - - def __init__(self, data, precision=None, table_styles=None, uuid=None, - caption=None, table_attributes=None): - self.ctx = defaultdict(list) - self._todo = [] - - if not isinstance(data, (pd.Series, pd.DataFrame)): - raise TypeError("``data`` must be a Series or DataFrame") - if data.ndim == 1: - data = data.to_frame() - if not data.index.is_unique or not data.columns.is_unique: - raise ValueError("style is not supported for non-unique indicies.") - - self.data = data - self.index = data.index - self.columns = data.columns - - self.uuid = uuid - self.table_styles = table_styles - self.caption = caption - if precision is None: - precision = get_option('display.precision') - self.precision = precision - self.table_attributes = table_attributes - # display_funcs maps (row, col) -> formatting function - - def default_display_func(x): - if is_float(x): - return '{:>.{precision}g}'.format(x, precision=self.precision) - else: - return x - - self._display_funcs = defaultdict(lambda: default_display_func) - - def _repr_html_(self): - """Hooks into Jupyter notebook rich display system.""" - return self.render() - - def _translate(self): - """ - Convert the DataFrame in `self.data` and the attrs from `_build_styles` - into a dictionary of {head, body, uuid, cellstyle} - """ - table_styles = self.table_styles or [] - caption = self.caption - ctx = self.ctx - precision = self.precision - uuid = self.uuid or str(uuid1()).replace("-", "_") - ROW_HEADING_CLASS = "row_heading" - COL_HEADING_CLASS = "col_heading" - INDEX_NAME_CLASS = "index_name" - - DATA_CLASS = "data" - BLANK_CLASS = "blank" - BLANK_VALUE = "" - - def format_attr(pair): - return "{key}={value}".format(**pair) - - # for sparsifying a MultiIndex - idx_lengths = _get_level_lengths(self.index) - col_lengths = _get_level_lengths(self.columns) - - cell_context = dict() - - n_rlvls = self.data.index.nlevels - n_clvls = self.data.columns.nlevels - rlabels = self.data.index.tolist() - clabels = self.data.columns.tolist() - - if n_rlvls == 1: - rlabels = [[x] for x in rlabels] - if n_clvls == 1: - clabels = [[x] for x in clabels] - clabels = list(zip(*clabels)) - - cellstyle = [] - head = [] - - for r in range(n_clvls): - # Blank for Index columns... - row_es = [{"type": "th", - "value": BLANK_VALUE, - "display_value": BLANK_VALUE, - "is_visible": True, - "class": " ".join([BLANK_CLASS])}] * (n_rlvls - 1) - - # ... 
except maybe the last for columns.names - name = self.data.columns.names[r] - cs = [BLANK_CLASS if name is None else INDEX_NAME_CLASS, - "level%s" % r] - name = BLANK_VALUE if name is None else name - row_es.append({"type": "th", - "value": name, - "display_value": name, - "class": " ".join(cs), - "is_visible": True}) - - for c in range(len(clabels[0])): - cs = [COL_HEADING_CLASS, "level%s" % r, "col%s" % c] - cs.extend(cell_context.get( - "col_headings", {}).get(r, {}).get(c, [])) - value = clabels[r][c] - row_es.append({"type": "th", - "value": value, - "display_value": value, - "class": " ".join(cs), - "is_visible": _is_visible(c, r, col_lengths), - "attributes": [ - format_attr({"key": "colspan", - "value": col_lengths.get( - (r, c), 1)}) - ]}) - head.append(row_es) - - if self.data.index.names and not all(x is None - for x in self.data.index.names): - index_header_row = [] - - for c, name in enumerate(self.data.index.names): - cs = [INDEX_NAME_CLASS, - "level%s" % c] - name = '' if name is None else name - index_header_row.append({"type": "th", "value": name, - "class": " ".join(cs)}) - - index_header_row.extend( - [{"type": "th", - "value": BLANK_VALUE, - "class": " ".join([BLANK_CLASS]) - }] * len(clabels[0])) - - head.append(index_header_row) - - body = [] - for r, idx in enumerate(self.data.index): - # cs.extend( - # cell_context.get("row_headings", {}).get(r, {}).get(c, [])) - row_es = [{"type": "th", - "is_visible": _is_visible(r, c, idx_lengths), - "attributes": [ - format_attr({"key": "rowspan", - "value": idx_lengths.get((c, r), 1)}) - ], - "value": rlabels[r][c], - "class": " ".join([ROW_HEADING_CLASS, "level%s" % c, - "row%s" % r]), - "display_value": rlabels[r][c]} - for c in range(len(rlabels[r]))] - - for c, col in enumerate(self.data.columns): - cs = [DATA_CLASS, "row%s" % r, "col%s" % c] - cs.extend(cell_context.get("data", {}).get(r, {}).get(c, [])) - formatter = self._display_funcs[(r, c)] - value = self.data.iloc[r, c] - row_es.append({ - "type": "td", - "value": value, - "class": " ".join(cs), - "id": "_".join(cs[1:]), - "display_value": formatter(value) - }) - props = [] - for x in ctx[r, c]: - # have to handle empty styles like [''] - if x.count(":"): - props.append(x.split(":")) - else: - props.append(['', '']) - cellstyle.append({'props': props, - 'selector': "row%s_col%s" % (r, c)}) - body.append(row_es) - - return dict(head=head, cellstyle=cellstyle, body=body, uuid=uuid, - precision=precision, table_styles=table_styles, - caption=caption, table_attributes=self.table_attributes) - - def format(self, formatter, subset=None): - """ - Format the text display value of cells. - - .. versionadded:: 0.18.0 - - Parameters - ---------- - formatter: str, callable, or dict - subset: IndexSlice - An argument to ``DataFrame.loc`` that restricts which elements - ``formatter`` is applied to. - - Returns - ------- - self : Styler - - Notes - ----- - - ``formatter`` is either an ``a`` or a dict ``{column name: a}`` where - ``a`` is one of - - - str: this will be wrapped in: ``a.format(x)`` - - callable: called with the value of an individual cell - - The default display value for numeric values is the "general" (``g``) - format with ``pd.options.display.precision`` precision. 
- - Examples - -------- - - >>> df = pd.DataFrame(np.random.randn(4, 2), columns=['a', 'b']) - >>> df.style.format("{:.2%}") - >>> df['c'] = ['a', 'b', 'c', 'd'] - >>> df.style.format({'C': str.upper}) - """ - if subset is None: - row_locs = range(len(self.data)) - col_locs = range(len(self.data.columns)) - else: - subset = _non_reducing_slice(subset) - if len(subset) == 1: - subset = subset, self.data.columns - - sub_df = self.data.loc[subset] - row_locs = self.data.index.get_indexer_for(sub_df.index) - col_locs = self.data.columns.get_indexer_for(sub_df.columns) - - if isinstance(formatter, MutableMapping): - for col, col_formatter in formatter.items(): - # formatter must be callable, so '{}' are converted to lambdas - col_formatter = _maybe_wrap_formatter(col_formatter) - col_num = self.data.columns.get_indexer_for([col])[0] - - for row_num in row_locs: - self._display_funcs[(row_num, col_num)] = col_formatter - else: - # single scalar to format all cells with - locs = product(*(row_locs, col_locs)) - for i, j in locs: - formatter = _maybe_wrap_formatter(formatter) - self._display_funcs[(i, j)] = formatter - return self - - def render(self): - """ - Render the built up styles to HTML - - .. versionadded:: 0.17.1 - - Returns - ------- - rendered: str - the rendered HTML - - Notes - ----- - ``Styler`` objects have defined the ``_repr_html_`` method - which automatically calls ``self.render()`` when it's the - last item in a Notebook cell. When calling ``Styler.render()`` - directly, wrap the result in ``IPython.display.HTML`` to view - the rendered HTML in the notebook. - """ - self._compute() - d = self._translate() - # filter out empty styles, every cell will have a class - # but the list of props may just be [['', '']]. - # so we have the neested anys below - trimmed = [x for x in d['cellstyle'] - if any(any(y) for y in x['props'])] - d['cellstyle'] = trimmed - return self.template.render(**d) - - def _update_ctx(self, attrs): - """ - update the state of the Styler. Collects a mapping - of {index_label: [': ']} - - attrs: Series or DataFrame - should contain strings of ': ;: ' - Whitespace shouldn't matter and the final trailing ';' shouldn't - matter. - """ - for row_label, v in attrs.iterrows(): - for col_label, col in v.iteritems(): - i = self.index.get_indexer([row_label])[0] - j = self.columns.get_indexer([col_label])[0] - for pair in col.rstrip(";").split(";"): - self.ctx[(i, j)].append(pair) - - def _copy(self, deepcopy=False): - styler = Styler(self.data, precision=self.precision, - caption=self.caption, uuid=self.uuid, - table_styles=self.table_styles) - if deepcopy: - styler.ctx = copy.deepcopy(self.ctx) - styler._todo = copy.deepcopy(self._todo) - else: - styler.ctx = self.ctx - styler._todo = self._todo - return styler - - def __copy__(self): - """ - Deep copy by default. - """ - return self._copy(deepcopy=False) - - def __deepcopy__(self, memo): - return self._copy(deepcopy=True) - - def clear(self): - """"Reset" the styler, removing any previously applied styles. - Returns None. - """ - self.ctx.clear() - self._todo = [] - - def _compute(self): - """ - Execute the style functions built up in `self._todo`. - - Relies on the conventions that all style functions go through - .apply or .applymap. 
The append styles to apply as tuples of - - (application method, *args, **kwargs) - """ - r = self - for func, args, kwargs in self._todo: - r = func(self)(*args, **kwargs) - return r - - def _apply(self, func, axis=0, subset=None, **kwargs): - subset = slice(None) if subset is None else subset - subset = _non_reducing_slice(subset) - data = self.data.loc[subset] - if axis is not None: - result = data.apply(func, axis=axis, **kwargs) - else: - result = func(data, **kwargs) - if not isinstance(result, pd.DataFrame): - raise TypeError( - "Function {!r} must return a DataFrame when " - "passed to `Styler.apply` with axis=None".format(func)) - if not (result.index.equals(data.index) and - result.columns.equals(data.columns)): - msg = ('Result of {!r} must have identical index and columns ' - 'as the input'.format(func)) - raise ValueError(msg) - - result_shape = result.shape - expected_shape = self.data.loc[subset].shape - if result_shape != expected_shape: - msg = ("Function {!r} returned the wrong shape.\n" - "Result has shape: {}\n" - "Expected shape: {}".format(func, - result.shape, - expected_shape)) - raise ValueError(msg) - self._update_ctx(result) - return self - - def apply(self, func, axis=0, subset=None, **kwargs): - """ - Apply a function column-wise, row-wise, or table-wase, - updating the HTML representation with the result. - - .. versionadded:: 0.17.1 - - Parameters - ---------- - func : function - ``func`` should take a Series or DataFrame (depending - on ``axis``), and return an object with the same shape. - Must return a DataFrame with identical index and - column labels when ``axis=None`` - axis : int, str or None - apply to each column (``axis=0`` or ``'index'``) - or to each row (``axis=1`` or ``'columns'``) or - to the entire DataFrame at once with ``axis=None`` - subset : IndexSlice - a valid indexer to limit ``data`` to *before* applying the - function. Consider using a pandas.IndexSlice - kwargs : dict - pass along to ``func`` - - Returns - ------- - self : Styler - - Notes - ----- - The output shape of ``func`` should match the input, i.e. if - ``x`` is the input row, column, or table (depending on ``axis``), - then ``func(x.shape) == x.shape`` should be true. - - This is similar to ``DataFrame.apply``, except that ``axis=None`` - applies the function to the entire DataFrame at once, - rather than column-wise or row-wise. - - Examples - -------- - >>> def highlight_max(x): - ... return ['background-color: yellow' if v == x.max() else '' - for v in x] - ... - >>> df = pd.DataFrame(np.random.randn(5, 2)) - >>> df.style.apply(highlight_max) - """ - self._todo.append((lambda instance: getattr(instance, '_apply'), - (func, axis, subset), kwargs)) - return self - - def _applymap(self, func, subset=None, **kwargs): - func = partial(func, **kwargs) # applymap doesn't take kwargs? - if subset is None: - subset = pd.IndexSlice[:] - subset = _non_reducing_slice(subset) - result = self.data.loc[subset].applymap(func) - self._update_ctx(result) - return self - - def applymap(self, func, subset=None, **kwargs): - """ - Apply a function elementwise, updating the HTML - representation with the result. - - .. versionadded:: 0.17.1 - - Parameters - ---------- - func : function - ``func`` should take a scalar and return a scalar - subset : IndexSlice - a valid indexer to limit ``data`` to *before* applying the - function. 
Consider using a pandas.IndexSlice - kwargs : dict - pass along to ``func`` - - Returns - ------- - self : Styler - - """ - self._todo.append((lambda instance: getattr(instance, '_applymap'), - (func, subset), kwargs)) - return self - - def set_precision(self, precision): - """ - Set the precision used to render. - - .. versionadded:: 0.17.1 - - Parameters - ---------- - precision: int - - Returns - ------- - self : Styler - """ - self.precision = precision - return self - - def set_table_attributes(self, attributes): - """ - Set the table attributes. These are the items - that show up in the opening ```` tag in addition - to to automatic (by default) id. - - .. versionadded:: 0.17.1 - - Parameters - ---------- - precision: int - - Returns - ------- - self : Styler - """ - self.table_attributes = attributes - return self - - def export(self): - """ - Export the styles to applied to the current Styler. - Can be applied to a second style with ``Styler.use``. - - .. versionadded:: 0.17.1 - - Returns - ------- - styles: list - - See Also - -------- - Styler.use - """ - return self._todo - - def use(self, styles): - """ - Set the styles on the current Styler, possibly using styles - from ``Styler.export``. - - .. versionadded:: 0.17.1 - - Parameters - ---------- - styles: list - list of style functions - - Returns - ------- - self : Styler - - See Also - -------- - Styler.export - """ - self._todo.extend(styles) - return self - - def set_uuid(self, uuid): - """ - Set the uuid for a Styler. - - .. versionadded:: 0.17.1 - - Parameters - ---------- - uuid: str - - Returns - ------- - self : Styler - """ - self.uuid = uuid - return self - - def set_caption(self, caption): - """ - Se the caption on a Styler - - .. versionadded:: 0.17.1 - - Parameters - ---------- - caption: str - - Returns - ------- - self : Styler - """ - self.caption = caption - return self - - def set_table_styles(self, table_styles): - """ - Set the table styles on a Styler. These are placed in a - ``""") + if self.notebook: + self.write(template) + def write_result(self, buf): indent = 0 frame = self.frame @@ -1000,8 +1158,8 @@ def write_result(self, buf): if isinstance(self.classes, str): self.classes = self.classes.split() if not isinstance(self.classes, (list, tuple)): - raise AssertionError('classes must be list or tuple, ' - 'not %s' % type(self.classes)) + raise AssertionError('classes must be list or tuple, not {typ}' + .format(typ=type(self.classes))) _classes.extend(self.classes) if self.notebook: @@ -1013,11 +1171,11 @@ def write_result(self, buf): except (ImportError, AttributeError): pass - self.write(''.format(div_style)) + self.write(''.format(style=div_style)) - self.write('
<table border="%d" class="dataframe %s">' % (self.border,
-                                                                 ' '.join(_classes)),
-                   indent)
+        self.write_style()
+        self.write('<table border="{border}" class="dataframe {cls}">'
+                   .format(border=self.border, cls=' '.join(_classes)), indent)

         indent += self.indent_delta
         indent = self._write_header(indent)
@@ -1026,8 +1184,10 @@ def write_result(self, buf):
         self.write('</table>', indent)
         if self.should_show_dimensions:
             by = chr(215) if compat.PY3 else unichr(215)  # ×
-            self.write(u('<p>%d rows %s %d columns</p>') %
-                       (len(frame), by, len(frame.columns)))
+            self.write(u('<p>{rows} rows {by} {cols} columns</p>
') + .format(rows=len(frame), + by=by, + cols=len(frame.columns))) if self.notebook: self.write('') @@ -1052,7 +1212,7 @@ def _column_header(): row.append(single_column_table(self.columns.names)) else: row.append('') - style = "text-align: %s;" % self.fmt.justify + style = "text-align: {just};".format(just=self.fmt.justify) row.extend([single_column_table(c, self.fmt.justify, style) for c in self.columns]) else: @@ -1067,7 +1227,7 @@ def _column_header(): indent += self.indent_delta if isinstance(self.columns, MultiIndex): - template = 'colspan="%d" halign="left"' + template = 'colspan="{span:d}" halign="left"' if self.fmt.sparsify: # GH3547 @@ -1076,7 +1236,7 @@ def _column_header(): sentinel = None levels = self.columns.format(sparsify=sentinel, adjoin=False, names=False) - level_lengths = _get_level_lengths(levels, sentinel) + level_lengths = get_level_lengths(levels, sentinel) inner_lvl = len(level_lengths) - 1 for lnum, (records, values) in enumerate(zip(level_lengths, levels)): @@ -1135,7 +1295,7 @@ def _column_header(): for i, v in enumerate(values): if i in records: if records[i] > 1: - tags[j] = template % records[i] + tags[j] = template.format(span=records[i]) else: continue j += 1 @@ -1153,7 +1313,9 @@ def _column_header(): self.write_tr(col_row, indent, self.indent_delta, header=True, align=align) - if self.fmt.has_index_names and self.fmt.index: + if all((self.fmt.has_index_names, + self.fmt.index, + self.fmt.show_index_names)): row = ([x if x is not None else '' for x in self.frame.index.names] + [''] * min(len(self.columns), self.max_cols)) @@ -1223,7 +1385,7 @@ def _write_regular_rows(self, fmt_values, indent): nindex_levels=1) def _write_hierarchical_rows(self, fmt_values, indent): - template = 'rowspan="%d" valign="top"' + template = 'rowspan="{span}" valign="top"' truncate_h = self.fmt.truncate_h truncate_v = self.fmt.truncate_v @@ -1242,7 +1404,7 @@ def _write_hierarchical_rows(self, fmt_values, indent): levels = frame.index.format(sparsify=sentinel, adjoin=False, names=False) - level_lengths = _get_level_lengths(levels, sentinel) + level_lengths = get_level_lengths(levels, sentinel) inner_lvl = len(level_lengths) - 1 if truncate_v: # Insert ... row and adjust idx_values and @@ -1298,7 +1460,7 @@ def _write_hierarchical_rows(self, fmt_values, indent): for records, v in zip(level_lengths, idx_values[i]): if i in records: if records[i] > 1: - tags[j] = template % records[i] + tags[j] = template.format(span=records[i]) else: sparse_offset += 1 continue @@ -1325,47 +1487,8 @@ def _write_hierarchical_rows(self, fmt_values, indent): nindex_levels=frame.index.nlevels) -def _get_level_lengths(levels, sentinel=''): - """For each index in each level the function returns lengths of indexes. - - Parameters - ---------- - levels : list of lists - List of values on for level. - sentinel : string, optional - Value which states that no new index starts on there. - - Returns - ---------- - Returns list of maps. For each level returns map of indexes (key is index - in row and value is length of index). 
- """ - if len(levels) == 0: - return [] - - control = [True for x in levels[0]] - - result = [] - for level in levels: - last_index = 0 - - lengths = {} - for i, key in enumerate(level): - if control[i] and key == sentinel: - pass - else: - control[i] = False - lengths[last_index] = i - last_index - last_index = i - - lengths[last_index] = len(level) - last_index - - result.append(lengths) - - return result - - class CSVFormatter(object): + def __init__(self, obj, path_or_buf=None, sep=",", na_rep='', float_format=None, cols=None, header=True, index=True, index_label=None, mode='w', nanRep=None, encoding=None, @@ -1379,7 +1502,7 @@ def __init__(self, obj, path_or_buf=None, sep=",", na_rep='', if path_or_buf is None: path_or_buf = StringIO() - self.path_or_buf = _expand_user(path_or_buf) + self.path_or_buf = _expand_user(_stringify_path(path_or_buf)) self.sep = sep self.na_rep = na_rep self.float_format = float_format @@ -1452,12 +1575,9 @@ def __init__(self, obj, path_or_buf=None, sep=",", na_rep='', self.chunksize = int(chunksize) self.data_index = obj.index - if isinstance(obj.index, PeriodIndex): - self.data_index = obj.index.to_timestamp() - - if (isinstance(self.data_index, DatetimeIndex) and + if (isinstance(self.data_index, (DatetimeIndex, PeriodIndex)) and date_format is not None): - self.data_index = Index([x.strftime(date_format) if notnull(x) else + self.data_index = Index([x.strftime(date_format) if notna(x) else '' for x in self.data_index]) self.nlevels = getattr(self.data_index, 'nlevels', 1) @@ -1508,8 +1628,9 @@ def _save_header(self): return if has_aliases: if len(header) != len(cols): - raise ValueError(('Writing %d cols but got %d aliases' - % (len(cols), len(header)))) + raise ValueError(('Writing {ncols} cols but got {nalias} ' + 'aliases'.format(ncols=len(cols), + nalias=len(header)))) else: write_cols = header else: @@ -1561,7 +1682,7 @@ def _save_header(self): if isinstance(index_label, list) and len(index_label) > 1: col_line.extend([''] * (len(index_label) - 1)) - col_line.extend(columns.get_level_values(i)) + col_line.extend(columns._get_level_values(i)) writer.writerow(col_line) @@ -1616,298 +1737,6 @@ def _save_chunk(self, start_i, end_i): lib.write_csv_rows(self.data, ix, self.nlevels, self.cols, self.writer) -# from collections import namedtuple -# ExcelCell = namedtuple("ExcelCell", -# 'row, col, val, style, mergestart, mergeend') - - -class ExcelCell(object): - __fields__ = ('row', 'col', 'val', 'style', 'mergestart', 'mergeend') - __slots__ = __fields__ - - def __init__(self, row, col, val, style=None, mergestart=None, - mergeend=None): - self.row = row - self.col = col - self.val = val - self.style = style - self.mergestart = mergestart - self.mergeend = mergeend - - -header_style = {"font": {"bold": True}, - "borders": {"top": "thin", - "right": "thin", - "bottom": "thin", - "left": "thin"}, - "alignment": {"horizontal": "center", - "vertical": "top"}} - - -class ExcelFormatter(object): - """ - Class for formatting a DataFrame to a list of ExcelCells, - - Parameters - ---------- - df : dataframe - na_rep: na representation - float_format : string, default None - Format string for floating point numbers - cols : sequence, optional - Columns to write - header : boolean or list of string, default True - Write out column names. 
If a list of string is given it is - assumed to be aliases for the column names - index : boolean, default True - output row names (index) - index_label : string or sequence, default None - Column label for index column(s) if desired. If None is given, and - `header` and `index` are True, then the index names are used. A - sequence should be given if the DataFrame uses MultiIndex. - merge_cells : boolean, default False - Format MultiIndex and Hierarchical Rows as merged cells. - inf_rep : string, default `'inf'` - representation for np.inf values (which aren't representable in Excel) - A `'-'` sign will be added in front of -inf. - """ - - def __init__(self, df, na_rep='', float_format=None, cols=None, - header=True, index=True, index_label=None, merge_cells=False, - inf_rep='inf'): - self.rowcounter = 0 - self.na_rep = na_rep - self.df = df - if cols is not None: - self.df = df.loc[:, cols] - self.columns = self.df.columns - self.float_format = float_format - self.index = index - self.index_label = index_label - self.header = header - self.merge_cells = merge_cells - self.inf_rep = inf_rep - - def _format_value(self, val): - if lib.checknull(val): - val = self.na_rep - elif is_float(val): - if lib.isposinf_scalar(val): - val = self.inf_rep - elif lib.isneginf_scalar(val): - val = '-%s' % self.inf_rep - elif self.float_format is not None: - val = float(self.float_format % val) - return val - - def _format_header_mi(self): - if self.columns.nlevels > 1: - if not self.index: - raise NotImplementedError("Writing to Excel with MultiIndex" - " columns and no index " - "('index'=False) is not yet " - "implemented.") - - has_aliases = isinstance(self.header, (tuple, list, np.ndarray, Index)) - if not (has_aliases or self.header): - return - - columns = self.columns - level_strs = columns.format(sparsify=self.merge_cells, adjoin=False, - names=False) - level_lengths = _get_level_lengths(level_strs) - coloffset = 0 - lnum = 0 - - if self.index and isinstance(self.df.index, MultiIndex): - coloffset = len(self.df.index[0]) - 1 - - if self.merge_cells: - # Format multi-index as a merged cells. - for lnum in range(len(level_lengths)): - name = columns.names[lnum] - yield ExcelCell(lnum, coloffset, name, header_style) - - for lnum, (spans, levels, labels) in enumerate(zip( - level_lengths, columns.levels, columns.labels)): - values = levels.take(labels) - for i in spans: - if spans[i] > 1: - yield ExcelCell(lnum, coloffset + i + 1, values[i], - header_style, lnum, - coloffset + i + spans[i]) - else: - yield ExcelCell(lnum, coloffset + i + 1, values[i], - header_style) - else: - # Format in legacy format with dots to indicate levels. 
- for i, values in enumerate(zip(*level_strs)): - v = ".".join(map(pprint_thing, values)) - yield ExcelCell(lnum, coloffset + i + 1, v, header_style) - - self.rowcounter = lnum - - def _format_header_regular(self): - has_aliases = isinstance(self.header, (tuple, list, np.ndarray, Index)) - if has_aliases or self.header: - coloffset = 0 - - if self.index: - coloffset = 1 - if isinstance(self.df.index, MultiIndex): - coloffset = len(self.df.index[0]) - - colnames = self.columns - if has_aliases: - if len(self.header) != len(self.columns): - raise ValueError('Writing %d cols but got %d aliases' % - (len(self.columns), len(self.header))) - else: - colnames = self.header - - for colindex, colname in enumerate(colnames): - yield ExcelCell(self.rowcounter, colindex + coloffset, colname, - header_style) - - def _format_header(self): - if isinstance(self.columns, MultiIndex): - gen = self._format_header_mi() - else: - gen = self._format_header_regular() - - gen2 = () - if self.df.index.names: - row = [x if x is not None else '' - for x in self.df.index.names] + [''] * len(self.columns) - if reduce(lambda x, y: x and y, map(lambda x: x != '', row)): - gen2 = (ExcelCell(self.rowcounter, colindex, val, header_style) - for colindex, val in enumerate(row)) - self.rowcounter += 1 - return itertools.chain(gen, gen2) - - def _format_body(self): - - if isinstance(self.df.index, MultiIndex): - return self._format_hierarchical_rows() - else: - return self._format_regular_rows() - - def _format_regular_rows(self): - has_aliases = isinstance(self.header, (tuple, list, np.ndarray, Index)) - if has_aliases or self.header: - self.rowcounter += 1 - - coloffset = 0 - # output index and index_label? - if self.index: - # chek aliases - # if list only take first as this is not a MultiIndex - if (self.index_label and - isinstance(self.index_label, (list, tuple, np.ndarray, - Index))): - index_label = self.index_label[0] - # if string good to go - elif self.index_label and isinstance(self.index_label, str): - index_label = self.index_label - else: - index_label = self.df.index.names[0] - - if isinstance(self.columns, MultiIndex): - self.rowcounter += 1 - - if index_label and self.header is not False: - yield ExcelCell(self.rowcounter - 1, 0, index_label, - header_style) - - # write index_values - index_values = self.df.index - if isinstance(self.df.index, PeriodIndex): - index_values = self.df.index.to_timestamp() - - coloffset = 1 - for idx, idxval in enumerate(index_values): - yield ExcelCell(self.rowcounter + idx, 0, idxval, header_style) - - # Write the body of the frame data series by series. 
- for colidx in range(len(self.columns)): - series = self.df.iloc[:, colidx] - for i, val in enumerate(series): - yield ExcelCell(self.rowcounter + i, colidx + coloffset, val) - - def _format_hierarchical_rows(self): - has_aliases = isinstance(self.header, (tuple, list, np.ndarray, Index)) - if has_aliases or self.header: - self.rowcounter += 1 - - gcolidx = 0 - - if self.index: - index_labels = self.df.index.names - # check for aliases - if (self.index_label and - isinstance(self.index_label, (list, tuple, np.ndarray, - Index))): - index_labels = self.index_label - - # MultiIndex columns require an extra row - # with index names (blank if None) for - # unambigous round-trip, unless not merging, - # in which case the names all go on one row Issue #11328 - if isinstance(self.columns, MultiIndex) and self.merge_cells: - self.rowcounter += 1 - - # if index labels are not empty go ahead and dump - if (any(x is not None for x in index_labels) and - self.header is not False): - - for cidx, name in enumerate(index_labels): - yield ExcelCell(self.rowcounter - 1, cidx, name, - header_style) - - if self.merge_cells: - # Format hierarchical rows as merged cells. - level_strs = self.df.index.format(sparsify=True, adjoin=False, - names=False) - level_lengths = _get_level_lengths(level_strs) - - for spans, levels, labels in zip(level_lengths, - self.df.index.levels, - self.df.index.labels): - - values = levels.take(labels, - allow_fill=levels._can_hold_na, - fill_value=True) - - for i in spans: - if spans[i] > 1: - yield ExcelCell(self.rowcounter + i, gcolidx, - values[i], header_style, - self.rowcounter + i + spans[i] - 1, - gcolidx) - else: - yield ExcelCell(self.rowcounter + i, gcolidx, - values[i], header_style) - gcolidx += 1 - - else: - # Format hierarchical rows with non-merged values. - for indexcolvals in zip(*self.df.index): - for idx, indexcolval in enumerate(indexcolvals): - yield ExcelCell(self.rowcounter + idx, gcolidx, - indexcolval, header_style) - gcolidx += 1 - - # Write the body of the frame data series by series. 
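Before the body loop below, the non-merged branch above is worth seeing on its own: `zip(*index)` transposes the row MultiIndex tuples so each level lands in its own worksheet column. A rough sketch of just that step:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)])
# zip(*idx) turns row tuples into one sequence per level.
for gcolidx, indexcolvals in enumerate(zip(*idx)):
    for row, val in enumerate(indexcolvals):
        print((row, gcolidx, val))
# (0, 0, 'a') (1, 0, 'a') (2, 0, 'b') (0, 1, 1) (1, 1, 2) (2, 1, 1)
```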
- for colidx in range(len(self.columns)): - series = self.df.iloc[:, colidx] - for i, val in enumerate(series): - yield ExcelCell(self.rowcounter + i, gcolidx + colidx, val) - - def get_formatted_cells(self): - for cell in itertools.chain(self._format_header(), - self._format_body()): - cell.val = self._format_value(cell.val) - yield cell # ---------------------------------------------------------------------- # Array formatters @@ -1918,6 +1747,8 @@ def format_array(values, formatter, float_format=None, na_rep='NaN', if is_categorical_dtype(values): fmt_klass = CategoricalArrayFormatter + elif is_interval_dtype(values): + fmt_klass = IntervalArrayFormatter elif is_float_dtype(values.dtype): fmt_klass = FloatArrayFormatter elif is_period_arraylike(values): @@ -1950,6 +1781,7 @@ def format_array(values, formatter, float_format=None, na_rep='NaN', class GenericArrayFormatter(object): + def __init__(self, values, digits=7, formatter=None, na_rep='NaN', space=12, float_format=None, justify='right', decimal='.', quoting=None, fixed_width=True): @@ -1972,8 +1804,9 @@ def _format_strings(self): if self.float_format is None: float_format = get_option("display.float_format") if float_format is None: - fmt_str = '%% .%dg' % get_option("display.precision") - float_format = lambda x: fmt_str % x + fmt_str = ('{{x: .{prec:d}g}}' + .format(prec=get_option("display.precision"))) + float_format = lambda x: fmt_str.format(x=x) else: float_format = self.float_format @@ -1989,10 +1822,10 @@ def _format(x): return 'NaT' return self.na_rep elif isinstance(x, PandasObject): - return '%s' % x + return u'{x}'.format(x=x) else: # object dtype - return '%s' % formatter(x) + return u'{x}'.format(x=formatter(x)) vals = self.values if isinstance(vals, Index): @@ -2000,17 +1833,17 @@ def _format(x): elif isinstance(vals, ABCSparseArray): vals = vals.values - is_float_type = lib.map_infer(vals, is_float) & notnull(vals) + is_float_type = lib.map_infer(vals, is_float) & notna(vals) leading_space = is_float_type.any() fmt_values = [] for i, v in enumerate(vals): if not is_float_type[i] and leading_space: - fmt_values.append(' %s' % _format(v)) + fmt_values.append(u' {v}'.format(v=_format(v))) elif is_float_type[i]: fmt_values.append(float_format(v)) else: - fmt_values.append(' %s' % _format(v)) + fmt_values.append(u' {v}'.format(v=_format(v))) return fmt_values @@ -2046,10 +1879,10 @@ def _value_formatter(self, float_format=None, threshold=None): # because str(0.0) = '0.0' while '%g' % 0.0 = '0' if float_format: def base_formatter(v): - return (float_format % v) if notnull(v) else self.na_rep + return float_format(value=v) if notna(v) else self.na_rep else: def base_formatter(v): - return str(v) if notnull(v) else self.na_rep + return str(v) if notna(v) else self.na_rep if self.decimal != '.': def decimal_formatter(v): @@ -2061,7 +1894,7 @@ def decimal_formatter(v): return decimal_formatter def formatter(value): - if notnull(value): + if notna(value): if abs(value) > threshold: return decimal_formatter(value) else: @@ -2091,7 +1924,7 @@ def format_values_with(float_format): # separate the wheat from the chaff values = self.values - mask = isnull(values) + mask = isna(values) if hasattr(values, 'to_dense'): # sparse numpy ndarray values = values.to_dense() values = np.array(values, dtype='object') @@ -2107,10 +1940,14 @@ def format_values_with(float_format): # There is a special default string when we are fixed-width # The default is otherwise to use str instead of a formatting string - if self.float_format is None and 
self.fixed_width: - float_format = '%% .%df' % self.digits + if self.float_format is None: + if self.fixed_width: + float_format = partial('{value: .{digits:d}f}'.format, + digits=self.digits) + else: + float_format = self.float_format else: - float_format = self.float_format + float_format = lambda value: self.float_format % value formatted_values = format_values_with(float_format) @@ -2137,7 +1974,8 @@ def format_values_with(float_format): (abs_vals > 0)).any() if has_small_values or (too_long and has_large_values): - float_format = '%% .%de' % self.digits + float_format = partial('{value: .{digits:d}e}'.format, + digits=self.digits) formatted_values = format_values_with(float_format) return formatted_values @@ -2151,13 +1989,15 @@ def _format_strings(self): class IntArrayFormatter(GenericArrayFormatter): + def _format_strings(self): - formatter = self.formatter or (lambda x: '% d' % x) + formatter = self.formatter or (lambda x: '{x: d}'.format(x=x)) fmt_values = [formatter(x) for x in self.values] return fmt_values class Datetime64Formatter(GenericArrayFormatter): + def __init__(self, values, nat_rep='NaT', date_format=None, **kwargs): super(Datetime64Formatter, self).__init__(values, **kwargs) self.nat_rep = nat_rep @@ -2182,21 +2022,34 @@ def _format_strings(self): return fmt_values.tolist() +class IntervalArrayFormatter(GenericArrayFormatter): + + def __init__(self, values, *args, **kwargs): + GenericArrayFormatter.__init__(self, values, *args, **kwargs) + + def _format_strings(self): + formatter = self.formatter or str + fmt_values = np.array([formatter(x) for x in self.values]) + return fmt_values + + class PeriodArrayFormatter(IntArrayFormatter): + def _format_strings(self): - from pandas.tseries.period import IncompatibleFrequency + from pandas.core.indexes.period import IncompatibleFrequency try: values = PeriodIndex(self.values).to_native_types() except IncompatibleFrequency: # periods may contains different freq values = Index(self.values, dtype='object').to_native_types() - formatter = self.formatter or (lambda x: '%s' % x) + formatter = self.formatter or (lambda x: '{x}'.format(x=x)) fmt_values = [formatter(x) for x in values] return fmt_values class CategoricalArrayFormatter(GenericArrayFormatter): + def __init__(self, values, *args, **kwargs): GenericArrayFormatter.__init__(self, values, *args, **kwargs) @@ -2328,6 +2181,7 @@ def _get_format_datetime64_from_values(values, date_format): class Datetime64TZFormatter(Datetime64Formatter): + def _format_strings(self): """ we by definition have a TZ """ @@ -2342,6 +2196,7 @@ def _format_strings(self): class Timedelta64Formatter(GenericArrayFormatter): + def __init__(self, values, nat_rep='NaT', box=False, **kwargs): super(Timedelta64Formatter, self).__init__(values, **kwargs) self.nat_rep = nat_rep @@ -2388,7 +2243,7 @@ def _formatter(x): x = Timedelta(x) result = x._repr_base(format=format) if box: - result = "'{0}'".format(result) + result = "'{res}'".format(res=result) return result return _formatter @@ -2443,12 +2298,12 @@ def _cond(values): def single_column_table(column, align=None, style=None): table = '%s' % str(i)) + table += ('{i!s}'.format(i=i)) table += '' return table @@ -2456,7 +2311,7 @@ def single_column_table(column, align=None, style=None): def single_row_table(row): # pragma: no cover table = '' for i in row: - table += ('' % str(i)) + table += (''.format(i=i)) table += '
-        table += ('<td>%s</td>' % str(i))
+        table += ('<td>{i!s}</td>'.format(i=i))
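The hunks in this region all apply the same mechanical change: `%`-interpolation is swapped for `str.format` with named fields. A quick before/after check of the equivalence (values chosen for illustration):

```python
# '%s' % str(i) and '{i!s}'.format(i=i) produce identical output;
# the !s conversion matches the old explicit str(i).
i = 42
old = '<td>%s</td>' % str(i)
new = '<td>{i!s}</td>'.format(i=i)
assert old == new == '<td>42</td>'
```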
' return table @@ -2467,82 +2322,6 @@ def _has_names(index): else: return index.name is not None -# ----------------------------------------------------------------------------- -# Global formatting options - -_initial_defencoding = None - - -def detect_console_encoding(): - """ - Try to find the most capable encoding supported by the console. - slighly modified from the way IPython handles the same issue. - """ - import locale - global _initial_defencoding - - encoding = None - try: - encoding = sys.stdout.encoding or sys.stdin.encoding - except AttributeError: - pass - - # try again for something better - if not encoding or 'ascii' in encoding.lower(): - try: - encoding = locale.getpreferredencoding() - except Exception: - pass - - # when all else fails. this will usually be "ascii" - if not encoding or 'ascii' in encoding.lower(): - encoding = sys.getdefaultencoding() - - # GH3360, save the reported defencoding at import time - # MPL backends may change it. Make available for debugging. - if not _initial_defencoding: - _initial_defencoding = sys.getdefaultencoding() - - return encoding - - -def get_console_size(): - """Return console size as tuple = (width, height). - - Returns (None,None) in non-interactive session. - """ - display_width = get_option('display.width') - # deprecated. - display_height = get_option('display.height', silent=True) - - # Consider - # interactive shell terminal, can detect term size - # interactive non-shell terminal (ipnb/ipqtconsole), cannot detect term - # size non-interactive script, should disregard term size - - # in addition - # width,height have default values, but setting to 'None' signals - # should use Auto-Detection, But only in interactive shell-terminal. - # Simple. yeah. - - if com.in_interactive_session(): - if com.in_ipython_frontend(): - # sane defaults for interactive non-shell terminal - # match default for width,height in config_init - from pandas.core.config import get_default_val - terminal_width = get_default_val('display.width') - terminal_height = get_default_val('display.height') - else: - # pure terminal - terminal_width, terminal_height = get_terminal_size() - else: - terminal_width, terminal_height = None, None - - # Note if the User sets width/Height to None (auto-detection) - # and we're in a script (non-inter), this will return (None,None) - # caller needs to deal. 
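The helper being removed here is superseded on Python 3 by the standard library (the terminal.py hunk later in this diff makes exactly that switch). A minimal equivalent of the fallback behavior:

```python
# shutil consults COLUMNS/LINES and the attached terminal, and returns the
# fallback in non-interactive sessions -- the case the comment above says
# the caller needs to deal with.
import shutil

width, height = shutil.get_terminal_size(fallback=(80, 24))
print(width, height)
```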
- return (display_width or terminal_width, display_height or terminal_height) - class EngFormatter(object): """ @@ -2626,18 +2405,19 @@ def __call__(self, num): prefix = self.ENG_PREFIXES[int_pow10] else: if int_pow10 < 0: - prefix = 'E-%02d' % (-int_pow10) + prefix = 'E-{pow10:02d}'.format(pow10=-int_pow10) else: - prefix = 'E+%02d' % int_pow10 + prefix = 'E+{pow10:02d}'.format(pow10=int_pow10) mant = sign * dnum / (10**pow10) if self.accuracy is None: # pragma: no cover - format_str = u("% g%s") + format_str = u("{mant: g}{prefix}") else: - format_str = (u("%% .%if%%s") % self.accuracy) + format_str = (u("{{mant: .{acc:d}f}}{{prefix}}") + .format(acc=self.accuracy)) - formatted = format_str % (mant, prefix) + formatted = format_str.format(mant=mant, prefix=prefix) return formatted # .strip() @@ -2679,17 +2459,3 @@ def _binify(cols, line_width): bins.append(len(cols)) return bins - - -if __name__ == '__main__': - arr = np.array([746.03, 0.00, 5620.00, 1592.36]) - # arr = np.array([11111111.1, 1.55]) - # arr = [314200.0034, 1.4125678] - arr = np.array( - [327763.3119, 345040.9076, 364460.9915, 398226.8688, 383800.5172, - 433442.9262, 539415.0568, 568590.4108, 599502.4276, 620921.8593, - 620898.5294, 552427.1093, 555221.2193, 519639.7059, 388175.7, - 379199.5854, 614898.25, 504833.3333, 560600., 941214.2857, 1134250., - 1219550., 855736.85, 1042615.4286, 722621.3043, 698167.1818, 803750.]) - fmt = FloatArrayFormatter(arr, digits=7) - print(fmt.get_result()) diff --git a/pandas/formats/printing.py b/pandas/io/formats/printing.py similarity index 88% rename from pandas/formats/printing.py rename to pandas/io/formats/printing.py index 37bd4b63d6f7a..cbad603630bd3 100644 --- a/pandas/formats/printing.py +++ b/pandas/io/formats/printing.py @@ -2,7 +2,8 @@ printing tools """ -from pandas.types.inference import is_sequence +import sys +from pandas.core.dtypes.inference import is_sequence from pandas import compat from pandas.compat import u from pandas.core.config import get_option @@ -233,3 +234,34 @@ def as_escaped_unicode(thing, escape_chars=escape_chars): def pprint_thing_encoded(object, encoding='utf-8', errors='replace', **kwds): value = pprint_thing(object) # get unicode representation of object return value.encode(encoding, errors, **kwds) + + +def _enable_data_resource_formatter(enable): + if 'IPython' not in sys.modules: + # definitely not in IPython + return + from IPython import get_ipython + ip = get_ipython() + if ip is None: + # still not in IPython + return + + formatters = ip.display_formatter.formatters + mimetype = "application/vnd.dataresource+json" + + if enable: + if mimetype not in formatters: + # define tableschema formatter + from IPython.core.formatters import BaseFormatter + + class TableSchemaFormatter(BaseFormatter): + print_method = '_repr_data_resource_' + _return_type = (dict,) + # register it: + formatters[mimetype] = TableSchemaFormatter() + # enable it if it's been disabled: + formatters[mimetype].enabled = True + else: + # unregister tableschema mime-type + if mimetype in formatters: + formatters[mimetype].enabled = False diff --git a/pandas/io/formats/style.py b/pandas/io/formats/style.py new file mode 100644 index 0000000000000..445fceb4b8146 --- /dev/null +++ b/pandas/io/formats/style.py @@ -0,0 +1,1187 @@ +""" +Module for applying conditional formatting to +DataFrames and Series. 
+""" +from functools import partial +from itertools import product +from contextlib import contextmanager +from uuid import uuid1 +import copy +from collections import defaultdict, MutableMapping + +try: + from jinja2 import ( + PackageLoader, Environment, ChoiceLoader, FileSystemLoader + ) +except ImportError: + msg = "pandas.Styler requires jinja2. "\ + "Please install with `conda install Jinja2`\n"\ + "or `pip install Jinja2`" + raise ImportError(msg) + +from pandas.core.dtypes.common import is_float, is_string_like + +import numpy as np +import pandas as pd +from pandas.api.types import is_list_like +from pandas.compat import range +from pandas.core.config import get_option +from pandas.core.generic import _shared_docs +import pandas.core.common as com +from pandas.core.indexing import _maybe_numeric_slice, _non_reducing_slice +from pandas.util._decorators import Appender +try: + import matplotlib.pyplot as plt + from matplotlib import colors + has_mpl = True +except ImportError: + has_mpl = False + no_mpl_message = "{0} requires matplotlib." + + +@contextmanager +def _mpl(func): + if has_mpl: + yield plt, colors + else: + raise ImportError(no_mpl_message.format(func.__name__)) + + +class Styler(object): + """ + Helps style a DataFrame or Series according to the + data with HTML and CSS. + + .. versionadded:: 0.17.1 + + .. warning:: + This is a new feature and is under active development. + We'll be adding features and possibly making breaking changes in future + releases. + + Parameters + ---------- + data: Series or DataFrame + precision: int + precision to round floats to, defaults to pd.options.display.precision + table_styles: list-like, default None + list of {selector: (attr, value)} dicts; see Notes + uuid: str, default None + a unique identifier to avoid CSS collisons; generated automatically + caption: str, default None + caption to attach to the table + + Attributes + ---------- + env : Jinja2 Environment + template : Jinja2 Template + loader : Jinja2 Loader + + Notes + ----- + Most styling will be done by passing style functions into + ``Styler.apply`` or ``Styler.applymap``. Style functions should + return values with strings containing CSS ``'attr: value'`` that will + be applied to the indicated cells. + + If using in the Jupyter notebook, Styler has defined a ``_repr_html_`` + to automatically render itself. Otherwise call Styler.render to get + the genterated HTML. 
+
+    CSS classes are attached to the generated HTML
+
+    * Index and Column names include ``index_name`` and ``level<k>``
+      where `k` is its level in a MultiIndex
+    * Index label cells include
+
+      * ``row_heading``
+      * ``row<n>`` where `n` is the numeric position of the row
+      * ``level<k>`` where `k` is the level in a MultiIndex
+
+    * Column label cells include
+      * ``col_heading``
+      * ``col<n>`` where `n` is the numeric position of the column
+      * ``level<k>`` where `k` is the level in a MultiIndex
+
+    * Blank cells include ``blank``
+    * Data cells include ``data``
+
+    See Also
+    --------
+    pandas.DataFrame.style
+    """
+    loader = PackageLoader("pandas", "io/formats/templates")
+    env = Environment(
+        loader=loader,
+        trim_blocks=True,
+    )
+    template = env.get_template("html.tpl")
+
+    def __init__(self, data, precision=None, table_styles=None, uuid=None,
+                 caption=None, table_attributes=None):
+        self.ctx = defaultdict(list)
+        self._todo = []
+
+        if not isinstance(data, (pd.Series, pd.DataFrame)):
+            raise TypeError("``data`` must be a Series or DataFrame")
+        if data.ndim == 1:
+            data = data.to_frame()
+        if not data.index.is_unique or not data.columns.is_unique:
+            raise ValueError("style is not supported for non-unique indices.")
+
+        self.data = data
+        self.index = data.index
+        self.columns = data.columns
+
+        self.uuid = uuid
+        self.table_styles = table_styles
+        self.caption = caption
+        if precision is None:
+            precision = get_option('display.precision')
+        self.precision = precision
+        self.table_attributes = table_attributes
+        # display_funcs maps (row, col) -> formatting function
+
+        def default_display_func(x):
+            if is_float(x):
+                return '{:>.{precision}g}'.format(x, precision=self.precision)
+            else:
+                return x
+
+        self._display_funcs = defaultdict(lambda: default_display_func)
+
+    def _repr_html_(self):
+        """Hooks into Jupyter notebook rich display system."""
+        return self.render()
+
+    @Appender(_shared_docs['to_excel'] % dict(
+        axes='index, columns', klass='Styler',
+        axes_single_arg="{0 or 'index', 1 or 'columns'}",
+        optional_by="""
+            by : str or list of str
+                Name or list of names which refer to the axis items.""",
+        versionadded_to_excel='\n    ..
versionadded:: 0.20')) + def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='', + float_format=None, columns=None, header=True, index=True, + index_label=None, startrow=0, startcol=0, engine=None, + merge_cells=True, encoding=None, inf_rep='inf', verbose=True, + freeze_panes=None): + + from pandas.io.formats.excel import ExcelFormatter + formatter = ExcelFormatter(self, na_rep=na_rep, cols=columns, + header=header, + float_format=float_format, index=index, + index_label=index_label, + merge_cells=merge_cells, + inf_rep=inf_rep) + formatter.write(excel_writer, sheet_name=sheet_name, startrow=startrow, + startcol=startcol, freeze_panes=freeze_panes, + engine=engine) + + def _translate(self): + """ + Convert the DataFrame in `self.data` and the attrs from `_build_styles` + into a dictionary of {head, body, uuid, cellstyle} + """ + table_styles = self.table_styles or [] + caption = self.caption + ctx = self.ctx + precision = self.precision + uuid = self.uuid or str(uuid1()).replace("-", "_") + ROW_HEADING_CLASS = "row_heading" + COL_HEADING_CLASS = "col_heading" + INDEX_NAME_CLASS = "index_name" + + DATA_CLASS = "data" + BLANK_CLASS = "blank" + BLANK_VALUE = "" + + def format_attr(pair): + return "{key}={value}".format(**pair) + + # for sparsifying a MultiIndex + idx_lengths = _get_level_lengths(self.index) + col_lengths = _get_level_lengths(self.columns) + + cell_context = dict() + + n_rlvls = self.data.index.nlevels + n_clvls = self.data.columns.nlevels + rlabels = self.data.index.tolist() + clabels = self.data.columns.tolist() + + if n_rlvls == 1: + rlabels = [[x] for x in rlabels] + if n_clvls == 1: + clabels = [[x] for x in clabels] + clabels = list(zip(*clabels)) + + cellstyle = [] + head = [] + + for r in range(n_clvls): + # Blank for Index columns... + row_es = [{"type": "th", + "value": BLANK_VALUE, + "display_value": BLANK_VALUE, + "is_visible": True, + "class": " ".join([BLANK_CLASS])}] * (n_rlvls - 1) + + # ... 
except maybe the last for columns.names + name = self.data.columns.names[r] + cs = [BLANK_CLASS if name is None else INDEX_NAME_CLASS, + "level%s" % r] + name = BLANK_VALUE if name is None else name + row_es.append({"type": "th", + "value": name, + "display_value": name, + "class": " ".join(cs), + "is_visible": True}) + + if clabels: + for c, value in enumerate(clabels[r]): + cs = [COL_HEADING_CLASS, "level%s" % r, "col%s" % c] + cs.extend(cell_context.get( + "col_headings", {}).get(r, {}).get(c, [])) + es = { + "type": "th", + "value": value, + "display_value": value, + "class": " ".join(cs), + "is_visible": _is_visible(c, r, col_lengths), + } + colspan = col_lengths.get((r, c), 0) + if colspan > 1: + es["attributes"] = [ + format_attr({"key": "colspan", "value": colspan}) + ] + row_es.append(es) + head.append(row_es) + + if self.data.index.names and not all(x is None + for x in self.data.index.names): + index_header_row = [] + + for c, name in enumerate(self.data.index.names): + cs = [INDEX_NAME_CLASS, + "level%s" % c] + name = '' if name is None else name + index_header_row.append({"type": "th", "value": name, + "class": " ".join(cs)}) + + index_header_row.extend( + [{"type": "th", + "value": BLANK_VALUE, + "class": " ".join([BLANK_CLASS]) + }] * len(clabels[0])) + + head.append(index_header_row) + + body = [] + for r, idx in enumerate(self.data.index): + row_es = [] + for c, value in enumerate(rlabels[r]): + rid = [ROW_HEADING_CLASS, "level%s" % c, "row%s" % r] + es = { + "type": "th", + "is_visible": _is_visible(r, c, idx_lengths), + "value": value, + "display_value": value, + "id": "_".join(rid[1:]), + "class": " ".join(rid) + } + rowspan = idx_lengths.get((c, r), 0) + if rowspan > 1: + es["attributes"] = [ + format_attr({"key": "rowspan", "value": rowspan}) + ] + row_es.append(es) + + for c, col in enumerate(self.data.columns): + cs = [DATA_CLASS, "row%s" % r, "col%s" % c] + cs.extend(cell_context.get("data", {}).get(r, {}).get(c, [])) + formatter = self._display_funcs[(r, c)] + value = self.data.iloc[r, c] + row_es.append({ + "type": "td", + "value": value, + "class": " ".join(cs), + "id": "_".join(cs[1:]), + "display_value": formatter(value) + }) + props = [] + for x in ctx[r, c]: + # have to handle empty styles like [''] + if x.count(":"): + props.append(x.split(":")) + else: + props.append(['', '']) + cellstyle.append({'props': props, + 'selector': "row%s_col%s" % (r, c)}) + body.append(row_es) + + return dict(head=head, cellstyle=cellstyle, body=body, uuid=uuid, + precision=precision, table_styles=table_styles, + caption=caption, table_attributes=self.table_attributes) + + def format(self, formatter, subset=None): + """ + Format the text display value of cells. + + .. versionadded:: 0.18.0 + + Parameters + ---------- + formatter: str, callable, or dict + subset: IndexSlice + An argument to ``DataFrame.loc`` that restricts which elements + ``formatter`` is applied to. + + Returns + ------- + self : Styler + + Notes + ----- + + ``formatter`` is either an ``a`` or a dict ``{column name: a}`` where + ``a`` is one of + + - str: this will be wrapped in: ``a.format(x)`` + - callable: called with the value of an individual cell + + The default display value for numeric values is the "general" (``g``) + format with ``pd.options.display.precision`` precision. 
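The wrapping rule in the Notes above (a string is wrapped into a `.format` call, a callable is used as-is) presumably mirrors the `_maybe_wrap_formatter` helper used in the method body; a rough sketch under that assumption:

```python
def maybe_wrap_formatter(formatter):
    # str -> a lambda performing formatter.format(x); callables pass through.
    if isinstance(formatter, str):
        return lambda x: formatter.format(x)
    if callable(formatter):
        return formatter
    raise TypeError("expected a str or a callable")

print(maybe_wrap_formatter("{:.2%}")(0.1234))   # '12.34%'
print(maybe_wrap_formatter(str.upper)('abc'))   # 'ABC'
```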
+
+        Examples
+        --------
+
+        >>> df = pd.DataFrame(np.random.randn(4, 2), columns=['a', 'b'])
+        >>> df.style.format("{:.2%}")
+        >>> df['c'] = ['a', 'b', 'c', 'd']
+        >>> df.style.format({'c': str.upper})
+        """
+        if subset is None:
+            row_locs = range(len(self.data))
+            col_locs = range(len(self.data.columns))
+        else:
+            subset = _non_reducing_slice(subset)
+            if len(subset) == 1:
+                subset = subset, self.data.columns
+
+            sub_df = self.data.loc[subset]
+            row_locs = self.data.index.get_indexer_for(sub_df.index)
+            col_locs = self.data.columns.get_indexer_for(sub_df.columns)
+
+        if isinstance(formatter, MutableMapping):
+            for col, col_formatter in formatter.items():
+                # formatter must be callable, so '{}' are converted to lambdas
+                col_formatter = _maybe_wrap_formatter(col_formatter)
+                col_num = self.data.columns.get_indexer_for([col])[0]
+
+                for row_num in row_locs:
+                    self._display_funcs[(row_num, col_num)] = col_formatter
+        else:
+            # single scalar to format all cells with
+            locs = product(*(row_locs, col_locs))
+            for i, j in locs:
+                formatter = _maybe_wrap_formatter(formatter)
+                self._display_funcs[(i, j)] = formatter
+        return self
+
+    def render(self, **kwargs):
+        r"""
+        Render the built-up styles to HTML
+
+        .. versionadded:: 0.17.1
+
+        Parameters
+        ----------
+        **kwargs:
+            Any additional keyword arguments are passed through
+            to ``self.template.render``. This is useful when you
+            need to provide additional variables for a custom
+            template.
+
+            .. versionadded:: 0.20
+
+        Returns
+        -------
+        rendered: str
+            the rendered HTML
+
+        Notes
+        -----
+        ``Styler`` objects have defined the ``_repr_html_`` method
+        which automatically calls ``self.render()`` when it's the
+        last item in a Notebook cell. When calling ``Styler.render()``
+        directly, wrap the result in ``IPython.display.HTML`` to view
+        the rendered HTML in the notebook.
+
+        Pandas uses the following keys in render. Arguments passed
+        in ``**kwargs`` take precedence, so think carefully if you want
+        to override them:
+
+        * head
+        * cellstyle
+        * body
+        * uuid
+        * precision
+        * table_styles
+        * caption
+        * table_attributes
+        """
+        self._compute()
+        # TODO: namespace all the pandas keys
+        d = self._translate()
+        # filter out empty styles, every cell will have a class
+        # but the list of props may just be [['', '']].
+        # so we have the nested anys below
+        trimmed = [x for x in d['cellstyle']
+                   if any(any(y) for y in x['props'])]
+        d['cellstyle'] = trimmed
+        d.update(kwargs)
+        return self.template.render(**d)
+
+    def _update_ctx(self, attrs):
+        """
+        update the state of the Styler. Collects a mapping
+        of {index_label: ['<property>: <value>']}
+
+        attrs: Series or DataFrame
+        should contain strings of '<property>: <value>;<prop2>: <val2>'
+        Whitespace shouldn't matter and the final trailing ';' shouldn't
+        matter.
+        """
+        for row_label, v in attrs.iterrows():
+            for col_label, col in v.iteritems():
+                i = self.index.get_indexer([row_label])[0]
+                j = self.columns.get_indexer([col_label])[0]
+                for pair in col.rstrip(";").split(";"):
+                    self.ctx[(i, j)].append(pair)
+
+    def _copy(self, deepcopy=False):
+        styler = Styler(self.data, precision=self.precision,
+                        caption=self.caption, uuid=self.uuid,
+                        table_styles=self.table_styles)
+        if deepcopy:
+            styler.ctx = copy.deepcopy(self.ctx)
+            styler._todo = copy.deepcopy(self._todo)
+        else:
+            styler.ctx = self.ctx
+            styler._todo = self._todo
+        return styler
+
+    def __copy__(self):
+        """
+        Shallow copy by default.
+        """
+        return self._copy(deepcopy=False)
+
+    def __deepcopy__(self, memo):
+        return self._copy(deepcopy=True)
+
+    def clear(self):
+        """"Reset" the styler, removing any previously applied styles.
+        Returns None.
+        """
+        self.ctx.clear()
+        self._todo = []
+
+    def _compute(self):
+        """
+        Execute the style functions built up in `self._todo`.
+
+        Relies on the conventions that all style functions go through
+        .apply or .applymap. They append styles to apply as tuples of
+
+        (application method, *args, **kwargs)
+        """
+        r = self
+        for func, args, kwargs in self._todo:
+            r = func(self)(*args, **kwargs)
+        return r
+
+    def _apply(self, func, axis=0, subset=None, **kwargs):
+        subset = slice(None) if subset is None else subset
+        subset = _non_reducing_slice(subset)
+        data = self.data.loc[subset]
+        if axis is not None:
+            result = data.apply(func, axis=axis, **kwargs)
+        else:
+            result = func(data, **kwargs)
+            if not isinstance(result, pd.DataFrame):
+                raise TypeError(
+                    "Function {!r} must return a DataFrame when "
+                    "passed to `Styler.apply` with axis=None".format(func))
+            if not (result.index.equals(data.index) and
+                    result.columns.equals(data.columns)):
+                msg = ('Result of {!r} must have identical index and columns '
+                       'as the input'.format(func))
+                raise ValueError(msg)
+
+        result_shape = result.shape
+        expected_shape = self.data.loc[subset].shape
+        if result_shape != expected_shape:
+            msg = ("Function {!r} returned the wrong shape.\n"
+                   "Result has shape: {}\n"
+                   "Expected shape: {}".format(func,
+                                               result.shape,
+                                               expected_shape))
+            raise ValueError(msg)
+        self._update_ctx(result)
+        return self
+
+    def apply(self, func, axis=0, subset=None, **kwargs):
+        """
+        Apply a function column-wise, row-wise, or table-wise,
+        updating the HTML representation with the result.
+
+        .. versionadded:: 0.17.1
+
+        Parameters
+        ----------
+        func : function
+            ``func`` should take a Series or DataFrame (depending
+            on ``axis``), and return an object with the same shape.
+            Must return a DataFrame with identical index and
+            column labels when ``axis=None``
+        axis : int, str or None
+            apply to each column (``axis=0`` or ``'index'``)
+            or to each row (``axis=1`` or ``'columns'``) or
+            to the entire DataFrame at once with ``axis=None``
+        subset : IndexSlice
+            a valid indexer to limit ``data`` to *before* applying the
+            function. Consider using a pandas.IndexSlice
+        kwargs : dict
+            pass along to ``func``
+
+        Returns
+        -------
+        self : Styler
+
+        Notes
+        -----
+        The output shape of ``func`` should match the input, i.e. if
+        ``x`` is the input row, column, or table (depending on ``axis``),
+        then ``func(x).shape == x.shape`` should be true.
+
+        This is similar to ``DataFrame.apply``, except that ``axis=None``
+        applies the function to the entire DataFrame at once,
+        rather than column-wise or row-wise.
+
+        Examples
+        --------
+        >>> def highlight_max(x):
+        ...     return ['background-color: yellow' if v == x.max() else ''
+        ...             for v in x]
+        ...
+        >>> df = pd.DataFrame(np.random.randn(5, 2))
+        >>> df.style.apply(highlight_max)
+        """
+        self._todo.append((lambda instance: getattr(instance, '_apply'),
+                           (func, axis, subset), kwargs))
+        return self
+
+    def _applymap(self, func, subset=None, **kwargs):
+        func = partial(func, **kwargs)  # applymap doesn't take kwargs?
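The `partial` binding on the line above works around `DataFrame.applymap` accepting no extra keyword arguments; the same trick in a standalone sketch:

```python
from functools import partial
import pandas as pd

def color_below(v, threshold=0):
    return 'color: red' if v < threshold else ''

df = pd.DataFrame({'a': [-1, 2]})
# Bind threshold first, then hand applymap a one-argument function.
print(df.applymap(partial(color_below, threshold=0)))
```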
+        if subset is None:
+            subset = pd.IndexSlice[:]
+        subset = _non_reducing_slice(subset)
+        result = self.data.loc[subset].applymap(func)
+        self._update_ctx(result)
+        return self
+
+    def applymap(self, func, subset=None, **kwargs):
+        """
+        Apply a function elementwise, updating the HTML
+        representation with the result.
+
+        .. versionadded:: 0.17.1
+
+        Parameters
+        ----------
+        func : function
+            ``func`` should take a scalar and return a scalar
+        subset : IndexSlice
+            a valid indexer to limit ``data`` to *before* applying the
+            function. Consider using a pandas.IndexSlice
+        kwargs : dict
+            pass along to ``func``
+
+        Returns
+        -------
+        self : Styler
+
+        """
+        self._todo.append((lambda instance: getattr(instance, '_applymap'),
+                           (func, subset), kwargs))
+        return self
+
+    def set_precision(self, precision):
+        """
+        Set the precision used to render.
+
+        .. versionadded:: 0.17.1
+
+        Parameters
+        ----------
+        precision: int
+
+        Returns
+        -------
+        self : Styler
+        """
+        self.precision = precision
+        return self
+
+    def set_table_attributes(self, attributes):
+        """
+        Set the table attributes. These are the items
+        that show up in the opening ``<table>`` tag in addition
+        to the automatic (by default) id.
+
+        .. versionadded:: 0.17.1
+
+        Parameters
+        ----------
+        attributes : string
+
+        Returns
+        -------
+        self : Styler
+
+        Examples
+        --------
+        >>> df = pd.DataFrame(np.random.randn(10, 4))
+        >>> df.style.set_table_attributes('class="pure-table"')
+        # ... <table class="pure-table"> ...
+        """
+        self.table_attributes = attributes
+        return self
+
+    def export(self):
+        """
+        Export the styles applied to the current Styler.
+        Can be applied to a second style with ``Styler.use``.
+
+        .. versionadded:: 0.17.1
+
+        Returns
+        -------
+        styles: list
+
+        See Also
+        --------
+        Styler.use
+        """
+        return self._todo
+
+    def use(self, styles):
+        """
+        Set the styles on the current Styler, possibly using styles
+        from ``Styler.export``.
+
+        .. versionadded:: 0.17.1
+
+        Parameters
+        ----------
+        styles: list
+            list of style functions
+
+        Returns
+        -------
+        self : Styler
+
+        See Also
+        --------
+        Styler.export
+        """
+        self._todo.extend(styles)
+        return self
+
+    def set_uuid(self, uuid):
+        """
+        Set the uuid for a Styler.
+
+        .. versionadded:: 0.17.1
+
+        Parameters
+        ----------
+        uuid: str
+
+        Returns
+        -------
+        self : Styler
+        """
+        self.uuid = uuid
+        return self
+
+    def set_caption(self, caption):
+        """
+        Set the caption on a Styler
+
+        .. versionadded:: 0.17.1
+
+        Parameters
+        ----------
+        caption: str
+
+        Returns
+        -------
+        self : Styler
+        """
+        self.caption = caption
+        return self
+
+    def set_table_styles(self, table_styles):
+        """
+        Set the table styles on a Styler. These are placed in a
+        ``<style>``
+{%- endblock style %}
+{%- block before_table %}{% endblock before_table %}
+{%- block table %}
+<table id="T_{{uuid}}" {% if table_attributes %}{{ table_attributes }}{% endif %}>
+{%- block caption %}
+{%- if caption -%}
+    <caption>{{caption}}</caption>
+{%- endif -%}
+{%- endblock caption %}
+{%- block thead %}
+<thead>
+    {%- block before_head_rows %}{% endblock %}
+    {%- for r in head %}
+    {%- block head_tr scoped %}
+    <tr>
+        {%- for c in r %}
+        {%- if c.is_visible != False %}
+        <{{ c.type }} class="{{c.class}}" {{ c.attributes|join(" ") }}>{{c.value}}
+        {%- endif %}
+        {%- endfor %}
+    </tr>
+    {%- endblock head_tr %}
+    {%- endfor %}
+    {%- block after_head_rows %}{% endblock %}
+</thead>
+{%- endblock thead %}
+{%- block tbody %}
+<tbody>
+    {%- block before_rows %}{%- endblock before_rows %}
+    {%- for r in body %}
+    {%- block tr scoped %}
+    <tr>
+        {%- for c in r %}
+        {%- if c.is_visible != False %}
+        <{{ c.type }} id="T_{{ uuid }}{{ c.id }}" class="{{ c.class }}" {{ c.attributes|join(" ") }}>{{ c.display_value }}
+        {%- endif %}
+        {%- endfor %}
+    </tr>
+    {%- endblock tr %}
+    {%- endfor %}
+    {%- block after_rows %}{%- endblock after_rows %}
+</tbody>
+{%- endblock tbody %}
+</table>
+{%- endblock table %} +{%- block after_table %}{% endblock after_table %} diff --git a/pandas/util/terminal.py b/pandas/io/formats/terminal.py similarity index 97% rename from pandas/util/terminal.py rename to pandas/io/formats/terminal.py index 6b8428ff75806..30bd1d16b538a 100644 --- a/pandas/util/terminal.py +++ b/pandas/io/formats/terminal.py @@ -14,6 +14,8 @@ from __future__ import print_function import os +import sys +import shutil __all__ = ['get_terminal_size'] @@ -26,6 +28,10 @@ def get_terminal_size(): IPython zmq frontends, or IDLE do not run in a terminal, """ import platform + + if sys.version_info[0] >= 3: + return shutil.get_terminal_size() + current_os = platform.system() tuple_xy = None if current_os == 'Windows': @@ -115,6 +121,7 @@ def ioctl_GWINSZ(fd): return None return int(cr[1]), int(cr[0]) + if __name__ == "__main__": sizex, sizey = get_terminal_size() print('width = %s height = %s' % (sizex, sizey)) diff --git a/pandas/io/gbq.py b/pandas/io/gbq.py index 966f53e9d75ef..b4dc9173f11ba 100644 --- a/pandas/io/gbq.py +++ b/pandas/io/gbq.py @@ -1,641 +1,24 @@ -import warnings -from datetime import datetime -import json -import logging -from time import sleep -import uuid -import time -import sys +""" Google BigQuery support """ -import numpy as np - -from distutils.version import StrictVersion -from pandas import compat -from pandas.core.api import DataFrame -from pandas.tools.merge import concat -from pandas.core.common import PandasError -from pandas.compat import lzip, bytes_to_str - - -def _check_google_client_version(): +def _try_import(): + # since pandas is a dependency of pandas-gbq + # we need to import on first use try: - import pkg_resources - + import pandas_gbq except ImportError: - raise ImportError('Could not import pkg_resources (setuptools).') - - if compat.PY3: - google_api_minimum_version = '1.4.1' - else: - google_api_minimum_version = '1.2.0' - - _GOOGLE_API_CLIENT_VERSION = pkg_resources.get_distribution( - 'google-api-python-client').version - - if (StrictVersion(_GOOGLE_API_CLIENT_VERSION) < - StrictVersion(google_api_minimum_version)): - raise ImportError("pandas requires google-api-python-client >= {0} " - "for Google BigQuery support, " - "current version {1}" - .format(google_api_minimum_version, - _GOOGLE_API_CLIENT_VERSION)) - - -def _test_google_api_imports(): - - try: - import httplib2 # noqa - try: - from googleapiclient.discovery import build # noqa - from googleapiclient.errors import HttpError # noqa - except: - from apiclient.discovery import build # noqa - from apiclient.errors import HttpError # noqa - from oauth2client.client import AccessTokenRefreshError # noqa - from oauth2client.client import OAuth2WebServerFlow # noqa - from oauth2client.file import Storage # noqa - from oauth2client.tools import run_flow, argparser # noqa - except ImportError as e: - raise ImportError("Missing module required for Google BigQuery " - "support: {0}".format(str(e))) - -logger = logging.getLogger('pandas.io.gbq') -logger.setLevel(logging.ERROR) - - -class InvalidPrivateKeyFormat(PandasError, ValueError): - """ - Raised when provided private key has invalid format. - """ - pass - - -class AccessDenied(PandasError, ValueError): - """ - Raised when invalid credentials are provided, or tokens have expired. 
- """ - pass - - -class DatasetCreationError(PandasError, ValueError): - """ - Raised when the create dataset method fails - """ - pass - - -class GenericGBQException(PandasError, ValueError): - """ - Raised when an unrecognized Google API Error occurs. - """ - pass - - -class InvalidColumnOrder(PandasError, ValueError): - """ - Raised when the provided column order for output - results DataFrame does not match the schema - returned by BigQuery. - """ - pass - - -class InvalidPageToken(PandasError, ValueError): - """ - Raised when Google BigQuery fails to return, - or returns a duplicate page token. - """ - pass - - -class InvalidSchema(PandasError, ValueError): - """ - Raised when the provided DataFrame does - not match the schema of the destination - table in BigQuery. - """ - pass - - -class NotFoundException(PandasError, ValueError): - """ - Raised when the project_id, table or dataset provided in the query could - not be found. - """ - pass - - -class StreamingInsertError(PandasError, ValueError): - """ - Raised when BigQuery reports a streaming insert error. - For more information see `Streaming Data Into BigQuery - `__ - """ - - -class TableCreationError(PandasError, ValueError): - """ - Raised when the create table method fails - """ - pass - - -class GbqConnector(object): - scope = 'https://www.googleapis.com/auth/bigquery' - - def __init__(self, project_id, reauth=False, verbose=False, - private_key=None, dialect='legacy'): - _check_google_client_version() - _test_google_api_imports() - self.project_id = project_id - self.reauth = reauth - self.verbose = verbose - self.private_key = private_key - self.dialect = dialect - self.credentials = self.get_credentials() - self.service = self.get_service() - - def get_credentials(self): - if self.private_key: - return self.get_service_account_credentials() - else: - # Try to retrieve Application Default Credentials - credentials = self.get_application_default_credentials() - if not credentials: - credentials = self.get_user_account_credentials() - return credentials - - def get_application_default_credentials(self): - """ - This method tries to retrieve the "default application credentials". - This could be useful for running code on Google Cloud Platform. - - .. versionadded:: 0.19.0 - - Parameters - ---------- - None - - Returns - ------- - - GoogleCredentials, - If the default application credentials can be retrieved - from the environment. The retrieved credentials should also - have access to the project (self.project_id) on BigQuery. - - OR None, - If default application credentials can not be retrieved - from the environment. Or, the retrieved credentials do not - have access to the project (self.project_id) on BigQuery. 
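The credential resolution described above is a three-step fallback: an explicit service-account key if given, then application-default credentials, then the interactive user flow. A minimal sketch with stand-in values; `try_application_default` here is a hypothetical placeholder, not a pandas or Google API:

```python
def try_application_default():
    # Stand-in: pretend no default credentials are available.
    return None

def get_credentials(private_key=None):
    # Fallback chain: explicit key, then environment, then user flow.
    if private_key:
        return 'service account credentials'
    return try_application_default() or 'user account credentials'

print(get_credentials())  # 'user account credentials'
```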
- """ - import httplib2 - try: - from googleapiclient.discovery import build - except ImportError: - from apiclient.discovery import build - try: - from oauth2client.client import GoogleCredentials - except ImportError: - return None - - try: - credentials = GoogleCredentials.get_application_default() - except: - return None - - http = httplib2.Http() - try: - http = credentials.authorize(http) - bigquery_service = build('bigquery', 'v2', http=http) - # Check if the application has rights to the BigQuery project - jobs = bigquery_service.jobs() - job_data = {'configuration': {'query': {'query': 'SELECT 1'}}} - jobs.insert(projectId=self.project_id, body=job_data).execute() - return credentials - except: - return None - - def get_user_account_credentials(self): - from oauth2client.client import OAuth2WebServerFlow - from oauth2client.file import Storage - from oauth2client.tools import run_flow, argparser - - flow = OAuth2WebServerFlow( - client_id=('495642085510-k0tmvj2m941jhre2nbqka17vqpjfddtd' - '.apps.googleusercontent.com'), - client_secret='kOc9wMptUtxkcIFbtZCcrEAc', - scope=self.scope, - redirect_uri='urn:ietf:wg:oauth:2.0:oob') - - storage = Storage('bigquery_credentials.dat') - credentials = storage.get() - - if credentials is None or credentials.invalid or self.reauth: - credentials = run_flow(flow, storage, argparser.parse_args([])) - - return credentials - - def get_service_account_credentials(self): - # Bug fix for https://github.com/pandas-dev/pandas/issues/12572 - # We need to know that a supported version of oauth2client is installed - # Test that either of the following is installed: - # - SignedJwtAssertionCredentials from oauth2client.client - # - ServiceAccountCredentials from oauth2client.service_account - # SignedJwtAssertionCredentials is available in oauthclient < 2.0.0 - # ServiceAccountCredentials is available in oauthclient >= 2.0.0 - oauth2client_v1 = True - oauth2client_v2 = True - - try: - from oauth2client.client import SignedJwtAssertionCredentials - except ImportError: - oauth2client_v1 = False - - try: - from oauth2client.service_account import ServiceAccountCredentials - except ImportError: - oauth2client_v2 = False - - if not oauth2client_v1 and not oauth2client_v2: - raise ImportError("Missing oauth2client required for BigQuery " - "service account support") - - from os.path import isfile - - try: - if isfile(self.private_key): - with open(self.private_key) as f: - json_key = json.loads(f.read()) - else: - # ugly hack: 'private_key' field has new lines inside, - # they break json parser, but we need to preserve them - json_key = json.loads(self.private_key.replace('\n', ' ')) - json_key['private_key'] = json_key['private_key'].replace( - ' ', '\n') - - if compat.PY3: - json_key['private_key'] = bytes( - json_key['private_key'], 'UTF-8') - - if oauth2client_v1: - return SignedJwtAssertionCredentials( - json_key['client_email'], - json_key['private_key'], - self.scope, - ) - else: - return ServiceAccountCredentials.from_json_keyfile_dict( - json_key, - self.scope) - except (KeyError, ValueError, TypeError, AttributeError): - raise InvalidPrivateKeyFormat( - "Private key is missing or invalid. It should be service " - "account private key JSON (file path or string contents) " - "with at least two keys: 'client_email' and 'private_key'. " - "Can be obtained from: https://console.developers.google." 
- "com/permissions/serviceaccounts") - - def _print(self, msg, end='\n'): - if self.verbose: - sys.stdout.write(msg + end) - sys.stdout.flush() - - def _start_timer(self): - self.start = time.time() - - def get_elapsed_seconds(self): - return round(time.time() - self.start, 2) - - def print_elapsed_seconds(self, prefix='Elapsed', postfix='s.', - overlong=7): - sec = self.get_elapsed_seconds() - if sec > overlong: - self._print('{} {} {}'.format(prefix, sec, postfix)) - - # http://stackoverflow.com/questions/1094841/reusable-library-to-get-human-readable-version-of-file-size - @staticmethod - def sizeof_fmt(num, suffix='b'): - fmt = "%3.1f %s%s" - for unit in ['', 'k', 'M', 'G', 'T', 'P', 'E', 'Z']: - if abs(num) < 1024.0: - return fmt % (num, unit, suffix) - num /= 1024.0 - return fmt % (num, 'Y', suffix) - - def get_service(self): - import httplib2 - try: - from googleapiclient.discovery import build - except: - from apiclient.discovery import build - - http = httplib2.Http() - http = self.credentials.authorize(http) - bigquery_service = build('bigquery', 'v2', http=http) - - return bigquery_service - - @staticmethod - def process_http_error(ex): - # See `BigQuery Troubleshooting Errors - # `__ - - status = json.loads(bytes_to_str(ex.content))['error'] - errors = status.get('errors', None) - - if errors: - for error in errors: - reason = error['reason'] - message = error['message'] - - raise GenericGBQException( - "Reason: {0}, Message: {1}".format(reason, message)) - - raise GenericGBQException(errors) - - def process_insert_errors(self, insert_errors): - for insert_error in insert_errors: - row = insert_error['index'] - errors = insert_error.get('errors', None) - for error in errors: - reason = error['reason'] - message = error['message'] - location = error['location'] - error_message = ('Error at Row: {0}, Reason: {1}, ' - 'Location: {2}, Message: {3}' - .format(row, reason, location, message)) - - # Report all error messages if verbose is set - if self.verbose: - self._print(error_message) - else: - raise StreamingInsertError(error_message + - '\nEnable verbose logging to ' - 'see all errors') - - raise StreamingInsertError - - def run_query(self, query, **kwargs): - try: - from googleapiclient.errors import HttpError - except: - from apiclient.errors import HttpError - from oauth2client.client import AccessTokenRefreshError - - _check_google_client_version() - - job_collection = self.service.jobs() - - job_config = { - 'query': { - 'query': query, - 'useLegacySql': self.dialect == 'legacy' - # 'allowLargeResults', 'createDisposition', - # 'preserveNulls', destinationTable, useQueryCache - } - } - config = kwargs.get('configuration') - if config is not None: - if len(config) != 1: - raise ValueError("Only one job type must be specified, but " - "given {}".format(','.join(config.keys()))) - if 'query' in config: - if 'query' in config['query'] and query is not None: - raise ValueError("Query statement can't be specified " - "inside config while it is specified " - "as parameter") - - job_config['query'].update(config['query']) - else: - raise ValueError("Only 'query' job type is supported") - - job_data = { - 'configuration': job_config - } - - self._start_timer() - try: - self._print('Requesting query... 
', end="") - query_reply = job_collection.insert( - projectId=self.project_id, body=job_data).execute() - self._print('ok.\nQuery running...') - except (AccessTokenRefreshError, ValueError): - if self.private_key: - raise AccessDenied( - "The service account credentials are not valid") - else: - raise AccessDenied( - "The credentials have been revoked or expired, " - "please re-run the application to re-authorize") - except HttpError as ex: - self.process_http_error(ex) - - job_reference = query_reply['jobReference'] - - while not query_reply.get('jobComplete', False): - self.print_elapsed_seconds(' Elapsed', 's. Waiting...') - try: - query_reply = job_collection.getQueryResults( - projectId=job_reference['projectId'], - jobId=job_reference['jobId']).execute() - except HttpError as ex: - self.process_http_error(ex) - - if self.verbose: - if query_reply['cacheHit']: - self._print('Query done.\nCache hit.\n') - else: - bytes_processed = int(query_reply.get( - 'totalBytesProcessed', '0')) - self._print('Query done.\nProcessed: {}\n'.format( - self.sizeof_fmt(bytes_processed))) - - self._print('Retrieving results...') - - total_rows = int(query_reply['totalRows']) - result_pages = list() - seen_page_tokens = list() - current_row = 0 - # Only read schema on first page - schema = query_reply['schema'] - - # Loop through each page of data - while 'rows' in query_reply and current_row < total_rows: - page = query_reply['rows'] - result_pages.append(page) - current_row += len(page) - - self.print_elapsed_seconds( - ' Got page: {}; {}% done. Elapsed'.format( - len(result_pages), - round(100.0 * current_row / total_rows))) - - if current_row == total_rows: - break - - page_token = query_reply.get('pageToken', None) - - if not page_token and current_row < total_rows: - raise InvalidPageToken("Required pageToken was missing. 
" - "Received {0} of {1} rows" - .format(current_row, total_rows)) - - elif page_token in seen_page_tokens: - raise InvalidPageToken("A duplicate pageToken was returned") - - seen_page_tokens.append(page_token) - - try: - query_reply = job_collection.getQueryResults( - projectId=job_reference['projectId'], - jobId=job_reference['jobId'], - pageToken=page_token).execute() - except HttpError as ex: - self.process_http_error(ex) - - if current_row < total_rows: - raise InvalidPageToken() - - # print basic query stats - self._print('Got {} rows.\n'.format(total_rows)) - - return schema, result_pages - - def load_data(self, dataframe, dataset_id, table_id, chunksize): - try: - from googleapiclient.errors import HttpError - except: - from apiclient.errors import HttpError - - job_id = uuid.uuid4().hex - rows = [] - remaining_rows = len(dataframe) - - total_rows = remaining_rows - self._print("\n\n") - - for index, row in dataframe.reset_index(drop=True).iterrows(): - row_dict = dict() - row_dict['json'] = json.loads(row.to_json(force_ascii=False, - date_unit='s', - date_format='iso')) - row_dict['insertId'] = job_id + str(index) - rows.append(row_dict) - remaining_rows -= 1 - - if (len(rows) % chunksize == 0) or (remaining_rows == 0): - self._print("\rStreaming Insert is {0}% Complete".format( - ((total_rows - remaining_rows) * 100) / total_rows)) - - body = {'rows': rows} - - try: - response = self.service.tabledata().insertAll( - projectId=self.project_id, - datasetId=dataset_id, - tableId=table_id, - body=body).execute() - except HttpError as ex: - self.process_http_error(ex) - - # For streaming inserts, even if you receive a success HTTP - # response code, you'll need to check the insertErrors property - # of the response to determine if the row insertions were - # successful, because it's possible that BigQuery was only - # partially successful at inserting the rows. See the `Success - # HTTP Response Codes - # `__ - # section - - insert_errors = response.get('insertErrors', None) - if insert_errors: - self.process_insert_errors(insert_errors) - - sleep(1) # Maintains the inserts "per second" rate per API - rows = [] - - self._print("\n") - - def verify_schema(self, dataset_id, table_id, schema): - try: - from googleapiclient.errors import HttpError - except: - from apiclient.errors import HttpError - - try: - remote_schema = self.service.tables().get( - projectId=self.project_id, - datasetId=dataset_id, - tableId=table_id).execute()['schema'] - - fields_remote = set([json.dumps(field_remote) - for field_remote in remote_schema['fields']]) - fields_local = set(json.dumps(field_local) - for field_local in schema['fields']) - - return fields_remote == fields_local - except HttpError as ex: - self.process_http_error(ex) - - def delete_and_recreate_table(self, dataset_id, table_id, table_schema): - delay = 0 - - # Changes to table schema may take up to 2 minutes as of May 2015 See - # `Issue 191 - # `__ - # Compare previous schema with new schema to determine if there should - # be a 120 second delay - if not self.verify_schema(dataset_id, table_id, table_schema): - self._print('The existing table has a different schema. Please ' - 'wait 2 minutes. 
See Google BigQuery issue #191') - delay = 120 + # give a nice error message + raise ImportError("Load data from Google BigQuery\n" + "\n" + "the pandas-gbq package is not installed\n" + "see the docs: https://pandas-gbq.readthedocs.io\n" + "\n" + "you can install via pip or conda:\n" + "pip install pandas-gbq\n" + "conda install pandas-gbq -c conda-forge\n") - table = _Table(self.project_id, dataset_id, - private_key=self.private_key) - table.delete(table_id) - table.create(table_id, table_schema) - sleep(delay) - - -def _parse_data(schema, rows): - # see: - # http://pandas.pydata.org/pandas-docs/dev/missing_data.html - # #missing-data-casting-rules-and-indexing - dtype_map = {'INTEGER': np.dtype(float), - 'FLOAT': np.dtype(float), - # This seems to be buggy without nanosecond indicator - 'TIMESTAMP': 'M8[ns]'} - - fields = schema['fields'] - col_types = [field['type'] for field in fields] - col_names = [str(field['name']) for field in fields] - col_dtypes = [dtype_map.get(field['type'], object) for field in fields] - page_array = np.zeros((len(rows),), - dtype=lzip(col_names, col_dtypes)) - - for row_num, raw_row in enumerate(rows): - entries = raw_row.get('f', []) - for col_num, field_type in enumerate(col_types): - field_value = _parse_entry(entries[col_num].get('v', ''), - field_type) - page_array[row_num][col_num] = field_value - - return DataFrame(page_array, columns=col_names) - - -def _parse_entry(field_value, field_type): - if field_value is None or field_value == 'null': - return None - if field_type == 'INTEGER' or field_type == 'FLOAT': - return float(field_value) - elif field_type == 'TIMESTAMP': - timestamp = datetime.utcfromtimestamp(float(field_value)) - return np.datetime64(timestamp) - elif field_type == 'BOOLEAN': - return field_value == 'true' - return field_value + return pandas_gbq def read_gbq(query, project_id=None, index_col=None, col_order=None, @@ -643,14 +26,12 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None, **kwargs): r"""Load data from Google BigQuery. - THIS IS AN EXPERIMENTAL LIBRARY - The main method a user calls to execute a Query in Google BigQuery and read results into a pandas DataFrame. Google BigQuery API Client Library v2 for Python is used. - Documentation is available at - https://developers.google.com/api-client-library/python/apis/bigquery/v2 + Documentation is available `here + `__ Authentication to the Google BigQuery service is via OAuth 2.0. @@ -658,8 +39,6 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None, By default "application default credentials" are used. - .. versionadded:: 0.19.0 - If default application credentials are not found or are restrictive, user account credentials are used. In this case, you will be asked to grant permissions for product name 'pandas GBQ'. @@ -689,8 +68,6 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None, or string contents. This is useful for remote server authentication (eg. jupyter iPython notebook on remote host) - .. versionadded:: 0.18.1 - dialect : {'legacy', 'standard'}, default 'legacy' 'legacy' : Use BigQuery's legacy SQL dialect. 'standard' : Use BigQuery's standard SQL (beta), which is @@ -698,8 +75,6 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None, see `BigQuery SQL Reference `__ - .. versionadded:: 0.19.0 - **kwargs : Arbitrary keyword arguments configuration (dict): query config parameters for job processing. 
For example: @@ -707,9 +82,7 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None, configuration = {'query': {'useQueryCache': False}} For more information see `BigQuery SQL Reference - ` - - .. versionadded:: 0.20.0 + `__ Returns ------- @@ -717,442 +90,20 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None, DataFrame representing results of query """ - - if not project_id: - raise TypeError("Missing required parameter: project_id") - - if dialect not in ('legacy', 'standard'): - raise ValueError("'{0}' is not valid for dialect".format(dialect)) - - connector = GbqConnector(project_id, reauth=reauth, verbose=verbose, - private_key=private_key, - dialect=dialect) - schema, pages = connector.run_query(query, **kwargs) - dataframe_list = [] - while len(pages) > 0: - page = pages.pop() - dataframe_list.append(_parse_data(schema, page)) - - if len(dataframe_list) > 0: - final_df = concat(dataframe_list, ignore_index=True) - else: - final_df = _parse_data(schema, []) - - # Reindex the DataFrame on the provided column - if index_col is not None: - if index_col in final_df.columns: - final_df.set_index(index_col, inplace=True) - else: - raise InvalidColumnOrder( - 'Index column "{0}" does not exist in DataFrame.' - .format(index_col) - ) - - # Change the order of columns in the DataFrame based on provided list - if col_order is not None: - if sorted(col_order) == sorted(final_df.columns): - final_df = final_df[col_order] - else: - raise InvalidColumnOrder( - 'Column order does not match this DataFrame.' - ) - - # Downcast floats to integers and objects to booleans - # if there are no NaN's. This is presently due to a - # limitation of numpy in handling missing data. - final_df._data = final_df._data.downcast(dtypes='infer') - - connector.print_elapsed_seconds( - 'Total time taken', - datetime.now().strftime('s.\nFinished at %Y-%m-%d %H:%M:%S.'), - 0 - ) - - return final_df + pandas_gbq = _try_import() + return pandas_gbq.read_gbq( + query, project_id=project_id, + index_col=index_col, col_order=col_order, + reauth=reauth, verbose=verbose, + private_key=private_key, + dialect=dialect, + **kwargs) def to_gbq(dataframe, destination_table, project_id, chunksize=10000, verbose=True, reauth=False, if_exists='fail', private_key=None): - """Write a DataFrame to a Google BigQuery table. - - THIS IS AN EXPERIMENTAL LIBRARY - - The main method a user calls to export pandas DataFrame contents to - Google BigQuery table. - - Google BigQuery API Client Library v2 for Python is used. - Documentation is available at - https://developers.google.com/api-client-library/python/apis/bigquery/v2 - - Authentication to the Google BigQuery service is via OAuth 2.0. - - - If "private_key" is not provided: - - By default "application default credentials" are used. - - .. versionadded:: 0.19.0 - - If default application credentials are not found or are restrictive, - user account credentials are used. In this case, you will be asked to - grant permissions for product name 'pandas GBQ'. - - - If "private_key" is provided: - - Service account credentials will be used to authenticate. - - Parameters - ---------- - dataframe : DataFrame - DataFrame to be written - destination_table : string - Name of table to be written, in the form 'dataset.tablename' - project_id : str - Google BigQuery Account project ID. - chunksize : int (default 10000) - Number of rows to be inserted in each chunk from the dataframe. 
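The `chunksize` parameter above drives a simple buffering loop in the removed `load_data`: rows are flushed every `chunksize` rows, plus one final partial flush. The bookkeeping in miniature:

```python
def iter_chunks(rows, chunksize):
    buf = []
    for r in rows:
        buf.append(r)
        if len(buf) == chunksize:
            yield buf
            buf = []
    if buf:
        yield buf  # final partial chunk

for chunk in iter_chunks(range(7), 3):
    print(chunk)  # [0, 1, 2]  [3, 4, 5]  [6]
```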
- verbose : boolean (default True) - Show percentage complete - reauth : boolean (default False) - Force Google BigQuery to reauthenticate the user. This is useful - if multiple accounts are used. - if_exists : {'fail', 'replace', 'append'}, default 'fail' - 'fail': If table exists, do nothing. - 'replace': If table exists, drop it, recreate it, and insert data. - 'append': If table exists, insert data. Create if does not exist. - private_key : str (optional) - Service account private key in JSON format. Can be file path - or string contents. This is useful for remote server - authentication (eg. jupyter iPython notebook on remote host) - """ - - if if_exists not in ('fail', 'replace', 'append'): - raise ValueError("'{0}' is not valid for if_exists".format(if_exists)) - - if '.' not in destination_table: - raise NotFoundException( - "Invalid Table Name. Should be of the form 'datasetId.tableId' ") - - connector = GbqConnector(project_id, reauth=reauth, verbose=verbose, - private_key=private_key) - dataset_id, table_id = destination_table.rsplit('.', 1) - - table = _Table(project_id, dataset_id, reauth=reauth, - private_key=private_key) - - table_schema = _generate_bq_schema(dataframe) - - # If table exists, check if_exists parameter - if table.exists(table_id): - if if_exists == 'fail': - raise TableCreationError("Could not create the table because it " - "already exists. " - "Change the if_exists parameter to " - "append or replace data.") - elif if_exists == 'replace': - connector.delete_and_recreate_table( - dataset_id, table_id, table_schema) - elif if_exists == 'append': - if not connector.verify_schema(dataset_id, table_id, table_schema): - raise InvalidSchema("Please verify that the structure and " - "data types in the DataFrame match the " - "schema of the destination table.") - else: - table.create(table_id, table_schema) - - connector.load_data(dataframe, dataset_id, table_id, chunksize) - - -def generate_bq_schema(df, default_type='STRING'): - # deprecation TimeSeries, #11121 - warnings.warn("generate_bq_schema is deprecated and will be removed in " - "a future version", FutureWarning, stacklevel=2) - - return _generate_bq_schema(df, default_type=default_type) - - -def _generate_bq_schema(df, default_type='STRING'): - """ Given a passed df, generate the associated Google BigQuery schema. - - Parameters - ---------- - df : DataFrame - default_type : string - The default big query type in case the type of the column - does not exist in the schema. - """ - - type_mapping = { - 'i': 'INTEGER', - 'b': 'BOOLEAN', - 'f': 'FLOAT', - 'O': 'STRING', - 'S': 'STRING', - 'U': 'STRING', - 'M': 'TIMESTAMP' - } - - fields = [] - for column_name, dtype in df.dtypes.iteritems(): - fields.append({'name': column_name, - 'type': type_mapping.get(dtype.kind, default_type)}) - - return {'fields': fields} - - -class _Table(GbqConnector): - - def __init__(self, project_id, dataset_id, reauth=False, verbose=False, - private_key=None): - try: - from googleapiclient.errors import HttpError - except: - from apiclient.errors import HttpError - self.http_error = HttpError - self.dataset_id = dataset_id - super(_Table, self).__init__(project_id, reauth, verbose, private_key) - - def exists(self, table_id): - """ Check if a table exists in Google BigQuery - - .. 
versionadded:: 0.17.0 - - Parameters - ---------- - table : str - Name of table to be verified - - Returns - ------- - boolean - true if table exists, otherwise false - """ - - try: - self.service.tables().get( - projectId=self.project_id, - datasetId=self.dataset_id, - tableId=table_id).execute() - return True - except self.http_error as ex: - if ex.resp.status == 404: - return False - else: - self.process_http_error(ex) - - def create(self, table_id, schema): - """ Create a table in Google BigQuery given a table and schema - - .. versionadded:: 0.17.0 - - Parameters - ---------- - table : str - Name of table to be written - schema : str - Use the generate_bq_schema to generate your table schema from a - dataframe. - """ - - if self.exists(table_id): - raise TableCreationError( - "The table could not be created because it already exists") - - if not _Dataset(self.project_id, - private_key=self.private_key).exists(self.dataset_id): - _Dataset(self.project_id, - private_key=self.private_key).create(self.dataset_id) - - body = { - 'schema': schema, - 'tableReference': { - 'tableId': table_id, - 'projectId': self.project_id, - 'datasetId': self.dataset_id - } - } - - try: - self.service.tables().insert( - projectId=self.project_id, - datasetId=self.dataset_id, - body=body).execute() - except self.http_error as ex: - self.process_http_error(ex) - - def delete(self, table_id): - """ Delete a table in Google BigQuery - - .. versionadded:: 0.17.0 - - Parameters - ---------- - table : str - Name of table to be deleted - """ - - if not self.exists(table_id): - raise NotFoundException("Table does not exist") - - try: - self.service.tables().delete( - datasetId=self.dataset_id, - projectId=self.project_id, - tableId=table_id).execute() - except self.http_error as ex: - self.process_http_error(ex) - - -class _Dataset(GbqConnector): - - def __init__(self, project_id, reauth=False, verbose=False, - private_key=None): - try: - from googleapiclient.errors import HttpError - except: - from apiclient.errors import HttpError - self.http_error = HttpError - super(_Dataset, self).__init__(project_id, reauth, verbose, - private_key) - - def exists(self, dataset_id): - """ Check if a dataset exists in Google BigQuery - - .. versionadded:: 0.17.0 - - Parameters - ---------- - dataset_id : str - Name of dataset to be verified - - Returns - ------- - boolean - true if dataset exists, otherwise false - """ - - try: - self.service.datasets().get( - projectId=self.project_id, - datasetId=dataset_id).execute() - return True - except self.http_error as ex: - if ex.resp.status == 404: - return False - else: - self.process_http_error(ex) - - def datasets(self): - """ Return a list of datasets in Google BigQuery - - .. versionadded:: 0.17.0 - - Parameters - ---------- - None - - Returns - ------- - list - List of datasets under the specific project - """ - - try: - list_dataset_response = self.service.datasets().list( - projectId=self.project_id).execute().get('datasets', None) - - if not list_dataset_response: - return [] - - dataset_list = list() - - for row_num, raw_row in enumerate(list_dataset_response): - dataset_list.append(raw_row['datasetReference']['datasetId']) - - return dataset_list - except self.http_error as ex: - self.process_http_error(ex) - - def create(self, dataset_id): - """ Create a dataset in Google BigQuery - - .. 
versionadded:: 0.17.0 - - Parameters - ---------- - dataset : str - Name of dataset to be written - """ - - if self.exists(dataset_id): - raise DatasetCreationError( - "The dataset could not be created because it already exists") - - body = { - 'datasetReference': { - 'projectId': self.project_id, - 'datasetId': dataset_id - } - } - - try: - self.service.datasets().insert( - projectId=self.project_id, - body=body).execute() - except self.http_error as ex: - self.process_http_error(ex) - - def delete(self, dataset_id): - """ Delete a dataset in Google BigQuery - - .. versionadded:: 0.17.0 - - Parameters - ---------- - dataset : str - Name of dataset to be deleted - """ - - if not self.exists(dataset_id): - raise NotFoundException( - "Dataset {0} does not exist".format(dataset_id)) - - try: - self.service.datasets().delete( - datasetId=dataset_id, - projectId=self.project_id).execute() - - except self.http_error as ex: - self.process_http_error(ex) - - def tables(self, dataset_id): - """ List tables in the specific dataset in Google BigQuery - - .. versionadded:: 0.17.0 - - Parameters - ---------- - dataset : str - Name of dataset to list tables for - - Returns - ------- - list - List of tables under the specific dataset - """ - - try: - list_table_response = self.service.tables().list( - projectId=self.project_id, - datasetId=dataset_id).execute().get('tables', None) - - if not list_table_response: - return [] - - table_list = list() - - for row_num, raw_row in enumerate(list_table_response): - table_list.append(raw_row['tableReference']['tableId']) - - return table_list - except self.http_error as ex: - self.process_http_error(ex) + pandas_gbq = _try_import() + pandas_gbq.to_gbq(dataframe, destination_table, project_id, + chunksize=chunksize, + verbose=verbose, reauth=reauth, + if_exists=if_exists, private_key=private_key) diff --git a/pandas/io/html.py b/pandas/io/html.py index 3c38dae91eb89..a4acb26af5259 100644 --- a/pandas/io/html.py +++ b/pandas/io/html.py @@ -12,15 +12,16 @@ import numpy as np -from pandas.types.common import is_list_like -from pandas.io.common import (EmptyDataError, _is_url, urlopen, +from pandas.core.dtypes.common import is_list_like +from pandas.errors import EmptyDataError +from pandas.io.common import (_is_url, urlopen, parse_url, _validate_header_arg) from pandas.io.parsers import TextParser from pandas.compat import (lrange, lmap, u, string_types, iteritems, raise_with_traceback, binary_type) from pandas import Series from pandas.core.common import AbstractMethodError -from pandas.formats.printing import pprint_thing +from pandas.io.formats.printing import pprint_thing _IMPORTS = False _HAS_BS4 = False @@ -36,8 +37,6 @@ def _importers(): if _IMPORTS: return - _IMPORTS = True - global _HAS_BS4, _HAS_LXML, _HAS_HTML5LIB try: @@ -58,6 +57,8 @@ def _importers(): except ImportError: pass + _IMPORTS = True + ############# # READ HTML # @@ -355,9 +356,12 @@ def _parse_raw_thead(self, table): thead = self._parse_thead(table) res = [] if thead: - res = lmap(self._text_getter, self._parse_th(thead[0])) - return np.atleast_1d( - np.array(res).squeeze()) if res and len(res) == 1 else res + trs = self._parse_tr(thead[0]) + for tr in trs: + cols = lmap(self._text_getter, self._parse_td(tr)) + if any([col != '' for col in cols]): + res.append(cols) + return res def _parse_raw_tfoot(self, table): tfoot = self._parse_tfoot(table) @@ -591,9 +595,17 @@ def _parse_tfoot(self, table): return table.xpath('.//tfoot') def _parse_raw_thead(self, table): - expr = './/thead//th' - 
return [_remove_whitespace(x.text_content()) for x in - table.xpath(expr)] + expr = './/thead' + thead = table.xpath(expr) + res = [] + if thead: + trs = self._parse_tr(thead[0]) + for tr in trs: + cols = [_remove_whitespace(x.text_content()) for x in + self._parse_td(tr)] + if any([col != '' for col in cols]): + res.append(cols) + return res def _parse_raw_tfoot(self, table): expr = './/tfoot//th|//tfoot//td' @@ -615,19 +627,17 @@ def _data_to_frame(**kwargs): head, body, foot = kwargs.pop('data') header = kwargs.pop('header') kwargs['skiprows'] = _get_skiprows(kwargs['skiprows']) - if head: - body = [head] + body - + rows = lrange(len(head)) + body = head + body if header is None: # special case when a table has elements - header = 0 + header = 0 if rows == [0] else rows if foot: body += [foot] # fill out elements of body that are "ragged" _expand_elements(body) - tp = TextParser(body, header=header, **kwargs) df = tp.read() return df @@ -853,7 +863,7 @@ def read_html(io, match='.+', flavor=None, header=None, index_col=None, Notes ----- Before using this function you should read the :ref:`gotchas about the - HTML parsing libraries `. + HTML parsing libraries `. Expect to do some cleanup after you call this function. For example, you might need to manually assign column names if the column names are diff --git a/pandas/io/json/__init__.py b/pandas/io/json/__init__.py new file mode 100644 index 0000000000000..32d110b3404a9 --- /dev/null +++ b/pandas/io/json/__init__.py @@ -0,0 +1,5 @@ +from .json import to_json, read_json, loads, dumps # noqa +from .normalize import json_normalize # noqa +from .table_schema import build_table_schema # noqa + +del json, normalize, table_schema # noqa diff --git a/pandas/io/json.py b/pandas/io/json/json.py similarity index 69% rename from pandas/io/json.py rename to pandas/io/json/json.py index 767a2212d92da..a1d48719ba9c0 100644 --- a/pandas/io/json.py +++ b/pandas/io/json/json.py @@ -1,21 +1,24 @@ # pylint: disable-msg=E1101,W0613,W0603 - import os -import copy -from collections import defaultdict import numpy as np -import pandas.json as _json -from pandas.tslib import iNaT +import pandas._libs.json as json +from pandas._libs.tslib import iNaT from pandas.compat import StringIO, long, u -from pandas import compat, isnull -from pandas import Series, DataFrame, to_datetime -from pandas.io.common import get_filepath_or_buffer, _get_handle +from pandas import compat, isna +from pandas import Series, DataFrame, to_datetime, MultiIndex +from pandas.io.common import (get_filepath_or_buffer, _get_handle, + _stringify_path) from pandas.core.common import AbstractMethodError -from pandas.formats.printing import pprint_thing +from pandas.io.formats.printing import pprint_thing +from .normalize import _convert_to_line_delimits +from .table_schema import build_table_schema +from pandas.core.dtypes.common import is_period_dtype + +loads = json.loads +dumps = json.dumps -loads = _json.loads -dumps = _json.dumps +TABLE_SCHEMA_VERSION = '0.20.0' # interface to/from @@ -23,23 +26,27 @@ def to_json(path_or_buf, obj, orient=None, date_format='epoch', double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False): + path_or_buf = _stringify_path(path_or_buf) if lines and orient != 'records': - raise ValueError( - "'lines' keyword only valid when 'orient' is records") - - if isinstance(obj, Series): - s = SeriesWriter( - obj, orient=orient, date_format=date_format, - double_precision=double_precision, ensure_ascii=force_ascii, - date_unit=date_unit, 
default_handler=default_handler).write()
+        raise ValueError(
+            "'lines' keyword only valid when 'orient' is records")
+
+    if orient == 'table' and isinstance(obj, Series):
+        obj = obj.to_frame(name=obj.name or 'values')
+    if orient == 'table' and isinstance(obj, DataFrame):
+        writer = JSONTableWriter
+    elif isinstance(obj, Series):
+        writer = SeriesWriter
     elif isinstance(obj, DataFrame):
-        s = FrameWriter(
-            obj, orient=orient, date_format=date_format,
-            double_precision=double_precision, ensure_ascii=force_ascii,
-            date_unit=date_unit, default_handler=default_handler).write()
+        writer = FrameWriter
     else:
         raise NotImplementedError("'obj' should be a Series or a DataFrame")
 
+    s = writer(
+        obj, orient=orient, date_format=date_format,
+        double_precision=double_precision, ensure_ascii=force_ascii,
+        date_unit=date_unit, default_handler=default_handler).write()
+
     if lines:
         s = _convert_to_line_delimits(s)
 
@@ -82,7 +89,8 @@ def write(self):
             ensure_ascii=self.ensure_ascii,
             date_unit=self.date_unit,
             iso_dates=self.date_format == 'iso',
-            default_handler=self.default_handler)
+            default_handler=self.default_handler
+        )
 
 
 class SeriesWriter(Writer):
@@ -109,6 +117,60 @@ def _format_axes(self):
                                  "'%s'." % self.orient)
 
 
+class JSONTableWriter(FrameWriter):
+    _default_orient = 'records'
+
+    def __init__(self, obj, orient, date_format, double_precision,
+                 ensure_ascii, date_unit, default_handler=None):
+        """
+        Adds a `schema` attribute with the Table Schema, resets
+        the index (this can't be done in the caller, because the schema
+        inference needs to know what the index is), forces orient to
+        records, and forces date_format to 'iso'.
+        """
+        super(JSONTableWriter, self).__init__(
+            obj, orient, date_format, double_precision, ensure_ascii,
+            date_unit, default_handler=default_handler)
+
+        if date_format != 'iso':
+            msg = ("Trying to write with `orient='table'` and "
+                   "`date_format='%s'`. Table Schema requires dates "
+                   "to be formatted with `date_format='iso'`" % date_format)
+            raise ValueError(msg)
+
+        self.schema = build_table_schema(obj)
+
+        # NotImplemented on a column MultiIndex
+        if obj.ndim == 2 and isinstance(obj.columns, MultiIndex):
+            raise NotImplementedError(
+                "orient='table' is not supported for MultiIndex")
+
+        # TODO: Do this timedelta properly in objToJSON.c See GH #15137
+        if ((obj.ndim == 1) and (obj.name in set(obj.index.names)) or
+                len(obj.columns & obj.index.names)):
+            msg = "Overlapping names between the index and columns"
+            raise ValueError(msg)
+
+        obj = obj.copy()
+        timedeltas = obj.select_dtypes(include=['timedelta']).columns
+        if len(timedeltas):
+            obj[timedeltas] = obj[timedeltas].applymap(
+                lambda x: x.isoformat())
+        # Convert PeriodIndex to datetimes before serializing
+        if is_period_dtype(obj.index):
+            obj.index = obj.index.to_timestamp()
+
+        self.obj = obj.reset_index()
+        self.date_format = 'iso'
+        self.orient = 'records'
+
+    def write(self):
+        data = super(JSONTableWriter, self).write()
+        serialized = '{{"schema": {}, "data": {}}}'.format(
+            dumps(self.schema), data)
+        return serialized
+
+
 def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True,
               convert_axes=True, convert_dates=True, keep_default_dates=True,
               numpy=False, precise_float=False, date_unit=None, encoding=None,
@@ -245,6 +307,17 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True,
       col 1 col 2
    0      a     b
    1      c     d
+
+    Encoding with Table Schema
+
+    >>> df.to_json(orient='table')
+    '{"schema": {"fields": [{"name": "index", "type": "string"},
+                            {"name": "col 1", "type": "string"},
+                            {"name": "col 2", "type": "string"}],
+                 "primaryKey": "index",
+                 "pandas_version": "0.20.0"},
+      "data": [{"index": "row 1", "col 1": "a", "col 2": "b"},
+               {"index": "row 2", "col 1": "c", "col 2": "d"}]}'
     """
 
     filepath_or_buffer, _, _ = get_filepath_or_buffer(path_or_buf,
@@ -462,7 +535,7 @@ def _try_convert_to_date(self, data):
 
         # ignore numbers that are out of range
         if issubclass(new_data.dtype.type, np.number):
-            in_range = (isnull(new_data.values) | (new_data > self.min_stamp) |
+            in_range = (isna(new_data.values) | (new_data > self.min_stamp) |
                         (new_data.values == iNaT))
             if not in_range.all():
                 return data, False
@@ -641,246 +714,3 @@ def is_ok(col):
             lambda col, c: self._try_convert_to_date(c),
             lambda col, c: ((self.keep_default_dates and is_ok(col)) or
                             col in convert_dates))
-
-# ---------------------------------------------------------------------
-# JSON normalization routines
-
-
-def _convert_to_line_delimits(s):
-    """Helper function that converts json lists to line delimited json."""
-
-    # Determine we have a JSON list to turn to lines otherwise just return the
-    # json object, only lists can
-    if not s[0] == '[' and s[-1] == ']':
-        return s
-    s = s[1:-1]
-
-    from pandas.lib import convert_json_to_lines
-    return convert_json_to_lines(s)
-
-
-def nested_to_record(ds, prefix="", level=0):
-    """a simplified json_normalize
-
-    converts a nested dict into a flat dict ("record"), unlike json_normalize,
-    it does not attempt to extract a subset of the data.
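The `JSONTableWriter` added above is what backs the new `orient='table'` option shown in the docstring example. A small sketch of the payload it produces (output abridged; the exact field list depends on the frame):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2]},
                  index=pd.Index(["x", "y"], name="idx"))

# orient='table' embeds a Table Schema next to the records; date_format
# must be 'iso' (the default for this orient), per the check above.
payload = df.to_json(orient="table")
# payload looks like:
# '{"schema": {"fields": [...], "primaryKey": ["idx"], ...},
#   "data": [{"idx": "x", "A": 1}, {"idx": "y", "A": 2}]}'
```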
- - Parameters - ---------- - ds : dict or list of dicts - prefix: the prefix, optional, default: "" - level: the number of levels in the jason string, optional, default: 0 - - Returns - ------- - d - dict or list of dicts, matching `ds` - - Examples - -------- - - IN[52]: nested_to_record(dict(flat1=1,dict1=dict(c=1,d=2), - nested=dict(e=dict(c=1,d=2),d=2))) - Out[52]: - {'dict1.c': 1, - 'dict1.d': 2, - 'flat1': 1, - 'nested.d': 2, - 'nested.e.c': 1, - 'nested.e.d': 2} - """ - singleton = False - if isinstance(ds, dict): - ds = [ds] - singleton = True - - new_ds = [] - for d in ds: - - new_d = copy.deepcopy(d) - for k, v in d.items(): - # each key gets renamed with prefix - if not isinstance(k, compat.string_types): - k = str(k) - if level == 0: - newkey = k - else: - newkey = prefix + '.' + k - - # only dicts gets recurse-flattend - # only at level>1 do we rename the rest of the keys - if not isinstance(v, dict): - if level != 0: # so we skip copying for top level, common case - v = new_d.pop(k) - new_d[newkey] = v - continue - else: - v = new_d.pop(k) - new_d.update(nested_to_record(v, newkey, level + 1)) - new_ds.append(new_d) - - if singleton: - return new_ds[0] - return new_ds - - -def json_normalize(data, record_path=None, meta=None, - meta_prefix=None, - record_prefix=None, - errors='raise'): - - """ - "Normalize" semi-structured JSON data into a flat table - - Parameters - ---------- - data : dict or list of dicts - Unserialized JSON objects - record_path : string or list of strings, default None - Path in each object to list of records. If not passed, data will be - assumed to be an array of records - meta : list of paths (string or list of strings), default None - Fields to use as metadata for each record in resulting table - record_prefix : string, default None - If True, prefix records with dotted (?) path, e.g. foo.bar.field if - path to records is ['foo', 'bar'] - meta_prefix : string, default None - errors : {'raise', 'ignore'}, default 'raise' - - * ignore : will ignore KeyError if keys listed in meta are not - always present - * raise : will raise KeyError if keys listed in meta are not - always present - - .. versionadded:: 0.20.0 - - Returns - ------- - frame : DataFrame - - Examples - -------- - - >>> data = [{'state': 'Florida', - ... 'shortname': 'FL', - ... 'info': { - ... 'governor': 'Rick Scott' - ... }, - ... 'counties': [{'name': 'Dade', 'population': 12345}, - ... {'name': 'Broward', 'population': 40000}, - ... {'name': 'Palm Beach', 'population': 60000}]}, - ... {'state': 'Ohio', - ... 'shortname': 'OH', - ... 'info': { - ... 'governor': 'John Kasich' - ... }, - ... 'counties': [{'name': 'Summit', 'population': 1234}, - ... {'name': 'Cuyahoga', 'population': 1337}]}] - >>> from pandas.io.json import json_normalize - >>> result = json_normalize(data, 'counties', ['state', 'shortname', - ... 
['info', 'governor']]) - >>> result - name population info.governor state shortname - 0 Dade 12345 Rick Scott Florida FL - 1 Broward 40000 Rick Scott Florida FL - 2 Palm Beach 60000 Rick Scott Florida FL - 3 Summit 1234 John Kasich Ohio OH - 4 Cuyahoga 1337 John Kasich Ohio OH - - """ - def _pull_field(js, spec): - result = js - if isinstance(spec, list): - for field in spec: - result = result[field] - else: - result = result[spec] - - return result - - # A bit of a hackjob - if isinstance(data, dict): - data = [data] - - if record_path is None: - if any([isinstance(x, dict) for x in compat.itervalues(data[0])]): - # naive normalization, this is idempotent for flat records - # and potentially will inflate the data considerably for - # deeply nested structures: - # {VeryLong: { b: 1,c:2}} -> {VeryLong.b:1 ,VeryLong.c:@} - # - # TODO: handle record value which are lists, at least error - # reasonably - data = nested_to_record(data) - return DataFrame(data) - elif not isinstance(record_path, list): - record_path = [record_path] - - if meta is None: - meta = [] - elif not isinstance(meta, list): - meta = [meta] - - for i, x in enumerate(meta): - if not isinstance(x, list): - meta[i] = [x] - - # Disastrously inefficient for now - records = [] - lengths = [] - - meta_vals = defaultdict(list) - meta_keys = ['.'.join(val) for val in meta] - - def _recursive_extract(data, path, seen_meta, level=0): - if len(path) > 1: - for obj in data: - for val, key in zip(meta, meta_keys): - if level + 1 == len(val): - seen_meta[key] = _pull_field(obj, val[-1]) - - _recursive_extract(obj[path[0]], path[1:], - seen_meta, level=level + 1) - else: - for obj in data: - recs = _pull_field(obj, path[0]) - - # For repeating the metadata later - lengths.append(len(recs)) - - for val, key in zip(meta, meta_keys): - if level + 1 > len(val): - meta_val = seen_meta[key] - else: - try: - meta_val = _pull_field(obj, val[level:]) - except KeyError as e: - if errors == 'ignore': - meta_val = np.nan - else: - raise \ - KeyError("Try running with " - "errors='ignore' as key " - "%s is not always present", e) - meta_vals[key].append(meta_val) - - records.extend(recs) - - _recursive_extract(data, record_path, {}, level=0) - - result = DataFrame(records) - - if record_prefix is not None: - result.rename(columns=lambda x: record_prefix + x, inplace=True) - - # Data types, a problem - for k, v in compat.iteritems(meta_vals): - if meta_prefix is not None: - k = meta_prefix + k - - if k in result: - raise ValueError('Conflicting metadata name %s, ' - 'need distinguishing prefix ' % k) - - result[k] = np.array(v).repeat(lengths) - - return result diff --git a/pandas/io/json/normalize.py b/pandas/io/json/normalize.py new file mode 100644 index 0000000000000..72776ed01de15 --- /dev/null +++ b/pandas/io/json/normalize.py @@ -0,0 +1,275 @@ +# --------------------------------------------------------------------- +# JSON normalization routines + +import copy +from collections import defaultdict +import numpy as np + +from pandas._libs.lib import convert_json_to_lines +from pandas import compat, DataFrame + + +def _convert_to_line_delimits(s): + """Helper function that converts json lists to line delimited json.""" + + # Determine we have a JSON list to turn to lines otherwise just return the + # json object, only lists can + if not s[0] == '[' and s[-1] == ']': + return s + s = s[1:-1] + + return convert_json_to_lines(s) + + +def nested_to_record(ds, prefix="", sep=".", level=0): + """a simplified json_normalize + + converts a nested dict into 
a flat dict ("record"), unlike json_normalize,
+    it does not attempt to extract a subset of the data.
+
+    Parameters
+    ----------
+    ds : dict or list of dicts
+    prefix: the prefix, optional, default: ""
+    sep : string, default '.'
+        Nested records will generate names separated by sep,
+        e.g., for sep='.', { 'foo' : { 'bar' : 0 } } -> foo.bar
+
+        .. versionadded:: 0.20.0
+
+    level: the number of levels in the json string, optional, default: 0
+
+    Returns
+    -------
+    d - dict or list of dicts, matching `ds`
+
+    Examples
+    --------
+
+    IN[52]: nested_to_record(dict(flat1=1,dict1=dict(c=1,d=2),
+                                  nested=dict(e=dict(c=1,d=2),d=2)))
+    Out[52]:
+    {'dict1.c': 1,
+     'dict1.d': 2,
+     'flat1': 1,
+     'nested.d': 2,
+     'nested.e.c': 1,
+     'nested.e.d': 2}
+    """
+    singleton = False
+    if isinstance(ds, dict):
+        ds = [ds]
+        singleton = True
+
+    new_ds = []
+    for d in ds:
+
+        new_d = copy.deepcopy(d)
+        for k, v in d.items():
+            # each key gets renamed with prefix
+            if not isinstance(k, compat.string_types):
+                k = str(k)
+            if level == 0:
+                newkey = k
+            else:
+                newkey = prefix + sep + k
+
+            # only dicts get recurse-flattened
+            # only at level>1 do we rename the rest of the keys
+            if not isinstance(v, dict):
+                if level != 0:  # so we skip copying for top level, common case
+                    v = new_d.pop(k)
+                    new_d[newkey] = v
+                continue
+            else:
+                v = new_d.pop(k)
+                new_d.update(nested_to_record(v, newkey, sep, level + 1))
+        new_ds.append(new_d)
+
+    if singleton:
+        return new_ds[0]
+    return new_ds
+
+
+def json_normalize(data, record_path=None, meta=None,
+                   meta_prefix=None,
+                   record_prefix=None,
+                   errors='raise',
+                   sep='.'):
+    """
+    "Normalize" semi-structured JSON data into a flat table
+
+    Parameters
+    ----------
+    data : dict or list of dicts
+        Unserialized JSON objects
+    record_path : string or list of strings, default None
+        Path in each object to list of records. If not passed, data will be
+        assumed to be an array of records
+    meta : list of paths (string or list of strings), default None
+        Fields to use as metadata for each record in resulting table
+    record_prefix : string, default None
+        If not None, prefix records with the dotted path, e.g. foo.bar.field
+        if path to records is ['foo', 'bar']
+    meta_prefix : string, default None
+    errors : {'raise', 'ignore'}, default 'raise'
+
+        * 'ignore' : will ignore KeyError if keys listed in meta are not
+          always present
+        * 'raise' : will raise KeyError if keys listed in meta are not
+          always present
+
+        .. versionadded:: 0.20.0
+
+    sep : string, default '.'
+        Nested records will generate names separated by sep,
+        e.g., for sep='.', { 'foo' : { 'bar' : 0 } } -> foo.bar
+
+        .. versionadded:: 0.20.0
+
+
+    Returns
+    -------
+    frame : DataFrame
+
+    Examples
+    --------
+
+    >>> from pandas.io.json import json_normalize
+    >>> data = [{'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk'}},
+    ...         {'name': {'given': 'Mose', 'family': 'Regner'}},
+    ...         {'id': 2, 'name': 'Faye Raker'}]
+    >>> json_normalize(data)
+        id        name name.family name.first name.given name.last
+    0  1.0         NaN         NaN     Coleen        NaN      Volk
+    1  NaN         NaN      Regner        NaN       Mose       NaN
+    2  2.0  Faye Raker         NaN        NaN        NaN       NaN
+
+    >>> data = [{'state': 'Florida',
+    ...          'shortname': 'FL',
+    ...          'info': {
+    ...               'governor': 'Rick Scott'
+    ...          },
+    ...          'counties': [{'name': 'Dade', 'population': 12345},
+    ...                       {'name': 'Broward', 'population': 40000},
+    ...                       {'name': 'Palm Beach', 'population': 60000}]},
+    ...         {'state': 'Ohio',
+    ...          'shortname': 'OH',
+    ...          'info': {
+    ...               'governor': 'John Kasich'
+    ...          },
+    ...
'counties': [{'name': 'Summit', 'population': 1234}, + ... {'name': 'Cuyahoga', 'population': 1337}]}] + >>> result = json_normalize(data, 'counties', ['state', 'shortname', + ... ['info', 'governor']]) + >>> result + name population info.governor state shortname + 0 Dade 12345 Rick Scott Florida FL + 1 Broward 40000 Rick Scott Florida FL + 2 Palm Beach 60000 Rick Scott Florida FL + 3 Summit 1234 John Kasich Ohio OH + 4 Cuyahoga 1337 John Kasich Ohio OH + + """ + def _pull_field(js, spec): + result = js + if isinstance(spec, list): + for field in spec: + result = result[field] + else: + result = result[spec] + + return result + + if isinstance(data, list) and len(data) is 0: + return DataFrame() + + # A bit of a hackjob + if isinstance(data, dict): + data = [data] + + if record_path is None: + if any([isinstance(x, dict) for x in compat.itervalues(data[0])]): + # naive normalization, this is idempotent for flat records + # and potentially will inflate the data considerably for + # deeply nested structures: + # {VeryLong: { b: 1,c:2}} -> {VeryLong.b:1 ,VeryLong.c:@} + # + # TODO: handle record value which are lists, at least error + # reasonably + data = nested_to_record(data, sep=sep) + return DataFrame(data) + elif not isinstance(record_path, list): + record_path = [record_path] + + if meta is None: + meta = [] + elif not isinstance(meta, list): + meta = [meta] + + for i, x in enumerate(meta): + if not isinstance(x, list): + meta[i] = [x] + + # Disastrously inefficient for now + records = [] + lengths = [] + + meta_vals = defaultdict(list) + if not isinstance(sep, compat.string_types): + sep = str(sep) + meta_keys = [sep.join(val) for val in meta] + + def _recursive_extract(data, path, seen_meta, level=0): + if len(path) > 1: + for obj in data: + for val, key in zip(meta, meta_keys): + if level + 1 == len(val): + seen_meta[key] = _pull_field(obj, val[-1]) + + _recursive_extract(obj[path[0]], path[1:], + seen_meta, level=level + 1) + else: + for obj in data: + recs = _pull_field(obj, path[0]) + + # For repeating the metadata later + lengths.append(len(recs)) + + for val, key in zip(meta, meta_keys): + if level + 1 > len(val): + meta_val = seen_meta[key] + else: + try: + meta_val = _pull_field(obj, val[level:]) + except KeyError as e: + if errors == 'ignore': + meta_val = np.nan + else: + raise \ + KeyError("Try running with " + "errors='ignore' as key " + "%s is not always present", e) + meta_vals[key].append(meta_val) + + records.extend(recs) + + _recursive_extract(data, record_path, {}, level=0) + + result = DataFrame(records) + + if record_prefix is not None: + result.rename(columns=lambda x: record_prefix + x, inplace=True) + + # Data types, a problem + for k, v in compat.iteritems(meta_vals): + if meta_prefix is not None: + k = meta_prefix + k + + if k in result: + raise ValueError('Conflicting metadata name %s, ' + 'need distinguishing prefix ' % k) + + result[k] = np.array(v).repeat(lengths) + + return result diff --git a/pandas/io/json/table_schema.py b/pandas/io/json/table_schema.py new file mode 100644 index 0000000000000..c3865afa9c0c0 --- /dev/null +++ b/pandas/io/json/table_schema.py @@ -0,0 +1,181 @@ +""" +Table Schema builders + +http://specs.frictionlessdata.io/json-table-schema/ +""" +from pandas.core.dtypes.common import ( + is_integer_dtype, is_timedelta64_dtype, is_numeric_dtype, + is_bool_dtype, is_datetime64_dtype, is_datetime64tz_dtype, + is_categorical_dtype, is_period_dtype, is_string_dtype +) + + +def as_json_table_type(x): + """ + Convert a NumPy / pandas type to 
its corresponding Table Schema type.
+
+    Parameters
+    ----------
+    x : array or dtype
+
+    Returns
+    -------
+    t : str
+        the Table Schema data types
+
+    Notes
+    -----
+    This table shows the relationship between NumPy / pandas dtypes,
+    and Table Schema dtypes.
+
+    ===============  =================
+    Pandas type      Table Schema type
+    ===============  =================
+    int64            integer
+    float64          number
+    bool             boolean
+    datetime64[ns]   datetime
+    timedelta64[ns]  duration
+    object           str
+    categorical      any
+    ===============  =================
+    """
+    if is_integer_dtype(x):
+        return 'integer'
+    elif is_bool_dtype(x):
+        return 'boolean'
+    elif is_numeric_dtype(x):
+        return 'number'
+    elif (is_datetime64_dtype(x) or is_datetime64tz_dtype(x) or
+          is_period_dtype(x)):
+        return 'datetime'
+    elif is_timedelta64_dtype(x):
+        return 'duration'
+    elif is_categorical_dtype(x):
+        return 'any'
+    elif is_string_dtype(x):
+        return 'string'
+    else:
+        return 'any'
+
+
+def set_default_names(data):
+    """Sets index names to 'index' for regular, or 'level_x' for Multi"""
+    if all(name is not None for name in data.index.names):
+        return data
+
+    data = data.copy()
+    if data.index.nlevels > 1:
+        names = [name if name is not None else 'level_{}'.format(i)
+                 for i, name in enumerate(data.index.names)]
+        data.index.names = names
+    else:
+        data.index.name = data.index.name or 'index'
+    return data
+
+
+def make_field(arr, dtype=None):
+    dtype = dtype or arr.dtype
+    if arr.name is None:
+        name = 'values'
+    else:
+        name = arr.name
+    field = {'name': name,
+             'type': as_json_table_type(dtype)}
+
+    if is_categorical_dtype(arr):
+        if hasattr(arr, 'categories'):
+            cats = arr.categories
+            ordered = arr.ordered
+        else:
+            cats = arr.cat.categories
+            ordered = arr.cat.ordered
+        field['constraints'] = {"enum": list(cats)}
+        field['ordered'] = ordered
+    elif is_period_dtype(arr):
+        field['freq'] = arr.freqstr
+    elif is_datetime64tz_dtype(arr):
+        if hasattr(arr, 'dt'):
+            field['tz'] = arr.dt.tz.zone
+        else:
+            field['tz'] = arr.tz.zone
+    return field
+
+
+def build_table_schema(data, index=True, primary_key=None, version=True):
+    """
+    Create a Table schema from ``data``.
+
+    Parameters
+    ----------
+    data : Series, DataFrame
+    index : bool, default True
+        Whether to include ``data.index`` in the schema.
+    primary_key : bool or None, default None
+        column names to designate as the primary key.
+        The default `None` will set `'primaryKey'` to the index
+        level or levels if the index is unique.
+    version : bool, default True
+        Whether to include a field `pandas_version` with the version
+        of pandas that generated the schema.
+
+    Returns
+    -------
+    schema : dict
+
+    Examples
+    --------
+    >>> df = pd.DataFrame(
+    ...     {'A': [1, 2, 3],
+    ...      'B': ['a', 'b', 'c'],
+    ...      'C': pd.date_range('2016-01-01', freq='d', periods=3),
+    ...     }, index=pd.Index(range(3), name='idx'))
+    >>> build_table_schema(df)
+    {'fields': [{'name': 'idx', 'type': 'integer'},
+                {'name': 'A', 'type': 'integer'},
+                {'name': 'B', 'type': 'string'},
+                {'name': 'C', 'type': 'datetime'}],
+     'pandas_version': '0.20.0',
+     'primaryKey': ['idx']}
+
+    Notes
+    -----
+    See `as_json_table_type` for conversion types.
+    Timedeltas are converted to ISO8601 duration format, with
+    9 decimal places after the seconds field, for nanosecond precision.
+
+    Categoricals are converted to the `any` dtype, and use the `enum` field
+    constraint to list the allowed values. The `ordered` attribute is included
+    in an `ordered` field.
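To make the dtype mapping above concrete, a short sketch against the new `pandas.io.json.table_schema` module as laid out in this patch (return values shown as comments):

```python
import pandas as pd
from pandas.io.json.table_schema import as_json_table_type, build_table_schema

# Dtype -> Table Schema type, per the table in as_json_table_type.
as_json_table_type(pd.Series([1, 2]).dtype)     # 'integer'
as_json_table_type(pd.Series([1.5]).dtype)      # 'number'
as_json_table_type(pd.Series(['a']).dtype)      # 'string'

# An unnamed, unique index is defaulted to 'index' and becomes the
# primaryKey, per set_default_names and the logic below.
schema = build_table_schema(pd.DataFrame({"A": [1, 2]}))
schema["primaryKey"]                            # ['index']
```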
+ """ + if index is True: + data = set_default_names(data) + + schema = {} + fields = [] + + if index: + if data.index.nlevels > 1: + for level in data.index.levels: + fields.append(make_field(level)) + else: + fields.append(make_field(data.index)) + + if data.ndim > 1: + for column, s in data.iteritems(): + fields.append(make_field(s)) + else: + fields.append(make_field(data)) + + schema['fields'] = fields + if index and data.index.is_unique and primary_key is None: + if data.index.nlevels == 1: + schema['primaryKey'] = [data.index.name] + else: + schema['primaryKey'] = data.index.names + elif primary_key is not None: + schema['primaryKey'] = primary_key + + if version: + schema['pandas_version'] = '0.20.0' + return schema diff --git a/pandas/msgpack/__init__.py b/pandas/io/msgpack/__init__.py similarity index 81% rename from pandas/msgpack/__init__.py rename to pandas/io/msgpack/__init__.py index 33d60a12ef0a3..984e90ee03e69 100644 --- a/pandas/msgpack/__init__.py +++ b/pandas/io/msgpack/__init__.py @@ -2,8 +2,8 @@ from collections import namedtuple -from pandas.msgpack.exceptions import * # noqa -from pandas.msgpack._version import version # noqa +from pandas.io.msgpack.exceptions import * # noqa +from pandas.io.msgpack._version import version # noqa class ExtType(namedtuple('ExtType', 'code data')): @@ -19,8 +19,8 @@ def __new__(cls, code, data): import os # noqa -from pandas.msgpack._packer import Packer # noqa -from pandas.msgpack._unpacker import unpack, unpackb, Unpacker # noqa +from pandas.io.msgpack._packer import Packer # noqa +from pandas.io.msgpack._unpacker import unpack, unpackb, Unpacker # noqa def pack(o, stream, **kwargs): @@ -41,6 +41,7 @@ def packb(o, **kwargs): """ return Packer(**kwargs).pack(o) + # alias for compatibility to simplejson/marshal/pickle. 
load = unpack loads = unpackb diff --git a/pandas/msgpack/_packer.pyx b/pandas/io/msgpack/_packer.pyx similarity index 98% rename from pandas/msgpack/_packer.pyx rename to pandas/io/msgpack/_packer.pyx index 008dbe5541d50..ad7ce1fb2531a 100644 --- a/pandas/msgpack/_packer.pyx +++ b/pandas/io/msgpack/_packer.pyx @@ -6,11 +6,11 @@ from libc.stdlib cimport * from libc.string cimport * from libc.limits cimport * -from pandas.msgpack.exceptions import PackValueError -from pandas.msgpack import ExtType +from pandas.io.msgpack.exceptions import PackValueError +from pandas.io.msgpack import ExtType -cdef extern from "../src/msgpack/pack.h": +cdef extern from "../../src/msgpack/pack.h": struct msgpack_packer: char* buf size_t length diff --git a/pandas/msgpack/_unpacker.pyx b/pandas/io/msgpack/_unpacker.pyx similarity index 98% rename from pandas/msgpack/_unpacker.pyx rename to pandas/io/msgpack/_unpacker.pyx index 6f23a24adde6c..504bfed48df3c 100644 --- a/pandas/msgpack/_unpacker.pyx +++ b/pandas/io/msgpack/_unpacker.pyx @@ -11,12 +11,12 @@ from libc.stdlib cimport * from libc.string cimport * from libc.limits cimport * -from pandas.msgpack.exceptions import (BufferFull, OutOfData, - UnpackValueError, ExtraData) -from pandas.msgpack import ExtType +from pandas.io.msgpack.exceptions import (BufferFull, OutOfData, + UnpackValueError, ExtraData) +from pandas.io.msgpack import ExtType -cdef extern from "../src/msgpack/unpack.h": +cdef extern from "../../src/msgpack/unpack.h": ctypedef struct msgpack_user: bint use_list PyObject* object_hook diff --git a/pandas/msgpack/_version.py b/pandas/io/msgpack/_version.py similarity index 100% rename from pandas/msgpack/_version.py rename to pandas/io/msgpack/_version.py diff --git a/pandas/msgpack/exceptions.py b/pandas/io/msgpack/exceptions.py similarity index 99% rename from pandas/msgpack/exceptions.py rename to pandas/io/msgpack/exceptions.py index 40f5a8af8f583..ae0f74a6700bd 100644 --- a/pandas/msgpack/exceptions.py +++ b/pandas/io/msgpack/exceptions.py @@ -15,6 +15,7 @@ class UnpackValueError(UnpackException, ValueError): class ExtraData(ValueError): + def __init__(self, unpacked, extra): self.unpacked = unpacked self.extra = extra diff --git a/pandas/io/packers.py b/pandas/io/packers.py index ab44e46c96b77..a2fc4db23700c 100644 --- a/pandas/io/packers.py +++ b/pandas/io/packers.py @@ -48,23 +48,24 @@ from pandas import compat from pandas.compat import u, u_safe -from pandas.types.common import (is_categorical_dtype, is_object_dtype, - needs_i8_conversion, pandas_dtype) +from pandas.core.dtypes.common import ( + is_categorical_dtype, is_object_dtype, + needs_i8_conversion, pandas_dtype) from pandas import (Timestamp, Period, Series, DataFrame, # noqa Index, MultiIndex, Float64Index, Int64Index, Panel, RangeIndex, PeriodIndex, DatetimeIndex, NaT, - Categorical) -from pandas.tslib import NaTType -from pandas.sparse.api import SparseSeries, SparseDataFrame -from pandas.sparse.array import BlockIndex, IntIndex + Categorical, CategoricalIndex) +from pandas._libs.tslib import NaTType +from pandas.core.sparse.api import SparseSeries, SparseDataFrame +from pandas.core.sparse.array import BlockIndex, IntIndex from pandas.core.generic import NDFrame -from pandas.core.common import PerformanceWarning -from pandas.io.common import get_filepath_or_buffer +from pandas.errors import PerformanceWarning +from pandas.io.common import get_filepath_or_buffer, _stringify_path from pandas.core.internals import BlockManager, make_block, _safe_reshape import 
pandas.core.internals as internals
 
-from pandas.msgpack import Unpacker as _Unpacker, Packer as _Packer, ExtType
+from pandas.io.msgpack import Unpacker as _Unpacker, Packer as _Packer, ExtType
 from pandas.util._move import (
     BadMove as _BadMove,
     move_into_mutable_buffer as _move_into_mutable_buffer,
@@ -148,6 +149,7 @@ def writer(fh):
         for a in args:
             fh.write(pack(a, **kwargs))
 
+    path_or_buf = _stringify_path(path_or_buf)
     if isinstance(path_or_buf, compat.string_types):
         with open(path_or_buf, mode) as fh:
             writer(fh)
@@ -217,6 +219,7 @@ def read(fh):
     raise ValueError('path_or_buf needs to be a string file path or file-like')
 
+
 dtype_dict = {21: np.dtype('M8[ns]'),
               u('datetime64[ns]'): np.dtype('M8[ns]'),
               u('datetime64[us]'): np.dtype('M8[us]'),
@@ -237,6 +240,7 @@ def dtype_for(t):
         return dtype_dict[t]
     return np.typeDict.get(t, t)
 
+
 c2f_dict = {'complex': np.float64,
             'complex128': np.float64,
             'complex64': np.float32}
@@ -571,7 +575,7 @@ def decode(obj):
     elif typ == u'period_index':
         data = unconvert(obj[u'data'], np.int64, obj.get(u'compress'))
         d = dict(name=obj[u'name'], freq=obj[u'freq'])
-        return globals()[obj[u'klass']](data, **d)
+        return globals()[obj[u'klass']]._from_ordinals(data, **d)
     elif typ == u'datetime_index':
         data = unconvert(obj[u'data'], np.int64, obj.get(u'compress'))
         d = dict(name=obj[u'name'], freq=obj[u'freq'], verify_integrity=False)
@@ -587,8 +591,7 @@ def decode(obj):
         from_codes = globals()[obj[u'klass']].from_codes
         return from_codes(codes=obj[u'codes'],
                           categories=obj[u'categories'],
-                          ordered=obj[u'ordered'],
-                          name=obj[u'name'])
+                          ordered=obj[u'ordered'])
 
     elif typ == u'series':
         dtype = dtype_for(obj[u'dtype'])
diff --git a/pandas/io/parquet.py b/pandas/io/parquet.py
new file mode 100644
index 0000000000000..09603fd6fdcce
--- /dev/null
+++ b/pandas/io/parquet.py
@@ -0,0 +1,194 @@
+""" parquet compat """
+
+from warnings import catch_warnings
+from distutils.version import LooseVersion
+from pandas import DataFrame, RangeIndex, Int64Index, get_option
+from pandas.compat import range
+from pandas.io.common import get_filepath_or_buffer
+
+
+def get_engine(engine):
+    """ return our implementation """
+
+    if engine == 'auto':
+        engine = get_option('io.parquet.engine')
+
+    if engine == 'auto':
+        # try engines in this order
+        try:
+            return PyArrowImpl()
+        except ImportError:
+            pass
+
+        try:
+            return FastParquetImpl()
+        except ImportError:
+            pass
+
+    if engine not in ['pyarrow', 'fastparquet']:
+        raise ValueError("engine must be one of 'pyarrow', 'fastparquet'")
+
+    if engine == 'pyarrow':
+        return PyArrowImpl()
+    elif engine == 'fastparquet':
+        return FastParquetImpl()
+
+
+class PyArrowImpl(object):
+
+    def __init__(self):
+        # since pandas is a dependency of pyarrow
+        # we need to import on first use
+
+        try:
+            import pyarrow
+            import pyarrow.parquet
+        except ImportError:
+            raise ImportError("pyarrow is required for parquet support\n\n"
+                              "you can install via conda\n"
+                              "conda install pyarrow -c conda-forge\n"
+                              "\nor via pip\n"
+                              "pip install -U pyarrow\n")
+
+        if LooseVersion(pyarrow.__version__) < '0.4.1':
+            raise ImportError("pyarrow >= 0.4.1 is required for parquet "
+                              "support\n\n"
+                              "you can install via conda\n"
+                              "conda install pyarrow -c conda-forge\n"
+                              "\nor via pip\n"
+                              "pip install -U pyarrow\n")
+
+        self.api = pyarrow
+
+    def write(self, df, path, compression='snappy', **kwargs):
+        path, _, _ = get_filepath_or_buffer(path)
+        table = self.api.Table.from_pandas(df, timestamps_to_ms=True)
+        self.api.parquet.write_table(
+            table, path, compression=compression,
**kwargs)
+
+    def read(self, path):
+        path, _, _ = get_filepath_or_buffer(path)
+        return self.api.parquet.read_table(path).to_pandas()
+
+
+class FastParquetImpl(object):
+
+    def __init__(self):
+        # since pandas is a dependency of fastparquet
+        # we need to import on first use
+
+        try:
+            import fastparquet
+        except ImportError:
+            raise ImportError("fastparquet is required for parquet support\n\n"
+                              "you can install via conda\n"
+                              "conda install fastparquet -c conda-forge\n"
+                              "\nor via pip\n"
+                              "pip install -U fastparquet")
+
+        if LooseVersion(fastparquet.__version__) < '0.1.0':
+            raise ImportError("fastparquet >= 0.1.0 is required for parquet "
+                              "support\n\n"
+                              "you can install via conda\n"
+                              "conda install fastparquet -c conda-forge\n"
+                              "\nor via pip\n"
+                              "pip install -U fastparquet")
+
+        self.api = fastparquet
+
+    def write(self, df, path, compression='snappy', **kwargs):
+        # thriftpy/protocol/compact.py:339:
+        # DeprecationWarning: tostring() is deprecated.
+        # Use tobytes() instead.
+        path, _, _ = get_filepath_or_buffer(path)
+        with catch_warnings(record=True):
+            self.api.write(path, df,
+                           compression=compression, **kwargs)
+
+    def read(self, path):
+        path, _, _ = get_filepath_or_buffer(path)
+        return self.api.ParquetFile(path).to_pandas()
+
+
+def to_parquet(df, path, engine='auto', compression='snappy', **kwargs):
+    """
+    Write a DataFrame to the parquet format.
+
+    Parameters
+    ----------
+    df : DataFrame
+    path : string
+        File path
+    engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
+        Parquet writer library to use. If 'auto', the option
+        'io.parquet.engine' is used; if that is also 'auto', the first
+        of pyarrow and fastparquet that is installed is used.
+    compression : str, optional, default 'snappy'
+        compression method, includes {'gzip', 'snappy', 'brotli'}
+    kwargs
+        Additional keyword arguments passed to the engine
+    """
+
+    impl = get_engine(engine)
+
+    if not isinstance(df, DataFrame):
+        raise ValueError("to_parquet only supports IO with DataFrames")
+
+    valid_types = {'string', 'unicode'}
+
+    # validate index
+    # --------------
+
+    # validate that we have only a default index
+    # raise on anything else as we don't serialize the index
+
+    if not isinstance(df.index, Int64Index):
+        raise ValueError("parquet does not support serializing {} "
+                         "for the index; you can .reset_index() "
+                         "to make the index into column(s)".format(
+                             type(df.index)))
+
+    if not df.index.equals(RangeIndex.from_range(range(len(df)))):
+        raise ValueError("parquet does not support serializing a "
+                         "non-default index for the index; you "
+                         "can .reset_index() to make the index "
+                         "into column(s)")
+
+    if df.index.name is not None:
+        raise ValueError("parquet does not serialize index meta-data on a "
+                         "default index")
+
+    # validate columns
+    # ----------------
+
+    # must have value column names (strings only)
+    if df.columns.inferred_type not in valid_types:
+        raise ValueError("parquet must have string column names")
+
+    return impl.write(df, path, compression=compression)
+
+
+def read_parquet(path, engine='auto', **kwargs):
+    """
+    Load a parquet object from the file path, returning a DataFrame.
+
+    .. versionadded:: 0.21.0
+
+    Parameters
+    ----------
+    path : string
+        File path
+    engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
+        Parquet reader library to use. If 'auto', the option
+        'io.parquet.engine' is used; if that is also 'auto', the first
+        of pyarrow and fastparquet that is installed is used.
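A round-trip sketch of the two new entry points ('example.parquet' is a placeholder path, and either pyarrow or fastparquet must be installed for `engine='auto'` to resolve):

```python
from pandas import DataFrame
from pandas.io.parquet import to_parquet, read_parquet

df = DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# The writer validates a default RangeIndex and string column names,
# per the checks in to_parquet above; anything else must be
# reset/renamed first.
to_parquet(df, "example.parquet", compression="snappy")
roundtrip = read_parquet("example.parquet")
```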
+ kwargs are passed to the engine + + Returns + ------- + DataFrame + + """ + + impl = get_engine(engine) + return impl.read(path) diff --git a/pandas/io/parsers.py b/pandas/io/parsers.py index f8905dfa315c4..8b1a921536a1d 100755 --- a/pandas/io/parsers.py +++ b/pandas/io/parsers.py @@ -13,31 +13,35 @@ import numpy as np from pandas import compat -from pandas.compat import (range, lrange, StringIO, lzip, +from pandas.compat import (range, lrange, PY3, StringIO, lzip, zip, string_types, map, u) -from pandas.types.common import (is_integer, _ensure_object, - is_list_like, is_integer_dtype, - is_float, is_dtype_equal, - is_object_dtype, is_string_dtype, - is_scalar, is_categorical_dtype) -from pandas.types.missing import isnull -from pandas.types.cast import _astype_nansafe -from pandas.core.index import Index, MultiIndex, RangeIndex +from pandas.core.dtypes.common import ( + is_integer, _ensure_object, + is_list_like, is_integer_dtype, + is_float, is_dtype_equal, + is_object_dtype, is_string_dtype, + is_scalar, is_categorical_dtype) +from pandas.core.dtypes.missing import isna +from pandas.core.dtypes.cast import astype_nansafe +from pandas.core.index import (Index, MultiIndex, RangeIndex, + _ensure_index_from_sequences) from pandas.core.series import Series from pandas.core.frame import DataFrame from pandas.core.categorical import Categorical +from pandas.core import algorithms from pandas.core.common import AbstractMethodError from pandas.io.date_converters import generic_parser -from pandas.io.common import (get_filepath_or_buffer, _validate_header_arg, - _get_handle, UnicodeReader, UTF8Recoder, - BaseIterator, ParserError, EmptyDataError, - ParserWarning, _NA_VALUES, _infer_compression) -from pandas.tseries import tools +from pandas.errors import ParserWarning, ParserError, EmptyDataError +from pandas.io.common import (get_filepath_or_buffer, is_file_like, + _validate_header_arg, _get_handle, + UnicodeReader, UTF8Recoder, _NA_VALUES, + BaseIterator, _infer_compression) +from pandas.core.tools import datetimes as tools -from pandas.util.decorators import Appender +from pandas.util._decorators import Appender -import pandas.lib as lib -import pandas.parser as _parser +import pandas._libs.lib as lib +import pandas._libs.parsers as parsers # BOM character (byte order mark) @@ -60,8 +64,6 @@ file. For file URLs, a host is expected. For instance, a local file could be file ://localhost/path/to/table.csv %s -delimiter : str, default ``None`` - Alternative argument name for sep. delim_whitespace : boolean, default False Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``) will be used as the sep. Equivalent to setting ``sep='\s+'``. If this option @@ -102,8 +104,8 @@ ['AAA', 'BBB', 'DDD']``. Using this parameter results in much faster parsing time and lower memory usage. as_recarray : boolean, default False - DEPRECATED: this argument will be removed in a future version. Please call - `pd.read_csv(...).to_records()` instead. + .. deprecated:: 0.19.0 + Please call `pd.read_csv(...).to_records()` instead. Return a NumPy recarray instead of a DataFrame after parsing the data. If set to True, this option takes precedence over the `squeeze` parameter. @@ -142,14 +144,15 @@ skipfooter : int, default 0 Number of lines at bottom of file to skip (Unsupported with engine='c') skip_footer : int, default 0 - DEPRECATED: use the `skipfooter` parameter instead, as they are identical + .. 
deprecated:: 0.19.0 + Use the `skipfooter` parameter instead, as they are identical nrows : int, default None Number of rows of file to read. Useful for reading pieces of large files na_values : scalar, str, list-like, or dict, default None Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '""" + fill("', '".join(sorted(_NA_VALUES)), - 70, subsequent_indent=" ") + """'`. + 70, subsequent_indent=" ") + """'. keep_default_na : bool, default True If na_values are specified and keep_default_na is False the default NaN values are overridden, otherwise they're appended to. @@ -168,7 +171,7 @@ * list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column. * list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as - a single date column. + a single date column. * dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo' @@ -178,38 +181,39 @@ Note: A fast-path exists for iso8601-formatted dates. infer_datetime_format : boolean, default False - If True and parse_dates is enabled, pandas will attempt to infer the format - of the datetime strings in the columns, and if it can be inferred, switch - to a faster method of parsing them. In some cases this can increase the - parsing speed by ~5-10x. + If True and `parse_dates` is enabled, pandas will attempt to infer the + format of the datetime strings in the columns, and if it can be inferred, + switch to a faster method of parsing them. In some cases this can increase + the parsing speed by 5-10x. keep_date_col : boolean, default False - If True and parse_dates specifies combining multiple columns then + If True and `parse_dates` specifies combining multiple columns then keep the original columns. date_parser : function, default None Function to use for converting a sequence of string columns to an array of datetime instances. The default uses ``dateutil.parser.parser`` to do the - conversion. Pandas will try to call date_parser in three different ways, + conversion. Pandas will try to call `date_parser` in three different ways, advancing to the next if an exception occurs: 1) Pass one or more arrays - (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the - string values from the columns defined by parse_dates into a single array - and pass that; and 3) call date_parser once for each row using one or more - strings (corresponding to the columns defined by parse_dates) as arguments. + (as defined by `parse_dates`) as arguments; 2) concatenate (row-wise) the + string values from the columns defined by `parse_dates` into a single array + and pass that; and 3) call `date_parser` once for each row using one or + more strings (corresponding to the columns defined by `parse_dates`) as + arguments. dayfirst : boolean, default False DD/MM format dates, international and European format iterator : boolean, default False Return TextFileReader object for iteration or getting chunks with ``get_chunk()``. chunksize : int, default None - Return TextFileReader object for iteration. `See IO Tools docs for more - information - `_ on - ``iterator`` and ``chunksize``. + Return TextFileReader object for iteration. + See the `IO Tools docs + `_ + for more information on ``iterator`` and ``chunksize``. compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer' - For on-the-fly decompression of on-disk data. 
If 'infer', then use gzip, - bz2, zip or xz if filepath_or_buffer is a string ending in '.gz', '.bz2', - '.zip', or 'xz', respectively, and no decompression otherwise. If using - 'zip', the ZIP file must contain only one data file to be read in. - Set to None for no decompression. + For on-the-fly decompression of on-disk data. If 'infer' and + `filepath_or_buffer` is path-like, then detect compression from the + following extensions: '.gz', '.bz2', '.zip', or '.xz' (otherwise no + decompression). If using 'zip', the ZIP file must contain only one data + file to be read in. Set to None for no decompression. .. versionadded:: 0.18.1 support for 'zip' and 'xz' compression. @@ -261,10 +265,10 @@ Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these "bad lines" will dropped from the DataFrame that is - returned. (Only valid with C parser) + returned. warn_bad_lines : boolean, default True If error_bad_lines is False, and warn_bad_lines is True, a warning for each - "bad line" will be output. (Only valid with C parser). + "bad line" will be output. low_memory : boolean, default True Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed @@ -273,17 +277,19 @@ use the `chunksize` or `iterator` parameter to return the data in chunks. (Only valid with C parser) buffer_lines : int, default None - DEPRECATED: this argument will be removed in a future version because its - value is not respected by the parser + .. deprecated:: 0.19.0 + This argument is not respected by the parser compact_ints : boolean, default False - DEPRECATED: this argument will be removed in a future version + .. deprecated:: 0.19.0 + Argument moved to ``pd.to_numeric`` If compact_ints is True, then for any column that is of integer dtype, the parser will attempt to cast it as the smallest integer dtype possible, either signed or unsigned depending on the specification from the `use_unsigned` parameter. use_unsigned : boolean, default False - DEPRECATED: this argument will be removed in a future version + .. deprecated:: 0.19.0 + Argument moved to ``pd.to_numeric`` If integer columns are being compacted (i.e. `compact_ints=True`), specify whether the column should be compacted to the smallest signed or unsigned @@ -304,10 +310,14 @@ currently more feature-complete.""" _sep_doc = r"""sep : str, default {default} - Delimiter to use. If sep is None, will try to automatically determine - this. Separators longer than 1 character and different from ``'\s+'`` will - be interpreted as regular expressions, will force use of the python parsing - engine and will ignore quotes in the data. Regex example: ``'\r\t'``""" + Delimiter to use. If sep is None, the C engine cannot automatically detect + the separator, but the Python parsing engine can, meaning the latter will + be used automatically. In addition, separators longer than 1 character and + different from ``'\s+'`` will be interpreted as regular expressions and + will also force the use of the Python parsing engine. Note that regex + delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'`` +delimiter : str, default ``None`` + Alternative argument name for sep.""" _read_csv_doc = """ Read CSV (comma-separated) file into DataFrame @@ -332,36 +342,47 @@ widths : list of ints. optional A list of field widths which can be used instead of 'colspecs' if the intervals are contiguous. 
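Since `widths` is described above as a shorthand for contiguous `colspecs`, a tiny sketch with invented sample data:

```python
import pandas as pd
from pandas.compat import StringIO

data = "1001 Alice\n1002 Bobby\n"

# widths carves each line into contiguous fields: chars 0-4, then 5-9;
# header=None because this sample has no header row.
df = pd.read_fwf(StringIO(data), widths=[5, 5], header=None,
                 names=["id", "name"])
```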
+delimiter : str, default ``'\t' + ' '`` + Characters to consider as filler characters in the fixed-width file. + Can be used to specify the filler character of the fields + if it is not spaces (e.g., '~'). """ _read_fwf_doc = """ Read a table of fixed-width formatted lines into DataFrame %s - -Also, 'delimiter' is used to specify the filler character of the -fields if it is not spaces (e.g., '~'). """ % (_parser_params % (_fwf_widths, '')) -def _validate_nrows(nrows): +def _validate_integer(name, val, min_val=0): """ - Checks whether the 'nrows' parameter for parsing is either + Checks whether the 'name' parameter for parsing is either an integer OR float that can SAFELY be cast to an integer without losing accuracy. Raises a ValueError if that is not the case. + + Parameters + ---------- + name : string + Parameter name (used for error reporting) + val : int or float + The value to check + min_val : int + Minimum allowed value (val < min_val will result in a ValueError) """ - msg = "'nrows' must be an integer" + msg = "'{name:s}' must be an integer >={min_val:d}".format(name=name, + min_val=min_val) - if nrows is not None: - if is_float(nrows): - if int(nrows) != nrows: + if val is not None: + if is_float(val): + if int(val) != val: raise ValueError(msg) - nrows = int(nrows) - elif not is_integer(nrows): + val = int(val) + elif not (is_integer(val) and val >= min_val): raise ValueError(msg) - return nrows + return val def _read(filepath_or_buffer, kwds): @@ -383,32 +404,22 @@ def _read(filepath_or_buffer, kwds): # Extract some of the arguments (pass chunksize on). iterator = kwds.get('iterator', False) - chunksize = kwds.get('chunksize', None) - nrows = _validate_nrows(kwds.pop('nrows', None)) + chunksize = _validate_integer('chunksize', kwds.get('chunksize', None), 1) + nrows = _validate_integer('nrows', kwds.get('nrows', None)) # Create the parser. 
parser = TextFileReader(filepath_or_buffer, **kwds) - if (nrows is not None) and (chunksize is not None): - raise NotImplementedError("'nrows' and 'chunksize' cannot be used" - " together yet.") - elif nrows is not None: - try: - data = parser.read(nrows) - finally: - parser.close() - return data - - elif chunksize or iterator: + if chunksize or iterator: return parser try: - data = parser.read() + data = parser.read(nrows) finally: parser.close() - return data + _parser_defaults = { 'delimiter': None, @@ -444,7 +455,7 @@ def _read(filepath_or_buffer, kwds): 'usecols': None, - # 'nrows': None, + 'nrows': None, # 'iterator': False, 'chunksize': None, 'verbose': False, @@ -477,20 +488,18 @@ def _read(filepath_or_buffer, kwds): 'widths': None, } -_c_unsupported = set(['skipfooter']) -_python_unsupported = set([ +_c_unsupported = {'skipfooter'} +_python_unsupported = { 'low_memory', 'buffer_lines', - 'error_bad_lines', - 'warn_bad_lines', 'float_precision', -]) -_deprecated_args = set([ +} +_deprecated_args = { 'as_recarray', 'buffer_lines', 'compact_ints', 'use_unsigned', -]) +} def _make_parser_function(name, sep=','): @@ -655,6 +664,7 @@ def parser_f(filepath_or_buffer, return parser_f + read_csv = _make_parser_function('read_csv', sep=',') read_csv = Appender(_read_csv_doc)(read_csv) @@ -747,10 +757,13 @@ def __init__(self, f, engine=None, **kwds): options = self._get_options_with_defaults(engine) self.chunksize = options.pop('chunksize', None) + self.nrows = options.pop('nrows', None) self.squeeze = options.pop('squeeze', False) # might mutate self.engine + self.engine = self._check_file_or_buffer(f, engine) self.options, self.engine = self._clean_options(options, engine) + if 'has_index_names' in kwds: self.options['has_index_names'] = kwds['has_index_names'] @@ -796,6 +809,23 @@ def _get_options_with_defaults(self, engine): return options + def _check_file_or_buffer(self, f, engine): + # see gh-16530 + if is_file_like(f): + next_attr = "__next__" if PY3 else "next" + + # The C engine doesn't need the file-like to have the "next" or + # "__next__" attribute. However, the Python engine explicitly calls + # "next(...)" when iterating through such an object, meaning it + # needs to have that attribute ("next" for Python 2.x, "__next__" + # for Python 3.x) + if engine != "c" and not hasattr(f, next_attr): + msg = ("The 'python' engine cannot iterate " + "through this file buffer.") + raise ValueError(msg) + + return engine + def _clean_options(self, options, engine): result = options.copy() @@ -964,6 +994,10 @@ def _make_engine(self, engine='c'): klass = PythonParser elif engine == 'python-fwf': klass = FixedWidthFieldParser + else: + raise ValueError('Unknown engine: {engine} (valid options are' + ' "c", "python", or' ' "python-fwf")'.format( + engine=engine)) self._engine = klass(self.f, **self.options) def _failover_to_python(self): @@ -1007,6 +1041,10 @@ def _create_index(self, ret): def get_chunk(self, size=None): if size is None: size = self.chunksize + if self.nrows is not None: + if self._currow >= self.nrows: + raise StopIteration + size = min(size, self.nrows - self._currow) return self.read(nrows=size) @@ -1028,6 +1066,37 @@ def _evaluate_usecols(usecols, names): return usecols +def _validate_skipfooter_arg(skipfooter): + """ + Validate the 'skipfooter' parameter. + + Checks whether 'skipfooter' is a non-negative integer. + Raises a ValueError if that is not the case. 
+ + Parameters + ---------- + skipfooter : non-negative integer + The number of rows to skip at the end of the file. + + Returns + ------- + validated_skipfooter : non-negative integer + The original input if the validation succeeds. + + Raises + ------ + ValueError : 'skipfooter' was not a non-negative integer. + """ + + if not is_integer(skipfooter): + raise ValueError("skipfooter must be an integer") + + if skipfooter < 0: + raise ValueError("skipfooter cannot be negative") + + return skipfooter + + def _validate_usecols_arg(usecols): """ Validate the 'usecols' parameter. @@ -1124,6 +1193,8 @@ def __init__(self, kwds): # validate header options for mi self.header = kwds.get('header') if isinstance(self.header, (list, tuple, np.ndarray)): + if not all(map(is_integer, self.header)): + raise ValueError("header must be integer or list of integers") if kwds.get('as_recarray'): raise ValueError("cannot specify as_recarray when " "specifying a multi-index header") @@ -1144,6 +1215,10 @@ def __init__(self, kwds): raise ValueError("index_col must only contain row numbers " "when specifying a multi-index header") + # GH 16338 + elif self.header is not None and not is_integer(self.header): + raise ValueError("header must be integer or list of integers") + self._name_processed = False self._first_chunk = True @@ -1167,13 +1242,18 @@ def _should_parse_dates(self, i): if isinstance(self.parse_dates, bool): return self.parse_dates else: - name = self.index_names[i] + if self.index_names is not None: + name = self.index_names[i] + else: + name = None j = self.index_col[i] if is_scalar(self.parse_dates): - return (j == self.parse_dates) or (name == self.parse_dates) + return ((j == self.parse_dates) or + (name is not None and name == self.parse_dates)) else: - return (j in self.parse_dates) or (name in self.parse_dates) + return ((j in self.parse_dates) or + (name is not None and name in self.parse_dates)) def _extract_multi_indexer_columns(self, header, index_names, col_names, passed_names=False): @@ -1239,14 +1319,18 @@ def _maybe_dedup_names(self, names): # would be nice! 
if self.mangle_dupe_cols: names = list(names) # so we can index - counts = {} + counts = defaultdict(int) for i, col in enumerate(names): - cur_count = counts.get(col, 0) + cur_count = counts[col] - if cur_count > 0: - names[i] = '%s.%d' % (col, cur_count) + while cur_count > 0: + counts[col] = cur_count + 1 + col = '%s.%d' % (col, cur_count) + cur_count = counts[col] + + names[i] = col counts[col] = cur_count + 1 return names @@ -1343,6 +1427,7 @@ def _get_name(icol): def _agg_index(self, index, try_parse_dates=True): arrays = [] + for i, arr in enumerate(index): if (try_parse_dates and self._should_parse_dates(i)): @@ -1360,7 +1445,8 @@ def _agg_index(self, index, try_parse_dates=True): arr, _ = self._infer_types(arr, col_na_values | col_na_fvalues) arrays.append(arr) - index = MultiIndex.from_arrays(arrays, names=self.index_names) + names = self.index_names + index = _ensure_index_from_sequences(arrays, names) return index @@ -1392,7 +1478,8 @@ def _convert_to_ndarrays(self, dct, na_values, na_fvalues, verbose=False, try: values = lib.map_infer(values, conv_f) except ValueError: - mask = lib.ismember(values, na_values).view(np.uint8) + mask = algorithms.isin( + values, list(na_values)).view(np.uint8) values = lib.map_infer_mask(values, conv_f, mask) cvals, na_count = self._infer_types( @@ -1413,7 +1500,7 @@ def _convert_to_ndarrays(self, dct, na_values, na_fvalues, verbose=False, if issubclass(cvals.dtype.type, np.integer) and self.compact_ints: cvals = lib.downcast_int64( - cvals, _parser.na_values, + cvals, parsers.na_values, self.use_unsigned) result[c] = cvals @@ -1440,7 +1527,7 @@ def _infer_types(self, values, na_values, try_num_bool=True): na_count = 0 if issubclass(values.dtype.type, (np.number, np.bool_)): - mask = lib.ismember(values, na_values) + mask = algorithms.isin(values, list(na_values)) na_count = mask.sum() if na_count > 0: if is_integer_dtype(values): @@ -1451,7 +1538,7 @@ def _infer_types(self, values, na_values, try_num_bool=True): if try_num_bool: try: result = lib.maybe_convert_numeric(values, na_values, False) - na_count = isnull(result).sum() + na_count = isna(result).sum() except Exception: result = values if values.dtype == np.object_: @@ -1490,11 +1577,11 @@ def _cast_types(self, values, cast_type, column): # c-parser which parses all categories # as strings if not is_object_dtype(values): - values = _astype_nansafe(values, str) + values = astype_nansafe(values, str) values = Categorical(values) else: try: - values = _astype_nansafe(values, cast_type, copy=True) + values = astype_nansafe(values, cast_type, copy=True) except ValueError: raise ValueError("Unable to convert column %s to " "type %s" % (column, cast_type)) @@ -1502,6 +1589,7 @@ def _cast_types(self, values, cast_type, column): def _do_date_conversions(self, names, data): # returns data, columns + if self.parse_dates is not None: data, names = _process_date_conversion( data, self._date_conv, self.parse_dates, self.index_col, @@ -1531,7 +1619,7 @@ def __init__(self, src, **kwds): # #2442 kwds['allow_leading_cols'] = self.index_col is not False - self._reader = _parser.TextReader(src, **kwds) + self._reader = parsers.TextReader(src, **kwds) # XXX self.usecols, self.usecols_dtype = _validate_usecols_arg( @@ -1572,6 +1660,12 @@ def __init__(self, src, **kwds): if self.usecols: usecols = _evaluate_usecols(self.usecols, self.orig_names) + + # GH 14671 + if (self.usecols_dtype == 'string' and + not set(usecols).issubset(self.orig_names)): + raise ValueError("Usecols do not match names.") + if 
len(self.names) > len(usecols): self.names = [n for i, n in enumerate(self.names) if (i in usecols or n in usecols)] @@ -1716,7 +1810,7 @@ def read(self, nrows=None): try_parse_dates=True) arrays.append(values) - index = MultiIndex.from_arrays(arrays) + index = _ensure_index_from_sequences(arrays) if self.usecols is not None: names = self._filter_usecols(names) @@ -1864,7 +1958,7 @@ def __init__(self, f, **kwds): else: self.skipfunc = lambda x: x in self.skiprows - self.skipfooter = kwds['skipfooter'] + self.skipfooter = _validate_skipfooter_arg(kwds['skipfooter']) self.delimiter = kwds['delimiter'] self.quotechar = kwds['quotechar'] @@ -1879,6 +1973,9 @@ def __init__(self, f, **kwds): self.usecols, _ = _validate_usecols_arg(kwds['usecols']) self.skip_blank_lines = kwds['skip_blank_lines'] + self.warn_bad_lines = kwds['warn_bad_lines'] + self.error_bad_lines = kwds['error_bad_lines'] + self.names_passed = kwds['names'] or None self.na_filter = kwds['na_filter'] @@ -1899,7 +1996,8 @@ def __init__(self, f, **kwds): self.comment = kwds['comment'] self._comment_lines = [] - f, handles = _get_handle(f, 'r', encoding=self.encoding, + mode = 'r' if PY3 else 'rb' + f, handles = _get_handle(f, mode, encoding=self.encoding, compression=self.compression, memory_map=self.memory_map) self.handles.extend(handles) @@ -2125,7 +2223,7 @@ def _exclude_implicit_index(self, alldata): def get_chunk(self, size=None): if size is None: size = self.chunksize - return self.read(nrows=size) + return self.read(rows=size) def _convert_data(self, data): # apply converters @@ -2187,10 +2285,11 @@ def _infer_columns(self): if self.header is not None: header = self.header - # we have a mi columns, so read an extra line if isinstance(header, (list, tuple, np.ndarray)): - have_mi_columns = True - header = list(header) + [header[-1] + 1] + have_mi_columns = len(header) > 1 + # we have a mi columns, so read an extra line + if have_mi_columns: + header = list(header) + [header[-1] + 1] else: have_mi_columns = False header = [header] @@ -2238,11 +2337,17 @@ def _infer_columns(self): this_columns.append(c) if not have_mi_columns and self.mangle_dupe_cols: - counts = {} + counts = defaultdict(int) + for i, col in enumerate(this_columns): - cur_count = counts.get(col, 0) - if cur_count > 0: - this_columns[i] = '%s.%d' % (col, cur_count) + cur_count = counts[col] + + while cur_count > 0: + counts[col] = cur_count + 1 + col = "%s.%d" % (col, cur_count) + cur_count = counts[col] + + this_columns[i] = col counts[col] = cur_count + 1 elif have_mi_columns: @@ -2422,7 +2527,19 @@ def _check_for_bom(self, first_row): # return an empty string. return [""] - def _empty(self, line): + def _is_line_empty(self, line): + """ + Check if a line is empty or not. + + Parameters + ---------- + line : str, array-like + The line of data to check. + + Returns + ------- + boolean : Whether or not the line is empty. 
+ """ return not line or all(not x for x in line) def _next_line(self): @@ -2435,11 +2552,12 @@ def _next_line(self): line = self._check_comments([self.data[self.pos]])[0] self.pos += 1 # either uncommented or blank to begin with - if not self.skip_blank_lines and (self._empty(self.data[ - self.pos - 1]) or line): + if (not self.skip_blank_lines and + (self._is_line_empty( + self.data[self.pos - 1]) or line)): break elif self.skip_blank_lines: - ret = self._check_empty([line]) + ret = self._remove_empty_lines([line]) if ret: line = ret[0] break @@ -2451,35 +2569,19 @@ def _next_line(self): next(self.data) while True: - try: - orig_line = next(self.data) - except csv.Error as e: - msg = str(e) - - if 'NULL byte' in str(e): - msg = ('NULL byte detected. This byte ' - 'cannot be processed in Python\'s ' - 'native csv library at the moment, ' - 'so please pass in engine=\'c\' instead') - - if self.skipfooter > 0: - reason = ('Error could possibly be due to ' - 'parsing errors in the skipped footer rows ' - '(the skipfooter keyword is only applied ' - 'after Python\'s csv library has parsed ' - 'all rows).') - msg += '. ' + reason - - raise csv.Error(msg) - line = self._check_comments([orig_line])[0] + orig_line = self._next_iter_line(row_num=self.pos + 1) self.pos += 1 - if (not self.skip_blank_lines and - (self._empty(orig_line) or line)): - break - elif self.skip_blank_lines: - ret = self._check_empty([line]) - if ret: - line = ret[0] + + if orig_line is not None: + line = self._check_comments([orig_line])[0] + + if self.skip_blank_lines: + ret = self._remove_empty_lines([line]) + + if ret: + line = ret[0] + break + elif self._is_line_empty(orig_line) or line: break # This was the first line of the file, @@ -2492,6 +2594,66 @@ def _next_line(self): self.buf.append(line) return line + def _alert_malformed(self, msg, row_num): + """ + Alert a user about a malformed row. + + If `self.error_bad_lines` is True, the alert will be `ParserError`. + If `self.warn_bad_lines` is True, the alert will be printed out. + + Parameters + ---------- + msg : The error message to display. + row_num : The row number where the parsing error occurred. + Because this row number is displayed, we 1-index, + even though we 0-index internally. + """ + + if self.error_bad_lines: + raise ParserError(msg) + elif self.warn_bad_lines: + base = 'Skipping line {row_num}: '.format(row_num=row_num) + sys.stderr.write(base + msg + '\n') + + def _next_iter_line(self, row_num): + """ + Wrapper around iterating through `self.data` (CSV source). + + When a CSV error is raised, we check for specific + error messages that allow us to customize the + error message displayed to the user. + + Parameters + ---------- + row_num : The row number of the line being parsed. + """ + + try: + return next(self.data) + except csv.Error as e: + if self.warn_bad_lines or self.error_bad_lines: + msg = str(e) + + if 'NULL byte' in msg: + msg = ('NULL byte detected. This byte ' + 'cannot be processed in Python\'s ' + 'native csv library at the moment, ' + 'so please pass in engine=\'c\' instead') + elif 'newline inside string' in msg: + msg = ('EOF inside string starting with ' + 'line ' + str(row_num)) + + if self.skipfooter > 0: + reason = ('Error could possibly be due to ' + 'parsing errors in the skipped footer rows ' + '(the skipfooter keyword is only applied ' + 'after Python\'s csv library has parsed ' + 'all rows).') + msg += '. 
' + reason + + self._alert_malformed(msg, row_num) + return None + def _check_comments(self, lines): if self.comment is None: return lines @@ -2510,7 +2672,22 @@ def _check_comments(self, lines): ret.append(rl) return ret - def _check_empty(self, lines): + def _remove_empty_lines(self, lines): + """ + Iterate through the lines and remove any that are + either empty or contain only one whitespace value + + Parameters + ---------- + lines : array-like + The array of lines that we are to filter. + + Returns + ------- + filtered_lines : array-like + The same array of lines with the "empty" ones removed. + """ + ret = [] for l in lines: # Remove empty lines and lines with only one whitespace value @@ -2626,37 +2803,49 @@ def _rows_to_cols(self, content): if self._implicit_index: col_len += len(self.index_col) - # see gh-13320 - zipped_content = list(lib.to_object_array( - content, min_width=col_len).T) - zip_len = len(zipped_content) - - if self.skipfooter < 0: - raise ValueError('skip footer cannot be negative') + max_len = max([len(row) for row in content]) - # Loop through rows to verify lengths are correct. - if (col_len != zip_len and + # Check that there are no rows with too many + # elements in their row (rows with too few + # elements are padded with NaN). + if (max_len > col_len and self.index_col is not False and self.usecols is None): - i = 0 - for (i, l) in enumerate(content): - if len(l) != col_len: - break - footers = 0 - if self.skipfooter: - footers = self.skipfooter + footers = self.skipfooter if self.skipfooter else 0 + bad_lines = [] - row_num = self.pos - (len(content) - i + footers) + iter_content = enumerate(content) + content_len = len(content) + content = [] - msg = ('Expected %d fields in line %d, saw %d' % - (col_len, row_num + 1, zip_len)) - if len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE: - # see gh-13374 - reason = ('Error could possibly be due to quotes being ' - 'ignored when a multi-char delimiter is used.') - msg += '. ' + reason - raise ValueError(msg) + for (i, l) in iter_content: + actual_len = len(l) + + if actual_len > col_len: + if self.error_bad_lines or self.warn_bad_lines: + row_num = self.pos - (content_len - i + footers) + bad_lines.append((row_num, actual_len)) + + if self.error_bad_lines: + break + else: + content.append(l) + + for row_num, actual_len in bad_lines: + msg = ('Expected %d fields in line %d, saw %d' % + (col_len, row_num + 1, actual_len)) + if len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE: + # see gh-13374 + reason = ('Error could possibly be due to quotes being ' + 'ignored when a multi-char delimiter is used.') + msg += '. ' + reason + + self._alert_malformed(msg, row_num + 1) + + # see gh-13320 + zipped_content = list(lib.to_object_array( + content, min_width=col_len).T) if self.usecols: if self._implicit_index: @@ -2670,7 +2859,6 @@ def _rows_to_cols(self, content): return zipped_content def _get_lines(self, rows=None): - source = self.data lines = self.buf new_rows = None @@ -2685,14 +2873,14 @@ def _get_lines(self, rows=None): rows -= len(self.buf) if new_rows is None: - if isinstance(source, list): - if self.pos > len(source): + if isinstance(self.data, list): + if self.pos > len(self.data): raise StopIteration if rows is None: - new_rows = source[self.pos:] - new_pos = len(source) + new_rows = self.data[self.pos:] + new_pos = len(self.data) else: - new_rows = source[self.pos:self.pos + rows] + new_rows = self.data[self.pos:self.pos + rows] new_pos = self.pos + rows # Check for stop rows. 
n.b.: self.skiprows is a set. @@ -2708,21 +2896,19 @@ def _get_lines(self, rows=None): try: if rows is not None: for _ in range(rows): - new_rows.append(next(source)) + new_rows.append(next(self.data)) lines.extend(new_rows) else: rows = 0 + while True: - try: - new_rows.append(next(source)) - rows += 1 - except csv.Error as inst: - if 'newline inside string' in str(inst): - row_num = str(self.pos + rows) - msg = ('EOF inside string starting with ' - 'line ' + row_num) - raise Exception(msg) - raise + new_row = self._next_iter_line( + row_num=self.pos + rows + 1) + rows += 1 + + if new_row is not None: + new_rows.append(new_row) + except StopIteration: if self.skiprows: new_rows = [row for i, row in enumerate(new_rows) @@ -2741,7 +2927,7 @@ def _get_lines(self, rows=None): lines = self._check_comments(lines) if self.skip_blank_lines: - lines = self._check_empty(lines) + lines = self._remove_empty_lines(lines) lines = self._check_thousands(lines) return self._check_decimal(lines) @@ -2856,7 +3042,7 @@ def _try_convert_dates(parser, colspec, data_dict, columns): if c in colset: colnames.append(c) elif isinstance(c, int) and c not in columns: - colnames.append(str(columns[c])) + colnames.append(columns[c]) else: colnames.append(c) @@ -2873,7 +3059,7 @@ def _clean_na_values(na_values, keep_default_na=True): if keep_default_na: na_values = _NA_VALUES else: - na_values = [] + na_values = set() na_fvalues = set() elif isinstance(na_values, dict): na_values = na_values.copy() # Prevent aliasing. @@ -2954,9 +3140,8 @@ def _get_empty_meta(columns, index_col, index_names, dtype=None): if index_col is None or index_col is False: index = Index([]) else: - index = [Series([], dtype=dtype[index_name]) - for index_name in index_names] - index = MultiIndex.from_arrays(index, names=index_names) + data = [Series([], dtype=dtype[name]) for name in index_names] + index = _ensure_index_from_sequences(data, names=index_names) index_col.sort() for i, n in enumerate(index_col): columns.pop(n - i) diff --git a/pandas/io/pickle.py b/pandas/io/pickle.py index 2358c296f782e..143b76575e36b 100644 --- a/pandas/io/pickle.py +++ b/pandas/io/pickle.py @@ -3,10 +3,11 @@ import numpy as np from numpy.lib.format import read_array, write_array from pandas.compat import BytesIO, cPickle as pkl, pickle_compat as pc, PY3 -from pandas.types.common import is_datetime64_dtype, _NS_DTYPE +from pandas.core.dtypes.common import is_datetime64_dtype, _NS_DTYPE +from pandas.io.common import _get_handle, _infer_compression, _stringify_path -def to_pickle(obj, path): +def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL): """ Pickle (serialize) object to input file path @@ -15,12 +16,39 @@ def to_pickle(obj, path): obj : any object path : string File path + compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer' + a string representing the compression to use in the output file + + .. versionadded:: 0.20.0 + protocol : int + Int which indicates which protocol should be used by the pickler, + default HIGHEST_PROTOCOL (see [1], paragraph 12.1.2). The possible + values for this parameter depend on the version of Python. For Python + 2.x, possible values are 0, 1, 2. For Python>=3.0, 3 is a valid value. + For Python >= 3.4, 4 is a valid value. A negative value for the + protocol parameter is equivalent to setting its value to + HIGHEST_PROTOCOL. + + .. [1] https://docs.python.org/3/library/pickle.html + .. 
versionadded:: 0.21.0 + + """ - with open(path, 'wb') as f: - pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL) + path = _stringify_path(path) + inferred_compression = _infer_compression(path, compression) + f, fh = _get_handle(path, 'wb', + compression=inferred_compression, + is_text=False) + if protocol < 0: + protocol = pkl.HIGHEST_PROTOCOL + try: + pkl.dump(obj, f, protocol=protocol) + finally: + for _f in fh: + _f.close() -def read_pickle(path): +def read_pickle(path, compression='infer'): """ Load pickled pandas object (or any other pickled object) from the specified file path @@ -32,11 +60,31 @@ def read_pickle(path): ---------- path : string File path + compression : {'infer', 'gzip', 'bz2', 'xz', 'zip', None}, default 'infer' + For on-the-fly decompression of on-disk data. If 'infer', then use + gzip, bz2, xz or zip if path ends in '.gz', '.bz2', '.xz', + or '.zip' respectively, and no decompression otherwise. + Set to None for no decompression. + + .. versionadded:: 0.20.0 Returns ------- unpickled : type of object stored in file """ + path = _stringify_path(path) + inferred_compression = _infer_compression(path, compression) + + def read_wrapper(func): + # wrapper file handle open/close operation + f, fh = _get_handle(path, 'rb', + compression=inferred_compression, + is_text=False) + try: + return func(f) + finally: + for _f in fh: + _f.close() def try_read(path, encoding=None): # try with cPickle @@ -48,19 +96,16 @@ def try_read(path, encoding=None): # cpickle # GH 6899 try: - with open(path, 'rb') as fh: - return pkl.load(fh) + return read_wrapper(lambda f: pkl.load(f)) except Exception: # reg/patched pickle try: - with open(path, 'rb') as fh: - return pc.load(fh, encoding=encoding, compat=False) - + return read_wrapper( + lambda f: pc.load(f, encoding=encoding, compat=False)) # compat pickle except: - with open(path, 'rb') as fh: - return pc.load(fh, encoding=encoding, compat=True) - + return read_wrapper( + lambda f: pc.load(f, encoding=encoding, compat=True)) try: return try_read(path) except: @@ -68,6 +113,7 @@ def try_read(path, encoding=None): return try_read(path, encoding='latin1') raise + # compat with sparse pickle / unpickle diff --git a/pandas/io/pytables.py b/pandas/io/pytables.py index 9f161dc5ec50e..712e9e9903f0a 100644 --- a/pandas/io/pytables.py +++ b/pandas/io/pytables.py @@ -12,45 +12,41 @@ import warnings import os -from pandas.types.common import (is_list_like, - is_categorical_dtype, - is_timedelta64_dtype, - is_datetime64tz_dtype, - is_datetime64_dtype, - _ensure_object, - _ensure_int64, - _ensure_platform_int) -from pandas.types.missing import array_equivalent +from pandas.core.dtypes.common import ( + is_list_like, + is_categorical_dtype, + is_timedelta64_dtype, + is_datetime64tz_dtype, + is_datetime64_dtype, + _ensure_object, + _ensure_int64, + _ensure_platform_int) +from pandas.core.dtypes.missing import array_equivalent import numpy as np - -import pandas as pd from pandas import (Series, DataFrame, Panel, Panel4D, Index, - MultiIndex, Int64Index, isnull) + MultiIndex, Int64Index, isna, concat, + SparseSeries, SparseDataFrame, PeriodIndex, + DatetimeIndex, TimedeltaIndex) from pandas.core import config from pandas.io.common import _stringify_path -from pandas.sparse.api import SparseSeries, SparseDataFrame -from pandas.sparse.array import BlockIndex, IntIndex -from pandas.tseries.api import PeriodIndex, DatetimeIndex -from pandas.tseries.tdi import TimedeltaIndex +from pandas.core.sparse.array import BlockIndex, IntIndex from pandas.core.base import 
StringMixin -from pandas.formats.printing import adjoin, pprint_thing -from pandas.core.common import _asarray_tuplesafe, PerformanceWarning +from pandas.io.formats.printing import adjoin, pprint_thing +from pandas.errors import PerformanceWarning +from pandas.core.common import _asarray_tuplesafe from pandas.core.algorithms import match, unique from pandas.core.categorical import Categorical, _factorize_from_iterables from pandas.core.internals import (BlockManager, make_block, _block2d_to_blocknd, _factor_indexer, _block_shape) from pandas.core.index import _ensure_index -from pandas.tools.merge import concat from pandas import compat from pandas.compat import u_safe as u, PY3, range, lrange, string_types, filter from pandas.core.config import get_option -from pandas.computation.pytables import Expr, maybe_expression +from pandas.core.computation.pytables import Expr, maybe_expression -import pandas.lib as lib -import pandas.algos as algos -import pandas.tslib as tslib +from pandas._libs import tslib, algos, lib from distutils.version import LooseVersion @@ -76,6 +72,19 @@ def _ensure_encoding(encoding): encoding = _default_encoding return encoding + +def _ensure_str(name): + """Ensure that an index / column name is a str (python 3) or + unicode (python 2); otherwise they may be np.string dtype. + Non-string dtypes are passed through unchanged. + + https://github.com/pandas-dev/pandas/issues/13492 + """ + if isinstance(name, compat.string_types): + name = compat.text_type(name) + return name + + Term = Expr @@ -114,6 +123,7 @@ class ClosedFileError(Exception): class IncompatibilityWarning(Warning): pass + incompatibility_doc = """ where criteria is being ignored as this version [%s] is too old (or not-defined), read the file in and write it out to a new file to upgrade (with @@ -124,6 +134,7 @@ class IncompatibilityWarning(Warning): class AttributeConflictWarning(Warning): pass + attribute_conflict_doc = """ the [%s] attribute of the existing index is [%s] which conflicts with the new [%s], resetting the attribute to None @@ -133,6 +144,7 @@ class AttributeConflictWarning(Warning): class DuplicateWarning(Warning): pass + duplicate_doc = """ duplicate entries in table, taking most recently appended """ @@ -164,7 +176,6 @@ class DuplicateWarning(Warning): Series: u('series'), SparseSeries: u('sparse_series'), - pd.TimeSeries: u('series'), DataFrame: u('frame'), SparseDataFrame: u('sparse_frame'), Panel: u('wide'), @@ -173,7 +184,6 @@ class DuplicateWarning(Warning): # storer class map _STORER_MAP = { - u('TimeSeries'): 'LegacySeriesFixed', u('Series'): 'LegacySeriesFixed', u('DataFrame'): 'LegacyFrameFixed', u('DataMatrix'): 'LegacyFrameFixed', @@ -272,7 +282,7 @@ def to_hdf(path_or_buf, key, value, mode=None, complevel=None, complib=None, f(path_or_buf) -def read_hdf(path_or_buf, key=None, **kwargs): +def read_hdf(path_or_buf, key=None, mode='r', **kwargs): """ read from the store, close it if we opened it Retrieve pandas object stored in file, optionally based on where @@ -280,13 +290,16 @@ def read_hdf(path_or_buf, key=None, **kwargs): Parameters ---------- - path_or_buf : path (string), buffer, or path object (pathlib.Path or - py._path.local.LocalPath) to read from + path_or_buf : path (string), buffer or path object (pathlib.Path or + py._path.local.LocalPath) designating the file to open, or an + already opened pd.HDFStore object .. versionadded:: 0.19.0 support for pathlib, py.path. key : group identifier in the store. Can be omitted if the HDF file contains a single pandas object. 
+    mode : string, {'r', 'r+', 'a'}, default 'r'. Mode to use when opening
+        the file. Ignored if path_or_buf is a pd.HDFStore.
    where : list of Term (or convertible) objects, optional
    start : optional, integer (defaults to None), row number to start
        selection
@@ -303,17 +316,24 @@
    """

-    if kwargs.get('mode', 'a') not in ['r', 'r+', 'a']:
+    if mode not in ['r', 'r+', 'a']:
        raise ValueError('mode {0} is not allowed while performing a read. '
-                        'Allowed modes are r, r+ and a.'
-                        .format(kwargs.get('mode')))
+                        'Allowed modes are r, r+ and a.'.format(mode))
    # grab the scope
    if 'where' in kwargs:
        kwargs['where'] = _ensure_term(kwargs['where'], scope_level=1)

-    path_or_buf = _stringify_path(path_or_buf)
-    if isinstance(path_or_buf, string_types):
+    if isinstance(path_or_buf, HDFStore):
+        if not path_or_buf.is_open:
+            raise IOError('The HDFStore must be open for reading.')
+        store = path_or_buf
+        auto_close = False
+    else:
+        path_or_buf = _stringify_path(path_or_buf)
+        if not isinstance(path_or_buf, string_types):
+            raise NotImplementedError('Support for generic buffers has not '
+                                      'been implemented.')
        try:
            exists = os.path.exists(path_or_buf)
@@ -325,22 +345,11 @@
            raise compat.FileNotFoundError(
                'File %s does not exist' % path_or_buf)

+        store = HDFStore(path_or_buf, mode=mode, **kwargs)
        # can't auto open/close if we are using an iterator
        # so delegate to the iterator
-        store = HDFStore(path_or_buf, **kwargs)
        auto_close = True

-    elif isinstance(path_or_buf, HDFStore):
-        if not path_or_buf.is_open:
-            raise IOError('The HDFStore must be open for reading.')
-
-        store = path_or_buf
-        auto_close = False
-
-    else:
-        raise NotImplementedError('Support for generic buffers has not been '
-                                  'implemented.')
-
    try:
        if key is None:
            groups = store.groups()
@@ -404,12 +413,17 @@ class HDFStore(StringMixin):
        and if the file does not exist it is created.
    ``'r+'``
        It is similar to ``'a'``, but the file must already exist.
-    complevel : int, 1-9, default 0
-        If a complib is specified compression will be applied
-        where possible
-    complib : {'zlib', 'bzip2', 'lzo', 'blosc', None}, default None
-        If complevel is > 0 apply compression to objects written
-        in the store wherever possible
+    complevel : int, 0-9, default None
+        Specifies a compression level for data.
+        A value of 0 disables compression.
+    complib : {'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib'
+        Specifies the compression library to be used.
+        As of v0.20.2 these additional compressors for Blosc are supported
+        (default if no compressor specified: 'blosc:blosclz'):
+        {'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy',
+         'blosc:zlib', 'blosc:zstd'}.
+        Specifying a compression library which is not available raises
+        a ValueError.
fletcher32 : bool, default False If applying compression use the fletcher32 checksum @@ -432,21 +446,28 @@ def __init__(self, path, mode=None, complevel=None, complib=None, raise ImportError('HDFStore requires PyTables, "{ex}" problem ' 'importing'.format(ex=str(ex))) - if complib not in (None, 'blosc', 'bzip2', 'lzo', 'zlib'): - raise ValueError("complib only supports 'blosc', 'bzip2', lzo' " - "or 'zlib' compression.") + if complib is not None and complib not in tables.filters.all_complibs: + raise ValueError( + "complib only supports {libs} compression.".format( + libs=tables.filters.all_complibs)) + + if complib is None and complevel is not None: + complib = tables.filters.default_complib - self._path = path + self._path = _stringify_path(path) if mode is None: mode = 'a' self._mode = mode self._handle = None - self._complevel = complevel + self._complevel = complevel if complevel else 0 self._complib = complib self._fletcher32 = fletcher32 self._filters = None self.open(mode=mode, **kwargs) + def __fspath__(self): + return self._path + @property def root(self): """ return the root node """ @@ -468,7 +489,6 @@ def __delitem__(self, key): def __getattr__(self, name): """ allow attribute access to get stores """ - self._check_if_open() try: return self.get(name) except: @@ -491,32 +511,7 @@ def __len__(self): return len(self.groups()) def __unicode__(self): - output = '%s\nFile path: %s\n' % (type(self), pprint_thing(self._path)) - if self.is_open: - lkeys = sorted(list(self.keys())) - if len(lkeys): - keys = [] - values = [] - - for k in lkeys: - try: - s = self.get_storer(k) - if s is not None: - keys.append(pprint_thing(s.pathname or k)) - values.append( - pprint_thing(s or 'invalid_HDFStore node')) - except Exception as detail: - keys.append(k) - values.append("[invalid_HDFStore node: %s]" - % pprint_thing(detail)) - - output += adjoin(12, keys, values) - else: - output += 'Empty' - else: - output += "File is CLOSED" - - return output + return '%s\nFile path: %s\n' % (type(self), pprint_thing(self._path)) def __enter__(self): return self @@ -576,11 +571,8 @@ def open(self, mode='a', **kwargs): if self.is_open: self.close() - if self._complib is not None: - if self._complevel is None: - self._complevel = 9 - self._filters = _tables().Filters(self._complevel, - self._complib, + if self._complevel and self._complevel > 0: + self._filters = _tables().Filters(self._complevel, self._complib, fletcher32=self._fletcher32) try: @@ -828,12 +820,12 @@ def func(_start, _stop, _where): # retrieve the objs, _where is always passed as a set of # coordinates here - objs = [t.read(where=_where, columns=columns, **kwargs) - for t in tbls] + objs = [t.read(where=_where, columns=columns, start=_start, + stop=_stop, **kwargs) for t in tbls] # concat and return return concat(objs, axis=axis, - verify_integrity=False).consolidate() + verify_integrity=False)._consolidate() # create the iterator it = TableIterator(self, s, func, where=where, nrows=nrows, @@ -1158,6 +1150,39 @@ def copy(self, file, mode='w', propindexes=True, keys=None, complib=None, return new_store + def info(self): + """ + print detailed information on the store + + .. 
versionadded:: 0.21.0
+        """
+        output = '%s\nFile path: %s\n' % (type(self), pprint_thing(self._path))
+        if self.is_open:
+            lkeys = sorted(list(self.keys()))
+            if len(lkeys):
+                keys = []
+                values = []
+
+                for k in lkeys:
+                    try:
+                        s = self.get_storer(k)
+                        if s is not None:
+                            keys.append(pprint_thing(s.pathname or k))
+                            values.append(
+                                pprint_thing(s or 'invalid_HDFStore node'))
+                    except Exception as detail:
+                        keys.append(k)
+                        values.append("[invalid_HDFStore node: %s]"
+                                      % pprint_thing(detail))
+
+                output += adjoin(12, keys, values)
+            else:
+                output += 'Empty'
+        else:
+            output += "File is CLOSED"
+
+        return output
+
    # private methods ######
    def _check_if_open(self):
        if not self.is_open:
@@ -1326,6 +1351,13 @@ def _read_group(self, group, **kwargs):

def get_store(path, **kwargs):
    """ Backwards compatible alias for ``HDFStore`` """
+    warnings.warn(
+        "get_store is deprecated and will be "
+        "removed in a future version\n"
+        "HDFStore(path, **kwargs) is the replacement",
+        FutureWarning,
+        stacklevel=6)
+
    return HDFStore(path, **kwargs)


@@ -1415,7 +1447,8 @@ def get_result(self, coordinates=False):

        # if specified read via coordinates (necessary for multiple selections)
        if coordinates:
-            where = self.s.read_coordinates(where=self.where)
+            where = self.s.read_coordinates(where=self.where, start=self.start,
+                                            stop=self.stop)
        else:
            where = self.where
@@ -2098,7 +2131,17 @@ def convert(self, values, nan_rep, encoding):

            # we have a categorical
            categories = self.metadata
-            self.data = Categorical.from_codes(self.data.ravel(),
+            codes = self.data.ravel()
+
+            # if we have stored a NaN in the categories
+            # then strip it; in theory we could have BOTH
+            # -1s in the codes and nulls :<
+            mask = isna(categories)
+            if mask.any():
+                categories = categories[~mask]
+                codes[codes != -1] -= mask.astype(int).cumsum().values
+
+            self.data = Categorical.from_codes(codes,
                                               categories=categories,
                                               ordered=self.ordered)
@@ -2546,10 +2589,10 @@ def read_index_node(self, node, start=None, stop=None):
        name = None

        if 'name' in node._v_attrs:
-            name = node._v_attrs.name
+            name = _ensure_str(node._v_attrs.name)

-        index_class = self._alias_to_class(getattr(node._v_attrs,
-                                                   'index_class', ''))
+        index_class = self._alias_to_class(_ensure_decoded(
+            getattr(node._v_attrs, 'index_class', '')))
        factory = self._get_index_factory(index_class)

        kwargs = {}
@@ -3408,10 +3451,12 @@ def create_axes(self, axes, obj, validate=True, nan_rep=None,
            if existing_table is not None:
                indexer = len(self.non_index_axes)
                exist_axis = existing_table.non_index_axes[indexer][1]
-                if append_axis != exist_axis:
+                if not array_equivalent(np.array(append_axis),
+                                        np.array(exist_axis)):

                    # ahah!
-> reindex - if sorted(append_axis) == sorted(exist_axis): + if array_equivalent(np.array(sorted(append_axis)), + np.array(sorted(exist_axis))): append_axis = exist_axis # the non_index_axes info @@ -3440,7 +3485,7 @@ def get_blk_items(mgr, blocks): return [mgr.items.take(blk.mgr_locs) for blk in blocks] # figure out data_columns and get out blocks - block_obj = self.get_object(obj).consolidate() + block_obj = self.get_object(obj)._consolidate() blocks = block_obj._data.blocks blk_items = get_blk_items(block_obj._data, blocks) if len(self.non_index_axes): @@ -3789,9 +3834,9 @@ def read(self, where=None, columns=None, **kwargs): lp = DataFrame(c.data, index=long_index, columns=c.values) # need a better algorithm - tuple_index = long_index._tuple_index + tuple_index = long_index.values - unique_tuples = lib.fast_unique(tuple_index.values) + unique_tuples = lib.fast_unique(tuple_index) unique_tuples = _asarray_tuplesafe(unique_tuples) indexer = match(unique_tuples, tuple_index) @@ -3807,7 +3852,7 @@ def read(self, where=None, columns=None, **kwargs): if len(objs) == 1: wp = objs[0] else: - wp = concat(objs, axis=0, verify_integrity=False).consolidate() + wp = concat(objs, axis=0, verify_integrity=False)._consolidate() # apply the selection filters & axis orderings wp = self.process_axes(wp, columns=columns) @@ -3896,7 +3941,7 @@ def write_data(self, chunksize, dropna=False): # figure the mask: only do if we can successfully process this # column, otherwise ignore the mask - mask = isnull(a.data).all(axis=0) + mask = isna(a.data).all(axis=0) if isinstance(mask, np.ndarray): masks.append(mask.astype('u1', copy=False)) @@ -4313,7 +4358,7 @@ def _reindex_axis(obj, axis, labels, other=None): labels = _ensure_index(labels.unique()) if other is not None: - labels = labels & _ensure_index(other.unique()) + labels = _ensure_index(other.unique()) & labels if not labels.equals(ax): slicer = [slice(None, None)] * obj.ndim slicer[axis] = labels @@ -4336,7 +4381,7 @@ def _get_tz(tz): """ for a tz-aware type, return an encoded zone """ zone = tslib.get_timezone(tz) if zone is None: - zone = tslib.tot_seconds(tz.utcoffset()) + zone = tz.utcoffset().total_seconds() return zone diff --git a/pandas/io/sas/__init__.py b/pandas/io/sas/__init__.py index e69de29bb2d1d..fa6b29a1a3fcc 100644 --- a/pandas/io/sas/__init__.py +++ b/pandas/io/sas/__init__.py @@ -0,0 +1 @@ +from .sasreader import read_sas # noqa diff --git a/pandas/io/sas/saslib.pyx b/pandas/io/sas/sas.pyx similarity index 100% rename from pandas/io/sas/saslib.pyx rename to pandas/io/sas/sas.pyx diff --git a/pandas/io/sas/sas7bdat.py b/pandas/io/sas/sas7bdat.py index 91f417abc0502..2b3a91e2062b1 100644 --- a/pandas/io/sas/sas7bdat.py +++ b/pandas/io/sas/sas7bdat.py @@ -20,7 +20,7 @@ import numpy as np import struct import pandas.io.sas.sas_constants as const -from pandas.io.sas.saslib import Parser +from pandas.io.sas._sas import Parser class _subheader_pointer(object): @@ -44,8 +44,8 @@ class SAS7BDATReader(BaseIterator): index : column identifier, defaults to None Column to use as index. convert_dates : boolean, defaults to True - Attempt to convert dates to Pandas datetime values. Note all - SAS date formats are supported. + Attempt to convert dates to Pandas datetime values. Note that + some rarely used SAS date formats may be unsupported. blank_missing : boolean, defaults to True Convert empty strings to missing values (SAS uses blanks to indicate missing character variables). 
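The `SAS7BDATReader` options documented above are normally reached through the top-level `pd.read_sas` entry point, which this diff now re-exports from `pandas/io/sas/__init__.py`. A minimal usage sketch follows; the `example.sas7bdat` file name is hypothetical:

```python
import pandas as pd

# Format is inferred from the ".sas7bdat" extension; columns in recognized
# SAS date formats are converted to datetime64 by default (convert_dates).
df = pd.read_sas("example.sas7bdat", encoding="latin-1")

# Large files can instead be processed incrementally.
for chunk in pd.read_sas("example.sas7bdat", chunksize=10000):
    print(chunk.shape)
```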
@@ -655,9 +655,15 @@ def _chunk_to_dataframe(self):
                rslt[name] = self._byte_chunk[jb, :].view(
                    dtype=self.byte_order + 'd')
                rslt[name] = np.asarray(rslt[name], dtype=np.float64)
-                if self.convert_dates and (self.column_formats[j] == "MMDDYY"):
-                    epoch = pd.datetime(1960, 1, 1)
-                    rslt[name] = epoch + pd.to_timedelta(rslt[name], unit='d')
+                if self.convert_dates:
+                    unit = None
+                    if self.column_formats[j] in const.sas_date_formats:
+                        unit = 'd'
+                    elif self.column_formats[j] in const.sas_datetime_formats:
+                        unit = 's'
+                    if unit:
+                        rslt[name] = pd.to_datetime(rslt[name], unit=unit,
+                                                    origin="1960-01-01")
                jb += 1
            elif self.column_types[j] == b's':
                rslt[name] = self._string_chunk[js, :]
diff --git a/pandas/io/sas/sas_constants.py b/pandas/io/sas/sas_constants.py
index 65ae1e9102cb2..c4b3588164305 100644
--- a/pandas/io/sas/sas_constants.py
+++ b/pandas/io/sas/sas_constants.py
@@ -145,3 +145,27 @@ class index:
    b"\xFF\xFF\xFF\xFE": index.columnListIndex,
    b"\xFE\xFF\xFF\xFF\xFF\xFF\xFF\xFF": index.columnListIndex,
    b"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFE": index.columnListIndex}
+
+
+# List of frequently used SAS date and datetime formats
+# http://support.sas.com/documentation/cdl/en/etsug/60372/HTML/default/viewer.htm#etsug_intervals_sect009.htm
+# https://github.com/epam/parso/blob/master/src/main/java/com/epam/parso/impl/SasFileConstants.java
+sas_date_formats = ("DATE", "DAY", "DDMMYY", "DOWNAME", "JULDAY", "JULIAN",
+                    "MMDDYY", "MMYY", "MMYYC", "MMYYD", "MMYYP", "MMYYS",
+                    "MMYYN", "MONNAME", "MONTH", "MONYY", "QTR", "QTRR",
+                    "NENGO", "WEEKDATE", "WEEKDATX", "WEEKDAY", "WEEKV",
+                    "WORDDATE", "WORDDATX", "YEAR", "YYMM", "YYMMC", "YYMMD",
+                    "YYMMP", "YYMMS", "YYMMN", "YYMON", "YYMMDD", "YYQ",
+                    "YYQC", "YYQD", "YYQP", "YYQS", "YYQN", "YYQR", "YYQRC",
+                    "YYQRD", "YYQRP", "YYQRS", "YYQRN",
+                    "YYMMDDP", "YYMMDDC", "E8601DA", "YYMMDDN", "MMDDYYC",
+                    "MMDDYYS", "MMDDYYD", "YYMMDDS", "B8601DA", "DDMMYYN",
+                    "YYMMDDD", "DDMMYYB", "DDMMYYP", "MMDDYYP", "YYMMDDB",
+                    "MMDDYYN", "DDMMYYC", "DDMMYYD", "DDMMYYS",
+                    "MINGUO")
+
+sas_datetime_formats = ("DATETIME", "DTWKDATX",
+                        "B8601DN", "B8601DT", "B8601DX", "B8601DZ", "B8601LX",
+                        "E8601DN", "E8601DT", "E8601DX", "E8601DZ", "E8601LX",
+                        "DATEAMPM", "DTDATE", "DTMONYY",
+                        "DTYEAR", "TOD", "MDYAMPM")
diff --git a/pandas/io/sas/sas_xport.py b/pandas/io/sas/sas_xport.py
index 76fc55154bc49..a43a5988a2ade 100644
--- a/pandas/io/sas/sas_xport.py
+++ b/pandas/io/sas/sas_xport.py
@@ -14,7 +14,7 @@
from pandas import compat
import struct
import numpy as np
-from pandas.util.decorators import Appender
+from pandas.util._decorators import Appender
import warnings

_correct_line1 = ("HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!"
diff --git a/pandas/io/sas/sasreader.py b/pandas/io/sas/sasreader.py
index 29e7f131fd9bc..b8a0bf5733158 100644
--- a/pandas/io/sas/sasreader.py
+++ b/pandas/io/sas/sasreader.py
@@ -2,11 +2,11 @@
Read SAS sas7bdat or xport files.
"""
from pandas import compat
+from pandas.io.common import _stringify_path


def read_sas(filepath_or_buffer, format=None, index=None, encoding=None,
             chunksize=None, iterator=False):
-
    """
    Read SAS files stored as either XPORT or SAS7BDAT format files.
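The `_chunk_to_dataframe` change earlier in this diff converts SAS numeric date columns with `pd.to_datetime` and an explicit `origin`, since SAS stores dates as days, and datetimes as seconds, elapsed since 1960-01-01. A small self-contained illustration of that epoch shift:

```python
import pandas as pd

# SAS date values are day counts from the SAS epoch (1960-01-01).
days = pd.Series([0.0, 365.0, 20000.0])
print(pd.to_datetime(days, unit="d", origin="1960-01-01"))
# 1960-01-01, 1960-12-31, 2014-10-04

# SAS datetime values are second counts from the same epoch.
secs = pd.Series([0.0, 86400.0])
print(pd.to_datetime(secs, unit="s", origin="1960-01-01"))
# 1960-01-01 00:00:00, 1960-01-02 00:00:00
```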
@@ -35,6 +35,7 @@ def read_sas(filepath_or_buffer, format=None, index=None, encoding=None, buffer_error_msg = ("If this is a buffer object rather " "than a string name, you must specify " "a format string") + filepath_or_buffer = _stringify_path(filepath_or_buffer) if not isinstance(filepath_or_buffer, compat.string_types): raise ValueError(buffer_error_msg) try: diff --git a/pandas/io/sql.py b/pandas/io/sql.py index 9fa01c413aca8..9aa47e5c69850 100644 --- a/pandas/io/sql.py +++ b/pandas/io/sql.py @@ -11,17 +11,18 @@ import re import numpy as np -import pandas.lib as lib -from pandas.types.missing import isnull -from pandas.types.dtypes import DatetimeTZDtype -from pandas.types.common import (is_list_like, is_dict_like, - is_datetime64tz_dtype) +import pandas._libs.lib as lib +from pandas.core.dtypes.missing import isna +from pandas.core.dtypes.dtypes import DatetimeTZDtype +from pandas.core.dtypes.common import ( + is_list_like, is_dict_like, + is_datetime64tz_dtype) from pandas.compat import (map, zip, raise_with_traceback, string_types, text_type) from pandas.core.api import DataFrame, Series from pandas.core.base import PandasObject -from pandas.tseries.tools import to_datetime +from pandas.core.tools.datetimes import to_datetime from contextlib import contextmanager @@ -431,7 +432,9 @@ def to_sql(frame, name, con, flavor=None, schema=None, if_exists='fail', library. If a DBAPI2 object, only sqlite3 is supported. flavor : 'sqlite', default None - DEPRECATED: this parameter will be removed in a future version + .. deprecated:: 0.19.0 + 'sqlite' is the only supported option if SQLAlchemy is not + used. schema : string, default None Name of SQL schema in database to write to (if database flavor supports this). If None, use default schema (default). @@ -483,7 +486,9 @@ def has_table(table_name, con, flavor=None, schema=None): library. If a DBAPI2 object, only sqlite3 is supported. flavor : 'sqlite', default None - DEPRECATED: this parameter will be removed in a future version + .. deprecated:: 0.19.0 + 'sqlite' is the only supported option if SQLAlchemy is not + installed. schema : string, default None Name of SQL schema in database to write to (if database flavor supports this). If None, use default schema (default). @@ -495,6 +500,7 @@ def has_table(table_name, con, flavor=None, schema=None): pandas_sql = pandasSQL_builder(con, flavor=flavor, schema=schema) return pandas_sql.has_table(table_name) + table_exists = has_table @@ -626,7 +632,7 @@ def insert_data(self): # replace NaN with None if b._can_hold_na: - mask = isnull(d) + mask = isna(d) d[mask] = None for col_loc, col in zip(b.mgr_locs, d): @@ -748,8 +754,9 @@ def _get_column_names_and_types(self, dtype_mapper): if self.index is not None: for i, idx_label in enumerate(self.index): idx_type = dtype_mapper( - self.frame.index.get_level_values(i)) - column_names_and_types.append((idx_label, idx_type, True)) + self.frame.index._get_level_values(i)) + column_names_and_types.append((text_type(idx_label), + idx_type, True)) column_names_and_types += [ (text_type(self.frame.columns[i]), @@ -838,7 +845,7 @@ def _harmonize_columns(self, parse_dates=None): except KeyError: pass # this column not in results - def _get_notnull_col_dtype(self, col): + def _get_notna_col_dtype(self, col): """ Infer datatype of the Series col. 
In case the dtype of col is 'object' and it contains NA values, this infers the datatype of the not-NA @@ -846,9 +853,9 @@ def _get_notnull_col_dtype(self, col): """ col_for_inference = col if col.dtype == 'object': - notnulldata = col[~isnull(col)] - if len(notnulldata): - col_for_inference = notnulldata + notnadata = col[~isna(col)] + if len(notnadata): + col_for_inference = notnadata return lib.infer_dtype(col_for_inference) @@ -858,7 +865,7 @@ def _sqlalchemy_type(self, col): if col.name in dtype: return self.dtype[col.name] - col_type = self._get_notnull_col_dtype(col) + col_type = self._get_notna_col_dtype(col) from sqlalchemy.types import (BigInteger, Integer, Float, Text, Boolean, @@ -1219,7 +1226,7 @@ def _create_sql_schema(self, frame, table_name, keys=None, dtype=None): def _get_unicode_name(name): try: - uname = name.encode("utf-8", "strict").decode("utf-8") + uname = text_type(name).encode("utf-8", "strict").decode("utf-8") except UnicodeError: raise ValueError("Cannot convert identifier to UTF-8: '%s'" % name) return uname @@ -1338,7 +1345,7 @@ def _sql_type_name(self, col): if col.name in dtype: return dtype[col.name] - col_type = self._get_notnull_col_dtype(col) + col_type = self._get_notna_col_dtype(col) if col_type == 'timedelta64': warnings.warn("the 'timedelta' type is not supported, and will be " "written as integer values (ns frequency) to the " @@ -1542,7 +1549,9 @@ def get_schema(frame, name, flavor=None, keys=None, con=None, dtype=None): library, default: None If a DBAPI2 object, only sqlite3 is supported. flavor : 'sqlite', default None - DEPRECATED: this parameter will be removed in a future version + .. deprecated:: 0.19.0 + 'sqlite' is the only supported option if SQLAlchemy is not + installed. dtype : dict of column name to SQL type, default None Optional specifying the datatype for columns. The SQL type should be a SQLAlchemy type, or a string for sqlite3 fallback connection. 
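Several of the hunks above touch the sqlite3 fallback, which is the only path available without SQLAlchemy now that `flavor` is deprecated. A minimal round-trip sketch; the table name and data are purely illustrative:

```python
import sqlite3

import pandas as pd

# A plain DBAPI connection triggers the sqlite3 fallback; no SQLAlchemy
# engine and no `flavor` argument are needed.
con = sqlite3.connect(":memory:")
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

df.to_sql("demo", con, index=False)
print(pd.read_sql("SELECT * FROM demo", con))
```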
diff --git a/pandas/io/stata.py b/pandas/io/stata.py index 2be7657883e88..253ed03c25db9 100644 --- a/pandas/io/stata.py +++ b/pandas/io/stata.py @@ -15,23 +15,28 @@ import struct from dateutil.relativedelta import relativedelta -from pandas.types.common import (is_categorical_dtype, is_datetime64_dtype, - _ensure_object) +from pandas.core.dtypes.common import ( + is_categorical_dtype, is_datetime64_dtype, + _ensure_object) from pandas.core.base import StringMixin from pandas.core.categorical import Categorical from pandas.core.frame import DataFrame from pandas.core.series import Series import datetime -from pandas import compat, to_timedelta, to_datetime, isnull, DatetimeIndex +from pandas import compat, to_timedelta, to_datetime, isna, DatetimeIndex from pandas.compat import lrange, lmap, lzip, text_type, string_types, range, \ zip, BytesIO -from pandas.util.decorators import Appender +from pandas.util._decorators import Appender import pandas as pd -from pandas.io.common import get_filepath_or_buffer, BaseIterator -from pandas.lib import max_len_string_array, infer_dtype -from pandas.tslib import NaT, Timestamp +from pandas.io.common import (get_filepath_or_buffer, BaseIterator, + _stringify_path) +from pandas._libs.lib import max_len_string_array, infer_dtype +from pandas._libs.tslib import NaT, Timestamp + +VALID_ENCODINGS = ('ascii', 'us-ascii', 'latin-1', 'latin_1', 'iso-8859-1', + 'iso8859-1', '8859', 'cp819', 'latin', 'latin1', 'L1') _version_error = ("Version of given Stata file is not 104, 105, 108, " "111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), " @@ -45,7 +50,7 @@ _encoding_params = """\ encoding : string, None or encoding - Encoding used to parse the files. None defaults to iso-8859-1.""" + Encoding used to parse the files. None defaults to latin-1.""" _statafile_processing_params2 = """\ index : identifier of index column @@ -105,9 +110,11 @@ _statafile_processing_params2, _chunksize_params, _iterator_params) -_data_method_doc = """Reads observations from Stata file, converting them into a dataframe +_data_method_doc = """\ +Reads observations from Stata file, converting them into a dataframe -This is a legacy method. Use `read` in new code. + .. deprecated:: + This is a legacy method. Use `read` in new code. Parameters ---------- @@ -395,7 +402,7 @@ def parse_dates_safe(dates, delta=False, year=False, days=False): return DataFrame(d, index=index) - bad_loc = isnull(dates) + bad_loc = isna(dates) index = dates.index if bad_loc.any(): dates = Series(dates) @@ -459,6 +466,7 @@ class PossiblePrecisionLoss(Warning): class ValueLabelTypeMismatch(Warning): pass + value_label_mismatch_doc = """ Stata value labels (pandas categories) must be strings. Column {0} contains non-string labels which will be converted to strings. Please check that the @@ -815,9 +823,14 @@ def get_base_missing_value(cls, dtype): class StataParser(object): - _default_encoding = 'iso-8859-1' + _default_encoding = 'latin-1' def __init__(self, encoding): + if encoding is not None: + if encoding not in VALID_ENCODINGS: + raise ValueError('Unknown encoding. Only latin-1 and ascii ' + 'supported.') + self._encoding = encoding # type code. 
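With the `VALID_ENCODINGS` whitelist introduced above, `StataParser` rejects anything other than a latin-1 or ascii alias up front. A sketch of the expected behavior; the `.dta` path is hypothetical, and the encoding check should fire before the file is even opened:

```python
import pandas as pd

try:
    # "utf-8" is not in VALID_ENCODINGS, so this should fail fast.
    pd.read_stata("data.dta", encoding="utf-8")
except ValueError as err:
    print(err)  # Unknown encoding. Only latin-1 and ascii supported.
```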
@@ -935,7 +948,7 @@ def __init__(self, path_or_buf, convert_dates=True,
                 convert_categoricals=True, index=None,
                 convert_missing=False, preserve_dtypes=True,
                 columns=None, order_categoricals=True,
-                 encoding='iso-8859-1', chunksize=None):
+                 encoding='latin-1', chunksize=None):
        super(StataReader, self).__init__(encoding)
        self.col_sizes = ()
@@ -948,6 +961,10 @@
        self._preserve_dtypes = preserve_dtypes
        self._columns = columns
        self._order_categoricals = order_categoricals
+        if encoding is not None:
+            if encoding not in VALID_ENCODINGS:
+                raise ValueError('Unknown encoding. Only latin-1 and ascii '
+                                 'supported.')
        self._encoding = encoding
        self._chunksize = chunksize
@@ -962,6 +979,7 @@
        self._lines_read = 0

        self._native_byteorder = _set_endianness(sys.byteorder)
+        path_or_buf = _stringify_path(path_or_buf)
        if isinstance(path_or_buf, str):
            path_or_buf, encoding, _ = get_filepath_or_buffer(
                path_or_buf, encoding=self._default_encoding
@@ -979,6 +997,7 @@
            self.path_or_buf = BytesIO(contents)

        self._read_header()
+        self._setup_dtype()

    def __enter__(self):
        """ enter context manager """
@@ -1281,6 +1300,23 @@ def _read_old_header(self, first_char):
        # necessary data to continue parsing
        self.data_location = self.path_or_buf.tell()

+    def _setup_dtype(self):
+        """Map between numpy and Stata dtypes."""
+        if self._dtype is not None:
+            return self._dtype
+
+        dtype = []  # Convert struct data types to numpy data type
+        for i, typ in enumerate(self.typlist):
+            if typ in self.NUMPY_TYPE_MAP:
+                dtype.append(('s' + str(i), self.byteorder +
+                              self.NUMPY_TYPE_MAP[typ]))
+            else:
+                dtype.append(('s' + str(i), 'S' + str(typ)))
+        dtype = np.dtype(dtype)
+        self._dtype = dtype
+
+        return self._dtype
+
    def _calcsize(self, fmt):
        return (type(fmt) is int and fmt or
                struct.calcsize(self.byteorder + fmt))
@@ -1361,7 +1397,8 @@ def _read_value_labels(self):

    def _read_strls(self):
        self.path_or_buf.seek(self.seek_strls)
-        self.GSO = {0: ''}
+        # Wrap v_o in a string to allow uint64 values as keys on 32bit OS
+        self.GSO = {'0': ''}
        while True:
            if self.path_or_buf.read(3) != b'GSO':
                break
@@ -1386,10 +1423,11 @@
            if self.format_version == 117:
                encoding = self._encoding or self._default_encoding
                va = va[0:-1].decode(encoding)
-            self.GSO[v_o] = va
+            # Wrap v_o in a string to allow uint64 values as keys on 32bit OS
+            self.GSO[str(v_o)] = va

    # legacy
-    @Appender('DEPRECATED: ' + _data_method_doc)
+    @Appender(_data_method_doc)
    def data(self, **kwargs):
        import warnings
@@ -1452,22 +1490,10 @@ def read(self, nrows=None, convert_dates=None,
        if nrows is None:
            nrows = self.nobs

-        if (self.format_version >= 117) and (self._dtype is None):
+        if (self.format_version >= 117) and (not self._value_labels_read):
            self._can_read_value_labels = True
            self._read_strls()

-        # Setup the dtype.
@@ -1452,22 +1490,10 @@ def read(self, nrows=None, convert_dates=None,
         if nrows is None:
             nrows = self.nobs
 
-        if (self.format_version >= 117) and (self._dtype is None):
+        if (self.format_version >= 117) and (not self._value_labels_read):
             self._can_read_value_labels = True
             self._read_strls()
 
-        # Setup the dtype.
-        if self._dtype is None:
-            dtype = []  # Convert struct data types to numpy data type
-            for i, typ in enumerate(self.typlist):
-                if typ in self.NUMPY_TYPE_MAP:
-                    dtype.append(('s' + str(i), self.byteorder +
-                                  self.NUMPY_TYPE_MAP[typ]))
-                else:
-                    dtype.append(('s' + str(i), 'S' + str(typ)))
-            dtype = np.dtype(dtype)
-            self._dtype = dtype
-
         # Read data
         dtype = self._dtype
         max_read_len = (self.nobs - self._lines_read) * dtype.itemsize
@@ -1622,7 +1648,8 @@ def _insert_strls(self, data):
         for i, typ in enumerate(self.typlist):
             if typ != 'Q':
                 continue
-            data.iloc[:, i] = [self.GSO[k] for k in data.iloc[:, i]]
+            # Wrap v_o in a string to allow uint64 values as keys on 32bit OS
+            data.iloc[:, i] = [self.GSO[str(k)] for k in data.iloc[:, i]]
         return data
 
     def _do_select_columns(self, data, columns):
@@ -1854,7 +1881,7 @@ class StataWriter(StataParser):
     write_index : bool
         Write the index to Stata dataset.
     encoding : str
-        Default is latin-1. Unicode is not supported
+        Default is latin-1. Only latin-1 and ascii are supported.
     byteorder : str
         Can be ">", "<", "little", or "big". default is `sys.byteorder`
     time_stamp : datetime
@@ -1913,7 +1940,7 @@ def __init__(self, fname, data, convert_dates=None, write_index=True,
         if byteorder is None:
             byteorder = sys.byteorder
         self._byteorder = _set_endianness(byteorder)
-        self._fname = fname
+        self._fname = _stringify_path(fname)
        self.type_converters = {253: np.int32, 252: np.int16, 251: np.int8}
 
     def _write(self, to_write):
@@ -1937,7 +1964,6 @@ def _prepare_categoricals(self, data):
             return data
 
         get_base_missing_value = StataMissingValue.get_base_missing_value
-        index = data.index
         data_formatted = []
         for col, col_is_cat in zip(data, is_cat):
             if col_is_cat:
@@ -1960,8 +1986,7 @@ def _prepare_categoricals(self, data):
 
                 # Replace missing values with Stata missing value for type
                 values[values == -1] = get_base_missing_value(dtype)
-                data_formatted.append((col, values, index))
-
+                data_formatted.append((col, values))
             else:
                 data_formatted.append((col, data[col]))
         return DataFrame.from_items(data_formatted)
diff --git a/pandas/io/tests/data/legacy_hdf/legacy.h5 b/pandas/io/tests/data/legacy_hdf/legacy.h5
deleted file mode 100644
index 38b822dd16994..0000000000000
Binary files a/pandas/io/tests/data/legacy_hdf/legacy.h5 and /dev/null differ
diff --git a/pandas/io/tests/sas/data/productsales.csv b/pandas/io/tests/sas/data/productsales.csv
deleted file mode 100644
index fea9b68912297..0000000000000
--- a/pandas/io/tests/sas/data/productsales.csv
+++ /dev/null
@@ -1,1441 +0,0 @@
-ACTUAL,PREDICT,COUNTRY,REGION,DIVISION,PRODTYPE,PRODUCT,QUARTER,YEAR,MONTH
-925,850,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
-999,297,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12085
-608,846,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12113
-642,533,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1993,12144
-656,646,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1993,12174
-948,486,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1993,12205
-612,717,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1993,12235
-114,564,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1993,12266
-685,230,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1993,12297
-657,494,CANADA,EAST,EDUCATION,FURNITURE,SOFA,4,1993,12327
-608,903,CANADA,EAST,EDUCATION,FURNITURE,SOFA,4,1993,12358
-353,266,CANADA,EAST,EDUCATION,FURNITURE,SOFA,4,1993,12388
-107,190,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1994,12419
-354,139,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1994,12450
-101,217,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1994,12478
-553,560,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1994,12509 -877,148,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1994,12539 -431,762,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1994,12570 -511,457,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1994,12600 -157,532,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1994,12631 -520,629,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1994,12662 -114,491,CANADA,EAST,EDUCATION,FURNITURE,SOFA,4,1994,12692 -277,0,CANADA,EAST,EDUCATION,FURNITURE,SOFA,4,1994,12723 -561,979,CANADA,EAST,EDUCATION,FURNITURE,SOFA,4,1994,12753 -220,585,CANADA,EAST,EDUCATION,FURNITURE,BED,1,1993,12054 -444,267,CANADA,EAST,EDUCATION,FURNITURE,BED,1,1993,12085 -178,487,CANADA,EAST,EDUCATION,FURNITURE,BED,1,1993,12113 -756,764,CANADA,EAST,EDUCATION,FURNITURE,BED,2,1993,12144 -329,312,CANADA,EAST,EDUCATION,FURNITURE,BED,2,1993,12174 -910,531,CANADA,EAST,EDUCATION,FURNITURE,BED,2,1993,12205 -530,536,CANADA,EAST,EDUCATION,FURNITURE,BED,3,1993,12235 -101,773,CANADA,EAST,EDUCATION,FURNITURE,BED,3,1993,12266 -515,143,CANADA,EAST,EDUCATION,FURNITURE,BED,3,1993,12297 -730,126,CANADA,EAST,EDUCATION,FURNITURE,BED,4,1993,12327 -993,862,CANADA,EAST,EDUCATION,FURNITURE,BED,4,1993,12358 -954,754,CANADA,EAST,EDUCATION,FURNITURE,BED,4,1993,12388 -267,410,CANADA,EAST,EDUCATION,FURNITURE,BED,1,1994,12419 -347,701,CANADA,EAST,EDUCATION,FURNITURE,BED,1,1994,12450 -991,204,CANADA,EAST,EDUCATION,FURNITURE,BED,1,1994,12478 -923,509,CANADA,EAST,EDUCATION,FURNITURE,BED,2,1994,12509 -437,378,CANADA,EAST,EDUCATION,FURNITURE,BED,2,1994,12539 -737,507,CANADA,EAST,EDUCATION,FURNITURE,BED,2,1994,12570 -104,49,CANADA,EAST,EDUCATION,FURNITURE,BED,3,1994,12600 -840,876,CANADA,EAST,EDUCATION,FURNITURE,BED,3,1994,12631 -704,66,CANADA,EAST,EDUCATION,FURNITURE,BED,3,1994,12662 -889,819,CANADA,EAST,EDUCATION,FURNITURE,BED,4,1994,12692 -107,351,CANADA,EAST,EDUCATION,FURNITURE,BED,4,1994,12723 -571,201,CANADA,EAST,EDUCATION,FURNITURE,BED,4,1994,12753 -688,209,CANADA,EAST,EDUCATION,OFFICE,TABLE,1,1993,12054 -544,51,CANADA,EAST,EDUCATION,OFFICE,TABLE,1,1993,12085 -954,135,CANADA,EAST,EDUCATION,OFFICE,TABLE,1,1993,12113 -445,47,CANADA,EAST,EDUCATION,OFFICE,TABLE,2,1993,12144 -829,379,CANADA,EAST,EDUCATION,OFFICE,TABLE,2,1993,12174 -464,758,CANADA,EAST,EDUCATION,OFFICE,TABLE,2,1993,12205 -968,475,CANADA,EAST,EDUCATION,OFFICE,TABLE,3,1993,12235 -842,343,CANADA,EAST,EDUCATION,OFFICE,TABLE,3,1993,12266 -721,507,CANADA,EAST,EDUCATION,OFFICE,TABLE,3,1993,12297 -966,269,CANADA,EAST,EDUCATION,OFFICE,TABLE,4,1993,12327 -332,699,CANADA,EAST,EDUCATION,OFFICE,TABLE,4,1993,12358 -328,824,CANADA,EAST,EDUCATION,OFFICE,TABLE,4,1993,12388 -355,497,CANADA,EAST,EDUCATION,OFFICE,TABLE,1,1994,12419 -506,44,CANADA,EAST,EDUCATION,OFFICE,TABLE,1,1994,12450 -585,522,CANADA,EAST,EDUCATION,OFFICE,TABLE,1,1994,12478 -634,378,CANADA,EAST,EDUCATION,OFFICE,TABLE,2,1994,12509 -662,689,CANADA,EAST,EDUCATION,OFFICE,TABLE,2,1994,12539 -783,90,CANADA,EAST,EDUCATION,OFFICE,TABLE,2,1994,12570 -786,720,CANADA,EAST,EDUCATION,OFFICE,TABLE,3,1994,12600 -710,343,CANADA,EAST,EDUCATION,OFFICE,TABLE,3,1994,12631 -950,457,CANADA,EAST,EDUCATION,OFFICE,TABLE,3,1994,12662 -274,947,CANADA,EAST,EDUCATION,OFFICE,TABLE,4,1994,12692 -406,834,CANADA,EAST,EDUCATION,OFFICE,TABLE,4,1994,12723 -515,71,CANADA,EAST,EDUCATION,OFFICE,TABLE,4,1994,12753 -35,282,CANADA,EAST,EDUCATION,OFFICE,CHAIR,1,1993,12054 -995,538,CANADA,EAST,EDUCATION,OFFICE,CHAIR,1,1993,12085 -670,679,CANADA,EAST,EDUCATION,OFFICE,CHAIR,1,1993,12113 -406,601,CANADA,EAST,EDUCATION,OFFICE,CHAIR,2,1993,12144 
-825,577,CANADA,EAST,EDUCATION,OFFICE,CHAIR,2,1993,12174 -467,908,CANADA,EAST,EDUCATION,OFFICE,CHAIR,2,1993,12205 -709,819,CANADA,EAST,EDUCATION,OFFICE,CHAIR,3,1993,12235 -522,687,CANADA,EAST,EDUCATION,OFFICE,CHAIR,3,1993,12266 -688,157,CANADA,EAST,EDUCATION,OFFICE,CHAIR,3,1993,12297 -956,111,CANADA,EAST,EDUCATION,OFFICE,CHAIR,4,1993,12327 -129,31,CANADA,EAST,EDUCATION,OFFICE,CHAIR,4,1993,12358 -687,790,CANADA,EAST,EDUCATION,OFFICE,CHAIR,4,1993,12388 -877,795,CANADA,EAST,EDUCATION,OFFICE,CHAIR,1,1994,12419 -845,379,CANADA,EAST,EDUCATION,OFFICE,CHAIR,1,1994,12450 -425,114,CANADA,EAST,EDUCATION,OFFICE,CHAIR,1,1994,12478 -899,475,CANADA,EAST,EDUCATION,OFFICE,CHAIR,2,1994,12509 -987,747,CANADA,EAST,EDUCATION,OFFICE,CHAIR,2,1994,12539 -641,372,CANADA,EAST,EDUCATION,OFFICE,CHAIR,2,1994,12570 -448,415,CANADA,EAST,EDUCATION,OFFICE,CHAIR,3,1994,12600 -341,955,CANADA,EAST,EDUCATION,OFFICE,CHAIR,3,1994,12631 -137,356,CANADA,EAST,EDUCATION,OFFICE,CHAIR,3,1994,12662 -235,316,CANADA,EAST,EDUCATION,OFFICE,CHAIR,4,1994,12692 -482,351,CANADA,EAST,EDUCATION,OFFICE,CHAIR,4,1994,12723 -678,164,CANADA,EAST,EDUCATION,OFFICE,CHAIR,4,1994,12753 -240,386,CANADA,EAST,EDUCATION,OFFICE,DESK,1,1993,12054 -605,113,CANADA,EAST,EDUCATION,OFFICE,DESK,1,1993,12085 -274,68,CANADA,EAST,EDUCATION,OFFICE,DESK,1,1993,12113 -422,885,CANADA,EAST,EDUCATION,OFFICE,DESK,2,1993,12144 -763,575,CANADA,EAST,EDUCATION,OFFICE,DESK,2,1993,12174 -561,743,CANADA,EAST,EDUCATION,OFFICE,DESK,2,1993,12205 -339,816,CANADA,EAST,EDUCATION,OFFICE,DESK,3,1993,12235 -877,203,CANADA,EAST,EDUCATION,OFFICE,DESK,3,1993,12266 -192,581,CANADA,EAST,EDUCATION,OFFICE,DESK,3,1993,12297 -604,815,CANADA,EAST,EDUCATION,OFFICE,DESK,4,1993,12327 -55,333,CANADA,EAST,EDUCATION,OFFICE,DESK,4,1993,12358 -87,40,CANADA,EAST,EDUCATION,OFFICE,DESK,4,1993,12388 -942,672,CANADA,EAST,EDUCATION,OFFICE,DESK,1,1994,12419 -912,23,CANADA,EAST,EDUCATION,OFFICE,DESK,1,1994,12450 -768,948,CANADA,EAST,EDUCATION,OFFICE,DESK,1,1994,12478 -951,291,CANADA,EAST,EDUCATION,OFFICE,DESK,2,1994,12509 -768,839,CANADA,EAST,EDUCATION,OFFICE,DESK,2,1994,12539 -978,864,CANADA,EAST,EDUCATION,OFFICE,DESK,2,1994,12570 -20,337,CANADA,EAST,EDUCATION,OFFICE,DESK,3,1994,12600 -298,95,CANADA,EAST,EDUCATION,OFFICE,DESK,3,1994,12631 -193,535,CANADA,EAST,EDUCATION,OFFICE,DESK,3,1994,12662 -336,191,CANADA,EAST,EDUCATION,OFFICE,DESK,4,1994,12692 -617,412,CANADA,EAST,EDUCATION,OFFICE,DESK,4,1994,12723 -709,711,CANADA,EAST,EDUCATION,OFFICE,DESK,4,1994,12753 -5,425,CANADA,EAST,CONSUMER,FURNITURE,SOFA,1,1993,12054 -164,215,CANADA,EAST,CONSUMER,FURNITURE,SOFA,1,1993,12085 -422,948,CANADA,EAST,CONSUMER,FURNITURE,SOFA,1,1993,12113 -424,544,CANADA,EAST,CONSUMER,FURNITURE,SOFA,2,1993,12144 -854,764,CANADA,EAST,CONSUMER,FURNITURE,SOFA,2,1993,12174 -168,446,CANADA,EAST,CONSUMER,FURNITURE,SOFA,2,1993,12205 -8,957,CANADA,EAST,CONSUMER,FURNITURE,SOFA,3,1993,12235 -748,967,CANADA,EAST,CONSUMER,FURNITURE,SOFA,3,1993,12266 -682,11,CANADA,EAST,CONSUMER,FURNITURE,SOFA,3,1993,12297 -300,110,CANADA,EAST,CONSUMER,FURNITURE,SOFA,4,1993,12327 -672,263,CANADA,EAST,CONSUMER,FURNITURE,SOFA,4,1993,12358 -894,215,CANADA,EAST,CONSUMER,FURNITURE,SOFA,4,1993,12388 -944,965,CANADA,EAST,CONSUMER,FURNITURE,SOFA,1,1994,12419 -403,423,CANADA,EAST,CONSUMER,FURNITURE,SOFA,1,1994,12450 -596,753,CANADA,EAST,CONSUMER,FURNITURE,SOFA,1,1994,12478 -481,770,CANADA,EAST,CONSUMER,FURNITURE,SOFA,2,1994,12509 -503,263,CANADA,EAST,CONSUMER,FURNITURE,SOFA,2,1994,12539 -126,79,CANADA,EAST,CONSUMER,FURNITURE,SOFA,2,1994,12570 
-721,441,CANADA,EAST,CONSUMER,FURNITURE,SOFA,3,1994,12600 -271,858,CANADA,EAST,CONSUMER,FURNITURE,SOFA,3,1994,12631 -721,667,CANADA,EAST,CONSUMER,FURNITURE,SOFA,3,1994,12662 -157,193,CANADA,EAST,CONSUMER,FURNITURE,SOFA,4,1994,12692 -991,394,CANADA,EAST,CONSUMER,FURNITURE,SOFA,4,1994,12723 -499,680,CANADA,EAST,CONSUMER,FURNITURE,SOFA,4,1994,12753 -284,414,CANADA,EAST,CONSUMER,FURNITURE,BED,1,1993,12054 -705,770,CANADA,EAST,CONSUMER,FURNITURE,BED,1,1993,12085 -737,679,CANADA,EAST,CONSUMER,FURNITURE,BED,1,1993,12113 -745,7,CANADA,EAST,CONSUMER,FURNITURE,BED,2,1993,12144 -633,713,CANADA,EAST,CONSUMER,FURNITURE,BED,2,1993,12174 -983,851,CANADA,EAST,CONSUMER,FURNITURE,BED,2,1993,12205 -591,944,CANADA,EAST,CONSUMER,FURNITURE,BED,3,1993,12235 -42,130,CANADA,EAST,CONSUMER,FURNITURE,BED,3,1993,12266 -771,485,CANADA,EAST,CONSUMER,FURNITURE,BED,3,1993,12297 -465,23,CANADA,EAST,CONSUMER,FURNITURE,BED,4,1993,12327 -296,193,CANADA,EAST,CONSUMER,FURNITURE,BED,4,1993,12358 -890,7,CANADA,EAST,CONSUMER,FURNITURE,BED,4,1993,12388 -312,919,CANADA,EAST,CONSUMER,FURNITURE,BED,1,1994,12419 -777,768,CANADA,EAST,CONSUMER,FURNITURE,BED,1,1994,12450 -364,854,CANADA,EAST,CONSUMER,FURNITURE,BED,1,1994,12478 -601,411,CANADA,EAST,CONSUMER,FURNITURE,BED,2,1994,12509 -823,736,CANADA,EAST,CONSUMER,FURNITURE,BED,2,1994,12539 -847,10,CANADA,EAST,CONSUMER,FURNITURE,BED,2,1994,12570 -490,311,CANADA,EAST,CONSUMER,FURNITURE,BED,3,1994,12600 -387,348,CANADA,EAST,CONSUMER,FURNITURE,BED,3,1994,12631 -688,458,CANADA,EAST,CONSUMER,FURNITURE,BED,3,1994,12662 -650,195,CANADA,EAST,CONSUMER,FURNITURE,BED,4,1994,12692 -447,658,CANADA,EAST,CONSUMER,FURNITURE,BED,4,1994,12723 -91,704,CANADA,EAST,CONSUMER,FURNITURE,BED,4,1994,12753 -197,807,CANADA,EAST,CONSUMER,OFFICE,TABLE,1,1993,12054 -51,861,CANADA,EAST,CONSUMER,OFFICE,TABLE,1,1993,12085 -570,873,CANADA,EAST,CONSUMER,OFFICE,TABLE,1,1993,12113 -423,933,CANADA,EAST,CONSUMER,OFFICE,TABLE,2,1993,12144 -524,355,CANADA,EAST,CONSUMER,OFFICE,TABLE,2,1993,12174 -416,794,CANADA,EAST,CONSUMER,OFFICE,TABLE,2,1993,12205 -789,645,CANADA,EAST,CONSUMER,OFFICE,TABLE,3,1993,12235 -551,700,CANADA,EAST,CONSUMER,OFFICE,TABLE,3,1993,12266 -400,831,CANADA,EAST,CONSUMER,OFFICE,TABLE,3,1993,12297 -361,800,CANADA,EAST,CONSUMER,OFFICE,TABLE,4,1993,12327 -189,830,CANADA,EAST,CONSUMER,OFFICE,TABLE,4,1993,12358 -554,828,CANADA,EAST,CONSUMER,OFFICE,TABLE,4,1993,12388 -585,12,CANADA,EAST,CONSUMER,OFFICE,TABLE,1,1994,12419 -281,501,CANADA,EAST,CONSUMER,OFFICE,TABLE,1,1994,12450 -629,914,CANADA,EAST,CONSUMER,OFFICE,TABLE,1,1994,12478 -43,685,CANADA,EAST,CONSUMER,OFFICE,TABLE,2,1994,12509 -533,755,CANADA,EAST,CONSUMER,OFFICE,TABLE,2,1994,12539 -882,708,CANADA,EAST,CONSUMER,OFFICE,TABLE,2,1994,12570 -790,595,CANADA,EAST,CONSUMER,OFFICE,TABLE,3,1994,12600 -600,32,CANADA,EAST,CONSUMER,OFFICE,TABLE,3,1994,12631 -148,49,CANADA,EAST,CONSUMER,OFFICE,TABLE,3,1994,12662 -237,727,CANADA,EAST,CONSUMER,OFFICE,TABLE,4,1994,12692 -488,239,CANADA,EAST,CONSUMER,OFFICE,TABLE,4,1994,12723 -457,273,CANADA,EAST,CONSUMER,OFFICE,TABLE,4,1994,12753 -401,986,CANADA,EAST,CONSUMER,OFFICE,CHAIR,1,1993,12054 -181,544,CANADA,EAST,CONSUMER,OFFICE,CHAIR,1,1993,12085 -995,182,CANADA,EAST,CONSUMER,OFFICE,CHAIR,1,1993,12113 -120,197,CANADA,EAST,CONSUMER,OFFICE,CHAIR,2,1993,12144 -119,435,CANADA,EAST,CONSUMER,OFFICE,CHAIR,2,1993,12174 -319,974,CANADA,EAST,CONSUMER,OFFICE,CHAIR,2,1993,12205 -333,524,CANADA,EAST,CONSUMER,OFFICE,CHAIR,3,1993,12235 -923,688,CANADA,EAST,CONSUMER,OFFICE,CHAIR,3,1993,12266 -634,750,CANADA,EAST,CONSUMER,OFFICE,CHAIR,3,1993,12297 
-493,155,CANADA,EAST,CONSUMER,OFFICE,CHAIR,4,1993,12327 -461,860,CANADA,EAST,CONSUMER,OFFICE,CHAIR,4,1993,12358 -304,102,CANADA,EAST,CONSUMER,OFFICE,CHAIR,4,1993,12388 -641,425,CANADA,EAST,CONSUMER,OFFICE,CHAIR,1,1994,12419 -992,224,CANADA,EAST,CONSUMER,OFFICE,CHAIR,1,1994,12450 -202,408,CANADA,EAST,CONSUMER,OFFICE,CHAIR,1,1994,12478 -770,524,CANADA,EAST,CONSUMER,OFFICE,CHAIR,2,1994,12509 -202,816,CANADA,EAST,CONSUMER,OFFICE,CHAIR,2,1994,12539 -14,515,CANADA,EAST,CONSUMER,OFFICE,CHAIR,2,1994,12570 -134,793,CANADA,EAST,CONSUMER,OFFICE,CHAIR,3,1994,12600 -977,460,CANADA,EAST,CONSUMER,OFFICE,CHAIR,3,1994,12631 -174,732,CANADA,EAST,CONSUMER,OFFICE,CHAIR,3,1994,12662 -429,435,CANADA,EAST,CONSUMER,OFFICE,CHAIR,4,1994,12692 -514,38,CANADA,EAST,CONSUMER,OFFICE,CHAIR,4,1994,12723 -784,616,CANADA,EAST,CONSUMER,OFFICE,CHAIR,4,1994,12753 -973,225,CANADA,EAST,CONSUMER,OFFICE,DESK,1,1993,12054 -511,402,CANADA,EAST,CONSUMER,OFFICE,DESK,1,1993,12085 -30,697,CANADA,EAST,CONSUMER,OFFICE,DESK,1,1993,12113 -895,567,CANADA,EAST,CONSUMER,OFFICE,DESK,2,1993,12144 -557,231,CANADA,EAST,CONSUMER,OFFICE,DESK,2,1993,12174 -282,372,CANADA,EAST,CONSUMER,OFFICE,DESK,2,1993,12205 -909,15,CANADA,EAST,CONSUMER,OFFICE,DESK,3,1993,12235 -276,866,CANADA,EAST,CONSUMER,OFFICE,DESK,3,1993,12266 -234,452,CANADA,EAST,CONSUMER,OFFICE,DESK,3,1993,12297 -479,663,CANADA,EAST,CONSUMER,OFFICE,DESK,4,1993,12327 -782,982,CANADA,EAST,CONSUMER,OFFICE,DESK,4,1993,12358 -755,813,CANADA,EAST,CONSUMER,OFFICE,DESK,4,1993,12388 -689,523,CANADA,EAST,CONSUMER,OFFICE,DESK,1,1994,12419 -496,871,CANADA,EAST,CONSUMER,OFFICE,DESK,1,1994,12450 -24,511,CANADA,EAST,CONSUMER,OFFICE,DESK,1,1994,12478 -379,819,CANADA,EAST,CONSUMER,OFFICE,DESK,2,1994,12509 -441,525,CANADA,EAST,CONSUMER,OFFICE,DESK,2,1994,12539 -49,13,CANADA,EAST,CONSUMER,OFFICE,DESK,2,1994,12570 -243,694,CANADA,EAST,CONSUMER,OFFICE,DESK,3,1994,12600 -295,782,CANADA,EAST,CONSUMER,OFFICE,DESK,3,1994,12631 -395,839,CANADA,EAST,CONSUMER,OFFICE,DESK,3,1994,12662 -929,461,CANADA,EAST,CONSUMER,OFFICE,DESK,4,1994,12692 -997,303,CANADA,EAST,CONSUMER,OFFICE,DESK,4,1994,12723 -889,421,CANADA,EAST,CONSUMER,OFFICE,DESK,4,1994,12753 -72,421,CANADA,WEST,EDUCATION,FURNITURE,SOFA,1,1993,12054 -926,433,CANADA,WEST,EDUCATION,FURNITURE,SOFA,1,1993,12085 -850,394,CANADA,WEST,EDUCATION,FURNITURE,SOFA,1,1993,12113 -826,338,CANADA,WEST,EDUCATION,FURNITURE,SOFA,2,1993,12144 -651,764,CANADA,WEST,EDUCATION,FURNITURE,SOFA,2,1993,12174 -854,216,CANADA,WEST,EDUCATION,FURNITURE,SOFA,2,1993,12205 -899,96,CANADA,WEST,EDUCATION,FURNITURE,SOFA,3,1993,12235 -309,550,CANADA,WEST,EDUCATION,FURNITURE,SOFA,3,1993,12266 -943,636,CANADA,WEST,EDUCATION,FURNITURE,SOFA,3,1993,12297 -138,427,CANADA,WEST,EDUCATION,FURNITURE,SOFA,4,1993,12327 -99,652,CANADA,WEST,EDUCATION,FURNITURE,SOFA,4,1993,12358 -270,478,CANADA,WEST,EDUCATION,FURNITURE,SOFA,4,1993,12388 -862,18,CANADA,WEST,EDUCATION,FURNITURE,SOFA,1,1994,12419 -574,40,CANADA,WEST,EDUCATION,FURNITURE,SOFA,1,1994,12450 -359,453,CANADA,WEST,EDUCATION,FURNITURE,SOFA,1,1994,12478 -958,987,CANADA,WEST,EDUCATION,FURNITURE,SOFA,2,1994,12509 -791,26,CANADA,WEST,EDUCATION,FURNITURE,SOFA,2,1994,12539 -284,101,CANADA,WEST,EDUCATION,FURNITURE,SOFA,2,1994,12570 -190,969,CANADA,WEST,EDUCATION,FURNITURE,SOFA,3,1994,12600 -527,492,CANADA,WEST,EDUCATION,FURNITURE,SOFA,3,1994,12631 -112,263,CANADA,WEST,EDUCATION,FURNITURE,SOFA,3,1994,12662 -271,593,CANADA,WEST,EDUCATION,FURNITURE,SOFA,4,1994,12692 -643,923,CANADA,WEST,EDUCATION,FURNITURE,SOFA,4,1994,12723 
-554,146,CANADA,WEST,EDUCATION,FURNITURE,SOFA,4,1994,12753 -211,305,CANADA,WEST,EDUCATION,FURNITURE,BED,1,1993,12054 -368,318,CANADA,WEST,EDUCATION,FURNITURE,BED,1,1993,12085 -778,417,CANADA,WEST,EDUCATION,FURNITURE,BED,1,1993,12113 -808,623,CANADA,WEST,EDUCATION,FURNITURE,BED,2,1993,12144 -46,761,CANADA,WEST,EDUCATION,FURNITURE,BED,2,1993,12174 -466,272,CANADA,WEST,EDUCATION,FURNITURE,BED,2,1993,12205 -18,988,CANADA,WEST,EDUCATION,FURNITURE,BED,3,1993,12235 -87,821,CANADA,WEST,EDUCATION,FURNITURE,BED,3,1993,12266 -765,962,CANADA,WEST,EDUCATION,FURNITURE,BED,3,1993,12297 -62,615,CANADA,WEST,EDUCATION,FURNITURE,BED,4,1993,12327 -13,523,CANADA,WEST,EDUCATION,FURNITURE,BED,4,1993,12358 -775,806,CANADA,WEST,EDUCATION,FURNITURE,BED,4,1993,12388 -636,586,CANADA,WEST,EDUCATION,FURNITURE,BED,1,1994,12419 -458,520,CANADA,WEST,EDUCATION,FURNITURE,BED,1,1994,12450 -206,908,CANADA,WEST,EDUCATION,FURNITURE,BED,1,1994,12478 -310,30,CANADA,WEST,EDUCATION,FURNITURE,BED,2,1994,12509 -813,247,CANADA,WEST,EDUCATION,FURNITURE,BED,2,1994,12539 -22,647,CANADA,WEST,EDUCATION,FURNITURE,BED,2,1994,12570 -742,55,CANADA,WEST,EDUCATION,FURNITURE,BED,3,1994,12600 -394,154,CANADA,WEST,EDUCATION,FURNITURE,BED,3,1994,12631 -957,344,CANADA,WEST,EDUCATION,FURNITURE,BED,3,1994,12662 -205,95,CANADA,WEST,EDUCATION,FURNITURE,BED,4,1994,12692 -198,665,CANADA,WEST,EDUCATION,FURNITURE,BED,4,1994,12723 -638,145,CANADA,WEST,EDUCATION,FURNITURE,BED,4,1994,12753 -155,925,CANADA,WEST,EDUCATION,OFFICE,TABLE,1,1993,12054 -688,395,CANADA,WEST,EDUCATION,OFFICE,TABLE,1,1993,12085 -730,749,CANADA,WEST,EDUCATION,OFFICE,TABLE,1,1993,12113 -208,279,CANADA,WEST,EDUCATION,OFFICE,TABLE,2,1993,12144 -525,288,CANADA,WEST,EDUCATION,OFFICE,TABLE,2,1993,12174 -483,509,CANADA,WEST,EDUCATION,OFFICE,TABLE,2,1993,12205 -748,255,CANADA,WEST,EDUCATION,OFFICE,TABLE,3,1993,12235 -6,214,CANADA,WEST,EDUCATION,OFFICE,TABLE,3,1993,12266 -168,473,CANADA,WEST,EDUCATION,OFFICE,TABLE,3,1993,12297 -301,702,CANADA,WEST,EDUCATION,OFFICE,TABLE,4,1993,12327 -9,814,CANADA,WEST,EDUCATION,OFFICE,TABLE,4,1993,12358 -778,231,CANADA,WEST,EDUCATION,OFFICE,TABLE,4,1993,12388 -799,422,CANADA,WEST,EDUCATION,OFFICE,TABLE,1,1994,12419 -309,572,CANADA,WEST,EDUCATION,OFFICE,TABLE,1,1994,12450 -433,363,CANADA,WEST,EDUCATION,OFFICE,TABLE,1,1994,12478 -969,919,CANADA,WEST,EDUCATION,OFFICE,TABLE,2,1994,12509 -181,355,CANADA,WEST,EDUCATION,OFFICE,TABLE,2,1994,12539 -787,992,CANADA,WEST,EDUCATION,OFFICE,TABLE,2,1994,12570 -971,147,CANADA,WEST,EDUCATION,OFFICE,TABLE,3,1994,12600 -440,183,CANADA,WEST,EDUCATION,OFFICE,TABLE,3,1994,12631 -209,375,CANADA,WEST,EDUCATION,OFFICE,TABLE,3,1994,12662 -537,77,CANADA,WEST,EDUCATION,OFFICE,TABLE,4,1994,12692 -364,308,CANADA,WEST,EDUCATION,OFFICE,TABLE,4,1994,12723 -377,660,CANADA,WEST,EDUCATION,OFFICE,TABLE,4,1994,12753 -251,555,CANADA,WEST,EDUCATION,OFFICE,CHAIR,1,1993,12054 -607,455,CANADA,WEST,EDUCATION,OFFICE,CHAIR,1,1993,12085 -127,888,CANADA,WEST,EDUCATION,OFFICE,CHAIR,1,1993,12113 -513,652,CANADA,WEST,EDUCATION,OFFICE,CHAIR,2,1993,12144 -146,799,CANADA,WEST,EDUCATION,OFFICE,CHAIR,2,1993,12174 -917,249,CANADA,WEST,EDUCATION,OFFICE,CHAIR,2,1993,12205 -776,539,CANADA,WEST,EDUCATION,OFFICE,CHAIR,3,1993,12235 -330,198,CANADA,WEST,EDUCATION,OFFICE,CHAIR,3,1993,12266 -981,340,CANADA,WEST,EDUCATION,OFFICE,CHAIR,3,1993,12297 -862,152,CANADA,WEST,EDUCATION,OFFICE,CHAIR,4,1993,12327 -612,347,CANADA,WEST,EDUCATION,OFFICE,CHAIR,4,1993,12358 -607,565,CANADA,WEST,EDUCATION,OFFICE,CHAIR,4,1993,12388 -786,855,CANADA,WEST,EDUCATION,OFFICE,CHAIR,1,1994,12419 
-160,87,CANADA,WEST,EDUCATION,OFFICE,CHAIR,1,1994,12450 -199,69,CANADA,WEST,EDUCATION,OFFICE,CHAIR,1,1994,12478 -972,807,CANADA,WEST,EDUCATION,OFFICE,CHAIR,2,1994,12509 -870,565,CANADA,WEST,EDUCATION,OFFICE,CHAIR,2,1994,12539 -494,798,CANADA,WEST,EDUCATION,OFFICE,CHAIR,2,1994,12570 -975,714,CANADA,WEST,EDUCATION,OFFICE,CHAIR,3,1994,12600 -760,17,CANADA,WEST,EDUCATION,OFFICE,CHAIR,3,1994,12631 -180,797,CANADA,WEST,EDUCATION,OFFICE,CHAIR,3,1994,12662 -256,422,CANADA,WEST,EDUCATION,OFFICE,CHAIR,4,1994,12692 -422,621,CANADA,WEST,EDUCATION,OFFICE,CHAIR,4,1994,12723 -859,661,CANADA,WEST,EDUCATION,OFFICE,CHAIR,4,1994,12753 -586,363,CANADA,WEST,EDUCATION,OFFICE,DESK,1,1993,12054 -441,910,CANADA,WEST,EDUCATION,OFFICE,DESK,1,1993,12085 -597,998,CANADA,WEST,EDUCATION,OFFICE,DESK,1,1993,12113 -717,95,CANADA,WEST,EDUCATION,OFFICE,DESK,2,1993,12144 -713,731,CANADA,WEST,EDUCATION,OFFICE,DESK,2,1993,12174 -591,718,CANADA,WEST,EDUCATION,OFFICE,DESK,2,1993,12205 -492,467,CANADA,WEST,EDUCATION,OFFICE,DESK,3,1993,12235 -170,126,CANADA,WEST,EDUCATION,OFFICE,DESK,3,1993,12266 -684,127,CANADA,WEST,EDUCATION,OFFICE,DESK,3,1993,12297 -981,746,CANADA,WEST,EDUCATION,OFFICE,DESK,4,1993,12327 -966,878,CANADA,WEST,EDUCATION,OFFICE,DESK,4,1993,12358 -439,27,CANADA,WEST,EDUCATION,OFFICE,DESK,4,1993,12388 -151,569,CANADA,WEST,EDUCATION,OFFICE,DESK,1,1994,12419 -602,812,CANADA,WEST,EDUCATION,OFFICE,DESK,1,1994,12450 -187,603,CANADA,WEST,EDUCATION,OFFICE,DESK,1,1994,12478 -415,506,CANADA,WEST,EDUCATION,OFFICE,DESK,2,1994,12509 -61,185,CANADA,WEST,EDUCATION,OFFICE,DESK,2,1994,12539 -839,692,CANADA,WEST,EDUCATION,OFFICE,DESK,2,1994,12570 -596,565,CANADA,WEST,EDUCATION,OFFICE,DESK,3,1994,12600 -751,512,CANADA,WEST,EDUCATION,OFFICE,DESK,3,1994,12631 -460,86,CANADA,WEST,EDUCATION,OFFICE,DESK,3,1994,12662 -922,399,CANADA,WEST,EDUCATION,OFFICE,DESK,4,1994,12692 -153,672,CANADA,WEST,EDUCATION,OFFICE,DESK,4,1994,12723 -928,801,CANADA,WEST,EDUCATION,OFFICE,DESK,4,1994,12753 -951,730,CANADA,WEST,CONSUMER,FURNITURE,SOFA,1,1993,12054 -394,408,CANADA,WEST,CONSUMER,FURNITURE,SOFA,1,1993,12085 -615,982,CANADA,WEST,CONSUMER,FURNITURE,SOFA,1,1993,12113 -653,499,CANADA,WEST,CONSUMER,FURNITURE,SOFA,2,1993,12144 -180,307,CANADA,WEST,CONSUMER,FURNITURE,SOFA,2,1993,12174 -649,741,CANADA,WEST,CONSUMER,FURNITURE,SOFA,2,1993,12205 -921,640,CANADA,WEST,CONSUMER,FURNITURE,SOFA,3,1993,12235 -11,300,CANADA,WEST,CONSUMER,FURNITURE,SOFA,3,1993,12266 -696,929,CANADA,WEST,CONSUMER,FURNITURE,SOFA,3,1993,12297 -795,309,CANADA,WEST,CONSUMER,FURNITURE,SOFA,4,1993,12327 -550,340,CANADA,WEST,CONSUMER,FURNITURE,SOFA,4,1993,12358 -320,228,CANADA,WEST,CONSUMER,FURNITURE,SOFA,4,1993,12388 -845,1000,CANADA,WEST,CONSUMER,FURNITURE,SOFA,1,1994,12419 -245,21,CANADA,WEST,CONSUMER,FURNITURE,SOFA,1,1994,12450 -142,583,CANADA,WEST,CONSUMER,FURNITURE,SOFA,1,1994,12478 -717,506,CANADA,WEST,CONSUMER,FURNITURE,SOFA,2,1994,12509 -3,405,CANADA,WEST,CONSUMER,FURNITURE,SOFA,2,1994,12539 -790,556,CANADA,WEST,CONSUMER,FURNITURE,SOFA,2,1994,12570 -646,72,CANADA,WEST,CONSUMER,FURNITURE,SOFA,3,1994,12600 -230,103,CANADA,WEST,CONSUMER,FURNITURE,SOFA,3,1994,12631 -938,262,CANADA,WEST,CONSUMER,FURNITURE,SOFA,3,1994,12662 -629,102,CANADA,WEST,CONSUMER,FURNITURE,SOFA,4,1994,12692 -317,841,CANADA,WEST,CONSUMER,FURNITURE,SOFA,4,1994,12723 -812,159,CANADA,WEST,CONSUMER,FURNITURE,SOFA,4,1994,12753 -141,570,CANADA,WEST,CONSUMER,FURNITURE,BED,1,1993,12054 -64,375,CANADA,WEST,CONSUMER,FURNITURE,BED,1,1993,12085 -207,298,CANADA,WEST,CONSUMER,FURNITURE,BED,1,1993,12113 
-435,32,CANADA,WEST,CONSUMER,FURNITURE,BED,2,1993,12144 -96,760,CANADA,WEST,CONSUMER,FURNITURE,BED,2,1993,12174 -252,338,CANADA,WEST,CONSUMER,FURNITURE,BED,2,1993,12205 -956,149,CANADA,WEST,CONSUMER,FURNITURE,BED,3,1993,12235 -633,343,CANADA,WEST,CONSUMER,FURNITURE,BED,3,1993,12266 -190,151,CANADA,WEST,CONSUMER,FURNITURE,BED,3,1993,12297 -227,44,CANADA,WEST,CONSUMER,FURNITURE,BED,4,1993,12327 -24,583,CANADA,WEST,CONSUMER,FURNITURE,BED,4,1993,12358 -420,230,CANADA,WEST,CONSUMER,FURNITURE,BED,4,1993,12388 -910,907,CANADA,WEST,CONSUMER,FURNITURE,BED,1,1994,12419 -709,783,CANADA,WEST,CONSUMER,FURNITURE,BED,1,1994,12450 -810,117,CANADA,WEST,CONSUMER,FURNITURE,BED,1,1994,12478 -723,416,CANADA,WEST,CONSUMER,FURNITURE,BED,2,1994,12509 -911,318,CANADA,WEST,CONSUMER,FURNITURE,BED,2,1994,12539 -230,888,CANADA,WEST,CONSUMER,FURNITURE,BED,2,1994,12570 -448,60,CANADA,WEST,CONSUMER,FURNITURE,BED,3,1994,12600 -945,596,CANADA,WEST,CONSUMER,FURNITURE,BED,3,1994,12631 -508,576,CANADA,WEST,CONSUMER,FURNITURE,BED,3,1994,12662 -262,576,CANADA,WEST,CONSUMER,FURNITURE,BED,4,1994,12692 -441,280,CANADA,WEST,CONSUMER,FURNITURE,BED,4,1994,12723 -15,219,CANADA,WEST,CONSUMER,FURNITURE,BED,4,1994,12753 -795,133,CANADA,WEST,CONSUMER,OFFICE,TABLE,1,1993,12054 -301,273,CANADA,WEST,CONSUMER,OFFICE,TABLE,1,1993,12085 -304,86,CANADA,WEST,CONSUMER,OFFICE,TABLE,1,1993,12113 -49,400,CANADA,WEST,CONSUMER,OFFICE,TABLE,2,1993,12144 -576,364,CANADA,WEST,CONSUMER,OFFICE,TABLE,2,1993,12174 -669,63,CANADA,WEST,CONSUMER,OFFICE,TABLE,2,1993,12205 -325,929,CANADA,WEST,CONSUMER,OFFICE,TABLE,3,1993,12235 -272,344,CANADA,WEST,CONSUMER,OFFICE,TABLE,3,1993,12266 -80,768,CANADA,WEST,CONSUMER,OFFICE,TABLE,3,1993,12297 -46,668,CANADA,WEST,CONSUMER,OFFICE,TABLE,4,1993,12327 -223,407,CANADA,WEST,CONSUMER,OFFICE,TABLE,4,1993,12358 -774,536,CANADA,WEST,CONSUMER,OFFICE,TABLE,4,1993,12388 -784,657,CANADA,WEST,CONSUMER,OFFICE,TABLE,1,1994,12419 -92,215,CANADA,WEST,CONSUMER,OFFICE,TABLE,1,1994,12450 -67,966,CANADA,WEST,CONSUMER,OFFICE,TABLE,1,1994,12478 -747,674,CANADA,WEST,CONSUMER,OFFICE,TABLE,2,1994,12509 -686,574,CANADA,WEST,CONSUMER,OFFICE,TABLE,2,1994,12539 -93,266,CANADA,WEST,CONSUMER,OFFICE,TABLE,2,1994,12570 -192,680,CANADA,WEST,CONSUMER,OFFICE,TABLE,3,1994,12600 -51,362,CANADA,WEST,CONSUMER,OFFICE,TABLE,3,1994,12631 -498,412,CANADA,WEST,CONSUMER,OFFICE,TABLE,3,1994,12662 -546,431,CANADA,WEST,CONSUMER,OFFICE,TABLE,4,1994,12692 -485,94,CANADA,WEST,CONSUMER,OFFICE,TABLE,4,1994,12723 -925,345,CANADA,WEST,CONSUMER,OFFICE,TABLE,4,1994,12753 -292,445,CANADA,WEST,CONSUMER,OFFICE,CHAIR,1,1993,12054 -540,632,CANADA,WEST,CONSUMER,OFFICE,CHAIR,1,1993,12085 -21,855,CANADA,WEST,CONSUMER,OFFICE,CHAIR,1,1993,12113 -100,36,CANADA,WEST,CONSUMER,OFFICE,CHAIR,2,1993,12144 -49,250,CANADA,WEST,CONSUMER,OFFICE,CHAIR,2,1993,12174 -353,427,CANADA,WEST,CONSUMER,OFFICE,CHAIR,2,1993,12205 -911,367,CANADA,WEST,CONSUMER,OFFICE,CHAIR,3,1993,12235 -823,245,CANADA,WEST,CONSUMER,OFFICE,CHAIR,3,1993,12266 -278,893,CANADA,WEST,CONSUMER,OFFICE,CHAIR,3,1993,12297 -576,490,CANADA,WEST,CONSUMER,OFFICE,CHAIR,4,1993,12327 -655,88,CANADA,WEST,CONSUMER,OFFICE,CHAIR,4,1993,12358 -763,964,CANADA,WEST,CONSUMER,OFFICE,CHAIR,4,1993,12388 -88,62,CANADA,WEST,CONSUMER,OFFICE,CHAIR,1,1994,12419 -746,506,CANADA,WEST,CONSUMER,OFFICE,CHAIR,1,1994,12450 -927,680,CANADA,WEST,CONSUMER,OFFICE,CHAIR,1,1994,12478 -297,153,CANADA,WEST,CONSUMER,OFFICE,CHAIR,2,1994,12509 -291,403,CANADA,WEST,CONSUMER,OFFICE,CHAIR,2,1994,12539 -838,98,CANADA,WEST,CONSUMER,OFFICE,CHAIR,2,1994,12570 
-112,376,CANADA,WEST,CONSUMER,OFFICE,CHAIR,3,1994,12600 -509,477,CANADA,WEST,CONSUMER,OFFICE,CHAIR,3,1994,12631 -472,50,CANADA,WEST,CONSUMER,OFFICE,CHAIR,3,1994,12662 -495,592,CANADA,WEST,CONSUMER,OFFICE,CHAIR,4,1994,12692 -1000,813,CANADA,WEST,CONSUMER,OFFICE,CHAIR,4,1994,12723 -241,740,CANADA,WEST,CONSUMER,OFFICE,CHAIR,4,1994,12753 -693,873,CANADA,WEST,CONSUMER,OFFICE,DESK,1,1993,12054 -903,459,CANADA,WEST,CONSUMER,OFFICE,DESK,1,1993,12085 -791,224,CANADA,WEST,CONSUMER,OFFICE,DESK,1,1993,12113 -108,562,CANADA,WEST,CONSUMER,OFFICE,DESK,2,1993,12144 -845,199,CANADA,WEST,CONSUMER,OFFICE,DESK,2,1993,12174 -452,275,CANADA,WEST,CONSUMER,OFFICE,DESK,2,1993,12205 -479,355,CANADA,WEST,CONSUMER,OFFICE,DESK,3,1993,12235 -410,947,CANADA,WEST,CONSUMER,OFFICE,DESK,3,1993,12266 -379,454,CANADA,WEST,CONSUMER,OFFICE,DESK,3,1993,12297 -740,450,CANADA,WEST,CONSUMER,OFFICE,DESK,4,1993,12327 -471,575,CANADA,WEST,CONSUMER,OFFICE,DESK,4,1993,12358 -325,6,CANADA,WEST,CONSUMER,OFFICE,DESK,4,1993,12388 -455,847,CANADA,WEST,CONSUMER,OFFICE,DESK,1,1994,12419 -563,338,CANADA,WEST,CONSUMER,OFFICE,DESK,1,1994,12450 -879,517,CANADA,WEST,CONSUMER,OFFICE,DESK,1,1994,12478 -312,630,CANADA,WEST,CONSUMER,OFFICE,DESK,2,1994,12509 -587,381,CANADA,WEST,CONSUMER,OFFICE,DESK,2,1994,12539 -628,864,CANADA,WEST,CONSUMER,OFFICE,DESK,2,1994,12570 -486,416,CANADA,WEST,CONSUMER,OFFICE,DESK,3,1994,12600 -811,852,CANADA,WEST,CONSUMER,OFFICE,DESK,3,1994,12631 -990,815,CANADA,WEST,CONSUMER,OFFICE,DESK,3,1994,12662 -35,23,CANADA,WEST,CONSUMER,OFFICE,DESK,4,1994,12692 -764,527,CANADA,WEST,CONSUMER,OFFICE,DESK,4,1994,12723 -619,693,CANADA,WEST,CONSUMER,OFFICE,DESK,4,1994,12753 -996,977,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054 -554,549,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12085 -540,951,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12113 -140,390,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,2,1993,12144 -554,204,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,2,1993,12174 -724,78,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,2,1993,12205 -693,613,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,3,1993,12235 -866,745,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,3,1993,12266 -833,56,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,3,1993,12297 -164,887,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,4,1993,12327 -753,651,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,4,1993,12358 -60,691,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,4,1993,12388 -688,767,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,1,1994,12419 -883,709,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,1,1994,12450 -109,417,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,1,1994,12478 -950,326,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,2,1994,12509 -438,599,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,2,1994,12539 -286,818,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,2,1994,12570 -342,13,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,3,1994,12600 -383,185,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,3,1994,12631 -80,140,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,3,1994,12662 -322,717,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,4,1994,12692 -749,852,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,4,1994,12723 -606,125,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,4,1994,12753 -641,325,GERMANY,EAST,EDUCATION,FURNITURE,BED,1,1993,12054 -494,648,GERMANY,EAST,EDUCATION,FURNITURE,BED,1,1993,12085 -428,365,GERMANY,EAST,EDUCATION,FURNITURE,BED,1,1993,12113 -936,120,GERMANY,EAST,EDUCATION,FURNITURE,BED,2,1993,12144 -597,347,GERMANY,EAST,EDUCATION,FURNITURE,BED,2,1993,12174 -728,638,GERMANY,EAST,EDUCATION,FURNITURE,BED,2,1993,12205 -933,732,GERMANY,EAST,EDUCATION,FURNITURE,BED,3,1993,12235 
-663,465,GERMANY,EAST,EDUCATION,FURNITURE,BED,3,1993,12266 -394,262,GERMANY,EAST,EDUCATION,FURNITURE,BED,3,1993,12297 -334,947,GERMANY,EAST,EDUCATION,FURNITURE,BED,4,1993,12327 -114,694,GERMANY,EAST,EDUCATION,FURNITURE,BED,4,1993,12358 -89,482,GERMANY,EAST,EDUCATION,FURNITURE,BED,4,1993,12388 -874,600,GERMANY,EAST,EDUCATION,FURNITURE,BED,1,1994,12419 -674,94,GERMANY,EAST,EDUCATION,FURNITURE,BED,1,1994,12450 -347,323,GERMANY,EAST,EDUCATION,FURNITURE,BED,1,1994,12478 -105,49,GERMANY,EAST,EDUCATION,FURNITURE,BED,2,1994,12509 -286,70,GERMANY,EAST,EDUCATION,FURNITURE,BED,2,1994,12539 -669,844,GERMANY,EAST,EDUCATION,FURNITURE,BED,2,1994,12570 -786,773,GERMANY,EAST,EDUCATION,FURNITURE,BED,3,1994,12600 -104,68,GERMANY,EAST,EDUCATION,FURNITURE,BED,3,1994,12631 -770,110,GERMANY,EAST,EDUCATION,FURNITURE,BED,3,1994,12662 -263,42,GERMANY,EAST,EDUCATION,FURNITURE,BED,4,1994,12692 -900,171,GERMANY,EAST,EDUCATION,FURNITURE,BED,4,1994,12723 -630,644,GERMANY,EAST,EDUCATION,FURNITURE,BED,4,1994,12753 -597,408,GERMANY,EAST,EDUCATION,OFFICE,TABLE,1,1993,12054 -185,45,GERMANY,EAST,EDUCATION,OFFICE,TABLE,1,1993,12085 -175,522,GERMANY,EAST,EDUCATION,OFFICE,TABLE,1,1993,12113 -576,166,GERMANY,EAST,EDUCATION,OFFICE,TABLE,2,1993,12144 -957,885,GERMANY,EAST,EDUCATION,OFFICE,TABLE,2,1993,12174 -993,713,GERMANY,EAST,EDUCATION,OFFICE,TABLE,2,1993,12205 -500,838,GERMANY,EAST,EDUCATION,OFFICE,TABLE,3,1993,12235 -410,267,GERMANY,EAST,EDUCATION,OFFICE,TABLE,3,1993,12266 -592,967,GERMANY,EAST,EDUCATION,OFFICE,TABLE,3,1993,12297 -64,529,GERMANY,EAST,EDUCATION,OFFICE,TABLE,4,1993,12327 -208,656,GERMANY,EAST,EDUCATION,OFFICE,TABLE,4,1993,12358 -273,665,GERMANY,EAST,EDUCATION,OFFICE,TABLE,4,1993,12388 -906,419,GERMANY,EAST,EDUCATION,OFFICE,TABLE,1,1994,12419 -429,776,GERMANY,EAST,EDUCATION,OFFICE,TABLE,1,1994,12450 -961,971,GERMANY,EAST,EDUCATION,OFFICE,TABLE,1,1994,12478 -338,248,GERMANY,EAST,EDUCATION,OFFICE,TABLE,2,1994,12509 -472,486,GERMANY,EAST,EDUCATION,OFFICE,TABLE,2,1994,12539 -903,674,GERMANY,EAST,EDUCATION,OFFICE,TABLE,2,1994,12570 -299,603,GERMANY,EAST,EDUCATION,OFFICE,TABLE,3,1994,12600 -948,492,GERMANY,EAST,EDUCATION,OFFICE,TABLE,3,1994,12631 -931,512,GERMANY,EAST,EDUCATION,OFFICE,TABLE,3,1994,12662 -570,391,GERMANY,EAST,EDUCATION,OFFICE,TABLE,4,1994,12692 -97,313,GERMANY,EAST,EDUCATION,OFFICE,TABLE,4,1994,12723 -674,758,GERMANY,EAST,EDUCATION,OFFICE,TABLE,4,1994,12753 -468,304,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,1,1993,12054 -430,846,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,1,1993,12085 -893,912,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,1,1993,12113 -519,810,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,2,1993,12144 -267,122,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,2,1993,12174 -908,102,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,2,1993,12205 -176,161,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,3,1993,12235 -673,450,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,3,1993,12266 -798,215,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,3,1993,12297 -291,765,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,4,1993,12327 -583,557,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,4,1993,12358 -442,739,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,4,1993,12388 -951,811,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,1,1994,12419 -430,780,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,1,1994,12450 -559,645,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,1,1994,12478 -726,365,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,2,1994,12509 -944,597,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,2,1994,12539 -497,126,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,2,1994,12570 -388,655,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,3,1994,12600 -81,604,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,3,1994,12631 
-111,280,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,3,1994,12662 -288,115,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,4,1994,12692 -845,205,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,4,1994,12723 -745,672,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,4,1994,12753 -352,339,GERMANY,EAST,EDUCATION,OFFICE,DESK,1,1993,12054 -234,70,GERMANY,EAST,EDUCATION,OFFICE,DESK,1,1993,12085 -167,528,GERMANY,EAST,EDUCATION,OFFICE,DESK,1,1993,12113 -606,220,GERMANY,EAST,EDUCATION,OFFICE,DESK,2,1993,12144 -670,691,GERMANY,EAST,EDUCATION,OFFICE,DESK,2,1993,12174 -764,197,GERMANY,EAST,EDUCATION,OFFICE,DESK,2,1993,12205 -659,239,GERMANY,EAST,EDUCATION,OFFICE,DESK,3,1993,12235 -996,50,GERMANY,EAST,EDUCATION,OFFICE,DESK,3,1993,12266 -424,135,GERMANY,EAST,EDUCATION,OFFICE,DESK,3,1993,12297 -899,972,GERMANY,EAST,EDUCATION,OFFICE,DESK,4,1993,12327 -392,475,GERMANY,EAST,EDUCATION,OFFICE,DESK,4,1993,12358 -555,868,GERMANY,EAST,EDUCATION,OFFICE,DESK,4,1993,12388 -860,451,GERMANY,EAST,EDUCATION,OFFICE,DESK,1,1994,12419 -114,565,GERMANY,EAST,EDUCATION,OFFICE,DESK,1,1994,12450 -943,116,GERMANY,EAST,EDUCATION,OFFICE,DESK,1,1994,12478 -365,385,GERMANY,EAST,EDUCATION,OFFICE,DESK,2,1994,12509 -249,375,GERMANY,EAST,EDUCATION,OFFICE,DESK,2,1994,12539 -192,357,GERMANY,EAST,EDUCATION,OFFICE,DESK,2,1994,12570 -328,230,GERMANY,EAST,EDUCATION,OFFICE,DESK,3,1994,12600 -311,829,GERMANY,EAST,EDUCATION,OFFICE,DESK,3,1994,12631 -576,971,GERMANY,EAST,EDUCATION,OFFICE,DESK,3,1994,12662 -915,280,GERMANY,EAST,EDUCATION,OFFICE,DESK,4,1994,12692 -522,853,GERMANY,EAST,EDUCATION,OFFICE,DESK,4,1994,12723 -625,953,GERMANY,EAST,EDUCATION,OFFICE,DESK,4,1994,12753 -873,874,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,1,1993,12054 -498,578,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,1,1993,12085 -808,768,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,1,1993,12113 -742,178,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,2,1993,12144 -744,916,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,2,1993,12174 -30,917,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,2,1993,12205 -747,633,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,3,1993,12235 -672,107,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,3,1993,12266 -564,523,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,3,1993,12297 -785,924,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,4,1993,12327 -825,481,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,4,1993,12358 -243,240,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,4,1993,12388 -959,819,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,1,1994,12419 -123,602,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,1,1994,12450 -714,538,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,1,1994,12478 -252,632,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,2,1994,12509 -715,952,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,2,1994,12539 -670,480,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,2,1994,12570 -81,700,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,3,1994,12600 -653,726,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,3,1994,12631 -795,526,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,3,1994,12662 -182,410,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,4,1994,12692 -725,307,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,4,1994,12723 -101,73,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,4,1994,12753 -143,232,GERMANY,EAST,CONSUMER,FURNITURE,BED,1,1993,12054 -15,993,GERMANY,EAST,CONSUMER,FURNITURE,BED,1,1993,12085 -742,652,GERMANY,EAST,CONSUMER,FURNITURE,BED,1,1993,12113 -339,761,GERMANY,EAST,CONSUMER,FURNITURE,BED,2,1993,12144 -39,428,GERMANY,EAST,CONSUMER,FURNITURE,BED,2,1993,12174 -465,4,GERMANY,EAST,CONSUMER,FURNITURE,BED,2,1993,12205 -889,101,GERMANY,EAST,CONSUMER,FURNITURE,BED,3,1993,12235 -856,869,GERMANY,EAST,CONSUMER,FURNITURE,BED,3,1993,12266 -358,271,GERMANY,EAST,CONSUMER,FURNITURE,BED,3,1993,12297 
-452,633,GERMANY,EAST,CONSUMER,FURNITURE,BED,4,1993,12327 -387,481,GERMANY,EAST,CONSUMER,FURNITURE,BED,4,1993,12358 -824,302,GERMANY,EAST,CONSUMER,FURNITURE,BED,4,1993,12388 -185,245,GERMANY,EAST,CONSUMER,FURNITURE,BED,1,1994,12419 -151,941,GERMANY,EAST,CONSUMER,FURNITURE,BED,1,1994,12450 -419,721,GERMANY,EAST,CONSUMER,FURNITURE,BED,1,1994,12478 -643,893,GERMANY,EAST,CONSUMER,FURNITURE,BED,2,1994,12509 -63,898,GERMANY,EAST,CONSUMER,FURNITURE,BED,2,1994,12539 -202,94,GERMANY,EAST,CONSUMER,FURNITURE,BED,2,1994,12570 -332,962,GERMANY,EAST,CONSUMER,FURNITURE,BED,3,1994,12600 -723,71,GERMANY,EAST,CONSUMER,FURNITURE,BED,3,1994,12631 -148,108,GERMANY,EAST,CONSUMER,FURNITURE,BED,3,1994,12662 -840,71,GERMANY,EAST,CONSUMER,FURNITURE,BED,4,1994,12692 -601,767,GERMANY,EAST,CONSUMER,FURNITURE,BED,4,1994,12723 -962,323,GERMANY,EAST,CONSUMER,FURNITURE,BED,4,1994,12753 -166,982,GERMANY,EAST,CONSUMER,OFFICE,TABLE,1,1993,12054 -531,614,GERMANY,EAST,CONSUMER,OFFICE,TABLE,1,1993,12085 -963,839,GERMANY,EAST,CONSUMER,OFFICE,TABLE,1,1993,12113 -994,388,GERMANY,EAST,CONSUMER,OFFICE,TABLE,2,1993,12144 -978,296,GERMANY,EAST,CONSUMER,OFFICE,TABLE,2,1993,12174 -72,429,GERMANY,EAST,CONSUMER,OFFICE,TABLE,2,1993,12205 -33,901,GERMANY,EAST,CONSUMER,OFFICE,TABLE,3,1993,12235 -428,350,GERMANY,EAST,CONSUMER,OFFICE,TABLE,3,1993,12266 -413,581,GERMANY,EAST,CONSUMER,OFFICE,TABLE,3,1993,12297 -737,583,GERMANY,EAST,CONSUMER,OFFICE,TABLE,4,1993,12327 -85,92,GERMANY,EAST,CONSUMER,OFFICE,TABLE,4,1993,12358 -916,647,GERMANY,EAST,CONSUMER,OFFICE,TABLE,4,1993,12388 -785,771,GERMANY,EAST,CONSUMER,OFFICE,TABLE,1,1994,12419 -302,26,GERMANY,EAST,CONSUMER,OFFICE,TABLE,1,1994,12450 -1000,598,GERMANY,EAST,CONSUMER,OFFICE,TABLE,1,1994,12478 -458,715,GERMANY,EAST,CONSUMER,OFFICE,TABLE,2,1994,12509 -896,74,GERMANY,EAST,CONSUMER,OFFICE,TABLE,2,1994,12539 -615,580,GERMANY,EAST,CONSUMER,OFFICE,TABLE,2,1994,12570 -174,848,GERMANY,EAST,CONSUMER,OFFICE,TABLE,3,1994,12600 -651,118,GERMANY,EAST,CONSUMER,OFFICE,TABLE,3,1994,12631 -784,54,GERMANY,EAST,CONSUMER,OFFICE,TABLE,3,1994,12662 -121,929,GERMANY,EAST,CONSUMER,OFFICE,TABLE,4,1994,12692 -341,393,GERMANY,EAST,CONSUMER,OFFICE,TABLE,4,1994,12723 -615,820,GERMANY,EAST,CONSUMER,OFFICE,TABLE,4,1994,12753 -697,336,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,1,1993,12054 -215,299,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,1,1993,12085 -197,747,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,1,1993,12113 -205,154,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,2,1993,12144 -256,486,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,2,1993,12174 -377,251,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,2,1993,12205 -577,225,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,3,1993,12235 -686,77,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,3,1993,12266 -332,74,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,3,1993,12297 -534,596,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,4,1993,12327 -485,493,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,4,1993,12358 -594,782,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,4,1993,12388 -413,487,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,1,1994,12419 -13,127,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,1,1994,12450 -483,538,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,1,1994,12478 -820,94,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,2,1994,12509 -745,252,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,2,1994,12539 -79,722,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,2,1994,12570 -36,536,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,3,1994,12600 -950,958,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,3,1994,12631 -74,466,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,3,1994,12662 -458,309,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,4,1994,12692 -609,680,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,4,1994,12723 
-429,539,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,4,1994,12753 -956,511,GERMANY,EAST,CONSUMER,OFFICE,DESK,1,1993,12054 -205,505,GERMANY,EAST,CONSUMER,OFFICE,DESK,1,1993,12085 -629,720,GERMANY,EAST,CONSUMER,OFFICE,DESK,1,1993,12113 -277,823,GERMANY,EAST,CONSUMER,OFFICE,DESK,2,1993,12144 -266,21,GERMANY,EAST,CONSUMER,OFFICE,DESK,2,1993,12174 -872,142,GERMANY,EAST,CONSUMER,OFFICE,DESK,2,1993,12205 -435,95,GERMANY,EAST,CONSUMER,OFFICE,DESK,3,1993,12235 -988,398,GERMANY,EAST,CONSUMER,OFFICE,DESK,3,1993,12266 -953,328,GERMANY,EAST,CONSUMER,OFFICE,DESK,3,1993,12297 -556,151,GERMANY,EAST,CONSUMER,OFFICE,DESK,4,1993,12327 -211,978,GERMANY,EAST,CONSUMER,OFFICE,DESK,4,1993,12358 -389,918,GERMANY,EAST,CONSUMER,OFFICE,DESK,4,1993,12388 -351,542,GERMANY,EAST,CONSUMER,OFFICE,DESK,1,1994,12419 -14,96,GERMANY,EAST,CONSUMER,OFFICE,DESK,1,1994,12450 -181,496,GERMANY,EAST,CONSUMER,OFFICE,DESK,1,1994,12478 -452,77,GERMANY,EAST,CONSUMER,OFFICE,DESK,2,1994,12509 -511,236,GERMANY,EAST,CONSUMER,OFFICE,DESK,2,1994,12539 -193,913,GERMANY,EAST,CONSUMER,OFFICE,DESK,2,1994,12570 -797,49,GERMANY,EAST,CONSUMER,OFFICE,DESK,3,1994,12600 -988,967,GERMANY,EAST,CONSUMER,OFFICE,DESK,3,1994,12631 -487,502,GERMANY,EAST,CONSUMER,OFFICE,DESK,3,1994,12662 -941,790,GERMANY,EAST,CONSUMER,OFFICE,DESK,4,1994,12692 -577,121,GERMANY,EAST,CONSUMER,OFFICE,DESK,4,1994,12723 -456,55,GERMANY,EAST,CONSUMER,OFFICE,DESK,4,1994,12753 -982,739,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,1,1993,12054 -593,683,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,1,1993,12085 -702,610,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,1,1993,12113 -528,248,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,2,1993,12144 -873,530,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,2,1993,12174 -301,889,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,2,1993,12205 -769,245,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,3,1993,12235 -724,473,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,3,1993,12266 -466,938,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,3,1993,12297 -774,150,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,4,1993,12327 -111,772,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,4,1993,12358 -954,201,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,4,1993,12388 -780,945,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,1,1994,12419 -210,177,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,1,1994,12450 -93,378,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,1,1994,12478 -332,83,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,2,1994,12509 -186,803,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,2,1994,12539 -782,398,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,2,1994,12570 -41,215,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,3,1994,12600 -222,194,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,3,1994,12631 -992,287,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,3,1994,12662 -477,410,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,4,1994,12692 -948,50,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,4,1994,12723 -817,204,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,4,1994,12753 -597,239,GERMANY,WEST,EDUCATION,FURNITURE,BED,1,1993,12054 -649,637,GERMANY,WEST,EDUCATION,FURNITURE,BED,1,1993,12085 -3,938,GERMANY,WEST,EDUCATION,FURNITURE,BED,1,1993,12113 -731,788,GERMANY,WEST,EDUCATION,FURNITURE,BED,2,1993,12144 -181,399,GERMANY,WEST,EDUCATION,FURNITURE,BED,2,1993,12174 -468,576,GERMANY,WEST,EDUCATION,FURNITURE,BED,2,1993,12205 -891,187,GERMANY,WEST,EDUCATION,FURNITURE,BED,3,1993,12235 -226,703,GERMANY,WEST,EDUCATION,FURNITURE,BED,3,1993,12266 -28,455,GERMANY,WEST,EDUCATION,FURNITURE,BED,3,1993,12297 -609,244,GERMANY,WEST,EDUCATION,FURNITURE,BED,4,1993,12327 -224,868,GERMANY,WEST,EDUCATION,FURNITURE,BED,4,1993,12358 -230,353,GERMANY,WEST,EDUCATION,FURNITURE,BED,4,1993,12388 
-216,101,GERMANY,WEST,EDUCATION,FURNITURE,BED,1,1994,12419 -282,924,GERMANY,WEST,EDUCATION,FURNITURE,BED,1,1994,12450 -501,144,GERMANY,WEST,EDUCATION,FURNITURE,BED,1,1994,12478 -320,0,GERMANY,WEST,EDUCATION,FURNITURE,BED,2,1994,12509 -720,910,GERMANY,WEST,EDUCATION,FURNITURE,BED,2,1994,12539 -464,259,GERMANY,WEST,EDUCATION,FURNITURE,BED,2,1994,12570 -363,107,GERMANY,WEST,EDUCATION,FURNITURE,BED,3,1994,12600 -49,63,GERMANY,WEST,EDUCATION,FURNITURE,BED,3,1994,12631 -223,270,GERMANY,WEST,EDUCATION,FURNITURE,BED,3,1994,12662 -452,554,GERMANY,WEST,EDUCATION,FURNITURE,BED,4,1994,12692 -210,154,GERMANY,WEST,EDUCATION,FURNITURE,BED,4,1994,12723 -444,205,GERMANY,WEST,EDUCATION,FURNITURE,BED,4,1994,12753 -222,441,GERMANY,WEST,EDUCATION,OFFICE,TABLE,1,1993,12054 -678,183,GERMANY,WEST,EDUCATION,OFFICE,TABLE,1,1993,12085 -25,459,GERMANY,WEST,EDUCATION,OFFICE,TABLE,1,1993,12113 -57,810,GERMANY,WEST,EDUCATION,OFFICE,TABLE,2,1993,12144 -981,268,GERMANY,WEST,EDUCATION,OFFICE,TABLE,2,1993,12174 -740,916,GERMANY,WEST,EDUCATION,OFFICE,TABLE,2,1993,12205 -408,742,GERMANY,WEST,EDUCATION,OFFICE,TABLE,3,1993,12235 -966,522,GERMANY,WEST,EDUCATION,OFFICE,TABLE,3,1993,12266 -107,299,GERMANY,WEST,EDUCATION,OFFICE,TABLE,3,1993,12297 -488,677,GERMANY,WEST,EDUCATION,OFFICE,TABLE,4,1993,12327 -759,709,GERMANY,WEST,EDUCATION,OFFICE,TABLE,4,1993,12358 -504,310,GERMANY,WEST,EDUCATION,OFFICE,TABLE,4,1993,12388 -99,160,GERMANY,WEST,EDUCATION,OFFICE,TABLE,1,1994,12419 -503,698,GERMANY,WEST,EDUCATION,OFFICE,TABLE,1,1994,12450 -724,540,GERMANY,WEST,EDUCATION,OFFICE,TABLE,1,1994,12478 -309,901,GERMANY,WEST,EDUCATION,OFFICE,TABLE,2,1994,12509 -625,34,GERMANY,WEST,EDUCATION,OFFICE,TABLE,2,1994,12539 -294,536,GERMANY,WEST,EDUCATION,OFFICE,TABLE,2,1994,12570 -890,780,GERMANY,WEST,EDUCATION,OFFICE,TABLE,3,1994,12600 -501,716,GERMANY,WEST,EDUCATION,OFFICE,TABLE,3,1994,12631 -34,532,GERMANY,WEST,EDUCATION,OFFICE,TABLE,3,1994,12662 -203,871,GERMANY,WEST,EDUCATION,OFFICE,TABLE,4,1994,12692 -140,199,GERMANY,WEST,EDUCATION,OFFICE,TABLE,4,1994,12723 -845,845,GERMANY,WEST,EDUCATION,OFFICE,TABLE,4,1994,12753 -774,591,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,1,1993,12054 -645,378,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,1,1993,12085 -986,942,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,1,1993,12113 -296,686,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,2,1993,12144 -936,720,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,2,1993,12174 -341,546,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,2,1993,12205 -32,845,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,3,1993,12235 -277,667,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,3,1993,12266 -548,627,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,3,1993,12297 -727,142,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,4,1993,12327 -812,655,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,4,1993,12358 -168,556,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,4,1993,12388 -150,459,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,1,1994,12419 -136,89,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,1,1994,12450 -695,726,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,1,1994,12478 -363,38,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,2,1994,12509 -853,60,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,2,1994,12539 -621,369,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,2,1994,12570 -764,381,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,3,1994,12600 -669,465,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,3,1994,12631 -772,981,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,3,1994,12662 -228,758,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,4,1994,12692 -261,31,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,4,1994,12723 -821,237,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,4,1994,12753 -100,285,GERMANY,WEST,EDUCATION,OFFICE,DESK,1,1993,12054 
-465,94,GERMANY,WEST,EDUCATION,OFFICE,DESK,1,1993,12085 -350,561,GERMANY,WEST,EDUCATION,OFFICE,DESK,1,1993,12113 -991,143,GERMANY,WEST,EDUCATION,OFFICE,DESK,2,1993,12144 -910,95,GERMANY,WEST,EDUCATION,OFFICE,DESK,2,1993,12174 -206,341,GERMANY,WEST,EDUCATION,OFFICE,DESK,2,1993,12205 -263,388,GERMANY,WEST,EDUCATION,OFFICE,DESK,3,1993,12235 -374,272,GERMANY,WEST,EDUCATION,OFFICE,DESK,3,1993,12266 -875,890,GERMANY,WEST,EDUCATION,OFFICE,DESK,3,1993,12297 -810,734,GERMANY,WEST,EDUCATION,OFFICE,DESK,4,1993,12327 -398,364,GERMANY,WEST,EDUCATION,OFFICE,DESK,4,1993,12358 -565,619,GERMANY,WEST,EDUCATION,OFFICE,DESK,4,1993,12388 -417,517,GERMANY,WEST,EDUCATION,OFFICE,DESK,1,1994,12419 -291,781,GERMANY,WEST,EDUCATION,OFFICE,DESK,1,1994,12450 -251,327,GERMANY,WEST,EDUCATION,OFFICE,DESK,1,1994,12478 -449,48,GERMANY,WEST,EDUCATION,OFFICE,DESK,2,1994,12509 -774,809,GERMANY,WEST,EDUCATION,OFFICE,DESK,2,1994,12539 -386,73,GERMANY,WEST,EDUCATION,OFFICE,DESK,2,1994,12570 -22,936,GERMANY,WEST,EDUCATION,OFFICE,DESK,3,1994,12600 -940,400,GERMANY,WEST,EDUCATION,OFFICE,DESK,3,1994,12631 -132,736,GERMANY,WEST,EDUCATION,OFFICE,DESK,3,1994,12662 -103,211,GERMANY,WEST,EDUCATION,OFFICE,DESK,4,1994,12692 -152,271,GERMANY,WEST,EDUCATION,OFFICE,DESK,4,1994,12723 -952,855,GERMANY,WEST,EDUCATION,OFFICE,DESK,4,1994,12753 -872,923,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,1,1993,12054 -748,854,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,1,1993,12085 -749,769,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,1,1993,12113 -876,271,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,2,1993,12144 -860,383,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,2,1993,12174 -900,29,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,2,1993,12205 -705,185,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,3,1993,12235 -913,351,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,3,1993,12266 -315,560,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,3,1993,12297 -466,840,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,4,1993,12327 -233,517,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,4,1993,12358 -906,949,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,4,1993,12388 -148,633,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,1,1994,12419 -661,636,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,1,1994,12450 -847,138,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,1,1994,12478 -768,481,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,2,1994,12509 -866,408,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,2,1994,12539 -475,130,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,2,1994,12570 -112,813,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,3,1994,12600 -136,661,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,3,1994,12631 -763,311,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,3,1994,12662 -388,872,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,4,1994,12692 -996,643,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,4,1994,12723 -486,174,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,4,1994,12753 -494,528,GERMANY,WEST,CONSUMER,FURNITURE,BED,1,1993,12054 -771,124,GERMANY,WEST,CONSUMER,FURNITURE,BED,1,1993,12085 -49,126,GERMANY,WEST,CONSUMER,FURNITURE,BED,1,1993,12113 -322,440,GERMANY,WEST,CONSUMER,FURNITURE,BED,2,1993,12144 -878,881,GERMANY,WEST,CONSUMER,FURNITURE,BED,2,1993,12174 -827,292,GERMANY,WEST,CONSUMER,FURNITURE,BED,2,1993,12205 -852,873,GERMANY,WEST,CONSUMER,FURNITURE,BED,3,1993,12235 -716,357,GERMANY,WEST,CONSUMER,FURNITURE,BED,3,1993,12266 -81,247,GERMANY,WEST,CONSUMER,FURNITURE,BED,3,1993,12297 -916,18,GERMANY,WEST,CONSUMER,FURNITURE,BED,4,1993,12327 -673,395,GERMANY,WEST,CONSUMER,FURNITURE,BED,4,1993,12358 -242,620,GERMANY,WEST,CONSUMER,FURNITURE,BED,4,1993,12388 -914,946,GERMANY,WEST,CONSUMER,FURNITURE,BED,1,1994,12419 -902,72,GERMANY,WEST,CONSUMER,FURNITURE,BED,1,1994,12450 
[... several hundred further deleted rows of this sales test-fixture CSV elided; each row carries two numeric measures plus country (GERMANY, U.S.A.), region (EAST, WEST), division (CONSUMER, EDUCATION), product line (FURNITURE, OFFICE), product (SOFA, BED, TABLE, CHAIR, DESK), quarter, year (1993-1994), and a serial date ...]
diff --git a/pandas/io/tests/sas/test_sas.py b/pandas/io/tests/sas/test_sas.py
deleted file mode 100644
index 237e3676c3b3d..0000000000000
--- a/pandas/io/tests/sas/test_sas.py
+++ /dev/null
@@ -1,13 +0,0 @@
-import pandas.util.testing as tm
-from pandas.compat import StringIO
-from pandas import read_sas
-
-
-class TestSas(tm.TestCase):
-
-    def test_sas_buffer_format(self):
-
-        # GH14947
-        b = StringIO("")
-        with self.assertRaises(ValueError):
-            read_sas(b)
diff --git a/pandas/io/tests/test_common.py b/pandas/io/tests/test_common.py
deleted file mode 100644
index 3c980cae3351a..0000000000000
--- a/pandas/io/tests/test_common.py
+++ /dev/null
@@ -1,159 +0,0 @@
-"""
-    Tests for the pandas.io.common functionalities
-"""
-import mmap
-import os
-from os.path import isabs
-
-import pandas.util.testing as tm
-
-from pandas.io import common
-from pandas.compat import is_platform_windows, StringIO
-
-from pandas import read_csv, concat
-import pandas as pd
-
-try:
-    from pathlib import Path
-except ImportError:
-    pass
-
-try:
-    from py.path import local as LocalPath
-except ImportError:
-    pass
-
-
-class TestCommonIOCapabilities(tm.TestCase):
-    data1 = """index,A,B,C,D
-foo,2,3,4,5
-bar,7,8,9,10
-baz,12,13,14,15
-qux,12,13,14,15
-foo2,12,13,14,15
-bar2,12,13,14,15
-"""
-
-    def test_expand_user(self):
-        filename = '~/sometest'
-        expanded_name = common._expand_user(filename)
-
-        self.assertNotEqual(expanded_name, filename)
-        self.assertTrue(isabs(expanded_name))
-        self.assertEqual(os.path.expanduser(filename), expanded_name)
-
-    def test_expand_user_normal_path(self):
-        filename = '/somefolder/sometest'
-        expanded_name = common._expand_user(filename)
-
-        self.assertEqual(expanded_name, filename)
-        self.assertEqual(os.path.expanduser(filename), expanded_name)
-
-    def test_stringify_path_pathlib(self):
-        tm._skip_if_no_pathlib()
-
-        rel_path = common._stringify_path(Path('.'))
-        self.assertEqual(rel_path, '.')
-        redundant_path = common._stringify_path(Path('foo//bar'))
-        self.assertEqual(redundant_path, os.path.join('foo', 'bar'))
-
-    def test_stringify_path_localpath(self):
-        tm._skip_if_no_localpath()
-
-        path = os.path.join('foo', 'bar')
-        abs_path = os.path.abspath(path)
-        lpath = LocalPath(path)
-        self.assertEqual(common._stringify_path(lpath), abs_path)
-
-    def test_get_filepath_or_buffer_with_path(self):
-        filename = '~/sometest'
-        filepath_or_buffer, _, _ = common.get_filepath_or_buffer(filename)
-        self.assertNotEqual(filepath_or_buffer, filename)
-        self.assertTrue(isabs(filepath_or_buffer))
-        self.assertEqual(os.path.expanduser(filename), filepath_or_buffer)
-
-    def test_get_filepath_or_buffer_with_buffer(self):
-        input_buffer = StringIO()
-        filepath_or_buffer, _, _ = common.get_filepath_or_buffer(input_buffer)
-        self.assertEqual(filepath_or_buffer, input_buffer)
-
-    def test_iterator(self):
-        reader = read_csv(StringIO(self.data1), chunksize=1)
-        result = concat(reader, ignore_index=True)
-        expected = read_csv(StringIO(self.data1))
-        tm.assert_frame_equal(result, expected)
-
-        # GH12153
-        it = read_csv(StringIO(self.data1), chunksize=1)
-        first = next(it)
-        tm.assert_frame_equal(first, expected.iloc[[0]])
-        tm.assert_frame_equal(concat(it), expected.iloc[1:])
-
-    def test_error_rename(self):
-        # see gh-12665
-        try:
-            raise common.CParserError()
-        except common.ParserError:
-            pass
-
-        try:
-            raise common.ParserError()
-        except common.CParserError:
-            pass
-
-        try:
-            raise common.ParserError()
-        except pd.parser.CParserError:
-            pass
-
-
-class TestMMapWrapper(tm.TestCase):
-
-    def setUp(self):
-        self.mmap_file = os.path.join(tm.get_data_path(),
-                                      'test_mmap.csv')
-
-    def test_constructor_bad_file(self):
-        non_file = StringIO('I am not a file')
-        non_file.fileno = lambda: -1
-
-        # the error raised is different on Windows
-        if is_platform_windows():
-            msg = "The parameter is incorrect"
-            err = OSError
-        else:
-            msg = "[Errno 22]"
-            err = mmap.error
-
-        tm.assertRaisesRegexp(err, msg, common.MMapWrapper, non_file)
-
-        target = open(self.mmap_file, 'r')
-        target.close()
-
-        msg = "I/O operation on closed file"
-        tm.assertRaisesRegexp(ValueError, msg, common.MMapWrapper, target)
-
-    def test_get_attr(self):
-        with open(self.mmap_file, 'r') as target:
-            wrapper = common.MMapWrapper(target)
-
-        attrs = dir(wrapper.mmap)
-        attrs = [attr for attr in attrs
-                 if not attr.startswith('__')]
-        attrs.append('__next__')
-
-        for attr in attrs:
-            self.assertTrue(hasattr(wrapper, attr))
-
-        self.assertFalse(hasattr(wrapper, 'foo'))
-
-    def test_next(self):
-        with open(self.mmap_file, 'r') as target:
-            wrapper = common.MMapWrapper(target)
-            lines = target.readlines()
-
-        for line in lines:
-            next_line = next(wrapper)
-            self.assertEqual(next_line.strip(), line.strip())
-
-        self.assertRaises(StopIteration, next, wrapper)
diff --git a/pandas/io/tests/test_date_converters.py b/pandas/io/tests/test_date_converters.py
deleted file mode 100644
index 99abbacb604fa..0000000000000
--- a/pandas/io/tests/test_date_converters.py
+++ /dev/null
@@ -1,156 +0,0 @@
-from pandas.compat import StringIO
-from datetime import date, datetime
-
-import nose
-
-import numpy as np
-
-from pandas import DataFrame, MultiIndex
-from pandas.io.parsers import (read_csv, read_table)
-from pandas.util.testing import assert_frame_equal
-import pandas.io.date_converters as conv
-import pandas.util.testing as tm
-from pandas.compat.numpy import np_array_datetime64_compat
-
-
-class TestConverters(tm.TestCase):
-
-    def setUp(self):
-        self.years = np.array([2007, 2008])
-        self.months = np.array([1, 2])
-        self.days = np.array([3, 4])
-        self.hours = np.array([5, 6])
-        self.minutes = np.array([7, 8])
-        self.seconds = np.array([9, 0])
-        self.dates = np.array(['2007/1/3', '2008/2/4'], dtype=object)
-        self.times = np.array(['05:07:09', '06:08:00'], dtype=object)
-        self.expected = np.array([datetime(2007, 1, 3, 5, 7, 9),
-                                  datetime(2008, 2, 4, 6, 8, 0)])
-
-    def test_parse_date_time(self):
-        result = conv.parse_date_time(self.dates, self.times)
-        self.assertTrue((result == self.expected).all())
-
-        data = """\
-date, time, a, b
-2001-01-05, 10:00:00, 0.0, 10.
-2001-01-05, 00:00:00, 1., 11.
-""" - datecols = {'date_time': [0, 1]} - df = read_table(StringIO(data), sep=',', header=0, - parse_dates=datecols, date_parser=conv.parse_date_time) - self.assertIn('date_time', df) - self.assertEqual(df.date_time.loc[0], datetime(2001, 1, 5, 10, 0, 0)) - - data = ("KORD,19990127, 19:00:00, 18:56:00, 0.8100\n" - "KORD,19990127, 20:00:00, 19:56:00, 0.0100\n" - "KORD,19990127, 21:00:00, 20:56:00, -0.5900\n" - "KORD,19990127, 21:00:00, 21:18:00, -0.9900\n" - "KORD,19990127, 22:00:00, 21:56:00, -0.5900\n" - "KORD,19990127, 23:00:00, 22:56:00, -0.5900") - - date_spec = {'nominal': [1, 2], 'actual': [1, 3]} - df = read_csv(StringIO(data), header=None, parse_dates=date_spec, - date_parser=conv.parse_date_time) - - def test_parse_date_fields(self): - result = conv.parse_date_fields(self.years, self.months, self.days) - expected = np.array([datetime(2007, 1, 3), datetime(2008, 2, 4)]) - self.assertTrue((result == expected).all()) - - data = ("year, month, day, a\n 2001 , 01 , 10 , 10.\n" - "2001 , 02 , 1 , 11.") - datecols = {'ymd': [0, 1, 2]} - df = read_table(StringIO(data), sep=',', header=0, - parse_dates=datecols, - date_parser=conv.parse_date_fields) - self.assertIn('ymd', df) - self.assertEqual(df.ymd.loc[0], datetime(2001, 1, 10)) - - def test_datetime_six_col(self): - result = conv.parse_all_fields(self.years, self.months, self.days, - self.hours, self.minutes, self.seconds) - self.assertTrue((result == self.expected).all()) - - data = """\ -year, month, day, hour, minute, second, a, b -2001, 01, 05, 10, 00, 0, 0.0, 10. -2001, 01, 5, 10, 0, 00, 1., 11. -""" - datecols = {'ymdHMS': [0, 1, 2, 3, 4, 5]} - df = read_table(StringIO(data), sep=',', header=0, - parse_dates=datecols, - date_parser=conv.parse_all_fields) - self.assertIn('ymdHMS', df) - self.assertEqual(df.ymdHMS.loc[0], datetime(2001, 1, 5, 10, 0, 0)) - - def test_datetime_fractional_seconds(self): - data = """\ -year, month, day, hour, minute, second, a, b -2001, 01, 05, 10, 00, 0.123456, 0.0, 10. -2001, 01, 5, 10, 0, 0.500000, 1., 11. -""" - datecols = {'ymdHMS': [0, 1, 2, 3, 4, 5]} - df = read_table(StringIO(data), sep=',', header=0, - parse_dates=datecols, - date_parser=conv.parse_all_fields) - self.assertIn('ymdHMS', df) - self.assertEqual(df.ymdHMS.loc[0], datetime(2001, 1, 5, 10, 0, 0, - microsecond=123456)) - self.assertEqual(df.ymdHMS.loc[1], datetime(2001, 1, 5, 10, 0, 0, - microsecond=500000)) - - def test_generic(self): - data = "year, month, day, a\n 2001, 01, 10, 10.\n 2001, 02, 1, 11." 
- datecols = {'ym': [0, 1]} - dateconverter = lambda y, m: date(year=int(y), month=int(m), day=1) - df = read_table(StringIO(data), sep=',', header=0, - parse_dates=datecols, - date_parser=dateconverter) - self.assertIn('ym', df) - self.assertEqual(df.ym.loc[0], date(2001, 1, 1)) - - def test_dateparser_resolution_if_not_ns(self): - # issue 10245 - data = """\ -date,time,prn,rxstatus -2013-11-03,19:00:00,126,00E80000 -2013-11-03,19:00:00,23,00E80000 -2013-11-03,19:00:00,13,00E80000 -""" - - def date_parser(date, time): - datetime = np_array_datetime64_compat( - date + 'T' + time + 'Z', dtype='datetime64[s]') - return datetime - - df = read_csv(StringIO(data), date_parser=date_parser, - parse_dates={'datetime': ['date', 'time']}, - index_col=['datetime', 'prn']) - - datetimes = np_array_datetime64_compat(['2013-11-03T19:00:00Z'] * 3, - dtype='datetime64[s]') - df_correct = DataFrame(data={'rxstatus': ['00E80000'] * 3}, - index=MultiIndex.from_tuples( - [(datetimes[0], 126), - (datetimes[1], 23), - (datetimes[2], 13)], - names=['datetime', 'prn'])) - assert_frame_equal(df, df_correct) - - def test_parse_date_column_with_empty_string(self): - # GH 6428 - data = """case,opdate - 7,10/18/2006 - 7,10/18/2008 - 621, """ - result = read_csv(StringIO(data), parse_dates=['opdate']) - expected_data = [[7, '10/18/2006'], - [7, '10/18/2008'], - [621, ' ']] - expected = DataFrame(expected_data, columns=['case', 'opdate']) - assert_frame_equal(result, expected) - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/io/tests/test_gbq.py b/pandas/io/tests/test_gbq.py deleted file mode 100644 index 8a414dcd3ba4f..0000000000000 --- a/pandas/io/tests/test_gbq.py +++ /dev/null @@ -1,1230 +0,0 @@ -import re -from datetime import datetime -import nose -import pytz -import platform -from time import sleep -import os -import logging - -import numpy as np - -from distutils.version import StrictVersion -from pandas import compat - -from pandas import NaT -from pandas.compat import u, range -from pandas.core.frame import DataFrame -import pandas.io.gbq as gbq -import pandas.util.testing as tm -from pandas.compat.numpy import np_datetime64_compat - -PROJECT_ID = None -PRIVATE_KEY_JSON_PATH = None -PRIVATE_KEY_JSON_CONTENTS = None - -if compat.PY3: - DATASET_ID = 'pydata_pandas_bq_testing_py3' -else: - DATASET_ID = 'pydata_pandas_bq_testing_py2' - -TABLE_ID = 'new_test' -DESTINATION_TABLE = "{0}.{1}".format(DATASET_ID + "1", TABLE_ID) - -VERSION = platform.python_version() - -_IMPORTS = False -_GOOGLE_API_CLIENT_INSTALLED = False -_GOOGLE_API_CLIENT_VALID_VERSION = False -_HTTPLIB2_INSTALLED = False -_SETUPTOOLS_INSTALLED = False - - -def _skip_if_no_project_id(): - if not _get_project_id(): - raise nose.SkipTest( - "Cannot run integration tests without a project id") - - -def _skip_if_no_private_key_path(): - if not _get_private_key_path(): - raise nose.SkipTest("Cannot run integration tests without a " - "private key json file path") - - -def _skip_if_no_private_key_contents(): - if not _get_private_key_contents(): - raise nose.SkipTest("Cannot run integration tests without a " - "private key json contents") - - -def _in_travis_environment(): - return 'TRAVIS_BUILD_DIR' in os.environ and \ - 'GBQ_PROJECT_ID' in os.environ - - -def _get_project_id(): - if _in_travis_environment(): - return os.environ.get('GBQ_PROJECT_ID') - else: - return PROJECT_ID - - -def _get_private_key_path(): - if _in_travis_environment(): - return 
os.path.join(*[os.environ.get('TRAVIS_BUILD_DIR'), 'ci', - 'travis_gbq.json']) - else: - return PRIVATE_KEY_JSON_PATH - - -def _get_private_key_contents(): - if _in_travis_environment(): - with open(os.path.join(*[os.environ.get('TRAVIS_BUILD_DIR'), 'ci', - 'travis_gbq.json'])) as f: - return f.read() - else: - return PRIVATE_KEY_JSON_CONTENTS - - -def _test_imports(): - global _GOOGLE_API_CLIENT_INSTALLED, _GOOGLE_API_CLIENT_VALID_VERSION, \ - _HTTPLIB2_INSTALLED, _SETUPTOOLS_INSTALLED - - try: - import pkg_resources - _SETUPTOOLS_INSTALLED = True - except ImportError: - _SETUPTOOLS_INSTALLED = False - - if compat.PY3: - google_api_minimum_version = '1.4.1' - else: - google_api_minimum_version = '1.2.0' - - if _SETUPTOOLS_INSTALLED: - try: - try: - from googleapiclient.discovery import build # noqa - from googleapiclient.errors import HttpError # noqa - except: - from apiclient.discovery import build # noqa - from apiclient.errors import HttpError # noqa - - from oauth2client.client import OAuth2WebServerFlow # noqa - from oauth2client.client import AccessTokenRefreshError # noqa - - from oauth2client.file import Storage # noqa - from oauth2client.tools import run_flow # noqa - _GOOGLE_API_CLIENT_INSTALLED = True - _GOOGLE_API_CLIENT_VERSION = pkg_resources.get_distribution( - 'google-api-python-client').version - - if (StrictVersion(_GOOGLE_API_CLIENT_VERSION) >= - StrictVersion(google_api_minimum_version)): - _GOOGLE_API_CLIENT_VALID_VERSION = True - - except ImportError: - _GOOGLE_API_CLIENT_INSTALLED = False - - try: - import httplib2 # noqa - _HTTPLIB2_INSTALLED = True - except ImportError: - _HTTPLIB2_INSTALLED = False - - if not _SETUPTOOLS_INSTALLED: - raise ImportError('Could not import pkg_resources (setuptools).') - - if not _GOOGLE_API_CLIENT_INSTALLED: - raise ImportError('Could not import Google API Client.') - - if not _GOOGLE_API_CLIENT_VALID_VERSION: - raise ImportError("pandas requires google-api-python-client >= {0} " - "for Google BigQuery support, " - "current version {1}" - .format(google_api_minimum_version, - _GOOGLE_API_CLIENT_VERSION)) - - if not _HTTPLIB2_INSTALLED: - raise ImportError( - "pandas requires httplib2 for Google BigQuery support") - - # Bug fix for https://github.com/pandas-dev/pandas/issues/12572 - # We need to know that a supported version of oauth2client is installed - # Test that either of the following is installed: - # - SignedJwtAssertionCredentials from oauth2client.client - # - ServiceAccountCredentials from oauth2client.service_account - # SignedJwtAssertionCredentials is available in oauthclient < 2.0.0 - # ServiceAccountCredentials is available in oauthclient >= 2.0.0 - oauth2client_v1 = True - oauth2client_v2 = True - - try: - from oauth2client.client import SignedJwtAssertionCredentials # noqa - except ImportError: - oauth2client_v1 = False - - try: - from oauth2client.service_account import ServiceAccountCredentials # noqa - except ImportError: - oauth2client_v2 = False - - if not oauth2client_v1 and not oauth2client_v2: - raise ImportError("Missing oauth2client required for BigQuery " - "service account support") - - -def _setup_common(): - try: - _test_imports() - except (ImportError, NotImplementedError) as import_exception: - raise nose.SkipTest(import_exception) - - if _in_travis_environment(): - logging.getLogger('oauth2client').setLevel(logging.ERROR) - logging.getLogger('apiclient').setLevel(logging.ERROR) - - -def _check_if_can_get_correct_default_credentials(): - # Checks if "Application Default Credentials" can be fetched - 
# from the environment the tests are running in. - # See Issue #13577 - - import httplib2 - try: - from googleapiclient.discovery import build - except ImportError: - from apiclient.discovery import build - try: - from oauth2client.client import GoogleCredentials - credentials = GoogleCredentials.get_application_default() - http = httplib2.Http() - http = credentials.authorize(http) - bigquery_service = build('bigquery', 'v2', http=http) - jobs = bigquery_service.jobs() - job_data = {'configuration': {'query': {'query': 'SELECT 1'}}} - jobs.insert(projectId=_get_project_id(), body=job_data).execute() - return True - except: - return False - - -def clean_gbq_environment(private_key=None): - dataset = gbq._Dataset(_get_project_id(), private_key=private_key) - - for i in range(1, 10): - if DATASET_ID + str(i) in dataset.datasets(): - dataset_id = DATASET_ID + str(i) - table = gbq._Table(_get_project_id(), dataset_id, - private_key=private_key) - for j in range(1, 20): - if TABLE_ID + str(j) in dataset.tables(dataset_id): - table.delete(TABLE_ID + str(j)) - - dataset.delete(dataset_id) - - -def make_mixed_dataframe_v2(test_size): - # create df to test for all BQ datatypes except RECORD - bools = np.random.randint(2, size=(1, test_size)).astype(bool) - flts = np.random.randn(1, test_size) - ints = np.random.randint(1, 10, size=(1, test_size)) - strs = np.random.randint(1, 10, size=(1, test_size)).astype(str) - times = [datetime.now(pytz.timezone('US/Arizona')) - for t in range(test_size)] - return DataFrame({'bools': bools[0], - 'flts': flts[0], - 'ints': ints[0], - 'strs': strs[0], - 'times': times[0]}, - index=range(test_size)) - - -def test_generate_bq_schema_deprecated(): - # 11121 Deprecation of generate_bq_schema - with tm.assert_produces_warning(FutureWarning): - df = make_mixed_dataframe_v2(10) - gbq.generate_bq_schema(df) - - -class TestGBQConnectorIntegration(tm.TestCase): - - def setUp(self): - _setup_common() - _skip_if_no_project_id() - - self.sut = gbq.GbqConnector(_get_project_id(), - private_key=_get_private_key_path()) - - def test_should_be_able_to_make_a_connector(self): - self.assertTrue(self.sut is not None, - 'Could not create a GbqConnector') - - def test_should_be_able_to_get_valid_credentials(self): - credentials = self.sut.get_credentials() - self.assertFalse(credentials.invalid, 'Returned credentials invalid') - - def test_should_be_able_to_get_a_bigquery_service(self): - bigquery_service = self.sut.get_service() - self.assertTrue(bigquery_service is not None, 'No service returned') - - def test_should_be_able_to_get_schema_from_query(self): - schema, pages = self.sut.run_query('SELECT 1') - self.assertTrue(schema is not None) - - def test_should_be_able_to_get_results_from_query(self): - schema, pages = self.sut.run_query('SELECT 1') - self.assertTrue(pages is not None) - - def test_get_application_default_credentials_does_not_throw_error(self): - if _check_if_can_get_correct_default_credentials(): - raise nose.SkipTest("Can get default_credentials " - "from the environment!") - credentials = self.sut.get_application_default_credentials() - self.assertIsNone(credentials) - - def test_get_application_default_credentials_returns_credentials(self): - if not _check_if_can_get_correct_default_credentials(): - raise nose.SkipTest("Cannot get default_credentials " - "from the environment!") - from oauth2client.client import GoogleCredentials - credentials = self.sut.get_application_default_credentials() - self.assertTrue(isinstance(credentials, GoogleCredentials)) - - 
-class TestGBQConnectorServiceAccountKeyPathIntegration(tm.TestCase): - def setUp(self): - _setup_common() - - _skip_if_no_project_id() - _skip_if_no_private_key_path() - - self.sut = gbq.GbqConnector(_get_project_id(), - private_key=_get_private_key_path()) - - def test_should_be_able_to_make_a_connector(self): - self.assertTrue(self.sut is not None, - 'Could not create a GbqConnector') - - def test_should_be_able_to_get_valid_credentials(self): - credentials = self.sut.get_credentials() - self.assertFalse(credentials.invalid, 'Returned credentials invalid') - - def test_should_be_able_to_get_a_bigquery_service(self): - bigquery_service = self.sut.get_service() - self.assertTrue(bigquery_service is not None, 'No service returned') - - def test_should_be_able_to_get_schema_from_query(self): - schema, pages = self.sut.run_query('SELECT 1') - self.assertTrue(schema is not None) - - def test_should_be_able_to_get_results_from_query(self): - schema, pages = self.sut.run_query('SELECT 1') - self.assertTrue(pages is not None) - - -class TestGBQConnectorServiceAccountKeyContentsIntegration(tm.TestCase): - def setUp(self): - _setup_common() - - _skip_if_no_project_id() - _skip_if_no_private_key_path() - - self.sut = gbq.GbqConnector(_get_project_id(), - private_key=_get_private_key_path()) - - def test_should_be_able_to_make_a_connector(self): - self.assertTrue(self.sut is not None, - 'Could not create a GbqConnector') - - def test_should_be_able_to_get_valid_credentials(self): - credentials = self.sut.get_credentials() - self.assertFalse(credentials.invalid, 'Returned credentials invalid') - - def test_should_be_able_to_get_a_bigquery_service(self): - bigquery_service = self.sut.get_service() - self.assertTrue(bigquery_service is not None, 'No service returned') - - def test_should_be_able_to_get_schema_from_query(self): - schema, pages = self.sut.run_query('SELECT 1') - self.assertTrue(schema is not None) - - def test_should_be_able_to_get_results_from_query(self): - schema, pages = self.sut.run_query('SELECT 1') - self.assertTrue(pages is not None) - - -class GBQUnitTests(tm.TestCase): - def setUp(self): - _setup_common() - - def test_import_google_api_python_client(self): - if compat.PY2: - with tm.assertRaises(ImportError): - from googleapiclient.discovery import build # noqa - from googleapiclient.errors import HttpError # noqa - from apiclient.discovery import build # noqa - from apiclient.errors import HttpError # noqa - else: - from googleapiclient.discovery import build # noqa - from googleapiclient.errors import HttpError # noqa - - def test_should_return_bigquery_integers_as_python_floats(self): - result = gbq._parse_entry(1, 'INTEGER') - tm.assert_equal(result, float(1)) - - def test_should_return_bigquery_floats_as_python_floats(self): - result = gbq._parse_entry(1, 'FLOAT') - tm.assert_equal(result, float(1)) - - def test_should_return_bigquery_timestamps_as_numpy_datetime(self): - result = gbq._parse_entry('0e9', 'TIMESTAMP') - tm.assert_equal(result, np_datetime64_compat('1970-01-01T00:00:00Z')) - - def test_should_return_bigquery_booleans_as_python_booleans(self): - result = gbq._parse_entry('false', 'BOOLEAN') - tm.assert_equal(result, False) - - def test_should_return_bigquery_strings_as_python_strings(self): - result = gbq._parse_entry('STRING', 'STRING') - tm.assert_equal(result, 'STRING') - - def test_to_gbq_should_fail_if_invalid_table_name_passed(self): - with tm.assertRaises(gbq.NotFoundException): - gbq.to_gbq(DataFrame(), 'invalid_table_name', project_id="1234") - - def 
test_to_gbq_with_no_project_id_given_should_fail(self): - with tm.assertRaises(TypeError): - gbq.to_gbq(DataFrame(), 'dataset.tablename') - - def test_read_gbq_with_no_project_id_given_should_fail(self): - with tm.assertRaises(TypeError): - gbq.read_gbq('SELECT "1" as NUMBER_1') - - def test_that_parse_data_works_properly(self): - test_schema = {'fields': [ - {'mode': 'NULLABLE', 'name': 'VALID_STRING', 'type': 'STRING'}]} - test_page = [{'f': [{'v': 'PI'}]}] - - test_output = gbq._parse_data(test_schema, test_page) - correct_output = DataFrame({'VALID_STRING': ['PI']}) - tm.assert_frame_equal(test_output, correct_output) - - def test_read_gbq_with_invalid_private_key_json_should_fail(self): - with tm.assertRaises(gbq.InvalidPrivateKeyFormat): - gbq.read_gbq('SELECT 1', project_id='x', private_key='y') - - def test_read_gbq_with_empty_private_key_json_should_fail(self): - with tm.assertRaises(gbq.InvalidPrivateKeyFormat): - gbq.read_gbq('SELECT 1', project_id='x', private_key='{}') - - def test_read_gbq_with_private_key_json_wrong_types_should_fail(self): - with tm.assertRaises(gbq.InvalidPrivateKeyFormat): - gbq.read_gbq( - 'SELECT 1', project_id='x', - private_key='{ "client_email" : 1, "private_key" : True }') - - def test_read_gbq_with_empty_private_key_file_should_fail(self): - with tm.ensure_clean() as empty_file_path: - with tm.assertRaises(gbq.InvalidPrivateKeyFormat): - gbq.read_gbq('SELECT 1', project_id='x', - private_key=empty_file_path) - - def test_read_gbq_with_corrupted_private_key_json_should_fail(self): - _skip_if_no_private_key_path() - - with tm.assertRaises(gbq.InvalidPrivateKeyFormat): - gbq.read_gbq( - 'SELECT 1', project_id='x', - private_key=re.sub('[a-z]', '9', _get_private_key_path())) - - -class TestReadGBQIntegration(tm.TestCase): - - @classmethod - def setUpClass(cls): - # - GLOBAL CLASS FIXTURES - - # put here any instruction you want to execute only *ONCE* *BEFORE* - # executing *ALL* tests described below. - - _skip_if_no_project_id() - - _setup_common() - - def setUp(self): - # - PER-TEST FIXTURES - - # put here any instruction you want to be run *BEFORE* *EVERY* test is - # executed. - pass - - @classmethod - def tearDownClass(cls): - # - GLOBAL CLASS FIXTURES - - # put here any instruction you want to execute only *ONCE* *AFTER* - # executing all tests. - pass - - def tearDown(self): - # - PER-TEST FIXTURES - - # put here any instructions you want to be run *AFTER* *EVERY* test is - # executed. 
- pass - - def test_should_read_as_user_account(self): - if _in_travis_environment(): - raise nose.SkipTest("Cannot run local auth in travis environment") - - query = 'SELECT "PI" as VALID_STRING' - df = gbq.read_gbq(query, project_id=_get_project_id()) - tm.assert_frame_equal(df, DataFrame({'VALID_STRING': ['PI']})) - - def test_should_read_as_service_account_with_key_path(self): - _skip_if_no_private_key_path() - query = 'SELECT "PI" as VALID_STRING' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, DataFrame({'VALID_STRING': ['PI']})) - - def test_should_read_as_service_account_with_key_contents(self): - _skip_if_no_private_key_contents() - query = 'SELECT "PI" as VALID_STRING' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_contents()) - tm.assert_frame_equal(df, DataFrame({'VALID_STRING': ['PI']})) - - def test_should_properly_handle_valid_strings(self): - query = 'SELECT "PI" as VALID_STRING' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, DataFrame({'VALID_STRING': ['PI']})) - - def test_should_properly_handle_empty_strings(self): - query = 'SELECT "" as EMPTY_STRING' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, DataFrame({'EMPTY_STRING': [""]})) - - def test_should_properly_handle_null_strings(self): - query = 'SELECT STRING(NULL) as NULL_STRING' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, DataFrame({'NULL_STRING': [None]})) - - def test_should_properly_handle_valid_integers(self): - query = 'SELECT INTEGER(3) as VALID_INTEGER' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, DataFrame({'VALID_INTEGER': [3]})) - - def test_should_properly_handle_null_integers(self): - query = 'SELECT INTEGER(NULL) as NULL_INTEGER' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, DataFrame({'NULL_INTEGER': [np.nan]})) - - def test_should_properly_handle_valid_floats(self): - query = 'SELECT PI() as VALID_FLOAT' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, DataFrame( - {'VALID_FLOAT': [3.141592653589793]})) - - def test_should_properly_handle_null_floats(self): - query = 'SELECT FLOAT(NULL) as NULL_FLOAT' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, DataFrame({'NULL_FLOAT': [np.nan]})) - - def test_should_properly_handle_timestamp_unix_epoch(self): - query = 'SELECT TIMESTAMP("1970-01-01 00:00:00") as UNIX_EPOCH' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, DataFrame( - {'UNIX_EPOCH': [np.datetime64('1970-01-01T00:00:00.000000Z')]})) - - def test_should_properly_handle_arbitrary_timestamp(self): - query = 'SELECT TIMESTAMP("2004-09-15 05:00:00") as VALID_TIMESTAMP' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, DataFrame({ - 'VALID_TIMESTAMP': [np.datetime64('2004-09-15T05:00:00.000000Z')] - })) - - def test_should_properly_handle_null_timestamp(self): - query = 'SELECT TIMESTAMP(NULL) as 
NULL_TIMESTAMP' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, DataFrame({'NULL_TIMESTAMP': [NaT]})) - - def test_should_properly_handle_true_boolean(self): - query = 'SELECT BOOLEAN(TRUE) as TRUE_BOOLEAN' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, DataFrame({'TRUE_BOOLEAN': [True]})) - - def test_should_properly_handle_false_boolean(self): - query = 'SELECT BOOLEAN(FALSE) as FALSE_BOOLEAN' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, DataFrame({'FALSE_BOOLEAN': [False]})) - - def test_should_properly_handle_null_boolean(self): - query = 'SELECT BOOLEAN(NULL) as NULL_BOOLEAN' - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, DataFrame({'NULL_BOOLEAN': [None]})) - - def test_unicode_string_conversion_and_normalization(self): - correct_test_datatype = DataFrame( - {'UNICODE_STRING': [u("\xe9\xfc")]} - ) - - unicode_string = "\xc3\xa9\xc3\xbc" - - if compat.PY3: - unicode_string = unicode_string.encode('latin-1').decode('utf8') - - query = 'SELECT "{0}" as UNICODE_STRING'.format(unicode_string) - - df = gbq.read_gbq(query, project_id=_get_project_id(), - private_key=_get_private_key_path()) - tm.assert_frame_equal(df, correct_test_datatype) - - def test_index_column(self): - query = "SELECT 'a' as STRING_1, 'b' as STRING_2" - result_frame = gbq.read_gbq(query, project_id=_get_project_id(), - index_col="STRING_1", - private_key=_get_private_key_path()) - correct_frame = DataFrame( - {'STRING_1': ['a'], 'STRING_2': ['b']}).set_index("STRING_1") - tm.assert_equal(result_frame.index.name, correct_frame.index.name) - - def test_column_order(self): - query = "SELECT 'a' as STRING_1, 'b' as STRING_2, 'c' as STRING_3" - col_order = ['STRING_3', 'STRING_1', 'STRING_2'] - result_frame = gbq.read_gbq(query, project_id=_get_project_id(), - col_order=col_order, - private_key=_get_private_key_path()) - correct_frame = DataFrame({'STRING_1': ['a'], 'STRING_2': [ - 'b'], 'STRING_3': ['c']})[col_order] - tm.assert_frame_equal(result_frame, correct_frame) - - def test_column_order_plus_index(self): - query = "SELECT 'a' as STRING_1, 'b' as STRING_2, 'c' as STRING_3" - col_order = ['STRING_3', 'STRING_2'] - result_frame = gbq.read_gbq(query, project_id=_get_project_id(), - index_col='STRING_1', col_order=col_order, - private_key=_get_private_key_path()) - correct_frame = DataFrame( - {'STRING_1': ['a'], 'STRING_2': ['b'], 'STRING_3': ['c']}) - correct_frame.set_index('STRING_1', inplace=True) - correct_frame = correct_frame[col_order] - tm.assert_frame_equal(result_frame, correct_frame) - - def test_malformed_query(self): - with tm.assertRaises(gbq.GenericGBQException): - gbq.read_gbq("SELCET * FORM [publicdata:samples.shakespeare]", - project_id=_get_project_id(), - private_key=_get_private_key_path()) - - def test_bad_project_id(self): - with tm.assertRaises(gbq.GenericGBQException): - gbq.read_gbq("SELECT 1", project_id='001', - private_key=_get_private_key_path()) - - def test_bad_table_name(self): - with tm.assertRaises(gbq.GenericGBQException): - gbq.read_gbq("SELECT * FROM [publicdata:samples.nope]", - project_id=_get_project_id(), - private_key=_get_private_key_path()) - - def test_download_dataset_larger_than_200k_rows(self): - test_size = 200005 - # Test for known BigQuery bug in datasets larger 
than 100k rows - # http://stackoverflow.com/questions/19145587/bq-py-not-paging-results - df = gbq.read_gbq("SELECT id FROM [publicdata:samples.wikipedia] " - "GROUP EACH BY id ORDER BY id ASC LIMIT {0}" - .format(test_size), - project_id=_get_project_id(), - private_key=_get_private_key_path()) - self.assertEqual(len(df.drop_duplicates()), test_size) - - def test_zero_rows(self): - # Bug fix for https://github.com/pandas-dev/pandas/issues/10273 - df = gbq.read_gbq("SELECT title, id " - "FROM [publicdata:samples.wikipedia] " - "WHERE timestamp=-9999999", - project_id=_get_project_id(), - private_key=_get_private_key_path()) - page_array = np.zeros( - (0,), dtype=[('title', object), ('id', np.dtype(float))]) - expected_result = DataFrame(page_array, columns=['title', 'id']) - self.assert_frame_equal(df, expected_result) - - def test_legacy_sql(self): - legacy_sql = "SELECT id FROM [publicdata.samples.wikipedia] LIMIT 10" - - # Test that a legacy sql statement fails when - # setting dialect='standard' - with tm.assertRaises(gbq.GenericGBQException): - gbq.read_gbq(legacy_sql, project_id=_get_project_id(), - dialect='standard', - private_key=_get_private_key_path()) - - # Test that a legacy sql statement succeeds when - # setting dialect='legacy' - df = gbq.read_gbq(legacy_sql, project_id=_get_project_id(), - dialect='legacy', - private_key=_get_private_key_path()) - self.assertEqual(len(df.drop_duplicates()), 10) - - def test_standard_sql(self): - standard_sql = "SELECT DISTINCT id FROM " \ - "`publicdata.samples.wikipedia` LIMIT 10" - - # Test that a standard sql statement fails when using - # the legacy SQL dialect (default value) - with tm.assertRaises(gbq.GenericGBQException): - gbq.read_gbq(standard_sql, project_id=_get_project_id(), - private_key=_get_private_key_path()) - - # Test that a standard sql statement succeeds when - # setting dialect='standard' - df = gbq.read_gbq(standard_sql, project_id=_get_project_id(), - dialect='standard', - private_key=_get_private_key_path()) - self.assertEqual(len(df.drop_duplicates()), 10) - - def test_invalid_option_for_sql_dialect(self): - sql_statement = "SELECT DISTINCT id FROM " \ - "`publicdata.samples.wikipedia` LIMIT 10" - - # Test that an invalid option for `dialect` raises ValueError - with tm.assertRaises(ValueError): - gbq.read_gbq(sql_statement, project_id=_get_project_id(), - dialect='invalid', - private_key=_get_private_key_path()) - - # Test that a correct option for dialect succeeds - # to make sure ValueError was due to invalid dialect - gbq.read_gbq(sql_statement, project_id=_get_project_id(), - dialect='standard', private_key=_get_private_key_path()) - - def test_query_with_parameters(self): - sql_statement = "SELECT @param1 + @param2 as VALID_RESULT" - config = { - 'query': { - "useLegacySql": False, - "parameterMode": "named", - "queryParameters": [ - { - "name": "param1", - "parameterType": { - "type": "INTEGER" - }, - "parameterValue": { - "value": 1 - } - }, - { - "name": "param2", - "parameterType": { - "type": "INTEGER" - }, - "parameterValue": { - "value": 2 - } - } - ] - } - } - # Test that a query that relies on parameters fails - # when parameters are not supplied via configuration - with tm.assertRaises(ValueError): - gbq.read_gbq(sql_statement, project_id=_get_project_id(), - private_key=_get_private_key_path()) - - # Test that the query is successful because we have supplied - # the correct query parameters via the 'config' option - df = gbq.read_gbq(sql_statement, project_id=_get_project_id(), - 
private_key=_get_private_key_path(), - configuration=config) - tm.assert_frame_equal(df, DataFrame({'VALID_RESULT': [3]})) - - def test_query_inside_configuration(self): - query_no_use = 'SELECT "PI_WRONG" as VALID_STRING' - query = 'SELECT "PI" as VALID_STRING' - config = { - 'query': { - "query": query, - "useQueryCache": False, - } - } - # Test that the query cannot be passed both - # inside the config and as a parameter - with tm.assertRaises(ValueError): - gbq.read_gbq(query_no_use, project_id=_get_project_id(), - private_key=_get_private_key_path(), - configuration=config) - - df = gbq.read_gbq(None, project_id=_get_project_id(), - private_key=_get_private_key_path(), - configuration=config) - tm.assert_frame_equal(df, DataFrame({'VALID_STRING': ['PI']})) - - def test_configuration_without_query(self): - sql_statement = 'SELECT 1' - config = { - 'copy': { - "sourceTable": { - "projectId": _get_project_id(), - "datasetId": "publicdata:samples", - "tableId": "wikipedia" - }, - "destinationTable": { - "projectId": _get_project_id(), - "datasetId": "publicdata:samples", - "tableId": "wikipedia_copied" - }, - } - } - # Test that only 'query' configurations are supported, - # not 'copy', 'load' or 'extract' - with tm.assertRaises(ValueError): - gbq.read_gbq(sql_statement, project_id=_get_project_id(), - private_key=_get_private_key_path(), - configuration=config) - - -class TestToGBQIntegration(tm.TestCase): - # Changes to BigQuery table schema may take up to 2 minutes as of May 2015 - # As a workaround to this issue, each test should use a unique table name. - # Make sure to modify the for loop range in the tearDownClass when a new - # test is added. See `Issue 191 - # `__ - - @classmethod - def setUpClass(cls): - # - GLOBAL CLASS FIXTURES - - # put here any instruction you want to execute only *ONCE* *BEFORE* - # executing *ALL* tests described below. - - _skip_if_no_project_id() - - _setup_common() - clean_gbq_environment(_get_private_key_path()) - - gbq._Dataset(_get_project_id(), - private_key=_get_private_key_path() - ).create(DATASET_ID + "1") - - def setUp(self): - # - PER-TEST FIXTURES - - # put here any instruction you want to be run *BEFORE* *EVERY* test is - # executed. - - self.dataset = gbq._Dataset(_get_project_id(), - private_key=_get_private_key_path()) - self.table = gbq._Table(_get_project_id(), DATASET_ID + "1", - private_key=_get_private_key_path()) - self.sut = gbq.GbqConnector(_get_project_id(), - private_key=_get_private_key_path()) - - @classmethod - def tearDownClass(cls): - # - GLOBAL CLASS FIXTURES - - # put here any instruction you want to execute only *ONCE* *AFTER* - # executing all tests. - - clean_gbq_environment(_get_private_key_path()) - - def tearDown(self): - # - PER-TEST FIXTURES - - # put here any instructions you want to be run *AFTER* *EVERY* test is - # executed. - pass - - def test_upload_data(self): - destination_table = DESTINATION_TABLE + "1" - - test_size = 20001 - df = make_mixed_dataframe_v2(test_size) - - gbq.to_gbq(df, destination_table, _get_project_id(), chunksize=10000, - private_key=_get_private_key_path()) - - sleep(30) # <- Curses Google!!! 
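- # streamed rows may take a while to become queryable, hence the pause above before counting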
- - result = gbq.read_gbq("SELECT COUNT(*) as NUM_ROWS FROM {0}" - .format(destination_table), - project_id=_get_project_id(), - private_key=_get_private_key_path()) - self.assertEqual(result['NUM_ROWS'][0], test_size) - - def test_upload_data_if_table_exists_fail(self): - destination_table = DESTINATION_TABLE + "2" - - test_size = 10 - df = make_mixed_dataframe_v2(test_size) - self.table.create(TABLE_ID + "2", gbq._generate_bq_schema(df)) - - # Test the default value of if_exists is 'fail' - with tm.assertRaises(gbq.TableCreationError): - gbq.to_gbq(df, destination_table, _get_project_id(), - private_key=_get_private_key_path()) - - # Test the if_exists parameter with value 'fail' - with tm.assertRaises(gbq.TableCreationError): - gbq.to_gbq(df, destination_table, _get_project_id(), - if_exists='fail', private_key=_get_private_key_path()) - - def test_upload_data_if_table_exists_append(self): - destination_table = DESTINATION_TABLE + "3" - - test_size = 10 - df = make_mixed_dataframe_v2(test_size) - df_different_schema = tm.makeMixedDataFrame() - - # Initialize table with sample data - gbq.to_gbq(df, destination_table, _get_project_id(), chunksize=10000, - private_key=_get_private_key_path()) - - # Test the if_exists parameter with value 'append' - gbq.to_gbq(df, destination_table, _get_project_id(), - if_exists='append', private_key=_get_private_key_path()) - - sleep(30) # <- Curses Google!!! - - result = gbq.read_gbq("SELECT COUNT(*) as NUM_ROWS FROM {0}" - .format(destination_table), - project_id=_get_project_id(), - private_key=_get_private_key_path()) - self.assertEqual(result['NUM_ROWS'][0], test_size * 2) - - # Try inserting with a different schema, confirm failure - with tm.assertRaises(gbq.InvalidSchema): - gbq.to_gbq(df_different_schema, destination_table, - _get_project_id(), if_exists='append', - private_key=_get_private_key_path()) - - def test_upload_data_if_table_exists_replace(self): - - raise nose.SkipTest("buggy test") - - destination_table = DESTINATION_TABLE + "4" - - test_size = 10 - df = make_mixed_dataframe_v2(test_size) - df_different_schema = tm.makeMixedDataFrame() - - # Initialize table with sample data - gbq.to_gbq(df, destination_table, _get_project_id(), chunksize=10000, - private_key=_get_private_key_path()) - - # Test the if_exists parameter with the value 'replace'. - gbq.to_gbq(df_different_schema, destination_table, - _get_project_id(), if_exists='replace', - private_key=_get_private_key_path()) - - sleep(30) # <- Curses Google!!! 
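- # tm.makeMixedDataFrame() is a 5-row frame, so the replaced table should report NUM_ROWS == 5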
- - result = gbq.read_gbq("SELECT COUNT(*) as NUM_ROWS FROM {0}" - .format(destination_table), - project_id=_get_project_id(), - private_key=_get_private_key_path()) - self.assertEqual(result['NUM_ROWS'][0], 5) - - def test_google_upload_errors_should_raise_exception(self): - destination_table = DESTINATION_TABLE + "5" - - test_timestamp = datetime.now(pytz.timezone('US/Arizona')) - bad_df = DataFrame({'bools': [False, False], 'flts': [0.0, 1.0], - 'ints': [0, '1'], 'strs': ['a', 1], - 'times': [test_timestamp, test_timestamp]}, - index=range(2)) - - with tm.assertRaises(gbq.StreamingInsertError): - gbq.to_gbq(bad_df, destination_table, _get_project_id(), - verbose=True, private_key=_get_private_key_path()) - - def test_generate_schema(self): - df = tm.makeMixedDataFrame() - schema = gbq._generate_bq_schema(df) - - test_schema = {'fields': [{'name': 'A', 'type': 'FLOAT'}, - {'name': 'B', 'type': 'FLOAT'}, - {'name': 'C', 'type': 'STRING'}, - {'name': 'D', 'type': 'TIMESTAMP'}]} - - self.assertEqual(schema, test_schema) - - def test_create_table(self): - destination_table = TABLE_ID + "6" - test_schema = {'fields': [{'name': 'A', 'type': 'FLOAT'}, - {'name': 'B', 'type': 'FLOAT'}, - {'name': 'C', 'type': 'STRING'}, - {'name': 'D', 'type': 'TIMESTAMP'}]} - self.table.create(destination_table, test_schema) - self.assertTrue(self.table.exists(destination_table), - 'Expected table to exist') - - def test_table_does_not_exist(self): - self.assertTrue(not self.table.exists(TABLE_ID + "7"), - 'Expected table not to exist') - - def test_delete_table(self): - destination_table = TABLE_ID + "8" - test_schema = {'fields': [{'name': 'A', 'type': 'FLOAT'}, - {'name': 'B', 'type': 'FLOAT'}, - {'name': 'C', 'type': 'STRING'}, - {'name': 'D', 'type': 'TIMESTAMP'}]} - self.table.create(destination_table, test_schema) - self.table.delete(destination_table) - self.assertTrue(not self.table.exists( - destination_table), 'Expected table not to exist') - - def test_list_table(self): - destination_table = TABLE_ID + "9" - test_schema = {'fields': [{'name': 'A', 'type': 'FLOAT'}, - {'name': 'B', 'type': 'FLOAT'}, - {'name': 'C', 'type': 'STRING'}, - {'name': 'D', 'type': 'TIMESTAMP'}]} - self.table.create(destination_table, test_schema) - self.assertTrue( - destination_table in self.dataset.tables(DATASET_ID + "1"), - 'Expected table list to contain table {0}' - .format(destination_table)) - - def test_verify_schema_allows_flexible_column_order(self): - destination_table = TABLE_ID + "10" - test_schema_1 = {'fields': [{'name': 'A', 'type': 'FLOAT'}, - {'name': 'B', 'type': 'FLOAT'}, - {'name': 'C', 'type': 'STRING'}, - {'name': 'D', 'type': 'TIMESTAMP'}]} - test_schema_2 = {'fields': [{'name': 'A', 'type': 'FLOAT'}, - {'name': 'C', 'type': 'STRING'}, - {'name': 'B', 'type': 'FLOAT'}, - {'name': 'D', 'type': 'TIMESTAMP'}]} - - self.table.create(destination_table, test_schema_1) - self.assertTrue(self.sut.verify_schema( - DATASET_ID + "1", destination_table, test_schema_2), - 'Expected schema to match') - - def test_verify_schema_fails_different_data_type(self): - destination_table = TABLE_ID + "11" - test_schema_1 = {'fields': [{'name': 'A', 'type': 'FLOAT'}, - {'name': 'B', 'type': 'FLOAT'}, - {'name': 'C', 'type': 'STRING'}, - {'name': 'D', 'type': 'TIMESTAMP'}]} - test_schema_2 = {'fields': [{'name': 'A', 'type': 'FLOAT'}, - {'name': 'B', 'type': 'STRING'}, - {'name': 'C', 'type': 'STRING'}, - {'name': 'D', 'type': 'TIMESTAMP'}]} - - self.table.create(destination_table, test_schema_1) - 
self.assertFalse(self.sut.verify_schema( - DATASET_ID + "1", destination_table, test_schema_2), - 'Expected different schema') - - def test_verify_schema_fails_different_structure(self): - destination_table = TABLE_ID + "12" - test_schema_1 = {'fields': [{'name': 'A', 'type': 'FLOAT'}, - {'name': 'B', 'type': 'FLOAT'}, - {'name': 'C', 'type': 'STRING'}, - {'name': 'D', 'type': 'TIMESTAMP'}]} - test_schema_2 = {'fields': [{'name': 'A', 'type': 'FLOAT'}, - {'name': 'B2', 'type': 'FLOAT'}, - {'name': 'C', 'type': 'STRING'}, - {'name': 'D', 'type': 'TIMESTAMP'}]} - - self.table.create(destination_table, test_schema_1) - self.assertFalse(self.sut.verify_schema( - DATASET_ID + "1", destination_table, test_schema_2), - 'Expected different schema') - - def test_upload_data_flexible_column_order(self): - destination_table = DESTINATION_TABLE + "13" - - test_size = 10 - df = make_mixed_dataframe_v2(test_size) - - # Initialize table with sample data - gbq.to_gbq(df, destination_table, _get_project_id(), chunksize=10000, - private_key=_get_private_key_path()) - - df_columns_reversed = df[df.columns[::-1]] - - gbq.to_gbq(df_columns_reversed, destination_table, _get_project_id(), - if_exists='append', private_key=_get_private_key_path()) - - def test_list_dataset(self): - dataset_id = DATASET_ID + "1" - self.assertTrue(dataset_id in self.dataset.datasets(), - 'Expected dataset list to contain dataset {0}' - .format(dataset_id)) - - def test_list_table_zero_results(self): - dataset_id = DATASET_ID + "2" - self.dataset.create(dataset_id) - table_list = gbq._Dataset(_get_project_id(), - private_key=_get_private_key_path() - ).tables(dataset_id) - self.assertEqual(len(table_list), 0, - 'Expected gbq.list_table() to return 0') - - def test_create_dataset(self): - dataset_id = DATASET_ID + "3" - self.dataset.create(dataset_id) - self.assertTrue(dataset_id in self.dataset.datasets(), - 'Expected dataset to exist') - - def test_delete_dataset(self): - dataset_id = DATASET_ID + "4" - self.dataset.create(dataset_id) - self.dataset.delete(dataset_id) - self.assertTrue(dataset_id not in self.dataset.datasets(), - 'Expected dataset not to exist') - - def test_dataset_exists(self): - dataset_id = DATASET_ID + "5" - self.dataset.create(dataset_id) - self.assertTrue(self.dataset.exists(dataset_id), - 'Expected dataset to exist') - - def create_table_data_dataset_does_not_exist(self): - dataset_id = DATASET_ID + "6" - table_id = TABLE_ID + "1" - table_with_new_dataset = gbq._Table(_get_project_id(), dataset_id) - df = make_mixed_dataframe_v2(10) - table_with_new_dataset.create(table_id, gbq._generate_bq_schema(df)) - self.assertTrue(self.dataset.exists(dataset_id), - 'Expected dataset to exist') - self.assertTrue(table_with_new_dataset.exists( - table_id), 'Expected dataset to exist') - - def test_dataset_does_not_exist(self): - self.assertTrue(not self.dataset.exists( - DATASET_ID + "_not_found"), 'Expected dataset not to exist') - - -class TestToGBQIntegrationServiceAccountKeyPath(tm.TestCase): - # Changes to BigQuery table schema may take up to 2 minutes as of May 2015 - # As a workaround to this issue, each test should use a unique table name. - # Make sure to modify the for loop range in the tearDownClass when a new - # test is added - # See `Issue 191 - # `__ - - @classmethod - def setUpClass(cls): - # - GLOBAL CLASS FIXTURES - - # put here any instruction you want to execute only *ONCE* *BEFORE* - # executing *ALL* tests described below. 
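- # skip the whole class early when credentials are missing, before any BigQuery calls are made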
- - _skip_if_no_project_id() - _skip_if_no_private_key_path() - - _setup_common() - clean_gbq_environment(_get_private_key_path()) - - def setUp(self): - # - PER-TEST FIXTURES - - # put here any instruction you want to be run *BEFORE* *EVERY* test - # is executed. - pass - - @classmethod - def tearDownClass(cls): - # - GLOBAL CLASS FIXTURES - - # put here any instruction you want to execute only *ONCE* *AFTER* - # executing all tests. - - clean_gbq_environment(_get_private_key_path()) - - def tearDown(self): - # - PER-TEST FIXTURES - - # put here any instructions you want to be run *AFTER* *EVERY* test - # is executed. - pass - - def test_upload_data_as_service_account_with_key_path(self): - destination_table = "{0}.{1}".format(DATASET_ID + "2", TABLE_ID + "1") - - test_size = 10 - df = make_mixed_dataframe_v2(test_size) - - gbq.to_gbq(df, destination_table, _get_project_id(), chunksize=10000, - private_key=_get_private_key_path()) - - sleep(30) # <- Curses Google!!! - - result = gbq.read_gbq( - "SELECT COUNT(*) as NUM_ROWS FROM {0}".format(destination_table), - project_id=_get_project_id(), - private_key=_get_private_key_path()) - - self.assertEqual(result['NUM_ROWS'][0], test_size) - - -class TestToGBQIntegrationServiceAccountKeyContents(tm.TestCase): - # Changes to BigQuery table schema may take up to 2 minutes as of May 2015 - # As a workaround to this issue, each test should use a unique table name. - # Make sure to modify the for loop range in the tearDownClass when a new - # test is added - # See `Issue 191 - # `__ - - @classmethod - def setUpClass(cls): - # - GLOBAL CLASS FIXTURES - - # put here any instruction you want to execute only *ONCE* *BEFORE* - # executing *ALL* tests described below. - - _setup_common() - _skip_if_no_project_id() - _skip_if_no_private_key_contents() - - clean_gbq_environment(_get_private_key_contents()) - - def setUp(self): - # - PER-TEST FIXTURES - - # put here any instruction you want to be run *BEFORE* *EVERY* test - # is executed. - pass - - @classmethod - def tearDownClass(cls): - # - GLOBAL CLASS FIXTURES - - # put here any instruction you want to execute only *ONCE* *AFTER* - # executing all tests. - - clean_gbq_environment(_get_private_key_contents()) - - def tearDown(self): - # - PER-TEST FIXTURES - - # put here any instructions you want to be run *AFTER* *EVERY* test - # is executed. - pass - - def test_upload_data_as_service_account_with_key_contents(self): - destination_table = "{0}.{1}".format(DATASET_ID + "3", TABLE_ID + "1") - - test_size = 10 - df = make_mixed_dataframe_v2(test_size) - - gbq.to_gbq(df, destination_table, _get_project_id(), chunksize=10000, - private_key=_get_private_key_contents()) - - sleep(30) # <- Curses Google!!! 
- - result = gbq.read_gbq( - "SELECT COUNT(*) as NUM_ROWS FROM {0}".format(destination_table), - project_id=_get_project_id(), - private_key=_get_private_key_contents()) - self.assertEqual(result['NUM_ROWS'][0], test_size) - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/io/tests/test_pickle.py b/pandas/io/tests/test_pickle.py deleted file mode 100644 index a49f50b1bcb9f..0000000000000 --- a/pandas/io/tests/test_pickle.py +++ /dev/null @@ -1,291 +0,0 @@ -# pylint: disable=E1101,E1103,W0232 - -""" manage legacy pickle tests """ - -import nose -import os - -from distutils.version import LooseVersion - -import pandas as pd -from pandas import Index -from pandas.compat import u, is_platform_little_endian -import pandas -import pandas.util.testing as tm -from pandas.tseries.offsets import Day, MonthEnd - - -class TestPickle(): - """ - How to add pickle tests: - - 1. Install pandas version intended to output the pickle. - - 2. Execute "generate_legacy_storage_files.py" to create the pickle. - $ python generate_legacy_storage_files.py pickle - - 3. Move the created pickle to "data/legacy_pickle/" directory. - - NOTE: TestPickle can't be a subclass of tm.Testcase to use test generator. - http://stackoverflow.com/questions/6689537/ - nose-test-generators-inside-class - """ - _multiprocess_can_split_ = True - - def setUp(self): - from pandas.io.tests.generate_legacy_storage_files import ( - create_pickle_data) - self.data = create_pickle_data() - self.path = u('__%s__.pickle' % tm.rands(10)) - - def compare_element(self, result, expected, typ, version=None): - if isinstance(expected, Index): - tm.assert_index_equal(expected, result) - return - - if typ.startswith('sp_'): - comparator = getattr(tm, "assert_%s_equal" % typ) - comparator(result, expected, exact_indices=False) - elif typ == 'timestamp': - if expected is pd.NaT: - assert result is pd.NaT - else: - tm.assert_equal(result, expected) - tm.assert_equal(result.freq, expected.freq) - else: - comparator = getattr(tm, "assert_%s_equal" % - typ, tm.assert_almost_equal) - comparator(result, expected) - - def compare(self, vf, version): - - # py3 compat when reading py2 pickle - try: - data = pandas.read_pickle(vf) - except (ValueError) as e: - if 'unsupported pickle protocol:' in str(e): - # trying to read a py3 pickle in py2 - return - else: - raise - - for typ, dv in data.items(): - for dt, result in dv.items(): - try: - expected = self.data[typ][dt] - except (KeyError): - if version in ('0.10.1', '0.11.0') and dt == 'reg': - break - else: - raise - - # use a specific comparator - # if available - comparator = "compare_{typ}_{dt}".format(typ=typ, dt=dt) - comparator = getattr(self, comparator, self.compare_element) - comparator(result, expected, typ, version) - return data - - def compare_sp_series_ts(self, res, exp, typ, version): - # SparseTimeSeries integrated into SparseSeries in 0.12.0 - # and deprecated in 0.17.0 - if version and LooseVersion(version) <= "0.12.0": - tm.assert_sp_series_equal(res, exp, check_series_type=False) - else: - tm.assert_sp_series_equal(res, exp) - - def compare_series_ts(self, result, expected, typ, version): - # GH 7748 - tm.assert_series_equal(result, expected) - tm.assert_equal(result.index.freq, expected.index.freq) - tm.assert_equal(result.index.freq.normalize, False) - tm.assert_series_equal(result > 0, expected > 0) - - # GH 9291 - freq = result.index.freq - tm.assert_equal(freq + Day(1), Day(2)) - - res = freq + 
pandas.Timedelta(hours=1) - tm.assert_equal(isinstance(res, pandas.Timedelta), True) - tm.assert_equal(res, pandas.Timedelta(days=1, hours=1)) - - res = freq + pandas.Timedelta(nanoseconds=1) - tm.assert_equal(isinstance(res, pandas.Timedelta), True) - tm.assert_equal(res, pandas.Timedelta(days=1, nanoseconds=1)) - - def compare_series_dt_tz(self, result, expected, typ, version): - # 8260 - # dtype is object < 0.17.0 - if LooseVersion(version) < '0.17.0': - expected = expected.astype(object) - tm.assert_series_equal(result, expected) - else: - tm.assert_series_equal(result, expected) - - def compare_series_cat(self, result, expected, typ, version): - # Categorical dtype is added in 0.15.0 - # ordered is changed in 0.16.0 - if LooseVersion(version) < '0.15.0': - tm.assert_series_equal(result, expected, check_dtype=False, - check_categorical=False) - elif LooseVersion(version) < '0.16.0': - tm.assert_series_equal(result, expected, check_categorical=False) - else: - tm.assert_series_equal(result, expected) - - def compare_frame_dt_mixed_tzs(self, result, expected, typ, version): - # 8260 - # dtype is object < 0.17.0 - if LooseVersion(version) < '0.17.0': - expected = expected.astype(object) - tm.assert_frame_equal(result, expected) - else: - tm.assert_frame_equal(result, expected) - - def compare_frame_cat_onecol(self, result, expected, typ, version): - # Categorical dtype is added in 0.15.0 - # ordered is changed in 0.16.0 - if LooseVersion(version) < '0.15.0': - tm.assert_frame_equal(result, expected, check_dtype=False, - check_categorical=False) - elif LooseVersion(version) < '0.16.0': - tm.assert_frame_equal(result, expected, check_categorical=False) - else: - tm.assert_frame_equal(result, expected) - - def compare_frame_cat_and_float(self, result, expected, typ, version): - self.compare_frame_cat_onecol(result, expected, typ, version) - - def compare_index_period(self, result, expected, typ, version): - tm.assert_index_equal(result, expected) - tm.assertIsInstance(result.freq, MonthEnd) - tm.assert_equal(result.freq, MonthEnd()) - tm.assert_equal(result.freqstr, 'M') - tm.assert_index_equal(result.shift(2), expected.shift(2)) - - def compare_sp_frame_float(self, result, expected, typ, version): - if LooseVersion(version) <= '0.18.1': - tm.assert_sp_frame_equal(result, expected, exact_indices=False, - check_dtype=False) - else: - tm.assert_sp_frame_equal(result, expected) - - def read_pickles(self, version): - if not is_platform_little_endian(): - raise nose.SkipTest("known failure on non-little endian") - - pth = tm.get_data_path('legacy_pickle/{0}'.format(str(version))) - n = 0 - for f in os.listdir(pth): - vf = os.path.join(pth, f) - data = self.compare(vf, version) - - if data is None: - continue - n += 1 - assert n > 0, 'Pickle files are not tested' - - def test_pickles(self): - pickle_path = tm.get_data_path('legacy_pickle') - n = 0 - for v in os.listdir(pickle_path): - pth = os.path.join(pickle_path, v) - if os.path.isdir(pth): - yield self.read_pickles, v - n += 1 - assert n > 0, 'Pickle files are not tested' - - def test_round_trip_current(self): - - try: - import cPickle as c_pickle - - def c_pickler(obj, path): - with open(path, 'wb') as fh: - c_pickle.dump(obj, fh, protocol=-1) - - def c_unpickler(path): - with open(path, 'rb') as fh: - fh.seek(0) - return c_pickle.load(fh) - except: - c_pickler = None - c_unpickler = None - - import pickle as python_pickle - - def python_pickler(obj, path): - with open(path, 'wb') as fh: - python_pickle.dump(obj, fh, protocol=-1) - - def 
python_unpickler(path): - with open(path, 'rb') as fh: - fh.seek(0) - return python_pickle.load(fh) - - for typ, dv in self.data.items(): - for dt, expected in dv.items(): - - for writer in [pd.to_pickle, c_pickler, python_pickler]: - if writer is None: - continue - - with tm.ensure_clean(self.path) as path: - - # test writing with each pickler - writer(expected, path) - - # test reading with each unpickler - result = pd.read_pickle(path) - self.compare_element(result, expected, typ) - - if c_unpickler is not None: - result = c_unpickler(path) - self.compare_element(result, expected, typ) - - result = python_unpickler(path) - self.compare_element(result, expected, typ) - - def test_pickle_v0_14_1(self): - - # we have the name warning - # 10482 - with tm.assert_produces_warning(UserWarning): - cat = pd.Categorical(values=['a', 'b', 'c'], - categories=['a', 'b', 'c', 'd'], - name='foobar', ordered=False) - pickle_path = os.path.join(tm.get_data_path(), - 'categorical_0_14_1.pickle') - # This code was executed once on v0.14.1 to generate the pickle: - # - # cat = Categorical(labels=np.arange(3), levels=['a', 'b', 'c', 'd'], - # name='foobar') - # with open(pickle_path, 'wb') as f: pickle.dump(cat, f) - # - tm.assert_categorical_equal(cat, pd.read_pickle(pickle_path)) - - def test_pickle_v0_15_2(self): - # ordered -> _ordered - # GH 9347 - - # we have the name warning - # 10482 - with tm.assert_produces_warning(UserWarning): - cat = pd.Categorical(values=['a', 'b', 'c'], - categories=['a', 'b', 'c', 'd'], - name='foobar', ordered=False) - pickle_path = os.path.join(tm.get_data_path(), - 'categorical_0_15_2.pickle') - # This code was executed once on v0.15.2 to generate the pickle: - # - # cat = Categorical(labels=np.arange(3), levels=['a', 'b', 'c', 'd'], - # name='foobar') - # with open(pickle_path, 'wb') as f: pickle.dump(cat, f) - # - tm.assert_categorical_equal(cat, pd.read_pickle(pickle_path)) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - # '--with-coverage', '--cover-package=pandas.core'], - exit=False) diff --git a/pandas/io/tests/test_s3.py b/pandas/io/tests/test_s3.py deleted file mode 100644 index 8058698a906ea..0000000000000 --- a/pandas/io/tests/test_s3.py +++ /dev/null @@ -1,14 +0,0 @@ -import nose -from pandas.util import testing as tm - -from pandas.io.common import _is_s3_url - - -class TestS3URL(tm.TestCase): - def test_is_s3_url(self): - self.assertTrue(_is_s3_url("s3://pandas/somethingelse.com")) - self.assertFalse(_is_s3_url("s4://pandas/somethingelse.com")) - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/json.py b/pandas/json.py new file mode 100644 index 0000000000000..16d6580c87951 --- /dev/null +++ b/pandas/json.py @@ -0,0 +1,7 @@ +# flake8: noqa + +import warnings +warnings.warn("The pandas.json module is deprecated and will be " + "removed in a future version. Please import from " + "pandas.io.json instead", FutureWarning, stacklevel=2) +from pandas._libs.json import dumps, loads diff --git a/pandas/lib.py b/pandas/lib.py new file mode 100644 index 0000000000000..859a78060fcc1 --- /dev/null +++ b/pandas/lib.py @@ -0,0 +1,8 @@ +# flake8: noqa + +import warnings +warnings.warn("The pandas.lib module is deprecated and will be " + "removed in a future version. 
These are private functions " + "and can be accessed from pandas._libs.lib instead", + FutureWarning, stacklevel=2) +from pandas._libs.lib import * diff --git a/pandas/parser.py b/pandas/parser.py new file mode 100644 index 0000000000000..f43a408c943d0 --- /dev/null +++ b/pandas/parser.py @@ -0,0 +1,8 @@ +# flake8: noqa + +import warnings +warnings.warn("The pandas.parser module is deprecated and will be " + "removed in a future version. Please import from " + "pandas.io.parser instead", FutureWarning, stacklevel=2) +from pandas._libs.parsers import na_values +from pandas.io.common import CParserError diff --git a/pandas/plotting/__init__.py b/pandas/plotting/__init__.py new file mode 100644 index 0000000000000..c3cbedb0fc28c --- /dev/null +++ b/pandas/plotting/__init__.py @@ -0,0 +1,19 @@ +""" +Plotting api +""" + +# flake8: noqa + +try: # mpl optional + from pandas.plotting import _converter + _converter.register() # needs to override so set_xlim works with str/number +except ImportError: + pass + +from pandas.plotting._misc import (scatter_matrix, radviz, + andrews_curves, bootstrap_plot, + parallel_coordinates, lag_plot, + autocorrelation_plot) +from pandas.plotting._core import boxplot +from pandas.plotting._style import plot_params +from pandas.plotting._tools import table diff --git a/pandas/plotting/_compat.py b/pandas/plotting/_compat.py new file mode 100644 index 0000000000000..7b04b9e1171ec --- /dev/null +++ b/pandas/plotting/_compat.py @@ -0,0 +1,67 @@ +# being a bit too dynamic +# pylint: disable=E1101 +from __future__ import division + +from distutils.version import LooseVersion + + +def _mpl_le_1_2_1(): + try: + import matplotlib as mpl + return (str(mpl.__version__) <= LooseVersion('1.2.1') and + str(mpl.__version__)[0] != '0') + except ImportError: + return False + + +def _mpl_ge_1_3_1(): + try: + import matplotlib + # The or v[0] == '0' is because their versioneer is + # messed up on dev + return (matplotlib.__version__ >= LooseVersion('1.3.1') or + matplotlib.__version__[0] == '0') + except ImportError: + return False + + +def _mpl_ge_1_4_0(): + try: + import matplotlib + return (matplotlib.__version__ >= LooseVersion('1.4') or + matplotlib.__version__[0] == '0') + except ImportError: + return False + + +def _mpl_ge_1_5_0(): + try: + import matplotlib + return (matplotlib.__version__ >= LooseVersion('1.5') or + matplotlib.__version__[0] == '0') + except ImportError: + return False + + +def _mpl_ge_2_0_0(): + try: + import matplotlib + return matplotlib.__version__ >= LooseVersion('2.0') + except ImportError: + return False + + +def _mpl_le_2_0_0(): + try: + import matplotlib + return matplotlib.compare_versions('2.0.0', matplotlib.__version__) + except ImportError: + return False + + +def _mpl_ge_2_0_1(): + try: + import matplotlib + return matplotlib.__version__ >= LooseVersion('2.0.1') + except ImportError: + return False diff --git a/pandas/plotting/_converter.py b/pandas/plotting/_converter.py new file mode 100644 index 0000000000000..47d15195315ba --- /dev/null +++ b/pandas/plotting/_converter.py @@ -0,0 +1,1046 @@ +from datetime import datetime, timedelta +import datetime as pydt +import numpy as np + +from dateutil.relativedelta import relativedelta + +import matplotlib.units as units +import matplotlib.dates as dates + +from matplotlib.ticker import Formatter, AutoLocator, Locator +from matplotlib.transforms import nonsingular + +from pandas.core.dtypes.common import ( + is_float, is_integer, + is_integer_dtype, + is_float_dtype, + is_datetime64_ns_dtype, + 
is_period_arraylike, + is_nested_list_like +) +from pandas.core.dtypes.generic import ABCSeries + +from pandas.compat import lrange +import pandas.compat as compat +import pandas._libs.lib as lib +import pandas.core.common as com +from pandas.core.index import Index + +from pandas.core.indexes.datetimes import date_range +import pandas.core.tools.datetimes as tools +import pandas.tseries.frequencies as frequencies +from pandas.tseries.frequencies import FreqGroup +from pandas.core.indexes.period import Period, PeriodIndex + +from pandas.plotting._compat import _mpl_le_2_0_0 + +# constants +HOURS_PER_DAY = 24. +MIN_PER_HOUR = 60. +SEC_PER_MIN = 60. + +SEC_PER_HOUR = SEC_PER_MIN * MIN_PER_HOUR +SEC_PER_DAY = SEC_PER_HOUR * HOURS_PER_DAY + +MUSEC_PER_DAY = 1e6 * SEC_PER_DAY + + +def register(): + units.registry[lib.Timestamp] = DatetimeConverter() + units.registry[Period] = PeriodConverter() + units.registry[pydt.datetime] = DatetimeConverter() + units.registry[pydt.date] = DatetimeConverter() + units.registry[pydt.time] = TimeConverter() + units.registry[np.datetime64] = DatetimeConverter() + + +def _to_ordinalf(tm): + tot_sec = (tm.hour * 3600 + tm.minute * 60 + tm.second + + float(tm.microsecond / 1e6)) + return tot_sec + + +def time2num(d): + if isinstance(d, compat.string_types): + parsed = tools.to_datetime(d) + if not isinstance(parsed, datetime): + raise ValueError('Could not parse time %s' % d) + return _to_ordinalf(parsed.time()) + if isinstance(d, pydt.time): + return _to_ordinalf(d) + return d + + +class TimeConverter(units.ConversionInterface): + + @staticmethod + def convert(value, unit, axis): + valid_types = (str, pydt.time) + if (isinstance(value, valid_types) or is_integer(value) or + is_float(value)): + return time2num(value) + if isinstance(value, Index): + return value.map(time2num) + if isinstance(value, (list, tuple, np.ndarray, Index)): + return [time2num(x) for x in value] + return value + + @staticmethod + def axisinfo(unit, axis): + if unit != 'time': + return None + + majloc = AutoLocator() + majfmt = TimeFormatter(majloc) + return units.AxisInfo(majloc=majloc, majfmt=majfmt, label='time') + + @staticmethod + def default_units(x, axis): + return 'time' + + +# time formatter +class TimeFormatter(Formatter): + + def __init__(self, locs): + self.locs = locs + + def __call__(self, x, pos=0): + fmt = '%H:%M:%S' + s = int(x) + ms = int((x - s) * 1e3) + us = int((x - s) * 1e6 - ms) + m, s = divmod(s, 60) + h, m = divmod(m, 60) + _, h = divmod(h, 24) + if us != 0: + fmt += '.%6f' + elif ms != 0: + fmt += '.%3f' + + return pydt.time(h, m, s, us).strftime(fmt) + + +# Period Conversion + + +class PeriodConverter(dates.DateConverter): + + @staticmethod + def convert(values, units, axis): + if is_nested_list_like(values): + values = [PeriodConverter._convert_1d(v, units, axis) + for v in values] + else: + values = PeriodConverter._convert_1d(values, units, axis) + return values + + @staticmethod + def _convert_1d(values, units, axis): + if not hasattr(axis, 'freq'): + raise TypeError('Axis must have `freq` set to convert to Periods') + valid_types = (compat.string_types, datetime, + Period, pydt.date, pydt.time) + if (isinstance(values, valid_types) or is_integer(values) or + is_float(values)): + return get_datevalue(values, axis.freq) + if isinstance(values, PeriodIndex): + return values.asfreq(axis.freq)._values + if isinstance(values, Index): + return values.map(lambda x: get_datevalue(x, axis.freq)) + if is_period_arraylike(values): + return PeriodIndex(values, 
freq=axis.freq)._values + if isinstance(values, (list, tuple, np.ndarray, Index)): + return [get_datevalue(x, axis.freq) for x in values] + return values + + +def get_datevalue(date, freq): + if isinstance(date, Period): + return date.asfreq(freq).ordinal + elif isinstance(date, (compat.string_types, datetime, + pydt.date, pydt.time)): + return Period(date, freq).ordinal + elif (is_integer(date) or is_float(date) or + (isinstance(date, (np.ndarray, Index)) and (date.size == 1))): + return date + elif date is None: + return None + raise ValueError("Unrecognizable date '%s'" % date) + + +def _dt_to_float_ordinal(dt): + """ + Convert :mod:`datetime` to the Gregorian date as UTC float days, + preserving hours, minutes, seconds and microseconds. Return value + is a :func:`float`. + """ + if (isinstance(dt, (np.ndarray, Index, ABCSeries) + ) and is_datetime64_ns_dtype(dt)): + base = dates.epoch2num(dt.asi8 / 1.0E9) + else: + base = dates.date2num(dt) + return base + + +# Datetime Conversion +class DatetimeConverter(dates.DateConverter): + + @staticmethod + def convert(values, unit, axis): + # values might be a 1-d array, or a list-like of arrays. + if is_nested_list_like(values): + values = [DatetimeConverter._convert_1d(v, unit, axis) + for v in values] + else: + values = DatetimeConverter._convert_1d(values, unit, axis) + return values + + @staticmethod + def _convert_1d(values, unit, axis): + def try_parse(values): + try: + return _dt_to_float_ordinal(tools.to_datetime(values)) + except Exception: + return values + + if isinstance(values, (datetime, pydt.date)): + return _dt_to_float_ordinal(values) + elif isinstance(values, np.datetime64): + return _dt_to_float_ordinal(lib.Timestamp(values)) + elif isinstance(values, pydt.time): + return dates.date2num(values) + elif (is_integer(values) or is_float(values)): + return values + elif isinstance(values, compat.string_types): + return try_parse(values) + elif isinstance(values, (list, tuple, np.ndarray, Index)): + if isinstance(values, Index): + values = values.values + if not isinstance(values, np.ndarray): + values = com._asarray_tuplesafe(values) + + if is_integer_dtype(values) or is_float_dtype(values): + return values + + try: + values = tools.to_datetime(values) + if isinstance(values, Index): + values = _dt_to_float_ordinal(values) + else: + values = [_dt_to_float_ordinal(x) for x in values] + except Exception: + values = _dt_to_float_ordinal(values) + + return values + + @staticmethod + def axisinfo(unit, axis): + """ + Return the :class:`~matplotlib.units.AxisInfo` for *unit*. + + *unit* is a tzinfo instance or None. + The *axis* argument is required but not used. + """ + tz = unit + + majloc = PandasAutoDateLocator(tz=tz) + majfmt = PandasAutoDateFormatter(majloc, tz=tz) + datemin = pydt.date(2000, 1, 1) + datemax = pydt.date(2010, 1, 1) + + return units.AxisInfo(majloc=majloc, majfmt=majfmt, label='', + default_limits=(datemin, datemax)) + + +class PandasAutoDateFormatter(dates.AutoDateFormatter): + + def __init__(self, locator, tz=None, defaultfmt='%Y-%m-%d'): + dates.AutoDateFormatter.__init__(self, locator, tz, defaultfmt) + # matplotlib.dates._UTC has no _utcoffset called by pandas + if self._tz is dates.UTC: + self._tz._utcoffset = self._tz.utcoffset(None) + + # For mpl > 2.0 the format strings are controlled via rcparams + # so do not mess with them. For mpl < 2.0 change the second + # break point and add a musec break point + if _mpl_le_2_0_0(): + self.scaled[1. / SEC_PER_DAY] = '%H:%M:%S' + self.scaled[1. 
/ MUSEC_PER_DAY] = '%H:%M:%S.%f' + + +class PandasAutoDateLocator(dates.AutoDateLocator): + + def get_locator(self, dmin, dmax): + 'Pick the best locator based on a distance.' + delta = relativedelta(dmax, dmin) + + num_days = (delta.years * 12.0 + delta.months) * 31.0 + delta.days + num_sec = (delta.hours * 60.0 + delta.minutes) * 60.0 + delta.seconds + tot_sec = num_days * 86400. + num_sec + + if abs(tot_sec) < self.minticks: + self._freq = -1 + locator = MilliSecondLocator(self.tz) + locator.set_axis(self.axis) + + locator.set_view_interval(*self.axis.get_view_interval()) + locator.set_data_interval(*self.axis.get_data_interval()) + return locator + + return dates.AutoDateLocator.get_locator(self, dmin, dmax) + + def _get_unit(self): + return MilliSecondLocator.get_unit_generic(self._freq) + + +class MilliSecondLocator(dates.DateLocator): + + UNIT = 1. / (24 * 3600 * 1000) + + def __init__(self, tz): + dates.DateLocator.__init__(self, tz) + self._interval = 1. + + def _get_unit(self): + return self.get_unit_generic(-1) + + @staticmethod + def get_unit_generic(freq): + unit = dates.RRuleLocator.get_unit_generic(freq) + if unit < 0: + return MilliSecondLocator.UNIT + return unit + + def __call__(self): + # if no data have been set, this will tank with a ValueError + try: + dmin, dmax = self.viewlim_to_dt() + except ValueError: + return [] + + if dmin > dmax: + dmax, dmin = dmin, dmax + # We need to cap at the endpoints of valid datetime + + # TODO(wesm) unused? + # delta = relativedelta(dmax, dmin) + # try: + # start = dmin - delta + # except ValueError: + # start = _from_ordinal(1.0) + + # try: + # stop = dmax + delta + # except ValueError: + # # The magic number! + # stop = _from_ordinal(3652059.9999999) + + nmax, nmin = dates.date2num((dmax, dmin)) + + num = (nmax - nmin) * 86400 * 1000 + max_millis_ticks = 6 + for interval in [1, 10, 50, 100, 200, 500]: + if num <= interval * (max_millis_ticks - 1): + self._interval = interval + break + else: + # We went through the whole loop without breaking, default to 1 + self._interval = 1000. + + estimate = (nmax - nmin) / (self._get_unit() * self._get_interval()) + + if estimate > self.MAXTICKS * 2: + raise RuntimeError(('MillisecondLocator estimated to generate %d ' + 'ticks from %s to %s: exceeds Locator.MAXTICKS' + '* 2 (%d) ') % + (estimate, dmin, dmax, self.MAXTICKS * 2)) + + freq = '%dL' % self._get_interval() + tz = self.tz.tzname(None) + st = _from_ordinal(dates.date2num(dmin)) # strip tz + ed = _from_ordinal(dates.date2num(dmax)) + all_dates = date_range(start=st, end=ed, freq=freq, tz=tz).asobject + + try: + if len(all_dates) > 0: + locs = self.raise_if_exceeds(dates.date2num(all_dates)) + return locs + except Exception: # pragma: no cover + pass + + lims = dates.date2num([dmin, dmax]) + return lims + + def _get_interval(self): + return self._interval + + def autoscale(self): + """ + Set the view limits to include the data range. + """ + dmin, dmax = self.datalim_to_dt() + if dmin > dmax: + dmax, dmin = dmin, dmax + + # We need to cap at the endpoints of valid datetime + + # TODO(wesm): unused? + + # delta = relativedelta(dmax, dmin) + # try: + # start = dmin - delta + # except ValueError: + # start = _from_ordinal(1.0) + + # try: + # stop = dmax + delta + # except ValueError: + # # The magic number! 
+ # stop = _from_ordinal(3652059.9999999) + + dmin, dmax = self.datalim_to_dt() + + vmin = dates.date2num(dmin) + vmax = dates.date2num(dmax) + + return self.nonsingular(vmin, vmax) + + +def _from_ordinal(x, tz=None): + ix = int(x) + dt = datetime.fromordinal(ix) + remainder = float(x) - ix + hour, remainder = divmod(24 * remainder, 1) + minute, remainder = divmod(60 * remainder, 1) + second, remainder = divmod(60 * remainder, 1) + microsecond = int(1e6 * remainder) + if microsecond < 10: + microsecond = 0 # compensate for rounding errors + dt = datetime(dt.year, dt.month, dt.day, int(hour), int(minute), + int(second), microsecond) + if tz is not None: + dt = dt.astimezone(tz) + + if microsecond > 999990: # compensate for rounding errors + dt += timedelta(microseconds=1e6 - microsecond) + + return dt + +# Fixed frequency dynamic tick locators and formatters + +# ------------------------------------------------------------------------- +# --- Locators --- +# ------------------------------------------------------------------------- + + +def _get_default_annual_spacing(nyears): + """ + Returns a default spacing between consecutive ticks for annual data. + """ + if nyears < 11: + (min_spacing, maj_spacing) = (1, 1) + elif nyears < 20: + (min_spacing, maj_spacing) = (1, 2) + elif nyears < 50: + (min_spacing, maj_spacing) = (1, 5) + elif nyears < 100: + (min_spacing, maj_spacing) = (5, 10) + elif nyears < 200: + (min_spacing, maj_spacing) = (5, 25) + elif nyears < 600: + (min_spacing, maj_spacing) = (10, 50) + else: + factor = nyears // 1000 + 1 + (min_spacing, maj_spacing) = (factor * 20, factor * 100) + return (min_spacing, maj_spacing) + + +def period_break(dates, period): + """ + Returns the indices where the given period changes. + + Parameters + ---------- + dates : PeriodIndex + Array of intervals to monitor. + period : string + Name of the period to monitor. + """ + current = getattr(dates, period) + previous = getattr(dates - 1, period) + return np.nonzero(current - previous)[0] + + +def has_level_label(label_flags, vmin): + """ + Returns true if the ``label_flags`` indicate there is at least one label + for this level. + + if the minimum view limit is not an exact integer, then the first tick + label won't be shown, so we must adjust for that. 
+ """ + if label_flags.size == 0 or (label_flags.size == 1 and + label_flags[0] == 0 and + vmin % 1 > 0.0): + return False + else: + return True + + +def _daily_finder(vmin, vmax, freq): + periodsperday = -1 + + if freq >= FreqGroup.FR_HR: + if freq == FreqGroup.FR_NS: + periodsperday = 24 * 60 * 60 * 1000000000 + elif freq == FreqGroup.FR_US: + periodsperday = 24 * 60 * 60 * 1000000 + elif freq == FreqGroup.FR_MS: + periodsperday = 24 * 60 * 60 * 1000 + elif freq == FreqGroup.FR_SEC: + periodsperday = 24 * 60 * 60 + elif freq == FreqGroup.FR_MIN: + periodsperday = 24 * 60 + elif freq == FreqGroup.FR_HR: + periodsperday = 24 + else: # pragma: no cover + raise ValueError("unexpected frequency: %s" % freq) + periodsperyear = 365 * periodsperday + periodspermonth = 28 * periodsperday + + elif freq == FreqGroup.FR_BUS: + periodsperyear = 261 + periodspermonth = 19 + elif freq == FreqGroup.FR_DAY: + periodsperyear = 365 + periodspermonth = 28 + elif frequencies.get_freq_group(freq) == FreqGroup.FR_WK: + periodsperyear = 52 + periodspermonth = 3 + else: # pragma: no cover + raise ValueError("unexpected frequency") + + # save this for later usage + vmin_orig = vmin + + (vmin, vmax) = (Period(ordinal=int(vmin), freq=freq), + Period(ordinal=int(vmax), freq=freq)) + span = vmax.ordinal - vmin.ordinal + 1 + dates_ = PeriodIndex(start=vmin, end=vmax, freq=freq) + # Initialize the output + info = np.zeros(span, + dtype=[('val', np.int64), ('maj', bool), + ('min', bool), ('fmt', '|S20')]) + info['val'][:] = dates_._values + info['fmt'][:] = '' + info['maj'][[0, -1]] = True + # .. and set some shortcuts + info_maj = info['maj'] + info_min = info['min'] + info_fmt = info['fmt'] + + def first_label(label_flags): + if (label_flags[0] == 0) and (label_flags.size > 1) and \ + ((vmin_orig % 1) > 0.0): + return label_flags[1] + else: + return label_flags[0] + + # Case 1. 
Less than a month + if span <= periodspermonth: + day_start = period_break(dates_, 'day') + month_start = period_break(dates_, 'month') + + def _hour_finder(label_interval, force_year_start): + _hour = dates_.hour + _prev_hour = (dates_ - 1).hour + hour_start = (_hour - _prev_hour) != 0 + info_maj[day_start] = True + info_min[hour_start & (_hour % label_interval == 0)] = True + year_start = period_break(dates_, 'year') + info_fmt[hour_start & (_hour % label_interval == 0)] = '%H:%M' + info_fmt[day_start] = '%H:%M\n%d-%b' + info_fmt[year_start] = '%H:%M\n%d-%b\n%Y' + if force_year_start and not has_level_label(year_start, vmin_orig): + info_fmt[first_label(day_start)] = '%H:%M\n%d-%b\n%Y' + + def _minute_finder(label_interval): + hour_start = period_break(dates_, 'hour') + _minute = dates_.minute + _prev_minute = (dates_ - 1).minute + minute_start = (_minute - _prev_minute) != 0 + info_maj[hour_start] = True + info_min[minute_start & (_minute % label_interval == 0)] = True + year_start = period_break(dates_, 'year') + info_fmt = info['fmt'] + info_fmt[minute_start & (_minute % label_interval == 0)] = '%H:%M' + info_fmt[day_start] = '%H:%M\n%d-%b' + info_fmt[year_start] = '%H:%M\n%d-%b\n%Y' + + def _second_finder(label_interval): + minute_start = period_break(dates_, 'minute') + _second = dates_.second + _prev_second = (dates_ - 1).second + second_start = (_second - _prev_second) != 0 + info['maj'][minute_start] = True + info['min'][second_start & (_second % label_interval == 0)] = True + year_start = period_break(dates_, 'year') + info_fmt = info['fmt'] + info_fmt[second_start & (_second % + label_interval == 0)] = '%H:%M:%S' + info_fmt[day_start] = '%H:%M:%S\n%d-%b' + info_fmt[year_start] = '%H:%M:%S\n%d-%b\n%Y' + + if span < periodsperday / 12000.0: + _second_finder(1) + elif span < periodsperday / 6000.0: + _second_finder(2) + elif span < periodsperday / 2400.0: + _second_finder(5) + elif span < periodsperday / 1200.0: + _second_finder(10) + elif span < periodsperday / 800.0: + _second_finder(15) + elif span < periodsperday / 400.0: + _second_finder(30) + elif span < periodsperday / 150.0: + _minute_finder(1) + elif span < periodsperday / 70.0: + _minute_finder(2) + elif span < periodsperday / 24.0: + _minute_finder(5) + elif span < periodsperday / 12.0: + _minute_finder(15) + elif span < periodsperday / 6.0: + _minute_finder(30) + elif span < periodsperday / 2.5: + _hour_finder(1, False) + elif span < periodsperday / 1.5: + _hour_finder(2, False) + elif span < periodsperday * 1.25: + _hour_finder(3, False) + elif span < periodsperday * 2.5: + _hour_finder(6, True) + elif span < periodsperday * 4: + _hour_finder(12, True) + else: + info_maj[month_start] = True + info_min[day_start] = True + year_start = period_break(dates_, 'year') + info_fmt = info['fmt'] + info_fmt[day_start] = '%d' + info_fmt[month_start] = '%d\n%b' + info_fmt[year_start] = '%d\n%b\n%Y' + if not has_level_label(year_start, vmin_orig): + if not has_level_label(month_start, vmin_orig): + info_fmt[first_label(day_start)] = '%d\n%b\n%Y' + else: + info_fmt[first_label(month_start)] = '%d\n%b\n%Y' + + # Case 2. 
Less than three months + elif span <= periodsperyear // 4: + month_start = period_break(dates_, 'month') + info_maj[month_start] = True + if freq < FreqGroup.FR_HR: + info['min'] = True + else: + day_start = period_break(dates_, 'day') + info['min'][day_start] = True + week_start = period_break(dates_, 'week') + year_start = period_break(dates_, 'year') + info_fmt[week_start] = '%d' + info_fmt[month_start] = '\n\n%b' + info_fmt[year_start] = '\n\n%b\n%Y' + if not has_level_label(year_start, vmin_orig): + if not has_level_label(month_start, vmin_orig): + info_fmt[first_label(week_start)] = '\n\n%b\n%Y' + else: + info_fmt[first_label(month_start)] = '\n\n%b\n%Y' + # Case 3. Less than 14 months ............... + elif span <= 1.15 * periodsperyear: + year_start = period_break(dates_, 'year') + month_start = period_break(dates_, 'month') + week_start = period_break(dates_, 'week') + info_maj[month_start] = True + info_min[week_start] = True + info_min[year_start] = False + info_min[month_start] = False + info_fmt[month_start] = '%b' + info_fmt[year_start] = '%b\n%Y' + if not has_level_label(year_start, vmin_orig): + info_fmt[first_label(month_start)] = '%b\n%Y' + # Case 4. Less than 2.5 years ............... + elif span <= 2.5 * periodsperyear: + year_start = period_break(dates_, 'year') + quarter_start = period_break(dates_, 'quarter') + month_start = period_break(dates_, 'month') + info_maj[quarter_start] = True + info_min[month_start] = True + info_fmt[quarter_start] = '%b' + info_fmt[year_start] = '%b\n%Y' + # Case 4. Less than 4 years ................. + elif span <= 4 * periodsperyear: + year_start = period_break(dates_, 'year') + month_start = period_break(dates_, 'month') + info_maj[year_start] = True + info_min[month_start] = True + info_min[year_start] = False + + month_break = dates_[month_start].month + jan_or_jul = month_start[(month_break == 1) | (month_break == 7)] + info_fmt[jan_or_jul] = '%b' + info_fmt[year_start] = '%b\n%Y' + # Case 5. Less than 11 years ................ + elif span <= 11 * periodsperyear: + year_start = period_break(dates_, 'year') + quarter_start = period_break(dates_, 'quarter') + info_maj[year_start] = True + info_min[quarter_start] = True + info_min[year_start] = False + info_fmt[year_start] = '%Y' + # Case 6. More than 12 years ................ 
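+ # year labels fall on multiples of the default annual spacing, scaled by the span in years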
+ else: + year_start = period_break(dates_, 'year') + year_break = dates_[year_start].year + nyears = span / periodsperyear + (min_anndef, maj_anndef) = _get_default_annual_spacing(nyears) + major_idx = year_start[(year_break % maj_anndef == 0)] + info_maj[major_idx] = True + minor_idx = year_start[(year_break % min_anndef == 0)] + info_min[minor_idx] = True + info_fmt[major_idx] = '%Y' + + return info + + +def _monthly_finder(vmin, vmax, freq): + periodsperyear = 12 + + vmin_orig = vmin + (vmin, vmax) = (int(vmin), int(vmax)) + span = vmax - vmin + 1 + + # Initialize the output + info = np.zeros(span, + dtype=[('val', int), ('maj', bool), ('min', bool), + ('fmt', '|S8')]) + info['val'] = np.arange(vmin, vmax + 1) + dates_ = info['val'] + info['fmt'] = '' + year_start = (dates_ % 12 == 0).nonzero()[0] + info_maj = info['maj'] + info_fmt = info['fmt'] + + if span <= 1.15 * periodsperyear: + info_maj[year_start] = True + info['min'] = True + + info_fmt[:] = '%b' + info_fmt[year_start] = '%b\n%Y' + + if not has_level_label(year_start, vmin_orig): + if dates_.size > 1: + idx = 1 + else: + idx = 0 + info_fmt[idx] = '%b\n%Y' + + elif span <= 2.5 * periodsperyear: + quarter_start = (dates_ % 3 == 0).nonzero() + info_maj[year_start] = True + # TODO: Check the following : is it really info['fmt'] ? + info['fmt'][quarter_start] = True + info['min'] = True + + info_fmt[quarter_start] = '%b' + info_fmt[year_start] = '%b\n%Y' + + elif span <= 4 * periodsperyear: + info_maj[year_start] = True + info['min'] = True + + jan_or_jul = (dates_ % 12 == 0) | (dates_ % 12 == 6) + info_fmt[jan_or_jul] = '%b' + info_fmt[year_start] = '%b\n%Y' + + elif span <= 11 * periodsperyear: + quarter_start = (dates_ % 3 == 0).nonzero() + info_maj[year_start] = True + info['min'][quarter_start] = True + + info_fmt[year_start] = '%Y' + + else: + nyears = span / periodsperyear + (min_anndef, maj_anndef) = _get_default_annual_spacing(nyears) + years = dates_[year_start] // 12 + 1 + major_idx = year_start[(years % maj_anndef == 0)] + info_maj[major_idx] = True + info['min'][year_start[(years % min_anndef == 0)]] = True + + info_fmt[major_idx] = '%Y' + + return info + + +def _quarterly_finder(vmin, vmax, freq): + periodsperyear = 4 + vmin_orig = vmin + (vmin, vmax) = (int(vmin), int(vmax)) + span = vmax - vmin + 1 + + info = np.zeros(span, + dtype=[('val', int), ('maj', bool), ('min', bool), + ('fmt', '|S8')]) + info['val'] = np.arange(vmin, vmax + 1) + info['fmt'] = '' + dates_ = info['val'] + info_maj = info['maj'] + info_fmt = info['fmt'] + year_start = (dates_ % 4 == 0).nonzero()[0] + + if span <= 3.5 * periodsperyear: + info_maj[year_start] = True + info['min'] = True + + info_fmt[:] = 'Q%q' + info_fmt[year_start] = 'Q%q\n%F' + if not has_level_label(year_start, vmin_orig): + if dates_.size > 1: + idx = 1 + else: + idx = 0 + info_fmt[idx] = 'Q%q\n%F' + + elif span <= 11 * periodsperyear: + info_maj[year_start] = True + info['min'] = True + info_fmt[year_start] = '%F' + + else: + years = dates_[year_start] // 4 + 1 + nyears = span / periodsperyear + (min_anndef, maj_anndef) = _get_default_annual_spacing(nyears) + major_idx = year_start[(years % maj_anndef == 0)] + info_maj[major_idx] = True + info['min'][year_start[(years % min_anndef == 0)]] = True + info_fmt[major_idx] = '%F' + + return info + + +def _annual_finder(vmin, vmax, freq): + (vmin, vmax) = (int(vmin), int(vmax + 1)) + span = vmax - vmin + 1 + + info = np.zeros(span, + dtype=[('val', int), ('maj', bool), ('min', bool), + ('fmt', '|S8')]) + info['val'] = 
np.arange(vmin, vmax + 1) + info['fmt'] = '' + dates_ = info['val'] + + (min_anndef, maj_anndef) = _get_default_annual_spacing(span) + major_idx = dates_ % maj_anndef == 0 + info['maj'][major_idx] = True + info['min'][(dates_ % min_anndef == 0)] = True + info['fmt'][major_idx] = '%Y' + + return info + + +def get_finder(freq): + if isinstance(freq, compat.string_types): + freq = frequencies.get_freq(freq) + fgroup = frequencies.get_freq_group(freq) + + if fgroup == FreqGroup.FR_ANN: + return _annual_finder + elif fgroup == FreqGroup.FR_QTR: + return _quarterly_finder + elif freq == FreqGroup.FR_MTH: + return _monthly_finder + elif ((freq >= FreqGroup.FR_BUS) or fgroup == FreqGroup.FR_WK): + return _daily_finder + else: # pragma: no cover + errmsg = "Unsupported frequency: %s" % (freq) + raise NotImplementedError(errmsg) + + +class TimeSeries_DateLocator(Locator): + """ + Locates the ticks along an axis controlled by a :class:`Series`. + + Parameters + ---------- + freq : {var} + Valid frequency specifier. + minor_locator : {False, True}, optional + Whether the locator is for minor ticks (True) or not. + dynamic_mode : {True, False}, optional + Whether the locator should work in dynamic mode. + base : {int}, optional + quarter : {int}, optional + month : {int}, optional + day : {int}, optional + """ + + def __init__(self, freq, minor_locator=False, dynamic_mode=True, + base=1, quarter=1, month=1, day=1, plot_obj=None): + if isinstance(freq, compat.string_types): + freq = frequencies.get_freq(freq) + self.freq = freq + self.base = base + (self.quarter, self.month, self.day) = (quarter, month, day) + self.isminor = minor_locator + self.isdynamic = dynamic_mode + self.offset = 0 + self.plot_obj = plot_obj + self.finder = get_finder(freq) + + def _get_default_locs(self, vmin, vmax): + "Returns the default locations of ticks." + + if self.plot_obj.date_axis_info is None: + self.plot_obj.date_axis_info = self.finder(vmin, vmax, self.freq) + + locator = self.plot_obj.date_axis_info + + if self.isminor: + return np.compress(locator['min'], locator['val']) + return np.compress(locator['maj'], locator['val']) + + def __call__(self): + 'Return the locations of the ticks.' + # axis calls Locator.set_axis inside set_m_formatter + vi = tuple(self.axis.get_view_interval()) + if vi != self.plot_obj.view_interval: + self.plot_obj.date_axis_info = None + self.plot_obj.view_interval = vi + vmin, vmax = vi + if vmax < vmin: + vmin, vmax = vmax, vmin + if self.isdynamic: + locs = self._get_default_locs(vmin, vmax) + else: # pragma: no cover + base = self.base + (d, m) = divmod(vmin, base) + vmin = (d + 1) * base + locs = lrange(vmin, vmax + 1, base) + return locs + + def autoscale(self): + """ + Sets the view limits to the nearest multiples of base that contain the + data. + """ + # requires matplotlib >= 0.98.0 + (vmin, vmax) = self.axis.get_data_interval() + + locs = self._get_default_locs(vmin, vmax) + (vmin, vmax) = locs[[0, -1]] + if vmin == vmax: + vmin -= 1 + vmax += 1 + return nonsingular(vmin, vmax) + +# ------------------------------------------------------------------------- +# --- Formatter --- +# ------------------------------------------------------------------------- + + +class TimeSeries_DateFormatter(Formatter): + """ + Formats the ticks along an axis controlled by a :class:`PeriodIndex`. + + Parameters + ---------- + freq : {int, string} + Valid frequency specifier. + minor_locator : {False, True} + Whether the current formatter should apply to minor ticks (True) or + major ticks (False). 
+ dynamic_mode : {True, False} + Whether the formatter works in dynamic mode or not. + """ + + def __init__(self, freq, minor_locator=False, dynamic_mode=True, + plot_obj=None): + if isinstance(freq, compat.string_types): + freq = frequencies.get_freq(freq) + self.format = None + self.freq = freq + self.locs = [] + self.formatdict = None + self.isminor = minor_locator + self.isdynamic = dynamic_mode + self.offset = 0 + self.plot_obj = plot_obj + self.finder = get_finder(freq) + + def _set_default_format(self, vmin, vmax): + "Sets and returns the default tick format dictionary." + + if self.plot_obj.date_axis_info is None: + self.plot_obj.date_axis_info = self.finder(vmin, vmax, self.freq) + info = self.plot_obj.date_axis_info + + if self.isminor: + format = np.compress(info['min'] & np.logical_not(info['maj']), + info) + else: + format = np.compress(info['maj'], info) + self.formatdict = dict([(x, f) for (x, _, _, f) in format]) + return self.formatdict + + def set_locs(self, locs): + 'Sets the locations of the ticks' + # don't actually use the locs. This is just needed to work with + # matplotlib. Force to use vmin, vmax + self.locs = locs + + (vmin, vmax) = vi = tuple(self.axis.get_view_interval()) + if vi != self.plot_obj.view_interval: + self.plot_obj.date_axis_info = None + self.plot_obj.view_interval = vi + if vmax < vmin: + (vmin, vmax) = (vmax, vmin) + self._set_default_format(vmin, vmax) + + def __call__(self, x, pos=0): + if self.formatdict is None: + return '' + else: + fmt = self.formatdict.pop(x, '') + return Period(ordinal=int(x), freq=self.freq).strftime(fmt) + + +class TimeSeries_TimedeltaFormatter(Formatter): + """ + Formats the ticks along an axis controlled by a :class:`TimedeltaIndex`. + """ + + @staticmethod + def format_timedelta_ticks(x, pos, n_decimals): + """ + Convert nanoseconds to 'D days HH:MM:SS.F' + """ + s, ns = divmod(x, 1e9) + m, s = divmod(s, 60) + h, m = divmod(m, 60) + d, h = divmod(h, 24) + decimals = int(ns * 10**(n_decimals - 9)) + s = r'{:02d}:{:02d}:{:02d}'.format(int(h), int(m), int(s)) + if n_decimals > 0: + s += '.{{:0{:0d}d}}'.format(n_decimals).format(decimals) + if d != 0: + s = '{:d} days '.format(int(d)) + s + return s + + def __call__(self, x, pos=0): + (vmin, vmax) = tuple(self.axis.get_view_interval()) + n_decimals = int(np.ceil(np.log10(100 * 1e9 / (vmax - vmin)))) + if n_decimals > 9: + n_decimals = 9 + return self.format_timedelta_ticks(x, pos, n_decimals) diff --git a/pandas/plotting/_core.py b/pandas/plotting/_core.py new file mode 100644 index 0000000000000..e5b9497993172 --- /dev/null +++ b/pandas/plotting/_core.py @@ -0,0 +1,2863 @@ +# being a bit too dynamic +# pylint: disable=E1101 +from __future__ import division + +import warnings +import re +from collections import namedtuple +from distutils.version import LooseVersion + +import numpy as np + +from pandas.util._decorators import cache_readonly +from pandas.core.base import PandasObject +from pandas.core.dtypes.missing import isna, notna, remove_na_arraylike +from pandas.core.dtypes.common import ( + is_list_like, + is_integer, + is_number, + is_hashable, + is_iterator) +from pandas.core.dtypes.generic import ABCSeries + +from pandas.core.common import AbstractMethodError, _try_sort +from pandas.core.generic import _shared_docs, _shared_doc_kwargs +from pandas.core.index import Index, MultiIndex + +from pandas.core.indexes.period import PeriodIndex +from pandas.compat import range, lrange, map, zip, string_types +import pandas.compat as compat +from pandas.io.formats.printing import pprint_thing
+from pandas.util._decorators import Appender + +from pandas.plotting._compat import (_mpl_ge_1_3_1, + _mpl_ge_1_5_0) +from pandas.plotting._style import (mpl_stylesheet, plot_params, + _get_standard_colors) +from pandas.plotting._tools import (_subplots, _flatten, table, + _handle_shared_axes, _get_all_lines, + _get_xlim, _set_ticks_props, + format_date_labels) + + +if _mpl_ge_1_5_0(): + # Compat with mpl 1.5, which uses cycler. + import cycler + colors = mpl_stylesheet.pop('axes.color_cycle') + mpl_stylesheet['axes.prop_cycle'] = cycler.cycler('color', colors) + + +def _get_standard_kind(kind): + return {'density': 'kde'}.get(kind, kind) + + +def _gca(rc=None): + import matplotlib.pyplot as plt + with plt.rc_context(rc): + return plt.gca() + + +def _gcf(): + import matplotlib.pyplot as plt + return plt.gcf() + + +class MPLPlot(object): + """ + Base class for assembling a pandas plot using matplotlib + + Parameters + ---------- + data : + + """ + + @property + def _kind(self): + """Specify kind str. Must be overridden in child class""" + raise NotImplementedError + + _layout_type = 'vertical' + _default_rot = 0 + orientation = None + _pop_attributes = ['label', 'style', 'logy', 'logx', 'loglog', + 'mark_right', 'stacked'] + _attr_defaults = {'logy': False, 'logx': False, 'loglog': False, + 'mark_right': True, 'stacked': False} + + def __init__(self, data, kind=None, by=None, subplots=False, sharex=None, + sharey=False, use_index=True, + figsize=None, grid=None, legend=True, rot=None, + ax=None, fig=None, title=None, xlim=None, ylim=None, + xticks=None, yticks=None, + sort_columns=False, fontsize=None, + secondary_y=False, colormap=None, + table=False, layout=None, **kwds): + + self.data = data + self.by = by + + self.kind = kind + + self.sort_columns = sort_columns + + self.subplots = subplots + + if sharex is None: + if ax is None: + self.sharex = True + else: + # if we get an axis, the user should do the visibility + # setting... + self.sharex = False + else: + self.sharex = sharex + + self.sharey = sharey + self.figsize = figsize + self.layout = layout + + self.xticks = xticks + self.yticks = yticks + self.xlim = xlim + self.ylim = ylim + self.title = title + self.use_index = use_index + + self.fontsize = fontsize + + if rot is not None: + self.rot = rot + # need to know for format_date_labels since it's rotated to 30 by + # default + self._rot_set = True + else: + self._rot_set = False + self.rot = self._default_rot + + if grid is None: + grid = False if secondary_y else self.plt.rcParams['axes.grid'] + + self.grid = grid + self.legend = legend + self.legend_handles = [] + self.legend_labels = [] + + for attr in self._pop_attributes: + value = kwds.pop(attr, self._attr_defaults.get(attr, None)) + setattr(self, attr, value) + + self.ax = ax + self.fig = fig + self.axes = None + + # parse errorbar input if given + xerr = kwds.pop('xerr', None) + yerr = kwds.pop('yerr', None) + self.errors = {} + for kw, err in zip(['xerr', 'yerr'], [xerr, yerr]): + self.errors[kw] = self._parse_errorbars(kw, err) + + if not isinstance(secondary_y, (bool, tuple, list, np.ndarray, Index)): + secondary_y = [secondary_y] + self.secondary_y = secondary_y + + # ugly TypeError if user passes matplotlib's `cmap` name. + # Probably better to accept either.
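+        # (`colormap` is the pandas-level spelling and `cmap` the matplotlib
+        # one; either is accepted below, but passing both at once is
+        # ambiguous and raises a TypeError.)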
+ if 'cmap' in kwds and colormap: + raise TypeError("Only specify one of `cmap` and `colormap`.") + elif 'cmap' in kwds: + self.colormap = kwds.pop('cmap') + else: + self.colormap = colormap + + self.table = table + + self.kwds = kwds + + self._validate_color_args() + + def _validate_color_args(self): + if 'color' not in self.kwds and 'colors' in self.kwds: + warnings.warn(("'colors' is being deprecated. Please use 'color' " + "instead of 'colors'")) + colors = self.kwds.pop('colors') + self.kwds['color'] = colors + + if ('color' in self.kwds and self.nseries == 1 and + not is_list_like(self.kwds['color'])): + # support series.plot(color='green') + self.kwds['color'] = [self.kwds['color']] + + if ('color' in self.kwds and isinstance(self.kwds['color'], tuple) and + self.nseries == 1 and len(self.kwds['color']) in (3, 4)): + # support RGB and RGBA tuples in series plot + self.kwds['color'] = [self.kwds['color']] + + if ('color' in self.kwds or 'colors' in self.kwds) and \ + self.colormap is not None: + warnings.warn("'color' and 'colormap' cannot be used " + "simultaneously. Using 'color'") + + if 'color' in self.kwds and self.style is not None: + if is_list_like(self.style): + styles = self.style + else: + styles = [self.style] + # need only a single match + for s in styles: + if re.match('^[a-z]+?', s) is not None: + raise ValueError( + "Cannot pass 'style' string with a color " + "symbol and 'color' keyword argument. Please" + " use one or the other or pass 'style' " + "without a color symbol") + + def _iter_data(self, data=None, keep_index=False, fillna=None): + if data is None: + data = self.data + if fillna is not None: + data = data.fillna(fillna) + + # TODO: unused? + # if self.sort_columns: + # columns = _try_sort(data.columns) + # else: + # columns = data.columns + + for col, values in data.iteritems(): + if keep_index is True: + yield col, values + else: + yield col, values.values + + @property + def nseries(self): + if self.data.ndim == 1: + return 1 + else: + return self.data.shape[1] + + def draw(self): + self.plt.draw_if_interactive() + + def generate(self): + self._args_adjust() + self._compute_plot_data() + self._setup_subplots() + self._make_plot() + self._add_table() + self._make_legend() + self._adorn_subplots() + + for ax in self.axes: + self._post_plot_logic_common(ax, self.data) + self._post_plot_logic(ax, self.data) + + def _args_adjust(self): + pass + + def _has_plotted_object(self, ax): + """check whether ax has data""" + return (len(ax.lines) != 0 or + len(ax.artists) != 0 or + len(ax.containers) != 0) + + def _maybe_right_yaxis(self, ax, axes_num): + if not self.on_right(axes_num): + # secondary axes may be passed via ax kw + return self._get_ax_layer(ax) + + if hasattr(ax, 'right_ax'): + # if it has a right_ax property, ``ax`` must be the left axes + return ax.right_ax + elif hasattr(ax, 'left_ax'): + # if it has a left_ax property, ``ax`` must be the right axes + return ax + else: + # otherwise, create twin axes + orig_ax, new_ax = ax, ax.twinx() + # TODO: use Matplotlib public API when available + new_ax._get_lines = orig_ax._get_lines + new_ax._get_patches_for_fill = orig_ax._get_patches_for_fill + orig_ax.right_ax, new_ax.left_ax = new_ax, orig_ax + + if not self._has_plotted_object(orig_ax): # no data on left y + orig_ax.get_yaxis().set_visible(False) + return new_ax + + def _setup_subplots(self): + if self.subplots: + fig, axes = _subplots(naxes=self.nseries, + sharex=self.sharex, sharey=self.sharey, + figsize=self.figsize, ax=self.ax, + layout=self.layout, +
layout_type=self._layout_type) + else: + if self.ax is None: + fig = self.plt.figure(figsize=self.figsize) + axes = fig.add_subplot(111) + else: + fig = self.ax.get_figure() + if self.figsize is not None: + fig.set_size_inches(self.figsize) + axes = self.ax + + axes = _flatten(axes) + + if self.logx or self.loglog: + [a.set_xscale('log') for a in axes] + if self.logy or self.loglog: + [a.set_yscale('log') for a in axes] + + self.fig = fig + self.axes = axes + + @property + def result(self): + """ + Return result axes + """ + if self.subplots: + if self.layout is not None and not is_list_like(self.ax): + return self.axes.reshape(*self.layout) + else: + return self.axes + else: + sec_true = isinstance(self.secondary_y, bool) and self.secondary_y + all_sec = (is_list_like(self.secondary_y) and + len(self.secondary_y) == self.nseries) + if (sec_true or all_sec): + # if all data is plotted on secondary, return right axes + return self._get_ax_layer(self.axes[0], primary=False) + else: + return self.axes[0] + + def _compute_plot_data(self): + data = self.data + + if isinstance(data, ABCSeries): + label = self.label + if label is None and data.name is None: + label = 'None' + data = data.to_frame(name=label) + + numeric_data = data._convert(datetime=True)._get_numeric_data() + + try: + is_empty = numeric_data.empty + except AttributeError: + is_empty = not len(numeric_data) + + # no empty frames or series allowed + if is_empty: + raise TypeError('Empty {0!r}: no numeric data to ' + 'plot'.format(numeric_data.__class__.__name__)) + + self.data = numeric_data + + def _make_plot(self): + raise AbstractMethodError(self) + + def _add_table(self): + if self.table is False: + return + elif self.table is True: + data = self.data.transpose() + else: + data = self.table + ax = self._get_ax(0) + table(ax, data) + + def _post_plot_logic_common(self, ax, data): + """Common post process for each axes""" + labels = [pprint_thing(key) for key in data.index] + labels = dict(zip(range(len(data.index)), labels)) + + if self.orientation == 'vertical' or self.orientation is None: + if self._need_to_set_index: + xticklabels = [labels.get(x, '') for x in ax.get_xticks()] + ax.set_xticklabels(xticklabels) + self._apply_axis_properties(ax.xaxis, rot=self.rot, + fontsize=self.fontsize) + self._apply_axis_properties(ax.yaxis, fontsize=self.fontsize) + + if hasattr(ax, 'right_ax'): + self._apply_axis_properties(ax.right_ax.yaxis, + fontsize=self.fontsize) + + elif self.orientation == 'horizontal': + if self._need_to_set_index: + yticklabels = [labels.get(y, '') for y in ax.get_yticks()] + ax.set_yticklabels(yticklabels) + self._apply_axis_properties(ax.yaxis, rot=self.rot, + fontsize=self.fontsize) + self._apply_axis_properties(ax.xaxis, fontsize=self.fontsize) + + if hasattr(ax, 'right_ax'): + self._apply_axis_properties(ax.right_ax.yaxis, + fontsize=self.fontsize) + else: # pragma: no cover + raise ValueError("orientation must be 'vertical' or " + "'horizontal'") + + def _post_plot_logic(self, ax, data): + """Post process for each axes.
Overridden in child classes""" + pass + + def _adorn_subplots(self): + """Common post process unrelated to data""" + if len(self.axes) > 0: + all_axes = self._get_subplots() + nrows, ncols = self._get_axes_layout() + _handle_shared_axes(axarr=all_axes, nplots=len(all_axes), + naxes=nrows * ncols, nrows=nrows, + ncols=ncols, sharex=self.sharex, + sharey=self.sharey) + + for ax in self.axes: + if self.yticks is not None: + ax.set_yticks(self.yticks) + + if self.xticks is not None: + ax.set_xticks(self.xticks) + + if self.ylim is not None: + ax.set_ylim(self.ylim) + + if self.xlim is not None: + ax.set_xlim(self.xlim) + + ax.grid(self.grid) + + if self.title: + if self.subplots: + if is_list_like(self.title): + if len(self.title) != self.nseries: + msg = ('The length of `title` must equal the number ' + 'of columns if using `title` of type `list` ' + 'and `subplots=True`.\n' + 'length of title = {}\n' + 'number of columns = {}').format( + len(self.title), self.nseries) + raise ValueError(msg) + + for (ax, title) in zip(self.axes, self.title): + ax.set_title(title) + else: + self.fig.suptitle(self.title) + else: + if is_list_like(self.title): + msg = ('Using `title` of type `list` is not supported ' + 'unless `subplots=True` is passed') + raise ValueError(msg) + self.axes[0].set_title(self.title) + + def _apply_axis_properties(self, axis, rot=None, fontsize=None): + labels = axis.get_majorticklabels() + axis.get_minorticklabels() + for label in labels: + if rot is not None: + label.set_rotation(rot) + if fontsize is not None: + label.set_fontsize(fontsize) + + @property + def legend_title(self): + if not isinstance(self.data.columns, MultiIndex): + name = self.data.columns.name + if name is not None: + name = pprint_thing(name) + return name + else: + stringified = map(pprint_thing, + self.data.columns.names) + return ','.join(stringified) + + def _add_legend_handle(self, handle, label, index=None): + if label is not None: + if self.mark_right and index is not None: + if self.on_right(index): + label = label + ' (right)' + self.legend_handles.append(handle) + self.legend_labels.append(label) + + def _make_legend(self): + ax, leg = self._get_ax_legend(self.axes[0]) + + handles = [] + labels = [] + title = '' + + if not self.subplots: + if leg is not None: + title = leg.get_title().get_text() + handles = leg.legendHandles + labels = [x.get_text() for x in leg.get_texts()] + + if self.legend: + if self.legend == 'reverse': + self.legend_handles = reversed(self.legend_handles) + self.legend_labels = reversed(self.legend_labels) + + handles += self.legend_handles + labels += self.legend_labels + if self.legend_title is not None: + title = self.legend_title + + if len(handles) > 0: + ax.legend(handles, labels, loc='best', title=title) + + elif self.subplots and self.legend: + for ax in self.axes: + if ax.get_visible(): + ax.legend(loc='best') + + def _get_ax_legend(self, ax): + leg = ax.get_legend() + other_ax = (getattr(ax, 'left_ax', None) or + getattr(ax, 'right_ax', None)) + other_leg = None + if other_ax is not None: + other_leg = other_ax.get_legend() + if leg is None and other_leg is not None: + leg = other_leg + ax = other_ax + return ax, leg + + @cache_readonly + def plt(self): + import matplotlib.pyplot as plt + return plt + + @staticmethod + def mpl_ge_1_3_1(): + return _mpl_ge_1_3_1() + + @staticmethod + def mpl_ge_1_5_0(): + return _mpl_ge_1_5_0() + + _need_to_set_index = False + + def _get_xticks(self, convert_period=False): + index = self.data.index + is_datetype = index.inferred_type 
in ('datetime', 'date', + 'datetime64', 'time') + + if self.use_index: + if convert_period and isinstance(index, PeriodIndex): + self.data = self.data.reindex(index=index.sort_values()) + x = self.data.index.to_timestamp()._mpl_repr() + elif index.is_numeric(): + # Matplotlib supports numeric values or datetime objects as + # xaxis values. Taking the LBYL approach here: by the time + # matplotlib raises an exception for non numeric/datetime + # xaxis values, several actions have already been taken by plt. + x = index._mpl_repr() + elif is_datetype: + self.data = self.data[notna(self.data.index)] + self.data = self.data.sort_index() + x = self.data.index._mpl_repr() + else: + self._need_to_set_index = True + x = lrange(len(index)) + else: + x = lrange(len(index)) + + return x + + @classmethod + def _plot(cls, ax, x, y, style=None, is_errorbar=False, **kwds): + mask = isna(y) + if mask.any(): + y = np.ma.array(y) + y = np.ma.masked_where(mask, y) + + if isinstance(x, Index): + x = x._mpl_repr() + + if is_errorbar: + if 'xerr' in kwds: + kwds['xerr'] = np.array(kwds.get('xerr')) + if 'yerr' in kwds: + kwds['yerr'] = np.array(kwds.get('yerr')) + return ax.errorbar(x, y, **kwds) + else: + # prevent style kwarg from going to errorbar, where it is + # unsupported + if style is not None: + args = (x, y, style) + else: + args = (x, y) + return ax.plot(*args, **kwds) + + def _get_index_name(self): + if isinstance(self.data.index, MultiIndex): + name = self.data.index.names + if any(x is not None for x in name): + name = ','.join([pprint_thing(x) for x in name]) + else: + name = None + else: + name = self.data.index.name + if name is not None: + name = pprint_thing(name) + + return name + + @classmethod + def _get_ax_layer(cls, ax, primary=True): + """get left (primary) or right (secondary) axes""" + if primary: + return getattr(ax, 'left_ax', ax) + else: + return getattr(ax, 'right_ax', ax) + + def _get_ax(self, i): + # get the twinx ax if appropriate + if self.subplots: + ax = self.axes[i] + ax = self._maybe_right_yaxis(ax, i) + self.axes[i] = ax + else: + ax = self.axes[0] + ax = self._maybe_right_yaxis(ax, i) + + ax.get_yaxis().set_visible(True) + return ax + + def on_right(self, i): + if isinstance(self.secondary_y, bool): + return self.secondary_y + + if isinstance(self.secondary_y, (tuple, list, np.ndarray, Index)): + return self.data.columns[i] in self.secondary_y + + def _apply_style_colors(self, colors, kwds, col_num, label): + """ + Manage style and color based on column number and its label. + Returns a tuple of the appropriate style and of kwds, to which "color" may be added.
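+        A list ``style`` is indexed by column number and a dict ``style`` by
+        column label; a color from ``colors`` is cycled in by column number
+        (``colors[col_num % len(colors)]``) when a color, colormap, or
+        subplot layout calls for one and the style string carries no color
+        symbol.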
+ """ + style = None + if self.style is not None: + if isinstance(self.style, list): + try: + style = self.style[col_num] + except IndexError: + pass + elif isinstance(self.style, dict): + style = self.style.get(label, style) + else: + style = self.style + + has_color = 'color' in kwds or self.colormap is not None + nocolor_style = style is None or re.match('[a-z]+', style) is None + if (has_color or self.subplots) and nocolor_style: + kwds['color'] = colors[col_num % len(colors)] + return style, kwds + + def _get_colors(self, num_colors=None, color_kwds='color'): + if num_colors is None: + num_colors = self.nseries + + return _get_standard_colors(num_colors=num_colors, + colormap=self.colormap, + color=self.kwds.get(color_kwds)) + + def _parse_errorbars(self, label, err): + """ + Look for error keyword arguments and return the actual errorbar data + or return the error DataFrame/dict + + Error bars can be specified in several ways: + Series: the user provides a pandas.Series object of the same + length as the data + ndarray: provides a np.ndarray of the same length as the data + DataFrame/dict: error values are paired with keys matching the + key in the plotted DataFrame + str: the name of the column within the plotted DataFrame + """ + + if err is None: + return None + + from pandas import DataFrame, Series + + def match_labels(data, e): + e = e.reindex_axis(data.index) + return e + + # key-matched DataFrame + if isinstance(err, DataFrame): + + err = match_labels(self.data, err) + # key-matched dict + elif isinstance(err, dict): + pass + + # Series of error values + elif isinstance(err, Series): + # broadcast error series across data + err = match_labels(self.data, err) + err = np.atleast_2d(err) + err = np.tile(err, (self.nseries, 1)) + + # errors are a column in the dataframe + elif isinstance(err, string_types): + evalues = self.data[err].values + self.data = self.data[self.data.columns.drop(err)] + err = np.atleast_2d(evalues) + err = np.tile(err, (self.nseries, 1)) + + elif is_list_like(err): + if is_iterator(err): + err = np.atleast_2d(list(err)) + else: + # raw error values + err = np.atleast_2d(err) + + err_shape = err.shape + + # asymmetrical error bars + if err.ndim == 3: + if (err_shape[0] != self.nseries) or \ + (err_shape[1] != 2) or \ + (err_shape[2] != len(self.data)): + msg = "Asymmetrical error bars should be provided " + \ + "with the shape (%u, 2, %u)" % \ + (self.nseries, len(self.data)) + raise ValueError(msg) + + # broadcast errors to each data series + if len(err) == 1: + err = np.tile(err, (self.nseries, 1)) + + elif is_number(err): + err = np.tile([err], (self.nseries, len(self.data))) + + else: + msg = "No valid %s detected" % label + raise ValueError(msg) + + return err + + def _get_errorbars(self, label=None, index=None, xerr=True, yerr=True): + from pandas import DataFrame + errors = {} + + for kw, flag in zip(['xerr', 'yerr'], [xerr, yerr]): + if flag: + err = self.errors[kw] + # user provided label-matched dataframe of errors + if isinstance(err, (DataFrame, dict)): + if label is not None and label in err.keys(): + err = err[label] + else: + err = None + elif index is not None and err is not None: + err = err[index] + + if err is not None: + errors[kw] = err + return errors + + def _get_subplots(self): + from matplotlib.axes import Subplot + return [ax for ax in self.axes[0].get_figure().get_axes() + if isinstance(ax, Subplot)] + + def _get_axes_layout(self): + axes = self._get_subplots() + x_set = set() + y_set = set() + for ax in axes: + # check axes 
coordinates to estimate layout + points = ax.get_position().get_points() + x_set.add(points[0][0]) + y_set.add(points[0][1]) + return (len(y_set), len(x_set)) + + +class PlanePlot(MPLPlot): + """ + Abstract class for plotting on plane, currently scatter and hexbin. + """ + + _layout_type = 'single' + + def __init__(self, data, x, y, **kwargs): + MPLPlot.__init__(self, data, **kwargs) + if x is None or y is None: + raise ValueError(self._kind + ' requires an x and y column') + if is_integer(x) and not self.data.columns.holds_integer(): + x = self.data.columns[x] + if is_integer(y) and not self.data.columns.holds_integer(): + y = self.data.columns[y] + if len(self.data[x]._get_numeric_data()) == 0: + raise ValueError(self._kind + ' requires x column to be numeric') + if len(self.data[y]._get_numeric_data()) == 0: + raise ValueError(self._kind + ' requires y column to be numeric') + + self.x = x + self.y = y + + @property + def nseries(self): + return 1 + + def _post_plot_logic(self, ax, data): + x, y = self.x, self.y + ax.set_ylabel(pprint_thing(y)) + ax.set_xlabel(pprint_thing(x)) + + +class ScatterPlot(PlanePlot): + _kind = 'scatter' + + def __init__(self, data, x, y, s=None, c=None, **kwargs): + if s is None: + # hide the matplotlib default for size, in case we want to change + # the handling of this argument later + s = 20 + super(ScatterPlot, self).__init__(data, x, y, s=s, **kwargs) + if is_integer(c) and not self.data.columns.holds_integer(): + c = self.data.columns[c] + self.c = c + + def _make_plot(self): + x, y, c, data = self.x, self.y, self.c, self.data + ax = self.axes[0] + + c_is_column = is_hashable(c) and c in self.data.columns + + # plot a colorbar only if a colormap is provided or necessary + cb = self.kwds.pop('colorbar', self.colormap or c_is_column) + + # pandas uses colormap, matplotlib uses cmap. + cmap = self.colormap or 'Greys' + cmap = self.plt.cm.get_cmap(cmap) + color = self.kwds.pop("color", None) + if c is not None and color is not None: + raise TypeError('Specify exactly one of `c` and `color`') + elif c is None and color is None: + c_values = self.plt.rcParams['patch.facecolor'] + elif color is not None: + c_values = color + elif c_is_column: + c_values = self.data[c].values + else: + c_values = c + + if self.legend and hasattr(self, 'label'): + label = self.label + else: + label = None + scatter = ax.scatter(data[x].values, data[y].values, c=c_values, + label=label, cmap=cmap, **self.kwds) + if cb: + img = ax.collections[0] + kws = dict(ax=ax) + if self.mpl_ge_1_3_1(): + kws['label'] = c if c_is_column else '' + self.fig.colorbar(img, **kws) + + if label is not None: + self._add_legend_handle(scatter, label) + else: + self.legend = False + + errors_x = self._get_errorbars(label=x, index=0, yerr=False) + errors_y = self._get_errorbars(label=y, index=0, xerr=False) + if len(errors_x) > 0 or len(errors_y) > 0: + err_kwds = dict(errors_x, **errors_y) + err_kwds['ecolor'] = scatter.get_facecolor()[0] + ax.errorbar(data[x].values, data[y].values, + linestyle='none', **err_kwds) + + +class HexBinPlot(PlanePlot): + _kind = 'hexbin' + + def __init__(self, data, x, y, C=None, **kwargs): + super(HexBinPlot, self).__init__(data, x, y, **kwargs) + if is_integer(C) and not self.data.columns.holds_integer(): + C = self.data.columns[C] + self.C = C + + def _make_plot(self): + x, y, data, C = self.x, self.y, self.data, self.C + ax = self.axes[0] + # pandas uses colormap, matplotlib uses cmap.
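+        # Fall back to the 'BuGn' colormap when none is given; hexbin
+        # always colors its bins through a colormap.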
+ cmap = self.colormap or 'BuGn' + cmap = self.plt.cm.get_cmap(cmap) + cb = self.kwds.pop('colorbar', True) + + if C is None: + c_values = None + else: + c_values = data[C].values + + ax.hexbin(data[x].values, data[y].values, C=c_values, cmap=cmap, + **self.kwds) + if cb: + img = ax.collections[0] + self.fig.colorbar(img, ax=ax) + + def _make_legend(self): + pass + + +class LinePlot(MPLPlot): + _kind = 'line' + _default_rot = 0 + orientation = 'vertical' + + def __init__(self, data, **kwargs): + MPLPlot.__init__(self, data, **kwargs) + if self.stacked: + self.data = self.data.fillna(value=0) + self.x_compat = plot_params['x_compat'] + if 'x_compat' in self.kwds: + self.x_compat = bool(self.kwds.pop('x_compat')) + + def _is_ts_plot(self): + # slightly deceptive name: decides whether the dynamic + # (period-aware) x-axis is used + return not self.x_compat and self.use_index and self._use_dynamic_x() + + def _use_dynamic_x(self): + from pandas.plotting._timeseries import _use_dynamic_x + return _use_dynamic_x(self._get_ax(0), self.data) + + def _make_plot(self): + if self._is_ts_plot(): + from pandas.plotting._timeseries import _maybe_convert_index + data = _maybe_convert_index(self._get_ax(0), self.data) + + x = data.index # dummy, not used + plotf = self._ts_plot + it = self._iter_data(data=data, keep_index=True) + else: + x = self._get_xticks(convert_period=True) + plotf = self._plot + it = self._iter_data() + + stacking_id = self._get_stacking_id() + is_errorbar = any(e is not None for e in self.errors.values()) + + colors = self._get_colors() + for i, (label, y) in enumerate(it): + ax = self._get_ax(i) + kwds = self.kwds.copy() + style, kwds = self._apply_style_colors(colors, kwds, i, label) + + errors = self._get_errorbars(label=label, index=i) + kwds = dict(kwds, **errors) + + label = pprint_thing(label) # .encode('utf-8') + kwds['label'] = label + + newlines = plotf(ax, x, y, style=style, column_num=i, + stacking_id=stacking_id, + is_errorbar=is_errorbar, + **kwds) + self._add_legend_handle(newlines[0], label, index=i) + + lines = _get_all_lines(ax) + left, right = _get_xlim(lines) + ax.set_xlim(left, right) + + @classmethod + def _plot(cls, ax, x, y, style=None, column_num=None, + stacking_id=None, **kwds): + # column_num is used to get the target column from plotf in line and + # area plots + if column_num == 0: + cls._initialize_stacker(ax, stacking_id, len(y)) + y_values = cls._get_stacked_values(ax, stacking_id, y, kwds['label']) + lines = MPLPlot._plot(ax, x, y_values, style=style, **kwds) + cls._update_stacker(ax, stacking_id, y) + return lines + + @classmethod + def _ts_plot(cls, ax, x, data, style=None, **kwds): + from pandas.plotting._timeseries import (_maybe_resample, + _decorate_axes, + format_dateaxis) + # accept x to be consistent with normal plot func, + # x is not passed to tsplot as it uses data.index as x coordinate + # column_num must be in kwds for stacking purpose + freq, data = _maybe_resample(data, ax, kwds) + + # Set ax with freq info + _decorate_axes(ax, freq, kwds) + # also decorate any twinned left/right axes + if hasattr(ax, 'left_ax'): + _decorate_axes(ax.left_ax, freq, kwds) + if hasattr(ax, 'right_ax'): + _decorate_axes(ax.right_ax, freq, kwds) + ax._plot_data.append((data, cls._kind, kwds)) + + lines = cls._plot(ax, data.index, data.values, style=style, **kwds) + # set date formatter, locators and rescale limits + format_dateaxis(ax, ax.freq, data.index) + return lines + + def _get_stacking_id(self): + if self.stacked: + return id(self.data) + else: + return None + + @classmethod + def _initialize_stacker(cls, ax, stacking_id, n): + if
stacking_id is None: + return + if not hasattr(ax, '_stacker_pos_prior'): + ax._stacker_pos_prior = {} + if not hasattr(ax, '_stacker_neg_prior'): + ax._stacker_neg_prior = {} + ax._stacker_pos_prior[stacking_id] = np.zeros(n) + ax._stacker_neg_prior[stacking_id] = np.zeros(n) + + @classmethod + def _get_stacked_values(cls, ax, stacking_id, values, label): + if stacking_id is None: + return values + if not hasattr(ax, '_stacker_pos_prior'): + # stacker may not be initialized for subplots + cls._initialize_stacker(ax, stacking_id, len(values)) + + if (values >= 0).all(): + return ax._stacker_pos_prior[stacking_id] + values + elif (values <= 0).all(): + return ax._stacker_neg_prior[stacking_id] + values + + raise ValueError('When stacked is True, each column must be either ' + 'all positive or all negative. ' + '{0} contains both positive and negative values' + .format(label)) + + @classmethod + def _update_stacker(cls, ax, stacking_id, values): + if stacking_id is None: + return + if (values >= 0).all(): + ax._stacker_pos_prior[stacking_id] += values + elif (values <= 0).all(): + ax._stacker_neg_prior[stacking_id] += values + + def _post_plot_logic(self, ax, data): + condition = ((not self._use_dynamic_x() and + data.index.is_all_dates and + not self.subplots) or + (self.subplots and self.sharex)) + + index_name = self._get_index_name() + + if condition: + # irregular TS is rotated 30 deg. by default; there is + # probably a better place to check / set this. + if not self._rot_set: + self.rot = 30 + format_date_labels(ax, rot=self.rot) + + if index_name is not None and self.use_index: + ax.set_xlabel(index_name) + + +class AreaPlot(LinePlot): + _kind = 'area' + + def __init__(self, data, **kwargs): + kwargs.setdefault('stacked', True) + data = data.fillna(value=0) + LinePlot.__init__(self, data, **kwargs) + + if not self.stacked: + # use smaller alpha to distinguish overlap + self.kwds.setdefault('alpha', 0.5) + + if self.logy or self.loglog: + raise ValueError("Log-y scales are not supported in area plot") + + @classmethod + def _plot(cls, ax, x, y, style=None, column_num=None, + stacking_id=None, is_errorbar=False, **kwds): + + if column_num == 0: + cls._initialize_stacker(ax, stacking_id, len(y)) + y_values = cls._get_stacked_values(ax, stacking_id, y, kwds['label']) + + # need to remove the label because, with subplots, the mpl legend + # is used as is + line_kwds = kwds.copy() + if cls.mpl_ge_1_5_0(): + line_kwds.pop('label') + lines = MPLPlot._plot(ax, x, y_values, style=style, **line_kwds) + + # get data from the line to get coordinates for fill_between + xdata, y_values = lines[0].get_data(orig=False) + + # unable to use ``_get_stacked_values`` here to get starting point + if stacking_id is None: + start = np.zeros(len(y)) + elif (y >= 0).all(): + start = ax._stacker_pos_prior[stacking_id] + elif (y <= 0).all(): + start = ax._stacker_neg_prior[stacking_id] + else: + start = np.zeros(len(y)) + + if 'color' not in kwds: + kwds['color'] = lines[0].get_color() + + rect = ax.fill_between(xdata, start, y_values, **kwds) + cls._update_stacker(ax, stacking_id, y) + + # LinePlot expects list of artists + res = [rect] if cls.mpl_ge_1_5_0() else lines + return res + + def _add_legend_handle(self, handle, label, index=None): + if not self.mpl_ge_1_5_0(): + from matplotlib.patches import Rectangle + # Because fill_between isn't supported by the legend machinery, + # explicitly add a Rectangle handle here + alpha = self.kwds.get('alpha', None) + handle = Rectangle((0, 0), 1, 1, fc=handle.get_color(), + alpha=alpha) +
LinePlot._add_legend_handle(self, handle, label, index=index) + + def _post_plot_logic(self, ax, data): + LinePlot._post_plot_logic(self, ax, data) + + if self.ylim is None: + if (data >= 0).all().all(): + ax.set_ylim(0, None) + elif (data <= 0).all().all(): + ax.set_ylim(None, 0) + + +class BarPlot(MPLPlot): + _kind = 'bar' + _default_rot = 90 + orientation = 'vertical' + + def __init__(self, data, **kwargs): + self.bar_width = kwargs.pop('width', 0.5) + pos = kwargs.pop('position', 0.5) + kwargs.setdefault('align', 'center') + self.tick_pos = np.arange(len(data)) + + self.bottom = kwargs.pop('bottom', 0) + self.left = kwargs.pop('left', 0) + + self.log = kwargs.pop('log', False) + MPLPlot.__init__(self, data, **kwargs) + + if self.stacked or self.subplots: + self.tickoffset = self.bar_width * pos + if kwargs['align'] == 'edge': + self.lim_offset = self.bar_width / 2 + else: + self.lim_offset = 0 + else: + if kwargs['align'] == 'edge': + w = self.bar_width / self.nseries + self.tickoffset = self.bar_width * (pos - 0.5) + w * 0.5 + self.lim_offset = w * 0.5 + else: + self.tickoffset = self.bar_width * pos + self.lim_offset = 0 + + self.ax_pos = self.tick_pos - self.tickoffset + + def _args_adjust(self): + if is_list_like(self.bottom): + self.bottom = np.array(self.bottom) + if is_list_like(self.left): + self.left = np.array(self.left) + + @classmethod + def _plot(cls, ax, x, y, w, start=0, log=False, **kwds): + return ax.bar(x, y, w, bottom=start, log=log, **kwds) + + @property + def _start_base(self): + return self.bottom + + def _make_plot(self): + import matplotlib as mpl + + colors = self._get_colors() + ncolors = len(colors) + + pos_prior = neg_prior = np.zeros(len(self.data)) + K = self.nseries + + for i, (label, y) in enumerate(self._iter_data(fillna=0)): + ax = self._get_ax(i) + kwds = self.kwds.copy() + kwds['color'] = colors[i % ncolors] + + errors = self._get_errorbars(label=label, index=i) + kwds = dict(kwds, **errors) + + label = pprint_thing(label) + + if (('yerr' in kwds) or ('xerr' in kwds)) \ + and (kwds.get('ecolor') is None): + kwds['ecolor'] = mpl.rcParams['xtick.color'] + + start = 0 + if self.log and (y >= 1).all(): + start = 1 + start = start + self._start_base + + if self.subplots: + w = self.bar_width / 2 + rect = self._plot(ax, self.ax_pos + w, y, self.bar_width, + start=start, label=label, + log=self.log, **kwds) + ax.set_title(label) + elif self.stacked: + mask = y > 0 + start = np.where(mask, pos_prior, neg_prior) + self._start_base + w = self.bar_width / 2 + rect = self._plot(ax, self.ax_pos + w, y, self.bar_width, + start=start, label=label, + log=self.log, **kwds) + pos_prior = pos_prior + np.where(mask, y, 0) + neg_prior = neg_prior + np.where(mask, 0, y) + else: + w = self.bar_width / K + rect = self._plot(ax, self.ax_pos + (i + 0.5) * w, y, w, + start=start, label=label, + log=self.log, **kwds) + self._add_legend_handle(rect, label, index=i) + + def _post_plot_logic(self, ax, data): + if self.use_index: + str_index = [pprint_thing(key) for key in data.index] + else: + str_index = [pprint_thing(key) for key in range(data.shape[0])] + name = self._get_index_name() + + s_edge = self.ax_pos[0] - 0.25 + self.lim_offset + e_edge = self.ax_pos[-1] + 0.25 + self.bar_width + self.lim_offset + + self._decorate_ticks(ax, name, str_index, s_edge, e_edge) + + def _decorate_ticks(self, ax, name, ticklabels, start_edge, end_edge): + ax.set_xlim((start_edge, end_edge)) + ax.set_xticks(self.tick_pos) + ax.set_xticklabels(ticklabels) + if name is not None and 
self.use_index: + ax.set_xlabel(name) + + +class BarhPlot(BarPlot): + _kind = 'barh' + _default_rot = 0 + orientation = 'horizontal' + + @property + def _start_base(self): + return self.left + + @classmethod + def _plot(cls, ax, x, y, w, start=0, log=False, **kwds): + return ax.barh(x, y, w, left=start, log=log, **kwds) + + def _decorate_ticks(self, ax, name, ticklabels, start_edge, end_edge): + # horizontal bars + ax.set_ylim((start_edge, end_edge)) + ax.set_yticks(self.tick_pos) + ax.set_yticklabels(ticklabels) + if name is not None and self.use_index: + ax.set_ylabel(name) + + +class HistPlot(LinePlot): + _kind = 'hist' + + def __init__(self, data, bins=10, bottom=0, **kwargs): + self.bins = bins # the default of 10 matches mpl's + self.bottom = bottom + # Do not call LinePlot.__init__ which may fill nan + MPLPlot.__init__(self, data, **kwargs) + + def _args_adjust(self): + if is_integer(self.bins): + # create common bin edges + values = (self.data._convert(datetime=True)._get_numeric_data()) + values = np.ravel(values) + values = values[~isna(values)] + + hist, self.bins = np.histogram( + values, bins=self.bins, + range=self.kwds.get('range', None), + weights=self.kwds.get('weights', None)) + + if is_list_like(self.bottom): + self.bottom = np.array(self.bottom) + + @classmethod + def _plot(cls, ax, y, style=None, bins=None, bottom=0, column_num=0, + stacking_id=None, **kwds): + if column_num == 0: + cls._initialize_stacker(ax, stacking_id, len(bins) - 1) + y = y[~isna(y)] + + base = np.zeros(len(bins) - 1) + bottom = bottom + \ + cls._get_stacked_values(ax, stacking_id, base, kwds['label']) + # ignore style + n, bins, patches = ax.hist(y, bins=bins, bottom=bottom, **kwds) + cls._update_stacker(ax, stacking_id, n) + return patches + + def _make_plot(self): + colors = self._get_colors() + stacking_id = self._get_stacking_id() + + for i, (label, y) in enumerate(self._iter_data()): + ax = self._get_ax(i) + + kwds = self.kwds.copy() + + label = pprint_thing(label) + kwds['label'] = label + + style, kwds = self._apply_style_colors(colors, kwds, i, label) + if style is not None: + kwds['style'] = style + + kwds = self._make_plot_keywords(kwds, y) + artists = self._plot(ax, y, column_num=i, + stacking_id=stacking_id, **kwds) + self._add_legend_handle(artists[0], label, index=i) + + def _make_plot_keywords(self, kwds, y): + """merge BoxPlot/KdePlot properties to passed kwds""" + # y is required for KdePlot + kwds['bottom'] = self.bottom + kwds['bins'] = self.bins + return kwds + + def _post_plot_logic(self, ax, data): + if self.orientation == 'horizontal': + ax.set_xlabel('Frequency') + else: + ax.set_ylabel('Frequency') + + @property + def orientation(self): + if self.kwds.get('orientation', None) == 'horizontal': + return 'horizontal' + else: + return 'vertical' + + +class KdePlot(HistPlot): + _kind = 'kde' + orientation = 'vertical' + + def __init__(self, data, bw_method=None, ind=None, **kwargs): + MPLPlot.__init__(self, data, **kwargs) + self.bw_method = bw_method + self.ind = ind + + def _args_adjust(self): + pass + + def _get_ind(self, y): + if self.ind is None: + # np.nanmax() and np.nanmin() ignore the missing values + sample_range = np.nanmax(y) - np.nanmin(y) + ind = np.linspace(np.nanmin(y) - 0.5 * sample_range, + np.nanmax(y) + 0.5 * sample_range, 1000) + else: + ind = self.ind + return ind + + @classmethod + def _plot(cls, ax, y, style=None, bw_method=None, ind=None, + column_num=None, stacking_id=None, **kwds): + from scipy.stats import gaussian_kde + from scipy import __version__ as spv + + y =
remove_na_arraylike(y) + + if LooseVersion(spv) >= '0.11.0': + gkde = gaussian_kde(y, bw_method=bw_method) + else: + gkde = gaussian_kde(y) + if bw_method is not None: + msg = ('bw_method was added in Scipy 0.11.0.' + + ' Scipy version in use is %s.' % spv) + warnings.warn(msg) + + y = gkde.evaluate(ind) + lines = MPLPlot._plot(ax, ind, y, style=style, **kwds) + return lines + + def _make_plot_keywords(self, kwds, y): + kwds['bw_method'] = self.bw_method + kwds['ind'] = self._get_ind(y) + return kwds + + def _post_plot_logic(self, ax, data): + ax.set_ylabel('Density') + + +class PiePlot(MPLPlot): + _kind = 'pie' + _layout_type = 'horizontal' + + def __init__(self, data, kind=None, **kwargs): + data = data.fillna(value=0) + if (data < 0).any().any(): + raise ValueError("{0} doesn't allow negative values".format(kind)) + MPLPlot.__init__(self, data, kind=kind, **kwargs) + + def _args_adjust(self): + self.grid = False + self.logy = False + self.logx = False + self.loglog = False + + def _validate_color_args(self): + pass + + def _make_plot(self): + colors = self._get_colors( + num_colors=len(self.data), color_kwds='colors') + self.kwds.setdefault('colors', colors) + + for i, (label, y) in enumerate(self._iter_data()): + ax = self._get_ax(i) + if label is not None: + label = pprint_thing(label) + ax.set_ylabel(label) + + kwds = self.kwds.copy() + + def blank_labeler(label, value): + if value == 0: + return '' + else: + return label + + idx = [pprint_thing(v) for v in self.data.index] + labels = kwds.pop('labels', idx) + # labels is used for each wedge's labels + # Blank out labels for values of 0 so they don't overlap + # with nonzero wedges + if labels is not None: + blabels = [blank_labeler(l, value) for + l, value in zip(labels, y)] + else: + blabels = None + results = ax.pie(y, labels=blabels, **kwds) + + if kwds.get('autopct', None) is not None: + patches, texts, autotexts = results + else: + patches, texts = results + autotexts = [] + + if self.fontsize is not None: + for t in texts + autotexts: + t.set_fontsize(self.fontsize) + + # leglabels is used for legend labels + leglabels = labels if labels is not None else idx + for p, l in zip(patches, leglabels): + self._add_legend_handle(p, l) + + +class BoxPlot(LinePlot): + _kind = 'box' + _layout_type = 'horizontal' + + _valid_return_types = (None, 'axes', 'dict', 'both') + # namedtuple to hold results + BP = namedtuple("Boxplot", ['ax', 'lines']) + + def __init__(self, data, return_type='axes', **kwargs): + # Do not call LinePlot.__init__ which may fill nan + if return_type not in self._valid_return_types: + raise ValueError( + "return_type must be {None, 'axes', 'dict', 'both'}") + + self.return_type = return_type + MPLPlot.__init__(self, data, **kwargs) + + def _args_adjust(self): + if self.subplots: + # Disable label ax sharing. 
Otherwise, all subplots show the last + # column label + if self.orientation == 'vertical': + self.sharex = False + else: + self.sharey = False + + @classmethod + def _plot(cls, ax, y, column_num=None, return_type='axes', **kwds): + if y.ndim == 2: + y = [remove_na_arraylike(v) for v in y] + # Boxplot fails with empty arrays, so need to add a NaN + # if any cols are empty + # GH 8181 + y = [v if v.size > 0 else np.array([np.nan]) for v in y] + else: + y = remove_na_arraylike(y) + bp = ax.boxplot(y, **kwds) + + if return_type == 'dict': + return bp, bp + elif return_type == 'both': + return cls.BP(ax=ax, lines=bp), bp + else: + return ax, bp + + def _validate_color_args(self): + if 'color' in self.kwds: + if self.colormap is not None: + warnings.warn("'color' and 'colormap' cannot be used " + "simultaneously. Using 'color'") + self.color = self.kwds.pop('color') + + if isinstance(self.color, dict): + valid_keys = ['boxes', 'whiskers', 'medians', 'caps'] + for key, values in compat.iteritems(self.color): + if key not in valid_keys: + raise ValueError("color dict contains invalid " + "key '{0}'. " + "The key must be one of {1}" + .format(key, valid_keys)) + else: + self.color = None + + # get standard colors for default + colors = _get_standard_colors(num_colors=3, + colormap=self.colormap, + color=None) + # use 2 colors by default, for box/whisker and median + # the flier color isn't needed here + # because it can be specified via the ``sym`` kw + self._boxes_c = colors[0] + self._whiskers_c = colors[0] + self._medians_c = colors[2] + self._caps_c = 'k' # mpl default + + def _get_colors(self, num_colors=None, color_kwds='color'): + pass + + def maybe_color_bp(self, bp): + if isinstance(self.color, dict): + boxes = self.color.get('boxes', self._boxes_c) + whiskers = self.color.get('whiskers', self._whiskers_c) + medians = self.color.get('medians', self._medians_c) + caps = self.color.get('caps', self._caps_c) + else: + # Other types are forwarded to matplotlib + # If None, use default colors + boxes = self.color or self._boxes_c + whiskers = self.color or self._whiskers_c + medians = self.color or self._medians_c + caps = self.color or self._caps_c + + from matplotlib.artist import setp + setp(bp['boxes'], color=boxes, alpha=1) + setp(bp['whiskers'], color=whiskers, alpha=1) + setp(bp['medians'], color=medians, alpha=1) + setp(bp['caps'], color=caps, alpha=1) + + def _make_plot(self): + if self.subplots: + from pandas.core.series import Series + self._return_obj = Series() + + for i, (label, y) in enumerate(self._iter_data()): + ax = self._get_ax(i) + kwds = self.kwds.copy() + + ret, bp = self._plot(ax, y, column_num=i, + return_type=self.return_type, **kwds) + self.maybe_color_bp(bp) + self._return_obj[label] = ret + + label = [pprint_thing(label)] + self._set_ticklabels(ax, label) + else: + y = self.data.values.T + ax = self._get_ax(0) + kwds = self.kwds.copy() + + ret, bp = self._plot(ax, y, column_num=0, + return_type=self.return_type, **kwds) + self.maybe_color_bp(bp) + self._return_obj = ret + + labels = [l for l, _ in self._iter_data()] + labels = [pprint_thing(l) for l in labels] + if not self.use_index: + labels = [pprint_thing(key) for key in range(len(labels))] + self._set_ticklabels(ax, labels) + + def _set_ticklabels(self, ax, labels): + if self.orientation == 'vertical': + ax.set_xticklabels(labels) + else: + ax.set_yticklabels(labels) + + def _make_legend(self): + pass + + def _post_plot_logic(self, ax, data): + pass + + @property + def orientation(self): + if self.kwds.get('vert', True):
+ return 'vertical' + else: + return 'horizontal' + + @property + def result(self): + if self.return_type is None: + return super(BoxPlot, self).result + else: + return self._return_obj + + +# kinds supported by both dataframe and series +_common_kinds = ['line', 'bar', 'barh', + 'kde', 'density', 'area', 'hist', 'box'] +# kinds supported by dataframe +_dataframe_kinds = ['scatter', 'hexbin'] +# kinds supported only by a series or a single dataframe column +_series_kinds = ['pie'] +_all_kinds = _common_kinds + _dataframe_kinds + _series_kinds + +_klasses = [LinePlot, BarPlot, BarhPlot, KdePlot, HistPlot, BoxPlot, + ScatterPlot, HexBinPlot, AreaPlot, PiePlot] + +_plot_klass = {} +for klass in _klasses: + _plot_klass[klass._kind] = klass + + +def _plot(data, x=None, y=None, subplots=False, + ax=None, kind='line', **kwds): + kind = _get_standard_kind(kind.lower().strip()) + if kind in _all_kinds: + klass = _plot_klass[kind] + else: + raise ValueError("%r is not a valid plot kind" % kind) + + from pandas import DataFrame + if kind in _dataframe_kinds: + if isinstance(data, DataFrame): + plot_obj = klass(data, x=x, y=y, subplots=subplots, ax=ax, + kind=kind, **kwds) + else: + raise ValueError("plot kind %r can only be used for data frames" + % kind) + + elif kind in _series_kinds: + if isinstance(data, DataFrame): + if y is None and subplots is False: + msg = "{0} requires either a y column or 'subplots=True'" + raise ValueError(msg.format(kind)) + elif y is not None: + if is_integer(y) and not data.columns.holds_integer(): + y = data.columns[y] + # actually converted to a Series; copy so the original + # is not modified + data = data[y].copy() + data.index.name = y + plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds) + else: + if isinstance(data, DataFrame): + if x is not None: + if is_integer(x) and not data.columns.holds_integer(): + x = data.columns[x] + data = data.set_index(x) + + if y is not None: + if is_integer(y) and not data.columns.holds_integer(): + y = data.columns[y] + label = kwds['label'] if 'label' in kwds else y + series = data[y].copy() # Don't modify + series.name = label + + for kw in ['xerr', 'yerr']: + if (kw in kwds) and \ + (isinstance(kwds[kw], string_types) or + is_integer(kwds[kw])): + try: + kwds[kw] = data[kwds[kw]] + except (IndexError, KeyError, TypeError): + pass + data = series + plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds) + + plot_obj.generate() + plot_obj.draw() + return plot_obj.result + + +df_kind = """- 'scatter' : scatter plot + - 'hexbin' : hexbin plot""" +series_kind = "" + +df_coord = """x : label or position, default None + y : label or position, default None + Allows plotting of one column versus another""" +series_coord = "" + +df_unique = """stacked : boolean, default False in line and + bar plots, and True in area plot. If True, create stacked plot.
+ sort_columns : boolean, default False + Sort column names to determine plot ordering + secondary_y : boolean or sequence, default False + Whether to plot on the secondary y-axis. + If a list/tuple, which columns to plot on secondary y-axis""" +series_unique = """label : label argument to provide to plot + secondary_y : boolean or sequence of ints, default False + If True, the y-axis will be on the right""" + +df_ax = """ax : matplotlib axes object, default None + subplots : boolean, default False + Make separate subplots for each column + sharex : boolean, default True if ax is None else False + In case subplots=True, share x axis and set some x axis labels to + invisible; defaults to True if ax is None otherwise False if an ax + is passed in. Be aware that passing in both an ax and sharex=True + will alter all x axis labels for all axes in a figure! + sharey : boolean, default False + In case subplots=True, share y axis and set some y axis labels to + invisible + layout : tuple (optional) + (rows, columns) for the layout of subplots""" +series_ax = """ax : matplotlib axes object + If not passed, uses gca()""" + +df_note = """- If `kind` = 'scatter' and the argument `c` is the name of a dataframe + column, the values of that column are used to color each point. + - If `kind` = 'hexbin', you can control the size of the bins with the + `gridsize` argument. By default, a histogram of the counts around each + `(x, y)` point is computed. You can specify alternative aggregations + by passing values to the `C` and `reduce_C_function` arguments. + `C` specifies the value at each `(x, y)` point and `reduce_C_function` + is a function of one argument that reduces all the values in a bin to + a single number (e.g. `mean`, `max`, `sum`, `std`).""" +series_note = "" + +_shared_doc_df_kwargs = dict(klass='DataFrame', klass_obj='df', + klass_kind=df_kind, klass_coord=df_coord, + klass_ax=df_ax, klass_unique=df_unique, + klass_note=df_note) +_shared_doc_series_kwargs = dict(klass='Series', klass_obj='s', + klass_kind=series_kind, + klass_coord=series_coord, klass_ax=series_ax, + klass_unique=series_unique, + klass_note=series_note) + +_shared_docs['plot'] = """ + Make plots of %(klass)s using matplotlib / pylab. + + *New in version 0.17.0:* Each plot kind has a corresponding method on the + ``%(klass)s.plot`` accessor: + ``%(klass_obj)s.plot(kind='line')`` is equivalent to + ``%(klass_obj)s.plot.line()``. + + Parameters + ---------- + data : %(klass)s + %(klass_coord)s + kind : str + - 'line' : line plot (default) + - 'bar' : vertical bar plot + - 'barh' : horizontal bar plot + - 'hist' : histogram + - 'box' : boxplot + - 'kde' : Kernel Density Estimation plot + - 'density' : same as 'kde' + - 'area' : area plot + - 'pie' : pie plot + %(klass_kind)s + %(klass_ax)s + figsize : a tuple (width, height) in inches + use_index : boolean, default True + Use index as ticks for x axis + title : string or list + Title to use for the plot. If a string is passed, print the string at + the top of the figure. If a list is passed and `subplots` is True, + print each item in the list above the corresponding subplot.
+ grid : boolean, default None (matlab style default) + Axis grid lines + legend : False/True/'reverse' + Place legend on axis subplots + style : list or dict + matplotlib line style per column + logx : boolean, default False + Use log scaling on x axis + logy : boolean, default False + Use log scaling on y axis + loglog : boolean, default False + Use log scaling on both x and y axes + xticks : sequence + Values to use for the xticks + yticks : sequence + Values to use for the yticks + xlim : 2-tuple/list + ylim : 2-tuple/list + rot : int, default None + Rotation for ticks (xticks for vertical, yticks for horizontal plots) + fontsize : int, default None + Font size for xticks and yticks + colormap : str or matplotlib colormap object, default None + Colormap to select colors from. If string, load colormap with that name + from matplotlib. + colorbar : boolean, optional + If True, plot colorbar (only relevant for 'scatter' and 'hexbin' plots) + position : float + Specify relative alignments for bar plot layout. + From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5 (center) + layout : tuple (optional) + (rows, columns) for the layout of the plot + table : boolean, Series or DataFrame, default False + If True, draw a table using the data in the DataFrame and the data will + be transposed to meet matplotlib's default layout. + If a Series or DataFrame is passed, use passed data to draw a table. + yerr : DataFrame, Series, array-like, dict or str + See :ref:`Plotting with Error Bars <visualization.errorbars>` for + detail. + xerr : same types as yerr. + %(klass_unique)s + mark_right : boolean, default True + When using a secondary_y axis, automatically mark the column + labels with "(right)" in the legend + kwds : keywords + Options to pass to matplotlib plotting method + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + + Notes + ----- + + - See matplotlib documentation online for more on this subject + - If `kind` = 'bar' or 'barh', you can specify relative alignments + for bar plot layout with the `position` keyword. + From 0 (left/bottom-end) to 1 (right/top-end).
Default is 0.5 (center) + %(klass_note)s + + """ + + +@Appender(_shared_docs['plot'] % _shared_doc_df_kwargs) +def plot_frame(data, x=None, y=None, kind='line', ax=None, + subplots=False, sharex=None, sharey=False, layout=None, + figsize=None, use_index=True, title=None, grid=None, + legend=True, style=None, logx=False, logy=False, loglog=False, + xticks=None, yticks=None, xlim=None, ylim=None, + rot=None, fontsize=None, colormap=None, table=False, + yerr=None, xerr=None, + secondary_y=False, sort_columns=False, + **kwds): + return _plot(data, kind=kind, x=x, y=y, ax=ax, + subplots=subplots, sharex=sharex, sharey=sharey, + layout=layout, figsize=figsize, use_index=use_index, + title=title, grid=grid, legend=legend, + style=style, logx=logx, logy=logy, loglog=loglog, + xticks=xticks, yticks=yticks, xlim=xlim, ylim=ylim, + rot=rot, fontsize=fontsize, colormap=colormap, table=table, + yerr=yerr, xerr=xerr, + secondary_y=secondary_y, sort_columns=sort_columns, + **kwds) + + +@Appender(_shared_docs['plot'] % _shared_doc_series_kwargs) +def plot_series(data, kind='line', ax=None, # Series unique + figsize=None, use_index=True, title=None, grid=None, + legend=False, style=None, logx=False, logy=False, loglog=False, + xticks=None, yticks=None, xlim=None, ylim=None, + rot=None, fontsize=None, colormap=None, table=False, + yerr=None, xerr=None, + label=None, secondary_y=False, # Series unique + **kwds): + + import matplotlib.pyplot as plt + if ax is None and len(plt.get_fignums()) > 0: + ax = _gca() + ax = MPLPlot._get_ax_layer(ax) + return _plot(data, kind=kind, ax=ax, + figsize=figsize, use_index=use_index, title=title, + grid=grid, legend=legend, + style=style, logx=logx, logy=logy, loglog=loglog, + xticks=xticks, yticks=yticks, xlim=xlim, ylim=ylim, + rot=rot, fontsize=fontsize, colormap=colormap, table=table, + yerr=yerr, xerr=xerr, + label=label, secondary_y=secondary_y, + **kwds) + + +_shared_docs['boxplot'] = """ + Make a box plot from DataFrame columns, optionally grouped by some other + columns or inputs + + Parameters + ---------- + data : the pandas object holding the data + column : column name or list of names, or vector + Can be any valid input to groupby + by : string or sequence + Column in the DataFrame to group by + ax : Matplotlib axes object, optional + fontsize : int or string + rot : label rotation angle + figsize : A tuple (width, height) in inches + grid : Setting this to True will show the grid + layout : tuple (optional) + (rows, columns) for the layout of the plot + return_type : {None, 'axes', 'dict', 'both'}, default None + The kind of object to return. The default is ``axes``. + 'axes' returns the matplotlib axes the boxplot is drawn on; + 'dict' returns a dictionary whose values are the matplotlib + Lines of the boxplot; + 'both' returns a namedtuple with the axes and dict. + + When grouping with ``by``, a Series mapping columns to ``return_type`` + is returned, unless ``return_type`` is None, in which case a NumPy + array of axes is returned with the same shape as ``layout``. + See the prose documentation for more. + + kwds : other plotting keyword arguments to be passed to matplotlib boxplot + function + + Returns + ------- + lines : dict + ax : matplotlib Axes + (ax, lines): namedtuple + + Notes + ----- + Use ``return_type='dict'`` when you want to tweak the appearance + of the lines after plotting. In this case a dict containing the Lines + making up the boxes, caps, fliers, medians, and whiskers is returned.
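+
+    Examples
+    --------
+    A minimal sketch of grouped usage (the column and group names here are
+    illustrative only):
+
+    >>> import numpy as np
+    >>> import pandas as pd
+    >>> df = pd.DataFrame({'vals': np.random.randn(20),
+    ...                    'grp': list('ab') * 10})
+    >>> axes = df.boxplot(column='vals', by='grp', return_type='axes')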
+ """ + + +@Appender(_shared_docs['boxplot'] % _shared_doc_kwargs) +def boxplot(data, column=None, by=None, ax=None, fontsize=None, + rot=0, grid=True, figsize=None, layout=None, return_type=None, + **kwds): + + # validate return_type: + if return_type not in BoxPlot._valid_return_types: + raise ValueError("return_type must be {'axes', 'dict', 'both'}") + + from pandas import Series, DataFrame + if isinstance(data, Series): + data = DataFrame({'x': data}) + column = 'x' + + def _get_colors(): + return _get_standard_colors(color=kwds.get('color'), num_colors=1) + + def maybe_color_bp(bp): + if 'color' not in kwds: + from matplotlib.artist import setp + setp(bp['boxes'], color=colors[0], alpha=1) + setp(bp['whiskers'], color=colors[0], alpha=1) + setp(bp['medians'], color=colors[2], alpha=1) + + def plot_group(keys, values, ax): + keys = [pprint_thing(x) for x in keys] + values = [remove_na_arraylike(v) for v in values] + bp = ax.boxplot(values, **kwds) + if fontsize is not None: + ax.tick_params(axis='both', labelsize=fontsize) + if kwds.get('vert', 1): + ax.set_xticklabels(keys, rotation=rot) + else: + ax.set_yticklabels(keys, rotation=rot) + maybe_color_bp(bp) + + # Return axes in multiplot case, maybe revisit later # 985 + if return_type == 'dict': + return bp + elif return_type == 'both': + return BoxPlot.BP(ax=ax, lines=bp) + else: + return ax + + colors = _get_colors() + if column is None: + columns = None + else: + if isinstance(column, (list, tuple)): + columns = column + else: + columns = [column] + + if by is not None: + # Prefer array return type for 2-D plots to match the subplot layout + # https://github.com/pandas-dev/pandas/pull/12216#issuecomment-241175580 + result = _grouped_plot_by_column(plot_group, data, columns=columns, + by=by, grid=grid, figsize=figsize, + ax=ax, layout=layout, + return_type=return_type) + else: + if return_type is None: + return_type = 'axes' + if layout is not None: + raise ValueError("The 'layout' keyword is not supported when " + "'by' is None") + + if ax is None: + rc = {'figure.figsize': figsize} if figsize is not None else {} + ax = _gca(rc) + data = data._get_numeric_data() + if columns is None: + columns = data.columns + else: + data = data[columns] + + result = plot_group(columns, data.values.T, ax) + ax.grid(grid) + + return result + + +@Appender(_shared_docs['boxplot'] % _shared_doc_kwargs) +def boxplot_frame(self, column=None, by=None, ax=None, fontsize=None, rot=0, + grid=True, figsize=None, layout=None, + return_type=None, **kwds): + import matplotlib.pyplot as plt + ax = boxplot(self, column=column, by=by, ax=ax, fontsize=fontsize, + grid=grid, rot=rot, figsize=figsize, layout=layout, + return_type=return_type, **kwds) + plt.draw_if_interactive() + return ax + + +def scatter_plot(data, x, y, by=None, ax=None, figsize=None, grid=False, + **kwargs): + """ + Make a scatter plot from two DataFrame columns + + Parameters + ---------- + data : DataFrame + x : Column name for the x-axis values + y : Column name for the y-axis values + ax : Matplotlib axis object + figsize : A tuple (width, height) in inches + grid : Setting this to True will show the grid + kwargs : other plotting keyword arguments + To be passed to scatter function + + Returns + ------- + fig : matplotlib.Figure + """ + import matplotlib.pyplot as plt + + kwargs.setdefault('edgecolors', 'none') + + def plot_group(group, ax): + xvals = group[x].values + yvals = group[y].values + ax.scatter(xvals, yvals, **kwargs) + ax.grid(grid) + + if by is not None: + fig = 
_grouped_plot(plot_group, data, by=by, figsize=figsize, ax=ax) + else: + if ax is None: + fig = plt.figure() + ax = fig.add_subplot(111) + else: + fig = ax.get_figure() + plot_group(data, ax) + ax.set_ylabel(pprint_thing(y)) + ax.set_xlabel(pprint_thing(x)) + + ax.grid(grid) + + return fig + + +def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None, + xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, + sharey=False, figsize=None, layout=None, bins=10, **kwds): + """ + Draw histogram of the DataFrame's series using matplotlib / pylab. + + Parameters + ---------- + data : DataFrame + column : string or sequence + If passed, will be used to limit data to a subset of columns + by : object, optional + If passed, then used to form histograms for separate groups + grid : boolean, default True + Whether to show axis grid lines + xlabelsize : int, default None + If specified changes the x-axis label size + xrot : float, default None + rotation of x axis labels + ylabelsize : int, default None + If specified changes the y-axis label size + yrot : float, default None + rotation of y axis labels + ax : matplotlib axes object, default None + sharex : boolean, default True if ax is None else False + In case subplots=True, share x axis and set some x axis labels to + invisible; defaults to True if ax is None otherwise False if an ax + is passed in; Be aware, that passing in both an ax and sharex=True + will alter all x axis labels for all subplots in a figure! + sharey : boolean, default False + In case subplots=True, share y axis and set some y axis labels to + invisible + figsize : tuple + The size of the figure to create in inches by default + layout : tuple, optional + Tuple of (rows, columns) for the layout of the histograms + bins : integer, default 10 + Number of histogram bins to be used + kwds : other plotting keyword arguments + To be passed to hist function + """ + + if by is not None: + axes = grouped_hist(data, column=column, by=by, ax=ax, grid=grid, + figsize=figsize, sharex=sharex, sharey=sharey, + layout=layout, bins=bins, xlabelsize=xlabelsize, + xrot=xrot, ylabelsize=ylabelsize, + yrot=yrot, **kwds) + return axes + + if column is not None: + if not isinstance(column, (list, np.ndarray, Index)): + column = [column] + data = data[column] + data = data._get_numeric_data() + naxes = len(data.columns) + + fig, axes = _subplots(naxes=naxes, ax=ax, squeeze=False, + sharex=sharex, sharey=sharey, figsize=figsize, + layout=layout) + _axes = _flatten(axes) + + for i, col in enumerate(_try_sort(data.columns)): + ax = _axes[i] + ax.hist(data[col].dropna().values, bins=bins, **kwds) + ax.set_title(col) + ax.grid(grid) + + _set_ticks_props(axes, xlabelsize=xlabelsize, xrot=xrot, + ylabelsize=ylabelsize, yrot=yrot) + fig.subplots_adjust(wspace=0.3, hspace=0.3) + + return axes + + +def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None, + xrot=None, ylabelsize=None, yrot=None, figsize=None, + bins=10, **kwds): + """ + Draw histogram of the input series using matplotlib + + Parameters + ---------- + by : object, optional + If passed, then used to form histograms for separate groups + ax : matplotlib axis object + If not passed, uses gca() + grid : boolean, default True + Whether to show axis grid lines + xlabelsize : int, default None + If specified changes the x-axis label size + xrot : float, default None + rotation of x axis labels + ylabelsize : int, default None + If specified changes the y-axis label size + yrot : float, default None + rotation of y axis 
labels + figsize : tuple, default None + figure size in inches by default + bins: integer, default 10 + Number of histogram bins to be used + kwds : keywords + To be passed to the actual plotting function + + Notes + ----- + See matplotlib documentation online for more on this + + """ + import matplotlib.pyplot as plt + + if by is None: + if kwds.get('layout', None) is not None: + raise ValueError("The 'layout' keyword is not supported when " + "'by' is None") + # hack until the plotting interface is a bit more unified + fig = kwds.pop('figure', plt.gcf() if plt.get_fignums() else + plt.figure(figsize=figsize)) + if (figsize is not None and tuple(figsize) != + tuple(fig.get_size_inches())): + fig.set_size_inches(*figsize, forward=True) + if ax is None: + ax = fig.gca() + elif ax.get_figure() != fig: + raise AssertionError('passed axis not bound to passed figure') + values = self.dropna().values + + ax.hist(values, bins=bins, **kwds) + ax.grid(grid) + axes = np.array([ax]) + + _set_ticks_props(axes, xlabelsize=xlabelsize, xrot=xrot, + ylabelsize=ylabelsize, yrot=yrot) + + else: + if 'figure' in kwds: + raise ValueError("Cannot pass 'figure' when using the " + "'by' argument, since a new 'Figure' instance " + "will be created") + axes = grouped_hist(self, by=by, ax=ax, grid=grid, figsize=figsize, + bins=bins, xlabelsize=xlabelsize, xrot=xrot, + ylabelsize=ylabelsize, yrot=yrot, **kwds) + + if hasattr(axes, 'ndim'): + if axes.ndim == 1 and len(axes) == 1: + return axes[0] + return axes + + +def grouped_hist(data, column=None, by=None, ax=None, bins=50, figsize=None, + layout=None, sharex=False, sharey=False, rot=90, grid=True, + xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, + **kwargs): + """ + Grouped histogram + + Parameters + ---------- + data: Series/DataFrame + column: object, optional + by: object, optional + ax: axes, optional + bins: int, default 50 + figsize: tuple, optional + layout: optional + sharex: boolean, default False + sharey: boolean, default False + rot: int, default 90 + grid: bool, default True + kwargs: dict, keyword arguments passed to matplotlib.Axes.hist + + Returns + ------- + axes: collection of Matplotlib Axes + """ + def plot_group(group, ax): + ax.hist(group.dropna().values, bins=bins, **kwargs) + + xrot = xrot or rot + + fig, axes = _grouped_plot(plot_group, data, column=column, + by=by, sharex=sharex, sharey=sharey, ax=ax, + figsize=figsize, layout=layout, rot=rot) + + _set_ticks_props(axes, xlabelsize=xlabelsize, xrot=xrot, + ylabelsize=ylabelsize, yrot=yrot) + + fig.subplots_adjust(bottom=0.15, top=0.9, left=0.1, right=0.9, + hspace=0.5, wspace=0.3) + return axes + + +def boxplot_frame_groupby(grouped, subplots=True, column=None, fontsize=None, + rot=0, grid=True, ax=None, figsize=None, + layout=None, **kwds): + """ + Make box plots from DataFrameGroupBy data. 
+ + Parameters + ---------- + grouped : Grouped DataFrame + subplots : + * ``False`` - no subplots will be used + * ``True`` - create a subplot for each group + column : column name or list of names, or vector + Can be any valid input to groupby + fontsize : int or string + rot : label rotation angle + grid : Setting this to True will show the grid + ax : Matplotlib axis object, default None + figsize : A tuple (width, height) in inches + layout : tuple (optional) + (rows, columns) for the layout of the plot + kwds : other plotting keyword arguments to be passed to matplotlib boxplot + function + + Returns + ------- + dict of key/value = group key/DataFrame.boxplot return value + or DataFrame.boxplot return value in case subplots=figures=False + + Examples + -------- + >>> import pandas + >>> import numpy as np + >>> import itertools + >>> + >>> tuples = [t for t in itertools.product(range(1000), range(4))] + >>> index = pandas.MultiIndex.from_tuples(tuples, names=['lvl0', 'lvl1']) + >>> data = np.random.randn(len(index),4) + >>> df = pandas.DataFrame(data, columns=list('ABCD'), index=index) + >>> + >>> grouped = df.groupby(level='lvl1') + >>> boxplot_frame_groupby(grouped) + >>> + >>> grouped = df.unstack(level='lvl1').groupby(level=0, axis=1) + >>> boxplot_frame_groupby(grouped, subplots=False) + """ + if subplots is True: + naxes = len(grouped) + fig, axes = _subplots(naxes=naxes, squeeze=False, + ax=ax, sharex=False, sharey=True, + figsize=figsize, layout=layout) + axes = _flatten(axes) + + from pandas.core.series import Series + ret = Series() + for (key, group), ax in zip(grouped, axes): + d = group.boxplot(ax=ax, column=column, fontsize=fontsize, + rot=rot, grid=grid, **kwds) + ax.set_title(pprint_thing(key)) + ret.loc[key] = d + fig.subplots_adjust(bottom=0.15, top=0.9, left=0.1, + right=0.9, wspace=0.2) + else: + from pandas.core.reshape.concat import concat + keys, frames = zip(*grouped) + if grouped.axis == 0: + df = concat(frames, keys=keys, axis=1) + else: + if len(frames) > 1: + df = frames[0].join(frames[1::]) + else: + df = frames[0] + ret = df.boxplot(column=column, fontsize=fontsize, rot=rot, + grid=grid, ax=ax, figsize=figsize, + layout=layout, **kwds) + return ret + + +def _grouped_plot(plotf, data, column=None, by=None, numeric_only=True, + figsize=None, sharex=True, sharey=True, layout=None, + rot=0, ax=None, **kwargs): + from pandas import DataFrame + + if figsize == 'default': + # allowed to specify mpl default with 'default' + warnings.warn("figsize='default' is deprecated. 
Specify figure" + "size by tuple instead", FutureWarning, stacklevel=4) + figsize = None + + grouped = data.groupby(by) + if column is not None: + grouped = grouped[column] + + naxes = len(grouped) + fig, axes = _subplots(naxes=naxes, figsize=figsize, + sharex=sharex, sharey=sharey, ax=ax, + layout=layout) + + _axes = _flatten(axes) + + for i, (key, group) in enumerate(grouped): + ax = _axes[i] + if numeric_only and isinstance(group, DataFrame): + group = group._get_numeric_data() + plotf(group, ax, **kwargs) + ax.set_title(pprint_thing(key)) + + return fig, axes + + +def _grouped_plot_by_column(plotf, data, columns=None, by=None, + numeric_only=True, grid=False, + figsize=None, ax=None, layout=None, + return_type=None, **kwargs): + grouped = data.groupby(by) + if columns is None: + if not isinstance(by, (list, tuple)): + by = [by] + columns = data._get_numeric_data().columns.difference(by) + naxes = len(columns) + fig, axes = _subplots(naxes=naxes, sharex=True, sharey=True, + figsize=figsize, ax=ax, layout=layout) + + _axes = _flatten(axes) + + ax_values = [] + + for i, col in enumerate(columns): + ax = _axes[i] + gp_col = grouped[col] + keys, values = zip(*gp_col) + re_plotf = plotf(keys, values, ax, **kwargs) + ax.set_title(col) + ax.set_xlabel(pprint_thing(by)) + ax_values.append(re_plotf) + ax.grid(grid) + + from pandas.core.series import Series + result = Series(ax_values, index=columns) + + # Return axes in multiplot case, maybe revisit later # 985 + if return_type is None: + result = axes + + byline = by[0] if len(by) == 1 else by + fig.suptitle('Boxplot grouped by %s' % byline) + fig.subplots_adjust(bottom=0.15, top=0.9, left=0.1, right=0.9, wspace=0.2) + + return result + + +class BasePlotMethods(PandasObject): + + def __init__(self, data): + self._data = data + + def __call__(self, *args, **kwargs): + raise NotImplementedError + + +class SeriesPlotMethods(BasePlotMethods): + """Series plotting accessor and method + + Examples + -------- + >>> s.plot.line() + >>> s.plot.bar() + >>> s.plot.hist() + + Plotting methods can also be accessed by calling the accessor as a method + with the ``kind`` argument: + ``s.plot(kind='line')`` is equivalent to ``s.plot.line()`` + """ + + def __call__(self, kind='line', ax=None, + figsize=None, use_index=True, title=None, grid=None, + legend=False, style=None, logx=False, logy=False, + loglog=False, xticks=None, yticks=None, + xlim=None, ylim=None, + rot=None, fontsize=None, colormap=None, table=False, + yerr=None, xerr=None, + label=None, secondary_y=False, **kwds): + return plot_series(self._data, kind=kind, ax=ax, figsize=figsize, + use_index=use_index, title=title, grid=grid, + legend=legend, style=style, logx=logx, logy=logy, + loglog=loglog, xticks=xticks, yticks=yticks, + xlim=xlim, ylim=ylim, rot=rot, fontsize=fontsize, + colormap=colormap, table=table, yerr=yerr, + xerr=xerr, label=label, secondary_y=secondary_y, + **kwds) + __call__.__doc__ = plot_series.__doc__ + + def line(self, **kwds): + """ + Line plot + + .. versionadded:: 0.17.0 + + Parameters + ---------- + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='line', **kwds) + + def bar(self, **kwds): + """ + Vertical bar plot + + .. versionadded:: 0.17.0 + + Parameters + ---------- + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. 
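+            For example, a hypothetical call forwarding styling keywords
+            (any valid ``Series.plot`` keyword works here)::
+
+                s.plot.bar(color='green', rot=45)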
+ + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='bar', **kwds) + + def barh(self, **kwds): + """ + Horizontal bar plot + + .. versionadded:: 0.17.0 + + Parameters + ---------- + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='barh', **kwds) + + def box(self, **kwds): + """ + Boxplot + + .. versionadded:: 0.17.0 + + Parameters + ---------- + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='box', **kwds) + + def hist(self, bins=10, **kwds): + """ + Histogram + + .. versionadded:: 0.17.0 + + Parameters + ---------- + bins: integer, default 10 + Number of histogram bins to be used + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='hist', bins=bins, **kwds) + + def kde(self, **kwds): + """ + Kernel Density Estimate plot + + .. versionadded:: 0.17.0 + + Parameters + ---------- + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='kde', **kwds) + + density = kde + + def area(self, **kwds): + """ + Area plot + + .. versionadded:: 0.17.0 + + Parameters + ---------- + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='area', **kwds) + + def pie(self, **kwds): + """ + Pie chart + + .. versionadded:: 0.17.0 + + Parameters + ---------- + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='pie', **kwds) + + +class FramePlotMethods(BasePlotMethods): + """DataFrame plotting accessor and method + + Examples + -------- + >>> df.plot.line() + >>> df.plot.scatter('x', 'y') + >>> df.plot.hexbin() + + These plotting methods can also be accessed by calling the accessor as a + method with the ``kind`` argument: + ``df.plot(kind='line')`` is equivalent to ``df.plot.line()`` + """ + + def __call__(self, x=None, y=None, kind='line', ax=None, + subplots=False, sharex=None, sharey=False, layout=None, + figsize=None, use_index=True, title=None, grid=None, + legend=True, style=None, logx=False, logy=False, loglog=False, + xticks=None, yticks=None, xlim=None, ylim=None, + rot=None, fontsize=None, colormap=None, table=False, + yerr=None, xerr=None, + secondary_y=False, sort_columns=False, **kwds): + return plot_frame(self._data, kind=kind, x=x, y=y, ax=ax, + subplots=subplots, sharex=sharex, sharey=sharey, + layout=layout, figsize=figsize, use_index=use_index, + title=title, grid=grid, legend=legend, style=style, + logx=logx, logy=logy, loglog=loglog, xticks=xticks, + yticks=yticks, xlim=xlim, ylim=ylim, rot=rot, + fontsize=fontsize, colormap=colormap, table=table, + yerr=yerr, xerr=xerr, secondary_y=secondary_y, + sort_columns=sort_columns, **kwds) + __call__.__doc__ = plot_frame.__doc__ + + def line(self, x=None, y=None, **kwds): + """ + Line plot + + .. versionadded:: 0.17.0 + + Parameters + ---------- + x, y : label or position, optional + Coordinates for each point. 
+ **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='line', x=x, y=y, **kwds) + + def bar(self, x=None, y=None, **kwds): + """ + Vertical bar plot + + .. versionadded:: 0.17.0 + + Parameters + ---------- + x, y : label or position, optional + Coordinates for each point. + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='bar', x=x, y=y, **kwds) + + def barh(self, x=None, y=None, **kwds): + """ + Horizontal bar plot + + .. versionadded:: 0.17.0 + + Parameters + ---------- + x, y : label or position, optional + Coordinates for each point. + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='barh', x=x, y=y, **kwds) + + def box(self, by=None, **kwds): + """ + Boxplot + + .. versionadded:: 0.17.0 + + Parameters + ---------- + by : string or sequence + Column in the DataFrame to group by. + \*\*kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='box', by=by, **kwds) + + def hist(self, by=None, bins=10, **kwds): + """ + Histogram + + .. versionadded:: 0.17.0 + + Parameters + ---------- + by : string or sequence + Column in the DataFrame to group by. + bins: integer, default 10 + Number of histogram bins to be used + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='hist', by=by, bins=bins, **kwds) + + def kde(self, **kwds): + """ + Kernel Density Estimate plot + + .. versionadded:: 0.17.0 + + Parameters + ---------- + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='kde', **kwds) + + density = kde + + def area(self, x=None, y=None, **kwds): + """ + Area plot + + .. versionadded:: 0.17.0 + + Parameters + ---------- + x, y : label or position, optional + Coordinates for each point. + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='area', x=x, y=y, **kwds) + + def pie(self, y=None, **kwds): + """ + Pie chart + + .. versionadded:: 0.17.0 + + Parameters + ---------- + y : label or position, optional + Column to plot. + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='pie', y=y, **kwds) + + def scatter(self, x, y, s=None, c=None, **kwds): + """ + Scatter plot + + .. versionadded:: 0.17.0 + + Parameters + ---------- + x, y : label or position, optional + Coordinates for each point. + s : scalar or array_like, optional + Size of each point. + c : label or position, optional + Color of each point. + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. 
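+            For example, a hypothetical call, assuming ``df`` has numeric
+            columns ``'a'`` and ``'b'``::
+
+                df.plot.scatter(x='a', y='b', s=30)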
+ + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + return self(kind='scatter', x=x, y=y, c=c, s=s, **kwds) + + def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None, + **kwds): + """ + Hexbin plot + + .. versionadded:: 0.17.0 + + Parameters + ---------- + x, y : label or position, optional + Coordinates for each point. + C : label or position, optional + The value at each `(x, y)` point. + reduce_C_function : callable, optional + Function of one argument that reduces all the values in a bin to + a single number (e.g. `mean`, `max`, `sum`, `std`). + gridsize : int, optional + Number of bins. + **kwds : optional + Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. + + Returns + ------- + axes : matplotlib.AxesSubplot or np.array of them + """ + if reduce_C_function is not None: + kwds['reduce_C_function'] = reduce_C_function + if gridsize is not None: + kwds['gridsize'] = gridsize + return self(kind='hexbin', x=x, y=y, C=C, **kwds) diff --git a/pandas/plotting/_misc.py b/pandas/plotting/_misc.py new file mode 100644 index 0000000000000..db2211fb55135 --- /dev/null +++ b/pandas/plotting/_misc.py @@ -0,0 +1,573 @@ +# being a bit too dynamic +# pylint: disable=E1101 +from __future__ import division + +import numpy as np + +from pandas.util._decorators import deprecate_kwarg +from pandas.core.dtypes.missing import notna +from pandas.compat import range, lrange, lmap, zip +from pandas.io.formats.printing import pprint_thing + + +from pandas.plotting._style import _get_standard_colors +from pandas.plotting._tools import _subplots, _set_ticks_props + + +def scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, + diagonal='hist', marker='.', density_kwds=None, + hist_kwds=None, range_padding=0.05, **kwds): + """ + Draw a matrix of scatter plots. + + Parameters + ---------- + frame : DataFrame + alpha : float, optional + amount of transparency applied + figsize : (float,float), optional + a tuple (width, height) in inches + ax : Matplotlib axis object, optional + grid : bool, optional + setting this to True will show the grid + diagonal : {'hist', 'kde'} + pick between 'kde' and 'hist' for + either Kernel Density Estimation or Histogram + plot in the diagonal + marker : str, optional + Matplotlib marker type, default '.' + hist_kwds : other plotting keyword arguments + To be passed to hist function + density_kwds : other plotting keyword arguments + To be passed to kernel density estimate plot + range_padding : float, optional + relative extension of axis range in x and y + with respect to (x_max - x_min) or (y_max - y_min), + default 0.05 + kwds : other plotting keyword arguments + To be passed to scatter function + + Examples + -------- + >>> df = DataFrame(np.random.randn(1000, 4), columns=['A','B','C','D']) + >>> scatter_matrix(df, alpha=0.2) + """ + + df = frame._get_numeric_data() + n = df.columns.size + naxes = n * n + fig, axes = _subplots(naxes=naxes, figsize=figsize, ax=ax, + squeeze=False) + + # no gaps between subplots + fig.subplots_adjust(wspace=0, hspace=0) + + mask = notna(df) + + marker = _get_marker_compat(marker) + + hist_kwds = hist_kwds or {} + density_kwds = density_kwds or {} + + # GH 14855 + kwds.setdefault('edgecolors', 'none') + + boundaries_list = [] + for a in df.columns: + values = df[a].values[mask[a].values] + rmin_, rmax_ = np.min(values), np.max(values) + rdelta_ext = (rmax_ - rmin_) * range_padding / 2. 
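+        # extend the axis range by range_padding / 2 on each side so that
+        # points at the extremes are not drawn on the subplot border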
+ boundaries_list.append((rmin_ - rdelta_ext, rmax_ + rdelta_ext)) + + for i, a in zip(lrange(n), df.columns): + for j, b in zip(lrange(n), df.columns): + ax = axes[i, j] + + if i == j: + values = df[a].values[mask[a].values] + + # Deal with the diagonal by drawing a histogram there. + if diagonal == 'hist': + ax.hist(values, **hist_kwds) + + elif diagonal in ('kde', 'density'): + from scipy.stats import gaussian_kde + y = values + gkde = gaussian_kde(y) + ind = np.linspace(y.min(), y.max(), 1000) + ax.plot(ind, gkde.evaluate(ind), **density_kwds) + + ax.set_xlim(boundaries_list[i]) + + else: + common = (mask[a] & mask[b]).values + + ax.scatter(df[b][common], df[a][common], + marker=marker, alpha=alpha, **kwds) + + ax.set_xlim(boundaries_list[j]) + ax.set_ylim(boundaries_list[i]) + + ax.set_xlabel(b) + ax.set_ylabel(a) + + if j != 0: + ax.yaxis.set_visible(False) + if i != n - 1: + ax.xaxis.set_visible(False) + + if len(df.columns) > 1: + lim1 = boundaries_list[0] + locs = axes[0][1].yaxis.get_majorticklocs() + locs = locs[(lim1[0] <= locs) & (locs <= lim1[1])] + adj = (locs - lim1[0]) / (lim1[1] - lim1[0]) + + lim0 = axes[0][0].get_ylim() + adj = adj * (lim0[1] - lim0[0]) + lim0[0] + axes[0][0].yaxis.set_ticks(adj) + + if np.all(locs == locs.astype(int)): + # if all ticks are int + locs = locs.astype(int) + axes[0][0].yaxis.set_ticklabels(locs) + + _set_ticks_props(axes, xlabelsize=8, xrot=90, ylabelsize=8, yrot=0) + + return axes + + +def _get_marker_compat(marker): + import matplotlib.lines as mlines + import matplotlib as mpl + if mpl.__version__ < '1.1.0' and marker == '.': + return 'o' + if marker not in mlines.lineMarkers: + return 'o' + return marker + + +def radviz(frame, class_column, ax=None, color=None, colormap=None, **kwds): + """RadViz - a multivariate data visualization algorithm + + Parameters: + ----------- + frame: DataFrame + class_column: str + Column name containing class names + ax: Matplotlib axis object, optional + color: list or tuple, optional + Colors to use for the different classes + colormap : str or matplotlib colormap object, default None + Colormap to select colors from. If string, load colormap with that name + from matplotlib. 
+ kwds: keywords + Options to pass to matplotlib scatter plotting method + + Returns: + -------- + ax: Matplotlib axis object + """ + import matplotlib.pyplot as plt + import matplotlib.patches as patches + + def normalize(series): + a = min(series) + b = max(series) + return (series - a) / (b - a) + + n = len(frame) + classes = frame[class_column].drop_duplicates() + class_col = frame[class_column] + df = frame.drop(class_column, axis=1).apply(normalize) + + if ax is None: + ax = plt.gca(xlim=[-1, 1], ylim=[-1, 1]) + + to_plot = {} + colors = _get_standard_colors(num_colors=len(classes), colormap=colormap, + color_type='random', color=color) + + for kls in classes: + to_plot[kls] = [[], []] + + m = len(frame.columns) - 1 + s = np.array([(np.cos(t), np.sin(t)) + for t in [2.0 * np.pi * (i / float(m)) + for i in range(m)]]) + + for i in range(n): + row = df.iloc[i].values + row_ = np.repeat(np.expand_dims(row, axis=1), 2, axis=1) + y = (s * row_).sum(axis=0) / row.sum() + kls = class_col.iat[i] + to_plot[kls][0].append(y[0]) + to_plot[kls][1].append(y[1]) + + for i, kls in enumerate(classes): + ax.scatter(to_plot[kls][0], to_plot[kls][1], color=colors[i], + label=pprint_thing(kls), **kwds) + ax.legend() + + ax.add_patch(patches.Circle((0.0, 0.0), radius=1.0, facecolor='none')) + + for xy, name in zip(s, df.columns): + + ax.add_patch(patches.Circle(xy, radius=0.025, facecolor='gray')) + + if xy[0] < 0.0 and xy[1] < 0.0: + ax.text(xy[0] - 0.025, xy[1] - 0.025, name, + ha='right', va='top', size='small') + elif xy[0] < 0.0 and xy[1] >= 0.0: + ax.text(xy[0] - 0.025, xy[1] + 0.025, name, + ha='right', va='bottom', size='small') + elif xy[0] >= 0.0 and xy[1] < 0.0: + ax.text(xy[0] + 0.025, xy[1] - 0.025, name, + ha='left', va='top', size='small') + elif xy[0] >= 0.0 and xy[1] >= 0.0: + ax.text(xy[0] + 0.025, xy[1] + 0.025, name, + ha='left', va='bottom', size='small') + + ax.axis('equal') + return ax + + +@deprecate_kwarg(old_arg_name='data', new_arg_name='frame') +def andrews_curves(frame, class_column, ax=None, samples=200, color=None, + colormap=None, **kwds): + """ + Generates a matplotlib plot of Andrews curves, for visualising clusters of + multivariate data. + + Andrews curves have the functional form: + + f(t) = x_1/sqrt(2) + x_2 sin(t) + x_3 cos(t) + + x_4 sin(2t) + x_5 cos(2t) + ... + + Where x coefficients correspond to the values of each dimension and t is + linearly spaced between -pi and +pi. Each row of frame then corresponds to + a single curve. + + Parameters: + ----------- + frame : DataFrame + Data to be plotted, preferably normalized to (0.0, 1.0) + class_column : Name of the column containing class names + ax : matplotlib axes object, default None + samples : Number of points to plot in each curve + color: list or tuple, optional + Colors to use for the different classes + colormap : str or matplotlib colormap object, default None + Colormap to select colors from. If string, load colormap with that name + from matplotlib. + kwds: keywords + Options to pass to matplotlib plotting method + + Returns: + -------- + ax: Matplotlib axis object + + """ + from math import sqrt, pi + import matplotlib.pyplot as plt + + def function(amplitudes): + def f(t): + x1 = amplitudes[0] + result = x1 / sqrt(2.0) + + # Take the rest of the coefficients and resize them + # appropriately. Take a copy of amplitudes as otherwise numpy + # deletes the element from amplitudes itself. 
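+            # After the reshape, coeffs[k] holds the (sin, cos) coefficient
+            # pair for harmonic k + 1.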
+ coeffs = np.delete(np.copy(amplitudes), 0) + coeffs.resize(int((coeffs.size + 1) / 2), 2) + + # Generate the harmonics and arguments for the sin and cos + # functions. + harmonics = np.arange(0, coeffs.shape[0]) + 1 + trig_args = np.outer(harmonics, t) + + result += np.sum(coeffs[:, 0, np.newaxis] * np.sin(trig_args) + + coeffs[:, 1, np.newaxis] * np.cos(trig_args), + axis=0) + return result + return f + + n = len(frame) + class_col = frame[class_column] + classes = frame[class_column].drop_duplicates() + df = frame.drop(class_column, axis=1) + t = np.linspace(-pi, pi, samples) + used_legends = set([]) + + color_values = _get_standard_colors(num_colors=len(classes), + colormap=colormap, color_type='random', + color=color) + colors = dict(zip(classes, color_values)) + if ax is None: + ax = plt.gca(xlim=(-pi, pi)) + for i in range(n): + row = df.iloc[i].values + f = function(row) + y = f(t) + kls = class_col.iat[i] + label = pprint_thing(kls) + if label not in used_legends: + used_legends.add(label) + ax.plot(t, y, color=colors[kls], label=label, **kwds) + else: + ax.plot(t, y, color=colors[kls], **kwds) + + ax.legend(loc='upper right') + ax.grid() + return ax + + +def bootstrap_plot(series, fig=None, size=50, samples=500, **kwds): + """Bootstrap plot. + + Parameters: + ----------- + series: Time series + fig: matplotlib figure object, optional + size: number of data points to consider during each sampling + samples: number of times the bootstrap procedure is performed + kwds: optional keyword arguments for plotting commands, must be accepted + by both hist and plot + + Returns: + -------- + fig: matplotlib figure + """ + import random + import matplotlib.pyplot as plt + + # random.sample(ndarray, int) fails on python 3.3, sigh + data = list(series.values) + samplings = [random.sample(data, size) for _ in range(samples)] + + means = np.array([np.mean(sampling) for sampling in samplings]) + medians = np.array([np.median(sampling) for sampling in samplings]) + midranges = np.array([(min(sampling) + max(sampling)) * 0.5 + for sampling in samplings]) + if fig is None: + fig = plt.figure() + x = lrange(samples) + axes = [] + ax1 = fig.add_subplot(2, 3, 1) + ax1.set_xlabel("Sample") + axes.append(ax1) + ax1.plot(x, means, **kwds) + ax2 = fig.add_subplot(2, 3, 2) + ax2.set_xlabel("Sample") + axes.append(ax2) + ax2.plot(x, medians, **kwds) + ax3 = fig.add_subplot(2, 3, 3) + ax3.set_xlabel("Sample") + axes.append(ax3) + ax3.plot(x, midranges, **kwds) + ax4 = fig.add_subplot(2, 3, 4) + ax4.set_xlabel("Mean") + axes.append(ax4) + ax4.hist(means, **kwds) + ax5 = fig.add_subplot(2, 3, 5) + ax5.set_xlabel("Median") + axes.append(ax5) + ax5.hist(medians, **kwds) + ax6 = fig.add_subplot(2, 3, 6) + ax6.set_xlabel("Midrange") + axes.append(ax6) + ax6.hist(midranges, **kwds) + for axis in axes: + plt.setp(axis.get_xticklabels(), fontsize=8) + plt.setp(axis.get_yticklabels(), fontsize=8) + return fig + + +@deprecate_kwarg(old_arg_name='colors', new_arg_name='color') +@deprecate_kwarg(old_arg_name='data', new_arg_name='frame', stacklevel=3) +def parallel_coordinates(frame, class_column, cols=None, ax=None, color=None, + use_columns=False, xticks=None, colormap=None, + axvlines=True, axvlines_kwds=None, sort_labels=False, + **kwds): + """Parallel coordinates plotting. 
+ + Parameters + ---------- + frame: DataFrame + class_column: str + Column name containing class names + cols: list, optional + A list of column names to use + ax: matplotlib.axis, optional + matplotlib axis object + color: list or tuple, optional + Colors to use for the different classes + use_columns: bool, optional + If true, columns will be used as xticks + xticks: list or tuple, optional + A list of values to use for xticks + colormap: str or matplotlib colormap, default None + Colormap to use for line colors. + axvlines: bool, optional + If true, vertical lines will be added at each xtick + axvlines_kwds: keywords, optional + Options to be passed to axvline method for vertical lines + sort_labels: bool, False + Sort class_column labels, useful when assigning colours + + .. versionadded:: 0.20.0 + + kwds: keywords + Options to pass to matplotlib plotting method + + Returns + ------- + ax: matplotlib axis object + + Examples + -------- + >>> from pandas import read_csv + >>> from pandas.tools.plotting import parallel_coordinates + >>> from matplotlib import pyplot as plt + >>> df = read_csv('https://raw.github.com/pandas-dev/pandas/master' + '/pandas/tests/data/iris.csv') + >>> parallel_coordinates(df, 'Name', color=('#556270', + '#4ECDC4', '#C7F464')) + >>> plt.show() + """ + if axvlines_kwds is None: + axvlines_kwds = {'linewidth': 1, 'color': 'black'} + import matplotlib.pyplot as plt + + n = len(frame) + classes = frame[class_column].drop_duplicates() + class_col = frame[class_column] + + if cols is None: + df = frame.drop(class_column, axis=1) + else: + df = frame[cols] + + used_legends = set([]) + + ncols = len(df.columns) + + # determine values to use for xticks + if use_columns is True: + if not np.all(np.isreal(list(df.columns))): + raise ValueError('Columns must be numeric to be used as xticks') + x = df.columns + elif xticks is not None: + if not np.all(np.isreal(xticks)): + raise ValueError('xticks specified must be numeric') + elif len(xticks) != ncols: + raise ValueError('Length of xticks must match number of columns') + x = xticks + else: + x = lrange(ncols) + + if ax is None: + ax = plt.gca() + + color_values = _get_standard_colors(num_colors=len(classes), + colormap=colormap, color_type='random', + color=color) + + if sort_labels: + classes = sorted(classes) + color_values = sorted(color_values) + colors = dict(zip(classes, color_values)) + + for i in range(n): + y = df.iloc[i].values + kls = class_col.iat[i] + label = pprint_thing(kls) + if label not in used_legends: + used_legends.add(label) + ax.plot(x, y, color=colors[kls], label=label, **kwds) + else: + ax.plot(x, y, color=colors[kls], **kwds) + + if axvlines: + for i in x: + ax.axvline(i, **axvlines_kwds) + + ax.set_xticks(x) + ax.set_xticklabels(df.columns) + ax.set_xlim(x[0], x[-1]) + ax.legend(loc='upper right') + ax.grid() + return ax + + +def lag_plot(series, lag=1, ax=None, **kwds): + """Lag plot for time series. 
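+
+    A lag plot draws ``y(t)`` on the x-axis against ``y(t + lag)`` on the
+    y-axis; uncorrelated data scatters without structure, while
+    autocorrelated data forms a visible pattern. A minimal sketch, assuming
+    ``pandas as pd`` and ``numpy as np`` are imported:
+
+    >>> from pandas.plotting import lag_plot
+    >>> lag_plot(pd.Series(np.random.randn(100)))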
+ + Parameters: + ----------- + series: Time series + lag: lag of the scatter plot, default 1 + ax: Matplotlib axis object, optional + kwds: Matplotlib scatter method keyword arguments, optional + + Returns: + -------- + ax: Matplotlib axis object + """ + import matplotlib.pyplot as plt + + # workaround because `c='b'` is hardcoded in matplotlibs scatter method + kwds.setdefault('c', plt.rcParams['patch.facecolor']) + + data = series.values + y1 = data[:-lag] + y2 = data[lag:] + if ax is None: + ax = plt.gca() + ax.set_xlabel("y(t)") + ax.set_ylabel("y(t + %s)" % lag) + ax.scatter(y1, y2, **kwds) + return ax + + +def autocorrelation_plot(series, ax=None, **kwds): + """Autocorrelation plot for time series. + + Parameters: + ----------- + series: Time series + ax: Matplotlib axis object, optional + kwds : keywords + Options to pass to matplotlib plotting method + + Returns: + ----------- + ax: Matplotlib axis object + """ + import matplotlib.pyplot as plt + n = len(series) + data = np.asarray(series) + if ax is None: + ax = plt.gca(xlim=(1, n), ylim=(-1.0, 1.0)) + mean = np.mean(data) + c0 = np.sum((data - mean) ** 2) / float(n) + + def r(h): + return ((data[:n - h] - mean) * + (data[h:] - mean)).sum() / float(n) / c0 + x = np.arange(n) + 1 + y = lmap(r, x) + z95 = 1.959963984540054 + z99 = 2.5758293035489004 + ax.axhline(y=z99 / np.sqrt(n), linestyle='--', color='grey') + ax.axhline(y=z95 / np.sqrt(n), color='grey') + ax.axhline(y=0.0, color='black') + ax.axhline(y=-z95 / np.sqrt(n), color='grey') + ax.axhline(y=-z99 / np.sqrt(n), linestyle='--', color='grey') + ax.set_xlabel("Lag") + ax.set_ylabel("Autocorrelation") + ax.plot(x, y, **kwds) + if 'label' in kwds: + ax.legend() + ax.grid() + return ax diff --git a/pandas/plotting/_style.py b/pandas/plotting/_style.py new file mode 100644 index 0000000000000..8cb4e30e0d91c --- /dev/null +++ b/pandas/plotting/_style.py @@ -0,0 +1,246 @@ +# being a bit too dynamic +# pylint: disable=E1101 +from __future__ import division + +import warnings +from contextlib import contextmanager +import re + +import numpy as np + +from pandas.core.dtypes.common import is_list_like +from pandas.compat import range, lrange, lmap +import pandas.compat as compat +from pandas.plotting._compat import _mpl_ge_2_0_0 + + +# Extracted from https://gist.github.com/huyng/816622 +# this is the rcParams set when setting display.with_mpl_style +# to True. 
+mpl_stylesheet = { + 'axes.axisbelow': True, + 'axes.color_cycle': ['#348ABD', + '#7A68A6', + '#A60628', + '#467821', + '#CF4457', + '#188487', + '#E24A33'], + 'axes.edgecolor': '#bcbcbc', + 'axes.facecolor': '#eeeeee', + 'axes.grid': True, + 'axes.labelcolor': '#555555', + 'axes.labelsize': 'large', + 'axes.linewidth': 1.0, + 'axes.titlesize': 'x-large', + 'figure.edgecolor': 'white', + 'figure.facecolor': 'white', + 'figure.figsize': (6.0, 4.0), + 'figure.subplot.hspace': 0.5, + 'font.family': 'monospace', + 'font.monospace': ['Andale Mono', + 'Nimbus Mono L', + 'Courier New', + 'Courier', + 'Fixed', + 'Terminal', + 'monospace'], + 'font.size': 10, + 'interactive': True, + 'keymap.all_axes': ['a'], + 'keymap.back': ['left', 'c', 'backspace'], + 'keymap.forward': ['right', 'v'], + 'keymap.fullscreen': ['f'], + 'keymap.grid': ['g'], + 'keymap.home': ['h', 'r', 'home'], + 'keymap.pan': ['p'], + 'keymap.save': ['s'], + 'keymap.xscale': ['L', 'k'], + 'keymap.yscale': ['l'], + 'keymap.zoom': ['o'], + 'legend.fancybox': True, + 'lines.antialiased': True, + 'lines.linewidth': 1.0, + 'patch.antialiased': True, + 'patch.edgecolor': '#EEEEEE', + 'patch.facecolor': '#348ABD', + 'patch.linewidth': 0.5, + 'toolbar': 'toolbar2', + 'xtick.color': '#555555', + 'xtick.direction': 'in', + 'xtick.major.pad': 6.0, + 'xtick.major.size': 0.0, + 'xtick.minor.pad': 6.0, + 'xtick.minor.size': 0.0, + 'ytick.color': '#555555', + 'ytick.direction': 'in', + 'ytick.major.pad': 6.0, + 'ytick.major.size': 0.0, + 'ytick.minor.pad': 6.0, + 'ytick.minor.size': 0.0 +} + + +def _get_standard_colors(num_colors=None, colormap=None, color_type='default', + color=None): + import matplotlib.pyplot as plt + + if color is None and colormap is not None: + if isinstance(colormap, compat.string_types): + import matplotlib.cm as cm + cmap = colormap + colormap = cm.get_cmap(colormap) + if colormap is None: + raise ValueError("Colormap {0} is not recognized".format(cmap)) + colors = lmap(colormap, np.linspace(0, 1, num=num_colors)) + elif color is not None: + if colormap is not None: + warnings.warn("'color' and 'colormap' cannot be used " + "simultaneously. 
Using 'color'") + colors = list(color) if is_list_like(color) else color + else: + if color_type == 'default': + # need to call list() on the result to copy so we don't + # modify the global rcParams below + try: + colors = [c['color'] + for c in list(plt.rcParams['axes.prop_cycle'])] + except KeyError: + colors = list(plt.rcParams.get('axes.color_cycle', + list('bgrcmyk'))) + if isinstance(colors, compat.string_types): + colors = list(colors) + elif color_type == 'random': + import random + + def random_color(column): + random.seed(column) + return [random.random() for _ in range(3)] + + colors = lmap(random_color, lrange(num_colors)) + else: + raise ValueError("color_type must be either 'default' or 'random'") + + if isinstance(colors, compat.string_types): + import matplotlib.colors + conv = matplotlib.colors.ColorConverter() + + def _maybe_valid_colors(colors): + try: + [conv.to_rgba(c) for c in colors] + return True + except ValueError: + return False + + # check whether the string can be convertable to single color + maybe_single_color = _maybe_valid_colors([colors]) + # check whether each character can be convertable to colors + maybe_color_cycle = _maybe_valid_colors(list(colors)) + if maybe_single_color and maybe_color_cycle and len(colors) > 1: + # Special case for single str 'CN' match and convert to hex + # for supporting matplotlib < 2.0.0 + if re.match(r'\AC[0-9]\Z', colors) and _mpl_ge_2_0_0(): + hex_color = [c['color'] + for c in list(plt.rcParams['axes.prop_cycle'])] + colors = [hex_color[int(colors[1])]] + else: + # this may no longer be required + msg = ("'{0}' can be parsed as both single color and " + "color cycle. Specify each color using a list " + "like ['{0}'] or {1}") + raise ValueError(msg.format(colors, list(colors))) + elif maybe_single_color: + colors = [colors] + else: + # ``colors`` is regarded as color cycle. + # mpl will raise error any of them is invalid + pass + + if len(colors) != num_colors: + try: + multiple = num_colors // len(colors) - 1 + except ZeroDivisionError: + raise ValueError("Invalid color argument: ''") + mod = num_colors % len(colors) + + colors += multiple * colors + colors += colors[:mod] + + return colors + + +class _Options(dict): + """ + Stores pandas plotting options. + Allows for parameter aliasing so you can just use parameter names that are + the same as the plot function parameters, but is stored in a canonical + format that makes it easy to breakdown into groups later + """ + + # alias so the names are same as plotting method parameter names + _ALIASES = {'x_compat': 'xaxis.compat'} + _DEFAULT_KEYS = ['xaxis.compat'] + + def __init__(self, deprecated=False): + self._deprecated = deprecated + # self['xaxis.compat'] = False + super(_Options, self).__setitem__('xaxis.compat', False) + + def _warn_if_deprecated(self): + if self._deprecated: + warnings.warn("'pandas.plot_params' is deprecated. 
Use " + "'pandas.plotting.plot_params' instead", + FutureWarning, stacklevel=3) + + def __getitem__(self, key): + self._warn_if_deprecated() + key = self._get_canonical_key(key) + if key not in self: + raise ValueError('%s is not a valid pandas plotting option' % key) + return super(_Options, self).__getitem__(key) + + def __setitem__(self, key, value): + self._warn_if_deprecated() + key = self._get_canonical_key(key) + return super(_Options, self).__setitem__(key, value) + + def __delitem__(self, key): + key = self._get_canonical_key(key) + if key in self._DEFAULT_KEYS: + raise ValueError('Cannot remove default parameter %s' % key) + return super(_Options, self).__delitem__(key) + + def __contains__(self, key): + key = self._get_canonical_key(key) + return super(_Options, self).__contains__(key) + + def reset(self): + """ + Reset the option store to its initial state + + Returns + ------- + None + """ + self._warn_if_deprecated() + self.__init__() + + def _get_canonical_key(self, key): + return self._ALIASES.get(key, key) + + @contextmanager + def use(self, key, value): + """ + Temporarily set a parameter value using the with statement. + Aliasing allowed. + """ + self._warn_if_deprecated() + old_value = self[key] + try: + self[key] = value + yield self + finally: + self[key] = old_value + + +plot_params = _Options() diff --git a/pandas/plotting/_timeseries.py b/pandas/plotting/_timeseries.py new file mode 100644 index 0000000000000..3d04973ed0009 --- /dev/null +++ b/pandas/plotting/_timeseries.py @@ -0,0 +1,339 @@ +# TODO: Use the fact that axis can have units to simplify the process + +import numpy as np + +from matplotlib import pylab +from pandas.core.indexes.period import Period +from pandas.tseries.offsets import DateOffset +import pandas.tseries.frequencies as frequencies +from pandas.core.indexes.datetimes import DatetimeIndex +from pandas.core.indexes.period import PeriodIndex +from pandas.core.indexes.timedeltas import TimedeltaIndex +from pandas.io.formats.printing import pprint_thing +import pandas.compat as compat + +from pandas.plotting._converter import (TimeSeries_DateLocator, + TimeSeries_DateFormatter, + TimeSeries_TimedeltaFormatter) + +# --------------------------------------------------------------------- +# Plotting functions and monkey patches + + +def tsplot(series, plotf, ax=None, **kwargs): + """ + Plots a Series on the given Matplotlib axes or the current axes + + Parameters + ---------- + axes : Axes + series : Series + + Notes + _____ + Supports same kwargs as Axes.plot + + """ + # Used inferred freq is possible, need a test case for inferred + if ax is None: + import matplotlib.pyplot as plt + ax = plt.gca() + + freq, series = _maybe_resample(series, ax, kwargs) + + # Set ax with freq info + _decorate_axes(ax, freq, kwargs) + ax._plot_data.append((series, plotf, kwargs)) + lines = plotf(ax, series.index._mpl_repr(), series.values, **kwargs) + + # set date formatter, locators and rescale limits + format_dateaxis(ax, ax.freq, series.index) + return lines + + +def _maybe_resample(series, ax, kwargs): + # resample against axes freq if necessary + freq, ax_freq = _get_freq(ax, series) + + if freq is None: # pragma: no cover + raise ValueError('Cannot use dynamic axis without frequency info') + + # Convert DatetimeIndex to PeriodIndex + if isinstance(series.index, DatetimeIndex): + series = series.to_period(freq=freq) + + if ax_freq is not None and freq != ax_freq: + if frequencies.is_superperiod(freq, ax_freq): # upsample input + series = series.copy() + 
series.index = series.index.asfreq(ax_freq, how='s') + freq = ax_freq + elif _is_sup(freq, ax_freq): # one is weekly + how = kwargs.pop('how', 'last') + series = getattr(series.resample('D'), how)().dropna() + series = getattr(series.resample(ax_freq), how)().dropna() + freq = ax_freq + elif frequencies.is_subperiod(freq, ax_freq) or _is_sub(freq, ax_freq): + _upsample_others(ax, freq, kwargs) + ax_freq = freq + else: # pragma: no cover + raise ValueError('Incompatible frequency conversion') + return freq, series + + +def _is_sub(f1, f2): + return ((f1.startswith('W') and frequencies.is_subperiod('D', f2)) or + (f2.startswith('W') and frequencies.is_subperiod(f1, 'D'))) + + +def _is_sup(f1, f2): + return ((f1.startswith('W') and frequencies.is_superperiod('D', f2)) or + (f2.startswith('W') and frequencies.is_superperiod(f1, 'D'))) + + +def _upsample_others(ax, freq, kwargs): + legend = ax.get_legend() + lines, labels = _replot_ax(ax, freq, kwargs) + _replot_ax(ax, freq, kwargs) + + other_ax = None + if hasattr(ax, 'left_ax'): + other_ax = ax.left_ax + if hasattr(ax, 'right_ax'): + other_ax = ax.right_ax + + if other_ax is not None: + rlines, rlabels = _replot_ax(other_ax, freq, kwargs) + lines.extend(rlines) + labels.extend(rlabels) + + if (legend is not None and kwargs.get('legend', True) and + len(lines) > 0): + title = legend.get_title().get_text() + if title == 'None': + title = None + ax.legend(lines, labels, loc='best', title=title) + + +def _replot_ax(ax, freq, kwargs): + data = getattr(ax, '_plot_data', None) + + # clear current axes and data + ax._plot_data = [] + ax.clear() + + _decorate_axes(ax, freq, kwargs) + + lines = [] + labels = [] + if data is not None: + for series, plotf, kwds in data: + series = series.copy() + idx = series.index.asfreq(freq, how='S') + series.index = idx + ax._plot_data.append((series, plotf, kwds)) + + # for tsplot + if isinstance(plotf, compat.string_types): + from pandas.plotting._core import _plot_klass + plotf = _plot_klass[plotf]._plot + + lines.append(plotf(ax, series.index._mpl_repr(), + series.values, **kwds)[0]) + labels.append(pprint_thing(series.name)) + + return lines, labels + + +def _decorate_axes(ax, freq, kwargs): + """Initialize axes for time-series plotting""" + if not hasattr(ax, '_plot_data'): + ax._plot_data = [] + + ax.freq = freq + xaxis = ax.get_xaxis() + xaxis.freq = freq + if not hasattr(ax, 'legendlabels'): + ax.legendlabels = [kwargs.get('label', None)] + else: + ax.legendlabels.append(kwargs.get('label', None)) + ax.view_interval = None + ax.date_axis_info = None + + +def _get_ax_freq(ax): + """ + Get the freq attribute of the ax object if set. 
+ Also checks shared axes (eg when using secondary yaxis, sharex=True + or twinx) + """ + ax_freq = getattr(ax, 'freq', None) + if ax_freq is None: + # check for left/right ax in case of secondary yaxis + if hasattr(ax, 'left_ax'): + ax_freq = getattr(ax.left_ax, 'freq', None) + elif hasattr(ax, 'right_ax'): + ax_freq = getattr(ax.right_ax, 'freq', None) + if ax_freq is None: + # check if a shared ax (sharex/twinx) has already freq set + shared_axes = ax.get_shared_x_axes().get_siblings(ax) + if len(shared_axes) > 1: + for shared_ax in shared_axes: + ax_freq = getattr(shared_ax, 'freq', None) + if ax_freq is not None: + break + return ax_freq + + +def _get_freq(ax, series): + # get frequency from data + freq = getattr(series.index, 'freq', None) + if freq is None: + freq = getattr(series.index, 'inferred_freq', None) + + ax_freq = _get_ax_freq(ax) + + # use axes freq if no data freq + if freq is None: + freq = ax_freq + + # get the period frequency + if isinstance(freq, DateOffset): + freq = freq.rule_code + else: + freq = frequencies.get_base_alias(freq) + + freq = frequencies.get_period_alias(freq) + return freq, ax_freq + + +def _use_dynamic_x(ax, data): + freq = _get_index_freq(data) + ax_freq = _get_ax_freq(ax) + + if freq is None: # convert irregular if axes has freq info + freq = ax_freq + else: # do not use tsplot if irregular was plotted first + if (ax_freq is None) and (len(ax.get_lines()) > 0): + return False + + if freq is None: + return False + + if isinstance(freq, DateOffset): + freq = freq.rule_code + else: + freq = frequencies.get_base_alias(freq) + freq = frequencies.get_period_alias(freq) + + if freq is None: + return False + + # hack this for 0.10.1, creating more technical debt...sigh + if isinstance(data.index, DatetimeIndex): + base = frequencies.get_freq(freq) + x = data.index + if (base <= frequencies.FreqGroup.FR_DAY): + return x[:1].is_normalized + return Period(x[0], freq).to_timestamp(tz=x.tz) == x[0] + return True + + +def _get_index_freq(data): + freq = getattr(data.index, 'freq', None) + if freq is None: + freq = getattr(data.index, 'inferred_freq', None) + if freq == 'B': + weekdays = np.unique(data.index.dayofweek) + if (5 in weekdays) or (6 in weekdays): + freq = None + return freq + + +def _maybe_convert_index(ax, data): + # tsplot converts automatically, but don't want to convert index + # over and over for DataFrames + if isinstance(data.index, DatetimeIndex): + freq = getattr(data.index, 'freq', None) + + if freq is None: + freq = getattr(data.index, 'inferred_freq', None) + if isinstance(freq, DateOffset): + freq = freq.rule_code + + if freq is None: + freq = _get_ax_freq(ax) + + if freq is None: + raise ValueError('Could not get frequency alias for plotting') + + freq = frequencies.get_base_alias(freq) + freq = frequencies.get_period_alias(freq) + + data = data.to_period(freq=freq) + return data + + +# Patch methods for subplot. Only format_dateaxis is currently used. +# Do we need the rest for convenience? + +def format_timedelta_ticks(x, pos, n_decimals): + """ + Convert seconds to 'D days HH:MM:SS.F' + """ + s, ns = divmod(x, 1e9) + m, s = divmod(s, 60) + h, m = divmod(m, 60) + d, h = divmod(h, 24) + decimals = int(ns * 10**(n_decimals - 9)) + s = r'{:02d}:{:02d}:{:02d}'.format(int(h), int(m), int(s)) + if n_decimals > 0: + s += '.{{:0{:0d}d}}'.format(n_decimals).format(decimals) + if d != 0: + s = '{:d} days '.format(int(d)) + s + return s + + +def format_dateaxis(subplot, freq, index): + """ + Pretty-formats the date axis (x-axis). 
+ + Major and minor ticks are automatically set for the frequency of the + current underlying series. As the dynamic mode is activated by + default, changing the limits of the x axis will intelligently change + the positions of the ticks. + """ + + # handle index specific formatting + # Note: DatetimeIndex does not use this + # interface. DatetimeIndex uses matplotlib.date directly + if isinstance(index, PeriodIndex): + + majlocator = TimeSeries_DateLocator(freq, dynamic_mode=True, + minor_locator=False, + plot_obj=subplot) + minlocator = TimeSeries_DateLocator(freq, dynamic_mode=True, + minor_locator=True, + plot_obj=subplot) + subplot.xaxis.set_major_locator(majlocator) + subplot.xaxis.set_minor_locator(minlocator) + + majformatter = TimeSeries_DateFormatter(freq, dynamic_mode=True, + minor_locator=False, + plot_obj=subplot) + minformatter = TimeSeries_DateFormatter(freq, dynamic_mode=True, + minor_locator=True, + plot_obj=subplot) + subplot.xaxis.set_major_formatter(majformatter) + subplot.xaxis.set_minor_formatter(minformatter) + + # x and y coord info + subplot.format_coord = lambda t, y: ( + "t = {0} y = {1:8f}".format(Period(ordinal=int(t), freq=freq), y)) + + elif isinstance(index, TimedeltaIndex): + subplot.xaxis.set_major_formatter( + TimeSeries_TimedeltaFormatter()) + else: + raise TypeError('index type not supported') + + pylab.draw_if_interactive() diff --git a/pandas/plotting/_tools.py b/pandas/plotting/_tools.py new file mode 100644 index 0000000000000..389e238ccb96e --- /dev/null +++ b/pandas/plotting/_tools.py @@ -0,0 +1,382 @@ +# being a bit too dynamic +# pylint: disable=E1101 +from __future__ import division + +import warnings +from math import ceil + +import numpy as np + +from pandas.core.dtypes.common import is_list_like +from pandas.core.dtypes.generic import ABCSeries +from pandas.core.index import Index +from pandas.compat import range + + +def format_date_labels(ax, rot): + # mini version of autofmt_xdate + try: + for label in ax.get_xticklabels(): + label.set_ha('right') + label.set_rotation(rot) + fig = ax.get_figure() + fig.subplots_adjust(bottom=0.2) + except Exception: # pragma: no cover + pass + + +def table(ax, data, rowLabels=None, colLabels=None, **kwargs): + """ + Helper function to convert DataFrame and Series to matplotlib.table + + Parameters + ---------- + `ax`: Matplotlib axes object + `data`: DataFrame or Series + data for table contents + `kwargs`: keywords, optional + keyword arguments which passed to matplotlib.table.table. + If `rowLabels` or `colLabels` is not specified, data index or column + name will be used. 
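+
+        For example, a hypothetical call, assuming ``ax`` is a matplotlib
+        Axes and ``df`` is a small DataFrame::
+
+            table(ax, df, loc='bottom')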
+
+
+def _get_layout(nplots, layout=None, layout_type='box'):
+    if layout is not None:
+        if not isinstance(layout, (tuple, list)) or len(layout) != 2:
+            raise ValueError('Layout must be a tuple of (rows, columns)')
+
+        nrows, ncols = layout
+
+        # Python 2 compat
+        ceil_ = lambda x: int(ceil(x))
+        if nrows == -1 and ncols > 0:
+            layout = nrows, ncols = (ceil_(float(nplots) / ncols), ncols)
+        elif ncols == -1 and nrows > 0:
+            layout = nrows, ncols = (nrows, ceil_(float(nplots) / nrows))
+        elif ncols <= 0 and nrows <= 0:
+            msg = "At least one dimension of layout must be positive"
+            raise ValueError(msg)
+
+        if nrows * ncols < nplots:
+            raise ValueError('Layout of %sx%s must be larger than '
+                             'required size %s' % (nrows, ncols, nplots))
+
+        return layout
+
+    if layout_type == 'single':
+        return (1, 1)
+    elif layout_type == 'horizontal':
+        return (1, nplots)
+    elif layout_type == 'vertical':
+        return (nplots, 1)
+
+    layouts = {1: (1, 1), 2: (1, 2), 3: (2, 2), 4: (2, 2)}
+    try:
+        return layouts[nplots]
+    except KeyError:
+        k = 1
+        while k ** 2 < nplots:
+            k += 1
+
+        if (k - 1) * k >= nplots:
+            return k, (k - 1)
+        else:
+            return k, k
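A worked illustration of the layout rules above (hypothetical direct calls; `_get_layout` is private to `pandas.plotting`):

```python
from pandas.plotting._tools import _get_layout  # private helper

_get_layout(3)                          # (2, 2): from the small lookup table
_get_layout(5, layout_type='vertical')  # (5, 1): a single column
_get_layout(5, layout=(-1, 2))          # (3, 2): -1 is inferred from nplots
_get_layout(5)                          # (3, 2): k x (k - 1) fits 5 plots
_get_layout(7)                          # (3, 3): a 3x2 grid would be too small
```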
+
+# copied from matplotlib/pyplot.py and modified for pandas.plotting
+
+
+def _subplots(naxes=None, sharex=False, sharey=False, squeeze=True,
+              subplot_kw=None, ax=None, layout=None, layout_type='box',
+              **fig_kw):
+    """Create a figure with a set of subplots already made.
+
+    This utility wrapper makes it convenient to create common layouts of
+    subplots, including the enclosing figure object, in a single call.
+
+    Keyword arguments:
+
+    naxes : int
+        Number of required axes. Axes in excess of this number are made
+        invisible. Default is nrows * ncols.
+
+    sharex : bool
+        If True, the X axis will be shared amongst all subplots.
+
+    sharey : bool
+        If True, the Y axis will be shared amongst all subplots.
+
+    squeeze : bool
+        If True, extra dimensions are squeezed out from the returned axis
+        object:
+        - if only one subplot is constructed (nrows=ncols=1), the resulting
+          single Axis object is returned as a scalar.
+        - for Nx1 or 1xN subplots, the returned object is a 1-d numpy object
+          array of Axis objects.
+        - NxM subplots with N>1 and M>1 are returned as a 2-d object array.
+
+        If False, no squeezing at all is done: the returned axis object is
+        always a 2-d array containing Axis instances, even if it ends up
+        being 1x1.
+
+    subplot_kw : dict
+        Dict with keywords passed to the add_subplot() call used to create
+        each subplot.
+
+    ax : Matplotlib axis object, optional
+
+    layout : tuple
+        Number of rows and columns of the subplot grid.
+        If not specified, calculated from naxes and layout_type.
+
+    layout_type : {'box', 'horizontal', 'vertical'}, default 'box'
+        Specify how to lay out the subplot grid.
+
+    fig_kw : Other keyword arguments to be passed to the figure() call.
+        Note that all keywords not recognized above will be
+        automatically included here.
+
+    Returns:
+
+    fig, ax : tuple
+    - fig is the Matplotlib Figure object
+    - ax can be either a single axis object or an array of axis objects if
+      more than one subplot was created. The dimensions of the resulting
+      array can be controlled with the squeeze keyword, see above.
+
+    Examples:
+
+        x = np.linspace(0, 2*np.pi, 400)
+        y = np.sin(x**2)
+
+        # Just a figure and one subplot
+        f, ax = plt.subplots()
+        ax.plot(x, y)
+        ax.set_title('Simple plot')
+
+        # Two subplots, unpack the output array immediately
+        f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
+        ax1.plot(x, y)
+        ax1.set_title('Sharing Y axis')
+        ax2.scatter(x, y)
+
+        # Four polar axes
+        plt.subplots(2, 2, subplot_kw=dict(polar=True))
+    """
+    import matplotlib.pyplot as plt
+
+    if subplot_kw is None:
+        subplot_kw = {}
+
+    if ax is None:
+        fig = plt.figure(**fig_kw)
+    else:
+        if is_list_like(ax):
+            ax = _flatten(ax)
+            if layout is not None:
+                warnings.warn("When passing multiple axes, layout keyword is "
+                              "ignored", UserWarning)
+            if sharex or sharey:
+                warnings.warn("When passing multiple axes, sharex and sharey "
+                              "are ignored. These settings must be specified "
+                              "when creating axes", UserWarning,
+                              stacklevel=4)
+            if len(ax) == naxes:
+                fig = ax[0].get_figure()
+                return fig, ax
+            else:
+                raise ValueError("The number of passed axes must be {0}, "
+                                 "the same as the number of output "
+                                 "plots".format(naxes))
+
+        fig = ax.get_figure()
+        # if ax is passed and the number of subplots is 1, return ax as it is
+        if naxes == 1:
+            if squeeze:
+                return fig, ax
+            else:
+                return fig, _flatten(ax)
+        else:
+            warnings.warn("To output multiple subplots, the figure containing "
+                          "the passed axes is being cleared", UserWarning,
+                          stacklevel=4)
+            fig.clear()
+
+    nrows, ncols = _get_layout(naxes, layout=layout, layout_type=layout_type)
+    nplots = nrows * ncols
+
+    # Create empty object array to hold all axes. It's easiest to make it
+    # 1-d so we can just append subplots upon creation, and then reshape it
+    # at the end.
+    axarr = np.empty(nplots, dtype=object)
+
+    # Create first subplot separately, so we can share it if requested
+    ax0 = fig.add_subplot(nrows, ncols, 1, **subplot_kw)
+
+    if sharex:
+        subplot_kw['sharex'] = ax0
+    if sharey:
+        subplot_kw['sharey'] = ax0
+    axarr[0] = ax0
+
+    # Note off-by-one counting because add_subplot uses the MATLAB 1-based
+    # convention.
+    for i in range(1, nplots):
+        kwds = subplot_kw.copy()
+        # Set sharex and sharey to None for blank/dummy axes; these can
+        # interfere with proper axis limits on the visible axes if they
+        # share axes (see issue #7528)
+        if i >= naxes:
+            kwds['sharex'] = None
+            kwds['sharey'] = None
+        ax = fig.add_subplot(nrows, ncols, i + 1, **kwds)
+        axarr[i] = ax
+
+    if naxes != nplots:
+        for ax in axarr[naxes:]:
+            ax.set_visible(False)
+
+    _handle_shared_axes(axarr, nplots, naxes, nrows, ncols, sharex, sharey)
+
+    if squeeze:
+        # Reshape the array to have the final desired dimension (nrow, ncol),
+        # though discarding unneeded dimensions that equal 1. If we only
+        # have one subplot, just return it instead of a 1-element array.
+        if nplots == 1:
+            axes = axarr[0]
+        else:
+            axes = axarr.reshape(nrows, ncols).squeeze()
+    else:
+        # returned axis array will always be 2-d, even if nrows=ncols=1
+        axes = axarr.reshape(nrows, ncols)
+
+    return fig, axes
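A rough behavioural sketch of the wrapper above (again hypothetical, since `_subplots` is private; end users reach it through `DataFrame.plot(subplots=True)`):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, for the sketch only

from pandas.plotting._tools import _subplots  # private helper

# the default 'box' layout rounds 3 requested axes up to a 2x2 grid
fig, axes = _subplots(naxes=3, sharex=True)
assert axes.shape == (2, 2)

# the surplus fourth subplot is created but hidden
assert [ax.get_visible() for ax in axes.ravel()] == [True, True, True, False]
```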
+
+
+def _remove_labels_from_axis(axis):
+    for t in axis.get_majorticklabels():
+        t.set_visible(False)
+
+    try:
+        # set_visible will not be effective if
+        # minor axis has NullLocator and NullFormatter (default)
+        import matplotlib.ticker as ticker
+        if isinstance(axis.get_minor_locator(), ticker.NullLocator):
+            axis.set_minor_locator(ticker.AutoLocator())
+        if isinstance(axis.get_minor_formatter(), ticker.NullFormatter):
+            axis.set_minor_formatter(ticker.FormatStrFormatter(''))
+        for t in axis.get_minorticklabels():
+            t.set_visible(False)
+    except Exception:  # pragma: no cover
+        raise
+    axis.get_label().set_visible(False)
+
+
+def _handle_shared_axes(axarr, nplots, naxes, nrows, ncols, sharex, sharey):
+    if nplots > 1:
+
+        if nrows > 1:
+            try:
+                # first find out the ax layout,
+                # so that we can correctly handle 'gaps'
+                layout = np.zeros((nrows + 1, ncols + 1), dtype=np.bool)
+                for ax in axarr:
+                    layout[ax.rowNum, ax.colNum] = ax.get_visible()
+
+                for ax in axarr:
+                    # only the last row of subplots should get x labels ->
+                    # all others off. The layout handles the case that the
+                    # subplot is the last in its column, because below it
+                    # there is no subplot/gap.
+                    if not layout[ax.rowNum + 1, ax.colNum]:
+                        continue
+                    if sharex or len(ax.get_shared_x_axes()
+                                     .get_siblings(ax)) > 1:
+                        _remove_labels_from_axis(ax.xaxis)
+
+            except IndexError:
+                # if gridspec is used, ax.rowNum and ax.colNum may differ
+                # from the layout shape. in this case, use last_row logic
+                for ax in axarr:
+                    if ax.is_last_row():
+                        continue
+                    if sharex or len(ax.get_shared_x_axes()
+                                     .get_siblings(ax)) > 1:
+                        _remove_labels_from_axis(ax.xaxis)
+
+        if ncols > 1:
+            for ax in axarr:
+                # only the first column should get y labels -> set all
+                # others off. As we only have labels in the first column
+                # and we always have a subplot there, we can skip the
+                # layout test.
+                if ax.is_first_col():
+                    continue
+                if sharey or len(ax.get_shared_y_axes().get_siblings(ax)) > 1:
+                    _remove_labels_from_axis(ax.yaxis)
+
+
+def _flatten(axes):
+    if not is_list_like(axes):
+        return np.array([axes])
+    elif isinstance(axes, (np.ndarray, Index)):
+        return axes.ravel()
+    return np.array(axes)
+
+
+def _get_all_lines(ax):
+    lines = ax.get_lines()
+
+    if hasattr(ax, 'right_ax'):
+        lines += ax.right_ax.get_lines()
+
+    if hasattr(ax, 'left_ax'):
+        lines += ax.left_ax.get_lines()
+
+    return lines
+
+
+def _get_xlim(lines):
+    left, right = np.inf, -np.inf
+    for l in lines:
+        x = l.get_xdata(orig=False)
+        left = min(x[0], left)
+        right = max(x[-1], right)
+    return left, right
+
+
+def _set_ticks_props(axes, xlabelsize=None, xrot=None,
+                     ylabelsize=None, yrot=None):
+    import matplotlib.pyplot as plt
+
+    for ax in _flatten(axes):
+        if xlabelsize is not None:
+            plt.setp(ax.get_xticklabels(), fontsize=xlabelsize)
+        if xrot is not None:
+            plt.setp(ax.get_xticklabels(), rotation=xrot)
+        if ylabelsize is not None:
+            plt.setp(ax.get_yticklabels(), fontsize=ylabelsize)
+        if yrot is not None:
+            plt.setp(ax.get_yticklabels(), rotation=yrot)
+    return axes
diff --git a/pandas/sparse/api.py b/pandas/sparse/api.py
deleted file mode 100644
index 55841fbeffa2d..0000000000000
--- a/pandas/sparse/api.py
+++ /dev/null
@@ -1,6 +0,0 @@
-# pylint: disable=W0611
-# flake8: noqa
-from pandas.sparse.array import SparseArray
-from pandas.sparse.list import SparseList
-from 
pandas.sparse.series import SparseSeries, SparseTimeSeries -from pandas.sparse.frame import SparseDataFrame diff --git a/pandas/src/datetime_helper.h b/pandas/src/datetime_helper.h deleted file mode 100644 index bef4b4266c824..0000000000000 --- a/pandas/src/datetime_helper.h +++ /dev/null @@ -1,36 +0,0 @@ -/* -Copyright (c) 2016, PyData Development Team -All rights reserved. - -Distributed under the terms of the BSD Simplified License. - -The full license is in the LICENSE file, distributed with this software. -*/ - -#ifndef PANDAS_SRC_DATETIME_HELPER_H_ -#define PANDAS_SRC_DATETIME_HELPER_H_ - -#include -#include "datetime.h" -#include "numpy/arrayobject.h" -#include "numpy/arrayscalars.h" - -npy_int64 get_long_attr(PyObject *o, const char *attr) { - npy_int64 long_val; - PyObject *value = PyObject_GetAttrString(o, attr); - long_val = (PyLong_Check(value) ? - PyLong_AsLongLong(value) : PyInt_AS_LONG(value)); - Py_DECREF(value); - return long_val; -} - -npy_float64 total_seconds(PyObject *td) { - // Python 2.6 compat - npy_int64 microseconds = get_long_attr(td, "microseconds"); - npy_int64 seconds = get_long_attr(td, "seconds"); - npy_int64 days = get_long_attr(td, "days"); - npy_int64 days_in_seconds = days * 24LL * 3600LL; - return (microseconds + (seconds + days_in_seconds) * 1000000.0) / 1000000.0; -} - -#endif // PANDAS_SRC_DATETIME_HELPER_H_ diff --git a/pandas/stats/api.py b/pandas/stats/api.py index fd81b875faa91..2a11456d4f9e5 100644 --- a/pandas/stats/api.py +++ b/pandas/stats/api.py @@ -2,10 +2,6 @@ Common namespace of statistical functions """ -# pylint: disable-msg=W0611,W0614,W0401 - # flake8: noqa from pandas.stats.moments import * -from pandas.stats.interface import ols -from pandas.stats.fama_macbeth import fama_macbeth diff --git a/pandas/stats/common.py b/pandas/stats/common.py deleted file mode 100644 index be3b842e93cc8..0000000000000 --- a/pandas/stats/common.py +++ /dev/null @@ -1,45 +0,0 @@ - -_WINDOW_TYPES = { - 0: 'full_sample', - 1: 'rolling', - 2: 'expanding' -} -# also allow 'rolling' as key -_WINDOW_TYPES.update((v, v) for k, v in list(_WINDOW_TYPES.items())) -_ADDITIONAL_CLUSTER_TYPES = set(("entity", "time")) - - -def _get_cluster_type(cluster_type): - # this was previous behavior - if cluster_type is None: - return cluster_type - try: - return _get_window_type(cluster_type) - except ValueError: - final_type = str(cluster_type).lower().replace("_", " ") - if final_type in _ADDITIONAL_CLUSTER_TYPES: - return final_type - raise ValueError('Unrecognized cluster type: %s' % cluster_type) - - -def _get_window_type(window_type): - # e.g., 0, 1, 2 - final_type = _WINDOW_TYPES.get(window_type) - # e.g., 'full_sample' - final_type = final_type or _WINDOW_TYPES.get( - str(window_type).lower().replace(" ", "_")) - if final_type is None: - raise ValueError('Unrecognized window type: %s' % window_type) - return final_type - - -def banner(text, width=80): - """ - - """ - toFill = width - len(text) - - left = toFill // 2 - right = toFill - left - - return '%s%s%s' % ('-' * left, text, '-' * right) diff --git a/pandas/stats/fama_macbeth.py b/pandas/stats/fama_macbeth.py deleted file mode 100644 index f7d50e8e72a5c..0000000000000 --- a/pandas/stats/fama_macbeth.py +++ /dev/null @@ -1,240 +0,0 @@ -from pandas.core.base import StringMixin -from pandas.compat import StringIO, range - -import numpy as np - -from pandas.core.api import Series, DataFrame -import pandas.stats.common as common -from pandas.util.decorators import cache_readonly - -# flake8: noqa - -def 
fama_macbeth(**kwargs): - """Runs Fama-MacBeth regression. - - Parameters - ---------- - Takes the same arguments as a panel OLS, in addition to: - - nw_lags_beta: int - Newey-West adjusts the betas by the given lags - """ - window_type = kwargs.get('window_type') - if window_type is None: - klass = FamaMacBeth - else: - klass = MovingFamaMacBeth - - return klass(**kwargs) - - -class FamaMacBeth(StringMixin): - - def __init__(self, y, x, intercept=True, nw_lags=None, - nw_lags_beta=None, - entity_effects=False, time_effects=False, x_effects=None, - cluster=None, dropped_dummies=None, verbose=False): - import warnings - warnings.warn("The pandas.stats.fama_macbeth module is deprecated and will be " - "removed in a future version. We refer to external packages " - "like statsmodels, see here: " - "http://www.statsmodels.org/stable/index.html", - FutureWarning, stacklevel=4) - - if dropped_dummies is None: - dropped_dummies = {} - self._nw_lags_beta = nw_lags_beta - - from pandas.stats.plm import MovingPanelOLS - self._ols_result = MovingPanelOLS( - y=y, x=x, window_type='rolling', window=1, - intercept=intercept, - nw_lags=nw_lags, entity_effects=entity_effects, - time_effects=time_effects, x_effects=x_effects, cluster=cluster, - dropped_dummies=dropped_dummies, verbose=verbose) - - self._cols = self._ols_result._x.columns - - @cache_readonly - def _beta_raw(self): - return self._ols_result._beta_raw - - @cache_readonly - def _stats(self): - return _calc_t_stat(self._beta_raw, self._nw_lags_beta) - - @cache_readonly - def _mean_beta_raw(self): - return self._stats[0] - - @cache_readonly - def _std_beta_raw(self): - return self._stats[1] - - @cache_readonly - def _t_stat_raw(self): - return self._stats[2] - - def _make_result(self, result): - return Series(result, index=self._cols) - - @cache_readonly - def mean_beta(self): - return self._make_result(self._mean_beta_raw) - - @cache_readonly - def std_beta(self): - return self._make_result(self._std_beta_raw) - - @cache_readonly - def t_stat(self): - return self._make_result(self._t_stat_raw) - - @cache_readonly - def _results(self): - return { - 'mean_beta': self._mean_beta_raw, - 'std_beta': self._std_beta_raw, - 't_stat': self._t_stat_raw, - } - - @cache_readonly - def _coef_table(self): - buffer = StringIO() - buffer.write('%13s %13s %13s %13s %13s %13s\n' % - ('Variable', 'Beta', 'Std Err', 't-stat', 'CI 2.5%', 'CI 97.5%')) - template = '%13s %13.4f %13.4f %13.2f %13.4f %13.4f\n' - - for i, name in enumerate(self._cols): - if i and not (i % 5): - buffer.write('\n' + common.banner('')) - - mean_beta = self._results['mean_beta'][i] - std_beta = self._results['std_beta'][i] - t_stat = self._results['t_stat'][i] - ci1 = mean_beta - 1.96 * std_beta - ci2 = mean_beta + 1.96 * std_beta - - values = '(%s)' % name, mean_beta, std_beta, t_stat, ci1, ci2 - - buffer.write(template % values) - - if self._nw_lags_beta is not None: - buffer.write('\n') - buffer.write('*** The Std Err, t-stat are Newey-West ' - 'adjusted with Lags %5d\n' % self._nw_lags_beta) - - return buffer.getvalue() - - def __unicode__(self): - return self.summary - - @cache_readonly - def summary(self): - template = """ -----------------------Summary of Fama-MacBeth Analysis------------------------- - -Formula: Y ~ %(formulaRHS)s -# betas : %(nu)3d - -----------------------Summary of Estimated Coefficients------------------------ -%(coefTable)s ---------------------------------End of Summary--------------------------------- -""" - params = { - 'formulaRHS': ' + '.join(self._cols), 
- 'nu': len(self._beta_raw), - 'coefTable': self._coef_table, - } - - return template % params - - -class MovingFamaMacBeth(FamaMacBeth): - - def __init__(self, y, x, window_type='rolling', window=10, - intercept=True, nw_lags=None, nw_lags_beta=None, - entity_effects=False, time_effects=False, x_effects=None, - cluster=None, dropped_dummies=None, verbose=False): - if dropped_dummies is None: - dropped_dummies = {} - self._window_type = common._get_window_type(window_type) - self._window = window - - FamaMacBeth.__init__( - self, y=y, x=x, intercept=intercept, - nw_lags=nw_lags, nw_lags_beta=nw_lags_beta, - entity_effects=entity_effects, time_effects=time_effects, - x_effects=x_effects, cluster=cluster, - dropped_dummies=dropped_dummies, verbose=verbose) - - self._index = self._ols_result._index - self._T = len(self._index) - - @property - def _is_rolling(self): - return self._window_type == 'rolling' - - def _calc_stats(self): - mean_betas = [] - std_betas = [] - t_stats = [] - - # XXX - - mask = self._ols_result._rolling_ols_call[2] - obs_total = mask.astype(int).cumsum() - - start = self._window - 1 - betas = self._beta_raw - for i in range(start, self._T): - if self._is_rolling: - begin = i - start - else: - begin = 0 - - B = betas[max(obs_total[begin] - 1, 0): obs_total[i]] - mean_beta, std_beta, t_stat = _calc_t_stat(B, self._nw_lags_beta) - mean_betas.append(mean_beta) - std_betas.append(std_beta) - t_stats.append(t_stat) - - return np.array([mean_betas, std_betas, t_stats]) - - _stats = cache_readonly(_calc_stats) - - def _make_result(self, result): - return DataFrame(result, index=self._result_index, columns=self._cols) - - @cache_readonly - def _result_index(self): - mask = self._ols_result._rolling_ols_call[2] - # HACK XXX - return self._index[mask.cumsum() >= self._window] - - @cache_readonly - def _results(self): - return { - 'mean_beta': self._mean_beta_raw[-1], - 'std_beta': self._std_beta_raw[-1], - 't_stat': self._t_stat_raw[-1], - } - - -def _calc_t_stat(beta, nw_lags_beta): - N = len(beta) - B = beta - beta.mean(0) - C = np.dot(B.T, B) / N - - if nw_lags_beta is not None: - for i in range(nw_lags_beta + 1): - - cov = np.dot(B[i:].T, B[:(N - i)]) / N - weight = i / (nw_lags_beta + 1) - C += 2 * (1 - weight) * cov - - mean_beta = beta.mean(0) - std_beta = np.sqrt(np.diag(C)) / np.sqrt(N) - t_stat = mean_beta / std_beta - - return mean_beta, std_beta, t_stat diff --git a/pandas/stats/interface.py b/pandas/stats/interface.py deleted file mode 100644 index caf468b4f85fe..0000000000000 --- a/pandas/stats/interface.py +++ /dev/null @@ -1,143 +0,0 @@ -from pandas.core.api import Series, DataFrame, Panel, MultiIndex -from pandas.stats.ols import OLS, MovingOLS -from pandas.stats.plm import PanelOLS, MovingPanelOLS, NonPooledPanelOLS -import pandas.stats.common as common - - -def ols(**kwargs): - """Returns the appropriate OLS object depending on whether you need - simple or panel OLS, and a full-sample or rolling/expanding OLS. 
- - Will be a normal linear regression or a (pooled) panel regression depending - on the type of the inputs: - - y : Series, x : DataFrame -> OLS - y : Series, x : dict of DataFrame -> OLS - y : DataFrame, x : DataFrame -> PanelOLS - y : DataFrame, x : dict of DataFrame/Panel -> PanelOLS - y : Series with MultiIndex, x : Panel/DataFrame + MultiIndex -> PanelOLS - - Parameters - ---------- - y: Series or DataFrame - See above for types - x: Series, DataFrame, dict of Series, dict of DataFrame, Panel - weights : Series or ndarray - The weights are presumed to be (proportional to) the inverse of the - variance of the observations. That is, if the variables are to be - transformed by 1/sqrt(W) you must supply weights = 1/W - intercept: bool - True if you want an intercept. Defaults to True. - nw_lags: None or int - Number of Newey-West lags. Defaults to None. - nw_overlap: bool - Whether there are overlaps in the NW lags. Defaults to False. - window_type: {'full sample', 'rolling', 'expanding'} - 'full sample' by default - window: int - size of window (for rolling/expanding OLS). If window passed and no - explicit window_type, 'rolling" will be used as the window_type - - Panel OLS options: - pool: bool - Whether to run pooled panel regression. Defaults to true. - entity_effects: bool - Whether to account for entity fixed effects. Defaults to false. - time_effects: bool - Whether to account for time fixed effects. Defaults to false. - x_effects: list - List of x's to account for fixed effects. Defaults to none. - dropped_dummies: dict - Key is the name of the variable for the fixed effect. - Value is the value of that variable for which we drop the dummy. - - For entity fixed effects, key equals 'entity'. - - By default, the first dummy is dropped if no dummy is specified. - cluster: {'time', 'entity'} - cluster variances - - Examples - -------- - # Run simple OLS. - result = ols(y=y, x=x) - - # Run rolling simple OLS with window of size 10. - result = ols(y=y, x=x, window_type='rolling', window=10) - print(result.beta) - - result = ols(y=y, x=x, nw_lags=1) - - # Set up LHS and RHS for data across all items - y = A - x = {'B' : B, 'C' : C} - - # Run panel OLS. - result = ols(y=y, x=x) - - # Run expanding panel OLS with window 10 and entity clustering. - result = ols(y=y, x=x, cluster='entity', window_type='expanding', - window=10) - - Returns - ------- - The appropriate OLS object, which allows you to obtain betas and various - statistics, such as std err, t-stat, etc. 
- """ - - if (kwargs.get('cluster') is not None and - kwargs.get('nw_lags') is not None): - raise ValueError( - 'Pandas OLS does not work with Newey-West correction ' - 'and clustering.') - - pool = kwargs.get('pool') - if 'pool' in kwargs: - del kwargs['pool'] - - window_type = kwargs.get('window_type') - window = kwargs.get('window') - - if window_type is None: - if window is None: - window_type = 'full_sample' - else: - window_type = 'rolling' - else: - window_type = common._get_window_type(window_type) - - if window_type != 'full_sample': - kwargs['window_type'] = common._get_window_type(window_type) - - y = kwargs.get('y') - x = kwargs.get('x') - - panel = False - if isinstance(y, DataFrame) or (isinstance(y, Series) and - isinstance(y.index, MultiIndex)): - panel = True - if isinstance(x, Panel): - panel = True - - if window_type == 'full_sample': - for rolling_field in ('window_type', 'window', 'min_periods'): - if rolling_field in kwargs: - del kwargs[rolling_field] - - if panel: - if pool is False: - klass = NonPooledPanelOLS - else: - klass = PanelOLS - else: - klass = OLS - else: - if panel: - if pool is False: - klass = NonPooledPanelOLS - else: - klass = MovingPanelOLS - else: - klass = MovingOLS - - return klass(**kwargs) diff --git a/pandas/stats/math.py b/pandas/stats/math.py deleted file mode 100644 index 505415bebf89e..0000000000000 --- a/pandas/stats/math.py +++ /dev/null @@ -1,130 +0,0 @@ -# pylint: disable-msg=E1103 -# pylint: disable-msg=W0212 - -from __future__ import division - -from pandas.compat import range -import numpy as np -import numpy.linalg as linalg - - -def rank(X, cond=1.0e-12): - """ - Return the rank of a matrix X based on its generalized inverse, - not the SVD. - """ - X = np.asarray(X) - if len(X.shape) == 2: - import scipy.linalg as SL - D = SL.svdvals(X) - result = np.add.reduce(np.greater(D / D.max(), cond)) - return int(result.astype(np.int32)) - else: - return int(not np.alltrue(np.equal(X, 0.))) - - -def solve(a, b): - """Returns the solution of A X = B.""" - try: - return linalg.solve(a, b) - except linalg.LinAlgError: - return np.dot(linalg.pinv(a), b) - - -def inv(a): - """Returns the inverse of A.""" - try: - return np.linalg.inv(a) - except linalg.LinAlgError: - return np.linalg.pinv(a) - - -def is_psd(m): - eigvals = linalg.eigvals(m) - return np.isreal(eigvals).all() and (eigvals >= 0).all() - - -def newey_west(m, max_lags, nobs, df, nw_overlap=False): - """ - Compute Newey-West adjusted covariance matrix, taking into account - specified number of leads / lags - - Parameters - ---------- - m : (N x K) - max_lags : int - nobs : int - Number of observations in model - df : int - Degrees of freedom in explanatory variables - nw_overlap : boolean, default False - Assume data is overlapping - - Returns - ------- - ndarray (K x K) - - Reference - --------- - Newey, W. K. & West, K. D. (1987) A Simple, Positive - Semi-definite, Heteroskedasticity and Autocorrelation Consistent - Covariance Matrix, Econometrica, vol. 55(3), 703-708 - """ - Xeps = np.dot(m.T, m) - for lag in range(1, max_lags + 1): - auto_cov = np.dot(m[:-lag].T, m[lag:]) - weight = lag / (max_lags + 1) - if nw_overlap: - weight = 0 - bb = auto_cov + auto_cov.T - dd = (1 - weight) * bb - Xeps += dd - - Xeps *= nobs / (nobs - df) - - if nw_overlap and not is_psd(Xeps): - new_max_lags = int(np.ceil(max_lags * 1.5)) -# print('nw_overlap is True and newey_west generated a non positive ' -# 'semidefinite matrix, so using newey_west with max_lags of %d.' 
-# % new_max_lags) - return newey_west(m, new_max_lags, nobs, df) - - return Xeps - - -def calc_F(R, r, beta, var_beta, nobs, df): - """ - Computes the standard F-test statistic for linear restriction - hypothesis testing - - Parameters - ---------- - R: ndarray (N x N) - Restriction matrix - r: ndarray (N x 1) - Restriction vector - beta: ndarray (N x 1) - Estimated model coefficients - var_beta: ndarray (N x N) - Variance covariance matrix of regressors - nobs: int - Number of observations in model - df: int - Model degrees of freedom - - Returns - ------- - F value, (q, df_resid), p value - """ - from scipy.stats import f - - hyp = np.dot(R, beta.reshape(len(beta), 1)) - r - RSR = np.dot(R, np.dot(var_beta, R.T)) - - q = len(r) - - F = np.dot(hyp.T, np.dot(inv(RSR), hyp)).squeeze() / q - - p_value = 1 - f.cdf(F, q, nobs - df) - - return F, (q, nobs - df), p_value diff --git a/pandas/stats/misc.py b/pandas/stats/misc.py deleted file mode 100644 index 1a077dcb6f9a1..0000000000000 --- a/pandas/stats/misc.py +++ /dev/null @@ -1,389 +0,0 @@ -from numpy import NaN -from pandas import compat -import numpy as np - -from pandas.core.api import Series, DataFrame -from pandas.core.series import remove_na -from pandas.compat import zip, lrange -import pandas.core.common as com - - -def zscore(series): - return (series - series.mean()) / np.std(series, ddof=0) - - -def correl_ts(frame1, frame2): - """ - Pairwise correlation of columns of two DataFrame objects - - Parameters - ---------- - - Returns - ------- - y : Series - """ - results = {} - for col, series in compat.iteritems(frame1): - if col in frame2: - other = frame2[col] - - idx1 = series.valid().index - idx2 = other.valid().index - - common_index = idx1.intersection(idx2) - - seriesStand = zscore(series.reindex(common_index)) - otherStand = zscore(other.reindex(common_index)) - results[col] = (seriesStand * otherStand).mean() - - return Series(results) - - -def correl_xs(frame1, frame2): - return correl_ts(frame1.T, frame2.T) - - -def percentileofscore(a, score, kind='rank'): - """The percentile rank of a score relative to a list of scores. - - A `percentileofscore` of, for example, 80% means that 80% of the - scores in `a` are below the given score. In the case of gaps or - ties, the exact definition depends on the optional keyword, `kind`. - - Parameters - ---------- - a: array like - Array of scores to which `score` is compared. - score: int or float - Score that is compared to the elements in `a`. - kind: {'rank', 'weak', 'strict', 'mean'}, optional - This optional parameter specifies the interpretation of the - resulting score: - - - "rank": Average percentage ranking of score. In case of - multiple matches, average the percentage rankings of - all matching scores. - - "weak": This kind corresponds to the definition of a cumulative - distribution function. A percentileofscore of 80% - means that 80% of values are less than or equal - to the provided score. - - "strict": Similar to "weak", except that only values that are - strictly less than the given score are counted. - - "mean": The average of the "weak" and "strict" scores, often used in - testing. See - - http://en.wikipedia.org/wiki/Percentile_rank - - Returns - ------- - pcos : float - Percentile-position of score (0-100) relative to `a`. 
- - Examples - -------- - Three-quarters of the given values lie below a given score: - - >>> percentileofscore([1, 2, 3, 4], 3) - 75.0 - - With multiple matches, note how the scores of the two matches, 0.6 - and 0.8 respectively, are averaged: - - >>> percentileofscore([1, 2, 3, 3, 4], 3) - 70.0 - - Only 2/5 values are strictly less than 3: - - >>> percentileofscore([1, 2, 3, 3, 4], 3, kind='strict') - 40.0 - - But 4/5 values are less than or equal to 3: - - >>> percentileofscore([1, 2, 3, 3, 4], 3, kind='weak') - 80.0 - - The average between the weak and the strict scores is - - >>> percentileofscore([1, 2, 3, 3, 4], 3, kind='mean') - 60.0 - - """ - a = np.array(a) - n = len(a) - - if kind == 'rank': - if not(np.any(a == score)): - a = np.append(a, score) - a_len = np.array(lrange(len(a))) - else: - a_len = np.array(lrange(len(a))) + 1.0 - - a = np.sort(a) - idx = [a == score] - pct = (np.mean(a_len[idx]) / n) * 100.0 - return pct - - elif kind == 'strict': - return sum(a < score) / float(n) * 100 - elif kind == 'weak': - return sum(a <= score) / float(n) * 100 - elif kind == 'mean': - return (sum(a < score) + sum(a <= score)) * 50 / float(n) - else: - raise ValueError("kind can only be 'rank', 'strict', 'weak' or 'mean'") - - -def percentileRank(frame, column=None, kind='mean'): - """ - Return score at percentile for each point in time (cross-section) - - Parameters - ---------- - frame: DataFrame - column: string or Series, optional - Column name or specific Series to compute percentiles for. - If not provided, percentiles are computed for all values at each - point in time. Note that this can take a LONG time. - kind: {'rank', 'weak', 'strict', 'mean'}, optional - This optional parameter specifies the interpretation of the - resulting score: - - - "rank": Average percentage ranking of score. In case of - multiple matches, average the percentage rankings of - all matching scores. - - "weak": This kind corresponds to the definition of a cumulative - distribution function. A percentileofscore of 80% - means that 80% of values are less than or equal - to the provided score. - - "strict": Similar to "weak", except that only values that are - strictly less than the given score are counted. - - "mean": The average of the "weak" and "strict" scores, often used in - testing. 
See - - http://en.wikipedia.org/wiki/Percentile_rank - - Returns - ------- - TimeSeries or DataFrame, depending on input - """ - fun = lambda xs, score: percentileofscore(remove_na(xs), - score, kind=kind) - - results = {} - framet = frame.T - if column is not None: - if isinstance(column, Series): - for date, xs in compat.iteritems(frame.T): - results[date] = fun(xs, column.get(date, NaN)) - else: - for date, xs in compat.iteritems(frame.T): - results[date] = fun(xs, xs[column]) - results = Series(results) - else: - for column in frame.columns: - for date, xs in compat.iteritems(framet): - results.setdefault(date, {})[column] = fun(xs, xs[column]) - results = DataFrame(results).T - return results - - -def bucket(series, k, by=None): - """ - Produce DataFrame representing quantiles of a Series - - Parameters - ---------- - series : Series - k : int - number of quantiles - by : Series or same-length array - bucket by value - - Returns - ------- - DataFrame - """ - if by is None: - by = series - else: - by = by.reindex(series.index) - - split = _split_quantile(by, k) - mat = np.empty((len(series), k), dtype=float) * np.NaN - - for i, v in enumerate(split): - mat[:, i][v] = series.take(v) - - return DataFrame(mat, index=series.index, columns=np.arange(k) + 1) - - -def _split_quantile(arr, k): - arr = np.asarray(arr) - mask = np.isfinite(arr) - order = arr[mask].argsort() - n = len(arr) - - return np.array_split(np.arange(n)[mask].take(order), k) - - -def bucketcat(series, cats): - """ - Produce DataFrame representing quantiles of a Series - - Parameters - ---------- - series : Series - cat : Series or same-length array - bucket by category; mutually exclusive with 'by' - - Returns - ------- - DataFrame - """ - if not isinstance(series, Series): - series = Series(series, index=np.arange(len(series))) - - cats = np.asarray(cats) - - unique_labels = np.unique(cats) - unique_labels = unique_labels[com.notnull(unique_labels)] - - # group by - data = {} - - for label in unique_labels: - data[label] = series[cats == label] - - return DataFrame(data, columns=unique_labels) - - -def bucketpanel(series, bins=None, by=None, cat=None): - """ - Bucket data by two Series to create summary panel - - Parameters - ---------- - series : Series - bins : tuple (length-2) - e.g. 
(2, 2) - by : tuple of Series - bucket by value - cat : tuple of Series - bucket by category; mutually exclusive with 'by' - - Returns - ------- - DataFrame - """ - use_by = by is not None - use_cat = cat is not None - - if use_by and use_cat: - raise Exception('must specify by or cat, but not both') - elif use_by: - if len(by) != 2: - raise Exception('must provide two bucketing series') - - xby, yby = by - xbins, ybins = bins - - return _bucketpanel_by(series, xby, yby, xbins, ybins) - - elif use_cat: - xcat, ycat = cat - return _bucketpanel_cat(series, xcat, ycat) - else: - raise Exception('must specify either values or categories ' - 'to bucket by') - - -def _bucketpanel_by(series, xby, yby, xbins, ybins): - xby = xby.reindex(series.index) - yby = yby.reindex(series.index) - - xlabels = _bucket_labels(xby.reindex(series.index), xbins) - ylabels = _bucket_labels(yby.reindex(series.index), ybins) - - labels = _uniquify(xlabels, ylabels, xbins, ybins) - - mask = com.isnull(labels) - labels[mask] = -1 - - unique_labels = np.unique(labels) - bucketed = bucketcat(series, labels) - - _ulist = list(labels) - index_map = dict((x, _ulist.index(x)) for x in unique_labels) - - def relabel(key): - pos = index_map[key] - - xlab = xlabels[pos] - ylab = ylabels[pos] - - return '%sx%s' % (int(xlab) if com.notnull(xlab) else 'NULL', - int(ylab) if com.notnull(ylab) else 'NULL') - - return bucketed.rename(columns=relabel) - - -def _bucketpanel_cat(series, xcat, ycat): - xlabels, xmapping = _intern(xcat) - ylabels, ymapping = _intern(ycat) - - shift = 10 ** (np.ceil(np.log10(ylabels.max()))) - labels = xlabels * shift + ylabels - - sorter = labels.argsort() - sorted_labels = labels.take(sorter) - sorted_xlabels = xlabels.take(sorter) - sorted_ylabels = ylabels.take(sorter) - - unique_labels = np.unique(labels) - unique_labels = unique_labels[com.notnull(unique_labels)] - - locs = sorted_labels.searchsorted(unique_labels) - xkeys = sorted_xlabels.take(locs) - ykeys = sorted_ylabels.take(locs) - - stringified = ['(%s, %s)' % arg - for arg in zip(xmapping.take(xkeys), ymapping.take(ykeys))] - - result = bucketcat(series, labels) - result.columns = stringified - - return result - - -def _intern(values): - # assumed no NaN values - values = np.asarray(values) - - uniqued = np.unique(values) - labels = uniqued.searchsorted(values) - return labels, uniqued - - -def _uniquify(xlabels, ylabels, xbins, ybins): - # encode the stuff, create unique label - shifter = 10 ** max(xbins, ybins) - _xpiece = xlabels * shifter - _ypiece = ylabels - - return _xpiece + _ypiece - - -def _bucket_labels(series, k): - arr = np.asarray(series) - mask = np.isfinite(arr) - order = arr[mask].argsort() - n = len(series) - - split = np.array_split(np.arange(n)[mask].take(order), k) - - mat = np.empty(n, dtype=float) * np.NaN - for i, v in enumerate(split): - mat[v] = i - - return mat + 1 diff --git a/pandas/stats/moments.py b/pandas/stats/moments.py index 95b209aee0b0c..f6c3a08c6721a 100644 --- a/pandas/stats/moments.py +++ b/pandas/stats/moments.py @@ -6,9 +6,9 @@ import warnings import numpy as np -from pandas.types.common import is_scalar +from pandas.core.dtypes.common import is_scalar from pandas.core.api import DataFrame, Series -from pandas.util.decorators import Substitution, Appender +from pandas.util._decorators import Substitution, Appender __all__ = ['rolling_count', 'rolling_max', 'rolling_min', 'rolling_sum', 'rolling_mean', 'rolling_std', 'rolling_cov', @@ -385,6 +385,7 @@ def ewmstd(arg, com=None, span=None, halflife=None, 
alpha=None, min_periods=0, bias=bias, func_kw=['bias']) + ewmvol = ewmstd @@ -476,6 +477,7 @@ def f(arg, window, min_periods=None, freq=None, center=False, **kwargs) return f + rolling_max = _rolling_func('max', 'Moving maximum.', how='max') rolling_min = _rolling_func('min', 'Moving minimum.', how='min') rolling_sum = _rolling_func('sum', 'Moving sum.') @@ -683,6 +685,7 @@ def f(arg, min_periods=1, freq=None, **kwargs): **kwargs) return f + expanding_max = _expanding_func('max', 'Expanding maximum.') expanding_min = _expanding_func('min', 'Expanding minimum.') expanding_sum = _expanding_func('sum', 'Expanding sum.') diff --git a/pandas/stats/ols.py b/pandas/stats/ols.py deleted file mode 100644 index b533d255bd196..0000000000000 --- a/pandas/stats/ols.py +++ /dev/null @@ -1,1376 +0,0 @@ -""" -Ordinary least squares regression -""" - -# pylint: disable-msg=W0201 - -# flake8: noqa - -from pandas.compat import zip, range, StringIO -from itertools import starmap -from pandas import compat -import numpy as np - -from pandas.core.api import DataFrame, Series, isnull -from pandas.core.base import StringMixin -from pandas.types.common import _ensure_float64 -from pandas.core.index import MultiIndex -from pandas.core.panel import Panel -from pandas.util.decorators import cache_readonly - -import pandas.stats.common as scom -import pandas.stats.math as math -import pandas.stats.moments as moments - -_FP_ERR = 1e-8 - -class OLS(StringMixin): - """ - Runs a full sample ordinary least squares regression. - - Parameters - ---------- - y : Series - x : Series, DataFrame, dict of Series - intercept : bool - True if you want an intercept. - weights : array-like, optional - 1d array of weights. If you supply 1/W then the variables are pre- - multiplied by 1/sqrt(W). If no weights are supplied the default value - is 1 and WLS reults are the same as OLS. - nw_lags : None or int - Number of Newey-West lags. - nw_overlap : boolean, default False - Assume data is overlapping when computing Newey-West estimator - - """ - _panel_model = False - - def __init__(self, y, x, intercept=True, weights=None, nw_lags=None, - nw_overlap=False): - import warnings - warnings.warn("The pandas.stats.ols module is deprecated and will be " - "removed in a future version. We refer to external packages " - "like statsmodels, see some examples here: " - "http://www.statsmodels.org/stable/regression.html", - FutureWarning, stacklevel=4) - - try: - import statsmodels.api as sm - except ImportError: - import scikits.statsmodels.api as sm - - self._x_orig = x - self._y_orig = y - self._weights_orig = weights - self._intercept = intercept - self._nw_lags = nw_lags - self._nw_overlap = nw_overlap - - (self._y, self._x, self._weights, self._x_filtered, - self._index, self._time_has_obs) = self._prepare_data() - - if self._weights is not None: - self._x_trans = self._x.mul(np.sqrt(self._weights), axis=0) - self._y_trans = self._y * np.sqrt(self._weights) - self.sm_ols = sm.WLS(self._y.get_values(), - self._x.get_values(), - weights=self._weights.values).fit() - else: - self._x_trans = self._x - self._y_trans = self._y - self.sm_ols = sm.OLS(self._y.get_values(), - self._x.get_values()).fit() - - def _prepare_data(self): - """ - Cleans the input for single OLS. - - Parameters - ---------- - lhs: Series - Dependent variable in the regression. - rhs: dict, whose values are Series, DataFrame, or dict - Explanatory variables of the regression. 
- - Returns - ------- - Series, DataFrame - Cleaned lhs and rhs - """ - (filt_lhs, filt_rhs, filt_weights, - pre_filt_rhs, index, valid) = _filter_data(self._y_orig, self._x_orig, - self._weights_orig) - if self._intercept: - filt_rhs['intercept'] = 1. - pre_filt_rhs['intercept'] = 1. - - if hasattr(filt_weights, 'to_dense'): - filt_weights = filt_weights.to_dense() - - return (filt_lhs, filt_rhs, filt_weights, - pre_filt_rhs, index, valid) - - @property - def nobs(self): - return self._nobs - - @property - def _nobs(self): - return len(self._y) - - @property - def nw_lags(self): - return self._nw_lags - - @property - def x(self): - """Returns the filtered x used in the regression.""" - return self._x - - @property - def y(self): - """Returns the filtered y used in the regression.""" - return self._y - - @cache_readonly - def _beta_raw(self): - """Runs the regression and returns the beta.""" - return self.sm_ols.params - - @cache_readonly - def beta(self): - """Returns the betas in Series form.""" - return Series(self._beta_raw, index=self._x.columns) - - @cache_readonly - def _df_raw(self): - """Returns the degrees of freedom.""" - return math.rank(self._x.values) - - @cache_readonly - def df(self): - """Returns the degrees of freedom. - - This equals the rank of the X matrix. - """ - return self._df_raw - - @cache_readonly - def _df_model_raw(self): - """Returns the raw model degrees of freedom.""" - return self.sm_ols.df_model - - @cache_readonly - def df_model(self): - """Returns the degrees of freedom of the model.""" - return self._df_model_raw - - @cache_readonly - def _df_resid_raw(self): - """Returns the raw residual degrees of freedom.""" - return self.sm_ols.df_resid - - @cache_readonly - def df_resid(self): - """Returns the degrees of freedom of the residuals.""" - return self._df_resid_raw - - @cache_readonly - def _f_stat_raw(self): - """Returns the raw f-stat value.""" - from scipy.stats import f - - cols = self._x.columns - - if self._nw_lags is None: - F = self._r2_raw / (self._r2_raw - self._r2_adj_raw) - - q = len(cols) - if 'intercept' in cols: - q -= 1 - - shape = q, self.df_resid - p_value = 1 - f.cdf(F, shape[0], shape[1]) - return F, shape, p_value - - k = len(cols) - R = np.eye(k) - r = np.zeros((k, 1)) - - try: - intercept = cols.get_loc('intercept') - R = np.concatenate((R[0: intercept], R[intercept + 1:])) - r = np.concatenate((r[0: intercept], r[intercept + 1:])) - except KeyError: - # no intercept - pass - - return math.calc_F(R, r, self._beta_raw, self._var_beta_raw, - self._nobs, self.df) - - @cache_readonly - def f_stat(self): - """Returns the f-stat value.""" - return f_stat_to_dict(self._f_stat_raw) - - def f_test(self, hypothesis): - """Runs the F test, given a joint hypothesis. The hypothesis is - represented by a collection of equations, in the form - - A*x_1+B*x_2=C - - You must provide the coefficients even if they're 1. No spaces. - - The equations can be passed as either a single string or a - list of strings. - - Examples - -------- - o = ols(...) 
- o.f_test('1*x1+2*x2=0,1*x3=0') - o.f_test(['1*x1+2*x2=0','1*x3=0']) - """ - - x_names = self._x.columns - - R = [] - r = [] - - if isinstance(hypothesis, str): - eqs = hypothesis.split(',') - elif isinstance(hypothesis, list): - eqs = hypothesis - else: # pragma: no cover - raise Exception('hypothesis must be either string or list') - for equation in eqs: - row = np.zeros(len(x_names)) - lhs, rhs = equation.split('=') - for s in lhs.split('+'): - ss = s.split('*') - coeff = float(ss[0]) - x_name = ss[1] - - if x_name not in x_names: - raise Exception('no coefficient named %s' % x_name) - idx = x_names.get_loc(x_name) - row[idx] = coeff - rhs = float(rhs) - - R.append(row) - r.append(rhs) - - R = np.array(R) - q = len(r) - r = np.array(r).reshape(q, 1) - - result = math.calc_F(R, r, self._beta_raw, self._var_beta_raw, - self._nobs, self.df) - - return f_stat_to_dict(result) - - @cache_readonly - def _p_value_raw(self): - """Returns the raw p values.""" - from scipy.stats import t - - return 2 * t.sf(np.fabs(self._t_stat_raw), - self._df_resid_raw) - - @cache_readonly - def p_value(self): - """Returns the p values.""" - return Series(self._p_value_raw, index=self.beta.index) - - @cache_readonly - def _r2_raw(self): - """Returns the raw r-squared values.""" - if self._use_centered_tss: - return 1 - self.sm_ols.ssr / self.sm_ols.centered_tss - else: - return 1 - self.sm_ols.ssr / self.sm_ols.uncentered_tss - - @property - def _use_centered_tss(self): - # has_intercept = np.abs(self._resid_raw.sum()) < _FP_ERR - return self._intercept - - @cache_readonly - def r2(self): - """Returns the r-squared values.""" - return self._r2_raw - - @cache_readonly - def _r2_adj_raw(self): - """Returns the raw r-squared adjusted values.""" - return self.sm_ols.rsquared_adj - - @cache_readonly - def r2_adj(self): - """Returns the r-squared adjusted values.""" - return self._r2_adj_raw - - @cache_readonly - def _resid_raw(self): - """Returns the raw residuals.""" - return self.sm_ols.resid - - @cache_readonly - def resid(self): - """Returns the residuals.""" - return Series(self._resid_raw, index=self._x.index) - - @cache_readonly - def _rmse_raw(self): - """Returns the raw rmse values.""" - return np.sqrt(self.sm_ols.mse_resid) - - @cache_readonly - def rmse(self): - """Returns the rmse value.""" - return self._rmse_raw - - @cache_readonly - def _std_err_raw(self): - """Returns the raw standard err values.""" - return np.sqrt(np.diag(self._var_beta_raw)) - - @cache_readonly - def std_err(self): - """Returns the standard err values of the betas.""" - return Series(self._std_err_raw, index=self.beta.index) - - @cache_readonly - def _t_stat_raw(self): - """Returns the raw t-stat value.""" - return self._beta_raw / self._std_err_raw - - @cache_readonly - def t_stat(self): - """Returns the t-stat values of the betas.""" - return Series(self._t_stat_raw, index=self.beta.index) - - @cache_readonly - def _var_beta_raw(self): - """ - Returns the raw covariance of beta. 
- """ - x = self._x.values - y = self._y.values - - xx = np.dot(x.T, x) - - if self._nw_lags is None: - return math.inv(xx) * (self._rmse_raw ** 2) - else: - resid = y - np.dot(x, self._beta_raw) - m = (x.T * resid).T - - xeps = math.newey_west(m, self._nw_lags, self._nobs, self._df_raw, - self._nw_overlap) - - xx_inv = math.inv(xx) - return np.dot(xx_inv, np.dot(xeps, xx_inv)) - - @cache_readonly - def var_beta(self): - """Returns the variance-covariance matrix of beta.""" - return DataFrame(self._var_beta_raw, index=self.beta.index, - columns=self.beta.index) - - @cache_readonly - def _y_fitted_raw(self): - """Returns the raw fitted y values.""" - if self._weights is None: - X = self._x_filtered.values - else: - # XXX - return self.sm_ols.fittedvalues - - b = self._beta_raw - return np.dot(X, b) - - @cache_readonly - def y_fitted(self): - """Returns the fitted y values. This equals BX.""" - if self._weights is None: - index = self._x_filtered.index - orig_index = index - else: - index = self._y.index - orig_index = self._y_orig.index - - result = Series(self._y_fitted_raw, index=index) - return result.reindex(orig_index) - - @cache_readonly - def _y_predict_raw(self): - """Returns the raw predicted y values.""" - return self._y_fitted_raw - - @cache_readonly - def y_predict(self): - """Returns the predicted y values. - - For in-sample, this is same as y_fitted.""" - return self.y_fitted - - def predict(self, beta=None, x=None, fill_value=None, - fill_method=None, axis=0): - """ - Parameters - ---------- - beta : Series - x : Series or DataFrame - fill_value : scalar or dict, default None - fill_method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None - axis : {0, 1}, default 0 - See DataFrame.fillna for more details - - Notes - ----- - 1. If both fill_value and fill_method are None then NaNs are dropped - (this is the default behavior) - 2. An intercept will be automatically added to the new_y_values if - the model was fitted using an intercept - - Returns - ------- - Series of predicted values - """ - if beta is None and x is None: - return self.y_predict - - if beta is None: - beta = self.beta - else: - beta = beta.reindex(self.beta.index) - if isnull(beta).any(): - raise ValueError('Must supply betas for same variables') - - if x is None: - x = self._x - orig_x = x - else: - orig_x = x - if fill_value is None and fill_method is None: - x = x.dropna(how='any') - else: - x = x.fillna(value=fill_value, method=fill_method, axis=axis) - if isinstance(x, Series): - x = DataFrame({'x': x}) - if self._intercept: - x['intercept'] = 1. 
- - x = x.reindex(columns=self._x.columns) - - rs = np.dot(x.values, beta.values) - return Series(rs, x.index).reindex(orig_x.index) - - RESULT_FIELDS = ['r2', 'r2_adj', 'df', 'df_model', 'df_resid', 'rmse', - 'f_stat', 'beta', 'std_err', 't_stat', 'p_value', 'nobs'] - - @cache_readonly - def _results(self): - results = {} - for result in self.RESULT_FIELDS: - results[result] = getattr(self, result) - - return results - - @cache_readonly - def _coef_table(self): - buf = StringIO() - - buf.write('%14s %10s %10s %10s %10s %10s %10s\n' % - ('Variable', 'Coef', 'Std Err', 't-stat', - 'p-value', 'CI 2.5%', 'CI 97.5%')) - buf.write(scom.banner('')) - coef_template = '\n%14s %10.4f %10.4f %10.2f %10.4f %10.4f %10.4f' - - results = self._results - - beta = results['beta'] - - for i, name in enumerate(beta.index): - if i and not (i % 5): - buf.write('\n' + scom.banner('')) - - std_err = results['std_err'][name] - CI1 = beta[name] - 1.96 * std_err - CI2 = beta[name] + 1.96 * std_err - - t_stat = results['t_stat'][name] - p_value = results['p_value'][name] - - line = coef_template % (name, - beta[name], std_err, t_stat, p_value, CI1, CI2) - - buf.write(line) - - if self.nw_lags is not None: - buf.write('\n') - buf.write('*** The calculations are Newey-West ' - 'adjusted with lags %5d\n' % self.nw_lags) - - return buf.getvalue() - - @cache_readonly - def summary_as_matrix(self): - """Returns the formatted results of the OLS as a DataFrame.""" - results = self._results - beta = results['beta'] - data = {'beta': results['beta'], - 't-stat': results['t_stat'], - 'p-value': results['p_value'], - 'std err': results['std_err']} - return DataFrame(data, beta.index).T - - @cache_readonly - def summary(self): - """ - This returns the formatted result of the OLS computation - """ - template = """ -%(bannerTop)s - -Formula: Y ~ %(formula)s - -Number of Observations: %(nobs)d -Number of Degrees of Freedom: %(df)d - -R-squared: %(r2)10.4f -Adj R-squared: %(r2_adj)10.4f - -Rmse: %(rmse)10.4f - -F-stat %(f_stat_shape)s: %(f_stat)10.4f, p-value: %(f_stat_p_value)10.4f - -Degrees of Freedom: model %(df_model)d, resid %(df_resid)d - -%(bannerCoef)s -%(coef_table)s -%(bannerEnd)s -""" - coef_table = self._coef_table - - results = self._results - - f_stat = results['f_stat'] - - bracketed = ['<%s>' % str(c) for c in results['beta'].index] - - formula = StringIO() - formula.write(bracketed[0]) - tot = len(bracketed[0]) - line = 1 - for coef in bracketed[1:]: - tot = tot + len(coef) + 3 - - if tot // (68 * line): - formula.write('\n' + ' ' * 12) - line += 1 - - formula.write(' + ' + coef) - - params = { - 'bannerTop': scom.banner('Summary of Regression Analysis'), - 'bannerCoef': scom.banner('Summary of Estimated Coefficients'), - 'bannerEnd': scom.banner('End of Summary'), - 'formula': formula.getvalue(), - 'r2': results['r2'], - 'r2_adj': results['r2_adj'], - 'nobs': results['nobs'], - 'df': results['df'], - 'df_model': results['df_model'], - 'df_resid': results['df_resid'], - 'coef_table': coef_table, - 'rmse': results['rmse'], - 'f_stat': f_stat['f-stat'], - 'f_stat_shape': '(%d, %d)' % (f_stat['DF X'], f_stat['DF Resid']), - 'f_stat_p_value': f_stat['p-value'], - } - - return template % params - - def __unicode__(self): - return self.summary - - @cache_readonly - def _time_obs_count(self): - # XXX - return self._time_has_obs.astype(int) - - @property - def _total_times(self): - return self._time_has_obs.sum() - - -class MovingOLS(OLS): - """ - Runs a rolling/expanding simple OLS. 
- - Parameters - ---------- - y : Series - x : Series, DataFrame, or dict of Series - weights : array-like, optional - 1d array of weights. If None, equivalent to an unweighted OLS. - window_type : {'full sample', 'rolling', 'expanding'} - Default expanding - window : int - size of window (for rolling/expanding OLS) - min_periods : int - Threshold of non-null data points to require. - If None, defaults to size of window for window_type='rolling' and 1 - otherwise - intercept : bool - True if you want an intercept. - nw_lags : None or int - Number of Newey-West lags. - nw_overlap : boolean, default False - Assume data is overlapping when computing Newey-West estimator - - """ - - def __init__(self, y, x, weights=None, window_type='expanding', - window=None, min_periods=None, intercept=True, - nw_lags=None, nw_overlap=False): - - self._args = dict(intercept=intercept, nw_lags=nw_lags, - nw_overlap=nw_overlap) - - OLS.__init__(self, y=y, x=x, weights=weights, **self._args) - - self._set_window(window_type, window, min_periods) - - def _set_window(self, window_type, window, min_periods): - self._window_type = scom._get_window_type(window_type) - - if self._is_rolling: - if window is None: - raise AssertionError("Must specify window.") - if min_periods is None: - min_periods = window - else: - window = len(self._x) - if min_periods is None: - min_periods = 1 - - self._window = int(window) - self._min_periods = min_periods - -#------------------------------------------------------------------------------ -# "Public" results - - @cache_readonly - def beta(self): - """Returns the betas in Series/DataFrame form.""" - return DataFrame(self._beta_raw, - index=self._result_index, - columns=self._x.columns) - - @cache_readonly - def rank(self): - return Series(self._rank_raw, index=self._result_index) - - @cache_readonly - def df(self): - """Returns the degrees of freedom.""" - return Series(self._df_raw, index=self._result_index) - - @cache_readonly - def df_model(self): - """Returns the model degrees of freedom.""" - return Series(self._df_model_raw, index=self._result_index) - - @cache_readonly - def df_resid(self): - """Returns the residual degrees of freedom.""" - return Series(self._df_resid_raw, index=self._result_index) - - @cache_readonly - def f_stat(self): - """Returns the f-stat value.""" - f_stat_dicts = dict((date, f_stat_to_dict(f_stat)) - for date, f_stat in zip(self.beta.index, - self._f_stat_raw)) - - return DataFrame(f_stat_dicts).T - - def f_test(self, hypothesis): - raise NotImplementedError('must use full sample') - - @cache_readonly - def forecast_mean(self): - return Series(self._forecast_mean_raw, index=self._result_index) - - @cache_readonly - def forecast_vol(self): - return Series(self._forecast_vol_raw, index=self._result_index) - - @cache_readonly - def p_value(self): - """Returns the p values.""" - cols = self.beta.columns - return DataFrame(self._p_value_raw, columns=cols, - index=self._result_index) - - @cache_readonly - def r2(self): - """Returns the r-squared values.""" - return Series(self._r2_raw, index=self._result_index) - - @cache_readonly - def resid(self): - """Returns the residuals.""" - return Series(self._resid_raw[self._valid_obs_labels], - index=self._result_index) - - @cache_readonly - def r2_adj(self): - """Returns the r-squared adjusted values.""" - index = self.r2.index - - return Series(self._r2_adj_raw, index=index) - - @cache_readonly - def rmse(self): - """Returns the rmse values.""" - return Series(self._rmse_raw, index=self._result_index) - - 
@cache_readonly - def std_err(self): - """Returns the standard err values.""" - return DataFrame(self._std_err_raw, columns=self.beta.columns, - index=self._result_index) - - @cache_readonly - def t_stat(self): - """Returns the t-stat value.""" - return DataFrame(self._t_stat_raw, columns=self.beta.columns, - index=self._result_index) - - @cache_readonly - def var_beta(self): - """Returns the covariance of beta.""" - result = {} - result_index = self._result_index - for i in range(len(self._var_beta_raw)): - dm = DataFrame(self._var_beta_raw[i], columns=self.beta.columns, - index=self.beta.columns) - result[result_index[i]] = dm - - return Panel.from_dict(result, intersect=False) - - @cache_readonly - def y_fitted(self): - """Returns the fitted y values.""" - return Series(self._y_fitted_raw[self._valid_obs_labels], - index=self._result_index) - - @cache_readonly - def y_predict(self): - """Returns the predicted y values.""" - return Series(self._y_predict_raw[self._valid_obs_labels], - index=self._result_index) - -#------------------------------------------------------------------------------ -# "raw" attributes, calculations - - @property - def _is_rolling(self): - return self._window_type == 'rolling' - - @cache_readonly - def _beta_raw(self): - """Runs the regression and returns the beta.""" - beta, indices, mask = self._rolling_ols_call - - return beta[indices] - - @cache_readonly - def _result_index(self): - return self._index[self._valid_indices] - - @property - def _valid_indices(self): - return self._rolling_ols_call[1] - - @cache_readonly - def _rolling_ols_call(self): - return self._calc_betas(self._x_trans, self._y_trans) - - def _calc_betas(self, x, y): - N = len(self._index) - K = len(self._x.columns) - - betas = np.empty((N, K), dtype=float) - betas[:] = np.NaN - - valid = self._time_has_obs - enough = self._enough_obs - window = self._window - - # Use transformed (demeaned) Y, X variables - cum_xx = self._cum_xx(x) - cum_xy = self._cum_xy(x, y) - - for i in range(N): - if not valid[i] or not enough[i]: - continue - - xx = cum_xx[i] - xy = cum_xy[i] - if self._is_rolling and i >= window: - xx = xx - cum_xx[i - window] - xy = xy - cum_xy[i - window] - - betas[i] = math.solve(xx, xy) - - mask = ~np.isnan(betas).any(axis=1) - have_betas = np.arange(N)[mask] - - return betas, have_betas, mask - - def _rolling_rank(self): - dates = self._index - window = self._window - - ranks = np.empty(len(dates), dtype=float) - ranks[:] = np.NaN - for i, date in enumerate(dates): - if self._is_rolling and i >= window: - prior_date = dates[i - window + 1] - else: - prior_date = dates[0] - - x_slice = self._x.truncate(before=prior_date, after=date).values - - if len(x_slice) == 0: - continue - - ranks[i] = math.rank(x_slice) - - return ranks - - def _cum_xx(self, x): - dates = self._index - K = len(x.columns) - valid = self._time_has_obs - cum_xx = [] - - slicer = lambda df, dt: df.truncate(dt, dt).values - if not self._panel_model: - _get_index = x.index.get_loc - - def slicer(df, dt): - i = _get_index(dt) - return df.values[i:i + 1, :] - - last = np.zeros((K, K)) - - for i, date in enumerate(dates): - if not valid[i]: - cum_xx.append(last) - continue - - x_slice = slicer(x, date) - xx = last = last + np.dot(x_slice.T, x_slice) - cum_xx.append(xx) - - return cum_xx - - def _cum_xy(self, x, y): - dates = self._index - valid = self._time_has_obs - cum_xy = [] - - x_slicer = lambda df, dt: df.truncate(dt, dt).values - if not self._panel_model: - _get_index = x.index.get_loc - - def x_slicer(df, 
dt): - i = _get_index(dt) - return df.values[i:i + 1] - - _y_get_index = y.index.get_loc - _values = y.values - if isinstance(y.index, MultiIndex): - def y_slicer(df, dt): - loc = _y_get_index(dt) - return _values[loc] - else: - def y_slicer(df, dt): - i = _y_get_index(dt) - return _values[i:i + 1] - - last = np.zeros(len(x.columns)) - for i, date in enumerate(dates): - if not valid[i]: - cum_xy.append(last) - continue - - x_slice = x_slicer(x, date) - y_slice = y_slicer(y, date) - - xy = last = last + np.dot(x_slice.T, y_slice) - cum_xy.append(xy) - - return cum_xy - - @cache_readonly - def _rank_raw(self): - rank = self._rolling_rank() - return rank[self._valid_indices] - - @cache_readonly - def _df_raw(self): - """Returns the degrees of freedom.""" - return self._rank_raw - - @cache_readonly - def _df_model_raw(self): - """Returns the raw model degrees of freedom.""" - return self._df_raw - 1 - - @cache_readonly - def _df_resid_raw(self): - """Returns the raw residual degrees of freedom.""" - return self._nobs - self._df_raw - - @cache_readonly - def _f_stat_raw(self): - """Returns the raw f-stat value.""" - from scipy.stats import f - - items = self.beta.columns - nobs = self._nobs - df = self._df_raw - df_resid = nobs - df - - # var_beta has not been newey-west adjusted - if self._nw_lags is None: - F = self._r2_raw / (self._r2_raw - self._r2_adj_raw) - - q = len(items) - if 'intercept' in items: - q -= 1 - - def get_result_simple(Fst, d): - return Fst, (q, d), 1 - f.cdf(Fst, q, d) - - # Compute the P-value for each pair - result = starmap(get_result_simple, zip(F, df_resid)) - - return list(result) - - K = len(items) - R = np.eye(K) - r = np.zeros((K, 1)) - - try: - intercept = items.get_loc('intercept') - R = np.concatenate((R[0: intercept], R[intercept + 1:])) - r = np.concatenate((r[0: intercept], r[intercept + 1:])) - except KeyError: - # no intercept - pass - - def get_result(beta, vcov, n, d): - return math.calc_F(R, r, beta, vcov, n, d) - - results = starmap(get_result, - zip(self._beta_raw, self._var_beta_raw, nobs, df)) - - return list(results) - - @cache_readonly - def _p_value_raw(self): - """Returns the raw p values.""" - from scipy.stats import t - - result = [2 * t.sf(a, b) - for a, b in zip(np.fabs(self._t_stat_raw), - self._df_resid_raw)] - - return np.array(result) - - @cache_readonly - def _resid_stats(self): - uncentered_sst = [] - sst = [] - sse = [] - - Yreg = self._y - Y = self._y_trans - X = self._x_trans - weights = self._weights - - dates = self._index - window = self._window - for n, index in enumerate(self._valid_indices): - if self._is_rolling and index >= window: - prior_date = dates[index - window + 1] - else: - prior_date = dates[0] - - date = dates[index] - beta = self._beta_raw[n] - - X_slice = X.truncate(before=prior_date, after=date).values - Y_slice = _y_converter(Y.truncate(before=prior_date, after=date)) - - resid = Y_slice - np.dot(X_slice, beta) - - if weights is not None: - Y_slice = _y_converter(Yreg.truncate(before=prior_date, - after=date)) - weights_slice = weights.truncate(prior_date, date) - demeaned = Y_slice - np.average(Y_slice, weights=weights_slice) - SS_total = (weights_slice * demeaned ** 2).sum() - else: - SS_total = ((Y_slice - Y_slice.mean()) ** 2).sum() - - SS_err = (resid ** 2).sum() - SST_uncentered = (Y_slice ** 2).sum() - - sse.append(SS_err) - sst.append(SS_total) - uncentered_sst.append(SST_uncentered) - - return { - 'sse': np.array(sse), - 'centered_tss': np.array(sst), - 'uncentered_tss': np.array(uncentered_sst), - } 
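The `_cum_xx`/`_cum_xy` helpers above are the heart of the moving estimator: they keep running sums of X'X and X'y so that each window's cross-products come from one subtraction rather than a full refit. A self-contained numpy sketch of the same idea (all names hypothetical; it uses `np.linalg.solve` where the deleted module used its `math.solve` helper, so it assumes a full-rank window):

```python
# Standalone illustration of rolling OLS via running cross-product sums.
import numpy as np

def rolling_betas(X, y, window):
    """Rolling OLS coefficients using cumulative X'X and X'y."""
    n, k = X.shape
    cum_xx = np.cumsum(X[:, :, None] * X[:, None, :], axis=0)  # running X'X
    cum_xy = np.cumsum(X * y[:, None], axis=0)                 # running X'y
    betas = np.full((n, k), np.nan)
    for i in range(window - 1, n):
        xx, xy = cum_xx[i], cum_xy[i]
        if i >= window:
            # drop the contribution of the rows that left the window
            xx = xx - cum_xx[i - window]
            xy = xy - cum_xy[i - window]
        betas[i] = np.linalg.solve(xx, xy)
    return betas

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(200)
print(rolling_betas(X, y, window=40)[-1])  # close to [1, -2, 0.5]
```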
-
-    @cache_readonly
-    def _rmse_raw(self):
-        """Returns the raw rmse values."""
-        return np.sqrt(self._resid_stats['sse'] / self._df_resid_raw)
-
-    @cache_readonly
-    def _r2_raw(self):
-        rs = self._resid_stats
-
-        if self._use_centered_tss:
-            return 1 - rs['sse'] / rs['centered_tss']
-        else:
-            return 1 - rs['sse'] / rs['uncentered_tss']
-
-    @cache_readonly
-    def _r2_adj_raw(self):
-        """Returns the raw r-squared adjusted values."""
-        nobs = self._nobs
-        factors = (nobs - 1) / (nobs - self._df_raw)
-        return 1 - (1 - self._r2_raw) * factors
-
-    @cache_readonly
-    def _resid_raw(self):
-        """Returns the raw residuals."""
-        return (self._y.values - self._y_fitted_raw)
-
-    @cache_readonly
-    def _std_err_raw(self):
-        """Returns the raw standard err values."""
-        results = []
-        for i in range(len(self._var_beta_raw)):
-            results.append(np.sqrt(np.diag(self._var_beta_raw[i])))
-
-        return np.array(results)
-
-    @cache_readonly
-    def _t_stat_raw(self):
-        """Returns the raw t-stat value."""
-        return self._beta_raw / self._std_err_raw
-
-    @cache_readonly
-    def _var_beta_raw(self):
-        """Returns the raw covariance of beta."""
-        x = self._x_trans
-        y = self._y_trans
-        dates = self._index
-        nobs = self._nobs
-        rmse = self._rmse_raw
-        beta = self._beta_raw
-        df = self._df_raw
-        window = self._window
-        cum_xx = self._cum_xx(self._x)
-
-        results = []
-        for n, i in enumerate(self._valid_indices):
-            xx = cum_xx[i]
-            date = dates[i]
-
-            if self._is_rolling and i >= window:
-                xx = xx - cum_xx[i - window]
-                prior_date = dates[i - window + 1]
-            else:
-                prior_date = dates[0]
-
-            x_slice = x.truncate(before=prior_date, after=date)
-            y_slice = y.truncate(before=prior_date, after=date)
-            xv = x_slice.values
-            yv = np.asarray(y_slice)
-
-            if self._nw_lags is None:
-                result = math.inv(xx) * (rmse[n] ** 2)
-            else:
-                resid = yv - np.dot(xv, beta[n])
-                m = (xv.T * resid).T
-
-                xeps = math.newey_west(m, self._nw_lags, nobs[n], df[n],
-                                       self._nw_overlap)
-
-                xx_inv = math.inv(xx)
-                result = np.dot(xx_inv, np.dot(xeps, xx_inv))
-
-            results.append(result)
-
-        return np.array(results)
-
-    @cache_readonly
-    def _forecast_mean_raw(self):
-        """Returns the raw forecast means."""
-        nobs = self._nobs
-        window = self._window
-
-        # x should be ones
-        dummy = DataFrame(index=self._y.index)
-        dummy['y'] = 1
-
-        cum_xy = self._cum_xy(dummy, self._y)
-
-        results = []
-        for n, i in enumerate(self._valid_indices):
-            sumy = cum_xy[i]
-
-            if self._is_rolling and i >= window:
-                sumy = sumy - cum_xy[i - window]
-
-            results.append(sumy[0] / nobs[n])
-
-        return np.array(results)
-
-    @cache_readonly
-    def _forecast_vol_raw(self):
-        """Returns the raw forecast volatility."""
-        beta = self._beta_raw
-        window = self._window
-        dates = self._index
-        x = self._x
-
-        results = []
-        for n, i in enumerate(self._valid_indices):
-            date = dates[i]
-            if self._is_rolling and i >= window:
-                prior_date = dates[i - window + 1]
-            else:
-                prior_date = dates[0]
-
-            x_slice = x.truncate(prior_date, date).values
-            x_demeaned = x_slice - x_slice.mean(0)
-            x_cov = np.dot(x_demeaned.T, x_demeaned) / (len(x_slice) - 1)
-
-            B = beta[n]
-            result = np.dot(B, np.dot(x_cov, B))
-            results.append(np.sqrt(result))
-
-        return np.array(results)
-
-    @cache_readonly
-    def _y_fitted_raw(self):
-        """Returns the raw fitted y values."""
-        return (self._x.values * self._beta_matrix(lag=0)).sum(1)
-
-    @cache_readonly
-    def _y_predict_raw(self):
-        """Returns the raw predicted y values."""
-        return (self._x.values * self._beta_matrix(lag=1)).sum(1)
-
-    @cache_readonly
-    def _results(self):
-
results = {} - for result in self.RESULT_FIELDS: - value = getattr(self, result) - if isinstance(value, Series): - value = value[self.beta.index[-1]] - elif isinstance(value, DataFrame): - value = value.xs(self.beta.index[-1]) - else: # pragma: no cover - raise Exception('Problem retrieving %s' % result) - results[result] = value - - return results - - @cache_readonly - def _window_time_obs(self): - window_obs = (Series(self._time_obs_count > 0) - .rolling(self._window, min_periods=1) - .sum() - .values - ) - - window_obs[np.isnan(window_obs)] = 0 - return window_obs.astype(int) - - @cache_readonly - def _nobs_raw(self): - if self._is_rolling: - window = self._window - else: - # expanding case - window = len(self._index) - - result = Series(self._time_obs_count).rolling( - window, min_periods=1).sum().values - - return result.astype(int) - - def _beta_matrix(self, lag=0): - if lag < 0: - raise AssertionError("'lag' must be greater than or equal to 0, " - "input was {0}".format(lag)) - - betas = self._beta_raw - - labels = np.arange(len(self._y)) - lag - indexer = self._valid_obs_labels.searchsorted(labels, side='left') - indexer[indexer == len(betas)] = len(betas) - 1 - - beta_matrix = betas[indexer] - beta_matrix[labels < self._valid_obs_labels[0]] = np.NaN - - return beta_matrix - - @cache_readonly - def _valid_obs_labels(self): - dates = self._index[self._valid_indices] - return self._y.index.searchsorted(dates) - - @cache_readonly - def _nobs(self): - return self._nobs_raw[self._valid_indices] - - @property - def nobs(self): - return Series(self._nobs, index=self._result_index) - - @cache_readonly - def _enough_obs(self): - # XXX: what's the best way to determine where to start? - return self._nobs_raw >= max(self._min_periods, - len(self._x.columns) + 1) - - -def _safe_update(d, other): - """ - Combine dictionaries with non-overlapping keys - """ - for k, v in compat.iteritems(other): - if k in d: - raise Exception('Duplicate regressor: %s' % k) - - d[k] = v - - -def _filter_data(lhs, rhs, weights=None): - """ - Cleans the input for single OLS. - - Parameters - ---------- - lhs : Series - Dependent variable in the regression. - rhs : dict, whose values are Series, DataFrame, or dict - Explanatory variables of the regression. - weights : array-like, optional - 1d array of weights. If None, equivalent to an unweighted OLS. 
- - Returns - ------- - Series, DataFrame - Cleaned lhs and rhs - """ - if not isinstance(lhs, Series): - if len(lhs) != len(rhs): - raise AssertionError("length of lhs must equal length of rhs") - lhs = Series(lhs, index=rhs.index) - - rhs = _combine_rhs(rhs) - lhs = DataFrame({'__y__': lhs}, dtype=float) - pre_filt_rhs = rhs.dropna(how='any') - - combined = rhs.join(lhs, how='outer') - if weights is not None: - combined['__weights__'] = weights - - valid = (combined.count(1) == len(combined.columns)).values - index = combined.index - combined = combined[valid] - - if weights is not None: - filt_weights = combined.pop('__weights__') - else: - filt_weights = None - - filt_lhs = combined.pop('__y__') - filt_rhs = combined - - if hasattr(filt_weights, 'to_dense'): - filt_weights = filt_weights.to_dense() - - return (filt_lhs.to_dense(), filt_rhs.to_dense(), filt_weights, - pre_filt_rhs.to_dense(), index, valid) - - -def _combine_rhs(rhs): - """ - Glue input X variables together while checking for potential - duplicates - """ - series = {} - - if isinstance(rhs, Series): - series['x'] = rhs - elif isinstance(rhs, DataFrame): - series = rhs.copy() - elif isinstance(rhs, dict): - for name, value in compat.iteritems(rhs): - if isinstance(value, Series): - _safe_update(series, {name: value}) - elif isinstance(value, (dict, DataFrame)): - _safe_update(series, value) - else: # pragma: no cover - raise Exception('Invalid RHS data type: %s' % type(value)) - else: # pragma: no cover - raise Exception('Invalid RHS type: %s' % type(rhs)) - - if not isinstance(series, DataFrame): - series = DataFrame(series, dtype=float) - - return series - -# A little kludge so we can use this method for both -# MovingOLS and MovingPanelOLS - - -def _y_converter(y): - y = y.values.squeeze() - if y.ndim == 0: # pragma: no cover - return np.array([y]) - else: - return y - - -def f_stat_to_dict(result): - f_stat, shape, p_value = result - - result = {} - result['f-stat'] = f_stat - result['DF X'] = shape[0] - result['DF Resid'] = shape[1] - result['p-value'] = p_value - - return result diff --git a/pandas/stats/plm.py b/pandas/stats/plm.py deleted file mode 100644 index 806dc289f843a..0000000000000 --- a/pandas/stats/plm.py +++ /dev/null @@ -1,863 +0,0 @@ -""" -Linear regression objects for panel data -""" - -# pylint: disable-msg=W0231 -# pylint: disable-msg=E1101,E1103 - -# flake8: noqa - -from __future__ import division -from pandas.compat import range -from pandas import compat -import warnings - -import numpy as np - -from pandas.core.panel import Panel -from pandas.core.frame import DataFrame -from pandas.core.reshape import get_dummies -from pandas.core.series import Series -from pandas.stats.ols import OLS, MovingOLS -import pandas.stats.common as com -import pandas.stats.math as math -from pandas.util.decorators import cache_readonly - - -class PanelOLS(OLS): - """Implements panel OLS. - - See ols function docs - """ - _panel_model = True - - def __init__(self, y, x, weights=None, intercept=True, nw_lags=None, - entity_effects=False, time_effects=False, x_effects=None, - cluster=None, dropped_dummies=None, verbose=False, - nw_overlap=False): - import warnings - warnings.warn("The pandas.stats.plm module is deprecated and will be " - "removed in a future version. 
We refer to external packages " - "like statsmodels, see some examples here: " - "http://www.statsmodels.org/stable/mixed_linear.html", - FutureWarning, stacklevel=4) - self._x_orig = x - self._y_orig = y - self._weights = weights - - self._intercept = intercept - self._nw_lags = nw_lags - self._nw_overlap = nw_overlap - self._entity_effects = entity_effects - self._time_effects = time_effects - self._x_effects = x_effects - self._dropped_dummies = dropped_dummies or {} - self._cluster = com._get_cluster_type(cluster) - self._verbose = verbose - - (self._x, self._x_trans, - self._x_filtered, self._y, - self._y_trans) = self._prepare_data() - - self._index = self._x.index.levels[0] - - self._T = len(self._index) - - def log(self, msg): - if self._verbose: # pragma: no cover - print(msg) - - def _prepare_data(self): - """Cleans and stacks input data into DataFrame objects - - If time effects is True, then we turn off intercepts and omit an item - from every (entity and x) fixed effect. - - Otherwise: - - If we have an intercept, we omit an item from every fixed effect. - - Else, we omit an item from every fixed effect except one of them. - - The categorical variables will get dropped from x. - """ - (x, x_filtered, y, weights, cat_mapping) = self._filter_data() - - self.log('Adding dummies to X variables') - x = self._add_dummies(x, cat_mapping) - - self.log('Adding dummies to filtered X variables') - x_filtered = self._add_dummies(x_filtered, cat_mapping) - - if self._x_effects: - x = x.drop(self._x_effects, axis=1) - x_filtered = x_filtered.drop(self._x_effects, axis=1) - - if self._time_effects: - x_regressor = x.sub(x.mean(level=0), level=0) - - unstacked_y = y.unstack() - y_regressor = unstacked_y.sub(unstacked_y.mean(1), axis=0).stack() - y_regressor.index = y.index - - elif self._intercept: - # only add intercept when no time effects - self.log('Adding intercept') - x = x_regressor = add_intercept(x) - x_filtered = add_intercept(x_filtered) - y_regressor = y - else: - self.log('No intercept added') - x_regressor = x - y_regressor = y - - if weights is not None: - if not y_regressor.index.equals(weights.index): - raise AssertionError("y_regressor and weights must have the " - "same index") - if not x_regressor.index.equals(weights.index): - raise AssertionError("x_regressor and weights must have the " - "same index") - - rt_weights = np.sqrt(weights) - y_regressor = y_regressor * rt_weights - x_regressor = x_regressor.mul(rt_weights, axis=0) - - return x, x_regressor, x_filtered, y, y_regressor - - def _filter_data(self): - """ - - """ - data = self._x_orig - cat_mapping = {} - - if isinstance(data, DataFrame): - data = data.to_panel() - else: - if isinstance(data, Panel): - data = data.copy() - - data, cat_mapping = self._convert_x(data) - - if not isinstance(data, Panel): - data = Panel.from_dict(data, intersect=True) - - x_names = data.items - - if self._weights is not None: - data['__weights__'] = self._weights - - # Filter x's without y (so we can make a prediction) - filtered = data.to_frame() - - # Filter all data together using to_frame - - # convert to DataFrame - y = self._y_orig - if isinstance(y, Series): - y = y.unstack() - - data['__y__'] = y - data_long = data.to_frame() - - x_filt = filtered.filter(x_names) - x = data_long.filter(x_names) - y = data_long['__y__'] - - if self._weights is not None and not self._weights.empty: - weights = data_long['__weights__'] - else: - weights = None - - return x, x_filt, y, weights, cat_mapping - - def _convert_x(self, x): - # 
Converts non-numeric data in x to floats. x_converted is the - # DataFrame with converted values, and x_conversion is a dict that - # provides the reverse mapping. For example, if 'A' was converted to 0 - # for x named 'variety', then x_conversion['variety'][0] is 'A'. - x_converted = {} - cat_mapping = {} - # x can be either a dict or a Panel, but in Python 3, dicts don't have - # .iteritems - iteritems = getattr(x, 'iteritems', x.items) - for key, df in iteritems(): - if not isinstance(df, DataFrame): - raise AssertionError("all input items must be DataFrames, " - "at least one is of " - "type {0}".format(type(df))) - - if _is_numeric(df): - x_converted[key] = df - else: - try: - df = df.astype(float) - except (TypeError, ValueError): - values = df.values - distinct_values = sorted(set(values.flat)) - cat_mapping[key] = dict(enumerate(distinct_values)) - new_values = np.searchsorted(distinct_values, values) - x_converted[key] = DataFrame(new_values, index=df.index, - columns=df.columns) - - if len(cat_mapping) == 0: - x_converted = x - - return x_converted, cat_mapping - - def _add_dummies(self, panel, mapping): - """ - Add entity and / or categorical dummies to input X DataFrame - - Returns - ------- - DataFrame - """ - panel = self._add_entity_effects(panel) - panel = self._add_categorical_dummies(panel, mapping) - - return panel - - def _add_entity_effects(self, panel): - """ - Add entity dummies to panel - - Returns - ------- - DataFrame - """ - from pandas.core.reshape import make_axis_dummies - - if not self._entity_effects: - return panel - - self.log('-- Adding entity fixed effect dummies') - - dummies = make_axis_dummies(panel, 'minor') - - if not self._use_all_dummies: - if 'entity' in self._dropped_dummies: - to_exclude = str(self._dropped_dummies.get('entity')) - else: - to_exclude = dummies.columns[0] - - if to_exclude not in dummies.columns: - raise Exception('%s not in %s' % (to_exclude, - dummies.columns)) - - self.log('-- Excluding dummy for entity: %s' % to_exclude) - - dummies = dummies.filter(dummies.columns.difference([to_exclude])) - - dummies = dummies.add_prefix('FE_') - panel = panel.join(dummies) - - return panel - - def _add_categorical_dummies(self, panel, cat_mappings): - """ - Add categorical dummies to panel - - Returns - ------- - DataFrame - """ - if not self._x_effects: - return panel - - dropped_dummy = (self._entity_effects and not self._use_all_dummies) - - for effect in self._x_effects: - self.log('-- Adding fixed effect dummies for %s' % effect) - - dummies = get_dummies(panel[effect]) - - val_map = cat_mappings.get(effect) - if val_map: - val_map = dict((v, k) for k, v in compat.iteritems(val_map)) - - if dropped_dummy or not self._use_all_dummies: - if effect in self._dropped_dummies: - to_exclude = mapped_name = self._dropped_dummies.get( - effect) - - if val_map: - mapped_name = val_map[to_exclude] - else: - to_exclude = mapped_name = dummies.columns[0] - - if mapped_name not in dummies.columns: # pragma: no cover - raise Exception('%s not in %s' % (to_exclude, - dummies.columns)) - - self.log( - '-- Excluding dummy for %s: %s' % (effect, to_exclude)) - - dummies = dummies.filter( - dummies.columns.difference([mapped_name])) - dropped_dummy = True - - dummies = _convertDummies(dummies, cat_mappings.get(effect)) - dummies = dummies.add_prefix('%s_' % effect) - panel = panel.join(dummies) - - return panel - - @property - def _use_all_dummies(self): - """ - In the case of using an intercept or including time fixed - effects, completely partitioning 
the sample would make the X - not full rank. - """ - return (not self._intercept and not self._time_effects) - - @cache_readonly - def _beta_raw(self): - """Runs the regression and returns the beta.""" - X = self._x_trans.values - Y = self._y_trans.values.squeeze() - - beta, _, _, _ = np.linalg.lstsq(X, Y) - - return beta - - @cache_readonly - def beta(self): - return Series(self._beta_raw, index=self._x.columns) - - @cache_readonly - def _df_model_raw(self): - """Returns the raw model degrees of freedom.""" - return self._df_raw - 1 - - @cache_readonly - def _df_resid_raw(self): - """Returns the raw residual degrees of freedom.""" - return self._nobs - self._df_raw - - @cache_readonly - def _df_raw(self): - """Returns the degrees of freedom.""" - df = math.rank(self._x_trans.values) - if self._time_effects: - df += self._total_times - - return df - - @cache_readonly - def _r2_raw(self): - Y = self._y_trans.values.squeeze() - X = self._x_trans.values - - resid = Y - np.dot(X, self._beta_raw) - - SSE = (resid ** 2).sum() - - if self._use_centered_tss: - SST = ((Y - np.mean(Y)) ** 2).sum() - else: - SST = (Y ** 2).sum() - - return 1 - SSE / SST - - @property - def _use_centered_tss(self): - # has_intercept = np.abs(self._resid_raw.sum()) < _FP_ERR - return self._intercept or self._entity_effects or self._time_effects - - @cache_readonly - def _r2_adj_raw(self): - """Returns the raw r-squared adjusted values.""" - nobs = self._nobs - factors = (nobs - 1) / (nobs - self._df_raw) - return 1 - (1 - self._r2_raw) * factors - - @cache_readonly - def _resid_raw(self): - Y = self._y.values.squeeze() - X = self._x.values - return Y - np.dot(X, self._beta_raw) - - @cache_readonly - def resid(self): - return self._unstack_vector(self._resid_raw) - - @cache_readonly - def _rmse_raw(self): - """Returns the raw rmse values.""" - # X = self._x.values - # Y = self._y.values.squeeze() - - X = self._x_trans.values - Y = self._y_trans.values.squeeze() - - resid = Y - np.dot(X, self._beta_raw) - ss = (resid ** 2).sum() - return np.sqrt(ss / (self._nobs - self._df_raw)) - - @cache_readonly - def _var_beta_raw(self): - cluster_axis = None - if self._cluster == 'time': - cluster_axis = 0 - elif self._cluster == 'entity': - cluster_axis = 1 - - x = self._x - y = self._y - - if self._time_effects: - xx = _xx_time_effects(x, y) - else: - xx = np.dot(x.values.T, x.values) - - return _var_beta_panel(y, x, self._beta_raw, xx, - self._rmse_raw, cluster_axis, self._nw_lags, - self._nobs, self._df_raw, self._nw_overlap) - - @cache_readonly - def _y_fitted_raw(self): - """Returns the raw fitted y values.""" - return np.dot(self._x.values, self._beta_raw) - - @cache_readonly - def y_fitted(self): - return self._unstack_vector(self._y_fitted_raw, index=self._x.index) - - def _unstack_vector(self, vec, index=None): - if index is None: - index = self._y_trans.index - panel = DataFrame(vec, index=index, columns=['dummy']) - return panel.to_panel()['dummy'] - - def _unstack_y(self, vec): - unstacked = self._unstack_vector(vec) - return unstacked.reindex(self.beta.index) - - @cache_readonly - def _time_obs_count(self): - return self._y_trans.count(level=0).values - - @cache_readonly - def _time_has_obs(self): - return self._time_obs_count > 0 - - @property - def _nobs(self): - return len(self._y) - - -def _convertDummies(dummies, mapping): - # cleans up the names of the generated dummies - new_items = [] - for item in dummies.columns: - if not mapping: - var = str(item) - if isinstance(item, float): - var = '%g' % item - - 
new_items.append(var) - else: - # renames the dummies if a conversion dict is provided - new_items.append(mapping[int(item)]) - - dummies = DataFrame(dummies.values, index=dummies.index, - columns=new_items) - - return dummies - - -def _is_numeric(df): - for col in df: - if df[col].dtype.name == 'object': - return False - - return True - - -def add_intercept(panel, name='intercept'): - """ - Add column of ones to input panel - - Parameters - ---------- - panel: Panel / DataFrame - name: string, default 'intercept'] - - Returns - ------- - New object (same type as input) - """ - panel = panel.copy() - panel[name] = 1. - - return panel.consolidate() - - -class MovingPanelOLS(MovingOLS, PanelOLS): - """Implements rolling/expanding panel OLS. - - See ols function docs - """ - _panel_model = True - - def __init__(self, y, x, weights=None, - window_type='expanding', window=None, - min_periods=None, - min_obs=None, - intercept=True, - nw_lags=None, nw_overlap=False, - entity_effects=False, - time_effects=False, - x_effects=None, - cluster=None, - dropped_dummies=None, - verbose=False): - - self._args = dict(intercept=intercept, - nw_lags=nw_lags, - nw_overlap=nw_overlap, - entity_effects=entity_effects, - time_effects=time_effects, - x_effects=x_effects, - cluster=cluster, - dropped_dummies=dropped_dummies, - verbose=verbose) - - PanelOLS.__init__(self, y=y, x=x, weights=weights, - **self._args) - - self._set_window(window_type, window, min_periods) - - if min_obs is None: - min_obs = len(self._x.columns) + 1 - - self._min_obs = min_obs - - @cache_readonly - def resid(self): - return self._unstack_y(self._resid_raw) - - @cache_readonly - def y_fitted(self): - return self._unstack_y(self._y_fitted_raw) - - @cache_readonly - def y_predict(self): - """Returns the predicted y values.""" - return self._unstack_y(self._y_predict_raw) - - def lagged_y_predict(self, lag=1): - """ - Compute forecast Y value lagging coefficient by input number - of time periods - - Parameters - ---------- - lag : int - - Returns - ------- - DataFrame - """ - x = self._x.values - betas = self._beta_matrix(lag=lag) - return self._unstack_y((betas * x).sum(1)) - - @cache_readonly - def _rolling_ols_call(self): - return self._calc_betas(self._x_trans, self._y_trans) - - @cache_readonly - def _df_raw(self): - """Returns the degrees of freedom.""" - df = self._rolling_rank() - - if self._time_effects: - df += self._window_time_obs - - return df[self._valid_indices] - - @cache_readonly - def _var_beta_raw(self): - """Returns the raw covariance of beta.""" - x = self._x - y = self._y - - dates = x.index.levels[0] - - cluster_axis = None - if self._cluster == 'time': - cluster_axis = 0 - elif self._cluster == 'entity': - cluster_axis = 1 - - nobs = self._nobs - rmse = self._rmse_raw - beta = self._beta_raw - df = self._df_raw - window = self._window - - if not self._time_effects: - # Non-transformed X - cum_xx = self._cum_xx(x) - - results = [] - for n, i in enumerate(self._valid_indices): - if self._is_rolling and i >= window: - prior_date = dates[i - window + 1] - else: - prior_date = dates[0] - - date = dates[i] - - x_slice = x.truncate(prior_date, date) - y_slice = y.truncate(prior_date, date) - - if self._time_effects: - xx = _xx_time_effects(x_slice, y_slice) - else: - xx = cum_xx[i] - if self._is_rolling and i >= window: - xx = xx - cum_xx[i - window] - - result = _var_beta_panel(y_slice, x_slice, beta[n], xx, rmse[n], - cluster_axis, self._nw_lags, - nobs[n], df[n], self._nw_overlap) - - results.append(result) - - return 
np.array(results) - - @cache_readonly - def _resid_raw(self): - beta_matrix = self._beta_matrix(lag=0) - - Y = self._y.values.squeeze() - X = self._x.values - resid = Y - (X * beta_matrix).sum(1) - - return resid - - @cache_readonly - def _y_fitted_raw(self): - x = self._x.values - betas = self._beta_matrix(lag=0) - return (betas * x).sum(1) - - @cache_readonly - def _y_predict_raw(self): - """Returns the raw predicted y values.""" - x = self._x.values - betas = self._beta_matrix(lag=1) - return (betas * x).sum(1) - - def _beta_matrix(self, lag=0): - if lag < 0: - raise AssertionError("'lag' must be greater than or equal to 0, " - "input was {0}".format(lag)) - - index = self._y_trans.index - major_labels = index.labels[0] - labels = major_labels - lag - indexer = self._valid_indices.searchsorted(labels, side='left') - - beta_matrix = self._beta_raw[indexer] - beta_matrix[labels < self._valid_indices[0]] = np.NaN - - return beta_matrix - - @cache_readonly - def _enough_obs(self): - # XXX: what's the best way to determine where to start? - # TODO: write unit tests for this - - rank_threshold = len(self._x.columns) + 1 - if self._min_obs < rank_threshold: # pragma: no cover - warnings.warn('min_obs is smaller than rank of X matrix') - - enough_observations = self._nobs_raw >= self._min_obs - enough_time_periods = self._window_time_obs >= self._min_periods - return enough_time_periods & enough_observations - - -def create_ols_dict(attr): - def attr_getter(self): - d = {} - for k, v in compat.iteritems(self.results): - result = getattr(v, attr) - d[k] = result - - return d - - return attr_getter - - -def create_ols_attr(attr): - return property(create_ols_dict(attr)) - - -class NonPooledPanelOLS(object): - """Implements non-pooled panel OLS. - - Parameters - ---------- - y : DataFrame - x : Series, DataFrame, or dict of Series - intercept : bool - True if you want an intercept. - nw_lags : None or int - Number of Newey-West lags. - window_type : {'full_sample', 'rolling', 'expanding'} - 'full_sample' by default - window : int - size of window (for rolling/expanding OLS) - """ - - ATTRIBUTES = [ - 'beta', - 'df', - 'df_model', - 'df_resid', - 'f_stat', - 'p_value', - 'r2', - 'r2_adj', - 'resid', - 'rmse', - 'std_err', - 'summary_as_matrix', - 't_stat', - 'var_beta', - 'x', - 'y', - 'y_fitted', - 'y_predict' - ] - - def __init__(self, y, x, window_type='full_sample', window=None, - min_periods=None, intercept=True, nw_lags=None, - nw_overlap=False): - - import warnings - warnings.warn("The pandas.stats.plm module is deprecated and will be " - "removed in a future version. 
We refer to external packages " - "like statsmodels, see some examples here: " - "http://www.statsmodels.org/stable/mixed_linear.html", - FutureWarning, stacklevel=4) - - for attr in self.ATTRIBUTES: - setattr(self.__class__, attr, create_ols_attr(attr)) - - results = {} - - for entity in y: - entity_y = y[entity] - - entity_x = {} - for x_var in x: - entity_x[x_var] = x[x_var][entity] - - from pandas.stats.interface import ols - results[entity] = ols(y=entity_y, - x=entity_x, - window_type=window_type, - window=window, - min_periods=min_periods, - intercept=intercept, - nw_lags=nw_lags, - nw_overlap=nw_overlap) - - self.results = results - - -def _var_beta_panel(y, x, beta, xx, rmse, cluster_axis, - nw_lags, nobs, df, nw_overlap): - xx_inv = math.inv(xx) - - yv = y.values - - if cluster_axis is None: - if nw_lags is None: - return xx_inv * (rmse ** 2) - else: - resid = yv - np.dot(x.values, beta) - m = (x.values.T * resid).T - - xeps = math.newey_west(m, nw_lags, nobs, df, nw_overlap) - - return np.dot(xx_inv, np.dot(xeps, xx_inv)) - else: - Xb = np.dot(x.values, beta).reshape((len(x.values), 1)) - resid = DataFrame(yv[:, None] - Xb, index=y.index, columns=['resid']) - - if cluster_axis == 1: - x = x.swaplevel(0, 1).sort_index(level=0) - resid = resid.swaplevel(0, 1).sort_index(level=0) - - m = _group_agg(x.values * resid.values, x.index._bounds, - lambda x: np.sum(x, axis=0)) - - if nw_lags is None: - nw_lags = 0 - - xox = 0 - for i in range(len(x.index.levels[0])): - xox += math.newey_west(m[i: i + 1], nw_lags, - nobs, df, nw_overlap) - - return np.dot(xx_inv, np.dot(xox, xx_inv)) - - -def _group_agg(values, bounds, f): - """ - R-style aggregator - - Parameters - ---------- - values : N-length or N x K ndarray - bounds : B-length ndarray - f : ndarray aggregation function - - Returns - ------- - ndarray with same length as bounds array - """ - if values.ndim == 1: - N = len(values) - result = np.empty(len(bounds), dtype=float) - elif values.ndim == 2: - N, K = values.shape - result = np.empty((len(bounds), K), dtype=float) - - testagg = f(values[:min(1, len(values))]) - if isinstance(testagg, np.ndarray) and testagg.ndim == 2: - raise AssertionError('Function must reduce') - - for i, left_bound in enumerate(bounds): - if i == len(bounds) - 1: - right_bound = N - else: - right_bound = bounds[i + 1] - - result[i] = f(values[left_bound:right_bound]) - - return result - - -def _xx_time_effects(x, y): - """ - Returns X'X - (X'T) (T'T)^-1 (T'X) - """ - # X'X - xx = np.dot(x.values.T, x.values) - xt = x.sum(level=0).values - - count = y.unstack().count(1).values - selector = count > 0 - - # X'X - (T'T)^-1 (T'X) - xt = xt[selector] - count = count[selector] - - return xx - np.dot(xt.T / count, xt) diff --git a/pandas/stats/tests/common.py b/pandas/stats/tests/common.py deleted file mode 100644 index 0ce4b20a4b719..0000000000000 --- a/pandas/stats/tests/common.py +++ /dev/null @@ -1,162 +0,0 @@ -# pylint: disable-msg=W0611,W0402 -# flake8: noqa - -from datetime import datetime -import string -import nose - -import numpy as np - -from pandas import DataFrame, bdate_range -from pandas.util.testing import assert_almost_equal # imported in other tests -import pandas.util.testing as tm - -N = 100 -K = 4 - -start = datetime(2007, 1, 1) -DATE_RANGE = bdate_range(start, periods=N) - -COLS = ['Col' + c for c in string.ascii_uppercase[:K]] - - -def makeDataFrame(): - data = DataFrame(np.random.randn(N, K), - columns=COLS, - index=DATE_RANGE) - - return data - - -def getBasicDatasets(): - A = 
makeDataFrame() - B = makeDataFrame() - C = makeDataFrame() - - return A, B, C - - -def check_for_scipy(): - try: - import scipy - except ImportError: - raise nose.SkipTest('no scipy') - - -def check_for_statsmodels(): - _have_statsmodels = True - try: - import statsmodels.api as sm - except ImportError: - try: - import scikits.statsmodels.api as sm - except ImportError: - raise nose.SkipTest('no statsmodels') - - -class BaseTest(tm.TestCase): - - def setUp(self): - check_for_scipy() - check_for_statsmodels() - - self.A, self.B, self.C = getBasicDatasets() - - self.createData1() - self.createData2() - self.createData3() - - def createData1(self): - date = datetime(2007, 1, 1) - date2 = datetime(2007, 1, 15) - date3 = datetime(2007, 1, 22) - - A = self.A.copy() - B = self.B.copy() - C = self.C.copy() - - A['ColA'][date] = np.NaN - B['ColA'][date] = np.NaN - C['ColA'][date] = np.NaN - C['ColA'][date2] = np.NaN - - # truncate data to save time - A = A[:30] - B = B[:30] - C = C[:30] - - self.panel_y = A - self.panel_x = {'B': B, 'C': C} - - self.series_panel_y = A.filter(['ColA']) - self.series_panel_x = {'B': B.filter(['ColA']), - 'C': C.filter(['ColA'])} - self.series_y = A['ColA'] - self.series_x = {'B': B['ColA'], - 'C': C['ColA']} - - def createData2(self): - y_data = [[1, np.NaN], - [2, 3], - [4, 5]] - y_index = [datetime(2000, 1, 1), - datetime(2000, 1, 2), - datetime(2000, 1, 3)] - y_cols = ['A', 'B'] - self.panel_y2 = DataFrame(np.array(y_data), index=y_index, - columns=y_cols) - - x1_data = [[6, np.NaN], - [7, 8], - [9, 30], - [11, 12]] - x1_index = [datetime(2000, 1, 1), - datetime(2000, 1, 2), - datetime(2000, 1, 3), - datetime(2000, 1, 4)] - x1_cols = ['A', 'B'] - x1 = DataFrame(np.array(x1_data), index=x1_index, - columns=x1_cols) - - x2_data = [[13, 14, np.NaN], - [15, np.NaN, np.NaN], - [16, 17, 48], - [19, 20, 21], - [22, 23, 24]] - x2_index = [datetime(2000, 1, 1), - datetime(2000, 1, 2), - datetime(2000, 1, 3), - datetime(2000, 1, 4), - datetime(2000, 1, 5)] - x2_cols = ['C', 'A', 'B'] - x2 = DataFrame(np.array(x2_data), index=x2_index, - columns=x2_cols) - - self.panel_x2 = {'x1': x1, 'x2': x2} - - def createData3(self): - y_data = [[1, 2], - [3, 4]] - y_index = [datetime(2000, 1, 1), - datetime(2000, 1, 2)] - y_cols = ['A', 'B'] - self.panel_y3 = DataFrame(np.array(y_data), index=y_index, - columns=y_cols) - - x1_data = [['A', 'B'], - ['C', 'A']] - x1_index = [datetime(2000, 1, 1), - datetime(2000, 1, 2)] - x1_cols = ['A', 'B'] - x1 = DataFrame(np.array(x1_data), index=x1_index, - columns=x1_cols) - - x2_data = [['foo', 'bar'], - ['baz', 'foo']] - x2_index = [datetime(2000, 1, 1), - datetime(2000, 1, 2)] - x2_cols = ['A', 'B'] - x2 = DataFrame(np.array(x2_data), index=x2_index, - columns=x2_cols) - - self.panel_x3 = {'x1': x1, 'x2': x2} diff --git a/pandas/stats/tests/test_fama_macbeth.py b/pandas/stats/tests/test_fama_macbeth.py deleted file mode 100644 index 706becfa730c4..0000000000000 --- a/pandas/stats/tests/test_fama_macbeth.py +++ /dev/null @@ -1,73 +0,0 @@ -# flake8: noqa - -from pandas import DataFrame, Panel -from pandas.stats.api import fama_macbeth -from .common import assert_almost_equal, BaseTest - -from pandas.compat import range -from pandas import compat -import pandas.util.testing as tm -import numpy as np - - -class TestFamaMacBeth(BaseTest): - - def testFamaMacBethRolling(self): - # self.checkFamaMacBethExtended('rolling', self.panel_x, self.panel_y, - # nw_lags_beta=2) - - # df = DataFrame(np.random.randn(50, 10)) - x = dict((k, 
DataFrame(np.random.randn(50, 10))) for k in 'abcdefg') - x = Panel.from_dict(x) - y = (DataFrame(np.random.randn(50, 10)) + - DataFrame(0.01 * np.random.randn(50, 10))) - self.checkFamaMacBethExtended('rolling', x, y, nw_lags_beta=2) - self.checkFamaMacBethExtended('expanding', x, y, nw_lags_beta=2) - - def checkFamaMacBethExtended(self, window_type, x, y, **kwds): - window = 25 - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = fama_macbeth(y=y, x=x, window_type=window_type, window=window, - **kwds) - self._check_stuff_works(result) - - index = result._index - time = len(index) - - for i in range(time - window + 1): - if window_type == 'rolling': - start = index[i] - else: - start = index[0] - - end = index[i + window - 1] - - x2 = {} - for k, v in x.iteritems(): - x2[k] = v.truncate(start, end) - y2 = y.truncate(start, end) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - reference = fama_macbeth(y=y2, x=x2, **kwds) - # reference._stats is tuple - assert_almost_equal(reference._stats, result._stats[:, i], - check_dtype=False) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - static = fama_macbeth(y=y2, x=x2, **kwds) - self._check_stuff_works(static) - - def _check_stuff_works(self, result): - # does it work? - attrs = ['mean_beta', 'std_beta', 't_stat'] - for attr in attrs: - getattr(result, attr) - - # does it work? - result.summary - -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/stats/tests/test_math.py b/pandas/stats/tests/test_math.py deleted file mode 100644 index bc09f33d2f467..0000000000000 --- a/pandas/stats/tests/test_math.py +++ /dev/null @@ -1,63 +0,0 @@ -import nose - -from datetime import datetime -from numpy.random import randn -import numpy as np - -from pandas.core.api import Series, DataFrame, date_range -import pandas.util.testing as tm -import pandas.stats.math as pmath -from pandas import ols - -N, K = 100, 10 - -_have_statsmodels = True -try: - import statsmodels.api as sm -except ImportError: - try: - import scikits.statsmodels.api as sm # noqa - except ImportError: - _have_statsmodels = False - - -class TestMath(tm.TestCase): - - _nan_locs = np.arange(20, 40) - _inf_locs = np.array([]) - - def setUp(self): - arr = randn(N) - arr[self._nan_locs] = np.NaN - - self.arr = arr - self.rng = date_range(datetime(2009, 1, 1), periods=N) - - self.series = Series(arr.copy(), index=self.rng) - - self.frame = DataFrame(randn(N, K), index=self.rng, - columns=np.arange(K)) - - def test_rank_1d(self): - self.assertEqual(1, pmath.rank(self.series)) - self.assertEqual(0, pmath.rank(Series(0, self.series.index))) - - def test_solve_rect(self): - if not _have_statsmodels: - raise nose.SkipTest("no statsmodels") - - b = Series(np.random.randn(N), self.frame.index) - result = pmath.solve(self.frame, b) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - expected = ols(y=b, x=self.frame, intercept=False).beta - self.assertTrue(np.allclose(result, expected)) - - def test_inv_illformed(self): - singular = DataFrame(np.array([[1, 1], [2, 2]])) - rs = pmath.inv(singular) - expected = np.array([[0.1, 0.2], [0.1, 0.2]]) - self.assertTrue(np.allclose(rs, expected)) - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/stats/tests/test_ols.py b/pandas/stats/tests/test_ols.py deleted file mode 
100644 index 2935f986cca9f..0000000000000 --- a/pandas/stats/tests/test_ols.py +++ /dev/null @@ -1,982 +0,0 @@ -""" -Unit test suite for OLS and PanelOLS classes -""" - -# pylint: disable-msg=W0212 - -# flake8: noqa - -from __future__ import division - -from datetime import datetime -from pandas import compat -from distutils.version import LooseVersion -import nose -import numpy as np - -from pandas import date_range, bdate_range -from pandas.core.panel import Panel -from pandas import DataFrame, Index, Series, notnull, offsets -from pandas.stats.api import ols -from pandas.stats.ols import _filter_data -from pandas.stats.plm import NonPooledPanelOLS, PanelOLS -from pandas.util.testing import (assert_almost_equal, assert_series_equal, - assert_frame_equal, assertRaisesRegexp, slow) -import pandas.util.testing as tm -import pandas.compat as compat -from pandas.stats.tests.common import BaseTest - -_have_statsmodels = True -try: - import statsmodels.api as sm -except ImportError: - try: - import scikits.statsmodels.api as sm - except ImportError: - _have_statsmodels = False - - -def _check_repr(obj): - repr(obj) - str(obj) - - -def _compare_ols_results(model1, model2): - tm.assertIsInstance(model1, type(model2)) - - if hasattr(model1, '_window_type'): - _compare_moving_ols(model1, model2) - else: - _compare_fullsample_ols(model1, model2) - - -def _compare_fullsample_ols(model1, model2): - assert_series_equal(model1.beta, model2.beta) - - -def _compare_moving_ols(model1, model2): - assert_frame_equal(model1.beta, model2.beta) - - -class TestOLS(BaseTest): - - _multiprocess_can_split_ = True - - # TODO: Add tests for OLS y predict - # TODO: Right now we just check for consistency between full-sample and - # rolling/expanding results of the panel OLS. We should also cross-check - # with trusted implementations of panel OLS (e.g. R). - # TODO: Add tests for non pooled OLS. 
- - @classmethod - def setUpClass(cls): - super(TestOLS, cls).setUpClass() - try: - import matplotlib as mpl - mpl.use('Agg', warn=False) - except ImportError: - pass - - if not _have_statsmodels: - raise nose.SkipTest("no statsmodels") - - def testOLSWithDatasets_ccard(self): - self.checkDataSet(sm.datasets.ccard.load(), skip_moving=True) - self.checkDataSet(sm.datasets.cpunish.load(), skip_moving=True) - self.checkDataSet(sm.datasets.longley.load(), skip_moving=True) - self.checkDataSet(sm.datasets.stackloss.load(), skip_moving=True) - - @slow - def testOLSWithDatasets_copper(self): - self.checkDataSet(sm.datasets.copper.load()) - - @slow - def testOLSWithDatasets_scotland(self): - self.checkDataSet(sm.datasets.scotland.load()) - - # degenerate case fails on some platforms - # self.checkDataSet(datasets.ccard.load(), 39, 49) # one col in X all - # 0s - - def testWLS(self): - # WLS centered SS changed (fixed) in 0.5.0 - sm_version = sm.version.version - if sm_version < LooseVersion('0.5.0'): - raise nose.SkipTest("WLS centered SS not fixed in statsmodels" - " version {0}".format(sm_version)) - - X = DataFrame(np.random.randn(30, 4), columns=['A', 'B', 'C', 'D']) - Y = Series(np.random.randn(30)) - weights = X.std(1) - - self._check_wls(X, Y, weights) - - weights.loc[[5, 15]] = np.nan - Y[[2, 21]] = np.nan - self._check_wls(X, Y, weights) - - def _check_wls(self, x, y, weights): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = ols(y=y, x=x, weights=1 / weights) - - combined = x.copy() - combined['__y__'] = y - combined['__weights__'] = weights - combined = combined.dropna() - - endog = combined.pop('__y__').values - aweights = combined.pop('__weights__').values - exog = sm.add_constant(combined.values, prepend=False) - - sm_result = sm.WLS(endog, exog, weights=1 / aweights).fit() - - assert_almost_equal(sm_result.params, result._beta_raw) - assert_almost_equal(sm_result.resid, result._resid_raw) - - self.checkMovingOLS('rolling', x, y, weights=weights) - self.checkMovingOLS('expanding', x, y, weights=weights) - - def checkDataSet(self, dataset, start=None, end=None, skip_moving=False): - exog = dataset.exog[start: end] - endog = dataset.endog[start: end] - x = DataFrame(exog, index=np.arange(exog.shape[0]), - columns=np.arange(exog.shape[1])) - y = Series(endog, index=np.arange(len(endog))) - - self.checkOLS(exog, endog, x, y) - - if not skip_moving: - self.checkMovingOLS('rolling', x, y) - self.checkMovingOLS('rolling', x, y, nw_lags=0) - self.checkMovingOLS('expanding', x, y, nw_lags=0) - self.checkMovingOLS('rolling', x, y, nw_lags=1) - self.checkMovingOLS('expanding', x, y, nw_lags=1) - self.checkMovingOLS('expanding', x, y, nw_lags=1, nw_overlap=True) - - def checkOLS(self, exog, endog, x, y): - reference = sm.OLS(endog, sm.add_constant(exog, prepend=False)).fit() - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = ols(y=y, x=x) - - # check that sparse version is the same - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - sparse_result = ols(y=y.to_sparse(), x=x.to_sparse()) - _compare_ols_results(result, sparse_result) - - assert_almost_equal(reference.params, result._beta_raw) - assert_almost_equal(reference.df_model, result._df_model_raw) - assert_almost_equal(reference.df_resid, result._df_resid_raw) - assert_almost_equal(reference.fvalue, result._f_stat_raw[0]) - assert_almost_equal(reference.pvalues, result._p_value_raw) - assert_almost_equal(reference.rsquared, result._r2_raw) - 
assert_almost_equal(reference.rsquared_adj, result._r2_adj_raw) - assert_almost_equal(reference.resid, result._resid_raw) - assert_almost_equal(reference.bse, result._std_err_raw) - assert_almost_equal(reference.tvalues, result._t_stat_raw) - assert_almost_equal(reference.cov_params(), result._var_beta_raw) - assert_almost_equal(reference.fittedvalues, result._y_fitted_raw) - - _check_non_raw_results(result) - - def checkMovingOLS(self, window_type, x, y, weights=None, **kwds): - window = np.linalg.matrix_rank(x.values) * 2 - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - moving = ols(y=y, x=x, weights=weights, window_type=window_type, - window=window, **kwds) - - # check that sparse version is the same - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - sparse_moving = ols(y=y.to_sparse(), x=x.to_sparse(), - weights=weights, - window_type=window_type, - window=window, **kwds) - _compare_ols_results(moving, sparse_moving) - - index = moving._index - - for n, i in enumerate(moving._valid_indices): - if window_type == 'rolling' and i >= window: - prior_date = index[i - window + 1] - else: - prior_date = index[0] - - date = index[i] - - x_iter = {} - for k, v in compat.iteritems(x): - x_iter[k] = v.truncate(before=prior_date, after=date) - y_iter = y.truncate(before=prior_date, after=date) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - static = ols(y=y_iter, x=x_iter, weights=weights, **kwds) - - self.compare(static, moving, event_index=i, - result_index=n) - - _check_non_raw_results(moving) - - FIELDS = ['beta', 'df', 'df_model', 'df_resid', 'f_stat', 'p_value', - 'r2', 'r2_adj', 'rmse', 'std_err', 't_stat', - 'var_beta'] - - def compare(self, static, moving, event_index=None, - result_index=None): - - index = moving._index - - # Check resid if we have a time index specified - if event_index is not None: - ref = static._resid_raw[-1] - - label = index[event_index] - - res = moving.resid[label] - - assert_almost_equal(ref, res) - - ref = static._y_fitted_raw[-1] - res = moving.y_fitted[label] - - assert_almost_equal(ref, res) - - # Check y_fitted - - for field in self.FIELDS: - attr = '_%s_raw' % field - - ref = getattr(static, attr) - res = getattr(moving, attr) - - if result_index is not None: - res = res[result_index] - - assert_almost_equal(ref, res) - - def test_ols_object_dtype(self): - df = DataFrame(np.random.randn(20, 2), dtype=object) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model = ols(y=df[0], x=df[1]) - summary = repr(model) - - -class TestOLSMisc(tm.TestCase): - - _multiprocess_can_split_ = True - - """ - For test coverage with faux data - """ - @classmethod - def setUpClass(cls): - super(TestOLSMisc, cls).setUpClass() - if not _have_statsmodels: - raise nose.SkipTest("no statsmodels") - - def test_f_test(self): - x = tm.makeTimeDataFrame() - y = x.pop('A') - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model = ols(y=y, x=x) - - hyp = '1*B+1*C+1*D=0' - result = model.f_test(hyp) - - hyp = ['1*B=0', - '1*C=0', - '1*D=0'] - result = model.f_test(hyp) - assert_almost_equal(result['f-stat'], model.f_stat['f-stat']) - - self.assertRaises(Exception, model.f_test, '1*A=0') - - def test_r2_no_intercept(self): - y = tm.makeTimeSeries() - x = tm.makeTimeDataFrame() - - x_with = x.copy() - x_with['intercept'] = 1. 
- - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model1 = ols(y=y, x=x) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model2 = ols(y=y, x=x_with, intercept=False) - assert_series_equal(model1.beta, model2.beta) - - # TODO: can we infer whether the intercept is there... - self.assertNotEqual(model1.r2, model2.r2) - - # rolling - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model1 = ols(y=y, x=x, window=20) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model2 = ols(y=y, x=x_with, window=20, intercept=False) - assert_frame_equal(model1.beta, model2.beta) - self.assertTrue((model1.r2 != model2.r2).all()) - - def test_summary_many_terms(self): - x = DataFrame(np.random.randn(100, 20)) - y = np.random.randn(100) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model = ols(y=y, x=x) - model.summary - - def test_y_predict(self): - y = tm.makeTimeSeries() - x = tm.makeTimeDataFrame() - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model1 = ols(y=y, x=x) - assert_series_equal(model1.y_predict, model1.y_fitted) - assert_almost_equal(model1._y_predict_raw, model1._y_fitted_raw) - - def test_predict(self): - y = tm.makeTimeSeries() - x = tm.makeTimeDataFrame() - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model1 = ols(y=y, x=x) - assert_series_equal(model1.predict(), model1.y_predict) - assert_series_equal(model1.predict(x=x), model1.y_predict) - assert_series_equal(model1.predict(beta=model1.beta), model1.y_predict) - - exog = x.copy() - exog['intercept'] = 1. - rs = Series(np.dot(exog.values, model1.beta.values), x.index) - assert_series_equal(model1.y_predict, rs) - - x2 = x.reindex(columns=x.columns[::-1]) - assert_series_equal(model1.predict(x=x2), model1.y_predict) - - x3 = x2 + 10 - pred3 = model1.predict(x=x3) - x3['intercept'] = 1. 
- x3 = x3.reindex(columns=model1.beta.index) - expected = Series(np.dot(x3.values, model1.beta.values), x3.index) - assert_series_equal(expected, pred3) - - beta = Series(0., model1.beta.index) - pred4 = model1.predict(beta=beta) - assert_series_equal(Series(0., pred4.index), pred4) - - def test_predict_longer_exog(self): - exogenous = {"1998": "4760", "1999": "5904", "2000": "4504", - "2001": "9808", "2002": "4241", "2003": "4086", - "2004": "4687", "2005": "7686", "2006": "3740", - "2007": "3075", "2008": "3753", "2009": "4679", - "2010": "5468", "2011": "7154", "2012": "4292", - "2013": "4283", "2014": "4595", "2015": "9194", - "2016": "4221", "2017": "4520"} - endogenous = {"1998": "691", "1999": "1580", "2000": "80", - "2001": "1450", "2002": "555", "2003": "956", - "2004": "877", "2005": "614", "2006": "468", - "2007": "191"} - - endog = Series(endogenous) - exog = Series(exogenous) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model = ols(y=endog, x=exog) - - pred = model.y_predict - self.assert_index_equal(pred.index, exog.index) - - def test_longpanel_series_combo(self): - wp = tm.makePanel() - lp = wp.to_frame() - - y = lp.pop('ItemA') - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model = ols(y=y, x=lp, entity_effects=True, window=20) - self.assertTrue(notnull(model.beta.values).all()) - tm.assertIsInstance(model, PanelOLS) - model.summary - - def test_series_rhs(self): - y = tm.makeTimeSeries() - x = tm.makeTimeSeries() - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model = ols(y=y, x=x) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - expected = ols(y=y, x={'x': x}) - assert_series_equal(model.beta, expected.beta) - - # GH 5233/5250 - assert_series_equal(model.y_predict, model.predict(x=x)) - - def test_various_attributes(self): - # just make sure everything "works". 
test correctness elsewhere - - x = DataFrame(np.random.randn(100, 5)) - y = np.random.randn(100) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model = ols(y=y, x=x, window=20) - - series_attrs = ['rank', 'df', 'forecast_mean', 'forecast_vol'] - - for attr in series_attrs: - value = getattr(model, attr) - tm.assertIsInstance(value, Series) - - # works - model._results - - def test_catch_regressor_overlap(self): - df1 = tm.makeTimeDataFrame().loc[:, ['A', 'B']] - df2 = tm.makeTimeDataFrame().loc[:, ['B', 'C', 'D']] - y = tm.makeTimeSeries() - - data = {'foo': df1, 'bar': df2} - - def f(): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - ols(y=y, x=data) - self.assertRaises(Exception, f) - - def test_plm_ctor(self): - y = tm.makeTimeDataFrame() - x = {'a': tm.makeTimeDataFrame(), - 'b': tm.makeTimeDataFrame()} - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model = ols(y=y, x=x, intercept=False) - model.summary - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model = ols(y=y, x=Panel(x)) - model.summary - - def test_plm_attrs(self): - y = tm.makeTimeDataFrame() - x = {'a': tm.makeTimeDataFrame(), - 'b': tm.makeTimeDataFrame()} - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - rmodel = ols(y=y, x=x, window=10) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model = ols(y=y, x=x) - model.resid - rmodel.resid - - def test_plm_lagged_y_predict(self): - y = tm.makeTimeDataFrame() - x = {'a': tm.makeTimeDataFrame(), - 'b': tm.makeTimeDataFrame()} - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model = ols(y=y, x=x, window=10) - result = model.lagged_y_predict(2) - - def test_plm_f_test(self): - y = tm.makeTimeDataFrame() - x = {'a': tm.makeTimeDataFrame(), - 'b': tm.makeTimeDataFrame()} - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model = ols(y=y, x=x) - - hyp = '1*a+1*b=0' - result = model.f_test(hyp) - - hyp = ['1*a=0', - '1*b=0'] - result = model.f_test(hyp) - assert_almost_equal(result['f-stat'], model.f_stat['f-stat']) - - def test_plm_exclude_dummy_corner(self): - y = tm.makeTimeDataFrame() - x = {'a': tm.makeTimeDataFrame(), - 'b': tm.makeTimeDataFrame()} - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model = ols( - y=y, x=x, entity_effects=True, dropped_dummies={'entity': 'D'}) - model.summary - - def f(): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - ols(y=y, x=x, entity_effects=True, - dropped_dummies={'entity': 'E'}) - self.assertRaises(Exception, f) - - def test_columns_tuples_summary(self): - # #1837 - X = DataFrame(np.random.randn(10, 2), columns=[('a', 'b'), ('c', 'd')]) - Y = Series(np.random.randn(10)) - - # it works! 
- with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - model = ols(y=Y, x=X) - model.summary - - -class TestPanelOLS(BaseTest): - - _multiprocess_can_split_ = True - - FIELDS = ['beta', 'df', 'df_model', 'df_resid', 'f_stat', - 'p_value', 'r2', 'r2_adj', 'rmse', 'std_err', - 't_stat', 'var_beta'] - - _other_fields = ['resid', 'y_fitted'] - - def testFiltering(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = ols(y=self.panel_y2, x=self.panel_x2) - - x = result._x - index = x.index.get_level_values(0) - index = Index(sorted(set(index))) - exp_index = Index([datetime(2000, 1, 1), datetime(2000, 1, 3)]) - self.assert_index_equal(exp_index, index) - - index = x.index.get_level_values(1) - index = Index(sorted(set(index))) - exp_index = Index(['A', 'B']) - self.assert_index_equal(exp_index, index) - - x = result._x_filtered - index = x.index.get_level_values(0) - index = Index(sorted(set(index))) - exp_index = Index([datetime(2000, 1, 1), - datetime(2000, 1, 3), - datetime(2000, 1, 4)]) - self.assert_index_equal(exp_index, index) - - # .flat is flatiter instance - assert_almost_equal(result._y.values.flat, [1, 4, 5], - check_dtype=False) - - exp_x = np.array([[6, 14, 1], [9, 17, 1], - [30, 48, 1]], dtype=np.float64) - assert_almost_equal(exp_x, result._x.values) - - exp_x_filtered = np.array([[6, 14, 1], [9, 17, 1], [30, 48, 1], - [11, 20, 1], [12, 21, 1]], dtype=np.float64) - assert_almost_equal(exp_x_filtered, result._x_filtered.values) - - self.assert_index_equal(result._x_filtered.index.levels[0], - result.y_fitted.index) - - def test_wls_panel(self): - y = tm.makeTimeDataFrame() - x = Panel({'x1': tm.makeTimeDataFrame(), - 'x2': tm.makeTimeDataFrame()}) - - y.iloc[[1, 7], y.columns.get_loc('A')] = np.nan - y.iloc[[6, 15], y.columns.get_loc('B')] = np.nan - y.iloc[[3, 20], y.columns.get_loc('C')] = np.nan - y.iloc[[5, 11], y.columns.get_loc('D')] = np.nan - - stack_y = y.stack() - stack_x = DataFrame(dict((k, v.stack()) - for k, v in x.iteritems())) - - weights = x.std('items') - stack_weights = weights.stack() - - stack_y.index = stack_y.index._tuple_index - stack_x.index = stack_x.index._tuple_index - stack_weights.index = stack_weights.index._tuple_index - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = ols(y=y, x=x, weights=1 / weights) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - expected = ols(y=stack_y, x=stack_x, weights=1 / stack_weights) - - assert_almost_equal(result.beta, expected.beta) - - for attr in ['resid', 'y_fitted']: - rvals = getattr(result, attr).stack().values - evals = getattr(expected, attr).values - assert_almost_equal(rvals, evals) - - def testWithTimeEffects(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = ols(y=self.panel_y2, x=self.panel_x2, time_effects=True) - - # .flat is flatiter instance - assert_almost_equal(result._y_trans.values.flat, [0, -0.5, 0.5], - check_dtype=False) - - exp_x = np.array([[0, 0], [-10.5, -15.5], [10.5, 15.5]]) - assert_almost_equal(result._x_trans.values, exp_x) - - # _check_non_raw_results(result) - - def testWithEntityEffects(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = ols(y=self.panel_y2, x=self.panel_x2, entity_effects=True) - - # .flat is flatiter instance - assert_almost_equal(result._y.values.flat, [1, 4, 5], - check_dtype=False) - - exp_x = DataFrame([[0., 6., 14., 1.], [0, 9, 17, 1], [1, 30, 48, 1]], - 
index=result._x.index, columns=['FE_B', 'x1', 'x2', - 'intercept'], - dtype=float) - tm.assert_frame_equal(result._x, exp_x.loc[:, result._x.columns]) - # _check_non_raw_results(result) - - def testWithEntityEffectsAndDroppedDummies(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = ols(y=self.panel_y2, x=self.panel_x2, entity_effects=True, - dropped_dummies={'entity': 'B'}) - - # .flat is flatiter instance - assert_almost_equal(result._y.values.flat, [1, 4, 5], - check_dtype=False) - exp_x = DataFrame([[1., 6., 14., 1.], [1, 9, 17, 1], [0, 30, 48, 1]], - index=result._x.index, columns=['FE_A', 'x1', 'x2', - 'intercept'], - dtype=float) - tm.assert_frame_equal(result._x, exp_x.loc[:, result._x.columns]) - # _check_non_raw_results(result) - - def testWithXEffects(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = ols(y=self.panel_y2, x=self.panel_x2, x_effects=['x1']) - - # .flat is flatiter instance - assert_almost_equal(result._y.values.flat, [1, 4, 5], - check_dtype=False) - - res = result._x - exp_x = DataFrame([[0., 0., 14., 1.], [0, 1, 17, 1], [1, 0, 48, 1]], - columns=['x1_30', 'x1_9', 'x2', 'intercept'], - index=res.index, dtype=float) - exp_x[['x1_30', 'x1_9']] = exp_x[['x1_30', 'x1_9']].astype(np.uint8) - assert_frame_equal(res, exp_x.reindex(columns=res.columns)) - - def testWithXEffectsAndDroppedDummies(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = ols(y=self.panel_y2, x=self.panel_x2, x_effects=['x1'], - dropped_dummies={'x1': 30}) - - res = result._x - # .flat is flatiter instance - assert_almost_equal(result._y.values.flat, [1, 4, 5], - check_dtype=False) - exp_x = DataFrame([[1., 0., 14., 1.], [0, 1, 17, 1], [0, 0, 48, 1]], - columns=['x1_6', 'x1_9', 'x2', 'intercept'], - index=res.index, dtype=float) - exp_x[['x1_6', 'x1_9']] = exp_x[['x1_6', 'x1_9']].astype(np.uint8) - - assert_frame_equal(res, exp_x.reindex(columns=res.columns)) - - def testWithXEffectsAndConversion(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = ols(y=self.panel_y3, x=self.panel_x3, - x_effects=['x1', 'x2']) - - # .flat is flatiter instance - assert_almost_equal(result._y.values.flat, [1, 2, 3, 4], - check_dtype=False) - exp_x = np.array([[0, 0, 0, 1, 1], [1, 0, 0, 0, 1], [0, 1, 1, 0, 1], - [0, 0, 0, 1, 1]], dtype=np.float64) - assert_almost_equal(result._x.values, exp_x) - - exp_index = Index(['x1_B', 'x1_C', 'x2_baz', 'x2_foo', 'intercept']) - self.assert_index_equal(exp_index, result._x.columns) - - # _check_non_raw_results(result) - - def testWithXEffectsAndConversionAndDroppedDummies(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = ols(y=self.panel_y3, x=self.panel_x3, x_effects=['x1', 'x2'], - dropped_dummies={'x2': 'foo'}) - # .flat is flatiter instance - assert_almost_equal(result._y.values.flat, [1, 2, 3, 4], - check_dtype=False) - exp_x = np.array([[0, 0, 0, 0, 1], [1, 0, 1, 0, 1], [0, 1, 0, 1, 1], - [0, 0, 0, 0, 1]], dtype=np.float64) - assert_almost_equal(result._x.values, exp_x) - - exp_index = Index(['x1_B', 'x1_C', 'x2_bar', 'x2_baz', 'intercept']) - self.assert_index_equal(exp_index, result._x.columns) - - # _check_non_raw_results(result) - - def testForSeries(self): - self.checkForSeries(self.series_panel_x, self.series_panel_y, - self.series_x, self.series_y) - - self.checkForSeries(self.series_panel_x, self.series_panel_y, - self.series_x, self.series_y, nw_lags=0) - - 
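The entity- and x-effects expectations above hard-code dummy columns such as `FE_B` and `x1_30`, with one level dropped so the dummies stay out of the intercept's span. A minimal sketch of the same encoding with `pd.get_dummies`, on made-up data rather than the removed `plm` internals:

```python
import pandas as pd

# Hypothetical slice of a panel design matrix: one categorical "entity"
# column plus a regressor, mirroring the FE_B expectations above.
df = pd.DataFrame({'entity': ['A', 'B', 'B'], 'x1': [6, 9, 30]})

# drop_first plays the role of dropped_dummies: omitting one level keeps
# the dummy columns from being collinear with the intercept.
dummies = pd.get_dummies(df['entity'], prefix='FE', drop_first=True)
design = pd.concat([dummies, df[['x1']]], axis=1)
design['intercept'] = 1.0
print(design.columns.tolist())  # ['FE_B', 'x1', 'intercept']
```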
self.checkForSeries(self.series_panel_x, self.series_panel_y, - self.series_x, self.series_y, nw_lags=1, - nw_overlap=True) - - def testRolling(self): - self.checkMovingOLS(self.panel_x, self.panel_y) - - def testRollingWithFixedEffects(self): - self.checkMovingOLS(self.panel_x, self.panel_y, - entity_effects=True) - self.checkMovingOLS(self.panel_x, self.panel_y, intercept=False, - entity_effects=True) - - def testRollingWithTimeEffects(self): - self.checkMovingOLS(self.panel_x, self.panel_y, - time_effects=True) - - def testRollingWithNeweyWest(self): - self.checkMovingOLS(self.panel_x, self.panel_y, - nw_lags=1) - - def testRollingWithEntityCluster(self): - self.checkMovingOLS(self.panel_x, self.panel_y, - cluster='entity') - - def testUnknownClusterRaisesValueError(self): - assertRaisesRegexp(ValueError, "Unrecognized cluster.*ridiculous", - self.checkMovingOLS, self.panel_x, self.panel_y, - cluster='ridiculous') - - def testRollingWithTimeEffectsAndEntityCluster(self): - self.checkMovingOLS(self.panel_x, self.panel_y, - time_effects=True, cluster='entity') - - def testRollingWithTimeCluster(self): - self.checkMovingOLS(self.panel_x, self.panel_y, - cluster='time') - - def testRollingWithNeweyWestAndEntityCluster(self): - self.assertRaises(ValueError, self.checkMovingOLS, - self.panel_x, self.panel_y, - nw_lags=1, cluster='entity') - - def testRollingWithNeweyWestAndTimeEffectsAndEntityCluster(self): - self.assertRaises(ValueError, - self.checkMovingOLS, self.panel_x, self.panel_y, - nw_lags=1, cluster='entity', - time_effects=True) - - def testExpanding(self): - self.checkMovingOLS( - self.panel_x, self.panel_y, window_type='expanding') - - def testNonPooled(self): - self.checkNonPooled(y=self.panel_y, x=self.panel_x) - self.checkNonPooled(y=self.panel_y, x=self.panel_x, - window_type='rolling', window=25, min_periods=10) - - def testUnknownWindowType(self): - assertRaisesRegexp(ValueError, "window.*ridiculous", - self.checkNonPooled, y=self.panel_y, x=self.panel_x, - window_type='ridiculous', window=25, min_periods=10) - - def checkNonPooled(self, x, y, **kwds): - # For now, just check that it doesn't crash - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = ols(y=y, x=x, pool=False, **kwds) - - _check_repr(result) - for attr in NonPooledPanelOLS.ATTRIBUTES: - _check_repr(getattr(result, attr)) - - def checkMovingOLS(self, x, y, window_type='rolling', **kwds): - window = 25 # must be larger than rank of x - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - moving = ols(y=y, x=x, window_type=window_type, - window=window, **kwds) - - index = moving._index - - for n, i in enumerate(moving._valid_indices): - if window_type == 'rolling' and i >= window: - prior_date = index[i - window + 1] - else: - prior_date = index[0] - - date = index[i] - - x_iter = {} - for k, v in compat.iteritems(x): - x_iter[k] = v.truncate(before=prior_date, after=date) - y_iter = y.truncate(before=prior_date, after=date) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - static = ols(y=y_iter, x=x_iter, **kwds) - - self.compare(static, moving, event_index=i, - result_index=n) - - _check_non_raw_results(moving) - - def checkForSeries(self, x, y, series_x, series_y, **kwds): - # Consistency check with simple OLS. 
- with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = ols(y=y, x=x, **kwds) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - reference = ols(y=series_y, x=series_x, **kwds) - - self.compare(reference, result) - - def compare(self, static, moving, event_index=None, - result_index=None): - - # Check resid if we have a time index specified - if event_index is not None: - staticSlice = _period_slice(static, -1) - movingSlice = _period_slice(moving, event_index) - - ref = static._resid_raw[staticSlice] - res = moving._resid_raw[movingSlice] - - assert_almost_equal(ref, res) - - ref = static._y_fitted_raw[staticSlice] - res = moving._y_fitted_raw[movingSlice] - - assert_almost_equal(ref, res) - - # Check y_fitted - - for field in self.FIELDS: - attr = '_%s_raw' % field - - ref = getattr(static, attr) - res = getattr(moving, attr) - - if result_index is not None: - res = res[result_index] - - assert_almost_equal(ref, res) - - def test_auto_rolling_window_type(self): - data = tm.makeTimeDataFrame() - y = data.pop('A') - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - window_model = ols(y=y, x=data, window=20, min_periods=10) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - rolling_model = ols(y=y, x=data, window=20, min_periods=10, - window_type='rolling') - - assert_frame_equal(window_model.beta, rolling_model.beta) - - def test_group_agg(self): - from pandas.stats.plm import _group_agg - - values = np.ones((10, 2)) * np.arange(10).reshape((10, 1)) - bounds = np.arange(5) * 2 - f = lambda x: x.mean(axis=0) - - agged = _group_agg(values, bounds, f) - - assert(agged[1][0] == 2.5) - assert(agged[2][0] == 4.5) - - # test a function that doesn't aggregate - f2 = lambda x: np.zeros((2, 2)) - self.assertRaises(Exception, _group_agg, values, bounds, f2) - - -def _check_non_raw_results(model): - _check_repr(model) - _check_repr(model.resid) - _check_repr(model.summary_as_matrix) - _check_repr(model.y_fitted) - _check_repr(model.y_predict) - - -def _period_slice(panelModel, i): - index = panelModel._x_trans.index - period = index.levels[0][i] - - L, R = index.get_major_bounds(period, period) - - return slice(L, R) - - -class TestOLSFilter(tm.TestCase): - - _multiprocess_can_split_ = True - - def setUp(self): - date_index = date_range(datetime(2009, 12, 11), periods=3, - freq=offsets.BDay()) - ts = Series([3, 1, 4], index=date_index) - self.TS1 = ts - - date_index = date_range(datetime(2009, 12, 11), periods=5, - freq=offsets.BDay()) - ts = Series([1, 5, 9, 2, 6], index=date_index) - self.TS2 = ts - - date_index = date_range(datetime(2009, 12, 11), periods=3, - freq=offsets.BDay()) - ts = Series([5, np.nan, 3], index=date_index) - self.TS3 = ts - - date_index = date_range(datetime(2009, 12, 11), periods=5, - freq=offsets.BDay()) - ts = Series([np.nan, 5, 8, 9, 7], index=date_index) - self.TS4 = ts - - data = {'x1': self.TS2, 'x2': self.TS4} - self.DF1 = DataFrame(data=data) - - data = {'x1': self.TS2, 'x2': self.TS4} - self.DICT1 = data - - def testFilterWithSeriesRHS(self): - (lhs, rhs, weights, rhs_pre, - index, valid) = _filter_data(self.TS1, {'x1': self.TS2}, None) - self.tsAssertEqual(self.TS1.astype(np.float64), lhs, check_names=False) - self.tsAssertEqual(self.TS2[:3].astype(np.float64), rhs['x1'], - check_names=False) - self.tsAssertEqual(self.TS2.astype(np.float64), rhs_pre['x1'], - check_names=False) - - def testFilterWithSeriesRHS2(self): - (lhs, rhs, weights, rhs_pre, - index, valid) 
= _filter_data(self.TS2, {'x1': self.TS1}, None) - self.tsAssertEqual(self.TS2[:3].astype(np.float64), lhs, - check_names=False) - self.tsAssertEqual(self.TS1.astype(np.float64), rhs['x1'], - check_names=False) - self.tsAssertEqual(self.TS1.astype(np.float64), rhs_pre['x1'], - check_names=False) - - def testFilterWithSeriesRHS3(self): - (lhs, rhs, weights, rhs_pre, - index, valid) = _filter_data(self.TS3, {'x1': self.TS4}, None) - exp_lhs = self.TS3[2:3] - exp_rhs = self.TS4[2:3] - exp_rhs_pre = self.TS4[1:] - self.tsAssertEqual(exp_lhs, lhs, check_names=False) - self.tsAssertEqual(exp_rhs, rhs['x1'], check_names=False) - self.tsAssertEqual(exp_rhs_pre, rhs_pre['x1'], check_names=False) - - def testFilterWithDataFrameRHS(self): - (lhs, rhs, weights, rhs_pre, - index, valid) = _filter_data(self.TS1, self.DF1, None) - exp_lhs = self.TS1[1:].astype(np.float64) - exp_rhs1 = self.TS2[1:3] - exp_rhs2 = self.TS4[1:3].astype(np.float64) - self.tsAssertEqual(exp_lhs, lhs, check_names=False) - self.tsAssertEqual(exp_rhs1, rhs['x1'], check_names=False) - self.tsAssertEqual(exp_rhs2, rhs['x2'], check_names=False) - - def testFilterWithDictRHS(self): - (lhs, rhs, weights, rhs_pre, - index, valid) = _filter_data(self.TS1, self.DICT1, None) - exp_lhs = self.TS1[1:].astype(np.float64) - exp_rhs1 = self.TS2[1:3].astype(np.float64) - exp_rhs2 = self.TS4[1:3].astype(np.float64) - self.tsAssertEqual(exp_lhs, lhs, check_names=False) - self.tsAssertEqual(exp_rhs1, rhs['x1'], check_names=False) - self.tsAssertEqual(exp_rhs2, rhs['x2'], check_names=False) - - def tsAssertEqual(self, ts1, ts2, **kwargs): - self.assert_series_equal(ts1, ts2, **kwargs) - - -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/stats/tests/test_var.py b/pandas/stats/tests/test_var.py deleted file mode 100644 index 9f2c95a2d3d5c..0000000000000 --- a/pandas/stats/tests/test_var.py +++ /dev/null @@ -1,99 +0,0 @@ -# flake8: noqa - -from __future__ import print_function - -import pandas.util.testing as tm - -from pandas.compat import range -import nose -import unittest - -raise nose.SkipTest('skipping this for now') - -try: - import statsmodels.tsa.var as sm_var - import statsmodels as sm -except ImportError: - import scikits.statsmodels.tsa.var as sm_var - import scikits.statsmodels as sm - - -import pandas.stats.var as _pvar -reload(_pvar) -from pandas.stats.var import VAR - -DECIMAL_6 = 6 -DECIMAL_5 = 5 -DECIMAL_4 = 4 -DECIMAL_3 = 3 -DECIMAL_2 = 2 - - -class CheckVAR(object): - - def test_params(self): - tm.assert_almost_equal(self.res1.params, self.res2.params, DECIMAL_3) - - def test_neqs(self): - tm.assert_numpy_array_equal(self.res1.neqs, self.res2.neqs) - - def test_nobs(self): - tm.assert_numpy_array_equal(self.res1.avobs, self.res2.nobs) - - def test_df_eq(self): - tm.assert_numpy_array_equal(self.res1.df_eq, self.res2.df_eq) - - def test_rmse(self): - results = self.res1.results - for i in range(len(results)): - tm.assert_almost_equal(results[i].mse_resid ** .5, - eval('self.res2.rmse_' + str(i + 1)), - DECIMAL_6) - - def test_rsquared(self): - results = self.res1.results - for i in range(len(results)): - tm.assert_almost_equal(results[i].rsquared, - eval('self.res2.rsquared_' + str(i + 1)), - DECIMAL_3) - - def test_llf(self): - results = self.res1.results - tm.assert_almost_equal(self.res1.llf, self.res2.llf, DECIMAL_2) - for i in range(len(results)): - tm.assert_almost_equal(results[i].llf, - eval('self.res2.llf_' + str(i + 1)), - 
DECIMAL_2) - - def test_aic(self): - tm.assert_almost_equal(self.res1.aic, self.res2.aic) - - def test_bic(self): - tm.assert_almost_equal(self.res1.bic, self.res2.bic) - - def test_hqic(self): - tm.assert_almost_equal(self.res1.hqic, self.res2.hqic) - - def test_fpe(self): - tm.assert_almost_equal(self.res1.fpe, self.res2.fpe) - - def test_detsig(self): - tm.assert_almost_equal(self.res1.detomega, self.res2.detsig) - - def test_bse(self): - tm.assert_almost_equal(self.res1.bse, self.res2.bse, DECIMAL_4) - - -class Foo(object): - - def __init__(self): - data = sm.datasets.macrodata.load() - data = data.data[['realinv', 'realgdp', 'realcons']].view((float, 3)) - data = diff(log(data), axis=0) - self.res1 = VAR2(endog=data).fit(maxlag=2) - from results import results_var - self.res2 = results_var.MacrodataResults() - - -if __name__ == '__main__': - unittest.main() diff --git a/pandas/stats/var.py b/pandas/stats/var.py deleted file mode 100644 index db4028d60f5c8..0000000000000 --- a/pandas/stats/var.py +++ /dev/null @@ -1,605 +0,0 @@ -# flake8: noqa - -from __future__ import division - -from pandas.compat import range, lrange, zip, reduce -from pandas import compat -import numpy as np -from pandas.core.base import StringMixin -from pandas.util.decorators import cache_readonly -from pandas.core.frame import DataFrame -from pandas.core.panel import Panel -from pandas.core.series import Series -import pandas.stats.common as common -from pandas.stats.math import inv -from pandas.stats.ols import _combine_rhs - - -class VAR(StringMixin): - """ - Estimates VAR(p) regression on multivariate time series data - presented in pandas data structures. - - Parameters - ---------- - data : DataFrame or dict of Series - p : lags to include - - """ - - def __init__(self, data, p=1, intercept=True): - import warnings - warnings.warn("The pandas.stats.var module is deprecated and will be " - "removed in a future version. We refer to external packages " - "like statsmodels, see some examples here: " - "http://www.statsmodels.org/stable/vector_ar.html#var", - FutureWarning, stacklevel=4) - - try: - import statsmodels.tsa.vector_ar.api as sm_var - except ImportError: - import scikits.statsmodels.tsa.var as sm_var - - self._data = DataFrame(_combine_rhs(data)) - self._p = p - - self._columns = self._data.columns - self._index = self._data.index - - self._intercept = intercept - - @cache_readonly - def aic(self): - """Returns the Akaike information criterion.""" - return self._ic['aic'] - - @cache_readonly - def bic(self): - """Returns the Bayesian information criterion.""" - return self._ic['bic'] - - @cache_readonly - def beta(self): - """ - Returns a DataFrame, where each column x1 contains the betas - calculated by regressing the x1 column of the VAR input with - the lagged input. - - Returns - ------- - DataFrame - """ - d = dict([(key, value.beta) - for (key, value) in compat.iteritems(self.ols_results)]) - return DataFrame(d) - - def forecast(self, h): - """ - Returns a DataFrame containing the forecasts for 1, 2, ..., n time - steps. Each column x1 contains the forecasts of the x1 column. - - Parameters - ---------- - n: int - Number of time steps ahead to forecast. - - Returns - ------- - DataFrame - """ - forecast = self._forecast_raw(h)[:, 0, :] - return DataFrame(forecast, index=lrange(1, 1 + h), - columns=self._columns) - - def forecast_cov(self, h): - """ - Returns the covariance of the forecast residuals. 
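Since the deprecation warning above steers users to statsmodels' vector autoregression support, here is a minimal sketch of that replacement, assuming a hypothetical two-column DataFrame of stationary series:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR  # the package the warning points to

# Toy data standing in for two stationary series.
df = pd.DataFrame(np.random.randn(100, 2), columns=['x1', 'x2'])

res = VAR(df).fit(maxlags=2)  # estimate up to a VAR(2)
print(res.summary())          # coefficients, AIC/BIC, etc.

# Forecast 5 steps ahead from the last res.k_ar observations.
fcast = res.forecast(df.values[-res.k_ar:], steps=5)
```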
- - Returns - ------- - DataFrame - """ - return [DataFrame(value, index=self._columns, columns=self._columns) - for value in self._forecast_cov_raw(h)] - - def forecast_std_err(self, h): - """ - Returns the standard errors of the forecast residuals. - - Returns - ------- - DataFrame - """ - return DataFrame(self._forecast_std_err_raw(h), - index=lrange(1, 1 + h), columns=self._columns) - - @cache_readonly - def granger_causality(self): - """Returns the f-stats and p-values from the Granger Causality Test. - - If the data consists of columns x1, x2, x3, then we perform the - following regressions: - - x1 ~ L(x2, x3) - x1 ~ L(x1, x3) - x1 ~ L(x1, x2) - - The f-stats of these results are placed in the 'x1' column of the - returned DataFrame. We then repeat for x2, x3. - - Returns - ------- - Dict, where 'f-stat' returns the DataFrame containing the f-stats, - and 'p-value' returns the DataFrame containing the corresponding - p-values of the f-stats. - """ - from pandas.stats.api import ols - from scipy.stats import f - - d = {} - for col in self._columns: - d[col] = {} - for i in range(1, 1 + self._p): - lagged_data = self._lagged_data[i].filter( - self._columns - [col]) - - for key, value in compat.iteritems(lagged_data): - d[col][_make_param_name(i, key)] = value - - f_stat_dict = {} - p_value_dict = {} - - for col, y in compat.iteritems(self._data): - ssr_full = (self.resid[col] ** 2).sum() - - f_stats = [] - p_values = [] - - for col2 in self._columns: - result = ols(y=y, x=d[col2]) - - resid = result.resid - ssr_reduced = (resid ** 2).sum() - - M = self._p - N = self._nobs - K = self._k * self._p + 1 - f_stat = ((ssr_reduced - ssr_full) / M) / (ssr_full / (N - K)) - f_stats.append(f_stat) - - p_value = f.sf(f_stat, M, N - K) - p_values.append(p_value) - - f_stat_dict[col] = Series(f_stats, self._columns) - p_value_dict[col] = Series(p_values, self._columns) - - f_stat_mat = DataFrame(f_stat_dict) - p_value_mat = DataFrame(p_value_dict) - - return { - 'f-stat': f_stat_mat, - 'p-value': p_value_mat, - } - - @cache_readonly - def ols_results(self): - """ - Returns the results of the regressions: - x_1 ~ L(X) - x_2 ~ L(X) - ... - x_k ~ L(X) - - where X = [x_1, x_2, ..., x_k] - and L(X) represents the columns of X lagged 1, 2, ..., n lags - (n is the user-provided number of lags). - - Returns - ------- - dict - """ - from pandas.stats.api import ols - - d = {} - for i in range(1, 1 + self._p): - for col, series in compat.iteritems(self._lagged_data[i]): - d[_make_param_name(i, col)] = series - - result = dict([(col, ols(y=y, x=d, intercept=self._intercept)) - for col, y in compat.iteritems(self._data)]) - - return result - - @cache_readonly - def resid(self): - """ - Returns the DataFrame containing the residuals of the VAR regressions. - Each column x1 contains the residuals generated by regressing the x1 - column of the input against the lagged input. 
- - Returns - ------- - DataFrame - """ - d = dict([(col, series.resid) - for (col, series) in compat.iteritems(self.ols_results)]) - return DataFrame(d, index=self._index) - - @cache_readonly - def summary(self): - template = """ -%(banner_top)s - -Number of Observations: %(nobs)d -AIC: %(aic).3f -BIC: %(bic).3f - -%(banner_coef)s -%(coef_table)s -%(banner_end)s -""" - params = { - 'banner_top': common.banner('Summary of VAR'), - 'banner_coef': common.banner('Summary of Estimated Coefficients'), - 'banner_end': common.banner('End of Summary'), - 'coef_table': self.beta, - 'aic': self.aic, - 'bic': self.bic, - 'nobs': self._nobs, - } - - return template % params - - @cache_readonly - def _alpha(self): - """ - Returns array where the i-th element contains the intercept - when regressing the i-th column of self._data with the lagged data. - """ - if self._intercept: - return self._beta_raw[-1] - else: - return np.zeros(self._k) - - @cache_readonly - def _beta_raw(self): - return np.array([list(self.beta[col].values()) for col in self._columns]).T - - def _trans_B(self, h): - """ - Returns 0, 1, ..., (h-1)-th power of transpose of B as defined in - equation (4) on p. 142 of the Stata 11 Time Series reference book. - """ - result = [np.eye(1 + self._k * self._p)] - - row1 = np.zeros((1, 1 + self._k * self._p)) - row1[0, 0] = 1 - - v = self._alpha.reshape((self._k, 1)) - row2 = np.hstack(tuple([v] + self._lag_betas)) - - m = self._k * (self._p - 1) - row3 = np.hstack(( - np.zeros((m, 1)), - np.eye(m), - np.zeros((m, self._k)) - )) - - trans_B = np.vstack((row1, row2, row3)).T - - result.append(trans_B) - - for i in range(2, h): - result.append(np.dot(trans_B, result[i - 1])) - - return result - - @cache_readonly - def _x(self): - values = np.array([ - list(self._lagged_data[i][col].values()) - for i in range(1, 1 + self._p) - for col in self._columns - ]).T - - x = np.hstack((np.ones((len(values), 1)), values))[self._p:] - - return x - - @cache_readonly - def _cov_beta(self): - cov_resid = self._sigma - - x = self._x - - inv_cov_x = inv(np.dot(x.T, x)) - - return np.kron(inv_cov_x, cov_resid) - - def _data_xs(self, i): - """ - Returns the cross-section of the data at the given timestep. - """ - return self._data.values[i] - - def _forecast_cov_raw(self, n): - resid = self._forecast_cov_resid_raw(n) - # beta = self._forecast_cov_beta_raw(n) - - # return [a + b for a, b in zip(resid, beta)] - # TODO: ignore the beta forecast std err until it's verified - - return resid - - def _forecast_cov_beta_raw(self, n): - """ - Returns the covariance of the beta errors for the forecast at - 1, 2, ..., n timesteps. - """ - p = self._p - - values = self._data.values - T = len(values) - self._p - 1 - - results = [] - - for h in range(1, n + 1): - psi = self._psi(h) - trans_B = self._trans_B(h) - - sum = 0 - - cov_beta = self._cov_beta - - for t in range(T + 1): - index = t + p - y = values.take(lrange(index, index - p, -1), axis=0).ravel() - trans_Z = np.hstack(([1], y)) - trans_Z = trans_Z.reshape(1, len(trans_Z)) - - sum2 = 0 - for i in range(h): - ZB = np.dot(trans_Z, trans_B[h - 1 - i]) - - prod = np.kron(ZB, psi[i]) - sum2 = sum2 + prod - - sum = sum + chain_dot(sum2, cov_beta, sum2.T) - - results.append(sum / (T + 1)) - - return results - - def _forecast_cov_resid_raw(self, h): - """ - Returns the covariance of the residual errors for the forecast at - 1, 2, ..., h timesteps. 
- """ - psi_values = self._psi(h) - sum = 0 - result = [] - for i in range(h): - psi = psi_values[i] - sum = sum + chain_dot(psi, self._sigma, psi.T) - result.append(sum) - - return result - - def _forecast_raw(self, h): - """ - Returns the forecast at 1, 2, ..., h timesteps in the future. - """ - k = self._k - result = [] - for i in range(h): - sum = self._alpha.reshape(1, k) - for j in range(self._p): - beta = self._lag_betas[j] - idx = i - j - if idx > 0: - y = result[idx - 1] - else: - y = self._data_xs(idx - 1) - - sum = sum + np.dot(beta, y.T).T - result.append(sum) - - return np.array(result) - - def _forecast_std_err_raw(self, h): - """ - Returns the standard error of the forecasts - at 1, 2, ..., n timesteps. - """ - return np.array([np.sqrt(np.diag(value)) - for value in self._forecast_cov_raw(h)]) - - @cache_readonly - def _ic(self): - """ - Returns the Akaike/Bayesian information criteria. - """ - RSS = self._rss - k = self._p * (self._k * self._p + 1) - n = self._nobs * self._k - - return {'aic': 2 * k + n * np.log(RSS / n), - 'bic': n * np.log(RSS / n) + k * np.log(n)} - - @cache_readonly - def _k(self): - return len(self._columns) - - @cache_readonly - def _lag_betas(self): - """ - Returns list of B_i, where B_i represents the (k, k) matrix - with the j-th row containing the betas of regressing the j-th - column of self._data with self._data lagged i time steps. - First element is B_1, second element is B_2, etc. - """ - k = self._k - b = self._beta_raw - return [b[k * i: k * (i + 1)].T for i in range(self._p)] - - @cache_readonly - def _lagged_data(self): - return dict([(i, self._data.shift(i)) - for i in range(1, 1 + self._p)]) - - @cache_readonly - def _nobs(self): - return len(self._data) - self._p - - def _psi(self, h): - """ - psi value used for calculating standard error. - - Returns [psi_0, psi_1, ..., psi_(h - 1)] - """ - k = self._k - result = [np.eye(k)] - for i in range(1, h): - result.append(sum( - [np.dot(result[i - j], self._lag_betas[j - 1]) - for j in range(1, 1 + i) - if j <= self._p])) - - return result - - @cache_readonly - def _resid_raw(self): - resid = np.array([self.ols_results[col]._resid_raw - for col in self._columns]) - return resid - - @cache_readonly - def _rss(self): - """Returns the sum of the squares of the residuals.""" - return (self._resid_raw ** 2).sum() - - @cache_readonly - def _sigma(self): - """Returns covariance of resids.""" - k = self._k - n = self._nobs - - resid = self._resid_raw - - return np.dot(resid, resid.T) / (n - k) - - def __unicode__(self): - return self.summary - - -def lag_select(data, max_lags=5, ic=None): - """ - Select number of lags based on a variety of information criteria - - Parameters - ---------- - data : DataFrame-like - max_lags : int - Maximum number of lags to evaluate - ic : {None, 'aic', 'bic', ...} - Choosing None will just display the results - - Returns - ------- - None - """ - pass - - -class PanelVAR(VAR): - """ - Performs Vector Autoregression on panel data. 
- - Parameters - ---------- - data: Panel or dict of DataFrame - lags: int - """ - - def __init__(self, data, lags, intercept=True): - self._data = _prep_panel_data(data) - self._p = lags - self._intercept = intercept - - self._columns = self._data.items - - @cache_readonly - def _nobs(self): - """Returns the number of observations.""" - _, timesteps, entities = self._data.values.shape - return (timesteps - self._p) * entities - - @cache_readonly - def _rss(self): - """Returns the sum of the squares of the residuals.""" - return (self.resid.values ** 2).sum() - - def forecast(self, h): - """ - Returns the forecasts at 1, 2, ..., n timesteps in the future. - """ - forecast = self._forecast_raw(h).T.swapaxes(1, 2) - index = lrange(1, 1 + h) - w = Panel(forecast, items=self._data.items, major_axis=index, - minor_axis=self._data.minor_axis) - return w - - @cache_readonly - def resid(self): - """ - Returns the DataFrame containing the residuals of the VAR regressions. - Each column x1 contains the residuals generated by regressing the x1 - column of the input against the lagged input. - - Returns - ------- - DataFrame - """ - d = dict([(key, value.resid) - for (key, value) in compat.iteritems(self.ols_results)]) - return Panel.fromDict(d) - - def _data_xs(self, i): - return self._data.values[:, i, :].T - - @cache_readonly - def _sigma(self): - """Returns covariance of resids.""" - k = self._k - resid = _drop_incomplete_rows(self.resid.toLong().values) - n = len(resid) - return np.dot(resid.T, resid) / (n - k) - - -def _prep_panel_data(data): - """Converts the given data into a Panel.""" - if isinstance(data, Panel): - return data - - return Panel.fromDict(data) - - -def _drop_incomplete_rows(array): - mask = np.isfinite(array).all(1) - indices = np.arange(len(array))[mask] - return array.take(indices, 0) - - -def _make_param_name(lag, name): - return 'L%d.%s' % (lag, name) - - -def chain_dot(*matrices): - """ - Returns the dot product of the given matrices. - - Parameters - ---------- - matrices: argument list of ndarray - """ - return reduce(lambda x, y: np.dot(y, x), matrices[::-1]) diff --git a/pandas/testing.py b/pandas/testing.py new file mode 100644 index 0000000000000..3baf99957cb33 --- /dev/null +++ b/pandas/testing.py @@ -0,0 +1,8 @@ +# flake8: noqa + +""" +Public testing utility functions. 
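The new `pandas/testing.py` module being introduced here gives the most common assertion helpers a public import path. A short usage sketch:

```python
import pandas as pd
from pandas.testing import assert_frame_equal, assert_series_equal

left = pd.DataFrame({'a': [1, 2, 3]})
right = left.copy()

assert_frame_equal(left, right)             # passes silently here
assert_series_equal(left['a'], right['a'])  # would raise AssertionError on a mismatch
```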
+""" + +from pandas.util.testing import ( + assert_frame_equal, assert_series_equal, assert_index_equal) diff --git a/pandas/stats/tests/__init__.py b/pandas/tests/api/__init__.py similarity index 100% rename from pandas/stats/tests/__init__.py rename to pandas/tests/api/__init__.py diff --git a/pandas/tests/api/test_api.py b/pandas/tests/api/test_api.py new file mode 100644 index 0000000000000..09cccd54b74f8 --- /dev/null +++ b/pandas/tests/api/test_api.py @@ -0,0 +1,237 @@ +# -*- coding: utf-8 -*- + +from warnings import catch_warnings + +import pytest +import pandas as pd +from pandas import api +from pandas.util import testing as tm + + +class Base(object): + + def check(self, namespace, expected, ignored=None): + # see which names are in the namespace, minus optional + # ignored ones + # compare vs the expected + + result = sorted([f for f in dir(namespace) if not f.startswith('_')]) + if ignored is not None: + result = sorted(list(set(result) - set(ignored))) + + expected = sorted(expected) + tm.assert_almost_equal(result, expected) + + +class TestPDApi(Base): + + # these are optionally imported based on testing + # & need to be ignored + ignored = ['tests', 'locale', 'conftest'] + + # top-level sub-packages + lib = ['api', 'compat', 'core', 'errors', 'pandas', + 'plotting', 'test', 'testing', 'tools', 'tseries', + 'util', 'options', 'io'] + + # these are already deprecated; awaiting removal + deprecated_modules = ['stats', 'datetools', 'parser', + 'json', 'lib', 'tslib'] + + # misc + misc = ['IndexSlice', 'NaT'] + + # top-level classes + classes = ['Categorical', 'CategoricalIndex', 'DataFrame', 'DateOffset', + 'DatetimeIndex', 'ExcelFile', 'ExcelWriter', 'Float64Index', + 'Grouper', 'HDFStore', 'Index', 'Int64Index', 'MultiIndex', + 'Period', 'PeriodIndex', 'RangeIndex', 'UInt64Index', + 'Series', 'SparseArray', 'SparseDataFrame', + 'SparseSeries', 'TimeGrouper', 'Timedelta', + 'TimedeltaIndex', 'Timestamp', 'Interval', 'IntervalIndex'] + + # these are already deprecated; awaiting removal + deprecated_classes = ['WidePanel', 'Panel4D', + 'SparseList', 'Expr', 'Term'] + + # these should be deprecated in the future + deprecated_classes_in_future = ['Panel'] + + # external modules exposed in pandas namespace + modules = ['np', 'datetime'] + + # top-level functions + funcs = ['bdate_range', 'concat', 'crosstab', 'cut', + 'date_range', 'interval_range', 'eval', + 'factorize', 'get_dummies', + 'infer_freq', 'isna', 'isnull', 'lreshape', + 'melt', 'notna', 'notnull', 'offsets', + 'merge', 'merge_ordered', 'merge_asof', + 'period_range', + 'pivot', 'pivot_table', 'qcut', + 'show_versions', 'timedelta_range', 'unique', + 'value_counts', 'wide_to_long'] + + # top-level option funcs + funcs_option = ['reset_option', 'describe_option', 'get_option', + 'option_context', 'set_option', + 'set_eng_float_format'] + + # top-level read_* funcs + funcs_read = ['read_clipboard', 'read_csv', 'read_excel', 'read_fwf', + 'read_gbq', 'read_hdf', 'read_html', 'read_json', + 'read_msgpack', 'read_pickle', 'read_sas', 'read_sql', + 'read_sql_query', 'read_sql_table', 'read_stata', + 'read_table', 'read_feather', 'read_parquet'] + + # top-level to_* funcs + funcs_to = ['to_datetime', 'to_msgpack', + 'to_numeric', 'to_pickle', 'to_timedelta'] + + # top-level to deprecate in the future + deprecated_funcs_in_future = [] + + # these are already deprecated; awaiting removal + deprecated_funcs = ['ewma', 'ewmcorr', 'ewmcov', 'ewmstd', 'ewmvar', + 'ewmvol', 'expanding_apply', 'expanding_corr', + 'expanding_count', 
'expanding_cov', 'expanding_kurt', + 'expanding_max', 'expanding_mean', 'expanding_median', + 'expanding_min', 'expanding_quantile', + 'expanding_skew', 'expanding_std', 'expanding_sum', + 'expanding_var', 'rolling_apply', + 'rolling_corr', 'rolling_count', 'rolling_cov', + 'rolling_kurt', 'rolling_max', 'rolling_mean', + 'rolling_median', 'rolling_min', 'rolling_quantile', + 'rolling_skew', 'rolling_std', 'rolling_sum', + 'rolling_var', 'rolling_window', 'ordered_merge', + 'pnow', 'match', 'groupby', 'get_store', + 'plot_params', 'scatter_matrix'] + + def test_api(self): + + self.check(pd, + self.lib + self.misc + + self.modules + self.deprecated_modules + + self.classes + self.deprecated_classes + + self.deprecated_classes_in_future + + self.funcs + self.funcs_option + + self.funcs_read + self.funcs_to + + self.deprecated_funcs_in_future + + self.deprecated_funcs, + self.ignored) + + +class TestApi(Base): + + allowed = ['types'] + + def test_api(self): + + self.check(api, self.allowed) + + +class TestTesting(Base): + + funcs = ['assert_frame_equal', 'assert_series_equal', + 'assert_index_equal'] + + def test_testing(self): + + from pandas import testing + self.check(testing, self.funcs) + + +class TestDatetoolsDeprecation(object): + + def test_deprecation_access_func(self): + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + pd.datetools.to_datetime('2016-01-01') + + def test_deprecation_access_obj(self): + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + pd.datetools.monthEnd + + +class TestTopLevelDeprecations(object): + + # top-level API deprecations + # GH 13790 + + def test_pnow(self): + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + pd.pnow(freq='M') + + def test_term(self): + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + pd.Term('index>=date') + + def test_expr(self): + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + pd.Expr('2>1') + + def test_match(self): + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + pd.match([1, 2, 3], [1]) + + def test_groupby(self): + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + pd.groupby(pd.Series([1, 2, 3]), [1, 1, 1]) + + # GH 15940 + + def test_get_store(self): + pytest.importorskip('tables') + with tm.ensure_clean() as path: + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + s = pd.get_store(path) + s.close() + + +class TestJson(object): + + def test_deprecation_access_func(self): + with catch_warnings(record=True): + pd.json.dumps([]) + + +class TestParser(object): + + def test_deprecation_access_func(self): + with catch_warnings(record=True): + pd.parser.na_values + + +class TestLib(object): + + def test_deprecation_access_func(self): + with catch_warnings(record=True): + pd.lib.infer_dtype('foo') + + +class TestTSLib(object): + + def test_deprecation_access_func(self): + with catch_warnings(record=True): + pd.tslib.Timestamp('20160101') + + +class TestTypes(object): + + def test_deprecation_access_func(self): + with tm.assert_produces_warning( + FutureWarning, check_stacklevel=False): + from pandas.types.concat import union_categoricals + c1 = pd.Categorical(list('aabc')) + c2 = pd.Categorical(list('abcd')) + union_categoricals( + [c1, c2], + sort_categories=True, + ignore_order=True) diff --git a/pandas/tests/api/test_types.py b/pandas/tests/api/test_types.py new file mode 100644 index 0000000000000..1cbcf3f9109a4 --- 
/dev/null +++ b/pandas/tests/api/test_types.py @@ -0,0 +1,103 @@ +# -*- coding: utf-8 -*- + +import pytest + +from warnings import catch_warnings +import numpy as np + +import pandas +from pandas.core import common as com +from pandas.api import types +from pandas.util import testing as tm + +from .test_api import Base + + +class TestTypes(Base): + + allowed = ['is_bool', 'is_bool_dtype', + 'is_categorical', 'is_categorical_dtype', 'is_complex', + 'is_complex_dtype', 'is_datetime64_any_dtype', + 'is_datetime64_dtype', 'is_datetime64_ns_dtype', + 'is_datetime64tz_dtype', 'is_datetimetz', 'is_dtype_equal', + 'is_extension_type', 'is_float', 'is_float_dtype', + 'is_int64_dtype', 'is_integer', + 'is_integer_dtype', 'is_number', 'is_numeric_dtype', + 'is_object_dtype', 'is_scalar', 'is_sparse', + 'is_string_dtype', 'is_signed_integer_dtype', + 'is_timedelta64_dtype', 'is_timedelta64_ns_dtype', + 'is_unsigned_integer_dtype', 'is_period', + 'is_period_dtype', 'is_interval', 'is_interval_dtype', + 'is_re', 'is_re_compilable', + 'is_dict_like', 'is_iterator', 'is_file_like', + 'is_list_like', 'is_hashable', + 'is_named_tuple', + 'pandas_dtype', 'union_categoricals', 'infer_dtype'] + deprecated = ['is_any_int_dtype', 'is_floating_dtype', 'is_sequence'] + dtypes = ['CategoricalDtype', 'DatetimeTZDtype', + 'PeriodDtype', 'IntervalDtype'] + + def test_types(self): + + self.check(types, self.allowed + self.dtypes + self.deprecated) + + def check_deprecation(self, fold, fnew): + with tm.assert_produces_warning(DeprecationWarning): + try: + result = fold('foo') + expected = fnew('foo') + assert result == expected + except TypeError: + pytest.raises(TypeError, lambda: fnew('foo')) + except AttributeError: + pytest.raises(AttributeError, lambda: fnew('foo')) + + def test_deprecation_core_common(self): + + # test that we are in fact deprecating + # the pandas.core.common introspectors + for t in self.allowed: + self.check_deprecation(getattr(com, t), getattr(types, t)) + + def test_deprecation_core_common_array_equivalent(self): + + with tm.assert_produces_warning(DeprecationWarning): + com.array_equivalent(np.array([1, 2]), np.array([1, 2])) + + def test_deprecation_core_common_moved(self): + + # these are in pandas.core.dtypes.common + l = ['is_datetime_arraylike', + 'is_datetime_or_timedelta_dtype', + 'is_datetimelike', + 'is_datetimelike_v_numeric', + 'is_datetimelike_v_object', + 'is_datetimetz', + 'is_int_or_datetime_dtype', + 'is_period_arraylike', + 'is_string_like', + 'is_string_like_dtype'] + + from pandas.core.dtypes import common as c + for t in l: + self.check_deprecation(getattr(com, t), getattr(c, t)) + + def test_removed_from_core_common(self): + + for t in ['is_null_datelike_scalar', + 'ensure_float']: + pytest.raises(AttributeError, lambda: getattr(com, t)) + + def test_deprecated_from_api_types(self): + + for t in self.deprecated: + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + getattr(types, t)(1) + + +def test_moved_infer_dtype(): + + with catch_warnings(record=True): + e = pandas.lib.infer_dtype('foo') + assert e is not None diff --git a/pandas/tests/formats/__init__.py b/pandas/tests/computation/__init__.py similarity index 100% rename from pandas/tests/formats/__init__.py rename to pandas/tests/computation/__init__.py diff --git a/pandas/tests/computation/test_compat.py b/pandas/tests/computation/test_compat.py new file mode 100644 index 0000000000000..ed569625177d3 --- /dev/null +++ b/pandas/tests/computation/test_compat.py @@ -0,0 +1,46 @@ +import 
pytest +from distutils.version import LooseVersion + +import pandas as pd + +from pandas.core.computation.engines import _engines +import pandas.core.computation.expr as expr +from pandas.core.computation import _MIN_NUMEXPR_VERSION + + +def test_compat(): + # test we have compat with our version of numexpr + + from pandas.core.computation import _NUMEXPR_INSTALLED + try: + import numexpr as ne + ver = ne.__version__ + if ver < LooseVersion(_MIN_NUMEXPR_VERSION): + assert not _NUMEXPR_INSTALLED + else: + assert _NUMEXPR_INSTALLED + except ImportError: + pytest.skip("not testing numexpr version compat") + + +@pytest.mark.parametrize('engine', _engines) +@pytest.mark.parametrize('parser', expr._parsers) +def test_invalid_numexpr_version(engine, parser): + def testit(): + a, b = 1, 2 # noqa + res = pd.eval('a + b', engine=engine, parser=parser) + assert res == 3 + + if engine == 'numexpr': + try: + import numexpr as ne + except ImportError: + pytest.skip("no numexpr") + else: + if ne.__version__ < LooseVersion(_MIN_NUMEXPR_VERSION): + with pytest.raises(ImportError): + testit() + else: + testit() + else: + testit() diff --git a/pandas/computation/tests/test_eval.py b/pandas/tests/computation/test_eval.py similarity index 72% rename from pandas/computation/tests/test_eval.py rename to pandas/tests/computation/test_eval.py index 3a446bfc36c21..d2874b1606e72 100644 --- a/pandas/computation/tests/test_eval.py +++ b/pandas/tests/computation/test_eval.py @@ -1,44 +1,60 @@ - -# flake8: noqa - import warnings +from warnings import catch_warnings import operator from itertools import product -from distutils.version import LooseVersion -import nose -from nose.tools import assert_raises +import pytest from numpy.random import randn, rand, randint import numpy as np -from pandas.types.common import is_list_like, is_scalar +from pandas.core.dtypes.common import is_bool, is_list_like, is_scalar import pandas as pd from pandas.core import common as com +from pandas.errors import PerformanceWarning from pandas import DataFrame, Series, Panel, date_range from pandas.util.testing import makeCustomDataframe as mkdf -from pandas.computation import pytables -from pandas.computation.engines import _engines, NumExprClobberingError -from pandas.computation.expr import PythonExprVisitor, PandasExprVisitor -from pandas.computation.ops import (_binary_ops_dict, - _special_case_arith_ops_syms, - _arith_ops_syms, _bool_ops_syms, - _unary_math_ops, _binary_math_ops) - -import pandas.computation.expr as expr +from pandas.core.computation import pytables +from pandas.core.computation.engines import _engines, NumExprClobberingError +from pandas.core.computation.expr import PythonExprVisitor, PandasExprVisitor +from pandas.core.computation.expressions import ( + _USE_NUMEXPR, _NUMEXPR_INSTALLED) +from pandas.core.computation.ops import ( + _binary_ops_dict, + _special_case_arith_ops_syms, + _arith_ops_syms, _bool_ops_syms, + _unary_math_ops, _binary_math_ops) + +import pandas.core.computation.expr as expr import pandas.util.testing as tm -import pandas.lib as lib from pandas.util.testing import (assert_frame_equal, randbool, - assertRaisesRegexp, assert_numpy_array_equal, - assert_produces_warning, assert_series_equal, - slow) -from pandas.compat import PY3, u, reduce + assert_numpy_array_equal, assert_series_equal, + assert_produces_warning) +from pandas.compat import PY3, reduce _series_frame_incompatible = _bool_ops_syms _scalar_skip = 'in', 'not in' +@pytest.fixture(params=( + pytest.param(engine, + marks=pytest.mark.skipif(
engine == 'numexpr' and not _USE_NUMEXPR, + reason='numexpr enabled->{enabled}, ' + 'installed->{installed}'.format( + enabled=_USE_NUMEXPR, + installed=_NUMEXPR_INSTALLED))) + for engine in _engines)) # noqa +def engine(request): + return request.param + + +@pytest.fixture(params=expr._parsers) +def parser(request): + return request.param + + def engine_has_neg_frac(engine): return _engines[engine].has_neg_frac @@ -49,7 +65,8 @@ def _eval_single_bin(lhs, cmp1, rhs, engine): try: return c(lhs, rhs) except ValueError as e: - if str(e).startswith('negative number cannot be raised to a fractional power'): + if str(e).startswith('negative number cannot be ' + 'raised to a fractional power'): return np.nan raise return c(lhs, rhs) @@ -57,14 +74,14 @@ def _eval_single_bin(lhs, cmp1, rhs, engine): def _series_and_2d_ndarray(lhs, rhs): return ((isinstance(lhs, Series) and - isinstance(rhs, np.ndarray) and rhs.ndim > 1) - or (isinstance(rhs, Series) and - isinstance(lhs, np.ndarray) and lhs.ndim > 1)) + isinstance(rhs, np.ndarray) and rhs.ndim > 1) or + (isinstance(rhs, Series) and + isinstance(lhs, np.ndarray) and lhs.ndim > 1)) def _series_and_frame(lhs, rhs): - return ((isinstance(lhs, Series) and isinstance(rhs, DataFrame)) - or (isinstance(rhs, Series) and isinstance(lhs, DataFrame))) + return ((isinstance(lhs, Series) and isinstance(rhs, DataFrame)) or + (isinstance(rhs, Series) and isinstance(lhs, DataFrame))) def _bool_and_frame(lhs, rhs): @@ -79,11 +96,10 @@ def _is_py3_complex_incompat(result, expected): _good_arith_ops = com.difference(_arith_ops_syms, _special_case_arith_ops_syms) -class TestEvalNumexprPandas(tm.TestCase): +class TestEvalNumexprPandas(object): @classmethod - def setUpClass(cls): - super(TestEvalNumexprPandas, cls).setUpClass() + def setup_class(cls): tm.skip_if_no_ne() import numexpr as ne cls.ne = ne @@ -91,8 +107,7 @@ def setUpClass(cls): cls.parser = 'pandas' @classmethod - def tearDownClass(cls): - super(TestEvalNumexprPandas, cls).tearDownClass() + def teardown_class(cls): del cls.engine, cls.parser if hasattr(cls, 'ne'): del cls.ne @@ -121,16 +136,16 @@ def setup_ops(self): self.arith_ops = _good_arith_ops self.unary_ops = '-', '~', 'not ' - def setUp(self): + def setup_method(self, method): self.setup_ops() self.setup_data() self.current_engines = filter(lambda x: x != self.engine, _engines) - def tearDown(self): + def teardown_method(self, method): del self.lhses, self.rhses, self.scalar_rhses, self.scalar_lhses del self.pandas_rhses, self.pandas_lhses, self.current_engines - @slow + @pytest.mark.slow def test_complex_cmp_ops(self): cmp_ops = ('!=', '==', '<=', '>=', '<', '>') cmp2_ops = ('>', '<') @@ -147,7 +162,7 @@ def test_simple_cmp_ops(self): for lhs, rhs, cmp_op in product(bool_lhses, bool_rhses, self.cmp_ops): self.check_simple_cmp_op(lhs, cmp_op, rhs) - @slow + @pytest.mark.slow def test_binary_arith_ops(self): for lhs, op, rhs in product(self.lhses, self.arith_ops, self.rhses): self.check_binary_arith_op(lhs, op, rhs) @@ -167,17 +182,17 @@ def test_pow(self): for lhs, rhs in product(self.lhses, self.rhses): self.check_pow(lhs, '**', rhs) - @slow + @pytest.mark.slow def test_single_invert_op(self): for lhs, op, rhs in product(self.lhses, self.cmp_ops, self.rhses): self.check_single_invert_op(lhs, op, rhs) - @slow + @pytest.mark.slow def test_compound_invert_op(self): for lhs, op, rhs in product(self.lhses, self.cmp_ops, self.rhses): self.check_compound_invert_op(lhs, op, rhs) - @slow + @pytest.mark.slow def test_chained_cmp_op(self): mids = self.lhses 
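The skipif-marked `engine`/`parser` fixtures above replace the old module-level `ENGINES_PARSERS` product: parametrizing a fixture once fans every test that requests it out across the parameter values. A minimal sketch of the pattern, with an illustrative parameter list rather than the real `_engines`/`_parsers` tables:

```python
import pandas as pd
import pytest

@pytest.fixture(params=['pandas', 'python'])
def parser(request):
    # Each test requesting `parser` runs once per entry in params;
    # per-parameter skip marks can be attached via pytest.param, as above.
    return request.param

def test_scalar_addition(parser):
    assert pd.eval('1 + 2', parser=parser) == 3
```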
cmp_ops = '<', '>' @@ -193,7 +208,7 @@ def check_equal(self, result, expected): elif isinstance(result, np.ndarray): tm.assert_numpy_array_equal(result, expected) else: - self.assertEqual(result, expected) + assert result == expected def check_complex_cmp_op(self, lhs, cmp1, rhs, binop, cmp2): skip_these = _scalar_skip @@ -201,29 +216,32 @@ def check_complex_cmp_op(self, lhs, cmp1, rhs, binop, cmp2): binop=binop, cmp2=cmp2) scalar_with_in_notin = (is_scalar(rhs) and (cmp1 in skip_these or - cmp2 in skip_these)) + cmp2 in skip_these)) if scalar_with_in_notin: - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): pd.eval(ex, engine=self.engine, parser=self.parser) - self.assertRaises(TypeError, pd.eval, ex, engine=self.engine, - parser=self.parser, local_dict={'lhs': lhs, - 'rhs': rhs}) + with pytest.raises(TypeError): + pd.eval(ex, engine=self.engine, parser=self.parser, + local_dict={'lhs': lhs, 'rhs': rhs}) else: lhs_new = _eval_single_bin(lhs, cmp1, rhs, self.engine) rhs_new = _eval_single_bin(lhs, cmp2, rhs, self.engine) - if (isinstance(lhs_new, Series) and isinstance(rhs_new, DataFrame) - and binop in _series_frame_incompatible): + if (isinstance(lhs_new, Series) and + isinstance(rhs_new, DataFrame) and + binop in _series_frame_incompatible): pass # TODO: the code below should be added back when left and right # hand side bool ops are fixed. - + # # try: - # self.assertRaises(Exception, pd.eval, ex, - #local_dict={'lhs': lhs, 'rhs': rhs}, - # engine=self.engine, parser=self.parser) + # pytest.raises(Exception, pd.eval, ex, + # local_dict={'lhs': lhs, 'rhs': rhs}, + # engine=self.engine, parser=self.parser) # except AssertionError: - #import ipdb; ipdb.set_trace() - # raise + # import ipdb + # + # ipdb.set_trace() + # raise else: expected = _eval_single_bin( lhs_new, binop, rhs_new, self.engine) @@ -231,7 +249,6 @@ def check_complex_cmp_op(self, lhs, cmp1, rhs, binop, cmp2): self.check_equal(result, expected) def check_chained_cmp_op(self, lhs, cmp1, mid, cmp2, rhs): - skip_these = _scalar_skip def check_operands(left, right, cmp_op): return _eval_single_bin(left, cmp_op, right, self.engine) @@ -254,9 +271,9 @@ def check_operands(left, right, cmp_op): def check_simple_cmp_op(self, lhs, cmp1, rhs): ex = 'lhs {0} rhs'.format(cmp1) if cmp1 in ('in', 'not in') and not is_list_like(rhs): - self.assertRaises(TypeError, pd.eval, ex, engine=self.engine, - parser=self.parser, local_dict={'lhs': lhs, - 'rhs': rhs}) + pytest.raises(TypeError, pd.eval, ex, engine=self.engine, + parser=self.parser, local_dict={'lhs': lhs, + 'rhs': rhs}) else: expected = _eval_single_bin(lhs, cmp1, rhs, self.engine) result = pd.eval(ex, engine=self.engine, parser=self.parser) @@ -309,17 +326,18 @@ def check_floor_division(self, lhs, arith1, rhs): expected = lhs // rhs self.check_equal(res, expected) else: - self.assertRaises(TypeError, pd.eval, ex, local_dict={'lhs': lhs, - 'rhs': rhs}, - engine=self.engine, parser=self.parser) + pytest.raises(TypeError, pd.eval, ex, + local_dict={'lhs': lhs, 'rhs': rhs}, + engine=self.engine, parser=self.parser) def get_expected_pow_result(self, lhs, rhs): try: expected = _eval_single_bin(lhs, '**', rhs, self.engine) except ValueError as e: - if str(e).startswith('negative number cannot be raised to a fractional power'): + if str(e).startswith('negative number cannot be ' + 'raised to a fractional power'): if self.engine == 'python': - raise nose.SkipTest(str(e)) + pytest.skip(str(e)) else: expected = np.nan else: @@ -333,8 +351,8 @@ def check_pow(self, lhs, arith1, 
rhs): if (is_scalar(lhs) and is_scalar(rhs) and _is_py3_complex_incompat(result, expected)): - self.assertRaises(AssertionError, tm.assert_numpy_array_equal, - result, expected) + pytest.raises(AssertionError, tm.assert_numpy_array_equal, + result, expected) else: tm.assert_almost_equal(result, expected) @@ -365,9 +383,9 @@ def check_compound_invert_op(self, lhs, cmp1, rhs): ex = '~(lhs {0} rhs)'.format(cmp1) if is_scalar(rhs) and cmp1 in skip_these: - self.assertRaises(TypeError, pd.eval, ex, engine=self.engine, - parser=self.parser, local_dict={'lhs': lhs, - 'rhs': rhs}) + pytest.raises(TypeError, pd.eval, ex, engine=self.engine, + parser=self.parser, local_dict={'lhs': lhs, + 'rhs': rhs}) else: # compound if is_scalar(lhs) and is_scalar(rhs): @@ -397,16 +415,16 @@ def test_frame_invert(self): # float always raises lhs = DataFrame(randn(5, 2)) if self.engine == 'numexpr': - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): result = pd.eval(expr, engine=self.engine, parser=self.parser) else: - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): result = pd.eval(expr, engine=self.engine, parser=self.parser) # int raises on numexpr lhs = DataFrame(randint(5, size=(5, 2))) if self.engine == 'numexpr': - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): result = pd.eval(expr, engine=self.engine, parser=self.parser) else: expect = ~lhs @@ -422,10 +440,10 @@ def test_frame_invert(self): # object raises lhs = DataFrame({'b': ['a', 1, 2.0], 'c': rand(3) > 0.5}) if self.engine == 'numexpr': - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): result = pd.eval(expr, engine=self.engine, parser=self.parser) else: - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): result = pd.eval(expr, engine=self.engine, parser=self.parser) def test_series_invert(self): @@ -436,16 +454,16 @@ def test_series_invert(self): # float raises lhs = Series(randn(5)) if self.engine == 'numexpr': - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): result = pd.eval(expr, engine=self.engine, parser=self.parser) else: - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): result = pd.eval(expr, engine=self.engine, parser=self.parser) # int raises on numexpr lhs = Series(randint(5, size=5)) if self.engine == 'numexpr': - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): result = pd.eval(expr, engine=self.engine, parser=self.parser) else: expect = ~lhs @@ -465,10 +483,10 @@ def test_series_invert(self): # object lhs = Series(['a', 1, 2.0]) if self.engine == 'numexpr': - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): result = pd.eval(expr, engine=self.engine, parser=self.parser) else: - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): result = pd.eval(expr, engine=self.engine, parser=self.parser) def test_frame_negate(self): @@ -489,7 +507,7 @@ def test_frame_negate(self): # bool doesn't work with numexpr but works elsewhere lhs = DataFrame(rand(5, 2) > 0.5) if self.engine == 'numexpr': - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): result = pd.eval(expr, engine=self.engine, parser=self.parser) else: expect = -lhs @@ -514,7 +532,7 @@ def test_series_negate(self): # bool doesn't work with numexpr but works elsewhere lhs = Series(rand(5) > 0.5) if self.engine == 'numexpr': - with tm.assertRaises(NotImplementedError): + with 
pytest.raises(NotImplementedError): result = pd.eval(expr, engine=self.engine, parser=self.parser) else: expect = -lhs @@ -527,7 +545,7 @@ def test_frame_pos(self): # float lhs = DataFrame(randn(5, 2)) if self.engine == 'python': - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): result = pd.eval(expr, engine=self.engine, parser=self.parser) else: expect = lhs @@ -537,7 +555,7 @@ def test_frame_pos(self): # int lhs = DataFrame(randint(5, size=(5, 2))) if self.engine == 'python': - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): result = pd.eval(expr, engine=self.engine, parser=self.parser) else: expect = lhs @@ -547,7 +565,7 @@ def test_frame_pos(self): # bool doesn't work with numexpr but works elsewhere lhs = DataFrame(rand(5, 2) > 0.5) if self.engine == 'python': - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): result = pd.eval(expr, engine=self.engine, parser=self.parser) else: expect = lhs @@ -560,7 +578,7 @@ def test_series_pos(self): # float lhs = Series(randn(5)) if self.engine == 'python': - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): result = pd.eval(expr, engine=self.engine, parser=self.parser) else: expect = lhs @@ -570,7 +588,7 @@ def test_series_pos(self): # int lhs = Series(randint(5, size=5)) if self.engine == 'python': - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): result = pd.eval(expr, engine=self.engine, parser=self.parser) else: expect = lhs @@ -580,7 +598,7 @@ def test_series_pos(self): # bool doesn't work with numexpr but works elsewhere lhs = Series(rand(5) > 0.5) if self.engine == 'python': - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): result = pd.eval(expr, engine=self.engine, parser=self.parser) else: expect = lhs @@ -588,33 +606,31 @@ def test_series_pos(self): assert_series_equal(expect, result) def test_scalar_unary(self): - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): pd.eval('~1.0', engine=self.engine, parser=self.parser) - self.assertEqual( - pd.eval('-1.0', parser=self.parser, engine=self.engine), -1.0) - self.assertEqual( - pd.eval('+1.0', parser=self.parser, engine=self.engine), +1.0) - - self.assertEqual( - pd.eval('~1', parser=self.parser, engine=self.engine), ~1) - self.assertEqual( - pd.eval('-1', parser=self.parser, engine=self.engine), -1) - self.assertEqual( - pd.eval('+1', parser=self.parser, engine=self.engine), +1) - - self.assertEqual( - pd.eval('~True', parser=self.parser, engine=self.engine), ~True) - self.assertEqual( - pd.eval('~False', parser=self.parser, engine=self.engine), ~False) - self.assertEqual( - pd.eval('-True', parser=self.parser, engine=self.engine), -True) - self.assertEqual( - pd.eval('-False', parser=self.parser, engine=self.engine), -False) - self.assertEqual( - pd.eval('+True', parser=self.parser, engine=self.engine), +True) - self.assertEqual( - pd.eval('+False', parser=self.parser, engine=self.engine), +False) + assert pd.eval('-1.0', parser=self.parser, + engine=self.engine) == -1.0 + assert pd.eval('+1.0', parser=self.parser, + engine=self.engine) == +1.0 + assert pd.eval('~1', parser=self.parser, + engine=self.engine) == ~1 + assert pd.eval('-1', parser=self.parser, + engine=self.engine) == -1 + assert pd.eval('+1', parser=self.parser, + engine=self.engine) == +1 + assert pd.eval('~True', parser=self.parser, + engine=self.engine) == ~True + assert pd.eval('~False', parser=self.parser, + engine=self.engine) == ~False + assert pd.eval('-True', parser=self.parser, + 
+                       engine=self.engine) == -True
+        assert pd.eval('-False', parser=self.parser,
+                       engine=self.engine) == -False
+        assert pd.eval('+True', parser=self.parser,
+                       engine=self.engine) == +True
+        assert pd.eval('+False', parser=self.parser,
+                       engine=self.engine) == +False

     def test_unary_in_array(self):
         # GH 11235
@@ -633,63 +649,64 @@ def test_disallow_scalar_bool_ops(self):
         exprs += '2 * x > 2 or 1 and 2',
         exprs += '2 * df > 3 and 1 or a',

-        x, a, b, df = np.random.randn(3), 1, 2, DataFrame(randn(3, 2))
+        x, a, b, df = np.random.randn(3), 1, 2, DataFrame(randn(3, 2))  # noqa
         for ex in exprs:
-            with tm.assertRaises(NotImplementedError):
+            with pytest.raises(NotImplementedError):
                 pd.eval(ex, engine=self.engine, parser=self.parser)

     def test_identical(self):
-        # GH 10546
+        # see gh-10546
         x = 1
         result = pd.eval('x', engine=self.engine, parser=self.parser)
-        self.assertEqual(result, 1)
-        self.assertTrue(is_scalar(result))
+        assert result == 1
+        assert is_scalar(result)

         x = 1.5
         result = pd.eval('x', engine=self.engine, parser=self.parser)
-        self.assertEqual(result, 1.5)
-        self.assertTrue(is_scalar(result))
+        assert result == 1.5
+        assert is_scalar(result)

         x = False
         result = pd.eval('x', engine=self.engine, parser=self.parser)
-        self.assertEqual(result, False)
-        self.assertTrue(is_scalar(result))
+        assert not result
+        assert is_bool(result)
+        assert is_scalar(result)

         x = np.array([1])
         result = pd.eval('x', engine=self.engine, parser=self.parser)
         tm.assert_numpy_array_equal(result, np.array([1]))
-        self.assertEqual(result.shape, (1, ))
+        assert result.shape == (1, )

         x = np.array([1.5])
         result = pd.eval('x', engine=self.engine, parser=self.parser)
         tm.assert_numpy_array_equal(result, np.array([1.5]))
-        self.assertEqual(result.shape, (1, ))
+        assert result.shape == (1, )

-        x = np.array([False])
+        x = np.array([False])  # noqa
         result = pd.eval('x', engine=self.engine, parser=self.parser)
         tm.assert_numpy_array_equal(result, np.array([False]))
-        self.assertEqual(result.shape, (1, ))
+        assert result.shape == (1, )

     def test_line_continuation(self):
         # GH 11149
         exp = """1 + 2 * \
        5 - 1 + 2 """
         result = pd.eval(exp, engine=self.engine, parser=self.parser)
-        self.assertEqual(result, 12)
+        assert result == 12

     def test_float_truncation(self):
         # GH 14241
         exp = '1000000000.006'
         result = pd.eval(exp, engine=self.engine, parser=self.parser)
         expected = np.float64(exp)
-        self.assertEqual(result, expected)
+        assert result == expected

         df = pd.DataFrame({'A': [1000000000.0009,
                                  1000000000.0011,
                                  1000000000.0015]})
         cutoff = 1000000000.0006
         result = df.query("A < %.4f" % cutoff)
-        self.assertTrue(result.empty)
+        assert result.empty

         cutoff = 1000000000.0010
         result = df.query("A > %.4f" % cutoff)
@@ -702,12 +719,11 @@ def test_float_truncation(self):
         tm.assert_frame_equal(expected, result)

-
 class TestEvalNumexprPython(TestEvalNumexprPandas):

     @classmethod
-    def setUpClass(cls):
-        super(TestEvalNumexprPython, cls).setUpClass()
+    def setup_class(cls):
+        super(TestEvalNumexprPython, cls).setup_class()
         tm.skip_if_no_ne()
         import numexpr as ne
         cls.ne = ne
@@ -726,15 +742,15 @@ def setup_ops(self):

     def check_chained_cmp_op(self, lhs, cmp1, mid, cmp2, rhs):
         ex1 = 'lhs {0} mid {1} rhs'.format(cmp1, cmp2)
-        with tm.assertRaises(NotImplementedError):
+        with pytest.raises(NotImplementedError):
             pd.eval(ex1, engine=self.engine, parser=self.parser)


 class TestEvalPythonPython(TestEvalNumexprPython):

     @classmethod
-    def setUpClass(cls):
-        super(TestEvalPythonPython, cls).setUpClass()
+    def setup_class(cls):
+        super(TestEvalPythonPython, cls).setup_class()
         cls.engine = 'python'
         cls.parser = 'python'
@@ -763,8 +779,8 @@ def check_alignment(self, result, nlhs, ghs, op):

 class TestEvalPythonPandas(TestEvalPythonPython):

     @classmethod
-    def setUpClass(cls):
-        super(TestEvalPythonPandas, cls).setUpClass()
+    def setup_class(cls):
+        super(TestEvalPythonPandas, cls).setup_class()
         cls.engine = 'python'
         cls.parser = 'pandas'
@@ -776,16 +792,16 @@ def check_chained_cmp_op(self, lhs, cmp1, mid, cmp2, rhs):

 f = lambda *args, **kwargs: np.random.randn()

-ENGINES_PARSERS = list(product(_engines, expr._parsers))
+# -------------------------------------
+# gh-12388: Typecasting rules consistency with python

-#-------------------------------------
-# typecasting rules consistency with python
-# issue #12388

 class TestTypeCasting(object):
-
-    def check_binop_typecasting(self, engine, parser, op, dt):
-        tm.skip_if_no_ne(engine)
+    @pytest.mark.parametrize('op', ['+', '-', '*', '**', '/'])
+    # maybe someday... numexpr has too many upcasting rules now
+    # chain(*(np.sctypes[x] for x in ['uint', 'int', 'float']))
+    @pytest.mark.parametrize('dt', [np.float32, np.float64])
+    def test_binop_typecasting(self, engine, parser, op, dt):
         df = mkdf(5, 3, data_gen_f=f, dtype=dt)
         s = 'df {} 3'.format(op)
         res = pd.eval(s, engine=engine, parser=parser)
@@ -799,17 +815,9 @@ def check_binop_typecasting(self, engine, parser, op, dt):
         assert res.values.dtype == dt
         assert_frame_equal(res, eval(s))

-    def test_binop_typecasting(self):
-        for engine, parser in ENGINES_PARSERS:
-            for op in ['+', '-', '*', '**', '/']:
-                # maybe someday... numexpr has too many upcasting rules now
-                #for dt in chain(*(np.sctypes[x] for x in ['uint', 'int', 'float'])):
-                for dt in [np.float32, np.float64]:
-                    yield self.check_binop_typecasting, engine, parser, op, dt
-
-#-------------------------------------
-# basic and complex alignment
+# -------------------------------------
+# Basic and complex alignment

 def _is_datetime(x):
     return issubclass(x.dtype.type, np.datetime64)
@@ -826,19 +834,13 @@ class TestAlignment(object):
     index_types = 'i', 'u', 'dt'
     lhs_index_types = index_types + ('s',)  # 'p'

-    def check_align_nested_unary_op(self, engine, parser):
-        tm.skip_if_no_ne(engine)
+    def test_align_nested_unary_op(self, engine, parser):
         s = 'df * ~2'
         df = mkdf(5, 3, data_gen_f=f)
         res = pd.eval(s, engine=engine, parser=parser)
         assert_frame_equal(res, df * ~2)

-    def test_align_nested_unary_op(self):
-        for engine, parser in ENGINES_PARSERS:
-            yield self.check_align_nested_unary_op, engine, parser
-
-    def check_basic_frame_alignment(self, engine, parser):
-        tm.skip_if_no_ne(engine)
+    def test_basic_frame_alignment(self, engine, parser):
         args = product(self.lhs_index_types, self.index_types,
                        self.index_types)
         with warnings.catch_warnings(record=True):
@@ -856,12 +858,7 @@ def check_basic_frame_alignment(self, engine, parser):
                 res = pd.eval('df + df2', engine=engine, parser=parser)
                 assert_frame_equal(res, df + df2)

-    def test_basic_frame_alignment(self):
-        for engine, parser in ENGINES_PARSERS:
-            yield self.check_basic_frame_alignment, engine, parser
-
-    def check_frame_comparison(self, engine, parser):
-        tm.skip_if_no_ne(engine)
+    def test_frame_comparison(self, engine, parser):
         args = product(self.lhs_index_types, repeat=2)
         for r_idx_type, c_idx_type in args:
             df = mkdf(10, 10, data_gen_f=f, r_idx_type=r_idx_type,
                       c_idx_type=c_idx_type)
@@ -874,12 +871,8 @@ def check_frame_comparison(self, engine, parser):
             res = pd.eval('df < df3', engine=engine, parser=parser)
             assert_frame_equal(res, df < df3)
-    def test_frame_comparison(self):
-        for engine, parser in ENGINES_PARSERS:
-            yield self.check_frame_comparison, engine, parser
-
-    def check_medium_complex_frame_alignment(self, engine, parser):
-        tm.skip_if_no_ne(engine)
+    @pytest.mark.slow
+    def test_medium_complex_frame_alignment(self, engine, parser):
         args = product(self.lhs_index_types, self.index_types,
                        self.index_types, self.index_types)
@@ -899,14 +892,7 @@ def check_medium_complex_frame_alignment(self, engine, parser):
                               engine=engine, parser=parser)
                 assert_frame_equal(res, df + df2 + df3)

-    @slow
-    def test_medium_complex_frame_alignment(self):
-        for engine, parser in ENGINES_PARSERS:
-            yield self.check_medium_complex_frame_alignment, engine, parser
-
-    def check_basic_frame_series_alignment(self, engine, parser):
-        tm.skip_if_no_ne(engine)
-
+    def test_basic_frame_series_alignment(self, engine, parser):
         def testit(r_idx_type, c_idx_type, index_name):
             df = mkdf(10, 10, data_gen_f=f, r_idx_type=r_idx_type,
                       c_idx_type=c_idx_type)
@@ -932,13 +918,7 @@ def testit(r_idx_type, c_idx_type, index_name):
             for r_idx_type, c_idx_type, index_name in args:
                 testit(r_idx_type, c_idx_type, index_name)

-    def test_basic_frame_series_alignment(self):
-        for engine, parser in ENGINES_PARSERS:
-            yield self.check_basic_frame_series_alignment, engine, parser
-
-    def check_basic_series_frame_alignment(self, engine, parser):
-        tm.skip_if_no_ne(engine)
-
+    def test_basic_series_frame_alignment(self, engine, parser):
         def testit(r_idx_type, c_idx_type, index_name):
             df = mkdf(10, 7, data_gen_f=f, r_idx_type=r_idx_type,
                       c_idx_type=c_idx_type)
@@ -968,12 +948,7 @@ def testit(r_idx_type, c_idx_type, index_name):
             for r_idx_type, c_idx_type, index_name in args:
                 testit(r_idx_type, c_idx_type, index_name)

-    def test_basic_series_frame_alignment(self):
-        for engine, parser in ENGINES_PARSERS:
-            yield self.check_basic_series_frame_alignment, engine, parser
-
-    def check_series_frame_commutativity(self, engine, parser):
-        tm.skip_if_no_ne(engine)
+    def test_series_frame_commutativity(self, engine, parser):
         args = product(self.lhs_index_types, self.index_types, ('+', '*'),
                        ('index', 'columns'))
@@ -1000,13 +975,8 @@ def check_series_frame_commutativity(self, engine, parser):
                 if engine == 'numexpr':
                     assert_frame_equal(a, b)

-    def test_series_frame_commutativity(self):
-        for engine, parser in ENGINES_PARSERS:
-            yield self.check_series_frame_commutativity, engine, parser
-
-    def check_complex_series_frame_alignment(self, engine, parser):
-        tm.skip_if_no_ne(engine)
-
+    @pytest.mark.slow
+    def test_complex_series_frame_alignment(self, engine, parser):
         import random
         args = product(self.lhs_index_types, self.index_types,
                        self.index_types, self.index_types)
@@ -1047,20 +1017,14 @@ def check_complex_series_frame_alignment(self, engine, parser):
                                   parser=parser)
                 else:
                     res = pd.eval('df2 + s + df', engine=engine, parser=parser)
-                tm.assert_equal(res.shape, expected.shape)
+                assert res.shape == expected.shape
                 assert_frame_equal(res, expected)

-    @slow
-    def test_complex_series_frame_alignment(self):
-        for engine, parser in ENGINES_PARSERS:
-            yield self.check_complex_series_frame_alignment, engine, parser
-
-    def check_performance_warning_for_poor_alignment(self, engine, parser):
-        tm.skip_if_no_ne(engine)
+    def test_performance_warning_for_poor_alignment(self, engine, parser):
         df = DataFrame(randn(1000, 10))
         s = Series(randn(10000))
         if engine == 'numexpr':
-            seen = pd.core.common.PerformanceWarning
+            seen = PerformanceWarning
         else:
             seen = False
@@ -1082,7 +1046,7 @@ def check_performance_warning_for_poor_alignment(self, engine, parser):
         is_python_engine = engine == 'python'

         if not is_python_engine:
-            wrn = pd.core.common.PerformanceWarning
+            wrn = PerformanceWarning
         else:
             wrn = False
@@ -1090,36 +1054,29 @@ def check_performance_warning_for_poor_alignment(self, engine, parser):
             pd.eval('df + s', engine=engine, parser=parser)

             if not is_python_engine:
-                tm.assert_equal(len(w), 1)
+                assert len(w) == 1
                 msg = str(w[0].message)
                 expected = ("Alignment difference on axis {0} is larger"
                             " than an order of magnitude on term {1!r}, "
                             "by more than {2:.4g}; performance may suffer"
                             "".format(1, 'df', np.log10(s.size - df.shape[1])))
-                tm.assert_equal(msg, expected)
+                assert msg == expected

-    def test_performance_warning_for_poor_alignment(self):
-        for engine, parser in ENGINES_PARSERS:
-            yield (self.check_performance_warning_for_poor_alignment, engine,
-                   parser)

+# ------------------------------------
+# Slightly more complex ops

-#------------------------------------
-# slightly more complex ops

-class TestOperationsNumExprPandas(tm.TestCase):
+class TestOperationsNumExprPandas(object):

     @classmethod
-    def setUpClass(cls):
-        super(TestOperationsNumExprPandas, cls).setUpClass()
+    def setup_class(cls):
         tm.skip_if_no_ne()
         cls.engine = 'numexpr'
         cls.parser = 'pandas'
         cls.arith_ops = expr._arith_ops_syms + expr._cmp_ops_syms

     @classmethod
-    def tearDownClass(cls):
-        super(TestOperationsNumExprPandas, cls).tearDownClass()
+    def teardown_class(cls):
         del cls.engine, cls.parser

     def eval(self, *args, **kwargs):
@@ -1137,22 +1094,22 @@ def test_simple_arith_ops(self):
             ex3 = '1 {0} (x + 1)'.format(op)

             if op in ('in', 'not in'):
-                self.assertRaises(TypeError, pd.eval, ex,
-                                  engine=self.engine, parser=self.parser)
+                pytest.raises(TypeError, pd.eval, ex,
+                              engine=self.engine, parser=self.parser)
             else:
                 expec = _eval_single_bin(1, op, 1, self.engine)
                 x = self.eval(ex, engine=self.engine, parser=self.parser)
-                tm.assert_equal(x, expec)
+                assert x == expec

                 expec = _eval_single_bin(x, op, 1, self.engine)
                 y = self.eval(ex2, local_dict={'x': x},
                               engine=self.engine, parser=self.parser)
-                tm.assert_equal(y, expec)
+                assert y == expec

                 expec = _eval_single_bin(1, op, x + 1, self.engine)
                 y = self.eval(ex3, local_dict={'x': x},
                               engine=self.engine, parser=self.parser)
-                tm.assert_equal(y, expec)
+                assert y == expec

     def test_simple_bool_ops(self):
         for op, lhs, rhs in product(expr._bool_ops_syms, (True, False),
@@ -1160,7 +1117,7 @@ def test_simple_bool_ops(self):
             ex = '{0} {1} {2}'.format(lhs, op, rhs)
             res = self.eval(ex)
             exp = eval(ex)
-            self.assertEqual(res, exp)
+            assert res == exp

     def test_bool_ops_with_constants(self):
         for op, lhs, rhs in product(expr._bool_ops_syms, ('True', 'False'),
@@ -1168,23 +1125,26 @@ def test_bool_ops_with_constants(self):
             ex = '{0} {1} {2}'.format(lhs, op, rhs)
             res = self.eval(ex)
             exp = eval(ex)
-            self.assertEqual(res, exp)
+            assert res == exp

     def test_panel_fails(self):
-        x = Panel(randn(3, 4, 5))
-        y = Series(randn(10))
-        assert_raises(NotImplementedError, self.eval, 'x + y',
-                      local_dict={'x': x, 'y': y})
+        with catch_warnings(record=True):
+            x = Panel(randn(3, 4, 5))
+            y = Series(randn(10))
+            with pytest.raises(NotImplementedError):
+                self.eval('x + y',
+                          local_dict={'x': x, 'y': y})

     def test_4d_ndarray_fails(self):
         x = randn(3, 4, 5, 6)
         y = Series(randn(10))
-        assert_raises(NotImplementedError, self.eval, 'x + y',
+        with pytest.raises(NotImplementedError):
+            self.eval('x + y',
                       local_dict={'x': x, 'y': y})

     def test_constant(self):
         x = self.eval('1')
-        tm.assert_equal(x, 1)
+        assert x == 1

     def test_single_variable(self):
         df = DataFrame(randn(10, 2))
@@ -1194,7 +1154,7 @@ def test_single_variable(self):

     def test_truediv(self):
         s = np.array([1])
         ex = 's / 1'
-        d = {'s': s}
+        d = {'s': s}  # noqa

         if PY3:
             res = self.eval(ex, truediv=False)
@@ -1205,19 +1165,19 @@ def test_truediv(self):

             res = self.eval('1 / 2', truediv=True)
             expec = 0.5
-            self.assertEqual(res, expec)
+            assert res == expec

             res = self.eval('1 / 2', truediv=False)
             expec = 0.5
-            self.assertEqual(res, expec)
+            assert res == expec

             res = self.eval('s / 2', truediv=False)
             expec = 0.5
-            self.assertEqual(res, expec)
+            assert res == expec

             res = self.eval('s / 2', truediv=True)
             expec = 0.5
-            self.assertEqual(res, expec)
+            assert res == expec
         else:
             res = self.eval(ex, truediv=False)
             tm.assert_numpy_array_equal(res, np.array([1]))
@@ -1227,23 +1187,23 @@ def test_truediv(self):

             res = self.eval('1 / 2', truediv=True)
             expec = 0.5
-            self.assertEqual(res, expec)
+            assert res == expec

             res = self.eval('1 / 2', truediv=False)
             expec = 0
-            self.assertEqual(res, expec)
+            assert res == expec

             res = self.eval('s / 2', truediv=False)
             expec = 0
-            self.assertEqual(res, expec)
+            assert res == expec

             res = self.eval('s / 2', truediv=True)
             expec = 0.5
-            self.assertEqual(res, expec)
+            assert res == expec

     def test_failing_subscript_with_name_error(self):
-        df = DataFrame(np.random.randn(5, 3))
-        with tm.assertRaises(NameError):
+        df = DataFrame(np.random.randn(5, 3))  # noqa
+        with pytest.raises(NameError):
             self.eval('df[x > 2] > 2')

     def test_lhs_expression_subscript(self):
@@ -1269,21 +1229,19 @@ def test_assignment_fails(self):
         df = DataFrame(np.random.randn(5, 3), columns=list('abc'))
         df2 = DataFrame(np.random.randn(5, 3))
         expr1 = 'df = df2'
-        self.assertRaises(ValueError, self.eval, expr1,
-                          local_dict={'df': df, 'df2': df2})
+        pytest.raises(ValueError, self.eval, expr1,
+                      local_dict={'df': df, 'df2': df2})

     def test_assignment_column(self):
-        tm.skip_if_no_ne('numexpr')
         df = DataFrame(np.random.randn(5, 2), columns=list('ab'))
         orig_df = df.copy()

         # multiple assignees
-        self.assertRaises(SyntaxError, df.eval, 'd c = a + b')
+        pytest.raises(SyntaxError, df.eval, 'd c = a + b')

         # invalid assignees
-        self.assertRaises(SyntaxError, df.eval, 'd,c = a + b')
-        self.assertRaises(
-            SyntaxError, df.eval, 'Timestamp("20131001") = a + b')
+        pytest.raises(SyntaxError, df.eval, 'd,c = a + b')
+        pytest.raises(SyntaxError, df.eval, 'Timestamp("20131001") = a + b')

         # single assignment - existing variable
         expected = orig_df.copy()
@@ -1319,14 +1277,14 @@ def f():
             df.eval('a = a + b', inplace=True)
             result = old_a + df.b
             assert_series_equal(result, df.a, check_names=False)
-            self.assertTrue(result.name is None)
+            assert result.name is None

         f()

         # multiple assignment
         df = orig_df.copy()
         df.eval('c = a + b', inplace=True)
-        self.assertRaises(SyntaxError, df.eval, 'c = a = b')
+        pytest.raises(SyntaxError, df.eval, 'c = a = b')

         # explicit targets
         df = orig_df.copy()
@@ -1344,27 +1302,18 @@ def test_column_in(self):
         assert_series_equal(result, expected)

     def assignment_not_inplace(self):
-        # GH 9297
-        tm.skip_if_no_ne('numexpr')
+        # see gh-9297
         df = DataFrame(np.random.randn(5, 2), columns=list('ab'))

         actual = df.eval('c = a + b', inplace=False)
-        self.assertIsNotNone(actual)
+        assert actual is not None
+
         expected = df.copy()
         expected['c'] = expected['a'] + expected['b']
-        assert_frame_equal(df, expected)
-
-        # default for inplace will change
-        with tm.assert_produces_warnings(FutureWarning):
-            df.eval('c = a + b')
-
-        # but don't warn without assignment
-        with tm.assert_produces_warnings(None):
-            df.eval('a + b')
+        tm.assert_frame_equal(df, expected)

     def test_multi_line_expression(self):
         # GH 11149
-        tm.skip_if_no_ne('numexpr')
         df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
         expected = df.copy()
@@ -1374,7 +1323,7 @@ def test_multi_line_expression(self):
         c = a + b
         d = c + b""", inplace=True)
         assert_frame_equal(expected, df)
-        self.assertIsNone(ans)
+        assert ans is None

         expected['a'] = expected['a'] - 1
         expected['e'] = expected['a'] + 2
@@ -1382,17 +1331,16 @@ def test_multi_line_expression(self):
         a = a - 1
         e = a + 2""", inplace=True)
         assert_frame_equal(expected, df)
-        self.assertIsNone(ans)
+        assert ans is None

         # multi-line not valid if not all assignments
-        with tm.assertRaises(ValueError):
+        with pytest.raises(ValueError):
             df.eval("""
             a = b + 2
             b - 2""", inplace=False)

     def test_multi_line_expression_not_inplace(self):
         # GH 11149
-        tm.skip_if_no_ne('numexpr')
         df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
         expected = df.copy()
@@ -1410,22 +1358,75 @@ def test_multi_line_expression_not_inplace(self):
         e = a + 2""", inplace=False)
         assert_frame_equal(expected, df)

+    def test_multi_line_expression_local_variable(self):
+        # GH 15342
+        df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
+        expected = df.copy()
+
+        local_var = 7
+        expected['c'] = expected['a'] * local_var
+        expected['d'] = expected['c'] + local_var
+        ans = df.eval("""
+        c = a * @local_var
+        d = c + @local_var
+        """, inplace=True)
+        assert_frame_equal(expected, df)
+        assert ans is None
+
     def test_assignment_in_query(self):
         # GH 8664
         df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
         df_orig = df.copy()
-        with tm.assertRaises(ValueError):
+        with pytest.raises(ValueError):
             df.query('a = 1')
         assert_frame_equal(df, df_orig)

-    def query_inplace(self):
-        # GH 11149
+    def test_query_inplace(self):
+        # see gh-11149
         df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
         expected = df.copy()
         expected = expected[expected['a'] == 2]
         df.query('a == 2', inplace=True)
         assert_frame_equal(expected, df)

+        df = {}
+        expected = {"a": 3}
+
+        self.eval("a = 1 + 2", target=df, inplace=True)
+        tm.assert_dict_equal(df, expected)
+
+    @pytest.mark.parametrize("invalid_target", [1, "cat", [1, 2],
+                                                np.array([]), (1, 3)])
+    def test_cannot_item_assign(self, invalid_target):
+        msg = "Cannot assign expression output to target"
+        expression = "a = 1 + 2"
+
+        with tm.assert_raises_regex(ValueError, msg):
+            self.eval(expression, target=invalid_target, inplace=True)
+
+        if hasattr(invalid_target, "copy"):
+            with tm.assert_raises_regex(ValueError, msg):
+                self.eval(expression, target=invalid_target, inplace=False)
+
+    @pytest.mark.parametrize("invalid_target", [1, "cat", (1, 3)])
+    def test_cannot_copy_item(self, invalid_target):
+        msg = "Cannot return a copy of the target"
+        expression = "a = 1 + 2"
+
+        with tm.assert_raises_regex(ValueError, msg):
+            self.eval(expression, target=invalid_target, inplace=False)
+
+    @pytest.mark.parametrize("target", [1, "cat", [1, 2],
+                                        np.array([]), (1, 3), {1: 2}])
+    def test_inplace_no_assignment(self, target):
+        expression = "1 + 2"
+
+        assert self.eval(expression, target=target, inplace=False) == 3
+
+        msg = "Cannot operate inplace if there is no assignment"
+        with tm.assert_raises_regex(ValueError, msg):
+            self.eval(expression, target=target, inplace=True)
+
     def test_basic_period_index_boolean_expression(self):
         df = mkdf(2, 2, data_gen_f=f, c_idx_type='p', r_idx_type='i')
@@ -1460,57 +1461,57 @@ def test_simple_in_ops(self):
         if self.parser != 'python':
             res = pd.eval('1 in [1, 2]', engine=self.engine,
                           parser=self.parser)
-            self.assertTrue(res)
+            assert res

             res = pd.eval('2 in (1, 2)', engine=self.engine,
                           parser=self.parser)
-            self.assertTrue(res)
+            assert res

             res = pd.eval('3 in (1, 2)', engine=self.engine,
                           parser=self.parser)
-            self.assertFalse(res)
+            assert not res

             res = pd.eval('3 not in (1, 2)', engine=self.engine,
                           parser=self.parser)
-            self.assertTrue(res)
+            assert res

             res = pd.eval('[3] not in (1, 2)', engine=self.engine,
                           parser=self.parser)
-            self.assertTrue(res)
+            assert res

             res = pd.eval('[3] in ([3], 2)', engine=self.engine,
                           parser=self.parser)
-            self.assertTrue(res)
+            assert res

             res = pd.eval('[[3]] in [[[3]], 2]', engine=self.engine,
                           parser=self.parser)
-            self.assertTrue(res)
+            assert res

             res = pd.eval('(3,) in [(3,), 2]', engine=self.engine,
                           parser=self.parser)
-            self.assertTrue(res)
+            assert res

             res = pd.eval('(3,) not in [(3,), 2]', engine=self.engine,
                           parser=self.parser)
-            self.assertFalse(res)
+            assert not res

             res = pd.eval('[(3,)] in [[(3,)], 2]', engine=self.engine,
                           parser=self.parser)
-            self.assertTrue(res)
+            assert res
         else:
-            with tm.assertRaises(NotImplementedError):
+            with pytest.raises(NotImplementedError):
                 pd.eval('1 in [1, 2]', engine=self.engine, parser=self.parser)
-            with tm.assertRaises(NotImplementedError):
+            with pytest.raises(NotImplementedError):
                 pd.eval('2 in (1, 2)', engine=self.engine, parser=self.parser)
-            with tm.assertRaises(NotImplementedError):
+            with pytest.raises(NotImplementedError):
                 pd.eval('3 in (1, 2)', engine=self.engine, parser=self.parser)
-            with tm.assertRaises(NotImplementedError):
+            with pytest.raises(NotImplementedError):
                 pd.eval('3 not in (1, 2)', engine=self.engine,
                         parser=self.parser)
-            with tm.assertRaises(NotImplementedError):
+            with pytest.raises(NotImplementedError):
                 pd.eval('[(3,)] in (1, 2, [(3,)])', engine=self.engine,
                         parser=self.parser)
-            with tm.assertRaises(NotImplementedError):
+            with pytest.raises(NotImplementedError):
                 pd.eval('[3] not in (1, 2, [[3]])', engine=self.engine,
                         parser=self.parser)
@@ -1518,8 +1519,8 @@ class TestOperationsNumExprPython(TestOperationsNumExprPandas):

     @classmethod
-    def setUpClass(cls):
-        super(TestOperationsNumExprPython, cls).setUpClass()
+    def setup_class(cls):
+        super(TestOperationsNumExprPython, cls).setup_class()
         cls.engine = 'numexpr'
         cls.parser = 'python'
         tm.skip_if_no_ne(cls.engine)
@@ -1528,40 +1529,40 @@ def setUpClass(cls):
                                cls.arith_ops)

     def test_check_many_exprs(self):
-        a = 1
+        a = 1  # noqa
         expr = ' * '.join('a' * 33)
        expected = 1
         res = pd.eval(expr, engine=self.engine, parser=self.parser)
-        tm.assert_equal(res, expected)
+        assert res == expected

     def test_fails_and(self):
         df = DataFrame(np.random.randn(5, 3))
-        self.assertRaises(NotImplementedError, pd.eval, 'df > 2 and df > 3',
-                          local_dict={'df': df}, parser=self.parser,
-                          engine=self.engine)
+        pytest.raises(NotImplementedError, pd.eval, 'df > 2 and df > 3',
+                      local_dict={'df': df}, parser=self.parser,
+                      engine=self.engine)

     def test_fails_or(self):
         df = DataFrame(np.random.randn(5, 3))
-        self.assertRaises(NotImplementedError, pd.eval, 'df > 2 or df > 3',
-                          local_dict={'df': df}, parser=self.parser,
-                          engine=self.engine)
+        pytest.raises(NotImplementedError, pd.eval, 'df > 2 or df > 3',
+                      local_dict={'df': df}, parser=self.parser,
+                      engine=self.engine)

     def test_fails_not(self):
         df = DataFrame(np.random.randn(5, 3))
-        self.assertRaises(NotImplementedError, pd.eval, 'not df > 2',
-                          local_dict={'df': df}, parser=self.parser,
-                          engine=self.engine)
+        pytest.raises(NotImplementedError, pd.eval, 'not df > 2',
+                      local_dict={'df': df}, parser=self.parser,
+                      engine=self.engine)

     def test_fails_ampersand(self):
-        df = DataFrame(np.random.randn(5, 3))
+        df = DataFrame(np.random.randn(5, 3))  # noqa
         ex = '(df + 2)[df > 1] > 0 & (df > 0)'
-        with tm.assertRaises(NotImplementedError):
+        with pytest.raises(NotImplementedError):
             pd.eval(ex, parser=self.parser, engine=self.engine)

     def test_fails_pipe(self):
-        df = DataFrame(np.random.randn(5, 3))
+        df = DataFrame(np.random.randn(5, 3))  # noqa
         ex = '(df + 2)[df > 1] > 0 | (df > 0)'
-        with tm.assertRaises(NotImplementedError):
+        with pytest.raises(NotImplementedError):
             pd.eval(ex, parser=self.parser, engine=self.engine)

     def test_bool_ops_with_constants(self):
@@ -1569,31 +1570,31 @@ def test_bool_ops_with_constants(self):
                                     ('True', 'False')):
             ex = '{0} {1} {2}'.format(lhs, op, rhs)
             if op in ('and', 'or'):
-                with tm.assertRaises(NotImplementedError):
+                with pytest.raises(NotImplementedError):
                     self.eval(ex)
             else:
                 res = self.eval(ex)
                 exp = eval(ex)
-                self.assertEqual(res, exp)
+                assert res == exp

     def test_simple_bool_ops(self):
         for op, lhs, rhs in product(expr._bool_ops_syms, (True, False),
                                     (True, False)):
             ex = 'lhs {0} rhs'.format(op)
             if op in ('and', 'or'):
-                with tm.assertRaises(NotImplementedError):
+                with pytest.raises(NotImplementedError):
                     pd.eval(ex, engine=self.engine, parser=self.parser)
             else:
                 res = pd.eval(ex, engine=self.engine, parser=self.parser)
                 exp = eval(ex)
-                self.assertEqual(res, exp)
+                assert res == exp


 class TestOperationsPythonPython(TestOperationsNumExprPython):

     @classmethod
-    def setUpClass(cls):
-        super(TestOperationsPythonPython, cls).setUpClass()
+    def setup_class(cls):
+        super(TestOperationsPythonPython, cls).setup_class()
         cls.engine = cls.parser = 'python'
         cls.arith_ops = expr._arith_ops_syms + expr._cmp_ops_syms
         cls.arith_ops = filter(lambda x: x not in ('in', 'not in'),
@@ -1603,18 +1604,17 @@ class TestOperationsPythonPandas(TestOperationsNumExprPandas):

     @classmethod
-    def setUpClass(cls):
-        super(TestOperationsPythonPandas, cls).setUpClass()
+    def setup_class(cls):
+        super(TestOperationsPythonPandas, cls).setup_class()
         cls.engine = 'python'
         cls.parser = 'pandas'
         cls.arith_ops = expr._arith_ops_syms + expr._cmp_ops_syms


-class TestMathPythonPython(tm.TestCase):
+class TestMathPythonPython(object):

     @classmethod
-    def setUpClass(cls):
-        super(TestMathPythonPython, cls).setUpClass()
+    def setup_class(cls):
         tm.skip_if_no_ne()
         cls.engine = 'python'
         cls.parser = 'pandas'
@@ -1622,7 +1622,7 @@ def setUpClass(cls):
         cls.binary_fns = _binary_math_ops

     @classmethod
-    def tearDownClass(cls):
+    def teardown_class(cls):
         del cls.engine, cls.parser

     def eval(self, *args, **kwargs):
@@ -1675,14 +1675,14 @@ def test_df_arithmetic_subexpression(self):

     def check_result_type(self, dtype, expect_dtype):
         df = DataFrame({'a': np.random.randn(10).astype(dtype)})
-        self.assertEqual(df.a.dtype, dtype)
+        assert df.a.dtype == dtype
         df.eval("b = sin(a)",
                 engine=self.engine,
                 parser=self.parser, inplace=True)
         got = df.b
         expect = np.sin(df.a)
-        self.assertEqual(expect.dtype, got.dtype)
-        self.assertEqual(expect_dtype, got.dtype)
+        assert expect.dtype == got.dtype
+        assert expect_dtype == got.dtype
         tm.assert_series_equal(got, expect, check_names=False)

     def test_result_types(self):
@@ -1693,7 +1693,7 @@ def test_result_types(self):

     def test_result_types2(self):
         # xref https://github.com/pandas-dev/pandas/issues/12293
complex128") + pytest.skip("unreliable tests on complex128") # Did not test complex64 because DataFrame is converting it to # complex128. Due to https://github.com/pandas-dev/pandas/issues/10952 @@ -1701,17 +1701,17 @@ def test_result_types2(self): def test_undefined_func(self): df = DataFrame({'a': np.random.randn(10)}) - with tm.assertRaisesRegexp(ValueError, - "\"mysin\" is not a supported function"): + with tm.assert_raises_regex( + ValueError, "\"mysin\" is not a supported function"): df.eval("mysin(a)", engine=self.engine, parser=self.parser) def test_keyword_arg(self): df = DataFrame({'a': np.random.randn(10)}) - with tm.assertRaisesRegexp(TypeError, - "Function \"sin\" does not support " - "keyword arguments"): + with tm.assert_raises_regex(TypeError, + "Function \"sin\" does not support " + "keyword arguments"): df.eval("sin(x=a)", engine=self.engine, parser=self.parser) @@ -1720,8 +1720,8 @@ def test_keyword_arg(self): class TestMathPythonPandas(TestMathPythonPython): @classmethod - def setUpClass(cls): - super(TestMathPythonPandas, cls).setUpClass() + def setup_class(cls): + super(TestMathPythonPandas, cls).setup_class() cls.engine = 'python' cls.parser = 'pandas' @@ -1729,8 +1729,8 @@ def setUpClass(cls): class TestMathNumExprPandas(TestMathPythonPython): @classmethod - def setUpClass(cls): - super(TestMathNumExprPandas, cls).setUpClass() + def setup_class(cls): + super(TestMathNumExprPandas, cls).setup_class() cls.engine = 'numexpr' cls.parser = 'pandas' @@ -1738,8 +1738,8 @@ def setUpClass(cls): class TestMathNumExprPython(TestMathPythonPython): @classmethod - def setUpClass(cls): - super(TestMathNumExprPython, cls).setUpClass() + def setup_class(cls): + super(TestMathNumExprPython, cls).setup_class() cls.engine = 'numexpr' cls.parser = 'python' @@ -1749,208 +1749,142 @@ def setUpClass(cls): class TestScope(object): - def check_global_scope(self, e, engine, parser): - tm.skip_if_no_ne(engine) + def test_global_scope(self, engine, parser): + e = '_var_s * 2' tm.assert_numpy_array_equal(_var_s * 2, pd.eval(e, engine=engine, parser=parser)) - def test_global_scope(self): - e = '_var_s * 2' - for engine, parser in product(_engines, expr._parsers): - yield self.check_global_scope, e, engine, parser - - def check_no_new_locals(self, engine, parser): - tm.skip_if_no_ne(engine) - x = 1 + def test_no_new_locals(self, engine, parser): + x = 1 # noqa lcls = locals().copy() pd.eval('x + 1', local_dict=lcls, engine=engine, parser=parser) lcls2 = locals().copy() lcls2.pop('lcls') - tm.assert_equal(lcls, lcls2) - - def test_no_new_locals(self): - for engine, parser in product(_engines, expr._parsers): - yield self.check_no_new_locals, engine, parser + assert lcls == lcls2 - def check_no_new_globals(self, engine, parser): - tm.skip_if_no_ne(engine) - x = 1 + def test_no_new_globals(self, engine, parser): + x = 1 # noqa gbls = globals().copy() pd.eval('x + 1', engine=engine, parser=parser) gbls2 = globals().copy() - tm.assert_equal(gbls, gbls2) - - def test_no_new_globals(self): - for engine, parser in product(_engines, expr._parsers): - yield self.check_no_new_globals, engine, parser + assert gbls == gbls2 def test_invalid_engine(): tm.skip_if_no_ne() - assertRaisesRegexp(KeyError, 'Invalid engine \'asdf\' passed', - pd.eval, 'x + y', local_dict={'x': 1, 'y': 2}, - engine='asdf') + tm.assert_raises_regex(KeyError, 'Invalid engine \'asdf\' passed', + pd.eval, 'x + y', local_dict={'x': 1, 'y': 2}, + engine='asdf') def test_invalid_parser(): tm.skip_if_no_ne() - assertRaisesRegexp(KeyError, 
-    assertRaisesRegexp(KeyError, 'Invalid parser \'asdf\' passed',
-                       pd.eval, 'x + y', local_dict={'x': 1, 'y': 2},
-                       parser='asdf')
+    tm.assert_raises_regex(KeyError, 'Invalid parser \'asdf\' passed',
+                           pd.eval, 'x + y', local_dict={'x': 1, 'y': 2},
+                           parser='asdf')


 _parsers = {'python': PythonExprVisitor, 'pytables': pytables.ExprVisitor,
             'pandas': PandasExprVisitor}


-def check_disallowed_nodes(engine, parser):
+@pytest.mark.parametrize('engine', _parsers)
+@pytest.mark.parametrize('parser', _parsers)
+def test_disallowed_nodes(engine, parser):
     tm.skip_if_no_ne(engine)
     VisitorClass = _parsers[parser]
     uns_ops = VisitorClass.unsupported_nodes
     inst = VisitorClass('x + 1', engine, parser)

     for ops in uns_ops:
-        assert_raises(NotImplementedError, getattr(inst, ops))
-
+        with pytest.raises(NotImplementedError):
+            getattr(inst, ops)()

-def test_disallowed_nodes():
-    for engine, visitor in product(_parsers, repeat=2):
-        yield check_disallowed_nodes, engine, visitor

-
-def check_syntax_error_exprs(engine, parser):
-    tm.skip_if_no_ne(engine)
+def test_syntax_error_exprs(engine, parser):
     e = 's +'
-    assert_raises(SyntaxError, pd.eval, e, engine=engine, parser=parser)
-
-
-def test_syntax_error_exprs():
-    for engine, parser in ENGINES_PARSERS:
-        yield check_syntax_error_exprs, engine, parser
+    with pytest.raises(SyntaxError):
+        pd.eval(e, engine=engine, parser=parser)


-def check_name_error_exprs(engine, parser):
-    tm.skip_if_no_ne(engine)
+def test_name_error_exprs(engine, parser):
     e = 's + t'
-    with tm.assertRaises(NameError):
+    with pytest.raises(NameError):
         pd.eval(e, engine=engine, parser=parser)


-def test_name_error_exprs():
-    for engine, parser in ENGINES_PARSERS:
-        yield check_name_error_exprs, engine, parser
-
-
-def check_invalid_local_variable_reference(engine, parser):
-    tm.skip_if_no_ne(engine)
-
-    a, b = 1, 2
+def test_invalid_local_variable_reference(engine, parser):
+    a, b = 1, 2  # noqa
     exprs = 'a + @b', '@a + b', '@a + @b'
-    for expr in exprs:
+
+    for _expr in exprs:
         if parser != 'pandas':
-            with tm.assertRaisesRegexp(SyntaxError, "The '@' prefix is only"):
-                pd.eval(exprs, engine=engine, parser=parser)
+            with tm.assert_raises_regex(SyntaxError,
+                                        "The '@' prefix is only"):
+                pd.eval(_expr, engine=engine, parser=parser)
         else:
-            with tm.assertRaisesRegexp(SyntaxError, "The '@' prefix is not"):
-                pd.eval(exprs, engine=engine, parser=parser)
-
+            with tm.assert_raises_regex(SyntaxError,
+                                        "The '@' prefix is not"):
+                pd.eval(_expr, engine=engine, parser=parser)

-def test_invalid_local_variable_reference():
-    for engine, parser in ENGINES_PARSERS:
-        yield check_invalid_local_variable_reference, engine, parser

-
-def check_numexpr_builtin_raises(engine, parser):
-    tm.skip_if_no_ne(engine)
+def test_numexpr_builtin_raises(engine, parser):
     sin, dotted_line = 1, 2
     if engine == 'numexpr':
-        with tm.assertRaisesRegexp(NumExprClobberingError,
-                                   'Variables in expression .+'):
+        with tm.assert_raises_regex(NumExprClobberingError,
                                    'Variables in expression .+'):
            pd.eval('sin + dotted_line', engine=engine, parser=parser)
     else:
         res = pd.eval('sin + dotted_line', engine=engine, parser=parser)
-        tm.assert_equal(res, sin + dotted_line)
-
+        assert res == sin + dotted_line

-def test_numexpr_builtin_raises():
-    for engine, parser in ENGINES_PARSERS:
-        yield check_numexpr_builtin_raises, engine, parser

-
-def check_bad_resolver_raises(engine, parser):
-    tm.skip_if_no_ne(engine)
+def test_bad_resolver_raises(engine, parser):
     cannot_resolve = 42, 3.0
-    with tm.assertRaisesRegexp(TypeError, 'Resolver of type .+'):
+    with tm.assert_raises_regex(TypeError, 'Resolver of type .+'):
         pd.eval('1 + 2', resolvers=cannot_resolve, engine=engine,
                 parser=parser)


-def test_bad_resolver_raises():
-    for engine, parser in ENGINES_PARSERS:
-        yield check_bad_resolver_raises, engine, parser
-
-
-def check_empty_string_raises(engine, parser):
+def test_empty_string_raises(engine, parser):
     # GH 13139
-    tm.skip_if_no_ne(engine)
-    with tm.assertRaisesRegexp(ValueError, 'expr cannot be an empty string'):
+    with tm.assert_raises_regex(ValueError,
                                'expr cannot be an empty string'):
         pd.eval('', engine=engine, parser=parser)


-def test_empty_string_raises():
-    for engine, parser in ENGINES_PARSERS:
-        yield check_empty_string_raises, engine, parser
-
-
-def check_more_than_one_expression_raises(engine, parser):
-    tm.skip_if_no_ne(engine)
-    with tm.assertRaisesRegexp(SyntaxError,
-                               'only a single expression is allowed'):
+def test_more_than_one_expression_raises(engine, parser):
+    with tm.assert_raises_regex(SyntaxError,
                                'only a single expression is allowed'):
         pd.eval('1 + 1; 2 + 2', engine=engine, parser=parser)


-def test_more_than_one_expression_raises():
-    for engine, parser in ENGINES_PARSERS:
-        yield check_more_than_one_expression_raises, engine, parser
+@pytest.mark.parametrize('cmp', ('and', 'or'))
+@pytest.mark.parametrize('lhs', (int, float))
+@pytest.mark.parametrize('rhs', (int, float))
+def test_bool_ops_fails_on_scalars(lhs, cmp, rhs, engine, parser):
+    gen = {int: lambda: np.random.randint(10), float: np.random.randn}

+    mid = gen[lhs]()  # noqa
+    lhs = gen[lhs]()  # noqa
+    rhs = gen[rhs]()  # noqa

-def check_bool_ops_fails_on_scalars(gen, lhs, cmp, rhs, engine, parser):
-    tm.skip_if_no_ne(engine)
-    mid = gen[type(lhs)]()
     ex1 = 'lhs {0} mid {1} rhs'.format(cmp, cmp)
     ex2 = 'lhs {0} mid and mid {1} rhs'.format(cmp, cmp)
     ex3 = '(lhs {0} mid) & (mid {1} rhs)'.format(cmp, cmp)
     for ex in (ex1, ex2, ex3):
-        with tm.assertRaises(NotImplementedError):
+        with pytest.raises(NotImplementedError):
             pd.eval(ex, engine=engine, parser=parser)


-def test_bool_ops_fails_on_scalars():
-    _bool_ops_syms = 'and', 'or'
-    dtypes = int, float
-    gen = {int: lambda: np.random.randint(10), float: np.random.randn}
-    for engine, parser, dtype1, cmp, dtype2 in product(_engines, expr._parsers,
-                                                       dtypes, _bool_ops_syms,
-                                                       dtypes):
-        yield (check_bool_ops_fails_on_scalars, gen, gen[dtype1](), cmp,
-               gen[dtype2](), engine, parser)
-
-
-def check_inf(engine, parser):
-    tm.skip_if_no_ne(engine)
+def test_inf(engine, parser):
     s = 'inf + 1'
     expected = np.inf
     result = pd.eval(s, engine=engine, parser=parser)
-    tm.assert_equal(result, expected)
-
+    assert result == expected

-def test_inf():
-    for engine, parser in ENGINES_PARSERS:
-        yield check_inf, engine, parser

-
-def check_negate_lt_eq_le(engine, parser):
-    tm.skip_if_no_ne(engine)
+def test_negate_lt_eq_le(engine, parser):
     df = pd.DataFrame([[0, 10], [1, 20]], columns=['cat', 'count'])
     expected = df[~(df.cat > 0)]
@@ -1958,27 +1892,18 @@ def check_negate_lt_eq_le(engine, parser):
     tm.assert_frame_equal(result, expected)

     if parser == 'python':
-        with tm.assertRaises(NotImplementedError):
+        with pytest.raises(NotImplementedError):
             df.query('not (cat > 0)', engine=engine, parser=parser)
     else:
         result = df.query('not (cat > 0)', engine=engine, parser=parser)
         tm.assert_frame_equal(result, expected)


-def test_negate_lt_eq_le():
-    for engine, parser in product(_engines, expr._parsers):
-        yield check_negate_lt_eq_le, engine, parser

-
-class TestValidate(tm.TestCase):
+class TestValidate(object):
     def test_validate_bool_args(self):
-        invalid_values = [1, "True", [1,2,3], 5.0]
+        invalid_values = [1, "True", [1, 2, 3], 5.0]

         for value in invalid_values:
-            with self.assertRaises(ValueError):
+            with pytest.raises(ValueError):
                 pd.eval("2+2", inplace=value)
-
-
-if __name__ == '__main__':
-    nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
-                   exit=False)
diff --git a/pandas/tests/test_msgpack/__init__.py b/pandas/tests/dtypes/__init__.py
similarity index 100%
rename from pandas/tests/test_msgpack/__init__.py
rename to pandas/tests/dtypes/__init__.py
diff --git a/pandas/tests/dtypes/test_cast.py b/pandas/tests/dtypes/test_cast.py
new file mode 100644
index 0000000000000..d9fb458c83529
--- /dev/null
+++ b/pandas/tests/dtypes/test_cast.py
@@ -0,0 +1,409 @@
+# -*- coding: utf-8 -*-
+
+"""
+These test the private routines in types/cast.py
+
+"""
+
+import pytest
+from datetime import datetime, timedelta, date
+import numpy as np
+
+import pandas as pd
+from pandas import (Timedelta, Timestamp, DatetimeIndex,
+                    DataFrame, NaT, Period, Series)
+
+from pandas.core.dtypes.cast import (
+    maybe_downcast_to_dtype,
+    maybe_convert_objects,
+    cast_scalar_to_array,
+    infer_dtype_from_scalar,
+    infer_dtype_from_array,
+    maybe_convert_string_to_object,
+    maybe_convert_scalar,
+    find_common_type)
+from pandas.core.dtypes.dtypes import (
+    CategoricalDtype,
+    DatetimeTZDtype,
+    PeriodDtype)
+from pandas.core.dtypes.common import (
+    is_dtype_equal)
+from pandas.util import testing as tm
+
+
+class TestMaybeDowncast(object):
+
+    def test_downcast_conv(self):
+        # test downcasting
+
+        arr = np.array([8.5, 8.6, 8.7, 8.8, 8.9999999999995])
+        result = maybe_downcast_to_dtype(arr, 'infer')
+        assert (np.array_equal(result, arr))
+
+        arr = np.array([8., 8., 8., 8., 8.9999999999995])
+        result = maybe_downcast_to_dtype(arr, 'infer')
+        expected = np.array([8, 8, 8, 8, 9])
+        assert (np.array_equal(result, expected))
+
+        arr = np.array([8., 8., 8., 8., 9.0000000000005])
+        result = maybe_downcast_to_dtype(arr, 'infer')
+        expected = np.array([8, 8, 8, 8, 9])
+        assert (np.array_equal(result, expected))
+
+        # GH16875 coercing of bools
+        ser = Series([True, True, False])
+        result = maybe_downcast_to_dtype(ser, np.dtype(np.float64))
+        expected = ser
+        tm.assert_series_equal(result, expected)
+
+        # conversions
+
+        expected = np.array([1, 2])
+        for dtype in [np.float64, object, np.int64]:
+            arr = np.array([1.0, 2.0], dtype=dtype)
+            result = maybe_downcast_to_dtype(arr, 'infer')
+            tm.assert_almost_equal(result, expected, check_dtype=False)
+
+        for dtype in [np.float64, object]:
+            expected = np.array([1.0, 2.0, np.nan], dtype=dtype)
+            arr = np.array([1.0, 2.0, np.nan], dtype=dtype)
+            result = maybe_downcast_to_dtype(arr, 'infer')
+            tm.assert_almost_equal(result, expected)
+
+        # empties
+        for dtype in [np.int32, np.float64, np.float32, np.bool_,
+                      np.int64, object]:
+            arr = np.array([], dtype=dtype)
+            result = maybe_downcast_to_dtype(arr, 'int64')
+            tm.assert_almost_equal(result, np.array([], dtype=np.int64))
+            assert result.dtype == np.int64
+
+    def test_datetimelikes_nan(self):
+        arr = np.array([1, 2, np.nan])
+        exp = np.array([1, 2, np.datetime64('NaT')], dtype='datetime64[ns]')
+        res = maybe_downcast_to_dtype(arr, 'datetime64[ns]')
+        tm.assert_numpy_array_equal(res, exp)
+
+        exp = np.array([1, 2, np.timedelta64('NaT')], dtype='timedelta64[ns]')
+        res = maybe_downcast_to_dtype(arr, 'timedelta64[ns]')
+        tm.assert_numpy_array_equal(res, exp)
+
+    def test_datetime_with_timezone(self):
+        # GH 15426
Timestamp("2016-01-01 12:00:00", tz='US/Pacific') + exp = DatetimeIndex([ts, ts]) + res = maybe_downcast_to_dtype(exp, exp.dtype) + tm.assert_index_equal(res, exp) + + res = maybe_downcast_to_dtype(exp.asi8, exp.dtype) + tm.assert_index_equal(res, exp) + + +class TestInferDtype(object): + + def testinfer_dtype_from_scalar(self): + # Test that infer_dtype_from_scalar is returning correct dtype for int + # and float. + + for dtypec in [np.uint8, np.int8, np.uint16, np.int16, np.uint32, + np.int32, np.uint64, np.int64]: + data = dtypec(12) + dtype, val = infer_dtype_from_scalar(data) + assert dtype == type(data) + + data = 12 + dtype, val = infer_dtype_from_scalar(data) + assert dtype == np.int64 + + for dtypec in [np.float16, np.float32, np.float64]: + data = dtypec(12) + dtype, val = infer_dtype_from_scalar(data) + assert dtype == dtypec + + data = np.float(12) + dtype, val = infer_dtype_from_scalar(data) + assert dtype == np.float64 + + for data in [True, False]: + dtype, val = infer_dtype_from_scalar(data) + assert dtype == np.bool_ + + for data in [np.complex64(1), np.complex128(1)]: + dtype, val = infer_dtype_from_scalar(data) + assert dtype == np.complex_ + + for data in [np.datetime64(1, 'ns'), Timestamp(1), + datetime(2000, 1, 1, 0, 0)]: + dtype, val = infer_dtype_from_scalar(data) + assert dtype == 'M8[ns]' + + for data in [np.timedelta64(1, 'ns'), Timedelta(1), + timedelta(1)]: + dtype, val = infer_dtype_from_scalar(data) + assert dtype == 'm8[ns]' + + for tz in ['UTC', 'US/Eastern', 'Asia/Tokyo']: + dt = Timestamp(1, tz=tz) + dtype, val = infer_dtype_from_scalar(dt, pandas_dtype=True) + assert dtype == 'datetime64[ns, {0}]'.format(tz) + assert val == dt.value + + dtype, val = infer_dtype_from_scalar(dt) + assert dtype == np.object_ + assert val == dt + + for freq in ['M', 'D']: + p = Period('2011-01-01', freq=freq) + dtype, val = infer_dtype_from_scalar(p, pandas_dtype=True) + assert dtype == 'period[{0}]'.format(freq) + assert val == p.ordinal + + dtype, val = infer_dtype_from_scalar(p) + dtype == np.object_ + assert val == p + + # misc + for data in [date(2000, 1, 1), + Timestamp(1, tz='US/Eastern'), 'foo']: + + dtype, val = infer_dtype_from_scalar(data) + assert dtype == np.object_ + + def testinfer_dtype_from_scalar_errors(self): + with pytest.raises(ValueError): + infer_dtype_from_scalar(np.array([1])) + + @pytest.mark.parametrize( + "arr, expected, pandas_dtype", + [('foo', np.object_, False), + (b'foo', np.object_, False), + (1, np.int_, False), + (1.5, np.float_, False), + ([1], np.int_, False), + (np.array([1], dtype=np.int64), np.int64, False), + ([np.nan, 1, ''], np.object_, False), + (np.array([[1.0, 2.0]]), np.float_, False), + (pd.Categorical(list('aabc')), np.object_, False), + (pd.Categorical([1, 2, 3]), np.int64, False), + (pd.Categorical(list('aabc')), 'category', True), + (pd.Categorical([1, 2, 3]), 'category', True), + (Timestamp('20160101'), np.object_, False), + (np.datetime64('2016-01-01'), np.dtype(' df.two.sum() + + with catch_warnings(record=True) as w: + # successfully modify column in place + # this should not raise a warning + df.one += 1 + assert len(w) == 0 + assert df.one.iloc[0] == 2 + + with catch_warnings(record=True) as w: + # successfully add an attribute to a series + # this should not raise a warning + df.two.not_an_index = [1, 2] + assert len(w) == 0 + + with tm.assert_produces_warning(UserWarning): + # warn when setting column to nonexistent name + df.four = df.two + 2 + assert df.four.sum() > df.two.sum() + + with 
+        with tm.assert_produces_warning(UserWarning):
+            # warn when column has same name as method
+            df['sum'] = df.two
diff --git a/pandas/tests/types/test_inference.py b/pandas/tests/dtypes/test_inference.py
similarity index 56%
rename from pandas/tests/types/test_inference.py
rename to pandas/tests/dtypes/test_inference.py
index 5c35112d0fe19..dbde7ae5081d4 100644
--- a/pandas/tests/types/test_inference.py
+++ b/pandas/tests/dtypes/test_inference.py
@@ -5,39 +5,39 @@ related to inference and not otherwise tested in types/test_common.py

 """
-
-import nose
+from warnings import catch_warnings
 import collections
 import re
 from datetime import datetime, date, timedelta, time
+from decimal import Decimal

 import numpy as np
 import pytz
+import pytest

 import pandas as pd
-from pandas import lib, tslib
+from pandas._libs import tslib, lib
 from pandas import (Series, Index, DataFrame, Timedelta,
                     DatetimeIndex, TimedeltaIndex, Timestamp,
-                    Panel, Period, Categorical)
-from pandas.compat import u, PY2, lrange
-from pandas.types import inference
-from pandas.types.common import (is_timedelta64_dtype,
-                                 is_timedelta64_ns_dtype,
-                                 is_datetime64_dtype,
-                                 is_datetime64_ns_dtype,
-                                 is_datetime64_any_dtype,
-                                 is_datetime64tz_dtype,
-                                 is_number,
-                                 is_integer,
-                                 is_float,
-                                 is_bool,
-                                 is_scalar,
-                                 _ensure_int32,
-                                 _ensure_categorical)
-from pandas.types.missing import isnull
+                    Panel, Period, Categorical, isna)
+from pandas.compat import u, PY2, PY3, StringIO, lrange
+from pandas.core.dtypes import inference
+from pandas.core.dtypes.common import (
+    is_timedelta64_dtype,
+    is_timedelta64_ns_dtype,
+    is_datetime64_dtype,
+    is_datetime64_ns_dtype,
+    is_datetime64_any_dtype,
+    is_datetime64tz_dtype,
+    is_number,
+    is_integer,
+    is_float,
+    is_bool,
+    is_scalar,
+    is_scipy_sparse,
+    _ensure_int32,
+    _ensure_categorical)
 from pandas.util import testing as tm

-_multiprocess_can_split_ = True
-

 def test_is_sequence():
     is_seq = inference.is_sequence
@@ -67,6 +67,27 @@ def test_is_list_like():
         assert not inference.is_list_like(f)


+@pytest.mark.parametrize('inner', [
+    [], [1], (1, ), (1, 2), {'a': 1}, set([1, 'a']), Series([1]),
+    Series([]), Series(['a']).str, (x for x in range(5))
+])
+@pytest.mark.parametrize('outer', [
+    list, Series, np.array, tuple
+])
+def test_is_nested_list_like_passes(inner, outer):
+    result = outer([inner for _ in range(5)])
+    assert inference.is_list_like(result)
+
+
+@pytest.mark.parametrize('obj', [
+    'abc', [], [1], (1,), ['a'], 'a', {'a'},
+    [1, 2, 3], Series([1]), DataFrame({"A": [1]}),
+    ([1, 2] for _ in range(5)),
+])
+def test_is_nested_list_like_fails(obj):
+    assert not inference.is_nested_list_like(obj)
+
+
 def test_is_dict_like():
     passes = [{}, {'A': 1}, Series([1])]
     fails = ['1', 1, [1, 2], (1, 2), range(2), Index([1])]
@@ -78,6 +99,50 @@ def test_is_dict_like():
         assert not inference.is_dict_like(f)


+def test_is_file_like():
+    class MockFile(object):
+        pass
+
+    is_file = inference.is_file_like
+
+    data = StringIO("data")
+    assert is_file(data)
+
+    # No read / write attributes
+    # No iterator attributes
+    m = MockFile()
+    assert not is_file(m)
+
+    MockFile.write = lambda self: 0
+
+    # Write attribute but not an iterator
+    m = MockFile()
+    assert not is_file(m)
+
+    # gh-16530: Valid iterator just means we have the
+    # __iter__ attribute for our purposes.
+    MockFile.__iter__ = lambda self: self
+
+    # Valid write-only file
+    m = MockFile()
+    assert is_file(m)
+
+    del MockFile.write
+    MockFile.read = lambda self: 0
+
+    # Valid read-only file
+    m = MockFile()
+    assert is_file(m)
+
+    # Iterator but no read / write attributes
+    data = [1, 2, 3]
+    assert not is_file(data)
+
+    if PY3:
+        from unittest import mock
+        assert not is_file(mock.Mock())
+
+
 def test_is_named_tuple():
     passes = (collections.namedtuple('Test', list('abc'))(1, 2, 3), )
     fails = ((1, 2, 3), 'a', Series({'pi': 3.14}))
@@ -161,32 +226,35 @@ def test_is_recompilable():
         assert not inference.is_re_compilable(f)


-class TestInference(tm.TestCase):
+class TestInference(object):

     def test_infer_dtype_bytes(self):
         compare = 'string' if PY2 else 'bytes'

         # string array of bytes
         arr = np.array(list('abc'), dtype='S1')
-        self.assertEqual(lib.infer_dtype(arr), compare)
+        assert lib.infer_dtype(arr) == compare

         # object array of bytes
         arr = arr.astype(object)
-        self.assertEqual(lib.infer_dtype(arr), compare)
+        assert lib.infer_dtype(arr) == compare
+
+        # object array of bytes with missing values
+        assert lib.infer_dtype([b'a', np.nan, b'c'], skipna=True) == compare

     def test_isinf_scalar(self):
         # GH 11352
-        self.assertTrue(lib.isposinf_scalar(float('inf')))
-        self.assertTrue(lib.isposinf_scalar(np.inf))
-        self.assertFalse(lib.isposinf_scalar(-np.inf))
-        self.assertFalse(lib.isposinf_scalar(1))
-        self.assertFalse(lib.isposinf_scalar('a'))
-
-        self.assertTrue(lib.isneginf_scalar(float('-inf')))
-        self.assertTrue(lib.isneginf_scalar(-np.inf))
-        self.assertFalse(lib.isneginf_scalar(np.inf))
-        self.assertFalse(lib.isneginf_scalar(1))
-        self.assertFalse(lib.isneginf_scalar('a'))
+        assert lib.isposinf_scalar(float('inf'))
+        assert lib.isposinf_scalar(np.inf)
+        assert not lib.isposinf_scalar(-np.inf)
+        assert not lib.isposinf_scalar(1)
+        assert not lib.isposinf_scalar('a')
+
+        assert lib.isneginf_scalar(float('-inf'))
+        assert lib.isneginf_scalar(-np.inf)
+        assert not lib.isneginf_scalar(np.inf)
+        assert not lib.isneginf_scalar(1)
+        assert not lib.isneginf_scalar('a')

     def test_maybe_convert_numeric_infinities(self):
         # see gh-13274
@@ -222,7 +290,7 @@ def test_maybe_convert_numeric_infinities(self):
                 tm.assert_numpy_array_equal(out, pos)

             # too many characters
-            with tm.assertRaisesRegexp(ValueError, msg):
+            with tm.assert_raises_regex(ValueError, msg):
                 lib.maybe_convert_numeric(
                     np.array(['foo_' + infinity], dtype=object),
                     na_values, maybe_int)
@@ -240,17 +308,17 @@ def test_maybe_convert_numeric_post_floatify_nan(self):
     def test_convert_infs(self):
         arr = np.array(['inf', 'inf', 'inf'], dtype='O')
         result = lib.maybe_convert_numeric(arr, set(), False)
-        self.assertTrue(result.dtype == np.float64)
+        assert result.dtype == np.float64

         arr = np.array(['-inf', '-inf', '-inf'], dtype='O')
         result = lib.maybe_convert_numeric(arr, set(), False)
-        self.assertTrue(result.dtype == np.float64)
+        assert result.dtype == np.float64

     def test_scientific_no_exponent(self):
         # See PR 12215
         arr = np.array(['42E', '2E', '99e', '6e'], dtype='O')
         result = lib.maybe_convert_numeric(arr, set(), False, True)
-        self.assertTrue(np.all(np.isnan(result)))
+        assert np.all(np.isnan(result))

     def test_convert_non_hashable(self):
         # GH13324
@@ -285,7 +353,7 @@ def test_convert_numeric_uint64_nan(self):
         for coerce in (True, False):
             for arr, na_values in cases:
                 if coerce:
-                    with tm.assertRaisesRegexp(ValueError, msg):
+                    with tm.assert_raises_regex(ValueError, msg):
                         lib.maybe_convert_numeric(arr, na_values,
                                                   coerce_numeric=coerce)
                 else:
@@ -304,7 +372,7 @@ def test_convert_numeric_int64_uint64(self):
         for coerce in (True, False):
             for case in cases:
                 if coerce:
-                    with tm.assertRaisesRegexp(ValueError, msg):
+                    with tm.assert_raises_regex(ValueError, msg):
                         lib.maybe_convert_numeric(case, set(),
                                                   coerce_numeric=coerce)
                 else:
@@ -340,309 +408,344 @@ def test_mixed_dtypes_remain_object_array(self):
         tm.assert_numpy_array_equal(result, array)


-class TestTypeInference(tm.TestCase):
-    _multiprocess_can_split_ = True
+class TestTypeInference(object):

     def test_length_zero(self):
         result = lib.infer_dtype(np.array([], dtype='i4'))
-        self.assertEqual(result, 'integer')
+        assert result == 'integer'

         result = lib.infer_dtype([])
-        self.assertEqual(result, 'empty')
+        assert result == 'empty'

     def test_integers(self):
         arr = np.array([1, 2, 3, np.int64(4), np.int32(5)], dtype='O')
         result = lib.infer_dtype(arr)
-        self.assertEqual(result, 'integer')
+        assert result == 'integer'

         arr = np.array([1, 2, 3, np.int64(4), np.int32(5), 'foo'], dtype='O')
         result = lib.infer_dtype(arr)
-        self.assertEqual(result, 'mixed-integer')
+        assert result == 'mixed-integer'

         arr = np.array([1, 2, 3, 4, 5], dtype='i4')
         result = lib.infer_dtype(arr)
-        self.assertEqual(result, 'integer')
+        assert result == 'integer'

     def test_bools(self):
         arr = np.array([True, False, True, True, True], dtype='O')
         result = lib.infer_dtype(arr)
-        self.assertEqual(result, 'boolean')
+        assert result == 'boolean'

         arr = np.array([np.bool_(True), np.bool_(False)], dtype='O')
         result = lib.infer_dtype(arr)
-        self.assertEqual(result, 'boolean')
+        assert result == 'boolean'

         arr = np.array([True, False, True, 'foo'], dtype='O')
         result = lib.infer_dtype(arr)
-        self.assertEqual(result, 'mixed')
+        assert result == 'mixed'

         arr = np.array([True, False, True], dtype=bool)
         result = lib.infer_dtype(arr)
-        self.assertEqual(result, 'boolean')
+        assert result == 'boolean'
+
+        arr = np.array([True, np.nan, False], dtype='O')
+        result = lib.infer_dtype(arr, skipna=True)
+        assert result == 'boolean'

     def test_floats(self):
         arr = np.array([1., 2., 3., np.float64(4), np.float32(5)], dtype='O')
         result = lib.infer_dtype(arr)
-        self.assertEqual(result, 'floating')
+        assert result == 'floating'

         arr = np.array([1, 2, 3, np.float64(4), np.float32(5), 'foo'],
                        dtype='O')
         result = lib.infer_dtype(arr)
-        self.assertEqual(result, 'mixed-integer')
+        assert result == 'mixed-integer'

         arr = np.array([1, 2, 3, 4, 5], dtype='f4')
         result = lib.infer_dtype(arr)
-        self.assertEqual(result, 'floating')
+        assert result == 'floating'

         arr = np.array([1, 2, 3, 4, 5], dtype='f8')
         result = lib.infer_dtype(arr)
-        self.assertEqual(result, 'floating')
+        assert result == 'floating'
+
+    def test_decimals(self):
+        # GH15690
+        arr = np.array([Decimal(1), Decimal(2), Decimal(3)])
+        result = lib.infer_dtype(arr)
+        assert result == 'decimal'
+
+        arr = np.array([1.0, 2.0, Decimal(3)])
+        result = lib.infer_dtype(arr)
+        assert result == 'mixed'
+
+        arr = np.array([Decimal(1), Decimal('NaN'), Decimal(3)])
+        result = lib.infer_dtype(arr)
+        assert result == 'decimal'
+
+        arr = np.array([Decimal(1), np.nan, Decimal(3)], dtype='O')
+        result = lib.infer_dtype(arr)
+        assert result == 'decimal'

     def test_string(self):
         pass

     def test_unicode(self):
-        pass
+        arr = [u'a', np.nan, u'c']
+        result = lib.infer_dtype(arr)
+        assert result == 'mixed'
+
+        arr = [u'a', np.nan, u'c']
+        result = lib.infer_dtype(arr, skipna=True)
+        expected = 'unicode' if PY2 else 'string'
+        assert result == expected

     def test_datetime(self):
         dates = [datetime(2012, 1, x) for x in range(1, 20)]
        index = Index(dates)
-        self.assertEqual(index.inferred_type, 'datetime64')
+        assert index.inferred_type == 'datetime64'

     def test_infer_dtype_datetime(self):

         arr = np.array([Timestamp('2011-01-01'),
                         Timestamp('2011-01-02')])
-        self.assertEqual(lib.infer_dtype(arr), 'datetime')
+        assert lib.infer_dtype(arr) == 'datetime'

         arr = np.array([np.datetime64('2011-01-01'),
                         np.datetime64('2011-01-01')], dtype=object)
-        self.assertEqual(lib.infer_dtype(arr), 'datetime64')
+        assert lib.infer_dtype(arr) == 'datetime64'

         arr = np.array([datetime(2011, 1, 1), datetime(2012, 2, 1)])
-        self.assertEqual(lib.infer_dtype(arr), 'datetime')
+        assert lib.infer_dtype(arr) == 'datetime'

         # starts with nan
         for n in [pd.NaT, np.nan]:
             arr = np.array([n, pd.Timestamp('2011-01-02')])
-            self.assertEqual(lib.infer_dtype(arr), 'datetime')
+            assert lib.infer_dtype(arr) == 'datetime'

             arr = np.array([n, np.datetime64('2011-01-02')])
-            self.assertEqual(lib.infer_dtype(arr), 'datetime64')
+            assert lib.infer_dtype(arr) == 'datetime64'

             arr = np.array([n, datetime(2011, 1, 1)])
-            self.assertEqual(lib.infer_dtype(arr), 'datetime')
+            assert lib.infer_dtype(arr) == 'datetime'

             arr = np.array([n, pd.Timestamp('2011-01-02'), n])
-            self.assertEqual(lib.infer_dtype(arr), 'datetime')
+            assert lib.infer_dtype(arr) == 'datetime'

             arr = np.array([n, np.datetime64('2011-01-02'), n])
-            self.assertEqual(lib.infer_dtype(arr), 'datetime64')
+            assert lib.infer_dtype(arr) == 'datetime64'

             arr = np.array([n, datetime(2011, 1, 1), n])
-            self.assertEqual(lib.infer_dtype(arr), 'datetime')
+            assert lib.infer_dtype(arr) == 'datetime'

         # different type of nat
         arr = np.array([np.timedelta64('nat'),
                         np.datetime64('2011-01-02')], dtype=object)
-        self.assertEqual(lib.infer_dtype(arr), 'mixed')
+        assert lib.infer_dtype(arr) == 'mixed'

         arr = np.array([np.datetime64('2011-01-02'),
                         np.timedelta64('nat')], dtype=object)
-        self.assertEqual(lib.infer_dtype(arr), 'mixed')
+        assert lib.infer_dtype(arr) == 'mixed'

         # mixed datetime
         arr = np.array([datetime(2011, 1, 1),
                         pd.Timestamp('2011-01-02')])
-        self.assertEqual(lib.infer_dtype(arr), 'datetime')
+        assert lib.infer_dtype(arr) == 'datetime'

         # should be datetime?
arr = np.array([np.datetime64('2011-01-01'), pd.Timestamp('2011-01-02')]) - self.assertEqual(lib.infer_dtype(arr), 'mixed') + assert lib.infer_dtype(arr) == 'mixed' arr = np.array([pd.Timestamp('2011-01-02'), np.datetime64('2011-01-01')]) - self.assertEqual(lib.infer_dtype(arr), 'mixed') + assert lib.infer_dtype(arr) == 'mixed' arr = np.array([np.nan, pd.Timestamp('2011-01-02'), 1]) - self.assertEqual(lib.infer_dtype(arr), 'mixed-integer') + assert lib.infer_dtype(arr) == 'mixed-integer' arr = np.array([np.nan, pd.Timestamp('2011-01-02'), 1.1]) - self.assertEqual(lib.infer_dtype(arr), 'mixed') + assert lib.infer_dtype(arr) == 'mixed' arr = np.array([np.nan, '2011-01-01', pd.Timestamp('2011-01-02')]) - self.assertEqual(lib.infer_dtype(arr), 'mixed') + assert lib.infer_dtype(arr) == 'mixed' def test_infer_dtype_timedelta(self): arr = np.array([pd.Timedelta('1 days'), pd.Timedelta('2 days')]) - self.assertEqual(lib.infer_dtype(arr), 'timedelta') + assert lib.infer_dtype(arr) == 'timedelta' arr = np.array([np.timedelta64(1, 'D'), np.timedelta64(2, 'D')], dtype=object) - self.assertEqual(lib.infer_dtype(arr), 'timedelta') + assert lib.infer_dtype(arr) == 'timedelta' arr = np.array([timedelta(1), timedelta(2)]) - self.assertEqual(lib.infer_dtype(arr), 'timedelta') + assert lib.infer_dtype(arr) == 'timedelta' # starts with nan for n in [pd.NaT, np.nan]: arr = np.array([n, Timedelta('1 days')]) - self.assertEqual(lib.infer_dtype(arr), 'timedelta') + assert lib.infer_dtype(arr) == 'timedelta' arr = np.array([n, np.timedelta64(1, 'D')]) - self.assertEqual(lib.infer_dtype(arr), 'timedelta') + assert lib.infer_dtype(arr) == 'timedelta' arr = np.array([n, timedelta(1)]) - self.assertEqual(lib.infer_dtype(arr), 'timedelta') + assert lib.infer_dtype(arr) == 'timedelta' arr = np.array([n, pd.Timedelta('1 days'), n]) - self.assertEqual(lib.infer_dtype(arr), 'timedelta') + assert lib.infer_dtype(arr) == 'timedelta' arr = np.array([n, np.timedelta64(1, 'D'), n]) - self.assertEqual(lib.infer_dtype(arr), 'timedelta') + assert lib.infer_dtype(arr) == 'timedelta' arr = np.array([n, timedelta(1), n]) - self.assertEqual(lib.infer_dtype(arr), 'timedelta') + assert lib.infer_dtype(arr) == 'timedelta' # different type of nat arr = np.array([np.datetime64('nat'), np.timedelta64(1, 'D')], dtype=object) - self.assertEqual(lib.infer_dtype(arr), 'mixed') + assert lib.infer_dtype(arr) == 'mixed' arr = np.array([np.timedelta64(1, 'D'), np.datetime64('nat')], dtype=object) - self.assertEqual(lib.infer_dtype(arr), 'mixed') + assert lib.infer_dtype(arr) == 'mixed' def test_infer_dtype_period(self): # GH 13664 arr = np.array([pd.Period('2011-01', freq='D'), pd.Period('2011-02', freq='D')]) - self.assertEqual(pd.lib.infer_dtype(arr), 'period') + assert lib.infer_dtype(arr) == 'period' arr = np.array([pd.Period('2011-01', freq='D'), pd.Period('2011-02', freq='M')]) - self.assertEqual(pd.lib.infer_dtype(arr), 'period') + assert lib.infer_dtype(arr) == 'period' # starts with nan for n in [pd.NaT, np.nan]: arr = np.array([n, pd.Period('2011-01', freq='D')]) - self.assertEqual(pd.lib.infer_dtype(arr), 'period') + assert lib.infer_dtype(arr) == 'period' arr = np.array([n, pd.Period('2011-01', freq='D'), n]) - self.assertEqual(pd.lib.infer_dtype(arr), 'period') + assert lib.infer_dtype(arr) == 'period' # different type of nat arr = np.array([np.datetime64('nat'), pd.Period('2011-01', freq='M')], dtype=object) - self.assertEqual(pd.lib.infer_dtype(arr), 'mixed') + assert lib.infer_dtype(arr) == 'mixed' arr = 
np.array([pd.Period('2011-01', freq='M'),
                np.datetime64('nat')], dtype=object)
-        self.assertEqual(pd.lib.infer_dtype(arr), 'mixed')
+        assert lib.infer_dtype(arr) == 'mixed'

     def test_infer_dtype_all_nan_nat_like(self):
         arr = np.array([np.nan, np.nan])
-        self.assertEqual(lib.infer_dtype(arr), 'floating')
+        assert lib.infer_dtype(arr) == 'floating'

         # a mix of nan and None results in mixed
         arr = np.array([np.nan, np.nan, None])
-        self.assertEqual(lib.infer_dtype(arr), 'mixed')
+        assert lib.infer_dtype(arr) == 'mixed'

         arr = np.array([None, np.nan, np.nan])
-        self.assertEqual(lib.infer_dtype(arr), 'mixed')
+        assert lib.infer_dtype(arr) == 'mixed'

         # pd.NaT
         arr = np.array([pd.NaT])
-        self.assertEqual(lib.infer_dtype(arr), 'datetime')
+        assert lib.infer_dtype(arr) == 'datetime'

         arr = np.array([pd.NaT, np.nan])
-        self.assertEqual(lib.infer_dtype(arr), 'datetime')
+        assert lib.infer_dtype(arr) == 'datetime'

         arr = np.array([np.nan, pd.NaT])
-        self.assertEqual(lib.infer_dtype(arr), 'datetime')
+        assert lib.infer_dtype(arr) == 'datetime'

         arr = np.array([np.nan, pd.NaT, np.nan])
-        self.assertEqual(lib.infer_dtype(arr), 'datetime')
+        assert lib.infer_dtype(arr) == 'datetime'

         arr = np.array([None, pd.NaT, None])
-        self.assertEqual(lib.infer_dtype(arr), 'datetime')
+        assert lib.infer_dtype(arr) == 'datetime'

         # np.datetime64(nat)
         arr = np.array([np.datetime64('nat')])
-        self.assertEqual(lib.infer_dtype(arr), 'datetime64')
+        assert lib.infer_dtype(arr) == 'datetime64'

         for n in [np.nan, pd.NaT, None]:
             arr = np.array([n, np.datetime64('nat'), n])
-            self.assertEqual(lib.infer_dtype(arr), 'datetime64')
+            assert lib.infer_dtype(arr) == 'datetime64'

             arr = np.array([pd.NaT, n, np.datetime64('nat'), n])
-            self.assertEqual(lib.infer_dtype(arr), 'datetime64')
+            assert lib.infer_dtype(arr) == 'datetime64'

         arr = np.array([np.timedelta64('nat')], dtype=object)
-        self.assertEqual(lib.infer_dtype(arr), 'timedelta')
+        assert lib.infer_dtype(arr) == 'timedelta'

         for n in [np.nan, pd.NaT, None]:
             arr = np.array([n, np.timedelta64('nat'), n])
-            self.assertEqual(lib.infer_dtype(arr), 'timedelta')
+            assert lib.infer_dtype(arr) == 'timedelta'

             arr = np.array([pd.NaT, n, np.timedelta64('nat'), n])
-            self.assertEqual(lib.infer_dtype(arr), 'timedelta')
+            assert lib.infer_dtype(arr) == 'timedelta'

         # datetime / timedelta mixed
         arr = np.array([pd.NaT, np.datetime64('nat'),
                         np.timedelta64('nat'), np.nan])
-        self.assertEqual(lib.infer_dtype(arr), 'mixed')
+        assert lib.infer_dtype(arr) == 'mixed'

         arr = np.array([np.timedelta64('nat'), np.datetime64('nat')],
                        dtype=object)
-        self.assertEqual(lib.infer_dtype(arr), 'mixed')
+        assert lib.infer_dtype(arr) == 'mixed'

     def test_is_datetimelike_array_all_nan_nat_like(self):
         arr = np.array([np.nan, pd.NaT, np.datetime64('nat')])
-        self.assertTrue(lib.is_datetime_array(arr))
-        self.assertTrue(lib.is_datetime64_array(arr))
-        self.assertFalse(lib.is_timedelta_array(arr))
-        self.assertFalse(lib.is_timedelta64_array(arr))
-        self.assertFalse(lib.is_timedelta_or_timedelta64_array(arr))
+        assert lib.is_datetime_array(arr)
+        assert lib.is_datetime64_array(arr)
+        assert not lib.is_timedelta_array(arr)
+        assert not lib.is_timedelta64_array(arr)
+        assert not lib.is_timedelta_or_timedelta64_array(arr)

         arr = np.array([np.nan, pd.NaT, np.timedelta64('nat')])
-        self.assertFalse(lib.is_datetime_array(arr))
-        self.assertFalse(lib.is_datetime64_array(arr))
-        self.assertTrue(lib.is_timedelta_array(arr))
-        self.assertTrue(lib.is_timedelta64_array(arr))
-        self.assertTrue(lib.is_timedelta_or_timedelta64_array(arr))
+        assert not
lib.is_datetime_array(arr) + assert not lib.is_datetime64_array(arr) + assert lib.is_timedelta_array(arr) + assert lib.is_timedelta64_array(arr) + assert lib.is_timedelta_or_timedelta64_array(arr) arr = np.array([np.nan, pd.NaT, np.datetime64('nat'), np.timedelta64('nat')]) - self.assertFalse(lib.is_datetime_array(arr)) - self.assertFalse(lib.is_datetime64_array(arr)) - self.assertFalse(lib.is_timedelta_array(arr)) - self.assertFalse(lib.is_timedelta64_array(arr)) - self.assertFalse(lib.is_timedelta_or_timedelta64_array(arr)) + assert not lib.is_datetime_array(arr) + assert not lib.is_datetime64_array(arr) + assert not lib.is_timedelta_array(arr) + assert not lib.is_timedelta64_array(arr) + assert not lib.is_timedelta_or_timedelta64_array(arr) arr = np.array([np.nan, pd.NaT]) - self.assertTrue(lib.is_datetime_array(arr)) - self.assertTrue(lib.is_datetime64_array(arr)) - self.assertTrue(lib.is_timedelta_array(arr)) - self.assertTrue(lib.is_timedelta64_array(arr)) - self.assertTrue(lib.is_timedelta_or_timedelta64_array(arr)) + assert lib.is_datetime_array(arr) + assert lib.is_datetime64_array(arr) + assert lib.is_timedelta_array(arr) + assert lib.is_timedelta64_array(arr) + assert lib.is_timedelta_or_timedelta64_array(arr) arr = np.array([np.nan, np.nan], dtype=object) - self.assertFalse(lib.is_datetime_array(arr)) - self.assertFalse(lib.is_datetime64_array(arr)) - self.assertFalse(lib.is_timedelta_array(arr)) - self.assertFalse(lib.is_timedelta64_array(arr)) - self.assertFalse(lib.is_timedelta_or_timedelta64_array(arr)) + assert not lib.is_datetime_array(arr) + assert not lib.is_datetime64_array(arr) + assert not lib.is_timedelta_array(arr) + assert not lib.is_timedelta64_array(arr) + assert not lib.is_timedelta_or_timedelta64_array(arr) def test_date(self): - dates = [date(2012, 1, x) for x in range(1, 20)] + dates = [date(2012, 1, day) for day in range(1, 20)] index = Index(dates) - self.assertEqual(index.inferred_type, 'date') + assert index.inferred_type == 'date' + + dates = [date(2012, 1, day) for day in range(1, 20)] + [np.nan] + result = lib.infer_dtype(dates) + assert result == 'mixed' + + result = lib.infer_dtype(dates, skipna=True) + assert result == 'date' def test_to_object_array_tuples(self): r = (5, 6) @@ -665,7 +768,7 @@ def test_object(self): # cannot infer more than this as only a single element arr = np.array([None], dtype='O') result = lib.infer_dtype(arr) - self.assertEqual(result, 'mixed') + assert result == 'mixed' def test_to_object_array_width(self): # see gh-13320 @@ -685,11 +788,11 @@ def test_to_object_array_width(self): tm.assert_numpy_array_equal(out, expected) def test_is_period(self): - self.assertTrue(lib.is_period(pd.Period('2011-01', freq='M'))) - self.assertFalse(lib.is_period(pd.PeriodIndex(['2011-01'], freq='M'))) - self.assertFalse(lib.is_period(pd.Timestamp('2011-01'))) - self.assertFalse(lib.is_period(1)) - self.assertFalse(lib.is_period(np.nan)) + assert lib.is_period(pd.Period('2011-01', freq='M')) + assert not lib.is_period(pd.PeriodIndex(['2011-01'], freq='M')) + assert not lib.is_period(pd.Timestamp('2011-01')) + assert not lib.is_period(1) + assert not lib.is_period(np.nan) def test_categorical(self): @@ -697,230 +800,227 @@ def test_categorical(self): from pandas import Categorical, Series arr = Categorical(list('abc')) result = lib.infer_dtype(arr) - self.assertEqual(result, 'categorical') + assert result == 'categorical' result = lib.infer_dtype(Series(arr)) - self.assertEqual(result, 'categorical') + assert result == 'categorical' arr = 
Categorical(list('abc'), categories=['cegfab'], ordered=True) result = lib.infer_dtype(arr) - self.assertEqual(result, 'categorical') + assert result == 'categorical' result = lib.infer_dtype(Series(arr)) - self.assertEqual(result, 'categorical') + assert result == 'categorical' -class TestNumberScalar(tm.TestCase): +class TestNumberScalar(object): def test_is_number(self): - self.assertTrue(is_number(True)) - self.assertTrue(is_number(1)) - self.assertTrue(is_number(1.1)) - self.assertTrue(is_number(1 + 3j)) - self.assertTrue(is_number(np.bool(False))) - self.assertTrue(is_number(np.int64(1))) - self.assertTrue(is_number(np.float64(1.1))) - self.assertTrue(is_number(np.complex128(1 + 3j))) - self.assertTrue(is_number(np.nan)) - - self.assertFalse(is_number(None)) - self.assertFalse(is_number('x')) - self.assertFalse(is_number(datetime(2011, 1, 1))) - self.assertFalse(is_number(np.datetime64('2011-01-01'))) - self.assertFalse(is_number(Timestamp('2011-01-01'))) - self.assertFalse(is_number(Timestamp('2011-01-01', - tz='US/Eastern'))) - self.assertFalse(is_number(timedelta(1000))) - self.assertFalse(is_number(Timedelta('1 days'))) + assert is_number(True) + assert is_number(1) + assert is_number(1.1) + assert is_number(1 + 3j) + assert is_number(np.bool(False)) + assert is_number(np.int64(1)) + assert is_number(np.float64(1.1)) + assert is_number(np.complex128(1 + 3j)) + assert is_number(np.nan) + + assert not is_number(None) + assert not is_number('x') + assert not is_number(datetime(2011, 1, 1)) + assert not is_number(np.datetime64('2011-01-01')) + assert not is_number(Timestamp('2011-01-01')) + assert not is_number(Timestamp('2011-01-01', tz='US/Eastern')) + assert not is_number(timedelta(1000)) + assert not is_number(Timedelta('1 days')) # questionable - self.assertFalse(is_number(np.bool_(False))) - self.assertTrue(is_number(np.timedelta64(1, 'D'))) + assert not is_number(np.bool_(False)) + assert is_number(np.timedelta64(1, 'D')) def test_is_bool(self): - self.assertTrue(is_bool(True)) - self.assertTrue(is_bool(np.bool(False))) - self.assertTrue(is_bool(np.bool_(False))) - - self.assertFalse(is_bool(1)) - self.assertFalse(is_bool(1.1)) - self.assertFalse(is_bool(1 + 3j)) - self.assertFalse(is_bool(np.int64(1))) - self.assertFalse(is_bool(np.float64(1.1))) - self.assertFalse(is_bool(np.complex128(1 + 3j))) - self.assertFalse(is_bool(np.nan)) - self.assertFalse(is_bool(None)) - self.assertFalse(is_bool('x')) - self.assertFalse(is_bool(datetime(2011, 1, 1))) - self.assertFalse(is_bool(np.datetime64('2011-01-01'))) - self.assertFalse(is_bool(Timestamp('2011-01-01'))) - self.assertFalse(is_bool(Timestamp('2011-01-01', - tz='US/Eastern'))) - self.assertFalse(is_bool(timedelta(1000))) - self.assertFalse(is_bool(np.timedelta64(1, 'D'))) - self.assertFalse(is_bool(Timedelta('1 days'))) + assert is_bool(True) + assert is_bool(np.bool(False)) + assert is_bool(np.bool_(False)) + + assert not is_bool(1) + assert not is_bool(1.1) + assert not is_bool(1 + 3j) + assert not is_bool(np.int64(1)) + assert not is_bool(np.float64(1.1)) + assert not is_bool(np.complex128(1 + 3j)) + assert not is_bool(np.nan) + assert not is_bool(None) + assert not is_bool('x') + assert not is_bool(datetime(2011, 1, 1)) + assert not is_bool(np.datetime64('2011-01-01')) + assert not is_bool(Timestamp('2011-01-01')) + assert not is_bool(Timestamp('2011-01-01', tz='US/Eastern')) + assert not is_bool(timedelta(1000)) + assert not is_bool(np.timedelta64(1, 'D')) + assert not is_bool(Timedelta('1 days')) def 
test_is_integer(self): - self.assertTrue(is_integer(1)) - self.assertTrue(is_integer(np.int64(1))) - - self.assertFalse(is_integer(True)) - self.assertFalse(is_integer(1.1)) - self.assertFalse(is_integer(1 + 3j)) - self.assertFalse(is_integer(np.bool(False))) - self.assertFalse(is_integer(np.bool_(False))) - self.assertFalse(is_integer(np.float64(1.1))) - self.assertFalse(is_integer(np.complex128(1 + 3j))) - self.assertFalse(is_integer(np.nan)) - self.assertFalse(is_integer(None)) - self.assertFalse(is_integer('x')) - self.assertFalse(is_integer(datetime(2011, 1, 1))) - self.assertFalse(is_integer(np.datetime64('2011-01-01'))) - self.assertFalse(is_integer(Timestamp('2011-01-01'))) - self.assertFalse(is_integer(Timestamp('2011-01-01', - tz='US/Eastern'))) - self.assertFalse(is_integer(timedelta(1000))) - self.assertFalse(is_integer(Timedelta('1 days'))) + assert is_integer(1) + assert is_integer(np.int64(1)) + + assert not is_integer(True) + assert not is_integer(1.1) + assert not is_integer(1 + 3j) + assert not is_integer(np.bool(False)) + assert not is_integer(np.bool_(False)) + assert not is_integer(np.float64(1.1)) + assert not is_integer(np.complex128(1 + 3j)) + assert not is_integer(np.nan) + assert not is_integer(None) + assert not is_integer('x') + assert not is_integer(datetime(2011, 1, 1)) + assert not is_integer(np.datetime64('2011-01-01')) + assert not is_integer(Timestamp('2011-01-01')) + assert not is_integer(Timestamp('2011-01-01', tz='US/Eastern')) + assert not is_integer(timedelta(1000)) + assert not is_integer(Timedelta('1 days')) # questionable - self.assertTrue(is_integer(np.timedelta64(1, 'D'))) + assert is_integer(np.timedelta64(1, 'D')) def test_is_float(self): - self.assertTrue(is_float(1.1)) - self.assertTrue(is_float(np.float64(1.1))) - self.assertTrue(is_float(np.nan)) - - self.assertFalse(is_float(True)) - self.assertFalse(is_float(1)) - self.assertFalse(is_float(1 + 3j)) - self.assertFalse(is_float(np.bool(False))) - self.assertFalse(is_float(np.bool_(False))) - self.assertFalse(is_float(np.int64(1))) - self.assertFalse(is_float(np.complex128(1 + 3j))) - self.assertFalse(is_float(None)) - self.assertFalse(is_float('x')) - self.assertFalse(is_float(datetime(2011, 1, 1))) - self.assertFalse(is_float(np.datetime64('2011-01-01'))) - self.assertFalse(is_float(Timestamp('2011-01-01'))) - self.assertFalse(is_float(Timestamp('2011-01-01', - tz='US/Eastern'))) - self.assertFalse(is_float(timedelta(1000))) - self.assertFalse(is_float(np.timedelta64(1, 'D'))) - self.assertFalse(is_float(Timedelta('1 days'))) + assert is_float(1.1) + assert is_float(np.float64(1.1)) + assert is_float(np.nan) + + assert not is_float(True) + assert not is_float(1) + assert not is_float(1 + 3j) + assert not is_float(np.bool(False)) + assert not is_float(np.bool_(False)) + assert not is_float(np.int64(1)) + assert not is_float(np.complex128(1 + 3j)) + assert not is_float(None) + assert not is_float('x') + assert not is_float(datetime(2011, 1, 1)) + assert not is_float(np.datetime64('2011-01-01')) + assert not is_float(Timestamp('2011-01-01')) + assert not is_float(Timestamp('2011-01-01', tz='US/Eastern')) + assert not is_float(timedelta(1000)) + assert not is_float(np.timedelta64(1, 'D')) + assert not is_float(Timedelta('1 days')) def test_is_datetime_dtypes(self): ts = pd.date_range('20130101', periods=3) tsa = pd.date_range('20130101', periods=3, tz='US/Eastern') - self.assertTrue(is_datetime64_dtype('datetime64')) - self.assertTrue(is_datetime64_dtype('datetime64[ns]')) - 
self.assertTrue(is_datetime64_dtype(ts)) - self.assertFalse(is_datetime64_dtype(tsa)) + assert is_datetime64_dtype('datetime64') + assert is_datetime64_dtype('datetime64[ns]') + assert is_datetime64_dtype(ts) + assert not is_datetime64_dtype(tsa) - self.assertFalse(is_datetime64_ns_dtype('datetime64')) - self.assertTrue(is_datetime64_ns_dtype('datetime64[ns]')) - self.assertTrue(is_datetime64_ns_dtype(ts)) - self.assertTrue(is_datetime64_ns_dtype(tsa)) + assert not is_datetime64_ns_dtype('datetime64') + assert is_datetime64_ns_dtype('datetime64[ns]') + assert is_datetime64_ns_dtype(ts) + assert is_datetime64_ns_dtype(tsa) - self.assertTrue(is_datetime64_any_dtype('datetime64')) - self.assertTrue(is_datetime64_any_dtype('datetime64[ns]')) - self.assertTrue(is_datetime64_any_dtype(ts)) - self.assertTrue(is_datetime64_any_dtype(tsa)) + assert is_datetime64_any_dtype('datetime64') + assert is_datetime64_any_dtype('datetime64[ns]') + assert is_datetime64_any_dtype(ts) + assert is_datetime64_any_dtype(tsa) - self.assertFalse(is_datetime64tz_dtype('datetime64')) - self.assertFalse(is_datetime64tz_dtype('datetime64[ns]')) - self.assertFalse(is_datetime64tz_dtype(ts)) - self.assertTrue(is_datetime64tz_dtype(tsa)) + assert not is_datetime64tz_dtype('datetime64') + assert not is_datetime64tz_dtype('datetime64[ns]') + assert not is_datetime64tz_dtype(ts) + assert is_datetime64tz_dtype(tsa) for tz in ['US/Eastern', 'UTC']: dtype = 'datetime64[ns, {}]'.format(tz) - self.assertFalse(is_datetime64_dtype(dtype)) - self.assertTrue(is_datetime64tz_dtype(dtype)) - self.assertTrue(is_datetime64_ns_dtype(dtype)) - self.assertTrue(is_datetime64_any_dtype(dtype)) + assert not is_datetime64_dtype(dtype) + assert is_datetime64tz_dtype(dtype) + assert is_datetime64_ns_dtype(dtype) + assert is_datetime64_any_dtype(dtype) def test_is_timedelta(self): - self.assertTrue(is_timedelta64_dtype('timedelta64')) - self.assertTrue(is_timedelta64_dtype('timedelta64[ns]')) - self.assertFalse(is_timedelta64_ns_dtype('timedelta64')) - self.assertTrue(is_timedelta64_ns_dtype('timedelta64[ns]')) + assert is_timedelta64_dtype('timedelta64') + assert is_timedelta64_dtype('timedelta64[ns]') + assert not is_timedelta64_ns_dtype('timedelta64') + assert is_timedelta64_ns_dtype('timedelta64[ns]') tdi = TimedeltaIndex([1e14, 2e14], dtype='timedelta64') - self.assertTrue(is_timedelta64_dtype(tdi)) - self.assertTrue(is_timedelta64_ns_dtype(tdi)) - self.assertTrue(is_timedelta64_ns_dtype(tdi.astype('timedelta64[ns]'))) + assert is_timedelta64_dtype(tdi) + assert is_timedelta64_ns_dtype(tdi) + assert is_timedelta64_ns_dtype(tdi.astype('timedelta64[ns]')) # Conversion to Int64Index: - self.assertFalse(is_timedelta64_ns_dtype(tdi.astype('timedelta64'))) - self.assertFalse(is_timedelta64_ns_dtype(tdi.astype('timedelta64[h]'))) + assert not is_timedelta64_ns_dtype(tdi.astype('timedelta64')) + assert not is_timedelta64_ns_dtype(tdi.astype('timedelta64[h]')) -class Testisscalar(tm.TestCase): +class Testisscalar(object): def test_isscalar_builtin_scalars(self): - self.assertTrue(is_scalar(None)) - self.assertTrue(is_scalar(True)) - self.assertTrue(is_scalar(False)) - self.assertTrue(is_scalar(0.)) - self.assertTrue(is_scalar(np.nan)) - self.assertTrue(is_scalar('foobar')) - self.assertTrue(is_scalar(b'foobar')) - self.assertTrue(is_scalar(u('efoobar'))) - self.assertTrue(is_scalar(datetime(2014, 1, 1))) - self.assertTrue(is_scalar(date(2014, 1, 1))) - self.assertTrue(is_scalar(time(12, 0))) - self.assertTrue(is_scalar(timedelta(hours=1))) - 
self.assertTrue(is_scalar(pd.NaT)) + assert is_scalar(None) + assert is_scalar(True) + assert is_scalar(False) + assert is_scalar(0.) + assert is_scalar(np.nan) + assert is_scalar('foobar') + assert is_scalar(b'foobar') + assert is_scalar(u('efoobar')) + assert is_scalar(datetime(2014, 1, 1)) + assert is_scalar(date(2014, 1, 1)) + assert is_scalar(time(12, 0)) + assert is_scalar(timedelta(hours=1)) + assert is_scalar(pd.NaT) def test_isscalar_builtin_nonscalars(self): - self.assertFalse(is_scalar({})) - self.assertFalse(is_scalar([])) - self.assertFalse(is_scalar([1])) - self.assertFalse(is_scalar(())) - self.assertFalse(is_scalar((1, ))) - self.assertFalse(is_scalar(slice(None))) - self.assertFalse(is_scalar(Ellipsis)) + assert not is_scalar({}) + assert not is_scalar([]) + assert not is_scalar([1]) + assert not is_scalar(()) + assert not is_scalar((1, )) + assert not is_scalar(slice(None)) + assert not is_scalar(Ellipsis) def test_isscalar_numpy_array_scalars(self): - self.assertTrue(is_scalar(np.int64(1))) - self.assertTrue(is_scalar(np.float64(1.))) - self.assertTrue(is_scalar(np.int32(1))) - self.assertTrue(is_scalar(np.object_('foobar'))) - self.assertTrue(is_scalar(np.str_('foobar'))) - self.assertTrue(is_scalar(np.unicode_(u('foobar')))) - self.assertTrue(is_scalar(np.bytes_(b'foobar'))) - self.assertTrue(is_scalar(np.datetime64('2014-01-01'))) - self.assertTrue(is_scalar(np.timedelta64(1, 'h'))) + assert is_scalar(np.int64(1)) + assert is_scalar(np.float64(1.)) + assert is_scalar(np.int32(1)) + assert is_scalar(np.object_('foobar')) + assert is_scalar(np.str_('foobar')) + assert is_scalar(np.unicode_(u('foobar'))) + assert is_scalar(np.bytes_(b'foobar')) + assert is_scalar(np.datetime64('2014-01-01')) + assert is_scalar(np.timedelta64(1, 'h')) def test_isscalar_numpy_zerodim_arrays(self): for zerodim in [np.array(1), np.array('foobar'), np.array(np.datetime64('2014-01-01')), np.array(np.timedelta64(1, 'h')), np.array(np.datetime64('NaT'))]: - self.assertFalse(is_scalar(zerodim)) - self.assertTrue(is_scalar(lib.item_from_zerodim(zerodim))) + assert not is_scalar(zerodim) + assert is_scalar(lib.item_from_zerodim(zerodim)) def test_isscalar_numpy_arrays(self): - self.assertFalse(is_scalar(np.array([]))) - self.assertFalse(is_scalar(np.array([[]]))) - self.assertFalse(is_scalar(np.matrix('1; 2'))) + assert not is_scalar(np.array([])) + assert not is_scalar(np.array([[]])) + assert not is_scalar(np.matrix('1; 2')) def test_isscalar_pandas_scalars(self): - self.assertTrue(is_scalar(Timestamp('2014-01-01'))) - self.assertTrue(is_scalar(Timedelta(hours=1))) - self.assertTrue(is_scalar(Period('2014-01-01'))) + assert is_scalar(Timestamp('2014-01-01')) + assert is_scalar(Timedelta(hours=1)) + assert is_scalar(Period('2014-01-01')) def test_lisscalar_pandas_containers(self): - self.assertFalse(is_scalar(Series())) - self.assertFalse(is_scalar(Series([1]))) - self.assertFalse(is_scalar(DataFrame())) - self.assertFalse(is_scalar(DataFrame([[1]]))) - self.assertFalse(is_scalar(Panel())) - self.assertFalse(is_scalar(Panel([[[1]]]))) - self.assertFalse(is_scalar(Index([]))) - self.assertFalse(is_scalar(Index([1]))) + assert not is_scalar(Series()) + assert not is_scalar(Series([1])) + assert not is_scalar(DataFrame()) + assert not is_scalar(DataFrame([[1]])) + with catch_warnings(record=True): + assert not is_scalar(Panel()) + assert not is_scalar(Panel([[[1]]])) + assert not is_scalar(Index([])) + assert not is_scalar(Index([1])) def test_datetimeindex_from_empty_datetime64_array(): @@ -942,7 
+1042,7 @@ def test_nan_to_nat_conversions(): s = df['B'].copy() s._data = s._data.setitem(indexer=tuple([slice(8, 9)]), value=np.nan) - assert (isnull(s[8])) + assert (isna(s[8])) # numpy < 1.7.0 is wrong from distutils.version import LooseVersion @@ -950,6 +1050,12 @@ def test_nan_to_nat_conversions(): assert (s[8].value == np.datetime64('NaT').astype(np.int64)) +def test_is_scipy_sparse(spmatrix): # noqa: F811 + tm._skip_if_no_scipy() + assert is_scipy_sparse(spmatrix([[0, 1]])) + assert not is_scipy_sparse(np.array([1])) + + def test_ensure_int32(): values = np.arange(10, dtype=np.int32) result = _ensure_int32(values) @@ -968,8 +1074,3 @@ def test_ensure_categorical(): values = Categorical(values) result = _ensure_categorical(values) tm.assert_categorical_equal(result, values) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/types/test_io.py b/pandas/tests/dtypes/test_io.py similarity index 74% rename from pandas/tests/types/test_io.py rename to pandas/tests/dtypes/test_io.py index 545edf8f1386c..ae92e9ecca681 100644 --- a/pandas/tests/types/test_io.py +++ b/pandas/tests/dtypes/test_io.py @@ -1,25 +1,25 @@ # -*- coding: utf-8 -*- import numpy as np -import pandas.lib as lib +import pandas._libs.lib as lib import pandas.util.testing as tm from pandas.compat import long, u -class TestParseSQL(tm.TestCase): +class TestParseSQL(object): def test_convert_sql_column_floats(self): arr = np.array([1.5, None, 3, 4.2], dtype=object) result = lib.convert_sql_column(arr) expected = np.array([1.5, np.nan, 3, 4.2], dtype='f8') - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) def test_convert_sql_column_strings(self): arr = np.array(['1.5', None, '3', '4.2'], dtype=object) result = lib.convert_sql_column(arr) expected = np.array(['1.5', np.nan, '3', '4.2'], dtype=object) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) def test_convert_sql_column_unicode(self): arr = np.array([u('1.5'), None, u('3'), u('4.2')], @@ -27,7 +27,7 @@ def test_convert_sql_column_unicode(self): result = lib.convert_sql_column(arr) expected = np.array([u('1.5'), np.nan, u('3'), u('4.2')], dtype=object) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) def test_convert_sql_column_ints(self): arr = np.array([1, 2, 3, 4], dtype='O') @@ -35,82 +35,75 @@ def test_convert_sql_column_ints(self): result = lib.convert_sql_column(arr) result2 = lib.convert_sql_column(arr2) expected = np.array([1, 2, 3, 4], dtype='i8') - self.assert_numpy_array_equal(result, expected) - self.assert_numpy_array_equal(result2, expected) + tm.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result2, expected) arr = np.array([1, 2, 3, None, 4], dtype='O') result = lib.convert_sql_column(arr) expected = np.array([1, 2, 3, np.nan, 4], dtype='f8') - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) def test_convert_sql_column_longs(self): arr = np.array([long(1), long(2), long(3), long(4)], dtype='O') result = lib.convert_sql_column(arr) expected = np.array([1, 2, 3, 4], dtype='i8') - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) arr = np.array([long(1), long(2), long(3), None, long(4)], dtype='O') result = lib.convert_sql_column(arr) expected = np.array([1, 2, 3, np.nan, 4], dtype='f8') - 
self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) def test_convert_sql_column_bools(self): arr = np.array([True, False, True, False], dtype='O') result = lib.convert_sql_column(arr) expected = np.array([True, False, True, False], dtype=bool) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) arr = np.array([True, False, None, False], dtype='O') result = lib.convert_sql_column(arr) expected = np.array([True, False, np.nan, False], dtype=object) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) def test_convert_sql_column_decimals(self): from decimal import Decimal arr = np.array([Decimal('1.5'), None, Decimal('3'), Decimal('4.2')]) result = lib.convert_sql_column(arr) expected = np.array([1.5, np.nan, 3, 4.2], dtype='f8') - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) def test_convert_downcast_int64(self): - from pandas.parser import na_values + from pandas._libs.parsers import na_values arr = np.array([1, 2, 7, 8, 10], dtype=np.int64) expected = np.array([1, 2, 7, 8, 10], dtype=np.int8) # default argument result = lib.downcast_int64(arr, na_values) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) result = lib.downcast_int64(arr, na_values, use_unsigned=False) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) expected = np.array([1, 2, 7, 8, 10], dtype=np.uint8) result = lib.downcast_int64(arr, na_values, use_unsigned=True) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) # still cast to int8 despite use_unsigned=True # because of the negative number as an element arr = np.array([1, 2, -7, 8, 10], dtype=np.int64) expected = np.array([1, 2, -7, 8, 10], dtype=np.int8) result = lib.downcast_int64(arr, na_values, use_unsigned=True) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) arr = np.array([1, 2, 7, 8, 300], dtype=np.int64) expected = np.array([1, 2, 7, 8, 300], dtype=np.int16) result = lib.downcast_int64(arr, na_values) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) int8_na = na_values[np.int8] int64_na = na_values[np.int64] arr = np.array([int64_na, 2, 3, 10, 15], dtype=np.int64) expected = np.array([int8_na, 2, 3, 10, 15], dtype=np.int8) result = lib.downcast_int64(arr, na_values) - self.assert_numpy_array_equal(result, expected) - - -if __name__ == '__main__': - import nose - - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + tm.assert_numpy_array_equal(result, expected) diff --git a/pandas/tests/types/test_missing.py b/pandas/tests/dtypes/test_missing.py similarity index 60% rename from pandas/tests/types/test_missing.py rename to pandas/tests/dtypes/test_missing.py index fa2bd535bb8d5..d3c9ca51af18f 100644 --- a/pandas/tests/types/test_missing.py +++ b/pandas/tests/dtypes/test_missing.py @@ -1,6 +1,7 @@ # -*- coding: utf-8 -*- -import nose +import pytest +from warnings import catch_warnings import numpy as np from datetime import datetime from pandas.util import testing as tm @@ -8,158 +9,186 @@ import pandas as pd from pandas.core import config as cf from pandas.compat import u -from pandas.tslib import iNaT +from pandas._libs.tslib import iNaT from pandas import (NaT, Float64Index, Series, DatetimeIndex, 
TimedeltaIndex, date_range) -from pandas.types.dtypes import DatetimeTZDtype -from pandas.types.missing import (array_equivalent, isnull, notnull, - na_value_for_dtype) +from pandas.core.dtypes.common import is_scalar +from pandas.core.dtypes.dtypes import DatetimeTZDtype +from pandas.core.dtypes.missing import ( + array_equivalent, isna, notna, isnull, notnull, + na_value_for_dtype) -_multiprocess_can_split_ = True +@pytest.mark.parametrize('notna_f', [notna, notnull]) +def test_notna_notnull(notna_f): + assert notna_f(1.) + assert not notna_f(None) + assert not notna_f(np.NaN) -def test_notnull(): - assert notnull(1.) - assert not notnull(None) - assert not notnull(np.NaN) - - with cf.option_context("mode.use_inf_as_null", False): - assert notnull(np.inf) - assert notnull(-np.inf) + with cf.option_context("mode.use_inf_as_na", False): + assert notna_f(np.inf) + assert notna_f(-np.inf) arr = np.array([1.5, np.inf, 3.5, -np.inf]) - result = notnull(arr) + result = notna_f(arr) assert result.all() - with cf.option_context("mode.use_inf_as_null", True): - assert not notnull(np.inf) - assert not notnull(-np.inf) + with cf.option_context("mode.use_inf_as_na", True): + assert not notna_f(np.inf) + assert not notna_f(-np.inf) arr = np.array([1.5, np.inf, 3.5, -np.inf]) - result = notnull(arr) + result = notna_f(arr) assert result.sum() == 2 - with cf.option_context("mode.use_inf_as_null", False): + with cf.option_context("mode.use_inf_as_na", False): for s in [tm.makeFloatSeries(), tm.makeStringSeries(), tm.makeObjectSeries(), tm.makeTimeSeries(), tm.makePeriodSeries()]: - assert (isinstance(isnull(s), Series)) + assert (isinstance(notna_f(s), Series)) -class TestIsNull(tm.TestCase): +class TestIsNA(object): def test_0d_array(self): - self.assertTrue(isnull(np.array(np.nan))) - self.assertFalse(isnull(np.array(0.0))) - self.assertFalse(isnull(np.array(0))) + assert isna(np.array(np.nan)) + assert not isna(np.array(0.0)) + assert not isna(np.array(0)) # test object dtype - self.assertTrue(isnull(np.array(np.nan, dtype=object))) - self.assertFalse(isnull(np.array(0.0, dtype=object))) - self.assertFalse(isnull(np.array(0, dtype=object))) - - def test_isnull(self): - self.assertFalse(isnull(1.)) - self.assertTrue(isnull(None)) - self.assertTrue(isnull(np.NaN)) - self.assertTrue(float('nan')) - self.assertFalse(isnull(np.inf)) - self.assertFalse(isnull(-np.inf)) + assert isna(np.array(np.nan, dtype=object)) + assert not isna(np.array(0.0, dtype=object)) + assert not isna(np.array(0, dtype=object)) + + def test_empty_object(self): + + for shape in [(4, 0), (4,)]: + arr = np.empty(shape=shape, dtype=object) + result = isna(arr) + expected = np.ones(shape=shape, dtype=bool) + tm.assert_numpy_array_equal(result, expected) + + @pytest.mark.parametrize('isna_f', [isna, isnull]) + def test_isna_isnull(self, isna_f): + assert not isna_f(1.) 
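The `@pytest.mark.parametrize('notna_f', [notna, notnull])` decorator above is the pattern this PR uses to keep the deprecated aliases under test without duplicating test bodies. A minimal sketch of the same idiom, using only the `isna`/`isnull` pair this PR re-exports from `pandas.core.dtypes.missing` (the test name here is illustrative):

```python
import numpy as np
import pytest
from pandas.core.dtypes.missing import isna, isnull


@pytest.mark.parametrize('isna_f', [isna, isnull])  # run once per alias
def test_isna_aliases_agree(isna_f):
    # The new name and the deprecated alias must classify scalars
    # identically, so one body covers both.
    assert isna_f(None)
    assert isna_f(np.nan)
    assert not isna_f(1.0)
    assert not isna_f('x')
```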
+        assert isna_f(None)
+        assert isna_f(np.NaN)
+        assert isna_f(float('nan'))
+        assert not isna_f(np.inf)
+        assert not isna_f(-np.inf)

         # series
         for s in [tm.makeFloatSeries(), tm.makeStringSeries(),
                   tm.makeObjectSeries(), tm.makeTimeSeries(),
                   tm.makePeriodSeries()]:
-            self.assertIsInstance(isnull(s), Series)
+            assert isinstance(isna_f(s), Series)

         # frame
         for df in [tm.makeTimeDataFrame(), tm.makePeriodFrame(),
                    tm.makeMixedDataFrame()]:
-            result = isnull(df)
-            expected = df.apply(isnull)
+            result = isna_f(df)
+            expected = df.apply(isna_f)
             tm.assert_frame_equal(result, expected)

         # panel
-        for p in [tm.makePanel(), tm.makePeriodPanel(),
-                  tm.add_nans(tm.makePanel())]:
-            result = isnull(p)
-            expected = p.apply(isnull)
-            tm.assert_panel_equal(result, expected)
+        with catch_warnings(record=True):
+            for p in [tm.makePanel(), tm.makePeriodPanel(),
+                      tm.add_nans(tm.makePanel())]:
+                result = isna_f(p)
+                expected = p.apply(isna_f)
+                tm.assert_panel_equal(result, expected)

         # panel 4d
-        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+        with catch_warnings(record=True):
             for p in [tm.makePanel4D(),
                       tm.add_nans_panel4d(tm.makePanel4D())]:
-                result = isnull(p)
-                expected = p.apply(isnull)
+                result = isna_f(p)
+                expected = p.apply(isna_f)
                 tm.assert_panel4d_equal(result, expected)

-    def test_isnull_lists(self):
-        result = isnull([[False]])
+    def test_isna_lists(self):
+        result = isna([[False]])
         exp = np.array([[False]])
         tm.assert_numpy_array_equal(result, exp)

-        result = isnull([[1], [2]])
+        result = isna([[1], [2]])
         exp = np.array([[False], [False]])
         tm.assert_numpy_array_equal(result, exp)

         # list of strings / unicode
-        result = isnull(['foo', 'bar'])
+        result = isna(['foo', 'bar'])
         exp = np.array([False, False])
         tm.assert_numpy_array_equal(result, exp)

-        result = isnull([u('foo'), u('bar')])
+        result = isna([u('foo'), u('bar')])
         exp = np.array([False, False])
         tm.assert_numpy_array_equal(result, exp)

-    def test_isnull_nat(self):
-        result = isnull([NaT])
+    def test_isna_nat(self):
+        result = isna([NaT])
         exp = np.array([True])
         tm.assert_numpy_array_equal(result, exp)

-        result = isnull(np.array([NaT], dtype=object))
+        result = isna(np.array([NaT], dtype=object))
         exp = np.array([True])
         tm.assert_numpy_array_equal(result, exp)

-    def test_isnull_numpy_nat(self):
+    def test_isna_numpy_nat(self):
         arr = np.array([NaT, np.datetime64('NaT'), np.timedelta64('NaT'),
                         np.datetime64('NaT', 's')])
-        result = isnull(arr)
+        result = isna(arr)
         expected = np.array([True] * 4)
         tm.assert_numpy_array_equal(result, expected)

-    def test_isnull_datetime(self):
-        self.assertFalse(isnull(datetime.now()))
-        self.assertTrue(notnull(datetime.now()))
+    def test_isna_datetime(self):
+        assert not isna(datetime.now())
+        assert notna(datetime.now())

         idx = date_range('1/1/1990', periods=20)
         exp = np.ones(len(idx), dtype=bool)
-        tm.assert_numpy_array_equal(notnull(idx), exp)
+        tm.assert_numpy_array_equal(notna(idx), exp)

         idx = np.asarray(idx)
         idx[0] = iNaT
         idx = DatetimeIndex(idx)
-        mask = isnull(idx)
-        self.assertTrue(mask[0])
+        mask = isna(idx)
+        assert mask[0]
         exp = np.array([True] + [False] * (len(idx) - 1), dtype=bool)
-        self.assert_numpy_array_equal(mask, exp)
+        tm.assert_numpy_array_equal(mask, exp)

         # GH 9129
         pidx = idx.to_period(freq='M')
-        mask = isnull(pidx)
-        self.assertTrue(mask[0])
+        mask = isna(pidx)
+        assert mask[0]
         exp = np.array([True] + [False] * (len(idx) - 1), dtype=bool)
-        self.assert_numpy_array_equal(mask, exp)
+        tm.assert_numpy_array_equal(mask, exp)

-        mask = isnull(pidx[1:])
+        mask = isna(pidx[1:])
         exp =
np.zeros(len(mask), dtype=bool) - self.assert_numpy_array_equal(mask, exp) + tm.assert_numpy_array_equal(mask, exp) + + @pytest.mark.parametrize( + "value, expected", + [(np.complex128(np.nan), True), + (np.float64(1), False), + (np.array([1, 1 + 0j, np.nan, 3]), + np.array([False, False, True, False])), + (np.array([1, 1 + 0j, np.nan, 3], dtype=object), + np.array([False, False, True, False])), + (np.array([1, 1 + 0j, np.nan, 3]).astype(object), + np.array([False, False, True, False]))]) + def test_complex(self, value, expected): + result = isna(value) + if is_scalar(result): + assert result is expected + else: + tm.assert_numpy_array_equal(result, expected) def test_datetime_other_units(self): idx = pd.DatetimeIndex(['2011-01-01', 'NaT', '2011-01-02']) exp = np.array([False, True, False]) - tm.assert_numpy_array_equal(isnull(idx), exp) - tm.assert_numpy_array_equal(notnull(idx), ~exp) - tm.assert_numpy_array_equal(isnull(idx.values), exp) - tm.assert_numpy_array_equal(notnull(idx.values), ~exp) + tm.assert_numpy_array_equal(isna(idx), exp) + tm.assert_numpy_array_equal(notna(idx), ~exp) + tm.assert_numpy_array_equal(isna(idx.values), exp) + tm.assert_numpy_array_equal(notna(idx.values), ~exp) for dtype in ['datetime64[D]', 'datetime64[h]', 'datetime64[m]', 'datetime64[s]', 'datetime64[ms]', 'datetime64[us]', @@ -167,24 +196,24 @@ def test_datetime_other_units(self): values = idx.values.astype(dtype) exp = np.array([False, True, False]) - tm.assert_numpy_array_equal(isnull(values), exp) - tm.assert_numpy_array_equal(notnull(values), ~exp) + tm.assert_numpy_array_equal(isna(values), exp) + tm.assert_numpy_array_equal(notna(values), ~exp) exp = pd.Series([False, True, False]) s = pd.Series(values) - tm.assert_series_equal(isnull(s), exp) - tm.assert_series_equal(notnull(s), ~exp) + tm.assert_series_equal(isna(s), exp) + tm.assert_series_equal(notna(s), ~exp) s = pd.Series(values, dtype=object) - tm.assert_series_equal(isnull(s), exp) - tm.assert_series_equal(notnull(s), ~exp) + tm.assert_series_equal(isna(s), exp) + tm.assert_series_equal(notna(s), ~exp) def test_timedelta_other_units(self): idx = pd.TimedeltaIndex(['1 days', 'NaT', '2 days']) exp = np.array([False, True, False]) - tm.assert_numpy_array_equal(isnull(idx), exp) - tm.assert_numpy_array_equal(notnull(idx), ~exp) - tm.assert_numpy_array_equal(isnull(idx.values), exp) - tm.assert_numpy_array_equal(notnull(idx.values), ~exp) + tm.assert_numpy_array_equal(isna(idx), exp) + tm.assert_numpy_array_equal(notna(idx), ~exp) + tm.assert_numpy_array_equal(isna(idx.values), exp) + tm.assert_numpy_array_equal(notna(idx.values), ~exp) for dtype in ['timedelta64[D]', 'timedelta64[h]', 'timedelta64[m]', 'timedelta64[s]', 'timedelta64[ms]', 'timedelta64[us]', @@ -192,30 +221,30 @@ def test_timedelta_other_units(self): values = idx.values.astype(dtype) exp = np.array([False, True, False]) - tm.assert_numpy_array_equal(isnull(values), exp) - tm.assert_numpy_array_equal(notnull(values), ~exp) + tm.assert_numpy_array_equal(isna(values), exp) + tm.assert_numpy_array_equal(notna(values), ~exp) exp = pd.Series([False, True, False]) s = pd.Series(values) - tm.assert_series_equal(isnull(s), exp) - tm.assert_series_equal(notnull(s), ~exp) + tm.assert_series_equal(isna(s), exp) + tm.assert_series_equal(notna(s), ~exp) s = pd.Series(values, dtype=object) - tm.assert_series_equal(isnull(s), exp) - tm.assert_series_equal(notnull(s), ~exp) + tm.assert_series_equal(isna(s), exp) + tm.assert_series_equal(notna(s), ~exp) def test_period(self): idx = 
pd.PeriodIndex(['2011-01', 'NaT', '2012-01'], freq='M') exp = np.array([False, True, False]) - tm.assert_numpy_array_equal(isnull(idx), exp) - tm.assert_numpy_array_equal(notnull(idx), ~exp) + tm.assert_numpy_array_equal(isna(idx), exp) + tm.assert_numpy_array_equal(notna(idx), ~exp) exp = pd.Series([False, True, False]) s = pd.Series(idx) - tm.assert_series_equal(isnull(s), exp) - tm.assert_series_equal(notnull(s), ~exp) + tm.assert_series_equal(isna(s), exp) + tm.assert_series_equal(notna(s), ~exp) s = pd.Series(idx, dtype=object) - tm.assert_series_equal(isnull(s), exp) - tm.assert_series_equal(notnull(s), ~exp) + tm.assert_series_equal(isna(s), exp) + tm.assert_series_equal(notna(s), ~exp) def test_array_equivalent(): @@ -304,8 +333,3 @@ def test_na_value_for_dtype(): for dtype in ['O']: assert np.isnan(na_value_for_dtype(np.dtype(dtype))) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/formats/test_format.py b/pandas/tests/formats/test_format.py deleted file mode 100644 index 9eff64b40625d..0000000000000 --- a/pandas/tests/formats/test_format.py +++ /dev/null @@ -1,5006 +0,0 @@ -# -*- coding: utf-8 -*- - -# TODO(wesm): lots of issues making flake8 hard -# flake8: noqa - -from __future__ import print_function -from distutils.version import LooseVersion -import re - -from pandas.compat import (range, zip, lrange, StringIO, PY3, - u, lzip, is_platform_windows, - is_platform_32bit) -import pandas.compat as compat -import itertools -from operator import methodcaller -import os -import sys -from textwrap import dedent -import warnings - -from numpy import nan -from numpy.random import randn -import numpy as np - -import codecs - -div_style = '' -try: - import IPython - if IPython.__version__ < LooseVersion('3.0.0'): - div_style = ' style="max-width:1500px;overflow:auto;"' -except (ImportError, AttributeError): - pass - -from pandas import DataFrame, Series, Index, Timestamp, MultiIndex, date_range, NaT - -import pandas.formats.format as fmt -import pandas.util.testing as tm -import pandas.core.common as com -import pandas.formats.printing as printing -from pandas.util.terminal import get_terminal_size -import pandas as pd -from pandas.core.config import (set_option, get_option, option_context, - reset_option) -from datetime import datetime - -import nose - -use_32bit_repr = is_platform_windows() or is_platform_32bit() - -_frame = DataFrame(tm.getSeriesData()) - - -def curpath(): - pth, _ = os.path.split(os.path.abspath(__file__)) - return pth - - -def has_info_repr(df): - r = repr(df) - c1 = r.split('\n')[0].startswith(", 2. Index, 3. Columns, 4. dtype, 5. memory usage, 6. trailing newline - return has_info and nv - - -def has_horizontally_truncated_repr(df): - try: # Check header row - fst_line = np.array(repr(df).splitlines()[0].split()) - cand_col = np.where(fst_line == '...')[0][0] - except: - return False - # Make sure each row has this ... 
in the same place - r = repr(df) - for ix, l in enumerate(r.splitlines()): - if not r.split()[cand_col] == '...': - return False - return True - - -def has_vertically_truncated_repr(df): - r = repr(df) - only_dot_row = False - for row in r.splitlines(): - if re.match(r'^[\.\ ]+$', row): - only_dot_row = True - return only_dot_row - - -def has_truncated_repr(df): - return has_horizontally_truncated_repr( - df) or has_vertically_truncated_repr(df) - - -def has_doubly_truncated_repr(df): - return has_horizontally_truncated_repr( - df) and has_vertically_truncated_repr(df) - - -def has_expanded_repr(df): - r = repr(df) - for line in r.split('\n'): - if line.endswith('\\'): - return True - return False - - -class TestDataFrameFormatting(tm.TestCase): - _multiprocess_can_split_ = True - - def setUp(self): - self.warn_filters = warnings.filters - warnings.filterwarnings('ignore', category=FutureWarning, - module=".*format") - - self.frame = _frame.copy() - - def tearDown(self): - warnings.filters = self.warn_filters - - def test_repr_embedded_ndarray(self): - arr = np.empty(10, dtype=[('err', object)]) - for i in range(len(arr)): - arr['err'][i] = np.random.randn(i) - - df = DataFrame(arr) - repr(df['err']) - repr(df) - df.to_string() - - def test_eng_float_formatter(self): - self.frame.loc[5] = 0 - - fmt.set_eng_float_format() - repr(self.frame) - - fmt.set_eng_float_format(use_eng_prefix=True) - repr(self.frame) - - fmt.set_eng_float_format(accuracy=0) - repr(self.frame) - self.reset_display_options() - - def test_show_null_counts(self): - - df = DataFrame(1, columns=range(10), index=range(10)) - df.iloc[1, 1] = np.nan - - def check(null_counts, result): - buf = StringIO() - df.info(buf=buf, null_counts=null_counts) - self.assertTrue(('non-null' in buf.getvalue()) is result) - - with option_context('display.max_info_rows', 20, - 'display.max_info_columns', 20): - check(None, True) - check(True, True) - check(False, False) - - with option_context('display.max_info_rows', 5, - 'display.max_info_columns', 5): - check(None, False) - check(True, False) - check(False, False) - - def test_repr_tuples(self): - buf = StringIO() - - df = DataFrame({'tups': lzip(range(10), range(10))}) - repr(df) - df.to_string(col_space=10, buf=buf) - - def test_repr_truncation(self): - max_len = 20 - with option_context("display.max_colwidth", max_len): - df = DataFrame({'A': np.random.randn(10), - 'B': [tm.rands(np.random.randint( - max_len - 1, max_len + 1)) for i in range(10) - ]}) - r = repr(df) - r = r[r.find('\n') + 1:] - - adj = fmt._get_adjustment() - - for line, value in lzip(r.split('\n'), df['B']): - if adj.len(value) + 1 > max_len: - self.assertIn('...', line) - else: - self.assertNotIn('...', line) - - with option_context("display.max_colwidth", 999999): - self.assertNotIn('...', repr(df)) - - with option_context("display.max_colwidth", max_len + 2): - self.assertNotIn('...', repr(df)) - - def test_repr_chop_threshold(self): - df = DataFrame([[0.1, 0.5], [0.5, -0.1]]) - pd.reset_option("display.chop_threshold") # default None - self.assertEqual(repr(df), ' 0 1\n0 0.1 0.5\n1 0.5 -0.1') - - with option_context("display.chop_threshold", 0.2): - self.assertEqual(repr(df), ' 0 1\n0 0.0 0.5\n1 0.5 0.0') - - with option_context("display.chop_threshold", 0.6): - self.assertEqual(repr(df), ' 0 1\n0 0.0 0.0\n1 0.0 0.0') - - with option_context("display.chop_threshold", None): - self.assertEqual(repr(df), ' 0 1\n0 0.1 0.5\n1 0.5 -0.1') - - def test_repr_obeys_max_seq_limit(self): - with 
option_context("display.max_seq_items", 2000): - self.assertTrue(len(printing.pprint_thing(lrange(1000))) > 1000) - - with option_context("display.max_seq_items", 5): - self.assertTrue(len(printing.pprint_thing(lrange(1000))) < 100) - - def test_repr_set(self): - self.assertEqual(printing.pprint_thing(set([1])), '{1}') - - def test_repr_is_valid_construction_code(self): - # for the case of Index, where the repr is traditional rather then - # stylized - idx = Index(['a', 'b']) - res = eval("pd." + repr(idx)) - tm.assert_series_equal(Series(res), Series(idx)) - - def test_repr_should_return_str(self): - # http://docs.python.org/py3k/reference/datamodel.html#object.__repr__ - # http://docs.python.org/reference/datamodel.html#object.__repr__ - # "...The return value must be a string object." - - # (str on py2.x, str (unicode) on py3) - - data = [8, 5, 3, 5] - index1 = [u("\u03c3"), u("\u03c4"), u("\u03c5"), u("\u03c6")] - cols = [u("\u03c8")] - df = DataFrame(data, columns=cols, index=index1) - self.assertTrue(type(df.__repr__()) == str) # both py2 / 3 - - def test_repr_no_backslash(self): - with option_context('mode.sim_interactive', True): - df = DataFrame(np.random.randn(10, 4)) - self.assertTrue('\\' not in repr(df)) - - def test_expand_frame_repr(self): - df_small = DataFrame('hello', [0], [0]) - df_wide = DataFrame('hello', [0], lrange(10)) - df_tall = DataFrame('hello', lrange(30), lrange(5)) - - with option_context('mode.sim_interactive', True): - with option_context('display.max_columns', 10, 'display.width', 20, - 'display.max_rows', 20, - 'display.show_dimensions', True): - with option_context('display.expand_frame_repr', True): - self.assertFalse(has_truncated_repr(df_small)) - self.assertFalse(has_expanded_repr(df_small)) - self.assertFalse(has_truncated_repr(df_wide)) - self.assertTrue(has_expanded_repr(df_wide)) - self.assertTrue(has_vertically_truncated_repr(df_tall)) - self.assertTrue(has_expanded_repr(df_tall)) - - with option_context('display.expand_frame_repr', False): - self.assertFalse(has_truncated_repr(df_small)) - self.assertFalse(has_expanded_repr(df_small)) - self.assertFalse(has_horizontally_truncated_repr(df_wide)) - self.assertFalse(has_expanded_repr(df_wide)) - self.assertTrue(has_vertically_truncated_repr(df_tall)) - self.assertFalse(has_expanded_repr(df_tall)) - - def test_repr_non_interactive(self): - # in non interactive mode, there can be no dependency on the - # result of terminal auto size detection - df = DataFrame('hello', lrange(1000), lrange(5)) - - with option_context('mode.sim_interactive', False, 'display.width', 0, - 'display.height', 0, 'display.max_rows', 5000): - self.assertFalse(has_truncated_repr(df)) - self.assertFalse(has_expanded_repr(df)) - - def test_repr_max_columns_max_rows(self): - term_width, term_height = get_terminal_size() - if term_width < 10 or term_height < 10: - raise nose.SkipTest("terminal size too small, " - "{0} x {1}".format(term_width, term_height)) - - def mkframe(n): - index = ['%05d' % i for i in range(n)] - return DataFrame(0, index, index) - - df6 = mkframe(6) - df10 = mkframe(10) - with option_context('mode.sim_interactive', True): - with option_context('display.width', term_width * 2): - with option_context('display.max_rows', 5, - 'display.max_columns', 5): - self.assertFalse(has_expanded_repr(mkframe(4))) - self.assertFalse(has_expanded_repr(mkframe(5))) - self.assertFalse(has_expanded_repr(df6)) - self.assertTrue(has_doubly_truncated_repr(df6)) - - with option_context('display.max_rows', 20, - 
'display.max_columns', 10): - # Out off max_columns boundary, but no extending - # since not exceeding width - self.assertFalse(has_expanded_repr(df6)) - self.assertFalse(has_truncated_repr(df6)) - - with option_context('display.max_rows', 9, - 'display.max_columns', 10): - # out vertical bounds can not result in exanded repr - self.assertFalse(has_expanded_repr(df10)) - self.assertTrue(has_vertically_truncated_repr(df10)) - - # width=None in terminal, auto detection - with option_context('display.max_columns', 100, 'display.max_rows', - term_width * 20, 'display.width', None): - df = mkframe((term_width // 7) - 2) - self.assertFalse(has_expanded_repr(df)) - df = mkframe((term_width // 7) + 2) - printing.pprint_thing(df._repr_fits_horizontal_()) - self.assertTrue(has_expanded_repr(df)) - - def test_str_max_colwidth(self): - # GH 7856 - df = pd.DataFrame([{'a': 'foo', - 'b': 'bar', - 'c': 'uncomfortably long line with lots of stuff', - 'd': 1}, {'a': 'foo', - 'b': 'bar', - 'c': 'stuff', - 'd': 1}]) - df.set_index(['a', 'b', 'c']) - self.assertTrue( - str(df) == - ' a b c d\n' - '0 foo bar uncomfortably long line with lots of stuff 1\n' - '1 foo bar stuff 1') - with option_context('max_colwidth', 20): - self.assertTrue(str(df) == ' a b c d\n' - '0 foo bar uncomfortably lo... 1\n' - '1 foo bar stuff 1') - - def test_auto_detect(self): - term_width, term_height = get_terminal_size() - fac = 1.05 # Arbitrary large factor to exceed term widht - cols = range(int(term_width * fac)) - index = range(10) - df = DataFrame(index=index, columns=cols) - with option_context('mode.sim_interactive', True): - with option_context('max_rows', None): - with option_context('max_columns', None): - # Wrap around with None - self.assertTrue(has_expanded_repr(df)) - with option_context('max_rows', 0): - with option_context('max_columns', 0): - # Truncate with auto detection. - self.assertTrue(has_horizontally_truncated_repr(df)) - - index = range(int(term_height * fac)) - df = DataFrame(index=index, columns=cols) - with option_context('max_rows', 0): - with option_context('max_columns', None): - # Wrap around with None - self.assertTrue(has_expanded_repr(df)) - # Truncate vertically - self.assertTrue(has_vertically_truncated_repr(df)) - - with option_context('max_rows', None): - with option_context('max_columns', 0): - self.assertTrue(has_horizontally_truncated_repr(df)) - - def test_to_string_repr_unicode(self): - buf = StringIO() - - unicode_values = [u('\u03c3')] * 10 - unicode_values = np.array(unicode_values, dtype=object) - df = DataFrame({'unicode': unicode_values}) - df.to_string(col_space=10, buf=buf) - - # it works! 
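Although `test_format.py` is deleted wholesale here, the public API the surrounding test exercised is unchanged. A minimal sketch of writing `to_string` output into an explicit buffer, assuming the `pandas.compat.StringIO` shim these tests import (the DataFrame contents are illustrative):

```python
import numpy as np
import pandas as pd
from pandas.compat import StringIO  # py2/py3 StringIO shim used by these tests

df = pd.DataFrame({'unicode': np.array([u'\u03c3'] * 3, dtype=object)})

buf = StringIO()
df.to_string(buf=buf, col_space=10)  # col_space sets a minimum column width
print(buf.getvalue())
```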
- repr(df) - - idx = Index(['abc', u('\u03c3a'), 'aegdvg']) - ser = Series(np.random.randn(len(idx)), idx) - rs = repr(ser).split('\n') - line_len = len(rs[0]) - for line in rs[1:]: - try: - line = line.decode(get_option("display.encoding")) - except: - pass - if not line.startswith('dtype:'): - self.assertEqual(len(line), line_len) - - # it works even if sys.stdin in None - _stdin = sys.stdin - try: - sys.stdin = None - repr(df) - finally: - sys.stdin = _stdin - - def test_to_string_unicode_columns(self): - df = DataFrame({u('\u03c3'): np.arange(10.)}) - - buf = StringIO() - df.to_string(buf=buf) - buf.getvalue() - - buf = StringIO() - df.info(buf=buf) - buf.getvalue() - - result = self.frame.to_string() - tm.assertIsInstance(result, compat.text_type) - - def test_to_string_utf8_columns(self): - n = u("\u05d0").encode('utf-8') - - with option_context('display.max_rows', 1): - df = DataFrame([1, 2], columns=[n]) - repr(df) - - def test_to_string_unicode_two(self): - dm = DataFrame({u('c/\u03c3'): []}) - buf = StringIO() - dm.to_string(buf) - - def test_to_string_unicode_three(self): - dm = DataFrame(['\xc2']) - buf = StringIO() - dm.to_string(buf) - - def test_to_string_with_formatters(self): - df = DataFrame({'int': [1, 2, 3], - 'float': [1.0, 2.0, 3.0], - 'object': [(1, 2), True, False]}, - columns=['int', 'float', 'object']) - - formatters = [('int', lambda x: '0x%x' % x), - ('float', lambda x: '[% 4.1f]' % x), - ('object', lambda x: '-%s-' % str(x))] - result = df.to_string(formatters=dict(formatters)) - result2 = df.to_string(formatters=lzip(*formatters)[1]) - self.assertEqual(result, (' int float object\n' - '0 0x1 [ 1.0] -(1, 2)-\n' - '1 0x2 [ 2.0] -True-\n' - '2 0x3 [ 3.0] -False-')) - self.assertEqual(result, result2) - - def test_to_string_with_datetime64_monthformatter(self): - months = [datetime(2016, 1, 1), datetime(2016, 2, 2)] - x = DataFrame({'months': months}) - - def format_func(x): - return x.strftime('%Y-%m') - result = x.to_string(formatters={'months': format_func}) - expected = 'months\n0 2016-01\n1 2016-02' - self.assertEqual(result.strip(), expected) - - def test_to_string_with_datetime64_hourformatter(self): - - x = DataFrame({'hod': pd.to_datetime(['10:10:10.100', '12:12:12.120'], - format='%H:%M:%S.%f')}) - - def format_func(x): - return x.strftime('%H:%M') - - result = x.to_string(formatters={'hod': format_func}) - expected = 'hod\n0 10:10\n1 12:12' - self.assertEqual(result.strip(), expected) - - def test_to_string_with_formatters_unicode(self): - df = DataFrame({u('c/\u03c3'): [1, 2, 3]}) - result = df.to_string(formatters={u('c/\u03c3'): lambda x: '%s' % x}) - self.assertEqual(result, u(' c/\u03c3\n') + '0 1\n1 2\n2 3') - - def test_east_asian_unicode_frame(self): - if PY3: - _rep = repr - else: - _rep = unicode - - # not alighned properly because of east asian width - - # mid col - df = DataFrame({'a': [u'あ', u'いいい', u'う', u'ええええええ'], - 'b': [1, 222, 33333, 4]}, - index=['a', 'bb', 'c', 'ddd']) - expected = (u" a b\na あ 1\n" - u"bb いいい 222\nc う 33333\n" - u"ddd ええええええ 4") - self.assertEqual(_rep(df), expected) - - # last col - df = DataFrame({'a': [1, 222, 33333, 4], - 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, - index=['a', 'bb', 'c', 'ddd']) - expected = (u" a b\na 1 あ\n" - u"bb 222 いいい\nc 33333 う\n" - u"ddd 4 ええええええ") - self.assertEqual(_rep(df), expected) - - # all col - df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], - 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, - index=['a', 'bb', 'c', 'ddd']) - expected = (u" a b\na あああああ あ\n" - u"bb い いいい\nc う う\n" - 
u"ddd えええ ええええええ") - self.assertEqual(_rep(df), expected) - - # column name - df = DataFrame({u'あああああ': [1, 222, 33333, 4], - 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, - index=['a', 'bb', 'c', 'ddd']) - expected = (u" b あああああ\na あ 1\n" - u"bb いいい 222\nc う 33333\n" - u"ddd ええええええ 4") - self.assertEqual(_rep(df), expected) - - # index - df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], - 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, - index=[u'あああ', u'いいいいいい', u'うう', u'え']) - expected = (u" a b\nあああ あああああ あ\n" - u"いいいいいい い いいい\nうう う う\n" - u"え えええ ええええええ") - self.assertEqual(_rep(df), expected) - - # index name - df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], - 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, - index=pd.Index([u'あ', u'い', u'うう', u'え'], name=u'おおおお')) - expected = (u" a b\nおおおお \nあ あああああ あ\n" - u"い い いいい\nうう う う\nえ えええ ええええええ" - ) - self.assertEqual(_rep(df), expected) - - # all - df = DataFrame({u'あああ': [u'あああ', u'い', u'う', u'えええええ'], - u'いいいいい': [u'あ', u'いいい', u'う', u'ええ']}, - index=pd.Index([u'あ', u'いいい', u'うう', u'え'], name=u'お')) - expected = (u" あああ いいいいい\nお \nあ あああ あ\n" - u"いいい い いいい\nうう う う\nえ えええええ ええ") - self.assertEqual(_rep(df), expected) - - # MultiIndex - idx = pd.MultiIndex.from_tuples([(u'あ', u'いい'), (u'う', u'え'), ( - u'おおお', u'かかかか'), (u'き', u'くく')]) - df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], - 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, index=idx) - expected = (u" a b\nあ いい あああああ あ\n" - u"う え い いいい\nおおお かかかか う う\n" - u"き くく えええ ええええええ") - self.assertEqual(_rep(df), expected) - - # truncate - with option_context('display.max_rows', 3, 'display.max_columns', 3): - df = pd.DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], - 'b': [u'あ', u'いいい', u'う', u'ええええええ'], - 'c': [u'お', u'か', u'ききき', u'くくくくくく'], - u'ああああ': [u'さ', u'し', u'す', u'せ']}, - columns=['a', 'b', 'c', u'ああああ']) - - expected = (u" a ... ああああ\n0 あああああ ... さ\n" - u".. ... ... ...\n3 えええ ... せ\n" - u"\n[4 rows x 4 columns]") - self.assertEqual(_rep(df), expected) - - df.index = [u'あああ', u'いいいい', u'う', 'aaa'] - expected = (u" a ... ああああ\nあああ あああああ ... さ\n" - u".. ... ... ...\naaa えええ ... 
せ\n" - u"\n[4 rows x 4 columns]") - self.assertEqual(_rep(df), expected) - - # Emable Unicode option ----------------------------------------- - with option_context('display.unicode.east_asian_width', True): - - # mid col - df = DataFrame({'a': [u'あ', u'いいい', u'う', u'ええええええ'], - 'b': [1, 222, 33333, 4]}, - index=['a', 'bb', 'c', 'ddd']) - expected = (u" a b\na あ 1\n" - u"bb いいい 222\nc う 33333\n" - u"ddd ええええええ 4") - self.assertEqual(_rep(df), expected) - - # last col - df = DataFrame({'a': [1, 222, 33333, 4], - 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, - index=['a', 'bb', 'c', 'ddd']) - expected = (u" a b\na 1 あ\n" - u"bb 222 いいい\nc 33333 う\n" - u"ddd 4 ええええええ") - self.assertEqual(_rep(df), expected) - - # all col - df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], - 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, - index=['a', 'bb', 'c', 'ddd']) - expected = (u" a b\na あああああ あ\n" - u"bb い いいい\nc う う\n" - u"ddd えええ ええええええ" - "") - self.assertEqual(_rep(df), expected) - - # column name - df = DataFrame({u'あああああ': [1, 222, 33333, 4], - 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, - index=['a', 'bb', 'c', 'ddd']) - expected = (u" b あああああ\na あ 1\n" - u"bb いいい 222\nc う 33333\n" - u"ddd ええええええ 4") - self.assertEqual(_rep(df), expected) - - # index - df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], - 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, - index=[u'あああ', u'いいいいいい', u'うう', u'え']) - expected = (u" a b\nあああ あああああ あ\n" - u"いいいいいい い いいい\nうう う う\n" - u"え えええ ええええええ") - self.assertEqual(_rep(df), expected) - - # index name - df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], - 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, - index=pd.Index([u'あ', u'い', u'うう', u'え'], name=u'おおおお')) - expected = (u" a b\nおおおお \n" - u"あ あああああ あ\nい い いいい\n" - u"うう う う\nえ えええ ええええええ" - ) - self.assertEqual(_rep(df), expected) - - # all - df = DataFrame({u'あああ': [u'あああ', u'い', u'う', u'えええええ'], - u'いいいいい': [u'あ', u'いいい', u'う', u'ええ']}, - index=pd.Index([u'あ', u'いいい', u'うう', u'え'], name=u'お')) - expected = (u" あああ いいいいい\nお \n" - u"あ あああ あ\nいいい い いいい\n" - u"うう う う\nえ えええええ ええ") - self.assertEqual(_rep(df), expected) - - # MultiIndex - idx = pd.MultiIndex.from_tuples([(u'あ', u'いい'), (u'う', u'え'), ( - u'おおお', u'かかかか'), (u'き', u'くく')]) - df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], - 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, index=idx) - expected = (u" a b\nあ いい あああああ あ\n" - u"う え い いいい\nおおお かかかか う う\n" - u"き くく えええ ええええええ") - self.assertEqual(_rep(df), expected) - - # truncate - with option_context('display.max_rows', 3, 'display.max_columns', - 3): - - df = pd.DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], - 'b': [u'あ', u'いいい', u'う', u'ええええええ'], - 'c': [u'お', u'か', u'ききき', u'くくくくくく'], - u'ああああ': [u'さ', u'し', u'す', u'せ']}, - columns=['a', 'b', 'c', u'ああああ']) - - expected = (u" a ... ああああ\n0 あああああ ... さ\n" - u".. ... ... ...\n3 えええ ... せ\n" - u"\n[4 rows x 4 columns]") - self.assertEqual(_rep(df), expected) - - df.index = [u'あああ', u'いいいい', u'う', 'aaa'] - expected = (u" a ... ああああ\nあああ あああああ ... さ\n" - u"... ... ... ...\naaa えええ ... 
せ\n" - u"\n[4 rows x 4 columns]") - self.assertEqual(_rep(df), expected) - - # ambiguous unicode - df = DataFrame({u'あああああ': [1, 222, 33333, 4], - 'b': [u'あ', u'いいい', u'¡¡', u'ええええええ']}, - index=['a', 'bb', 'c', '¡¡¡']) - expected = (u" b あああああ\na あ 1\n" - u"bb いいい 222\nc ¡¡ 33333\n" - u"¡¡¡ ええええええ 4") - self.assertEqual(_rep(df), expected) - - def test_to_string_buffer_all_unicode(self): - buf = StringIO() - - empty = DataFrame({u('c/\u03c3'): Series()}) - nonempty = DataFrame({u('c/\u03c3'): Series([1, 2, 3])}) - - print(empty, file=buf) - print(nonempty, file=buf) - - # this should work - buf.getvalue() - - def test_to_string_with_col_space(self): - df = DataFrame(np.random.random(size=(1, 3))) - c10 = len(df.to_string(col_space=10).split("\n")[1]) - c20 = len(df.to_string(col_space=20).split("\n")[1]) - c30 = len(df.to_string(col_space=30).split("\n")[1]) - self.assertTrue(c10 < c20 < c30) - - # GH 8230 - # col_space wasn't being applied with header=False - with_header = df.to_string(col_space=20) - with_header_row1 = with_header.splitlines()[1] - no_header = df.to_string(col_space=20, header=False) - self.assertEqual(len(with_header_row1), len(no_header)) - - def test_to_string_truncate_indices(self): - for index in [tm.makeStringIndex, tm.makeUnicodeIndex, tm.makeIntIndex, - tm.makeDateIndex, tm.makePeriodIndex]: - for column in [tm.makeStringIndex]: - for h in [10, 20]: - for w in [10, 20]: - with option_context("display.expand_frame_repr", - False): - df = DataFrame(index=index(h), columns=column(w)) - with option_context("display.max_rows", 15): - if h == 20: - self.assertTrue( - has_vertically_truncated_repr(df)) - else: - self.assertFalse( - has_vertically_truncated_repr(df)) - with option_context("display.max_columns", 15): - if w == 20: - self.assertTrue( - has_horizontally_truncated_repr(df)) - else: - self.assertFalse( - has_horizontally_truncated_repr(df)) - with option_context("display.max_rows", 15, - "display.max_columns", 15): - if h == 20 and w == 20: - self.assertTrue(has_doubly_truncated_repr( - df)) - else: - self.assertFalse(has_doubly_truncated_repr( - df)) - - def test_to_string_truncate_multilevel(self): - arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], - ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] - df = DataFrame(index=arrays, columns=arrays) - with option_context("display.max_rows", 7, "display.max_columns", 7): - self.assertTrue(has_doubly_truncated_repr(df)) - - def test_truncate_with_different_dtypes(self): - - # 11594, 12045 - # when truncated the dtypes of the splits can differ - - # 11594 - import datetime - s = Series([datetime.datetime(2012, 1, 1)]*10 + [datetime.datetime(1012,1,2)] + [datetime.datetime(2012, 1, 3)]*10) - - with pd.option_context('display.max_rows', 8): - result = str(s) - self.assertTrue('object' in result) - - # 12045 - df = DataFrame({'text': ['some words'] + [None]*9}) - - with pd.option_context('display.max_rows', 8, 'display.max_columns', 3): - result = str(df) - self.assertTrue('None' in result) - self.assertFalse('NaN' in result) - - def test_datetimelike_frame(self): - - # GH 12211 - df = DataFrame({'date' : [pd.Timestamp('20130101').tz_localize('UTC')] + [pd.NaT]*5}) - - with option_context("display.max_rows", 5): - result = str(df) - self.assertTrue('2013-01-01 00:00:00+00:00' in result) - self.assertTrue('NaT' in result) - self.assertTrue('...' 
in result) - self.assertTrue('[6 rows x 1 columns]' in result) - - dts = [pd.Timestamp('2011-01-01', tz='US/Eastern')] * 5 + [pd.NaT] * 5 - df = pd.DataFrame({"dt": dts, - "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}) - with option_context('display.max_rows', 5): - expected = (' dt x\n' - '0 2011-01-01 00:00:00-05:00 1\n' - '1 2011-01-01 00:00:00-05:00 2\n' - '.. ... ..\n' - '8 NaT 9\n' - '9 NaT 10\n\n' - '[10 rows x 2 columns]') - self.assertEqual(repr(df), expected) - - dts = [pd.NaT] * 5 + [pd.Timestamp('2011-01-01', tz='US/Eastern')] * 5 - df = pd.DataFrame({"dt": dts, - "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}) - with option_context('display.max_rows', 5): - expected = (' dt x\n' - '0 NaT 1\n' - '1 NaT 2\n' - '.. ... ..\n' - '8 2011-01-01 00:00:00-05:00 9\n' - '9 2011-01-01 00:00:00-05:00 10\n\n' - '[10 rows x 2 columns]') - self.assertEqual(repr(df), expected) - - dts = ([pd.Timestamp('2011-01-01', tz='Asia/Tokyo')] * 5 + - [pd.Timestamp('2011-01-01', tz='US/Eastern')] * 5) - df = pd.DataFrame({"dt": dts, - "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}) - with option_context('display.max_rows', 5): - expected = (' dt x\n' - '0 2011-01-01 00:00:00+09:00 1\n' - '1 2011-01-01 00:00:00+09:00 2\n' - '.. ... ..\n' - '8 2011-01-01 00:00:00-05:00 9\n' - '9 2011-01-01 00:00:00-05:00 10\n\n' - '[10 rows x 2 columns]') - self.assertEqual(repr(df), expected) - - def test_to_html_with_col_space(self): - def check_with_width(df, col_space): - import re - # check that col_space affects HTML generation - # and be very brittle about it. - html = df.to_html(col_space=col_space) - hdrs = [x for x in html.split(r"\n") if re.search(r"\s]", x)] - self.assertTrue(len(hdrs) > 0) - for h in hdrs: - self.assertTrue("min-width" in h) - self.assertTrue(str(col_space) in h) - - df = DataFrame(np.random.random(size=(1, 3))) - - check_with_width(df, 30) - check_with_width(df, 50) - - def test_to_html_with_empty_string_label(self): - # GH3547, to_html regards empty string labels as repeated labels - data = {'c1': ['a', 'b'], 'c2': ['a', ''], 'data': [1, 2]} - df = DataFrame(data).set_index(['c1', 'c2']) - res = df.to_html() - self.assertTrue("rowspan" not in res) - - def test_to_html_unicode(self): - df = DataFrame({u('\u03c3'): np.arange(10.)}) - expected = u'\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
[expected <table> markup stripped during extraction; surviving cells: header \u03c3, rows 0-9 with values 0.0 through 9.0]
' - self.assertEqual(df.to_html(), expected) - df = DataFrame({'A': [u('\u03c3')]}) - expected = u'\n \n \n \n \n \n \n \n \n \n \n \n \n
[markup stripped; surviving cells: header A, row 0 containing \u03c3]
' - self.assertEqual(df.to_html(), expected) - - def test_to_html_decimal(self): - # GH 12031 - df = DataFrame({'A': [6.0, 3.1, 2.2]}) - result = df.to_html(decimal=',') - expected = ('\n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - '
[markup stripped; surviving cells: header A, rows 0, 1, 2 with comma-decimal values 6,0 / 3,1 / 2,2]
') - self.assertEqual(result, expected) - - def test_to_html_escaped(self): - a = 'str", - b: ""}, - 'co>l2': {a: "", - b: ""}} - rs = DataFrame(test_dict).to_html() - xp = """ - - - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: column headers co<l1 and co>l2, row labels str<ing1 &amp; and stri>ng2 &amp;, each data cell <type 'str'>]
""" - - self.assertEqual(xp, rs) - - def test_to_html_escape_disabled(self): - a = 'strbold", - b: "bold"}, - 'co>l2': {a: "bold", - b: "bold"}} - rs = DataFrame(test_dict).to_html(escape=False) - xp = """ - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: column headers co<l1 and co>l2, row labels str<ing1 & and stri>ng2 &, each data cell the word bold wrapped in <b> tags]
""" - - self.assertEqual(xp, rs) - - def test_to_html_multiindex_index_false(self): - # issue 8452 - df = DataFrame({ - 'a': range(2), - 'b': range(3, 5), - 'c': range(5, 7), - 'd': range(3, 5) - }) - df.columns = MultiIndex.from_product([['a', 'b'], ['c', 'd']]) - result = df.to_html(index=False) - expected = """\ - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: column groups a and b over sub-columns c d c d, data rows 0 3 5 3 and 1 4 6 4]
""" - - self.assertEqual(result, expected) - - df.index = Index(df.index.values, name='idx') - result = df.to_html(index=False) - self.assertEqual(result, expected) - - def test_to_html_multiindex_sparsify_false_multi_sparse(self): - with option_context('display.multi_sparse', False): - index = MultiIndex.from_arrays([[0, 0, 1, 1], [0, 1, 0, 1]], - names=['foo', None]) - - df = DataFrame([[0, 1], [2, 3], [4, 5], [6, 7]], index=index) - - result = df.to_html() - expected = """\ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: columns 0 1, index name foo, non-sparse index pairs (0,0) (0,1) (1,0) (1,1), data rows 0 1 / 2 3 / 4 5 / 6 7]
""" - - self.assertEqual(result, expected) - - df = DataFrame([[0, 1], [2, 3], [4, 5], [6, 7]], - columns=index[::2], index=index) - - result = df.to_html() - expected = """\ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: column levels foo: 0 1 over 0 0, index name foo, non-sparse index pairs as above, data rows 0 1 / 2 3 / 4 5 / 6 7]
""" - - self.assertEqual(result, expected) - - def test_to_html_multiindex_sparsify(self): - index = MultiIndex.from_arrays([[0, 0, 1, 1], [0, 1, 0, 1]], - names=['foo', None]) - - df = DataFrame([[0, 1], [2, 3], [4, 5], [6, 7]], index=index) - - result = df.to_html() - expected = """ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: columns 0 1, index name foo, sparsified index 0 and 1 each over sub-rows 0 1, data rows 0 1 / 2 3 / 4 5 / 6 7]
""" - - self.assertEqual(result, expected) - - df = DataFrame([[0, 1], [2, 3], [4, 5], [6, 7]], columns=index[::2], - index=index) - - result = df.to_html() - expected = """\ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: column levels foo: 0 1 over 0 0, index name foo, sparsified index as above, data rows 0 1 / 2 3 / 4 5 / 6 7]
""" - - self.assertEqual(result, expected) - - def test_to_html_multiindex_odd_even_truncate(self): - # GH 14882 - Issue on truncation with odd length DataFrame - mi = MultiIndex.from_product([[100, 200, 300], - [10, 20, 30], - [1, 2, 3, 4, 5, 6, 7]], - names=['a','b','c']) - df = DataFrame({'n' : range(len(mi))}, index = mi) - result = df.to_html(max_rows=60) - expected = """\ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: index levels a (100/200/300), b (10/20/30), c (1-7), single column n running 0-62 with one '...' row in place of n = 30-32]
""" - self.assertEqual(result, expected) - - # Test that ... appears in a middle level - result = df.to_html(max_rows=56) - expected = """\ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: same frame with n running 0-62 and the '...' row now hiding n = 28-34, i.e. the ellipsis lands in a middle index level]
""" - self.assertEqual(result, expected) - - def test_to_html_index_formatter(self): - df = DataFrame([[0, 1], [2, 3], [4, 5], [6, 7]], columns=['foo', None], - index=lrange(4)) - - f = lambda x: 'abcd' [x] - result = df.to_html(formatters={'__index__': f}) - expected = """\ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: columns foo None, formatted index a b c d, data rows 0 1 / 2 3 / 4 5 / 6 7]
""" - - self.assertEqual(result, expected) - - def test_to_html_datetime64_monthformatter(self): - months = [datetime(2016, 1, 1), datetime(2016, 2, 2)] - x = DataFrame({'months': months}) - - def format_func(x): - return x.strftime('%Y-%m') - result = x.to_html(formatters={'months': format_func}) - expected = """\ - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: column months, rows 0 and 1 formatted as 2016-01 and 2016-02]
""" - self.assertEqual(result, expected) - - def test_to_html_datetime64_hourformatter(self): - - x = DataFrame({'hod': pd.to_datetime(['10:10:10.100', '12:12:12.120'], - format='%H:%M:%S.%f')}) - - def format_func(x): - return x.strftime('%H:%M') - result = x.to_html(formatters={'hod': format_func}) - expected = """\ - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: column hod, rows 0 and 1 formatted as 10:10 and 12:12]
""" - self.assertEqual(result, expected) - - def test_to_html_regression_GH6098(self): - df = DataFrame({u('clé1'): [u('a'), u('a'), u('b'), u('b'), u('a')], - u('clé2'): [u('1er'), u('2ème'), u('1er'), u('2ème'), - u('1er')], - 'données1': np.random.randn(5), - 'données2': np.random.randn(5)}) - # it works - df.pivot_table(index=[u('clé1')], columns=[u('clé2')])._repr_html_() - - def test_to_html_truncate(self): - raise nose.SkipTest("unreliable on travis") - index = pd.DatetimeIndex(start='20010101', freq='D', periods=20) - df = DataFrame(index=index, columns=range(20)) - fmt.set_option('display.max_rows', 8) - fmt.set_option('display.max_columns', 4) - result = df._repr_html_() - expected = '''\ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: columns 0 1 ... 18 19, index 2001-01-01 through 2001-01-04, a '...' row, then 2001-01-17 through 2001-01-20, all values NaN; dimensions footer: 20 rows × 20 columns]
-'''.format(div_style) - if compat.PY2: - expected = expected.decode('utf-8') - self.assertEqual(result, expected) - - def test_to_html_truncate_multi_index(self): - raise nose.SkipTest("unreliable on travis") - arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], - ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] - df = DataFrame(index=arrays, columns=arrays) - fmt.set_option('display.max_rows', 7) - fmt.set_option('display.max_columns', 7) - result = df._repr_html_() - expected = '''\ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: MultiIndex rows and columns over bar/baz/foo/qux x one/two, truncated with '...' in both directions, all values NaN; dimensions footer: 8 rows × 8 columns]
-'''.format(div_style) - if compat.PY2: - expected = expected.decode('utf-8') - self.assertEqual(result, expected) - - def test_to_html_truncate_multi_index_sparse_off(self): - raise nose.SkipTest("unreliable on travis") - arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], - ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] - df = DataFrame(index=arrays, columns=arrays) - fmt.set_option('display.max_rows', 7) - fmt.set_option('display.max_columns', 7) - fmt.set_option('display.multi_sparse', False) - result = df._repr_html_() - expected = '''\ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[markup stripped; surviving cells: same truncated frame with multi_sparse off, so the bar/baz/foo/qux labels repeat on every row and column; dimensions footer: 8 rows × 8 columns]
-'''.format(div_style) - if compat.PY2: - expected = expected.decode('utf-8') - self.assertEqual(result, expected) - - def test_to_html_border(self): - df = DataFrame({'A': [1, 2]}) - result = df.to_html() - assert 'border="1"' in result - - def test_to_html_border_option(self): - df = DataFrame({'A': [1, 2]}) - with pd.option_context('html.border', 0): - result = df.to_html() - self.assertTrue('border="0"' in result) - self.assertTrue('border="0"' in df._repr_html_()) - - def test_to_html_border_zero(self): - df = DataFrame({'A': [1, 2]}) - result = df.to_html(border=0) - self.assertTrue('border="0"' in result) - - def test_nonunicode_nonascii_alignment(self): - df = DataFrame([["aa\xc3\xa4\xc3\xa4", 1], ["bbbb", 2]]) - rep_str = df.to_string() - lines = rep_str.split('\n') - self.assertEqual(len(lines[1]), len(lines[2])) - - def test_unicode_problem_decoding_as_ascii(self): - dm = DataFrame({u('c/\u03c3'): Series({'test': np.NaN})}) - compat.text_type(dm.to_string()) - - def test_string_repr_encoding(self): - filepath = tm.get_data_path('unicode_series.csv') - df = pd.read_csv(filepath, header=None, encoding='latin1') - repr(df) - repr(df[1]) - - def test_repr_corner(self): - # representing infs poses no problems - df = DataFrame({'foo': [-np.inf, np.inf]}) - repr(df) - - def test_frame_info_encoding(self): - index = ['\'Til There Was You (1997)', - 'ldum klaka (Cold Fever) (1994)'] - fmt.set_option('display.max_rows', 1) - df = DataFrame(columns=['a', 'b', 'c'], index=index) - repr(df) - repr(df.T) - fmt.set_option('display.max_rows', 200) - - def test_pprint_thing(self): - from pandas.formats.printing import pprint_thing as pp_t - - if PY3: - raise nose.SkipTest("doesn't work on Python 3") - - self.assertEqual(pp_t('a'), u('a')) - self.assertEqual(pp_t(u('a')), u('a')) - self.assertEqual(pp_t(None), 'None') - self.assertEqual(pp_t(u('\u05d0'), quote_strings=True), u("u'\u05d0'")) - self.assertEqual(pp_t(u('\u05d0'), quote_strings=False), u('\u05d0')) - self.assertEqual(pp_t((u('\u05d0'), - u('\u05d1')), quote_strings=True), - u("(u'\u05d0', u'\u05d1')")) - self.assertEqual(pp_t((u('\u05d0'), (u('\u05d1'), - u('\u05d2'))), - quote_strings=True), - u("(u'\u05d0', (u'\u05d1', u'\u05d2'))")) - self.assertEqual(pp_t(('foo', u('\u05d0'), (u('\u05d0'), - u('\u05d0'))), - quote_strings=True), - u("(u'foo', u'\u05d0', (u'\u05d0', u'\u05d0'))")) - - # escape embedded tabs in string - # GH #2038 - self.assertTrue(not "\t" in pp_t("a\tb", escape_chars=("\t", ))) - - def test_wide_repr(self): - with option_context('mode.sim_interactive', True, - 'display.show_dimensions', True): - max_cols = get_option('display.max_columns') - df = DataFrame(tm.rands_array(25, size=(10, max_cols - 1))) - set_option('display.expand_frame_repr', False) - rep_str = repr(df) - - assert "10 rows x %d columns" % (max_cols - 1) in rep_str - set_option('display.expand_frame_repr', True) - wide_repr = repr(df) - self.assertNotEqual(rep_str, wide_repr) - - with option_context('display.width', 120): - wider_repr = repr(df) - self.assertTrue(len(wider_repr) < len(wide_repr)) - - reset_option('display.expand_frame_repr') - - def test_wide_repr_wide_columns(self): - with option_context('mode.sim_interactive', True): - df = DataFrame(randn(5, 3), columns=['a' * 90, 'b' * 90, 'c' * 90]) - rep_str = repr(df) - - self.assertEqual(len(rep_str.splitlines()), 20) - - def test_wide_repr_named(self): - with option_context('mode.sim_interactive', True): - max_cols = get_option('display.max_columns') - df = DataFrame(tm.rands_array(25, 
size=(10, max_cols - 1))) - df.index.name = 'DataFrame Index' - set_option('display.expand_frame_repr', False) - - rep_str = repr(df) - set_option('display.expand_frame_repr', True) - wide_repr = repr(df) - self.assertNotEqual(rep_str, wide_repr) - - with option_context('display.width', 150): - wider_repr = repr(df) - self.assertTrue(len(wider_repr) < len(wide_repr)) - - for line in wide_repr.splitlines()[1::13]: - self.assertIn('DataFrame Index', line) - - reset_option('display.expand_frame_repr') - - def test_wide_repr_multiindex(self): - with option_context('mode.sim_interactive', True): - midx = MultiIndex.from_arrays(tm.rands_array(5, size=(2, 10))) - max_cols = get_option('display.max_columns') - df = DataFrame(tm.rands_array(25, size=(10, max_cols - 1)), - index=midx) - df.index.names = ['Level 0', 'Level 1'] - set_option('display.expand_frame_repr', False) - rep_str = repr(df) - set_option('display.expand_frame_repr', True) - wide_repr = repr(df) - self.assertNotEqual(rep_str, wide_repr) - - with option_context('display.width', 150): - wider_repr = repr(df) - self.assertTrue(len(wider_repr) < len(wide_repr)) - - for line in wide_repr.splitlines()[1::13]: - self.assertIn('Level 0 Level 1', line) - - reset_option('display.expand_frame_repr') - - def test_wide_repr_multiindex_cols(self): - with option_context('mode.sim_interactive', True): - max_cols = get_option('display.max_columns') - midx = MultiIndex.from_arrays(tm.rands_array(5, size=(2, 10))) - mcols = MultiIndex.from_arrays(tm.rands_array(3, size=(2, max_cols - - 1))) - df = DataFrame(tm.rands_array(25, (10, max_cols - 1)), - index=midx, columns=mcols) - df.index.names = ['Level 0', 'Level 1'] - set_option('display.expand_frame_repr', False) - rep_str = repr(df) - set_option('display.expand_frame_repr', True) - wide_repr = repr(df) - self.assertNotEqual(rep_str, wide_repr) - - with option_context('display.width', 150): - wider_repr = repr(df) - self.assertTrue(len(wider_repr) < len(wide_repr)) - - reset_option('display.expand_frame_repr') - - def test_wide_repr_unicode(self): - with option_context('mode.sim_interactive', True): - max_cols = get_option('display.max_columns') - df = DataFrame(tm.rands_array(25, size=(10, max_cols - 1))) - set_option('display.expand_frame_repr', False) - rep_str = repr(df) - set_option('display.expand_frame_repr', True) - wide_repr = repr(df) - self.assertNotEqual(rep_str, wide_repr) - - with option_context('display.width', 150): - wider_repr = repr(df) - self.assertTrue(len(wider_repr) < len(wide_repr)) - - reset_option('display.expand_frame_repr') - - def test_wide_repr_wide_long_columns(self): - with option_context('mode.sim_interactive', True): - df = DataFrame({'a': ['a' * 30, 'b' * 30], - 'b': ['c' * 70, 'd' * 80]}) - - result = repr(df) - self.assertTrue('ccccc' in result) - self.assertTrue('ddddd' in result) - - def test_long_series(self): - n = 1000 - s = Series( - np.random.randint(-50, 50, n), - index=['s%04d' % x for x in range(n)], dtype='int64') - - import re - str_rep = str(s) - nmatches = len(re.findall('dtype', str_rep)) - self.assertEqual(nmatches, 1) - - def test_index_with_nan(self): - # GH 2850 - df = DataFrame({'id1': {0: '1a3', - 1: '9h4'}, - 'id2': {0: np.nan, - 1: 'd67'}, - 'id3': {0: '78d', - 1: '79d'}, - 'value': {0: 123, - 1: 64}}) - - # multi-index - y = df.set_index(['id1', 'id2', 'id3']) - result = y.to_string() - expected = u( - ' value\nid1 id2 id3 \n1a3 NaN 78d 123\n9h4 d67 79d 64') - self.assertEqual(result, expected) - - # index - y = df.set_index('id2') - 
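# --- Editor's illustration (assumes only the public pandas API): a NaN
# --- label survives set_index and is printed as the string 'NaN', e.g.:
import numpy as np
import pandas as pd

small = pd.DataFrame({'id': ['1a3', np.nan], 'value': [123, 64]})
print(small.set_index('id').to_string())
# prints, roughly:
#      value
# id
# 1a3    123
# NaN     64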
result = y.to_string() - expected = u( - ' id1 id3 value\nid2 \nNaN 1a3 78d 123\nd67 9h4 79d 64') - self.assertEqual(result, expected) - - # with append (this failed in 0.12) - y = df.set_index(['id1', 'id2']).set_index('id3', append=True) - result = y.to_string() - expected = u( - ' value\nid1 id2 id3 \n1a3 NaN 78d 123\n9h4 d67 79d 64') - self.assertEqual(result, expected) - - # all-nan in mi - df2 = df.copy() - df2.loc[:, 'id2'] = np.nan - y = df2.set_index('id2') - result = y.to_string() - expected = u( - ' id1 id3 value\nid2 \nNaN 1a3 78d 123\nNaN 9h4 79d 64') - self.assertEqual(result, expected) - - # partial nan in mi - df2 = df.copy() - df2.loc[:, 'id2'] = np.nan - y = df2.set_index(['id2', 'id3']) - result = y.to_string() - expected = u( - ' id1 value\nid2 id3 \nNaN 78d 1a3 123\n 79d 9h4 64') - self.assertEqual(result, expected) - - df = DataFrame({'id1': {0: np.nan, - 1: '9h4'}, - 'id2': {0: np.nan, - 1: 'd67'}, - 'id3': {0: np.nan, - 1: '79d'}, - 'value': {0: 123, - 1: 64}}) - - y = df.set_index(['id1', 'id2', 'id3']) - result = y.to_string() - expected = u( - ' value\nid1 id2 id3 \nNaN NaN NaN 123\n9h4 d67 79d 64') - self.assertEqual(result, expected) - - def test_to_string(self): - from pandas import read_table - import re - - # big mixed - biggie = DataFrame({'A': randn(200), - 'B': tm.makeStringIndex(200)}, - index=lrange(200)) - - biggie.loc[:20, 'A'] = nan - biggie.loc[:20, 'B'] = nan - s = biggie.to_string() - - buf = StringIO() - retval = biggie.to_string(buf=buf) - self.assertIsNone(retval) - self.assertEqual(buf.getvalue(), s) - - tm.assertIsInstance(s, compat.string_types) - - # print in right order - result = biggie.to_string(columns=['B', 'A'], col_space=17, - float_format='%.5f'.__mod__) - lines = result.split('\n') - header = lines[0].strip().split() - joined = '\n'.join([re.sub(r'\s+', ' ', x).strip() for x in lines[1:]]) - recons = read_table(StringIO(joined), names=header, - header=None, sep=' ') - tm.assert_series_equal(recons['B'], biggie['B']) - self.assertEqual(recons['A'].count(), biggie['A'].count()) - self.assertTrue((np.abs(recons['A'].dropna() - biggie['A'].dropna()) < - 0.1).all()) - - # expected = ['B', 'A'] - # self.assertEqual(header, expected) - - result = biggie.to_string(columns=['A'], col_space=17) - header = result.split('\n')[0].strip().split() - expected = ['A'] - self.assertEqual(header, expected) - - biggie.to_string(columns=['B', 'A'], - formatters={'A': lambda x: '%.1f' % x}) - - biggie.to_string(columns=['B', 'A'], float_format=str) - biggie.to_string(columns=['B', 'A'], col_space=12, float_format=str) - - frame = DataFrame(index=np.arange(200)) - frame.to_string() - - def test_to_string_no_header(self): - df = DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}) - - df_s = df.to_string(header=False) - expected = "0 1 4\n1 2 5\n2 3 6" - - self.assertEqual(df_s, expected) - - def test_to_string_no_index(self): - df = DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}) - - df_s = df.to_string(index=False) - expected = "x y\n1 4\n2 5\n3 6" - - self.assertEqual(df_s, expected) - - def test_to_string_line_width_no_index(self): - df = DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}) - - df_s = df.to_string(line_width=1, index=False) - expected = "x \\\n1 \n2 \n3 \n\ny \n4 \n5 \n6" - - self.assertEqual(df_s, expected) - - def test_to_string_float_formatting(self): - self.reset_display_options() - fmt.set_option('display.precision', 5, 'display.column_space', 12, - 'display.notebook_repr_html', False) - - df = DataFrame({'x': [0, 0.25, 3456.000, 12e+45, 1.64e+6, 
1.7e+8, - 1.253456, np.pi, -1e6]}) - - df_s = df.to_string() - - # Python 2.5 just wants me to be sad. And debian 32-bit - # sys.version_info[0] == 2 and sys.version_info[1] < 6: - if _three_digit_exp(): - expected = (' x\n0 0.00000e+000\n1 2.50000e-001\n' - '2 3.45600e+003\n3 1.20000e+046\n4 1.64000e+006\n' - '5 1.70000e+008\n6 1.25346e+000\n7 3.14159e+000\n' - '8 -1.00000e+006') - else: - expected = (' x\n0 0.00000e+00\n1 2.50000e-01\n' - '2 3.45600e+03\n3 1.20000e+46\n4 1.64000e+06\n' - '5 1.70000e+08\n6 1.25346e+00\n7 3.14159e+00\n' - '8 -1.00000e+06') - self.assertEqual(df_s, expected) - - df = DataFrame({'x': [3234, 0.253]}) - df_s = df.to_string() - - expected = (' x\n' '0 3234.000\n' '1 0.253') - self.assertEqual(df_s, expected) - - self.reset_display_options() - self.assertEqual(get_option("display.precision"), 6) - - df = DataFrame({'x': [1e9, 0.2512]}) - df_s = df.to_string() - # Python 2.5 just wants me to be sad. And debian 32-bit - # sys.version_info[0] == 2 and sys.version_info[1] < 6: - if _three_digit_exp(): - expected = (' x\n' - '0 1.000000e+009\n' - '1 2.512000e-001') - else: - expected = (' x\n' - '0 1.000000e+09\n' - '1 2.512000e-01') - self.assertEqual(df_s, expected) - - def test_to_string_small_float_values(self): - df = DataFrame({'a': [1.5, 1e-17, -5.5e-7]}) - - result = df.to_string() - # sadness per above - if '%.4g' % 1.7e8 == '1.7e+008': - expected = (' a\n' - '0 1.500000e+000\n' - '1 1.000000e-017\n' - '2 -5.500000e-007') - else: - expected = (' a\n' - '0 1.500000e+00\n' - '1 1.000000e-17\n' - '2 -5.500000e-07') - self.assertEqual(result, expected) - - # but not all exactly zero - df = df * 0 - result = df.to_string() - expected = (' 0\n' '0 0\n' '1 0\n' '2 -0') - - def test_to_string_float_index(self): - index = Index([1.5, 2, 3, 4, 5]) - df = DataFrame(lrange(5), index=index) - - result = df.to_string() - expected = (' 0\n' - '1.5 0\n' - '2.0 1\n' - '3.0 2\n' - '4.0 3\n' - '5.0 4') - self.assertEqual(result, expected) - - def test_to_string_ascii_error(self): - data = [('0 ', u(' .gitignore '), u(' 5 '), - ' \xe2\x80\xa2\xe2\x80\xa2\xe2\x80' - '\xa2\xe2\x80\xa2\xe2\x80\xa2')] - df = DataFrame(data) - - # it works! 
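# --- Editor's illustration of the float-formatting hooks covered above
# --- (public API only; exact e-notation widths are platform dependent):
import pandas as pd

floats = pd.DataFrame({'x': [3234.0, 0.253]})
print(floats.to_string(float_format='{:10.3f}'.format))
with pd.option_context('display.precision', 5):
    # mixing 1e9 with 0.2512 pushes the column into scientific notation
    print(pd.DataFrame({'x': [1e9, 0.2512]}))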
- repr(df) - - def test_to_string_int_formatting(self): - df = DataFrame({'x': [-15, 20, 25, -35]}) - self.assertTrue(issubclass(df['x'].dtype.type, np.integer)) - - output = df.to_string() - expected = (' x\n' '0 -15\n' '1 20\n' '2 25\n' '3 -35') - self.assertEqual(output, expected) - - def test_to_string_index_formatter(self): - df = DataFrame([lrange(5), lrange(5, 10), lrange(10, 15)]) - - rs = df.to_string(formatters={'__index__': lambda x: 'abc' [x]}) - - xp = """\ - 0 1 2 3 4 -a 0 1 2 3 4 -b 5 6 7 8 9 -c 10 11 12 13 14\ -""" - - self.assertEqual(rs, xp) - - def test_to_string_left_justify_cols(self): - self.reset_display_options() - df = DataFrame({'x': [3234, 0.253]}) - df_s = df.to_string(justify='left') - expected = (' x \n' '0 3234.000\n' '1 0.253') - self.assertEqual(df_s, expected) - - def test_to_string_format_na(self): - self.reset_display_options() - df = DataFrame({'A': [np.nan, -1, -2.1234, 3, 4], - 'B': [np.nan, 'foo', 'foooo', 'fooooo', 'bar']}) - result = df.to_string() - - expected = (' A B\n' - '0 NaN NaN\n' - '1 -1.0000 foo\n' - '2 -2.1234 foooo\n' - '3 3.0000 fooooo\n' - '4 4.0000 bar') - self.assertEqual(result, expected) - - df = DataFrame({'A': [np.nan, -1., -2., 3., 4.], - 'B': [np.nan, 'foo', 'foooo', 'fooooo', 'bar']}) - result = df.to_string() - - expected = (' A B\n' - '0 NaN NaN\n' - '1 -1.0 foo\n' - '2 -2.0 foooo\n' - '3 3.0 fooooo\n' - '4 4.0 bar') - self.assertEqual(result, expected) - - def test_to_string_line_width(self): - df = DataFrame(123, lrange(10, 15), lrange(30)) - s = df.to_string(line_width=80) - self.assertEqual(max(len(l) for l in s.split('\n')), 80) - - def test_show_dimensions(self): - df = DataFrame(123, lrange(10, 15), lrange(30)) - - with option_context('display.max_rows', 10, 'display.max_columns', 40, - 'display.width', 500, 'display.expand_frame_repr', - 'info', 'display.show_dimensions', True): - self.assertTrue('5 rows' in str(df)) - self.assertTrue('5 rows' in df._repr_html_()) - with option_context('display.max_rows', 10, 'display.max_columns', 40, - 'display.width', 500, 'display.expand_frame_repr', - 'info', 'display.show_dimensions', False): - self.assertFalse('5 rows' in str(df)) - self.assertFalse('5 rows' in df._repr_html_()) - with option_context('display.max_rows', 2, 'display.max_columns', 2, - 'display.width', 500, 'display.expand_frame_repr', - 'info', 'display.show_dimensions', 'truncate'): - self.assertTrue('5 rows' in str(df)) - self.assertTrue('5 rows' in df._repr_html_()) - with option_context('display.max_rows', 10, 'display.max_columns', 40, - 'display.width', 500, 'display.expand_frame_repr', - 'info', 'display.show_dimensions', 'truncate'): - self.assertFalse('5 rows' in str(df)) - self.assertFalse('5 rows' in df._repr_html_()) - - def test_to_html(self): - # big mixed - biggie = DataFrame({'A': randn(200), - 'B': tm.makeStringIndex(200)}, - index=lrange(200)) - - biggie.loc[:20, 'A'] = nan - biggie.loc[:20, 'B'] = nan - s = biggie.to_html() - - buf = StringIO() - retval = biggie.to_html(buf=buf) - self.assertIsNone(retval) - self.assertEqual(buf.getvalue(), s) - - tm.assertIsInstance(s, compat.string_types) - - biggie.to_html(columns=['B', 'A'], col_space=17) - biggie.to_html(columns=['B', 'A'], - formatters={'A': lambda x: '%.1f' % x}) - - biggie.to_html(columns=['B', 'A'], float_format=str) - biggie.to_html(columns=['B', 'A'], col_space=12, float_format=str) - - frame = DataFrame(index=np.arange(200)) - frame.to_html() - - def test_to_html_filename(self): - biggie = DataFrame({'A': randn(200), - 'B': 
tm.makeStringIndex(200)}, - index=lrange(200)) - - biggie.loc[:20, 'A'] = nan - biggie.loc[:20, 'B'] = nan - with tm.ensure_clean('test.html') as path: - biggie.to_html(path) - with open(path, 'r') as f: - s = biggie.to_html() - s2 = f.read() - self.assertEqual(s, s2) - - frame = DataFrame(index=np.arange(200)) - with tm.ensure_clean('test.html') as path: - frame.to_html(path) - with open(path, 'r') as f: - self.assertEqual(frame.to_html(), f.read()) - - def test_to_html_with_no_bold(self): - x = DataFrame({'x': randn(5)}) - ashtml = x.to_html(bold_rows=False) - self.assertFalse('")]) - - def test_to_html_columns_arg(self): - result = self.frame.to_html(columns=['A']) - self.assertNotIn('B', result) - - def test_to_html_multiindex(self): - columns = MultiIndex.from_tuples(list(zip(np.arange(2).repeat(2), - np.mod(lrange(4), 2))), - names=['CL0', 'CL1']) - df = DataFrame([list('abcd'), list('efgh')], columns=columns) - result = df.to_html(justify='left') - expected = ('\n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - '
[markup stripped; surviving cells: column levels CL0: 0 1 and CL1: 0 1 0 1, data rows 0 -> a b c d and 1 -> e f g h]
') - - self.assertEqual(result, expected) - - columns = MultiIndex.from_tuples(list(zip( - range(4), np.mod( - lrange(4), 2)))) - df = DataFrame([list('abcd'), list('efgh')], columns=columns) - - result = df.to_html(justify='right') - expected = ('\n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - '
[markup stripped; surviving cells: column levels 0 1 2 3 over 0 1 0 1, data rows 0 -> a b c d and 1 -> e f g h]
') - - self.assertEqual(result, expected) - - def test_to_html_justify(self): - df = DataFrame({'A': [6, 30000, 2], - 'B': [1, 2, 70000], - 'C': [223442, 0, 1]}, - columns=['A', 'B', 'C']) - result = df.to_html(justify='left') - expected = ('\n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - '
[markup stripped; surviving cells: columns A B C, data rows 0 -> 6 1 223442, 1 -> 30000 2 0, 2 -> 2 70000 1]
') - self.assertEqual(result, expected) - - result = df.to_html(justify='right') - expected = ('\n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - '
[markup stripped; surviving cells identical to the justify='left' table above]
') - self.assertEqual(result, expected) - - def test_to_html_index(self): - index = ['foo', 'bar', 'baz'] - df = DataFrame({'A': [1, 2, 3], - 'B': [1.2, 3.4, 5.6], - 'C': ['one', 'two', np.NaN]}, - columns=['A', 'B', 'C'], - index=index) - expected_with_index = ('\n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - '
[markup stripped; surviving cells: columns A B C, index foo bar baz, data rows 1 1.2 one / 2 3.4 two / 3 5.6 NaN]
') - self.assertEqual(df.to_html(), expected_with_index) - - expected_without_index = ('\n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - '
[markup stripped; surviving cells: columns A B C with no index column, data rows 1 1.2 one / 2 3.4 two / 3 5.6 NaN]
') - result = df.to_html(index=False) - for i in index: - self.assertNotIn(i, result) - self.assertEqual(result, expected_without_index) - df.index = Index(['foo', 'bar', 'baz'], name='idx') - expected_with_index = ('\n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - '
[markup stripped; surviving cells: columns A B C, index name idx over foo bar baz, data rows as above]
') - self.assertEqual(df.to_html(), expected_with_index) - self.assertEqual(df.to_html(index=False), expected_without_index) - - tuples = [('foo', 'car'), ('foo', 'bike'), ('bar', 'car')] - df.index = MultiIndex.from_tuples(tuples) - - expected_with_index = ('\n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - '
[markup stripped; surviving cells: columns A B C, MultiIndex rows foo/car, foo/bike (sparsified), bar/car, data rows as above]
') - self.assertEqual(df.to_html(), expected_with_index) - - result = df.to_html(index=False) - for i in ['foo', 'bar', 'car', 'bike']: - self.assertNotIn(i, result) - # must be the same result as normal index - self.assertEqual(result, expected_without_index) - - df.index = MultiIndex.from_tuples(tuples, names=['idx1', 'idx2']) - expected_with_index = ('\n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - ' \n' - '
[markup stripped; surviving cells: columns A B C, index names idx1 idx2 over the same MultiIndex, data rows as above]
') - self.assertEqual(df.to_html(), expected_with_index) - self.assertEqual(df.to_html(index=False), expected_without_index) - - def test_repr_html(self): - self.frame._repr_html_() - - fmt.set_option('display.max_rows', 1, 'display.max_columns', 1) - self.frame._repr_html_() - - fmt.set_option('display.notebook_repr_html', False) - self.frame._repr_html_() - - self.reset_display_options() - - df = DataFrame([[1, 2], [3, 4]]) - fmt.set_option('display.show_dimensions', True) - self.assertTrue('2 rows' in df._repr_html_()) - fmt.set_option('display.show_dimensions', False) - self.assertFalse('2 rows' in df._repr_html_()) - - self.reset_display_options() - - def test_repr_html_wide(self): - max_cols = get_option('display.max_columns') - df = DataFrame(tm.rands_array(25, size=(10, max_cols - 1))) - reg_repr = df._repr_html_() - assert "..." not in reg_repr - - wide_df = DataFrame(tm.rands_array(25, size=(10, max_cols + 1))) - wide_repr = wide_df._repr_html_() - assert "..." in wide_repr - - def test_repr_html_wide_multiindex_cols(self): - max_cols = get_option('display.max_columns') - - mcols = MultiIndex.from_product([np.arange(max_cols // 2), - ['foo', 'bar']], - names=['first', 'second']) - df = DataFrame(tm.rands_array(25, size=(10, len(mcols))), - columns=mcols) - reg_repr = df._repr_html_() - assert '...' not in reg_repr - - mcols = MultiIndex.from_product((np.arange(1 + (max_cols // 2)), - ['foo', 'bar']), - names=['first', 'second']) - df = DataFrame(tm.rands_array(25, size=(10, len(mcols))), - columns=mcols) - wide_repr = df._repr_html_() - assert '...' in wide_repr - - def test_repr_html_long(self): - max_rows = get_option('display.max_rows') - h = max_rows - 1 - df = DataFrame({'A': np.arange(1, 1 + h), 'B': np.arange(41, 41 + h)}) - reg_repr = df._repr_html_() - assert '..' not in reg_repr - assert str(41 + max_rows // 2) in reg_repr - - h = max_rows + 1 - df = DataFrame({'A': np.arange(1, 1 + h), 'B': np.arange(41, 41 + h)}) - long_repr = df._repr_html_() - assert '..' in long_repr - assert str(41 + max_rows // 2) not in long_repr - assert u('%d rows ') % h in long_repr - assert u('2 columns') in long_repr - - def test_repr_html_float(self): - max_rows = get_option('display.max_rows') - h = max_rows - 1 - df = DataFrame({'idx': np.linspace(-10, 10, h), - 'A': np.arange(1, 1 + h), - 'B': np.arange(41, 41 + h)}).set_index('idx') - reg_repr = df._repr_html_() - assert '..' not in reg_repr - assert str(40 + h) in reg_repr - - h = max_rows + 1 - df = DataFrame({'idx': np.linspace(-10, 10, h), - 'A': np.arange(1, 1 + h), - 'B': np.arange(41, 41 + h)}).set_index('idx') - long_repr = df._repr_html_() - assert '..' in long_repr - assert '31' not in long_repr - assert u('%d rows ') % h in long_repr - assert u('2 columns') in long_repr - - def test_repr_html_long_multiindex(self): - max_rows = get_option('display.max_rows') - max_L1 = max_rows // 2 - - tuples = list(itertools.product(np.arange(max_L1), ['foo', 'bar'])) - idx = MultiIndex.from_tuples(tuples, names=['first', 'second']) - df = DataFrame(np.random.randn(max_L1 * 2, 2), index=idx, - columns=['A', 'B']) - reg_repr = df._repr_html_() - assert '...' not in reg_repr - - tuples = list(itertools.product(np.arange(max_L1 + 1), ['foo', 'bar'])) - idx = MultiIndex.from_tuples(tuples, names=['first', 'second']) - df = DataFrame(np.random.randn((max_L1 + 1) * 2, 2), index=idx, - columns=['A', 'B']) - long_repr = df._repr_html_() - assert '...' 
in long_repr - - def test_repr_html_long_and_wide(self): - max_cols = get_option('display.max_columns') - max_rows = get_option('display.max_rows') - - h, w = max_rows - 1, max_cols - 1 - df = DataFrame(dict((k, np.arange(1, 1 + h)) for k in np.arange(w))) - assert '...' not in df._repr_html_() - - h, w = max_rows + 1, max_cols + 1 - df = DataFrame(dict((k, np.arange(1, 1 + h)) for k in np.arange(w))) - assert '...' in df._repr_html_() - - def test_info_repr(self): - max_rows = get_option('display.max_rows') - max_cols = get_option('display.max_columns') - # Long - h, w = max_rows + 1, max_cols - 1 - df = DataFrame(dict((k, np.arange(1, 1 + h)) for k in np.arange(w))) - assert has_vertically_truncated_repr(df) - with option_context('display.large_repr', 'info'): - assert has_info_repr(df) - - # Wide - h, w = max_rows - 1, max_cols + 1 - df = DataFrame(dict((k, np.arange(1, 1 + h)) for k in np.arange(w))) - assert has_horizontally_truncated_repr(df) - with option_context('display.large_repr', 'info'): - assert has_info_repr(df) - - def test_info_repr_max_cols(self): - # GH #6939 - df = DataFrame(randn(10, 5)) - with option_context('display.large_repr', 'info', - 'display.max_columns', 1, - 'display.max_info_columns', 4): - self.assertTrue(has_non_verbose_info_repr(df)) - - with option_context('display.large_repr', 'info', - 'display.max_columns', 1, - 'display.max_info_columns', 5): - self.assertFalse(has_non_verbose_info_repr(df)) - - # test verbose overrides - # fmt.set_option('display.max_info_columns', 4) # exceeded - - def test_info_repr_html(self): - max_rows = get_option('display.max_rows') - max_cols = get_option('display.max_columns') - # Long - h, w = max_rows + 1, max_cols - 1 - df = DataFrame(dict((k, np.arange(1, 1 + h)) for k in np.arange(w))) - assert r'<class' not in df._repr_html_() - with option_context('display.large_repr', 'info'): - assert r'<class' in df._repr_html_() - - # Wide - h, w = max_rows - 1, max_cols + 1 - df = DataFrame(dict((k, np.arange(1, 1 + h)) for k in np.arange(w))) - assert ' - - - - - - - - - - """).strip() - self.assertEqual(result, expected) - - result = df.to_html(classes=["sortable", "draggable"]) - self.assertEqual(result, expected) - - def test_to_html_no_index_max_rows(self): - # GH https://github.com/pandas-dev/pandas/issues/14998 - df = DataFrame({"A": [1, 2, 3, 4]}) - result = df.to_html(index=False, max_rows=1) - expected = dedent("""\ - - - - - - - - - - - -
[markup stripped; surviving cells: header A, a single data row 1]
""") - self.assertEqual(result, expected) - - def test_pprint_pathological_object(self): - """ - if the test fails, the stack will overflow and nose crash, - but it won't hang. - """ - - class A: - - def __getitem__(self, key): - return 3 # obviously simplified - - df = DataFrame([A()]) - repr(df) # just don't dine - - def test_float_trim_zeros(self): - vals = [2.08430917305e+10, 3.52205017305e+10, 2.30674817305e+10, - 2.03954217305e+10, 5.59897817305e+10] - skip = True - for line in repr(DataFrame({'A': vals})).split('\n')[:-2]: - if line.startswith('dtype:'): - continue - if _three_digit_exp(): - self.assertTrue(('+010' in line) or skip) - else: - self.assertTrue(('+10' in line) or skip) - skip = False - - def test_dict_entries(self): - df = DataFrame({'A': [{'a': 1, 'b': 2}]}) - - val = df.to_string() - self.assertTrue("'a': 1" in val) - self.assertTrue("'b': 2" in val) - - def test_to_latex_filename(self): - with tm.ensure_clean('test.tex') as path: - self.frame.to_latex(path) - - with open(path, 'r') as f: - self.assertEqual(self.frame.to_latex(), f.read()) - - # test with utf-8 and encoding option (GH 7061) - df = DataFrame([[u'au\xdfgangen']]) - with tm.ensure_clean('test.tex') as path: - df.to_latex(path, encoding='utf-8') - with codecs.open(path, 'r', encoding='utf-8') as f: - self.assertEqual(df.to_latex(), f.read()) - - # test with utf-8 without encoding option - if compat.PY3: # python3: pandas default encoding is utf-8 - with tm.ensure_clean('test.tex') as path: - df.to_latex(path) - with codecs.open(path, 'r', encoding='utf-8') as f: - self.assertEqual(df.to_latex(), f.read()) - else: - # python2 default encoding is ascii, so an error should be raised - with tm.ensure_clean('test.tex') as path: - self.assertRaises(UnicodeEncodeError, df.to_latex, path) - - def test_to_latex(self): - # it works! 
- self.frame.to_latex() - - df = DataFrame({'a': [1, 2], 'b': ['b1', 'b2']}) - withindex_result = df.to_latex() - withindex_expected = r"""\begin{tabular}{lrl} -\toprule -{} & a & b \\ -\midrule -0 & 1 & b1 \\ -1 & 2 & b2 \\ -\bottomrule -\end{tabular} -""" - - self.assertEqual(withindex_result, withindex_expected) - - withoutindex_result = df.to_latex(index=False) - withoutindex_expected = r"""\begin{tabular}{rl} -\toprule - a & b \\ -\midrule - 1 & b1 \\ - 2 & b2 \\ -\bottomrule -\end{tabular} -""" - - self.assertEqual(withoutindex_result, withoutindex_expected) - - def test_to_latex_format(self): - # GH Bug #9402 - self.frame.to_latex(column_format='ccc') - - df = DataFrame({'a': [1, 2], 'b': ['b1', 'b2']}) - withindex_result = df.to_latex(column_format='ccc') - withindex_expected = r"""\begin{tabular}{ccc} -\toprule -{} & a & b \\ -\midrule -0 & 1 & b1 \\ -1 & 2 & b2 \\ -\bottomrule -\end{tabular} -""" - - self.assertEqual(withindex_result, withindex_expected) - - def test_to_latex_with_formatters(self): - df = DataFrame({'int': [1, 2, 3], - 'float': [1.0, 2.0, 3.0], - 'object': [(1, 2), True, False], - 'datetime64': [datetime(2016, 1, 1), - datetime(2016, 2, 5), - datetime(2016, 3, 3)]}) - - formatters = {'int': lambda x: '0x%x' % x, - 'float': lambda x: '[% 4.1f]' % x, - 'object': lambda x: '-%s-' % str(x), - 'datetime64': lambda x: x.strftime('%Y-%m'), - '__index__': lambda x: 'index: %s' % x} - result = df.to_latex(formatters=dict(formatters)) - - expected = r"""\begin{tabular}{llrrl} -\toprule -{} & datetime64 & float & int & object \\ -\midrule -index: 0 & 2016-01 & [ 1.0] & 0x1 & -(1, 2)- \\ -index: 1 & 2016-02 & [ 2.0] & 0x2 & -True- \\ -index: 2 & 2016-03 & [ 3.0] & 0x3 & -False- \\ -\bottomrule -\end{tabular} -""" - self.assertEqual(result, expected) - - def test_to_latex_multiindex(self): - df = DataFrame({('x', 'y'): ['a']}) - result = df.to_latex() - expected = r"""\begin{tabular}{ll} -\toprule -{} & x \\ -{} & y \\ -\midrule -0 & a \\ -\bottomrule -\end{tabular} -""" - - self.assertEqual(result, expected) - - result = df.T.to_latex() - expected = r"""\begin{tabular}{lll} -\toprule - & & 0 \\ -\midrule -x & y & a \\ -\bottomrule -\end{tabular} -""" - - self.assertEqual(result, expected) - - df = DataFrame.from_dict({ - ('c1', 0): pd.Series(dict((x, x) for x in range(4))), - ('c1', 1): pd.Series(dict((x, x + 4) for x in range(4))), - ('c2', 0): pd.Series(dict((x, x) for x in range(4))), - ('c2', 1): pd.Series(dict((x, x + 4) for x in range(4))), - ('c3', 0): pd.Series(dict((x, x) for x in range(4))), - }).T - result = df.to_latex() - expected = r"""\begin{tabular}{llrrrr} -\toprule - & & 0 & 1 & 2 & 3 \\ -\midrule -c1 & 0 & 0 & 1 & 2 & 3 \\ - & 1 & 4 & 5 & 6 & 7 \\ -c2 & 0 & 0 & 1 & 2 & 3 \\ - & 1 & 4 & 5 & 6 & 7 \\ -c3 & 0 & 0 & 1 & 2 & 3 \\ -\bottomrule -\end{tabular} -""" - - self.assertEqual(result, expected) - - # GH 10660 - df = pd.DataFrame({'a': [0, 0, 1, 1], - 'b': list('abab'), - 'c': [1, 2, 3, 4]}) - result = df.set_index(['a', 'b']).to_latex() - expected = r"""\begin{tabular}{llr} -\toprule - & & c \\ -a & b & \\ -\midrule -0 & a & 1 \\ - & b & 2 \\ -1 & a & 3 \\ - & b & 4 \\ -\bottomrule -\end{tabular} -""" - - self.assertEqual(result, expected) - - result = df.groupby('a').describe().to_latex() - expected = r"""\begin{tabular}{llr} -\toprule - & & c \\ -a & {} & \\ -\midrule -0 & count & 2.000000 \\ - & mean & 1.500000 \\ - & std & 0.707107 \\ - & min & 1.000000 \\ - & 25\% & 1.250000 \\ - & 50\% & 1.500000 \\ - & 75\% & 1.750000 \\ - & max & 2.000000 \\ -1 & 
count & 2.000000 \\ - & mean & 3.500000 \\ - & std & 0.707107 \\ - & min & 3.000000 \\ - & 25\% & 3.250000 \\ - & 50\% & 3.500000 \\ - & 75\% & 3.750000 \\ - & max & 4.000000 \\ -\bottomrule -\end{tabular} -""" - - self.assertEqual(result, expected) - - def test_to_latex_escape(self): - a = 'a' - b = 'b' - - test_dict = {u('co^l1'): {a: "a", - b: "b"}, - u('co$e^x$'): {a: "a", - b: "b"}} - - unescaped_result = DataFrame(test_dict).to_latex(escape=False) - escaped_result = DataFrame(test_dict).to_latex( - ) # default: escape=True - - unescaped_expected = r'''\begin{tabular}{lll} -\toprule -{} & co$e^x$ & co^l1 \\ -\midrule -a & a & a \\ -b & b & b \\ -\bottomrule -\end{tabular} -''' - - escaped_expected = r'''\begin{tabular}{lll} -\toprule -{} & co\$e\textasciicircumx\$ & co\textasciicircuml1 \\ -\midrule -a & a & a \\ -b & b & b \\ -\bottomrule -\end{tabular} -''' - - self.assertEqual(unescaped_result, unescaped_expected) - self.assertEqual(escaped_result, escaped_expected) - - def test_to_latex_longtable(self): - self.frame.to_latex(longtable=True) - - df = DataFrame({'a': [1, 2], 'b': ['b1', 'b2']}) - withindex_result = df.to_latex(longtable=True) - withindex_expected = r"""\begin{longtable}{lrl} -\toprule -{} & a & b \\ -\midrule -\endhead -\midrule -\multicolumn{3}{r}{{Continued on next page}} \\ -\midrule -\endfoot - -\bottomrule -\endlastfoot -0 & 1 & b1 \\ -1 & 2 & b2 \\ -\end{longtable} -""" - - self.assertEqual(withindex_result, withindex_expected) - - withoutindex_result = df.to_latex(index=False, longtable=True) - withoutindex_expected = r"""\begin{longtable}{rl} -\toprule - a & b \\ -\midrule -\endhead -\midrule -\multicolumn{3}{r}{{Continued on next page}} \\ -\midrule -\endfoot - -\bottomrule -\endlastfoot - 1 & b1 \\ - 2 & b2 \\ -\end{longtable} -""" - - self.assertEqual(withoutindex_result, withoutindex_expected) - - def test_to_latex_escape_special_chars(self): - special_characters = ['&', '%', '$', '#', '_', '{', '}', '~', '^', - '\\'] - df = DataFrame(data=special_characters) - observed = df.to_latex() - expected = r"""\begin{tabular}{ll} -\toprule -{} & 0 \\ -\midrule -0 & \& \\ -1 & \% \\ -2 & \$ \\ -3 & \# \\ -4 & \_ \\ -5 & \{ \\ -6 & \} \\ -7 & \textasciitilde \\ -8 & \textasciicircum \\ -9 & \textbackslash \\ -\bottomrule -\end{tabular} -""" - - self.assertEqual(observed, expected) - - def test_to_latex_no_header(self): - # GH 7124 - df = DataFrame({'a': [1, 2], 'b': ['b1', 'b2']}) - withindex_result = df.to_latex(header=False) - withindex_expected = r"""\begin{tabular}{lrl} -\toprule -0 & 1 & b1 \\ -1 & 2 & b2 \\ -\bottomrule -\end{tabular} -""" - - self.assertEqual(withindex_result, withindex_expected) - - withoutindex_result = df.to_latex(index=False, header=False) - withoutindex_expected = r"""\begin{tabular}{rl} -\toprule - 1 & b1 \\ - 2 & b2 \\ -\bottomrule -\end{tabular} -""" - - self.assertEqual(withoutindex_result, withoutindex_expected) - - def test_to_latex_decimal(self): - # GH 12031 - self.frame.to_latex() - df = DataFrame({'a': [1.0, 2.1], 'b': ['b1', 'b2']}) - withindex_result = df.to_latex(decimal=',') - withindex_expected = r"""\begin{tabular}{lrl} -\toprule -{} & a & b \\ -\midrule -0 & 1,0 & b1 \\ -1 & 2,1 & b2 \\ -\bottomrule -\end{tabular} -""" - - self.assertEqual(withindex_result, withindex_expected) - - def test_to_csv_quotechar(self): - df = DataFrame({'col': [1, 2]}) - expected = """\ -"","col" -"0","1" -"1","2" -""" - - with tm.ensure_clean('test.csv') as path: - df.to_csv(path, quoting=1) # 1=QUOTE_ALL - with
open(path, 'r') as f: - self.assertEqual(f.read(), expected) - - expected = """\ -$$,$col$ -$0$,$1$ -$1$,$2$ -""" - - with tm.ensure_clean('test.csv') as path: - df.to_csv(path, quoting=1, quotechar="$") - with open(path, 'r') as f: - self.assertEqual(f.read(), expected) - - with tm.ensure_clean('test.csv') as path: - with tm.assertRaisesRegexp(TypeError, 'quotechar'): - df.to_csv(path, quoting=1, quotechar=None) - - def test_to_csv_doublequote(self): - df = DataFrame({'col': ['a"a', '"bb"']}) - expected = '''\ -"","col" -"0","a""a" -"1","""bb""" -''' - - with tm.ensure_clean('test.csv') as path: - df.to_csv(path, quoting=1, doublequote=True) # QUOTE_ALL - with open(path, 'r') as f: - self.assertEqual(f.read(), expected) - - from _csv import Error - with tm.ensure_clean('test.csv') as path: - with tm.assertRaisesRegexp(Error, 'escapechar'): - df.to_csv(path, doublequote=False) # no escapechar set - - def test_to_csv_escapechar(self): - df = DataFrame({'col': ['a"a', '"bb"']}) - expected = '''\ -"","col" -"0","a\\"a" -"1","\\"bb\\"" -''' - - with tm.ensure_clean('test.csv') as path: # QUOTE_ALL - df.to_csv(path, quoting=1, doublequote=False, escapechar='\\') - with open(path, 'r') as f: - self.assertEqual(f.read(), expected) - - df = DataFrame({'col': ['a,a', ',bb,']}) - expected = """\ -,col -0,a\\,a -1,\\,bb\\, -""" - - with tm.ensure_clean('test.csv') as path: - df.to_csv(path, quoting=3, escapechar='\\') # QUOTE_NONE - with open(path, 'r') as f: - self.assertEqual(f.read(), expected) - - def test_csv_to_string(self): - df = DataFrame({'col': [1, 2]}) - expected = ',col\n0,1\n1,2\n' - self.assertEqual(df.to_csv(), expected) - - def test_to_csv_decimal(self): - # GH 781 - df = DataFrame({'col1': [1], 'col2': ['a'], 'col3': [10.1]}) - - expected_default = ',col1,col2,col3\n0,1,a,10.1\n' - self.assertEqual(df.to_csv(), expected_default) - - expected_european_excel = ';col1;col2;col3\n0;1;a;10,1\n' - self.assertEqual( - df.to_csv(decimal=',', sep=';'), expected_european_excel) - - expected_float_format_default = ',col1,col2,col3\n0,1,a,10.10\n' - self.assertEqual( - df.to_csv(float_format='%.2f'), expected_float_format_default) - - expected_float_format = ';col1;col2;col3\n0;1;a;10,10\n' - self.assertEqual( - df.to_csv(decimal=',', sep=';', - float_format='%.2f'), expected_float_format) - - # GH 11553: testing if decimal is taken into account for '0.0' - df = pd.DataFrame({'a': [0, 1.1], 'b': [2.2, 3.3], 'c': 1}) - expected = 'a,b,c\n0^0,2^2,1\n1^1,3^3,1\n' - self.assertEqual(df.to_csv(index=False, decimal='^'), expected) - - # same but for an index - self.assertEqual(df.set_index('a').to_csv(decimal='^'), expected) - - # same for a multi-index - self.assertEqual( - df.set_index(['a', 'b']).to_csv(decimal="^"), expected) - - def test_to_csv_float_format(self): - # testing if float_format is taken into account for the index - # GH 11553 - df = pd.DataFrame({'a': [0, 1], 'b': [2.2, 3.3], 'c': 1}) - expected = 'a,b,c\n0,2.20,1\n1,3.30,1\n' - self.assertEqual( - df.set_index('a').to_csv(float_format='%.2f'), expected) - - # same for a multi-index - self.assertEqual( - df.set_index(['a', 'b']).to_csv(float_format='%.2f'), expected) - - def test_to_csv_na_rep(self): - # testing if NaN values are correctly represented in the index - # GH 11553 - df = DataFrame({'a': [0, np.NaN], 'b': [0, 1], 'c': [2, 3]}) - expected = "a,b,c\n0.0,0,2\n_,1,3\n" - self.assertEqual(df.set_index('a').to_csv(na_rep='_'), expected) - self.assertEqual(df.set_index(['a', 'b']).to_csv(na_rep='_'), expected) - - # now with 
an index containing only NaNs - df = DataFrame({'a': np.NaN, 'b': [0, 1], 'c': [2, 3]}) - expected = "a,b,c\n_,0,2\n_,1,3\n" - self.assertEqual(df.set_index('a').to_csv(na_rep='_'), expected) - self.assertEqual(df.set_index(['a', 'b']).to_csv(na_rep='_'), expected) - - # check if na_rep parameter does not break anything when no NaN - df = DataFrame({'a': 0, 'b': [0, 1], 'c': [2, 3]}) - expected = "a,b,c\n0,0,2\n0,1,3\n" - self.assertEqual(df.set_index('a').to_csv(na_rep='_'), expected) - self.assertEqual(df.set_index(['a', 'b']).to_csv(na_rep='_'), expected) - - def test_to_csv_date_format(self): - # GH 10209 - df_sec = DataFrame({'A': pd.date_range('20130101', periods=5, freq='s') - }) - df_day = DataFrame({'A': pd.date_range('20130101', periods=5, freq='d') - }) - - expected_default_sec = ',A\n0,2013-01-01 00:00:00\n1,2013-01-01 00:00:01\n2,2013-01-01 00:00:02' + \ - '\n3,2013-01-01 00:00:03\n4,2013-01-01 00:00:04\n' - self.assertEqual(df_sec.to_csv(), expected_default_sec) - - expected_ymdhms_day = ',A\n0,2013-01-01 00:00:00\n1,2013-01-02 00:00:00\n2,2013-01-03 00:00:00' + \ - '\n3,2013-01-04 00:00:00\n4,2013-01-05 00:00:00\n' - self.assertEqual( - df_day.to_csv( - date_format='%Y-%m-%d %H:%M:%S'), expected_ymdhms_day) - - expected_ymd_sec = ',A\n0,2013-01-01\n1,2013-01-01\n2,2013-01-01\n3,2013-01-01\n4,2013-01-01\n' - self.assertEqual( - df_sec.to_csv(date_format='%Y-%m-%d'), expected_ymd_sec) - - expected_default_day = ',A\n0,2013-01-01\n1,2013-01-02\n2,2013-01-03\n3,2013-01-04\n4,2013-01-05\n' - self.assertEqual(df_day.to_csv(), expected_default_day) - self.assertEqual( - df_day.to_csv(date_format='%Y-%m-%d'), expected_default_day) - - # testing if date_format parameter is taken into account for - # multi-indexed dataframes (GH 7791) - df_sec['B'] = 0 - df_sec['C'] = 1 - expected_ymd_sec = 'A,B,C\n2013-01-01,0,1\n' - df_sec_grouped = df_sec.groupby([pd.Grouper(key='A', freq='1h'), 'B']) - self.assertEqual(df_sec_grouped.mean().to_csv(date_format='%Y-%m-%d'), - expected_ymd_sec) - - def test_to_csv_multi_index(self): - # see gh-6618 - df = DataFrame([1], columns=pd.MultiIndex.from_arrays([[1],[2]])) - - exp = ",1\n,2\n0,1\n" - self.assertEqual(df.to_csv(), exp) - - exp = "1\n2\n1\n" - self.assertEqual(df.to_csv(index=False), exp) - - df = DataFrame([1], columns=pd.MultiIndex.from_arrays([[1],[2]]), - index=pd.MultiIndex.from_arrays([[1],[2]])) - - exp = ",,1\n,,2\n1,2,1\n" - self.assertEqual(df.to_csv(), exp) - - exp = "1\n2\n1\n" - self.assertEqual(df.to_csv(index=False), exp) - - df = DataFrame([1], columns=pd.MultiIndex.from_arrays([['foo'],['bar']])) - - exp = ",foo\n,bar\n0,1\n" - self.assertEqual(df.to_csv(), exp) - - exp = "foo\nbar\n1\n" - self.assertEqual(df.to_csv(index=False), exp) - - def test_period(self): - # GH 12615 - df = pd.DataFrame({'A': pd.period_range('2013-01', - periods=4, freq='M'), - 'B': [pd.Period('2011-01', freq='M'), - pd.Period('2011-02-01', freq='D'), - pd.Period('2011-03-01 09:00', freq='H'), - pd.Period('2011-04', freq='M')], - 'C': list('abcd')}) - exp = (" A B C\n0 2013-01 2011-01 a\n" - "1 2013-02 2011-02-01 b\n2 2013-03 2011-03-01 09:00 c\n" - "3 2013-04 2011-04 d") - self.assertEqual(str(df), exp) - - -class TestSeriesFormatting(tm.TestCase): - - _multiprocess_can_split_ = True - - def setUp(self): - self.ts = tm.makeTimeSeries() - - def test_repr_unicode(self): - s = Series([u('\u03c3')] * 10) - repr(s) - - a = Series([u("\u05d0")] * 1000) - a.name = 'title1' - repr(a) - - def test_to_string(self): - buf = StringIO() - - s = self.ts.to_string() 
-
-        retval = self.ts.to_string(buf=buf)
-        self.assertIsNone(retval)
-        self.assertEqual(buf.getvalue().strip(), s)
-
-        # pass float_format
-        format = '%.4f'.__mod__
-        result = self.ts.to_string(float_format=format)
-        result = [x.split()[1] for x in result.split('\n')[:-1]]
-        expected = [format(x) for x in self.ts]
-        self.assertEqual(result, expected)
-
-        # empty string
-        result = self.ts[:0].to_string()
-        self.assertEqual(result, 'Series([], Freq: B)')
-
-        result = self.ts[:0].to_string(length=0)
-        self.assertEqual(result, 'Series([], Freq: B)')
-
-        # name and length
-        cp = self.ts.copy()
-        cp.name = 'foo'
-        result = cp.to_string(length=True, name=True, dtype=True)
-        last_line = result.split('\n')[-1].strip()
-        self.assertEqual(last_line,
-                         "Freq: B, Name: foo, Length: %d, dtype: float64" %
-                         len(cp))
-
-    def test_freq_name_separation(self):
-        s = Series(np.random.randn(10),
-                   index=date_range('1/1/2000', periods=10), name=0)
-
-        result = repr(s)
-        self.assertTrue('Freq: D, Name: 0' in result)
-
-    def test_to_string_mixed(self):
-        s = Series(['foo', np.nan, -1.23, 4.56])
-        result = s.to_string()
-        expected = (u('0 foo\n') + u('1 NaN\n') + u('2 -1.23\n') +
-                    u('3 4.56'))
-        self.assertEqual(result, expected)
-
-        # but don't count NAs as floats
-        s = Series(['foo', np.nan, 'bar', 'baz'])
-        result = s.to_string()
-        expected = (u('0 foo\n') + '1 NaN\n' + '2 bar\n' + '3 baz')
-        self.assertEqual(result, expected)
-
-        s = Series(['foo', 5, 'bar', 'baz'])
-        result = s.to_string()
-        expected = (u('0 foo\n') + '1 5\n' + '2 bar\n' + '3 baz')
-        self.assertEqual(result, expected)
-
-    def test_to_string_float_na_spacing(self):
-        s = Series([0., 1.5678, 2., -3., 4.])
-        s[::2] = np.nan
-
-        result = s.to_string()
-        expected = (u('0 NaN\n') + '1 1.5678\n' + '2 NaN\n' +
-                    '3 -3.0000\n' + '4 NaN')
-        self.assertEqual(result, expected)
-
-    def test_to_string_without_index(self):
-        # GH 11729 Test index=False option
-        s = Series([1, 2, 3, 4])
-        result = s.to_string(index=False)
-        expected = (u('1\n') + '2\n' + '3\n' + '4')
-        self.assertEqual(result, expected)
-
-    def test_unicode_name_in_footer(self):
-        s = Series([1, 2], name=u('\u05e2\u05d1\u05e8\u05d9\u05ea'))
-        sf = fmt.SeriesFormatter(s, name=u('\u05e2\u05d1\u05e8\u05d9\u05ea'))
-        sf._get_footer()  # should not raise exception
-
-    def test_east_asian_unicode_series(self):
-        if PY3:
-            _rep = repr
-        else:
-            _rep = unicode
-        # not aligned properly because of east asian width
-
-        # unicode index
-        s = Series(['a', 'bb', 'CCC', 'D'],
-                   index=[u'あ', u'いい', u'ううう', u'ええええ'])
-        expected = (u"あ a\nいい bb\nううう CCC\n"
-                    u"ええええ D\ndtype: object")
-        self.assertEqual(_rep(s), expected)
-
-        # unicode values
-        s = Series([u'あ', u'いい', u'ううう', u'ええええ'],
-                   index=['a', 'bb', 'c', 'ddd'])
-        expected = (u"a あ\nbb いい\nc ううう\n"
-                    u"ddd ええええ\ndtype: object")
-        self.assertEqual(_rep(s), expected)
-
-        # both
-        s = Series([u'あ', u'いい', u'ううう', u'ええええ'],
-                   index=[u'ああ', u'いいいい', u'う', u'えええ'])
-        expected = (u"ああ あ\nいいいい いい\nう ううう\n"
-                    u"えええ ええええ\ndtype: object")
-        self.assertEqual(_rep(s), expected)
-
-        # unicode footer
-        s = Series([u'あ', u'いい', u'ううう', u'ええええ'],
-                   index=[u'ああ', u'いいいい', u'う', u'えええ'], name=u'おおおおおおお')
-        expected = (u"ああ あ\nいいいい いい\nう ううう\n"
-                    u"えええ ええええ\nName: おおおおおおお, dtype: object")
-        self.assertEqual(_rep(s), expected)
-
-        # MultiIndex
-        idx = pd.MultiIndex.from_tuples([(u'あ', u'いい'), (u'う', u'え'), (
-            u'おおお', u'かかかか'), (u'き', u'くく')])
-        s = Series([1, 22, 3333, 44444], index=idx)
-        expected = (u"あ いい 1\nう え 22\nおおお かかかか 3333\n"
-                    u"き くく 44444\ndtype: int64")
-        self.assertEqual(_rep(s), expected)
-
-        # object dtype, shorter than unicode repr
-        s = Series([1, 22, 3333, 44444], index=[1, 'AB', np.nan, u'あああ'])
-        expected = (u"1 1\nAB 22\nNaN 3333\n"
-                    u"あああ 44444\ndtype: int64")
-        self.assertEqual(_rep(s), expected)
-
-        # object dtype, longer than unicode repr
-        s = Series([1, 22, 3333, 44444],
-                   index=[1, 'AB', pd.Timestamp('2011-01-01'), u'あああ'])
-        expected = (u"1 1\nAB 22\n"
-                    u"2011-01-01 00:00:00 3333\nあああ 44444\ndtype: int64"
-                    )
-        self.assertEqual(_rep(s), expected)
-
-        # truncate
-        with option_context('display.max_rows', 3):
-            s = Series([u'あ', u'いい', u'ううう', u'ええええ'], name=u'おおおおおおお')
-
-            expected = (u"0 あ\n ... \n"
-                        u"3 ええええ\nName: おおおおおおお, dtype: object")
-            self.assertEqual(_rep(s), expected)
-
-            s.index = [u'ああ', u'いいいい', u'う', u'えええ']
-            expected = (u"ああ あ\n ... \n"
-                        u"えええ ええええ\nName: おおおおおおお, dtype: object")
-            self.assertEqual(_rep(s), expected)
-
-        # Enable Unicode option -----------------------------------------
-        with option_context('display.unicode.east_asian_width', True):
-
-            # unicode index
-            s = Series(['a', 'bb', 'CCC', 'D'],
-                       index=[u'あ', u'いい', u'ううう', u'ええええ'])
-            expected = (u"あ a\nいい bb\nううう CCC\n"
-                        u"ええええ D\ndtype: object")
-            self.assertEqual(_rep(s), expected)
-
-            # unicode values
-            s = Series([u'あ', u'いい', u'ううう', u'ええええ'],
-                       index=['a', 'bb', 'c', 'ddd'])
-            expected = (u"a あ\nbb いい\nc ううう\n"
-                        u"ddd ええええ\ndtype: object")
-            self.assertEqual(_rep(s), expected)
-
-            # both
-            s = Series([u'あ', u'いい', u'ううう', u'ええええ'],
-                       index=[u'ああ', u'いいいい', u'う', u'えええ'])
-            expected = (u"ああ あ\nいいいい いい\nう ううう\n"
-                        u"えええ ええええ\ndtype: object")
-            self.assertEqual(_rep(s), expected)
-
-            # unicode footer
-            s = Series([u'あ', u'いい', u'ううう', u'ええええ'],
-                       index=[u'ああ', u'いいいい', u'う', u'えええ'], name=u'おおおおおおお')
-            expected = (u"ああ あ\nいいいい いい\nう ううう\n"
-                        u"えええ ええええ\nName: おおおおおおお, dtype: object")
-            self.assertEqual(_rep(s), expected)
-
-            # MultiIndex
-            idx = pd.MultiIndex.from_tuples([(u'あ', u'いい'), (u'う', u'え'), (
-                u'おおお', u'かかかか'), (u'き', u'くく')])
-            s = Series([1, 22, 3333, 44444], index=idx)
-            expected = (u"あ いい 1\nう え 22\nおおお かかかか 3333\n"
-                        u"き くく 44444\ndtype: int64")
-            self.assertEqual(_rep(s), expected)
-
-            # object dtype, shorter than unicode repr
-            s = Series([1, 22, 3333, 44444], index=[1, 'AB', np.nan, u'あああ'])
-            expected = (u"1 1\nAB 22\nNaN 3333\n"
-                        u"あああ 44444\ndtype: int64")
-            self.assertEqual(_rep(s), expected)
-
-            # object dtype, longer than unicode repr
-            s = Series([1, 22, 3333, 44444],
-                       index=[1, 'AB', pd.Timestamp('2011-01-01'), u'あああ'])
-            expected = (u"1 1\nAB 22\n"
-                        u"2011-01-01 00:00:00 3333\nあああ 44444\ndtype: int64"
-                        )
-            self.assertEqual(_rep(s), expected)
-
-            # truncate
-            with option_context('display.max_rows', 3):
-                s = Series([u'あ', u'いい', u'ううう', u'ええええ'], name=u'おおおおおおお')
-                expected = (u"0 あ\n ... \n"
-                            u"3 ええええ\nName: おおおおおおお, dtype: object")
-                self.assertEqual(_rep(s), expected)
-
-                s.index = [u'ああ', u'いいいい', u'う', u'えええ']
-                expected = (u"ああ あ\n ... 
\n" - u"えええ ええええ\nName: おおおおおおお, dtype: object") - self.assertEqual(_rep(s), expected) - - # ambiguous unicode - s = Series([u'¡¡', u'い¡¡', u'ううう', u'ええええ'], - index=[u'ああ', u'¡¡¡¡いい', u'¡¡', u'えええ']) - expected = (u"ああ ¡¡\n¡¡¡¡いい い¡¡\n¡¡ ううう\n" - u"えええ ええええ\ndtype: object") - self.assertEqual(_rep(s), expected) - - def test_float_trim_zeros(self): - vals = [2.08430917305e+10, 3.52205017305e+10, 2.30674817305e+10, - 2.03954217305e+10, 5.59897817305e+10] - for line in repr(Series(vals)).split('\n'): - if line.startswith('dtype:'): - continue - if _three_digit_exp(): - self.assertIn('+010', line) - else: - self.assertIn('+10', line) - - def test_datetimeindex(self): - - index = date_range('20130102', periods=6) - s = Series(1, index=index) - result = s.to_string() - self.assertTrue('2013-01-02' in result) - - # nat in index - s2 = Series(2, index=[Timestamp('20130111'), NaT]) - s = s2.append(s) - result = s.to_string() - self.assertTrue('NaT' in result) - - # nat in summary - result = str(s2.index) - self.assertTrue('NaT' in result) - - def test_timedelta64(self): - - from datetime import datetime, timedelta - - Series(np.array([1100, 20], dtype='timedelta64[ns]')).to_string() - - s = Series(date_range('2012-1-1', periods=3, freq='D')) - - # GH2146 - - # adding NaTs - y = s - s.shift(1) - result = y.to_string() - self.assertTrue('1 days' in result) - self.assertTrue('00:00:00' not in result) - self.assertTrue('NaT' in result) - - # with frac seconds - o = Series([datetime(2012, 1, 1, microsecond=150)] * 3) - y = s - o - result = y.to_string() - self.assertTrue('-1 days +23:59:59.999850' in result) - - # rounding? - o = Series([datetime(2012, 1, 1, 1)] * 3) - y = s - o - result = y.to_string() - self.assertTrue('-1 days +23:00:00' in result) - self.assertTrue('1 days 23:00:00' in result) - - o = Series([datetime(2012, 1, 1, 1, 1)] * 3) - y = s - o - result = y.to_string() - self.assertTrue('-1 days +22:59:00' in result) - self.assertTrue('1 days 22:59:00' in result) - - o = Series([datetime(2012, 1, 1, 1, 1, microsecond=150)] * 3) - y = s - o - result = y.to_string() - self.assertTrue('-1 days +22:58:59.999850' in result) - self.assertTrue('0 days 22:58:59.999850' in result) - - # neg time - td = timedelta(minutes=5, seconds=3) - s2 = Series(date_range('2012-1-1', periods=3, freq='D')) + td - y = s - s2 - result = y.to_string() - self.assertTrue('-1 days +23:54:57' in result) - - td = timedelta(microseconds=550) - s2 = Series(date_range('2012-1-1', periods=3, freq='D')) + td - y = s - td - result = y.to_string() - self.assertTrue('2012-01-01 23:59:59.999450' in result) - - # no boxing of the actual elements - td = Series(pd.timedelta_range('1 days', periods=3)) - result = td.to_string() - self.assertEqual(result, u("0 1 days\n1 2 days\n2 3 days")) - - def test_mixed_datetime64(self): - df = DataFrame({'A': [1, 2], 'B': ['2012-01-01', '2012-01-02']}) - df['B'] = pd.to_datetime(df.B) - - result = repr(df.loc[0]) - self.assertTrue('2012-01-01' in result) - - def test_period(self): - # GH 12615 - index = pd.period_range('2013-01', periods=6, freq='M') - s = Series(np.arange(6, dtype='int64'), index=index) - exp = ("2013-01 0\n2013-02 1\n2013-03 2\n2013-04 3\n" - "2013-05 4\n2013-06 5\nFreq: M, dtype: int64") - self.assertEqual(str(s), exp) - - s = Series(index) - exp = ("0 2013-01\n1 2013-02\n2 2013-03\n3 2013-04\n" - "4 2013-05\n5 2013-06\ndtype: object") - self.assertEqual(str(s), exp) - - # periods with mixed freq - s = Series([pd.Period('2011-01', freq='M'), - pd.Period('2011-02-01', 
freq='D'), - pd.Period('2011-03-01 09:00', freq='H')]) - exp = ("0 2011-01\n1 2011-02-01\n" - "2 2011-03-01 09:00\ndtype: object") - self.assertEqual(str(s), exp) - - def test_max_multi_index_display(self): - # GH 7101 - - # doc example (indexing.rst) - - # multi-index - arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], - ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] - tuples = list(zip(*arrays)) - index = MultiIndex.from_tuples(tuples, names=['first', 'second']) - s = Series(randn(8), index=index) - - with option_context("display.max_rows", 10): - self.assertEqual(len(str(s).split('\n')), 10) - with option_context("display.max_rows", 3): - self.assertEqual(len(str(s).split('\n')), 5) - with option_context("display.max_rows", 2): - self.assertEqual(len(str(s).split('\n')), 5) - with option_context("display.max_rows", 1): - self.assertEqual(len(str(s).split('\n')), 4) - with option_context("display.max_rows", 0): - self.assertEqual(len(str(s).split('\n')), 10) - - # index - s = Series(randn(8), None) - - with option_context("display.max_rows", 10): - self.assertEqual(len(str(s).split('\n')), 9) - with option_context("display.max_rows", 3): - self.assertEqual(len(str(s).split('\n')), 4) - with option_context("display.max_rows", 2): - self.assertEqual(len(str(s).split('\n')), 4) - with option_context("display.max_rows", 1): - self.assertEqual(len(str(s).split('\n')), 3) - with option_context("display.max_rows", 0): - self.assertEqual(len(str(s).split('\n')), 9) - - # Make sure #8532 is fixed - def test_consistent_format(self): - s = pd.Series([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.9999, 1, 1] * 10) - with option_context("display.max_rows", 10): - res = repr(s) - exp = ('0 1.0000\n1 1.0000\n2 1.0000\n3 ' - '1.0000\n4 1.0000\n ... \n125 ' - '1.0000\n126 1.0000\n127 0.9999\n128 ' - '1.0000\n129 1.0000\ndtype: float64') - self.assertEqual(res, exp) - - @staticmethod - def gen_test_series(): - s1 = pd.Series(['a'] * 100) - s2 = pd.Series(['ab'] * 100) - s3 = pd.Series(['a', 'ab', 'abc', 'abcd', 'abcde', 'abcdef']) - s4 = s3[::-1] - test_sers = {'onel': s1, 'twol': s2, 'asc': s3, 'desc': s4} - return test_sers - - def chck_ncols(self, s): - with option_context("display.max_rows", 10): - res = repr(s) - lines = res.split('\n') - lines = [line for line in repr(s).split('\n') - if not re.match(r'[^\.]*\.+', line)][:-1] - ncolsizes = len(set(len(line.strip()) for line in lines)) - self.assertEqual(ncolsizes, 1) - - def test_format_explicit(self): - test_sers = self.gen_test_series() - with option_context("display.max_rows", 4): - res = repr(test_sers['onel']) - exp = '0 a\n1 a\n ..\n98 a\n99 a\ndtype: object' - self.assertEqual(exp, res) - res = repr(test_sers['twol']) - exp = ('0 ab\n1 ab\n ..\n98 ab\n99 ab\ndtype:' - ' object') - self.assertEqual(exp, res) - res = repr(test_sers['asc']) - exp = ('0 a\n1 ab\n ... \n4 abcde\n5' - ' abcdef\ndtype: object') - self.assertEqual(exp, res) - res = repr(test_sers['desc']) - exp = ('5 abcdef\n4 abcde\n ... 
\n1 ab\n0' - ' a\ndtype: object') - self.assertEqual(exp, res) - - def test_ncols(self): - test_sers = self.gen_test_series() - for s in test_sers.values(): - self.chck_ncols(s) - - def test_max_rows_eq_one(self): - s = Series(range(10), dtype='int64') - with option_context("display.max_rows", 1): - strrepr = repr(s).split('\n') - exp1 = ['0', '0'] - res1 = strrepr[0].split() - self.assertEqual(exp1, res1) - exp2 = ['..'] - res2 = strrepr[1].split() - self.assertEqual(exp2, res2) - - def test_truncate_ndots(self): - def getndots(s): - return len(re.match(r'[^\.]*(\.*)', s).groups()[0]) - - s = Series([0, 2, 3, 6]) - with option_context("display.max_rows", 2): - strrepr = repr(s).replace('\n', '') - self.assertEqual(getndots(strrepr), 2) - - s = Series([0, 100, 200, 400]) - with option_context("display.max_rows", 2): - strrepr = repr(s).replace('\n', '') - self.assertEqual(getndots(strrepr), 3) - - def test_to_string_name(self): - s = Series(range(100), dtype='int64') - s.name = 'myser' - res = s.to_string(max_rows=2, name=True) - exp = '0 0\n ..\n99 99\nName: myser' - self.assertEqual(res, exp) - res = s.to_string(max_rows=2, name=False) - exp = '0 0\n ..\n99 99' - self.assertEqual(res, exp) - - def test_to_string_dtype(self): - s = Series(range(100), dtype='int64') - res = s.to_string(max_rows=2, dtype=True) - exp = '0 0\n ..\n99 99\ndtype: int64' - self.assertEqual(res, exp) - res = s.to_string(max_rows=2, dtype=False) - exp = '0 0\n ..\n99 99' - self.assertEqual(res, exp) - - def test_to_string_length(self): - s = Series(range(100), dtype='int64') - res = s.to_string(max_rows=2, length=True) - exp = '0 0\n ..\n99 99\nLength: 100' - self.assertEqual(res, exp) - - def test_to_string_na_rep(self): - s = pd.Series(index=range(100)) - res = s.to_string(na_rep='foo', max_rows=2) - exp = '0 foo\n ..\n99 foo' - self.assertEqual(res, exp) - - def test_to_string_float_format(self): - s = pd.Series(range(10), dtype='float64') - res = s.to_string(float_format=lambda x: '{0:2.1f}'.format(x), - max_rows=2) - exp = '0 0.0\n ..\n9 9.0' - self.assertEqual(res, exp) - - def test_to_string_header(self): - s = pd.Series(range(10), dtype='int64') - s.index.name = 'foo' - res = s.to_string(header=True, max_rows=2) - exp = 'foo\n0 0\n ..\n9 9' - self.assertEqual(res, exp) - res = s.to_string(header=False, max_rows=2) - exp = '0 0\n ..\n9 9' - self.assertEqual(res, exp) - - -class TestEngFormatter(tm.TestCase): - _multiprocess_can_split_ = True - - def test_eng_float_formatter(self): - df = DataFrame({'A': [1.41, 141., 14100, 1410000.]}) - - fmt.set_eng_float_format() - result = df.to_string() - expected = (' A\n' - '0 1.410E+00\n' - '1 141.000E+00\n' - '2 14.100E+03\n' - '3 1.410E+06') - self.assertEqual(result, expected) - - fmt.set_eng_float_format(use_eng_prefix=True) - result = df.to_string() - expected = (' A\n' - '0 1.410\n' - '1 141.000\n' - '2 14.100k\n' - '3 1.410M') - self.assertEqual(result, expected) - - fmt.set_eng_float_format(accuracy=0) - result = df.to_string() - expected = (' A\n' - '0 1E+00\n' - '1 141E+00\n' - '2 14E+03\n' - '3 1E+06') - self.assertEqual(result, expected) - - self.reset_display_options() - - def compare(self, formatter, input, output): - formatted_input = formatter(input) - msg = ("formatting of %s results in '%s', expected '%s'" % - (str(input), formatted_input, output)) - self.assertEqual(formatted_input, output, msg) - - def compare_all(self, formatter, in_out): - """ - Parameters: - ----------- - formatter: EngFormatter under test - in_out: list of tuples. 
Each tuple = (number, expected_formatting) - - It is tested if 'formatter(number) == expected_formatting'. - *number* should be >= 0 because formatter(-number) == fmt is also - tested. *fmt* is derived from *expected_formatting* - """ - for input, output in in_out: - self.compare(formatter, input, output) - self.compare(formatter, -input, "-" + output[1:]) - - def test_exponents_with_eng_prefix(self): - formatter = fmt.EngFormatter(accuracy=3, use_eng_prefix=True) - f = np.sqrt(2) - in_out = [(f * 10 ** -24, " 1.414y"), (f * 10 ** -23, " 14.142y"), - (f * 10 ** -22, " 141.421y"), (f * 10 ** -21, " 1.414z"), - (f * 10 ** -20, " 14.142z"), (f * 10 ** -19, " 141.421z"), - (f * 10 ** -18, " 1.414a"), (f * 10 ** -17, " 14.142a"), - (f * 10 ** -16, " 141.421a"), (f * 10 ** -15, " 1.414f"), - (f * 10 ** -14, " 14.142f"), (f * 10 ** -13, " 141.421f"), - (f * 10 ** -12, " 1.414p"), (f * 10 ** -11, " 14.142p"), - (f * 10 ** -10, " 141.421p"), (f * 10 ** -9, " 1.414n"), - (f * 10 ** -8, " 14.142n"), (f * 10 ** -7, " 141.421n"), - (f * 10 ** -6, " 1.414u"), (f * 10 ** -5, " 14.142u"), - (f * 10 ** -4, " 141.421u"), (f * 10 ** -3, " 1.414m"), - (f * 10 ** -2, " 14.142m"), (f * 10 ** -1, " 141.421m"), - (f * 10 ** 0, " 1.414"), (f * 10 ** 1, " 14.142"), - (f * 10 ** 2, " 141.421"), (f * 10 ** 3, " 1.414k"), - (f * 10 ** 4, " 14.142k"), (f * 10 ** 5, " 141.421k"), - (f * 10 ** 6, " 1.414M"), (f * 10 ** 7, " 14.142M"), - (f * 10 ** 8, " 141.421M"), (f * 10 ** 9, " 1.414G"), ( - f * 10 ** 10, " 14.142G"), (f * 10 ** 11, " 141.421G"), - (f * 10 ** 12, " 1.414T"), (f * 10 ** 13, " 14.142T"), ( - f * 10 ** 14, " 141.421T"), (f * 10 ** 15, " 1.414P"), ( - f * 10 ** 16, " 14.142P"), (f * 10 ** 17, " 141.421P"), ( - f * 10 ** 18, " 1.414E"), (f * 10 ** 19, " 14.142E"), - (f * 10 ** 20, " 141.421E"), (f * 10 ** 21, " 1.414Z"), ( - f * 10 ** 22, " 14.142Z"), (f * 10 ** 23, " 141.421Z"), ( - f * 10 ** 24, " 1.414Y"), (f * 10 ** 25, " 14.142Y"), ( - f * 10 ** 26, " 141.421Y")] - self.compare_all(formatter, in_out) - - def test_exponents_without_eng_prefix(self): - formatter = fmt.EngFormatter(accuracy=4, use_eng_prefix=False) - f = np.pi - in_out = [(f * 10 ** -24, " 3.1416E-24"), - (f * 10 ** -23, " 31.4159E-24"), - (f * 10 ** -22, " 314.1593E-24"), - (f * 10 ** -21, " 3.1416E-21"), - (f * 10 ** -20, " 31.4159E-21"), - (f * 10 ** -19, " 314.1593E-21"), - (f * 10 ** -18, " 3.1416E-18"), - (f * 10 ** -17, " 31.4159E-18"), - (f * 10 ** -16, " 314.1593E-18"), - (f * 10 ** -15, " 3.1416E-15"), - (f * 10 ** -14, " 31.4159E-15"), - (f * 10 ** -13, " 314.1593E-15"), - (f * 10 ** -12, " 3.1416E-12"), - (f * 10 ** -11, " 31.4159E-12"), - (f * 10 ** -10, " 314.1593E-12"), - (f * 10 ** -9, " 3.1416E-09"), (f * 10 ** -8, " 31.4159E-09"), - (f * 10 ** -7, " 314.1593E-09"), (f * 10 ** -6, " 3.1416E-06"), - (f * 10 ** -5, " 31.4159E-06"), (f * 10 ** -4, - " 314.1593E-06"), - (f * 10 ** -3, " 3.1416E-03"), (f * 10 ** -2, " 31.4159E-03"), - (f * 10 ** -1, " 314.1593E-03"), (f * 10 ** 0, " 3.1416E+00"), ( - f * 10 ** 1, " 31.4159E+00"), (f * 10 ** 2, " 314.1593E+00"), - (f * 10 ** 3, " 3.1416E+03"), (f * 10 ** 4, " 31.4159E+03"), ( - f * 10 ** 5, " 314.1593E+03"), (f * 10 ** 6, " 3.1416E+06"), - (f * 10 ** 7, " 31.4159E+06"), (f * 10 ** 8, " 314.1593E+06"), ( - f * 10 ** 9, " 3.1416E+09"), (f * 10 ** 10, " 31.4159E+09"), - (f * 10 ** 11, " 314.1593E+09"), (f * 10 ** 12, " 3.1416E+12"), - (f * 10 ** 13, " 31.4159E+12"), (f * 10 ** 14, " 314.1593E+12"), - (f * 10 ** 15, " 3.1416E+15"), (f * 10 ** 16, " 31.4159E+15"), - (f * 10 ** 17, " 
314.1593E+15"), (f * 10 ** 18, " 3.1416E+18"), - (f * 10 ** 19, " 31.4159E+18"), (f * 10 ** 20, " 314.1593E+18"), - (f * 10 ** 21, " 3.1416E+21"), (f * 10 ** 22, " 31.4159E+21"), - (f * 10 ** 23, " 314.1593E+21"), (f * 10 ** 24, " 3.1416E+24"), - (f * 10 ** 25, " 31.4159E+24"), (f * 10 ** 26, " 314.1593E+24")] - self.compare_all(formatter, in_out) - - def test_rounding(self): - formatter = fmt.EngFormatter(accuracy=3, use_eng_prefix=True) - in_out = [(5.55555, ' 5.556'), (55.5555, ' 55.556'), - (555.555, ' 555.555'), (5555.55, ' 5.556k'), - (55555.5, ' 55.556k'), (555555, ' 555.555k')] - self.compare_all(formatter, in_out) - - formatter = fmt.EngFormatter(accuracy=1, use_eng_prefix=True) - in_out = [(5.55555, ' 5.6'), (55.5555, ' 55.6'), (555.555, ' 555.6'), - (5555.55, ' 5.6k'), (55555.5, ' 55.6k'), (555555, ' 555.6k')] - self.compare_all(formatter, in_out) - - formatter = fmt.EngFormatter(accuracy=0, use_eng_prefix=True) - in_out = [(5.55555, ' 6'), (55.5555, ' 56'), (555.555, ' 556'), - (5555.55, ' 6k'), (55555.5, ' 56k'), (555555, ' 556k')] - self.compare_all(formatter, in_out) - - formatter = fmt.EngFormatter(accuracy=3, use_eng_prefix=True) - result = formatter(0) - self.assertEqual(result, u(' 0.000')) - - def test_nan(self): - # Issue #11981 - - formatter = fmt.EngFormatter(accuracy=1, use_eng_prefix=True) - result = formatter(np.nan) - self.assertEqual(result, u('NaN')) - - df = pd.DataFrame({'a':[1.5, 10.3, 20.5], - 'b':[50.3, 60.67, 70.12], - 'c':[100.2, 101.33, 120.33]}) - pt = df.pivot_table(values='a', index='b', columns='c') - fmt.set_eng_float_format(accuracy=1) - result = pt.to_string() - self.assertTrue('NaN' in result) - self.reset_display_options() - - def test_inf(self): - # Issue #11981 - - formatter = fmt.EngFormatter(accuracy=1, use_eng_prefix=True) - result = formatter(np.inf) - self.assertEqual(result, u('inf')) - - -def _three_digit_exp(): - return '%.4g' % 1.7e8 == '1.7e+008' - - -class TestFloatArrayFormatter(tm.TestCase): - - def test_misc(self): - obj = fmt.FloatArrayFormatter(np.array([], dtype=np.float64)) - result = obj.get_result() - self.assertTrue(len(result) == 0) - - def test_format(self): - obj = fmt.FloatArrayFormatter(np.array([12, 0], dtype=np.float64)) - result = obj.get_result() - self.assertEqual(result[0], " 12.0") - self.assertEqual(result[1], " 0.0") - - def test_output_significant_digits(self): - # Issue #9764 - - # In case default display precision changes: - with pd.option_context('display.precision', 6): - # DataFrame example from issue #9764 - d = pd.DataFrame( - {'col1': [9.999e-8, 1e-7, 1.0001e-7, 2e-7, 4.999e-7, 5e-7, - 5.0001e-7, 6e-7, 9.999e-7, 1e-6, 1.0001e-6, 2e-6, - 4.999e-6, 5e-6, 5.0001e-6, 6e-6]}) - - expected_output = { - (0, 6): - ' col1\n0 9.999000e-08\n1 1.000000e-07\n2 1.000100e-07\n3 2.000000e-07\n4 4.999000e-07\n5 5.000000e-07', - (1, 6): - ' col1\n1 1.000000e-07\n2 1.000100e-07\n3 2.000000e-07\n4 4.999000e-07\n5 5.000000e-07', - (1, 8): - ' col1\n1 1.000000e-07\n2 1.000100e-07\n3 2.000000e-07\n4 4.999000e-07\n5 5.000000e-07\n6 5.000100e-07\n7 6.000000e-07', - (8, 16): - ' col1\n8 9.999000e-07\n9 1.000000e-06\n10 1.000100e-06\n11 2.000000e-06\n12 4.999000e-06\n13 5.000000e-06\n14 5.000100e-06\n15 6.000000e-06', - (9, 16): - ' col1\n9 0.000001\n10 0.000001\n11 0.000002\n12 0.000005\n13 0.000005\n14 0.000005\n15 0.000006' - } - - for (start, stop), v in expected_output.items(): - self.assertEqual(str(d[start:stop]), v) - - def test_too_long(self): - # GH 10451 - with pd.option_context('display.precision', 4): - # need 
both a number > 1e6 and something that normally formats to
-            # having length > display.precision + 6
-            df = pd.DataFrame(dict(x=[12345.6789]))
-            self.assertEqual(str(df), ' x\n0 12345.6789')
-            df = pd.DataFrame(dict(x=[2e6]))
-            self.assertEqual(str(df), ' x\n0 2000000.0')
-            df = pd.DataFrame(dict(x=[12345.6789, 2e6]))
-            self.assertEqual(
-                str(df), ' x\n0 1.2346e+04\n1 2.0000e+06')
-
-
-class TestRepr_timedelta64(tm.TestCase):
-
-    def test_none(self):
-        delta_1d = pd.to_timedelta(1, unit='D')
-        delta_0d = pd.to_timedelta(0, unit='D')
-        delta_1s = pd.to_timedelta(1, unit='s')
-        delta_500ms = pd.to_timedelta(500, unit='ms')
-
-        drepr = lambda x: x._repr_base()
-        self.assertEqual(drepr(delta_1d), "1 days")
-        self.assertEqual(drepr(-delta_1d), "-1 days")
-        self.assertEqual(drepr(delta_0d), "0 days")
-        self.assertEqual(drepr(delta_1s), "0 days 00:00:01")
-        self.assertEqual(drepr(delta_500ms), "0 days 00:00:00.500000")
-        self.assertEqual(drepr(delta_1d + delta_1s), "1 days 00:00:01")
-        self.assertEqual(
-            drepr(delta_1d + delta_500ms), "1 days 00:00:00.500000")
-
-    def test_even_day(self):
-        delta_1d = pd.to_timedelta(1, unit='D')
-        delta_0d = pd.to_timedelta(0, unit='D')
-        delta_1s = pd.to_timedelta(1, unit='s')
-        delta_500ms = pd.to_timedelta(500, unit='ms')
-
-        drepr = lambda x: x._repr_base(format='even_day')
-        self.assertEqual(drepr(delta_1d), "1 days")
-        self.assertEqual(drepr(-delta_1d), "-1 days")
-        self.assertEqual(drepr(delta_0d), "0 days")
-        self.assertEqual(drepr(delta_1s), "0 days 00:00:01")
-        self.assertEqual(drepr(delta_500ms), "0 days 00:00:00.500000")
-        self.assertEqual(drepr(delta_1d + delta_1s), "1 days 00:00:01")
-        self.assertEqual(
-            drepr(delta_1d + delta_500ms), "1 days 00:00:00.500000")
-
-    def test_sub_day(self):
-        delta_1d = pd.to_timedelta(1, unit='D')
-        delta_0d = pd.to_timedelta(0, unit='D')
-        delta_1s = pd.to_timedelta(1, unit='s')
-        delta_500ms = pd.to_timedelta(500, unit='ms')
-
-        drepr = lambda x: x._repr_base(format='sub_day')
-        self.assertEqual(drepr(delta_1d), "1 days")
-        self.assertEqual(drepr(-delta_1d), "-1 days")
-        self.assertEqual(drepr(delta_0d), "00:00:00")
-        self.assertEqual(drepr(delta_1s), "00:00:01")
-        self.assertEqual(drepr(delta_500ms), "00:00:00.500000")
-        self.assertEqual(drepr(delta_1d + delta_1s), "1 days 00:00:01")
-        self.assertEqual(
-            drepr(delta_1d + delta_500ms), "1 days 00:00:00.500000")
-
-    def test_long(self):
-        delta_1d = pd.to_timedelta(1, unit='D')
-        delta_0d = pd.to_timedelta(0, unit='D')
-        delta_1s = pd.to_timedelta(1, unit='s')
-        delta_500ms = pd.to_timedelta(500, unit='ms')
-
-        drepr = lambda x: x._repr_base(format='long')
-        self.assertEqual(drepr(delta_1d), "1 days 00:00:00")
-        self.assertEqual(drepr(-delta_1d), "-1 days +00:00:00")
-        self.assertEqual(drepr(delta_0d), "0 days 00:00:00")
-        self.assertEqual(drepr(delta_1s), "0 days 00:00:01")
-        self.assertEqual(drepr(delta_500ms), "0 days 00:00:00.500000")
-        self.assertEqual(drepr(delta_1d + delta_1s), "1 days 00:00:01")
-        self.assertEqual(
-            drepr(delta_1d + delta_500ms), "1 days 00:00:00.500000")
-
-    def test_all(self):
-        delta_1d = pd.to_timedelta(1, unit='D')
-        delta_0d = pd.to_timedelta(0, unit='D')
-        delta_1ns = pd.to_timedelta(1, unit='ns')
-
-        drepr = lambda x: x._repr_base(format='all')
-        self.assertEqual(drepr(delta_1d), "1 days 00:00:00.000000000")
-        self.assertEqual(drepr(delta_0d), "0 days 00:00:00.000000000")
-        self.assertEqual(drepr(delta_1ns), "0 days 00:00:00.000000001")
-
-
-class TestTimedelta64Formatter(tm.TestCase):
-
-    def test_days(self):
-        x = pd.to_timedelta(list(range(5)) + [pd.NaT], unit='D')
-        result = fmt.Timedelta64Formatter(x, box=True).get_result()
-        self.assertEqual(result[0].strip(), "'0 days'")
-        self.assertEqual(result[1].strip(), "'1 days'")
-
-        result = fmt.Timedelta64Formatter(x[1:2], box=True).get_result()
-        self.assertEqual(result[0].strip(), "'1 days'")
-
-        result = fmt.Timedelta64Formatter(x, box=False).get_result()
-        self.assertEqual(result[0].strip(), "0 days")
-        self.assertEqual(result[1].strip(), "1 days")
-
-        result = fmt.Timedelta64Formatter(x[1:2], box=False).get_result()
-        self.assertEqual(result[0].strip(), "1 days")
-
-    def test_days_neg(self):
-        x = pd.to_timedelta(list(range(5)) + [pd.NaT], unit='D')
-        result = fmt.Timedelta64Formatter(-x, box=True).get_result()
-        self.assertEqual(result[0].strip(), "'0 days'")
-        self.assertEqual(result[1].strip(), "'-1 days'")
-
-    def test_subdays(self):
-        y = pd.to_timedelta(list(range(5)) + [pd.NaT], unit='s')
-        result = fmt.Timedelta64Formatter(y, box=True).get_result()
-        self.assertEqual(result[0].strip(), "'00:00:00'")
-        self.assertEqual(result[1].strip(), "'00:00:01'")
-
-    def test_subdays_neg(self):
-        y = pd.to_timedelta(list(range(5)) + [pd.NaT], unit='s')
-        result = fmt.Timedelta64Formatter(-y, box=True).get_result()
-        self.assertEqual(result[0].strip(), "'00:00:00'")
-        self.assertEqual(result[1].strip(), "'-1 days +23:59:59'")
-
-    def test_zero(self):
-        x = pd.to_timedelta(list(range(1)) + [pd.NaT], unit='D')
-        result = fmt.Timedelta64Formatter(x, box=True).get_result()
-        self.assertEqual(result[0].strip(), "'0 days'")
-
-        x = pd.to_timedelta(list(range(1)), unit='D')
-        result = fmt.Timedelta64Formatter(x, box=True).get_result()
-        self.assertEqual(result[0].strip(), "'0 days'")
-
-
-class TestDatetime64Formatter(tm.TestCase):
-
-    def test_mixed(self):
-        x = Series([datetime(2013, 1, 1), datetime(2013, 1, 1, 12), pd.NaT])
-        result = fmt.Datetime64Formatter(x).get_result()
-        self.assertEqual(result[0].strip(), "2013-01-01 00:00:00")
-        self.assertEqual(result[1].strip(), "2013-01-01 12:00:00")
-
-    def test_dates(self):
-        x = Series([datetime(2013, 1, 1), datetime(2013, 1, 2), pd.NaT])
-        result = fmt.Datetime64Formatter(x).get_result()
-        self.assertEqual(result[0].strip(), "2013-01-01")
-        self.assertEqual(result[1].strip(), "2013-01-02")
-
-    def test_date_nanos(self):
-        x = Series([Timestamp(200)])
-        result = fmt.Datetime64Formatter(x).get_result()
-        self.assertEqual(result[0].strip(), "1970-01-01 00:00:00.000000200")
-
-    def test_dates_display(self):
-
-        # 10170
-        # make sure that we display date formatting consistently
-        x = Series(date_range('20130101 09:00:00', periods=5, freq='D'))
-        x.iloc[1] = np.nan
-        result = fmt.Datetime64Formatter(x).get_result()
-        self.assertEqual(result[0].strip(), "2013-01-01 09:00:00")
-        self.assertEqual(result[1].strip(), "NaT")
-        self.assertEqual(result[4].strip(), "2013-01-05 09:00:00")
-
-        x = Series(date_range('20130101 09:00:00', periods=5, freq='s'))
-        x.iloc[1] = np.nan
-        result = fmt.Datetime64Formatter(x).get_result()
-        self.assertEqual(result[0].strip(), "2013-01-01 09:00:00")
-        self.assertEqual(result[1].strip(), "NaT")
-        self.assertEqual(result[4].strip(), "2013-01-01 09:00:04")
-
-        x = Series(date_range('20130101 09:00:00', periods=5, freq='ms'))
-        x.iloc[1] = np.nan
-        result = fmt.Datetime64Formatter(x).get_result()
-        self.assertEqual(result[0].strip(), "2013-01-01 09:00:00.000")
-        self.assertEqual(result[1].strip(), "NaT")
-        self.assertEqual(result[4].strip(), "2013-01-01 09:00:00.004")
-
-        x = 
Series(date_range('20130101 09:00:00', periods=5, freq='us')) - x.iloc[1] = np.nan - result = fmt.Datetime64Formatter(x).get_result() - self.assertEqual(result[0].strip(), "2013-01-01 09:00:00.000000") - self.assertEqual(result[1].strip(), "NaT") - self.assertEqual(result[4].strip(), "2013-01-01 09:00:00.000004") - - x = Series(date_range('20130101 09:00:00', periods=5, freq='N')) - x.iloc[1] = np.nan - result = fmt.Datetime64Formatter(x).get_result() - self.assertEqual(result[0].strip(), "2013-01-01 09:00:00.000000000") - self.assertEqual(result[1].strip(), "NaT") - self.assertEqual(result[4].strip(), "2013-01-01 09:00:00.000000004") - - def test_datetime64formatter_yearmonth(self): - x = Series([datetime(2016, 1, 1), datetime(2016, 2, 2)]) - - def format_func(x): - return x.strftime('%Y-%m') - - formatter = fmt.Datetime64Formatter(x, formatter=format_func) - result = formatter.get_result() - self.assertEqual(result, ['2016-01', '2016-02']) - - def test_datetime64formatter_hoursecond(self): - - x = Series(pd.to_datetime(['10:10:10.100', '12:12:12.120'], - format='%H:%M:%S.%f')) - - def format_func(x): - return x.strftime('%H:%M') - - formatter = fmt.Datetime64Formatter(x, formatter=format_func) - result = formatter.get_result() - self.assertEqual(result, ['10:10', '12:12']) - - -class TestNaTFormatting(tm.TestCase): - - def test_repr(self): - self.assertEqual(repr(pd.NaT), "NaT") - - def test_str(self): - self.assertEqual(str(pd.NaT), "NaT") - - -class TestDatetimeIndexFormat(tm.TestCase): - - def test_datetime(self): - formatted = pd.to_datetime([datetime(2003, 1, 1, 12), pd.NaT]).format() - self.assertEqual(formatted[0], "2003-01-01 12:00:00") - self.assertEqual(formatted[1], "NaT") - - def test_date(self): - formatted = pd.to_datetime([datetime(2003, 1, 1), pd.NaT]).format() - self.assertEqual(formatted[0], "2003-01-01") - self.assertEqual(formatted[1], "NaT") - - def test_date_tz(self): - formatted = pd.to_datetime([datetime(2013, 1, 1)], utc=True).format() - self.assertEqual(formatted[0], "2013-01-01 00:00:00+00:00") - - formatted = pd.to_datetime( - [datetime(2013, 1, 1), pd.NaT], utc=True).format() - self.assertEqual(formatted[0], "2013-01-01 00:00:00+00:00") - - def test_date_explict_date_format(self): - formatted = pd.to_datetime([datetime(2003, 2, 1), pd.NaT]).format( - date_format="%m-%d-%Y", na_rep="UT") - self.assertEqual(formatted[0], "02-01-2003") - self.assertEqual(formatted[1], "UT") - - -class TestDatetimeIndexUnicode(tm.TestCase): - - def test_dates(self): - text = str(pd.to_datetime([datetime(2013, 1, 1), datetime(2014, 1, 1) - ])) - self.assertTrue("['2013-01-01'," in text) - self.assertTrue(", '2014-01-01']" in text) - - def test_mixed(self): - text = str(pd.to_datetime([datetime(2013, 1, 1), datetime( - 2014, 1, 1, 12), datetime(2014, 1, 1)])) - self.assertTrue("'2013-01-01 00:00:00'," in text) - self.assertTrue("'2014-01-01 00:00:00']" in text) - - -class TestStringRepTimestamp(tm.TestCase): - - def test_no_tz(self): - dt_date = datetime(2013, 1, 2) - self.assertEqual(str(dt_date), str(Timestamp(dt_date))) - - dt_datetime = datetime(2013, 1, 2, 12, 1, 3) - self.assertEqual(str(dt_datetime), str(Timestamp(dt_datetime))) - - dt_datetime_us = datetime(2013, 1, 2, 12, 1, 3, 45) - self.assertEqual(str(dt_datetime_us), str(Timestamp(dt_datetime_us))) - - ts_nanos_only = Timestamp(200) - self.assertEqual(str(ts_nanos_only), "1970-01-01 00:00:00.000000200") - - ts_nanos_micros = Timestamp(1200) - self.assertEqual(str(ts_nanos_micros), "1970-01-01 00:00:00.000001200") - - 
def test_tz_pytz(self): - tm._skip_if_no_pytz() - - import pytz - - dt_date = datetime(2013, 1, 2, tzinfo=pytz.utc) - self.assertEqual(str(dt_date), str(Timestamp(dt_date))) - - dt_datetime = datetime(2013, 1, 2, 12, 1, 3, tzinfo=pytz.utc) - self.assertEqual(str(dt_datetime), str(Timestamp(dt_datetime))) - - dt_datetime_us = datetime(2013, 1, 2, 12, 1, 3, 45, tzinfo=pytz.utc) - self.assertEqual(str(dt_datetime_us), str(Timestamp(dt_datetime_us))) - - def test_tz_dateutil(self): - tm._skip_if_no_dateutil() - import dateutil - utc = dateutil.tz.tzutc() - - dt_date = datetime(2013, 1, 2, tzinfo=utc) - self.assertEqual(str(dt_date), str(Timestamp(dt_date))) - - dt_datetime = datetime(2013, 1, 2, 12, 1, 3, tzinfo=utc) - self.assertEqual(str(dt_datetime), str(Timestamp(dt_datetime))) - - dt_datetime_us = datetime(2013, 1, 2, 12, 1, 3, 45, tzinfo=utc) - self.assertEqual(str(dt_datetime_us), str(Timestamp(dt_datetime_us))) - - def test_nat_representations(self): - for f in (str, repr, methodcaller('isoformat')): - self.assertEqual(f(pd.NaT), 'NaT') - - -def test_format_percentiles(): - result = fmt.format_percentiles([0.01999, 0.02001, 0.5, 0.666666, 0.9999]) - expected = ['1.999%', '2.001%', '50%', '66.667%', '99.99%'] - tm.assert_equal(result, expected) - - result = fmt.format_percentiles([0, 0.5, 0.02001, 0.5, 0.666666, 0.9999]) - expected = ['0%', '50%', '2.0%', '50%', '66.67%', '99.99%'] - tm.assert_equal(result, expected) - - tm.assertRaises(ValueError, fmt.format_percentiles, [0.1, np.nan, 0.5]) - tm.assertRaises(ValueError, fmt.format_percentiles, [-0.001, 0.1, 0.5]) - tm.assertRaises(ValueError, fmt.format_percentiles, [2, 0.1, 0.5]) - tm.assertRaises(ValueError, fmt.format_percentiles, [0.1, 0.5, 'a']) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/formats/test_printing.py b/pandas/tests/formats/test_printing.py deleted file mode 100644 index 3bcceca1f50a7..0000000000000 --- a/pandas/tests/formats/test_printing.py +++ /dev/null @@ -1,142 +0,0 @@ -# -*- coding: utf-8 -*- -import nose -from pandas import compat -import pandas.formats.printing as printing -import pandas.formats.format as fmt -import pandas.util.testing as tm -import pandas.core.config as cf - -_multiprocess_can_split_ = True - - -def test_adjoin(): - data = [['a', 'b', 'c'], ['dd', 'ee', 'ff'], ['ggg', 'hhh', 'iii']] - expected = 'a dd ggg\nb ee hhh\nc ff iii' - - adjoined = printing.adjoin(2, *data) - - assert (adjoined == expected) - - -def test_repr_binary_type(): - import string - letters = string.ascii_letters - btype = compat.binary_type - try: - raw = btype(letters, encoding=cf.get_option('display.encoding')) - except TypeError: - raw = btype(letters) - b = compat.text_type(compat.bytes_to_str(raw)) - res = printing.pprint_thing(b, quote_strings=True) - tm.assert_equal(res, repr(b)) - res = printing.pprint_thing(b, quote_strings=False) - tm.assert_equal(res, b) - - -class TestFormattBase(tm.TestCase): - - def test_adjoin(self): - data = [['a', 'b', 'c'], ['dd', 'ee', 'ff'], ['ggg', 'hhh', 'iii']] - expected = 'a dd ggg\nb ee hhh\nc ff iii' - - adjoined = printing.adjoin(2, *data) - - self.assertEqual(adjoined, expected) - - def test_adjoin_unicode(self): - data = [[u'あ', 'b', 'c'], ['dd', u'ええ', 'ff'], ['ggg', 'hhh', u'いいい']] - expected = u'あ dd ggg\nb ええ hhh\nc ff いいい' - adjoined = printing.adjoin(2, *data) - self.assertEqual(adjoined, expected) - - adj = fmt.EastAsianTextAdjustment() - - expected = u"""あ dd ggg -b ええ 
hhh -c ff いいい""" - - adjoined = adj.adjoin(2, *data) - self.assertEqual(adjoined, expected) - cols = adjoined.split('\n') - self.assertEqual(adj.len(cols[0]), 13) - self.assertEqual(adj.len(cols[1]), 13) - self.assertEqual(adj.len(cols[2]), 16) - - expected = u"""あ dd ggg -b ええ hhh -c ff いいい""" - - adjoined = adj.adjoin(7, *data) - self.assertEqual(adjoined, expected) - cols = adjoined.split('\n') - self.assertEqual(adj.len(cols[0]), 23) - self.assertEqual(adj.len(cols[1]), 23) - self.assertEqual(adj.len(cols[2]), 26) - - def test_justify(self): - adj = fmt.EastAsianTextAdjustment() - - def just(x, *args, **kwargs): - # wrapper to test single str - return adj.justify([x], *args, **kwargs)[0] - - self.assertEqual(just('abc', 5, mode='left'), 'abc ') - self.assertEqual(just('abc', 5, mode='center'), ' abc ') - self.assertEqual(just('abc', 5, mode='right'), ' abc') - self.assertEqual(just(u'abc', 5, mode='left'), 'abc ') - self.assertEqual(just(u'abc', 5, mode='center'), ' abc ') - self.assertEqual(just(u'abc', 5, mode='right'), ' abc') - - self.assertEqual(just(u'パンダ', 5, mode='left'), u'パンダ') - self.assertEqual(just(u'パンダ', 5, mode='center'), u'パンダ') - self.assertEqual(just(u'パンダ', 5, mode='right'), u'パンダ') - - self.assertEqual(just(u'パンダ', 10, mode='left'), u'パンダ ') - self.assertEqual(just(u'パンダ', 10, mode='center'), u' パンダ ') - self.assertEqual(just(u'パンダ', 10, mode='right'), u' パンダ') - - def test_east_asian_len(self): - adj = fmt.EastAsianTextAdjustment() - - self.assertEqual(adj.len('abc'), 3) - self.assertEqual(adj.len(u'abc'), 3) - - self.assertEqual(adj.len(u'パンダ'), 6) - self.assertEqual(adj.len(u'パンダ'), 5) - self.assertEqual(adj.len(u'パンダpanda'), 11) - self.assertEqual(adj.len(u'パンダpanda'), 10) - - def test_ambiguous_width(self): - adj = fmt.EastAsianTextAdjustment() - self.assertEqual(adj.len(u'¡¡ab'), 4) - - with cf.option_context('display.unicode.ambiguous_as_wide', True): - adj = fmt.EastAsianTextAdjustment() - self.assertEqual(adj.len(u'¡¡ab'), 6) - - data = [[u'あ', 'b', 'c'], ['dd', u'ええ', 'ff'], - ['ggg', u'¡¡ab', u'いいい']] - expected = u'あ dd ggg \nb ええ ¡¡ab\nc ff いいい' - adjoined = adj.adjoin(2, *data) - self.assertEqual(adjoined, expected) - - -# TODO: fix this broken test - -# def test_console_encode(): -# """ -# On Python 2, if sys.stdin.encoding is None (IPython with zmq frontend) -# common.console_encode should encode things as utf-8. 
-# """ -# if compat.PY3: -# raise nose.SkipTest - -# with tm.stdin_encoding(encoding=None): -# result = printing.console_encode(u"\u05d0") -# expected = u"\u05d0".encode('utf-8') -# assert (result == expected) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/frame/common.py b/pandas/tests/frame/common.py index 37f67712e1b58..b475d25eb5dac 100644 --- a/pandas/tests/frame/common.py +++ b/pandas/tests/frame/common.py @@ -1,7 +1,7 @@ import numpy as np from pandas import compat -from pandas.util.decorators import cache_readonly +from pandas.util._decorators import cache_readonly import pandas.util.testing as tm import pandas as pd @@ -89,11 +89,11 @@ def empty(self): @cache_readonly def ts1(self): - return tm.makeTimeSeries() + return tm.makeTimeSeries(nper=30) @cache_readonly def ts2(self): - return tm.makeTimeSeries()[5:] + return tm.makeTimeSeries(nper=30)[5:] @cache_readonly def simple(self): diff --git a/pandas/tests/frame/test_alter_axes.py b/pandas/tests/frame/test_alter_axes.py index edeca0a664a87..8bcc19e6d8ba4 100644 --- a/pandas/tests/frame/test_alter_axes.py +++ b/pandas/tests/frame/test_alter_axes.py @@ -2,27 +2,30 @@ from __future__ import print_function +import pytest + from datetime import datetime, timedelta import numpy as np from pandas.compat import lrange from pandas import (DataFrame, Series, Index, MultiIndex, - RangeIndex) + RangeIndex, date_range, IntervalIndex, + to_datetime) +from pandas.core.dtypes.common import ( + is_object_dtype, + is_categorical_dtype, + is_interval_dtype) import pandas as pd -from pandas.util.testing import (assert_series_equal, - assert_frame_equal, - assertRaisesRegexp) +from pandas.util.testing import assert_series_equal, assert_frame_equal import pandas.util.testing as tm from pandas.tests.frame.common import TestData -class TestDataFrameAlterAxes(tm.TestCase, TestData): - - _multiprocess_can_split_ = True +class TestDataFrameAlterAxes(TestData): def test_set_index(self): idx = Index(np.arange(len(self.mixed_frame))) @@ -30,8 +33,8 @@ def test_set_index(self): # cache it _ = self.mixed_frame['foo'] # noqa self.mixed_frame.index = idx - self.assertIs(self.mixed_frame['foo'].index, idx) - with assertRaisesRegexp(ValueError, 'Length mismatch'): + assert self.mixed_frame['foo'].index is idx + with tm.assert_raises_regex(ValueError, 'Length mismatch'): self.mixed_frame.index = idx[::2] def test_set_index_cast(self): @@ -66,7 +69,7 @@ def test_set_index2(self): assert_frame_equal(result, expected) assert_frame_equal(result_nodrop, expected_nodrop) - self.assertEqual(result.index.name, index.name) + assert result.index.name == index.name # inplace, single df2 = df.copy() @@ -94,7 +97,7 @@ def test_set_index2(self): assert_frame_equal(result, expected) assert_frame_equal(result_nodrop, expected_nodrop) - self.assertEqual(result.index.names, index.names) + assert result.index.names == index.names # inplace df2 = df.copy() @@ -106,7 +109,8 @@ def test_set_index2(self): assert_frame_equal(df3, expected_nodrop) # corner case - with assertRaisesRegexp(ValueError, 'Index has duplicate keys'): + with tm.assert_raises_regex(ValueError, + 'Index has duplicate keys'): df.set_index('A', verify_integrity=True) # append @@ -123,7 +127,7 @@ def test_set_index2(self): # Series result = df.set_index(df.C) - self.assertEqual(result.index.name, 'C') + assert result.index.name == 'C' def test_set_index_nonuniq(self): df = DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 
'bar'], @@ -131,9 +135,10 @@ def test_set_index_nonuniq(self): 'C': ['a', 'b', 'c', 'd', 'e'], 'D': np.random.randn(5), 'E': np.random.randn(5)}) - with assertRaisesRegexp(ValueError, 'Index has duplicate keys'): + with tm.assert_raises_regex(ValueError, + 'Index has duplicate keys'): df.set_index('A', verify_integrity=True, inplace=True) - self.assertIn('A', df) + assert 'A' in df def test_set_index_bug(self): # GH1590 @@ -169,7 +174,7 @@ def test_construction_with_categorical_index(self): idf = df.set_index('B') str(idf) tm.assert_index_equal(idf.index, ci, check_names=False) - self.assertEqual(idf.index.name, 'B') + assert idf.index.name == 'B' # from a CategoricalIndex df = DataFrame({'A': np.random.randn(10), @@ -177,17 +182,17 @@ def test_construction_with_categorical_index(self): idf = df.set_index('B') str(idf) tm.assert_index_equal(idf.index, ci, check_names=False) - self.assertEqual(idf.index.name, 'B') + assert idf.index.name == 'B' idf = df.set_index('B').reset_index().set_index('B') str(idf) tm.assert_index_equal(idf.index, ci, check_names=False) - self.assertEqual(idf.index.name, 'B') + assert idf.index.name == 'B' new_df = idf.reset_index() new_df.index = df.B tm.assert_index_equal(new_df.index, ci, check_names=False) - self.assertEqual(idf.index.name, 'B') + assert idf.index.name == 'B' def test_set_index_cast_datetimeindex(self): df = DataFrame({'A': [datetime(2000, 1, 1) + timedelta(i) @@ -195,13 +200,13 @@ def test_set_index_cast_datetimeindex(self): 'B': np.random.randn(1000)}) idf = df.set_index('A') - tm.assertIsInstance(idf.index, pd.DatetimeIndex) + assert isinstance(idf.index, pd.DatetimeIndex) # don't cast a DatetimeIndex WITH a tz, leave as object # GH 6032 i = (pd.DatetimeIndex( - pd.tseries.tools.to_datetime(['2013-1-1 13:00', - '2013-1-2 14:00'], errors="raise")) + to_datetime(['2013-1-1 13:00', + '2013-1-2 14:00'], errors="raise")) .tz_localize('US/Pacific')) df = DataFrame(np.random.randn(2, 1), columns=['A']) @@ -219,7 +224,7 @@ def test_set_index_cast_datetimeindex(self): df['B'] = i result = df['B'] assert_series_equal(result, expected, check_names=False) - self.assertEqual(result.name, 'B') + assert result.name == 'B' # keep the timezone result = i.to_series(keep_tz=True) @@ -230,13 +235,13 @@ def test_set_index_cast_datetimeindex(self): result = df['C'] comp = pd.DatetimeIndex(expected.values).copy() comp.tz = None - self.assert_numpy_array_equal(result.values, comp.values) + tm.assert_numpy_array_equal(result.values, comp.values) # list of datetimes with a tz df['D'] = i.to_pydatetime() result = df['D'] assert_series_equal(result, expected, check_names=False) - self.assertEqual(result.name, 'D') + assert result.name == 'D' # GH 6785 # set the index manually @@ -274,9 +279,9 @@ def test_set_index_timezone(self): i = pd.to_datetime(["2014-01-01 10:10:10"], utc=True).tz_convert('Europe/Rome') df = DataFrame({'i': i}) - self.assertEqual(df.set_index(i).index[0].hour, 11) - self.assertEqual(pd.DatetimeIndex(pd.Series(df.i))[0].hour, 11) - self.assertEqual(df.set_index(df.i).index[0].hour, 11) + assert df.set_index(i).index[0].hour == 11 + assert pd.DatetimeIndex(pd.Series(df.i))[0].hour == 11 + assert df.set_index(df.i).index[0].hour == 11 def test_set_index_dst(self): di = pd.date_range('2006-10-29 00:00:00', periods=3, @@ -297,6 +302,17 @@ def test_set_index_dst(self): exp = pd.DataFrame({'b': [3, 4, 5]}, index=exp_index) tm.assert_frame_equal(res, exp) + def test_reset_index_with_intervals(self): + idx = pd.IntervalIndex.from_breaks(np.arange(11), 
name='x')
+        original = pd.DataFrame({'x': idx, 'y': np.arange(10)})[['x', 'y']]
+
+        result = original.set_index('x')
+        expected = pd.DataFrame({'y': np.arange(10)}, index=idx)
+        assert_frame_equal(result, expected)
+
+        result2 = result.reset_index()
+        assert_frame_equal(result2, original)
+
     def test_set_index_multiindexcolumns(self):
         columns = MultiIndex.from_tuples([('foo', 1), ('foo', 2), ('bar', 1)])
         df = DataFrame(np.random.randn(3, 3), columns=columns)
@@ -322,9 +338,35 @@ def test_set_index_empty_column(self):
     def test_set_columns(self):
         cols = Index(np.arange(len(self.mixed_frame.columns)))
         self.mixed_frame.columns = cols
-        with assertRaisesRegexp(ValueError, 'Length mismatch'):
+        with tm.assert_raises_regex(ValueError, 'Length mismatch'):
            self.mixed_frame.columns = cols[::2]
 
+    def test_dti_set_index_reindex(self):
+        # GH 6631
+        df = DataFrame(np.random.random(6))
+        idx1 = date_range('2011/01/01', periods=6, freq='M', tz='US/Eastern')
+        idx2 = date_range('2013', periods=6, freq='A', tz='Asia/Tokyo')
+
+        df = df.set_index(idx1)
+        tm.assert_index_equal(df.index, idx1)
+        df = df.reindex(idx2)
+        tm.assert_index_equal(df.index, idx2)
+
+        # 11314
+        # with tz
+        index = date_range(datetime(2015, 10, 1),
+                           datetime(2015, 10, 1, 23),
+                           freq='H', tz='US/Eastern')
+        df = DataFrame(np.random.randn(24, 1), columns=['a'], index=index)
+        new_index = date_range(datetime(2015, 10, 2),
+                               datetime(2015, 10, 2, 23),
+                               freq='H', tz='US/Eastern')
+
+        # TODO: unused?
+        result = df.set_index(new_index)  # noqa
+
+        assert new_index.freq == index.freq
+
     # Renaming
 
     def test_rename(self):
@@ -356,7 +398,7 @@ def test_rename(self):
         tm.assert_index_equal(renamed.index, pd.Index(['BAR', 'FOO']))
 
         # have to pass something
-        self.assertRaises(TypeError, self.frame.rename)
+        pytest.raises(TypeError, self.frame.rename)
 
         # partial columns
         renamed = self.frame.rename(columns={'C': 'foo', 'D': 'bar'})
@@ -374,45 +416,117 @@ def test_rename(self):
         renamed = renamer.rename(index={'foo': 'bar', 'bar': 'foo'})
         tm.assert_index_equal(renamed.index, pd.Index(['bar', 'foo'],
                                                       name='name'))
-        self.assertEqual(renamed.index.name, renamer.index.name)
+        assert renamed.index.name == renamer.index.name
+
+    def test_rename_axis_inplace(self):
+        # GH 15704
+        frame = self.frame.copy()
+        expected = frame.rename_axis('foo')
+        result = frame.copy()
+        no_return = result.rename_axis('foo', inplace=True)
+
+        assert no_return is None
+        assert_frame_equal(result, expected)
+
+        expected = frame.rename_axis('bar', axis=1)
+        result = frame.copy()
+        no_return = result.rename_axis('bar', axis=1, inplace=True)
+
+        assert no_return is None
+        assert_frame_equal(result, expected)
+
+    def test_rename_multiindex(self):
 
-        # MultiIndex
         tuples_index = [('foo1', 'bar1'), ('foo2', 'bar2')]
         tuples_columns = [('fizz1', 'buzz1'), ('fizz2', 'buzz2')]
         index = MultiIndex.from_tuples(tuples_index, names=['foo', 'bar'])
         columns = MultiIndex.from_tuples(
             tuples_columns, names=['fizz', 'buzz'])
-        renamer = DataFrame([(0, 0), (1, 1)], index=index, columns=columns)
-        renamed = renamer.rename(index={'foo1': 'foo3', 'bar2': 'bar3'},
-                                 columns={'fizz1': 'fizz3', 'buzz2': 'buzz3'})
+        df = DataFrame([(0, 0), (1, 1)], index=index, columns=columns)
+
+        #
+        # without specifying level -> across all levels
+
+        renamed = df.rename(index={'foo1': 'foo3', 'bar2': 'bar3'},
+                            columns={'fizz1': 'fizz3', 'buzz2': 'buzz3'})
         new_index = MultiIndex.from_tuples([('foo3', 'bar1'),
                                             ('foo2', 'bar3')],
                                            names=['foo', 'bar'])
         new_columns = MultiIndex.from_tuples([('fizz3', 'buzz1'),
                                               ('fizz2', 'buzz3')],
                                              names=['fizz', 'buzz'])
-        self.assert_index_equal(renamed.index, new_index)
-        self.assert_index_equal(renamed.columns, new_columns)
-        self.assertEqual(renamed.index.names, renamer.index.names)
-        self.assertEqual(renamed.columns.names, renamer.columns.names)
+        tm.assert_index_equal(renamed.index, new_index)
+        tm.assert_index_equal(renamed.columns, new_columns)
+        assert renamed.index.names == df.index.names
+        assert renamed.columns.names == df.columns.names
+
+        #
+        # with specifying a level (GH13766)
+
+        # dict
+        new_columns = MultiIndex.from_tuples([('fizz3', 'buzz1'),
+                                              ('fizz2', 'buzz2')],
+                                             names=['fizz', 'buzz'])
+        renamed = df.rename(columns={'fizz1': 'fizz3', 'buzz2': 'buzz3'},
+                            level=0)
+        tm.assert_index_equal(renamed.columns, new_columns)
+        renamed = df.rename(columns={'fizz1': 'fizz3', 'buzz2': 'buzz3'},
+                            level='fizz')
+        tm.assert_index_equal(renamed.columns, new_columns)
+
+        new_columns = MultiIndex.from_tuples([('fizz1', 'buzz1'),
+                                              ('fizz2', 'buzz3')],
+                                             names=['fizz', 'buzz'])
+        renamed = df.rename(columns={'fizz1': 'fizz3', 'buzz2': 'buzz3'},
+                            level=1)
+        tm.assert_index_equal(renamed.columns, new_columns)
+        renamed = df.rename(columns={'fizz1': 'fizz3', 'buzz2': 'buzz3'},
+                            level='buzz')
+        tm.assert_index_equal(renamed.columns, new_columns)
+
+        # function
+        func = str.upper
+        new_columns = MultiIndex.from_tuples([('FIZZ1', 'buzz1'),
+                                              ('FIZZ2', 'buzz2')],
+                                             names=['fizz', 'buzz'])
+        renamed = df.rename(columns=func, level=0)
+        tm.assert_index_equal(renamed.columns, new_columns)
+        renamed = df.rename(columns=func, level='fizz')
+        tm.assert_index_equal(renamed.columns, new_columns)
+
+        new_columns = MultiIndex.from_tuples([('fizz1', 'BUZZ1'),
+                                              ('fizz2', 'BUZZ2')],
+                                             names=['fizz', 'buzz'])
+        renamed = df.rename(columns=func, level=1)
+        tm.assert_index_equal(renamed.columns, new_columns)
+        renamed = df.rename(columns=func, level='buzz')
+        tm.assert_index_equal(renamed.columns, new_columns)
+
+        # index
+        new_index = MultiIndex.from_tuples([('foo3', 'bar1'),
+                                            ('foo2', 'bar2')],
+                                           names=['foo', 'bar'])
+        renamed = df.rename(index={'foo1': 'foo3', 'bar2': 'bar3'},
+                            level=0)
+        tm.assert_index_equal(renamed.index, new_index)
 
     def test_rename_nocopy(self):
         renamed = self.frame.rename(columns={'C': 'foo'}, copy=False)
         renamed['foo'] = 1.
- self.assertTrue((self.frame['C'] == 1.).all()) + assert (self.frame['C'] == 1.).all() def test_rename_inplace(self): self.frame.rename(columns={'C': 'foo'}) - self.assertIn('C', self.frame) - self.assertNotIn('foo', self.frame) + assert 'C' in self.frame + assert 'foo' not in self.frame c_id = id(self.frame['C']) frame = self.frame.copy() frame.rename(columns={'C': 'foo'}, inplace=True) - self.assertNotIn('C', frame) - self.assertIn('foo', frame) - self.assertNotEqual(id(frame['foo']), c_id) + assert 'C' not in frame + assert 'foo' in frame + assert id(frame['foo']) != c_id def test_rename_bug(self): # GH 5344 @@ -492,27 +606,27 @@ def test_reset_index(self): # default name assigned rdf = self.frame.reset_index() exp = pd.Series(self.frame.index.values, name='index') - self.assert_series_equal(rdf['index'], exp) + tm.assert_series_equal(rdf['index'], exp) # default name assigned, corner case df = self.frame.copy() df['index'] = 'foo' rdf = df.reset_index() exp = pd.Series(self.frame.index.values, name='level_0') - self.assert_series_equal(rdf['level_0'], exp) + tm.assert_series_equal(rdf['level_0'], exp) # but this is ok self.frame.index.name = 'index' deleveled = self.frame.reset_index() - self.assert_series_equal(deleveled['index'], - pd.Series(self.frame.index)) - self.assert_index_equal(deleveled.index, - pd.Index(np.arange(len(deleveled)))) + tm.assert_series_equal(deleveled['index'], + pd.Series(self.frame.index)) + tm.assert_index_equal(deleveled.index, + pd.Index(np.arange(len(deleveled)))) # preserve column names self.frame.columns.name = 'columns' resetted = self.frame.reset_index() - self.assertEqual(resetted.columns.name, 'columns') + assert resetted.columns.name == 'columns' # only remove certain columns frame = self.frame.reset_index().set_index(['index', 'A', 'B']) @@ -544,6 +658,43 @@ def test_reset_index(self): xp = xp.set_index(['B'], append=True) assert_frame_equal(rs, xp, check_names=False) + def test_reset_index_level(self): + df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], + columns=['A', 'B', 'C', 'D']) + + for levels in ['A', 'B'], [0, 1]: + # With MultiIndex + result = df.set_index(['A', 'B']).reset_index(level=levels[0]) + tm.assert_frame_equal(result, df.set_index('B')) + + result = df.set_index(['A', 'B']).reset_index(level=levels[:1]) + tm.assert_frame_equal(result, df.set_index('B')) + + result = df.set_index(['A', 'B']).reset_index(level=levels) + tm.assert_frame_equal(result, df) + + result = df.set_index(['A', 'B']).reset_index(level=levels, + drop=True) + tm.assert_frame_equal(result, df[['C', 'D']]) + + # With single-level Index (GH 16263) + result = df.set_index('A').reset_index(level=levels[0]) + tm.assert_frame_equal(result, df) + + result = df.set_index('A').reset_index(level=levels[:1]) + tm.assert_frame_equal(result, df) + + result = df.set_index(['A']).reset_index(level=levels[0], + drop=True) + tm.assert_frame_equal(result, df[['B', 'C', 'D']]) + + # Missing levels - for both MultiIndex and single-level Index: + for idx_lev in ['A', 'B'], ['A']: + with tm.assert_raises_regex(KeyError, 'Level E '): + df.set_index(idx_lev).reset_index(level=['A', 'E']) + with tm.assert_raises_regex(IndexError, 'Too many levels'): + df.set_index(idx_lev).reset_index(level=[0, 1, 2]) + def test_reset_index_right_dtype(self): time = np.arange(0.0, 10, np.sqrt(2) / 2) s1 = Series((9.81 * time ** 2) / 2, @@ -552,10 +703,10 @@ def test_reset_index_right_dtype(self): df = DataFrame(s1) resetted = s1.reset_index() - self.assertEqual(resetted['time'].dtype, np.float64) + 
assert resetted['time'].dtype == np.float64
 
         resetted = df.reset_index()
-        self.assertEqual(resetted['time'].dtype, np.float64)
+        assert resetted['time'].dtype == np.float64
 
     def test_reset_index_multiindex_col(self):
         vals = np.random.randn(3, 3).astype(object)
@@ -600,6 +751,33 @@ def test_reset_index_multiindex_col(self):
                                          ['a', 'mean', 'median', 'mean']])
         assert_frame_equal(rs, xp)
 
+    def test_reset_index_multiindex_nan(self):
+        # GH6322, testing reset_index on MultiIndexes
+        # when we have a nan or all nan
+        df = pd.DataFrame({'A': ['a', 'b', 'c'],
+                           'B': [0, 1, np.nan],
+                           'C': np.random.rand(3)})
+        rs = df.set_index(['A', 'B']).reset_index()
+        assert_frame_equal(rs, df)
+
+        df = pd.DataFrame({'A': [np.nan, 'b', 'c'],
+                           'B': [0, 1, 2],
+                           'C': np.random.rand(3)})
+        rs = df.set_index(['A', 'B']).reset_index()
+        assert_frame_equal(rs, df)
+
+        df = pd.DataFrame({'A': ['a', 'b', 'c'],
+                           'B': [0, 1, 2],
+                           'C': [np.nan, 1.1, 2.2]})
+        rs = df.set_index(['A', 'B']).reset_index()
+        assert_frame_equal(rs, df)
+
+        df = pd.DataFrame({'A': ['a', 'b', 'c'],
+                           'B': [np.nan, np.nan, np.nan],
+                           'C': np.random.rand(3)})
+        rs = df.set_index(['A', 'B']).reset_index()
+        assert_frame_equal(rs, df)
+
     def test_reset_index_with_datetimeindex_cols(self):
         # GH5818
         #
@@ -618,7 +796,7 @@ def test_reset_index_range(self):
         df = pd.DataFrame([[0, 0], [1, 1]], columns=['A', 'B'],
                           index=RangeIndex(stop=2))
         result = df.reset_index()
-        tm.assertIsInstance(result.index, RangeIndex)
+        assert isinstance(result.index, RangeIndex)
         expected = pd.DataFrame([[0, 0, 0], [1, 1, 1]],
                                 columns=['index', 'A', 'B'],
                                 index=RangeIndex(stop=2))
@@ -628,7 +806,7 @@ def test_set_index_names(self):
         df = pd.util.testing.makeDataFrame()
         df.index.name = 'name'
 
-        self.assertEqual(df.set_index(df.index).index.names, ['name'])
+        assert df.set_index(df.index).index.names == ['name']
 
         mi = MultiIndex.from_arrays(df[['A', 'B']].T.values, names=['A', 'B'])
         mi2 = MultiIndex.from_arrays(df[['A', 'B', 'A', 'B']].T.values,
@@ -636,26 +814,27 @@ def test_set_index_names(self):
 
         df = df.set_index(['A', 'B'])
 
-        self.assertEqual(df.set_index(df.index).index.names, ['A', 'B'])
+        assert df.set_index(df.index).index.names == ['A', 'B']
 
         # Check that set_index isn't converting a MultiIndex into an Index
-        self.assertTrue(isinstance(df.set_index(df.index).index, MultiIndex))
+        assert isinstance(df.set_index(df.index).index, MultiIndex)
 
         # Check actual equality
         tm.assert_index_equal(df.set_index(df.index).index, mi)
 
         # Check that [MultiIndex, MultiIndex] yields a MultiIndex rather
         # than a pair of tuples
-        self.assertTrue(isinstance(df.set_index(
-            [df.index, df.index]).index, MultiIndex))
+        assert isinstance(df.set_index(
+            [df.index, df.index]).index, MultiIndex)
 
         # Check equality
         tm.assert_index_equal(df.set_index([df.index, df.index]).index, mi2)
 
     def test_rename_objects(self):
         renamed = self.mixed_frame.rename(columns=str.upper)
-        self.assertIn('FOO', renamed)
-        self.assertNotIn('foo', renamed)
+
+        assert 'FOO' in renamed
+        assert 'foo' not in renamed
 
     def test_assign_columns(self):
         self.frame['hi'] = 'there'
@@ -679,3 +858,112 @@ def test_set_index_preserve_categorical_dtype(self):
             result = df.set_index(cols).reset_index()
             result = result.reindex(columns=df.columns)
             tm.assert_frame_equal(result, df)
+
+
+class TestIntervalIndex(object):
+
+    def test_setitem(self):
+
+        df = DataFrame({'A': range(10)})
+        s = pd.cut(df.A, 5)
+        assert isinstance(s.cat.categories, IntervalIndex)
+
+        # B & D end up as Categoricals
+        # the remainder are converted to in-line objects
+        # containing 
an IntervalIndex.values + df['B'] = s + df['C'] = np.array(s) + df['D'] = s.values + df['E'] = np.array(s.values) + + assert is_categorical_dtype(df['B']) + assert is_interval_dtype(df['B'].cat.categories) + assert is_categorical_dtype(df['D']) + assert is_interval_dtype(df['D'].cat.categories) + + assert is_object_dtype(df['C']) + assert is_object_dtype(df['E']) + + # they compare equal as Index + # when converted to numpy objects + c = lambda x: Index(np.array(x)) + tm.assert_index_equal(c(df.B), c(df.B), check_names=False) + tm.assert_index_equal(c(df.B), c(df.C), check_names=False) + tm.assert_index_equal(c(df.B), c(df.D), check_names=False) + tm.assert_index_equal(c(df.B), c(df.D), check_names=False) + + # B & D are the same Series + tm.assert_series_equal(df['B'], df['B'], check_names=False) + tm.assert_series_equal(df['B'], df['D'], check_names=False) + + # C & E are the same Series + tm.assert_series_equal(df['C'], df['C'], check_names=False) + tm.assert_series_equal(df['C'], df['E'], check_names=False) + + def test_set_reset_index(self): + + df = DataFrame({'A': range(10)}) + s = pd.cut(df.A, 5) + df['B'] = s + df = df.set_index('B') + + df = df.reset_index() + + def test_set_axis_inplace(self): + # GH14636 + df = DataFrame({'A': [1.1, 2.2, 3.3], + 'B': [5.0, 6.1, 7.2], + 'C': [4.4, 5.5, 6.6]}, + index=[2010, 2011, 2012]) + + expected = {0: df.copy(), + 1: df.copy()} + expected[0].index = list('abc') + expected[1].columns = list('abc') + expected['index'] = expected[0] + expected['columns'] = expected[1] + + for axis in expected: + # inplace=True + # The FutureWarning comes from the fact that we would like to have + # inplace default to False some day + for inplace, warn in (None, FutureWarning), (True, None): + kwargs = {'inplace': inplace} + + result = df.copy() + with tm.assert_produces_warning(warn): + result.set_axis(list('abc'), axis=axis, **kwargs) + tm.assert_frame_equal(result, expected[axis]) + + # inplace=False + result = df.set_axis(list('abc'), axis=axis, inplace=False) + tm.assert_frame_equal(expected[axis], result) + + # omitting the "axis" parameter + with tm.assert_produces_warning(None): + result = df.set_axis(list('abc'), inplace=False) + tm.assert_frame_equal(result, expected[0]) + + # wrong values for the "axis" parameter + for axis in 3, 'foo': + with tm.assert_raises_regex(ValueError, 'No axis named'): + df.set_axis(list('abc'), axis=axis, inplace=False) + + def test_set_axis_prior_to_deprecation_signature(self): + df = DataFrame({'A': [1.1, 2.2, 3.3], + 'B': [5.0, 6.1, 7.2], + 'C': [4.4, 5.5, 6.6]}, + index=[2010, 2011, 2012]) + + expected = {0: df.copy(), + 1: df.copy()} + expected[0].index = list('abc') + expected[1].columns = list('abc') + expected['index'] = expected[0] + expected['columns'] = expected[1] + + # old signature + for axis in expected: + with tm.assert_produces_warning(FutureWarning): + result = df.set_axis(axis, list('abc'), inplace=False) + tm.assert_frame_equal(result, expected[axis]) diff --git a/pandas/tests/frame/test_analytics.py b/pandas/tests/frame/test_analytics.py index 5d51306363053..93514a8a42215 100644 --- a/pandas/tests/frame/test_analytics.py +++ b/pandas/tests/frame/test_analytics.py @@ -2,30 +2,29 @@ from __future__ import print_function -from datetime import timedelta, datetime +from datetime import timedelta from distutils.version import LooseVersion import sys -import nose +import pytest +from string import ascii_lowercase from numpy import nan from numpy.random import randn import numpy as np -from pandas.compat import 
lrange
-from pandas import (compat, isnull, notnull, DataFrame, Series,
+from pandas.compat import lrange, product
+from pandas import (compat, isna, notna, DataFrame, Series,
                     MultiIndex, date_range, Timestamp)
 import pandas as pd
 
 import pandas.core.nanops as nanops
 import pandas.core.algorithms as algorithms
-import pandas.formats.printing as printing
+import pandas.io.formats.printing as printing
 import pandas.util.testing as tm
 from pandas.tests.frame.common import TestData
 
 
-class TestDataFrameAnalytics(tm.TestCase, TestData):
-
-    _multiprocess_can_split_ = True
+class TestDataFrameAnalytics(TestData):
 
     # ----------------------------------------------------------------------
     # Correlation and covariance
 
@@ -82,11 +81,11 @@ def test_corr_nooverlap(self):
                             'C': [np.nan, np.nan, np.nan, np.nan,
                                   np.nan, np.nan]})
             rs = df.corr(meth)
-            self.assertTrue(isnull(rs.loc['A', 'B']))
-            self.assertTrue(isnull(rs.loc['B', 'A']))
-            self.assertEqual(rs.loc['A', 'A'], 1)
-            self.assertEqual(rs.loc['B', 'B'], 1)
-            self.assertTrue(isnull(rs.loc['C', 'C']))
+            assert isna(rs.loc['A', 'B'])
+            assert isna(rs.loc['B', 'A'])
+            assert rs.loc['A', 'A'] == 1
+            assert rs.loc['B', 'B'] == 1
+            assert isna(rs.loc['C', 'C'])
 
     def test_corr_constant(self):
         tm._skip_if_no_scipy()
@@ -97,7 +96,7 @@ def test_corr_constant(self):
             df = DataFrame({'A': [1, 1, 1, np.nan, np.nan, np.nan],
                             'B': [np.nan, np.nan, np.nan, 1, 1, 1]})
             rs = df.corr(meth)
-            self.assertTrue(isnull(rs.values).all())
+            assert isna(rs.values).all()
 
     def test_corr_int(self):
         # dtypes other than float64 #1761
@@ -120,6 +119,15 @@ def test_corr_int_and_boolean(self):
         for meth in ['pearson', 'kendall', 'spearman']:
             tm.assert_frame_equal(df.corr(meth), expected)
 
+    def test_corr_cov_independent_index_column(self):
+        # GH 14617
+        df = pd.DataFrame(np.random.randn(4 * 10).reshape(10, 4),
+                          columns=list("abcd"))
+        for method in ['cov', 'corr']:
+            result = getattr(df, method)()
+            assert result.index is not result.columns
+            assert result.index.equals(result.columns)
+
     def test_cov(self):
         # min_periods no NAs (corner case)
         expected = self.frame.cov()
@@ -128,7 +136,7 @@ def test_cov(self):
         tm.assert_frame_equal(expected, result)
 
         result = self.frame.cov(min_periods=len(self.frame) + 1)
-        self.assertTrue(isnull(result.values).all())
+        assert isna(result.values).all()
 
         # with NAs
         frame = self.frame.copy()
@@ -182,10 +190,10 @@ def test_corrwith(self):
 
         dropped = a.corrwith(b, axis=0, drop=True)
         tm.assert_almost_equal(dropped['A'], a['A'].corr(b['A']))
-        self.assertNotIn('B', dropped)
+        assert 'B' not in dropped
 
         dropped = a.corrwith(b, axis=1, drop=True)
-        self.assertNotIn(a.index[-1], dropped.index)
+        assert a.index[-1] not in dropped.index
 
         # non time-series data
         index = ['a', 'b', 'c', 'd', 'e']
@@ -226,7 +234,7 @@ def test_corrwith_matches_corrcoef(self):
         c2 = np.corrcoef(df1['a'], df2['a'])[0][1]
 
         tm.assert_almost_equal(c1, c2)
-        self.assertTrue(c1 < 1)
+        assert c1 < 1
 
     def test_bool_describe_in_mixed_frame(self):
         df = DataFrame({
@@ -327,8 +335,8 @@ def test_describe_datetime_columns(self):
                                      '50%', '75%', 'max'])
         expected.columns = exp_columns
         tm.assert_frame_equal(result, expected)
-        self.assertEqual(result.columns.freq, 'MS')
-        self.assertEqual(result.columns.tz, expected.columns.tz)
+        assert result.columns.freq == 'MS'
+        assert result.columns.tz == expected.columns.tz
 
     def test_describe_timedelta_values(self):
         # GH 6145
@@ -365,7 +373,7 @@ def test_describe_timedelta_values(self):
                     "50% 3 days 00:00:00 0 days 03:00:00\n"
                     "75% 4 days 00:00:00 0 days 04:00:00\n"
                     "max 5 days 
00:00:00 0 days 05:00:00") - self.assertEqual(repr(res), exp_repr) + assert repr(res) == exp_repr def test_reduce_mixed_frame(self): # GH 6806 @@ -381,7 +389,7 @@ def test_reduce_mixed_frame(self): tm.assert_series_equal(test, df.T.sum(axis=1)) def test_count(self): - f = lambda s: notnull(s).sum() + f = lambda s: notna(s).sum() self._check_stat_op('count', f, has_skipna=False, has_numeric_only=True, @@ -391,10 +399,10 @@ def test_count(self): # corner case frame = DataFrame() ct1 = frame.count(1) - tm.assertIsInstance(ct1, Series) + assert isinstance(ct1, Series) ct2 = frame.count(0) - tm.assertIsInstance(ct2, Series) + assert isinstance(ct2, Series) # GH #423 df = DataFrame(index=lrange(10)) @@ -454,7 +462,7 @@ def test_stat_operators_attempt_obj_array(self): for df in [df1, df2]: for meth in methods: - self.assertEqual(df.values.dtype, np.object_) + assert df.values.dtype == np.object_ result = getattr(df, meth)(1) expected = getattr(df.astype('f8'), meth)(1) @@ -469,7 +477,7 @@ def test_product(self): def test_median(self): def wrapper(x): - if isnull(x).any(): + if isna(x).any(): return np.nan return np.median(x) @@ -500,7 +508,7 @@ def test_cummin(self): # fix issue cummin_xs = self.tsframe.cummin(axis=1) - self.assertEqual(np.shape(cummin_xs), np.shape(self.tsframe)) + assert np.shape(cummin_xs) == np.shape(self.tsframe) def test_cummax(self): self.tsframe.loc[5:10, 0] = nan @@ -523,7 +531,7 @@ def test_cummax(self): # fix issue cummax_xs = self.tsframe.cummax(axis=1) - self.assertEqual(np.shape(cummax_xs), np.shape(self.tsframe)) + assert np.shape(cummax_xs) == np.shape(self.tsframe) def test_max(self): self._check_stat_op('max', np.max, check_dates=True) @@ -550,11 +558,11 @@ def test_var_std(self): arr = np.repeat(np.random.random((1, 1000)), 1000, 0) result = nanops.nanvar(arr, axis=0) - self.assertFalse((result < 0).any()) + assert not (result < 0).any() if nanops._USE_BOTTLENECK: nanops._USE_BOTTLENECK = False result = nanops.nanvar(arr, axis=0) - self.assertFalse((result < 0).any()) + assert not (result < 0).any() nanops._USE_BOTTLENECK = True def test_numeric_only_flag(self): @@ -578,10 +586,27 @@ def test_numeric_only_flag(self): tm.assert_series_equal(expected, result) # df1 has all numbers, df2 has a letter inside - self.assertRaises(TypeError, lambda: getattr(df1, meth) - (axis=1, numeric_only=False)) - self.assertRaises(TypeError, lambda: getattr(df2, meth) - (axis=1, numeric_only=False)) + pytest.raises(TypeError, lambda: getattr(df1, meth)( + axis=1, numeric_only=False)) + pytest.raises(TypeError, lambda: getattr(df2, meth)( + axis=1, numeric_only=False)) + + def test_mixed_ops(self): + # GH 16116 + df = DataFrame({'int': [1, 2, 3, 4], + 'float': [1., 2., 3., 4.], + 'str': ['a', 'b', 'c', 'd']}) + + for op in ['mean', 'std', 'var', 'skew', + 'kurt', 'sem']: + result = getattr(df, op)() + assert len(result) == 2 + + if nanops._USE_BOTTLENECK: + nanops._USE_BOTTLENECK = False + result = getattr(df, op)() + assert len(result) == 2 + nanops._USE_BOTTLENECK = True def test_cumsum(self): self.tsframe.loc[5:10, 0] = nan @@ -604,7 +629,7 @@ def test_cumsum(self): # fix issue cumsum_xs = self.tsframe.cumsum(axis=1) - self.assertEqual(np.shape(cumsum_xs), np.shape(self.tsframe)) + assert np.shape(cumsum_xs) == np.shape(self.tsframe) def test_cumprod(self): self.tsframe.loc[5:10, 0] = nan @@ -623,7 +648,7 @@ def test_cumprod(self): # fix issue cumprod_xs = self.tsframe.cumprod(axis=1) - self.assertEqual(np.shape(cumprod_xs), np.shape(self.tsframe)) + assert np.shape(cumprod_xs) 
== np.shape(self.tsframe) # ints df = self.tsframe.fillna(0).astype(int) @@ -635,173 +660,6 @@ def test_cumprod(self): df.cumprod(0) df.cumprod(1) - def test_rank(self): - tm._skip_if_no_scipy() - from scipy.stats import rankdata - - self.frame['A'][::2] = np.nan - self.frame['B'][::3] = np.nan - self.frame['C'][::4] = np.nan - self.frame['D'][::5] = np.nan - - ranks0 = self.frame.rank() - ranks1 = self.frame.rank(1) - mask = np.isnan(self.frame.values) - - fvals = self.frame.fillna(np.inf).values - - exp0 = np.apply_along_axis(rankdata, 0, fvals) - exp0[mask] = np.nan - - exp1 = np.apply_along_axis(rankdata, 1, fvals) - exp1[mask] = np.nan - - tm.assert_almost_equal(ranks0.values, exp0) - tm.assert_almost_equal(ranks1.values, exp1) - - # integers - df = DataFrame(np.random.randint(0, 5, size=40).reshape((10, 4))) - - result = df.rank() - exp = df.astype(float).rank() - tm.assert_frame_equal(result, exp) - - result = df.rank(1) - exp = df.astype(float).rank(1) - tm.assert_frame_equal(result, exp) - - def test_rank2(self): - df = DataFrame([[1, 3, 2], [1, 2, 3]]) - expected = DataFrame([[1.0, 3.0, 2.0], [1, 2, 3]]) / 3.0 - result = df.rank(1, pct=True) - tm.assert_frame_equal(result, expected) - - df = DataFrame([[1, 3, 2], [1, 2, 3]]) - expected = df.rank(0) / 2.0 - result = df.rank(0, pct=True) - tm.assert_frame_equal(result, expected) - - df = DataFrame([['b', 'c', 'a'], ['a', 'c', 'b']]) - expected = DataFrame([[2.0, 3.0, 1.0], [1, 3, 2]]) - result = df.rank(1, numeric_only=False) - tm.assert_frame_equal(result, expected) - - expected = DataFrame([[2.0, 1.5, 1.0], [1, 1.5, 2]]) - result = df.rank(0, numeric_only=False) - tm.assert_frame_equal(result, expected) - - df = DataFrame([['b', np.nan, 'a'], ['a', 'c', 'b']]) - expected = DataFrame([[2.0, nan, 1.0], [1.0, 3.0, 2.0]]) - result = df.rank(1, numeric_only=False) - tm.assert_frame_equal(result, expected) - - expected = DataFrame([[2.0, nan, 1.0], [1.0, 1.0, 2.0]]) - result = df.rank(0, numeric_only=False) - tm.assert_frame_equal(result, expected) - - # f7u12, this does not work without extensive workaround - data = [[datetime(2001, 1, 5), nan, datetime(2001, 1, 2)], - [datetime(2000, 1, 2), datetime(2000, 1, 3), - datetime(2000, 1, 1)]] - df = DataFrame(data) - - # check the rank - expected = DataFrame([[2., nan, 1.], - [2., 3., 1.]]) - result = df.rank(1, numeric_only=False, ascending=True) - tm.assert_frame_equal(result, expected) - - expected = DataFrame([[1., nan, 2.], - [2., 1., 3.]]) - result = df.rank(1, numeric_only=False, ascending=False) - tm.assert_frame_equal(result, expected) - - # mixed-type frames - self.mixed_frame['datetime'] = datetime.now() - self.mixed_frame['timedelta'] = timedelta(days=1, seconds=1) - - result = self.mixed_frame.rank(1) - expected = self.mixed_frame.rank(1, numeric_only=True) - tm.assert_frame_equal(result, expected) - - df = DataFrame({"a": [1e-20, -5, 1e-20 + 1e-40, 10, - 1e60, 1e80, 1e-30]}) - exp = DataFrame({"a": [3.5, 1., 3.5, 5., 6., 7., 2.]}) - tm.assert_frame_equal(df.rank(), exp) - - def test_rank_na_option(self): - tm._skip_if_no_scipy() - from scipy.stats import rankdata - - self.frame['A'][::2] = np.nan - self.frame['B'][::3] = np.nan - self.frame['C'][::4] = np.nan - self.frame['D'][::5] = np.nan - - # bottom - ranks0 = self.frame.rank(na_option='bottom') - ranks1 = self.frame.rank(1, na_option='bottom') - - fvals = self.frame.fillna(np.inf).values - - exp0 = np.apply_along_axis(rankdata, 0, fvals) - exp1 = np.apply_along_axis(rankdata, 1, fvals) - - 
tm.assert_almost_equal(ranks0.values, exp0) - tm.assert_almost_equal(ranks1.values, exp1) - - # top - ranks0 = self.frame.rank(na_option='top') - ranks1 = self.frame.rank(1, na_option='top') - - fval0 = self.frame.fillna((self.frame.min() - 1).to_dict()).values - fval1 = self.frame.T - fval1 = fval1.fillna((fval1.min() - 1).to_dict()).T - fval1 = fval1.fillna(np.inf).values - - exp0 = np.apply_along_axis(rankdata, 0, fval0) - exp1 = np.apply_along_axis(rankdata, 1, fval1) - - tm.assert_almost_equal(ranks0.values, exp0) - tm.assert_almost_equal(ranks1.values, exp1) - - # descending - - # bottom - ranks0 = self.frame.rank(na_option='top', ascending=False) - ranks1 = self.frame.rank(1, na_option='top', ascending=False) - - fvals = self.frame.fillna(np.inf).values - - exp0 = np.apply_along_axis(rankdata, 0, -fvals) - exp1 = np.apply_along_axis(rankdata, 1, -fvals) - - tm.assert_almost_equal(ranks0.values, exp0) - tm.assert_almost_equal(ranks1.values, exp1) - - # descending - - # top - ranks0 = self.frame.rank(na_option='bottom', ascending=False) - ranks1 = self.frame.rank(1, na_option='bottom', ascending=False) - - fval0 = self.frame.fillna((self.frame.min() - 1).to_dict()).values - fval1 = self.frame.T - fval1 = fval1.fillna((fval1.min() - 1).to_dict()).T - fval1 = fval1.fillna(np.inf).values - - exp0 = np.apply_along_axis(rankdata, 0, -fval0) - exp1 = np.apply_along_axis(rankdata, 1, -fval1) - - tm.assert_numpy_array_equal(ranks0.values, exp0) - tm.assert_numpy_array_equal(ranks1.values, exp1) - - def test_rank_axis(self): - # check if using axes' names gives the same result - df = pd.DataFrame([[2, 1], [4, 3]]) - tm.assert_frame_equal(df.rank(axis=0), df.rank(axis='index')) - tm.assert_frame_equal(df.rank(axis=1), df.rank(axis='columns')) - def test_sem(self): alt = lambda x: np.std(x, ddof=1) / np.sqrt(len(x)) self._check_stat_op('sem', alt) @@ -813,33 +671,13 @@ def test_sem(self): arr = np.repeat(np.random.random((1, 1000)), 1000, 0) result = nanops.nansem(arr, axis=0) - self.assertFalse((result < 0).any()) + assert not (result < 0).any() if nanops._USE_BOTTLENECK: nanops._USE_BOTTLENECK = False result = nanops.nansem(arr, axis=0) - self.assertFalse((result < 0).any()) + assert not (result < 0).any() nanops._USE_BOTTLENECK = True - def test_sort_invalid_kwargs(self): - df = DataFrame([1, 2, 3], columns=['a']) - - msg = r"sort\(\) got an unexpected keyword argument 'foo'" - tm.assertRaisesRegexp(TypeError, msg, df.sort, foo=2) - - # Neither of these should raise an error because they - # are explicit keyword arguments in the signature and - # hence should not be swallowed by the kwargs parameter - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - df.sort(axis=1) - - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - df.sort(kind='mergesort') - - msg = "the 'order' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, df.sort, order=2) - def test_skew(self): tm._skip_if_no_scipy() from scipy.stats import skew @@ -872,8 +710,8 @@ def alt(x): kurt = df.kurt() kurt2 = df.kurt(level=0).xs('bar') tm.assert_series_equal(kurt, kurt2, check_names=False) - self.assertTrue(kurt.name is None) - self.assertEqual(kurt2.name, 'bar') + assert kurt.name is None + assert kurt2.name == 'bar' def _check_stat_op(self, name, alternative, frame=None, has_skipna=True, has_numeric_only=False, check_dtype=True, @@ -890,12 +728,12 @@ def _check_stat_op(self, name, alternative, frame=None, has_skipna=True, df = DataFrame({'b': date_range('1/1/2001', 
periods=2)}) _f = getattr(df, name) result = _f() - self.assertIsInstance(result, Series) + assert isinstance(result, Series) df['a'] = lrange(len(df)) result = getattr(df, name)() - self.assertIsInstance(result, Series) - self.assertTrue(len(result)) + assert isinstance(result, Series) + assert len(result) if has_skipna: def skipna_wrapper(x): @@ -933,15 +771,15 @@ def wrapper(x): # check dtypes if check_dtype: lcd_dtype = frame.values.dtype - self.assertEqual(lcd_dtype, result0.dtype) - self.assertEqual(lcd_dtype, result1.dtype) + assert lcd_dtype == result0.dtype + assert lcd_dtype == result1.dtype # result = f(axis=1) # comp = frame.apply(alternative, axis=1).reindex(result.index) # assert_series_equal(result, comp) # bad axis - tm.assertRaisesRegexp(ValueError, 'No axis named 2', f, axis=2) + tm.assert_raises_regex(ValueError, 'No axis named 2', f, axis=2) # make sure works on mixed-type frame getattr(self.mixed_frame, name)(axis=0) getattr(self.mixed_frame, name)(axis=1) @@ -958,8 +796,8 @@ def wrapper(x): r0 = getattr(all_na, name)(axis=0) r1 = getattr(all_na, name)(axis=1) if not tm._incompat_bottleneck_version(name): - self.assertTrue(np.isnan(r0).all()) - self.assertTrue(np.isnan(r1).all()) + assert np.isnan(r0).all() + assert np.isnan(r1).all() def test_mode(self): df = pd.DataFrame({"A": [12, 12, 11, 12, 19, 11], @@ -969,18 +807,23 @@ def test_mode(self): "E": [8, 8, 1, 1, 3, 3]}) tm.assert_frame_equal(df[["A"]].mode(), pd.DataFrame({"A": [12]})) - expected = pd.Series([], dtype='int64', name='D').to_frame() + expected = pd.Series([0, 1, 2, 3, 4, 5], dtype='int64', name='D').\ + to_frame() tm.assert_frame_equal(df[["D"]].mode(), expected) expected = pd.Series([1, 3, 8], dtype='int64', name='E').to_frame() tm.assert_frame_equal(df[["E"]].mode(), expected) tm.assert_frame_equal(df[["A", "B"]].mode(), pd.DataFrame({"A": [12], "B": [10.]})) tm.assert_frame_equal(df.mode(), - pd.DataFrame({"A": [12, np.nan, np.nan], - "B": [10, np.nan, np.nan], - "C": [8, 9, np.nan], - "D": [np.nan, np.nan, np.nan], - "E": [1, 3, 8]})) + pd.DataFrame({"A": [12, np.nan, np.nan, np.nan, + np.nan, np.nan], + "B": [10, np.nan, np.nan, np.nan, + np.nan, np.nan], + "C": [8, 9, np.nan, np.nan, np.nan, + np.nan], + "D": [0, 1, 2, 3, 4, 5], + "E": [1, 3, 8, np.nan, np.nan, + np.nan]})) # outputs in sorted order df["C"] = list(reversed(df["C"])) @@ -997,20 +840,12 @@ def test_mode(self): df = pd.DataFrame({"A": np.arange(6, dtype='int64'), "B": pd.date_range('2011', periods=6), "C": list('abcdef')}) - exp = pd.DataFrame({"A": pd.Series([], dtype=df["A"].dtype), - "B": pd.Series([], dtype=df["B"].dtype), - "C": pd.Series([], dtype=df["C"].dtype)}) - tm.assert_frame_equal(df.mode(), exp) - - # and also when not empty - df.loc[1, "A"] = 0 - df.loc[4, "B"] = df.loc[3, "B"] - df.loc[5, "C"] = 'e' - exp = pd.DataFrame({"A": pd.Series([0], dtype=df["A"].dtype), - "B": pd.Series([df.loc[3, "B"]], + exp = pd.DataFrame({"A": pd.Series(np.arange(6, dtype='int64'), + dtype=df["A"].dtype), + "B": pd.Series(pd.date_range('2011', periods=6), dtype=df["B"].dtype), - "C": pd.Series(['e'], dtype=df["C"].dtype)}) - + "C": pd.Series(list('abcdef'), + dtype=df["C"].dtype)}) tm.assert_frame_equal(df.mode(), exp) def test_operators_timedelta64(self): @@ -1025,19 +860,19 @@ def test_operators_timedelta64(self): # min result = diffs.min() - self.assertEqual(result[0], diffs.loc[0, 'A']) - self.assertEqual(result[1], diffs.loc[0, 'B']) + assert result[0] == diffs.loc[0, 'A'] + assert result[1] == diffs.loc[0, 'B'] result = 
diffs.min(axis=1) - self.assertTrue((result == diffs.loc[0, 'B']).all()) + assert (result == diffs.loc[0, 'B']).all() # max result = diffs.max() - self.assertEqual(result[0], diffs.loc[2, 'A']) - self.assertEqual(result[1], diffs.loc[2, 'B']) + assert result[0] == diffs.loc[2, 'A'] + assert result[1] == diffs.loc[2, 'B'] result = diffs.max(axis=1) - self.assertTrue((result == diffs['A']).all()) + assert (result == diffs['A']).all() # abs result = diffs.abs() @@ -1055,7 +890,7 @@ def test_operators_timedelta64(self): mixed['F'] = Timestamp('20130101') # results in an object array - from pandas.tseries.timedeltas import ( + from pandas.core.tools.timedeltas import ( _coerce_scalar_to_timedelta_type as _coerce) result = mixed.min() @@ -1085,20 +920,20 @@ def test_operators_timedelta64(self): df = DataFrame({'time': date_range('20130102', periods=5), 'time2': date_range('20130105', periods=5)}) df['off1'] = df['time2'] - df['time'] - self.assertEqual(df['off1'].dtype, 'timedelta64[ns]') + assert df['off1'].dtype == 'timedelta64[ns]' df['off2'] = df['time'] - df['time2'] df._consolidate_inplace() - self.assertTrue(df['off1'].dtype == 'timedelta64[ns]') - self.assertTrue(df['off2'].dtype == 'timedelta64[ns]') + assert df['off1'].dtype == 'timedelta64[ns]' + assert df['off2'].dtype == 'timedelta64[ns]' def test_sum_corner(self): axis0 = self.empty.sum(0) axis1 = self.empty.sum(1) - tm.assertIsInstance(axis0, Series) - tm.assertIsInstance(axis1, Series) - self.assertEqual(len(axis0), 0) - self.assertEqual(len(axis1), 0) + assert isinstance(axis0, Series) + assert isinstance(axis1, Series) + assert len(axis0) == 0 + assert len(axis1) == 0 def test_sum_object(self): values = self.frame.values.astype(int) @@ -1117,18 +952,18 @@ def test_mean_corner(self): # unit test when have object data the_mean = self.mixed_frame.mean(axis=0) the_sum = self.mixed_frame.sum(axis=0, numeric_only=True) - self.assert_index_equal(the_sum.index, the_mean.index) - self.assertTrue(len(the_mean.index) < len(self.mixed_frame.columns)) + tm.assert_index_equal(the_sum.index, the_mean.index) + assert len(the_mean.index) < len(self.mixed_frame.columns) # xs sum mixed type, just want to know it works... 
the_mean = self.mixed_frame.mean(axis=1) the_sum = self.mixed_frame.sum(axis=1, numeric_only=True) - self.assert_index_equal(the_sum.index, the_mean.index) + tm.assert_index_equal(the_sum.index, the_mean.index) # take mean of boolean column self.frame['bool'] = self.frame['A'] > 0 means = self.frame.mean(0) - self.assertEqual(means['bool'], self.frame['bool'].values.mean()) + assert means['bool'] == self.frame['bool'].values.mean() def test_stats_mixed_type(self): # don't blow up @@ -1139,7 +974,7 @@ def test_stats_mixed_type(self): def test_median_corner(self): def wrapper(x): - if isnull(x).any(): + if isna(x).any(): return np.nan return np.median(x) @@ -1163,8 +998,8 @@ def test_cumsum_corner(self): def test_sum_bools(self): df = DataFrame(index=lrange(1), columns=lrange(10)) - bools = isnull(df) - self.assertEqual(bools.sum(axis=1)[0], 10) + bools = isna(df) + assert bools.sum(axis=1)[0] == 10 # Index of max / min @@ -1180,7 +1015,7 @@ def test_idxmin(self): skipna=skipna) tm.assert_series_equal(result, expected) - self.assertRaises(ValueError, frame.idxmin, axis=2) + pytest.raises(ValueError, frame.idxmin, axis=2) def test_idxmax(self): frame = self.frame @@ -1194,7 +1029,7 @@ def test_idxmax(self): skipna=skipna) tm.assert_series_equal(result, expected) - self.assertRaises(ValueError, frame.idxmax, axis=2) + pytest.raises(ValueError, frame.idxmax, axis=2) # ---------------------------------------------------------------------- # Logical reductions @@ -1269,7 +1104,7 @@ def wrapper(x): # assert_series_equal(result, comp) # bad axis - self.assertRaises(ValueError, f, axis=2) + pytest.raises(ValueError, f, axis=2) # make sure works on mixed-type frame mixed = self.mixed_frame @@ -1296,80 +1131,13 @@ def __nonzero__(self): r0 = getattr(all_na, name)(axis=0) r1 = getattr(all_na, name)(axis=1) if name == 'any': - self.assertFalse(r0.any()) - self.assertFalse(r1.any()) + assert not r0.any() + assert not r1.any() else: - self.assertTrue(r0.all()) - self.assertTrue(r1.all()) + assert r0.all() + assert r1.all() # ---------------------------------------------------------------------- - # Top / bottom - - def test_nlargest(self): - # GH10393 - from string import ascii_lowercase - df = pd.DataFrame({'a': np.random.permutation(10), - 'b': list(ascii_lowercase[:10])}) - result = df.nlargest(5, 'a') - expected = df.sort_values('a', ascending=False).head(5) - tm.assert_frame_equal(result, expected) - - def test_nlargest_multiple_columns(self): - from string import ascii_lowercase - df = pd.DataFrame({'a': np.random.permutation(10), - 'b': list(ascii_lowercase[:10]), - 'c': np.random.permutation(10).astype('float64')}) - result = df.nlargest(5, ['a', 'b']) - expected = df.sort_values(['a', 'b'], ascending=False).head(5) - tm.assert_frame_equal(result, expected) - - def test_nsmallest(self): - from string import ascii_lowercase - df = pd.DataFrame({'a': np.random.permutation(10), - 'b': list(ascii_lowercase[:10])}) - result = df.nsmallest(5, 'a') - expected = df.sort_values('a').head(5) - tm.assert_frame_equal(result, expected) - - def test_nsmallest_multiple_columns(self): - from string import ascii_lowercase - df = pd.DataFrame({'a': np.random.permutation(10), - 'b': list(ascii_lowercase[:10]), - 'c': np.random.permutation(10).astype('float64')}) - result = df.nsmallest(5, ['a', 'c']) - expected = df.sort_values(['a', 'c']).head(5) - tm.assert_frame_equal(result, expected) - - def test_nsmallest_nlargest_duplicate_index(self): - # GH 13412 - df = pd.DataFrame({'a': [1, 2, 3, 4], - 'b': [4, 3, 2, 1], 
- 'c': [0, 1, 2, 3]}, - index=[0, 0, 1, 1]) - result = df.nsmallest(4, 'a') - expected = df.sort_values('a').head(4) - tm.assert_frame_equal(result, expected) - - result = df.nlargest(4, 'a') - expected = df.sort_values('a', ascending=False).head(4) - tm.assert_frame_equal(result, expected) - - result = df.nsmallest(4, ['a', 'c']) - expected = df.sort_values(['a', 'c']).head(4) - tm.assert_frame_equal(result, expected) - - result = df.nsmallest(4, ['c', 'a']) - expected = df.sort_values(['c', 'a']).head(4) - tm.assert_frame_equal(result, expected) - - result = df.nlargest(4, ['a', 'c']) - expected = df.sort_values(['a', 'c'], ascending=False).head(4) - tm.assert_frame_equal(result, expected) - - result = df.nlargest(4, ['c', 'a']) - expected = df.sort_values(['c', 'a'], ascending=False).head(4) - tm.assert_frame_equal(result, expected) - # ---------------------------------------------------------------------- # Isin def test_isin(self): @@ -1383,10 +1151,13 @@ def test_isin(self): expected = DataFrame([df.loc[s].isin(other) for s in df.index]) tm.assert_frame_equal(result, expected) - def test_isin_empty(self): + @pytest.mark.parametrize("empty", [[], Series(), np.array([])]) + def test_isin_empty(self, empty): + # see gh-16991 df = DataFrame({'A': ['a', 'b', 'c'], 'B': ['a', 'e', 'f']}) - result = df.isin([]) - expected = pd.DataFrame(False, df.index, df.columns) + expected = DataFrame(False, df.index, df.columns) + + result = df.isin(empty) tm.assert_frame_equal(result, expected) def test_isin_dict(self): @@ -1412,10 +1183,10 @@ def test_isin_with_string_scalar(self): df = DataFrame({'vals': [1, 2, 3, 4], 'ids': ['a', 'b', 'f', 'n'], 'ids2': ['a', 'n', 'c', 'n']}, index=['foo', 'bar', 'baz', 'qux']) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.isin('a') - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.isin('aaa') def test_isin_df(self): @@ -1433,23 +1204,31 @@ def test_isin_df(self): expected['B'] = False tm.assert_frame_equal(result, expected) + def test_isin_tuples(self): + # GH16394 + df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']}) + df['C'] = list(zip(df['A'], df['B'])) + result = df['C'].isin([(1, 'a')]) + tm.assert_series_equal(result, + Series([True, False, False], name="C")) + def test_isin_df_dupe_values(self): df1 = DataFrame({'A': [1, 2, 3, 4], 'B': [2, np.nan, 4, 4]}) # just cols duped df2 = DataFrame([[0, 2], [12, 4], [2, np.nan], [4, 5]], columns=['B', 'B']) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df1.isin(df2) # just index duped df2 = DataFrame([[0, 2], [12, 4], [2, np.nan], [4, 5]], columns=['A', 'B'], index=[0, 0, 1, 1]) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df1.isin(df2) # cols and index: df2.columns = ['B', 'B'] - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df1.isin(df2) def test_isin_dupe_self(self): @@ -1495,6 +1274,27 @@ def test_isin_multiIndex(self): result = df1.isin(df2) tm.assert_frame_equal(result, expected) + def test_isin_empty_datetimelike(self): + # GH 15473 + df1_ts = DataFrame({'date': + pd.to_datetime(['2014-01-01', '2014-01-02'])}) + df1_td = DataFrame({'date': + [pd.Timedelta(1, 's'), pd.Timedelta(2, 's')]}) + df2 = DataFrame({'date': []}) + df3 = DataFrame() + + expected = DataFrame({'date': [False, False]}) + + result = df1_ts.isin(df2) + tm.assert_frame_equal(result, expected) + result = df1_ts.isin(df3) + tm.assert_frame_equal(result, expected) + + result = df1_td.isin(df2) + tm.assert_frame_equal(result, 
expected) + result = df1_td.isin(df3) + tm.assert_frame_equal(result, expected) + # ---------------------------------------------------------------------- # Row deduplication @@ -1518,13 +1318,7 @@ def test_drop_duplicates(self): result = df.drop_duplicates('AAA', keep=False) expected = df.loc[[]] tm.assert_frame_equal(result, expected) - self.assertEqual(len(result), 0) - - # deprecate take_last - with tm.assert_produces_warning(FutureWarning): - result = df.drop_duplicates('AAA', take_last=True) - expected = df.loc[[6, 7]] - tm.assert_frame_equal(result, expected) + assert len(result) == 0 # multi column expected = df.loc[[0, 1, 2, 3]] @@ -1541,12 +1335,6 @@ def test_drop_duplicates(self): expected = df.loc[[0]] tm.assert_frame_equal(result, expected) - # deprecate take_last - with tm.assert_produces_warning(FutureWarning): - result = df.drop_duplicates(('AAA', 'B'), take_last=True) - expected = df.loc[[0, 5, 6, 7]] - tm.assert_frame_equal(result, expected) - # consider everything df2 = df.loc[:, ['AAA', 'B', 'C']] @@ -1563,13 +1351,6 @@ def test_drop_duplicates(self): expected = df2.drop_duplicates(['AAA', 'B'], keep=False) tm.assert_frame_equal(result, expected) - # deprecate take_last - with tm.assert_produces_warning(FutureWarning): - result = df2.drop_duplicates(take_last=True) - with tm.assert_produces_warning(FutureWarning): - expected = df2.drop_duplicates(['AAA', 'B'], take_last=True) - tm.assert_frame_equal(result, expected) - # integers result = df.drop_duplicates('C') expected = df.iloc[[0, 2]] @@ -1610,7 +1391,7 @@ def test_drop_duplicates(self): df = df.append([[1] + [0] * 8], ignore_index=True) for keep in ['first', 'last', False]: - self.assertEqual(df.duplicated(keep=keep).sum(), 0) + assert df.duplicated(keep=keep).sum() == 0 def test_drop_duplicates_for_take_all(self): df = DataFrame({'AAA': ['foo', 'bar', 'baz', 'bar', @@ -1665,13 +1446,7 @@ def test_drop_duplicates_tuple(self): result = df.drop_duplicates(('AA', 'AB'), keep=False) expected = df.loc[[]] # empty df - self.assertEqual(len(result), 0) - tm.assert_frame_equal(result, expected) - - # deprecate take_last - with tm.assert_produces_warning(FutureWarning): - result = df.drop_duplicates(('AA', 'AB'), take_last=True) - expected = df.loc[[6, 7]] + assert len(result) == 0 tm.assert_frame_equal(result, expected) # multi column @@ -1700,13 +1475,7 @@ def test_drop_duplicates_NA(self): result = df.drop_duplicates('A', keep=False) expected = df.loc[[]] # empty df tm.assert_frame_equal(result, expected) - self.assertEqual(len(result), 0) - - # deprecate take_last - with tm.assert_produces_warning(FutureWarning): - result = df.drop_duplicates('A', take_last=True) - expected = df.loc[[1, 6, 7]] - tm.assert_frame_equal(result, expected) + assert len(result) == 0 # multi column result = df.drop_duplicates(['A', 'B']) @@ -1721,12 +1490,6 @@ def test_drop_duplicates_NA(self): expected = df.loc[[6]] tm.assert_frame_equal(result, expected) - # deprecate take_last - with tm.assert_produces_warning(FutureWarning): - result = df.drop_duplicates(['A', 'B'], take_last=True) - expected = df.loc[[1, 5, 6, 7]] - tm.assert_frame_equal(result, expected) - # nan df = DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'bar', 'foo'], @@ -1747,13 +1510,7 @@ def test_drop_duplicates_NA(self): result = df.drop_duplicates('C', keep=False) expected = df.loc[[]] # empty df tm.assert_frame_equal(result, expected) - self.assertEqual(len(result), 0) - - # deprecate take_last - with tm.assert_produces_warning(FutureWarning): - result = 
df.drop_duplicates('C', take_last=True) - expected = df.loc[[3, 7]] - tm.assert_frame_equal(result, expected) + assert len(result) == 0 # multi column result = df.drop_duplicates(['C', 'B']) @@ -1768,12 +1525,6 @@ def test_drop_duplicates_NA(self): expected = df.loc[[1]] tm.assert_frame_equal(result, expected) - # deprecate take_last - with tm.assert_produces_warning(FutureWarning): - result = df.drop_duplicates(['C', 'B'], take_last=True) - expected = df.loc[[1, 3, 6, 7]] - tm.assert_frame_equal(result, expected) - def test_drop_duplicates_NA_for_take_all(self): # none df = DataFrame({'A': [None, None, 'foo', 'bar', @@ -1834,15 +1585,7 @@ def test_drop_duplicates_inplace(self): expected = orig.loc[[]] result = df tm.assert_frame_equal(result, expected) - self.assertEqual(len(df), 0) - - # deprecate take_last - df = orig.copy() - with tm.assert_produces_warning(FutureWarning): - df.drop_duplicates('A', take_last=True, inplace=True) - expected = orig.loc[[6, 7]] - result = df - tm.assert_frame_equal(result, expected) + assert len(df) == 0 # multi column df = orig.copy() @@ -1863,14 +1606,6 @@ def test_drop_duplicates_inplace(self): result = df tm.assert_frame_equal(result, expected) - # deprecate take_last - df = orig.copy() - with tm.assert_produces_warning(FutureWarning): - df.drop_duplicates(['A', 'B'], take_last=True, inplace=True) - expected = orig.loc[[0, 5, 6, 7]] - result = df - tm.assert_frame_equal(result, expected) - # consider everything orig2 = orig.loc[:, ['A', 'B', 'C']].copy() @@ -1893,17 +1628,7 @@ def test_drop_duplicates_inplace(self): result = df2 tm.assert_frame_equal(result, expected) - # deprecate take_last - df2 = orig2.copy() - with tm.assert_produces_warning(FutureWarning): - df2.drop_duplicates(take_last=True, inplace=True) - with tm.assert_produces_warning(FutureWarning): - expected = orig2.drop_duplicates(['A', 'B'], take_last=True) - result = df2 - tm.assert_frame_equal(result, expected) - # Rounding - def test_round(self): # GH 2665 @@ -1932,7 +1657,7 @@ def test_round(self): # Round with a list round_list = [1, 2] - with self.assertRaises(TypeError): + with pytest.raises(TypeError): df.round(round_list) # Round with a dictionary @@ -1955,34 +1680,34 @@ def test_round(self): # float input to `decimals` non_int_round_dict = {'col1': 1, 'col2': 0.5} - with self.assertRaises(TypeError): + with pytest.raises(TypeError): df.round(non_int_round_dict) # String input non_int_round_dict = {'col1': 1, 'col2': 'foo'} - with self.assertRaises(TypeError): + with pytest.raises(TypeError): df.round(non_int_round_dict) non_int_round_Series = Series(non_int_round_dict) - with self.assertRaises(TypeError): + with pytest.raises(TypeError): df.round(non_int_round_Series) # List input non_int_round_dict = {'col1': 1, 'col2': [1, 2]} - with self.assertRaises(TypeError): + with pytest.raises(TypeError): df.round(non_int_round_dict) non_int_round_Series = Series(non_int_round_dict) - with self.assertRaises(TypeError): + with pytest.raises(TypeError): df.round(non_int_round_Series) # Non integer Series inputs non_int_round_Series = Series(non_int_round_dict) - with self.assertRaises(TypeError): + with pytest.raises(TypeError): df.round(non_int_round_Series) non_int_round_Series = Series(non_int_round_dict) - with self.assertRaises(TypeError): + with pytest.raises(TypeError): df.round(non_int_round_Series) # Negative numbers @@ -2003,10 +1728,10 @@ def test_round(self): if sys.version < LooseVersion('2.7'): # Rounding with decimal is a ValueError in Python < 2.7 - with 
self.assertRaises(ValueError):
+            with pytest.raises(ValueError):
                 df.round(nan_round_Series)
         else:
-            with self.assertRaises(TypeError):
+            with pytest.raises(TypeError):
                 df.round(nan_round_Series)
 
         # Make sure this doesn't break existing Series.round
@@ -2035,7 +1760,7 @@ def test_numpy_round(self):
         tm.assert_frame_equal(out, expected)
 
         msg = "the 'out' parameter is not supported"
-        with tm.assertRaisesRegexp(ValueError, msg):
+        with tm.assert_raises_regex(ValueError, msg):
             np.round(df, decimals=0, out=df)
 
     def test_round_mixed_type(self):
@@ -2061,15 +1786,15 @@ def test_round_issue(self):
 
         dfs = pd.concat((df, df), axis=1)
         rounded = dfs.round()
-        self.assert_index_equal(rounded.index, dfs.index)
+        tm.assert_index_equal(rounded.index, dfs.index)
 
         decimals = pd.Series([1, 0, 2], index=['A', 'B', 'A'])
-        self.assertRaises(ValueError, df.round, decimals)
+        pytest.raises(ValueError, df.round, decimals)
 
     def test_built_in_round(self):
         if not compat.PY3:
-            raise nose.SkipTest("build in round cannot be overriden "
-                                "prior to Python 3")
+            pytest.skip("built-in round cannot be overridden "
+                        "prior to Python 3")
 
         # GH11763
         # Here's the test frame we'll be working with
@@ -2085,15 +1810,35 @@ def test_built_in_round(self):
 
     def test_clip(self):
         median = self.frame.median().median()
+        original = self.frame.copy()
 
         capped = self.frame.clip_upper(median)
-        self.assertFalse((capped.values > median).any())
+        assert not (capped.values > median).any()
 
         floored = self.frame.clip_lower(median)
-        self.assertFalse((floored.values < median).any())
+        assert not (floored.values < median).any()
 
         double = self.frame.clip(upper=median, lower=median)
-        self.assertFalse((double.values != median).any())
+        assert not (double.values != median).any()
+
+        # Verify that self.frame was not changed inplace
+        assert (self.frame.values == original.values).all()
+
+    def test_inplace_clip(self):
+        # GH #15388
+        median = self.frame.median().median()
+        frame_copy = self.frame.copy()
+
+        frame_copy.clip_upper(median, inplace=True)
+        assert not (frame_copy.values > median).any()
+        frame_copy = self.frame.copy()
+
+        frame_copy.clip_lower(median, inplace=True)
+        assert not (frame_copy.values < median).any()
+        frame_copy = self.frame.copy()
+
+        frame_copy.clip(upper=median, lower=median, inplace=True)
+        assert not (frame_copy.values != median).any()
 
     def test_dataframe_clip(self):
         # GH #2747
@@ -2106,41 +1851,77 @@ def test_dataframe_clip(self):
 
         lb_mask = df.values <= lb
         ub_mask = df.values >= ub
         mask = ~lb_mask & ~ub_mask
-        self.assertTrue((clipped_df.values[lb_mask] == lb).all())
-        self.assertTrue((clipped_df.values[ub_mask] == ub).all())
-        self.assertTrue((clipped_df.values[mask] ==
-                         df.values[mask]).all())
-
-    def test_clip_against_series(self):
+        assert (clipped_df.values[lb_mask] == lb).all()
+        assert (clipped_df.values[ub_mask] == ub).all()
+        assert (clipped_df.values[mask] == df.values[mask]).all()
+
+    @pytest.mark.xfail(reason=("clip on mixed integer or floats "
+                               "with integer clippers coerces to float"))
+    def test_clip_mixed_numeric(self):
+
+        df = DataFrame({'A': [1, 2, 3],
+                        'B': [1., np.nan, 3.]})
+        result = df.clip(1, 2)
+        expected = DataFrame({'A': [1, 2, 2],
+                              'B': [1., np.nan, 2.]})
+        tm.assert_frame_equal(result, expected, check_like=True)
+
+    @pytest.mark.parametrize("inplace", [True, False])
+    def test_clip_against_series(self, inplace):
        # GH #6966
 
        df = DataFrame(np.random.randn(1000, 2))
        lb = Series(np.random.randn(1000))
        ub = lb + 1
 
-        clipped_df = df.clip(lb, ub, axis=0)
+        original = df.copy()
+        clipped_df = df.clip(lb, ub, 
axis=0, inplace=inplace) + + if inplace: + clipped_df = df for i in range(2): - lb_mask = df.iloc[:, i] <= lb - ub_mask = df.iloc[:, i] >= ub + lb_mask = original.iloc[:, i] <= lb + ub_mask = original.iloc[:, i] >= ub mask = ~lb_mask & ~ub_mask result = clipped_df.loc[lb_mask, i] tm.assert_series_equal(result, lb[lb_mask], check_names=False) - self.assertEqual(result.name, i) + assert result.name == i result = clipped_df.loc[ub_mask, i] tm.assert_series_equal(result, ub[ub_mask], check_names=False) - self.assertEqual(result.name, i) + assert result.name == i tm.assert_series_equal(clipped_df.loc[mask, i], df.loc[mask, i]) - def test_clip_against_frame(self): + @pytest.mark.parametrize("inplace", [True, False]) + @pytest.mark.parametrize("lower", [[2, 3, 4], np.asarray([2, 3, 4])]) + @pytest.mark.parametrize("axis,res", [ + (0, [[2., 2., 3.], [4., 5., 6.], [7., 7., 7.]]), + (1, [[2., 3., 4.], [4., 5., 6.], [5., 6., 7.]]) + ]) + def test_clip_against_list_like(self, inplace, lower, axis, res): + # GH #15390 + original = self.simple.copy(deep=True) + + result = original.clip(lower=lower, upper=[5, 6, 7], + axis=axis, inplace=inplace) + + expected = pd.DataFrame(res, + columns=original.columns, + index=original.index) + if inplace: + result = original + tm.assert_frame_equal(result, expected, check_exact=True) + + @pytest.mark.parametrize("axis", [0, 1, None]) + def test_clip_against_frame(self, axis): df = DataFrame(np.random.randn(1000, 2)) lb = DataFrame(np.random.randn(1000, 2)) ub = lb + 1 - clipped_df = df.clip(lb, ub) + clipped_df = df.clip(lb, ub, axis=axis) lb_mask = df <= lb ub_mask = df >= ub @@ -2150,6 +1931,17 @@ def test_clip_against_frame(self): tm.assert_frame_equal(clipped_df[ub_mask], ub[ub_mask]) tm.assert_frame_equal(clipped_df[mask], df[mask]) + def test_clip_with_na_args(self): + """Should process np.nan argument as None """ + # GH # 17276 + tm.assert_frame_equal(self.frame.clip(np.nan), self.frame) + tm.assert_frame_equal(self.frame.clip(upper=[1, 2, np.nan]), + self.frame) + tm.assert_frame_equal(self.frame.clip(lower=[1, np.nan, 3]), + self.frame) + tm.assert_frame_equal(self.frame.clip(upper=np.nan, lower=np.nan), + self.frame) + # Matrix-like def test_dot(self): @@ -2170,11 +1962,11 @@ def test_dot(self): # Check series argument result = a.dot(b['one']) tm.assert_series_equal(result, expected['one'], check_names=False) - self.assertTrue(result.name is None) + assert result.name is None result = a.dot(b1['one']) tm.assert_series_equal(result, expected['one'], check_names=False) - self.assertTrue(result.name is None) + assert result.name is None # can pass correct-length arrays row = a.iloc[0].values @@ -2183,7 +1975,8 @@ def test_dot(self): exp = a.dot(a.iloc[0]) tm.assert_series_equal(result, exp) - with tm.assertRaisesRegexp(ValueError, 'Dot product shape mismatch'): + with tm.assert_raises_regex(ValueError, + 'Dot product shape mismatch'): a.dot(row[:-1]) a = np.random.rand(1, 5) @@ -2200,9 +1993,147 @@ def test_dot(self): df = DataFrame(randn(3, 4), index=[1, 2, 3], columns=lrange(4)) df2 = DataFrame(randn(5, 3), index=lrange(5), columns=[1, 2, 3]) - with tm.assertRaisesRegexp(ValueError, 'aligned'): + with tm.assert_raises_regex(ValueError, 'aligned'): df.dot(df2) -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + +@pytest.fixture +def df_duplicates(): + return pd.DataFrame({'a': [1, 2, 3, 4, 4], + 'b': [1, 1, 1, 1, 1], + 'c': [0, 1, 2, 5, 4]}, + index=[0, 0, 1, 1, 1]) + + +@pytest.fixture +def 
df_strings(): + return pd.DataFrame({'a': np.random.permutation(10), + 'b': list(ascii_lowercase[:10]), + 'c': np.random.permutation(10).astype('float64')}) + + +@pytest.fixture +def df_main_dtypes(): + return pd.DataFrame( + {'group': [1, 1, 2], + 'int': [1, 2, 3], + 'float': [4., 5., 6.], + 'string': list('abc'), + 'category_string': pd.Series(list('abc')).astype('category'), + 'category_int': [7, 8, 9], + 'datetime': pd.date_range('20130101', periods=3), + 'datetimetz': pd.date_range('20130101', + periods=3, + tz='US/Eastern'), + 'timedelta': pd.timedelta_range('1 s', periods=3, freq='s')}, + columns=['group', 'int', 'float', 'string', + 'category_string', 'category_int', + 'datetime', 'datetimetz', + 'timedelta']) + + +class TestNLargestNSmallest(object): + + dtype_error_msg_template = ("Column {column!r} has dtype {dtype}, cannot " + "use method {method!r} with this dtype") + + # ---------------------------------------------------------------------- + # Top / bottom + @pytest.mark.parametrize( + 'method, n, order', + product(['nsmallest', 'nlargest'], range(1, 11), + [['a'], + ['c'], + ['a', 'b'], + ['a', 'c'], + ['b', 'a'], + ['b', 'c'], + ['a', 'b', 'c'], + ['c', 'a', 'b'], + ['c', 'b', 'a'], + ['b', 'c', 'a'], + ['b', 'a', 'c'], + + # dups! + ['b', 'c', 'c'], + + ])) + def test_n(self, df_strings, method, n, order): + # GH10393 + df = df_strings + if 'b' in order: + + error_msg = self.dtype_error_msg_template.format( + column='b', method=method, dtype='object') + with tm.assert_raises_regex(TypeError, error_msg): + getattr(df, method)(n, order) + else: + ascending = method == 'nsmallest' + result = getattr(df, method)(n, order) + expected = df.sort_values(order, ascending=ascending).head(n) + tm.assert_frame_equal(result, expected) + + @pytest.mark.parametrize( + 'method, columns', + product(['nsmallest', 'nlargest'], + product(['group'], ['category_string', 'string']) + )) + def test_n_error(self, df_main_dtypes, method, columns): + df = df_main_dtypes + error_msg = self.dtype_error_msg_template.format( + column=columns[1], method=method, dtype=df[columns[1]].dtype) + with tm.assert_raises_regex(TypeError, error_msg): + getattr(df, method)(2, columns) + + def test_n_all_dtypes(self, df_main_dtypes): + df = df_main_dtypes + df.nsmallest(2, list(set(df) - {'category_string', 'string'})) + df.nlargest(2, list(set(df) - {'category_string', 'string'})) + + def test_n_identical_values(self): + # GH15297 + df = pd.DataFrame({'a': [1] * 5, 'b': [1, 2, 3, 4, 5]}) + + result = df.nlargest(3, 'a') + expected = pd.DataFrame( + {'a': [1] * 3, 'b': [1, 2, 3]}, index=[0, 1, 2] + ) + tm.assert_frame_equal(result, expected) + + result = df.nsmallest(3, 'a') + expected = pd.DataFrame({'a': [1] * 3, 'b': [1, 2, 3]}) + tm.assert_frame_equal(result, expected) + + @pytest.mark.parametrize( + 'n, order', + product([1, 2, 3, 4, 5], + [['a', 'b', 'c'], + ['c', 'b', 'a'], + ['a'], + ['b'], + ['a', 'b'], + ['c', 'b']])) + def test_n_duplicate_index(self, df_duplicates, n, order): + # GH 13412 + + df = df_duplicates + result = df.nsmallest(n, order) + expected = df.sort_values(order).head(n) + tm.assert_frame_equal(result, expected) + + result = df.nlargest(n, order) + expected = df.sort_values(order, ascending=False).head(n) + tm.assert_frame_equal(result, expected) + + def test_series_broadcasting(self): + # smoke test for numpy warnings + # GH 16378, GH 16306 + df = DataFrame([1.0, 1.0, 1.0]) + df_nan = DataFrame({'A': [np.nan, 2.0, np.nan]}) + s = Series([1, 1, 1]) + s_nan = Series([np.nan, np.nan, 1]) + 
+ with tm.assert_produces_warning(None): + df_nan.clip_lower(s, axis=0) + for op in ['lt', 'le', 'gt', 'ge', 'eq', 'ne']: + getattr(df, op)(s_nan, axis=0) diff --git a/pandas/tests/frame/test_api.py b/pandas/tests/frame/test_api.py new file mode 100644 index 0000000000000..a62fcb506a34b --- /dev/null +++ b/pandas/tests/frame/test_api.py @@ -0,0 +1,437 @@ +# -*- coding: utf-8 -*- + +from __future__ import print_function + +import pytest + +# pylint: disable-msg=W0612,E1101 +from copy import deepcopy +import sys +from distutils.version import LooseVersion + +from pandas.compat import range, lrange +from pandas import compat + +from numpy.random import randn +import numpy as np + +from pandas import DataFrame, Series, date_range, timedelta_range +import pandas as pd + +from pandas.util.testing import (assert_almost_equal, + assert_series_equal, + assert_frame_equal) + +import pandas.util.testing as tm + +from pandas.tests.frame.common import TestData + + +class SharedWithSparse(object): + """ + A collection of tests DataFrame and SparseDataFrame can share. + + In generic tests on this class, use ``self._assert_frame_equal()`` and + ``self._assert_series_equal()`` which are implemented in sub-classes + and dispatch correctly. + """ + def _assert_frame_equal(self, left, right): + """Dispatch to frame class dependent assertion""" + raise NotImplementedError + + def _assert_series_equal(self, left, right): + """Dispatch to series class dependent assertion""" + raise NotImplementedError + + def test_copy_index_name_checking(self): + # don't want to be able to modify the index stored elsewhere after + # making a copy + for attr in ('index', 'columns'): + ind = getattr(self.frame, attr) + ind.name = None + cp = self.frame.copy() + getattr(cp, attr).name = 'foo' + assert getattr(self.frame, attr).name is None + + def test_getitem_pop_assign_name(self): + s = self.frame['A'] + assert s.name == 'A' + + s = self.frame.pop('A') + assert s.name == 'A' + + s = self.frame.loc[:, 'B'] + assert s.name == 'B' + + s2 = s.loc[:] + assert s2.name == 'B' + + def test_get_value(self): + for idx in self.frame.index: + for col in self.frame.columns: + result = self.frame.get_value(idx, col) + expected = self.frame[col][idx] + tm.assert_almost_equal(result, expected) + + def test_add_prefix_suffix(self): + with_prefix = self.frame.add_prefix('foo#') + expected = pd.Index(['foo#%s' % c for c in self.frame.columns]) + tm.assert_index_equal(with_prefix.columns, expected) + + with_suffix = self.frame.add_suffix('#foo') + expected = pd.Index(['%s#foo' % c for c in self.frame.columns]) + tm.assert_index_equal(with_suffix.columns, expected) + + with_pct_prefix = self.frame.add_prefix('%') + expected = pd.Index(['%{}'.format(c) for c in self.frame.columns]) + tm.assert_index_equal(with_pct_prefix.columns, expected) + + with_pct_suffix = self.frame.add_suffix('%') + expected = pd.Index(['{}%'.format(c) for c in self.frame.columns]) + tm.assert_index_equal(with_pct_suffix.columns, expected) + + def test_get_axis(self): + f = self.frame + assert f._get_axis_number(0) == 0 + assert f._get_axis_number(1) == 1 + assert f._get_axis_number('index') == 0 + assert f._get_axis_number('rows') == 0 + assert f._get_axis_number('columns') == 1 + + assert f._get_axis_name(0) == 'index' + assert f._get_axis_name(1) == 'columns' + assert f._get_axis_name('index') == 'index' + assert f._get_axis_name('rows') == 'index' + assert f._get_axis_name('columns') == 'columns' + + assert f._get_axis(0) is f.index + assert f._get_axis(1) is f.columns + + 
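The `_get_axis_number`/`_get_axis_name` assertions in `test_get_axis` below encode the axis aliases pandas accepts throughout its API. A small sketch of the same mapping through public methods (toy frame is illustrative; behavior is assumed to match the private helpers the test exercises):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})

# 'index' and 'columns' are string aliases for axis 0 and axis 1
assert df.sum(axis='index').equals(df.sum(axis=0))
assert df.sum(axis='columns').equals(df.sum(axis=1))
```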
tm.assert_raises_regex( + ValueError, 'No axis named', f._get_axis_number, 2) + tm.assert_raises_regex( + ValueError, 'No axis.*foo', f._get_axis_name, 'foo') + tm.assert_raises_regex( + ValueError, 'No axis.*None', f._get_axis_name, None) + tm.assert_raises_regex(ValueError, 'No axis named', + f._get_axis_number, None) + + def test_keys(self): + getkeys = self.frame.keys + assert getkeys() is self.frame.columns + + def test_column_contains_typeerror(self): + try: + self.frame.columns in self.frame + except TypeError: + pass + + def test_not_hashable(self): + df = self.klass([1]) + pytest.raises(TypeError, hash, df) + pytest.raises(TypeError, hash, self.empty) + + def test_new_empty_index(self): + df1 = self.klass(randn(0, 3)) + df2 = self.klass(randn(0, 3)) + df1.index.name = 'foo' + assert df2.index.name is None + + def test_array_interface(self): + with np.errstate(all='ignore'): + result = np.sqrt(self.frame) + assert isinstance(result, type(self.frame)) + assert result.index is self.frame.index + assert result.columns is self.frame.columns + + self._assert_frame_equal(result, self.frame.apply(np.sqrt)) + + def test_get_agg_axis(self): + cols = self.frame._get_agg_axis(0) + assert cols is self.frame.columns + + idx = self.frame._get_agg_axis(1) + assert idx is self.frame.index + + pytest.raises(ValueError, self.frame._get_agg_axis, 2) + + def test_nonzero(self): + assert self.empty.empty + + assert not self.frame.empty + assert not self.mixed_frame.empty + + # corner case + df = DataFrame({'A': [1., 2., 3.], + 'B': ['a', 'b', 'c']}, + index=np.arange(3)) + del df['A'] + assert not df.empty + + def test_iteritems(self): + df = self.klass([[1, 2, 3], [4, 5, 6]], columns=['a', 'a', 'b']) + for k, v in compat.iteritems(df): + assert isinstance(v, self.klass._constructor_sliced) + + def test_items(self): + # issue #17213, #13918 + cols = ['a', 'b', 'c'] + df = DataFrame([[1, 2, 3], [4, 5, 6]], columns=cols) + for c, (k, v) in zip(cols, df.items()): + assert c == k + assert isinstance(v, Series) + assert (df[k] == v).all() + + def test_iter(self): + assert tm.equalContents(list(self.frame), self.frame.columns) + + def test_iterrows(self): + for k, v in self.frame.iterrows(): + exp = self.frame.loc[k] + self._assert_series_equal(v, exp) + + for k, v in self.mixed_frame.iterrows(): + exp = self.mixed_frame.loc[k] + self._assert_series_equal(v, exp) + + def test_itertuples(self): + for i, tup in enumerate(self.frame.itertuples()): + s = self.klass._constructor_sliced(tup[1:]) + s.name = tup[0] + expected = self.frame.iloc[i, :].reset_index(drop=True) + self._assert_series_equal(s, expected) + + df = self.klass({'floats': np.random.randn(5), + 'ints': lrange(5)}, columns=['floats', 'ints']) + + for tup in df.itertuples(index=False): + assert isinstance(tup[1], np.integer) + + df = self.klass(data={"a": [1, 2, 3], "b": [4, 5, 6]}) + dfaa = df[['a', 'a']] + + assert (list(dfaa.itertuples()) == + [(0, 1, 1), (1, 2, 2), (2, 3, 3)]) + assert (repr(list(df.itertuples(name=None))) == + '[(0, 1, 4), (1, 2, 5), (2, 3, 6)]') + + tup = next(df.itertuples(name='TestName')) + + if sys.version >= LooseVersion('2.7'): + assert tup._fields == ('Index', 'a', 'b') + assert (tup.Index, tup.a, tup.b) == tup + assert type(tup).__name__ == 'TestName' + + df.columns = ['def', 'return'] + tup2 = next(df.itertuples(name='TestName')) + assert tup2 == (0, 1, 4) + + if sys.version >= LooseVersion('2.7'): + assert tup2._fields == ('Index', '_1', '_2') + + df3 = DataFrame(dict(('f' + str(i), [i]) for i in range(1024))) + # 
will raise SyntaxError if trying to create namedtuple + tup3 = next(df3.itertuples()) + assert not hasattr(tup3, '_fields') + assert isinstance(tup3, tuple) + + def test_len(self): + assert len(self.frame) == len(self.frame.index) + + def test_as_matrix(self): + frame = self.frame + mat = frame.as_matrix() + + frameCols = frame.columns + for i, row in enumerate(mat): + for j, value in enumerate(row): + col = frameCols[j] + if np.isnan(value): + assert np.isnan(frame[col][i]) + else: + assert value == frame[col][i] + + # mixed type + mat = self.mixed_frame.as_matrix(['foo', 'A']) + assert mat[0, 0] == 'bar' + + df = self.klass({'real': [1, 2, 3], 'complex': [1j, 2j, 3j]}) + mat = df.as_matrix() + assert mat[0, 0] == 1j + + # single block corner case + mat = self.frame.as_matrix(['A', 'B']) + expected = self.frame.reindex(columns=['A', 'B']).values + assert_almost_equal(mat, expected) + + def test_transpose(self): + frame = self.frame + dft = frame.T + for idx, series in compat.iteritems(dft): + for col, value in compat.iteritems(series): + if np.isnan(value): + assert np.isnan(frame[col][idx]) + else: + assert value == frame[col][idx] + + # mixed type + index, data = tm.getMixedTypeDict() + mixed = self.klass(data, index=index) + + mixed_T = mixed.T + for col, s in compat.iteritems(mixed_T): + assert s.dtype == np.object_ + + def test_swapaxes(self): + df = self.klass(np.random.randn(10, 5)) + self._assert_frame_equal(df.T, df.swapaxes(0, 1)) + self._assert_frame_equal(df.T, df.swapaxes(1, 0)) + self._assert_frame_equal(df, df.swapaxes(0, 0)) + pytest.raises(ValueError, df.swapaxes, 2, 5) + + def test_axis_aliases(self): + f = self.frame + + # reg name + expected = f.sum(axis=0) + result = f.sum(axis='index') + assert_series_equal(result, expected) + + expected = f.sum(axis=1) + result = f.sum(axis='columns') + assert_series_equal(result, expected) + + def test_more_asMatrix(self): + values = self.mixed_frame.as_matrix() + assert values.shape[1] == len(self.mixed_frame.columns) + + def test_repr_with_mi_nat(self): + df = self.klass({'X': [1, 2]}, + index=[[pd.NaT, pd.Timestamp('20130101')], ['a', 'b']]) + res = repr(df) + exp = ' X\nNaT a 1\n2013-01-01 b 2' + assert res == exp + + def test_iteritems_names(self): + for k, v in compat.iteritems(self.mixed_frame): + assert v.name == k + + def test_series_put_names(self): + series = self.mixed_frame._series + for k, v in compat.iteritems(series): + assert v.name == k + + def test_empty_nonzero(self): + df = self.klass([1, 2, 3]) + assert not df.empty + df = self.klass(index=[1], columns=[1]) + assert not df.empty + df = self.klass(index=['a', 'b'], columns=['c', 'd']).dropna() + assert df.empty + assert df.T.empty + empty_frames = [self.klass(), + self.klass(index=[1]), + self.klass(columns=[1]), + self.klass({1: []})] + for df in empty_frames: + assert df.empty + assert df.T.empty + + def test_with_datetimelikes(self): + + df = self.klass({'A': date_range('20130101', periods=10), + 'B': timedelta_range('1 day', periods=10)}) + t = df.T + + result = t.get_dtype_counts() + expected = Series({'object': 10}) + tm.assert_series_equal(result, expected) + + +class TestDataFrameMisc(SharedWithSparse, TestData): + + klass = DataFrame + # SharedWithSparse tests use generic, klass-agnostic assertion + _assert_frame_equal = staticmethod(assert_frame_equal) + _assert_series_equal = staticmethod(assert_series_equal) + + def test_values(self): + self.frame.values[:, 0] = 5. 
+ assert (self.frame.values[:, 0] == 5).all() + + def test_deepcopy(self): + cp = deepcopy(self.frame) + series = cp['A'] + series[:] = 10 + for idx, value in compat.iteritems(series): + assert self.frame['A'][idx] != value + + def test_transpose_get_view(self): + dft = self.frame.T + dft.values[:, 5:10] = 5 + + assert (self.frame.values[5:10] == 5).all() + + def test_inplace_return_self(self): + # re #1893 + + data = DataFrame({'a': ['foo', 'bar', 'baz', 'qux'], + 'b': [0, 0, 1, 1], + 'c': [1, 2, 3, 4]}) + + def _check_f(base, f): + result = f(base) + assert result is None + + # -----DataFrame----- + + # set_index + f = lambda x: x.set_index('a', inplace=True) + _check_f(data.copy(), f) + + # reset_index + f = lambda x: x.reset_index(inplace=True) + _check_f(data.set_index('a'), f) + + # drop_duplicates + f = lambda x: x.drop_duplicates(inplace=True) + _check_f(data.copy(), f) + + # sort + f = lambda x: x.sort_values('b', inplace=True) + _check_f(data.copy(), f) + + # sort_index + f = lambda x: x.sort_index(inplace=True) + _check_f(data.copy(), f) + + # fillna + f = lambda x: x.fillna(0, inplace=True) + _check_f(data.copy(), f) + + # replace + f = lambda x: x.replace(1, 0, inplace=True) + _check_f(data.copy(), f) + + # rename + f = lambda x: x.rename({1: 'foo'}, inplace=True) + _check_f(data.copy(), f) + + # -----Series----- + d = data.copy()['c'] + + # reset_index + f = lambda x: x.reset_index(inplace=True, drop=True) + _check_f(data.set_index('a')['c'], f) + + # fillna + f = lambda x: x.fillna(0, inplace=True) + _check_f(d.copy(), f) + + # replace + f = lambda x: x.replace(1, 0, inplace=True) + _check_f(d.copy(), f) + + # rename + f = lambda x: x.rename({1: 'foo'}, inplace=True) + _check_f(d.copy(), f) diff --git a/pandas/tests/frame/test_apply.py b/pandas/tests/frame/test_apply.py index fe04d1005e003..ab2e810d77634 100644 --- a/pandas/tests/frame/test_apply.py +++ b/pandas/tests/frame/test_apply.py @@ -2,52 +2,53 @@ from __future__ import print_function +import pytest + from datetime import datetime import warnings import numpy as np -from pandas import (notnull, DataFrame, Series, MultiIndex, date_range, +from pandas import (notna, DataFrame, Series, MultiIndex, date_range, Timestamp, compat) import pandas as pd -from pandas.types.dtypes import CategoricalDtype +from pandas.core.dtypes.dtypes import CategoricalDtype from pandas.util.testing import (assert_series_equal, assert_frame_equal) import pandas.util.testing as tm from pandas.tests.frame.common import TestData -class TestDataFrameApply(tm.TestCase, TestData): - - _multiprocess_can_split_ = True +class TestDataFrameApply(TestData): def test_apply(self): with np.errstate(all='ignore'): # ufunc applied = self.frame.apply(np.sqrt) - assert_series_equal(np.sqrt(self.frame['A']), applied['A']) + tm.assert_series_equal(np.sqrt(self.frame['A']), applied['A']) # aggregator applied = self.frame.apply(np.mean) - self.assertEqual(applied['A'], np.mean(self.frame['A'])) + assert applied['A'] == np.mean(self.frame['A']) d = self.frame.index[0] applied = self.frame.apply(np.mean, axis=1) - self.assertEqual(applied[d], np.mean(self.frame.xs(d))) - self.assertIs(applied.index, self.frame.index) # want this + assert applied[d] == np.mean(self.frame.xs(d)) + assert applied.index is self.frame.index # want this # invalid axis df = DataFrame( [[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=['a', 'a', 'c']) - self.assertRaises(ValueError, df.apply, lambda x: x, 2) + pytest.raises(ValueError, df.apply, lambda x: x, 2) - # GH9573 + # see gh-9573 df = 
DataFrame({'c0': ['A', 'A', 'B', 'B'], 'c1': ['C', 'C', 'D', 'D']}) df = df.apply(lambda ts: ts.astype('category')) - self.assertEqual(df.shape, (4, 2)) - self.assertTrue(isinstance(df['c0'].dtype, CategoricalDtype)) - self.assertTrue(isinstance(df['c1'].dtype, CategoricalDtype)) + + assert df.shape == (4, 2) + assert isinstance(df['c0'].dtype, CategoricalDtype) + assert isinstance(df['c1'].dtype, CategoricalDtype) def test_apply_mixed_datetimelike(self): # mixed datetimelike @@ -60,10 +61,10 @@ def test_apply_mixed_datetimelike(self): def test_apply_empty(self): # empty applied = self.empty.apply(np.sqrt) - self.assertTrue(applied.empty) + assert applied.empty applied = self.empty.apply(np.mean) - self.assertTrue(applied.empty) + assert applied.empty no_rows = self.frame[:0] result = no_rows.apply(lambda x: x.mean()) @@ -96,7 +97,7 @@ def test_apply_empty(self): [], index=pd.Index([], dtype=object))) # Ensure that x.append hasn't been called - self.assertEqual(x, []) + assert x == [] def test_apply_standard_nonunique(self): df = DataFrame( @@ -108,17 +109,28 @@ def test_apply_standard_nonunique(self): rs = df.T.apply(lambda s: s[0], axis=0) assert_series_equal(rs, xp) + def test_with_string_args(self): + + for arg in ['sum', 'mean', 'min', 'max', 'std']: + result = self.frame.apply(arg) + expected = getattr(self.frame, arg)() + tm.assert_series_equal(result, expected) + + result = self.frame.apply(arg, axis=1) + expected = getattr(self.frame, arg)(axis=1) + tm.assert_series_equal(result, expected) + def test_apply_broadcast(self): broadcasted = self.frame.apply(np.mean, broadcast=True) agged = self.frame.apply(np.mean) for col, ts in compat.iteritems(broadcasted): - self.assertTrue((ts == agged[col]).all()) + assert (ts == agged[col]).all() broadcasted = self.frame.apply(np.mean, axis=1, broadcast=True) agged = self.frame.apply(np.mean, axis=1) for idx in broadcasted.index: - self.assertTrue((broadcasted.xs(idx) == agged[idx]).all()) + assert (broadcasted.xs(idx) == agged[idx]).all() def test_apply_raw(self): result0 = self.frame.apply(np.mean, raw=True) @@ -138,7 +150,7 @@ def test_apply_raw(self): def test_apply_axis1(self): d = self.frame.index[0] tapplied = self.frame.apply(np.mean, axis=1) - self.assertEqual(tapplied[d], np.mean(self.frame.xs(d))) + assert tapplied[d] == np.mean(self.frame.xs(d)) def test_apply_ignore_failures(self): result = self.mixed_frame._apply_standard(np.mean, 0, @@ -178,10 +190,10 @@ def _checkit(axis=0, raw=False): res = df.apply(f, axis=axis, raw=raw) if is_reduction: agg_axis = df._get_agg_axis(axis) - tm.assertIsInstance(res, Series) - self.assertIs(res.index, agg_axis) + assert isinstance(res, Series) + assert res.index is agg_axis else: - tm.assertIsInstance(res, DataFrame) + assert isinstance(res, DataFrame) _checkit() _checkit(axis=1) @@ -195,7 +207,7 @@ def _checkit(axis=0, raw=False): _check(no_index, lambda x: x.mean()) result = no_cols.apply(lambda x: x.mean(), broadcast=True) - tm.assertIsInstance(result, DataFrame) + assert isinstance(result, DataFrame) def test_apply_with_args_kwds(self): def add_some(x, howmuch=0): @@ -266,18 +278,17 @@ def transform(row): return row def transform2(row): - if (notnull(row['C']) and row['C'].startswith('shin') and + if (notna(row['C']) and row['C'].startswith('shin') and row['A'] == 'foo'): row['D'] = 7 return row try: - transformed = data.apply(transform, axis=1) # noqa + data.apply(transform, axis=1) except AttributeError as e: - self.assertEqual(len(e.args), 2) - self.assertEqual(e.args[1], 'occurred at 
index 4') - self.assertEqual( - e.args[0], "'float' object has no attribute 'startswith'") + assert len(e.args) == 2 + assert e.args[1] == 'occurred at index 4' + assert e.args[0] == "'float' object has no attribute 'startswith'" def test_apply_bug(self): @@ -348,7 +359,7 @@ def test_apply_multi_index(self): s.index = MultiIndex.from_arrays([['a', 'a', 'b'], ['c', 'd', 'd']]) s.columns = ['col1', 'col2'] res = s.apply(lambda x: Series({'min': min(x), 'max': max(x)}), 1) - tm.assertIsInstance(res.index, MultiIndex) + assert isinstance(res.index, MultiIndex) def test_apply_dict(self): @@ -371,23 +382,23 @@ def test_apply_dict(self): def test_applymap(self): applied = self.frame.applymap(lambda x: x * 2) - assert_frame_equal(applied, self.frame * 2) - result = self.frame.applymap(type) + tm.assert_frame_equal(applied, self.frame * 2) + self.frame.applymap(type) - # GH #465, function returning tuples + # gh-465: function returning tuples result = self.frame.applymap(lambda x: (x, x)) - tm.assertIsInstance(result['A'][0], tuple) + assert isinstance(result['A'][0], tuple) - # GH 2909, object conversion to float in constructor? + # gh-2909: object conversion to float in constructor? df = DataFrame(data=[1, 'a']) result = df.applymap(lambda x: x) - self.assertEqual(result.dtypes[0], object) + assert result.dtypes[0] == object df = DataFrame(data=[1., 'a']) result = df.applymap(lambda x: x) - self.assertEqual(result.dtypes[0], object) + assert result.dtypes[0] == object - # GH2786 + # see gh-2786 df = DataFrame(np.random.random((3, 4))) df2 = df.copy() cols = ['a', 'a', 'a', 'a'] @@ -396,16 +407,16 @@ def test_applymap(self): expected = df2.applymap(str) expected.columns = cols result = df.applymap(str) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) # datetime/timedelta df['datetime'] = Timestamp('20130101') df['timedelta'] = pd.Timedelta('1 min') result = df.applymap(str) for f in ['datetime', 'timedelta']: - self.assertEqual(result.loc[0, f], str(df.loc[0, f])) + assert result.loc[0, f] == str(df.loc[0, f]) - # GH 8222 + # see gh-8222 empty_frames = [pd.DataFrame(), pd.DataFrame(columns=list('ABC')), pd.DataFrame(index=list('ABC')), @@ -433,6 +444,15 @@ def test_applymap_box(self): 'd': ['Period', 'Period']}) tm.assert_frame_equal(res, exp) + def test_frame_apply_dont_convert_datetime64(self): + from pandas.tseries.offsets import BDay + df = DataFrame({'x1': [datetime(1996, 1, 1)]}) + + df = df.applymap(lambda x: x + BDay()) + df = df.applymap(lambda x: x + BDay()) + + assert df.x1.dtype == 'M8[ns]' + # See gh-12244 def test_apply_non_numpy_dtype(self): df = DataFrame({'dt': pd.date_range( @@ -448,3 +468,215 @@ def test_apply_non_numpy_dtype(self): df = DataFrame({'dt': ['a', 'b', 'c', 'a']}, dtype='category') result = df.apply(lambda x: x) assert_frame_equal(result, df) + + +def zip_frames(*frames): + """ + take a list of frames, zip the columns together for each + assume that these all have the first frame columns + + return a new frame + """ + columns = frames[0].columns + zipped = [f[c] for c in columns for f in frames] + return pd.concat(zipped, axis=1) + + +class TestDataFrameAggregate(TestData): + + _multiprocess_can_split_ = True + + def test_agg_transform(self): + + with np.errstate(all='ignore'): + + f_sqrt = np.sqrt(self.frame) + f_abs = np.abs(self.frame) + + # ufunc + result = self.frame.transform(np.sqrt) + expected = f_sqrt.copy() + assert_frame_equal(result, expected) + + result = self.frame.apply(np.sqrt) + assert_frame_equal(result, expected) + 
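The `zip_frames` helper above mirrors how `apply`/`transform` lay out per-column results when given a list of functions: one column level for the original labels and one for the function names. A sketch of that layout, assuming the behavior these tests assert for this era of pandas (note that `np.abs` resolves to numpy's `absolute`):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 4.0], 'B': [9.0, 16.0]})
result = df.apply([np.sqrt, np.abs])

# column MultiIndex: (original label, function name)
assert list(result.columns) == [('A', 'sqrt'), ('A', 'absolute'),
                                ('B', 'sqrt'), ('B', 'absolute')]
```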
+        result = self.frame.transform(np.sqrt)
+        assert_frame_equal(result, expected)
+
+        # list-like
+        result = self.frame.apply([np.sqrt])
+        expected = f_sqrt.copy()
+        expected.columns = pd.MultiIndex.from_product(
+            [self.frame.columns, ['sqrt']])
+        assert_frame_equal(result, expected)
+
+        result = self.frame.transform([np.sqrt])
+        assert_frame_equal(result, expected)
+
+        # multiple items in list
+        # the columns are ordered as if each function were applied
+        # per series and the results then concatenated
+        expected = zip_frames(f_sqrt, f_abs)
+        expected.columns = pd.MultiIndex.from_product(
+            [self.frame.columns, ['sqrt', 'absolute']])
+        result = self.frame.apply([np.sqrt, np.abs])
+        assert_frame_equal(result, expected)
+
+        result = self.frame.transform(['sqrt', np.abs])
+        assert_frame_equal(result, expected)
+
+    def test_transform_and_agg_err(self):
+        # cannot both transform and agg
+        def f():
+            self.frame.transform(['max', 'min'])
+        pytest.raises(ValueError, f)
+
+        def f():
+            with np.errstate(all='ignore'):
+                self.frame.agg(['max', 'sqrt'])
+        pytest.raises(ValueError, f)
+
+        def f():
+            with np.errstate(all='ignore'):
+                self.frame.transform(['max', 'sqrt'])
+        pytest.raises(ValueError, f)
+
+        df = pd.DataFrame({'A': range(5), 'B': 5})
+
+        def f():
+            with np.errstate(all='ignore'):
+                df.agg({'A': ['abs', 'sum'], 'B': ['mean', 'max']})
+        # mixing per-column transforms and reducers must raise as well,
+        # mirroring the list cases above
+        pytest.raises(ValueError, f)
+
+    def test_demo(self):
+        # demonstration tests
+        df = pd.DataFrame({'A': range(5), 'B': 5})
+
+        result = df.agg(['min', 'max'])
+        expected = DataFrame({'A': [0, 4], 'B': [5, 5]},
+                             columns=['A', 'B'],
+                             index=['min', 'max'])
+        tm.assert_frame_equal(result, expected)
+
+        result = df.agg({'A': ['min', 'max'], 'B': ['sum', 'max']})
+        expected = DataFrame({'A': [4.0, 0.0, np.nan],
+                              'B': [5.0, np.nan, 25.0]},
+                             columns=['A', 'B'],
+                             index=['max', 'min', 'sum'])
+        tm.assert_frame_equal(result.reindex_like(expected), expected)
+
+    def test_agg_dict_nested_renaming_depr(self):
+
+        df = pd.DataFrame({'A': range(5), 'B': 5})
+
+        # nested renaming
+        with tm.assert_produces_warning(FutureWarning):
+            df.agg({'A': {'foo': 'min'},
+                    'B': {'bar': 'max'}})
+
+    def test_agg_reduce(self):
+        # all reducers
+        expected = zip_frames(self.frame.mean().to_frame(),
+                              self.frame.max().to_frame(),
+                              self.frame.sum().to_frame()).T
+        expected.index = ['mean', 'max', 'sum']
+        result = self.frame.agg(['mean', 'max', 'sum'])
+        assert_frame_equal(result, expected)
+
+        # dict input with scalars
+        result = self.frame.agg({'A': 'mean', 'B': 'sum'})
+        expected = Series([self.frame.A.mean(), self.frame.B.sum()],
+                          index=['A', 'B'])
+        assert_series_equal(result.reindex_like(expected), expected)
+
+        # dict input with lists
+        result = self.frame.agg({'A': ['mean'], 'B': ['sum']})
+        expected = DataFrame({'A': Series([self.frame.A.mean()],
+                                          index=['mean']),
+                              'B': Series([self.frame.B.sum()],
+                                          index=['sum'])})
+        assert_frame_equal(result.reindex_like(expected), expected)
+
+        # dict input with lists of multiple functions
+        result = self.frame.agg({'A': ['mean', 'sum'],
+                                 'B': ['sum', 'max']})
+        expected = DataFrame({'A': Series([self.frame.A.mean(),
+                                           self.frame.A.sum()],
+                                          index=['mean', 'sum']),
+                              'B': Series([self.frame.B.sum(),
+                                           self.frame.B.max()],
+                                          index=['sum', 'max'])})
+        assert_frame_equal(result.reindex_like(expected), expected)
+
+    def test_nuisance_columns(self):
+
+        # GH 15015
+        df = DataFrame({'A': [1, 2, 3],
+                        'B': [1., 2., 3.],
+                        'C': ['foo', 'bar', 'baz'],
+                        'D': pd.date_range('20130101', periods=3)})
+
+        result = df.agg('min')
+        expected = Series([1, 1., 'bar', pd.Timestamp('20130101')],
index=df.columns) + assert_series_equal(result, expected) + + result = df.agg(['min']) + expected = DataFrame([[1, 1., 'bar', pd.Timestamp('20130101')]], + index=['min'], columns=df.columns) + assert_frame_equal(result, expected) + + result = df.agg('sum') + expected = Series([6, 6., 'foobarbaz'], + index=['A', 'B', 'C']) + assert_series_equal(result, expected) + + result = df.agg(['sum']) + expected = DataFrame([[6, 6., 'foobarbaz']], + index=['sum'], columns=['A', 'B', 'C']) + assert_frame_equal(result, expected) + + def test_non_callable_aggregates(self): + + # GH 16405 + # 'size' is a property of frame/series + # validate that this is working + df = DataFrame({'A': [None, 2, 3], + 'B': [1.0, np.nan, 3.0], + 'C': ['foo', None, 'bar']}) + + # Function aggregate + result = df.agg({'A': 'count'}) + expected = pd.Series({'A': 2}) + + assert_series_equal(result, expected) + + # Non-function aggregate + result = df.agg({'A': 'size'}) + expected = pd.Series({'A': 3}) + + assert_series_equal(result, expected) + + # Mix function and non-function aggs + result1 = df.agg(['count', 'size']) + result2 = df.agg({'A': ['count', 'size'], + 'B': ['count', 'size'], + 'C': ['count', 'size']}) + expected = pd.DataFrame({'A': {'count': 2, 'size': 3}, + 'B': {'count': 2, 'size': 3}, + 'C': {'count': 2, 'size': 3}}) + + assert_frame_equal(result1, result2, check_like=True) + assert_frame_equal(result2, expected, check_like=True) + + # Just functional string arg is same as calling df.arg() + result = df.agg('count') + expected = df.count() + + assert_series_equal(result, expected) + + # Just a string attribute arg same as calling df.arg + result = df.agg('size') + expected = df.size + + assert result == expected diff --git a/pandas/tests/frame/test_asof.py b/pandas/tests/frame/test_asof.py index f68219120b48e..fea6a5370109e 100644 --- a/pandas/tests/frame/test_asof.py +++ b/pandas/tests/frame/test_asof.py @@ -1,48 +1,41 @@ # coding=utf-8 -import nose - import numpy as np from pandas import (DataFrame, date_range, Timestamp, Series, to_datetime) -from pandas.util.testing import assert_frame_equal, assert_series_equal import pandas.util.testing as tm from .common import TestData -class TestFrameAsof(TestData, tm.TestCase): - _multiprocess_can_split_ = True - - def setUp(self): +class TestFrameAsof(TestData): + def setup_method(self, method): self.N = N = 50 - rng = date_range('1/1/1990', periods=N, freq='53s') + self.rng = date_range('1/1/1990', periods=N, freq='53s') self.df = DataFrame({'A': np.arange(N), 'B': np.arange(N)}, - index=rng) + index=self.rng) def test_basic(self): - df = self.df.copy() df.loc[15:30, 'A'] = np.nan dates = date_range('1/1/1990', periods=self.N * 3, freq='25s') result = df.asof(dates) - self.assertTrue(result.notnull().all(1).all()) + assert result.notna().all(1).all() lb = df.index[14] ub = df.index[30] dates = list(dates) result = df.asof(dates) - self.assertTrue(result.notnull().all(1).all()) + assert result.notna().all(1).all() mask = (result.index >= lb) & (result.index < ub) rs = result[mask] - self.assertTrue((rs == 14).all(1).all()) + assert (rs == 14).all(1).all() def test_subset(self): - N = 10 rng = date_range('1/1/1990', periods=N, freq='53s') df = DataFrame({'A': np.arange(N), 'B': np.arange(N)}, @@ -54,19 +47,19 @@ def test_subset(self): # with a subset of A should be the same result = df.asof(dates, subset='A') expected = df.asof(dates) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) # same with A/B result = df.asof(dates, subset=['A', 
'B']) expected = df.asof(dates) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) # B gives self.df.asof result = df.asof(dates, subset='B') expected = df.resample('25s', closed='right').ffill().reindex(dates) expected.iloc[20:] = 9 - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_missing(self): # GH 15118 @@ -78,13 +71,38 @@ def test_missing(self): result = df.asof('1989-12-31') expected = Series(index=['A', 'B'], name=Timestamp('1989-12-31')) - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) result = df.asof(to_datetime(['1989-12-31'])) expected = DataFrame(index=to_datetime(['1989-12-31']), columns=['A', 'B'], dtype='float64') - assert_frame_equal(result, expected) - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + tm.assert_frame_equal(result, expected) + + def test_all_nans(self): + # GH 15713 + # DataFrame is all nans + result = DataFrame([np.nan]).asof([0]) + expected = DataFrame([np.nan]) + tm.assert_frame_equal(result, expected) + + # testing non-default indexes, multiple inputs + dates = date_range('1/1/1990', periods=self.N * 3, freq='25s') + result = DataFrame(np.nan, index=self.rng, columns=['A']).asof(dates) + expected = DataFrame(np.nan, index=dates, columns=['A']) + tm.assert_frame_equal(result, expected) + + # testing multiple columns + dates = date_range('1/1/1990', periods=self.N * 3, freq='25s') + result = DataFrame(np.nan, index=self.rng, + columns=['A', 'B', 'C']).asof(dates) + expected = DataFrame(np.nan, index=dates, columns=['A', 'B', 'C']) + tm.assert_frame_equal(result, expected) + + # testing scalar input + result = DataFrame(np.nan, index=[1, 2], columns=['A', 'B']).asof([3]) + expected = DataFrame(np.nan, index=[3], columns=['A', 'B']) + tm.assert_frame_equal(result, expected) + + result = DataFrame(np.nan, index=[1, 2], columns=['A', 'B']).asof(3) + expected = Series(np.nan, index=['A', 'B'], name=3) + tm.assert_series_equal(result, expected) diff --git a/pandas/tests/frame/test_axis_select_reindex.py b/pandas/tests/frame/test_axis_select_reindex.py index ff6215531fc64..e76869bf6712b 100644 --- a/pandas/tests/frame/test_axis_select_reindex.py +++ b/pandas/tests/frame/test_axis_select_reindex.py @@ -2,6 +2,8 @@ from __future__ import print_function +import pytest + from datetime import datetime from numpy import random @@ -9,25 +11,21 @@ from pandas.compat import lrange, lzip, u from pandas import (compat, DataFrame, Series, Index, MultiIndex, - date_range, isnull) + date_range, isna) import pandas as pd -from pandas.util.testing import (assert_series_equal, - assert_frame_equal, - assertRaisesRegexp) +from pandas.util.testing import assert_frame_equal -from pandas.core.common import PerformanceWarning +from pandas.errors import PerformanceWarning import pandas.util.testing as tm from pandas.tests.frame.common import TestData -class TestDataFrameSelectReindex(tm.TestCase, TestData): +class TestDataFrameSelectReindex(TestData): # These are specific reindex-based tests; other indexing tests should go in # test_indexing - _multiprocess_can_split_ = True - def test_drop_names(self): df = DataFrame([[1, 2, 3], [3, 4, 5], [5, 6, 7]], index=['a', 'b', 'c'], @@ -39,29 +37,34 @@ def test_drop_names(self): df_inplace_b.drop('b', inplace=True) df_inplace_e.drop('e', axis=1, inplace=True) for obj in (df_dropped_b, df_dropped_e, df_inplace_b, df_inplace_e): - self.assertEqual(obj.index.name, 'first') - 
self.assertEqual(obj.columns.name, 'second') - self.assertEqual(list(df.columns), ['d', 'e', 'f']) + assert obj.index.name == 'first' + assert obj.columns.name == 'second' + assert list(df.columns) == ['d', 'e', 'f'] - self.assertRaises(ValueError, df.drop, ['g']) - self.assertRaises(ValueError, df.drop, ['g'], 1) + pytest.raises(ValueError, df.drop, ['g']) + pytest.raises(ValueError, df.drop, ['g'], 1) # errors = 'ignore' dropped = df.drop(['g'], errors='ignore') expected = Index(['a', 'b', 'c'], name='first') - self.assert_index_equal(dropped.index, expected) + tm.assert_index_equal(dropped.index, expected) dropped = df.drop(['b', 'g'], errors='ignore') expected = Index(['a', 'c'], name='first') - self.assert_index_equal(dropped.index, expected) + tm.assert_index_equal(dropped.index, expected) dropped = df.drop(['g'], axis=1, errors='ignore') expected = Index(['d', 'e', 'f'], name='second') - self.assert_index_equal(dropped.columns, expected) + tm.assert_index_equal(dropped.columns, expected) dropped = df.drop(['d', 'g'], axis=1, errors='ignore') expected = Index(['e', 'f'], name='second') - self.assert_index_equal(dropped.columns, expected) + tm.assert_index_equal(dropped.columns, expected) + + # GH 16398 + dropped = df.drop([], errors='ignore') + expected = Index(['a', 'b', 'c'], name='first') + tm.assert_index_equal(dropped.index, expected) def test_drop_col_still_multiindex(self): arrays = [['a', 'b', 'c', 'top'], @@ -84,10 +87,10 @@ def test_drop(self): assert_frame_equal(simple.drop( [0, 3], axis='index'), simple.loc[[1, 2], :]) - self.assertRaises(ValueError, simple.drop, 5) - self.assertRaises(ValueError, simple.drop, 'C', 1) - self.assertRaises(ValueError, simple.drop, [1, 5]) - self.assertRaises(ValueError, simple.drop, ['A', 'C'], 1) + pytest.raises(ValueError, simple.drop, 5) + pytest.raises(ValueError, simple.drop, 'C', 1) + pytest.raises(ValueError, simple.drop, [1, 5]) + pytest.raises(ValueError, simple.drop, ['A', 'C'], 1) # errors = 'ignore' assert_frame_equal(simple.drop(5, errors='ignore'), simple) @@ -102,6 +105,7 @@ def test_drop(self): columns=['a', 'a', 'b']) assert_frame_equal(nu_df.drop('a', axis=1), nu_df[['b']]) assert_frame_equal(nu_df.drop('b', axis='columns'), nu_df['a']) + assert_frame_equal(nu_df.drop([]), nu_df) # GH 16398 nu_df = nu_df.set_index(pd.Index(['X', 'Y', 'X'])) nu_df.columns = list('abc') @@ -122,7 +126,7 @@ def test_drop_multiindex_not_lexsorted(self): lexsorted_mi = MultiIndex.from_tuples( [('a', ''), ('b1', 'c1'), ('b2', 'c2')], names=['b', 'c']) lexsorted_df = DataFrame([[1, 3, 4]], columns=lexsorted_mi) - self.assertTrue(lexsorted_df.columns.is_lexsorted()) + assert lexsorted_df.columns.is_lexsorted() # define the non-lexsorted version not_lexsorted_df = DataFrame(columns=['a', 'b', 'c', 'd'], @@ -131,7 +135,7 @@ def test_drop_multiindex_not_lexsorted(self): not_lexsorted_df = not_lexsorted_df.pivot_table( index='a', columns=['b', 'c'], values='d') not_lexsorted_df = not_lexsorted_df.reset_index() - self.assertFalse(not_lexsorted_df.columns.is_lexsorted()) + assert not not_lexsorted_df.columns.is_lexsorted() # compare the results tm.assert_frame_equal(lexsorted_df, not_lexsorted_df) @@ -174,16 +178,16 @@ def test_reindex(self): for idx, val in compat.iteritems(newFrame[col]): if idx in self.frame.index: if np.isnan(val): - self.assertTrue(np.isnan(self.frame[col][idx])) + assert np.isnan(self.frame[col][idx]) else: - self.assertEqual(val, self.frame[col][idx]) + assert val == self.frame[col][idx] else: - self.assertTrue(np.isnan(val)) + 
assert np.isnan(val) for col, series in compat.iteritems(newFrame): - self.assertTrue(tm.equalContents(series.index, newFrame.index)) + assert tm.equalContents(series.index, newFrame.index) emptyFrame = self.frame.reindex(Index([])) - self.assertEqual(len(emptyFrame.index), 0) + assert len(emptyFrame.index) == 0 # Cython code should be unit-tested directly nonContigFrame = self.frame.reindex(self.ts1.index[::2]) @@ -192,41 +196,40 @@ def test_reindex(self): for idx, val in compat.iteritems(nonContigFrame[col]): if idx in self.frame.index: if np.isnan(val): - self.assertTrue(np.isnan(self.frame[col][idx])) + assert np.isnan(self.frame[col][idx]) else: - self.assertEqual(val, self.frame[col][idx]) + assert val == self.frame[col][idx] else: - self.assertTrue(np.isnan(val)) + assert np.isnan(val) for col, series in compat.iteritems(nonContigFrame): - self.assertTrue(tm.equalContents(series.index, - nonContigFrame.index)) + assert tm.equalContents(series.index, nonContigFrame.index) # corner cases # Same index, copies values but not index if copy=False newFrame = self.frame.reindex(self.frame.index, copy=False) - self.assertIs(newFrame.index, self.frame.index) + assert newFrame.index is self.frame.index # length zero newFrame = self.frame.reindex([]) - self.assertTrue(newFrame.empty) - self.assertEqual(len(newFrame.columns), len(self.frame.columns)) + assert newFrame.empty + assert len(newFrame.columns) == len(self.frame.columns) # length zero with columns reindexed with non-empty index newFrame = self.frame.reindex([]) newFrame = newFrame.reindex(self.frame.index) - self.assertEqual(len(newFrame.index), len(self.frame.index)) - self.assertEqual(len(newFrame.columns), len(self.frame.columns)) + assert len(newFrame.index) == len(self.frame.index) + assert len(newFrame.columns) == len(self.frame.columns) # pass non-Index newFrame = self.frame.reindex(list(self.ts1.index)) - self.assert_index_equal(newFrame.index, self.ts1.index) + tm.assert_index_equal(newFrame.index, self.ts1.index) # copy with no axes result = self.frame.reindex() assert_frame_equal(result, self.frame) - self.assertFalse(result is self.frame) + assert result is not self.frame def test_reindex_nan(self): df = pd.DataFrame([[1, 2], [3, 5], [7, 11], [9, 23]], @@ -258,27 +261,27 @@ def test_reindex_name_remains(self): i = Series(np.arange(10), name='iname') df = df.reindex(i) - self.assertEqual(df.index.name, 'iname') + assert df.index.name == 'iname' df = df.reindex(Index(np.arange(10), name='tmpname')) - self.assertEqual(df.index.name, 'tmpname') + assert df.index.name == 'tmpname' s = Series(random.rand(10)) df = DataFrame(s.T, index=np.arange(len(s))) i = Series(np.arange(10), name='iname') df = df.reindex(columns=i) - self.assertEqual(df.columns.name, 'iname') + assert df.columns.name == 'iname' def test_reindex_int(self): smaller = self.intframe.reindex(self.intframe.index[::2]) - self.assertEqual(smaller['A'].dtype, np.int64) + assert smaller['A'].dtype == np.int64 bigger = smaller.reindex(self.intframe.index) - self.assertEqual(bigger['A'].dtype, np.float64) + assert bigger['A'].dtype == np.float64 smaller = self.intframe.reindex(columns=['A', 'B']) - self.assertEqual(smaller['A'].dtype, np.int64) + assert smaller['A'].dtype == np.int64 def test_reindex_like(self): other = self.frame.reindex(index=self.frame.index[:10], @@ -287,15 +290,15 @@ def test_reindex_like(self): assert_frame_equal(other, self.frame.reindex_like(other)) def test_reindex_columns(self): - newFrame = self.frame.reindex(columns=['A', 'B', 'E']) + 
new_frame = self.frame.reindex(columns=['A', 'B', 'E']) - assert_series_equal(newFrame['B'], self.frame['B']) - self.assertTrue(np.isnan(newFrame['E']).all()) - self.assertNotIn('C', newFrame) + tm.assert_series_equal(new_frame['B'], self.frame['B']) + assert np.isnan(new_frame['E']).all() + assert 'C' not in new_frame - # length zero - newFrame = self.frame.reindex(columns=[]) - self.assertTrue(newFrame.empty) + # Length zero + new_frame = self.frame.reindex(columns=[]) + assert new_frame.empty def test_reindex_columns_method(self): @@ -349,15 +352,15 @@ def test_reindex_axes(self): both_freq = df.reindex(index=time_freq, columns=some_cols).index.freq seq_freq = df.reindex(index=time_freq).reindex( columns=some_cols).index.freq - self.assertEqual(index_freq, both_freq) - self.assertEqual(index_freq, seq_freq) + assert index_freq == both_freq + assert index_freq == seq_freq def test_reindex_fill_value(self): df = DataFrame(np.random.randn(10, 4)) # axis=0 result = df.reindex(lrange(15)) - self.assertTrue(np.isnan(result.values[-5:]).all()) + assert np.isnan(result.values[-5:]).all() result = df.reindex(lrange(15), fill_value=0) expected = df.reindex(lrange(15)).fillna(0) @@ -407,37 +410,39 @@ def test_reindex_dups(self): assert_frame_equal(result, expected) # reindex fails - self.assertRaises(ValueError, df.reindex, index=list(range(len(df)))) + pytest.raises(ValueError, df.reindex, index=list(range(len(df)))) def test_align(self): af, bf = self.frame.align(self.frame) - self.assertIsNot(af._data, self.frame._data) + assert af._data is not self.frame._data af, bf = self.frame.align(self.frame, copy=False) - self.assertIs(af._data, self.frame._data) + assert af._data is self.frame._data # axis = 0 other = self.frame.iloc[:-5, :3] af, bf = self.frame.align(other, axis=0, fill_value=-1) - self.assert_index_equal(bf.columns, other.columns) + + tm.assert_index_equal(bf.columns, other.columns) + # test fill value join_idx = self.frame.index.join(other.index) diff_a = self.frame.index.difference(join_idx) diff_b = other.index.difference(join_idx) diff_a_vals = af.reindex(diff_a).values diff_b_vals = bf.reindex(diff_b).values - self.assertTrue((diff_a_vals == -1).all()) + assert (diff_a_vals == -1).all() af, bf = self.frame.align(other, join='right', axis=0) - self.assert_index_equal(bf.columns, other.columns) - self.assert_index_equal(bf.index, other.index) - self.assert_index_equal(af.index, other.index) + tm.assert_index_equal(bf.columns, other.columns) + tm.assert_index_equal(bf.index, other.index) + tm.assert_index_equal(af.index, other.index) # axis = 1 other = self.frame.iloc[:-5, :3].copy() af, bf = self.frame.align(other, axis=1) - self.assert_index_equal(bf.columns, self.frame.columns) - self.assert_index_equal(bf.index, other.index) + tm.assert_index_equal(bf.columns, self.frame.columns) + tm.assert_index_equal(bf.index, other.index) # test fill value join_idx = self.frame.index.join(other.index) @@ -448,42 +453,42 @@ def test_align(self): # TODO(wesm): unused? 
diff_b_vals = bf.reindex(diff_b).values # noqa - self.assertTrue((diff_a_vals == -1).all()) + assert (diff_a_vals == -1).all() af, bf = self.frame.align(other, join='inner', axis=1) - self.assert_index_equal(bf.columns, other.columns) + tm.assert_index_equal(bf.columns, other.columns) af, bf = self.frame.align(other, join='inner', axis=1, method='pad') - self.assert_index_equal(bf.columns, other.columns) + tm.assert_index_equal(bf.columns, other.columns) # test other non-float types af, bf = self.intframe.align(other, join='inner', axis=1, method='pad') - self.assert_index_equal(bf.columns, other.columns) + tm.assert_index_equal(bf.columns, other.columns) af, bf = self.mixed_frame.align(self.mixed_frame, join='inner', axis=1, method='pad') - self.assert_index_equal(bf.columns, self.mixed_frame.columns) + tm.assert_index_equal(bf.columns, self.mixed_frame.columns) af, bf = self.frame.align(other.iloc[:, 0], join='inner', axis=1, method=None, fill_value=None) - self.assert_index_equal(bf.index, Index([])) + tm.assert_index_equal(bf.index, Index([])) af, bf = self.frame.align(other.iloc[:, 0], join='inner', axis=1, method=None, fill_value=0) - self.assert_index_equal(bf.index, Index([])) + tm.assert_index_equal(bf.index, Index([])) # mixed floats/ints af, bf = self.mixed_float.align(other.iloc[:, 0], join='inner', axis=1, method=None, fill_value=0) - self.assert_index_equal(bf.index, Index([])) + tm.assert_index_equal(bf.index, Index([])) af, bf = self.mixed_int.align(other.iloc[:, 0], join='inner', axis=1, method=None, fill_value=0) - self.assert_index_equal(bf.index, Index([])) + tm.assert_index_equal(bf.index, Index([])) - # try to align dataframe to series along bad axis - self.assertRaises(ValueError, self.frame.align, af.iloc[0, :3], - join='inner', axis=2) + # Try to align DataFrame to Series along bad axis + with pytest.raises(ValueError): + self.frame.align(af.iloc[0, :3], join='inner', axis=2) # align dataframe to series with broadcast or not idx = self.frame.index @@ -492,7 +497,7 @@ def test_align(self): left, right = self.frame.align(s, axis=0) tm.assert_index_equal(left.index, self.frame.index) tm.assert_index_equal(right.index, self.frame.index) - self.assertTrue(isinstance(right, Series)) + assert isinstance(right, Series) left, right = self.frame.align(s, broadcast_axis=1) tm.assert_index_equal(left.index, self.frame.index) @@ -501,17 +506,17 @@ def test_align(self): expected[c] = s expected = DataFrame(expected, index=self.frame.index, columns=self.frame.columns) - assert_frame_equal(right, expected) + tm.assert_frame_equal(right, expected) - # GH 9558 + # see gh-9558 df = DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}) result = df[df['a'] == 2] expected = DataFrame([[2, 5]], index=[1], columns=['a', 'b']) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) result = df.where(df['a'] == 2, 0) expected = DataFrame({'a': [0, 2, 0], 'b': [0, 5, 0]}) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def _check_align(self, a, b, axis, fill_axis, how, method, limit=None): aa, ab = a.align(b, axis=axis, join=how, method=method, limit=limit, @@ -657,33 +662,33 @@ def test_align_series_combinations(self): tm.assert_frame_equal(res2, exp1) def test_filter(self): - # items + # Items filtered = self.frame.filter(['A', 'B', 'E']) - self.assertEqual(len(filtered.columns), 2) - self.assertNotIn('E', filtered) + assert len(filtered.columns) == 2 + assert 'E' not in filtered filtered = self.frame.filter(['A', 'B', 'E'], axis='columns') - 
self.assertEqual(len(filtered.columns), 2) - self.assertNotIn('E', filtered) + assert len(filtered.columns) == 2 + assert 'E' not in filtered - # other axis + # Other axis idx = self.frame.index[0:4] filtered = self.frame.filter(idx, axis='index') expected = self.frame.reindex(index=idx) - assert_frame_equal(filtered, expected) + tm.assert_frame_equal(filtered, expected) # like fcopy = self.frame.copy() fcopy['AA'] = 1 filtered = fcopy.filter(like='A') - self.assertEqual(len(filtered.columns), 2) - self.assertIn('AA', filtered) + assert len(filtered.columns) == 2 + assert 'AA' in filtered # like with ints in column names df = DataFrame(0., index=[0, 1, 2], columns=[0, 1, '_A', '_B']) filtered = df.filter(like='_') - self.assertEqual(len(filtered.columns), 2) + assert len(filtered.columns) == 2 # regex with ints in column names # from PR #10384 @@ -691,41 +696,41 @@ def test_filter(self): expected = DataFrame( 0., index=[0, 1, 2], columns=pd.Index([1, 2], dtype=object)) filtered = df.filter(regex='^[0-9]+$') - assert_frame_equal(filtered, expected) + tm.assert_frame_equal(filtered, expected) expected = DataFrame(0., index=[0, 1, 2], columns=[0, '0', 1, '1']) # shouldn't remove anything filtered = expected.filter(regex='^[0-9]+$') - assert_frame_equal(filtered, expected) + tm.assert_frame_equal(filtered, expected) # pass in None - with assertRaisesRegexp(TypeError, 'Must pass'): + with tm.assert_raises_regex(TypeError, 'Must pass'): self.frame.filter() - with assertRaisesRegexp(TypeError, 'Must pass'): + with tm.assert_raises_regex(TypeError, 'Must pass'): self.frame.filter(items=None) - with assertRaisesRegexp(TypeError, 'Must pass'): + with tm.assert_raises_regex(TypeError, 'Must pass'): self.frame.filter(axis=1) # test mutually exclusive arguments - with assertRaisesRegexp(TypeError, 'mutually exclusive'): + with tm.assert_raises_regex(TypeError, 'mutually exclusive'): self.frame.filter(items=['one', 'three'], regex='e$', like='bbi') - with assertRaisesRegexp(TypeError, 'mutually exclusive'): + with tm.assert_raises_regex(TypeError, 'mutually exclusive'): self.frame.filter(items=['one', 'three'], regex='e$', axis=1) - with assertRaisesRegexp(TypeError, 'mutually exclusive'): + with tm.assert_raises_regex(TypeError, 'mutually exclusive'): self.frame.filter(items=['one', 'three'], regex='e$') - with assertRaisesRegexp(TypeError, 'mutually exclusive'): + with tm.assert_raises_regex(TypeError, 'mutually exclusive'): self.frame.filter(items=['one', 'three'], like='bbi', axis=0) - with assertRaisesRegexp(TypeError, 'mutually exclusive'): + with tm.assert_raises_regex(TypeError, 'mutually exclusive'): self.frame.filter(items=['one', 'three'], like='bbi') # objects filtered = self.mixed_frame.filter(like='foo') - self.assertIn('foo', filtered) + assert 'foo' in filtered # unicode columns, won't ascii-encode df = self.frame.rename(columns={'B': u('\u2202')}) filtered = df.filter(like='C') - self.assertTrue('C' in filtered) + assert 'C' in filtered def test_filter_regex_search(self): fcopy = self.frame.copy() @@ -733,8 +738,8 @@ def test_filter_regex_search(self): # regex filtered = fcopy.filter(regex='[A]+') - self.assertEqual(len(filtered.columns), 2) - self.assertIn('AA', filtered) + assert len(filtered.columns) == 2 + assert 'AA' in filtered # doesn't have to be at beginning df = DataFrame({'aBBa': [1, 2], @@ -796,10 +801,10 @@ def test_take(self): assert_frame_equal(result, expected, check_names=False) # illegal indices - self.assertRaises(IndexError, df.take, [3, 1, 2, 30], axis=0) - 
self.assertRaises(IndexError, df.take, [3, 1, 2, -31], axis=0) - self.assertRaises(IndexError, df.take, [3, 1, 2, 5], axis=1) - self.assertRaises(IndexError, df.take, [3, 1, 2, -5], axis=1) + pytest.raises(IndexError, df.take, [3, 1, 2, 30], axis=0) + pytest.raises(IndexError, df.take, [3, 1, 2, -31], axis=0) + pytest.raises(IndexError, df.take, [3, 1, 2, 5], axis=1) + pytest.raises(IndexError, df.take, [3, 1, 2, -5], axis=1) # mixed-dtype order = [4, 1, 2, 0, 3] @@ -846,29 +851,29 @@ def test_reindex_boolean(self): columns=[0, 2]) reindexed = frame.reindex(np.arange(10)) - self.assertEqual(reindexed.values.dtype, np.object_) - self.assertTrue(isnull(reindexed[0][1])) + assert reindexed.values.dtype == np.object_ + assert isna(reindexed[0][1]) reindexed = frame.reindex(columns=lrange(3)) - self.assertEqual(reindexed.values.dtype, np.object_) - self.assertTrue(isnull(reindexed[1]).all()) + assert reindexed.values.dtype == np.object_ + assert isna(reindexed[1]).all() def test_reindex_objects(self): reindexed = self.mixed_frame.reindex(columns=['foo', 'A', 'B']) - self.assertIn('foo', reindexed) + assert 'foo' in reindexed reindexed = self.mixed_frame.reindex(columns=['A', 'B']) - self.assertNotIn('foo', reindexed) + assert 'foo' not in reindexed def test_reindex_corner(self): index = Index(['a', 'b', 'c']) dm = self.empty.reindex(index=[1, 2, 3]) reindexed = dm.reindex(columns=index) - self.assert_index_equal(reindexed.columns, index) + tm.assert_index_equal(reindexed.columns, index) # ints are weird smaller = self.intframe.reindex(columns=['A', 'B', 'E']) - self.assertEqual(smaller['E'].dtype, np.float64) + assert smaller['E'].dtype == np.float64 def test_reindex_axis(self): cols = ['A', 'B', 'E'] @@ -881,7 +886,7 @@ def test_reindex_axis(self): reindexed2 = self.intframe.reindex(index=rows) assert_frame_equal(reindexed1, reindexed2) - self.assertRaises(ValueError, self.intframe.reindex_axis, rows, axis=2) + pytest.raises(ValueError, self.intframe.reindex_axis, rows, axis=2) # no-op case cols = self.frame.columns.copy() diff --git a/pandas/tests/frame/test_block_internals.py b/pandas/tests/frame/test_block_internals.py index 33550670720c3..afa3c4f25789a 100644 --- a/pandas/tests/frame/test_block_internals.py +++ b/pandas/tests/frame/test_block_internals.py @@ -2,6 +2,8 @@ from __future__ import print_function +import pytest + from datetime import datetime, timedelta import itertools @@ -15,8 +17,7 @@ from pandas.util.testing import (assert_almost_equal, assert_series_equal, - assert_frame_equal, - assertRaisesRegexp) + assert_frame_equal) import pandas.util.testing as tm @@ -27,9 +28,7 @@ # structure -class TestDataFrameBlockInternals(tm.TestCase, TestData): - - _multiprocess_can_split_ = True +class TestDataFrameBlockInternals(TestData): def test_cast_internals(self): casted = DataFrame(self.frame._data, dtype=int) @@ -42,18 +41,24 @@ def test_cast_internals(self): def test_consolidate(self): self.frame['E'] = 7. - consolidated = self.frame.consolidate() - self.assertEqual(len(consolidated._data.blocks), 1) + consolidated = self.frame._consolidate() + assert len(consolidated._data.blocks) == 1 # Ensure copy, do I want this? - recons = consolidated.consolidate() - self.assertIsNot(recons, consolidated) - assert_frame_equal(recons, consolidated) + recons = consolidated._consolidate() + assert recons is not consolidated + tm.assert_frame_equal(recons, consolidated) self.frame['F'] = 8. 
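The consolidation tests here lean on the block manager: assigning a new column appends a fresh block, and consolidation merges same-dtype blocks back together. A sketch against those private internals as they existed in this era of pandas (`_consolidate`, `_data`, and `blocks` are implementation details, not public API, so this is illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 2), columns=['A', 'B'])
df['E'] = 7.0  # appending a column adds a second float64 block

# _consolidate() merges blocks of the same dtype back into one
assert len(df._consolidate()._data.blocks) == 1
```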
- self.assertEqual(len(self.frame._data.blocks), 3) - self.frame.consolidate(inplace=True) - self.assertEqual(len(self.frame._data.blocks), 1) + assert len(self.frame._data.blocks) == 3 + + self.frame._consolidate(inplace=True) + assert len(self.frame._data.blocks) == 1 + + def test_consolidate_deprecation(self): + self.frame['E'] = 7 + with tm.assert_produces_warning(FutureWarning): + self.frame.consolidate() def test_consolidate_inplace(self): frame = self.frame.copy() # noqa @@ -64,18 +69,18 @@ def test_consolidate_inplace(self): def test_as_matrix_consolidate(self): self.frame['E'] = 7. - self.assertFalse(self.frame._data.is_consolidated()) + assert not self.frame._data.is_consolidated() _ = self.frame.as_matrix() # noqa - self.assertTrue(self.frame._data.is_consolidated()) + assert self.frame._data.is_consolidated() def test_modify_values(self): self.frame.values[5] = 5 - self.assertTrue((self.frame.values[5] == 5).all()) + assert (self.frame.values[5] == 5).all() # unconsolidated self.frame['E'] = 7. self.frame.values[6] = 6 - self.assertTrue((self.frame.values[6] == 6).all()) + assert (self.frame.values[6] == 6).all() def test_boolean_set_uncons(self): self.frame['E'] = 7. @@ -90,47 +95,47 @@ def test_as_matrix_numeric_cols(self): self.frame['foo'] = 'bar' values = self.frame.as_matrix(['A', 'B', 'C', 'D']) - self.assertEqual(values.dtype, np.float64) + assert values.dtype == np.float64 def test_as_matrix_lcd(self): # mixed lcd values = self.mixed_float.as_matrix(['A', 'B', 'C', 'D']) - self.assertEqual(values.dtype, np.float64) + assert values.dtype == np.float64 values = self.mixed_float.as_matrix(['A', 'B', 'C']) - self.assertEqual(values.dtype, np.float32) + assert values.dtype == np.float32 values = self.mixed_float.as_matrix(['C']) - self.assertEqual(values.dtype, np.float16) + assert values.dtype == np.float16 # GH 10364 # B uint64 forces float because there are other signed int types values = self.mixed_int.as_matrix(['A', 'B', 'C', 'D']) - self.assertEqual(values.dtype, np.float64) + assert values.dtype == np.float64 values = self.mixed_int.as_matrix(['A', 'D']) - self.assertEqual(values.dtype, np.int64) + assert values.dtype == np.int64 # B uint64 forces float because there are other signed int types values = self.mixed_int.as_matrix(['A', 'B', 'C']) - self.assertEqual(values.dtype, np.float64) + assert values.dtype == np.float64 # as B and C are both unsigned, no forcing to float is needed values = self.mixed_int.as_matrix(['B', 'C']) - self.assertEqual(values.dtype, np.uint64) + assert values.dtype == np.uint64 values = self.mixed_int.as_matrix(['A', 'C']) - self.assertEqual(values.dtype, np.int32) + assert values.dtype == np.int32 values = self.mixed_int.as_matrix(['C', 'D']) - self.assertEqual(values.dtype, np.int64) + assert values.dtype == np.int64 values = self.mixed_int.as_matrix(['A']) - self.assertEqual(values.dtype, np.int32) + assert values.dtype == np.int32 values = self.mixed_int.as_matrix(['C']) - self.assertEqual(values.dtype, np.uint8) + assert values.dtype == np.uint8 def test_constructor_with_convert(self): # this is actually mostly a test of lib.maybe_convert_objects @@ -215,8 +220,8 @@ def test_construction_with_mixed(self): # mixed-type frames self.mixed_frame['datetime'] = datetime.now() self.mixed_frame['timedelta'] = timedelta(days=1, seconds=1) - self.assertEqual(self.mixed_frame['datetime'].dtype, 'M8[ns]') - self.assertEqual(self.mixed_frame['timedelta'].dtype, 'm8[ns]') + assert self.mixed_frame['datetime'].dtype == 'M8[ns]' + assert 
self.mixed_frame['timedelta'].dtype == 'm8[ns]' result = self.mixed_frame.get_dtype_counts().sort_values() expected = Series({'float64': 4, 'object': 1, @@ -281,10 +286,10 @@ def f(dtype): columns=["A", "B", "C"], dtype=dtype) - self.assertRaises(NotImplementedError, f, - [("A", "datetime64[h]"), - ("B", "str"), - ("C", "int32")]) + pytest.raises(NotImplementedError, f, + [("A", "datetime64[h]"), + ("B", "str"), + ("C", "int32")]) # these work (though results may be unexpected) f('int64') @@ -302,12 +307,12 @@ def test_equals_different_blocks(self): df1 = df0.reset_index()[["A", "B", "C"]] # this assert verifies that the above operations have # induced a block rearrangement - self.assertTrue(df0._data.blocks[0].dtype != - df1._data.blocks[0].dtype) + assert (df0._data.blocks[0].dtype != df1._data.blocks[0].dtype) + # do the real tests assert_frame_equal(df0, df1) - self.assertTrue(df0.equals(df1)) - self.assertTrue(df1.equals(df0)) + assert df0.equals(df1) + assert df1.equals(df0) def test_copy_blocks(self): # API/ENH 9607 @@ -321,7 +326,7 @@ def test_copy_blocks(self): _df.loc[:, column] = _df[column] + 1 # make sure we did not change the original DataFrame - self.assertFalse(_df[column].equals(df[column])) + assert not _df[column].equals(df[column]) def test_no_copy_blocks(self): # API/ENH 9607 @@ -335,30 +340,30 @@ def test_no_copy_blocks(self): _df.loc[:, column] = _df[column] + 1 # make sure we did change the original DataFrame - self.assertTrue(_df[column].equals(df[column])) + assert _df[column].equals(df[column]) def test_copy(self): cop = self.frame.copy() cop['E'] = cop['A'] - self.assertNotIn('E', self.frame) + assert 'E' not in self.frame # copy objects copy = self.mixed_frame.copy() - self.assertIsNot(copy._data, self.mixed_frame._data) + assert copy._data is not self.mixed_frame._data def test_pickle(self): - unpickled = self.round_trip_pickle(self.mixed_frame) + unpickled = tm.round_trip_pickle(self.mixed_frame) assert_frame_equal(self.mixed_frame, unpickled) # buglet self.mixed_frame._data.ndim # empty - unpickled = self.round_trip_pickle(self.empty) + unpickled = tm.round_trip_pickle(self.empty) repr(unpickled) # tz frame - unpickled = self.round_trip_pickle(self.tzframe) + unpickled = tm.round_trip_pickle(self.tzframe) assert_frame_equal(self.tzframe, unpickled) def test_consolidate_datetime64(self): @@ -394,8 +399,8 @@ def test_consolidate_datetime64(self): tm.assert_index_equal(pd.DatetimeIndex(df.ending), ser_ending.index) def test_is_mixed_type(self): - self.assertFalse(self.frame._is_mixed_type) - self.assertTrue(self.mixed_frame._is_mixed_type) + assert not self.frame._is_mixed_type + assert self.mixed_frame._is_mixed_type def test_get_numeric_data(self): # TODO(wesm): unused? @@ -447,7 +452,7 @@ def test_convert_objects(self): oops = self.mixed_frame.T.T converted = oops._convert(datetime=True) assert_frame_equal(converted, self.mixed_frame) - self.assertEqual(converted['A'].dtype, np.float64) + assert converted['A'].dtype == np.float64 # force numeric conversion self.mixed_frame['H'] = '1.' 
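The `_convert(datetime=True, numeric=True)` calls exercised in this hunk coerce unparseable strings to NaN, forcing a float dtype. The public analogue is `pd.to_numeric` with `errors='coerce'`; a small sketch (toy series is illustrative):

```python
import pandas as pd

s = pd.Series(['1.', '2', 'garbled'])
out = pd.to_numeric(s, errors='coerce')

# unparseable entries become NaN; the dtype falls back to float64
assert out.dtype == 'float64'
assert out.isna().sum() == 1
```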
@@ -459,23 +464,23 @@ def test_convert_objects(self): self.mixed_frame['K'] = '1' self.mixed_frame.loc[0:5, ['J', 'K']] = 'garbled' converted = self.mixed_frame._convert(datetime=True, numeric=True) - self.assertEqual(converted['H'].dtype, 'float64') - self.assertEqual(converted['I'].dtype, 'int64') - self.assertEqual(converted['J'].dtype, 'float64') - self.assertEqual(converted['K'].dtype, 'float64') - self.assertEqual(len(converted['J'].dropna()), l - 5) - self.assertEqual(len(converted['K'].dropna()), l - 5) + assert converted['H'].dtype == 'float64' + assert converted['I'].dtype == 'int64' + assert converted['J'].dtype == 'float64' + assert converted['K'].dtype == 'float64' + assert len(converted['J'].dropna()) == l - 5 + assert len(converted['K'].dropna()) == l - 5 # via astype converted = self.mixed_frame.copy() converted['H'] = converted['H'].astype('float64') converted['I'] = converted['I'].astype('int64') - self.assertEqual(converted['H'].dtype, 'float64') - self.assertEqual(converted['I'].dtype, 'int64') + assert converted['H'].dtype == 'float64' + assert converted['I'].dtype == 'int64' # via astype, but errors converted = self.mixed_frame.copy() - with assertRaisesRegexp(ValueError, 'invalid literal'): + with tm.assert_raises_regex(ValueError, 'invalid literal'): converted['H'].astype('int32') # mixed in a single column @@ -490,6 +495,32 @@ def test_convert_objects_no_conversion(self): mixed2 = mixed1._convert(datetime=True) assert_frame_equal(mixed1, mixed2) + def test_infer_objects(self): + # GH 11221 + df = DataFrame({'a': ['a', 1, 2, 3], + 'b': ['b', 2.0, 3.0, 4.1], + 'c': ['c', datetime(2016, 1, 1), + datetime(2016, 1, 2), + datetime(2016, 1, 3)], + 'd': [1, 2, 3, 'd']}, + columns=['a', 'b', 'c', 'd']) + df = df.iloc[1:].infer_objects() + + assert df['a'].dtype == 'int64' + assert df['b'].dtype == 'float64' + assert df['c'].dtype == 'M8[ns]' + assert df['d'].dtype == 'object' + + expected = DataFrame({'a': [1, 2, 3], + 'b': [2.0, 3.0, 4.1], + 'c': [datetime(2016, 1, 1), + datetime(2016, 1, 2), + datetime(2016, 1, 3)], + 'd': [2, 3, 'd']}, + columns=['a', 'b', 'c', 'd']) + # reconstruct frame to verify inference is same + tm.assert_frame_equal(df.reset_index(drop=True), expected) + def test_stale_cached_series_bug_473(self): # this is chained, but ok @@ -502,7 +533,7 @@ def test_stale_cached_series_bug_473(self): repr(Y) result = Y.sum() # noqa exp = Y['g'].sum() # noqa - self.assertTrue(pd.isnull(Y['g']['c'])) + assert pd.isna(Y['g']['c']) def test_get_X_columns(self): # numeric and object columns @@ -513,8 +544,8 @@ def test_get_X_columns(self): 'd': [None, None, None], 'e': [3.14, 0.577, 2.773]}) - self.assert_index_equal(df._get_numeric_data().columns, - pd.Index(['a', 'b', 'e'])) + tm.assert_index_equal(df._get_numeric_data().columns, + pd.Index(['a', 'b', 'e'])) def test_strange_column_corruption_issue(self): # (wesm) Unclear how exactly this is related to internal matters @@ -535,6 +566,6 @@ def test_strange_column_corruption_issue(self): myid = 100 - first = len(df.loc[pd.isnull(df[myid]), [myid]]) - second = len(df.loc[pd.isnull(df[myid]), [myid]]) - self.assertTrue(first == second == 0) + first = len(df.loc[pd.isna(df[myid]), [myid]]) + second = len(df.loc[pd.isna(df[myid]), [myid]]) + assert first == second == 0 diff --git a/pandas/tests/frame/test_combine_concat.py b/pandas/tests/frame/test_combine_concat.py index 71b6500e7184a..e82faaeef2986 100644 --- a/pandas/tests/frame/test_combine_concat.py +++ b/pandas/tests/frame/test_combine_concat.py @@ -9,20 +9,16 @@ 
import pandas as pd -from pandas import DataFrame, Index, Series, Timestamp +from pandas import DataFrame, Index, Series, Timestamp, date_range from pandas.compat import lrange from pandas.tests.frame.common import TestData import pandas.util.testing as tm -from pandas.util.testing import (assertRaisesRegexp, - assert_frame_equal, - assert_series_equal) +from pandas.util.testing import assert_frame_equal, assert_series_equal -class TestDataFrameConcatCommon(tm.TestCase, TestData): - - _multiprocess_can_split_ = True +class TestDataFrameConcatCommon(TestData): def test_concat_multiple_frames_dtypes(self): @@ -80,11 +76,13 @@ def test_append_series_dict(self): columns=['foo', 'bar', 'baz', 'qux']) series = df.loc[4] - with assertRaisesRegexp(ValueError, 'Indexes have overlapping values'): + with tm.assert_raises_regex(ValueError, + 'Indexes have overlapping values'): df.append(series, verify_integrity=True) series.name = None - with assertRaisesRegexp(TypeError, 'Can only append a Series if ' - 'ignore_index=True'): + with tm.assert_raises_regex(TypeError, + 'Can only append a Series if ' + 'ignore_index=True'): df.append(series, verify_integrity=True) result = df.append(series[::-1], ignore_index=True) @@ -272,7 +270,7 @@ def test_update_raise(self): other = DataFrame([[2., nan], [nan, 7]], index=[1, 3], columns=[1, 2]) - with assertRaisesRegexp(ValueError, "Data overlaps"): + with tm.assert_raises_regex(ValueError, "Data overlaps"): df.update(other, raise_conflict=True) def test_update_from_non_df(self): @@ -305,7 +303,7 @@ def test_join_str_datetime(self): tst = A.join(C, on='aa') - self.assertEqual(len(tst.columns), 3) + assert len(tst.columns) == 3 def test_join_multiindex_leftright(self): # GH 10741 @@ -421,13 +419,29 @@ def test_concat_axis_parameter(self): assert_frame_equal(concatted_1_series, expected_columns_series) # Testing ValueError - with assertRaisesRegexp(ValueError, 'No axis named'): + with tm.assert_raises_regex(ValueError, 'No axis named'): pd.concat([series1, series2], axis='something') - -class TestDataFrameCombineFirst(tm.TestCase, TestData): - - _multiprocess_can_split_ = True + def test_concat_numerical_names(self): + # #15262 # #12223 + df = pd.DataFrame({'col': range(9)}, + dtype='int32', + index=(pd.MultiIndex + .from_product([['A0', 'A1', 'A2'], + ['B0', 'B1', 'B2']], + names=[1, 2]))) + result = pd.concat((df.iloc[:2, :], df.iloc[-2:, :])) + expected = pd.DataFrame({'col': [0, 1, 7, 8]}, + dtype='int32', + index=pd.MultiIndex.from_tuples([('A0', 'B0'), + ('A0', 'B1'), + ('A2', 'B1'), + ('A2', 'B2')], + names=[1, 2])) + tm.assert_frame_equal(result, expected) + + +class TestDataFrameCombineFirst(TestData): def test_combine_first_mixed(self): a = Series(['a', 'b'], index=lrange(2)) @@ -450,7 +464,7 @@ def test_combine_first(self): combined = head.combine_first(tail) reordered_frame = self.frame.reindex(combined.index) assert_frame_equal(combined, reordered_frame) - self.assertTrue(tm.equalContents(combined.columns, self.frame.columns)) + assert tm.equalContents(combined.columns, self.frame.columns) assert_series_equal(combined['A'], reordered_frame['A']) # same index @@ -464,7 +478,7 @@ def test_combine_first(self): combined = fcopy.combine_first(fcopy2) - self.assertTrue((combined['A'] == 1).all()) + assert (combined['A'] == 1).all() assert_series_equal(combined['B'], fcopy['B']) assert_series_equal(combined['C'], fcopy2['C']) assert_series_equal(combined['D'], fcopy['D']) @@ -474,12 +488,12 @@ def test_combine_first(self): head['A'] = 1 combined = 
head.combine_first(tail) - self.assertTrue((combined['A'][:10] == 1).all()) + assert (combined['A'][:10] == 1).all() # reverse overlap tail['A'][:10] = 0 combined = tail.combine_first(head) - self.assertTrue((combined['A'][:10] == 0).all()) + assert (combined['A'][:10] == 0).all() # no overlap f = self.frame[:10] @@ -496,13 +510,13 @@ def test_combine_first(self): assert_frame_equal(comb, self.frame) comb = self.frame.combine_first(DataFrame(index=["faz", "boo"])) - self.assertTrue("faz" in comb.index) + assert "faz" in comb.index # #2525 df = DataFrame({'a': [1]}, index=[datetime(2012, 1, 1)]) df2 = DataFrame({}, columns=['b']) result = df.combine_first(df2) - self.assertTrue('b' in result) + assert 'b' in result def test_combine_first_mixed_bug(self): idx = Index(['a', 'b', 'c', 'e']) @@ -524,7 +538,7 @@ def test_combine_first_mixed_bug(self): "col5": ser3}) combined = frame1.combine_first(frame2) - self.assertEqual(len(combined.columns), 5) + assert len(combined.columns) == 5 # gh 3016 (same as in update) df = DataFrame([[1., 2., False, True], [4., 5., True, False]], @@ -589,28 +603,28 @@ def test_combine_first_align_nan(self): dfa = pd.DataFrame([[pd.Timestamp('2011-01-01'), 2]], columns=['a', 'b']) dfb = pd.DataFrame([[4], [5]], columns=['b']) - self.assertEqual(dfa['a'].dtype, 'datetime64[ns]') - self.assertEqual(dfa['b'].dtype, 'int64') + assert dfa['a'].dtype == 'datetime64[ns]' + assert dfa['b'].dtype == 'int64' res = dfa.combine_first(dfb) exp = pd.DataFrame({'a': [pd.Timestamp('2011-01-01'), pd.NaT], 'b': [2., 5.]}, columns=['a', 'b']) tm.assert_frame_equal(res, exp) - self.assertEqual(res['a'].dtype, 'datetime64[ns]') + assert res['a'].dtype == 'datetime64[ns]' # ToDo: this must be int64 - self.assertEqual(res['b'].dtype, 'float64') + assert res['b'].dtype == 'float64' res = dfa.iloc[:0].combine_first(dfb) exp = pd.DataFrame({'a': [np.nan, np.nan], 'b': [4, 5]}, columns=['a', 'b']) tm.assert_frame_equal(res, exp) # ToDo: this must be datetime64 - self.assertEqual(res['a'].dtype, 'float64') + assert res['a'].dtype == 'float64' # ToDo: this must be int64 - self.assertEqual(res['b'].dtype, 'int64') + assert res['b'].dtype == 'int64' def test_combine_first_timezone(self): - # GH 7630 + # see gh-7630 data1 = pd.to_datetime('20100101 01:01').tz_localize('UTC') df1 = pd.DataFrame(columns=['UTCdatetime', 'abc'], data=data1, @@ -630,10 +644,10 @@ def test_combine_first_timezone(self): index=pd.date_range('20140627', periods=2, freq='D')) tm.assert_frame_equal(res, exp) - self.assertEqual(res['UTCdatetime'].dtype, 'datetime64[ns, UTC]') - self.assertEqual(res['abc'].dtype, 'datetime64[ns, UTC]') + assert res['UTCdatetime'].dtype == 'datetime64[ns, UTC]' + assert res['abc'].dtype == 'datetime64[ns, UTC]' - # GH 10567 + # see gh-10567 dts1 = pd.date_range('2015-01-01', '2015-01-05', tz='UTC') df1 = pd.DataFrame({'DATE': dts1}) dts2 = pd.date_range('2015-01-03', '2015-01-05', tz='UTC') @@ -641,7 +655,7 @@ def test_combine_first_timezone(self): res = df1.combine_first(df2) tm.assert_frame_equal(res, df1) - self.assertEqual(res['DATE'].dtype, 'datetime64[ns, UTC]') + assert res['DATE'].dtype == 'datetime64[ns, UTC]' dts1 = pd.DatetimeIndex(['2011-01-01', 'NaT', '2011-01-03', '2011-01-04'], tz='US/Eastern') @@ -666,7 +680,7 @@ def test_combine_first_timezone(self): # if df1 doesn't have NaN, keep its dtype res = df1.combine_first(df2) tm.assert_frame_equal(res, df1) - self.assertEqual(res['DATE'].dtype, 'datetime64[ns, US/Eastern]') + assert res['DATE'].dtype == 'datetime64[ns, US/Eastern]' 
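One rule ties together the tz cases in `test_combine_first_timezone` here: `combine_first` preserves a tz-aware dtype only when the calling frame needs nothing from `other`; once differently-typed values actually have to be merged in, the result decays to `object`, as the asserts below show. A small sketch of the preserved case (illustrative only, not part of this diff):

```python
import pandas as pd

dts = pd.date_range('2015-01-01', periods=3, tz='UTC')
df1 = pd.DataFrame({'DATE': dts})
df2 = pd.DataFrame({'DATE': dts[:2]})

# df1 has no holes, so nothing is taken from df2 and the dtype survives.
res = df1.combine_first(df2)
assert res['DATE'].dtype == 'datetime64[ns, UTC]'
```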
dts1 = pd.date_range('2015-01-01', '2015-01-02', tz='US/Eastern') df1 = pd.DataFrame({'DATE': dts1}) @@ -679,7 +693,7 @@ def test_combine_first_timezone(self): pd.Timestamp('2015-01-03')] exp = pd.DataFrame({'DATE': exp_dts}) tm.assert_frame_equal(res, exp) - self.assertEqual(res['DATE'].dtype, 'object') + assert res['DATE'].dtype == 'object' def test_combine_first_timedelta(self): data1 = pd.TimedeltaIndex(['1 day', 'NaT', '3 day', '4day']) @@ -692,7 +706,7 @@ def test_combine_first_timedelta(self): '11 day', '3 day', '4 day']) exp = pd.DataFrame({'TD': exp_dts}, index=[1, 2, 3, 4, 5, 7]) tm.assert_frame_equal(res, exp) - self.assertEqual(res['TD'].dtype, 'timedelta64[ns]') + assert res['TD'].dtype == 'timedelta64[ns]' def test_combine_first_period(self): data1 = pd.PeriodIndex(['2011-01', 'NaT', '2011-03', @@ -708,7 +722,7 @@ def test_combine_first_period(self): freq='M') exp = pd.DataFrame({'P': exp_dts}, index=[1, 2, 3, 4, 5, 7]) tm.assert_frame_equal(res, exp) - self.assertEqual(res['P'].dtype, 'object') + assert res['P'].dtype == 'object' # different freq dts2 = pd.PeriodIndex(['2012-01-01', '2012-01-02', @@ -724,7 +738,7 @@ def test_combine_first_period(self): pd.Period('2011-04', freq='M')] exp = pd.DataFrame({'P': exp_dts}, index=[1, 2, 3, 4, 5, 7]) tm.assert_frame_equal(res, exp) - self.assertEqual(res['P'].dtype, 'object') + assert res['P'].dtype == 'object' def test_combine_first_int(self): # GH14687 - integer series that do not align exactly @@ -734,4 +748,41 @@ def test_combine_first_int(self): res = df1.combine_first(df2) tm.assert_frame_equal(res, df1) - self.assertEqual(res['a'].dtype, 'int64') + assert res['a'].dtype == 'int64' + + def test_concat_datetime_datetime64_frame(self): + # #2624 + rows = [] + rows.append([datetime(2010, 1, 1), 1]) + rows.append([datetime(2010, 1, 2), 'hi']) + + df2_obj = DataFrame.from_records(rows, columns=['date', 'test']) + + ind = date_range(start="2000/1/1", freq="D", periods=10) + df1 = DataFrame({'date': ind, 'test': lrange(10)}) + + # it works!
+ pd.concat([df1, df2_obj]) + + +class TestDataFrameUpdate(TestData): + + def test_update_nan(self): + # #15593 #15617 + # test 1 + df1 = DataFrame({'A': [1.0, 2, 3], 'B': date_range('2000', periods=3)}) + df2 = DataFrame({'A': [None, 2, 3]}) + expected = df1.copy() + df1.update(df2, overwrite=False) + + tm.assert_frame_equal(df1, expected) + + # test 2 + df1 = DataFrame({'A': [1.0, None, 3], + 'B': date_range('2000', periods=3)}) + df2 = DataFrame({'A': [None, 2, 3]}) + expected = DataFrame({'A': [1.0, 2, 3], + 'B': date_range('2000', periods=3)}) + df1.update(df2, overwrite=False) + + tm.assert_frame_equal(df1, expected) diff --git a/pandas/tests/frame/test_constructors.py b/pandas/tests/frame/test_constructors.py index 07cf6816330bc..d942330ecd8a6 100644 --- a/pandas/tests/frame/test_constructors.py +++ b/pandas/tests/frame/test_constructors.py @@ -6,25 +6,22 @@ import functools import itertools -import nose - +import pytest from numpy.random import randn import numpy as np import numpy.ma as ma import numpy.ma.mrecords as mrecords -from pandas.types.common import is_integer_dtype +from pandas.core.dtypes.common import is_integer_dtype from pandas.compat import (lmap, long, zip, range, lrange, lzip, OrderedDict, is_platform_little_endian) from pandas import compat -from pandas import (DataFrame, Index, Series, isnull, +from pandas import (DataFrame, Index, Series, isna, MultiIndex, Timedelta, Timestamp, date_range) -from pandas.core.common import PandasError import pandas as pd -import pandas.core.common as com -import pandas.lib as lib +import pandas._libs.lib as lib import pandas.util.testing as tm from pandas.tests.frame.common import TestData @@ -35,16 +32,14 @@ 'int32', 'int64'] -class TestDataFrameConstructors(tm.TestCase, TestData): - - _multiprocess_can_split_ = True +class TestDataFrameConstructors(TestData): def test_constructor(self): df = DataFrame() - self.assertEqual(len(df.index), 0) + assert len(df.index) == 0 df = DataFrame(data={}) - self.assertEqual(len(df.index), 0) + assert len(df.index) == 0 def test_constructor_mixed(self): index, data = tm.getMixedTypeDict() @@ -53,11 +48,11 @@ def test_constructor_mixed(self): indexed_frame = DataFrame(data, index=index) # noqa unindexed_frame = DataFrame(data) # noqa - self.assertEqual(self.mixed_frame['foo'].dtype, np.object_) + assert self.mixed_frame['foo'].dtype == np.object_ def test_constructor_cast_failure(self): foo = DataFrame({'a': ['a', 'b', 'c']}, dtype=np.float64) - self.assertEqual(foo['a'].dtype, object) + assert foo['a'].dtype == object # GH 3010, constructing with odd arrays df = DataFrame(np.ones((4, 2))) @@ -66,8 +61,8 @@ def test_constructor_cast_failure(self): df['foo'] = np.ones((4, 2)).tolist() # this is not ok - self.assertRaises(ValueError, df.__setitem__, tuple(['test']), - np.ones((4, 2))) + pytest.raises(ValueError, df.__setitem__, tuple(['test']), + np.ones((4, 2))) # this is ok df['foo2'] = np.ones((4, 2)).tolist() @@ -81,32 +76,31 @@ def test_constructor_dtype_copy(self): new_df = pd.DataFrame(orig_df, dtype=float, copy=True) new_df['col1'] = 200. - self.assertEqual(orig_df['col1'][0], 1.) + assert orig_df['col1'][0] == 1. 
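`test_constructor_dtype_copy` just above and `test_constructor_dtype_nocast_view` right after it pin down opposite sides of one contract: `copy=True` detaches the new frame from its source, while same-dtype construction from a raw ndarray may alias it. A hedged sketch of both sides (era-specific behaviour; the aliasing half is not guaranteed by modern pandas):

```python
import numpy as np
import pandas as pd

# copy=True: mutating the new frame leaves the source untouched.
orig = pd.DataFrame({'col1': [1.0]})
copied = pd.DataFrame(orig, dtype=float, copy=True)
copied['col1'] = 200.0
assert orig['col1'][0] == 1.0

# Same-dtype construction from an ndarray can alias the input buffer.
arr = np.ones((2, 2))
view = pd.DataFrame(arr, dtype=arr.dtype)
view[0][0] = 99.0
assert arr[0, 0] == 99.0
```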
def test_constructor_dtype_nocast_view(self): df = DataFrame([[1, 2]]) should_be_view = DataFrame(df, dtype=df[0].dtype) should_be_view[0][0] = 99 - self.assertEqual(df.values[0, 0], 99) + assert df.values[0, 0] == 99 should_be_view = DataFrame(df.values, dtype=df[0].dtype) should_be_view[0][0] = 97 - self.assertEqual(df.values[0, 0], 97) + assert df.values[0, 0] == 97 def test_constructor_dtype_list_data(self): df = DataFrame([[1, '2'], [None, 'a']], dtype=object) - self.assertIsNone(df.loc[1, 0]) - self.assertEqual(df.loc[0, 1], '2') + assert df.loc[1, 0] is None + assert df.loc[0, 1] == '2' def test_constructor_list_frames(self): - - # GH 3243 + # see gh-3243 result = DataFrame([DataFrame([])]) - self.assertEqual(result.shape, (1, 0)) + assert result.shape == (1, 0) result = DataFrame([DataFrame(dict(A=lrange(5)))]) - tm.assertIsInstance(result.iloc[0, 0], DataFrame) + assert isinstance(result.iloc[0, 0], DataFrame) def test_constructor_mixed_dtypes(self): @@ -154,8 +148,8 @@ def test_constructor_complex_dtypes(self): b = np.random.rand(10).astype(np.complex128) df = DataFrame({'a': a, 'b': b}) - self.assertEqual(a.dtype, df.a.dtype) - self.assertEqual(b.dtype, df.b.dtype) + assert a.dtype == df.a.dtype + assert b.dtype == df.b.dtype def test_constructor_rec(self): rec = self.frame.to_records(index=False) @@ -166,11 +160,11 @@ def test_constructor_rec(self): index = self.frame.index df = DataFrame(rec) - self.assert_index_equal(df.columns, pd.Index(rec.dtype.names)) + tm.assert_index_equal(df.columns, pd.Index(rec.dtype.names)) df2 = DataFrame(rec, index=index) - self.assert_index_equal(df2.columns, pd.Index(rec.dtype.names)) - self.assert_index_equal(df2.index, index) + tm.assert_index_equal(df2.columns, pd.Index(rec.dtype.names)) + tm.assert_index_equal(df2.index, index) rng = np.arange(len(rec))[::-1] df3 = DataFrame(rec, index=rng, columns=['C', 'B']) @@ -180,7 +174,7 @@ def test_constructor_rec(self): def test_constructor_bool(self): df = DataFrame({0: np.ones(10, dtype=bool), 1: np.zeros(10, dtype=bool)}) - self.assertEqual(df.values.dtype, np.bool_) + assert df.values.dtype == np.bool_ def test_constructor_overflow_int64(self): # see gh-14881 @@ -188,7 +182,7 @@ def test_constructor_overflow_int64(self): dtype=np.uint64) result = DataFrame({'a': values}) - self.assertEqual(result['a'].dtype, np.uint64) + assert result['a'].dtype == np.uint64 # see gh-2355 data_scores = [(6311132704823138710, 273), (2685045978526272070, 23), @@ -199,7 +193,7 @@ def test_constructor_overflow_int64(self): data = np.zeros((len(data_scores),), dtype=dtype) data[:] = data_scores df_crawls = DataFrame(data) - self.assertEqual(df_crawls['uid'].dtype, np.uint64) + assert df_crawls['uid'].dtype == np.uint64 def test_constructor_ordereddict(self): import random @@ -208,15 +202,15 @@ def test_constructor_ordereddict(self): random.shuffle(nums) expected = ['A%d' % i for i in nums] df = DataFrame(OrderedDict(zip(expected, [[0]] * nitems))) - self.assertEqual(expected, list(df.columns)) + assert expected == list(df.columns) def test_constructor_dict(self): frame = DataFrame({'col1': self.ts1, 'col2': self.ts2}) # col2 is padded with NaN - self.assertEqual(len(self.ts1), 30) - self.assertEqual(len(self.ts2), 25) + assert len(self.ts1) == 30 + assert len(self.ts2) == 25 tm.assert_series_equal(self.ts1, frame['col1'], check_names=False) @@ -228,55 +222,55 @@ def test_constructor_dict(self): 'col2': self.ts2}, columns=['col2', 'col3', 'col4']) - self.assertEqual(len(frame), len(self.ts2)) - self.assertNotIn('col1', 
frame) - self.assertTrue(isnull(frame['col3']).all()) + assert len(frame) == len(self.ts2) + assert 'col1' not in frame + assert isna(frame['col3']).all() # Corner cases - self.assertEqual(len(DataFrame({})), 0) + assert len(DataFrame({})) == 0 # mix dict and array, wrong size - no spec for which error should raise # first - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): DataFrame({'A': {'a': 'a', 'b': 'b'}, 'B': ['a', 'b', 'c']}) # Length-one dict micro-optimization frame = DataFrame({'A': {'1': 1, '2': 2}}) - self.assert_index_equal(frame.index, pd.Index(['1', '2'])) + tm.assert_index_equal(frame.index, pd.Index(['1', '2'])) # empty dict plus index idx = Index([0, 1, 2]) frame = DataFrame({}, index=idx) - self.assertIs(frame.index, idx) + assert frame.index is idx # empty with index and columns idx = Index([0, 1, 2]) frame = DataFrame({}, index=idx, columns=idx) - self.assertIs(frame.index, idx) - self.assertIs(frame.columns, idx) - self.assertEqual(len(frame._series), 3) + assert frame.index is idx + assert frame.columns is idx + assert len(frame._series) == 3 # with dict of empty list and Series frame = DataFrame({'A': [], 'B': []}, columns=['A', 'B']) - self.assert_index_equal(frame.index, Index([], dtype=np.int64)) + tm.assert_index_equal(frame.index, Index([], dtype=np.int64)) # GH 14381 # Dict with None value frame_none = DataFrame(dict(a=None), index=[0]) frame_none_list = DataFrame(dict(a=[None]), index=[0]) - tm.assert_equal(frame_none.get_value(0, 'a'), None) - tm.assert_equal(frame_none_list.get_value(0, 'a'), None) + assert frame_none.get_value(0, 'a') is None + assert frame_none_list.get_value(0, 'a') is None tm.assert_frame_equal(frame_none, frame_none_list) # GH10856 # dict with scalar values should raise error, even if columns passed - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): DataFrame({'a': 0.7}) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): DataFrame({'a': 0.7}, columns=['a']) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): DataFrame({'a': 0.7}, columns=['b']) def test_constructor_multi_index(self): @@ -285,47 +279,50 @@ def test_constructor_multi_index(self): tuples = [(2, 3), (3, 3), (3, 3)] mi = MultiIndex.from_tuples(tuples) df = DataFrame(index=mi, columns=mi) - self.assertTrue(pd.isnull(df).values.ravel().all()) + assert pd.isna(df).values.ravel().all() tuples = [(3, 3), (2, 3), (3, 3)] mi = MultiIndex.from_tuples(tuples) df = DataFrame(index=mi, columns=mi) - self.assertTrue(pd.isnull(df).values.ravel().all()) + assert pd.isna(df).values.ravel().all() def test_constructor_error_msgs(self): msg = "Empty data passed with indices specified." # passing an empty array with columns specified. - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): DataFrame(np.empty(0), columns=list('abc')) msg = "Mixing dicts with non-Series may lead to ambiguous ordering." 
# mix dict and array, wrong size - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): DataFrame({'A': {'a': 'a', 'b': 'b'}, 'B': ['a', 'b', 'c']}) # wrong size ndarray, GH 3105 msg = r"Shape of passed values is \(3, 4\), indices imply \(3, 3\)" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): DataFrame(np.arange(12).reshape((4, 3)), columns=['foo', 'bar', 'baz'], index=pd.date_range('2000-01-01', periods=3)) # higher dim raise exception - with tm.assertRaisesRegexp(ValueError, 'Must pass 2-d input'): + with tm.assert_raises_regex(ValueError, 'Must pass 2-d input'): DataFrame(np.zeros((3, 3, 3)), columns=['A', 'B', 'C'], index=[1]) # wrong size axis labels - with tm.assertRaisesRegexp(ValueError, "Shape of passed values is " - r"\(3, 2\), indices imply \(3, 1\)"): + with tm.assert_raises_regex(ValueError, "Shape of passed values " + "is \(3, 2\), indices " + "imply \(3, 1\)"): DataFrame(np.random.rand(2, 3), columns=['A', 'B', 'C'], index=[1]) - with tm.assertRaisesRegexp(ValueError, "Shape of passed values is " - r"\(3, 2\), indices imply \(2, 2\)"): + with tm.assert_raises_regex(ValueError, "Shape of passed values " + "is \(3, 2\), indices " + "imply \(2, 2\)"): DataFrame(np.random.rand(2, 3), columns=['A', 'B'], index=[1, 2]) - with tm.assertRaisesRegexp(ValueError, 'If using all scalar values, ' - 'you must pass an index'): + with tm.assert_raises_regex(ValueError, "If using all scalar " + "values, you must pass " + "an index"): DataFrame({'a': False, 'b': True}) def test_constructor_with_embedded_frames(self): @@ -380,14 +377,14 @@ def test_constructor_dict_cast(self): 'B': {'1': '1', '2': '2', '3': '3'}, } frame = DataFrame(test_data, dtype=float) - self.assertEqual(len(frame), 3) - self.assertEqual(frame['B'].dtype, np.float64) - self.assertEqual(frame['A'].dtype, np.float64) + assert len(frame) == 3 + assert frame['B'].dtype == np.float64 + assert frame['A'].dtype == np.float64 frame = DataFrame(test_data) - self.assertEqual(len(frame), 3) - self.assertEqual(frame['B'].dtype, np.object_) - self.assertEqual(frame['A'].dtype, np.float64) + assert len(frame) == 3 + assert frame['B'].dtype == np.object_ + assert frame['A'].dtype == np.float64 # can't cast to float test_data = { @@ -395,17 +392,17 @@ def test_constructor_dict_cast(self): 'B': dict(zip(range(15), randn(15))) } frame = DataFrame(test_data, dtype=float) - self.assertEqual(len(frame), 20) - self.assertEqual(frame['A'].dtype, np.object_) - self.assertEqual(frame['B'].dtype, np.float64) + assert len(frame) == 20 + assert frame['A'].dtype == np.object_ + assert frame['B'].dtype == np.float64 def test_constructor_dict_dont_upcast(self): d = {'Col1': {'Row1': 'A String', 'Row2': np.nan}} df = DataFrame(d) - tm.assertIsInstance(df['Col1']['Row2'], float) + assert isinstance(df['Col1']['Row2'], float) dm = DataFrame([[1, 2], ['a', 'b']], index=[1, 2], columns=[1, 2]) - tm.assertIsInstance(dm[1][1], int) + assert isinstance(dm[1][1], int) def test_constructor_dict_of_tuples(self): # GH #1491 @@ -496,14 +493,14 @@ def test_constructor_period(self): a = pd.PeriodIndex(['2012-01', 'NaT', '2012-04'], freq='M') b = pd.PeriodIndex(['2012-02-01', '2012-03-01', 'NaT'], freq='D') df = pd.DataFrame({'a': a, 'b': b}) - self.assertEqual(df['a'].dtype, 'object') - self.assertEqual(df['b'].dtype, 'object') + assert df['a'].dtype == 'object' + assert df['b'].dtype == 'object' # list of periods df = pd.DataFrame({'a': a.asobject.tolist(), 'b': 
b.asobject.tolist()}) - self.assertEqual(df['a'].dtype, 'object') - self.assertEqual(df['b'].dtype, 'object') + assert df['a'].dtype == 'object' + assert df['b'].dtype == 'object' def test_nested_dict_frame_constructor(self): rng = pd.period_range('1/1/2000', periods=5) @@ -532,55 +529,55 @@ def _check_basic_constructor(self, empty): # 2-D input frame = DataFrame(mat, columns=['A', 'B', 'C'], index=[1, 2]) - self.assertEqual(len(frame.index), 2) - self.assertEqual(len(frame.columns), 3) + assert len(frame.index) == 2 + assert len(frame.columns) == 3 # 1-D input frame = DataFrame(empty((3,)), columns=['A'], index=[1, 2, 3]) - self.assertEqual(len(frame.index), 3) - self.assertEqual(len(frame.columns), 1) + assert len(frame.index) == 3 + assert len(frame.columns) == 1 # cast type frame = DataFrame(mat, columns=['A', 'B', 'C'], index=[1, 2], dtype=np.int64) - self.assertEqual(frame.values.dtype, np.int64) + assert frame.values.dtype == np.int64 # wrong size axis labels msg = r'Shape of passed values is \(3, 2\), indices imply \(3, 1\)' - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): DataFrame(mat, columns=['A', 'B', 'C'], index=[1]) msg = r'Shape of passed values is \(3, 2\), indices imply \(2, 2\)' - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): DataFrame(mat, columns=['A', 'B'], index=[1, 2]) # higher dim raise exception - with tm.assertRaisesRegexp(ValueError, 'Must pass 2-d input'): + with tm.assert_raises_regex(ValueError, 'Must pass 2-d input'): DataFrame(empty((3, 3, 3)), columns=['A', 'B', 'C'], index=[1]) # automatic labeling frame = DataFrame(mat) - self.assert_index_equal(frame.index, pd.Index(lrange(2))) - self.assert_index_equal(frame.columns, pd.Index(lrange(3))) + tm.assert_index_equal(frame.index, pd.Index(lrange(2))) + tm.assert_index_equal(frame.columns, pd.Index(lrange(3))) frame = DataFrame(mat, index=[1, 2]) - self.assert_index_equal(frame.columns, pd.Index(lrange(3))) + tm.assert_index_equal(frame.columns, pd.Index(lrange(3))) frame = DataFrame(mat, columns=['A', 'B', 'C']) - self.assert_index_equal(frame.index, pd.Index(lrange(2))) + tm.assert_index_equal(frame.index, pd.Index(lrange(2))) # 0-length axis frame = DataFrame(empty((0, 3))) - self.assertEqual(len(frame.index), 0) + assert len(frame.index) == 0 frame = DataFrame(empty((3, 0))) - self.assertEqual(len(frame.columns), 0) + assert len(frame.columns) == 0 def test_constructor_ndarray(self): self._check_basic_constructor(np.ones) frame = DataFrame(['foo', 'bar'], index=[0, 1], columns=['A']) - self.assertEqual(len(frame), 2) + assert len(frame) == 2 def test_constructor_maskedarray(self): self._check_basic_constructor(ma.masked_all) @@ -590,13 +587,13 @@ def test_constructor_maskedarray(self): mat[0, 0] = 1.0 mat[1, 2] = 2.0 frame = DataFrame(mat, columns=['A', 'B', 'C'], index=[1, 2]) - self.assertEqual(1.0, frame['A'][1]) - self.assertEqual(2.0, frame['C'][2]) + assert 1.0 == frame['A'][1] + assert 2.0 == frame['C'][2] # what is this even checking?? 
mat = ma.masked_all((2, 3), dtype=float) frame = DataFrame(mat, columns=['A', 'B', 'C'], index=[1, 2]) - self.assertTrue(np.all(~np.asarray(frame == frame))) + assert np.all(~np.asarray(frame == frame)) def test_constructor_maskedarray_nonfloat(self): # masked int promoted to float @@ -604,66 +601,66 @@ def test_constructor_maskedarray_nonfloat(self): # 2-D input frame = DataFrame(mat, columns=['A', 'B', 'C'], index=[1, 2]) - self.assertEqual(len(frame.index), 2) - self.assertEqual(len(frame.columns), 3) - self.assertTrue(np.all(~np.asarray(frame == frame))) + assert len(frame.index) == 2 + assert len(frame.columns) == 3 + assert np.all(~np.asarray(frame == frame)) # cast type frame = DataFrame(mat, columns=['A', 'B', 'C'], index=[1, 2], dtype=np.float64) - self.assertEqual(frame.values.dtype, np.float64) + assert frame.values.dtype == np.float64 # Check non-masked values mat2 = ma.copy(mat) mat2[0, 0] = 1 mat2[1, 2] = 2 frame = DataFrame(mat2, columns=['A', 'B', 'C'], index=[1, 2]) - self.assertEqual(1, frame['A'][1]) - self.assertEqual(2, frame['C'][2]) + assert 1 == frame['A'][1] + assert 2 == frame['C'][2] # masked np.datetime64 stays (use lib.NaT as null) mat = ma.masked_all((2, 3), dtype='M8[ns]') # 2-D input frame = DataFrame(mat, columns=['A', 'B', 'C'], index=[1, 2]) - self.assertEqual(len(frame.index), 2) - self.assertEqual(len(frame.columns), 3) - self.assertTrue(isnull(frame).values.all()) + assert len(frame.index) == 2 + assert len(frame.columns) == 3 + assert isna(frame).values.all() # cast type frame = DataFrame(mat, columns=['A', 'B', 'C'], index=[1, 2], dtype=np.int64) - self.assertEqual(frame.values.dtype, np.int64) + assert frame.values.dtype == np.int64 # Check non-masked values mat2 = ma.copy(mat) mat2[0, 0] = 1 mat2[1, 2] = 2 frame = DataFrame(mat2, columns=['A', 'B', 'C'], index=[1, 2]) - self.assertEqual(1, frame['A'].view('i8')[1]) - self.assertEqual(2, frame['C'].view('i8')[2]) + assert 1 == frame['A'].view('i8')[1] + assert 2 == frame['C'].view('i8')[2] # masked bool promoted to object mat = ma.masked_all((2, 3), dtype=bool) # 2-D input frame = DataFrame(mat, columns=['A', 'B', 'C'], index=[1, 2]) - self.assertEqual(len(frame.index), 2) - self.assertEqual(len(frame.columns), 3) - self.assertTrue(np.all(~np.asarray(frame == frame))) + assert len(frame.index) == 2 + assert len(frame.columns) == 3 + assert np.all(~np.asarray(frame == frame)) # cast type frame = DataFrame(mat, columns=['A', 'B', 'C'], index=[1, 2], dtype=object) - self.assertEqual(frame.values.dtype, object) + assert frame.values.dtype == object # Check non-masked values mat2 = ma.copy(mat) mat2[0, 0] = True mat2[1, 2] = False frame = DataFrame(mat2, columns=['A', 'B', 'C'], index=[1, 2]) - self.assertEqual(True, frame['A'][1]) - self.assertEqual(False, frame['C'][2]) + assert frame['A'][1] is True + assert frame['C'][2] is False def test_constructor_mrecarray(self): # Ensure mrecarray produces frame identical to dict of masked arrays @@ -710,41 +707,41 @@ def test_constructor_mrecarray(self): def test_constructor_corner(self): df = DataFrame(index=[]) - self.assertEqual(df.values.shape, (0, 0)) + assert df.values.shape == (0, 0) # empty but with specified dtype df = DataFrame(index=lrange(10), columns=['a', 'b'], dtype=object) - self.assertEqual(df.values.dtype, np.object_) + assert df.values.dtype == np.object_ # does not error but ends up float df = DataFrame(index=lrange(10), columns=['a', 'b'], dtype=int) - self.assertEqual(df.values.dtype, np.object_) + assert df.values.dtype == np.object_ # 
#1783 empty dtype object df = DataFrame({}, columns=['foo', 'bar']) - self.assertEqual(df.values.dtype, np.object_) + assert df.values.dtype == np.object_ df = DataFrame({'b': 1}, index=lrange(10), columns=list('abc'), dtype=int) - self.assertEqual(df.values.dtype, np.object_) + assert df.values.dtype == np.object_ def test_constructor_scalar_inference(self): data = {'int': 1, 'bool': True, 'float': 3., 'complex': 4j, 'object': 'foo'} df = DataFrame(data, index=np.arange(10)) - self.assertEqual(df['int'].dtype, np.int64) - self.assertEqual(df['bool'].dtype, np.bool_) - self.assertEqual(df['float'].dtype, np.float64) - self.assertEqual(df['complex'].dtype, np.complex128) - self.assertEqual(df['object'].dtype, np.object_) + assert df['int'].dtype == np.int64 + assert df['bool'].dtype == np.bool_ + assert df['float'].dtype == np.float64 + assert df['complex'].dtype == np.complex128 + assert df['object'].dtype == np.object_ def test_constructor_arrays_and_scalars(self): df = DataFrame({'a': randn(10), 'b': True}) exp = DataFrame({'a': df['a'].values, 'b': [True] * 10}) tm.assert_frame_equal(df, exp) - with tm.assertRaisesRegexp(ValueError, 'must pass an index'): + with tm.assert_raises_regex(ValueError, 'must pass an index'): DataFrame({'a': False, 'b': True}) def test_constructor_DataFrame(self): @@ -752,38 +749,38 @@ def test_constructor_DataFrame(self): tm.assert_frame_equal(df, self.frame) df_casted = DataFrame(self.frame, dtype=np.int64) - self.assertEqual(df_casted.values.dtype, np.int64) + assert df_casted.values.dtype == np.int64 def test_constructor_more(self): # used to be in test_matrix.py arr = randn(10) dm = DataFrame(arr, columns=['A'], index=np.arange(10)) - self.assertEqual(dm.values.ndim, 2) + assert dm.values.ndim == 2 arr = randn(0) dm = DataFrame(arr) - self.assertEqual(dm.values.ndim, 2) - self.assertEqual(dm.values.ndim, 2) + assert dm.values.ndim == 2 + assert dm.values.ndim == 2 # no data specified dm = DataFrame(columns=['A', 'B'], index=np.arange(10)) - self.assertEqual(dm.values.shape, (10, 2)) + assert dm.values.shape == (10, 2) dm = DataFrame(columns=['A', 'B']) - self.assertEqual(dm.values.shape, (0, 2)) + assert dm.values.shape == (0, 2) dm = DataFrame(index=np.arange(10)) - self.assertEqual(dm.values.shape, (10, 0)) + assert dm.values.shape == (10, 0) # corner, silly # TODO: Fix this Exception to be better... 
- with tm.assertRaisesRegexp(PandasError, 'constructor not ' - 'properly called'): + with tm.assert_raises_regex(ValueError, 'constructor not ' + 'properly called'): DataFrame((1, 2, 3)) # can't cast mat = np.array(['foo', 'bar'], dtype=object).reshape(2, 1) - with tm.assertRaisesRegexp(ValueError, 'cast'): + with tm.assert_raises_regex(ValueError, 'cast'): DataFrame(mat, index=[0, 1], columns=[0], dtype=float) dm = DataFrame(DataFrame(self.frame._series)) @@ -794,8 +791,8 @@ def test_constructor_more(self): 'B': np.ones(10, dtype=np.float64)}, index=np.arange(10)) - self.assertEqual(len(dm.columns), 2) - self.assertEqual(dm.values.dtype, np.float64) + assert len(dm.columns) == 2 + assert dm.values.dtype == np.float64 def test_constructor_empty_list(self): df = DataFrame([], index=[]) @@ -819,8 +816,8 @@ def test_constructor_list_of_lists(self): # GH #484 l = [[1, 'a'], [2, 'b']] df = DataFrame(data=l, columns=["num", "str"]) - self.assertTrue(is_integer_dtype(df['num'])) - self.assertEqual(df['str'].dtype, np.object_) + assert is_integer_dtype(df['num']) + assert df['str'].dtype == np.object_ # GH 4851 # list of 0-dim ndarrays @@ -1009,8 +1006,8 @@ class CustomDict(dict): def test_constructor_ragged(self): data = {'A': randn(10), 'B': randn(8)} - with tm.assertRaisesRegexp(ValueError, - 'arrays must all be same length'): + with tm.assert_raises_regex(ValueError, + 'arrays must all be same length'): DataFrame(data) def test_constructor_scalar(self): @@ -1029,10 +1026,10 @@ def test_constructor_mixed_dict_and_Series(self): data['B'] = Series([4, 3, 2, 1], index=['bar', 'qux', 'baz', 'foo']) result = DataFrame(data) - self.assertTrue(result.index.is_monotonic) + assert result.index.is_monotonic # ordering ambiguous, raise exception - with tm.assertRaisesRegexp(ValueError, 'ambiguous ordering'): + with tm.assert_raises_regex(ValueError, 'ambiguous ordering'): DataFrame({'A': ['a', 'b'], 'B': {'a': 'a', 'b': 'b'}}) # this is OK though @@ -1077,8 +1074,8 @@ def test_constructor_orient(self): def test_constructor_Series_named(self): a = Series([1, 2, 3], index=['a', 'b', 'c'], name='x') df = DataFrame(a) - self.assertEqual(df.columns[0], 'x') - self.assert_index_equal(df.index, a.index) + assert df.columns[0] == 'x' + tm.assert_index_equal(df.index, a.index) # ndarray like arr = np.random.randn(10) @@ -1092,12 +1089,12 @@ def test_constructor_Series_named(self): expected = DataFrame({0: s}) tm.assert_frame_equal(df, expected) - self.assertRaises(ValueError, DataFrame, s, columns=[1, 2]) + pytest.raises(ValueError, DataFrame, s, columns=[1, 2]) # #2234 a = Series([], name='x') df = DataFrame(a) - self.assertEqual(df.columns[0], 'x') + assert df.columns[0] == 'x' # series with name and w/o s1 = Series(arr, name='x') @@ -1111,6 +1108,22 @@ def test_constructor_Series_named(self): expected = DataFrame({1: s1, 0: arr}, columns=[0, 1]) tm.assert_frame_equal(df, expected) + def test_constructor_Series_named_and_columns(self): + # GH 9232 validation + + s0 = Series(range(5), name=0) + s1 = Series(range(5), name=1) + + # matching name and column gives standard frame + tm.assert_frame_equal(pd.DataFrame(s0, columns=[0]), + s0.to_frame()) + tm.assert_frame_equal(pd.DataFrame(s1, columns=[1]), + s1.to_frame()) + + # non-matching produces empty frame + assert pd.DataFrame(s0, columns=[1]).empty + assert pd.DataFrame(s1, columns=[0]).empty + def test_constructor_Series_differently_indexed(self): # name s1 = Series([1, 2, 3], index=['a', 'b', 'c'], name='x') @@ -1122,13 +1135,13 @@ def 
test_constructor_Series_differently_indexed(self): df1 = DataFrame(s1, index=other_index) exp1 = DataFrame(s1.reindex(other_index)) - self.assertEqual(df1.columns[0], 'x') + assert df1.columns[0] == 'x' tm.assert_frame_equal(df1, exp1) df2 = DataFrame(s2, index=other_index) exp2 = DataFrame(s2.reindex(other_index)) - self.assertEqual(df2.columns[0], 0) - self.assert_index_equal(df2.index, other_index) + assert df2.columns[0] == 0 + tm.assert_index_equal(df2.index, other_index) tm.assert_frame_equal(df2, exp2) def test_constructor_manager_resize(self): @@ -1137,8 +1150,8 @@ def test_constructor_manager_resize(self): result = DataFrame(self.frame._data, index=index, columns=columns) - self.assert_index_equal(result.index, Index(index)) - self.assert_index_equal(result.columns, Index(columns)) + tm.assert_index_equal(result.index, Index(index)) + tm.assert_index_equal(result.columns, Index(columns)) def test_constructor_from_items(self): items = [(c, self.frame[c]) for c in self.frame.columns] @@ -1158,10 +1171,11 @@ def test_constructor_from_items(self): columns=self.mixed_frame.columns, orient='index') tm.assert_frame_equal(recons, self.mixed_frame) - self.assertEqual(recons['A'].dtype, np.float64) + assert recons['A'].dtype == np.float64 - with tm.assertRaisesRegexp(TypeError, - "Must pass columns with orient='index'"): + with tm.assert_raises_regex(TypeError, + "Must pass columns with " + "orient='index'"): DataFrame.from_items(row_items, orient='index') # orient='index', but thar be tuples @@ -1174,7 +1188,7 @@ def test_constructor_from_items(self): columns=self.mixed_frame.columns, orient='index') tm.assert_frame_equal(recons, self.mixed_frame) - tm.assertIsInstance(recons['foo'][0], tuple) + assert isinstance(recons['foo'][0], tuple) rs = DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])], orient='index', @@ -1188,7 +1202,8 @@ def test_constructor_mix_series_nonseries(self): 'B': list(self.frame['B'])}, columns=['A', 'B']) tm.assert_frame_equal(df, self.frame.loc[:, ['A', 'B']]) - with tm.assertRaisesRegexp(ValueError, 'does not match index length'): + with tm.assert_raises_regex(ValueError, 'does not match ' + 'index length'): DataFrame({'A': self.frame['A'], 'B': list(self.frame['B'])[:-2]}) def test_constructor_miscast_na_int_dtype(self): @@ -1197,8 +1212,8 @@ def test_constructor_miscast_na_int_dtype(self): tm.assert_frame_equal(df, expected) def test_constructor_iterator_failure(self): - with tm.assertRaisesRegexp(TypeError, 'iterator'): - df = DataFrame(iter([1, 2, 3])) # noqa + with tm.assert_raises_regex(TypeError, 'iterator'): + DataFrame(iter([1, 2, 3])) def test_constructor_column_duplicates(self): # it works! 
#2079 @@ -1212,9 +1227,9 @@ def test_constructor_column_duplicates(self): [('a', [8]), ('a', [5])], columns=['a', 'a']) tm.assert_frame_equal(idf, edf) - self.assertRaises(ValueError, DataFrame.from_items, - [('a', [8]), ('a', [5]), ('b', [6])], - columns=['b', 'a', 'a']) + pytest.raises(ValueError, DataFrame.from_items, + [('a', [8]), ('a', [5]), ('b', [6])], + columns=['b', 'a', 'a']) def test_constructor_empty_with_string_dtype(self): # GH 9428 @@ -1245,9 +1260,10 @@ def test_constructor_single_value(self): dtype=object), index=[1, 2], columns=['a', 'c'])) - self.assertRaises(com.PandasError, DataFrame, 'a', [1, 2]) - self.assertRaises(com.PandasError, DataFrame, 'a', columns=['a', 'c']) - with tm.assertRaisesRegexp(TypeError, 'incompatible data and dtype'): + pytest.raises(ValueError, DataFrame, 'a', [1, 2]) + pytest.raises(ValueError, DataFrame, 'a', columns=['a', 'c']) + with tm.assert_raises_regex(TypeError, 'incompatible data ' + 'and dtype'): DataFrame('a', [1, 2], ['a', 'c'], float) def test_constructor_with_datetimes(self): @@ -1304,7 +1320,7 @@ def test_constructor_with_datetimes(self): ind = date_range(start="2000-01-01", freq="D", periods=10) datetimes = [ts.to_pydatetime() for ts in ind] datetime_s = Series(datetimes) - self.assertEqual(datetime_s.dtype, 'M8[ns]') + assert datetime_s.dtype == 'M8[ns]' df = DataFrame({'datetime_s': datetime_s}) result = df.get_dtype_counts() expected = Series({datetime64name: 1}) @@ -1330,12 +1346,12 @@ def test_constructor_with_datetimes(self): dt = tz.localize(datetime(2012, 1, 1)) df = DataFrame({'End Date': dt}, index=[0]) - self.assertEqual(df.iat[0, 0], dt) + assert df.iat[0, 0] == dt tm.assert_series_equal(df.dtypes, Series( {'End Date': 'datetime64[ns, US/Eastern]'})) df = DataFrame([{'End Date': dt}]) - self.assertEqual(df.iat[0, 0], dt) + assert df.iat[0, 0] == dt tm.assert_series_equal(df.dtypes, Series( {'End Date': 'datetime64[ns, US/Eastern]'})) @@ -1343,13 +1359,13 @@ def test_constructor_with_datetimes(self): # GH 8411 dr = date_range('20130101', periods=3) df = DataFrame({'value': dr}) - self.assertTrue(df.iat[0, 0].tz is None) + assert df.iat[0, 0].tz is None dr = date_range('20130101', periods=3, tz='UTC') df = DataFrame({'value': dr}) - self.assertTrue(str(df.iat[0, 0].tz) == 'UTC') + assert str(df.iat[0, 0].tz) == 'UTC' dr = date_range('20130101', periods=3, tz='US/Eastern') df = DataFrame({'value': dr}) - self.assertTrue(str(df.iat[0, 0].tz) == 'US/Eastern') + assert str(df.iat[0, 0].tz) == 'US/Eastern' # GH 7822 # preserve an index with a tz on dict construction @@ -1371,6 +1387,15 @@ def test_constructor_with_datetimes(self): .reset_index(drop=True), 'b': i_no_tz}) tm.assert_frame_equal(df, expected) + def test_constructor_datetimes_with_nulls(self): + # gh-15869 + for arr in [np.array([None, None, None, None, + datetime.now(), None]), + np.array([None, None, datetime.now(), None])]: + result = DataFrame(arr).get_dtype_counts() + expected = Series({'datetime64[ns]': 1}) + tm.assert_series_equal(result, expected) + def test_constructor_for_list_with_dtypes(self): # TODO(wesm): unused intname = np.dtype(np.int_).name # noqa @@ -1441,18 +1466,18 @@ def test_constructor_for_list_with_dtypes(self): def test_constructor_frame_copy(self): cop = DataFrame(self.frame, copy=True) cop['A'] = 5 - self.assertTrue((cop['A'] == 5).all()) - self.assertFalse((self.frame['A'] == 5).all()) + assert (cop['A'] == 5).all() + assert not (self.frame['A'] == 5).all() def test_constructor_ndarray_copy(self): df = DataFrame(self.frame.values)
self.frame.values[5] = 5 - self.assertTrue((df.values[5] == 5).all()) + assert (df.values[5] == 5).all() df = DataFrame(self.frame.values, copy=True) self.frame.values[6] = 6 - self.assertFalse((df.values[6] == 6).all()) + assert not (df.values[6] == 6).all() def test_constructor_series_copy(self): series = self.frame._series @@ -1460,7 +1485,7 @@ def test_constructor_series_copy(self): df = DataFrame({'A': series['A']}) df['A'][:] = 5 - self.assertFalse((series['A'] == 5).all()) + assert not (series['A'] == 5).all() def test_constructor_with_nas(self): # GH 5016 @@ -1471,7 +1496,7 @@ def check(df): df.iloc[:, i] # allow single nans to succeed - indexer = np.arange(len(df.columns))[isnull(df.columns)] + indexer = np.arange(len(df.columns))[isna(df.columns)] if len(indexer) == 1: tm.assert_series_equal(df.iloc[:, indexer[0]], @@ -1482,7 +1507,7 @@ def check(df): def f(): df.loc[:, np.nan] - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) df = DataFrame([[1, 2, 3], [4, 5, 6]], index=[1, np.nan]) check(df) @@ -1501,8 +1526,8 @@ def f(): def test_constructor_lists_to_object_dtype(self): # from #1074 d = DataFrame({'a': [np.nan, False]}) - self.assertEqual(d['a'].dtype, np.object_) - self.assertFalse(d['a'][1]) + assert d['a'].dtype == np.object_ + assert not d['a'][1] def test_from_records_to_records(self): # from numpy documentation @@ -1514,7 +1539,7 @@ def test_from_records_to_records(self): index = pd.Index(np.arange(len(arr))[::-1]) indexed_frame = DataFrame.from_records(arr, index=index) - self.assert_index_equal(indexed_frame.index, index) + tm.assert_index_equal(indexed_frame.index, index) # without names, it should go to last ditch arr2 = np.zeros((2, 3)) @@ -1522,18 +1547,18 @@ def test_from_records_to_records(self): # wrong length msg = r'Shape of passed values is \(3, 2\), indices imply \(3, 1\)' - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): DataFrame.from_records(arr, index=index[:-1]) indexed_frame = DataFrame.from_records(arr, index='f1') # what to do? 
records = indexed_frame.to_records() - self.assertEqual(len(records.dtype.names), 3) + assert len(records.dtype.names) == 3 records = indexed_frame.to_records(index=False) - self.assertEqual(len(records.dtype.names), 2) - self.assertNotIn('index', records.dtype.names) + assert len(records.dtype.names) == 2 + assert 'index' not in records.dtype.names def test_from_records_nones(self): tuples = [(1, 2, None, 3), @@ -1541,7 +1566,7 @@ def test_from_records_nones(self): (None, 2, 5, 3)] df = DataFrame.from_records(tuples, columns=['a', 'b', 'c', 'd']) - self.assertTrue(np.isnan(df['c'][0])) + assert np.isnan(df['c'][0]) def test_from_records_iterator(self): arr = np.array([(1.0, 1.0, 2, 2), (3.0, 3.0, 4, 4), (5., 5., 6, 6), @@ -1606,7 +1631,7 @@ def test_from_records_columns_not_modified(self): df = DataFrame.from_records(tuples, columns=columns, index='a') # noqa - self.assertEqual(columns, original_columns) + assert columns == original_columns def test_from_records_decimal(self): from decimal import Decimal @@ -1614,11 +1639,11 @@ def test_from_records_decimal(self): tuples = [(Decimal('1.5'),), (Decimal('2.5'),), (None,)] df = DataFrame.from_records(tuples, columns=['a']) - self.assertEqual(df['a'].dtype, object) + assert df['a'].dtype == object df = DataFrame.from_records(tuples, columns=['a'], coerce_float=True) - self.assertEqual(df['a'].dtype, np.float64) - self.assertTrue(np.isnan(df['a'].values[-1])) + assert df['a'].dtype == np.float64 + assert np.isnan(df['a'].values[-1]) def test_from_records_duplicates(self): result = DataFrame.from_records([(1, 2, 3), (4, 5, 6)], @@ -1638,12 +1663,12 @@ def create_dict(order_id): documents.append({'order_id': 10, 'quantity': 5}) result = DataFrame.from_records(documents, index='order_id') - self.assertEqual(result.index.name, 'order_id') + assert result.index.name == 'order_id' # MultiIndex result = DataFrame.from_records(documents, index=['order_id', 'quantity']) - self.assertEqual(result.index.names, ('order_id', 'quantity')) + assert result.index.names == ('order_id', 'quantity') def test_from_records_misc_brokenness(self): # #2179 @@ -1692,20 +1717,20 @@ def test_from_records_empty_with_nonempty_fields_gh3682(self): a = np.array([(1, 2)], dtype=[('id', np.int64), ('value', np.int64)]) df = DataFrame.from_records(a, index='id') tm.assert_index_equal(df.index, Index([1], name='id')) - self.assertEqual(df.index.name, 'id') + assert df.index.name == 'id' tm.assert_index_equal(df.columns, Index(['value'])) b = np.array([], dtype=[('id', np.int64), ('value', np.int64)]) df = DataFrame.from_records(b, index='id') tm.assert_index_equal(df.index, Index([], name='id')) - self.assertEqual(df.index.name, 'id') + assert df.index.name == 'id' def test_from_records_with_datetimes(self): # this may fail on certain platforms because of a numpy issue # related GH6140 if not is_platform_little_endian(): - raise nose.SkipTest("known failure of test on non-little endian") + pytest.skip("known failure of test on non-little endian") # construction with a null in a recarray # GH 6140 @@ -1717,7 +1742,7 @@ def test_from_records_with_datetimes(self): try: recarray = np.core.records.fromarrays(arrdata, dtype=dtypes) except (ValueError): - raise nose.SkipTest("known failure of numpy rec array creation") + pytest.skip("known failure of numpy rec array creation") result = DataFrame.from_records(recarray) tm.assert_frame_equal(result, expected) @@ -1794,13 +1819,13 @@ def test_from_records_sequencelike(self): # empty case result = DataFrame.from_records([], columns=['foo', 
'bar', 'baz']) - self.assertEqual(len(result), 0) - self.assert_index_equal(result.columns, - pd.Index(['foo', 'bar', 'baz'])) + assert len(result) == 0 + tm.assert_index_equal(result.columns, + pd.Index(['foo', 'bar', 'baz'])) result = DataFrame.from_records([]) - self.assertEqual(len(result), 0) - self.assertEqual(len(result.columns), 0) + assert len(result) == 0 + assert len(result.columns) == 0 def test_from_records_dictlike(self): @@ -1853,8 +1878,8 @@ def test_from_records_bad_index_column(self): tm.assert_index_equal(df1.index, Index(df.C)) # should fail - self.assertRaises(ValueError, DataFrame.from_records, df, index=[2]) - self.assertRaises(KeyError, DataFrame.from_records, df, index=2) + pytest.raises(ValueError, DataFrame.from_records, df, index=[2]) + pytest.raises(KeyError, DataFrame.from_records, df, index=2) def test_from_records_non_tuple(self): class Record(object): @@ -1880,14 +1905,21 @@ def test_from_records_len0_with_columns(self): result = DataFrame.from_records([], index='foo', columns=['foo', 'bar']) - self.assertTrue(np.array_equal(result.columns, ['bar'])) - self.assertEqual(len(result), 0) - self.assertEqual(result.index.name, 'foo') + assert np.array_equal(result.columns, ['bar']) + assert len(result) == 0 + assert result.index.name == 'foo' + def test_to_frame_with_falsey_names(self): + # GH 16114 + result = Series(name=0).to_frame().dtypes + expected = Series({0: np.float64}) + tm.assert_series_equal(result, expected) + + result = DataFrame(Series(name=0)).dtypes + tm.assert_series_equal(result, expected) -class TestDataFrameConstructorWithDatetimeTZ(tm.TestCase, TestData): - _multiprocess_can_split_ = True +class TestDataFrameConstructorWithDatetimeTZ(TestData): def test_from_dict(self): @@ -1900,8 +1932,8 @@ def test_from_dict(self): # construction df = DataFrame({'A': idx, 'B': dr}) - self.assertTrue(df['A'].dtype, 'M8[ns, US/Eastern') - self.assertTrue(df['A'].name == 'A') + assert df['A'].dtype == 'datetime64[ns, US/Eastern]' + assert df['A'].name == 'A' tm.assert_series_equal(df['A'], Series(idx, name='A')) tm.assert_series_equal(df['B'], Series(dr, name='B')) @@ -1920,8 +1952,28 @@ def test_from_index(self): df2 = DataFrame(Series(idx2)) tm.assert_series_equal(df2[0], Series(idx2, name=0)) -if __name__ == '__main__': - import nose # noqa + def test_frame_dict_constructor_datetime64_1680(self): + dr = date_range('1/1/2012', periods=10) + s = Series(dr, index=dr) + + # it works! + DataFrame({'a': 'foo', 'b': s}, index=dr) + DataFrame({'a': 'foo', 'b': s.values}, index=dr) + + def test_frame_datetime64_mixed_index_ctor_1681(self): + dr = date_range('2011/1/1', '2012/1/1', freq='W-FRI') + ts = Series(dr) + + # it works!
+ d = DataFrame({'A': 'foo', 'B': ts}, index=dr) + assert d['B'].isna().all() + + def test_frame_timeseries_to_records(self): + index = date_range('1/1/2000', periods=10) + df = DataFrame(np.random.randn(10, 3), index=index, + columns=['a', 'b', 'c']) + + result = df.to_records() + assert result['index'].dtype == 'M8[ns]' - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + result = df.to_records(index=False) diff --git a/pandas/tests/frame/test_convert_to.py b/pandas/tests/frame/test_convert_to.py index 53083a602e183..629c695b702fe 100644 --- a/pandas/tests/frame/test_convert_to.py +++ b/pandas/tests/frame/test_convert_to.py @@ -1,8 +1,7 @@ # -*- coding: utf-8 -*- -from __future__ import print_function - -from numpy import nan +import pytest +import collections import numpy as np from pandas import compat @@ -10,57 +9,10 @@ date_range) import pandas.util.testing as tm - from pandas.tests.frame.common import TestData -class TestDataFrameConvertTo(tm.TestCase, TestData): - - _multiprocess_can_split_ = True - - def test_to_dict(self): - test_data = { - 'A': {'1': 1, '2': 2}, - 'B': {'1': '1', '2': '2', '3': '3'}, - } - recons_data = DataFrame(test_data).to_dict() - - for k, v in compat.iteritems(test_data): - for k2, v2 in compat.iteritems(v): - self.assertEqual(v2, recons_data[k][k2]) - - recons_data = DataFrame(test_data).to_dict("l") - - for k, v in compat.iteritems(test_data): - for k2, v2 in compat.iteritems(v): - self.assertEqual(v2, recons_data[k][int(k2) - 1]) - - recons_data = DataFrame(test_data).to_dict("s") - - for k, v in compat.iteritems(test_data): - for k2, v2 in compat.iteritems(v): - self.assertEqual(v2, recons_data[k][k2]) - - recons_data = DataFrame(test_data).to_dict("sp") - expected_split = {'columns': ['A', 'B'], 'index': ['1', '2', '3'], - 'data': [[1.0, '1'], [2.0, '2'], [nan, '3']]} - tm.assert_dict_equal(recons_data, expected_split) - - recons_data = DataFrame(test_data).to_dict("r") - expected_records = [{'A': 1.0, 'B': '1'}, - {'A': 2.0, 'B': '2'}, - {'A': nan, 'B': '3'}] - tm.assertIsInstance(recons_data, list) - self.assertEqual(len(recons_data), 3) - for l, r in zip(recons_data, expected_records): - tm.assert_dict_equal(l, r) - - # GH10844 - recons_data = DataFrame(test_data).to_dict("i") - - for k, v in compat.iteritems(test_data): - for k2, v2 in compat.iteritems(v): - self.assertEqual(v2, recons_data[k2][k]) +class TestDataFrameConvertTo(TestData): def test_to_dict_timestamp(self): @@ -77,10 +29,10 @@ def test_to_dict_timestamp(self): expected_records_mixed = [{'A': tsmp, 'B': 1}, {'A': tsmp, 'B': 2}] - self.assertEqual(test_data.to_dict(orient='records'), - expected_records) - self.assertEqual(test_data_mixed.to_dict(orient='records'), - expected_records_mixed) + assert (test_data.to_dict(orient='records') == + expected_records) + assert (test_data_mixed.to_dict(orient='records') == + expected_records_mixed) expected_series = { 'A': Series([tsmp, tsmp], name='A'), @@ -116,16 +68,16 @@ def test_to_dict_timestamp(self): def test_to_dict_invalid_orient(self): df = DataFrame({'A': [0, 1]}) - self.assertRaises(ValueError, df.to_dict, orient='xinvalid') + pytest.raises(ValueError, df.to_dict, orient='xinvalid') def test_to_records_dt64(self): df = DataFrame([["one", "two", "three"], ["four", "five", "six"]], index=date_range("2012-01-01", "2012-01-02")) - self.assertEqual(df.to_records()['index'][0], df.index[0]) + assert df.to_records()['index'][0] == df.index[0] rs = df.to_records(convert_datetime64=False) -
self.assertEqual(rs['index'][0], df.index.values[0]) + assert rs['index'][0] == df.index.values[0] def test_to_records_with_multindex(self): # GH3189 @@ -134,8 +86,8 @@ def test_to_records_with_multindex(self): data = np.zeros((8, 4)) df = DataFrame(data, index=index) r = df.to_records(index=True)['level_0'] - self.assertTrue('bar' in r) - self.assertTrue('one' not in r) + assert 'bar' in r + assert 'one' not in r def test_to_records_with_Mapping_type(self): import email @@ -161,16 +113,16 @@ def test_to_records_index_name(self): df = DataFrame(np.random.randn(3, 3)) df.index.name = 'X' rs = df.to_records() - self.assertIn('X', rs.dtype.fields) + assert 'X' in rs.dtype.fields df = DataFrame(np.random.randn(3, 3)) rs = df.to_records() - self.assertIn('index', rs.dtype.fields) + assert 'index' in rs.dtype.fields df.index = MultiIndex.from_tuples([('a', 'x'), ('a', 'y'), ('b', 'z')]) df.index.names = ['A', None] rs = df.to_records() - self.assertIn('level_0', rs.dtype.fields) + assert 'level_0' in rs.dtype.fields def test_to_records_with_unicode_index(self): # GH13172 @@ -179,3 +131,108 @@ def test_to_records_with_unicode_index(self): .to_records() expected = np.rec.array([('x', 'y')], dtype=[('a', 'O'), ('b', 'O')]) tm.assert_almost_equal(result, expected) + + def test_to_records_with_unicode_column_names(self): + # xref issue: https://github.com/numpy/numpy/issues/2407 + # Issue #11879. to_records used to raise an exception when used + # with column names containing non-ASCII characters in Python 2 + result = DataFrame(data={u"accented_name_é": [1.0]}).to_records() + + # Note that numpy allows for unicode field names but dtypes need + # to be specified using a dictionary instead of a list of tuples. + expected = np.rec.array( + [(0, 1.0)], + dtype={"names": ["index", u"accented_name_é"], + "formats": [' 0) # create nans bools = df > 0 - mask = isnull(df) + mask = isna(df) expected = bools.astype(float).mask(mask) result = bools.mask(mask) assert_frame_equal(result, expected) @@ -388,11 +391,11 @@ def test_getitem_setitem_ix_negative_integers(self): with catch_warnings(record=True): self.frame.ix[:, [-1]] = 0 - self.assertTrue((self.frame['D'] == 0).all()) + assert (self.frame['D'] == 0).all() df = DataFrame(np.random.randn(8, 4)) with catch_warnings(record=True): - self.assertTrue(isnull(df.ix[:, [-1]].values).all()) + assert isna(df.ix[:, [-1]].values).all() # #1942 a = DataFrame(randn(20, 2), index=[chr(x + 65) for x in range(20)]) @@ -401,28 +404,28 @@ def test_getitem_setitem_ix_negative_integers(self): with catch_warnings(record=True): assert_series_equal(a.ix[-1], a.ix[-2], check_names=False) - self.assertEqual(a.ix[-1].name, 'T') - self.assertEqual(a.ix[-2].name, 'S') + assert a.ix[-1].name == 'T' + assert a.ix[-2].name == 'S' def test_getattr(self): assert_series_equal(self.frame.A, self.frame['A']) - self.assertRaises(AttributeError, getattr, self.frame, - 'NONEXISTENT_NAME') + pytest.raises(AttributeError, getattr, self.frame, + 'NONEXISTENT_NAME') def test_setattr_column(self): df = DataFrame({'foobar': 1}, index=lrange(10)) df.foobar = 5 - self.assertTrue((df.foobar == 5).all()) + assert (df.foobar == 5).all() def test_setitem(self): # not sure what else to do here series = self.frame['A'][::2] self.frame['col5'] = series - self.assertIn('col5', self.frame) + assert 'col5' in self.frame - self.assertEqual(len(series), 15) - self.assertEqual(len(self.frame), 30) + assert len(series) == 15 + assert len(self.frame) == 30 exp = np.ravel(np.column_stack((series.values, [np.nan] * 15)))
exp = Series(exp, index=self.frame.index, name='col5') @@ -432,13 +435,13 @@ def test_setitem(self): self.frame['col6'] = series tm.assert_series_equal(series, self.frame['col6'], check_names=False) - with tm.assertRaises(KeyError): + with pytest.raises(KeyError): self.frame[randn(len(self.frame) + 1)] = 1 # set ndarray arr = randn(len(self.frame)) self.frame['col9'] = arr - self.assertTrue((self.frame['col9'] == arr).all()) + assert (self.frame['col9'] == arr).all() self.frame['col7'] = 5 assert((self.frame['col7'] == 5).all()) @@ -455,14 +458,14 @@ def test_setitem(self): def f(): smaller['col10'] = ['1', '2'] - self.assertRaises(com.SettingWithCopyError, f) - self.assertEqual(smaller['col10'].dtype, np.object_) - self.assertTrue((smaller['col10'] == ['1', '2']).all()) + pytest.raises(com.SettingWithCopyError, f) + assert smaller['col10'].dtype == np.object_ + assert (smaller['col10'] == ['1', '2']).all() # with a dtype for dtype in ['int32', 'int64', 'float32', 'float64']: self.frame[dtype] = np.array(arr, dtype=dtype) - self.assertEqual(self.frame[dtype].dtype.name, dtype) + assert self.frame[dtype].dtype.name == dtype # dtype changing GH4204 df = DataFrame([[0, 0]]) @@ -484,7 +487,7 @@ def test_setitem_always_copy(self): self.frame['E'] = s self.frame['E'][5:10] = nan - self.assertTrue(notnull(s[5:10]).all()) + assert notna(s[5:10]).all() def test_setitem_boolean(self): df = self.frame.copy() @@ -519,8 +522,9 @@ def test_setitem_boolean(self): values[values == 2] = 3 assert_almost_equal(df.values, values) - with assertRaisesRegexp(TypeError, 'Must pass DataFrame with boolean ' - 'values only'): + with tm.assert_raises_regex(TypeError, 'Must pass ' + 'DataFrame with ' + 'boolean values only'): df[df * 0] = 2 # index with DataFrame @@ -538,32 +542,32 @@ def test_setitem_boolean(self): def test_setitem_cast(self): self.frame['D'] = self.frame['D'].astype('i8') - self.assertEqual(self.frame['D'].dtype, np.int64) + assert self.frame['D'].dtype == np.int64 # #669, should not cast? # this is now set to int64, which means a replacement of the column to # the value dtype (and nothing to do with the existing dtype) self.frame['B'] = 0 - self.assertEqual(self.frame['B'].dtype, np.int64) + assert self.frame['B'].dtype == np.int64 # cast if pass array of course self.frame['B'] = np.arange(len(self.frame)) - self.assertTrue(issubclass(self.frame['B'].dtype.type, np.integer)) + assert issubclass(self.frame['B'].dtype.type, np.integer) self.frame['foo'] = 'bar' self.frame['foo'] = 0 - self.assertEqual(self.frame['foo'].dtype, np.int64) + assert self.frame['foo'].dtype == np.int64 self.frame['foo'] = 'bar' self.frame['foo'] = 2.5 - self.assertEqual(self.frame['foo'].dtype, np.float64) + assert self.frame['foo'].dtype == np.float64 self.frame['something'] = 0 - self.assertEqual(self.frame['something'].dtype, np.int64) + assert self.frame['something'].dtype == np.int64 self.frame['something'] = 2 - self.assertEqual(self.frame['something'].dtype, np.int64) + assert self.frame['something'].dtype == np.int64 self.frame['something'] = 2.5 - self.assertEqual(self.frame['something'].dtype, np.float64) + assert self.frame['something'].dtype == np.float64 # GH 7704 # dtype conversion on setting @@ -577,9 +581,9 @@ def test_setitem_cast(self): # Test that data type is preserved . 
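A minimal sketch of the boolean-setting rule asserted above (`df[df * 0] = 2` must fail); the exact exception message varies slightly across pandas versions, so it is printed rather than matched:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["a", "b"])
df[df > 2] = 0       # boolean DataFrame indexer: allowed
try:
    df[df * 0] = 2   # numeric DataFrame indexer: rejected
except TypeError as err:
    print(err)       # "Must pass DataFrame ... boolean values only"
```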
#5782 df = DataFrame({'one': np.arange(6, dtype=np.int8)}) df.loc[1, 'one'] = 6 - self.assertEqual(df.dtypes.one, np.dtype(np.int8)) + assert df.dtypes.one == np.dtype(np.int8) df.one = np.int8(7) - self.assertEqual(df.dtypes.one, np.dtype(np.int8)) + assert df.dtypes.one == np.dtype(np.int8) def test_setitem_boolean_column(self): expected = self.frame.copy() @@ -597,8 +601,8 @@ def test_setitem_corner(self): index=np.arange(3)) del df['B'] df['B'] = [1., 2., 3.] - self.assertIn('B', df) - self.assertEqual(len(df.columns), 2) + assert 'B' in df + assert len(df.columns) == 2 df['A'] = 'beginning' df['E'] = 'foo' @@ -610,29 +614,29 @@ def test_setitem_corner(self): dm = DataFrame(index=self.frame.index) dm['A'] = 'foo' dm['B'] = 'bar' - self.assertEqual(len(dm.columns), 2) - self.assertEqual(dm.values.dtype, np.object_) + assert len(dm.columns) == 2 + assert dm.values.dtype == np.object_ # upcast dm['C'] = 1 - self.assertEqual(dm['C'].dtype, np.int64) + assert dm['C'].dtype == np.int64 dm['E'] = 1. - self.assertEqual(dm['E'].dtype, np.float64) + assert dm['E'].dtype == np.float64 # set existing column dm['A'] = 'bar' - self.assertEqual('bar', dm['A'][0]) + assert 'bar' == dm['A'][0] dm = DataFrame(index=np.arange(3)) dm['A'] = 1 dm['foo'] = 'bar' del dm['foo'] dm['foo'] = 'bar' - self.assertEqual(dm['foo'].dtype, np.object_) + assert dm['foo'].dtype == np.object_ dm['coercable'] = ['1', '2', '3'] - self.assertEqual(dm['coercable'].dtype, np.object_) + assert dm['coercable'].dtype == np.object_ def test_setitem_corner2(self): data = {"title": ['foobar', 'bar', 'foobar'] + ['foobar'] * 17, @@ -644,14 +648,14 @@ def test_setitem_corner2(self): df.loc[ix, ['title']] = 'foobar' df.loc[ix, ['cruft']] = 0 - self.assertEqual(df.loc[1, 'title'], 'foobar') - self.assertEqual(df.loc[1, 'cruft'], 0) + assert df.loc[1, 'title'] == 'foobar' + assert df.loc[1, 'cruft'] == 0 def test_setitem_ambig(self): - # difficulties with mixed-type data + # Difficulties with mixed-type data from decimal import Decimal - # created as float type + # Created as float type dm = DataFrame(index=lrange(3), columns=lrange(3)) coercable_series = Series([Decimal(1) for _ in range(3)], @@ -659,32 +663,29 @@ def test_setitem_ambig(self): uncoercable_series = Series(['foo', 'bzr', 'baz'], index=lrange(3)) dm[0] = np.ones(3) - self.assertEqual(len(dm.columns), 3) - # self.assertIsNone(dm.objects) + assert len(dm.columns) == 3 dm[1] = coercable_series - self.assertEqual(len(dm.columns), 3) - # self.assertIsNone(dm.objects) + assert len(dm.columns) == 3 dm[2] = uncoercable_series - self.assertEqual(len(dm.columns), 3) - # self.assertIsNotNone(dm.objects) - self.assertEqual(dm[2].dtype, np.object_) + assert len(dm.columns) == 3 + assert dm[2].dtype == np.object_ def test_setitem_clear_caches(self): - # GH #304 + # see gh-304 df = DataFrame({'x': [1.1, 2.1, 3.1, 4.1], 'y': [5.1, 6.1, 7.1, 8.1]}, index=[0, 1, 2, 3]) df.insert(2, 'z', np.nan) # cache it foo = df['z'] - df.loc[df.index[2:], 'z'] = 42 expected = Series([np.nan, np.nan, 42, 42], index=df.index, name='z') - self.assertIsNot(df['z'], foo) - assert_series_equal(df['z'], expected) + + assert df['z'] is not foo + tm.assert_series_equal(df['z'], expected) def test_setitem_None(self): # GH #766 @@ -704,7 +705,7 @@ def test_setitem_empty(self): 'c': ['111', '222', '333']}) result = df.copy() - result.loc[result.b.isnull(), 'a'] = result.a + result.loc[result.b.isna(), 'a'] = result.a assert_frame_equal(result, df) def test_setitem_empty_frame_with_boolean(self): @@ -730,10 +731,10 
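The corner-case hunks above hinge on whole-column scalar assignment replacing the column, so the column dtype follows the new value. A short illustration with a hypothetical frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.5, 2.5]})   # hypothetical float column
df["x"] = 0
assert df["x"].dtype == np.int64       # replaced wholesale by an int scalar
df["x"] = 2.5
assert df["x"].dtype == np.float64     # and back to float
```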
@@ def test_getitem_empty_frame_with_boolean(self): def test_delitem_corner(self): f = self.frame.copy() del f['D'] - self.assertEqual(len(f.columns), 3) - self.assertRaises(KeyError, f.__delitem__, 'D') + assert len(f.columns) == 3 + pytest.raises(KeyError, f.__delitem__, 'D') del f['B'] - self.assertEqual(len(f.columns), 2) + assert len(f.columns) == 2 def test_getitem_fancy_2d(self): f = self.frame @@ -773,20 +774,20 @@ def test_getitem_fancy_2d(self): assert_frame_equal(f, exp) with catch_warnings(record=True): - self.assertRaises(ValueError, f.ix.__getitem__, f > 0.5) + pytest.raises(ValueError, f.ix.__getitem__, f > 0.5) def test_slice_floats(self): index = [52195.504153, 52196.303147, 52198.369883] df = DataFrame(np.random.rand(3, 2), index=index) s1 = df.loc[52195.1:52196.5] - self.assertEqual(len(s1), 2) + assert len(s1) == 2 s1 = df.loc[52195.1:52196.6] - self.assertEqual(len(s1), 2) + assert len(s1) == 2 s1 = df.loc[52195.1:52198.9] - self.assertEqual(len(s1), 3) + assert len(s1) == 3 def test_getitem_fancy_slice_integers_step(self): df = DataFrame(np.random.randn(10, 5)) @@ -794,7 +795,7 @@ def test_getitem_fancy_slice_integers_step(self): # this is OK result = df.iloc[:8:2] # noqa df.iloc[:8:2] = np.nan - self.assertTrue(isnull(df.iloc[:8:2]).values.all()) + assert isna(df.iloc[:8:2]).values.all() def test_getitem_setitem_integer_slice_keyerrors(self): df = DataFrame(np.random.randn(10, 5), index=lrange(0, 20, 2)) @@ -802,12 +803,12 @@ def test_getitem_setitem_integer_slice_keyerrors(self): # this is OK cp = df.copy() cp.iloc[4:10] = 0 - self.assertTrue((cp.iloc[4:10] == 0).values.all()) + assert (cp.iloc[4:10] == 0).values.all() # so is this cp = df.copy() cp.iloc[3:11] = 0 - self.assertTrue((cp.iloc[3:11] == 0).values.all()) + assert (cp.iloc[3:11] == 0).values.all() result = df.iloc[2:6] result2 = df.loc[3:11] @@ -818,8 +819,8 @@ def test_getitem_setitem_integer_slice_keyerrors(self): # non-monotonic, raise KeyError df2 = df.iloc[lrange(5) + lrange(5, 10)[::-1]] - self.assertRaises(KeyError, df2.loc.__getitem__, slice(3, 11)) - self.assertRaises(KeyError, df2.loc.__setitem__, slice(3, 11), 0) + pytest.raises(KeyError, df2.loc.__getitem__, slice(3, 11)) + pytest.raises(KeyError, df2.loc.__setitem__, slice(3, 11), 0) def test_setitem_fancy_2d(self): @@ -929,7 +930,7 @@ def test_setitem_fancy_2d(self): def test_fancy_getitem_slice_mixed(self): sliced = self.mixed_frame.iloc[:, -3:] - self.assertEqual(sliced['D'].dtype, np.float64) + assert sliced['D'].dtype == np.float64 # get view with single block # setting it triggers setting with copy @@ -937,8 +938,8 @@ def test_fancy_getitem_slice_mixed(self): def f(): sliced['C'] = 4. 
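A sketch of the float-label slicing asserted in `test_slice_floats` above: `.loc` slices by label with inclusive endpoints, while `.iloc` stays strictly positional:

```python
import numpy as np
import pandas as pd

idx = [52195.504153, 52196.303147, 52198.369883]   # from the test above
df = pd.DataFrame(np.random.rand(3, 2), index=idx)
assert len(df.loc[52195.1:52196.5]) == 2   # labels in range, ends inclusive
assert len(df.loc[52195.1:52198.9]) == 3
assert len(df.iloc[0:2]) == 2              # positions 0 and 1 only
```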
- self.assertRaises(com.SettingWithCopyError, f) - self.assertTrue((self.frame['C'] == 4).all()) + pytest.raises(com.SettingWithCopyError, f) + assert (self.frame['C'] == 4).all() def test_fancy_setitem_int_labels(self): # integer index defers to label-based indexing @@ -998,31 +999,28 @@ def test_fancy_index_int_labels_exceptions(self): with catch_warnings(record=True): # labels that aren't contained - self.assertRaises(KeyError, df.ix.__setitem__, - ([0, 1, 2], [2, 3, 4]), 5) + pytest.raises(KeyError, df.ix.__setitem__, + ([0, 1, 2], [2, 3, 4]), 5) # try to set indices not contained in frame - self.assertRaises(KeyError, - self.frame.ix.__setitem__, - ['foo', 'bar', 'baz'], 1) - self.assertRaises(KeyError, - self.frame.ix.__setitem__, - (slice(None, None), ['E']), 1) + pytest.raises(KeyError, self.frame.ix.__setitem__, + ['foo', 'bar', 'baz'], 1) + pytest.raises(KeyError, self.frame.ix.__setitem__, + (slice(None, None), ['E']), 1) # partial setting now allows this GH2578 - # self.assertRaises(KeyError, - # self.frame.ix.__setitem__, - # (slice(None, None), 'E'), 1) + # pytest.raises(KeyError, self.frame.ix.__setitem__, + # (slice(None, None), 'E'), 1) def test_setitem_fancy_mixed_2d(self): with catch_warnings(record=True): self.mixed_frame.ix[:5, ['C', 'B', 'A']] = 5 result = self.mixed_frame.ix[:5, ['C', 'B', 'A']] - self.assertTrue((result.values == 5).all()) + assert (result.values == 5).all() self.mixed_frame.ix[5] = np.nan - self.assertTrue(isnull(self.mixed_frame.ix[5]).all()) + assert isna(self.mixed_frame.ix[5]).all() self.mixed_frame.ix[5] = self.mixed_frame.ix[6] assert_series_equal(self.mixed_frame.ix[5], self.mixed_frame.ix[6], @@ -1032,7 +1030,7 @@ def test_setitem_fancy_mixed_2d(self): with catch_warnings(record=True): df = DataFrame({1: [1., 2., 3.], 2: [3, 4, 5]}) - self.assertTrue(df._is_mixed_type) + assert df._is_mixed_type df.ix[1] = [5, 10] @@ -1174,29 +1172,29 @@ def test_getitem_fancy_1d(self): # return self if no slicing...for now with catch_warnings(record=True): - self.assertIs(f.ix[:, :], f) + assert f.ix[:, :] is f # low dimensional slice with catch_warnings(record=True): xs1 = f.ix[2, ['C', 'B', 'A']] xs2 = f.xs(f.index[2]).reindex(['C', 'B', 'A']) - assert_series_equal(xs1, xs2) + tm.assert_series_equal(xs1, xs2) with catch_warnings(record=True): ts1 = f.ix[5:10, 2] ts2 = f[f.columns[2]][5:10] - assert_series_equal(ts1, ts2) + tm.assert_series_equal(ts1, ts2) # positional xs with catch_warnings(record=True): xs1 = f.ix[0] xs2 = f.xs(f.index[0]) - assert_series_equal(xs1, xs2) + tm.assert_series_equal(xs1, xs2) with catch_warnings(record=True): xs1 = f.ix[f.index[5]] xs2 = f.xs(f.index[5]) - assert_series_equal(xs1, xs2) + tm.assert_series_equal(xs1, xs2) # single column with catch_warnings(record=True): @@ -1207,18 +1205,18 @@ def test_getitem_fancy_1d(self): exp = f.copy() exp.values[5] = 4 f.ix[5][:] = 4 - assert_frame_equal(exp, f) + tm.assert_frame_equal(exp, f) with catch_warnings(record=True): exp.values[:, 1] = 6 f.ix[:, 1][:] = 6 - assert_frame_equal(exp, f) + tm.assert_frame_equal(exp, f) # slice of mixed-frame with catch_warnings(record=True): xs = self.mixed_frame.ix[5] exp = self.mixed_frame.xs(self.mixed_frame.index[5]) - assert_series_equal(xs, exp) + tm.assert_series_equal(xs, exp) def test_setitem_fancy_1d(self): @@ -1284,7 +1282,7 @@ def test_getitem_fancy_scalar(self): for col in f.columns: ts = f[col] for idx in f.index[::5]: - self.assertEqual(ix[idx, col], ts[idx]) + assert ix[idx, col] == ts[idx] def test_setitem_fancy_scalar(self): f = 
self.frame @@ -1353,10 +1351,10 @@ def test_getitem_fancy_ints(self): def test_getitem_setitem_fancy_exceptions(self): ix = self.frame.iloc - with assertRaisesRegexp(IndexingError, 'Too many indexers'): + with tm.assert_raises_regex(IndexingError, 'Too many indexers'): ix[:, :, :] - with assertRaises(IndexingError): + with pytest.raises(IndexingError): ix[:, :, :] = 1 def test_getitem_setitem_boolean_misaligned(self): @@ -1396,17 +1394,17 @@ def test_getitem_setitem_float_labels(self): result = df.loc[1.5:4] expected = df.reindex([1.5, 2, 3, 4]) assert_frame_equal(result, expected) - self.assertEqual(len(result), 4) + assert len(result) == 4 result = df.loc[4:5] expected = df.reindex([4, 5]) # reindex with int assert_frame_equal(result, expected, check_index_type=False) - self.assertEqual(len(result), 2) + assert len(result) == 2 result = df.loc[4:5] expected = df.reindex([4.0, 5.0]) # reindex with float assert_frame_equal(result, expected) - self.assertEqual(len(result), 2) + assert len(result) == 2 # loc_float changes this to work properly result = df.loc[1:2] @@ -1415,63 +1413,63 @@ def test_getitem_setitem_float_labels(self): df.loc[1:2] = 0 result = df[1:2] - self.assertTrue((result == 0).all().all()) + assert (result == 0).all().all() # #2727 index = Index([1.0, 2.5, 3.5, 4.5, 5.0]) df = DataFrame(np.random.randn(5, 5), index=index) # positional slicing only via iloc! - self.assertRaises(TypeError, lambda: df.iloc[1.0:5]) + pytest.raises(TypeError, lambda: df.iloc[1.0:5]) result = df.iloc[4:5] expected = df.reindex([5.0]) assert_frame_equal(result, expected) - self.assertEqual(len(result), 1) + assert len(result) == 1 cp = df.copy() def f(): cp.iloc[1.0:5] = 0 - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) def f(): result = cp.iloc[1.0:5] == 0 # noqa - self.assertRaises(TypeError, f) - self.assertTrue(result.values.all()) - self.assertTrue((cp.iloc[0:1] == df.iloc[0:1]).values.all()) + pytest.raises(TypeError, f) + assert result.values.all() + assert (cp.iloc[0:1] == df.iloc[0:1]).values.all() cp = df.copy() cp.iloc[4:5] = 0 - self.assertTrue((cp.iloc[4:5] == 0).values.all()) - self.assertTrue((cp.iloc[0:4] == df.iloc[0:4]).values.all()) + assert (cp.iloc[4:5] == 0).values.all() + assert (cp.iloc[0:4] == df.iloc[0:4]).values.all() # float slicing result = df.loc[1.0:5] expected = df assert_frame_equal(result, expected) - self.assertEqual(len(result), 5) + assert len(result) == 5 result = df.loc[1.1:5] expected = df.reindex([2.5, 3.5, 4.5, 5.0]) assert_frame_equal(result, expected) - self.assertEqual(len(result), 4) + assert len(result) == 4 result = df.loc[4.51:5] expected = df.reindex([5.0]) assert_frame_equal(result, expected) - self.assertEqual(len(result), 1) + assert len(result) == 1 result = df.loc[1.0:5.0] expected = df.reindex([1.0, 2.5, 3.5, 4.5, 5.0]) assert_frame_equal(result, expected) - self.assertEqual(len(result), 5) + assert len(result) == 5 cp = df.copy() cp.loc[1.0:5.0] = 0 result = cp.loc[1.0:5.0] - self.assertTrue((result == 0).values.all()) + assert (result == 0).values.all() def test_setitem_single_column_mixed(self): df = DataFrame(randn(5, 3), index=['a', 'b', 'c', 'd', 'e'], @@ -1493,21 +1491,20 @@ def test_setitem_single_column_mixed_datetime(self): assert_series_equal(result, expected) # set an allowable datetime64 type - from pandas import tslib - df.loc['b', 'timestamp'] = tslib.iNaT - self.assertTrue(isnull(df.loc['b', 'timestamp'])) + df.loc['b', 'timestamp'] = iNaT + assert isna(df.loc['b', 'timestamp']) # allow this syntax df.loc['c', 
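A sketch of the missing-timestamp assignments tested above (`iNaT`, `nan`): both spellings coerce to `NaT` in a datetime64 column, which `isna` then reports:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"timestamp": pd.date_range("2000-01-01", periods=3)},
                  index=list("abc"))
df.loc["b", "timestamp"] = pd.NaT      # explicit missing timestamp
df.loc["c", "timestamp"] = np.nan      # nan is coerced to NaT here too
assert df["timestamp"].isna().tolist() == [False, True, True]
```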
'timestamp'] = nan - self.assertTrue(isnull(df.loc['c', 'timestamp'])) + assert isna(df.loc['c', 'timestamp']) # allow this syntax df.loc['d', :] = nan - self.assertTrue(isnull(df.loc['c', :]).all() == False) # noqa + assert not isna(df.loc['c', :]).all() # as of GH 3216 this will now work! # try to set with a list like item - # self.assertRaises( + # pytest.raises( # Exception, df.loc.__setitem__, ('d', 'timestamp'), [nan]) def test_setitem_frame(self): @@ -1612,11 +1609,11 @@ def test_getitem_setitem_ix_bool_keyerror(self): # #2199 df = DataFrame({'a': [1, 2, 3]}) - self.assertRaises(KeyError, df.loc.__getitem__, False) - self.assertRaises(KeyError, df.loc.__getitem__, True) + pytest.raises(KeyError, df.loc.__getitem__, False) + pytest.raises(KeyError, df.loc.__getitem__, True) - self.assertRaises(KeyError, df.loc.__setitem__, False, 0) - self.assertRaises(KeyError, df.loc.__setitem__, True, 0) + pytest.raises(KeyError, df.loc.__setitem__, False, 0) + pytest.raises(KeyError, df.loc.__setitem__, True, 0) def test_getitem_list_duplicates(self): # #1943 @@ -1624,7 +1621,7 @@ def test_getitem_list_duplicates(self): df.columns.name = 'foo' result = df[['B', 'C']] - self.assertEqual(result.columns.name, 'foo') + assert result.columns.name == 'foo' expected = df.iloc[:, 2:] assert_frame_equal(result, expected) @@ -1634,7 +1631,7 @@ def test_get_value(self): for col in self.frame.columns: result = self.frame.get_value(idx, col) expected = self.frame[col][idx] - self.assertEqual(result, expected) + assert result == expected def test_lookup(self): def alt(df, rows, cols, dtype): @@ -1660,46 +1657,46 @@ def testit(df): df['mask'] = df.lookup(df.index, 'mask_' + df['label']) exp_mask = alt(df, df.index, 'mask_' + df['label'], dtype=np.bool_) tm.assert_series_equal(df['mask'], pd.Series(exp_mask, name='mask')) - self.assertEqual(df['mask'].dtype, np.bool_) + assert df['mask'].dtype == np.bool_ - with tm.assertRaises(KeyError): + with pytest.raises(KeyError): self.frame.lookup(['xyz'], ['A']) - with tm.assertRaises(KeyError): + with pytest.raises(KeyError): self.frame.lookup([self.frame.index[0]], ['xyz']) - with tm.assertRaisesRegexp(ValueError, 'same size'): + with tm.assert_raises_regex(ValueError, 'same size'): self.frame.lookup(['a', 'b', 'c'], ['a']) def test_set_value(self): for idx in self.frame.index: for col in self.frame.columns: self.frame.set_value(idx, col, 1) - self.assertEqual(self.frame[col][idx], 1) + assert self.frame[col][idx] == 1 def test_set_value_resize(self): res = self.frame.set_value('foobar', 'B', 0) - self.assertIs(res, self.frame) - self.assertEqual(res.index[-1], 'foobar') - self.assertEqual(res.get_value('foobar', 'B'), 0) + assert res is self.frame + assert res.index[-1] == 'foobar' + assert res.get_value('foobar', 'B') == 0 self.frame.loc['foobar', 'qux'] = 0 - self.assertEqual(self.frame.get_value('foobar', 'qux'), 0) + assert self.frame.get_value('foobar', 'qux') == 0 res = self.frame.copy() res3 = res.set_value('foobar', 'baz', 'sam') - self.assertEqual(res3['baz'].dtype, np.object_) + assert res3['baz'].dtype == np.object_ res = self.frame.copy() res3 = res.set_value('foobar', 'baz', True) - self.assertEqual(res3['baz'].dtype, np.object_) + assert res3['baz'].dtype == np.object_ res = self.frame.copy() res3 = res.set_value('foobar', 'baz', 5) - self.assertTrue(is_float_dtype(res3['baz'])) - self.assertTrue(isnull(res3['baz'].drop(['foobar'])).all()) - self.assertRaises(ValueError, res3.set_value, 'foobar', 'baz', 'sam') + assert is_float_dtype(res3['baz']) + 
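The `lookup` tests above pick one column label per row. `DataFrame.lookup` was later deprecated (pandas 1.2); the numpy equivalent shown here is an editorial suggestion for comparison, not part of this patch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"mask_a": [True, False, True],
                   "mask_b": [False, True, False],
                   "label": ["a", "b", "a"]})
cols = "mask_" + df["label"]                 # one column label per row
col_pos = df.columns.get_indexer(cols)       # label -> position
values = df.to_numpy()[np.arange(len(df)), col_pos]
assert values.tolist() == [True, True, True]
```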
assert isna(res3['baz'].drop(['foobar'])).all() + pytest.raises(ValueError, res3.set_value, 'foobar', 'baz', 'sam') def test_set_value_with_index_dtype_change(self): df_orig = DataFrame(randn(3, 3), index=lrange(3), columns=list('ABC')) @@ -1708,43 +1705,42 @@ def test_set_value_with_index_dtype_change(self): # so column is not created df = df_orig.copy() df.set_value('C', 2, 1.0) - self.assertEqual(list(df.index), list(df_orig.index) + ['C']) - # self.assertEqual(list(df.columns), list(df_orig.columns) + [2]) + assert list(df.index) == list(df_orig.index) + ['C'] + # assert list(df.columns) == list(df_orig.columns) + [2] df = df_orig.copy() df.loc['C', 2] = 1.0 - self.assertEqual(list(df.index), list(df_orig.index) + ['C']) - # self.assertEqual(list(df.columns), list(df_orig.columns) + [2]) + assert list(df.index) == list(df_orig.index) + ['C'] + # assert list(df.columns) == list(df_orig.columns) + [2] # create both new df = df_orig.copy() df.set_value('C', 'D', 1.0) - self.assertEqual(list(df.index), list(df_orig.index) + ['C']) - self.assertEqual(list(df.columns), list(df_orig.columns) + ['D']) + assert list(df.index) == list(df_orig.index) + ['C'] + assert list(df.columns) == list(df_orig.columns) + ['D'] df = df_orig.copy() df.loc['C', 'D'] = 1.0 - self.assertEqual(list(df.index), list(df_orig.index) + ['C']) - self.assertEqual(list(df.columns), list(df_orig.columns) + ['D']) + assert list(df.index) == list(df_orig.index) + ['C'] + assert list(df.columns) == list(df_orig.columns) + ['D'] def test_get_set_value_no_partial_indexing(self): # partial w/ MultiIndex raise exception index = MultiIndex.from_tuples([(0, 1), (0, 2), (1, 1), (1, 2)]) df = DataFrame(index=index, columns=lrange(4)) - self.assertRaises(KeyError, df.get_value, 0, 1) - # self.assertRaises(KeyError, df.set_value, 0, 1, 0) + pytest.raises(KeyError, df.get_value, 0, 1) + # pytest.raises(KeyError, df.set_value, 0, 1, 0) def test_single_element_ix_dont_upcast(self): self.frame['E'] = 1 - self.assertTrue(issubclass(self.frame['E'].dtype.type, - (int, np.integer))) + assert issubclass(self.frame['E'].dtype.type, (int, np.integer)) with catch_warnings(record=True): result = self.frame.ix[self.frame.index[5], 'E'] - self.assertTrue(is_integer(result)) + assert is_integer(result) result = self.frame.loc[self.frame.index[5], 'E'] - self.assertTrue(is_integer(result)) + assert is_integer(result) # GH 11617 df = pd.DataFrame(dict(a=[1.23])) @@ -1752,9 +1748,9 @@ def test_single_element_ix_dont_upcast(self): with catch_warnings(record=True): result = df.ix[0, "b"] - self.assertTrue(is_integer(result)) + assert is_integer(result) result = df.loc[0, "b"] - self.assertTrue(is_integer(result)) + assert is_integer(result) expected = Series([666], [0], name='b') with catch_warnings(record=True): @@ -1763,13 +1759,9 @@ def test_single_element_ix_dont_upcast(self): result = df.loc[[0], "b"] assert_series_equal(result, expected) - def test_irow(self): + def test_iloc_row(self): df = DataFrame(np.random.randn(10, 4), index=lrange(0, 20, 2)) - # 10711, deprecated - with tm.assert_produces_warning(FutureWarning): - df.irow(1) - result = df.iloc[1] exp = df.loc[2] assert_series_equal(result, exp) @@ -1787,7 +1779,7 @@ def test_irow(self): # setting it makes it raise/warn def f(): result[2] = 0. - self.assertRaises(com.SettingWithCopyError, f) + pytest.raises(com.SettingWithCopyError, f) exp_col = df[2].copy() exp_col[4:8] = 0. 
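`get_value`/`set_value`, exercised heavily above, were deprecated in pandas 0.21 in favour of `.at`/`.iat`; a minimal sketch of the replacements:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 3), columns=list("ABC"))
df.at[0, "B"] = 0           # label-based scalar setter
assert df.at[0, "B"] == 0
assert df.iat[0, 1] == 0    # positional twin of .at
```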
assert_series_equal(df[2], exp_col) @@ -1797,14 +1789,10 @@ def f(): expected = df.reindex(df.index[[1, 2, 4, 6]]) assert_frame_equal(result, expected) - def test_icol(self): + def test_iloc_col(self): df = DataFrame(np.random.randn(4, 10), columns=lrange(0, 20, 2)) - # 10711, deprecated - with tm.assert_produces_warning(FutureWarning): - df.icol(1) - result = df.iloc[:, 1] exp = df.loc[:, 2] assert_series_equal(result, exp) @@ -1822,16 +1810,15 @@ def test_icol(self): # and that we are setting a copy def f(): result[8] = 0. - self.assertRaises(com.SettingWithCopyError, f) - self.assertTrue((df[8] == 0).all()) + pytest.raises(com.SettingWithCopyError, f) + assert (df[8] == 0).all() # list of integers result = df.iloc[:, [1, 2, 4, 6]] expected = df.reindex(columns=df.columns[[1, 2, 4, 6]]) assert_frame_equal(result, expected) - def test_irow_icol_duplicates(self): - # 10711, deprecated + def test_iloc_duplicates(self): df = DataFrame(np.random.rand(3, 3), columns=list('ABC'), index=list('aab')) @@ -1839,14 +1826,14 @@ def test_irow_icol_duplicates(self): result = df.iloc[0] with catch_warnings(record=True): result2 = df.ix[0] - tm.assertIsInstance(result, Series) + assert isinstance(result, Series) assert_almost_equal(result.values, df.values[0]) assert_series_equal(result, result2) with catch_warnings(record=True): result = df.T.iloc[:, 0] result2 = df.T.ix[:, 0] - tm.assertIsInstance(result, Series) + assert isinstance(result, Series) assert_almost_equal(result.values, df.values[0]) assert_series_equal(result, result2) @@ -1876,22 +1863,18 @@ def test_irow_icol_duplicates(self): expected = df.take([0], axis=1) assert_frame_equal(result, expected) - def test_icol_sparse_propegate_fill_value(self): - from pandas.sparse.api import SparseDataFrame + def test_iloc_sparse_propegate_fill_value(self): + from pandas.core.sparse.api import SparseDataFrame df = SparseDataFrame({'A': [999, 1]}, default_fill_value=999) - self.assertTrue(len(df['A'].sp_values) == len(df.iloc[:, 0].sp_values)) - - def test_iget_value(self): - # 10711 deprecated + assert len(df['A'].sp_values) == len(df.iloc[:, 0].sp_values) - with tm.assert_produces_warning(FutureWarning): - self.frame.iget_value(0, 0) + def test_iat(self): for i, row in enumerate(self.frame.index): for j, col in enumerate(self.frame.columns): result = self.frame.iat[i, j] expected = self.frame.at[row, col] - self.assertEqual(result, expected) + assert result == expected def test_nested_exception(self): # Ignore the strange way of triggering the problem @@ -1907,7 +1890,7 @@ def test_nested_exception(self): try: repr(df) except Exception as e: - self.assertNotEqual(type(e), UnboundLocalError) + assert type(e) != UnboundLocalError def test_reindex_methods(self): df = pd.DataFrame({'x': list(range(5))}) @@ -1945,6 +1928,21 @@ def test_reindex_methods(self): actual = df.reindex(target, method='nearest', tolerance=0.2) assert_frame_equal(expected, actual) + def test_reindex_frame_add_nat(self): + rng = date_range('1/1/2000 00:00:00', periods=10, freq='10s') + df = DataFrame({'A': np.random.randn(len(rng)), 'B': rng}) + + result = df.reindex(lrange(15)) + assert np.issubdtype(result['B'].dtype, np.dtype('M8[ns]')) + + mask = com.isna(result)['B'] + assert mask[-5:].all() + assert not mask[:-5].any() + + def test_set_dataframe_column_ns_dtype(self): + x = DataFrame([datetime.now(), datetime.now()]) + assert x[0].dtype == np.dtype('M8[ns]') + def test_non_monotonic_reindex_methods(self): dr = pd.date_range('2013-08-01', periods=6, freq='B') data = 
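A sketch of the `reindex` fill behaviour tested above: targets with no index label within `tolerance` stay missing, others take the nearest label's value:

```python
import pandas as pd

df = pd.DataFrame({"x": range(5)})
out = df.reindex([0.5, 1.5], method="nearest", tolerance=0.2)
assert out["x"].isna().all()            # no label within 0.2 of the targets
out = df.reindex([0.9, 2.1], method="nearest", tolerance=0.2)
assert out["x"].tolist() == [1, 2]      # 0.9 -> label 1, 2.1 -> label 2
```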
np.random.randn(6, 1) @@ -1952,11 +1950,10 @@ def test_non_monotonic_reindex_methods(self): df_rev = pd.DataFrame(data, index=dr[[3, 4, 5] + [0, 1, 2]], columns=list('A')) # index is not monotonic increasing or decreasing - self.assertRaises(ValueError, df_rev.reindex, df.index, method='pad') - self.assertRaises(ValueError, df_rev.reindex, df.index, method='ffill') - self.assertRaises(ValueError, df_rev.reindex, df.index, method='bfill') - self.assertRaises(ValueError, df_rev.reindex, - df.index, method='nearest') + pytest.raises(ValueError, df_rev.reindex, df.index, method='pad') + pytest.raises(ValueError, df_rev.reindex, df.index, method='ffill') + pytest.raises(ValueError, df_rev.reindex, df.index, method='bfill') + pytest.raises(ValueError, df_rev.reindex, df.index, method='nearest') def test_reindex_level(self): from itertools import permutations @@ -2098,13 +2095,13 @@ def test_setitem_with_unaligned_tz_aware_datetime_column(self): assert_series_equal(df['dates'], column) def test_setitem_datetime_coercion(self): - # GH 1048 + # gh-1048 df = pd.DataFrame({'c': [pd.Timestamp('2010-10-01')] * 3}) df.loc[0:1, 'c'] = np.datetime64('2008-08-08') - self.assertEqual(pd.Timestamp('2008-08-08'), df.loc[0, 'c']) - self.assertEqual(pd.Timestamp('2008-08-08'), df.loc[1, 'c']) + assert pd.Timestamp('2008-08-08') == df.loc[0, 'c'] + assert pd.Timestamp('2008-08-08') == df.loc[1, 'c'] df.loc[2, 'c'] = date(2005, 5, 5) - self.assertEqual(pd.Timestamp('2005-05-05'), df.loc[2, 'c']) + assert pd.Timestamp('2005-05-05') == df.loc[2, 'c'] def test_setitem_datetimelike_with_inference(self): # GH 7592 @@ -2142,14 +2139,14 @@ def test_at_time_between_time_datetimeindex(self): expected2 = df.iloc[ainds] assert_frame_equal(result, expected) assert_frame_equal(result, expected2) - self.assertEqual(len(result), 4) + assert len(result) == 4 result = df.between_time(bkey.start, bkey.stop) expected = df.loc[bkey] expected2 = df.iloc[binds] assert_frame_equal(result, expected) assert_frame_equal(result, expected2) - self.assertEqual(len(result), 12) + assert len(result) == 12 result = df.copy() result.loc[akey] = 0 @@ -2180,9 +2177,9 @@ def test_xs(self): xs = self.frame.xs(idx) for item, value in compat.iteritems(xs): if np.isnan(value): - self.assertTrue(np.isnan(self.frame[item][idx])) + assert np.isnan(self.frame[item][idx]) else: - self.assertEqual(value, self.frame[item][idx]) + assert value == self.frame[item][idx] # mixed-type xs test_data = { @@ -2191,11 +2188,11 @@ def test_xs(self): } frame = DataFrame(test_data) xs = frame.xs('1') - self.assertEqual(xs.dtype, np.object_) - self.assertEqual(xs['A'], 1) - self.assertEqual(xs['B'], '1') + assert xs.dtype == np.object_ + assert xs['A'] == 1 + assert xs['B'] == '1' - with tm.assertRaises(KeyError): + with pytest.raises(KeyError): self.tsframe.xs(self.tsframe.index[0] - BDay()) # xs get column @@ -2206,7 +2203,7 @@ def test_xs(self): # view is returned if possible series = self.frame.xs('A', axis=1) series[:] = 5 - self.assertTrue((expected == 5).all()) + assert (expected == 5).all() def test_xs_corner(self): # pathological mixed-type reordering case @@ -2256,7 +2253,7 @@ def test_xs_view(self): index=lrange(4), columns=lrange(5)) dm.xs(2)[:] = 10 - self.assertTrue((dm.xs(2) == 10).all()) + assert (dm.xs(2) == 10).all() def test_index_namedtuple(self): from collections import namedtuple @@ -2269,10 +2266,10 @@ def test_index_namedtuple(self): with catch_warnings(record=True): result = df.ix[IndexType("foo", "bar")]["A"] - self.assertEqual(result, 1) + assert 
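A sketch of the `between_time` selection tested above, assuming an intraday `DatetimeIndex`; both endpoints are included by default:

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2000-01-01", periods=48, freq="60min")
df = pd.DataFrame({"v": np.arange(48)}, index=rng)
morning = df.between_time("09:00", "11:00")
assert len(morning) == 6    # hours 9, 10, 11 on each of the two days
```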
result == 1 result = df.loc[IndexType("foo", "bar")]["A"] - self.assertEqual(result, 1) + assert result == 1 def test_boolean_indexing(self): idx = lrange(3) @@ -2292,7 +2289,7 @@ def test_boolean_indexing(self): df1[df1 > 2.0 * df2] = -1 assert_frame_equal(df1, expected) - with assertRaisesRegexp(ValueError, 'Item wrong length'): + with tm.assert_raises_regex(ValueError, 'Item wrong length'): df1[df1.index[:-1] > 2] = -1 def test_boolean_indexing_mixed(self): @@ -2323,7 +2320,8 @@ def test_boolean_indexing_mixed(self): assert_frame_equal(df2, expected) df['foo'] = 'test' - with tm.assertRaisesRegexp(TypeError, 'boolean setting on mixed-type'): + with tm.assert_raises_regex(TypeError, 'boolean setting ' + 'on mixed-type'): df[df > 0.3] = 1 def test_where(self): @@ -2351,7 +2349,7 @@ def _check_get(df, cond, check_dtypes=True): # dtypes if check_dtypes: - self.assertTrue((rs.dtypes == df.dtypes).all()) + assert (rs.dtypes == df.dtypes).all() # check getting for df in [default_frame, self.mixed_frame, @@ -2400,7 +2398,7 @@ def _check_align(df, cond, other, check_dtypes=True): # can't check dtype when other is an ndarray if check_dtypes and not isinstance(other, np.ndarray): - self.assertTrue((rs.dtypes == df.dtypes).all()) + assert (rs.dtypes == df.dtypes).all() for df in [self.mixed_frame, self.mixed_float, self.mixed_int]: @@ -2421,14 +2419,14 @@ def _check_align(df, cond, other, check_dtypes=True): # invalid conditions df = default_frame err1 = (df + 1).values[0:2, :] - self.assertRaises(ValueError, df.where, cond, err1) + pytest.raises(ValueError, df.where, cond, err1) err2 = cond.iloc[:2, :].values other1 = _safe_add(df) - self.assertRaises(ValueError, df.where, err2, other1) + pytest.raises(ValueError, df.where, err2, other1) - self.assertRaises(ValueError, df.mask, True) - self.assertRaises(ValueError, df.mask, 0) + pytest.raises(ValueError, df.mask, True) + pytest.raises(ValueError, df.mask, 0) # where inplace def _check_set(df, cond, check_dtypes=True): @@ -2444,7 +2442,7 @@ def _check_set(df, cond, check_dtypes=True): for k, v in compat.iteritems(df.dtypes): if issubclass(v.type, np.integer) and not cond[k].all(): v = np.dtype('float64') - self.assertEqual(dfi[k].dtype, v) + assert dfi[k].dtype == v for df in [default_frame, self.mixed_frame, self.mixed_float, self.mixed_int]: @@ -2466,6 +2464,96 @@ def _check_set(df, cond, check_dtypes=True): expected = df[df['a'] == 1].reindex(df.index) assert_frame_equal(result, expected) + def test_where_array_like(self): + # see gh-15414 + klasses = [list, tuple, np.array] + + df = DataFrame({'a': [1, 2, 3]}) + cond = [[False], [True], [True]] + expected = DataFrame({'a': [np.nan, 2, 3]}) + + for klass in klasses: + result = df.where(klass(cond)) + assert_frame_equal(result, expected) + + df['b'] = 2 + expected['b'] = [2, np.nan, 2] + cond = [[False, True], [True, False], [True, True]] + + for klass in klasses: + result = df.where(klass(cond)) + assert_frame_equal(result, expected) + + def test_where_invalid_input(self): + # see gh-15414: only boolean arrays accepted + df = DataFrame({'a': [1, 2, 3]}) + msg = "Boolean array expected for the condition" + + conds = [ + [[1], [0], [1]], + Series([[2], [5], [7]]), + DataFrame({'a': [2, 5, 7]}), + [["True"], ["False"], ["True"]], + [[Timestamp("2017-01-01")], + [pd.NaT], [Timestamp("2017-01-02")]] + ] + + for cond in conds: + with tm.assert_raises_regex(ValueError, msg): + df.where(cond) + + df['b'] = 2 + conds = [ + [[0, 1], [1, 0], [1, 1]], + Series([[0, 2], [5, 0], [4, 7]]), + [["False", 
"True"], ["True", "False"], + ["True", "True"]], + DataFrame({'a': [2, 5, 7], 'b': [4, 8, 9]}), + [[pd.NaT, Timestamp("2017-01-01")], + [Timestamp("2017-01-02"), pd.NaT], + [Timestamp("2017-01-03"), Timestamp("2017-01-03")]] + ] + + for cond in conds: + with tm.assert_raises_regex(ValueError, msg): + df.where(cond) + + def test_where_dataframe_col_match(self): + df = DataFrame([[1, 2, 3], [4, 5, 6]]) + cond = DataFrame([[True, False, True], [False, False, True]]) + + result = df.where(cond) + expected = DataFrame([[1.0, np.nan, 3], [np.nan, np.nan, 6]]) + tm.assert_frame_equal(result, expected) + + # this *does* align, though has no matching columns + cond.columns = ["a", "b", "c"] + result = df.where(cond) + expected = DataFrame(np.nan, index=df.index, columns=df.columns) + tm.assert_frame_equal(result, expected) + + def test_where_ndframe_align(self): + msg = "Array conditional must be same shape as self" + df = DataFrame([[1, 2, 3], [4, 5, 6]]) + + cond = [True] + with tm.assert_raises_regex(ValueError, msg): + df.where(cond) + + expected = DataFrame([[1, 2, 3], [np.nan, np.nan, np.nan]]) + + out = df.where(Series(cond)) + tm.assert_frame_equal(out, expected) + + cond = np.array([False, True, False, True]) + with tm.assert_raises_regex(ValueError, msg): + df.where(cond) + + expected = DataFrame([[np.nan, np.nan, np.nan], [4, 5, 6]]) + + out = df.where(Series(cond)) + tm.assert_frame_equal(out, expected) + def test_where_bug(self): # GH 2793 @@ -2502,7 +2590,7 @@ def test_where_bug(self): # GH7506 a = DataFrame({0: [1, 2], 1: [3, 4], 2: [5, 6]}) b = DataFrame({0: [np.nan, 8], 1: [9, np.nan], 2: [np.nan, np.nan]}) - do_not_replace = b.isnull() | (a > b) + do_not_replace = b.isna() | (a > b) expected = a.copy() expected[~do_not_replace] = b @@ -2512,7 +2600,7 @@ def test_where_bug(self): a = DataFrame({0: [4, 6], 1: [1, 0]}) b = DataFrame({0: [np.nan, 3], 1: [3, np.nan]}) - do_not_replace = b.isnull() | (a > b) + do_not_replace = b.isna() | (a > b) expected = a.copy() expected[~do_not_replace] = b @@ -2545,9 +2633,10 @@ def test_where_none(self): # GH 7656 df = DataFrame([{'A': 1, 'B': np.nan, 'C': 'Test'}, { 'A': np.nan, 'B': 'Test', 'C': np.nan}]) - expected = df.where(~isnull(df), None) - with tm.assertRaisesRegexp(TypeError, 'boolean setting on mixed-type'): - df.where(~isnull(df), None, inplace=True) + expected = df.where(~isna(df), None) + with tm.assert_raises_regex(TypeError, 'boolean setting ' + 'on mixed-type'): + df.where(~isna(df), None, inplace=True) def test_where_align(self): @@ -2561,10 +2650,10 @@ def create(): # series df = create() expected = df.fillna(df.mean()) - result = df.where(pd.notnull(df), df.mean(), axis='columns') + result = df.where(pd.notna(df), df.mean(), axis='columns') assert_frame_equal(result, expected) - df.where(pd.notnull(df), df.mean(), inplace=True, axis='columns') + df.where(pd.notna(df), df.mean(), inplace=True, axis='columns') assert_frame_equal(df, expected) df = create().fillna(0) @@ -2577,7 +2666,7 @@ def create(): # frame df = create() expected = df.fillna(1) - result = df.where(pd.notnull(df), DataFrame( + result = df.where(pd.notna(df), DataFrame( 1, index=df.index, columns=df.columns)) assert_frame_equal(result, expected) @@ -2624,7 +2713,7 @@ def test_where_axis(self): result.where(mask, s, axis='index', inplace=True) assert_frame_equal(result, expected) - expected = DataFrame([[0, np.nan], [0, np.nan]], dtype='float64') + expected = DataFrame([[0, np.nan], [0, np.nan]]) result = df.where(mask, s, axis='columns') 
assert_frame_equal(result, expected) @@ -2635,17 +2724,18 @@ def test_where_axis(self): assert_frame_equal(result, expected) # Multiple dtypes (=> multiple Blocks) - df = pd.concat([DataFrame(np.random.randn(10, 2)), - DataFrame(np.random.randint(0, 10, size=(10, 2)))], - ignore_index=True, axis=1) + df = pd.concat([ + DataFrame(np.random.randn(10, 2)), + DataFrame(np.random.randint(0, 10, size=(10, 2)), dtype='int64')], + ignore_index=True, axis=1) mask = DataFrame(False, columns=df.columns, index=df.index) s1 = Series(1, index=df.columns) s2 = Series(2, index=df.index) result = df.where(mask, s1, axis='columns') expected = DataFrame(1.0, columns=df.columns, index=df.index) - expected[2] = expected[2].astype(int) - expected[3] = expected[3].astype(int) + expected[2] = expected[2].astype('int64') + expected[3] = expected[3].astype('int64') assert_frame_equal(result, expected) result = df.copy() @@ -2654,8 +2744,8 @@ def test_where_axis(self): result = df.where(mask, s2, axis='index') expected = DataFrame(2.0, columns=df.columns, index=df.index) - expected[2] = expected[2].astype(int) - expected[3] = expected[3].astype(int) + expected[2] = expected[2].astype('int64') + expected[3] = expected[3].astype('int64') assert_frame_equal(result, expected) result = df.copy() @@ -2804,7 +2894,7 @@ def test_type_error_multiindex(self): dg = df.pivot_table(index='i', columns='c', values=['x', 'y']) - with assertRaisesRegexp(TypeError, "is an invalid key"): + with tm.assert_raises_regex(TypeError, "is an invalid key"): str(dg[:, 0]) index = Index(range(2), name='i') @@ -2824,11 +2914,9 @@ def test_type_error_multiindex(self): assert_series_equal(result, expected) -class TestDataFrameIndexingDatetimeWithTZ(tm.TestCase, TestData): - - _multiprocess_can_split_ = True +class TestDataFrameIndexingDatetimeWithTZ(TestData): - def setUp(self): + def setup_method(self, method): self.idx = Index(date_range('20130101', periods=3, tz='US/Eastern'), name='foo') self.dr = date_range('20130110', periods=3) @@ -2852,16 +2940,15 @@ def test_setitem(self): # are copies) b1 = df._data.blocks[1] b2 = df._data.blocks[2] - self.assertTrue(b1.values.equals(b2.values)) - self.assertFalse(id(b1.values.values.base) == - id(b2.values.values.base)) + assert b1.values.equals(b2.values) + assert id(b1.values.values.base) != id(b2.values.values.base) # with nan df2 = df.copy() df2.iloc[1, 1] = pd.NaT df2.iloc[1, 2] = pd.NaT result = df2['B'] - assert_series_equal(notnull(result), Series( + assert_series_equal(notna(result), Series( [True, False, True], name='B')) assert_series_equal(df2.dtypes, df.dtypes) @@ -2872,7 +2959,7 @@ def test_set_reset(self): # set/reset df = DataFrame({'A': [0, 1, 2]}, index=idx) result = df.reset_index() - self.assertTrue(result['foo'].dtype, 'M8[ns, US/Eastern') + assert result['foo'].dtype, 'M8[ns, US/Eastern' df = result.set_index('foo') tm.assert_index_equal(df.index, idx) @@ -2885,11 +2972,9 @@ def test_transpose(self): assert_frame_equal(result, expected) -class TestDataFrameIndexingUInt64(tm.TestCase, TestData): +class TestDataFrameIndexingUInt64(TestData): - _multiprocess_can_split_ = True - - def setUp(self): + def setup_method(self, method): self.ir = Index(np.arange(3), dtype=np.uint64) self.idx = Index([2**63, 2**63 + 5, 2**63 + 10], name='foo') @@ -2915,7 +3000,7 @@ def test_setitem(self): df2.iloc[1, 1] = pd.NaT df2.iloc[1, 2] = pd.NaT result = df2['B'] - assert_series_equal(notnull(result), Series( + assert_series_equal(notna(result), Series( [True, False, True], name='B')) 
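A sketch of the axis-aligned `where` replacement tested above: with a Series as `other`, `axis` chooses which frame axis the Series labels align against:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])
mask = pd.DataFrame(False, index=df.index, columns=df.columns)
s = pd.Series([10, 20])
by_row = df.where(mask, s, axis="index")      # s aligned to df.index
by_col = df.where(mask, s, axis="columns")    # s aligned to df.columns
assert by_row.iloc[0].tolist() == [10, 10]
assert by_col.iloc[0].tolist() == [10, 20]
```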
assert_series_equal(df2.dtypes, Series([np.dtype('uint64'), np.dtype('O'), np.dtype('O')], @@ -2928,7 +3013,7 @@ def test_set_reset(self): # set/reset df = DataFrame({'A': [0, 1, 2]}, index=idx) result = df.reset_index() - self.assertEqual(result['foo'].dtype, np.dtype('uint64')) + assert result['foo'].dtype == np.dtype('uint64') df = result.set_index('foo') tm.assert_index_equal(df.index, idx) @@ -2939,9 +3024,3 @@ def test_transpose(self): expected = DataFrame(self.df.values.T) expected.index = ['A', 'B'] assert_frame_equal(result, expected) - - -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/frame/test_join.py b/pandas/tests/frame/test_join.py new file mode 100644 index 0000000000000..afecba2026dd7 --- /dev/null +++ b/pandas/tests/frame/test_join.py @@ -0,0 +1,167 @@ +# -*- coding: utf-8 -*- + +import pytest +import numpy as np + +from pandas import DataFrame, Index, PeriodIndex +from pandas.tests.frame.common import TestData +import pandas.util.testing as tm + + +@pytest.fixture +def frame_with_period_index(): + return DataFrame( + data=np.arange(20).reshape(4, 5), + columns=list('abcde'), + index=PeriodIndex(start='2000', freq='A', periods=4)) + + +@pytest.fixture +def frame(): + return TestData().frame + + +@pytest.fixture +def left(): + return DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0]) + + +@pytest.fixture +def right(): + return DataFrame({'b': [300, 100, 200]}, index=[3, 1, 2]) + + +@pytest.mark.parametrize( + "how, sort, expected", + [('inner', False, DataFrame({'a': [20, 10], + 'b': [200, 100]}, + index=[2, 1])), + ('inner', True, DataFrame({'a': [10, 20], + 'b': [100, 200]}, + index=[1, 2])), + ('left', False, DataFrame({'a': [20, 10, 0], + 'b': [200, 100, np.nan]}, + index=[2, 1, 0])), + ('left', True, DataFrame({'a': [0, 10, 20], + 'b': [np.nan, 100, 200]}, + index=[0, 1, 2])), + ('right', False, DataFrame({'a': [np.nan, 10, 20], + 'b': [300, 100, 200]}, + index=[3, 1, 2])), + ('right', True, DataFrame({'a': [10, 20, np.nan], + 'b': [100, 200, 300]}, + index=[1, 2, 3])), + ('outer', False, DataFrame({'a': [0, 10, 20, np.nan], + 'b': [np.nan, 100, 200, 300]}, + index=[0, 1, 2, 3])), + ('outer', True, DataFrame({'a': [0, 10, 20, np.nan], + 'b': [np.nan, 100, 200, 300]}, + index=[0, 1, 2, 3]))]) +def test_join(left, right, how, sort, expected): + + result = left.join(right, how=how, sort=sort) + tm.assert_frame_equal(result, expected) + + +def test_join_index(frame): + # left / right + + f = frame.loc[frame.index[:10], ['A', 'B']] + f2 = frame.loc[frame.index[5:], ['C', 'D']].iloc[::-1] + + joined = f.join(f2) + tm.assert_index_equal(f.index, joined.index) + expected_columns = Index(['A', 'B', 'C', 'D']) + tm.assert_index_equal(joined.columns, expected_columns) + + joined = f.join(f2, how='left') + tm.assert_index_equal(joined.index, f.index) + tm.assert_index_equal(joined.columns, expected_columns) + + joined = f.join(f2, how='right') + tm.assert_index_equal(joined.index, f2.index) + tm.assert_index_equal(joined.columns, expected_columns) + + # inner + + joined = f.join(f2, how='inner') + tm.assert_index_equal(joined.index, f.index[5:10]) + tm.assert_index_equal(joined.columns, expected_columns) + + # outer + + joined = f.join(f2, how='outer') + tm.assert_index_equal(joined.index, frame.index.sort_values()) + tm.assert_index_equal(joined.columns, expected_columns) + + tm.assert_raises_regex( + ValueError, 'join method', f.join, f2, how='foo') + + # corner case - 
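A sketch of the index joins parametrized above, reusing the same `left`/`right` fixture data:

```python
import pandas as pd

left = pd.DataFrame({"a": [20, 10, 0]}, index=[2, 1, 0])
right = pd.DataFrame({"b": [300, 100, 200]}, index=[3, 1, 2])

inner = left.join(right, how="inner", sort=True)
assert inner.index.tolist() == [1, 2]    # shared labels, sorted
outer = left.join(right, how="outer")
assert outer["a"].isna().sum() == 1      # label 3 exists only on the right
```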
overlapping columns + for how in ('outer', 'left', 'inner'): + with tm.assert_raises_regex(ValueError, 'columns overlap but ' + 'no suffix'): + frame.join(frame, how=how) + + +def test_join_index_more(frame): + af = frame.loc[:, ['A', 'B']] + bf = frame.loc[::2, ['C', 'D']] + + expected = af.copy() + expected['C'] = frame['C'][::2] + expected['D'] = frame['D'][::2] + + result = af.join(bf) + tm.assert_frame_equal(result, expected) + + result = af.join(bf, how='right') + tm.assert_frame_equal(result, expected[::2]) + + result = bf.join(af, how='right') + tm.assert_frame_equal(result, expected.loc[:, result.columns]) + + +def test_join_index_series(frame): + df = frame.copy() + s = df.pop(frame.columns[-1]) + joined = df.join(s) + + # TODO should this check_names ? + tm.assert_frame_equal(joined, frame, check_names=False) + + s.name = None + tm.assert_raises_regex(ValueError, 'must have a name', df.join, s) + + +def test_join_overlap(frame): + df1 = frame.loc[:, ['A', 'B', 'C']] + df2 = frame.loc[:, ['B', 'C', 'D']] + + joined = df1.join(df2, lsuffix='_df1', rsuffix='_df2') + df1_suf = df1.loc[:, ['B', 'C']].add_suffix('_df1') + df2_suf = df2.loc[:, ['B', 'C']].add_suffix('_df2') + + no_overlap = frame.loc[:, ['A', 'D']] + expected = df1_suf.join(df2_suf).join(no_overlap) + + # column order not necessarily sorted + tm.assert_frame_equal(joined, expected.loc[:, joined.columns]) + + +def test_join_period_index(frame_with_period_index): + other = frame_with_period_index.rename( + columns=lambda x: '{key}{key}'.format(key=x)) + + joined_values = np.concatenate( + [frame_with_period_index.values] * 2, axis=1) + + joined_cols = frame_with_period_index.columns.append(other.columns) + + joined = frame_with_period_index.join(other) + expected = DataFrame( + data=joined_values, + columns=joined_cols, + index=frame_with_period_index.index) + + tm.assert_frame_equal(joined, expected) diff --git a/pandas/tests/frame/test_misc_api.py b/pandas/tests/frame/test_misc_api.py deleted file mode 100644 index f5719fa1d8b85..0000000000000 --- a/pandas/tests/frame/test_misc_api.py +++ /dev/null @@ -1,493 +0,0 @@ -# -*- coding: utf-8 -*- - -from __future__ import print_function -# pylint: disable-msg=W0612,E1101 -from copy import deepcopy -import sys -import nose -from distutils.version import LooseVersion - -from pandas.compat import range, lrange -from pandas import compat - -from numpy.random import randn -import numpy as np - -from pandas import DataFrame, Series -import pandas as pd - -from pandas.util.testing import (assert_almost_equal, - assert_series_equal, - assert_frame_equal, - assertRaisesRegexp) - -import pandas.util.testing as tm - -from pandas.tests.frame.common import TestData - - -class SharedWithSparse(object): - - _multiprocess_can_split_ = True - - def test_copy_index_name_checking(self): - # don't want to be able to modify the index stored elsewhere after - # making a copy - for attr in ('index', 'columns'): - ind = getattr(self.frame, attr) - ind.name = None - cp = self.frame.copy() - getattr(cp, attr).name = 'foo' - self.assertIsNone(getattr(self.frame, attr).name) - - def test_getitem_pop_assign_name(self): - s = self.frame['A'] - self.assertEqual(s.name, 'A') - - s = self.frame.pop('A') - self.assertEqual(s.name, 'A') - - s = self.frame.loc[:, 'B'] - self.assertEqual(s.name, 'B') - - s2 = s.loc[:] - self.assertEqual(s2.name, 'B') - - def test_get_value(self): - for idx in self.frame.index: - for col in self.frame.columns: - result = self.frame.get_value(idx, col) - expected = 
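A sketch of the overlapping-column rule from `test_join_overlap` above: shared column names require `lsuffix`/`rsuffix`, otherwise `join` raises; data here is illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1], "B": [2]})
df2 = pd.DataFrame({"B": [3], "C": [4]})
joined = df1.join(df2, lsuffix="_df1", rsuffix="_df2")
assert list(joined.columns) == ["A", "B_df1", "B_df2", "C"]
```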
self.frame[col][idx] - tm.assert_almost_equal(result, expected) - - def test_join_index(self): - # left / right - - f = self.frame.reindex(columns=['A', 'B'])[:10] - f2 = self.frame.reindex(columns=['C', 'D']) - - joined = f.join(f2) - self.assert_index_equal(f.index, joined.index) - self.assertEqual(len(joined.columns), 4) - - joined = f.join(f2, how='left') - self.assert_index_equal(joined.index, f.index) - self.assertEqual(len(joined.columns), 4) - - joined = f.join(f2, how='right') - self.assert_index_equal(joined.index, f2.index) - self.assertEqual(len(joined.columns), 4) - - # inner - - f = self.frame.reindex(columns=['A', 'B'])[:10] - f2 = self.frame.reindex(columns=['C', 'D']) - - joined = f.join(f2, how='inner') - self.assert_index_equal(joined.index, f.index.intersection(f2.index)) - self.assertEqual(len(joined.columns), 4) - - # outer - - f = self.frame.reindex(columns=['A', 'B'])[:10] - f2 = self.frame.reindex(columns=['C', 'D']) - - joined = f.join(f2, how='outer') - self.assertTrue(tm.equalContents(self.frame.index, joined.index)) - self.assertEqual(len(joined.columns), 4) - - assertRaisesRegexp(ValueError, 'join method', f.join, f2, how='foo') - - # corner case - overlapping columns - for how in ('outer', 'left', 'inner'): - with assertRaisesRegexp(ValueError, 'columns overlap but ' - 'no suffix'): - self.frame.join(self.frame, how=how) - - def test_join_index_more(self): - af = self.frame.loc[:, ['A', 'B']] - bf = self.frame.loc[::2, ['C', 'D']] - - expected = af.copy() - expected['C'] = self.frame['C'][::2] - expected['D'] = self.frame['D'][::2] - - result = af.join(bf) - assert_frame_equal(result, expected) - - result = af.join(bf, how='right') - assert_frame_equal(result, expected[::2]) - - result = bf.join(af, how='right') - assert_frame_equal(result, expected.loc[:, result.columns]) - - def test_join_index_series(self): - df = self.frame.copy() - s = df.pop(self.frame.columns[-1]) - joined = df.join(s) - - # TODO should this check_names ? 
- assert_frame_equal(joined, self.frame, check_names=False) - - s.name = None - assertRaisesRegexp(ValueError, 'must have a name', df.join, s) - - def test_join_overlap(self): - df1 = self.frame.loc[:, ['A', 'B', 'C']] - df2 = self.frame.loc[:, ['B', 'C', 'D']] - - joined = df1.join(df2, lsuffix='_df1', rsuffix='_df2') - df1_suf = df1.loc[:, ['B', 'C']].add_suffix('_df1') - df2_suf = df2.loc[:, ['B', 'C']].add_suffix('_df2') - - no_overlap = self.frame.loc[:, ['A', 'D']] - expected = df1_suf.join(df2_suf).join(no_overlap) - - # column order not necessarily sorted - assert_frame_equal(joined, expected.loc[:, joined.columns]) - - def test_add_prefix_suffix(self): - with_prefix = self.frame.add_prefix('foo#') - expected = pd.Index(['foo#%s' % c for c in self.frame.columns]) - self.assert_index_equal(with_prefix.columns, expected) - - with_suffix = self.frame.add_suffix('#foo') - expected = pd.Index(['%s#foo' % c for c in self.frame.columns]) - self.assert_index_equal(with_suffix.columns, expected) - - -class TestDataFrameMisc(tm.TestCase, SharedWithSparse, TestData): - - klass = DataFrame - - _multiprocess_can_split_ = True - - def test_get_axis(self): - f = self.frame - self.assertEqual(f._get_axis_number(0), 0) - self.assertEqual(f._get_axis_number(1), 1) - self.assertEqual(f._get_axis_number('index'), 0) - self.assertEqual(f._get_axis_number('rows'), 0) - self.assertEqual(f._get_axis_number('columns'), 1) - - self.assertEqual(f._get_axis_name(0), 'index') - self.assertEqual(f._get_axis_name(1), 'columns') - self.assertEqual(f._get_axis_name('index'), 'index') - self.assertEqual(f._get_axis_name('rows'), 'index') - self.assertEqual(f._get_axis_name('columns'), 'columns') - - self.assertIs(f._get_axis(0), f.index) - self.assertIs(f._get_axis(1), f.columns) - - assertRaisesRegexp(ValueError, 'No axis named', f._get_axis_number, 2) - assertRaisesRegexp(ValueError, 'No axis.*foo', f._get_axis_name, 'foo') - assertRaisesRegexp(ValueError, 'No axis.*None', f._get_axis_name, None) - assertRaisesRegexp(ValueError, 'No axis named', f._get_axis_number, - None) - - def test_keys(self): - getkeys = self.frame.keys - self.assertIs(getkeys(), self.frame.columns) - - def test_column_contains_typeerror(self): - try: - self.frame.columns in self.frame - except TypeError: - pass - - def test_not_hashable(self): - df = pd.DataFrame([1]) - self.assertRaises(TypeError, hash, df) - self.assertRaises(TypeError, hash, self.empty) - - def test_new_empty_index(self): - df1 = DataFrame(randn(0, 3)) - df2 = DataFrame(randn(0, 3)) - df1.index.name = 'foo' - self.assertIsNone(df2.index.name) - - def test_array_interface(self): - with np.errstate(all='ignore'): - result = np.sqrt(self.frame) - tm.assertIsInstance(result, type(self.frame)) - self.assertIs(result.index, self.frame.index) - self.assertIs(result.columns, self.frame.columns) - - assert_frame_equal(result, self.frame.apply(np.sqrt)) - - def test_get_agg_axis(self): - cols = self.frame._get_agg_axis(0) - self.assertIs(cols, self.frame.columns) - - idx = self.frame._get_agg_axis(1) - self.assertIs(idx, self.frame.index) - - self.assertRaises(ValueError, self.frame._get_agg_axis, 2) - - def test_nonzero(self): - self.assertTrue(self.empty.empty) - - self.assertFalse(self.frame.empty) - self.assertFalse(self.mixed_frame.empty) - - # corner case - df = DataFrame({'A': [1., 2., 3.], - 'B': ['a', 'b', 'c']}, - index=np.arange(3)) - del df['A'] - self.assertFalse(df.empty) - - def test_iteritems(self): - df = DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'a', 'b']) 
- for k, v in compat.iteritems(df): - self.assertEqual(type(v), Series) - - def test_iter(self): - self.assertTrue(tm.equalContents(list(self.frame), self.frame.columns)) - - def test_iterrows(self): - for i, (k, v) in enumerate(self.frame.iterrows()): - exp = self.frame.xs(self.frame.index[i]) - assert_series_equal(v, exp) - - for i, (k, v) in enumerate(self.mixed_frame.iterrows()): - exp = self.mixed_frame.xs(self.mixed_frame.index[i]) - assert_series_equal(v, exp) - - def test_itertuples(self): - for i, tup in enumerate(self.frame.itertuples()): - s = Series(tup[1:]) - s.name = tup[0] - expected = self.frame.iloc[i, :].reset_index(drop=True) - assert_series_equal(s, expected) - - df = DataFrame({'floats': np.random.randn(5), - 'ints': lrange(5)}, columns=['floats', 'ints']) - - for tup in df.itertuples(index=False): - tm.assertIsInstance(tup[1], np.integer) - - df = DataFrame(data={"a": [1, 2, 3], "b": [4, 5, 6]}) - dfaa = df[['a', 'a']] - self.assertEqual(list(dfaa.itertuples()), [ - (0, 1, 1), (1, 2, 2), (2, 3, 3)]) - - self.assertEqual(repr(list(df.itertuples(name=None))), - '[(0, 1, 4), (1, 2, 5), (2, 3, 6)]') - - tup = next(df.itertuples(name='TestName')) - - # no support for field renaming in Python 2.6, regular tuples are - # returned - if sys.version >= LooseVersion('2.7'): - self.assertEqual(tup._fields, ('Index', 'a', 'b')) - self.assertEqual((tup.Index, tup.a, tup.b), tup) - self.assertEqual(type(tup).__name__, 'TestName') - - df.columns = ['def', 'return'] - tup2 = next(df.itertuples(name='TestName')) - self.assertEqual(tup2, (0, 1, 4)) - - if sys.version >= LooseVersion('2.7'): - self.assertEqual(tup2._fields, ('Index', '_1', '_2')) - - df3 = DataFrame(dict(('f' + str(i), [i]) for i in range(1024))) - # will raise SyntaxError if trying to create namedtuple - tup3 = next(df3.itertuples()) - self.assertFalse(hasattr(tup3, '_fields')) - self.assertIsInstance(tup3, tuple) - - def test_len(self): - self.assertEqual(len(self.frame), len(self.frame.index)) - - def test_as_matrix(self): - frame = self.frame - mat = frame.as_matrix() - - frameCols = frame.columns - for i, row in enumerate(mat): - for j, value in enumerate(row): - col = frameCols[j] - if np.isnan(value): - self.assertTrue(np.isnan(frame[col][i])) - else: - self.assertEqual(value, frame[col][i]) - - # mixed type - mat = self.mixed_frame.as_matrix(['foo', 'A']) - self.assertEqual(mat[0, 0], 'bar') - - df = DataFrame({'real': [1, 2, 3], 'complex': [1j, 2j, 3j]}) - mat = df.as_matrix() - self.assertEqual(mat[0, 0], 1j) - - # single block corner case - mat = self.frame.as_matrix(['A', 'B']) - expected = self.frame.reindex(columns=['A', 'B']).values - assert_almost_equal(mat, expected) - - def test_values(self): - self.frame.values[:, 0] = 5. 
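A sketch of the `itertuples` behaviour covered by the removed tests: rows come back as namedtuples unless `name=None` requests plain tuples:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [4, 5]})
row = next(df.itertuples())
assert row._fields == ("Index", "a", "b")        # named fields
assert list(df.itertuples(name=None)) == [(0, 1, 4), (1, 2, 5)]
```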
- self.assertTrue((self.frame.values[:, 0] == 5).all()) - - def test_deepcopy(self): - cp = deepcopy(self.frame) - series = cp['A'] - series[:] = 10 - for idx, value in compat.iteritems(series): - self.assertNotEqual(self.frame['A'][idx], value) - - # --------------------------------------------------------------------- - # Transposing - - def test_transpose(self): - frame = self.frame - dft = frame.T - for idx, series in compat.iteritems(dft): - for col, value in compat.iteritems(series): - if np.isnan(value): - self.assertTrue(np.isnan(frame[col][idx])) - else: - self.assertEqual(value, frame[col][idx]) - - # mixed type - index, data = tm.getMixedTypeDict() - mixed = DataFrame(data, index=index) - - mixed_T = mixed.T - for col, s in compat.iteritems(mixed_T): - self.assertEqual(s.dtype, np.object_) - - def test_transpose_get_view(self): - dft = self.frame.T - dft.values[:, 5:10] = 5 - - self.assertTrue((self.frame.values[5:10] == 5).all()) - - def test_swapaxes(self): - df = DataFrame(np.random.randn(10, 5)) - assert_frame_equal(df.T, df.swapaxes(0, 1)) - assert_frame_equal(df.T, df.swapaxes(1, 0)) - assert_frame_equal(df, df.swapaxes(0, 0)) - self.assertRaises(ValueError, df.swapaxes, 2, 5) - - def test_axis_aliases(self): - f = self.frame - - # reg name - expected = f.sum(axis=0) - result = f.sum(axis='index') - assert_series_equal(result, expected) - - expected = f.sum(axis=1) - result = f.sum(axis='columns') - assert_series_equal(result, expected) - - def test_more_asMatrix(self): - values = self.mixed_frame.as_matrix() - self.assertEqual(values.shape[1], len(self.mixed_frame.columns)) - - def test_repr_with_mi_nat(self): - df = DataFrame({'X': [1, 2]}, - index=[[pd.NaT, pd.Timestamp('20130101')], ['a', 'b']]) - res = repr(df) - exp = ' X\nNaT a 1\n2013-01-01 b 2' - self.assertEqual(res, exp) - - def test_iterkv_deprecation(self): - with tm.assert_produces_warning(FutureWarning): - self.mixed_float.iterkv() - - def test_iterkv_names(self): - for k, v in compat.iteritems(self.mixed_frame): - self.assertEqual(v.name, k) - - def test_series_put_names(self): - series = self.mixed_frame._series - for k, v in compat.iteritems(series): - self.assertEqual(v.name, k) - - def test_empty_nonzero(self): - df = DataFrame([1, 2, 3]) - self.assertFalse(df.empty) - df = pd.DataFrame(index=[1], columns=[1]) - self.assertFalse(df.empty) - df = DataFrame(index=['a', 'b'], columns=['c', 'd']).dropna() - self.assertTrue(df.empty) - self.assertTrue(df.T.empty) - empty_frames = [pd.DataFrame(), - pd.DataFrame(index=[1]), - pd.DataFrame(columns=[1]), - pd.DataFrame({1: []})] - for df in empty_frames: - self.assertTrue(df.empty) - self.assertTrue(df.T.empty) - - def test_inplace_return_self(self): - # re #1893 - - data = DataFrame({'a': ['foo', 'bar', 'baz', 'qux'], - 'b': [0, 0, 1, 1], - 'c': [1, 2, 3, 4]}) - - def _check_f(base, f): - result = f(base) - self.assertTrue(result is None) - - # -----DataFrame----- - - # set_index - f = lambda x: x.set_index('a', inplace=True) - _check_f(data.copy(), f) - - # reset_index - f = lambda x: x.reset_index(inplace=True) - _check_f(data.set_index('a'), f) - - # drop_duplicates - f = lambda x: x.drop_duplicates(inplace=True) - _check_f(data.copy(), f) - - # sort - f = lambda x: x.sort_values('b', inplace=True) - _check_f(data.copy(), f) - - # sort_index - f = lambda x: x.sort_index(inplace=True) - _check_f(data.copy(), f) - - # fillna - f = lambda x: x.fillna(0, inplace=True) - _check_f(data.copy(), f) - - # replace - f = lambda x: x.replace(1, 0, inplace=True) - 
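A sketch of the inplace contract that `_check_f` above verifies: inplace methods mutate the caller and return `None`:

```python
import pandas as pd

df = pd.DataFrame({"a": ["foo", "bar"], "b": [0, 1]})
assert df.set_index("a", inplace=True) is None   # mutates, returns None
assert df.index.tolist() == ["foo", "bar"]
```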
_check_f(data.copy(), f) - - # rename - f = lambda x: x.rename({1: 'foo'}, inplace=True) - _check_f(data.copy(), f) - - # -----Series----- - d = data.copy()['c'] - - # reset_index - f = lambda x: x.reset_index(inplace=True, drop=True) - _check_f(data.set_index('a')['c'], f) - - # fillna - f = lambda x: x.fillna(0, inplace=True) - _check_f(d.copy(), f) - - # replace - f = lambda x: x.replace(1, 0, inplace=True) - _check_f(d.copy(), f) - - # rename - f = lambda x: x.rename({1: 'foo'}, inplace=True) - _check_f(d.copy(), f) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/frame/test_missing.py b/pandas/tests/frame/test_missing.py index c4f037e85edf6..ebd15b3180a33 100644 --- a/pandas/tests/frame/test_missing.py +++ b/pandas/tests/frame/test_missing.py @@ -2,6 +2,8 @@ from __future__ import print_function +import pytest + from distutils.version import LooseVersion from numpy import nan, random import numpy as np @@ -11,25 +13,28 @@ date_range) import pandas as pd -from pandas.util.testing import (assert_series_equal, - assert_frame_equal, - assertRaisesRegexp) +from pandas.util.testing import assert_series_equal, assert_frame_equal import pandas.util.testing as tm from pandas.tests.frame.common import TestData, _check_mixed_float +try: + import scipy + _is_scipy_ge_0190 = scipy.__version__ >= LooseVersion('0.19.0') +except: + _is_scipy_ge_0190 = False + + def _skip_if_no_pchip(): try: from scipy.interpolate import pchip_interpolate # noqa except ImportError: - import nose - raise nose.SkipTest('scipy.interpolate.pchip missing') - + import pytest + pytest.skip('scipy.interpolate.pchip missing') -class TestDataFrameMissingData(tm.TestCase, TestData): - _multiprocess_can_split_ = True +class TestDataFrameMissingData(TestData): def test_dropEmptyRows(self): N = len(self.frame.index) @@ -73,10 +78,10 @@ def test_dropIncompleteRows(self): samesize_frame = frame.dropna(subset=['bar']) assert_series_equal(frame['foo'], original) - self.assertTrue((frame['bar'] == 5).all()) + assert (frame['bar'] == 5).all() inp_frame2.dropna(subset=['bar'], inplace=True) - self.assert_index_equal(samesize_frame.index, self.frame.index) - self.assert_index_equal(inp_frame2.index, self.frame.index) + tm.assert_index_equal(samesize_frame.index, self.frame.index) + tm.assert_index_equal(inp_frame2.index, self.frame.index) def test_dropna(self): df = DataFrame(np.random.randn(6, 4)) @@ -134,7 +139,7 @@ def test_dropna(self): assert_frame_equal(dropped, expected) # bad input - self.assertRaises(ValueError, df.dropna, axis=3) + pytest.raises(ValueError, df.dropna, axis=3) def test_drop_and_dropna_caching(self): # tst that cacher updates @@ -153,10 +158,10 @@ def test_drop_and_dropna_caching(self): def test_dropna_corner(self): # bad input - self.assertRaises(ValueError, self.frame.dropna, how='foo') - self.assertRaises(TypeError, self.frame.dropna, how=None) + pytest.raises(ValueError, self.frame.dropna, how='foo') + pytest.raises(TypeError, self.frame.dropna, how=None) # non-existent column - 8303 - self.assertRaises(KeyError, self.frame.dropna, subset=['A', 'X']) + pytest.raises(KeyError, self.frame.dropna, subset=['A', 'X']) def test_dropna_multiple_axes(self): df = DataFrame([[1, np.nan, 2, 3], @@ -182,13 +187,12 @@ def test_fillna(self): tf.loc[tf.index[-5:], 'A'] = nan zero_filled = self.tsframe.fillna(0) - self.assertTrue((zero_filled.loc[zero_filled.index[:5], 'A'] == 0 - ).all()) + assert 
(zero_filled.loc[zero_filled.index[:5], 'A'] == 0).all() padded = self.tsframe.fillna(method='pad') - self.assertTrue(np.isnan(padded.loc[padded.index[:5], 'A']).all()) - self.assertTrue((padded.loc[padded.index[-5:], 'A'] == - padded.loc[padded.index[-5], 'A']).all()) + assert np.isnan(padded.loc[padded.index[:5], 'A']).all() + assert (padded.loc[padded.index[-5:], 'A'] == + padded.loc[padded.index[-5], 'A']).all() # mixed type mf = self.mixed_frame @@ -197,8 +201,8 @@ def test_fillna(self): result = self.mixed_frame.fillna(value=0) result = self.mixed_frame.fillna(method='pad') - self.assertRaises(ValueError, self.tsframe.fillna) - self.assertRaises(ValueError, self.tsframe.fillna, 5, method='ffill') + pytest.raises(ValueError, self.tsframe.fillna) + pytest.raises(ValueError, self.tsframe.fillna, 5, method='ffill') # mixed numeric (but no float16) mf = self.mixed_float.reindex(columns=['A', 'B', 'D']) @@ -252,6 +256,34 @@ def test_fillna(self): result = df.fillna(value={'Date': df['Date2']}) assert_frame_equal(result, expected) + # with timezone + # GH 15855 + df = pd.DataFrame({'A': [pd.Timestamp('2012-11-11 00:00:00+01:00'), + pd.NaT]}) + exp = pd.DataFrame({'A': [pd.Timestamp('2012-11-11 00:00:00+01:00'), + pd.Timestamp('2012-11-11 00:00:00+01:00')]}) + assert_frame_equal(df.fillna(method='pad'), exp) + + df = pd.DataFrame({'A': [pd.NaT, + pd.Timestamp('2012-11-11 00:00:00+01:00')]}) + exp = pd.DataFrame({'A': [pd.Timestamp('2012-11-11 00:00:00+01:00'), + pd.Timestamp('2012-11-11 00:00:00+01:00')]}) + assert_frame_equal(df.fillna(method='bfill'), exp) + + def test_fillna_downcast(self): + # GH 15277 + # infer int64 from float64 + df = pd.DataFrame({'a': [1., np.nan]}) + result = df.fillna(0, downcast='infer') + expected = pd.DataFrame({'a': [1, 0]}) + assert_frame_equal(result, expected) + + # infer int64 from float64 when fillna value is a dict + df = pd.DataFrame({'a': [1., np.nan]}) + result = df.fillna({'a': 0}, downcast='infer') + expected = pd.DataFrame({'a': [1, 0]}) + assert_frame_equal(result, expected) + def test_fillna_dtype_conversion(self): # make sure that fillna on an empty frame works df = DataFrame(index=["A", "B", "C"], columns=[1, 2, 3, 4, 5]) @@ -291,7 +323,7 @@ def test_fillna_datetime_columns(self): 'C': ['foo', 'bar', '?'], 'D': ['foo2', 'bar2', '?']}, index=date_range('20130110', periods=3)) - self.assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) df = pd.DataFrame({'A': [-1, -2, np.nan], 'B': [pd.Timestamp('2013-01-01'), @@ -306,7 +338,7 @@ def test_fillna_datetime_columns(self): 'C': ['foo', 'bar', '?'], 'D': ['foo2', 'bar2', '?']}, index=pd.date_range('20130110', periods=3)) - self.assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_ffill(self): self.tsframe['A'][:5] = nan @@ -322,6 +354,40 @@ def test_bfill(self): assert_frame_equal(self.tsframe.bfill(), self.tsframe.fillna(method='bfill')) + def test_frame_pad_backfill_limit(self): + index = np.arange(10) + df = DataFrame(np.random.randn(10, 4), index=index) + + result = df[:2].reindex(index, method='pad', limit=5) + + expected = df[:2].reindex(index).fillna(method='pad') + expected.values[-3:] = np.nan + tm.assert_frame_equal(result, expected) + + result = df[-2:].reindex(index, method='backfill', limit=5) + + expected = df[-2:].reindex(index).fillna(method='backfill') + expected.values[:3] = np.nan + tm.assert_frame_equal(result, expected) + + def test_frame_fillna_limit(self): + index = np.arange(10) + df = DataFrame(np.random.randn(10, 
4), index=index) + + result = df[:2].reindex(index) + result = result.fillna(method='pad', limit=5) + + expected = df[:2].reindex(index).fillna(method='pad') + expected.values[-3:] = np.nan + tm.assert_frame_equal(result, expected) + + result = df[-2:].reindex(index) + result = result.fillna(method='backfill', limit=5) + + expected = df[-2:].reindex(index).fillna(method='backfill') + expected.values[:3] = np.nan + tm.assert_frame_equal(result, expected) + def test_fillna_skip_certain_blocks(self): # don't try to fill boolean, int blocks @@ -336,18 +402,21 @@ def test_fillna_inplace(self): df[3][-4:] = np.nan expected = df.fillna(value=0) - self.assertIsNot(expected, df) + assert expected is not df df.fillna(value=0, inplace=True) - assert_frame_equal(df, expected) + tm.assert_frame_equal(df, expected) + + expected = df.fillna(value={0: 0}, inplace=True) + assert expected is None df[1][:4] = np.nan df[3][-4:] = np.nan expected = df.fillna(method='ffill') - self.assertIsNot(expected, df) + assert expected is not df df.fillna(method='ffill', inplace=True) - assert_frame_equal(df, expected) + tm.assert_frame_equal(df, expected) def test_fillna_dict_series(self): df = DataFrame({'a': [nan, 1, 2, nan, nan], @@ -370,7 +439,8 @@ def test_fillna_dict_series(self): assert_frame_equal(result, expected) # disable this for now - with assertRaisesRegexp(NotImplementedError, 'column by column'): + with tm.assert_raises_regex(NotImplementedError, + 'column by column'): df.fillna(df.max(1), axis=1) def test_fillna_dataframe(self): @@ -410,24 +480,23 @@ def test_fillna_columns(self): assert_frame_equal(result, expected) def test_fillna_invalid_method(self): - with assertRaisesRegexp(ValueError, 'ffil'): + with tm.assert_raises_regex(ValueError, 'ffil'): self.frame.fillna(method='ffil') def test_fillna_invalid_value(self): # list - self.assertRaises(TypeError, self.frame.fillna, [1, 2]) + pytest.raises(TypeError, self.frame.fillna, [1, 2]) # tuple - self.assertRaises(TypeError, self.frame.fillna, (1, 2)) + pytest.raises(TypeError, self.frame.fillna, (1, 2)) # frame with series - self.assertRaises(ValueError, self.frame.iloc[:, 0].fillna, - self.frame) + pytest.raises(ValueError, self.frame.iloc[:, 0].fillna, self.frame) def test_fillna_col_reordering(self): cols = ["COL." 
+ str(i) for i in range(5, 0, -1)] data = np.random.rand(20, 5) df = DataFrame(index=lrange(20), columns=cols, data=data) filled = df.fillna(method='ffill') - self.assertEqual(df.columns.tolist(), filled.columns.tolist()) + assert df.columns.tolist() == filled.columns.tolist() def test_fill_corner(self): mf = self.mixed_frame @@ -435,7 +504,7 @@ def test_fill_corner(self): mf.loc[mf.index[-10:], 'A'] = nan filled = self.mixed_frame.fillna(value=0) - self.assertTrue((filled.loc[filled.index[5:20], 'foo'] == 0).all()) + assert (filled.loc[filled.index[5:20], 'foo'] == 0).all() del self.mixed_frame['foo'] empty_float = self.frame.reindex(columns=[]) @@ -453,7 +522,7 @@ def test_fill_value_when_combine_const(self): assert_frame_equal(res, exp) -class TestDataFrameInterpolate(tm.TestCase, TestData): +class TestDataFrameInterpolate(TestData): def test_interp_basic(self): df = DataFrame({'A': [1, 2, np.nan, 4], @@ -478,7 +547,7 @@ def test_interp_bad_method(self): 'B': [1, 4, 9, np.nan], 'C': [1, 2, 3, 5], 'D': list('abcd')}) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.interpolate(method='not_a_method') def test_interp_combo(self): @@ -498,11 +567,12 @@ def test_interp_combo(self): def test_interp_nan_idx(self): df = DataFrame({'A': [1, 2, np.nan, 4], 'B': [np.nan, 2, 3, 4]}) df = df.set_index('A') - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): df.interpolate(method='values') def test_interp_various(self): tm._skip_if_no_scipy() + df = DataFrame({'A': [1, 2, np.nan, 4, 5, np.nan, 7], 'C': [1, 2, 3, 5, 8, 13, 21]}) df = df.set_index('C') @@ -514,8 +584,15 @@ def test_interp_various(self): assert_frame_equal(result, expected) result = df.interpolate(method='cubic') - expected.A.loc[3] = 2.81621174 - expected.A.loc[13] = 5.64146581 + # GH #15662. + # new cubic and quadratic interpolation algorithms from scipy 0.19.0. + # previously `splmake` was used. 
See scipy/scipy#6710 + if _is_scipy_ge_0190: + expected.A.loc[3] = 2.81547781 + expected.A.loc[13] = 5.52964175 + else: + expected.A.loc[3] = 2.81621174 + expected.A.loc[13] = 5.64146581 assert_frame_equal(result, expected) result = df.interpolate(method='nearest') @@ -524,8 +601,12 @@ def test_interp_various(self): assert_frame_equal(result, expected, check_dtype=False) result = df.interpolate(method='quadratic') - expected.A.loc[3] = 2.82533638 - expected.A.loc[13] = 6.02817974 + if _is_scipy_ge_0190: + expected.A.loc[3] = 2.82150771 + expected.A.loc[13] = 6.12648668 + else: + expected.A.loc[3] = 2.82533638 + expected.A.loc[13] = 6.02817974 assert_frame_equal(result, expected) result = df.interpolate(method='slinear') @@ -538,11 +619,6 @@ def test_interp_various(self): expected.A.loc[13] = 5 assert_frame_equal(result, expected, check_dtype=False) - result = df.interpolate(method='quadratic') - expected.A.loc[3] = 2.82533638 - expected.A.loc[13] = 6.02817974 - assert_frame_equal(result, expected) - def test_interp_alt_scipy(self): tm._skip_if_no_scipy() df = DataFrame({'A': [1, 2, np.nan, 4, 5, np.nan, 7], @@ -619,7 +695,7 @@ def test_interp_raise_on_only_mixed(self): 'C': [np.nan, 2, 5, 7], 'D': [np.nan, np.nan, 9, 9], 'E': [1, 2, 3, 4]}) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.interpolate(axis=1) def test_interp_inplace(self): @@ -663,10 +739,3 @@ def test_interp_ignore_all_good(self): # all good result = df[['B', 'D']].interpolate(downcast=None) assert_frame_equal(result, df[['B', 'D']]) - - -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - # '--with-coverage', '--cover-package=pandas.core'] - exit=False) diff --git a/pandas/tests/frame/test_mutate_columns.py b/pandas/tests/frame/test_mutate_columns.py index 5beab1565e538..4462260a290d9 100644 --- a/pandas/tests/frame/test_mutate_columns.py +++ b/pandas/tests/frame/test_mutate_columns.py @@ -1,15 +1,13 @@ # -*- coding: utf-8 -*- from __future__ import print_function - +import pytest from pandas.compat import range, lrange import numpy as np -from pandas import DataFrame, Series, Index +from pandas import DataFrame, Series, Index, MultiIndex -from pandas.util.testing import (assert_series_equal, - assert_frame_equal, - assertRaisesRegexp) +from pandas.util.testing import assert_frame_equal import pandas.util.testing as tm @@ -19,9 +17,7 @@ # Column add, remove, delete. 
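The interpolation hunks just above hard-code two sets of expected values because scipy 0.19.0 replaced the `splmake`-based cubic and quadratic interpolators (see GH 15662 and scipy/scipy#6710). The `_is_scipy_ge_0190` module flag that gates them follows a common pattern; a sketch of the same idea, assuming scipy may be missing entirely:

```python
from distutils.version import LooseVersion

try:
    import scipy
    _is_scipy_ge_0190 = (LooseVersion(scipy.__version__) >=
                         LooseVersion('0.19.0'))
except ImportError:
    _is_scipy_ge_0190 = False

# tests then branch on the flag when pinning algorithm-dependent values
expected_cubic = 2.81547781 if _is_scipy_ge_0190 else 2.81621174
```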
-class TestDataFrameMutateColumns(tm.TestCase, TestData):
-
-    _multiprocess_can_split_ = True
+class TestDataFrameMutateColumns(TestData):

     def test_assign(self):
         df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
@@ -78,13 +74,13 @@ def test_assign_alphabetical(self):
     def test_assign_bad(self):
         df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
         # non-keyword argument
-        with tm.assertRaises(TypeError):
+        with pytest.raises(TypeError):
             df.assign(lambda x: x.A)
-        with tm.assertRaises(AttributeError):
+        with pytest.raises(AttributeError):
             df.assign(C=df.A, D=df.A + df.C)
-        with tm.assertRaises(KeyError):
+        with pytest.raises(KeyError):
             df.assign(C=lambda df: df.A, D=lambda df: df['A'] + df['C'])
-        with tm.assertRaises(KeyError):
+        with pytest.raises(KeyError):
             df.assign(C=df.A, D=lambda x: x['A'] + x['C'])

     def test_insert_error_msmgs(self):
@@ -95,7 +91,7 @@ def test_insert_error_msmgs(self):
         s = DataFrame({'foo': ['a', 'b', 'c', 'a'], 'fiz': [
             'g', 'h', 'i', 'j']}).set_index('foo')
         msg = 'cannot reindex from a duplicate axis'
-        with assertRaisesRegexp(ValueError, msg):
+        with tm.assert_raises_regex(ValueError, msg):
             df['newcol'] = s

         # GH 4107, more descriptive error message
@@ -103,7 +99,7 @@ def test_insert_error_msmgs(self):
                        columns=['a', 'b', 'c', 'd'])

         msg = 'incompatible index of inserted column with frame index'
-        with assertRaisesRegexp(TypeError, msg):
+        with tm.assert_raises_regex(TypeError, msg):
             df['gr'] = df.groupby(['b', 'c']).count()

     def test_insert_benchmark(self):
@@ -123,12 +119,12 @@ def test_insert(self):
                        columns=['c', 'b', 'a'])

         df.insert(0, 'foo', df['a'])
-        self.assert_index_equal(df.columns, Index(['foo', 'c', 'b', 'a']))
+        tm.assert_index_equal(df.columns, Index(['foo', 'c', 'b', 'a']))
         tm.assert_series_equal(df['a'], df['foo'], check_names=False)

         df.insert(2, 'bar', df['c'])
-        self.assert_index_equal(df.columns,
-                                Index(['foo', 'c', 'bar', 'b', 'a']))
+        tm.assert_index_equal(df.columns,
+                              Index(['foo', 'c', 'bar', 'b', 'a']))
         tm.assert_almost_equal(df['c'], df['bar'], check_names=False)

         # diff dtype
@@ -136,25 +132,25 @@ def test_insert(self):
         # new item
         df['x'] = df['a'].astype('float32')
         result = Series(dict(float64=5, float32=1))
-        self.assertTrue((df.get_dtype_counts() == result).all())
+        assert (df.get_dtype_counts() == result).all()

         # replacing current (in different block)
         df['a'] = df['a'].astype('float32')
         result = Series(dict(float64=4, float32=2))
-        self.assertTrue((df.get_dtype_counts() == result).all())
+        assert (df.get_dtype_counts() == result).all()

         df['y'] = df['a'].astype('int32')
         result = Series(dict(float64=4, float32=2, int32=1))
-        self.assertTrue((df.get_dtype_counts() == result).all())
+        assert (df.get_dtype_counts() == result).all()

-        with assertRaisesRegexp(ValueError, 'already exists'):
+        with tm.assert_raises_regex(ValueError, 'already exists'):
             df.insert(1, 'a', df['b'])
-        self.assertRaises(ValueError, df.insert, 1, 'c', df['b'])
+        pytest.raises(ValueError, df.insert, 1, 'c', df['b'])

         df.columns.name = 'some_name'
         # preserve columns name field
         df.insert(0, 'baz', df['c'])
-        self.assertEqual(df.columns.name, 'some_name')
+        assert df.columns.name == 'some_name'

         # GH 13522
         df = DataFrame(index=['A', 'B', 'C'])
@@ -165,21 +161,45 @@ def test_insert(self):

     def test_delitem(self):
         del self.frame['A']
-        self.assertNotIn('A', self.frame)
+        assert 'A' not in self.frame
+
+    def test_delitem_multiindex(self):
+        midx = MultiIndex.from_product([['A', 'B'], [1, 2]])
+        df = DataFrame(np.random.randn(4, 4), columns=midx)
+        assert len(df.columns) == 4
+        assert ('A', ) in df.columns
+        assert 'A' in df.columns
+
+        result = df['A']
+        assert isinstance(result, DataFrame)
+        del df['A']
+
+        assert len(df.columns) == 2
+
+        # A still in the levels, BUT get a KeyError if trying
+        # to delete
+        assert ('A', ) not in df.columns
+        with pytest.raises(KeyError):
+            del df[('A',)]
+
+        # xref: https://github.com/pandas-dev/pandas/issues/2770
+        # the 'A' is STILL in the columns!
+        assert 'A' in df.columns
+        with pytest.raises(KeyError):
+            del df['A']

     def test_pop(self):
         self.frame.columns.name = 'baz'

         self.frame.pop('A')
-        self.assertNotIn('A', self.frame)
+        assert 'A' not in self.frame

         self.frame['foo'] = 'bar'
         self.frame.pop('foo')
-        self.assertNotIn('foo', self.frame)
-        # TODO self.assertEqual(self.frame.columns.name, 'baz')
+        assert 'foo' not in self.frame
+        # TODO assert self.frame.columns.name == 'baz'

-        # 10912
-        # inplace ops cause caching issue
+        # gh-10912: inplace ops cause caching issue
         a = DataFrame([[1, 2, 3], [4, 5, 6]], columns=[
                       'A', 'B', 'C'], index=['X', 'Y'])
         b = a.pop('B')
@@ -188,23 +208,23 @@ def test_pop(self):

         # original frame
         expected = DataFrame([[1, 3], [4, 6]], columns=[
                              'A', 'C'], index=['X', 'Y'])
-        assert_frame_equal(a, expected)
+        tm.assert_frame_equal(a, expected)

         # result
         expected = Series([2, 5], index=['X', 'Y'], name='B') + 1
-        assert_series_equal(b, expected)
+        tm.assert_series_equal(b, expected)

     def test_pop_non_unique_cols(self):
         df = DataFrame({0: [0, 1], 1: [0, 1], 2: [4, 5]})
         df.columns = ["a", "b", "a"]

         res = df.pop("a")
-        self.assertEqual(type(res), DataFrame)
-        self.assertEqual(len(res), 2)
-        self.assertEqual(len(df.columns), 1)
-        self.assertTrue("b" in df.columns)
-        self.assertFalse("a" in df.columns)
-        self.assertEqual(len(df.index), 2)
+        assert type(res) == DataFrame
+        assert len(res) == 2
+        assert len(df.columns) == 1
+        assert "b" in df.columns
+        assert "a" not in df.columns
+        assert len(df.index) == 2

     def test_insert_column_bug_4032(self):

diff --git a/pandas/tests/frame/test_nonunique_indexes.py b/pandas/tests/frame/test_nonunique_indexes.py
index 835c18ffc6081..4f77ba0ae1f5a 100644
--- a/pandas/tests/frame/test_nonunique_indexes.py
+++ b/pandas/tests/frame/test_nonunique_indexes.py
@@ -2,24 +2,21 @@
 from __future__ import print_function

+import pytest
 import numpy as np

 from pandas.compat import lrange, u
 from pandas import DataFrame, Series, MultiIndex, date_range
 import pandas as pd

-from pandas.util.testing import (assert_series_equal,
-                                 assert_frame_equal,
-                                 assertRaisesRegexp)
+from pandas.util.testing import assert_series_equal, assert_frame_equal

 import pandas.util.testing as tm

 from pandas.tests.frame.common import TestData


-class TestDataFrameNonuniqueIndexes(tm.TestCase, TestData):
-
-    _multiprocess_can_split_ = True
+class TestDataFrameNonuniqueIndexes(TestData):

     def test_column_dups_operations(self):

@@ -54,7 +51,7 @@ def check(result, expected=None):
                        [2, 1, 3, 5, 'bah']],
                       columns=['foo', 'bar', 'foo', 'hello', 'string'])
         check(df, expected)
-        with assertRaisesRegexp(ValueError, 'Length of value'):
+        with tm.assert_raises_regex(ValueError, 'Length of value'):
             df.insert(0, 'AnotherColumn', range(len(df.index) - 1))

         # insert same dtype
@@ -89,7 +86,7 @@ def check(result, expected=None):
         check(df, expected)

         # consolidate
-        df = df.consolidate()
+        df = df._consolidate()
         expected = DataFrame([[1, 1, 'bah', 3], [1, 2, 'bah', 3],
                               [2, 3, 'bah', 3]],
                              columns=['foo', 'foo', 'string', 'foo2'])
@@ -104,8 +101,8 @@ def check(result, expected=None):
         check(df, expected)

         # insert a dup
-        assertRaisesRegexp(ValueError, 'cannot insert',
-                           df.insert, 2, 'new_col', 4.)
+        tm.assert_raises_regex(ValueError, 'cannot insert',
+                               df.insert, 2, 'new_col', 4.)
         df.insert(2, 'new_col', 4., allow_duplicates=True)
         expected = DataFrame([[1, 1, 4., 5., 'bah', 3],
                               [1, 2, 4., 5., 'bah', 3],
@@ -154,7 +151,7 @@ def check(result, expected=None):
         df = DataFrame([[1, 2.5], [3, 4.5]], index=[1, 2], columns=['x', 'x'])
         result = df.values
         expected = np.array([[1, 2.5], [3, 4.5]])
-        self.assertTrue((result == expected).all().all())
+        assert (result == expected).all().all()

         # rename, GH 4403
         df4 = DataFrame(
@@ -191,8 +188,8 @@ def check(result, expected=None):
         # reindex is invalid!
         df = DataFrame([[1, 5, 7.], [1, 5, 7.], [1, 5, 7.]],
                        columns=['bar', 'a', 'a'])
-        self.assertRaises(ValueError, df.reindex, columns=['bar'])
-        self.assertRaises(ValueError, df.reindex, columns=['bar', 'foo'])
+        pytest.raises(ValueError, df.reindex, columns=['bar'])
+        pytest.raises(ValueError, df.reindex, columns=['bar', 'foo'])

         # drop
         df = DataFrame([[1, 5, 7.], [1, 5, 7.], [1, 5, 7.]],
@@ -309,7 +306,7 @@ def check(result, expected=None):
         # boolean with the duplicate raises
         df = DataFrame(np.arange(12).reshape(3, 4),
                        columns=dups, dtype='float64')
-        self.assertRaises(ValueError, lambda: df[df.A > 6])
+        pytest.raises(ValueError, lambda: df[df.A > 6])

         # dup aligining operations should work
         # GH 5185
@@ -326,7 +323,7 @@ def check(result, expected=None):
                         columns=['A', 'A'])

         # not-comparing like-labelled
-        self.assertRaises(ValueError, lambda: df1 == df2)
+        pytest.raises(ValueError, lambda: df1 == df2)

         df1r = df1.reindex_like(df2)
         result = df1r == df2
@@ -413,7 +410,7 @@ def test_columns_with_dups(self):
         assert_frame_equal(df, expected)

         # this is an error because we cannot disambiguate the dup columns
-        self.assertRaises(Exception, lambda x: DataFrame(
+        pytest.raises(Exception, lambda x: DataFrame(
             [[1, 2, 'foo', 'bar']], columns=['a', 'a', 'a', 'a']))

         # dups across blocks
@@ -428,10 +425,10 @@ def test_columns_with_dups(self):
                              columns=df_float.columns)
         df = pd.concat([df_float, df_int, df_bool, df_object, df_dt], axis=1)

-        self.assertEqual(len(df._data._blknos), len(df.columns))
-        self.assertEqual(len(df._data._blklocs), len(df.columns))
+        assert len(df._data._blknos) == len(df.columns)
+        assert len(df._data._blklocs) == len(df.columns)

-        # testing iget
+        # testing iloc
         for i in range(len(df.columns)):
             df.iloc[:, i]

@@ -451,7 +448,7 @@ def test_as_matrix_duplicates(self):
         expected = np.array([[1, 2, 'a', 'b'], [1, 2, 'a', 'b']],
                             dtype=object)

-        self.assertTrue(np.array_equal(result, expected))
+        assert np.array_equal(result, expected)

     def test_set_value_by_index(self):
         # See gh-12344
diff --git a/pandas/tests/frame/test_operators.py b/pandas/tests/frame/test_operators.py
index f843a5c08ce05..5052bef24e95a 100644
--- a/pandas/tests/frame/test_operators.py
+++ b/pandas/tests/frame/test_operators.py
@@ -5,7 +5,7 @@
 from datetime import datetime
 import operator

-import nose
+import pytest

 from numpy import nan, random
 import numpy as np
@@ -15,13 +15,12 @@
 from pandas import (DataFrame, Series, MultiIndex, Timestamp,
                     date_range)
 import pandas.core.common as com
-import pandas.formats.printing as printing
+import pandas.io.formats.printing as printing
 import pandas as pd

 from pandas.util.testing import (assert_numpy_array_equal,
                                  assert_series_equal,
-                                 assert_frame_equal,
-                                 assertRaisesRegexp)
+                                 assert_frame_equal)

 import pandas.util.testing as tm

@@ -29,9 +28,7 @@
                                       _check_mixed_int)


-class TestDataFrameOperators(tm.TestCase, TestData):
-
-    _multiprocess_can_split_ = True
+class TestDataFrameOperators(TestData):

     def test_operators(self):
         garbage = random.random(4)
@@ -44,17 +41,17 @@ def test_operators(self):
             for idx, val in compat.iteritems(series):
                 origVal = self.frame[col][idx] * 2
                 if not np.isnan(val):
-                    self.assertEqual(val, origVal)
+                    assert val == origVal
                 else:
-                    self.assertTrue(np.isnan(origVal))
+                    assert np.isnan(origVal)

         for col, series in compat.iteritems(seriesSum):
             for idx, val in compat.iteritems(series):
                 origVal = self.frame[col][idx] + colSeries[col]
                 if not np.isnan(val):
-                    self.assertEqual(val, origVal)
+                    assert val == origVal
                 else:
-                    self.assertTrue(np.isnan(origVal))
+                    assert np.isnan(origVal)

         added = self.frame2 + self.frame2
         expected = self.frame2 * 2
@@ -71,7 +68,7 @@ def test_operators(self):
             DataFrame(index=[0], dtype=dtype),
         ]
         for df in frames:
-            self.assertTrue((df + df).equals(df))
+            assert (df + df).equals(df)
             assert_frame_equal(df + df, df)

     def test_ops_np_scalar(self):
@@ -121,12 +118,12 @@ def test_operators_boolean(self):
         def f():
             DataFrame(1.0, index=[1], columns=['A']) | DataFrame(
                 True, index=[1], columns=['A'])
-        self.assertRaises(TypeError, f)
+        pytest.raises(TypeError, f)

         def f():
             DataFrame('foo', index=[1], columns=['A']) | DataFrame(
                 True, index=[1], columns=['A'])
-        self.assertRaises(TypeError, f)
+        pytest.raises(TypeError, f)

     def test_operators_none_as_na(self):
         df = DataFrame({"col1": [2, 5.0, 123, None],
@@ -140,12 +137,12 @@ def test_operators_none_as_na(self):
             filled = df.fillna(np.nan)
             result = op(df, 3)
             expected = op(filled, 3).astype(object)
-            expected[com.isnull(expected)] = None
+            expected[com.isna(expected)] = None
             assert_frame_equal(result, expected)

             result = op(df, df)
             expected = op(filled, filled).astype(object)
-            expected[com.isnull(expected)] = None
+            expected[com.isna(expected)] = None
             assert_frame_equal(result, expected)

             result = op(df, df.fillna(7))
@@ -159,12 +156,12 @@ def test_comparison_invalid(self):

         def check(df, df2):
             for (x, y) in [(df, df2), (df2, df)]:
-                self.assertRaises(TypeError, lambda: x == y)
-                self.assertRaises(TypeError, lambda: x != y)
-                self.assertRaises(TypeError, lambda: x >= y)
-                self.assertRaises(TypeError, lambda: x > y)
-                self.assertRaises(TypeError, lambda: x < y)
-                self.assertRaises(TypeError, lambda: x <= y)
+                pytest.raises(TypeError, lambda: x == y)
+                pytest.raises(TypeError, lambda: x != y)
+                pytest.raises(TypeError, lambda: x >= y)
+                pytest.raises(TypeError, lambda: x > y)
+                pytest.raises(TypeError, lambda: x < y)
+                pytest.raises(TypeError, lambda: x <= y)

         # GH4968
         # invalid date/int comparisons
@@ -191,6 +188,7 @@ def test_timestamp_compare(self):
         df.loc[np.random.rand(len(df)) > 0.5, 'dates2'] = pd.NaT
         ops = {'gt': 'lt', 'lt': 'gt', 'ge': 'le', 'le': 'ge', 'eq': 'eq',
                'ne': 'ne'}
+
         for left, right in ops.items():
             left_f = getattr(operator, left)
             right_f = getattr(operator, right)
@@ -239,7 +237,7 @@ def test_modulo(self):
         s = p[0]
         res = s % p
         res2 = p % s
-        self.assertFalse(np.array_equal(res.fillna(0), res2.fillna(0)))
+        assert not np.array_equal(res.fillna(0), res2.fillna(0))

     def test_div(self):

@@ -273,7 +271,7 @@ def test_div(self):
         s = p[0]
         res = s / p
         res2 = p / s
-        self.assertFalse(np.array_equal(res.fillna(0), res2.fillna(0)))
+        assert not np.array_equal(res.fillna(0), res2.fillna(0))

     def test_logical_operators(self):

@@ -281,14 +279,14 @@ def _check_bin_op(op):
             result = op(df1, df2)
             expected = DataFrame(op(df1.values, df2.values), index=df1.index,
                                  columns=df1.columns)
-            self.assertEqual(result.values.dtype, np.bool_)
+            assert result.values.dtype == np.bool_
             assert_frame_equal(result, expected)

         def _check_unary_op(op):
             result = op(df1)
             expected = DataFrame(op(df1.values), index=df1.index,
                                  columns=df1.columns)
-            self.assertEqual(result.values.dtype, np.bool_)
+            assert result.values.dtype == np.bool_
             assert_frame_equal(result, expected)

         df1 = {'a': {'a': True, 'b': False, 'c': False, 'd': True, 'e': True},
@@ -318,14 +316,12 @@ def _check_unary_op(op):
         # operator.neg is deprecated in numpy >= 1.9
         _check_unary_op(operator.inv)

-    def test_logical_typeerror(self):
-        if not compat.PY3:
-            self.assertRaises(TypeError, self.frame.__eq__, 'foo')
-            self.assertRaises(TypeError, self.frame.__lt__, 'foo')
-            self.assertRaises(TypeError, self.frame.__gt__, 'foo')
-            self.assertRaises(TypeError, self.frame.__ne__, 'foo')
-        else:
-            raise nose.SkipTest('test_logical_typeerror not tested on PY3')
+    @pytest.mark.parametrize('op,res', [('__eq__', False),
+                                        ('__ne__', True)])
+    def test_logical_typeerror_with_non_valid(self, op, res):
+        # we are comparing floats vs a string
+        result = getattr(self.frame, op)('foo')
+        assert bool(result.all().all()) is res

     def test_logical_with_nas(self):
         d = DataFrame({'a': [np.nan, False], 'b': [True, True]})
@@ -425,10 +421,10 @@ def test_arith_flex_frame(self):
                 # ndim >= 3
                 ndim_5 = np.ones(self.frame.shape + (3, 4, 5))
                 msg = "Unable to coerce to Series/DataFrame"
-                with assertRaisesRegexp(ValueError, msg):
+                with tm.assert_raises_regex(ValueError, msg):
                     f(self.frame, ndim_5)

-                with assertRaisesRegexp(ValueError, msg):
+                with tm.assert_raises_regex(ValueError, msg):
                     getattr(self.frame, op)(ndim_5)

         # res_add = self.frame.add(self.frame)
@@ -450,9 +446,9 @@ def test_arith_flex_frame(self):
         result = self.frame[:0].add(self.frame)
         assert_frame_equal(result, self.frame * np.nan)
-        with assertRaisesRegexp(NotImplementedError, 'fill_value'):
+        with tm.assert_raises_regex(NotImplementedError, 'fill_value'):
             self.frame.add(self.frame.iloc[0], fill_value=3)
-        with assertRaisesRegexp(NotImplementedError, 'fill_value'):
+        with tm.assert_raises_regex(NotImplementedError, 'fill_value'):
             self.frame.add(self.frame.iloc[0], axis='index', fill_value=3)

     def test_binary_ops_align(self):
@@ -576,8 +572,8 @@ def _check_unaligned_frame(meth, op, df, other):
             assert_frame_equal(rs, xp)

         # DataFrame
-        self.assertTrue(df.eq(df).values.all())
-        self.assertFalse(df.ne(df).values.any())
+        assert df.eq(df).values.all()
+        assert not df.ne(df).values.any()
         for op in ['eq', 'ne', 'gt', 'lt', 'ge', 'le']:
             f = getattr(df, op)
             o = getattr(operator, op)
@@ -591,7 +587,7 @@ def _check_unaligned_frame(meth, op, df, other):
             # NAs
             msg = "Unable to coerce to Series/DataFrame"
             assert_frame_equal(f(np.nan), o(df, np.nan))
-            with assertRaisesRegexp(ValueError, msg):
+            with tm.assert_raises_regex(ValueError, msg):
                 f(ndim_5)

         # Series
@@ -637,17 +633,17 @@ def _test_seq(df, idx_ser, col_ser):
         # NA
         df.loc[0, 0] = np.nan
         rs = df.eq(df)
-        self.assertFalse(rs.loc[0, 0])
+        assert not rs.loc[0, 0]
         rs = df.ne(df)
-        self.assertTrue(rs.loc[0, 0])
+        assert rs.loc[0, 0]
         rs = df.gt(df)
-        self.assertFalse(rs.loc[0, 0])
+        assert not rs.loc[0, 0]
         rs = df.lt(df)
-        self.assertFalse(rs.loc[0, 0])
+        assert not rs.loc[0, 0]
         rs = df.ge(df)
-        self.assertFalse(rs.loc[0, 0])
+        assert not rs.loc[0, 0]
         rs = df.le(df)
-        self.assertFalse(rs.loc[0, 0])
+        assert not rs.loc[0, 0]

         # complex
         arr = np.array([np.nan, 1, 6, np.nan])
@@ -655,14 +651,14 @@ def _test_seq(df, idx_ser, col_ser):
         df = DataFrame({'a': arr})
         df2 = DataFrame({'a': arr2})
         rs = df.gt(df2)
-        self.assertFalse(rs.values.any())
+        assert not rs.values.any()
         rs = df.ne(df2)
-        self.assertTrue(rs.values.all())
+        assert rs.values.all()

         arr3 = np.array([2j, np.nan, None])
         df3 = DataFrame({'a': arr3})
         rs = df3.gt(2j)
-        self.assertFalse(rs.values.any())
+        assert not rs.values.any()

         # corner, dtype=object
         df1 = DataFrame({'col': ['foo', np.nan, 'bar']})
@@ -679,13 +675,13 @@ def test_return_dtypes_bool_op_costant(self):
         # not empty DataFrame
         for op in ['eq', 'ne', 'gt', 'lt', 'ge', 'le']:
             result = getattr(df, op)(const).get_dtype_counts()
-            self.assert_series_equal(result, Series([2], ['bool']))
+            tm.assert_series_equal(result, Series([2], ['bool']))

         # empty DataFrame
         empty = df.iloc[:0]
         for op in ['eq', 'ne', 'gt', 'lt', 'ge', 'le']:
             result = getattr(empty, op)(const).get_dtype_counts()
-            self.assert_series_equal(result, Series([2], ['bool']))
+            tm.assert_series_equal(result, Series([2], ['bool']))

     def test_dti_tz_convert_to_utc(self):
         base = pd.DatetimeIndex(['2011-01-01', '2011-01-02',
@@ -769,31 +765,30 @@ def test_combineFrame(self):
             exp.loc[~exp.index.isin(indexer)] = np.nan
         tm.assert_series_equal(added['A'], exp.loc[added['A'].index])

-        self.assertTrue(
-            np.isnan(added['C'].reindex(frame_copy.index)[:5]).all())
+        assert np.isnan(added['C'].reindex(frame_copy.index)[:5]).all()

         # assert(False)

-        self.assertTrue(np.isnan(added['D']).all())
+        assert np.isnan(added['D']).all()

         self_added = self.frame + self.frame
-        self.assert_index_equal(self_added.index, self.frame.index)
+        tm.assert_index_equal(self_added.index, self.frame.index)

         added_rev = frame_copy + self.frame
-        self.assertTrue(np.isnan(added['D']).all())
-        self.assertTrue(np.isnan(added_rev['D']).all())
+        assert np.isnan(added['D']).all()
+        assert np.isnan(added_rev['D']).all()

         # corner cases

         # empty
         plus_empty = self.frame + self.empty
-        self.assertTrue(np.isnan(plus_empty.values).all())
+        assert np.isnan(plus_empty.values).all()

         empty_plus = self.empty + self.frame
-        self.assertTrue(np.isnan(empty_plus.values).all())
+        assert np.isnan(empty_plus.values).all()

         empty_empty = self.empty + self.empty
-        self.assertTrue(empty_empty.empty)
+        assert empty_empty.empty

         # out of order
         reverse = self.frame.reindex(columns=self.frame.columns[::-1])
@@ -833,12 +828,14 @@ def test_combineSeries(self):
         for key, s in compat.iteritems(self.frame):
             assert_series_equal(larger_added[key], s + series[key])
-        self.assertIn('E', larger_added)
-        self.assertTrue(np.isnan(larger_added['E']).all())
+        assert 'E' in larger_added
+        assert np.isnan(larger_added['E']).all()

-        # vs mix (upcast) as needed
+        # no upcast needed
         added = self.mixed_float + series
-        _check_mixed_float(added, dtype='float64')
+        _check_mixed_float(added)
+
+        # vs mix (upcast) as needed
         added = self.mixed_float + series.astype('float32')
         _check_mixed_float(added, dtype=dict(C=None))
         added = self.mixed_float + series.astype('float16')
@@ -865,16 +862,16 @@ def test_combineSeries(self):
         for key, col in compat.iteritems(self.tsframe):
             result = col + ts
             assert_series_equal(added[key], result, check_names=False)
-            self.assertEqual(added[key].name, key)
+            assert added[key].name == key
             if col.name == ts.name:
-                self.assertEqual(result.name, 'A')
+                assert result.name == 'A'
             else:
-                self.assertTrue(result.name is None)
+                assert result.name is None

         smaller_frame = self.tsframe[:-5]
         smaller_added = smaller_frame.add(ts, axis='index')
-        self.assert_index_equal(smaller_added.index, self.tsframe.index)
+        tm.assert_index_equal(smaller_added.index, self.tsframe.index)

         smaller_ts = ts[:-5]
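The `add(ts, axis='index')` calls above broadcast a Series down the rows, aligning on row labels; labels missing from the Series produce NaN, which is what the `smaller_added` assertions rely on. A small illustration of that alignment (the labels and data here are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6.0).reshape(3, 2),
                  index=['x', 'y', 'z'], columns=['a', 'b'])
s = pd.Series([10.0, 20.0], index=['x', 'y'])

# align on the index and broadcast across columns; 'z' has no match
out = df.add(s, axis='index')
print(out.loc['z'].isna().all())  # True
```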
         smaller_added2 = self.tsframe.add(smaller_ts, axis='index')
@@ -895,22 +892,22 @@ def test_combineSeries(self):
         # empty but with non-empty index
         frame = self.tsframe[:1].reindex(columns=[])
         result = frame.mul(ts, axis='index')
-        self.assertEqual(len(result), len(ts))
+        assert len(result) == len(ts)

     def test_combineFunc(self):
         result = self.frame * 2
-        self.assert_numpy_array_equal(result.values, self.frame.values * 2)
+        tm.assert_numpy_array_equal(result.values, self.frame.values * 2)

         # vs mix
         result = self.mixed_float * 2
         for c, s in compat.iteritems(result):
-            self.assert_numpy_array_equal(
+            tm.assert_numpy_array_equal(
                 s.values, self.mixed_float[c].values * 2)
         _check_mixed_float(result, dtype=dict(C=None))

         result = self.empty * 2
-        self.assertIs(result.index, self.empty.index)
-        self.assertEqual(len(result.columns), 0)
+        assert result.index is self.empty.index
+        assert len(result.columns) == 0

     def test_comparisons(self):
         df1 = tm.makeTimeDataFrame()
@@ -921,21 +918,23 @@ def test_comparisons(self):

         def test_comp(func):
             result = func(df1, df2)
-            self.assert_numpy_array_equal(result.values,
-                                          func(df1.values, df2.values))
-            with assertRaisesRegexp(ValueError, 'Wrong number of dimensions'):
+            tm.assert_numpy_array_equal(result.values,
+                                        func(df1.values, df2.values))
+            with tm.assert_raises_regex(ValueError,
+                                        'Wrong number of dimensions'):
                 func(df1, ndim_5)

             result2 = func(self.simple, row)
-            self.assert_numpy_array_equal(result2.values,
-                                          func(self.simple.values, row.values))
+            tm.assert_numpy_array_equal(result2.values,
+                                        func(self.simple.values, row.values))

             result3 = func(self.frame, 0)
-            self.assert_numpy_array_equal(result3.values,
-                                          func(self.frame.values, 0))
+            tm.assert_numpy_array_equal(result3.values,
+                                        func(self.frame.values, 0))

-            with assertRaisesRegexp(ValueError, 'Can only compare '
-                                    'identically-labeled DataFrame'):
+            with tm.assert_raises_regex(ValueError,
+                                        'Can only compare identically'
+                                        '-labeled DataFrame'):
                 func(self.simple, self.simple[:2])

         test_comp(operator.eq)
@@ -952,7 +951,7 @@ def test_comparison_protected_from_errstate(self):
         expected = missing_df.values < 0
         with np.errstate(invalid='raise'):
             result = (missing_df < 0).values
-        self.assert_numpy_array_equal(result, expected)
+        tm.assert_numpy_array_equal(result, expected)

     def test_string_comparison(self):
         df = DataFrame([{"a": 1, "b": "foo"}, {"a": 2, "b": "bar"}])
@@ -968,7 +967,7 @@ def test_float_none_comparison(self):
         df = DataFrame(np.random.randn(8, 3), index=lrange(8),
                        columns=['A', 'B', 'C'])

-        self.assertRaises(TypeError, df.__eq__, None)
+        pytest.raises(TypeError, df.__eq__, None)

     def test_boolean_comparison(self):

@@ -1001,8 +1000,8 @@ def test_boolean_comparison(self):
         result = df.values > b_r
         assert_numpy_array_equal(result, expected.values)

-        self.assertRaises(ValueError, df.__gt__, b_c)
-        self.assertRaises(ValueError, df.values.__gt__, b_c)
+        pytest.raises(ValueError, df.__gt__, b_c)
+        pytest.raises(ValueError, df.values.__gt__, b_c)

         # ==
         expected = DataFrame([[False, False], [True, False], [False, False]])
@@ -1021,8 +1020,8 @@ def test_boolean_comparison(self):
         result = df.values == b_r
         assert_numpy_array_equal(result, expected.values)

-        self.assertRaises(ValueError, lambda: df == b_c)
-        self.assertFalse(np.array_equal(df.values, b_c))
+        pytest.raises(ValueError, lambda: df == b_c)
+        assert not np.array_equal(df.values, b_c)

         # with alignment
         df = DataFrame(np.arange(6).reshape((3, 2)),
@@ -1037,76 +1036,8 @@ def test_boolean_comparison(self):
         assert_frame_equal(result, expected)

         # not shape compatible
-        self.assertRaises(ValueError, lambda: df == (2, 2))
-        self.assertRaises(ValueError, lambda: df == [2, 2])
-
-    def test_combineAdd(self):
-
-        with tm.assert_produces_warning(FutureWarning):
-            # trivial
-            comb = self.frame.combineAdd(self.frame)
-            assert_frame_equal(comb, self.frame * 2)
-
-        # more rigorous
-        a = DataFrame([[1., nan, nan, 2., nan]],
-                      columns=np.arange(5))
-        b = DataFrame([[2., 3., nan, 2., 6., nan]],
-                      columns=np.arange(6))
-        expected = DataFrame([[3., 3., nan, 4., 6., nan]],
-                             columns=np.arange(6))
-
-        with tm.assert_produces_warning(FutureWarning):
-            result = a.combineAdd(b)
-        assert_frame_equal(result, expected)
-
-        with tm.assert_produces_warning(FutureWarning):
-            result2 = a.T.combineAdd(b.T)
-        assert_frame_equal(result2, expected.T)
-
-        expected2 = a.combine(b, operator.add, fill_value=0.)
-        assert_frame_equal(expected, expected2)
-
-        # corner cases
-        with tm.assert_produces_warning(FutureWarning):
-            comb = self.frame.combineAdd(self.empty)
-        assert_frame_equal(comb, self.frame)
-
-        with tm.assert_produces_warning(FutureWarning):
-            comb = self.empty.combineAdd(self.frame)
-        assert_frame_equal(comb, self.frame)
-
-        # integer corner case
-        df1 = DataFrame({'x': [5]})
-        df2 = DataFrame({'x': [1]})
-        df3 = DataFrame({'x': [6]})
-
-        with tm.assert_produces_warning(FutureWarning):
-            comb = df1.combineAdd(df2)
-        assert_frame_equal(comb, df3)
-
-        # mixed type GH2191
-        df1 = DataFrame({'A': [1, 2], 'B': [3, 4]})
-        df2 = DataFrame({'A': [1, 2], 'C': [5, 6]})
-        with tm.assert_produces_warning(FutureWarning):
-            rs = df1.combineAdd(df2)
-        xp = DataFrame({'A': [2, 4], 'B': [3, 4.], 'C': [5, 6.]})
-        assert_frame_equal(xp, rs)
-
-        # TODO: test integer fill corner?
-
-    def test_combineMult(self):
-        with tm.assert_produces_warning(FutureWarning):
-            # trivial
-            comb = self.frame.combineMult(self.frame)
-
-            assert_frame_equal(comb, self.frame ** 2)
-
-            # corner cases
-            comb = self.frame.combineMult(self.empty)
-            assert_frame_equal(comb, self.frame)
-
-            comb = self.empty.combineMult(self.frame)
-            assert_frame_equal(comb, self.frame)
+        pytest.raises(ValueError, lambda: df == (2, 2))
+        pytest.raises(ValueError, lambda: df == [2, 2])

     def test_combine_generic(self):
         df1 = self.frame
@@ -1114,8 +1045,8 @@ def test_combine_generic(self):
         combined = df1.combine(df2, np.add)
         combined2 = df2.combine(df1, np.add)
-        self.assertTrue(combined['D'].isnull().all())
-        self.assertTrue(combined2['D'].isnull().all())
+        assert combined['D'].isna().all()
+        assert combined2['D'].isna().all()

         chunk = combined.loc[combined.index[:-5], ['A', 'B', 'C']]
         chunk2 = combined2.loc[combined2.index[:-5], ['A', 'B', 'C']]
@@ -1185,16 +1116,16 @@ def test_inplace_ops_identity(self):
         s += 1
         assert_series_equal(s, s2)
         assert_series_equal(s_orig + 1, s)
-        self.assertIs(s, s2)
-        self.assertIs(s._data, s2._data)
+        assert s is s2
+        assert s._data is s2._data

         df = df_orig.copy()
         df2 = df
         df += 1
         assert_frame_equal(df, df2)
         assert_frame_equal(df_orig + 1, df)
-        self.assertIs(df, df2)
-        self.assertIs(df._data, df2._data)
+        assert df is df2
+        assert df._data is df2._data

         # dtype change
         s = s_orig.copy()
@@ -1208,8 +1139,8 @@ def test_inplace_ops_identity(self):
         df += 1.5
         assert_frame_equal(df, df2)
         assert_frame_equal(df_orig + 1.5, df)
-        self.assertIs(df, df2)
-        self.assertIs(df._data, df2._data)
+        assert df is df2
+        assert df._data is df2._data

         # mixed dtype
         arr = np.random.randint(0, 10, size=5)
@@ -1220,7 +1151,7 @@ def test_inplace_ops_identity(self):
         expected = DataFrame({'A': arr.copy() + 1, 'B': 'foo'})
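`test_inplace_ops_identity`, which this hunk converts, asserts that augmented assignment keeps both the binding and the underlying block manager: `df += 1` mutates in place rather than rebinding. A compact sketch of the invariant being asserted (`_data` is internal API, shown for illustration only):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
df2 = df            # second name for the same object

df += 1             # in-place add; no dtype change forced

assert df is df2                # the object identity survives
assert df._data is df2._data    # and so does the internal BlockManager
```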
assert_frame_equal(df, expected) assert_frame_equal(df2, expected) - self.assertIs(df._data, df2._data) + assert df._data is df2._data df = df_orig.copy() df2 = df @@ -1228,7 +1159,7 @@ def test_inplace_ops_identity(self): expected = DataFrame({'A': arr.copy() + 1.5, 'B': 'foo'}) assert_frame_equal(df, expected) assert_frame_equal(df2, expected) - self.assertIs(df._data, df2._data) + assert df._data is df2._data def test_alignment_non_pandas(self): index = ['A', 'B', 'C'] @@ -1247,10 +1178,10 @@ def test_alignment_non_pandas(self): # length mismatch msg = 'Unable to coerce to Series, length must be 3: given 2' for val in [[1, 2], (1, 2), np.array([1, 2])]: - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): align(df, val, 'index') - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): align(df, val, 'columns') val = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) @@ -1264,19 +1195,14 @@ def test_alignment_non_pandas(self): # shape mismatch msg = 'Unable to coerce to DataFrame, shape must be' val = np.array([[1, 2, 3], [4, 5, 6]]) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): align(df, val, 'index') - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): align(df, val, 'columns') val = np.zeros((3, 3, 3)) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): align(df, val, 'index') - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): align(df, val, 'columns') - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/frame/test_period.py b/pandas/tests/frame/test_period.py new file mode 100644 index 0000000000000..482210966fe6b --- /dev/null +++ b/pandas/tests/frame/test_period.py @@ -0,0 +1,140 @@ +import numpy as np +from numpy.random import randn +from datetime import timedelta + +import pandas as pd +import pandas.util.testing as tm +from pandas import (PeriodIndex, period_range, DataFrame, date_range, + Index, to_datetime, DatetimeIndex) + + +def _permute(obj): + return obj.take(np.random.permutation(len(obj))) + + +class TestPeriodIndex(object): + + def setup_method(self, method): + pass + + def test_as_frame_columns(self): + rng = period_range('1/1/2000', periods=5) + df = DataFrame(randn(10, 5), columns=rng) + + ts = df[rng[0]] + tm.assert_series_equal(ts, df.iloc[:, 0]) + + # GH # 1211 + repr(df) + + ts = df['1/1/2000'] + tm.assert_series_equal(ts, df.iloc[:, 0]) + + def test_frame_setitem(self): + rng = period_range('1/1/2000', periods=5, name='index') + df = DataFrame(randn(5, 3), index=rng) + + df['Index'] = rng + rs = Index(df['Index']) + tm.assert_index_equal(rs, rng, check_names=False) + assert rs.name == 'Index' + assert rng.name == 'index' + + rs = df.reset_index().set_index('index') + assert isinstance(rs.index, PeriodIndex) + tm.assert_index_equal(rs.index, rng) + + def test_frame_to_time_stamp(self): + K = 5 + index = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009') + df = DataFrame(randn(len(index), K), index=index) + df['mix'] = 'a' + + exp_index = date_range('1/1/2001', end='12/31/2009', freq='A-DEC') + result = df.to_timestamp('D', 'end') + tm.assert_index_equal(result.index, exp_index) + tm.assert_numpy_array_equal(result.values, df.values) + + exp_index = date_range('1/1/2001', end='1/1/2009', freq='AS-JAN') + result = df.to_timestamp('D', 'start') + 
tm.assert_index_equal(result.index, exp_index) + + def _get_with_delta(delta, freq='A-DEC'): + return date_range(to_datetime('1/1/2001') + delta, + to_datetime('12/31/2009') + delta, freq=freq) + + delta = timedelta(hours=23) + result = df.to_timestamp('H', 'end') + exp_index = _get_with_delta(delta) + tm.assert_index_equal(result.index, exp_index) + + delta = timedelta(hours=23, minutes=59) + result = df.to_timestamp('T', 'end') + exp_index = _get_with_delta(delta) + tm.assert_index_equal(result.index, exp_index) + + result = df.to_timestamp('S', 'end') + delta = timedelta(hours=23, minutes=59, seconds=59) + exp_index = _get_with_delta(delta) + tm.assert_index_equal(result.index, exp_index) + + # columns + df = df.T + + exp_index = date_range('1/1/2001', end='12/31/2009', freq='A-DEC') + result = df.to_timestamp('D', 'end', axis=1) + tm.assert_index_equal(result.columns, exp_index) + tm.assert_numpy_array_equal(result.values, df.values) + + exp_index = date_range('1/1/2001', end='1/1/2009', freq='AS-JAN') + result = df.to_timestamp('D', 'start', axis=1) + tm.assert_index_equal(result.columns, exp_index) + + delta = timedelta(hours=23) + result = df.to_timestamp('H', 'end', axis=1) + exp_index = _get_with_delta(delta) + tm.assert_index_equal(result.columns, exp_index) + + delta = timedelta(hours=23, minutes=59) + result = df.to_timestamp('T', 'end', axis=1) + exp_index = _get_with_delta(delta) + tm.assert_index_equal(result.columns, exp_index) + + result = df.to_timestamp('S', 'end', axis=1) + delta = timedelta(hours=23, minutes=59, seconds=59) + exp_index = _get_with_delta(delta) + tm.assert_index_equal(result.columns, exp_index) + + # invalid axis + tm.assert_raises_regex( + ValueError, 'axis', df.to_timestamp, axis=2) + + result1 = df.to_timestamp('5t', axis=1) + result2 = df.to_timestamp('t', axis=1) + expected = pd.date_range('2001-01-01', '2009-01-01', freq='AS') + assert isinstance(result1.columns, DatetimeIndex) + assert isinstance(result2.columns, DatetimeIndex) + tm.assert_numpy_array_equal(result1.columns.asi8, expected.asi8) + tm.assert_numpy_array_equal(result2.columns.asi8, expected.asi8) + # PeriodIndex.to_timestamp always use 'infer' + assert result1.columns.freqstr == 'AS-JAN' + assert result2.columns.freqstr == 'AS-JAN' + + def test_frame_index_to_string(self): + index = PeriodIndex(['2011-1', '2011-2', '2011-3'], freq='M') + frame = DataFrame(np.random.randn(3, 4), index=index) + + # it works! 
+ frame.to_string() + + def test_align_frame(self): + rng = period_range('1/1/2000', '1/1/2010', freq='A') + ts = DataFrame(np.random.randn(len(rng), 3), index=rng) + + result = ts + ts[::2] + expected = ts + ts + expected.values[1::2] = np.nan + tm.assert_frame_equal(result, expected) + + result = ts + _permute(ts[::2]) + tm.assert_frame_equal(result, expected) diff --git a/pandas/tests/frame/test_quantile.py b/pandas/tests/frame/test_quantile.py index 22414a6ba8a53..2f264874378bc 100644 --- a/pandas/tests/frame/test_quantile.py +++ b/pandas/tests/frame/test_quantile.py @@ -3,36 +3,31 @@ from __future__ import print_function -import nose +import pytest import numpy as np from pandas import (DataFrame, Series, Timestamp, _np_version_under1p11) import pandas as pd -from pandas.util.testing import (assert_series_equal, - assert_frame_equal, - assertRaisesRegexp) +from pandas.util.testing import assert_series_equal, assert_frame_equal import pandas.util.testing as tm -from pandas import _np_version_under1p9 from pandas.tests.frame.common import TestData -class TestDataFrameQuantile(tm.TestCase, TestData): - - _multiprocess_can_split_ = True +class TestDataFrameQuantile(TestData): def test_quantile(self): from numpy import percentile q = self.tsframe.quantile(0.1, axis=0) - self.assertEqual(q['A'], percentile(self.tsframe['A'], 10)) + assert q['A'] == percentile(self.tsframe['A'], 10) tm.assert_index_equal(q.index, self.tsframe.columns) q = self.tsframe.quantile(0.9, axis=1) - self.assertEqual(q['2000-01-17'], - percentile(self.tsframe.loc['2000-01-17'], 90)) + assert (q['2000-01-17'] == + percentile(self.tsframe.loc['2000-01-17'], 90)) tm.assert_index_equal(q.index, self.tsframe.index) # test degenerate case @@ -79,7 +74,7 @@ def test_quantile_axis_mixed(self): # must raise def f(): df.quantile(.5, axis=1, numeric_only=False) - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) def test_quantile_axis_parameter(self): # GH 9543/9544 @@ -102,44 +97,41 @@ def test_quantile_axis_parameter(self): result = df.quantile(.5, axis="columns") assert_series_equal(result, expected) - self.assertRaises(ValueError, df.quantile, 0.1, axis=-1) - self.assertRaises(ValueError, df.quantile, 0.1, axis="column") + pytest.raises(ValueError, df.quantile, 0.1, axis=-1) + pytest.raises(ValueError, df.quantile, 0.1, axis="column") def test_quantile_interpolation(self): - # GH #10174 - if _np_version_under1p9: - raise nose.SkipTest("Numpy version under 1.9") - + # see gh-10174 from numpy import percentile # interpolation = linear (default case) q = self.tsframe.quantile(0.1, axis=0, interpolation='linear') - self.assertEqual(q['A'], percentile(self.tsframe['A'], 10)) + assert q['A'] == percentile(self.tsframe['A'], 10) q = self.intframe.quantile(0.1) - self.assertEqual(q['A'], percentile(self.intframe['A'], 10)) + assert q['A'] == percentile(self.intframe['A'], 10) # test with and without interpolation keyword q1 = self.intframe.quantile(0.1) - self.assertEqual(q1['A'], np.percentile(self.intframe['A'], 10)) - assert_series_equal(q, q1) + assert q1['A'] == np.percentile(self.intframe['A'], 10) + tm.assert_series_equal(q, q1) # interpolation method other than default linear df = DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]}, index=[1, 2, 3]) result = df.quantile(.5, axis=1, interpolation='nearest') expected = Series([1, 2, 3], index=[1, 2, 3], name=0.5) - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) # cross-check interpolation=nearest results in original dtype exp = 
np.percentile(np.array([[1, 2, 3], [2, 3, 4]]), .5, axis=0, interpolation='nearest') expected = Series(exp, index=[1, 2, 3], name=0.5, dtype='int64') - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) # float df = DataFrame({"A": [1., 2., 3.], "B": [2., 3., 4.]}, index=[1, 2, 3]) result = df.quantile(.5, axis=1, interpolation='nearest') expected = Series([1., 2., 3.], index=[1, 2, 3], name=0.5) - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) exp = np.percentile(np.array([[1., 2., 3.], [2., 3., 4.]]), .5, axis=0, interpolation='nearest') expected = Series(exp, index=[1, 2, 3], name=0.5, dtype='float64') @@ -170,44 +162,6 @@ def test_quantile_interpolation(self): index=[.25, .5], columns=['a', 'b', 'c']) assert_frame_equal(result, expected) - def test_quantile_interpolation_np_lt_1p9(self): - # GH #10174 - if not _np_version_under1p9: - raise nose.SkipTest("Numpy version is greater than 1.9") - - from numpy import percentile - - # interpolation = linear (default case) - q = self.tsframe.quantile(0.1, axis=0, interpolation='linear') - self.assertEqual(q['A'], percentile(self.tsframe['A'], 10)) - q = self.intframe.quantile(0.1) - self.assertEqual(q['A'], percentile(self.intframe['A'], 10)) - - # test with and without interpolation keyword - q1 = self.intframe.quantile(0.1) - self.assertEqual(q1['A'], np.percentile(self.intframe['A'], 10)) - assert_series_equal(q, q1) - - # interpolation method other than default linear - expErrMsg = "Interpolation methods other than linear" - df = DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]}, index=[1, 2, 3]) - with assertRaisesRegexp(ValueError, expErrMsg): - df.quantile(.5, axis=1, interpolation='nearest') - - with assertRaisesRegexp(ValueError, expErrMsg): - df.quantile([.5, .75], axis=1, interpolation='lower') - - # test degenerate case - df = DataFrame({'x': [], 'y': []}) - with assertRaisesRegexp(ValueError, expErrMsg): - q = df.quantile(0.1, axis=0, interpolation='higher') - - # multi - df = DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3]], - columns=['a', 'b', 'c']) - with assertRaisesRegexp(ValueError, expErrMsg): - df.quantile([.25, .5], interpolation='midpoint') - def test_quantile_multi(self): df = DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3]], columns=['a', 'b', 'c']) @@ -270,7 +224,7 @@ def test_quantile_datetime(self): def test_quantile_invalid(self): msg = 'percentiles should all be in the interval \\[0, 1\\]' for invalid in [-1, 2, [0.5, -1], [0.5, 2]]: - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): self.tsframe.quantile(invalid) def test_quantile_box(self): @@ -433,7 +387,7 @@ def test_quantile_empty(self): # res = df.quantile(0.5) # datetimes - df = DataFrame(columns=['a', 'b'], dtype='datetime64') + df = DataFrame(columns=['a', 'b'], dtype='datetime64[ns]') # FIXME (gives NaNs instead of NaT in 0.18.1 or 0.19.0) # res = df.quantile(0.5, numeric_only=False) diff --git a/pandas/tests/frame/test_query_eval.py b/pandas/tests/frame/test_query_eval.py index 36ae5dac733a5..f0f1a2df27e93 100644 --- a/pandas/tests/frame/test_query_eval.py +++ b/pandas/tests/frame/test_query_eval.py @@ -3,8 +3,7 @@ from __future__ import print_function import operator -import nose -from itertools import product +import pytest from pandas.compat import (zip, range, lrange, StringIO) from pandas import DataFrame, Series, Index, MultiIndex, date_range @@ -15,11 +14,10 @@ from pandas.util.testing import (assert_series_equal, assert_frame_equal, - assertRaises, 
makeCustomDataframe as mkdf) import pandas.util.testing as tm -from pandas.computation import _NUMEXPR_INSTALLED +from pandas.core.computation import _NUMEXPR_INSTALLED from pandas.tests.frame.common import TestData @@ -28,21 +26,31 @@ ENGINES = 'python', 'numexpr' +@pytest.fixture(params=PARSERS, ids=lambda x: x) +def parser(request): + return request.param + + +@pytest.fixture(params=ENGINES, ids=lambda x: x) +def engine(request): + return request.param + + def skip_if_no_pandas_parser(parser): if parser != 'pandas': - raise nose.SkipTest("cannot evaluate with parser {0!r}".format(parser)) + pytest.skip("cannot evaluate with parser {0!r}".format(parser)) def skip_if_no_ne(engine='numexpr'): if engine == 'numexpr': if not _NUMEXPR_INSTALLED: - raise nose.SkipTest("cannot query engine numexpr when numexpr not " - "installed") + pytest.skip("cannot query engine numexpr when numexpr not " + "installed") -class TestCompat(tm.TestCase): +class TestCompat(object): - def setUp(self): + def setup_method(self, method): self.df = DataFrame({'A': [1, 2, 3]}) self.expected1 = self.df[self.df.A > 0] self.expected2 = self.df.A + 1 @@ -82,15 +90,13 @@ def test_query_numexpr(self): result = df.eval('A+1', engine='numexpr') assert_series_equal(result, self.expected2, check_names=False) else: - self.assertRaises(ImportError, - lambda: df.query('A>0', engine='numexpr')) - self.assertRaises(ImportError, - lambda: df.eval('A+1', engine='numexpr')) + pytest.raises(ImportError, + lambda: df.query('A>0', engine='numexpr')) + pytest.raises(ImportError, + lambda: df.eval('A+1', engine='numexpr')) -class TestDataFrameEval(tm.TestCase, TestData): - - _multiprocess_can_split_ = True +class TestDataFrameEval(TestData): def test_ops(self): @@ -141,10 +147,10 @@ def test_query_non_str(self): df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'b']}) msg = "expr must be a string to be evaluated" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): df.query(lambda x: x.B == "b") - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): df.query(111) def test_query_empty_string(self): @@ -152,7 +158,7 @@ def test_query_empty_string(self): df = pd.DataFrame({'A': [1, 2, 3]}) msg = "expr cannot be an empty string" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): df.query('') def test_eval_resolvers_as_list(self): @@ -160,18 +166,17 @@ def test_eval_resolvers_as_list(self): df = DataFrame(randn(10, 2), columns=list('ab')) dict1 = {'a': 1} dict2 = {'b': 2} - self.assertTrue(df.eval('a + b', resolvers=[dict1, dict2]) == - dict1['a'] + dict2['b']) - self.assertTrue(pd.eval('a + b', resolvers=[dict1, dict2]) == - dict1['a'] + dict2['b']) - + assert (df.eval('a + b', resolvers=[dict1, dict2]) == + dict1['a'] + dict2['b']) + assert (pd.eval('a + b', resolvers=[dict1, dict2]) == + dict1['a'] + dict2['b']) -class TestDataFrameQueryWithMultiIndex(tm.TestCase): - _multiprocess_can_split_ = True +class TestDataFrameQueryWithMultiIndex(object): - def check_query_with_named_multiindex(self, parser, engine): + def test_query_with_named_multiindex(self, parser, engine): tm.skip_if_no_ne(engine) + skip_if_no_pandas_parser(parser) a = np.random.choice(['red', 'green'], size=10) b = np.random.choice(['eggs', 'ham'], size=10) index = MultiIndex.from_arrays([a, b], names=['color', 'food']) @@ -219,12 +224,9 @@ def check_query_with_named_multiindex(self, parser, engine): assert_frame_equal(res1, exp) assert_frame_equal(res2, 
exp) - def test_query_with_named_multiindex(self): - for parser, engine in product(['pandas'], ENGINES): - yield self.check_query_with_named_multiindex, parser, engine - - def check_query_with_unnamed_multiindex(self, parser, engine): + def test_query_with_unnamed_multiindex(self, parser, engine): tm.skip_if_no_ne(engine) + skip_if_no_pandas_parser(parser) a = np.random.choice(['red', 'green'], size=10) b = np.random.choice(['eggs', 'ham'], size=10) index = MultiIndex.from_arrays([a, b]) @@ -313,12 +315,9 @@ def check_query_with_unnamed_multiindex(self, parser, engine): assert_frame_equal(res1, exp) assert_frame_equal(res2, exp) - def test_query_with_unnamed_multiindex(self): - for parser, engine in product(['pandas'], ENGINES): - yield self.check_query_with_unnamed_multiindex, parser, engine - - def check_query_with_partially_named_multiindex(self, parser, engine): + def test_query_with_partially_named_multiindex(self, parser, engine): tm.skip_if_no_ne(engine) + skip_if_no_pandas_parser(parser) a = np.random.choice(['red', 'green'], size=10) b = np.arange(10) index = MultiIndex.from_arrays([a, b]) @@ -346,17 +345,7 @@ def check_query_with_partially_named_multiindex(self, parser, engine): exp = df[ind != "red"] assert_frame_equal(res, exp) - def test_query_with_partially_named_multiindex(self): - for parser, engine in product(['pandas'], ENGINES): - yield (self.check_query_with_partially_named_multiindex, - parser, engine) - def test_query_multiindex_get_index_resolvers(self): - for parser, engine in product(['pandas'], ENGINES): - yield (self.check_query_multiindex_get_index_resolvers, parser, - engine) - - def check_query_multiindex_get_index_resolvers(self, parser, engine): df = mkdf(10, 3, r_idx_nlevels=2, r_idx_names=['spam', 'eggs']) resolvers = df._get_index_resolvers() @@ -380,41 +369,31 @@ def to_series(mi, level): else: raise AssertionError("object must be a Series or Index") - def test_raise_on_panel_with_multiindex(self): - for parser, engine in product(PARSERS, ENGINES): - yield self.check_raise_on_panel_with_multiindex, parser, engine - - def check_raise_on_panel_with_multiindex(self, parser, engine): + def test_raise_on_panel_with_multiindex(self, parser, engine): tm.skip_if_no_ne() p = tm.makePanel(7) p.items = tm.makeCustomIndex(len(p.items), nlevels=2) - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): pd.eval('p + 1', parser=parser, engine=engine) - def test_raise_on_panel4d_with_multiindex(self): - for parser, engine in product(PARSERS, ENGINES): - yield self.check_raise_on_panel4d_with_multiindex, parser, engine - - def check_raise_on_panel4d_with_multiindex(self, parser, engine): + def test_raise_on_panel4d_with_multiindex(self, parser, engine): tm.skip_if_no_ne() p4d = tm.makePanel4D(7) p4d.items = tm.makeCustomIndex(len(p4d.items), nlevels=2) - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): pd.eval('p4d + 1', parser=parser, engine=engine) -class TestDataFrameQueryNumExprPandas(tm.TestCase): +class TestDataFrameQueryNumExprPandas(object): @classmethod - def setUpClass(cls): - super(TestDataFrameQueryNumExprPandas, cls).setUpClass() + def setup_class(cls): cls.engine = 'numexpr' cls.parser = 'pandas' tm.skip_if_no_ne(cls.engine) @classmethod - def tearDownClass(cls): - super(TestDataFrameQueryNumExprPandas, cls).tearDownClass() + def teardown_class(cls): del cls.engine, cls.parser def test_date_query_with_attribute_access(self): @@ -488,7 +467,7 @@ def 
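# A hedged sketch of the behaviour the multi-index query tests above
# exercise: a named index level resolves directly inside the query
# expression. The data here is illustrative, not the tests' random
# fixtures.
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['red', 'green', 'red'], ['eggs', 'ham', 'ham']],
    names=['color', 'food'])
df = pd.DataFrame(np.arange(6).reshape(3, 2), index=index,
                  columns=['a', 'b'])

res = df.query('color == "red"')
exp = df[df.index.get_level_values('color') == 'red']
pd.testing.assert_frame_equal(res, exp)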
test_date_index_query_with_NaT_duplicates(self): df = DataFrame(d) df.loc[np.random.rand(n) > 0.5, 'dates1'] = pd.NaT df.set_index('dates1', inplace=True, drop=True) - res = df.query('index < 20130101 < dates3', engine=engine, + res = df.query('dates1 < 20130101 < dates3', engine=engine, parser=parser) expec = df[(df.index.to_series() < '20130101') & ('20130101' < df.dates3)] @@ -504,18 +483,18 @@ def test_date_query_with_non_date(self): ops = '==', '!=', '<', '>', '<=', '>=' for op in ops: - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.query('dates %s nondate' % op, parser=parser, engine=engine) def test_query_syntax_error(self): engine, parser = self.engine, self.parser df = DataFrame({"i": lrange(10), "+": lrange(3, 13), "r": lrange(4, 14)}) - with tm.assertRaises(SyntaxError): + with pytest.raises(SyntaxError): df.query('i - +', engine=engine, parser=parser) def test_query_scope(self): - from pandas.computation.ops import UndefinedVariableError + from pandas.core.computation.ops import UndefinedVariableError engine, parser = self.engine, self.parser skip_if_no_pandas_parser(parser) @@ -531,34 +510,34 @@ def test_query_scope(self): assert_frame_equal(res, expected) # no local variable c - with tm.assertRaises(UndefinedVariableError): + with pytest.raises(UndefinedVariableError): df.query('@a > b > @c', engine=engine, parser=parser) # no column named 'c' - with tm.assertRaises(UndefinedVariableError): + with pytest.raises(UndefinedVariableError): df.query('@a > b > c', engine=engine, parser=parser) def test_query_doesnt_pickup_local(self): - from pandas.computation.ops import UndefinedVariableError + from pandas.core.computation.ops import UndefinedVariableError engine, parser = self.engine, self.parser n = m = 10 df = DataFrame(np.random.randint(m, size=(n, 3)), columns=list('abc')) # we don't pick up the local 'sin' - with tm.assertRaises(UndefinedVariableError): + with pytest.raises(UndefinedVariableError): df.query('sin > 5', engine=engine, parser=parser) def test_query_builtin(self): - from pandas.computation.engines import NumExprClobberingError + from pandas.core.computation.engines import NumExprClobberingError engine, parser = self.engine, self.parser n = m = 10 df = DataFrame(np.random.randint(m, size=(n, 3)), columns=list('abc')) df.index.name = 'sin' - with tm.assertRaisesRegexp(NumExprClobberingError, - 'Variables in expression.+'): + with tm.assert_raises_regex(NumExprClobberingError, + 'Variables in expression.+'): df.query('sin > 5', engine=engine, parser=parser) def test_query(self): @@ -628,12 +607,12 @@ def test_nested_scope(self): assert_frame_equal(result, expected) def test_nested_raises_on_local_self_reference(self): - from pandas.computation.ops import UndefinedVariableError + from pandas.core.computation.ops import UndefinedVariableError df = DataFrame(np.random.randn(5, 3)) # can't reference ourself b/c we're a local so @ is necessary - with tm.assertRaises(UndefinedVariableError): + with pytest.raises(UndefinedVariableError): df.query('df > 0', engine=self.engine, parser=self.parser) def test_local_syntax(self): @@ -687,12 +666,12 @@ def test_at_inside_string(self): assert_frame_equal(result, expected) def test_query_undefined_local(self): - from pandas.computation.ops import UndefinedVariableError + from pandas.core.computation.ops import UndefinedVariableError engine, parser = self.engine, self.parser skip_if_no_pandas_parser(parser) df = DataFrame(np.random.rand(10, 2), columns=list('ab')) - with 
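# Sketch of the scoping rule behind test_query_scope above: '@' injects
# a local Python variable into the expression namespace, while bare
# names must resolve to columns (or index levels). Names are made up
# for illustration.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2), columns=list('ab'))
cutoff = 0.5

res = df.query('a > @cutoff')  # '@cutoff' is the local defined above
pd.testing.assert_frame_equal(res, df[df.a > cutoff])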
tm.assertRaisesRegexp(UndefinedVariableError, - "local variable 'c' is not defined"): + with tm.assert_raises_regex(UndefinedVariableError, + "local variable 'c' is not defined"): df.query('a == @c', engine=engine, parser=parser) def test_index_resolvers_come_after_columns_with_the_same_name(self): @@ -738,8 +717,8 @@ def test_inf(self): class TestDataFrameQueryNumExprPython(TestDataFrameQueryNumExprPandas): @classmethod - def setUpClass(cls): - super(TestDataFrameQueryNumExprPython, cls).setUpClass() + def setup_class(cls): + super(TestDataFrameQueryNumExprPython, cls).setup_class() cls.engine = 'numexpr' cls.parser = 'python' tm.skip_if_no_ne(cls.engine) @@ -803,26 +782,26 @@ def test_date_index_query_with_NaT_duplicates(self): df['dates3'] = date_range('1/1/2014', periods=n) df.loc[np.random.rand(n) > 0.5, 'dates1'] = pd.NaT df.set_index('dates1', inplace=True, drop=True) - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): df.query('index < 20130101 < dates3', engine=engine, parser=parser) def test_nested_scope(self): - from pandas.computation.ops import UndefinedVariableError + from pandas.core.computation.ops import UndefinedVariableError engine = self.engine parser = self.parser # smoke test x = 1 # noqa result = pd.eval('x + 1', engine=engine, parser=parser) - self.assertEqual(result, 2) + assert result == 2 df = DataFrame(np.random.randn(5, 3)) df2 = DataFrame(np.random.randn(5, 3)) # don't have the pandas parser - with tm.assertRaises(SyntaxError): + with pytest.raises(SyntaxError): df.query('(@df>0) & (@df2>0)', engine=engine, parser=parser) - with tm.assertRaises(UndefinedVariableError): + with pytest.raises(UndefinedVariableError): df.query('(df>0) & (df2>0)', engine=engine, parser=parser) expected = df[(df > 0) & (df2 > 0)] @@ -839,8 +818,8 @@ def test_nested_scope(self): class TestDataFrameQueryPythonPandas(TestDataFrameQueryNumExprPandas): @classmethod - def setUpClass(cls): - super(TestDataFrameQueryPythonPandas, cls).setUpClass() + def setup_class(cls): + super(TestDataFrameQueryPythonPandas, cls).setup_class() cls.engine = 'python' cls.parser = 'pandas' cls.frame = TestData().frame @@ -860,8 +839,8 @@ def test_query_builtin(self): class TestDataFrameQueryPythonPython(TestDataFrameQueryNumExprPython): @classmethod - def setUpClass(cls): - super(TestDataFrameQueryPythonPython, cls).setUpClass() + def setup_class(cls): + super(TestDataFrameQueryPythonPython, cls).setup_class() cls.engine = cls.parser = 'python' cls.frame = TestData().frame @@ -877,9 +856,9 @@ def test_query_builtin(self): assert_frame_equal(expected, result) -class TestDataFrameQueryStrings(tm.TestCase): +class TestDataFrameQueryStrings(object): - def check_str_query_method(self, parser, engine): + def test_str_query_method(self, parser, engine): tm.skip_if_no_ne(engine) df = DataFrame(randn(10, 1), columns=['b']) df['strings'] = Series(list('aabbccddee')) @@ -897,8 +876,9 @@ def check_str_query_method(self, parser, engine): for lhs, op, rhs in zip(lhs, ops, rhs): ex = '{lhs} {op} {rhs}'.format(lhs=lhs, op=op, rhs=rhs) - assertRaises(NotImplementedError, df.query, ex, engine=engine, - parser=parser, local_dict={'strings': df.strings}) + pytest.raises(NotImplementedError, df.query, ex, + engine=engine, parser=parser, + local_dict={'strings': df.strings}) else: res = df.query('"a" == strings', engine=engine, parser=parser) assert_frame_equal(res, expect) @@ -915,15 +895,7 @@ def check_str_query_method(self, parser, engine): assert_frame_equal(res, expect) 
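# The setup_class/teardown_class hooks rewritten above are pytest's
# analogue of unittest's setUpClass/tearDownClass; a condensed sketch
# of the idiom, including the subclass chaining the engine/parser test
# matrix relies on. Class names here are invented.
class TestBase(object):

    @classmethod
    def setup_class(cls):
        # runs once before the class's tests
        cls.engine, cls.parser = 'numexpr', 'pandas'


class TestPythonParser(TestBase):

    @classmethod
    def setup_class(cls):
        super(TestPythonParser, cls).setup_class()
        cls.parser = 'python'  # override one axis, inherit all the tests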
assert_frame_equal(res, df[~df.strings.isin(['a'])]) - def test_str_query_method(self): - for parser, engine in product(PARSERS, ENGINES): - yield self.check_str_query_method, parser, engine - - def test_str_list_query_method(self): - for parser, engine in product(PARSERS, ENGINES): - yield self.check_str_list_query_method, parser, engine - - def check_str_list_query_method(self, parser, engine): + def test_str_list_query_method(self, parser, engine): tm.skip_if_no_ne(engine) df = DataFrame(randn(10, 1), columns=['b']) df['strings'] = Series(list('aabbccddee')) @@ -941,7 +913,7 @@ def check_str_list_query_method(self, parser, engine): for lhs, op, rhs in zip(lhs, ops, rhs): ex = '{lhs} {op} {rhs}'.format(lhs=lhs, op=op, rhs=rhs) - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): df.query(ex, engine=engine, parser=parser) else: res = df.query('strings == ["a", "b"]', engine=engine, @@ -962,7 +934,7 @@ def check_str_list_query_method(self, parser, engine): parser=parser) assert_frame_equal(res, expect) - def check_query_with_string_columns(self, parser, engine): + def test_query_with_string_columns(self, parser, engine): tm.skip_if_no_ne(engine) df = DataFrame({'a': list('aaaabbbbcccc'), 'b': list('aabbccddeeff'), @@ -977,17 +949,13 @@ def check_query_with_string_columns(self, parser, engine): expec = df[df.a.isin(df.b) & (df.c < df.d)] assert_frame_equal(res, expec) else: - with assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): df.query('a in b', parser=parser, engine=engine) - with assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): df.query('a in b and c < d', parser=parser, engine=engine) - def test_query_with_string_columns(self): - for parser, engine in product(PARSERS, ENGINES): - yield self.check_query_with_string_columns, parser, engine - - def check_object_array_eq_ne(self, parser, engine): + def test_object_array_eq_ne(self, parser, engine): tm.skip_if_no_ne(engine) df = DataFrame({'a': list('aaaabbbbcccc'), 'b': list('aabbccddeeff'), @@ -1001,11 +969,7 @@ def check_object_array_eq_ne(self, parser, engine): exp = df[df.a != df.b] assert_frame_equal(res, exp) - def test_object_array_eq_ne(self): - for parser, engine in product(PARSERS, ENGINES): - yield self.check_object_array_eq_ne, parser, engine - - def check_query_with_nested_strings(self, parser, engine): + def test_query_with_nested_strings(self, parser, engine): tm.skip_if_no_ne(engine) skip_if_no_pandas_parser(parser) raw = """id event timestamp @@ -1029,11 +993,7 @@ def check_query_with_nested_strings(self, parser, engine): engine=engine) assert_frame_equal(expected, res) - def test_query_with_nested_string(self): - for parser, engine in product(PARSERS, ENGINES): - yield self.check_query_with_nested_strings, parser, engine - - def check_query_with_nested_special_character(self, parser, engine): + def test_query_with_nested_special_character(self, parser, engine): skip_if_no_pandas_parser(parser) tm.skip_if_no_ne(engine) df = DataFrame({'a': ['a', 'b', 'test & test'], @@ -1042,12 +1002,7 @@ def check_query_with_nested_special_character(self, parser, engine): expec = df[df.a == 'test & test'] assert_frame_equal(res, expec) - def test_query_with_nested_special_character(self): - for parser, engine in product(PARSERS, ENGINES): - yield (self.check_query_with_nested_special_character, - parser, engine) - - def check_query_lex_compare_strings(self, parser, engine): + def test_query_lex_compare_strings(self, parser, engine): 
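# Quick illustration of the string-membership queries tested above:
# with the 'pandas' parser, 'in' and '== [list]' comparisons lower to
# Series.isin. Illustrative data only.
import pandas as pd

df = pd.DataFrame({'strings': list('aabbcc'), 'b': range(6)})

res = df.query('strings in ["a", "b"]')
pd.testing.assert_frame_equal(res, df[df.strings.isin(['a', 'b'])])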
tm.skip_if_no_ne(engine=engine) import operator as opr @@ -1062,11 +1017,7 @@ def check_query_lex_compare_strings(self, parser, engine): expected = df[func(df.X, 'd')] assert_frame_equal(res, expected) - def test_query_lex_compare_strings(self): - for parser, engine in product(PARSERS, ENGINES): - yield self.check_query_lex_compare_strings, parser, engine - - def check_query_single_element_booleans(self, parser, engine): + def test_query_single_element_booleans(self, parser, engine): tm.skip_if_no_ne(engine) columns = 'bid', 'bidsize', 'ask', 'asksize' data = np.random.randint(2, size=(1, len(columns))).astype(bool) @@ -1075,12 +1026,9 @@ def check_query_single_element_booleans(self, parser, engine): expected = df[df.bid & df.ask] assert_frame_equal(res, expected) - def test_query_single_element_booleans(self): - for parser, engine in product(PARSERS, ENGINES): - yield self.check_query_single_element_booleans, parser, engine - - def check_query_string_scalar_variable(self, parser, engine): + def test_query_string_scalar_variable(self, parser, engine): tm.skip_if_no_ne(engine) + skip_if_no_pandas_parser(parser) df = pd.DataFrame({'Symbol': ['BUD US', 'BUD US', 'IBM US', 'IBM US'], 'Price': [109.70, 109.72, 183.30, 183.35]}) e = df[df.Symbol == 'BUD US'] @@ -1088,24 +1036,19 @@ def check_query_string_scalar_variable(self, parser, engine): r = df.query('Symbol == @symb', parser=parser, engine=engine) assert_frame_equal(e, r) - def test_query_string_scalar_variable(self): - for parser, engine in product(['pandas'], ENGINES): - yield self.check_query_string_scalar_variable, parser, engine - -class TestDataFrameEvalNumExprPandas(tm.TestCase): +class TestDataFrameEvalNumExprPandas(object): @classmethod - def setUpClass(cls): - super(TestDataFrameEvalNumExprPandas, cls).setUpClass() + def setup_class(cls): cls.engine = 'numexpr' cls.parser = 'pandas' tm.skip_if_no_ne() - def setUp(self): + def setup_method(self, method): self.frame = DataFrame(randn(10, 3), columns=list('abc')) - def tearDown(self): + def teardown_method(self, method): del self.frame def test_simple_expr(self): @@ -1123,9 +1066,9 @@ def test_invalid_type_for_operator_raises(self): df = DataFrame({'a': [1, 2], 'b': ['c', 'd']}) ops = '+', '-', '*', '/' for op in ops: - with tm.assertRaisesRegexp(TypeError, - r"unsupported operand type\(s\) for " - r".+: '.+' and '.+'"): + with tm.assert_raises_regex(TypeError, + "unsupported operand type\(s\) " + "for .+: '.+' and '.+'"): df.eval('a {0} b'.format(op), engine=self.engine, parser=self.parser) @@ -1133,8 +1076,8 @@ def test_invalid_type_for_operator_raises(self): class TestDataFrameEvalNumExprPython(TestDataFrameEvalNumExprPandas): @classmethod - def setUpClass(cls): - super(TestDataFrameEvalNumExprPython, cls).setUpClass() + def setup_class(cls): + super(TestDataFrameEvalNumExprPython, cls).setup_class() cls.engine = 'numexpr' cls.parser = 'python' tm.skip_if_no_ne(cls.engine) @@ -1143,8 +1086,8 @@ def setUpClass(cls): class TestDataFrameEvalPythonPandas(TestDataFrameEvalNumExprPandas): @classmethod - def setUpClass(cls): - super(TestDataFrameEvalPythonPandas, cls).setUpClass() + def setup_class(cls): + super(TestDataFrameEvalPythonPandas, cls).setup_class() cls.engine = 'python' cls.parser = 'pandas' @@ -1152,11 +1095,5 @@ def setUpClass(cls): class TestDataFrameEvalPythonPython(TestDataFrameEvalNumExprPython): @classmethod - def setUpClass(cls): - super(TestDataFrameEvalPythonPython, cls).tearDownClass() + def setup_class(cls): cls.engine = cls.parser = 'python' - - -if __name__ == 
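# A minimal DataFrame.eval sketch for the eval test classes that begin
# above; the tests pin engine/parser explicitly, while the defaults are
# used here.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 3), columns=list('abc'))

res = df.eval('a + b')
pd.testing.assert_series_equal(res, df.a + df.b, check_names=False)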
'__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/frame/test_rank.py b/pandas/tests/frame/test_rank.py new file mode 100644 index 0000000000000..58f4d9b770173 --- /dev/null +++ b/pandas/tests/frame/test_rank.py @@ -0,0 +1,267 @@ +# -*- coding: utf-8 -*- +import pytest +from datetime import timedelta, datetime +from distutils.version import LooseVersion +from numpy import nan +import numpy as np + +from pandas import Series, DataFrame + +from pandas.compat import product +from pandas.util.testing import assert_frame_equal +import pandas.util.testing as tm +from pandas.tests.frame.common import TestData + + +class TestRank(TestData): + s = Series([1, 3, 4, 2, nan, 2, 1, 5, nan, 3]) + df = DataFrame({'A': s, 'B': s}) + + results = { + 'average': np.array([1.5, 5.5, 7.0, 3.5, nan, + 3.5, 1.5, 8.0, nan, 5.5]), + 'min': np.array([1, 5, 7, 3, nan, 3, 1, 8, nan, 5]), + 'max': np.array([2, 6, 7, 4, nan, 4, 2, 8, nan, 6]), + 'first': np.array([1, 5, 7, 3, nan, 4, 2, 8, nan, 6]), + 'dense': np.array([1, 3, 4, 2, nan, 2, 1, 5, nan, 3]), + } + + def test_rank(self): + rankdata = pytest.importorskip('scipy.stats.rankdata') + + self.frame['A'][::2] = np.nan + self.frame['B'][::3] = np.nan + self.frame['C'][::4] = np.nan + self.frame['D'][::5] = np.nan + + ranks0 = self.frame.rank() + ranks1 = self.frame.rank(1) + mask = np.isnan(self.frame.values) + + fvals = self.frame.fillna(np.inf).values + + exp0 = np.apply_along_axis(rankdata, 0, fvals) + exp0[mask] = np.nan + + exp1 = np.apply_along_axis(rankdata, 1, fvals) + exp1[mask] = np.nan + + tm.assert_almost_equal(ranks0.values, exp0) + tm.assert_almost_equal(ranks1.values, exp1) + + # integers + df = DataFrame(np.random.randint(0, 5, size=40).reshape((10, 4))) + + result = df.rank() + exp = df.astype(float).rank() + tm.assert_frame_equal(result, exp) + + result = df.rank(1) + exp = df.astype(float).rank(1) + tm.assert_frame_equal(result, exp) + + def test_rank2(self): + df = DataFrame([[1, 3, 2], [1, 2, 3]]) + expected = DataFrame([[1.0, 3.0, 2.0], [1, 2, 3]]) / 3.0 + result = df.rank(1, pct=True) + tm.assert_frame_equal(result, expected) + + df = DataFrame([[1, 3, 2], [1, 2, 3]]) + expected = df.rank(0) / 2.0 + result = df.rank(0, pct=True) + tm.assert_frame_equal(result, expected) + + df = DataFrame([['b', 'c', 'a'], ['a', 'c', 'b']]) + expected = DataFrame([[2.0, 3.0, 1.0], [1, 3, 2]]) + result = df.rank(1, numeric_only=False) + tm.assert_frame_equal(result, expected) + + expected = DataFrame([[2.0, 1.5, 1.0], [1, 1.5, 2]]) + result = df.rank(0, numeric_only=False) + tm.assert_frame_equal(result, expected) + + df = DataFrame([['b', np.nan, 'a'], ['a', 'c', 'b']]) + expected = DataFrame([[2.0, nan, 1.0], [1.0, 3.0, 2.0]]) + result = df.rank(1, numeric_only=False) + tm.assert_frame_equal(result, expected) + + expected = DataFrame([[2.0, nan, 1.0], [1.0, 1.0, 2.0]]) + result = df.rank(0, numeric_only=False) + tm.assert_frame_equal(result, expected) + + # f7u12, this does not work without extensive workaround + data = [[datetime(2001, 1, 5), nan, datetime(2001, 1, 2)], + [datetime(2000, 1, 2), datetime(2000, 1, 3), + datetime(2000, 1, 1)]] + df = DataFrame(data) + + # check the rank + expected = DataFrame([[2., nan, 1.], + [2., 3., 1.]]) + result = df.rank(1, numeric_only=False, ascending=True) + tm.assert_frame_equal(result, expected) + + expected = DataFrame([[1., nan, 2.], + [2., 1., 3.]]) + result = df.rank(1, numeric_only=False, ascending=False) + tm.assert_frame_equal(result, 
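# The new test_rank.py above centres on tie-breaking; a compact
# illustration using the same values as its module-level fixture.
import numpy as np
from pandas import Series

s = Series([1, 3, 4, 2, np.nan, 2, 1, 5, np.nan, 3])

print(s.rank(method='average').tolist())  # ties share their mean rank
print(s.rank(method='min').tolist())      # ties take the lowest rank
print(s.rank(method='dense').tolist())    # like 'min', but with no rank gaps
# NaNs stay NaN under the default na_option='keep'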
expected) + + # mixed-type frames + self.mixed_frame['datetime'] = datetime.now() + self.mixed_frame['timedelta'] = timedelta(days=1, seconds=1) + + result = self.mixed_frame.rank(1) + expected = self.mixed_frame.rank(1, numeric_only=True) + tm.assert_frame_equal(result, expected) + + df = DataFrame({"a": [1e-20, -5, 1e-20 + 1e-40, 10, + 1e60, 1e80, 1e-30]}) + exp = DataFrame({"a": [3.5, 1., 3.5, 5., 6., 7., 2.]}) + tm.assert_frame_equal(df.rank(), exp) + + def test_rank_na_option(self): + rankdata = pytest.importorskip('scipy.stats.rankdata') + + self.frame['A'][::2] = np.nan + self.frame['B'][::3] = np.nan + self.frame['C'][::4] = np.nan + self.frame['D'][::5] = np.nan + + # bottom + ranks0 = self.frame.rank(na_option='bottom') + ranks1 = self.frame.rank(1, na_option='bottom') + + fvals = self.frame.fillna(np.inf).values + + exp0 = np.apply_along_axis(rankdata, 0, fvals) + exp1 = np.apply_along_axis(rankdata, 1, fvals) + + tm.assert_almost_equal(ranks0.values, exp0) + tm.assert_almost_equal(ranks1.values, exp1) + + # top + ranks0 = self.frame.rank(na_option='top') + ranks1 = self.frame.rank(1, na_option='top') + + fval0 = self.frame.fillna((self.frame.min() - 1).to_dict()).values + fval1 = self.frame.T + fval1 = fval1.fillna((fval1.min() - 1).to_dict()).T + fval1 = fval1.fillna(np.inf).values + + exp0 = np.apply_along_axis(rankdata, 0, fval0) + exp1 = np.apply_along_axis(rankdata, 1, fval1) + + tm.assert_almost_equal(ranks0.values, exp0) + tm.assert_almost_equal(ranks1.values, exp1) + + # descending + + # bottom + ranks0 = self.frame.rank(na_option='top', ascending=False) + ranks1 = self.frame.rank(1, na_option='top', ascending=False) + + fvals = self.frame.fillna(np.inf).values + + exp0 = np.apply_along_axis(rankdata, 0, -fvals) + exp1 = np.apply_along_axis(rankdata, 1, -fvals) + + tm.assert_almost_equal(ranks0.values, exp0) + tm.assert_almost_equal(ranks1.values, exp1) + + # descending + + # top + ranks0 = self.frame.rank(na_option='bottom', ascending=False) + ranks1 = self.frame.rank(1, na_option='bottom', ascending=False) + + fval0 = self.frame.fillna((self.frame.min() - 1).to_dict()).values + fval1 = self.frame.T + fval1 = fval1.fillna((fval1.min() - 1).to_dict()).T + fval1 = fval1.fillna(np.inf).values + + exp0 = np.apply_along_axis(rankdata, 0, -fval0) + exp1 = np.apply_along_axis(rankdata, 1, -fval1) + + tm.assert_numpy_array_equal(ranks0.values, exp0) + tm.assert_numpy_array_equal(ranks1.values, exp1) + + def test_rank_axis(self): + # check if using axes' names gives the same result + df = DataFrame([[2, 1], [4, 3]]) + tm.assert_frame_equal(df.rank(axis=0), df.rank(axis='index')) + tm.assert_frame_equal(df.rank(axis=1), df.rank(axis='columns')) + + def test_rank_methods_frame(self): + pytest.importorskip('scipy.stats.special') + rankdata = pytest.importorskip('scipy.stats.rankdata') + import scipy + + xs = np.random.randint(0, 21, (100, 26)) + xs = (xs - 10.0) / 10.0 + cols = [chr(ord('z') - i) for i in range(xs.shape[1])] + + for vals in [xs, xs + 1e6, xs * 1e-6]: + df = DataFrame(vals, columns=cols) + + for ax in [0, 1]: + for m in ['average', 'min', 'max', 'first', 'dense']: + result = df.rank(axis=ax, method=m) + sprank = np.apply_along_axis( + rankdata, ax, vals, + m if m != 'first' else 'ordinal') + sprank = sprank.astype(np.float64) + expected = DataFrame(sprank, columns=cols) + + if LooseVersion(scipy.__version__) >= '0.17.0': + expected = expected.astype('float64') + tm.assert_frame_equal(result, expected) + + def test_rank_descending(self): + dtypes = ['O', 'f8', 'i8'] 
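# A short sketch of the na_option semantics the rank tests above pin
# down: 'top' ranks NaNs first, 'bottom' ranks them last, and the
# default 'keep' leaves them NaN.
import numpy as np
from pandas import Series

s = Series([1.0, np.nan, 2.0])

print(s.rank(na_option='keep').tolist())    # [1.0, nan, 2.0]
print(s.rank(na_option='top').tolist())     # [2.0, 1.0, 3.0] - NaN first
print(s.rank(na_option='bottom').tolist())  # [1.0, 3.0, 2.0] - NaN last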
+ + for dtype, method in product(dtypes, self.results): + if 'i' in dtype: + df = self.df.dropna() + else: + df = self.df.astype(dtype) + + res = df.rank(ascending=False) + expected = (df.max() - df).rank() + assert_frame_equal(res, expected) + + if method == 'first' and dtype == 'O': + continue + + expected = (df.max() - df).rank(method=method) + + if dtype != 'O': + res2 = df.rank(method=method, ascending=False, + numeric_only=True) + assert_frame_equal(res2, expected) + + res3 = df.rank(method=method, ascending=False, + numeric_only=False) + assert_frame_equal(res3, expected) + + def test_rank_2d_tie_methods(self): + df = self.df + + def _check2d(df, expected, method='average', axis=0): + exp_df = DataFrame({'A': expected, 'B': expected}) + + if axis == 1: + df = df.T + exp_df = exp_df.T + + result = df.rank(method=method, axis=axis) + assert_frame_equal(result, exp_df) + + dtypes = [None, object] + disabled = set([(object, 'first')]) + results = self.results + + for method, axis, dtype in product(results, [0, 1], dtypes): + if (dtype, method) in disabled: + continue + frame = df if dtype is None else df.astype(dtype) + _check2d(frame, results[method], method=method, axis=axis) diff --git a/pandas/tests/frame/test_replace.py b/pandas/tests/frame/test_replace.py index adc7af225588c..fbc4accd0e41e 100644 --- a/pandas/tests/frame/test_replace.py +++ b/pandas/tests/frame/test_replace.py @@ -2,6 +2,8 @@ from __future__ import print_function +import pytest + from datetime import datetime import re @@ -21,9 +23,7 @@ from pandas.tests.frame.common import TestData -class TestDataFrameReplace(tm.TestCase, TestData): - - _multiprocess_can_split_ = True +class TestDataFrameReplace(TestData): def test_replace_inplace(self): self.tsframe['A'][:5] = nan @@ -33,8 +33,8 @@ def test_replace_inplace(self): tsframe.replace(nan, 0, inplace=True) assert_frame_equal(tsframe, self.tsframe.fillna(0)) - self.assertRaises(TypeError, self.tsframe.replace, nan, inplace=True) - self.assertRaises(TypeError, self.tsframe.replace, nan) + pytest.raises(TypeError, self.tsframe.replace, nan, inplace=True) + pytest.raises(TypeError, self.tsframe.replace, nan) # mixed type mf = self.mixed_frame @@ -548,7 +548,7 @@ def test_regex_replace_numeric_to_object_conversion(self): expec = DataFrame({'a': ['a', 1, 2, 3], 'b': mix['b'], 'c': mix['c']}) res = df.replace(0, 'a') assert_frame_equal(res, expec) - self.assertEqual(res.a.dtype, np.object_) + assert res.a.dtype == np.object_ def test_replace_regex_metachar(self): metachars = '[]', '()', r'\d', r'\w', r'\s' @@ -720,7 +720,7 @@ def test_replace_simple_nested_dict_with_nonexistent_value(self): assert_frame_equal(expected, result) def test_replace_value_is_none(self): - self.assertRaises(TypeError, self.tsframe.replace, nan) + pytest.raises(TypeError, self.tsframe.replace, nan) orig_value = self.tsframe.iloc[0, 0] orig2 = self.tsframe.iloc[1, 0] @@ -781,7 +781,7 @@ def test_replace_dtypes(self): # bools df = DataFrame({'bools': [True, False, True]}) result = df.replace(False, True) - self.assertTrue(result.values.all()) + assert result.values.all() # complex blocks df = DataFrame({'complex': [1j, 2j, 3j]}) @@ -797,7 +797,7 @@ def test_replace_dtypes(self): expected = DataFrame({'datetime64': Index([now] * 3)}) assert_frame_equal(result, expected) - def test_replace_input_formats(self): + def test_replace_input_formats_listlike(self): # both dicts to_rep = {'A': np.nan, 'B': 0, 'C': ''} values = {'A': 0, 'B': -1, 'C': 'missing'} @@ -814,15 +814,6 @@ def 
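# For orientation in the test_replace.py hunks that begin above, the
# replace call shapes the tests keep exercising, on an illustrative
# frame (not the tests' fixtures).
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 0, 1], 'B': [0, 2, 5]})

print(df.replace(np.nan, 0))                  # scalar -> scalar
print(df.replace({'A': np.nan, 'B': 0}, -1))  # per-column dict -> one scalar
print(df.replace({'A': {0: 9}}))              # nested {col: {old: new}}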
test_replace_input_formats(self): 'C': ['', 'asdf', 'fd']}) assert_frame_equal(result, expected) - # dict to scalar - filled = df.replace(to_rep, 0) - expected = {} - for k, v in compat.iteritems(df): - expected[k] = v.replace(to_rep[k], 0) - assert_frame_equal(filled, DataFrame(expected)) - - self.assertRaises(TypeError, df.replace, to_rep, [np.nan, 0, '']) - # scalar to dict values = {'A': 0, 'B': -1, 'C': 'missing'} df = DataFrame({'A': [np.nan, 0, np.nan], 'B': [0, 2, 5], @@ -842,7 +833,21 @@ def test_replace_input_formats(self): expected.replace(to_rep[i], values[i], inplace=True) assert_frame_equal(result, expected) - self.assertRaises(ValueError, df.replace, to_rep, values[1:]) + pytest.raises(ValueError, df.replace, to_rep, values[1:]) + + def test_replace_input_formats_scalar(self): + df = DataFrame({'A': [np.nan, 0, np.inf], 'B': [0, 2, 5], + 'C': ['', 'asdf', 'fd']}) + + # dict to scalar + to_rep = {'A': np.nan, 'B': 0, 'C': ''} + filled = df.replace(to_rep, 0) + expected = {} + for k, v in compat.iteritems(df): + expected[k] = v.replace(to_rep[k], 0) + assert_frame_equal(filled, DataFrame(expected)) + + pytest.raises(TypeError, df.replace, to_rep, [np.nan, 0, '']) # list to scalar to_rep = [np.nan, 0, ''] @@ -913,7 +918,7 @@ def test_replace_bool_with_bool(self): def test_replace_with_dict_with_bool_keys(self): df = DataFrame({0: [True, False], 1: [False, True]}) - with tm.assertRaisesRegexp(TypeError, 'Cannot compare types .+'): + with tm.assert_raises_regex(TypeError, 'Cannot compare types .+'): df.replace({'asdf': 'asdb', True: 'yes'}) def test_replace_truthy(self): @@ -924,7 +929,8 @@ def test_replace_truthy(self): def test_replace_int_to_int_chain(self): df = DataFrame({'a': lrange(1, 5)}) - with tm.assertRaisesRegexp(ValueError, "Replacement not allowed .+"): + with tm.assert_raises_regex(ValueError, + "Replacement not allowed .+"): df.replace({'a': dict(zip(range(1, 5), range(2, 6)))}) def test_replace_str_to_str_chain(self): @@ -932,7 +938,8 @@ def test_replace_str_to_str_chain(self): astr = a.astype(str) bstr = np.arange(2, 6).astype(str) df = DataFrame({'a': astr}) - with tm.assertRaisesRegexp(ValueError, "Replacement not allowed .+"): + with tm.assert_raises_regex(ValueError, + "Replacement not allowed .+"): df.replace({'a': dict(zip(astr, bstr))}) def test_replace_swapping_bug(self): @@ -971,7 +978,7 @@ def test_replace_period(self): 'out_augmented_MAY_2011.json', 'out_augmented_AUG_2011.json', 'out_augmented_JAN_2011.json'], columns=['fname']) - tm.assert_equal(set(df.fname.values), set(d['fname'].keys())) + assert set(df.fname.values) == set(d['fname'].keys()) expected = DataFrame({'fname': [d['fname'][k] for k in df.fname.values]}) result = df.replace(d) @@ -994,7 +1001,7 @@ def test_replace_datetime(self): 'out_augmented_MAY_2011.json', 'out_augmented_AUG_2011.json', 'out_augmented_JAN_2011.json'], columns=['fname']) - tm.assert_equal(set(df.fname.values), set(d['fname'].keys())) + assert set(df.fname.values) == set(d['fname'].keys()) expected = DataFrame({'fname': [d['fname'][k] for k in df.fname.values]}) result = df.replace(d) @@ -1055,3 +1062,13 @@ def test_replace_datetimetz(self): Timestamp('20130103', tz='US/Eastern')], 'B': [0, np.nan, 2]}) assert_frame_equal(result, expected) + + def test_replace_with_empty_dictlike(self): + # GH 15289 + mix = {'a': lrange(4), 'b': list('ab..'), 'c': ['a', 'b', nan, 'd']} + df = DataFrame(mix) + assert_frame_equal(df, df.replace({})) + assert_frame_equal(df, df.replace(Series([]))) + + assert_frame_equal(df, 
df.replace({'b': {}})) + assert_frame_equal(df, df.replace(Series({'b': {}}))) diff --git a/pandas/tests/frame/test_repr_info.py b/pandas/tests/frame/test_repr_info.py index 12cd62f8b4cc0..37f8c0cc85b23 100644 --- a/pandas/tests/frame/test_repr_info.py +++ b/pandas/tests/frame/test_repr_info.py @@ -8,10 +8,11 @@ from numpy import nan import numpy as np +import pytest from pandas import (DataFrame, compat, option_context) -from pandas.compat import StringIO, lrange, u -import pandas.formats.format as fmt +from pandas.compat import StringIO, lrange, u, PYPY +import pandas.io.formats.format as fmt import pandas as pd import pandas.util.testing as tm @@ -23,9 +24,7 @@ # structure -class TestDataFrameReprInfoEtc(tm.TestCase, TestData): - - _multiprocess_can_split_ = True +class TestDataFrameReprInfoEtc(TestData): def test_repr_empty(self): # empty @@ -42,7 +41,7 @@ def test_repr_mixed(self): foo = repr(self.mixed_frame) # noqa self.mixed_frame.info(verbose=False, buf=buf) - @tm.slow + @pytest.mark.slow def test_repr_mixed_big(self): # big mixed biggie = DataFrame({'A': np.random.randn(200), @@ -74,22 +73,22 @@ def test_repr(self): self.empty.info(buf=buf) df = DataFrame(["a\n\r\tb"], columns=["a\n\r\td"], index=["a\n\r\tf"]) - self.assertFalse("\t" in repr(df)) - self.assertFalse("\r" in repr(df)) - self.assertFalse("a\n" in repr(df)) + assert "\t" not in repr(df) + assert "\r" not in repr(df) + assert "a\n" not in repr(df) def test_repr_dimensions(self): df = DataFrame([[1, 2, ], [3, 4]]) with option_context('display.show_dimensions', True): - self.assertTrue("2 rows x 2 columns" in repr(df)) + assert "2 rows x 2 columns" in repr(df) with option_context('display.show_dimensions', False): - self.assertFalse("2 rows x 2 columns" in repr(df)) + assert "2 rows x 2 columns" not in repr(df) with option_context('display.show_dimensions', 'truncate'): - self.assertFalse("2 rows x 2 columns" in repr(df)) + assert "2 rows x 2 columns" not in repr(df) - @tm.slow + @pytest.mark.slow def test_repr_big(self): # big one biggie = DataFrame(np.zeros((200, 4)), columns=lrange(4), @@ -120,7 +119,7 @@ def test_repr_unsortable(self): fmt.set_option('display.max_rows', 1000, 'display.max_columns', 1000) repr(self.frame) - self.reset_display_options() + tm.reset_display_options() warnings.filters = warn_filters @@ -134,11 +133,11 @@ def test_repr_unicode(self): result = repr(df) ex_top = ' A' - self.assertEqual(result.split('\n')[0].rstrip(), ex_top) + assert result.split('\n')[0].rstrip() == ex_top df = DataFrame({'A': [uval, uval]}) result = repr(df) - self.assertEqual(result.split('\n')[0].rstrip(), ex_top) + assert result.split('\n')[0].rstrip() == ex_top def test_unicode_string_with_unicode(self): df = DataFrame({'A': [u("\u05d0")]}) @@ -173,7 +172,7 @@ def test_repr_column_name_unicode_truncation_bug(self): ' the File through the code..')}) result = repr(df) - self.assertIn('StringCol', result) + assert 'StringCol' in result def test_latex_repr(self): result = r"""\begin{tabular}{llll} @@ -188,11 +187,12 @@ def test_latex_repr(self): with option_context("display.latex.escape", False, 'display.latex.repr', True): df = DataFrame([[r'$\alpha$', 'b', 'c'], [1, 2, 3]]) - self.assertEqual(result, df._repr_latex_()) + assert result == df._repr_latex_() # GH 12182 - self.assertIsNone(df._repr_latex_()) + assert df._repr_latex_() is None + @tm.capture_stdout def test_info(self): io = StringIO() self.frame.info(buf=io) @@ -200,11 +200,8 @@ def test_info(self): frame = DataFrame(np.random.randn(5, 3)) - import sys - 
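# The repr tests above route info() output through a buffer instead of
# patching sys.stdout (see the @tm.capture_stdout change); the same
# buffer pattern works outside the test suite. io.StringIO is used here
# so the example is standalone; the suite itself goes through the
# pandas.compat shim.
from io import StringIO

import pandas as pd

buf = StringIO()
pd.DataFrame({'a': [1, 2]}).info(buf=buf)
print(buf.getvalue())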
sys.stdout = StringIO() frame.info() frame.info(verbose=False) - sys.stdout = sys.__stdout__ def test_info_wide(self): from pandas import set_option, reset_option @@ -215,13 +212,13 @@ def test_info_wide(self): io = StringIO() df.info(buf=io, max_cols=101) rs = io.getvalue() - self.assertTrue(len(rs.splitlines()) > 100) + assert len(rs.splitlines()) > 100 xp = rs set_option('display.max_info_columns', 101) io = StringIO() df.info(buf=io) - self.assertEqual(rs, xp) + assert rs == xp reset_option('display.max_info_columns') def test_info_duplicate_columns(self): @@ -241,8 +238,8 @@ def test_info_duplicate_columns_shows_correct_dtypes(self): frame.info(buf=io) io.seek(0) lines = io.readlines() - self.assertEqual('a 1 non-null int64\n', lines[3]) - self.assertEqual('a 1 non-null float64\n', lines[4]) + assert 'a 1 non-null int64\n' == lines[3] + assert 'a 1 non-null float64\n' == lines[4] def test_info_shows_column_dtypes(self): dtypes = ['int64', 'float64', 'datetime64[ns]', 'timedelta64[ns]', @@ -267,7 +264,7 @@ def test_info_max_cols(self): buf = StringIO() df.info(buf=buf, verbose=verbose) res = buf.getvalue() - self.assertEqual(len(res.strip().split('\n')), len_) + assert len(res.strip().split('\n')) == len_ for len_, verbose in [(10, None), (5, False), (10, True)]: @@ -276,7 +273,7 @@ def test_info_max_cols(self): buf = StringIO() df.info(buf=buf, verbose=verbose) res = buf.getvalue() - self.assertEqual(len(res.strip().split('\n')), len_) + assert len(res.strip().split('\n')) == len_ for len_, max_cols in [(10, 5), (5, 4)]: # setting truncates @@ -284,14 +281,14 @@ def test_info_max_cols(self): buf = StringIO() df.info(buf=buf, max_cols=max_cols) res = buf.getvalue() - self.assertEqual(len(res.strip().split('\n')), len_) + assert len(res.strip().split('\n')) == len_ # setting wouldn't truncate with option_context('max_info_columns', 5): buf = StringIO() df.info(buf=buf, max_cols=max_cols) res = buf.getvalue() - self.assertEqual(len(res.strip().split('\n')), len_) + assert len(res.strip().split('\n')) == len_ def test_info_memory_usage(self): # Ensure memory usage is displayed, when asserted, on the last line @@ -303,41 +300,28 @@ def test_info_memory_usage(self): data[i] = np.random.randint(2, size=n).astype(dtype) df = DataFrame(data) buf = StringIO() + # display memory usage case df.info(buf=buf, memory_usage=True) res = buf.getvalue().splitlines() - self.assertTrue("memory usage: " in res[-1]) + assert "memory usage: " in res[-1] + # do not display memory usage cas df.info(buf=buf, memory_usage=False) res = buf.getvalue().splitlines() - self.assertTrue("memory usage: " not in res[-1]) + assert "memory usage: " not in res[-1] df.info(buf=buf, memory_usage=True) res = buf.getvalue().splitlines() + # memory usage is a lower bound, so print it as XYZ+ MB - self.assertTrue(re.match(r"memory usage: [^+]+\+", res[-1])) + assert re.match(r"memory usage: [^+]+\+", res[-1]) df.iloc[:, :5].info(buf=buf, memory_usage=True) res = buf.getvalue().splitlines() - # excluded column with object dtype, so estimate is accurate - self.assertFalse(re.match(r"memory usage: [^+]+\+", res[-1])) - - df_with_object_index = pd.DataFrame({'a': [1]}, index=['foo']) - df_with_object_index.info(buf=buf, memory_usage=True) - res = buf.getvalue().splitlines() - self.assertTrue(re.match(r"memory usage: [^+]+\+", res[-1])) - - df_with_object_index.info(buf=buf, memory_usage='deep') - res = buf.getvalue().splitlines() - self.assertTrue(re.match(r"memory usage: [^+]+$", res[-1])) - - 
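# Sketch of the invariant the memory-usage assertions above encode:
# deep=True additionally sizes object values, so it can only grow the
# total (the PyPy-specific cases are split out just below, where
# deep=True changes nothing).
import pandas as pd

df = pd.DataFrame({'a': ['a']})  # object dtype

assert df.memory_usage(deep=True).sum() >= df.memory_usage().sum()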
self.assertGreater(df_with_object_index.memory_usage(index=True, - deep=True).sum(), - df_with_object_index.memory_usage(index=True).sum()) - df_object = pd.DataFrame({'a': ['a']}) - self.assertGreater(df_object.memory_usage(deep=True).sum(), - df_object.memory_usage().sum()) + # excluded column with object dtype, so estimate is accurate + assert not re.match(r"memory usage: [^+]+\+", res[-1]) # Test a DataFrame with duplicate columns dtypes = ['int64', 'int64', 'int64', 'float64'] @@ -348,19 +332,27 @@ def test_info_memory_usage(self): df = DataFrame(data) df.columns = dtypes + df_with_object_index = pd.DataFrame({'a': [1]}, index=['foo']) + df_with_object_index.info(buf=buf, memory_usage=True) + res = buf.getvalue().splitlines() + assert re.match(r"memory usage: [^+]+\+", res[-1]) + + df_with_object_index.info(buf=buf, memory_usage='deep') + res = buf.getvalue().splitlines() + assert re.match(r"memory usage: [^+]+$", res[-1]) + # Ensure df size is as expected # (cols * rows * bytes) + index size df_size = df.memory_usage().sum() exp_size = len(dtypes) * n * 8 + df.index.nbytes - self.assertEqual(df_size, exp_size) + assert df_size == exp_size # Ensure number of cols in memory_usage is the same as df size_df = np.size(df.columns.values) + 1 # index=True; default - self.assertEqual(size_df, np.size(df.memory_usage())) + assert size_df == np.size(df.memory_usage()) # assert deep works only on object - self.assertEqual(df.memory_usage().sum(), - df.memory_usage(deep=True).sum()) + assert df.memory_usage().sum() == df.memory_usage(deep=True).sum() # test for validity DataFrame(1, index=['a'], columns=['A'] @@ -377,10 +369,76 @@ def test_info_memory_usage(self): df.memory_usage(index=True) df.index.values.nbytes + mem = df.memory_usage(deep=True).sum() + assert mem > 0 + + @pytest.mark.skipif(PYPY, + reason="on PyPy deep=True doesn't change result") + def test_info_memory_usage_deep_not_pypy(self): + df_with_object_index = pd.DataFrame({'a': [1]}, index=['foo']) + assert (df_with_object_index.memory_usage( + index=True, deep=True).sum() > + df_with_object_index.memory_usage( + index=True).sum()) + + df_object = pd.DataFrame({'a': ['a']}) + assert (df_object.memory_usage(deep=True).sum() > + df_object.memory_usage().sum()) + + @pytest.mark.skipif(not PYPY, + reason="on PyPy deep=True does not change result") + def test_info_memory_usage_deep_pypy(self): + df_with_object_index = pd.DataFrame({'a': [1]}, index=['foo']) + assert (df_with_object_index.memory_usage( + index=True, deep=True).sum() == + df_with_object_index.memory_usage( + index=True).sum()) + + df_object = pd.DataFrame({'a': ['a']}) + assert (df_object.memory_usage(deep=True).sum() == + df_object.memory_usage().sum()) + + @pytest.mark.skipif(PYPY, reason="PyPy getsizeof() fails by design") + def test_usage_via_getsizeof(self): + df = DataFrame( + data=1, + index=pd.MultiIndex.from_product( + [['a'], range(1000)]), + columns=['A'] + ) + mem = df.memory_usage(deep=True).sum() # sys.getsizeof will call the .memory_usage with # deep=True, and add on some GC overhead - diff = df.memory_usage(deep=True).sum() - sys.getsizeof(df) - self.assertTrue(abs(diff) < 100) + diff = mem - sys.getsizeof(df) + assert abs(diff) < 100 + + def test_info_memory_usage_qualified(self): + + buf = StringIO() + df = DataFrame(1, columns=list('ab'), + index=[1, 2, 3]) + df.info(buf=buf) + assert '+' not in buf.getvalue() + + buf = StringIO() + df = DataFrame(1, columns=list('ab'), + index=list('ABC')) + df.info(buf=buf) + assert '+' in buf.getvalue() + + buf = 
StringIO() + df = DataFrame(1, columns=list('ab'), + index=pd.MultiIndex.from_product( + [range(3), range(3)])) + df.info(buf=buf) + assert '+' not in buf.getvalue() + + buf = StringIO() + df = DataFrame(1, columns=list('ab'), + index=pd.MultiIndex.from_product( + [range(3), ['foo', 'bar']])) + df.info(buf=buf) + assert '+' in buf.getvalue() def test_info_memory_usage_bug_on_multiindex(self): # GH 14308 @@ -400,11 +458,11 @@ def memory_usage(f): df = DataFrame({'value': np.random.randn(N * M)}, index=index) unstacked = df.unstack('id') - self.assertEqual(df.values.nbytes, unstacked.values.nbytes) - self.assertTrue(memory_usage(df) > memory_usage(unstacked)) + assert df.values.nbytes == unstacked.values.nbytes + assert memory_usage(df) > memory_usage(unstacked) # high upper bound - self.assertTrue(memory_usage(unstacked) - memory_usage(df) < 2000) + assert memory_usage(unstacked) - memory_usage(df) < 2000 def test_info_categorical(self): # GH14298 diff --git a/pandas/tests/frame/test_reshape.py b/pandas/tests/frame/test_reshape.py index 705270b695b77..e2f362ebdc895 100644 --- a/pandas/tests/frame/test_reshape.py +++ b/pandas/tests/frame/test_reshape.py @@ -2,8 +2,11 @@ from __future__ import print_function +from warnings import catch_warnings from datetime import datetime + import itertools +import pytest from numpy.random import randn from numpy import nan @@ -14,18 +17,14 @@ Timedelta, Period) import pandas as pd -from pandas.util.testing import (assert_series_equal, - assert_frame_equal, - assertRaisesRegexp) +from pandas.util.testing import assert_series_equal, assert_frame_equal import pandas.util.testing as tm from pandas.tests.frame.common import TestData -class TestDataFrameReshape(tm.TestCase, TestData): - - _multiprocess_can_split_ = True +class TestDataFrameReshape(TestData): def test_pivot(self): data = { @@ -42,44 +41,45 @@ def test_pivot(self): 'One': {'A': 1., 'B': 2., 'C': 3.}, 'Two': {'A': 1., 'B': 2., 'C': 3.} }) - expected.index.name, expected.columns.name = 'index', 'columns' - assert_frame_equal(pivoted, expected) + expected.index.name, expected.columns.name = 'index', 'columns' + tm.assert_frame_equal(pivoted, expected) # name tracking - self.assertEqual(pivoted.index.name, 'index') - self.assertEqual(pivoted.columns.name, 'columns') + assert pivoted.index.name == 'index' + assert pivoted.columns.name == 'columns' # don't specify values pivoted = frame.pivot(index='index', columns='columns') - self.assertEqual(pivoted.index.name, 'index') - self.assertEqual(pivoted.columns.names, (None, 'columns')) + assert pivoted.index.name == 'index' + assert pivoted.columns.names == (None, 'columns') - # pivot multiple columns - wp = tm.makePanel() - lp = wp.to_frame() - df = lp.reset_index() - assert_frame_equal(df.pivot('major', 'minor'), lp.unstack()) + with catch_warnings(record=True): + # pivot multiple columns + wp = tm.makePanel() + lp = wp.to_frame() + df = lp.reset_index() + tm.assert_frame_equal(df.pivot('major', 'minor'), lp.unstack()) def test_pivot_duplicates(self): data = DataFrame({'a': ['bar', 'bar', 'foo', 'foo', 'foo'], 'b': ['one', 'two', 'one', 'one', 'two'], 'c': [1., 2., 3., 3., 4.]}) - with assertRaisesRegexp(ValueError, 'duplicate entries'): + with tm.assert_raises_regex(ValueError, 'duplicate entries'): data.pivot('a', 'b', 'c') def test_pivot_empty(self): df = DataFrame({}, columns=['a', 'b', 'c']) result = df.pivot('a', 'b', 'c') expected = DataFrame({}) - assert_frame_equal(result, expected, check_names=False) + tm.assert_frame_equal(result, expected, 
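# A minimal pivot example matching the shape of test_pivot above: row
# labels come from 'index', new columns from 'columns', and cells from
# 'values'.
import pandas as pd

df = pd.DataFrame({'index': ['A', 'B', 'A', 'B'],
                   'columns': ['One', 'One', 'Two', 'Two'],
                   'values': [1., 2., 3., 4.]})

print(df.pivot(index='index', columns='columns', values='values'))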
check_names=False) def test_pivot_integer_bug(self): df = DataFrame(data=[("A", "1", "A1"), ("B", "2", "B2")]) result = df.pivot(index=1, columns=0, values=2) repr(result) - self.assert_index_equal(result.columns, Index(['A', 'B'], name=0)) + tm.assert_index_equal(result.columns, Index(['A', 'B'], name=0)) def test_pivot_index_none(self): # gh-3962 @@ -106,36 +106,32 @@ def test_pivot_index_none(self): ('values', 'Two')], names=[None, 'columns']) expected.index.name = 'index' - assert_frame_equal(result, expected, check_names=False) - self.assertEqual(result.index.name, 'index',) - self.assertEqual(result.columns.names, (None, 'columns')) + tm.assert_frame_equal(result, expected, check_names=False) + assert result.index.name == 'index' + assert result.columns.names == (None, 'columns') expected.columns = expected.columns.droplevel(0) - - data = { - 'index': range(7), - 'columns': ['One', 'One', 'One', 'Two', 'Two', 'Two'], - 'values': [1., 2., 3., 3., 2., 1.] - } - result = frame.pivot(columns='columns', values='values') expected.columns.name = 'columns' - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_stack_unstack(self): - stacked = self.frame.stack() + f = self.frame.copy() + f[:] = np.arange(np.prod(f.shape)).reshape(f.shape) + + stacked = f.stack() stacked_df = DataFrame({'foo': stacked, 'bar': stacked}) unstacked = stacked.unstack() unstacked_df = stacked_df.unstack() - assert_frame_equal(unstacked, self.frame) - assert_frame_equal(unstacked_df['bar'], self.frame) + assert_frame_equal(unstacked, f) + assert_frame_equal(unstacked_df['bar'], f) unstacked_cols = stacked.unstack(0) unstacked_cols_df = stacked_df.unstack(0) - assert_frame_equal(unstacked_cols.T, self.frame) - assert_frame_equal(unstacked_cols_df['bar'].T, self.frame) + assert_frame_equal(unstacked_cols.T, f) + assert_frame_equal(unstacked_cols_df['bar'].T, f) def test_unstack_fill(self): @@ -361,7 +357,7 @@ def test_stack_mixed_levels(self): # When mixed types are passed and the ints are not level # names, raise - self.assertRaises(ValueError, df2.stack, level=['animal', 0]) + pytest.raises(ValueError, df2.stack, level=['animal', 0]) # GH #8584: Having 0 in the level names could raise a # strange error about lexsort depth @@ -442,7 +438,7 @@ def test_unstack_to_series(self): # check reversibility data = self.frame.unstack() - self.assertTrue(isinstance(data, Series)) + assert isinstance(data, Series) undo = data.unstack().T assert_frame_equal(undo, self.frame) @@ -513,17 +509,17 @@ def test_unstack_dtypes(self): right = right.set_index(['A', 'B']).unstack(0) right[('D', 'a')] = right[('D', 'a')].astype('int64') - self.assertEqual(left.shape, (3, 2)) - assert_frame_equal(left, right) + assert left.shape == (3, 2) + tm.assert_frame_equal(left, right) def test_unstack_non_unique_index_names(self): idx = MultiIndex.from_tuples([('a', 'b'), ('c', 'd')], names=['c1', 'c1']) df = DataFrame([1, 2], index=idx) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.unstack('c1') - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.T.stack('c1') def test_unstack_nan_index(self): # GH7466 @@ -532,12 +528,12 @@ def test_unstack_nan_index(self): # GH7466 def verify(df): mk_list = lambda a: list(a) if isinstance(a, tuple) else [a] - rows, cols = df.notnull().values.nonzero() + rows, cols = df.notna().values.nonzero() for i, j in zip(rows, cols): left = sorted(df.iloc[i, j].split('.')) right = mk_list(df.index[i]) + mk_list(df.columns[j]) right = 
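# The round trip the stack/unstack tests above assert, in isolation:
# stacking then unstacking recovers the frame, and unstacking level 0
# pivots the other axis instead (hence the transpose).
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=['r0', 'r1'], columns=list('abc'))

pd.testing.assert_frame_equal(df.stack().unstack(), df)
pd.testing.assert_frame_equal(df.stack().unstack(0).T, df)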
sorted(list(map(cast, right))) - self.assertEqual(left, right) + assert left == right df = DataFrame({'jim': ['a', 'b', nan, 'd'], 'joe': ['w', 'x', 'y', 'z'], @@ -551,7 +547,7 @@ def verify(df): mi = df.set_index(list(idx)) for lev in range(2): udf = mi.unstack(level=lev) - self.assertEqual(udf.notnull().values.sum(), len(df)) + assert udf.notna().values.sum() == len(df) verify(udf['jolie']) df = DataFrame({'1st': ['d'] * 3 + [nan] * 5 + ['a'] * 2 + @@ -569,7 +565,7 @@ def verify(df): mi = df.set_index(list(idx)) for lev in range(3): udf = mi.unstack(level=lev) - self.assertEqual(udf.notnull().values.sum(), 2 * len(df)) + assert udf.notna().values.sum() == 2 * len(df) for col in ['4th', '5th']: verify(udf[col]) @@ -674,12 +670,12 @@ def verify(df): df.loc[1, '3rd'] = df.loc[4, '3rd'] = nan left = df.set_index(['1st', '2nd', '3rd']).unstack(['2nd', '3rd']) - self.assertEqual(left.notnull().values.sum(), 2 * len(df)) + assert left.notna().values.sum() == 2 * len(df) for col in ['jim', 'joe']: for _, r in df.iterrows(): key = r['1st'], (col, r['2nd'], r['3rd']) - self.assertEqual(r[col], left.loc[key]) + assert r[col] == left.loc[key] def test_stack_datetime_column_multiIndex(self): # GH 8039 diff --git a/pandas/tests/frame/test_sorting.py b/pandas/tests/frame/test_sorting.py index bbd8dd9b48b5c..891c94b59074a 100644 --- a/pandas/tests/frame/test_sorting.py +++ b/pandas/tests/frame/test_sorting.py @@ -2,73 +2,29 @@ from __future__ import print_function +import pytest +import random import numpy as np +import pandas as pd from pandas.compat import lrange from pandas import (DataFrame, Series, MultiIndex, Timestamp, - date_range, NaT) + date_range, NaT, IntervalIndex) -from pandas.util.testing import (assert_series_equal, - assert_frame_equal, - assertRaisesRegexp) +from pandas.util.testing import assert_series_equal, assert_frame_equal import pandas.util.testing as tm from pandas.tests.frame.common import TestData -class TestDataFrameSorting(tm.TestCase, TestData): - - _multiprocess_can_split_ = True - - def test_sort_index(self): - # GH13496 - - frame = DataFrame(np.arange(16).reshape(4, 4), index=[1, 2, 3, 4], - columns=['A', 'B', 'C', 'D']) - - # axis=0 : sort rows by index labels - unordered = frame.loc[[3, 2, 4, 1]] - result = unordered.sort_index(axis=0) - expected = frame - assert_frame_equal(result, expected) - - result = unordered.sort_index(ascending=False) - expected = frame[::-1] - assert_frame_equal(result, expected) - - # axis=1 : sort columns by column names - unordered = frame.iloc[:, [2, 1, 3, 0]] - result = unordered.sort_index(axis=1) - assert_frame_equal(result, frame) - - result = unordered.sort_index(axis=1, ascending=False) - expected = frame.iloc[:, ::-1] - assert_frame_equal(result, expected) - - def test_sort_index_multiindex(self): - # GH13496 - - # sort rows by specified level of multi-index - mi = MultiIndex.from_tuples([[2, 1, 3], [1, 1, 1]], names=list('ABC')) - df = DataFrame([[1, 2], [3, 4]], mi) - - # MI sort, but no level: sort_level has no effect - mi = MultiIndex.from_tuples([[1, 1, 3], [1, 1, 1]], names=list('ABC')) - df = DataFrame([[1, 2], [3, 4]], mi) - result = df.sort_index(sort_remaining=False) - expected = df.sort_index() - assert_frame_equal(result, expected) +class TestDataFrameSorting(TestData): def test_sort(self): frame = DataFrame(np.arange(16).reshape(4, 4), index=[1, 2, 3, 4], columns=['A', 'B', 'C', 'D']) - # 9816 deprecated - with tm.assert_produces_warning(FutureWarning): - frame.sort(columns='A') - with 
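# For the sorting hunks that begin above: the basic multi-key form with
# per-key directions that test_sort_values exercises, on a toy frame.
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 1], 'B': [1, 2, 3]})

# rows with A == 1 come first, ordered by descending B
print(df.sort_values(by=['A', 'B'], ascending=[True, False]))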
tm.assert_produces_warning(FutureWarning): - frame.sort() + # see gh-9816 with tm.assert_produces_warning(FutureWarning): frame.sortlevel() @@ -105,7 +61,7 @@ def test_sort_values(self): sorted_df = frame.sort_values(by=['B', 'A'], ascending=[True, False]) assert_frame_equal(sorted_df, expected) - self.assertRaises(ValueError, lambda: frame.sort_values( + pytest.raises(ValueError, lambda: frame.sort_values( by=['A', 'B'], axis=2, inplace=True)) # by row (axis=1): GH 10806 @@ -130,7 +86,7 @@ def test_sort_values(self): assert_frame_equal(sorted_df, expected) msg = r'Length of ascending \(5\) != length of by \(2\)' - with assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): frame.sort_values(by=['A', 'B'], axis=0, ascending=[True] * 5) def test_sort_values_inplace(self): @@ -157,21 +113,6 @@ def test_sort_values_inplace(self): expected = frame.sort_values(by=['A', 'B'], ascending=False) assert_frame_equal(sorted_df, expected) - def test_sort_index_categorical_index(self): - - df = (DataFrame({'A': np.arange(6, dtype='int64'), - 'B': Series(list('aabbca')) - .astype('category', categories=list('cab'))}) - .set_index('B')) - - result = df.sort_index() - expected = df.iloc[[4, 0, 1, 5, 2, 3]] - assert_frame_equal(result, expected) - - result = df.sort_index(ascending=False) - expected = df.iloc[[3, 2, 5, 1, 0, 4]] - assert_frame_equal(result, expected) - def test_sort_nan(self): # GH3917 nan = np.nan @@ -297,8 +238,95 @@ def test_stable_descending_multicolumn_sort(self): kind='mergesort') assert_frame_equal(sorted_df, expected) + def test_stable_categorial(self): + # GH 16793 + df = DataFrame({ + 'x': pd.Categorical(np.repeat([1, 2, 3, 4], 5), ordered=True) + }) + expected = df.copy() + sorted_df = df.sort_values('x', kind='mergesort') + assert_frame_equal(sorted_df, expected) + + def test_sort_datetimes(self): + + # GH 3461, argsort / lexsort differences for a datetime column + df = DataFrame(['a', 'a', 'a', 'b', 'c', 'd', 'e', 'f', 'g'], + columns=['A'], + index=date_range('20130101', periods=9)) + dts = [Timestamp(x) + for x in ['2004-02-11', '2004-01-21', '2004-01-26', + '2005-09-20', '2010-10-04', '2009-05-12', + '2008-11-12', '2010-09-28', '2010-09-28']] + df['B'] = dts[::2] + dts[1::2] + df['C'] = 2. + df['A1'] = 3. + + df1 = df.sort_values(by='A') + df2 = df.sort_values(by=['A']) + assert_frame_equal(df1, df2) + + df1 = df.sort_values(by='B') + df2 = df.sort_values(by=['B']) + assert_frame_equal(df1, df2) + + def test_frame_column_inplace_sort_exception(self): + s = self.frame['A'] + with tm.assert_raises_regex(ValueError, "This Series is a view"): + s.sort_values(inplace=True) + + cp = s.copy() + cp.sort_values() # it works! + + def test_sort_nat_values_in_int_column(self): + + # GH 14922: "sorting with large float and multiple columns incorrect" + + # cause was that the int64 value NaT was considered as "na". Which is + # only correct for datetime64 columns. 
+ + int_values = (2, int(NaT)) + float_values = (2.0, -1.797693e308) + + df = DataFrame(dict(int=int_values, float=float_values), + columns=["int", "float"]) + + df_reversed = DataFrame(dict(int=int_values[::-1], + float=float_values[::-1]), + columns=["int", "float"], + index=[1, 0]) + + # NaT is not a "na" for int64 columns, so na_position must not + # influence the result: + df_sorted = df.sort_values(["int", "float"], na_position="last") + assert_frame_equal(df_sorted, df_reversed) + + df_sorted = df.sort_values(["int", "float"], na_position="first") + assert_frame_equal(df_sorted, df_reversed) + + # reverse sorting order + df_sorted = df.sort_values(["int", "float"], ascending=False) + assert_frame_equal(df_sorted, df) + + # and now check if NaT is still considered as "na" for datetime64 + # columns: + df = DataFrame(dict(datetime=[Timestamp("2016-01-01"), NaT], + float=float_values), columns=["datetime", "float"]) + + df_reversed = DataFrame(dict(datetime=[NaT, Timestamp("2016-01-01")], + float=float_values[::-1]), + columns=["datetime", "float"], + index=[1, 0]) + + df_sorted = df.sort_values(["datetime", "float"], na_position="first") + assert_frame_equal(df_sorted, df_reversed) + + df_sorted = df.sort_values(["datetime", "float"], na_position="last") + assert_frame_equal(df_sorted, df_reversed) + + +class TestDataFrameSortIndexKinds(TestData): + def test_sort_index_multicolumn(self): - import random A = np.arange(5).repeat(20) B = np.tile(np.arange(5), 20) random.shuffle(A) @@ -342,7 +370,7 @@ def test_sort_index_inplace(self): df.sort_index(inplace=True) expected = frame assert_frame_equal(df, expected) - self.assertNotEqual(a_id, id(df['A'])) + assert a_id != id(df['A']) df = unordered.copy() df.sort_index(ascending=False, inplace=True) @@ -399,26 +427,26 @@ def test_sort_index_duplicates(self): df = DataFrame([lrange(5, 9), lrange(4)], columns=['a', 'a', 'b', 'b']) - with assertRaisesRegexp(ValueError, 'duplicate'): + with tm.assert_raises_regex(ValueError, 'duplicate'): # use .sort_values #9816 with tm.assert_produces_warning(FutureWarning): df.sort_index(by='a') - with assertRaisesRegexp(ValueError, 'duplicate'): + with tm.assert_raises_regex(ValueError, 'duplicate'): df.sort_values(by='a') - with assertRaisesRegexp(ValueError, 'duplicate'): + with tm.assert_raises_regex(ValueError, 'duplicate'): # use .sort_values #9816 with tm.assert_produces_warning(FutureWarning): df.sort_index(by=['a']) - with assertRaisesRegexp(ValueError, 'duplicate'): + with tm.assert_raises_regex(ValueError, 'duplicate'): df.sort_values(by=['a']) - with assertRaisesRegexp(ValueError, 'duplicate'): + with tm.assert_raises_regex(ValueError, 'duplicate'): # use .sort_values #9816 with tm.assert_produces_warning(FutureWarning): # multi-column 'by' is separate codepath df.sort_index(by=['a', 'b']) - with assertRaisesRegexp(ValueError, 'duplicate'): + with tm.assert_raises_regex(ValueError, 'duplicate'): # multi-column 'by' is separate codepath df.sort_values(by=['a', 'b']) @@ -426,11 +454,11 @@ def test_sort_index_duplicates(self): # GH4370 df = DataFrame(np.random.randn(4, 2), columns=MultiIndex.from_tuples([('a', 0), ('a', 1)])) - with assertRaisesRegexp(ValueError, 'levels'): + with tm.assert_raises_regex(ValueError, 'levels'): # use .sort_values #9816 with tm.assert_produces_warning(FutureWarning): df.sort_index(by='a') - with assertRaisesRegexp(ValueError, 'levels'): + with tm.assert_raises_regex(ValueError, 'levels'): df.sort_values(by='a') # convert tuples to a list of tuples @@ -454,78 +482,73 @@ 
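# Background for the NaT-in-int-column test above: NaT reuses the
# minimal int64 as its payload, which is an ordinary value in an
# integer column and must not be treated as missing there. int(NaT)
# mirrors the test's own usage; the same payload is exposed as
# NaT.value.
import numpy as np
from pandas import NaT

print(int(NaT))                            # -9223372036854775808
print(int(NaT) == np.iinfo(np.int64).min)  # True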
+
+        df = DataFrame(dict(int=int_values, float=float_values),
+                       columns=["int", "float"])
+
+        df_reversed = DataFrame(dict(int=int_values[::-1],
+                                     float=float_values[::-1]),
+                                columns=["int", "float"],
+                                index=[1, 0])
+
+        # NaT is not a "na" for int64 columns, so na_position must not
+        # influence the result:
+        df_sorted = df.sort_values(["int", "float"], na_position="last")
+        assert_frame_equal(df_sorted, df_reversed)
+
+        df_sorted = df.sort_values(["int", "float"], na_position="first")
+        assert_frame_equal(df_sorted, df_reversed)
+
+        # reverse sorting order
+        df_sorted = df.sort_values(["int", "float"], ascending=False)
+        assert_frame_equal(df_sorted, df)
+
+        # and now check if NaT is still considered as "na" for datetime64
+        # columns:
+        df = DataFrame(dict(datetime=[Timestamp("2016-01-01"), NaT],
+                            float=float_values), columns=["datetime", "float"])
+
+        df_reversed = DataFrame(dict(datetime=[NaT, Timestamp("2016-01-01")],
+                                     float=float_values[::-1]),
+                                columns=["datetime", "float"],
+                                index=[1, 0])
+
+        df_sorted = df.sort_values(["datetime", "float"], na_position="first")
+        assert_frame_equal(df_sorted, df_reversed)
+
+        df_sorted = df.sort_values(["datetime", "float"], na_position="last")
+        assert_frame_equal(df_sorted, df_reversed)
+
+
+class TestDataFrameSortIndexKinds(TestData):
+
     def test_sort_index_multicolumn(self):
-        import random
         A = np.arange(5).repeat(20)
         B = np.tile(np.arange(5), 20)
         random.shuffle(A)
@@ -342,7 +370,7 @@ def test_sort_index_inplace(self):
         df.sort_index(inplace=True)
         expected = frame
         assert_frame_equal(df, expected)
-        self.assertNotEqual(a_id, id(df['A']))
+        assert a_id != id(df['A'])

         df = unordered.copy()
         df.sort_index(ascending=False, inplace=True)
@@ -399,26 +427,26 @@ def test_sort_index_duplicates(self):

         df = DataFrame([lrange(5, 9), lrange(4)],
                        columns=['a', 'a', 'b', 'b'])

-        with assertRaisesRegexp(ValueError, 'duplicate'):
+        with tm.assert_raises_regex(ValueError, 'duplicate'):
             # use .sort_values #9816
             with tm.assert_produces_warning(FutureWarning):
                 df.sort_index(by='a')
-        with assertRaisesRegexp(ValueError, 'duplicate'):
+        with tm.assert_raises_regex(ValueError, 'duplicate'):
             df.sort_values(by='a')

-        with assertRaisesRegexp(ValueError, 'duplicate'):
+        with tm.assert_raises_regex(ValueError, 'duplicate'):
             # use .sort_values #9816
             with tm.assert_produces_warning(FutureWarning):
                 df.sort_index(by=['a'])
-        with assertRaisesRegexp(ValueError, 'duplicate'):
+        with tm.assert_raises_regex(ValueError, 'duplicate'):
             df.sort_values(by=['a'])

-        with assertRaisesRegexp(ValueError, 'duplicate'):
+        with tm.assert_raises_regex(ValueError, 'duplicate'):
             # use .sort_values #9816
             with tm.assert_produces_warning(FutureWarning):
                 # multi-column 'by' is separate codepath
                 df.sort_index(by=['a', 'b'])
-        with assertRaisesRegexp(ValueError, 'duplicate'):
+        with tm.assert_raises_regex(ValueError, 'duplicate'):
             # multi-column 'by' is separate codepath
             df.sort_values(by=['a', 'b'])

@@ -426,11 +454,11 @@ def test_sort_index_duplicates(self):
         # GH4370
         df = DataFrame(np.random.randn(4, 2),
                       columns=MultiIndex.from_tuples([('a', 0), ('a', 1)]))
-        with assertRaisesRegexp(ValueError, 'levels'):
+        with tm.assert_raises_regex(ValueError, 'levels'):
             # use .sort_values #9816
             with tm.assert_produces_warning(FutureWarning):
                 df.sort_index(by='a')
-        with assertRaisesRegexp(ValueError, 'levels'):
+        with tm.assert_raises_regex(ValueError, 'levels'):
             df.sort_values(by='a')

         # convert tuples to a list of tuples
@@ -454,78 +482,73 @@ def test_sort_index_level(self):
         res = df.sort_index(level=['A', 'B'], sort_remaining=False)
         assert_frame_equal(df, res)

-    def test_sort_datetimes(self):
-
-        # GH 3461, argsort / lexsort differences for a datetime column
-        df = DataFrame(['a', 'a', 'a', 'b', 'c', 'd', 'e', 'f', 'g'],
-                       columns=['A'],
-                       index=date_range('20130101', periods=9))
-        dts = [Timestamp(x)
-               for x in ['2004-02-11', '2004-01-21', '2004-01-26',
-                         '2005-09-20', '2010-10-04', '2009-05-12',
-                         '2008-11-12', '2010-09-28', '2010-09-28']]
-        df['B'] = dts[::2] + dts[1::2]
-        df['C'] = 2.
-        df['A1'] = 3.
-
-        df1 = df.sort_values(by='A')
-        df2 = df.sort_values(by=['A'])
-        assert_frame_equal(df1, df2)
-
-        df1 = df.sort_values(by='B')
-        df2 = df.sort_values(by=['B'])
-        assert_frame_equal(df1, df2)
-
-    def test_frame_column_inplace_sort_exception(self):
-        s = self.frame['A']
-        with assertRaisesRegexp(ValueError, "This Series is a view"):
-            s.sort_values(inplace=True)
-
-        cp = s.copy()
-        cp.sort_values()  # it works!
+    def test_sort_index_categorical_index(self):

-    def test_sort_nat_values_in_int_column(self):
+        df = (DataFrame({'A': np.arange(6, dtype='int64'),
+                         'B': Series(list('aabbca'))
+                         .astype('category', categories=list('cab'))})
+              .set_index('B'))
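+        # a CategoricalIndex sorts in category order ('c' < 'a' < 'b'
+        # here), not in the lexical order of the labels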
-        # GH 14922: "sorting with large float and multiple columns incorrect"
+        result = df.sort_index()
+        expected = df.iloc[[4, 0, 1, 5, 2, 3]]
+        assert_frame_equal(result, expected)

-        # cause was that the int64 value NaT was considered as "na". Which is
-        # only correct for datetime64 columns.
+        result = df.sort_index(ascending=False)
+        expected = df.iloc[[3, 2, 5, 1, 0, 4]]
+        assert_frame_equal(result, expected)

-        int_values = (2, int(NaT))
-        float_values = (2.0, -1.797693e308)
+    def test_sort_index(self):
+        # GH13496

-        df = DataFrame(dict(int=int_values, float=float_values),
-                       columns=["int", "float"])
+        frame = DataFrame(np.arange(16).reshape(4, 4), index=[1, 2, 3, 4],
+                          columns=['A', 'B', 'C', 'D'])

-        df_reversed = DataFrame(dict(int=int_values[::-1],
-                                     float=float_values[::-1]),
-                                columns=["int", "float"],
-                                index=[1, 0])
+        # axis=0 : sort rows by index labels
+        unordered = frame.loc[[3, 2, 4, 1]]
+        result = unordered.sort_index(axis=0)
+        expected = frame
+        assert_frame_equal(result, expected)

-        # NaT is not a "na" for int64 columns, so na_position must not
-        # influence the result:
-        df_sorted = df.sort_values(["int", "float"], na_position="last")
-        assert_frame_equal(df_sorted, df_reversed)
+        result = unordered.sort_index(ascending=False)
+        expected = frame[::-1]
+        assert_frame_equal(result, expected)

-        df_sorted = df.sort_values(["int", "float"], na_position="first")
-        assert_frame_equal(df_sorted, df_reversed)
+        # axis=1 : sort columns by column names
+        unordered = frame.iloc[:, [2, 1, 3, 0]]
+        result = unordered.sort_index(axis=1)
+        assert_frame_equal(result, frame)

-        # reverse sorting order
-        df_sorted = df.sort_values(["int", "float"], ascending=False)
-        assert_frame_equal(df_sorted, df)
+        result = unordered.sort_index(axis=1, ascending=False)
+        expected = frame.iloc[:, ::-1]
+        assert_frame_equal(result, expected)

-        # and now check if NaT is still considered as "na" for datetime64
-        # columns:
-        df = DataFrame(dict(datetime=[Timestamp("2016-01-01"), NaT],
-                            float=float_values), columns=["datetime", "float"])
+    def test_sort_index_multiindex(self):
+        # GH13496

-        df_reversed = DataFrame(dict(datetime=[NaT, Timestamp("2016-01-01")],
-                                     float=float_values[::-1]),
-                                columns=["datetime", "float"],
-                                index=[1, 0])
+        # sort rows by specified level of multi-index
+        mi = MultiIndex.from_tuples([[2, 1, 3], [1, 1, 1]], names=list('ABC'))
+        df = DataFrame([[1, 2], [3, 4]], mi)

+        # MI sort, but no level: sort_level has no effect
+        mi = MultiIndex.from_tuples([[1, 1, 3], [1, 1, 1]], names=list('ABC'))
+        df = DataFrame([[1, 2], [3, 4]], mi)
+        result = df.sort_index(sort_remaining=False)
+        expected = df.sort_index()
+        assert_frame_equal(result, expected)

-        df_sorted = df.sort_values(["datetime", "float"], na_position="first")
-        assert_frame_equal(df_sorted, df_reversed)
+    def test_sort_index_intervalindex(self):
+        # this is a de-facto sort via unstack
+        # confirming that we sort in the order of the bins
+        y = Series(np.random.randn(100))
+        x1 = Series(np.sign(np.random.randn(100)))
+        x2 = pd.cut(Series(np.random.randn(100)),
+                    bins=[-3, -0.5, 0, 0.5, 3])
+        model = pd.concat([y, x1, x2], axis=1, keys=['Y', 'X1', 'X2'])

-        df_sorted = df.sort_values(["datetime", "float"], na_position="last")
-        assert_frame_equal(df_sorted, df_reversed)
+        result = model.groupby(['X1', 'X2']).mean().unstack()
+        expected = IntervalIndex.from_tuples(
+            [(-3.0, -0.5), (-0.5, 0.0),
+             (0.0, 0.5), (0.5, 3.0)],
+            closed='right')
+        result = result.columns.levels[1].categories
+        tm.assert_index_equal(result, expected)
diff --git a/pandas/tests/frame/test_subclass.py b/pandas/tests/frame/test_subclass.py
index 8bd6d3ba54371..52c591e4dcbb0 100644
--- a/pandas/tests/frame/test_subclass.py
+++ b/pandas/tests/frame/test_subclass.py
@@ -2,6 +2,7 @@

 from __future__ import print_function

+from warnings import catch_warnings
 import numpy as np

 from pandas import DataFrame, Series, MultiIndex, Panel
@@ -11,9 +12,7 @@
 from pandas.tests.frame.common import TestData


-class TestDataFrameSubclassing(tm.TestCase, TestData):
-
-    _multiprocess_can_split_ = True
+class TestDataFrameSubclassing(TestData):

     def test_frame_subclassing_and_slicing(self):
         # Subclass frame and ensure it returns the right class on slicing it
@@ -51,45 +50,45 @@ def custom_frame_function(self):
         cdf = CustomDataFrame(data)

         # Did we get back our own DF class?
-        self.assertTrue(isinstance(cdf, CustomDataFrame))
+        assert isinstance(cdf, CustomDataFrame)

         # Do we get back our own Series class after selecting a column?
         cdf_series = cdf.col1
-        self.assertTrue(isinstance(cdf_series, CustomSeries))
-        self.assertEqual(cdf_series.custom_series_function(), 'OK')
+        assert isinstance(cdf_series, CustomSeries)
+        assert cdf_series.custom_series_function() == 'OK'

         # Do we get back our own DF class after slicing row-wise?
         cdf_rows = cdf[1:5]
-        self.assertTrue(isinstance(cdf_rows, CustomDataFrame))
-        self.assertEqual(cdf_rows.custom_frame_function(), 'OK')
+        assert isinstance(cdf_rows, CustomDataFrame)
+        assert cdf_rows.custom_frame_function() == 'OK'

         # Make sure sliced part of multi-index frame is custom class
         mcol = pd.MultiIndex.from_tuples([('A', 'A'), ('A', 'B')])
         cdf_multi = CustomDataFrame([[0, 1], [2, 3]], columns=mcol)
-        self.assertTrue(isinstance(cdf_multi['A'], CustomDataFrame))
+        assert isinstance(cdf_multi['A'], CustomDataFrame)

         mcol = pd.MultiIndex.from_tuples([('A', ''), ('B', '')])
         cdf_multi2 = CustomDataFrame([[0, 1], [2, 3]], columns=mcol)
-        self.assertTrue(isinstance(cdf_multi2['A'], CustomSeries))
+        assert isinstance(cdf_multi2['A'], CustomSeries)

     def test_dataframe_metadata(self):
         df = tm.SubclassedDataFrame({'X': [1, 2, 3], 'Y': [1, 2, 3]},
                                     index=['a', 'b', 'c'])
         df.testattr = 'XXX'

-        self.assertEqual(df.testattr, 'XXX')
-        self.assertEqual(df[['X']].testattr, 'XXX')
-        self.assertEqual(df.loc[['a', 'b'], :].testattr, 'XXX')
-        self.assertEqual(df.iloc[[0, 1], :].testattr, 'XXX')
+        assert df.testattr == 'XXX'
+        assert df[['X']].testattr == 'XXX'
+        assert df.loc[['a', 'b'], :].testattr == 'XXX'
+        assert df.iloc[[0, 1], :].testattr == 'XXX'

-        # GH9776
-        self.assertEqual(df.iloc[0:1, :].testattr, 'XXX')
+        # see gh-9776
+        assert df.iloc[0:1, :].testattr == 'XXX'

-        # GH10553
-        unpickled = self.round_trip_pickle(df)
+        # see gh-10553
+        unpickled = tm.round_trip_pickle(df)
         tm.assert_frame_equal(df, unpickled)
-        self.assertEqual(df._metadata, unpickled._metadata)
-        self.assertEqual(df.testattr, unpickled.testattr)
+        assert df._metadata == unpickled._metadata
+        assert df.testattr == unpickled.testattr

     def test_indexing_sliced(self):
         # GH 11559
@@ -100,54 +99,55 @@ def test_indexing_sliced(self):
         res = df.loc[:, 'X']
         exp = tm.SubclassedSeries([1, 2, 3], index=list('abc'), name='X')
         tm.assert_series_equal(res, exp)
-        tm.assertIsInstance(res, tm.SubclassedSeries)
+        assert isinstance(res, tm.SubclassedSeries)

         res = df.iloc[:, 1]
         exp = tm.SubclassedSeries([4, 5, 6], index=list('abc'), name='Y')
         tm.assert_series_equal(res, exp)
-        tm.assertIsInstance(res, tm.SubclassedSeries)
+        assert isinstance(res, tm.SubclassedSeries)

         res = df.loc[:, 'Z']
         exp = tm.SubclassedSeries([7, 8, 9], index=list('abc'), name='Z')
         tm.assert_series_equal(res, exp)
-        tm.assertIsInstance(res, tm.SubclassedSeries)
+        assert isinstance(res, tm.SubclassedSeries)

         res = df.loc['a', :]
         exp = tm.SubclassedSeries([1, 4, 7], index=list('XYZ'), name='a')
         tm.assert_series_equal(res, exp)
-        tm.assertIsInstance(res, tm.SubclassedSeries)
+        assert isinstance(res, tm.SubclassedSeries)

         res = df.iloc[1, :]
         exp = tm.SubclassedSeries([2, 5, 8], index=list('XYZ'), name='b')
         tm.assert_series_equal(res, exp)
-        tm.assertIsInstance(res, tm.SubclassedSeries)
+        assert isinstance(res, tm.SubclassedSeries)

         res = df.loc['c', :]
         exp = tm.SubclassedSeries([3, 6, 9], index=list('XYZ'), name='c')
         tm.assert_series_equal(res, exp)
-        tm.assertIsInstance(res, tm.SubclassedSeries)
+        assert isinstance(res, tm.SubclassedSeries)

     def test_to_panel_expanddim(self):
         # GH 9762
-        class SubclassedFrame(DataFrame):
+        with catch_warnings(record=True):
+            class SubclassedFrame(DataFrame):

-            @property
-            def _constructor_expanddim(self):
-                return SubclassedPanel
-
-        class SubclassedPanel(Panel):
-            pass
-
-        index = MultiIndex.from_tuples([(0, 0), (0, 1), (0, 2)])
-        df = SubclassedFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]}, index=index)
-        result = df.to_panel()
-        self.assertTrue(isinstance(result, SubclassedPanel))
-        expected = SubclassedPanel([[[1, 2, 3]], [[4, 5, 6]]],
-                                   items=['X', 'Y'], major_axis=[0],
-                                   minor_axis=[0, 1, 2],
-                                   dtype='int64')
-        tm.assert_panel_equal(result, expected)
+                @property
+                def _constructor_expanddim(self):
+                    return SubclassedPanel
+
+            class SubclassedPanel(Panel):
+                pass
+
+            index = MultiIndex.from_tuples([(0, 0), (0, 1), (0, 2)])
+            df = SubclassedFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]}, index=index)
+            result = df.to_panel()
+            assert isinstance(result, SubclassedPanel)
+            expected = SubclassedPanel([[[1, 2, 3]], [[4, 5, 6]]],
+                                       items=['X', 'Y'], major_axis=[0],
+                                       minor_axis=[0, 1, 2],
+                                       dtype='int64')
+            tm.assert_panel_equal(result, expected)

     def test_subclass_attr_err_propagation(self):
         # GH 11808
@@ -156,7 +156,7 @@ class A(DataFrame):
             @property
             def bar(self):
                 return self.i_dont_exist
-        with tm.assertRaisesRegexp(AttributeError, '.*i_dont_exist.*'):
+        with tm.assert_raises_regex(AttributeError, '.*i_dont_exist.*'):
             A().bar

     def test_subclass_align(self):
@@ -173,15 +173,15 @@ def test_subclass_align(self):
         exp2 = tm.SubclassedDataFrame({'c': [1, 2, np.nan, 4, np.nan],
                                        'd': [1, 2, np.nan, 4, np.nan]},
                                       index=list('ABCDE'))
-        tm.assertIsInstance(res1, tm.SubclassedDataFrame)
+        assert isinstance(res1, tm.SubclassedDataFrame)
         tm.assert_frame_equal(res1, exp1)
-        tm.assertIsInstance(res2, tm.SubclassedDataFrame)
+        assert isinstance(res2, tm.SubclassedDataFrame)
         tm.assert_frame_equal(res2, exp2)

         res1, res2 = df1.a.align(df2.c)
-        tm.assertIsInstance(res1, tm.SubclassedSeries)
+        assert isinstance(res1, tm.SubclassedSeries)
         tm.assert_series_equal(res1, exp1.a)
-        tm.assertIsInstance(res2, tm.SubclassedSeries)
+        assert isinstance(res2, tm.SubclassedSeries)
         tm.assert_series_equal(res2, exp2.c)

     def test_subclass_align_combinations(self):
@@ -199,23 +199,23 @@ def test_subclass_align_combinations(self):
         exp2 = pd.Series([1, 2, np.nan, 4, np.nan],
                          index=list('ABCDE'), name='x')

-        tm.assertIsInstance(res1, tm.SubclassedDataFrame)
+        assert isinstance(res1, tm.SubclassedDataFrame)
         tm.assert_frame_equal(res1, exp1)
-        tm.assertIsInstance(res2, tm.SubclassedSeries)
+        assert isinstance(res2, tm.SubclassedSeries)
         tm.assert_series_equal(res2, exp2)

         # series + frame
         res1, res2 = s.align(df)
-        tm.assertIsInstance(res1, tm.SubclassedSeries)
+        assert isinstance(res1, tm.SubclassedSeries)
         tm.assert_series_equal(res1, exp2)
-        tm.assertIsInstance(res2, tm.SubclassedDataFrame)
+        assert isinstance(res2, tm.SubclassedDataFrame)
         tm.assert_frame_equal(res2, exp1)

     def test_subclass_iterrows(self):
         # GH 13977
         df = tm.SubclassedDataFrame({'a': [1]})
         for i, row in df.iterrows():
-            tm.assertIsInstance(row, tm.SubclassedSeries)
+            assert isinstance(row, tm.SubclassedSeries)
             tm.assert_series_equal(row, df.loc[i])

     def test_subclass_sparse_slice(self):
@@ -229,9 +229,9 @@ def test_subclass_sparse_slice(self):
                                  tm.SubclassedSparseDataFrame(rows[:2]))
         tm.assert_sp_frame_equal(ssdf[:2],
                                  tm.SubclassedSparseDataFrame(rows[:2]))
-        tm.assert_equal(ssdf.loc[:2].testattr, "testattr")
-        tm.assert_equal(ssdf.iloc[:2].testattr, "testattr")
-        tm.assert_equal(ssdf[:2].testattr, "testattr")
+        assert ssdf.loc[:2].testattr == "testattr"
+        assert ssdf.iloc[:2].testattr == "testattr"
+        assert ssdf[:2].testattr == "testattr"

         tm.assert_sp_series_equal(ssdf.loc[1],
                                   tm.SubclassedSparseSeries(rows[1]),
diff --git a/pandas/tests/frame/test_timeseries.py b/pandas/tests/frame/test_timeseries.py
index 85967e9eda0d6..19fbf854256c6 100644
--- a/pandas/tests/frame/test_timeseries.py
+++ b/pandas/tests/frame/test_timeseries.py
@@ -2,29 +2,29 @@

 from __future__ import print_function

-from datetime import datetime
+from datetime import datetime, time
+
+import pytest

 from numpy import nan
 from numpy.random import randn
 import numpy as np

-from pandas import DataFrame, Series, Index, Timestamp, DatetimeIndex
+from pandas import (DataFrame, Series, Index,
+                    Timestamp, DatetimeIndex,
+                    to_datetime, date_range)
 import pandas as pd
 import pandas.tseries.offsets as offsets

-from pandas.util.testing import (assert_almost_equal,
-                                 assert_series_equal,
-                                 assert_frame_equal,
-                                 assertRaisesRegexp)
+from pandas.util.testing import assert_series_equal, assert_frame_equal

 import pandas.util.testing as tm
+from pandas.compat import product

 from pandas.tests.frame.common import TestData


-class TestDataFrameTimeSeriesMethods(tm.TestCase, TestData):
-
-    _multiprocess_can_split_ = True
+class TestDataFrameTimeSeriesMethods(TestData):

     def test_diff(self):
         the_diff = self.tsframe.diff(1)
@@ -38,7 +38,7 @@ def test_diff(self):

         s = Series([a, b])
         rs = DataFrame({'s': s}).diff()
-        self.assertEqual(rs.s[1], 1)
+        assert rs.s[1] == 1

         # mixed numeric
         tf = self.tsframe.astype('float32')
@@ -71,7 +71,7 @@ def test_diff_mixed_dtype(self):
         df['A'] = np.array([1, 2, 3, 4, 5], dtype=object)

         result = df.diff()
-        self.assertEqual(result[0].dtype, np.float64)
+        assert result[0].dtype == np.float64

     def test_diff_neg_n(self):
         rs = self.tsframe.diff(-1)
@@ -117,16 +117,70 @@ def test_pct_change_shift_over_nas(self):
         edf = DataFrame({'a': expected, 'b': expected})
         assert_frame_equal(chg, edf)

+    def test_frame_ctor_datetime64_column(self):
+        rng = date_range('1/1/2000 00:00:00', '1/1/2000 1:59:50', freq='10s')
+        dates = np.asarray(rng)
+
+        df = DataFrame({'A': np.random.randn(len(rng)), 'B': dates})
+        assert np.issubdtype(df['B'].dtype, np.dtype('M8[ns]'))
+
+    def test_frame_add_datetime64_column(self):
+        rng = date_range('1/1/2000 00:00:00', '1/1/2000 1:59:50', freq='10s')
+        df = DataFrame(index=np.arange(len(rng)))
+
+        df['A'] = rng
+        assert np.issubdtype(df['A'].dtype, np.dtype('M8[ns]'))
+
+    def test_frame_datetime64_pre1900_repr(self):
+        df = DataFrame({'year': date_range('1/1/1700', periods=50,
+                                           freq='A-DEC')})
+        # it works!
+        repr(df)
+
+    def test_frame_add_datetime64_col_other_units(self):
+        n = 100
+
+        units = ['h', 'm', 's', 'ms', 'D', 'M', 'Y']
+
+        ns_dtype = np.dtype('M8[ns]')
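+        # assigning any other datetime64 unit should coerce the column
+        # to datetime64[ns]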
+
+        for unit in units:
+            dtype = np.dtype('M8[%s]' % unit)
+            vals = np.arange(n, dtype=np.int64).view(dtype)
+
+            df = DataFrame({'ints': np.arange(n)}, index=np.arange(n))
+            df[unit] = vals
+
+            ex_vals = to_datetime(vals.astype('O')).values
+
+            assert df[unit].dtype == ns_dtype
+            assert (df[unit].values == ex_vals).all()
+
+        # Test insertion into existing datetime64 column
+        df = DataFrame({'ints': np.arange(n)}, index=np.arange(n))
+        df['dates'] = np.arange(n, dtype=np.int64).view(ns_dtype)
+
+        for unit in units:
+            dtype = np.dtype('M8[%s]' % unit)
+            vals = np.arange(n, dtype=np.int64).view(dtype)
+
+            tmp = df.copy()
+
+            tmp['dates'] = vals
+            ex_vals = to_datetime(vals.astype('O')).values
+
+            assert (tmp['dates'].values == ex_vals).all()
+
     def test_shift(self):
         # naive shift
         shiftedFrame = self.tsframe.shift(5)
-        self.assert_index_equal(shiftedFrame.index, self.tsframe.index)
+        tm.assert_index_equal(shiftedFrame.index, self.tsframe.index)

         shiftedSeries = self.tsframe['A'].shift(5)
         assert_series_equal(shiftedFrame['A'], shiftedSeries)

         shiftedFrame = self.tsframe.shift(-5)
-        self.assert_index_equal(shiftedFrame.index, self.tsframe.index)
+        tm.assert_index_equal(shiftedFrame.index, self.tsframe.index)

         shiftedSeries = self.tsframe['A'].shift(-5)
         assert_series_equal(shiftedFrame['A'], shiftedSeries)

@@ -137,7 +191,7 @@ def test_shift(self):

         # shift by DateOffset
         shiftedFrame = self.tsframe.shift(5, freq=offsets.BDay())
-        self.assertEqual(len(shiftedFrame), len(self.tsframe))
+        assert len(shiftedFrame) == len(self.tsframe)

         shiftedFrame2 = self.tsframe.shift(5, freq='B')
         assert_frame_equal(shiftedFrame, shiftedFrame2)
@@ -154,8 +208,8 @@ def test_shift(self):
         ps = tm.makePeriodFrame()
         shifted = ps.shift(1)
         unshifted = shifted.shift(-1)
-        self.assert_index_equal(shifted.index, ps.index)
-        self.assert_index_equal(unshifted.index, ps.index)
+        tm.assert_index_equal(shifted.index, ps.index)
+        tm.assert_index_equal(unshifted.index, ps.index)

         tm.assert_numpy_array_equal(unshifted.iloc[:, 0].valid().values,
                                     ps.iloc[:-1, 0].values)
@@ -164,8 +218,9 @@ def test_shift(self):
         assert_frame_equal(shifted2, shifted3)
         assert_frame_equal(ps, shifted2.shift(-1, 'B'))

-        assertRaisesRegexp(ValueError, 'does not match PeriodIndex freq',
-                           ps.shift, freq='D')
+        tm.assert_raises_regex(ValueError,
+                               'does not match PeriodIndex freq',
+                               ps.shift, freq='D')

         # shift other axis
         # GH 6371
@@ -211,6 +266,28 @@ def test_shift_empty(self):

         assert_frame_equal(df, rs)

+    def test_shift_duplicate_columns(self):
+        # GH 9092; verify that position-based shifting works
+        # in the presence of duplicate columns
+        column_lists = [list(range(5)), [1] * 5, [1, 1, 2, 2, 1]]
+        data = np.random.randn(20, 5)
+
+        shifted = []
+        for columns in column_lists:
+            df = pd.DataFrame(data.copy(), columns=columns)
+            for s in range(5):
+                df.iloc[:, s] = df.iloc[:, s].shift(s + 1)
+            df.columns = range(5)
+            shifted.append(df)
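+        # column s was shifted down by s + 1 positions, so each frame
+        # should hold s + 1 leading NaNs in column s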
+
+        # sanity check the base case
+        nulls = shifted[0].isna().sum()
+        assert_series_equal(nulls, Series(range(1, 6), dtype='int64'))
+
+        # check all answers are the same
+        assert_frame_equal(shifted[0], shifted[1])
+        assert_frame_equal(shifted[0], shifted[2])
+
     def test_tshift(self):
         # PeriodIndex
         ps = tm.makePeriodFrame()
@@ -225,7 +302,8 @@ def test_tshift(self):
         shifted3 = ps.tshift(freq=offsets.BDay())
         assert_frame_equal(shifted, shifted3)

-        assertRaisesRegexp(ValueError, 'does not match', ps.tshift, freq='M')
+        tm.assert_raises_regex(
+            ValueError, 'does not match', ps.tshift, freq='M')

         # DatetimeIndex
         shifted = self.tsframe.tshift(1)
@@ -245,7 +323,7 @@ def test_tshift(self):
         assert_frame_equal(unshifted, inferred_ts)

         no_freq = self.tsframe.iloc[[0, 5, 7], :]
-        self.assertRaises(ValueError, no_freq.tshift)
+        pytest.raises(ValueError, no_freq.tshift)

     def test_truncate(self):
         ts = self.tsframe[::3]
@@ -286,21 +364,21 @@ def test_truncate(self):
         truncated = ts.truncate(after=end_missing)
         assert_frame_equal(truncated, expected)

-        self.assertRaises(ValueError, ts.truncate,
-                          before=ts.index[-1] - 1,
-                          after=ts.index[0] + 1)
+        pytest.raises(ValueError, ts.truncate,
+                      before=ts.index[-1] - 1,
+                      after=ts.index[0] + 1)

     def test_truncate_copy(self):
         index = self.tsframe.index
         truncated = self.tsframe.truncate(index[5], index[10])
         truncated.values[:] = 5.
-        self.assertFalse((self.tsframe.values[5:11] == 5).any())
+        assert not (self.tsframe.values[5:11] == 5).any()

     def test_asfreq(self):
         offset_monthly = self.tsframe.asfreq(offsets.BMonthEnd())
         rule_monthly = self.tsframe.asfreq('BM')

-        assert_almost_equal(offset_monthly['A'], rule_monthly['A'])
+        tm.assert_almost_equal(offset_monthly['A'], rule_monthly['A'])

         filled = rule_monthly.asfreq('B', method='pad')  # noqa
         # TODO: actually check that this worked.
@@ -311,17 +389,17 @@ def test_asfreq(self):
         # test does not blow up on length-0 DataFrame
         zero_length = self.tsframe.reindex([])
         result = zero_length.asfreq('BM')
-        self.assertIsNot(result, zero_length)
+        assert result is not zero_length

     def test_asfreq_datetimeindex(self):
         df = DataFrame({'A': [1, 2, 3]},
                        index=[datetime(2011, 11, 1), datetime(2011, 11, 2),
                               datetime(2011, 11, 3)])
         df = df.asfreq('B')
-        tm.assertIsInstance(df.index, DatetimeIndex)
+        assert isinstance(df.index, DatetimeIndex)

         ts = df['A'].asfreq('B')
-        tm.assertIsInstance(ts.index, DatetimeIndex)
+        assert isinstance(ts.index, DatetimeIndex)

     def test_asfreq_fillvalue(self):
         # test for fill value during upsampling, related to issue 3715
@@ -352,15 +430,105 @@ def test_first_last_valid(self):
         frame = DataFrame({'foo': mat}, index=self.frame.index)

         index = frame.first_valid_index()
-        self.assertEqual(index, frame.index[5])
+        assert index == frame.index[5]

         index = frame.last_valid_index()
-        self.assertEqual(index, frame.index[-6])
+        assert index == frame.index[-6]

         # GH12800
         empty = DataFrame()
-        self.assertIsNone(empty.last_valid_index())
-        self.assertIsNone(empty.first_valid_index())
+        assert empty.last_valid_index() is None
+        assert empty.first_valid_index() is None
+
+    def test_at_time_frame(self):
+        rng = date_range('1/1/2000', '1/5/2000', freq='5min')
+        ts = DataFrame(np.random.randn(len(rng), 2), index=rng)
+        rs = ts.at_time(rng[1])
+        assert (rs.index.hour == rng[1].hour).all()
+        assert (rs.index.minute == rng[1].minute).all()
+        assert (rs.index.second == rng[1].second).all()
+
+        result = ts.at_time('9:30')
+        expected = ts.at_time(time(9, 30))
+        assert_frame_equal(result, expected)
+
+        result = ts.loc[time(9, 30)]
+        expected = ts.loc[(rng.hour == 9) & (rng.minute == 30)]
+
+        assert_frame_equal(result, expected)
+
+        # midnight, everything
+        rng = date_range('1/1/2000', '1/31/2000')
+        ts = DataFrame(np.random.randn(len(rng), 3), index=rng)
+
+        result = ts.at_time(time(0, 0))
+        assert_frame_equal(result, ts)
+
+        # time doesn't exist
+        rng = date_range('1/1/2012', freq='23Min', periods=384)
+        ts = DataFrame(np.random.randn(len(rng), 2), rng)
+        rs = ts.at_time('16:00')
+        assert len(rs) == 0
+
+    def test_between_time_frame(self):
+        rng = date_range('1/1/2000', '1/5/2000', freq='5min')
+        ts = DataFrame(np.random.randn(len(rng), 2), index=rng)
+        stime = time(0, 0)
+        etime = time(1, 0)
+
+        close_open = product([True, False], [True, False])
+        for inc_start, inc_end in close_open:
+            filtered = ts.between_time(stime, etime, inc_start, inc_end)
+            exp_len = 13 * 4 + 1
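+            # 00:00-01:00 inclusive is 13 five-minute stamps per day;
+            # the range covers four full days plus the final midnight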
+            if not inc_start:
+                exp_len -= 5
+            if not inc_end:
+                exp_len -= 4
+
+            assert len(filtered) == exp_len
+            for rs in filtered.index:
+                t = rs.time()
+                if inc_start:
+                    assert t >= stime
+                else:
+                    assert t > stime
+
+                if inc_end:
+                    assert t <= etime
+                else:
+                    assert t < etime
+
+        result = ts.between_time('00:00', '01:00')
+        expected = ts.between_time(stime, etime)
+        assert_frame_equal(result, expected)
+
+        # across midnight
+        rng = date_range('1/1/2000', '1/5/2000', freq='5min')
+        ts = DataFrame(np.random.randn(len(rng), 2), index=rng)
+        stime = time(22, 0)
+        etime = time(9, 0)
+
+        close_open = product([True, False], [True, False])
+        for inc_start, inc_end in close_open:
+            filtered = ts.between_time(stime, etime, inc_start, inc_end)
+            exp_len = (12 * 11 + 1) * 4 + 1
+            if not inc_start:
+                exp_len -= 4
+            if not inc_end:
+                exp_len -= 4
+
+            assert len(filtered) == exp_len
+            for rs in filtered.index:
+                t = rs.time()
+                if inc_start:
+                    assert (t >= stime) or (t <= etime)
+                else:
+                    assert (t > stime) or (t <= etime)
+
+                if inc_end:
+                    assert (t <= etime) or (t >= stime)
+                else:
+                    assert (t < etime) or (t >= stime)

     def test_operation_on_NaT(self):
         # Both NaT and Timestamp are in DataFrame.
@@ -392,17 +560,39 @@ def test_datetime_assignment_with_NaT_and_diff_time_units(self):
         result = pd.Series(data_ns).to_frame()
         result['new'] = data_ns
         expected = pd.DataFrame({0: [1, None],
-                                'new': [1, None]}, dtype='datetime64[ns]')
+                                 'new': [1, None]}, dtype='datetime64[ns]')
         tm.assert_frame_equal(result, expected)

         # OutOfBoundsDatetime error shouldn't occur
         data_s = np.array([1, 'nat'], dtype='datetime64[s]')
         result['new'] = data_s
         expected = pd.DataFrame({0: [1, None],
-                                'new': [1e9, None]}, dtype='datetime64[ns]')
+                                 'new': [1e9, None]}, dtype='datetime64[ns]')
         tm.assert_frame_equal(result, expected)

+    def test_frame_to_period(self):
+        K = 5
+        from pandas.core.indexes.period import period_range
+
+        dr = date_range('1/1/2000', '1/1/2001')
+        pr = period_range('1/1/2000', '1/1/2001')
+        df = DataFrame(randn(len(dr), K), index=dr)
+        df['mix'] = 'a'
+
+        pts = df.to_period()
+        exp = df.copy()
+        exp.index = pr
+        assert_frame_equal(pts, exp)
+
+        pts = df.to_period('M')
+        tm.assert_index_equal(pts.index, exp.index.asfreq('M'))
+
+        df = df.T
+        pts = df.to_period(axis=1)
+        exp = df.copy()
+        exp.columns = pr
+        assert_frame_equal(pts, exp)
+
+        pts = df.to_period('M', axis=1)
+        tm.assert_index_equal(pts.columns, exp.columns.asfreq('M'))

-if __name__ == '__main__':
-    import nose
-    nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
-                   exit=False)
+    pytest.raises(ValueError, df.to_period, axis=2)
diff --git a/pandas/tests/frame/test_to_csv.py b/pandas/tests/frame/test_to_csv.py
index b585462365606..6a4b1686a31e2 100644
--- a/pandas/tests/frame/test_to_csv.py
+++ b/pandas/tests/frame/test_to_csv.py
@@ -3,12 +3,13 @@
 from __future__ import print_function

 import csv
+import pytest

 from numpy import nan
 import numpy as np

 from pandas.compat import (lmap, range, lrange, StringIO, u)
-from pandas.parser import ParserError
+from pandas.errors import ParserError
 from pandas import (DataFrame, Index, Series, MultiIndex, Timestamp,
                     date_range, read_csv, compat, to_datetime)
 import pandas as pd
@@ -17,8 +18,7 @@
                                  assert_series_equal,
                                  assert_frame_equal,
                                  ensure_clean,
-                                 makeCustomDataframe as mkdf,
-                                 assertRaisesRegexp, slow)
+                                 makeCustomDataframe as mkdf)
 import pandas.util.testing as tm
 from pandas.tests.frame.common import TestData
@@ -29,9 +29,7 @@
           'int32', 'int64']


-class TestDataFrameToCSV(tm.TestCase, TestData):
-
-    _multiprocess_can_split_ = True
+class TestDataFrameToCSV(TestData):

     def test_to_csv_from_csv1(self):
@@ -96,8 +94,8 @@ def test_to_csv_from_csv2(self):

             assert_frame_equal(xp, rs)

-            self.assertRaises(ValueError, self.frame2.to_csv, path,
-                              header=['AA', 'X'])
+            pytest.raises(ValueError, self.frame2.to_csv, path,
+                          header=['AA', 'X'])

     def test_to_csv_from_csv3(self):
@@ -207,7 +205,7 @@ def _check_df(df, cols=None):
         cols = ['b', 'a']
         _check_df(df, cols)

-    @slow
+    @pytest.mark.slow
     def test_to_csv_dtnat(self):
         # GH3437
         from pandas import NaT
@@ -238,7 +236,7 @@ def make_dtnat_arr(n, nnat=None):
                 assert_frame_equal(df, recons, check_names=False,
                                    check_less_precise=True)

-    @slow
+    @pytest.mark.slow
    def test_to_csv_moar(self):

         def _do_test(df, r_dtype=None, c_dtype=None,
@@ -435,13 +433,13 @@ def test_to_csv_no_index(self):
             assert_frame_equal(df, result)

     def test_to_csv_with_mix_columns(self):
-        # GH11637, incorrect output when a mix of integer and string column
+        # gh-11637: incorrect output when a mix of integer and string column
         # names passed as columns parameter in to_csv

         df = DataFrame({0: ['a', 'b', 'c'],
                         1: ['aa', 'bb', 'cc']})
         df['test'] = 'txt'
-        self.assertEqual(df.to_csv(), df.to_csv(columns=[0, 1, 'test']))
+        assert df.to_csv() == df.to_csv(columns=[0, 1, 'test'])

     def test_to_csv_headers(self):
         # GH6186, the presence or absence of `index` incorrectly
@@ -477,7 +475,7 @@ def test_to_csv_multiindex(self):

             # TODO to_csv drops column name
             assert_frame_equal(frame, df, check_names=False)
-            self.assertEqual(frame.index.names, df.index.names)
+            assert frame.index.names == df.index.names

             # needed if setUP becomes a classmethod
             self.frame.index = old_index
@@ -496,7 +494,7 @@ def test_to_csv_multiindex(self):
             # do not load index
             tsframe.to_csv(path)
             recons = DataFrame.from_csv(path, index_col=None)
-            self.assertEqual(len(recons.columns), len(tsframe.columns) + 2)
+            assert len(recons.columns) == len(tsframe.columns) + 2

             # no index
             tsframe.to_csv(path, index=False)
@@ -550,7 +548,7 @@ def _make_frame(names=None):
             df = _make_frame(True)
             df.to_csv(path, tupleize_cols=False, index=False)
             result = read_csv(path, header=[0, 1], tupleize_cols=False)
-            self.assertTrue(all([x is None for x in result.columns.names]))
+            assert all([x is None for x in result.columns.names])
             result.columns.names = df.columns.names
             assert_frame_equal(df, result)
@@ -589,13 +587,13 @@ def _make_frame(names=None):

             for i in [6, 7]:
                 msg = 'len of {i}, but only 5 lines in file'.format(i=i)
-                with assertRaisesRegexp(ParserError, msg):
+                with tm.assert_raises_regex(ParserError, msg):
                     read_csv(path, tupleize_cols=False,
                              header=lrange(i), index_col=0)

             # write with cols
-            with assertRaisesRegexp(TypeError, 'cannot specify cols with a '
-                                    'MultiIndex'):
+            with tm.assert_raises_regex(TypeError, 'cannot specify cols '
+                                        'with a MultiIndex'):
                 df.to_csv(path, tupleize_cols=False, columns=['foo', 'bar'])

         with ensure_clean('__tmp_to_csv_multiindex__') as path:
@@ -605,8 +603,8 @@ def _make_frame(names=None):
             exp = tsframe[:0]
             exp.index = []

-            self.assert_index_equal(recons.columns, exp.columns)
-            self.assertEqual(len(recons), 0)
+            tm.assert_index_equal(recons.columns, exp.columns)
+            assert len(recons) == 0

     def test_to_csv_float32_nanrep(self):
         df = DataFrame(np.random.randn(1, 4).astype(np.float32))
@@ -617,7 +615,7 @@ def test_to_csv_float32_nanrep(self):
             with open(path) as f:
                 lines = f.readlines()
-            self.assertEqual(lines[1].split(',')[2], '999')
+            assert lines[1].split(',')[2] == '999'

     def test_to_csv_withcommas(self):
@@ -730,7 +728,7 @@ def test_to_csv_chunking(self):
             rs = read_csv(filename, index_col=0)
             assert_frame_equal(rs, aa)

-    @slow
+    @pytest.mark.slow
     def test_to_csv_wide_frame_formatting(self):
         # Issue #8621
         df = DataFrame(np.random.randn(1, 100010), columns=None, index=None)
@@ -815,7 +813,7 @@ def test_to_csv_unicodewriter_quoting(self):
                     '2,"bar"\n'
                     '3,"baz"\n')

-        self.assertEqual(result, expected)
+        assert result == expected

     def test_to_csv_quote_none(self):
         # GH4328
@@ -826,7 +824,7 @@ def test_to_csv_quote_none(self):
                       encoding=encoding, index=False)
             result = buf.getvalue()
             expected = 'A\nhello\n{"hello"}\n'
-            self.assertEqual(result, expected)
+            assert result == expected

     def test_to_csv_index_no_leading_comma(self):
         df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
@@ -838,7 +836,7 @@ def test_to_csv_index_no_leading_comma(self):
                     'one,1,4\n'
                     'two,2,5\n'
                     'three,3,6\n')
-        self.assertEqual(buf.getvalue(), expected)
+        assert buf.getvalue() == expected

     def test_to_csv_line_terminators(self):
         df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
@@ -850,7 +848,7 @@ def test_to_csv_line_terminators(self):
                     'one,1,4\r\n'
                     'two,2,5\r\n'
                     'three,3,6\r\n')
-        self.assertEqual(buf.getvalue(), expected)
+        assert buf.getvalue() == expected

         buf = StringIO()
         df.to_csv(buf)  # The default line terminator remains \n
         expected = (',A,B\n'
                     'one,1,4\n'
                     'two,2,5\n'
                     'three,3,6\n')
-        self.assertEqual(buf.getvalue(), expected)
+        assert buf.getvalue() == expected

     def test_to_csv_from_csv_categorical(self):
@@ -870,7 +868,7 @@ def test_to_csv_from_csv_categorical(self):
         s.to_csv(res)
         exp = StringIO()
         s2.to_csv(exp)
-        self.assertEqual(res.getvalue(), exp.getvalue())
+        assert res.getvalue() == exp.getvalue()

         df = DataFrame({"s": s})
         df2 = DataFrame({"s": s2})
@@ -878,14 +876,14 @@ def test_to_csv_from_csv_categorical(self):
         df.to_csv(res)
         exp = StringIO()
         df2.to_csv(exp)
-        self.assertEqual(res.getvalue(), exp.getvalue())
+        assert res.getvalue() == exp.getvalue()

     def test_to_csv_path_is_none(self):
         # GH 8215
         # Make sure we return string for consistency with
         # Series.to_csv()
         csv_str = self.frame.to_csv(path_or_buf=None)
-        self.assertIsInstance(csv_str, str)
+        assert isinstance(csv_str, str)
         recons = pd.read_csv(StringIO(csv_str), index_col=0)
         assert_frame_equal(self.frame, recons)
@@ -910,7 +908,7 @@ def test_to_csv_compression_gzip(self):
             text = f.read().decode('utf8')
             f.close()
             for col in df.columns:
-                self.assertIn(col, text)
+                assert col in text

     def test_to_csv_compression_bz2(self):
         # GH7615
@@ -933,7 +931,7 @@ def test_to_csv_compression_bz2(self):
             text = f.read().decode('utf8')
             f.close()
             for col in df.columns:
-                self.assertIn(col, text)
+                assert col in text

     def test_to_csv_compression_xz(self):
         # GH11852
@@ -967,8 +965,8 @@ def test_to_csv_compression_value_error(self):
         with ensure_clean() as filename:
             # zip compression is not supported and should raise ValueError
             import zipfile
-            self.assertRaises(zipfile.BadZipfile, df.to_csv,
-                              filename, compression="zip")
+            pytest.raises(zipfile.BadZipfile, df.to_csv,
+                          filename, compression="zip")

     def test_to_csv_date_format(self):
         with ensure_clean('__tmp_to_csv_date_format__') as path:
@@ -1080,13 +1078,13 @@ def test_to_csv_quoting(self):
1,False,3.2,,"b,c"
"""
         result = df.to_csv()
-        self.assertEqual(result, expected)
+        assert result == expected

         result = df.to_csv(quoting=None)
-        self.assertEqual(result, expected)
+        assert result == expected

         result = df.to_csv(quoting=csv.QUOTE_MINIMAL)
-        self.assertEqual(result, expected)
+        assert result == expected

         expected = """\
"","c_bool","c_float","c_int","c_string"
@@ -1094,7 +1092,7 @@ def test_to_csv_quoting(self):
"1","False","3.2","","b,c"
"""
         result = df.to_csv(quoting=csv.QUOTE_ALL)
-        self.assertEqual(result, expected)
+        assert result == expected

         # see gh-12922, gh-13259: make sure changes to
         # the formatters do not break this behaviour
@@ -1104,14 +1102,14 @@ def test_to_csv_quoting(self):
1,False,3.2,"","b,c"
"""
         result = df.to_csv(quoting=csv.QUOTE_NONNUMERIC)
-        self.assertEqual(result, expected)
+        assert result == expected

         msg = "need to escape, but no escapechar set"
-        tm.assertRaisesRegexp(csv.Error, msg, df.to_csv,
-                              quoting=csv.QUOTE_NONE)
-        tm.assertRaisesRegexp(csv.Error, msg, df.to_csv,
-                              quoting=csv.QUOTE_NONE,
-                              escapechar=None)
+        tm.assert_raises_regex(csv.Error, msg, df.to_csv,
+                               quoting=csv.QUOTE_NONE)
+        tm.assert_raises_regex(csv.Error, msg, df.to_csv,
+                               quoting=csv.QUOTE_NONE,
+                               escapechar=None)

         expected = """\
,c_bool,c_float,c_int,c_string
@@ -1120,7 +1118,7 @@ def test_to_csv_quoting(self):
"""
         result = df.to_csv(quoting=csv.QUOTE_NONE,
                            escapechar='!')
-        self.assertEqual(result, expected)
+        assert result == expected

         expected = """\
,c_bool,c_ffloat,c_int,c_string
@@ -1129,7 +1127,7 @@ def test_to_csv_quoting(self):
"""
         result = df.to_csv(quoting=csv.QUOTE_NONE,
                            escapechar='f')
-        self.assertEqual(result, expected)
+        assert result == expected

         # see gh-3503: quoting Windows line terminators
         # presents with encoding?
@@ -1137,17 +1135,39 @@ def test_to_csv_quoting(self):
         df = pd.read_csv(StringIO(text))
         buf = StringIO()
         df.to_csv(buf, encoding='utf-8', index=False)
-        self.assertEqual(buf.getvalue(), text)
+        assert buf.getvalue() == text

         # xref gh-7791: make sure the quoting parameter is passed through
         # with multi-indexes
         df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
         df = df.set_index(['a', 'b'])
         expected = '"a","b","c"\n"1","3","5"\n"2","4","6"\n'
-        self.assertEqual(df.to_csv(quoting=csv.QUOTE_ALL), expected)
+        assert df.to_csv(quoting=csv.QUOTE_ALL) == expected
+
+    def test_period_index_date_overflow(self):
+        # see gh-15982
+
+        dates = ["1990-01-01", "2000-01-01", "3005-01-01"]
+        index = pd.PeriodIndex(dates, freq="D")
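+        # "3005-01-01" lies beyond the datetime64[ns] bounds of Timestamp
+        # (year 2262), so formatting must stay within Period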
+
+        df = pd.DataFrame([4, 5, 6], index=index)
+        result = df.to_csv()
+        expected = ',0\n1990-01-01,4\n2000-01-01,5\n3005-01-01,6\n'
+        assert result == expected
+
+        date_format = "%m-%d-%Y"
+        result = df.to_csv(date_format=date_format)
+
+        expected = ',0\n01-01-1990,4\n01-01-2000,5\n01-01-3005,6\n'
+        assert result == expected
+
+        # Overflow with pd.NaT
+        dates = ["1990-01-01", pd.NaT, "3005-01-01"]
+        index = pd.PeriodIndex(dates, freq="D")
+
+        df = pd.DataFrame([4, 5, 6], index=index)
+        result = df.to_csv()

-if __name__ == '__main__':
-    import nose
-    nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
-                   exit=False)
+        expected = ',0\n1990-01-01,4\n,5\n3005-01-01,6\n'
+        assert result == expected
diff --git a/pandas/tests/frame/test_validate.py b/pandas/tests/frame/test_validate.py
index e1ef87bb3271a..2de0e866f6e70 100644
--- a/pandas/tests/frame/test_validate.py
+++ b/pandas/tests/frame/test_validate.py
@@ -1,33 +1,33 @@
-from unittest import TestCase
 from pandas.core.frame import DataFrame

+import pytest
+import pandas.util.testing as tm

-class TestDataFrameValidate(TestCase):
-    """Tests for error handling related to data types of method arguments."""
-    df = DataFrame({'a': [1, 2], 'b': [3, 4]})
-
-    def test_validate_bool_args(self):
-        # Tests for error handling related to boolean arguments.
-        invalid_values = [1, "True", [1, 2, 3], 5.0]
-
-        for value in invalid_values:
-            with self.assertRaises(ValueError):
-                self.df.query('a > b', inplace=value)
-
-            with self.assertRaises(ValueError):
-                self.df.eval('a + b', inplace=value)
+@pytest.fixture
+def dataframe():
+    return DataFrame({'a': [1, 2], 'b': [3, 4]})

-            with self.assertRaises(ValueError):
-                self.df.set_index(keys=['a'], inplace=value)
-
-            with self.assertRaises(ValueError):
-                self.df.reset_index(inplace=value)
-
-            with self.assertRaises(ValueError):
-                self.df.dropna(inplace=value)
-
-            with self.assertRaises(ValueError):
-                self.df.drop_duplicates(inplace=value)
+class TestDataFrameValidate(object):
+    """Tests for error handling related to data types of method arguments."""

-            with self.assertRaises(ValueError):
-                self.df.sort_values(by=['a'], inplace=value)
+    @pytest.mark.parametrize("func", ["query", "eval", "set_index",
+                                      "reset_index", "dropna",
+                                      "drop_duplicates", "sort_values"])
+    @pytest.mark.parametrize("inplace", [1, "True", [1, 2, 3], 5.0])
+    def test_validate_bool_args(self, dataframe, func, inplace):
+        msg = "For argument \"inplace\" expected type bool"
+        kwargs = dict(inplace=inplace)
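+        # supply the required positional arguments so each call can
+        # actually reach the "inplace" validation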
+
+        if func == "query":
+            kwargs["expr"] = "a > b"
+        elif func == "eval":
+            kwargs["expr"] = "a + b"
+        elif func == "set_index":
+            kwargs["keys"] = ["a"]
+        elif func == "sort_values":
+            kwargs["by"] = ["a"]
+
+        with tm.assert_raises_regex(ValueError, msg):
+            getattr(dataframe, func)(**kwargs)
diff --git a/pandas/tests/groupby/common.py b/pandas/tests/groupby/common.py
new file mode 100644
index 0000000000000..3e99e8211b4f8
--- /dev/null
+++ b/pandas/tests/groupby/common.py
@@ -0,0 +1,62 @@
+""" Base setup """
+
+import pytest
+import numpy as np
+from pandas.util import testing as tm
+from pandas import DataFrame, MultiIndex
+
+
+@pytest.fixture
+def mframe():
+    index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two',
+                                                              'three']],
+                       labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
+                               [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
+                       names=['first', 'second'])
+    return DataFrame(np.random.randn(10, 3), index=index,
+                     columns=['A', 'B', 'C'])
+
+
+@pytest.fixture
+def df():
+    return DataFrame(
+        {'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
+         'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
+         'C': np.random.randn(8),
+         'D': np.random.randn(8)})
+
+
+class MixIn(object):
+
+    def setup_method(self, method):
+        self.ts = tm.makeTimeSeries()
+
+        self.seriesd = tm.getSeriesData()
+        self.tsd = tm.getTimeSeriesData()
+        self.frame = DataFrame(self.seriesd)
+        self.tsframe = DataFrame(self.tsd)
+
+        self.df = df()
+        self.df_mixed_floats = DataFrame(
+            {'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
+             'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
+             'C': np.random.randn(8),
+             'D': np.array(
+                 np.random.randn(8), dtype='float32')})
+
+        self.mframe = mframe()
+
+        self.three_group = DataFrame(
+            {'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar',
+                   'foo', 'foo', 'foo'],
+             'B': ['one', 'one', 'one', 'two', 'one', 'one', 'one', 'two',
+                   'two', 'two', 'one'],
+             'C': ['dull', 'dull', 'shiny', 'dull', 'dull', 'shiny', 'shiny',
+                   'dull', 'shiny', 'shiny', 'shiny'],
+             'D': np.random.randn(11),
+             'E': np.random.randn(11),
+             'F': np.random.randn(11)})
+
+
+def assert_fp_equal(a, b):
+    assert (np.abs(a - b) < 1e-12).all()
diff --git a/pandas/tests/groupby/test_aggregate.py b/pandas/tests/groupby/test_aggregate.py
index 6b162b71f79de..efc833575843c 100644
--- a/pandas/tests/groupby/test_aggregate.py
+++ b/pandas/tests/groupby/test_aggregate.py
@@ -1,36 +1,33 @@
 # -*- coding: utf-8 -*-

-from __future__ import print_function
-import nose
-from datetime import datetime
-
-
-from pandas import date_range
-from pandas.core.index import MultiIndex
-from pandas.core.api import DataFrame
-from pandas.core.series import Series
+"""
+we test .agg behavior / note that .apply is tested
+generally in test_groupby.py
+"""

-from pandas.util.testing import (assert_frame_equal, assert_series_equal
-                                 )
+from __future__ import print_function

-from pandas.core.groupby import (SpecificationError)
-from pandas.compat import (lmap, OrderedDict)
-from pandas.formats.printing import pprint_thing
+import pytest

-from pandas import compat
+from datetime import datetime, timedelta
+from functools import partial

-import pandas.core.common as com
 import numpy as np
-
-import pandas.util.testing as tm
+from numpy import nan

 import pandas as pd
+from pandas import (date_range, MultiIndex, DataFrame,
+                    Series, Index, bdate_range, concat)
+from pandas.util.testing import assert_frame_equal, assert_series_equal
+from pandas.core.groupby import SpecificationError, DataError
+from pandas.compat import OrderedDict
+from pandas.io.formats.printing import pprint_thing
+import pandas.util.testing as tm


-class TestGroupByAggregate(tm.TestCase):
-    _multiprocess_can_split_ = True
+class TestGroupByAggregate(object):

-    def setUp(self):
+    def setup_method(self, method):
         self.ts = tm.makeTimeSeries()

         self.seriesd = tm.getSeriesData()
@@ -123,7 +120,7 @@ def test_agg_period_index(self):
         prng = period_range('2012-1-1', freq='M', periods=3)
         df = DataFrame(np.random.randn(3, 2), index=prng)
         rs = df.groupby(level=0).sum()
-        tm.assertIsInstance(rs.index, PeriodIndex)
+        assert isinstance(rs.index, PeriodIndex)

         # GH 3579
         index = period_range(start='1999-01', periods=5, freq='M')
@@ -160,10 +157,33 @@ def test_agg_dict_parameter_cast_result_dtypes(self):
         assert_series_equal(grouped.time.last(), exp['time'])
         assert_series_equal(grouped.time.agg('last'), exp['time'])

+        # count
+        exp = pd.Series([2, 2, 2, 2],
+                        index=Index(list('ABCD'), name='class'),
+                        name='time')
+        assert_series_equal(grouped.time.agg(len), exp)
+        assert_series_equal(grouped.time.size(), exp)
+
+        exp = pd.Series([0, 1, 1, 2],
+                        index=Index(list('ABCD'), name='class'),
+                        name='time')
+        assert_series_equal(grouped.time.count(), exp)
+
+    def test_agg_cast_results_dtypes(self):
+        # similar to GH12821
+        # xref #11444
+        u = [datetime(2015, x + 1, 1) for x in range(12)]
+        v = list('aaabbbbbbccd')
+        df = pd.DataFrame({'X': v, 'Y': u})
+
+        result = df.groupby('X')['Y'].agg(len)
+        expected = df.groupby('X')['Y'].count()
+        assert_series_equal(result, expected)
+
     def test_agg_must_agg(self):
         grouped = self.df.groupby('A')['C']
-        self.assertRaises(Exception, grouped.agg, lambda x: x.describe())
-        self.assertRaises(Exception, grouped.agg, lambda x: x.index[:2])
+        pytest.raises(Exception, grouped.agg, lambda x: x.describe())
+        pytest.raises(Exception, grouped.agg, lambda x: x.index[:2])

     def test_agg_ser_multi_key(self):
         # TODO(wesm): unused
@@ -177,7 +197,7 @@ def test_agg_ser_multi_key(self):
     def test_agg_apply_corner(self):
         # nothing to group, all NA
         grouped = self.ts.groupby(self.ts * np.nan)
-        self.assertEqual(self.ts.dtype, np.float64)
+        assert self.ts.dtype == np.float64

         # groupby float64 values results in Float64Index
         exp = Series([], dtype=np.float64, index=pd.Index(
@@ -214,6 +234,27 @@ def test_agg_grouping_is_list_tuple(self):
         expected = grouped.mean()
         tm.assert_frame_equal(result, expected)

+    def test_aggregate_float64_no_int64(self):
+        # see gh-11199
+        df = DataFrame({"a": [1, 2, 3, 4, 5],
+                        "b": [1, 2, 2, 4, 5],
+                        "c": [1, 2, 3, 4, 5]})
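+        # grouped means of integer columns must come back as float64:
+        # b == 2 averages a == 2 and a == 3 to 2.5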
+
+        expected = DataFrame({"a": [1, 2.5, 4, 5]},
+                             index=[1, 2, 4, 5])
+        expected.index.name = "b"
+
+        result = df.groupby("b")[["a"]].mean()
+        tm.assert_frame_equal(result, expected)
+
+        expected = DataFrame({"a": [1, 2.5, 4, 5],
+                              "c": [1, 2.5, 4, 5]},
+                             index=[1, 2, 4, 5])
+        expected.index.name = "b"
+
+        result = df.groupby("b")[["a", "c"]].mean()
+        tm.assert_frame_equal(result, expected)
+
     def test_aggregate_api_consistency(self):
         # GH 9052
         # make sure that the aggregates via dict
@@ -274,8 +315,10 @@ def test_aggregate_api_consistency(self):
         expected.columns = MultiIndex.from_product([['C', 'D'],
                                                     ['mean', 'sum']])

-        result = grouped[['D', 'C']].agg({'r': np.sum,
-                                          'r2': np.mean})
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            result = grouped[['D', 'C']].agg({'r': np.sum,
+                                              'r2': np.mean})
         expected = pd.concat([d_sum,
                               c_sum,
                               d_mean,
@@ -285,6 +328,27 @@ def test_aggregate_api_consistency(self):
                                                     ['D', 'C']])
         assert_frame_equal(result, expected, check_like=True)

+    def test_agg_dict_renaming_deprecation(self):
+        # 15931
+        df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
+                           'B': range(5),
+                           'C': range(5)})
+
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False) as w:
+            df.groupby('A').agg({'B': {'foo': ['sum', 'max']},
+                                 'C': {'bar': ['count', 'min']}})
+            assert "using a dict with renaming" in str(w[0].message)
+
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            df.groupby('A')[['B', 'C']].agg({'ma': 'max'})
+
+        with tm.assert_produces_warning(FutureWarning) as w:
+            df.groupby('A').B.agg({'foo': 'count'})
+            assert "using a dict on a Series for aggregation" in str(
+                w[0].message)
+
     def test_agg_compat(self):

         # GH 12334
@@ -303,14 +367,19 @@ def test_agg_compat(self):
                              axis=1)
         expected.columns = MultiIndex.from_tuples([('C', 'sum'),
                                                    ('C', 'std')])
-        result = g['D'].agg({'C': ['sum', 'std']})
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            result = g['D'].agg({'C': ['sum', 'std']})
         assert_frame_equal(result, expected, check_like=True)

         expected = pd.concat([g['D'].sum(),
                               g['D'].std()],
                              axis=1)
         expected.columns = ['C', 'D']
-        result = g['D'].agg({'C': 'sum', 'D': 'std'})
+
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            result = g['D'].agg({'C': 'sum', 'D': 'std'})
         assert_frame_equal(result, expected, check_like=True)

     def test_agg_nested_dicts(self):
@@ -329,10 +398,12 @@ def f():
             g.aggregate({'r1': {'C': ['mean', 'sum']},
                          'r2': {'D': ['mean', 'sum']}})

-        self.assertRaises(SpecificationError, f)
+        pytest.raises(SpecificationError, f)

-        result = g.agg({'C': {'ra': ['mean', 'std']},
-                        'D': {'rb': ['mean', 'std']}})
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            result = g.agg({'C': {'ra': ['mean', 'std']},
+                            'D': {'rb': ['mean', 'std']}})
         expected = pd.concat([g['C'].mean(), g['C'].std(),
                               g['D'].mean(), g['D'].std()], axis=1)
         expected.columns = pd.MultiIndex.from_tuples([('ra', 'mean'), (
@@ -341,9 +412,14 @@ def f():

         # same name as the original column
         # GH9052
-        expected = g['D'].agg({'result1': np.sum, 'result2': np.mean})
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            expected = g['D'].agg({'result1': np.sum, 'result2': np.mean})
         expected = expected.rename(columns={'result1': 'D'})
-        result = g['D'].agg({'D': np.sum, 'result2': np.mean})
+
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            result = g['D'].agg({'D': np.sum, 'result2': np.mean})
         assert_frame_equal(result, expected, check_like=True)

     def test_agg_python_multiindex(self):
@@ -390,7 +466,7 @@ def test_aggregate_item_by_item(self):
         # def aggfun(ser):
         #     return len(ser + 'a')
         # result = grouped.agg(aggfun)
-        # self.assertEqual(len(result.columns), 1)
+        # assert len(result.columns) == 1

         aggfun = lambda ser: ser.size
         result = grouped.agg(aggfun)
@@ -412,8 +488,8 @@ def aggfun(ser):
             return ser.size

         result = DataFrame().groupby(self.df.A).agg(aggfun)
-        tm.assertIsInstance(result, DataFrame)
-        self.assertEqual(len(result), 0)
+        assert isinstance(result, DataFrame)
+        assert len(result) == 0

     def test_agg_item_by_item_raise_typeerror(self):
         from numpy.random import randint
@@ -425,7 +501,7 @@ def raiseException(df):
             pprint_thing(df.to_string())
             raise TypeError

-        self.assertRaises(TypeError, df.groupby(0).agg, raiseException)
+        pytest.raises(TypeError, df.groupby(0).agg, raiseException)

     def test_series_agg_multikey(self):
         ts = tm.makeTimeSeries()
@@ -455,40 +531,364 @@ def bad(x):
         expected = data.groupby(['A', 'B']).agg(lambda x: 'foo')
         assert_frame_equal(result, expected)

+    def test_cythonized_aggers(self):
+        data = {'A': [0, 0, 0, 0, 1, 1, 1, 1, 1, 1., nan, nan],
+                'B': ['A', 'B'] * 6,
+                'C': np.random.randn(12)}
+        df = DataFrame(data)
+        df.loc[2:10:2, 'C'] = nan
+
+        def _testit(name):
+
+            op = lambda x: getattr(x, name)()
+
+            # single column
+            grouped = df.drop(['B'], axis=1).groupby('A')
+            exp = {}
+            for cat, group in grouped:
+                exp[cat] = op(group['C'])
+            exp = DataFrame({'C': exp})
+            exp.index.name = 'A'
+            result = op(grouped)
+            assert_frame_equal(result, exp)
+
+            # multiple columns
+            grouped = df.groupby(['A', 'B'])
+            expd = {}
+            for (cat1, cat2), group in grouped:
+                expd.setdefault(cat1, {})[cat2] = op(group['C'])
+            exp = DataFrame(expd).T.stack(dropna=False)
+            exp.index.names = ['A', 'B']
+            exp.name = 'C'
+
+            result = op(grouped)['C']
+            if not tm._incompat_bottleneck_version(name):
+                assert_series_equal(result, exp)
+
+        _testit('count')
+        _testit('sum')
+        _testit('std')
+        _testit('var')
+        _testit('sem')
+        _testit('mean')
+        _testit('median')
+        _testit('prod')
+        _testit('min')
+        _testit('max')
+
+    def test_cython_agg_boolean(self):
+        frame = DataFrame({'a': np.random.randint(0, 5, 50),
+                           'b': np.random.randint(0, 2, 50).astype('bool')})
+        result = frame.groupby('a')['b'].mean()
+        expected = frame.groupby('a')['b'].agg(np.mean)

-def assert_fp_equal(a, b):
-    assert (np.abs(a - b) < 1e-12).all()
+        assert_series_equal(result, expected)

+    def test_cython_agg_nothing_to_agg(self):
+        frame = DataFrame({'a': np.random.randint(0, 5, 50),
+                           'b': ['foo', 'bar'] * 25})
+        pytest.raises(DataError, frame.groupby('a')['b'].mean)
+
+        frame = DataFrame({'a': np.random.randint(0, 5, 50),
+                           'b': ['foo', 'bar'] * 25})
+        pytest.raises(DataError, frame[['b']].groupby(frame['a']).mean)
+
+    def test_cython_agg_nothing_to_agg_with_dates(self):
+        frame = DataFrame({'a': np.random.randint(0, 5, 50),
+                           'b': ['foo', 'bar'] * 25,
+                           'dates': pd.date_range('now', periods=50,
+                                                  freq='T')})
+        with tm.assert_raises_regex(DataError,
+                                    "No numeric types to aggregate"):
+            frame.groupby('b').dates.mean()
+
+    def test_cython_agg_frame_columns(self):
+        # #2113
+        df = DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})
+
+        df.groupby(level=0, axis='columns').mean()
+        df.groupby(level=0, axis='columns').mean()
+        df.groupby(level=0, axis='columns').mean()
+        df.groupby(level=0, axis='columns').mean()
+
+    def test_cython_agg_return_dict(self):
+        # GH 16741
+        ts = self.df.groupby('A')['B'].agg(
+            lambda x: x.value_counts().to_dict())
+        expected = Series([{'two': 1, 'one': 1, 'three': 1},
+                           {'two': 2, 'one': 2, 'three': 1}],
+                          index=Index(['bar', 'foo'], name='A'),
+                          name='B')
+        assert_series_equal(ts, expected)
+
+    def test_cython_fail_agg(self):
+        dr = bdate_range('1/1/2000', periods=50)
+        ts = Series(['A', 'B', 'C', 'D', 'E'] * 10, index=dr)
+
+        grouped = ts.groupby(lambda x: x.month)
+        summed = grouped.sum()
+        expected = grouped.agg(np.sum)
+        assert_series_equal(summed, expected)
+
+    def test_agg_consistency(self):
+        # agg with ([]) and () not consistent
+        # GH 6715
+
+        def P1(a):
+            try:
+                return np.percentile(a.dropna(), q=1)
+            except:
+                return np.nan
+
+        import datetime as dt
+        df = DataFrame({'col1': [1, 2, 3, 4],
+                        'col2': [10, 25, 26, 31],
+                        'date': [dt.date(2013, 2, 10), dt.date(2013, 2, 10),
+                                 dt.date(2013, 2, 11), dt.date(2013, 2, 11)]})
+
+        g = df.groupby('date')
+
+        expected = g.agg([P1])
+        expected.columns = expected.columns.levels[0]
+
+        result = g.agg(P1)
+        assert_frame_equal(result, expected)

-def _check_groupby(df, result, keys, field, f=lambda x: x.sum()):
-    tups = lmap(tuple, df[keys].values)
-    tups = com._asarray_tuplesafe(tups)
-    expected = f(df.groupby(tups)[field])
-    for k, v in compat.iteritems(expected):
-        assert (result[k] == v)
+    def test_wrap_agg_out(self):
+        grouped = self.three_group.groupby(['A', 'B'])

+        def func(ser):
+            if ser.dtype == np.object:
+                raise TypeError
+            else:
+                return ser.sum()

-def test_decons():
-    from pandas.core.groupby import decons_group_index, get_group_index
+        result = grouped.aggregate(func)
+        exp_grouped = self.three_group.loc[:, self.three_group.columns != 'C']
+        expected = exp_grouped.groupby(['A', 'B']).aggregate(func)
+        assert_frame_equal(result, expected)

-    def testit(label_list, shape):
-        group_index = get_group_index(label_list, shape, sort=True, xnull=True)
-        label_list2 = decons_group_index(group_index, shape)
+    def test_agg_multiple_functions_maintain_order(self):
+        # GH #610
+        funcs = [('mean', np.mean), ('max', np.max), ('min', np.min)]
+        result = self.df.groupby('A')['C'].agg(funcs)
+        exp_cols = Index(['mean', 'max', 'min'])

-        for a, b in zip(label_list, label_list2):
-            assert (np.array_equal(a, b))
+        tm.assert_index_equal(result.columns, exp_cols)

-    shape = (4, 5, 6)
-    label_list = [np.tile([0, 1, 2, 3, 0, 1, 2, 3], 100), np.tile(
-        [0, 2, 4, 3, 0, 1, 2, 3], 100), np.tile(
-            [5, 1, 0, 2, 3, 0, 5, 4], 100)]
-    testit(label_list, shape)
+    def test_multiple_functions_tuples_and_non_tuples(self):
+        # #1359

-    shape = (10000, 10000)
-    label_list = [np.tile(np.arange(10000), 5), np.tile(np.arange(10000), 5)]
-    testit(label_list, shape)
+        funcs = [('foo', 'mean'), 'std']
+        ex_funcs = [('foo', 'mean'), ('std', 'std')]
+
+        result = self.df.groupby('A')['C'].agg(funcs)
+        expected = self.df.groupby('A')['C'].agg(ex_funcs)
+        assert_frame_equal(result, expected)
+
+        result = self.df.groupby('A').agg(funcs)
+        expected = self.df.groupby('A').agg(ex_funcs)
+        assert_frame_equal(result, expected)
+
+    def test_agg_multiple_functions_too_many_lambdas(self):
+        grouped = self.df.groupby('A')
+        funcs = ['mean', lambda x: x.mean(), lambda x: x.std()]
+
+        pytest.raises(SpecificationError, grouped.agg, funcs)
+
+    def test_more_flexible_frame_multi_function(self):
+
+        grouped = self.df.groupby('A')

-if __name__ == '__main__':
-    nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure',
-                         '-s'], exit=False)
+        exmean = grouped.agg(OrderedDict([['C', np.mean], ['D', np.mean]]))
+        exstd = grouped.agg(OrderedDict([['C', np.std], ['D', np.std]]))
+
+        expected = concat([exmean, exstd], keys=['mean', 'std'], axis=1)
+        expected = expected.swaplevel(0, 1, axis=1).sort_index(level=0, axis=1)
+
+        d = OrderedDict([['C', [np.mean, np.std]], ['D', [np.mean, np.std]]])
+        result = grouped.aggregate(d)
+
+        assert_frame_equal(result, expected)
+
+        # be careful
+        result = grouped.aggregate(OrderedDict([['C', np.mean],
+                                                ['D', [np.mean, np.std]]]))
+        expected = grouped.aggregate(OrderedDict([['C', np.mean],
+                                                  ['D', [np.mean, np.std]]]))
+        assert_frame_equal(result, expected)
+
+        def foo(x):
+            return np.mean(x)
+
+        def bar(x):
+            return np.std(x, ddof=1)
+
+        # this uses column selection & renaming
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            d = OrderedDict([['C', np.mean], ['D', OrderedDict(
+                [['foo', np.mean], ['bar', np.std]])]])
+            result = grouped.aggregate(d)
+
+            d = OrderedDict([['C', [np.mean]], ['D', [foo, bar]]])
+            expected = grouped.aggregate(d)
+
+            assert_frame_equal(result, expected)
+
+    def test_multi_function_flexible_mix(self):
+        # GH #1268
+        grouped = self.df.groupby('A')
+
+        d = OrderedDict([['C', OrderedDict([['foo', 'mean'], [
+            'bar', 'std'
+        ]])], ['D', 'sum']])
+
+        # this uses column selection & renaming
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            result = grouped.aggregate(d)
+
+        d2 = OrderedDict([['C', OrderedDict([['foo', 'mean'], [
+            'bar', 'std'
+        ]])], ['D', ['sum']]])
+
+        # this uses column selection & renaming
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            result2 = grouped.aggregate(d2)
+
+        d3 = OrderedDict([['C', OrderedDict([['foo', 'mean'], [
+            'bar', 'std'
+        ]])], ['D', {'sum': 'sum'}]])
+
+        # this uses column selection & renaming
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            expected = grouped.aggregate(d3)
+
+        assert_frame_equal(result, expected)
+        assert_frame_equal(result2, expected)
+
+    def test_agg_callables(self):
+        # GH 7929
+        df = DataFrame({'foo': [1, 2], 'bar': [3, 4]}).astype(np.int64)
+
+        class fn_class(object):
+
+            def __call__(self, x):
+                return sum(x)
+
+        equiv_callables = [sum, np.sum, lambda x: sum(x), lambda x: x.sum(),
+                           partial(sum), fn_class()]
+
+        expected = df.groupby("foo").agg(sum)
+        for ecall in equiv_callables:
+            result = df.groupby('foo').agg(ecall)
+            assert_frame_equal(result, expected)
+
+    def test__cython_agg_general(self):
+        ops = [('mean', np.mean),
+               ('median', np.median),
+               ('var', np.var),
+               ('add', np.sum),
+               ('prod', np.prod),
+               ('min', np.min),
+               ('max', np.max),
+               ('first', lambda x: x.iloc[0]),
+               ('last', lambda x: x.iloc[-1]), ]
+        df = DataFrame(np.random.randn(1000))
+        labels = np.random.randint(0, 50, size=1000).astype(float)
+
+        for op, targop in ops:
+            result = df.groupby(labels)._cython_agg_general(op)
+            expected = df.groupby(labels).agg(targop)
+            try:
+                tm.assert_frame_equal(result, expected)
+            except BaseException as exc:
+                exc.args += ('operation: %s' % op, )
+                raise
+
+    def test_cython_agg_empty_buckets(self):
+        ops = [('mean', np.mean),
+               ('median', lambda x: np.median(x) if len(x) > 0 else np.nan),
+               ('var', lambda x: np.var(x, ddof=1)),
+               ('add', lambda x: np.sum(x) if len(x) > 0 else np.nan),
+               ('prod', np.prod),
+               ('min', np.min),
+               ('max', np.max), ]
+
+        df = pd.DataFrame([11, 12, 13])
+        grps = range(0, 55, 5)
+
+        for op, targop in ops:
+            result = df.groupby(pd.cut(df[0], grps))._cython_agg_general(op)
+            expected = df.groupby(pd.cut(df[0], grps)).agg(lambda x: targop(x))
+            try:
+                tm.assert_frame_equal(result, expected)
+            except BaseException as exc:
+                exc.args += ('operation: %s' % op,)
+                raise
+
+    def test_agg_over_numpy_arrays(self):
+        # GH 3788
+        df = pd.DataFrame([[1, np.array([10, 20, 30])],
+                           [1, np.array([40, 50, 60])],
+                           [2, np.array([20, 30, 40])]],
+                          columns=['category', 'arraydata'])
+        result = df.groupby('category').agg(sum)
+
+        expected_data = [[np.array([50, 70, 90])], [np.array([20, 30, 40])]]
+        expected_index = pd.Index([1, 2], name='category')
+        expected_column = ['arraydata']
+        expected = pd.DataFrame(expected_data,
+                                index=expected_index,
+                                columns=expected_column)
+
+        assert_frame_equal(result, expected)
+
+    def test_agg_timezone_round_trip(self):
+        # GH 15426
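+        # aggregations must return the stored tz-aware Timestamp values
+        # unchanged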
df.groupby(pd.cut(df[0], grps))._cython_agg_general(op) + expected = df.groupby(pd.cut(df[0], grps)).agg(lambda x: targop(x)) + try: + tm.assert_frame_equal(result, expected) + except BaseException as exc: + exc.args += ('operation: %s' % op,) + raise + + def test_agg_over_numpy_arrays(self): + # GH 3788 + df = pd.DataFrame([[1, np.array([10, 20, 30])], + [1, np.array([40, 50, 60])], + [2, np.array([20, 30, 40])]], + columns=['category', 'arraydata']) + result = df.groupby('category').agg(sum) + + expected_data = [[np.array([50, 70, 90])], [np.array([20, 30, 40])]] + expected_index = pd.Index([1, 2], name='category') + expected_column = ['arraydata'] + expected = pd.DataFrame(expected_data, + index=expected_index, + columns=expected_column) + + assert_frame_equal(result, expected) + + def test_agg_timezone_round_trip(self): + # GH 15426 + ts = pd.Timestamp("2016-01-01 12:00:00", tz='US/Pacific') + df = pd.DataFrame({'a': 1, 'b': [ts + timedelta(minutes=nn) + for nn in range(10)]}) + + result1 = df.groupby('a')['b'].agg(np.min).iloc[0] + result2 = df.groupby('a')['b'].agg(lambda x: np.min(x)).iloc[0] + result3 = df.groupby('a')['b'].min().iloc[0] + + assert result1 == ts + assert result2 == ts + assert result3 == ts + + dates = [pd.Timestamp("2016-01-0%d 12:00:00" % i, tz='US/Pacific') + for i in range(1, 5)] + df = pd.DataFrame({'A': ['a', 'b'] * 2, 'B': dates}) + grouped = df.groupby('A') + + ts = df['B'].iloc[0] + assert ts == grouped.nth(0)['B'].iloc[0] + assert ts == grouped.head(1)['B'].iloc[0] + assert ts == grouped.first()['B'].iloc[0] + assert ts == grouped.apply(lambda x: x.iloc[0])[0] + + ts = df['B'].iloc[2] + assert ts == grouped.last()['B'].iloc[0] + assert ts == grouped.apply(lambda x: x.iloc[-1])[0] + + def test_sum_uint64_overflow(self): + # see gh-14758 + + # Convert to uint64 and don't overflow + df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], + dtype=object) + 9223372036854775807 + + index = pd.Index([9223372036854775808, 9223372036854775810, + 9223372036854775812], dtype=np.uint64) + expected = pd.DataFrame({1: [9223372036854775809, + 9223372036854775811, + 9223372036854775813]}, index=index) + + expected.index.name = 0 + result = df.groupby(0).sum() + tm.assert_frame_equal(result, expected) diff --git a/pandas/tseries/tests/test_bin_groupby.py b/pandas/tests/groupby/test_bin_groupby.py similarity index 83% rename from pandas/tseries/tests/test_bin_groupby.py rename to pandas/tests/groupby/test_bin_groupby.py index 08c0833be0cd6..8b95455b53d22 100644 --- a/pandas/tseries/tests/test_bin_groupby.py +++ b/pandas/tests/groupby/test_bin_groupby.py @@ -1,14 +1,15 @@ # -*- coding: utf-8 -*- +import pytest + from numpy import nan import numpy as np -from pandas.types.common import _ensure_int64 -from pandas import Index, isnull +from pandas.core.dtypes.common import _ensure_int64 +from pandas import Index, isna from pandas.util.testing import assert_almost_equal import pandas.util.testing as tm -import pandas.lib as lib -import pandas.algos as algos +from pandas._libs import lib, groupby def test_series_grouper(): @@ -45,10 +46,9 @@ def test_series_bin_grouper(): assert_almost_equal(counts, exp_counts) -class TestBinGroupers(tm.TestCase): - _multiprocess_can_split_ = True +class TestBinGroupers(object): - def setUp(self): + def setup_method(self, method): self.obj = np.random.randn(10, 1) self.labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2], dtype=np.int64) self.bins = np.array([3, 6], dtype=np.int64) @@ -72,15 +72,15 @@ def test_generate_bins(self): bins = func(values, binner, 
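(`test_cython_agg_empty_buckets`, continued above, guards the case where most `pd.cut` buckets receive no rows at all. A standalone illustration, assuming made-up data and the longtime default of keeping empty categories in the result:)

```python
import pandas as pd

df = pd.DataFrame({'x': [11, 12, 13]})
result = df.groupby(pd.cut(df['x'], range(0, 55, 5))).mean()
# Only (10, 15] holds data; the other nine buckets are empty and must
# come back as NaN rather than uninitialized values from the kernel.
print(result)
```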
closed='right') assert ((bins == np.array([3, 6])).all()) - self.assertRaises(ValueError, generate_bins_generic, values, [], - 'right') - self.assertRaises(ValueError, generate_bins_generic, values[:0], - binner, 'right') + pytest.raises(ValueError, generate_bins_generic, values, [], + 'right') + pytest.raises(ValueError, generate_bins_generic, values[:0], + binner, 'right') - self.assertRaises(ValueError, generate_bins_generic, values, [4], - 'right') - self.assertRaises(ValueError, generate_bins_generic, values, [-3, -1], - 'right') + pytest.raises(ValueError, generate_bins_generic, values, [4], + 'right') + pytest.raises(ValueError, generate_bins_generic, values, [-3, -1], + 'right') def test_group_ohlc(): @@ -93,11 +93,11 @@ def _check(dtype): labels = _ensure_int64(np.repeat(np.arange(3), np.diff(np.r_[0, bins]))) - func = getattr(algos, 'group_ohlc_%s' % dtype) + func = getattr(groupby, 'group_ohlc_%s' % dtype) func(out, counts, obj[:, None], labels) def _ohlc(group): - if isnull(group).all(): + if isna(group).all(): return np.repeat(nan, 4) return [group[0], group.max(), group.min(), group[-1]] @@ -117,11 +117,12 @@ def _ohlc(group): _check('float64') -class TestMoments(tm.TestCase): +class TestMoments(object): pass -class TestReducer(tm.TestCase): +class TestReducer(object): + def test_int_index(self): from pandas.core.series import Series diff --git a/pandas/tests/groupby/test_categorical.py b/pandas/tests/groupby/test_categorical.py index 81aa183426be9..fdc03acd3e931 100644 --- a/pandas/tests/groupby/test_categorical.py +++ b/pandas/tests/groupby/test_categorical.py @@ -1,71 +1,21 @@ # -*- coding: utf-8 -*- from __future__ import print_function -import nose -from numpy import nan - - -from pandas.core.index import Index, MultiIndex, CategoricalIndex -from pandas.core.api import DataFrame, Categorical - -from pandas.core.series import Series +from datetime import datetime -from pandas.util.testing import (assert_frame_equal, assert_series_equal - ) +import pytest -from pandas.compat import (lmap) - -from pandas import compat - -import pandas.core.common as com import numpy as np +from numpy import nan -import pandas.util.testing as tm import pandas as pd +from pandas import (Index, MultiIndex, CategoricalIndex, + DataFrame, Categorical, Series, Interval) +from pandas.util.testing import assert_frame_equal, assert_series_equal +import pandas.util.testing as tm +from .common import MixIn -class TestGroupByCategorical(tm.TestCase): - - _multiprocess_can_split_ = True - - def setUp(self): - self.ts = tm.makeTimeSeries() - - self.seriesd = tm.getSeriesData() - self.tsd = tm.getTimeSeriesData() - self.frame = DataFrame(self.seriesd) - self.tsframe = DataFrame(self.tsd) - - self.df = DataFrame( - {'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], - 'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], - 'C': np.random.randn(8), - 'D': np.random.randn(8)}) - - self.df_mixed_floats = DataFrame( - {'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], - 'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], - 'C': np.random.randn(8), - 'D': np.array( - np.random.randn(8), dtype='float32')}) - - index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', - 'three']], - labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], - [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], - names=['first', 'second']) - self.mframe = DataFrame(np.random.randn(10, 3), index=index, - columns=['A', 'B', 'C']) - - self.three_group = DataFrame( - {'A': ['foo', 'foo', 'foo', 'foo', 
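(The `group_ohlc` kernel exercised in `test_group_ohlc` has a public counterpart, `SeriesGroupBy.ohlc()`, which reduces each group to its open/high/low/close values. A small sketch on invented data:)

```python
import pandas as pd

s = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
labels = [0, 0, 0, 1, 1, 1]
print(s.groupby(labels).ohlc())
# group 0 -> open 1, high 3, low 1, close 2
# group 1 -> open 5, high 6, low 4, close 6
```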
'bar', 'bar', 'bar', 'bar', - 'foo', 'foo', 'foo'], - 'B': ['one', 'one', 'one', 'two', 'one', 'one', 'one', 'two', - 'two', 'two', 'one'], - 'C': ['dull', 'dull', 'shiny', 'dull', 'dull', 'shiny', 'shiny', - 'dull', 'shiny', 'shiny', 'shiny'], - 'D': np.random.randn(11), - 'E': np.random.randn(11), - 'F': np.random.randn(11)}) +class TestGroupByCategorical(MixIn): def test_level_groupby_get_group(self): # GH15155 @@ -98,7 +48,7 @@ def get_stats(group): 'mean': group.mean()} result = self.df.groupby(cats).D.apply(get_stats) - self.assertEqual(result.index.names[0], 'C') + assert result.index.names[0] == 'C' def test_apply_categorical_data(self): # GH 10138 @@ -159,17 +109,18 @@ def test_groupby_categorical(self): exp_cats = Categorical(ord_labels, ordered=True, categories=['foo', 'bar', 'baz', 'qux']) expected = ord_data.groupby(exp_cats, sort=False).describe() - expected.index.names = [None, None] assert_frame_equal(desc_result, expected) # GH 10460 expc = Categorical.from_codes(np.arange(4).repeat(8), levels, ordered=True) exp = CategoricalIndex(expc) - self.assert_index_equal(desc_result.index.get_level_values(0), exp) + tm.assert_index_equal((desc_result.stack().index + .get_level_values(0)), exp) exp = Index(['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max'] * 4) - self.assert_index_equal(desc_result.index.get_level_values(1), exp) + tm.assert_index_equal((desc_result.stack().index + .get_level_values(1)), exp) def test_groupby_datetime_categorical(self): # GH9049: ensure backward compatibility @@ -196,7 +147,6 @@ def test_groupby_datetime_categorical(self): ord_labels = cats.take_nd(idx) ord_data = data.take(idx) expected = ord_data.groupby(ord_labels).describe() - expected.index.names = [None, None] assert_frame_equal(desc_result, expected) tm.assert_index_equal(desc_result.index, expected.index) tm.assert_index_equal( @@ -207,15 +157,18 @@ def test_groupby_datetime_categorical(self): expc = Categorical.from_codes( np.arange(4).repeat(8), levels, ordered=True) exp = CategoricalIndex(expc) - self.assert_index_equal(desc_result.index.get_level_values(0), exp) + tm.assert_index_equal((desc_result.stack().index + .get_level_values(0)), exp) exp = Index(['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max'] * 4) - self.assert_index_equal(desc_result.index.get_level_values(1), exp) + tm.assert_index_equal((desc_result.stack().index + .get_level_values(1)), exp) def test_groupby_categorical_index(self): + s = np.random.RandomState(12345) levels = ['foo', 'bar', 'baz', 'qux'] - codes = np.random.randint(0, 4, size=20) + codes = s.randint(0, 4, size=20) cats = Categorical.from_codes(codes, levels, ordered=True) df = DataFrame( np.repeat( @@ -246,8 +199,8 @@ def test_groupby_describe_categorical_columns(self): df = DataFrame(np.random.randn(20, 4), columns=cats) result = df.groupby([1, 2, 3, 4] * 5).describe() - tm.assert_index_equal(result.columns, cats) - tm.assert_categorical_equal(result.columns.values, cats.values) + tm.assert_index_equal(result.stack().columns, cats) + tm.assert_categorical_equal(result.stack().columns.values, cats.values) def test_groupby_unstack_categorical(self): # GH11558 (example is taken from the original issue) @@ -268,70 +221,15 @@ def test_groupby_unstack_categorical(self): expected = pd.Series([6, 4], index=pd.Index(['X', 'Y'], name='artist')) tm.assert_series_equal(result, expected) - def test_groupby_categorical_unequal_len(self): + def test_groupby_bins_unequal_len(self): # GH3011 series = Series([np.nan, np.nan, 1, 1, 2, 2, 3, 3, 4, 4]) - # 
The raises only happens with categorical, not with series of types - # category bins = pd.cut(series.dropna().values, 4) # len(bins) != len(series) here - self.assertRaises(ValueError, lambda: series.groupby(bins).mean()) - - def test_groupby_categorical_two_columns(self): - - # https://github.com/pandas-dev/pandas/issues/8138 - d = {'cat': - pd.Categorical(["a", "b", "a", "b"], categories=["a", "b", "c"], - ordered=True), - 'ints': [1, 1, 2, 2], - 'val': [10, 20, 30, 40]} - test = pd.DataFrame(d) - - # Grouping on a single column - groups_single_key = test.groupby("cat") - res = groups_single_key.agg('mean') - - exp_index = pd.CategoricalIndex(["a", "b", "c"], name="cat", - ordered=True) - exp = DataFrame({"ints": [1.5, 1.5, np.nan], "val": [20, 30, np.nan]}, - index=exp_index) - tm.assert_frame_equal(res, exp) - - # Grouping on two columns - groups_double_key = test.groupby(["cat", "ints"]) - res = groups_double_key.agg('mean') - exp = DataFrame({"val": [10, 30, 20, 40, np.nan, np.nan], - "cat": pd.Categorical(["a", "a", "b", "b", "c", "c"], - ordered=True), - "ints": [1, 2, 1, 2, 1, 2]}).set_index(["cat", "ints" - ]) - tm.assert_frame_equal(res, exp) - - # GH 10132 - for key in [('a', 1), ('b', 2), ('b', 1), ('a', 2)]: - c, i = key - result = groups_double_key.get_group(key) - expected = test[(test.cat == c) & (test.ints == i)] - assert_frame_equal(result, expected) - - d = {'C1': [3, 3, 4, 5], 'C2': [1, 2, 3, 4], 'C3': [10, 100, 200, 34]} - test = pd.DataFrame(d) - values = pd.cut(test['C1'], [1, 2, 3, 6]) - values.name = "cat" - groups_double_key = test.groupby([values, 'C2']) - - res = groups_double_key.agg('mean') - nan = np.nan - idx = MultiIndex.from_product( - [Categorical(["(1, 2]", "(2, 3]", "(3, 6]"], ordered=True), - [1, 2, 3, 4]], - names=["cat", "C2"]) - exp = DataFrame({"C1": [nan, nan, nan, nan, 3, 3, - nan, nan, nan, nan, 4, 5], - "C3": [nan, nan, nan, nan, 10, 100, - nan, nan, nan, nan, 200, 34]}, index=idx) - tm.assert_frame_equal(res, exp) + def f(): + series.groupby(bins).mean() + pytest.raises(ValueError, f) def test_groupby_multi_categorical_as_index(self): # GH13204 @@ -384,6 +282,30 @@ def test_groupby_multi_categorical_as_index(self): tm.assert_frame_equal(result, expected, check_index_type=True) + def test_groupby_preserve_categories(self): + # GH-13179 + categories = list('abc') + + # ordered=True + df = DataFrame({'A': pd.Categorical(list('ba'), + categories=categories, + ordered=True)}) + index = pd.CategoricalIndex(categories, categories, ordered=True) + tm.assert_index_equal(df.groupby('A', sort=True).first().index, index) + tm.assert_index_equal(df.groupby('A', sort=False).first().index, index) + + # ordered=False + df = DataFrame({'A': pd.Categorical(list('ba'), + categories=categories, + ordered=False)}) + sort_index = pd.CategoricalIndex(categories, categories, ordered=False) + nosort_index = pd.CategoricalIndex(list('bac'), list('bac'), + ordered=False) + tm.assert_index_equal(df.groupby('A', sort=True).first().index, + sort_index) + tm.assert_index_equal(df.groupby('A', sort=False).first().index, + nosort_index) + def test_groupby_preserve_categorical_dtype(self): # GH13743, GH13854 df = DataFrame({'A': [1, 2, 1, 1, 2], @@ -456,42 +378,151 @@ def test_groupby_categorical_no_compress(self): result = data.groupby("b").mean() result = result["a"].values exp = np.array([1, 2, 4, np.nan]) - self.assert_numpy_array_equal(result, exp) - - -def assert_fp_equal(a, b): - assert (np.abs(a - b) < 1e-12).all() - - -def _check_groupby(df, result, keys, field, 
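(The renamed `test_groupby_bins_unequal_len` asserts that a grouper built from `dropna()`'d values no longer aligns with the original series and must raise. A sketch of the failure mode with the same illustrative values:)

```python
import numpy as np
import pandas as pd

series = pd.Series([np.nan, np.nan, 1, 1, 2, 2, 3, 3, 4, 4])
bins = pd.cut(series.dropna().values, 4)  # length 8 vs. len(series) == 10

try:
    series.groupby(bins).mean()
except ValueError as err:
    print(err)  # grouper and axis lengths differ (exact wording varies)
```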
f=lambda x: x.sum()): - tups = lmap(tuple, df[keys].values) - tups = com._asarray_tuplesafe(tups) - expected = f(df.groupby(tups)[field]) - for k, v in compat.iteritems(expected): - assert (result[k] == v) + tm.assert_numpy_array_equal(result, exp) + + def test_groupby_sort_categorical(self): + # dataframe groupby sort was being ignored # GH 8868 + df = DataFrame([['(7.5, 10]', 10, 10], + ['(7.5, 10]', 8, 20], + ['(2.5, 5]', 5, 30], + ['(5, 7.5]', 6, 40], + ['(2.5, 5]', 4, 50], + ['(0, 2.5]', 1, 60], + ['(5, 7.5]', 7, 70]], columns=['range', 'foo', 'bar']) + df['range'] = Categorical(df['range'], ordered=True) + index = CategoricalIndex(['(0, 2.5]', '(2.5, 5]', '(5, 7.5]', + '(7.5, 10]'], name='range', ordered=True) + result_sort = DataFrame([[1, 60], [5, 30], [6, 40], [10, 10]], + columns=['foo', 'bar'], index=index) + + col = 'range' + assert_frame_equal(result_sort, df.groupby(col, sort=True).first()) + # when categories is ordered, group is ordered by category's order + assert_frame_equal(result_sort, df.groupby(col, sort=False).first()) + + df['range'] = Categorical(df['range'], ordered=False) + index = CategoricalIndex(['(0, 2.5]', '(2.5, 5]', '(5, 7.5]', + '(7.5, 10]'], name='range') + result_sort = DataFrame([[1, 60], [5, 30], [6, 40], [10, 10]], + columns=['foo', 'bar'], index=index) + + index = CategoricalIndex(['(7.5, 10]', '(2.5, 5]', '(5, 7.5]', + '(0, 2.5]'], + categories=['(7.5, 10]', '(2.5, 5]', + '(5, 7.5]', '(0, 2.5]'], + name='range') + result_nosort = DataFrame([[10, 10], [5, 30], [6, 40], [1, 60]], + index=index, columns=['foo', 'bar']) + + col = 'range' + # this is an unordered categorical, but we allow this #### + assert_frame_equal(result_sort, df.groupby(col, sort=True).first()) + assert_frame_equal(result_nosort, df.groupby(col, sort=False).first()) + + def test_groupby_sort_categorical_datetimelike(self): + # GH10505 + + # use same data as test_groupby_sort_categorical, which category is + # corresponding to datetime.month + df = DataFrame({'dt': [datetime(2011, 7, 1), datetime(2011, 7, 1), + datetime(2011, 2, 1), datetime(2011, 5, 1), + datetime(2011, 2, 1), datetime(2011, 1, 1), + datetime(2011, 5, 1)], + 'foo': [10, 8, 5, 6, 4, 1, 7], + 'bar': [10, 20, 30, 40, 50, 60, 70]}, + columns=['dt', 'foo', 'bar']) + + # ordered=True + df['dt'] = Categorical(df['dt'], ordered=True) + index = [datetime(2011, 1, 1), datetime(2011, 2, 1), + datetime(2011, 5, 1), datetime(2011, 7, 1)] + result_sort = DataFrame( + [[1, 60], [5, 30], [6, 40], [10, 10]], columns=['foo', 'bar']) + result_sort.index = CategoricalIndex(index, name='dt', ordered=True) + + index = [datetime(2011, 7, 1), datetime(2011, 2, 1), + datetime(2011, 5, 1), datetime(2011, 1, 1)] + result_nosort = DataFrame([[10, 10], [5, 30], [6, 40], [1, 60]], + columns=['foo', 'bar']) + result_nosort.index = CategoricalIndex(index, categories=index, + name='dt', ordered=True) + + col = 'dt' + assert_frame_equal(result_sort, df.groupby(col, sort=True).first()) + # when categories is ordered, group is ordered by category's order + assert_frame_equal(result_sort, df.groupby(col, sort=False).first()) + + # ordered = False + df['dt'] = Categorical(df['dt'], ordered=False) + index = [datetime(2011, 1, 1), datetime(2011, 2, 1), + datetime(2011, 5, 1), datetime(2011, 7, 1)] + result_sort = DataFrame( + [[1, 60], [5, 30], [6, 40], [10, 10]], columns=['foo', 'bar']) + result_sort.index = CategoricalIndex(index, name='dt') + + index = [datetime(2011, 7, 1), datetime(2011, 2, 1), + datetime(2011, 5, 1), datetime(2011, 1, 1)] + 
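(`test_groupby_preserve_categories` and the two sorting tests above all pin down the same rule: for an *ordered* categorical key, group order follows the categories' order whether `sort` is True or False; only an unordered categorical keeps appearance order under `sort=False`. A compact sketch on a hypothetical frame:)

```python
import pandas as pd

df = pd.DataFrame({'key': pd.Categorical(['b', 'a', 'b'],
                                         categories=['a', 'b'],
                                         ordered=True),
                   'val': [1, 2, 3]})
# Ordered categorical: sort=False still groups in category order,
# so the result index is ['a', 'b'] either way.
print(df.groupby('key', sort=False).first().index)
```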
result_nosort = DataFrame([[10, 10], [5, 30], [6, 40], [1, 60]], + columns=['foo', 'bar']) + result_nosort.index = CategoricalIndex(index, categories=index, + name='dt') + + col = 'dt' + assert_frame_equal(result_sort, df.groupby(col, sort=True).first()) + assert_frame_equal(result_nosort, df.groupby(col, sort=False).first()) + def test_groupby_categorical_two_columns(self): -def test_decons(): - from pandas.core.groupby import decons_group_index, get_group_index + # https://github.com/pandas-dev/pandas/issues/8138 + d = {'cat': + pd.Categorical(["a", "b", "a", "b"], categories=["a", "b", "c"], + ordered=True), + 'ints': [1, 1, 2, 2], + 'val': [10, 20, 30, 40]} + test = pd.DataFrame(d) - def testit(label_list, shape): - group_index = get_group_index(label_list, shape, sort=True, xnull=True) - label_list2 = decons_group_index(group_index, shape) + # Grouping on a single column + groups_single_key = test.groupby("cat") + res = groups_single_key.agg('mean') - for a, b in zip(label_list, label_list2): - assert (np.array_equal(a, b)) + exp_index = pd.CategoricalIndex(["a", "b", "c"], name="cat", + ordered=True) + exp = DataFrame({"ints": [1.5, 1.5, np.nan], "val": [20, 30, np.nan]}, + index=exp_index) + tm.assert_frame_equal(res, exp) - shape = (4, 5, 6) - label_list = [np.tile([0, 1, 2, 3, 0, 1, 2, 3], 100), np.tile( - [0, 2, 4, 3, 0, 1, 2, 3], 100), np.tile( - [5, 1, 0, 2, 3, 0, 5, 4], 100)] - testit(label_list, shape) + # Grouping on two columns + groups_double_key = test.groupby(["cat", "ints"]) + res = groups_double_key.agg('mean') + exp = DataFrame({"val": [10, 30, 20, 40, np.nan, np.nan], + "cat": pd.Categorical(["a", "a", "b", "b", "c", "c"], + ordered=True), + "ints": [1, 2, 1, 2, 1, 2]}).set_index(["cat", "ints" + ]) + tm.assert_frame_equal(res, exp) - shape = (10000, 10000) - label_list = [np.tile(np.arange(10000), 5), np.tile(np.arange(10000), 5)] - testit(label_list, shape) + # GH 10132 + for key in [('a', 1), ('b', 2), ('b', 1), ('a', 2)]: + c, i = key + result = groups_double_key.get_group(key) + expected = test[(test.cat == c) & (test.ints == i)] + assert_frame_equal(result, expected) + d = {'C1': [3, 3, 4, 5], 'C2': [1, 2, 3, 4], 'C3': [10, 100, 200, 34]} + test = pd.DataFrame(d) + values = pd.cut(test['C1'], [1, 2, 3, 6]) + values.name = "cat" + groups_double_key = test.groupby([values, 'C2']) -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure', '-s' - ], exit=False) + res = groups_double_key.agg('mean') + nan = np.nan + idx = MultiIndex.from_product( + [Categorical([Interval(1, 2), Interval(2, 3), + Interval(3, 6)], ordered=True), + [1, 2, 3, 4]], + names=["cat", "C2"]) + exp = DataFrame({"C1": [nan, nan, nan, nan, 3, 3, + nan, nan, nan, nan, 4, 5], + "C3": [nan, nan, nan, nan, 10, 100, + nan, nan, nan, nan, 200, 34]}, index=idx) + tm.assert_frame_equal(res, exp) diff --git a/pandas/tests/groupby/test_counting.py b/pandas/tests/groupby/test_counting.py new file mode 100644 index 0000000000000..485241d593d4f --- /dev/null +++ b/pandas/tests/groupby/test_counting.py @@ -0,0 +1,197 @@ +# -*- coding: utf-8 -*- +from __future__ import print_function + +import numpy as np + +from pandas import (DataFrame, Series, MultiIndex) +from pandas.util.testing import assert_series_equal +from pandas.compat import (range, product as cart_product) + + +class TestCounting(object): + + def test_cumcount(self): + df = DataFrame([['a'], ['a'], ['a'], ['b'], ['a']], columns=['A']) + g = df.groupby('A') + sg = g.A + + expected = Series([0, 1, 2, 0, 3]) 
+ + assert_series_equal(expected, g.cumcount()) + assert_series_equal(expected, sg.cumcount()) + + def test_cumcount_empty(self): + ge = DataFrame().groupby(level=0) + se = Series().groupby(level=0) + + # edge case, as this is usually considered float + e = Series(dtype='int64') + + assert_series_equal(e, ge.cumcount()) + assert_series_equal(e, se.cumcount()) + + def test_cumcount_dupe_index(self): + df = DataFrame([['a'], ['a'], ['a'], ['b'], ['a']], columns=['A'], + index=[0] * 5) + g = df.groupby('A') + sg = g.A + + expected = Series([0, 1, 2, 0, 3], index=[0] * 5) + + assert_series_equal(expected, g.cumcount()) + assert_series_equal(expected, sg.cumcount()) + + def test_cumcount_mi(self): + mi = MultiIndex.from_tuples([[0, 1], [1, 2], [2, 2], [2, 2], [1, 0]]) + df = DataFrame([['a'], ['a'], ['a'], ['b'], ['a']], columns=['A'], + index=mi) + g = df.groupby('A') + sg = g.A + + expected = Series([0, 1, 2, 0, 3], index=mi) + + assert_series_equal(expected, g.cumcount()) + assert_series_equal(expected, sg.cumcount()) + + def test_cumcount_groupby_not_col(self): + df = DataFrame([['a'], ['a'], ['a'], ['b'], ['a']], columns=['A'], + index=[0] * 5) + g = df.groupby([0, 0, 0, 1, 0]) + sg = g.A + + expected = Series([0, 1, 2, 0, 3], index=[0] * 5) + + assert_series_equal(expected, g.cumcount()) + assert_series_equal(expected, sg.cumcount()) + + def test_ngroup(self): + df = DataFrame({'A': list('aaaba')}) + g = df.groupby('A') + sg = g.A + + expected = Series([0, 0, 0, 1, 0]) + + assert_series_equal(expected, g.ngroup()) + assert_series_equal(expected, sg.ngroup()) + + def test_ngroup_distinct(self): + df = DataFrame({'A': list('abcde')}) + g = df.groupby('A') + sg = g.A + + expected = Series(range(5), dtype='int64') + + assert_series_equal(expected, g.ngroup()) + assert_series_equal(expected, sg.ngroup()) + + def test_ngroup_one_group(self): + df = DataFrame({'A': [0] * 5}) + g = df.groupby('A') + sg = g.A + + expected = Series([0] * 5) + + assert_series_equal(expected, g.ngroup()) + assert_series_equal(expected, sg.ngroup()) + + def test_ngroup_empty(self): + ge = DataFrame().groupby(level=0) + se = Series().groupby(level=0) + + # edge case, as this is usually considered float + e = Series(dtype='int64') + + assert_series_equal(e, ge.ngroup()) + assert_series_equal(e, se.ngroup()) + + def test_ngroup_series_matches_frame(self): + df = DataFrame({'A': list('aaaba')}) + s = Series(list('aaaba')) + + assert_series_equal(df.groupby(s).ngroup(), + s.groupby(s).ngroup()) + + def test_ngroup_dupe_index(self): + df = DataFrame({'A': list('aaaba')}, index=[0] * 5) + g = df.groupby('A') + sg = g.A + + expected = Series([0, 0, 0, 1, 0], index=[0] * 5) + + assert_series_equal(expected, g.ngroup()) + assert_series_equal(expected, sg.ngroup()) + + def test_ngroup_mi(self): + mi = MultiIndex.from_tuples([[0, 1], [1, 2], [2, 2], [2, 2], [1, 0]]) + df = DataFrame({'A': list('aaaba')}, index=mi) + g = df.groupby('A') + sg = g.A + expected = Series([0, 0, 0, 1, 0], index=mi) + + assert_series_equal(expected, g.ngroup()) + assert_series_equal(expected, sg.ngroup()) + + def test_ngroup_groupby_not_col(self): + df = DataFrame({'A': list('aaaba')}, index=[0] * 5) + g = df.groupby([0, 0, 0, 1, 0]) + sg = g.A + + expected = Series([0, 0, 0, 1, 0], index=[0] * 5) + + assert_series_equal(expected, g.ngroup()) + assert_series_equal(expected, sg.ngroup()) + + def test_ngroup_descending(self): + df = DataFrame(['a', 'a', 'b', 'a', 'b'], columns=['A']) + g = df.groupby(['A']) + + ascending = Series([0, 0, 1, 0, 1]) + 
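(The two families of counting tests fit together: `ngroup` numbers the *groups* in iteration order, while `cumcount` numbers the rows *within* each group; the pair uniquely locates a row. Side by side, on the same made-up data the tests use:)

```python
import pandas as pd

df = pd.DataFrame({'A': list('aaaba')})
g = df.groupby('A')
print(g.ngroup().tolist())    # [0, 0, 0, 1, 0] -- which group each row is in
print(g.cumcount().tolist())  # [0, 1, 2, 0, 3] -- position within its group
```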
descending = Series([1, 1, 0, 1, 0]) + + assert_series_equal(descending, (g.ngroups - 1) - ascending) + assert_series_equal(ascending, g.ngroup(ascending=True)) + assert_series_equal(descending, g.ngroup(ascending=False)) + + def test_ngroup_matches_cumcount(self): + # verify one manually-worked out case works + df = DataFrame([['a', 'x'], ['a', 'y'], ['b', 'x'], + ['a', 'x'], ['b', 'y']], columns=['A', 'X']) + g = df.groupby(['A', 'X']) + g_ngroup = g.ngroup() + g_cumcount = g.cumcount() + expected_ngroup = Series([0, 1, 2, 0, 3]) + expected_cumcount = Series([0, 0, 0, 1, 0]) + + assert_series_equal(g_ngroup, expected_ngroup) + assert_series_equal(g_cumcount, expected_cumcount) + + def test_ngroup_cumcount_pair(self): + # brute force comparison for all small series + for p in cart_product(range(3), repeat=4): + df = DataFrame({'a': p}) + g = df.groupby(['a']) + + order = sorted(set(p)) + ngroupd = [order.index(val) for val in p] + cumcounted = [p[:i].count(val) for i, val in enumerate(p)] + + assert_series_equal(g.ngroup(), Series(ngroupd)) + assert_series_equal(g.cumcount(), Series(cumcounted)) + + def test_ngroup_respects_groupby_order(self): + np.random.seed(0) + df = DataFrame({'a': np.random.choice(list('abcdef'), 100)}) + for sort_flag in (False, True): + g = df.groupby(['a'], sort=sort_flag) + df['group_id'] = -1 + df['group_index'] = -1 + + for i, (_, group) in enumerate(g): + df.loc[group.index, 'group_id'] = i + for j, ind in enumerate(group.index): + df.loc[ind, 'group_index'] = j + + assert_series_equal(Series(df['group_id'].values), + g.ngroup()) + assert_series_equal(Series(df['group_index'].values), + g.cumcount()) diff --git a/pandas/tests/groupby/test_filters.py b/pandas/tests/groupby/test_filters.py index 40d8039f71576..cac6b46af8f87 100644 --- a/pandas/tests/groupby/test_filters.py +++ b/pandas/tests/groupby/test_filters.py @@ -1,9 +1,8 @@ # -*- coding: utf-8 -*- from __future__ import print_function -import nose - from numpy import nan +import pytest from pandas import Timestamp from pandas.core.index import MultiIndex @@ -24,11 +23,9 @@ import pandas as pd -class TestGroupByFilter(tm.TestCase): - - _multiprocess_can_split_ = True +class TestGroupByFilter(object): - def setUp(self): + def setup_method(self, method): self.ts = tm.makeTimeSeries() self.seriesd = tm.getSeriesData() @@ -168,8 +165,8 @@ def raise_if_sum_is_zero(x): s = pd.Series([-1, 0, 1, 2]) grouper = s.apply(lambda x: x % 2) grouped = s.groupby(grouper) - self.assertRaises(TypeError, - lambda: grouped.filter(raise_if_sum_is_zero)) + pytest.raises(TypeError, + lambda: grouped.filter(raise_if_sum_is_zero)) def test_filter_with_axis_in_groupby(self): # issue 11041 @@ -190,16 +187,16 @@ def test_filter_bad_shapes(self): g_s = s.groupby(s) f = lambda x: x - self.assertRaises(TypeError, lambda: g_df.filter(f)) - self.assertRaises(TypeError, lambda: g_s.filter(f)) + pytest.raises(TypeError, lambda: g_df.filter(f)) + pytest.raises(TypeError, lambda: g_s.filter(f)) f = lambda x: x == 1 - self.assertRaises(TypeError, lambda: g_df.filter(f)) - self.assertRaises(TypeError, lambda: g_s.filter(f)) + pytest.raises(TypeError, lambda: g_df.filter(f)) + pytest.raises(TypeError, lambda: g_s.filter(f)) f = lambda x: np.outer(x, x) - self.assertRaises(TypeError, lambda: g_df.filter(f)) - self.assertRaises(TypeError, lambda: g_s.filter(f)) + pytest.raises(TypeError, lambda: g_df.filter(f)) + pytest.raises(TypeError, lambda: g_s.filter(f)) def test_filter_nan_is_false(self): df = DataFrame({'A': np.arange(8), @@ -220,6 +217,7 
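(`test_ngroup_descending` above relies on `ngroup(ascending=False)` being a pure relabeling of the ascending numbering. The identity, spelled out on a hypothetical frame:)

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'a', 'b']})
g = df.groupby('A')
# Descending numbering is just (number of groups - 1) minus ascending.
assert g.ngroup(ascending=False).equals((g.ngroups - 1) - g.ngroup())
```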
@@ def test_filter_against_workaround(self): grouper = s.apply(lambda x: np.round(x, -1)) grouped = s.groupby(grouper) f = lambda x: x.mean() > 10 + old_way = s[grouped.transform(f).astype('bool')] new_way = grouped.filter(f) assert_series_equal(new_way.sort_values(), old_way.sort_values()) @@ -580,7 +578,8 @@ def test_filter_enforces_scalarness(self): ['worst', 'd', 'y'], ['best', 'd', 'z'], ], columns=['a', 'b', 'c']) - with tm.assertRaisesRegexp(TypeError, 'filter function returned a.*'): + with tm.assert_raises_regex(TypeError, + 'filter function returned a.*'): df.groupby('c').filter(lambda g: g['a'] == 'best') def test_filter_non_bool_raises(self): @@ -593,7 +592,8 @@ def test_filter_non_bool_raises(self): ['worst', 'd', 1], ['best', 'd', 1], ], columns=['a', 'b', 'c']) - with tm.assertRaisesRegexp(TypeError, 'filter function returned a.*'): + with tm.assert_raises_regex(TypeError, + 'filter function returned a.*'): df.groupby('a').filter(lambda g: g.c.mean()) def test_filter_dropna_with_empty_groups(self): @@ -620,29 +620,3 @@ def _check_groupby(df, result, keys, field, f=lambda x: x.sum()): expected = f(df.groupby(tups)[field]) for k, v in compat.iteritems(expected): assert (result[k] == v) - - -def test_decons(): - from pandas.core.groupby import decons_group_index, get_group_index - - def testit(label_list, shape): - group_index = get_group_index(label_list, shape, sort=True, xnull=True) - label_list2 = decons_group_index(group_index, shape) - - for a, b in zip(label_list, label_list2): - assert (np.array_equal(a, b)) - - shape = (4, 5, 6) - label_list = [np.tile([0, 1, 2, 3, 0, 1, 2, 3], 100), np.tile( - [0, 2, 4, 3, 0, 1, 2, 3], 100), np.tile( - [5, 1, 0, 2, 3, 0, 5, 4], 100)] - testit(label_list, shape) - - shape = (10000, 10000) - label_list = [np.tile(np.arange(10000), 5), np.tile(np.arange(10000), 5)] - testit(label_list, shape) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure', '-s' - ], exit=False) diff --git a/pandas/tests/groupby/test_groupby.py b/pandas/tests/groupby/test_groupby.py index ffb6025163a6b..8957beacab376 100644 --- a/pandas/tests/groupby/test_groupby.py +++ b/pandas/tests/groupby/test_groupby.py @@ -1,82 +1,34 @@ # -*- coding: utf-8 -*- from __future__ import print_function -import nose +import pytest + +from warnings import catch_warnings from string import ascii_lowercase from datetime import datetime from numpy import nan -from pandas.types.common import _ensure_platform_int -from pandas import date_range, bdate_range, Timestamp, isnull -from pandas.core.index import Index, MultiIndex, CategoricalIndex -from pandas.core.api import Categorical, DataFrame -from pandas.core.common import UnsupportedFunctionCall -from pandas.core.groupby import (SpecificationError, DataError, _nargsort, - _lexsort_indexer) -from pandas.core.series import Series -from pandas.core.config import option_context +from pandas import (date_range, bdate_range, Timestamp, + Index, MultiIndex, DataFrame, Series, + concat, Panel, DatetimeIndex) +from pandas.errors import UnsupportedFunctionCall, PerformanceWarning from pandas.util.testing import (assert_panel_equal, assert_frame_equal, assert_series_equal, assert_almost_equal, - assert_index_equal, assertRaisesRegexp) + assert_index_equal) from pandas.compat import (range, long, lrange, StringIO, lmap, lzip, map, zip, builtins, OrderedDict, product as cart_product) from pandas import compat -from pandas.core.panel import Panel -from pandas.tools.merge import concat from collections 
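(The filter tests compare `GroupBy.filter` against the pre-`filter` workaround of boolean-masking with a transformed predicate; the two must agree up to row order. A self-contained sketch where the grouping rule and data are invented:)

```python
import pandas as pd

s = pd.Series([1, 20, 30, 2])
grouped = s.groupby(s // 10)   # groups: {0: [1, 2], 2: [20], 3: [30]}
f = lambda x: x.mean() > 10

old_way = s[grouped.transform(f).astype('bool')]  # pre-filter workaround
new_way = grouped.filter(f)
pd.testing.assert_series_equal(new_way.sort_values(), old_way.sort_values())
```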
import defaultdict -from functools import partial import pandas.core.common as com import numpy as np import pandas.core.nanops as nanops - import pandas.util.testing as tm import pandas as pd +from .common import MixIn -class TestGroupBy(tm.TestCase): - - _multiprocess_can_split_ = True - - def setUp(self): - self.ts = tm.makeTimeSeries() - - self.seriesd = tm.getSeriesData() - self.tsd = tm.getTimeSeriesData() - self.frame = DataFrame(self.seriesd) - self.tsframe = DataFrame(self.tsd) - - self.df = DataFrame( - {'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], - 'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], - 'C': np.random.randn(8), - 'D': np.random.randn(8)}) - - self.df_mixed_floats = DataFrame( - {'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], - 'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], - 'C': np.random.randn(8), - 'D': np.array( - np.random.randn(8), dtype='float32')}) - - index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', - 'three']], - labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], - [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], - names=['first', 'second']) - self.mframe = DataFrame(np.random.randn(10, 3), index=index, - columns=['A', 'B', 'C']) - - self.three_group = DataFrame( - {'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', - 'foo', 'foo', 'foo'], - 'B': ['one', 'one', 'one', 'two', 'one', 'one', 'one', 'two', - 'two', 'two', 'one'], - 'C': ['dull', 'dull', 'shiny', 'dull', 'dull', 'shiny', 'shiny', - 'dull', 'shiny', 'shiny', 'shiny'], - 'D': np.random.randn(11), - 'E': np.random.randn(11), - 'F': np.random.randn(11)}) +class TestGroupBy(MixIn): def test_basic(self): def checkit(dtype): @@ -89,10 +41,10 @@ def checkit(dtype): grouped = data.groupby(lambda x: x // 3) for k, v in grouped: - self.assertEqual(len(v), 3) + assert len(v) == 3 agged = grouped.aggregate(np.mean) - self.assertEqual(agged[1], 1) + assert agged[1] == 1 assert_series_equal(agged, grouped.agg(np.mean)) # shorthand assert_series_equal(agged, grouped.mean()) @@ -100,7 +52,7 @@ def checkit(dtype): expected = grouped.apply(lambda x: x * x.sum()) transformed = grouped.transform(lambda x: x * x.sum()) - self.assertEqual(transformed[7], 12) + assert transformed[7] == 12 assert_series_equal(transformed, expected) value_grouped = data.groupby(data) @@ -109,14 +61,17 @@ def checkit(dtype): # complex agg agged = grouped.aggregate([np.mean, np.std]) - agged = grouped.aggregate({'one': np.mean, 'two': np.std}) + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + agged = grouped.aggregate({'one': np.mean, 'two': np.std}) group_constants = {0: 10, 1: 20, 2: 30} agged = grouped.agg(lambda x: group_constants[x.name] + x.mean()) - self.assertEqual(agged[1], 21) + assert agged[1] == 21 # corner cases - self.assertRaises(Exception, grouped.aggregate, lambda x: x * 2) + pytest.raises(Exception, grouped.aggregate, lambda x: x * 2) for dtype in ['int64', 'int32', 'float64', 'float32']: checkit(dtype) @@ -124,237 +79,14 @@ def checkit(dtype): def test_select_bad_cols(self): df = DataFrame([[1, 2]], columns=['A', 'B']) g = df.groupby('A') - self.assertRaises(KeyError, g.__getitem__, ['C']) # g[['C']] + pytest.raises(KeyError, g.__getitem__, ['C']) # g[['C']] - self.assertRaises(KeyError, g.__getitem__, ['A', 'C']) # g[['A', 'C']] - with assertRaisesRegexp(KeyError, '^[^A]+$'): + pytest.raises(KeyError, g.__getitem__, ['A', 'C']) # g[['A', 'C']] + with tm.assert_raises_regex(KeyError, '^[^A]+$'): # A should not be 
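(`test_select_bad_cols` asserts that selecting missing columns from a GroupBy raises `KeyError`, and that the message names only the offending column. Illustrated on a throwaway frame:)

```python
import pandas as pd

df = pd.DataFrame([[1, 2]], columns=['A', 'B'])
g = df.groupby('A')
try:
    g[['A', 'C']]  # 'C' does not exist
except KeyError as err:
    print(err)     # message mentions 'C' but not the valid 'A'
```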
referenced as a bad column... # will have to rethink regex if you change message! g[['A', 'C']] - def test_first_last_nth(self): - # tests for first / last / nth - grouped = self.df.groupby('A') - first = grouped.first() - expected = self.df.loc[[1, 0], ['B', 'C', 'D']] - expected.index = Index(['bar', 'foo'], name='A') - expected = expected.sort_index() - assert_frame_equal(first, expected) - - nth = grouped.nth(0) - assert_frame_equal(nth, expected) - - last = grouped.last() - expected = self.df.loc[[5, 7], ['B', 'C', 'D']] - expected.index = Index(['bar', 'foo'], name='A') - assert_frame_equal(last, expected) - - nth = grouped.nth(-1) - assert_frame_equal(nth, expected) - - nth = grouped.nth(1) - expected = self.df.loc[[2, 3], ['B', 'C', 'D']].copy() - expected.index = Index(['foo', 'bar'], name='A') - expected = expected.sort_index() - assert_frame_equal(nth, expected) - - # it works! - grouped['B'].first() - grouped['B'].last() - grouped['B'].nth(0) - - self.df.loc[self.df['A'] == 'foo', 'B'] = np.nan - self.assertTrue(isnull(grouped['B'].first()['foo'])) - self.assertTrue(isnull(grouped['B'].last()['foo'])) - self.assertTrue(isnull(grouped['B'].nth(0)['foo'])) - - # v0.14.0 whatsnew - df = DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B']) - g = df.groupby('A') - result = g.first() - expected = df.iloc[[1, 2]].set_index('A') - assert_frame_equal(result, expected) - - expected = df.iloc[[1, 2]].set_index('A') - result = g.nth(0, dropna='any') - assert_frame_equal(result, expected) - - def test_first_last_nth_dtypes(self): - - df = self.df_mixed_floats.copy() - df['E'] = True - df['F'] = 1 - - # tests for first / last / nth - grouped = df.groupby('A') - first = grouped.first() - expected = df.loc[[1, 0], ['B', 'C', 'D', 'E', 'F']] - expected.index = Index(['bar', 'foo'], name='A') - expected = expected.sort_index() - assert_frame_equal(first, expected) - - last = grouped.last() - expected = df.loc[[5, 7], ['B', 'C', 'D', 'E', 'F']] - expected.index = Index(['bar', 'foo'], name='A') - expected = expected.sort_index() - assert_frame_equal(last, expected) - - nth = grouped.nth(1) - expected = df.loc[[3, 2], ['B', 'C', 'D', 'E', 'F']] - expected.index = Index(['bar', 'foo'], name='A') - expected = expected.sort_index() - assert_frame_equal(nth, expected) - - # GH 2763, first/last shifting dtypes - idx = lrange(10) - idx.append(9) - s = Series(data=lrange(11), index=idx, name='IntCol') - self.assertEqual(s.dtype, 'int64') - f = s.groupby(level=0).first() - self.assertEqual(f.dtype, 'int64') - - def test_nth(self): - df = DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B']) - g = df.groupby('A') - - assert_frame_equal(g.nth(0), df.iloc[[0, 2]].set_index('A')) - assert_frame_equal(g.nth(1), df.iloc[[1]].set_index('A')) - assert_frame_equal(g.nth(2), df.loc[[]].set_index('A')) - assert_frame_equal(g.nth(-1), df.iloc[[1, 2]].set_index('A')) - assert_frame_equal(g.nth(-2), df.iloc[[0]].set_index('A')) - assert_frame_equal(g.nth(-3), df.loc[[]].set_index('A')) - assert_series_equal(g.B.nth(0), df.set_index('A').B.iloc[[0, 2]]) - assert_series_equal(g.B.nth(1), df.set_index('A').B.iloc[[1]]) - assert_frame_equal(g[['B']].nth(0), - df.loc[[0, 2], ['A', 'B']].set_index('A')) - - exp = df.set_index('A') - assert_frame_equal(g.nth(0, dropna='any'), exp.iloc[[1, 2]]) - assert_frame_equal(g.nth(-1, dropna='any'), exp.iloc[[1, 2]]) - - exp['B'] = np.nan - assert_frame_equal(g.nth(7, dropna='any'), exp.iloc[[1, 2]]) - assert_frame_equal(g.nth(2, dropna='any'), exp.iloc[[1, 2]]) - - # out 
of bounds, regression from 0.13.1 - # GH 6621 - df = DataFrame({'color': {0: 'green', - 1: 'green', - 2: 'red', - 3: 'red', - 4: 'red'}, - 'food': {0: 'ham', - 1: 'eggs', - 2: 'eggs', - 3: 'ham', - 4: 'pork'}, - 'two': {0: 1.5456590000000001, - 1: -0.070345000000000005, - 2: -2.4004539999999999, - 3: 0.46206000000000003, - 4: 0.52350799999999997}, - 'one': {0: 0.56573799999999996, - 1: -0.9742360000000001, - 2: 1.033801, - 3: -0.78543499999999999, - 4: 0.70422799999999997}}).set_index(['color', - 'food']) - - result = df.groupby(level=0, as_index=False).nth(2) - expected = df.iloc[[-1]] - assert_frame_equal(result, expected) - - result = df.groupby(level=0, as_index=False).nth(3) - expected = df.loc[[]] - assert_frame_equal(result, expected) - - # GH 7559 - # from the vbench - df = DataFrame(np.random.randint(1, 10, (100, 2)), dtype='int64') - s = df[1] - g = df[0] - expected = s.groupby(g).first() - expected2 = s.groupby(g).apply(lambda x: x.iloc[0]) - assert_series_equal(expected2, expected, check_names=False) - self.assertTrue(expected.name, 0) - self.assertEqual(expected.name, 1) - - # validate first - v = s[g == 1].iloc[0] - self.assertEqual(expected.iloc[0], v) - self.assertEqual(expected2.iloc[0], v) - - # this is NOT the same as .first (as sorted is default!) - # as it keeps the order in the series (and not the group order) - # related GH 7287 - expected = s.groupby(g, sort=False).first() - result = s.groupby(g, sort=False).nth(0, dropna='all') - assert_series_equal(result, expected) - - # doc example - df = DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B']) - g = df.groupby('A') - result = g.B.nth(0, dropna=True) - expected = g.B.first() - assert_series_equal(result, expected) - - # test multiple nth values - df = DataFrame([[1, np.nan], [1, 3], [1, 4], [5, 6], [5, 7]], - columns=['A', 'B']) - g = df.groupby('A') - - assert_frame_equal(g.nth(0), df.iloc[[0, 3]].set_index('A')) - assert_frame_equal(g.nth([0]), df.iloc[[0, 3]].set_index('A')) - assert_frame_equal(g.nth([0, 1]), df.iloc[[0, 1, 3, 4]].set_index('A')) - assert_frame_equal( - g.nth([0, -1]), df.iloc[[0, 2, 3, 4]].set_index('A')) - assert_frame_equal( - g.nth([0, 1, 2]), df.iloc[[0, 1, 2, 3, 4]].set_index('A')) - assert_frame_equal( - g.nth([0, 1, -1]), df.iloc[[0, 1, 2, 3, 4]].set_index('A')) - assert_frame_equal(g.nth([2]), df.iloc[[2]].set_index('A')) - assert_frame_equal(g.nth([3, 4]), df.loc[[]].set_index('A')) - - business_dates = pd.date_range(start='4/1/2014', end='6/30/2014', - freq='B') - df = DataFrame(1, index=business_dates, columns=['a', 'b']) - # get the first, fourth and last two business days for each month - key = (df.index.year, df.index.month) - result = df.groupby(key, as_index=False).nth([0, 3, -2, -1]) - expected_dates = pd.to_datetime( - ['2014/4/1', '2014/4/4', '2014/4/29', '2014/4/30', '2014/5/1', - '2014/5/6', '2014/5/29', '2014/5/30', '2014/6/2', '2014/6/5', - '2014/6/27', '2014/6/30']) - expected = DataFrame(1, columns=['a', 'b'], index=expected_dates) - assert_frame_equal(result, expected) - - def test_nth_multi_index(self): - # PR 9090, related to issue 8979 - # test nth on MultiIndex, should match .first() - grouped = self.three_group.groupby(['A', 'B']) - result = grouped.nth(0) - expected = grouped.first() - assert_frame_equal(result, expected) - - def test_nth_multi_index_as_expected(self): - # PR 9090, related to issue 8979 - # test nth on MultiIndex - three_group = DataFrame( - {'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', - 'foo', 'foo', 'foo'], - 'B': 
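(The `nth` tests removed from this file in the refactor also document that `nth` accepts a list of positions, with out-of-bounds entries silently dropping out rather than raising. A sketch using the same frame shape:)

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [1, 3], [1, 4], [5, 6], [5, 7]],
                  columns=['A', 'B'])
g = df.groupby('A')
print(g.nth([0, -1]))  # first and last row of each group
print(g.nth([3, 4]))   # empty: no group has four or more rows
```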
['one', 'one', 'one', 'two', 'one', 'one', 'one', 'two', - 'two', 'two', 'one'], - 'C': ['dull', 'dull', 'shiny', 'dull', 'dull', 'shiny', 'shiny', - 'dull', 'shiny', 'shiny', 'shiny']}) - grouped = three_group.groupby(['A', 'B']) - result = grouped.nth(0) - expected = DataFrame( - {'C': ['dull', 'dull', 'dull', 'dull']}, - index=MultiIndex.from_arrays([['bar', 'bar', 'foo', 'foo'], - ['one', 'two', 'one', 'two']], - names=['A', 'B'])) - assert_frame_equal(result, expected) - def test_group_selection_cache(self): # GH 12839 nth, head, and tail should return same result consistently df = DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B']) @@ -598,7 +330,7 @@ def test_grouper_column_index_level_precedence(self): expected = df_multi_both.groupby([pd.Grouper(key='inner')]).mean() assert_frame_equal(result, expected) not_expected = df_multi_both.groupby(pd.Grouper(level='inner')).mean() - self.assertFalse(result.index.equals(not_expected.index)) + assert not result.index.equals(not_expected.index) # Group single Index by single key with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): @@ -607,7 +339,7 @@ def test_grouper_column_index_level_precedence(self): expected = df_single_both.groupby([pd.Grouper(key='inner')]).mean() assert_frame_equal(result, expected) not_expected = df_single_both.groupby(pd.Grouper(level='inner')).mean() - self.assertFalse(result.index.equals(not_expected.index)) + assert not result.index.equals(not_expected.index) # Group MultiIndex by single key list with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): @@ -616,7 +348,7 @@ def test_grouper_column_index_level_precedence(self): expected = df_multi_both.groupby([pd.Grouper(key='inner')]).mean() assert_frame_equal(result, expected) not_expected = df_multi_both.groupby(pd.Grouper(level='inner')).mean() - self.assertFalse(result.index.equals(not_expected.index)) + assert not result.index.equals(not_expected.index) # Group single Index by single key list with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): @@ -625,7 +357,7 @@ def test_grouper_column_index_level_precedence(self): expected = df_single_both.groupby([pd.Grouper(key='inner')]).mean() assert_frame_equal(result, expected) not_expected = df_single_both.groupby(pd.Grouper(level='inner')).mean() - self.assertFalse(result.index.equals(not_expected.index)) + assert not result.index.equals(not_expected.index) # Group MultiIndex by two keys (1) with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): @@ -637,7 +369,7 @@ def test_grouper_column_index_level_precedence(self): not_expected = df_multi_both.groupby(['B', pd.Grouper(level='inner') ]).mean() - self.assertFalse(result.index.equals(not_expected.index)) + assert not result.index.equals(not_expected.index) # Group MultiIndex by two keys (2) with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): @@ -648,7 +380,7 @@ def test_grouper_column_index_level_precedence(self): assert_frame_equal(result, expected) not_expected = df_multi_both.groupby([pd.Grouper(level='inner'), 'B']).mean() - self.assertFalse(result.index.equals(not_expected.index)) + assert not result.index.equals(not_expected.index) # Group single Index by two keys (1) with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): @@ -660,7 +392,7 @@ def test_grouper_column_index_level_precedence(self): not_expected = df_single_both.groupby(['B', pd.Grouper(level='inner') ]).mean() - self.assertFalse(result.index.equals(not_expected.index)) + assert not 
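(The precedence tests above disambiguate a name that exists both as a column and as an index level: `pd.Grouper(key=...)` targets the column, `pd.Grouper(level=...)` the index level, and the resulting group indexes differ. A minimal sketch with invented names:)

```python
import pandas as pd

df = pd.DataFrame({'inner': [1, 2, 3]},
                  index=pd.Index([10, 20, 30], name='inner'))
by_col = df.groupby(pd.Grouper(key='inner')).mean()    # column values
by_lvl = df.groupby(pd.Grouper(level='inner')).mean()  # index values
assert not by_col.index.equals(by_lvl.index)
```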
result.index.equals(not_expected.index) # Group single Index by two keys (2) with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): @@ -671,7 +403,7 @@ def test_grouper_column_index_level_precedence(self): assert_frame_equal(result, expected) not_expected = df_single_both.groupby([pd.Grouper(level='inner'), 'B']).mean() - self.assertFalse(result.index.equals(not_expected.index)) + assert not result.index.equals(not_expected.index) def test_grouper_getting_correct_binner(self): @@ -691,31 +423,31 @@ def test_grouper_getting_correct_binner(self): assert_frame_equal(result, expected) def test_grouper_iter(self): - self.assertEqual(sorted(self.df.groupby('A').grouper), ['bar', 'foo']) + assert sorted(self.df.groupby('A').grouper) == ['bar', 'foo'] def test_empty_groups(self): - # GH # 1048 - self.assertRaises(ValueError, self.df.groupby, []) + # see gh-1048 + pytest.raises(ValueError, self.df.groupby, []) def test_groupby_grouper(self): grouped = self.df.groupby('A') result = self.df.groupby(grouped.grouper).mean() expected = grouped.mean() - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_groupby_duplicated_column_errormsg(self): # GH7511 df = DataFrame(columns=['A', 'B', 'A', 'C'], data=[range(4), range(2, 6), range(0, 8, 2)]) - self.assertRaises(ValueError, df.groupby, 'A') - self.assertRaises(ValueError, df.groupby, ['A', 'B']) + pytest.raises(ValueError, df.groupby, 'A') + pytest.raises(ValueError, df.groupby, ['A', 'B']) grouped = df.groupby('B') c = grouped.count() - self.assertTrue(c.columns.nlevels == 1) - self.assertTrue(c.columns.size == 3) + assert c.columns.nlevels == 1 + assert c.columns.size == 3 def test_groupby_dict_mapping(self): # GH #679 @@ -749,7 +481,7 @@ def test_groupby_grouper_f_sanity_checked(self): # when the elements are Timestamp. # the result is Index[0:6], very confusing. 
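(`test_groupby_duplicated_column_errormsg`, just above, covers grouping by a duplicated column label, which is ambiguous and must raise. Sketch:)

```python
import pandas as pd

df = pd.DataFrame(data=[range(4), range(2, 6), range(0, 8, 2)],
                  columns=['A', 'B', 'A', 'C'])
try:
    df.groupby('A')
except ValueError as err:
    print(err)  # the duplicated 'A' makes the grouper ambiguous
```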
- self.assertRaises(AssertionError, ts.groupby, lambda key: key[0:6]) + pytest.raises(AssertionError, ts.groupby, lambda key: key[0:6]) def test_groupby_nonobject_dtype(self): key = self.mframe.index.labels[0] @@ -776,36 +508,36 @@ def max_value(group): def test_groupby_return_type(self): # GH2893, return a reduced type - df1 = DataFrame([{"val1": 1, - "val2": 20}, {"val1": 1, - "val2": 19}, {"val1": 2, - "val2": 27}, {"val1": 2, - "val2": 12} - ]) + df1 = DataFrame( + [{"val1": 1, "val2": 20}, + {"val1": 1, "val2": 19}, + {"val1": 2, "val2": 27}, + {"val1": 2, "val2": 12} + ]) def func(dataf): return dataf["val2"] - dataf["val2"].mean() result = df1.groupby("val1", squeeze=True).apply(func) - tm.assertIsInstance(result, Series) + assert isinstance(result, Series) - df2 = DataFrame([{"val1": 1, - "val2": 20}, {"val1": 1, - "val2": 19}, {"val1": 1, - "val2": 27}, {"val1": 1, - "val2": 12} - ]) + df2 = DataFrame( + [{"val1": 1, "val2": 20}, + {"val1": 1, "val2": 19}, + {"val1": 1, "val2": 27}, + {"val1": 1, "val2": 12} + ]) def func(dataf): return dataf["val2"] - dataf["val2"].mean() result = df2.groupby("val1", squeeze=True).apply(func) - tm.assertIsInstance(result, Series) + assert isinstance(result, Series) # GH3596, return a consistent type (regression in 0.11 from 0.10.1) df = DataFrame([[1, 1], [1, 1]], columns=['X', 'Y']) result = df.groupby('X', squeeze=False).count() - tm.assertIsInstance(result, DataFrame) + assert isinstance(result, DataFrame) # GH5592 # inconcistent return type @@ -865,12 +597,14 @@ def f(grp): assert_series_equal(result, e) def test_get_group(self): - wp = tm.makePanel() - grouped = wp.groupby(lambda x: x.month, axis='major') + with catch_warnings(record=True): + wp = tm.makePanel() + grouped = wp.groupby(lambda x: x.month, axis='major') - gp = grouped.get_group(1) - expected = wp.reindex(major=[x for x in wp.major_axis if x.month == 1]) - assert_panel_equal(gp, expected) + gp = grouped.get_group(1) + expected = wp.reindex( + major=[x for x in wp.major_axis if x.month == 1]) + assert_panel_equal(gp, expected) # GH 5267 # be datelike friendly @@ -898,21 +632,24 @@ def test_get_group(self): assert_frame_equal(result1, result3) # must pass a same-length tuple with multiple keys - self.assertRaises(ValueError, lambda: g.get_group('foo')) - self.assertRaises(ValueError, lambda: g.get_group(('foo'))) - self.assertRaises(ValueError, - lambda: g.get_group(('foo', 'bar', 'baz'))) + pytest.raises(ValueError, lambda: g.get_group('foo')) + pytest.raises(ValueError, lambda: g.get_group(('foo'))) + pytest.raises(ValueError, + lambda: g.get_group(('foo', 'bar', 'baz'))) def test_get_group_empty_bins(self): + d = pd.DataFrame([3, 1, 7, 6]) bins = [0, 5, 10, 15] g = d.groupby(pd.cut(d[0], bins)) - result = g.get_group('(0, 5]') + # TODO: should prob allow a str of Interval work as well + # IOW '(0, 5]' + result = g.get_group(pd.Interval(0, 5)) expected = DataFrame([3, 1], index=[0, 1]) assert_frame_equal(result, expected) - self.assertRaises(KeyError, lambda: g.get_group('(10, 15]')) + pytest.raises(KeyError, lambda: g.get_group(pd.Interval(10, 15))) def test_get_group_grouped_by_tuple(self): # GH 8121 @@ -932,8 +669,8 @@ def test_get_group_grouped_by_tuple(self): def test_grouping_error_on_multidim_input(self): from pandas.core.groupby import Grouping - self.assertRaises(ValueError, - Grouping, self.df.index, self.df[['A', 'A']]) + pytest.raises(ValueError, + Grouping, self.df.index, self.df[['A', 'A']]) def test_apply_describe_bug(self): grouped = 
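(The `get_group` assertions nearby enforce that, with multiple grouping keys, `get_group` takes a tuple of matching length; a single scalar key is rejected. Sketch on a hypothetical frame:)

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'foo', 'bar'],
                   'B': ['one', 'two', 'one'],
                   'C': [1, 2, 3]})
g = df.groupby(['A', 'B'])
print(g.get_group(('foo', 'one')))  # a tuple matching both keys: works
try:
    g.get_group('foo')              # a single key: rejected
except ValueError as err:
    print(err)
```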
self.mframe.groupby(level='first') @@ -1000,39 +737,40 @@ def func_with_date(batch): 'c': 2}, index=[1]) dfg_conversion_expected.index.name = 'a' - self.assert_frame_equal(dfg_no_conversion, dfg_no_conversion_expected) - self.assert_frame_equal(dfg_conversion, dfg_conversion_expected) + tm.assert_frame_equal(dfg_no_conversion, dfg_no_conversion_expected) + tm.assert_frame_equal(dfg_conversion, dfg_conversion_expected) def test_len(self): df = tm.makeTimeDataFrame() grouped = df.groupby([lambda x: x.year, lambda x: x.month, lambda x: x.day]) - self.assertEqual(len(grouped), len(df)) + assert len(grouped) == len(df) grouped = df.groupby([lambda x: x.year, lambda x: x.month]) expected = len(set([(x.year, x.month) for x in df.index])) - self.assertEqual(len(grouped), expected) + assert len(grouped) == expected # issue 11016 df = pd.DataFrame(dict(a=[np.nan] * 3, b=[1, 2, 3])) - self.assertEqual(len(df.groupby(('a'))), 0) - self.assertEqual(len(df.groupby(('b'))), 3) - self.assertEqual(len(df.groupby(('a', 'b'))), 3) + assert len(df.groupby(('a'))) == 0 + assert len(df.groupby(('b'))) == 3 + assert len(df.groupby(('a', 'b'))) == 3 def test_groups(self): grouped = self.df.groupby(['A']) groups = grouped.groups - self.assertIs(groups, grouped.groups) # caching works + assert groups is grouped.groups # caching works for k, v in compat.iteritems(grouped.groups): - self.assertTrue((self.df.loc[v]['A'] == k).all()) + assert (self.df.loc[v]['A'] == k).all() grouped = self.df.groupby(['A', 'B']) groups = grouped.groups - self.assertIs(groups, grouped.groups) # caching works + assert groups is grouped.groups # caching works + for k, v in compat.iteritems(grouped.groups): - self.assertTrue((self.df.loc[v]['A'] == k[0]).all()) - self.assertTrue((self.df.loc[v]['B'] == k[1]).all()) + assert (self.df.loc[v]['A'] == k[0]).all() + assert (self.df.loc[v]['B'] == k[1]).all() def test_basic_regression(self): # regression @@ -1045,266 +783,6 @@ def test_basic_regression(self): grouped = result.groupby(groupings) grouped.mean() - def test_transform(self): - data = Series(np.arange(9) // 3, index=np.arange(9)) - - index = np.arange(9) - np.random.shuffle(index) - data = data.reindex(index) - - grouped = data.groupby(lambda x: x // 3) - - transformed = grouped.transform(lambda x: x * x.sum()) - self.assertEqual(transformed[7], 12) - - # GH 8046 - # make sure that we preserve the input order - - df = DataFrame( - np.arange(6, dtype='int64').reshape( - 3, 2), columns=["a", "b"], index=[0, 2, 1]) - key = [0, 0, 1] - expected = df.sort_index().groupby(key).transform( - lambda x: x - x.mean()).groupby(key).mean() - result = df.groupby(key).transform(lambda x: x - x.mean()).groupby( - key).mean() - assert_frame_equal(result, expected) - - def demean(arr): - return arr - arr.mean() - - people = DataFrame(np.random.randn(5, 5), - columns=['a', 'b', 'c', 'd', 'e'], - index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis']) - key = ['one', 'two', 'one', 'two', 'one'] - result = people.groupby(key).transform(demean).groupby(key).mean() - expected = people.groupby(key).apply(demean).groupby(key).mean() - assert_frame_equal(result, expected) - - # GH 8430 - df = tm.makeTimeDataFrame() - g = df.groupby(pd.TimeGrouper('M')) - g.transform(lambda x: x - 1) - - # GH 9700 - df = DataFrame({'a': range(5, 10), 'b': range(5)}) - result = df.groupby('a').transform(max) - expected = DataFrame({'b': range(5)}) - tm.assert_frame_equal(result, expected) - - def test_transform_fast(self): - - df = DataFrame({'id': np.arange(100000) / 3, - 'val': 
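(`test_get_group_empty_bins` now keys groups formed by `pd.cut` with `pd.Interval` objects rather than their string form, matching the change in this patch. A standalone sketch of the same call:)

```python
import pandas as pd

d = pd.DataFrame([3, 1, 7, 6])
g = d.groupby(pd.cut(d[0], [0, 5, 10, 15]))
print(g.get_group(pd.Interval(0, 5)))  # the rows holding 3 and 1
```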
np.random.randn(100000)}) - - grp = df.groupby('id')['val'] - - values = np.repeat(grp.mean().values, - _ensure_platform_int(grp.count().values)) - expected = pd.Series(values, index=df.index, name='val') - - result = grp.transform(np.mean) - assert_series_equal(result, expected) - - result = grp.transform('mean') - assert_series_equal(result, expected) - - # GH 12737 - df = pd.DataFrame({'grouping': [0, 1, 1, 3], 'f': [1.1, 2.1, 3.1, 4.5], - 'd': pd.date_range('2014-1-1', '2014-1-4'), - 'i': [1, 2, 3, 4]}, - columns=['grouping', 'f', 'i', 'd']) - result = df.groupby('grouping').transform('first') - - dates = [pd.Timestamp('2014-1-1'), pd.Timestamp('2014-1-2'), - pd.Timestamp('2014-1-2'), pd.Timestamp('2014-1-4')] - expected = pd.DataFrame({'f': [1.1, 2.1, 2.1, 4.5], - 'd': dates, - 'i': [1, 2, 2, 4]}, - columns=['f', 'i', 'd']) - assert_frame_equal(result, expected) - - # selection - result = df.groupby('grouping')[['f', 'i']].transform('first') - expected = expected[['f', 'i']] - assert_frame_equal(result, expected) - - # dup columns - df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['g', 'a', 'a']) - result = df.groupby('g').transform('first') - expected = df.drop('g', axis=1) - assert_frame_equal(result, expected) - - def test_transform_broadcast(self): - grouped = self.ts.groupby(lambda x: x.month) - result = grouped.transform(np.mean) - - self.assert_index_equal(result.index, self.ts.index) - for _, gp in grouped: - assert_fp_equal(result.reindex(gp.index), gp.mean()) - - grouped = self.tsframe.groupby(lambda x: x.month) - result = grouped.transform(np.mean) - self.assert_index_equal(result.index, self.tsframe.index) - for _, gp in grouped: - agged = gp.mean() - res = result.reindex(gp.index) - for col in self.tsframe: - assert_fp_equal(res[col], agged[col]) - - # group columns - grouped = self.tsframe.groupby({'A': 0, 'B': 0, 'C': 1, 'D': 1}, - axis=1) - result = grouped.transform(np.mean) - self.assert_index_equal(result.index, self.tsframe.index) - self.assert_index_equal(result.columns, self.tsframe.columns) - for _, gp in grouped: - agged = gp.mean(1) - res = result.reindex(columns=gp.columns) - for idx in gp.index: - assert_fp_equal(res.xs(idx), agged[idx]) - - def test_transform_axis(self): - - # make sure that we are setting the axes - # correctly when on axis=0 or 1 - # in the presence of a non-monotonic indexer - # GH12713 - - base = self.tsframe.iloc[0:5] - r = len(base.index) - c = len(base.columns) - tso = DataFrame(np.random.randn(r, c), - index=base.index, - columns=base.columns, - dtype='float64') - # monotonic - ts = tso - grouped = ts.groupby(lambda x: x.weekday()) - result = ts - grouped.transform('mean') - expected = grouped.apply(lambda x: x - x.mean()) - assert_frame_equal(result, expected) - - ts = ts.T - grouped = ts.groupby(lambda x: x.weekday(), axis=1) - result = ts - grouped.transform('mean') - expected = grouped.apply(lambda x: (x.T - x.mean(1)).T) - assert_frame_equal(result, expected) - - # non-monotonic - ts = tso.iloc[[1, 0] + list(range(2, len(base)))] - grouped = ts.groupby(lambda x: x.weekday()) - result = ts - grouped.transform('mean') - expected = grouped.apply(lambda x: x - x.mean()) - assert_frame_equal(result, expected) - - ts = ts.T - grouped = ts.groupby(lambda x: x.weekday(), axis=1) - result = ts - grouped.transform('mean') - expected = grouped.apply(lambda x: (x.T - x.mean(1)).T) - assert_frame_equal(result, expected) - - def test_transform_dtype(self): - # GH 9807 - # Check transform dtype output is preserved - df = DataFrame([[1, 3], 
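(The removed `test_transform_fast` checks that `transform('mean')` broadcasts each group's mean back over the original rows. An equivalent construction by repeating per-group means; note this repeat trick assumes the key column is sorted so groups are contiguous, as in the invented data below:)

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [0, 0, 1, 1, 1], 'val': [1.0, 3.0, 2.0, 4.0, 6.0]})
grp = df.groupby('id')['val']
# Repeat each group's mean by that group's size to rebuild the broadcast.
expected = pd.Series(np.repeat(grp.mean().values, grp.count().values),
                     index=df.index, name='val')
pd.testing.assert_series_equal(grp.transform('mean'), expected)
```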
[2, 3]]) - result = df.groupby(1).transform('mean') - expected = DataFrame([[1.5], [1.5]]) - assert_frame_equal(result, expected) - - def test_transform_bug(self): - # GH 5712 - # transforming on a datetime column - df = DataFrame(dict(A=Timestamp('20130101'), B=np.arange(5))) - result = df.groupby('A')['B'].transform( - lambda x: x.rank(ascending=False)) - expected = Series(np.arange(5, 0, step=-1), name='B') - assert_series_equal(result, expected) - - def test_transform_multiple(self): - grouped = self.ts.groupby([lambda x: x.year, lambda x: x.month]) - - grouped.transform(lambda x: x * 2) - grouped.transform(np.mean) - - def test_dispatch_transform(self): - df = self.tsframe[::5].reindex(self.tsframe.index) - - grouped = df.groupby(lambda x: x.month) - - filled = grouped.fillna(method='pad') - fillit = lambda x: x.fillna(method='pad') - expected = df.groupby(lambda x: x.month).transform(fillit) - assert_frame_equal(filled, expected) - - def test_transform_select_columns(self): - f = lambda x: x.mean() - result = self.df.groupby('A')['C', 'D'].transform(f) - - selection = self.df[['C', 'D']] - expected = selection.groupby(self.df['A']).transform(f) - - assert_frame_equal(result, expected) - - def test_transform_exclude_nuisance(self): - - # this also tests orderings in transform between - # series/frame to make sure it's consistent - expected = {} - grouped = self.df.groupby('A') - expected['C'] = grouped['C'].transform(np.mean) - expected['D'] = grouped['D'].transform(np.mean) - expected = DataFrame(expected) - result = self.df.groupby('A').transform(np.mean) - - assert_frame_equal(result, expected) - - def test_transform_function_aliases(self): - result = self.df.groupby('A').transform('mean') - expected = self.df.groupby('A').transform(np.mean) - assert_frame_equal(result, expected) - - result = self.df.groupby('A')['C'].transform('mean') - expected = self.df.groupby('A')['C'].transform(np.mean) - assert_series_equal(result, expected) - - def test_series_fast_transform_date(self): - # GH 13191 - df = pd.DataFrame({'grouping': [np.nan, 1, 1, 3], - 'd': pd.date_range('2014-1-1', '2014-1-4')}) - result = df.groupby('grouping')['d'].transform('first') - dates = [pd.NaT, pd.Timestamp('2014-1-2'), pd.Timestamp('2014-1-2'), - pd.Timestamp('2014-1-4')] - expected = pd.Series(dates, name='d') - assert_series_equal(result, expected) - - def test_transform_length(self): - # GH 9697 - df = pd.DataFrame({'col1': [1, 1, 2, 2], 'col2': [1, 2, 3, np.nan]}) - expected = pd.Series([3.0] * 4) - - def nsum(x): - return np.nansum(x) - - results = [df.groupby('col1').transform(sum)['col2'], - df.groupby('col1')['col2'].transform(sum), - df.groupby('col1').transform(nsum)['col2'], - df.groupby('col1')['col2'].transform(nsum)] - for result in results: - assert_series_equal(result, expected, check_names=False) - - def test_transform_coercion(self): - - # 14457 - # when we are transforming be sure to not coerce - # via assignment - df = pd.DataFrame(dict(A=['a', 'a'], B=[0, 1])) - g = df.groupby('A') - - expected = g.transform(np.mean) - result = g.transform(lambda x: np.mean(x)) - assert_frame_equal(result, expected) - def test_with_na(self): index = Index(np.arange(10)) @@ -1320,7 +798,7 @@ def test_with_na(self): assert_series_equal(agged, expected, check_dtype=False) - # self.assertTrue(issubclass(agged.dtype.type, np.integer)) + # assert issubclass(agged.dtype.type, np.integer) # explicitly return a float from my function def f(x): @@ -1330,59 +808,7 @@ def f(x): expected = Series([4, 2], index=['bar',
'foo']) assert_series_equal(agged, expected, check_dtype=False) - self.assertTrue(issubclass(agged.dtype.type, np.dtype(dtype).type)) - - def test_groupby_transform_with_int(self): - - # GH 3740, make sure that we might upcast on item-by-item transform - - # floats - df = DataFrame(dict(A=[1, 1, 1, 2, 2, 2], B=Series(1, dtype='float64'), - C=Series( - [1, 2, 3, 1, 2, 3], dtype='float64'), D='foo')) - with np.errstate(all='ignore'): - result = df.groupby('A').transform( - lambda x: (x - x.mean()) / x.std()) - expected = DataFrame(dict(B=np.nan, C=Series( - [-1, 0, 1, -1, 0, 1], dtype='float64'))) - assert_frame_equal(result, expected) - - # int case - df = DataFrame(dict(A=[1, 1, 1, 2, 2, 2], B=1, - C=[1, 2, 3, 1, 2, 3], D='foo')) - with np.errstate(all='ignore'): - result = df.groupby('A').transform( - lambda x: (x - x.mean()) / x.std()) - expected = DataFrame(dict(B=np.nan, C=[-1, 0, 1, -1, 0, 1])) - assert_frame_equal(result, expected) - - # int that needs float conversion - s = Series([2, 3, 4, 10, 5, -1]) - df = DataFrame(dict(A=[1, 1, 1, 2, 2, 2], B=1, C=s, D='foo')) - with np.errstate(all='ignore'): - result = df.groupby('A').transform( - lambda x: (x - x.mean()) / x.std()) - - s1 = s.iloc[0:3] - s1 = (s1 - s1.mean()) / s1.std() - s2 = s.iloc[3:6] - s2 = (s2 - s2.mean()) / s2.std() - expected = DataFrame(dict(B=np.nan, C=concat([s1, s2]))) - assert_frame_equal(result, expected) - - # int downcasting - result = df.groupby('A').transform(lambda x: x * 2 / 2) - expected = DataFrame(dict(B=1, C=[2, 3, 4, 10, 5, -1])) - assert_frame_equal(result, expected) - - def test_groupby_transform_with_nan_group(self): - # GH 9941 - df = pd.DataFrame({'a': range(10), - 'b': [1, 1, 2, 3, np.nan, 4, 4, 5, 5, 5]}) - result = df.groupby(df.b)['a'].transform(max) - expected = pd.Series([1., 1., 2., 3., np.nan, 6., 6., 9., 9., 9.], - name='a') - assert_series_equal(result, expected) + assert issubclass(agged.dtype.type, np.dtype(dtype).type) def test_indices_concatenation_order(self): @@ -1427,12 +853,12 @@ def f3(x): assert_frame_equal(result1, result2) # should fail (not the same number of levels) - self.assertRaises(AssertionError, df.groupby('a').apply, f2) - self.assertRaises(AssertionError, df2.groupby('a').apply, f2) + pytest.raises(AssertionError, df.groupby('a').apply, f2) + pytest.raises(AssertionError, df2.groupby('a').apply, f2) # should fail (incorrect shape) - self.assertRaises(AssertionError, df.groupby('a').apply, f3) - self.assertRaises(AssertionError, df2.groupby('a').apply, f3) + pytest.raises(AssertionError, df.groupby('a').apply, f3) + pytest.raises(AssertionError, df2.groupby('a').apply, f3) def test_attr_wrapper(self): grouped = self.ts.groupby(lambda x: x.weekday()) @@ -1447,19 +873,19 @@ def test_attr_wrapper(self): for name, gp in grouped: expected[name] = gp.describe() expected = DataFrame(expected).T - assert_frame_equal(result.unstack(), expected) + assert_frame_equal(result, expected) # get attribute result = grouped.dtype expected = grouped.agg(lambda x: x.dtype) # make sure raises error - self.assertRaises(AttributeError, getattr, grouped, 'foo') + pytest.raises(AttributeError, getattr, grouped, 'foo') def test_series_describe_multikey(self): ts = tm.makeTimeSeries() grouped = ts.groupby([lambda x: x.year, lambda x: x.month]) - result = grouped.describe().unstack() + result = grouped.describe() assert_series_equal(result['mean'], grouped.mean(), check_names=False) assert_series_equal(result['std'], grouped.std(), check_names=False) assert_series_equal(result['min'], 
grouped.min(), check_names=False) @@ -1468,28 +894,38 @@ def test_series_describe_single(self): ts = tm.makeTimeSeries() grouped = ts.groupby(lambda x: x.month) result = grouped.apply(lambda x: x.describe()) - expected = grouped.describe() + expected = grouped.describe().stack() assert_series_equal(result, expected) def test_series_index_name(self): grouped = self.df.loc[:, ['C']].groupby(self.df['A']) result = grouped.agg(lambda x: x.mean()) - self.assertEqual(result.index.name, 'A') + assert result.index.name == 'A' def test_frame_describe_multikey(self): grouped = self.tsframe.groupby([lambda x: x.year, lambda x: x.month]) result = grouped.describe() - + desc_groups = [] for col in self.tsframe: - expected = grouped[col].describe() - assert_series_equal(result[col], expected, check_names=False) + group = grouped[col].describe() + group_col = pd.MultiIndex([[col] * len(group.columns), + group.columns], + [[0] * len(group.columns), + range(len(group.columns))]) + group = pd.DataFrame(group.values, + columns=group_col, + index=group.index) + desc_groups.append(group) + expected = pd.concat(desc_groups, axis=1) + tm.assert_frame_equal(result, expected) groupedT = self.tsframe.groupby({'A': 0, 'B': 0, 'C': 1, 'D': 1}, axis=1) result = groupedT.describe() - - for name, group in groupedT: - assert_frame_equal(result[name], group.describe()) + expected = self.tsframe.describe().T + expected.index = pd.MultiIndex([[0, 0, 1, 1], expected.index], + [range(4), range(len(expected.index))]) + tm.assert_frame_equal(result, expected) def test_frame_describe_tupleindex(self): @@ -1499,18 +935,35 @@ def test_frame_describe_tupleindex(self): 'z': [100, 200, 300, 400, 500] * 3}) df1['k'] = [(0, 0, 1), (0, 1, 0), (1, 0, 0)] * 5 df2 = df1.rename(columns={'k': 'key'}) - result = df1.groupby('k').describe() - expected = df2.groupby('key').describe() - expected.index.set_names(result.index.names, inplace=True) - assert_frame_equal(result, expected) + pytest.raises(ValueError, lambda: df1.groupby('k').describe()) + pytest.raises(ValueError, lambda: df2.groupby('key').describe()) + + def test_frame_describe_unstacked_format(self): + # GH 4792 + prices = {pd.Timestamp('2011-01-06 10:59:05', tz=None): 24990, + pd.Timestamp('2011-01-06 12:43:33', tz=None): 25499, + pd.Timestamp('2011-01-06 12:54:09', tz=None): 25499} + volumes = {pd.Timestamp('2011-01-06 10:59:05', tz=None): 1500000000, + pd.Timestamp('2011-01-06 12:43:33', tz=None): 5000000000, + pd.Timestamp('2011-01-06 12:54:09', tz=None): 100000000} + df = pd.DataFrame({'PRICE': prices, + 'VOLUME': volumes}) + result = df.groupby('PRICE').VOLUME.describe() + data = [df[df.PRICE == 24990].VOLUME.describe().values.tolist(), + df[df.PRICE == 25499].VOLUME.describe().values.tolist()] + expected = pd.DataFrame(data, + index=pd.Index([24990, 25499], name='PRICE'), + columns=['count', 'mean', 'std', 'min', + '25%', '50%', '75%', 'max']) + tm.assert_frame_equal(result, expected) def test_frame_groupby(self): grouped = self.tsframe.groupby(lambda x: x.weekday()) # aggregate aggregated = grouped.aggregate(np.mean) - self.assertEqual(len(aggregated), 5) - self.assertEqual(len(aggregated.columns), 4) + assert len(aggregated) == 5 + assert len(aggregated.columns) == 4 # by string tscopy = self.tsframe.copy() @@ -1521,8 +974,8 @@ def test_frame_groupby(self): # transform grouped = self.tsframe.head(30).groupby(lambda x: x.weekday()) transformed = grouped.transform(lambda x: x - x.mean()) - self.assertEqual(len(transformed), 30) - self.assertEqual(len(transformed.columns), 4) 
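This hunk is representative of the mechanical conversion applied throughout the diff: unittest-style `self.assert*` helper calls become plain `assert` statements, and `self.assertRaises` becomes `pytest.raises`. A minimal before/after sketch of the pattern; the helper and its arguments are illustrative, not from this file:

```python
import pytest

# Before: bound to unittest.TestCase helper methods.
#     self.assertEqual(len(transformed), 30)
#     self.assertRaises(TypeError, frame.groupby)
# After: bare asserts (pytest rewrites them to produce rich failure
# messages) and pytest.raises for expected exceptions.
def check_transformed(transformed, frame):  # illustrative helper
    assert len(transformed) == 30
    assert len(transformed.columns) == 4
    with pytest.raises(TypeError):
        frame.groupby()
```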
+ assert len(transformed) == 30 + assert len(transformed.columns) == 4 # transform propagate transformed = grouped.transform(lambda x: x.mean()) @@ -1534,7 +987,7 @@ def test_frame_groupby(self): # iterate for weekday, group in grouped: - self.assertEqual(group.index[0].weekday(), weekday) + assert group.index[0].weekday() == weekday # groups / group_indices groups = grouped.groups @@ -1542,7 +995,7 @@ def test_frame_groupby(self): for k, v in compat.iteritems(groups): samething = self.tsframe.index.take(indices[k]) - self.assertTrue((samething == v).all()) + assert (samething == v).all() def test_grouping_is_iterable(self): # this code path isn't used anywhere else @@ -1560,8 +1013,8 @@ def test_frame_groupby_columns(self): # aggregate aggregated = grouped.aggregate(np.mean) - self.assertEqual(len(aggregated), len(self.tsframe)) - self.assertEqual(len(aggregated.columns), 2) + assert len(aggregated) == len(self.tsframe) + assert len(aggregated.columns) == 2 # transform tf = lambda x: x - x.mean() @@ -1570,32 +1023,34 @@ def test_frame_groupby_columns(self): # iterate for k, v in grouped: - self.assertEqual(len(v.columns), 2) + assert len(v.columns) == 2 def test_frame_set_name_single(self): grouped = self.df.groupby('A') result = grouped.mean() - self.assertEqual(result.index.name, 'A') + assert result.index.name == 'A' result = self.df.groupby('A', as_index=False).mean() - self.assertNotEqual(result.index.name, 'A') + assert result.index.name != 'A' result = grouped.agg(np.mean) - self.assertEqual(result.index.name, 'A') + assert result.index.name == 'A' result = grouped.agg({'C': np.mean, 'D': np.std}) - self.assertEqual(result.index.name, 'A') + assert result.index.name == 'A' result = grouped['C'].mean() - self.assertEqual(result.index.name, 'A') + assert result.index.name == 'A' result = grouped['C'].agg(np.mean) - self.assertEqual(result.index.name, 'A') + assert result.index.name == 'A' result = grouped['C'].agg([np.mean, np.std]) - self.assertEqual(result.index.name, 'A') + assert result.index.name == 'A' - result = grouped['C'].agg({'foo': np.mean, 'bar': np.std}) - self.assertEqual(result.index.name, 'A') + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + result = grouped['C'].agg({'foo': np.mean, 'bar': np.std}) + assert result.index.name == 'A' def test_multi_iter(self): s = Series(np.arange(6)) @@ -1609,8 +1064,8 @@ def test_multi_iter(self): ('b', '1', s[[4]]), ('b', '2', s[[3, 5]])] for i, ((one, two), three) in enumerate(iterated): e1, e2, e3 = expected[i] - self.assertEqual(e1, one) - self.assertEqual(e2, two) + assert e1 == one + assert e2 == two assert_series_equal(three, e3) def test_multi_iter_frame(self): @@ -1632,8 +1087,8 @@ def test_multi_iter_frame(self): ('b', '2', df.loc[idx[[1]]])] for i, ((one, two), three) in enumerate(iterated): e1, e2, e3 = expected[i] - self.assertEqual(e1, one) - self.assertEqual(e2, two) + assert e1 == one + assert e2 == two assert_frame_equal(three, e3) # don't iterate through groups with no data @@ -1643,7 +1098,7 @@ def test_multi_iter_frame(self): groups = {} for key, gp in grouped: groups[key] = gp - self.assertEqual(len(groups), 2) + assert len(groups) == 2 # axis = 1 three_levels = self.three_group.groupby(['A', 'B', 'C']).mean() @@ -1652,16 +1107,17 @@ def test_multi_iter_frame(self): pass def test_multi_iter_panel(self): - wp = tm.makePanel() - grouped = wp.groupby([lambda x: x.month, lambda x: x.weekday()], - axis=1) - - for (month, wd), group in grouped: - exp_axis = [x - for x in wp.major_axis - if 
x.month == month and x.weekday() == wd] - expected = wp.reindex(major=exp_axis) - assert_panel_equal(group, expected) + with catch_warnings(record=True): + wp = tm.makePanel() + grouped = wp.groupby([lambda x: x.month, lambda x: x.weekday()], + axis=1) + + for (month, wd), group in grouped: + exp_axis = [x + for x in wp.major_axis + if x.month == month and x.weekday() == wd] + expected = wp.reindex(major=exp_axis) + assert_panel_equal(group, expected) def test_multi_func(self): col1 = self.df['A'] @@ -1722,25 +1178,26 @@ def test_groupby_multiple_columns(self): def _check_op(op): - result1 = op(grouped) - - expected = defaultdict(dict) - for n1, gp1 in data.groupby('A'): - for n2, gp2 in gp1.groupby('B'): - expected[n1][n2] = op(gp2.loc[:, ['C', 'D']]) - expected = dict((k, DataFrame(v)) - for k, v in compat.iteritems(expected)) - expected = Panel.fromDict(expected).swapaxes(0, 1) - expected.major_axis.name, expected.minor_axis.name = 'A', 'B' - - # a little bit crude - for col in ['C', 'D']: - result_col = op(grouped[col]) - exp = expected[col] - pivoted = result1[col].unstack() - pivoted2 = result_col.unstack() - assert_frame_equal(pivoted.reindex_like(exp), exp) - assert_frame_equal(pivoted2.reindex_like(exp), exp) + with catch_warnings(record=True): + result1 = op(grouped) + + expected = defaultdict(dict) + for n1, gp1 in data.groupby('A'): + for n2, gp2 in gp1.groupby('B'): + expected[n1][n2] = op(gp2.loc[:, ['C', 'D']]) + expected = dict((k, DataFrame(v)) + for k, v in compat.iteritems(expected)) + expected = Panel.fromDict(expected).swapaxes(0, 1) + expected.major_axis.name, expected.minor_axis.name = 'A', 'B' + + # a little bit crude + for col in ['C', 'D']: + result_col = op(grouped[col]) + exp = expected[col] + pivoted = result1[col].unstack() + pivoted2 = result_col.unstack() + assert_frame_equal(pivoted.reindex_like(exp), exp) + assert_frame_equal(pivoted2.reindex_like(exp), exp) _check_op(lambda x: x.sum()) _check_op(lambda x: x.mean()) @@ -1768,7 +1225,10 @@ def test_groupby_as_index_agg(self): grouped = self.df.groupby('A', as_index=True) expected3 = grouped['C'].sum() expected3 = DataFrame(expected3).rename(columns={'C': 'Q'}) - result3 = grouped['C'].agg({'Q': np.sum}) + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + result3 = grouped['C'].agg({'Q': np.sum}) assert_frame_equal(result3, expected3) # multi-key @@ -1845,58 +1305,6 @@ def check_nunique(df, keys, as_index=True): check_nunique(frame, ['jim'], as_index=False) check_nunique(frame, ['jim', 'joe'], as_index=False) - def test_series_groupby_value_counts(self): - from itertools import product - - def rebuild_index(df): - arr = list(map(df.index.get_level_values, range(df.index.nlevels))) - df.index = MultiIndex.from_arrays(arr, names=df.index.names) - return df - - def check_value_counts(df, keys, bins): - for isort, normalize, sort, ascending, dropna \ - in product((False, True), repeat=5): - - kwargs = dict(normalize=normalize, sort=sort, - ascending=ascending, dropna=dropna, bins=bins) - - gr = df.groupby(keys, sort=isort) - left = gr['3rd'].value_counts(**kwargs) - - gr = df.groupby(keys, sort=isort) - right = gr['3rd'].apply(Series.value_counts, **kwargs) - right.index.names = right.index.names[:-1] + ['3rd'] - - # have to sort on index because of unstable sort on values - left, right = map(rebuild_index, (left, right)) # xref GH9212 - assert_series_equal(left.sort_index(), right.sort_index()) - - def loop(df): - bins = None, np.arange(0, max(5, df['3rd'].max()) + 1, 2) - keys = 
'1st', '2nd', ('1st', '2nd') - for k, b in product(keys, bins): - check_value_counts(df, k, b) - - days = date_range('2015-08-24', periods=10) - - for n, m in product((100, 1000), (5, 20)): - frame = DataFrame({ - '1st': np.random.choice( - list('abcd'), n), - '2nd': np.random.choice(days, n), - '3rd': np.random.randint(1, m + 1, n) - }) - - loop(frame) - - frame.loc[1::11, '1st'] = nan - frame.loc[3::17, '2nd'] = nan - frame.loc[7::19, '3rd'] = nan - frame.loc[8::19, '3rd'] = nan - frame.loc[9::19, '3rd'] = nan - - loop(frame) - def test_multiindex_passthru(self): # GH 7997 @@ -1938,26 +1346,26 @@ def test_as_index_series_return_frame(self): result = grouped['C'].agg(np.sum) expected = grouped.agg(np.sum).loc[:, ['A', 'C']] - tm.assertIsInstance(result, DataFrame) + assert isinstance(result, DataFrame) assert_frame_equal(result, expected) result2 = grouped2['C'].agg(np.sum) expected2 = grouped2.agg(np.sum).loc[:, ['A', 'B', 'C']] - tm.assertIsInstance(result2, DataFrame) + assert isinstance(result2, DataFrame) assert_frame_equal(result2, expected2) result = grouped['C'].sum() expected = grouped.sum().loc[:, ['A', 'C']] - tm.assertIsInstance(result, DataFrame) + assert isinstance(result, DataFrame) assert_frame_equal(result, expected) result2 = grouped2['C'].sum() expected2 = grouped2.sum().loc[:, ['A', 'B', 'C']] - tm.assertIsInstance(result2, DataFrame) + assert isinstance(result2, DataFrame) assert_frame_equal(result2, expected2) # corner case - self.assertRaises(Exception, grouped['C'].__getitem__, 'D') + pytest.raises(Exception, grouped['C'].__getitem__, 'D') def test_groupby_as_index_cython(self): data = self.df @@ -1975,7 +1383,7 @@ def test_groupby_as_index_cython(self): result = grouped.mean() expected = data.groupby(['A', 'B']).mean() - arrays = lzip(*expected.index._tuple_index) + arrays = lzip(*expected.index.values) expected.insert(0, 'A', arrays[0]) expected.insert(1, 'B', arrays[1]) expected.index = np.arange(len(expected)) @@ -1991,11 +1399,11 @@ def test_groupby_as_index_series_scalar(self): assert_frame_equal(result, expected) def test_groupby_as_index_corner(self): - self.assertRaises(TypeError, self.ts.groupby, lambda x: x.weekday(), - as_index=False) + pytest.raises(TypeError, self.ts.groupby, lambda x: x.weekday(), + as_index=False) - self.assertRaises(ValueError, self.df.groupby, lambda x: x.lower(), - as_index=False, axis=1) + pytest.raises(ValueError, self.df.groupby, lambda x: x.lower(), + as_index=False, axis=1) def test_groupby_as_index_apply(self): # GH #4648 and #3417 @@ -2091,7 +1499,7 @@ def test_groupby_multiple_key(self): lambda x: x.day], axis=1) agged = grouped.agg(lambda x: x.sum()) - self.assert_index_equal(agged.index, df.columns) + tm.assert_index_equal(agged.index, df.columns) assert_almost_equal(df.T.values, agged.values) agged = grouped.agg(lambda x: x.sum()) @@ -2128,8 +1536,8 @@ def test_omit_nuisance(self): # won't work with axis = 1 grouped = df.groupby({'A': 0, 'C': 0, 'D': 1, 'E': 1}, axis=1) - result = self.assertRaises(TypeError, grouped.agg, - lambda x: x.sum(0, numeric_only=False)) + result = pytest.raises(TypeError, grouped.agg, + lambda x: x.sum(0, numeric_only=False)) def test_omit_nuisance_python_multiple(self): grouped = self.three_group.groupby(['A', 'B']) @@ -2155,7 +1563,7 @@ def test_empty_groups_corner(self): agged = grouped.apply(lambda x: x.mean()) agged_A = grouped['A'].apply(np.mean) assert_series_equal(agged['A'], agged_A) - self.assertEqual(agged.index.name, 'first') + assert agged.index.name == 'first' def 
test_apply_concat_preserve_names(self): grouped = self.three_group.groupby(['A', 'B']) @@ -2183,17 +1591,17 @@ def desc3(group): return result result = grouped.apply(desc) - self.assertEqual(result.index.names, ('A', 'B', 'stat')) + assert result.index.names == ('A', 'B', 'stat') result2 = grouped.apply(desc2) - self.assertEqual(result2.index.names, ('A', 'B', 'stat')) + assert result2.index.names == ('A', 'B', 'stat') result3 = grouped.apply(desc3) - self.assertEqual(result3.index.names, ('A', 'B', None)) + assert result3.index.names == ('A', 'B', None) def test_nonsense_func(self): df = DataFrame([0]) - self.assertRaises(Exception, df.groupby, lambda x: x + 'foo') + pytest.raises(Exception, df.groupby, lambda x: x + 'foo') def test_builtins_apply(self): # GH8155 df = pd.DataFrame(np.random.randint(1, 50, (1000, 2)), @@ -2222,51 +1630,6 @@ def test_builtins_apply(self): # GH8155 assert_series_equal(getattr(result, fname)(), getattr(df, fname)()) - def test_cythonized_aggers(self): - data = {'A': [0, 0, 0, 0, 1, 1, 1, 1, 1, 1., nan, nan], - 'B': ['A', 'B'] * 6, - 'C': np.random.randn(12)} - df = DataFrame(data) - df.loc[2:10:2, 'C'] = nan - - def _testit(name): - - op = lambda x: getattr(x, name)() - - # single column - grouped = df.drop(['B'], axis=1).groupby('A') - exp = {} - for cat, group in grouped: - exp[cat] = op(group['C']) - exp = DataFrame({'C': exp}) - exp.index.name = 'A' - result = op(grouped) - assert_frame_equal(result, exp) - - # multiple columns - grouped = df.groupby(['A', 'B']) - expd = {} - for (cat1, cat2), group in grouped: - expd.setdefault(cat1, {})[cat2] = op(group['C']) - exp = DataFrame(expd).T.stack(dropna=False) - exp.index.names = ['A', 'B'] - exp.name = 'C' - - result = op(grouped)['C'] - if not tm._incompat_bottleneck_version(name): - assert_series_equal(result, exp) - - _testit('count') - _testit('sum') - _testit('std') - _testit('var') - _testit('sem') - _testit('mean') - _testit('median') - _testit('prod') - _testit('min') - _testit('max') - def test_max_min_non_numeric(self): # #2700 aa = DataFrame({'nn': [11, 11, 22, 22], @@ -2274,16 +1637,16 @@ def test_max_min_non_numeric(self): 'ss': 4 * ['mama']}) result = aa.groupby('nn').max() - self.assertTrue('ss' in result) + assert 'ss' in result result = aa.groupby('nn').max(numeric_only=False) - self.assertTrue('ss' in result) + assert 'ss' in result result = aa.groupby('nn').min() - self.assertTrue('ss' in result) + assert 'ss' in result result = aa.groupby('nn').min(numeric_only=False) - self.assertTrue('ss' in result) + assert 'ss' in result def test_arg_passthru(self): # make sure that we are passing thru kwargs @@ -2386,7 +1749,8 @@ def test_arg_passthru(self): for attr in ['cummin', 'cummax']: f = getattr(df.groupby('group'), attr) result = f() - tm.assert_index_equal(result.columns, expected_columns_numeric) + # GH 15561: numeric_only=False set by default like min/max + tm.assert_index_equal(result.columns, expected_columns) result = f(numeric_only=False) tm.assert_index_equal(result.columns, expected_columns) @@ -2401,31 +1765,6 @@ def test_arg_passthru(self): result = f(numeric_only=False) tm.assert_index_equal(result.columns, expected_columns) - def test_cython_agg_boolean(self): - frame = DataFrame({'a': np.random.randint(0, 5, 50), - 'b': np.random.randint(0, 2, 50).astype('bool')}) - result = frame.groupby('a')['b'].mean() - expected = frame.groupby('a')['b'].agg(np.mean) - - assert_series_equal(result, expected) - - def test_cython_agg_nothing_to_agg(self): - frame = DataFrame({'a': 
np.random.randint(0, 5, 50), - 'b': ['foo', 'bar'] * 25}) - self.assertRaises(DataError, frame.groupby('a')['b'].mean) - - frame = DataFrame({'a': np.random.randint(0, 5, 50), - 'b': ['foo', 'bar'] * 25}) - self.assertRaises(DataError, frame[['b']].groupby(frame['a']).mean) - - def test_cython_agg_nothing_to_agg_with_dates(self): - frame = DataFrame({'a': np.random.randint(0, 5, 50), - 'b': ['foo', 'bar'] * 25, - 'dates': pd.date_range('now', periods=50, - freq='T')}) - with tm.assertRaisesRegexp(DataError, "No numeric types to aggregate"): - frame.groupby('b').dates.mean() - def test_groupby_timedelta_cython_count(self): df = DataFrame({'g': list('ab' * 2), 'delt': np.arange(4).astype('timedelta64[ns]')}) @@ -2435,22 +1774,13 @@ def test_groupby_timedelta_cython_count(self): result = df.groupby('g').delt.count() tm.assert_series_equal(expected, result) - def test_cython_agg_frame_columns(self): - # #2113 - df = DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]}) - - df.groupby(level=0, axis='columns').mean() - df.groupby(level=0, axis='columns').mean() - df.groupby(level=0, axis='columns').mean() - df.groupby(level=0, axis='columns').mean() - def test_wrap_aggregated_output_multindex(self): df = self.mframe.T df['baz', 'two'] = 'peekaboo' keys = [np.array([0, 0, 1]), np.array([0, 0, 1])] agged = df.groupby(keys).agg(np.mean) - tm.assertIsInstance(agged.columns, MultiIndex) + assert isinstance(agged.columns, MultiIndex) def aggfun(ser): if ser.name == ('foo', 'one'): @@ -2459,7 +1789,7 @@ def aggfun(ser): return ser.sum() agged2 = df.groupby(keys).aggregate(aggfun) - self.assertEqual(len(agged2.columns) + 1, len(df.columns)) + assert len(agged2.columns) + 1 == len(df.columns) def test_groupby_level(self): frame = self.mframe @@ -2474,13 +1804,13 @@ def test_groupby_level(self): expected0 = expected0.reindex(frame.index.levels[0]) expected1 = expected1.reindex(frame.index.levels[1]) - self.assertEqual(result0.index.name, 'first') - self.assertEqual(result1.index.name, 'second') + assert result0.index.name == 'first' + assert result1.index.name == 'second' assert_frame_equal(result0, expected0) assert_frame_equal(result1, expected1) - self.assertEqual(result0.index.name, frame.index.names[0]) - self.assertEqual(result1.index.name, frame.index.names[1]) + assert result0.index.name == frame.index.names[0] + assert result1.index.name == frame.index.names[1] # groupby level name result0 = frame.groupby(level='first').sum() @@ -2496,14 +1826,14 @@ def test_groupby_level(self): assert_frame_equal(result1, expected1.T) # raise exception for non-MultiIndex - self.assertRaises(ValueError, self.df.groupby, level=1) + pytest.raises(ValueError, self.df.groupby, level=1) def test_groupby_level_index_names(self): # GH4014 this used to raise ValueError since 'exp'>1 (in py2) df = DataFrame({'exp': ['A'] * 3 + ['B'] * 3, 'var1': lrange(6), }).set_index('exp') df.groupby(level='exp') - self.assertRaises(ValueError, df.groupby, level='foo') + pytest.raises(ValueError, df.groupby, level='foo') def test_groupby_level_with_nas(self): index = MultiIndex(levels=[[1, 0], [0, 1, 2, 3]], @@ -2530,12 +1860,12 @@ def test_groupby_level_apply(self): frame = self.mframe result = frame.groupby(level=0).count() - self.assertEqual(result.index.name, 'first') + assert result.index.name == 'first' result = frame.groupby(level=1).count() - self.assertEqual(result.index.name, 'second') + assert result.index.name == 'second' result = frame['A'].groupby(level=0).count() - self.assertEqual(result.index.name, 'first') + assert 
result.index.name == 'first' def test_groupby_args(self): # PR8618 and issue 8015 @@ -2544,16 +1874,14 @@ def test_groupby_args(self): def j(): frame.groupby() - self.assertRaisesRegexp(TypeError, - "You have to supply one of 'by' and 'level'", - j) + tm.assert_raises_regex(TypeError, "You have to supply one of " + "'by' and 'level'", j) def k(): frame.groupby(by=None, level=None) - self.assertRaisesRegexp(TypeError, - "You have to supply one of 'by' and 'level'", - k) + tm.assert_raises_regex(TypeError, "You have to supply one of " + "'by' and 'level'", k) def test_groupby_level_mapper(self): frame = self.mframe @@ -2582,20 +1910,20 @@ def test_groupby_level_nonmulti(self): Index(range(1, 7), name='foo')) result = s.groupby(level=0).sum() - self.assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) result = s.groupby(level=[0]).sum() - self.assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) result = s.groupby(level=-1).sum() - self.assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) result = s.groupby(level=[-1]).sum() - self.assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) - tm.assertRaises(ValueError, s.groupby, level=1) - tm.assertRaises(ValueError, s.groupby, level=-2) - tm.assertRaises(ValueError, s.groupby, level=[]) - tm.assertRaises(ValueError, s.groupby, level=[0, 0]) - tm.assertRaises(ValueError, s.groupby, level=[0, 1]) - tm.assertRaises(ValueError, s.groupby, level=[1]) + pytest.raises(ValueError, s.groupby, level=1) + pytest.raises(ValueError, s.groupby, level=-2) + pytest.raises(ValueError, s.groupby, level=[]) + pytest.raises(ValueError, s.groupby, level=[0, 0]) + pytest.raises(ValueError, s.groupby, level=[0, 1]) + pytest.raises(ValueError, s.groupby, level=[1]) def test_groupby_complex(self): # GH 12902 @@ -2618,15 +1946,6 @@ def test_grouping_labels(self): exp_labels = np.array([2, 2, 2, 0, 0, 1, 1, 3, 3, 3], dtype=np.intp) assert_almost_equal(grouped.grouper.labels[0], exp_labels) - def test_cython_fail_agg(self): - dr = bdate_range('1/1/2000', periods=50) - ts = Series(['A', 'B', 'C', 'D', 'E'] * 10, index=dr) - - grouped = ts.groupby(lambda x: x.month) - summed = grouped.sum() - expected = grouped.agg(np.sum) - assert_series_equal(summed, expected) - def test_apply_series_to_frame(self): def f(piece): with np.errstate(invalid='ignore'): @@ -2641,29 +1960,29 @@ def f(piece): grouped = ts.groupby(lambda x: x.month) result = grouped.apply(f) - tm.assertIsInstance(result, DataFrame) - self.assert_index_equal(result.index, ts.index) + assert isinstance(result, DataFrame) + tm.assert_index_equal(result.index, ts.index) def test_apply_series_yield_constant(self): result = self.df.groupby(['A', 'B'])['C'].apply(len) - self.assertEqual(result.index.names[:2], ('A', 'B')) + assert result.index.names[:2] == ('A', 'B') def test_apply_frame_yield_constant(self): # GH13568 result = self.df.groupby(['A', 'B']).apply(len) - self.assertTrue(isinstance(result, Series)) - self.assertIsNone(result.name) + assert isinstance(result, Series) + assert result.name is None result = self.df.groupby(['A', 'B'])[['C', 'D']].apply(len) - self.assertTrue(isinstance(result, Series)) - self.assertIsNone(result.name) + assert isinstance(result, Series) + assert result.name is None def test_apply_frame_to_series(self): grouped = self.df.groupby(['A', 'B']) result = grouped.apply(len) expected = grouped.count()['C'] - self.assert_index_equal(result.index, expected.index) - 
self.assert_numpy_array_equal(result.values, expected.values) + tm.assert_index_equal(result.index, expected.index) + tm.assert_numpy_array_equal(result.values, expected.values) def test_apply_frame_concat_series(self): def trans(group): @@ -2680,7 +1999,7 @@ def trans2(group): result = df.groupby('A').apply(trans) exp = df.groupby('A')['C'].apply(trans2) assert_series_equal(result, exp, check_names=False) - self.assertEqual(result.name, 'C') + assert result.name == 'C' def test_apply_transform(self): grouped = self.ts.groupby(lambda x: x.month) @@ -2776,26 +2095,26 @@ def test_groupby_with_hier_columns(self): df = DataFrame(np.random.randn(8, 4), index=index, columns=columns) result = df.groupby(level=0).mean() - self.assert_index_equal(result.columns, columns) + tm.assert_index_equal(result.columns, columns) result = df.groupby(level=0, axis=1).mean() - self.assert_index_equal(result.index, df.index) + tm.assert_index_equal(result.index, df.index) result = df.groupby(level=0).agg(np.mean) - self.assert_index_equal(result.columns, columns) + tm.assert_index_equal(result.columns, columns) result = df.groupby(level=0).apply(lambda x: x.mean()) - self.assert_index_equal(result.columns, columns) + tm.assert_index_equal(result.columns, columns) result = df.groupby(level=0, axis=1).agg(lambda x: x.mean(1)) - self.assert_index_equal(result.columns, Index(['A', 'B'])) - self.assert_index_equal(result.index, df.index) + tm.assert_index_equal(result.columns, Index(['A', 'B'])) + tm.assert_index_equal(result.index, df.index) # add a nuisance column sorted_columns, _ = columns.sortlevel(0) df['A', 'foo'] = 'bar' result = df.groupby(level=0).mean() - self.assert_index_equal(result.columns, df.columns[:-1]) + tm.assert_index_equal(result.columns, df.columns[:-1]) def test_pass_args_kwargs(self): from numpy import percentile @@ -2842,17 +2161,17 @@ def test_size(self): grouped = self.df.groupby(['A', 'B']) result = grouped.size() for key, group in grouped: - self.assertEqual(result[key], len(group)) + assert result[key] == len(group) grouped = self.df.groupby('A') result = grouped.size() for key, group in grouped: - self.assertEqual(result[key], len(group)) + assert result[key] == len(group) grouped = self.df.groupby('B') result = grouped.size() for key, group in grouped: - self.assertEqual(result[key], len(group)) + assert result[key] == len(group) df = DataFrame(np.random.choice(20, (1000, 3)), columns=list('abc')) for sort, key in cart_product((False, True), ('a', 'b', ['a', 'b'])): @@ -2994,16 +2313,21 @@ def test_non_cython_api(self): assert_frame_equal(result, expected) # describe - expected = DataFrame(dict(B=concat( - [df.loc[[0, 1], 'B'].describe(), df.loc[[2], 'B'].describe()], - keys=[1, 3]))) - expected.index.names = ['A', None] + expected_index = pd.Index([1, 3], name='A') + expected_col = pd.MultiIndex(levels=[['B'], + ['count', 'mean', 'std', 'min', + '25%', '50%', '75%', 'max']], + labels=[[0] * 8, list(range(8))]) + expected = pd.DataFrame([[1.0, 2.0, nan, 2.0, 2.0, 2.0, 2.0, 2.0], + [0.0, nan, nan, nan, nan, nan, nan, nan]], + index=expected_index, + columns=expected_col) result = g.describe() assert_frame_equal(result, expected) - expected = concat( - [df.loc[[0, 1], ['A', 'B']].describe(), - df.loc[[2], ['A', 'B']].describe()], keys=[0, 1]) + expected = pd.concat([df[df.A == 1].describe().unstack().to_frame().T, + df[df.A == 3].describe().unstack().to_frame().T]) + expected.index = pd.Index([0, 1]) result = gni.describe() assert_frame_equal(result, expected) @@ -3015,7 +2339,7 @@ def 
test_non_cython_api(self): assert_frame_equal(result, expected) # idxmax - expected = DataFrame([[0], [nan]], columns=['B'], index=[1, 3]) + expected = DataFrame([[0.0], [nan]], columns=['B'], index=[1, 3]) expected.index.name = 'A' result = g.idxmax() assert_frame_equal(result, expected) @@ -3053,30 +2377,6 @@ def test_grouping_ndarray(self): assert_frame_equal(result, expected, check_names=False ) # Note: no names when grouping by value - def test_agg_consistency(self): - # agg with ([]) and () not consistent - # GH 6715 - - def P1(a): - try: - return np.percentile(a.dropna(), q=1) - except: - return np.nan - - import datetime as dt - df = DataFrame({'col1': [1, 2, 3, 4], - 'col2': [10, 25, 26, 31], - 'date': [dt.date(2013, 2, 10), dt.date(2013, 2, 10), - dt.date(2013, 2, 11), dt.date(2013, 2, 11)]}) - - g = df.groupby('date') - - expected = g.agg([P1]) - expected.columns = expected.columns.levels[0] - - result = g.agg(P1) - assert_frame_equal(result, expected) - def test_apply_typecast_fail(self): df = DataFrame({'d': [1., 1., 1., 2., 2., 2.], 'c': np.tile( @@ -3159,29 +2459,7 @@ def f(g): return g result = grouped.apply(f) - self.assertTrue('value3' in result) - - def test_transform_mixed_type(self): - index = MultiIndex.from_arrays([[0, 0, 0, 1, 1, 1], [1, 2, 3, 1, 2, 3] - ]) - df = DataFrame({'d': [1., 1., 1., 2., 2., 2.], - 'c': np.tile(['a', 'b', 'c'], 2), - 'v': np.arange(1., 7.)}, index=index) - - def f(group): - group['g'] = group['d'] * 2 - return group[:1] - - grouped = df.groupby('c') - result = grouped.apply(f) - - self.assertEqual(result['d'].dtype, np.float64) - - # this is by definition a mutating operation! - with option_context('mode.chained_assignment', None): - for key, group in grouped: - res = f(group) - assert_frame_equal(res, result.loc[key]) + assert 'value3' in result def test_groupby_wrong_multi_labels(self): from pandas import read_csv @@ -3203,24 +2481,24 @@ def test_groupby_wrong_multi_labels(self): def test_groupby_series_with_name(self): result = self.df.groupby(self.df['A']).mean() result2 = self.df.groupby(self.df['A'], as_index=False).mean() - self.assertEqual(result.index.name, 'A') - self.assertIn('A', result2) + assert result.index.name == 'A' + assert 'A' in result2 result = self.df.groupby([self.df['A'], self.df['B']]).mean() result2 = self.df.groupby([self.df['A'], self.df['B']], as_index=False).mean() - self.assertEqual(result.index.names, ('A', 'B')) - self.assertIn('A', result2) - self.assertIn('B', result2) + assert result.index.names == ('A', 'B') + assert 'A' in result2 + assert 'B' in result2 def test_seriesgroupby_name_attr(self): # GH 6265 result = self.df.groupby('A')['C'] - self.assertEqual(result.count().name, 'C') - self.assertEqual(result.mean().name, 'C') + assert result.count().name == 'C' + assert result.mean().name == 'C' testFunc = lambda x: np.sum(x) * 2 - self.assertEqual(result.agg(testFunc).name, 'C') + assert result.agg(testFunc).name == 'C' def test_consistency_name(self): # GH 12363 @@ -3252,11 +2530,11 @@ def summarize_random_name(df): }, name=df.iloc[0]['A']) metrics = self.df.groupby('A').apply(summarize) - self.assertEqual(metrics.columns.name, None) + assert metrics.columns.name is None metrics = self.df.groupby('A').apply(summarize, 'metrics') - self.assertEqual(metrics.columns.name, 'metrics') + assert metrics.columns.name == 'metrics' metrics = self.df.groupby('A').apply(summarize_random_name) - self.assertEqual(metrics.columns.name, None) + assert metrics.columns.name is None def 
test_groupby_nonstring_columns(self): df = DataFrame([np.arange(10) for x in range(10)]) @@ -3284,7 +2562,7 @@ def test_cython_grouper_series_bug_noncontig(self): inds = np.tile(lrange(10), 10) result = obj.groupby(inds).agg(Series.median) - self.assertTrue(result.isnull().all()) + assert result.isna().all() def test_series_grouper_noncontig_index(self): index = Index(tm.rands_array(10, 100)) @@ -3317,12 +2595,12 @@ def convert_force_pure(x): grouped = s.groupby(labels) result = grouped.agg(convert_fast) - self.assertEqual(result.dtype, np.object_) - tm.assertIsInstance(result[0], Decimal) + assert result.dtype == np.object_ + assert isinstance(result[0], Decimal) result = grouped.agg(convert_force_pure) - self.assertEqual(result.dtype, np.object_) - tm.assertIsInstance(result[0], Decimal) + assert result.dtype == np.object_ + assert isinstance(result[0], Decimal) def test_fast_apply(self): # make sure that fast apply is correctly called @@ -3348,7 +2626,7 @@ def f(g): group_keys = grouper._get_group_keys() values, mutated = splitter.fast_apply(f, group_keys) - self.assertFalse(mutated) + assert not mutated def test_apply_with_mixed_dtype(self): # GH3480, apply with mixed dtype on axis=1 breaks in 0.11 @@ -3392,7 +2670,7 @@ def test_groupby_aggregation_mixed_dtype(self): def test_groupby_dtype_inference_empty(self): # GH 6733 df = DataFrame({'x': [], 'range': np.arange(0, dtype='int64')}) - self.assertEqual(df['x'].dtype, np.float64) + assert df['x'].dtype == np.float64 result = df.groupby('x').first() exp_index = Index([], name='x', dtype=np.float64) @@ -3405,7 +2683,7 @@ def test_groupby_list_infer_array_like(self): expected = self.df.groupby(self.df['A']).mean() assert_frame_equal(result, expected, check_names=False) - self.assertRaises(Exception, self.df.groupby, list(self.df['A'][:-1])) + pytest.raises(Exception, self.df.groupby, list(self.df['A'][:-1])) # pathological case of ambiguity df = DataFrame({'foo': [0, 1], @@ -3431,9 +2709,9 @@ def test_groupby_keys_same_size_as_index(self): def test_groupby_one_row(self): # GH 11741 df1 = pd.DataFrame(np.random.randn(1, 4), columns=list('ABCD')) - self.assertRaises(KeyError, df1.groupby, 'Z') + pytest.raises(KeyError, df1.groupby, 'Z') df2 = pd.DataFrame(np.random.randn(2, 4), columns=list('ABCD')) - self.assertRaises(KeyError, df2.groupby, 'Z') + pytest.raises(KeyError, df2.groupby, 'Z') def test_groupby_nat_exclude(self): # GH 6992 @@ -3447,7 +2725,7 @@ def test_groupby_nat_exclude(self): expected = [pd.Index([1, 7]), pd.Index([3, 5])] keys = sorted(grouped.groups.keys()) - self.assertEqual(len(keys), 2) + assert len(keys) == 2 for k, e in zip(keys, expected): # grouped.groups keys are np.datetime64 with system tz # not to be affected by tz, only compare values @@ -3455,7 +2733,7 @@ def test_groupby_nat_exclude(self): # confirm obj is not filtered tm.assert_frame_equal(grouped.grouper.groupings[0].obj, df) - self.assertEqual(grouped.ngroups, 2) + assert grouped.ngroups == 2 expected = { Timestamp('2013-01-01 00:00:00'): np.array([1, 7], dtype=np.int64), @@ -3463,27 +2741,27 @@ def test_groupby_nat_exclude(self): } for k in grouped.indices: - self.assert_numpy_array_equal(grouped.indices[k], expected[k]) + tm.assert_numpy_array_equal(grouped.indices[k], expected[k]) tm.assert_frame_equal( grouped.get_group(Timestamp('2013-01-01')), df.iloc[[1, 7]]) tm.assert_frame_equal( grouped.get_group(Timestamp('2013-02-01')), df.iloc[[3, 5]]) - self.assertRaises(KeyError, grouped.get_group, pd.NaT) + pytest.raises(KeyError, grouped.get_group, pd.NaT) 
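The assertions above pin down the missing-key semantics this test exercises: rows whose group key is NaT are excluded from the grouping entirely, and the NaT group cannot be retrieved afterwards. A minimal sketch with made-up data:

```python
import pandas as pd
import pytest

df = pd.DataFrame({'key': [pd.Timestamp('2013-01-01'), pd.NaT,
                           pd.Timestamp('2013-01-01')],
                   'val': [1, 2, 3]})
grouped = df.groupby('key')

# The NaT row is dropped from the grouping entirely...
assert grouped.ngroups == 1
assert grouped.size().iloc[0] == 2

# ...and the missing key cannot be fetched back out.
with pytest.raises(KeyError):
    grouped.get_group(pd.NaT)
```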
nan_df = DataFrame({'nan': [np.nan, np.nan, np.nan], 'nat': [pd.NaT, pd.NaT, pd.NaT]}) - self.assertEqual(nan_df['nan'].dtype, 'float64') - self.assertEqual(nan_df['nat'].dtype, 'datetime64[ns]') + assert nan_df['nan'].dtype == 'float64' + assert nan_df['nat'].dtype == 'datetime64[ns]' for key in ['nan', 'nat']: grouped = nan_df.groupby(key) - self.assertEqual(grouped.groups, {}) - self.assertEqual(grouped.ngroups, 0) - self.assertEqual(grouped.indices, {}) - self.assertRaises(KeyError, grouped.get_group, np.nan) - self.assertRaises(KeyError, grouped.get_group, pd.NaT) + assert grouped.groups == {} + assert grouped.ngroups == 0 + assert grouped.indices == {} + pytest.raises(KeyError, grouped.get_group, np.nan) + pytest.raises(KeyError, grouped.get_group, pd.NaT) def test_dictify(self): dict(iter(self.df.groupby('A'))) @@ -3495,8 +2773,9 @@ def test_dictify(self): def test_sparse_friendly(self): sdf = self.df[['C', 'D']].to_sparse() - panel = tm.makePanel() - tm.add_nans(panel) + with catch_warnings(record=True): + panel = tm.makePanel() + tm.add_nans(panel) def _check_work(gp): gp.mean() @@ -3512,43 +2791,28 @@ def _check_work(gp): # _check_work(panel.groupby(lambda x: x.month, axis=1)) def test_panel_groupby(self): - self.panel = tm.makePanel() - tm.add_nans(self.panel) - grouped = self.panel.groupby({'ItemA': 0, 'ItemB': 0, 'ItemC': 1}, - axis='items') - agged = grouped.mean() - agged2 = grouped.agg(lambda x: x.mean('items')) - - tm.assert_panel_equal(agged, agged2) - - self.assert_index_equal(agged.items, Index([0, 1])) - - grouped = self.panel.groupby(lambda x: x.month, axis='major') - agged = grouped.mean() + with catch_warnings(record=True): + self.panel = tm.makePanel() + tm.add_nans(self.panel) + grouped = self.panel.groupby({'ItemA': 0, 'ItemB': 0, 'ItemC': 1}, + axis='items') + agged = grouped.mean() + agged2 = grouped.agg(lambda x: x.mean('items')) - exp = Index(sorted(list(set(self.panel.major_axis.month)))) - self.assert_index_equal(agged.major_axis, exp) + tm.assert_panel_equal(agged, agged2) - grouped = self.panel.groupby({'A': 0, 'B': 0, 'C': 1, 'D': 1}, - axis='minor') - agged = grouped.mean() - self.assert_index_equal(agged.minor_axis, Index([0, 1])) - - def test_numpy_groupby(self): - from pandas.core.groupby import numpy_groupby + tm.assert_index_equal(agged.items, Index([0, 1])) - data = np.random.randn(100, 100) - labels = np.random.randint(0, 10, size=100) + grouped = self.panel.groupby(lambda x: x.month, axis='major') + agged = grouped.mean() - df = DataFrame(data) + exp = Index(sorted(list(set(self.panel.major_axis.month)))) + tm.assert_index_equal(agged.major_axis, exp) - result = df.groupby(labels).sum().values - expected = numpy_groupby(data, labels) - assert_almost_equal(result, expected) - - result = df.groupby(labels, axis=1).sum().values - expected = numpy_groupby(data, labels, axis=1) - assert_almost_equal(result, expected) + grouped = self.panel.groupby({'A': 0, 'B': 0, 'C': 1, 'D': 1}, + axis='minor') + agged = grouped.mean() + tm.assert_index_equal(agged.minor_axis, Index([0, 1])) def test_groupby_2d_malformed(self): d = DataFrame(index=lrange(2)) @@ -3558,8 +2822,8 @@ def test_groupby_2d_malformed(self): d['label'] = ['l1', 'l2'] tmp = d.groupby(['group']).mean() res_values = np.array([[0, 1], [0, 1]], dtype=np.int64) - self.assert_index_equal(tmp.columns, Index(['zeros', 'ones'])) - self.assert_numpy_array_equal(tmp.values, res_values) + tm.assert_index_equal(tmp.columns, Index(['zeros', 'ones'])) + tm.assert_numpy_array_equal(tmp.values, res_values) 
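The `describe()` rewrites earlier in the diff (`test_series_describe_multikey`, `test_frame_describe_unstacked_format`, and the `test_non_cython_api` expectations) all encode the same output-format change: `SeriesGroupBy.describe()` now returns an unstacked frame, one row per group and one column per statistic, rather than a stacked Series that callers had to `.unstack()` themselves. A small sketch of the new shape, reusing the illustrative PRICE/VOLUME data from `test_frame_describe_unstacked_format`:

```python
import pandas as pd

df = pd.DataFrame({'PRICE': [24990, 25499, 25499],
                   'VOLUME': [1500000000, 5000000000, 100000000]})
result = df.groupby('PRICE').VOLUME.describe()

# One row per group; the eight summary statistics become columns.
assert list(result.columns) == ['count', 'mean', 'std', 'min',
                                '25%', '50%', '75%', 'max']
assert list(result.index) == [24990, 25499]
```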
def test_int32_overflow(self): B = np.concatenate((np.arange(10000), np.arange(10000), np.arange(5000) @@ -3573,86 +2837,7 @@ def test_int32_overflow(self): left = df.groupby(['A', 'B', 'C', 'D']).sum() right = df.groupby(['D', 'C', 'B', 'A']).sum() - self.assertEqual(len(left), len(right)) - - def test_int64_overflow(self): - from pandas.core.groupby import _int64_overflow_possible - - B = np.concatenate((np.arange(1000), np.arange(1000), np.arange(500))) - A = np.arange(2500) - df = DataFrame({'A': A, - 'B': B, - 'C': A, - 'D': B, - 'E': A, - 'F': B, - 'G': A, - 'H': B, - 'values': np.random.randn(2500)}) - - lg = df.groupby(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']) - rg = df.groupby(['H', 'G', 'F', 'E', 'D', 'C', 'B', 'A']) - - left = lg.sum()['values'] - right = rg.sum()['values'] - - exp_index, _ = left.index.sortlevel() - self.assert_index_equal(left.index, exp_index) - - exp_index, _ = right.index.sortlevel(0) - self.assert_index_equal(right.index, exp_index) - - tups = list(map(tuple, df[['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H' - ]].values)) - tups = com._asarray_tuplesafe(tups) - - expected = df.groupby(tups).sum()['values'] - - for k, v in compat.iteritems(expected): - self.assertEqual(left[k], right[k[::-1]]) - self.assertEqual(left[k], v) - self.assertEqual(len(left), len(right)) - - # GH9096 - values = range(55109) - data = pd.DataFrame.from_dict({'a': values, - 'b': values, - 'c': values, - 'd': values}) - grouped = data.groupby(['a', 'b', 'c', 'd']) - self.assertEqual(len(grouped), len(values)) - - arr = np.random.randint(-1 << 12, 1 << 12, (1 << 15, 5)) - i = np.random.choice(len(arr), len(arr) * 4) - arr = np.vstack((arr, arr[i])) # add sume duplicate rows - - i = np.random.permutation(len(arr)) - arr = arr[i] # shuffle rows - - df = DataFrame(arr, columns=list('abcde')) - df['jim'], df['joe'] = np.random.randn(2, len(df)) * 10 - gr = df.groupby(list('abcde')) - - # verify this is testing what it is supposed to test! 
- self.assertTrue(_int64_overflow_possible(gr.grouper.shape)) - - # mannually compute groupings - jim, joe = defaultdict(list), defaultdict(list) - for key, a, b in zip(map(tuple, arr), df['jim'], df['joe']): - jim[key].append(a) - joe[key].append(b) - - self.assertEqual(len(gr), len(jim)) - mi = MultiIndex.from_tuples(jim.keys(), names=list('abcde')) - - def aggr(func): - f = lambda a: np.fromiter(map(func, a), dtype='f8') - arr = np.vstack((f(jim.values()), f(joe.values()))).T - res = DataFrame(arr, columns=['jim', 'joe'], index=mi) - return res.sort_index() - - assert_frame_equal(gr.mean(), aggr(np.mean)) - assert_frame_equal(gr.median(), aggr(np.median)) + assert len(left) == len(right) def test_groupby_sort_multi(self): df = DataFrame({'a': ['foo', 'bar', 'baz'], @@ -3663,17 +2848,17 @@ def test_groupby_sort_multi(self): tups = lmap(tuple, df[['a', 'b', 'c']].values) tups = com._asarray_tuplesafe(tups) result = df.groupby(['a', 'b', 'c'], sort=True).sum() - self.assert_numpy_array_equal(result.index.values, tups[[1, 2, 0]]) + tm.assert_numpy_array_equal(result.index.values, tups[[1, 2, 0]]) tups = lmap(tuple, df[['c', 'a', 'b']].values) tups = com._asarray_tuplesafe(tups) result = df.groupby(['c', 'a', 'b'], sort=True).sum() - self.assert_numpy_array_equal(result.index.values, tups) + tm.assert_numpy_array_equal(result.index.values, tups) tups = lmap(tuple, df[['b', 'c', 'a']].values) tups = com._asarray_tuplesafe(tups) result = df.groupby(['b', 'c', 'a'], sort=True).sum() - self.assert_numpy_array_equal(result.index.values, tups[[2, 1, 0]]) + tm.assert_numpy_array_equal(result.index.values, tups[[2, 1, 0]]) df = DataFrame({'a': [0, 1, 2, 0, 1, 2], 'b': [0, 0, 0, 1, 1, 1], @@ -3768,21 +2953,7 @@ def test_no_nonsense_name(self): s.name = None result = s.groupby(self.frame['A']).agg(np.sum) - self.assertIsNone(result.name) - - def test_wrap_agg_out(self): - grouped = self.three_group.groupby(['A', 'B']) - - def func(ser): - if ser.dtype == np.object: - raise TypeError - else: - return ser.sum() - - result = grouped.aggregate(func) - exp_grouped = self.three_group.loc[:, self.three_group.columns != 'C'] - expected = exp_grouped.groupby(['A', 'B']).aggregate(func) - assert_frame_equal(result, expected) + assert result.name is None def test_multifunc_sum_bug(self): # GH #1065 @@ -3792,7 +2963,7 @@ def test_multifunc_sum_bug(self): grouped = x.groupby('test') result = grouped.agg({'fl': 'sum', 2: 'size'}) - self.assertEqual(result['fl'].dtype, np.float64) + assert result['fl'].dtype == np.float64 def test_handle_dict_return_value(self): def f(group): @@ -3804,7 +2975,7 @@ def g(group): result = self.df.groupby('A')['C'].apply(f) expected = self.df.groupby('A')['C'].apply(g) - tm.assertIsInstance(result, Series) + assert isinstance(result, Series) assert_series_equal(result, expected) def test_getitem_list_of_columns(self): @@ -3841,110 +3012,6 @@ def test_getitem_numeric_column_names(self): assert_frame_equal(result2, expected) assert_frame_equal(result3, expected) - def test_agg_multiple_functions_maintain_order(self): - # GH #610 - funcs = [('mean', np.mean), ('max', np.max), ('min', np.min)] - result = self.df.groupby('A')['C'].agg(funcs) - exp_cols = Index(['mean', 'max', 'min']) - - self.assert_index_equal(result.columns, exp_cols) - - def test_multiple_functions_tuples_and_non_tuples(self): - # #1359 - - funcs = [('foo', 'mean'), 'std'] - ex_funcs = [('foo', 'mean'), ('std', 'std')] - - result = self.df.groupby('A')['C'].agg(funcs) - expected = self.df.groupby('A')['C'].agg(ex_funcs) 
- assert_frame_equal(result, expected) - - result = self.df.groupby('A').agg(funcs) - expected = self.df.groupby('A').agg(ex_funcs) - assert_frame_equal(result, expected) - - def test_agg_multiple_functions_too_many_lambdas(self): - grouped = self.df.groupby('A') - funcs = ['mean', lambda x: x.mean(), lambda x: x.std()] - - self.assertRaises(SpecificationError, grouped.agg, funcs) - - def test_more_flexible_frame_multi_function(self): - from pandas import concat - - grouped = self.df.groupby('A') - - exmean = grouped.agg(OrderedDict([['C', np.mean], ['D', np.mean]])) - exstd = grouped.agg(OrderedDict([['C', np.std], ['D', np.std]])) - - expected = concat([exmean, exstd], keys=['mean', 'std'], axis=1) - expected = expected.swaplevel(0, 1, axis=1).sort_index(level=0, axis=1) - - d = OrderedDict([['C', [np.mean, np.std]], ['D', [np.mean, np.std]]]) - result = grouped.aggregate(d) - - assert_frame_equal(result, expected) - - # be careful - result = grouped.aggregate(OrderedDict([['C', np.mean], - ['D', [np.mean, np.std]]])) - expected = grouped.aggregate(OrderedDict([['C', np.mean], - ['D', [np.mean, np.std]]])) - assert_frame_equal(result, expected) - - def foo(x): - return np.mean(x) - - def bar(x): - return np.std(x, ddof=1) - - d = OrderedDict([['C', np.mean], ['D', OrderedDict( - [['foo', np.mean], ['bar', np.std]])]]) - result = grouped.aggregate(d) - - d = OrderedDict([['C', [np.mean]], ['D', [foo, bar]]]) - expected = grouped.aggregate(d) - - assert_frame_equal(result, expected) - - def test_multi_function_flexible_mix(self): - # GH #1268 - grouped = self.df.groupby('A') - - d = OrderedDict([['C', OrderedDict([['foo', 'mean'], [ - 'bar', 'std' - ]])], ['D', 'sum']]) - result = grouped.aggregate(d) - d2 = OrderedDict([['C', OrderedDict([['foo', 'mean'], [ - 'bar', 'std' - ]])], ['D', ['sum']]]) - result2 = grouped.aggregate(d2) - - d3 = OrderedDict([['C', OrderedDict([['foo', 'mean'], [ - 'bar', 'std' - ]])], ['D', {'sum': 'sum'}]]) - expected = grouped.aggregate(d3) - - assert_frame_equal(result, expected) - assert_frame_equal(result2, expected) - - def test_agg_callables(self): - # GH 7929 - df = DataFrame({'foo': [1, 2], 'bar': [3, 4]}).astype(np.int64) - - class fn_class(object): - - def __call__(self, x): - return sum(x) - - equiv_callables = [sum, np.sum, lambda x: sum(x), lambda x: x.sum(), - partial(sum), fn_class()] - - expected = df.groupby("foo").agg(sum) - for ecall in equiv_callables: - result = df.groupby('foo').agg(ecall) - assert_frame_equal(result, expected) - def test_set_group_name(self): def f(group): assert group.name is not None @@ -3972,106 +3039,30 @@ def _check_all(grouped): _check_all(self.df.groupby('A')) _check_all(self.df.groupby(['A', 'B'])) - def test_no_dummy_key_names(self): - # GH #1291 + def test_group_name_available_in_inference_pass(self): + # gh-15062 + df = pd.DataFrame({'a': [0, 0, 1, 1, 2, 2], 'b': np.arange(6)}) + + names = [] + + def f(group): + names.append(group.name) + return group.copy() + df.groupby('a', sort=False, group_keys=False).apply(f) + # we expect 2 zeros because we call ``f`` once to see if a faster route + # can be used. 
+ expected_names = [0, 0, 1, 2] + assert names == expected_names + + def test_no_dummy_key_names(self): + # see gh-1291 result = self.df.groupby(self.df['A'].values).sum() - self.assertIsNone(result.index.name) + assert result.index.name is None result = self.df.groupby([self.df['A'].values, self.df['B'].values ]).sum() - self.assertEqual(result.index.names, (None, None)) - - def test_groupby_sort_categorical(self): - # dataframe groupby sort was being ignored # GH 8868 - df = DataFrame([['(7.5, 10]', 10, 10], - ['(7.5, 10]', 8, 20], - ['(2.5, 5]', 5, 30], - ['(5, 7.5]', 6, 40], - ['(2.5, 5]', 4, 50], - ['(0, 2.5]', 1, 60], - ['(5, 7.5]', 7, 70]], columns=['range', 'foo', 'bar']) - df['range'] = Categorical(df['range'], ordered=True) - index = CategoricalIndex(['(0, 2.5]', '(2.5, 5]', '(5, 7.5]', - '(7.5, 10]'], name='range', ordered=True) - result_sort = DataFrame([[1, 60], [5, 30], [6, 40], [10, 10]], - columns=['foo', 'bar'], index=index) - - col = 'range' - assert_frame_equal(result_sort, df.groupby(col, sort=True).first()) - # when categories is ordered, group is ordered by category's order - assert_frame_equal(result_sort, df.groupby(col, sort=False).first()) - - df['range'] = Categorical(df['range'], ordered=False) - index = CategoricalIndex(['(0, 2.5]', '(2.5, 5]', '(5, 7.5]', - '(7.5, 10]'], name='range') - result_sort = DataFrame([[1, 60], [5, 30], [6, 40], [10, 10]], - columns=['foo', 'bar'], index=index) - - index = CategoricalIndex(['(7.5, 10]', '(2.5, 5]', '(5, 7.5]', - '(0, 2.5]'], - categories=['(7.5, 10]', '(2.5, 5]', - '(5, 7.5]', '(0, 2.5]'], - name='range') - result_nosort = DataFrame([[10, 10], [5, 30], [6, 40], [1, 60]], - index=index, columns=['foo', 'bar']) - - col = 'range' - # this is an unordered categorical, but we allow this #### - assert_frame_equal(result_sort, df.groupby(col, sort=True).first()) - assert_frame_equal(result_nosort, df.groupby(col, sort=False).first()) - - def test_groupby_sort_categorical_datetimelike(self): - # GH10505 - - # use same data as test_groupby_sort_categorical, which category is - # corresponding to datetime.month - df = DataFrame({'dt': [datetime(2011, 7, 1), datetime(2011, 7, 1), - datetime(2011, 2, 1), datetime(2011, 5, 1), - datetime(2011, 2, 1), datetime(2011, 1, 1), - datetime(2011, 5, 1)], - 'foo': [10, 8, 5, 6, 4, 1, 7], - 'bar': [10, 20, 30, 40, 50, 60, 70]}, - columns=['dt', 'foo', 'bar']) - - # ordered=True - df['dt'] = Categorical(df['dt'], ordered=True) - index = [datetime(2011, 1, 1), datetime(2011, 2, 1), - datetime(2011, 5, 1), datetime(2011, 7, 1)] - result_sort = DataFrame( - [[1, 60], [5, 30], [6, 40], [10, 10]], columns=['foo', 'bar']) - result_sort.index = CategoricalIndex(index, name='dt', ordered=True) - - index = [datetime(2011, 7, 1), datetime(2011, 2, 1), - datetime(2011, 5, 1), datetime(2011, 1, 1)] - result_nosort = DataFrame([[10, 10], [5, 30], [6, 40], [1, 60]], - columns=['foo', 'bar']) - result_nosort.index = CategoricalIndex(index, categories=index, - name='dt', ordered=True) - - col = 'dt' - assert_frame_equal(result_sort, df.groupby(col, sort=True).first()) - # when categories is ordered, group is ordered by category's order - assert_frame_equal(result_sort, df.groupby(col, sort=False).first()) - - # ordered = False - df['dt'] = Categorical(df['dt'], ordered=False) - index = [datetime(2011, 1, 1), datetime(2011, 2, 1), - datetime(2011, 5, 1), datetime(2011, 7, 1)] - result_sort = DataFrame( - [[1, 60], [5, 30], [6, 40], [10, 10]], columns=['foo', 'bar']) - result_sort.index = 
CategoricalIndex(index, name='dt') - - index = [datetime(2011, 7, 1), datetime(2011, 2, 1), - datetime(2011, 5, 1), datetime(2011, 1, 1)] - result_nosort = DataFrame([[10, 10], [5, 30], [6, 40], [1, 60]], - columns=['foo', 'bar']) - result_nosort.index = CategoricalIndex(index, categories=index, - name='dt') - - col = 'dt' - assert_frame_equal(result_sort, df.groupby(col, sort=True).first()) - assert_frame_equal(result_nosort, df.groupby(col, sort=False).first()) + assert result.index.names == (None, None) def test_groupby_sort_multiindex_series(self): # series multiindex groupby sort argument was not being passed through @@ -4090,163 +3081,7 @@ def test_groupby_sort_multiindex_series(self): result = mseries.groupby(level=['a', 'b'], sort=True).first() assert_series_equal(result, mseries_result.sort_index()) - def test_groupby_groups_datetimeindex(self): - # #1430 - from pandas.tseries.api import DatetimeIndex - periods = 1000 - ind = DatetimeIndex(start='2012/1/1', freq='5min', periods=periods) - df = DataFrame({'high': np.arange(periods), - 'low': np.arange(periods)}, index=ind) - grouped = df.groupby(lambda x: datetime(x.year, x.month, x.day)) - - # it works! - groups = grouped.groups - tm.assertIsInstance(list(groups.keys())[0], datetime) - - # GH 11442 - index = pd.date_range('2015/01/01', periods=5, name='date') - df = pd.DataFrame({'A': [5, 6, 7, 8, 9], - 'B': [1, 2, 3, 4, 5]}, index=index) - result = df.groupby(level='date').groups - dates = ['2015-01-05', '2015-01-04', '2015-01-03', - '2015-01-02', '2015-01-01'] - expected = {pd.Timestamp(date): pd.DatetimeIndex([date], name='date') - for date in dates} - tm.assert_dict_equal(result, expected) - - grouped = df.groupby(level='date') - for date in dates: - result = grouped.get_group(date) - data = [[df.loc[date, 'A'], df.loc[date, 'B']]] - expected_index = pd.DatetimeIndex([date], name='date') - expected = pd.DataFrame(data, - columns=list('AB'), - index=expected_index) - tm.assert_frame_equal(result, expected) - - def test_groupby_groups_datetimeindex_tz(self): - # GH 3950 - dates = ['2011-07-19 07:00:00', '2011-07-19 08:00:00', - '2011-07-19 09:00:00', '2011-07-19 07:00:00', - '2011-07-19 08:00:00', '2011-07-19 09:00:00'] - df = DataFrame({'label': ['a', 'a', 'a', 'b', 'b', 'b'], - 'datetime': dates, - 'value1': np.arange(6, dtype='int64'), - 'value2': [1, 2] * 3}) - df['datetime'] = df['datetime'].apply( - lambda d: Timestamp(d, tz='US/Pacific')) - - exp_idx1 = pd.DatetimeIndex(['2011-07-19 07:00:00', - '2011-07-19 07:00:00', - '2011-07-19 08:00:00', - '2011-07-19 08:00:00', - '2011-07-19 09:00:00', - '2011-07-19 09:00:00'], - tz='US/Pacific', name='datetime') - exp_idx2 = Index(['a', 'b'] * 3, name='label') - exp_idx = MultiIndex.from_arrays([exp_idx1, exp_idx2]) - expected = DataFrame({'value1': [0, 3, 1, 4, 2, 5], - 'value2': [1, 2, 2, 1, 1, 2]}, - index=exp_idx, columns=['value1', 'value2']) - - result = df.groupby(['datetime', 'label']).sum() - assert_frame_equal(result, expected) - - # by level - didx = pd.DatetimeIndex(dates, tz='Asia/Tokyo') - df = DataFrame({'value1': np.arange(6, dtype='int64'), - 'value2': [1, 2, 3, 1, 2, 3]}, - index=didx) - - exp_idx = pd.DatetimeIndex(['2011-07-19 07:00:00', - '2011-07-19 08:00:00', - '2011-07-19 09:00:00'], tz='Asia/Tokyo') - expected = DataFrame({'value1': [3, 5, 7], 'value2': [2, 4, 6]}, - index=exp_idx, columns=['value1', 'value2']) - - result = df.groupby(level=0).sum() - assert_frame_equal(result, expected) - - def test_groupby_multi_timezone(self): - - # combining multiple / 
different timezones yields UTC - - data = """0,2000-01-28 16:47:00,America/Chicago -1,2000-01-29 16:48:00,America/Chicago -2,2000-01-30 16:49:00,America/Los_Angeles -3,2000-01-31 16:50:00,America/Chicago -4,2000-01-01 16:50:00,America/New_York""" - - df = pd.read_csv(StringIO(data), header=None, - names=['value', 'date', 'tz']) - result = df.groupby('tz').date.apply( - lambda x: pd.to_datetime(x).dt.tz_localize(x.name)) - - expected = Series([Timestamp('2000-01-28 16:47:00-0600', - tz='America/Chicago'), - Timestamp('2000-01-29 16:48:00-0600', - tz='America/Chicago'), - Timestamp('2000-01-30 16:49:00-0800', - tz='America/Los_Angeles'), - Timestamp('2000-01-31 16:50:00-0600', - tz='America/Chicago'), - Timestamp('2000-01-01 16:50:00-0500', - tz='America/New_York')], - name='date', - dtype=object) - assert_series_equal(result, expected) - - tz = 'America/Chicago' - res_values = df.groupby('tz').date.get_group(tz) - result = pd.to_datetime(res_values).dt.tz_localize(tz) - exp_values = Series(['2000-01-28 16:47:00', '2000-01-29 16:48:00', - '2000-01-31 16:50:00'], - index=[0, 1, 3], name='date') - expected = pd.to_datetime(exp_values).dt.tz_localize(tz) - assert_series_equal(result, expected) - - def test_groupby_groups_periods(self): - dates = ['2011-07-19 07:00:00', '2011-07-19 08:00:00', - '2011-07-19 09:00:00', '2011-07-19 07:00:00', - '2011-07-19 08:00:00', '2011-07-19 09:00:00'] - df = DataFrame({'label': ['a', 'a', 'a', 'b', 'b', 'b'], - 'period': [pd.Period(d, freq='H') for d in dates], - 'value1': np.arange(6, dtype='int64'), - 'value2': [1, 2] * 3}) - - exp_idx1 = pd.PeriodIndex(['2011-07-19 07:00:00', - '2011-07-19 07:00:00', - '2011-07-19 08:00:00', - '2011-07-19 08:00:00', - '2011-07-19 09:00:00', - '2011-07-19 09:00:00'], - freq='H', name='period') - exp_idx2 = Index(['a', 'b'] * 3, name='label') - exp_idx = MultiIndex.from_arrays([exp_idx1, exp_idx2]) - expected = DataFrame({'value1': [0, 3, 1, 4, 2, 5], - 'value2': [1, 2, 2, 1, 1, 2]}, - index=exp_idx, columns=['value1', 'value2']) - - result = df.groupby(['period', 'label']).sum() - assert_frame_equal(result, expected) - - # by level - didx = pd.PeriodIndex(dates, freq='H') - df = DataFrame({'value1': np.arange(6, dtype='int64'), - 'value2': [1, 2, 3, 1, 2, 3]}, - index=didx) - - exp_idx = pd.PeriodIndex(['2011-07-19 07:00:00', - '2011-07-19 08:00:00', - '2011-07-19 09:00:00'], freq='H') - expected = DataFrame({'value1': [3, 5, 7], 'value2': [2, 4, 6]}, - index=exp_idx, columns=['value1', 'value2']) - - result = df.groupby(level=0).sum() - assert_frame_equal(result, expected) - def test_groupby_reindex_inside_function(self): - from pandas.tseries.api import DatetimeIndex periods = 1000 ind = DatetimeIndex(start='2012/1/1', freq='5min', periods=periods) @@ -4285,16 +3120,16 @@ def test_multiindex_columns_empty_level(self): df = DataFrame([[long(1), 'A']], columns=midx) grouped = df.groupby('to filter').groups - self.assertEqual(grouped['A'], [0]) + assert grouped['A'] == [0] grouped = df.groupby([('to filter', '')]).groups - self.assertEqual(grouped['A'], [0]) + assert grouped['A'] == [0] df = DataFrame([[long(1), 'A'], [long(2), 'B']], columns=midx) expected = df.groupby('to filter').groups result = df.groupby([('to filter', '')]).groups - self.assertEqual(result, expected) + assert result == expected df = DataFrame([[long(1), 'A'], [long(2), 'A']], columns=midx) @@ -4330,33 +3165,21 @@ def test_median_empty_bins(self): def test_groupby_non_arithmetic_agg_types(self): # GH9311, GH6620 - df = pd.DataFrame([{'a': 1, - 'b': 1}, 
{'a': 1, - 'b': 2}, {'a': 2, - 'b': 3}, {'a': 2, - 'b': 4}]) + df = pd.DataFrame( + [{'a': 1, 'b': 1}, + {'a': 1, 'b': 2}, + {'a': 2, 'b': 3}, + {'a': 2, 'b': 4}]) dtypes = ['int8', 'int16', 'int32', 'int64', 'float32', 'float64'] - grp_exp = {'first': {'df': [{'a': 1, - 'b': 1}, {'a': 2, - 'b': 3}]}, - 'last': {'df': [{'a': 1, - 'b': 2}, {'a': 2, - 'b': 4}]}, - 'min': {'df': [{'a': 1, - 'b': 1}, {'a': 2, - 'b': 3}]}, - 'max': {'df': [{'a': 1, - 'b': 2}, {'a': 2, - 'b': 4}]}, - 'nth': {'df': [{'a': 1, - 'b': 2}, {'a': 2, - 'b': 4}], + grp_exp = {'first': {'df': [{'a': 1, 'b': 1}, {'a': 2, 'b': 3}]}, + 'last': {'df': [{'a': 1, 'b': 2}, {'a': 2, 'b': 4}]}, + 'min': {'df': [{'a': 1, 'b': 1}, {'a': 2, 'b': 3}]}, + 'max': {'df': [{'a': 1, 'b': 2}, {'a': 2, 'b': 4}]}, + 'nth': {'df': [{'a': 1, 'b': 2}, {'a': 2, 'b': 4}], 'args': [1]}, - 'count': {'df': [{'a': 1, - 'b': 2}, {'a': 2, - 'b': 2}], + 'count': {'df': [{'a': 1, 'b': 2}, {'a': 2, 'b': 2}], 'out_type': 'int64'}} for dtype in dtypes: @@ -4406,38 +3229,7 @@ def test_groupby_non_arithmetic_agg_intlike_precision(self): grpd = df.groupby('a') res = getattr(grpd, method)(*data['args']) - self.assertEqual(res.iloc[0].b, data['expected']) - - def test_groupby_first_datetime64(self): - df = DataFrame([(1, 1351036800000000000), (2, 1351036800000000000)]) - df[1] = df[1].view('M8[ns]') - - self.assertTrue(issubclass(df[1].dtype.type, np.datetime64)) - - result = df.groupby(level=0).first() - got_dt = result[1].dtype - self.assertTrue(issubclass(got_dt.type, np.datetime64)) - - result = df[1].groupby(level=0).first() - got_dt = result.dtype - self.assertTrue(issubclass(got_dt.type, np.datetime64)) - - def test_groupby_max_datetime64(self): - # GH 5869 - # datetimelike dtype conversion from int - df = DataFrame(dict(A=Timestamp('20130101'), B=np.arange(5))) - expected = df.groupby('A')['A'].apply(lambda x: x.max()) - result = df.groupby('A')['A'].max() - assert_series_equal(result, expected) - - def test_groupby_datetime64_32_bit(self): - # GH 6410 / numpy 4328 - # 32-bit under 1.9-dev indexing issue - - df = DataFrame({"A": range(2), "B": [pd.Timestamp('2000-01-1')] * 2}) - result = df.groupby("A")["B"].transform(min) - expected = Series([pd.Timestamp('2000-01-1')] * 2, name='B') - assert_series_equal(result, expected) + assert res.iloc[0].b == data['expected'] def test_groupby_multiindex_missing_pair(self): # GH9049 @@ -4461,7 +3253,7 @@ def test_groupby_multiindex_not_lexsorted(self): lexsorted_mi = MultiIndex.from_tuples( [('a', ''), ('b1', 'c1'), ('b2', 'c2')], names=['b', 'c']) lexsorted_df = DataFrame([[1, 3, 4]], columns=lexsorted_mi) - self.assertTrue(lexsorted_df.columns.is_lexsorted()) + assert lexsorted_df.columns.is_lexsorted() # define the non-lexsorted version not_lexsorted_df = DataFrame(columns=['a', 'b', 'c', 'd'], @@ -4470,13 +3262,13 @@ def test_groupby_multiindex_not_lexsorted(self): not_lexsorted_df = not_lexsorted_df.pivot_table( index='a', columns=['b', 'c'], values='d') not_lexsorted_df = not_lexsorted_df.reset_index() - self.assertFalse(not_lexsorted_df.columns.is_lexsorted()) + assert not not_lexsorted_df.columns.is_lexsorted() # compare the results tm.assert_frame_equal(lexsorted_df, not_lexsorted_df) expected = lexsorted_df.groupby('a').mean() - with tm.assert_produces_warning(com.PerformanceWarning): + with tm.assert_produces_warning(PerformanceWarning): result = not_lexsorted_df.groupby('a').mean() tm.assert_frame_equal(expected, result) @@ -4485,7 +3277,7 @@ def test_groupby_multiindex_not_lexsorted(self): df = 
DataFrame({'x': ['a', 'a', 'b', 'a'], 'y': [1, 1, 2, 2], 'z': [1, 2, 3, 4]}).set_index(['x', 'y']) - self.assertFalse(df.index.is_lexsorted()) + assert not df.index.is_lexsorted() for level in [0, 1, [0, 1]]: for sort in [False, True]: @@ -4543,7 +3335,7 @@ def test_groupby_with_empty(self): index = pd.DatetimeIndex(()) data = () series = pd.Series(data, index) - grouper = pd.tseries.resample.TimeGrouper('D') + grouper = pd.core.resample.TimeGrouper('D') grouped = series.groupby(grouper) assert next(iter(grouped), None) is None @@ -4563,10 +3355,10 @@ def test_groupby_with_small_elem(self): 'change': [1234, 5678]}, index=pd.DatetimeIndex(['2014-09-10', '2013-10-10'])) grouped = df.groupby([pd.TimeGrouper(freq='M'), 'event']) - self.assertEqual(len(grouped.groups), 2) - self.assertEqual(grouped.ngroups, 2) - self.assertIn((pd.Timestamp('2014-09-30'), 'start'), grouped.groups) - self.assertIn((pd.Timestamp('2013-10-31'), 'start'), grouped.groups) + assert len(grouped.groups) == 2 + assert grouped.ngroups == 2 + assert (pd.Timestamp('2014-09-30'), 'start') in grouped.groups + assert (pd.Timestamp('2013-10-31'), 'start') in grouped.groups res = grouped.get_group((pd.Timestamp('2014-09-30'), 'start')) tm.assert_frame_equal(res, df.iloc[[0], :]) @@ -4578,10 +3370,10 @@ def test_groupby_with_small_elem(self): index=pd.DatetimeIndex(['2014-09-10', '2013-10-10', '2014-09-15'])) grouped = df.groupby([pd.TimeGrouper(freq='M'), 'event']) - self.assertEqual(len(grouped.groups), 2) - self.assertEqual(grouped.ngroups, 2) - self.assertIn((pd.Timestamp('2014-09-30'), 'start'), grouped.groups) - self.assertIn((pd.Timestamp('2013-10-31'), 'start'), grouped.groups) + assert len(grouped.groups) == 2 + assert grouped.ngroups == 2 + assert (pd.Timestamp('2014-09-30'), 'start') in grouped.groups + assert (pd.Timestamp('2013-10-31'), 'start') in grouped.groups res = grouped.get_group((pd.Timestamp('2014-09-30'), 'start')) tm.assert_frame_equal(res, df.iloc[[0, 2], :]) @@ -4594,11 +3386,11 @@ def test_groupby_with_small_elem(self): index=pd.DatetimeIndex(['2014-09-10', '2013-10-10', '2014-08-05'])) grouped = df.groupby([pd.TimeGrouper(freq='M'), 'event']) - self.assertEqual(len(grouped.groups), 3) - self.assertEqual(grouped.ngroups, 3) - self.assertIn((pd.Timestamp('2014-09-30'), 'start'), grouped.groups) - self.assertIn((pd.Timestamp('2013-10-31'), 'start'), grouped.groups) - self.assertIn((pd.Timestamp('2014-08-31'), 'start'), grouped.groups) + assert len(grouped.groups) == 3 + assert grouped.ngroups == 3 + assert (pd.Timestamp('2014-09-30'), 'start') in grouped.groups + assert (pd.Timestamp('2013-10-31'), 'start') in grouped.groups + assert (pd.Timestamp('2014-08-31'), 'start') in grouped.groups res = grouped.get_group((pd.Timestamp('2014-09-30'), 'start')) tm.assert_frame_equal(res, df.iloc[[0], :]) @@ -4607,435 +3399,6 @@ def test_groupby_with_small_elem(self): res = grouped.get_group((pd.Timestamp('2014-08-31'), 'start')) tm.assert_frame_equal(res, df.iloc[[2], :]) - def test_groupby_with_timezone_selection(self): - # GH 11616 - # Test that column selection returns output in correct timezone. 
- np.random.seed(42) - df = pd.DataFrame({ - 'factor': np.random.randint(0, 3, size=60), - 'time': pd.date_range('01/01/2000 00:00', periods=60, - freq='s', tz='UTC') - }) - df1 = df.groupby('factor').max()['time'] - df2 = df.groupby('factor')['time'].max() - tm.assert_series_equal(df1, df2) - - def test_timezone_info(self): - # GH 11682 - # Timezone info lost when broadcasting scalar datetime to DataFrame - tm._skip_if_no_pytz() - import pytz - - df = pd.DataFrame({'a': [1], 'b': [datetime.now(pytz.utc)]}) - self.assertEqual(df['b'][0].tzinfo, pytz.utc) - df = pd.DataFrame({'a': [1, 2, 3]}) - df['b'] = datetime.now(pytz.utc) - self.assertEqual(df['b'][0].tzinfo, pytz.utc) - - def test_groupby_with_timegrouper(self): - # GH 4161 - # TimeGrouper requires a sorted index - # also verifies that the resultant index has the correct name - import datetime as DT - df_original = DataFrame({ - 'Buyer': 'Carl Carl Carl Carl Joe Carl'.split(), - 'Quantity': [18, 3, 5, 1, 9, 3], - 'Date': [ - DT.datetime(2013, 9, 1, 13, 0), - DT.datetime(2013, 9, 1, 13, 5), - DT.datetime(2013, 10, 1, 20, 0), - DT.datetime(2013, 10, 3, 10, 0), - DT.datetime(2013, 12, 2, 12, 0), - DT.datetime(2013, 9, 2, 14, 0), - ] - }) - - # GH 6908 change target column's order - df_reordered = df_original.sort_values(by='Quantity') - - for df in [df_original, df_reordered]: - df = df.set_index(['Date']) - - expected = DataFrame( - {'Quantity': np.nan}, - index=date_range('20130901 13:00:00', - '20131205 13:00:00', freq='5D', - name='Date', closed='left')) - expected.iloc[[0, 6, 18], 0] = np.array( - [24., 6., 9.], dtype='float64') - - result1 = df.resample('5D') .sum() - assert_frame_equal(result1, expected) - - df_sorted = df.sort_index() - result2 = df_sorted.groupby(pd.TimeGrouper(freq='5D')).sum() - assert_frame_equal(result2, expected) - - result3 = df.groupby(pd.TimeGrouper(freq='5D')).sum() - assert_frame_equal(result3, expected) - - def test_groupby_with_timegrouper_methods(self): - # GH 3881 - # make sure API of timegrouper conforms - - import datetime as DT - df_original = pd.DataFrame({ - 'Branch': 'A A A A A B'.split(), - 'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(), - 'Quantity': [1, 3, 5, 8, 9, 3], - 'Date': [ - DT.datetime(2013, 1, 1, 13, 0), - DT.datetime(2013, 1, 1, 13, 5), - DT.datetime(2013, 10, 1, 20, 0), - DT.datetime(2013, 10, 2, 10, 0), - DT.datetime(2013, 12, 2, 12, 0), - DT.datetime(2013, 12, 2, 14, 0), - ] - }) - - df_sorted = df_original.sort_values(by='Quantity', ascending=False) - - for df in [df_original, df_sorted]: - df = df.set_index('Date', drop=False) - g = df.groupby(pd.TimeGrouper('6M')) - self.assertTrue(g.group_keys) - self.assertTrue(isinstance(g.grouper, pd.core.groupby.BinGrouper)) - groups = g.groups - self.assertTrue(isinstance(groups, dict)) - self.assertTrue(len(groups) == 3) - - def test_timegrouper_with_reg_groups(self): - - # GH 3794 - # allow combinateion of timegrouper/reg groups - - import datetime as DT - - df_original = DataFrame({ - 'Branch': 'A A A A A A A B'.split(), - 'Buyer': 'Carl Mark Carl Carl Joe Joe Joe Carl'.split(), - 'Quantity': [1, 3, 5, 1, 8, 1, 9, 3], - 'Date': [ - DT.datetime(2013, 1, 1, 13, 0), - DT.datetime(2013, 1, 1, 13, 5), - DT.datetime(2013, 10, 1, 20, 0), - DT.datetime(2013, 10, 2, 10, 0), - DT.datetime(2013, 10, 1, 20, 0), - DT.datetime(2013, 10, 2, 10, 0), - DT.datetime(2013, 12, 2, 12, 0), - DT.datetime(2013, 12, 2, 14, 0), - ] - }).set_index('Date') - - df_sorted = df_original.sort_values(by='Quantity', ascending=False) - - for df in [df_original, 
df_sorted]: - expected = DataFrame({ - 'Buyer': 'Carl Joe Mark'.split(), - 'Quantity': [10, 18, 3], - 'Date': [ - DT.datetime(2013, 12, 31, 0, 0), - DT.datetime(2013, 12, 31, 0, 0), - DT.datetime(2013, 12, 31, 0, 0), - ] - }).set_index(['Date', 'Buyer']) - - result = df.groupby([pd.Grouper(freq='A'), 'Buyer']).sum() - assert_frame_equal(result, expected) - - expected = DataFrame({ - 'Buyer': 'Carl Mark Carl Joe'.split(), - 'Quantity': [1, 3, 9, 18], - 'Date': [ - DT.datetime(2013, 1, 1, 0, 0), - DT.datetime(2013, 1, 1, 0, 0), - DT.datetime(2013, 7, 1, 0, 0), - DT.datetime(2013, 7, 1, 0, 0), - ] - }).set_index(['Date', 'Buyer']) - result = df.groupby([pd.Grouper(freq='6MS'), 'Buyer']).sum() - assert_frame_equal(result, expected) - - df_original = DataFrame({ - 'Branch': 'A A A A A A A B'.split(), - 'Buyer': 'Carl Mark Carl Carl Joe Joe Joe Carl'.split(), - 'Quantity': [1, 3, 5, 1, 8, 1, 9, 3], - 'Date': [ - DT.datetime(2013, 10, 1, 13, 0), - DT.datetime(2013, 10, 1, 13, 5), - DT.datetime(2013, 10, 1, 20, 0), - DT.datetime(2013, 10, 2, 10, 0), - DT.datetime(2013, 10, 1, 20, 0), - DT.datetime(2013, 10, 2, 10, 0), - DT.datetime(2013, 10, 2, 12, 0), - DT.datetime(2013, 10, 2, 14, 0), - ] - }).set_index('Date') - - df_sorted = df_original.sort_values(by='Quantity', ascending=False) - for df in [df_original, df_sorted]: - - expected = DataFrame({ - 'Buyer': 'Carl Joe Mark Carl Joe'.split(), - 'Quantity': [6, 8, 3, 4, 10], - 'Date': [ - DT.datetime(2013, 10, 1, 0, 0), - DT.datetime(2013, 10, 1, 0, 0), - DT.datetime(2013, 10, 1, 0, 0), - DT.datetime(2013, 10, 2, 0, 0), - DT.datetime(2013, 10, 2, 0, 0), - ] - }).set_index(['Date', 'Buyer']) - - result = df.groupby([pd.Grouper(freq='1D'), 'Buyer']).sum() - assert_frame_equal(result, expected) - - result = df.groupby([pd.Grouper(freq='1M'), 'Buyer']).sum() - expected = DataFrame({ - 'Buyer': 'Carl Joe Mark'.split(), - 'Quantity': [10, 18, 3], - 'Date': [ - DT.datetime(2013, 10, 31, 0, 0), - DT.datetime(2013, 10, 31, 0, 0), - DT.datetime(2013, 10, 31, 0, 0), - ] - }).set_index(['Date', 'Buyer']) - assert_frame_equal(result, expected) - - # passing the name - df = df.reset_index() - result = df.groupby([pd.Grouper(freq='1M', key='Date'), 'Buyer' - ]).sum() - assert_frame_equal(result, expected) - - with self.assertRaises(KeyError): - df.groupby([pd.Grouper(freq='1M', key='foo'), 'Buyer']).sum() - - # passing the level - df = df.set_index('Date') - result = df.groupby([pd.Grouper(freq='1M', level='Date'), 'Buyer' - ]).sum() - assert_frame_equal(result, expected) - result = df.groupby([pd.Grouper(freq='1M', level=0), 'Buyer']).sum( - ) - assert_frame_equal(result, expected) - - with self.assertRaises(ValueError): - df.groupby([pd.Grouper(freq='1M', level='foo'), - 'Buyer']).sum() - - # multi names - df = df.copy() - df['Date'] = df.index + pd.offsets.MonthEnd(2) - result = df.groupby([pd.Grouper(freq='1M', key='Date'), 'Buyer' - ]).sum() - expected = DataFrame({ - 'Buyer': 'Carl Joe Mark'.split(), - 'Quantity': [10, 18, 3], - 'Date': [ - DT.datetime(2013, 11, 30, 0, 0), - DT.datetime(2013, 11, 30, 0, 0), - DT.datetime(2013, 11, 30, 0, 0), - ] - }).set_index(['Date', 'Buyer']) - assert_frame_equal(result, expected) - - # error as we have both a level and a name! 
- with self.assertRaises(ValueError): - df.groupby([pd.Grouper(freq='1M', key='Date', - level='Date'), 'Buyer']).sum() - - # single groupers - expected = DataFrame({'Quantity': [31], - 'Date': [DT.datetime(2013, 10, 31, 0, 0) - ]}).set_index('Date') - result = df.groupby(pd.Grouper(freq='1M')).sum() - assert_frame_equal(result, expected) - - result = df.groupby([pd.Grouper(freq='1M')]).sum() - assert_frame_equal(result, expected) - - expected = DataFrame({'Quantity': [31], - 'Date': [DT.datetime(2013, 11, 30, 0, 0) - ]}).set_index('Date') - result = df.groupby(pd.Grouper(freq='1M', key='Date')).sum() - assert_frame_equal(result, expected) - - result = df.groupby([pd.Grouper(freq='1M', key='Date')]).sum() - assert_frame_equal(result, expected) - - # GH 6764 multiple grouping with/without sort - df = DataFrame({ - 'date': pd.to_datetime([ - '20121002', '20121007', '20130130', '20130202', '20130305', - '20121002', '20121207', '20130130', '20130202', '20130305', - '20130202', '20130305' - ]), - 'user_id': [1, 1, 1, 1, 1, 3, 3, 3, 5, 5, 5, 5], - 'whole_cost': [1790, 364, 280, 259, 201, 623, 90, 312, 359, 301, - 359, 801], - 'cost1': [12, 15, 10, 24, 39, 1, 0, 90, 45, 34, 1, 12] - }).set_index('date') - - for freq in ['D', 'M', 'A', 'Q-APR']: - expected = df.groupby('user_id')[ - 'whole_cost'].resample( - freq).sum().dropna().reorder_levels( - ['date', 'user_id']).sort_index().astype('int64') - expected.name = 'whole_cost' - - result1 = df.sort_index().groupby([pd.TimeGrouper(freq=freq), - 'user_id'])['whole_cost'].sum() - assert_series_equal(result1, expected) - - result2 = df.groupby([pd.TimeGrouper(freq=freq), 'user_id'])[ - 'whole_cost'].sum() - assert_series_equal(result2, expected) - - def test_timegrouper_get_group(self): - # GH 6914 - - df_original = DataFrame({ - 'Buyer': 'Carl Joe Joe Carl Joe Carl'.split(), - 'Quantity': [18, 3, 5, 1, 9, 3], - 'Date': [datetime(2013, 9, 1, 13, 0), - datetime(2013, 9, 1, 13, 5), - datetime(2013, 10, 1, 20, 0), - datetime(2013, 10, 3, 10, 0), - datetime(2013, 12, 2, 12, 0), - datetime(2013, 9, 2, 14, 0), ] - }) - df_reordered = df_original.sort_values(by='Quantity') - - # single grouping - expected_list = [df_original.iloc[[0, 1, 5]], df_original.iloc[[2, 3]], - df_original.iloc[[4]]] - dt_list = ['2013-09-30', '2013-10-31', '2013-12-31'] - - for df in [df_original, df_reordered]: - grouped = df.groupby(pd.Grouper(freq='M', key='Date')) - for t, expected in zip(dt_list, expected_list): - dt = pd.Timestamp(t) - result = grouped.get_group(dt) - assert_frame_equal(result, expected) - - # multiple grouping - expected_list = [df_original.iloc[[1]], df_original.iloc[[3]], - df_original.iloc[[4]]] - g_list = [('Joe', '2013-09-30'), ('Carl', '2013-10-31'), - ('Joe', '2013-12-31')] - - for df in [df_original, df_reordered]: - grouped = df.groupby(['Buyer', pd.Grouper(freq='M', key='Date')]) - for (b, t), expected in zip(g_list, expected_list): - dt = pd.Timestamp(t) - result = grouped.get_group((b, dt)) - assert_frame_equal(result, expected) - - # with index - df_original = df_original.set_index('Date') - df_reordered = df_original.sort_values(by='Quantity') - - expected_list = [df_original.iloc[[0, 1, 5]], df_original.iloc[[2, 3]], - df_original.iloc[[4]]] - - for df in [df_original, df_reordered]: - grouped = df.groupby(pd.Grouper(freq='M')) - for t, expected in zip(dt_list, expected_list): - dt = pd.Timestamp(t) - result = grouped.get_group(dt) - assert_frame_equal(result, expected) - - def test_timegrouper_apply_return_type_series(self): - # Using `apply` 
with the `TimeGrouper` should give the - # same return type as an `apply` with a `Grouper`. - # Issue #11742 - df = pd.DataFrame({'date': ['10/10/2000', '11/10/2000'], - 'value': [10, 13]}) - df_dt = df.copy() - df_dt['date'] = pd.to_datetime(df_dt['date']) - - def sumfunc_series(x): - return pd.Series([x['value'].sum()], ('sum',)) - - expected = df.groupby(pd.Grouper(key='date')).apply(sumfunc_series) - result = (df_dt.groupby(pd.TimeGrouper(freq='M', key='date')) - .apply(sumfunc_series)) - assert_frame_equal(result.reset_index(drop=True), - expected.reset_index(drop=True)) - - def test_timegrouper_apply_return_type_value(self): - # Using `apply` with the `TimeGrouper` should give the - # same return type as an `apply` with a `Grouper`. - # Issue #11742 - df = pd.DataFrame({'date': ['10/10/2000', '11/10/2000'], - 'value': [10, 13]}) - df_dt = df.copy() - df_dt['date'] = pd.to_datetime(df_dt['date']) - - def sumfunc_value(x): - return x.value.sum() - - expected = df.groupby(pd.Grouper(key='date')).apply(sumfunc_value) - result = (df_dt.groupby(pd.TimeGrouper(freq='M', key='date')) - .apply(sumfunc_value)) - assert_series_equal(result.reset_index(drop=True), - expected.reset_index(drop=True)) - - def test_cumcount(self): - df = DataFrame([['a'], ['a'], ['a'], ['b'], ['a']], columns=['A']) - g = df.groupby('A') - sg = g.A - - expected = Series([0, 1, 2, 0, 3]) - - assert_series_equal(expected, g.cumcount()) - assert_series_equal(expected, sg.cumcount()) - - def test_cumcount_empty(self): - ge = DataFrame().groupby(level=0) - se = Series().groupby(level=0) - - # edge case, as this is usually considered float - e = Series(dtype='int64') - - assert_series_equal(e, ge.cumcount()) - assert_series_equal(e, se.cumcount()) - - def test_cumcount_dupe_index(self): - df = DataFrame([['a'], ['a'], ['a'], ['b'], ['a']], columns=['A'], - index=[0] * 5) - g = df.groupby('A') - sg = g.A - - expected = Series([0, 1, 2, 0, 3], index=[0] * 5) - - assert_series_equal(expected, g.cumcount()) - assert_series_equal(expected, sg.cumcount()) - - def test_cumcount_mi(self): - mi = MultiIndex.from_tuples([[0, 1], [1, 2], [2, 2], [2, 2], [1, 0]]) - df = DataFrame([['a'], ['a'], ['a'], ['b'], ['a']], columns=['A'], - index=mi) - g = df.groupby('A') - sg = g.A - - expected = Series([0, 1, 2, 0, 3], index=mi) - - assert_series_equal(expected, g.cumcount()) - assert_series_equal(expected, sg.cumcount()) - - def test_cumcount_groupby_not_col(self): - df = DataFrame([['a'], ['a'], ['a'], ['b'], ['a']], columns=['A'], - index=[0] * 5) - g = df.groupby([0, 0, 0, 1, 0]) - sg = g.A - - expected = Series([0, 1, 2, 0, 3], index=[0] * 5) - - assert_series_equal(expected, g.cumcount()) - assert_series_equal(expected, sg.cumcount()) - def test_fill_constistency(self): # GH9221 @@ -5082,344 +3445,6 @@ def test_index_label_overlaps_location(self): expected = ser.take([1, 3, 4]) assert_series_equal(actual, expected) - def test_groupby_selection_with_methods(self): - # some methods which require DatetimeIndex - rng = pd.date_range('2014', periods=len(self.df)) - self.df.index = rng - - g = self.df.groupby(['A'])[['C']] - g_exp = self.df[['C']].groupby(self.df['A']) - # TODO check groupby with > 1 col ? 
- - # methods which are called as .foo() - methods = ['count', - 'corr', - 'cummax', - 'cummin', - 'cumprod', - 'describe', - 'rank', - 'quantile', - 'diff', - 'shift', - 'all', - 'any', - 'idxmin', - 'idxmax', - 'ffill', - 'bfill', - 'pct_change', - 'tshift'] - - for m in methods: - res = getattr(g, m)() - exp = getattr(g_exp, m)() - assert_frame_equal(res, exp) # should always be frames! - - # methods which aren't just .foo() - assert_frame_equal(g.fillna(0), g_exp.fillna(0)) - assert_frame_equal(g.dtypes, g_exp.dtypes) - assert_frame_equal(g.apply(lambda x: x.sum()), - g_exp.apply(lambda x: x.sum())) - - assert_frame_equal(g.resample('D').mean(), g_exp.resample('D').mean()) - assert_frame_equal(g.resample('D').ohlc(), - g_exp.resample('D').ohlc()) - - assert_frame_equal(g.filter(lambda x: len(x) == 3), - g_exp.filter(lambda x: len(x) == 3)) - - def test_groupby_whitelist(self): - from string import ascii_lowercase - letters = np.array(list(ascii_lowercase)) - N = 10 - random_letters = letters.take(np.random.randint(0, 26, N)) - df = DataFrame({'floats': N / 10 * Series(np.random.random(N)), - 'letters': Series(random_letters)}) - s = df.floats - - df_whitelist = frozenset([ - 'last', - 'first', - 'mean', - 'sum', - 'min', - 'max', - 'head', - 'tail', - 'cumcount', - 'resample', - 'describe', - 'rank', - 'quantile', - 'fillna', - 'mad', - 'any', - 'all', - 'take', - 'idxmax', - 'idxmin', - 'shift', - 'tshift', - 'ffill', - 'bfill', - 'pct_change', - 'skew', - 'plot', - 'boxplot', - 'hist', - 'median', - 'dtypes', - 'corrwith', - 'corr', - 'cov', - 'diff', - ]) - s_whitelist = frozenset([ - 'last', - 'first', - 'mean', - 'sum', - 'min', - 'max', - 'head', - 'tail', - 'cumcount', - 'resample', - 'describe', - 'rank', - 'quantile', - 'fillna', - 'mad', - 'any', - 'all', - 'take', - 'idxmax', - 'idxmin', - 'shift', - 'tshift', - 'ffill', - 'bfill', - 'pct_change', - 'skew', - 'plot', - 'hist', - 'median', - 'dtype', - 'corr', - 'cov', - 'diff', - 'unique', - # 'nlargest', 'nsmallest', - ]) - - for obj, whitelist in zip((df, s), (df_whitelist, s_whitelist)): - gb = obj.groupby(df.letters) - self.assertEqual(whitelist, gb._apply_whitelist) - for m in whitelist: - getattr(type(gb), m) - - AGG_FUNCTIONS = ['sum', 'prod', 'min', 'max', 'median', 'mean', 'skew', - 'mad', 'std', 'var', 'sem'] - AGG_FUNCTIONS_WITH_SKIPNA = ['skew', 'mad'] - - def test_groupby_whitelist_deprecations(self): - from string import ascii_lowercase - letters = np.array(list(ascii_lowercase)) - N = 10 - random_letters = letters.take(np.random.randint(0, 26, N)) - df = DataFrame({'floats': N / 10 * Series(np.random.random(N)), - 'letters': Series(random_letters)}) - - # 10711 deprecated - with tm.assert_produces_warning(FutureWarning): - df.groupby('letters').irow(0) - with tm.assert_produces_warning(FutureWarning): - df.groupby('letters').floats.irow(0) - - def test_regression_whitelist_methods(self): - - # GH6944 - # explicity test the whitelest methods - index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', - 'three']], - labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], - [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], - names=['first', 'second']) - raw_frame = DataFrame(np.random.randn(10, 3), index=index, - columns=Index(['A', 'B', 'C'], name='exp')) - raw_frame.iloc[1, [1, 2]] = np.nan - raw_frame.iloc[7, [0, 1]] = np.nan - - for op, level, axis, skipna in cart_product(self.AGG_FUNCTIONS, - lrange(2), lrange(2), - [True, False]): - - if axis == 0: - frame = raw_frame - else: - frame = raw_frame.T - - if op in 
self.AGG_FUNCTIONS_WITH_SKIPNA: - grouped = frame.groupby(level=level, axis=axis) - result = getattr(grouped, op)(skipna=skipna) - expected = getattr(frame, op)(level=level, axis=axis, - skipna=skipna) - assert_frame_equal(result, expected) - else: - grouped = frame.groupby(level=level, axis=axis) - result = getattr(grouped, op)() - expected = getattr(frame, op)(level=level, axis=axis) - assert_frame_equal(result, expected) - - def test_groupby_blacklist(self): - from string import ascii_lowercase - letters = np.array(list(ascii_lowercase)) - N = 10 - random_letters = letters.take(np.random.randint(0, 26, N)) - df = DataFrame({'floats': N / 10 * Series(np.random.random(N)), - 'letters': Series(random_letters)}) - s = df.floats - - blacklist = [ - 'eval', 'query', 'abs', 'where', - 'mask', 'align', 'groupby', 'clip', 'astype', - 'at', 'combine', 'consolidate', 'convert_objects', - ] - to_methods = [method for method in dir(df) if method.startswith('to_')] - - blacklist.extend(to_methods) - - # e.g., to_csv - defined_but_not_allowed = ("(?:^Cannot.+{0!r}.+{1!r}.+try using the " - "'apply' method$)") - - # e.g., query, eval - not_defined = "(?:^{1!r} object has no attribute {0!r}$)" - fmt = defined_but_not_allowed + '|' + not_defined - for bl in blacklist: - for obj in (df, s): - gb = obj.groupby(df.letters) - msg = fmt.format(bl, type(gb).__name__) - with tm.assertRaisesRegexp(AttributeError, msg): - getattr(gb, bl) - - def test_tab_completion(self): - grp = self.mframe.groupby(level='second') - results = set([v for v in dir(grp) if not v.startswith('_')]) - expected = set( - ['A', 'B', 'C', 'agg', 'aggregate', 'apply', 'boxplot', 'filter', - 'first', 'get_group', 'groups', 'hist', 'indices', 'last', 'max', - 'mean', 'median', 'min', 'name', 'ngroups', 'nth', 'ohlc', 'plot', - 'prod', 'size', 'std', 'sum', 'transform', 'var', 'sem', 'count', - 'nunique', 'head', 'irow', 'describe', 'cummax', 'quantile', - 'rank', 'cumprod', 'tail', 'resample', 'cummin', 'fillna', - 'cumsum', 'cumcount', 'all', 'shift', 'skew', 'bfill', 'ffill', - 'take', 'tshift', 'pct_change', 'any', 'mad', 'corr', 'corrwith', - 'cov', 'dtypes', 'ndim', 'diff', 'idxmax', 'idxmin', - 'ffill', 'bfill', 'pad', 'backfill', 'rolling', 'expanding']) - self.assertEqual(results, expected) - - def test_lexsort_indexer(self): - keys = [[nan] * 5 + list(range(100)) + [nan] * 5] - # orders=True, na_position='last' - result = _lexsort_indexer(keys, orders=True, na_position='last') - exp = list(range(5, 105)) + list(range(5)) + list(range(105, 110)) - tm.assert_numpy_array_equal(result, np.array(exp, dtype=np.intp)) - - # orders=True, na_position='first' - result = _lexsort_indexer(keys, orders=True, na_position='first') - exp = list(range(5)) + list(range(105, 110)) + list(range(5, 105)) - tm.assert_numpy_array_equal(result, np.array(exp, dtype=np.intp)) - - # orders=False, na_position='last' - result = _lexsort_indexer(keys, orders=False, na_position='last') - exp = list(range(104, 4, -1)) + list(range(5)) + list(range(105, 110)) - tm.assert_numpy_array_equal(result, np.array(exp, dtype=np.intp)) - - # orders=False, na_position='first' - result = _lexsort_indexer(keys, orders=False, na_position='first') - exp = list(range(5)) + list(range(105, 110)) + list(range(104, 4, -1)) - tm.assert_numpy_array_equal(result, np.array(exp, dtype=np.intp)) - - def test_nargsort(self): - # np.argsort(items) places NaNs last - items = [nan] * 5 + list(range(100)) + [nan] * 5 - # np.argsort(items2) may not place NaNs first - items2 = np.array(items, 
dtype='O') - - try: - # GH 2785; due to a regression in NumPy1.6.2 - np.argsort(np.array([[1, 2], [1, 3], [1, 2]], dtype='i')) - np.argsort(items2, kind='mergesort') - except TypeError: - raise nose.SkipTest('requested sort not available for type') - - # mergesort is the most difficult to get right because we want it to be - # stable. - - # According to numpy/core/tests/test_multiarray, """The number of - # sorted items must be greater than ~50 to check the actual algorithm - # because quick and merge sort fall over to insertion sort for small - # arrays.""" - - # mergesort, ascending=True, na_position='last' - result = _nargsort(items, kind='mergesort', ascending=True, - na_position='last') - exp = list(range(5, 105)) + list(range(5)) + list(range(105, 110)) - tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) - - # mergesort, ascending=True, na_position='first' - result = _nargsort(items, kind='mergesort', ascending=True, - na_position='first') - exp = list(range(5)) + list(range(105, 110)) + list(range(5, 105)) - tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) - - # mergesort, ascending=False, na_position='last' - result = _nargsort(items, kind='mergesort', ascending=False, - na_position='last') - exp = list(range(104, 4, -1)) + list(range(5)) + list(range(105, 110)) - tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) - - # mergesort, ascending=False, na_position='first' - result = _nargsort(items, kind='mergesort', ascending=False, - na_position='first') - exp = list(range(5)) + list(range(105, 110)) + list(range(104, 4, -1)) - tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) - - # mergesort, ascending=True, na_position='last' - result = _nargsort(items2, kind='mergesort', ascending=True, - na_position='last') - exp = list(range(5, 105)) + list(range(5)) + list(range(105, 110)) - tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) - - # mergesort, ascending=True, na_position='first' - result = _nargsort(items2, kind='mergesort', ascending=True, - na_position='first') - exp = list(range(5)) + list(range(105, 110)) + list(range(5, 105)) - tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) - - # mergesort, ascending=False, na_position='last' - result = _nargsort(items2, kind='mergesort', ascending=False, - na_position='last') - exp = list(range(104, 4, -1)) + list(range(5)) + list(range(105, 110)) - tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) - - # mergesort, ascending=False, na_position='first' - result = _nargsort(items2, kind='mergesort', ascending=False, - na_position='first') - exp = list(range(5)) + list(range(105, 110)) + list(range(104, 4, -1)) - tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) - - def test_datetime_count(self): - df = DataFrame({'a': [1, 2, 3] * 2, - 'dates': pd.date_range('now', periods=6, freq='T')}) - result = df.groupby('a').dates.count() - expected = Series([ - 2, 2, 2 - ], index=Index([1, 2, 3], name='a'), name='dates') - tm.assert_series_equal(result, expected) - def test_lower_int_prec_count(self): df = DataFrame({'a': np.array( [0, 1, 2, 100], np.int8), @@ -5456,179 +3481,6 @@ def __eq__(self, other): list('ab'), name='grp')) tm.assert_frame_equal(result, expected) - def test__cython_agg_general(self): - ops = [('mean', np.mean), - ('median', np.median), - ('var', np.var), - ('add', np.sum), - ('prod', np.prod), - ('min', np.min), - ('max', np.max), - ('first', lambda x: x.iloc[0]), - 
('last', lambda x: x.iloc[-1]), ] - df = DataFrame(np.random.randn(1000)) - labels = np.random.randint(0, 50, size=1000).astype(float) - - for op, targop in ops: - result = df.groupby(labels)._cython_agg_general(op) - expected = df.groupby(labels).agg(targop) - try: - tm.assert_frame_equal(result, expected) - except BaseException as exc: - exc.args += ('operation: %s' % op, ) - raise - - def test_cython_agg_empty_buckets(self): - ops = [('mean', np.mean), - ('median', lambda x: np.median(x) if len(x) > 0 else np.nan), - ('var', lambda x: np.var(x, ddof=1)), - ('add', lambda x: np.sum(x) if len(x) > 0 else np.nan), - ('prod', np.prod), - ('min', np.min), - ('max', np.max), ] - - df = pd.DataFrame([11, 12, 13]) - grps = range(0, 55, 5) - - for op, targop in ops: - result = df.groupby(pd.cut(df[0], grps))._cython_agg_general(op) - expected = df.groupby(pd.cut(df[0], grps)).agg(lambda x: targop(x)) - try: - tm.assert_frame_equal(result, expected) - except BaseException as exc: - exc.args += ('operation: %s' % op,) - raise - - def test_cython_group_transform_algos(self): - # GH 4095 - dtypes = [np.int8, np.int16, np.int32, np.int64, np.uint8, np.uint32, - np.uint64, np.float32, np.float64] - - ops = [(pd.algos.group_cumprod_float64, np.cumproduct, [np.float64]), - (pd.algos.group_cumsum, np.cumsum, dtypes)] - - is_datetimelike = False - for pd_op, np_op, dtypes in ops: - for dtype in dtypes: - data = np.array([[1], [2], [3], [4]], dtype=dtype) - ans = np.zeros_like(data) - labels = np.array([0, 0, 0, 0], dtype=np.int64) - pd_op(ans, data, labels, is_datetimelike) - self.assert_numpy_array_equal(np_op(data), ans[:, 0], - check_dtype=False) - - # with nans - labels = np.array([0, 0, 0, 0, 0], dtype=np.int64) - - data = np.array([[1], [2], [3], [np.nan], [4]], dtype='float64') - actual = np.zeros_like(data) - actual.fill(np.nan) - pd.algos.group_cumprod_float64(actual, data, labels, is_datetimelike) - expected = np.array([1, 2, 6, np.nan, 24], dtype='float64') - self.assert_numpy_array_equal(actual[:, 0], expected) - - actual = np.zeros_like(data) - actual.fill(np.nan) - pd.algos.group_cumsum(actual, data, labels, is_datetimelike) - expected = np.array([1, 3, 6, np.nan, 10], dtype='float64') - self.assert_numpy_array_equal(actual[:, 0], expected) - - # timedelta - is_datetimelike = True - data = np.array([np.timedelta64(1, 'ns')] * 5, dtype='m8[ns]')[:, None] - actual = np.zeros_like(data, dtype='int64') - pd.algos.group_cumsum(actual, data.view('int64'), labels, - is_datetimelike) - expected = np.array([np.timedelta64(1, 'ns'), np.timedelta64( - 2, 'ns'), np.timedelta64(3, 'ns'), np.timedelta64(4, 'ns'), - np.timedelta64(5, 'ns')]) - self.assert_numpy_array_equal(actual[:, 0].view('m8[ns]'), expected) - - def test_cython_transform(self): - # GH 4095 - ops = [(('cumprod', - ()), lambda x: x.cumprod()), (('cumsum', ()), - lambda x: x.cumsum()), - (('shift', (-1, )), - lambda x: x.shift(-1)), (('shift', - (1, )), lambda x: x.shift())] - - s = Series(np.random.randn(1000)) - s_missing = s.copy() - s_missing.iloc[2:10] = np.nan - labels = np.random.randint(0, 50, size=1000).astype(float) - - # series - for (op, args), targop in ops: - for data in [s, s_missing]: - # print(data.head()) - expected = data.groupby(labels).transform(targop) - - tm.assert_series_equal(expected, - data.groupby(labels).transform(op, - *args)) - tm.assert_series_equal(expected, getattr( - data.groupby(labels), op)(*args)) - - strings = list('qwertyuiopasdfghjklz') - strings_missing = strings[:] - strings_missing[5] = np.nan - 
df = DataFrame({'float': s, - 'float_missing': s_missing, - 'int': [1, 1, 1, 1, 2] * 200, - 'datetime': pd.date_range('1990-1-1', periods=1000), - 'timedelta': pd.timedelta_range(1, freq='s', - periods=1000), - 'string': strings * 50, - 'string_missing': strings_missing * 50}) - df['cat'] = df['string'].astype('category') - - df2 = df.copy() - df2.index = pd.MultiIndex.from_product([range(100), range(10)]) - - # DataFrame - Single and MultiIndex, - # group by values, index level, columns - for df in [df, df2]: - for gb_target in [dict(by=labels), dict(level=0), dict(by='string') - ]: # dict(by='string_missing')]: - # dict(by=['int','string'])]: - - gb = df.groupby(**gb_target) - # whitelisted methods set the selection before applying - # bit a of hack to make sure the cythonized shift - # is equivalent to pre 0.17.1 behavior - if op == 'shift': - gb._set_group_selection() - - for (op, args), targop in ops: - if op != 'shift' and 'int' not in gb_target: - # numeric apply fastpath promotes dtype so have - # to apply seperately and concat - i = gb[['int']].apply(targop) - f = gb[['float', 'float_missing']].apply(targop) - expected = pd.concat([f, i], axis=1) - else: - expected = gb.apply(targop) - - expected = expected.sort_index(axis=1) - tm.assert_frame_equal(expected, - gb.transform(op, *args).sort_index( - axis=1)) - tm.assert_frame_equal(expected, getattr(gb, op)(*args)) - # individual columns - for c in df: - if c not in ['float', 'int', 'float_missing' - ] and op != 'shift': - self.assertRaises(DataError, gb[c].transform, op) - self.assertRaises(DataError, getattr(gb[c], op)) - else: - expected = gb[c].apply(targop) - expected.name = c - tm.assert_series_equal(expected, - gb[c].transform(op, *args)) - tm.assert_series_equal(expected, - getattr(gb[c], op)(*args)) - def test_groupby_cumprod(self): # GH 4095 df = pd.DataFrame({'key': ['b'] * 10, 'value': 2}) @@ -5688,7 +3540,7 @@ def test_max_nan_bug(self): r = gb[['File']].max() e = gb['File'].max().to_frame() tm.assert_frame_equal(r, e) - self.assertFalse(r['File'].isnull().any()) + assert not r['File'].isna().any() def test_nlargest(self): a = Series([1, 3, 5, 7, 2, 9, 0, 4, 6, 10]) @@ -5706,8 +3558,6 @@ def test_nlargest(self): 3, 2, 1, 3, 3, 2 ], index=MultiIndex.from_arrays([list('aaabbb'), [2, 3, 1, 6, 5, 7]])) assert_series_equal(gb.nlargest(3, keep='last'), e) - with tm.assert_produces_warning(FutureWarning): - assert_series_equal(gb.nlargest(3, take_last=True), e) def test_nsmallest(self): a = Series([1, 3, 5, 7, 2, 9, 0, 4, 6, 10]) @@ -5725,8 +3575,6 @@ def test_nsmallest(self): 0, 1, 1, 0, 1, 2 ], index=MultiIndex.from_arrays([list('aaabbb'), [4, 1, 0, 9, 8, 7]])) assert_series_equal(gb.nsmallest(3, keep='last'), e) - with tm.assert_produces_warning(FutureWarning): - assert_series_equal(gb.nsmallest(3, take_last=True), e) def test_transform_doesnt_clobber_ints(self): # GH 7972 @@ -5778,27 +3626,6 @@ def test_func(x): tm.assert_frame_equal(result1, expected1) tm.assert_frame_equal(result2, expected2) - def test_first_last_max_min_on_time_data(self): - # GH 10295 - # Verify that NaT is not in the result of max, min, first and last on - # Dataframe with datetime or timedelta values. 
- from datetime import timedelta as td - df_test = DataFrame( - {'dt': [nan, '2015-07-24 10:10', '2015-07-25 11:11', - '2015-07-23 12:12', nan], - 'td': [nan, td(days=1), td(days=2), td(days=3), nan]}) - df_test.dt = pd.to_datetime(df_test.dt) - df_test['group'] = 'A' - df_ref = df_test[df_test.dt.notnull()] - - grouped_test = df_test.groupby('group') - grouped_ref = df_ref.groupby('group') - - assert_frame_equal(grouped_ref.max(), grouped_test.max()) - assert_frame_equal(grouped_ref.min(), grouped_test.min()) - assert_frame_equal(grouped_ref.first(), grouped_test.first()) - assert_frame_equal(grouped_ref.last(), grouped_test.last()) - def test_groupby_preserves_sort(self): # Test to ensure that groupby always preserves sort order of original # object. Issue #8588 and #9651 @@ -5848,20 +3675,18 @@ def test_nunique_with_empty_series(self): expected = pd.Series(name='name', dtype='int64') tm.assert_series_equal(result, expected) - def test_transform_with_non_scalar_group(self): - # GH 10165 - cols = pd.MultiIndex.from_tuples([ - ('syn', 'A'), ('mis', 'A'), ('non', 'A'), - ('syn', 'C'), ('mis', 'C'), ('non', 'C'), - ('syn', 'T'), ('mis', 'T'), ('non', 'T'), - ('syn', 'G'), ('mis', 'G'), ('non', 'G')]) - df = pd.DataFrame(np.random.randint(1, 10, (4, 12)), - columns=cols, - index=['A', 'C', 'G', 'T']) - self.assertRaisesRegexp(ValueError, 'transform must return a scalar ' - 'value for each group.*', df.groupby - (axis=1, level=1).transform, - lambda z: z.div(z.sum(axis=1), axis=0)) + def test_nunique_with_timegrouper(self): + # GH 13453 + test = pd.DataFrame({ + 'time': [Timestamp('2016-06-28 09:35:35'), + Timestamp('2016-06-28 16:09:30'), + Timestamp('2016-06-28 16:46:28')], + 'data': ['1', '2', '3']}).set_index('time') + result = test.groupby(pd.TimeGrouper(freq='h'))['data'].nunique() + expected = test.groupby( + pd.TimeGrouper(freq='h') + )['data'].apply(pd.Series.nunique) + tm.assert_series_equal(result, expected) def test_numpy_compat(self): # see gh-12811 @@ -5871,10 +3696,10 @@ def test_numpy_compat(self): msg = "numpy operations are not valid with groupby" for func in ('mean', 'var', 'std', 'cumprod', 'cumsum'): - tm.assertRaisesRegexp(UnsupportedFunctionCall, msg, - getattr(g, func), 1, 2, 3) - tm.assertRaisesRegexp(UnsupportedFunctionCall, msg, - getattr(g, func), foo=1) + tm.assert_raises_regex(UnsupportedFunctionCall, msg, + getattr(g, func), 1, 2, 3) + tm.assert_raises_regex(UnsupportedFunctionCall, msg, + getattr(g, func), foo=1) def test_grouping_string_repr(self): # GH 13394 @@ -5884,7 +3709,7 @@ def test_grouping_string_repr(self): result = gr.grouper.groupings[0].__repr__() expected = "Grouping(('A', 'a'))" - tm.assert_equal(result, expected) + assert result == expected def test_group_shift_with_null_key(self): # This test is designed to replicate the segfault in issue #13813. 
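+        # rows that fall in a group whose key contains NaN should simply
+        # yield NaN from ``shift`` rather than crashing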
@@ -5900,8 +3725,8 @@ def test_group_shift_with_null_key(self): g = df.groupby(["A", "B"]) expected = DataFrame([(i + 12 if i % 3 and i < n_rows - 12 - else np.nan) - for i in range(n_rows)], dtype=float, + else np.nan) + for i in range(n_rows)], dtype=float, columns=["Z"], index=None) result = g.shift(-1) @@ -5917,27 +3742,10 @@ def test_pivot_table_values_key_error(self): df['year'] = df.set_index('eventDate').index.year df['month'] = df.set_index('eventDate').index.month - with self.assertRaises(KeyError): + with pytest.raises(KeyError): df.reset_index().pivot_table(index='year', columns='month', values='badname', aggfunc='count') - def test_agg_over_numpy_arrays(self): - # GH 3788 - df = pd.DataFrame([[1, np.array([10, 20, 30])], - [1, np.array([40, 50, 60])], - [2, np.array([20, 30, 40])]], - columns=['category', 'arraydata']) - result = df.groupby('category').agg(sum) - - expected_data = [[np.array([50, 70, 90])], [np.array([20, 30, 40])]] - expected_index = pd.Index([1, 2], name='category') - expected_column = ['arraydata'] - expected = pd.DataFrame(expected_data, - index=expected_index, - columns=expected_column) - - assert_frame_equal(result, expected) - def test_cummin_cummax(self): # GH 15048 num_types = [np.int32, np.int64, np.float32, np.float64] @@ -6017,9 +3825,84 @@ def test_cummin_cummax(self): result = base_df.groupby('A').B.apply(lambda x: x.cummax()).to_frame() tm.assert_frame_equal(expected, result) + # GH 15561 + df = pd.DataFrame(dict(a=[1], b=pd.to_datetime(['2001']))) + expected = pd.Series(pd.to_datetime('2001'), index=[0], name='b') + for method in ['cummax', 'cummin']: + result = getattr(df.groupby('a')['b'], method)() + tm.assert_series_equal(expected, result) + + # GH 15635 + df = pd.DataFrame(dict(a=[1, 2, 1], b=[2, 1, 1])) + result = df.groupby('a').b.cummax() + expected = pd.Series([2, 1, 2], name='b') + tm.assert_series_equal(result, expected) -def assert_fp_equal(a, b): - assert (np.abs(a - b) < 1e-12).all() + df = pd.DataFrame(dict(a=[1, 2, 1], b=[1, 2, 2])) + result = df.groupby('a').b.cummin() + expected = pd.Series([1, 2, 1], name='b') + tm.assert_series_equal(result, expected) + + def test_apply_numeric_coercion_when_datetime(self): + # In the past, group-by/apply operations have been over-eager + # in converting dtypes to numeric, in the presence of datetime + # columns. Various GH issues were filed, the reproductions + # for which are here. 
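+        # (each reproduction below applies a function that returns a row or
+        #  a slice and checks that object/string columns keep their values
+        #  and dtype even though a datetime column is present)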
+ + # GH 15670 + df = pd.DataFrame({'Number': [1, 2], + 'Date': ["2017-03-02"] * 2, + 'Str': ["foo", "inf"]}) + expected = df.groupby(['Number']).apply(lambda x: x.iloc[0]) + df.Date = pd.to_datetime(df.Date) + result = df.groupby(['Number']).apply(lambda x: x.iloc[0]) + tm.assert_series_equal(result['Str'], expected['Str']) + + # GH 15421 + df = pd.DataFrame({'A': [10, 20, 30], + 'B': ['foo', '3', '4'], + 'T': [pd.Timestamp("12:31:22")] * 3}) + + def get_B(g): + return g.iloc[0][['B']] + result = df.groupby('A').apply(get_B)['B'] + expected = df.B + expected.index = df.A + tm.assert_series_equal(result, expected) + + # GH 14423 + def predictions(tool): + out = pd.Series(index=['p1', 'p2', 'useTime'], dtype=object) + if 'step1' in list(tool.State): + out['p1'] = str(tool[tool.State == 'step1'].Machine.values[0]) + if 'step2' in list(tool.State): + out['p2'] = str(tool[tool.State == 'step2'].Machine.values[0]) + out['useTime'] = str( + tool[tool.State == 'step2'].oTime.values[0]) + return out + df1 = pd.DataFrame({'Key': ['B', 'B', 'A', 'A'], + 'State': ['step1', 'step2', 'step1', 'step2'], + 'oTime': ['', '2016-09-19 05:24:33', + '', '2016-09-19 23:59:04'], + 'Machine': ['23', '36L', '36R', '36R']}) + df2 = df1.copy() + df2.oTime = pd.to_datetime(df2.oTime) + expected = df1.groupby('Key').apply(predictions).p1 + result = df2.groupby('Key').apply(predictions).p1 + tm.assert_series_equal(expected, result) + + def test_gb_key_len_equal_axis_len(self): + # GH16843 + # test ensures that index and column keys are recognized correctly + # when number of keys equals axis length of groupby + df = pd.DataFrame([['foo', 'bar', 'B', 1], + ['foo', 'bar', 'B', 2], + ['foo', 'baz', 'C', 3]], + columns=['first', 'second', 'third', 'one']) + df = df.set_index(['first', 'second']) + df = df.groupby(['first', 'second', 'third']).size() + assert df.loc[('foo', 'bar', 'B')] == 2 + assert df.loc[('foo', 'baz', 'C')] == 1 def _check_groupby(df, result, keys, field, f=lambda x: x.sum()): @@ -6028,29 +3911,3 @@ def _check_groupby(df, result, keys, field, f=lambda x: x.sum()): expected = f(df.groupby(tups)[field]) for k, v in compat.iteritems(expected): assert (result[k] == v) - - -def test_decons(): - from pandas.core.groupby import decons_group_index, get_group_index - - def testit(label_list, shape): - group_index = get_group_index(label_list, shape, sort=True, xnull=True) - label_list2 = decons_group_index(group_index, shape) - - for a, b in zip(label_list, label_list2): - assert (np.array_equal(a, b)) - - shape = (4, 5, 6) - label_list = [np.tile([0, 1, 2, 3, 0, 1, 2, 3], 100), np.tile( - [0, 2, 4, 3, 0, 1, 2, 3], 100), np.tile( - [5, 1, 0, 2, 3, 0, 5, 4], 100)] - testit(label_list, shape) - - shape = (10000, 10000) - label_list = [np.tile(np.arange(10000), 5), np.tile(np.arange(10000), 5)] - testit(label_list, shape) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure', '-s' - ], exit=False) diff --git a/pandas/tests/groupby/test_nth.py b/pandas/tests/groupby/test_nth.py new file mode 100644 index 0000000000000..28392537be3c6 --- /dev/null +++ b/pandas/tests/groupby/test_nth.py @@ -0,0 +1,247 @@ +import numpy as np +import pandas as pd +from pandas import DataFrame, MultiIndex, Index, Series, isna +from pandas.compat import lrange +from pandas.util.testing import assert_frame_equal, assert_series_equal + +from .common import MixIn + + +class TestNth(MixIn): + + def test_first_last_nth(self): + # tests for first / last / nth + grouped = self.df.groupby('A') + 
first = grouped.first() + expected = self.df.loc[[1, 0], ['B', 'C', 'D']] + expected.index = Index(['bar', 'foo'], name='A') + expected = expected.sort_index() + assert_frame_equal(first, expected) + + nth = grouped.nth(0) + assert_frame_equal(nth, expected) + + last = grouped.last() + expected = self.df.loc[[5, 7], ['B', 'C', 'D']] + expected.index = Index(['bar', 'foo'], name='A') + assert_frame_equal(last, expected) + + nth = grouped.nth(-1) + assert_frame_equal(nth, expected) + + nth = grouped.nth(1) + expected = self.df.loc[[2, 3], ['B', 'C', 'D']].copy() + expected.index = Index(['foo', 'bar'], name='A') + expected = expected.sort_index() + assert_frame_equal(nth, expected) + + # it works! + grouped['B'].first() + grouped['B'].last() + grouped['B'].nth(0) + + self.df.loc[self.df['A'] == 'foo', 'B'] = np.nan + assert isna(grouped['B'].first()['foo']) + assert isna(grouped['B'].last()['foo']) + assert isna(grouped['B'].nth(0)['foo']) + + # v0.14.0 whatsnew + df = DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B']) + g = df.groupby('A') + result = g.first() + expected = df.iloc[[1, 2]].set_index('A') + assert_frame_equal(result, expected) + + expected = df.iloc[[1, 2]].set_index('A') + result = g.nth(0, dropna='any') + assert_frame_equal(result, expected) + + def test_first_last_nth_dtypes(self): + + df = self.df_mixed_floats.copy() + df['E'] = True + df['F'] = 1 + + # tests for first / last / nth + grouped = df.groupby('A') + first = grouped.first() + expected = df.loc[[1, 0], ['B', 'C', 'D', 'E', 'F']] + expected.index = Index(['bar', 'foo'], name='A') + expected = expected.sort_index() + assert_frame_equal(first, expected) + + last = grouped.last() + expected = df.loc[[5, 7], ['B', 'C', 'D', 'E', 'F']] + expected.index = Index(['bar', 'foo'], name='A') + expected = expected.sort_index() + assert_frame_equal(last, expected) + + nth = grouped.nth(1) + expected = df.loc[[3, 2], ['B', 'C', 'D', 'E', 'F']] + expected.index = Index(['bar', 'foo'], name='A') + expected = expected.sort_index() + assert_frame_equal(nth, expected) + + # GH 2763, first/last shifting dtypes + idx = lrange(10) + idx.append(9) + s = Series(data=lrange(11), index=idx, name='IntCol') + assert s.dtype == 'int64' + f = s.groupby(level=0).first() + assert f.dtype == 'int64' + + def test_nth(self): + df = DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B']) + g = df.groupby('A') + + assert_frame_equal(g.nth(0), df.iloc[[0, 2]].set_index('A')) + assert_frame_equal(g.nth(1), df.iloc[[1]].set_index('A')) + assert_frame_equal(g.nth(2), df.loc[[]].set_index('A')) + assert_frame_equal(g.nth(-1), df.iloc[[1, 2]].set_index('A')) + assert_frame_equal(g.nth(-2), df.iloc[[0]].set_index('A')) + assert_frame_equal(g.nth(-3), df.loc[[]].set_index('A')) + assert_series_equal(g.B.nth(0), df.set_index('A').B.iloc[[0, 2]]) + assert_series_equal(g.B.nth(1), df.set_index('A').B.iloc[[1]]) + assert_frame_equal(g[['B']].nth(0), + df.loc[[0, 2], ['A', 'B']].set_index('A')) + + exp = df.set_index('A') + assert_frame_equal(g.nth(0, dropna='any'), exp.iloc[[1, 2]]) + assert_frame_equal(g.nth(-1, dropna='any'), exp.iloc[[1, 2]]) + + exp['B'] = np.nan + assert_frame_equal(g.nth(7, dropna='any'), exp.iloc[[1, 2]]) + assert_frame_equal(g.nth(2, dropna='any'), exp.iloc[[1, 2]]) + + # out of bounds, regression from 0.13.1 + # GH 6621 + df = DataFrame({'color': {0: 'green', + 1: 'green', + 2: 'red', + 3: 'red', + 4: 'red'}, + 'food': {0: 'ham', + 1: 'eggs', + 2: 'eggs', + 3: 'ham', + 4: 'pork'}, + 'two': {0: 1.5456590000000001, + 1: 
-0.070345000000000005, + 2: -2.4004539999999999, + 3: 0.46206000000000003, + 4: 0.52350799999999997}, + 'one': {0: 0.56573799999999996, + 1: -0.9742360000000001, + 2: 1.033801, + 3: -0.78543499999999999, + 4: 0.70422799999999997}}).set_index(['color', + 'food']) + + result = df.groupby(level=0, as_index=False).nth(2) + expected = df.iloc[[-1]] + assert_frame_equal(result, expected) + + result = df.groupby(level=0, as_index=False).nth(3) + expected = df.loc[[]] + assert_frame_equal(result, expected) + + # GH 7559 + # from the vbench + df = DataFrame(np.random.randint(1, 10, (100, 2)), dtype='int64') + s = df[1] + g = df[0] + expected = s.groupby(g).first() + expected2 = s.groupby(g).apply(lambda x: x.iloc[0]) + assert_series_equal(expected2, expected, check_names=False) + assert expected.name == 1 + assert expected2.name == 1 + + # validate first + v = s[g == 1].iloc[0] + assert expected.iloc[0] == v + assert expected2.iloc[0] == v + + # this is NOT the same as .first (as sorted is default!) + # as it keeps the order in the series (and not the group order) + # related GH 7287 + expected = s.groupby(g, sort=False).first() + result = s.groupby(g, sort=False).nth(0, dropna='all') + assert_series_equal(result, expected) + + # doc example + df = DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B']) + g = df.groupby('A') + result = g.B.nth(0, dropna=True) + expected = g.B.first() + assert_series_equal(result, expected) + + # test multiple nth values + df = DataFrame([[1, np.nan], [1, 3], [1, 4], [5, 6], [5, 7]], + columns=['A', 'B']) + g = df.groupby('A') + + assert_frame_equal(g.nth(0), df.iloc[[0, 3]].set_index('A')) + assert_frame_equal(g.nth([0]), df.iloc[[0, 3]].set_index('A')) + assert_frame_equal(g.nth([0, 1]), df.iloc[[0, 1, 3, 4]].set_index('A')) + assert_frame_equal( + g.nth([0, -1]), df.iloc[[0, 2, 3, 4]].set_index('A')) + assert_frame_equal( + g.nth([0, 1, 2]), df.iloc[[0, 1, 2, 3, 4]].set_index('A')) + assert_frame_equal( + g.nth([0, 1, -1]), df.iloc[[0, 1, 2, 3, 4]].set_index('A')) + assert_frame_equal(g.nth([2]), df.iloc[[2]].set_index('A')) + assert_frame_equal(g.nth([3, 4]), df.loc[[]].set_index('A')) + + business_dates = pd.date_range(start='4/1/2014', end='6/30/2014', + freq='B') + df = DataFrame(1, index=business_dates, columns=['a', 'b']) + # get the first, fourth and last two business days for each month + key = (df.index.year, df.index.month) + result = df.groupby(key, as_index=False).nth([0, 3, -2, -1]) + expected_dates = pd.to_datetime( + ['2014/4/1', '2014/4/4', '2014/4/29', '2014/4/30', '2014/5/1', + '2014/5/6', '2014/5/29', '2014/5/30', '2014/6/2', '2014/6/5', + '2014/6/27', '2014/6/30']) + expected = DataFrame(1, columns=['a', 'b'], index=expected_dates) + assert_frame_equal(result, expected) + + def test_nth_multi_index(self): + # PR 9090, related to issue 8979 + # test nth on MultiIndex, should match .first() + grouped = self.three_group.groupby(['A', 'B']) + result = grouped.nth(0) + expected = grouped.first() + assert_frame_equal(result, expected) + + def test_nth_multi_index_as_expected(self): + # PR 9090, related to issue 8979 + # test nth on MultiIndex + three_group = DataFrame( + {'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', + 'foo', 'foo', 'foo'], + 'B': ['one', 'one', 'one', 'two', 'one', 'one', 'one', 'two', + 'two', 'two', 'one'], + 'C': ['dull', 'dull', 'shiny', 'dull', 'dull', 'shiny', 'shiny', + 'dull', 'shiny', 'shiny', 'shiny']}) + grouped = three_group.groupby(['A', 'B']) + result = grouped.nth(0) + expected = DataFrame( + 
+            {'C': ['dull', 'dull', 'dull', 'dull']},
+            index=MultiIndex.from_arrays([['bar', 'bar', 'foo', 'foo'],
+                                          ['one', 'two', 'one', 'two']],
+                                         names=['A', 'B']))
+        assert_frame_equal(result, expected)
+
+
+def test_nth_empty():
+    # GH 16064
+    df = DataFrame(index=[0], columns=['a', 'b', 'c'])
+    result = df.groupby('a').nth(10)
+    expected = DataFrame(index=Index([], name='a'), columns=['b', 'c'])
+    assert_frame_equal(result, expected)
+
+    result = df.groupby(['a', 'b']).nth(10)
+    expected = DataFrame(index=MultiIndex([[], []], [[], []],
+                                          names=['a', 'b']),
+                         columns=['c'])
+    assert_frame_equal(result, expected)
diff --git a/pandas/tests/groupby/test_timegrouper.py b/pandas/tests/groupby/test_timegrouper.py
new file mode 100644
index 0000000000000..df0a93d783375
--- /dev/null
+++ b/pandas/tests/groupby/test_timegrouper.py
@@ -0,0 +1,610 @@
+""" test with the TimeGrouper / grouping with datetimes """
+
+import pytest
+import pytz
+
+from datetime import datetime
+import numpy as np
+from numpy import nan
+
+import pandas as pd
+from pandas import (DataFrame, date_range, Index,
+                    Series, MultiIndex, Timestamp, DatetimeIndex)
+from pandas.compat import StringIO
+from pandas.util import testing as tm
+from pandas.util.testing import assert_frame_equal, assert_series_equal
+
+
+class TestGroupBy(object):
+
+    def test_groupby_with_timegrouper(self):
+        # GH 4161
+        # TimeGrouper requires a sorted index
+        # also verifies that the resultant index has the correct name
+        df_original = DataFrame({
+            'Buyer': 'Carl Carl Carl Carl Joe Carl'.split(),
+            'Quantity': [18, 3, 5, 1, 9, 3],
+            'Date': [
+                datetime(2013, 9, 1, 13, 0),
+                datetime(2013, 9, 1, 13, 5),
+                datetime(2013, 10, 1, 20, 0),
+                datetime(2013, 10, 3, 10, 0),
+                datetime(2013, 12, 2, 12, 0),
+                datetime(2013, 9, 2, 14, 0),
+            ]
+        })
+
+        # GH 6908 change target column's order
+        df_reordered = df_original.sort_values(by='Quantity')
+
+        for df in [df_original, df_reordered]:
+            df = df.set_index(['Date'])
+
+            expected = DataFrame(
+                {'Quantity': np.nan},
+                index=date_range('20130901 13:00:00',
+                                 '20131205 13:00:00', freq='5D',
+                                 name='Date', closed='left'))
+            expected.iloc[[0, 6, 18], 0] = np.array(
+                [24., 6., 9.], dtype='float64')
+
+            result1 = df.resample('5D').sum()
+            assert_frame_equal(result1, expected)
+
+            df_sorted = df.sort_index()
+            result2 = df_sorted.groupby(pd.TimeGrouper(freq='5D')).sum()
+            assert_frame_equal(result2, expected)
+
+            result3 = df.groupby(pd.TimeGrouper(freq='5D')).sum()
+            assert_frame_equal(result3, expected)
+
+    def test_groupby_with_timegrouper_methods(self):
+        # GH 3881
+        # make sure API of timegrouper conforms
+
+        df_original = pd.DataFrame({
+            'Branch': 'A A A A A B'.split(),
+            'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
+            'Quantity': [1, 3, 5, 8, 9, 3],
+            'Date': [
+                datetime(2013, 1, 1, 13, 0),
+                datetime(2013, 1, 1, 13, 5),
+                datetime(2013, 10, 1, 20, 0),
+                datetime(2013, 10, 2, 10, 0),
+                datetime(2013, 12, 2, 12, 0),
+                datetime(2013, 12, 2, 14, 0),
+            ]
+        })
+
+        df_sorted = df_original.sort_values(by='Quantity', ascending=False)
+
+        for df in [df_original, df_sorted]:
+            df = df.set_index('Date', drop=False)
+            g = df.groupby(pd.TimeGrouper('6M'))
+            assert g.group_keys
+            assert isinstance(g.grouper, pd.core.groupby.BinGrouper)
+            groups = g.groups
+            assert isinstance(groups, dict)
+            assert len(groups) == 3
+
+    def test_timegrouper_with_reg_groups(self):
+
+        # GH 3794
+        # allow combination of timegrouper/reg groups
+
+        df_original = DataFrame({
+            'Branch': 'A A A A A A A B'.split(),
+            'Buyer': 'Carl Mark Carl Carl Joe
Joe Joe Carl'.split(), + 'Quantity': [1, 3, 5, 1, 8, 1, 9, 3], + 'Date': [ + datetime(2013, 1, 1, 13, 0), + datetime(2013, 1, 1, 13, 5), + datetime(2013, 10, 1, 20, 0), + datetime(2013, 10, 2, 10, 0), + datetime(2013, 10, 1, 20, 0), + datetime(2013, 10, 2, 10, 0), + datetime(2013, 12, 2, 12, 0), + datetime(2013, 12, 2, 14, 0), + ] + }).set_index('Date') + + df_sorted = df_original.sort_values(by='Quantity', ascending=False) + + for df in [df_original, df_sorted]: + expected = DataFrame({ + 'Buyer': 'Carl Joe Mark'.split(), + 'Quantity': [10, 18, 3], + 'Date': [ + datetime(2013, 12, 31, 0, 0), + datetime(2013, 12, 31, 0, 0), + datetime(2013, 12, 31, 0, 0), + ] + }).set_index(['Date', 'Buyer']) + + result = df.groupby([pd.Grouper(freq='A'), 'Buyer']).sum() + assert_frame_equal(result, expected) + + expected = DataFrame({ + 'Buyer': 'Carl Mark Carl Joe'.split(), + 'Quantity': [1, 3, 9, 18], + 'Date': [ + datetime(2013, 1, 1, 0, 0), + datetime(2013, 1, 1, 0, 0), + datetime(2013, 7, 1, 0, 0), + datetime(2013, 7, 1, 0, 0), + ] + }).set_index(['Date', 'Buyer']) + result = df.groupby([pd.Grouper(freq='6MS'), 'Buyer']).sum() + assert_frame_equal(result, expected) + + df_original = DataFrame({ + 'Branch': 'A A A A A A A B'.split(), + 'Buyer': 'Carl Mark Carl Carl Joe Joe Joe Carl'.split(), + 'Quantity': [1, 3, 5, 1, 8, 1, 9, 3], + 'Date': [ + datetime(2013, 10, 1, 13, 0), + datetime(2013, 10, 1, 13, 5), + datetime(2013, 10, 1, 20, 0), + datetime(2013, 10, 2, 10, 0), + datetime(2013, 10, 1, 20, 0), + datetime(2013, 10, 2, 10, 0), + datetime(2013, 10, 2, 12, 0), + datetime(2013, 10, 2, 14, 0), + ] + }).set_index('Date') + + df_sorted = df_original.sort_values(by='Quantity', ascending=False) + for df in [df_original, df_sorted]: + + expected = DataFrame({ + 'Buyer': 'Carl Joe Mark Carl Joe'.split(), + 'Quantity': [6, 8, 3, 4, 10], + 'Date': [ + datetime(2013, 10, 1, 0, 0), + datetime(2013, 10, 1, 0, 0), + datetime(2013, 10, 1, 0, 0), + datetime(2013, 10, 2, 0, 0), + datetime(2013, 10, 2, 0, 0), + ] + }).set_index(['Date', 'Buyer']) + + result = df.groupby([pd.Grouper(freq='1D'), 'Buyer']).sum() + assert_frame_equal(result, expected) + + result = df.groupby([pd.Grouper(freq='1M'), 'Buyer']).sum() + expected = DataFrame({ + 'Buyer': 'Carl Joe Mark'.split(), + 'Quantity': [10, 18, 3], + 'Date': [ + datetime(2013, 10, 31, 0, 0), + datetime(2013, 10, 31, 0, 0), + datetime(2013, 10, 31, 0, 0), + ] + }).set_index(['Date', 'Buyer']) + assert_frame_equal(result, expected) + + # passing the name + df = df.reset_index() + result = df.groupby([pd.Grouper(freq='1M', key='Date'), 'Buyer' + ]).sum() + assert_frame_equal(result, expected) + + with pytest.raises(KeyError): + df.groupby([pd.Grouper(freq='1M', key='foo'), 'Buyer']).sum() + + # passing the level + df = df.set_index('Date') + result = df.groupby([pd.Grouper(freq='1M', level='Date'), 'Buyer' + ]).sum() + assert_frame_equal(result, expected) + result = df.groupby([pd.Grouper(freq='1M', level=0), 'Buyer']).sum( + ) + assert_frame_equal(result, expected) + + with pytest.raises(ValueError): + df.groupby([pd.Grouper(freq='1M', level='foo'), + 'Buyer']).sum() + + # multi names + df = df.copy() + df['Date'] = df.index + pd.offsets.MonthEnd(2) + result = df.groupby([pd.Grouper(freq='1M', key='Date'), 'Buyer' + ]).sum() + expected = DataFrame({ + 'Buyer': 'Carl Joe Mark'.split(), + 'Quantity': [10, 18, 3], + 'Date': [ + datetime(2013, 11, 30, 0, 0), + datetime(2013, 11, 30, 0, 0), + datetime(2013, 11, 30, 0, 0), + ] + }).set_index(['Date', 'Buyer']) + 
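+            # all of the October dates shifted by MonthEnd(2) land on
+            # 2013-11-30, hence the single month-end bin expected here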
assert_frame_equal(result, expected) + + # error as we have both a level and a name! + with pytest.raises(ValueError): + df.groupby([pd.Grouper(freq='1M', key='Date', + level='Date'), 'Buyer']).sum() + + # single groupers + expected = DataFrame({'Quantity': [31], + 'Date': [datetime(2013, 10, 31, 0, 0) + ]}).set_index('Date') + result = df.groupby(pd.Grouper(freq='1M')).sum() + assert_frame_equal(result, expected) + + result = df.groupby([pd.Grouper(freq='1M')]).sum() + assert_frame_equal(result, expected) + + expected = DataFrame({'Quantity': [31], + 'Date': [datetime(2013, 11, 30, 0, 0) + ]}).set_index('Date') + result = df.groupby(pd.Grouper(freq='1M', key='Date')).sum() + assert_frame_equal(result, expected) + + result = df.groupby([pd.Grouper(freq='1M', key='Date')]).sum() + assert_frame_equal(result, expected) + + # GH 6764 multiple grouping with/without sort + df = DataFrame({ + 'date': pd.to_datetime([ + '20121002', '20121007', '20130130', '20130202', '20130305', + '20121002', '20121207', '20130130', '20130202', '20130305', + '20130202', '20130305' + ]), + 'user_id': [1, 1, 1, 1, 1, 3, 3, 3, 5, 5, 5, 5], + 'whole_cost': [1790, 364, 280, 259, 201, 623, 90, 312, 359, 301, + 359, 801], + 'cost1': [12, 15, 10, 24, 39, 1, 0, 90, 45, 34, 1, 12] + }).set_index('date') + + for freq in ['D', 'M', 'A', 'Q-APR']: + expected = df.groupby('user_id')[ + 'whole_cost'].resample( + freq).sum().dropna().reorder_levels( + ['date', 'user_id']).sort_index().astype('int64') + expected.name = 'whole_cost' + + result1 = df.sort_index().groupby([pd.TimeGrouper(freq=freq), + 'user_id'])['whole_cost'].sum() + assert_series_equal(result1, expected) + + result2 = df.groupby([pd.TimeGrouper(freq=freq), 'user_id'])[ + 'whole_cost'].sum() + assert_series_equal(result2, expected) + + def test_timegrouper_get_group(self): + # GH 6914 + + df_original = DataFrame({ + 'Buyer': 'Carl Joe Joe Carl Joe Carl'.split(), + 'Quantity': [18, 3, 5, 1, 9, 3], + 'Date': [datetime(2013, 9, 1, 13, 0), + datetime(2013, 9, 1, 13, 5), + datetime(2013, 10, 1, 20, 0), + datetime(2013, 10, 3, 10, 0), + datetime(2013, 12, 2, 12, 0), + datetime(2013, 9, 2, 14, 0), ] + }) + df_reordered = df_original.sort_values(by='Quantity') + + # single grouping + expected_list = [df_original.iloc[[0, 1, 5]], df_original.iloc[[2, 3]], + df_original.iloc[[4]]] + dt_list = ['2013-09-30', '2013-10-31', '2013-12-31'] + + for df in [df_original, df_reordered]: + grouped = df.groupby(pd.Grouper(freq='M', key='Date')) + for t, expected in zip(dt_list, expected_list): + dt = pd.Timestamp(t) + result = grouped.get_group(dt) + assert_frame_equal(result, expected) + + # multiple grouping + expected_list = [df_original.iloc[[1]], df_original.iloc[[3]], + df_original.iloc[[4]]] + g_list = [('Joe', '2013-09-30'), ('Carl', '2013-10-31'), + ('Joe', '2013-12-31')] + + for df in [df_original, df_reordered]: + grouped = df.groupby(['Buyer', pd.Grouper(freq='M', key='Date')]) + for (b, t), expected in zip(g_list, expected_list): + dt = pd.Timestamp(t) + result = grouped.get_group((b, dt)) + assert_frame_equal(result, expected) + + # with index + df_original = df_original.set_index('Date') + df_reordered = df_original.sort_values(by='Quantity') + + expected_list = [df_original.iloc[[0, 1, 5]], df_original.iloc[[2, 3]], + df_original.iloc[[4]]] + + for df in [df_original, df_reordered]: + grouped = df.groupby(pd.Grouper(freq='M')) + for t, expected in zip(dt_list, expected_list): + dt = pd.Timestamp(t) + result = grouped.get_group(dt) + assert_frame_equal(result, expected) + 
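+    # A minimal sketch of the Grouper behavior exercised above, assuming
+    # the 0.20-era API: a pd.Grouper with a ``freq`` bins rows by period
+    # end, and get_group is keyed by that bin's Timestamp:
+    #
+    #     df = pd.DataFrame({'Date': pd.to_datetime(['2013-09-01',
+    #                                                '2013-10-01']),
+    #                        'Quantity': [18, 3]})
+    #     g = df.groupby(pd.Grouper(freq='M', key='Date'))
+    #     g.get_group(pd.Timestamp('2013-09-30'))  # -> the September row
+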
+ def test_timegrouper_apply_return_type_series(self): + # Using `apply` with the `TimeGrouper` should give the + # same return type as an `apply` with a `Grouper`. + # Issue #11742 + df = pd.DataFrame({'date': ['10/10/2000', '11/10/2000'], + 'value': [10, 13]}) + df_dt = df.copy() + df_dt['date'] = pd.to_datetime(df_dt['date']) + + def sumfunc_series(x): + return pd.Series([x['value'].sum()], ('sum',)) + + expected = df.groupby(pd.Grouper(key='date')).apply(sumfunc_series) + result = (df_dt.groupby(pd.TimeGrouper(freq='M', key='date')) + .apply(sumfunc_series)) + assert_frame_equal(result.reset_index(drop=True), + expected.reset_index(drop=True)) + + def test_timegrouper_apply_return_type_value(self): + # Using `apply` with the `TimeGrouper` should give the + # same return type as an `apply` with a `Grouper`. + # Issue #11742 + df = pd.DataFrame({'date': ['10/10/2000', '11/10/2000'], + 'value': [10, 13]}) + df_dt = df.copy() + df_dt['date'] = pd.to_datetime(df_dt['date']) + + def sumfunc_value(x): + return x.value.sum() + + expected = df.groupby(pd.Grouper(key='date')).apply(sumfunc_value) + result = (df_dt.groupby(pd.TimeGrouper(freq='M', key='date')) + .apply(sumfunc_value)) + assert_series_equal(result.reset_index(drop=True), + expected.reset_index(drop=True)) + + def test_groupby_groups_datetimeindex(self): + # #1430 + periods = 1000 + ind = DatetimeIndex(start='2012/1/1', freq='5min', periods=periods) + df = DataFrame({'high': np.arange(periods), + 'low': np.arange(periods)}, index=ind) + grouped = df.groupby(lambda x: datetime(x.year, x.month, x.day)) + + # it works! + groups = grouped.groups + assert isinstance(list(groups.keys())[0], datetime) + + # GH 11442 + index = pd.date_range('2015/01/01', periods=5, name='date') + df = pd.DataFrame({'A': [5, 6, 7, 8, 9], + 'B': [1, 2, 3, 4, 5]}, index=index) + result = df.groupby(level='date').groups + dates = ['2015-01-05', '2015-01-04', '2015-01-03', + '2015-01-02', '2015-01-01'] + expected = {pd.Timestamp(date): pd.DatetimeIndex([date], name='date') + for date in dates} + tm.assert_dict_equal(result, expected) + + grouped = df.groupby(level='date') + for date in dates: + result = grouped.get_group(date) + data = [[df.loc[date, 'A'], df.loc[date, 'B']]] + expected_index = pd.DatetimeIndex([date], name='date') + expected = pd.DataFrame(data, + columns=list('AB'), + index=expected_index) + tm.assert_frame_equal(result, expected) + + def test_groupby_groups_datetimeindex_tz(self): + # GH 3950 + dates = ['2011-07-19 07:00:00', '2011-07-19 08:00:00', + '2011-07-19 09:00:00', '2011-07-19 07:00:00', + '2011-07-19 08:00:00', '2011-07-19 09:00:00'] + df = DataFrame({'label': ['a', 'a', 'a', 'b', 'b', 'b'], + 'datetime': dates, + 'value1': np.arange(6, dtype='int64'), + 'value2': [1, 2] * 3}) + df['datetime'] = df['datetime'].apply( + lambda d: Timestamp(d, tz='US/Pacific')) + + exp_idx1 = pd.DatetimeIndex(['2011-07-19 07:00:00', + '2011-07-19 07:00:00', + '2011-07-19 08:00:00', + '2011-07-19 08:00:00', + '2011-07-19 09:00:00', + '2011-07-19 09:00:00'], + tz='US/Pacific', name='datetime') + exp_idx2 = Index(['a', 'b'] * 3, name='label') + exp_idx = MultiIndex.from_arrays([exp_idx1, exp_idx2]) + expected = DataFrame({'value1': [0, 3, 1, 4, 2, 5], + 'value2': [1, 2, 2, 1, 1, 2]}, + index=exp_idx, columns=['value1', 'value2']) + + result = df.groupby(['datetime', 'label']).sum() + assert_frame_equal(result, expected) + + # by level + didx = pd.DatetimeIndex(dates, tz='Asia/Tokyo') + df = DataFrame({'value1': np.arange(6, dtype='int64'), + 'value2': 
[1, 2, 3, 1, 2, 3]}, + index=didx) + + exp_idx = pd.DatetimeIndex(['2011-07-19 07:00:00', + '2011-07-19 08:00:00', + '2011-07-19 09:00:00'], tz='Asia/Tokyo') + expected = DataFrame({'value1': [3, 5, 7], 'value2': [2, 4, 6]}, + index=exp_idx, columns=['value1', 'value2']) + + result = df.groupby(level=0).sum() + assert_frame_equal(result, expected) + + def test_frame_datetime64_handling_groupby(self): + # it works! + df = DataFrame([(3, np.datetime64('2012-07-03')), + (3, np.datetime64('2012-07-04'))], + columns=['a', 'date']) + result = df.groupby('a').first() + assert result['date'][3] == Timestamp('2012-07-03') + + def test_groupby_multi_timezone(self): + + # combining multiple / different timezones yields UTC + + data = """0,2000-01-28 16:47:00,America/Chicago +1,2000-01-29 16:48:00,America/Chicago +2,2000-01-30 16:49:00,America/Los_Angeles +3,2000-01-31 16:50:00,America/Chicago +4,2000-01-01 16:50:00,America/New_York""" + + df = pd.read_csv(StringIO(data), header=None, + names=['value', 'date', 'tz']) + result = df.groupby('tz').date.apply( + lambda x: pd.to_datetime(x).dt.tz_localize(x.name)) + + expected = Series([Timestamp('2000-01-28 16:47:00-0600', + tz='America/Chicago'), + Timestamp('2000-01-29 16:48:00-0600', + tz='America/Chicago'), + Timestamp('2000-01-30 16:49:00-0800', + tz='America/Los_Angeles'), + Timestamp('2000-01-31 16:50:00-0600', + tz='America/Chicago'), + Timestamp('2000-01-01 16:50:00-0500', + tz='America/New_York')], + name='date', + dtype=object) + assert_series_equal(result, expected) + + tz = 'America/Chicago' + res_values = df.groupby('tz').date.get_group(tz) + result = pd.to_datetime(res_values).dt.tz_localize(tz) + exp_values = Series(['2000-01-28 16:47:00', '2000-01-29 16:48:00', + '2000-01-31 16:50:00'], + index=[0, 1, 3], name='date') + expected = pd.to_datetime(exp_values).dt.tz_localize(tz) + assert_series_equal(result, expected) + + def test_groupby_groups_periods(self): + dates = ['2011-07-19 07:00:00', '2011-07-19 08:00:00', + '2011-07-19 09:00:00', '2011-07-19 07:00:00', + '2011-07-19 08:00:00', '2011-07-19 09:00:00'] + df = DataFrame({'label': ['a', 'a', 'a', 'b', 'b', 'b'], + 'period': [pd.Period(d, freq='H') for d in dates], + 'value1': np.arange(6, dtype='int64'), + 'value2': [1, 2] * 3}) + + exp_idx1 = pd.PeriodIndex(['2011-07-19 07:00:00', + '2011-07-19 07:00:00', + '2011-07-19 08:00:00', + '2011-07-19 08:00:00', + '2011-07-19 09:00:00', + '2011-07-19 09:00:00'], + freq='H', name='period') + exp_idx2 = Index(['a', 'b'] * 3, name='label') + exp_idx = MultiIndex.from_arrays([exp_idx1, exp_idx2]) + expected = DataFrame({'value1': [0, 3, 1, 4, 2, 5], + 'value2': [1, 2, 2, 1, 1, 2]}, + index=exp_idx, columns=['value1', 'value2']) + + result = df.groupby(['period', 'label']).sum() + assert_frame_equal(result, expected) + + # by level + didx = pd.PeriodIndex(dates, freq='H') + df = DataFrame({'value1': np.arange(6, dtype='int64'), + 'value2': [1, 2, 3, 1, 2, 3]}, + index=didx) + + exp_idx = pd.PeriodIndex(['2011-07-19 07:00:00', + '2011-07-19 08:00:00', + '2011-07-19 09:00:00'], freq='H') + expected = DataFrame({'value1': [3, 5, 7], 'value2': [2, 4, 6]}, + index=exp_idx, columns=['value1', 'value2']) + + result = df.groupby(level=0).sum() + assert_frame_equal(result, expected) + + def test_groupby_first_datetime64(self): + df = DataFrame([(1, 1351036800000000000), (2, 1351036800000000000)]) + df[1] = df[1].view('M8[ns]') + + assert issubclass(df[1].dtype.type, np.datetime64) + + result = df.groupby(level=0).first() + got_dt = result[1].dtype + assert 
issubclass(got_dt.type, np.datetime64) + + result = df[1].groupby(level=0).first() + got_dt = result.dtype + assert issubclass(got_dt.type, np.datetime64) + + def test_groupby_max_datetime64(self): + # GH 5869 + # datetimelike dtype conversion from int + df = DataFrame(dict(A=Timestamp('20130101'), B=np.arange(5))) + expected = df.groupby('A')['A'].apply(lambda x: x.max()) + result = df.groupby('A')['A'].max() + assert_series_equal(result, expected) + + def test_groupby_datetime64_32_bit(self): + # GH 6410 / numpy 4328 + # 32-bit under 1.9-dev indexing issue + + df = DataFrame({"A": range(2), "B": [pd.Timestamp('2000-01-1')] * 2}) + result = df.groupby("A")["B"].transform(min) + expected = Series([pd.Timestamp('2000-01-1')] * 2, name='B') + assert_series_equal(result, expected) + + def test_groupby_with_timezone_selection(self): + # GH 11616 + # Test that column selection returns output in correct timezone. + np.random.seed(42) + df = pd.DataFrame({ + 'factor': np.random.randint(0, 3, size=60), + 'time': pd.date_range('01/01/2000 00:00', periods=60, + freq='s', tz='UTC') + }) + df1 = df.groupby('factor').max()['time'] + df2 = df.groupby('factor')['time'].max() + tm.assert_series_equal(df1, df2) + + def test_timezone_info(self): + # see gh-11682: Timezone info lost when broadcasting + # scalar datetime to DataFrame + + df = pd.DataFrame({'a': [1], 'b': [datetime.now(pytz.utc)]}) + assert df['b'][0].tzinfo == pytz.utc + df = pd.DataFrame({'a': [1, 2, 3]}) + df['b'] = datetime.now(pytz.utc) + assert df['b'][0].tzinfo == pytz.utc + + def test_datetime_count(self): + df = DataFrame({'a': [1, 2, 3] * 2, + 'dates': pd.date_range('now', periods=6, freq='T')}) + result = df.groupby('a').dates.count() + expected = Series([ + 2, 2, 2 + ], index=Index([1, 2, 3], name='a'), name='dates') + tm.assert_series_equal(result, expected) + + def test_first_last_max_min_on_time_data(self): + # GH 10295 + # Verify that NaT is not in the result of max, min, first and last on + # Dataframe with datetime or timedelta values. 
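+        # i.e. NaT should be skipped just as NaN is, so the grouped results
+        # over df_test (which contains NaT) must match those over df_ref
+        # (the NaT rows dropped)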
+ from datetime import timedelta as td + df_test = DataFrame( + {'dt': [nan, '2015-07-24 10:10', '2015-07-25 11:11', + '2015-07-23 12:12', nan], + 'td': [nan, td(days=1), td(days=2), td(days=3), nan]}) + df_test.dt = pd.to_datetime(df_test.dt) + df_test['group'] = 'A' + df_ref = df_test[df_test.dt.notna()] + + grouped_test = df_test.groupby('group') + grouped_ref = df_ref.groupby('group') + + assert_frame_equal(grouped_ref.max(), grouped_test.max()) + assert_frame_equal(grouped_ref.min(), grouped_test.min()) + assert_frame_equal(grouped_ref.first(), grouped_test.first()) + assert_frame_equal(grouped_ref.last(), grouped_test.last()) diff --git a/pandas/tests/groupby/test_transform.py b/pandas/tests/groupby/test_transform.py new file mode 100644 index 0000000000000..98839a17d6e0c --- /dev/null +++ b/pandas/tests/groupby/test_transform.py @@ -0,0 +1,576 @@ +""" test with the .transform """ + +import pytest + +import numpy as np +import pandas as pd +from pandas.util import testing as tm +from pandas import Series, DataFrame, Timestamp, MultiIndex, concat, date_range +from pandas.core.dtypes.common import ( + _ensure_platform_int, is_timedelta64_dtype) +from pandas.compat import StringIO +from pandas._libs import groupby +from .common import MixIn, assert_fp_equal + +from pandas.util.testing import assert_frame_equal, assert_series_equal +from pandas.core.groupby import DataError +from pandas.core.config import option_context + + +class TestGroupBy(MixIn): + + def test_transform(self): + data = Series(np.arange(9) // 3, index=np.arange(9)) + + index = np.arange(9) + np.random.shuffle(index) + data = data.reindex(index) + + grouped = data.groupby(lambda x: x // 3) + + transformed = grouped.transform(lambda x: x * x.sum()) + assert transformed[7] == 12 + + # GH 8046 + # make sure that we preserve the input order + + df = DataFrame( + np.arange(6, dtype='int64').reshape( + 3, 2), columns=["a", "b"], index=[0, 2, 1]) + key = [0, 0, 1] + expected = df.sort_index().groupby(key).transform( + lambda x: x - x.mean()).groupby(key).mean() + result = df.groupby(key).transform(lambda x: x - x.mean()).groupby( + key).mean() + assert_frame_equal(result, expected) + + def demean(arr): + return arr - arr.mean() + + people = DataFrame(np.random.randn(5, 5), + columns=['a', 'b', 'c', 'd', 'e'], + index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis']) + key = ['one', 'two', 'one', 'two', 'one'] + result = people.groupby(key).transform(demean).groupby(key).mean() + expected = people.groupby(key).apply(demean).groupby(key).mean() + assert_frame_equal(result, expected) + + # GH 8430 + df = tm.makeTimeDataFrame() + g = df.groupby(pd.TimeGrouper('M')) + g.transform(lambda x: x - 1) + + # GH 9700 + df = DataFrame({'a': range(5, 10), 'b': range(5)}) + result = df.groupby('a').transform(max) + expected = DataFrame({'b': range(5)}) + tm.assert_frame_equal(result, expected) + + def test_transform_fast(self): + + df = DataFrame({'id': np.arange(100000) / 3, + 'val': np.random.randn(100000)}) + + grp = df.groupby('id')['val'] + + values = np.repeat(grp.mean().values, + _ensure_platform_int(grp.count().values)) + expected = pd.Series(values, index=df.index, name='val') + + result = grp.transform(np.mean) + assert_series_equal(result, expected) + + result = grp.transform('mean') + assert_series_equal(result, expected) + + # GH 12737 + df = pd.DataFrame({'grouping': [0, 1, 1, 3], 'f': [1.1, 2.1, 3.1, 4.5], + 'd': pd.date_range('2014-1-1', '2014-1-4'), + 'i': [1, 2, 3, 4]}, + columns=['grouping', 'f', 'i', 'd']) + result = 
df.groupby('grouping').transform('first') + + dates = [pd.Timestamp('2014-1-1'), pd.Timestamp('2014-1-2'), + pd.Timestamp('2014-1-2'), pd.Timestamp('2014-1-4')] + expected = pd.DataFrame({'f': [1.1, 2.1, 2.1, 4.5], + 'd': dates, + 'i': [1, 2, 2, 4]}, + columns=['f', 'i', 'd']) + assert_frame_equal(result, expected) + + # selection + result = df.groupby('grouping')[['f', 'i']].transform('first') + expected = expected[['f', 'i']] + assert_frame_equal(result, expected) + + # dup columns + df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['g', 'a', 'a']) + result = df.groupby('g').transform('first') + expected = df.drop('g', axis=1) + assert_frame_equal(result, expected) + + def test_transform_broadcast(self): + grouped = self.ts.groupby(lambda x: x.month) + result = grouped.transform(np.mean) + + tm.assert_index_equal(result.index, self.ts.index) + for _, gp in grouped: + assert_fp_equal(result.reindex(gp.index), gp.mean()) + + grouped = self.tsframe.groupby(lambda x: x.month) + result = grouped.transform(np.mean) + tm.assert_index_equal(result.index, self.tsframe.index) + for _, gp in grouped: + agged = gp.mean() + res = result.reindex(gp.index) + for col in self.tsframe: + assert_fp_equal(res[col], agged[col]) + + # group columns + grouped = self.tsframe.groupby({'A': 0, 'B': 0, 'C': 1, 'D': 1}, + axis=1) + result = grouped.transform(np.mean) + tm.assert_index_equal(result.index, self.tsframe.index) + tm.assert_index_equal(result.columns, self.tsframe.columns) + for _, gp in grouped: + agged = gp.mean(1) + res = result.reindex(columns=gp.columns) + for idx in gp.index: + assert_fp_equal(res.xs(idx), agged[idx]) + + def test_transform_axis(self): + + # make sure that we are setting the axes + # correctly when on axis=0 or 1 + # in the presence of a non-monotonic indexer + # GH12713 + + base = self.tsframe.iloc[0:5] + r = len(base.index) + c = len(base.columns) + tso = DataFrame(np.random.randn(r, c), + index=base.index, + columns=base.columns, + dtype='float64') + # monotonic + ts = tso + grouped = ts.groupby(lambda x: x.weekday()) + result = ts - grouped.transform('mean') + expected = grouped.apply(lambda x: x - x.mean()) + assert_frame_equal(result, expected) + + ts = ts.T + grouped = ts.groupby(lambda x: x.weekday(), axis=1) + result = ts - grouped.transform('mean') + expected = grouped.apply(lambda x: (x.T - x.mean(1)).T) + assert_frame_equal(result, expected) + + # non-monotonic + ts = tso.iloc[[1, 0] + list(range(2, len(base)))] + grouped = ts.groupby(lambda x: x.weekday()) + result = ts - grouped.transform('mean') + expected = grouped.apply(lambda x: x - x.mean()) + assert_frame_equal(result, expected) + + ts = ts.T + grouped = ts.groupby(lambda x: x.weekday(), axis=1) + result = ts - grouped.transform('mean') + expected = grouped.apply(lambda x: (x.T - x.mean(1)).T) + assert_frame_equal(result, expected) + + def test_transform_dtype(self): + # GH 9807 + # Check transform dtype output is preserved + df = DataFrame([[1, 3], [2, 3]]) + result = df.groupby(1).transform('mean') + expected = DataFrame([[1.5], [1.5]]) + assert_frame_equal(result, expected) + + def test_transform_bug(self): + # GH 5712 + # transforming on a datetime column + df = DataFrame(dict(A=Timestamp('20130101'), B=np.arange(5))) + result = df.groupby('A')['B'].transform( + lambda x: x.rank(ascending=False)) + expected = Series(np.arange(5, 0, step=-1), name='B') + assert_series_equal(result, expected) + + def test_transform_numeric_to_boolean(self): + # GH 16875 + # inconsistency in transforming boolean values + 
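+        # the lambda used below returns a scalar, which transform
+        # broadcasts to every row of the group; the result should keep
+        # bool dtype whether the source column is float or int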
expected = pd.Series([True, True], name='A') + + df = pd.DataFrame({'A': [1.1, 2.2], 'B': [1, 2]}) + result = df.groupby('B').A.transform(lambda x: True) + assert_series_equal(result, expected) + + df = pd.DataFrame({'A': [1, 2], 'B': [1, 2]}) + result = df.groupby('B').A.transform(lambda x: True) + assert_series_equal(result, expected) + + def test_transform_datetime_to_timedelta(self): + # GH 15429 + # transforming a datetime to timedelta + df = DataFrame(dict(A=Timestamp('20130101'), B=np.arange(5))) + expected = pd.Series([ + Timestamp('20130101') - Timestamp('20130101')] * 5, name='A') + + # this does date math without changing result type in transform + base_time = df['A'][0] + result = df.groupby('A')['A'].transform( + lambda x: x.max() - x.min() + base_time) - base_time + assert_series_equal(result, expected) + + # this does date math and causes the transform to return timedelta + result = df.groupby('A')['A'].transform(lambda x: x.max() - x.min()) + assert_series_equal(result, expected) + + def test_transform_datetime_to_numeric(self): + # GH 10972 + # convert dt to float + df = DataFrame({ + 'a': 1, 'b': date_range('2015-01-01', periods=2, freq='D')}) + result = df.groupby('a').b.transform( + lambda x: x.dt.dayofweek - x.dt.dayofweek.mean()) + + expected = Series([-0.5, 0.5], name='b') + assert_series_equal(result, expected) + + # convert dt to int + df = DataFrame({ + 'a': 1, 'b': date_range('2015-01-01', periods=2, freq='D')}) + result = df.groupby('a').b.transform( + lambda x: x.dt.dayofweek - x.dt.dayofweek.min()) + + expected = Series([0, 1], name='b') + assert_series_equal(result, expected) + + def test_transform_casting(self): + # 13046 + data = """ + idx A ID3 DATETIME + 0 B-028 b76cd912ff "2014-10-08 13:43:27" + 1 B-054 4a57ed0b02 "2014-10-08 14:26:19" + 2 B-076 1a682034f8 "2014-10-08 14:29:01" + 3 B-023 b76cd912ff "2014-10-08 18:39:34" + 4 B-023 f88g8d7sds "2014-10-08 18:40:18" + 5 B-033 b76cd912ff "2014-10-08 18:44:30" + 6 B-032 b76cd912ff "2014-10-08 18:46:00" + 7 B-037 b76cd912ff "2014-10-08 18:52:15" + 8 B-046 db959faf02 "2014-10-08 18:59:59" + 9 B-053 b76cd912ff "2014-10-08 19:17:48" + 10 B-065 b76cd912ff "2014-10-08 19:21:38" + """ + df = pd.read_csv(StringIO(data), sep='\s+', + index_col=[0], parse_dates=['DATETIME']) + + result = df.groupby('ID3')['DATETIME'].transform(lambda x: x.diff()) + assert is_timedelta64_dtype(result.dtype) + + result = df[['ID3', 'DATETIME']].groupby('ID3').transform( + lambda x: x.diff()) + assert is_timedelta64_dtype(result.DATETIME.dtype) + + def test_transform_multiple(self): + grouped = self.ts.groupby([lambda x: x.year, lambda x: x.month]) + + grouped.transform(lambda x: x * 2) + grouped.transform(np.mean) + + def test_dispatch_transform(self): + df = self.tsframe[::5].reindex(self.tsframe.index) + + grouped = df.groupby(lambda x: x.month) + + filled = grouped.fillna(method='pad') + fillit = lambda x: x.fillna(method='pad') + expected = df.groupby(lambda x: x.month).transform(fillit) + assert_frame_equal(filled, expected) + + def test_transform_select_columns(self): + f = lambda x: x.mean() + result = self.df.groupby('A')['C', 'D'].transform(f) + + selection = self.df[['C', 'D']] + expected = selection.groupby(self.df['A']).transform(f) + + assert_frame_equal(result, expected) + + def test_transform_exclude_nuisance(self): + + # this also tests orderings in transform between + # series/frame to make sure it's consistent + expected = {} + grouped = self.df.groupby('A') + expected['C'] = grouped['C'].transform(np.mean) + 
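+        # only 'C' and 'D' appear in expected: column 'B' holds strings and
+        # is dropped as a nuisance column by the frame-level transform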
expected['D'] = grouped['D'].transform(np.mean) + expected = DataFrame(expected) + result = self.df.groupby('A').transform(np.mean) + + assert_frame_equal(result, expected) + + def test_transform_function_aliases(self): + result = self.df.groupby('A').transform('mean') + expected = self.df.groupby('A').transform(np.mean) + assert_frame_equal(result, expected) + + result = self.df.groupby('A')['C'].transform('mean') + expected = self.df.groupby('A')['C'].transform(np.mean) + assert_series_equal(result, expected) + + def test_series_fast_transform_date(self): + # GH 13191 + df = pd.DataFrame({'grouping': [np.nan, 1, 1, 3], + 'd': pd.date_range('2014-1-1', '2014-1-4')}) + result = df.groupby('grouping')['d'].transform('first') + dates = [pd.NaT, pd.Timestamp('2014-1-2'), pd.Timestamp('2014-1-2'), + pd.Timestamp('2014-1-4')] + expected = pd.Series(dates, name='d') + assert_series_equal(result, expected) + + def test_transform_length(self): + # GH 9697 + df = pd.DataFrame({'col1': [1, 1, 2, 2], 'col2': [1, 2, 3, np.nan]}) + expected = pd.Series([3.0] * 4) + + def nsum(x): + return np.nansum(x) + + results = [df.groupby('col1').transform(sum)['col2'], + df.groupby('col1')['col2'].transform(sum), + df.groupby('col1').transform(nsum)['col2'], + df.groupby('col1')['col2'].transform(nsum)] + for result in results: + assert_series_equal(result, expected, check_names=False) + + def test_transform_coercion(self): + + # 14457 + # when we are transforming be sure to not coerce + # via assignment + df = pd.DataFrame(dict(A=['a', 'a'], B=[0, 1])) + g = df.groupby('A') + + expected = g.transform(np.mean) + result = g.transform(lambda x: np.mean(x)) + assert_frame_equal(result, expected) + + def test_groupby_transform_with_int(self): + + # GH 3740, make sure that we might upcast on item-by-item transform + + # floats + df = DataFrame(dict(A=[1, 1, 1, 2, 2, 2], B=Series(1, dtype='float64'), + C=Series( + [1, 2, 3, 1, 2, 3], dtype='float64'), D='foo')) + with np.errstate(all='ignore'): + result = df.groupby('A').transform( + lambda x: (x - x.mean()) / x.std()) + expected = DataFrame(dict(B=np.nan, C=Series( + [-1, 0, 1, -1, 0, 1], dtype='float64'))) + assert_frame_equal(result, expected) + + # int case + df = DataFrame(dict(A=[1, 1, 1, 2, 2, 2], B=1, + C=[1, 2, 3, 1, 2, 3], D='foo')) + with np.errstate(all='ignore'): + result = df.groupby('A').transform( + lambda x: (x - x.mean()) / x.std()) + expected = DataFrame(dict(B=np.nan, C=[-1, 0, 1, -1, 0, 1])) + assert_frame_equal(result, expected) + + # int that needs float conversion + s = Series([2, 3, 4, 10, 5, -1]) + df = DataFrame(dict(A=[1, 1, 1, 2, 2, 2], B=1, C=s, D='foo')) + with np.errstate(all='ignore'): + result = df.groupby('A').transform( + lambda x: (x - x.mean()) / x.std()) + + s1 = s.iloc[0:3] + s1 = (s1 - s1.mean()) / s1.std() + s2 = s.iloc[3:6] + s2 = (s2 - s2.mean()) / s2.std() + expected = DataFrame(dict(B=np.nan, C=concat([s1, s2]))) + assert_frame_equal(result, expected) + + # int downcasting + result = df.groupby('A').transform(lambda x: x * 2 / 2) + expected = DataFrame(dict(B=1, C=[2, 3, 4, 10, 5, -1])) + assert_frame_equal(result, expected) + + def test_groupby_transform_with_nan_group(self): + # GH 9941 + df = pd.DataFrame({'a': range(10), + 'b': [1, 1, 2, 3, np.nan, 4, 4, 5, 5, 5]}) + result = df.groupby(df.b)['a'].transform(max) + expected = pd.Series([1., 1., 2., 3., np.nan, 6., 6., 9., 9., 9.], + name='a') + assert_series_equal(result, expected) + + def test_transform_mixed_type(self): + index = MultiIndex.from_arrays([[0, 0, 0, 1, 1, 
1], [1, 2, 3, 1, 2, 3] + ]) + df = DataFrame({'d': [1., 1., 1., 2., 2., 2.], + 'c': np.tile(['a', 'b', 'c'], 2), + 'v': np.arange(1., 7.)}, index=index) + + def f(group): + group['g'] = group['d'] * 2 + return group[:1] + + grouped = df.groupby('c') + result = grouped.apply(f) + + assert result['d'].dtype == np.float64 + + # this is by definition a mutating operation! + with option_context('mode.chained_assignment', None): + for key, group in grouped: + res = f(group) + assert_frame_equal(res, result.loc[key]) + + def test_cython_group_transform_algos(self): + # GH 4095 + dtypes = [np.int8, np.int16, np.int32, np.int64, np.uint8, np.uint32, + np.uint64, np.float32, np.float64] + + ops = [(groupby.group_cumprod_float64, np.cumproduct, [np.float64]), + (groupby.group_cumsum, np.cumsum, dtypes)] + + is_datetimelike = False + for pd_op, np_op, dtypes in ops: + for dtype in dtypes: + data = np.array([[1], [2], [3], [4]], dtype=dtype) + ans = np.zeros_like(data) + labels = np.array([0, 0, 0, 0], dtype=np.int64) + pd_op(ans, data, labels, is_datetimelike) + tm.assert_numpy_array_equal(np_op(data), ans[:, 0], + check_dtype=False) + + # with nans + labels = np.array([0, 0, 0, 0, 0], dtype=np.int64) + + data = np.array([[1], [2], [3], [np.nan], [4]], dtype='float64') + actual = np.zeros_like(data) + actual.fill(np.nan) + groupby.group_cumprod_float64(actual, data, labels, is_datetimelike) + expected = np.array([1, 2, 6, np.nan, 24], dtype='float64') + tm.assert_numpy_array_equal(actual[:, 0], expected) + + actual = np.zeros_like(data) + actual.fill(np.nan) + groupby.group_cumsum(actual, data, labels, is_datetimelike) + expected = np.array([1, 3, 6, np.nan, 10], dtype='float64') + tm.assert_numpy_array_equal(actual[:, 0], expected) + + # timedelta + is_datetimelike = True + data = np.array([np.timedelta64(1, 'ns')] * 5, dtype='m8[ns]')[:, None] + actual = np.zeros_like(data, dtype='int64') + groupby.group_cumsum(actual, data.view('int64'), labels, + is_datetimelike) + expected = np.array([np.timedelta64(1, 'ns'), np.timedelta64( + 2, 'ns'), np.timedelta64(3, 'ns'), np.timedelta64(4, 'ns'), + np.timedelta64(5, 'ns')]) + tm.assert_numpy_array_equal(actual[:, 0].view('m8[ns]'), expected) + + def test_cython_transform(self): + # GH 4095 + ops = [(('cumprod', + ()), lambda x: x.cumprod()), (('cumsum', ()), + lambda x: x.cumsum()), + (('shift', (-1, )), + lambda x: x.shift(-1)), (('shift', + (1, )), lambda x: x.shift())] + + s = Series(np.random.randn(1000)) + s_missing = s.copy() + s_missing.iloc[2:10] = np.nan + labels = np.random.randint(0, 50, size=1000).astype(float) + + # series + for (op, args), targop in ops: + for data in [s, s_missing]: + # print(data.head()) + expected = data.groupby(labels).transform(targop) + + tm.assert_series_equal(expected, + data.groupby(labels).transform(op, + *args)) + tm.assert_series_equal(expected, getattr( + data.groupby(labels), op)(*args)) + + strings = list('qwertyuiopasdfghjklz') + strings_missing = strings[:] + strings_missing[5] = np.nan + df = DataFrame({'float': s, + 'float_missing': s_missing, + 'int': [1, 1, 1, 1, 2] * 200, + 'datetime': pd.date_range('1990-1-1', periods=1000), + 'timedelta': pd.timedelta_range(1, freq='s', + periods=1000), + 'string': strings * 50, + 'string_missing': strings_missing * 50}) + df['cat'] = df['string'].astype('category') + + df2 = df.copy() + df2.index = pd.MultiIndex.from_product([range(100), range(10)]) + + # DataFrame - Single and MultiIndex, + # group by values, index level, columns + for df in [df, df2]: + for gb_target 
in [dict(by=labels), dict(level=0), dict(by='string')
+                              ]:  # dict(by='string_missing')]:
+                # dict(by=['int','string'])]:
+
+                gb = df.groupby(**gb_target)
+
+                for (op, args), targop in ops:
+                    # whitelisted methods set the selection before applying;
+                    # a bit of a hack to make sure the cythonized shift
+                    # is equivalent to pre 0.17.1 behavior
+                    if op == 'shift':
+                        gb._set_group_selection()
+
+                    if op != 'shift' and 'int' not in gb_target:
+                        # numeric apply fastpath promotes dtype so have
+                        # to apply separately and concat
+                        i = gb[['int']].apply(targop)
+                        f = gb[['float', 'float_missing']].apply(targop)
+                        expected = pd.concat([f, i], axis=1)
+                    else:
+                        expected = gb.apply(targop)
+
+                    expected = expected.sort_index(axis=1)
+                    tm.assert_frame_equal(expected,
+                                          gb.transform(op, *args).sort_index(
+                                              axis=1))
+                    tm.assert_frame_equal(expected, getattr(gb, op)(*args))
+
+                    # individual columns
+                    for c in df:
+                        if c not in ['float', 'int', 'float_missing'
+                                     ] and op != 'shift':
+                            pytest.raises(DataError, gb[c].transform, op)
+                            pytest.raises(DataError, getattr(gb[c], op))
+                        else:
+                            expected = gb[c].apply(targop)
+                            expected.name = c
+                            tm.assert_series_equal(expected,
+                                                   gb[c].transform(op, *args))
+                            tm.assert_series_equal(expected,
+                                                   getattr(gb[c], op)(*args))
+
+    def test_transform_with_non_scalar_group(self):
+        # GH 10165
+        cols = pd.MultiIndex.from_tuples([
+            ('syn', 'A'), ('mis', 'A'), ('non', 'A'),
+            ('syn', 'C'), ('mis', 'C'), ('non', 'C'),
+            ('syn', 'T'), ('mis', 'T'), ('non', 'T'),
+            ('syn', 'G'), ('mis', 'G'), ('non', 'G')])
+        df = pd.DataFrame(np.random.randint(1, 10, (4, 12)),
+                          columns=cols,
+                          index=['A', 'C', 'G', 'T'])
+        tm.assert_raises_regex(ValueError, 'transform must return '
+                                           'a scalar value for each '
+                                           'group.*',
+                               df.groupby(axis=1, level=1).transform,
+                               lambda z: z.div(z.sum(axis=1), axis=0))
diff --git a/pandas/tests/groupby/test_value_counts.py b/pandas/tests/groupby/test_value_counts.py
new file mode 100644
index 0000000000000..b70a03ec3a1d3
--- /dev/null
+++ b/pandas/tests/groupby/test_value_counts.py
@@ -0,0 +1,61 @@
+import pytest
+
+from itertools import product
+import numpy as np
+
+from pandas.util import testing as tm
+from pandas import MultiIndex, DataFrame, Series, date_range
+
+
+@pytest.mark.slow
+@pytest.mark.parametrize("n,m", product((100, 1000), (5, 20)))
+def test_series_groupby_value_counts(n, m):
+    np.random.seed(1234)
+
+    def rebuild_index(df):
+        arr = list(map(df.index.get_level_values, range(df.index.nlevels)))
+        df.index = MultiIndex.from_arrays(arr, names=df.index.names)
+        return df
+
+    def check_value_counts(df, keys, bins):
+        for isort, normalize, sort, ascending, dropna \
+                in product((False, True), repeat=5):
+
+            kwargs = dict(normalize=normalize, sort=sort,
+                          ascending=ascending, dropna=dropna, bins=bins)
+
+            gr = df.groupby(keys, sort=isort)
+            left = gr['3rd'].value_counts(**kwargs)
+
+            gr = df.groupby(keys, sort=isort)
+            right = gr['3rd'].apply(Series.value_counts, **kwargs)
+            right.index.names = right.index.names[:-1] + ['3rd']
+
+            # have to sort on index because of unstable sort on values
+            left, right = map(rebuild_index, (left, right))  # xref GH9212
+            tm.assert_series_equal(left.sort_index(), right.sort_index())
+
+    def loop(df):
+        bins = None, np.arange(0, max(5, df['3rd'].max()) + 1, 2)
+        keys = '1st', '2nd', ('1st', '2nd')
+        for k, b in product(keys, bins):
+            check_value_counts(df, k, b)
+
+    days = date_range('2015-08-24', periods=10)
+
+    frame = DataFrame({
+        '1st': np.random.choice(
+            list('abcd'), n),
+        '2nd': np.random.choice(days, n),
+        '3rd':
np.random.randint(1, m + 1, n) + }) + + loop(frame) + + frame.loc[1::11, '1st'] = np.nan + frame.loc[3::17, '2nd'] = np.nan + frame.loc[7::19, '3rd'] = np.nan + frame.loc[8::19, '3rd'] = np.nan + frame.loc[9::19, '3rd'] = np.nan + + loop(frame) diff --git a/pandas/tests/groupby/test_whitelist.py b/pandas/tests/groupby/test_whitelist.py new file mode 100644 index 0000000000000..2c8bf57f20fae --- /dev/null +++ b/pandas/tests/groupby/test_whitelist.py @@ -0,0 +1,303 @@ +""" +test methods relating to generic function evaluation +the so-called white/black lists +""" + +import pytest +from string import ascii_lowercase +import numpy as np +from pandas import DataFrame, Series, compat, date_range, Index, MultiIndex +from pandas.util import testing as tm +from pandas.compat import lrange, product + +AGG_FUNCTIONS = ['sum', 'prod', 'min', 'max', 'median', 'mean', 'skew', + 'mad', 'std', 'var', 'sem'] +AGG_FUNCTIONS_WITH_SKIPNA = ['skew', 'mad'] + +df_whitelist = frozenset([ + 'last', + 'first', + 'mean', + 'sum', + 'min', + 'max', + 'head', + 'tail', + 'cumcount', + 'ngroup', + 'resample', + 'rank', + 'quantile', + 'fillna', + 'mad', + 'any', + 'all', + 'take', + 'idxmax', + 'idxmin', + 'shift', + 'tshift', + 'ffill', + 'bfill', + 'pct_change', + 'skew', + 'plot', + 'boxplot', + 'hist', + 'median', + 'dtypes', + 'corrwith', + 'corr', + 'cov', + 'diff', +]) + +s_whitelist = frozenset([ + 'last', + 'first', + 'mean', + 'sum', + 'min', + 'max', + 'head', + 'tail', + 'cumcount', + 'ngroup', + 'resample', + 'rank', + 'quantile', + 'fillna', + 'mad', + 'any', + 'all', + 'take', + 'idxmax', + 'idxmin', + 'shift', + 'tshift', + 'ffill', + 'bfill', + 'pct_change', + 'skew', + 'plot', + 'hist', + 'median', + 'dtype', + 'corr', + 'cov', + 'diff', + 'unique', + 'nlargest', + 'nsmallest', +]) + + +@pytest.fixture +def mframe(): + index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', + 'three']], + labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], + [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], + names=['first', 'second']) + return DataFrame(np.random.randn(10, 3), index=index, + columns=['A', 'B', 'C']) + + +@pytest.fixture +def df(): + return DataFrame( + {'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], + 'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], + 'C': np.random.randn(8), + 'D': np.random.randn(8)}) + + +@pytest.fixture +def df_letters(): + letters = np.array(list(ascii_lowercase)) + N = 10 + random_letters = letters.take(np.random.randint(0, 26, N)) + df = DataFrame({'floats': N / 10 * Series(np.random.random(N)), + 'letters': Series(random_letters)}) + return df + + +@pytest.mark.parametrize( + "obj, whitelist", zip((df_letters(), df_letters().floats), + (df_whitelist, s_whitelist))) +def test_groupby_whitelist(df_letters, obj, whitelist): + df = df_letters + + # these are aliases so ok to have the alias __name__ + alias = {'bfill': 'backfill', + 'ffill': 'pad', + 'boxplot': None} + + gb = obj.groupby(df.letters) + + assert whitelist == gb._apply_whitelist + for m in whitelist: + + m = alias.get(m, m) + if m is None: + continue + + f = getattr(type(gb), m) + + # name + try: + n = f.__name__ + except AttributeError: + continue + assert n == m + + # qualname + if compat.PY3: + try: + n = f.__qualname__ + except AttributeError: + continue + assert n.endswith(m) + + +@pytest.fixture +def raw_frame(): + index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', + 'three']], + labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], + [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], + 
names=['first', 'second'])
+    raw_frame = DataFrame(np.random.randn(10, 3), index=index,
+                          columns=Index(['A', 'B', 'C'], name='exp'))
+    raw_frame.iloc[1, [1, 2]] = np.nan
+    raw_frame.iloc[7, [0, 1]] = np.nan
+    return raw_frame
+
+
+@pytest.mark.parametrize(
+    "op, level, axis, skipna",
+    product(AGG_FUNCTIONS,
+            lrange(2), lrange(2),
+            [True, False]))
+def test_regression_whitelist_methods(raw_frame, op, level, axis, skipna):
+    # GH6944
+    # explicitly test the whitelisted methods
+
+    if axis == 0:
+        frame = raw_frame
+    else:
+        frame = raw_frame.T
+
+    if op in AGG_FUNCTIONS_WITH_SKIPNA:
+        grouped = frame.groupby(level=level, axis=axis)
+        result = getattr(grouped, op)(skipna=skipna)
+        expected = getattr(frame, op)(level=level, axis=axis,
+                                      skipna=skipna)
+        tm.assert_frame_equal(result, expected)
+    else:
+        grouped = frame.groupby(level=level, axis=axis)
+        result = getattr(grouped, op)()
+        expected = getattr(frame, op)(level=level, axis=axis)
+        tm.assert_frame_equal(result, expected)
+
+
+def test_groupby_blacklist(df_letters):
+    df = df_letters
+    s = df_letters.floats
+
+    blacklist = [
+        'eval', 'query', 'abs', 'where',
+        'mask', 'align', 'groupby', 'clip', 'astype',
+        'at', 'combine', 'consolidate', 'convert_objects',
+    ]
+    to_methods = [method for method in dir(df) if method.startswith('to_')]
+
+    blacklist.extend(to_methods)
+
+    # e.g., to_csv
+    defined_but_not_allowed = ("(?:^Cannot.+{0!r}.+{1!r}.+try using the "
+                               "'apply' method$)")
+
+    # e.g., query, eval
+    not_defined = "(?:^{1!r} object has no attribute {0!r}$)"
+    fmt = defined_but_not_allowed + '|' + not_defined
+    for bl in blacklist:
+        for obj in (df, s):
+            gb = obj.groupby(df.letters)
+            msg = fmt.format(bl, type(gb).__name__)
+            with tm.assert_raises_regex(AttributeError, msg):
+                getattr(gb, bl)
+
+
+def test_tab_completion(mframe):
+    grp = mframe.groupby(level='second')
+    results = set([v for v in dir(grp) if not v.startswith('_')])
+    expected = set(
+        ['A', 'B', 'C', 'agg', 'aggregate', 'apply', 'boxplot', 'filter',
+         'first', 'get_group', 'groups', 'hist', 'indices', 'last', 'max',
+         'mean', 'median', 'min', 'ngroups', 'nth', 'ohlc', 'plot',
+         'prod', 'size', 'std', 'sum', 'transform', 'var', 'sem', 'count',
+         'nunique', 'head', 'describe', 'cummax', 'quantile',
+         'rank', 'cumprod', 'tail', 'resample', 'cummin', 'fillna',
+         'cumsum', 'cumcount', 'ngroup', 'all', 'shift', 'skew',
+         'take', 'tshift', 'pct_change', 'any', 'mad', 'corr', 'corrwith',
+         'cov', 'dtypes', 'ndim', 'diff', 'idxmax', 'idxmin',
+         'ffill', 'bfill', 'pad', 'backfill', 'rolling', 'expanding'])
+    assert results == expected
+
+
+def test_groupby_function_rename(mframe):
+    grp = mframe.groupby(level='second')
+    for name in ['sum', 'prod', 'min', 'max', 'first', 'last']:
+        f = getattr(grp, name)
+        assert f.__name__ == name
+
+
+def test_groupby_selection_with_methods(df):
+    # some methods which require DatetimeIndex
+    rng = date_range('2014', periods=len(df))
+    df.index = rng
+
+    g = df.groupby(['A'])[['C']]
+    g_exp = df[['C']].groupby(df['A'])
+    # TODO check groupby with > 1 col ?
+
+    # methods which are called as .foo()
+    methods = ['count',
+               'corr',
+               'cummax',
+               'cummin',
+               'cumprod',
+               'describe',
+               'rank',
+               'quantile',
+               'diff',
+               'shift',
+               'all',
+               'any',
+               'idxmin',
+               'idxmax',
+               'ffill',
+               'bfill',
+               'pct_change',
+               'tshift']
+
+    for m in methods:
+        res = getattr(g, m)()
+        exp = getattr(g_exp, m)()
+
+        # should always be frames!
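+        # (the groupby used a list selection, g = df.groupby(['A'])[['C']],
+        # so even single-column results come back as DataFrames)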
+ tm.assert_frame_equal(res, exp) + + # methods which aren't just .foo() + tm.assert_frame_equal(g.fillna(0), g_exp.fillna(0)) + tm.assert_frame_equal(g.dtypes, g_exp.dtypes) + tm.assert_frame_equal(g.apply(lambda x: x.sum()), + g_exp.apply(lambda x: x.sum())) + + tm.assert_frame_equal(g.resample('D').mean(), g_exp.resample('D').mean()) + tm.assert_frame_equal(g.resample('D').ohlc(), + g_exp.resample('D').ohlc()) + + tm.assert_frame_equal(g.filter(lambda x: len(x) == 3), + g_exp.filter(lambda x: len(x) == 3)) diff --git a/pandas/tests/indexes/common.py b/pandas/tests/indexes/common.py index 5a482acf403cd..1fdc08d68eb26 100644 --- a/pandas/tests/indexes/common.py +++ b/pandas/tests/indexes/common.py @@ -1,5 +1,7 @@ # -*- coding: utf-8 -*- +import pytest + from pandas import compat from pandas.compat import PY3 @@ -7,9 +9,11 @@ from pandas import (Series, Index, Float64Index, Int64Index, UInt64Index, RangeIndex, MultiIndex, CategoricalIndex, DatetimeIndex, - TimedeltaIndex, PeriodIndex, notnull) -from pandas.types.common import needs_i8_conversion -from pandas.util.testing import assertRaisesRegexp + TimedeltaIndex, PeriodIndex, IntervalIndex, + notna, isna) +from pandas.core.indexes.datetimelike import DatetimeIndexOpsMixin +from pandas.core.dtypes.common import needs_i8_conversion +from pandas._libs.tslib import iNaT import pandas.util.testing as tm @@ -26,8 +30,8 @@ def setup_indices(self): setattr(self, name, idx) def verify_pickle(self, index): - unpickled = self.round_trip_pickle(index) - self.assertTrue(index.equals(unpickled)) + unpickled = tm.round_trip_pickle(index) + assert index.equals(unpickled) def test_pickle_compat_construction(self): # this is testing for pickle compat @@ -35,14 +39,23 @@ def test_pickle_compat_construction(self): return # need an object to create with - self.assertRaises(TypeError, self._holder) + pytest.raises(TypeError, self._holder) + + def test_to_series(self): + # assert that we are creating a copy of the index + + idx = self.create_index() + s = idx.to_series() + assert s.values is not idx.values + assert s.index is not idx + assert s.name == idx.name def test_shift(self): # GH8083 test the base class for shift idx = self.create_index() - self.assertRaises(NotImplementedError, idx.shift, 1) - self.assertRaises(NotImplementedError, idx.shift, 1, 2) + pytest.raises(NotImplementedError, idx.shift, 1) + pytest.raises(NotImplementedError, idx.shift, 1, 2) def test_create_index_existing_name(self): @@ -77,26 +90,26 @@ def test_create_index_existing_name(self): def test_numeric_compat(self): idx = self.create_index() - tm.assertRaisesRegexp(TypeError, "cannot perform __mul__", - lambda: idx * 1) - tm.assertRaisesRegexp(TypeError, "cannot perform __mul__", - lambda: 1 * idx) + tm.assert_raises_regex(TypeError, "cannot perform __mul__", + lambda: idx * 1) + tm.assert_raises_regex(TypeError, "cannot perform __mul__", + lambda: 1 * idx) div_err = "cannot perform __truediv__" if PY3 \ else "cannot perform __div__" - tm.assertRaisesRegexp(TypeError, div_err, lambda: idx / 1) - tm.assertRaisesRegexp(TypeError, div_err, lambda: 1 / idx) - tm.assertRaisesRegexp(TypeError, "cannot perform __floordiv__", - lambda: idx // 1) - tm.assertRaisesRegexp(TypeError, "cannot perform __floordiv__", - lambda: 1 // idx) + tm.assert_raises_regex(TypeError, div_err, lambda: idx / 1) + tm.assert_raises_regex(TypeError, div_err, lambda: 1 / idx) + tm.assert_raises_regex(TypeError, "cannot perform __floordiv__", + lambda: idx // 1) + tm.assert_raises_regex(TypeError, "cannot perform 
__floordiv__", + lambda: 1 // idx) def test_logical_compat(self): idx = self.create_index() - tm.assertRaisesRegexp(TypeError, 'cannot perform all', - lambda: idx.all()) - tm.assertRaisesRegexp(TypeError, 'cannot perform any', - lambda: idx.any()) + tm.assert_raises_regex(TypeError, 'cannot perform all', + lambda: idx.all()) + tm.assert_raises_regex(TypeError, 'cannot perform any', + lambda: idx.any()) def test_boolean_context_compat(self): @@ -107,7 +120,7 @@ def f(): if idx: pass - tm.assertRaisesRegexp(ValueError, 'The truth value of a', f) + tm.assert_raises_regex(ValueError, 'The truth value of a', f) def test_reindex_base(self): idx = self.create_index() @@ -116,18 +129,31 @@ def test_reindex_base(self): actual = idx.get_indexer(idx) tm.assert_numpy_array_equal(expected, actual) - with tm.assertRaisesRegexp(ValueError, 'Invalid fill method'): + with tm.assert_raises_regex(ValueError, 'Invalid fill method'): idx.get_indexer(idx, method='invalid') - def test_ndarray_compat_properties(self): + def test_get_indexer_consistency(self): + # See GH 16819 + for name, index in self.indices.items(): + if isinstance(index, IntervalIndex): + continue + indexer = index.get_indexer(index[0:2]) + assert isinstance(indexer, np.ndarray) + assert indexer.dtype == np.intp + + indexer, _ = index.get_indexer_non_unique(index[0:2]) + assert isinstance(indexer, np.ndarray) + assert indexer.dtype == np.intp + + def test_ndarray_compat_properties(self): idx = self.create_index() - self.assertTrue(idx.T.equals(idx)) - self.assertTrue(idx.transpose().equals(idx)) + assert idx.T.equals(idx) + assert idx.transpose().equals(idx) values = idx.values for prop in self._compat_props: - self.assertEqual(getattr(idx, prop), getattr(values, prop)) + assert getattr(idx, prop) == getattr(values, prop) # test for validity idx.nbytes @@ -143,14 +169,14 @@ def test_str(self): # test the string repr idx = self.create_index() idx.name = 'foo' - self.assertTrue("'foo'" in str(idx)) - self.assertTrue(idx.__class__.__name__ in str(idx)) + assert "'foo'" in str(idx) + assert idx.__class__.__name__ in str(idx) def test_dtype_str(self): for idx in self.indices.values(): dtype = idx.dtype_str - self.assertIsInstance(dtype, compat.string_types) - self.assertEqual(dtype, str(idx.dtype)) + assert isinstance(dtype, compat.string_types) + assert dtype == str(idx.dtype) def test_repr_max_seq_item_setting(self): # GH10182 @@ -158,14 +184,14 @@ def test_repr_max_seq_item_setting(self): idx = idx.repeat(50) with pd.option_context("display.max_seq_items", None): repr(idx) - self.assertFalse('...' in str(idx)) + assert '...' 
not in str(idx) def test_wrong_number_names(self): def testit(ind): ind.names = ["apple", "banana", "carrot"] for ind in self.indices.values(): - assertRaisesRegexp(ValueError, "^Length", testit, ind) + tm.assert_raises_regex(ValueError, "^Length", testit, ind) def test_set_name_methods(self): new_name = "This is the new name for this index" @@ -177,35 +203,36 @@ def test_set_name_methods(self): original_name = ind.name new_ind = ind.set_names([new_name]) - self.assertEqual(new_ind.name, new_name) - self.assertEqual(ind.name, original_name) + assert new_ind.name == new_name + assert ind.name == original_name res = ind.rename(new_name, inplace=True) # should return None - self.assertIsNone(res) - self.assertEqual(ind.name, new_name) - self.assertEqual(ind.names, [new_name]) - # with assertRaisesRegexp(TypeError, "list-like"): + assert res is None + assert ind.name == new_name + assert ind.names == [new_name] + # with tm.assert_raises_regex(TypeError, "list-like"): # # should still fail even if it would be the right length # ind.set_names("a") - with assertRaisesRegexp(ValueError, "Level must be None"): + with tm.assert_raises_regex(ValueError, "Level must be None"): ind.set_names("a", level=0) # rename in place just leaves tuples and other containers alone name = ('A', 'B') ind.rename(name, inplace=True) - self.assertEqual(ind.name, name) - self.assertEqual(ind.names, [name]) + assert ind.name == name + assert ind.names == [name] def test_hash_error(self): for ind in self.indices.values(): - with tm.assertRaisesRegexp(TypeError, "unhashable type: %r" % - type(ind).__name__): + with tm.assert_raises_regex(TypeError, "unhashable type: %r" % + type(ind).__name__): hash(ind) def test_copy_name(self): - # Check that "name" argument passed at initialization is honoured - # GH12309 + # gh-12309: Check that the "name" argument + # passed at initialization is honored. + for name, index in compat.iteritems(self.indices): if isinstance(index, MultiIndex): continue @@ -214,18 +241,21 @@ def test_copy_name(self): second = first.__class__(first, copy=False) # Even though "copy=False", we want a new object. - self.assertIsNot(first, second) - # Not using tm.assert_index_equal() since names differ: - self.assertTrue(index.equals(first)) + assert first is not second + + # Not using tm.assert_index_equal() since names differ. 
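For reference, the `set_names`/`rename` semantics these assertions rely on, sketched on a flat index:

```python
import pandas as pd

idx = pd.Index([1, 2, 3], name='old')

renamed = idx.set_names(['new'])          # returns a modified copy
assert renamed.name == 'new' and idx.name == 'old'

res = idx.rename('newer', inplace=True)   # mutates in place, returns None
assert res is None and idx.name == 'newer'
```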
+ assert index.equals(first) - self.assertEqual(first.name, 'mario') - self.assertEqual(second.name, 'mario') + assert first.name == 'mario' + assert second.name == 'mario' s1 = Series(2, index=first) s2 = Series(3, index=second[:-1]) - if not isinstance(index, CategoricalIndex): # See GH13365 + + if not isinstance(index, CategoricalIndex): + # See gh-13365 s3 = s1 * s2 - self.assertEqual(s3.index.name, 'mario') + assert s3.index.name == 'mario' def test_ensure_copied_data(self): # Check the "copy" argument of each Index.__new__ is honoured @@ -246,18 +276,21 @@ def test_ensure_copied_data(self): tm.assert_numpy_array_equal(index.values, result.values, check_same='copy') - if not isinstance(index, PeriodIndex): - result = index_type(index.values, copy=False, **init_kwargs) - tm.assert_numpy_array_equal(index.values, result.values, - check_same='same') - tm.assert_numpy_array_equal(index._values, result._values, - check_same='same') - else: + if isinstance(index, PeriodIndex): # .values an object array of Period, thus copied result = index_type(ordinal=index.asi8, copy=False, **init_kwargs) tm.assert_numpy_array_equal(index._values, result._values, check_same='same') + elif isinstance(index, IntervalIndex): + # checked in test_interval.py + pass + else: + result = index_type(index.values, copy=False, **init_kwargs) + tm.assert_numpy_array_equal(index.values, result.values, + check_same='same') + tm.assert_numpy_array_equal(index._values, result._values, + check_same='same') def test_copy_and_deepcopy(self): from copy import copy, deepcopy @@ -270,11 +303,11 @@ def test_copy_and_deepcopy(self): for func in (copy, deepcopy): idx_copy = func(ind) - self.assertIsNot(idx_copy, ind) - self.assertTrue(idx_copy.equals(ind)) + assert idx_copy is not ind + assert idx_copy.equals(ind) new_copy = ind.copy(deep=True, name="banana") - self.assertEqual(new_copy.name, "banana") + assert new_copy.name == "banana" def test_duplicates(self): for ind in self.indices.values(): @@ -284,15 +317,15 @@ def test_duplicates(self): if isinstance(ind, MultiIndex): continue idx = self._holder([ind[0]] * 5) - self.assertFalse(idx.is_unique) - self.assertTrue(idx.has_duplicates) + assert not idx.is_unique + assert idx.has_duplicates # GH 10115 # preserve names idx.name = 'foo' result = idx.drop_duplicates() - self.assertEqual(result.name, 'foo') - self.assert_index_equal(result, Index([ind[0]], name='foo')) + assert result.name == 'foo' + tm.assert_index_equal(result, Index([ind[0]], name='foo')) def test_get_unique_index(self): for ind in self.indices.values(): @@ -303,26 +336,26 @@ def test_get_unique_index(self): idx = ind[[0] * 5] idx_unique = ind[[0]] + # We test against `idx_unique`, so first we make sure it's unique # and doesn't contain nans. 
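The duplicate-handling assertions above reduce to the following behavior (a sketch, assuming an object-dtype index):

```python
import pandas as pd

idx = pd.Index(['a', 'a', 'a', 'b'], name='foo')
assert not idx.is_unique
assert idx.has_duplicates

# drop_duplicates keeps the first occurrence and preserves the name
result = idx.drop_duplicates()
assert result.name == 'foo'
assert result.tolist() == ['a', 'b']
```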
- self.assertTrue(idx_unique.is_unique) + assert idx_unique.is_unique try: - self.assertFalse(idx_unique.hasnans) + assert not idx_unique.hasnans except NotImplementedError: pass for dropna in [False, True]: result = idx._get_unique_index(dropna=dropna) - self.assert_index_equal(result, idx_unique) + tm.assert_index_equal(result, idx_unique) # nans: - if not ind._can_hold_na: continue if needs_i8_conversion(ind): vals = ind.asi8[[0] * 5] - vals[0] = pd.tslib.iNaT + vals[0] = iNaT else: vals = ind.values[[0] * 5] vals[0] = np.nan @@ -330,41 +363,35 @@ def test_get_unique_index(self): vals_unique = vals[:2] idx_nan = ind._shallow_copy(vals) idx_unique_nan = ind._shallow_copy(vals_unique) - self.assertTrue(idx_unique_nan.is_unique) + assert idx_unique_nan.is_unique - self.assertEqual(idx_nan.dtype, ind.dtype) - self.assertEqual(idx_unique_nan.dtype, ind.dtype) + assert idx_nan.dtype == ind.dtype + assert idx_unique_nan.dtype == ind.dtype for dropna, expected in zip([False, True], [idx_unique_nan, idx_unique]): for i in [idx_nan, idx_unique_nan]: result = i._get_unique_index(dropna=dropna) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) def test_sort(self): for ind in self.indices.values(): - self.assertRaises(TypeError, ind.sort) - - def test_order(self): - for ind in self.indices.values(): - # 9816 deprecated - with tm.assert_produces_warning(FutureWarning): - ind.order() + pytest.raises(TypeError, ind.sort) def test_mutability(self): for ind in self.indices.values(): if not len(ind): continue - self.assertRaises(TypeError, ind.__setitem__, 0, ind[0]) + pytest.raises(TypeError, ind.__setitem__, 0, ind[0]) def test_view(self): for ind in self.indices.values(): i_view = ind.view() - self.assertEqual(i_view.name, ind.name) + assert i_view.name == ind.name def test_compat(self): for ind in self.indices.values(): - self.assertEqual(ind.tolist(), list(ind)) + assert ind.tolist() == list(ind) def test_memory_usage(self): for name, index in compat.iteritems(self.indices): @@ -374,17 +401,18 @@ def test_memory_usage(self): result2 = index.memory_usage() result3 = index.memory_usage(deep=True) - # RangeIndex doesn't use a hashtable engine - if not isinstance(index, RangeIndex): - self.assertTrue(result2 > result) + # RangeIndex, IntervalIndex + # don't have engines + if not isinstance(index, (RangeIndex, IntervalIndex)): + assert result2 > result if index.inferred_type == 'object': - self.assertTrue(result3 > result2) + assert result3 > result2 else: # we report 0 for no-length - self.assertEqual(result, 0) + assert result == 0 def test_argsort(self): for k, ind in self.indices.items(): @@ -407,21 +435,21 @@ def test_numpy_argsort(self): # pandas compatibility input validation - the # rest already perform separate (or no) such # validation via their 'values' attribute as - # defined in pandas/indexes/base.py - they + # defined in pandas.core.indexes/base.py - they # cannot be changed at the moment due to # backwards compatibility concerns if isinstance(type(ind), (CategoricalIndex, RangeIndex)): msg = "the 'axis' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, - np.argsort, ind, axis=1) + tm.assert_raises_regex(ValueError, msg, + np.argsort, ind, axis=1) msg = "the 'kind' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.argsort, - ind, kind='mergesort') + tm.assert_raises_regex(ValueError, msg, np.argsort, + ind, kind='mergesort') msg = "the 'order' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, 
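`test_memory_usage` relies on `deep=True` additionally counting the Python objects behind an object-dtype index; a compact illustration of what the test pins down, not a general guarantee:

```python
import numpy as np
import pandas as pd

idx = pd.Index(['b', 'a', 'c'])

# argsort on an Index defers to the underlying ndarray
np.testing.assert_array_equal(idx.argsort(), idx.values.argsort())

# deep=True also counts the Python string payloads
assert idx.memory_usage(deep=True) > idx.memory_usage()
```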
np.argsort, - ind, order=('a', 'b')) + tm.assert_raises_regex(ValueError, msg, np.argsort, + ind, order=('a', 'b')) def test_pickle(self): for ind in self.indices.values(): @@ -439,12 +467,12 @@ def test_take(self): result = ind.take(indexer) expected = ind[indexer] - self.assertTrue(result.equals(expected)) + assert result.equals(expected) if not isinstance(ind, (DatetimeIndex, PeriodIndex, TimedeltaIndex)): # GH 10791 - with tm.assertRaises(AttributeError): + with pytest.raises(AttributeError): ind.freq def test_take_invalid_kwargs(self): @@ -452,16 +480,16 @@ def test_take_invalid_kwargs(self): indices = [1, 2] msg = r"take\(\) got an unexpected keyword argument 'foo'" - tm.assertRaisesRegexp(TypeError, msg, idx.take, - indices, foo=2) + tm.assert_raises_regex(TypeError, msg, idx.take, + indices, foo=2) msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, idx.take, - indices, out=indices) + tm.assert_raises_regex(ValueError, msg, idx.take, + indices, out=indices) msg = "the 'mode' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, idx.take, - indices, mode='clip') + tm.assert_raises_regex(ValueError, msg, idx.take, + indices, mode='clip') def test_repeat(self): rep = 2 @@ -481,12 +509,12 @@ def test_numpy_repeat(self): tm.assert_index_equal(np.repeat(i, rep), expected) msg = "the 'axis' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.repeat, - i, rep, axis=0) + tm.assert_raises_regex(ValueError, msg, np.repeat, + i, rep, axis=0) def test_where(self): i = self.create_index() - result = i.where(notnull(i)) + result = i.where(notna(i)) expected = i tm.assert_index_equal(result, expected) @@ -497,6 +525,18 @@ def test_where(self): result = i.where(cond) tm.assert_index_equal(result, expected) + def test_where_array_like(self): + i = self.create_index() + + _nan = i._na_value + cond = [False] + [True] * (len(i) - 1) + klasses = [list, tuple, np.array, pd.Series] + expected = pd.Index([_nan] + i[1:].tolist(), dtype=i.dtype) + + for klass in klasses: + result = i.where(klass(cond)) + tm.assert_index_equal(result, expected) + def test_setops_errorcases(self): for name, idx in compat.iteritems(self.indices): # # non-iterable input @@ -506,9 +546,10 @@ def test_setops_errorcases(self): for method in methods: for case in cases: - assertRaisesRegexp(TypeError, - "Input must be Index or array-like", - method, case) + tm.assert_raises_regex(TypeError, + "Input must be Index " + "or array-like", + method, case) def test_intersection_base(self): for name, idx in compat.iteritems(self.indices): @@ -519,7 +560,7 @@ def test_intersection_base(self): if isinstance(idx, CategoricalIndex): pass else: - self.assertTrue(tm.equalContents(intersect, second)) + assert tm.equalContents(intersect, second) # GH 10149 cases = [klass(second.values) @@ -527,17 +568,17 @@ def test_intersection_base(self): for case in cases: if isinstance(idx, PeriodIndex): msg = "can only call with other PeriodIndex-ed objects" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): result = first.intersection(case) elif isinstance(idx, CategoricalIndex): pass else: result = first.intersection(case) - self.assertTrue(tm.equalContents(result, second)) + assert tm.equalContents(result, second) if isinstance(idx, MultiIndex): msg = "other must be a MultiIndex or a list of tuples" - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): result = first.intersection([1, 2, 3]) def 
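`test_where_array_like` is new here: the condition passed to `Index.where` may be any list-like, not only an ndarray. A sketch, using `pd.testing.assert_index_equal` (the public alias of the `tm` helper):

```python
import numpy as np
import pandas as pd

i = pd.Index([10.0, 20.0, 30.0])
expected = pd.Index([np.nan, 20.0, 30.0])

# entries where the condition is False are replaced by the NA value
pd.testing.assert_index_equal(i.where(np.array([False, True, True])), expected)

# per the new test above, plain list/tuple/Series conditions work too
pd.testing.assert_index_equal(i.where([False, True, True]), expected)
```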
test_union_base(self): @@ -546,7 +587,7 @@ def test_union_base(self): second = idx[:5] everything = idx union = first.union(second) - self.assertTrue(tm.equalContents(union, everything)) + assert tm.equalContents(union, everything) # GH 10149 cases = [klass(second.values) @@ -554,17 +595,17 @@ def test_union_base(self): for case in cases: if isinstance(idx, PeriodIndex): msg = "can only call with other PeriodIndex-ed objects" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): result = first.union(case) elif isinstance(idx, CategoricalIndex): pass else: result = first.union(case) - self.assertTrue(tm.equalContents(result, everything)) + assert tm.equalContents(result, everything) if isinstance(idx, MultiIndex): msg = "other must be a MultiIndex or a list of tuples" - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): result = first.union([1, 2, 3]) def test_difference_base(self): @@ -577,7 +618,7 @@ def test_difference_base(self): if isinstance(idx, CategoricalIndex): pass else: - self.assertTrue(tm.equalContents(result, answer)) + assert tm.equalContents(result, answer) # GH 10149 cases = [klass(second.values) @@ -585,20 +626,20 @@ def test_difference_base(self): for case in cases: if isinstance(idx, PeriodIndex): msg = "can only call with other PeriodIndex-ed objects" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): result = first.difference(case) elif isinstance(idx, CategoricalIndex): pass elif isinstance(idx, (DatetimeIndex, TimedeltaIndex)): - self.assertEqual(result.__class__, answer.__class__) + assert result.__class__ == answer.__class__ tm.assert_numpy_array_equal(result.asi8, answer.asi8) else: result = first.difference(case) - self.assertTrue(tm.equalContents(result, answer)) + assert tm.equalContents(result, answer) if isinstance(idx, MultiIndex): msg = "other must be a MultiIndex or a list of tuples" - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): result = first.difference([1, 2, 3]) def test_symmetric_difference(self): @@ -610,7 +651,7 @@ def test_symmetric_difference(self): else: answer = idx[[0, -1]] result = first.symmetric_difference(second) - self.assertTrue(tm.equalContents(result, answer)) + assert tm.equalContents(result, answer) # GH 10149 cases = [klass(second.values) @@ -618,22 +659,18 @@ def test_symmetric_difference(self): for case in cases: if isinstance(idx, PeriodIndex): msg = "can only call with other PeriodIndex-ed objects" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): result = first.symmetric_difference(case) elif isinstance(idx, CategoricalIndex): pass else: result = first.symmetric_difference(case) - self.assertTrue(tm.equalContents(result, answer)) + assert tm.equalContents(result, answer) if isinstance(idx, MultiIndex): msg = "other must be a MultiIndex or a list of tuples" - with tm.assertRaisesRegexp(TypeError, msg): - result = first.symmetric_difference([1, 2, 3]) - - # 12591 deprecated - with tm.assert_produces_warning(FutureWarning): - first.sym_diff(second) + with tm.assert_raises_regex(TypeError, msg): + first.symmetric_difference([1, 2, 3]) def test_insert_base(self): @@ -644,7 +681,7 @@ def test_insert_base(self): continue # test 0th element - self.assertTrue(idx[0:4].equals(result.insert(0, idx[0]))) + assert idx[0:4].equals(result.insert(0, idx[0])) def test_delete_base(self): @@ -659,37 +696,37 @@ def 
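The set-operation tests above (note that the deprecated `sym_diff` spelling is removed in favor of `symmetric_difference`) exercise behavior along these lines:

```python
import pandas as pd

a = pd.Index([1, 2, 3, 4])
b = pd.Index([3, 4, 5, 6])

assert a.intersection(b).tolist() == [3, 4]
assert a.union(b).tolist() == [1, 2, 3, 4, 5, 6]
assert a.difference(b).tolist() == [1, 2]
assert sorted(a.symmetric_difference(b)) == [1, 2, 5, 6]
```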
test_delete_base(self): expected = idx[1:] result = idx.delete(0) - self.assertTrue(result.equals(expected)) - self.assertEqual(result.name, expected.name) + assert result.equals(expected) + assert result.name == expected.name expected = idx[:-1] result = idx.delete(-1) - self.assertTrue(result.equals(expected)) - self.assertEqual(result.name, expected.name) + assert result.equals(expected) + assert result.name == expected.name - with tm.assertRaises((IndexError, ValueError)): + with pytest.raises((IndexError, ValueError)): # either depending on numpy version result = idx.delete(len(idx)) def test_equals(self): for name, idx in compat.iteritems(self.indices): - self.assertTrue(idx.equals(idx)) - self.assertTrue(idx.equals(idx.copy())) - self.assertTrue(idx.equals(idx.astype(object))) + assert idx.equals(idx) + assert idx.equals(idx.copy()) + assert idx.equals(idx.astype(object)) - self.assertFalse(idx.equals(list(idx))) - self.assertFalse(idx.equals(np.array(idx))) + assert not idx.equals(list(idx)) + assert not idx.equals(np.array(idx)) # Cannot pass in non-int64 dtype to RangeIndex if not isinstance(idx, RangeIndex): same_values = Index(idx, dtype=object) - self.assertTrue(idx.equals(same_values)) - self.assertTrue(same_values.equals(idx)) + assert idx.equals(same_values) + assert same_values.equals(idx) if idx.nlevels == 1: # do not test MultiIndex - self.assertFalse(idx.equals(pd.Series(idx))) + assert not idx.equals(pd.Series(idx)) def test_equals_op(self): # GH9947, GH10637 @@ -701,7 +738,7 @@ def test_equals_op(self): index_b = index_a[0:-1] index_c = index_a[0:-1].append(index_a[-2:-1]) index_d = index_a[0:1] - with tm.assertRaisesRegexp(ValueError, "Lengths must match"): + with tm.assert_raises_regex(ValueError, "Lengths must match"): index_a == index_b expected1 = np.array([True] * n) expected2 = np.array([True] * (n - 1) + [False]) @@ -713,7 +750,7 @@ def test_equals_op(self): array_b = np.array(index_a[0:-1]) array_c = np.array(index_a[0:-1].append(index_a[-2:-1])) array_d = np.array(index_a[0:1]) - with tm.assertRaisesRegexp(ValueError, "Lengths must match"): + with tm.assert_raises_regex(ValueError, "Lengths must match"): index_a == array_b tm.assert_numpy_array_equal(index_a == array_a, expected1) tm.assert_numpy_array_equal(index_a == array_c, expected2) @@ -723,22 +760,22 @@ def test_equals_op(self): series_b = Series(array_b) series_c = Series(array_c) series_d = Series(array_d) - with tm.assertRaisesRegexp(ValueError, "Lengths must match"): + with tm.assert_raises_regex(ValueError, "Lengths must match"): index_a == series_b tm.assert_numpy_array_equal(index_a == series_a, expected1) tm.assert_numpy_array_equal(index_a == series_c, expected2) # cases where length is 1 for one of them - with tm.assertRaisesRegexp(ValueError, "Lengths must match"): + with tm.assert_raises_regex(ValueError, "Lengths must match"): index_a == index_d - with tm.assertRaisesRegexp(ValueError, "Lengths must match"): + with tm.assert_raises_regex(ValueError, "Lengths must match"): index_a == series_d - with tm.assertRaisesRegexp(ValueError, "Lengths must match"): + with tm.assert_raises_regex(ValueError, "Lengths must match"): index_a == array_d msg = "Can only compare identically-labeled Series objects" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): series_a == series_d - with tm.assertRaisesRegexp(ValueError, "Lengths must match"): + with tm.assert_raises_regex(ValueError, "Lengths must match"): series_a == array_d # comparing with a scalar should 
broadcast; note that we are excluding @@ -765,10 +802,10 @@ def test_numpy_ufuncs(self): np.arccos, np.arctan, np.sinh, np.cosh, np.tanh, np.arcsinh, np.arccosh, np.arctanh, np.deg2rad, np.rad2deg]: - if isinstance(idx, pd.tseries.base.DatetimeIndexOpsMixin): + if isinstance(idx, DatetimeIndexOpsMixin): # raise TypeError or ValueError (PeriodIndex) # PeriodIndex behavior should be changed in future version - with tm.assertRaises(Exception): + with pytest.raises(Exception): with np.errstate(all='ignore'): func(idx) elif isinstance(idx, (Float64Index, Int64Index, UInt64Index)): @@ -776,33 +813,33 @@ def test_numpy_ufuncs(self): with np.errstate(all='ignore'): result = func(idx) exp = Index(func(idx.values), name=idx.name) - self.assert_index_equal(result, exp) - self.assertIsInstance(result, pd.Float64Index) + + tm.assert_index_equal(result, exp) + assert isinstance(result, pd.Float64Index) else: # raise AttributeError or TypeError if len(idx) == 0: continue else: - with tm.assertRaises(Exception): + with pytest.raises(Exception): with np.errstate(all='ignore'): func(idx) for func in [np.isfinite, np.isinf, np.isnan, np.signbit]: - if isinstance(idx, pd.tseries.base.DatetimeIndexOpsMixin): + if isinstance(idx, DatetimeIndexOpsMixin): # raise TypeError or ValueError (PeriodIndex) - with tm.assertRaises(Exception): + with pytest.raises(Exception): func(idx) elif isinstance(idx, (Float64Index, Int64Index, UInt64Index)): - # results in bool array + # Results in bool array result = func(idx) - exp = func(idx.values) - self.assertIsInstance(result, np.ndarray) - tm.assertNotIsInstance(result, Index) + assert isinstance(result, np.ndarray) + assert not isinstance(result, Index) else: if len(idx) == 0: continue else: - with tm.assertRaises(Exception): + with pytest.raises(Exception): func(idx) def test_hasnans_isnans(self): @@ -815,16 +852,16 @@ def test_hasnans_isnans(self): # cases in indices doesn't include NaN expected = np.array([False] * len(idx), dtype=bool) - self.assert_numpy_array_equal(idx._isnan, expected) - self.assertFalse(idx.hasnans) + tm.assert_numpy_array_equal(idx._isnan, expected) + assert not idx.hasnans idx = index.copy() values = idx.values if len(index) == 0: continue - elif isinstance(index, pd.tseries.base.DatetimeIndexOpsMixin): - values[1] = pd.tslib.iNaT + elif isinstance(index, DatetimeIndexOpsMixin): + values[1] = iNaT elif isinstance(index, (Int64Index, UInt64Index)): continue else: @@ -837,8 +874,8 @@ def test_hasnans_isnans(self): expected = np.array([False] * len(idx), dtype=bool) expected[1] = True - self.assert_numpy_array_equal(idx._isnan, expected) - self.assertTrue(idx.hasnans) + tm.assert_numpy_array_equal(idx._isnan, expected) + assert idx.hasnans def test_fillna(self): # GH 11343 @@ -847,24 +884,24 @@ def test_fillna(self): pass elif isinstance(index, MultiIndex): idx = index.copy() - msg = "isnull is not defined for MultiIndex" - with self.assertRaisesRegexp(NotImplementedError, msg): + msg = "isna is not defined for MultiIndex" + with tm.assert_raises_regex(NotImplementedError, msg): idx.fillna(idx[0]) else: idx = index.copy() result = idx.fillna(idx[0]) - self.assert_index_equal(result, idx) - self.assertFalse(result is idx) + tm.assert_index_equal(result, idx) + assert result is not idx msg = "'value' must be a scalar, passed: " - with self.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): idx.fillna([idx[0]]) idx = index.copy() values = idx.values - if isinstance(index, pd.tseries.base.DatetimeIndexOpsMixin): - 
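`test_numpy_ufuncs` distinguishes two ufunc families: element-wise ufuncs rebuild an `Index` from the transformed values, while predicate ufuncs return a bare boolean ndarray. In miniature:

```python
import numpy as np
import pandas as pd

idx = pd.Index([1.0, 4.0, 9.0])

# element-wise ufuncs round-trip through .values and rebuild an Index
pd.testing.assert_index_equal(np.sqrt(idx), pd.Index([1.0, 2.0, 3.0]))

# predicate ufuncs deliberately return a bool ndarray, not an Index
mask = np.isfinite(idx)
assert isinstance(mask, np.ndarray) and not isinstance(mask, pd.Index)
```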
values[1] = pd.tslib.iNaT + if isinstance(index, DatetimeIndexOpsMixin): + values[1] = iNaT elif isinstance(index, (Int64Index, UInt64Index)): continue else: @@ -877,5 +914,43 @@ def test_fillna(self): expected = np.array([False] * len(idx), dtype=bool) expected[1] = True - self.assert_numpy_array_equal(idx._isnan, expected) - self.assertTrue(idx.hasnans) + tm.assert_numpy_array_equal(idx._isnan, expected) + assert idx.hasnans + + def test_nulls(self): + # this is really a smoke test for the methods + # as these are adequately tested for function elsewhere + + for name, index in self.indices.items(): + if len(index) == 0: + tm.assert_numpy_array_equal( + index.isna(), np.array([], dtype=bool)) + elif isinstance(index, MultiIndex): + idx = index.copy() + msg = "isna is not defined for MultiIndex" + with tm.assert_raises_regex(NotImplementedError, msg): + idx.isna() + else: + + if not index.hasnans: + tm.assert_numpy_array_equal( + index.isna(), np.zeros(len(index), dtype=bool)) + tm.assert_numpy_array_equal( + index.notna(), np.ones(len(index), dtype=bool)) + else: + result = isna(index) + tm.assert_numpy_array_equal(index.isna(), result) + tm.assert_numpy_array_equal(index.notna(), ~result) + + def test_empty(self): + # GH 15270 + index = self.create_index() + assert not index.empty + assert index[:0].empty + + @pytest.mark.parametrize('how', ['outer', 'inner', 'left', 'right']) + def test_join_self_unique(self, how): + index = self.create_index() + if index.is_unique: + joined = index.join(index, how=how) + assert (index == joined).all() diff --git a/pandas/tests/indexes/data/s1-0.12.0.pickle b/pandas/tests/indexes/data/s1-0.12.0.pickle deleted file mode 100644 index 0ce9cfdf3aa94..0000000000000 Binary files a/pandas/tests/indexes/data/s1-0.12.0.pickle and /dev/null differ diff --git a/pandas/tests/indexes/data/s2-0.12.0.pickle b/pandas/tests/indexes/data/s2-0.12.0.pickle deleted file mode 100644 index 2318be2d9978b..0000000000000 Binary files a/pandas/tests/indexes/data/s2-0.12.0.pickle and /dev/null differ diff --git a/pandas/tests/indexes/datetimelike.py b/pandas/tests/indexes/datetimelike.py new file mode 100644 index 0000000000000..114940009377c --- /dev/null +++ b/pandas/tests/indexes/datetimelike.py @@ -0,0 +1,40 @@ +""" generic datetimelike tests """ + +from .common import Base +import pandas.util.testing as tm + + +class DatetimeLike(Base): + + def test_shift_identity(self): + + idx = self.create_index() + tm.assert_index_equal(idx, idx.shift(0)) + + def test_str(self): + + # test the string repr + idx = self.create_index() + idx.name = 'foo' + assert not "length=%s" % len(idx) in str(idx) + assert "'foo'" in str(idx) + assert idx.__class__.__name__ in str(idx) + + if hasattr(idx, 'tz'): + if idx.tz is not None: + assert idx.tz in str(idx) + if hasattr(idx, 'freq'): + assert "freq='%s'" % idx.freqstr in str(idx) + + def test_view(self): + super(DatetimeLike, self).test_view() + + i = self.create_index() + + i_view = i.view('i8') + result = self._holder(i) + tm.assert_index_equal(result, i) + + i_view = i.view(self._holder) + result = self._holder(i) + tm.assert_index_equal(result, i_view) diff --git a/pandas/tests/types/__init__.py b/pandas/tests/indexes/datetimes/__init__.py similarity index 100% rename from pandas/tests/types/__init__.py rename to pandas/tests/indexes/datetimes/__init__.py diff --git a/pandas/tests/indexes/datetimes/test_astype.py b/pandas/tests/indexes/datetimes/test_astype.py new file mode 100644 index 0000000000000..46be24b90faae --- /dev/null +++ 
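The new `test_nulls`/`test_empty` cases lean on the `isna`/`notna` spellings introduced around this release; roughly:

```python
import numpy as np
import pandas as pd

idx = pd.Index([1.0, np.nan, 3.0])

np.testing.assert_array_equal(idx.isna(), np.array([False, True, False]))
np.testing.assert_array_equal(idx.notna(), ~idx.isna())
assert idx.hasnans

# the zero-length slice exercised by test_empty above
assert idx[:0].empty and not idx.empty
```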
b/pandas/tests/indexes/datetimes/test_astype.py
@@ -0,0 +1,298 @@
+import pytest
+
+import pytz
+import dateutil
+import numpy as np
+
+from datetime import datetime
+from dateutil.tz import tzlocal
+
+import pandas as pd
+import pandas.util.testing as tm
+from pandas import (DatetimeIndex, date_range, Series, NaT, Index, Timestamp,
+                    Int64Index, Period)
+
+
+class TestDatetimeIndex(object):
+
+    def test_astype(self):
+        # GH 13149, GH 13209
+        idx = DatetimeIndex(['2016-05-16', 'NaT', NaT, np.NaN])
+
+        result = idx.astype(object)
+        expected = Index([Timestamp('2016-05-16')] + [NaT] * 3, dtype=object)
+        tm.assert_index_equal(result, expected)
+
+        result = idx.astype(int)
+        expected = Int64Index([1463356800000000000] +
+                              [-9223372036854775808] * 3, dtype=np.int64)
+        tm.assert_index_equal(result, expected)
+
+        rng = date_range('1/1/2000', periods=10)
+        result = rng.astype('i8')
+        tm.assert_index_equal(result, Index(rng.asi8))
+        tm.assert_numpy_array_equal(result.values, rng.asi8)
+
+    def test_astype_with_tz(self):
+
+        # with tz
+        rng = date_range('1/1/2000', periods=10, tz='US/Eastern')
+        result = rng.astype('datetime64[ns]')
+        expected = (date_range('1/1/2000', periods=10,
+                               tz='US/Eastern')
+                    .tz_convert('UTC').tz_localize(None))
+        tm.assert_index_equal(result, expected)
+
+        # BUG#10442 : testing astype(str) is correct for Series/DatetimeIndex
+        result = pd.Series(pd.date_range('2012-01-01', periods=3)).astype(str)
+        expected = pd.Series(
+            ['2012-01-01', '2012-01-02', '2012-01-03'], dtype=object)
+        tm.assert_series_equal(result, expected)
+
+        result = Series(pd.date_range('2012-01-01', periods=3,
+                                      tz='US/Eastern')).astype(str)
+        expected = Series(['2012-01-01 00:00:00-05:00',
+                           '2012-01-02 00:00:00-05:00',
+                           '2012-01-03 00:00:00-05:00'],
+                          dtype=object)
+        tm.assert_series_equal(result, expected)
+
+    def test_astype_str_compat(self):
+        # GH 13149, GH 13209
+        # verify that we are returning NaT as a string (and not unicode)
+
+        idx = DatetimeIndex(['2016-05-16', 'NaT', NaT, np.NaN])
+        result = idx.astype(str)
+        expected = Index(['2016-05-16', 'NaT', 'NaT', 'NaT'], dtype=object)
+        tm.assert_index_equal(result, expected)
+
+    def test_astype_str(self):
+        # test astype string - #10442
+        result = date_range('2012-01-01', periods=4,
+                            name='test_name').astype(str)
+        expected = Index(['2012-01-01', '2012-01-02', '2012-01-03',
+                          '2012-01-04'], name='test_name', dtype=object)
+        tm.assert_index_equal(result, expected)
+
+        # test astype string with tz and name
+        result = date_range('2012-01-01', periods=3, name='test_name',
+                            tz='US/Eastern').astype(str)
+        expected = Index(['2012-01-01 00:00:00-05:00',
+                          '2012-01-02 00:00:00-05:00',
+                          '2012-01-03 00:00:00-05:00'],
+                         name='test_name', dtype=object)
+        tm.assert_index_equal(result, expected)
+
+        # test astype string with freqH and name
+        result = date_range('1/1/2011', periods=3, freq='H',
+                            name='test_name').astype(str)
+        expected = Index(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
+                          '2011-01-01 02:00:00'],
+                         name='test_name', dtype=object)
+        tm.assert_index_equal(result, expected)
+
+        # test astype string with freqH and timezone
+        result = date_range('3/6/2012 00:00', periods=2, freq='H',
+                            tz='Europe/London', name='test_name').astype(str)
+        expected = Index(['2012-03-06 00:00:00+00:00',
+                          '2012-03-06 01:00:00+00:00'],
+                         dtype=object, name='test_name')
+        tm.assert_index_equal(result, expected)
+
+    def test_astype_datetime64(self):
+        # GH 13149, GH 13209
+        idx = DatetimeIndex(['2016-05-16', 'NaT', NaT, np.NaN])
+
+        result = idx.astype('datetime64[ns]')
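The `astype` contract pinned above, condensed (behavior as of this pandas vintage; later versions are stricter about integer casts of missing values):

```python
import pandas as pd

idx = pd.DatetimeIndex(['2016-05-16', 'NaT'])

# object dtype keeps Timestamp/NaT scalars
assert idx.astype(object)[0] == pd.Timestamp('2016-05-16')

# str renders NaT as the literal string 'NaT'
assert idx.astype(str).tolist() == ['2016-05-16', 'NaT']
```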
+ tm.assert_index_equal(result, idx) + assert result is not idx + + result = idx.astype('datetime64[ns]', copy=False) + tm.assert_index_equal(result, idx) + assert result is idx + + idx_tz = DatetimeIndex(['2016-05-16', 'NaT', NaT, np.NaN], tz='EST') + result = idx_tz.astype('datetime64[ns]') + expected = DatetimeIndex(['2016-05-16 05:00:00', 'NaT', 'NaT', 'NaT'], + dtype='datetime64[ns]') + tm.assert_index_equal(result, expected) + + def test_astype_raises(self): + # GH 13149, GH 13209 + idx = DatetimeIndex(['2016-05-16', 'NaT', NaT, np.NaN]) + + pytest.raises(ValueError, idx.astype, float) + pytest.raises(ValueError, idx.astype, 'timedelta64') + pytest.raises(ValueError, idx.astype, 'timedelta64[ns]') + pytest.raises(ValueError, idx.astype, 'datetime64') + pytest.raises(ValueError, idx.astype, 'datetime64[D]') + + def test_index_convert_to_datetime_array(self): + def _check_rng(rng): + converted = rng.to_pydatetime() + assert isinstance(converted, np.ndarray) + for x, stamp in zip(converted, rng): + assert isinstance(x, datetime) + assert x == stamp.to_pydatetime() + assert x.tzinfo == stamp.tzinfo + + rng = date_range('20090415', '20090519') + rng_eastern = date_range('20090415', '20090519', tz='US/Eastern') + rng_utc = date_range('20090415', '20090519', tz='utc') + + _check_rng(rng) + _check_rng(rng_eastern) + _check_rng(rng_utc) + + def test_index_convert_to_datetime_array_explicit_pytz(self): + def _check_rng(rng): + converted = rng.to_pydatetime() + assert isinstance(converted, np.ndarray) + for x, stamp in zip(converted, rng): + assert isinstance(x, datetime) + assert x == stamp.to_pydatetime() + assert x.tzinfo == stamp.tzinfo + + rng = date_range('20090415', '20090519') + rng_eastern = date_range('20090415', '20090519', + tz=pytz.timezone('US/Eastern')) + rng_utc = date_range('20090415', '20090519', tz=pytz.utc) + + _check_rng(rng) + _check_rng(rng_eastern) + _check_rng(rng_utc) + + def test_index_convert_to_datetime_array_dateutil(self): + def _check_rng(rng): + converted = rng.to_pydatetime() + assert isinstance(converted, np.ndarray) + for x, stamp in zip(converted, rng): + assert isinstance(x, datetime) + assert x == stamp.to_pydatetime() + assert x.tzinfo == stamp.tzinfo + + rng = date_range('20090415', '20090519') + rng_eastern = date_range('20090415', '20090519', + tz='dateutil/US/Eastern') + rng_utc = date_range('20090415', '20090519', tz=dateutil.tz.tzutc()) + + _check_rng(rng) + _check_rng(rng_eastern) + _check_rng(rng_utc) + + +class TestToPeriod(object): + + def setup_method(self, method): + data = [Timestamp('2007-01-01 10:11:12.123456Z'), + Timestamp('2007-01-01 10:11:13.789123Z')] + self.index = DatetimeIndex(data) + + def test_to_period_millisecond(self): + index = self.index + + period = index.to_period(freq='L') + assert 2 == len(period) + assert period[0] == Period('2007-01-01 10:11:12.123Z', 'L') + assert period[1] == Period('2007-01-01 10:11:13.789Z', 'L') + + def test_to_period_microsecond(self): + index = self.index + + period = index.to_period(freq='U') + assert 2 == len(period) + assert period[0] == Period('2007-01-01 10:11:12.123456Z', 'U') + assert period[1] == Period('2007-01-01 10:11:13.789123Z', 'U') + + def test_to_period_tz_pytz(self): + from pytz import utc as UTC + + xp = date_range('1/1/2000', '4/1/2000').to_period() + + ts = date_range('1/1/2000', '4/1/2000', tz='US/Eastern') + + result = ts.to_period()[0] + expected = ts[0].to_period() + + assert result == expected + tm.assert_index_equal(ts.to_period(), xp) + + ts = date_range('1/1/2000', 
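The `to_period` tests assert that conversion truncates to the requested resolution ('L' and 'U' are the millisecond/microsecond frequency aliases of this era):

```python
import pandas as pd

idx = pd.DatetimeIndex([pd.Timestamp('2007-01-01 10:11:12.123456')])

# converting truncates to the requested resolution
assert idx.to_period('L')[0] == pd.Period('2007-01-01 10:11:12.123', 'L')
assert idx.to_period('U')[0] == pd.Period('2007-01-01 10:11:12.123456', 'U')
```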
'4/1/2000', tz=UTC) + + result = ts.to_period()[0] + expected = ts[0].to_period() + + assert result == expected + tm.assert_index_equal(ts.to_period(), xp) + + ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal()) + + result = ts.to_period()[0] + expected = ts[0].to_period() + + assert result == expected + tm.assert_index_equal(ts.to_period(), xp) + + def test_to_period_tz_explicit_pytz(self): + xp = date_range('1/1/2000', '4/1/2000').to_period() + + ts = date_range('1/1/2000', '4/1/2000', tz=pytz.timezone('US/Eastern')) + + result = ts.to_period()[0] + expected = ts[0].to_period() + + assert result == expected + tm.assert_index_equal(ts.to_period(), xp) + + ts = date_range('1/1/2000', '4/1/2000', tz=pytz.utc) + + result = ts.to_period()[0] + expected = ts[0].to_period() + + assert result == expected + tm.assert_index_equal(ts.to_period(), xp) + + ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal()) + + result = ts.to_period()[0] + expected = ts[0].to_period() + + assert result == expected + tm.assert_index_equal(ts.to_period(), xp) + + def test_to_period_tz_dateutil(self): + xp = date_range('1/1/2000', '4/1/2000').to_period() + + ts = date_range('1/1/2000', '4/1/2000', tz='dateutil/US/Eastern') + + result = ts.to_period()[0] + expected = ts[0].to_period() + + assert result == expected + tm.assert_index_equal(ts.to_period(), xp) + + ts = date_range('1/1/2000', '4/1/2000', tz=dateutil.tz.tzutc()) + + result = ts.to_period()[0] + expected = ts[0].to_period() + + assert result == expected + tm.assert_index_equal(ts.to_period(), xp) + + ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal()) + + result = ts.to_period()[0] + expected = ts[0].to_period() + + assert result == expected + tm.assert_index_equal(ts.to_period(), xp) + + def test_astype_object(self): + # NumPy 1.6.1 weak ns support + rng = date_range('1/1/2000', periods=20) + + casted = rng.astype('O') + exp_values = list(rng) + + tm.assert_index_equal(casted, Index(exp_values, dtype=np.object_)) + assert casted.tolist() == exp_values diff --git a/pandas/tests/indexes/datetimes/test_construction.py b/pandas/tests/indexes/datetimes/test_construction.py new file mode 100644 index 0000000000000..cf896b06130a2 --- /dev/null +++ b/pandas/tests/indexes/datetimes/test_construction.py @@ -0,0 +1,589 @@ +import pytest + +import pytz +import numpy as np +from datetime import timedelta + +import pandas as pd +from pandas import offsets +import pandas.util.testing as tm +from pandas._libs import tslib, lib +from pandas._libs.tslib import OutOfBoundsDatetime +from pandas import (DatetimeIndex, Index, Timestamp, datetime, date_range, + to_datetime) + + +class TestDatetimeIndex(object): + + def test_construction_caching(self): + + df = pd.DataFrame({'dt': pd.date_range('20130101', periods=3), + 'dttz': pd.date_range('20130101', periods=3, + tz='US/Eastern'), + 'dt_with_null': [pd.Timestamp('20130101'), pd.NaT, + pd.Timestamp('20130103')], + 'dtns': pd.date_range('20130101', periods=3, + freq='ns')}) + assert df.dttz.dtype.tz.zone == 'US/Eastern' + + def test_construction_with_alt(self): + + i = pd.date_range('20130101', periods=5, freq='H', tz='US/Eastern') + i2 = DatetimeIndex(i, dtype=i.dtype) + tm.assert_index_equal(i, i2) + assert i.tz.zone == 'US/Eastern' + + i2 = DatetimeIndex(i.tz_localize(None).asi8, tz=i.dtype.tz) + tm.assert_index_equal(i, i2) + assert i.tz.zone == 'US/Eastern' + + i2 = DatetimeIndex(i.tz_localize(None).asi8, dtype=i.dtype) + tm.assert_index_equal(i, i2) + assert i.tz.zone == 'US/Eastern' + + i2 = DatetimeIndex( + 
i.tz_localize(None).asi8, dtype=i.dtype, tz=i.dtype.tz) + tm.assert_index_equal(i, i2) + assert i.tz.zone == 'US/Eastern' + + # localize into the provided tz + i2 = DatetimeIndex(i.tz_localize(None).asi8, tz='UTC') + expected = i.tz_localize(None).tz_localize('UTC') + tm.assert_index_equal(i2, expected) + + # incompat tz/dtype + pytest.raises(ValueError, lambda: DatetimeIndex( + i.tz_localize(None).asi8, dtype=i.dtype, tz='US/Pacific')) + + def test_construction_index_with_mixed_timezones(self): + # gh-11488: no tz results in DatetimeIndex + result = Index([Timestamp('2011-01-01'), + Timestamp('2011-01-02')], name='idx') + exp = DatetimeIndex([Timestamp('2011-01-01'), + Timestamp('2011-01-02')], name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert isinstance(result, DatetimeIndex) + assert result.tz is None + + # same tz results in DatetimeIndex + result = Index([Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), + Timestamp('2011-01-02 10:00', tz='Asia/Tokyo')], + name='idx') + exp = DatetimeIndex( + [Timestamp('2011-01-01 10:00'), Timestamp('2011-01-02 10:00') + ], tz='Asia/Tokyo', name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert isinstance(result, DatetimeIndex) + assert result.tz is not None + assert result.tz == exp.tz + + # same tz results in DatetimeIndex (DST) + result = Index([Timestamp('2011-01-01 10:00', tz='US/Eastern'), + Timestamp('2011-08-01 10:00', tz='US/Eastern')], + name='idx') + exp = DatetimeIndex([Timestamp('2011-01-01 10:00'), + Timestamp('2011-08-01 10:00')], + tz='US/Eastern', name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert isinstance(result, DatetimeIndex) + assert result.tz is not None + assert result.tz == exp.tz + + # Different tz results in Index(dtype=object) + result = Index([Timestamp('2011-01-01 10:00'), + Timestamp('2011-01-02 10:00', tz='US/Eastern')], + name='idx') + exp = Index([Timestamp('2011-01-01 10:00'), + Timestamp('2011-01-02 10:00', tz='US/Eastern')], + dtype='object', name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert not isinstance(result, DatetimeIndex) + + result = Index([Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), + Timestamp('2011-01-02 10:00', tz='US/Eastern')], + name='idx') + exp = Index([Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), + Timestamp('2011-01-02 10:00', tz='US/Eastern')], + dtype='object', name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert not isinstance(result, DatetimeIndex) + + # length = 1 + result = Index([Timestamp('2011-01-01')], name='idx') + exp = DatetimeIndex([Timestamp('2011-01-01')], name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert isinstance(result, DatetimeIndex) + assert result.tz is None + + # length = 1 with tz + result = Index( + [Timestamp('2011-01-01 10:00', tz='Asia/Tokyo')], name='idx') + exp = DatetimeIndex([Timestamp('2011-01-01 10:00')], tz='Asia/Tokyo', + name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert isinstance(result, DatetimeIndex) + assert result.tz is not None + assert result.tz == exp.tz + + def test_construction_index_with_mixed_timezones_with_NaT(self): + # see gh-11488 + result = Index([pd.NaT, Timestamp('2011-01-01'), + pd.NaT, Timestamp('2011-01-02')], name='idx') + exp = DatetimeIndex([pd.NaT, Timestamp('2011-01-01'), + pd.NaT, Timestamp('2011-01-02')], name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert isinstance(result, DatetimeIndex) + assert result.tz is None + + # Same tz results in DatetimeIndex + result = 
Index([pd.NaT, Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), + pd.NaT, Timestamp('2011-01-02 10:00', + tz='Asia/Tokyo')], + name='idx') + exp = DatetimeIndex([pd.NaT, Timestamp('2011-01-01 10:00'), + pd.NaT, Timestamp('2011-01-02 10:00')], + tz='Asia/Tokyo', name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert isinstance(result, DatetimeIndex) + assert result.tz is not None + assert result.tz == exp.tz + + # same tz results in DatetimeIndex (DST) + result = Index([Timestamp('2011-01-01 10:00', tz='US/Eastern'), + pd.NaT, + Timestamp('2011-08-01 10:00', tz='US/Eastern')], + name='idx') + exp = DatetimeIndex([Timestamp('2011-01-01 10:00'), pd.NaT, + Timestamp('2011-08-01 10:00')], + tz='US/Eastern', name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert isinstance(result, DatetimeIndex) + assert result.tz is not None + assert result.tz == exp.tz + + # different tz results in Index(dtype=object) + result = Index([pd.NaT, Timestamp('2011-01-01 10:00'), + pd.NaT, Timestamp('2011-01-02 10:00', + tz='US/Eastern')], + name='idx') + exp = Index([pd.NaT, Timestamp('2011-01-01 10:00'), + pd.NaT, Timestamp('2011-01-02 10:00', tz='US/Eastern')], + dtype='object', name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert not isinstance(result, DatetimeIndex) + + result = Index([pd.NaT, Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), + pd.NaT, Timestamp('2011-01-02 10:00', + tz='US/Eastern')], name='idx') + exp = Index([pd.NaT, Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), + pd.NaT, Timestamp('2011-01-02 10:00', tz='US/Eastern')], + dtype='object', name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert not isinstance(result, DatetimeIndex) + + # all NaT + result = Index([pd.NaT, pd.NaT], name='idx') + exp = DatetimeIndex([pd.NaT, pd.NaT], name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert isinstance(result, DatetimeIndex) + assert result.tz is None + + # all NaT with tz + result = Index([pd.NaT, pd.NaT], tz='Asia/Tokyo', name='idx') + exp = DatetimeIndex([pd.NaT, pd.NaT], tz='Asia/Tokyo', name='idx') + + tm.assert_index_equal(result, exp, exact=True) + assert isinstance(result, DatetimeIndex) + assert result.tz is not None + assert result.tz == exp.tz + + def test_construction_dti_with_mixed_timezones(self): + # GH 11488 (not changed, added explicit tests) + + # no tz results in DatetimeIndex + result = DatetimeIndex( + [Timestamp('2011-01-01'), Timestamp('2011-01-02')], name='idx') + exp = DatetimeIndex( + [Timestamp('2011-01-01'), Timestamp('2011-01-02')], name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert isinstance(result, DatetimeIndex) + + # same tz results in DatetimeIndex + result = DatetimeIndex([Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), + Timestamp('2011-01-02 10:00', + tz='Asia/Tokyo')], + name='idx') + exp = DatetimeIndex([Timestamp('2011-01-01 10:00'), + Timestamp('2011-01-02 10:00')], + tz='Asia/Tokyo', name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert isinstance(result, DatetimeIndex) + + # same tz results in DatetimeIndex (DST) + result = DatetimeIndex([Timestamp('2011-01-01 10:00', tz='US/Eastern'), + Timestamp('2011-08-01 10:00', + tz='US/Eastern')], + name='idx') + exp = DatetimeIndex([Timestamp('2011-01-01 10:00'), + Timestamp('2011-08-01 10:00')], + tz='US/Eastern', name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert isinstance(result, DatetimeIndex) + + # different tz coerces tz-naive to tz-awareIndex(dtype=object) + result = 
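The mixed-timezone construction rules these tests encode, in brief:

```python
import pandas as pd

# one timezone across all inputs -> DatetimeIndex carrying that tz
same = pd.Index([pd.Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'),
                 pd.Timestamp('2011-01-02 10:00', tz='Asia/Tokyo')])
assert isinstance(same, pd.DatetimeIndex) and str(same.tz) == 'Asia/Tokyo'

# mixed timezones cannot be represented -> plain object-dtype Index
mixed = pd.Index([pd.Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'),
                  pd.Timestamp('2011-01-02 10:00', tz='US/Eastern')])
assert not isinstance(mixed, pd.DatetimeIndex) and mixed.dtype == object
```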
DatetimeIndex([Timestamp('2011-01-01 10:00'), + Timestamp('2011-01-02 10:00', + tz='US/Eastern')], name='idx') + exp = DatetimeIndex([Timestamp('2011-01-01 05:00'), + Timestamp('2011-01-02 10:00')], + tz='US/Eastern', name='idx') + tm.assert_index_equal(result, exp, exact=True) + assert isinstance(result, DatetimeIndex) + + # tz mismatch affecting to tz-aware raises TypeError/ValueError + + with pytest.raises(ValueError): + DatetimeIndex([Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), + Timestamp('2011-01-02 10:00', tz='US/Eastern')], + name='idx') + + with tm.assert_raises_regex(TypeError, + 'data is already tz-aware'): + DatetimeIndex([Timestamp('2011-01-01 10:00'), + Timestamp('2011-01-02 10:00', tz='US/Eastern')], + tz='Asia/Tokyo', name='idx') + + with pytest.raises(ValueError): + DatetimeIndex([Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), + Timestamp('2011-01-02 10:00', tz='US/Eastern')], + tz='US/Eastern', name='idx') + + with tm.assert_raises_regex(TypeError, + 'data is already tz-aware'): + # passing tz should results in DatetimeIndex, then mismatch raises + # TypeError + Index([pd.NaT, Timestamp('2011-01-01 10:00'), + pd.NaT, Timestamp('2011-01-02 10:00', tz='US/Eastern')], + tz='Asia/Tokyo', name='idx') + + def test_construction_base_constructor(self): + arr = [pd.Timestamp('2011-01-01'), pd.NaT, pd.Timestamp('2011-01-03')] + tm.assert_index_equal(pd.Index(arr), pd.DatetimeIndex(arr)) + tm.assert_index_equal(pd.Index(np.array(arr)), + pd.DatetimeIndex(np.array(arr))) + + arr = [np.nan, pd.NaT, pd.Timestamp('2011-01-03')] + tm.assert_index_equal(pd.Index(arr), pd.DatetimeIndex(arr)) + tm.assert_index_equal(pd.Index(np.array(arr)), + pd.DatetimeIndex(np.array(arr))) + + def test_construction_outofbounds(self): + # GH 13663 + dates = [datetime(3000, 1, 1), datetime(4000, 1, 1), + datetime(5000, 1, 1), datetime(6000, 1, 1)] + exp = Index(dates, dtype=object) + # coerces to object + tm.assert_index_equal(Index(dates), exp) + + with pytest.raises(OutOfBoundsDatetime): + # can't create DatetimeIndex + DatetimeIndex(dates) + + def test_construction_with_ndarray(self): + # GH 5152 + dates = [datetime(2013, 10, 7), + datetime(2013, 10, 8), + datetime(2013, 10, 9)] + data = DatetimeIndex(dates, freq=pd.tseries.frequencies.BDay()).values + result = DatetimeIndex(data, freq=pd.tseries.frequencies.BDay()) + expected = DatetimeIndex(['2013-10-07', + '2013-10-08', + '2013-10-09'], + freq='B') + tm.assert_index_equal(result, expected) + + def test_constructor_coverage(self): + rng = date_range('1/1/2000', periods=10.5) + exp = date_range('1/1/2000', periods=10) + tm.assert_index_equal(rng, exp) + + pytest.raises(ValueError, DatetimeIndex, start='1/1/2000', + periods='foo', freq='D') + + pytest.raises(ValueError, DatetimeIndex, start='1/1/2000', + end='1/10/2000') + + pytest.raises(ValueError, DatetimeIndex, '1/1/2000') + + # generator expression + gen = (datetime(2000, 1, 1) + timedelta(i) for i in range(10)) + result = DatetimeIndex(gen) + expected = DatetimeIndex([datetime(2000, 1, 1) + timedelta(i) + for i in range(10)]) + tm.assert_index_equal(result, expected) + + # NumPy string array + strings = np.array(['2000-01-01', '2000-01-02', '2000-01-03']) + result = DatetimeIndex(strings) + expected = DatetimeIndex(strings.astype('O')) + tm.assert_index_equal(result, expected) + + from_ints = DatetimeIndex(expected.asi8) + tm.assert_index_equal(from_ints, expected) + + # string with NaT + strings = np.array(['2000-01-01', '2000-01-02', 'NaT']) + result = DatetimeIndex(strings) + expected = 
DatetimeIndex(strings.astype('O')) + tm.assert_index_equal(result, expected) + + from_ints = DatetimeIndex(expected.asi8) + tm.assert_index_equal(from_ints, expected) + + # non-conforming + pytest.raises(ValueError, DatetimeIndex, + ['2000-01-01', '2000-01-02', '2000-01-04'], freq='D') + + pytest.raises(ValueError, DatetimeIndex, start='2011-01-01', + freq='b') + pytest.raises(ValueError, DatetimeIndex, end='2011-01-01', + freq='B') + pytest.raises(ValueError, DatetimeIndex, periods=10, freq='D') + + def test_constructor_datetime64_tzformat(self): + # see gh-6572: ISO 8601 format results in pytz.FixedOffset + for freq in ['AS', 'W-SUN']: + idx = date_range('2013-01-01T00:00:00-05:00', + '2016-01-01T23:59:59-05:00', freq=freq) + expected = date_range('2013-01-01T00:00:00', '2016-01-01T23:59:59', + freq=freq, tz=pytz.FixedOffset(-300)) + tm.assert_index_equal(idx, expected) + # Unable to use `US/Eastern` because of DST + expected_i8 = date_range('2013-01-01T00:00:00', + '2016-01-01T23:59:59', freq=freq, + tz='America/Lima') + tm.assert_numpy_array_equal(idx.asi8, expected_i8.asi8) + + idx = date_range('2013-01-01T00:00:00+09:00', + '2016-01-01T23:59:59+09:00', freq=freq) + expected = date_range('2013-01-01T00:00:00', '2016-01-01T23:59:59', + freq=freq, tz=pytz.FixedOffset(540)) + tm.assert_index_equal(idx, expected) + expected_i8 = date_range('2013-01-01T00:00:00', + '2016-01-01T23:59:59', freq=freq, + tz='Asia/Tokyo') + tm.assert_numpy_array_equal(idx.asi8, expected_i8.asi8) + + # Non ISO 8601 format results in dateutil.tz.tzoffset + for freq in ['AS', 'W-SUN']: + idx = date_range('2013/1/1 0:00:00-5:00', '2016/1/1 23:59:59-5:00', + freq=freq) + expected = date_range('2013-01-01T00:00:00', '2016-01-01T23:59:59', + freq=freq, tz=pytz.FixedOffset(-300)) + tm.assert_index_equal(idx, expected) + # Unable to use `US/Eastern` because of DST + expected_i8 = date_range('2013-01-01T00:00:00', + '2016-01-01T23:59:59', freq=freq, + tz='America/Lima') + tm.assert_numpy_array_equal(idx.asi8, expected_i8.asi8) + + idx = date_range('2013/1/1 0:00:00+9:00', + '2016/1/1 23:59:59+09:00', freq=freq) + expected = date_range('2013-01-01T00:00:00', '2016-01-01T23:59:59', + freq=freq, tz=pytz.FixedOffset(540)) + tm.assert_index_equal(idx, expected) + expected_i8 = date_range('2013-01-01T00:00:00', + '2016-01-01T23:59:59', freq=freq, + tz='Asia/Tokyo') + tm.assert_numpy_array_equal(idx.asi8, expected_i8.asi8) + + def test_constructor_dtype(self): + + # passing a dtype with a tz should localize + idx = DatetimeIndex(['2013-01-01', '2013-01-02'], + dtype='datetime64[ns, US/Eastern]') + expected = DatetimeIndex(['2013-01-01', '2013-01-02'] + ).tz_localize('US/Eastern') + tm.assert_index_equal(idx, expected) + + idx = DatetimeIndex(['2013-01-01', '2013-01-02'], + tz='US/Eastern') + tm.assert_index_equal(idx, expected) + + # if we already have a tz and its not the same, then raise + idx = DatetimeIndex(['2013-01-01', '2013-01-02'], + dtype='datetime64[ns, US/Eastern]') + + pytest.raises(ValueError, + lambda: DatetimeIndex(idx, + dtype='datetime64[ns]')) + + # this is effectively trying to convert tz's + pytest.raises(TypeError, + lambda: DatetimeIndex(idx, + dtype='datetime64[ns, CET]')) + pytest.raises(ValueError, + lambda: DatetimeIndex( + idx, tz='CET', + dtype='datetime64[ns, US/Eastern]')) + result = DatetimeIndex(idx, dtype='datetime64[ns, US/Eastern]') + tm.assert_index_equal(idx, result) + + def test_constructor_name(self): + idx = DatetimeIndex(start='2000-01-01', periods=1, freq='A', + name='TEST') + assert 
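`test_constructor_dtype` hinges on a tz-aware dtype localizing naive inputs; a condensed form:

```python
import pandas as pd

# a tz-aware dtype localizes naive inputs on construction
idx = pd.DatetimeIndex(['2013-01-01', '2013-01-02'],
                       dtype='datetime64[ns, US/Eastern]')
expected = pd.DatetimeIndex(['2013-01-01',
                             '2013-01-02']).tz_localize('US/Eastern')
pd.testing.assert_index_equal(idx, expected)
```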
idx.name == 'TEST' + + def test_000constructor_resolution(self): + # 2252 + t1 = Timestamp((1352934390 * 1000000000) + 1000000 + 1000 + 1) + idx = DatetimeIndex([t1]) + + assert idx.nanosecond[0] == t1.nanosecond + + +class TestTimeSeries(object): + + def test_dti_constructor_preserve_dti_freq(self): + rng = date_range('1/1/2000', '1/2/2000', freq='5min') + + rng2 = DatetimeIndex(rng) + assert rng.freq == rng2.freq + + def test_dti_constructor_years_only(self): + # GH 6961 + for tz in [None, 'UTC', 'Asia/Tokyo', 'dateutil/US/Pacific']: + rng1 = date_range('2014', '2015', freq='M', tz=tz) + expected1 = date_range('2014-01-31', '2014-12-31', freq='M', tz=tz) + + rng2 = date_range('2014', '2015', freq='MS', tz=tz) + expected2 = date_range('2014-01-01', '2015-01-01', freq='MS', + tz=tz) + + rng3 = date_range('2014', '2020', freq='A', tz=tz) + expected3 = date_range('2014-12-31', '2019-12-31', freq='A', tz=tz) + + rng4 = date_range('2014', '2020', freq='AS', tz=tz) + expected4 = date_range('2014-01-01', '2020-01-01', freq='AS', + tz=tz) + + for rng, expected in [(rng1, expected1), (rng2, expected2), + (rng3, expected3), (rng4, expected4)]: + tm.assert_index_equal(rng, expected) + + def test_dti_constructor_small_int(self): + # GH 13721 + exp = DatetimeIndex(['1970-01-01 00:00:00.00000000', + '1970-01-01 00:00:00.00000001', + '1970-01-01 00:00:00.00000002']) + + for dtype in [np.int64, np.int32, np.int16, np.int8]: + arr = np.array([0, 10, 20], dtype=dtype) + tm.assert_index_equal(DatetimeIndex(arr), exp) + + def test_ctor_str_intraday(self): + rng = DatetimeIndex(['1-1-2000 00:00:01']) + assert rng[0].second == 1 + + def test_is_(self): + dti = DatetimeIndex(start='1/1/2005', end='12/1/2005', freq='M') + assert dti.is_(dti) + assert dti.is_(dti.view()) + assert not dti.is_(dti.copy()) + + def test_index_cast_datetime64_other_units(self): + arr = np.arange(0, 100, 10, dtype=np.int64).view('M8[D]') + idx = Index(arr) + + assert (idx.values == tslib.cast_to_nanoseconds(arr)).all() + + def test_constructor_int64_nocopy(self): + # #1624 + arr = np.arange(1000, dtype=np.int64) + index = DatetimeIndex(arr) + + arr[50:100] = -1 + assert (index.asi8[50:100] == -1).all() + + arr = np.arange(1000, dtype=np.int64) + index = DatetimeIndex(arr, copy=True) + + arr[50:100] = -1 + assert (index.asi8[50:100] != -1).all() + + def test_from_freq_recreate_from_data(self): + freqs = ['M', 'Q', 'A', 'D', 'B', 'BH', 'T', 'S', 'L', 'U', 'H', 'N', + 'C'] + + for f in freqs: + org = DatetimeIndex(start='2001/02/01 09:00', freq=f, periods=1) + idx = DatetimeIndex(org, freq=f) + tm.assert_index_equal(idx, org) + + org = DatetimeIndex(start='2001/02/01 09:00', freq=f, + tz='US/Pacific', periods=1) + idx = DatetimeIndex(org, freq=f, tz='US/Pacific') + tm.assert_index_equal(idx, org) + + def test_datetimeindex_constructor_misc(self): + arr = ['1/1/2005', '1/2/2005', 'Jn 3, 2005', '2005-01-04'] + pytest.raises(Exception, DatetimeIndex, arr) + + arr = ['1/1/2005', '1/2/2005', '1/3/2005', '2005-01-04'] + idx1 = DatetimeIndex(arr) + + arr = [datetime(2005, 1, 1), '1/2/2005', '1/3/2005', '2005-01-04'] + idx2 = DatetimeIndex(arr) + + arr = [lib.Timestamp(datetime(2005, 1, 1)), '1/2/2005', '1/3/2005', + '2005-01-04'] + idx3 = DatetimeIndex(arr) + + arr = np.array(['1/1/2005', '1/2/2005', '1/3/2005', + '2005-01-04'], dtype='O') + idx4 = DatetimeIndex(arr) + + arr = to_datetime(['1/1/2005', '1/2/2005', '1/3/2005', '2005-01-04']) + idx5 = DatetimeIndex(arr) + + arr = to_datetime(['1/1/2005', '1/2/2005', 'Jan 3, 2005', '2005-01-04' + ]) 
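The `dayfirst`/`yearfirst` assertions in `test_datetimeindex_constructor_misc` reduce to:

```python
import pandas as pd

# the two spellings below parse to the same dates
day_first = pd.DatetimeIndex(['12/05/2007', '25/01/2008'], dayfirst=True)
year_first = pd.DatetimeIndex(['2007/05/12', '2008/01/25'],
                              dayfirst=False, yearfirst=True)
pd.testing.assert_index_equal(day_first, year_first)
```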
+ idx6 = DatetimeIndex(arr) + + idx7 = DatetimeIndex(['12/05/2007', '25/01/2008'], dayfirst=True) + idx8 = DatetimeIndex(['2007/05/12', '2008/01/25'], dayfirst=False, + yearfirst=True) + tm.assert_index_equal(idx7, idx8) + + for other in [idx2, idx3, idx4, idx5, idx6]: + assert (idx1.values == other.values).all() + + sdate = datetime(1999, 12, 25) + edate = datetime(2000, 1, 1) + idx = DatetimeIndex(start=sdate, freq='1B', periods=20) + assert len(idx) == 20 + assert idx[0] == sdate + 0 * offsets.BDay() + assert idx.freq == 'B' + + idx = DatetimeIndex(end=edate, freq=('D', 5), periods=20) + assert len(idx) == 20 + assert idx[-1] == edate + assert idx.freq == '5D' + + idx1 = DatetimeIndex(start=sdate, end=edate, freq='W-SUN') + idx2 = DatetimeIndex(start=sdate, end=edate, + freq=offsets.Week(weekday=6)) + assert len(idx1) == len(idx2) + assert idx1.offset == idx2.offset + + idx1 = DatetimeIndex(start=sdate, end=edate, freq='QS') + idx2 = DatetimeIndex(start=sdate, end=edate, + freq=offsets.QuarterBegin(startingMonth=1)) + assert len(idx1) == len(idx2) + assert idx1.offset == idx2.offset + + idx1 = DatetimeIndex(start=sdate, end=edate, freq='BQ') + idx2 = DatetimeIndex(start=sdate, end=edate, + freq=offsets.BQuarterEnd(startingMonth=12)) + assert len(idx1) == len(idx2) + assert idx1.offset == idx2.offset diff --git a/pandas/tests/indexes/datetimes/test_date_range.py b/pandas/tests/indexes/datetimes/test_date_range.py new file mode 100644 index 0000000000000..da4ca83c10dda --- /dev/null +++ b/pandas/tests/indexes/datetimes/test_date_range.py @@ -0,0 +1,585 @@ +""" +test date_range, bdate_range, cdate_range +construction from the convenience range functions +""" + +import pytest + +import numpy as np +from pytz import timezone +from datetime import datetime, timedelta, time + +import pandas as pd +import pandas.util.testing as tm +from pandas import compat +from pandas.core.indexes.datetimes import bdate_range, cdate_range +from pandas import date_range, offsets, DatetimeIndex, Timestamp +from pandas.tseries.offsets import (generate_range, CDay, BDay, + DateOffset, MonthEnd) + +from pandas.tests.series.common import TestData + +START, END = datetime(2009, 1, 1), datetime(2010, 1, 1) + + +def eq_gen_range(kwargs, expected): + rng = generate_range(**kwargs) + assert (np.array_equal(list(rng), expected)) + + +class TestDateRanges(TestData): + + def test_date_range_gen_error(self): + rng = date_range('1/1/2000 00:00', '1/1/2000 00:18', freq='5min') + assert len(rng) == 4 + + @pytest.mark.parametrize("freq", ["AS", "YS"]) + def test_begin_year_alias(self, freq): + # see gh-9313 + rng = date_range("1/1/2013", "7/1/2017", freq=freq) + exp = pd.DatetimeIndex(["2013-01-01", "2014-01-01", + "2015-01-01", "2016-01-01", + "2017-01-01"], freq=freq) + tm.assert_index_equal(rng, exp) + + @pytest.mark.parametrize("freq", ["A", "Y"]) + def test_end_year_alias(self, freq): + # see gh-9313 + rng = date_range("1/1/2013", "7/1/2017", freq=freq) + exp = pd.DatetimeIndex(["2013-12-31", "2014-12-31", + "2015-12-31", "2016-12-31"], freq=freq) + tm.assert_index_equal(rng, exp) + + @pytest.mark.parametrize("freq", ["BA", "BY"]) + def test_business_end_year_alias(self, freq): + # see gh-9313 + rng = date_range("1/1/2013", "7/1/2017", freq=freq) + exp = pd.DatetimeIndex(["2013-12-31", "2014-12-31", + "2015-12-31", "2016-12-30"], freq=freq) + tm.assert_index_equal(rng, exp) + + def test_date_range_negative_freq(self): + # GH 11018 + rng = date_range('2011-12-31', freq='-2A', periods=3) + exp = 
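The year-alias tests compare anchoring at year starts versus year ends; in miniature ('AS' and 'A' are the aliases in use at this point in pandas' history):

```python
import pandas as pd

# 'AS' anchors at year starts ...
rng = pd.date_range('1/1/2013', '7/1/2017', freq='AS')
assert rng[0] == pd.Timestamp('2013-01-01') and len(rng) == 5

# ... while 'A' anchors at year ends
rng = pd.date_range('1/1/2013', '7/1/2017', freq='A')
assert rng[-1] == pd.Timestamp('2016-12-31') and len(rng) == 4
```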
pd.DatetimeIndex(['2011-12-31', '2009-12-31', + '2007-12-31'], freq='-2A') + tm.assert_index_equal(rng, exp) + assert rng.freq == '-2A' + + rng = date_range('2011-01-31', freq='-2M', periods=3) + exp = pd.DatetimeIndex(['2011-01-31', '2010-11-30', + '2010-09-30'], freq='-2M') + tm.assert_index_equal(rng, exp) + assert rng.freq == '-2M' + + def test_date_range_bms_bug(self): + # #1645 + rng = date_range('1/1/2000', periods=10, freq='BMS') + + ex_first = Timestamp('2000-01-03') + assert rng[0] == ex_first + + def test_date_range_normalize(self): + snap = datetime.today() + n = 50 + + rng = date_range(snap, periods=n, normalize=False, freq='2D') + + offset = timedelta(2) + values = DatetimeIndex([snap + i * offset for i in range(n)]) + + tm.assert_index_equal(rng, values) + + rng = date_range('1/1/2000 08:15', periods=n, normalize=False, + freq='B') + the_time = time(8, 15) + for val in rng: + assert val.time() == the_time + + def test_date_range_fy5252(self): + dr = date_range(start="2013-01-01", periods=2, freq=offsets.FY5253( + startingMonth=1, weekday=3, variation="nearest")) + assert dr[0] == Timestamp('2013-01-31') + assert dr[1] == Timestamp('2014-01-30') + + def test_date_range_ambiguous_arguments(self): + # #2538 + start = datetime(2011, 1, 1, 5, 3, 40) + end = datetime(2011, 1, 1, 8, 9, 40) + + pytest.raises(ValueError, date_range, start, end, freq='s', + periods=10) + + def test_date_range_businesshour(self): + idx = DatetimeIndex(['2014-07-04 09:00', '2014-07-04 10:00', + '2014-07-04 11:00', + '2014-07-04 12:00', '2014-07-04 13:00', + '2014-07-04 14:00', + '2014-07-04 15:00', '2014-07-04 16:00'], + freq='BH') + rng = date_range('2014-07-04 09:00', '2014-07-04 16:00', freq='BH') + tm.assert_index_equal(idx, rng) + + idx = DatetimeIndex( + ['2014-07-04 16:00', '2014-07-07 09:00'], freq='BH') + rng = date_range('2014-07-04 16:00', '2014-07-07 09:00', freq='BH') + tm.assert_index_equal(idx, rng) + + idx = DatetimeIndex(['2014-07-04 09:00', '2014-07-04 10:00', + '2014-07-04 11:00', + '2014-07-04 12:00', '2014-07-04 13:00', + '2014-07-04 14:00', + '2014-07-04 15:00', '2014-07-04 16:00', + '2014-07-07 09:00', '2014-07-07 10:00', + '2014-07-07 11:00', + '2014-07-07 12:00', '2014-07-07 13:00', + '2014-07-07 14:00', + '2014-07-07 15:00', '2014-07-07 16:00', + '2014-07-08 09:00', '2014-07-08 10:00', + '2014-07-08 11:00', + '2014-07-08 12:00', '2014-07-08 13:00', + '2014-07-08 14:00', + '2014-07-08 15:00', '2014-07-08 16:00'], + freq='BH') + rng = date_range('2014-07-04 09:00', '2014-07-08 16:00', freq='BH') + tm.assert_index_equal(idx, rng) + + def test_range_misspecified(self): + # GH #1095 + + pytest.raises(ValueError, date_range, '1/1/2000') + pytest.raises(ValueError, date_range, end='1/1/2000') + pytest.raises(ValueError, date_range, periods=10) + + pytest.raises(ValueError, date_range, '1/1/2000', freq='H') + pytest.raises(ValueError, date_range, end='1/1/2000', freq='H') + pytest.raises(ValueError, date_range, periods=10, freq='H') + + def test_compat_replace(self): + # https://github.com/statsmodels/statsmodels/issues/3349 + # replace should take ints/longs for compat + + for f in [compat.long, int]: + result = date_range(Timestamp('1960-04-01 00:00:00', + freq='QS-JAN'), + periods=f(76), + freq='QS-JAN') + assert len(result) == 76 + + def test_catch_infinite_loop(self): + offset = offsets.DateOffset(minute=5) + # blow up, don't loop forever + pytest.raises(Exception, date_range, datetime(2011, 11, 11), + datetime(2011, 11, 12), freq=offset) + + +class TestGenRangeGeneration(object): 
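+    # generate_range yields plain datetimes by repeatedly applying an offset
+    # (BDay by default); a minimal illustrative call, mirroring the tests
+    # below:
+    # >>> list(generate_range(datetime(2009, 3, 25), periods=2))
+    # [datetime.datetime(2009, 3, 25, 0, 0), datetime.datetime(2009, 3, 26, 0, 0)]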
+ + def test_generate(self): + rng1 = list(generate_range(START, END, offset=BDay())) + rng2 = list(generate_range(START, END, time_rule='B')) + assert rng1 == rng2 + + def test_generate_cday(self): + rng1 = list(generate_range(START, END, offset=CDay())) + rng2 = list(generate_range(START, END, time_rule='C')) + assert rng1 == rng2 + + def test_1(self): + eq_gen_range(dict(start=datetime(2009, 3, 25), periods=2), + [datetime(2009, 3, 25), datetime(2009, 3, 26)]) + + def test_2(self): + eq_gen_range(dict(start=datetime(2008, 1, 1), + end=datetime(2008, 1, 3)), + [datetime(2008, 1, 1), + datetime(2008, 1, 2), + datetime(2008, 1, 3)]) + + def test_3(self): + eq_gen_range(dict(start=datetime(2008, 1, 5), + end=datetime(2008, 1, 6)), + []) + + def test_precision_finer_than_offset(self): + # GH 9907 + result1 = DatetimeIndex(start='2015-04-15 00:00:03', + end='2016-04-22 00:00:00', freq='Q') + result2 = DatetimeIndex(start='2015-04-15 00:00:03', + end='2015-06-22 00:00:04', freq='W') + expected1_list = ['2015-06-30 00:00:03', '2015-09-30 00:00:03', + '2015-12-31 00:00:03', '2016-03-31 00:00:03'] + expected2_list = ['2015-04-19 00:00:03', '2015-04-26 00:00:03', + '2015-05-03 00:00:03', '2015-05-10 00:00:03', + '2015-05-17 00:00:03', '2015-05-24 00:00:03', + '2015-05-31 00:00:03', '2015-06-07 00:00:03', + '2015-06-14 00:00:03', '2015-06-21 00:00:03'] + expected1 = DatetimeIndex(expected1_list, dtype='datetime64[ns]', + freq='Q-DEC', tz=None) + expected2 = DatetimeIndex(expected2_list, dtype='datetime64[ns]', + freq='W-SUN', tz=None) + tm.assert_index_equal(result1, expected1) + tm.assert_index_equal(result2, expected2) + + +class TestBusinessDateRange(object): + + def setup_method(self, method): + self.rng = bdate_range(START, END) + + def test_constructor(self): + bdate_range(START, END, freq=BDay()) + bdate_range(START, periods=20, freq=BDay()) + bdate_range(end=START, periods=20, freq=BDay()) + pytest.raises(ValueError, date_range, '2011-1-1', '2012-1-1', 'B') + pytest.raises(ValueError, bdate_range, '2011-1-1', '2012-1-1', 'B') + + def test_naive_aware_conflicts(self): + naive = bdate_range(START, END, freq=BDay(), tz=None) + aware = bdate_range(START, END, freq=BDay(), + tz="Asia/Hong_Kong") + tm.assert_raises_regex(TypeError, "tz-naive.*tz-aware", + naive.join, aware) + tm.assert_raises_regex(TypeError, "tz-naive.*tz-aware", + aware.join, naive) + + def test_cached_range(self): + DatetimeIndex._cached_range(START, END, offset=BDay()) + DatetimeIndex._cached_range(START, periods=20, offset=BDay()) + DatetimeIndex._cached_range(end=START, periods=20, offset=BDay()) + + tm.assert_raises_regex(TypeError, "offset", + DatetimeIndex._cached_range, + START, END) + + tm.assert_raises_regex(TypeError, "specify period", + DatetimeIndex._cached_range, START, + offset=BDay()) + + tm.assert_raises_regex(TypeError, "specify period", + DatetimeIndex._cached_range, end=END, + offset=BDay()) + + tm.assert_raises_regex(TypeError, "start or end", + DatetimeIndex._cached_range, periods=20, + offset=BDay()) + + def test_cached_range_bug(self): + rng = date_range('2010-09-01 05:00:00', periods=50, + freq=DateOffset(hours=6)) + assert len(rng) == 50 + assert rng[0] == datetime(2010, 9, 1, 5) + + def test_timezone_comparaison_bug(self): + # smoke test + start = Timestamp('20130220 10:00', tz='US/Eastern') + result = date_range(start, periods=2, tz='US/Eastern') + assert len(result) == 2 + + def test_timezone_comparaison_assert(self): + start = Timestamp('20130220 10:00', tz='US/Eastern') + 
pytest.raises(AssertionError, date_range, start, periods=2, + tz='Europe/Berlin') + + def test_misc(self): + end = datetime(2009, 5, 13) + dr = bdate_range(end=end, periods=20) + firstDate = end - 19 * BDay() + + assert len(dr) == 20 + assert dr[0] == firstDate + assert dr[-1] == end + + def test_date_parse_failure(self): + badly_formed_date = '2007/100/1' + + pytest.raises(ValueError, Timestamp, badly_formed_date) + + pytest.raises(ValueError, bdate_range, start=badly_formed_date, + periods=10) + pytest.raises(ValueError, bdate_range, end=badly_formed_date, + periods=10) + pytest.raises(ValueError, bdate_range, badly_formed_date, + badly_formed_date) + + def test_daterange_bug_456(self): + # GH #456 + rng1 = bdate_range('12/5/2011', '12/5/2011') + rng2 = bdate_range('12/2/2011', '12/5/2011') + rng2.offset = BDay() + + result = rng1.union(rng2) + assert isinstance(result, DatetimeIndex) + + def test_error_with_zero_monthends(self): + pytest.raises(ValueError, date_range, '1/1/2000', '1/1/2001', + freq=MonthEnd(0)) + + def test_range_bug(self): + # GH #770 + offset = DateOffset(months=3) + result = date_range("2011-1-1", "2012-1-31", freq=offset) + + start = datetime(2011, 1, 1) + exp_values = [start + i * offset for i in range(5)] + tm.assert_index_equal(result, DatetimeIndex(exp_values)) + + def test_range_tz_pytz(self): + # see gh-2906 + tz = timezone('US/Eastern') + start = tz.localize(datetime(2011, 1, 1)) + end = tz.localize(datetime(2011, 1, 3)) + + dr = date_range(start=start, periods=3) + assert dr.tz.zone == tz.zone + assert dr[0] == start + assert dr[2] == end + + dr = date_range(end=end, periods=3) + assert dr.tz.zone == tz.zone + assert dr[0] == start + assert dr[2] == end + + dr = date_range(start=start, end=end) + assert dr.tz.zone == tz.zone + assert dr[0] == start + assert dr[2] == end + + def test_range_tz_dst_straddle_pytz(self): + tz = timezone('US/Eastern') + dates = [(tz.localize(datetime(2014, 3, 6)), + tz.localize(datetime(2014, 3, 12))), + (tz.localize(datetime(2013, 11, 1)), + tz.localize(datetime(2013, 11, 6)))] + for (start, end) in dates: + dr = date_range(start, end, freq='D') + assert dr[0] == start + assert dr[-1] == end + assert np.all(dr.hour == 0) + + dr = date_range(start, end, freq='D', tz='US/Eastern') + assert dr[0] == start + assert dr[-1] == end + assert np.all(dr.hour == 0) + + dr = date_range(start.replace(tzinfo=None), end.replace( + tzinfo=None), freq='D', tz='US/Eastern') + assert dr[0] == start + assert dr[-1] == end + assert np.all(dr.hour == 0) + + def test_range_tz_dateutil(self): + # see gh-2906 + + # Use maybe_get_tz to fix filename in tz under dateutil. 
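+        # (dateutil resolves zones from tz files on disk; maybe_get_tz
+        #  normalizes the 'dateutil/' prefix, so tz('US/Eastern') below is
+        #  roughly equivalent to dateutil.tz.gettz('US/Eastern'))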
+ from pandas._libs.tslib import maybe_get_tz + tz = lambda x: maybe_get_tz('dateutil/' + x) + + start = datetime(2011, 1, 1, tzinfo=tz('US/Eastern')) + end = datetime(2011, 1, 3, tzinfo=tz('US/Eastern')) + + dr = date_range(start=start, periods=3) + assert dr.tz == tz('US/Eastern') + assert dr[0] == start + assert dr[2] == end + + dr = date_range(end=end, periods=3) + assert dr.tz == tz('US/Eastern') + assert dr[0] == start + assert dr[2] == end + + dr = date_range(start=start, end=end) + assert dr.tz == tz('US/Eastern') + assert dr[0] == start + assert dr[2] == end + + def test_range_closed(self): + begin = datetime(2011, 1, 1) + end = datetime(2014, 1, 1) + + for freq in ["1D", "3D", "2M", "7W", "3H", "A"]: + closed = date_range(begin, end, closed=None, freq=freq) + left = date_range(begin, end, closed="left", freq=freq) + right = date_range(begin, end, closed="right", freq=freq) + expected_left = left + expected_right = right + + if end == closed[-1]: + expected_left = closed[:-1] + if begin == closed[0]: + expected_right = closed[1:] + + tm.assert_index_equal(expected_left, left) + tm.assert_index_equal(expected_right, right) + + def test_range_closed_with_tz_aware_start_end(self): + # GH12409, GH12684 + begin = Timestamp('2011/1/1', tz='US/Eastern') + end = Timestamp('2014/1/1', tz='US/Eastern') + + for freq in ["1D", "3D", "2M", "7W", "3H", "A"]: + closed = date_range(begin, end, closed=None, freq=freq) + left = date_range(begin, end, closed="left", freq=freq) + right = date_range(begin, end, closed="right", freq=freq) + expected_left = left + expected_right = right + + if end == closed[-1]: + expected_left = closed[:-1] + if begin == closed[0]: + expected_right = closed[1:] + + tm.assert_index_equal(expected_left, left) + tm.assert_index_equal(expected_right, right) + + begin = Timestamp('2011/1/1') + end = Timestamp('2014/1/1') + begintz = Timestamp('2011/1/1', tz='US/Eastern') + endtz = Timestamp('2014/1/1', tz='US/Eastern') + + for freq in ["1D", "3D", "2M", "7W", "3H", "A"]: + closed = date_range(begin, end, closed=None, freq=freq, + tz='US/Eastern') + left = date_range(begin, end, closed="left", freq=freq, + tz='US/Eastern') + right = date_range(begin, end, closed="right", freq=freq, + tz='US/Eastern') + expected_left = left + expected_right = right + + if endtz == closed[-1]: + expected_left = closed[:-1] + if begintz == closed[0]: + expected_right = closed[1:] + + tm.assert_index_equal(expected_left, left) + tm.assert_index_equal(expected_right, right) + + def test_range_closed_boundary(self): + # GH 11804 + for closed in ['right', 'left', None]: + right_boundary = date_range('2015-09-12', '2015-12-01', + freq='QS-MAR', closed=closed) + left_boundary = date_range('2015-09-01', '2015-09-12', + freq='QS-MAR', closed=closed) + both_boundary = date_range('2015-09-01', '2015-12-01', + freq='QS-MAR', closed=closed) + expected_right = expected_left = expected_both = both_boundary + + if closed == 'right': + expected_left = both_boundary[1:] + if closed == 'left': + expected_right = both_boundary[:-1] + if closed is None: + expected_right = both_boundary[1:] + expected_left = both_boundary[:-1] + + tm.assert_index_equal(right_boundary, expected_right) + tm.assert_index_equal(left_boundary, expected_left) + tm.assert_index_equal(both_boundary, expected_both) + + def test_years_only(self): + # GH 6961 + dr = date_range('2014', '2015', freq='M') + assert dr[0] == datetime(2014, 1, 31) + assert dr[-1] == datetime(2014, 12, 31) + + def test_freq_divides_end_in_nanos(self): + # GH 10885 + 
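+        # A 345-minute (5h45m) step from 10:00 lands on 15:45, and the next
+        # step (21:30) overshoots the 16:00 end, so the end timestamp acts
+        # only as an upper bound here.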
result_1 = date_range('2005-01-12 10:00', '2005-01-12 16:00', + freq='345min') + result_2 = date_range('2005-01-13 10:00', '2005-01-13 16:00', + freq='345min') + expected_1 = DatetimeIndex(['2005-01-12 10:00:00', + '2005-01-12 15:45:00'], + dtype='datetime64[ns]', freq='345T', + tz=None) + expected_2 = DatetimeIndex(['2005-01-13 10:00:00', + '2005-01-13 15:45:00'], + dtype='datetime64[ns]', freq='345T', + tz=None) + tm.assert_index_equal(result_1, expected_1) + tm.assert_index_equal(result_2, expected_2) + + +class TestCustomDateRange(object): + def setup_method(self, method): + self.rng = cdate_range(START, END) + + def test_constructor(self): + cdate_range(START, END, freq=CDay()) + cdate_range(START, periods=20, freq=CDay()) + cdate_range(end=START, periods=20, freq=CDay()) + pytest.raises(ValueError, date_range, '2011-1-1', '2012-1-1', 'C') + pytest.raises(ValueError, cdate_range, '2011-1-1', '2012-1-1', 'C') + + def test_cached_range(self): + DatetimeIndex._cached_range(START, END, offset=CDay()) + DatetimeIndex._cached_range(START, periods=20, + offset=CDay()) + DatetimeIndex._cached_range(end=START, periods=20, + offset=CDay()) + + pytest.raises(Exception, DatetimeIndex._cached_range, START, END) + + pytest.raises(Exception, DatetimeIndex._cached_range, START, + freq=CDay()) + + pytest.raises(Exception, DatetimeIndex._cached_range, end=END, + freq=CDay()) + + pytest.raises(Exception, DatetimeIndex._cached_range, periods=20, + freq=CDay()) + + def test_misc(self): + end = datetime(2009, 5, 13) + dr = cdate_range(end=end, periods=20) + firstDate = end - 19 * CDay() + + assert len(dr) == 20 + assert dr[0] == firstDate + assert dr[-1] == end + + def test_date_parse_failure(self): + badly_formed_date = '2007/100/1' + + pytest.raises(ValueError, Timestamp, badly_formed_date) + + pytest.raises(ValueError, cdate_range, start=badly_formed_date, + periods=10) + pytest.raises(ValueError, cdate_range, end=badly_formed_date, + periods=10) + pytest.raises(ValueError, cdate_range, badly_formed_date, + badly_formed_date) + + def test_daterange_bug_456(self): + # GH #456 + rng1 = cdate_range('12/5/2011', '12/5/2011') + rng2 = cdate_range('12/2/2011', '12/5/2011') + rng2.offset = CDay() + + result = rng1.union(rng2) + assert isinstance(result, DatetimeIndex) + + def test_cdaterange(self): + rng = cdate_range('2013-05-01', periods=3) + xp = DatetimeIndex(['2013-05-01', '2013-05-02', '2013-05-03']) + tm.assert_index_equal(xp, rng) + + def test_cdaterange_weekmask(self): + rng = cdate_range('2013-05-01', periods=3, + weekmask='Sun Mon Tue Wed Thu') + xp = DatetimeIndex(['2013-05-01', '2013-05-02', '2013-05-05']) + tm.assert_index_equal(xp, rng) + + def test_cdaterange_holidays(self): + rng = cdate_range('2013-05-01', periods=3, holidays=['2013-05-01']) + xp = DatetimeIndex(['2013-05-02', '2013-05-03', '2013-05-06']) + tm.assert_index_equal(xp, rng) + + def test_cdaterange_weekmask_and_holidays(self): + rng = cdate_range('2013-05-01', periods=3, + weekmask='Sun Mon Tue Wed Thu', + holidays=['2013-05-01']) + xp = DatetimeIndex(['2013-05-02', '2013-05-05', '2013-05-06']) + tm.assert_index_equal(xp, rng) diff --git a/pandas/tests/indexes/datetimes/test_datetime.py b/pandas/tests/indexes/datetimes/test_datetime.py new file mode 100644 index 0000000000000..47f53f53cfd02 --- /dev/null +++ b/pandas/tests/indexes/datetimes/test_datetime.py @@ -0,0 +1,776 @@ +import pytest + +import numpy as np +from datetime import date, timedelta, time + +import dateutil +import pandas as pd +import pandas.util.testing as tm 
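+# Assorted DatetimeIndex behaviour tests (get_loc/get_indexer, pickling,
+# factorize, slicing) collected in this module as part of the test split-up.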
+from pandas.compat import lrange +from pandas.compat.numpy import np_datetime64_compat +from pandas import (DatetimeIndex, Index, date_range, Series, DataFrame, + Timestamp, datetime, offsets) + +from pandas.util.testing import assert_series_equal, assert_almost_equal + +randn = np.random.randn + + +class TestDatetimeIndex(object): + + def test_get_loc(self): + idx = pd.date_range('2000-01-01', periods=3) + + for method in [None, 'pad', 'backfill', 'nearest']: + assert idx.get_loc(idx[1], method) == 1 + assert idx.get_loc(idx[1].to_pydatetime(), method) == 1 + assert idx.get_loc(str(idx[1]), method) == 1 + + if method is not None: + assert idx.get_loc(idx[1], method, + tolerance=pd.Timedelta('0 days')) == 1 + + assert idx.get_loc('2000-01-01', method='nearest') == 0 + assert idx.get_loc('2000-01-01T12', method='nearest') == 1 + + assert idx.get_loc('2000-01-01T12', method='nearest', + tolerance='1 day') == 1 + assert idx.get_loc('2000-01-01T12', method='nearest', + tolerance=pd.Timedelta('1D')) == 1 + assert idx.get_loc('2000-01-01T12', method='nearest', + tolerance=np.timedelta64(1, 'D')) == 1 + assert idx.get_loc('2000-01-01T12', method='nearest', + tolerance=timedelta(1)) == 1 + with tm.assert_raises_regex(ValueError, 'must be convertible'): + idx.get_loc('2000-01-01T12', method='nearest', tolerance='foo') + with pytest.raises(KeyError): + idx.get_loc('2000-01-01T03', method='nearest', tolerance='2 hours') + + assert idx.get_loc('2000', method='nearest') == slice(0, 3) + assert idx.get_loc('2000-01', method='nearest') == slice(0, 3) + + assert idx.get_loc('1999', method='nearest') == 0 + assert idx.get_loc('2001', method='nearest') == 2 + + with pytest.raises(KeyError): + idx.get_loc('1999', method='pad') + with pytest.raises(KeyError): + idx.get_loc('2001', method='backfill') + + with pytest.raises(KeyError): + idx.get_loc('foobar') + with pytest.raises(TypeError): + idx.get_loc(slice(2)) + + idx = pd.to_datetime(['2000-01-01', '2000-01-04']) + assert idx.get_loc('2000-01-02', method='nearest') == 0 + assert idx.get_loc('2000-01-03', method='nearest') == 1 + assert idx.get_loc('2000-01', method='nearest') == slice(0, 2) + + # time indexing + idx = pd.date_range('2000-01-01', periods=24, freq='H') + tm.assert_numpy_array_equal(idx.get_loc(time(12)), + np.array([12]), check_dtype=False) + tm.assert_numpy_array_equal(idx.get_loc(time(12, 30)), + np.array([]), check_dtype=False) + with pytest.raises(NotImplementedError): + idx.get_loc(time(12, 30), method='pad') + + def test_get_indexer(self): + idx = pd.date_range('2000-01-01', periods=3) + exp = np.array([0, 1, 2], dtype=np.intp) + tm.assert_numpy_array_equal(idx.get_indexer(idx), exp) + + target = idx[0] + pd.to_timedelta(['-1 hour', '12 hours', + '1 day 1 hour']) + tm.assert_numpy_array_equal(idx.get_indexer(target, 'pad'), + np.array([-1, 0, 1], dtype=np.intp)) + tm.assert_numpy_array_equal(idx.get_indexer(target, 'backfill'), + np.array([0, 1, 2], dtype=np.intp)) + tm.assert_numpy_array_equal(idx.get_indexer(target, 'nearest'), + np.array([0, 1, 1], dtype=np.intp)) + tm.assert_numpy_array_equal( + idx.get_indexer(target, 'nearest', + tolerance=pd.Timedelta('1 hour')), + np.array([0, -1, 1], dtype=np.intp)) + with pytest.raises(ValueError): + idx.get_indexer(idx[[0]], method='nearest', tolerance='foo') + + def test_reasonable_keyerror(self): + # GH #1062 + index = DatetimeIndex(['1/3/2000']) + try: + index.get_loc('1/1/2000') + except KeyError as e: + assert '2000' in str(e) + + def test_roundtrip_pickle_with_tz(self): + + # GH 8367 
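+        # (tm.round_trip_pickle writes the object to a temporary pickle file
+        #  and reads it back, roughly pd.read_pickle after pd.to_pickle)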
+ # round-trip of timezone + index = date_range('20130101', periods=3, tz='US/Eastern', name='foo') + unpickled = tm.round_trip_pickle(index) + tm.assert_index_equal(index, unpickled) + + def test_reindex_preserves_tz_if_target_is_empty_list_or_array(self): + # GH7774 + index = date_range('20130101', periods=3, tz='US/Eastern') + assert str(index.reindex([])[0].tz) == 'US/Eastern' + assert str(index.reindex(np.array([]))[0].tz) == 'US/Eastern' + + def test_time_loc(self): # GH8667 + from datetime import time + from pandas._libs.index import _SIZE_CUTOFF + + ns = _SIZE_CUTOFF + np.array([-100, 100], dtype=np.int64) + key = time(15, 11, 30) + start = key.hour * 3600 + key.minute * 60 + key.second + step = 24 * 3600 + + for n in ns: + idx = pd.date_range('2014-11-26', periods=n, freq='S') + ts = pd.Series(np.random.randn(n), index=idx) + i = np.arange(start, n, step) + + tm.assert_numpy_array_equal(ts.index.get_loc(key), i, + check_dtype=False) + tm.assert_series_equal(ts[key], ts.iloc[i]) + + left, right = ts.copy(), ts.copy() + left[key] *= -10 + right.iloc[i] *= -10 + tm.assert_series_equal(left, right) + + def test_time_overflow_for_32bit_machines(self): + # GH8943. On some machines NumPy defaults to np.int32 (for example, + # 32-bit Linux machines). In the function _generate_regular_range + # found in tseries/index.py, `periods` gets multiplied by `strides` + # (which has value 1e9) and since the max value for np.int32 is ~2e9, + # and since those machines won't promote np.int32 to np.int64, we get + # overflow. + periods = np.int_(1000) + + idx1 = pd.date_range(start='2000', periods=periods, freq='S') + assert len(idx1) == periods + + idx2 = pd.date_range(end='2000', periods=periods, freq='S') + assert len(idx2) == periods + + def test_nat(self): + assert DatetimeIndex([np.nan])[0] is pd.NaT + + def test_ufunc_coercions(self): + idx = date_range('2011-01-01', periods=3, freq='2D', name='x') + + delta = np.timedelta64(1, 'D') + for result in [idx + delta, np.add(idx, delta)]: + assert isinstance(result, DatetimeIndex) + exp = date_range('2011-01-02', periods=3, freq='2D', name='x') + tm.assert_index_equal(result, exp) + assert result.freq == '2D' + + for result in [idx - delta, np.subtract(idx, delta)]: + assert isinstance(result, DatetimeIndex) + exp = date_range('2010-12-31', periods=3, freq='2D', name='x') + tm.assert_index_equal(result, exp) + assert result.freq == '2D' + + delta = np.array([np.timedelta64(1, 'D'), np.timedelta64(2, 'D'), + np.timedelta64(3, 'D')]) + for result in [idx + delta, np.add(idx, delta)]: + assert isinstance(result, DatetimeIndex) + exp = DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-08'], + freq='3D', name='x') + tm.assert_index_equal(result, exp) + assert result.freq == '3D' + + for result in [idx - delta, np.subtract(idx, delta)]: + assert isinstance(result, DatetimeIndex) + exp = DatetimeIndex(['2010-12-31', '2011-01-01', '2011-01-02'], + freq='D', name='x') + tm.assert_index_equal(result, exp) + assert result.freq == 'D' + + def test_week_of_month_frequency(self): + # GH 5348: "ValueError: Could not evaluate WOM-1SUN" shouldn't raise + d1 = date(2002, 9, 1) + d2 = date(2013, 10, 27) + d3 = date(2012, 9, 30) + idx1 = DatetimeIndex([d1, d2]) + idx2 = DatetimeIndex([d3]) + result_append = idx1.append(idx2) + expected = DatetimeIndex([d1, d2, d3]) + tm.assert_index_equal(result_append, expected) + result_union = idx1.union(idx2) + expected = DatetimeIndex([d1, d3, d2]) + tm.assert_index_equal(result_union, expected) + + # GH 5115 + result = 
date_range("2013-1-1", periods=4, freq='WOM-1SAT') + dates = ['2013-01-05', '2013-02-02', '2013-03-02', '2013-04-06'] + expected = DatetimeIndex(dates, freq='WOM-1SAT') + tm.assert_index_equal(result, expected) + + def test_hash_error(self): + index = date_range('20010101', periods=10) + with tm.assert_raises_regex(TypeError, "unhashable type: %r" % + type(index).__name__): + hash(index) + + def test_stringified_slice_with_tz(self): + # GH2658 + import datetime + start = datetime.datetime.now() + idx = DatetimeIndex(start=start, freq="1d", periods=10) + df = DataFrame(lrange(10), index=idx) + df["2013-01-14 23:44:34.437768-05:00":] # no exception here + + def test_append_join_nondatetimeindex(self): + rng = date_range('1/1/2000', periods=10) + idx = Index(['a', 'b', 'c', 'd']) + + result = rng.append(idx) + assert isinstance(result[0], Timestamp) + + # it works + rng.join(idx, how='outer') + + def test_to_period_nofreq(self): + idx = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-04']) + pytest.raises(ValueError, idx.to_period) + + idx = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03'], + freq='infer') + assert idx.freqstr == 'D' + expected = pd.PeriodIndex(['2000-01-01', '2000-01-02', + '2000-01-03'], freq='D') + tm.assert_index_equal(idx.to_period(), expected) + + # GH 7606 + idx = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03']) + assert idx.freqstr is None + tm.assert_index_equal(idx.to_period(), expected) + + def test_comparisons_coverage(self): + rng = date_range('1/1/2000', periods=10) + + # raise TypeError for now + pytest.raises(TypeError, rng.__lt__, rng[3].value) + + result = rng == list(rng) + exp = rng == rng + tm.assert_numpy_array_equal(result, exp) + + def test_comparisons_nat(self): + + fidx1 = pd.Index([1.0, np.nan, 3.0, np.nan, 5.0, 7.0]) + fidx2 = pd.Index([2.0, 3.0, np.nan, np.nan, 6.0, 7.0]) + + didx1 = pd.DatetimeIndex(['2014-01-01', pd.NaT, '2014-03-01', pd.NaT, + '2014-05-01', '2014-07-01']) + didx2 = pd.DatetimeIndex(['2014-02-01', '2014-03-01', pd.NaT, pd.NaT, + '2014-06-01', '2014-07-01']) + darr = np.array([np_datetime64_compat('2014-02-01 00:00Z'), + np_datetime64_compat('2014-03-01 00:00Z'), + np_datetime64_compat('nat'), np.datetime64('nat'), + np_datetime64_compat('2014-06-01 00:00Z'), + np_datetime64_compat('2014-07-01 00:00Z')]) + + cases = [(fidx1, fidx2), (didx1, didx2), (didx1, darr)] + + # Check pd.NaT is handles as the same as np.nan + with tm.assert_produces_warning(None): + for idx1, idx2 in cases: + + result = idx1 < idx2 + expected = np.array([True, False, False, False, True, False]) + tm.assert_numpy_array_equal(result, expected) + + result = idx2 > idx1 + expected = np.array([True, False, False, False, True, False]) + tm.assert_numpy_array_equal(result, expected) + + result = idx1 <= idx2 + expected = np.array([True, False, False, False, True, True]) + tm.assert_numpy_array_equal(result, expected) + + result = idx2 >= idx1 + expected = np.array([True, False, False, False, True, True]) + tm.assert_numpy_array_equal(result, expected) + + result = idx1 == idx2 + expected = np.array([False, False, False, False, False, True]) + tm.assert_numpy_array_equal(result, expected) + + result = idx1 != idx2 + expected = np.array([True, True, True, True, True, False]) + tm.assert_numpy_array_equal(result, expected) + + with tm.assert_produces_warning(None): + for idx1, val in [(fidx1, np.nan), (didx1, pd.NaT)]: + result = idx1 < val + expected = np.array([False, False, False, False, False, False]) + tm.assert_numpy_array_equal(result, 
expected) + result = idx1 > val + tm.assert_numpy_array_equal(result, expected) + + result = idx1 <= val + tm.assert_numpy_array_equal(result, expected) + result = idx1 >= val + tm.assert_numpy_array_equal(result, expected) + + result = idx1 == val + tm.assert_numpy_array_equal(result, expected) + + result = idx1 != val + expected = np.array([True, True, True, True, True, True]) + tm.assert_numpy_array_equal(result, expected) + + # Check pd.NaT is handles as the same as np.nan + with tm.assert_produces_warning(None): + for idx1, val in [(fidx1, 3), (didx1, datetime(2014, 3, 1))]: + result = idx1 < val + expected = np.array([True, False, False, False, False, False]) + tm.assert_numpy_array_equal(result, expected) + result = idx1 > val + expected = np.array([False, False, False, False, True, True]) + tm.assert_numpy_array_equal(result, expected) + + result = idx1 <= val + expected = np.array([True, False, True, False, False, False]) + tm.assert_numpy_array_equal(result, expected) + result = idx1 >= val + expected = np.array([False, False, True, False, True, True]) + tm.assert_numpy_array_equal(result, expected) + + result = idx1 == val + expected = np.array([False, False, True, False, False, False]) + tm.assert_numpy_array_equal(result, expected) + + result = idx1 != val + expected = np.array([True, True, False, True, True, True]) + tm.assert_numpy_array_equal(result, expected) + + def test_map(self): + rng = date_range('1/1/2000', periods=10) + + f = lambda x: x.strftime('%Y%m%d') + result = rng.map(f) + exp = Index([f(x) for x in rng], dtype='= -1') + with tm.assert_raises_regex(ValueError, msg): + idx.take(np.array([1, 0, -2]), fill_value=True) + with tm.assert_raises_regex(ValueError, msg): + idx.take(np.array([1, 0, -5]), fill_value=True) + + with pytest.raises(IndexError): + idx.take(np.array([1, -5])) + + def test_take_fill_value_with_timezone(self): + idx = pd.DatetimeIndex(['2011-01-01', '2011-02-01', '2011-03-01'], + name='xxx', tz='US/Eastern') + result = idx.take(np.array([1, 0, -1])) + expected = pd.DatetimeIndex(['2011-02-01', '2011-01-01', '2011-03-01'], + name='xxx', tz='US/Eastern') + tm.assert_index_equal(result, expected) + + # fill_value + result = idx.take(np.array([1, 0, -1]), fill_value=True) + expected = pd.DatetimeIndex(['2011-02-01', '2011-01-01', 'NaT'], + name='xxx', tz='US/Eastern') + tm.assert_index_equal(result, expected) + + # allow_fill=False + result = idx.take(np.array([1, 0, -1]), allow_fill=False, + fill_value=True) + expected = pd.DatetimeIndex(['2011-02-01', '2011-01-01', '2011-03-01'], + name='xxx', tz='US/Eastern') + tm.assert_index_equal(result, expected) + + msg = ('When allow_fill=True and fill_value is not None, ' + 'all indices must be >= -1') + with tm.assert_raises_regex(ValueError, msg): + idx.take(np.array([1, 0, -2]), fill_value=True) + with tm.assert_raises_regex(ValueError, msg): + idx.take(np.array([1, 0, -5]), fill_value=True) + + with pytest.raises(IndexError): + idx.take(np.array([1, -5])) + + def test_map_bug_1677(self): + index = DatetimeIndex(['2012-04-25 09:30:00.393000']) + f = index.asof + + result = index.map(f) + expected = Index([f(index[0])]) + tm.assert_index_equal(result, expected) + + def test_groupby_function_tuple_1677(self): + df = DataFrame(np.random.rand(100), + index=date_range("1/1/2000", periods=100)) + monthly_group = df.groupby(lambda x: (x.year, x.month)) + + result = monthly_group.mean() + assert isinstance(result.index[0], tuple) + + def test_append_numpy_bug_1681(self): + # another datetime64 bug + dr = 
date_range('2011/1/1', '2012/1/1', freq='W-FRI') + a = DataFrame() + c = DataFrame({'A': 'foo', 'B': dr}, index=dr) + + result = a.append(c) + assert (result['B'] == dr).all() + + def test_isin(self): + index = tm.makeDateIndex(4) + result = index.isin(index) + assert result.all() + + result = index.isin(list(index)) + assert result.all() + + assert_almost_equal(index.isin([index[2], 5]), + np.array([False, False, True, False])) + + def test_time(self): + rng = pd.date_range('1/1/2000', freq='12min', periods=10) + result = pd.Index(rng).time + expected = [t.time() for t in rng] + assert (result == expected).all() + + def test_date(self): + rng = pd.date_range('1/1/2000', freq='12H', periods=10) + result = pd.Index(rng).date + expected = [t.date() for t in rng] + assert (result == expected).all() + + def test_does_not_convert_mixed_integer(self): + df = tm.makeCustomDataframe(10, 10, + data_gen_f=lambda *args, **kwargs: randn(), + r_idx_type='i', c_idx_type='dt') + cols = df.columns.join(df.index, how='outer') + joined = cols.join(df.columns) + assert cols.dtype == np.dtype('O') + assert cols.dtype == joined.dtype + tm.assert_numpy_array_equal(cols.values, joined.values) + + def test_slice_keeps_name(self): + # GH4226 + st = pd.Timestamp('2013-07-01 00:00:00', tz='America/Los_Angeles') + et = pd.Timestamp('2013-07-02 00:00:00', tz='America/Los_Angeles') + dr = pd.date_range(st, et, freq='H', name='timebucket') + assert dr[1:].name == dr.name + + def test_join_self(self): + index = date_range('1/1/2000', periods=10) + kinds = 'outer', 'inner', 'left', 'right' + for kind in kinds: + joined = index.join(index, how=kind) + assert index is joined + + def assert_index_parameters(self, index): + assert index.freq == '40960N' + assert index.inferred_freq == '40960N' + + def test_ns_index(self): + nsamples = 400 + ns = int(1e9 / 24414) + dtstart = np.datetime64('2012-09-20T00:00:00') + + dt = dtstart + np.arange(nsamples) * np.timedelta64(ns, 'ns') + freq = ns * offsets.Nano() + index = pd.DatetimeIndex(dt, freq=freq, name='time') + self.assert_index_parameters(index) + + new_index = pd.DatetimeIndex(start=index[0], end=index[-1], + freq=index.freq) + self.assert_index_parameters(new_index) + + def test_join_with_period_index(self): + df = tm.makeCustomDataframe( + 10, 10, data_gen_f=lambda *args: np.random.randint(2), + c_idx_type='p', r_idx_type='dt') + s = df.iloc[:5, 0] + joins = 'left', 'right', 'inner', 'outer' + + for join in joins: + with tm.assert_raises_regex(ValueError, + 'can only call with other ' + 'PeriodIndex-ed objects'): + df.columns.join(s.index, how=join) + + def test_factorize(self): + idx1 = DatetimeIndex(['2014-01', '2014-01', '2014-02', '2014-02', + '2014-03', '2014-03']) + + exp_arr = np.array([0, 0, 1, 1, 2, 2], dtype=np.intp) + exp_idx = DatetimeIndex(['2014-01', '2014-02', '2014-03']) + + arr, idx = idx1.factorize() + tm.assert_numpy_array_equal(arr, exp_arr) + tm.assert_index_equal(idx, exp_idx) + + arr, idx = idx1.factorize(sort=True) + tm.assert_numpy_array_equal(arr, exp_arr) + tm.assert_index_equal(idx, exp_idx) + + # tz must be preserved + idx1 = idx1.tz_localize('Asia/Tokyo') + exp_idx = exp_idx.tz_localize('Asia/Tokyo') + + arr, idx = idx1.factorize() + tm.assert_numpy_array_equal(arr, exp_arr) + tm.assert_index_equal(idx, exp_idx) + + idx2 = pd.DatetimeIndex(['2014-03', '2014-03', '2014-02', '2014-01', + '2014-03', '2014-01']) + + exp_arr = np.array([2, 2, 1, 0, 2, 0], dtype=np.intp) + exp_idx = DatetimeIndex(['2014-01', '2014-02', '2014-03']) + arr, idx = 
idx2.factorize(sort=True) + tm.assert_numpy_array_equal(arr, exp_arr) + tm.assert_index_equal(idx, exp_idx) + + exp_arr = np.array([0, 0, 1, 2, 0, 2], dtype=np.intp) + exp_idx = DatetimeIndex(['2014-03', '2014-02', '2014-01']) + arr, idx = idx2.factorize() + tm.assert_numpy_array_equal(arr, exp_arr) + tm.assert_index_equal(idx, exp_idx) + + # freq must be preserved + idx3 = date_range('2000-01', periods=4, freq='M', tz='Asia/Tokyo') + exp_arr = np.array([0, 1, 2, 3], dtype=np.intp) + arr, idx = idx3.factorize() + tm.assert_numpy_array_equal(arr, exp_arr) + tm.assert_index_equal(idx, idx3) + + def test_factorize_tz(self): + # GH 13750 + for tz in [None, 'UTC', 'US/Eastern', 'Asia/Tokyo']: + base = pd.date_range('2016-11-05', freq='H', periods=100, tz=tz) + idx = base.repeat(5) + + exp_arr = np.arange(100, dtype=np.intp).repeat(5) + + for obj in [idx, pd.Series(idx)]: + arr, res = obj.factorize() + tm.assert_numpy_array_equal(arr, exp_arr) + tm.assert_index_equal(res, base) + + def test_factorize_dst(self): + # GH 13750 + idx = pd.date_range('2016-11-06', freq='H', periods=12, + tz='US/Eastern') + + for obj in [idx, pd.Series(idx)]: + arr, res = obj.factorize() + tm.assert_numpy_array_equal(arr, np.arange(12, dtype=np.intp)) + tm.assert_index_equal(res, idx) + + idx = pd.date_range('2016-06-13', freq='H', periods=12, + tz='US/Eastern') + + for obj in [idx, pd.Series(idx)]: + arr, res = obj.factorize() + tm.assert_numpy_array_equal(arr, np.arange(12, dtype=np.intp)) + tm.assert_index_equal(res, idx) + + def test_slice_with_negative_step(self): + ts = Series(np.arange(20), + date_range('2014-01-01', periods=20, freq='MS')) + SLC = pd.IndexSlice + + def assert_slices_equivalent(l_slc, i_slc): + assert_series_equal(ts[l_slc], ts.iloc[i_slc]) + assert_series_equal(ts.loc[l_slc], ts.iloc[i_slc]) + assert_series_equal(ts.loc[l_slc], ts.iloc[i_slc]) + + assert_slices_equivalent(SLC[Timestamp('2014-10-01')::-1], SLC[9::-1]) + assert_slices_equivalent(SLC['2014-10-01'::-1], SLC[9::-1]) + + assert_slices_equivalent(SLC[:Timestamp('2014-10-01'):-1], SLC[:8:-1]) + assert_slices_equivalent(SLC[:'2014-10-01':-1], SLC[:8:-1]) + + assert_slices_equivalent(SLC['2015-02-01':'2014-10-01':-1], + SLC[13:8:-1]) + assert_slices_equivalent(SLC[Timestamp('2015-02-01'):Timestamp( + '2014-10-01'):-1], SLC[13:8:-1]) + assert_slices_equivalent(SLC['2015-02-01':Timestamp('2014-10-01'):-1], + SLC[13:8:-1]) + assert_slices_equivalent(SLC[Timestamp('2015-02-01'):'2014-10-01':-1], + SLC[13:8:-1]) + + assert_slices_equivalent(SLC['2014-10-01':'2015-02-01':-1], SLC[:0]) + + def test_slice_with_zero_step_raises(self): + ts = Series(np.arange(20), + date_range('2014-01-01', periods=20, freq='MS')) + tm.assert_raises_regex(ValueError, 'slice step cannot be zero', + lambda: ts[::0]) + tm.assert_raises_regex(ValueError, 'slice step cannot be zero', + lambda: ts.loc[::0]) + tm.assert_raises_regex(ValueError, 'slice step cannot be zero', + lambda: ts.loc[::0]) + + def test_slice_bounds_empty(self): + # GH 14354 + empty_idx = DatetimeIndex(freq='1H', periods=0, end='2015') + + right = empty_idx._maybe_cast_slice_bound('2015-01-02', 'right', 'loc') + exp = Timestamp('2015-01-02 23:59:59.999999999') + assert right == exp + + left = empty_idx._maybe_cast_slice_bound('2015-01-02', 'left', 'loc') + exp = Timestamp('2015-01-02 00:00:00') + assert left == exp + + def test_slice_duplicate_monotonic(self): + # https://github.com/pandas-dev/pandas/issues/16515 + idx = pd.DatetimeIndex(['2017', '2017']) + result = 
idx._maybe_cast_slice_bound('2017-01-01', 'left', 'loc') + expected = Timestamp('2017-01-01') + assert result == expected diff --git a/pandas/tests/indexes/datetimes/test_datetimelike.py b/pandas/tests/indexes/datetimes/test_datetimelike.py new file mode 100644 index 0000000000000..3b970ee382521 --- /dev/null +++ b/pandas/tests/indexes/datetimes/test_datetimelike.py @@ -0,0 +1,76 @@ +""" generic tests from the Datetimelike class """ + +import numpy as np +import pandas as pd +from pandas.util import testing as tm +from pandas import Series, Index, DatetimeIndex, date_range + +from ..datetimelike import DatetimeLike + + +class TestDatetimeIndex(DatetimeLike): + _holder = DatetimeIndex + + def setup_method(self, method): + self.indices = dict(index=tm.makeDateIndex(10)) + self.setup_indices() + + def create_index(self): + return date_range('20130101', periods=5) + + def test_shift(self): + + # test shift for datetimeIndex and non datetimeIndex + # GH8083 + + drange = self.create_index() + result = drange.shift(1) + expected = DatetimeIndex(['2013-01-02', '2013-01-03', '2013-01-04', + '2013-01-05', + '2013-01-06'], freq='D') + tm.assert_index_equal(result, expected) + + result = drange.shift(-1) + expected = DatetimeIndex(['2012-12-31', '2013-01-01', '2013-01-02', + '2013-01-03', '2013-01-04'], + freq='D') + tm.assert_index_equal(result, expected) + + result = drange.shift(3, freq='2D') + expected = DatetimeIndex(['2013-01-07', '2013-01-08', '2013-01-09', + '2013-01-10', + '2013-01-11'], freq='D') + tm.assert_index_equal(result, expected) + + def test_pickle_compat_construction(self): + pass + + def test_intersection(self): + first = self.index + second = self.index[5:] + intersect = first.intersection(second) + assert tm.equalContents(intersect, second) + + # GH 10149 + cases = [klass(second.values) for klass in [np.array, Series, list]] + for case in cases: + result = first.intersection(case) + assert tm.equalContents(result, second) + + third = Index(['a', 'b', 'c']) + result = first.intersection(third) + expected = pd.Index([], dtype=object) + tm.assert_index_equal(result, expected) + + def test_union(self): + first = self.index[:5] + second = self.index[5:] + everything = self.index + union = first.union(second) + assert tm.equalContents(union, everything) + + # GH 10149 + cases = [klass(second.values) for klass in [np.array, Series, list]] + for case in cases: + result = first.union(case) + assert tm.equalContents(result, everything) diff --git a/pandas/tests/indexes/datetimes/test_formats.py b/pandas/tests/indexes/datetimes/test_formats.py new file mode 100644 index 0000000000000..ea2731f66f0ef --- /dev/null +++ b/pandas/tests/indexes/datetimes/test_formats.py @@ -0,0 +1,47 @@ +from pandas import DatetimeIndex + +import numpy as np + +import pandas.util.testing as tm +import pandas as pd + + +def test_to_native_types(): + index = DatetimeIndex(freq='1D', periods=3, start='2017-01-01') + + # First, with no arguments. 
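+    # to_native_types renders the index as an object ndarray of ISO-format
+    # strings; illustrative:
+    # >>> DatetimeIndex(['2017-01-01']).to_native_types()
+    # array(['2017-01-01'], dtype=object)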
+ expected = np.array(['2017-01-01', '2017-01-02', + '2017-01-03'], dtype=object) + + result = index.to_native_types() + tm.assert_numpy_array_equal(result, expected) + + # No NaN values, so na_rep has no effect + result = index.to_native_types(na_rep='pandas') + tm.assert_numpy_array_equal(result, expected) + + # Make sure slicing works + expected = np.array(['2017-01-01', '2017-01-03'], dtype=object) + + result = index.to_native_types([0, 2]) + tm.assert_numpy_array_equal(result, expected) + + # Make sure date formatting works + expected = np.array(['01-2017-01', '01-2017-02', + '01-2017-03'], dtype=object) + + result = index.to_native_types(date_format='%m-%Y-%d') + tm.assert_numpy_array_equal(result, expected) + + # NULL object handling should work + index = DatetimeIndex(['2017-01-01', pd.NaT, '2017-01-03']) + expected = np.array(['2017-01-01', 'NaT', '2017-01-03'], dtype=object) + + result = index.to_native_types() + tm.assert_numpy_array_equal(result, expected) + + expected = np.array(['2017-01-01', 'pandas', + '2017-01-03'], dtype=object) + + result = index.to_native_types(na_rep='pandas') + tm.assert_numpy_array_equal(result, expected) diff --git a/pandas/tests/indexes/datetimes/test_indexing.py b/pandas/tests/indexes/datetimes/test_indexing.py new file mode 100644 index 0000000000000..9416b08f9654a --- /dev/null +++ b/pandas/tests/indexes/datetimes/test_indexing.py @@ -0,0 +1,241 @@ +import pytest + +import pytz +import numpy as np +import pandas as pd +import pandas.util.testing as tm +import pandas.compat as compat +from pandas import notna, Index, DatetimeIndex, datetime, date_range + + +class TestDatetimeIndex(object): + + def test_where_other(self): + + # other is ndarray or Index + i = pd.date_range('20130101', periods=3, tz='US/Eastern') + + for arr in [np.nan, pd.NaT]: + result = i.where(notna(i), other=np.nan) + expected = i + tm.assert_index_equal(result, expected) + + i2 = i.copy() + i2 = Index([pd.NaT, pd.NaT] + i[2:].tolist()) + result = i.where(notna(i2), i2) + tm.assert_index_equal(result, i2) + + i2 = i.copy() + i2 = Index([pd.NaT, pd.NaT] + i[2:].tolist()) + result = i.where(notna(i2), i2.values) + tm.assert_index_equal(result, i2) + + def test_where_tz(self): + i = pd.date_range('20130101', periods=3, tz='US/Eastern') + result = i.where(notna(i)) + expected = i + tm.assert_index_equal(result, expected) + + i2 = i.copy() + i2 = Index([pd.NaT, pd.NaT] + i[2:].tolist()) + result = i.where(notna(i2)) + expected = i2 + tm.assert_index_equal(result, expected) + + def test_insert(self): + idx = DatetimeIndex( + ['2000-01-04', '2000-01-01', '2000-01-02'], name='idx') + + result = idx.insert(2, datetime(2000, 1, 5)) + exp = DatetimeIndex(['2000-01-04', '2000-01-01', '2000-01-05', + '2000-01-02'], name='idx') + tm.assert_index_equal(result, exp) + + # insertion of non-datetime should coerce to object index + result = idx.insert(1, 'inserted') + expected = Index([datetime(2000, 1, 4), 'inserted', + datetime(2000, 1, 1), + datetime(2000, 1, 2)], name='idx') + assert not isinstance(result, DatetimeIndex) + tm.assert_index_equal(result, expected) + assert result.name == expected.name + + idx = date_range('1/1/2000', periods=3, freq='M', name='idx') + + # preserve freq + expected_0 = DatetimeIndex(['1999-12-31', '2000-01-31', '2000-02-29', + '2000-03-31'], name='idx', freq='M') + expected_3 = DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', + '2000-04-30'], name='idx', freq='M') + + # reset freq to None + expected_1_nofreq = DatetimeIndex(['2000-01-31', '2000-01-31', + 
'2000-02-29',
+                                           '2000-03-31'], name='idx',
+                                          freq=None)
+        expected_3_nofreq = DatetimeIndex(['2000-01-31', '2000-02-29',
+                                           '2000-03-31',
+                                           '2000-01-02'], name='idx',
+                                          freq=None)
+
+        cases = [(0, datetime(1999, 12, 31), expected_0),
+                 (-3, datetime(1999, 12, 31), expected_0),
+                 (3, datetime(2000, 4, 30), expected_3),
+                 (1, datetime(2000, 1, 31), expected_1_nofreq),
+                 (3, datetime(2000, 1, 2), expected_3_nofreq)]
+
+        for n, d, expected in cases:
+            result = idx.insert(n, d)
+            tm.assert_index_equal(result, expected)
+            assert result.name == expected.name
+            assert result.freq == expected.freq
+
+        # reset freq to None
+        result = idx.insert(3, datetime(2000, 1, 2))
+        expected = DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31',
+                                  '2000-01-02'], name='idx', freq=None)
+        tm.assert_index_equal(result, expected)
+        assert result.name == expected.name
+        assert result.freq is None
+
+        # see gh-7299
+        idx = date_range('1/1/2000', periods=3, freq='D', tz='Asia/Tokyo',
+                         name='idx')
+        with pytest.raises(ValueError):
+            idx.insert(3, pd.Timestamp('2000-01-04'))
+        with pytest.raises(ValueError):
+            idx.insert(3, datetime(2000, 1, 4))
+        with pytest.raises(ValueError):
+            idx.insert(3, pd.Timestamp('2000-01-04', tz='US/Eastern'))
+        with pytest.raises(ValueError):
+            idx.insert(3, datetime(2000, 1, 4,
+                                   tzinfo=pytz.timezone('US/Eastern')))
+
+        for tz in ['US/Pacific', 'Asia/Singapore']:
+            idx = date_range('1/1/2000 09:00', periods=6, freq='H', tz=tz,
+                             name='idx')
+            # preserve freq
+            expected = date_range('1/1/2000 09:00', periods=7, freq='H',
+                                  tz=tz, name='idx')
+            for d in [pd.Timestamp('2000-01-01 15:00', tz=tz),
+                      pytz.timezone(tz).localize(datetime(2000, 1, 1, 15))]:
+
+                result = idx.insert(6, d)
+                tm.assert_index_equal(result, expected)
+                assert result.name == expected.name
+                assert result.freq == expected.freq
+                assert result.tz == expected.tz
+
+            expected = DatetimeIndex(['2000-01-01 09:00', '2000-01-01 10:00',
+                                      '2000-01-01 11:00',
+                                      '2000-01-01 12:00', '2000-01-01 13:00',
+                                      '2000-01-01 14:00',
+                                      '2000-01-01 10:00'], name='idx',
+                                     tz=tz, freq=None)
+            # reset freq to None
+            for d in [pd.Timestamp('2000-01-01 10:00', tz=tz),
+                      pytz.timezone(tz).localize(datetime(2000, 1, 1, 10))]:
+                result = idx.insert(6, d)
+                tm.assert_index_equal(result, expected)
+                assert result.name == expected.name
+                assert result.tz == expected.tz
+                assert result.freq is None
+
+    def test_delete(self):
+        idx = date_range(start='2000-01-01', periods=5, freq='M', name='idx')
+
+        # preserve freq
+        expected_0 = date_range(start='2000-02-01', periods=4, freq='M',
+                                name='idx')
+        expected_4 = date_range(start='2000-01-01', periods=4, freq='M',
+                                name='idx')
+
+        # reset freq to None
+        expected_1 = DatetimeIndex(['2000-01-31', '2000-03-31', '2000-04-30',
+                                    '2000-05-31'], freq=None, name='idx')
+
+        cases = {0: expected_0,
+                 -5: expected_0,
+                 -1: expected_4,
+                 4: expected_4,
+                 1: expected_1}
+        for n, expected in compat.iteritems(cases):
+            result = idx.delete(n)
+            tm.assert_index_equal(result, expected)
+            assert result.name == expected.name
+            assert result.freq == expected.freq
+
+        with pytest.raises((IndexError, ValueError)):
+            # either, depending on numpy version
+            result = idx.delete(5)
+
+        for tz in [None, 'Asia/Tokyo', 'US/Pacific']:
+            idx = date_range(start='2000-01-01 09:00', periods=10, freq='H',
+                             name='idx', tz=tz)
+
+            expected = date_range(start='2000-01-01 10:00', periods=9,
+                                  freq='H', name='idx', tz=tz)
+            result = idx.delete(0)
+            tm.assert_index_equal(result, expected)
+            assert result.name == expected.name
+            assert result.freqstr == 'H'
+            assert result.tz == expected.tz
+
+            expected = date_range(start='2000-01-01 09:00', periods=9,
+                                  freq='H', name='idx', tz=tz)
+            result = idx.delete(-1)
+            tm.assert_index_equal(result, expected)
+            assert result.name == expected.name
+            assert result.freqstr == 'H'
+            assert result.tz == expected.tz
+
+    def test_delete_slice(self):
+        idx = date_range(start='2000-01-01', periods=10, freq='D', name='idx')
+
+        # preserve freq
+        expected_0_2 = date_range(start='2000-01-04', periods=7, freq='D',
+                                  name='idx')
+        expected_7_9 = date_range(start='2000-01-01', periods=7, freq='D',
+                                  name='idx')
+
+        # reset freq to None
+        expected_3_5 = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03',
+                                      '2000-01-07', '2000-01-08', '2000-01-09',
+                                      '2000-01-10'], freq=None, name='idx')
+
+        cases = {(0, 1, 2): expected_0_2,
+                 (7, 8, 9): expected_7_9,
+                 (3, 4, 5): expected_3_5}
+        for n, expected in compat.iteritems(cases):
+            result = idx.delete(n)
+            tm.assert_index_equal(result, expected)
+            assert result.name == expected.name
+            assert result.freq == expected.freq
+
+            result = idx.delete(slice(n[0], n[-1] + 1))
+            tm.assert_index_equal(result, expected)
+            assert result.name == expected.name
+            assert result.freq == expected.freq
+
+        for tz in [None, 'Asia/Tokyo', 'US/Pacific']:
+            ts = pd.Series(1, index=pd.date_range(
+                '2000-01-01 09:00', periods=10, freq='H', name='idx', tz=tz))
+            # preserve freq
+            result = ts.drop(ts.index[:5]).index
+            expected = pd.date_range('2000-01-01 14:00', periods=5, freq='H',
+                                     name='idx', tz=tz)
+            tm.assert_index_equal(result, expected)
+            assert result.name == expected.name
+            assert result.freq == expected.freq
+            assert result.tz == expected.tz
+
+            # reset freq to None
+            result = ts.drop(ts.index[[1, 3, 5, 7, 9]]).index
+            expected = DatetimeIndex(['2000-01-01 09:00', '2000-01-01 11:00',
+                                      '2000-01-01 13:00',
+                                      '2000-01-01 15:00', '2000-01-01 17:00'],
+                                     freq=None, name='idx', tz=tz)
+            tm.assert_index_equal(result, expected)
+            assert result.name == expected.name
+            assert result.freq == expected.freq
+            assert result.tz == expected.tz
diff --git a/pandas/tests/indexes/datetimes/test_misc.py b/pandas/tests/indexes/datetimes/test_misc.py
new file mode 100644
index 0000000000000..951aa2c520d0f
--- /dev/null
+++ b/pandas/tests/indexes/datetimes/test_misc.py
@@ -0,0 +1,349 @@
+import pytest
+
+import numpy as np
+import pandas as pd
+import pandas.util.testing as tm
+from pandas import (Index, DatetimeIndex, datetime, offsets,
+                    Float64Index, date_range, Timestamp)
+
+
+class TestDateTimeIndexToJulianDate(object):
+
+    def test_1700(self):
+        r1 = Float64Index([2345897.5, 2345898.5, 2345899.5, 2345900.5,
+                           2345901.5])
+        r2 = date_range(start=Timestamp('1710-10-01'), periods=5,
+                        freq='D').to_julian_date()
+        assert isinstance(r2, Float64Index)
+        tm.assert_index_equal(r1, r2)
+
+    def test_2000(self):
+        r1 = Float64Index([2451601.5, 2451602.5, 2451603.5, 2451604.5,
+                           2451605.5])
+        r2 = date_range(start=Timestamp('2000-02-27'), periods=5,
+                        freq='D').to_julian_date()
+        assert isinstance(r2, Float64Index)
+        tm.assert_index_equal(r1, r2)
+
+    def test_hour(self):
+        r1 = Float64Index(
+            [2451601.5, 2451601.5416666666666666, 2451601.5833333333333333,
+             2451601.625, 2451601.6666666666666666])
+        r2 = date_range(start=Timestamp('2000-02-27'), periods=5,
+                        freq='H').to_julian_date()
+        assert isinstance(r2, Float64Index)
+        tm.assert_index_equal(r1, r2)
+
+    def test_minute(self):
+        r1 = Float64Index(
+            [2451601.5, 2451601.5006944444444444, 2451601.5013888888888888,
+             2451601.5020833333333333,
2451601.5027777777777777]) + r2 = date_range(start=Timestamp('2000-02-27'), periods=5, + freq='T').to_julian_date() + assert isinstance(r2, Float64Index) + tm.assert_index_equal(r1, r2) + + def test_second(self): + r1 = Float64Index( + [2451601.5, 2451601.500011574074074, 2451601.5000231481481481, + 2451601.5000347222222222, 2451601.5000462962962962]) + r2 = date_range(start=Timestamp('2000-02-27'), periods=5, + freq='S').to_julian_date() + assert isinstance(r2, Float64Index) + tm.assert_index_equal(r1, r2) + + +class TestTimeSeries(object): + + def test_pass_datetimeindex_to_index(self): + # Bugs in #1396 + rng = date_range('1/1/2000', '3/1/2000') + idx = Index(rng, dtype=object) + + expected = Index(rng.to_pydatetime(), dtype=object) + + tm.assert_numpy_array_equal(idx.values, expected.values) + + def test_range_edges(self): + # GH 13672 + idx = DatetimeIndex(start=Timestamp('1970-01-01 00:00:00.000000001'), + end=Timestamp('1970-01-01 00:00:00.000000004'), + freq='N') + exp = DatetimeIndex(['1970-01-01 00:00:00.000000001', + '1970-01-01 00:00:00.000000002', + '1970-01-01 00:00:00.000000003', + '1970-01-01 00:00:00.000000004']) + tm.assert_index_equal(idx, exp) + + idx = DatetimeIndex(start=Timestamp('1970-01-01 00:00:00.000000004'), + end=Timestamp('1970-01-01 00:00:00.000000001'), + freq='N') + exp = DatetimeIndex([]) + tm.assert_index_equal(idx, exp) + + idx = DatetimeIndex(start=Timestamp('1970-01-01 00:00:00.000000001'), + end=Timestamp('1970-01-01 00:00:00.000000001'), + freq='N') + exp = DatetimeIndex(['1970-01-01 00:00:00.000000001']) + tm.assert_index_equal(idx, exp) + + idx = DatetimeIndex(start=Timestamp('1970-01-01 00:00:00.000001'), + end=Timestamp('1970-01-01 00:00:00.000004'), + freq='U') + exp = DatetimeIndex(['1970-01-01 00:00:00.000001', + '1970-01-01 00:00:00.000002', + '1970-01-01 00:00:00.000003', + '1970-01-01 00:00:00.000004']) + tm.assert_index_equal(idx, exp) + + idx = DatetimeIndex(start=Timestamp('1970-01-01 00:00:00.001'), + end=Timestamp('1970-01-01 00:00:00.004'), + freq='L') + exp = DatetimeIndex(['1970-01-01 00:00:00.001', + '1970-01-01 00:00:00.002', + '1970-01-01 00:00:00.003', + '1970-01-01 00:00:00.004']) + tm.assert_index_equal(idx, exp) + + idx = DatetimeIndex(start=Timestamp('1970-01-01 00:00:01'), + end=Timestamp('1970-01-01 00:00:04'), freq='S') + exp = DatetimeIndex(['1970-01-01 00:00:01', '1970-01-01 00:00:02', + '1970-01-01 00:00:03', '1970-01-01 00:00:04']) + tm.assert_index_equal(idx, exp) + + idx = DatetimeIndex(start=Timestamp('1970-01-01 00:01'), + end=Timestamp('1970-01-01 00:04'), freq='T') + exp = DatetimeIndex(['1970-01-01 00:01', '1970-01-01 00:02', + '1970-01-01 00:03', '1970-01-01 00:04']) + tm.assert_index_equal(idx, exp) + + idx = DatetimeIndex(start=Timestamp('1970-01-01 01:00'), + end=Timestamp('1970-01-01 04:00'), freq='H') + exp = DatetimeIndex(['1970-01-01 01:00', '1970-01-01 02:00', + '1970-01-01 03:00', '1970-01-01 04:00']) + tm.assert_index_equal(idx, exp) + + idx = DatetimeIndex(start=Timestamp('1970-01-01'), + end=Timestamp('1970-01-04'), freq='D') + exp = DatetimeIndex(['1970-01-01', '1970-01-02', + '1970-01-03', '1970-01-04']) + tm.assert_index_equal(idx, exp) + + def test_datetimeindex_integers_shift(self): + rng = date_range('1/1/2000', periods=20) + + result = rng + 5 + expected = rng.shift(5) + tm.assert_index_equal(result, expected) + + result = rng - 5 + expected = rng.shift(-5) + tm.assert_index_equal(result, expected) + + def test_datetimeindex_repr_short(self): + dr = date_range(start='1/1/2012', periods=1) + 
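+        # smoke tests: repr of short ranges should not raise; a one-element
+        # range renders roughly as
+        # DatetimeIndex(['2012-01-01'], dtype='datetime64[ns]', freq='D')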
repr(dr) + + dr = date_range(start='1/1/2012', periods=2) + repr(dr) + + dr = date_range(start='1/1/2012', periods=3) + repr(dr) + + def test_normalize(self): + rng = date_range('1/1/2000 9:30', periods=10, freq='D') + + result = rng.normalize() + expected = date_range('1/1/2000', periods=10, freq='D') + tm.assert_index_equal(result, expected) + + rng_ns = pd.DatetimeIndex(np.array([1380585623454345752, + 1380585612343234312]).astype( + "datetime64[ns]")) + rng_ns_normalized = rng_ns.normalize() + expected = pd.DatetimeIndex(np.array([1380585600000000000, + 1380585600000000000]).astype( + "datetime64[ns]")) + tm.assert_index_equal(rng_ns_normalized, expected) + + assert result.is_normalized + assert not rng.is_normalized + + +class TestDatetime64(object): + + def test_datetimeindex_accessors(self): + + dti_naive = DatetimeIndex(freq='D', start=datetime(1998, 1, 1), + periods=365) + # GH 13303 + dti_tz = DatetimeIndex(freq='D', start=datetime(1998, 1, 1), + periods=365, tz='US/Eastern') + for dti in [dti_naive, dti_tz]: + + assert dti.year[0] == 1998 + assert dti.month[0] == 1 + assert dti.day[0] == 1 + assert dti.hour[0] == 0 + assert dti.minute[0] == 0 + assert dti.second[0] == 0 + assert dti.microsecond[0] == 0 + assert dti.dayofweek[0] == 3 + + assert dti.dayofyear[0] == 1 + assert dti.dayofyear[120] == 121 + + assert dti.weekofyear[0] == 1 + assert dti.weekofyear[120] == 18 + + assert dti.quarter[0] == 1 + assert dti.quarter[120] == 2 + + assert dti.days_in_month[0] == 31 + assert dti.days_in_month[90] == 30 + + assert dti.is_month_start[0] + assert not dti.is_month_start[1] + assert dti.is_month_start[31] + assert dti.is_quarter_start[0] + assert dti.is_quarter_start[90] + assert dti.is_year_start[0] + assert not dti.is_year_start[364] + assert not dti.is_month_end[0] + assert dti.is_month_end[30] + assert not dti.is_month_end[31] + assert dti.is_month_end[364] + assert not dti.is_quarter_end[0] + assert not dti.is_quarter_end[30] + assert dti.is_quarter_end[89] + assert dti.is_quarter_end[364] + assert not dti.is_year_end[0] + assert dti.is_year_end[364] + + # GH 11128 + assert dti.weekday_name[4] == u'Monday' + assert dti.weekday_name[5] == u'Tuesday' + assert dti.weekday_name[6] == u'Wednesday' + assert dti.weekday_name[7] == u'Thursday' + assert dti.weekday_name[8] == u'Friday' + assert dti.weekday_name[9] == u'Saturday' + assert dti.weekday_name[10] == u'Sunday' + + assert Timestamp('2016-04-04').weekday_name == u'Monday' + assert Timestamp('2016-04-05').weekday_name == u'Tuesday' + assert Timestamp('2016-04-06').weekday_name == u'Wednesday' + assert Timestamp('2016-04-07').weekday_name == u'Thursday' + assert Timestamp('2016-04-08').weekday_name == u'Friday' + assert Timestamp('2016-04-09').weekday_name == u'Saturday' + assert Timestamp('2016-04-10').weekday_name == u'Sunday' + + assert len(dti.year) == 365 + assert len(dti.month) == 365 + assert len(dti.day) == 365 + assert len(dti.hour) == 365 + assert len(dti.minute) == 365 + assert len(dti.second) == 365 + assert len(dti.microsecond) == 365 + assert len(dti.dayofweek) == 365 + assert len(dti.dayofyear) == 365 + assert len(dti.weekofyear) == 365 + assert len(dti.quarter) == 365 + assert len(dti.is_month_start) == 365 + assert len(dti.is_month_end) == 365 + assert len(dti.is_quarter_start) == 365 + assert len(dti.is_quarter_end) == 365 + assert len(dti.is_year_start) == 365 + assert len(dti.is_year_end) == 365 + assert len(dti.weekday_name) == 365 + + dti.name = 'name' + + # non boolean accessors -> return Index + for accessor 
in DatetimeIndex._field_ops: + res = getattr(dti, accessor) + assert len(res) == 365 + assert isinstance(res, Index) + assert res.name == 'name' + + # boolean accessors -> return array + for accessor in DatetimeIndex._bool_ops: + res = getattr(dti, accessor) + assert len(res) == 365 + assert isinstance(res, np.ndarray) + + # test boolean indexing + res = dti[dti.is_quarter_start] + exp = dti[[0, 90, 181, 273]] + tm.assert_index_equal(res, exp) + res = dti[dti.is_leap_year] + exp = DatetimeIndex([], freq='D', tz=dti.tz, name='name') + tm.assert_index_equal(res, exp) + + dti = DatetimeIndex(freq='BQ-FEB', start=datetime(1998, 1, 1), + periods=4) + + assert sum(dti.is_quarter_start) == 0 + assert sum(dti.is_quarter_end) == 4 + assert sum(dti.is_year_start) == 0 + assert sum(dti.is_year_end) == 1 + + # Ensure is_start/end accessors throw ValueError for CustomBusinessDay, + # CBD requires np >= 1.7 + bday_egypt = offsets.CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu') + dti = date_range(datetime(2013, 4, 30), periods=5, freq=bday_egypt) + pytest.raises(ValueError, lambda: dti.is_month_start) + + dti = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03']) + + assert dti.is_month_start[0] == 1 + + tests = [ + (Timestamp('2013-06-01', freq='M').is_month_start, 1), + (Timestamp('2013-06-01', freq='BM').is_month_start, 0), + (Timestamp('2013-06-03', freq='M').is_month_start, 0), + (Timestamp('2013-06-03', freq='BM').is_month_start, 1), + (Timestamp('2013-02-28', freq='Q-FEB').is_month_end, 1), + (Timestamp('2013-02-28', freq='Q-FEB').is_quarter_end, 1), + (Timestamp('2013-02-28', freq='Q-FEB').is_year_end, 1), + (Timestamp('2013-03-01', freq='Q-FEB').is_month_start, 1), + (Timestamp('2013-03-01', freq='Q-FEB').is_quarter_start, 1), + (Timestamp('2013-03-01', freq='Q-FEB').is_year_start, 1), + (Timestamp('2013-03-31', freq='QS-FEB').is_month_end, 1), + (Timestamp('2013-03-31', freq='QS-FEB').is_quarter_end, 0), + (Timestamp('2013-03-31', freq='QS-FEB').is_year_end, 0), + (Timestamp('2013-02-01', freq='QS-FEB').is_month_start, 1), + (Timestamp('2013-02-01', freq='QS-FEB').is_quarter_start, 1), + (Timestamp('2013-02-01', freq='QS-FEB').is_year_start, 1), + (Timestamp('2013-06-30', freq='BQ').is_month_end, 0), + (Timestamp('2013-06-30', freq='BQ').is_quarter_end, 0), + (Timestamp('2013-06-30', freq='BQ').is_year_end, 0), + (Timestamp('2013-06-28', freq='BQ').is_month_end, 1), + (Timestamp('2013-06-28', freq='BQ').is_quarter_end, 1), + (Timestamp('2013-06-28', freq='BQ').is_year_end, 0), + (Timestamp('2013-06-30', freq='BQS-APR').is_month_end, 0), + (Timestamp('2013-06-30', freq='BQS-APR').is_quarter_end, 0), + (Timestamp('2013-06-30', freq='BQS-APR').is_year_end, 0), + (Timestamp('2013-06-28', freq='BQS-APR').is_month_end, 1), + (Timestamp('2013-06-28', freq='BQS-APR').is_quarter_end, 1), + (Timestamp('2013-03-29', freq='BQS-APR').is_year_end, 1), + (Timestamp('2013-11-01', freq='AS-NOV').is_year_start, 1), + (Timestamp('2013-10-31', freq='AS-NOV').is_year_end, 1), + (Timestamp('2012-02-01').days_in_month, 29), + (Timestamp('2013-02-01').days_in_month, 28)] + + for ts, value in tests: + assert ts == value + + # GH 6538: Check that DatetimeIndex and its TimeStamp elements + # return the same weekofyear accessor close to new year w/ tz + dates = ["2013/12/29", "2013/12/30", "2013/12/31"] + dates = DatetimeIndex(dates, tz="Europe/Brussels") + expected = [52, 1, 1] + assert dates.weekofyear.tolist() == expected + assert [d.weekofyear for d in dates] == expected + + def test_nanosecond_field(self): + 
dti = DatetimeIndex(np.arange(10)) + + tm.assert_index_equal(dti.nanosecond, + pd.Index(np.arange(10, dtype=np.int64))) diff --git a/pandas/tests/indexes/datetimes/test_missing.py b/pandas/tests/indexes/datetimes/test_missing.py new file mode 100644 index 0000000000000..adc0b7b3d81e8 --- /dev/null +++ b/pandas/tests/indexes/datetimes/test_missing.py @@ -0,0 +1,50 @@ +import pandas as pd +import pandas.util.testing as tm + + +class TestDatetimeIndex(object): + + def test_fillna_datetime64(self): + # GH 11343 + for tz in ['US/Eastern', 'Asia/Tokyo']: + idx = pd.DatetimeIndex(['2011-01-01 09:00', pd.NaT, + '2011-01-01 11:00']) + + exp = pd.DatetimeIndex(['2011-01-01 09:00', '2011-01-01 10:00', + '2011-01-01 11:00']) + tm.assert_index_equal( + idx.fillna(pd.Timestamp('2011-01-01 10:00')), exp) + + # tz mismatch + exp = pd.Index([pd.Timestamp('2011-01-01 09:00'), + pd.Timestamp('2011-01-01 10:00', tz=tz), + pd.Timestamp('2011-01-01 11:00')], dtype=object) + tm.assert_index_equal( + idx.fillna(pd.Timestamp('2011-01-01 10:00', tz=tz)), exp) + + # object + exp = pd.Index([pd.Timestamp('2011-01-01 09:00'), 'x', + pd.Timestamp('2011-01-01 11:00')], dtype=object) + tm.assert_index_equal(idx.fillna('x'), exp) + + idx = pd.DatetimeIndex(['2011-01-01 09:00', pd.NaT, + '2011-01-01 11:00'], tz=tz) + + exp = pd.DatetimeIndex(['2011-01-01 09:00', '2011-01-01 10:00', + '2011-01-01 11:00'], tz=tz) + tm.assert_index_equal( + idx.fillna(pd.Timestamp('2011-01-01 10:00', tz=tz)), exp) + + exp = pd.Index([pd.Timestamp('2011-01-01 09:00', tz=tz), + pd.Timestamp('2011-01-01 10:00'), + pd.Timestamp('2011-01-01 11:00', tz=tz)], + dtype=object) + tm.assert_index_equal( + idx.fillna(pd.Timestamp('2011-01-01 10:00')), exp) + + # object + exp = pd.Index([pd.Timestamp('2011-01-01 09:00', tz=tz), + 'x', + pd.Timestamp('2011-01-01 11:00', tz=tz)], + dtype=object) + tm.assert_index_equal(idx.fillna('x'), exp) diff --git a/pandas/tests/indexes/datetimes/test_ops.py b/pandas/tests/indexes/datetimes/test_ops.py new file mode 100644 index 0000000000000..86e65feec04f3 --- /dev/null +++ b/pandas/tests/indexes/datetimes/test_ops.py @@ -0,0 +1,1286 @@ +import pytz +import pytest +import dateutil +import warnings +import numpy as np +from datetime import timedelta + +from itertools import product +import pandas as pd +import pandas._libs.tslib as tslib +import pandas.util.testing as tm +from pandas.errors import PerformanceWarning +from pandas.core.indexes.datetimes import cdate_range +from pandas import (DatetimeIndex, PeriodIndex, Series, Timestamp, Timedelta, + date_range, TimedeltaIndex, _np_version_under1p10, Index, + datetime, Float64Index, offsets, bdate_range) +from pandas.tseries.offsets import BMonthEnd, CDay, BDay +from pandas.tests.test_base import Ops + + +START, END = datetime(2009, 1, 1), datetime(2010, 1, 1) + + +class TestDatetimeIndexOps(Ops): + tz = [None, 'UTC', 'Asia/Tokyo', 'US/Eastern', 'dateutil/Asia/Singapore', + 'dateutil/US/Pacific'] + + def setup_method(self, method): + super(TestDatetimeIndexOps, self).setup_method(method) + mask = lambda x: (isinstance(x, DatetimeIndex) or + isinstance(x, PeriodIndex)) + self.is_valid_objs = [o for o in self.objs if mask(o)] + self.not_valid_objs = [o for o in self.objs if not mask(o)] + + def test_ops_properties(self): + f = lambda x: isinstance(x, DatetimeIndex) + self.check_ops_properties(DatetimeIndex._field_ops, f) + self.check_ops_properties(DatetimeIndex._object_ops, f) + self.check_ops_properties(DatetimeIndex._bool_ops, f) + + def test_ops_properties_basic(self): 
+ + # sanity check that the behavior didn't change + # GH7206 + for op in ['year', 'day', 'second', 'weekday']: + # use a zero-argument callable so the attribute access actually runs; + # plain access falls through Series.__getattr__ and raises + # AttributeError (the .dt accessor is the supported route) + pytest.raises(AttributeError, lambda: getattr(self.dt_series, op)) + + # attribute access should still work! + s = Series(dict(year=2000, month=1, day=10)) + assert s.year == 2000 + assert s.month == 1 + assert s.day == 10 + pytest.raises(AttributeError, lambda: s.weekday) + + def test_asobject_tolist(self): + idx = pd.date_range(start='2013-01-01', periods=4, freq='M', + name='idx') + expected_list = [Timestamp('2013-01-31'), + Timestamp('2013-02-28'), + Timestamp('2013-03-31'), + Timestamp('2013-04-30')] + expected = pd.Index(expected_list, dtype=object, name='idx') + result = idx.asobject + assert isinstance(result, Index) + + assert result.dtype == object + tm.assert_index_equal(result, expected) + assert result.name == expected.name + assert idx.tolist() == expected_list + + idx = pd.date_range(start='2013-01-01', periods=4, freq='M', + name='idx', tz='Asia/Tokyo') + expected_list = [Timestamp('2013-01-31', tz='Asia/Tokyo'), + Timestamp('2013-02-28', tz='Asia/Tokyo'), + Timestamp('2013-03-31', tz='Asia/Tokyo'), + Timestamp('2013-04-30', tz='Asia/Tokyo')] + expected = pd.Index(expected_list, dtype=object, name='idx') + result = idx.asobject + assert isinstance(result, Index) + assert result.dtype == object + tm.assert_index_equal(result, expected) + assert result.name == expected.name + assert idx.tolist() == expected_list + + idx = DatetimeIndex([datetime(2013, 1, 1), datetime(2013, 1, 2), + pd.NaT, datetime(2013, 1, 4)], name='idx') + expected_list = [Timestamp('2013-01-01'), + Timestamp('2013-01-02'), pd.NaT, + Timestamp('2013-01-04')] + expected = pd.Index(expected_list, dtype=object, name='idx') + result = idx.asobject + assert isinstance(result, Index) + assert result.dtype == object + tm.assert_index_equal(result, expected) + assert result.name == expected.name + assert idx.tolist() == expected_list + + def test_minmax(self): + for tz in self.tz: + # monotonic + idx1 = pd.DatetimeIndex(['2011-01-01', '2011-01-02', + '2011-01-03'], tz=tz) + assert idx1.is_monotonic + + # non-monotonic + idx2 = pd.DatetimeIndex(['2011-01-01', pd.NaT, '2011-01-03', + '2011-01-02', pd.NaT], tz=tz) + assert not idx2.is_monotonic + + for idx in [idx1, idx2]: + assert idx.min() == Timestamp('2011-01-01', tz=tz) + assert idx.max() == Timestamp('2011-01-03', tz=tz) + assert idx.argmin() == 0 + assert idx.argmax() == 2 + + for op in ['min', 'max']: + # Return NaT + obj = DatetimeIndex([]) + assert pd.isna(getattr(obj, op)()) + + obj = DatetimeIndex([pd.NaT]) + assert pd.isna(getattr(obj, op)()) + + obj = DatetimeIndex([pd.NaT, pd.NaT, pd.NaT]) + assert pd.isna(getattr(obj, op)()) + + def test_numpy_minmax(self): + dr = pd.date_range(start='2016-01-15', end='2016-01-20') + + assert np.min(dr) == Timestamp('2016-01-15 00:00:00', freq='D') + assert np.max(dr) == Timestamp('2016-01-20 00:00:00', freq='D') + + errmsg = "the 'out' parameter is not supported" + tm.assert_raises_regex(ValueError, errmsg, np.min, dr, out=0) + tm.assert_raises_regex(ValueError, errmsg, np.max, dr, out=0) + + assert np.argmin(dr) == 0 + assert np.argmax(dr) == 5 + + if not _np_version_under1p10: + errmsg = "the 'out' parameter is not supported" + tm.assert_raises_regex( + ValueError, errmsg, np.argmin, dr, out=0) + tm.assert_raises_regex( + ValueError, errmsg, np.argmax, dr, out=0) + + def test_round(self): + for tz in self.tz: + rng = pd.date_range(start='2016-01-01', periods=5, + freq='30Min', tz=tz) + elt = rng[1] + +
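+ # orientation for the expectation below: the '30Min' points sit exactly
+ # on half-hours, and rounding to 'H' resolves those ties to the even
+ # hour, so 00:30 rounds down to 00:00 while 01:30 rounds up to 02:00,
+ # exactly as expected_rng spells out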
expected_rng = DatetimeIndex([ + Timestamp('2016-01-01 00:00:00', tz=tz, freq='30T'), + Timestamp('2016-01-01 00:00:00', tz=tz, freq='30T'), + Timestamp('2016-01-01 01:00:00', tz=tz, freq='30T'), + Timestamp('2016-01-01 02:00:00', tz=tz, freq='30T'), + Timestamp('2016-01-01 02:00:00', tz=tz, freq='30T'), + ]) + expected_elt = expected_rng[1] + + tm.assert_index_equal(rng.round(freq='H'), expected_rng) + assert elt.round(freq='H') == expected_elt + + msg = pd.tseries.frequencies._INVALID_FREQ_ERROR + with tm.assert_raises_regex(ValueError, msg): + rng.round(freq='foo') + with tm.assert_raises_regex(ValueError, msg): + elt.round(freq='foo') + + msg = " is a non-fixed frequency" + tm.assert_raises_regex(ValueError, msg, rng.round, freq='M') + tm.assert_raises_regex(ValueError, msg, elt.round, freq='M') + + # GH 14440 & 15578 + index = pd.DatetimeIndex(['2016-10-17 12:00:00.0015'], tz=tz) + result = index.round('ms') + expected = pd.DatetimeIndex(['2016-10-17 12:00:00.002000'], tz=tz) + tm.assert_index_equal(result, expected) + + for freq in ['us', 'ns']: + tm.assert_index_equal(index, index.round(freq)) + + index = pd.DatetimeIndex(['2016-10-17 12:00:00.00149'], tz=tz) + result = index.round('ms') + expected = pd.DatetimeIndex(['2016-10-17 12:00:00.001000'], tz=tz) + tm.assert_index_equal(result, expected) + + index = pd.DatetimeIndex(['2016-10-17 12:00:00.001501031']) + result = index.round('10ns') + expected = pd.DatetimeIndex(['2016-10-17 12:00:00.001501030']) + tm.assert_index_equal(result, expected) + + with tm.assert_produces_warning(): + ts = '2016-10-17 12:00:00.001501031' + pd.DatetimeIndex([ts]).round('1010ns') + + def test_repeat_range(self): + rng = date_range('1/1/2000', '1/1/2001') + + result = rng.repeat(5) + assert result.freq is None + assert len(result) == 5 * len(rng) + + for tz in self.tz: + index = pd.date_range('2001-01-01', periods=2, freq='D', tz=tz) + exp = pd.DatetimeIndex(['2001-01-01', '2001-01-01', + '2001-01-02', '2001-01-02'], tz=tz) + for res in [index.repeat(2), np.repeat(index, 2)]: + tm.assert_index_equal(res, exp) + assert res.freq is None + + index = pd.date_range('2001-01-01', periods=2, freq='2D', tz=tz) + exp = pd.DatetimeIndex(['2001-01-01', '2001-01-01', + '2001-01-03', '2001-01-03'], tz=tz) + for res in [index.repeat(2), np.repeat(index, 2)]: + tm.assert_index_equal(res, exp) + assert res.freq is None + + index = pd.DatetimeIndex(['2001-01-01', 'NaT', '2003-01-01'], + tz=tz) + exp = pd.DatetimeIndex(['2001-01-01', '2001-01-01', '2001-01-01', + 'NaT', 'NaT', 'NaT', + '2003-01-01', '2003-01-01', '2003-01-01'], + tz=tz) + for res in [index.repeat(3), np.repeat(index, 3)]: + tm.assert_index_equal(res, exp) + assert res.freq is None + + def test_repeat(self): + reps = 2 + msg = "the 'axis' parameter is not supported" + + for tz in self.tz: + rng = pd.date_range(start='2016-01-01', periods=2, + freq='30Min', tz=tz) + + expected_rng = DatetimeIndex([ + Timestamp('2016-01-01 00:00:00', tz=tz, freq='30T'), + Timestamp('2016-01-01 00:00:00', tz=tz, freq='30T'), + Timestamp('2016-01-01 00:30:00', tz=tz, freq='30T'), + Timestamp('2016-01-01 00:30:00', tz=tz, freq='30T'), + ]) + + res = rng.repeat(reps) + tm.assert_index_equal(res, expected_rng) + assert res.freq is None + + tm.assert_index_equal(np.repeat(rng, reps), expected_rng) + tm.assert_raises_regex(ValueError, msg, np.repeat, + rng, reps, axis=1) + + def test_representation(self): + + idx = [] + idx.append(DatetimeIndex([], freq='D')) + idx.append(DatetimeIndex(['2011-01-01'], freq='D')) + 
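+ # each fixture appended here and below pairs positionally with one
+ # literal string in exp: the indexes built with an explicit freq render
+ # freq='D' or freq='H', while the NaT-bearing tz-aware ones render
+ # freq=None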
idx.append(DatetimeIndex(['2011-01-01', '2011-01-02'], freq='D')) + idx.append(DatetimeIndex( + ['2011-01-01', '2011-01-02', '2011-01-03'], freq='D')) + idx.append(DatetimeIndex( + ['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00' + ], freq='H', tz='Asia/Tokyo')) + idx.append(DatetimeIndex( + ['2011-01-01 09:00', '2011-01-01 10:00', pd.NaT], tz='US/Eastern')) + idx.append(DatetimeIndex( + ['2011-01-01 09:00', '2011-01-01 10:00', pd.NaT], tz='UTC')) + + exp = [] + exp.append("""DatetimeIndex([], dtype='datetime64[ns]', freq='D')""") + exp.append("DatetimeIndex(['2011-01-01'], dtype='datetime64[ns]', " + "freq='D')") + exp.append("DatetimeIndex(['2011-01-01', '2011-01-02'], " + "dtype='datetime64[ns]', freq='D')") + exp.append("DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], " + "dtype='datetime64[ns]', freq='D')") + exp.append("DatetimeIndex(['2011-01-01 09:00:00+09:00', " + "'2011-01-01 10:00:00+09:00', '2011-01-01 11:00:00+09:00']" + ", dtype='datetime64[ns, Asia/Tokyo]', freq='H')") + exp.append("DatetimeIndex(['2011-01-01 09:00:00-05:00', " + "'2011-01-01 10:00:00-05:00', 'NaT'], " + "dtype='datetime64[ns, US/Eastern]', freq=None)") + exp.append("DatetimeIndex(['2011-01-01 09:00:00+00:00', " + "'2011-01-01 10:00:00+00:00', 'NaT'], " + "dtype='datetime64[ns, UTC]', freq=None)""") + + with pd.option_context('display.width', 300): + for indx, expected in zip(idx, exp): + for func in ['__repr__', '__unicode__', '__str__']: + result = getattr(indx, func)() + assert result == expected + + def test_representation_to_series(self): + idx1 = DatetimeIndex([], freq='D') + idx2 = DatetimeIndex(['2011-01-01'], freq='D') + idx3 = DatetimeIndex(['2011-01-01', '2011-01-02'], freq='D') + idx4 = DatetimeIndex( + ['2011-01-01', '2011-01-02', '2011-01-03'], freq='D') + idx5 = DatetimeIndex(['2011-01-01 09:00', '2011-01-01 10:00', + '2011-01-01 11:00'], freq='H', tz='Asia/Tokyo') + idx6 = DatetimeIndex(['2011-01-01 09:00', '2011-01-01 10:00', pd.NaT], + tz='US/Eastern') + idx7 = DatetimeIndex(['2011-01-01 09:00', '2011-01-02 10:15']) + + exp1 = """Series([], dtype: datetime64[ns])""" + + exp2 = """0 2011-01-01 +dtype: datetime64[ns]""" + + exp3 = """0 2011-01-01 +1 2011-01-02 +dtype: datetime64[ns]""" + + exp4 = """0 2011-01-01 +1 2011-01-02 +2 2011-01-03 +dtype: datetime64[ns]""" + + exp5 = """0 2011-01-01 09:00:00+09:00 +1 2011-01-01 10:00:00+09:00 +2 2011-01-01 11:00:00+09:00 +dtype: datetime64[ns, Asia/Tokyo]""" + + exp6 = """0 2011-01-01 09:00:00-05:00 +1 2011-01-01 10:00:00-05:00 +2 NaT +dtype: datetime64[ns, US/Eastern]""" + + exp7 = """0 2011-01-01 09:00:00 +1 2011-01-02 10:15:00 +dtype: datetime64[ns]""" + + with pd.option_context('display.width', 300): + for idx, expected in zip([idx1, idx2, idx3, idx4, + idx5, idx6, idx7], + [exp1, exp2, exp3, exp4, + exp5, exp6, exp7]): + result = repr(Series(idx)) + assert result == expected + + def test_summary(self): + # GH9116 + idx1 = DatetimeIndex([], freq='D') + idx2 = DatetimeIndex(['2011-01-01'], freq='D') + idx3 = DatetimeIndex(['2011-01-01', '2011-01-02'], freq='D') + idx4 = DatetimeIndex( + ['2011-01-01', '2011-01-02', '2011-01-03'], freq='D') + idx5 = DatetimeIndex(['2011-01-01 09:00', '2011-01-01 10:00', + '2011-01-01 11:00'], + freq='H', tz='Asia/Tokyo') + idx6 = DatetimeIndex(['2011-01-01 09:00', '2011-01-01 10:00', pd.NaT], + tz='US/Eastern') + + exp1 = """DatetimeIndex: 0 entries +Freq: D""" + + exp2 = """DatetimeIndex: 1 entries, 2011-01-01 to 2011-01-01 +Freq: D""" + + exp3 = """DatetimeIndex: 2 entries, 2011-01-01 to 
2011-01-02 +Freq: D""" + + exp4 = """DatetimeIndex: 3 entries, 2011-01-01 to 2011-01-03 +Freq: D""" + + exp5 = ("DatetimeIndex: 3 entries, 2011-01-01 09:00:00+09:00 " + "to 2011-01-01 11:00:00+09:00\n" + "Freq: H") + + exp6 = """DatetimeIndex: 3 entries, 2011-01-01 09:00:00-05:00 to NaT""" + + for idx, expected in zip([idx1, idx2, idx3, idx4, idx5, idx6], + [exp1, exp2, exp3, exp4, exp5, exp6]): + result = idx.summary() + assert result == expected + + def test_resolution(self): + for freq, expected in zip(['A', 'Q', 'M', 'D', 'H', 'T', + 'S', 'L', 'U'], + ['day', 'day', 'day', 'day', 'hour', + 'minute', 'second', 'millisecond', + 'microsecond']): + for tz in self.tz: + idx = pd.date_range(start='2013-04-01', periods=30, freq=freq, + tz=tz) + assert idx.resolution == expected + + def test_union(self): + for tz in self.tz: + # union + rng1 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) + other1 = pd.date_range('1/6/2000', freq='D', periods=5, tz=tz) + expected1 = pd.date_range('1/1/2000', freq='D', periods=10, tz=tz) + + rng2 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) + other2 = pd.date_range('1/4/2000', freq='D', periods=5, tz=tz) + expected2 = pd.date_range('1/1/2000', freq='D', periods=8, tz=tz) + + rng3 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) + other3 = pd.DatetimeIndex([], tz=tz) + expected3 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) + + for rng, other, expected in [(rng1, other1, expected1), + (rng2, other2, expected2), + (rng3, other3, expected3)]: + + result_union = rng.union(other) + tm.assert_index_equal(result_union, expected) + + def test_add_iadd(self): + for tz in self.tz: + + # offset + offsets = [pd.offsets.Hour(2), timedelta(hours=2), + np.timedelta64(2, 'h'), Timedelta(hours=2)] + + for delta in offsets: + rng = pd.date_range('2000-01-01', '2000-02-01', tz=tz) + result = rng + delta + expected = pd.date_range('2000-01-01 02:00', + '2000-02-01 02:00', tz=tz) + tm.assert_index_equal(result, expected) + rng += delta + tm.assert_index_equal(rng, expected) + + # int + rng = pd.date_range('2000-01-01 09:00', freq='H', periods=10, + tz=tz) + result = rng + 1 + expected = pd.date_range('2000-01-01 10:00', freq='H', periods=10, + tz=tz) + tm.assert_index_equal(result, expected) + rng += 1 + tm.assert_index_equal(rng, expected) + + idx = DatetimeIndex(['2011-01-01', '2011-01-02']) + msg = "cannot add a datelike to a DatetimeIndex" + with tm.assert_raises_regex(TypeError, msg): + idx + Timestamp('2011-01-01') + + with tm.assert_raises_regex(TypeError, msg): + Timestamp('2011-01-01') + idx + + def test_add_dti_dti(self): + # previously performed setop (deprecated in 0.16.0), now raises + # TypeError (GH14164) + + dti = date_range('20130101', periods=3) + dti_tz = date_range('20130101', periods=3).tz_localize('US/Eastern') + + with pytest.raises(TypeError): + dti + dti + + with pytest.raises(TypeError): + dti_tz + dti_tz + + with pytest.raises(TypeError): + dti_tz + dti + + with pytest.raises(TypeError): + dti + dti_tz + + def test_difference(self): + for tz in self.tz: + # diff + rng1 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) + other1 = pd.date_range('1/6/2000', freq='D', periods=5, tz=tz) + expected1 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) + + rng2 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) + other2 = pd.date_range('1/4/2000', freq='D', periods=5, tz=tz) + expected2 = pd.date_range('1/1/2000', freq='D', periods=3, tz=tz) + + rng3 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) + 
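+ # the three (rng, other, expected) triples built here each cover one
+ # overlap shape: fully disjoint ranges (difference is a no-op), a
+ # partial overlap (the shared tail is dropped), and an empty other
+ # (again a no-op)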
other3 = pd.DatetimeIndex([], tz=tz) + expected3 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) + + for rng, other, expected in [(rng1, other1, expected1), + (rng2, other2, expected2), + (rng3, other3, expected3)]: + result_diff = rng.difference(other) + tm.assert_index_equal(result_diff, expected) + + def test_sub_isub(self): + for tz in self.tz: + + # offset + offsets = [pd.offsets.Hour(2), timedelta(hours=2), + np.timedelta64(2, 'h'), Timedelta(hours=2)] + + for delta in offsets: + rng = pd.date_range('2000-01-01', '2000-02-01', tz=tz) + expected = pd.date_range('1999-12-31 22:00', + '2000-01-31 22:00', tz=tz) + + result = rng - delta + tm.assert_index_equal(result, expected) + rng -= delta + tm.assert_index_equal(rng, expected) + + # int + rng = pd.date_range('2000-01-01 09:00', freq='H', periods=10, + tz=tz) + result = rng - 1 + expected = pd.date_range('2000-01-01 08:00', freq='H', periods=10, + tz=tz) + tm.assert_index_equal(result, expected) + rng -= 1 + tm.assert_index_equal(rng, expected) + + def test_sub_dti_dti(self): + # previously performed setop (deprecated in 0.16.0), now changed to + # return subtraction -> TimeDeltaIndex (GH ...) + + dti = date_range('20130101', periods=3) + dti_tz = date_range('20130101', periods=3).tz_localize('US/Eastern') + dti_tz2 = date_range('20130101', periods=3).tz_localize('UTC') + expected = TimedeltaIndex([0, 0, 0]) + + result = dti - dti + tm.assert_index_equal(result, expected) + + result = dti_tz - dti_tz + tm.assert_index_equal(result, expected) + + with pytest.raises(TypeError): + dti_tz - dti + + with pytest.raises(TypeError): + dti - dti_tz + + with pytest.raises(TypeError): + dti_tz - dti_tz2 + + # isub + dti -= dti + tm.assert_index_equal(dti, expected) + + # different length raises ValueError + dti1 = date_range('20130101', periods=3) + dti2 = date_range('20130101', periods=4) + with pytest.raises(ValueError): + dti1 - dti2 + + # NaN propagation + dti1 = DatetimeIndex(['2012-01-01', np.nan, '2012-01-03']) + dti2 = DatetimeIndex(['2012-01-02', '2012-01-03', np.nan]) + expected = TimedeltaIndex(['1 days', np.nan, np.nan]) + result = dti2 - dti1 + tm.assert_index_equal(result, expected) + + def test_sub_period(self): + # GH 13078 + # not supported, check TypeError + p = pd.Period('2011-01-01', freq='D') + + for freq in [None, 'D']: + idx = pd.DatetimeIndex(['2011-01-01', '2011-01-02'], freq=freq) + + with pytest.raises(TypeError): + idx - p + + with pytest.raises(TypeError): + p - idx + + def test_comp_nat(self): + left = pd.DatetimeIndex([pd.Timestamp('2011-01-01'), pd.NaT, + pd.Timestamp('2011-01-03')]) + right = pd.DatetimeIndex([pd.NaT, pd.NaT, pd.Timestamp('2011-01-03')]) + + for l, r in [(left, right), (left.asobject, right.asobject)]: + result = l == r + expected = np.array([False, False, True]) + tm.assert_numpy_array_equal(result, expected) + + result = l != r + expected = np.array([True, True, False]) + tm.assert_numpy_array_equal(result, expected) + + expected = np.array([False, False, False]) + tm.assert_numpy_array_equal(l == pd.NaT, expected) + tm.assert_numpy_array_equal(pd.NaT == r, expected) + + expected = np.array([True, True, True]) + tm.assert_numpy_array_equal(l != pd.NaT, expected) + tm.assert_numpy_array_equal(pd.NaT != l, expected) + + expected = np.array([False, False, False]) + tm.assert_numpy_array_equal(l < pd.NaT, expected) + tm.assert_numpy_array_equal(pd.NaT > l, expected) + + def test_value_counts_unique(self): + # GH 7735 + for tz in self.tz: + idx = pd.date_range('2011-01-01 09:00', freq='H', 
periods=10) + # create repeated values, 'n'th element is repeated by n+1 times + idx = DatetimeIndex(np.repeat(idx.values, range(1, len(idx) + 1)), + tz=tz) + + exp_idx = pd.date_range('2011-01-01 18:00', freq='-1H', periods=10, + tz=tz) + expected = Series(range(10, 0, -1), index=exp_idx, dtype='int64') + + for obj in [idx, Series(idx)]: + tm.assert_series_equal(obj.value_counts(), expected) + + expected = pd.date_range('2011-01-01 09:00', freq='H', periods=10, + tz=tz) + tm.assert_index_equal(idx.unique(), expected) + + idx = DatetimeIndex(['2013-01-01 09:00', '2013-01-01 09:00', + '2013-01-01 09:00', '2013-01-01 08:00', + '2013-01-01 08:00', pd.NaT], tz=tz) + + exp_idx = DatetimeIndex(['2013-01-01 09:00', '2013-01-01 08:00'], + tz=tz) + expected = Series([3, 2], index=exp_idx) + + for obj in [idx, Series(idx)]: + tm.assert_series_equal(obj.value_counts(), expected) + + exp_idx = DatetimeIndex(['2013-01-01 09:00', '2013-01-01 08:00', + pd.NaT], tz=tz) + expected = Series([3, 2, 1], index=exp_idx) + + for obj in [idx, Series(idx)]: + tm.assert_series_equal(obj.value_counts(dropna=False), + expected) + + tm.assert_index_equal(idx.unique(), exp_idx) + + def test_nonunique_contains(self): + # GH 9512 + for idx in map(DatetimeIndex, + ([0, 1, 0], [0, 0, -1], [0, -1, -1], + ['2015', '2015', '2016'], ['2015', '2015', '2014'])): + assert idx[0] in idx + + def test_order(self): + # with freq + idx1 = DatetimeIndex(['2011-01-01', '2011-01-02', + '2011-01-03'], freq='D', name='idx') + idx2 = DatetimeIndex(['2011-01-01 09:00', '2011-01-01 10:00', + '2011-01-01 11:00'], freq='H', + tz='Asia/Tokyo', name='tzidx') + + for idx in [idx1, idx2]: + ordered = idx.sort_values() + tm.assert_index_equal(ordered, idx) + assert ordered.freq == idx.freq + + ordered = idx.sort_values(ascending=False) + expected = idx[::-1] + tm.assert_index_equal(ordered, expected) + assert ordered.freq == expected.freq + assert ordered.freq.n == -1 + + ordered, indexer = idx.sort_values(return_indexer=True) + tm.assert_index_equal(ordered, idx) + tm.assert_numpy_array_equal(indexer, np.array([0, 1, 2]), + check_dtype=False) + assert ordered.freq == idx.freq + + ordered, indexer = idx.sort_values(return_indexer=True, + ascending=False) + expected = idx[::-1] + tm.assert_index_equal(ordered, expected) + tm.assert_numpy_array_equal(indexer, + np.array([2, 1, 0]), + check_dtype=False) + assert ordered.freq == expected.freq + assert ordered.freq.n == -1 + + # without freq + for tz in self.tz: + idx1 = DatetimeIndex(['2011-01-01', '2011-01-03', '2011-01-05', + '2011-01-02', '2011-01-01'], + tz=tz, name='idx1') + exp1 = DatetimeIndex(['2011-01-01', '2011-01-01', '2011-01-02', + '2011-01-03', '2011-01-05'], + tz=tz, name='idx1') + + idx2 = DatetimeIndex(['2011-01-01', '2011-01-03', '2011-01-05', + '2011-01-02', '2011-01-01'], + tz=tz, name='idx2') + + exp2 = DatetimeIndex(['2011-01-01', '2011-01-01', '2011-01-02', + '2011-01-03', '2011-01-05'], + tz=tz, name='idx2') + + idx3 = DatetimeIndex([pd.NaT, '2011-01-03', '2011-01-05', + '2011-01-02', pd.NaT], tz=tz, name='idx3') + exp3 = DatetimeIndex([pd.NaT, pd.NaT, '2011-01-02', '2011-01-03', + '2011-01-05'], tz=tz, name='idx3') + + for idx, expected in [(idx1, exp1), (idx2, exp2), (idx3, exp3)]: + ordered = idx.sort_values() + tm.assert_index_equal(ordered, expected) + assert ordered.freq is None + + ordered = idx.sort_values(ascending=False) + tm.assert_index_equal(ordered, expected[::-1]) + assert ordered.freq is None + + ordered, indexer = idx.sort_values(return_indexer=True) + 
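+ # return_indexer=True also hands back the positions that sort the
+ # original index, so idx.take(indexer) would reproduce ordered; both
+ # the ordering and those positions are verified below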
tm.assert_index_equal(ordered, expected) + + exp = np.array([0, 4, 3, 1, 2]) + tm.assert_numpy_array_equal(indexer, exp, check_dtype=False) + assert ordered.freq is None + + ordered, indexer = idx.sort_values(return_indexer=True, + ascending=False) + tm.assert_index_equal(ordered, expected[::-1]) + + exp = np.array([2, 1, 3, 4, 0]) + tm.assert_numpy_array_equal(indexer, exp, check_dtype=False) + assert ordered.freq is None + + def test_getitem(self): + idx1 = pd.date_range('2011-01-01', '2011-01-31', freq='D', name='idx') + idx2 = pd.date_range('2011-01-01', '2011-01-31', freq='D', + tz='Asia/Tokyo', name='idx') + + for idx in [idx1, idx2]: + result = idx[0] + assert result == Timestamp('2011-01-01', tz=idx.tz) + + result = idx[0:5] + expected = pd.date_range('2011-01-01', '2011-01-05', freq='D', + tz=idx.tz, name='idx') + tm.assert_index_equal(result, expected) + assert result.freq == expected.freq + + result = idx[0:10:2] + expected = pd.date_range('2011-01-01', '2011-01-09', freq='2D', + tz=idx.tz, name='idx') + tm.assert_index_equal(result, expected) + assert result.freq == expected.freq + + result = idx[-20:-5:3] + expected = pd.date_range('2011-01-12', '2011-01-24', freq='3D', + tz=idx.tz, name='idx') + tm.assert_index_equal(result, expected) + assert result.freq == expected.freq + + result = idx[4::-1] + expected = DatetimeIndex(['2011-01-05', '2011-01-04', '2011-01-03', + '2011-01-02', '2011-01-01'], + freq='-1D', tz=idx.tz, name='idx') + tm.assert_index_equal(result, expected) + assert result.freq == expected.freq + + def test_drop_duplicates_metadata(self): + # GH 10115 + idx = pd.date_range('2011-01-01', '2011-01-31', freq='D', name='idx') + result = idx.drop_duplicates() + tm.assert_index_equal(idx, result) + assert idx.freq == result.freq + + idx_dup = idx.append(idx) + assert idx_dup.freq is None # freq is reset + result = idx_dup.drop_duplicates() + tm.assert_index_equal(idx, result) + assert result.freq is None + + def test_drop_duplicates(self): + # to check Index/Series compat + base = pd.date_range('2011-01-01', '2011-01-31', freq='D', name='idx') + idx = base.append(base[:5]) + + res = idx.drop_duplicates() + tm.assert_index_equal(res, base) + res = Series(idx).drop_duplicates() + tm.assert_series_equal(res, Series(base)) + + res = idx.drop_duplicates(keep='last') + exp = base[5:].append(base[:5]) + tm.assert_index_equal(res, exp) + res = Series(idx).drop_duplicates(keep='last') + tm.assert_series_equal(res, Series(exp, index=np.arange(5, 36))) + + res = idx.drop_duplicates(keep=False) + tm.assert_index_equal(res, base[5:]) + res = Series(idx).drop_duplicates(keep=False) + tm.assert_series_equal(res, Series(base[5:], index=np.arange(5, 31))) + + def test_take(self): + # GH 10295 + idx1 = pd.date_range('2011-01-01', '2011-01-31', freq='D', name='idx') + idx2 = pd.date_range('2011-01-01', '2011-01-31', freq='D', + tz='Asia/Tokyo', name='idx') + + for idx in [idx1, idx2]: + result = idx.take([0]) + assert result == Timestamp('2011-01-01', tz=idx.tz) + + result = idx.take([0, 1, 2]) + expected = pd.date_range('2011-01-01', '2011-01-03', freq='D', + tz=idx.tz, name='idx') + tm.assert_index_equal(result, expected) + assert result.freq == expected.freq + + result = idx.take([0, 2, 4]) + expected = pd.date_range('2011-01-01', '2011-01-05', freq='2D', + tz=idx.tz, name='idx') + tm.assert_index_equal(result, expected) + assert result.freq == expected.freq + + result = idx.take([7, 4, 1]) + expected = pd.date_range('2011-01-08', '2011-01-02', freq='-3D', + tz=idx.tz, name='idx') + 
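+ # take() only keeps an inferred freq when the picked positions are
+ # evenly spaced, as in [7, 4, 1] -> '-3D' above; the irregular
+ # selections below come back with freq=None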
tm.assert_index_equal(result, expected) + assert result.freq == expected.freq + + result = idx.take([3, 2, 5]) + expected = DatetimeIndex(['2011-01-04', '2011-01-03', + '2011-01-06'], + freq=None, tz=idx.tz, name='idx') + tm.assert_index_equal(result, expected) + assert result.freq is None + + result = idx.take([-3, 2, 5]) + expected = DatetimeIndex(['2011-01-29', '2011-01-03', + '2011-01-06'], + freq=None, tz=idx.tz, name='idx') + tm.assert_index_equal(result, expected) + assert result.freq is None + + def test_take_invalid_kwargs(self): + idx = pd.date_range('2011-01-01', '2011-01-31', freq='D', name='idx') + indices = [1, 6, 5, 9, 10, 13, 15, 3] + + msg = r"take\(\) got an unexpected keyword argument 'foo'" + tm.assert_raises_regex(TypeError, msg, idx.take, + indices, foo=2) + + msg = "the 'out' parameter is not supported" + tm.assert_raises_regex(ValueError, msg, idx.take, + indices, out=indices) + + msg = "the 'mode' parameter is not supported" + tm.assert_raises_regex(ValueError, msg, idx.take, + indices, mode='clip') + + def test_infer_freq(self): + # GH 11018 + for freq in ['A', '2A', '-2A', 'Q', '-1Q', 'M', '-1M', 'D', '3D', + '-3D', 'W', '-1W', 'H', '2H', '-2H', 'T', '2T', 'S', + '-3S']: + idx = pd.date_range('2011-01-01 09:00:00', freq=freq, periods=10) + result = pd.DatetimeIndex(idx.asi8, freq='infer') + tm.assert_index_equal(idx, result) + assert result.freq == freq + + def test_nat_new(self): + idx = pd.date_range('2011-01-01', freq='D', periods=5, name='x') + result = idx._nat_new() + exp = pd.DatetimeIndex([pd.NaT] * 5, name='x') + tm.assert_index_equal(result, exp) + + result = idx._nat_new(box=False) + exp = np.array([tslib.iNaT] * 5, dtype=np.int64) + tm.assert_numpy_array_equal(result, exp) + + def test_shift(self): + # GH 9903 + for tz in self.tz: + idx = pd.DatetimeIndex([], name='xxx', tz=tz) + tm.assert_index_equal(idx.shift(0, freq='H'), idx) + tm.assert_index_equal(idx.shift(3, freq='H'), idx) + + idx = pd.DatetimeIndex(['2011-01-01 10:00', '2011-01-01 11:00', + '2011-01-01 12:00'], name='xxx', tz=tz) + tm.assert_index_equal(idx.shift(0, freq='H'), idx) + exp = pd.DatetimeIndex(['2011-01-01 13:00', '2011-01-01 14:00', + '2011-01-01 15:00'], name='xxx', tz=tz) + tm.assert_index_equal(idx.shift(3, freq='H'), exp) + exp = pd.DatetimeIndex(['2011-01-01 07:00', '2011-01-01 08:00', + '2011-01-01 09:00'], name='xxx', tz=tz) + tm.assert_index_equal(idx.shift(-3, freq='H'), exp) + + def test_nat(self): + assert pd.DatetimeIndex._na_value is pd.NaT + assert pd.DatetimeIndex([])._na_value is pd.NaT + + for tz in [None, 'US/Eastern', 'UTC']: + idx = pd.DatetimeIndex(['2011-01-01', '2011-01-02'], tz=tz) + assert idx._can_hold_na + + tm.assert_numpy_array_equal(idx._isnan, np.array([False, False])) + assert not idx.hasnans + tm.assert_numpy_array_equal(idx._nan_idxs, + np.array([], dtype=np.intp)) + + idx = pd.DatetimeIndex(['2011-01-01', 'NaT'], tz=tz) + assert idx._can_hold_na + + tm.assert_numpy_array_equal(idx._isnan, np.array([False, True])) + assert idx.hasnans + tm.assert_numpy_array_equal(idx._nan_idxs, + np.array([1], dtype=np.intp)) + + def test_equals(self): + # GH 13107 + for tz in [None, 'UTC', 'US/Eastern', 'Asia/Tokyo']: + idx = pd.DatetimeIndex(['2011-01-01', '2011-01-02', 'NaT']) + assert idx.equals(idx) + assert idx.equals(idx.copy()) + assert idx.equals(idx.asobject) + assert idx.asobject.equals(idx) + assert idx.asobject.equals(idx.asobject) + assert not idx.equals(list(idx)) + assert not idx.equals(pd.Series(idx)) + + idx2 = pd.DatetimeIndex(['2011-01-01',
'2011-01-02', 'NaT'], + tz='US/Pacific') + assert not idx.equals(idx2) + assert not idx.equals(idx2.copy()) + assert not idx.equals(idx2.asobject) + assert not idx.asobject.equals(idx2) + assert not idx.equals(list(idx2)) + assert not idx.equals(pd.Series(idx2)) + + # same internal, different tz + idx3 = pd.DatetimeIndex._simple_new(idx.asi8, tz='US/Pacific') + tm.assert_numpy_array_equal(idx.asi8, idx3.asi8) + assert not idx.equals(idx3) + assert not idx.equals(idx3.copy()) + assert not idx.equals(idx3.asobject) + assert not idx.asobject.equals(idx3) + assert not idx.equals(list(idx3)) + assert not idx.equals(pd.Series(idx3)) + + +class TestDateTimeIndexToJulianDate(object): + + def test_1700(self): + r1 = Float64Index([2345897.5, 2345898.5, 2345899.5, 2345900.5, + 2345901.5]) + r2 = date_range(start=Timestamp('1710-10-01'), periods=5, + freq='D').to_julian_date() + assert isinstance(r2, Float64Index) + tm.assert_index_equal(r1, r2) + + def test_2000(self): + r1 = Float64Index([2451601.5, 2451602.5, 2451603.5, 2451604.5, + 2451605.5]) + r2 = date_range(start=Timestamp('2000-02-27'), periods=5, + freq='D').to_julian_date() + assert isinstance(r2, Float64Index) + tm.assert_index_equal(r1, r2) + + def test_hour(self): + r1 = Float64Index( + [2451601.5, 2451601.5416666666666666, 2451601.5833333333333333, + 2451601.625, 2451601.6666666666666666]) + r2 = date_range(start=Timestamp('2000-02-27'), periods=5, + freq='H').to_julian_date() + assert isinstance(r2, Float64Index) + tm.assert_index_equal(r1, r2) + + def test_minute(self): + r1 = Float64Index( + [2451601.5, 2451601.5006944444444444, 2451601.5013888888888888, + 2451601.5020833333333333, 2451601.5027777777777777]) + r2 = date_range(start=Timestamp('2000-02-27'), periods=5, + freq='T').to_julian_date() + assert isinstance(r2, Float64Index) + tm.assert_index_equal(r1, r2) + + def test_second(self): + r1 = Float64Index( + [2451601.5, 2451601.500011574074074, 2451601.5000231481481481, + 2451601.5000347222222222, 2451601.5000462962962962]) + r2 = date_range(start=Timestamp('2000-02-27'), periods=5, + freq='S').to_julian_date() + assert isinstance(r2, Float64Index) + tm.assert_index_equal(r1, r2) + + +# GH 10699 +@pytest.mark.parametrize('klass,assert_func', zip([Series, DatetimeIndex], + [tm.assert_series_equal, + tm.assert_index_equal])) +def test_datetime64_with_DateOffset(klass, assert_func): + s = klass(date_range('2000-01-01', '2000-01-31'), name='a') + result = s + pd.DateOffset(years=1) + result2 = pd.DateOffset(years=1) + s + exp = klass(date_range('2001-01-01', '2001-01-31'), name='a') + assert_func(result, exp) + assert_func(result2, exp) + + result = s - pd.DateOffset(years=1) + exp = klass(date_range('1999-01-01', '1999-01-31'), name='a') + assert_func(result, exp) + + s = klass([Timestamp('2000-01-15 00:15:00', tz='US/Central'), + pd.Timestamp('2000-02-15', tz='US/Central')], name='a') + result = s + pd.offsets.Day() + result2 = pd.offsets.Day() + s + exp = klass([Timestamp('2000-01-16 00:15:00', tz='US/Central'), + Timestamp('2000-02-16', tz='US/Central')], name='a') + assert_func(result, exp) + assert_func(result2, exp) + + s = klass([Timestamp('2000-01-15 00:15:00', tz='US/Central'), + pd.Timestamp('2000-02-15', tz='US/Central')], name='a') + result = s + pd.offsets.MonthEnd() + result2 = pd.offsets.MonthEnd() + s + exp = klass([Timestamp('2000-01-31 00:15:00', tz='US/Central'), + Timestamp('2000-02-29', tz='US/Central')], name='a') + assert_func(result, exp) + assert_func(result2, exp) + + # array of offsets - valid for Series 
only + if klass is Series: + with tm.assert_produces_warning(PerformanceWarning): + s = klass([Timestamp('2000-1-1'), Timestamp('2000-2-1')]) + result = s + Series([pd.offsets.DateOffset(years=1), + pd.offsets.MonthEnd()]) + exp = klass([Timestamp('2001-1-1'), Timestamp('2000-2-29') + ]) + assert_func(result, exp) + + # same offset + result = s + Series([pd.offsets.DateOffset(years=1), + pd.offsets.DateOffset(years=1)]) + exp = klass([Timestamp('2001-1-1'), Timestamp('2001-2-1')]) + assert_func(result, exp) + + s = klass([Timestamp('2000-01-05 00:15:00'), + Timestamp('2000-01-31 00:23:00'), + Timestamp('2000-01-01'), + Timestamp('2000-03-31'), + Timestamp('2000-02-29'), + Timestamp('2000-12-31'), + Timestamp('2000-05-15'), + Timestamp('2001-06-15')]) + + # DateOffset relativedelta fastpath + relative_kwargs = [('years', 2), ('months', 5), ('days', 3), + ('hours', 5), ('minutes', 10), ('seconds', 2), + ('microseconds', 5)] + for i, kwd in enumerate(relative_kwargs): + op = pd.DateOffset(**dict([kwd])) + assert_func(klass([x + op for x in s]), s + op) + assert_func(klass([x - op for x in s]), s - op) + op = pd.DateOffset(**dict(relative_kwargs[:i + 1])) + assert_func(klass([x + op for x in s]), s + op) + assert_func(klass([x - op for x in s]), s - op) + + # assert these are equal on a piecewise basis + offsets = ['YearBegin', ('YearBegin', {'month': 5}), + 'YearEnd', ('YearEnd', {'month': 5}), + 'MonthBegin', 'MonthEnd', + 'SemiMonthEnd', 'SemiMonthBegin', + 'Week', ('Week', {'weekday': 3}), + 'BusinessDay', 'BDay', 'QuarterEnd', 'QuarterBegin', + 'CustomBusinessDay', 'CDay', 'CBMonthEnd', + 'CBMonthBegin', 'BMonthBegin', 'BMonthEnd', + 'BusinessHour', 'BYearBegin', 'BYearEnd', + 'BQuarterBegin', ('LastWeekOfMonth', {'weekday': 2}), + ('FY5253Quarter', {'qtr_with_extra_week': 1, + 'startingMonth': 1, + 'weekday': 2, + 'variation': 'nearest'}), + ('FY5253', {'weekday': 0, + 'startingMonth': 2, + 'variation': + 'nearest'}), + ('WeekOfMonth', {'weekday': 2, + 'week': 2}), + 'Easter', ('DateOffset', {'day': 4}), + ('DateOffset', {'month': 5})] + + with warnings.catch_warnings(record=True): + for normalize in (True, False): + for do in offsets: + if isinstance(do, tuple): + do, kwargs = do + else: + do = do + kwargs = {} + + for n in [0, 5]: + if (do in ['WeekOfMonth', 'LastWeekOfMonth', + 'FY5253Quarter', 'FY5253'] and n == 0): + continue + op = getattr(pd.offsets, do)(n, + normalize=normalize, + **kwargs) + assert_func(klass([x + op for x in s]), s + op) + assert_func(klass([x - op for x in s]), s - op) + assert_func(klass([op + x for x in s]), op + s) + + +@pytest.mark.parametrize('years,months', product([-1, 0, 1], [-2, 0, 2])) +def test_shift_months(years, months): + s = DatetimeIndex([Timestamp('2000-01-05 00:15:00'), + Timestamp('2000-01-31 00:23:00'), + Timestamp('2000-01-01'), + Timestamp('2000-02-29'), + Timestamp('2000-12-31')]) + actual = DatetimeIndex(tslib.shift_months(s.asi8, years * 12 + + months)) + expected = DatetimeIndex([x + offsets.DateOffset( + years=years, months=months) for x in s]) + tm.assert_index_equal(actual, expected) + + +class TestBusinessDatetimeIndex(object): + + def setup_method(self, method): + self.rng = bdate_range(START, END) + + def test_comparison(self): + d = self.rng[10] + + comp = self.rng > d + assert comp[11] + assert not comp[9] + + def test_pickle_unpickle(self): + unpickled = tm.round_trip_pickle(self.rng) + assert unpickled.offset is not None + + def test_copy(self): + cp = self.rng.copy() + repr(cp) + tm.assert_index_equal(cp, self.rng) + + def 
test_repr(self): + # only really care that it works + repr(self.rng) + + def test_getitem(self): + smaller = self.rng[:5] + exp = DatetimeIndex(self.rng.view(np.ndarray)[:5]) + tm.assert_index_equal(smaller, exp) + + assert smaller.offset == self.rng.offset + + sliced = self.rng[::5] + assert sliced.offset == BDay() * 5 + + fancy_indexed = self.rng[[4, 3, 2, 1, 0]] + assert len(fancy_indexed) == 5 + assert isinstance(fancy_indexed, DatetimeIndex) + assert fancy_indexed.freq is None + + # 32-bit vs. 64-bit platforms + assert self.rng[4] == self.rng[np.int_(4)] + + def test_getitem_matplotlib_hackaround(self): + values = self.rng[:, None] + expected = self.rng.values[:, None] + tm.assert_numpy_array_equal(values, expected) + + def test_shift(self): + shifted = self.rng.shift(5) + assert shifted[0] == self.rng[5] + assert shifted.offset == self.rng.offset + + shifted = self.rng.shift(-5) + assert shifted[5] == self.rng[0] + assert shifted.offset == self.rng.offset + + shifted = self.rng.shift(0) + assert shifted[0] == self.rng[0] + assert shifted.offset == self.rng.offset + + rng = date_range(START, END, freq=BMonthEnd()) + shifted = rng.shift(1, freq=BDay()) + assert shifted[0] == rng[0] + BDay() + + def test_summary(self): + self.rng.summary() + self.rng[2:2].summary() + + def test_summary_pytz(self): + bdate_range('1/1/2005', '1/1/2009', tz=pytz.utc).summary() + + def test_summary_dateutil(self): + bdate_range('1/1/2005', '1/1/2009', tz=dateutil.tz.tzutc()).summary() + + def test_equals(self): + assert not self.rng.equals(list(self.rng)) + + def test_identical(self): + t1 = self.rng.copy() + t2 = self.rng.copy() + assert t1.identical(t2) + + # name + t1 = t1.rename('foo') + assert t1.equals(t2) + assert not t1.identical(t2) + t2 = t2.rename('foo') + assert t1.identical(t2) + + # freq + t2v = Index(t2.values) + assert t1.equals(t2v) + assert not t1.identical(t2v) + + +class TestCustomDatetimeIndex(object): + + def setup_method(self, method): + self.rng = cdate_range(START, END) + + def test_comparison(self): + d = self.rng[10] + + comp = self.rng > d + assert comp[11] + assert not comp[9] + + def test_copy(self): + cp = self.rng.copy() + repr(cp) + tm.assert_index_equal(cp, self.rng) + + def test_repr(self): + # only really care that it works + repr(self.rng) + + def test_getitem(self): + smaller = self.rng[:5] + exp = DatetimeIndex(self.rng.view(np.ndarray)[:5]) + tm.assert_index_equal(smaller, exp) + assert smaller.offset == self.rng.offset + + sliced = self.rng[::5] + assert sliced.offset == CDay() * 5 + + fancy_indexed = self.rng[[4, 3, 2, 1, 0]] + assert len(fancy_indexed) == 5 + assert isinstance(fancy_indexed, DatetimeIndex) + assert fancy_indexed.freq is None + + # 32-bit vs. 
64-bit platforms + assert self.rng[4] == self.rng[np.int_(4)] + + def test_getitem_matplotlib_hackaround(self): + values = self.rng[:, None] + expected = self.rng.values[:, None] + tm.assert_numpy_array_equal(values, expected) + + def test_shift(self): + + shifted = self.rng.shift(5) + assert shifted[0] == self.rng[5] + assert shifted.offset == self.rng.offset + + shifted = self.rng.shift(-5) + assert shifted[5] == self.rng[0] + assert shifted.offset == self.rng.offset + + shifted = self.rng.shift(0) + assert shifted[0] == self.rng[0] + assert shifted.offset == self.rng.offset + + # PerformanceWarning + with warnings.catch_warnings(record=True): + rng = date_range(START, END, freq=BMonthEnd()) + shifted = rng.shift(1, freq=CDay()) + assert shifted[0] == rng[0] + CDay() + + def test_pickle_unpickle(self): + unpickled = tm.round_trip_pickle(self.rng) + assert unpickled.offset is not None + + def test_summary(self): + self.rng.summary() + self.rng[2:2].summary() + + def test_summary_pytz(self): + cdate_range('1/1/2005', '1/1/2009', tz=pytz.utc).summary() + + def test_summary_dateutil(self): + cdate_range('1/1/2005', '1/1/2009', tz=dateutil.tz.tzutc()).summary() + + def test_equals(self): + assert not self.rng.equals(list(self.rng)) diff --git a/pandas/tests/indexes/datetimes/test_partial_slicing.py b/pandas/tests/indexes/datetimes/test_partial_slicing.py new file mode 100644 index 0000000000000..e7d03aa193cbd --- /dev/null +++ b/pandas/tests/indexes/datetimes/test_partial_slicing.py @@ -0,0 +1,270 @@ +""" test partial slicing on Series/Frame """ + +import pytest + +from datetime import datetime +import numpy as np +import pandas as pd + +from pandas import (DatetimeIndex, Series, DataFrame, + date_range, Index, Timedelta, Timestamp) +from pandas.util import testing as tm + + +class TestSlicing(object): + + def test_slice_year(self): + dti = DatetimeIndex(freq='B', start=datetime(2005, 1, 1), periods=500) + + s = Series(np.arange(len(dti)), index=dti) + result = s['2005'] + expected = s[s.index.year == 2005] + tm.assert_series_equal(result, expected) + + df = DataFrame(np.random.rand(len(dti), 5), index=dti) + result = df.loc['2005'] + expected = df[df.index.year == 2005] + tm.assert_frame_equal(result, expected) + + rng = date_range('1/1/2000', '1/1/2010') + + result = rng.get_loc('2009') + expected = slice(3288, 3653) + assert result == expected + + def test_slice_quarter(self): + dti = DatetimeIndex(freq='D', start=datetime(2000, 6, 1), periods=500) + + s = Series(np.arange(len(dti)), index=dti) + assert len(s['2001Q1']) == 90 + + df = DataFrame(np.random.rand(len(dti), 5), index=dti) + assert len(df.loc['1Q01']) == 90 + + def test_slice_month(self): + dti = DatetimeIndex(freq='D', start=datetime(2005, 1, 1), periods=500) + s = Series(np.arange(len(dti)), index=dti) + assert len(s['2005-11']) == 30 + + df = DataFrame(np.random.rand(len(dti), 5), index=dti) + assert len(df.loc['2005-11']) == 30 + + tm.assert_series_equal(s['2005-11'], s['11-2005']) + + def test_partial_slice(self): + rng = DatetimeIndex(freq='D', start=datetime(2005, 1, 1), periods=500) + s = Series(np.arange(len(rng)), index=rng) + + result = s['2005-05':'2006-02'] + expected = s['20050501':'20060228'] + tm.assert_series_equal(result, expected) + + result = s['2005-05':] + expected = s['20050501':] + tm.assert_series_equal(result, expected) + + result = s[:'2006-02'] + expected = s[:'20060228'] + tm.assert_series_equal(result, expected) + + result = s['2005-1-1'] + assert result == s.iloc[0] + + pytest.raises(Exception, 
s.__getitem__, '2004-12-31') + + def test_partial_slice_daily(self): + rng = DatetimeIndex(freq='H', start=datetime(2005, 1, 31), periods=500) + s = Series(np.arange(len(rng)), index=rng) + + result = s['2005-1-31'] + tm.assert_series_equal(result, s.iloc[:24]) + + pytest.raises(Exception, s.__getitem__, '2004-12-31 00') + + def test_partial_slice_hourly(self): + rng = DatetimeIndex(freq='T', start=datetime(2005, 1, 1, 20, 0, 0), + periods=500) + s = Series(np.arange(len(rng)), index=rng) + + result = s['2005-1-1'] + tm.assert_series_equal(result, s.iloc[:60 * 4]) + + result = s['2005-1-1 20'] + tm.assert_series_equal(result, s.iloc[:60]) + + assert s['2005-1-1 20:00'] == s.iloc[0] + pytest.raises(Exception, s.__getitem__, '2004-12-31 00:15') + + def test_partial_slice_minutely(self): + rng = DatetimeIndex(freq='S', start=datetime(2005, 1, 1, 23, 59, 0), + periods=500) + s = Series(np.arange(len(rng)), index=rng) + + result = s['2005-1-1 23:59'] + tm.assert_series_equal(result, s.iloc[:60]) + + result = s['2005-1-1'] + tm.assert_series_equal(result, s.iloc[:60]) + + assert s[Timestamp('2005-1-1 23:59:00')] == s.iloc[0] + pytest.raises(Exception, s.__getitem__, '2004-12-31 00:00:00') + + def test_partial_slice_second_precision(self): + rng = DatetimeIndex(start=datetime(2005, 1, 1, 0, 0, 59, + microsecond=999990), + periods=20, freq='US') + s = Series(np.arange(20), rng) + + tm.assert_series_equal(s['2005-1-1 00:00'], s.iloc[:10]) + tm.assert_series_equal(s['2005-1-1 00:00:59'], s.iloc[:10]) + + tm.assert_series_equal(s['2005-1-1 00:01'], s.iloc[10:]) + tm.assert_series_equal(s['2005-1-1 00:01:00'], s.iloc[10:]) + + assert s[Timestamp('2005-1-1 00:00:59.999990')] == s.iloc[0] + tm.assert_raises_regex(KeyError, '2005-1-1 00:00:00', + lambda: s['2005-1-1 00:00:00']) + + def test_partial_slicing_dataframe(self): + # GH14856 + # Test various combinations of string slicing resolution vs. 
+ # index resolution + # - If string resolution is less precise than index resolution, + # string is considered a slice + # - If string resolution is equal to or more precise than index + # resolution, string is considered an exact match + formats = ['%Y', '%Y-%m', '%Y-%m-%d', '%Y-%m-%d %H', + '%Y-%m-%d %H:%M', '%Y-%m-%d %H:%M:%S'] + resolutions = ['year', 'month', 'day', 'hour', 'minute', 'second'] + for rnum, resolution in enumerate(resolutions[2:], 2): + # we check only 'day', 'hour', 'minute' and 'second' + unit = Timedelta("1 " + resolution) + middate = datetime(2012, 1, 1, 0, 0, 0) + index = DatetimeIndex([middate - unit, + middate, middate + unit]) + values = [1, 2, 3] + df = DataFrame({'a': values}, index, dtype=np.int64) + assert df.index.resolution == resolution + + # Timestamp with the same resolution as index + # Should be exact match for Series (return scalar) + # and raise KeyError for Frame + for timestamp, expected in zip(index, values): + ts_string = timestamp.strftime(formats[rnum]) + # make ts_string as precise as index + result = df['a'][ts_string] + assert isinstance(result, np.int64) + assert result == expected + pytest.raises(KeyError, df.__getitem__, ts_string) + + # Timestamp with resolution less precise than index + for fmt in formats[:rnum]: + for element, theslice in [[0, slice(None, 1)], + [1, slice(1, None)]]: + ts_string = index[element].strftime(fmt) + + # Series should return slice + result = df['a'][ts_string] + expected = df['a'][theslice] + tm.assert_series_equal(result, expected) + + # Frame should return slice as well + result = df[ts_string] + expected = df[theslice] + tm.assert_frame_equal(result, expected) + + # Timestamp with resolution more precise than index + # Compatible with existing key + # Should return scalar for Series + # and raise KeyError for Frame + for fmt in formats[rnum + 1:]: + ts_string = index[1].strftime(fmt) + result = df['a'][ts_string] + assert isinstance(result, np.int64) + assert result == 2 + pytest.raises(KeyError, df.__getitem__, ts_string) + + # Not compatible with existing key + # Should raise KeyError + for fmt, res in list(zip(formats, resolutions))[rnum + 1:]: + ts = index[1] + Timedelta("1 " + res) + ts_string = ts.strftime(fmt) + pytest.raises(KeyError, df['a'].__getitem__, ts_string) + pytest.raises(KeyError, df.__getitem__, ts_string) + + def test_partial_slicing_with_multiindex(self): + + # GH 4758 + # partial string indexing with a multi-index buggy + df = DataFrame({'ACCOUNT': ["ACCT1", "ACCT1", "ACCT1", "ACCT2"], + 'TICKER': ["ABC", "MNP", "XYZ", "XYZ"], + 'val': [1, 2, 3, 4]}, + index=date_range("2013-06-19 09:30:00", + periods=4, freq='5T')) + df_multi = df.set_index(['ACCOUNT', 'TICKER'], append=True) + + expected = DataFrame([ + [1] + ], index=Index(['ABC'], name='TICKER'), columns=['val']) + result = df_multi.loc[('2013-06-19 09:30:00', 'ACCT1')] + tm.assert_frame_equal(result, expected) + + expected = df_multi.loc[ + (pd.Timestamp('2013-06-19 09:30:00', tz=None), 'ACCT1', 'ABC')] + result = df_multi.loc[('2013-06-19 09:30:00', 'ACCT1', 'ABC')] + tm.assert_series_equal(result, expected) + + # this is a KeyError as we don't do partial string selection on + # multi-levels + def f(): + df_multi.loc[('2013-06-19', 'ACCT1', 'ABC')] + + pytest.raises(KeyError, f) + + # GH 4294 + # partial slice on a series mi + s = pd.DataFrame(np.random.rand(1000, 1000), index=pd.date_range( + '2000-1-1', periods=1000)).stack() + + s2 = s[:-1].copy() + expected = s2['2000-1-4'] + result = s2[pd.Timestamp('2000-1-4')] + 
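+ # the string '2000-1-4' and the exact Timestamp('2000-1-4') address the
+ # same datetime level of the MultiIndex, so the two lookups compared
+ # below (and the .loc/.xs variants that follow) should agree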
tm.assert_series_equal(result, expected) + + result = s[pd.Timestamp('2000-1-4')] + expected = s['2000-1-4'] + tm.assert_series_equal(result, expected) + + df2 = pd.DataFrame(s) + expected = df2.xs('2000-1-4') + result = df2.loc[pd.Timestamp('2000-1-4')] + tm.assert_frame_equal(result, expected) + + def test_partial_slice_doesnt_require_monotonicity(self): + # For historical reasons. + s = pd.Series(np.arange(10), pd.date_range('2014-01-01', periods=10)) + + nonmonotonic = s[[3, 5, 4]] + expected = nonmonotonic.iloc[:0] + timestamp = pd.Timestamp('2014-01-10') + + tm.assert_series_equal(nonmonotonic['2014-01-10':], expected) + tm.assert_raises_regex(KeyError, + r"Timestamp\('2014-01-10 00:00:00'\)", + lambda: nonmonotonic[timestamp:]) + + tm.assert_series_equal(nonmonotonic.loc['2014-01-10':], expected) + tm.assert_raises_regex(KeyError, + r"Timestamp\('2014-01-10 00:00:00'\)", + lambda: nonmonotonic.loc[timestamp:]) + + def test_loc_datetime_length_one(self): + # GH16071 + df = pd.DataFrame(columns=['1'], + index=pd.date_range('2016-10-01T00:00:00', + '2016-10-01T23:59:59')) + result = df.loc[datetime(2016, 10, 1):] + tm.assert_frame_equal(result, df) + + result = df.loc['2016-10-01T00:00:00':] + tm.assert_frame_equal(result, df) diff --git a/pandas/tests/indexes/datetimes/test_setops.py b/pandas/tests/indexes/datetimes/test_setops.py new file mode 100644 index 0000000000000..f43c010f59b9e --- /dev/null +++ b/pandas/tests/indexes/datetimes/test_setops.py @@ -0,0 +1,418 @@ +from datetime import datetime + +import numpy as np + +import pandas as pd +import pandas.util.testing as tm +from pandas.core.indexes.datetimes import cdate_range +from pandas import (DatetimeIndex, date_range, Series, bdate_range, DataFrame, + Int64Index, Index, to_datetime) +from pandas.tseries.offsets import Minute, BMonthEnd, MonthEnd + +START, END = datetime(2009, 1, 1), datetime(2010, 1, 1) + + +class TestDatetimeIndex(object): + + def test_union(self): + i1 = Int64Index(np.arange(0, 20, 2)) + i2 = Int64Index(np.arange(10, 30, 2)) + result = i1.union(i2) + expected = Int64Index(np.arange(0, 30, 2)) + tm.assert_index_equal(result, expected) + + def test_union_coverage(self): + idx = DatetimeIndex(['2000-01-03', '2000-01-01', '2000-01-02']) + ordered = DatetimeIndex(idx.sort_values(), freq='infer') + result = ordered.union(idx) + tm.assert_index_equal(result, ordered) + + result = ordered[:0].union(ordered) + tm.assert_index_equal(result, ordered) + assert result.freq == ordered.freq + + def test_union_bug_1730(self): + rng_a = date_range('1/1/2012', periods=4, freq='3H') + rng_b = date_range('1/1/2012', periods=4, freq='4H') + + result = rng_a.union(rng_b) + exp = DatetimeIndex(sorted(set(list(rng_a)) | set(list(rng_b)))) + tm.assert_index_equal(result, exp) + + def test_union_bug_1745(self): + left = DatetimeIndex(['2012-05-11 15:19:49.695000']) + right = DatetimeIndex(['2012-05-29 13:04:21.322000', + '2012-05-11 15:27:24.873000', + '2012-05-11 15:31:05.350000']) + + result = left.union(right) + exp = DatetimeIndex(sorted(set(list(left)) | set(list(right)))) + tm.assert_index_equal(result, exp) + + def test_union_bug_4564(self): + from pandas import DateOffset + left = date_range("2013-01-01", "2013-02-01") + right = left + DateOffset(minutes=15) + + result = left.union(right) + exp = DatetimeIndex(sorted(set(list(left)) | set(list(right)))) + tm.assert_index_equal(result, exp) + + def test_union_freq_both_none(self): + # GH11086 + expected = bdate_range('20150101', periods=10) + expected.freq = None + + result = 
expected.union(expected) + tm.assert_index_equal(result, expected) + assert result.freq is None + + def test_union_dataframe_index(self): + rng1 = date_range('1/1/1999', '1/1/2012', freq='MS') + s1 = Series(np.random.randn(len(rng1)), rng1) + + rng2 = date_range('1/1/1980', '12/1/2001', freq='MS') + s2 = Series(np.random.randn(len(rng2)), rng2) + df = DataFrame({'s1': s1, 's2': s2}) + + exp = pd.date_range('1/1/1980', '1/1/2012', freq='MS') + tm.assert_index_equal(df.index, exp) + + def test_union_with_DatetimeIndex(self): + i1 = Int64Index(np.arange(0, 20, 2)) + i2 = DatetimeIndex(start='2012-01-03 00:00:00', periods=10, freq='D') + i1.union(i2) # Works + i2.union(i1) # Fails with "AttributeError: can't set attribute" + + def test_intersection(self): + # GH 4690 (with tz) + for tz in [None, 'Asia/Tokyo', 'US/Eastern', 'dateutil/US/Pacific']: + base = date_range('6/1/2000', '6/30/2000', freq='D', name='idx') + + # if target has the same name, it is preserved + rng2 = date_range('5/15/2000', '6/20/2000', freq='D', name='idx') + expected2 = date_range('6/1/2000', '6/20/2000', freq='D', + name='idx') + + # if target name is different, it will be reset + rng3 = date_range('5/15/2000', '6/20/2000', freq='D', name='other') + expected3 = date_range('6/1/2000', '6/20/2000', freq='D', + name=None) + + rng4 = date_range('7/1/2000', '7/31/2000', freq='D', name='idx') + expected4 = DatetimeIndex([], name='idx') + + for (rng, expected) in [(rng2, expected2), (rng3, expected3), + (rng4, expected4)]: + result = base.intersection(rng) + tm.assert_index_equal(result, expected) + assert result.name == expected.name + assert result.freq == expected.freq + assert result.tz == expected.tz + + # non-monotonic + base = DatetimeIndex(['2011-01-05', '2011-01-04', + '2011-01-02', '2011-01-03'], + tz=tz, name='idx') + + rng2 = DatetimeIndex(['2011-01-04', '2011-01-02', + '2011-02-02', '2011-02-03'], + tz=tz, name='idx') + expected2 = DatetimeIndex( + ['2011-01-04', '2011-01-02'], tz=tz, name='idx') + + rng3 = DatetimeIndex(['2011-01-04', '2011-01-02', + '2011-02-02', '2011-02-03'], + tz=tz, name='other') + expected3 = DatetimeIndex( + ['2011-01-04', '2011-01-02'], tz=tz, name=None) + + # GH 7880 + rng4 = date_range('7/1/2000', '7/31/2000', freq='D', tz=tz, + name='idx') + expected4 = DatetimeIndex([], tz=tz, name='idx') + + for (rng, expected) in [(rng2, expected2), (rng3, expected3), + (rng4, expected4)]: + result = base.intersection(rng) + tm.assert_index_equal(result, expected) + assert result.name == expected.name + assert result.freq is None + assert result.tz == expected.tz + + # empty same freq GH2129 + rng = date_range('6/1/2000', '6/15/2000', freq='T') + result = rng[0:0].intersection(rng) + assert len(result) == 0 + + result = rng.intersection(rng[0:0]) + assert len(result) == 0 + + def test_intersection_bug_1708(self): + from pandas import DateOffset + index_1 = date_range('1/1/2012', periods=4, freq='12H') + index_2 = index_1 + DateOffset(hours=1) + + result = index_1 & index_2 + assert len(result) == 0 + + def test_difference_freq(self): + # GH14323: difference of DatetimeIndex should not preserve frequency + + index = date_range("20160920", "20160925", freq="D") + other = date_range("20160921", "20160924", freq="D") + expected = DatetimeIndex(["20160920", "20160925"], freq=None) + idx_diff = index.difference(other) + tm.assert_index_equal(idx_diff, expected) + tm.assert_attr_equal('freq', idx_diff, expected) + + other = date_range("20160922", "20160925", freq="D") + idx_diff = index.difference(other) + 
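+        # here the tail of the index was removed; only the leading dates remain, and freq is again reset 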
expected = DatetimeIndex(["20160920", "20160921"], freq=None) + tm.assert_index_equal(idx_diff, expected) + tm.assert_attr_equal('freq', idx_diff, expected) + + def test_datetimeindex_diff(self): + dti1 = DatetimeIndex(freq='Q-JAN', start=datetime(1997, 12, 31), + periods=100) + dti2 = DatetimeIndex(freq='Q-JAN', start=datetime(1997, 12, 31), + periods=98) + assert len(dti1.difference(dti2)) == 2 + + def test_datetimeindex_union_join_empty(self): + dti = DatetimeIndex(start='1/1/2001', end='2/1/2001', freq='D') + empty = Index([]) + + result = dti.union(empty) + assert isinstance(result, DatetimeIndex) + assert result is result + + result = dti.join(empty) + assert isinstance(result, DatetimeIndex) + + def test_join_nonunique(self): + idx1 = to_datetime(['2012-11-06 16:00:11.477563', + '2012-11-06 16:00:11.477563']) + idx2 = to_datetime(['2012-11-06 15:11:09.006507', + '2012-11-06 15:11:09.006507']) + rs = idx1.join(idx2, how='outer') + assert rs.is_monotonic + + +class TestBusinessDatetimeIndex(object): + + def setup_method(self, method): + self.rng = bdate_range(START, END) + + def test_union(self): + # overlapping + left = self.rng[:10] + right = self.rng[5:10] + + the_union = left.union(right) + assert isinstance(the_union, DatetimeIndex) + + # non-overlapping, gap in middle + left = self.rng[:5] + right = self.rng[10:] + + the_union = left.union(right) + assert isinstance(the_union, Index) + + # non-overlapping, no gap + left = self.rng[:5] + right = self.rng[5:10] + + the_union = left.union(right) + assert isinstance(the_union, DatetimeIndex) + + # order does not matter + tm.assert_index_equal(right.union(left), the_union) + + # overlapping, but different offset + rng = date_range(START, END, freq=BMonthEnd()) + + the_union = self.rng.union(rng) + assert isinstance(the_union, DatetimeIndex) + + def test_outer_join(self): + # should just behave as union + + # overlapping + left = self.rng[:10] + right = self.rng[5:10] + + the_join = left.join(right, how='outer') + assert isinstance(the_join, DatetimeIndex) + + # non-overlapping, gap in middle + left = self.rng[:5] + right = self.rng[10:] + + the_join = left.join(right, how='outer') + assert isinstance(the_join, DatetimeIndex) + assert the_join.freq is None + + # non-overlapping, no gap + left = self.rng[:5] + right = self.rng[5:10] + + the_join = left.join(right, how='outer') + assert isinstance(the_join, DatetimeIndex) + + # overlapping, but different offset + rng = date_range(START, END, freq=BMonthEnd()) + + the_join = self.rng.join(rng, how='outer') + assert isinstance(the_join, DatetimeIndex) + assert the_join.freq is None + + def test_union_not_cacheable(self): + rng = date_range('1/1/2000', periods=50, freq=Minute()) + rng1 = rng[10:] + rng2 = rng[:25] + the_union = rng1.union(rng2) + tm.assert_index_equal(the_union, rng) + + rng1 = rng[10:] + rng2 = rng[15:35] + the_union = rng1.union(rng2) + expected = rng[10:] + tm.assert_index_equal(the_union, expected) + + def test_intersection(self): + rng = date_range('1/1/2000', periods=50, freq=Minute()) + rng1 = rng[10:] + rng2 = rng[:25] + the_int = rng1.intersection(rng2) + expected = rng[10:25] + tm.assert_index_equal(the_int, expected) + assert isinstance(the_int, DatetimeIndex) + assert the_int.offset == rng.offset + + the_int = rng1.intersection(rng2.view(DatetimeIndex)) + tm.assert_index_equal(the_int, expected) + + # non-overlapping + the_int = rng[:10].intersection(rng[10:]) + expected = DatetimeIndex([]) + tm.assert_index_equal(the_int, expected) + + def 
test_intersection_bug(self): + # GH #771 + a = bdate_range('11/30/2011', '12/31/2011') + b = bdate_range('12/10/2011', '12/20/2011') + result = a.intersection(b) + tm.assert_index_equal(result, b) + + def test_month_range_union_tz_pytz(self): + from pytz import timezone + tz = timezone('US/Eastern') + + early_start = datetime(2011, 1, 1) + early_end = datetime(2011, 3, 1) + + late_start = datetime(2011, 3, 1) + late_end = datetime(2011, 5, 1) + + early_dr = date_range(start=early_start, end=early_end, tz=tz, + freq=MonthEnd()) + late_dr = date_range(start=late_start, end=late_end, tz=tz, + freq=MonthEnd()) + + early_dr.union(late_dr) + + def test_month_range_union_tz_dateutil(self): + tm._skip_if_windows_python_3() + + from pandas._libs.tslib import _dateutil_gettz as timezone + tz = timezone('US/Eastern') + + early_start = datetime(2011, 1, 1) + early_end = datetime(2011, 3, 1) + + late_start = datetime(2011, 3, 1) + late_end = datetime(2011, 5, 1) + + early_dr = date_range(start=early_start, end=early_end, tz=tz, + freq=MonthEnd()) + late_dr = date_range(start=late_start, end=late_end, tz=tz, + freq=MonthEnd()) + + early_dr.union(late_dr) + + +class TestCustomDatetimeIndex(object): + + def setup_method(self, method): + self.rng = cdate_range(START, END) + + def test_union(self): + # overlapping + left = self.rng[:10] + right = self.rng[5:10] + + the_union = left.union(right) + assert isinstance(the_union, DatetimeIndex) + + # non-overlapping, gap in middle + left = self.rng[:5] + right = self.rng[10:] + + the_union = left.union(right) + assert isinstance(the_union, Index) + + # non-overlapping, no gap + left = self.rng[:5] + right = self.rng[5:10] + + the_union = left.union(right) + assert isinstance(the_union, DatetimeIndex) + + # order does not matter + tm.assert_index_equal(right.union(left), the_union) + + # overlapping, but different offset + rng = date_range(START, END, freq=BMonthEnd()) + + the_union = self.rng.union(rng) + assert isinstance(the_union, DatetimeIndex) + + def test_outer_join(self): + # should just behave as union + + # overlapping + left = self.rng[:10] + right = self.rng[5:10] + + the_join = left.join(right, how='outer') + assert isinstance(the_join, DatetimeIndex) + + # non-overlapping, gap in middle + left = self.rng[:5] + right = self.rng[10:] + + the_join = left.join(right, how='outer') + assert isinstance(the_join, DatetimeIndex) + assert the_join.freq is None + + # non-overlapping, no gap + left = self.rng[:5] + right = self.rng[5:10] + + the_join = left.join(right, how='outer') + assert isinstance(the_join, DatetimeIndex) + + # overlapping, but different offset + rng = date_range(START, END, freq=BMonthEnd()) + + the_join = self.rng.join(rng, how='outer') + assert isinstance(the_join, DatetimeIndex) + assert the_join.freq is None + + def test_intersection_bug(self): + # GH #771 + a = cdate_range('11/30/2011', '12/31/2011') + b = cdate_range('12/10/2011', '12/20/2011') + result = a.intersection(b) + tm.assert_index_equal(result, b) diff --git a/pandas/tests/indexes/datetimes/test_tools.py b/pandas/tests/indexes/datetimes/test_tools.py new file mode 100644 index 0000000000000..50669ee357bbd --- /dev/null +++ b/pandas/tests/indexes/datetimes/test_tools.py @@ -0,0 +1,1612 @@ +""" test to_datetime """ + +import sys +import pytz +import pytest +import locale +import calendar +import dateutil +import numpy as np +from dateutil.parser import parse +from datetime import datetime, date, time +from distutils.version import LooseVersion + +import pandas as pd +from 
pandas._libs import tslib, lib +from pandas.core.tools import datetimes as tools +from pandas.core.tools.datetimes import normalize_date +from pandas.compat import lmap +from pandas.compat.numpy import np_array_datetime64_compat +from pandas.core.dtypes.common import is_datetime64_ns_dtype +from pandas.util import testing as tm +from pandas.util.testing import assert_series_equal, _skip_if_has_locale +from pandas import (isna, to_datetime, Timestamp, Series, DataFrame, + Index, DatetimeIndex, NaT, date_range, bdate_range, + compat) + + +class TestTimeConversionFormats(object): + + def test_to_datetime_format(self): + values = ['1/1/2000', '1/2/2000', '1/3/2000'] + + results1 = [Timestamp('20000101'), Timestamp('20000201'), + Timestamp('20000301')] + results2 = [Timestamp('20000101'), Timestamp('20000102'), + Timestamp('20000103')] + for vals, expecteds in [(values, (Index(results1), Index(results2))), + (Series(values), + (Series(results1), Series(results2))), + (values[0], (results1[0], results2[0])), + (values[1], (results1[1], results2[1])), + (values[2], (results1[2], results2[2]))]: + + for i, fmt in enumerate(['%d/%m/%Y', '%m/%d/%Y']): + result = to_datetime(vals, format=fmt) + expected = expecteds[i] + + if isinstance(expected, Series): + assert_series_equal(result, Series(expected)) + elif isinstance(expected, Timestamp): + assert result == expected + else: + tm.assert_index_equal(result, expected) + + def test_to_datetime_format_YYYYMMDD(self): + s = Series([19801222, 19801222] + [19810105] * 5) + expected = Series([Timestamp(x) for x in s.apply(str)]) + + result = to_datetime(s, format='%Y%m%d') + assert_series_equal(result, expected) + + result = to_datetime(s.apply(str), format='%Y%m%d') + assert_series_equal(result, expected) + + # with NaT + expected = Series([Timestamp("19801222"), Timestamp("19801222")] + + [Timestamp("19810105")] * 5) + expected[2] = np.nan + s[2] = np.nan + + result = to_datetime(s, format='%Y%m%d') + assert_series_equal(result, expected) + + # string with NaT + s = s.apply(str) + s[2] = 'nat' + result = to_datetime(s, format='%Y%m%d') + assert_series_equal(result, expected) + + # coercion + # GH 7930 + s = Series([20121231, 20141231, 99991231]) + result = pd.to_datetime(s, format='%Y%m%d', errors='ignore') + expected = Series([datetime(2012, 12, 31), + datetime(2014, 12, 31), datetime(9999, 12, 31)], + dtype=object) + tm.assert_series_equal(result, expected) + + result = pd.to_datetime(s, format='%Y%m%d', errors='coerce') + expected = Series(['20121231', '20141231', 'NaT'], dtype='M8[ns]') + assert_series_equal(result, expected) + + # GH 10178 + def test_to_datetime_format_integer(self): + s = Series([2000, 2001, 2002]) + expected = Series([Timestamp(x) for x in s.apply(str)]) + + result = to_datetime(s, format='%Y') + assert_series_equal(result, expected) + + s = Series([200001, 200105, 200206]) + expected = Series([Timestamp(x[:4] + '-' + x[4:]) for x in s.apply(str) + ]) + + result = to_datetime(s, format='%Y%m') + assert_series_equal(result, expected) + + def test_to_datetime_format_microsecond(self): + + # these are locale dependent + lang, _ = locale.getlocale() + month_abbr = calendar.month_abbr[4] + val = '01-{}-2011 00:00:01.978'.format(month_abbr) + + format = '%d-%b-%Y %H:%M:%S.%f' + result = to_datetime(val, format=format) + exp = datetime.strptime(val, format) + assert result == exp + + def test_to_datetime_format_time(self): + data = [ + ['01/10/2010 15:20', '%m/%d/%Y %H:%M', + Timestamp('2010-01-10 15:20')], + ['01/10/2010 05:43', '%m/%d/%Y 
%I:%M', + Timestamp('2010-01-10 05:43')], + ['01/10/2010 13:56:01', '%m/%d/%Y %H:%M:%S', + Timestamp('2010-01-10 13:56:01')] # , + # ['01/10/2010 08:14 PM', '%m/%d/%Y %I:%M %p', + # Timestamp('2010-01-10 20:14')], + # ['01/10/2010 07:40 AM', '%m/%d/%Y %I:%M %p', + # Timestamp('2010-01-10 07:40')], + # ['01/10/2010 09:12:56 AM', '%m/%d/%Y %I:%M:%S %p', + # Timestamp('2010-01-10 09:12:56')] + ] + for s, format, dt in data: + assert to_datetime(s, format=format) == dt + + def test_to_datetime_with_non_exact(self): + # GH 10834 + tm._skip_if_has_locale() + + # 8904 + # exact kw + if sys.version_info < (2, 7): + pytest.skip('on python version < 2.7') + + s = Series(['19MAY11', 'foobar19MAY11', '19MAY11:00:00:00', + '19MAY11 00:00:00Z']) + result = to_datetime(s, format='%d%b%y', exact=False) + expected = to_datetime(s.str.extract(r'(\d+\w+\d+)', expand=False), + format='%d%b%y') + assert_series_equal(result, expected) + + def test_parse_nanoseconds_with_formula(self): + + # GH8989 + # truncating the nanoseconds when a format was provided + for v in ["2012-01-01 09:00:00.000000001", + "2012-01-01 09:00:00.000001", + "2012-01-01 09:00:00.001", + "2012-01-01 09:00:00.001000", + "2012-01-01 09:00:00.001000000", ]: + expected = pd.to_datetime(v) + result = pd.to_datetime(v, format="%Y-%m-%d %H:%M:%S.%f") + assert result == expected + + def test_to_datetime_format_weeks(self): + data = [ + ['2009324', '%Y%W%w', Timestamp('2009-08-13')], + ['2013020', '%Y%U%w', Timestamp('2013-01-13')] + ] + for s, format, dt in data: + assert to_datetime(s, format=format) == dt + + +class TestToDatetime(object): + + def test_to_datetime_dt64s(self): + in_bound_dts = [ + np.datetime64('2000-01-01'), + np.datetime64('2000-01-02'), + ] + + for dt in in_bound_dts: + assert pd.to_datetime(dt) == Timestamp(dt) + + oob_dts = [np.datetime64('1000-01-01'), np.datetime64('5000-01-02'), ] + + for dt in oob_dts: + pytest.raises(ValueError, pd.to_datetime, dt, errors='raise') + pytest.raises(ValueError, Timestamp, dt) + assert pd.to_datetime(dt, errors='coerce') is NaT + + def test_to_datetime_array_of_dt64s(self): + dts = [np.datetime64('2000-01-01'), np.datetime64('2000-01-02'), ] + + # Assuming all datetimes are in bounds, to_datetime() returns + # an array that is equal to Timestamp() parsing + tm.assert_numpy_array_equal( + pd.to_datetime(dts, box=False), + np.array([Timestamp(x).asm8 for x in dts]) + ) + + # A list of datetimes where the last one is out of bounds + dts_with_oob = dts + [np.datetime64('9999-01-01')] + + pytest.raises(ValueError, pd.to_datetime, dts_with_oob, + errors='raise') + + tm.assert_numpy_array_equal( + pd.to_datetime(dts_with_oob, box=False, errors='coerce'), + np.array( + [ + Timestamp(dts_with_oob[0]).asm8, + Timestamp(dts_with_oob[1]).asm8, + tslib.iNaT, + ], + dtype='M8' + ) + ) + + # With errors='ignore', out of bounds datetime64s + # are converted to their .item(), which depending on the version of + # numpy is either a python datetime.datetime or datetime.date + tm.assert_numpy_array_equal( + pd.to_datetime(dts_with_oob, box=False, errors='ignore'), + np.array( + [dt.item() for dt in dts_with_oob], + dtype='O' + ) + ) + + def test_to_datetime_tz(self): + + # xref 8260 + # uniform returns a DatetimeIndex + arr = [pd.Timestamp('2013-01-01 13:00:00-0800', tz='US/Pacific'), + pd.Timestamp('2013-01-02 14:00:00-0800', tz='US/Pacific')] + result = pd.to_datetime(arr) + expected = DatetimeIndex( + ['2013-01-01 13:00:00', '2013-01-02 14:00:00'], tz='US/Pacific') + tm.assert_index_equal(result, 
expected) + + # mixed tzs will raise + arr = [pd.Timestamp('2013-01-01 13:00:00', tz='US/Pacific'), + pd.Timestamp('2013-01-02 14:00:00', tz='US/Eastern')] + pytest.raises(ValueError, lambda: pd.to_datetime(arr)) + + def test_to_datetime_tz_pytz(self): + # see gh-8260 + us_eastern = pytz.timezone('US/Eastern') + arr = np.array([us_eastern.localize(datetime(year=2000, month=1, day=1, + hour=3, minute=0)), + us_eastern.localize(datetime(year=2000, month=6, day=1, + hour=3, minute=0))], + dtype=object) + result = pd.to_datetime(arr, utc=True) + expected = DatetimeIndex(['2000-01-01 08:00:00+00:00', + '2000-06-01 07:00:00+00:00'], + dtype='datetime64[ns, UTC]', freq=None) + tm.assert_index_equal(result, expected) + + def test_to_datetime_utc_is_true(self): + # See gh-11934 + start = pd.Timestamp('2014-01-01', tz='utc') + end = pd.Timestamp('2014-01-03', tz='utc') + date_range = pd.bdate_range(start, end) + + result = pd.to_datetime(date_range, utc=True) + expected = pd.DatetimeIndex(data=date_range) + tm.assert_index_equal(result, expected) + + def test_to_datetime_tz_psycopg2(self): + + # xref 8260 + try: + import psycopg2 + except ImportError: + pytest.skip("no psycopg2 installed") + + # misc cases + tz1 = psycopg2.tz.FixedOffsetTimezone(offset=-300, name=None) + tz2 = psycopg2.tz.FixedOffsetTimezone(offset=-240, name=None) + arr = np.array([datetime(2000, 1, 1, 3, 0, tzinfo=tz1), + datetime(2000, 6, 1, 3, 0, tzinfo=tz2)], + dtype=object) + + result = pd.to_datetime(arr, errors='coerce', utc=True) + expected = DatetimeIndex(['2000-01-01 08:00:00+00:00', + '2000-06-01 07:00:00+00:00'], + dtype='datetime64[ns, UTC]', freq=None) + tm.assert_index_equal(result, expected) + + # dtype coercion + i = pd.DatetimeIndex([ + '2000-01-01 08:00:00+00:00' + ], tz=psycopg2.tz.FixedOffsetTimezone(offset=-300, name=None)) + assert is_datetime64_ns_dtype(i) + + # tz coercion + result = pd.to_datetime(i, errors='coerce') + tm.assert_index_equal(result, i) + + result = pd.to_datetime(i, errors='coerce', utc=True) + expected = pd.DatetimeIndex(['2000-01-01 13:00:00'], + dtype='datetime64[ns, UTC]') + tm.assert_index_equal(result, expected) + + def test_datetime_bool(self): + # GH13176 + with pytest.raises(TypeError): + to_datetime(False) + assert to_datetime(False, errors="coerce") is NaT + assert to_datetime(False, errors="ignore") is False + with pytest.raises(TypeError): + to_datetime(True) + assert to_datetime(True, errors="coerce") is NaT + assert to_datetime(True, errors="ignore") is True + with pytest.raises(TypeError): + to_datetime([False, datetime.today()]) + with pytest.raises(TypeError): + to_datetime(['20130101', True]) + tm.assert_index_equal(to_datetime([0, False, NaT, 0.0], + errors="coerce"), + DatetimeIndex([to_datetime(0), NaT, + NaT, to_datetime(0)])) + + def test_datetime_invalid_datatype(self): + # GH13176 + + with pytest.raises(TypeError): + pd.to_datetime(bool) + with pytest.raises(TypeError): + pd.to_datetime(pd.to_datetime) + + +class TestToDatetimeUnit(object): + + def test_unit(self): + # GH 11758 + # test proper behavior with errors + + with pytest.raises(ValueError): + to_datetime([1], unit='D', format='%Y%m%d') + + values = [11111111, 1, 1.0, tslib.iNaT, NaT, np.nan, + 'NaT', ''] + result = to_datetime(values, unit='D', errors='ignore') + expected = Index([11111111, Timestamp('1970-01-02'), + Timestamp('1970-01-02'), NaT, + NaT, NaT, NaT, NaT], + dtype=object) + tm.assert_index_equal(result, expected) + + result = to_datetime(values, unit='D', errors='coerce') + expected = 
DatetimeIndex(['NaT', '1970-01-02', '1970-01-02', + 'NaT', 'NaT', 'NaT', 'NaT', 'NaT']) + tm.assert_index_equal(result, expected) + + with pytest.raises(tslib.OutOfBoundsDatetime): + to_datetime(values, unit='D', errors='raise') + + values = [1420043460000, tslib.iNaT, NaT, np.nan, 'NaT'] + + result = to_datetime(values, errors='ignore', unit='s') + expected = Index([1420043460000, NaT, NaT, + NaT, NaT], dtype=object) + tm.assert_index_equal(result, expected) + + result = to_datetime(values, errors='coerce', unit='s') + expected = DatetimeIndex(['NaT', 'NaT', 'NaT', 'NaT', 'NaT']) + tm.assert_index_equal(result, expected) + + with pytest.raises(tslib.OutOfBoundsDatetime): + to_datetime(values, errors='raise', unit='s') + + # if we have a string, then we raise a ValueError + # and NOT an OutOfBoundsDatetime + for val in ['foo', Timestamp('20130101')]: + try: + to_datetime(val, errors='raise', unit='s') + except tslib.OutOfBoundsDatetime: + raise AssertionError("incorrect exception raised") + except ValueError: + pass + + def test_unit_consistency(self): + + # consistency of conversions + expected = Timestamp('1970-05-09 14:25:11') + result = pd.to_datetime(11111111, unit='s', errors='raise') + assert result == expected + assert isinstance(result, Timestamp) + + result = pd.to_datetime(11111111, unit='s', errors='coerce') + assert result == expected + assert isinstance(result, Timestamp) + + result = pd.to_datetime(11111111, unit='s', errors='ignore') + assert result == expected + assert isinstance(result, Timestamp) + + def test_unit_with_numeric(self): + + # GH 13180 + # coercions from floats/ints are ok + expected = DatetimeIndex(['2015-06-19 05:33:20', + '2015-05-27 22:33:20']) + arr1 = [1.434692e+18, 1.432766e+18] + arr2 = np.array(arr1).astype('int64') + for errors in ['ignore', 'raise', 'coerce']: + result = pd.to_datetime(arr1, errors=errors) + tm.assert_index_equal(result, expected) + + result = pd.to_datetime(arr2, errors=errors) + tm.assert_index_equal(result, expected) + + # but we want to make sure that we are coercing + # if we have ints/strings + expected = DatetimeIndex(['NaT', + '2015-06-19 05:33:20', + '2015-05-27 22:33:20']) + arr = ['foo', 1.434692e+18, 1.432766e+18] + result = pd.to_datetime(arr, errors='coerce') + tm.assert_index_equal(result, expected) + + expected = DatetimeIndex(['2015-06-19 05:33:20', + '2015-05-27 22:33:20', + 'NaT', + 'NaT']) + arr = [1.434692e+18, 1.432766e+18, 'foo', 'NaT'] + result = pd.to_datetime(arr, errors='coerce') + tm.assert_index_equal(result, expected) + + def test_unit_mixed(self): + + # mixed integers/datetimes + expected = DatetimeIndex(['2013-01-01', 'NaT', 'NaT']) + arr = [pd.Timestamp('20130101'), 1.434692e+18, 1.432766e+18] + result = pd.to_datetime(arr, errors='coerce') + tm.assert_index_equal(result, expected) + + with pytest.raises(ValueError): + pd.to_datetime(arr, errors='raise') + + expected = DatetimeIndex(['NaT', + 'NaT', + '2013-01-01']) + arr = [1.434692e+18, 1.432766e+18, pd.Timestamp('20130101')] + result = pd.to_datetime(arr, errors='coerce') + tm.assert_index_equal(result, expected) + + with pytest.raises(ValueError): + pd.to_datetime(arr, errors='raise') + + def test_dataframe(self): + + df = DataFrame({'year': [2015, 2016], + 'month': [2, 3], + 'day': [4, 5], + 'hour': [6, 7], + 'minute': [58, 59], + 'second': [10, 11], + 'ms': [1, 1], + 'us': [2, 2], + 'ns': [3, 3]}) + + result = to_datetime({'year': df['year'], + 'month': df['month'], + 'day': df['day']}) + expected = Series([Timestamp('20150204 00:00:00'), + 
Timestamp('20160305 00:00:00')]) + assert_series_equal(result, expected) + + # dict-like + result = to_datetime(df[['year', 'month', 'day']].to_dict()) + assert_series_equal(result, expected) + + # dict, with a scalar value broadcast to all rows + df2 = df[['year', 'month', 'day']].to_dict() + df2['month'] = 2 + result = to_datetime(df2) + expected2 = Series([Timestamp('20150204 00:00:00'), + Timestamp('20160205 00:00:00')]) + assert_series_equal(result, expected2) + + # unit mappings + units = [{'year': 'years', + 'month': 'months', + 'day': 'days', + 'hour': 'hours', + 'minute': 'minutes', + 'second': 'seconds'}, + {'year': 'year', + 'month': 'month', + 'day': 'day', + 'hour': 'hour', + 'minute': 'minute', + 'second': 'second'}, + ] + + for d in units: + result = to_datetime(df[list(d.keys())].rename(columns=d)) + expected = Series([Timestamp('20150204 06:58:10'), + Timestamp('20160305 07:59:11')]) + assert_series_equal(result, expected) + + d = {'year': 'year', + 'month': 'month', + 'day': 'day', + 'hour': 'hour', + 'minute': 'minute', + 'second': 'second', + 'ms': 'ms', + 'us': 'us', + 'ns': 'ns'} + + result = to_datetime(df.rename(columns=d)) + expected = Series([Timestamp('20150204 06:58:10.001002003'), + Timestamp('20160305 07:59:11.001002003')]) + assert_series_equal(result, expected) + + # string columns are coerced back as well + result = to_datetime(df.astype(str)) + assert_series_equal(result, expected) + + # passing coerce + df2 = DataFrame({'year': [2015, 2016], + 'month': [2, 20], + 'day': [4, 5]}) + with pytest.raises(ValueError): + to_datetime(df2) + result = to_datetime(df2, errors='coerce') + expected = Series([Timestamp('20150204 00:00:00'), + NaT]) + assert_series_equal(result, expected) + + # extra columns + with pytest.raises(ValueError): + df2 = df.copy() + df2['foo'] = 1 + to_datetime(df2) + + # not enough fields to assemble a date + for c in [['year'], + ['year', 'month'], + ['year', 'month', 'second'], + ['month', 'day'], + ['year', 'day', 'second']]: + with pytest.raises(ValueError): + to_datetime(df[c]) + + # duplicates + df2 = DataFrame({'year': [2015, 2016], + 'month': [2, 20], + 'day': [4, 5]}) + df2.columns = ['year', 'year', 'day'] + with pytest.raises(ValueError): + to_datetime(df2) + + df2 = DataFrame({'year': [2015, 2016], + 'month': [2, 20], + 'day': [4, 5], + 'hour': [4, 5]}) + df2.columns = ['year', 'month', 'day', 'day'] + with pytest.raises(ValueError): + to_datetime(df2) + + def test_dataframe_dtypes(self): + # #13451 + df = DataFrame({'year': [2015, 2016], + 'month': [2, 3], + 'day': [4, 5]}) + + # int16 + result = to_datetime(df.astype('int16')) + expected = Series([Timestamp('20150204 00:00:00'), + Timestamp('20160305 00:00:00')]) + assert_series_equal(result, expected) + + # mixed dtypes + df['month'] = df['month'].astype('int8') + df['day'] = df['day'].astype('int8') + result = to_datetime(df) + expected = Series([Timestamp('20150204 00:00:00'), + Timestamp('20160305 00:00:00')]) + assert_series_equal(result, expected) + + # float + df = DataFrame({'year': [2000, 2001], + 'month': [1.5, 1], + 'day': [1, 1]}) + with pytest.raises(ValueError): + to_datetime(df) + + +class TestToDatetimeMisc(object): + + def test_index_to_datetime(self): + idx = Index(['1/1/2000', '1/2/2000', '1/3/2000']) + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + result = idx.to_datetime() + expected = DatetimeIndex(pd.to_datetime(idx.values)) + tm.assert_index_equal(result, expected) + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + today = datetime.today() + idx = Index([today], 
dtype=object) + result = idx.to_datetime() + expected = DatetimeIndex([today]) + tm.assert_index_equal(result, expected) + + def test_to_datetime_iso8601(self): + result = to_datetime(["2012-01-01 00:00:00"]) + exp = Timestamp("2012-01-01 00:00:00") + assert result[0] == exp + + result = to_datetime(['20121001']) # bad iso 8601 + exp = Timestamp('2012-10-01') + assert result[0] == exp + + def test_to_datetime_default(self): + rs = to_datetime('2001') + xp = datetime(2001, 1, 1) + assert rs == xp + + # dayfirst is essentially broken + + # to_datetime('01-13-2012', dayfirst=True) + # pytest.raises(ValueError, to_datetime('01-13-2012', + # dayfirst=True)) + + def test_to_datetime_on_datetime64_series(self): + # #2699 + s = Series(date_range('1/1/2000', periods=10)) + + result = to_datetime(s) + assert result[0] == s[0] + + def test_to_datetime_with_space_in_series(self): + # GH 6428 + s = Series(['10/18/2006', '10/18/2008', ' ']) + pytest.raises(ValueError, lambda: to_datetime(s, errors='raise')) + result_coerce = to_datetime(s, errors='coerce') + expected_coerce = Series([datetime(2006, 10, 18), + datetime(2008, 10, 18), + NaT]) + tm.assert_series_equal(result_coerce, expected_coerce) + result_ignore = to_datetime(s, errors='ignore') + tm.assert_series_equal(result_ignore, s) + + def test_to_datetime_with_apply(self): + # this is only locale tested with US/None locales + tm._skip_if_has_locale() + + # GH 5195 + # with a format and coerce a single item to_datetime fails + td = Series(['May 04', 'Jun 02', 'Dec 11'], index=[1, 2, 3]) + expected = pd.to_datetime(td, format='%b %y') + result = td.apply(pd.to_datetime, format='%b %y') + assert_series_equal(result, expected) + + td = pd.Series(['May 04', 'Jun 02', ''], index=[1, 2, 3]) + pytest.raises(ValueError, + lambda: pd.to_datetime(td, format='%b %y', + errors='raise')) + pytest.raises(ValueError, + lambda: td.apply(pd.to_datetime, format='%b %y', + errors='raise')) + expected = pd.to_datetime(td, format='%b %y', errors='coerce') + + result = td.apply( + lambda x: pd.to_datetime(x, format='%b %y', errors='coerce')) + assert_series_equal(result, expected) + + def test_to_datetime_types(self): + + # empty string + result = to_datetime('') + assert result is NaT + + result = to_datetime(['', '']) + assert isna(result).all() + + # ints + result = Timestamp(0) + expected = to_datetime(0) + assert result == expected + + # GH 3888 (strings) + expected = to_datetime(['2012'])[0] + result = to_datetime('2012') + assert result == expected + + # array = ['2012','20120101','20120101 12:01:01'] + array = ['20120101', '20120101 12:01:01'] + expected = list(to_datetime(array)) + result = lmap(Timestamp, array) + tm.assert_almost_equal(result, expected) + + # currently fails ### + # result = Timestamp('2012') + # expected = to_datetime('2012') + # assert result == expected + + def test_to_datetime_unprocessable_input(self): + # GH 4928 + tm.assert_numpy_array_equal( + to_datetime([1, '1'], errors='ignore'), + np.array([1, '1'], dtype='O') + ) + pytest.raises(TypeError, to_datetime, [1, '1'], errors='raise') + + def test_to_datetime_other_datetime64_units(self): + # 5/25/2012 + scalar = np.int64(1337904000000000).view('M8[us]') + as_obj = scalar.astype('O') + + index = DatetimeIndex([scalar]) + assert index[0] == scalar.astype('O') + + value = Timestamp(scalar) + assert value == as_obj + + def test_to_datetime_list_of_integers(self): + rng = date_range('1/1/2000', periods=20) + rng = DatetimeIndex(rng.values) + + ints = list(rng.asi8) + + result = 
DatetimeIndex(ints) + + tm.assert_index_equal(rng, result) + + def test_to_datetime_freq(self): + xp = bdate_range('2000-1-1', periods=10, tz='UTC') + rs = xp.to_datetime() + assert xp.freq == rs.freq + assert xp.tzinfo == rs.tzinfo + + def test_string_na_nat_conversion(self): + # GH #999, #858 + + from pandas.compat import parse_date + + strings = np.array(['1/1/2000', '1/2/2000', np.nan, + '1/4/2000, 12:34:56'], dtype=object) + + expected = np.empty(4, dtype='M8[ns]') + for i, val in enumerate(strings): + if isna(val): + expected[i] = tslib.iNaT + else: + expected[i] = parse_date(val) + + result = tslib.array_to_datetime(strings) + tm.assert_almost_equal(result, expected) + + result2 = to_datetime(strings) + assert isinstance(result2, DatetimeIndex) + tm.assert_numpy_array_equal(result, result2.values) + + malformed = np.array(['1/100/2000', np.nan], dtype=object) + + # GH 10636, default is now 'raise' + pytest.raises(ValueError, + lambda: to_datetime(malformed, errors='raise')) + + result = to_datetime(malformed, errors='ignore') + tm.assert_numpy_array_equal(result, malformed) + + pytest.raises(ValueError, to_datetime, malformed, errors='raise') + + idx = ['a', 'b', 'c', 'd', 'e'] + series = Series(['1/1/2000', np.nan, '1/3/2000', np.nan, + '1/5/2000'], index=idx, name='foo') + dseries = Series([to_datetime('1/1/2000'), np.nan, + to_datetime('1/3/2000'), np.nan, + to_datetime('1/5/2000')], index=idx, name='foo') + + result = to_datetime(series) + dresult = to_datetime(dseries) + + expected = Series(np.empty(5, dtype='M8[ns]'), index=idx) + for i in range(5): + x = series[i] + if isna(x): + expected[i] = tslib.iNaT + else: + expected[i] = to_datetime(x) + + assert_series_equal(result, expected, check_names=False) + assert result.name == 'foo' + + assert_series_equal(dresult, expected, check_names=False) + assert dresult.name == 'foo' + + def test_dti_constructor_numpy_timeunits(self): + # GH 9114 + base = pd.to_datetime(['2000-01-01T00:00', '2000-01-02T00:00', 'NaT']) + + for dtype in ['datetime64[h]', 'datetime64[m]', 'datetime64[s]', + 'datetime64[ms]', 'datetime64[us]', 'datetime64[ns]']: + values = base.values.astype(dtype) + + tm.assert_index_equal(DatetimeIndex(values), base) + tm.assert_index_equal(to_datetime(values), base) + + def test_dayfirst(self): + # GH 5917 + arr = ['10/02/2014', '11/02/2014', '12/02/2014'] + expected = DatetimeIndex([datetime(2014, 2, 10), datetime(2014, 2, 11), + datetime(2014, 2, 12)]) + idx1 = DatetimeIndex(arr, dayfirst=True) + idx2 = DatetimeIndex(np.array(arr), dayfirst=True) + idx3 = to_datetime(arr, dayfirst=True) + idx4 = to_datetime(np.array(arr), dayfirst=True) + idx5 = DatetimeIndex(Index(arr), dayfirst=True) + idx6 = DatetimeIndex(Series(arr), dayfirst=True) + tm.assert_index_equal(expected, idx1) + tm.assert_index_equal(expected, idx2) + tm.assert_index_equal(expected, idx3) + tm.assert_index_equal(expected, idx4) + tm.assert_index_equal(expected, idx5) + tm.assert_index_equal(expected, idx6) + + +class TestGuessDatetimeFormat(object): + + def test_guess_datetime_format_with_parseable_formats(self): + tm._skip_if_not_us_locale() + dt_string_to_format = (('20111230', '%Y%m%d'), + ('2011-12-30', '%Y-%m-%d'), + ('30-12-2011', '%d-%m-%Y'), + ('2011-12-30 00:00:00', '%Y-%m-%d %H:%M:%S'), + ('2011-12-30T00:00:00', '%Y-%m-%dT%H:%M:%S'), + ('2011-12-30 00:00:00.000000', + '%Y-%m-%d %H:%M:%S.%f'), ) + + for dt_string, dt_format in dt_string_to_format: + assert tools._guess_datetime_format(dt_string) == dt_format + + def 
test_guess_datetime_format_with_dayfirst(self): + ambiguous_string = '01/01/2011' + assert tools._guess_datetime_format( + ambiguous_string, dayfirst=True) == '%d/%m/%Y' + assert tools._guess_datetime_format( + ambiguous_string, dayfirst=False) == '%m/%d/%Y' + + def test_guess_datetime_format_with_locale_specific_formats(self): + # The month names will vary depending on the locale, in which + # case these won't be parsed properly (dateutil can't parse them) + tm._skip_if_has_locale() + + dt_string_to_format = (('30/Dec/2011', '%d/%b/%Y'), + ('30/December/2011', '%d/%B/%Y'), + ('30/Dec/2011 00:00:00', '%d/%b/%Y %H:%M:%S'), ) + + for dt_string, dt_format in dt_string_to_format: + assert tools._guess_datetime_format(dt_string) == dt_format + + def test_guess_datetime_format_invalid_inputs(self): + # A datetime string must include a year, month and a day for it + # to be guessable, in addition to being a string that looks like + # a datetime + invalid_dts = [ + '2013', + '01/2013', + '12:00:00', + '1/1/1/1', + 'this_is_not_a_datetime', + '51a', + 9, + datetime(2011, 1, 1), + ] + + for invalid_dt in invalid_dts: + assert tools._guess_datetime_format(invalid_dt) is None + + def test_guess_datetime_format_nopadding(self): + # GH 11142 + dt_string_to_format = (('2011-1-1', '%Y-%m-%d'), + ('30-1-2011', '%d-%m-%Y'), + ('1/1/2011', '%m/%d/%Y'), + ('2011-1-1 00:00:00', '%Y-%m-%d %H:%M:%S'), + ('2011-1-1 0:0:0', '%Y-%m-%d %H:%M:%S'), + ('2011-1-3T00:00:0', '%Y-%m-%dT%H:%M:%S')) + + for dt_string, dt_format in dt_string_to_format: + assert tools._guess_datetime_format(dt_string) == dt_format + + def test_guess_datetime_format_for_array(self): + tm._skip_if_not_us_locale() + expected_format = '%Y-%m-%d %H:%M:%S.%f' + dt_string = datetime(2011, 12, 30, 0, 0, 0).strftime(expected_format) + + test_arrays = [ + np.array([dt_string, dt_string, dt_string], dtype='O'), + np.array([np.nan, np.nan, dt_string], dtype='O'), + np.array([dt_string, 'random_string'], dtype='O'), + ] + + for test_array in test_arrays: + assert tools._guess_datetime_format_for_array( + test_array) == expected_format + + format_for_string_of_nans = tools._guess_datetime_format_for_array( + np.array( + [np.nan, np.nan, np.nan], dtype='O')) + assert format_for_string_of_nans is None + + +class TestToDatetimeInferFormat(object): + + def test_to_datetime_infer_datetime_format_consistent_format(self): + s = pd.Series(pd.date_range('20000101', periods=50, freq='H')) + + test_formats = ['%m-%d-%Y', '%m/%d/%Y %H:%M:%S.%f', + '%Y-%m-%dT%H:%M:%S.%f'] + + for test_format in test_formats: + s_as_dt_strings = s.apply(lambda x: x.strftime(test_format)) + + with_format = pd.to_datetime(s_as_dt_strings, format=test_format) + no_infer = pd.to_datetime(s_as_dt_strings, + infer_datetime_format=False) + yes_infer = pd.to_datetime(s_as_dt_strings, + infer_datetime_format=True) + + # Whether the format is explicitly passed, inferred, or not + # inferred, the results should all be the same + tm.assert_series_equal(with_format, no_infer) + tm.assert_series_equal(no_infer, yes_infer) + + def test_to_datetime_infer_datetime_format_inconsistent_format(self): + s = pd.Series(np.array(['01/01/2011 00:00:00', + '01-02-2011 00:00:00', + '2011-01-03T00:00:00'])) + + # When the format is inconsistent, infer_datetime_format should just + # fall back to the default parsing + tm.assert_series_equal(pd.to_datetime(s, infer_datetime_format=False), + pd.to_datetime(s, infer_datetime_format=True)) + + s = pd.Series(np.array(['Jan/01/2011', 'Feb/01/2011', 
'Mar/01/2011'])) + + tm.assert_series_equal(pd.to_datetime(s, infer_datetime_format=False), + pd.to_datetime(s, infer_datetime_format=True)) + + def test_to_datetime_infer_datetime_format_series_with_nans(self): + s = pd.Series(np.array(['01/01/2011 00:00:00', np.nan, + '01/03/2011 00:00:00', np.nan])) + tm.assert_series_equal(pd.to_datetime(s, infer_datetime_format=False), + pd.to_datetime(s, infer_datetime_format=True)) + + def test_to_datetime_infer_datetime_format_series_starting_with_nans(self): + s = pd.Series(np.array([np.nan, np.nan, '01/01/2011 00:00:00', + '01/02/2011 00:00:00', '01/03/2011 00:00:00'])) + + tm.assert_series_equal(pd.to_datetime(s, infer_datetime_format=False), + pd.to_datetime(s, infer_datetime_format=True)) + + def test_to_datetime_iso8601_noleading_0s(self): + # GH 11871 + s = pd.Series(['2014-1-1', '2014-2-2', '2015-3-3']) + expected = pd.Series([pd.Timestamp('2014-01-01'), + pd.Timestamp('2014-02-02'), + pd.Timestamp('2015-03-03')]) + tm.assert_series_equal(pd.to_datetime(s), expected) + tm.assert_series_equal(pd.to_datetime(s, format='%Y-%m-%d'), expected) + + +class TestDaysInMonth(object): + # tests for issue #10154 + + def test_day_not_in_month_coerce(self): + assert isna(to_datetime('2015-02-29', errors='coerce')) + assert isna(to_datetime('2015-02-29', format="%Y-%m-%d", + errors='coerce')) + assert isna(to_datetime('2015-02-32', format="%Y-%m-%d", + errors='coerce')) + assert isna(to_datetime('2015-04-31', format="%Y-%m-%d", + errors='coerce')) + + def test_day_not_in_month_raise(self): + pytest.raises(ValueError, to_datetime, '2015-02-29', + errors='raise') + pytest.raises(ValueError, to_datetime, '2015-02-29', + errors='raise', format="%Y-%m-%d") + pytest.raises(ValueError, to_datetime, '2015-02-32', + errors='raise', format="%Y-%m-%d") + pytest.raises(ValueError, to_datetime, '2015-04-31', + errors='raise', format="%Y-%m-%d") + + def test_day_not_in_month_ignore(self): + assert to_datetime('2015-02-29', errors='ignore') == '2015-02-29' + assert to_datetime('2015-02-29', errors='ignore', + format="%Y-%m-%d") == '2015-02-29' + assert to_datetime('2015-02-32', errors='ignore', + format="%Y-%m-%d") == '2015-02-32' + assert to_datetime('2015-04-31', errors='ignore', + format="%Y-%m-%d") == '2015-04-31' + + +class TestDatetimeParsingWrappers(object): + def test_does_not_convert_mixed_integer(self): + bad_date_strings = ('-50000', '999', '123.1234', 'm', 'T') + + for bad_date_string in bad_date_strings: + assert not tslib._does_string_look_like_datetime(bad_date_string) + + good_date_strings = ('2012-01-01', + '01/01/2012', + 'Mon Sep 16, 2013', + '01012012', + '0101', + '1-1', ) + + for good_date_string in good_date_strings: + assert tslib._does_string_look_like_datetime(good_date_string) + + def test_parsers(self): + + # https://github.com/dateutil/dateutil/issues/217 + import dateutil + yearfirst = dateutil.__version__ >= LooseVersion('2.5.0') + + cases = {'2011-01-01': datetime(2011, 1, 1), + '2Q2005': datetime(2005, 4, 1), + '2Q05': datetime(2005, 4, 1), + '2005Q1': datetime(2005, 1, 1), + '05Q1': datetime(2005, 1, 1), + '2011Q3': datetime(2011, 7, 1), + '11Q3': datetime(2011, 7, 1), + '3Q2011': datetime(2011, 7, 1), + '3Q11': datetime(2011, 7, 1), + + # quarterly without space + '2000Q4': datetime(2000, 10, 1), + '00Q4': datetime(2000, 10, 1), + '4Q2000': datetime(2000, 10, 1), + '4Q00': datetime(2000, 10, 1), + '2000q4': datetime(2000, 10, 1), + '2000-Q4': datetime(2000, 10, 1), + '00-Q4': datetime(2000, 10, 1), + '4Q-2000': datetime(2000, 10, 1), + 
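+                 # two-digit-year variants of the same quarter 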
'4Q-00': datetime(2000, 10, 1), + '00q4': datetime(2000, 10, 1), + '2005': datetime(2005, 1, 1), + '2005-11': datetime(2005, 11, 1), + '2005 11': datetime(2005, 11, 1), + '11-2005': datetime(2005, 11, 1), + '11 2005': datetime(2005, 11, 1), + '200511': datetime(2020, 5, 11), # parsed as YYMMDD by dateutil + '20051109': datetime(2005, 11, 9), + '20051109 10:15': datetime(2005, 11, 9, 10, 15), + '20051109 08H': datetime(2005, 11, 9, 8, 0), + '2005-11-09 10:15': datetime(2005, 11, 9, 10, 15), + '2005-11-09 08H': datetime(2005, 11, 9, 8, 0), + '2005/11/09 10:15': datetime(2005, 11, 9, 10, 15), + '2005/11/09 08H': datetime(2005, 11, 9, 8, 0), + "Thu Sep 25 10:36:28 2003": datetime(2003, 9, 25, 10, + 36, 28), + "Thu Sep 25 2003": datetime(2003, 9, 25), + "Sep 25 2003": datetime(2003, 9, 25), + "January 1 2014": datetime(2014, 1, 1), + + # GH 10537 + '2014-06': datetime(2014, 6, 1), + '06-2014': datetime(2014, 6, 1), + '2014-6': datetime(2014, 6, 1), + '6-2014': datetime(2014, 6, 1), + + '20010101 12': datetime(2001, 1, 1, 12), + '20010101 1234': datetime(2001, 1, 1, 12, 34), + '20010101 123456': datetime(2001, 1, 1, 12, 34, 56), + } + + for date_str, expected in compat.iteritems(cases): + result1, _, _ = tools.parse_time_string(date_str, + yearfirst=yearfirst) + result2 = to_datetime(date_str, yearfirst=yearfirst) + result3 = to_datetime([date_str], yearfirst=yearfirst) + # result5 is used below + result4 = to_datetime(np.array([date_str], dtype=object), + yearfirst=yearfirst) + result6 = DatetimeIndex([date_str], yearfirst=yearfirst) + # result7 is used below + result8 = DatetimeIndex(Index([date_str]), yearfirst=yearfirst) + result9 = DatetimeIndex(Series([date_str]), yearfirst=yearfirst) + + for res in [result1, result2]: + assert res == expected + for res in [result3, result4, result6, result8, result9]: + exp = DatetimeIndex([pd.Timestamp(expected)]) + tm.assert_index_equal(res, exp) + + # these really need yearfirst, but we don't support it + if not yearfirst: + result5 = Timestamp(date_str) + assert result5 == expected + result7 = date_range(date_str, freq='S', periods=1, + yearfirst=yearfirst) + assert result7 == expected + + # NaT + result1, _, _ = tools.parse_time_string('NaT') + result2 = to_datetime('NaT') + result3 = Timestamp('NaT') + result4 = DatetimeIndex(['NaT'])[0] + assert result1 is tslib.NaT + assert result2 is tslib.NaT + assert result3 is tslib.NaT + assert result4 is tslib.NaT + + def test_parsers_quarter_invalid(self): + + cases = ['2Q 2005', '2Q-200A', '2Q-200', '22Q2005', '6Q-20', '2Q200.'] + for case in cases: + pytest.raises(ValueError, tools.parse_time_string, case) + + def test_parsers_dayfirst_yearfirst(self): + # OK + # 2.5.1 10-11-12 [dayfirst=0, yearfirst=0] -> 2012-10-11 00:00:00 + # 2.5.2 10-11-12 [dayfirst=0, yearfirst=0] -> 2012-10-11 00:00:00 + # 2.5.3 10-11-12 [dayfirst=0, yearfirst=0] -> 2012-10-11 00:00:00 + + # OK + # 2.5.1 10-11-12 [dayfirst=0, yearfirst=1] -> 2010-11-12 00:00:00 + # 2.5.2 10-11-12 [dayfirst=0, yearfirst=1] -> 2010-11-12 00:00:00 + # 2.5.3 10-11-12 [dayfirst=0, yearfirst=1] -> 2010-11-12 00:00:00 + + # bug fix in 2.5.2 + # 2.5.1 10-11-12 [dayfirst=1, yearfirst=1] -> 2010-11-12 00:00:00 + # 2.5.2 10-11-12 [dayfirst=1, yearfirst=1] -> 2010-12-11 00:00:00 + # 2.5.3 10-11-12 [dayfirst=1, yearfirst=1] -> 2010-12-11 00:00:00 + + # OK + # 2.5.1 10-11-12 [dayfirst=1, yearfirst=0] -> 2012-11-10 00:00:00 + # 2.5.2 10-11-12 [dayfirst=1, yearfirst=0] -> 2012-11-10 00:00:00 + # 2.5.3 10-11-12 [dayfirst=1, yearfirst=0] -> 2012-11-10 00:00:00 + + # OK + # 2.5.1 20/12/21 
[dayfirst=0, yearfirst=0] -> 2021-12-20 00:00:00 + # 2.5.2 20/12/21 [dayfirst=0, yearfirst=0] -> 2021-12-20 00:00:00 + # 2.5.3 20/12/21 [dayfirst=0, yearfirst=0] -> 2021-12-20 00:00:00 + + # OK + # 2.5.1 20/12/21 [dayfirst=0, yearfirst=1] -> 2020-12-21 00:00:00 + # 2.5.2 20/12/21 [dayfirst=0, yearfirst=1] -> 2020-12-21 00:00:00 + # 2.5.3 20/12/21 [dayfirst=0, yearfirst=1] -> 2020-12-21 00:00:00 + + # revert of bug in 2.5.2 + # 2.5.1 20/12/21 [dayfirst=1, yearfirst=1] -> 2020-12-21 00:00:00 + # 2.5.2 20/12/21 [dayfirst=1, yearfirst=1] -> month must be in 1..12 + # 2.5.3 20/12/21 [dayfirst=1, yearfirst=1] -> 2020-12-21 00:00:00 + + # OK + # 2.5.1 20/12/21 [dayfirst=1, yearfirst=0] -> 2021-12-20 00:00:00 + # 2.5.2 20/12/21 [dayfirst=1, yearfirst=0] -> 2021-12-20 00:00:00 + # 2.5.3 20/12/21 [dayfirst=1, yearfirst=0] -> 2021-12-20 00:00:00 + + is_lt_253 = dateutil.__version__ < LooseVersion('2.5.3') + + # str : dayfirst, yearfirst, expected + cases = {'10-11-12': [(False, False, + datetime(2012, 10, 11)), + (True, False, + datetime(2012, 11, 10)), + (False, True, + datetime(2010, 11, 12)), + (True, True, + datetime(2010, 12, 11))], + '20/12/21': [(False, False, + datetime(2021, 12, 20)), + (True, False, + datetime(2021, 12, 20)), + (False, True, + datetime(2020, 12, 21)), + (True, True, + datetime(2020, 12, 21))]} + + for date_str, values in compat.iteritems(cases): + for dayfirst, yearfirst, expected in values: + + # odd comparisons across versions + # let's just skip + if dayfirst and yearfirst and is_lt_253: + continue + + # compare with dateutil result + dateutil_result = parse(date_str, dayfirst=dayfirst, + yearfirst=yearfirst) + assert dateutil_result == expected + + result1, _, _ = tools.parse_time_string(date_str, + dayfirst=dayfirst, + yearfirst=yearfirst) + + # we don't support dayfirst/yearfirst here: + if not dayfirst and not yearfirst: + result2 = Timestamp(date_str) + assert result2 == expected + + result3 = to_datetime(date_str, dayfirst=dayfirst, + yearfirst=yearfirst) + + result4 = DatetimeIndex([date_str], dayfirst=dayfirst, + yearfirst=yearfirst)[0] + + assert result1 == expected + assert result3 == expected + assert result4 == expected + + def test_parsers_timestring(self): + # must be the same as dateutil result + cases = {'10:15': (parse('10:15'), datetime(1, 1, 1, 10, 15)), + '9:05': (parse('9:05'), datetime(1, 1, 1, 9, 5))} + + for date_str, (exp_now, exp_def) in compat.iteritems(cases): + result1, _, _ = tools.parse_time_string(date_str) + result2 = to_datetime(date_str) + result3 = to_datetime([date_str]) + result4 = Timestamp(date_str) + result5 = DatetimeIndex([date_str])[0] + # parse_time_string() returns the time on the default date; the + # others use today's date, and this can't be changed because it + # is relied upon by time series plotting + assert result1 == exp_def + assert result2 == exp_now + assert result3 == exp_now + assert result4 == exp_now + assert result5 == exp_now + + def test_parsers_time(self): + # GH11818 + _skip_if_has_locale() + strings = ["14:15", "1415", "2:15pm", "0215pm", "14:15:00", "141500", + "2:15:00pm", "021500pm", time(14, 15)] + expected = time(14, 15) + + for time_string in strings: + assert tools.to_time(time_string) == expected + + new_string = "14.15" + pytest.raises(ValueError, tools.to_time, new_string) + assert tools.to_time(new_string, format="%H.%M") == expected + + arg = ["14:15", "20:20"] + expected_arr = [time(14, 15), time(20, 20)] + assert tools.to_time(arg) == expected_arr + assert tools.to_time(arg, format="%H:%M") == expected_arr + assert 
tools.to_time(arg, infer_time_format=True) == expected_arr + assert tools.to_time(arg, format="%I:%M%p", + errors="coerce") == [None, None] + + res = tools.to_time(arg, format="%I:%M%p", errors="ignore") + tm.assert_numpy_array_equal(res, np.array(arg, dtype=np.object_)) + + with pytest.raises(ValueError): + tools.to_time(arg, format="%I:%M%p", errors="raise") + + tm.assert_series_equal(tools.to_time(Series(arg, name="test")), + Series(expected_arr, name="test")) + + res = tools.to_time(np.array(arg)) + assert isinstance(res, list) + assert res == expected_arr + + def test_parsers_monthfreq(self): + cases = {'201101': datetime(2011, 1, 1, 0, 0), + '200005': datetime(2000, 5, 1, 0, 0)} + + for date_str, expected in compat.iteritems(cases): + result1, _, _ = tools.parse_time_string(date_str, freq='M') + assert result1 == expected + + def test_parsers_quarterly_with_freq(self): + msg = ('Incorrect quarterly string is given, quarter ' + 'must be between 1 and 4: 2013Q5') + with tm.assert_raises_regex(tslib.DateParseError, msg): + tools.parse_time_string('2013Q5') + + # GH 5418 + msg = ('Unable to retrieve month information from given freq: ' + 'INVLD-L-DEC-SAT') + with tm.assert_raises_regex(tslib.DateParseError, msg): + tools.parse_time_string('2013Q1', freq='INVLD-L-DEC-SAT') + + cases = {('2013Q2', None): datetime(2013, 4, 1), + ('2013Q2', 'A-APR'): datetime(2012, 8, 1), + ('2013-Q2', 'A-DEC'): datetime(2013, 4, 1)} + + for (date_str, freq), exp in compat.iteritems(cases): + result, _, _ = tools.parse_time_string(date_str, freq=freq) + assert result == exp + + def test_parsers_timezone_minute_offsets_roundtrip(self): + # GH11708 + base = to_datetime("2013-01-01 00:00:00") + dt_strings = [ + ('2013-01-01 05:45+0545', + "Asia/Katmandu", + "Timestamp('2013-01-01 05:45:00+0545', tz='Asia/Katmandu')"), + ('2013-01-01 05:30+0530', + "Asia/Kolkata", + "Timestamp('2013-01-01 05:30:00+0530', tz='Asia/Kolkata')") + ] + + for dt_string, tz, dt_string_repr in dt_strings: + dt_time = to_datetime(dt_string) + assert base == dt_time + converted_time = dt_time.tz_localize('UTC').tz_convert(tz) + assert dt_string_repr == repr(converted_time) + + def test_parsers_iso8601(self): + # GH 12060 + # test only the iso parser - flexibility to different + # separators and leading 0s + # Timestamp construction falls back to dateutil + cases = {'2011-01-02': datetime(2011, 1, 2), + '2011-1-2': datetime(2011, 1, 2), + '2011-01': datetime(2011, 1, 1), + '2011-1': datetime(2011, 1, 1), + '2011 01 02': datetime(2011, 1, 2), + '2011.01.02': datetime(2011, 1, 2), + '2011/01/02': datetime(2011, 1, 2), + '2011\\01\\02': datetime(2011, 1, 2), + '2013-01-01 05:30:00': datetime(2013, 1, 1, 5, 30), + '2013-1-1 5:30:00': datetime(2013, 1, 1, 5, 30)} + for date_str, exp in compat.iteritems(cases): + actual = tslib._test_parse_iso8601(date_str) + assert actual == exp + + # separators must all match - YYYYMM not valid + invalid_cases = ['2011-01/02', '2011^11^11', + '201401', '201111', '200101', + # mixed separated and unseparated + '2005-0101', '200501-01', + '20010101 12:3456', '20010101 1234:56', + # HHMMSS must have two digits in each component + # if unseparated + '20010101 1', '20010101 123', '20010101 12345', + '20010101 12345Z', + # wrong separator for HHMMSS + '2001-01-01 12-34-56'] + for date_str in invalid_cases: + with pytest.raises(ValueError): + tslib._test_parse_iso8601(date_str) + # if no ValueError was raised, surface the failing case 
+ raise Exception(date_str) + + +class TestArrayToDatetime(object): + + def test_try_parse_dates(self): + arr = np.array(['5/1/2000', '6/1/2000', '7/1/2000'], dtype=object) + + result = lib.try_parse_dates(arr, dayfirst=True) + expected = [parse(d, dayfirst=True) for d in arr] + assert np.array_equal(result, expected) + + def test_parsing_valid_dates(self): + arr = np.array(['01-01-2013', '01-02-2013'], dtype=object) + tm.assert_numpy_array_equal( + tslib.array_to_datetime(arr), + np_array_datetime64_compat( + [ + '2013-01-01T00:00:00.000000000-0000', + '2013-01-02T00:00:00.000000000-0000' + ], + dtype='M8[ns]' + ) + ) + + arr = np.array(['Mon Sep 16 2013', 'Tue Sep 17 2013'], dtype=object) + tm.assert_numpy_array_equal( + tslib.array_to_datetime(arr), + np_array_datetime64_compat( + [ + '2013-09-16T00:00:00.000000000-0000', + '2013-09-17T00:00:00.000000000-0000' + ], + dtype='M8[ns]' + ) + ) + + def test_parsing_timezone_offsets(self): + # All of these datetime strings with offsets are equivalent + # to the same datetime after the timezone offset is added + dt_strings = [ + '01-01-2013 08:00:00+08:00', + '2013-01-01T08:00:00.000000000+0800', + '2012-12-31T16:00:00.000000000-0800', + '12-31-2012 23:00:00-01:00' + ] + + expected_output = tslib.array_to_datetime(np.array( + ['01-01-2013 00:00:00'], dtype=object)) + + for dt_string in dt_strings: + tm.assert_numpy_array_equal( + tslib.array_to_datetime( + np.array([dt_string], dtype=object) + ), + expected_output + ) + + def test_number_looking_strings_not_into_datetime(self): + # #4601 + # These strings don't look like datetimes, so no conversion + # should be attempted + arr = np.array(['-352.737091', '183.575577'], dtype=object) + tm.assert_numpy_array_equal( + tslib.array_to_datetime(arr, errors='ignore'), arr) + + arr = np.array(['1', '2', '3', '4', '5'], dtype=object) + tm.assert_numpy_array_equal( + tslib.array_to_datetime(arr, errors='ignore'), arr) + + def test_coercing_dates_outside_of_datetime64_ns_bounds(self): + invalid_dates = [ + date(1000, 1, 1), + datetime(1000, 1, 1), + '1000-01-01', + 'Jan 1, 1000', + np.datetime64('1000-01-01'), + ] + + for invalid_date in invalid_dates: + pytest.raises(ValueError, + tslib.array_to_datetime, + np.array([invalid_date], dtype='object'), + errors='raise', ) + tm.assert_numpy_array_equal( + tslib.array_to_datetime( + np.array([invalid_date], dtype='object'), + errors='coerce'), + np.array([tslib.iNaT], dtype='M8[ns]') + ) + + arr = np.array(['1/1/1000', '1/1/2000'], dtype=object) + tm.assert_numpy_array_equal( + tslib.array_to_datetime(arr, errors='coerce'), + np_array_datetime64_compat( + [ + tslib.iNaT, + '2000-01-01T00:00:00.000000000-0000' + ], + dtype='M8[ns]' + ) + ) + + def test_coerce_of_invalid_datetimes(self): + arr = np.array(['01-01-2013', 'not_a_date', '1'], dtype=object) + + # Without coercing, the presence of any invalid dates prevents + # any values from being converted + tm.assert_numpy_array_equal( + tslib.array_to_datetime(arr, errors='ignore'), arr) + + # With coercing, the invalid dates become iNaT + tm.assert_numpy_array_equal( + tslib.array_to_datetime(arr, errors='coerce'), + np_array_datetime64_compat( + [ + '2013-01-01T00:00:00.000000000-0000', + tslib.iNaT, + tslib.iNaT + ], + dtype='M8[ns]' + ) + ) + + +def test_normalize_date(): + value = date(2012, 9, 7) + + result = normalize_date(value) + assert (result == datetime(2012, 9, 7)) + + value = datetime(2012, 9, 7, 12) + + result = normalize_date(value) + assert (result == datetime(2012, 9, 7)) + + 
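+# The fixtures below feed TestOrigin: numeric inputs to to_datetime are +# interpreted as offsets of `unit` from `origin`, so, as test_epoch +# verifies, pd.to_datetime([0, 1, 2], unit='D', origin='1960-01-01') +# gives Timestamps for 1960-01-01 through 1960-01-03. + 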
+@pytest.fixture(params=['D', 's', 'ms', 'us', 'ns']) +def units(request): + return request.param + + +@pytest.fixture +def epoch_1960(): + # for origin as 1960-01-01 + return Timestamp('1960-01-01') + + +@pytest.fixture +def units_from_epochs(): + return list(range(5)) + + +@pytest.fixture(params=[epoch_1960(), + epoch_1960().to_pydatetime(), + epoch_1960().to_datetime64(), + str(epoch_1960())]) +def epochs(request): + return request.param + + +@pytest.fixture +def julian_dates(): + return pd.date_range('2014-1-1', periods=10).to_julian_date().values + + +class TestOrigin(object): + + def test_to_basic(self, julian_dates): + # gh-11276, gh-11745 + # for origin as julian + + result = Series(pd.to_datetime( + julian_dates, unit='D', origin='julian')) + expected = Series(pd.to_datetime( + julian_dates - pd.Timestamp(0).to_julian_date(), unit='D')) + assert_series_equal(result, expected) + + result = Series(pd.to_datetime( + [0, 1, 2], unit='D', origin='unix')) + expected = Series([Timestamp('1970-01-01'), + Timestamp('1970-01-02'), + Timestamp('1970-01-03')]) + assert_series_equal(result, expected) + + # default + result = Series(pd.to_datetime( + [0, 1, 2], unit='D')) + expected = Series([Timestamp('1970-01-01'), + Timestamp('1970-01-02'), + Timestamp('1970-01-03')]) + assert_series_equal(result, expected) + + def test_julian_round_trip(self): + result = pd.to_datetime(2456658, origin='julian', unit='D') + assert result.to_julian_date() == 2456658 + + # out-of-bounds + with pytest.raises(ValueError): + pd.to_datetime(1, origin="julian", unit='D') + + def test_invalid_unit(self, units, julian_dates): + + # checking for invalid combination of origin='julian' and unit != D + if units != 'D': + with pytest.raises(ValueError): + pd.to_datetime(julian_dates, unit=units, origin='julian') + + def test_invalid_origin(self): + + # need to have a numeric specified + with pytest.raises(ValueError): + pd.to_datetime("2005-01-01", origin="1960-01-01") + + with pytest.raises(ValueError): + pd.to_datetime("2005-01-01", origin="1960-01-01", unit='D') + + def test_epoch(self, units, epochs, epoch_1960, units_from_epochs): + + expected = Series( + [pd.Timedelta(x, unit=units) + + epoch_1960 for x in units_from_epochs]) + + result = Series(pd.to_datetime( + units_from_epochs, unit=units, origin=epochs)) + assert_series_equal(result, expected) + + @pytest.mark.parametrize("origin, exc", + [('random_string', ValueError), + ('epoch', ValueError), + ('13-24-1990', ValueError), + (datetime(1, 1, 1), tslib.OutOfBoundsDatetime)]) + def test_invalid_origins(self, origin, exc, units, units_from_epochs): + + with pytest.raises(exc): + pd.to_datetime(units_from_epochs, unit=units, + origin=origin) + + def test_invalid_origins_tzinfo(self): + # GH16842 + with pytest.raises(ValueError): + pd.to_datetime(1, unit='D', + origin=datetime(2000, 1, 1, tzinfo=pytz.utc)) + + def test_processing_order(self): + # make sure we handle out-of-bounds *before* + # constructing the dates + + result = pd.to_datetime(200 * 365, unit='D') + expected = Timestamp('2169-11-13 00:00:00') + assert result == expected + + result = pd.to_datetime(200 * 365, unit='D', origin='1870-01-01') + expected = Timestamp('2069-11-13 00:00:00') + assert result == expected + + result = pd.to_datetime(300 * 365, unit='D', origin='1870-01-01') + expected = Timestamp('2169-10-20 00:00:00') + assert result == expected diff --git a/pandas/tools/tests/__init__.py b/pandas/tests/indexes/period/__init__.py similarity index 100% rename from pandas/tools/tests/__init__.py 
rename to pandas/tests/indexes/period/__init__.py diff --git a/pandas/tests/indexes/period/test_asfreq.py b/pandas/tests/indexes/period/test_asfreq.py new file mode 100644 index 0000000000000..c8724b2a3bc91 --- /dev/null +++ b/pandas/tests/indexes/period/test_asfreq.py @@ -0,0 +1,155 @@ +import pytest + +import numpy as np +import pandas as pd +from pandas.util import testing as tm +from pandas import PeriodIndex, Series, DataFrame + + +class TestPeriodIndex(object): + + def setup_method(self, method): + pass + + def test_asfreq(self): + pi1 = PeriodIndex(freq='A', start='1/1/2001', end='1/1/2001') + pi2 = PeriodIndex(freq='Q', start='1/1/2001', end='1/1/2001') + pi3 = PeriodIndex(freq='M', start='1/1/2001', end='1/1/2001') + pi4 = PeriodIndex(freq='D', start='1/1/2001', end='1/1/2001') + pi5 = PeriodIndex(freq='H', start='1/1/2001', end='1/1/2001 00:00') + pi6 = PeriodIndex(freq='Min', start='1/1/2001', end='1/1/2001 00:00') + pi7 = PeriodIndex(freq='S', start='1/1/2001', end='1/1/2001 00:00:00') + + assert pi1.asfreq('Q', 'S') == pi2 + assert pi1.asfreq('Q', 's') == pi2 + assert pi1.asfreq('M', 'start') == pi3 + assert pi1.asfreq('D', 'StarT') == pi4 + assert pi1.asfreq('H', 'beGIN') == pi5 + assert pi1.asfreq('Min', 'S') == pi6 + assert pi1.asfreq('S', 'S') == pi7 + + assert pi2.asfreq('A', 'S') == pi1 + assert pi2.asfreq('M', 'S') == pi3 + assert pi2.asfreq('D', 'S') == pi4 + assert pi2.asfreq('H', 'S') == pi5 + assert pi2.asfreq('Min', 'S') == pi6 + assert pi2.asfreq('S', 'S') == pi7 + + assert pi3.asfreq('A', 'S') == pi1 + assert pi3.asfreq('Q', 'S') == pi2 + assert pi3.asfreq('D', 'S') == pi4 + assert pi3.asfreq('H', 'S') == pi5 + assert pi3.asfreq('Min', 'S') == pi6 + assert pi3.asfreq('S', 'S') == pi7 + + assert pi4.asfreq('A', 'S') == pi1 + assert pi4.asfreq('Q', 'S') == pi2 + assert pi4.asfreq('M', 'S') == pi3 + assert pi4.asfreq('H', 'S') == pi5 + assert pi4.asfreq('Min', 'S') == pi6 + assert pi4.asfreq('S', 'S') == pi7 + + assert pi5.asfreq('A', 'S') == pi1 + assert pi5.asfreq('Q', 'S') == pi2 + assert pi5.asfreq('M', 'S') == pi3 + assert pi5.asfreq('D', 'S') == pi4 + assert pi5.asfreq('Min', 'S') == pi6 + assert pi5.asfreq('S', 'S') == pi7 + + assert pi6.asfreq('A', 'S') == pi1 + assert pi6.asfreq('Q', 'S') == pi2 + assert pi6.asfreq('M', 'S') == pi3 + assert pi6.asfreq('D', 'S') == pi4 + assert pi6.asfreq('H', 'S') == pi5 + assert pi6.asfreq('S', 'S') == pi7 + + assert pi7.asfreq('A', 'S') == pi1 + assert pi7.asfreq('Q', 'S') == pi2 + assert pi7.asfreq('M', 'S') == pi3 + assert pi7.asfreq('D', 'S') == pi4 + assert pi7.asfreq('H', 'S') == pi5 + assert pi7.asfreq('Min', 'S') == pi6 + + pytest.raises(ValueError, pi7.asfreq, 'T', 'foo') + result1 = pi1.asfreq('3M') + result2 = pi1.asfreq('M') + expected = PeriodIndex(freq='M', start='2001-12', end='2001-12') + tm.assert_numpy_array_equal(result1.asi8, expected.asi8) + assert result1.freqstr == '3M' + tm.assert_numpy_array_equal(result2.asi8, expected.asi8) + assert result2.freqstr == 'M' + + def test_asfreq_nat(self): + idx = PeriodIndex(['2011-01', '2011-02', 'NaT', '2011-04'], freq='M') + result = idx.asfreq(freq='Q') + expected = PeriodIndex(['2011Q1', '2011Q1', 'NaT', '2011Q2'], freq='Q') + tm.assert_index_equal(result, expected) + + def test_asfreq_mult_pi(self): + pi = PeriodIndex(['2001-01', '2001-02', 'NaT', '2001-03'], freq='2M') + + for freq in ['D', '3D']: + result = pi.asfreq(freq) + exp = PeriodIndex(['2001-02-28', '2001-03-31', 'NaT', + '2001-04-30'], freq=freq) + tm.assert_index_equal(result, exp) + assert 
result.freq == exp.freq + + result = pi.asfreq(freq, how='S') + exp = PeriodIndex(['2001-01-01', '2001-02-01', 'NaT', + '2001-03-01'], freq=freq) + tm.assert_index_equal(result, exp) + assert result.freq == exp.freq + + def test_asfreq_combined_pi(self): + pi = pd.PeriodIndex(['2001-01-01 00:00', '2001-01-02 02:00', 'NaT'], + freq='H') + exp = PeriodIndex(['2001-01-01 00:00', '2001-01-02 02:00', 'NaT'], + freq='25H') + for freq, how in zip(['1D1H', '1H1D'], ['S', 'E']): + result = pi.asfreq(freq, how=how) + tm.assert_index_equal(result, exp) + assert result.freq == exp.freq + + for freq in ['1D1H', '1H1D']: + pi = pd.PeriodIndex(['2001-01-01 00:00', '2001-01-02 02:00', + 'NaT'], freq=freq) + result = pi.asfreq('H') + exp = PeriodIndex(['2001-01-02 00:00', '2001-01-03 02:00', 'NaT'], + freq='H') + tm.assert_index_equal(result, exp) + assert result.freq == exp.freq + + pi = pd.PeriodIndex(['2001-01-01 00:00', '2001-01-02 02:00', + 'NaT'], freq=freq) + result = pi.asfreq('H', how='S') + exp = PeriodIndex(['2001-01-01 00:00', '2001-01-02 02:00', 'NaT'], + freq='H') + tm.assert_index_equal(result, exp) + assert result.freq == exp.freq + + def test_asfreq_ts(self): + index = PeriodIndex(freq='A', start='1/1/2001', end='12/31/2010') + ts = Series(np.random.randn(len(index)), index=index) + df = DataFrame(np.random.randn(len(index), 3), index=index) + + result = ts.asfreq('D', how='end') + df_result = df.asfreq('D', how='end') + exp_index = index.asfreq('D', how='end') + assert len(result) == len(ts) + tm.assert_index_equal(result.index, exp_index) + tm.assert_index_equal(df_result.index, exp_index) + + result = ts.asfreq('D', how='start') + assert len(result) == len(ts) + tm.assert_index_equal(result.index, index.asfreq('D', how='start')) + + def test_astype_asfreq(self): + pi1 = PeriodIndex(['2011-01-01', '2011-02-01', '2011-03-01'], freq='D') + exp = PeriodIndex(['2011-01', '2011-02', '2011-03'], freq='M') + tm.assert_index_equal(pi1.asfreq('M'), exp) + tm.assert_index_equal(pi1.astype('period[M]'), exp) + + exp = PeriodIndex(['2011-01', '2011-02', '2011-03'], freq='3M') + tm.assert_index_equal(pi1.asfreq('3M'), exp) + tm.assert_index_equal(pi1.astype('period[3M]'), exp) diff --git a/pandas/tests/indexes/period/test_construction.py b/pandas/tests/indexes/period/test_construction.py new file mode 100644 index 0000000000000..e5b889e100307 --- /dev/null +++ b/pandas/tests/indexes/period/test_construction.py @@ -0,0 +1,489 @@ +import pytest + +import numpy as np +import pandas as pd +import pandas.util.testing as tm +import pandas.core.indexes.period as period +from pandas.compat import lrange, PY3, text_type, lmap +from pandas import (Period, PeriodIndex, period_range, offsets, date_range, + Series, Index) + + +class TestPeriodIndex(object): + + def setup_method(self, method): + pass + + def test_construction_base_constructor(self): + # GH 13664 + arr = [pd.Period('2011-01', freq='M'), pd.NaT, + pd.Period('2011-03', freq='M')] + tm.assert_index_equal(pd.Index(arr), pd.PeriodIndex(arr)) + tm.assert_index_equal(pd.Index(np.array(arr)), + pd.PeriodIndex(np.array(arr))) + + arr = [np.nan, pd.NaT, pd.Period('2011-03', freq='M')] + tm.assert_index_equal(pd.Index(arr), pd.PeriodIndex(arr)) + tm.assert_index_equal(pd.Index(np.array(arr)), + pd.PeriodIndex(np.array(arr))) + + arr = [pd.Period('2011-01', freq='M'), pd.NaT, + pd.Period('2011-03', freq='D')] + tm.assert_index_equal(pd.Index(arr), pd.Index(arr, dtype=object)) + + tm.assert_index_equal(pd.Index(np.array(arr)), + pd.Index(np.array(arr), 
dtype=object)) + + def test_constructor_use_start_freq(self): + # GH #1118 + p = Period('4/2/2012', freq='B') + index = PeriodIndex(start=p, periods=10) + expected = PeriodIndex(start='4/2/2012', periods=10, freq='B') + tm.assert_index_equal(index, expected) + + def test_constructor_field_arrays(self): + # GH #1264 + + years = np.arange(1990, 2010).repeat(4)[2:-2] + quarters = np.tile(np.arange(1, 5), 20)[2:-2] + + index = PeriodIndex(year=years, quarter=quarters, freq='Q-DEC') + expected = period_range('1990Q3', '2009Q2', freq='Q-DEC') + tm.assert_index_equal(index, expected) + + index2 = PeriodIndex(year=years, quarter=quarters, freq='2Q-DEC') + tm.assert_numpy_array_equal(index.asi8, index2.asi8) + + index = PeriodIndex(year=years, quarter=quarters) + tm.assert_index_equal(index, expected) + + years = [2007, 2007, 2007] + months = [1, 2] + pytest.raises(ValueError, PeriodIndex, year=years, month=months, + freq='M') + pytest.raises(ValueError, PeriodIndex, year=years, month=months, + freq='2M') + pytest.raises(ValueError, PeriodIndex, year=years, month=months, + freq='M', start=Period('2007-01', freq='M')) + + years = [2007, 2007, 2007] + months = [1, 2, 3] + idx = PeriodIndex(year=years, month=months, freq='M') + exp = period_range('2007-01', periods=3, freq='M') + tm.assert_index_equal(idx, exp) + + def test_constructor_U(self): + # U was used as undefined period + pytest.raises(ValueError, period_range, '2007-1-1', periods=500, + freq='X') + + def test_constructor_nano(self): + idx = period_range(start=Period(ordinal=1, freq='N'), + end=Period(ordinal=4, freq='N'), freq='N') + exp = PeriodIndex([Period(ordinal=1, freq='N'), + Period(ordinal=2, freq='N'), + Period(ordinal=3, freq='N'), + Period(ordinal=4, freq='N')], freq='N') + tm.assert_index_equal(idx, exp) + + def test_constructor_arrays_negative_year(self): + years = np.arange(1960, 2000, dtype=np.int64).repeat(4) + quarters = np.tile(np.array([1, 2, 3, 4], dtype=np.int64), 40) + + pindex = PeriodIndex(year=years, quarter=quarters) + + tm.assert_index_equal(pindex.year, pd.Index(years)) + tm.assert_index_equal(pindex.quarter, pd.Index(quarters)) + + def test_constructor_invalid_quarters(self): + pytest.raises(ValueError, PeriodIndex, year=lrange(2000, 2004), + quarter=lrange(4), freq='Q-DEC') + + def test_constructor_corner(self): + pytest.raises(ValueError, PeriodIndex, periods=10, freq='A') + + start = Period('2007', freq='A-JUN') + end = Period('2010', freq='A-DEC') + pytest.raises(ValueError, PeriodIndex, start=start, end=end) + pytest.raises(ValueError, PeriodIndex, start=start) + pytest.raises(ValueError, PeriodIndex, end=end) + + result = period_range('2007-01', periods=10.5, freq='M') + exp = period_range('2007-01', periods=10, freq='M') + tm.assert_index_equal(result, exp) + + def test_constructor_fromarraylike(self): + idx = period_range('2007-01', periods=20, freq='M') + + # values is an array of Period, thus can retrieve freq + tm.assert_index_equal(PeriodIndex(idx.values), idx) + tm.assert_index_equal(PeriodIndex(list(idx.values)), idx) + + pytest.raises(ValueError, PeriodIndex, idx._values) + pytest.raises(ValueError, PeriodIndex, list(idx._values)) + pytest.raises(TypeError, PeriodIndex, + data=Period('2007', freq='A')) + + result = PeriodIndex(iter(idx)) + tm.assert_index_equal(result, idx) + + result = PeriodIndex(idx) + tm.assert_index_equal(result, idx) + + result = PeriodIndex(idx, freq='M') + tm.assert_index_equal(result, idx) + + result = PeriodIndex(idx, freq=offsets.MonthEnd()) + 
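+        # note: offsets.MonthEnd() is the offset spelling of freq='M',
+        # so constructing from monthly Periods should round-trip unchanged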
tm.assert_index_equal(result, idx)
+        assert result.freq == 'M'
+
+        result = PeriodIndex(idx, freq='2M')
+        tm.assert_index_equal(result, idx.asfreq('2M'))
+        assert result.freq == '2M'
+
+        result = PeriodIndex(idx, freq=offsets.MonthEnd(2))
+        tm.assert_index_equal(result, idx.asfreq('2M'))
+        assert result.freq == '2M'
+
+        result = PeriodIndex(idx, freq='D')
+        exp = idx.asfreq('D', 'e')
+        tm.assert_index_equal(result, exp)
+
+    def test_constructor_datetime64arr(self):
+        vals = np.arange(100000, 100000 + 10000, 100, dtype=np.int64)
+        vals = vals.view(np.dtype('M8[us]'))
+
+        pytest.raises(ValueError, PeriodIndex, vals, freq='D')
+
+    def test_constructor_dtype(self):
+        # passing a period dtype sets the freq of the resulting index
+        idx = PeriodIndex(['2013-01', '2013-03'], dtype='period[M]')
+        exp = PeriodIndex(['2013-01', '2013-03'], freq='M')
+        tm.assert_index_equal(idx, exp)
+        assert idx.dtype == 'period[M]'
+
+        idx = PeriodIndex(['2013-01-05', '2013-03-05'], dtype='period[3D]')
+        exp = PeriodIndex(['2013-01-05', '2013-03-05'], freq='3D')
+        tm.assert_index_equal(idx, exp)
+        assert idx.dtype == 'period[3D]'
+
+        # if we already have a freq and it's not the same, then asfreq
+        # (not changed)
+        idx = PeriodIndex(['2013-01-01', '2013-01-02'], freq='D')
+
+        res = PeriodIndex(idx, dtype='period[M]')
+        exp = PeriodIndex(['2013-01', '2013-01'], freq='M')
+        tm.assert_index_equal(res, exp)
+        assert res.dtype == 'period[M]'
+
+        res = PeriodIndex(idx, freq='M')
+        tm.assert_index_equal(res, exp)
+        assert res.dtype == 'period[M]'
+
+        msg = 'specified freq and dtype are different'
+        with tm.assert_raises_regex(period.IncompatibleFrequency, msg):
+            PeriodIndex(['2011-01'], freq='M', dtype='period[D]')
+
+    def test_constructor_empty(self):
+        idx = pd.PeriodIndex([], freq='M')
+        assert isinstance(idx, PeriodIndex)
+        assert len(idx) == 0
+        assert idx.freq == 'M'
+
+        with tm.assert_raises_regex(ValueError, 'freq not specified'):
+            pd.PeriodIndex([])
+
+    def test_constructor_pi_nat(self):
+        idx = PeriodIndex([Period('2011-01', freq='M'), pd.NaT,
+                           Period('2011-01', freq='M')])
+        exp = PeriodIndex(['2011-01', 'NaT', '2011-01'], freq='M')
+        tm.assert_index_equal(idx, exp)
+
+        idx = PeriodIndex(np.array([Period('2011-01', freq='M'), pd.NaT,
+                                    Period('2011-01', freq='M')]))
+        tm.assert_index_equal(idx, exp)
+
+        idx = PeriodIndex([pd.NaT, pd.NaT, Period('2011-01', freq='M'),
+                           Period('2011-01', freq='M')])
+        exp = PeriodIndex(['NaT', 'NaT', '2011-01', '2011-01'], freq='M')
+        tm.assert_index_equal(idx, exp)
+
+        idx = PeriodIndex(np.array([pd.NaT, pd.NaT,
+                                    Period('2011-01', freq='M'),
+                                    Period('2011-01', freq='M')]))
+        tm.assert_index_equal(idx, exp)
+
+        idx = PeriodIndex([pd.NaT, pd.NaT, '2011-01', '2011-01'], freq='M')
+        tm.assert_index_equal(idx, exp)
+
+        with tm.assert_raises_regex(ValueError, 'freq not specified'):
+            PeriodIndex([pd.NaT, pd.NaT])
+
+        with tm.assert_raises_regex(ValueError, 'freq not specified'):
+            PeriodIndex(np.array([pd.NaT, pd.NaT]))
+
+        with tm.assert_raises_regex(ValueError, 'freq not specified'):
+            PeriodIndex(['NaT', 'NaT'])
+
+        with tm.assert_raises_regex(ValueError, 'freq not specified'):
+            PeriodIndex(np.array(['NaT', 'NaT']))
+
+    def test_constructor_incompat_freq(self):
+        msg = "Input has different freq=D from PeriodIndex\\(freq=M\\)"
+
+        with tm.assert_raises_regex(period.IncompatibleFrequency, msg):
+            PeriodIndex([Period('2011-01', freq='M'), pd.NaT,
+                         Period('2011-01', freq='D')])
+
+        with tm.assert_raises_regex(period.IncompatibleFrequency, msg):
+            PeriodIndex(np.array([Period('2011-01', freq='M'),
pd.NaT, + Period('2011-01', freq='D')])) + + # first element is pd.NaT + with tm.assert_raises_regex(period.IncompatibleFrequency, msg): + PeriodIndex([pd.NaT, Period('2011-01', freq='M'), + Period('2011-01', freq='D')]) + + with tm.assert_raises_regex(period.IncompatibleFrequency, msg): + PeriodIndex(np.array([pd.NaT, Period('2011-01', freq='M'), + Period('2011-01', freq='D')])) + + def test_constructor_mixed(self): + idx = PeriodIndex(['2011-01', pd.NaT, Period('2011-01', freq='M')]) + exp = PeriodIndex(['2011-01', 'NaT', '2011-01'], freq='M') + tm.assert_index_equal(idx, exp) + + idx = PeriodIndex(['NaT', pd.NaT, Period('2011-01', freq='M')]) + exp = PeriodIndex(['NaT', 'NaT', '2011-01'], freq='M') + tm.assert_index_equal(idx, exp) + + idx = PeriodIndex([Period('2011-01-01', freq='D'), pd.NaT, + '2012-01-01']) + exp = PeriodIndex(['2011-01-01', 'NaT', '2012-01-01'], freq='D') + tm.assert_index_equal(idx, exp) + + def test_constructor_simple_new(self): + idx = period_range('2007-01', name='p', periods=2, freq='M') + result = idx._simple_new(idx, 'p', freq=idx.freq) + tm.assert_index_equal(result, idx) + + result = idx._simple_new(idx.astype('i8'), 'p', freq=idx.freq) + tm.assert_index_equal(result, idx) + + result = idx._simple_new([pd.Period('2007-01', freq='M'), + pd.Period('2007-02', freq='M')], + 'p', freq=idx.freq) + tm.assert_index_equal(result, idx) + + result = idx._simple_new(np.array([pd.Period('2007-01', freq='M'), + pd.Period('2007-02', freq='M')]), + 'p', freq=idx.freq) + tm.assert_index_equal(result, idx) + + def test_constructor_simple_new_empty(self): + # GH13079 + idx = PeriodIndex([], freq='M', name='p') + result = idx._simple_new(idx, name='p', freq='M') + tm.assert_index_equal(result, idx) + + def test_constructor_floats(self): + # GH13079 + for floats in [[1.1, 2.1], np.array([1.1, 2.1])]: + with pytest.raises(TypeError): + pd.PeriodIndex._simple_new(floats, freq='M') + + with pytest.raises(TypeError): + pd.PeriodIndex(floats, freq='M') + + def test_constructor_nat(self): + pytest.raises(ValueError, period_range, start='NaT', + end='2011-01-01', freq='M') + pytest.raises(ValueError, period_range, start='2011-01-01', + end='NaT', freq='M') + + def test_constructor_year_and_quarter(self): + year = pd.Series([2001, 2002, 2003]) + quarter = year - 2000 + idx = PeriodIndex(year=year, quarter=quarter) + strs = ['%dQ%d' % t for t in zip(quarter, year)] + lops = list(map(Period, strs)) + p = PeriodIndex(lops) + tm.assert_index_equal(p, idx) + + def test_constructor_freq_mult(self): + # GH #7811 + for func in [PeriodIndex, period_range]: + # must be the same, but for sure... 
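+            # (a multiplied freq such as '2M' or '3D' makes each element
+            # span that many base periods, so the expected values below
+            # step by two months, three days, and four hours respectively)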
+            pidx = func(start='2014-01', freq='2M', periods=4)
+            expected = PeriodIndex(['2014-01', '2014-03',
+                                    '2014-05', '2014-07'], freq='2M')
+            tm.assert_index_equal(pidx, expected)
+
+            pidx = func(start='2014-01-02', end='2014-01-15', freq='3D')
+            expected = PeriodIndex(['2014-01-02', '2014-01-05',
+                                    '2014-01-08', '2014-01-11',
+                                    '2014-01-14'], freq='3D')
+            tm.assert_index_equal(pidx, expected)
+
+            pidx = func(end='2014-01-01 17:00', freq='4H', periods=3)
+            expected = PeriodIndex(['2014-01-01 09:00', '2014-01-01 13:00',
+                                    '2014-01-01 17:00'], freq='4H')
+            tm.assert_index_equal(pidx, expected)
+
+        msg = ('Frequency must be positive, because it'
+               ' represents span: -1M')
+        with tm.assert_raises_regex(ValueError, msg):
+            PeriodIndex(['2011-01'], freq='-1M')
+
+        msg = ('Frequency must be positive, because it'
+               ' represents span: 0M')
+        with tm.assert_raises_regex(ValueError, msg):
+            PeriodIndex(['2011-01'], freq='0M')
+
+        msg = ('Frequency must be positive, because it'
+               ' represents span: 0M')
+        with tm.assert_raises_regex(ValueError, msg):
+            period_range('2011-01', periods=3, freq='0M')
+
+    def test_constructor_freq_mult_dti_compat(self):
+        import itertools
+        mults = [1, 2, 3, 4, 5]
+        freqs = ['A', 'M', 'D', 'T', 'S']
+        for mult, freq in itertools.product(mults, freqs):
+            freqstr = str(mult) + freq
+            pidx = PeriodIndex(start='2014-04-01', freq=freqstr, periods=10)
+            expected = date_range(start='2014-04-01', freq=freqstr,
+                                  periods=10).to_period(freqstr)
+            tm.assert_index_equal(pidx, expected)
+
+    def test_constructor_freq_combined(self):
+        for freq in ['1D1H', '1H1D']:
+            pidx = PeriodIndex(['2016-01-01', '2016-01-02'], freq=freq)
+            expected = PeriodIndex(['2016-01-01 00:00', '2016-01-02 00:00'],
+                                   freq='25H')
+            tm.assert_index_equal(pidx, expected)
+        for freq, func in zip(['1D1H', '1H1D'], [PeriodIndex, period_range]):
+            pidx = func(start='2016-01-01', periods=2, freq=freq)
+            expected = PeriodIndex(['2016-01-01 00:00', '2016-01-02 01:00'],
+                                   freq='25H')
+            tm.assert_index_equal(pidx, expected)
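The combined-frequency cases above pin down how pandas normalizes a compound span: '1D1H' and '1H1D' are both folded into a single 25-hour period. A minimal sketch of that behaviour, using only calls exercised by these tests (the printed values mirror the expectations asserted above):

```python
import pandas as pd

# '1D1H' and '1H1D' both normalize to one 25-hour span,
# so the resulting index reports freq='25H'
pidx = pd.period_range(start='2016-01-01', periods=2, freq='1D1H')
print(pidx.freqstr)  # '25H'
print(pidx[1])       # Period('2016-01-02 01:00', '25H')
```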
+    def test_constructor(self):
+        pi = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009')
+        assert len(pi) == 9
+
+        pi = PeriodIndex(freq='Q', start='1/1/2001', end='12/1/2009')
+        assert len(pi) == 4 * 9
+
+        pi = PeriodIndex(freq='M', start='1/1/2001', end='12/1/2009')
+        assert len(pi) == 12 * 9
+
+        pi = PeriodIndex(freq='D', start='1/1/2001', end='12/31/2009')
+        assert len(pi) == 365 * 9 + 2
+
+        pi = PeriodIndex(freq='B', start='1/1/2001', end='12/31/2009')
+        assert len(pi) == 261 * 9
+
+        pi = PeriodIndex(freq='H', start='1/1/2001', end='12/31/2001 23:00')
+        assert len(pi) == 365 * 24
+
+        pi = PeriodIndex(freq='Min', start='1/1/2001', end='1/1/2001 23:59')
+        assert len(pi) == 24 * 60
+
+        pi = PeriodIndex(freq='S', start='1/1/2001', end='1/1/2001 23:59:59')
+        assert len(pi) == 24 * 60 * 60
+
+        start = Period('02-Apr-2005', 'B')
+        i1 = PeriodIndex(start=start, periods=20)
+        assert len(i1) == 20
+        assert i1.freq == start.freq
+        assert i1[0] == start
+
+        end_intv = Period('2006-12-31', 'W')
+        i1 = PeriodIndex(end=end_intv, periods=10)
+        assert len(i1) == 10
+        assert i1.freq == end_intv.freq
+        assert i1[-1] == end_intv
+
+        end_intv = Period('2006-12-31', '1w')
+        i2 = PeriodIndex(end=end_intv, periods=10)
+        assert len(i1) == len(i2)
+        assert (i1 == i2).all()
+        assert i1.freq == i2.freq
+
+        end_intv = Period('2006-12-31', ('w', 1))
+        i2 = PeriodIndex(end=end_intv, periods=10)
+        assert len(i1) == len(i2)
+        assert (i1 == i2).all()
+        assert i1.freq == i2.freq
+
+        end_intv = Period('2005-05-01', 'B')
+        i1 = PeriodIndex(start=start, end=end_intv)
+
+        # infer freq from first element
+        i2 = PeriodIndex([end_intv, Period('2005-05-05', 'B')])
+        assert len(i2) == 2
+        assert i2[0] == end_intv
+
+        i2 = PeriodIndex(np.array([end_intv, Period('2005-05-05', 'B')]))
+        assert len(i2) == 2
+        assert i2[0] == end_intv
+
+        # Mixed freq should fail
+        vals = [end_intv, Period('2006-12-31', 'w')]
+        pytest.raises(ValueError, PeriodIndex, vals)
+        vals = np.array(vals)
+        pytest.raises(ValueError, PeriodIndex, vals)
+
+    def test_constructor_error(self):
+        start = Period('02-Apr-2005', 'B')
+        end_intv = Period('2006-12-31', ('w', 1))
+
+        msg = 'Start and end must have same freq'
+        with tm.assert_raises_regex(ValueError, msg):
+            PeriodIndex(start=start, end=end_intv)
+
+        msg = 'Must specify 2 of start, end, periods'
+        with tm.assert_raises_regex(ValueError, msg):
+            PeriodIndex(start=start)
+
+    def test_recreate_from_data(self):
+        for o in ['M', 'Q', 'A', 'D', 'B', 'T', 'S', 'L', 'U', 'N', 'H']:
+            org = PeriodIndex(start='2001/04/01', freq=o, periods=1)
+            idx = PeriodIndex(org.values, freq=o)
+            tm.assert_index_equal(idx, org)
+
+    def test_map_with_string_constructor(self):
+        raw = [2005, 2007, 2009]
+        index = PeriodIndex(raw, freq='A')
+        types = str,
+
+        if PY3:
+            # unicode
+            types += text_type,
+
+        for t in types:
+            expected = Index(lmap(t, raw))
+            res = index.map(t)
+
+            # should return an Index
+            assert isinstance(res, Index)
+
+            # preserve element types
+            assert all(isinstance(resi, t) for resi in res)
+
+            # lastly, values should compare equal
+            tm.assert_index_equal(res, expected)
+
+
+class TestSeriesPeriod(object):
+
+    def setup_method(self, method):
+        self.series = Series(period_range('2000-01-01', periods=10, freq='D'))
+
+    def test_constructor_cant_cast_period(self):
+        with pytest.raises(TypeError):
+            Series(period_range('2000-01-01', periods=10, freq='D'),
+                   dtype=float)
+
+    def test_constructor_cast_object(self):
+        s = Series(period_range('1/1/2000', periods=10), dtype=object)
+        exp = Series(period_range('1/1/2000', periods=10))
+        tm.assert_series_equal(s, exp)
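As the constructor tests above establish, the frequency of a PeriodIndex can come from an explicit `freq`, from a `dtype='period[...]'`, or be inferred from any Period element, with strings and NaT coerced to match. A minimal sketch consistent with those assertions:

```python
import pandas as pd

# freq is inferred from the single Period element; the string and NaT
# are coerced to monthly periods
idx = pd.PeriodIndex(['2011-01', pd.NaT, pd.Period('2011-03', freq='M')])
print(idx)
# PeriodIndex(['2011-01', 'NaT', '2011-03'], dtype='period[M]', freq='M')

# with no Period elements and no freq, construction raises
# pd.PeriodIndex(['NaT', 'NaT'])   # ValueError: freq not specified
```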
diff --git a/pandas/tests/indexes/period/test_formats.py b/pandas/tests/indexes/period/test_formats.py
new file mode 100644
index 0000000000000..533481ce051f7
--- /dev/null
+++ b/pandas/tests/indexes/period/test_formats.py
@@ -0,0 +1,48 @@
+from pandas import PeriodIndex
+
+import numpy as np
+
+import pytest
+
+import pandas.util.testing as tm
+import pandas as pd
+
+
+def test_to_native_types():
+    index = PeriodIndex(['2017-01-01', '2017-01-02',
+                         '2017-01-03'], freq='D')
+
+    # First, with no arguments.
+    expected = np.array(['2017-01-01', '2017-01-02',
+                         '2017-01-03'], dtype='<U10')
+
+    result = index.to_native_types()
+    tm.assert_numpy_array_equal(result, expected)
+
+
+def test_take_fill_value():
+    # take with fill_value: indices below -1 must raise
+    idx = PeriodIndex(['2011-01-01', '2011-02-01', '2011-03-01'],
+                      name='xxx', freq='D')
+
+    msg = ('When allow_fill=True and fill_value is not None, '
+           'all indices must be >= -1')
+    with tm.assert_raises_regex(ValueError, msg):
+        idx.take(np.array([1, 0, -2]), fill_value=True)
+    with tm.assert_raises_regex(ValueError, msg):
+        idx.take(np.array([1, 0, -5]), fill_value=True)
+
+    with pytest.raises(IndexError):
+        idx.take(np.array([1, -5]))
diff --git a/pandas/tests/indexes/period/test_ops.py b/pandas/tests/indexes/period/test_ops.py
new file mode 100644
index 0000000000000..7acc335c31be4
--- /dev/null
+++ b/pandas/tests/indexes/period/test_ops.py
@@ -0,0 +1,1347 @@
+import pytest
+
+import numpy as np
+from datetime import timedelta
+
+import pandas as pd
+import pandas._libs.tslib as tslib
+import pandas.util.testing as tm
+import pandas.core.indexes.period as period
+from pandas import (DatetimeIndex, PeriodIndex, period_range, Series, Period,
+                    _np_version_under1p10, Index, Timedelta, offsets)
+
+from pandas.tests.test_base import Ops
+
+
+class TestPeriodIndexOps(Ops):
+
+    def setup_method(self, method):
+        super(TestPeriodIndexOps, self).setup_method(method)
+        mask = lambda x: (isinstance(x, DatetimeIndex) or
+                          isinstance(x, PeriodIndex))
+        self.is_valid_objs = [o for o in self.objs if mask(o)]
+        self.not_valid_objs = [o for o in self.objs if not mask(o)]
+
+    def test_ops_properties(self):
+        f = lambda x: isinstance(x, PeriodIndex)
+        self.check_ops_properties(PeriodIndex._field_ops, f)
+        self.check_ops_properties(PeriodIndex._object_ops, f)
+        self.check_ops_properties(PeriodIndex._bool_ops, f)
+
+    def test_asobject_tolist(self):
+        idx = pd.period_range(start='2013-01-01', periods=4, freq='M',
+                              name='idx')
+        expected_list = [pd.Period('2013-01-31', freq='M'),
+                         pd.Period('2013-02-28', freq='M'),
+                         pd.Period('2013-03-31', freq='M'),
+                         pd.Period('2013-04-30', freq='M')]
+        expected = pd.Index(expected_list, dtype=object, name='idx')
+        result = idx.asobject
+        assert isinstance(result, Index)
+        assert result.dtype == object
+        tm.assert_index_equal(result, expected)
+        assert result.name == expected.name
+        assert idx.tolist() == expected_list
+
+        idx = PeriodIndex(['2013-01-01', '2013-01-02', 'NaT',
+                           '2013-01-04'], freq='D', name='idx')
+        expected_list = [pd.Period('2013-01-01', freq='D'),
+                         pd.Period('2013-01-02', freq='D'),
+                         pd.Period('NaT', freq='D'),
+                         pd.Period('2013-01-04', freq='D')]
+        expected = pd.Index(expected_list, dtype=object, name='idx')
+        result = idx.asobject
+        assert isinstance(result, Index)
+        assert result.dtype == object
+        tm.assert_index_equal(result, expected)
+        for i in [0, 1, 3]:
+            assert result[i] == expected[i]
+        assert result[2] is pd.NaT
+        assert result.name == expected.name
+
+        result_list = idx.tolist()
+        for i in [0, 1, 3]:
+            assert result_list[i] == expected_list[i]
+        assert result_list[2] is pd.NaT
+
+    def test_minmax(self):
+
+        # monotonic
+        idx1 = pd.PeriodIndex([pd.NaT, '2011-01-01', '2011-01-02',
+                               '2011-01-03'], freq='D')
+        assert idx1.is_monotonic
+
+        # non-monotonic
+        idx2 = pd.PeriodIndex(['2011-01-01', pd.NaT, '2011-01-03',
+                               '2011-01-02', pd.NaT], freq='D')
+        assert not idx2.is_monotonic
+
+        for idx in [idx1, idx2]:
+            assert idx.min() == pd.Period('2011-01-01', freq='D')
+            assert idx.max() == pd.Period('2011-01-03', freq='D')
+        assert idx1.argmin() == 1
+        assert idx2.argmin() == 0
+        assert idx1.argmax() == 3
+        assert idx2.argmax() == 2
+
+        for op in ['min', 'max']:
+            # Return NaT
+            obj = PeriodIndex([], freq='M')
+            result = getattr(obj, op)()
+            assert result is tslib.NaT
+
+            obj = PeriodIndex([pd.NaT], freq='M')
+            result = getattr(obj, op)()
+
assert result is tslib.NaT + + obj = PeriodIndex([pd.NaT, pd.NaT, pd.NaT], freq='M') + result = getattr(obj, op)() + assert result is tslib.NaT + + def test_numpy_minmax(self): + pr = pd.period_range(start='2016-01-15', end='2016-01-20') + + assert np.min(pr) == Period('2016-01-15', freq='D') + assert np.max(pr) == Period('2016-01-20', freq='D') + + errmsg = "the 'out' parameter is not supported" + tm.assert_raises_regex(ValueError, errmsg, np.min, pr, out=0) + tm.assert_raises_regex(ValueError, errmsg, np.max, pr, out=0) + + assert np.argmin(pr) == 0 + assert np.argmax(pr) == 5 + + if not _np_version_under1p10: + errmsg = "the 'out' parameter is not supported" + tm.assert_raises_regex( + ValueError, errmsg, np.argmin, pr, out=0) + tm.assert_raises_regex( + ValueError, errmsg, np.argmax, pr, out=0) + + def test_representation(self): + # GH 7601 + idx1 = PeriodIndex([], freq='D') + idx2 = PeriodIndex(['2011-01-01'], freq='D') + idx3 = PeriodIndex(['2011-01-01', '2011-01-02'], freq='D') + idx4 = PeriodIndex(['2011-01-01', '2011-01-02', '2011-01-03'], + freq='D') + idx5 = PeriodIndex(['2011', '2012', '2013'], freq='A') + idx6 = PeriodIndex(['2011-01-01 09:00', '2012-02-01 10:00', + 'NaT'], freq='H') + idx7 = pd.period_range('2013Q1', periods=1, freq="Q") + idx8 = pd.period_range('2013Q1', periods=2, freq="Q") + idx9 = pd.period_range('2013Q1', periods=3, freq="Q") + idx10 = PeriodIndex(['2011-01-01', '2011-02-01'], freq='3D') + + exp1 = """PeriodIndex([], dtype='period[D]', freq='D')""" + + exp2 = """PeriodIndex(['2011-01-01'], dtype='period[D]', freq='D')""" + + exp3 = ("PeriodIndex(['2011-01-01', '2011-01-02'], dtype='period[D]', " + "freq='D')") + + exp4 = ("PeriodIndex(['2011-01-01', '2011-01-02', '2011-01-03'], " + "dtype='period[D]', freq='D')") + + exp5 = ("PeriodIndex(['2011', '2012', '2013'], dtype='period[A-DEC]', " + "freq='A-DEC')") + + exp6 = ("PeriodIndex(['2011-01-01 09:00', '2012-02-01 10:00', 'NaT'], " + "dtype='period[H]', freq='H')") + + exp7 = ("PeriodIndex(['2013Q1'], dtype='period[Q-DEC]', " + "freq='Q-DEC')") + + exp8 = ("PeriodIndex(['2013Q1', '2013Q2'], dtype='period[Q-DEC]', " + "freq='Q-DEC')") + + exp9 = ("PeriodIndex(['2013Q1', '2013Q2', '2013Q3'], " + "dtype='period[Q-DEC]', freq='Q-DEC')") + + exp10 = ("PeriodIndex(['2011-01-01', '2011-02-01'], " + "dtype='period[3D]', freq='3D')") + + for idx, expected in zip([idx1, idx2, idx3, idx4, idx5, + idx6, idx7, idx8, idx9, idx10], + [exp1, exp2, exp3, exp4, exp5, + exp6, exp7, exp8, exp9, exp10]): + for func in ['__repr__', '__unicode__', '__str__']: + result = getattr(idx, func)() + assert result == expected + + def test_representation_to_series(self): + # GH 10971 + idx1 = PeriodIndex([], freq='D') + idx2 = PeriodIndex(['2011-01-01'], freq='D') + idx3 = PeriodIndex(['2011-01-01', '2011-01-02'], freq='D') + idx4 = PeriodIndex(['2011-01-01', '2011-01-02', + '2011-01-03'], freq='D') + idx5 = PeriodIndex(['2011', '2012', '2013'], freq='A') + idx6 = PeriodIndex(['2011-01-01 09:00', '2012-02-01 10:00', + 'NaT'], freq='H') + + idx7 = pd.period_range('2013Q1', periods=1, freq="Q") + idx8 = pd.period_range('2013Q1', periods=2, freq="Q") + idx9 = pd.period_range('2013Q1', periods=3, freq="Q") + + exp1 = """Series([], dtype: object)""" + + exp2 = """0 2011-01-01 +dtype: object""" + + exp3 = """0 2011-01-01 +1 2011-01-02 +dtype: object""" + + exp4 = """0 2011-01-01 +1 2011-01-02 +2 2011-01-03 +dtype: object""" + + exp5 = """0 2011 +1 2012 +2 2013 +dtype: object""" + + exp6 = """0 2011-01-01 09:00 +1 2012-02-01 10:00 +2 NaT 
+dtype: object""" + + exp7 = """0 2013Q1 +dtype: object""" + + exp8 = """0 2013Q1 +1 2013Q2 +dtype: object""" + + exp9 = """0 2013Q1 +1 2013Q2 +2 2013Q3 +dtype: object""" + + for idx, expected in zip([idx1, idx2, idx3, idx4, idx5, + idx6, idx7, idx8, idx9], + [exp1, exp2, exp3, exp4, exp5, + exp6, exp7, exp8, exp9]): + result = repr(pd.Series(idx)) + assert result == expected + + def test_summary(self): + # GH9116 + idx1 = PeriodIndex([], freq='D') + idx2 = PeriodIndex(['2011-01-01'], freq='D') + idx3 = PeriodIndex(['2011-01-01', '2011-01-02'], freq='D') + idx4 = PeriodIndex( + ['2011-01-01', '2011-01-02', '2011-01-03'], freq='D') + idx5 = PeriodIndex(['2011', '2012', '2013'], freq='A') + idx6 = PeriodIndex( + ['2011-01-01 09:00', '2012-02-01 10:00', 'NaT'], freq='H') + + idx7 = pd.period_range('2013Q1', periods=1, freq="Q") + idx8 = pd.period_range('2013Q1', periods=2, freq="Q") + idx9 = pd.period_range('2013Q1', periods=3, freq="Q") + + exp1 = """PeriodIndex: 0 entries +Freq: D""" + + exp2 = """PeriodIndex: 1 entries, 2011-01-01 to 2011-01-01 +Freq: D""" + + exp3 = """PeriodIndex: 2 entries, 2011-01-01 to 2011-01-02 +Freq: D""" + + exp4 = """PeriodIndex: 3 entries, 2011-01-01 to 2011-01-03 +Freq: D""" + + exp5 = """PeriodIndex: 3 entries, 2011 to 2013 +Freq: A-DEC""" + + exp6 = """PeriodIndex: 3 entries, 2011-01-01 09:00 to NaT +Freq: H""" + + exp7 = """PeriodIndex: 1 entries, 2013Q1 to 2013Q1 +Freq: Q-DEC""" + + exp8 = """PeriodIndex: 2 entries, 2013Q1 to 2013Q2 +Freq: Q-DEC""" + + exp9 = """PeriodIndex: 3 entries, 2013Q1 to 2013Q3 +Freq: Q-DEC""" + + for idx, expected in zip([idx1, idx2, idx3, idx4, idx5, + idx6, idx7, idx8, idx9], + [exp1, exp2, exp3, exp4, exp5, + exp6, exp7, exp8, exp9]): + result = idx.summary() + assert result == expected + + def test_resolution(self): + for freq, expected in zip(['A', 'Q', 'M', 'D', 'H', + 'T', 'S', 'L', 'U'], + ['day', 'day', 'day', 'day', + 'hour', 'minute', 'second', + 'millisecond', 'microsecond']): + + idx = pd.period_range(start='2013-04-01', periods=30, freq=freq) + assert idx.resolution == expected + + def test_add_iadd(self): + rng = pd.period_range('1/1/2000', freq='D', periods=5) + other = pd.period_range('1/6/2000', freq='D', periods=5) + + # previously performed setop union, now raises TypeError (GH14164) + with pytest.raises(TypeError): + rng + other + + with pytest.raises(TypeError): + rng += other + + # offset + # DateOffset + rng = pd.period_range('2014', '2024', freq='A') + result = rng + pd.offsets.YearEnd(5) + expected = pd.period_range('2019', '2029', freq='A') + tm.assert_index_equal(result, expected) + rng += pd.offsets.YearEnd(5) + tm.assert_index_equal(rng, expected) + + for o in [pd.offsets.YearBegin(2), pd.offsets.MonthBegin(1), + pd.offsets.Minute(), np.timedelta64(365, 'D'), + timedelta(365), Timedelta(days=365)]: + msg = ('Input has different freq(=.+)? ' + 'from PeriodIndex\\(freq=A-DEC\\)') + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + rng + o + + rng = pd.period_range('2014-01', '2016-12', freq='M') + result = rng + pd.offsets.MonthEnd(5) + expected = pd.period_range('2014-06', '2017-05', freq='M') + tm.assert_index_equal(result, expected) + rng += pd.offsets.MonthEnd(5) + tm.assert_index_equal(rng, expected) + + for o in [pd.offsets.YearBegin(2), pd.offsets.MonthBegin(1), + pd.offsets.Minute(), np.timedelta64(365, 'D'), + timedelta(365), Timedelta(days=365)]: + rng = pd.period_range('2014-01', '2016-12', freq='M') + msg = 'Input has different freq(=.+)? 
from PeriodIndex\\(freq=M\\)' + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + rng + o + + # Tick + offsets = [pd.offsets.Day(3), timedelta(days=3), + np.timedelta64(3, 'D'), pd.offsets.Hour(72), + timedelta(minutes=60 * 24 * 3), np.timedelta64(72, 'h'), + Timedelta('72:00:00')] + for delta in offsets: + rng = pd.period_range('2014-05-01', '2014-05-15', freq='D') + result = rng + delta + expected = pd.period_range('2014-05-04', '2014-05-18', freq='D') + tm.assert_index_equal(result, expected) + rng += delta + tm.assert_index_equal(rng, expected) + + for o in [pd.offsets.YearBegin(2), pd.offsets.MonthBegin(1), + pd.offsets.Minute(), np.timedelta64(4, 'h'), + timedelta(hours=23), Timedelta('23:00:00')]: + rng = pd.period_range('2014-05-01', '2014-05-15', freq='D') + msg = 'Input has different freq(=.+)? from PeriodIndex\\(freq=D\\)' + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + rng + o + + offsets = [pd.offsets.Hour(2), timedelta(hours=2), + np.timedelta64(2, 'h'), pd.offsets.Minute(120), + timedelta(minutes=120), np.timedelta64(120, 'm'), + Timedelta(minutes=120)] + for delta in offsets: + rng = pd.period_range('2014-01-01 10:00', '2014-01-05 10:00', + freq='H') + result = rng + delta + expected = pd.period_range('2014-01-01 12:00', '2014-01-05 12:00', + freq='H') + tm.assert_index_equal(result, expected) + rng += delta + tm.assert_index_equal(rng, expected) + + for delta in [pd.offsets.YearBegin(2), timedelta(minutes=30), + np.timedelta64(30, 's'), Timedelta(seconds=30)]: + rng = pd.period_range('2014-01-01 10:00', '2014-01-05 10:00', + freq='H') + msg = 'Input has different freq(=.+)? from PeriodIndex\\(freq=H\\)' + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + rng + delta + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + rng += delta + + # int + rng = pd.period_range('2000-01-01 09:00', freq='H', periods=10) + result = rng + 1 + expected = pd.period_range('2000-01-01 10:00', freq='H', periods=10) + tm.assert_index_equal(result, expected) + rng += 1 + tm.assert_index_equal(rng, expected) + + def test_sub(self): + rng = period_range('2007-01', periods=50) + + result = rng - 5 + exp = rng + (-5) + tm.assert_index_equal(result, exp) + + def test_sub_isub(self): + + # previously performed setop, now raises TypeError (GH14164) + # TODO needs to wait on #13077 for decision on result type + rng = pd.period_range('1/1/2000', freq='D', periods=5) + other = pd.period_range('1/6/2000', freq='D', periods=5) + + with pytest.raises(TypeError): + rng - other + + with pytest.raises(TypeError): + rng -= other + + # offset + # DateOffset + rng = pd.period_range('2014', '2024', freq='A') + result = rng - pd.offsets.YearEnd(5) + expected = pd.period_range('2009', '2019', freq='A') + tm.assert_index_equal(result, expected) + rng -= pd.offsets.YearEnd(5) + tm.assert_index_equal(rng, expected) + + for o in [pd.offsets.YearBegin(2), pd.offsets.MonthBegin(1), + pd.offsets.Minute(), np.timedelta64(365, 'D'), + timedelta(365)]: + rng = pd.period_range('2014', '2024', freq='A') + msg = ('Input has different freq(=.+)? 
' + 'from PeriodIndex\\(freq=A-DEC\\)') + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + rng - o + + rng = pd.period_range('2014-01', '2016-12', freq='M') + result = rng - pd.offsets.MonthEnd(5) + expected = pd.period_range('2013-08', '2016-07', freq='M') + tm.assert_index_equal(result, expected) + rng -= pd.offsets.MonthEnd(5) + tm.assert_index_equal(rng, expected) + + for o in [pd.offsets.YearBegin(2), pd.offsets.MonthBegin(1), + pd.offsets.Minute(), np.timedelta64(365, 'D'), + timedelta(365)]: + rng = pd.period_range('2014-01', '2016-12', freq='M') + msg = 'Input has different freq(=.+)? from PeriodIndex\\(freq=M\\)' + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + rng - o + + # Tick + offsets = [pd.offsets.Day(3), timedelta(days=3), + np.timedelta64(3, 'D'), pd.offsets.Hour(72), + timedelta(minutes=60 * 24 * 3), np.timedelta64(72, 'h')] + for delta in offsets: + rng = pd.period_range('2014-05-01', '2014-05-15', freq='D') + result = rng - delta + expected = pd.period_range('2014-04-28', '2014-05-12', freq='D') + tm.assert_index_equal(result, expected) + rng -= delta + tm.assert_index_equal(rng, expected) + + for o in [pd.offsets.YearBegin(2), pd.offsets.MonthBegin(1), + pd.offsets.Minute(), np.timedelta64(4, 'h'), + timedelta(hours=23)]: + rng = pd.period_range('2014-05-01', '2014-05-15', freq='D') + msg = 'Input has different freq(=.+)? from PeriodIndex\\(freq=D\\)' + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + rng - o + + offsets = [pd.offsets.Hour(2), timedelta(hours=2), + np.timedelta64(2, 'h'), pd.offsets.Minute(120), + timedelta(minutes=120), np.timedelta64(120, 'm')] + for delta in offsets: + rng = pd.period_range('2014-01-01 10:00', '2014-01-05 10:00', + freq='H') + result = rng - delta + expected = pd.period_range('2014-01-01 08:00', '2014-01-05 08:00', + freq='H') + tm.assert_index_equal(result, expected) + rng -= delta + tm.assert_index_equal(rng, expected) + + for delta in [pd.offsets.YearBegin(2), timedelta(minutes=30), + np.timedelta64(30, 's')]: + rng = pd.period_range('2014-01-01 10:00', '2014-01-05 10:00', + freq='H') + msg = 'Input has different freq(=.+)? 
from PeriodIndex\\(freq=H\\)' + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + rng + delta + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + rng += delta + + # int + rng = pd.period_range('2000-01-01 09:00', freq='H', periods=10) + result = rng - 1 + expected = pd.period_range('2000-01-01 08:00', freq='H', periods=10) + tm.assert_index_equal(result, expected) + rng -= 1 + tm.assert_index_equal(rng, expected) + + def test_comp_nat(self): + left = pd.PeriodIndex([pd.Period('2011-01-01'), pd.NaT, + pd.Period('2011-01-03')]) + right = pd.PeriodIndex([pd.NaT, pd.NaT, pd.Period('2011-01-03')]) + + for l, r in [(left, right), (left.asobject, right.asobject)]: + result = l == r + expected = np.array([False, False, True]) + tm.assert_numpy_array_equal(result, expected) + + result = l != r + expected = np.array([True, True, False]) + tm.assert_numpy_array_equal(result, expected) + + expected = np.array([False, False, False]) + tm.assert_numpy_array_equal(l == pd.NaT, expected) + tm.assert_numpy_array_equal(pd.NaT == r, expected) + + expected = np.array([True, True, True]) + tm.assert_numpy_array_equal(l != pd.NaT, expected) + tm.assert_numpy_array_equal(pd.NaT != l, expected) + + expected = np.array([False, False, False]) + tm.assert_numpy_array_equal(l < pd.NaT, expected) + tm.assert_numpy_array_equal(pd.NaT > l, expected) + + def test_value_counts_unique(self): + # GH 7735 + idx = pd.period_range('2011-01-01 09:00', freq='H', periods=10) + # create repeated values, 'n'th element is repeated by n+1 times + idx = PeriodIndex(np.repeat(idx.values, range(1, len(idx) + 1)), + freq='H') + + exp_idx = PeriodIndex(['2011-01-01 18:00', '2011-01-01 17:00', + '2011-01-01 16:00', '2011-01-01 15:00', + '2011-01-01 14:00', '2011-01-01 13:00', + '2011-01-01 12:00', '2011-01-01 11:00', + '2011-01-01 10:00', + '2011-01-01 09:00'], freq='H') + expected = Series(range(10, 0, -1), index=exp_idx, dtype='int64') + + for obj in [idx, Series(idx)]: + tm.assert_series_equal(obj.value_counts(), expected) + + expected = pd.period_range('2011-01-01 09:00', freq='H', + periods=10) + tm.assert_index_equal(idx.unique(), expected) + + idx = PeriodIndex(['2013-01-01 09:00', '2013-01-01 09:00', + '2013-01-01 09:00', '2013-01-01 08:00', + '2013-01-01 08:00', pd.NaT], freq='H') + + exp_idx = PeriodIndex(['2013-01-01 09:00', '2013-01-01 08:00'], + freq='H') + expected = Series([3, 2], index=exp_idx) + + for obj in [idx, Series(idx)]: + tm.assert_series_equal(obj.value_counts(), expected) + + exp_idx = PeriodIndex(['2013-01-01 09:00', '2013-01-01 08:00', + pd.NaT], freq='H') + expected = Series([3, 2, 1], index=exp_idx) + + for obj in [idx, Series(idx)]: + tm.assert_series_equal(obj.value_counts(dropna=False), expected) + + tm.assert_index_equal(idx.unique(), exp_idx) + + def test_drop_duplicates_metadata(self): + # GH 10115 + idx = pd.period_range('2011-01-01', '2011-01-31', freq='D', name='idx') + result = idx.drop_duplicates() + tm.assert_index_equal(idx, result) + assert idx.freq == result.freq + + idx_dup = idx.append(idx) # freq will not be reset + result = idx_dup.drop_duplicates() + tm.assert_index_equal(idx, result) + assert idx.freq == result.freq + + def test_drop_duplicates(self): + # to check Index/Series compat + base = pd.period_range('2011-01-01', '2011-01-31', freq='D', + name='idx') + idx = base.append(base[:5]) + + res = idx.drop_duplicates() + tm.assert_index_equal(res, base) + res = Series(idx).drop_duplicates() + tm.assert_series_equal(res, Series(base)) + + res = 
idx.drop_duplicates(keep='last') + exp = base[5:].append(base[:5]) + tm.assert_index_equal(res, exp) + res = Series(idx).drop_duplicates(keep='last') + tm.assert_series_equal(res, Series(exp, index=np.arange(5, 36))) + + res = idx.drop_duplicates(keep=False) + tm.assert_index_equal(res, base[5:]) + res = Series(idx).drop_duplicates(keep=False) + tm.assert_series_equal(res, Series(base[5:], index=np.arange(5, 31))) + + def test_order_compat(self): + def _check_freq(index, expected_index): + if isinstance(index, PeriodIndex): + assert index.freq == expected_index.freq + + pidx = PeriodIndex(['2011', '2012', '2013'], name='pidx', freq='A') + # for compatibility check + iidx = Index([2011, 2012, 2013], name='idx') + for idx in [pidx, iidx]: + ordered = idx.sort_values() + tm.assert_index_equal(ordered, idx) + _check_freq(ordered, idx) + + ordered = idx.sort_values(ascending=False) + tm.assert_index_equal(ordered, idx[::-1]) + _check_freq(ordered, idx[::-1]) + + ordered, indexer = idx.sort_values(return_indexer=True) + tm.assert_index_equal(ordered, idx) + tm.assert_numpy_array_equal(indexer, np.array([0, 1, 2]), + check_dtype=False) + _check_freq(ordered, idx) + + ordered, indexer = idx.sort_values(return_indexer=True, + ascending=False) + tm.assert_index_equal(ordered, idx[::-1]) + tm.assert_numpy_array_equal(indexer, np.array([2, 1, 0]), + check_dtype=False) + _check_freq(ordered, idx[::-1]) + + pidx = PeriodIndex(['2011', '2013', '2015', '2012', + '2011'], name='pidx', freq='A') + pexpected = PeriodIndex( + ['2011', '2011', '2012', '2013', '2015'], name='pidx', freq='A') + # for compatibility check + iidx = Index([2011, 2013, 2015, 2012, 2011], name='idx') + iexpected = Index([2011, 2011, 2012, 2013, 2015], name='idx') + for idx, expected in [(pidx, pexpected), (iidx, iexpected)]: + ordered = idx.sort_values() + tm.assert_index_equal(ordered, expected) + _check_freq(ordered, idx) + + ordered = idx.sort_values(ascending=False) + tm.assert_index_equal(ordered, expected[::-1]) + _check_freq(ordered, idx) + + ordered, indexer = idx.sort_values(return_indexer=True) + tm.assert_index_equal(ordered, expected) + + exp = np.array([0, 4, 3, 1, 2]) + tm.assert_numpy_array_equal(indexer, exp, check_dtype=False) + _check_freq(ordered, idx) + + ordered, indexer = idx.sort_values(return_indexer=True, + ascending=False) + tm.assert_index_equal(ordered, expected[::-1]) + + exp = np.array([2, 1, 3, 4, 0]) + tm.assert_numpy_array_equal(indexer, exp, check_dtype=False) + _check_freq(ordered, idx) + + pidx = PeriodIndex(['2011', '2013', 'NaT', '2011'], name='pidx', + freq='D') + + result = pidx.sort_values() + expected = PeriodIndex(['NaT', '2011', '2011', '2013'], + name='pidx', freq='D') + tm.assert_index_equal(result, expected) + assert result.freq == 'D' + + result = pidx.sort_values(ascending=False) + expected = PeriodIndex( + ['2013', '2011', '2011', 'NaT'], name='pidx', freq='D') + tm.assert_index_equal(result, expected) + assert result.freq == 'D' + + def test_order(self): + for freq in ['D', '2D', '4D']: + idx = PeriodIndex(['2011-01-01', '2011-01-02', '2011-01-03'], + freq=freq, name='idx') + + ordered = idx.sort_values() + tm.assert_index_equal(ordered, idx) + assert ordered.freq == idx.freq + + ordered = idx.sort_values(ascending=False) + expected = idx[::-1] + tm.assert_index_equal(ordered, expected) + assert ordered.freq == expected.freq + assert ordered.freq == freq + + ordered, indexer = idx.sort_values(return_indexer=True) + tm.assert_index_equal(ordered, idx) + 
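+            # return_indexer=True also yields the positions that would sort
+            # the index; the indexer's integer dtype is platform-dependent,
+            # which is why these comparisons pass check_dtype=False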
tm.assert_numpy_array_equal(indexer, np.array([0, 1, 2]),
+                                        check_dtype=False)
+            assert ordered.freq == idx.freq
+            assert ordered.freq == freq
+
+            ordered, indexer = idx.sort_values(return_indexer=True,
+                                               ascending=False)
+            expected = idx[::-1]
+            tm.assert_index_equal(ordered, expected)
+            tm.assert_numpy_array_equal(indexer, np.array([2, 1, 0]),
+                                        check_dtype=False)
+            assert ordered.freq == expected.freq
+            assert ordered.freq == freq
+
+        idx1 = PeriodIndex(['2011-01-01', '2011-01-03', '2011-01-05',
+                            '2011-01-02', '2011-01-01'], freq='D', name='idx1')
+        exp1 = PeriodIndex(['2011-01-01', '2011-01-01', '2011-01-02',
+                            '2011-01-03', '2011-01-05'], freq='D', name='idx1')
+
+        idx2 = PeriodIndex(['2011-01-01', '2011-01-03', '2011-01-05',
+                            '2011-01-02', '2011-01-01'],
+                           freq='D', name='idx2')
+        exp2 = PeriodIndex(['2011-01-01', '2011-01-01', '2011-01-02',
+                            '2011-01-03', '2011-01-05'],
+                           freq='D', name='idx2')
+
+        idx3 = PeriodIndex([pd.NaT, '2011-01-03', '2011-01-05',
+                            '2011-01-02', pd.NaT], freq='D', name='idx3')
+        exp3 = PeriodIndex([pd.NaT, pd.NaT, '2011-01-02', '2011-01-03',
+                            '2011-01-05'], freq='D', name='idx3')
+
+        for idx, expected in [(idx1, exp1), (idx2, exp2), (idx3, exp3)]:
+            ordered = idx.sort_values()
+            tm.assert_index_equal(ordered, expected)
+            assert ordered.freq == 'D'
+
+            ordered = idx.sort_values(ascending=False)
+            tm.assert_index_equal(ordered, expected[::-1])
+            assert ordered.freq == 'D'
+
+            ordered, indexer = idx.sort_values(return_indexer=True)
+            tm.assert_index_equal(ordered, expected)
+
+            exp = np.array([0, 4, 3, 1, 2])
+            tm.assert_numpy_array_equal(indexer, exp, check_dtype=False)
+            assert ordered.freq == 'D'
+
+            ordered, indexer = idx.sort_values(return_indexer=True,
+                                               ascending=False)
+            tm.assert_index_equal(ordered, expected[::-1])
+
+            exp = np.array([2, 1, 3, 4, 0])
+            tm.assert_numpy_array_equal(indexer, exp, check_dtype=False)
+            assert ordered.freq == 'D'
+
+    def test_nat_new(self):
+
+        idx = pd.period_range('2011-01', freq='M', periods=5, name='x')
+        result = idx._nat_new()
+        exp = pd.PeriodIndex([pd.NaT] * 5, freq='M', name='x')
+        tm.assert_index_equal(result, exp)
+
+        result = idx._nat_new(box=False)
+        exp = np.array([tslib.iNaT] * 5, dtype=np.int64)
+        tm.assert_numpy_array_equal(result, exp)
+
+    def test_shift(self):
+        # GH 9903
+        idx = pd.PeriodIndex([], name='xxx', freq='H')
+
+        with pytest.raises(TypeError):
+            # period shift doesn't accept freq
+            idx.shift(1, freq='H')
+
+        tm.assert_index_equal(idx.shift(0), idx)
+        tm.assert_index_equal(idx.shift(3), idx)
+
+        idx = pd.PeriodIndex(['2011-01-01 10:00', '2011-01-01 11:00',
+                              '2011-01-01 12:00'], name='xxx', freq='H')
+        tm.assert_index_equal(idx.shift(0), idx)
+        exp = pd.PeriodIndex(['2011-01-01 13:00', '2011-01-01 14:00',
+                              '2011-01-01 15:00'], name='xxx', freq='H')
+        tm.assert_index_equal(idx.shift(3), exp)
+        exp = pd.PeriodIndex(['2011-01-01 07:00', '2011-01-01 08:00',
+                              '2011-01-01 09:00'], name='xxx', freq='H')
+        tm.assert_index_equal(idx.shift(-3), exp)
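test_shift above pins down that PeriodIndex.shift moves each element by whole periods of the index's own frequency and, unlike DatetimeIndex.shift, rejects a `freq` argument with TypeError. A small sketch of the passing path:

```python
import pandas as pd

idx = pd.period_range('2011-01-01 10:00', periods=3, freq='H', name='xxx')
print(idx.shift(3))
# PeriodIndex(['2011-01-01 13:00', '2011-01-01 14:00', '2011-01-01 15:00'],
#             dtype='period[H]', name='xxx', freq='H')
```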
+    def test_repeat(self):
+        index = pd.period_range('2001-01-01', periods=2, freq='D')
+        exp = pd.PeriodIndex(['2001-01-01', '2001-01-01',
+                              '2001-01-02', '2001-01-02'], freq='D')
+        for res in [index.repeat(2), np.repeat(index, 2)]:
+            tm.assert_index_equal(res, exp)
+
+        index = pd.period_range('2001-01-01', periods=2, freq='2D')
+        exp = pd.PeriodIndex(['2001-01-01', '2001-01-01',
+                              '2001-01-03', '2001-01-03'], freq='2D')
+        for res in [index.repeat(2), np.repeat(index, 2)]:
+            tm.assert_index_equal(res, exp)
+
+        index = pd.PeriodIndex(['2001-01', 'NaT', '2003-01'], freq='M')
+        exp = pd.PeriodIndex(['2001-01', '2001-01', '2001-01',
+                              'NaT', 'NaT', 'NaT',
+                              '2003-01', '2003-01', '2003-01'], freq='M')
+        for res in [index.repeat(3), np.repeat(index, 3)]:
+            tm.assert_index_equal(res, exp)
+
+    def test_nat(self):
+        assert pd.PeriodIndex._na_value is pd.NaT
+        assert pd.PeriodIndex([], freq='M')._na_value is pd.NaT
+
+        idx = pd.PeriodIndex(['2011-01-01', '2011-01-02'], freq='D')
+        assert idx._can_hold_na
+
+        tm.assert_numpy_array_equal(idx._isnan, np.array([False, False]))
+        assert not idx.hasnans
+        tm.assert_numpy_array_equal(idx._nan_idxs,
+                                    np.array([], dtype=np.intp))
+
+        idx = pd.PeriodIndex(['2011-01-01', 'NaT'], freq='D')
+        assert idx._can_hold_na
+
+        tm.assert_numpy_array_equal(idx._isnan, np.array([False, True]))
+        assert idx.hasnans
+        tm.assert_numpy_array_equal(idx._nan_idxs,
+                                    np.array([1], dtype=np.intp))
+
+    def test_equals(self):
+        # GH 13107
+        for freq in ['D', 'M']:
+            idx = pd.PeriodIndex(['2011-01-01', '2011-01-02', 'NaT'],
+                                 freq=freq)
+            assert idx.equals(idx)
+            assert idx.equals(idx.copy())
+            assert idx.equals(idx.asobject)
+            assert idx.asobject.equals(idx)
+            assert idx.asobject.equals(idx.asobject)
+            assert not idx.equals(list(idx))
+            assert not idx.equals(pd.Series(idx))
+
+            idx2 = pd.PeriodIndex(['2011-01-01', '2011-01-02', 'NaT'],
+                                  freq='H')
+            assert not idx.equals(idx2)
+            assert not idx.equals(idx2.copy())
+            assert not idx.equals(idx2.asobject)
+            assert not idx.asobject.equals(idx2)
+            assert not idx.equals(list(idx2))
+            assert not idx.equals(pd.Series(idx2))
+
+            # same internal values, but a different freq
+            idx3 = pd.PeriodIndex._simple_new(idx.asi8, freq='H')
+            tm.assert_numpy_array_equal(idx.asi8, idx3.asi8)
+            assert not idx.equals(idx3)
+            assert not idx.equals(idx3.copy())
+            assert not idx.equals(idx3.asobject)
+            assert not idx.asobject.equals(idx3)
+            assert not idx.equals(list(idx3))
+            assert not idx.equals(pd.Series(idx3))
+
+
+class TestPeriodIndexSeriesMethods(object):
+    """ Test PeriodIndex and Period Series Ops consistency """
+
+    def _check(self, values, func, expected):
+        idx = pd.PeriodIndex(values)
+        result = func(idx)
+        if isinstance(expected, pd.Index):
+            tm.assert_index_equal(result, expected)
+        else:
+            # comp op results in bool
+            tm.assert_numpy_array_equal(result, expected)
+
+        s = pd.Series(values)
+        result = func(s)
+
+        exp = pd.Series(expected, name=values.name)
+        tm.assert_series_equal(result, exp)
+
+    def test_pi_ops(self):
+        idx = PeriodIndex(['2011-01', '2011-02', '2011-03',
+                           '2011-04'], freq='M', name='idx')
+
+        expected = PeriodIndex(['2011-03', '2011-04',
+                                '2011-05', '2011-06'], freq='M', name='idx')
+        self._check(idx, lambda x: x + 2, expected)
+        self._check(idx, lambda x: 2 + x, expected)
+
+        self._check(idx + 2, lambda x: x - 2, idx)
+        result = idx - Period('2011-01', freq='M')
+        exp = pd.Index([0, 1, 2, 3], name='idx')
+        tm.assert_index_equal(result, exp)
+
+        result = Period('2011-01', freq='M') - idx
+        exp = pd.Index([0, -1, -2, -3], name='idx')
+        tm.assert_index_equal(result, exp)
+
+    def test_pi_ops_errors(self):
+        idx = PeriodIndex(['2011-01', '2011-02', '2011-03',
+                           '2011-04'], freq='M', name='idx')
+        s = pd.Series(idx)
+
+        msg = r"unsupported operand type\(s\)"
+
+        for obj in [idx, s]:
+            for ng in ["str", 1.5]:
+                with tm.assert_raises_regex(TypeError, msg):
+                    obj + ng
+
+                with pytest.raises(TypeError):
+                    # error message differs between PY2 and 3
+                    ng + obj
+
+                with tm.assert_raises_regex(TypeError, msg):
+                    obj - ng
+
+                with pytest.raises(TypeError):
+                    np.add(obj, ng)
+
+                if
_np_version_under1p10: + assert np.add(ng, obj) is NotImplemented + else: + with pytest.raises(TypeError): + np.add(ng, obj) + + with pytest.raises(TypeError): + np.subtract(obj, ng) + + if _np_version_under1p10: + assert np.subtract(ng, obj) is NotImplemented + else: + with pytest.raises(TypeError): + np.subtract(ng, obj) + + def test_pi_ops_nat(self): + idx = PeriodIndex(['2011-01', '2011-02', 'NaT', + '2011-04'], freq='M', name='idx') + expected = PeriodIndex(['2011-03', '2011-04', + 'NaT', '2011-06'], freq='M', name='idx') + self._check(idx, lambda x: x + 2, expected) + self._check(idx, lambda x: 2 + x, expected) + self._check(idx, lambda x: np.add(x, 2), expected) + + self._check(idx + 2, lambda x: x - 2, idx) + self._check(idx + 2, lambda x: np.subtract(x, 2), idx) + + # freq with mult + idx = PeriodIndex(['2011-01', '2011-02', 'NaT', + '2011-04'], freq='2M', name='idx') + expected = PeriodIndex(['2011-07', '2011-08', + 'NaT', '2011-10'], freq='2M', name='idx') + self._check(idx, lambda x: x + 3, expected) + self._check(idx, lambda x: 3 + x, expected) + self._check(idx, lambda x: np.add(x, 3), expected) + + self._check(idx + 3, lambda x: x - 3, idx) + self._check(idx + 3, lambda x: np.subtract(x, 3), idx) + + def test_pi_ops_array_int(self): + idx = PeriodIndex(['2011-01', '2011-02', 'NaT', + '2011-04'], freq='M', name='idx') + f = lambda x: x + np.array([1, 2, 3, 4]) + exp = PeriodIndex(['2011-02', '2011-04', 'NaT', + '2011-08'], freq='M', name='idx') + self._check(idx, f, exp) + + f = lambda x: np.add(x, np.array([4, -1, 1, 2])) + exp = PeriodIndex(['2011-05', '2011-01', 'NaT', + '2011-06'], freq='M', name='idx') + self._check(idx, f, exp) + + f = lambda x: x - np.array([1, 2, 3, 4]) + exp = PeriodIndex(['2010-12', '2010-12', 'NaT', + '2010-12'], freq='M', name='idx') + self._check(idx, f, exp) + + f = lambda x: np.subtract(x, np.array([3, 2, 3, -2])) + exp = PeriodIndex(['2010-10', '2010-12', 'NaT', + '2011-06'], freq='M', name='idx') + self._check(idx, f, exp) + + def test_pi_ops_offset(self): + idx = PeriodIndex(['2011-01-01', '2011-02-01', '2011-03-01', + '2011-04-01'], freq='D', name='idx') + f = lambda x: x + offsets.Day() + exp = PeriodIndex(['2011-01-02', '2011-02-02', '2011-03-02', + '2011-04-02'], freq='D', name='idx') + self._check(idx, f, exp) + + f = lambda x: x + offsets.Day(2) + exp = PeriodIndex(['2011-01-03', '2011-02-03', '2011-03-03', + '2011-04-03'], freq='D', name='idx') + self._check(idx, f, exp) + + f = lambda x: x - offsets.Day(2) + exp = PeriodIndex(['2010-12-30', '2011-01-30', '2011-02-27', + '2011-03-30'], freq='D', name='idx') + self._check(idx, f, exp) + + def test_pi_offset_errors(self): + idx = PeriodIndex(['2011-01-01', '2011-02-01', '2011-03-01', + '2011-04-01'], freq='D', name='idx') + s = pd.Series(idx) + + # Series op is applied per Period instance, thus error is raised + # from Period + msg_idx = r"Input has different freq from PeriodIndex\(freq=D\)" + msg_s = r"Input cannot be converted to Period\(freq=D\)" + for obj, msg in [(idx, msg_idx), (s, msg_s)]: + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + obj + offsets.Hour(2) + + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + offsets.Hour(2) + obj + + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + obj - offsets.Hour(2) + + def test_pi_sub_period(self): + # GH 13071 + idx = PeriodIndex(['2011-01', '2011-02', '2011-03', + '2011-04'], freq='M', name='idx') + + result = idx - pd.Period('2012-01', freq='M') + exp = pd.Index([-12, -11, 
-10, -9], name='idx')
+        tm.assert_index_equal(result, exp)
+
+        result = np.subtract(idx, pd.Period('2012-01', freq='M'))
+        tm.assert_index_equal(result, exp)
+
+        result = pd.Period('2012-01', freq='M') - idx
+        exp = pd.Index([12, 11, 10, 9], name='idx')
+        tm.assert_index_equal(result, exp)
+
+        result = np.subtract(pd.Period('2012-01', freq='M'), idx)
+        if _np_version_under1p10:
+            assert result is NotImplemented
+        else:
+            tm.assert_index_equal(result, exp)
+
+        exp = pd.TimedeltaIndex([np.nan, np.nan, np.nan, np.nan], name='idx')
+        tm.assert_index_equal(idx - pd.Period('NaT', freq='M'), exp)
+        tm.assert_index_equal(pd.Period('NaT', freq='M') - idx, exp)
+
+    def test_pi_sub_pdnat(self):
+        # GH 13071
+        idx = PeriodIndex(['2011-01', '2011-02', 'NaT',
+                           '2011-04'], freq='M', name='idx')
+        exp = pd.TimedeltaIndex([pd.NaT] * 4, name='idx')
+        tm.assert_index_equal(pd.NaT - idx, exp)
+        tm.assert_index_equal(idx - pd.NaT, exp)
+
+    def test_pi_sub_period_nat(self):
+        # GH 13071
+        idx = PeriodIndex(['2011-01', 'NaT', '2011-03',
+                           '2011-04'], freq='M', name='idx')
+
+        result = idx - pd.Period('2012-01', freq='M')
+        exp = pd.Index([-12, np.nan, -10, -9], name='idx')
+        tm.assert_index_equal(result, exp)
+
+        result = pd.Period('2012-01', freq='M') - idx
+        exp = pd.Index([12, np.nan, 10, 9], name='idx')
+        tm.assert_index_equal(result, exp)
+
+        exp = pd.TimedeltaIndex([np.nan, np.nan, np.nan, np.nan], name='idx')
+        tm.assert_index_equal(idx - pd.Period('NaT', freq='M'), exp)
+        tm.assert_index_equal(pd.Period('NaT', freq='M') - idx, exp)
+
+    def test_pi_comp_period(self):
+        idx = PeriodIndex(['2011-01', '2011-02', '2011-03',
+                           '2011-04'], freq='M', name='idx')
+
+        f = lambda x: x == pd.Period('2011-03', freq='M')
+        exp = np.array([False, False, True, False], dtype=np.bool)
+        self._check(idx, f, exp)
+        f = lambda x: pd.Period('2011-03', freq='M') == x
+        self._check(idx, f, exp)
+
+        f = lambda x: x != pd.Period('2011-03', freq='M')
+        exp = np.array([True, True, False, True], dtype=np.bool)
+        self._check(idx, f, exp)
+        f = lambda x: pd.Period('2011-03', freq='M') != x
+        self._check(idx, f, exp)
+
+        f = lambda x: pd.Period('2011-03', freq='M') >= x
+        exp = np.array([True, True, True, False], dtype=np.bool)
+        self._check(idx, f, exp)
+
+        f = lambda x: x > pd.Period('2011-03', freq='M')
+        exp = np.array([False, False, False, True], dtype=np.bool)
+        self._check(idx, f, exp)
+
+        f = lambda x: x >= pd.Period('2011-03', freq='M')
+        exp = np.array([False, False, True, True], dtype=np.bool)
+        self._check(idx, f, exp)
+
+    def test_pi_comp_period_nat(self):
+        idx = PeriodIndex(['2011-01', 'NaT', '2011-03',
+                           '2011-04'], freq='M', name='idx')
+
+        f = lambda x: x == pd.Period('2011-03', freq='M')
+        exp = np.array([False, False, True, False], dtype=np.bool)
+        self._check(idx, f, exp)
+        f = lambda x: pd.Period('2011-03', freq='M') == x
+        self._check(idx, f, exp)
+
+        f = lambda x: x == tslib.NaT
+        exp = np.array([False, False, False, False], dtype=np.bool)
+        self._check(idx, f, exp)
+        f = lambda x: tslib.NaT == x
+        self._check(idx, f, exp)
+
+        f = lambda x: x != pd.Period('2011-03', freq='M')
+        exp = np.array([True, True, False, True], dtype=np.bool)
+        self._check(idx, f, exp)
+        f = lambda x: pd.Period('2011-03', freq='M') != x
+        self._check(idx, f, exp)
+
+        f = lambda x: x != tslib.NaT
+        exp = np.array([True, True, True, True], dtype=np.bool)
+        self._check(idx, f, exp)
+        f = lambda x: tslib.NaT != x
+        self._check(idx, f, exp)
+
+        f = lambda x: pd.Period('2011-03', freq='M') >= x
+        exp = np.array([True, False, True, False],
dtype=np.bool) + self._check(idx, f, exp) + + f = lambda x: x < pd.Period('2011-03', freq='M') + exp = np.array([True, False, False, False], dtype=np.bool) + self._check(idx, f, exp) + + f = lambda x: x > tslib.NaT + exp = np.array([False, False, False, False], dtype=np.bool) + self._check(idx, f, exp) + + f = lambda x: tslib.NaT >= x + exp = np.array([False, False, False, False], dtype=np.bool) + self._check(idx, f, exp) + + +class TestSeriesPeriod(object): + + def setup_method(self, method): + self.series = Series(period_range('2000-01-01', periods=10, freq='D')) + + def test_ops_series_timedelta(self): + # GH 13043 + s = pd.Series([pd.Period('2015-01-01', freq='D'), + pd.Period('2015-01-02', freq='D')], name='xxx') + assert s.dtype == object + + exp = pd.Series([pd.Period('2015-01-02', freq='D'), + pd.Period('2015-01-03', freq='D')], name='xxx') + tm.assert_series_equal(s + pd.Timedelta('1 days'), exp) + tm.assert_series_equal(pd.Timedelta('1 days') + s, exp) + + tm.assert_series_equal(s + pd.tseries.offsets.Day(), exp) + tm.assert_series_equal(pd.tseries.offsets.Day() + s, exp) + + def test_ops_series_period(self): + # GH 13043 + s = pd.Series([pd.Period('2015-01-01', freq='D'), + pd.Period('2015-01-02', freq='D')], name='xxx') + assert s.dtype == object + + p = pd.Period('2015-01-10', freq='D') + # dtype will be object because of original dtype + exp = pd.Series([9, 8], name='xxx', dtype=object) + tm.assert_series_equal(p - s, exp) + tm.assert_series_equal(s - p, -exp) + + s2 = pd.Series([pd.Period('2015-01-05', freq='D'), + pd.Period('2015-01-04', freq='D')], name='xxx') + assert s2.dtype == object + + exp = pd.Series([4, 2], name='xxx', dtype=object) + tm.assert_series_equal(s2 - s, exp) + tm.assert_series_equal(s - s2, -exp) + + +class TestFramePeriod(object): + + def test_ops_frame_period(self): + # GH 13043 + df = pd.DataFrame({'A': [pd.Period('2015-01', freq='M'), + pd.Period('2015-02', freq='M')], + 'B': [pd.Period('2014-01', freq='M'), + pd.Period('2014-02', freq='M')]}) + assert df['A'].dtype == object + assert df['B'].dtype == object + + p = pd.Period('2015-03', freq='M') + # dtype will be object because of original dtype + exp = pd.DataFrame({'A': np.array([2, 1], dtype=object), + 'B': np.array([14, 13], dtype=object)}) + tm.assert_frame_equal(p - df, exp) + tm.assert_frame_equal(df - p, -exp) + + df2 = pd.DataFrame({'A': [pd.Period('2015-05', freq='M'), + pd.Period('2015-06', freq='M')], + 'B': [pd.Period('2015-05', freq='M'), + pd.Period('2015-06', freq='M')]}) + assert df2['A'].dtype == object + assert df2['B'].dtype == object + + exp = pd.DataFrame({'A': np.array([4, 4], dtype=object), + 'B': np.array([16, 16], dtype=object)}) + tm.assert_frame_equal(df2 - df, exp) + tm.assert_frame_equal(df - df2, -exp) + + +class TestPeriodIndexComparisons(object): + + def test_pi_pi_comp(self): + + for freq in ['M', '2M', '3M']: + base = PeriodIndex(['2011-01', '2011-02', + '2011-03', '2011-04'], freq=freq) + p = Period('2011-02', freq=freq) + + exp = np.array([False, True, False, False]) + tm.assert_numpy_array_equal(base == p, exp) + tm.assert_numpy_array_equal(p == base, exp) + + exp = np.array([True, False, True, True]) + tm.assert_numpy_array_equal(base != p, exp) + tm.assert_numpy_array_equal(p != base, exp) + + exp = np.array([False, False, True, True]) + tm.assert_numpy_array_equal(base > p, exp) + tm.assert_numpy_array_equal(p < base, exp) + + exp = np.array([True, False, False, False]) + tm.assert_numpy_array_equal(base < p, exp) + tm.assert_numpy_array_equal(p > base, exp) 
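+ # Illustrative sketch of the elementwise broadcasting asserted above (values assume freq='M'; comparisons return plain bool ndarrays): + # >>> base > Period('2011-02', freq='M') + # array([False, False,  True,  True])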
+ + exp = np.array([False, True, True, True]) + tm.assert_numpy_array_equal(base >= p, exp) + tm.assert_numpy_array_equal(p <= base, exp) + + exp = np.array([True, True, False, False]) + tm.assert_numpy_array_equal(base <= p, exp) + tm.assert_numpy_array_equal(p >= base, exp) + + idx = PeriodIndex(['2011-02', '2011-01', '2011-03', + '2011-05'], freq=freq) + + exp = np.array([False, False, True, False]) + tm.assert_numpy_array_equal(base == idx, exp) + + exp = np.array([True, True, False, True]) + tm.assert_numpy_array_equal(base != idx, exp) + + exp = np.array([False, True, False, False]) + tm.assert_numpy_array_equal(base > idx, exp) + + exp = np.array([True, False, False, True]) + tm.assert_numpy_array_equal(base < idx, exp) + + exp = np.array([False, True, True, False]) + tm.assert_numpy_array_equal(base >= idx, exp) + + exp = np.array([True, False, True, True]) + tm.assert_numpy_array_equal(base <= idx, exp) + + # different base freq + msg = "Input has different freq=A-DEC from PeriodIndex" + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + base <= Period('2011', freq='A') + + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + Period('2011', freq='A') >= base + + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + idx = PeriodIndex(['2011', '2012', '2013', '2014'], freq='A') + base <= idx + + # Different frequency + msg = "Input has different freq=4M from PeriodIndex" + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + base <= Period('2011', freq='4M') + + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + Period('2011', freq='4M') >= base + + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + idx = PeriodIndex(['2011', '2012', '2013', '2014'], freq='4M') + base <= idx + + def test_pi_nat_comp(self): + for freq in ['M', '2M', '3M']: + idx1 = PeriodIndex( + ['2011-01', '2011-02', 'NaT', '2011-05'], freq=freq) + + result = idx1 > Period('2011-02', freq=freq) + exp = np.array([False, False, False, True]) + tm.assert_numpy_array_equal(result, exp) + result = Period('2011-02', freq=freq) < idx1 + tm.assert_numpy_array_equal(result, exp) + + result = idx1 == Period('NaT', freq=freq) + exp = np.array([False, False, False, False]) + tm.assert_numpy_array_equal(result, exp) + result = Period('NaT', freq=freq) == idx1 + tm.assert_numpy_array_equal(result, exp) + + result = idx1 != Period('NaT', freq=freq) + exp = np.array([True, True, True, True]) + tm.assert_numpy_array_equal(result, exp) + result = Period('NaT', freq=freq) != idx1 + tm.assert_numpy_array_equal(result, exp) + + idx2 = PeriodIndex(['2011-02', '2011-01', '2011-04', + 'NaT'], freq=freq) + result = idx1 < idx2 + exp = np.array([True, False, False, False]) + tm.assert_numpy_array_equal(result, exp) + + result = idx1 == idx2 + exp = np.array([False, False, False, False]) + tm.assert_numpy_array_equal(result, exp) + + result = idx1 != idx2 + exp = np.array([True, True, True, True]) + tm.assert_numpy_array_equal(result, exp) + + result = idx1 == idx1 + exp = np.array([True, True, False, True]) + tm.assert_numpy_array_equal(result, exp) + + result = idx1 != idx1 + exp = np.array([False, False, True, False]) + tm.assert_numpy_array_equal(result, exp) + + diff = PeriodIndex(['2011-02', '2011-01', '2011-04', + 'NaT'], freq='4M') + msg = "Input has different freq=4M from PeriodIndex" + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + idx1 > diff + + with tm.assert_raises_regex( + period.IncompatibleFrequency, 
msg): + idx1 == diff diff --git a/pandas/tests/indexes/period/test_partial_slicing.py b/pandas/tests/indexes/period/test_partial_slicing.py new file mode 100644 index 0000000000000..6d142722c315a --- /dev/null +++ b/pandas/tests/indexes/period/test_partial_slicing.py @@ -0,0 +1,141 @@ +import pytest + +import numpy as np + +import pandas as pd +from pandas.util import testing as tm +from pandas import (Series, period_range, DatetimeIndex, PeriodIndex, + DataFrame, _np_version_under1p12, Period) + + +class TestPeriodIndex(object): + + def setup_method(self, method): + pass + + def test_slice_with_negative_step(self): + ts = Series(np.arange(20), + period_range('2014-01', periods=20, freq='M')) + SLC = pd.IndexSlice + + def assert_slices_equivalent(l_slc, i_slc): + tm.assert_series_equal(ts[l_slc], ts.iloc[i_slc]) + tm.assert_series_equal(ts.loc[l_slc], ts.iloc[i_slc]) + tm.assert_series_equal(ts.loc[l_slc], ts.iloc[i_slc]) + + assert_slices_equivalent(SLC[Period('2014-10')::-1], SLC[9::-1]) + assert_slices_equivalent(SLC['2014-10'::-1], SLC[9::-1]) + + assert_slices_equivalent(SLC[:Period('2014-10'):-1], SLC[:8:-1]) + assert_slices_equivalent(SLC[:'2014-10':-1], SLC[:8:-1]) + + assert_slices_equivalent(SLC['2015-02':'2014-10':-1], SLC[13:8:-1]) + assert_slices_equivalent(SLC[Period('2015-02'):Period('2014-10'):-1], + SLC[13:8:-1]) + assert_slices_equivalent(SLC['2015-02':Period('2014-10'):-1], + SLC[13:8:-1]) + assert_slices_equivalent(SLC[Period('2015-02'):'2014-10':-1], + SLC[13:8:-1]) + + assert_slices_equivalent(SLC['2014-10':'2015-02':-1], SLC[:0]) + + def test_slice_with_zero_step_raises(self): + ts = Series(np.arange(20), + period_range('2014-01', periods=20, freq='M')) + tm.assert_raises_regex(ValueError, 'slice step cannot be zero', + lambda: ts[::0]) + tm.assert_raises_regex(ValueError, 'slice step cannot be zero', + lambda: ts.loc[::0]) + tm.assert_raises_regex(ValueError, 'slice step cannot be zero', + lambda: ts.loc[::0]) + + def test_slice_keep_name(self): + idx = period_range('20010101', periods=10, freq='D', name='bob') + assert idx.name == idx[1:].name + + def test_pindex_slice_index(self): + pi = PeriodIndex(start='1/1/10', end='12/31/12', freq='M') + s = Series(np.random.rand(len(pi)), index=pi) + res = s['2010'] + exp = s[0:12] + tm.assert_series_equal(res, exp) + res = s['2011'] + exp = s[12:24] + tm.assert_series_equal(res, exp) + + def test_range_slice_day(self): + # GH 6716 + didx = DatetimeIndex(start='2013/01/01', freq='D', periods=400) + pidx = PeriodIndex(start='2013/01/01', freq='D', periods=400) + + # changed to TypeError in 1.12 + # https://github.com/numpy/numpy/pull/6271 + exc = IndexError if _np_version_under1p12 else TypeError + + for idx in [didx, pidx]: + # slices against index should raise IndexError + values = ['2014', '2013/02', '2013/01/02', '2013/02/01 9H', + '2013/02/01 09:00'] + for v in values: + with pytest.raises(exc): + idx[v:] + + s = Series(np.random.rand(len(idx)), index=idx) + + tm.assert_series_equal(s['2013/01/02':], s[1:]) + tm.assert_series_equal(s['2013/01/02':'2013/01/05'], s[1:5]) + tm.assert_series_equal(s['2013/02':], s[31:]) + tm.assert_series_equal(s['2014':], s[365:]) + + invalid = ['2013/02/01 9H', '2013/02/01 09:00'] + for v in invalid: + with pytest.raises(exc): + idx[v:] + + def test_range_slice_seconds(self): + # GH 6716 + didx = DatetimeIndex(start='2013/01/01 09:00:00', freq='S', + periods=4000) + pidx = PeriodIndex(start='2013/01/01 09:00:00', freq='S', periods=4000) + + # changed to TypeError in 1.12 + # 
https://github.com/numpy/numpy/pull/6271 + exc = IndexError if _np_version_under1p12 else TypeError + + for idx in [didx, pidx]: + # slices against index should raise IndexError + values = ['2014', '2013/02', '2013/01/02', '2013/02/01 9H', + '2013/02/01 09:00'] + for v in values: + with pytest.raises(exc): + idx[v:] + + s = Series(np.random.rand(len(idx)), index=idx) + + tm.assert_series_equal(s['2013/01/01 09:05':'2013/01/01 09:10'], + s[300:660]) + tm.assert_series_equal(s['2013/01/01 10:00':'2013/01/01 10:05'], + s[3600:3960]) + tm.assert_series_equal(s['2013/01/01 10H':], s[3600:]) + tm.assert_series_equal(s[:'2013/01/01 09:30'], s[:1860]) + for d in ['2013/01/01', '2013/01', '2013']: + tm.assert_series_equal(s[d:], s) + + def test_range_slice_outofbounds(self): + # GH 5407 + didx = DatetimeIndex(start='2013/10/01', freq='D', periods=10) + pidx = PeriodIndex(start='2013/10/01', freq='D', periods=10) + + for idx in [didx, pidx]: + df = DataFrame(dict(units=[100 + i for i in range(10)]), index=idx) + empty = DataFrame(index=idx.__class__([], freq='D'), + columns=['units']) + empty['units'] = empty['units'].astype('int64') + + tm.assert_frame_equal(df['2013/09/01':'2013/09/30'], empty) + tm.assert_frame_equal(df['2013/09/30':'2013/10/02'], df.iloc[:2]) + tm.assert_frame_equal(df['2013/10/01':'2013/10/02'], df.iloc[:2]) + tm.assert_frame_equal(df['2013/10/02':'2013/09/30'], empty) + tm.assert_frame_equal(df['2013/10/15':'2013/10/17'], empty) + tm.assert_frame_equal(df['2013-06':'2013-09'], empty) + tm.assert_frame_equal(df['2013-11':'2013-12'], empty) diff --git a/pandas/tests/indexes/period/test_period.py b/pandas/tests/indexes/period/test_period.py new file mode 100644 index 0000000000000..e24e2ad936e2c --- /dev/null +++ b/pandas/tests/indexes/period/test_period.py @@ -0,0 +1,781 @@ +import pytest + +import numpy as np +from numpy.random import randn +from datetime import timedelta + +import pandas as pd +from pandas.util import testing as tm +from pandas import (PeriodIndex, period_range, notna, DatetimeIndex, NaT, + Index, Period, Int64Index, Series, DataFrame, date_range, + offsets, compat) + +from ..datetimelike import DatetimeLike + + +class TestPeriodIndex(DatetimeLike): + _holder = PeriodIndex + _multiprocess_can_split_ = True + + def setup_method(self, method): + self.indices = dict(index=tm.makePeriodIndex(10)) + self.setup_indices() + + def create_index(self): + return period_range('20130101', periods=5, freq='D') + + def test_astype(self): + # GH 13149, GH 13209 + idx = PeriodIndex(['2016-05-16', 'NaT', NaT, np.NaN], freq='D') + + result = idx.astype(object) + expected = Index([Period('2016-05-16', freq='D')] + + [Period(NaT, freq='D')] * 3, dtype='object') + tm.assert_index_equal(result, expected) + + result = idx.astype(int) + expected = Int64Index([16937] + [-9223372036854775808] * 3, + dtype=np.int64) + tm.assert_index_equal(result, expected) + + idx = period_range('1990', '2009', freq='A') + result = idx.astype('i8') + tm.assert_index_equal(result, Index(idx.asi8)) + tm.assert_numpy_array_equal(result.values, idx.asi8) + + def test_astype_raises(self): + # GH 13149, GH 13209 + idx = PeriodIndex(['2016-05-16', 'NaT', NaT, np.NaN], freq='D') + + pytest.raises(ValueError, idx.astype, str) + pytest.raises(ValueError, idx.astype, float) + pytest.raises(ValueError, idx.astype, 'timedelta64') + pytest.raises(ValueError, idx.astype, 'timedelta64[ns]') + + def test_pickle_compat_construction(self): + pass + + def test_pickle_round_trip(self): + for freq in ['D', 'M', 'A']: + idx = 
PeriodIndex(['2016-05-16', 'NaT', NaT, np.NaN], freq=freq) + result = tm.round_trip_pickle(idx) + tm.assert_index_equal(result, idx) + + def test_get_loc(self): + idx = pd.period_range('2000-01-01', periods=3) + + for method in [None, 'pad', 'backfill', 'nearest']: + assert idx.get_loc(idx[1], method) == 1 + assert idx.get_loc(idx[1].asfreq('H', how='start'), method) == 1 + assert idx.get_loc(idx[1].to_timestamp(), method) == 1 + assert idx.get_loc(idx[1].to_timestamp() + .to_pydatetime(), method) == 1 + assert idx.get_loc(str(idx[1]), method) == 1 + + idx = pd.period_range('2000-01-01', periods=5)[::2] + assert idx.get_loc('2000-01-02T12', method='nearest', + tolerance='1 day') == 1 + assert idx.get_loc('2000-01-02T12', method='nearest', + tolerance=pd.Timedelta('1D')) == 1 + assert idx.get_loc('2000-01-02T12', method='nearest', + tolerance=np.timedelta64(1, 'D')) == 1 + assert idx.get_loc('2000-01-02T12', method='nearest', + tolerance=timedelta(1)) == 1 + with tm.assert_raises_regex(ValueError, 'must be convertible'): + idx.get_loc('2000-01-10', method='nearest', tolerance='foo') + + msg = 'Input has different freq from PeriodIndex\\(freq=D\\)' + with tm.assert_raises_regex(ValueError, msg): + idx.get_loc('2000-01-10', method='nearest', tolerance='1 hour') + with pytest.raises(KeyError): + idx.get_loc('2000-01-10', method='nearest', tolerance='1 day') + + def test_where(self): + i = self.create_index() + result = i.where(notna(i)) + expected = i + tm.assert_index_equal(result, expected) + + i2 = pd.PeriodIndex([pd.NaT, pd.NaT] + i[2:].tolist(), + freq='D') + result = i.where(notna(i2)) + expected = i2 + tm.assert_index_equal(result, expected) + + def test_where_array_like(self): + i = self.create_index() + cond = [False] + [True] * (len(i) - 1) + klasses = [list, tuple, np.array, Series] + expected = pd.PeriodIndex([pd.NaT] + i[1:].tolist(), freq='D') + + for klass in klasses: + result = i.where(klass(cond)) + tm.assert_index_equal(result, expected) + + def test_where_other(self): + + i = self.create_index() + for arr in [np.nan, pd.NaT]: + result = i.where(notna(i), other=np.nan) + expected = i + tm.assert_index_equal(result, expected) + + i2 = i.copy() + i2 = pd.PeriodIndex([pd.NaT, pd.NaT] + i[2:].tolist(), + freq='D') + result = i.where(notna(i2), i2) + tm.assert_index_equal(result, i2) + + i2 = i.copy() + i2 = pd.PeriodIndex([pd.NaT, pd.NaT] + i[2:].tolist(), + freq='D') + result = i.where(notna(i2), i2.values) + tm.assert_index_equal(result, i2) + + def test_get_indexer(self): + idx = pd.period_range('2000-01-01', periods=3).asfreq('H', how='start') + tm.assert_numpy_array_equal(idx.get_indexer(idx), + np.array([0, 1, 2], dtype=np.intp)) + + target = pd.PeriodIndex(['1999-12-31T23', '2000-01-01T12', + '2000-01-02T01'], freq='H') + tm.assert_numpy_array_equal(idx.get_indexer(target, 'pad'), + np.array([-1, 0, 1], dtype=np.intp)) + tm.assert_numpy_array_equal(idx.get_indexer(target, 'backfill'), + np.array([0, 1, 2], dtype=np.intp)) + tm.assert_numpy_array_equal(idx.get_indexer(target, 'nearest'), + np.array([0, 1, 1], dtype=np.intp)) + tm.assert_numpy_array_equal(idx.get_indexer(target, 'nearest', + tolerance='1 hour'), + np.array([0, -1, 1], dtype=np.intp)) + + msg = 'Input has different freq from PeriodIndex\\(freq=H\\)' + with tm.assert_raises_regex(ValueError, msg): + idx.get_indexer(target, 'nearest', tolerance='1 minute') + + tm.assert_numpy_array_equal(idx.get_indexer(target, 'nearest', + tolerance='1 day'), + np.array([0, 1, 1], dtype=np.intp)) + + def test_repeat(self): + # 
GH10183 + idx = pd.period_range('2000-01-01', periods=3, freq='D') + res = idx.repeat(3) + exp = PeriodIndex(idx.values.repeat(3), freq='D') + tm.assert_index_equal(res, exp) + assert res.freqstr == 'D' + + def test_period_index_indexer(self): + # GH4125 + idx = pd.period_range('2002-01', '2003-12', freq='M') + df = pd.DataFrame(pd.np.random.randn(24, 10), index=idx) + tm.assert_frame_equal(df, df.loc[idx]) + tm.assert_frame_equal(df, df.loc[list(idx)]) + tm.assert_frame_equal(df, df.loc[list(idx)]) + tm.assert_frame_equal(df.iloc[0:5], df.loc[idx[0:5]]) + tm.assert_frame_equal(df, df.loc[list(idx)]) + + def test_fillna_period(self): + # GH 11343 + idx = pd.PeriodIndex(['2011-01-01 09:00', pd.NaT, + '2011-01-01 11:00'], freq='H') + + exp = pd.PeriodIndex(['2011-01-01 09:00', '2011-01-01 10:00', + '2011-01-01 11:00'], freq='H') + tm.assert_index_equal( + idx.fillna(pd.Period('2011-01-01 10:00', freq='H')), exp) + + exp = pd.Index([pd.Period('2011-01-01 09:00', freq='H'), 'x', + pd.Period('2011-01-01 11:00', freq='H')], dtype=object) + tm.assert_index_equal(idx.fillna('x'), exp) + + exp = pd.Index([pd.Period('2011-01-01 09:00', freq='H'), + pd.Period('2011-01-01', freq='D'), + pd.Period('2011-01-01 11:00', freq='H')], dtype=object) + tm.assert_index_equal(idx.fillna( + pd.Period('2011-01-01', freq='D')), exp) + + def test_no_millisecond_field(self): + with pytest.raises(AttributeError): + DatetimeIndex.millisecond + + with pytest.raises(AttributeError): + DatetimeIndex([]).millisecond + + def test_difference_freq(self): + # GH14323: difference of Period MUST preserve frequency + # but the ability to union results must be preserved + + index = period_range("20160920", "20160925", freq="D") + + other = period_range("20160921", "20160924", freq="D") + expected = PeriodIndex(["20160920", "20160925"], freq='D') + idx_diff = index.difference(other) + tm.assert_index_equal(idx_diff, expected) + tm.assert_attr_equal('freq', idx_diff, expected) + + other = period_range("20160922", "20160925", freq="D") + idx_diff = index.difference(other) + expected = PeriodIndex(["20160920", "20160921"], freq='D') + tm.assert_index_equal(idx_diff, expected) + tm.assert_attr_equal('freq', idx_diff, expected) + + def test_hash_error(self): + index = period_range('20010101', periods=10) + with tm.assert_raises_regex(TypeError, "unhashable type: %r" % + type(index).__name__): + hash(index) + + def test_make_time_series(self): + index = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009') + series = Series(1, index=index) + assert isinstance(series, Series) + + def test_shallow_copy_empty(self): + + # GH13067 + idx = PeriodIndex([], freq='M') + result = idx._shallow_copy() + expected = idx + + tm.assert_index_equal(result, expected) + + def test_dtype_str(self): + pi = pd.PeriodIndex([], freq='M') + assert pi.dtype_str == 'period[M]' + assert pi.dtype_str == str(pi.dtype) + + pi = pd.PeriodIndex([], freq='3M') + assert pi.dtype_str == 'period[3M]' + assert pi.dtype_str == str(pi.dtype) + + def test_view_asi8(self): + idx = pd.PeriodIndex([], freq='M') + + exp = np.array([], dtype=np.int64) + tm.assert_numpy_array_equal(idx.view('i8'), exp) + tm.assert_numpy_array_equal(idx.asi8, exp) + + idx = pd.PeriodIndex(['2011-01', pd.NaT], freq='M') + + exp = np.array([492, -9223372036854775808], dtype=np.int64) + tm.assert_numpy_array_equal(idx.view('i8'), exp) + tm.assert_numpy_array_equal(idx.asi8, exp) + + exp = np.array([14975, -9223372036854775808], dtype=np.int64) + idx = pd.PeriodIndex(['2011-01-01', pd.NaT], freq='D') 
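+ # The i8 view exposes the raw ordinals: 14975 is the day count from the 1970-01-01 epoch to 2011-01-01, and NaT is stored as the int64 sentinel. Quick sanity check (illustrative only): + # >>> (pd.Timestamp('2011-01-01') - pd.Timestamp('1970-01-01')).days + # 14975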
+ tm.assert_numpy_array_equal(idx.view('i8'), exp) + tm.assert_numpy_array_equal(idx.asi8, exp) + + def test_values(self): + idx = pd.PeriodIndex([], freq='M') + + exp = np.array([], dtype=np.object) + tm.assert_numpy_array_equal(idx.values, exp) + tm.assert_numpy_array_equal(idx.get_values(), exp) + exp = np.array([], dtype=np.int64) + tm.assert_numpy_array_equal(idx._values, exp) + + idx = pd.PeriodIndex(['2011-01', pd.NaT], freq='M') + + exp = np.array([pd.Period('2011-01', freq='M'), pd.NaT], dtype=object) + tm.assert_numpy_array_equal(idx.values, exp) + tm.assert_numpy_array_equal(idx.get_values(), exp) + exp = np.array([492, -9223372036854775808], dtype=np.int64) + tm.assert_numpy_array_equal(idx._values, exp) + + idx = pd.PeriodIndex(['2011-01-01', pd.NaT], freq='D') + + exp = np.array([pd.Period('2011-01-01', freq='D'), pd.NaT], + dtype=object) + tm.assert_numpy_array_equal(idx.values, exp) + tm.assert_numpy_array_equal(idx.get_values(), exp) + exp = np.array([14975, -9223372036854775808], dtype=np.int64) + tm.assert_numpy_array_equal(idx._values, exp) + + def test_period_index_length(self): + pi = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009') + assert len(pi) == 9 + + pi = PeriodIndex(freq='Q', start='1/1/2001', end='12/1/2009') + assert len(pi) == 4 * 9 + + pi = PeriodIndex(freq='M', start='1/1/2001', end='12/1/2009') + assert len(pi) == 12 * 9 + + start = Period('02-Apr-2005', 'B') + i1 = PeriodIndex(start=start, periods=20) + assert len(i1) == 20 + assert i1.freq == start.freq + assert i1[0] == start + + end_intv = Period('2006-12-31', 'W') + i1 = PeriodIndex(end=end_intv, periods=10) + assert len(i1) == 10 + assert i1.freq == end_intv.freq + assert i1[-1] == end_intv + + end_intv = Period('2006-12-31', '1w') + i2 = PeriodIndex(end=end_intv, periods=10) + assert len(i1) == len(i2) + assert (i1 == i2).all() + assert i1.freq == i2.freq + + end_intv = Period('2006-12-31', ('w', 1)) + i2 = PeriodIndex(end=end_intv, periods=10) + assert len(i1) == len(i2) + assert (i1 == i2).all() + assert i1.freq == i2.freq + + try: + PeriodIndex(start=start, end=end_intv) + raise AssertionError('Cannot allow mixed freq for start and end') + except ValueError: + pass + + end_intv = Period('2005-05-01', 'B') + i1 = PeriodIndex(start=start, end=end_intv) + + try: + PeriodIndex(start=start) + raise AssertionError( + 'Must specify periods if missing start or end') + except ValueError: + pass + + # infer freq from first element + i2 = PeriodIndex([end_intv, Period('2005-05-05', 'B')]) + assert len(i2) == 2 + assert i2[0] == end_intv + + i2 = PeriodIndex(np.array([end_intv, Period('2005-05-05', 'B')])) + assert len(i2) == 2 + assert i2[0] == end_intv + + # Mixed freq should fail + vals = [end_intv, Period('2006-12-31', 'w')] + pytest.raises(ValueError, PeriodIndex, vals) + vals = np.array(vals) + pytest.raises(ValueError, PeriodIndex, vals) + + def test_fields(self): + # year, month, day, hour, minute + # second, weekofyear, week, dayofweek, weekday, dayofyear, quarter + # qyear + pi = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2005') + self._check_all_fields(pi) + + pi = PeriodIndex(freq='Q', start='1/1/2001', end='12/1/2002') + self._check_all_fields(pi) + + pi = PeriodIndex(freq='M', start='1/1/2001', end='1/1/2002') + self._check_all_fields(pi) + + pi = PeriodIndex(freq='D', start='12/1/2001', end='6/1/2001') + self._check_all_fields(pi) + + pi = PeriodIndex(freq='B', start='12/1/2001', end='6/1/2001') + self._check_all_fields(pi) + + pi = PeriodIndex(freq='H', start='12/31/2001', 
end='1/1/2002 23:00') + self._check_all_fields(pi) + + pi = PeriodIndex(freq='Min', start='12/31/2001', end='1/1/2002 00:20') + self._check_all_fields(pi) + + pi = PeriodIndex(freq='S', start='12/31/2001 00:00:00', + end='12/31/2001 00:05:00') + self._check_all_fields(pi) + + end_intv = Period('2006-12-31', 'W') + i1 = PeriodIndex(end=end_intv, periods=10) + self._check_all_fields(i1) + + def _check_all_fields(self, periodindex): + fields = ['year', 'month', 'day', 'hour', 'minute', 'second', + 'weekofyear', 'week', 'dayofweek', 'dayofyear', + 'quarter', 'qyear', 'days_in_month'] + + periods = list(periodindex) + s = pd.Series(periodindex) + + for field in fields: + field_idx = getattr(periodindex, field) + assert len(periodindex) == len(field_idx) + for x, val in zip(periods, field_idx): + assert getattr(x, field) == val + + if len(s) == 0: + continue + + field_s = getattr(s.dt, field) + assert len(periodindex) == len(field_s) + for x, val in zip(periods, field_s): + assert getattr(x, field) == val + + def test_indexing(self): + + # GH 4390, iat incorrectly indexing + index = period_range('1/1/2001', periods=10) + s = Series(randn(10), index=index) + expected = s[index[0]] + result = s.iat[0] + assert expected == result + + def test_period_set_index_reindex(self): + # GH 6631 + df = DataFrame(np.random.random(6)) + idx1 = period_range('2011/01/01', periods=6, freq='M') + idx2 = period_range('2013', periods=6, freq='A') + + df = df.set_index(idx1) + tm.assert_index_equal(df.index, idx1) + df = df.set_index(idx2) + tm.assert_index_equal(df.index, idx2) + + def test_factorize(self): + idx1 = PeriodIndex(['2014-01', '2014-01', '2014-02', '2014-02', + '2014-03', '2014-03'], freq='M') + + exp_arr = np.array([0, 0, 1, 1, 2, 2], dtype=np.intp) + exp_idx = PeriodIndex(['2014-01', '2014-02', '2014-03'], freq='M') + + arr, idx = idx1.factorize() + tm.assert_numpy_array_equal(arr, exp_arr) + tm.assert_index_equal(idx, exp_idx) + + arr, idx = idx1.factorize(sort=True) + tm.assert_numpy_array_equal(arr, exp_arr) + tm.assert_index_equal(idx, exp_idx) + + idx2 = pd.PeriodIndex(['2014-03', '2014-03', '2014-02', '2014-01', + '2014-03', '2014-01'], freq='M') + + exp_arr = np.array([2, 2, 1, 0, 2, 0], dtype=np.intp) + arr, idx = idx2.factorize(sort=True) + tm.assert_numpy_array_equal(arr, exp_arr) + tm.assert_index_equal(idx, exp_idx) + + exp_arr = np.array([0, 0, 1, 2, 0, 2], dtype=np.intp) + exp_idx = PeriodIndex(['2014-03', '2014-02', '2014-01'], freq='M') + arr, idx = idx2.factorize() + tm.assert_numpy_array_equal(arr, exp_arr) + tm.assert_index_equal(idx, exp_idx) + + def test_asobject_like(self): + idx = pd.PeriodIndex([], freq='M') + + exp = np.array([], dtype=object) + tm.assert_numpy_array_equal(idx.asobject.values, exp) + tm.assert_numpy_array_equal(idx._mpl_repr(), exp) + + idx = pd.PeriodIndex(['2011-01', pd.NaT], freq='M') + + exp = np.array([pd.Period('2011-01', freq='M'), pd.NaT], dtype=object) + tm.assert_numpy_array_equal(idx.asobject.values, exp) + tm.assert_numpy_array_equal(idx._mpl_repr(), exp) + + exp = np.array([pd.Period('2011-01-01', freq='D'), pd.NaT], + dtype=object) + idx = pd.PeriodIndex(['2011-01-01', pd.NaT], freq='D') + tm.assert_numpy_array_equal(idx.asobject.values, exp) + tm.assert_numpy_array_equal(idx._mpl_repr(), exp) + + def test_is_(self): + create_index = lambda: PeriodIndex(freq='A', start='1/1/2001', + end='12/1/2009') + index = create_index() + assert index.is_(index) + assert not index.is_(create_index()) + assert index.is_(index.view()) + assert 
index.is_(index.view().view().view().view().view()) + assert index.view().is_(index) + ind2 = index.view() + index.name = "Apple" + assert ind2.is_(index) + assert not index.is_(index[:]) + assert not index.is_(index.asfreq('M')) + assert not index.is_(index.asfreq('A')) + assert not index.is_(index - 2) + assert not index.is_(index - 0) + + def test_comp_period(self): + idx = period_range('2007-01', periods=20, freq='M') + + result = idx < idx[10] + exp = idx.values < idx.values[10] + tm.assert_numpy_array_equal(result, exp) + + def test_contains(self): + rng = period_range('2007-01', freq='M', periods=10) + + assert Period('2007-01', freq='M') in rng + assert not Period('2007-01', freq='D') in rng + assert not Period('2007-01', freq='2M') in rng + + def test_contains_nat(self): + # see gh-13582 + idx = period_range('2007-01', freq='M', periods=10) + assert pd.NaT not in idx + assert None not in idx + assert float('nan') not in idx + assert np.nan not in idx + + idx = pd.PeriodIndex(['2011-01', 'NaT', '2011-02'], freq='M') + assert pd.NaT in idx + assert None in idx + assert float('nan') in idx + assert np.nan in idx + + def test_periods_number_check(self): + with pytest.raises(ValueError): + period_range('2011-1-1', '2012-1-1', 'B') + + def test_start_time(self): + index = PeriodIndex(freq='M', start='2016-01-01', end='2016-05-31') + expected_index = date_range('2016-01-01', end='2016-05-31', freq='MS') + tm.assert_index_equal(index.start_time, expected_index) + + def test_end_time(self): + index = PeriodIndex(freq='M', start='2016-01-01', end='2016-05-31') + expected_index = date_range('2016-01-01', end='2016-05-31', freq='M') + tm.assert_index_equal(index.end_time, expected_index) + + def test_index_duplicate_periods(self): + # monotonic + idx = PeriodIndex([2000, 2007, 2007, 2009, 2009], freq='A-JUN') + ts = Series(np.random.randn(len(idx)), index=idx) + + result = ts[2007] + expected = ts[1:3] + tm.assert_series_equal(result, expected) + result[:] = 1 + assert (ts[1:3] == 1).all() + + # not monotonic + idx = PeriodIndex([2000, 2007, 2007, 2009, 2007], freq='A-JUN') + ts = Series(np.random.randn(len(idx)), index=idx) + + result = ts[2007] + expected = ts[idx == 2007] + tm.assert_series_equal(result, expected) + + def test_index_unique(self): + idx = PeriodIndex([2000, 2007, 2007, 2009, 2009], freq='A-JUN') + expected = PeriodIndex([2000, 2007, 2009], freq='A-JUN') + tm.assert_index_equal(idx.unique(), expected) + assert idx.nunique() == 3 + + idx = PeriodIndex([2000, 2007, 2007, 2009, 2007], freq='A-JUN', + tz='US/Eastern') + expected = PeriodIndex([2000, 2007, 2009], freq='A-JUN', + tz='US/Eastern') + tm.assert_index_equal(idx.unique(), expected) + assert idx.nunique() == 3 + + def test_shift_gh8083(self): + + # test shift for PeriodIndex + # GH8083 + drange = self.create_index() + result = drange.shift(1) + expected = PeriodIndex(['2013-01-02', '2013-01-03', '2013-01-04', + '2013-01-05', '2013-01-06'], freq='D') + tm.assert_index_equal(result, expected) + + def test_shift(self): + pi1 = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009') + pi2 = PeriodIndex(freq='A', start='1/1/2002', end='12/1/2010') + + tm.assert_index_equal(pi1.shift(0), pi1) + + assert len(pi1) == len(pi2) + tm.assert_index_equal(pi1.shift(1), pi2) + + pi1 = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009') + pi2 = PeriodIndex(freq='A', start='1/1/2000', end='12/1/2008') + assert len(pi1) == len(pi2) + tm.assert_index_equal(pi1.shift(-1), pi2) + + pi1 = PeriodIndex(freq='M', start='1/1/2001', 
end='12/1/2009') + pi2 = PeriodIndex(freq='M', start='2/1/2001', end='1/1/2010') + assert len(pi1) == len(pi2) + tm.assert_index_equal(pi1.shift(1), pi2) + + pi1 = PeriodIndex(freq='M', start='1/1/2001', end='12/1/2009') + pi2 = PeriodIndex(freq='M', start='12/1/2000', end='11/1/2009') + assert len(pi1) == len(pi2) + tm.assert_index_equal(pi1.shift(-1), pi2) + + pi1 = PeriodIndex(freq='D', start='1/1/2001', end='12/1/2009') + pi2 = PeriodIndex(freq='D', start='1/2/2001', end='12/2/2009') + assert len(pi1) == len(pi2) + tm.assert_index_equal(pi1.shift(1), pi2) + + pi1 = PeriodIndex(freq='D', start='1/1/2001', end='12/1/2009') + pi2 = PeriodIndex(freq='D', start='12/31/2000', end='11/30/2009') + assert len(pi1) == len(pi2) + tm.assert_index_equal(pi1.shift(-1), pi2) + + def test_shift_nat(self): + idx = PeriodIndex(['2011-01', '2011-02', 'NaT', + '2011-04'], freq='M', name='idx') + result = idx.shift(1) + expected = PeriodIndex(['2011-02', '2011-03', 'NaT', + '2011-05'], freq='M', name='idx') + tm.assert_index_equal(result, expected) + assert result.name == expected.name + + def test_ndarray_compat_properties(self): + if compat.is_platform_32bit(): + pytest.skip("skipping on 32bit") + super(TestPeriodIndex, self).test_ndarray_compat_properties() + + def test_shift_ndarray(self): + idx = PeriodIndex(['2011-01', '2011-02', 'NaT', + '2011-04'], freq='M', name='idx') + result = idx.shift(np.array([1, 2, 3, 4])) + expected = PeriodIndex(['2011-02', '2011-04', 'NaT', + '2011-08'], freq='M', name='idx') + tm.assert_index_equal(result, expected) + + idx = PeriodIndex(['2011-01', '2011-02', 'NaT', + '2011-04'], freq='M', name='idx') + result = idx.shift(np.array([1, -2, 3, -4])) + expected = PeriodIndex(['2011-02', '2010-12', 'NaT', + '2010-12'], freq='M', name='idx') + tm.assert_index_equal(result, expected) + + def test_negative_ordinals(self): + Period(ordinal=-1000, freq='A') + Period(ordinal=0, freq='A') + + idx1 = PeriodIndex(ordinal=[-1, 0, 1], freq='A') + idx2 = PeriodIndex(ordinal=np.array([-1, 0, 1]), freq='A') + tm.assert_index_equal(idx1, idx2) + + def test_pindex_fieldaccessor_nat(self): + idx = PeriodIndex(['2011-01', '2011-02', 'NaT', + '2012-03', '2012-04'], freq='D', name='name') + + exp = Index([2011, 2011, -1, 2012, 2012], dtype=np.int64, name='name') + tm.assert_index_equal(idx.year, exp) + exp = Index([1, 2, -1, 3, 4], dtype=np.int64, name='name') + tm.assert_index_equal(idx.month, exp) + + def test_pindex_qaccess(self): + pi = PeriodIndex(['2Q05', '3Q05', '4Q05', '1Q06', '2Q06'], freq='Q') + s = Series(np.random.rand(len(pi)), index=pi).cumsum() + # Todo: fix these accessors! 
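+ # '05Q4' parses to the same quarter as the '4Q05' label above, so the string lookup should hit position 2 (illustrative only): + # >>> Period('05Q4', freq='Q') == Period('4Q05', freq='Q') + # True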
+ assert s['05Q4'] == s[2] + + def test_numpy_repeat(self): + index = period_range('20010101', periods=2) + expected = PeriodIndex([Period('2001-01-01'), Period('2001-01-01'), + Period('2001-01-02'), Period('2001-01-02')]) + + tm.assert_index_equal(np.repeat(index, 2), expected) + + msg = "the 'axis' parameter is not supported" + tm.assert_raises_regex( + ValueError, msg, np.repeat, index, 2, axis=1) + + def test_pindex_multiples(self): + pi = PeriodIndex(start='1/1/11', end='12/31/11', freq='2M') + expected = PeriodIndex(['2011-01', '2011-03', '2011-05', '2011-07', + '2011-09', '2011-11'], freq='2M') + tm.assert_index_equal(pi, expected) + assert pi.freq == offsets.MonthEnd(2) + assert pi.freqstr == '2M' + + pi = period_range(start='1/1/11', end='12/31/11', freq='2M') + tm.assert_index_equal(pi, expected) + assert pi.freq == offsets.MonthEnd(2) + assert pi.freqstr == '2M' + + pi = period_range(start='1/1/11', periods=6, freq='2M') + tm.assert_index_equal(pi, expected) + assert pi.freq == offsets.MonthEnd(2) + assert pi.freqstr == '2M' + + def test_iteration(self): + index = PeriodIndex(start='1/1/10', periods=4, freq='B') + + result = list(index) + assert isinstance(result[0], Period) + assert result[0].freq == index.freq + + def test_is_full(self): + index = PeriodIndex([2005, 2007, 2009], freq='A') + assert not index.is_full + + index = PeriodIndex([2005, 2006, 2007], freq='A') + assert index.is_full + + index = PeriodIndex([2005, 2005, 2007], freq='A') + assert not index.is_full + + index = PeriodIndex([2005, 2005, 2006], freq='A') + assert index.is_full + + index = PeriodIndex([2006, 2005, 2005], freq='A') + pytest.raises(ValueError, getattr, index, 'is_full') + + assert index[:0].is_full + + def test_with_multi_index(self): + # #1705 + index = date_range('1/1/2012', periods=4, freq='12H') + index_as_arrays = [index.to_period(freq='D'), index.hour] + + s = Series([0, 1, 2, 3], index_as_arrays) + + assert isinstance(s.index.levels[0], PeriodIndex) + + assert isinstance(s.index.values[0][0], Period) + + def test_convert_array_of_periods(self): + rng = period_range('1/1/2000', periods=20, freq='D') + periods = list(rng) + + result = pd.Index(periods) + assert isinstance(result, PeriodIndex) + + def test_append_concat(self): + # #1815 + d1 = date_range('12/31/1990', '12/31/1999', freq='A-DEC') + d2 = date_range('12/31/2000', '12/31/2009', freq='A-DEC') + + s1 = Series(np.random.randn(10), d1) + s2 = Series(np.random.randn(10), d2) + + s1 = s1.to_period() + s2 = s2.to_period() + + # drops index + result = pd.concat([s1, s2]) + assert isinstance(result.index, PeriodIndex) + assert result.index[0] == s1.index[0] + + def test_pickle_freq(self): + # GH2891 + prng = period_range('1/1/2011', '1/1/2012', freq='M') + new_prng = tm.round_trip_pickle(prng) + assert new_prng.freq == offsets.MonthEnd() + assert new_prng.freqstr == 'M' + + def test_map(self): + index = PeriodIndex([2005, 2007, 2009], freq='A') + result = index.map(lambda x: x + 1) + expected = index + 1 + tm.assert_index_equal(result, expected) + + result = index.map(lambda x: x.ordinal) + exp = Index([x.ordinal for x in index]) + tm.assert_index_equal(result, exp) + + @pytest.mark.parametrize('how', ['outer', 'inner', 'left', 'right']) + def test_join_self(self, how): + index = period_range('1/1/2000', periods=10) + joined = index.join(index, how=how) + assert index is joined diff --git a/pandas/tests/indexes/period/test_setops.py b/pandas/tests/indexes/period/test_setops.py new file mode 100644 index 0000000000000..1ac05f9fa94b7 --- 
/dev/null +++ b/pandas/tests/indexes/period/test_setops.py @@ -0,0 +1,252 @@ +import pytest + +import numpy as np + +import pandas as pd +import pandas.util.testing as tm +import pandas.core.indexes.period as period +from pandas import period_range, PeriodIndex, Index, date_range + + +def _permute(obj): + return obj.take(np.random.permutation(len(obj))) + + +class TestPeriodIndex(object): + + def setup_method(self, method): + pass + + def test_joins(self): + index = period_range('1/1/2000', '1/20/2000', freq='D') + + for kind in ['inner', 'outer', 'left', 'right']: + joined = index.join(index[:-5], how=kind) + + assert isinstance(joined, PeriodIndex) + assert joined.freq == index.freq + + def test_join_self(self): + index = period_range('1/1/2000', '1/20/2000', freq='D') + + for kind in ['inner', 'outer', 'left', 'right']: + res = index.join(index, how=kind) + assert index is res + + def test_join_does_not_recur(self): + df = tm.makeCustomDataframe( + 3, 2, data_gen_f=lambda *args: np.random.randint(2), + c_idx_type='p', r_idx_type='dt') + s = df.iloc[:2, 0] + + res = s.index.join(df.columns, how='outer') + expected = Index([s.index[0], s.index[1], + df.columns[0], df.columns[1]], object) + tm.assert_index_equal(res, expected) + + def test_union(self): + # union + rng1 = pd.period_range('1/1/2000', freq='D', periods=5) + other1 = pd.period_range('1/6/2000', freq='D', periods=5) + expected1 = pd.period_range('1/1/2000', freq='D', periods=10) + + rng2 = pd.period_range('1/1/2000', freq='D', periods=5) + other2 = pd.period_range('1/4/2000', freq='D', periods=5) + expected2 = pd.period_range('1/1/2000', freq='D', periods=8) + + rng3 = pd.period_range('1/1/2000', freq='D', periods=5) + other3 = pd.PeriodIndex([], freq='D') + expected3 = pd.period_range('1/1/2000', freq='D', periods=5) + + rng4 = pd.period_range('2000-01-01 09:00', freq='H', periods=5) + other4 = pd.period_range('2000-01-02 09:00', freq='H', periods=5) + expected4 = pd.PeriodIndex(['2000-01-01 09:00', '2000-01-01 10:00', + '2000-01-01 11:00', '2000-01-01 12:00', + '2000-01-01 13:00', '2000-01-02 09:00', + '2000-01-02 10:00', '2000-01-02 11:00', + '2000-01-02 12:00', '2000-01-02 13:00'], + freq='H') + + rng5 = pd.PeriodIndex(['2000-01-01 09:01', '2000-01-01 09:03', + '2000-01-01 09:05'], freq='T') + other5 = pd.PeriodIndex(['2000-01-01 09:01', '2000-01-01 09:05', + '2000-01-01 09:08'], + freq='T') + expected5 = pd.PeriodIndex(['2000-01-01 09:01', '2000-01-01 09:03', + '2000-01-01 09:05', '2000-01-01 09:08'], + freq='T') + + rng6 = pd.period_range('2000-01-01', freq='M', periods=7) + other6 = pd.period_range('2000-04-01', freq='M', periods=7) + expected6 = pd.period_range('2000-01-01', freq='M', periods=10) + + rng7 = pd.period_range('2003-01-01', freq='A', periods=5) + other7 = pd.period_range('1998-01-01', freq='A', periods=8) + expected7 = pd.period_range('1998-01-01', freq='A', periods=10) + + for rng, other, expected in [(rng1, other1, expected1), + (rng2, other2, expected2), + (rng3, other3, expected3), (rng4, other4, + expected4), + (rng5, other5, expected5), (rng6, other6, + expected6), + (rng7, other7, expected7)]: + + result_union = rng.union(other) + tm.assert_index_equal(result_union, expected) + + def test_union_misc(self): + index = period_range('1/1/2000', '1/20/2000', freq='D') + + result = index[:-5].union(index[10:]) + tm.assert_index_equal(result, index) + + # not in order + result = _permute(index[:-5]).union(_permute(index[10:])) + tm.assert_index_equal(result, index) + + # raise if different frequencies + 
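+ # (IncompatibleFrequency, rather than silent coercion -- sketch only): + # >>> period_range('1/1/2000', periods=3, freq='D').union( + # ...     period_range('1/1/2000', periods=3, freq='W-WED')) + # Traceback (most recent call last): + #     ... + # IncompatibleFrequency: ...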
index = period_range('1/1/2000', '1/20/2000', freq='D') + index2 = period_range('1/1/2000', '1/20/2000', freq='W-WED') + with pytest.raises(period.IncompatibleFrequency): + index.union(index2) + + msg = 'can only call with other PeriodIndex-ed objects' + with tm.assert_raises_regex(ValueError, msg): + index.join(index.to_timestamp()) + + index3 = period_range('1/1/2000', '1/20/2000', freq='2D') + with pytest.raises(period.IncompatibleFrequency): + index.join(index3) + + def test_union_dataframe_index(self): + rng1 = pd.period_range('1/1/1999', '1/1/2012', freq='M') + s1 = pd.Series(np.random.randn(len(rng1)), rng1) + + rng2 = pd.period_range('1/1/1980', '12/1/2001', freq='M') + s2 = pd.Series(np.random.randn(len(rng2)), rng2) + df = pd.DataFrame({'s1': s1, 's2': s2}) + + exp = pd.period_range('1/1/1980', '1/1/2012', freq='M') + tm.assert_index_equal(df.index, exp) + + def test_intersection(self): + index = period_range('1/1/2000', '1/20/2000', freq='D') + + result = index[:-5].intersection(index[10:]) + tm.assert_index_equal(result, index[10:-5]) + + # not in order + left = _permute(index[:-5]) + right = _permute(index[10:]) + result = left.intersection(right).sort_values() + tm.assert_index_equal(result, index[10:-5]) + + # raise if different frequencies + index = period_range('1/1/2000', '1/20/2000', freq='D') + index2 = period_range('1/1/2000', '1/20/2000', freq='W-WED') + with pytest.raises(period.IncompatibleFrequency): + index.intersection(index2) + + index3 = period_range('1/1/2000', '1/20/2000', freq='2D') + with pytest.raises(period.IncompatibleFrequency): + index.intersection(index3) + + def test_intersection_cases(self): + base = period_range('6/1/2000', '6/30/2000', freq='D', name='idx') + + # if target has the same name, it is preserved + rng2 = period_range('5/15/2000', '6/20/2000', freq='D', name='idx') + expected2 = period_range('6/1/2000', '6/20/2000', freq='D', + name='idx') + + # if target name is different, it will be reset + rng3 = period_range('5/15/2000', '6/20/2000', freq='D', name='other') + expected3 = period_range('6/1/2000', '6/20/2000', freq='D', + name=None) + + rng4 = period_range('7/1/2000', '7/31/2000', freq='D', name='idx') + expected4 = PeriodIndex([], name='idx', freq='D') + + for (rng, expected) in [(rng2, expected2), (rng3, expected3), + (rng4, expected4)]: + result = base.intersection(rng) + tm.assert_index_equal(result, expected) + assert result.name == expected.name + assert result.freq == expected.freq + + # non-monotonic + base = PeriodIndex(['2011-01-05', '2011-01-04', '2011-01-02', + '2011-01-03'], freq='D', name='idx') + + rng2 = PeriodIndex(['2011-01-04', '2011-01-02', + '2011-02-02', '2011-02-03'], + freq='D', name='idx') + expected2 = PeriodIndex(['2011-01-04', '2011-01-02'], freq='D', + name='idx') + + rng3 = PeriodIndex(['2011-01-04', '2011-01-02', '2011-02-02', + '2011-02-03'], + freq='D', name='other') + expected3 = PeriodIndex(['2011-01-04', '2011-01-02'], freq='D', + name=None) + + rng4 = period_range('7/1/2000', '7/31/2000', freq='D', name='idx') + expected4 = PeriodIndex([], freq='D', name='idx') + + for (rng, expected) in [(rng2, expected2), (rng3, expected3), + (rng4, expected4)]: + result = base.intersection(rng) + tm.assert_index_equal(result, expected) + assert result.name == expected.name + assert result.freq == 'D' + + # empty same freq + rng = date_range('6/1/2000', '6/15/2000', freq='T') + result = rng[0:0].intersection(rng) + assert len(result) == 0 + + result = rng.intersection(rng[0:0]) + assert len(result) == 0 + + def 
test_difference(self): + # diff + rng1 = pd.period_range('1/1/2000', freq='D', periods=5) + other1 = pd.period_range('1/6/2000', freq='D', periods=5) + expected1 = pd.period_range('1/1/2000', freq='D', periods=5) + + rng2 = pd.period_range('1/1/2000', freq='D', periods=5) + other2 = pd.period_range('1/4/2000', freq='D', periods=5) + expected2 = pd.period_range('1/1/2000', freq='D', periods=3) + + rng3 = pd.period_range('1/1/2000', freq='D', periods=5) + other3 = pd.PeriodIndex([], freq='D') + expected3 = pd.period_range('1/1/2000', freq='D', periods=5) + + rng4 = pd.period_range('2000-01-01 09:00', freq='H', periods=5) + other4 = pd.period_range('2000-01-02 09:00', freq='H', periods=5) + expected4 = rng4 + + rng5 = pd.PeriodIndex(['2000-01-01 09:01', '2000-01-01 09:03', + '2000-01-01 09:05'], freq='T') + other5 = pd.PeriodIndex( + ['2000-01-01 09:01', '2000-01-01 09:05'], freq='T') + expected5 = pd.PeriodIndex(['2000-01-01 09:03'], freq='T') + + rng6 = pd.period_range('2000-01-01', freq='M', periods=7) + other6 = pd.period_range('2000-04-01', freq='M', periods=7) + expected6 = pd.period_range('2000-01-01', freq='M', periods=3) + + rng7 = pd.period_range('2003-01-01', freq='A', periods=5) + other7 = pd.period_range('1998-01-01', freq='A', periods=8) + expected7 = pd.period_range('2006-01-01', freq='A', periods=2) + + for rng, other, expected in [(rng1, other1, expected1), + (rng2, other2, expected2), + (rng3, other3, expected3), + (rng4, other4, expected4), + (rng5, other5, expected5), + (rng6, other6, expected6), + (rng7, other7, expected7), ]: + result_union = rng.difference(other) + tm.assert_index_equal(result_union, expected) diff --git a/pandas/tests/indexes/period/test_tools.py b/pandas/tests/indexes/period/test_tools.py new file mode 100644 index 0000000000000..074678164e6f9 --- /dev/null +++ b/pandas/tests/indexes/period/test_tools.py @@ -0,0 +1,434 @@ +import numpy as np +from datetime import datetime, timedelta + +import pandas as pd +import pandas.util.testing as tm +import pandas.core.indexes.period as period +from pandas.compat import lrange +from pandas.tseries.frequencies import get_freq, MONTHS +from pandas._libs.period import period_ordinal, period_asfreq +from pandas import (PeriodIndex, Period, DatetimeIndex, Timestamp, Series, + date_range, to_datetime, period_range) + + +class TestPeriodRepresentation(object): + """ + Wish to match NumPy units + """ + + def _check_freq(self, freq, base_date): + rng = PeriodIndex(start=base_date, periods=10, freq=freq) + exp = np.arange(10, dtype=np.int64) + tm.assert_numpy_array_equal(rng._values, exp) + tm.assert_numpy_array_equal(rng.asi8, exp) + + def test_annual(self): + self._check_freq('A', 1970) + + def test_monthly(self): + self._check_freq('M', '1970-01') + + def test_weekly(self): + self._check_freq('W-THU', '1970-01-01') + + def test_daily(self): + self._check_freq('D', '1970-01-01') + + def test_business_daily(self): + self._check_freq('B', '1970-01-01') + + def test_hourly(self): + self._check_freq('H', '1970-01-01') + + def test_minutely(self): + self._check_freq('T', '1970-01-01') + + def test_secondly(self): + self._check_freq('S', '1970-01-01') + + def test_millisecondly(self): + self._check_freq('L', '1970-01-01') + + def test_microsecondly(self): + self._check_freq('U', '1970-01-01') + + def test_nanosecondly(self): + self._check_freq('N', '1970-01-01') + + def test_negone_ordinals(self): + freqs = ['A', 'M', 'Q', 'D', 'H', 'T', 'S'] + + period = Period(ordinal=-1, freq='D') + for freq in freqs: + 
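+ # repr() doubles as a smoke test: formatting pre-epoch periods + # (ordinal=-1, i.e. 1969) must not raise for any frequency.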
repr(period.asfreq(freq)) + + for freq in freqs: + period = Period(ordinal=-1, freq=freq) + repr(period) + assert period.year == 1969 + + period = Period(ordinal=-1, freq='B') + repr(period) + period = Period(ordinal=-1, freq='W') + repr(period) + + +class TestTslib(object): + def test_intraday_conversion_factors(self): + assert period_asfreq(1, get_freq('D'), get_freq('H'), False) == 24 + assert period_asfreq(1, get_freq('D'), get_freq('T'), False) == 1440 + assert period_asfreq(1, get_freq('D'), get_freq('S'), False) == 86400 + assert period_asfreq(1, get_freq('D'), + get_freq('L'), False) == 86400000 + assert period_asfreq(1, get_freq('D'), + get_freq('U'), False) == 86400000000 + assert period_asfreq(1, get_freq('D'), + get_freq('N'), False) == 86400000000000 + + assert period_asfreq(1, get_freq('H'), get_freq('T'), False) == 60 + assert period_asfreq(1, get_freq('H'), get_freq('S'), False) == 3600 + assert period_asfreq(1, get_freq('H'), + get_freq('L'), False) == 3600000 + assert period_asfreq(1, get_freq('H'), + get_freq('U'), False) == 3600000000 + assert period_asfreq(1, get_freq('H'), + get_freq('N'), False) == 3600000000000 + + assert period_asfreq(1, get_freq('T'), get_freq('S'), False) == 60 + assert period_asfreq(1, get_freq('T'), get_freq('L'), False) == 60000 + assert period_asfreq(1, get_freq('T'), + get_freq('U'), False) == 60000000 + assert period_asfreq(1, get_freq('T'), + get_freq('N'), False) == 60000000000 + + assert period_asfreq(1, get_freq('S'), get_freq('L'), False) == 1000 + assert period_asfreq(1, get_freq('S'), + get_freq('U'), False) == 1000000 + assert period_asfreq(1, get_freq('S'), + get_freq('N'), False) == 1000000000 + + assert period_asfreq(1, get_freq('L'), get_freq('U'), False) == 1000 + assert period_asfreq(1, get_freq('L'), + get_freq('N'), False) == 1000000 + + assert period_asfreq(1, get_freq('U'), get_freq('N'), False) == 1000 + + def test_period_ordinal_start_values(self): + # information for 1.1.1970 + assert period_ordinal(1970, 1, 1, 0, 0, 0, 0, 0, get_freq('A')) == 0 + assert period_ordinal(1970, 1, 1, 0, 0, 0, 0, 0, get_freq('M')) == 0 + assert period_ordinal(1970, 1, 1, 0, 0, 0, 0, 0, get_freq('W')) == 1 + assert period_ordinal(1970, 1, 1, 0, 0, 0, 0, 0, get_freq('D')) == 0 + assert period_ordinal(1970, 1, 1, 0, 0, 0, 0, 0, get_freq('B')) == 0 + + def test_period_ordinal_week(self): + assert period_ordinal(1970, 1, 4, 0, 0, 0, 0, 0, get_freq('W')) == 1 + assert period_ordinal(1970, 1, 5, 0, 0, 0, 0, 0, get_freq('W')) == 2 + assert period_ordinal(2013, 10, 6, 0, + 0, 0, 0, 0, get_freq('W')) == 2284 + assert period_ordinal(2013, 10, 7, 0, + 0, 0, 0, 0, get_freq('W')) == 2285 + + def test_period_ordinal_business_day(self): + # Thursday + assert period_ordinal(2013, 10, 3, 0, + 0, 0, 0, 0, get_freq('B')) == 11415 + # Friday + assert period_ordinal(2013, 10, 4, 0, + 0, 0, 0, 0, get_freq('B')) == 11416 + # Saturday + assert period_ordinal(2013, 10, 5, 0, + 0, 0, 0, 0, get_freq('B')) == 11417 + # Sunday + assert period_ordinal(2013, 10, 6, 0, + 0, 0, 0, 0, get_freq('B')) == 11417 + # Monday + assert period_ordinal(2013, 10, 7, 0, + 0, 0, 0, 0, get_freq('B')) == 11417 + # Tuesday + assert period_ordinal(2013, 10, 8, 0, + 0, 0, 0, 0, get_freq('B')) == 11418 + + +class TestPeriodIndex(object): + + def setup_method(self, method): + pass + + def test_tolist(self): + index = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009') + rs = index.tolist() + for x in rs: + assert isinstance(x, Period) + + recon = PeriodIndex(rs) + 
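+ # tolist() yields Period objects, so the index round-trips losslessly + # (illustrative only): + # >>> rng = period_range('2000', periods=2, freq='A') + # >>> PeriodIndex(rng.tolist()).equals(rng) + # True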
tm.assert_index_equal(index, recon) + + def test_to_timestamp(self): + index = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009') + series = Series(1, index=index, name='foo') + + exp_index = date_range('1/1/2001', end='12/31/2009', freq='A-DEC') + result = series.to_timestamp(how='end') + tm.assert_index_equal(result.index, exp_index) + assert result.name == 'foo' + + exp_index = date_range('1/1/2001', end='1/1/2009', freq='AS-JAN') + result = series.to_timestamp(how='start') + tm.assert_index_equal(result.index, exp_index) + + def _get_with_delta(delta, freq='A-DEC'): + return date_range(to_datetime('1/1/2001') + delta, + to_datetime('12/31/2009') + delta, freq=freq) + + delta = timedelta(hours=23) + result = series.to_timestamp('H', 'end') + exp_index = _get_with_delta(delta) + tm.assert_index_equal(result.index, exp_index) + + delta = timedelta(hours=23, minutes=59) + result = series.to_timestamp('T', 'end') + exp_index = _get_with_delta(delta) + tm.assert_index_equal(result.index, exp_index) + + result = series.to_timestamp('S', 'end') + delta = timedelta(hours=23, minutes=59, seconds=59) + exp_index = _get_with_delta(delta) + tm.assert_index_equal(result.index, exp_index) + + index = PeriodIndex(freq='H', start='1/1/2001', end='1/2/2001') + series = Series(1, index=index, name='foo') + + exp_index = date_range('1/1/2001 00:59:59', end='1/2/2001 00:59:59', + freq='H') + result = series.to_timestamp(how='end') + tm.assert_index_equal(result.index, exp_index) + assert result.name == 'foo' + + def test_to_timestamp_quarterly_bug(self): + years = np.arange(1960, 2000).repeat(4) + quarters = np.tile(lrange(1, 5), 40) + + pindex = PeriodIndex(year=years, quarter=quarters) + + stamps = pindex.to_timestamp('D', 'end') + expected = DatetimeIndex([x.to_timestamp('D', 'end') for x in pindex]) + tm.assert_index_equal(stamps, expected) + + def test_to_timestamp_preserve_name(self): + index = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009', + name='foo') + assert index.name == 'foo' + + conv = index.to_timestamp('D') + assert conv.name == 'foo' + + def test_to_timestamp_repr_is_code(self): + zs = [Timestamp('99-04-17 00:00:00', tz='UTC'), + Timestamp('2001-04-17 00:00:00', tz='UTC'), + Timestamp('2001-04-17 00:00:00', tz='America/Los_Angeles'), + Timestamp('2001-04-17 00:00:00', tz=None)] + for z in zs: + assert eval(repr(z)) == z + + def test_to_timestamp_pi_nat(self): + # GH 7228 + index = PeriodIndex(['NaT', '2011-01', '2011-02'], freq='M', + name='idx') + + result = index.to_timestamp('D') + expected = DatetimeIndex([pd.NaT, datetime(2011, 1, 1), + datetime(2011, 2, 1)], name='idx') + tm.assert_index_equal(result, expected) + assert result.name == 'idx' + + result2 = result.to_period(freq='M') + tm.assert_index_equal(result2, index) + assert result2.name == 'idx' + + result3 = result.to_period(freq='3M') + exp = PeriodIndex(['NaT', '2011-01', '2011-02'], freq='3M', name='idx') + tm.assert_index_equal(result3, exp) + assert result3.freqstr == '3M' + + msg = ('Frequency must be positive, because it' + ' represents span: -2A') + with tm.assert_raises_regex(ValueError, msg): + result.to_period(freq='-2A') + + def test_to_timestamp_pi_mult(self): + idx = PeriodIndex(['2011-01', 'NaT', '2011-02'], freq='2M', name='idx') + result = idx.to_timestamp() + expected = DatetimeIndex( + ['2011-01-01', 'NaT', '2011-02-01'], name='idx') + tm.assert_index_equal(result, expected) + result = idx.to_timestamp(how='E') + expected = DatetimeIndex( + ['2011-02-28', 'NaT', '2011-03-31'], name='idx') + 
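+ # With freq='2M' each period spans two months, so how='E' resolves to + # the end of the second month: '2011-01' -> 2011-02-28.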
tm.assert_index_equal(result, expected) + + def test_to_timestamp_pi_combined(self): + idx = PeriodIndex(start='2011', periods=2, freq='1D1H', name='idx') + result = idx.to_timestamp() + expected = DatetimeIndex( + ['2011-01-01 00:00', '2011-01-02 01:00'], name='idx') + tm.assert_index_equal(result, expected) + result = idx.to_timestamp(how='E') + expected = DatetimeIndex( + ['2011-01-02 00:59:59', '2011-01-03 01:59:59'], name='idx') + tm.assert_index_equal(result, expected) + result = idx.to_timestamp(how='E', freq='H') + expected = DatetimeIndex( + ['2011-01-02 00:00', '2011-01-03 01:00'], name='idx') + tm.assert_index_equal(result, expected) + + def test_to_timestamp_to_period_astype(self): + idx = DatetimeIndex([pd.NaT, '2011-01-01', '2011-02-01'], name='idx') + + res = idx.astype('period[M]') + exp = PeriodIndex(['NaT', '2011-01', '2011-02'], freq='M', name='idx') + tm.assert_index_equal(res, exp) + + res = idx.astype('period[3M]') + exp = PeriodIndex(['NaT', '2011-01', '2011-02'], freq='3M', name='idx') + tm.assert_index_equal(res, exp) + + def test_dti_to_period(self): + dti = DatetimeIndex(start='1/1/2005', end='12/1/2005', freq='M') + pi1 = dti.to_period() + pi2 = dti.to_period(freq='D') + pi3 = dti.to_period(freq='3D') + + assert pi1[0] == Period('Jan 2005', freq='M') + assert pi2[0] == Period('1/31/2005', freq='D') + assert pi3[0] == Period('1/31/2005', freq='3D') + + assert pi1[-1] == Period('Nov 2005', freq='M') + assert pi2[-1] == Period('11/30/2005', freq='D') + assert pi3[-1] == Period('11/30/2005', freq='3D') + + tm.assert_index_equal(pi1, period_range('1/1/2005', '11/1/2005', + freq='M')) + tm.assert_index_equal(pi2, period_range('1/1/2005', '11/1/2005', + freq='M').asfreq('D')) + tm.assert_index_equal(pi3, period_range('1/1/2005', '11/1/2005', + freq='M').asfreq('3D')) + + def test_period_astype_to_timestamp(self): + pi = pd.PeriodIndex(['2011-01', '2011-02', '2011-03'], freq='M') + + exp = pd.DatetimeIndex(['2011-01-01', '2011-02-01', '2011-03-01']) + tm.assert_index_equal(pi.astype('datetime64[ns]'), exp) + + exp = pd.DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31']) + tm.assert_index_equal(pi.astype('datetime64[ns]', how='end'), exp) + + exp = pd.DatetimeIndex(['2011-01-01', '2011-02-01', '2011-03-01'], + tz='US/Eastern') + res = pi.astype('datetime64[ns, US/Eastern]') + tm.assert_index_equal(res, exp) + + exp = pd.DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31'], + tz='US/Eastern') + res = pi.astype('datetime64[ns, US/Eastern]', how='end') + tm.assert_index_equal(res, exp) + + def test_to_period_quarterly(self): + # make sure we can make the round trip + for month in MONTHS: + freq = 'Q-%s' % month + rng = period_range('1989Q3', '1991Q3', freq=freq) + stamps = rng.to_timestamp() + result = stamps.to_period(freq) + tm.assert_index_equal(rng, result) + + def test_to_period_quarterlyish(self): + offsets = ['BQ', 'QS', 'BQS'] + for off in offsets: + rng = date_range('01-Jan-2012', periods=8, freq=off) + prng = rng.to_period() + assert prng.freq == 'Q-DEC' + + def test_to_period_annualish(self): + offsets = ['BA', 'AS', 'BAS'] + for off in offsets: + rng = date_range('01-Jan-2012', periods=8, freq=off) + prng = rng.to_period() + assert prng.freq == 'A-DEC' + + def test_to_period_monthish(self): + offsets = ['MS', 'BM'] + for off in offsets: + rng = date_range('01-Jan-2012', periods=8, freq=off) + prng = rng.to_period() + assert prng.freq == 'M' + + rng = date_range('01-Jan-2012', periods=8, freq='M') + prng = 
rng.to_period() + assert prng.freq == 'M' + + msg = pd.tseries.frequencies._INVALID_FREQ_ERROR + with tm.assert_raises_regex(ValueError, msg): + date_range('01-Jan-2012', periods=8, freq='EOM') + + def test_period_dt64_round_trip(self): + dti = date_range('1/1/2000', '1/7/2002', freq='B') + pi = dti.to_period() + tm.assert_index_equal(pi.to_timestamp(), dti) + + dti = date_range('1/1/2000', '1/7/2002', freq='B') + pi = dti.to_period(freq='H') + tm.assert_index_equal(pi.to_timestamp(), dti) + + def test_to_timestamp_1703(self): + index = period_range('1/1/2012', periods=4, freq='D') + + result = index.to_timestamp() + assert result[0] == Timestamp('1/1/2012') + + def test_to_datetime_depr(self): + index = period_range('1/1/2012', periods=4, freq='D') + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + result = index.to_datetime() + assert result[0] == Timestamp('1/1/2012') + + def test_combine_first(self): + # GH 3367 + didx = pd.DatetimeIndex(start='1950-01-31', end='1950-07-31', freq='M') + pidx = pd.PeriodIndex(start=pd.Period('1950-1'), + end=pd.Period('1950-7'), freq='M') + # check to be consistent with DatetimeIndex + for idx in [didx, pidx]: + a = pd.Series([1, np.nan, np.nan, 4, 5, np.nan, 7], index=idx) + b = pd.Series([9, 9, 9, 9, 9, 9, 9], index=idx) + + result = a.combine_first(b) + expected = pd.Series([1, 9, 9, 4, 5, 9, 7], index=idx, + dtype=np.float64) + tm.assert_series_equal(result, expected) + + def test_searchsorted(self): + for freq in ['D', '2D']: + pidx = pd.PeriodIndex(['2014-01-01', '2014-01-02', '2014-01-03', + '2014-01-04', '2014-01-05'], freq=freq) + + p1 = pd.Period('2014-01-01', freq=freq) + assert pidx.searchsorted(p1) == 0 + + p2 = pd.Period('2014-01-04', freq=freq) + assert pidx.searchsorted(p2) == 3 + + msg = "Input has different freq=H from PeriodIndex" + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + pidx.searchsorted(pd.Period('2014-01-01', freq='H')) + + msg = "Input has different freq=5D from PeriodIndex" + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + pidx.searchsorted(pd.Period('2014-01-01', freq='5D')) + + with tm.assert_produces_warning(FutureWarning): + pidx.searchsorted(key=p2) diff --git a/pandas/tests/indexes/test_base.py b/pandas/tests/indexes/test_base.py index a0f2a090c9a06..aa32e75ba0d58 100644 --- a/pandas/tests/indexes/test_base.py +++ b/pandas/tests/indexes/test_base.py @@ -1,40 +1,38 @@ # -*- coding: utf-8 -*- +import pytest + from datetime import datetime, timedelta import pandas.util.testing as tm -from pandas.indexes.api import Index, MultiIndex +from pandas.core.indexes.api import Index, MultiIndex from pandas.tests.indexes.common import Base from pandas.compat import (range, lrange, lzip, u, text_type, zip, PY3, PY36) import operator -import os - -import nose import numpy as np from pandas import (period_range, date_range, Series, DataFrame, Float64Index, Int64Index, CategoricalIndex, DatetimeIndex, TimedeltaIndex, - PeriodIndex) -from pandas.core.index import _get_combined_index + PeriodIndex, isna) +from pandas.core.index import _get_combined_index, _ensure_index_from_sequences from pandas.util.testing import assert_almost_equal from pandas.compat.numpy import np_datetime64_compat import pandas.core.config as cf -from pandas.tseries.index import _to_m8 +from pandas.core.indexes.datetimes import _to_m8 import pandas as pd -from pandas.lib import Timestamp +from pandas._libs.lib import Timestamp -class TestIndex(Base, tm.TestCase): +class TestIndex(Base): 
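+    # NOTE: the unittest -> pytest translation applied throughout this
+    # patch follows a fixed mapping, for example:
+    #   self.assertEqual(a, b)             -> assert a == b
+    #   self.assertTrue(x) / assertFalse   -> assert x / assert not x
+    #   self.assertRaises(E, f)            -> pytest.raises(E, f)
+    #   tm.assertRaisesRegexp(E, m, ...)   -> tm.assert_raises_regex(E, m, ...)
+    #   def setUp(self)                    -> def setup_method(self, method)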
_holder = Index - _multiprocess_can_split_ = True - def setUp(self): + def setup_method(self, method): self.indices = dict(unicodeIndex=tm.makeUnicodeIndex(100), strIndex=tm.makeStringIndex(100), dateIndex=tm.makeDateIndex(100), @@ -56,14 +54,14 @@ def create_index(self): def test_new_axis(self): new_index = self.dateIndex[None, :] - self.assertEqual(new_index.ndim, 2) - tm.assertIsInstance(new_index, np.ndarray) + assert new_index.ndim == 2 + assert isinstance(new_index, np.ndarray) def test_copy_and_deepcopy(self): super(TestIndex, self).test_copy_and_deepcopy() new_copy2 = self.intIndex.copy(dtype=int) - self.assertEqual(new_copy2.dtype.kind, 'i') + assert new_copy2.dtype.kind == 'i' def test_constructor(self): # regular instance creation @@ -79,41 +77,41 @@ def test_constructor(self): # copy arr = np.array(self.strIndex) index = Index(arr, copy=True, name='name') - tm.assertIsInstance(index, Index) - self.assertEqual(index.name, 'name') + assert isinstance(index, Index) + assert index.name == 'name' tm.assert_numpy_array_equal(arr, index.values) arr[0] = "SOMEBIGLONGSTRING" - self.assertNotEqual(index[0], "SOMEBIGLONGSTRING") + assert index[0] != "SOMEBIGLONGSTRING" # what to do here? # arr = np.array(5.) - # self.assertRaises(Exception, arr.view, Index) + # pytest.raises(Exception, arr.view, Index) def test_constructor_corner(self): # corner case - self.assertRaises(TypeError, Index, 0) + pytest.raises(TypeError, Index, 0) def test_construction_list_mixed_tuples(self): - # 10697 - # if we are constructing from a mixed list of tuples, make sure that we - # are independent of the sorting order + # see gh-10697: if we are constructing from a mixed list of tuples, + # make sure that we are independent of the sorting order. idx1 = Index([('A', 1), 'B']) - self.assertIsInstance(idx1, Index) and self.assertNotInstance( - idx1, MultiIndex) + assert isinstance(idx1, Index) + assert not isinstance(idx1, MultiIndex) + idx2 = Index(['B', ('A', 1)]) - self.assertIsInstance(idx2, Index) and self.assertNotInstance( - idx2, MultiIndex) + assert isinstance(idx2, Index) + assert not isinstance(idx2, MultiIndex) def test_constructor_from_index_datetimetz(self): idx = pd.date_range('2015-01-01 10:00', freq='D', periods=3, tz='US/Eastern') result = pd.Index(idx) tm.assert_index_equal(result, idx) - self.assertEqual(result.tz, idx.tz) + assert result.tz == idx.tz result = pd.Index(idx.asobject) tm.assert_index_equal(result, idx) - self.assertEqual(result.tz, idx.tz) + assert result.tz == idx.tz def test_constructor_from_index_timedelta(self): idx = pd.timedelta_range('1 days', freq='D', periods=3) @@ -136,7 +134,7 @@ def test_constructor_from_series_datetimetz(self): tz='US/Eastern') result = pd.Index(pd.Series(idx)) tm.assert_index_equal(result, idx) - self.assertEqual(result.tz, idx.tz) + assert result.tz == idx.tz def test_constructor_from_series_timedelta(self): idx = pd.timedelta_range('1 days', freq='D', periods=3) @@ -155,9 +153,9 @@ def test_constructor_from_series(self): s = Series([Timestamp('20110101'), Timestamp('20120101'), Timestamp('20130101')]) result = Index(s) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) result = DatetimeIndex(s) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) # GH 6273 # create from a series, passing a freq @@ -166,31 +164,30 @@ def test_constructor_from_series(self): result = DatetimeIndex(s, freq='MS') expected = DatetimeIndex(['1-1-1990', '2-1-1990', '3-1-1990', '4-1-1990', '5-1-1990'], 
freq='MS') - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) df = pd.DataFrame(np.random.rand(5, 3)) df['date'] = ['1-1-1990', '2-1-1990', '3-1-1990', '4-1-1990', '5-1-1990'] result = DatetimeIndex(df['date'], freq='MS') expected.name = 'date' - self.assert_index_equal(result, expected) - self.assertEqual(df['date'].dtype, object) + tm.assert_index_equal(result, expected) + assert df['date'].dtype == object exp = pd.Series(['1-1-1990', '2-1-1990', '3-1-1990', '4-1-1990', '5-1-1990'], name='date') - self.assert_series_equal(df['date'], exp) + tm.assert_series_equal(df['date'], exp) # GH 6274 # infer freq of same result = pd.infer_freq(df['date']) - self.assertEqual(result, 'MS') + assert result == 'MS' def test_constructor_ndarray_like(self): # GH 5460#issuecomment-44474502 # it should be possible to convert any object that satisfies the numpy # ndarray interface directly into an Index class ArrayLike(object): - def __init__(self, array): self.array = array @@ -201,22 +198,39 @@ def __array__(self, dtype=None): date_range('2000-01-01', periods=3).values]: expected = pd.Index(array) result = pd.Index(ArrayLike(array)) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) + + def test_constructor_int_dtype_nan(self): + # see gh-15187 + data = [np.nan] + msg = "cannot convert" + + with tm.assert_raises_regex(ValueError, msg): + Index(data, dtype='int64') + + with tm.assert_raises_regex(ValueError, msg): + Index(data, dtype='uint64') + + # This, however, should not break + # because NaN is float. + expected = Float64Index(data) + result = Index(data, dtype='float') + tm.assert_index_equal(result, expected) def test_index_ctor_infer_nan_nat(self): # GH 13467 exp = pd.Float64Index([np.nan, np.nan]) - self.assertEqual(exp.dtype, np.float64) + assert exp.dtype == np.float64 tm.assert_index_equal(Index([np.nan, np.nan]), exp) tm.assert_index_equal(Index(np.array([np.nan, np.nan])), exp) exp = pd.DatetimeIndex([pd.NaT, pd.NaT]) - self.assertEqual(exp.dtype, 'datetime64[ns]') + assert exp.dtype == 'datetime64[ns]' tm.assert_index_equal(Index([pd.NaT, pd.NaT]), exp) tm.assert_index_equal(Index(np.array([pd.NaT, pd.NaT])), exp) exp = pd.DatetimeIndex([pd.NaT, pd.NaT]) - self.assertEqual(exp.dtype, 'datetime64[ns]') + assert exp.dtype == 'datetime64[ns]' for data in [[pd.NaT, np.nan], [np.nan, pd.NaT], [np.nan, np.datetime64('nat')], @@ -225,13 +239,12 @@ def test_index_ctor_infer_nan_nat(self): tm.assert_index_equal(Index(np.array(data, dtype=object)), exp) exp = pd.TimedeltaIndex([pd.NaT, pd.NaT]) - self.assertEqual(exp.dtype, 'timedelta64[ns]') + assert exp.dtype == 'timedelta64[ns]' for data in [[np.nan, np.timedelta64('nat')], [np.timedelta64('nat'), np.nan], [pd.NaT, np.timedelta64('nat')], [np.timedelta64('nat'), pd.NaT]]: - tm.assert_index_equal(Index(data), exp) tm.assert_index_equal(Index(np.array(data, dtype=object)), exp) @@ -250,47 +263,47 @@ def test_index_ctor_infer_periodindex(self): xp = period_range('2012-1-1', freq='M', periods=3) rs = Index(xp) tm.assert_index_equal(rs, xp) - tm.assertIsInstance(rs, PeriodIndex) + assert isinstance(rs, PeriodIndex) def test_constructor_simple_new(self): idx = Index([1, 2, 3, 4, 5], name='int') result = idx._simple_new(idx, 'int') - self.assert_index_equal(result, idx) + tm.assert_index_equal(result, idx) idx = Index([1.1, np.nan, 2.2, 3.0], name='float') result = idx._simple_new(idx, 'float') - self.assert_index_equal(result, idx) + tm.assert_index_equal(result, idx) idx = Index(['A', 
'B', 'C', np.nan], name='obj') result = idx._simple_new(idx, 'obj') - self.assert_index_equal(result, idx) + tm.assert_index_equal(result, idx) def test_constructor_dtypes(self): for idx in [Index(np.array([1, 2, 3], dtype=int)), Index(np.array([1, 2, 3], dtype=int), dtype=int), Index([1, 2, 3], dtype=int)]: - self.assertIsInstance(idx, Int64Index) + assert isinstance(idx, Int64Index) - # these should coerce + # These should coerce for idx in [Index(np.array([1., 2., 3.], dtype=float), dtype=int), Index([1., 2., 3.], dtype=int)]: - self.assertIsInstance(idx, Int64Index) + assert isinstance(idx, Int64Index) for idx in [Index(np.array([1., 2., 3.], dtype=float)), Index(np.array([1, 2, 3], dtype=int), dtype=float), Index(np.array([1., 2., 3.], dtype=float), dtype=float), Index([1, 2, 3], dtype=float), Index([1., 2., 3.], dtype=float)]: - self.assertIsInstance(idx, Float64Index) + assert isinstance(idx, Float64Index) for idx in [Index(np.array([True, False, True], dtype=bool)), Index([True, False, True]), Index(np.array([True, False, True], dtype=bool), dtype=bool), Index([True, False, True], dtype=bool)]: - self.assertIsInstance(idx, Index) - self.assertEqual(idx.dtype, object) + assert isinstance(idx, Index) + assert idx.dtype == object for idx in [Index(np.array([1, 2, 3], dtype=int), dtype='category'), Index([1, 2, 3], dtype='category'), @@ -299,32 +312,32 @@ def test_constructor_dtypes(self): dtype='category'), Index([datetime(2011, 1, 1), datetime(2011, 1, 2)], dtype='category')]: - self.assertIsInstance(idx, CategoricalIndex) + assert isinstance(idx, CategoricalIndex) for idx in [Index(np.array([np_datetime64_compat('2011-01-01'), np_datetime64_compat('2011-01-02')])), Index([datetime(2011, 1, 1), datetime(2011, 1, 2)])]: - self.assertIsInstance(idx, DatetimeIndex) + assert isinstance(idx, DatetimeIndex) for idx in [Index(np.array([np_datetime64_compat('2011-01-01'), np_datetime64_compat('2011-01-02')]), dtype=object), Index([datetime(2011, 1, 1), datetime(2011, 1, 2)], dtype=object)]: - self.assertNotIsInstance(idx, DatetimeIndex) - self.assertIsInstance(idx, Index) - self.assertEqual(idx.dtype, object) + assert not isinstance(idx, DatetimeIndex) + assert isinstance(idx, Index) + assert idx.dtype == object for idx in [Index(np.array([np.timedelta64(1, 'D'), np.timedelta64( 1, 'D')])), Index([timedelta(1), timedelta(1)])]: - self.assertIsInstance(idx, TimedeltaIndex) + assert isinstance(idx, TimedeltaIndex) for idx in [Index(np.array([np.timedelta64(1, 'D'), np.timedelta64(1, 'D')]), dtype=object), Index([timedelta(1), timedelta(1)], dtype=object)]: - self.assertNotIsInstance(idx, TimedeltaIndex) - self.assertIsInstance(idx, Index) - self.assertEqual(idx.dtype, object) + assert not isinstance(idx, TimedeltaIndex) + assert isinstance(idx, Index) + assert idx.dtype == object def test_constructor_dtypes_datetime(self): @@ -374,7 +387,7 @@ def test_view_with_args(self): ind = self.indices[i] # with arguments - self.assertRaises(TypeError, lambda: ind.view('i8')) + pytest.raises(TypeError, lambda: ind.view('i8')) # these are ok for i in list(set(self.indices.keys()) - set(restricted)): @@ -383,15 +396,6 @@ def test_view_with_args(self): # with arguments ind.view('i8') - def test_legacy_pickle_identity(self): - - # GH 8431 - pth = tm.get_data_path() - s1 = pd.read_pickle(os.path.join(pth, 's1-0.12.0.pickle')) - s2 = pd.read_pickle(os.path.join(pth, 's2-0.12.0.pickle')) - self.assertFalse(s1.index.identical(s2.index)) - self.assertFalse(s1.index.equals(s2.index)) - def test_astype(self): 
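+        # casting to 'i8' reinterprets the integer index as int64; the
+        # name check below guards that .astype() keeps index metadata.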
casted = self.intIndex.astype('i8') @@ -401,20 +405,20 @@ def test_astype(self): # pass on name self.intIndex.name = 'foobar' casted = self.intIndex.astype('i8') - self.assertEqual(casted.name, 'foobar') + assert casted.name == 'foobar' def test_equals_object(self): # same - self.assertTrue(Index(['a', 'b', 'c']).equals(Index(['a', 'b', 'c']))) + assert Index(['a', 'b', 'c']).equals(Index(['a', 'b', 'c'])) # different length - self.assertFalse(Index(['a', 'b', 'c']).equals(Index(['a', 'b']))) + assert not Index(['a', 'b', 'c']).equals(Index(['a', 'b'])) # same length, different values - self.assertFalse(Index(['a', 'b', 'c']).equals(Index(['a', 'b', 'd']))) + assert not Index(['a', 'b', 'c']).equals(Index(['a', 'b', 'd'])) # Must also be an Index - self.assertFalse(Index(['a', 'b', 'c']).equals(['a', 'b', 'c'])) + assert not Index(['a', 'b', 'c']).equals(['a', 'b', 'c']) def test_insert(self): @@ -423,34 +427,34 @@ def test_insert(self): result = Index(['b', 'c', 'd']) # test 0th element - self.assert_index_equal(Index(['a', 'b', 'c', 'd']), - result.insert(0, 'a')) + tm.assert_index_equal(Index(['a', 'b', 'c', 'd']), + result.insert(0, 'a')) # test Nth element that follows Python list behavior - self.assert_index_equal(Index(['b', 'c', 'e', 'd']), - result.insert(-1, 'e')) + tm.assert_index_equal(Index(['b', 'c', 'e', 'd']), + result.insert(-1, 'e')) # test loc +/- neq (0, -1) - self.assert_index_equal(result.insert(1, 'z'), result.insert(-2, 'z')) + tm.assert_index_equal(result.insert(1, 'z'), result.insert(-2, 'z')) # test empty null_index = Index([]) - self.assert_index_equal(Index(['a']), null_index.insert(0, 'a')) + tm.assert_index_equal(Index(['a']), null_index.insert(0, 'a')) def test_delete(self): idx = Index(['a', 'b', 'c', 'd'], name='idx') expected = Index(['b', 'c', 'd'], name='idx') result = idx.delete(0) - self.assert_index_equal(result, expected) - self.assertEqual(result.name, expected.name) + tm.assert_index_equal(result, expected) + assert result.name == expected.name expected = Index(['a', 'b', 'c'], name='idx') result = idx.delete(-1) - self.assert_index_equal(result, expected) - self.assertEqual(result.name, expected.name) + tm.assert_index_equal(result, expected) + assert result.name == expected.name - with tm.assertRaises((IndexError, ValueError)): + with pytest.raises((IndexError, ValueError)): # either depending on numpy version result = idx.delete(5) @@ -460,60 +464,60 @@ def test_identical(self): i1 = Index(['a', 'b', 'c']) i2 = Index(['a', 'b', 'c']) - self.assertTrue(i1.identical(i2)) + assert i1.identical(i2) i1 = i1.rename('foo') - self.assertTrue(i1.equals(i2)) - self.assertFalse(i1.identical(i2)) + assert i1.equals(i2) + assert not i1.identical(i2) i2 = i2.rename('foo') - self.assertTrue(i1.identical(i2)) + assert i1.identical(i2) i3 = Index([('a', 'a'), ('a', 'b'), ('b', 'a')]) i4 = Index([('a', 'a'), ('a', 'b'), ('b', 'a')], tupleize_cols=False) - self.assertFalse(i3.identical(i4)) + assert not i3.identical(i4) def test_is_(self): ind = Index(range(10)) - self.assertTrue(ind.is_(ind)) - self.assertTrue(ind.is_(ind.view().view().view().view())) - self.assertFalse(ind.is_(Index(range(10)))) - self.assertFalse(ind.is_(ind.copy())) - self.assertFalse(ind.is_(ind.copy(deep=False))) - self.assertFalse(ind.is_(ind[:])) - self.assertFalse(ind.is_(ind.view(np.ndarray).view(Index))) - self.assertFalse(ind.is_(np.array(range(10)))) + assert ind.is_(ind) + assert ind.is_(ind.view().view().view().view()) + assert not ind.is_(Index(range(10))) + assert not 
ind.is_(ind.copy()) + assert not ind.is_(ind.copy(deep=False)) + assert not ind.is_(ind[:]) + assert not ind.is_(ind.view(np.ndarray).view(Index)) + assert not ind.is_(np.array(range(10))) # quasi-implementation dependent - self.assertTrue(ind.is_(ind.view())) + assert ind.is_(ind.view()) ind2 = ind.view() ind2.name = 'bob' - self.assertTrue(ind.is_(ind2)) - self.assertTrue(ind2.is_(ind)) + assert ind.is_(ind2) + assert ind2.is_(ind) # doesn't matter if Indices are *actually* views of underlying data, - self.assertFalse(ind.is_(Index(ind.values))) + assert not ind.is_(Index(ind.values)) arr = np.array(range(1, 11)) ind1 = Index(arr, copy=False) ind2 = Index(arr, copy=False) - self.assertFalse(ind1.is_(ind2)) + assert not ind1.is_(ind2) def test_asof(self): d = self.dateIndex[0] - self.assertEqual(self.dateIndex.asof(d), d) - self.assertTrue(np.isnan(self.dateIndex.asof(d - timedelta(1)))) + assert self.dateIndex.asof(d) == d + assert isna(self.dateIndex.asof(d - timedelta(1))) d = self.dateIndex[-1] - self.assertEqual(self.dateIndex.asof(d + timedelta(1)), d) + assert self.dateIndex.asof(d + timedelta(1)) == d d = self.dateIndex[0].to_pydatetime() - tm.assertIsInstance(self.dateIndex.asof(d), Timestamp) + assert isinstance(self.dateIndex.asof(d), Timestamp) def test_asof_datetime_partial(self): idx = pd.date_range('2010-01-01', periods=2, freq='m') expected = Timestamp('2010-02-28') result = idx.asof('2010-02') - self.assertEqual(result, expected) - self.assertFalse(isinstance(result, Index)) + assert result == expected + assert not isinstance(result, Index) def test_nanosecond_index_access(self): s = Series([Timestamp('20130101')]).values.view('i8')[0] @@ -523,12 +527,11 @@ def test_nanosecond_index_access(self): first_value = x.asof(x.index[0]) # this does not yet work, as parsing strings is done via dateutil - # self.assertEqual(first_value, - # x['2013-01-01 00:00:00.000000050+0000']) + # assert first_value == x['2013-01-01 00:00:00.000000050+0000'] exp_ts = np_datetime64_compat('2013-01-01 00:00:00.000000050+0000', 'ns') - self.assertEqual(first_value, x[Timestamp(exp_ts)]) + assert first_value == x[Timestamp(exp_ts)] def test_comparators(self): index = self.dateIndex @@ -541,7 +544,7 @@ def _check(op): arr_result = op(arr, element) index_result = op(index, element) - self.assertIsInstance(index_result, np.ndarray) + assert isinstance(index_result, np.ndarray) tm.assert_numpy_array_equal(arr_result, index_result) _check(operator.eq) @@ -558,16 +561,16 @@ def test_booleanindex(self): subIndex = self.strIndex[boolIdx] for i, val in enumerate(subIndex): - self.assertEqual(subIndex.get_loc(val), i) + assert subIndex.get_loc(val) == i subIndex = self.strIndex[list(boolIdx)] for i, val in enumerate(subIndex): - self.assertEqual(subIndex.get_loc(val), i) + assert subIndex.get_loc(val) == i def test_fancy(self): sl = self.strIndex[[1, 2, 3]] for i in sl: - self.assertEqual(i, sl[sl.get_loc(i)]) + assert i == sl[sl.get_loc(i)] def test_empty_fancy(self): empty_farr = np.array([], dtype=np.float_) @@ -579,64 +582,69 @@ def test_empty_fancy(self): for idx in [self.strIndex, self.intIndex, self.floatIndex]: empty_idx = idx.__class__([]) - self.assertTrue(idx[[]].identical(empty_idx)) - self.assertTrue(idx[empty_iarr].identical(empty_idx)) - self.assertTrue(idx[empty_barr].identical(empty_idx)) + assert idx[[]].identical(empty_idx) + assert idx[empty_iarr].identical(empty_idx) + assert idx[empty_barr].identical(empty_idx) # np.ndarray only accepts ndarray of int & bool dtypes, so should # Index. 
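+            # (empty_farr above is np.array([], dtype=np.float_): even an
+            # empty float indexer must raise IndexError, not return a slice)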
- self.assertRaises(IndexError, idx.__getitem__, empty_farr) + pytest.raises(IndexError, idx.__getitem__, empty_farr) def test_getitem(self): arr = np.array(self.dateIndex) exp = self.dateIndex[5] exp = _to_m8(exp) - self.assertEqual(exp, arr[5]) + assert exp == arr[5] def test_intersection(self): first = self.strIndex[:20] second = self.strIndex[:10] intersect = first.intersection(second) - self.assertTrue(tm.equalContents(intersect, second)) + assert tm.equalContents(intersect, second) # Corner cases inter = first.intersection(first) - self.assertIs(inter, first) + assert inter is first idx1 = Index([1, 2, 3, 4, 5], name='idx') # if target has the same name, it is preserved idx2 = Index([3, 4, 5, 6, 7], name='idx') expected2 = Index([3, 4, 5], name='idx') result2 = idx1.intersection(idx2) - self.assert_index_equal(result2, expected2) - self.assertEqual(result2.name, expected2.name) + tm.assert_index_equal(result2, expected2) + assert result2.name == expected2.name # if target name is different, it will be reset idx3 = Index([3, 4, 5, 6, 7], name='other') expected3 = Index([3, 4, 5], name=None) result3 = idx1.intersection(idx3) - self.assert_index_equal(result3, expected3) - self.assertEqual(result3.name, expected3.name) + tm.assert_index_equal(result3, expected3) + assert result3.name == expected3.name # non monotonic idx1 = Index([5, 3, 2, 4, 1], name='idx') idx2 = Index([4, 7, 6, 5, 3], name='idx') - result2 = idx1.intersection(idx2) - self.assertTrue(tm.equalContents(result2, expected2)) - self.assertEqual(result2.name, expected2.name) + expected = Index([5, 3, 4], name='idx') + result = idx1.intersection(idx2) + tm.assert_index_equal(result, expected) - idx3 = Index([4, 7, 6, 5, 3], name='other') - result3 = idx1.intersection(idx3) - self.assertTrue(tm.equalContents(result3, expected3)) - self.assertEqual(result3.name, expected3.name) + idx2 = Index([4, 7, 6, 5, 3], name='other') + expected = Index([5, 3, 4], name=None) + result = idx1.intersection(idx2) + tm.assert_index_equal(result, expected) # non-monotonic non-unique idx1 = Index(['A', 'B', 'A', 'C']) idx2 = Index(['B', 'D']) expected = Index(['B'], dtype='object') result = idx1.intersection(idx2) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) + + idx2 = Index(['B', 'D', 'A']) + expected = Index(['A', 'B', 'A'], dtype='object') + result = idx1.intersection(idx2) + tm.assert_index_equal(result, expected) # preserve names first = self.strIndex[5:20] @@ -644,39 +652,48 @@ def test_intersection(self): first.name = 'A' second.name = 'A' intersect = first.intersection(second) - self.assertEqual(intersect.name, 'A') + assert intersect.name == 'A' second.name = 'B' intersect = first.intersection(second) - self.assertIsNone(intersect.name) + assert intersect.name is None first.name = None second.name = 'B' intersect = first.intersection(second) - self.assertIsNone(intersect.name) + assert intersect.name is None + + def test_intersect_str_dates(self): + dt_dates = [datetime(2012, 2, 9), datetime(2012, 2, 22)] + + i1 = Index(dt_dates, dtype=object) + i2 = Index(['aa'], dtype=object) + res = i2.intersection(i1) + + assert len(res) == 0 def test_union(self): first = self.strIndex[5:20] second = self.strIndex[:10] everything = self.strIndex[:20] union = first.union(second) - self.assertTrue(tm.equalContents(union, everything)) + assert tm.equalContents(union, everything) # GH 10149 cases = [klass(second.values) for klass in [np.array, Series, list]] for case in cases: result = first.union(case) - 
self.assertTrue(tm.equalContents(result, everything)) + assert tm.equalContents(result, everything) # Corner cases union = first.union(first) - self.assertIs(union, first) + assert union is first union = first.union([]) - self.assertIs(union, first) + assert union is first union = Index([]).union(first) - self.assertIs(union, first) + assert union is first # preserve names first = Index(list('ab'), name='A') @@ -742,8 +759,8 @@ def test_union(self): else: appended = np.append(self.strIndex, self.dateIndex.astype('O')) - self.assertTrue(tm.equalContents(firstCat, appended)) - self.assertTrue(tm.equalContents(secondCat, self.strIndex)) + assert tm.equalContents(firstCat, appended) + assert tm.equalContents(secondCat, self.strIndex) tm.assert_contains_all(self.strIndex, firstCat) tm.assert_contains_all(self.strIndex, secondCat) tm.assert_contains_all(self.dateIndex, firstCat) @@ -751,23 +768,23 @@ def test_union(self): def test_add(self): idx = self.strIndex expected = Index(self.strIndex.values * 2) - self.assert_index_equal(idx + idx, expected) - self.assert_index_equal(idx + idx.tolist(), expected) - self.assert_index_equal(idx.tolist() + idx, expected) + tm.assert_index_equal(idx + idx, expected) + tm.assert_index_equal(idx + idx.tolist(), expected) + tm.assert_index_equal(idx.tolist() + idx, expected) # test add and radd idx = Index(list('abc')) expected = Index(['a1', 'b1', 'c1']) - self.assert_index_equal(idx + '1', expected) + tm.assert_index_equal(idx + '1', expected) expected = Index(['1a', '1b', '1c']) - self.assert_index_equal('1' + idx, expected) + tm.assert_index_equal('1' + idx, expected) def test_sub(self): idx = self.strIndex - self.assertRaises(TypeError, lambda: idx - 'a') - self.assertRaises(TypeError, lambda: idx - idx) - self.assertRaises(TypeError, lambda: idx - idx.tolist()) - self.assertRaises(TypeError, lambda: idx.tolist() - idx) + pytest.raises(TypeError, lambda: idx - 'a') + pytest.raises(TypeError, lambda: idx - idx) + pytest.raises(TypeError, lambda: idx - idx.tolist()) + pytest.raises(TypeError, lambda: idx.tolist() - idx) def test_map_identity_mapping(self): # GH 12766 @@ -816,40 +833,40 @@ def test_append_multiple(self): foos = [index[:2], index[2:4], index[4:]] result = foos[0].append(foos[1:]) - self.assert_index_equal(result, index) + tm.assert_index_equal(result, index) # empty result = index.append([]) - self.assert_index_equal(result, index) + tm.assert_index_equal(result, index) def test_append_empty_preserve_name(self): left = Index([], name='foo') right = Index([1, 2, 3], name='foo') result = left.append(right) - self.assertEqual(result.name, 'foo') + assert result.name == 'foo' left = Index([], name='foo') right = Index([1, 2, 3], name='bar') result = left.append(right) - self.assertIsNone(result.name) + assert result.name is None def test_add_string(self): # from bug report index = Index(['a', 'b', 'c']) index2 = index + 'foo' - self.assertNotIn('a', index2) - self.assertIn('afoo', index2) + assert 'a' not in index2 + assert 'afoo' in index2 def test_iadd_string(self): index = pd.Index(['a', 'b', 'c']) # doesn't fail test unless there is a check before `+=` - self.assertIn('a', index) + assert 'a' in index index += '_x' - self.assertIn('a_x', index) + assert 'a_x' in index def test_difference(self): @@ -860,23 +877,23 @@ def test_difference(self): # different names result = first.difference(second) - self.assertTrue(tm.equalContents(result, answer)) - self.assertEqual(result.name, None) + assert tm.equalContents(result, answer) + assert result.name 
is None # same names second.name = 'name' result = first.difference(second) - self.assertEqual(result.name, 'name') + assert result.name == 'name' # with empty result = first.difference([]) - self.assertTrue(tm.equalContents(result, first)) - self.assertEqual(result.name, first.name) + assert tm.equalContents(result, first) + assert result.name == first.name - # with everythin + # with everything result = first.difference(first) - self.assertEqual(len(result), 0) - self.assertEqual(result.name, first.name) + assert len(result) == 0 + assert result.name == first.name def test_symmetric_difference(self): # smoke @@ -884,20 +901,20 @@ def test_symmetric_difference(self): idx2 = Index([2, 3, 4, 5]) result = idx1.symmetric_difference(idx2) expected = Index([1, 5]) - self.assertTrue(tm.equalContents(result, expected)) - self.assertIsNone(result.name) + assert tm.equalContents(result, expected) + assert result.name is None # __xor__ syntax expected = idx1 ^ idx2 - self.assertTrue(tm.equalContents(result, expected)) - self.assertIsNone(result.name) + assert tm.equalContents(result, expected) + assert result.name is None # multiIndex idx1 = MultiIndex.from_tuples(self.tuples) idx2 = MultiIndex.from_tuples([('foo', 1), ('bar', 3)]) result = idx1.symmetric_difference(idx2) expected = MultiIndex.from_tuples([('bar', 2), ('baz', 3), ('bar', 3)]) - self.assertTrue(tm.equalContents(result, expected)) + assert tm.equalContents(result, expected) # nans: # GH 13514 change: {nan} - {nan} == {} @@ -919,32 +936,32 @@ def test_symmetric_difference(self): idx2 = np.array([2, 3, 4, 5]) expected = Index([1, 5]) result = idx1.symmetric_difference(idx2) - self.assertTrue(tm.equalContents(result, expected)) - self.assertEqual(result.name, 'idx1') + assert tm.equalContents(result, expected) + assert result.name == 'idx1' result = idx1.symmetric_difference(idx2, result_name='new_name') - self.assertTrue(tm.equalContents(result, expected)) - self.assertEqual(result.name, 'new_name') + assert tm.equalContents(result, expected) + assert result.name == 'new_name' def test_is_numeric(self): - self.assertFalse(self.dateIndex.is_numeric()) - self.assertFalse(self.strIndex.is_numeric()) - self.assertTrue(self.intIndex.is_numeric()) - self.assertTrue(self.floatIndex.is_numeric()) - self.assertFalse(self.catIndex.is_numeric()) + assert not self.dateIndex.is_numeric() + assert not self.strIndex.is_numeric() + assert self.intIndex.is_numeric() + assert self.floatIndex.is_numeric() + assert not self.catIndex.is_numeric() def test_is_object(self): - self.assertTrue(self.strIndex.is_object()) - self.assertTrue(self.boolIndex.is_object()) - self.assertFalse(self.catIndex.is_object()) - self.assertFalse(self.intIndex.is_object()) - self.assertFalse(self.dateIndex.is_object()) - self.assertFalse(self.floatIndex.is_object()) + assert self.strIndex.is_object() + assert self.boolIndex.is_object() + assert not self.catIndex.is_object() + assert not self.intIndex.is_object() + assert not self.dateIndex.is_object() + assert not self.floatIndex.is_object() def test_is_all_dates(self): - self.assertTrue(self.dateIndex.is_all_dates) - self.assertFalse(self.strIndex.is_all_dates) - self.assertFalse(self.intIndex.is_all_dates) + assert self.dateIndex.is_all_dates + assert not self.strIndex.is_all_dates + assert not self.intIndex.is_all_dates def test_summary(self): self._check_method_works(Index.summary) @@ -952,8 +969,8 @@ def test_summary(self): ind = Index(['{other}%s', "~:{range}:0"], name='A') result = ind.summary() # shouldn't be formatted 
accidentally. - self.assertIn('~:{range}:0', result) - self.assertIn('{other}%s', result) + assert '~:{range}:0' in result + assert '{other}%s' in result def test_format(self): self._check_method_works(Index.format) @@ -967,19 +984,19 @@ def test_format(self): index = Index([now]) formatted = index.format() expected = [str(index[0])] - self.assertEqual(formatted, expected) + assert formatted == expected # 2845 index = Index([1, 2.0 + 3.0j, np.nan]) formatted = index.format() expected = [str(index[0]), str(index[1]), u('NaN')] - self.assertEqual(formatted, expected) + assert formatted == expected # is this really allowed? index = Index([1, 2.0 + 3.0j, None]) formatted = index.format() expected = [str(index[0]), str(index[1]), u('NaN')] - self.assertEqual(formatted, expected) + assert formatted == expected self.strIndex[:0].format() @@ -989,27 +1006,27 @@ def test_format_with_name_time_info(self): dates = Index([dt + inc for dt in self.dateIndex], name='something') formatted = dates.format(name=True) - self.assertEqual(formatted[0], 'something') + assert formatted[0] == 'something' def test_format_datetime_with_time(self): t = Index([datetime(2012, 2, 7), datetime(2012, 2, 7, 23)]) result = t.format() expected = ['2012-02-07 00:00:00', '2012-02-07 23:00:00'] - self.assertEqual(len(result), 2) - self.assertEqual(result, expected) + assert len(result) == 2 + assert result == expected def test_format_none(self): values = ['a', 'b', 'c', None] idx = Index(values) idx.format() - self.assertIsNone(idx[3]) + assert idx[3] is None def test_logical_compat(self): idx = self.create_index() - self.assertEqual(idx.all(), idx.values.all()) - self.assertEqual(idx.any(), idx.values.any()) + assert idx.all() == idx.values.all() + assert idx.any() == idx.values.any() def _check_method_works(self, method): method(self.empty) @@ -1051,10 +1068,10 @@ def test_get_indexer_invalid(self): # GH10411 idx = Index(np.arange(10)) - with tm.assertRaisesRegexp(ValueError, 'tolerance argument'): + with tm.assert_raises_regex(ValueError, 'tolerance argument'): idx.get_indexer([1, 0], tolerance=1) - with tm.assertRaisesRegexp(ValueError, 'limit argument'): + with tm.assert_raises_regex(ValueError, 'limit argument'): idx.get_indexer([1, 0], limit=1) def test_get_indexer_nearest(self): @@ -1088,7 +1105,7 @@ def test_get_indexer_nearest(self): tm.assert_numpy_array_equal(actual, np.array(expected, dtype=np.intp)) - with tm.assertRaisesRegexp(ValueError, 'limit argument'): + with tm.assert_raises_regex(ValueError, 'limit argument'): idx.get_indexer([1, 0], method='nearest', limit=1) def test_get_indexer_nearest_decreasing(self): @@ -1117,41 +1134,41 @@ def test_get_indexer_strings(self): expected = np.array([0, 0, 1, -1], dtype=np.intp) tm.assert_numpy_array_equal(actual, expected) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): idx.get_indexer(['a', 'b', 'c', 'd'], method='nearest') - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): idx.get_indexer(['a', 'b', 'c', 'd'], method='pad', tolerance=2) def test_get_loc(self): idx = pd.Index([0, 1, 2]) all_methods = [None, 'pad', 'backfill', 'nearest'] for method in all_methods: - self.assertEqual(idx.get_loc(1, method=method), 1) + assert idx.get_loc(1, method=method) == 1 if method is not None: - self.assertEqual(idx.get_loc(1, method=method, tolerance=0), 1) - with tm.assertRaises(TypeError): + assert idx.get_loc(1, method=method, tolerance=0) == 1 + with pytest.raises(TypeError): idx.get_loc([1, 2], method=method) for method, loc in [('pad', 
1), ('backfill', 2), ('nearest', 1)]: - self.assertEqual(idx.get_loc(1.1, method), loc) + assert idx.get_loc(1.1, method) == loc for method, loc in [('pad', 1), ('backfill', 2), ('nearest', 1)]: - self.assertEqual(idx.get_loc(1.1, method, tolerance=1), loc) + assert idx.get_loc(1.1, method, tolerance=1) == loc for method in ['pad', 'backfill', 'nearest']: - with tm.assertRaises(KeyError): + with pytest.raises(KeyError): idx.get_loc(1.1, method, tolerance=0.05) - with tm.assertRaisesRegexp(ValueError, 'must be numeric'): + with tm.assert_raises_regex(ValueError, 'must be numeric'): idx.get_loc(1.1, 'nearest', tolerance='invalid') - with tm.assertRaisesRegexp(ValueError, 'tolerance .* valid if'): + with tm.assert_raises_regex(ValueError, 'tolerance .* valid if'): idx.get_loc(1.1, tolerance=1) idx = pd.Index(['a', 'c']) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): idx.get_loc('a', method='nearest') - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): idx.get_loc('a', method='pad', tolerance='invalid') def test_slice_locs(self): @@ -1159,71 +1176,71 @@ def test_slice_locs(self): idx = Index(np.array([0, 1, 2, 5, 6, 7, 9, 10], dtype=dtype)) n = len(idx) - self.assertEqual(idx.slice_locs(start=2), (2, n)) - self.assertEqual(idx.slice_locs(start=3), (3, n)) - self.assertEqual(idx.slice_locs(3, 8), (3, 6)) - self.assertEqual(idx.slice_locs(5, 10), (3, n)) - self.assertEqual(idx.slice_locs(end=8), (0, 6)) - self.assertEqual(idx.slice_locs(end=9), (0, 7)) + assert idx.slice_locs(start=2) == (2, n) + assert idx.slice_locs(start=3) == (3, n) + assert idx.slice_locs(3, 8) == (3, 6) + assert idx.slice_locs(5, 10) == (3, n) + assert idx.slice_locs(end=8) == (0, 6) + assert idx.slice_locs(end=9) == (0, 7) # reversed idx2 = idx[::-1] - self.assertEqual(idx2.slice_locs(8, 2), (2, 6)) - self.assertEqual(idx2.slice_locs(7, 3), (2, 5)) + assert idx2.slice_locs(8, 2) == (2, 6) + assert idx2.slice_locs(7, 3) == (2, 5) # float slicing idx = Index(np.array([0, 1, 2, 5, 6, 7, 9, 10], dtype=float)) n = len(idx) - self.assertEqual(idx.slice_locs(5.0, 10.0), (3, n)) - self.assertEqual(idx.slice_locs(4.5, 10.5), (3, 8)) + assert idx.slice_locs(5.0, 10.0) == (3, n) + assert idx.slice_locs(4.5, 10.5) == (3, 8) idx2 = idx[::-1] - self.assertEqual(idx2.slice_locs(8.5, 1.5), (2, 6)) - self.assertEqual(idx2.slice_locs(10.5, -1), (0, n)) + assert idx2.slice_locs(8.5, 1.5) == (2, 6) + assert idx2.slice_locs(10.5, -1) == (0, n) # int slicing with floats # GH 4892, these are all TypeErrors idx = Index(np.array([0, 1, 2, 5, 6, 7, 9, 10], dtype=int)) - self.assertRaises(TypeError, - lambda: idx.slice_locs(5.0, 10.0), (3, n)) - self.assertRaises(TypeError, - lambda: idx.slice_locs(4.5, 10.5), (3, 8)) + pytest.raises(TypeError, + lambda: idx.slice_locs(5.0, 10.0), (3, n)) + pytest.raises(TypeError, + lambda: idx.slice_locs(4.5, 10.5), (3, 8)) idx2 = idx[::-1] - self.assertRaises(TypeError, - lambda: idx2.slice_locs(8.5, 1.5), (2, 6)) - self.assertRaises(TypeError, - lambda: idx2.slice_locs(10.5, -1), (0, n)) + pytest.raises(TypeError, + lambda: idx2.slice_locs(8.5, 1.5), (2, 6)) + pytest.raises(TypeError, + lambda: idx2.slice_locs(10.5, -1), (0, n)) def test_slice_locs_dup(self): idx = Index(['a', 'a', 'b', 'c', 'd', 'd']) - self.assertEqual(idx.slice_locs('a', 'd'), (0, 6)) - self.assertEqual(idx.slice_locs(end='d'), (0, 6)) - self.assertEqual(idx.slice_locs('a', 'c'), (0, 4)) - self.assertEqual(idx.slice_locs('b', 'd'), (2, 6)) + assert idx.slice_locs('a', 'd') == (0, 6) + assert 
idx.slice_locs(end='d') == (0, 6) + assert idx.slice_locs('a', 'c') == (0, 4) + assert idx.slice_locs('b', 'd') == (2, 6) idx2 = idx[::-1] - self.assertEqual(idx2.slice_locs('d', 'a'), (0, 6)) - self.assertEqual(idx2.slice_locs(end='a'), (0, 6)) - self.assertEqual(idx2.slice_locs('d', 'b'), (0, 4)) - self.assertEqual(idx2.slice_locs('c', 'a'), (2, 6)) + assert idx2.slice_locs('d', 'a') == (0, 6) + assert idx2.slice_locs(end='a') == (0, 6) + assert idx2.slice_locs('d', 'b') == (0, 4) + assert idx2.slice_locs('c', 'a') == (2, 6) for dtype in [int, float]: idx = Index(np.array([10, 12, 12, 14], dtype=dtype)) - self.assertEqual(idx.slice_locs(12, 12), (1, 3)) - self.assertEqual(idx.slice_locs(11, 13), (1, 3)) + assert idx.slice_locs(12, 12) == (1, 3) + assert idx.slice_locs(11, 13) == (1, 3) idx2 = idx[::-1] - self.assertEqual(idx2.slice_locs(12, 12), (1, 3)) - self.assertEqual(idx2.slice_locs(13, 11), (1, 3)) + assert idx2.slice_locs(12, 12) == (1, 3) + assert idx2.slice_locs(13, 11) == (1, 3) def test_slice_locs_na(self): idx = Index([np.nan, 1, 2]) - self.assertRaises(KeyError, idx.slice_locs, start=1.5) - self.assertRaises(KeyError, idx.slice_locs, end=1.5) - self.assertEqual(idx.slice_locs(1), (1, 3)) - self.assertEqual(idx.slice_locs(np.nan), (0, 3)) + pytest.raises(KeyError, idx.slice_locs, start=1.5) + pytest.raises(KeyError, idx.slice_locs, end=1.5) + assert idx.slice_locs(1) == (1, 3) + assert idx.slice_locs(np.nan) == (0, 3) idx = Index([0, np.nan, np.nan, 1, 2]) - self.assertEqual(idx.slice_locs(np.nan), (1, 5)) + assert idx.slice_locs(np.nan) == (1, 5) def test_slice_locs_negative_step(self): idx = Index(list('bcdxy')) @@ -1235,7 +1252,7 @@ def check_slice(in_slice, expected): in_slice.step) result = idx[s_start:s_stop:in_slice.step] expected = pd.Index(list(expected)) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) for in_slice, expected in [ (SLC[::-1], 'yxdcb'), (SLC['b':'y':-1], ''), @@ -1257,40 +1274,40 @@ def test_drop(self): drop = self.strIndex[lrange(5, 10)] dropped = self.strIndex.drop(drop) expected = self.strIndex[lrange(5) + lrange(10, n)] - self.assert_index_equal(dropped, expected) + tm.assert_index_equal(dropped, expected) - self.assertRaises(ValueError, self.strIndex.drop, ['foo', 'bar']) - self.assertRaises(ValueError, self.strIndex.drop, ['1', 'bar']) + pytest.raises(ValueError, self.strIndex.drop, ['foo', 'bar']) + pytest.raises(ValueError, self.strIndex.drop, ['1', 'bar']) # errors='ignore' mixed = drop.tolist() + ['foo'] dropped = self.strIndex.drop(mixed, errors='ignore') expected = self.strIndex[lrange(5) + lrange(10, n)] - self.assert_index_equal(dropped, expected) + tm.assert_index_equal(dropped, expected) dropped = self.strIndex.drop(['foo', 'bar'], errors='ignore') expected = self.strIndex[lrange(n)] - self.assert_index_equal(dropped, expected) + tm.assert_index_equal(dropped, expected) dropped = self.strIndex.drop(self.strIndex[0]) expected = self.strIndex[1:] - self.assert_index_equal(dropped, expected) + tm.assert_index_equal(dropped, expected) ser = Index([1, 2, 3]) dropped = ser.drop(1) expected = Index([2, 3]) - self.assert_index_equal(dropped, expected) + tm.assert_index_equal(dropped, expected) # errors='ignore' - self.assertRaises(ValueError, ser.drop, [3, 4]) + pytest.raises(ValueError, ser.drop, [3, 4]) dropped = ser.drop(4, errors='ignore') expected = Index([1, 2, 3]) - self.assert_index_equal(dropped, expected) + tm.assert_index_equal(dropped, expected) dropped = ser.drop([3, 4, 5], errors='ignore') 
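+        # errors='ignore' skips only the missing labels (4 and 5 here);
+        # 3 is present and is still dropped, leaving Index([1, 2])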
expected = Index([1, 2]) - self.assert_index_equal(dropped, expected) + tm.assert_index_equal(dropped, expected) def test_tuple_union_bug(self): import pandas @@ -1309,19 +1326,21 @@ def test_tuple_union_bug(self): int_idx = idx1.intersection(idx2) # needs to be 1d like idx1 and idx2 expected = idx1[:4] # pandas.Index(sorted(set(idx1) & set(idx2))) - self.assertEqual(int_idx.ndim, 1) - self.assert_index_equal(int_idx, expected) + assert int_idx.ndim == 1 + tm.assert_index_equal(int_idx, expected) # union broken union_idx = idx1.union(idx2) expected = idx2 - self.assertEqual(union_idx.ndim, 1) - self.assert_index_equal(union_idx, expected) + assert union_idx.ndim == 1 + tm.assert_index_equal(union_idx, expected) def test_is_monotonic_incomparable(self): index = Index([5, datetime.now(), 7]) - self.assertFalse(index.is_monotonic) - self.assertFalse(index.is_monotonic_decreasing) + assert not index.is_monotonic_increasing + assert not index.is_monotonic_decreasing + assert not index._is_strictly_monotonic_increasing + assert not index._is_strictly_monotonic_decreasing def test_get_set_value(self): values = np.random.randn(100) @@ -1330,7 +1349,7 @@ def test_get_set_value(self): assert_almost_equal(self.dateIndex.get_value(values, date), values[67]) self.dateIndex.set_value(values, date, 10) - self.assertEqual(values[67], 10) + assert values[67] == 10 def test_isin(self): values = ['foo', 'bar', 'quux'] @@ -1347,8 +1366,8 @@ def test_isin(self): # empty, return dtype bool idx = Index([]) result = idx.isin(values) - self.assertEqual(len(result), 0) - self.assertEqual(result.dtype, np.bool_) + assert len(result) == 0 + assert result.dtype == np.bool_ def test_isin_nan(self): tm.assert_numpy_array_equal(Index(['a', np.nan]).isin([np.nan]), @@ -1359,14 +1378,17 @@ def test_isin_nan(self): np.array([False, False])) tm.assert_numpy_array_equal(Index(['a', np.nan]).isin([pd.NaT]), np.array([False, False])) + # Float64Index overrides isin, so must be checked separately tm.assert_numpy_array_equal(Float64Index([1.0, np.nan]).isin([np.nan]), np.array([False, True])) tm.assert_numpy_array_equal( Float64Index([1.0, np.nan]).isin([float('nan')]), np.array([False, True])) + + # we cannot compare NaT with NaN tm.assert_numpy_array_equal(Float64Index([1.0, np.nan]).isin([pd.NaT]), - np.array([False, True])) + np.array([False, False])) def test_isin_level_kwarg(self): def check_idx(idx): @@ -1376,24 +1398,33 @@ def check_idx(idx): tm.assert_numpy_array_equal(expected, idx.isin(values, level=0)) tm.assert_numpy_array_equal(expected, idx.isin(values, level=-1)) - self.assertRaises(IndexError, idx.isin, values, level=1) - self.assertRaises(IndexError, idx.isin, values, level=10) - self.assertRaises(IndexError, idx.isin, values, level=-2) + pytest.raises(IndexError, idx.isin, values, level=1) + pytest.raises(IndexError, idx.isin, values, level=10) + pytest.raises(IndexError, idx.isin, values, level=-2) - self.assertRaises(KeyError, idx.isin, values, level=1.0) - self.assertRaises(KeyError, idx.isin, values, level='foobar') + pytest.raises(KeyError, idx.isin, values, level=1.0) + pytest.raises(KeyError, idx.isin, values, level='foobar') idx.name = 'foobar' tm.assert_numpy_array_equal(expected, idx.isin(values, level='foobar')) - self.assertRaises(KeyError, idx.isin, values, level='xyzzy') - self.assertRaises(KeyError, idx.isin, values, level=np.nan) + pytest.raises(KeyError, idx.isin, values, level='xyzzy') + pytest.raises(KeyError, idx.isin, values, level=np.nan) check_idx(Index(['qux', 'baz', 'foo', 'bar'])) # 
Float64Index overrides isin, so must be checked separately check_idx(Float64Index([1.0, 2.0, 3.0, 4.0])) + @pytest.mark.parametrize("empty", [[], Series(), np.array([])]) + def test_isin_empty(self, empty): + # see gh-16991 + idx = Index(["a", "b"]) + expected = np.array([False, False]) + + result = idx.isin(empty) + tm.assert_numpy_array_equal(expected, result) + def test_boolean_cmp(self): values = [1, 2, 3, 4] @@ -1405,11 +1436,11 @@ def test_boolean_cmp(self): def test_get_level_values(self): result = self.strIndex.get_level_values(0) - self.assert_index_equal(result, self.strIndex) + tm.assert_index_equal(result, self.strIndex) def test_slice_keep_name(self): idx = Index(['a', 'b'], name='asdf') - self.assertEqual(idx.name, idx[1:].name) + assert idx.name == idx[1:].name def test_join_self(self): # instance attributes of the form self.Index @@ -1420,7 +1451,7 @@ def test_join_self(self): for kind in kinds: joined = res.join(res, how=kind) - self.assertIs(res, joined) + assert res is joined def test_str_attribute(self): # GH9068 @@ -1436,8 +1467,8 @@ def test_str_attribute(self): MultiIndex.from_tuples([('foo', '1'), ('bar', '3')]), PeriodIndex(start='2000', end='2010', freq='A')] for idx in indices: - with self.assertRaisesRegexp(AttributeError, - 'only use .str accessor'): + with tm.assert_raises_regex(AttributeError, + 'only use .str accessor'): idx.str.repeat(2) idx = Index(['a b c', 'd e', 'f']) @@ -1453,7 +1484,7 @@ def test_str_attribute(self): idx = Index(['a1', 'a2', 'b1', 'b2']) expected = np.array([True, True, False, False]) tm.assert_numpy_array_equal(idx.str.startswith('a'), expected) - self.assertIsInstance(idx.str.startswith('a'), np.ndarray) + assert isinstance(idx.str.startswith('a'), np.ndarray) s = Series(range(4), index=idx) expected = Series(range(2), index=['a1', 'a2']) tm.assert_series_equal(s[s.index.str.startswith('a')], expected) @@ -1461,17 +1492,16 @@ def test_str_attribute(self): def test_tab_completion(self): # GH 9910 idx = Index(list('abcd')) - self.assertTrue('str' in dir(idx)) + assert 'str' in dir(idx) idx = Index(range(4)) - self.assertTrue('str' not in dir(idx)) + assert 'str' not in dir(idx) def test_indexing_doesnt_change_class(self): idx = Index([1, 2, 3, 'a', 'b', 'c']) - self.assertTrue(idx[1:3].identical(pd.Index([2, 3], dtype=np.object_))) - self.assertTrue(idx[[0, 1]].identical(pd.Index( - [1, 2], dtype=np.object_))) + assert idx[1:3].identical(pd.Index([2, 3], dtype=np.object_)) + assert idx[[0, 1]].identical(pd.Index([1, 2], dtype=np.object_)) def test_outer_join_sort(self): left_idx = Index(np.random.permutation(15)) @@ -1512,19 +1542,19 @@ def test_take_fill_value(self): msg = ('When allow_fill=True and fill_value is not None, ' 'all indices must be >= -1') - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -2]), fill_value=True) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -5]), fill_value=True) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): idx.take(np.array([1, -5])) def test_reshape_raise(self): msg = "reshaping is not supported" idx = pd.Index([0, 1, 2]) - tm.assertRaisesRegexp(NotImplementedError, msg, - idx.reshape, idx.shape) + tm.assert_raises_regex(NotImplementedError, msg, + idx.reshape, idx.shape) def test_reindex_preserves_name_if_target_is_list_or_ndarray(self): # GH6552 @@ -1533,28 +1563,28 @@ def 
test_reindex_preserves_name_if_target_is_list_or_ndarray(self): dt_idx = pd.date_range('20130101', periods=3) idx.name = None - self.assertEqual(idx.reindex([])[0].name, None) - self.assertEqual(idx.reindex(np.array([]))[0].name, None) - self.assertEqual(idx.reindex(idx.tolist())[0].name, None) - self.assertEqual(idx.reindex(idx.tolist()[:-1])[0].name, None) - self.assertEqual(idx.reindex(idx.values)[0].name, None) - self.assertEqual(idx.reindex(idx.values[:-1])[0].name, None) + assert idx.reindex([])[0].name is None + assert idx.reindex(np.array([]))[0].name is None + assert idx.reindex(idx.tolist())[0].name is None + assert idx.reindex(idx.tolist()[:-1])[0].name is None + assert idx.reindex(idx.values)[0].name is None + assert idx.reindex(idx.values[:-1])[0].name is None # Must preserve name even if dtype changes. - self.assertEqual(idx.reindex(dt_idx.values)[0].name, None) - self.assertEqual(idx.reindex(dt_idx.tolist())[0].name, None) + assert idx.reindex(dt_idx.values)[0].name is None + assert idx.reindex(dt_idx.tolist())[0].name is None idx.name = 'foobar' - self.assertEqual(idx.reindex([])[0].name, 'foobar') - self.assertEqual(idx.reindex(np.array([]))[0].name, 'foobar') - self.assertEqual(idx.reindex(idx.tolist())[0].name, 'foobar') - self.assertEqual(idx.reindex(idx.tolist()[:-1])[0].name, 'foobar') - self.assertEqual(idx.reindex(idx.values)[0].name, 'foobar') - self.assertEqual(idx.reindex(idx.values[:-1])[0].name, 'foobar') + assert idx.reindex([])[0].name == 'foobar' + assert idx.reindex(np.array([]))[0].name == 'foobar' + assert idx.reindex(idx.tolist())[0].name == 'foobar' + assert idx.reindex(idx.tolist()[:-1])[0].name == 'foobar' + assert idx.reindex(idx.values)[0].name == 'foobar' + assert idx.reindex(idx.values[:-1])[0].name == 'foobar' # Must preserve name even if dtype changes. 
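+        # (dt_idx below is datetime-like, so reindexing against it changes
+        # the result's dtype; the 'foobar' name still has to carry over)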
- self.assertEqual(idx.reindex(dt_idx.values)[0].name, 'foobar') - self.assertEqual(idx.reindex(dt_idx.tolist())[0].name, 'foobar') + assert idx.reindex(dt_idx.values)[0].name == 'foobar' + assert idx.reindex(dt_idx.tolist())[0].name == 'foobar' def test_reindex_preserves_type_if_target_is_empty_list_or_array(self): # GH7774 @@ -1563,10 +1593,9 @@ def test_reindex_preserves_type_if_target_is_empty_list_or_array(self): def get_reindex_type(target): return idx.reindex(target)[0].dtype.type - self.assertEqual(get_reindex_type([]), np.object_) - self.assertEqual(get_reindex_type(np.array([])), np.object_) - self.assertEqual(get_reindex_type(np.array([], dtype=np.int64)), - np.object_) + assert get_reindex_type([]) == np.object_ + assert get_reindex_type(np.array([])) == np.object_ + assert get_reindex_type(np.array([], dtype=np.int64)) == np.object_ def test_reindex_doesnt_preserve_type_if_target_is_empty_index(self): # GH7774 @@ -1575,14 +1604,14 @@ def test_reindex_doesnt_preserve_type_if_target_is_empty_index(self): def get_reindex_type(target): return idx.reindex(target)[0].dtype.type - self.assertEqual(get_reindex_type(pd.Int64Index([])), np.int64) - self.assertEqual(get_reindex_type(pd.Float64Index([])), np.float64) - self.assertEqual(get_reindex_type(pd.DatetimeIndex([])), np.datetime64) + assert get_reindex_type(pd.Int64Index([])) == np.int64 + assert get_reindex_type(pd.Float64Index([])) == np.float64 + assert get_reindex_type(pd.DatetimeIndex([])) == np.datetime64 reindexed = idx.reindex(pd.MultiIndex( [pd.Int64Index([]), pd.Float64Index([])], [[], []]))[0] - self.assertEqual(reindexed.levels[0].dtype.type, np.int64) - self.assertEqual(reindexed.levels[1].dtype.type, np.float64) + assert reindexed.levels[0].dtype.type == np.int64 + assert reindexed.levels[1].dtype.type == np.float64 def test_groupby(self): idx = Index(range(5)) @@ -1603,11 +1632,11 @@ def test_equals_op_multiindex(self): mi2 = MultiIndex.from_tuples([(1, 2), (4, 6)]) tm.assert_numpy_array_equal(df.index == mi2, np.array([True, False])) mi3 = MultiIndex.from_tuples([(1, 2), (4, 5), (8, 9)]) - with tm.assertRaisesRegexp(ValueError, "Lengths must match"): + with tm.assert_raises_regex(ValueError, "Lengths must match"): df.index == mi3 index_a = Index(['foo', 'bar', 'baz']) - with tm.assertRaisesRegexp(ValueError, "Lengths must match"): + with tm.assert_raises_regex(ValueError, "Lengths must match"): df.index == index_a tm.assert_numpy_array_equal(index_a == mi3, np.array([False, False, False])) @@ -1615,8 +1644,8 @@ def test_equals_op_multiindex(self): def test_conversion_preserves_name(self): # GH 10875 i = pd.Index(['01:02:03', '01:02:04'], name='label') - self.assertEqual(i.name, pd.to_datetime(i).name) - self.assertEqual(i.name, pd.to_timedelta(i).name) + assert i.name == pd.to_datetime(i).name + assert i.name == pd.to_timedelta(i).name def test_string_index_repr(self): # py3/py2 repr can differ because of "u" prefix @@ -1631,10 +1660,10 @@ def test_string_index_repr(self): idx = pd.Index(['a', 'bb', 'ccc']) if PY3: expected = u"""Index(['a', 'bb', 'ccc'], dtype='object')""" - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = u"""Index([u'a', u'bb', u'ccc'], dtype='object')""" - self.assertEqual(coerce(idx), expected) + assert coerce(idx) == expected # multiple lines idx = pd.Index(['a', 'bb', 'ccc'] * 10) @@ -1645,7 +1674,7 @@ def test_string_index_repr(self): 'a', 'bb', 'ccc', 'a', 'bb', 'ccc'], dtype='object')""" - self.assertEqual(repr(idx), expected) + assert repr(idx) == 
expected else: expected = u"""\ Index([u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', @@ -1653,7 +1682,7 @@ def test_string_index_repr(self): u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc'], dtype='object')""" - self.assertEqual(coerce(idx), expected) + assert coerce(idx) == expected # truncated idx = pd.Index(['a', 'bb', 'ccc'] * 100) @@ -1664,7 +1693,7 @@ def test_string_index_repr(self): 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc'], dtype='object', length=300)""" - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = u"""\ Index([u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', @@ -1672,16 +1701,16 @@ def test_string_index_repr(self): u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc'], dtype='object', length=300)""" - self.assertEqual(coerce(idx), expected) + assert coerce(idx) == expected # short idx = pd.Index([u'あ', u'いい', u'ううう']) if PY3: expected = u"""Index(['あ', 'いい', 'ううう'], dtype='object')""" - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = u"""Index([u'あ', u'いい', u'ううう'], dtype='object')""" - self.assertEqual(coerce(idx), expected) + assert coerce(idx) == expected # multiple lines idx = pd.Index([u'あ', u'いい', u'ううう'] * 10) @@ -1693,7 +1722,7 @@ def test_string_index_repr(self): u" 'あ', 'いい', 'ううう', 'あ', 'いい', " u"'ううう'],\n" u" dtype='object')") - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = (u"Index([u'あ', u'いい', u'ううう', u'あ', u'いい', " u"u'ううう', u'あ', u'いい', u'ううう', u'あ',\n" @@ -1702,7 +1731,7 @@ def test_string_index_repr(self): u" u'ううう', u'あ', u'いい', u'ううう', u'あ', " u"u'いい', u'ううう', u'あ', u'いい', u'ううう'],\n" u" dtype='object')") - self.assertEqual(coerce(idx), expected) + assert coerce(idx) == expected # truncated idx = pd.Index([u'あ', u'いい', u'ううう'] * 100) @@ -1713,7 +1742,7 @@ def test_string_index_repr(self): u" 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', " u"'ううう', 'あ', 'いい', 'ううう'],\n" u" dtype='object', length=300)") - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = (u"Index([u'あ', u'いい', u'ううう', u'あ', u'いい', " u"u'ううう', u'あ', u'いい', u'ううう', u'あ',\n" @@ -1722,7 +1751,7 @@ def test_string_index_repr(self): u"u'いい', u'ううう', u'あ', u'いい', u'ううう'],\n" u" dtype='object', length=300)") - self.assertEqual(coerce(idx), expected) + assert coerce(idx) == expected # Emable Unicode option ----------------------------------------- with cf.option_context('display.unicode.east_asian_width', True): @@ -1732,11 +1761,11 @@ def test_string_index_repr(self): if PY3: expected = (u"Index(['あ', 'いい', 'ううう'], " u"dtype='object')") - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = (u"Index([u'あ', u'いい', u'ううう'], " u"dtype='object')") - self.assertEqual(coerce(idx), expected) + assert coerce(idx) == expected # multiple lines idx = pd.Index([u'あ', u'いい', u'ううう'] * 10) @@ -1750,7 +1779,7 @@ def test_string_index_repr(self): u" 'あ', 'いい', 'ううう'],\n" u" dtype='object')""") - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = (u"Index([u'あ', u'いい', u'ううう', u'あ', u'いい', " u"u'ううう', u'あ', u'いい',\n" @@ -1762,7 +1791,7 @@ def test_string_index_repr(self): u"u'あ', u'いい', u'ううう'],\n" u" dtype='object')") - self.assertEqual(coerce(idx), expected) + assert coerce(idx) == expected # truncated idx = pd.Index([u'あ', u'いい', u'ううう'] * 100) @@ -1776,7 +1805,7 @@ def test_string_index_repr(self): u" 
'ううう'],\n" u" dtype='object', length=300)") - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = (u"Index([u'あ', u'いい', u'ううう', u'あ', u'いい', " u"u'ううう', u'あ', u'いい',\n" @@ -1787,46 +1816,49 @@ def test_string_index_repr(self): u" u'いい', u'ううう'],\n" u" dtype='object', length=300)") - self.assertEqual(coerce(idx), expected) + assert coerce(idx) == expected + + @pytest.mark.parametrize('dtype', [np.int64, np.float64]) + @pytest.mark.parametrize('delta', [1, 0, -1]) + def test_addsub_arithmetic(self, dtype, delta): + # GH 8142 + delta = dtype(delta) + idx = pd.Index([10, 11, 12], dtype=dtype) + result = idx + delta + expected = pd.Index(idx.values + delta, dtype=dtype) + tm.assert_index_equal(result, expected) + + # this subtraction used to fail + result = idx - delta + expected = pd.Index(idx.values - delta, dtype=dtype) + tm.assert_index_equal(result, expected) + + tm.assert_index_equal(idx + idx, 2 * idx) + tm.assert_index_equal(idx - idx, 0 * idx) + assert not (idx - idx).empty -class TestMixedIntIndex(Base, tm.TestCase): +class TestMixedIntIndex(Base): # Mostly the tests from common.py for which the results differ # in py2 and py3 because ints and strings are uncomparable in py3 # (GH 13514) _holder = Index - _multiprocess_can_split_ = True - def setUp(self): + def setup_method(self, method): self.indices = dict(mixedIndex=Index([0, 'a', 1, 'b', 2, 'c'])) self.setup_indices() def create_index(self): return self.mixedIndex - def test_order(self): - idx = self.create_index() - # 9816 deprecated - if PY36: - with tm.assertRaisesRegexp(TypeError, "'>' not supported"): - with tm.assert_produces_warning(FutureWarning): - idx.order() - elif PY3: - with tm.assertRaisesRegexp(TypeError, "unorderable types"): - with tm.assert_produces_warning(FutureWarning): - idx.order() - else: - with tm.assert_produces_warning(FutureWarning): - idx.order() - def test_argsort(self): idx = self.create_index() if PY36: - with tm.assertRaisesRegexp(TypeError, "'>' not supported"): + with tm.assert_raises_regex(TypeError, "'>|<' not supported"): result = idx.argsort() elif PY3: - with tm.assertRaisesRegexp(TypeError, "unorderable types"): + with tm.assert_raises_regex(TypeError, "unorderable types"): result = idx.argsort() else: result = idx.argsort() @@ -1836,10 +1868,10 @@ def test_argsort(self): def test_numpy_argsort(self): idx = self.create_index() if PY36: - with tm.assertRaisesRegexp(TypeError, "'>' not supported"): + with tm.assert_raises_regex(TypeError, "'>|<' not supported"): result = np.argsort(idx) elif PY3: - with tm.assertRaisesRegexp(TypeError, "unorderable types"): + with tm.assert_raises_regex(TypeError, "unorderable types"): result = np.argsort(idx) else: result = np.argsort(idx) @@ -1855,22 +1887,22 @@ def test_copy_name(self): second = first.__class__(first, copy=False) # Even though "copy=False", we want a new object. 
- self.assertIsNot(first, second) + assert first is not second # Not using tm.assert_index_equal() since names differ: - self.assertTrue(idx.equals(first)) + assert idx.equals(first) - self.assertEqual(first.name, 'mario') - self.assertEqual(second.name, 'mario') + assert first.name == 'mario' + assert second.name == 'mario' s1 = Series(2, index=first) s2 = Series(3, index=second[:-1]) - if PY3: - with tm.assert_produces_warning(RuntimeWarning): - # unorderable types - s3 = s1 * s2 - else: + + warning_type = RuntimeWarning if PY3 else None + with tm.assert_produces_warning(warning_type): + # Python 3: Unorderable types s3 = s1 * s2 - self.assertEqual(s3.index.name, 'mario') + + assert s3.index.name == 'mario' def test_copy_name2(self): # Check that adding a "name" parameter to the copy is honored @@ -1878,23 +1910,23 @@ def test_copy_name2(self): idx = pd.Index([1, 2], name='MyName') idx1 = idx.copy() - self.assertTrue(idx.equals(idx1)) - self.assertEqual(idx.name, 'MyName') - self.assertEqual(idx1.name, 'MyName') + assert idx.equals(idx1) + assert idx.name == 'MyName' + assert idx1.name == 'MyName' idx2 = idx.copy(name='NewName') - self.assertTrue(idx.equals(idx2)) - self.assertEqual(idx.name, 'MyName') - self.assertEqual(idx2.name, 'NewName') + assert idx.equals(idx2) + assert idx.name == 'MyName' + assert idx2.name == 'NewName' idx3 = idx.copy(names=['NewName']) - self.assertTrue(idx.equals(idx3)) - self.assertEqual(idx.name, 'MyName') - self.assertEqual(idx.names, ['MyName']) - self.assertEqual(idx3.name, 'NewName') - self.assertEqual(idx3.names, ['NewName']) + assert idx.equals(idx3) + assert idx.name == 'MyName' + assert idx.names == ['MyName'] + assert idx3.name == 'NewName' + assert idx3.names == ['NewName'] def test_union_base(self): idx = self.create_index() @@ -1906,11 +1938,11 @@ def test_union_base(self): # unorderable types result = first.union(second) expected = Index(['b', 2, 'c', 0, 'a', 1]) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) else: result = first.union(second) expected = Index(['b', 2, 'c', 0, 'a', 1]) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) # GH 10149 cases = [klass(second.values) @@ -1920,10 +1952,10 @@ def test_union_base(self): with tm.assert_produces_warning(RuntimeWarning): # unorderable types result = first.union(case) - self.assertTrue(tm.equalContents(result, idx)) + assert tm.equalContents(result, idx) else: result = first.union(case) - self.assertTrue(tm.equalContents(result, idx)) + assert tm.equalContents(result, idx) def test_intersection_base(self): # (same results for py2 and py3 but sortedness not tested elsewhere) @@ -1932,14 +1964,14 @@ def test_intersection_base(self): second = idx[:3] result = first.intersection(second) expected = Index([0, 'a', 1]) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) # GH 10149 cases = [klass(second.values) for klass in [np.array, Series, list]] for case in cases: result = first.intersection(case) - self.assertTrue(tm.equalContents(result, second)) + assert tm.equalContents(result, second) def test_difference_base(self): # (same results for py2 and py3 but sortedness not tested elsewhere) @@ -1949,7 +1981,7 @@ def test_difference_base(self): result = first.difference(second) expected = Index([0, 1, 'a']) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) def test_symmetric_difference(self): # (same results for py2 and py3 but sortedness not tested 
elsewhere) @@ -1959,12 +1991,12 @@ def test_symmetric_difference(self): result = first.symmetric_difference(second) expected = Index([0, 1, 2, 'a', 'c']) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) def test_logical_compat(self): idx = self.create_index() - self.assertEqual(idx.all(), idx.values.all()) - self.assertEqual(idx.any(), idx.values.any()) + assert idx.all() == idx.values.all() + assert idx.any() == idx.values.any() def test_dropna(self): # GH 6194 @@ -1994,7 +2026,7 @@ def test_dropna(self): idx = pd.TimedeltaIndex(['1 days', '2 days', '3 days']) tm.assert_index_equal(idx.dropna(), idx) nanidx = pd.TimedeltaIndex([pd.NaT, '1 days', '2 days', - '3 days', pd.NaT]) + '3 days', pd.NaT]) tm.assert_index_equal(nanidx.dropna(), idx) idx = pd.PeriodIndex(['2012-02', '2012-04', '2012-05'], freq='M') @@ -2004,7 +2036,7 @@ def test_dropna(self): tm.assert_index_equal(nanidx.dropna(), idx) msg = "invalid how option: xxx" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): pd.Index([1, 2, 3]).dropna(how='xxx') def test_get_combined_index(self): @@ -2033,14 +2065,16 @@ def test_is_monotonic_na(self): pd.to_datetime(['2000-01-01', 'NaT', '2000-01-02']), pd.to_timedelta(['1 day', 'NaT']), ] for index in examples: - self.assertFalse(index.is_monotonic_increasing) - self.assertFalse(index.is_monotonic_decreasing) + assert not index.is_monotonic_increasing + assert not index.is_monotonic_decreasing + assert not index._is_strictly_monotonic_increasing + assert not index._is_strictly_monotonic_decreasing def test_repr_summary(self): with cf.option_context('display.max_seq_items', 10): r = repr(pd.Index(np.arange(1000))) - self.assertTrue(len(r) < 200) - self.assertTrue("..." in r) + assert len(r) < 200 + assert "..." 
in r def test_int_name_format(self): index = Index(['a', 'b', 'c'], name=0) @@ -2077,9 +2111,20 @@ def test_intersect_str_dates(self): i2 = Index(['aa'], dtype=object) res = i2.intersection(i1) - self.assertEqual(len(res), 0) + assert len(res) == 0 -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) +class TestIndexUtils(object): + + @pytest.mark.parametrize('data, names, expected', [ + ([[1, 2, 3]], None, Index([1, 2, 3])), + ([[1, 2, 3]], ['name'], Index([1, 2, 3], name='name')), + ([['a', 'a'], ['c', 'd']], None, + MultiIndex([['a'], ['c', 'd']], [[0, 0], [0, 1]])), + ([['a', 'a'], ['c', 'd']], ['L1', 'L2'], + MultiIndex([['a'], ['c', 'd']], [[0, 0], [0, 1]], + names=['L1', 'L2'])), + ]) + def test_ensure_index_from_sequences(self, data, names, expected): + result = _ensure_index_from_sequences(data, names) + tm.assert_index_equal(result, expected) diff --git a/pandas/tests/indexes/test_category.py b/pandas/tests/indexes/test_category.py index 708f424d9bad1..05d31af57b36c 100644 --- a/pandas/tests/indexes/test_category.py +++ b/pandas/tests/indexes/test_category.py @@ -1,17 +1,16 @@ # -*- coding: utf-8 -*- -# TODO(wesm): fix long line flake8 issues -# flake8: noqa +import pytest import pandas.util.testing as tm -from pandas.indexes.api import Index, CategoricalIndex +from pandas.core.indexes.api import Index, CategoricalIndex from .common import Base from pandas.compat import range, PY3 import numpy as np -from pandas import Categorical, compat, notnull +from pandas import Categorical, IntervalIndex, compat, notna from pandas.util.testing import assert_almost_equal import pandas.core.config as cf import pandas as pd @@ -20,10 +19,10 @@ unicode = lambda x: x -class TestCategoricalIndex(Base, tm.TestCase): +class TestCategoricalIndex(Base): _holder = CategoricalIndex - def setUp(self): + def setup_method(self, method): self.indices = dict(catIndex=tm.makeCategoricalIndex(100)) self.setup_indices() @@ -40,62 +39,66 @@ def test_construction(self): result = Index(ci) tm.assert_index_equal(result, ci, exact=True) - self.assertFalse(result.ordered) + assert not result.ordered result = Index(ci.values) tm.assert_index_equal(result, ci, exact=True) - self.assertFalse(result.ordered) + assert not result.ordered # empty result = CategoricalIndex(categories=categories) - self.assert_index_equal(result.categories, Index(categories)) + tm.assert_index_equal(result.categories, Index(categories)) tm.assert_numpy_array_equal(result.codes, np.array([], dtype='int8')) - self.assertFalse(result.ordered) + assert not result.ordered # passing categories result = CategoricalIndex(list('aabbca'), categories=categories) - self.assert_index_equal(result.categories, Index(categories)) + tm.assert_index_equal(result.categories, Index(categories)) tm.assert_numpy_array_equal(result.codes, - np.array([0, 0, 1, 1, 2, 0], dtype='int8')) + np.array([0, 0, 1, + 1, 2, 0], dtype='int8')) c = pd.Categorical(list('aabbca')) result = CategoricalIndex(c) - self.assert_index_equal(result.categories, Index(list('abc'))) + tm.assert_index_equal(result.categories, Index(list('abc'))) tm.assert_numpy_array_equal(result.codes, - np.array([0, 0, 1, 1, 2, 0], dtype='int8')) - self.assertFalse(result.ordered) + np.array([0, 0, 1, + 1, 2, 0], dtype='int8')) + assert not result.ordered result = CategoricalIndex(c, categories=categories) - self.assert_index_equal(result.categories, Index(categories)) + tm.assert_index_equal(result.categories, Index(categories)) 
tm.assert_numpy_array_equal(result.codes, - np.array([0, 0, 1, 1, 2, 0], dtype='int8')) - self.assertFalse(result.ordered) + np.array([0, 0, 1, + 1, 2, 0], dtype='int8')) + assert not result.ordered ci = CategoricalIndex(c, categories=list('abcd')) result = CategoricalIndex(ci) - self.assert_index_equal(result.categories, Index(categories)) + tm.assert_index_equal(result.categories, Index(categories)) tm.assert_numpy_array_equal(result.codes, - np.array([0, 0, 1, 1, 2, 0], dtype='int8')) - self.assertFalse(result.ordered) + np.array([0, 0, 1, + 1, 2, 0], dtype='int8')) + assert not result.ordered result = CategoricalIndex(ci, categories=list('ab')) - self.assert_index_equal(result.categories, Index(list('ab'))) + tm.assert_index_equal(result.categories, Index(list('ab'))) tm.assert_numpy_array_equal(result.codes, - np.array([0, 0, 1, 1, -1, 0], - dtype='int8')) - self.assertFalse(result.ordered) + np.array([0, 0, 1, + 1, -1, 0], dtype='int8')) + assert not result.ordered result = CategoricalIndex(ci, categories=list('ab'), ordered=True) - self.assert_index_equal(result.categories, Index(list('ab'))) + tm.assert_index_equal(result.categories, Index(list('ab'))) tm.assert_numpy_array_equal(result.codes, - np.array([0, 0, 1, 1, -1, 0], - dtype='int8')) - self.assertTrue(result.ordered) + np.array([0, 0, 1, + 1, -1, 0], dtype='int8')) + assert result.ordered # turn me to an Index result = Index(np.array(ci)) - self.assertIsInstance(result, Index) - self.assertNotIsInstance(result, CategoricalIndex) + assert isinstance(result, Index) + assert not isinstance(result, CategoricalIndex) def test_construction_with_dtype(self): @@ -128,12 +131,12 @@ def test_disallow_set_ops(self): # set ops (+/-) raise TypeError idx = pd.Index(pd.Categorical(['a', 'b'])) - self.assertRaises(TypeError, lambda: idx - idx) - self.assertRaises(TypeError, lambda: idx + idx) - self.assertRaises(TypeError, lambda: idx - ['a', 'b']) - self.assertRaises(TypeError, lambda: idx + ['a', 'b']) - self.assertRaises(TypeError, lambda: ['a', 'b'] - idx) - self.assertRaises(TypeError, lambda: ['a', 'b'] + idx) + pytest.raises(TypeError, lambda: idx - idx) + pytest.raises(TypeError, lambda: idx + idx) + pytest.raises(TypeError, lambda: idx - ['a', 'b']) + pytest.raises(TypeError, lambda: idx + ['a', 'b']) + pytest.raises(TypeError, lambda: ['a', 'b'] - idx) + pytest.raises(TypeError, lambda: ['a', 'b'] + idx) def test_method_delegation(self): @@ -167,41 +170,36 @@ def test_method_delegation(self): list('aabbca'), categories=list('cabdef'), ordered=True)) # invalid - self.assertRaises(ValueError, lambda: ci.set_categories( + pytest.raises(ValueError, lambda: ci.set_categories( list('cab'), inplace=True)) def test_contains(self): ci = self.create_index(categories=list('cabdef')) - self.assertTrue('a' in ci) - self.assertTrue('z' not in ci) - self.assertTrue('e' not in ci) - self.assertTrue(np.nan not in ci) + assert 'a' in ci + assert 'z' not in ci + assert 'e' not in ci + assert np.nan not in ci # assert codes NOT in index - self.assertFalse(0 in ci) - self.assertFalse(1 in ci) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - ci = CategoricalIndex( - list('aabbca'), categories=list('cabdef') + [np.nan]) - self.assertFalse(np.nan in ci) + assert 0 not in ci + assert 1 not in ci ci = CategoricalIndex( list('aabbca') + [np.nan], categories=list('cabdef')) - self.assertTrue(np.nan in ci) + assert np.nan in ci def test_min_max(self): ci = self.create_index(ordered=False) - self.assertRaises(TypeError, lambda: 
ci.min()) - self.assertRaises(TypeError, lambda: ci.max()) + pytest.raises(TypeError, lambda: ci.min()) + pytest.raises(TypeError, lambda: ci.max()) ci = self.create_index(ordered=True) - self.assertEqual(ci.min(), 'c') - self.assertEqual(ci.max(), 'b') + assert ci.min() == 'c' + assert ci.max() == 'b' def test_map(self): ci = pd.CategoricalIndex(list('ABABC'), categories=list('CBA'), @@ -220,32 +218,45 @@ def test_map(self): # GH 12766: Return an index not an array tm.assert_index_equal(ci.map(lambda x: 1), - Index(np.array([1] * 5, dtype=np.int64), name='XXX')) + Index(np.array([1] * 5, dtype=np.int64), + name='XXX')) # change categories dtype ci = pd.CategoricalIndex(list('ABABC'), categories=list('BAC'), ordered=False) + def f(x): return {'A': 10, 'B': 20, 'C': 30}.get(x) result = ci.map(f) - exp = pd.CategoricalIndex([10, 20, 10, 20, 30], categories=[20, 10, 30], + exp = pd.CategoricalIndex([10, 20, 10, 20, 30], + categories=[20, 10, 30], ordered=False) tm.assert_index_equal(result, exp) def test_where(self): i = self.create_index() - result = i.where(notnull(i)) + result = i.where(notna(i)) expected = i tm.assert_index_equal(result, expected) - i2 = i.copy() i2 = pd.CategoricalIndex([np.nan, np.nan] + i[2:].tolist(), categories=i.categories) - result = i.where(notnull(i2)) + result = i.where(notna(i2)) expected = i2 tm.assert_index_equal(result, expected) + def test_where_array_like(self): + i = self.create_index() + cond = [False] + [True] * (len(i) - 1) + klasses = [list, tuple, np.array, pd.Series] + expected = pd.CategoricalIndex([np.nan] + i[1:].tolist(), + categories=i.categories) + + for klass in klasses: + result = i.where(klass(cond)) + tm.assert_index_equal(result, expected) + def test_append(self): ci = self.create_index() @@ -264,10 +275,10 @@ def test_append(self): tm.assert_index_equal(result, ci, exact=True) # appending with different categories or reordered is not ok - self.assertRaises( + pytest.raises( TypeError, lambda: ci.append(ci.values.set_categories(list('abcd')))) - self.assertRaises( + pytest.raises( TypeError, lambda: ci.append(ci.values.reorder_categories(list('abc')))) @@ -277,7 +288,7 @@ def test_append(self): tm.assert_index_equal(result, expected, exact=True) # invalid objects - self.assertRaises(TypeError, lambda: ci.append(Index(['a', 'd']))) + pytest.raises(TypeError, lambda: ci.append(Index(['a', 'd']))) # GH14298 - if base object is not categorical -> coerce to object result = Index(['c', 'a']).append(ci) @@ -305,7 +316,7 @@ def test_insert(self): tm.assert_index_equal(result, expected, exact=True) # invalid - self.assertRaises(TypeError, lambda: ci.insert(0, 'd')) + pytest.raises(TypeError, lambda: ci.insert(0, 'd')) def test_delete(self): @@ -320,9 +331,9 @@ def test_delete(self): expected = CategoricalIndex(list('aabbc'), categories=categories) tm.assert_index_equal(result, expected, exact=True) - with tm.assertRaises((IndexError, ValueError)): - # either depeidnig on numpy version - result = ci.delete(10) + with pytest.raises((IndexError, ValueError)): + # Either depending on NumPy version + ci.delete(10) def test_astype(self): @@ -331,26 +342,41 @@ def test_astype(self): tm.assert_index_equal(result, ci, exact=True) result = ci.astype(object) - self.assert_index_equal(result, Index(np.array(ci))) + tm.assert_index_equal(result, Index(np.array(ci))) # this IS equal, but not the same class - self.assertTrue(result.equals(ci)) - self.assertIsInstance(result, Index) - self.assertNotIsInstance(result, CategoricalIndex) + assert result.equals(ci) + 
assert isinstance(result, Index) + assert not isinstance(result, CategoricalIndex) - def test_reindex_base(self): + # interval + ii = IntervalIndex.from_arrays(left=[-0.001, 2.0], + right=[2, 4], + closed='right') + + ci = CategoricalIndex(Categorical.from_codes( + [0, 1, -1], categories=ii, ordered=True)) - # determined by cat ordering - idx = self.create_index() - expected = np.array([4, 0, 1, 5, 2, 3], dtype=np.intp) + result = ci.astype('interval') + expected = ii.take([0, 1, -1]) + tm.assert_index_equal(result, expected) + + result = IntervalIndex.from_intervals(result.values) + tm.assert_index_equal(result, expected) + + def test_reindex_base(self): + # Determined by cat ordering. + idx = CategoricalIndex(list("cab"), categories=list("cab")) + expected = np.arange(len(idx), dtype=np.intp) actual = idx.get_indexer(idx) tm.assert_numpy_array_equal(expected, actual) - with tm.assertRaisesRegexp(ValueError, 'Invalid fill method'): - idx.get_indexer(idx, method='invalid') + with tm.assert_raises_regex(ValueError, "Invalid fill method"): + idx.get_indexer(idx, method="invalid") def test_reindexing(self): + np.random.seed(123456789) ci = self.create_index() oidx = Index(np.array(ci)) @@ -360,14 +386,26 @@ def test_reindexing(self): expected = oidx.get_indexer_non_unique(finder)[0] actual = ci.get_indexer(finder) - tm.assert_numpy_array_equal(expected.values, actual, check_dtype=False) + tm.assert_numpy_array_equal(expected, actual) + + # see gh-17323 + # + # Even when indexer is equal to the + # members in the index, we should + # respect duplicates instead of taking + # the fast-track path. + for finder in [list("aabbca"), list("aababca")]: + expected = oidx.get_indexer_non_unique(finder)[0] + + actual = ci.get_indexer(finder) + tm.assert_numpy_array_equal(expected, actual) def test_reindex_dtype(self): c = CategoricalIndex(['a', 'b', 'c', 'a']) res, indexer = c.reindex(['a', 'c']) tm.assert_index_equal(res, Index(['a', 'a', 'c']), exact=True) tm.assert_numpy_array_equal(indexer, - np.array([0, 3, 2], dtype=np.int64)) + np.array([0, 3, 2], dtype=np.intp)) c = CategoricalIndex(['a', 'b', 'c', 'a']) res, indexer = c.reindex(Categorical(['a', 'c'])) @@ -375,7 +413,7 @@ def test_reindex_dtype(self): exp = CategoricalIndex(['a', 'a', 'c'], categories=['a', 'c']) tm.assert_index_equal(res, exp, exact=True) tm.assert_numpy_array_equal(indexer, - np.array([0, 3, 2], dtype=np.int64)) + np.array([0, 3, 2], dtype=np.intp)) c = CategoricalIndex(['a', 'b', 'c', 'a'], categories=['a', 'b', 'c', 'd']) @@ -383,7 +421,7 @@ def test_reindex_dtype(self): exp = Index(['a', 'a', 'c'], dtype='object') tm.assert_index_equal(res, exp, exact=True) tm.assert_numpy_array_equal(indexer, - np.array([0, 3, 2], dtype=np.int64)) + np.array([0, 3, 2], dtype=np.intp)) c = CategoricalIndex(['a', 'b', 'c', 'a'], categories=['a', 'b', 'c', 'd']) @@ -391,17 +429,57 @@ def test_reindex_dtype(self): exp = CategoricalIndex(['a', 'a', 'c'], categories=['a', 'c']) tm.assert_index_equal(res, exp, exact=True) tm.assert_numpy_array_equal(indexer, - np.array([0, 3, 2], dtype=np.int64)) + np.array([0, 3, 2], dtype=np.intp)) + + def test_reindex_empty_index(self): + # See GH16770 + c = CategoricalIndex([]) + res, indexer = c.reindex(['a', 'b']) + tm.assert_index_equal(res, Index(['a', 'b']), exact=True) + tm.assert_numpy_array_equal(indexer, + np.array([-1, -1], dtype=np.intp)) + + def test_is_monotonic(self): + c = CategoricalIndex([1, 2, 3]) + assert c.is_monotonic_increasing + assert not c.is_monotonic_decreasing + + c = 
CategoricalIndex([1, 2, 3], ordered=True) + assert c.is_monotonic_increasing + assert not c.is_monotonic_decreasing + + c = CategoricalIndex([1, 2, 3], categories=[3, 2, 1]) + assert not c.is_monotonic_increasing + assert c.is_monotonic_decreasing + + c = CategoricalIndex([1, 3, 2], categories=[3, 2, 1]) + assert not c.is_monotonic_increasing + assert not c.is_monotonic_decreasing + + c = CategoricalIndex([1, 2, 3], categories=[3, 2, 1], ordered=True) + assert not c.is_monotonic_increasing + assert c.is_monotonic_decreasing + + # non lexsorted categories + categories = [9, 0, 1, 2, 3] + + c = CategoricalIndex([9, 0], categories=categories) + assert c.is_monotonic_increasing + assert not c.is_monotonic_decreasing + + c = CategoricalIndex([0, 1], categories=categories) + assert c.is_monotonic_increasing + assert not c.is_monotonic_decreasing def test_duplicates(self): idx = CategoricalIndex([0, 0, 0], name='foo') - self.assertFalse(idx.is_unique) - self.assertTrue(idx.has_duplicates) + assert not idx.is_unique + assert idx.has_duplicates expected = CategoricalIndex([0], name='foo') - self.assert_index_equal(idx.drop_duplicates(), expected) - self.assert_index_equal(idx.unique(), expected) + tm.assert_index_equal(idx.drop_duplicates(), expected) + tm.assert_index_equal(idx.unique(), expected) def test_get_indexer(self): @@ -412,22 +490,22 @@ def test_get_indexer(self): r1 = idx1.get_indexer(idx2) assert_almost_equal(r1, np.array([0, 1, 2, -1], dtype=np.intp)) - self.assertRaises(NotImplementedError, - lambda: idx2.get_indexer(idx1, method='pad')) - self.assertRaises(NotImplementedError, - lambda: idx2.get_indexer(idx1, method='backfill')) - self.assertRaises(NotImplementedError, - lambda: idx2.get_indexer(idx1, method='nearest')) + pytest.raises(NotImplementedError, + lambda: idx2.get_indexer(idx1, method='pad')) + pytest.raises(NotImplementedError, + lambda: idx2.get_indexer(idx1, method='backfill')) + pytest.raises(NotImplementedError, + lambda: idx2.get_indexer(idx1, method='nearest')) def test_get_loc(self): # GH 12531 cidx1 = CategoricalIndex(list('abcde'), categories=list('edabc')) idx1 = Index(list('abcde')) - self.assertEqual(cidx1.get_loc('a'), idx1.get_loc('a')) - self.assertEqual(cidx1.get_loc('e'), idx1.get_loc('e')) + assert cidx1.get_loc('a') == idx1.get_loc('a') + assert cidx1.get_loc('e') == idx1.get_loc('e') for i in [cidx1, idx1]: - with tm.assertRaises(KeyError): + with pytest.raises(KeyError): i.get_loc('NOT-EXIST') # non-unique @@ -436,16 +514,16 @@ def test_get_loc(self): # results in bool array res = cidx2.get_loc('d') - self.assert_numpy_array_equal(res, idx2.get_loc('d')) - self.assert_numpy_array_equal(res, np.array([False, False, False, - True, False, True])) + tm.assert_numpy_array_equal(res, idx2.get_loc('d')) + tm.assert_numpy_array_equal(res, np.array([False, False, False, + True, False, True])) # unique element results in scalar res = cidx2.get_loc('e') - self.assertEqual(res, idx2.get_loc('e')) - self.assertEqual(res, 4) + assert res == idx2.get_loc('e') + assert res == 4 for i in [cidx2, idx2]: - with tm.assertRaises(KeyError): + with pytest.raises(KeyError): i.get_loc('NOT-EXIST') # non-unique, slicable @@ -454,15 +532,15 @@ def test_get_loc(self): # results in slice res = cidx3.get_loc('a') - self.assertEqual(res, idx3.get_loc('a')) - self.assertEqual(res, slice(0, 2, None)) + assert res == idx3.get_loc('a') + assert res == slice(0, 2, None) res = cidx3.get_loc('b') - self.assertEqual(res, idx3.get_loc('b')) - self.assertEqual(res, slice(2, 5, None)) + 
assert res == idx3.get_loc('b') + assert res == slice(2, 5, None) for i in [cidx3, idx3]: - with tm.assertRaises(KeyError): + with pytest.raises(KeyError): i.get_loc('c') def test_repr_roundtrip(self): @@ -510,92 +588,85 @@ def test_identical(self): ci1 = CategoricalIndex(['a', 'b'], categories=['a', 'b'], ordered=True) ci2 = CategoricalIndex(['a', 'b'], categories=['a', 'b', 'c'], ordered=True) - self.assertTrue(ci1.identical(ci1)) - self.assertTrue(ci1.identical(ci1.copy())) - self.assertFalse(ci1.identical(ci2)) + assert ci1.identical(ci1) + assert ci1.identical(ci1.copy()) + assert not ci1.identical(ci2) def test_ensure_copied_data(self): - # Check the "copy" argument of each Index.__new__ is honoured - # GH12309 + # gh-12309: Check the "copy" argument of each + # Index.__new__ is honored. + # # Must be tested separately from other indexes because - # self.value is not an ndarray - _base = lambda ar : ar if ar.base is None else ar.base + # self.value is not an ndarray. + _base = lambda ar: ar if ar.base is None else ar.base + for index in self.indices.values(): result = CategoricalIndex(index.values, copy=True) tm.assert_index_equal(index, result) - self.assertIsNot(_base(index.values), _base(result.values)) + assert _base(index.values) is not _base(result.values) result = CategoricalIndex(index.values, copy=False) - self.assertIs(_base(index.values), _base(result.values)) + assert _base(index.values) is _base(result.values) def test_equals_categorical(self): - ci1 = CategoricalIndex(['a', 'b'], categories=['a', 'b'], ordered=True) ci2 = CategoricalIndex(['a', 'b'], categories=['a', 'b', 'c'], ordered=True) - self.assertTrue(ci1.equals(ci1)) - self.assertFalse(ci1.equals(ci2)) - self.assertTrue(ci1.equals(ci1.astype(object))) - self.assertTrue(ci1.astype(object).equals(ci1)) + assert ci1.equals(ci1) + assert not ci1.equals(ci2) + assert ci1.equals(ci1.astype(object)) + assert ci1.astype(object).equals(ci1) - self.assertTrue((ci1 == ci1).all()) - self.assertFalse((ci1 != ci1).all()) - self.assertFalse((ci1 > ci1).all()) - self.assertFalse((ci1 < ci1).all()) - self.assertTrue((ci1 <= ci1).all()) - self.assertTrue((ci1 >= ci1).all()) + assert (ci1 == ci1).all() + assert not (ci1 != ci1).all() + assert not (ci1 > ci1).all() + assert not (ci1 < ci1).all() + assert (ci1 <= ci1).all() + assert (ci1 >= ci1).all() - self.assertFalse((ci1 == 1).all()) - self.assertTrue((ci1 == Index(['a', 'b'])).all()) - self.assertTrue((ci1 == ci1.values).all()) + assert not (ci1 == 1).all() + assert (ci1 == Index(['a', 'b'])).all() + assert (ci1 == ci1.values).all() # invalid comparisons - with tm.assertRaisesRegexp(ValueError, "Lengths must match"): + with tm.assert_raises_regex(ValueError, "Lengths must match"): ci1 == Index(['a', 'b', 'c']) - self.assertRaises(TypeError, lambda: ci1 == ci2) - self.assertRaises( + pytest.raises(TypeError, lambda: ci1 == ci2) + pytest.raises( TypeError, lambda: ci1 == Categorical(ci1.values, ordered=False)) - self.assertRaises( + pytest.raises( TypeError, lambda: ci1 == Categorical(ci1.values, categories=list('abc'))) # tests # make sure that we are testing for category inclusion properly ci = CategoricalIndex(list('aabca'), categories=['c', 'a', 'b']) - self.assertFalse(ci.equals(list('aabca'))) - self.assertFalse(ci.equals(CategoricalIndex(list('aabca')))) - self.assertTrue(ci.equals(ci.copy())) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - ci = CategoricalIndex(list('aabca'), - categories=['c', 'a', 'b', np.nan]) - 
self.assertFalse(ci.equals(list('aabca'))) - self.assertFalse(ci.equals(CategoricalIndex(list('aabca')))) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assertTrue(ci.equals(ci.copy())) + assert not ci.equals(list('aabca')) + assert not ci.equals(CategoricalIndex(list('aabca'))) + assert ci.equals(ci.copy()) ci = CategoricalIndex(list('aabca') + [np.nan], categories=['c', 'a', 'b']) - self.assertFalse(ci.equals(list('aabca'))) - self.assertFalse(ci.equals(CategoricalIndex(list('aabca')))) - self.assertTrue(ci.equals(ci.copy())) + assert not ci.equals(list('aabca')) + assert not ci.equals(CategoricalIndex(list('aabca'))) + assert ci.equals(ci.copy()) ci = CategoricalIndex(list('aabca') + [np.nan], categories=['c', 'a', 'b']) - self.assertFalse(ci.equals(list('aabca') + [np.nan])) - self.assertFalse(ci.equals(CategoricalIndex(list('aabca') + [np.nan]))) - self.assertTrue(ci.equals(ci.copy())) + assert not ci.equals(list('aabca') + [np.nan]) + assert not ci.equals(CategoricalIndex(list('aabca') + [np.nan])) + assert ci.equals(ci.copy()) def test_string_categorical_index_repr(self): # short idx = pd.CategoricalIndex(['a', 'bb', 'ccc']) if PY3: - expected = u"""CategoricalIndex(['a', 'bb', 'ccc'], categories=['a', 'bb', 'ccc'], ordered=False, dtype='category')""" - self.assertEqual(repr(idx), expected) + expected = u"""CategoricalIndex(['a', 'bb', 'ccc'], categories=['a', 'bb', 'ccc'], ordered=False, dtype='category')""" # noqa + assert repr(idx) == expected else: - expected = u"""CategoricalIndex([u'a', u'bb', u'ccc'], categories=[u'a', u'bb', u'ccc'], ordered=False, dtype='category')""" - self.assertEqual(unicode(idx), expected) + expected = u"""CategoricalIndex([u'a', u'bb', u'ccc'], categories=[u'a', u'bb', u'ccc'], ordered=False, dtype='category')""" # noqa + assert unicode(idx) == expected # multiple lines idx = pd.CategoricalIndex(['a', 'bb', 'ccc'] * 10) @@ -603,17 +674,17 @@ def test_string_categorical_index_repr(self): expected = u"""CategoricalIndex(['a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc'], - categories=['a', 'bb', 'ccc'], ordered=False, dtype='category')""" + categories=['a', 'bb', 'ccc'], ordered=False, dtype='category')""" # noqa - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = u"""CategoricalIndex([u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc'], - categories=[u'a', u'bb', u'ccc'], ordered=False, dtype='category')""" + categories=[u'a', u'bb', u'ccc'], ordered=False, dtype='category')""" # noqa - self.assertEqual(unicode(idx), expected) + assert unicode(idx) == expected # truncated idx = pd.CategoricalIndex(['a', 'bb', 'ccc'] * 100) @@ -621,42 +692,42 @@ def test_string_categorical_index_repr(self): expected = u"""CategoricalIndex(['a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', ... 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc', 'a', 'bb', 'ccc'], - categories=['a', 'bb', 'ccc'], ordered=False, dtype='category', length=300)""" + categories=['a', 'bb', 'ccc'], ordered=False, dtype='category', length=300)""" # noqa - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = u"""CategoricalIndex([u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', ... 
u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc'], - categories=[u'a', u'bb', u'ccc'], ordered=False, dtype='category', length=300)""" + categories=[u'a', u'bb', u'ccc'], ordered=False, dtype='category', length=300)""" # noqa - self.assertEqual(unicode(idx), expected) + assert unicode(idx) == expected # larger categories idx = pd.CategoricalIndex(list('abcdefghijklmmo')) if PY3: expected = u"""CategoricalIndex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'm', 'o'], - categories=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', ...], ordered=False, dtype='category')""" + categories=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', ...], ordered=False, dtype='category')""" # noqa - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = u"""CategoricalIndex([u'a', u'b', u'c', u'd', u'e', u'f', u'g', u'h', u'i', u'j', u'k', u'l', u'm', u'm', u'o'], - categories=[u'a', u'b', u'c', u'd', u'e', u'f', u'g', u'h', ...], ordered=False, dtype='category')""" + categories=[u'a', u'b', u'c', u'd', u'e', u'f', u'g', u'h', ...], ordered=False, dtype='category')""" # noqa - self.assertEqual(unicode(idx), expected) + assert unicode(idx) == expected # short idx = pd.CategoricalIndex([u'あ', u'いい', u'ううう']) if PY3: - expected = u"""CategoricalIndex(['あ', 'いい', 'ううう'], categories=['あ', 'いい', 'ううう'], ordered=False, dtype='category')""" - self.assertEqual(repr(idx), expected) + expected = u"""CategoricalIndex(['あ', 'いい', 'ううう'], categories=['あ', 'いい', 'ううう'], ordered=False, dtype='category')""" # noqa + assert repr(idx) == expected else: - expected = u"""CategoricalIndex([u'あ', u'いい', u'ううう'], categories=[u'あ', u'いい', u'ううう'], ordered=False, dtype='category')""" - self.assertEqual(unicode(idx), expected) + expected = u"""CategoricalIndex([u'あ', u'いい', u'ううう'], categories=[u'あ', u'いい', u'ううう'], ordered=False, dtype='category')""" # noqa + assert unicode(idx) == expected # multiple lines idx = pd.CategoricalIndex([u'あ', u'いい', u'ううう'] * 10) @@ -664,17 +735,17 @@ def test_string_categorical_index_repr(self): expected = u"""CategoricalIndex(['あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう'], - categories=['あ', 'いい', 'ううう'], ordered=False, dtype='category')""" + categories=['あ', 'いい', 'ううう'], ordered=False, dtype='category')""" # noqa - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = u"""CategoricalIndex([u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう'], - categories=[u'あ', u'いい', u'ううう'], ordered=False, dtype='category')""" + categories=[u'あ', u'いい', u'ううう'], ordered=False, dtype='category')""" # noqa - self.assertEqual(unicode(idx), expected) + assert unicode(idx) == expected # truncated idx = pd.CategoricalIndex([u'あ', u'いい', u'ううう'] * 100) @@ -682,33 +753,33 @@ def test_string_categorical_index_repr(self): expected = u"""CategoricalIndex(['あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', ... 
'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう'], - categories=['あ', 'いい', 'ううう'], ordered=False, dtype='category', length=300)""" + categories=['あ', 'いい', 'ううう'], ordered=False, dtype='category', length=300)""" # noqa - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = u"""CategoricalIndex([u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', ... u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう'], - categories=[u'あ', u'いい', u'ううう'], ordered=False, dtype='category', length=300)""" + categories=[u'あ', u'いい', u'ううう'], ordered=False, dtype='category', length=300)""" # noqa - self.assertEqual(unicode(idx), expected) + assert unicode(idx) == expected # larger categories idx = pd.CategoricalIndex(list(u'あいうえおかきくけこさしすせそ')) if PY3: expected = u"""CategoricalIndex(['あ', 'い', 'う', 'え', 'お', 'か', 'き', 'く', 'け', 'こ', 'さ', 'し', 'す', 'せ', 'そ'], - categories=['あ', 'い', 'う', 'え', 'お', 'か', 'き', 'く', ...], ordered=False, dtype='category')""" + categories=['あ', 'い', 'う', 'え', 'お', 'か', 'き', 'く', ...], ordered=False, dtype='category')""" # noqa - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = u"""CategoricalIndex([u'あ', u'い', u'う', u'え', u'お', u'か', u'き', u'く', u'け', u'こ', u'さ', u'し', u'す', u'せ', u'そ'], - categories=[u'あ', u'い', u'う', u'え', u'お', u'か', u'き', u'く', ...], ordered=False, dtype='category')""" + categories=[u'あ', u'い', u'う', u'え', u'お', u'か', u'き', u'く', ...], ordered=False, dtype='category')""" # noqa - self.assertEqual(unicode(idx), expected) + assert unicode(idx) == expected # Enable Unicode option ----------------------------------------- with cf.option_context('display.unicode.east_asian_width', True): # short idx = pd.CategoricalIndex([u'あ', u'いい', u'ううう']) if PY3: - expected = u"""CategoricalIndex(['あ', 'いい', 'ううう'], categories=['あ', 'いい', 'ううう'], ordered=False, dtype='category')""" - self.assertEqual(repr(idx), expected) + expected = u"""CategoricalIndex(['あ', 'いい', 'ううう'], categories=['あ', 'いい', 'ううう'], ordered=False, dtype='category')""" # noqa + assert repr(idx) == expected else: - expected = u"""CategoricalIndex([u'あ', u'いい', u'ううう'], categories=[u'あ', u'いい', u'ううう'], ordered=False, dtype='category')""" - self.assertEqual(unicode(idx), expected) + expected = u"""CategoricalIndex([u'あ', u'いい', u'ううう'], categories=[u'あ', u'いい', u'ううう'], ordered=False, dtype='category')""" # noqa + assert unicode(idx) == expected # multiple lines idx = pd.CategoricalIndex([u'あ', u'いい', u'ううう'] * 10) @@ -729,18 +800,18 @@ def test_string_categorical_index_repr(self): 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう'], - categories=['あ', 'いい', 'ううう'], ordered=False, dtype='category')""" + categories=['あ', 'いい', 'ううう'], ordered=False, dtype='category')""" # noqa - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = u"""CategoricalIndex([u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう'], - categories=[u'あ', u'いい', u'ううう'], ordered=False, dtype='category')""" + categories=[u'あ', u'いい', u'ううう'], ordered=False, dtype='category')""" # noqa - self.assertEqual(unicode(idx), expected) + assert unicode(idx) == expected # truncated idx = 
pd.CategoricalIndex([u'あ', u'いい', u'ううう'] * 100) @@ -750,44 +821,44 @@ def test_string_categorical_index_repr(self): ... 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう', 'あ', 'いい', 'ううう'], - categories=['あ', 'いい', 'ううう'], ordered=False, dtype='category', length=300)""" + categories=['あ', 'いい', 'ううう'], ordered=False, dtype='category', length=300)""" # noqa - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = u"""CategoricalIndex([u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', ... u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう', u'あ', u'いい', u'ううう'], - categories=[u'あ', u'いい', u'ううう'], ordered=False, dtype='category', length=300)""" + categories=[u'あ', u'いい', u'ううう'], ordered=False, dtype='category', length=300)""" # noqa - self.assertEqual(unicode(idx), expected) + assert unicode(idx) == expected # larger categories idx = pd.CategoricalIndex(list(u'あいうえおかきくけこさしすせそ')) if PY3: expected = u"""CategoricalIndex(['あ', 'い', 'う', 'え', 'お', 'か', 'き', 'く', 'け', 'こ', 'さ', 'し', 'す', 'せ', 'そ'], - categories=['あ', 'い', 'う', 'え', 'お', 'か', 'き', 'く', ...], ordered=False, dtype='category')""" + categories=['あ', 'い', 'う', 'え', 'お', 'か', 'き', 'く', ...], ordered=False, dtype='category')""" # noqa - self.assertEqual(repr(idx), expected) + assert repr(idx) == expected else: expected = u"""CategoricalIndex([u'あ', u'い', u'う', u'え', u'お', u'か', u'き', u'く', u'け', u'こ', u'さ', u'し', u'す', u'せ', u'そ'], - categories=[u'あ', u'い', u'う', u'え', u'お', u'か', u'き', u'く', ...], ordered=False, dtype='category')""" + categories=[u'あ', u'い', u'う', u'え', u'お', u'か', u'き', u'く', ...], ordered=False, dtype='category')""" # noqa - self.assertEqual(unicode(idx), expected) + assert unicode(idx) == expected def test_fillna_categorical(self): # GH 11343 idx = CategoricalIndex([1.0, np.nan, 3.0, 1.0], name='x') # fill by value in categories exp = CategoricalIndex([1.0, 1.0, 3.0, 1.0], name='x') - self.assert_index_equal(idx.fillna(1.0), exp) + tm.assert_index_equal(idx.fillna(1.0), exp) # fill by value not in categories raises ValueError - with tm.assertRaisesRegexp(ValueError, - 'fill value must be in categories'): + with tm.assert_raises_regex(ValueError, + 'fill value must be in categories'): idx.fillna(2.0) def test_take_fill_value(self): @@ -841,12 +912,12 @@ def test_take_fill_value(self): msg = ('When allow_fill=True and fill_value is not None, ' 'all indices must be >= -1') - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -2]), fill_value=True) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -5]), fill_value=True) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): idx.take(np.array([1, -5])) def test_take_fill_value_datetime(self): @@ -879,12 +950,12 @@ def test_take_fill_value_datetime(self): msg = ('When allow_fill=True and fill_value is not None, ' 'all indices must be >= -1') - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -2]), fill_value=True) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -5]), fill_value=True) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): idx.take(np.array([1, -5])) def test_take_invalid_kwargs(self): @@ -892,13 +963,13 @@ def test_take_invalid_kwargs(self): indices = [1, 0, -1] msg = r"take\(\) got an unexpected keyword argument 
'foo'" - tm.assertRaisesRegexp(TypeError, msg, idx.take, - indices, foo=2) + tm.assert_raises_regex(TypeError, msg, idx.take, + indices, foo=2) msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, idx.take, - indices, out=indices) + tm.assert_raises_regex(ValueError, msg, idx.take, + indices, out=indices) msg = "the 'mode' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, idx.take, - indices, mode='clip') + tm.assert_raises_regex(ValueError, msg, idx.take, + indices, mode='clip') diff --git a/pandas/tests/indexes/test_datetimelike.py b/pandas/tests/indexes/test_datetimelike.py deleted file mode 100644 index 2cd73ec8d254a..0000000000000 --- a/pandas/tests/indexes/test_datetimelike.py +++ /dev/null @@ -1,1235 +0,0 @@ -# -*- coding: utf-8 -*- - -from datetime import datetime, timedelta, time, date - -import numpy as np - -from pandas import (DatetimeIndex, Float64Index, Index, Int64Index, - NaT, Period, PeriodIndex, Series, Timedelta, - TimedeltaIndex, date_range, period_range, - timedelta_range, notnull) - -import pandas.util.testing as tm - -import pandas as pd -from pandas.tslib import Timestamp, OutOfBoundsDatetime - -from .common import Base - - -class DatetimeLike(Base): - - def test_shift_identity(self): - - idx = self.create_index() - self.assert_index_equal(idx, idx.shift(0)) - - def test_str(self): - - # test the string repr - idx = self.create_index() - idx.name = 'foo' - self.assertFalse("length=%s" % len(idx) in str(idx)) - self.assertTrue("'foo'" in str(idx)) - self.assertTrue(idx.__class__.__name__ in str(idx)) - - if hasattr(idx, 'tz'): - if idx.tz is not None: - self.assertTrue(idx.tz in str(idx)) - if hasattr(idx, 'freq'): - self.assertTrue("freq='%s'" % idx.freqstr in str(idx)) - - def test_view(self): - super(DatetimeLike, self).test_view() - - i = self.create_index() - - i_view = i.view('i8') - result = self._holder(i) - tm.assert_index_equal(result, i) - - i_view = i.view(self._holder) - result = self._holder(i) - tm.assert_index_equal(result, i_view) - - -class TestDatetimeIndex(DatetimeLike, tm.TestCase): - _holder = DatetimeIndex - _multiprocess_can_split_ = True - - def setUp(self): - self.indices = dict(index=tm.makeDateIndex(10)) - self.setup_indices() - - def create_index(self): - return date_range('20130101', periods=5) - - def test_shift(self): - - # test shift for datetimeIndex and non datetimeIndex - # GH8083 - - drange = self.create_index() - result = drange.shift(1) - expected = DatetimeIndex(['2013-01-02', '2013-01-03', '2013-01-04', - '2013-01-05', - '2013-01-06'], freq='D') - self.assert_index_equal(result, expected) - - result = drange.shift(-1) - expected = DatetimeIndex(['2012-12-31', '2013-01-01', '2013-01-02', - '2013-01-03', '2013-01-04'], - freq='D') - self.assert_index_equal(result, expected) - - result = drange.shift(3, freq='2D') - expected = DatetimeIndex(['2013-01-07', '2013-01-08', '2013-01-09', - '2013-01-10', - '2013-01-11'], freq='D') - self.assert_index_equal(result, expected) - - def test_construction_with_alt(self): - - i = pd.date_range('20130101', periods=5, freq='H', tz='US/Eastern') - i2 = DatetimeIndex(i, dtype=i.dtype) - self.assert_index_equal(i, i2) - - i2 = DatetimeIndex(i.tz_localize(None).asi8, tz=i.dtype.tz) - self.assert_index_equal(i, i2) - - i2 = DatetimeIndex(i.tz_localize(None).asi8, dtype=i.dtype) - self.assert_index_equal(i, i2) - - i2 = DatetimeIndex( - i.tz_localize(None).asi8, dtype=i.dtype, tz=i.dtype.tz) - self.assert_index_equal(i, i2) - - # localize into the 
provided tz - i2 = DatetimeIndex(i.tz_localize(None).asi8, tz='UTC') - expected = i.tz_localize(None).tz_localize('UTC') - self.assert_index_equal(i2, expected) - - # incompat tz/dtype - self.assertRaises(ValueError, lambda: DatetimeIndex( - i.tz_localize(None).asi8, dtype=i.dtype, tz='US/Pacific')) - - def test_pickle_compat_construction(self): - pass - - def test_construction_index_with_mixed_timezones(self): - # GH 11488 - # no tz results in DatetimeIndex - result = Index([Timestamp('2011-01-01'), - Timestamp('2011-01-02')], name='idx') - exp = DatetimeIndex([Timestamp('2011-01-01'), - Timestamp('2011-01-02')], name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertTrue(isinstance(result, DatetimeIndex)) - self.assertIsNone(result.tz) - - # same tz results in DatetimeIndex - result = Index([Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), - Timestamp('2011-01-02 10:00', tz='Asia/Tokyo')], - name='idx') - exp = DatetimeIndex( - [Timestamp('2011-01-01 10:00'), Timestamp('2011-01-02 10:00') - ], tz='Asia/Tokyo', name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertTrue(isinstance(result, DatetimeIndex)) - self.assertIsNotNone(result.tz) - self.assertEqual(result.tz, exp.tz) - - # same tz results in DatetimeIndex (DST) - result = Index([Timestamp('2011-01-01 10:00', tz='US/Eastern'), - Timestamp('2011-08-01 10:00', tz='US/Eastern')], - name='idx') - exp = DatetimeIndex([Timestamp('2011-01-01 10:00'), - Timestamp('2011-08-01 10:00')], - tz='US/Eastern', name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertTrue(isinstance(result, DatetimeIndex)) - self.assertIsNotNone(result.tz) - self.assertEqual(result.tz, exp.tz) - - # different tz results in Index(dtype=object) - result = Index([Timestamp('2011-01-01 10:00'), - Timestamp('2011-01-02 10:00', tz='US/Eastern')], - name='idx') - exp = Index([Timestamp('2011-01-01 10:00'), - Timestamp('2011-01-02 10:00', tz='US/Eastern')], - dtype='object', name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertFalse(isinstance(result, DatetimeIndex)) - - result = Index([Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), - Timestamp('2011-01-02 10:00', tz='US/Eastern')], - name='idx') - exp = Index([Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), - Timestamp('2011-01-02 10:00', tz='US/Eastern')], - dtype='object', name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertFalse(isinstance(result, DatetimeIndex)) - - # length = 1 - result = Index([Timestamp('2011-01-01')], name='idx') - exp = DatetimeIndex([Timestamp('2011-01-01')], name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertTrue(isinstance(result, DatetimeIndex)) - self.assertIsNone(result.tz) - - # length = 1 with tz - result = Index( - [Timestamp('2011-01-01 10:00', tz='Asia/Tokyo')], name='idx') - exp = DatetimeIndex([Timestamp('2011-01-01 10:00')], tz='Asia/Tokyo', - name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertTrue(isinstance(result, DatetimeIndex)) - self.assertIsNotNone(result.tz) - self.assertEqual(result.tz, exp.tz) - - def test_construction_index_with_mixed_timezones_with_NaT(self): - # GH 11488 - result = Index([pd.NaT, Timestamp('2011-01-01'), - pd.NaT, Timestamp('2011-01-02')], name='idx') - exp = DatetimeIndex([pd.NaT, Timestamp('2011-01-01'), - pd.NaT, Timestamp('2011-01-02')], name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertTrue(isinstance(result, DatetimeIndex)) - self.assertIsNone(result.tz) - - # same 
tz results in DatetimeIndex - result = Index([pd.NaT, Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), - pd.NaT, Timestamp('2011-01-02 10:00', - tz='Asia/Tokyo')], - name='idx') - exp = DatetimeIndex([pd.NaT, Timestamp('2011-01-01 10:00'), - pd.NaT, Timestamp('2011-01-02 10:00')], - tz='Asia/Tokyo', name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertTrue(isinstance(result, DatetimeIndex)) - self.assertIsNotNone(result.tz) - self.assertEqual(result.tz, exp.tz) - - # same tz results in DatetimeIndex (DST) - result = Index([Timestamp('2011-01-01 10:00', tz='US/Eastern'), - pd.NaT, - Timestamp('2011-08-01 10:00', tz='US/Eastern')], - name='idx') - exp = DatetimeIndex([Timestamp('2011-01-01 10:00'), pd.NaT, - Timestamp('2011-08-01 10:00')], - tz='US/Eastern', name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertTrue(isinstance(result, DatetimeIndex)) - self.assertIsNotNone(result.tz) - self.assertEqual(result.tz, exp.tz) - - # different tz results in Index(dtype=object) - result = Index([pd.NaT, Timestamp('2011-01-01 10:00'), - pd.NaT, Timestamp('2011-01-02 10:00', - tz='US/Eastern')], - name='idx') - exp = Index([pd.NaT, Timestamp('2011-01-01 10:00'), - pd.NaT, Timestamp('2011-01-02 10:00', tz='US/Eastern')], - dtype='object', name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertFalse(isinstance(result, DatetimeIndex)) - - result = Index([pd.NaT, Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), - pd.NaT, Timestamp('2011-01-02 10:00', - tz='US/Eastern')], name='idx') - exp = Index([pd.NaT, Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), - pd.NaT, Timestamp('2011-01-02 10:00', tz='US/Eastern')], - dtype='object', name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertFalse(isinstance(result, DatetimeIndex)) - - # all NaT - result = Index([pd.NaT, pd.NaT], name='idx') - exp = DatetimeIndex([pd.NaT, pd.NaT], name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertTrue(isinstance(result, DatetimeIndex)) - self.assertIsNone(result.tz) - - # all NaT with tz - result = Index([pd.NaT, pd.NaT], tz='Asia/Tokyo', name='idx') - exp = DatetimeIndex([pd.NaT, pd.NaT], tz='Asia/Tokyo', name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertTrue(isinstance(result, DatetimeIndex)) - self.assertIsNotNone(result.tz) - self.assertEqual(result.tz, exp.tz) - - def test_construction_dti_with_mixed_timezones(self): - # GH 11488 (not changed, added explicit tests) - - # no tz results in DatetimeIndex - result = DatetimeIndex( - [Timestamp('2011-01-01'), Timestamp('2011-01-02')], name='idx') - exp = DatetimeIndex( - [Timestamp('2011-01-01'), Timestamp('2011-01-02')], name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertTrue(isinstance(result, DatetimeIndex)) - - # same tz results in DatetimeIndex - result = DatetimeIndex([Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), - Timestamp('2011-01-02 10:00', - tz='Asia/Tokyo')], - name='idx') - exp = DatetimeIndex([Timestamp('2011-01-01 10:00'), - Timestamp('2011-01-02 10:00')], - tz='Asia/Tokyo', name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertTrue(isinstance(result, DatetimeIndex)) - - # same tz results in DatetimeIndex (DST) - result = DatetimeIndex([Timestamp('2011-01-01 10:00', tz='US/Eastern'), - Timestamp('2011-08-01 10:00', - tz='US/Eastern')], - name='idx') - exp = DatetimeIndex([Timestamp('2011-01-01 10:00'), - Timestamp('2011-08-01 10:00')], - tz='US/Eastern', name='idx') - 
self.assert_index_equal(result, exp, exact=True) - self.assertTrue(isinstance(result, DatetimeIndex)) - - # different tz coerces tz-naive to tz-aware - result = DatetimeIndex([Timestamp('2011-01-01 10:00'), - Timestamp('2011-01-02 10:00', - tz='US/Eastern')], name='idx') - exp = DatetimeIndex([Timestamp('2011-01-01 05:00'), - Timestamp('2011-01-02 10:00')], - tz='US/Eastern', name='idx') - self.assert_index_equal(result, exp, exact=True) - self.assertTrue(isinstance(result, DatetimeIndex)) - - # tz mismatch with tz-aware data raises TypeError/ValueError - - with tm.assertRaises(ValueError): - DatetimeIndex([Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), - Timestamp('2011-01-02 10:00', tz='US/Eastern')], - name='idx') - - with tm.assertRaisesRegexp(TypeError, 'data is already tz-aware'): - DatetimeIndex([Timestamp('2011-01-01 10:00'), - Timestamp('2011-01-02 10:00', tz='US/Eastern')], - tz='Asia/Tokyo', name='idx') - - with tm.assertRaises(ValueError): - DatetimeIndex([Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'), - Timestamp('2011-01-02 10:00', tz='US/Eastern')], - tz='US/Eastern', name='idx') - - with tm.assertRaisesRegexp(TypeError, 'data is already tz-aware'): - # passing tz should result in DatetimeIndex, then mismatch raises - # TypeError - Index([pd.NaT, Timestamp('2011-01-01 10:00'), - pd.NaT, Timestamp('2011-01-02 10:00', tz='US/Eastern')], - tz='Asia/Tokyo', name='idx') - - def test_construction_base_constructor(self): - arr = [pd.Timestamp('2011-01-01'), pd.NaT, pd.Timestamp('2011-01-03')] - tm.assert_index_equal(pd.Index(arr), pd.DatetimeIndex(arr)) - tm.assert_index_equal(pd.Index(np.array(arr)), - pd.DatetimeIndex(np.array(arr))) - - arr = [np.nan, pd.NaT, pd.Timestamp('2011-01-03')] - tm.assert_index_equal(pd.Index(arr), pd.DatetimeIndex(arr)) - tm.assert_index_equal(pd.Index(np.array(arr)), - pd.DatetimeIndex(np.array(arr))) - - def test_construction_outofbounds(self): - # GH 13663 - dates = [datetime(3000, 1, 1), datetime(4000, 1, 1), - datetime(5000, 1, 1), datetime(6000, 1, 1)] - exp = Index(dates, dtype=object) - # coerces to object - tm.assert_index_equal(Index(dates), exp) - - with tm.assertRaises(OutOfBoundsDatetime): - # can't create DatetimeIndex - DatetimeIndex(dates) - - def test_construction_with_ndarray(self): - # GH 5152 - dates = [datetime(2013, 10, 7), - datetime(2013, 10, 8), - datetime(2013, 10, 9)] - data = DatetimeIndex(dates, freq=pd.tseries.frequencies.BDay()).values - result = DatetimeIndex(data, freq=pd.tseries.frequencies.BDay()) - expected = DatetimeIndex(['2013-10-07', - '2013-10-08', - '2013-10-09'], - freq='B') - tm.assert_index_equal(result, expected) - - def test_astype(self): - # GH 13149, GH 13209 - idx = DatetimeIndex(['2016-05-16', 'NaT', NaT, np.NaN]) - - result = idx.astype(object) - expected = Index([Timestamp('2016-05-16')] + [NaT] * 3, dtype=object) - tm.assert_index_equal(result, expected) - - result = idx.astype(int) - expected = Int64Index([1463356800000000000] + - [-9223372036854775808] * 3, dtype=np.int64) - tm.assert_index_equal(result, expected) - - rng = date_range('1/1/2000', periods=10) - result = rng.astype('i8') - self.assert_index_equal(result, Index(rng.asi8)) - self.assert_numpy_array_equal(result.values, rng.asi8) - - def test_astype_with_tz(self): - - # with tz - rng = date_range('1/1/2000', periods=10, tz='US/Eastern') - result = rng.astype('datetime64[ns]') - expected = (date_range('1/1/2000', periods=10, - tz='US/Eastern') - .tz_convert('UTC').tz_localize(None)) - 
-        tm.assert_index_equal(result, expected)
-
-        # GH 10442: testing astype(str) is correct for Series/DatetimeIndex
-        result = pd.Series(pd.date_range('2012-01-01', periods=3)).astype(str)
-        expected = pd.Series(
-            ['2012-01-01', '2012-01-02', '2012-01-03'], dtype=object)
-        tm.assert_series_equal(result, expected)
-
-        result = Series(pd.date_range('2012-01-01', periods=3,
-                                      tz='US/Eastern')).astype(str)
-        expected = Series(['2012-01-01 00:00:00-05:00',
-                           '2012-01-02 00:00:00-05:00',
-                           '2012-01-03 00:00:00-05:00'],
-                          dtype=object)
-        tm.assert_series_equal(result, expected)
-
-    def test_astype_str_compat(self):
-        # GH 13149, GH 13209
-        # verify that we are returning NaT as a string (and not unicode)
-
-        idx = DatetimeIndex(['2016-05-16', 'NaT', NaT, np.NaN])
-        result = idx.astype(str)
-        expected = Index(['2016-05-16', 'NaT', 'NaT', 'NaT'], dtype=object)
-        tm.assert_index_equal(result, expected)
-
-    def test_astype_str(self):
-        # test astype string
-        # GH 10442
-        result = date_range('2012-01-01', periods=4,
-                            name='test_name').astype(str)
-        expected = Index(['2012-01-01', '2012-01-02', '2012-01-03',
-                          '2012-01-04'], name='test_name', dtype=object)
-        tm.assert_index_equal(result, expected)
-
-        # test astype string with tz and name
-        result = date_range('2012-01-01', periods=3, name='test_name',
-                            tz='US/Eastern').astype(str)
-        expected = Index(['2012-01-01 00:00:00-05:00',
-                          '2012-01-02 00:00:00-05:00',
-                          '2012-01-03 00:00:00-05:00'],
-                         name='test_name', dtype=object)
-        tm.assert_index_equal(result, expected)
-
-        # test astype string with freq 'H' and name
-        result = date_range('1/1/2011', periods=3, freq='H',
-                            name='test_name').astype(str)
-        expected = Index(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
-                          '2011-01-01 02:00:00'],
-                         name='test_name', dtype=object)
-        tm.assert_index_equal(result, expected)
-
-        # test astype string with freq 'H' and timezone
-        result = date_range('3/6/2012 00:00', periods=2, freq='H',
-                            tz='Europe/London', name='test_name').astype(str)
-        expected = Index(['2012-03-06 00:00:00+00:00',
-                          '2012-03-06 01:00:00+00:00'],
-                         dtype=object, name='test_name')
-        tm.assert_index_equal(result, expected)
-
-    def test_astype_datetime64(self):
-        # GH 13149, GH 13209
-        idx = DatetimeIndex(['2016-05-16', 'NaT', NaT, np.NaN])
-
-        result = idx.astype('datetime64[ns]')
-        tm.assert_index_equal(result, idx)
-        self.assertFalse(result is idx)
-
-        result = idx.astype('datetime64[ns]', copy=False)
-        tm.assert_index_equal(result, idx)
-        self.assertTrue(result is idx)
-
-        idx_tz = DatetimeIndex(['2016-05-16', 'NaT', NaT, np.NaN], tz='EST')
-        result = idx_tz.astype('datetime64[ns]')
-        expected = DatetimeIndex(['2016-05-16 05:00:00', 'NaT', 'NaT', 'NaT'],
-                                 dtype='datetime64[ns]')
-        tm.assert_index_equal(result, expected)
-
-    def test_astype_raises(self):
-        # GH 13149, GH 13209
-        idx = DatetimeIndex(['2016-05-16', 'NaT', NaT, np.NaN])
-
-        self.assertRaises(ValueError, idx.astype, float)
-        self.assertRaises(ValueError, idx.astype, 'timedelta64')
-        self.assertRaises(ValueError, idx.astype, 'timedelta64[ns]')
-        self.assertRaises(ValueError, idx.astype, 'datetime64')
-        self.assertRaises(ValueError, idx.astype, 'datetime64[D]')
-
-    def test_where_other(self):
-
-        # other is ndarray or Index
-        i = pd.date_range('20130101', periods=3, tz='US/Eastern')
-
-        for arr in [np.nan, pd.NaT]:
-            result = i.where(notnull(i), other=arr)
-            expected = i
-            tm.assert_index_equal(result, expected)
-
-        i2 = i.copy()
-        i2 = Index([pd.NaT, pd.NaT] + i[2:].tolist())
-        result = i.where(notnull(i2), i2)
tm.assert_index_equal(result, i2) - - i2 = i.copy() - i2 = Index([pd.NaT, pd.NaT] + i[2:].tolist()) - result = i.where(notnull(i2), i2.values) - tm.assert_index_equal(result, i2) - - def test_where_tz(self): - i = pd.date_range('20130101', periods=3, tz='US/Eastern') - result = i.where(notnull(i)) - expected = i - tm.assert_index_equal(result, expected) - - i2 = i.copy() - i2 = Index([pd.NaT, pd.NaT] + i[2:].tolist()) - result = i.where(notnull(i2)) - expected = i2 - tm.assert_index_equal(result, expected) - - def test_get_loc(self): - idx = pd.date_range('2000-01-01', periods=3) - - for method in [None, 'pad', 'backfill', 'nearest']: - self.assertEqual(idx.get_loc(idx[1], method), 1) - self.assertEqual(idx.get_loc(idx[1].to_pydatetime(), method), 1) - self.assertEqual(idx.get_loc(str(idx[1]), method), 1) - if method is not None: - self.assertEqual(idx.get_loc(idx[1], method, - tolerance=pd.Timedelta('0 days')), - 1) - - self.assertEqual(idx.get_loc('2000-01-01', method='nearest'), 0) - self.assertEqual(idx.get_loc('2000-01-01T12', method='nearest'), 1) - - self.assertEqual(idx.get_loc('2000-01-01T12', method='nearest', - tolerance='1 day'), 1) - self.assertEqual(idx.get_loc('2000-01-01T12', method='nearest', - tolerance=pd.Timedelta('1D')), 1) - self.assertEqual(idx.get_loc('2000-01-01T12', method='nearest', - tolerance=np.timedelta64(1, 'D')), 1) - self.assertEqual(idx.get_loc('2000-01-01T12', method='nearest', - tolerance=timedelta(1)), 1) - with tm.assertRaisesRegexp(ValueError, 'must be convertible'): - idx.get_loc('2000-01-01T12', method='nearest', tolerance='foo') - with tm.assertRaises(KeyError): - idx.get_loc('2000-01-01T03', method='nearest', tolerance='2 hours') - - self.assertEqual(idx.get_loc('2000', method='nearest'), slice(0, 3)) - self.assertEqual(idx.get_loc('2000-01', method='nearest'), slice(0, 3)) - - self.assertEqual(idx.get_loc('1999', method='nearest'), 0) - self.assertEqual(idx.get_loc('2001', method='nearest'), 2) - - with tm.assertRaises(KeyError): - idx.get_loc('1999', method='pad') - with tm.assertRaises(KeyError): - idx.get_loc('2001', method='backfill') - - with tm.assertRaises(KeyError): - idx.get_loc('foobar') - with tm.assertRaises(TypeError): - idx.get_loc(slice(2)) - - idx = pd.to_datetime(['2000-01-01', '2000-01-04']) - self.assertEqual(idx.get_loc('2000-01-02', method='nearest'), 0) - self.assertEqual(idx.get_loc('2000-01-03', method='nearest'), 1) - self.assertEqual(idx.get_loc('2000-01', method='nearest'), slice(0, 2)) - - # time indexing - idx = pd.date_range('2000-01-01', periods=24, freq='H') - tm.assert_numpy_array_equal(idx.get_loc(time(12)), - np.array([12]), check_dtype=False) - tm.assert_numpy_array_equal(idx.get_loc(time(12, 30)), - np.array([]), check_dtype=False) - with tm.assertRaises(NotImplementedError): - idx.get_loc(time(12, 30), method='pad') - - def test_get_indexer(self): - idx = pd.date_range('2000-01-01', periods=3) - exp = np.array([0, 1, 2], dtype=np.intp) - tm.assert_numpy_array_equal(idx.get_indexer(idx), exp) - - target = idx[0] + pd.to_timedelta(['-1 hour', '12 hours', - '1 day 1 hour']) - tm.assert_numpy_array_equal(idx.get_indexer(target, 'pad'), - np.array([-1, 0, 1], dtype=np.intp)) - tm.assert_numpy_array_equal(idx.get_indexer(target, 'backfill'), - np.array([0, 1, 2], dtype=np.intp)) - tm.assert_numpy_array_equal(idx.get_indexer(target, 'nearest'), - np.array([0, 1, 1], dtype=np.intp)) - tm.assert_numpy_array_equal( - idx.get_indexer(target, 'nearest', - tolerance=pd.Timedelta('1 hour')), - np.array([0, -1, 1], 
dtype=np.intp)) - with tm.assertRaises(ValueError): - idx.get_indexer(idx[[0]], method='nearest', tolerance='foo') - - def test_roundtrip_pickle_with_tz(self): - - # GH 8367 - # round-trip of timezone - index = date_range('20130101', periods=3, tz='US/Eastern', name='foo') - unpickled = self.round_trip_pickle(index) - self.assert_index_equal(index, unpickled) - - def test_reindex_preserves_tz_if_target_is_empty_list_or_array(self): - # GH7774 - index = date_range('20130101', periods=3, tz='US/Eastern') - self.assertEqual(str(index.reindex([])[0].tz), 'US/Eastern') - self.assertEqual(str(index.reindex(np.array([]))[0].tz), 'US/Eastern') - - def test_time_loc(self): # GH8667 - from datetime import time - from pandas.index import _SIZE_CUTOFF - - ns = _SIZE_CUTOFF + np.array([-100, 100], dtype=np.int64) - key = time(15, 11, 30) - start = key.hour * 3600 + key.minute * 60 + key.second - step = 24 * 3600 - - for n in ns: - idx = pd.date_range('2014-11-26', periods=n, freq='S') - ts = pd.Series(np.random.randn(n), index=idx) - i = np.arange(start, n, step) - - tm.assert_numpy_array_equal(ts.index.get_loc(key), i, - check_dtype=False) - tm.assert_series_equal(ts[key], ts.iloc[i]) - - left, right = ts.copy(), ts.copy() - left[key] *= -10 - right.iloc[i] *= -10 - tm.assert_series_equal(left, right) - - def test_time_overflow_for_32bit_machines(self): - # GH8943. On some machines NumPy defaults to np.int32 (for example, - # 32-bit Linux machines). In the function _generate_regular_range - # found in tseries/index.py, `periods` gets multiplied by `strides` - # (which has value 1e9) and since the max value for np.int32 is ~2e9, - # and since those machines won't promote np.int32 to np.int64, we get - # overflow. - periods = np.int_(1000) - - idx1 = pd.date_range(start='2000', periods=periods, freq='S') - self.assertEqual(len(idx1), periods) - - idx2 = pd.date_range(end='2000', periods=periods, freq='S') - self.assertEqual(len(idx2), periods) - - def test_intersection(self): - first = self.index - second = self.index[5:] - intersect = first.intersection(second) - self.assertTrue(tm.equalContents(intersect, second)) - - # GH 10149 - cases = [klass(second.values) for klass in [np.array, Series, list]] - for case in cases: - result = first.intersection(case) - self.assertTrue(tm.equalContents(result, second)) - - third = Index(['a', 'b', 'c']) - result = first.intersection(third) - expected = pd.Index([], dtype=object) - self.assert_index_equal(result, expected) - - def test_union(self): - first = self.index[:5] - second = self.index[5:] - everything = self.index - union = first.union(second) - self.assertTrue(tm.equalContents(union, everything)) - - # GH 10149 - cases = [klass(second.values) for klass in [np.array, Series, list]] - for case in cases: - result = first.union(case) - self.assertTrue(tm.equalContents(result, everything)) - - def test_nat(self): - self.assertIs(DatetimeIndex([np.nan])[0], pd.NaT) - - def test_ufunc_coercions(self): - idx = date_range('2011-01-01', periods=3, freq='2D', name='x') - - delta = np.timedelta64(1, 'D') - for result in [idx + delta, np.add(idx, delta)]: - tm.assertIsInstance(result, DatetimeIndex) - exp = date_range('2011-01-02', periods=3, freq='2D', name='x') - tm.assert_index_equal(result, exp) - self.assertEqual(result.freq, '2D') - - for result in [idx - delta, np.subtract(idx, delta)]: - tm.assertIsInstance(result, DatetimeIndex) - exp = date_range('2010-12-31', periods=3, freq='2D', name='x') - tm.assert_index_equal(result, exp) - 
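
The 32-bit overflow test above rests on a small piece of arithmetic worth spelling out: at 1-second frequency each period is a stride of 10**9 nanoseconds, so even 1000 periods span 10**12 ns, far past the int32 ceiling. A standalone restatement of that bound:

```python
import numpy as np

periods = 1000
stride = 10 ** 9                        # one second, in nanoseconds

span = periods * stride                 # 10**12 as an exact Python int
assert span > np.iinfo(np.int32).max    # int32 tops out near 2.1e9
```
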
self.assertEqual(result.freq, '2D') - - delta = np.array([np.timedelta64(1, 'D'), np.timedelta64(2, 'D'), - np.timedelta64(3, 'D')]) - for result in [idx + delta, np.add(idx, delta)]: - tm.assertIsInstance(result, DatetimeIndex) - exp = DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-08'], - freq='3D', name='x') - tm.assert_index_equal(result, exp) - self.assertEqual(result.freq, '3D') - - for result in [idx - delta, np.subtract(idx, delta)]: - tm.assertIsInstance(result, DatetimeIndex) - exp = DatetimeIndex(['2010-12-31', '2011-01-01', '2011-01-02'], - freq='D', name='x') - tm.assert_index_equal(result, exp) - self.assertEqual(result.freq, 'D') - - def test_fillna_datetime64(self): - # GH 11343 - for tz in ['US/Eastern', 'Asia/Tokyo']: - idx = pd.DatetimeIndex(['2011-01-01 09:00', pd.NaT, - '2011-01-01 11:00']) - - exp = pd.DatetimeIndex(['2011-01-01 09:00', '2011-01-01 10:00', - '2011-01-01 11:00']) - self.assert_index_equal( - idx.fillna(pd.Timestamp('2011-01-01 10:00')), exp) - - # tz mismatch - exp = pd.Index([pd.Timestamp('2011-01-01 09:00'), - pd.Timestamp('2011-01-01 10:00', tz=tz), - pd.Timestamp('2011-01-01 11:00')], dtype=object) - self.assert_index_equal( - idx.fillna(pd.Timestamp('2011-01-01 10:00', tz=tz)), exp) - - # object - exp = pd.Index([pd.Timestamp('2011-01-01 09:00'), 'x', - pd.Timestamp('2011-01-01 11:00')], dtype=object) - self.assert_index_equal(idx.fillna('x'), exp) - - idx = pd.DatetimeIndex(['2011-01-01 09:00', pd.NaT, - '2011-01-01 11:00'], tz=tz) - - exp = pd.DatetimeIndex(['2011-01-01 09:00', '2011-01-01 10:00', - '2011-01-01 11:00'], tz=tz) - self.assert_index_equal( - idx.fillna(pd.Timestamp('2011-01-01 10:00', tz=tz)), exp) - - exp = pd.Index([pd.Timestamp('2011-01-01 09:00', tz=tz), - pd.Timestamp('2011-01-01 10:00'), - pd.Timestamp('2011-01-01 11:00', tz=tz)], - dtype=object) - self.assert_index_equal( - idx.fillna(pd.Timestamp('2011-01-01 10:00')), exp) - - # object - exp = pd.Index([pd.Timestamp('2011-01-01 09:00', tz=tz), - 'x', - pd.Timestamp('2011-01-01 11:00', tz=tz)], - dtype=object) - self.assert_index_equal(idx.fillna('x'), exp) - - def test_difference_freq(self): - # GH14323: difference of DatetimeIndex should not preserve frequency - - index = date_range("20160920", "20160925", freq="D") - other = date_range("20160921", "20160924", freq="D") - expected = DatetimeIndex(["20160920", "20160925"], freq=None) - idx_diff = index.difference(other) - tm.assert_index_equal(idx_diff, expected) - tm.assert_attr_equal('freq', idx_diff, expected) - - other = date_range("20160922", "20160925", freq="D") - idx_diff = index.difference(other) - expected = DatetimeIndex(["20160920", "20160921"], freq=None) - tm.assert_index_equal(idx_diff, expected) - tm.assert_attr_equal('freq', idx_diff, expected) - - def test_week_of_month_frequency(self): - # GH 5348: "ValueError: Could not evaluate WOM-1SUN" shouldn't raise - d1 = date(2002, 9, 1) - d2 = date(2013, 10, 27) - d3 = date(2012, 9, 30) - idx1 = DatetimeIndex([d1, d2]) - idx2 = DatetimeIndex([d3]) - result_append = idx1.append(idx2) - expected = DatetimeIndex([d1, d2, d3]) - tm.assert_index_equal(result_append, expected) - result_union = idx1.union(idx2) - expected = DatetimeIndex([d1, d3, d2]) - tm.assert_index_equal(result_union, expected) - - # GH 5115 - result = date_range("2013-1-1", periods=4, freq='WOM-1SAT') - dates = ['2013-01-05', '2013-02-02', '2013-03-02', '2013-04-06'] - expected = DatetimeIndex(dates, freq='WOM-1SAT') - tm.assert_index_equal(result, expected) - - -class 
TestPeriodIndex(DatetimeLike, tm.TestCase): - _holder = PeriodIndex - _multiprocess_can_split_ = True - - def setUp(self): - self.indices = dict(index=tm.makePeriodIndex(10)) - self.setup_indices() - - def create_index(self): - return period_range('20130101', periods=5, freq='D') - - def test_construction_base_constructor(self): - # GH 13664 - arr = [pd.Period('2011-01', freq='M'), pd.NaT, - pd.Period('2011-03', freq='M')] - tm.assert_index_equal(pd.Index(arr), pd.PeriodIndex(arr)) - tm.assert_index_equal(pd.Index(np.array(arr)), - pd.PeriodIndex(np.array(arr))) - - arr = [np.nan, pd.NaT, pd.Period('2011-03', freq='M')] - tm.assert_index_equal(pd.Index(arr), pd.PeriodIndex(arr)) - tm.assert_index_equal(pd.Index(np.array(arr)), - pd.PeriodIndex(np.array(arr))) - - arr = [pd.Period('2011-01', freq='M'), pd.NaT, - pd.Period('2011-03', freq='D')] - tm.assert_index_equal(pd.Index(arr), pd.Index(arr, dtype=object)) - - tm.assert_index_equal(pd.Index(np.array(arr)), - pd.Index(np.array(arr), dtype=object)) - - def test_astype(self): - # GH 13149, GH 13209 - idx = PeriodIndex(['2016-05-16', 'NaT', NaT, np.NaN], freq='D') - - result = idx.astype(object) - expected = Index([Period('2016-05-16', freq='D')] + - [Period(NaT, freq='D')] * 3, dtype='object') - tm.assert_index_equal(result, expected) - - result = idx.astype(int) - expected = Int64Index([16937] + [-9223372036854775808] * 3, - dtype=np.int64) - tm.assert_index_equal(result, expected) - - idx = period_range('1990', '2009', freq='A') - result = idx.astype('i8') - self.assert_index_equal(result, Index(idx.asi8)) - self.assert_numpy_array_equal(result.values, idx.asi8) - - def test_astype_raises(self): - # GH 13149, GH 13209 - idx = PeriodIndex(['2016-05-16', 'NaT', NaT, np.NaN], freq='D') - - self.assertRaises(ValueError, idx.astype, str) - self.assertRaises(ValueError, idx.astype, float) - self.assertRaises(ValueError, idx.astype, 'timedelta64') - self.assertRaises(ValueError, idx.astype, 'timedelta64[ns]') - - def test_shift(self): - - # test shift for PeriodIndex - # GH8083 - drange = self.create_index() - result = drange.shift(1) - expected = PeriodIndex(['2013-01-02', '2013-01-03', '2013-01-04', - '2013-01-05', '2013-01-06'], freq='D') - self.assert_index_equal(result, expected) - - def test_pickle_compat_construction(self): - pass - - def test_get_loc(self): - idx = pd.period_range('2000-01-01', periods=3) - - for method in [None, 'pad', 'backfill', 'nearest']: - self.assertEqual(idx.get_loc(idx[1], method), 1) - self.assertEqual( - idx.get_loc(idx[1].asfreq('H', how='start'), method), 1) - self.assertEqual(idx.get_loc(idx[1].to_timestamp(), method), 1) - self.assertEqual( - idx.get_loc(idx[1].to_timestamp().to_pydatetime(), method), 1) - self.assertEqual(idx.get_loc(str(idx[1]), method), 1) - - idx = pd.period_range('2000-01-01', periods=5)[::2] - self.assertEqual(idx.get_loc('2000-01-02T12', method='nearest', - tolerance='1 day'), 1) - self.assertEqual(idx.get_loc('2000-01-02T12', method='nearest', - tolerance=pd.Timedelta('1D')), 1) - self.assertEqual(idx.get_loc('2000-01-02T12', method='nearest', - tolerance=np.timedelta64(1, 'D')), 1) - self.assertEqual(idx.get_loc('2000-01-02T12', method='nearest', - tolerance=timedelta(1)), 1) - with tm.assertRaisesRegexp(ValueError, 'must be convertible'): - idx.get_loc('2000-01-10', method='nearest', tolerance='foo') - - msg = 'Input has different freq from PeriodIndex\\(freq=D\\)' - with tm.assertRaisesRegexp(ValueError, msg): - idx.get_loc('2000-01-10', method='nearest', tolerance='1 hour') - 
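
The tolerance cases bracketing this point all exercise one contract: `get_loc` with `method='nearest'` accepts anything convertible to a timedelta, but the implied resolution must be compatible with the index's frequency. A condensed sketch of the happy path, hedged to this era's API (later pandas removed the `method`/`tolerance` arguments from `get_loc`):

```python
import pandas as pd

# every other day: positions 0, 1, 2 hold Jan 1, Jan 3, Jan 5
idx = pd.period_range('2000-01-01', periods=5, freq='D')[::2]

# noon on Jan 2 resolves to the Jan 3 period once a day of slack is allowed
assert idx.get_loc('2000-01-02T12', method='nearest', tolerance='1 day') == 1
```
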
-        with tm.assertRaises(KeyError):
-            idx.get_loc('2000-01-10', method='nearest', tolerance='1 day')
-
-    def test_where(self):
-        i = self.create_index()
-        result = i.where(notnull(i))
-        expected = i
-        tm.assert_index_equal(result, expected)
-
-        i2 = i.copy()
-        i2 = pd.PeriodIndex([pd.NaT, pd.NaT] + i[2:].tolist(),
-                            freq='D')
-        result = i.where(notnull(i2))
-        expected = i2
-        tm.assert_index_equal(result, expected)
-
-    def test_where_other(self):
-
-        i = self.create_index()
-        for arr in [np.nan, pd.NaT]:
-            result = i.where(notnull(i), other=arr)
-            expected = i
-            tm.assert_index_equal(result, expected)
-
-        i2 = i.copy()
-        i2 = pd.PeriodIndex([pd.NaT, pd.NaT] + i[2:].tolist(),
-                            freq='D')
-        result = i.where(notnull(i2), i2)
-        tm.assert_index_equal(result, i2)
-
-        i2 = i.copy()
-        i2 = pd.PeriodIndex([pd.NaT, pd.NaT] + i[2:].tolist(),
-                            freq='D')
-        result = i.where(notnull(i2), i2.values)
-        tm.assert_index_equal(result, i2)
-
-    def test_get_indexer(self):
-        idx = pd.period_range('2000-01-01', periods=3).asfreq('H', how='start')
-        tm.assert_numpy_array_equal(idx.get_indexer(idx),
-                                    np.array([0, 1, 2], dtype=np.intp))
-
-        target = pd.PeriodIndex(['1999-12-31T23', '2000-01-01T12',
-                                 '2000-01-02T01'], freq='H')
-        tm.assert_numpy_array_equal(idx.get_indexer(target, 'pad'),
-                                    np.array([-1, 0, 1], dtype=np.intp))
-        tm.assert_numpy_array_equal(idx.get_indexer(target, 'backfill'),
-                                    np.array([0, 1, 2], dtype=np.intp))
-        tm.assert_numpy_array_equal(idx.get_indexer(target, 'nearest'),
-                                    np.array([0, 1, 1], dtype=np.intp))
-        tm.assert_numpy_array_equal(idx.get_indexer(target, 'nearest',
-                                                    tolerance='1 hour'),
-                                    np.array([0, -1, 1], dtype=np.intp))
-
-        msg = 'Input has different freq from PeriodIndex\\(freq=H\\)'
-        with self.assertRaisesRegexp(ValueError, msg):
-            idx.get_indexer(target, 'nearest', tolerance='1 minute')
-
-        tm.assert_numpy_array_equal(idx.get_indexer(target, 'nearest',
-                                                    tolerance='1 day'),
-                                    np.array([0, 1, 1], dtype=np.intp))
-
-    def test_repeat(self):
-        # GH10183
-        idx = pd.period_range('2000-01-01', periods=3, freq='D')
-        res = idx.repeat(3)
-        exp = PeriodIndex(idx.values.repeat(3), freq='D')
-        self.assert_index_equal(res, exp)
-        self.assertEqual(res.freqstr, 'D')
-
-    def test_period_index_indexer(self):
-        # GH4125
-        idx = pd.period_range('2002-01', '2003-12', freq='M')
-        df = pd.DataFrame(pd.np.random.randn(24, 10), index=idx)
-        self.assert_frame_equal(df, df.loc[idx])
-        self.assert_frame_equal(df, df.loc[list(idx)])
-        self.assert_frame_equal(df.iloc[0:5], df.loc[idx[0:5]])
-
-    def test_fillna_period(self):
-        # GH 11343
-        idx = pd.PeriodIndex(['2011-01-01 09:00', pd.NaT,
-                              '2011-01-01 11:00'], freq='H')
-
-        exp = pd.PeriodIndex(['2011-01-01 09:00', '2011-01-01 10:00',
-                              '2011-01-01 11:00'], freq='H')
-        self.assert_index_equal(
-            idx.fillna(pd.Period('2011-01-01 10:00', freq='H')), exp)
-
-        exp = pd.Index([pd.Period('2011-01-01 09:00', freq='H'), 'x',
-                        pd.Period('2011-01-01 11:00', freq='H')], dtype=object)
-        self.assert_index_equal(idx.fillna('x'), exp)
-
-        exp = pd.Index([pd.Period('2011-01-01 09:00', freq='H'),
-                        pd.Period('2011-01-01', freq='D'),
-                        pd.Period('2011-01-01 11:00', freq='H')], dtype=object)
-        self.assert_index_equal(idx.fillna(pd.Period('2011-01-01', freq='D')),
-                                exp)
-
-    def test_no_millisecond_field(self):
-        with self.assertRaises(AttributeError):
-            DatetimeIndex.millisecond
-
-        with self.assertRaises(AttributeError):
-            DatetimeIndex([]).millisecond
-
-    def test_difference_freq(self):
-        # GH14323: difference of Period MUST preserve frequency,
-        # but the ability to union the results must be preserved
-
-        index = period_range("20160920", "20160925", freq="D")
-
-        other = period_range("20160921", "20160924", freq="D")
-        expected = PeriodIndex(["20160920", "20160925"], freq='D')
-        idx_diff = index.difference(other)
-        tm.assert_index_equal(idx_diff, expected)
-        tm.assert_attr_equal('freq', idx_diff, expected)
-
-        other = period_range("20160922", "20160925", freq="D")
-        idx_diff = index.difference(other)
-        expected = PeriodIndex(["20160920", "20160921"], freq='D')
-        tm.assert_index_equal(idx_diff, expected)
-        tm.assert_attr_equal('freq', idx_diff, expected)
-
-
-class TestTimedeltaIndex(DatetimeLike, tm.TestCase):
-    _holder = TimedeltaIndex
-    _multiprocess_can_split_ = True
-
-    def setUp(self):
-        self.indices = dict(index=tm.makeTimedeltaIndex(10))
-        self.setup_indices()
-
-    def create_index(self):
-        return pd.to_timedelta(range(5), unit='d') + pd.offsets.Hour(1)
-
-    def test_construction_base_constructor(self):
-        arr = [pd.Timedelta('1 days'), pd.NaT, pd.Timedelta('3 days')]
-        tm.assert_index_equal(pd.Index(arr), pd.TimedeltaIndex(arr))
-        tm.assert_index_equal(pd.Index(np.array(arr)),
-                              pd.TimedeltaIndex(np.array(arr)))
-
-        arr = [np.nan, pd.NaT, pd.Timedelta('1 days')]
-        tm.assert_index_equal(pd.Index(arr), pd.TimedeltaIndex(arr))
-        tm.assert_index_equal(pd.Index(np.array(arr)),
-                              pd.TimedeltaIndex(np.array(arr)))
-
-    def test_shift(self):
-        # test shift for TimedeltaIndex
-        # GH 8083
-
-        drange = self.create_index()
-        result = drange.shift(1)
-        expected = TimedeltaIndex(['1 days 01:00:00', '2 days 01:00:00',
-                                   '3 days 01:00:00',
-                                   '4 days 01:00:00', '5 days 01:00:00'],
-                                  freq='D')
-        self.assert_index_equal(result, expected)
-
-        result = drange.shift(3, freq='2D 1s')
-        expected = TimedeltaIndex(['6 days 01:00:03', '7 days 01:00:03',
-                                   '8 days 01:00:03', '9 days 01:00:03',
-                                   '10 days 01:00:03'], freq='D')
-        self.assert_index_equal(result, expected)
-
-    def test_astype(self):
-        # GH 13149, GH 13209
-        idx = TimedeltaIndex([1e14, 'NaT', pd.NaT, np.NaN])
-
-        result = idx.astype(object)
-        expected = Index([Timedelta('1 days 03:46:40')] + [pd.NaT] * 3,
-                         dtype=object)
-        tm.assert_index_equal(result, expected)
-
-        result = idx.astype(int)
-        expected = Int64Index([100000000000000] + [-9223372036854775808] * 3,
-                              dtype=np.int64)
-        tm.assert_index_equal(result, expected)
-
-        rng = timedelta_range('1 days', periods=10)
-
-        result = rng.astype('i8')
-        self.assert_index_equal(result, Index(rng.asi8))
-        self.assert_numpy_array_equal(rng.asi8, result.values)
-
-    def test_astype_timedelta64(self):
-        # GH 13149, GH 13209
-        idx = TimedeltaIndex([1e14, 'NaT', pd.NaT, np.NaN])
-
-        result = idx.astype('timedelta64')
-        expected = Float64Index([1e+14] + [np.NaN] * 3, dtype='float64')
-        tm.assert_index_equal(result, expected)
-
-        result = idx.astype('timedelta64[ns]')
-        tm.assert_index_equal(result, idx)
-        self.assertFalse(result is idx)
-
-        result = idx.astype('timedelta64[ns]', copy=False)
-        tm.assert_index_equal(result, idx)
-        self.assertTrue(result is idx)
-
-    def test_astype_raises(self):
-        # GH 13149, GH 13209
-        idx = TimedeltaIndex([1e14, 'NaT', pd.NaT, np.NaN])
-
-        self.assertRaises(ValueError, idx.astype, float)
-        self.assertRaises(ValueError, idx.astype, str)
-        self.assertRaises(ValueError, idx.astype, 'datetime64')
-        self.assertRaises(ValueError, idx.astype, 'datetime64[ns]')
-
-    def test_get_loc(self):
-        idx = pd.to_timedelta(['0 days', '1 days', '2 days'])
-
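
The timedelta `get_loc` loop that follows relies on string round-tripping: lookup keys are parsed through `Timedelta`, so equivalent spellings locate the same position. A two-line illustration using only calls the loop itself makes:

```python
import pandas as pd

idx = pd.to_timedelta(['0 days', '1 days', '2 days'])

assert idx.get_loc(pd.Timedelta('1 days')) == 1
assert idx.get_loc('1 day') == 1   # alternate spelling parses identically
```
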
for method in [None, 'pad', 'backfill', 'nearest']: - self.assertEqual(idx.get_loc(idx[1], method), 1) - self.assertEqual(idx.get_loc(idx[1].to_pytimedelta(), method), 1) - self.assertEqual(idx.get_loc(str(idx[1]), method), 1) - - self.assertEqual( - idx.get_loc(idx[1], 'pad', tolerance=pd.Timedelta(0)), 1) - self.assertEqual( - idx.get_loc(idx[1], 'pad', tolerance=np.timedelta64(0, 's')), 1) - self.assertEqual(idx.get_loc(idx[1], 'pad', tolerance=timedelta(0)), 1) - - with tm.assertRaisesRegexp(ValueError, 'must be convertible'): - idx.get_loc(idx[1], method='nearest', tolerance='foo') - - for method, loc in [('pad', 1), ('backfill', 2), ('nearest', 1)]: - self.assertEqual(idx.get_loc('1 day 1 hour', method), loc) - - def test_get_indexer(self): - idx = pd.to_timedelta(['0 days', '1 days', '2 days']) - tm.assert_numpy_array_equal(idx.get_indexer(idx), - np.array([0, 1, 2], dtype=np.intp)) - - target = pd.to_timedelta(['-1 hour', '12 hours', '1 day 1 hour']) - tm.assert_numpy_array_equal(idx.get_indexer(target, 'pad'), - np.array([-1, 0, 1], dtype=np.intp)) - tm.assert_numpy_array_equal(idx.get_indexer(target, 'backfill'), - np.array([0, 1, 2], dtype=np.intp)) - tm.assert_numpy_array_equal(idx.get_indexer(target, 'nearest'), - np.array([0, 1, 1], dtype=np.intp)) - - res = idx.get_indexer(target, 'nearest', - tolerance=pd.Timedelta('1 hour')) - tm.assert_numpy_array_equal(res, np.array([0, -1, 1], dtype=np.intp)) - - def test_numeric_compat(self): - - idx = self._holder(np.arange(5, dtype='int64')) - didx = self._holder(np.arange(5, dtype='int64') ** 2) - result = idx * 1 - tm.assert_index_equal(result, idx) - - result = 1 * idx - tm.assert_index_equal(result, idx) - - result = idx / 1 - tm.assert_index_equal(result, idx) - - result = idx // 1 - tm.assert_index_equal(result, idx) - - result = idx * np.array(5, dtype='int64') - tm.assert_index_equal(result, - self._holder(np.arange(5, dtype='int64') * 5)) - - result = idx * np.arange(5, dtype='int64') - tm.assert_index_equal(result, didx) - - result = idx * Series(np.arange(5, dtype='int64')) - tm.assert_index_equal(result, didx) - - result = idx * Series(np.arange(5, dtype='float64') + 0.1) - tm.assert_index_equal(result, self._holder(np.arange( - 5, dtype='float64') * (np.arange(5, dtype='float64') + 0.1))) - - # invalid - self.assertRaises(TypeError, lambda: idx * idx) - self.assertRaises(ValueError, lambda: idx * self._holder(np.arange(3))) - self.assertRaises(ValueError, lambda: idx * np.array([1, 2])) - - def test_pickle_compat_construction(self): - pass - - def test_ufunc_coercions(self): - # normal ops are also tested in tseries/test_timedeltas.py - idx = TimedeltaIndex(['2H', '4H', '6H', '8H', '10H'], - freq='2H', name='x') - - for result in [idx * 2, np.multiply(idx, 2)]: - tm.assertIsInstance(result, TimedeltaIndex) - exp = TimedeltaIndex(['4H', '8H', '12H', '16H', '20H'], - freq='4H', name='x') - tm.assert_index_equal(result, exp) - self.assertEqual(result.freq, '4H') - - for result in [idx / 2, np.divide(idx, 2)]: - tm.assertIsInstance(result, TimedeltaIndex) - exp = TimedeltaIndex(['1H', '2H', '3H', '4H', '5H'], - freq='H', name='x') - tm.assert_index_equal(result, exp) - self.assertEqual(result.freq, 'H') - - idx = TimedeltaIndex(['2H', '4H', '6H', '8H', '10H'], - freq='2H', name='x') - for result in [-idx, np.negative(idx)]: - tm.assertIsInstance(result, TimedeltaIndex) - exp = TimedeltaIndex(['-2H', '-4H', '-6H', '-8H', '-10H'], - freq='-2H', name='x') - tm.assert_index_equal(result, exp) - self.assertEqual(result.freq, 
'-2H')
-
-        idx = TimedeltaIndex(['-2H', '-1H', '0H', '1H', '2H'],
-                             freq='H', name='x')
-        for result in [abs(idx), np.absolute(idx)]:
-            tm.assertIsInstance(result, TimedeltaIndex)
-            exp = TimedeltaIndex(['2H', '1H', '0H', '1H', '2H'],
-                                 freq=None, name='x')
-            tm.assert_index_equal(result, exp)
-            self.assertEqual(result.freq, None)
-
-    def test_fillna_timedelta(self):
-        # GH 11343
-        idx = pd.TimedeltaIndex(['1 day', pd.NaT, '3 day'])
-
-        exp = pd.TimedeltaIndex(['1 day', '2 day', '3 day'])
-        self.assert_index_equal(idx.fillna(pd.Timedelta('2 day')), exp)
-
-        exp = pd.TimedeltaIndex(['1 day', '3 hour', '3 day'])
-        self.assert_index_equal(idx.fillna(pd.Timedelta('3 hour')), exp)
-
-        exp = pd.Index(
-            [pd.Timedelta('1 day'), 'x', pd.Timedelta('3 day')], dtype=object)
-        self.assert_index_equal(idx.fillna('x'), exp)
-
-    def test_difference_freq(self):
-        # GH14323: Difference of TimedeltaIndex should not preserve frequency
-
-        index = timedelta_range("0 days", "5 days", freq="D")
-
-        other = timedelta_range("1 days", "4 days", freq="D")
-        expected = TimedeltaIndex(["0 days", "5 days"], freq=None)
-        idx_diff = index.difference(other)
-        tm.assert_index_equal(idx_diff, expected)
-        tm.assert_attr_equal('freq', idx_diff, expected)
-
-        other = timedelta_range("2 days", "5 days", freq="D")
-        idx_diff = index.difference(other)
-        expected = TimedeltaIndex(["0 days", "1 days"], freq=None)
-        tm.assert_index_equal(idx_diff, expected)
-        tm.assert_attr_equal('freq', idx_diff, expected)
diff --git a/pandas/tests/indexes/test_frozen.py b/pandas/tests/indexes/test_frozen.py
new file mode 100644
index 0000000000000..ca9841112b1d5
--- /dev/null
+++ b/pandas/tests/indexes/test_frozen.py
@@ -0,0 +1,71 @@
+import numpy as np
+from pandas.util import testing as tm
+from pandas.tests.test_base import CheckImmutable, CheckStringMixin
+from pandas.core.indexes.frozen import FrozenList, FrozenNDArray
+from pandas.compat import u
+
+
+class TestFrozenList(CheckImmutable, CheckStringMixin):
+    mutable_methods = ('extend', 'pop', 'remove', 'insert')
+    unicode_container = FrozenList([u("\u05d0"), u("\u05d1"), "c"])
+
+    def setup_method(self, method):
+        self.lst = [1, 2, 3, 4, 5]
+        self.container = FrozenList(self.lst)
+        self.klass = FrozenList
+
+    def test_add(self):
+        result = self.container + (1, 2, 3)
+        expected = FrozenList(self.lst + [1, 2, 3])
+        self.check_result(result, expected)
+
+        result = (1, 2, 3) + self.container
+        expected = FrozenList([1, 2, 3] + self.lst)
+        self.check_result(result, expected)
+
+    def test_inplace(self):
+        q = r = self.container
+        q += [5]
+        self.check_result(q, self.lst + [5])
+        # other shouldn't be mutated
+        self.check_result(r, self.lst)
+
+
+class TestFrozenNDArray(CheckImmutable, CheckStringMixin):
+    mutable_methods = ('put', 'itemset', 'fill')
+    unicode_container = FrozenNDArray([u("\u05d0"), u("\u05d1"), "c"])
+
+    def setup_method(self, method):
+        self.lst = [3, 5, 7, -2]
+        self.container = FrozenNDArray(self.lst)
+        self.klass = FrozenNDArray
+
+    def test_shallow_copying(self):
+        original = self.container.copy()
+        assert isinstance(self.container.view(), FrozenNDArray)
+        assert not isinstance(self.container.view(np.ndarray), FrozenNDArray)
+        assert self.container.view() is not self.container
+        tm.assert_numpy_array_equal(self.container, original)
+
+        # Shallow copy should be the same too
+        assert isinstance(self.container._shallow_copy(), FrozenNDArray)
+
+        # setting should not be allowed
+        def testit(container):
+            container[0] = 16
+
+        self.check_mutable_error(testit, self.container)
+
+    def test_values(self):
original = self.container.view(np.ndarray).copy() + n = original[0] + 15 + + vals = self.container.values() + tm.assert_numpy_array_equal(original, vals) + + assert original is not vals + vals[0] = n + + assert isinstance(self.container, FrozenNDArray) + tm.assert_numpy_array_equal(self.container.values(), original) + assert vals[0] == n diff --git a/pandas/tests/indexes/test_interval.py b/pandas/tests/indexes/test_interval.py new file mode 100644 index 0000000000000..18eefc3fbdca6 --- /dev/null +++ b/pandas/tests/indexes/test_interval.py @@ -0,0 +1,841 @@ +from __future__ import division + +import pytest +import numpy as np + +from pandas import (Interval, IntervalIndex, Index, isna, + interval_range, Timestamp, Timedelta, + compat) +from pandas._libs.interval import IntervalTree +from pandas.tests.indexes.common import Base +import pandas.util.testing as tm +import pandas as pd + + +class TestIntervalIndex(Base): + _holder = IntervalIndex + + def setup_method(self, method): + self.index = IntervalIndex.from_arrays([0, 1], [1, 2]) + self.index_with_nan = IntervalIndex.from_tuples( + [(0, 1), np.nan, (1, 2)]) + self.indices = dict(intervalIndex=tm.makeIntervalIndex(10)) + + def create_index(self): + return IntervalIndex.from_breaks(np.arange(10)) + + def test_constructors(self): + expected = self.index + actual = IntervalIndex.from_breaks(np.arange(3), closed='right') + assert expected.equals(actual) + + alternate = IntervalIndex.from_breaks(np.arange(3), closed='left') + assert not expected.equals(alternate) + + actual = IntervalIndex.from_intervals([Interval(0, 1), Interval(1, 2)]) + assert expected.equals(actual) + + actual = IntervalIndex([Interval(0, 1), Interval(1, 2)]) + assert expected.equals(actual) + + actual = IntervalIndex.from_arrays(np.arange(2), np.arange(2) + 1, + closed='right') + assert expected.equals(actual) + + actual = Index([Interval(0, 1), Interval(1, 2)]) + assert isinstance(actual, IntervalIndex) + assert expected.equals(actual) + + actual = Index(expected) + assert isinstance(actual, IntervalIndex) + assert expected.equals(actual) + + def test_constructors_other(self): + + # all-nan + result = IntervalIndex.from_intervals([np.nan]) + expected = np.array([np.nan], dtype=object) + tm.assert_numpy_array_equal(result.values, expected) + + # empty + result = IntervalIndex.from_intervals([]) + expected = np.array([], dtype=object) + tm.assert_numpy_array_equal(result.values, expected) + + def test_constructors_errors(self): + + # scalar + with pytest.raises(TypeError): + IntervalIndex(5) + + # not an interval + with pytest.raises(TypeError): + IntervalIndex([0, 1]) + + with pytest.raises(TypeError): + IntervalIndex.from_intervals([0, 1]) + + # invalid closed + with pytest.raises(ValueError): + IntervalIndex.from_arrays([0, 1], [1, 2], closed='invalid') + + # mismatched closed + with pytest.raises(ValueError): + IntervalIndex.from_intervals([Interval(0, 1), + Interval(1, 2, closed='left')]) + + with pytest.raises(ValueError): + IntervalIndex.from_arrays([0, 10], [3, 5]) + + with pytest.raises(ValueError): + Index([Interval(0, 1), Interval(2, 3, closed='left')]) + + # no point in nesting periods in an IntervalIndex + with pytest.raises(ValueError): + IntervalIndex.from_breaks( + pd.period_range('2000-01-01', periods=3)) + + def test_constructors_datetimelike(self): + + # DTI / TDI + for idx in [pd.date_range('20130101', periods=5), + pd.timedelta_range('1 day', periods=5)]: + result = IntervalIndex.from_breaks(idx) + expected = IntervalIndex.from_breaks(idx.values) + 
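
The constructor equivalences being checked here reduce to the `from_breaks` rule: n breakpoints define n - 1 adjacent intervals that share endpoints. A minimal sketch of that rule on plain integers:

```python
import pandas as pd

idx = pd.IntervalIndex.from_breaks([0, 1, 2], closed='right')

assert len(idx) == 2                               # 3 breaks -> 2 intervals
assert idx[0] == pd.Interval(0, 1, closed='right')
assert 1 in idx[0] and 1 not in idx[1]             # shared endpoint: (0, 1]
```
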
tm.assert_index_equal(result, expected) + + expected_scalar_type = type(idx[0]) + i = result[0] + assert isinstance(i.left, expected_scalar_type) + assert isinstance(i.right, expected_scalar_type) + + def test_constructors_error(self): + + # non-intervals + def f(): + IntervalIndex.from_intervals([0.997, 4.0]) + pytest.raises(TypeError, f) + + def test_properties(self): + index = self.index + assert len(index) == 2 + assert index.size == 2 + assert index.shape == (2, ) + + tm.assert_index_equal(index.left, Index([0, 1])) + tm.assert_index_equal(index.right, Index([1, 2])) + tm.assert_index_equal(index.mid, Index([0.5, 1.5])) + + assert index.closed == 'right' + + expected = np.array([Interval(0, 1), Interval(1, 2)], dtype=object) + tm.assert_numpy_array_equal(np.asarray(index), expected) + tm.assert_numpy_array_equal(index.values, expected) + + # with nans + index = self.index_with_nan + assert len(index) == 3 + assert index.size == 3 + assert index.shape == (3, ) + + tm.assert_index_equal(index.left, Index([0, np.nan, 1])) + tm.assert_index_equal(index.right, Index([1, np.nan, 2])) + tm.assert_index_equal(index.mid, Index([0.5, np.nan, 1.5])) + + assert index.closed == 'right' + + expected = np.array([Interval(0, 1), np.nan, + Interval(1, 2)], dtype=object) + tm.assert_numpy_array_equal(np.asarray(index), expected) + tm.assert_numpy_array_equal(index.values, expected) + + def test_with_nans(self): + index = self.index + assert not index.hasnans + tm.assert_numpy_array_equal(index.isna(), + np.array([False, False])) + tm.assert_numpy_array_equal(index.notna(), + np.array([True, True])) + + index = self.index_with_nan + assert index.hasnans + tm.assert_numpy_array_equal(index.notna(), + np.array([True, False, True])) + tm.assert_numpy_array_equal(index.isna(), + np.array([False, True, False])) + + def test_copy(self): + actual = self.index.copy() + assert actual.equals(self.index) + + actual = self.index.copy(deep=True) + assert actual.equals(self.index) + assert actual.left is not self.index.left + + def test_ensure_copied_data(self): + # exercise the copy flag in the constructor + + # not copying + index = self.index + result = IntervalIndex(index, copy=False) + tm.assert_numpy_array_equal(index.left.values, result.left.values, + check_same='same') + tm.assert_numpy_array_equal(index.right.values, result.right.values, + check_same='same') + + # by-definition make a copy + result = IntervalIndex.from_intervals(index.values, copy=False) + tm.assert_numpy_array_equal(index.left.values, result.left.values, + check_same='copy') + tm.assert_numpy_array_equal(index.right.values, result.right.values, + check_same='copy') + + def test_equals(self): + + idx = self.index + assert idx.equals(idx) + assert idx.equals(idx.copy()) + + assert not idx.equals(idx.astype(object)) + assert not idx.equals(np.array(idx)) + assert not idx.equals(list(idx)) + + assert not idx.equals([1, 2]) + assert not idx.equals(np.array([1, 2])) + assert not idx.equals(pd.date_range('20130101', periods=2)) + + def test_astype(self): + + idx = self.index + + for dtype in [np.int64, np.float64, 'datetime64[ns]', + 'datetime64[ns, US/Eastern]', 'timedelta64', + 'period[M]']: + pytest.raises(ValueError, idx.astype, dtype) + + result = idx.astype(object) + tm.assert_index_equal(result, Index(idx.values, dtype='object')) + assert not idx.equals(result) + assert idx.equals(IntervalIndex.from_intervals(result)) + + result = idx.astype('interval') + tm.assert_index_equal(result, idx) + assert result.equals(idx) + + result = 
idx.astype('category')
+        expected = pd.Categorical(idx, ordered=True)
+        tm.assert_categorical_equal(result, expected)
+
+    def test_where(self):
+        expected = self.index
+        result = self.index.where(self.index.notna())
+        tm.assert_index_equal(result, expected)
+
+        idx = IntervalIndex.from_breaks([1, 2])
+        result = idx.where([True, False])
+        expected = IntervalIndex.from_intervals(
+            [Interval(1.0, 2.0, closed='right'), np.nan])
+        tm.assert_index_equal(result, expected)
+
+    def test_where_array_like(self):
+        pass
+
+    def test_delete(self):
+        expected = IntervalIndex.from_breaks([1, 2])
+        actual = self.index.delete(0)
+        assert expected.equals(actual)
+
+    def test_insert(self):
+        expected = IntervalIndex.from_breaks(range(4))
+        actual = self.index.insert(2, Interval(2, 3))
+        assert expected.equals(actual)
+
+        pytest.raises(ValueError, self.index.insert, 0, 1)
+        pytest.raises(ValueError, self.index.insert, 0,
+                      Interval(2, 3, closed='left'))
+
+    def test_take(self):
+        actual = self.index.take([0, 1])
+        assert self.index.equals(actual)
+
+        expected = IntervalIndex.from_arrays([0, 0, 1], [1, 1, 2])
+        actual = self.index.take([0, 0, 1])
+        assert expected.equals(actual)
+
+    def test_monotonic_and_unique(self):
+        assert self.index.is_monotonic
+        assert self.index.is_unique
+
+        idx = IntervalIndex.from_tuples([(0, 1), (0.5, 1.5)])
+        assert idx.is_monotonic
+        assert idx.is_unique
+
+        idx = IntervalIndex.from_tuples([(0, 1), (2, 3), (1, 2)])
+        assert not idx.is_monotonic
+        assert idx.is_unique
+
+        idx = IntervalIndex.from_tuples([(0, 2), (0, 2)])
+        assert not idx.is_unique
+        assert idx.is_monotonic
+
+    @pytest.mark.xfail(reason='not a valid repr as we use interval notation')
+    def test_repr(self):
+        i = IntervalIndex.from_tuples([(0, 1), (1, 2)], closed='right')
+        expected = ("IntervalIndex(left=[0, 1],"
+                    "\n              right=[1, 2],"
+                    "\n              closed='right',"
+                    "\n              dtype='interval[int64]')")
+        assert repr(i) == expected
+
+        i = IntervalIndex.from_tuples([(Timestamp('20130101'),
+                                        Timestamp('20130102')),
+                                       (Timestamp('20130102'),
+                                        Timestamp('20130103'))],
+                                      closed='right')
+        expected = ("IntervalIndex(left=['2013-01-01', '2013-01-02'],"
+                    "\n              right=['2013-01-02', '2013-01-03'],"
+                    "\n              closed='right',"
+                    "\n              dtype='interval[datetime64[ns]]')")
+        assert repr(i) == expected
+
+    @pytest.mark.xfail(reason='not a valid repr as we use interval notation')
+    def test_repr_max_seq_item_setting(self):
+        super(TestIntervalIndex, self).test_repr_max_seq_item_setting()
+
+    @pytest.mark.xfail(reason='not a valid repr as we use interval notation')
+    def test_repr_roundtrip(self):
+        super(TestIntervalIndex, self).test_repr_roundtrip()
+
+    def test_get_item(self):
+        i = IntervalIndex.from_arrays((0, 1, np.nan), (1, 2, np.nan),
+                                      closed='right')
+        assert i[0] == Interval(0.0, 1.0)
+        assert i[1] == Interval(1.0, 2.0)
+        assert isna(i[2])
+
+        result = i[0:1]
+        expected = IntervalIndex.from_arrays((0.,), (1.,), closed='right')
+        tm.assert_index_equal(result, expected)
+
+        result = i[0:2]
+        expected = IntervalIndex.from_arrays((0., 1), (1., 2.), closed='right')
+        tm.assert_index_equal(result, expected)
+
+        result = i[1:3]
+        expected = IntervalIndex.from_arrays((1., np.nan), (2., np.nan),
+                                             closed='right')
+        tm.assert_index_equal(result, expected)
+
+    def test_get_loc_value(self):
+        pytest.raises(KeyError, self.index.get_loc, 0)
+        assert self.index.get_loc(0.5) == 0
+        assert self.index.get_loc(1) == 0
+        assert self.index.get_loc(1.5) == 1
+        assert self.index.get_loc(2) == 1
+        pytest.raises(KeyError, self.index.get_loc, -1)
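
`get_loc` on an `IntervalIndex` locates scalars by containment, which means overlapping intervals can legitimately return more than one position. A sketch of that case (the container type of the multi-hit result has varied across pandas versions, so only its size is asserted):

```python
import numpy as np
import pandas as pd

idx = pd.IntervalIndex.from_tuples([(0, 2), (1, 3)])

assert idx.get_loc(0.5) == 0        # contained only in (0, 2]
hits = idx.get_loc(1.5)             # contained in both intervals
assert np.asarray(hits).size == 2
```
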
+        pytest.raises(KeyError, self.index.get_loc, 3)
+
+        idx = IntervalIndex.from_tuples([(0, 2), (1, 3)])
+        assert idx.get_loc(0.5) == 0
+        assert idx.get_loc(1) == 0
+        tm.assert_numpy_array_equal(idx.get_loc(1.5),
+                                    np.array([0, 1], dtype='int64'))
+        tm.assert_numpy_array_equal(np.sort(idx.get_loc(2)),
+                                    np.array([0, 1], dtype='int64'))
+        assert idx.get_loc(3) == 1
+        pytest.raises(KeyError, idx.get_loc, 3.5)
+
+        idx = IntervalIndex.from_arrays([0, 2], [1, 3])
+        pytest.raises(KeyError, idx.get_loc, 1.5)
+
+    def slice_locs_cases(self, breaks):
+        # TODO: same tests for more index types
+        index = IntervalIndex.from_breaks(breaks, closed='right')
+        assert index.slice_locs() == (0, 2)
+        assert index.slice_locs(0, 1) == (0, 1)
+        assert index.slice_locs(1, 1) == (0, 1)
+        assert index.slice_locs(0, 2) == (0, 2)
+        assert index.slice_locs(0.5, 1.5) == (0, 2)
+        assert index.slice_locs(0, 0.5) == (0, 1)
+        assert index.slice_locs(start=1) == (0, 2)
+        assert index.slice_locs(start=1.2) == (1, 2)
+        assert index.slice_locs(end=1) == (0, 1)
+        assert index.slice_locs(end=1.1) == (0, 2)
+        assert index.slice_locs(end=1.0) == (0, 1)
+        assert index.slice_locs(-1, -1) == (0, 0)
+
+        index = IntervalIndex.from_breaks(breaks, closed='neither')
+        assert index.slice_locs(0, 1) == (0, 1)
+        assert index.slice_locs(0, 2) == (0, 2)
+        assert index.slice_locs(0.5, 1.5) == (0, 2)
+        assert index.slice_locs(1, 1) == (1, 1)
+        assert index.slice_locs(1, 2) == (1, 2)
+
+        index = IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)],
+                                          closed='both')
+        assert index.slice_locs(1, 1) == (0, 1)
+        assert index.slice_locs(1, 2) == (0, 2)
+
+    def test_slice_locs_int64(self):
+        self.slice_locs_cases([0, 1, 2])
+
+    def test_slice_locs_float64(self):
+        self.slice_locs_cases([0.0, 1.0, 2.0])
+
+    def slice_locs_decreasing_cases(self, tuples):
+        index = IntervalIndex.from_tuples(tuples)
+        assert index.slice_locs(1.5, 0.5) == (1, 3)
+        assert index.slice_locs(2, 0) == (1, 3)
+        assert index.slice_locs(2, 1) == (1, 3)
+        assert index.slice_locs(3, 1.1) == (0, 3)
+        assert index.slice_locs(3, 3) == (0, 2)
+        assert index.slice_locs(3.5, 3.3) == (0, 1)
+        assert index.slice_locs(1, -3) == (2, 3)
+
+        slice_locs = index.slice_locs(-1, -1)
+        assert slice_locs[0] == slice_locs[1]
+
+    def test_slice_locs_decreasing_int64(self):
+        self.slice_locs_decreasing_cases([(2, 4), (1, 3), (0, 2)])
+
+    def test_slice_locs_decreasing_float64(self):
+        self.slice_locs_decreasing_cases([(2., 4.), (1., 3.), (0., 2.)])
+
+    def test_slice_locs_fails(self):
+        index = IntervalIndex.from_tuples([(1, 2), (0, 1), (2, 3)])
+        with pytest.raises(KeyError):
+            index.slice_locs(1, 2)
+
+    def test_get_loc_interval(self):
+        assert self.index.get_loc(Interval(0, 1)) == 0
+        assert self.index.get_loc(Interval(0, 0.5)) == 0
+        assert self.index.get_loc(Interval(0, 1, 'left')) == 0
+        pytest.raises(KeyError, self.index.get_loc, Interval(2, 3))
+        pytest.raises(KeyError, self.index.get_loc,
+                      Interval(-1, 0, 'left'))
+
+    def test_get_indexer(self):
+        actual = self.index.get_indexer([-1, 0, 0.5, 1, 1.5, 2, 3])
+        expected = np.array([-1, -1, 0, 0, 1, 1, -1], dtype='intp')
+        tm.assert_numpy_array_equal(actual, expected)
+
+        actual = self.index.get_indexer(self.index)
+        expected = np.array([0, 1], dtype='intp')
+        tm.assert_numpy_array_equal(actual, expected)
+
+        index = IntervalIndex.from_breaks([0, 1, 2], closed='left')
+        actual = index.get_indexer([-1, 0, 0.5, 1, 1.5, 2, 3])
+        expected = np.array([-1, 0, 0, 1, 1, -1, -1], dtype='intp')
+        tm.assert_numpy_array_equal(actual, expected)
+
+        actual = self.index.get_indexer(index[:1])
+        expected = np.array([0], dtype='intp')
+        tm.assert_numpy_array_equal(actual, expected)
+
+        actual = self.index.get_indexer(index)
+        expected = np.array([-1, 1], dtype='intp')
+        tm.assert_numpy_array_equal(actual, expected)
+
+    def test_get_indexer_subintervals(self):
+
+        # TODO: is this right?
+        # return indexers for wholly contained subintervals
+        target = IntervalIndex.from_breaks(np.linspace(0, 2, 5))
+        actual = self.index.get_indexer(target)
+        expected = np.array([0, 0, 1, 1], dtype='intp')
+        tm.assert_numpy_array_equal(actual, expected)
+
+        target = IntervalIndex.from_breaks([0, 0.67, 1.33, 2])
+        actual = self.index.get_indexer(target)
+        expected = np.array([0, 0, 1, 1], dtype='intp')
+        tm.assert_numpy_array_equal(actual, expected)
+
+        actual = self.index.get_indexer(target[[0, -1]])
+        expected = np.array([0, 1], dtype='intp')
+        tm.assert_numpy_array_equal(actual, expected)
+
+        target = IntervalIndex.from_breaks([0, 0.33, 0.67, 1], closed='left')
+        actual = self.index.get_indexer(target)
+        expected = np.array([0, 0, 0], dtype='intp')
+        tm.assert_numpy_array_equal(actual, expected)
+
+    def test_contains(self):
+        # Only endpoints are valid.
+        i = IntervalIndex.from_arrays([0, 1], [1, 2])
+
+        # Invalid
+        assert 0 not in i
+        assert 1 not in i
+        assert 2 not in i
+
+        # Valid
+        assert Interval(0, 1) in i
+        assert Interval(0, 2) in i
+        assert Interval(0, 0.5) in i
+        assert Interval(3, 5) not in i
+        assert Interval(-1, 0, closed='left') not in i
+
+    def test_contains_method(self):
+        # can select values that are IN the range of a value
+        i = IntervalIndex.from_arrays([0, 1], [1, 2])
+
+        assert i.contains(0.1)
+        assert i.contains(0.5)
+        assert i.contains(1)
+        assert i.contains(Interval(0, 1))
+        assert i.contains(Interval(0, 2))
+
+        # these overlap completely
+        assert i.contains(Interval(0, 3))
+        assert i.contains(Interval(1, 3))
+
+        assert not i.contains(20)
+        assert not i.contains(-20)
+
+    def test_dropna(self):
+
+        expected = IntervalIndex.from_tuples([(0.0, 1.0), (1.0, 2.0)])
+
+        ii = IntervalIndex.from_tuples([(0, 1), (1, 2), np.nan])
+        result = ii.dropna()
+        tm.assert_index_equal(result, expected)
+
+        ii = IntervalIndex.from_arrays([0, 1, np.nan], [1, 2, np.nan])
+        result = ii.dropna()
+        tm.assert_index_equal(result, expected)
+
+    def test_non_contiguous(self):
+        index = IntervalIndex.from_tuples([(0, 1), (2, 3)])
+        target = [0.5, 1.5, 2.5]
+        actual = index.get_indexer(target)
+        expected = np.array([0, -1, 1], dtype='intp')
+        tm.assert_numpy_array_equal(actual, expected)
+
+        assert 1.5 not in index
+
+    def test_union(self):
+        other = IntervalIndex.from_arrays([2], [3])
+        expected = IntervalIndex.from_arrays(range(3), range(1, 4))
+        actual = self.index.union(other)
+        assert expected.equals(actual)
+
+        actual = other.union(self.index)
+        assert expected.equals(actual)
+
+        tm.assert_index_equal(self.index.union(self.index), self.index)
+        tm.assert_index_equal(self.index.union(self.index[:1]),
+                              self.index)
+
+    def test_intersection(self):
+        other = IntervalIndex.from_breaks([1, 2, 3])
+        expected = IntervalIndex.from_breaks([1, 2])
+        actual = self.index.intersection(other)
+        assert expected.equals(actual)
+
+        tm.assert_index_equal(self.index.intersection(self.index),
+                              self.index)
+
+    def test_difference(self):
+        tm.assert_index_equal(self.index.difference(self.index[:1]),
+                              self.index[1:])
+
+    def test_symmetric_difference(self):
+        result = self.index[:1].symmetric_difference(self.index[1:])
+        expected = self.index
+        tm.assert_index_equal(result, expected)
+
+    def
test_set_operation_errors(self): + pytest.raises(ValueError, self.index.union, self.index.left) + + other = IntervalIndex.from_breaks([0, 1, 2], closed='neither') + pytest.raises(ValueError, self.index.union, other) + + def test_isin(self): + actual = self.index.isin(self.index) + tm.assert_numpy_array_equal(np.array([True, True]), actual) + + actual = self.index.isin(self.index[:1]) + tm.assert_numpy_array_equal(np.array([True, False]), actual) + + def test_comparison(self): + actual = Interval(0, 1) < self.index + expected = np.array([False, True]) + tm.assert_numpy_array_equal(actual, expected) + + actual = Interval(0.5, 1.5) < self.index + expected = np.array([False, True]) + tm.assert_numpy_array_equal(actual, expected) + actual = self.index > Interval(0.5, 1.5) + tm.assert_numpy_array_equal(actual, expected) + + actual = self.index == self.index + expected = np.array([True, True]) + tm.assert_numpy_array_equal(actual, expected) + actual = self.index <= self.index + tm.assert_numpy_array_equal(actual, expected) + actual = self.index >= self.index + tm.assert_numpy_array_equal(actual, expected) + + actual = self.index < self.index + expected = np.array([False, False]) + tm.assert_numpy_array_equal(actual, expected) + actual = self.index > self.index + tm.assert_numpy_array_equal(actual, expected) + + actual = self.index == IntervalIndex.from_breaks([0, 1, 2], 'left') + tm.assert_numpy_array_equal(actual, expected) + + actual = self.index == self.index.values + tm.assert_numpy_array_equal(actual, np.array([True, True])) + actual = self.index.values == self.index + tm.assert_numpy_array_equal(actual, np.array([True, True])) + actual = self.index <= self.index.values + tm.assert_numpy_array_equal(actual, np.array([True, True])) + actual = self.index != self.index.values + tm.assert_numpy_array_equal(actual, np.array([False, False])) + actual = self.index > self.index.values + tm.assert_numpy_array_equal(actual, np.array([False, False])) + actual = self.index.values > self.index + tm.assert_numpy_array_equal(actual, np.array([False, False])) + + # invalid comparisons + actual = self.index == 0 + tm.assert_numpy_array_equal(actual, np.array([False, False])) + actual = self.index == self.index.left + tm.assert_numpy_array_equal(actual, np.array([False, False])) + + with tm.assert_raises_regex(TypeError, 'unorderable types'): + self.index > 0 + with tm.assert_raises_regex(TypeError, 'unorderable types'): + self.index <= 0 + with pytest.raises(TypeError): + self.index > np.arange(2) + with pytest.raises(ValueError): + self.index > np.arange(3) + + def test_missing_values(self): + idx = pd.Index([np.nan, pd.Interval(0, 1), pd.Interval(1, 2)]) + idx2 = pd.IntervalIndex.from_arrays([np.nan, 0, 1], [np.nan, 1, 2]) + assert idx.equals(idx2) + + with pytest.raises(ValueError): + IntervalIndex.from_arrays([np.nan, 0, 1], np.array([0, 1, 2])) + + tm.assert_numpy_array_equal(isna(idx), + np.array([True, False, False])) + + def test_sort_values(self): + expected = IntervalIndex.from_breaks([1, 2, 3, 4]) + actual = IntervalIndex.from_tuples([(3, 4), (1, 2), + (2, 3)]).sort_values() + tm.assert_index_equal(expected, actual) + + # nan + idx = self.index_with_nan + mask = idx.isna() + tm.assert_numpy_array_equal(mask, np.array([False, True, False])) + + result = idx.sort_values() + mask = result.isna() + tm.assert_numpy_array_equal(mask, np.array([False, False, True])) + + result = idx.sort_values(ascending=False) + mask = result.isna() + tm.assert_numpy_array_equal(mask, np.array([True, False, False])) + + 
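
The comparison block above leans on broadcasting: a scalar `Interval` compared against an `IntervalIndex` is evaluated elementwise, with intervals ordered like their `(left, right)` tuples. A compact restatement of the first case:

```python
import numpy as np
import pandas as pd

idx = pd.IntervalIndex.from_arrays([0, 1], [1, 2])   # (0, 1], (1, 2]

result = pd.Interval(0, 1) < idx   # not less than itself; less than (1, 2]
np.testing.assert_array_equal(result, np.array([False, True]))
```
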
+    def test_datetime(self):
+        dates = pd.date_range('2000', periods=3)
+        idx = IntervalIndex.from_breaks(dates)
+
+        tm.assert_index_equal(idx.left, dates[:2])
+        tm.assert_index_equal(idx.right, dates[-2:])
+
+        expected = pd.date_range('2000-01-01T12:00', periods=2)
+        tm.assert_index_equal(idx.mid, expected)
+
+        assert pd.Timestamp('2000-01-01T12') not in idx
+
+        target = pd.date_range('1999-12-31T12:00', periods=7, freq='12H')
+        actual = idx.get_indexer(target)
+
+        expected = np.array([-1, -1, 0, 0, 1, 1, -1], dtype='intp')
+        tm.assert_numpy_array_equal(actual, expected)
+
+    def test_append(self):
+
+        index1 = IntervalIndex.from_arrays([0, 1], [1, 2])
+        index2 = IntervalIndex.from_arrays([1, 2], [2, 3])
+
+        result = index1.append(index2)
+        expected = IntervalIndex.from_arrays([0, 1, 1, 2], [1, 2, 2, 3])
+        tm.assert_index_equal(result, expected)
+
+        result = index1.append([index1, index2])
+        expected = IntervalIndex.from_arrays([0, 1, 0, 1, 1, 2],
+                                             [1, 2, 1, 2, 2, 3])
+        tm.assert_index_equal(result, expected)
+
+        def f():
+            index1.append(IntervalIndex.from_arrays([0, 1], [1, 2],
+                                                    closed='both'))
+
+        pytest.raises(ValueError, f)
+
+    def test_is_non_overlapping_monotonic(self):
+        # Should be True in all cases
+        tpls = [(0, 1), (2, 3), (4, 5), (6, 7)]
+        for closed in ('left', 'right', 'neither', 'both'):
+            idx = IntervalIndex.from_tuples(tpls, closed=closed)
+            assert idx.is_non_overlapping_monotonic is True
+
+            idx = IntervalIndex.from_tuples(reversed(tpls), closed=closed)
+            assert idx.is_non_overlapping_monotonic is True
+
+        # Should be False in all cases (overlapping)
+        tpls = [(0, 2), (1, 3), (4, 5), (6, 7)]
+        for closed in ('left', 'right', 'neither', 'both'):
+            idx = IntervalIndex.from_tuples(tpls, closed=closed)
+            assert idx.is_non_overlapping_monotonic is False
+
+            idx = IntervalIndex.from_tuples(reversed(tpls), closed=closed)
+            assert idx.is_non_overlapping_monotonic is False
+
+        # Should be False in all cases (non-monotonic)
+        tpls = [(0, 1), (2, 3), (6, 7), (4, 5)]
+        for closed in ('left', 'right', 'neither', 'both'):
+            idx = IntervalIndex.from_tuples(tpls, closed=closed)
+            assert idx.is_non_overlapping_monotonic is False
+
+            idx = IntervalIndex.from_tuples(reversed(tpls), closed=closed)
+            assert idx.is_non_overlapping_monotonic is False
+
+        # Should be False for closed='both', otherwise True (GH16560)
+        idx = IntervalIndex.from_breaks(range(4), closed='both')
+        assert idx.is_non_overlapping_monotonic is False
+
+        for closed in ('left', 'right', 'neither'):
+            idx = IntervalIndex.from_breaks(range(4), closed=closed)
+            assert idx.is_non_overlapping_monotonic is True
+
+
+class TestIntervalRange(object):
+
+    def test_construction(self):
+        result = interval_range(0, 5, name='foo', closed='both')
+        expected = IntervalIndex.from_breaks(
+            np.arange(0, 5), name='foo', closed='both')
+        tm.assert_index_equal(result, expected)
+
+    def test_errors(self):
+
+        # not enough params
+        def f():
+            interval_range(0)
+
+        pytest.raises(ValueError, f)
+
+        def f():
+            interval_range(periods=2)
+
+        pytest.raises(ValueError, f)
+
+        def f():
+            interval_range()
+
+        pytest.raises(ValueError, f)
+
+        # mixed units
+        def f():
+            interval_range(0, Timestamp('20130101'), freq=2)
+
+        pytest.raises(ValueError, f)
+
+        def f():
+            interval_range(0, 10, freq=Timedelta('1day'))
+
+        pytest.raises(ValueError, f)
+
+
+class TestIntervalTree(object):
+    def setup_method(self, method):
+        gentree = lambda dtype: IntervalTree(np.arange(5, dtype=dtype),
+                                             np.arange(5, dtype=dtype) + 2)
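
`IntervalTree` is a private helper (`pandas._libs.interval`), so the sketch below is illustrative only; it mirrors the fixture built here, five right-closed intervals `(i, i + 2]`. `get_indexer` answers point queries that land in exactly one interval:

```python
import numpy as np
from pandas._libs.interval import IntervalTree

tree = IntervalTree(np.arange(5, dtype='int64'),
                    np.arange(5, dtype='int64') + 2)

# 1.0 falls only inside (0, 2]; 5.5 falls only inside (4, 6]
assert list(tree.get_indexer(np.array([1.0, 5.5]))) == [0, 4]
```
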
+ self.tree = gentree('int64') + self.trees = {dtype: gentree(dtype) + for dtype in ['int32', 'int64', 'float32', 'float64']} + + def test_get_loc(self): + for dtype, tree in self.trees.items(): + tm.assert_numpy_array_equal(tree.get_loc(1), + np.array([0], dtype='int64')) + tm.assert_numpy_array_equal(np.sort(tree.get_loc(2)), + np.array([0, 1], dtype='int64')) + with pytest.raises(KeyError): + tree.get_loc(-1) + + def test_get_indexer(self): + for dtype, tree in self.trees.items(): + tm.assert_numpy_array_equal( + tree.get_indexer(np.array([1.0, 5.5, 6.5])), + np.array([0, 4, -1], dtype='int64')) + with pytest.raises(KeyError): + tree.get_indexer(np.array([3.0])) + + def test_get_indexer_non_unique(self): + indexer, missing = self.tree.get_indexer_non_unique( + np.array([1.0, 2.0, 6.5])) + tm.assert_numpy_array_equal(indexer[:1], + np.array([0], dtype='int64')) + tm.assert_numpy_array_equal(np.sort(indexer[1:3]), + np.array([0, 1], dtype='int64')) + tm.assert_numpy_array_equal(np.sort(indexer[3:]), + np.array([-1], dtype='int64')) + tm.assert_numpy_array_equal(missing, np.array([2], dtype='int64')) + + def test_duplicates(self): + tree = IntervalTree([0, 0, 0], [1, 1, 1]) + tm.assert_numpy_array_equal(np.sort(tree.get_loc(0.5)), + np.array([0, 1, 2], dtype='int64')) + + with pytest.raises(KeyError): + tree.get_indexer(np.array([0.5])) + + indexer, missing = tree.get_indexer_non_unique(np.array([0.5])) + tm.assert_numpy_array_equal(np.sort(indexer), + np.array([0, 1, 2], dtype='int64')) + tm.assert_numpy_array_equal(missing, np.array([], dtype='int64')) + + def test_get_loc_closed(self): + for closed in ['left', 'right', 'both', 'neither']: + tree = IntervalTree([0], [1], closed=closed) + for p, errors in [(0, tree.open_left), + (1, tree.open_right)]: + if errors: + with pytest.raises(KeyError): + tree.get_loc(p) + else: + tm.assert_numpy_array_equal(tree.get_loc(p), + np.array([0], dtype='int64')) + + @pytest.mark.skipif(compat.is_platform_32bit(), + reason="int type mismatch on 32bit") + def test_get_indexer_closed(self): + x = np.arange(1000, dtype='float64') + found = x.astype('intp') + not_found = (-1 * np.ones(1000)).astype('intp') + + for leaf_size in [1, 10, 100, 10000]: + for closed in ['left', 'right', 'both', 'neither']: + tree = IntervalTree(x, x + 0.5, closed=closed, + leaf_size=leaf_size) + tm.assert_numpy_array_equal(found, + tree.get_indexer(x + 0.25)) + + expected = found if tree.closed_left else not_found + tm.assert_numpy_array_equal(expected, + tree.get_indexer(x + 0.0)) + + expected = found if tree.closed_right else not_found + tm.assert_numpy_array_equal(expected, + tree.get_indexer(x + 0.5)) diff --git a/pandas/tests/indexes/test_multi.py b/pandas/tests/indexes/test_multi.py index 7d9ceb526b912..798d244468961 100644 --- a/pandas/tests/indexes/test_multi.py +++ b/pandas/tests/indexes/test_multi.py @@ -6,7 +6,7 @@ from datetime import timedelta from itertools import product -import nose +import pytest import numpy as np @@ -15,25 +15,23 @@ from pandas import (CategoricalIndex, DataFrame, Index, MultiIndex, compat, date_range, period_range) from pandas.compat import PY3, long, lrange, lzip, range, u -from pandas.core.common import PerformanceWarning, UnsortedIndexError -from pandas.indexes.base import InvalidIndexError -from pandas.lib import Timestamp +from pandas.errors import PerformanceWarning, UnsortedIndexError +from pandas.core.indexes.base import InvalidIndexError +from pandas._libs import lib +from pandas._libs.lib import Timestamp import pandas.util.testing 
as tm -from pandas.util.testing import (assertRaises, assertRaisesRegexp, - assert_almost_equal, assert_copy) - +from pandas.util.testing import assert_almost_equal, assert_copy from .common import Base -class TestMultiIndex(Base, tm.TestCase): +class TestMultiIndex(Base): _holder = MultiIndex - _multiprocess_can_split_ = True _compat_props = ['shape', 'ndim', 'size', 'itemsize'] - def setUp(self): + def setup_method(self, method): major_axis = Index(['foo', 'bar', 'baz', 'qux']) minor_axis = Index(['one', 'two']) @@ -61,25 +59,25 @@ def f(): if common: pass - tm.assertRaisesRegexp(ValueError, 'The truth value of a', f) + tm.assert_raises_regex(ValueError, 'The truth value of a', f) def test_labels_dtypes(self): # GH 8456 i = MultiIndex.from_tuples([('A', 1), ('A', 2)]) - self.assertTrue(i.labels[0].dtype == 'int8') - self.assertTrue(i.labels[1].dtype == 'int8') + assert i.labels[0].dtype == 'int8' + assert i.labels[1].dtype == 'int8' i = MultiIndex.from_product([['a'], range(40)]) - self.assertTrue(i.labels[1].dtype == 'int8') + assert i.labels[1].dtype == 'int8' i = MultiIndex.from_product([['a'], range(400)]) - self.assertTrue(i.labels[1].dtype == 'int16') + assert i.labels[1].dtype == 'int16' i = MultiIndex.from_product([['a'], range(40000)]) - self.assertTrue(i.labels[1].dtype == 'int32') + assert i.labels[1].dtype == 'int32' i = pd.MultiIndex.from_product([['a'], range(1000)]) - self.assertTrue((i.labels[0] >= 0).all()) - self.assertTrue((i.labels[1] >= 0).all()) + assert (i.labels[0] >= 0).all() + assert (i.labels[1] >= 0).all() def test_where(self): i = MultiIndex.from_tuples([('A', 1), ('A', 2)]) @@ -87,7 +85,16 @@ def test_where(self): def f(): i.where(True) - self.assertRaises(NotImplementedError, f) + pytest.raises(NotImplementedError, f) + + def test_where_array_like(self): + i = MultiIndex.from_tuples([('A', 1), ('A', 2)]) + klasses = [list, tuple, np.array, pd.Series] + cond = [False, True] + + for klass in klasses: + f = lambda: i.where(klass(cond)) + pytest.raises(NotImplementedError, f) def test_repeat(self): reps = 2 @@ -116,39 +123,40 @@ def test_numpy_repeat(self): tm.assert_index_equal(np.repeat(m, reps), expected) msg = "the 'axis' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.repeat, m, reps, axis=1) + tm.assert_raises_regex( + ValueError, msg, np.repeat, m, reps, axis=1) def test_set_name_methods(self): # so long as these are synonyms, we don't need to test set_names - self.assertEqual(self.index.rename, self.index.set_names) + assert self.index.rename == self.index.set_names new_names = [name + "SUFFIX" for name in self.index_names] ind = self.index.set_names(new_names) - self.assertEqual(self.index.names, self.index_names) - self.assertEqual(ind.names, new_names) - with assertRaisesRegexp(ValueError, "^Length"): + assert self.index.names == self.index_names + assert ind.names == new_names + with tm.assert_raises_regex(ValueError, "^Length"): ind.set_names(new_names + new_names) new_names2 = [name + "SUFFIX2" for name in new_names] res = ind.set_names(new_names2, inplace=True) - self.assertIsNone(res) - self.assertEqual(ind.names, new_names2) + assert res is None + assert ind.names == new_names2 # set names for specific level (# GH7792) ind = self.index.set_names(new_names[0], level=0) - self.assertEqual(self.index.names, self.index_names) - self.assertEqual(ind.names, [new_names[0], self.index_names[1]]) + assert self.index.names == self.index_names + assert ind.names == [new_names[0], self.index_names[1]] res = 
ind.set_names(new_names2[0], level=0, inplace=True) - self.assertIsNone(res) - self.assertEqual(ind.names, [new_names2[0], self.index_names[1]]) + assert res is None + assert ind.names == [new_names2[0], self.index_names[1]] # set names for multiple levels ind = self.index.set_names(new_names, level=[0, 1]) - self.assertEqual(self.index.names, self.index_names) - self.assertEqual(ind.names, new_names) + assert self.index.names == self.index_names + assert ind.names == new_names res = ind.set_names(new_names2, level=[0, 1], inplace=True) - self.assertIsNone(res) - self.assertEqual(ind.names, new_names2) + assert res is None + assert ind.names == new_names2 def test_set_levels(self): # side note - you probably wouldn't want to use levels and labels @@ -159,7 +167,7 @@ def test_set_levels(self): def assert_matching(actual, expected, check_dtype=False): # avoid specifying internal representation # as much as possible - self.assertEqual(len(actual), len(expected)) + assert len(actual) == len(expected) for act, exp in zip(actual, expected): act = np.asarray(act) exp = np.asarray(exp) @@ -173,7 +181,7 @@ def assert_matching(actual, expected, check_dtype=False): # level changing [w/ mutation] ind2 = self.index.copy() inplace_return = ind2.set_levels(new_levels, inplace=True) - self.assertIsNone(inplace_return) + assert inplace_return is None assert_matching(ind2.levels, new_levels) # level changing specific level [w/o mutation] @@ -193,13 +201,13 @@ def assert_matching(actual, expected, check_dtype=False): # level changing specific level [w/ mutation] ind2 = self.index.copy() inplace_return = ind2.set_levels(new_levels[0], level=0, inplace=True) - self.assertIsNone(inplace_return) + assert inplace_return is None assert_matching(ind2.levels, [new_levels[0], levels[1]]) assert_matching(self.index.levels, levels) ind2 = self.index.copy() inplace_return = ind2.set_levels(new_levels[1], level=1, inplace=True) - self.assertIsNone(inplace_return) + assert inplace_return is None assert_matching(ind2.levels, [levels[0], new_levels[1]]) assert_matching(self.index.levels, levels) @@ -207,7 +215,7 @@ def assert_matching(actual, expected, check_dtype=False): ind2 = self.index.copy() inplace_return = ind2.set_levels(new_levels, level=[0, 1], inplace=True) - self.assertIsNone(inplace_return) + assert inplace_return is None assert_matching(ind2.levels, new_levels) assert_matching(self.index.levels, levels) @@ -215,23 +223,23 @@ def assert_matching(actual, expected, check_dtype=False): # GH 13754 original_index = self.index.copy() for inplace in [True, False]: - with assertRaisesRegexp(ValueError, "^On"): + with tm.assert_raises_regex(ValueError, "^On"): self.index.set_levels(['c'], level=0, inplace=inplace) assert_matching(self.index.levels, original_index.levels, check_dtype=True) - with assertRaisesRegexp(ValueError, "^On"): + with tm.assert_raises_regex(ValueError, "^On"): self.index.set_labels([0, 1, 2, 3, 4, 5], level=0, inplace=inplace) assert_matching(self.index.labels, original_index.labels, check_dtype=True) - with assertRaisesRegexp(TypeError, "^Levels"): + with tm.assert_raises_regex(TypeError, "^Levels"): self.index.set_levels('c', level=0, inplace=inplace) assert_matching(self.index.levels, original_index.levels, check_dtype=True) - with assertRaisesRegexp(TypeError, "^Labels"): + with tm.assert_raises_regex(TypeError, "^Labels"): self.index.set_labels(1, level=0, inplace=inplace) assert_matching(self.index.labels, original_index.labels, check_dtype=True) @@ -248,7 +256,7 @@ def 
test_set_labels(self): def assert_matching(actual, expected): # avoid specifying internal representation # as much as possible - self.assertEqual(len(actual), len(expected)) + assert len(actual) == len(expected) for act, exp in zip(actual, expected): act = np.asarray(act) exp = np.asarray(exp, dtype=np.int8) @@ -262,7 +270,7 @@ def assert_matching(actual, expected): # label changing [w/ mutation] ind2 = self.index.copy() inplace_return = ind2.set_labels(new_labels, inplace=True) - self.assertIsNone(inplace_return) + assert inplace_return is None assert_matching(ind2.labels, new_labels) # label changing specific level [w/o mutation] @@ -282,13 +290,13 @@ def assert_matching(actual, expected): # label changing specific level [w/ mutation] ind2 = self.index.copy() inplace_return = ind2.set_labels(new_labels[0], level=0, inplace=True) - self.assertIsNone(inplace_return) + assert inplace_return is None assert_matching(ind2.labels, [new_labels[0], labels[1]]) assert_matching(self.index.labels, labels) ind2 = self.index.copy() inplace_return = ind2.set_labels(new_labels[1], level=1, inplace=True) - self.assertIsNone(inplace_return) + assert inplace_return is None assert_matching(ind2.labels, [labels[0], new_labels[1]]) assert_matching(self.index.labels, labels) @@ -296,7 +304,7 @@ def assert_matching(actual, expected): ind2 = self.index.copy() inplace_return = ind2.set_labels(new_labels, level=[0, 1], inplace=True) - self.assertIsNone(inplace_return) + assert inplace_return is None assert_matching(ind2.labels, new_labels) assert_matching(self.index.labels, labels) @@ -304,46 +312,46 @@ def test_set_levels_labels_names_bad_input(self): levels, labels = self.index.levels, self.index.labels names = self.index.names - with tm.assertRaisesRegexp(ValueError, 'Length of levels'): + with tm.assert_raises_regex(ValueError, 'Length of levels'): self.index.set_levels([levels[0]]) - with tm.assertRaisesRegexp(ValueError, 'Length of labels'): + with tm.assert_raises_regex(ValueError, 'Length of labels'): self.index.set_labels([labels[0]]) - with tm.assertRaisesRegexp(ValueError, 'Length of names'): + with tm.assert_raises_regex(ValueError, 'Length of names'): self.index.set_names([names[0]]) # shouldn't scalar data error, instead should demand list-like - with tm.assertRaisesRegexp(TypeError, 'list of lists-like'): + with tm.assert_raises_regex(TypeError, 'list of lists-like'): self.index.set_levels(levels[0]) # shouldn't scalar data error, instead should demand list-like - with tm.assertRaisesRegexp(TypeError, 'list of lists-like'): + with tm.assert_raises_regex(TypeError, 'list of lists-like'): self.index.set_labels(labels[0]) # shouldn't scalar data error, instead should demand list-like - with tm.assertRaisesRegexp(TypeError, 'list-like'): + with tm.assert_raises_regex(TypeError, 'list-like'): self.index.set_names(names[0]) # should have equal lengths - with tm.assertRaisesRegexp(TypeError, 'list of lists-like'): + with tm.assert_raises_regex(TypeError, 'list of lists-like'): self.index.set_levels(levels[0], level=[0, 1]) - with tm.assertRaisesRegexp(TypeError, 'list-like'): + with tm.assert_raises_regex(TypeError, 'list-like'): self.index.set_levels(levels, level=0) # should have equal lengths - with tm.assertRaisesRegexp(TypeError, 'list of lists-like'): + with tm.assert_raises_regex(TypeError, 'list of lists-like'): self.index.set_labels(labels[0], level=[0, 1]) - with tm.assertRaisesRegexp(TypeError, 'list-like'): + with tm.assert_raises_regex(TypeError, 'list-like'): self.index.set_labels(labels, 
level=0) # should have equal lengths - with tm.assertRaisesRegexp(ValueError, 'Length of names'): + with tm.assert_raises_regex(ValueError, 'Length of names'): self.index.set_names(names[0], level=[0, 1]) - with tm.assertRaisesRegexp(TypeError, 'string'): + with tm.assert_raises_regex(TypeError, 'string'): self.index.set_names(names, level=0) def test_set_levels_categorical(self): @@ -366,57 +374,64 @@ def test_metadata_immutable(self): levels, labels = self.index.levels, self.index.labels # shouldn't be able to set at either the top level or base level mutable_regex = re.compile('does not support mutable operations') - with assertRaisesRegexp(TypeError, mutable_regex): + with tm.assert_raises_regex(TypeError, mutable_regex): levels[0] = levels[0] - with assertRaisesRegexp(TypeError, mutable_regex): + with tm.assert_raises_regex(TypeError, mutable_regex): levels[0][0] = levels[0][0] # ditto for labels - with assertRaisesRegexp(TypeError, mutable_regex): + with tm.assert_raises_regex(TypeError, mutable_regex): labels[0] = labels[0] - with assertRaisesRegexp(TypeError, mutable_regex): + with tm.assert_raises_regex(TypeError, mutable_regex): labels[0][0] = labels[0][0] # and for names names = self.index.names - with assertRaisesRegexp(TypeError, mutable_regex): + with tm.assert_raises_regex(TypeError, mutable_regex): names[0] = names[0] def test_inplace_mutation_resets_values(self): levels = [['a', 'b', 'c'], [4]] levels2 = [[1, 2, 3], ['a']] labels = [[0, 1, 0, 2, 2, 0], [0, 0, 0, 0, 0, 0]] + mi1 = MultiIndex(levels=levels, labels=labels) mi2 = MultiIndex(levels=levels2, labels=labels) vals = mi1.values.copy() vals2 = mi2.values.copy() - self.assertIsNotNone(mi1._tuples) - # make sure level setting works + assert mi1._tuples is not None + + # Make sure level setting works new_vals = mi1.set_levels(levels2).values - assert_almost_equal(vals2, new_vals) - # non-inplace doesn't kill _tuples [implementation detail] - assert_almost_equal(mi1._tuples, vals) - # and values is still same too - assert_almost_equal(mi1.values, vals) + tm.assert_almost_equal(vals2, new_vals) + + # Non-inplace doesn't kill _tuples [implementation detail] + tm.assert_almost_equal(mi1._tuples, vals) - # inplace should kill _tuples + # ...and values is still same too + tm.assert_almost_equal(mi1.values, vals) + + # Inplace should kill _tuples mi1.set_levels(levels2, inplace=True) - assert_almost_equal(mi1.values, vals2) + tm.assert_almost_equal(mi1.values, vals2) - # make sure label setting works too + # Make sure label setting works too labels2 = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]] exp_values = np.empty((6, ), dtype=object) exp_values[:] = [(long(1), 'a')] * 6 - # must be 1d array of tuples - self.assertEqual(exp_values.shape, (6, )) + + # Must be 1d array of tuples + assert exp_values.shape == (6, ) new_values = mi2.set_labels(labels2).values - # not inplace shouldn't change - assert_almost_equal(mi2._tuples, vals2) - # should have correct values - assert_almost_equal(exp_values, new_values) - # and again setting inplace should kill _tuples, etc + # Not inplace shouldn't change + tm.assert_almost_equal(mi2._tuples, vals2) + + # Should have correct values + tm.assert_almost_equal(exp_values, new_values) + + # ...and again setting inplace should kill _tuples, etc mi2.set_labels(labels2, inplace=True) - assert_almost_equal(mi2.values, new_values) + tm.assert_almost_equal(mi2.values, new_values) def test_copy_in_constructor(self): levels = np.array(["a", "b", "c"]) @@ -424,12 +439,12 @@ def 
test_copy_in_constructor(self): val = labels[0] mi = MultiIndex(levels=[levels, levels], labels=[labels, labels], copy=True) - self.assertEqual(mi.labels[0][0], val) + assert mi.labels[0][0] == val labels[0] = 15 - self.assertEqual(mi.labels[0][0], val) + assert mi.labels[0][0] == val val = levels[0] levels[0] = "PANDA" - self.assertEqual(mi.levels[0][0], val) + assert mi.levels[0][0] == val def test_set_value_keeps_names(self): # motivating example from #3742 @@ -441,11 +456,11 @@ def test_set_value_keeps_names(self): columns=['one', 'two', 'three', 'four'], index=idx) df = df.sort_index() - self.assertIsNone(df.is_copy) - self.assertEqual(df.index.names, ('Name', 'Number')) + assert df.is_copy is None + assert df.index.names == ('Name', 'Number') df = df.set_value(('grethe', '4'), 'one', 99.34) - self.assertIsNone(df.is_copy) - self.assertEqual(df.index.names, ('Name', 'Number')) + assert df.is_copy is None + assert df.index.names == ('Name', 'Number') def test_copy_names(self): # Check that adding a "names" parameter to the copy is honored @@ -453,62 +468,63 @@ def test_copy_names(self): multi_idx = pd.Index([(1, 2), (3, 4)], names=['MyName1', 'MyName2']) multi_idx1 = multi_idx.copy() - self.assertTrue(multi_idx.equals(multi_idx1)) - self.assertEqual(multi_idx.names, ['MyName1', 'MyName2']) - self.assertEqual(multi_idx1.names, ['MyName1', 'MyName2']) + assert multi_idx.equals(multi_idx1) + assert multi_idx.names == ['MyName1', 'MyName2'] + assert multi_idx1.names == ['MyName1', 'MyName2'] multi_idx2 = multi_idx.copy(names=['NewName1', 'NewName2']) - self.assertTrue(multi_idx.equals(multi_idx2)) - self.assertEqual(multi_idx.names, ['MyName1', 'MyName2']) - self.assertEqual(multi_idx2.names, ['NewName1', 'NewName2']) + assert multi_idx.equals(multi_idx2) + assert multi_idx.names == ['MyName1', 'MyName2'] + assert multi_idx2.names == ['NewName1', 'NewName2'] multi_idx3 = multi_idx.copy(name=['NewName1', 'NewName2']) - self.assertTrue(multi_idx.equals(multi_idx3)) - self.assertEqual(multi_idx.names, ['MyName1', 'MyName2']) - self.assertEqual(multi_idx3.names, ['NewName1', 'NewName2']) + assert multi_idx.equals(multi_idx3) + assert multi_idx.names == ['MyName1', 'MyName2'] + assert multi_idx3.names == ['NewName1', 'NewName2'] def test_names(self): - # names are assigned in __init__ + # names are assigned in setup names = self.index_names level_names = [level.name for level in self.index.levels] - self.assertEqual(names, level_names) + assert names == level_names # setting bad names on existing index = self.index - assertRaisesRegexp(ValueError, "^Length of names", setattr, index, - "names", list(index.names) + ["third"]) - assertRaisesRegexp(ValueError, "^Length of names", setattr, index, - "names", []) + tm.assert_raises_regex(ValueError, "^Length of names", + setattr, index, "names", + list(index.names) + ["third"]) + tm.assert_raises_regex(ValueError, "^Length of names", + setattr, index, "names", []) # initializing with bad names (should always be equivalent) major_axis, minor_axis = self.index.levels major_labels, minor_labels = self.index.labels - assertRaisesRegexp(ValueError, "^Length of names", MultiIndex, - levels=[major_axis, minor_axis], - labels=[major_labels, minor_labels], - names=['first']) - assertRaisesRegexp(ValueError, "^Length of names", MultiIndex, - levels=[major_axis, minor_axis], - labels=[major_labels, minor_labels], - names=['first', 'second', 'third']) + tm.assert_raises_regex(ValueError, "^Length of names", MultiIndex, + levels=[major_axis, minor_axis], + 
labels=[major_labels, minor_labels], + names=['first']) + tm.assert_raises_regex(ValueError, "^Length of names", MultiIndex, + levels=[major_axis, minor_axis], + labels=[major_labels, minor_labels], + names=['first', 'second', 'third']) # names are assigned index.names = ["a", "b"] ind_names = list(index.names) level_names = [level.name for level in index.levels] - self.assertEqual(ind_names, level_names) + assert ind_names == level_names def test_reference_duplicate_name(self): idx = MultiIndex.from_tuples( [('a', 'b'), ('c', 'd')], names=['x', 'x']) - self.assertTrue(idx._reference_duplicate_name('x')) + assert idx._reference_duplicate_name('x') idx = MultiIndex.from_tuples( [('a', 'b'), ('c', 'd')], names=['x', 'y']) - self.assertFalse(idx._reference_duplicate_name('x')) + assert not idx._reference_duplicate_name('x') def test_astype(self): expected = self.index.copy() @@ -517,79 +533,76 @@ def test_astype(self): assert_copy(actual.labels, expected.labels) self.check_level_names(actual, expected.names) - with assertRaisesRegexp(TypeError, "^Setting.*dtype.*object"): + with tm.assert_raises_regex(TypeError, "^Setting.*dtype.*object"): self.index.astype(np.dtype(int)) def test_constructor_single_level(self): - single_level = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux']], - labels=[[0, 1, 2, 3]], names=['first']) - tm.assertIsInstance(single_level, Index) - self.assertNotIsInstance(single_level, MultiIndex) - self.assertEqual(single_level.name, 'first') - - single_level = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux']], - labels=[[0, 1, 2, 3]]) - self.assertIsNone(single_level.name) + result = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux']], + labels=[[0, 1, 2, 3]], names=['first']) + assert isinstance(result, MultiIndex) + expected = Index(['foo', 'bar', 'baz', 'qux'], name='first') + tm.assert_index_equal(result.levels[0], expected) + assert result.names == ['first'] def test_constructor_no_levels(self): - assertRaisesRegexp(ValueError, "non-zero number of levels/labels", - MultiIndex, levels=[], labels=[]) + tm.assert_raises_regex(ValueError, "non-zero number " + "of levels/labels", + MultiIndex, levels=[], labels=[]) both_re = re.compile('Must pass both levels and labels') - with tm.assertRaisesRegexp(TypeError, both_re): + with tm.assert_raises_regex(TypeError, both_re): MultiIndex(levels=[]) - with tm.assertRaisesRegexp(TypeError, both_re): + with tm.assert_raises_regex(TypeError, both_re): MultiIndex(labels=[]) def test_constructor_mismatched_label_levels(self): labels = [np.array([1]), np.array([2]), np.array([3])] levels = ["a"] - assertRaisesRegexp(ValueError, "Length of levels and labels must be" - " the same", MultiIndex, levels=levels, - labels=labels) + tm.assert_raises_regex(ValueError, "Length of levels and labels " + "must be the same", MultiIndex, + levels=levels, labels=labels) length_error = re.compile('>= length of level') label_error = re.compile(r'Unequal label lengths: \[4, 2\]') # important to check that it's looking at the right thing. 
- with tm.assertRaisesRegexp(ValueError, length_error): + with tm.assert_raises_regex(ValueError, length_error): MultiIndex(levels=[['a'], ['b']], labels=[[0, 1, 2, 3], [0, 3, 4, 1]]) - with tm.assertRaisesRegexp(ValueError, label_error): + with tm.assert_raises_regex(ValueError, label_error): MultiIndex(levels=[['a'], ['b']], labels=[[0, 0, 0, 0], [0, 0]]) # external API - with tm.assertRaisesRegexp(ValueError, length_error): + with tm.assert_raises_regex(ValueError, length_error): self.index.copy().set_levels([['a'], ['b']]) - with tm.assertRaisesRegexp(ValueError, label_error): + with tm.assert_raises_regex(ValueError, label_error): self.index.copy().set_labels([[0, 0, 0, 0], [0, 0]]) # deprecated properties with warnings.catch_warnings(): warnings.simplefilter('ignore') - with tm.assertRaisesRegexp(ValueError, length_error): + with tm.assert_raises_regex(ValueError, length_error): self.index.copy().levels = [['a'], ['b']] - with tm.assertRaisesRegexp(ValueError, label_error): + with tm.assert_raises_regex(ValueError, label_error): self.index.copy().labels = [[0, 0, 0, 0], [0, 0]] def assert_multiindex_copied(self, copy, original): - # levels should be (at least, shallow copied) - assert_copy(copy.levels, original.levels) - - assert_almost_equal(copy.labels, original.labels) + # Levels should be (at least, shallow copied) + tm.assert_copy(copy.levels, original.levels) + tm.assert_almost_equal(copy.labels, original.labels) - # labels doesn't matter which way copied - assert_almost_equal(copy.labels, original.labels) - self.assertIsNot(copy.labels, original.labels) + # Labels don't matter which way copied + tm.assert_almost_equal(copy.labels, original.labels) + assert copy.labels is not original.labels - # names doesn't matter which way copied - self.assertEqual(copy.names, original.names) - self.assertIsNot(copy.names, original.names) + # Names don't matter which way copied + assert copy.names == original.names + assert copy.names is not original.names - # sort order should be copied - self.assertEqual(copy.sortorder, original.sortorder) + # Sort order should be copied + assert copy.sortorder == original.sortorder def test_copy(self): i_copy = self.index.copy() @@ -607,7 +620,7 @@ def test_view(self): self.assert_multiindex_copied(i_view, self.index) def check_level_names(self, index, names): - self.assertEqual([level.name for level in index.levels], list(names)) + assert [level.name for level in index.levels] == list(names) def test_changing_names(self): @@ -635,16 +648,16 @@ def test_changing_names(self): def test_duplicate_names(self): self.index.names = ['foo', 'foo'] - assertRaisesRegexp(KeyError, 'Level foo not found', - self.index._get_level_number, 'foo') + tm.assert_raises_regex(KeyError, 'Level foo not found', + self.index._get_level_number, 'foo') def test_get_level_number_integer(self): self.index.names = [1, 0] - self.assertEqual(self.index._get_level_number(1), 0) - self.assertEqual(self.index._get_level_number(0), 1) - self.assertRaises(IndexError, self.index._get_level_number, 2) - assertRaisesRegexp(KeyError, 'Level fourth not found', - self.index._get_level_number, 'fourth') + assert self.index._get_level_number(1) == 0 + assert self.index._get_level_number(0) == 1 + pytest.raises(IndexError, self.index._get_level_number, 2) + tm.assert_raises_regex(KeyError, 'Level fourth not found', + self.index._get_level_number, 'fourth') def test_from_arrays(self): arrays = [] for lev, lab in zip(self.index.levels, self.index.labels): arrays.append(np.asarray(lev).take(lab)) result = 
MultiIndex.from_arrays(arrays) - self.assertEqual(list(result), list(self.index)) + assert list(result) == list(self.index) # infer correctly result = MultiIndex.from_arrays([[pd.NaT, Timestamp('20130101')], ['a', 'b']]) - self.assertTrue(result.levels[0].equals(Index([Timestamp('20130101') - ]))) - self.assertTrue(result.levels[1].equals(Index(['a', 'b']))) + assert result.levels[0].equals(Index([Timestamp('20130101')])) + assert result.levels[1].equals(Index(['a', 'b'])) def test_from_arrays_index_series_datetimetz(self): idx1 = pd.date_range('2015-01-01 10:00', freq='D', periods=3, @@ -747,21 +759,22 @@ def test_from_arrays_index_series_categorical(self): def test_from_arrays_empty(self): # 0 levels - with tm.assertRaisesRegexp( + with tm.assert_raises_regex( ValueError, "Must pass non-zero number of levels/labels"): MultiIndex.from_arrays(arrays=[]) # 1 level result = MultiIndex.from_arrays(arrays=[[]], names=['A']) + assert isinstance(result, MultiIndex) expected = Index([], name='A') - tm.assert_index_equal(result, expected) + tm.assert_index_equal(result.levels[0], expected) # N levels for N in [2, 3]: arrays = [[]] * N names = list('ABC')[:N] result = MultiIndex.from_arrays(arrays=arrays, names=names) - expected = MultiIndex(levels=[np.array([])] * N, labels=[[]] * N, + expected = MultiIndex(levels=[[]] * N, labels=[[]] * N, names=names) tm.assert_index_equal(result, expected) @@ -769,24 +782,27 @@ def test_from_arrays_invalid_input(self): invalid_inputs = [1, [1], [1, 2], [[1], 2], 'a', ['a'], ['a', 'b'], [['a'], 'b']] for i in invalid_inputs: - tm.assertRaises(TypeError, MultiIndex.from_arrays, arrays=i) + pytest.raises(TypeError, MultiIndex.from_arrays, arrays=i) def test_from_arrays_different_lengths(self): - # GH13599 + # see gh-13599 idx1 = [1, 2, 3] idx2 = ['a', 'b'] - assertRaisesRegexp(ValueError, '^all arrays must be same length$', - MultiIndex.from_arrays, [idx1, idx2]) + tm.assert_raises_regex(ValueError, '^all arrays must ' + 'be same length$', + MultiIndex.from_arrays, [idx1, idx2]) idx1 = [] idx2 = ['a', 'b'] - assertRaisesRegexp(ValueError, '^all arrays must be same length$', - MultiIndex.from_arrays, [idx1, idx2]) + tm.assert_raises_regex(ValueError, '^all arrays must ' + 'be same length$', + MultiIndex.from_arrays, [idx1, idx2]) idx1 = [1, 2, 3] idx2 = [] - assertRaisesRegexp(ValueError, '^all arrays must be same length$', - MultiIndex.from_arrays, [idx1, idx2]) + tm.assert_raises_regex(ValueError, '^all arrays must ' + 'be same length$', + MultiIndex.from_arrays, [idx1, idx2]) def test_from_product(self): @@ -801,18 +817,18 @@ def test_from_product(self): expected = MultiIndex.from_tuples(tuples, names=names) tm.assert_index_equal(result, expected) - self.assertEqual(result.names, names) + assert result.names == names def test_from_product_empty(self): # 0 levels - with tm.assertRaisesRegexp( + with tm.assert_raises_regex( ValueError, "Must pass non-zero number of levels/labels"): MultiIndex.from_product([]) # 1 level result = MultiIndex.from_product([[]], names=['A']) - expected = pd.Float64Index([], name='A') - tm.assert_index_equal(result, expected) + expected = pd.Index([], name='A') + tm.assert_index_equal(result.levels[0], expected) # 2 levels l1 = [[], ['foo', 'bar', 'baz'], []] @@ -820,7 +836,7 @@ def test_from_product_empty(self): names = ['A', 'B'] for first, second in zip(l1, l2): result = MultiIndex.from_product([first, second], names=names) - expected = MultiIndex(levels=[np.array(first), np.array(second)], + expected = MultiIndex(levels=[first, 
second], labels=[[], []], names=names) tm.assert_index_equal(result, expected) @@ -829,8 +845,7 @@ def test_from_product_empty(self): for N in range(4): lvl2 = lrange(N) result = MultiIndex.from_product([[], lvl2, []], names=names) - expected = MultiIndex(levels=[np.array(A) - for A in [[], lvl2, []]], + expected = MultiIndex(levels=[[], lvl2, []], labels=[[], [], []], names=names) tm.assert_index_equal(result, expected) @@ -838,12 +853,12 @@ def test_from_product_invalid_input(self): invalid_inputs = [1, [1], [1, 2], [[1], 2], 'a', ['a'], ['a', 'b'], [['a'], 'b']] for i in invalid_inputs: - tm.assertRaises(TypeError, MultiIndex.from_product, iterables=i) + pytest.raises(TypeError, MultiIndex.from_product, iterables=i) def test_from_product_datetimeindex(self): dt_index = date_range('2000-01-01', periods=2) mi = pd.MultiIndex.from_product([[1, 2], dt_index]) - etalon = pd.lib.list_to_object_array([(1, pd.Timestamp( + etalon = lib.list_to_object_array([(1, pd.Timestamp( '2000-01-01')), (1, pd.Timestamp('2000-01-02')), (2, pd.Timestamp( '2000-01-01')), (2, pd.Timestamp('2000-01-02'))]) tm.assert_numpy_array_equal(mi.values, etalon) @@ -870,21 +885,21 @@ def test_values_boxed(self): (3, pd.Timestamp('2000-01-03'))] mi = pd.MultiIndex.from_tuples(tuples) tm.assert_numpy_array_equal(mi.values, - pd.lib.list_to_object_array(tuples)) + lib.list_to_object_array(tuples)) # Check that code branches for boxed values produce identical results tm.assert_numpy_array_equal(mi.values[:4], mi[:4].values) def test_append(self): result = self.index[:3].append(self.index[3:]) - self.assertTrue(result.equals(self.index)) + assert result.equals(self.index) foos = [self.index[:1], self.index[1:3], self.index[3:]] result = foos[0].append(foos[1:]) - self.assertTrue(result.equals(self.index)) + assert result.equals(self.index) # empty result = self.index.append([]) - self.assertTrue(result.equals(self.index)) + assert result.equals(self.index) def test_append_mixed_dtypes(self): # GH 13660 @@ -896,15 +911,15 @@ def test_append_mixed_dtypes(self): [1.1, np.nan, 3.3], ['a', 'b', 'c'], dti, dti_tz, pi]) - self.assertEqual(mi.nlevels, 6) + assert mi.nlevels == 6 res = mi.append(mi) exp = MultiIndex.from_arrays([[1, 2, 3, 1, 2, 3], - [1.1, np.nan, 3.3, 1.1, np.nan, 3.3], - ['a', 'b', 'c', 'a', 'b', 'c'], - dti.append(dti), - dti_tz.append(dti_tz), - pi.append(pi)]) + [1.1, np.nan, 3.3, 1.1, np.nan, 3.3], + ['a', 'b', 'c', 'a', 'b', 'c'], + dti.append(dti), + dti_tz.append(dti_tz), + pi.append(pi)]) tm.assert_index_equal(res, exp) other = MultiIndex.from_arrays([['x', 'y', 'z'], ['x', 'y', 'z'], @@ -913,11 +928,11 @@ def test_append_mixed_dtypes(self): res = mi.append(other) exp = MultiIndex.from_arrays([[1, 2, 3, 'x', 'y', 'z'], - [1.1, np.nan, 3.3, 'x', 'y', 'z'], - ['a', 'b', 'c', 'x', 'y', 'z'], - dti.append(pd.Index(['x', 'y', 'z'])), - dti_tz.append(pd.Index(['x', 'y', 'z'])), - pi.append(pd.Index(['x', 'y', 'z']))]) + [1.1, np.nan, 3.3, 'x', 'y', 'z'], + ['a', 'b', 'c', 'x', 'y', 'z'], + dti.append(pd.Index(['x', 'y', 'z'])), + dti_tz.append(pd.Index(['x', 'y', 'z'])), + pi.append(pd.Index(['x', 'y', 'z']))]) tm.assert_index_equal(res, exp) def test_get_level_values(self): @@ -925,7 +940,7 @@ def test_get_level_values(self): expected = Index(['foo', 'foo', 'bar', 'baz', 'qux', 'qux'], name='first') tm.assert_index_equal(result, expected) - self.assertEqual(result.name, 'first') + assert result.name == 'first' result = self.index.get_level_values('first') expected = self.index.get_level_values(0) @@ -936,9 +951,9 
@@ def test_get_level_values(self): ['A', 'B']), CategoricalIndex([1, 2, 3])], labels=[np.array( [0, 0, 0, 1, 1, 1]), np.array([0, 1, 2, 0, 1, 2])]) exp = CategoricalIndex(['A', 'A', 'A', 'B', 'B', 'B']) - self.assert_index_equal(index.get_level_values(0), exp) + tm.assert_index_equal(index.get_level_values(0), exp) exp = CategoricalIndex([1, 2, 3, 1, 2, 3]) - self.assert_index_equal(index.get_level_values(1), exp) + tm.assert_index_equal(index.get_level_values(1), exp) def test_get_level_values_na(self): arrays = [['a', 'b', 'b'], [1, np.nan, 2]] @@ -971,32 +986,32 @@ def test_get_level_values_na(self): arrays = [[], []] index = pd.MultiIndex.from_arrays(arrays) values = index.get_level_values(0) - self.assertEqual(values.shape, (0, )) + assert values.shape == (0, ) def test_reorder_levels(self): # this blows up - assertRaisesRegexp(IndexError, '^Too many levels', - self.index.reorder_levels, [2, 1, 0]) + tm.assert_raises_regex(IndexError, '^Too many levels', + self.index.reorder_levels, [2, 1, 0]) def test_nlevels(self): - self.assertEqual(self.index.nlevels, 2) + assert self.index.nlevels == 2 def test_iter(self): result = list(self.index) expected = [('foo', 'one'), ('foo', 'two'), ('bar', 'one'), ('baz', 'two'), ('qux', 'one'), ('qux', 'two')] - self.assertEqual(result, expected) + assert result == expected def test_legacy_pickle(self): if PY3: - raise nose.SkipTest("testing for legacy pickles not " - "support on py3") + pytest.skip("testing for legacy pickles not " + "supported on py3") path = tm.get_data_path('multiindex_v1.pickle') obj = pd.read_pickle(path) obj2 = MultiIndex.from_tuples(obj.values) - self.assertTrue(obj.equals(obj2)) + assert obj.equals(obj2) res = obj.get_indexer(obj) exp = np.arange(len(obj), dtype=np.intp) @@ -1015,7 +1030,7 @@ def test_legacy_v2_unpickle(self): obj = pd.read_pickle(path) obj2 = MultiIndex.from_tuples(obj.values) - self.assertTrue(obj.equals(obj2)) + assert obj.equals(obj2) res = obj.get_indexer(obj) exp = np.arange(len(obj), dtype=np.intp) @@ -1035,73 +1050,99 @@ def test_roundtrip_pickle_with_tz(self): [[1, 2], ['a', 'b'], date_range('20130101', periods=3, tz='US/Eastern') ], names=['one', 'two', 'three']) - unpickled = self.round_trip_pickle(index) - self.assertTrue(index.equal_levels(unpickled)) + unpickled = tm.round_trip_pickle(index) + assert index.equal_levels(unpickled) def test_from_tuples_index_values(self): result = MultiIndex.from_tuples(self.index) - self.assertTrue((result.values == self.index.values).all()) + assert (result.values == self.index.values).all() def test_contains(self): - self.assertIn(('foo', 'two'), self.index) - self.assertNotIn(('bar', 'two'), self.index) - self.assertNotIn(None, self.index) + assert ('foo', 'two') in self.index + assert ('bar', 'two') not in self.index + assert None not in self.index + + def test_contains_top_level(self): + midx = MultiIndex.from_product([['A', 'B'], [1, 2]]) + assert 'A' in midx + assert 'A' not in midx._engine + + def test_contains_with_nat(self): + # MI with a NaT + mi = MultiIndex(levels=[['C'], + pd.date_range('2012-01-01', periods=5)], + labels=[[0, 0, 0, 0, 0, 0], [-1, 0, 1, 2, 3, 4]], + names=[None, 'B']) + assert ('C', pd.Timestamp('2012-01-01')) in mi + for val in mi.values: + assert val in mi def test_is_all_dates(self): - self.assertFalse(self.index.is_all_dates) + assert not self.index.is_all_dates def test_is_numeric(self): # MultiIndex is never numeric - self.assertFalse(self.index.is_numeric()) + assert not self.index.is_numeric() def test_getitem(self): # scalar 
- self.assertEqual(self.index[2], ('bar', 'one')) + assert self.index[2] == ('bar', 'one') # slice result = self.index[2:5] expected = self.index[[2, 3, 4]] - self.assertTrue(result.equals(expected)) + assert result.equals(expected) # boolean result = self.index[[True, False, True, False, True, True]] result2 = self.index[np.array([True, False, True, False, True, True])] expected = self.index[[0, 2, 4, 5]] - self.assertTrue(result.equals(expected)) - self.assertTrue(result2.equals(expected)) + assert result.equals(expected) + assert result2.equals(expected) def test_getitem_group_select(self): sorted_idx, _ = self.index.sortlevel(0) - self.assertEqual(sorted_idx.get_loc('baz'), slice(3, 4)) - self.assertEqual(sorted_idx.get_loc('foo'), slice(0, 2)) + assert sorted_idx.get_loc('baz') == slice(3, 4) + assert sorted_idx.get_loc('foo') == slice(0, 2) def test_get_loc(self): - self.assertEqual(self.index.get_loc(('foo', 'two')), 1) - self.assertEqual(self.index.get_loc(('baz', 'two')), 3) - self.assertRaises(KeyError, self.index.get_loc, ('bar', 'two')) - self.assertRaises(KeyError, self.index.get_loc, 'quux') + assert self.index.get_loc(('foo', 'two')) == 1 + assert self.index.get_loc(('baz', 'two')) == 3 + pytest.raises(KeyError, self.index.get_loc, ('bar', 'two')) + pytest.raises(KeyError, self.index.get_loc, 'quux') - self.assertRaises(NotImplementedError, self.index.get_loc, 'foo', - method='nearest') + pytest.raises(NotImplementedError, self.index.get_loc, 'foo', + method='nearest') # 3 levels index = MultiIndex(levels=[Index(lrange(4)), Index(lrange(4)), Index( lrange(4))], labels=[np.array([0, 0, 1, 2, 2, 2, 3, 3]), np.array( [0, 1, 0, 0, 0, 1, 0, 1]), np.array([1, 0, 1, 1, 0, 0, 1, 0])]) - self.assertRaises(KeyError, index.get_loc, (1, 1)) - self.assertEqual(index.get_loc((2, 0)), slice(3, 5)) + pytest.raises(KeyError, index.get_loc, (1, 1)) + assert index.get_loc((2, 0)) == slice(3, 5) def test_get_loc_duplicates(self): index = Index([2, 2, 2, 2]) result = index.get_loc(2) expected = slice(0, 4) - self.assertEqual(result, expected) - # self.assertRaises(Exception, index.get_loc, 2) + assert result == expected + # pytest.raises(Exception, index.get_loc, 2) index = Index(['c', 'a', 'a', 'b', 'b']) rs = index.get_loc('c') xp = 0 - assert (rs == xp) + assert rs == xp + + def test_get_value_duplicates(self): + index = MultiIndex(levels=[['D', 'B', 'C'], + [0, 26, 27, 37, 57, 67, 75, 82]], + labels=[[0, 0, 0, 1, 2, 2, 2, 2, 2, 2], + [1, 3, 4, 6, 0, 2, 2, 3, 5, 7]], + names=['tag', 'day']) + + assert index.get_loc('D') == slice(0, 3) + with pytest.raises(KeyError): + index._engine.get_value(np.array([]), 'D') def test_get_loc_level(self): index = MultiIndex(levels=[Index(lrange(4)), Index(lrange(4)), Index( @@ -1111,22 +1152,30 @@ def test_get_loc_level(self): loc, new_index = index.get_loc_level((0, 1)) expected = slice(1, 2) exp_index = index[expected].droplevel(0).droplevel(0) - self.assertEqual(loc, expected) - self.assertTrue(new_index.equals(exp_index)) + assert loc == expected + assert new_index.equals(exp_index) loc, new_index = index.get_loc_level((0, 1, 0)) expected = 1 - self.assertEqual(loc, expected) - self.assertIsNone(new_index) + assert loc == expected + assert new_index is None - self.assertRaises(KeyError, index.get_loc_level, (2, 2)) + pytest.raises(KeyError, index.get_loc_level, (2, 2)) index = MultiIndex(levels=[[2000], lrange(4)], labels=[np.array( [0, 0, 0, 0]), np.array([0, 1, 2, 3])]) result, new_index = index.get_loc_level((2000, slice(None, None))) expected = 
slice(None, None) - self.assertEqual(result, expected) - self.assertTrue(new_index.equals(index.droplevel(0))) + assert result == expected + assert new_index.equals(index.droplevel(0)) + + def test_get_loc_missing_nan(self): + # GH 8569 + idx = MultiIndex.from_arrays([[1.0, 2.0], [3.0, 4.0]]) + assert isinstance(idx.get_loc(1), slice) + pytest.raises(KeyError, idx.get_loc, 3) + pytest.raises(KeyError, idx.get_loc, np.nan) + pytest.raises(KeyError, idx.get_loc, [np.nan]) def test_slice_locs(self): df = tm.makeTimeDataFrame() @@ -1148,17 +1197,19 @@ def test_slice_locs_with_type_mismatch(self): df = tm.makeTimeDataFrame() stacked = df.stack() idx = stacked.index - assertRaisesRegexp(TypeError, '^Level type mismatch', idx.slice_locs, - (1, 3)) - assertRaisesRegexp(TypeError, '^Level type mismatch', idx.slice_locs, - df.index[5] + timedelta(seconds=30), (5, 2)) + tm.assert_raises_regex(TypeError, '^Level type mismatch', + idx.slice_locs, (1, 3)) + tm.assert_raises_regex(TypeError, '^Level type mismatch', + idx.slice_locs, + df.index[5] + timedelta( + seconds=30), (5, 2)) df = tm.makeCustomDataframe(5, 5) stacked = df.stack() idx = stacked.index - with assertRaisesRegexp(TypeError, '^Level type mismatch'): + with tm.assert_raises_regex(TypeError, '^Level type mismatch'): idx.slice_locs(timedelta(seconds=30)) # TODO: Try creating a UnicodeDecodeError in exception message - with assertRaisesRegexp(TypeError, '^Level type mismatch'): + with tm.assert_raises_regex(TypeError, '^Level type mismatch'): idx.slice_locs(df.index[1], (16, "a")) def test_slice_locs_not_sorted(self): @@ -1166,9 +1217,9 @@ def test_slice_locs_not_sorted(self): lrange(4))], labels=[np.array([0, 0, 1, 2, 2, 2, 3, 3]), np.array( [0, 1, 0, 0, 0, 1, 0, 1]), np.array([1, 0, 1, 1, 0, 0, 1, 0])]) - assertRaisesRegexp(KeyError, "[Kk]ey length.*greater than MultiIndex" - " lexsort depth", index.slice_locs, (1, 0, 1), - (2, 1, 0)) + tm.assert_raises_regex(KeyError, "[Kk]ey length.*greater than " + "MultiIndex lexsort depth", + index.slice_locs, (1, 0, 1), (2, 1, 0)) # works sorted_index, _ = index.sortlevel(0) @@ -1179,16 +1230,16 @@ def test_slice_locs_partial(self): sorted_idx, _ = self.index.sortlevel(0) result = sorted_idx.slice_locs(('foo', 'two'), ('qux', 'one')) - self.assertEqual(result, (1, 5)) + assert result == (1, 5) result = sorted_idx.slice_locs(None, ('qux', 'one')) - self.assertEqual(result, (0, 5)) + assert result == (0, 5) result = sorted_idx.slice_locs(('foo', 'two'), None) - self.assertEqual(result, (1, len(sorted_idx))) + assert result == (1, len(sorted_idx)) result = sorted_idx.slice_locs('bar', 'baz') - self.assertEqual(result, (2, 4)) + assert result == (2, 4) def test_slice_locs_not_contained(self): # some searchsorted action @@ -1198,22 +1249,22 @@ def test_slice_locs_not_contained(self): [0, 1, 2, 1, 2, 2, 0, 1, 2]], sortorder=0) result = index.slice_locs((1, 0), (5, 2)) - self.assertEqual(result, (3, 6)) + assert result == (3, 6) result = index.slice_locs(1, 5) - self.assertEqual(result, (3, 6)) + assert result == (3, 6) result = index.slice_locs((2, 2), (5, 2)) - self.assertEqual(result, (3, 6)) + assert result == (3, 6) result = index.slice_locs(2, 5) - self.assertEqual(result, (3, 6)) + assert result == (3, 6) result = index.slice_locs((1, 0), (6, 3)) - self.assertEqual(result, (3, 8)) + assert result == (3, 8) result = index.slice_locs(-1, 10) - self.assertEqual(result, (0, len(index))) + assert result == (0, len(index)) def test_consistency(self): # need to construct an overflow @@ -1233,7 +1284,7 @@ 
def test_consistency(self): index = MultiIndex(levels=[major_axis, minor_axis], labels=[major_labels, minor_labels]) - self.assertFalse(index.is_unique) + assert not index.is_unique def test_truncate(self): major_axis = Index(lrange(4)) @@ -1246,18 +1297,18 @@ def test_truncate(self): labels=[major_labels, minor_labels]) result = index.truncate(before=1) - self.assertNotIn('foo', result.levels[0]) - self.assertIn(1, result.levels[0]) + assert 'foo' not in result.levels[0] + assert 1 in result.levels[0] result = index.truncate(after=1) - self.assertNotIn(2, result.levels[0]) - self.assertIn(1, result.levels[0]) + assert 2 not in result.levels[0] + assert 1 in result.levels[0] result = index.truncate(before=1, after=2) - self.assertEqual(len(result.levels[0]), 2) + assert len(result.levels[0]) == 2 # after < before - self.assertRaises(ValueError, index.truncate, 3, 1) + pytest.raises(ValueError, index.truncate, 3, 1) def test_get_indexer(self): major_axis = Index(lrange(4)) @@ -1295,28 +1346,41 @@ def test_get_indexer(self): assert_almost_equal(r1, rbfill1) # pass non-MultiIndex - r1 = idx1.get_indexer(idx2._tuple_index) + r1 = idx1.get_indexer(idx2.values) rexp1 = idx1.get_indexer(idx2) assert_almost_equal(r1, rexp1) r1 = idx1.get_indexer([1, 2, 3]) - self.assertTrue((r1 == [-1, -1, -1]).all()) + assert (r1 == [-1, -1, -1]).all() # create index with duplicates idx1 = Index(lrange(10) + lrange(10)) idx2 = Index(lrange(20)) msg = "Reindexing only valid with uniquely valued Index objects" - with assertRaisesRegexp(InvalidIndexError, msg): + with tm.assert_raises_regex(InvalidIndexError, msg): idx1.get_indexer(idx2) def test_get_indexer_nearest(self): midx = MultiIndex.from_tuples([('a', 1), ('b', 2)]) - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): midx.get_indexer(['a'], method='nearest') - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): midx.get_indexer(['a'], method='pad', tolerance=2) + def test_hash_collisions(self): + # non-smoke test that we don't get hash collisions + + index = MultiIndex.from_product([np.arange(1000), np.arange(1000)], + names=['one', 'two']) + result = index.get_indexer(index.values) + tm.assert_numpy_array_equal(result, np.arange( + len(index), dtype='intp')) + + for i in [0, 1, len(index) - 2, len(index) - 1]: + result = index.get_loc(index[i]) + assert result == i + def test_format(self): self.index.format() self.index[:0].format() @@ -1332,7 +1396,7 @@ def test_format_sparse_display(self): [0, 1, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0]]) result = index.format() - self.assertEqual(result[3], '1 0 0 0') + assert result[3] == '1 0 0 0' def test_format_sparse_config(self): warn_filters = warnings.filters @@ -1342,9 +1406,9 @@ def test_format_sparse_config(self): pd.set_option('display.multi_sparse', False) result = self.index.format() - self.assertEqual(result[1], 'foo two') + assert result[1] == 'foo two' - self.reset_display_options() + tm.reset_display_options() warnings.filters = warn_filters @@ -1393,7 +1457,7 @@ def test_to_hierarchical(self): labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]]) tm.assert_index_equal(result, expected) - self.assertEqual(result.names, index.names) + assert result.names == index.names # K > 1 result = index.to_hierarchical(3, 2) @@ -1401,7 +1465,7 @@ def test_to_hierarchical(self): labels=[[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1], [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]]) tm.assert_index_equal(result, expected) - 
self.assertEqual(result.names, index.names) + assert result.names == index.names # non-sorted index = MultiIndex.from_tuples([(2, 'c'), (1, 'b'), @@ -1415,18 +1479,19 @@ def test_to_hierarchical(self): (2, 'b'), (2, 'b')], names=['N1', 'N2']) tm.assert_index_equal(result, expected) - self.assertEqual(result.names, index.names) + assert result.names == index.names def test_bounds(self): self.index._bounds def test_equals_multi(self): - self.assertTrue(self.index.equals(self.index)) - self.assertTrue(self.index.equal_levels(self.index)) - - self.assertFalse(self.index.equals(self.index[:-1])) + assert self.index.equals(self.index) + assert not self.index.equals(self.index.values) + assert self.index.equals(Index(self.index.values)) - self.assertTrue(self.index.equals(self.index._tuple_index)) + assert self.index.equal_levels(self.index) + assert not self.index.equals(self.index[:-1]) + assert not self.index.equals(self.index[-1]) # different number of levels index = MultiIndex(levels=[Index(lrange(4)), Index(lrange(4)), Index( @@ -1434,8 +1499,8 @@ def test_equals_multi(self): [0, 1, 0, 0, 0, 1, 0, 1]), np.array([1, 0, 1, 1, 0, 0, 1, 0])]) index2 = MultiIndex(levels=index.levels[:-1], labels=index.labels[:-1]) - self.assertFalse(index.equals(index2)) - self.assertFalse(index.equal_levels(index2)) + assert not index.equals(index2) + assert not index.equal_levels(index2) # levels are different major_axis = Index(lrange(4)) @@ -1446,8 +1511,8 @@ def test_equals_multi(self): index = MultiIndex(levels=[major_axis, minor_axis], labels=[major_labels, minor_labels]) - self.assertFalse(self.index.equals(index)) - self.assertFalse(self.index.equal_levels(index)) + assert not self.index.equals(index) + assert not self.index.equal_levels(index) # some of the labels are different major_axis = Index(['foo', 'bar', 'baz', 'qux']) @@ -1458,52 +1523,61 @@ def test_equals_multi(self): index = MultiIndex(levels=[major_axis, minor_axis], labels=[major_labels, minor_labels]) - self.assertFalse(self.index.equals(index)) + assert not self.index.equals(index) + + def test_equals_missing_values(self): + # make sure take is not using -1 + i = pd.MultiIndex.from_tuples([(0, pd.NaT), + (0, pd.Timestamp('20130101'))]) + result = i[0:1].equals(i[0]) + assert not result + result = i[1:2].equals(i[1]) + assert not result def test_identical(self): mi = self.index.copy() mi2 = self.index.copy() - self.assertTrue(mi.identical(mi2)) + assert mi.identical(mi2) mi = mi.set_names(['new1', 'new2']) - self.assertTrue(mi.equals(mi2)) - self.assertFalse(mi.identical(mi2)) + assert mi.equals(mi2) + assert not mi.identical(mi2) mi2 = mi2.set_names(['new1', 'new2']) - self.assertTrue(mi.identical(mi2)) + assert mi.identical(mi2) mi3 = Index(mi.tolist(), names=mi.names) mi4 = Index(mi.tolist(), names=mi.names, tupleize_cols=False) - self.assertTrue(mi.identical(mi3)) - self.assertFalse(mi.identical(mi4)) - self.assertTrue(mi.equals(mi4)) + assert mi.identical(mi3) + assert not mi.identical(mi4) + assert mi.equals(mi4) def test_is_(self): mi = MultiIndex.from_tuples(lzip(range(10), range(10))) - self.assertTrue(mi.is_(mi)) - self.assertTrue(mi.is_(mi.view())) - self.assertTrue(mi.is_(mi.view().view().view().view())) + assert mi.is_(mi) + assert mi.is_(mi.view()) + assert mi.is_(mi.view().view().view().view()) mi2 = mi.view() # names are metadata, they don't change id mi2.names = ["A", "B"] - self.assertTrue(mi2.is_(mi)) - self.assertTrue(mi.is_(mi2)) + assert mi2.is_(mi) + assert mi.is_(mi2) - self.assertTrue(mi.is_(mi.set_names(["C", 
"D"]))) + assert mi.is_(mi.set_names(["C", "D"])) mi2 = mi.view() mi2.set_names(["E", "F"], inplace=True) - self.assertTrue(mi.is_(mi2)) + assert mi.is_(mi2) # levels are inherent properties, they change identity mi3 = mi2.set_levels([lrange(10), lrange(10)]) - self.assertFalse(mi3.is_(mi2)) + assert not mi3.is_(mi2) # shouldn't change - self.assertTrue(mi2.is_(mi)) + assert mi2.is_(mi) mi4 = mi3.view() mi4.set_levels([[1 for _ in range(10)], lrange(10)], inplace=True) - self.assertFalse(mi4.is_(mi3)) + assert not mi4.is_(mi3) mi5 = mi.view() mi5.set_levels(mi5.levels, inplace=True) - self.assertFalse(mi5.is_(mi)) + assert not mi5.is_(mi) def test_union(self): piece1 = self.index[:5][::-1] @@ -1511,69 +1585,69 @@ def test_union(self): the_union = piece1 | piece2 - tups = sorted(self.index._tuple_index) + tups = sorted(self.index.values) expected = MultiIndex.from_tuples(tups) - self.assertTrue(the_union.equals(expected)) + assert the_union.equals(expected) # corner case, pass self or empty thing: the_union = self.index.union(self.index) - self.assertIs(the_union, self.index) + assert the_union is self.index the_union = self.index.union(self.index[:0]) - self.assertIs(the_union, self.index) + assert the_union is self.index # won't work in python 3 - # tuples = self.index._tuple_index + # tuples = self.index.values # result = self.index[:4] | tuples[4:] - # self.assertTrue(result.equals(tuples)) + # assert result.equals(tuples) # not valid for python 3 # def test_union_with_regular_index(self): # other = Index(['A', 'B', 'C']) # result = other.union(self.index) - # self.assertIn(('foo', 'one'), result) - # self.assertIn('B', result) + # assert ('foo', 'one') in result + # assert 'B' in result # result2 = self.index.union(other) - # self.assertTrue(result.equals(result2)) + # assert result.equals(result2) def test_intersection(self): piece1 = self.index[:5][::-1] piece2 = self.index[3:] the_int = piece1 & piece2 - tups = sorted(self.index[3:5]._tuple_index) + tups = sorted(self.index[3:5].values) expected = MultiIndex.from_tuples(tups) - self.assertTrue(the_int.equals(expected)) + assert the_int.equals(expected) # corner case, pass self the_int = self.index.intersection(self.index) - self.assertIs(the_int, self.index) + assert the_int is self.index # empty intersection: disjoint empty = self.index[:2] & self.index[2:] expected = self.index[:0] - self.assertTrue(empty.equals(expected)) + assert empty.equals(expected) # can't do in python 3 - # tuples = self.index._tuple_index + # tuples = self.index.values # result = self.index & tuples - # self.assertTrue(result.equals(tuples)) + # assert result.equals(tuples) def test_sub(self): first = self.index # - now raises (previously was set op difference) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): first - self.index[-3:] - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): self.index[-3:] - first - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): self.index[-3:] - first.tolist() - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): first.tolist() - self.index[-3:] def test_difference(self): @@ -1584,66 +1658,75 @@ def test_difference(self): sortorder=0, names=self.index.names) - tm.assertIsInstance(result, MultiIndex) - self.assertTrue(result.equals(expected)) - self.assertEqual(result.names, self.index.names) + assert isinstance(result, MultiIndex) + assert result.equals(expected) + assert result.names == self.index.names # empty difference: reflexive result = 
self.index.difference(self.index) expected = self.index[:0] - self.assertTrue(result.equals(expected)) - self.assertEqual(result.names, self.index.names) + assert result.equals(expected) + assert result.names == self.index.names # empty difference: superset result = self.index[-3:].difference(self.index) expected = self.index[:0] - self.assertTrue(result.equals(expected)) - self.assertEqual(result.names, self.index.names) + assert result.equals(expected) + assert result.names == self.index.names # empty difference: degenerate result = self.index[:0].difference(self.index) expected = self.index[:0] - self.assertTrue(result.equals(expected)) - self.assertEqual(result.names, self.index.names) + assert result.equals(expected) + assert result.names == self.index.names # names not the same chunklet = self.index[-3:] chunklet.names = ['foo', 'baz'] result = first.difference(chunklet) - self.assertEqual(result.names, (None, None)) + assert result.names == (None, None) # empty, but non-equal result = self.index.difference(self.index.sortlevel(1)[0]) - self.assertEqual(len(result), 0) + assert len(result) == 0 # raise Exception called with non-MultiIndex - result = first.difference(first._tuple_index) - self.assertTrue(result.equals(first[:0])) + result = first.difference(first.values) + assert result.equals(first[:0]) # name from empty array result = first.difference([]) - self.assertTrue(first.equals(result)) - self.assertEqual(first.names, result.names) + assert first.equals(result) + assert first.names == result.names # name from non-empty array result = first.difference([('foo', 'one')]) expected = pd.MultiIndex.from_tuples([('bar', 'one'), ('baz', 'two'), ( 'foo', 'two'), ('qux', 'one'), ('qux', 'two')]) expected.names = first.names - self.assertEqual(first.names, result.names) - assertRaisesRegexp(TypeError, "other must be a MultiIndex or a list" - " of tuples", first.difference, [1, 2, 3, 4, 5]) + assert first.names == result.names + tm.assert_raises_regex(TypeError, "other must be a MultiIndex " + "or a list of tuples", + first.difference, [1, 2, 3, 4, 5]) def test_from_tuples(self): - assertRaisesRegexp(TypeError, 'Cannot infer number of levels from' - ' empty list', MultiIndex.from_tuples, []) + tm.assert_raises_regex(TypeError, 'Cannot infer number of levels ' + 'from empty list', + MultiIndex.from_tuples, []) idx = MultiIndex.from_tuples(((1, 2), (3, 4)), names=['a', 'b']) - self.assertEqual(len(idx), 2) + assert len(idx) == 2 + + def test_from_tuples_empty(self): + # GH 16777 + result = MultiIndex.from_tuples([], names=['a', 'b']) + expected = MultiIndex.from_arrays(arrays=[[], []], + names=['a', 'b']) + tm.assert_index_equal(result, expected) def test_argsort(self): result = self.index.argsort() - expected = self.index._tuple_index.argsort() + expected = self.index.values.argsort() tm.assert_numpy_array_equal(result, expected) def test_sortlevel(self): @@ -1656,23 +1739,23 @@ def test_sortlevel(self): sorted_idx, _ = index.sortlevel(0) expected = MultiIndex.from_tuples(sorted(tuples)) - self.assertTrue(sorted_idx.equals(expected)) + assert sorted_idx.equals(expected) sorted_idx, _ = index.sortlevel(0, ascending=False) - self.assertTrue(sorted_idx.equals(expected[::-1])) + assert sorted_idx.equals(expected[::-1]) sorted_idx, _ = index.sortlevel(1) by1 = sorted(tuples, key=lambda x: (x[1], x[0])) expected = MultiIndex.from_tuples(by1) - self.assertTrue(sorted_idx.equals(expected)) + assert sorted_idx.equals(expected) sorted_idx, _ = index.sortlevel(1, ascending=False) - 
self.assertTrue(sorted_idx.equals(expected[::-1])) + assert sorted_idx.equals(expected[::-1]) def test_sortlevel_not_sort_remaining(self): mi = MultiIndex.from_tuples([[1, 1, 3], [1, 1, 1]], names=list('ABC')) sorted_idx, _ = mi.sortlevel('A', sort_remaining=False) - self.assertTrue(sorted_idx.equals(mi)) + assert sorted_idx.equals(mi) def test_sortlevel_deterministic(self): tuples = [('bar', 'one'), ('foo', 'two'), ('qux', 'two'), @@ -1682,18 +1765,18 @@ def test_sortlevel_deterministic(self): sorted_idx, _ = index.sortlevel(0) expected = MultiIndex.from_tuples(sorted(tuples)) - self.assertTrue(sorted_idx.equals(expected)) + assert sorted_idx.equals(expected) sorted_idx, _ = index.sortlevel(0, ascending=False) - self.assertTrue(sorted_idx.equals(expected[::-1])) + assert sorted_idx.equals(expected[::-1]) sorted_idx, _ = index.sortlevel(1) by1 = sorted(tuples, key=lambda x: (x[1], x[0])) expected = MultiIndex.from_tuples(by1) - self.assertTrue(sorted_idx.equals(expected)) + assert sorted_idx.equals(expected) sorted_idx, _ = index.sortlevel(1, ascending=False) - self.assertTrue(sorted_idx.equals(expected[::-1])) + assert sorted_idx.equals(expected[::-1]) def test_dims(self): pass @@ -1705,66 +1788,66 @@ def test_drop(self): dropped2 = self.index.drop(index) expected = self.index[[0, 2, 3, 5]] - self.assert_index_equal(dropped, expected) - self.assert_index_equal(dropped2, expected) + tm.assert_index_equal(dropped, expected) + tm.assert_index_equal(dropped2, expected) dropped = self.index.drop(['bar']) expected = self.index[[0, 1, 3, 4, 5]] - self.assert_index_equal(dropped, expected) + tm.assert_index_equal(dropped, expected) dropped = self.index.drop('foo') expected = self.index[[2, 3, 4, 5]] - self.assert_index_equal(dropped, expected) + tm.assert_index_equal(dropped, expected) index = MultiIndex.from_tuples([('bar', 'two')]) - self.assertRaises(KeyError, self.index.drop, [('bar', 'two')]) - self.assertRaises(KeyError, self.index.drop, index) - self.assertRaises(KeyError, self.index.drop, ['foo', 'two']) + pytest.raises(KeyError, self.index.drop, [('bar', 'two')]) + pytest.raises(KeyError, self.index.drop, index) + pytest.raises(KeyError, self.index.drop, ['foo', 'two']) # partially correct argument mixed_index = MultiIndex.from_tuples([('qux', 'one'), ('bar', 'two')]) - self.assertRaises(KeyError, self.index.drop, mixed_index) + pytest.raises(KeyError, self.index.drop, mixed_index) # error='ignore' dropped = self.index.drop(index, errors='ignore') expected = self.index[[0, 1, 2, 3, 4, 5]] - self.assert_index_equal(dropped, expected) + tm.assert_index_equal(dropped, expected) dropped = self.index.drop(mixed_index, errors='ignore') expected = self.index[[0, 1, 2, 3, 5]] - self.assert_index_equal(dropped, expected) + tm.assert_index_equal(dropped, expected) dropped = self.index.drop(['foo', 'two'], errors='ignore') expected = self.index[[2, 3, 4, 5]] - self.assert_index_equal(dropped, expected) + tm.assert_index_equal(dropped, expected) # mixed partial / full drop dropped = self.index.drop(['foo', ('qux', 'one')]) expected = self.index[[2, 3, 5]] - self.assert_index_equal(dropped, expected) + tm.assert_index_equal(dropped, expected) # mixed partial / full drop / error='ignore' mixed_index = ['foo', ('qux', 'one'), 'two'] - self.assertRaises(KeyError, self.index.drop, mixed_index) + pytest.raises(KeyError, self.index.drop, mixed_index) dropped = self.index.drop(mixed_index, errors='ignore') expected = self.index[[2, 3, 5]] - self.assert_index_equal(dropped, expected) + 
tm.assert_index_equal(dropped, expected) def test_droplevel_with_names(self): index = self.index[self.index.get_loc('foo')] dropped = index.droplevel(0) - self.assertEqual(dropped.name, 'second') + assert dropped.name == 'second' index = MultiIndex(levels=[Index(lrange(4)), Index(lrange(4)), Index( lrange(4))], labels=[np.array([0, 0, 1, 2, 2, 2, 3, 3]), np.array( [0, 1, 0, 0, 0, 1, 0, 1]), np.array([1, 0, 1, 1, 0, 0, 1, 0])], names=['one', 'two', 'three']) dropped = index.droplevel(0) - self.assertEqual(dropped.names, ('two', 'three')) + assert dropped.names == ('two', 'three') dropped = index.droplevel('two') expected = index.droplevel(1) - self.assertTrue(dropped.equals(expected)) + assert dropped.equals(expected) def test_droplevel_multiple(self): index = MultiIndex(levels=[Index(lrange(4)), Index(lrange(4)), Index( @@ -1774,7 +1857,7 @@ def test_droplevel_multiple(self): dropped = index[:2].droplevel(['three', 'one']) expected = index[:2].droplevel(2).droplevel(0) - self.assertTrue(dropped.equals(expected)) + assert dropped.equals(expected) def test_drop_not_lexsorted(self): # GH 12078 @@ -1782,7 +1865,7 @@ def test_drop_not_lexsorted(self): # define the lexsorted version of the multi-index tuples = [('a', ''), ('b1', 'c1'), ('b2', 'c2')] lexsorted_mi = MultiIndex.from_tuples(tuples, names=['b', 'c']) - self.assertTrue(lexsorted_mi.is_lexsorted()) + assert lexsorted_mi.is_lexsorted() # and the not-lexsorted version df = pd.DataFrame(columns=['a', 'b', 'c', 'd'], @@ -1790,19 +1873,19 @@ def test_drop_not_lexsorted(self): df = df.pivot_table(index='a', columns=['b', 'c'], values='d') df = df.reset_index() not_lexsorted_mi = df.columns - self.assertFalse(not_lexsorted_mi.is_lexsorted()) + assert not not_lexsorted_mi.is_lexsorted() # compare the results - self.assert_index_equal(lexsorted_mi, not_lexsorted_mi) - with self.assert_produces_warning(PerformanceWarning): - self.assert_index_equal(lexsorted_mi.drop('a'), - not_lexsorted_mi.drop('a')) + tm.assert_index_equal(lexsorted_mi, not_lexsorted_mi) + with tm.assert_produces_warning(PerformanceWarning): + tm.assert_index_equal(lexsorted_mi.drop('a'), + not_lexsorted_mi.drop('a')) def test_insert(self): # key contained in all levels new_index = self.index.insert(0, ('bar', 'two')) - self.assertTrue(new_index.equal_levels(self.index)) - self.assertEqual(new_index[0], ('bar', 'two')) + assert new_index.equal_levels(self.index) + assert new_index[0] == ('bar', 'two') # key not contained in all levels new_index = self.index.insert(0, ('abc', 'three')) @@ -1812,11 +1895,11 @@ def test_insert(self): exp1 = Index(list(self.index.levels[1]) + ['three'], name='second') tm.assert_index_equal(new_index.levels[1], exp1) - self.assertEqual(new_index[0], ('abc', 'three')) + assert new_index[0] == ('abc', 'three') # key wrong length msg = "Item must have length equal to number of levels" - with assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): self.index.insert(0, ('foo2', )) left = pd.DataFrame([['a', 'b', 0], ['b', 'd', 1]], @@ -1866,7 +1949,7 @@ def test_insert(self): def test_take_preserve_name(self): taken = self.index.take([3, 0, 1]) - self.assertEqual(taken.names, self.index.names) + assert taken.names == self.index.names def test_take_fill_value(self): # GH 12631 @@ -1900,12 +1983,12 @@ def test_take_fill_value(self): msg = ('When allow_fill=True and fill_value is not None, ' 'all indices must be >= -1') - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): 
idx.take(np.array([1, 0, -2]), fill_value=True) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -5]), fill_value=True) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): idx.take(np.array([1, -5])) def take_invalid_kwargs(self): @@ -1915,16 +1998,16 @@ def take_invalid_kwargs(self): indices = [1, 2] msg = r"take\(\) got an unexpected keyword argument 'foo'" - tm.assertRaisesRegexp(TypeError, msg, idx.take, - indices, foo=2) + tm.assert_raises_regex(TypeError, msg, idx.take, + indices, foo=2) msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, idx.take, - indices, out=indices) + tm.assert_raises_regex(ValueError, msg, idx.take, + indices, out=indices) msg = "the 'mode' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, idx.take, - indices, mode='clip') + tm.assert_raises_regex(ValueError, msg, idx.take, + indices, mode='clip') def test_join_level(self): def _check_how(other, how): @@ -1933,8 +2016,8 @@ def _check_how(other, how): return_indexers=True) exp_level = other.join(self.index.levels[1], how=how) - self.assertTrue(join_index.levels[0].equals(self.index.levels[0])) - self.assertTrue(join_index.levels[1].equals(exp_level)) + assert join_index.levels[0].equals(self.index.levels[0]) + assert join_index.levels[1].equals(exp_level) # pare down levels mask = np.array( @@ -1947,7 +2030,7 @@ def _check_how(other, how): self.index.join(other, how=how, level='second', return_indexers=True) - self.assertTrue(join_index.equals(join_index2)) + assert join_index.equals(join_index2) tm.assert_numpy_array_equal(lidx, lidx2) tm.assert_numpy_array_equal(ridx, ridx2) tm.assert_numpy_array_equal(join_index2.values, exp_values) @@ -1965,17 +2048,17 @@ def _check_all(other): # some corner cases idx = Index(['three', 'one', 'two']) result = idx.join(self.index, level='second') - tm.assertIsInstance(result, MultiIndex) + assert isinstance(result, MultiIndex) - assertRaisesRegexp(TypeError, "Join.*MultiIndex.*ambiguous", - self.index.join, self.index, level=1) + tm.assert_raises_regex(TypeError, "Join.*MultiIndex.*ambiguous", + self.index.join, self.index, level=1) def test_join_self(self): kinds = 'outer', 'inner', 'left', 'right' for kind in kinds: res = self.index joined = res.join(res, how=kind) - self.assertIs(res, joined) + assert res is joined def test_join_multi(self): # GH 10665 @@ -1989,36 +2072,36 @@ def test_join_multi(self): [np.arange(4), [1, 2]], names=['a', 'b']) exp_lidx = np.array([1, 2, 5, 6, 9, 10, 13, 14], dtype=np.intp) exp_ridx = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=np.intp) - self.assert_index_equal(jidx, exp_idx) - self.assert_numpy_array_equal(lidx, exp_lidx) - self.assert_numpy_array_equal(ridx, exp_ridx) + tm.assert_index_equal(jidx, exp_idx) + tm.assert_numpy_array_equal(lidx, exp_lidx) + tm.assert_numpy_array_equal(ridx, exp_ridx) # flip jidx, ridx, lidx = idx.join(midx, how='inner', return_indexers=True) - self.assert_index_equal(jidx, exp_idx) - self.assert_numpy_array_equal(lidx, exp_lidx) - self.assert_numpy_array_equal(ridx, exp_ridx) + tm.assert_index_equal(jidx, exp_idx) + tm.assert_numpy_array_equal(lidx, exp_lidx) + tm.assert_numpy_array_equal(ridx, exp_ridx) # keep MultiIndex jidx, lidx, ridx = midx.join(idx, how='left', return_indexers=True) exp_ridx = np.array([-1, 0, 1, -1, -1, 0, 1, -1, -1, 0, 1, -1, -1, 0, 1, -1], dtype=np.intp) - self.assert_index_equal(jidx, midx) - self.assertIsNone(lidx) - 
self.assert_numpy_array_equal(ridx, exp_ridx) + tm.assert_index_equal(jidx, midx) + assert lidx is None + tm.assert_numpy_array_equal(ridx, exp_ridx) # flip jidx, ridx, lidx = idx.join(midx, how='right', return_indexers=True) - self.assert_index_equal(jidx, midx) - self.assertIsNone(lidx) - self.assert_numpy_array_equal(ridx, exp_ridx) + tm.assert_index_equal(jidx, midx) + assert lidx is None + tm.assert_numpy_array_equal(ridx, exp_ridx) def test_reindex(self): result, indexer = self.index.reindex(list(self.index[:4])) - tm.assertIsInstance(result, MultiIndex) + assert isinstance(result, MultiIndex) self.check_level_names(result, self.index[:4].names) result, indexer = self.index.reindex(list(self.index)) - tm.assertIsInstance(result, MultiIndex) - self.assertIsNone(indexer) + assert isinstance(result, MultiIndex) + assert indexer is None self.check_level_names(result, self.index.names) def test_reindex_level(self): @@ -2030,28 +2113,29 @@ def test_reindex_level(self): exp_index = self.index.join(idx, level='second', how='right') exp_index2 = self.index.join(idx, level='second', how='left') - self.assertTrue(target.equals(exp_index)) + assert target.equals(exp_index) exp_indexer = np.array([0, 2, 4]) tm.assert_numpy_array_equal(indexer, exp_indexer, check_dtype=False) - self.assertTrue(target2.equals(exp_index2)) + assert target2.equals(exp_index2) exp_indexer2 = np.array([0, -1, 0, -1, 0, -1]) tm.assert_numpy_array_equal(indexer2, exp_indexer2, check_dtype=False) - assertRaisesRegexp(TypeError, "Fill method not supported", - self.index.reindex, self.index, method='pad', - level='second') + tm.assert_raises_regex(TypeError, "Fill method not supported", + self.index.reindex, self.index, + method='pad', level='second') - assertRaisesRegexp(TypeError, "Fill method not supported", idx.reindex, - idx, method='bfill', level='first') + tm.assert_raises_regex(TypeError, "Fill method not supported", + idx.reindex, idx, method='bfill', + level='first') def test_duplicates(self): - self.assertFalse(self.index.has_duplicates) - self.assertTrue(self.index.append(self.index).has_duplicates) + assert not self.index.has_duplicates + assert self.index.append(self.index).has_duplicates index = MultiIndex(levels=[[0, 1], [0, 1, 2]], labels=[ [0, 0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 0, 1, 2]]) - self.assertTrue(index.has_duplicates) + assert index.has_duplicates # GH 9075 t = [(u('x'), u('out'), u('z'), 5, u('y'), u('in'), u('z'), 169), @@ -2074,7 +2158,7 @@ def test_duplicates(self): (u('x'), u('out'), u('z'), 12, u('y'), u('in'), u('z'), 144)] index = pd.MultiIndex.from_tuples(t) - self.assertFalse(index.has_duplicates) + assert not index.has_duplicates # handle int64 overflow if possible def check(nlevels, with_nulls): @@ -2095,7 +2179,7 @@ def check(nlevels, with_nulls): # no dups index = MultiIndex(levels=levels, labels=labels) - self.assertFalse(index.has_duplicates) + assert not index.has_duplicates # with a dup if with_nulls: @@ -2106,7 +2190,7 @@ def check(nlevels, with_nulls): values = index.values.tolist() index = MultiIndex.from_tuples(values + [values[0]]) - self.assertTrue(index.has_duplicates) + assert index.has_duplicates # no overflow check(4, False) @@ -2124,14 +2208,14 @@ def check(nlevels, with_nulls): for keep in ['first', 'last', False]: left = mi.duplicated(keep=keep) - right = pd.hashtable.duplicated_object(mi.values, keep=keep) + right = pd._libs.hashtable.duplicated_object(mi.values, keep=keep) tm.assert_numpy_array_equal(left, right) # GH5873 for a in [101, 102]: mi = 
MultiIndex.from_arrays([[101, a], [3.5, np.nan]]) - self.assertFalse(mi.has_duplicates) - self.assertEqual(mi.get_duplicates(), []) + assert not mi.has_duplicates + assert mi.get_duplicates() == [] tm.assert_numpy_array_equal(mi.duplicated(), np.zeros( 2, dtype='bool')) @@ -2141,9 +2225,9 @@ def check(nlevels, with_nulls): lab = product(range(-1, n), range(-1, m)) mi = MultiIndex(levels=[list('abcde')[:n], list('WXYZ')[:m]], labels=np.random.permutation(list(lab)).T) - self.assertEqual(len(mi), (n + 1) * (m + 1)) - self.assertFalse(mi.has_duplicates) - self.assertEqual(mi.get_duplicates(), []) + assert len(mi) == (n + 1) * (m + 1) + assert not mi.has_duplicates + assert mi.get_duplicates() == [] tm.assert_numpy_array_equal(mi.duplicated(), np.zeros( len(mi), dtype='bool')) @@ -2155,8 +2239,8 @@ def test_duplicate_meta_data(self): index.set_names([None, None]), index.set_names([None, 'Num']), index.set_names(['Upper', 'Num']), ]: - self.assertTrue(idx.has_duplicates) - self.assertEqual(idx.drop_duplicates().names, idx.names) + assert idx.has_duplicates + assert idx.drop_duplicates().names == idx.names def test_get_unique_index(self): idx = self.index[[0, 1, 0, 1, 1, 0, 0]] @@ -2164,8 +2248,8 @@ def test_get_unique_index(self): for dropna in [False, True]: result = idx._get_unique_index(dropna=dropna) - self.assertTrue(result.unique) - self.assert_index_equal(result, expected) + assert result.unique + tm.assert_index_equal(result, expected) def test_unique(self): mi = pd.MultiIndex.from_arrays([[1, 2, 1, 2], [1, 1, 1, 2]]) @@ -2202,14 +2286,13 @@ def test_unique_datetimelike(self): def test_tolist(self): result = self.index.tolist() exp = list(self.index.values) - self.assertEqual(result, exp) + assert result == exp def test_repr_with_unicode_data(self): with pd.core.config.option_context("display.encoding", 'UTF-8'): d = {"a": [u("\u05d0"), 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]} index = pd.DataFrame(d).set_index(["a", "b"]).index - self.assertFalse("\\u" in repr(index) - ) # we don't want unicode-escaped + assert "\\u" not in repr(index) # we don't want unicode-escaped def test_repr_roundtrip(self): @@ -2223,10 +2306,8 @@ def test_repr_roundtrip(self): result = eval(repr(mi)) # string coerces to unicode tm.assert_index_equal(result, mi, exact=False) - self.assertEqual( - mi.get_level_values('first').inferred_type, 'string') - self.assertEqual( - result.get_level_values('first').inferred_type, 'unicode') + assert mi.get_level_values('first').inferred_type == 'string' + assert result.get_level_values('first').inferred_type == 'unicode' mi_u = MultiIndex.from_product( [list(u'ab'), range(3)], names=['first', 'second']) @@ -2242,7 +2323,6 @@ def test_repr_roundtrip(self): # long format mi = MultiIndex.from_product([list('abcdefg'), range(10)], names=['first', 'second']) - result = str(mi) if PY3: tm.assert_index_equal(eval(repr(mi)), mi, exact=True) @@ -2250,13 +2330,9 @@ def test_repr_roundtrip(self): result = eval(repr(mi)) # string coerces to unicode tm.assert_index_equal(result, mi, exact=False) - self.assertEqual( - mi.get_level_values('first').inferred_type, 'string') - self.assertEqual( - result.get_level_values('first').inferred_type, 'unicode') + assert mi.get_level_values('first').inferred_type == 'string' + assert result.get_level_values('first').inferred_type == 'unicode' - mi = MultiIndex.from_product( - [list(u'abcdefg'), range(10)], names=['first', 'second']) result = eval(repr(mi_u)) tm.assert_index_equal(result, mi_u, exact=True) @@ -2285,14 +2361,14 @@ def 
test_bytestring_with_unicode(self):
     def test_slice_keep_name(self):
         x = MultiIndex.from_tuples([('a', 'b'), (1, 2), ('c', 'd')],
                                    names=['x', 'y'])
-        self.assertEqual(x[1:].names, x.names)
+        assert x[1:].names == x.names
 
-    def test_isnull_behavior(self):
+    def test_isna_behavior(self):
         # should not segfault GH5123
         # NOTE: if MI representation changes, may make sense to allow
-        # isnull(MI)
-        with tm.assertRaises(NotImplementedError):
-            pd.isnull(self.index)
+        # isna(MI)
+        with pytest.raises(NotImplementedError):
+            pd.isna(self.index)
 
     def test_level_setting_resets_attributes(self):
         ind = MultiIndex.from_arrays([
@@ -2301,9 +2377,185 @@ def test_level_setting_resets_attributes(self):
         assert ind.is_monotonic
         ind.set_levels([['A', 'B', 'A', 'A', 'B'], [2, 1, 3, -2, 5]],
                        inplace=True)
+        # if this fails, probably didn't reset the cache correctly.
         assert not ind.is_monotonic
 
+    def test_is_monotonic(self):
+        i = MultiIndex.from_product([np.arange(10),
+                                     np.arange(10)], names=['one', 'two'])
+        assert i.is_monotonic
+        assert i._is_strictly_monotonic_increasing
+        assert Index(i.values).is_monotonic
+        assert Index(i.values)._is_strictly_monotonic_increasing
+
+        i = MultiIndex.from_product([np.arange(10, 0, -1),
+                                     np.arange(10)], names=['one', 'two'])
+        assert not i.is_monotonic
+        assert not i._is_strictly_monotonic_increasing
+        assert not Index(i.values).is_monotonic
+        assert not Index(i.values)._is_strictly_monotonic_increasing
+
+        i = MultiIndex.from_product([np.arange(10),
+                                     np.arange(10, 0, -1)],
+                                    names=['one', 'two'])
+        assert not i.is_monotonic
+        assert not i._is_strictly_monotonic_increasing
+        assert not Index(i.values).is_monotonic
+        assert not Index(i.values)._is_strictly_monotonic_increasing
+
+        i = MultiIndex.from_product([[1.0, np.nan, 2.0], ['a', 'b', 'c']])
+        assert not i.is_monotonic
+        assert not i._is_strictly_monotonic_increasing
+        assert not Index(i.values).is_monotonic
+        assert not Index(i.values)._is_strictly_monotonic_increasing
+
+        # string ordering
+        i = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
+                               ['one', 'two', 'three']],
+                       labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
+                               [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
+                       names=['first', 'second'])
+        assert not i.is_monotonic
+        assert not Index(i.values).is_monotonic
+        assert not i._is_strictly_monotonic_increasing
+        assert not Index(i.values)._is_strictly_monotonic_increasing
+
+        i = MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'],
+                               ['mom', 'next', 'zenith']],
+                       labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
+                               [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
+                       names=['first', 'second'])
+        assert i.is_monotonic
+        assert Index(i.values).is_monotonic
+        assert i._is_strictly_monotonic_increasing
+        assert Index(i.values)._is_strictly_monotonic_increasing
+
+        # mixed levels, hits the TypeError
+        i = MultiIndex(
+            levels=[[1, 2, 3, 4], ['gb00b03mlx29', 'lu0197800237',
+                                   'nl0000289783',
+                                   'nl0000289965', 'nl0000301109']],
+            labels=[[0, 1, 1, 2, 2, 2, 3], [4, 2, 0, 0, 1, 3, -1]],
+            names=['household_id', 'asset_id'])
+
+        assert not i.is_monotonic
+        assert not i._is_strictly_monotonic_increasing
+
+    def test_is_strictly_monotonic(self):
+        idx = pd.MultiIndex(levels=[['bar', 'baz'], ['mom', 'next']],
+                            labels=[[0, 0, 1, 1], [0, 0, 0, 1]])
+        assert idx.is_monotonic_increasing
+        assert not idx._is_strictly_monotonic_increasing
+
+    @pytest.mark.xfail(reason="buggy MultiIndex.is_monotonic_decreasing.")
+    def test__is_strictly_monotonic_decreasing(self):
+        idx = pd.MultiIndex(levels=[['baz', 'bar'], ['next', 'mom']],
+                            labels=[[0, 0, 1, 1], [0, 0, 0, 1]])
+        assert idx.is_monotonic_decreasing
+        assert not idx._is_strictly_monotonic_decreasing
+
+    def test_reconstruct_sort(self):
+
+        # starts off lexsorted & monotonic
+        mi = MultiIndex.from_arrays([
+            ['A', 'A', 'B', 'B', 'B'], [1, 2, 1, 2, 3]
+        ])
+        assert mi.is_lexsorted()
+        assert mi.is_monotonic
+
+        recons = mi._sort_levels_monotonic()
+        assert recons.is_lexsorted()
+        assert recons.is_monotonic
+        assert mi is recons
+
+        assert mi.equals(recons)
+        assert Index(mi.values).equals(Index(recons.values))
+
+        # cannot convert to lexsorted
+        mi = pd.MultiIndex.from_tuples([('z', 'a'), ('x', 'a'), ('y', 'b'),
+                                        ('x', 'b'), ('y', 'a'), ('z', 'b')],
+                                       names=['one', 'two'])
+        assert not mi.is_lexsorted()
+        assert not mi.is_monotonic
+
+        recons = mi._sort_levels_monotonic()
+        assert not recons.is_lexsorted()
+        assert not recons.is_monotonic
+
+        assert mi.equals(recons)
+        assert Index(mi.values).equals(Index(recons.values))
+
+        # cannot convert to lexsorted
+        mi = MultiIndex(levels=[['b', 'd', 'a'], [1, 2, 3]],
+                        labels=[[0, 1, 0, 2], [2, 0, 0, 1]],
+                        names=['col1', 'col2'])
+        assert not mi.is_lexsorted()
+        assert not mi.is_monotonic
+
+        recons = mi._sort_levels_monotonic()
+        assert not recons.is_lexsorted()
+        assert not recons.is_monotonic
+
+        assert mi.equals(recons)
+        assert Index(mi.values).equals(Index(recons.values))
+
+    def test_reconstruct_remove_unused(self):
+        # xref to GH 2770
+        df = DataFrame([['deleteMe', 1, 9],
+                        ['keepMe', 2, 9],
+                        ['keepMeToo', 3, 9]],
+                       columns=['first', 'second', 'third'])
+        df2 = df.set_index(['first', 'second'], drop=False)
+        df2 = df2[df2['first'] != 'deleteMe']
+
+        # removed levels are there
+        expected = MultiIndex(levels=[['deleteMe', 'keepMe', 'keepMeToo'],
+                                      [1, 2, 3]],
+                              labels=[[1, 2], [1, 2]],
+                              names=['first', 'second'])
+        result = df2.index
+        tm.assert_index_equal(result, expected)
+
+        expected = MultiIndex(levels=[['keepMe', 'keepMeToo'],
+                                      [2, 3]],
+                              labels=[[0, 1], [0, 1]],
+                              names=['first', 'second'])
+        result = df2.index.remove_unused_levels()
+        tm.assert_index_equal(result, expected)
+
+        # idempotent
+        result2 = result.remove_unused_levels()
+        tm.assert_index_equal(result2, expected)
+        assert result2.is_(result)
+
+    @pytest.mark.parametrize('first_type,second_type', [
+        ('int64', 'int64'),
+        ('datetime64[D]', 'str')])
+    def test_remove_unused_levels_large(self, first_type, second_type):
+        # GH16556
+
+        # because tests should be deterministic (and this test in particular
+        # checks that levels are removed, which is not the case for every
+        # random input):
+        rng = np.random.RandomState(4)  # seed is an arbitrary value that works
+
+        size = 1 << 16
+        df = DataFrame(dict(
+            first=rng.randint(0, 1 << 13, size).astype(first_type),
+            second=rng.randint(0, 1 << 10, size).astype(second_type),
+            third=rng.rand(size)))
+        df = df.groupby(['first', 'second']).sum()
+        df = df[df.third < 0.1]
+
+        result = df.index.remove_unused_levels()
+        assert len(result.levels[0]) < len(df.index.levels[0])
+        assert len(result.levels[1]) < len(df.index.levels[1])
+        assert result.equals(df.index)
+
+        expected = df.reset_index().set_index(['first', 'second']).index
+        tm.assert_index_equal(result, expected)
+
     def test_isin(self):
         values = [('foo', 2), ('bar', 3), ('quux', 4)]
@@ -2316,8 +2568,8 @@ def test_isin(self):
         # empty, return dtype bool
         idx = MultiIndex.from_arrays([[], []])
         result = idx.isin(values)
-        self.assertEqual(len(result), 0)
-        self.assertEqual(result.dtype, np.bool_)
+        assert len(result) == 0
+        assert result.dtype == np.bool_
 
     def test_isin_nan(self):
         idx = MultiIndex.from_arrays([['foo', 'bar'], [1.0,
np.nan]]) @@ -2340,18 +2592,18 @@ def test_isin_level_kwarg(self): tm.assert_numpy_array_equal(expected, idx.isin(vals_1, level=1)) tm.assert_numpy_array_equal(expected, idx.isin(vals_1, level=-1)) - self.assertRaises(IndexError, idx.isin, vals_0, level=5) - self.assertRaises(IndexError, idx.isin, vals_0, level=-5) + pytest.raises(IndexError, idx.isin, vals_0, level=5) + pytest.raises(IndexError, idx.isin, vals_0, level=-5) - self.assertRaises(KeyError, idx.isin, vals_0, level=1.0) - self.assertRaises(KeyError, idx.isin, vals_1, level=-1.0) - self.assertRaises(KeyError, idx.isin, vals_1, level='A') + pytest.raises(KeyError, idx.isin, vals_0, level=1.0) + pytest.raises(KeyError, idx.isin, vals_1, level=-1.0) + pytest.raises(KeyError, idx.isin, vals_1, level='A') idx.names = ['A', 'B'] tm.assert_numpy_array_equal(expected, idx.isin(vals_0, level='A')) tm.assert_numpy_array_equal(expected, idx.isin(vals_1, level='B')) - self.assertRaises(KeyError, idx.isin, vals_1, level='C') + pytest.raises(KeyError, idx.isin, vals_1, level='C') def test_reindex_preserves_names_when_target_is_list_or_ndarray(self): # GH6552 @@ -2362,39 +2614,33 @@ def test_reindex_preserves_names_when_target_is_list_or_ndarray(self): other_dtype = pd.MultiIndex.from_product([[1, 2], [3, 4]]) # list & ndarray cases - self.assertEqual(idx.reindex([])[0].names, [None, None]) - self.assertEqual(idx.reindex(np.array([]))[0].names, [None, None]) - self.assertEqual(idx.reindex(target.tolist())[0].names, [None, None]) - self.assertEqual(idx.reindex(target.values)[0].names, [None, None]) - self.assertEqual( - idx.reindex(other_dtype.tolist())[0].names, [None, None]) - self.assertEqual( - idx.reindex(other_dtype.values)[0].names, [None, None]) + assert idx.reindex([])[0].names == [None, None] + assert idx.reindex(np.array([]))[0].names == [None, None] + assert idx.reindex(target.tolist())[0].names == [None, None] + assert idx.reindex(target.values)[0].names == [None, None] + assert idx.reindex(other_dtype.tolist())[0].names == [None, None] + assert idx.reindex(other_dtype.values)[0].names == [None, None] idx.names = ['foo', 'bar'] - self.assertEqual(idx.reindex([])[0].names, ['foo', 'bar']) - self.assertEqual(idx.reindex(np.array([]))[0].names, ['foo', 'bar']) - self.assertEqual(idx.reindex(target.tolist())[0].names, ['foo', 'bar']) - self.assertEqual(idx.reindex(target.values)[0].names, ['foo', 'bar']) - self.assertEqual( - idx.reindex(other_dtype.tolist())[0].names, ['foo', 'bar']) - self.assertEqual( - idx.reindex(other_dtype.values)[0].names, ['foo', 'bar']) + assert idx.reindex([])[0].names == ['foo', 'bar'] + assert idx.reindex(np.array([]))[0].names == ['foo', 'bar'] + assert idx.reindex(target.tolist())[0].names == ['foo', 'bar'] + assert idx.reindex(target.values)[0].names == ['foo', 'bar'] + assert idx.reindex(other_dtype.tolist())[0].names == ['foo', 'bar'] + assert idx.reindex(other_dtype.values)[0].names == ['foo', 'bar'] def test_reindex_lvl_preserves_names_when_target_is_list_or_array(self): # GH7774 idx = pd.MultiIndex.from_product([[0, 1], ['a', 'b']], names=['foo', 'bar']) - self.assertEqual(idx.reindex([], level=0)[0].names, ['foo', 'bar']) - self.assertEqual(idx.reindex([], level=1)[0].names, ['foo', 'bar']) + assert idx.reindex([], level=0)[0].names == ['foo', 'bar'] + assert idx.reindex([], level=1)[0].names == ['foo', 'bar'] def test_reindex_lvl_preserves_type_if_target_is_empty_list_or_array(self): # GH7774 idx = pd.MultiIndex.from_product([[0, 1], ['a', 'b']]) - self.assertEqual(idx.reindex([], 
level=0)[0].levels[0].dtype.type, - np.int64) - self.assertEqual(idx.reindex([], level=1)[0].levels[1].dtype.type, - np.object_) + assert idx.reindex([], level=0)[0].levels[0].dtype.type == np.int64 + assert idx.reindex([], level=1)[0].levels[1].dtype.type == np.object_ def test_groupby(self): groups = self.index.groupby(np.array([1, 1, 1, 2, 2, 2])) @@ -2422,23 +2668,23 @@ def test_index_name_retained(self): def test_equals_operator(self): # GH9785 - self.assertTrue((self.index == self.index).all()) + assert (self.index == self.index).all() def test_large_multiindex_error(self): # GH12527 df_below_1000000 = pd.DataFrame( 1, index=pd.MultiIndex.from_product([[1, 2], range(499999)]), columns=['dest']) - with assertRaises(KeyError): + with pytest.raises(KeyError): df_below_1000000.loc[(-1, 0), 'dest'] - with assertRaises(KeyError): + with pytest.raises(KeyError): df_below_1000000.loc[(3, 0), 'dest'] df_above_1000000 = pd.DataFrame( 1, index=pd.MultiIndex.from_product([[1, 2], range(500001)]), columns=['dest']) - with assertRaises(KeyError): + with pytest.raises(KeyError): df_above_1000000.loc[(-1, 0), 'dest'] - with assertRaises(KeyError): + with pytest.raises(KeyError): df_above_1000000.loc[(3, 0), 'dest'] def test_partial_string_timestamp_multiindex(self): @@ -2491,7 +2737,7 @@ def test_partial_string_timestamp_multiindex(self): # ambiguous and we don't want to extend this behavior forward to work # in multi-indexes. This would amount to selecting a scalar from a # column. - with assertRaises(KeyError): + with pytest.raises(KeyError): df['2016-01-01'] # partial string match on year only @@ -2520,7 +2766,7 @@ def test_partial_string_timestamp_multiindex(self): tm.assert_frame_equal(result, expected) # Slicing date on first level should break (of course) - with assertRaises(KeyError): + with pytest.raises(KeyError): df_swap.loc['2016-01-01'] # GH12685 (partial string with daily resolution or below) @@ -2573,7 +2819,7 @@ def test_dropna(self): tm.assert_index_equal(idx.dropna(how='all'), exp) msg = "invalid how option: xxx" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.dropna(how='xxx') def test_unsortedindex(self): @@ -2584,19 +2830,69 @@ def test_unsortedindex(self): df = pd.DataFrame([[i, 10 * i] for i in lrange(6)], index=mi, columns=['one', 'two']) - with assertRaises(UnsortedIndexError): - df.loc(axis=0)['z', :] + # GH 16734: not sorted, but no real slicing + result = df.loc(axis=0)['z', 'a'] + expected = df.iloc[0] + tm.assert_series_equal(result, expected) + + with pytest.raises(UnsortedIndexError): + df.loc(axis=0)['z', slice('a')] df.sort_index(inplace=True) - self.assertEqual(len(df.loc(axis=0)['z', :]), 2) + assert len(df.loc(axis=0)['z', :]) == 2 - with assertRaises(KeyError): + with pytest.raises(KeyError): df.loc(axis=0)['q', :] + def test_unsortedindex_doc_examples(self): + # http://pandas.pydata.org/pandas-docs/stable/advanced.html#sorting-a-multiindex # noqa + dfm = DataFrame({'jim': [0, 0, 1, 1], + 'joe': ['x', 'x', 'z', 'y'], + 'jolie': np.random.rand(4)}) + + dfm = dfm.set_index(['jim', 'joe']) + with tm.assert_produces_warning(PerformanceWarning): + dfm.loc[(1, 'z')] + + with pytest.raises(UnsortedIndexError): + dfm.loc[(0, 'y'):(1, 'z')] + + assert not dfm.index.is_lexsorted() + assert dfm.index.lexsort_depth == 1 + + # sort it + dfm = dfm.sort_index() + dfm.loc[(1, 'z')] + dfm.loc[(0, 'y'):(1, 'z')] + + assert dfm.index.is_lexsorted() + assert dfm.index.lexsort_depth == 2 + def test_tuples_with_name_string(self): # 
GH 15110 and GH 14848 li = [(0, 0, 1), (0, 1, 0), (1, 0, 0)] - with assertRaises(ValueError): + with pytest.raises(ValueError): pd.Index(li, name='abc') - with assertRaises(ValueError): + with pytest.raises(ValueError): pd.Index(li, name='a') + + def test_nan_stays_float(self): + + # GH 7031 + idx0 = pd.MultiIndex(levels=[["A", "B"], []], + labels=[[1, 0], [-1, -1]], + names=[0, 1]) + idx1 = pd.MultiIndex(levels=[["C"], ["D"]], + labels=[[0], [0]], + names=[0, 1]) + idxm = idx0.join(idx1, how='outer') + assert pd.isna(idx0.get_level_values(1)).all() + # the following failed in 0.14.1 + assert pd.isna(idxm.get_level_values(1)[:-1]).all() + + df0 = pd.DataFrame([[1, 2]], index=idx0) + df1 = pd.DataFrame([[3, 4]], index=idx1) + dfm = df0 - df1 + assert pd.isna(df0.index.get_level_values(1)).all() + # the following failed in 0.14.1 + assert pd.isna(dfm.index.get_level_values(1)[:-1]).all() diff --git a/pandas/tests/indexes/test_numeric.py b/pandas/tests/indexes/test_numeric.py index c7acbf51a17e5..1a0a38c173284 100644 --- a/pandas/tests/indexes/test_numeric.py +++ b/pandas/tests/indexes/test_numeric.py @@ -1,18 +1,19 @@ # -*- coding: utf-8 -*- +import pytest + from datetime import datetime from pandas.compat import range, PY3 -import nose import numpy as np -from pandas import (date_range, Series, Index, Float64Index, +from pandas import (date_range, notna, Series, Index, Float64Index, Int64Index, UInt64Index, RangeIndex) import pandas.util.testing as tm import pandas as pd -from pandas.lib import Timestamp +from pandas._libs.lib import Timestamp from pandas.tests.indexes.common import Base @@ -75,10 +76,10 @@ def test_numeric_compat(self): tm.assert_index_equal(result, expected) # invalid - self.assertRaises(TypeError, - lambda: idx * date_range('20130101', periods=5)) - self.assertRaises(ValueError, lambda: idx * idx[0:3]) - self.assertRaises(ValueError, lambda: idx * np.array([1, 2])) + pytest.raises(TypeError, + lambda: idx * date_range('20130101', periods=5)) + pytest.raises(ValueError, lambda: idx * idx[0:3]) + pytest.raises(ValueError, lambda: idx * np.array([1, 2])) result = divmod(idx, 2) with np.errstate(all='ignore'): @@ -172,14 +173,13 @@ def test_modulo(self): # GH 9244 index = self.create_index() expected = Index(index.values % 2) - self.assert_index_equal(index % 2, expected) + tm.assert_index_equal(index % 2, expected) -class TestFloat64Index(Numeric, tm.TestCase): +class TestFloat64Index(Numeric): _holder = Float64Index - _multiprocess_can_split_ = True - def setUp(self): + def setup_method(self, method): self.indices = dict(mixed=Float64Index([1.5, 2, 3, 4, 5]), float=Float64Index(np.arange(5) * 2.5)) self.setup_indices() @@ -192,14 +192,14 @@ def test_repr_roundtrip(self): tm.assert_index_equal(eval(repr(ind)), ind) def check_is_index(self, i): - self.assertIsInstance(i, Index) - self.assertNotIsInstance(i, Float64Index) + assert isinstance(i, Index) + assert not isinstance(i, Float64Index) def check_coerce(self, a, b, is_float_index=True): - self.assertTrue(a.equals(b)) - self.assert_index_equal(a, b, exact=False) + assert a.equals(b) + tm.assert_index_equal(a, b, exact=False) if is_float_index: - self.assertIsInstance(b, Float64Index) + assert isinstance(b, Float64Index) else: self.check_is_index(b) @@ -207,39 +207,39 @@ def test_constructor(self): # explicit construction index = Float64Index([1, 2, 3, 4, 5]) - self.assertIsInstance(index, Float64Index) + assert isinstance(index, Float64Index) expected = np.array([1, 2, 3, 4, 5], dtype='float64') - 
self.assert_numpy_array_equal(index.values, expected) + tm.assert_numpy_array_equal(index.values, expected) index = Float64Index(np.array([1, 2, 3, 4, 5])) - self.assertIsInstance(index, Float64Index) + assert isinstance(index, Float64Index) index = Float64Index([1., 2, 3, 4, 5]) - self.assertIsInstance(index, Float64Index) + assert isinstance(index, Float64Index) index = Float64Index(np.array([1., 2, 3, 4, 5])) - self.assertIsInstance(index, Float64Index) - self.assertEqual(index.dtype, float) + assert isinstance(index, Float64Index) + assert index.dtype == float index = Float64Index(np.array([1., 2, 3, 4, 5]), dtype=np.float32) - self.assertIsInstance(index, Float64Index) - self.assertEqual(index.dtype, np.float64) + assert isinstance(index, Float64Index) + assert index.dtype == np.float64 index = Float64Index(np.array([1, 2, 3, 4, 5]), dtype=np.float32) - self.assertIsInstance(index, Float64Index) - self.assertEqual(index.dtype, np.float64) + assert isinstance(index, Float64Index) + assert index.dtype == np.float64 # nan handling result = Float64Index([np.nan, np.nan]) - self.assertTrue(pd.isnull(result.values).all()) + assert pd.isna(result.values).all() result = Float64Index(np.array([np.nan])) - self.assertTrue(pd.isnull(result.values).all()) + assert pd.isna(result.values).all() result = Index(np.array([np.nan])) - self.assertTrue(pd.isnull(result.values).all()) + assert pd.isna(result.values).all() def test_constructor_invalid(self): # invalid - self.assertRaises(TypeError, Float64Index, 0.) - self.assertRaises(TypeError, Float64Index, ['a', 'b', 0.]) - self.assertRaises(TypeError, Float64Index, [Timestamp('20130101')]) + pytest.raises(TypeError, Float64Index, 0.) + pytest.raises(TypeError, Float64Index, ['a', 'b', 0.]) + pytest.raises(TypeError, Float64Index, [Timestamp('20130101')]) def test_constructor_coerce(self): @@ -260,15 +260,15 @@ def test_constructor_explicit(self): def test_astype(self): result = self.float.astype(object) - self.assertTrue(result.equals(self.float)) - self.assertTrue(self.float.equals(result)) + assert result.equals(self.float) + assert self.float.equals(result) self.check_is_index(result) i = self.mixed.copy() i.name = 'foo' result = i.astype(object) - self.assertTrue(result.equals(i)) - self.assertTrue(i.equals(result)) + assert result.equals(i) + assert i.equals(result) self.check_is_index(result) # GH 12881 @@ -297,28 +297,28 @@ def test_astype(self): # invalid for dtype in ['M8[ns]', 'm8[ns]']: - self.assertRaises(TypeError, lambda: i.astype(dtype)) + pytest.raises(TypeError, lambda: i.astype(dtype)) # GH 13149 for dtype in ['int16', 'int32', 'int64']: i = Float64Index([0, 1.1, np.NAN]) - self.assertRaises(ValueError, lambda: i.astype(dtype)) + pytest.raises(ValueError, lambda: i.astype(dtype)) def test_equals_numeric(self): i = Float64Index([1.0, 2.0]) - self.assertTrue(i.equals(i)) - self.assertTrue(i.identical(i)) + assert i.equals(i) + assert i.identical(i) i2 = Float64Index([1.0, 2.0]) - self.assertTrue(i.equals(i2)) + assert i.equals(i2) i = Float64Index([1.0, np.nan]) - self.assertTrue(i.equals(i)) - self.assertTrue(i.identical(i)) + assert i.equals(i) + assert i.identical(i) i2 = Float64Index([1.0, np.nan]) - self.assertTrue(i.equals(i2)) + assert i.equals(i2) def test_get_indexer(self): idx = Float64Index([0.0, 1.0, 2.0]) @@ -336,54 +336,62 @@ def test_get_indexer(self): def test_get_loc(self): idx = Float64Index([0.0, 1.0, 2.0]) for method in [None, 'pad', 'backfill', 'nearest']: - self.assertEqual(idx.get_loc(1, method), 1) + assert 
idx.get_loc(1, method) == 1 if method is not None: - self.assertEqual(idx.get_loc(1, method, tolerance=0), 1) + assert idx.get_loc(1, method, tolerance=0) == 1 for method, loc in [('pad', 1), ('backfill', 2), ('nearest', 1)]: - self.assertEqual(idx.get_loc(1.1, method), loc) - self.assertEqual(idx.get_loc(1.1, method, tolerance=0.9), loc) + assert idx.get_loc(1.1, method) == loc + assert idx.get_loc(1.1, method, tolerance=0.9) == loc - self.assertRaises(KeyError, idx.get_loc, 'foo') - self.assertRaises(KeyError, idx.get_loc, 1.5) - self.assertRaises(KeyError, idx.get_loc, 1.5, method='pad', - tolerance=0.1) + pytest.raises(KeyError, idx.get_loc, 'foo') + pytest.raises(KeyError, idx.get_loc, 1.5) + pytest.raises(KeyError, idx.get_loc, 1.5, method='pad', + tolerance=0.1) - with tm.assertRaisesRegexp(ValueError, 'must be numeric'): + with tm.assert_raises_regex(ValueError, 'must be numeric'): idx.get_loc(1.4, method='nearest', tolerance='foo') def test_get_loc_na(self): idx = Float64Index([np.nan, 1, 2]) - self.assertEqual(idx.get_loc(1), 1) - self.assertEqual(idx.get_loc(np.nan), 0) + assert idx.get_loc(1) == 1 + assert idx.get_loc(np.nan) == 0 idx = Float64Index([np.nan, 1, np.nan]) - self.assertEqual(idx.get_loc(1), 1) + assert idx.get_loc(1) == 1 # representable by slice [0:2:2] - # self.assertRaises(KeyError, idx.slice_locs, np.nan) + # pytest.raises(KeyError, idx.slice_locs, np.nan) sliced = idx.slice_locs(np.nan) - self.assertTrue(isinstance(sliced, tuple)) - self.assertEqual(sliced, (0, 3)) + assert isinstance(sliced, tuple) + assert sliced == (0, 3) # not representable by slice idx = Float64Index([np.nan, 1, np.nan, np.nan]) - self.assertEqual(idx.get_loc(1), 1) - self.assertRaises(KeyError, idx.slice_locs, np.nan) + assert idx.get_loc(1) == 1 + pytest.raises(KeyError, idx.slice_locs, np.nan) + + def test_get_loc_missing_nan(self): + # GH 8569 + idx = Float64Index([1, 2]) + assert idx.get_loc(1) == 0 + pytest.raises(KeyError, idx.get_loc, 3) + pytest.raises(KeyError, idx.get_loc, np.nan) + pytest.raises(KeyError, idx.get_loc, [np.nan]) def test_contains_nans(self): i = Float64Index([1.0, 2.0, np.nan]) - self.assertTrue(np.nan in i) + assert np.nan in i def test_contains_not_nans(self): i = Float64Index([1.0, 2.0, np.nan]) - self.assertTrue(1.0 in i) + assert 1.0 in i def test_doesnt_contain_all_the_things(self): i = Float64Index([np.nan]) - self.assertFalse(i.isin([0]).item()) - self.assertFalse(i.isin([1]).item()) - self.assertTrue(i.isin([np.nan]).item()) + assert not i.isin([0]).item() + assert not i.isin([1]).item() + assert i.isin([np.nan]).item() def test_nan_multiple_containment(self): i = Float64Index([1.0, np.nan]) @@ -400,7 +408,7 @@ def test_astype_from_object(self): index = Index([1.0, np.nan, 0.2], dtype='object') result = index.astype(float) expected = Float64Index([1.0, np.nan, 0.2]) - self.assertEqual(result.dtype, expected.dtype) + assert result.dtype == expected.dtype tm.assert_index_equal(result, expected) def test_fillna_float64(self): @@ -408,15 +416,15 @@ def test_fillna_float64(self): idx = Index([1.0, np.nan, 3.0], dtype=float, name='x') # can't downcast exp = Index([1.0, 0.1, 3.0], name='x') - self.assert_index_equal(idx.fillna(0.1), exp) + tm.assert_index_equal(idx.fillna(0.1), exp) # downcast exp = Float64Index([1.0, 2.0, 3.0], name='x') - self.assert_index_equal(idx.fillna(2), exp) + tm.assert_index_equal(idx.fillna(2), exp) # object exp = Index([1.0, 'obj', 3.0], name='x') - self.assert_index_equal(idx.fillna('obj'), exp) + 
tm.assert_index_equal(idx.fillna('obj'), exp) def test_take_fill_value(self): # GH 12631 @@ -438,12 +446,12 @@ def test_take_fill_value(self): msg = ('When allow_fill=True and fill_value is not None, ' 'all indices must be >= -1') - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -2]), fill_value=True) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -5]), fill_value=True) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): idx.take(np.array([1, -5])) @@ -454,7 +462,7 @@ def test_view(self): i = self._holder([], name='Foo') i_view = i.view() - self.assertEqual(i_view.name, 'Foo') + assert i_view.name == 'Foo' i_view = i.view(self._dtype) tm.assert_index_equal(i, self._holder(i_view, name='Foo')) @@ -463,42 +471,61 @@ def test_view(self): tm.assert_index_equal(i, self._holder(i_view, name='Foo')) def test_is_monotonic(self): - self.assertTrue(self.index.is_monotonic) - self.assertTrue(self.index.is_monotonic_increasing) - self.assertFalse(self.index.is_monotonic_decreasing) + assert self.index.is_monotonic + assert self.index.is_monotonic_increasing + assert self.index._is_strictly_monotonic_increasing + assert not self.index.is_monotonic_decreasing + assert not self.index._is_strictly_monotonic_decreasing index = self._holder([4, 3, 2, 1]) - self.assertFalse(index.is_monotonic) - self.assertTrue(index.is_monotonic_decreasing) + assert not index.is_monotonic + assert not index._is_strictly_monotonic_increasing + assert index._is_strictly_monotonic_decreasing index = self._holder([1]) - self.assertTrue(index.is_monotonic) - self.assertTrue(index.is_monotonic_increasing) - self.assertTrue(index.is_monotonic_decreasing) + assert index.is_monotonic + assert index.is_monotonic_increasing + assert index.is_monotonic_decreasing + assert index._is_strictly_monotonic_increasing + assert index._is_strictly_monotonic_decreasing + + def test_is_strictly_monotonic(self): + index = self._holder([1, 1, 2, 3]) + assert index.is_monotonic_increasing + assert not index._is_strictly_monotonic_increasing + + index = self._holder([3, 2, 1, 1]) + assert index.is_monotonic_decreasing + assert not index._is_strictly_monotonic_decreasing + + index = self._holder([1, 1]) + assert index.is_monotonic_increasing + assert index.is_monotonic_decreasing + assert not index._is_strictly_monotonic_increasing + assert not index._is_strictly_monotonic_decreasing def test_logical_compat(self): idx = self.create_index() - self.assertEqual(idx.all(), idx.values.all()) - self.assertEqual(idx.any(), idx.values.any()) + assert idx.all() == idx.values.all() + assert idx.any() == idx.values.any() def test_identical(self): i = Index(self.index.copy()) - self.assertTrue(i.identical(self.index)) + assert i.identical(self.index) same_values_different_type = Index(i, dtype=object) - self.assertFalse(i.identical(same_values_different_type)) + assert not i.identical(same_values_different_type) i = self.index.copy(dtype=object) i = i.rename('foo') same_values = Index(i, dtype=object) - self.assertTrue(same_values.identical(i)) + assert same_values.identical(i) - self.assertFalse(i.identical(self.index)) - self.assertTrue(Index(same_values, name='foo', dtype=object).identical( - i)) + assert not i.identical(self.index) + assert Index(same_values, name='foo', dtype=object).identical(i) - self.assertFalse(self.index.copy(dtype=object) - .identical(self.index.copy(dtype=self._dtype))) + 
assert not self.index.copy(dtype=object).identical( + self.index.copy(dtype=self._dtype)) def test_join_non_unique(self): left = Index([4, 4, 3, 3]) @@ -506,7 +533,7 @@ def test_join_non_unique(self): joined, lidx, ridx = left.join(left, return_indexers=True) exp_joined = Index([3, 3, 3, 3, 4, 4, 4, 4]) - self.assert_index_equal(joined, exp_joined) + tm.assert_index_equal(joined, exp_joined) exp_lidx = np.array([2, 2, 3, 3, 0, 0, 1, 1], dtype=np.intp) tm.assert_numpy_array_equal(lidx, exp_lidx) @@ -518,7 +545,7 @@ def test_join_self(self): kinds = 'outer', 'inner', 'left', 'right' for kind in kinds: joined = self.index.join(self.index, how=kind) - self.assertIs(self.index, joined) + assert self.index is joined def test_union_noncomparable(self): from datetime import datetime, timedelta @@ -536,23 +563,23 @@ def test_union_noncomparable(self): def test_cant_or_shouldnt_cast(self): # can't data = ['foo', 'bar', 'baz'] - self.assertRaises(TypeError, self._holder, data) + pytest.raises(TypeError, self._holder, data) # shouldn't data = ['0', '1', '2'] - self.assertRaises(TypeError, self._holder, data) + pytest.raises(TypeError, self._holder, data) def test_view_index(self): self.index.view(Index) def test_prevent_casting(self): result = self.index.astype('O') - self.assertEqual(result.dtype, np.object_) + assert result.dtype == np.object_ def test_take_preserve_name(self): index = self._holder([1, 2, 3, 4], name='foo') taken = index.take([3, 0, 1]) - self.assertEqual(index.name, taken.name) + assert index.name == taken.name def test_take_fill_value(self): # see gh-12631 @@ -566,7 +593,7 @@ def test_take_fill_value(self): "{name} cannot contain NA").format(name=name) # fill_value=True - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -1]), fill_value=True) # allow_fill=False @@ -575,59 +602,58 @@ def test_take_fill_value(self): expected = self._holder([2, 1, 3], name='xxx') tm.assert_index_equal(result, expected) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -2]), fill_value=True) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -5]), fill_value=True) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): idx.take(np.array([1, -5])) def test_slice_keep_name(self): idx = self._holder([1, 2], name='asdf') - self.assertEqual(idx.name, idx[1:].name) + assert idx.name == idx[1:].name def test_ufunc_coercions(self): idx = self._holder([1, 2, 3, 4, 5], name='x') result = np.sqrt(idx) - tm.assertIsInstance(result, Float64Index) + assert isinstance(result, Float64Index) exp = Float64Index(np.sqrt(np.array([1, 2, 3, 4, 5])), name='x') tm.assert_index_equal(result, exp) result = np.divide(idx, 2.) - tm.assertIsInstance(result, Float64Index) + assert isinstance(result, Float64Index) exp = Float64Index([0.5, 1., 1.5, 2., 2.5], name='x') tm.assert_index_equal(result, exp) # _evaluate_numeric_binop result = idx + 2. - tm.assertIsInstance(result, Float64Index) + assert isinstance(result, Float64Index) exp = Float64Index([3., 4., 5., 6., 7.], name='x') tm.assert_index_equal(result, exp) result = idx - 2. - tm.assertIsInstance(result, Float64Index) + assert isinstance(result, Float64Index) exp = Float64Index([-1., 0., 1., 2., 3.], name='x') tm.assert_index_equal(result, exp) result = idx * 1. 
- tm.assertIsInstance(result, Float64Index) + assert isinstance(result, Float64Index) exp = Float64Index([1., 2., 3., 4., 5.], name='x') tm.assert_index_equal(result, exp) result = idx / 2. - tm.assertIsInstance(result, Float64Index) + assert isinstance(result, Float64Index) exp = Float64Index([0.5, 1., 1.5, 2., 2.5], name='x') tm.assert_index_equal(result, exp) -class TestInt64Index(NumericInt, tm.TestCase): +class TestInt64Index(NumericInt): _dtype = 'int64' _holder = Int64Index - _multiprocess_can_split_ = True - def setUp(self): + def setup_method(self, method): self.indices = dict(index=Int64Index(np.arange(0, 20, 2))) self.setup_indices() @@ -645,7 +671,7 @@ def test_constructor(self): tm.assert_index_equal(index, expected) # scalar raise Exception - self.assertRaises(TypeError, Int64Index, 5) + pytest.raises(TypeError, Int64Index, 5) # copy arr = self.index.values @@ -655,7 +681,7 @@ def test_constructor(self): # this should not change index arr[0] = val - self.assertNotEqual(new_index[0], val) + assert new_index[0] != val # interpret list-like expected = Int64Index([5, 0]) @@ -668,26 +694,51 @@ def test_constructor(self): def test_constructor_corner(self): arr = np.array([1, 2, 3, 4], dtype=object) index = Int64Index(arr) - self.assertEqual(index.values.dtype, np.int64) - self.assert_index_equal(index, Index(arr)) + assert index.values.dtype == np.int64 + tm.assert_index_equal(index, Index(arr)) # preventing casting arr = np.array([1, '2', 3, '4'], dtype=object) - with tm.assertRaisesRegexp(TypeError, 'casting'): + with tm.assert_raises_regex(TypeError, 'casting'): Int64Index(arr) arr_with_floats = [0, 2, 3, 4, 5, 1.25, 3, -1] - with tm.assertRaisesRegexp(TypeError, 'casting'): + with tm.assert_raises_regex(TypeError, 'casting'): Int64Index(arr_with_floats) def test_coerce_list(self): # coerce things arr = Index([1, 2, 3, 4]) - tm.assertIsInstance(arr, Int64Index) + assert isinstance(arr, Int64Index) # but not if explicit dtype passed arr = Index([1, 2, 3, 4], dtype=object) - tm.assertIsInstance(arr, Index) + assert isinstance(arr, Index) + + def test_where(self): + i = self.create_index() + result = i.where(notna(i)) + expected = i + tm.assert_index_equal(result, expected) + + _nan = i._na_value + cond = [False] + [True] * len(i[1:]) + expected = pd.Index([_nan] + i[1:].tolist()) + + result = i.where(cond) + tm.assert_index_equal(result, expected) + + def test_where_array_like(self): + i = self.create_index() + + _nan = i._na_value + cond = [False] + [True] * (len(i) - 1) + klasses = [list, tuple, np.array, pd.Series] + expected = pd.Index([_nan] + i[1:].tolist()) + + for klass in klasses: + result = i.where(klass(cond)) + tm.assert_index_equal(result, expected) def test_get_indexer(self): target = Int64Index(np.arange(10)) @@ -735,8 +786,8 @@ def test_join_inner(self): elidx = np.array([1, 6], dtype=np.intp) eridx = np.array([4, 1], dtype=np.intp) - tm.assertIsInstance(res, Int64Index) - self.assert_index_equal(res, eres) + assert isinstance(res, Int64Index) + tm.assert_index_equal(res, eres) tm.assert_numpy_array_equal(lidx, elidx) tm.assert_numpy_array_equal(ridx, eridx) @@ -745,12 +796,12 @@ def test_join_inner(self): return_indexers=True) res2 = self.index.intersection(other_mono) - self.assert_index_equal(res, res2) + tm.assert_index_equal(res, res2) elidx = np.array([1, 6], dtype=np.intp) eridx = np.array([1, 4], dtype=np.intp) - tm.assertIsInstance(res, Int64Index) - self.assert_index_equal(res, eres) + assert isinstance(res, Int64Index) + tm.assert_index_equal(res, eres) 
tm.assert_numpy_array_equal(lidx, elidx) tm.assert_numpy_array_equal(ridx, eridx) @@ -765,9 +816,9 @@ def test_join_left(self): eridx = np.array([-1, 4, -1, -1, -1, -1, 1, -1, -1, -1], dtype=np.intp) - tm.assertIsInstance(res, Int64Index) - self.assert_index_equal(res, eres) - self.assertIsNone(lidx) + assert isinstance(res, Int64Index) + tm.assert_index_equal(res, eres) + assert lidx is None tm.assert_numpy_array_equal(ridx, eridx) # monotonic @@ -775,9 +826,9 @@ def test_join_left(self): return_indexers=True) eridx = np.array([-1, 1, -1, -1, -1, -1, 4, -1, -1, -1], dtype=np.intp) - tm.assertIsInstance(res, Int64Index) - self.assert_index_equal(res, eres) - self.assertIsNone(lidx) + assert isinstance(res, Int64Index) + tm.assert_index_equal(res, eres) + assert lidx is None tm.assert_numpy_array_equal(ridx, eridx) # non-unique @@ -787,7 +838,7 @@ def test_join_left(self): eres = Index([1, 1, 2, 5, 7, 9]) # 1 is in idx2, so it should be x2 eridx = np.array([0, 1, 2, 3, -1, -1], dtype=np.intp) elidx = np.array([0, 0, 1, 2, 3, 4], dtype=np.intp) - self.assert_index_equal(res, eres) + tm.assert_index_equal(res, eres) tm.assert_numpy_array_equal(lidx, elidx) tm.assert_numpy_array_equal(ridx, eridx) @@ -801,20 +852,20 @@ def test_join_right(self): eres = other elidx = np.array([-1, 6, -1, -1, 1, -1], dtype=np.intp) - tm.assertIsInstance(other, Int64Index) - self.assert_index_equal(res, eres) + assert isinstance(other, Int64Index) + tm.assert_index_equal(res, eres) tm.assert_numpy_array_equal(lidx, elidx) - self.assertIsNone(ridx) + assert ridx is None # monotonic res, lidx, ridx = self.index.join(other_mono, how='right', return_indexers=True) eres = other_mono elidx = np.array([-1, 1, -1, -1, 6, -1], dtype=np.intp) - tm.assertIsInstance(other, Int64Index) - self.assert_index_equal(res, eres) + assert isinstance(other, Int64Index) + tm.assert_index_equal(res, eres) tm.assert_numpy_array_equal(lidx, elidx) - self.assertIsNone(ridx) + assert ridx is None # non-unique idx = Index([1, 1, 2, 5]) @@ -823,7 +874,7 @@ def test_join_right(self): eres = Index([1, 1, 2, 5, 7, 9]) # 1 is in idx2, so it should be x2 elidx = np.array([0, 1, 2, 3, -1, -1], dtype=np.intp) eridx = np.array([0, 0, 1, 2, 3, 4], dtype=np.intp) - self.assert_index_equal(res, eres) + tm.assert_index_equal(res, eres) tm.assert_numpy_array_equal(lidx, elidx) tm.assert_numpy_array_equal(ridx, eridx) @@ -833,26 +884,26 @@ def test_join_non_int_index(self): outer = self.index.join(other, how='outer') outer2 = other.join(self.index, how='outer') expected = Index([0, 2, 3, 4, 6, 7, 8, 10, 12, 14, 16, 18]) - self.assert_index_equal(outer, outer2) - self.assert_index_equal(outer, expected) + tm.assert_index_equal(outer, outer2) + tm.assert_index_equal(outer, expected) inner = self.index.join(other, how='inner') inner2 = other.join(self.index, how='inner') expected = Index([6, 8, 10]) - self.assert_index_equal(inner, inner2) - self.assert_index_equal(inner, expected) + tm.assert_index_equal(inner, inner2) + tm.assert_index_equal(inner, expected) left = self.index.join(other, how='left') - self.assert_index_equal(left, self.index.astype(object)) + tm.assert_index_equal(left, self.index.astype(object)) left2 = other.join(self.index, how='left') - self.assert_index_equal(left2, other) + tm.assert_index_equal(left2, other) right = self.index.join(other, how='right') - self.assert_index_equal(right, other) + tm.assert_index_equal(right, other) right2 = other.join(self.index, how='right') - self.assert_index_equal(right2, self.index.astype(object)) + 
tm.assert_index_equal(right2, self.index.astype(object)) def test_join_outer(self): other = Int64Index([7, 12, 25, 1, 2, 5]) @@ -863,7 +914,7 @@ def test_join_outer(self): res, lidx, ridx = self.index.join(other, how='outer', return_indexers=True) noidx_res = self.index.join(other, how='outer') - self.assert_index_equal(res, noidx_res) + tm.assert_index_equal(res, noidx_res) eres = Int64Index([0, 1, 2, 4, 5, 6, 7, 8, 10, 12, 14, 16, 18, 25]) elidx = np.array([0, -1, 1, 2, -1, 3, -1, 4, 5, 6, 7, 8, 9, -1], @@ -871,8 +922,8 @@ def test_join_outer(self): eridx = np.array([-1, 3, 4, -1, 5, -1, 0, -1, -1, 1, -1, -1, -1, 2], dtype=np.intp) - tm.assertIsInstance(res, Int64Index) - self.assert_index_equal(res, eres) + assert isinstance(res, Int64Index) + tm.assert_index_equal(res, eres) tm.assert_numpy_array_equal(lidx, elidx) tm.assert_numpy_array_equal(ridx, eridx) @@ -880,25 +931,24 @@ def test_join_outer(self): res, lidx, ridx = self.index.join(other_mono, how='outer', return_indexers=True) noidx_res = self.index.join(other_mono, how='outer') - self.assert_index_equal(res, noidx_res) + tm.assert_index_equal(res, noidx_res) elidx = np.array([0, -1, 1, 2, -1, 3, -1, 4, 5, 6, 7, 8, 9, -1], dtype=np.intp) eridx = np.array([-1, 0, 1, -1, 2, -1, 3, -1, -1, 4, -1, -1, -1, 5], dtype=np.intp) - tm.assertIsInstance(res, Int64Index) - self.assert_index_equal(res, eres) + assert isinstance(res, Int64Index) + tm.assert_index_equal(res, eres) tm.assert_numpy_array_equal(lidx, elidx) tm.assert_numpy_array_equal(ridx, eridx) -class TestUInt64Index(NumericInt, tm.TestCase): +class TestUInt64Index(NumericInt): _dtype = 'uint64' _holder = UInt64Index - _multiprocess_can_split_ = True - def setUp(self): + def setup_method(self, method): self.indices = dict(index=UInt64Index([2**63, 2**63 + 10, 2**63 + 15, 2**63 + 20, 2**63 + 25])) self.setup_indices() @@ -974,8 +1024,8 @@ def test_join_inner(self): elidx = np.array([1, 4], dtype=np.intp) eridx = np.array([5, 2], dtype=np.intp) - tm.assertIsInstance(res, UInt64Index) - self.assert_index_equal(res, eres) + assert isinstance(res, UInt64Index) + tm.assert_index_equal(res, eres) tm.assert_numpy_array_equal(lidx, elidx) tm.assert_numpy_array_equal(ridx, eridx) @@ -984,13 +1034,13 @@ def test_join_inner(self): return_indexers=True) res2 = self.index.intersection(other_mono) - self.assert_index_equal(res, res2) + tm.assert_index_equal(res, res2) elidx = np.array([1, 4], dtype=np.intp) eridx = np.array([3, 5], dtype=np.intp) - tm.assertIsInstance(res, UInt64Index) - self.assert_index_equal(res, eres) + assert isinstance(res, UInt64Index) + tm.assert_index_equal(res, eres) tm.assert_numpy_array_equal(lidx, elidx) tm.assert_numpy_array_equal(ridx, eridx) @@ -1006,9 +1056,9 @@ def test_join_left(self): eres = self.index eridx = np.array([-1, 5, -1, -1, 2], dtype=np.intp) - tm.assertIsInstance(res, UInt64Index) - self.assert_index_equal(res, eres) - self.assertIsNone(lidx) + assert isinstance(res, UInt64Index) + tm.assert_index_equal(res, eres) + assert lidx is None tm.assert_numpy_array_equal(ridx, eridx) # monotonic @@ -1016,9 +1066,9 @@ def test_join_left(self): return_indexers=True) eridx = np.array([-1, 3, -1, -1, 5], dtype=np.intp) - tm.assertIsInstance(res, UInt64Index) - self.assert_index_equal(res, eres) - self.assertIsNone(lidx) + assert isinstance(res, UInt64Index) + tm.assert_index_equal(res, eres) + assert lidx is None tm.assert_numpy_array_equal(ridx, eridx) # non-unique @@ -1032,7 +1082,7 @@ def test_join_left(self): eridx = np.array([0, 1, 2, 3, -1, -1], 
dtype=np.intp) elidx = np.array([0, 0, 1, 2, 3, 4], dtype=np.intp) - self.assert_index_equal(res, eres) + tm.assert_index_equal(res, eres) tm.assert_numpy_array_equal(lidx, elidx) tm.assert_numpy_array_equal(ridx, eridx) @@ -1049,9 +1099,9 @@ def test_join_right(self): elidx = np.array([-1, -1, 4, -1, -1, 1], dtype=np.intp) tm.assert_numpy_array_equal(lidx, elidx) - tm.assertIsInstance(other, UInt64Index) - self.assert_index_equal(res, eres) - self.assertIsNone(ridx) + assert isinstance(other, UInt64Index) + tm.assert_index_equal(res, eres) + assert ridx is None # monotonic res, lidx, ridx = self.index.join(other_mono, how='right', @@ -1059,10 +1109,10 @@ def test_join_right(self): eres = other_mono elidx = np.array([-1, -1, -1, 1, -1, 4], dtype=np.intp) - tm.assertIsInstance(other, UInt64Index) + assert isinstance(other, UInt64Index) tm.assert_numpy_array_equal(lidx, elidx) - self.assert_index_equal(res, eres) - self.assertIsNone(ridx) + tm.assert_index_equal(res, eres) + assert ridx is None # non-unique idx = UInt64Index(2**63 + np.array([1, 1, 2, 5], dtype='uint64')) @@ -1075,7 +1125,7 @@ def test_join_right(self): elidx = np.array([0, 1, 2, 3, -1, -1], dtype=np.intp) eridx = np.array([0, 0, 1, 2, 3, 4], dtype=np.intp) - self.assert_index_equal(res, eres) + tm.assert_index_equal(res, eres) tm.assert_numpy_array_equal(lidx, elidx) tm.assert_numpy_array_equal(ridx, eridx) @@ -1087,26 +1137,26 @@ def test_join_non_int_index(self): outer2 = other.join(self.index, how='outer') expected = Index(2**63 + np.array( [0, 1, 5, 7, 10, 15, 20, 25], dtype='uint64')) - self.assert_index_equal(outer, outer2) - self.assert_index_equal(outer, expected) + tm.assert_index_equal(outer, outer2) + tm.assert_index_equal(outer, expected) inner = self.index.join(other, how='inner') inner2 = other.join(self.index, how='inner') expected = Index(2**63 + np.array([10, 20], dtype='uint64')) - self.assert_index_equal(inner, inner2) - self.assert_index_equal(inner, expected) + tm.assert_index_equal(inner, inner2) + tm.assert_index_equal(inner, expected) left = self.index.join(other, how='left') - self.assert_index_equal(left, self.index.astype(object)) + tm.assert_index_equal(left, self.index.astype(object)) left2 = other.join(self.index, how='left') - self.assert_index_equal(left2, other) + tm.assert_index_equal(left2, other) right = self.index.join(other, how='right') - self.assert_index_equal(right, other) + tm.assert_index_equal(right, other) right2 = other.join(self.index, how='right') - self.assert_index_equal(right2, self.index.astype(object)) + tm.assert_index_equal(right2, self.index.astype(object)) def test_join_outer(self): other = UInt64Index(2**63 + np.array( @@ -1119,15 +1169,15 @@ def test_join_outer(self): res, lidx, ridx = self.index.join(other, how='outer', return_indexers=True) noidx_res = self.index.join(other, how='outer') - self.assert_index_equal(res, noidx_res) + tm.assert_index_equal(res, noidx_res) eres = UInt64Index(2**63 + np.array( [0, 1, 2, 7, 10, 12, 15, 20, 25], dtype='uint64')) elidx = np.array([0, -1, -1, -1, 1, -1, 2, 3, 4], dtype=np.intp) eridx = np.array([-1, 3, 4, 0, 5, 1, -1, -1, 2], dtype=np.intp) - tm.assertIsInstance(res, UInt64Index) - self.assert_index_equal(res, eres) + assert isinstance(res, UInt64Index) + tm.assert_index_equal(res, eres) tm.assert_numpy_array_equal(lidx, elidx) tm.assert_numpy_array_equal(ridx, eridx) @@ -1135,17 +1185,12 @@ def test_join_outer(self): res, lidx, ridx = self.index.join(other_mono, how='outer', return_indexers=True) noidx_res = 
self.index.join(other_mono, how='outer') - self.assert_index_equal(res, noidx_res) + tm.assert_index_equal(res, noidx_res) elidx = np.array([0, -1, -1, -1, 1, -1, 2, 3, 4], dtype=np.intp) eridx = np.array([-1, 0, 1, 2, 3, 4, -1, -1, 5], dtype=np.intp) - tm.assertIsInstance(res, UInt64Index) - self.assert_index_equal(res, eres) + assert isinstance(res, UInt64Index) + tm.assert_index_equal(res, eres) tm.assert_numpy_array_equal(lidx, elidx) tm.assert_numpy_array_equal(ridx, eridx) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/indexes/test_range.py b/pandas/tests/indexes/test_range.py index 38e715fce2720..5ecf467b57fc5 100644 --- a/pandas/tests/indexes/test_range.py +++ b/pandas/tests/indexes/test_range.py @@ -1,5 +1,7 @@ # -*- coding: utf-8 -*- +import pytest + from datetime import datetime from itertools import combinations import operator @@ -8,8 +10,8 @@ import numpy as np -from pandas import (Series, Index, Float64Index, Int64Index, RangeIndex) -from pandas.util.testing import assertRaisesRegexp +from pandas import (notna, Series, Index, Float64Index, + Int64Index, RangeIndex) import pandas.util.testing as tm @@ -18,11 +20,11 @@ from .test_numeric import Numeric -class TestRangeIndex(Numeric, tm.TestCase): +class TestRangeIndex(Numeric): _holder = RangeIndex _compat_props = ['shape', 'ndim', 'size', 'itemsize'] - def setUp(self): + def setup_method(self, method): self.indices = dict(index=RangeIndex(0, 20, 2, name='foo')) self.setup_indices() @@ -62,105 +64,105 @@ def test_too_many_names(self): def testit(): self.index.names = ["roger", "harold"] - assertRaisesRegexp(ValueError, "^Length", testit) + tm.assert_raises_regex(ValueError, "^Length", testit) def test_constructor(self): index = RangeIndex(5) expected = np.arange(5, dtype=np.int64) - self.assertIsInstance(index, RangeIndex) - self.assertEqual(index._start, 0) - self.assertEqual(index._stop, 5) - self.assertEqual(index._step, 1) - self.assertEqual(index.name, None) + assert isinstance(index, RangeIndex) + assert index._start == 0 + assert index._stop == 5 + assert index._step == 1 + assert index.name is None tm.assert_index_equal(Index(expected), index) index = RangeIndex(1, 5) expected = np.arange(1, 5, dtype=np.int64) - self.assertIsInstance(index, RangeIndex) - self.assertEqual(index._start, 1) + assert isinstance(index, RangeIndex) + assert index._start == 1 tm.assert_index_equal(Index(expected), index) index = RangeIndex(1, 5, 2) expected = np.arange(1, 5, 2, dtype=np.int64) - self.assertIsInstance(index, RangeIndex) - self.assertEqual(index._step, 2) + assert isinstance(index, RangeIndex) + assert index._step == 2 tm.assert_index_equal(Index(expected), index) msg = "RangeIndex\\(\\.\\.\\.\\) must be called with integers" - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): RangeIndex() for index in [RangeIndex(0), RangeIndex(start=0), RangeIndex(stop=0), RangeIndex(0, 0)]: expected = np.empty(0, dtype=np.int64) - self.assertIsInstance(index, RangeIndex) - self.assertEqual(index._start, 0) - self.assertEqual(index._stop, 0) - self.assertEqual(index._step, 1) + assert isinstance(index, RangeIndex) + assert index._start == 0 + assert index._stop == 0 + assert index._step == 1 tm.assert_index_equal(Index(expected), index) - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): RangeIndex(name='Foo') for index in [RangeIndex(0, name='Foo'), 
RangeIndex(start=0, name='Foo'), RangeIndex(stop=0, name='Foo'), RangeIndex(0, 0, name='Foo')]: - self.assertIsInstance(index, RangeIndex) - self.assertEqual(index.name, 'Foo') + assert isinstance(index, RangeIndex) + assert index.name == 'Foo' # we don't allow on a bare Index - self.assertRaises(TypeError, lambda: Index(0, 1000)) + pytest.raises(TypeError, lambda: Index(0, 1000)) # invalid args for i in [Index(['a', 'b']), Series(['a', 'b']), np.array(['a', 'b']), [], 'foo', datetime(2000, 1, 1, 0, 0), np.arange(0, 10), np.array([1]), [1]]: - self.assertRaises(TypeError, lambda: RangeIndex(i)) + pytest.raises(TypeError, lambda: RangeIndex(i)) def test_constructor_same(self): # pass thru w and w/o copy index = RangeIndex(1, 5, 2) result = RangeIndex(index, copy=False) - self.assertTrue(result.identical(index)) + assert result.identical(index) result = RangeIndex(index, copy=True) - self.assert_index_equal(result, index, exact=True) + tm.assert_index_equal(result, index, exact=True) result = RangeIndex(index) - self.assert_index_equal(result, index, exact=True) + tm.assert_index_equal(result, index, exact=True) - self.assertRaises(TypeError, - lambda: RangeIndex(index, dtype='float64')) + pytest.raises(TypeError, + lambda: RangeIndex(index, dtype='float64')) def test_constructor_range(self): - self.assertRaises(TypeError, lambda: RangeIndex(range(1, 5, 2))) + pytest.raises(TypeError, lambda: RangeIndex(range(1, 5, 2))) result = RangeIndex.from_range(range(1, 5, 2)) expected = RangeIndex(1, 5, 2) - self.assert_index_equal(result, expected, exact=True) + tm.assert_index_equal(result, expected, exact=True) result = RangeIndex.from_range(range(5, 6)) expected = RangeIndex(5, 6, 1) - self.assert_index_equal(result, expected, exact=True) + tm.assert_index_equal(result, expected, exact=True) # an invalid range result = RangeIndex.from_range(range(5, 1)) expected = RangeIndex(0, 0, 1) - self.assert_index_equal(result, expected, exact=True) + tm.assert_index_equal(result, expected, exact=True) result = RangeIndex.from_range(range(5)) expected = RangeIndex(0, 5, 1) - self.assert_index_equal(result, expected, exact=True) + tm.assert_index_equal(result, expected, exact=True) result = Index(range(1, 5, 2)) expected = RangeIndex(1, 5, 2) - self.assert_index_equal(result, expected, exact=True) + tm.assert_index_equal(result, expected, exact=True) - self.assertRaises(TypeError, - lambda: Index(range(1, 5, 2), dtype='float64')) + pytest.raises(TypeError, + lambda: Index(range(1, 5, 2), dtype='float64')) def test_constructor_name(self): # GH12288 @@ -170,16 +172,16 @@ def test_constructor_name(self): copy = RangeIndex(orig) copy.name = 'copy' - self.assertTrue(orig.name, 'original') - self.assertTrue(copy.name, 'copy') + assert orig.name == 'original' + assert copy.name == 'copy' new = Index(copy) - self.assertTrue(new.name, 'copy') + assert new.name == 'copy' new.name = 'new' - self.assertTrue(orig.name, 'original') - self.assertTrue(new.name, 'copy') - self.assertTrue(new.name, 'new') + assert orig.name == 'original' + assert copy.name == 'copy' + assert new.name == 'new' def test_numeric_compat2(self): # validate that we are handling the RangeIndex overrides to numeric ops @@ -189,15 +191,15 @@ def test_numeric_compat2(self): result = idx * 2 expected = RangeIndex(0, 20, 4) - self.assert_index_equal(result, expected, exact=True) + tm.assert_index_equal(result, expected, exact=True) result = idx + 2 expected = RangeIndex(2, 12, 2) - self.assert_index_equal(result, expected, exact=True) + 
tm.assert_index_equal(result, expected, exact=True) result = idx - 2 expected = RangeIndex(-2, 8, 2) - self.assert_index_equal(result, expected, exact=True) + tm.assert_index_equal(result, expected, exact=True) # truediv under PY3 result = idx / 2 @@ -206,11 +208,11 @@ def test_numeric_compat2(self): expected = RangeIndex(0, 5, 1).astype('float64') else: expected = RangeIndex(0, 5, 1) - self.assert_index_equal(result, expected, exact=True) + tm.assert_index_equal(result, expected, exact=True) result = idx / 4 expected = RangeIndex(0, 10, 2) / 4 - self.assert_index_equal(result, expected, exact=True) + tm.assert_index_equal(result, expected, exact=True) result = idx // 1 expected = idx @@ -244,25 +246,25 @@ def test_numeric_compat2(self): def test_constructor_corner(self): arr = np.array([1, 2, 3, 4], dtype=object) index = RangeIndex(1, 5) - self.assertEqual(index.values.dtype, np.int64) - self.assert_index_equal(index, Index(arr)) + assert index.values.dtype == np.int64 + tm.assert_index_equal(index, Index(arr)) # non-int raise Exception - self.assertRaises(TypeError, RangeIndex, '1', '10', '1') - self.assertRaises(TypeError, RangeIndex, 1.1, 10.2, 1.3) + pytest.raises(TypeError, RangeIndex, '1', '10', '1') + pytest.raises(TypeError, RangeIndex, 1.1, 10.2, 1.3) # invalid passed type - self.assertRaises(TypeError, lambda: RangeIndex(1, 5, dtype='float64')) + pytest.raises(TypeError, lambda: RangeIndex(1, 5, dtype='float64')) def test_copy(self): i = RangeIndex(5, name='Foo') i_copy = i.copy() - self.assertTrue(i_copy is not i) - self.assertTrue(i_copy.identical(i)) - self.assertEqual(i_copy._start, 0) - self.assertEqual(i_copy._stop, 5) - self.assertEqual(i_copy._step, 1) - self.assertEqual(i_copy.name, 'Foo') + assert i_copy is not i + assert i_copy.identical(i) + assert i_copy._start == 0 + assert i_copy._stop == 5 + assert i_copy._step == 1 + assert i_copy.name == 'Foo' def test_repr(self): i = RangeIndex(5, name='Foo') @@ -271,18 +273,18 @@ def test_repr(self): expected = "RangeIndex(start=0, stop=5, step=1, name='Foo')" else: expected = "RangeIndex(start=0, stop=5, step=1, name=u'Foo')" - self.assertTrue(result, expected) + assert result == expected result = eval(result) - self.assert_index_equal(result, i, exact=True) + tm.assert_index_equal(result, i, exact=True) i = RangeIndex(5, 0, -1) result = repr(i) expected = "RangeIndex(start=5, stop=0, step=-1)" - self.assertEqual(result, expected) + assert result == expected result = eval(result) - self.assert_index_equal(result, i, exact=True) + tm.assert_index_equal(result, i, exact=True) def test_insert(self): @@ -290,22 +292,22 @@ def test_insert(self): result = idx[1:4] # test 0th element - self.assert_index_equal(idx[0:4], result.insert(0, idx[0])) + tm.assert_index_equal(idx[0:4], result.insert(0, idx[0])) def test_delete(self): idx = RangeIndex(5, name='Foo') expected = idx[1:].astype(int) result = idx.delete(0) - self.assert_index_equal(result, expected) - self.assertEqual(result.name, expected.name) + tm.assert_index_equal(result, expected) + assert result.name == expected.name expected = idx[:-1].astype(int) result = idx.delete(-1) - self.assert_index_equal(result, expected) - self.assertEqual(result.name, expected.name) + tm.assert_index_equal(result, expected) + assert result.name == expected.name - with tm.assertRaises((IndexError, ValueError)): + with pytest.raises((IndexError, ValueError)): # either depending on numpy version result = idx.delete(len(idx)) @@ -314,7 +316,7 @@ def test_view(self): i = RangeIndex(0, name='Foo') 
i_view = i.view() - self.assertEqual(i_view.name, 'Foo') + assert i_view.name == 'Foo' i_view = i.view('i8') tm.assert_numpy_array_equal(i.values, i_view) @@ -323,31 +325,41 @@ def test_view(self): tm.assert_index_equal(i, i_view) def test_dtype(self): - self.assertEqual(self.index.dtype, np.int64) + assert self.index.dtype == np.int64 def test_is_monotonic(self): - self.assertTrue(self.index.is_monotonic) - self.assertTrue(self.index.is_monotonic_increasing) - self.assertFalse(self.index.is_monotonic_decreasing) + assert self.index.is_monotonic + assert self.index.is_monotonic_increasing + assert not self.index.is_monotonic_decreasing + assert self.index._is_strictly_monotonic_increasing + assert not self.index._is_strictly_monotonic_decreasing index = RangeIndex(4, 0, -1) - self.assertFalse(index.is_monotonic) - self.assertTrue(index.is_monotonic_decreasing) + assert not index.is_monotonic + assert not index._is_strictly_monotonic_increasing + assert index.is_monotonic_decreasing + assert index._is_strictly_monotonic_decreasing index = RangeIndex(1, 2) - self.assertTrue(index.is_monotonic) - self.assertTrue(index.is_monotonic_increasing) - self.assertTrue(index.is_monotonic_decreasing) + assert index.is_monotonic + assert index.is_monotonic_increasing + assert index.is_monotonic_decreasing + assert index._is_strictly_monotonic_increasing + assert index._is_strictly_monotonic_decreasing index = RangeIndex(2, 1) - self.assertTrue(index.is_monotonic) - self.assertTrue(index.is_monotonic_increasing) - self.assertTrue(index.is_monotonic_decreasing) + assert index.is_monotonic + assert index.is_monotonic_increasing + assert index.is_monotonic_decreasing + assert index._is_strictly_monotonic_increasing + assert index._is_strictly_monotonic_decreasing index = RangeIndex(1, 1) - self.assertTrue(index.is_monotonic) - self.assertTrue(index.is_monotonic_increasing) - self.assertTrue(index.is_monotonic_decreasing) + assert index.is_monotonic + assert index.is_monotonic_increasing + assert index.is_monotonic_decreasing + assert index._is_strictly_monotonic_increasing + assert index._is_strictly_monotonic_decreasing def test_equals_range(self): equiv_pairs = [(RangeIndex(0, 9, 2), RangeIndex(0, 10, 2)), @@ -355,54 +367,53 @@ def test_equals_range(self): (RangeIndex(1, 2, 3), RangeIndex(1, 3, 4)), (RangeIndex(0, -9, -2), RangeIndex(0, -10, -2))] for left, right in equiv_pairs: - self.assertTrue(left.equals(right)) - self.assertTrue(right.equals(left)) + assert left.equals(right) + assert right.equals(left) def test_logical_compat(self): idx = self.create_index() - self.assertEqual(idx.all(), idx.values.all()) - self.assertEqual(idx.any(), idx.values.any()) + assert idx.all() == idx.values.all() + assert idx.any() == idx.values.any() def test_identical(self): i = Index(self.index.copy()) - self.assertTrue(i.identical(self.index)) + assert i.identical(self.index) # we don't allow object dtype for RangeIndex if isinstance(self.index, RangeIndex): return same_values_different_type = Index(i, dtype=object) - self.assertFalse(i.identical(same_values_different_type)) + assert not i.identical(same_values_different_type) i = self.index.copy(dtype=object) i = i.rename('foo') same_values = Index(i, dtype=object) - self.assertTrue(same_values.identical(self.index.copy(dtype=object))) + assert same_values.identical(self.index.copy(dtype=object)) - self.assertFalse(i.identical(self.index)) - self.assertTrue(Index(same_values, name='foo', dtype=object).identical( - i)) + assert not i.identical(self.index) + assert 
Index(same_values, name='foo', dtype=object).identical(i) - self.assertFalse(self.index.copy(dtype=object) - .identical(self.index.copy(dtype='int64'))) + assert not self.index.copy(dtype=object).identical( + self.index.copy(dtype='int64')) def test_get_indexer(self): target = RangeIndex(10) indexer = self.index.get_indexer(target) expected = np.array([0, -1, 1, -1, 2, -1, 3, -1, 4, -1], dtype=np.intp) - self.assert_numpy_array_equal(indexer, expected) + tm.assert_numpy_array_equal(indexer, expected) def test_get_indexer_pad(self): target = RangeIndex(10) indexer = self.index.get_indexer(target, method='pad') expected = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4], dtype=np.intp) - self.assert_numpy_array_equal(indexer, expected) + tm.assert_numpy_array_equal(indexer, expected) def test_get_indexer_backfill(self): target = RangeIndex(10) indexer = self.index.get_indexer(target, method='backfill') expected = np.array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5], dtype=np.intp) - self.assert_numpy_array_equal(indexer, expected) + tm.assert_numpy_array_equal(indexer, expected) def test_join_outer(self): # join with Int64Index @@ -411,7 +422,7 @@ def test_join_outer(self): res, lidx, ridx = self.index.join(other, how='outer', return_indexers=True) noidx_res = self.index.join(other, how='outer') - self.assert_index_equal(res, noidx_res) + tm.assert_index_equal(res, noidx_res) eres = Int64Index([0, 2, 4, 6, 8, 10, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]) @@ -420,11 +431,11 @@ def test_join_outer(self): eridx = np.array([-1, -1, -1, -1, -1, -1, -1, -1, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0], dtype=np.intp) - self.assertIsInstance(res, Int64Index) - self.assertFalse(isinstance(res, RangeIndex)) - self.assert_index_equal(res, eres) - self.assert_numpy_array_equal(lidx, elidx) - self.assert_numpy_array_equal(ridx, eridx) + assert isinstance(res, Int64Index) + assert not isinstance(res, RangeIndex) + tm.assert_index_equal(res, eres) + tm.assert_numpy_array_equal(lidx, elidx) + tm.assert_numpy_array_equal(ridx, eridx) # join with RangeIndex other = RangeIndex(25, 14, -1) @@ -432,13 +443,13 @@ def test_join_outer(self): res, lidx, ridx = self.index.join(other, how='outer', return_indexers=True) noidx_res = self.index.join(other, how='outer') - self.assert_index_equal(res, noidx_res) + tm.assert_index_equal(res, noidx_res) - self.assertIsInstance(res, Int64Index) - self.assertFalse(isinstance(res, RangeIndex)) - self.assert_index_equal(res, eres) - self.assert_numpy_array_equal(lidx, elidx) - self.assert_numpy_array_equal(ridx, eridx) + assert isinstance(res, Int64Index) + assert not isinstance(res, RangeIndex) + tm.assert_index_equal(res, eres) + tm.assert_numpy_array_equal(lidx, elidx) + tm.assert_numpy_array_equal(ridx, eridx) def test_join_inner(self): # Join with non-RangeIndex @@ -457,10 +468,10 @@ def test_join_inner(self): elidx = np.array([8, 9], dtype=np.intp) eridx = np.array([9, 7], dtype=np.intp) - self.assertIsInstance(res, Int64Index) - self.assert_index_equal(res, eres) - self.assert_numpy_array_equal(lidx, elidx) - self.assert_numpy_array_equal(ridx, eridx) + assert isinstance(res, Int64Index) + tm.assert_index_equal(res, eres) + tm.assert_numpy_array_equal(lidx, elidx) + tm.assert_numpy_array_equal(ridx, eridx) # Join two RangeIndex other = RangeIndex(25, 14, -1) @@ -468,10 +479,10 @@ def test_join_inner(self): res, lidx, ridx = self.index.join(other, how='inner', return_indexers=True) - self.assertIsInstance(res, RangeIndex) - self.assert_index_equal(res, eres) - 
self.assert_numpy_array_equal(lidx, elidx) - self.assert_numpy_array_equal(ridx, eridx) + assert isinstance(res, RangeIndex) + tm.assert_index_equal(res, eres) + tm.assert_numpy_array_equal(lidx, elidx) + tm.assert_numpy_array_equal(ridx, eridx) def test_join_left(self): # Join with Int64Index @@ -482,10 +493,10 @@ def test_join_left(self): eres = self.index eridx = np.array([-1, -1, -1, -1, -1, -1, -1, -1, 9, 7], dtype=np.intp) - self.assertIsInstance(res, RangeIndex) - self.assert_index_equal(res, eres) - self.assertIsNone(lidx) - self.assert_numpy_array_equal(ridx, eridx) + assert isinstance(res, RangeIndex) + tm.assert_index_equal(res, eres) + assert lidx is None + tm.assert_numpy_array_equal(ridx, eridx) # Join with RangeIndex other = Int64Index(np.arange(25, 14, -1)) @@ -493,10 +504,10 @@ def test_join_left(self): res, lidx, ridx = self.index.join(other, how='left', return_indexers=True) - self.assertIsInstance(res, RangeIndex) - self.assert_index_equal(res, eres) - self.assertIsNone(lidx) - self.assert_numpy_array_equal(ridx, eridx) + assert isinstance(res, RangeIndex) + tm.assert_index_equal(res, eres) + assert lidx is None + tm.assert_numpy_array_equal(ridx, eridx) def test_join_right(self): # Join with Int64Index @@ -508,10 +519,10 @@ def test_join_right(self): elidx = np.array([-1, -1, -1, -1, -1, -1, -1, 9, -1, 8, -1], dtype=np.intp) - self.assertIsInstance(other, Int64Index) - self.assert_index_equal(res, eres) - self.assert_numpy_array_equal(lidx, elidx) - self.assertIsNone(ridx) + assert isinstance(other, Int64Index) + tm.assert_index_equal(res, eres) + tm.assert_numpy_array_equal(lidx, elidx) + assert ridx is None # Join with RangeIndex other = RangeIndex(25, 14, -1) @@ -520,10 +531,10 @@ def test_join_right(self): return_indexers=True) eres = other - self.assertIsInstance(other, RangeIndex) - self.assert_index_equal(res, eres) - self.assert_numpy_array_equal(lidx, elidx) - self.assertIsNone(ridx) + assert isinstance(other, RangeIndex) + tm.assert_index_equal(res, eres) + tm.assert_numpy_array_equal(lidx, elidx) + assert ridx is None def test_join_non_int_index(self): other = Index([3, 6, 7, 8, 10], dtype=object) @@ -531,26 +542,26 @@ def test_join_non_int_index(self): outer = self.index.join(other, how='outer') outer2 = other.join(self.index, how='outer') expected = Index([0, 2, 3, 4, 6, 7, 8, 10, 12, 14, 16, 18]) - self.assert_index_equal(outer, outer2) - self.assert_index_equal(outer, expected) + tm.assert_index_equal(outer, outer2) + tm.assert_index_equal(outer, expected) inner = self.index.join(other, how='inner') inner2 = other.join(self.index, how='inner') expected = Index([6, 8, 10]) - self.assert_index_equal(inner, inner2) - self.assert_index_equal(inner, expected) + tm.assert_index_equal(inner, inner2) + tm.assert_index_equal(inner, expected) left = self.index.join(other, how='left') - self.assert_index_equal(left, self.index.astype(object)) + tm.assert_index_equal(left, self.index.astype(object)) left2 = other.join(self.index, how='left') - self.assert_index_equal(left2, other) + tm.assert_index_equal(left2, other) right = self.index.join(other, how='right') - self.assert_index_equal(right, other) + tm.assert_index_equal(right, other) right2 = other.join(self.index, how='right') - self.assert_index_equal(right2, self.index.astype(object)) + tm.assert_index_equal(right2, self.index.astype(object)) def test_join_non_unique(self): other = Index([4, 4, 3, 3]) @@ -562,15 +573,15 @@ def test_join_non_unique(self): eridx = np.array([-1, -1, 0, 1, -1, -1, -1, -1, -1, -1,
-1], dtype=np.intp) - self.assert_index_equal(res, eres) - self.assert_numpy_array_equal(lidx, elidx) - self.assert_numpy_array_equal(ridx, eridx) + tm.assert_index_equal(res, eres) + tm.assert_numpy_array_equal(lidx, elidx) + tm.assert_numpy_array_equal(ridx, eridx) def test_join_self(self): kinds = 'outer', 'inner', 'left', 'right' for kind in kinds: joined = self.index.join(self.index, how=kind) - self.assertIs(self.index, joined) + assert self.index is joined def test_intersection(self): # intersect with Int64Index @@ -578,26 +589,26 @@ def test_intersection(self): result = self.index.intersection(other) expected = Index(np.sort(np.intersect1d(self.index.values, other.values))) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) result = other.intersection(self.index) expected = Index(np.sort(np.asarray(np.intersect1d(self.index.values, other.values)))) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) # intersect with increasing RangeIndex other = RangeIndex(1, 6) result = self.index.intersection(other) expected = Index(np.sort(np.intersect1d(self.index.values, other.values))) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) # intersect with decreasing RangeIndex other = RangeIndex(5, 0, -1) result = self.index.intersection(other) expected = Index(np.sort(np.intersect1d(self.index.values, other.values))) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) index = RangeIndex(5) @@ -605,37 +616,28 @@ def test_intersection(self): other = RangeIndex(5, 10, 1) result = index.intersection(other) expected = RangeIndex(0, 0, 1) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) other = RangeIndex(-1, -5, -1) result = index.intersection(other) expected = RangeIndex(0, 0, 1) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) # intersection of empty indices other = RangeIndex(0, 0, 1) result = index.intersection(other) expected = RangeIndex(0, 0, 1) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) result = other.intersection(index) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) # intersection of non-overlapping values based on start value and gcd index = RangeIndex(1, 10, 2) other = RangeIndex(0, 10, 4) result = index.intersection(other) expected = RangeIndex(0, 0, 1) - self.assert_index_equal(result, expected) - - def test_intersect_str_dates(self): - dt_dates = [datetime(2012, 2, 9), datetime(2012, 2, 22)] - - i1 = Index(dt_dates, dtype=object) - i2 = Index(['aa'], dtype=object) - res = i2.intersection(i1) - - self.assertEqual(len(res), 0) + tm.assert_index_equal(result, expected) def test_union_noncomparable(self): from datetime import datetime, timedelta @@ -644,11 +646,11 @@ def test_union_noncomparable(self): other = Index([now + timedelta(i) for i in range(4)], dtype=object) result = self.index.union(other) expected = Index(np.concatenate((self.index, other))) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) result = other.union(self.index) expected = Index(np.concatenate((other, self.index))) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) def test_union(self): RI = RangeIndex @@ -687,30 +689,30 @@ def test_nbytes(self): # memory savings vs int index i = RangeIndex(0, 1000) - self.assertTrue(i.nbytes < i.astype(int).nbytes / 10) + 
assert i.nbytes < i.astype(int).nbytes / 10 # constant memory usage i2 = RangeIndex(0, 10) - self.assertEqual(i.nbytes, i2.nbytes) + assert i.nbytes == i2.nbytes def test_cant_or_shouldnt_cast(self): # can't - self.assertRaises(TypeError, RangeIndex, 'foo', 'bar', 'baz') + pytest.raises(TypeError, RangeIndex, 'foo', 'bar', 'baz') # shouldn't - self.assertRaises(TypeError, RangeIndex, '0', '1', '2') + pytest.raises(TypeError, RangeIndex, '0', '1', '2') def test_view_Index(self): self.index.view(Index) def test_prevent_casting(self): result = self.index.astype('O') - self.assertEqual(result.dtype, np.object_) + assert result.dtype == np.object_ def test_take_preserve_name(self): index = RangeIndex(1, 5, name='foo') taken = index.take([3, 0, 1]) - self.assertEqual(index.name, taken.name) + assert index.name == taken.name def test_take_fill_value(self): # GH 12631 @@ -721,7 +723,7 @@ def test_take_fill_value(self): # fill_value msg = "Unable to fill values because RangeIndex cannot contain NA" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -1]), fill_value=True) # allow_fill=False @@ -731,12 +733,12 @@ def test_take_fill_value(self): tm.assert_index_equal(result, expected) msg = "Unable to fill values because RangeIndex cannot contain NA" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -2]), fill_value=True) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): idx.take(np.array([1, 0, -5]), fill_value=True) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): idx.take(np.array([1, -5])) def test_print_unicode_columns(self): @@ -750,7 +752,7 @@ def test_repr_roundtrip(self): def test_slice_keep_name(self): idx = RangeIndex(1, 2, name='asdf') - self.assertEqual(idx.name, idx[1:].name) + assert idx.name == idx[1:].name def test_explicit_conversions(self): @@ -782,8 +784,8 @@ def test_duplicates(self): if not len(ind): continue idx = self.indices[ind] - self.assertTrue(idx.is_unique) - self.assertFalse(idx.has_duplicates) + assert idx.is_unique + assert not idx.has_duplicates def test_ufunc_compat(self): idx = RangeIndex(5) @@ -793,48 +795,48 @@ def test_ufunc_compat(self): def test_extended_gcd(self): result = self.index._extended_gcd(6, 10) - self.assertEqual(result[0], result[1] * 6 + result[2] * 10) - self.assertEqual(2, result[0]) + assert result[0] == result[1] * 6 + result[2] * 10 + assert 2 == result[0] result = self.index._extended_gcd(10, 6) - self.assertEqual(2, result[1] * 10 + result[2] * 6) - self.assertEqual(2, result[0]) + assert 2 == result[1] * 10 + result[2] * 6 + assert 2 == result[0] def test_min_fitting_element(self): result = RangeIndex(0, 20, 2)._min_fitting_element(1) - self.assertEqual(2, result) + assert 2 == result result = RangeIndex(1, 6)._min_fitting_element(1) - self.assertEqual(1, result) + assert 1 == result result = RangeIndex(18, -2, -2)._min_fitting_element(1) - self.assertEqual(2, result) + assert 2 == result result = RangeIndex(5, 0, -1)._min_fitting_element(1) - self.assertEqual(1, result) + assert 1 == result big_num = 500000000000000000000000 result = RangeIndex(5, big_num * 2, 1)._min_fitting_element(big_num) - self.assertEqual(big_num, result) + assert big_num == result def test_max_fitting_element(self): result = RangeIndex(0, 20, 2)._max_fitting_element(17) - self.assertEqual(16, result) + assert 16 == result result = RangeIndex(1, 
6)._max_fitting_element(4) - self.assertEqual(4, result) + assert 4 == result result = RangeIndex(18, -2, -2)._max_fitting_element(17) - self.assertEqual(16, result) + assert 16 == result result = RangeIndex(5, 0, -1)._max_fitting_element(4) - self.assertEqual(4, result) + assert 4 == result big_num = 500000000000000000000000 result = RangeIndex(5, big_num * 2, 1)._max_fitting_element(big_num) - self.assertEqual(big_num, result) + assert big_num == result def test_pickle_compat_construction(self): # RangeIndex() is a valid constructor @@ -845,53 +847,53 @@ def test_slice_specialised(self): # scalar indexing res = self.index[1] expected = 2 - self.assertEqual(res, expected) + assert res == expected res = self.index[-1] expected = 18 - self.assertEqual(res, expected) + assert res == expected # slicing # slice value completion index = self.index[:] expected = self.index - self.assert_index_equal(index, expected) + tm.assert_index_equal(index, expected) # positive slice values index = self.index[7:10:2] expected = Index(np.array([14, 18]), name='foo') - self.assert_index_equal(index, expected) + tm.assert_index_equal(index, expected) # negative slice values index = self.index[-1:-5:-2] expected = Index(np.array([18, 14]), name='foo') - self.assert_index_equal(index, expected) + tm.assert_index_equal(index, expected) # stop overshoot index = self.index[2:100:4] expected = Index(np.array([4, 12]), name='foo') - self.assert_index_equal(index, expected) + tm.assert_index_equal(index, expected) # reverse index = self.index[::-1] expected = Index(self.index.values[::-1], name='foo') - self.assert_index_equal(index, expected) + tm.assert_index_equal(index, expected) index = self.index[-8::-1] expected = Index(np.array([4, 2, 0]), name='foo') - self.assert_index_equal(index, expected) + tm.assert_index_equal(index, expected) index = self.index[-40::-1] expected = Index(np.array([], dtype=np.int64), name='foo') - self.assert_index_equal(index, expected) + tm.assert_index_equal(index, expected) index = self.index[40::-1] expected = Index(self.index.values[40::-1], name='foo') - self.assert_index_equal(index, expected) + tm.assert_index_equal(index, expected) index = self.index[10::-1] expected = Index(self.index.values[::-1], name='foo') - self.assert_index_equal(index, expected) + tm.assert_index_equal(index, expected) def test_len_specialised(self): @@ -902,16 +904,77 @@ def test_len_specialised(self): arr = np.arange(0, 5, step) i = RangeIndex(0, 5, step) - self.assertEqual(len(i), len(arr)) + assert len(i) == len(arr) i = RangeIndex(5, 0, step) - self.assertEqual(len(i), 0) + assert len(i) == 0 for step in np.arange(-6, -1, 1): arr = np.arange(5, 0, step) i = RangeIndex(5, 0, step) - self.assertEqual(len(i), len(arr)) + assert len(i) == len(arr) i = RangeIndex(0, 5, step) - self.assertEqual(len(i), 0) + assert len(i) == 0 + + def test_where(self): + i = self.create_index() + result = i.where(notna(i)) + expected = i + tm.assert_index_equal(result, expected) + + _nan = i._na_value + cond = [False] + [True] * len(i[1:]) + expected = pd.Index([_nan] + i[1:].tolist()) + + result = i.where(cond) + tm.assert_index_equal(result, expected) + + def test_where_array_like(self): + i = self.create_index() + + _nan = i._na_value + cond = [False] + [True] * (len(i) - 1) + klasses = [list, tuple, np.array, pd.Series] + expected = pd.Index([_nan] + i[1:].tolist()) + + for klass in klasses: + result = i.where(klass(cond)) + tm.assert_index_equal(result, expected) + + def test_append(self): + # GH16212 + RI = 
RangeIndex + I64 = Int64Index + F64 = Float64Index + OI = Index + cases = [([RI(1, 12, 5)], RI(1, 12, 5)), + ([RI(0, 6, 4)], RI(0, 6, 4)), + ([RI(1, 3), RI(3, 7)], RI(1, 7)), + ([RI(1, 5, 2), RI(5, 6)], RI(1, 6, 2)), + ([RI(1, 3, 2), RI(4, 7, 3)], RI(1, 7, 3)), + ([RI(-4, 3, 2), RI(4, 7, 2)], RI(-4, 7, 2)), + ([RI(-4, -8), RI(-8, -12)], RI(-8, -12)), + ([RI(-4, -8), RI(3, -4)], RI(3, -8)), + ([RI(-4, -8), RI(3, 5)], RI(3, 5)), + ([RI(-4, -2), RI(3, 5)], I64([-4, -3, 3, 4])), + ([RI(-2,), RI(3, 5)], RI(3, 5)), + ([RI(2,), RI(2)], I64([0, 1, 0, 1])), + ([RI(2,), RI(2, 5), RI(5, 8, 4)], RI(0, 6)), + ([RI(2,), RI(3, 5), RI(5, 8, 4)], I64([0, 1, 3, 4, 5])), + ([RI(-2, 2), RI(2, 5), RI(5, 8, 4)], RI(-2, 6)), + ([RI(3,), I64([-1, 3, 15])], I64([0, 1, 2, -1, 3, 15])), + ([RI(3,), F64([-1, 3.1, 15.])], F64([0, 1, 2, -1, 3.1, 15.])), + ([RI(3,), OI(['a', None, 14])], OI([0, 1, 2, 'a', None, 14])), + ([RI(3, 1), OI(['a', None, 14])], OI(['a', None, 14])) + ] + + for indices, expected in cases: + result = indices[0].append(indices[1:]) + tm.assert_index_equal(result, expected, exact=True) + + if len(indices) == 2: + # Append single item rather than list + result2 = indices[0].append(indices[1]) + tm.assert_index_equal(result2, expected, exact=True) diff --git a/pandas/tseries/tests/__init__.py b/pandas/tests/indexes/timedeltas/__init__.py similarity index 100% rename from pandas/tseries/tests/__init__.py rename to pandas/tests/indexes/timedeltas/__init__.py diff --git a/pandas/tests/indexes/timedeltas/test_astype.py b/pandas/tests/indexes/timedeltas/test_astype.py new file mode 100644 index 0000000000000..586b96f980f8f --- /dev/null +++ b/pandas/tests/indexes/timedeltas/test_astype.py @@ -0,0 +1,123 @@ +import pytest + +import numpy as np + +import pandas as pd +import pandas.util.testing as tm +from pandas import (TimedeltaIndex, timedelta_range, Int64Index, Float64Index, + Index, Timedelta, Series) + +from ..datetimelike import DatetimeLike + + +class TestTimedeltaIndex(DatetimeLike): + _holder = TimedeltaIndex + _multiprocess_can_split_ = True + + def setup_method(self, method): + self.indices = dict(index=tm.makeTimedeltaIndex(10)) + self.setup_indices() + + def create_index(self): + return pd.to_timedelta(range(5), unit='d') + pd.offsets.Hour(1) + + def test_astype(self): + # GH 13149, GH 13209 + idx = TimedeltaIndex([1e14, 'NaT', pd.NaT, np.NaN]) + + result = idx.astype(object) + expected = Index([Timedelta('1 days 03:46:40')] + [pd.NaT] * 3, + dtype=object) + tm.assert_index_equal(result, expected) + + result = idx.astype(int) + expected = Int64Index([100000000000000] + [-9223372036854775808] * 3, + dtype=np.int64) + tm.assert_index_equal(result, expected) + + rng = timedelta_range('1 days', periods=10) + + result = rng.astype('i8') + tm.assert_index_equal(result, Index(rng.asi8)) + tm.assert_numpy_array_equal(rng.asi8, result.values) + + def test_astype_timedelta64(self): + # GH 13149, GH 13209 + idx = TimedeltaIndex([1e14, 'NaT', pd.NaT, np.NaN]) + + result = idx.astype('timedelta64') + expected = Float64Index([1e+14] + [np.NaN] * 3, dtype='float64') + tm.assert_index_equal(result, expected) + + result = idx.astype('timedelta64[ns]') + tm.assert_index_equal(result, idx) + assert result is not idx + + result = idx.astype('timedelta64[ns]', copy=False) + tm.assert_index_equal(result, idx) + assert result is idx + + def test_astype_raises(self): + # GH 13149, GH 13209 + idx = TimedeltaIndex([1e14, 'NaT', pd.NaT, np.NaN]) + + pytest.raises(ValueError, idx.astype, float) + 
pytest.raises(ValueError, idx.astype, str) + pytest.raises(ValueError, idx.astype, 'datetime64') + pytest.raises(ValueError, idx.astype, 'datetime64[ns]') + + def test_pickle_compat_construction(self): + pass + + def test_shift(self): + # test shift for TimedeltaIndex + # err8083 + + drange = self.create_index() + result = drange.shift(1) + expected = TimedeltaIndex(['1 days 01:00:00', '2 days 01:00:00', + '3 days 01:00:00', + '4 days 01:00:00', '5 days 01:00:00'], + freq='D') + tm.assert_index_equal(result, expected) + + result = drange.shift(3, freq='2D 1s') + expected = TimedeltaIndex(['6 days 01:00:03', '7 days 01:00:03', + '8 days 01:00:03', '9 days 01:00:03', + '10 days 01:00:03'], freq='D') + tm.assert_index_equal(result, expected) + + def test_numeric_compat(self): + + idx = self._holder(np.arange(5, dtype='int64')) + didx = self._holder(np.arange(5, dtype='int64') ** 2) + result = idx * 1 + tm.assert_index_equal(result, idx) + + result = 1 * idx + tm.assert_index_equal(result, idx) + + result = idx / 1 + tm.assert_index_equal(result, idx) + + result = idx // 1 + tm.assert_index_equal(result, idx) + + result = idx * np.array(5, dtype='int64') + tm.assert_index_equal(result, + self._holder(np.arange(5, dtype='int64') * 5)) + + result = idx * np.arange(5, dtype='int64') + tm.assert_index_equal(result, didx) + + result = idx * Series(np.arange(5, dtype='int64')) + tm.assert_index_equal(result, didx) + + result = idx * Series(np.arange(5, dtype='float64') + 0.1) + tm.assert_index_equal(result, self._holder(np.arange( + 5, dtype='float64') * (np.arange(5, dtype='float64') + 0.1))) + + # invalid + pytest.raises(TypeError, lambda: idx * idx) + pytest.raises(ValueError, lambda: idx * self._holder(np.arange(3))) + pytest.raises(ValueError, lambda: idx * np.array([1, 2])) diff --git a/pandas/tests/indexes/timedeltas/test_construction.py b/pandas/tests/indexes/timedeltas/test_construction.py new file mode 100644 index 0000000000000..dd25e2cca2e55 --- /dev/null +++ b/pandas/tests/indexes/timedeltas/test_construction.py @@ -0,0 +1,88 @@ +import pytest + +import numpy as np +from datetime import timedelta + +import pandas as pd +import pandas.util.testing as tm +from pandas import TimedeltaIndex, timedelta_range, to_timedelta + + +class TestTimedeltaIndex(object): + _multiprocess_can_split_ = True + + def test_construction_base_constructor(self): + arr = [pd.Timedelta('1 days'), pd.NaT, pd.Timedelta('3 days')] + tm.assert_index_equal(pd.Index(arr), pd.TimedeltaIndex(arr)) + tm.assert_index_equal(pd.Index(np.array(arr)), + pd.TimedeltaIndex(np.array(arr))) + + arr = [np.nan, pd.NaT, pd.Timedelta('1 days')] + tm.assert_index_equal(pd.Index(arr), pd.TimedeltaIndex(arr)) + tm.assert_index_equal(pd.Index(np.array(arr)), + pd.TimedeltaIndex(np.array(arr))) + + def test_constructor(self): + expected = TimedeltaIndex(['1 days', '1 days 00:00:05', '2 days', + '2 days 00:00:02', '0 days 00:00:03']) + result = TimedeltaIndex(['1 days', '1 days, 00:00:05', np.timedelta64( + 2, 'D'), timedelta(days=2, seconds=2), pd.offsets.Second(3)]) + tm.assert_index_equal(result, expected) + + # unicode + result = TimedeltaIndex([u'1 days', '1 days, 00:00:05', np.timedelta64( + 2, 'D'), timedelta(days=2, seconds=2), pd.offsets.Second(3)]) + + expected = TimedeltaIndex(['0 days 00:00:00', '0 days 00:00:01', + '0 days 00:00:02']) + tm.assert_index_equal(TimedeltaIndex(range(3), unit='s'), expected) + expected = TimedeltaIndex(['0 days 00:00:00', '0 days 00:00:05', + '0 days 00:00:09']) + 
tm.assert_index_equal(TimedeltaIndex([0, 5, 9], unit='s'), expected) + expected = TimedeltaIndex( + ['0 days 00:00:00.400', '0 days 00:00:00.450', + '0 days 00:00:01.200']) + tm.assert_index_equal(TimedeltaIndex([400, 450, 1200], unit='ms'), + expected) + + def test_constructor_coverage(self): + rng = timedelta_range('1 days', periods=10.5) + exp = timedelta_range('1 days', periods=10) + tm.assert_index_equal(rng, exp) + + pytest.raises(ValueError, TimedeltaIndex, start='1 days', + periods='foo', freq='D') + + pytest.raises(ValueError, TimedeltaIndex, start='1 days', + end='10 days') + + pytest.raises(ValueError, TimedeltaIndex, '1 days') + + # generator expression + gen = (timedelta(i) for i in range(10)) + result = TimedeltaIndex(gen) + expected = TimedeltaIndex([timedelta(i) for i in range(10)]) + tm.assert_index_equal(result, expected) + + # NumPy string array + strings = np.array(['1 days', '2 days', '3 days']) + result = TimedeltaIndex(strings) + expected = to_timedelta([1, 2, 3], unit='d') + tm.assert_index_equal(result, expected) + + from_ints = TimedeltaIndex(expected.asi8) + tm.assert_index_equal(from_ints, expected) + + # non-conforming freq + pytest.raises(ValueError, TimedeltaIndex, + ['1 days', '2 days', '4 days'], freq='D') + + pytest.raises(ValueError, TimedeltaIndex, periods=10, freq='D') + + def test_constructor_name(self): + idx = TimedeltaIndex(start='1 days', periods=1, freq='D', name='TEST') + assert idx.name == 'TEST' + + # GH10025 + idx2 = TimedeltaIndex(idx, name='something else') + assert idx2.name == 'something else' diff --git a/pandas/tests/indexes/timedeltas/test_indexing.py b/pandas/tests/indexes/timedeltas/test_indexing.py new file mode 100644 index 0000000000000..844033cc19eed --- /dev/null +++ b/pandas/tests/indexes/timedeltas/test_indexing.py @@ -0,0 +1,112 @@ +import pytest + +from datetime import timedelta + +import pandas.util.testing as tm +from pandas import TimedeltaIndex, timedelta_range, compat, Index, Timedelta + + +class TestTimedeltaIndex(object): + _multiprocess_can_split_ = True + + def test_insert(self): + + idx = TimedeltaIndex(['4day', '1day', '2day'], name='idx') + + result = idx.insert(2, timedelta(days=5)) + exp = TimedeltaIndex(['4day', '1day', '5day', '2day'], name='idx') + tm.assert_index_equal(result, exp) + + # insertion of non-datetime should coerce to object index + result = idx.insert(1, 'inserted') + expected = Index([Timedelta('4day'), 'inserted', Timedelta('1day'), + Timedelta('2day')], name='idx') + assert not isinstance(result, TimedeltaIndex) + tm.assert_index_equal(result, expected) + assert result.name == expected.name + + idx = timedelta_range('1day 00:00:01', periods=3, freq='s', name='idx') + + # preserve freq + expected_0 = TimedeltaIndex(['1day', '1day 00:00:01', '1day 00:00:02', + '1day 00:00:03'], + name='idx', freq='s') + expected_3 = TimedeltaIndex(['1day 00:00:01', '1day 00:00:02', + '1day 00:00:03', '1day 00:00:04'], + name='idx', freq='s') + + # reset freq to None + expected_1_nofreq = TimedeltaIndex(['1day 00:00:01', '1day 00:00:01', + '1day 00:00:02', '1day 00:00:03'], + name='idx', freq=None) + expected_3_nofreq = TimedeltaIndex(['1day 00:00:01', '1day 00:00:02', + '1day 00:00:03', '1day 00:00:05'], + name='idx', freq=None) + + cases = [(0, Timedelta('1day'), expected_0), + (-3, Timedelta('1day'), expected_0), + (3, Timedelta('1day 00:00:04'), expected_3), + (1, Timedelta('1day 00:00:01'), expected_1_nofreq), + (3, Timedelta('1day 00:00:05'), expected_3_nofreq)] + + for n, d, expected in cases: + result = 
idx.insert(n, d) + tm.assert_index_equal(result, expected) + assert result.name == expected.name + assert result.freq == expected.freq + + def test_delete(self): + idx = timedelta_range(start='1 Days', periods=5, freq='D', name='idx') + + # preserve freq + expected_0 = timedelta_range(start='2 Days', periods=4, freq='D', + name='idx') + expected_4 = timedelta_range(start='1 Days', periods=4, freq='D', + name='idx') + + # reset freq to None + expected_1 = TimedeltaIndex( + ['1 day', '3 day', '4 day', '5 day'], freq=None, name='idx') + + cases = {0: expected_0, + -5: expected_0, + -1: expected_4, + 4: expected_4, + 1: expected_1} + for n, expected in compat.iteritems(cases): + result = idx.delete(n) + tm.assert_index_equal(result, expected) + assert result.name == expected.name + assert result.freq == expected.freq + + with pytest.raises((IndexError, ValueError)): + # either depending on numpy version + result = idx.delete(5) + + def test_delete_slice(self): + idx = timedelta_range(start='1 days', periods=10, freq='D', name='idx') + + # preserve freq + expected_0_2 = timedelta_range(start='4 days', periods=7, freq='D', + name='idx') + expected_7_9 = timedelta_range(start='1 days', periods=7, freq='D', + name='idx') + + # reset freq to None + expected_3_5 = TimedeltaIndex(['1 d', '2 d', '3 d', + '7 d', '8 d', '9 d', '10d'], + freq=None, name='idx') + + cases = {(0, 1, 2): expected_0_2, + (7, 8, 9): expected_7_9, + (3, 4, 5): expected_3_5} + for n, expected in compat.iteritems(cases): + result = idx.delete(n) + tm.assert_index_equal(result, expected) + assert result.name == expected.name + assert result.freq == expected.freq + + result = idx.delete(slice(n[0], n[-1] + 1)) + tm.assert_index_equal(result, expected) + assert result.name == expected.name + assert result.freq == expected.freq diff --git a/pandas/tests/indexes/timedeltas/test_ops.py b/pandas/tests/indexes/timedeltas/test_ops.py new file mode 100644 index 0000000000000..f4f669ee1d087 --- /dev/null +++ b/pandas/tests/indexes/timedeltas/test_ops.py @@ -0,0 +1,1284 @@ +import pytest + +import numpy as np +from datetime import timedelta +from distutils.version import LooseVersion + +import pandas as pd +import pandas.util.testing as tm +from pandas import to_timedelta +from pandas.util.testing import assert_series_equal, assert_frame_equal +from pandas import (Series, Timedelta, DataFrame, Timestamp, TimedeltaIndex, + timedelta_range, date_range, DatetimeIndex, Int64Index, + _np_version_under1p10, Float64Index, Index) +from pandas._libs.tslib import iNaT +from pandas.tests.test_base import Ops + + +class TestTimedeltaIndexOps(Ops): + def setup_method(self, method): + super(TestTimedeltaIndexOps, self).setup_method(method) + mask = lambda x: isinstance(x, TimedeltaIndex) + self.is_valid_objs = [o for o in self.objs if mask(o)] + self.not_valid_objs = [] + + def test_ops_properties(self): + f = lambda x: isinstance(x, TimedeltaIndex) + self.check_ops_properties(TimedeltaIndex._field_ops, f) + self.check_ops_properties(TimedeltaIndex._object_ops, f) + + def test_asobject_tolist(self): + idx = timedelta_range(start='1 days', periods=4, freq='D', name='idx') + expected_list = [Timedelta('1 days'), Timedelta('2 days'), + Timedelta('3 days'), Timedelta('4 days')] + expected = pd.Index(expected_list, dtype=object, name='idx') + result = idx.asobject + assert isinstance(result, Index) + + assert result.dtype == object + tm.assert_index_equal(result, expected) + assert result.name == expected.name + assert idx.tolist() == expected_list + + idx =
TimedeltaIndex([timedelta(days=1), timedelta(days=2), pd.NaT, + timedelta(days=4)], name='idx') + expected_list = [Timedelta('1 days'), Timedelta('2 days'), pd.NaT, + Timedelta('4 days')] + expected = pd.Index(expected_list, dtype=object, name='idx') + result = idx.asobject + assert isinstance(result, Index) + assert result.dtype == object + tm.assert_index_equal(result, expected) + assert result.name == expected.name + assert idx.tolist() == expected_list + + def test_minmax(self): + + # monotonic + idx1 = TimedeltaIndex(['1 days', '2 days', '3 days']) + assert idx1.is_monotonic + + # non-monotonic + idx2 = TimedeltaIndex(['1 days', np.nan, '3 days', 'NaT']) + assert not idx2.is_monotonic + + for idx in [idx1, idx2]: + assert idx.min() == Timedelta('1 days') + assert idx.max() == Timedelta('3 days') + assert idx.argmin() == 0 + assert idx.argmax() == 2 + + for op in ['min', 'max']: + # Return NaT + obj = TimedeltaIndex([]) + assert pd.isna(getattr(obj, op)()) + + obj = TimedeltaIndex([pd.NaT]) + assert pd.isna(getattr(obj, op)()) + + obj = TimedeltaIndex([pd.NaT, pd.NaT, pd.NaT]) + assert pd.isna(getattr(obj, op)()) + + def test_numpy_minmax(self): + dr = pd.date_range(start='2016-01-15', end='2016-01-20') + td = TimedeltaIndex(np.asarray(dr)) + + assert np.min(td) == Timedelta('16815 days') + assert np.max(td) == Timedelta('16820 days') + + errmsg = "the 'out' parameter is not supported" + tm.assert_raises_regex(ValueError, errmsg, np.min, td, out=0) + tm.assert_raises_regex(ValueError, errmsg, np.max, td, out=0) + + assert np.argmin(td) == 0 + assert np.argmax(td) == 5 + + if not _np_version_under1p10: + errmsg = "the 'out' parameter is not supported" + tm.assert_raises_regex( + ValueError, errmsg, np.argmin, td, out=0) + tm.assert_raises_regex( + ValueError, errmsg, np.argmax, td, out=0) + + def test_round(self): + td = pd.timedelta_range(start='16801 days', periods=5, freq='30Min') + elt = td[1] + + expected_rng = TimedeltaIndex([ + Timedelta('16801 days 00:00:00'), + Timedelta('16801 days 00:00:00'), + Timedelta('16801 days 01:00:00'), + Timedelta('16801 days 02:00:00'), + Timedelta('16801 days 02:00:00'), + ]) + expected_elt = expected_rng[1] + + tm.assert_index_equal(td.round(freq='H'), expected_rng) + assert elt.round(freq='H') == expected_elt + + msg = pd.tseries.frequencies._INVALID_FREQ_ERROR + with tm.assert_raises_regex(ValueError, msg): + td.round(freq='foo') + with tm.assert_raises_regex(ValueError, msg): + elt.round(freq='foo') + + msg = " is a non-fixed frequency" + tm.assert_raises_regex(ValueError, msg, td.round, freq='M') + tm.assert_raises_regex(ValueError, msg, elt.round, freq='M') + + def test_representation(self): + idx1 = TimedeltaIndex([], freq='D') + idx2 = TimedeltaIndex(['1 days'], freq='D') + idx3 = TimedeltaIndex(['1 days', '2 days'], freq='D') + idx4 = TimedeltaIndex(['1 days', '2 days', '3 days'], freq='D') + idx5 = TimedeltaIndex(['1 days 00:00:01', '2 days', '3 days']) + + exp1 = """TimedeltaIndex([], dtype='timedelta64[ns]', freq='D')""" + + exp2 = ("TimedeltaIndex(['1 days'], dtype='timedelta64[ns]', " + "freq='D')") + + exp3 = ("TimedeltaIndex(['1 days', '2 days'], " + "dtype='timedelta64[ns]', freq='D')") + + exp4 = ("TimedeltaIndex(['1 days', '2 days', '3 days'], " + "dtype='timedelta64[ns]', freq='D')") + + exp5 = ("TimedeltaIndex(['1 days 00:00:01', '2 days 00:00:00', " + "'3 days 00:00:00'], dtype='timedelta64[ns]', freq=None)") + + with pd.option_context('display.width', 300): + for idx, expected in zip([idx1, idx2, idx3, idx4, idx5], + [exp1, 
exp2, exp3, exp4, exp5]): + for func in ['__repr__', '__unicode__', '__str__']: + result = getattr(idx, func)() + assert result == expected + + def test_representation_to_series(self): + idx1 = TimedeltaIndex([], freq='D') + idx2 = TimedeltaIndex(['1 days'], freq='D') + idx3 = TimedeltaIndex(['1 days', '2 days'], freq='D') + idx4 = TimedeltaIndex(['1 days', '2 days', '3 days'], freq='D') + idx5 = TimedeltaIndex(['1 days 00:00:01', '2 days', '3 days']) + + exp1 = """Series([], dtype: timedelta64[ns])""" + + exp2 = """0 1 days +dtype: timedelta64[ns]""" + + exp3 = """0 1 days +1 2 days +dtype: timedelta64[ns]""" + + exp4 = """0 1 days +1 2 days +2 3 days +dtype: timedelta64[ns]""" + + exp5 = """0 1 days 00:00:01 +1 2 days 00:00:00 +2 3 days 00:00:00 +dtype: timedelta64[ns]""" + + with pd.option_context('display.width', 300): + for idx, expected in zip([idx1, idx2, idx3, idx4, idx5], + [exp1, exp2, exp3, exp4, exp5]): + result = repr(pd.Series(idx)) + assert result == expected + + def test_summary(self): + # GH9116 + idx1 = TimedeltaIndex([], freq='D') + idx2 = TimedeltaIndex(['1 days'], freq='D') + idx3 = TimedeltaIndex(['1 days', '2 days'], freq='D') + idx4 = TimedeltaIndex(['1 days', '2 days', '3 days'], freq='D') + idx5 = TimedeltaIndex(['1 days 00:00:01', '2 days', '3 days']) + + exp1 = """TimedeltaIndex: 0 entries +Freq: D""" + + exp2 = """TimedeltaIndex: 1 entries, 1 days to 1 days +Freq: D""" + + exp3 = """TimedeltaIndex: 2 entries, 1 days to 2 days +Freq: D""" + + exp4 = """TimedeltaIndex: 3 entries, 1 days to 3 days +Freq: D""" + + exp5 = ("TimedeltaIndex: 3 entries, 1 days 00:00:01 to 3 days " + "00:00:00") + + for idx, expected in zip([idx1, idx2, idx3, idx4, idx5], + [exp1, exp2, exp3, exp4, exp5]): + result = idx.summary() + assert result == expected + + def test_add_iadd(self): + + # only test adding/sub offsets as + is now numeric + + # offset + offsets = [pd.offsets.Hour(2), timedelta(hours=2), + np.timedelta64(2, 'h'), Timedelta(hours=2)] + + for delta in offsets: + rng = timedelta_range('1 days', '10 days') + result = rng + delta + expected = timedelta_range('1 days 02:00:00', '10 days 02:00:00', + freq='D') + tm.assert_index_equal(result, expected) + rng += delta + tm.assert_index_equal(rng, expected) + + # int + rng = timedelta_range('1 days 09:00:00', freq='H', periods=10) + result = rng + 1 + expected = timedelta_range('1 days 10:00:00', freq='H', periods=10) + tm.assert_index_equal(result, expected) + rng += 1 + tm.assert_index_equal(rng, expected) + + def test_sub_isub(self): + # only test adding/sub offsets as - is now numeric + + # offset + offsets = [pd.offsets.Hour(2), timedelta(hours=2), + np.timedelta64(2, 'h'), Timedelta(hours=2)] + + for delta in offsets: + rng = timedelta_range('1 days', '10 days') + result = rng - delta + expected = timedelta_range('0 days 22:00:00', '9 days 22:00:00') + tm.assert_index_equal(result, expected) + rng -= delta + tm.assert_index_equal(rng, expected) + + # int + rng = timedelta_range('1 days 09:00:00', freq='H', periods=10) + result = rng - 1 + expected = timedelta_range('1 days 08:00:00', freq='H', periods=10) + tm.assert_index_equal(result, expected) + rng -= 1 + tm.assert_index_equal(rng, expected) + + idx = TimedeltaIndex(['1 day', '2 day']) + msg = "cannot subtract a datelike from a TimedeltaIndex" + with tm.assert_raises_regex(TypeError, msg): + idx - Timestamp('2011-01-01') + + result = Timestamp('2011-01-01') + idx + expected = DatetimeIndex(['2011-01-02', '2011-01-03']) + tm.assert_index_equal(result, expected) + + def 
test_ops_compat(self): + + offsets = [pd.offsets.Hour(2), timedelta(hours=2), + np.timedelta64(2, 'h'), Timedelta(hours=2)] + + rng = timedelta_range('1 days', '10 days', name='foo') + + # multiply + for offset in offsets: + pytest.raises(TypeError, lambda: rng * offset) + + # divide + expected = Int64Index((np.arange(10) + 1) * 12, name='foo') + for offset in offsets: + result = rng / offset + tm.assert_index_equal(result, expected, exact=False) + + # floor divide + expected = Int64Index((np.arange(10) + 1) * 12, name='foo') + for offset in offsets: + result = rng // offset + tm.assert_index_equal(result, expected, exact=False) + + # divide with nats + rng = TimedeltaIndex(['1 days', pd.NaT, '2 days'], name='foo') + expected = Float64Index([12, np.nan, 24], name='foo') + for offset in offsets: + result = rng / offset + tm.assert_index_equal(result, expected) + + # don't allow division by NaT (maybe we could in the future) + pytest.raises(TypeError, lambda: rng / pd.NaT) + + def test_subtraction_ops(self): + + # with datetimes/timedelta and tdi/dti + tdi = TimedeltaIndex(['1 days', pd.NaT, '2 days'], name='foo') + dti = date_range('20130101', periods=3, name='bar') + td = Timedelta('1 days') + dt = Timestamp('20130101') + + pytest.raises(TypeError, lambda: tdi - dt) + pytest.raises(TypeError, lambda: tdi - dti) + pytest.raises(TypeError, lambda: td - dt) + pytest.raises(TypeError, lambda: td - dti) + + result = dt - dti + expected = TimedeltaIndex(['0 days', '-1 days', '-2 days'], name='bar') + tm.assert_index_equal(result, expected) + + result = dti - dt + expected = TimedeltaIndex(['0 days', '1 days', '2 days'], name='bar') + tm.assert_index_equal(result, expected) + + result = tdi - td + expected = TimedeltaIndex(['0 days', pd.NaT, '1 days'], name='foo') + tm.assert_index_equal(result, expected, check_names=False) + + result = td - tdi + expected = TimedeltaIndex(['0 days', pd.NaT, '-1 days'], name='foo') + tm.assert_index_equal(result, expected, check_names=False) + + result = dti - td + expected = DatetimeIndex( + ['20121231', '20130101', '20130102'], name='bar') + tm.assert_index_equal(result, expected, check_names=False) + + result = dt - tdi + expected = DatetimeIndex(['20121231', pd.NaT, '20121230'], name='foo') + tm.assert_index_equal(result, expected) + + def test_subtraction_ops_with_tz(self): + + # check that dt/dti subtraction ops with tz are validated + dti = date_range('20130101', periods=3) + ts = Timestamp('20130101') + dt = ts.to_pydatetime() + dti_tz = date_range('20130101', periods=3).tz_localize('US/Eastern') + ts_tz = Timestamp('20130101').tz_localize('US/Eastern') + ts_tz2 = Timestamp('20130101').tz_localize('CET') + dt_tz = ts_tz.to_pydatetime() + td = Timedelta('1 days') + + def _check(result, expected): + assert result == expected + assert isinstance(result, Timedelta) + + # scalars + result = ts - ts + expected = Timedelta('0 days') + _check(result, expected) + + result = dt_tz - ts_tz + expected = Timedelta('0 days') + _check(result, expected) + + result = ts_tz - dt_tz + expected = Timedelta('0 days') + _check(result, expected) + + # tz mismatches + pytest.raises(TypeError, lambda: dt_tz - ts) + pytest.raises(TypeError, lambda: dt_tz - dt) + pytest.raises(TypeError, lambda: dt_tz - ts_tz2) + pytest.raises(TypeError, lambda: dt - dt_tz) + pytest.raises(TypeError, lambda: ts - dt_tz) + pytest.raises(TypeError, lambda: ts_tz2 - ts) + pytest.raises(TypeError, lambda: ts_tz2 - dt) + pytest.raises(TypeError, lambda: ts_tz - ts_tz2) + + # with dti +
pytest.raises(TypeError, lambda: dti - ts_tz) + pytest.raises(TypeError, lambda: dti_tz - ts) + pytest.raises(TypeError, lambda: dti_tz - ts_tz2) + + result = dti_tz - dt_tz + expected = TimedeltaIndex(['0 days', '1 days', '2 days']) + tm.assert_index_equal(result, expected) + + result = dt_tz - dti_tz + expected = TimedeltaIndex(['0 days', '-1 days', '-2 days']) + tm.assert_index_equal(result, expected) + + result = dti_tz - ts_tz + expected = TimedeltaIndex(['0 days', '1 days', '2 days']) + tm.assert_index_equal(result, expected) + + result = ts_tz - dti_tz + expected = TimedeltaIndex(['0 days', '-1 days', '-2 days']) + tm.assert_index_equal(result, expected) + + result = td - td + expected = Timedelta('0 days') + _check(result, expected) + + result = dti_tz - td + expected = DatetimeIndex( + ['20121231', '20130101', '20130102'], tz='US/Eastern') + tm.assert_index_equal(result, expected) + + def test_dti_tdi_numeric_ops(self): + + # These are normally union/diff set-like ops + tdi = TimedeltaIndex(['1 days', pd.NaT, '2 days'], name='foo') + dti = date_range('20130101', periods=3, name='bar') + + # TODO(wesm): unused? + # td = Timedelta('1 days') + # dt = Timestamp('20130101') + + result = tdi - tdi + expected = TimedeltaIndex(['0 days', pd.NaT, '0 days'], name='foo') + tm.assert_index_equal(result, expected) + + result = tdi + tdi + expected = TimedeltaIndex(['2 days', pd.NaT, '4 days'], name='foo') + tm.assert_index_equal(result, expected) + + result = dti - tdi # name will be reset + expected = DatetimeIndex(['20121231', pd.NaT, '20130101']) + tm.assert_index_equal(result, expected) + + def test_sub_period(self): + # GH 13078 + # not supported, check TypeError + p = pd.Period('2011-01-01', freq='D') + + for freq in [None, 'H']: + idx = pd.TimedeltaIndex(['1 hours', '2 hours'], freq=freq) + + with pytest.raises(TypeError): + idx - p + + with pytest.raises(TypeError): + p - idx + + def test_addition_ops(self): + + # with datetimes/timedelta and tdi/dti + tdi = TimedeltaIndex(['1 days', pd.NaT, '2 days'], name='foo') + dti = date_range('20130101', periods=3, name='bar') + td = Timedelta('1 days') + dt = Timestamp('20130101') + + result = tdi + dt + expected = DatetimeIndex(['20130102', pd.NaT, '20130103'], name='foo') + tm.assert_index_equal(result, expected) + + result = dt + tdi + expected = DatetimeIndex(['20130102', pd.NaT, '20130103'], name='foo') + tm.assert_index_equal(result, expected) + + result = td + tdi + expected = TimedeltaIndex(['2 days', pd.NaT, '3 days'], name='foo') + tm.assert_index_equal(result, expected) + + result = tdi + td + expected = TimedeltaIndex(['2 days', pd.NaT, '3 days'], name='foo') + tm.assert_index_equal(result, expected) + + # unequal length + pytest.raises(ValueError, lambda: tdi + dti[0:1]) + pytest.raises(ValueError, lambda: tdi[0:1] + dti) + + # random indexes + pytest.raises(TypeError, lambda: tdi + Int64Index([1, 2, 3])) + + # this is a union! 
+ # pytest.raises(TypeError, lambda : Int64Index([1,2,3]) + tdi) + + result = tdi + dti # name will be reset + expected = DatetimeIndex(['20130102', pd.NaT, '20130105']) + tm.assert_index_equal(result, expected) + + result = dti + tdi # name will be reset + expected = DatetimeIndex(['20130102', pd.NaT, '20130105']) + tm.assert_index_equal(result, expected) + + result = dt + td + expected = Timestamp('20130102') + assert result == expected + + result = td + dt + expected = Timestamp('20130102') + assert result == expected + + def test_comp_nat(self): + left = pd.TimedeltaIndex([pd.Timedelta('1 days'), pd.NaT, + pd.Timedelta('3 days')]) + right = pd.TimedeltaIndex([pd.NaT, pd.NaT, pd.Timedelta('3 days')]) + + for l, r in [(left, right), (left.asobject, right.asobject)]: + result = l == r + expected = np.array([False, False, True]) + tm.assert_numpy_array_equal(result, expected) + + result = l != r + expected = np.array([True, True, False]) + tm.assert_numpy_array_equal(result, expected) + + expected = np.array([False, False, False]) + tm.assert_numpy_array_equal(l == pd.NaT, expected) + tm.assert_numpy_array_equal(pd.NaT == r, expected) + + expected = np.array([True, True, True]) + tm.assert_numpy_array_equal(l != pd.NaT, expected) + tm.assert_numpy_array_equal(pd.NaT != l, expected) + + expected = np.array([False, False, False]) + tm.assert_numpy_array_equal(l < pd.NaT, expected) + tm.assert_numpy_array_equal(pd.NaT > l, expected) + + def test_value_counts_unique(self): + # GH 7735 + + idx = timedelta_range('1 days 09:00:00', freq='H', periods=10) + # create repeated values, 'n'th element is repeated by n+1 times + idx = TimedeltaIndex(np.repeat(idx.values, range(1, len(idx) + 1))) + + exp_idx = timedelta_range('1 days 18:00:00', freq='-1H', periods=10) + expected = Series(range(10, 0, -1), index=exp_idx, dtype='int64') + + for obj in [idx, Series(idx)]: + tm.assert_series_equal(obj.value_counts(), expected) + + expected = timedelta_range('1 days 09:00:00', freq='H', periods=10) + tm.assert_index_equal(idx.unique(), expected) + + idx = TimedeltaIndex(['1 days 09:00:00', '1 days 09:00:00', + '1 days 09:00:00', '1 days 08:00:00', + '1 days 08:00:00', pd.NaT]) + + exp_idx = TimedeltaIndex(['1 days 09:00:00', '1 days 08:00:00']) + expected = Series([3, 2], index=exp_idx) + + for obj in [idx, Series(idx)]: + tm.assert_series_equal(obj.value_counts(), expected) + + exp_idx = TimedeltaIndex(['1 days 09:00:00', '1 days 08:00:00', + pd.NaT]) + expected = Series([3, 2, 1], index=exp_idx) + + for obj in [idx, Series(idx)]: + tm.assert_series_equal(obj.value_counts(dropna=False), expected) + + tm.assert_index_equal(idx.unique(), exp_idx) + + def test_nonunique_contains(self): + # GH 9512 + for idx in map(TimedeltaIndex, ([0, 1, 0], [0, 0, -1], [0, -1, -1], + ['00:01:00', '00:01:00', '00:02:00'], + ['00:01:00', '00:01:00', '00:00:01'])): + assert idx[0] in idx + + def test_unknown_attribute(self): + # see gh-9680 + tdi = pd.timedelta_range(start=0, periods=10, freq='1s') + ts = pd.Series(np.random.normal(size=10), index=tdi) + assert 'foo' not in ts.__dict__.keys() + pytest.raises(AttributeError, lambda: ts.foo) + + def test_order(self): + # GH 10295 + idx1 = TimedeltaIndex(['1 day', '2 day', '3 day'], freq='D', + name='idx') + idx2 = TimedeltaIndex( + ['1 hour', '2 hour', '3 hour'], freq='H', name='idx') + + for idx in [idx1, idx2]: + ordered = idx.sort_values() + tm.assert_index_equal(ordered, idx) + assert ordered.freq == idx.freq + + ordered = idx.sort_values(ascending=False) + expected = idx[::-1] 
+ tm.assert_index_equal(ordered, expected) + assert ordered.freq == expected.freq + assert ordered.freq.n == -1 + + ordered, indexer = idx.sort_values(return_indexer=True) + tm.assert_index_equal(ordered, idx) + tm.assert_numpy_array_equal(indexer, np.array([0, 1, 2]), + check_dtype=False) + assert ordered.freq == idx.freq + + ordered, indexer = idx.sort_values(return_indexer=True, + ascending=False) + tm.assert_index_equal(ordered, idx[::-1]) + assert ordered.freq == expected.freq + assert ordered.freq.n == -1 + + idx1 = TimedeltaIndex(['1 hour', '3 hour', '5 hour', + '2 hour ', '1 hour'], name='idx1') + exp1 = TimedeltaIndex(['1 hour', '1 hour', '2 hour', + '3 hour', '5 hour'], name='idx1') + + idx2 = TimedeltaIndex(['1 day', '3 day', '5 day', + '2 day', '1 day'], name='idx2') + + # TODO(wesm): unused? + # exp2 = TimedeltaIndex(['1 day', '1 day', '2 day', + # '3 day', '5 day'], name='idx2') + + # idx3 = TimedeltaIndex([pd.NaT, '3 minute', '5 minute', + # '2 minute', pd.NaT], name='idx3') + # exp3 = TimedeltaIndex([pd.NaT, pd.NaT, '2 minute', '3 minute', + # '5 minute'], name='idx3') + + for idx, expected in [(idx1, exp1), (idx1, exp1), (idx1, exp1)]: + ordered = idx.sort_values() + tm.assert_index_equal(ordered, expected) + assert ordered.freq is None + + ordered = idx.sort_values(ascending=False) + tm.assert_index_equal(ordered, expected[::-1]) + assert ordered.freq is None + + ordered, indexer = idx.sort_values(return_indexer=True) + tm.assert_index_equal(ordered, expected) + + exp = np.array([0, 4, 3, 1, 2]) + tm.assert_numpy_array_equal(indexer, exp, check_dtype=False) + assert ordered.freq is None + + ordered, indexer = idx.sort_values(return_indexer=True, + ascending=False) + tm.assert_index_equal(ordered, expected[::-1]) + + exp = np.array([2, 1, 3, 4, 0]) + tm.assert_numpy_array_equal(indexer, exp, check_dtype=False) + assert ordered.freq is None + + def test_getitem(self): + idx1 = pd.timedelta_range('1 day', '31 day', freq='D', name='idx') + + for idx in [idx1]: + result = idx[0] + assert result == pd.Timedelta('1 day') + + result = idx[0:5] + expected = pd.timedelta_range('1 day', '5 day', freq='D', + name='idx') + tm.assert_index_equal(result, expected) + assert result.freq == expected.freq + + result = idx[0:10:2] + expected = pd.timedelta_range('1 day', '9 day', freq='2D', + name='idx') + tm.assert_index_equal(result, expected) + assert result.freq == expected.freq + + result = idx[-20:-5:3] + expected = pd.timedelta_range('12 day', '24 day', freq='3D', + name='idx') + tm.assert_index_equal(result, expected) + assert result.freq == expected.freq + + result = idx[4::-1] + expected = TimedeltaIndex(['5 day', '4 day', '3 day', + '2 day', '1 day'], + freq='-1D', name='idx') + tm.assert_index_equal(result, expected) + assert result.freq == expected.freq + + def test_drop_duplicates_metadata(self): + # GH 10115 + idx = pd.timedelta_range('1 day', '31 day', freq='D', name='idx') + result = idx.drop_duplicates() + tm.assert_index_equal(idx, result) + assert idx.freq == result.freq + + idx_dup = idx.append(idx) + assert idx_dup.freq is None # freq is reset + result = idx_dup.drop_duplicates() + tm.assert_index_equal(idx, result) + assert result.freq is None + + def test_drop_duplicates(self): + # to check Index/Series compat + base = pd.timedelta_range('1 day', '31 day', freq='D', name='idx') + idx = base.append(base[:5]) + + res = idx.drop_duplicates() + tm.assert_index_equal(res, base) + res = Series(idx).drop_duplicates() + tm.assert_series_equal(res, Series(base)) + + res = 
idx.drop_duplicates(keep='last') + exp = base[5:].append(base[:5]) + tm.assert_index_equal(res, exp) + res = Series(idx).drop_duplicates(keep='last') + tm.assert_series_equal(res, Series(exp, index=np.arange(5, 36))) + + res = idx.drop_duplicates(keep=False) + tm.assert_index_equal(res, base[5:]) + res = Series(idx).drop_duplicates(keep=False) + tm.assert_series_equal(res, Series(base[5:], index=np.arange(5, 31))) + + def test_take(self): + # GH 10295 + idx1 = pd.timedelta_range('1 day', '31 day', freq='D', name='idx') + + for idx in [idx1]: + result = idx.take([0]) + assert result == pd.Timedelta('1 day') + + result = idx.take([-1]) + assert result == pd.Timedelta('31 day') + + result = idx.take([0, 1, 2]) + expected = pd.timedelta_range('1 day', '3 day', freq='D', + name='idx') + tm.assert_index_equal(result, expected) + assert result.freq == expected.freq + + result = idx.take([0, 2, 4]) + expected = pd.timedelta_range('1 day', '5 day', freq='2D', + name='idx') + tm.assert_index_equal(result, expected) + assert result.freq == expected.freq + + result = idx.take([7, 4, 1]) + expected = pd.timedelta_range('8 day', '2 day', freq='-3D', + name='idx') + tm.assert_index_equal(result, expected) + assert result.freq == expected.freq + + result = idx.take([3, 2, 5]) + expected = TimedeltaIndex(['4 day', '3 day', '6 day'], name='idx') + tm.assert_index_equal(result, expected) + assert result.freq is None + + result = idx.take([-3, 2, 5]) + expected = TimedeltaIndex(['29 day', '3 day', '6 day'], name='idx') + tm.assert_index_equal(result, expected) + assert result.freq is None + + def test_take_invalid_kwargs(self): + idx = pd.timedelta_range('1 day', '31 day', freq='D', name='idx') + indices = [1, 6, 5, 9, 10, 13, 15, 3] + + msg = r"take\(\) got an unexpected keyword argument 'foo'" + tm.assert_raises_regex(TypeError, msg, idx.take, + indices, foo=2) + + msg = "the 'out' parameter is not supported" + tm.assert_raises_regex(ValueError, msg, idx.take, + indices, out=indices) + + msg = "the 'mode' parameter is not supported" + tm.assert_raises_regex(ValueError, msg, idx.take, + indices, mode='clip') + + def test_infer_freq(self): + # GH 11018 + for freq in ['D', '3D', '-3D', 'H', '2H', '-2H', 'T', '2T', 'S', '-3S' + ]: + idx = pd.timedelta_range('1', freq=freq, periods=10) + result = pd.TimedeltaIndex(idx.asi8, freq='infer') + tm.assert_index_equal(idx, result) + assert result.freq == freq + + def test_nat_new(self): + + idx = pd.timedelta_range('1', freq='D', periods=5, name='x') + result = idx._nat_new() + exp = pd.TimedeltaIndex([pd.NaT] * 5, name='x') + tm.assert_index_equal(result, exp) + + result = idx._nat_new(box=False) + exp = np.array([iNaT] * 5, dtype=np.int64) + tm.assert_numpy_array_equal(result, exp) + + def test_shift(self): + # GH 9903 + idx = pd.TimedeltaIndex([], name='xxx') + tm.assert_index_equal(idx.shift(0, freq='H'), idx) + tm.assert_index_equal(idx.shift(3, freq='H'), idx) + + idx = pd.TimedeltaIndex(['5 hours', '6 hours', '9 hours'], name='xxx') + tm.assert_index_equal(idx.shift(0, freq='H'), idx) + exp = pd.TimedeltaIndex(['8 hours', '9 hours', '12 hours'], name='xxx') + tm.assert_index_equal(idx.shift(3, freq='H'), exp) + exp = pd.TimedeltaIndex(['2 hours', '3 hours', '6 hours'], name='xxx') + tm.assert_index_equal(idx.shift(-3, freq='H'), exp) + + tm.assert_index_equal(idx.shift(0, freq='T'), idx) + exp = pd.TimedeltaIndex(['05:03:00', '06:03:00', '9:03:00'], + name='xxx') + tm.assert_index_equal(idx.shift(3, freq='T'), exp) + exp = pd.TimedeltaIndex(['04:57:00', 
'05:57:00', '8:57:00'], + name='xxx') + tm.assert_index_equal(idx.shift(-3, freq='T'), exp) + + def test_repeat(self): + index = pd.timedelta_range('1 days', periods=2, freq='D') + exp = pd.TimedeltaIndex(['1 days', '1 days', '2 days', '2 days']) + for res in [index.repeat(2), np.repeat(index, 2)]: + tm.assert_index_equal(res, exp) + assert res.freq is None + + index = TimedeltaIndex(['1 days', 'NaT', '3 days']) + exp = TimedeltaIndex(['1 days', '1 days', '1 days', + 'NaT', 'NaT', 'NaT', + '3 days', '3 days', '3 days']) + for res in [index.repeat(3), np.repeat(index, 3)]: + tm.assert_index_equal(res, exp) + assert res.freq is None + + def test_nat(self): + assert pd.TimedeltaIndex._na_value is pd.NaT + assert pd.TimedeltaIndex([])._na_value is pd.NaT + + idx = pd.TimedeltaIndex(['1 days', '2 days']) + assert idx._can_hold_na + + tm.assert_numpy_array_equal(idx._isnan, np.array([False, False])) + assert not idx.hasnans + tm.assert_numpy_array_equal(idx._nan_idxs, + np.array([], dtype=np.intp)) + + idx = pd.TimedeltaIndex(['1 days', 'NaT']) + assert idx._can_hold_na + + tm.assert_numpy_array_equal(idx._isnan, np.array([False, True])) + assert idx.hasnans + tm.assert_numpy_array_equal(idx._nan_idxs, + np.array([1], dtype=np.intp)) + + def test_equals(self): + # GH 13107 + idx = pd.TimedeltaIndex(['1 days', '2 days', 'NaT']) + assert idx.equals(idx) + assert idx.equals(idx.copy()) + assert idx.equals(idx.asobject) + assert idx.asobject.equals(idx) + assert idx.asobject.equals(idx.asobject) + assert not idx.equals(list(idx)) + assert not idx.equals(pd.Series(idx)) + + idx2 = pd.TimedeltaIndex(['2 days', '1 days', 'NaT']) + assert not idx.equals(idx2) + assert not idx.equals(idx2.copy()) + assert not idx.equals(idx2.asobject) + assert not idx.asobject.equals(idx2) + assert not idx.asobject.equals(idx2.asobject) + assert not idx.equals(list(idx2)) + assert not idx.equals(pd.Series(idx2)) + + +class TestTimedeltas(object): + _multiprocess_can_split_ = True + + def test_ops(self): + + td = Timedelta(10, unit='d') + assert -td == Timedelta(-10, unit='d') + assert +td == Timedelta(10, unit='d') + assert td - td == Timedelta(0, unit='ns') + assert (td - pd.NaT) is pd.NaT + assert td + td == Timedelta(20, unit='d') + assert (td + pd.NaT) is pd.NaT + assert td * 2 == Timedelta(20, unit='d') + assert (td * pd.NaT) is pd.NaT + assert td / 2 == Timedelta(5, unit='d') + assert td // 2 == Timedelta(5, unit='d') + assert abs(td) == td + assert abs(-td) == td + assert td / td == 1 + assert (td / pd.NaT) is np.nan + assert (td // pd.NaT) is np.nan + + # invert + assert -td == Timedelta('-10d') + assert td * -1 == Timedelta('-10d') + assert -1 * td == Timedelta('-10d') + assert abs(-td) == Timedelta('10d') + + # invalid multiply with another timedelta + pytest.raises(TypeError, lambda: td * td) + + # can't operate with integers + pytest.raises(TypeError, lambda: td + 2) + pytest.raises(TypeError, lambda: td - 2) + + def test_ops_offsets(self): + td = Timedelta(10, unit='d') + assert Timedelta(241, unit='h') == td + pd.offsets.Hour(1) + assert Timedelta(241, unit='h') == pd.offsets.Hour(1) + td + assert 240 == td / pd.offsets.Hour(1) + assert 1 / 240.0 == pd.offsets.Hour(1) / td + assert Timedelta(239, unit='h') == td - pd.offsets.Hour(1) + assert Timedelta(-239, unit='h') == pd.offsets.Hour(1) - td + + def test_ops_ndarray(self): + td = Timedelta('1 day') + + # timedelta, timedelta + other = pd.to_timedelta(['1 day']).values + expected = pd.to_timedelta(['2 days']).values + tm.assert_numpy_array_equal(td + 
other, expected) + if LooseVersion(np.__version__) >= '1.8': + tm.assert_numpy_array_equal(other + td, expected) + pytest.raises(TypeError, lambda: td + np.array([1])) + pytest.raises(TypeError, lambda: np.array([1]) + td) + + expected = pd.to_timedelta(['0 days']).values + tm.assert_numpy_array_equal(td - other, expected) + if LooseVersion(np.__version__) >= '1.8': + tm.assert_numpy_array_equal(-other + td, expected) + pytest.raises(TypeError, lambda: td - np.array([1])) + pytest.raises(TypeError, lambda: np.array([1]) - td) + + expected = pd.to_timedelta(['2 days']).values + tm.assert_numpy_array_equal(td * np.array([2]), expected) + tm.assert_numpy_array_equal(np.array([2]) * td, expected) + pytest.raises(TypeError, lambda: td * other) + pytest.raises(TypeError, lambda: other * td) + + tm.assert_numpy_array_equal(td / other, + np.array([1], dtype=np.float64)) + if LooseVersion(np.__version__) >= '1.8': + tm.assert_numpy_array_equal(other / td, + np.array([1], dtype=np.float64)) + + # timedelta, datetime + other = pd.to_datetime(['2000-01-01']).values + expected = pd.to_datetime(['2000-01-02']).values + tm.assert_numpy_array_equal(td + other, expected) + if LooseVersion(np.__version__) >= '1.8': + tm.assert_numpy_array_equal(other + td, expected) + + expected = pd.to_datetime(['1999-12-31']).values + tm.assert_numpy_array_equal(-td + other, expected) + if LooseVersion(np.__version__) >= '1.8': + tm.assert_numpy_array_equal(other - td, expected) + + def test_ops_series(self): + # regression test for GH8813 + td = Timedelta('1 day') + other = pd.Series([1, 2]) + expected = pd.Series(pd.to_timedelta(['1 day', '2 days'])) + tm.assert_series_equal(expected, td * other) + tm.assert_series_equal(expected, other * td) + + def test_ops_series_object(self): + # GH 13043 + s = pd.Series([pd.Timestamp('2015-01-01', tz='US/Eastern'), + pd.Timestamp('2015-01-01', tz='Asia/Tokyo')], + name='xxx') + assert s.dtype == object + + exp = pd.Series([pd.Timestamp('2015-01-02', tz='US/Eastern'), + pd.Timestamp('2015-01-02', tz='Asia/Tokyo')], + name='xxx') + tm.assert_series_equal(s + pd.Timedelta('1 days'), exp) + tm.assert_series_equal(pd.Timedelta('1 days') + s, exp) + + # object series & object series + s2 = pd.Series([pd.Timestamp('2015-01-03', tz='US/Eastern'), + pd.Timestamp('2015-01-05', tz='Asia/Tokyo')], + name='xxx') + assert s2.dtype == object + exp = pd.Series([pd.Timedelta('2 days'), pd.Timedelta('4 days')], + name='xxx') + tm.assert_series_equal(s2 - s, exp) + tm.assert_series_equal(s - s2, -exp) + + s = pd.Series([pd.Timedelta('01:00:00'), pd.Timedelta('02:00:00')], + name='xxx', dtype=object) + assert s.dtype == object + + exp = pd.Series([pd.Timedelta('01:30:00'), pd.Timedelta('02:30:00')], + name='xxx') + tm.assert_series_equal(s + pd.Timedelta('00:30:00'), exp) + tm.assert_series_equal(pd.Timedelta('00:30:00') + s, exp) + + def test_ops_notimplemented(self): + class Other: + pass + + other = Other() + + td = Timedelta('1 day') + assert td.__add__(other) is NotImplemented + assert td.__sub__(other) is NotImplemented + assert td.__truediv__(other) is NotImplemented + assert td.__mul__(other) is NotImplemented + assert td.__floordiv__(other) is NotImplemented + + def test_ops_error_str(self): + # GH 13624 + tdi = TimedeltaIndex(['1 day', '2 days']) + + for l, r in [(tdi, 'a'), ('a', tdi)]: + with pytest.raises(TypeError): + l + r + + with pytest.raises(TypeError): + l > r + + with pytest.raises(TypeError): + l == r + + with pytest.raises(TypeError): + l != r + + def test_timedelta_ops(self): + 
# GH4984 + # make sure ops return Timedelta + s = Series([Timestamp('20130101') + timedelta(seconds=i * i) + for i in range(10)]) + td = s.diff() + + result = td.mean() + expected = to_timedelta(timedelta(seconds=9)) + assert result == expected + + result = td.to_frame().mean() + assert result[0] == expected + + result = td.quantile(.1) + expected = Timedelta(np.timedelta64(2600, 'ms')) + assert result == expected + + result = td.median() + expected = to_timedelta('00:00:09') + assert result == expected + + result = td.to_frame().median() + assert result[0] == expected + + # GH 6462 + # consistency in returned values for sum + result = td.sum() + expected = to_timedelta('00:01:21') + assert result == expected + + result = td.to_frame().sum() + assert result[0] == expected + + # std + result = td.std() + expected = to_timedelta(Series(td.dropna().values).std()) + assert result == expected + + result = td.to_frame().std() + assert result[0] == expected + + # invalid ops + for op in ['skew', 'kurt', 'sem', 'prod']: + pytest.raises(TypeError, getattr(td, op)) + + # GH 10040 + # make sure NaT is properly handled by median() + s = Series([Timestamp('2015-02-03'), Timestamp('2015-02-07')]) + assert s.diff().median() == timedelta(days=4) + + s = Series([Timestamp('2015-02-03'), Timestamp('2015-02-07'), + Timestamp('2015-02-15')]) + assert s.diff().median() == timedelta(days=6) + + def test_timedelta_ops_scalar(self): + # GH 6808 + base = pd.to_datetime('20130101 09:01:12.123456') + expected_add = pd.to_datetime('20130101 09:01:22.123456') + expected_sub = pd.to_datetime('20130101 09:01:02.123456') + + for offset in [pd.to_timedelta(10, unit='s'), timedelta(seconds=10), + np.timedelta64(10, 's'), + np.timedelta64(10000000000, 'ns'), + pd.offsets.Second(10)]: + result = base + offset + assert result == expected_add + + result = base - offset + assert result == expected_sub + + base = pd.to_datetime('20130102 09:01:12.123456') + expected_add = pd.to_datetime('20130103 09:01:22.123456') + expected_sub = pd.to_datetime('20130101 09:01:02.123456') + + for offset in [pd.to_timedelta('1 day, 00:00:10'), + pd.to_timedelta('1 days, 00:00:10'), + timedelta(days=1, seconds=10), + np.timedelta64(1, 'D') + np.timedelta64(10, 's'), + pd.offsets.Day() + pd.offsets.Second(10)]: + result = base + offset + assert result == expected_add + + result = base - offset + assert result == expected_sub + + def test_timedelta_ops_with_missing_values(self): + # setup + s1 = pd.to_timedelta(Series(['00:00:01'])) + s2 = pd.to_timedelta(Series(['00:00:02'])) + sn = pd.to_timedelta(Series([pd.NaT])) + df1 = DataFrame(['00:00:01']).apply(pd.to_timedelta) + df2 = DataFrame(['00:00:02']).apply(pd.to_timedelta) + dfn = DataFrame([pd.NaT]).apply(pd.to_timedelta) + scalar1 = pd.to_timedelta('00:00:01') + scalar2 = pd.to_timedelta('00:00:02') + timedelta_NaT = pd.to_timedelta('NaT') + NA = np.nan + + actual = scalar1 + scalar1 + assert actual == scalar2 + actual = scalar2 - scalar1 + assert actual == scalar1 + + actual = s1 + s1 + assert_series_equal(actual, s2) + actual = s2 - s1 + assert_series_equal(actual, s1) + + actual = s1 + scalar1 + assert_series_equal(actual, s2) + actual = scalar1 + s1 + assert_series_equal(actual, s2) + actual = s2 - scalar1 + assert_series_equal(actual, s1) + actual = -scalar1 + s2 + assert_series_equal(actual, s1) + + actual = s1 + timedelta_NaT + assert_series_equal(actual, sn) + actual = timedelta_NaT + s1 + assert_series_equal(actual, sn) + actual = s1 - timedelta_NaT + assert_series_equal(actual, sn) + 
actual = -timedelta_NaT + s1 + assert_series_equal(actual, sn) + + actual = s1 + NA + assert_series_equal(actual, sn) + actual = NA + s1 + assert_series_equal(actual, sn) + actual = s1 - NA + assert_series_equal(actual, sn) + actual = -NA + s1 + assert_series_equal(actual, sn) + + actual = s1 + pd.NaT + assert_series_equal(actual, sn) + actual = s2 - pd.NaT + assert_series_equal(actual, sn) + + actual = s1 + df1 + assert_frame_equal(actual, df2) + actual = s2 - df1 + assert_frame_equal(actual, df1) + actual = df1 + s1 + assert_frame_equal(actual, df2) + actual = df2 - s1 + assert_frame_equal(actual, df1) + + actual = df1 + df1 + assert_frame_equal(actual, df2) + actual = df2 - df1 + assert_frame_equal(actual, df1) + + actual = df1 + scalar1 + assert_frame_equal(actual, df2) + actual = df2 - scalar1 + assert_frame_equal(actual, df1) + + actual = df1 + timedelta_NaT + assert_frame_equal(actual, dfn) + actual = df1 - timedelta_NaT + assert_frame_equal(actual, dfn) + + actual = df1 + NA + assert_frame_equal(actual, dfn) + actual = df1 - NA + assert_frame_equal(actual, dfn) + + actual = df1 + pd.NaT # NaT is datetime, not timedelta + assert_frame_equal(actual, dfn) + actual = df1 - pd.NaT + assert_frame_equal(actual, dfn) + + def test_compare_timedelta_series(self): + # regression test for GH5963 + s = pd.Series([timedelta(days=1), timedelta(days=2)]) + actual = s > timedelta(days=1) + expected = pd.Series([False, True]) + tm.assert_series_equal(actual, expected) + + def test_compare_timedelta_ndarray(self): + # GH11835 + periods = [Timedelta('0 days 01:00:00'), Timedelta('0 days 01:00:00')] + arr = np.array(periods) + result = arr[0] > arr + expected = np.array([False, False]) + tm.assert_numpy_array_equal(result, expected) + + +class TestSlicing(object): + + def test_tdi_ops_attributes(self): + rng = timedelta_range('2 days', periods=5, freq='2D', name='x') + + result = rng + 1 + exp = timedelta_range('4 days', periods=5, freq='2D', name='x') + tm.assert_index_equal(result, exp) + assert result.freq == '2D' + + result = rng - 2 + exp = timedelta_range('-2 days', periods=5, freq='2D', name='x') + tm.assert_index_equal(result, exp) + assert result.freq == '2D' + + result = rng * 2 + exp = timedelta_range('4 days', periods=5, freq='4D', name='x') + tm.assert_index_equal(result, exp) + assert result.freq == '4D' + + result = rng / 2 + exp = timedelta_range('1 days', periods=5, freq='D', name='x') + tm.assert_index_equal(result, exp) + assert result.freq == 'D' + + result = -rng + exp = timedelta_range('-2 days', periods=5, freq='-2D', name='x') + tm.assert_index_equal(result, exp) + assert result.freq == '-2D' + + rng = pd.timedelta_range('-2 days', periods=5, freq='D', name='x') + + result = abs(rng) + exp = TimedeltaIndex(['2 days', '1 days', '0 days', '1 days', + '2 days'], name='x') + tm.assert_index_equal(result, exp) + assert result.freq is None + + def test_add_overflow(self): + # see gh-14068 + msg = "too (big|large) to convert" + with tm.assert_raises_regex(OverflowError, msg): + to_timedelta(106580, 'D') + Timestamp('2000') + with tm.assert_raises_regex(OverflowError, msg): + Timestamp('2000') + to_timedelta(106580, 'D') + + _NaT = int(pd.NaT) + 1 + msg = "Overflow in int64 addition" + with tm.assert_raises_regex(OverflowError, msg): + to_timedelta([106580], 'D') + Timestamp('2000') + with tm.assert_raises_regex(OverflowError, msg): + Timestamp('2000') + to_timedelta([106580], 'D') + with tm.assert_raises_regex(OverflowError, msg): + to_timedelta([_NaT]) - Timedelta('1 days') + with
tm.assert_raises_regex(OverflowError, msg): + to_timedelta(['5 days', _NaT]) - Timedelta('1 days') + with tm.assert_raises_regex(OverflowError, msg): + (to_timedelta([_NaT, '5 days', '1 hours']) - + to_timedelta(['7 seconds', _NaT, '4 hours'])) + + # These should not overflow! + exp = TimedeltaIndex([pd.NaT]) + result = to_timedelta([pd.NaT]) - Timedelta('1 days') + tm.assert_index_equal(result, exp) + + exp = TimedeltaIndex(['4 days', pd.NaT]) + result = to_timedelta(['5 days', pd.NaT]) - Timedelta('1 days') + tm.assert_index_equal(result, exp) + + exp = TimedeltaIndex([pd.NaT, pd.NaT, '5 hours']) + result = (to_timedelta([pd.NaT, '5 days', '1 hours']) + + to_timedelta(['7 seconds', pd.NaT, '4 hours'])) + tm.assert_index_equal(result, exp) diff --git a/pandas/tests/indexes/timedeltas/test_partial_slicing.py b/pandas/tests/indexes/timedeltas/test_partial_slicing.py new file mode 100644 index 0000000000000..8e5eae2a7a3ef --- /dev/null +++ b/pandas/tests/indexes/timedeltas/test_partial_slicing.py @@ -0,0 +1,83 @@ +import pytest + +import numpy as np +import pandas.util.testing as tm + +import pandas as pd +from pandas import Series, timedelta_range, Timedelta +from pandas.util.testing import assert_series_equal + + +class TestSlicing(object): + + def test_partial_slice(self): + rng = timedelta_range('1 day 10:11:12', freq='h', periods=500) + s = Series(np.arange(len(rng)), index=rng) + + result = s['5 day':'6 day'] + expected = s.iloc[86:134] + assert_series_equal(result, expected) + + result = s['5 day':] + expected = s.iloc[86:] + assert_series_equal(result, expected) + + result = s[:'6 day'] + expected = s.iloc[:134] + assert_series_equal(result, expected) + + result = s['6 days, 23:11:12'] + assert result == s.iloc[133] + + pytest.raises(KeyError, s.__getitem__, '50 days') + + def test_partial_slice_high_reso(self): + + # higher reso + rng = timedelta_range('1 day 10:11:12', freq='us', periods=2000) + s = Series(np.arange(len(rng)), index=rng) + + result = s['1 day 10:11:12':] + expected = s.iloc[0:] + assert_series_equal(result, expected) + + result = s['1 day 10:11:12.001':] + expected = s.iloc[1000:] + assert_series_equal(result, expected) + + result = s['1 days, 10:11:12.001001'] + assert result == s.iloc[1001] + + def test_slice_with_negative_step(self): + ts = Series(np.arange(20), timedelta_range('0', periods=20, freq='H')) + SLC = pd.IndexSlice + + def assert_slices_equivalent(l_slc, i_slc): + assert_series_equal(ts[l_slc], ts.iloc[i_slc]) + assert_series_equal(ts.loc[l_slc], ts.iloc[i_slc]) + assert_series_equal(ts.loc[l_slc], ts.iloc[i_slc]) + + assert_slices_equivalent(SLC[Timedelta(hours=7)::-1], SLC[7::-1]) + assert_slices_equivalent(SLC['7 hours'::-1], SLC[7::-1]) + + assert_slices_equivalent(SLC[:Timedelta(hours=7):-1], SLC[:6:-1]) + assert_slices_equivalent(SLC[:'7 hours':-1], SLC[:6:-1]) + + assert_slices_equivalent(SLC['15 hours':'7 hours':-1], SLC[15:6:-1]) + assert_slices_equivalent(SLC[Timedelta(hours=15):Timedelta(hours=7):- + 1], SLC[15:6:-1]) + assert_slices_equivalent(SLC['15 hours':Timedelta(hours=7):-1], + SLC[15:6:-1]) + assert_slices_equivalent(SLC[Timedelta(hours=15):'7 hours':-1], + SLC[15:6:-1]) + + assert_slices_equivalent(SLC['7 hours':'15 hours':-1], SLC[:0]) + + def test_slice_with_zero_step_raises(self): + ts = Series(np.arange(20), timedelta_range('0', periods=20, freq='H')) + tm.assert_raises_regex(ValueError, 'slice step cannot be zero', + lambda: ts[::0]) + tm.assert_raises_regex(ValueError, 'slice step cannot be zero', + lambda: ts.loc[::0]) + 
tm.assert_raises_regex(ValueError, 'slice step cannot be zero', + lambda: ts.loc[::0]) diff --git a/pandas/tests/indexes/timedeltas/test_setops.py b/pandas/tests/indexes/timedeltas/test_setops.py new file mode 100644 index 0000000000000..22546d25273a7 --- /dev/null +++ b/pandas/tests/indexes/timedeltas/test_setops.py @@ -0,0 +1,76 @@ +import numpy as np + +import pandas as pd +import pandas.util.testing as tm +from pandas import TimedeltaIndex, timedelta_range, Int64Index + + +class TestTimedeltaIndex(object): + _multiprocess_can_split_ = True + + def test_union(self): + + i1 = timedelta_range('1day', periods=5) + i2 = timedelta_range('3day', periods=5) + result = i1.union(i2) + expected = timedelta_range('1day', periods=7) + tm.assert_index_equal(result, expected) + + i1 = Int64Index(np.arange(0, 20, 2)) + i2 = TimedeltaIndex(start='1 day', periods=10, freq='D') + i1.union(i2) # Works + i2.union(i1) # Fails with "AttributeError: can't set attribute" + + def test_union_coverage(self): + + idx = TimedeltaIndex(['3d', '1d', '2d']) + ordered = TimedeltaIndex(idx.sort_values(), freq='infer') + result = ordered.union(idx) + tm.assert_index_equal(result, ordered) + + result = ordered[:0].union(ordered) + tm.assert_index_equal(result, ordered) + assert result.freq == ordered.freq + + def test_union_bug_1730(self): + + rng_a = timedelta_range('1 day', periods=4, freq='3H') + rng_b = timedelta_range('1 day', periods=4, freq='4H') + + result = rng_a.union(rng_b) + exp = TimedeltaIndex(sorted(set(list(rng_a)) | set(list(rng_b)))) + tm.assert_index_equal(result, exp) + + def test_union_bug_1745(self): + + left = TimedeltaIndex(['1 day 15:19:49.695000']) + right = TimedeltaIndex(['2 day 13:04:21.322000', + '1 day 15:27:24.873000', + '1 day 15:31:05.350000']) + + result = left.union(right) + exp = TimedeltaIndex(sorted(set(list(left)) | set(list(right)))) + tm.assert_index_equal(result, exp) + + def test_union_bug_4564(self): + + left = timedelta_range("1 day", "30d") + right = left + pd.offsets.Minute(15) + + result = left.union(right) + exp = TimedeltaIndex(sorted(set(list(left)) | set(list(right)))) + tm.assert_index_equal(result, exp) + + def test_intersection_bug_1708(self): + index_1 = timedelta_range('1 day', periods=4, freq='h') + index_2 = index_1 + pd.offsets.Hour(5) + + result = index_1 & index_2 + assert len(result) == 0 + + index_1 = timedelta_range('1 day', periods=4, freq='h') + index_2 = index_1 + pd.offsets.Hour(1) + + result = index_1 & index_2 + expected = timedelta_range('1 day 01:00:00', periods=3, freq='h') + tm.assert_index_equal(result, expected) diff --git a/pandas/tests/indexes/timedeltas/test_timedelta.py b/pandas/tests/indexes/timedeltas/test_timedelta.py new file mode 100644 index 0000000000000..0b3bd0b03bccf --- /dev/null +++ b/pandas/tests/indexes/timedeltas/test_timedelta.py @@ -0,0 +1,609 @@ +import pytest + +import numpy as np +from datetime import timedelta + +import pandas as pd +import pandas.util.testing as tm +from pandas import (timedelta_range, date_range, Series, Timedelta, + DatetimeIndex, TimedeltaIndex, Index, DataFrame, + Int64Index) +from pandas.util.testing import (assert_almost_equal, assert_series_equal, + assert_index_equal) + +from ..datetimelike import DatetimeLike + +randn = np.random.randn + + +class TestTimedeltaIndex(DatetimeLike): + _holder = TimedeltaIndex + _multiprocess_can_split_ = True + + def setup_method(self, method): + self.indices = dict(index=tm.makeTimedeltaIndex(10)) + self.setup_indices() + + def create_index(self): + return 
pd.to_timedelta(range(5), unit='d') + pd.offsets.Hour(1) + + def test_shift(self): + # test shift for TimedeltaIndex + # err8083 + + drange = self.create_index() + result = drange.shift(1) + expected = TimedeltaIndex(['1 days 01:00:00', '2 days 01:00:00', + '3 days 01:00:00', + '4 days 01:00:00', '5 days 01:00:00'], + freq='D') + tm.assert_index_equal(result, expected) + + result = drange.shift(3, freq='2D 1s') + expected = TimedeltaIndex(['6 days 01:00:03', '7 days 01:00:03', + '8 days 01:00:03', '9 days 01:00:03', + '10 days 01:00:03'], freq='D') + tm.assert_index_equal(result, expected) + + def test_get_loc(self): + idx = pd.to_timedelta(['0 days', '1 days', '2 days']) + + for method in [None, 'pad', 'backfill', 'nearest']: + assert idx.get_loc(idx[1], method) == 1 + assert idx.get_loc(idx[1].to_pytimedelta(), method) == 1 + assert idx.get_loc(str(idx[1]), method) == 1 + + assert idx.get_loc(idx[1], 'pad', + tolerance=pd.Timedelta(0)) == 1 + assert idx.get_loc(idx[1], 'pad', + tolerance=np.timedelta64(0, 's')) == 1 + assert idx.get_loc(idx[1], 'pad', + tolerance=timedelta(0)) == 1 + + with tm.assert_raises_regex(ValueError, 'must be convertible'): + idx.get_loc(idx[1], method='nearest', tolerance='foo') + + for method, loc in [('pad', 1), ('backfill', 2), ('nearest', 1)]: + assert idx.get_loc('1 day 1 hour', method) == loc + + # GH 16909 + assert idx.get_loc(idx[1].to_timedelta64()) == 1 + + # GH 16896 + assert idx.get_loc('0 days') == 0 + + def test_get_loc_nat(self): + tidx = TimedeltaIndex(['1 days 01:00:00', 'NaT', '2 days 01:00:00']) + + assert tidx.get_loc(pd.NaT) == 1 + assert tidx.get_loc(None) == 1 + assert tidx.get_loc(float('nan')) == 1 + assert tidx.get_loc(np.nan) == 1 + + def test_get_indexer(self): + idx = pd.to_timedelta(['0 days', '1 days', '2 days']) + tm.assert_numpy_array_equal(idx.get_indexer(idx), + np.array([0, 1, 2], dtype=np.intp)) + + target = pd.to_timedelta(['-1 hour', '12 hours', '1 day 1 hour']) + tm.assert_numpy_array_equal(idx.get_indexer(target, 'pad'), + np.array([-1, 0, 1], dtype=np.intp)) + tm.assert_numpy_array_equal(idx.get_indexer(target, 'backfill'), + np.array([0, 1, 2], dtype=np.intp)) + tm.assert_numpy_array_equal(idx.get_indexer(target, 'nearest'), + np.array([0, 1, 1], dtype=np.intp)) + + res = idx.get_indexer(target, 'nearest', + tolerance=pd.Timedelta('1 hour')) + tm.assert_numpy_array_equal(res, np.array([0, -1, 1], dtype=np.intp)) + + def test_numeric_compat(self): + + idx = self._holder(np.arange(5, dtype='int64')) + didx = self._holder(np.arange(5, dtype='int64') ** 2) + result = idx * 1 + tm.assert_index_equal(result, idx) + + result = 1 * idx + tm.assert_index_equal(result, idx) + + result = idx / 1 + tm.assert_index_equal(result, idx) + + result = idx // 1 + tm.assert_index_equal(result, idx) + + result = idx * np.array(5, dtype='int64') + tm.assert_index_equal(result, + self._holder(np.arange(5, dtype='int64') * 5)) + + result = idx * np.arange(5, dtype='int64') + tm.assert_index_equal(result, didx) + + result = idx * Series(np.arange(5, dtype='int64')) + tm.assert_index_equal(result, didx) + + result = idx * Series(np.arange(5, dtype='float64') + 0.1) + tm.assert_index_equal(result, self._holder(np.arange( + 5, dtype='float64') * (np.arange(5, dtype='float64') + 0.1))) + + # invalid + pytest.raises(TypeError, lambda: idx * idx) + pytest.raises(ValueError, lambda: idx * self._holder(np.arange(3))) + pytest.raises(ValueError, lambda: idx * np.array([1, 2])) + + def test_pickle_compat_construction(self): + pass + + def 
test_ufunc_coercions(self): + # normal ops are also tested in tseries/test_timedeltas.py + idx = TimedeltaIndex(['2H', '4H', '6H', '8H', '10H'], + freq='2H', name='x') + + for result in [idx * 2, np.multiply(idx, 2)]: + assert isinstance(result, TimedeltaIndex) + exp = TimedeltaIndex(['4H', '8H', '12H', '16H', '20H'], + freq='4H', name='x') + tm.assert_index_equal(result, exp) + assert result.freq == '4H' + + for result in [idx / 2, np.divide(idx, 2)]: + assert isinstance(result, TimedeltaIndex) + exp = TimedeltaIndex(['1H', '2H', '3H', '4H', '5H'], + freq='H', name='x') + tm.assert_index_equal(result, exp) + assert result.freq == 'H' + + idx = TimedeltaIndex(['2H', '4H', '6H', '8H', '10H'], + freq='2H', name='x') + for result in [-idx, np.negative(idx)]: + assert isinstance(result, TimedeltaIndex) + exp = TimedeltaIndex(['-2H', '-4H', '-6H', '-8H', '-10H'], + freq='-2H', name='x') + tm.assert_index_equal(result, exp) + assert result.freq == '-2H' + + idx = TimedeltaIndex(['-2H', '-1H', '0H', '1H', '2H'], + freq='H', name='x') + for result in [abs(idx), np.absolute(idx)]: + assert isinstance(result, TimedeltaIndex) + exp = TimedeltaIndex(['2H', '1H', '0H', '1H', '2H'], + freq=None, name='x') + tm.assert_index_equal(result, exp) + assert result.freq is None + + def test_fillna_timedelta(self): + # GH 11343 + idx = pd.TimedeltaIndex(['1 day', pd.NaT, '3 day']) + + exp = pd.TimedeltaIndex(['1 day', '2 day', '3 day']) + tm.assert_index_equal(idx.fillna(pd.Timedelta('2 day')), exp) + + exp = pd.TimedeltaIndex(['1 day', '3 hour', '3 day']) + tm.assert_index_equal(idx.fillna(pd.Timedelta('3 hour')), exp) + + exp = pd.Index( + [pd.Timedelta('1 day'), 'x', pd.Timedelta('3 day')], dtype=object) + tm.assert_index_equal(idx.fillna('x'), exp) + + def test_difference_freq(self): + # GH14323: Difference of TimedeltaIndex should not preserve frequency + + index = timedelta_range("0 days", "5 days", freq="D") + + other = timedelta_range("1 days", "4 days", freq="D") + expected = TimedeltaIndex(["0 days", "5 days"], freq=None) + idx_diff = index.difference(other) + tm.assert_index_equal(idx_diff, expected) + tm.assert_attr_equal('freq', idx_diff, expected) + + other = timedelta_range("2 days", "5 days", freq="D") + idx_diff = index.difference(other) + expected = TimedeltaIndex(["0 days", "1 days"], freq=None) + tm.assert_index_equal(idx_diff, expected) + tm.assert_attr_equal('freq', idx_diff, expected) + + def test_take(self): + + tds = ['1day 02:00:00', '1 day 04:00:00', '1 day 10:00:00'] + idx = TimedeltaIndex(start='1d', end='2d', freq='H', name='idx') + expected = TimedeltaIndex(tds, freq=None, name='idx') + + taken1 = idx.take([2, 4, 10]) + taken2 = idx[[2, 4, 10]] + + for taken in [taken1, taken2]: + tm.assert_index_equal(taken, expected) + assert isinstance(taken, TimedeltaIndex) + assert taken.freq is None + assert taken.name == expected.name + + def test_take_fill_value(self): + # GH 12631 + idx = pd.TimedeltaIndex(['1 days', '2 days', '3 days'], + name='xxx') + result = idx.take(np.array([1, 0, -1])) + expected = pd.TimedeltaIndex(['2 days', '1 days', '3 days'], + name='xxx') + tm.assert_index_equal(result, expected) + + # fill_value + result = idx.take(np.array([1, 0, -1]), fill_value=True) + expected = pd.TimedeltaIndex(['2 days', '1 days', 'NaT'], + name='xxx') + tm.assert_index_equal(result, expected) + + # allow_fill=False + result = idx.take(np.array([1, 0, -1]), allow_fill=False, + fill_value=True) + expected = pd.TimedeltaIndex(['2 days', '1 days', '3 days'], + name='xxx') + tm.assert_index_equal(result, expected) +
+ msg = ('When allow_fill=True and fill_value is not None, ' + 'all indices must be >= -1') + with tm.assert_raises_regex(ValueError, msg): + idx.take(np.array([1, 0, -2]), fill_value=True) + with tm.assert_raises_regex(ValueError, msg): + idx.take(np.array([1, 0, -5]), fill_value=True) + + with pytest.raises(IndexError): + idx.take(np.array([1, -5])) + + def test_isin(self): + + index = tm.makeTimedeltaIndex(4) + result = index.isin(index) + assert result.all() + + result = index.isin(list(index)) + assert result.all() + + assert_almost_equal(index.isin([index[2], 5]), + np.array([False, False, True, False])) + + def test_factorize(self): + idx1 = TimedeltaIndex(['1 day', '1 day', '2 day', '2 day', '3 day', + '3 day']) + + exp_arr = np.array([0, 0, 1, 1, 2, 2], dtype=np.intp) + exp_idx = TimedeltaIndex(['1 day', '2 day', '3 day']) + + arr, idx = idx1.factorize() + tm.assert_numpy_array_equal(arr, exp_arr) + tm.assert_index_equal(idx, exp_idx) + + arr, idx = idx1.factorize(sort=True) + tm.assert_numpy_array_equal(arr, exp_arr) + tm.assert_index_equal(idx, exp_idx) + + # freq must be preserved + idx3 = timedelta_range('1 day', periods=4, freq='s') + exp_arr = np.array([0, 1, 2, 3], dtype=np.intp) + arr, idx = idx3.factorize() + tm.assert_numpy_array_equal(arr, exp_arr) + tm.assert_index_equal(idx, idx3) + + def test_join_self(self): + + index = timedelta_range('1 day', periods=10) + kinds = 'outer', 'inner', 'left', 'right' + for kind in kinds: + joined = index.join(index, how=kind) + tm.assert_index_equal(index, joined) + + def test_slice_keeps_name(self): + + # GH4226 + dr = pd.timedelta_range('1d', '5d', freq='H', name='timebucket') + assert dr[1:].name == dr.name + + def test_does_not_convert_mixed_integer(self): + df = tm.makeCustomDataframe(10, 10, + data_gen_f=lambda *args, **kwargs: randn(), + r_idx_type='i', c_idx_type='td') + str(df) + + cols = df.columns.join(df.index, how='outer') + joined = cols.join(df.columns) + assert cols.dtype == np.dtype('O') + assert cols.dtype == joined.dtype + tm.assert_index_equal(cols, joined) + + def test_sort_values(self): + + idx = TimedeltaIndex(['4d', '1d', '2d']) + + ordered = idx.sort_values() + assert ordered.is_monotonic + + ordered = idx.sort_values(ascending=False) + assert ordered[::-1].is_monotonic + + ordered, dexer = idx.sort_values(return_indexer=True) + assert ordered.is_monotonic + + tm.assert_numpy_array_equal(dexer, np.array([1, 2, 0]), + check_dtype=False) + + ordered, dexer = idx.sort_values(return_indexer=True, ascending=False) + assert ordered[::-1].is_monotonic + + tm.assert_numpy_array_equal(dexer, np.array([0, 2, 1]), + check_dtype=False) + + def test_get_duplicates(self): + idx = TimedeltaIndex(['1 day', '2 day', '2 day', '3 day', '3day', + '4day']) + + result = idx.get_duplicates() + ex = TimedeltaIndex(['2 day', '3day']) + tm.assert_index_equal(result, ex) + + def test_argmin_argmax(self): + idx = TimedeltaIndex(['1 day 00:00:05', '1 day 00:00:01', + '1 day 00:00:02']) + assert idx.argmin() == 1 + assert idx.argmax() == 0 + + def test_misc_coverage(self): + + rng = timedelta_range('1 day', periods=5) + result = rng.groupby(rng.days) + assert isinstance(list(result.values())[0][0], Timedelta) + + idx = TimedeltaIndex(['3d', '1d', '2d']) + assert not idx.equals(list(idx)) + + non_td = Index(list('abc')) + assert not idx.equals(list(non_td)) + + def test_map(self): + + rng = timedelta_range('1 day', periods=10) + + f = lambda x: x.days + result = rng.map(f) + exp = Int64Index([f(x) for x in rng]) + 
tm.assert_index_equal(result, exp) + + def test_comparisons_nat(self): + + tdidx1 = pd.TimedeltaIndex(['1 day', pd.NaT, '1 day 00:00:01', pd.NaT, + '1 day 00:00:01', '5 day 00:00:03']) + tdidx2 = pd.TimedeltaIndex(['2 day', '2 day', pd.NaT, pd.NaT, + '1 day 00:00:02', '5 days 00:00:03']) + tdarr = np.array([np.timedelta64(2, 'D'), + np.timedelta64(2, 'D'), np.timedelta64('nat'), + np.timedelta64('nat'), + np.timedelta64(1, 'D') + np.timedelta64(2, 's'), + np.timedelta64(5, 'D') + np.timedelta64(3, 's')]) + + cases = [(tdidx1, tdidx2), (tdidx1, tdarr)] + + # Check pd.NaT is handled the same as np.nan + for idx1, idx2 in cases: + + result = idx1 < idx2 + expected = np.array([True, False, False, False, True, False]) + tm.assert_numpy_array_equal(result, expected) + + result = idx2 > idx1 + expected = np.array([True, False, False, False, True, False]) + tm.assert_numpy_array_equal(result, expected) + + result = idx1 <= idx2 + expected = np.array([True, False, False, False, True, True]) + tm.assert_numpy_array_equal(result, expected) + + result = idx2 >= idx1 + expected = np.array([True, False, False, False, True, True]) + tm.assert_numpy_array_equal(result, expected) + + result = idx1 == idx2 + expected = np.array([False, False, False, False, False, True]) + tm.assert_numpy_array_equal(result, expected) + + result = idx1 != idx2 + expected = np.array([True, True, True, True, True, False]) + tm.assert_numpy_array_equal(result, expected) + + def test_comparisons_coverage(self): + rng = timedelta_range('1 days', periods=10) + + result = rng < rng[3] + exp = np.array([True, True, True] + [False] * 7) + tm.assert_numpy_array_equal(result, exp) + + # raise TypeError for now + pytest.raises(TypeError, rng.__lt__, rng[3].value) + + result = rng == list(rng) + exp = rng == rng + tm.assert_numpy_array_equal(result, exp) + + def test_total_seconds(self): + # GH 10939 + # test index + rng = timedelta_range('1 days, 10:11:12.100123456', periods=2, + freq='s') + expt = [1 * 86400 + 10 * 3600 + 11 * 60 + 12 + 100123456. / 1e9, + 1 * 86400 + 10 * 3600 + 11 * 60 + 13 + 100123456. / 1e9] + tm.assert_almost_equal(rng.total_seconds(), Index(expt)) + + # test Series + s = Series(rng) + s_expt = Series(expt, index=[0, 1]) + tm.assert_series_equal(s.dt.total_seconds(), s_expt) + + # with nat + s[1] = np.nan + s_expt = Series([1 * 86400 + 10 * 3600 + 11 * 60 + + 12 + 100123456.
/ 1e9, np.nan], index=[0, 1]) + tm.assert_series_equal(s.dt.total_seconds(), s_expt) + + # with both nat + s = Series([np.nan, np.nan], dtype='timedelta64[ns]') + tm.assert_series_equal(s.dt.total_seconds(), + Series([np.nan, np.nan], index=[0, 1])) + + def test_pass_TimedeltaIndex_to_index(self): + + rng = timedelta_range('1 days', '10 days') + idx = Index(rng, dtype=object) + + expected = Index(rng.to_pytimedelta(), dtype=object) + + tm.assert_numpy_array_equal(idx.values, expected.values) + + def test_pickle(self): + + rng = timedelta_range('1 days', periods=10) + rng_p = tm.round_trip_pickle(rng) + tm.assert_index_equal(rng, rng_p) + + def test_hash_error(self): + index = timedelta_range('1 days', periods=10) + with tm.assert_raises_regex(TypeError, "unhashable type: %r" % + type(index).__name__): + hash(index) + + def test_append_join_nondatetimeindex(self): + rng = timedelta_range('1 days', periods=10) + idx = Index(['a', 'b', 'c', 'd']) + + result = rng.append(idx) + assert isinstance(result[0], Timedelta) + + # it works + rng.join(idx, how='outer') + + def test_append_numpy_bug_1681(self): + + td = timedelta_range('1 days', '10 days', freq='2D') + a = DataFrame() + c = DataFrame({'A': 'foo', 'B': td}, index=td) + str(c) + + result = a.append(c) + assert (result['B'] == td).all() + + def test_fields(self): + rng = timedelta_range('1 days, 10:11:12.100123456', periods=2, + freq='s') + tm.assert_index_equal(rng.days, Index([1, 1], dtype='int64')) + tm.assert_index_equal( + rng.seconds, + Index([10 * 3600 + 11 * 60 + 12, 10 * 3600 + 11 * 60 + 13], + dtype='int64')) + tm.assert_index_equal( + rng.microseconds, + Index([100 * 1000 + 123, 100 * 1000 + 123], dtype='int64')) + tm.assert_index_equal(rng.nanoseconds, + Index([456, 456], dtype='int64')) + + pytest.raises(AttributeError, lambda: rng.hours) + pytest.raises(AttributeError, lambda: rng.minutes) + pytest.raises(AttributeError, lambda: rng.milliseconds) + + # with nat + s = Series(rng) + s[1] = np.nan + + tm.assert_series_equal(s.dt.days, Series([1, np.nan], index=[0, 1])) + tm.assert_series_equal(s.dt.seconds, Series( + [10 * 3600 + 11 * 60 + 12, np.nan], index=[0, 1])) + + # preserve name (GH15589) + rng.name = 'name' + assert rng.days.name == 'name' + + def test_freq_conversion(self): + + # doc example + + # series + td = Series(date_range('20130101', periods=4)) - \ + Series(date_range('20121201', periods=4)) + td[2] += timedelta(minutes=5, seconds=3) + td[3] = np.nan + + result = td / np.timedelta64(1, 'D') + expected = Series([31, 31, (31 * 86400 + 5 * 60 + 3) / 86400.0, np.nan + ]) + assert_series_equal(result, expected) + + result = td.astype('timedelta64[D]') + expected = Series([31, 31, 31, np.nan]) + assert_series_equal(result, expected) + + result = td / np.timedelta64(1, 's') + expected = Series([31 * 86400, 31 * 86400, 31 * 86400 + 5 * 60 + 3, + np.nan]) + assert_series_equal(result, expected) + + result = td.astype('timedelta64[s]') + assert_series_equal(result, expected) + + # tdi + td = TimedeltaIndex(td) + + result = td / np.timedelta64(1, 'D') + expected = Index([31, 31, (31 * 86400 + 5 * 60 + 3) / 86400.0, np.nan]) + assert_index_equal(result, expected) + + result = td.astype('timedelta64[D]') + expected = Index([31, 31, 31, np.nan]) + assert_index_equal(result, expected) + + result = td / np.timedelta64(1, 's') + expected = Index([31 * 86400, 31 * 86400, 31 * 86400 + 5 * 60 + 3, + np.nan]) + assert_index_equal(result, expected) + + result = td.astype('timedelta64[s]') + assert_index_equal(result, expected) + + 
+class TestSlicing(object): + @pytest.mark.parametrize('freq', ['B', 'D']) + def test_timedelta(self, freq): + index = date_range('1/1/2000', periods=50, freq=freq) + + shifted = index + timedelta(1) + back = shifted + timedelta(-1) + tm.assert_index_equal(index, back) + + if freq == 'D': + expected = pd.tseries.offsets.Day(1) + assert index.freq == expected + assert shifted.freq == expected + assert back.freq == expected + else: # freq == 'B' + assert index.freq == pd.tseries.offsets.BusinessDay(1) + assert shifted.freq is None + assert back.freq == pd.tseries.offsets.BusinessDay(1) + + result = index - timedelta(1) + expected = index + timedelta(-1) + tm.assert_index_equal(result, expected) + + # GH4134, buggy with timedeltas + rng = date_range('2013', '2014') + s = Series(rng) + result1 = rng - pd.offsets.Hour(1) + result2 = DatetimeIndex(s - np.timedelta64(100000000)) + result3 = rng - np.timedelta64(100000000) + result4 = DatetimeIndex(s - pd.offsets.Hour(1)) + tm.assert_index_equal(result1, result4) + tm.assert_index_equal(result2, result3) + + +class TestTimeSeries(object): + _multiprocess_can_split_ = True + + def test_series_box_timedelta(self): + rng = timedelta_range('1 day 1 s', periods=5, freq='h') + s = Series(rng) + assert isinstance(s[1], Timedelta) + assert isinstance(s.iat[2], Timedelta) diff --git a/pandas/tests/indexes/timedeltas/test_timedelta_range.py b/pandas/tests/indexes/timedeltas/test_timedelta_range.py new file mode 100644 index 0000000000000..4732a0ce110de --- /dev/null +++ b/pandas/tests/indexes/timedeltas/test_timedelta_range.py @@ -0,0 +1,51 @@ +import numpy as np + +import pandas as pd +import pandas.util.testing as tm +from pandas.tseries.offsets import Day, Second +from pandas import to_timedelta, timedelta_range +from pandas.util.testing import assert_frame_equal + + +class TestTimedeltas(object): + _multiprocess_can_split_ = True + + def test_timedelta_range(self): + + expected = to_timedelta(np.arange(5), unit='D') + result = timedelta_range('0 days', periods=5, freq='D') + tm.assert_index_equal(result, expected) + + expected = to_timedelta(np.arange(11), unit='D') + result = timedelta_range('0 days', '10 days', freq='D') + tm.assert_index_equal(result, expected) + + expected = to_timedelta(np.arange(5), unit='D') + Second(2) + Day() + result = timedelta_range('1 days, 00:00:02', '5 days, 00:00:02', + freq='D') + tm.assert_index_equal(result, expected) + + expected = to_timedelta([1, 3, 5, 7, 9], unit='D') + Second(2) + result = timedelta_range('1 days, 00:00:02', periods=5, freq='2D') + tm.assert_index_equal(result, expected) + + expected = to_timedelta(np.arange(50), unit='T') * 30 + result = timedelta_range('0 days', freq='30T', periods=50) + tm.assert_index_equal(result, expected) + + # GH 11776 + arr = np.arange(10).reshape(2, 5) + df = pd.DataFrame(np.arange(10).reshape(2, 5)) + for arg in (arr, df): + with tm.assert_raises_regex(TypeError, "1-d array"): + to_timedelta(arg) + for errors in ['ignore', 'raise', 'coerce']: + with tm.assert_raises_regex(TypeError, "1-d array"): + to_timedelta(arg, errors=errors) + + # issue10583 + df = pd.DataFrame(np.random.normal(size=(10, 4))) + df.index = pd.timedelta_range(start='0s', periods=10, freq='s') + expected = df.loc[pd.Timedelta('0s'):, :] + result = df.loc['0s':, :] + assert_frame_equal(expected, result) diff --git a/pandas/tests/indexes/timedeltas/test_tools.py b/pandas/tests/indexes/timedeltas/test_tools.py new file mode 100644 index 0000000000000..1a4d1b1d7abaa --- /dev/null +++ 
b/pandas/tests/indexes/timedeltas/test_tools.py @@ -0,0 +1,201 @@ +import pytest + +from datetime import time, timedelta +import numpy as np + +import pandas as pd +import pandas.util.testing as tm +from pandas.util.testing import assert_series_equal +from pandas import (Series, Timedelta, to_timedelta, isna, + TimedeltaIndex) +from pandas._libs.tslib import iNaT + + +class TestTimedeltas(object): + _multiprocess_can_split_ = True + + def test_to_timedelta(self): + def conv(v): + return v.astype('m8[ns]') + + d1 = np.timedelta64(1, 'D') + + assert (to_timedelta('1 days 06:05:01.00003', box=False) == + conv(d1 + np.timedelta64(6 * 3600 + 5 * 60 + 1, 's') + + np.timedelta64(30, 'us'))) + assert (to_timedelta('15.5us', box=False) == + conv(np.timedelta64(15500, 'ns'))) + + # empty string + result = to_timedelta('', box=False) + assert result.astype('int64') == iNaT + + result = to_timedelta(['', '']) + assert isna(result).all() + + # pass thru + result = to_timedelta(np.array([np.timedelta64(1, 's')])) + expected = pd.Index(np.array([np.timedelta64(1, 's')])) + tm.assert_index_equal(result, expected) + + # ints + result = np.timedelta64(0, 'ns') + expected = to_timedelta(0, box=False) + assert result == expected + + # Series + expected = Series([timedelta(days=1), timedelta(days=1, seconds=1)]) + result = to_timedelta(Series(['1d', '1days 00:00:01'])) + tm.assert_series_equal(result, expected) + + # with units + result = TimedeltaIndex([np.timedelta64(0, 'ns'), np.timedelta64( + 10, 's').astype('m8[ns]')]) + expected = to_timedelta([0, 10], unit='s') + tm.assert_index_equal(result, expected) + + # single element conversion + v = timedelta(seconds=1) + result = to_timedelta(v, box=False) + expected = np.timedelta64(timedelta(seconds=1)) + assert result == expected + + v = np.timedelta64(timedelta(seconds=1)) + result = to_timedelta(v, box=False) + expected = np.timedelta64(timedelta(seconds=1)) + assert result == expected + + # arrays of various dtypes + arr = np.array([1] * 5, dtype='int64') + result = to_timedelta(arr, unit='s') + expected = TimedeltaIndex([np.timedelta64(1, 's')] * 5) + tm.assert_index_equal(result, expected) + + arr = np.array([1] * 5, dtype='int64') + result = to_timedelta(arr, unit='m') + expected = TimedeltaIndex([np.timedelta64(1, 'm')] * 5) + tm.assert_index_equal(result, expected) + + arr = np.array([1] * 5, dtype='int64') + result = to_timedelta(arr, unit='h') + expected = TimedeltaIndex([np.timedelta64(1, 'h')] * 5) + tm.assert_index_equal(result, expected) + + arr = np.array([1] * 5, dtype='timedelta64[s]') + result = to_timedelta(arr) + expected = TimedeltaIndex([np.timedelta64(1, 's')] * 5) + tm.assert_index_equal(result, expected) + + arr = np.array([1] * 5, dtype='timedelta64[D]') + result = to_timedelta(arr) + expected = TimedeltaIndex([np.timedelta64(1, 'D')] * 5) + tm.assert_index_equal(result, expected) + + # Test with lists as input when box=false + expected = np.array(np.arange(3) * 1000000000, dtype='timedelta64[ns]') + result = to_timedelta(range(3), unit='s', box=False) + tm.assert_numpy_array_equal(expected, result) + + result = to_timedelta(np.arange(3), unit='s', box=False) + tm.assert_numpy_array_equal(expected, result) + + result = to_timedelta([0, 1, 2], unit='s', box=False) + tm.assert_numpy_array_equal(expected, result) + + # Tests with fractional seconds as input: + expected = np.array( + [0, 500000000, 800000000, 1200000000], dtype='timedelta64[ns]') + result = to_timedelta([0., 0.5, 0.8, 1.2], unit='s', box=False) + 
tm.assert_numpy_array_equal(expected, result) + + def test_to_timedelta_invalid(self): + + # bad value for errors parameter + msg = "errors must be one of" + tm.assert_raises_regex(ValueError, msg, to_timedelta, + ['foo'], errors='never') + + # these will error + pytest.raises(ValueError, lambda: to_timedelta([1, 2], unit='foo')) + pytest.raises(ValueError, lambda: to_timedelta(1, unit='foo')) + + # time not supported ATM + pytest.raises(ValueError, lambda: to_timedelta(time(second=1))) + assert to_timedelta(time(second=1), errors='coerce') is pd.NaT + + pytest.raises(ValueError, lambda: to_timedelta(['foo', 'bar'])) + tm.assert_index_equal(TimedeltaIndex([pd.NaT, pd.NaT]), + to_timedelta(['foo', 'bar'], errors='coerce')) + + tm.assert_index_equal(TimedeltaIndex(['1 day', pd.NaT, '1 min']), + to_timedelta(['1 day', 'bar', '1 min'], + errors='coerce')) + + # gh-13613: these should not error because errors='ignore' + invalid_data = 'apple' + assert invalid_data == to_timedelta(invalid_data, errors='ignore') + + invalid_data = ['apple', '1 days'] + tm.assert_numpy_array_equal( + np.array(invalid_data, dtype=object), + to_timedelta(invalid_data, errors='ignore')) + + invalid_data = pd.Index(['apple', '1 days']) + tm.assert_index_equal(invalid_data, to_timedelta( + invalid_data, errors='ignore')) + + invalid_data = Series(['apple', '1 days']) + tm.assert_series_equal(invalid_data, to_timedelta( + invalid_data, errors='ignore')) + + def test_to_timedelta_via_apply(self): + # GH 5458 + expected = Series([np.timedelta64(1, 's')]) + result = Series(['00:00:01']).apply(to_timedelta) + tm.assert_series_equal(result, expected) + + result = Series([to_timedelta('00:00:01')]) + tm.assert_series_equal(result, expected) + + def test_to_timedelta_on_missing_values(self): + # GH5438 + timedelta_NaT = np.timedelta64('NaT') + + actual = pd.to_timedelta(Series(['00:00:01', np.nan])) + expected = Series([np.timedelta64(1000000000, 'ns'), + timedelta_NaT], dtype=' obj.ndim - 1: + return + + def _print(result, error=None): + if error is not None: + error = str(error) + v = ("%-16.16s [%-16.16s]: [typ->%-8.8s,obj->%-8.8s," + "key1->(%-4.4s),key2->(%-4.4s),axis->%s] %s" % + (name, result, t, o, method1, method2, a, error or '')) + if _verbose: + pprint_thing(v) + + try: + rs = getattr(obj, method1).__getitem__(_axify(obj, k1, a)) + + try: + xp = self.get_result(obj, method2, k2, a) + except: + result = 'no comp' + _print(result) + return + + detail = None + + try: + if is_scalar(rs) and is_scalar(xp): + assert rs == xp + elif xp.ndim == 1: + tm.assert_series_equal(rs, xp) + elif xp.ndim == 2: + tm.assert_frame_equal(rs, xp) + elif xp.ndim == 3: + tm.assert_panel_equal(rs, xp) + result = 'ok' + except AssertionError as e: + detail = str(e) + result = 'fail' + + # reverse the checks + if fails is True: + if result == 'fail': + result = 'ok (fail)' + + _print(result) + if not result.startswith('ok'): + raise AssertionError(detail) + + except AssertionError: + raise + except Exception as detail: + + # if we are in fails, then ok; otherwise raise it + if fails is not None: + if isinstance(detail, fails): + result = 'ok (%s)' % type(detail).__name__ + _print(result) + return + + result = type(detail).__name__ + raise AssertionError(_print(result, error=detail)) + + if typs is None: + typs = self._typs + + if objs is None: + objs = self._objs + + if axes is not None: + if not isinstance(axes, (tuple, list)): + axes = [axes] + else: + axes = list(axes) + else: + axes = [0, 1, 2] + + # check + for o in objs: + if o not in
self._objs: + continue + + d = getattr(self, o) + for a in axes: + for t in typs: + if t not in self._typs: + continue + + obj = d[t] + if obj is None: + continue + + def _call(obj=obj): + obj = obj.copy() + + k2 = key2 + _eq(t, o, a, obj, key1, k2) + + # Panel deprecations + if isinstance(obj, Panel): + with catch_warnings(record=True): + _call() + else: + _call() diff --git a/pandas/tests/indexing/test_callable.py b/pandas/tests/indexing/test_callable.py index ab225f72934ce..95b406517be62 100644 --- a/pandas/tests/indexing/test_callable.py +++ b/pandas/tests/indexing/test_callable.py @@ -1,15 +1,12 @@ # -*- coding: utf-8 -*- # pylint: disable-msg=W0612,E1101 -import nose import numpy as np import pandas as pd import pandas.util.testing as tm -class TestIndexingCallable(tm.TestCase): - - _multiprocess_can_split_ = True +class TestIndexingCallable(object): def test_frame_loc_ix_callable(self): # GH 11485 @@ -62,10 +59,10 @@ def test_frame_loc_ix_callable(self): # scalar res = df.loc[lambda x: 1, lambda x: 'A'] - self.assertEqual(res, df.loc[1, 'A']) + assert res == df.loc[1, 'A'] res = df.loc[lambda x: 1, lambda x: 'A'] - self.assertEqual(res, df.loc[1, 'A']) + assert res == df.loc[1, 'A'] def test_frame_loc_ix_callable_mixture(self): # GH 11485 @@ -268,8 +265,3 @@ def test_frame_iloc_callable_setitem(self): exp = df.copy() exp.iloc[[1, 3], [0]] = [-5, -5] tm.assert_frame_equal(res, exp) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/indexing/test_categorical.py b/pandas/tests/indexing/test_categorical.py index b8a24cb2dcb03..6874fedaa705f 100644 --- a/pandas/tests/indexing/test_categorical.py +++ b/pandas/tests/indexing/test_categorical.py @@ -1,15 +1,18 @@ # -*- coding: utf-8 -*- +import pytest + import pandas as pd import numpy as np -from pandas import Series, DataFrame +from pandas import (Series, DataFrame, Timestamp, + Categorical, CategoricalIndex) from pandas.util.testing import assert_series_equal, assert_frame_equal from pandas.util import testing as tm -class TestCategoricalIndex(tm.TestCase): +class TestCategoricalIndex(object): - def setUp(self): + def setup_method(self, method): self.df = DataFrame({'A': np.arange(6, dtype='int64'), 'B': Series(list('aabbca')).astype( @@ -47,22 +50,33 @@ def test_loc_scalar(self): assert_frame_equal(df, expected) # value not in the categories - self.assertRaises(KeyError, lambda: df.loc['d']) + pytest.raises(KeyError, lambda: df.loc['d']) def f(): df.loc['d'] = 10 - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) def f(): df.loc['d', 'A'] = 10 - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) def f(): df.loc['d', 'C'] = 10 - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) + + def test_getitem_scalar(self): + + cats = Categorical([Timestamp('12-31-1999'), + Timestamp('12-31-2000')]) + + s = Series([1, 2], index=cats) + + expected = s.iloc[0] + result = s[cats[0]] + assert result == expected def test_loc_listlike(self): @@ -72,70 +86,70 @@ def test_loc_listlike(self): assert_frame_equal(result, expected, check_index_type=True) result = self.df2.loc[['a', 'b', 'e']] - exp_index = pd.CategoricalIndex( + exp_index = CategoricalIndex( list('aaabbe'), categories=list('cabe'), name='B') expected = DataFrame({'A': [0, 1, 5, 2, 3, np.nan]}, index=exp_index) assert_frame_equal(result, expected, check_index_type=True) # element in the categories but not in the values - self.assertRaises(KeyError, lambda: 
self.df2.loc['e']) + pytest.raises(KeyError, lambda: self.df2.loc['e']) # assign is ok df = self.df2.copy() df.loc['e'] = 20 result = df.loc[['a', 'b', 'e']] - exp_index = pd.CategoricalIndex( + exp_index = CategoricalIndex( list('aaabbe'), categories=list('cabe'), name='B') expected = DataFrame({'A': [0, 1, 5, 2, 3, 20]}, index=exp_index) assert_frame_equal(result, expected) df = self.df2.copy() result = df.loc[['a', 'b', 'e']] - exp_index = pd.CategoricalIndex( + exp_index = CategoricalIndex( list('aaabbe'), categories=list('cabe'), name='B') expected = DataFrame({'A': [0, 1, 5, 2, 3, np.nan]}, index=exp_index) assert_frame_equal(result, expected, check_index_type=True) # not all labels in the categories - self.assertRaises(KeyError, lambda: self.df2.loc[['a', 'd']]) + pytest.raises(KeyError, lambda: self.df2.loc[['a', 'd']]) def test_loc_listlike_dtypes(self): # GH 11586 # unique categories and codes - index = pd.CategoricalIndex(['a', 'b', 'c']) + index = CategoricalIndex(['a', 'b', 'c']) df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=index) # unique slice res = df.loc[['a', 'b']] - exp_index = pd.CategoricalIndex(['a', 'b'], - categories=index.categories) + exp_index = CategoricalIndex(['a', 'b'], + categories=index.categories) exp = DataFrame({'A': [1, 2], 'B': [4, 5]}, index=exp_index) tm.assert_frame_equal(res, exp, check_index_type=True) # duplicated slice res = df.loc[['a', 'a', 'b']] - exp_index = pd.CategoricalIndex(['a', 'a', 'b'], - categories=index.categories) + exp_index = CategoricalIndex(['a', 'a', 'b'], + categories=index.categories) exp = DataFrame({'A': [1, 1, 2], 'B': [4, 4, 5]}, index=exp_index) tm.assert_frame_equal(res, exp, check_index_type=True) - with tm.assertRaisesRegexp( + with tm.assert_raises_regex( KeyError, 'a list-indexer must only include values that are ' 'in the categories'): df.loc[['a', 'x']] # duplicated categories and codes - index = pd.CategoricalIndex(['a', 'b', 'a']) + index = CategoricalIndex(['a', 'b', 'a']) df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=index) # unique slice res = df.loc[['a', 'b']] exp = DataFrame({'A': [1, 3, 2], 'B': [4, 6, 5]}, - index=pd.CategoricalIndex(['a', 'a', 'b'])) + index=CategoricalIndex(['a', 'a', 'b'])) tm.assert_frame_equal(res, exp, check_index_type=True) # duplicated slice @@ -143,93 +157,116 @@ def test_loc_listlike_dtypes(self): exp = DataFrame( {'A': [1, 3, 1, 3, 2], 'B': [4, 6, 4, 6, 5 - ]}, index=pd.CategoricalIndex(['a', 'a', 'a', 'a', 'b'])) + ]}, index=CategoricalIndex(['a', 'a', 'a', 'a', 'b'])) tm.assert_frame_equal(res, exp, check_index_type=True) - with tm.assertRaisesRegexp( + with tm.assert_raises_regex( KeyError, 'a list-indexer must only include values ' 'that are in the categories'): df.loc[['a', 'x']] # contains unused category - index = pd.CategoricalIndex( + index = CategoricalIndex( ['a', 'b', 'a', 'c'], categories=list('abcde')) df = DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index=index) res = df.loc[['a', 'b']] - exp = DataFrame({'A': [1, 3, 2], - 'B': [5, 7, 6]}, index=pd.CategoricalIndex( - ['a', 'a', 'b'], categories=list('abcde'))) + exp = DataFrame({'A': [1, 3, 2], 'B': [5, 7, 6]}, + index=CategoricalIndex(['a', 'a', 'b'], + categories=list('abcde'))) tm.assert_frame_equal(res, exp, check_index_type=True) res = df.loc[['a', 'e']] exp = DataFrame({'A': [1, 3, np.nan], 'B': [5, 7, np.nan]}, - index=pd.CategoricalIndex(['a', 'a', 'e'], - categories=list('abcde'))) + index=CategoricalIndex(['a', 'a', 'e'], + categories=list('abcde'))) 
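# [Editor's note: illustrative annotation, not part of the submitted diff.]
# 'e' is a declared category with no occurrences, so .loc-selecting it
# aligns as missing data instead of raising; for example:
#   >>> import pandas as pd
#   >>> idx = pd.CategoricalIndex(['a', 'b', 'a', 'c'],
#   ...                           categories=list('abcde'))
#   >>> pd.DataFrame({'A': [1, 2, 3, 4]}, index=idx).loc[['a', 'e']]['A']
# returns [1.0, 3.0, NaN] (upcast to float to hold the NaN), which is
# exactly the frame the assert just below checks.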
tm.assert_frame_equal(res, exp, check_index_type=True) # duplicated slice res = df.loc[['a', 'a', 'b']] exp = DataFrame({'A': [1, 3, 1, 3, 2], 'B': [5, 7, 5, 7, 6]}, - index=pd.CategoricalIndex(['a', 'a', 'a', 'a', 'b'], - categories=list('abcde'))) + index=CategoricalIndex(['a', 'a', 'a', 'a', 'b'], + categories=list('abcde'))) tm.assert_frame_equal(res, exp, check_index_type=True) - with tm.assertRaisesRegexp( + with tm.assert_raises_regex( KeyError, 'a list-indexer must only include values ' 'that are in the categories'): df.loc[['a', 'x']] + def test_get_indexer_array(self): + arr = np.array([Timestamp('1999-12-31 00:00:00'), + Timestamp('2000-12-31 00:00:00')], dtype=object) + cats = [Timestamp('1999-12-31 00:00:00'), + Timestamp('2000-12-31 00:00:00')] + ci = CategoricalIndex(cats, + categories=cats, + ordered=False, dtype='category') + result = ci.get_indexer(arr) + expected = np.array([0, 1], dtype='intp') + tm.assert_numpy_array_equal(result, expected) + + def test_getitem_with_listlike(self): + # GH 16115 + cats = Categorical([Timestamp('12-31-1999'), + Timestamp('12-31-2000')]) + + expected = DataFrame([[1, 0], [0, 1]], dtype='uint8', + index=[0, 1], columns=cats) + dummies = pd.get_dummies(cats) + result = dummies[[c for c in dummies.columns]] + assert_frame_equal(result, expected) + def test_ix_categorical_index(self): # GH 12531 - df = pd.DataFrame(np.random.randn(3, 3), - index=list('ABC'), columns=list('XYZ')) + df = DataFrame(np.random.randn(3, 3), + index=list('ABC'), columns=list('XYZ')) cdf = df.copy() - cdf.index = pd.CategoricalIndex(df.index) - cdf.columns = pd.CategoricalIndex(df.columns) + cdf.index = CategoricalIndex(df.index) + cdf.columns = CategoricalIndex(df.columns) - expect = pd.Series(df.loc['A', :], index=cdf.columns, name='A') + expect = Series(df.loc['A', :], index=cdf.columns, name='A') assert_series_equal(cdf.loc['A', :], expect) - expect = pd.Series(df.loc[:, 'X'], index=cdf.index, name='X') + expect = Series(df.loc[:, 'X'], index=cdf.index, name='X') assert_series_equal(cdf.loc[:, 'X'], expect) - exp_index = pd.CategoricalIndex(list('AB'), categories=['A', 'B', 'C']) - expect = pd.DataFrame(df.loc[['A', 'B'], :], columns=cdf.columns, - index=exp_index) + exp_index = CategoricalIndex(list('AB'), categories=['A', 'B', 'C']) + expect = DataFrame(df.loc[['A', 'B'], :], columns=cdf.columns, + index=exp_index) assert_frame_equal(cdf.loc[['A', 'B'], :], expect) - exp_columns = pd.CategoricalIndex(list('XY'), - categories=['X', 'Y', 'Z']) - expect = pd.DataFrame(df.loc[:, ['X', 'Y']], index=cdf.index, - columns=exp_columns) + exp_columns = CategoricalIndex(list('XY'), + categories=['X', 'Y', 'Z']) + expect = DataFrame(df.loc[:, ['X', 'Y']], index=cdf.index, + columns=exp_columns) assert_frame_equal(cdf.loc[:, ['X', 'Y']], expect) # non-unique - df = pd.DataFrame(np.random.randn(3, 3), - index=list('ABA'), columns=list('XYX')) + df = DataFrame(np.random.randn(3, 3), + index=list('ABA'), columns=list('XYX')) cdf = df.copy() - cdf.index = pd.CategoricalIndex(df.index) - cdf.columns = pd.CategoricalIndex(df.columns) + cdf.index = CategoricalIndex(df.index) + cdf.columns = CategoricalIndex(df.columns) - exp_index = pd.CategoricalIndex(list('AA'), categories=['A', 'B']) - expect = pd.DataFrame(df.loc['A', :], columns=cdf.columns, - index=exp_index) + exp_index = CategoricalIndex(list('AA'), categories=['A', 'B']) + expect = DataFrame(df.loc['A', :], columns=cdf.columns, + index=exp_index) assert_frame_equal(cdf.loc['A', :], expect) - exp_columns = 
pd.CategoricalIndex(list('XX'), categories=['X', 'Y']) - expect = pd.DataFrame(df.loc[:, 'X'], index=cdf.index, - columns=exp_columns) + exp_columns = CategoricalIndex(list('XX'), categories=['X', 'Y']) + expect = DataFrame(df.loc[:, 'X'], index=cdf.index, + columns=exp_columns) assert_frame_equal(cdf.loc[:, 'X'], expect) - expect = pd.DataFrame(df.loc[['A', 'B'], :], columns=cdf.columns, - index=pd.CategoricalIndex(list('AAB'))) + expect = DataFrame(df.loc[['A', 'B'], :], columns=cdf.columns, + index=CategoricalIndex(list('AAB'))) assert_frame_equal(cdf.loc[['A', 'B'], :], expect) - expect = pd.DataFrame(df.loc[:, ['X', 'Y']], index=cdf.index, - columns=pd.CategoricalIndex(list('XXY'))) + expect = DataFrame(df.loc[:, ['X', 'Y']], index=cdf.index, + columns=CategoricalIndex(list('XXY'))) assert_frame_equal(cdf.loc[:, ['X', 'Y']], expect) def test_read_only_source(self): @@ -279,13 +316,13 @@ def test_reindexing(self): # then return a Categorical cats = list('cabe') - result = self.df2.reindex(pd.Categorical(['a', 'd'], categories=cats)) + result = self.df2.reindex(Categorical(['a', 'd'], categories=cats)) expected = DataFrame({'A': [0, 1, 5, np.nan], 'B': Series(list('aaad')).astype( 'category', categories=cats)}).set_index('B') assert_frame_equal(result, expected, check_index_type=True) - result = self.df2.reindex(pd.Categorical(['a'], categories=cats)) + result = self.df2.reindex(Categorical(['a'], categories=cats)) expected = DataFrame({'A': [0, 1, 5], 'B': Series(list('aaa')).astype( 'category', categories=cats)}).set_index('B') @@ -307,7 +344,7 @@ def test_reindexing(self): assert_frame_equal(result, expected, check_index_type=True) # give back the type of categorical that we received - result = self.df2.reindex(pd.Categorical( + result = self.df2.reindex(Categorical( ['a', 'd'], categories=cats, ordered=True)) expected = DataFrame( {'A': [0, 1, 5, np.nan], @@ -315,7 +352,7 @@ def test_reindexing(self): ordered=True)}).set_index('B') assert_frame_equal(result, expected, check_index_type=True) - result = self.df2.reindex(pd.Categorical( + result = self.df2.reindex(Categorical( ['a', 'd'], categories=['a', 'd'])) expected = DataFrame({'A': [0, 1, 5, np.nan], 'B': Series(list('aaad')).astype( @@ -324,22 +361,22 @@ def test_reindexing(self): assert_frame_equal(result, expected, check_index_type=True) # passed duplicate indexers are not allowed - self.assertRaises(ValueError, lambda: self.df2.reindex(['a', 'a'])) + pytest.raises(ValueError, lambda: self.df2.reindex(['a', 'a'])) # args NotImplemented ATM - self.assertRaises(NotImplementedError, - lambda: self.df2.reindex(['a'], method='ffill')) - self.assertRaises(NotImplementedError, - lambda: self.df2.reindex(['a'], level=1)) - self.assertRaises(NotImplementedError, - lambda: self.df2.reindex(['a'], limit=2)) + pytest.raises(NotImplementedError, + lambda: self.df2.reindex(['a'], method='ffill')) + pytest.raises(NotImplementedError, + lambda: self.df2.reindex(['a'], level=1)) + pytest.raises(NotImplementedError, + lambda: self.df2.reindex(['a'], limit=2)) def test_loc_slice(self): # slicing # not implemented ATM # GH9748 - self.assertRaises(TypeError, lambda: self.df.loc[1:5]) + pytest.raises(TypeError, lambda: self.df.loc[1:5]) # result = df.loc[1:5] # expected = df.iloc[[1,2,3,4]] @@ -387,8 +424,8 @@ def test_boolean_selection(self): # categories=[3, 2, 1], # ordered=False, # name=u'B') - self.assertRaises(TypeError, lambda: df4[df4.index < 2]) - self.assertRaises(TypeError, lambda: df4[df4.index > 1]) + pytest.raises(TypeError, lambda: 
df4[df4.index < 2]) + pytest.raises(TypeError, lambda: df4[df4.index > 1]) def test_indexing_with_category(self): diff --git a/pandas/tests/indexing/test_chaining_and_caching.py b/pandas/tests/indexing/test_chaining_and_caching.py new file mode 100644 index 0000000000000..25e572ee09a6b --- /dev/null +++ b/pandas/tests/indexing/test_chaining_and_caching.py @@ -0,0 +1,420 @@ +from warnings import catch_warnings + +import pytest + +import numpy as np +import pandas as pd +from pandas.core import common as com +from pandas import (compat, DataFrame, option_context, + Series, MultiIndex, date_range, Timestamp) +from pandas.util import testing as tm + + +class TestCaching(object): + + def test_slice_consolidate_invalidate_item_cache(self): + + # this is chained assignment, but will 'work' + with option_context('chained_assignment', None): + + # #3970 + df = DataFrame({"aa": compat.lrange(5), "bb": [2.2] * 5}) + + # Creates a second float block + df["cc"] = 0.0 + + # caches a reference to the 'bb' series + df["bb"] + + # repr machinery triggers consolidation + repr(df) + + # Assignment to wrong series + df['bb'].iloc[0] = 0.17 + df._clear_item_cache() + tm.assert_almost_equal(df['bb'][0], 0.17) + + def test_setitem_cache_updating(self): + # GH 5424 + cont = ['one', 'two', 'three', 'four', 'five', 'six', 'seven'] + + for do_ref in [False, True]: + df = DataFrame({'a': cont, + "b": cont[3:] + cont[:3], + 'c': np.arange(7)}) + + # ref the cache + if do_ref: + df.loc[0, "c"] + + # set it + df.loc[7, 'c'] = 1 + + assert df.loc[0, 'c'] == 0.0 + assert df.loc[7, 'c'] == 1.0 + + # GH 7084 + # not updating cache on series setting with slices + expected = DataFrame({'A': [600, 600, 600]}, + index=date_range('5/7/2014', '5/9/2014')) + out = DataFrame({'A': [0, 0, 0]}, + index=date_range('5/7/2014', '5/9/2014')) + df = DataFrame({'C': ['A', 'A', 'A'], 'D': [100, 200, 300]}) + + # loop through df to update out + six = Timestamp('5/7/2014') + eix = Timestamp('5/9/2014') + for ix, row in df.iterrows(): + out.loc[six:eix, row['C']] = out.loc[six:eix, row['C']] + row['D'] + + tm.assert_frame_equal(out, expected) + tm.assert_series_equal(out['A'], expected['A']) + + # try via chained indexing + # this actually works + out = DataFrame({'A': [0, 0, 0]}, + index=date_range('5/7/2014', '5/9/2014')) + for ix, row in df.iterrows(): + v = out[row['C']][six:eix] + row['D'] + out[row['C']][six:eix] = v + + tm.assert_frame_equal(out, expected) + tm.assert_series_equal(out['A'], expected['A']) + + out = DataFrame({'A': [0, 0, 0]}, + index=date_range('5/7/2014', '5/9/2014')) + for ix, row in df.iterrows(): + out.loc[six:eix, row['C']] += row['D'] + + tm.assert_frame_equal(out, expected) + tm.assert_series_equal(out['A'], expected['A']) + + +class TestChaining(object): + + def test_setitem_chained_setfault(self): + + # GH6026 + # setfaults under numpy 1.7.1 (ok on 1.8) + data = ['right', 'left', 'left', 'left', 'right', 'left', 'timeout'] + mdata = ['right', 'left', 'left', 'left', 'right', 'left', 'none'] + + df = DataFrame({'response': np.array(data)}) + mask = df.response == 'timeout' + df.response[mask] = 'none' + tm.assert_frame_equal(df, DataFrame({'response': mdata})) + + recarray = np.rec.fromarrays([data], names=['response']) + df = DataFrame(recarray) + mask = df.response == 'timeout' + df.response[mask] = 'none' + tm.assert_frame_equal(df, DataFrame({'response': mdata})) + + df = DataFrame({'response': data, 'response1': data}) + mask = df.response == 'timeout' + df.response[mask] = 'none' +
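# [Editor's note: illustrative annotation, not part of the submitted diff.]
# Under the pre-copy-on-write semantics exercised here, df.response is
# the cached 'response' column, so this chained write mutates df in
# place: the masked 'timeout' entry becomes 'none' while 'response1'
# keeps `data` unchanged, which is what the comparison below asserts.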
tm.assert_frame_equal(df, DataFrame({'response': mdata, + 'response1': data})) + + # GH 6056 + expected = DataFrame(dict(A=[np.nan, 'bar', 'bah', 'foo', 'bar'])) + df = DataFrame(dict(A=np.array(['foo', 'bar', 'bah', 'foo', 'bar']))) + df['A'].iloc[0] = np.nan + result = df.head() + tm.assert_frame_equal(result, expected) + + df = DataFrame(dict(A=np.array(['foo', 'bar', 'bah', 'foo', 'bar']))) + df.A.iloc[0] = np.nan + result = df.head() + tm.assert_frame_equal(result, expected) + + def test_detect_chained_assignment(self): + + pd.set_option('chained_assignment', 'raise') + + # work with the chain + expected = DataFrame([[-5, 1], [-6, 3]], columns=list('AB')) + df = DataFrame(np.arange(4).reshape(2, 2), + columns=list('AB'), dtype='int64') + assert df.is_copy is None + + df['A'][0] = -5 + df['A'][1] = -6 + tm.assert_frame_equal(df, expected) + + # test with the chaining + df = DataFrame({'A': Series(range(2), dtype='int64'), + 'B': np.array(np.arange(2, 4), dtype=np.float64)}) + assert df.is_copy is None + + with pytest.raises(com.SettingWithCopyError): + df['A'][0] = -5 + + with pytest.raises(com.SettingWithCopyError): + df['A'][1] = np.nan + + assert df['A'].is_copy is None + + # Using a copy (the chain), fails + df = DataFrame({'A': Series(range(2), dtype='int64'), + 'B': np.array(np.arange(2, 4), dtype=np.float64)}) + + with pytest.raises(com.SettingWithCopyError): + df.loc[0]['A'] = -5 + + # Doc example + df = DataFrame({'a': ['one', 'one', 'two', 'three', + 'two', 'one', 'six'], + 'c': Series(range(7), dtype='int64')}) + assert df.is_copy is None + + with pytest.raises(com.SettingWithCopyError): + indexer = df.a.str.startswith('o') + df[indexer]['c'] = 42 + + expected = DataFrame({'A': [111, 'bbb', 'ccc'], 'B': [1, 2, 3]}) + df = DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]}) + + with pytest.raises(com.SettingWithCopyError): + df['A'][0] = 111 + + with pytest.raises(com.SettingWithCopyError): + df.loc[0]['A'] = 111 + + df.loc[0, 'A'] = 111 + tm.assert_frame_equal(df, expected) + + # gh-5475: Make sure that is_copy is picked up reconstruction + df = DataFrame({"A": [1, 2]}) + assert df.is_copy is None + + with tm.ensure_clean('__tmp__pickle') as path: + df.to_pickle(path) + df2 = pd.read_pickle(path) + df2["B"] = df2["A"] + df2["B"] = df2["A"] + + # gh-5597: a spurious raise as we are setting the entire column here + from string import ascii_letters as letters + + def random_text(nobs=100): + df = [] + for i in range(nobs): + idx = np.random.randint(len(letters), size=2) + idx.sort() + + df.append([letters[idx[0]:idx[1]]]) + + return DataFrame(df, columns=['letters']) + + df = random_text(100000) + + # Always a copy + x = df.iloc[[0, 1, 2]] + assert x.is_copy is not None + + x = df.iloc[[0, 1, 2, 4]] + assert x.is_copy is not None + + # Explicitly copy + indexer = df.letters.apply(lambda x: len(x) > 10) + df = df.loc[indexer].copy() + + assert df.is_copy is None + df['letters'] = df['letters'].apply(str.lower) + + # Implicitly take + df = random_text(100000) + indexer = df.letters.apply(lambda x: len(x) > 10) + df = df.loc[indexer] + + assert df.is_copy is not None + df['letters'] = df['letters'].apply(str.lower) + + # Implicitly take 2 + df = random_text(100000) + indexer = df.letters.apply(lambda x: len(x) > 10) + + df = df.loc[indexer] + assert df.is_copy is not None + df.loc[:, 'letters'] = df['letters'].apply(str.lower) + + # Should be ok even though it's a copy! 
+ assert df.is_copy is None + + df['letters'] = df['letters'].apply(str.lower) + assert df.is_copy is None + + df = random_text(100000) + indexer = df.letters.apply(lambda x: len(x) > 10) + df.loc[indexer, 'letters'] = ( + df.loc[indexer, 'letters'].apply(str.lower)) + + # an identical take, so no copy + df = DataFrame({'a': [1]}).dropna() + assert df.is_copy is None + df['a'] += 1 + + # Inplace ops, originally from: + # http://stackoverflow.com/questions/20508968/series-fillna-in-a-multiindex-dataframe-does-not-fill-is-this-a-bug + a = [12, 23] + b = [123, None] + c = [1234, 2345] + d = [12345, 23456] + tuples = [('eyes', 'left'), ('eyes', 'right'), ('ears', 'left'), + ('ears', 'right')] + events = {('eyes', 'left'): a, + ('eyes', 'right'): b, + ('ears', 'left'): c, + ('ears', 'right'): d} + multiind = MultiIndex.from_tuples(tuples, names=['part', 'side']) + zed = DataFrame(events, index=['a', 'b'], columns=multiind) + + with pytest.raises(com.SettingWithCopyError): + zed['eyes']['right'].fillna(value=555, inplace=True) + + df = DataFrame(np.random.randn(10, 4)) + s = df.iloc[:, 0].sort_values() + + tm.assert_series_equal(s, df.iloc[:, 0].sort_values()) + tm.assert_series_equal(s, df[0].sort_values()) + + # see gh-6025: false positives + df = DataFrame({'column1': ['a', 'a', 'a'], 'column2': [4, 8, 9]}) + str(df) + + df['column1'] = df['column1'] + 'b' + str(df) + + df = df[df['column2'] != 8] + str(df) + + df['column1'] = df['column1'] + 'c' + str(df) + + # from SO: + # http://stackoverflow.com/questions/24054495/potential-bug-setting-value-for-undefined-column-using-iloc + df = DataFrame(np.arange(0, 9), columns=['count']) + df['group'] = 'b' + + with pytest.raises(com.SettingWithCopyError): + df.iloc[0:5]['group'] = 'a' + + # Mixed type setting but same dtype & changing dtype + df = DataFrame(dict(A=date_range('20130101', periods=5), + B=np.random.randn(5), + C=np.arange(5, dtype='int64'), + D=list('abcde'))) + + with pytest.raises(com.SettingWithCopyError): + df.loc[2]['D'] = 'foo' + + with pytest.raises(com.SettingWithCopyError): + df.loc[2]['C'] = 'foo' + + with pytest.raises(com.SettingWithCopyError): + df['C'][2] = 'foo' + + def test_setting_with_copy_bug(self): + + # operating on a copy + df = pd.DataFrame({'a': list(range(4)), + 'b': list('ab..'), + 'c': ['a', 'b', np.nan, 'd']}) + mask = pd.isna(df.c) + + def f(): + df[['c']][mask] = df[['b']][mask] + + pytest.raises(com.SettingWithCopyError, f) + + # invalid warning as we are returning a new object + # GH 8730 + df1 = DataFrame({'x': Series(['a', 'b', 'c']), + 'y': Series(['d', 'e', 'f'])}) + df2 = df1[['x']] + + # this should not raise + df2['y'] = ['g', 'h', 'i'] + + def test_detect_chained_assignment_warnings(self): + + # warnings + with option_context('chained_assignment', 'warn'): + df = DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]}) + with tm.assert_produces_warning( + expected_warning=com.SettingWithCopyWarning): + df.loc[0]['A'] = 111 + + def test_chained_getitem_with_lists(self): + + # GH6394 + # Regression in chained getitem indexing with embedded list-like from + # 0.12 + def check(result, expected): + tm.assert_numpy_array_equal(result, expected) + assert isinstance(result, np.ndarray) + + df = DataFrame({'A': 5 * [np.zeros(3)], 'B': 5 * [np.ones(3)]}) + expected = df['A'].iloc[2] + result = df.loc[2, 'A'] + check(result, expected) + result2 = df.iloc[2]['A'] + check(result2, expected) + result3 = df['A'].loc[2] + check(result3, expected) + result4 = df['A'].iloc[2] + check(result4, expected) + + def 
test_cache_updating(self): + # GH 4939, make sure to update the cache on setitem + + df = tm.makeDataFrame() + df['A'] # cache series + with catch_warnings(record=True): + df.ix["Hello Friend"] = df.ix[0] + assert "Hello Friend" in df['A'].index + assert "Hello Friend" in df['B'].index + + with catch_warnings(record=True): + panel = tm.makePanel() + panel.ix[0] # get first item into cache + panel.ix[:, :, 'A+1'] = panel.ix[:, :, 'A'] + 1 + assert "A+1" in panel.ix[0].columns + assert "A+1" in panel.ix[1].columns + + # 5216 + # make sure that we don't try to set a dead cache + a = np.random.rand(10, 3) + df = DataFrame(a, columns=['x', 'y', 'z']) + tuples = [(i, j) for i in range(5) for j in range(2)] + index = MultiIndex.from_tuples(tuples) + df.index = index + + # setting via chained assignment + # but actually works, since everything is a view + df.loc[0]['z'].iloc[0] = 1. + result = df.loc[(0, 0), 'z'] + assert result == 1 + + # correct setting + df.loc[(0, 0), 'z'] = 2 + result = df.loc[(0, 0), 'z'] + assert result == 2 + + # 10264 + df = DataFrame(np.zeros((5, 5), dtype='int64'), columns=[ + 'a', 'b', 'c', 'd', 'e'], index=range(5)) + df['f'] = 0 + df.f.values[3] = 1 + + # TODO(wesm): unused? + # y = df.iloc[np.arange(2, len(df))] + + df.f.values[3] = 2 + expected = DataFrame(np.zeros((5, 6), dtype='int64'), columns=[ + 'a', 'b', 'c', 'd', 'e', 'f'], index=range(5)) + expected.at[3, 'f'] = 2 + tm.assert_frame_equal(df, expected) + expected = Series([0, 0, 0, 2, 0], name='f') + tm.assert_series_equal(df.f, expected) diff --git a/pandas/tests/indexing/test_coercion.py b/pandas/tests/indexing/test_coercion.py index 0cfa7258461f1..752d2deb53304 100644 --- a/pandas/tests/indexing/test_coercion.py +++ b/pandas/tests/indexing/test_coercion.py @@ -1,6 +1,6 @@ # -*- coding: utf-8 -*- -import nose +import pytest import numpy as np import pandas as pd @@ -15,8 +15,6 @@ class CoercionBase(object): - _multiprocess_can_split_ = True - klasses = ['index', 'series'] dtypes = ['object', 'int64', 'float64', 'complex128', 'bool', 'datetime64', 'datetime64tz', 'timedelta64', 'period'] @@ -33,8 +31,8 @@ def _assert(self, left, right, dtype): tm.assert_index_equal(left, right) else: raise NotImplementedError - self.assertEqual(left.dtype, dtype) - self.assertEqual(right.dtype, dtype) + assert left.dtype == dtype + assert right.dtype == dtype def test_has_comprehensive_tests(self): for klass in self.klasses: @@ -46,7 +44,7 @@ def test_has_comprehensive_tests(self): raise AssertionError(msg.format(type(self), method_name)) -class TestSetitemCoercion(CoercionBase, tm.TestCase): +class TestSetitemCoercion(CoercionBase): method = 'setitem' @@ -57,7 +55,7 @@ def _assert_setitem_series_conversion(self, original_series, loc_value, temp[1] = loc_value tm.assert_series_equal(temp, expected_series) # check dtype explicitly for sure - self.assertEqual(temp.dtype, expected_dtype) + assert temp.dtype == expected_dtype # .loc works different rule, temporary disable # temp = original_series.copy() @@ -66,7 +64,7 @@ def _assert_setitem_series_conversion(self, original_series, loc_value, def test_setitem_series_object(self): obj = pd.Series(list('abcd')) - self.assertEqual(obj.dtype, np.object) + assert obj.dtype == np.object # object + int -> object exp = pd.Series(['a', 1, 'c', 'd']) @@ -86,7 +84,7 @@ def test_setitem_series_object(self): def test_setitem_series_int64(self): obj = pd.Series([1, 2, 3, 4]) - self.assertEqual(obj.dtype, np.int64) + assert obj.dtype == np.int64 # int + int -> int exp = pd.Series([1, 1, 3, 4]) 
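# [Editor's note: sketch of the rule change the following hunks encode,
# inferred from the updated expectations; not an added test.]
# Setting a bool into a numeric Series no longer coerces the bool to a
# number; the result now upcasts to object instead, e.g.:
#   >>> import numpy as np, pandas as pd
#   >>> s = pd.Series([1, 2, 3, 4])
#   >>> s[1] = True          # previously stored as 1, keeping int64
#   >>> s.dtype == np.object_
#   True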
@@ -95,7 +93,7 @@ def test_setitem_series_int64(self): # int + float -> float # TODO_GH12747 The result must be float # tm.assert_series_equal(temp, pd.Series([1, 1.1, 3, 4])) - # self.assertEqual(temp.dtype, np.float64) + # assert temp.dtype == np.float64 exp = pd.Series([1, 1, 3, 4]) self._assert_setitem_series_conversion(obj, 1.1, exp, np.int64) @@ -103,13 +101,26 @@ def test_setitem_series_int64(self): exp = pd.Series([1, 1 + 1j, 3, 4]) self._assert_setitem_series_conversion(obj, 1 + 1j, exp, np.complex128) - # int + bool -> int - exp = pd.Series([1, 1, 3, 4]) - self._assert_setitem_series_conversion(obj, True, exp, np.int64) + # int + bool -> object + exp = pd.Series([1, True, 3, 4]) + self._assert_setitem_series_conversion(obj, True, exp, np.object) + + def test_setitem_series_int8(self): + # integer dtype coercion (no change) + obj = pd.Series([1, 2, 3, 4], dtype=np.int8) + assert obj.dtype == np.int8 + + exp = pd.Series([1, 1, 3, 4], dtype=np.int8) + self._assert_setitem_series_conversion(obj, np.int32(1), exp, np.int8) + + # BUG: it must be Series([1, 1, 3, 4], dtype=np.int16) + exp = pd.Series([1, 0, 3, 4], dtype=np.int8) + self._assert_setitem_series_conversion(obj, np.int16(2**9), exp, + np.int8) def test_setitem_series_float64(self): obj = pd.Series([1.1, 2.2, 3.3, 4.4]) - self.assertEqual(obj.dtype, np.float64) + assert obj.dtype == np.float64 # float + int -> float exp = pd.Series([1.1, 1.0, 3.3, 4.4]) @@ -124,17 +135,17 @@ def test_setitem_series_float64(self): self._assert_setitem_series_conversion(obj, 1 + 1j, exp, np.complex128) - # float + bool -> float - exp = pd.Series([1.1, 1.0, 3.3, 4.4]) - self._assert_setitem_series_conversion(obj, True, exp, np.float64) + # float + bool -> object + exp = pd.Series([1.1, True, 3.3, 4.4]) + self._assert_setitem_series_conversion(obj, True, exp, np.object) def test_setitem_series_complex128(self): obj = pd.Series([1 + 1j, 2 + 2j, 3 + 3j, 4 + 4j]) - self.assertEqual(obj.dtype, np.complex128) + assert obj.dtype == np.complex128 # complex + int -> complex exp = pd.Series([1 + 1j, 1, 3 + 3j, 4 + 4j]) - self._assert_setitem_series_conversion(obj, True, exp, np.complex128) + self._assert_setitem_series_conversion(obj, 1, exp, np.complex128) # complex + float -> complex exp = pd.Series([1 + 1j, 1.1, 3 + 3j, 4 + 4j]) @@ -144,39 +155,39 @@ def test_setitem_series_complex128(self): exp = pd.Series([1 + 1j, 1 + 1j, 3 + 3j, 4 + 4j]) self._assert_setitem_series_conversion(obj, 1 + 1j, exp, np.complex128) - # complex + bool -> complex - exp = pd.Series([1 + 1j, 1, 3 + 3j, 4 + 4j]) - self._assert_setitem_series_conversion(obj, True, exp, np.complex128) + # complex + bool -> object + exp = pd.Series([1 + 1j, True, 3 + 3j, 4 + 4j]) + self._assert_setitem_series_conversion(obj, True, exp, np.object) def test_setitem_series_bool(self): obj = pd.Series([True, False, True, False]) - self.assertEqual(obj.dtype, np.bool) + assert obj.dtype == np.bool # bool + int -> int # TODO_GH12747 The result must be int # tm.assert_series_equal(temp, pd.Series([1, 1, 1, 0])) - # self.assertEqual(temp.dtype, np.int64) + # assert temp.dtype == np.int64 exp = pd.Series([True, True, True, False]) self._assert_setitem_series_conversion(obj, 1, exp, np.bool) # TODO_GH12747 The result must be int # assigning int greater than bool # tm.assert_series_equal(temp, pd.Series([1, 3, 1, 0])) - # self.assertEqual(temp.dtype, np.int64) + # assert temp.dtype == np.int64 exp = pd.Series([True, True, True, False]) self._assert_setitem_series_conversion(obj, 3, exp, np.bool) # bool + 
float -> float # TODO_GH12747 The result must be float # tm.assert_series_equal(temp, pd.Series([1., 1.1, 1., 0.])) - # self.assertEqual(temp.dtype, np.float64) + # assert temp.dtype == np.float64 exp = pd.Series([True, True, True, False]) self._assert_setitem_series_conversion(obj, 1.1, exp, np.bool) # bool + complex -> complex (buggy, results in bool) # TODO_GH12747 The result must be complex # tm.assert_series_equal(temp, pd.Series([1, 1 + 1j, 1, 0])) - # self.assertEqual(temp.dtype, np.complex128) + # assert temp.dtype == np.complex128 exp = pd.Series([True, True, True, False]) self._assert_setitem_series_conversion(obj, 1 + 1j, exp, np.bool) @@ -189,7 +200,7 @@ def test_setitem_series_datetime64(self): pd.Timestamp('2011-01-02'), pd.Timestamp('2011-01-03'), pd.Timestamp('2011-01-04')]) - self.assertEqual(obj.dtype, 'datetime64[ns]') + assert obj.dtype == 'datetime64[ns]' # datetime64 + datetime64 -> datetime64 exp = pd.Series([pd.Timestamp('2011-01-01'), @@ -200,14 +211,18 @@ def test_setitem_series_datetime64(self): exp, 'datetime64[ns]') # datetime64 + int -> object - # ToDo: The result must be object exp = pd.Series([pd.Timestamp('2011-01-01'), - pd.Timestamp(1), + 1, pd.Timestamp('2011-01-03'), pd.Timestamp('2011-01-04')]) - self._assert_setitem_series_conversion(obj, 1, exp, 'datetime64[ns]') + self._assert_setitem_series_conversion(obj, 1, exp, 'object') - # ToDo: add more tests once the above issue has been fixed + # datetime64 + object -> object + exp = pd.Series([pd.Timestamp('2011-01-01'), + 'x', + pd.Timestamp('2011-01-03'), + pd.Timestamp('2011-01-04')]) + self._assert_setitem_series_conversion(obj, 'x', exp, np.object) def test_setitem_series_datetime64tz(self): tz = 'US/Eastern' @@ -215,7 +230,7 @@ def test_setitem_series_datetime64tz(self): pd.Timestamp('2011-01-02', tz=tz), pd.Timestamp('2011-01-03', tz=tz), pd.Timestamp('2011-01-04', tz=tz)]) - self.assertEqual(obj.dtype, 'datetime64[ns, US/Eastern]') + assert obj.dtype == 'datetime64[ns, US/Eastern]' # datetime64tz + datetime64tz -> datetime64tz exp = pd.Series([pd.Timestamp('2011-01-01', tz=tz), @@ -226,19 +241,59 @@ def test_setitem_series_datetime64tz(self): self._assert_setitem_series_conversion(obj, value, exp, 'datetime64[ns, US/Eastern]') + # datetime64tz + datetime64tz (different tz) -> object + exp = pd.Series([pd.Timestamp('2011-01-01', tz=tz), + pd.Timestamp('2012-01-01', tz='US/Pacific'), + pd.Timestamp('2011-01-03', tz=tz), + pd.Timestamp('2011-01-04', tz=tz)]) + value = pd.Timestamp('2012-01-01', tz='US/Pacific') + self._assert_setitem_series_conversion(obj, value, exp, np.object) + + # datetime64tz + datetime64 -> object + exp = pd.Series([pd.Timestamp('2011-01-01', tz=tz), + pd.Timestamp('2012-01-01'), + pd.Timestamp('2011-01-03', tz=tz), + pd.Timestamp('2011-01-04', tz=tz)]) + value = pd.Timestamp('2012-01-01') + self._assert_setitem_series_conversion(obj, value, exp, np.object) + # datetime64 + int -> object - # ToDo: The result must be object exp = pd.Series([pd.Timestamp('2011-01-01', tz=tz), - pd.Timestamp(1, tz=tz), + 1, pd.Timestamp('2011-01-03', tz=tz), pd.Timestamp('2011-01-04', tz=tz)]) - self._assert_setitem_series_conversion(obj, 1, exp, - 'datetime64[ns, US/Eastern]') + self._assert_setitem_series_conversion(obj, 1, exp, np.object) # ToDo: add more tests once the above issue has been fixed def test_setitem_series_timedelta64(self): - pass + obj = pd.Series([pd.Timedelta('1 day'), + pd.Timedelta('2 day'), + pd.Timedelta('3 day'), + pd.Timedelta('4 day')]) + assert obj.dtype == 
'timedelta64[ns]' + + # timedelta64 + timedelta64 -> timedelta64 + exp = pd.Series([pd.Timedelta('1 day'), + pd.Timedelta('12 day'), + pd.Timedelta('3 day'), + pd.Timedelta('4 day')]) + self._assert_setitem_series_conversion(obj, pd.Timedelta('12 day'), + exp, 'timedelta64[ns]') + + # timedelta64 + int -> object + exp = pd.Series([pd.Timedelta('1 day'), + 1, + pd.Timedelta('3 day'), + pd.Timedelta('4 day')]) + self._assert_setitem_series_conversion(obj, 1, exp, np.object) + + # timedelta64 + object -> object + exp = pd.Series([pd.Timedelta('1 day'), + 'x', + pd.Timedelta('3 day'), + pd.Timedelta('4 day')]) + self._assert_setitem_series_conversion(obj, 'x', exp, np.object) def test_setitem_series_period(self): pass @@ -251,18 +306,18 @@ def _assert_setitem_index_conversion(self, original_series, loc_key, exp = pd.Series([1, 2, 3, 4, 5], index=expected_index) tm.assert_series_equal(temp, exp) # check dtype explicitly for sure - self.assertEqual(temp.index.dtype, expected_dtype) + assert temp.index.dtype == expected_dtype temp = original_series.copy() temp.loc[loc_key] = 5 exp = pd.Series([1, 2, 3, 4, 5], index=expected_index) tm.assert_series_equal(temp, exp) # check dtype explicitly for sure - self.assertEqual(temp.index.dtype, expected_dtype) + assert temp.index.dtype == expected_dtype def test_setitem_index_object(self): obj = pd.Series([1, 2, 3, 4], index=list('abcd')) - self.assertEqual(obj.index.dtype, np.object) + assert obj.index.dtype == np.object # object + object -> object exp_index = pd.Index(list('abcdx')) @@ -270,7 +325,7 @@ def test_setitem_index_object(self): # object + int -> IndexError, regarded as location temp = obj.copy() - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): temp[5] = 5 # object + float -> object @@ -280,7 +335,7 @@ def test_setitem_index_object(self): def test_setitem_index_int64(self): # tests setitem with non-existing numeric key obj = pd.Series([1, 2, 3, 4]) - self.assertEqual(obj.index.dtype, np.int64) + assert obj.index.dtype == np.int64 # int + int -> int exp_index = pd.Index([0, 1, 2, 3, 5]) @@ -297,12 +352,12 @@ def test_setitem_index_int64(self): def test_setitem_index_float64(self): # tests setitem with non-existing numeric key obj = pd.Series([1, 2, 3, 4], index=[1.1, 2.1, 3.1, 4.1]) - self.assertEqual(obj.index.dtype, np.float64) + assert obj.index.dtype == np.float64 # float + int -> int temp = obj.copy() # TODO_GH12747 The result must be float - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): temp[5] = 5 # float + float -> float @@ -332,7 +387,7 @@ def test_setitem_index_period(self): pass -class TestInsertIndexCoercion(CoercionBase, tm.TestCase): +class TestInsertIndexCoercion(CoercionBase): klasses = ['index'] method = 'insert' @@ -343,11 +398,11 @@ def _assert_insert_conversion(self, original, value, target = original.copy() res = target.insert(1, value) tm.assert_index_equal(res, expected) - self.assertEqual(res.dtype, expected_dtype) + assert res.dtype == expected_dtype def test_insert_index_object(self): obj = pd.Index(list('abcd')) - self.assertEqual(obj.dtype, np.object) + assert obj.dtype == np.object # object + int -> object exp = pd.Index(['a', 1, 'b', 'c', 'd']) @@ -360,7 +415,7 @@ def test_insert_index_object(self): # object + bool -> object res = obj.insert(1, False) tm.assert_index_equal(res, pd.Index(['a', False, 'b', 'c', 'd'])) - self.assertEqual(res.dtype, np.object) + assert res.dtype == np.object # object + object -> object exp = pd.Index(['a', 'x', 'b', 'c', 'd']) @@ -368,7 +423,7 
@@ def test_insert_index_object(self): def test_insert_index_int64(self): obj = pd.Int64Index([1, 2, 3, 4]) - self.assertEqual(obj.dtype, np.int64) + assert obj.dtype == np.int64 # int + int -> int exp = pd.Index([1, 1, 2, 3, 4]) @@ -388,7 +443,7 @@ def test_insert_index_int64(self): def test_insert_index_float64(self): obj = pd.Float64Index([1., 2., 3., 4.]) - self.assertEqual(obj.dtype, np.float64) + assert obj.dtype == np.float64 # float + int -> int exp = pd.Index([1., 1., 2., 3., 4.]) @@ -415,7 +470,7 @@ def test_insert_index_bool(self): def test_insert_index_datetime64(self): obj = pd.DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04']) - self.assertEqual(obj.dtype, 'datetime64[ns]') + assert obj.dtype == 'datetime64[ns]' # datetime64 + datetime64 => datetime64 exp = pd.DatetimeIndex(['2011-01-01', '2012-01-01', '2011-01-02', @@ -425,18 +480,18 @@ def test_insert_index_datetime64(self): # ToDo: must coerce to object msg = "Passed item and index have different timezone" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): obj.insert(1, pd.Timestamp('2012-01-01', tz='US/Eastern')) # ToDo: must coerce to object msg = "cannot insert DatetimeIndex with incompatible label" - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): obj.insert(1, 1) def test_insert_index_datetime64tz(self): obj = pd.DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04'], tz='US/Eastern') - self.assertEqual(obj.dtype, 'datetime64[ns, US/Eastern]') + assert obj.dtype == 'datetime64[ns, US/Eastern]' # datetime64tz + datetime64tz => datetime64 exp = pd.DatetimeIndex(['2011-01-01', '2012-01-01', '2011-01-02', @@ -447,22 +502,22 @@ def test_insert_index_datetime64tz(self): # ToDo: must coerce to object msg = "Passed item and index have different timezone" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): obj.insert(1, pd.Timestamp('2012-01-01')) # ToDo: must coerce to object msg = "Passed item and index have different timezone" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): obj.insert(1, pd.Timestamp('2012-01-01', tz='Asia/Tokyo')) # ToDo: must coerce to object msg = "cannot insert DatetimeIndex with incompatible label" - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): obj.insert(1, 1) def test_insert_index_timedelta64(self): obj = pd.TimedeltaIndex(['1 day', '2 day', '3 day', '4 day']) - self.assertEqual(obj.dtype, 'timedelta64[ns]') + assert obj.dtype == 'timedelta64[ns]' # timedelta64 + timedelta64 => timedelta64 exp = pd.TimedeltaIndex(['1 day', '10 day', '2 day', '3 day', '4 day']) @@ -471,18 +526,18 @@ def test_insert_index_timedelta64(self): # ToDo: must coerce to object msg = "cannot insert TimedeltaIndex with incompatible label" - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): obj.insert(1, pd.Timestamp('2012-01-01')) # ToDo: must coerce to object msg = "cannot insert TimedeltaIndex with incompatible label" - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): obj.insert(1, 1) def test_insert_index_period(self): obj = pd.PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04'], freq='M') - self.assertEqual(obj.dtype, 'period[M]') + assert obj.dtype == 'period[M]' # period + period => period exp = pd.PeriodIndex(['2011-01', '2012-01', '2011-02', @@ -516,7 +571,7 @@ def 
test_insert_index_period(self): self._assert_insert_conversion(obj, 'x', exp, np.object) -class TestWhereCoercion(CoercionBase, tm.TestCase): +class TestWhereCoercion(CoercionBase): method = 'where' @@ -529,7 +584,7 @@ def _assert_where_conversion(self, original, cond, values, def _where_object_common(self, klass): obj = klass(list('abcd')) - self.assertEqual(obj.dtype, np.object) + assert obj.dtype == np.object cond = klass([True, False, True, False]) # object + int -> object @@ -582,7 +637,7 @@ def test_where_index_object(self): def _where_int64_common(self, klass): obj = klass([1, 2, 3, 4]) - self.assertEqual(obj.dtype, np.int64) + assert obj.dtype == np.int64 cond = klass([True, False, True, False]) # int + int -> int @@ -612,13 +667,13 @@ def _where_int64_common(self, klass): self._assert_where_conversion(obj, cond, values, exp, np.complex128) - # int + bool -> int - exp = klass([1, 1, 3, 1]) - self._assert_where_conversion(obj, cond, True, exp, np.int64) + # int + bool -> object + exp = klass([1, True, 3, True]) + self._assert_where_conversion(obj, cond, True, exp, np.object) values = klass([True, False, True, True]) - exp = klass([1, 0, 3, 1]) - self._assert_where_conversion(obj, cond, values, exp, np.int64) + exp = klass([1, False, 3, True]) + self._assert_where_conversion(obj, cond, values, exp, np.object) def test_where_series_int64(self): self._where_int64_common(pd.Series) @@ -628,7 +683,7 @@ def test_where_index_int64(self): def _where_float64_common(self, klass): obj = klass([1.1, 2.2, 3.3, 4.4]) - self.assertEqual(obj.dtype, np.float64) + assert obj.dtype == np.float64 cond = klass([True, False, True, False]) # float + int -> float @@ -658,13 +713,13 @@ def _where_float64_common(self, klass): self._assert_where_conversion(obj, cond, values, exp, np.complex128) - # float + bool -> float - exp = klass([1.1, 1.0, 3.3, 1.0]) - self._assert_where_conversion(obj, cond, True, exp, np.float64) + # float + bool -> object + exp = klass([1.1, True, 3.3, True]) + self._assert_where_conversion(obj, cond, True, exp, np.object) values = klass([True, False, True, True]) - exp = klass([1.1, 0.0, 3.3, 1.0]) - self._assert_where_conversion(obj, cond, values, exp, np.float64) + exp = klass([1.1, False, 3.3, True]) + self._assert_where_conversion(obj, cond, values, exp, np.object) def test_where_series_float64(self): self._where_float64_common(pd.Series) @@ -674,7 +729,7 @@ def test_where_index_float64(self): def test_where_series_complex128(self): obj = pd.Series([1 + 1j, 2 + 2j, 3 + 3j, 4 + 4j]) - self.assertEqual(obj.dtype, np.complex128) + assert obj.dtype == np.complex128 cond = pd.Series([True, False, True, False]) # complex + int -> complex @@ -701,45 +756,46 @@ def test_where_series_complex128(self): exp = pd.Series([1 + 1j, 6 + 6j, 3 + 3j, 8 + 8j]) self._assert_where_conversion(obj, cond, values, exp, np.complex128) - # complex + bool -> complex - exp = pd.Series([1 + 1j, 1, 3 + 3j, 1]) - self._assert_where_conversion(obj, cond, True, exp, np.complex128) + # complex + bool -> object + exp = pd.Series([1 + 1j, True, 3 + 3j, True]) + self._assert_where_conversion(obj, cond, True, exp, np.object) values = pd.Series([True, False, True, True]) - exp = pd.Series([1 + 1j, 0, 3 + 3j, 1]) - self._assert_where_conversion(obj, cond, values, exp, np.complex128) + exp = pd.Series([1 + 1j, False, 3 + 3j, True]) + self._assert_where_conversion(obj, cond, values, exp, np.object) def test_where_index_complex128(self): pass def test_where_series_bool(self): + obj = pd.Series([True, False, True, False]) - 
self.assertEqual(obj.dtype, np.bool) + assert obj.dtype == np.bool cond = pd.Series([True, False, True, False]) - # bool + int -> int - exp = pd.Series([1, 1, 1, 1]) - self._assert_where_conversion(obj, cond, 1, exp, np.int64) + # bool + int -> object + exp = pd.Series([True, 1, True, 1]) + self._assert_where_conversion(obj, cond, 1, exp, np.object) values = pd.Series([5, 6, 7, 8]) - exp = pd.Series([1, 6, 1, 8]) - self._assert_where_conversion(obj, cond, values, exp, np.int64) + exp = pd.Series([True, 6, True, 8]) + self._assert_where_conversion(obj, cond, values, exp, np.object) - # bool + float -> float - exp = pd.Series([1.0, 1.1, 1.0, 1.1]) - self._assert_where_conversion(obj, cond, 1.1, exp, np.float64) + # bool + float -> object + exp = pd.Series([True, 1.1, True, 1.1]) + self._assert_where_conversion(obj, cond, 1.1, exp, np.object) values = pd.Series([5.5, 6.6, 7.7, 8.8]) - exp = pd.Series([1.0, 6.6, 1.0, 8.8]) - self._assert_where_conversion(obj, cond, values, exp, np.float64) + exp = pd.Series([True, 6.6, True, 8.8]) + self._assert_where_conversion(obj, cond, values, exp, np.object) - # bool + complex -> complex - exp = pd.Series([1, 1 + 1j, 1, 1 + 1j]) - self._assert_where_conversion(obj, cond, 1 + 1j, exp, np.complex128) + # bool + complex -> object + exp = pd.Series([True, 1 + 1j, True, 1 + 1j]) + self._assert_where_conversion(obj, cond, 1 + 1j, exp, np.object) values = pd.Series([5 + 5j, 6 + 6j, 7 + 7j, 8 + 8j]) - exp = pd.Series([1, 6 + 6j, 1, 8 + 8j]) - self._assert_where_conversion(obj, cond, values, exp, np.complex128) + exp = pd.Series([True, 6 + 6j, True, 8 + 8j]) + self._assert_where_conversion(obj, cond, values, exp, np.object) # bool + bool -> bool exp = pd.Series([True, True, True, True]) @@ -757,7 +813,7 @@ def test_where_series_datetime64(self): pd.Timestamp('2011-01-02'), pd.Timestamp('2011-01-03'), pd.Timestamp('2011-01-04')]) - self.assertEqual(obj.dtype, 'datetime64[ns]') + assert obj.dtype == 'datetime64[ns]' cond = pd.Series([True, False, True, False]) # datetime64 + datetime64 -> datetime64 @@ -778,10 +834,15 @@ def test_where_series_datetime64(self): pd.Timestamp('2012-01-04')]) self._assert_where_conversion(obj, cond, values, exp, 'datetime64[ns]') - # ToDo: coerce to object - msg = "cannot coerce a Timestamp with a tz on a naive Block" - with tm.assertRaisesRegexp(TypeError, msg): - obj.where(cond, pd.Timestamp('2012-01-01', tz='US/Eastern')) + # datetime64 + datetime64tz -> object + exp = pd.Series([pd.Timestamp('2011-01-01'), + pd.Timestamp('2012-01-01', tz='US/Eastern'), + pd.Timestamp('2011-01-03'), + pd.Timestamp('2012-01-01', tz='US/Eastern')]) + self._assert_where_conversion( + obj, cond, + pd.Timestamp('2012-01-01', tz='US/Eastern'), + exp, np.object) # ToDo: do not coerce to UTC, must be object values = pd.Series([pd.Timestamp('2012-01-01', tz='US/Eastern'), @@ -799,13 +860,13 @@ def test_where_index_datetime64(self): pd.Timestamp('2011-01-02'), pd.Timestamp('2011-01-03'), pd.Timestamp('2011-01-04')]) - self.assertEqual(obj.dtype, 'datetime64[ns]') + assert obj.dtype == 'datetime64[ns]' cond = pd.Index([True, False, True, False]) # datetime64 + datetime64 -> datetime64 # must support scalar msg = "cannot coerce a Timestamp with a tz on a naive Block" - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): obj.where(cond, pd.Timestamp('2012-01-01')) values = pd.Index([pd.Timestamp('2012-01-01'), @@ -821,7 +882,7 @@ def test_where_index_datetime64(self): # ToDo: coerce to object msg = ("Index\\(\\.\\.\\.\\) must be called with a 
collection " "of some kind") - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): obj.where(cond, pd.Timestamp('2012-01-01', tz='US/Eastern')) # ToDo: do not ignore timezone, must be object @@ -854,7 +915,7 @@ def test_where_index_period(self): pass -class TestFillnaSeriesCoercion(CoercionBase, tm.TestCase): +class TestFillnaSeriesCoercion(CoercionBase): # not indexing, but place here for consisntency @@ -869,7 +930,7 @@ def _assert_fillna_conversion(self, original, value, def _fillna_object_common(self, klass): obj = klass(['a', np.nan, 'c', 'd']) - self.assertEqual(obj.dtype, np.object) + assert obj.dtype == np.object # object + int -> object exp = klass(['a', 1, 'c', 'd']) @@ -900,9 +961,9 @@ def test_fillna_series_int64(self): def test_fillna_index_int64(self): pass - def _fillna_float64_common(self, klass): + def _fillna_float64_common(self, klass, complex): obj = klass([1.1, np.nan, 3.3, 4.4]) - self.assertEqual(obj.dtype, np.float64) + assert obj.dtype == np.float64 # float + int -> float exp = klass([1.1, 1.0, 3.3, 4.4]) @@ -912,30 +973,25 @@ def _fillna_float64_common(self, klass): exp = klass([1.1, 1.1, 3.3, 4.4]) self._assert_fillna_conversion(obj, 1.1, exp, np.float64) - if klass is pd.Series: - # float + complex -> complex - exp = klass([1.1, 1 + 1j, 3.3, 4.4]) - self._assert_fillna_conversion(obj, 1 + 1j, exp, np.complex128) - elif klass is pd.Index: - # float + complex -> object - exp = klass([1.1, 1 + 1j, 3.3, 4.4]) - self._assert_fillna_conversion(obj, 1 + 1j, exp, np.object) - else: - NotImplementedError + # float + complex -> we don't support a complex Index + # complex for Series, + # object for Index + exp = klass([1.1, 1 + 1j, 3.3, 4.4]) + self._assert_fillna_conversion(obj, 1 + 1j, exp, complex) - # float + bool -> float - exp = klass([1.1, 1.0, 3.3, 4.4]) - self._assert_fillna_conversion(obj, True, exp, np.float64) + # float + bool -> object + exp = klass([1.1, True, 3.3, 4.4]) + self._assert_fillna_conversion(obj, True, exp, np.object) def test_fillna_series_float64(self): - self._fillna_float64_common(pd.Series) + self._fillna_float64_common(pd.Series, complex=np.complex128) def test_fillna_index_float64(self): - self._fillna_float64_common(pd.Index) + self._fillna_float64_common(pd.Index, complex=np.object) def test_fillna_series_complex128(self): obj = pd.Series([1 + 1j, np.nan, 3 + 3j, 4 + 4j]) - self.assertEqual(obj.dtype, np.complex128) + assert obj.dtype == np.complex128 # complex + int -> complex exp = pd.Series([1 + 1j, 1, 3 + 3j, 4 + 4j]) @@ -949,12 +1005,12 @@ def test_fillna_series_complex128(self): exp = pd.Series([1 + 1j, 1 + 1j, 3 + 3j, 4 + 4j]) self._assert_fillna_conversion(obj, 1 + 1j, exp, np.complex128) - # complex + bool -> complex - exp = pd.Series([1 + 1j, 1, 3 + 3j, 4 + 4j]) - self._assert_fillna_conversion(obj, True, exp, np.complex128) + # complex + bool -> object + exp = pd.Series([1 + 1j, True, 3 + 3j, 4 + 4j]) + self._assert_fillna_conversion(obj, True, exp, np.object) def test_fillna_index_complex128(self): - self._fillna_float64_common(pd.Index) + self._fillna_float64_common(pd.Index, complex=np.object) def test_fillna_series_bool(self): # bool can't hold NaN @@ -968,7 +1024,7 @@ def test_fillna_series_datetime64(self): pd.NaT, pd.Timestamp('2011-01-03'), pd.Timestamp('2011-01-04')]) - self.assertEqual(obj.dtype, 'datetime64[ns]') + assert obj.dtype == 'datetime64[ns]' # datetime64 + datetime64 => datetime64 exp = pd.Series([pd.Timestamp('2011-01-01'), @@ -987,12 +1043,11 @@ def 
test_fillna_series_datetime64(self): self._assert_fillna_conversion(obj, value, exp, np.object) # datetime64 + int => object - # ToDo: must be coerced to object exp = pd.Series([pd.Timestamp('2011-01-01'), - pd.Timestamp(1), + 1, pd.Timestamp('2011-01-03'), pd.Timestamp('2011-01-04')]) - self._assert_fillna_conversion(obj, 1, exp, 'datetime64[ns]') + self._assert_fillna_conversion(obj, 1, exp, 'object') # datetime64 + object => object exp = pd.Series([pd.Timestamp('2011-01-01'), @@ -1008,7 +1063,7 @@ def test_fillna_series_datetime64tz(self): pd.NaT, pd.Timestamp('2011-01-03', tz=tz), pd.Timestamp('2011-01-04', tz=tz)]) - self.assertEqual(obj.dtype, 'datetime64[ns, US/Eastern]') + assert obj.dtype == 'datetime64[ns, US/Eastern]' # datetime64tz + datetime64tz => datetime64tz exp = pd.Series([pd.Timestamp('2011-01-01', tz=tz), @@ -1035,14 +1090,12 @@ def test_fillna_series_datetime64tz(self): value = pd.Timestamp('2012-01-01', tz='Asia/Tokyo') self._assert_fillna_conversion(obj, value, exp, np.object) - # datetime64tz + int => datetime64tz - # ToDo: must be object + # datetime64tz + int => object exp = pd.Series([pd.Timestamp('2011-01-01', tz=tz), - pd.Timestamp(1, tz=tz), + 1, pd.Timestamp('2011-01-03', tz=tz), pd.Timestamp('2011-01-04', tz=tz)]) - self._assert_fillna_conversion(obj, 1, exp, - 'datetime64[ns, US/Eastern]') + self._assert_fillna_conversion(obj, 1, exp, np.object) # datetime64tz + object => object exp = pd.Series([pd.Timestamp('2011-01-01', tz=tz), @@ -1060,7 +1113,7 @@ def test_fillna_series_period(self): def test_fillna_index_datetime64(self): obj = pd.DatetimeIndex(['2011-01-01', 'NaT', '2011-01-03', '2011-01-04']) - self.assertEqual(obj.dtype, 'datetime64[ns]') + assert obj.dtype == 'datetime64[ns]' # datetime64 + datetime64 => datetime64 exp = pd.DatetimeIndex(['2011-01-01', '2012-01-01', @@ -1095,7 +1148,7 @@ def test_fillna_index_datetime64tz(self): obj = pd.DatetimeIndex(['2011-01-01', 'NaT', '2011-01-03', '2011-01-04'], tz=tz) - self.assertEqual(obj.dtype, 'datetime64[ns, US/Eastern]') + assert obj.dtype == 'datetime64[ns, US/Eastern]' # datetime64tz + datetime64tz => datetime64tz exp = pd.DatetimeIndex(['2011-01-01', '2012-01-01', @@ -1141,25 +1194,40 @@ def test_fillna_index_period(self): pass -class TestReplaceSeriesCoercion(CoercionBase, tm.TestCase): +class TestReplaceSeriesCoercion(CoercionBase): # not indexing, but place here for consisntency klasses = ['series'] method = 'replace' - def setUp(self): + def setup_method(self, method): self.rep = {} self.rep['object'] = ['a', 'b'] self.rep['int64'] = [4, 5] self.rep['float64'] = [1.1, 2.2] self.rep['complex128'] = [1 + 1j, 2 + 2j] self.rep['bool'] = [True, False] + self.rep['datetime64[ns]'] = [pd.Timestamp('2011-01-01'), + pd.Timestamp('2011-01-03')] + + for tz in ['UTC', 'US/Eastern']: + # to test tz => different tz replacement + key = 'datetime64[ns, {0}]'.format(tz) + self.rep[key] = [pd.Timestamp('2011-01-01', tz=tz), + pd.Timestamp('2011-01-03', tz=tz)] + + self.rep['timedelta64[ns]'] = [pd.Timedelta('1 day'), + pd.Timedelta('2 day')] def _assert_replace_conversion(self, from_key, to_key, how): index = pd.Index([3, 4], name='xxx') obj = pd.Series(self.rep[from_key], index=index, name='yyy') - self.assertEqual(obj.dtype, from_key) + assert obj.dtype == from_key + + if (from_key.startswith('datetime') and to_key.startswith('datetime')): + # different tz, currently mask_missing raises SystemError + return if how == 'dict': replacer = dict(zip(self.rep[from_key], self.rep[to_key])) @@ -1170,29 +1238,14 @@ def 
_assert_replace_conversion(self, from_key, to_key, how): result = obj.replace(replacer) - # buggy on windows for bool/int64 - if (from_key == 'bool' and - to_key == 'int64' and - tm.is_platform_windows()): - raise nose.SkipTest("windows platform buggy: {0} -> {1}".format - (from_key, to_key)) - - if ((from_key == 'float64' and - to_key in ('bool', 'int64')) or - + if ((from_key == 'float64' and to_key in ('int64')) or (from_key == 'complex128' and - to_key in ('bool', 'int64', 'float64')) or - - (from_key == 'int64' and - to_key in ('bool')) or - - # TODO_GH12747 The result must be int? - (from_key == 'bool' and to_key == 'int64')): + to_key in ('int64', 'float64'))): - # buggy on 32-bit - if tm.is_platform_32bit(): - raise nose.SkipTest("32-bit platform buggy: {0} -> {1}".format - (from_key, to_key)) + # buggy on 32-bit / window + if compat.is_platform_32bit() or compat.is_platform_windows(): + pytest.skip("32-bit platform buggy: {0} -> {1}".format + (from_key, to_key)) # Expected: do not downcast by replacement exp = pd.Series(self.rep[to_key], index=index, @@ -1200,7 +1253,7 @@ def _assert_replace_conversion(self, from_key, to_key, how): else: exp = pd.Series(self.rep[to_key], index=index, name='yyy') - self.assertEqual(exp.dtype, to_key) + assert exp.dtype == to_key tm.assert_series_equal(result, exp) @@ -1245,18 +1298,36 @@ def test_replace_series_bool(self): if compat.PY3: # doesn't work in PY3, though ...dict_from_bool works fine - raise nose.SkipTest("doesn't work as in PY3") + pytest.skip("doesn't work as in PY3") self._assert_replace_conversion(from_key, to_key, how='series') def test_replace_series_datetime64(self): - pass + from_key = 'datetime64[ns]' + for to_key in self.rep: + self._assert_replace_conversion(from_key, to_key, how='dict') + + from_key = 'datetime64[ns]' + for to_key in self.rep: + self._assert_replace_conversion(from_key, to_key, how='series') def test_replace_series_datetime64tz(self): - pass + from_key = 'datetime64[ns, US/Eastern]' + for to_key in self.rep: + self._assert_replace_conversion(from_key, to_key, how='dict') + + from_key = 'datetime64[ns, US/Eastern]' + for to_key in self.rep: + self._assert_replace_conversion(from_key, to_key, how='series') def test_replace_series_timedelta64(self): - pass + from_key = 'timedelta64[ns]' + for to_key in self.rep: + self._assert_replace_conversion(from_key, to_key, how='dict') + + from_key = 'timedelta64[ns]' + for to_key in self.rep: + self._assert_replace_conversion(from_key, to_key, how='series') def test_replace_series_period(self): pass diff --git a/pandas/tests/indexing/test_datetime.py b/pandas/tests/indexing/test_datetime.py new file mode 100644 index 0000000000000..ddac80fbc4693 --- /dev/null +++ b/pandas/tests/indexing/test_datetime.py @@ -0,0 +1,252 @@ +import numpy as np +import pandas as pd +from pandas import date_range, Index, DataFrame, Series, Timestamp +from pandas.util import testing as tm + + +class TestDatetimeIndex(object): + + def test_setitem_with_datetime_tz(self): + # 16889 + # support .loc with alignment and tz-aware DatetimeIndex + mask = np.array([True, False, True, False]) + + idx = pd.date_range('20010101', periods=4, tz='UTC') + df = pd.DataFrame({'a': np.arange(4)}, index=idx).astype('float64') + + result = df.copy() + result.loc[mask, :] = df.loc[mask, :] + tm.assert_frame_equal(result, df) + + result = df.copy() + result.loc[mask] = df.loc[mask] + tm.assert_frame_equal(result, df) + + idx = pd.date_range('20010101', periods=4) + df = pd.DataFrame({'a': np.arange(4)}, 
index=idx).astype('float64') + + result = df.copy() + result.loc[mask, :] = df.loc[mask, :] + tm.assert_frame_equal(result, df) + + result = df.copy() + result.loc[mask] = df.loc[mask] + tm.assert_frame_equal(result, df) + + def test_indexing_with_datetime_tz(self): + + # 8260 + # support datetime64 with tz + + idx = Index(date_range('20130101', periods=3, tz='US/Eastern'), + name='foo') + dr = date_range('20130110', periods=3) + df = DataFrame({'A': idx, 'B': dr}) + df['C'] = idx + df.iloc[1, 1] = pd.NaT + df.iloc[1, 2] = pd.NaT + + # indexing + result = df.iloc[1] + expected = Series([Timestamp('2013-01-02 00:00:00-0500', + tz='US/Eastern'), np.nan, np.nan], + index=list('ABC'), dtype='object', name=1) + tm.assert_series_equal(result, expected) + result = df.loc[1] + expected = Series([Timestamp('2013-01-02 00:00:00-0500', + tz='US/Eastern'), np.nan, np.nan], + index=list('ABC'), dtype='object', name=1) + tm.assert_series_equal(result, expected) + + # indexing - fast_xs + df = DataFrame({'a': date_range('2014-01-01', periods=10, tz='UTC')}) + result = df.iloc[5] + expected = Timestamp('2014-01-06 00:00:00+0000', tz='UTC', freq='D') + assert result == expected + + result = df.loc[5] + assert result == expected + + # indexing - boolean + result = df[df.a > df.a[3]] + expected = df.iloc[4:] + tm.assert_frame_equal(result, expected) + + # indexing - setting an element + df = DataFrame(data=pd.to_datetime( + ['2015-03-30 20:12:32', '2015-03-12 00:11:11']), columns=['time']) + df['new_col'] = ['new', 'old'] + df.time = df.set_index('time').index.tz_localize('UTC') + v = df[df.new_col == 'new'].set_index('time').index.tz_convert( + 'US/Pacific') + + # trying to set a single element on a part of a different timezone + # this converts to object + df2 = df.copy() + df2.loc[df2.new_col == 'new', 'time'] = v + + expected = Series([v[0], df.loc[1, 'time']], name='time') + tm.assert_series_equal(df2.time, expected) + + v = df.loc[df.new_col == 'new', 'time'] + pd.Timedelta('1s') + df.loc[df.new_col == 'new', 'time'] = v + tm.assert_series_equal(df.loc[df.new_col == 'new', 'time'], v) + + def test_consistency_with_tz_aware_scalar(self): + # xref gh-12938 + # various ways of indexing the same tz-aware scalar + df = Series([Timestamp('2016-03-30 14:35:25', + tz='Europe/Brussels')]).to_frame() + + df = pd.concat([df, df]).reset_index(drop=True) + expected = Timestamp('2016-03-30 14:35:25+0200', + tz='Europe/Brussels') + + result = df[0][0] + assert result == expected + + result = df.iloc[0, 0] + assert result == expected + + result = df.loc[0, 0] + assert result == expected + + result = df.iat[0, 0] + assert result == expected + + result = df.at[0, 0] + assert result == expected + + result = df[0].loc[0] + assert result == expected + + result = df[0].at[0] + assert result == expected + + def test_indexing_with_datetimeindex_tz(self): + + # GH 12050 + # indexing on a series with a datetimeindex with tz + index = pd.date_range('2015-01-01', periods=2, tz='utc') + + ser = pd.Series(range(2), index=index, + dtype='int64') + + # list-like indexing + + for sel in (index, list(index)): + # getitem + tm.assert_series_equal(ser[sel], ser) + + # setitem + result = ser.copy() + result[sel] = 1 + expected = pd.Series(1, index=index) + tm.assert_series_equal(result, expected) + + # .loc getitem + tm.assert_series_equal(ser.loc[sel], ser) + + # .loc setitem + result = ser.copy() + result.loc[sel] = 1 + expected = pd.Series(1, index=index) + tm.assert_series_equal(result, expected) + + # single element indexing + + #
getitem + assert ser[index[1]] == 1 + + # setitem + result = ser.copy() + result[index[1]] = 5 + expected = pd.Series([0, 5], index=index) + tm.assert_series_equal(result, expected) + + # .loc getitem + assert ser.loc[index[1]] == 1 + + # .loc setitem + result = ser.copy() + result.loc[index[1]] = 5 + expected = pd.Series([0, 5], index=index) + tm.assert_series_equal(result, expected) + + def test_partial_setting_with_datetimelike_dtype(self): + + # GH9478 + # a datetimeindex alignment issue with partial setting + df = pd.DataFrame(np.arange(6.).reshape(3, 2), columns=list('AB'), + index=pd.date_range('1/1/2000', periods=3, + freq='1H')) + expected = df.copy() + expected['C'] = [expected.index[0]] + [pd.NaT, pd.NaT] + + mask = df.A < 1 + df.loc[mask, 'C'] = df.loc[mask].index + tm.assert_frame_equal(df, expected) + + def test_loc_setitem_datetime(self): + + # GH 9516 + dt1 = Timestamp('20130101 09:00:00') + dt2 = Timestamp('20130101 10:00:00') + + for conv in [lambda x: x, lambda x: x.to_datetime64(), + lambda x: x.to_pydatetime(), lambda x: np.datetime64(x)]: + + df = pd.DataFrame() + df.loc[conv(dt1), 'one'] = 100 + df.loc[conv(dt2), 'one'] = 200 + + expected = DataFrame({'one': [100.0, 200.0]}, index=[dt1, dt2]) + tm.assert_frame_equal(df, expected) + + def test_series_partial_set_datetime(self): + # GH 11497 + + idx = date_range('2011-01-01', '2011-01-02', freq='D', name='idx') + ser = Series([0.1, 0.2], index=idx, name='s') + + result = ser.loc[[Timestamp('2011-01-01'), Timestamp('2011-01-02')]] + exp = Series([0.1, 0.2], index=idx, name='s') + tm.assert_series_equal(result, exp, check_index_type=True) + + keys = [Timestamp('2011-01-02'), Timestamp('2011-01-02'), + Timestamp('2011-01-01')] + exp = Series([0.2, 0.2, 0.1], index=pd.DatetimeIndex(keys, name='idx'), + name='s') + tm.assert_series_equal(ser.loc[keys], exp, check_index_type=True) + + keys = [Timestamp('2011-01-03'), Timestamp('2011-01-02'), + Timestamp('2011-01-03')] + exp = Series([np.nan, 0.2, np.nan], + index=pd.DatetimeIndex(keys, name='idx'), name='s') + tm.assert_series_equal(ser.loc[keys], exp, check_index_type=True) + + def test_series_partial_set_period(self): + # GH 11497 + + idx = pd.period_range('2011-01-01', '2011-01-02', freq='D', name='idx') + ser = Series([0.1, 0.2], index=idx, name='s') + + result = ser.loc[[pd.Period('2011-01-01', freq='D'), + pd.Period('2011-01-02', freq='D')]] + exp = Series([0.1, 0.2], index=idx, name='s') + tm.assert_series_equal(result, exp, check_index_type=True) + + keys = [pd.Period('2011-01-02', freq='D'), + pd.Period('2011-01-02', freq='D'), + pd.Period('2011-01-01', freq='D')] + exp = Series([0.2, 0.2, 0.1], index=pd.PeriodIndex(keys, name='idx'), + name='s') + tm.assert_series_equal(ser.loc[keys], exp, check_index_type=True) + + keys = [pd.Period('2011-01-03', freq='D'), + pd.Period('2011-01-02', freq='D'), + pd.Period('2011-01-03', freq='D')] + exp = Series([np.nan, 0.2, np.nan], + index=pd.PeriodIndex(keys, name='idx'), name='s') + result = ser.loc[keys] + tm.assert_series_equal(result, exp) diff --git a/pandas/tests/indexing/test_floats.py b/pandas/tests/indexing/test_floats.py index 8f0fa2d56113b..00a2b8166ceed 100644 --- a/pandas/tests/indexing/test_floats.py +++ b/pandas/tests/indexing/test_floats.py @@ -1,5 +1,7 @@ # -*- coding: utf-8 -*- +import pytest + from warnings import catch_warnings import numpy as np from pandas import Series, DataFrame, Index, Float64Index @@ -7,7 +9,7 @@ import pandas.util.testing as tm -class TestFloatIndexers(tm.TestCase): +class 
TestFloatIndexers(object): def check(self, result, original, indexer, getitem): """ @@ -46,13 +48,13 @@ def test_scalar_error(self): def f(): s.iloc[3.0] - self.assertRaisesRegexp(TypeError, - 'cannot do positional indexing', - f) + tm.assert_raises_regex(TypeError, + 'cannot do positional indexing', + f) def f(): s.iloc[3.0] = 0 - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) def test_scalar_non_numeric(self): @@ -87,7 +89,7 @@ def f(): error = KeyError else: error = TypeError - self.assertRaises(error, f) + pytest.raises(error, f) # label based can be a TypeError or KeyError def f(): @@ -97,15 +99,15 @@ def f(): error = KeyError else: error = TypeError - self.assertRaises(error, f) + pytest.raises(error, f) # contains - self.assertFalse(3.0 in s) + assert 3.0 not in s # setting with a float fails with iloc def f(): s.iloc[3.0] = 0 - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) # setting with an indexer if s.index.inferred_type in ['categorical']: @@ -121,26 +123,26 @@ def f(): # s2 = s.copy() # def f(): # idxr(s2)[3.0] = 0 - # self.assertRaises(TypeError, f) + # pytest.raises(TypeError, f) pass else: s2 = s.copy() s2.loc[3.0] = 10 - self.assertTrue(s2.index.is_object()) + assert s2.index.is_object() for idxr in [lambda x: x.ix, lambda x: x]: s2 = s.copy() with catch_warnings(record=True): idxr(s2)[3.0] = 0 - self.assertTrue(s2.index.is_object()) + assert s2.index.is_object() # falls back to position selection, series only s = Series(np.arange(len(i)), index=i) s[3] - self.assertRaises(TypeError, lambda: s[3.0]) + pytest.raises(TypeError, lambda: s[3.0]) def test_scalar_with_mixed(self): @@ -157,13 +159,13 @@ def f(): with catch_warnings(record=True): idxr(s2)[1.0] - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) - self.assertRaises(KeyError, lambda: s2.loc[1.0]) + pytest.raises(KeyError, lambda: s2.loc[1.0]) result = s2.loc['b'] expected = 2 - self.assertEqual(result, expected) + assert result == expected # mixed index so we have label # indexing @@ -174,18 +176,18 @@ def f(): with catch_warnings(record=True): idxr(s3)[1.0] - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) result = idxr(s3)[1] expected = 2 - self.assertEqual(result, expected) + assert result == expected - self.assertRaises(TypeError, lambda: s3.iloc[1.0]) - self.assertRaises(KeyError, lambda: s3.loc[1.0]) + pytest.raises(TypeError, lambda: s3.iloc[1.0]) + pytest.raises(KeyError, lambda: s3.loc[1.0]) result = s3.loc[1.5] expected = 3 - self.assertEqual(result, expected) + assert result == expected def test_scalar_integer(self): @@ -214,7 +216,8 @@ def test_scalar_integer(self): (lambda x: x, True)]: if isinstance(s, Series): - compare = self.assertEqual + def compare(x, y): + assert x == y expected = 100 else: compare = tm.assert_series_equal @@ -237,7 +240,7 @@ def test_scalar_integer(self): # contains # coerce to equal int - self.assertTrue(3.0 in s) + assert 3.0 in s def test_scalar_float(self): @@ -270,10 +273,10 @@ def f(): # random integer is a KeyError with catch_warnings(record=True): - self.assertRaises(KeyError, lambda: idxr(s)[3.5]) + pytest.raises(KeyError, lambda: idxr(s)[3.5]) # contains - self.assertTrue(3.0 in s) + assert 3.0 in s # iloc succeeds with an integer expected = s.iloc[3] @@ -284,11 +287,11 @@ def f(): self.check(result, s, 3, False) # iloc raises with a float - self.assertRaises(TypeError, lambda: s.iloc[3.0]) + pytest.raises(TypeError, lambda: s.iloc[3.0]) def g(): s2.iloc[3.0] = 0 - self.assertRaises(TypeError, g) +
pytest.raises(TypeError, g) def test_slice_non_numeric(self): @@ -311,7 +314,7 @@ def test_slice_non_numeric(self): def f(): s.iloc[l] - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) for idxr in [lambda x: x.ix, lambda x: x.loc, @@ -321,7 +324,7 @@ def f(): def f(): with catch_warnings(record=True): idxr(s)[l] - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) # setitem for l in [slice(3.0, 4), @@ -330,7 +333,7 @@ def f(): def f(): s.iloc[l] = 0 - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) for idxr in [lambda x: x.ix, lambda x: x.loc, @@ -339,7 +342,7 @@ def f(): def f(): with catch_warnings(record=True): idxr(s)[l] = 0 - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) def test_slice_integer(self): @@ -378,7 +381,7 @@ def test_slice_integer(self): def f(): s[l] - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) # getitem out-of-bounds for l in [slice(-6, 6), @@ -402,7 +405,7 @@ def f(): def f(): s[slice(-6.0, 6.0)] - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) # getitem odd floats for l, res1 in [(slice(2.5, 4), slice(3, 5)), @@ -425,7 +428,7 @@ def f(): def f(): s[l] - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) # setitem for l in [slice(3.0, 4), @@ -438,13 +441,13 @@ def f(): with catch_warnings(record=True): idxr(sc)[l] = 0 result = idxr(sc)[l].values.ravel() - self.assertTrue((result == 0).all()) + assert (result == 0).all() # positional indexing def f(): s[l] = 0 - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) def test_integer_positional_indexing(self): """ make sure that we are raising on positional indexing @@ -466,7 +469,7 @@ def test_integer_positional_indexing(self): def f(): idxr(s)[l] - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) def test_slice_integer_frame_getitem(self): @@ -493,7 +496,7 @@ def test_slice_integer_frame_getitem(self): def f(): s[l] - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) # getitem out-of-bounds for l in [slice(-10, 10), @@ -506,7 +509,7 @@ def f(): def f(): s[slice(-10.0, 10.0)] - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) # getitem odd floats for l, res in [(slice(0.5, 1), slice(1, 2)), @@ -521,7 +524,7 @@ def f(): def f(): s[l] - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) # setitem for l in [slice(3.0, 4), @@ -532,13 +535,13 @@ def f(): with catch_warnings(record=True): idxr(sc)[l] = 0 result = idxr(sc)[l].values.ravel() - self.assertTrue((result == 0).all()) + assert (result == 0).all() # positional indexing def f(): s[l] = 0 - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) def test_slice_float(self): @@ -560,24 +563,24 @@ def test_slice_float(self): with catch_warnings(record=True): result = idxr(s)[l] if isinstance(s, Series): - self.assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) else: - self.assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) # setitem s2 = s.copy() with catch_warnings(record=True): idxr(s2)[l] = 0 result = idxr(s2)[l].values.ravel() - self.assertTrue((result == 0).all()) + assert (result == 0).all() def test_floating_index_doc_example(self): index = Index([1.5, 2, 3, 4.5, 5]) s = Series(range(5), index=index) - self.assertEqual(s[3], 2) - self.assertEqual(s.loc[3], 2) - self.assertEqual(s.loc[3], 2) - self.assertEqual(s.iloc[3], 3) + assert s[3] == 2 + assert s.loc[3] == 2 + assert s.loc[3] == 2 + assert s.iloc[3] == 3 def test_floating_misc(self): @@ -596,23 
+599,23 @@ def test_floating_misc(self): result1 = s[5.0] result2 = s.loc[5.0] result3 = s.loc[5.0] - self.assertEqual(result1, result2) - self.assertEqual(result1, result3) + assert result1 == result2 + assert result1 == result3 result1 = s[5] result2 = s.loc[5] result3 = s.loc[5] - self.assertEqual(result1, result2) - self.assertEqual(result1, result3) + assert result1 == result2 + assert result1 == result3 - self.assertEqual(s[5.0], s[5]) + assert s[5.0] == s[5] # value not found (and no fallbacking at all) # scalar integers - self.assertRaises(KeyError, lambda: s.loc[4]) - self.assertRaises(KeyError, lambda: s.loc[4]) - self.assertRaises(KeyError, lambda: s[4]) + pytest.raises(KeyError, lambda: s.loc[4]) + pytest.raises(KeyError, lambda: s.loc[4]) + pytest.raises(KeyError, lambda: s[4]) # fancy floats/integers create the correct entry (as nan) # fancy tests @@ -700,12 +703,171 @@ def test_floating_misc(self): assert_series_equal(result1, Series([1], index=[2.5])) def test_floating_tuples(self): - # GH13509 + # see gh-13509 s = Series([(1, 1), (2, 2), (3, 3)], index=[0.0, 0.1, 0.2], name='foo') + result = s[0.0] - self.assertEqual(result, (1, 1)) + assert result == (1, 1) + expected = Series([(1, 1), (2, 2)], index=[0.0, 0.0], name='foo') s = Series([(1, 1), (2, 2), (3, 3)], index=[0.0, 0.0, 0.2], name='foo') + result = s[0.0] - expected = Series([(1, 1), (2, 2)], index=[0.0, 0.0], name='foo') - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) + + def test_float64index_slicing_bug(self): + # GH 5557, related to slicing a float index + ser = {256: 2321.0, + 1: 78.0, + 2: 2716.0, + 3: 0.0, + 4: 369.0, + 5: 0.0, + 6: 269.0, + 7: 0.0, + 8: 0.0, + 9: 0.0, + 10: 3536.0, + 11: 0.0, + 12: 24.0, + 13: 0.0, + 14: 931.0, + 15: 0.0, + 16: 101.0, + 17: 78.0, + 18: 9643.0, + 19: 0.0, + 20: 0.0, + 21: 0.0, + 22: 63761.0, + 23: 0.0, + 24: 446.0, + 25: 0.0, + 26: 34773.0, + 27: 0.0, + 28: 729.0, + 29: 78.0, + 30: 0.0, + 31: 0.0, + 32: 3374.0, + 33: 0.0, + 34: 1391.0, + 35: 0.0, + 36: 361.0, + 37: 0.0, + 38: 61808.0, + 39: 0.0, + 40: 0.0, + 41: 0.0, + 42: 6677.0, + 43: 0.0, + 44: 802.0, + 45: 0.0, + 46: 2691.0, + 47: 0.0, + 48: 3582.0, + 49: 0.0, + 50: 734.0, + 51: 0.0, + 52: 627.0, + 53: 70.0, + 54: 2584.0, + 55: 0.0, + 56: 324.0, + 57: 0.0, + 58: 605.0, + 59: 0.0, + 60: 0.0, + 61: 0.0, + 62: 3989.0, + 63: 10.0, + 64: 42.0, + 65: 0.0, + 66: 904.0, + 67: 0.0, + 68: 88.0, + 69: 70.0, + 70: 8172.0, + 71: 0.0, + 72: 0.0, + 73: 0.0, + 74: 64902.0, + 75: 0.0, + 76: 347.0, + 77: 0.0, + 78: 36605.0, + 79: 0.0, + 80: 379.0, + 81: 70.0, + 82: 0.0, + 83: 0.0, + 84: 3001.0, + 85: 0.0, + 86: 1630.0, + 87: 7.0, + 88: 364.0, + 89: 0.0, + 90: 67404.0, + 91: 9.0, + 92: 0.0, + 93: 0.0, + 94: 7685.0, + 95: 0.0, + 96: 1017.0, + 97: 0.0, + 98: 2831.0, + 99: 0.0, + 100: 2963.0, + 101: 0.0, + 102: 854.0, + 103: 0.0, + 104: 0.0, + 105: 0.0, + 106: 0.0, + 107: 0.0, + 108: 0.0, + 109: 0.0, + 110: 0.0, + 111: 0.0, + 112: 0.0, + 113: 0.0, + 114: 0.0, + 115: 0.0, + 116: 0.0, + 117: 0.0, + 118: 0.0, + 119: 0.0, + 120: 0.0, + 121: 0.0, + 122: 0.0, + 123: 0.0, + 124: 0.0, + 125: 0.0, + 126: 67744.0, + 127: 22.0, + 128: 264.0, + 129: 0.0, + 260: 197.0, + 268: 0.0, + 265: 0.0, + 269: 0.0, + 261: 0.0, + 266: 1198.0, + 267: 0.0, + 262: 2629.0, + 258: 775.0, + 257: 0.0, + 263: 0.0, + 259: 0.0, + 264: 163.0, + 250: 10326.0, + 251: 0.0, + 252: 1228.0, + 253: 0.0, + 254: 2769.0, + 255: 0.0} + + # smoke test for the repr + s = Series(ser) + result = s.value_counts() + str(result) diff --git 
a/pandas/tests/indexing/test_iloc.py b/pandas/tests/indexing/test_iloc.py new file mode 100644 index 0000000000000..39569f0b0cb38 --- /dev/null +++ b/pandas/tests/indexing/test_iloc.py @@ -0,0 +1,653 @@ +""" test positional based indexing with iloc """ + +import pytest + +from warnings import catch_warnings +import numpy as np + +import pandas as pd +from pandas.compat import lrange, lmap +from pandas import Series, DataFrame, date_range, concat, isna +from pandas.util import testing as tm +from pandas.tests.indexing.common import Base + + +class TestiLoc(Base): + + def test_iloc_exceeds_bounds(self): + + # GH6296 + # iloc should allow indexers that exceed the bounds + df = DataFrame(np.random.random_sample((20, 5)), columns=list('ABCDE')) + expected = df + + # lists of positions should raise IndexError! + with tm.assert_raises_regex(IndexError, + 'positional indexers ' + 'are out-of-bounds'): + df.iloc[:, [0, 1, 2, 3, 4, 5]] + pytest.raises(IndexError, lambda: df.iloc[[1, 30]]) + pytest.raises(IndexError, lambda: df.iloc[[1, -30]]) + pytest.raises(IndexError, lambda: df.iloc[[100]]) + + s = df['A'] + pytest.raises(IndexError, lambda: s.iloc[[100]]) + pytest.raises(IndexError, lambda: s.iloc[[-100]]) + + # still raise on a single indexer + msg = 'single positional indexer is out-of-bounds' + with tm.assert_raises_regex(IndexError, msg): + df.iloc[30] + pytest.raises(IndexError, lambda: df.iloc[-30]) + + # GH10779 + # single positive/negative indexer exceeding Series bounds should raise + # an IndexError + with tm.assert_raises_regex(IndexError, msg): + s.iloc[30] + pytest.raises(IndexError, lambda: s.iloc[-30]) + + # slices are ok + result = df.iloc[:, 4:10] # 0 < start < len < stop + expected = df.iloc[:, 4:] + tm.assert_frame_equal(result, expected) + + result = df.iloc[:, -4:-10] # stop < 0 < start < len + expected = df.iloc[:, :0] + tm.assert_frame_equal(result, expected) + + result = df.iloc[:, 10:4:-1] # 0 < stop < len < start (down) + expected = df.iloc[:, :4:-1] + tm.assert_frame_equal(result, expected) + + result = df.iloc[:, 4:-10:-1] # stop < 0 < start < len (down) + expected = df.iloc[:, 4::-1] + tm.assert_frame_equal(result, expected) + + result = df.iloc[:, -10:4] # start < 0 < stop < len + expected = df.iloc[:, :4] + tm.assert_frame_equal(result, expected) + + result = df.iloc[:, 10:4] # 0 < stop < len < start + expected = df.iloc[:, :0] + tm.assert_frame_equal(result, expected) + + result = df.iloc[:, -10:-11:-1] # stop < start < 0 < len (down) + expected = df.iloc[:, :0] + tm.assert_frame_equal(result, expected) + + result = df.iloc[:, 10:11] # 0 < len < start < stop + expected = df.iloc[:, :0] + tm.assert_frame_equal(result, expected) + + # slice bounds exceeding is ok + result = s.iloc[18:30] + expected = s.iloc[18:] + tm.assert_series_equal(result, expected) + + result = s.iloc[30:] + expected = s.iloc[:0] + tm.assert_series_equal(result, expected) + + result = s.iloc[30::-1] + expected = s.iloc[::-1] + tm.assert_series_equal(result, expected) + + # doc example + def check(result, expected): + str(result) + result.dtypes + tm.assert_frame_equal(result, expected) + + dfl = DataFrame(np.random.randn(5, 2), columns=list('AB')) + check(dfl.iloc[:, 2:3], DataFrame(index=dfl.index)) + check(dfl.iloc[:, 1:3], dfl.iloc[:, [1]]) + check(dfl.iloc[4:6], dfl.iloc[[4]]) + + pytest.raises(IndexError, lambda: dfl.iloc[[4, 5, 6]]) + pytest.raises(IndexError, lambda: dfl.iloc[:, 4]) + + def test_iloc_getitem_int(self): + + # integer + self.check_result('integer', 'iloc', 2, 'ix', + {0:
4, 1: 6, 2: 8}, typs=['ints', 'uints']) + self.check_result('integer', 'iloc', 2, 'indexer', 2, + typs=['labels', 'mixed', 'ts', 'floats', 'empty'], + fails=IndexError) + + def test_iloc_getitem_neg_int(self): + + # neg integer + self.check_result('neg int', 'iloc', -1, 'ix', + {0: 6, 1: 9, 2: 12}, typs=['ints', 'uints']) + self.check_result('neg int', 'iloc', -1, 'indexer', -1, + typs=['labels', 'mixed', 'ts', 'floats', 'empty'], + fails=IndexError) + + def test_iloc_getitem_list_int(self): + + # list of ints + self.check_result('list int', 'iloc', [0, 1, 2], 'ix', + {0: [0, 2, 4], 1: [0, 3, 6], 2: [0, 4, 8]}, + typs=['ints', 'uints']) + self.check_result('list int', 'iloc', [2], 'ix', + {0: [4], 1: [6], 2: [8]}, typs=['ints', 'uints']) + self.check_result('list int', 'iloc', [0, 1, 2], 'indexer', [0, 1, 2], + typs=['labels', 'mixed', 'ts', 'floats', 'empty'], + fails=IndexError) + + # array of ints (GH5006), make sure that a single indexer is returning + # the correct type + self.check_result('array int', 'iloc', np.array([0, 1, 2]), 'ix', + {0: [0, 2, 4], + 1: [0, 3, 6], + 2: [0, 4, 8]}, typs=['ints', 'uints']) + self.check_result('array int', 'iloc', np.array([2]), 'ix', + {0: [4], 1: [6], 2: [8]}, typs=['ints', 'uints']) + self.check_result('array int', 'iloc', np.array([0, 1, 2]), 'indexer', + [0, 1, 2], + typs=['labels', 'mixed', 'ts', 'floats', 'empty'], + fails=IndexError) + + def test_iloc_getitem_neg_int_can_reach_first_index(self): + # GH10547 and GH10779 + # negative integers should be able to reach index 0 + df = DataFrame({'A': [2, 3, 5], 'B': [7, 11, 13]}) + s = df['A'] + + expected = df.iloc[0] + result = df.iloc[-3] + tm.assert_series_equal(result, expected) + + expected = df.iloc[[0]] + result = df.iloc[[-3]] + tm.assert_frame_equal(result, expected) + + expected = s.iloc[0] + result = s.iloc[-3] + assert result == expected + + expected = s.iloc[[0]] + result = s.iloc[[-3]] + tm.assert_series_equal(result, expected) + + # check the length 1 Series case highlighted in GH10547 + expected = pd.Series(['a'], index=['A']) + result = expected.iloc[[-1]] + tm.assert_series_equal(result, expected) + + def test_iloc_getitem_dups(self): + + # no dups in panel (bug?) 
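+ # clarifying note: a positional list may repeat entries, so the [0, 1, 1, 3] indexer below picks position 1 twice and the expected values repeat it accordingly; Panel is excluded from objs= per the note above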
+ self.check_result('list int (dups)', 'iloc', [0, 1, 1, 3], 'ix', + {0: [0, 2, 2, 6], 1: [0, 3, 3, 9]}, + objs=['series', 'frame'], typs=['ints', 'uints']) + + # GH 6766 + df1 = DataFrame([{'A': None, 'B': 1}, {'A': 2, 'B': 2}]) + df2 = DataFrame([{'A': 3, 'B': 3}, {'A': 4, 'B': 4}]) + df = concat([df1, df2], axis=1) + + # cross-sectional indexing + result = df.iloc[0, 0] + assert isna(result) + + result = df.iloc[0, :] + expected = Series([np.nan, 1, 3, 3], index=['A', 'B', 'A', 'B'], + name=0) + tm.assert_series_equal(result, expected) + + def test_iloc_getitem_array(self): + + # array like + s = Series(index=lrange(1, 4)) + self.check_result('array like', 'iloc', s.index, 'ix', + {0: [2, 4, 6], 1: [3, 6, 9], 2: [4, 8, 12]}, + typs=['ints', 'uints']) + + def test_iloc_getitem_bool(self): + + # boolean indexers + b = [True, False, True, False, ] + self.check_result('bool', 'iloc', b, 'ix', b, typs=['ints', 'uints']) + self.check_result('bool', 'iloc', b, 'ix', b, + typs=['labels', 'mixed', 'ts', 'floats', 'empty'], + fails=IndexError) + + def test_iloc_getitem_slice(self): + + # slices + self.check_result('slice', 'iloc', slice(1, 3), 'ix', + {0: [2, 4], 1: [3, 6], 2: [4, 8]}, + typs=['ints', 'uints']) + self.check_result('slice', 'iloc', slice(1, 3), 'indexer', + slice(1, 3), + typs=['labels', 'mixed', 'ts', 'floats', 'empty'], + fails=IndexError) + + def test_iloc_getitem_slice_dups(self): + + df1 = DataFrame(np.random.randn(10, 4), columns=['A', 'A', 'B', 'B']) + df2 = DataFrame(np.random.randint(0, 10, size=20).reshape(10, 2), + columns=['A', 'C']) + + # axis=1 + df = concat([df1, df2], axis=1) + tm.assert_frame_equal(df.iloc[:, :4], df1) + tm.assert_frame_equal(df.iloc[:, 4:], df2) + + df = concat([df2, df1], axis=1) + tm.assert_frame_equal(df.iloc[:, :2], df2) + tm.assert_frame_equal(df.iloc[:, 2:], df1) + + exp = concat([df2, df1.iloc[:, [0]]], axis=1) + tm.assert_frame_equal(df.iloc[:, 0:3], exp) + + # axis=0 + df = concat([df, df], axis=0) + tm.assert_frame_equal(df.iloc[0:10, :2], df2) + tm.assert_frame_equal(df.iloc[0:10, 2:], df1) + tm.assert_frame_equal(df.iloc[10:, :2], df2) + tm.assert_frame_equal(df.iloc[10:, 2:], df1) + + def test_iloc_setitem(self): + df = self.frame_ints + + df.iloc[1, 1] = 1 + result = df.iloc[1, 1] + assert result == 1 + + df.iloc[:, 2:3] = 0 + expected = df.iloc[:, 2:3] + result = df.iloc[:, 2:3] + tm.assert_frame_equal(result, expected) + + # GH5771 + s = Series(0, index=[4, 5, 6]) + s.iloc[1:2] += 1 + expected = Series([0, 1, 0], index=[4, 5, 6]) + tm.assert_series_equal(s, expected) + + @pytest.mark.parametrize( + 'data, indexes, values, expected_k', [ + # test without indexer value in first level of MultiIndex + ([[2, 22, 5], [2, 33, 6]], [0, -1, 1], [2, 3, 1], [7, 10]), + # test like code sample 1 in the issue + ([[1, 22, 555], [1, 33, 666]], [0, -1, 1], [200, 300, 100], + [755, 1066]), + # test like code sample 2 in the issue + ([[1, 3, 7], [2, 4, 8]], [0, -1, 1], [10, 10, 1000], [17, 1018]), + # test like code sample 3 in the issue + ([[1, 11, 4], [2, 22, 5], [3, 33, 6]], [0, -1, 1], [4, 7, 10], + [8, 15, 13]) + ]) + def test_iloc_setitem_int_multiindex_series( + self, data, indexes, values, expected_k): + # GH17148 + df = pd.DataFrame( + data=data, + columns=['i', 'j', 'k']) + df = df.set_index(['i', 'j']) + + series = df.k.copy() + for i, v in zip(indexes, values): + series.iloc[i] += v + + df['k'] = expected_k + expected = df.k + tm.assert_series_equal(series, expected) + + def test_iloc_setitem_list(self): + + # setitem with an iloc list 
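+ # clarifying note: positions [0, 1] x [1, 2] below select a 2x2 block, so the in-place += 100 shifts exactly those four cells (compare the expected frame)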
+ df = DataFrame(np.arange(9).reshape((3, 3)), index=["A", "B", "C"], + columns=["A", "B", "C"]) + df.iloc[[0, 1], [1, 2]] + df.iloc[[0, 1], [1, 2]] += 100 + + expected = DataFrame( + np.array([0, 101, 102, 3, 104, 105, 6, 7, 8]).reshape((3, 3)), + index=["A", "B", "C"], columns=["A", "B", "C"]) + tm.assert_frame_equal(df, expected) + + def test_iloc_setitem_pandas_object(self): + # GH 17193, affecting old numpy (1.7 and 1.8) + s_orig = Series([0, 1, 2, 3]) + expected = Series([0, -1, -2, 3]) + + s = s_orig.copy() + s.iloc[Series([1, 2])] = [-1, -2] + tm.assert_series_equal(s, expected) + + s = s_orig.copy() + s.iloc[pd.Index([1, 2])] = [-1, -2] + tm.assert_series_equal(s, expected) + + def test_iloc_setitem_dups(self): + + # GH 6766 + # iloc with a mask aligning from another iloc + df1 = DataFrame([{'A': None, 'B': 1}, {'A': 2, 'B': 2}]) + df2 = DataFrame([{'A': 3, 'B': 3}, {'A': 4, 'B': 4}]) + df = concat([df1, df2], axis=1) + + expected = df.fillna(3) + expected['A'] = expected['A'].astype('float64') + inds = np.isnan(df.iloc[:, 0]) + mask = inds[inds].index + df.iloc[mask, 0] = df.iloc[mask, 2] + tm.assert_frame_equal(df, expected) + + # del a dup column across blocks + expected = DataFrame({0: [1, 2], 1: [3, 4]}) + expected.columns = ['B', 'B'] + del df['A'] + tm.assert_frame_equal(df, expected) + + # assign back to self + df.iloc[[0, 1], [0, 1]] = df.iloc[[0, 1], [0, 1]] + tm.assert_frame_equal(df, expected) + + # reversed x 2 + df.iloc[[1, 0], [0, 1]] = df.iloc[[1, 0], [0, 1]].reset_index( + drop=True) + df.iloc[[1, 0], [0, 1]] = df.iloc[[1, 0], [0, 1]].reset_index( + drop=True) + tm.assert_frame_equal(df, expected) + + def test_iloc_getitem_frame(self): + df = DataFrame(np.random.randn(10, 4), index=lrange(0, 20, 2), + columns=lrange(0, 8, 2)) + + result = df.iloc[2] + with catch_warnings(record=True): + exp = df.ix[4] + tm.assert_series_equal(result, exp) + + result = df.iloc[2, 2] + with catch_warnings(record=True): + exp = df.ix[4, 4] + assert result == exp + + # slice + result = df.iloc[4:8] + with catch_warnings(record=True): + expected = df.ix[8:14] + tm.assert_frame_equal(result, expected) + + result = df.iloc[:, 2:3] + with catch_warnings(record=True): + expected = df.ix[:, 4:5] + tm.assert_frame_equal(result, expected) + + # list of integers + result = df.iloc[[0, 1, 3]] + with catch_warnings(record=True): + expected = df.ix[[0, 2, 6]] + tm.assert_frame_equal(result, expected) + + result = df.iloc[[0, 1, 3], [0, 1]] + with catch_warnings(record=True): + expected = df.ix[[0, 2, 6], [0, 2]] + tm.assert_frame_equal(result, expected) + + # neg indices + result = df.iloc[[-1, 1, 3], [-1, 1]] + with catch_warnings(record=True): + expected = df.ix[[18, 2, 6], [6, 2]] + tm.assert_frame_equal(result, expected) + + # dup indices + result = df.iloc[[-1, -1, 1, 3], [-1, 1]] + with catch_warnings(record=True): + expected = df.ix[[18, 18, 2, 6], [6, 2]] + tm.assert_frame_equal(result, expected) + + # with index-like + s = Series(index=lrange(1, 5)) + result = df.iloc[s.index] + with catch_warnings(record=True): + expected = df.ix[[2, 4, 6, 8]] + tm.assert_frame_equal(result, expected) + + def test_iloc_getitem_labelled_frame(self): + # try with labelled frame + df = DataFrame(np.random.randn(10, 4), + index=list('abcdefghij'), columns=list('ABCD')) + + result = df.iloc[1, 1] + exp = df.loc['b', 'B'] + assert result == exp + + result = df.iloc[:, 2:3] + expected = df.loc[:, ['C']] + tm.assert_frame_equal(result, expected) + + # negative indexing + result = df.iloc[-1, -1] + exp =
df.loc['j', 'D'] + assert result == exp + + # out-of-bounds exception + pytest.raises(IndexError, df.iloc.__getitem__, tuple([10, 5])) + + # trying to use a label + pytest.raises(ValueError, df.iloc.__getitem__, tuple(['j', 'D'])) + + def test_iloc_getitem_doc_issue(self): + + # multi axis slicing issue with single block + # surfaced in GH 6059 + + arr = np.random.randn(6, 4) + index = date_range('20130101', periods=6) + columns = list('ABCD') + df = DataFrame(arr, index=index, columns=columns) + + # defines ref_locs + df.describe() + + result = df.iloc[3:5, 0:2] + str(result) + result.dtypes + + expected = DataFrame(arr[3:5, 0:2], index=index[3:5], + columns=columns[0:2]) + tm.assert_frame_equal(result, expected) + + # for dups + df.columns = list('aaaa') + result = df.iloc[3:5, 0:2] + str(result) + result.dtypes + + expected = DataFrame(arr[3:5, 0:2], index=index[3:5], + columns=list('aa')) + tm.assert_frame_equal(result, expected) + + # related + arr = np.random.randn(6, 4) + index = list(range(0, 12, 2)) + columns = list(range(0, 8, 2)) + df = DataFrame(arr, index=index, columns=columns) + + df._data.blocks[0].mgr_locs + result = df.iloc[1:5, 2:4] + str(result) + result.dtypes + expected = DataFrame(arr[1:5, 2:4], index=index[1:5], + columns=columns[2:4]) + tm.assert_frame_equal(result, expected) + + def test_iloc_setitem_series(self): + df = DataFrame(np.random.randn(10, 4), index=list('abcdefghij'), + columns=list('ABCD')) + + df.iloc[1, 1] = 1 + result = df.iloc[1, 1] + assert result == 1 + + df.iloc[:, 2:3] = 0 + expected = df.iloc[:, 2:3] + result = df.iloc[:, 2:3] + tm.assert_frame_equal(result, expected) + + s = Series(np.random.randn(10), index=lrange(0, 20, 2)) + + s.iloc[1] = 1 + result = s.iloc[1] + assert result == 1 + + s.iloc[:4] = 0 + expected = s.iloc[:4] + result = s.iloc[:4] + tm.assert_series_equal(result, expected) + + s = Series([-1] * 6) + s.iloc[0::2] = [0, 2, 4] + s.iloc[1::2] = [1, 3, 5] + result = s + expected = Series([0, 1, 2, 3, 4, 5]) + tm.assert_series_equal(result, expected) + + def test_iloc_setitem_list_of_lists(self): + + # GH 7551 + # list-of-list is set incorrectly in mixed vs. 
single dtyped frames + df = DataFrame(dict(A=np.arange(5, dtype='int64'), + B=np.arange(5, 10, dtype='int64'))) + df.iloc[2:4] = [[10, 11], [12, 13]] + expected = DataFrame(dict(A=[0, 1, 10, 12, 4], B=[5, 6, 11, 13, 9])) + tm.assert_frame_equal(df, expected) + + df = DataFrame( + dict(A=list('abcde'), B=np.arange(5, 10, dtype='int64'))) + df.iloc[2:4] = [['x', 11], ['y', 13]] + expected = DataFrame(dict(A=['a', 'b', 'x', 'y', 'e'], + B=[5, 6, 11, 13, 9])) + tm.assert_frame_equal(df, expected) + + def test_iloc_mask(self): + + # GH 3631, iloc with a mask (of a series) should raise + df = DataFrame(lrange(5), list('ABCDE'), columns=['a']) + mask = (df.a % 2 == 0) + pytest.raises(ValueError, df.iloc.__getitem__, tuple([mask])) + mask.index = lrange(len(mask)) + pytest.raises(NotImplementedError, df.iloc.__getitem__, + tuple([mask])) + + # ndarray ok + result = df.iloc[np.array([True] * len(mask), dtype=bool)] + tm.assert_frame_equal(result, df) + + # the possibilities + locs = np.arange(4) + nums = 2 ** locs + reps = lmap(bin, nums) + df = DataFrame({'locs': locs, 'nums': nums}, reps) + + expected = { + (None, ''): '0b1100', + (None, '.loc'): '0b1100', + (None, '.iloc'): '0b1100', + ('index', ''): '0b11', + ('index', '.loc'): '0b11', + ('index', '.iloc'): ('iLocation based boolean indexing ' + 'cannot use an indexable as a mask'), + ('locs', ''): 'Unalignable boolean Series provided as indexer ' + '(index of the boolean Series and of the indexed ' + 'object do not match', + ('locs', '.loc'): 'Unalignable boolean Series provided as indexer ' + '(index of the boolean Series and of the ' + 'indexed object do not match', + ('locs', '.iloc'): ('iLocation based boolean indexing on an ' + 'integer type is not available'), + } + + # UserWarnings from reindex of a boolean mask + with catch_warnings(record=True): + result = dict() + for idx in [None, 'index', 'locs']: + mask = (df.nums > 2).values + if idx: + mask = Series(mask, list(reversed(getattr(df, idx)))) + for method in ['', '.loc', '.iloc']: + try: + if method: + accessor = getattr(df, method[1:]) + else: + accessor = df + ans = str(bin(accessor[mask]['nums'].sum())) + except Exception as e: + ans = str(e) + + key = tuple([idx, method]) + r = expected.get(key) + if r != ans: + raise AssertionError( + "[%s] does not match [%s], received [%s]" + % (key, ans, r)) + + def test_iloc_non_unique_indexing(self): + + # GH 4017, non-unique indexing (on the axis) + df = DataFrame({'A': [0.1] * 3000, 'B': [1] * 3000}) + idx = np.array(lrange(30)) * 99 + expected = df.iloc[idx] + + df3 = pd.concat([df, 2 * df, 3 * df]) + result = df3.iloc[idx] + + tm.assert_frame_equal(result, expected) + + df2 = DataFrame({'A': [0.1] * 1000, 'B': [1] * 1000}) + df2 = pd.concat([df2, 2 * df2, 3 * df2]) + + sidx = df2.index.to_series() + expected = df2.iloc[idx[idx <= sidx.max()]] + + new_list = [] + for r, s in expected.iterrows(): + new_list.append(s) + new_list.append(s * 2) + new_list.append(s * 3) + + expected = DataFrame(new_list) + expected = pd.concat([expected, DataFrame(index=idx[idx > sidx.max()]) + ]) + result = df2.loc[idx] + tm.assert_frame_equal(result, expected, check_index_type=False) + + def test_iloc_empty_list_indexer_is_ok(self): + from pandas.util.testing import makeCustomDataframe as mkdf + df = mkdf(5, 2) + # vertical empty + tm.assert_frame_equal(df.iloc[:, []], df.iloc[:, :0], + check_index_type=True, check_column_type=True) + # horizontal empty + tm.assert_frame_equal(df.iloc[[], :], df.iloc[:0, :], + check_index_type=True, check_column_type=True) 
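+ # a minimal illustrative sketch of the empty-indexer behavior (values are hypothetical, not part of this change): + # >>> DataFrame({'x': [1, 2]}).iloc[:, []].shape + # (2, 0) + # i.e. an empty list of positions empties that axis while the other axis and its types are kept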
+ # horizontal empty + tm.assert_frame_equal(df.iloc[[]], df.iloc[:0, :], + check_index_type=True, + check_column_type=True) + + def test_identity_slice_returns_new_object(self): + # GH13873 + original_df = DataFrame({'a': [1, 2, 3]}) + sliced_df = original_df.iloc[:] + assert sliced_df is not original_df + + # should be a shallow copy + original_df['a'] = [4, 4, 4] + assert (sliced_df['a'] == 4).all() + + original_series = Series([1, 2, 3, 4, 5, 6]) + sliced_series = original_series.iloc[:] + assert sliced_series is not original_series + + # should also be a shallow copy + original_series[:3] = [7, 8, 9] + assert all(sliced_series[:3] == [7, 8, 9]) diff --git a/pandas/tests/indexing/test_indexing.py b/pandas/tests/indexing/test_indexing.py index f7fa07916ca74..f1f51f26df55c 100644 --- a/pandas/tests/indexing/test_indexing.py +++ b/pandas/tests/indexing/test_indexing.py @@ -1,2893 +1,102 @@ # -*- coding: utf-8 -*- # pylint: disable-msg=W0612,E1101 -import sys -import nose -import itertools -import warnings -from warnings import catch_warnings -from datetime import datetime - -from pandas.types.common import (is_integer_dtype, - is_float_dtype, - is_scalar) -from pandas.compat import range, lrange, lzip, StringIO, lmap, map -from pandas.tslib import NaT -from numpy import nan -from numpy.random import randn -import numpy as np - -import pandas as pd -import pandas.core.common as com -from pandas import option_context -from pandas.core.indexing import _non_reducing_slice, _maybe_numeric_slice -from pandas.core.api import (DataFrame, Index, Series, Panel, isnull, - MultiIndex, Timestamp, Timedelta, UInt64Index) -from pandas.formats.printing import pprint_thing -from pandas import concat -from pandas.core.common import PerformanceWarning, UnsortedIndexError - -import pandas.util.testing as tm -from pandas import date_range - - -_verbose = False - -# ------------------------------------------------------------------------ -# Indexing test cases - - -def _generate_indices(f, values=False): - """ generate the indicies - if values is True , use the axis values - is False, use the range - """ - - axes = f.axes - if values: - axes = [lrange(len(a)) for a in axes] - - return itertools.product(*axes) - - -def _get_value(f, i, values=False): - """ return the value for the location i """ - - # check agains values - if values: - return f.values[i] - - # this is equiv of f[col][row]..... 
- # v = f - # for a in reversed(i): - # v = v.__getitem__(a) - # return v - with catch_warnings(record=True): - return f.ix[i] - - -def _get_result(obj, method, key, axis): - """ return the result for this obj with this key and this axis """ - - if isinstance(key, dict): - key = key[axis] - - # use an artifical conversion to map the key as integers to the labels - # so ix can work for comparisions - if method == 'indexer': - method = 'ix' - key = obj._get_axis(axis)[key] - - # in case we actually want 0 index slicing - try: - xp = getattr(obj, method).__getitem__(_axify(obj, key, axis)) - except: - xp = getattr(obj, method).__getitem__(key) - - return xp - - -def _axify(obj, key, axis): - # create a tuple accessor - axes = [slice(None)] * obj.ndim - axes[axis] = key - return tuple(axes) - - -def _mklbl(prefix, n): - return ["%s%s" % (prefix, i) for i in range(n)] - - -class TestIndexing(tm.TestCase): - - _multiprocess_can_split_ = True - - _objs = set(['series', 'frame', 'panel']) - _typs = set(['ints', 'uints', 'labels', 'mixed', - 'ts', 'floats', 'empty', 'ts_rev']) - - def setUp(self): - - self.series_ints = Series(np.random.rand(4), index=lrange(0, 8, 2)) - self.frame_ints = DataFrame(np.random.randn(4, 4), - index=lrange(0, 8, 2), - columns=lrange(0, 12, 3)) - self.panel_ints = Panel(np.random.rand(4, 4, 4), - items=lrange(0, 8, 2), - major_axis=lrange(0, 12, 3), - minor_axis=lrange(0, 16, 4)) - - self.series_uints = Series(np.random.rand(4), - index=UInt64Index(lrange(0, 8, 2))) - self.frame_uints = DataFrame(np.random.randn(4, 4), - index=UInt64Index(lrange(0, 8, 2)), - columns=UInt64Index(lrange(0, 12, 3))) - self.panel_uints = Panel(np.random.rand(4, 4, 4), - items=UInt64Index(lrange(0, 8, 2)), - major_axis=UInt64Index(lrange(0, 12, 3)), - minor_axis=UInt64Index(lrange(0, 16, 4))) - - self.series_labels = Series(np.random.randn(4), index=list('abcd')) - self.frame_labels = DataFrame(np.random.randn(4, 4), - index=list('abcd'), columns=list('ABCD')) - self.panel_labels = Panel(np.random.randn(4, 4, 4), - items=list('abcd'), - major_axis=list('ABCD'), - minor_axis=list('ZYXW')) - - self.series_mixed = Series(np.random.randn(4), index=[2, 4, 'null', 8]) - self.frame_mixed = DataFrame(np.random.randn(4, 4), - index=[2, 4, 'null', 8]) - self.panel_mixed = Panel(np.random.randn(4, 4, 4), - items=[2, 4, 'null', 8]) - - self.series_ts = Series(np.random.randn(4), - index=date_range('20130101', periods=4)) - self.frame_ts = DataFrame(np.random.randn(4, 4), - index=date_range('20130101', periods=4)) - self.panel_ts = Panel(np.random.randn(4, 4, 4), - items=date_range('20130101', periods=4)) - - dates_rev = (date_range('20130101', periods=4) - .sort_values(ascending=False)) - self.series_ts_rev = Series(np.random.randn(4), - index=dates_rev) - self.frame_ts_rev = DataFrame(np.random.randn(4, 4), - index=dates_rev) - self.panel_ts_rev = Panel(np.random.randn(4, 4, 4), - items=dates_rev) - - self.frame_empty = DataFrame({}) - self.series_empty = Series({}) - self.panel_empty = Panel({}) - - # form agglomerates - for o in self._objs: - - d = dict() - for t in self._typs: - d[t] = getattr(self, '%s_%s' % (o, t), None) - - setattr(self, o, d) - - def check_values(self, f, func, values=False): - - if f is None: - return - axes = f.axes - indicies = itertools.product(*axes) - - for i in indicies: - result = getattr(f, func)[i] - - # check agains values - if values: - expected = f.values[i] - else: - expected = f - for a in reversed(i): - expected = expected.__getitem__(a) - - 
tm.assert_almost_equal(result, expected) - - def check_result(self, name, method1, key1, method2, key2, typs=None, - objs=None, axes=None, fails=None): - def _eq(t, o, a, obj, k1, k2): - """ compare equal for these 2 keys """ - - if a is not None and a > obj.ndim - 1: - return - - def _print(result, error=None): - if error is not None: - error = str(error) - v = ("%-16.16s [%-16.16s]: [typ->%-8.8s,obj->%-8.8s," - "key1->(%-4.4s),key2->(%-4.4s),axis->%s] %s" % - (name, result, t, o, method1, method2, a, error or '')) - if _verbose: - pprint_thing(v) - - try: - rs = getattr(obj, method1).__getitem__(_axify(obj, k1, a)) - - try: - xp = _get_result(obj, method2, k2, a) - except: - result = 'no comp' - _print(result) - return - - detail = None - - try: - if is_scalar(rs) and is_scalar(xp): - self.assertEqual(rs, xp) - elif xp.ndim == 1: - tm.assert_series_equal(rs, xp) - elif xp.ndim == 2: - tm.assert_frame_equal(rs, xp) - elif xp.ndim == 3: - tm.assert_panel_equal(rs, xp) - result = 'ok' - except AssertionError as e: - detail = str(e) - result = 'fail' - - # reverse the checks - if fails is True: - if result == 'fail': - result = 'ok (fail)' - - _print(result) - if not result.startswith('ok'): - raise AssertionError(detail) - - except AssertionError: - raise - except Exception as detail: - - # if we are in fails, the ok, otherwise raise it - if fails is not None: - if isinstance(detail, fails): - result = 'ok (%s)' % type(detail).__name__ - _print(result) - return - - result = type(detail).__name__ - raise AssertionError(_print(result, error=detail)) - - if typs is None: - typs = self._typs - - if objs is None: - objs = self._objs - - if axes is not None: - if not isinstance(axes, (tuple, list)): - axes = [axes] - else: - axes = list(axes) - else: - axes = [0, 1, 2] - - # check - for o in objs: - if o not in self._objs: - continue - - d = getattr(self, o) - for a in axes: - for t in typs: - if t not in self._typs: - continue - - obj = d[t] - if obj is not None: - obj = obj.copy() - - k2 = key2 - _eq(t, o, a, obj, key1, k2) - - def test_ix_deprecation(self): - # GH 15114 - - df = DataFrame({'A': [1, 2, 3]}) - with tm.assert_produces_warning(DeprecationWarning, - check_stacklevel=False): - df.ix[1, 'A'] - - def test_indexer_caching(self): - # GH5727 - # make sure that indexers are in the _internal_names_set - n = 1000001 - arrays = [lrange(n), lrange(n)] - index = MultiIndex.from_tuples(lzip(*arrays)) - s = Series(np.zeros(n), index=index) - str(s) - - # setitem - expected = Series(np.ones(n), index=index) - s = Series(np.zeros(n), index=index) - s[s == 0] = 1 - tm.assert_series_equal(s, expected) - - def test_at_and_iat_get(self): - def _check(f, func, values=False): - - if f is not None: - indicies = _generate_indices(f, values) - for i in indicies: - result = getattr(f, func)[i] - expected = _get_value(f, i, values) - tm.assert_almost_equal(result, expected) - - for o in self._objs: - - d = getattr(self, o) - - # iat - for f in [d['ints'], d['uints']]: - _check(f, 'iat', values=True) - - for f in [d['labels'], d['ts'], d['floats']]: - if f is not None: - self.assertRaises(ValueError, self.check_values, f, 'iat') - - # at - for f in [d['ints'], d['uints'], d['labels'], - d['ts'], d['floats']]: - _check(f, 'at') - - def test_at_and_iat_set(self): - def _check(f, func, values=False): - - if f is not None: - indicies = _generate_indices(f, values) - for i in indicies: - getattr(f, func)[i] = 1 - expected = _get_value(f, i, values) - tm.assert_almost_equal(expected, 1) - - for t in self._objs: - - 
d = getattr(self, t) - - # iat - for f in [d['ints'], d['uints']]: - _check(f, 'iat', values=True) - - for f in [d['labels'], d['ts'], d['floats']]: - if f is not None: - self.assertRaises(ValueError, _check, f, 'iat') - - # at - for f in [d['ints'], d['uints'], d['labels'], - d['ts'], d['floats']]: - _check(f, 'at') - - def test_at_iat_coercion(self): - - # as timestamp is not a tuple! - dates = date_range('1/1/2000', periods=8) - df = DataFrame(randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D']) - s = df['A'] - - result = s.at[dates[5]] - xp = s.values[5] - self.assertEqual(result, xp) - - # GH 7729 - # make sure we are boxing the returns - s = Series(['2014-01-01', '2014-02-02'], dtype='datetime64[ns]') - expected = Timestamp('2014-02-02') - - for r in [lambda: s.iat[1], lambda: s.iloc[1]]: - result = r() - self.assertEqual(result, expected) - - s = Series(['1 days', '2 days'], dtype='timedelta64[ns]') - expected = Timedelta('2 days') - - for r in [lambda: s.iat[1], lambda: s.iloc[1]]: - result = r() - self.assertEqual(result, expected) - - def test_iat_invalid_args(self): - pass - - def test_imethods_with_dups(self): - - # GH6493 - # iat/iloc with dups - - s = Series(range(5), index=[1, 1, 2, 2, 3], dtype='int64') - result = s.iloc[2] - self.assertEqual(result, 2) - result = s.iat[2] - self.assertEqual(result, 2) - - self.assertRaises(IndexError, lambda: s.iat[10]) - self.assertRaises(IndexError, lambda: s.iat[-10]) - - result = s.iloc[[2, 3]] - expected = Series([2, 3], [2, 2], dtype='int64') - tm.assert_series_equal(result, expected) - - df = s.to_frame() - result = df.iloc[2] - expected = Series(2, index=[0], name=2) - tm.assert_series_equal(result, expected) - - result = df.iat[2, 0] - expected = 2 - self.assertEqual(result, 2) - - def test_repeated_getitem_dups(self): - # GH 5678 - # repeated gettitems on a dup index returing a ndarray - df = DataFrame( - np.random.random_sample((20, 5)), - index=['ABCDE' [x % 5] for x in range(20)]) - expected = df.loc['A', 0] - result = df.loc[:, 0].loc['A'] - tm.assert_series_equal(result, expected) - - def test_iloc_exceeds_bounds(self): - - # GH6296 - # iloc should allow indexers that exceed the bounds - df = DataFrame(np.random.random_sample((20, 5)), columns=list('ABCDE')) - expected = df - - # lists of positions should raise IndexErrror! 
- with tm.assertRaisesRegexp(IndexError, - 'positional indexers are out-of-bounds'): - df.iloc[:, [0, 1, 2, 3, 4, 5]] - self.assertRaises(IndexError, lambda: df.iloc[[1, 30]]) - self.assertRaises(IndexError, lambda: df.iloc[[1, -30]]) - self.assertRaises(IndexError, lambda: df.iloc[[100]]) - - s = df['A'] - self.assertRaises(IndexError, lambda: s.iloc[[100]]) - self.assertRaises(IndexError, lambda: s.iloc[[-100]]) - - # still raise on a single indexer - msg = 'single positional indexer is out-of-bounds' - with tm.assertRaisesRegexp(IndexError, msg): - df.iloc[30] - self.assertRaises(IndexError, lambda: df.iloc[-30]) - - # GH10779 - # single positive/negative indexer exceeding Series bounds should raise - # an IndexError - with tm.assertRaisesRegexp(IndexError, msg): - s.iloc[30] - self.assertRaises(IndexError, lambda: s.iloc[-30]) - - # slices are ok - result = df.iloc[:, 4:10] # 0 < start < len < stop - expected = df.iloc[:, 4:] - tm.assert_frame_equal(result, expected) - - result = df.iloc[:, -4:-10] # stop < 0 < start < len - expected = df.iloc[:, :0] - tm.assert_frame_equal(result, expected) - - result = df.iloc[:, 10:4:-1] # 0 < stop < len < start (down) - expected = df.iloc[:, :4:-1] - tm.assert_frame_equal(result, expected) - - result = df.iloc[:, 4:-10:-1] # stop < 0 < start < len (down) - expected = df.iloc[:, 4::-1] - tm.assert_frame_equal(result, expected) - - result = df.iloc[:, -10:4] # start < 0 < stop < len - expected = df.iloc[:, :4] - tm.assert_frame_equal(result, expected) - - result = df.iloc[:, 10:4] # 0 < stop < len < start - expected = df.iloc[:, :0] - tm.assert_frame_equal(result, expected) - - result = df.iloc[:, -10:-11:-1] # stop < start < 0 < len (down) - expected = df.iloc[:, :0] - tm.assert_frame_equal(result, expected) - - result = df.iloc[:, 10:11] # 0 < len < start < stop - expected = df.iloc[:, :0] - tm.assert_frame_equal(result, expected) - - # slice bounds exceeding is ok - result = s.iloc[18:30] - expected = s.iloc[18:] - tm.assert_series_equal(result, expected) - - result = s.iloc[30:] - expected = s.iloc[:0] - tm.assert_series_equal(result, expected) - - result = s.iloc[30::-1] - expected = s.iloc[::-1] - tm.assert_series_equal(result, expected) - - # doc example - def check(result, expected): - str(result) - result.dtypes - tm.assert_frame_equal(result, expected) - - dfl = DataFrame(np.random.randn(5, 2), columns=list('AB')) - check(dfl.iloc[:, 2:3], DataFrame(index=dfl.index)) - check(dfl.iloc[:, 1:3], dfl.iloc[:, [1]]) - check(dfl.iloc[4:6], dfl.iloc[[4]]) - - self.assertRaises(IndexError, lambda: dfl.iloc[[4, 5, 6]]) - self.assertRaises(IndexError, lambda: dfl.iloc[:, 4]) - - def test_iloc_getitem_int(self): - - # integer - self.check_result('integer', 'iloc', 2, 'ix', - {0: 4, 1: 6, 2: 8}, typs=['ints', 'uints']) - self.check_result('integer', 'iloc', 2, 'indexer', 2, - typs=['labels', 'mixed', 'ts', 'floats', 'empty'], - fails=IndexError) - - def test_iloc_getitem_neg_int(self): - - # neg integer - self.check_result('neg int', 'iloc', -1, 'ix', - {0: 6, 1: 9, 2: 12}, typs=['ints', 'uints']) - self.check_result('neg int', 'iloc', -1, 'indexer', -1, - typs=['labels', 'mixed', 'ts', 'floats', 'empty'], - fails=IndexError) - - def test_iloc_getitem_list_int(self): - - # list of ints - self.check_result('list int', 'iloc', [0, 1, 2], 'ix', - {0: [0, 2, 4], 1: [0, 3, 6], 2: [0, 4, 8]}, - typs=['ints', 'uints']) - self.check_result('list int', 'iloc', [2], 'ix', - {0: [4], 1: [6], 2: [8]}, typs=['ints', 'uints']) - self.check_result('list int', 'iloc', 
[0, 1, 2], 'indexer', [0, 1, 2], - typs=['labels', 'mixed', 'ts', 'floats', 'empty'], - fails=IndexError) - - # array of ints (GH5006), make sure that a single indexer is returning - # the correct type - self.check_result('array int', 'iloc', np.array([0, 1, 2]), 'ix', - {0: [0, 2, 4], - 1: [0, 3, 6], - 2: [0, 4, 8]}, typs=['ints', 'uints']) - self.check_result('array int', 'iloc', np.array([2]), 'ix', - {0: [4], 1: [6], 2: [8]}, typs=['ints', 'uints']) - self.check_result('array int', 'iloc', np.array([0, 1, 2]), 'indexer', - [0, 1, 2], - typs=['labels', 'mixed', 'ts', 'floats', 'empty'], - fails=IndexError) - - def test_iloc_getitem_neg_int_can_reach_first_index(self): - # GH10547 and GH10779 - # negative integers should be able to reach index 0 - df = DataFrame({'A': [2, 3, 5], 'B': [7, 11, 13]}) - s = df['A'] - - expected = df.iloc[0] - result = df.iloc[-3] - tm.assert_series_equal(result, expected) - - expected = df.iloc[[0]] - result = df.iloc[[-3]] - tm.assert_frame_equal(result, expected) - - expected = s.iloc[0] - result = s.iloc[-3] - self.assertEqual(result, expected) - - expected = s.iloc[[0]] - result = s.iloc[[-3]] - tm.assert_series_equal(result, expected) - - # check the length 1 Series case highlighted in GH10547 - expected = pd.Series(['a'], index=['A']) - result = expected.iloc[[-1]] - tm.assert_series_equal(result, expected) - - def test_iloc_getitem_dups(self): - - # no dups in panel (bug?) - self.check_result('list int (dups)', 'iloc', [0, 1, 1, 3], 'ix', - {0: [0, 2, 2, 6], 1: [0, 3, 3, 9]}, - objs=['series', 'frame'], typs=['ints', 'uints']) - - # GH 6766 - df1 = DataFrame([{'A': None, 'B': 1}, {'A': 2, 'B': 2}]) - df2 = DataFrame([{'A': 3, 'B': 3}, {'A': 4, 'B': 4}]) - df = concat([df1, df2], axis=1) - - # cross-sectional indexing - result = df.iloc[0, 0] - self.assertTrue(isnull(result)) - - result = df.iloc[0, :] - expected = Series([np.nan, 1, 3, 3], index=['A', 'B', 'A', 'B'], - name=0) - tm.assert_series_equal(result, expected) - - def test_iloc_getitem_array(self): - - # array like - s = Series(index=lrange(1, 4)) - self.check_result('array like', 'iloc', s.index, 'ix', - {0: [2, 4, 6], 1: [3, 6, 9], 2: [4, 8, 12]}, - typs=['ints', 'uints']) - - def test_iloc_getitem_bool(self): - - # boolean indexers - b = [True, False, True, False, ] - self.check_result('bool', 'iloc', b, 'ix', b, typs=['ints', 'uints']) - self.check_result('bool', 'iloc', b, 'ix', b, - typs=['labels', 'mixed', 'ts', 'floats', 'empty'], - fails=IndexError) - - def test_iloc_getitem_slice(self): - - # slices - self.check_result('slice', 'iloc', slice(1, 3), 'ix', - {0: [2, 4], 1: [3, 6], 2: [4, 8]}, - typs=['ints', 'uints']) - self.check_result('slice', 'iloc', slice(1, 3), 'indexer', - slice(1, 3), - typs=['labels', 'mixed', 'ts', 'floats', 'empty'], - fails=IndexError) - - def test_iloc_getitem_slice_dups(self): - - df1 = DataFrame(np.random.randn(10, 4), columns=['A', 'A', 'B', 'B']) - df2 = DataFrame(np.random.randint(0, 10, size=20).reshape(10, 2), - columns=['A', 'C']) - - # axis=1 - df = concat([df1, df2], axis=1) - tm.assert_frame_equal(df.iloc[:, :4], df1) - tm.assert_frame_equal(df.iloc[:, 4:], df2) - - df = concat([df2, df1], axis=1) - tm.assert_frame_equal(df.iloc[:, :2], df2) - tm.assert_frame_equal(df.iloc[:, 2:], df1) - - exp = concat([df2, df1.iloc[:, [0]]], axis=1) - tm.assert_frame_equal(df.iloc[:, 0:3], exp) - - # axis=0 - df = concat([df, df], axis=0) - tm.assert_frame_equal(df.iloc[0:10, :2], df2) - tm.assert_frame_equal(df.iloc[0:10, 2:], df1) - 
tm.assert_frame_equal(df.iloc[10:, :2], df2) - tm.assert_frame_equal(df.iloc[10:, 2:], df1) - - def test_iloc_getitem_multiindex2(self): - # TODO(wesm): fix this - raise nose.SkipTest('this test was being suppressed, ' - 'needs to be fixed') - - arr = np.random.randn(3, 3) - df = DataFrame(arr, columns=[[2, 2, 4], [6, 8, 10]], - index=[[4, 4, 8], [8, 10, 12]]) - - rs = df.iloc[2] - xp = Series(arr[2], index=df.columns) - tm.assert_series_equal(rs, xp) - - rs = df.iloc[:, 2] - xp = Series(arr[:, 2], index=df.index) - tm.assert_series_equal(rs, xp) - - rs = df.iloc[2, 2] - xp = df.values[2, 2] - self.assertEqual(rs, xp) - - # for multiple items - # GH 5528 - rs = df.iloc[[0, 1]] - xp = df.xs(4, drop_level=False) - tm.assert_frame_equal(rs, xp) - - tup = zip(*[['a', 'a', 'b', 'b'], ['x', 'y', 'x', 'y']]) - index = MultiIndex.from_tuples(tup) - df = DataFrame(np.random.randn(4, 4), index=index) - rs = df.iloc[[2, 3]] - xp = df.xs('b', drop_level=False) - tm.assert_frame_equal(rs, xp) - - def test_iloc_setitem(self): - df = self.frame_ints - - df.iloc[1, 1] = 1 - result = df.iloc[1, 1] - self.assertEqual(result, 1) - - df.iloc[:, 2:3] = 0 - expected = df.iloc[:, 2:3] - result = df.iloc[:, 2:3] - tm.assert_frame_equal(result, expected) - - # GH5771 - s = Series(0, index=[4, 5, 6]) - s.iloc[1:2] += 1 - expected = Series([0, 1, 0], index=[4, 5, 6]) - tm.assert_series_equal(s, expected) - - def test_loc_setitem_slice(self): - # GH10503 - - # assigning the same type should not change the type - df1 = DataFrame({'a': [0, 1, 1], - 'b': Series([100, 200, 300], dtype='uint32')}) - ix = df1['a'] == 1 - newb1 = df1.loc[ix, 'b'] + 1 - df1.loc[ix, 'b'] = newb1 - expected = DataFrame({'a': [0, 1, 1], - 'b': Series([100, 201, 301], dtype='uint32')}) - tm.assert_frame_equal(df1, expected) - - # assigning a new type should get the inferred type - df2 = DataFrame({'a': [0, 1, 1], 'b': [100, 200, 300]}, - dtype='uint64') - ix = df1['a'] == 1 - newb2 = df2.loc[ix, 'b'] - df1.loc[ix, 'b'] = newb2 - expected = DataFrame({'a': [0, 1, 1], 'b': [100, 200, 300]}, - dtype='uint64') - tm.assert_frame_equal(df2, expected) - - def test_ix_loc_setitem_consistency(self): - - # GH 5771 - # loc with slice and series - s = Series(0, index=[4, 5, 6]) - s.loc[4:5] += 1 - expected = Series([1, 1, 0], index=[4, 5, 6]) - tm.assert_series_equal(s, expected) - - # GH 5928 - # chained indexing assignment - df = DataFrame({'a': [0, 1, 2]}) - expected = df.copy() - with catch_warnings(record=True): - expected.ix[[0, 1, 2], 'a'] = -expected.ix[[0, 1, 2], 'a'] - - with catch_warnings(record=True): - df['a'].ix[[0, 1, 2]] = -df['a'].ix[[0, 1, 2]] - tm.assert_frame_equal(df, expected) - - df = DataFrame({'a': [0, 1, 2], 'b': [0, 1, 2]}) - with catch_warnings(record=True): - df['a'].ix[[0, 1, 2]] = -df['a'].ix[[0, 1, 2]].astype( - 'float64') + 0.5 - expected = DataFrame({'a': [0.5, -0.5, -1.5], 'b': [0, 1, 2]}) - tm.assert_frame_equal(df, expected) - - # GH 8607 - # ix setitem consistency - df = DataFrame({'timestamp': [1413840976, 1413842580, 1413760580], - 'delta': [1174, 904, 161], - 'elapsed': [7673, 9277, 1470]}) - expected = DataFrame({'timestamp': pd.to_datetime( - [1413840976, 1413842580, 1413760580], unit='s'), - 'delta': [1174, 904, 161], - 'elapsed': [7673, 9277, 1470]}) - - df2 = df.copy() - df2['timestamp'] = pd.to_datetime(df['timestamp'], unit='s') - tm.assert_frame_equal(df2, expected) - - df2 = df.copy() - df2.loc[:, 'timestamp'] = pd.to_datetime(df['timestamp'], unit='s') - tm.assert_frame_equal(df2, expected) - - df2 = 
df.copy()
-        with catch_warnings(record=True):
-            df2.ix[:, 2] = pd.to_datetime(df['timestamp'], unit='s')
-        tm.assert_frame_equal(df2, expected)
-
-    def test_ix_loc_consistency(self):
-
-        # GH 8613
-        # some edge cases where ix/loc should return the same
-        # this is not an exhaustive case
-
-        def compare(result, expected):
-            if is_scalar(expected):
-                self.assertEqual(result, expected)
-            else:
-                self.assertTrue(expected.equals(result))
-
-        # failure cases for .loc, but these work for .ix
-        df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
-        for key in [slice(1, 3), tuple([slice(0, 2), slice(0, 2)]),
-                    tuple([slice(0, 2), df.columns[0:2]])]:
-
-            for index in [tm.makeStringIndex, tm.makeUnicodeIndex,
-                          tm.makeDateIndex, tm.makePeriodIndex,
-                          tm.makeTimedeltaIndex]:
-                df.index = index(len(df.index))
-                with catch_warnings(record=True):
-                    df.ix[key]
-
-                self.assertRaises(TypeError, lambda: df.loc[key])
-
-        df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'),
-                          index=pd.date_range('2012-01-01', periods=5))
-
-        for key in ['2012-01-03',
-                    '2012-01-31',
-                    slice('2012-01-03', '2012-01-03'),
-                    slice('2012-01-03', '2012-01-04'),
-                    slice('2012-01-03', '2012-01-06', 2),
-                    slice('2012-01-03', '2012-01-31'),
-                    tuple([[True, True, True, False, True]]), ]:
-
-            # getitem
-
-            # if the expected raises, then compare the exceptions
-            try:
-                with catch_warnings(record=True):
-                    expected = df.ix[key]
-            except KeyError:
-                self.assertRaises(KeyError, lambda: df.loc[key])
-                continue
-
-            result = df.loc[key]
-            compare(result, expected)
-
-            # setitem
-            df1 = df.copy()
-            df2 = df.copy()
-
-            with catch_warnings(record=True):
-                df1.ix[key] = 10
-            df2.loc[key] = 10
-            compare(df2, df1)
-
-        # edge cases
-        s = Series([1, 2, 3, 4], index=list('abde'))
-
-        result1 = s['a':'c']
-        with catch_warnings(record=True):
-            result2 = s.ix['a':'c']
-        result3 = s.loc['a':'c']
-        tm.assert_series_equal(result1, result2)
-        tm.assert_series_equal(result1, result3)
-
-        # now work rather than raising KeyError
-        s = Series(range(5), [-2, -1, 1, 2, 3])
-
-        with catch_warnings(record=True):
-            result1 = s.ix[-10:3]
-        result2 = s.loc[-10:3]
-        tm.assert_series_equal(result1, result2)
-
-        with catch_warnings(record=True):
-            result1 = s.ix[0:3]
-        result2 = s.loc[0:3]
-        tm.assert_series_equal(result1, result2)
-
-    def test_setitem_multiindex(self):
-        for index_fn in ('ix', 'loc'):
-
-            def check(target, indexers, value, compare_fn, expected=None):
-                fn = getattr(target, index_fn)
-                fn.__setitem__(indexers, value)
-                result = fn.__getitem__(indexers)
-                if expected is None:
-                    expected = value
-                compare_fn(result, expected)
-            # GH7190
-            index = pd.MultiIndex.from_product([np.arange(0, 100),
-                                                np.arange(0, 80)],
-                                               names=['time', 'firm'])
-            t, n = 0, 2
-            df = DataFrame(np.nan, columns=['A', 'w', 'l', 'a', 'x',
-                                            'X', 'd', 'profit'],
-                           index=index)
-            check(target=df, indexers=((t, n), 'X'), value=0,
-                  compare_fn=self.assertEqual)
-
-            df = DataFrame(-999, columns=['A', 'w', 'l', 'a', 'x',
-                                          'X', 'd', 'profit'],
-                           index=index)
-            check(target=df, indexers=((t, n), 'X'), value=1,
-                  compare_fn=self.assertEqual)
-
-            df = DataFrame(columns=['A', 'w', 'l', 'a', 'x',
-                                    'X', 'd', 'profit'],
-                           index=index)
-            check(target=df, indexers=((t, n), 'X'), value=2,
-                  compare_fn=self.assertEqual)
-
-            # GH 7218, assigning with 0-dim arrays
-            df = DataFrame(-999, columns=['A', 'w', 'l', 'a', 'x',
-                                          'X', 'd', 'profit'],
-                           index=index)
-            check(target=df,
-                  indexers=((t, n), 'X'),
-                  value=np.array(3),
-                  compare_fn=self.assertEqual,
-                  expected=3, )
-
-            # GH5206
-            df = 
pd.DataFrame(np.arange(25).reshape(5, 5), - columns='A,B,C,D,E'.split(','), dtype=float) - df['F'] = 99 - row_selection = df['A'] % 2 == 0 - col_selection = ['B', 'C'] - with catch_warnings(record=True): - df.ix[row_selection, col_selection] = df['F'] - output = pd.DataFrame(99., index=[0, 2, 4], columns=['B', 'C']) - with catch_warnings(record=True): - tm.assert_frame_equal(df.ix[row_selection, col_selection], - output) - check(target=df, - indexers=(row_selection, col_selection), - value=df['F'], - compare_fn=tm.assert_frame_equal, - expected=output, ) - - # GH11372 - idx = pd.MultiIndex.from_product([ - ['A', 'B', 'C'], - pd.date_range('2015-01-01', '2015-04-01', freq='MS')]) - cols = pd.MultiIndex.from_product([ - ['foo', 'bar'], - pd.date_range('2016-01-01', '2016-02-01', freq='MS')]) - - df = pd.DataFrame(np.random.random((12, 4)), - index=idx, columns=cols) - - subidx = pd.MultiIndex.from_tuples( - [('A', pd.Timestamp('2015-01-01')), - ('A', pd.Timestamp('2015-02-01'))]) - subcols = pd.MultiIndex.from_tuples( - [('foo', pd.Timestamp('2016-01-01')), - ('foo', pd.Timestamp('2016-02-01'))]) - - vals = pd.DataFrame(np.random.random((2, 2)), - index=subidx, columns=subcols) - check(target=df, - indexers=(subidx, subcols), - value=vals, - compare_fn=tm.assert_frame_equal, ) - # set all columns - vals = pd.DataFrame( - np.random.random((2, 4)), index=subidx, columns=cols) - check(target=df, - indexers=(subidx, slice(None, None, None)), - value=vals, - compare_fn=tm.assert_frame_equal, ) - # identity - copy = df.copy() - check(target=df, indexers=(df.index, df.columns), value=df, - compare_fn=tm.assert_frame_equal, expected=copy) - - def test_indexing_with_datetime_tz(self): - - # 8260 - # support datetime64 with tz - - idx = Index(date_range('20130101', periods=3, tz='US/Eastern'), - name='foo') - dr = date_range('20130110', periods=3) - df = DataFrame({'A': idx, 'B': dr}) - df['C'] = idx - df.iloc[1, 1] = pd.NaT - df.iloc[1, 2] = pd.NaT - - # indexing - result = df.iloc[1] - expected = Series([Timestamp('2013-01-02 00:00:00-0500', - tz='US/Eastern'), np.nan, np.nan], - index=list('ABC'), dtype='object', name=1) - tm.assert_series_equal(result, expected) - result = df.loc[1] - expected = Series([Timestamp('2013-01-02 00:00:00-0500', - tz='US/Eastern'), np.nan, np.nan], - index=list('ABC'), dtype='object', name=1) - tm.assert_series_equal(result, expected) - - # indexing - fast_xs - df = DataFrame({'a': date_range('2014-01-01', periods=10, tz='UTC')}) - result = df.iloc[5] - expected = Timestamp('2014-01-06 00:00:00+0000', tz='UTC', freq='D') - self.assertEqual(result, expected) - - result = df.loc[5] - self.assertEqual(result, expected) - - # indexing - boolean - result = df[df.a > df.a[3]] - expected = df.iloc[4:] - tm.assert_frame_equal(result, expected) - - # indexing - setting an element - df = DataFrame(data=pd.to_datetime( - ['2015-03-30 20:12:32', '2015-03-12 00:11:11']), columns=['time']) - df['new_col'] = ['new', 'old'] - df.time = df.set_index('time').index.tz_localize('UTC') - v = df[df.new_col == 'new'].set_index('time').index.tz_convert( - 'US/Pacific') - - # trying to set a single element on a part of a different timezone - def f(): - df.loc[df.new_col == 'new', 'time'] = v - - self.assertRaises(ValueError, f) - - v = df.loc[df.new_col == 'new', 'time'] + pd.Timedelta('1s') - df.loc[df.new_col == 'new', 'time'] = v - tm.assert_series_equal(df.loc[df.new_col == 'new', 'time'], v) - - def test_indexing_with_datetimeindex_tz(self): - - # GH 12050 - # indexing on a series with a 
datetimeindex with tz - index = pd.date_range('2015-01-01', periods=2, tz='utc') - - ser = pd.Series(range(2), index=index, - dtype='int64') - - # list-like indexing - - for sel in (index, list(index)): - # getitem - tm.assert_series_equal(ser[sel], ser) - - # setitem - result = ser.copy() - result[sel] = 1 - expected = pd.Series(1, index=index) - tm.assert_series_equal(result, expected) - - # .loc getitem - tm.assert_series_equal(ser.loc[sel], ser) - - # .loc setitem - result = ser.copy() - result.loc[sel] = 1 - expected = pd.Series(1, index=index) - tm.assert_series_equal(result, expected) - - # single element indexing - - # getitem - self.assertEqual(ser[index[1]], 1) - - # setitem - result = ser.copy() - result[index[1]] = 5 - expected = pd.Series([0, 5], index=index) - tm.assert_series_equal(result, expected) - - # .loc getitem - self.assertEqual(ser.loc[index[1]], 1) - - # .loc setitem - result = ser.copy() - result.loc[index[1]] = 5 - expected = pd.Series([0, 5], index=index) - tm.assert_series_equal(result, expected) - - def test_loc_setitem_dups(self): - - # GH 6541 - df_orig = DataFrame( - {'me': list('rttti'), - 'foo': list('aaade'), - 'bar': np.arange(5, dtype='float64') * 1.34 + 2, - 'bar2': np.arange(5, dtype='float64') * -.34 + 2}).set_index('me') - - indexer = tuple(['r', ['bar', 'bar2']]) - df = df_orig.copy() - df.loc[indexer] *= 2.0 - tm.assert_series_equal(df.loc[indexer], 2.0 * df_orig.loc[indexer]) - - indexer = tuple(['r', 'bar']) - df = df_orig.copy() - df.loc[indexer] *= 2.0 - self.assertEqual(df.loc[indexer], 2.0 * df_orig.loc[indexer]) - - indexer = tuple(['t', ['bar', 'bar2']]) - df = df_orig.copy() - df.loc[indexer] *= 2.0 - tm.assert_frame_equal(df.loc[indexer], 2.0 * df_orig.loc[indexer]) - - def test_iloc_setitem_dups(self): - - # GH 6766 - # iloc with a mask aligning from another iloc - df1 = DataFrame([{'A': None, 'B': 1}, {'A': 2, 'B': 2}]) - df2 = DataFrame([{'A': 3, 'B': 3}, {'A': 4, 'B': 4}]) - df = concat([df1, df2], axis=1) - - expected = df.fillna(3) - expected['A'] = expected['A'].astype('float64') - inds = np.isnan(df.iloc[:, 0]) - mask = inds[inds].index - df.iloc[mask, 0] = df.iloc[mask, 2] - tm.assert_frame_equal(df, expected) - - # del a dup column across blocks - expected = DataFrame({0: [1, 2], 1: [3, 4]}) - expected.columns = ['B', 'B'] - del df['A'] - tm.assert_frame_equal(df, expected) - - # assign back to self - df.iloc[[0, 1], [0, 1]] = df.iloc[[0, 1], [0, 1]] - tm.assert_frame_equal(df, expected) - - # reversed x 2 - df.iloc[[1, 0], [0, 1]] = df.iloc[[1, 0], [0, 1]].reset_index( - drop=True) - df.iloc[[1, 0], [0, 1]] = df.iloc[[1, 0], [0, 1]].reset_index( - drop=True) - tm.assert_frame_equal(df, expected) - - def test_chained_getitem_with_lists(self): - - # GH6394 - # Regression in chained getitem indexing with embedded list-like from - # 0.12 - def check(result, expected): - tm.assert_numpy_array_equal(result, expected) - tm.assertIsInstance(result, np.ndarray) - - df = DataFrame({'A': 5 * [np.zeros(3)], 'B': 5 * [np.ones(3)]}) - expected = df['A'].iloc[2] - result = df.loc[2, 'A'] - check(result, expected) - result2 = df.iloc[2]['A'] - check(result2, expected) - result3 = df['A'].loc[2] - check(result3, expected) - result4 = df['A'].iloc[2] - check(result4, expected) - - def test_loc_getitem_int(self): - - # int label - self.check_result('int label', 'loc', 2, 'ix', 2, - typs=['ints', 'uints'], axes=0) - self.check_result('int label', 'loc', 3, 'ix', 3, - typs=['ints', 'uints'], axes=1) - self.check_result('int label', 'loc', 4, 
'ix', 4, - typs=['ints', 'uints'], axes=2) - self.check_result('int label', 'loc', 2, 'ix', 2, - typs=['label'], fails=KeyError) - - def test_loc_getitem_label(self): - - # label - self.check_result('label', 'loc', 'c', 'ix', 'c', typs=['labels'], - axes=0) - self.check_result('label', 'loc', 'null', 'ix', 'null', typs=['mixed'], - axes=0) - self.check_result('label', 'loc', 8, 'ix', 8, typs=['mixed'], axes=0) - self.check_result('label', 'loc', Timestamp('20130102'), 'ix', 1, - typs=['ts'], axes=0) - self.check_result('label', 'loc', 'c', 'ix', 'c', typs=['empty'], - fails=KeyError) - - def test_loc_getitem_label_out_of_range(self): - - # out of range label - self.check_result('label range', 'loc', 'f', 'ix', 'f', - typs=['ints', 'uints', 'labels', 'mixed', 'ts'], - fails=KeyError) - self.check_result('label range', 'loc', 'f', 'ix', 'f', - typs=['floats'], fails=TypeError) - self.check_result('label range', 'loc', 20, 'ix', 20, - typs=['ints', 'uints', 'mixed'], fails=KeyError) - self.check_result('label range', 'loc', 20, 'ix', 20, - typs=['labels'], fails=TypeError) - self.check_result('label range', 'loc', 20, 'ix', 20, typs=['ts'], - axes=0, fails=TypeError) - self.check_result('label range', 'loc', 20, 'ix', 20, typs=['floats'], - axes=0, fails=TypeError) - - def test_loc_getitem_label_list(self): - - # list of labels - self.check_result('list lbl', 'loc', [0, 2, 4], 'ix', [0, 2, 4], - typs=['ints', 'uints'], axes=0) - self.check_result('list lbl', 'loc', [3, 6, 9], 'ix', [3, 6, 9], - typs=['ints', 'uints'], axes=1) - self.check_result('list lbl', 'loc', [4, 8, 12], 'ix', [4, 8, 12], - typs=['ints', 'uints'], axes=2) - self.check_result('list lbl', 'loc', ['a', 'b', 'd'], 'ix', - ['a', 'b', 'd'], typs=['labels'], axes=0) - self.check_result('list lbl', 'loc', ['A', 'B', 'C'], 'ix', - ['A', 'B', 'C'], typs=['labels'], axes=1) - self.check_result('list lbl', 'loc', ['Z', 'Y', 'W'], 'ix', - ['Z', 'Y', 'W'], typs=['labels'], axes=2) - self.check_result('list lbl', 'loc', [2, 8, 'null'], 'ix', - [2, 8, 'null'], typs=['mixed'], axes=0) - self.check_result('list lbl', 'loc', - [Timestamp('20130102'), Timestamp('20130103')], 'ix', - [Timestamp('20130102'), Timestamp('20130103')], - typs=['ts'], axes=0) - - self.check_result('list lbl', 'loc', [0, 1, 2], 'indexer', [0, 1, 2], - typs=['empty'], fails=KeyError) - self.check_result('list lbl', 'loc', [0, 2, 3], 'ix', [0, 2, 3], - typs=['ints', 'uints'], axes=0, fails=KeyError) - self.check_result('list lbl', 'loc', [3, 6, 7], 'ix', [3, 6, 7], - typs=['ints', 'uints'], axes=1, fails=KeyError) - self.check_result('list lbl', 'loc', [4, 8, 10], 'ix', [4, 8, 10], - typs=['ints', 'uints'], axes=2, fails=KeyError) - - def test_loc_getitem_label_list_fails(self): - # fails - self.check_result('list lbl', 'loc', [20, 30, 40], 'ix', [20, 30, 40], - typs=['ints', 'uints'], axes=1, fails=KeyError) - self.check_result('list lbl', 'loc', [20, 30, 40], 'ix', [20, 30, 40], - typs=['ints', 'uints'], axes=2, fails=KeyError) - - def test_loc_getitem_label_array_like(self): - # array like - self.check_result('array like', 'loc', Series(index=[0, 2, 4]).index, - 'ix', [0, 2, 4], typs=['ints', 'uints'], axes=0) - self.check_result('array like', 'loc', Series(index=[3, 6, 9]).index, - 'ix', [3, 6, 9], typs=['ints', 'uints'], axes=1) - self.check_result('array like', 'loc', Series(index=[4, 8, 12]).index, - 'ix', [4, 8, 12], typs=['ints', 'uints'], axes=2) - - def test_loc_getitem_series(self): - # GH14730 - # passing a series as a key with a MultiIndex - index = 
MultiIndex.from_product([[1, 2, 3], ['A', 'B', 'C']]) - x = Series(index=index, data=range(9), dtype=np.float64) - y = Series([1, 3]) - expected = Series( - data=[0, 1, 2, 6, 7, 8], - index=MultiIndex.from_product([[1, 3], ['A', 'B', 'C']]), - dtype=np.float64) - result = x.loc[y] - tm.assert_series_equal(result, expected) - - result = x.loc[[1, 3]] - tm.assert_series_equal(result, expected) - - empty = Series(data=[], dtype=np.float64) - expected = Series([], index=MultiIndex( - levels=index.levels, labels=[[], []], dtype=np.float64)) - result = x.loc[empty] - tm.assert_series_equal(result, expected) - - def test_loc_getitem_bool(self): - # boolean indexers - b = [True, False, True, False] - self.check_result('bool', 'loc', b, 'ix', b, - typs=['ints', 'uints', 'labels', - 'mixed', 'ts', 'floats']) - self.check_result('bool', 'loc', b, 'ix', b, typs=['empty'], - fails=KeyError) - - def test_loc_getitem_int_slice(self): - - # ok - self.check_result('int slice2', 'loc', slice(2, 4), 'ix', [2, 4], - typs=['ints', 'uints'], axes=0) - self.check_result('int slice2', 'loc', slice(3, 6), 'ix', [3, 6], - typs=['ints', 'uints'], axes=1) - self.check_result('int slice2', 'loc', slice(4, 8), 'ix', [4, 8], - typs=['ints', 'uints'], axes=2) - - # GH 3053 - # loc should treat integer slices like label slices - from itertools import product - - index = MultiIndex.from_tuples([t for t in product( - [6, 7, 8], ['a', 'b'])]) - df = DataFrame(np.random.randn(6, 6), index, index) - result = df.loc[6:8, :] - with catch_warnings(record=True): - expected = df.ix[6:8, :] - tm.assert_frame_equal(result, expected) - - index = MultiIndex.from_tuples([t - for t in product( - [10, 20, 30], ['a', 'b'])]) - df = DataFrame(np.random.randn(6, 6), index, index) - result = df.loc[20:30, :] - with catch_warnings(record=True): - expected = df.ix[20:30, :] - tm.assert_frame_equal(result, expected) - - # doc examples - result = df.loc[10, :] - with catch_warnings(record=True): - expected = df.ix[10, :] - tm.assert_frame_equal(result, expected) - - result = df.loc[:, 10] - # expected = df.ix[:,10] (this fails) - expected = df[10] - tm.assert_frame_equal(result, expected) - - def test_loc_to_fail(self): - - # GH3449 - df = DataFrame(np.random.random((3, 3)), - index=['a', 'b', 'c'], - columns=['e', 'f', 'g']) - - # raise a KeyError? 
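(The assertion that follows answers the comment above: yes, it raises.) A hedged standalone sketch of `.loc`'s no-fallback rule from GH3449/GH7496, with the frame contents invented here:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((3, 3)),
                  index=['a', 'b', 'c'], columns=['e', 'f', 'g'])

# .loc is strictly label-based: integer keys that are not labels raise
# KeyError instead of silently falling back to positional indexing.
try:
    df.loc[[1, 2], [1, 2]]
except KeyError as err:
    print('KeyError as expected:', err)
```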
- self.assertRaises(KeyError, df.loc.__getitem__, - tuple([[1, 2], [1, 2]])) - - # GH 7496 - # loc should not fallback - - s = Series() - s.loc[1] = 1 - s.loc['a'] = 2 - - self.assertRaises(KeyError, lambda: s.loc[-1]) - self.assertRaises(KeyError, lambda: s.loc[[-1, -2]]) - - self.assertRaises(KeyError, lambda: s.loc[['4']]) - - s.loc[-1] = 3 - result = s.loc[[-1, -2]] - expected = Series([3, np.nan], index=[-1, -2]) - tm.assert_series_equal(result, expected) - - s['a'] = 2 - self.assertRaises(KeyError, lambda: s.loc[[-2]]) - - del s['a'] - - def f(): - s.loc[[-2]] = 0 - - self.assertRaises(KeyError, f) - - # inconsistency between .loc[values] and .loc[values,:] - # GH 7999 - df = DataFrame([['a'], ['b']], index=[1, 2], columns=['value']) - - def f(): - df.loc[[3], :] - - self.assertRaises(KeyError, f) - - def f(): - df.loc[[3]] - - self.assertRaises(KeyError, f) - - def test_at_to_fail(self): - # at should not fallback - # GH 7814 - s = Series([1, 2, 3], index=list('abc')) - result = s.at['a'] - self.assertEqual(result, 1) - self.assertRaises(ValueError, lambda: s.at[0]) - - df = DataFrame({'A': [1, 2, 3]}, index=list('abc')) - result = df.at['a', 'A'] - self.assertEqual(result, 1) - self.assertRaises(ValueError, lambda: df.at['a', 0]) - - s = Series([1, 2, 3], index=[3, 2, 1]) - result = s.at[1] - self.assertEqual(result, 3) - self.assertRaises(ValueError, lambda: s.at['a']) - - df = DataFrame({0: [1, 2, 3]}, index=[3, 2, 1]) - result = df.at[1, 0] - self.assertEqual(result, 3) - self.assertRaises(ValueError, lambda: df.at['a', 0]) - - # GH 13822, incorrect error string with non-unique columns when missing - # column is accessed - df = DataFrame({'x': [1.], 'y': [2.], 'z': [3.]}) - df.columns = ['x', 'x', 'z'] - - # Check that we get the correct value in the KeyError - self.assertRaisesRegexp(KeyError, r"\['y'\] not in index", - lambda: df[['x', 'y', 'z']]) - - def test_loc_getitem_label_slice(self): - - # label slices (with ints) - self.check_result('lab slice', 'loc', slice(1, 3), - 'ix', slice(1, 3), - typs=['labels', 'mixed', 'empty', 'ts', 'floats'], - fails=TypeError) - - # real label slices - self.check_result('lab slice', 'loc', slice('a', 'c'), - 'ix', slice('a', 'c'), typs=['labels'], axes=0) - self.check_result('lab slice', 'loc', slice('A', 'C'), - 'ix', slice('A', 'C'), typs=['labels'], axes=1) - self.check_result('lab slice', 'loc', slice('W', 'Z'), - 'ix', slice('W', 'Z'), typs=['labels'], axes=2) - - self.check_result('ts slice', 'loc', slice('20130102', '20130104'), - 'ix', slice('20130102', '20130104'), - typs=['ts'], axes=0) - self.check_result('ts slice', 'loc', slice('20130102', '20130104'), - 'ix', slice('20130102', '20130104'), - typs=['ts'], axes=1, fails=TypeError) - self.check_result('ts slice', 'loc', slice('20130102', '20130104'), - 'ix', slice('20130102', '20130104'), - typs=['ts'], axes=2, fails=TypeError) - - # GH 14316 - self.check_result('ts slice rev', 'loc', slice('20130104', '20130102'), - 'indexer', [0, 1, 2], typs=['ts_rev'], axes=0) - - self.check_result('mixed slice', 'loc', slice(2, 8), 'ix', slice(2, 8), - typs=['mixed'], axes=0, fails=TypeError) - self.check_result('mixed slice', 'loc', slice(2, 8), 'ix', slice(2, 8), - typs=['mixed'], axes=1, fails=KeyError) - self.check_result('mixed slice', 'loc', slice(2, 8), 'ix', slice(2, 8), - typs=['mixed'], axes=2, fails=KeyError) - - self.check_result('mixed slice', 'loc', slice(2, 4, 2), 'ix', slice( - 2, 4, 2), typs=['mixed'], axes=0, fails=TypeError) - - def test_loc_general(self): - - df = 
DataFrame(
-            np.random.rand(4, 4), columns=['A', 'B', 'C', 'D'],
-            index=['A', 'B', 'C', 'D'])
-
-        # want this to work
-        result = df.loc[:, "A":"B"].iloc[0:2, :]
-        self.assertTrue((result.columns == ['A', 'B']).all())
-        self.assertTrue((result.index == ['A', 'B']).all())
-
-        # mixed type
-        result = DataFrame({'a': [Timestamp('20130101')], 'b': [1]}).iloc[0]
-        expected = Series([Timestamp('20130101'), 1], index=['a', 'b'], name=0)
-        tm.assert_series_equal(result, expected)
-        self.assertEqual(result.dtype, object)
-
-    def test_loc_setitem_consistency(self):
-        # GH 6149
-        # coerce similarly for setitem and loc when rows have a null-slice
-        expected = DataFrame({'date': Series(0, index=range(5),
-                                             dtype=np.int64),
-                              'val': Series(range(5), dtype=np.int64)})
-
-        df = DataFrame({'date': date_range('2000-01-01', '2000-01-5'),
-                        'val': Series(
-                            range(5), dtype=np.int64)})
-        df.loc[:, 'date'] = 0
-        tm.assert_frame_equal(df, expected)
-
-        df = DataFrame({'date': date_range('2000-01-01', '2000-01-5'),
-                        'val': Series(range(5), dtype=np.int64)})
-        df.loc[:, 'date'] = np.array(0, dtype=np.int64)
-        tm.assert_frame_equal(df, expected)
-
-        df = DataFrame({'date': date_range('2000-01-01', '2000-01-5'),
-                        'val': Series(range(5), dtype=np.int64)})
-        df.loc[:, 'date'] = np.array([0, 0, 0, 0, 0], dtype=np.int64)
-        tm.assert_frame_equal(df, expected)
-
-        expected = DataFrame({'date': Series('foo', index=range(5)),
-                              'val': Series(range(5), dtype=np.int64)})
-        df = DataFrame({'date': date_range('2000-01-01', '2000-01-5'),
-                        'val': Series(range(5), dtype=np.int64)})
-        df.loc[:, 'date'] = 'foo'
-        tm.assert_frame_equal(df, expected)
-
-        expected = DataFrame({'date': Series(1.0, index=range(5)),
-                              'val': Series(range(5), dtype=np.int64)})
-        df = DataFrame({'date': date_range('2000-01-01', '2000-01-5'),
-                        'val': Series(range(5), dtype=np.int64)})
-        df.loc[:, 'date'] = 1.0
-        tm.assert_frame_equal(df, expected)
-
-    def test_loc_setitem_consistency_empty(self):
-        # empty (essentially noops)
-        expected = DataFrame(columns=['x', 'y'])
-        expected['x'] = expected['x'].astype(np.int64)
-        df = DataFrame(columns=['x', 'y'])
-        df.loc[:, 'x'] = 1
-        tm.assert_frame_equal(df, expected)
-
-        df = DataFrame(columns=['x', 'y'])
-        df['x'] = 1
-        tm.assert_frame_equal(df, expected)
-
-    def test_loc_setitem_consistency_slice_column_len(self):
-        # .loc[:,column] setting with slice == len of the column
-        # GH10408
-        data = """Level_0,,,Respondent,Respondent,Respondent,OtherCat,OtherCat
-Level_1,,,Something,StartDate,EndDate,Yes/No,SomethingElse
-Region,Site,RespondentID,,,,,
-Region_1,Site_1,3987227376,A,5/25/2015 10:59,5/25/2015 11:22,Yes,
-Region_1,Site_1,3980680971,A,5/21/2015 9:40,5/21/2015 9:52,Yes,Yes
-Region_1,Site_2,3977723249,A,5/20/2015 8:27,5/20/2015 8:41,Yes,
-Region_1,Site_2,3977723089,A,5/20/2015 8:33,5/20/2015 9:09,Yes,No"""
-
-        df = pd.read_csv(StringIO(data), header=[0, 1], index_col=[0, 1, 2])
-        df.loc[:, ('Respondent', 'StartDate')] = pd.to_datetime(df.loc[:, (
-            'Respondent', 'StartDate')])
-        df.loc[:, ('Respondent', 'EndDate')] = pd.to_datetime(df.loc[:, (
-            'Respondent', 'EndDate')])
-        df.loc[:, ('Respondent', 'Duration')] = df.loc[:, (
-            'Respondent', 'EndDate')] - df.loc[:, ('Respondent', 'StartDate')]
-
-        df.loc[:, ('Respondent', 'Duration')] = df.loc[:, (
-            'Respondent', 'Duration')].astype('timedelta64[s]')
-        expected = Series([1380, 720, 840, 2160.], index=df.index,
-                          name=('Respondent', 'Duration'))
-        tm.assert_series_equal(df[('Respondent', 'Duration')], expected)
-
-    def test_loc_setitem_frame(self):
-        df = self.frame_labels
-
-        result = df.iloc[0, 0]
-
-        df.loc['a', 'A'] = 1
-        result = df.loc['a', 'A']
-        self.assertEqual(result, 1)
-
-        result = df.iloc[0, 0]
-        self.assertEqual(result, 1)
-
-        df.loc[:, 'B':'D'] = 0
-        expected = df.loc[:, 'B':'D']
-        with catch_warnings(record=True):
-            result = df.ix[:, 1:]
-        tm.assert_frame_equal(result, expected)
-
-        # GH 6254
-        # setting issue
-        df = DataFrame(index=[3, 5, 4], columns=['A'])
-        df.loc[[4, 3, 5], 'A'] = np.array([1, 2, 3], dtype='int64')
-        expected = DataFrame(dict(A=Series(
-            [1, 2, 3], index=[4, 3, 5]))).reindex(index=[3, 5, 4])
-        tm.assert_frame_equal(df, expected)
-
-        # GH 6252
-        # setting with an empty frame
-        keys1 = ['@' + str(i) for i in range(5)]
-        val1 = np.arange(5, dtype='int64')
-
-        keys2 = ['@' + str(i) for i in range(4)]
-        val2 = np.arange(4, dtype='int64')
-
-        index = list(set(keys1).union(keys2))
-        df = DataFrame(index=index)
-        df['A'] = nan
-        df.loc[keys1, 'A'] = val1
-
-        df['B'] = nan
-        df.loc[keys2, 'B'] = val2
-
-        expected = DataFrame(dict(A=Series(val1, index=keys1), B=Series(
-            val2, index=keys2))).reindex(index=index)
-        tm.assert_frame_equal(df, expected)
-
-        # GH 8669
-        # invalid coercion of nan -> int
-        df = DataFrame({'A': [1, 2, 3], 'B': np.nan})
-        df.loc[df.B > df.A, 'B'] = df.A
-        expected = DataFrame({'A': [1, 2, 3], 'B': np.nan})
-        tm.assert_frame_equal(df, expected)
-
-        # GH 6546
-        # setting with mixed labels
-        df = DataFrame({1: [1, 2], 2: [3, 4], 'a': ['a', 'b']})
-
-        result = df.loc[0, [1, 2]]
-        expected = Series([1, 3], index=[1, 2], dtype=object, name=0)
-        tm.assert_series_equal(result, expected)
-
-        expected = DataFrame({1: [5, 2], 2: [6, 4], 'a': ['a', 'b']})
-        df.loc[0, [1, 2]] = [5, 6]
-        tm.assert_frame_equal(df, expected)
-
-    def test_loc_setitem_frame_multiples(self):
-        # multiple setting
-        df = DataFrame({'A': ['foo', 'bar', 'baz'],
-                        'B': Series(
-                            range(3), dtype=np.int64)})
-        rhs = df.loc[1:2]
-        rhs.index = df.index[0:2]
-        df.loc[0:1] = rhs
-        expected = DataFrame({'A': ['bar', 'baz', 'baz'],
-                              'B': Series(
-                                  [1, 2, 2], dtype=np.int64)})
-        tm.assert_frame_equal(df, expected)
-
-        # multiple setting with frame on rhs (with M8)
-        df = DataFrame({'date': date_range('2000-01-01', '2000-01-5'),
-                        'val': Series(
-                            range(5), dtype=np.int64)})
-        expected = DataFrame({'date': [Timestamp('20000101'), Timestamp(
-            '20000102'), Timestamp('20000101'), Timestamp('20000102'),
-            Timestamp('20000103')],
-            'val': Series(
-                [0, 1, 0, 1, 2], dtype=np.int64)})
-        rhs = df.loc[0:2]
-        rhs.index = df.index[2:5]
-        df.loc[2:4] = rhs
-        tm.assert_frame_equal(df, expected)
-
-    def test_iloc_getitem_frame(self):
-        df = DataFrame(np.random.randn(10, 4), index=lrange(0, 20, 2),
-                       columns=lrange(0, 8, 2))
-
-        result = df.iloc[2]
-        with catch_warnings(record=True):
-            exp = df.ix[4]
-        tm.assert_series_equal(result, exp)
-
-        result = df.iloc[2, 2]
-        with catch_warnings(record=True):
-            exp = df.ix[4, 4]
-        self.assertEqual(result, exp)
-
-        # slice
-        result = df.iloc[4:8]
-        with catch_warnings(record=True):
-            expected = df.ix[8:14]
-        tm.assert_frame_equal(result, expected)
-
-        result = df.iloc[:, 2:3]
-        with catch_warnings(record=True):
-            expected = df.ix[:, 4:5]
-        tm.assert_frame_equal(result, expected)
-
-        # list of integers
-        result = df.iloc[[0, 1, 3]]
-        with catch_warnings(record=True):
-            expected = df.ix[[0, 2, 6]]
-        tm.assert_frame_equal(result, expected)
-
-        result = df.iloc[[0, 1, 3], [0, 1]]
-        with catch_warnings(record=True):
-            expected = df.ix[[0, 2, 6], [0, 2]]
-        tm.assert_frame_equal(result, expected)
-
-        # neg indices
-        result = df.iloc[[-1, 1, 3], [-1, 1]]
-        with catch_warnings(record=True):
-            expected = df.ix[[18, 2, 6], [6, 2]]
-        tm.assert_frame_equal(result, expected)
-
-        # dups indices
-        result = df.iloc[[-1, -1, 1, 3], [-1, 1]]
-        with catch_warnings(record=True):
-            expected = df.ix[[18, 18, 2, 6], [6, 2]]
-        tm.assert_frame_equal(result, expected)
-
-        # with index-like
-        s = Series(index=lrange(1, 5))
-        result = df.iloc[s.index]
-        with catch_warnings(record=True):
-            expected = df.ix[[2, 4, 6, 8]]
-        tm.assert_frame_equal(result, expected)
-
-    def test_iloc_getitem_labelled_frame(self):
-        # try with labelled frame
-        df = DataFrame(np.random.randn(10, 4),
-                       index=list('abcdefghij'), columns=list('ABCD'))
-
-        result = df.iloc[1, 1]
-        exp = df.loc['b', 'B']
-        self.assertEqual(result, exp)
-
-        result = df.iloc[:, 2:3]
-        expected = df.loc[:, ['C']]
-        tm.assert_frame_equal(result, expected)
-
-        # negative indexing
-        result = df.iloc[-1, -1]
-        exp = df.loc['j', 'D']
-        self.assertEqual(result, exp)
-
-        # out-of-bounds exception
-        self.assertRaises(IndexError, df.iloc.__getitem__, tuple([10, 5]))
-
-        # trying to use a label
-        self.assertRaises(ValueError, df.iloc.__getitem__, tuple(['j', 'D']))
-
-    def test_iloc_getitem_panel(self):
-
-        # GH 7189
-        p = Panel(np.arange(4 * 3 * 2).reshape(4, 3, 2),
-                  items=['A', 'B', 'C', 'D'],
-                  major_axis=['a', 'b', 'c'],
-                  minor_axis=['one', 'two'])
-
-        result = p.iloc[1]
-        expected = p.loc['B']
-        tm.assert_frame_equal(result, expected)
-
-        result = p.iloc[1, 1]
-        expected = p.loc['B', 'b']
-        tm.assert_series_equal(result, expected)
-
-        result = p.iloc[1, 1, 1]
-        expected = p.loc['B', 'b', 'two']
-        self.assertEqual(result, expected)
-
-        # slice
-        result = p.iloc[1:3]
-        expected = p.loc[['B', 'C']]
-        tm.assert_panel_equal(result, expected)
-
-        result = p.iloc[:, 0:2]
-        expected = p.loc[:, ['a', 'b']]
-        tm.assert_panel_equal(result, expected)
-
-        # list of integers
-        result = p.iloc[[0, 2]]
-        expected = p.loc[['A', 'C']]
-        tm.assert_panel_equal(result, expected)
-
-        # neg indices
-        result = p.iloc[[-1, 1], [-1, 1]]
-        expected = p.loc[['D', 'B'], ['c', 'b']]
-        tm.assert_panel_equal(result, expected)
-
-        # dups indices
-        result = p.iloc[[-1, -1, 1], [-1, 1]]
-        expected = p.loc[['D', 'D', 'B'], ['c', 'b']]
-        tm.assert_panel_equal(result, expected)
-
-        # combined
-        result = p.iloc[0, [True, True], [0, 1]]
-        expected = p.loc['A', ['a', 'b'], ['one', 'two']]
-        tm.assert_frame_equal(result, expected)
-
-        # out-of-bounds exception
-        self.assertRaises(IndexError, p.iloc.__getitem__, tuple([10, 5]))
-
-        def f():
-            p.iloc[0, [True, True], [0, 1, 2]]
-
-        self.assertRaises(IndexError, f)
-
-        # trying to use a label
-        self.assertRaises(ValueError, p.iloc.__getitem__, tuple(['j', 'D']))
-
-        # GH
-        p = Panel(
-            np.random.rand(4, 3, 2), items=['A', 'B', 'C', 'D'],
-            major_axis=['U', 'V', 'W'], minor_axis=['X', 'Y'])
-        expected = p['A']
-
-        result = p.iloc[0, :, :]
-        tm.assert_frame_equal(result, expected)
-
-        result = p.iloc[0, [True, True, True], :]
-        tm.assert_frame_equal(result, expected)
-
-        result = p.iloc[0, [True, True, True], [0, 1]]
-        tm.assert_frame_equal(result, expected)
-
-        def f():
-            p.iloc[0, [True, True, True], [0, 1, 2]]
-
-        self.assertRaises(IndexError, f)
-
-        def f():
-            p.iloc[0, [True, True, True], [2]]
-
-        self.assertRaises(IndexError, f)
-
-    def test_iloc_getitem_panel_multiindex(self):
-        # GH 7199
-        # Panel with multi-index
-        multi_index = pd.MultiIndex.from_tuples([('ONE', 'one'),
-                                                 ('TWO', 'two'),
-                                                 ('THREE', 'three')],
-                                                names=['UPPER', 'lower'])
-
-        simple_index = 
[x[0] for x in multi_index] - wd1 = Panel(items=['First', 'Second'], major_axis=['a', 'b', 'c', 'd'], - minor_axis=multi_index) - - wd2 = Panel(items=['First', 'Second'], major_axis=['a', 'b', 'c', 'd'], - minor_axis=simple_index) - - expected1 = wd1['First'].iloc[[True, True, True, False], [0, 2]] - result1 = wd1.iloc[0, [True, True, True, False], [0, 2]] # WRONG - tm.assert_frame_equal(result1, expected1) - - expected2 = wd2['First'].iloc[[True, True, True, False], [0, 2]] - result2 = wd2.iloc[0, [True, True, True, False], [0, 2]] - tm.assert_frame_equal(result2, expected2) - - expected1 = DataFrame(index=['a'], columns=multi_index, - dtype='float64') - result1 = wd1.iloc[0, [0], [0, 1, 2]] - tm.assert_frame_equal(result1, expected1) - - expected2 = DataFrame(index=['a'], columns=simple_index, - dtype='float64') - result2 = wd2.iloc[0, [0], [0, 1, 2]] - tm.assert_frame_equal(result2, expected2) - - # GH 7516 - mi = MultiIndex.from_tuples([(0, 'x'), (1, 'y'), (2, 'z')]) - p = Panel(np.arange(3 * 3 * 3, dtype='int64').reshape(3, 3, 3), - items=['a', 'b', 'c'], major_axis=mi, - minor_axis=['u', 'v', 'w']) - result = p.iloc[:, 1, 0] - expected = Series([3, 12, 21], index=['a', 'b', 'c'], name='u') - tm.assert_series_equal(result, expected) - - result = p.loc[:, (1, 'y'), 'u'] - tm.assert_series_equal(result, expected) - - def test_iloc_getitem_doc_issue(self): - - # multi axis slicing issue with single block - # surfaced in GH 6059 - - arr = np.random.randn(6, 4) - index = date_range('20130101', periods=6) - columns = list('ABCD') - df = DataFrame(arr, index=index, columns=columns) - - # defines ref_locs - df.describe() - - result = df.iloc[3:5, 0:2] - str(result) - result.dtypes - - expected = DataFrame(arr[3:5, 0:2], index=index[3:5], - columns=columns[0:2]) - tm.assert_frame_equal(result, expected) - - # for dups - df.columns = list('aaaa') - result = df.iloc[3:5, 0:2] - str(result) - result.dtypes - - expected = DataFrame(arr[3:5, 0:2], index=index[3:5], - columns=list('aa')) - tm.assert_frame_equal(result, expected) - - # related - arr = np.random.randn(6, 4) - index = list(range(0, 12, 2)) - columns = list(range(0, 8, 2)) - df = DataFrame(arr, index=index, columns=columns) - - df._data.blocks[0].mgr_locs - result = df.iloc[1:5, 2:4] - str(result) - result.dtypes - expected = DataFrame(arr[1:5, 2:4], index=index[1:5], - columns=columns[2:4]) - tm.assert_frame_equal(result, expected) - - def test_setitem_ndarray_1d(self): - # GH5508 - - # len of indexer vs length of the 1d ndarray - df = DataFrame(index=Index(lrange(1, 11))) - df['foo'] = np.zeros(10, dtype=np.float64) - df['bar'] = np.zeros(10, dtype=np.complex) - - # invalid - def f(): - with catch_warnings(record=True): - df.ix[2:5, 'bar'] = np.array([2.33j, 1.23 + 0.1j, 2.2]) - - self.assertRaises(ValueError, f) - - def f(): - df.loc[df.index[2:5], 'bar'] = np.array([2.33j, 1.23 + 0.1j, - 2.2, 1.0]) - - self.assertRaises(ValueError, f) - - # valid - df.loc[df.index[2:6], 'bar'] = np.array([2.33j, 1.23 + 0.1j, - 2.2, 1.0]) - - result = df.loc[df.index[2:6], 'bar'] - expected = Series([2.33j, 1.23 + 0.1j, 2.2, 1.0], index=[3, 4, 5, 6], - name='bar') - tm.assert_series_equal(result, expected) - - # dtype getting changed? 
- df = DataFrame(index=Index(lrange(1, 11))) - df['foo'] = np.zeros(10, dtype=np.float64) - df['bar'] = np.zeros(10, dtype=np.complex) - - def f(): - df[2:5] = np.arange(1, 4) * 1j - - self.assertRaises(ValueError, f) - - def test_iloc_setitem_series(self): - df = DataFrame(np.random.randn(10, 4), index=list('abcdefghij'), - columns=list('ABCD')) - - df.iloc[1, 1] = 1 - result = df.iloc[1, 1] - self.assertEqual(result, 1) - - df.iloc[:, 2:3] = 0 - expected = df.iloc[:, 2:3] - result = df.iloc[:, 2:3] - tm.assert_frame_equal(result, expected) - - s = Series(np.random.randn(10), index=lrange(0, 20, 2)) - - s.iloc[1] = 1 - result = s.iloc[1] - self.assertEqual(result, 1) - - s.iloc[:4] = 0 - expected = s.iloc[:4] - result = s.iloc[:4] - tm.assert_series_equal(result, expected) - - s = Series([-1] * 6) - s.iloc[0::2] = [0, 2, 4] - s.iloc[1::2] = [1, 3, 5] - result = s - expected = Series([0, 1, 2, 3, 4, 5]) - tm.assert_series_equal(result, expected) - - def test_iloc_setitem_list_of_lists(self): - - # GH 7551 - # list-of-list is set incorrectly in mixed vs. single dtyped frames - df = DataFrame(dict(A=np.arange(5, dtype='int64'), - B=np.arange(5, 10, dtype='int64'))) - df.iloc[2:4] = [[10, 11], [12, 13]] - expected = DataFrame(dict(A=[0, 1, 10, 12, 4], B=[5, 6, 11, 13, 9])) - tm.assert_frame_equal(df, expected) - - df = DataFrame( - dict(A=list('abcde'), B=np.arange(5, 10, dtype='int64'))) - df.iloc[2:4] = [['x', 11], ['y', 13]] - expected = DataFrame(dict(A=['a', 'b', 'x', 'y', 'e'], - B=[5, 6, 11, 13, 9])) - tm.assert_frame_equal(df, expected) - - def test_iloc_getitem_multiindex(self): - mi_labels = DataFrame(np.random.randn(4, 3), - columns=[['i', 'i', 'j'], ['A', 'A', 'B']], - index=[['i', 'i', 'j', 'k'], - ['X', 'X', 'Y', 'Y']]) - - mi_int = DataFrame(np.random.randn(3, 3), - columns=[[2, 2, 4], [6, 8, 10]], - index=[[4, 4, 8], [8, 10, 12]]) - - # the first row - rs = mi_int.iloc[0] - with catch_warnings(record=True): - xp = mi_int.ix[4].ix[8] - tm.assert_series_equal(rs, xp, check_names=False) - self.assertEqual(rs.name, (4, 8)) - self.assertEqual(xp.name, 8) - - # 2nd (last) columns - rs = mi_int.iloc[:, 2] - with catch_warnings(record=True): - xp = mi_int.ix[:, 2] - tm.assert_series_equal(rs, xp) - - # corner column - rs = mi_int.iloc[2, 2] - with catch_warnings(record=True): - xp = mi_int.ix[:, 2].ix[2] - self.assertEqual(rs, xp) - - # this is basically regular indexing - rs = mi_labels.iloc[2, 2] - with catch_warnings(record=True): - xp = mi_labels.ix['j'].ix[:, 'j'].ix[0, 0] - self.assertEqual(rs, xp) - - def test_loc_multiindex(self): - - mi_labels = DataFrame(np.random.randn(3, 3), - columns=[['i', 'i', 'j'], ['A', 'A', 'B']], - index=[['i', 'i', 'j'], ['X', 'X', 'Y']]) - - mi_int = DataFrame(np.random.randn(3, 3), - columns=[[2, 2, 4], [6, 8, 10]], - index=[[4, 4, 8], [8, 10, 12]]) - - # the first row - rs = mi_labels.loc['i'] - with catch_warnings(record=True): - xp = mi_labels.ix['i'] - tm.assert_frame_equal(rs, xp) - - # 2nd (last) columns - rs = mi_labels.loc[:, 'j'] - with catch_warnings(record=True): - xp = mi_labels.ix[:, 'j'] - tm.assert_frame_equal(rs, xp) - - # corner column - rs = mi_labels.loc['j'].loc[:, 'j'] - with catch_warnings(record=True): - xp = mi_labels.ix['j'].ix[:, 'j'] - tm.assert_frame_equal(rs, xp) - - # with a tuple - rs = mi_labels.loc[('i', 'X')] - with catch_warnings(record=True): - xp = mi_labels.ix[('i', 'X')] - tm.assert_frame_equal(rs, xp) - - rs = mi_int.loc[4] - with catch_warnings(record=True): - xp = mi_int.ix[4] - tm.assert_frame_equal(rs, 
xp)
-
-    def test_loc_multiindex_indexer_none(self):
-
-        # GH6788
-        # multi-index indexer is None (meaning take all)
-        attributes = ['Attribute' + str(i) for i in range(1)]
-        attribute_values = ['Value' + str(i) for i in range(5)]
-
-        index = MultiIndex.from_product([attributes, attribute_values])
-        df = 0.1 * np.random.randn(10, 1 * 5) + 0.5
-        df = DataFrame(df, columns=index)
-        result = df[attributes]
-        tm.assert_frame_equal(result, df)
-
-        # GH 7349
-        # loc with a multi-index seems to be doing fallback
-        df = DataFrame(np.arange(12).reshape(-1, 1),
-                       index=pd.MultiIndex.from_product([[1, 2, 3, 4],
-                                                         [1, 2, 3]]))
-
-        expected = df.loc[([1, 2], ), :]
-        result = df.loc[[1, 2]]
-        tm.assert_frame_equal(result, expected)
-
-    def test_loc_multiindex_incomplete(self):
-
-        # GH 7399
-        # incomplete indexers
-        s = pd.Series(np.arange(15, dtype='int64'),
-                      MultiIndex.from_product([range(5), ['a', 'b', 'c']]))
-        expected = s.loc[:, 'a':'c']
-
-        result = s.loc[0:4, 'a':'c']
-        tm.assert_series_equal(result, expected)
-        tm.assert_series_equal(result, expected)
-
-        result = s.loc[:4, 'a':'c']
-        tm.assert_series_equal(result, expected)
-        tm.assert_series_equal(result, expected)
-
-        result = s.loc[0:, 'a':'c']
-        tm.assert_series_equal(result, expected)
-        tm.assert_series_equal(result, expected)
-
-        # GH 7400
-        # multiindexer getitem with list of indexers skips wrong element
-        s = pd.Series(np.arange(15, dtype='int64'),
-                      MultiIndex.from_product([range(5), ['a', 'b', 'c']]))
-        expected = s.iloc[[6, 7, 8, 12, 13, 14]]
-        result = s.loc[2:4:2, 'a':'c']
-        tm.assert_series_equal(result, expected)
-
-    def test_multiindex_perf_warn(self):
-
-        if sys.version_info < (2, 7):
-            raise nose.SkipTest('python version < 2.7')
-
-        df = DataFrame({'jim': [0, 0, 1, 1],
-                        'joe': ['x', 'x', 'z', 'y'],
-                        'jolie': np.random.rand(4)}).set_index(['jim', 'joe'])
-
-        with tm.assert_produces_warning(PerformanceWarning,
-                                        clear=[pd.core.index]):
-            df.loc[(1, 'z')]
-
-        df = df.iloc[[2, 1, 3, 0]]
-        with tm.assert_produces_warning(PerformanceWarning):
-            df.loc[(0, )]
-
-    def test_series_getitem_multiindex(self):
-
-        # GH 6018
-        # series regression getitem with a multi-index
-
-        s = Series([1, 2, 3])
-        s.index = MultiIndex.from_tuples([(0, 0), (1, 1), (2, 1)])
-
-        result = s[:, 0]
-        expected = Series([1], index=[0])
-        tm.assert_series_equal(result, expected)
-
-        result = s.loc[:, 1]
-        expected = Series([2, 3], index=[1, 2])
-        tm.assert_series_equal(result, expected)
-
-        # xs
-        result = s.xs(0, level=0)
-        expected = Series([1], index=[0])
-        tm.assert_series_equal(result, expected)
-
-        result = s.xs(1, level=1)
-        expected = Series([2, 3], index=[1, 2])
-        tm.assert_series_equal(result, expected)
-
-        # GH6258
-        dt = list(date_range('20130903', periods=3))
-        idx = MultiIndex.from_product([list('AB'), dt])
-        s = Series([1, 3, 4, 1, 3, 4], index=idx)
-
-        result = s.xs('20130903', level=1)
-        expected = Series([1, 1], index=list('AB'))
-        tm.assert_series_equal(result, expected)
-
-        # GH5684
-        idx = MultiIndex.from_tuples([('a', 'one'), ('a', 'two'), ('b', 'one'),
-                                      ('b', 'two')])
-        s = Series([1, 2, 3, 4], index=idx)
-        s.index.set_names(['L1', 'L2'], inplace=True)
-        result = s.xs('one', level='L2')
-        expected = Series([1, 3], index=['a', 'b'])
-        expected.index.set_names(['L1'], inplace=True)
-        tm.assert_series_equal(result, expected)
-
-    def test_ix_general(self):
-
-        # ix general issues
-
-        # GH 2817
-        data = {'amount': {0: 700, 1: 600, 2: 222, 3: 333, 4: 444},
-                'col': {0: 3.5, 1: 3.5, 2: 4.0, 3: 4.0, 4: 4.0},
-                'year': {0: 2012, 1: 2011, 2: 
2012, 3: 2012, 4: 2012}} - df = DataFrame(data).set_index(keys=['col', 'year']) - key = 4.0, 2012 - - # emits a PerformanceWarning, ok - with self.assert_produces_warning(PerformanceWarning): - tm.assert_frame_equal(df.loc[key], df.iloc[2:]) - - # this is ok - df.sort_index(inplace=True) - res = df.loc[key] - - # col has float dtype, result should be Float64Index - index = MultiIndex.from_arrays([[4.] * 3, [2012] * 3], - names=['col', 'year']) - expected = DataFrame({'amount': [222, 333, 444]}, index=index) - tm.assert_frame_equal(res, expected) - - def test_ix_weird_slicing(self): - # http://stackoverflow.com/q/17056560/1240268 - df = DataFrame({'one': [1, 2, 3, np.nan, np.nan], - 'two': [1, 2, 3, 4, 5]}) - df.loc[df['one'] > 1, 'two'] = -df['two'] - - expected = DataFrame({'one': {0: 1.0, - 1: 2.0, - 2: 3.0, - 3: nan, - 4: nan}, - 'two': {0: 1, - 1: -2, - 2: -3, - 3: 4, - 4: 5}}) - tm.assert_frame_equal(df, expected) - - def test_xs_multiindex(self): - - # GH2903 - columns = MultiIndex.from_tuples( - [('a', 'foo'), ('a', 'bar'), ('b', 'hello'), - ('b', 'world')], names=['lvl0', 'lvl1']) - df = DataFrame(np.random.randn(4, 4), columns=columns) - df.sort_index(axis=1, inplace=True) - result = df.xs('a', level='lvl0', axis=1) - expected = df.iloc[:, 0:2].loc[:, 'a'] - tm.assert_frame_equal(result, expected) - - result = df.xs('foo', level='lvl1', axis=1) - expected = df.iloc[:, 1:2].copy() - expected.columns = expected.columns.droplevel('lvl1') - tm.assert_frame_equal(result, expected) - - def test_per_axis_per_level_getitem(self): - - # GH6134 - # example test case - ix = MultiIndex.from_product([_mklbl('A', 5), _mklbl('B', 7), _mklbl( - 'C', 4), _mklbl('D', 2)]) - df = DataFrame(np.arange(len(ix.get_values())), index=ix) - - result = df.loc[(slice('A1', 'A3'), slice(None), ['C1', 'C3']), :] - expected = df.loc[[tuple([a, b, c, d]) - for a, b, c, d in df.index.values - if (a == 'A1' or a == 'A2' or a == 'A3') and ( - c == 'C1' or c == 'C3')]] - tm.assert_frame_equal(result, expected) - - expected = df.loc[[tuple([a, b, c, d]) - for a, b, c, d in df.index.values - if (a == 'A1' or a == 'A2' or a == 'A3') and ( - c == 'C1' or c == 'C2' or c == 'C3')]] - result = df.loc[(slice('A1', 'A3'), slice(None), slice('C1', 'C3')), :] - tm.assert_frame_equal(result, expected) - - # test multi-index slicing with per axis and per index controls - index = MultiIndex.from_tuples([('A', 1), ('A', 2), - ('A', 3), ('B', 1)], - names=['one', 'two']) - columns = MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'), - ('b', 'foo'), ('b', 'bah')], - names=['lvl0', 'lvl1']) - - df = DataFrame( - np.arange(16, dtype='int64').reshape( - 4, 4), index=index, columns=columns) - df = df.sort_index(axis=0).sort_index(axis=1) - - # identity - result = df.loc[(slice(None), slice(None)), :] - tm.assert_frame_equal(result, df) - result = df.loc[(slice(None), slice(None)), (slice(None), slice(None))] - tm.assert_frame_equal(result, df) - result = df.loc[:, (slice(None), slice(None))] - tm.assert_frame_equal(result, df) - - # index - result = df.loc[(slice(None), [1]), :] - expected = df.iloc[[0, 3]] - tm.assert_frame_equal(result, expected) - - result = df.loc[(slice(None), 1), :] - expected = df.iloc[[0, 3]] - tm.assert_frame_equal(result, expected) - - # columns - result = df.loc[:, (slice(None), ['foo'])] - expected = df.iloc[:, [1, 3]] - tm.assert_frame_equal(result, expected) - - # both - result = df.loc[(slice(None), 1), (slice(None), ['foo'])] - expected = df.iloc[[0, 3], [1, 3]] - tm.assert_frame_equal(result, expected) 
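Since tuple-of-slice indexers recur through the remainder of this test, a compact standalone sketch of the idiom may help; the data mirrors the deleted test, and `pd.IndexSlice` is the public spelling of the same selectors:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('A', 3), ('B', 1)], names=['one', 'two'])
columns = pd.MultiIndex.from_tuples(
    [('a', 'foo'), ('a', 'bar'), ('b', 'foo'), ('b', 'bah')],
    names=['lvl0', 'lvl1'])
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=index,
                  columns=columns).sort_index().sort_index(axis=1)

# One selector per index level: slice(None) means "all labels here".
rows = df.loc[(slice(None), 1), :]        # any 'one', only 'two' == 1

# pd.IndexSlice lets ordinary slice syntax build the same tuple.
idx = pd.IndexSlice
both = df.loc[idx[:, 1], idx[:, 'foo']]   # and columns with lvl1 == 'foo'
```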
- - result = df.loc['A', 'a'] - expected = DataFrame(dict(bar=[1, 5, 9], foo=[0, 4, 8]), - index=Index([1, 2, 3], name='two'), - columns=Index(['bar', 'foo'], name='lvl1')) - tm.assert_frame_equal(result, expected) - - result = df.loc[(slice(None), [1, 2]), :] - expected = df.iloc[[0, 1, 3]] - tm.assert_frame_equal(result, expected) - - # multi-level series - s = Series(np.arange(len(ix.get_values())), index=ix) - result = s.loc['A1':'A3', :, ['C1', 'C3']] - expected = s.loc[[tuple([a, b, c, d]) - for a, b, c, d in s.index.values - if (a == 'A1' or a == 'A2' or a == 'A3') and ( - c == 'C1' or c == 'C3')]] - tm.assert_series_equal(result, expected) - - # boolean indexers - result = df.loc[(slice(None), df.loc[:, ('a', 'bar')] > 5), :] - expected = df.iloc[[2, 3]] - tm.assert_frame_equal(result, expected) - - def f(): - df.loc[(slice(None), np.array([True, False])), :] - - self.assertRaises(ValueError, f) - - # ambiguous cases - # these can be multiply interpreted (e.g. in this case - # as df.loc[slice(None),[1]] as well - self.assertRaises(KeyError, lambda: df.loc[slice(None), [1]]) - - result = df.loc[(slice(None), [1]), :] - expected = df.iloc[[0, 3]] - tm.assert_frame_equal(result, expected) - - # not lexsorted - self.assertEqual(df.index.lexsort_depth, 2) - df = df.sort_index(level=1, axis=0) - self.assertEqual(df.index.lexsort_depth, 0) - with tm.assertRaisesRegexp( - UnsortedIndexError, - 'MultiIndex Slicing requires the index to be fully ' - r'lexsorted tuple len \(2\), lexsort depth \(0\)'): - df.loc[(slice(None), df.loc[:, ('a', 'bar')] > 5), :] - - def test_multiindex_slicers_non_unique(self): - - # GH 7106 - # non-unique mi index support - df = (DataFrame(dict(A=['foo', 'foo', 'foo', 'foo'], - B=['a', 'a', 'a', 'a'], - C=[1, 2, 1, 3], - D=[1, 2, 3, 4])) - .set_index(['A', 'B', 'C']).sort_index()) - self.assertFalse(df.index.is_unique) - expected = (DataFrame(dict(A=['foo', 'foo'], B=['a', 'a'], - C=[1, 1], D=[1, 3])) - .set_index(['A', 'B', 'C']).sort_index()) - result = df.loc[(slice(None), slice(None), 1), :] - tm.assert_frame_equal(result, expected) - - # this is equivalent of an xs expression - result = df.xs(1, level=2, drop_level=False) - tm.assert_frame_equal(result, expected) - - df = (DataFrame(dict(A=['foo', 'foo', 'foo', 'foo'], - B=['a', 'a', 'a', 'a'], - C=[1, 2, 1, 2], - D=[1, 2, 3, 4])) - .set_index(['A', 'B', 'C']).sort_index()) - self.assertFalse(df.index.is_unique) - expected = (DataFrame(dict(A=['foo', 'foo'], B=['a', 'a'], - C=[1, 1], D=[1, 3])) - .set_index(['A', 'B', 'C']).sort_index()) - result = df.loc[(slice(None), slice(None), 1), :] - self.assertFalse(result.index.is_unique) - tm.assert_frame_equal(result, expected) - - # GH12896 - # numpy-implementation dependent bug - ints = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 13, 14, 14, 16, - 17, 18, 19, 200000, 200000] - n = len(ints) - idx = MultiIndex.from_arrays([['a'] * n, ints]) - result = Series([1] * n, index=idx) - result = result.sort_index() - result = result.loc[(slice(None), slice(100000))] - expected = Series([1] * (n - 2), index=idx[:-2]).sort_index() - tm.assert_series_equal(result, expected) - - def test_multiindex_slicers_datetimelike(self): - - # GH 7429 - # buggy/inconsistent behavior when slicing with datetime-like - import datetime - dates = [datetime.datetime(2012, 1, 1, 12, 12, 12) + - datetime.timedelta(days=i) for i in range(6)] - freq = [1, 2] - index = MultiIndex.from_product( - [dates, freq], names=['date', 'frequency']) - - df = DataFrame( - np.arange(6 * 2 * 4, 
dtype='int64').reshape( - -1, 4), index=index, columns=list('ABCD')) - - # multi-axis slicing - idx = pd.IndexSlice - expected = df.iloc[[0, 2, 4], [0, 1]] - result = df.loc[(slice(Timestamp('2012-01-01 12:12:12'), - Timestamp('2012-01-03 12:12:12')), - slice(1, 1)), slice('A', 'B')] - tm.assert_frame_equal(result, expected) - - result = df.loc[(idx[Timestamp('2012-01-01 12:12:12'):Timestamp( - '2012-01-03 12:12:12')], idx[1:1]), slice('A', 'B')] - tm.assert_frame_equal(result, expected) - - result = df.loc[(slice(Timestamp('2012-01-01 12:12:12'), - Timestamp('2012-01-03 12:12:12')), 1), - slice('A', 'B')] - tm.assert_frame_equal(result, expected) - - # with strings - result = df.loc[(slice('2012-01-01 12:12:12', '2012-01-03 12:12:12'), - slice(1, 1)), slice('A', 'B')] - tm.assert_frame_equal(result, expected) - - result = df.loc[(idx['2012-01-01 12:12:12':'2012-01-03 12:12:12'], 1), - idx['A', 'B']] - tm.assert_frame_equal(result, expected) - - def test_multiindex_slicers_edges(self): - # GH 8132 - # various edge cases - df = DataFrame( - {'A': ['A0'] * 5 + ['A1'] * 5 + ['A2'] * 5, - 'B': ['B0', 'B0', 'B1', 'B1', 'B2'] * 3, - 'DATE': ["2013-06-11", "2013-07-02", "2013-07-09", "2013-07-30", - "2013-08-06", "2013-06-11", "2013-07-02", "2013-07-09", - "2013-07-30", "2013-08-06", "2013-09-03", "2013-10-01", - "2013-07-09", "2013-08-06", "2013-09-03"], - 'VALUES': [22, 35, 14, 9, 4, 40, 18, 4, 2, 5, 1, 2, 3, 4, 2]}) - - df['DATE'] = pd.to_datetime(df['DATE']) - df1 = df.set_index(['A', 'B', 'DATE']) - df1 = df1.sort_index() - - # A1 - Get all values under "A0" and "A1" - result = df1.loc[(slice('A1')), :] - expected = df1.iloc[0:10] - tm.assert_frame_equal(result, expected) - - # A2 - Get all values from the start to "A2" - result = df1.loc[(slice('A2')), :] - expected = df1 - tm.assert_frame_equal(result, expected) - - # A3 - Get all values under "B1" or "B2" - result = df1.loc[(slice(None), slice('B1', 'B2')), :] - expected = df1.iloc[[2, 3, 4, 7, 8, 9, 12, 13, 14]] - tm.assert_frame_equal(result, expected) - - # A4 - Get all values between 2013-07-02 and 2013-07-09 - result = df1.loc[(slice(None), slice(None), - slice('20130702', '20130709')), :] - expected = df1.iloc[[1, 2, 6, 7, 12]] - tm.assert_frame_equal(result, expected) - - # B1 - Get all values in B0 that are also under A0, A1 and A2 - result = df1.loc[(slice('A2'), slice('B0')), :] - expected = df1.iloc[[0, 1, 5, 6, 10, 11]] - tm.assert_frame_equal(result, expected) - - # B2 - Get all values in B0, B1 and B2 (similar to what #2 is doing for - # the As) - result = df1.loc[(slice(None), slice('B2')), :] - expected = df1 - tm.assert_frame_equal(result, expected) - - # B3 - Get all values from B1 to B2 and up to 2013-08-06 - result = df1.loc[(slice(None), slice('B1', 'B2'), - slice('2013-08-06')), :] - expected = df1.iloc[[2, 3, 4, 7, 8, 9, 12, 13]] - tm.assert_frame_equal(result, expected) - - # B4 - Same as A4 but the start of the date slice is not a key. 
- # shows indexing on a partial selection slice - result = df1.loc[(slice(None), slice(None), - slice('20130701', '20130709')), :] - expected = df1.iloc[[1, 2, 6, 7, 12]] - tm.assert_frame_equal(result, expected) - - def test_per_axis_per_level_doc_examples(self): - - # test index maker - idx = pd.IndexSlice - - # from indexing.rst / advanced - index = MultiIndex.from_product([_mklbl('A', 4), _mklbl('B', 2), - _mklbl('C', 4), _mklbl('D', 2)]) - columns = MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'), - ('b', 'foo'), ('b', 'bah')], - names=['lvl0', 'lvl1']) - df = DataFrame(np.arange(len(index) * len(columns), dtype='int64') - .reshape((len(index), len(columns))), - index=index, columns=columns) - result = df.loc[(slice('A1', 'A3'), slice(None), ['C1', 'C3']), :] - expected = df.loc[[tuple([a, b, c, d]) - for a, b, c, d in df.index.values - if (a == 'A1' or a == 'A2' or a == 'A3') and ( - c == 'C1' or c == 'C3')]] - tm.assert_frame_equal(result, expected) - result = df.loc[idx['A1':'A3', :, ['C1', 'C3']], :] - tm.assert_frame_equal(result, expected) - - result = df.loc[(slice(None), slice(None), ['C1', 'C3']), :] - expected = df.loc[[tuple([a, b, c, d]) - for a, b, c, d in df.index.values - if (c == 'C1' or c == 'C3')]] - tm.assert_frame_equal(result, expected) - result = df.loc[idx[:, :, ['C1', 'C3']], :] - tm.assert_frame_equal(result, expected) - - # not sorted - def f(): - df.loc['A1', (slice(None), 'foo')] - - self.assertRaises(UnsortedIndexError, f) - df = df.sort_index(axis=1) - - # slicing - df.loc['A1', (slice(None), 'foo')] - df.loc[(slice(None), slice(None), ['C1', 'C3']), (slice(None), 'foo')] - - # setitem - df.loc(axis=0)[:, :, ['C1', 'C3']] = -10 - - def test_loc_axis_arguments(self): - - index = MultiIndex.from_product([_mklbl('A', 4), _mklbl('B', 2), - _mklbl('C', 4), _mklbl('D', 2)]) - columns = MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'), - ('b', 'foo'), ('b', 'bah')], - names=['lvl0', 'lvl1']) - df = DataFrame(np.arange(len(index) * len(columns), dtype='int64') - .reshape((len(index), len(columns))), - index=index, - columns=columns).sort_index().sort_index(axis=1) - - # axis 0 - result = df.loc(axis=0)['A1':'A3', :, ['C1', 'C3']] - expected = df.loc[[tuple([a, b, c, d]) - for a, b, c, d in df.index.values - if (a == 'A1' or a == 'A2' or a == 'A3') and ( - c == 'C1' or c == 'C3')]] - tm.assert_frame_equal(result, expected) - - result = df.loc(axis='index')[:, :, ['C1', 'C3']] - expected = df.loc[[tuple([a, b, c, d]) - for a, b, c, d in df.index.values - if (c == 'C1' or c == 'C3')]] - tm.assert_frame_equal(result, expected) - - # axis 1 - result = df.loc(axis=1)[:, 'foo'] - expected = df.loc[:, (slice(None), 'foo')] - tm.assert_frame_equal(result, expected) - - result = df.loc(axis='columns')[:, 'foo'] - expected = df.loc[:, (slice(None), 'foo')] - tm.assert_frame_equal(result, expected) - - # invalid axis - def f(): - df.loc(axis=-1)[:, :, ['C1', 'C3']] - - self.assertRaises(ValueError, f) - - def f(): - df.loc(axis=2)[:, :, ['C1', 'C3']] - - self.assertRaises(ValueError, f) - - def f(): - df.loc(axis='foo')[:, :, ['C1', 'C3']] - - self.assertRaises(ValueError, f) - - def test_loc_coerceion(self): - - # 12411 - df = DataFrame({'date': [pd.Timestamp('20130101').tz_localize('UTC'), - pd.NaT]}) - expected = df.dtypes - - result = df.iloc[[0]] - tm.assert_series_equal(result.dtypes, expected) - - result = df.iloc[[1]] - tm.assert_series_equal(result.dtypes, expected) - - # 12045 - import datetime - df = DataFrame({'date': [datetime.datetime(2012, 1, 1), - 
datetime.datetime(1012, 1, 2)]}) - expected = df.dtypes - - result = df.iloc[[0]] - tm.assert_series_equal(result.dtypes, expected) - - result = df.iloc[[1]] - tm.assert_series_equal(result.dtypes, expected) - - # 11594 - df = DataFrame({'text': ['some words'] + [None] * 9}) - expected = df.dtypes - - result = df.iloc[0:2] - tm.assert_series_equal(result.dtypes, expected) - - result = df.iloc[3:] - tm.assert_series_equal(result.dtypes, expected) - - def test_per_axis_per_level_setitem(self): - - # test index maker - idx = pd.IndexSlice - - # test multi-index slicing with per axis and per index controls - index = MultiIndex.from_tuples([('A', 1), ('A', 2), - ('A', 3), ('B', 1)], - names=['one', 'two']) - columns = MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'), - ('b', 'foo'), ('b', 'bah')], - names=['lvl0', 'lvl1']) - - df_orig = DataFrame( - np.arange(16, dtype='int64').reshape( - 4, 4), index=index, columns=columns) - df_orig = df_orig.sort_index(axis=0).sort_index(axis=1) - - # identity - df = df_orig.copy() - df.loc[(slice(None), slice(None)), :] = 100 - expected = df_orig.copy() - expected.iloc[:, :] = 100 - tm.assert_frame_equal(df, expected) - - df = df_orig.copy() - df.loc(axis=0)[:, :] = 100 - expected = df_orig.copy() - expected.iloc[:, :] = 100 - tm.assert_frame_equal(df, expected) - df = df_orig.copy() - df.loc[(slice(None), slice(None)), (slice(None), slice(None))] = 100 - expected = df_orig.copy() - expected.iloc[:, :] = 100 - tm.assert_frame_equal(df, expected) - - df = df_orig.copy() - df.loc[:, (slice(None), slice(None))] = 100 - expected = df_orig.copy() - expected.iloc[:, :] = 100 - tm.assert_frame_equal(df, expected) +""" test fancy indexing & misc """ - # index - df = df_orig.copy() - df.loc[(slice(None), [1]), :] = 100 - expected = df_orig.copy() - expected.iloc[[0, 3]] = 100 - tm.assert_frame_equal(df, expected) +import pytest - df = df_orig.copy() - df.loc[(slice(None), 1), :] = 100 - expected = df_orig.copy() - expected.iloc[[0, 3]] = 100 - tm.assert_frame_equal(df, expected) +from warnings import catch_warnings +from datetime import datetime - df = df_orig.copy() - df.loc(axis=0)[:, 1] = 100 - expected = df_orig.copy() - expected.iloc[[0, 3]] = 100 - tm.assert_frame_equal(df, expected) +from pandas.core.dtypes.common import ( + is_integer_dtype, + is_float_dtype) +from pandas.compat import range, lrange, lzip, StringIO +import numpy as np - # columns - df = df_orig.copy() - df.loc[:, (slice(None), ['foo'])] = 100 - expected = df_orig.copy() - expected.iloc[:, [1, 3]] = 100 - tm.assert_frame_equal(df, expected) +import pandas as pd +from pandas.core.indexing import _non_reducing_slice, _maybe_numeric_slice +from pandas import NaT, DataFrame, Index, Series, MultiIndex +import pandas.util.testing as tm - # both - df = df_orig.copy() - df.loc[(slice(None), 1), (slice(None), ['foo'])] = 100 - expected = df_orig.copy() - expected.iloc[[0, 3], [1, 3]] = 100 - tm.assert_frame_equal(df, expected) +from pandas.tests.indexing.common import Base, _mklbl - df = df_orig.copy() - df.loc[idx[:, 1], idx[:, ['foo']]] = 100 - expected = df_orig.copy() - expected.iloc[[0, 3], [1, 3]] = 100 - tm.assert_frame_equal(df, expected) - df = df_orig.copy() - df.loc['A', 'a'] = 100 - expected = df_orig.copy() - expected.iloc[0:3, 0:2] = 100 - tm.assert_frame_equal(df, expected) +# ------------------------------------------------------------------------ +# Indexing test cases - # setting with a list-like - df = df_orig.copy() - df.loc[(slice(None), 1), (slice(None), ['foo'])] = np.array( - 
[[100, 100], [100, 100]], dtype='int64') - expected = df_orig.copy() - expected.iloc[[0, 3], [1, 3]] = 100 - tm.assert_frame_equal(df, expected) - # not enough values - df = df_orig.copy() +class TestFancy(Base): + """ pure get/set item & fancy indexing """ - def f(): - df.loc[(slice(None), 1), (slice(None), ['foo'])] = np.array( - [[100], [100, 100]], dtype='int64') + def test_setitem_ndarray_1d(self): + # GH5508 - self.assertRaises(ValueError, f) + # len of indexer vs length of the 1d ndarray + df = DataFrame(index=Index(lrange(1, 11))) + df['foo'] = np.zeros(10, dtype=np.float64) + df['bar'] = np.zeros(10, dtype=np.complex) + # invalid def f(): - df.loc[(slice(None), 1), (slice(None), ['foo'])] = np.array( - [100, 100, 100, 100], dtype='int64') - - self.assertRaises(ValueError, f) - - # with an alignable rhs - df = df_orig.copy() - df.loc[(slice(None), 1), (slice(None), ['foo'])] = df.loc[(slice( - None), 1), (slice(None), ['foo'])] * 5 - expected = df_orig.copy() - expected.iloc[[0, 3], [1, 3]] = expected.iloc[[0, 3], [1, 3]] * 5 - tm.assert_frame_equal(df, expected) - - df = df_orig.copy() - df.loc[(slice(None), 1), (slice(None), ['foo'])] *= df.loc[(slice( - None), 1), (slice(None), ['foo'])] - expected = df_orig.copy() - expected.iloc[[0, 3], [1, 3]] *= expected.iloc[[0, 3], [1, 3]] - tm.assert_frame_equal(df, expected) - - rhs = df_orig.loc[(slice(None), 1), (slice(None), ['foo'])].copy() - rhs.loc[:, ('c', 'bah')] = 10 - df = df_orig.copy() - df.loc[(slice(None), 1), (slice(None), ['foo'])] *= rhs - expected = df_orig.copy() - expected.iloc[[0, 3], [1, 3]] *= expected.iloc[[0, 3], [1, 3]] - tm.assert_frame_equal(df, expected) - - def test_multiindex_setitem(self): - - # GH 3738 - # setting with a multi-index right hand side - arrays = [np.array(['bar', 'bar', 'baz', 'qux', 'qux', 'bar']), - np.array(['one', 'two', 'one', 'one', 'two', 'one']), - np.arange(0, 6, 1)] - - df_orig = pd.DataFrame(np.random.randn(6, 3), - index=arrays, - columns=['A', 'B', 'C']).sort_index() - - expected = df_orig.loc[['bar']] * 2 - df = df_orig.copy() - df.loc[['bar']] *= 2 - tm.assert_frame_equal(df.loc[['bar']], expected) + df.loc[df.index[2:5], 'bar'] = np.array([2.33j, 1.23 + 0.1j, + 2.2, 1.0]) - # raise because these have differing levels - def f(): - df.loc['bar'] *= 2 - - self.assertRaises(TypeError, f) - - # from SO - # http://stackoverflow.com/questions/24572040/pandas-access-the-level-of-multiindex-for-inplace-operation - df_orig = DataFrame.from_dict({'price': { - ('DE', 'Coal', 'Stock'): 2, - ('DE', 'Gas', 'Stock'): 4, - ('DE', 'Elec', 'Demand'): 1, - ('FR', 'Gas', 'Stock'): 5, - ('FR', 'Solar', 'SupIm'): 0, - ('FR', 'Wind', 'SupIm'): 0 - }}) - df_orig.index = MultiIndex.from_tuples(df_orig.index, - names=['Sit', 'Com', 'Type']) - - expected = df_orig.copy() - expected.iloc[[0, 2, 3]] *= 2 - - idx = pd.IndexSlice - df = df_orig.copy() - df.loc[idx[:, :, 'Stock'], :] *= 2 - tm.assert_frame_equal(df, expected) + pytest.raises(ValueError, f) - df = df_orig.copy() - df.loc[idx[:, :, 'Stock'], 'price'] *= 2 - tm.assert_frame_equal(df, expected) + # valid + df.loc[df.index[2:6], 'bar'] = np.array([2.33j, 1.23 + 0.1j, + 2.2, 1.0]) - def test_getitem_multiindex(self): - # GH 5725 the 'A' happens to be a valid Timestamp so the doesn't raise - # the appropriate error, only in PY3 of course! 
- index = MultiIndex(levels=[['D', 'B', 'C'], - [0, 26, 27, 37, 57, 67, 75, 82]], - labels=[[0, 0, 0, 1, 2, 2, 2, 2, 2, 2], - [1, 3, 4, 6, 0, 2, 2, 3, 5, 7]], - names=['tag', 'day']) - arr = np.random.randn(len(index), 1) - df = DataFrame(arr, index=index, columns=['val']) - result = df.val['D'] - expected = Series(arr.ravel()[0:3], name='val', index=Index( - [26, 37, 57], name='day')) + result = df.loc[df.index[2:6], 'bar'] + expected = Series([2.33j, 1.23 + 0.1j, 2.2, 1.0], index=[3, 4, 5, 6], + name='bar') tm.assert_series_equal(result, expected) - def f(): - df.val['A'] - - self.assertRaises(KeyError, f) - - def f(): - df.val['X'] - - self.assertRaises(KeyError, f) - - # A is treated as a special Timestamp - index = MultiIndex(levels=[['A', 'B', 'C'], - [0, 26, 27, 37, 57, 67, 75, 82]], - labels=[[0, 0, 0, 1, 2, 2, 2, 2, 2, 2], - [1, 3, 4, 6, 0, 2, 2, 3, 5, 7]], - names=['tag', 'day']) - df = DataFrame(arr, index=index, columns=['val']) - result = df.val['A'] - expected = Series(arr.ravel()[0:3], name='val', index=Index( - [26, 37, 57], name='day')) - tm.assert_series_equal(result, expected) + # dtype getting changed? + df = DataFrame(index=Index(lrange(1, 11))) + df['foo'] = np.zeros(10, dtype=np.float64) + df['bar'] = np.zeros(10, dtype=np.complex) def f(): - df.val['X'] - - self.assertRaises(KeyError, f) - - # GH 7866 - # multi-index slicing with missing indexers - idx = pd.MultiIndex.from_product([['A', 'B', 'C'], - ['foo', 'bar', 'baz']], - names=['one', 'two']) - s = pd.Series(np.arange(9, dtype='int64'), index=idx).sort_index() + df[2:5] = np.arange(1, 4) * 1j - exp_idx = pd.MultiIndex.from_product([['A'], ['foo', 'bar', 'baz']], - names=['one', 'two']) - expected = pd.Series(np.arange(3, dtype='int64'), - index=exp_idx).sort_index() + pytest.raises(ValueError, f) - result = s.loc[['A']] - tm.assert_series_equal(result, expected) - result = s.loc[['A', 'D']] - tm.assert_series_equal(result, expected) + def test_inf_upcast(self): + # GH 16957 + # We should be able to use np.inf as a key + # np.inf should cause an index to convert to float - # not any values found - self.assertRaises(KeyError, lambda: s.loc[['D']]) + # Test with np.inf in rows + df = pd.DataFrame(columns=[0]) + df.loc[1] = 1 + df.loc[2] = 2 + df.loc[np.inf] = 3 - # empty ok - result = s.loc[[]] - expected = s.iloc[[]] - tm.assert_series_equal(result, expected) + # make sure we can look up the value + assert df.loc[np.inf, 0] == 3 - idx = pd.IndexSlice - expected = pd.Series([0, 3, 6], index=pd.MultiIndex.from_product( - [['A', 'B', 'C'], ['foo']], names=['one', 'two'])).sort_index() + result = df.index + expected = pd.Float64Index([1, 2, np.inf]) + tm.assert_index_equal(result, expected) - result = s.loc[idx[:, ['foo']]] - tm.assert_series_equal(result, expected) - result = s.loc[idx[:, ['foo', 'bah']]] - tm.assert_series_equal(result, expected) + # Test with np.inf in columns + df = pd.DataFrame() + df.loc[0, 0] = 1 + df.loc[1, 1] = 2 + df.loc[0, np.inf] = 3 - # GH 8737 - # empty indexer - multi_index = pd.MultiIndex.from_product((['foo', 'bar', 'baz'], - ['alpha', 'beta'])) - df = DataFrame( - np.random.randn(5, 6), index=range(5), columns=multi_index) - df = df.sort_index(level=0, axis=1) - - expected = DataFrame(index=range(5), - columns=multi_index.reindex([])[0]) - result1 = df.loc[:, ([], slice(None))] - result2 = df.loc[:, (['foo'], [])] - tm.assert_frame_equal(result1, expected) - tm.assert_frame_equal(result2, expected) - - # regression from < 0.14.0 - # GH 7914 - df = DataFrame([[np.mean, np.median], 
['mean', 'median']], - columns=MultiIndex.from_tuples([('functs', 'mean'), - ('functs', 'median')]), - index=['function', 'name']) - result = df.loc['function', ('functs', 'mean')] - self.assertEqual(result, np.mean) + result = df.columns + expected = pd.Float64Index([0, 1, np.inf]) + tm.assert_index_equal(result, expected) def test_setitem_dtype_upcast(self): # GH3216 df = DataFrame([{"a": 1}, {"a": 3, "b": 2}]) df['c'] = np.nan - self.assertEqual(df['c'].dtype, np.float64) + assert df['c'].dtype == np.float64 df.loc[0, 'c'] = 'foo' expected = DataFrame([{"a": 1, "c": 'foo'}, @@ -2906,8 +115,8 @@ def test_setitem_dtype_upcast(self): columns=['foo', 'bar', 'baz']) tm.assert_frame_equal(left, right) - self.assertTrue(is_integer_dtype(left['foo'])) - self.assertTrue(is_integer_dtype(left['baz'])) + assert is_integer_dtype(left['foo']) + assert is_integer_dtype(left['baz']) left = DataFrame(np.arange(6, dtype='int64').reshape(2, 3) / 10.0, index=list('ab'), @@ -2918,21 +127,8 @@ def test_setitem_dtype_upcast(self): columns=['foo', 'bar', 'baz']) tm.assert_frame_equal(left, right) - self.assertTrue(is_float_dtype(left['foo'])) - self.assertTrue(is_float_dtype(left['baz'])) - - def test_setitem_iloc(self): - - # setitem with an iloc list - df = DataFrame(np.arange(9).reshape((3, 3)), index=["A", "B", "C"], - columns=["A", "B", "C"]) - df.iloc[[0, 1], [1, 2]] - df.iloc[[0, 1], [1, 2]] += 100 - - expected = DataFrame( - np.array([0, 101, 102, 3, 104, 105, 6, 7, 8]).reshape((3, 3)), - index=["A", "B", "C"], columns=["A", "B", "C"]) - tm.assert_frame_equal(df, expected) + assert is_float_dtype(left['foo']) + assert is_float_dtype(left['baz']) def test_dups_fancy_indexing(self): @@ -2942,7 +138,7 @@ def test_dups_fancy_indexing(self): df.columns = ['a', 'a', 'b'] result = df[['b', 'a']].columns expected = Index(['b', 'a', 'a']) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) # across dtypes df = DataFrame([[1, 2, 1., 2., 3., 'foo', 'bar']], @@ -2995,23 +191,24 @@ def test_dups_fancy_indexing(self): # inconsistent returns for unique/duplicate indices when values are # missing - df = DataFrame(randn(4, 3), index=list('ABCD')) - expected = df.ix[['E']] + df = DataFrame(np.random.randn(4, 3), index=list('ABCD')) + expected = df.reindex(['E']) - dfnu = DataFrame(randn(5, 3), index=list('AABCD')) - result = dfnu.ix[['E']] + dfnu = DataFrame(np.random.randn(5, 3), index=list('AABCD')) + with catch_warnings(record=True): + result = dfnu.ix[['E']] tm.assert_frame_equal(result, expected) # ToDo: check_index_type can be True after GH 11497 # GH 4619; duplicate indexer with missing label df = DataFrame({"A": [0, 1, 2]}) - result = df.ix[[0, 8, 0]] + result = df.loc[[0, 8, 0]] expected = DataFrame({"A": [0, np.nan, 0]}, index=[0, 8, 0]) tm.assert_frame_equal(result, expected, check_index_type=False) df = DataFrame({"A": list('abc')}) - result = df.ix[[0, 8, 0]] + result = df.loc[[0, 8, 0]] expected = DataFrame({"A": ['a', np.nan, 'a']}, index=[0, 8, 0]) tm.assert_frame_equal(result, expected, check_index_type=False) @@ -3019,7 +216,7 @@ def test_dups_fancy_indexing(self): df = DataFrame({'test': [5, 7, 9, 11]}, index=['A', 'A', 'B', 'C']) expected = DataFrame( {'test': [5, 7, 5, 7, np.nan]}, index=['A', 'A', 'A', 'A', 'E']) - result = df.ix[['A', 'A', 'E']] + result = df.loc[['A', 'A', 'E']] tm.assert_frame_equal(result, expected) # GH 5835 @@ -3028,9 +225,9 @@ def test_dups_fancy_indexing(self): np.random.randn(5, 5), columns=['A', 'B', 'B', 'B', 'A']) expected = pd.concat( 
- [df.ix[:, ['A', 'B']], DataFrame(np.nan, columns=['C'], - index=df.index)], axis=1) - result = df.ix[:, ['A', 'B', 'C']] + [df.loc[:, ['A', 'B']], DataFrame(np.nan, columns=['C'], + index=df.index)], axis=1) + result = df.loc[:, ['A', 'B', 'C']] tm.assert_frame_equal(result, expected) # GH 6504, multi-axis indexing @@ -3060,9 +257,9 @@ def test_indexing_mixed_frame_bug(self): # this does not work, ie column test is not changed idx = df['test'] == '_' - temp = df.ix[idx, 'a'].apply(lambda x: '-----' if x == 'aaa' else x) - df.ix[idx, 'test'] = temp - self.assertEqual(df.iloc[0, 2], '-----') + temp = df.loc[idx, 'a'].apply(lambda x: '-----' if x == 'aaa' else x) + df.loc[idx, 'test'] = temp + assert df.iloc[0, 2] == '-----' # if I look at df, then element [0,2] equals '_'. If instead I type # df.ix[idx,'test'], I get '-----', finally by typing df.iloc[0,2] I @@ -3073,9 +270,9 @@ def test_multitype_list_index_access(self): df = pd.DataFrame(np.random.random((10, 5)), columns=["a"] + [20, 21, 22, 23]) - with self.assertRaises(KeyError): + with pytest.raises(KeyError): df[[22, 26, -8]] - self.assertEqual(df[21].shape[0], df.shape[0]) + assert df[21].shape[0] == df.shape[0] def test_set_index_nan(self): @@ -3097,17 +294,17 @@ def test_set_index_nan(self): 'QC': {17: 0.0, 18: 0.0, 19: 0.0, - 20: nan, - 21: nan, - 22: nan, - 23: nan, + 20: np.nan, + 21: np.nan, + 22: np.nan, + 23: np.nan, 24: 1.0, - 25: nan, - 26: nan, - 27: nan, - 28: nan, - 29: nan, - 30: nan}, + 25: np.nan, + 26: np.nan, + 27: np.nan, + 28: np.nan, + 29: np.nan, + 30: np.nan}, 'data': {17: 7.9544899999999998, 18: 8.0142609999999994, 19: 7.8591520000000008, @@ -3156,233 +353,6 @@ def test_multi_nan_indexing(self): Index(['C1', 'C2', 'C3', 'C4'], name='b')]) tm.assert_frame_equal(result, expected) - def test_iloc_panel_issue(self): - - # GH 3617 - p = Panel(randn(4, 4, 4)) - - self.assertEqual(p.iloc[:3, :3, :3].shape, (3, 3, 3)) - self.assertEqual(p.iloc[1, :3, :3].shape, (3, 3)) - self.assertEqual(p.iloc[:3, 1, :3].shape, (3, 3)) - self.assertEqual(p.iloc[:3, :3, 1].shape, (3, 3)) - self.assertEqual(p.iloc[1, 1, :3].shape, (3, )) - self.assertEqual(p.iloc[1, :3, 1].shape, (3, )) - self.assertEqual(p.iloc[:3, 1, 1].shape, (3, )) - - def test_panel_getitem(self): - # GH4016, date selection returns a frame when a partial string - # selection - ind = date_range(start="2000", freq="D", periods=1000) - df = DataFrame( - np.random.randn( - len(ind), 5), index=ind, columns=list('ABCDE')) - panel = Panel(dict([('frame_' + c, df) for c in list('ABC')])) - - test2 = panel.ix[:, "2002":"2002-12-31"] - test1 = panel.ix[:, "2002"] - tm.assert_panel_equal(test1, test2) - - # GH8710 - # multi-element getting with a list - panel = tm.makePanel() - - expected = panel.iloc[[0, 1]] - - result = panel.loc[['ItemA', 'ItemB']] - tm.assert_panel_equal(result, expected) - - result = panel.loc[['ItemA', 'ItemB'], :, :] - tm.assert_panel_equal(result, expected) - - result = panel[['ItemA', 'ItemB']] - tm.assert_panel_equal(result, expected) - - result = panel.loc['ItemA':'ItemB'] - tm.assert_panel_equal(result, expected) - - result = panel.ix['ItemA':'ItemB'] - tm.assert_panel_equal(result, expected) - - result = panel.ix[['ItemA', 'ItemB']] - tm.assert_panel_equal(result, expected) - - # with an object-like - # GH 9140 - class TestObject: - - def __str__(self): - return "TestObject" - - obj = TestObject() - - p = Panel(np.random.randn(1, 5, 4), items=[obj], - major_axis=date_range('1/1/2000', periods=5), - minor_axis=['A', 'B', 'C', 'D']) - - expected 
= p.iloc[0] - result = p[obj] - tm.assert_frame_equal(result, expected) - - def test_panel_setitem(self): - - # GH 7763 - # loc and setitem have setting differences - np.random.seed(0) - index = range(3) - columns = list('abc') - - panel = Panel({'A': DataFrame(np.random.randn(3, 3), - index=index, columns=columns), - 'B': DataFrame(np.random.randn(3, 3), - index=index, columns=columns), - 'C': DataFrame(np.random.randn(3, 3), - index=index, columns=columns)}) - - replace = DataFrame(np.eye(3, 3), index=range(3), columns=columns) - expected = Panel({'A': replace, 'B': replace, 'C': replace}) - - p = panel.copy() - for idx in list('ABC'): - p[idx] = replace - tm.assert_panel_equal(p, expected) - - p = panel.copy() - for idx in list('ABC'): - p.loc[idx, :, :] = replace - tm.assert_panel_equal(p, expected) - - def test_panel_setitem_with_multiindex(self): - - # 10360 - # failing with a multi-index - arr = np.array([[[1, 2, 3], [0, 0, 0]], [[0, 0, 0], [0, 0, 0]]], - dtype=np.float64) - - # reg index - axes = dict(items=['A', 'B'], major_axis=[0, 1], - minor_axis=['X', 'Y', 'Z']) - p1 = Panel(0., **axes) - p1.iloc[0, 0, :] = [1, 2, 3] - expected = Panel(arr, **axes) - tm.assert_panel_equal(p1, expected) - - # multi-indexes - axes['items'] = pd.MultiIndex.from_tuples([('A', 'a'), ('B', 'b')]) - p2 = Panel(0., **axes) - p2.iloc[0, 0, :] = [1, 2, 3] - expected = Panel(arr, **axes) - tm.assert_panel_equal(p2, expected) - - axes['major_axis'] = pd.MultiIndex.from_tuples([('A', 1), ('A', 2)]) - p3 = Panel(0., **axes) - p3.iloc[0, 0, :] = [1, 2, 3] - expected = Panel(arr, **axes) - tm.assert_panel_equal(p3, expected) - - axes['minor_axis'] = pd.MultiIndex.from_product([['X'], range(3)]) - p4 = Panel(0., **axes) - p4.iloc[0, 0, :] = [1, 2, 3] - expected = Panel(arr, **axes) - tm.assert_panel_equal(p4, expected) - - arr = np.array( - [[[1, 0, 0], [2, 0, 0]], [[0, 0, 0], [0, 0, 0]]], dtype=np.float64) - p5 = Panel(0., **axes) - p5.iloc[0, :, 0] = [1, 2] - expected = Panel(arr, **axes) - tm.assert_panel_equal(p5, expected) - - def test_panel_assignment(self): - # GH3777 - wp = Panel(randn(2, 5, 4), items=['Item1', 'Item2'], - major_axis=date_range('1/1/2000', periods=5), - minor_axis=['A', 'B', 'C', 'D']) - wp2 = Panel(randn(2, 5, 4), items=['Item1', 'Item2'], - major_axis=date_range('1/1/2000', periods=5), - minor_axis=['A', 'B', 'C', 'D']) - - # TODO: unused? 
- # expected = wp.loc[['Item1', 'Item2'], :, ['A', 'B']] - - def f(): - wp.loc[['Item1', 'Item2'], :, ['A', 'B']] = wp2.loc[ - ['Item1', 'Item2'], :, ['A', 'B']] - - self.assertRaises(NotImplementedError, f) - - # to_assign = wp2.loc[['Item1', 'Item2'], :, ['A', 'B']] - # wp.loc[['Item1', 'Item2'], :, ['A', 'B']] = to_assign - # result = wp.loc[['Item1', 'Item2'], :, ['A', 'B']] - # tm.assert_panel_equal(result,expected) - - def test_multiindex_assignment(self): - - # GH3777 part 2 - - # mixed dtype - df = DataFrame(np.random.randint(5, 10, size=9).reshape(3, 3), - columns=list('abc'), - index=[[4, 4, 8], [8, 10, 12]]) - df['d'] = np.nan - arr = np.array([0., 1.]) - - df.ix[4, 'd'] = arr - tm.assert_series_equal(df.ix[4, 'd'], - Series(arr, index=[8, 10], name='d')) - - # single dtype - df = DataFrame(np.random.randint(5, 10, size=9).reshape(3, 3), - columns=list('abc'), - index=[[4, 4, 8], [8, 10, 12]]) - - df.ix[4, 'c'] = arr - exp = Series(arr, index=[8, 10], name='c', dtype='float64') - tm.assert_series_equal(df.ix[4, 'c'], exp) - - # scalar ok - df.ix[4, 'c'] = 10 - exp = Series(10, index=[8, 10], name='c', dtype='float64') - tm.assert_series_equal(df.ix[4, 'c'], exp) - - # invalid assignments - def f(): - df.ix[4, 'c'] = [0, 1, 2, 3] - - self.assertRaises(ValueError, f) - - def f(): - df.ix[4, 'c'] = [0] - - self.assertRaises(ValueError, f) - - # groupby example - NUM_ROWS = 100 - NUM_COLS = 10 - col_names = ['A' + num for num in - map(str, np.arange(NUM_COLS).tolist())] - index_cols = col_names[:5] - - df = DataFrame(np.random.randint(5, size=(NUM_ROWS, NUM_COLS)), - dtype=np.int64, columns=col_names) - df = df.set_index(index_cols).sort_index() - grp = df.groupby(level=index_cols[:4]) - df['new_col'] = np.nan - - f_index = np.arange(5) - - def f(name, df2): - return Series(np.arange(df2.shape[0]), - name=df2.index.values[0]).reindex(f_index) - - # TODO(wesm): unused? 
- # new_df = pd.concat([f(name, df2) for name, df2 in grp], axis=1).T - - # we are actually operating on a copy here - # but in this case, that's ok - for name, df2 in grp: - new_vals = np.arange(df2.shape[0]) - df.ix[name, 'new_col'] = new_vals - def test_multi_assign(self): # GH 3626, an assignement of a sub-df to a df @@ -3390,14 +360,14 @@ def test_multi_assign(self): 'PF': [0, 0, 0, 0, 1, 1], 'col1': lrange(6), 'col2': lrange(6, 12)}) - df.ix[1, 0] = np.nan + df.iloc[1, 0] = np.nan df2 = df.copy() - mask = ~df2.FC.isnull() + mask = ~df2.FC.isna() cols = ['col1', 'col2'] dft = df2 * 2 - dft.ix[3, 3] = np.nan + dft.iloc[3, 3] = np.nan expected = DataFrame({'FC': ['a', np.nan, 'a', 'b', 'a', 'b'], 'PF': [0, 0, 0, 0, 1, 1], @@ -3405,17 +375,23 @@ def test_multi_assign(self): 'col2': [12, 7, 16, np.nan, 20, 22]}) # frame on rhs - df2.ix[mask, cols] = dft.ix[mask, cols] + df2.loc[mask, cols] = dft.loc[mask, cols] tm.assert_frame_equal(df2, expected) - df2.ix[mask, cols] = dft.ix[mask, cols] + df2.loc[mask, cols] = dft.loc[mask, cols] tm.assert_frame_equal(df2, expected) # with an ndarray on rhs + # coerces to float64 because values has float64 dtype + # GH 14001 + expected = DataFrame({'FC': ['a', np.nan, 'a', 'b', 'a', 'b'], + 'PF': [0, 0, 0, 0, 1, 1], + 'col1': [0., 1., 4., 6., 8., 10.], + 'col2': [12, 7, 16, np.nan, 20, 22]}) df2 = df.copy() - df2.ix[mask, cols] = dft.ix[mask, cols].values + df2.loc[mask, cols] = dft.loc[mask, cols].values tm.assert_frame_equal(df2, expected) - df2.ix[mask, cols] = dft.ix[mask, cols].values + df2.loc[mask, cols] = dft.loc[mask, cols].values tm.assert_frame_equal(df2, expected) # broadcasting on the rhs is required @@ -3430,79 +406,18 @@ def test_multi_assign(self): df.loc[df['A'] == 0, ['A', 'B']] = df['D'] tm.assert_frame_equal(df, expected) - def test_ix_assign_column_mixed(self): - # GH #1142 - df = DataFrame(tm.getSeriesData()) - df['foo'] = 'bar' - - orig = df.ix[:, 'B'].copy() - df.ix[:, 'B'] = df.ix[:, 'B'] + 1 - tm.assert_series_equal(df.B, orig + 1) - - # GH 3668, mixed frame with series value - df = DataFrame({'x': lrange(10), 'y': lrange(10, 20), 'z': 'bar'}) - expected = df.copy() - - for i in range(5): - indexer = i * 2 - v = 1000 + i * 200 - expected.ix[indexer, 'y'] = v - self.assertEqual(expected.ix[indexer, 'y'], v) - - df.ix[df.x % 2 == 0, 'y'] = df.ix[df.x % 2 == 0, 'y'] * 100 - tm.assert_frame_equal(df, expected) - - # GH 4508, making sure consistency of assignments - df = DataFrame({'a': [1, 2, 3], 'b': [0, 1, 2]}) - df.ix[[0, 2, ], 'b'] = [100, -100] - expected = DataFrame({'a': [1, 2, 3], 'b': [100, 1, -100]}) - tm.assert_frame_equal(df, expected) - - df = pd.DataFrame({'a': lrange(4)}) - df['b'] = np.nan - df.ix[[1, 3], 'b'] = [100, -100] - expected = DataFrame({'a': [0, 1, 2, 3], - 'b': [np.nan, 100, np.nan, -100]}) - tm.assert_frame_equal(df, expected) - - # ok, but chained assignments are dangerous - # if we turn off chained assignement it will work - with option_context('chained_assignment', None): - df = pd.DataFrame({'a': lrange(4)}) - df['b'] = np.nan - df['b'].ix[[1, 3]] = [100, -100] - tm.assert_frame_equal(df, expected) - - def test_ix_get_set_consistency(self): - - # GH 4544 - # ix/loc get/set not consistent when - # a mixed int/string index - df = DataFrame(np.arange(16).reshape((4, 4)), - columns=['a', 'b', 8, 'c'], - index=['e', 7, 'f', 'g']) - - self.assertEqual(df.ix['e', 8], 2) - self.assertEqual(df.loc['e', 8], 2) - - df.ix['e', 8] = 42 - self.assertEqual(df.ix['e', 8], 42) - self.assertEqual(df.loc['e', 8], 42) 
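# [Editor's note] A minimal, self-contained sketch (not part of the patch) of
# the masked multi-column assignment that test_multi_assign pins down above,
# spelled with .loc rather than the deprecated .ix. The data here is made up;
# the dtype claim is the one the test asserts for this era of pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': list(range(6)), 'col2': list(range(6, 12))})
mask = df['col1'] % 2 == 0
cols = ['col1', 'col2']

rhs = df * 2
rhs.iloc[3, 1] = np.nan                           # rhs.values is now float64

df2 = df.copy()
df2.loc[mask, cols] = rhs.loc[mask, cols]         # aligned frame on the RHS

df3 = df.copy()
df3.loc[mask, cols] = rhs.loc[mask, cols].values  # raw ndarray on the RHS:
# per GH 14001 (asserted above), the float64 ndarray coerces both target
# columns to float64, even where the assigned values are whole numbers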
- - df.loc['e', 8] = 45 - self.assertEqual(df.ix['e', 8], 45) - self.assertEqual(df.loc['e', 8], 45) - def test_setitem_list(self): # GH 6043 # ix with a list df = DataFrame(index=[0, 1], columns=[0]) - df.ix[1, 0] = [1, 2, 3] - df.ix[1, 0] = [1, 2] + with catch_warnings(record=True): + df.ix[1, 0] = [1, 2, 3] + df.ix[1, 0] = [1, 2] result = DataFrame(index=[0, 1], columns=[0]) - result.ix[1, 0] = [1, 2] + with catch_warnings(record=True): + result.ix[1, 0] = [1, 2] tm.assert_frame_equal(result, df) @@ -3524,186 +439,24 @@ def view(self): return self df = DataFrame(index=[0, 1], columns=[0]) - df.ix[1, 0] = TO(1) - df.ix[1, 0] = TO(2) + with catch_warnings(record=True): + df.ix[1, 0] = TO(1) + df.ix[1, 0] = TO(2) result = DataFrame(index=[0, 1], columns=[0]) - result.ix[1, 0] = TO(2) + with catch_warnings(record=True): + result.ix[1, 0] = TO(2) tm.assert_frame_equal(result, df) # remains object dtype even after setting it back df = DataFrame(index=[0, 1], columns=[0]) - df.ix[1, 0] = TO(1) - df.ix[1, 0] = np.nan - result = DataFrame(index=[0, 1], columns=[0]) - - tm.assert_frame_equal(result, df) - - def test_iloc_mask(self): - - # GH 3631, iloc with a mask (of a series) should raise - df = DataFrame(lrange(5), list('ABCDE'), columns=['a']) - mask = (df.a % 2 == 0) - self.assertRaises(ValueError, df.iloc.__getitem__, tuple([mask])) - mask.index = lrange(len(mask)) - self.assertRaises(NotImplementedError, df.iloc.__getitem__, - tuple([mask])) - - # ndarray ok - result = df.iloc[np.array([True] * len(mask), dtype=bool)] - tm.assert_frame_equal(result, df) - - # the possibilities - locs = np.arange(4) - nums = 2 ** locs - reps = lmap(bin, nums) - df = DataFrame({'locs': locs, 'nums': nums}, reps) - - expected = { - (None, ''): '0b1100', - (None, '.loc'): '0b1100', - (None, '.iloc'): '0b1100', - ('index', ''): '0b11', - ('index', '.loc'): '0b11', - ('index', '.iloc'): ('iLocation based boolean indexing ' - 'cannot use an indexable as a mask'), - ('locs', ''): 'Unalignable boolean Series provided as indexer ' - '(index of the boolean Series and of the indexed ' - 'object do not match', - ('locs', '.loc'): 'Unalignable boolean Series provided as indexer ' - '(index of the boolean Series and of the ' - 'indexed object do not match', - ('locs', '.iloc'): ('iLocation based boolean indexing on an ' - 'integer type is not available'), - } - - # UserWarnings from reindex of a boolean mask - with warnings.catch_warnings(record=True): - result = dict() - for idx in [None, 'index', 'locs']: - mask = (df.nums > 2).values - if idx: - mask = Series(mask, list(reversed(getattr(df, idx)))) - for method in ['', '.loc', '.iloc']: - try: - if method: - accessor = getattr(df, method[1:]) - else: - accessor = df - ans = str(bin(accessor[mask]['nums'].sum())) - except Exception as e: - ans = str(e) - - key = tuple([idx, method]) - r = expected.get(key) - if r != ans: - raise AssertionError( - "[%s] does not match [%s], received [%s]" - % (key, ans, r)) - - def test_ix_slicing_strings(self): - # GH3836 - data = {'Classification': - ['SA EQUITY CFD', 'bbb', 'SA EQUITY', 'SA SSF', 'aaa'], - 'Random': [1, 2, 3, 4, 5], - 'X': ['correct', 'wrong', 'correct', 'correct', 'wrong']} - df = DataFrame(data) - x = df[~df.Classification.isin(['SA EQUITY CFD', 'SA EQUITY', 'SA SSF' - ])] - df.ix[x.index, 'X'] = df['Classification'] - - expected = DataFrame({'Classification': {0: 'SA EQUITY CFD', - 1: 'bbb', - 2: 'SA EQUITY', - 3: 'SA SSF', - 4: 'aaa'}, - 'Random': {0: 1, - 1: 2, - 2: 3, - 3: 4, - 4: 5}, - 'X': {0: 'correct', - 1: 
'bbb', - 2: 'correct', - 3: 'correct', - 4: 'aaa'}}) # bug was 4: 'bbb' - - tm.assert_frame_equal(df, expected) - - def test_non_unique_loc(self): - # GH3659 - # non-unique indexer with loc slice - # https://groups.google.com/forum/?fromgroups#!topic/pydata/zTm2No0crYs - - # these are going to raise becuase the we are non monotonic - df = DataFrame({'A': [1, 2, 3, 4, 5, 6], - 'B': [3, 4, 5, 6, 7, 8]}, index=[0, 1, 0, 1, 2, 3]) - self.assertRaises(KeyError, df.loc.__getitem__, - tuple([slice(1, None)])) - self.assertRaises(KeyError, df.loc.__getitem__, - tuple([slice(0, None)])) - self.assertRaises(KeyError, df.loc.__getitem__, tuple([slice(1, 2)])) - - # monotonic are ok - df = DataFrame({'A': [1, 2, 3, 4, 5, 6], - 'B': [3, 4, 5, 6, 7, 8]}, - index=[0, 1, 0, 1, 2, 3]).sort_index(axis=0) - result = df.loc[1:] - expected = DataFrame({'A': [2, 4, 5, 6], 'B': [4, 6, 7, 8]}, - index=[1, 1, 2, 3]) - tm.assert_frame_equal(result, expected) - - result = df.loc[0:] - tm.assert_frame_equal(result, df) - - result = df.loc[1:2] - expected = DataFrame({'A': [2, 4, 5], 'B': [4, 6, 7]}, - index=[1, 1, 2]) - tm.assert_frame_equal(result, expected) - - def test_loc_name(self): - # GH 3880 - df = DataFrame([[1, 1], [1, 1]]) - df.index.name = 'index_name' - result = df.iloc[[0, 1]].index.name - self.assertEqual(result, 'index_name') - - result = df.ix[[0, 1]].index.name - self.assertEqual(result, 'index_name') - - result = df.loc[[0, 1]].index.name - self.assertEqual(result, 'index_name') - - def test_iloc_non_unique_indexing(self): - - # GH 4017, non-unique indexing (on the axis) - df = DataFrame({'A': [0.1] * 3000, 'B': [1] * 3000}) - idx = np.array(lrange(30)) * 99 - expected = df.iloc[idx] - - df3 = pd.concat([df, 2 * df, 3 * df]) - result = df3.iloc[idx] - - tm.assert_frame_equal(result, expected) - - df2 = DataFrame({'A': [0.1] * 1000, 'B': [1] * 1000}) - df2 = pd.concat([df2, 2 * df2, 3 * df2]) - - sidx = df2.index.to_series() - expected = df2.iloc[idx[idx <= sidx.max()]] - - new_list = [] - for r, s in expected.iterrows(): - new_list.append(s) - new_list.append(s * 2) - new_list.append(s * 3) + with catch_warnings(record=True): + df.ix[1, 0] = TO(1) + df.ix[1, 0] = np.nan + result = DataFrame(index=[0, 1], columns=[0]) - expected = DataFrame(new_list) - expected = pd.concat([expected, DataFrame(index=idx[idx > sidx.max()]) - ]) - result = df2.loc[idx] - tm.assert_frame_equal(result, expected, check_index_type=False) + tm.assert_frame_equal(result, df) def test_string_slice(self): # GH 14424 @@ -3711,19 +464,19 @@ def test_string_slice(self): # dtype should properly raises KeyError df = pd.DataFrame([1], pd.Index([pd.Timestamp('2011-01-01')], dtype=object)) - self.assertTrue(df.index.is_all_dates) - with tm.assertRaises(KeyError): + assert df.index.is_all_dates + with pytest.raises(KeyError): df['2011'] - with tm.assertRaises(KeyError): + with pytest.raises(KeyError): df.loc['2011', 0] df = pd.DataFrame() - self.assertFalse(df.index.is_all_dates) - with tm.assertRaises(KeyError): + assert not df.index.is_all_dates + with pytest.raises(KeyError): df['2011'] - with tm.assertRaises(KeyError): + with pytest.raises(KeyError): df.loc['2011', 0] def test_mi_access(self): @@ -3765,43 +518,6 @@ def test_mi_access(self): result = df2['A']['B2'] tm.assert_frame_equal(result, expected) - def test_non_unique_loc_memory_error(self): - - # GH 4280 - # non_unique index with a large selection triggers a memory error - - columns = list('ABCDEFG') - - def gen_test(l, l2): - return pd.concat([DataFrame(randn(l, 
len(columns)), - index=lrange(l), columns=columns), - DataFrame(np.ones((l2, len(columns))), - index=[0] * l2, columns=columns)]) - - def gen_expected(df, mask): - l = len(mask) - return pd.concat([df.take([0], convert=False), - DataFrame(np.ones((l, len(columns))), - index=[0] * l, - columns=columns), - df.take(mask[1:], convert=False)]) - - df = gen_test(900, 100) - self.assertFalse(df.index.is_unique) - - mask = np.arange(100) - result = df.loc[mask] - expected = gen_expected(df, mask) - tm.assert_frame_equal(result, expected) - - df = gen_test(900000, 100000) - self.assertFalse(df.index.is_unique) - - mask = np.arange(100000) - result = df.loc[mask] - expected = gen_expected(df, mask) - tm.assert_frame_equal(result, expected) - def test_astype_assignment(self): # GH4312 (iloc) @@ -3854,1332 +570,113 @@ def test_astype_assignment_with_dups(self): index = df.index.copy() df['A'] = df['A'].astype(np.float64) - self.assert_index_equal(df.index, index) + tm.assert_index_equal(df.index, index) # TODO(wesm): unused variables # result = df.get_dtype_counts().sort_index() # expected = Series({'float64': 2, 'object': 1}).sort_index() - def test_dups_loc(self): - - # GH4726 - # dup indexing with iloc/loc - df = DataFrame([[1, 2, 'foo', 'bar', Timestamp('20130101')]], - columns=['a', 'a', 'a', 'a', 'a'], index=[1]) - expected = Series([1, 2, 'foo', 'bar', Timestamp('20130101')], - index=['a', 'a', 'a', 'a', 'a'], name=1) - - result = df.iloc[0] - tm.assert_series_equal(result, expected) - - result = df.loc[1] - tm.assert_series_equal(result, expected) - - def test_partial_setting(self): - - # GH2578, allow ix and friends to partially set - - # series - s_orig = Series([1, 2, 3]) - - s = s_orig.copy() - s[5] = 5 - expected = Series([1, 2, 3, 5], index=[0, 1, 2, 5]) - tm.assert_series_equal(s, expected) - - s = s_orig.copy() - s.loc[5] = 5 - expected = Series([1, 2, 3, 5], index=[0, 1, 2, 5]) - tm.assert_series_equal(s, expected) - - s = s_orig.copy() - s[5] = 5. - expected = Series([1, 2, 3, 5.], index=[0, 1, 2, 5]) - tm.assert_series_equal(s, expected) - - s = s_orig.copy() - s.loc[5] = 5. - expected = Series([1, 2, 3, 5.], index=[0, 1, 2, 5]) - tm.assert_series_equal(s, expected) - - # iloc/iat raise - s = s_orig.copy() - - def f(): - s.iloc[3] = 5. - - self.assertRaises(IndexError, f) - - def f(): - s.iat[3] = 5. - - self.assertRaises(IndexError, f) - - # ## frame ## - - df_orig = DataFrame( - np.arange(6).reshape(3, 2), columns=['A', 'B'], dtype='int64') - - # iloc/iat raise - df = df_orig.copy() - - def f(): - df.iloc[4, 2] = 5. - - self.assertRaises(IndexError, f) - - def f(): - df.iat[4, 2] = 5. 
- - self.assertRaises(IndexError, f) - - # row setting where it exists - expected = DataFrame(dict({'A': [0, 4, 4], 'B': [1, 5, 5]})) - df = df_orig.copy() - df.iloc[1] = df.iloc[2] - tm.assert_frame_equal(df, expected) - - expected = DataFrame(dict({'A': [0, 4, 4], 'B': [1, 5, 5]})) - df = df_orig.copy() - df.loc[1] = df.loc[2] - tm.assert_frame_equal(df, expected) - - # like 2578, partial setting with dtype preservation - expected = DataFrame(dict({'A': [0, 2, 4, 4], 'B': [1, 3, 5, 5]})) - df = df_orig.copy() - df.loc[3] = df.loc[2] - tm.assert_frame_equal(df, expected) - - # single dtype frame, overwrite - expected = DataFrame(dict({'A': [0, 2, 4], 'B': [0, 2, 4]})) - df = df_orig.copy() - df.ix[:, 'B'] = df.ix[:, 'A'] - tm.assert_frame_equal(df, expected) - - # mixed dtype frame, overwrite - expected = DataFrame(dict({'A': [0, 2, 4], 'B': Series([0, 2, 4])})) - df = df_orig.copy() - df['B'] = df['B'].astype(np.float64) - df.ix[:, 'B'] = df.ix[:, 'A'] - tm.assert_frame_equal(df, expected) - - # single dtype frame, partial setting - expected = df_orig.copy() - expected['C'] = df['A'] - df = df_orig.copy() - df.ix[:, 'C'] = df.ix[:, 'A'] - tm.assert_frame_equal(df, expected) - - # mixed frame, partial setting - expected = df_orig.copy() - expected['C'] = df['A'] - df = df_orig.copy() - df.ix[:, 'C'] = df.ix[:, 'A'] - tm.assert_frame_equal(df, expected) - - # ## panel ## - p_orig = Panel(np.arange(16).reshape(2, 4, 2), - items=['Item1', 'Item2'], - major_axis=pd.date_range('2001/1/12', periods=4), - minor_axis=['A', 'B'], dtype='float64') - - # panel setting via item - p_orig = Panel(np.arange(16).reshape(2, 4, 2), - items=['Item1', 'Item2'], - major_axis=pd.date_range('2001/1/12', periods=4), - minor_axis=['A', 'B'], dtype='float64') - expected = p_orig.copy() - expected['Item3'] = expected['Item1'] - p = p_orig.copy() - p.loc['Item3'] = p['Item1'] - tm.assert_panel_equal(p, expected) - - # panel with aligned series - expected = p_orig.copy() - expected = expected.transpose(2, 1, 0) - expected['C'] = DataFrame({'Item1': [30, 30, 30, 30], - 'Item2': [32, 32, 32, 32]}, - index=p_orig.major_axis) - expected = expected.transpose(2, 1, 0) - p = p_orig.copy() - p.loc[:, :, 'C'] = Series([30, 32], index=p_orig.items) - tm.assert_panel_equal(p, expected) - - # GH 8473 - dates = date_range('1/1/2000', periods=8) - df_orig = DataFrame(np.random.randn(8, 4), index=dates, - columns=['A', 'B', 'C', 'D']) - - expected = pd.concat([df_orig, DataFrame( - {'A': 7}, index=[dates[-1] + 1])]) - df = df_orig.copy() - df.loc[dates[-1] + 1, 'A'] = 7 - tm.assert_frame_equal(df, expected) - df = df_orig.copy() - df.at[dates[-1] + 1, 'A'] = 7 - tm.assert_frame_equal(df, expected) - - exp_other = DataFrame({0: 7}, index=[dates[-1] + 1]) - expected = pd.concat([df_orig, exp_other], axis=1) - - df = df_orig.copy() - df.loc[dates[-1] + 1, 0] = 7 - tm.assert_frame_equal(df, expected) - df = df_orig.copy() - df.at[dates[-1] + 1, 0] = 7 - tm.assert_frame_equal(df, expected) - - def test_partial_setting_mixed_dtype(self): - - # in a mixed dtype environment, try to preserve dtypes - # by appending - df = DataFrame([[True, 1], [False, 2]], columns=["female", "fitness"]) - - s = df.loc[1].copy() - s.name = 2 - expected = df.append(s) - - df.loc[2] = df.loc[1] - tm.assert_frame_equal(df, expected) - - # columns will align - df = DataFrame(columns=['A', 'B']) - df.loc[0] = Series(1, index=range(4)) - tm.assert_frame_equal(df, DataFrame(columns=['A', 'B'], index=[0])) - - # columns will align - df = DataFrame(columns=['A', 
'B']) - df.loc[0] = Series(1, index=['B']) - - exp = DataFrame([[np.nan, 1]], columns=['A', 'B'], - index=[0], dtype='float64') - tm.assert_frame_equal(df, exp) - - # list-like must conform - df = DataFrame(columns=['A', 'B']) - - def f(): - df.loc[0] = [1, 2, 3] - - self.assertRaises(ValueError, f) - - # these are coerced to float unavoidably (as its a list-like to begin) - df = DataFrame(columns=['A', 'B']) - df.loc[3] = [6, 7] - - exp = DataFrame([[6, 7]], index=[3], columns=['A', 'B'], - dtype='float64') - tm.assert_frame_equal(df, exp) - - def test_partial_setting_with_datetimelike_dtype(self): - - # GH9478 - # a datetimeindex alignment issue with partial setting - df = pd.DataFrame(np.arange(6.).reshape(3, 2), columns=list('AB'), - index=pd.date_range('1/1/2000', periods=3, - freq='1H')) - expected = df.copy() - expected['C'] = [expected.index[0]] + [pd.NaT, pd.NaT] - - mask = df.A < 1 - df.loc[mask, 'C'] = df.loc[mask].index - tm.assert_frame_equal(df, expected) - - def test_loc_setitem_datetime(self): - - # GH 9516 - dt1 = Timestamp('20130101 09:00:00') - dt2 = Timestamp('20130101 10:00:00') - - for conv in [lambda x: x, lambda x: x.to_datetime64(), - lambda x: x.to_pydatetime(), lambda x: np.datetime64(x)]: - - df = pd.DataFrame() - df.loc[conv(dt1), 'one'] = 100 - df.loc[conv(dt2), 'one'] = 200 - - expected = DataFrame({'one': [100.0, 200.0]}, index=[dt1, dt2]) - tm.assert_frame_equal(df, expected) - - def test_series_partial_set(self): - # partial set with new index - # Regression from GH4825 - ser = Series([0.1, 0.2], index=[1, 2]) - - # loc - expected = Series([np.nan, 0.2, np.nan], index=[3, 2, 3]) - result = ser.loc[[3, 2, 3]] - tm.assert_series_equal(result, expected, check_index_type=True) - - expected = Series([np.nan, 0.2, np.nan, np.nan], index=[3, 2, 3, 'x']) - result = ser.loc[[3, 2, 3, 'x']] - tm.assert_series_equal(result, expected, check_index_type=True) - - expected = Series([0.2, 0.2, 0.1], index=[2, 2, 1]) - result = ser.loc[[2, 2, 1]] - tm.assert_series_equal(result, expected, check_index_type=True) - - expected = Series([0.2, 0.2, np.nan, 0.1], index=[2, 2, 'x', 1]) - result = ser.loc[[2, 2, 'x', 1]] - tm.assert_series_equal(result, expected, check_index_type=True) - - # raises as nothing in in the index - self.assertRaises(KeyError, lambda: ser.loc[[3, 3, 3]]) - - expected = Series([0.2, 0.2, np.nan], index=[2, 2, 3]) - result = ser.loc[[2, 2, 3]] - tm.assert_series_equal(result, expected, check_index_type=True) - - expected = Series([0.3, np.nan, np.nan], index=[3, 4, 4]) - result = Series([0.1, 0.2, 0.3], index=[1, 2, 3]).loc[[3, 4, 4]] - tm.assert_series_equal(result, expected, check_index_type=True) - - expected = Series([np.nan, 0.3, 0.3], index=[5, 3, 3]) - result = Series([0.1, 0.2, 0.3, 0.4], - index=[1, 2, 3, 4]).loc[[5, 3, 3]] - tm.assert_series_equal(result, expected, check_index_type=True) - - expected = Series([np.nan, 0.4, 0.4], index=[5, 4, 4]) - result = Series([0.1, 0.2, 0.3, 0.4], - index=[1, 2, 3, 4]).loc[[5, 4, 4]] - tm.assert_series_equal(result, expected, check_index_type=True) - - expected = Series([0.4, np.nan, np.nan], index=[7, 2, 2]) - result = Series([0.1, 0.2, 0.3, 0.4], - index=[4, 5, 6, 7]).loc[[7, 2, 2]] - tm.assert_series_equal(result, expected, check_index_type=True) - - expected = Series([0.4, np.nan, np.nan], index=[4, 5, 5]) - result = Series([0.1, 0.2, 0.3, 0.4], - index=[1, 2, 3, 4]).loc[[4, 5, 5]] - tm.assert_series_equal(result, expected, check_index_type=True) - - # iloc - expected = Series([0.2, 0.2, 0.1, 0.1], 
index=[2, 2, 1, 1]) - result = ser.iloc[[1, 1, 0, 0]] - tm.assert_series_equal(result, expected, check_index_type=True) - - def test_series_partial_set_with_name(self): - # GH 11497 - - idx = Index([1, 2], dtype='int64', name='idx') - ser = Series([0.1, 0.2], index=idx, name='s') - - # loc - exp_idx = Index([3, 2, 3], dtype='int64', name='idx') - expected = Series([np.nan, 0.2, np.nan], index=exp_idx, name='s') - result = ser.loc[[3, 2, 3]] - tm.assert_series_equal(result, expected, check_index_type=True) - - exp_idx = Index([3, 2, 3, 'x'], dtype='object', name='idx') - expected = Series([np.nan, 0.2, np.nan, np.nan], index=exp_idx, - name='s') - result = ser.loc[[3, 2, 3, 'x']] - tm.assert_series_equal(result, expected, check_index_type=True) - - exp_idx = Index([2, 2, 1], dtype='int64', name='idx') - expected = Series([0.2, 0.2, 0.1], index=exp_idx, name='s') - result = ser.loc[[2, 2, 1]] - tm.assert_series_equal(result, expected, check_index_type=True) - - exp_idx = Index([2, 2, 'x', 1], dtype='object', name='idx') - expected = Series([0.2, 0.2, np.nan, 0.1], index=exp_idx, name='s') - result = ser.loc[[2, 2, 'x', 1]] - tm.assert_series_equal(result, expected, check_index_type=True) - - # raises as nothing in in the index - self.assertRaises(KeyError, lambda: ser.loc[[3, 3, 3]]) - - exp_idx = Index([2, 2, 3], dtype='int64', name='idx') - expected = Series([0.2, 0.2, np.nan], index=exp_idx, name='s') - result = ser.loc[[2, 2, 3]] - tm.assert_series_equal(result, expected, check_index_type=True) - - exp_idx = Index([3, 4, 4], dtype='int64', name='idx') - expected = Series([0.3, np.nan, np.nan], index=exp_idx, name='s') - idx = Index([1, 2, 3], dtype='int64', name='idx') - result = Series([0.1, 0.2, 0.3], index=idx, name='s').loc[[3, 4, 4]] - tm.assert_series_equal(result, expected, check_index_type=True) - - exp_idx = Index([5, 3, 3], dtype='int64', name='idx') - expected = Series([np.nan, 0.3, 0.3], index=exp_idx, name='s') - idx = Index([1, 2, 3, 4], dtype='int64', name='idx') - result = Series([0.1, 0.2, 0.3, 0.4], index=idx, - name='s').loc[[5, 3, 3]] - tm.assert_series_equal(result, expected, check_index_type=True) - - exp_idx = Index([5, 4, 4], dtype='int64', name='idx') - expected = Series([np.nan, 0.4, 0.4], index=exp_idx, name='s') - idx = Index([1, 2, 3, 4], dtype='int64', name='idx') - result = Series([0.1, 0.2, 0.3, 0.4], index=idx, - name='s').loc[[5, 4, 4]] - tm.assert_series_equal(result, expected, check_index_type=True) - - exp_idx = Index([7, 2, 2], dtype='int64', name='idx') - expected = Series([0.4, np.nan, np.nan], index=exp_idx, name='s') - idx = Index([4, 5, 6, 7], dtype='int64', name='idx') - result = Series([0.1, 0.2, 0.3, 0.4], index=idx, - name='s').loc[[7, 2, 2]] - tm.assert_series_equal(result, expected, check_index_type=True) - - exp_idx = Index([4, 5, 5], dtype='int64', name='idx') - expected = Series([0.4, np.nan, np.nan], index=exp_idx, name='s') - idx = Index([1, 2, 3, 4], dtype='int64', name='idx') - result = Series([0.1, 0.2, 0.3, 0.4], index=idx, - name='s').loc[[4, 5, 5]] - tm.assert_series_equal(result, expected, check_index_type=True) - - # iloc - exp_idx = Index([2, 2, 1, 1], dtype='int64', name='idx') - expected = Series([0.2, 0.2, 0.1, 0.1], index=exp_idx, name='s') - result = ser.iloc[[1, 1, 0, 0]] - tm.assert_series_equal(result, expected, check_index_type=True) - - def test_series_partial_set_datetime(self): - # GH 11497 - - idx = date_range('2011-01-01', '2011-01-02', freq='D', name='idx') - ser = Series([0.1, 0.2], index=idx, name='s') - - 
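# [Editor's note] A hedged sketch (not part of the patch) of the partial-set
# semantics the tests above pin down, assuming the era of pandas this patch
# targets: .loc with a list indexer reindexes, so labels missing from the
# index come back as NaN, and KeyError is raised only when *none* of the
# requested labels exist. Later releases deprecate and then remove the
# NaN-filling behavior; .reindex is the explicit spelling.
import pandas as pd

ser = pd.Series([0.1, 0.2], index=[1, 2])
ser.loc[[3, 2, 3]]       # -> [NaN, 0.2, NaN] with index [3, 2, 3]
ser.reindex([3, 2, 3])   # the same result, stated explicitly
# ser.loc[[3, 3, 3]]     # KeyError: none of the requested labels is present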
result = ser.loc[[Timestamp('2011-01-01'), Timestamp('2011-01-02')]] - exp = Series([0.1, 0.2], index=idx, name='s') - tm.assert_series_equal(result, exp, check_index_type=True) - - keys = [Timestamp('2011-01-02'), Timestamp('2011-01-02'), - Timestamp('2011-01-01')] - exp = Series([0.2, 0.2, 0.1], index=pd.DatetimeIndex(keys, name='idx'), - name='s') - tm.assert_series_equal(ser.loc[keys], exp, check_index_type=True) - - keys = [Timestamp('2011-01-03'), Timestamp('2011-01-02'), - Timestamp('2011-01-03')] - exp = Series([np.nan, 0.2, np.nan], - index=pd.DatetimeIndex(keys, name='idx'), name='s') - tm.assert_series_equal(ser.loc[keys], exp, check_index_type=True) - - def test_series_partial_set_period(self): - # GH 11497 - - idx = pd.period_range('2011-01-01', '2011-01-02', freq='D', name='idx') - ser = Series([0.1, 0.2], index=idx, name='s') - - result = ser.loc[[pd.Period('2011-01-01', freq='D'), - pd.Period('2011-01-02', freq='D')]] - exp = Series([0.1, 0.2], index=idx, name='s') - tm.assert_series_equal(result, exp, check_index_type=True) - - keys = [pd.Period('2011-01-02', freq='D'), - pd.Period('2011-01-02', freq='D'), - pd.Period('2011-01-01', freq='D')] - exp = Series([0.2, 0.2, 0.1], index=pd.PeriodIndex(keys, name='idx'), - name='s') - tm.assert_series_equal(ser.loc[keys], exp, check_index_type=True) - - keys = [pd.Period('2011-01-03', freq='D'), - pd.Period('2011-01-02', freq='D'), - pd.Period('2011-01-03', freq='D')] - exp = Series([np.nan, 0.2, np.nan], - index=pd.PeriodIndex(keys, name='idx'), name='s') - result = ser.loc[keys] - tm.assert_series_equal(result, exp) - - def test_partial_set_invalid(self): - - # GH 4940 - # allow only setting of 'valid' values - - orig = tm.makeTimeDataFrame() - df = orig.copy() - - # don't allow not string inserts - def f(): - df.loc[100.0, :] = df.ix[0] - - self.assertRaises(TypeError, f) - - def f(): - df.loc[100, :] = df.ix[0] - - self.assertRaises(TypeError, f) - - def f(): - df.ix[100.0, :] = df.ix[0] - - self.assertRaises(TypeError, f) - - def f(): - df.ix[100, :] = df.ix[0] - - self.assertRaises(ValueError, f) - - # allow object conversion here - df = orig.copy() - df.loc['a', :] = df.ix[0] - exp = orig.append(pd.Series(df.ix[0], name='a')) - tm.assert_frame_equal(df, exp) - tm.assert_index_equal(df.index, - pd.Index(orig.index.tolist() + ['a'])) - self.assertEqual(df.index.dtype, 'object') - - def test_partial_set_empty_series(self): - - # GH5226 - - # partially set with an empty object series - s = Series() - s.loc[1] = 1 - tm.assert_series_equal(s, Series([1], index=[1])) - s.loc[3] = 3 - tm.assert_series_equal(s, Series([1, 3], index=[1, 3])) - - s = Series() - s.loc[1] = 1. - tm.assert_series_equal(s, Series([1.], index=[1])) - s.loc[3] = 3. 
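# [Editor's note] A short sketch (not part of the patch) contrasting the two
# setting modes that test_partial_setting above and the empty-object test here
# distinguish: label-based setters (.loc, plain []) enlarge the object when
# the label is new, while the purely positional .iloc/.iat never enlarge and
# raise IndexError out of bounds.
import pandas as pd

s = pd.Series([1, 2, 3])
s.loc[5] = 5         # enlarges: index is now [0, 1, 2, 5]
s[7] = 7             # plain [] enlarges the same way
try:
    s.iloc[10] = 10  # positional setters do not enlarge
except IndexError:
    pass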
- tm.assert_series_equal(s, Series([1., 3.], index=[1, 3])) - - s = Series() - s.loc['foo'] = 1 - tm.assert_series_equal(s, Series([1], index=['foo'])) - s.loc['bar'] = 3 - tm.assert_series_equal(s, Series([1, 3], index=['foo', 'bar'])) - s.loc[3] = 4 - tm.assert_series_equal(s, Series([1, 3, 4], index=['foo', 'bar', 3])) - - def test_partial_set_empty_frame(self): - - # partially set with an empty object - # frame - df = DataFrame() - - def f(): - df.loc[1] = 1 - - self.assertRaises(ValueError, f) - - def f(): - df.loc[1] = Series([1], index=['foo']) - - self.assertRaises(ValueError, f) - - def f(): - df.loc[:, 1] = 1 - - self.assertRaises(ValueError, f) - - # these work as they don't really change - # anything but the index - # GH5632 - expected = DataFrame(columns=['foo'], index=pd.Index( - [], dtype='int64')) - - def f(): - df = DataFrame() - df['foo'] = Series([], dtype='object') - return df - - tm.assert_frame_equal(f(), expected) - - def f(): - df = DataFrame() - df['foo'] = Series(df.index) - return df - - tm.assert_frame_equal(f(), expected) + @pytest.mark.parametrize("index,val", [ + (pd.Index([0, 1, 2]), 2), + (pd.Index([0, 1, '2']), '2'), + (pd.Index([0, 1, 2, np.inf, 4]), 4), + (pd.Index([0, 1, 2, np.nan, 4]), 4), + (pd.Index([0, 1, 2, np.inf]), np.inf), + (pd.Index([0, 1, 2, np.nan]), np.nan), + ]) + def test_index_contains(self, index, val): + assert val in index + + @pytest.mark.parametrize("index,val", [ + (pd.Index([0, 1, 2]), '2'), + (pd.Index([0, 1, '2']), 2), + (pd.Index([0, 1, 2, np.inf]), 4), + (pd.Index([0, 1, 2, np.nan]), 4), + (pd.Index([0, 1, 2, np.inf]), np.nan), + (pd.Index([0, 1, 2, np.nan]), np.inf), + # Checking if np.inf in Int64Index should not cause an OverflowError + # Related to GH 16957 + (pd.Int64Index([0, 1, 2]), np.inf), + (pd.Int64Index([0, 1, 2]), np.nan), + (pd.UInt64Index([0, 1, 2]), np.inf), + (pd.UInt64Index([0, 1, 2]), np.nan), + ]) + def test_index_not_contains(self, index, val): + assert val not in index - def f(): - df = DataFrame() - df['foo'] = df.index - return df - - tm.assert_frame_equal(f(), expected) - - expected = DataFrame(columns=['foo'], - index=pd.Index([], dtype='int64')) - expected['foo'] = expected['foo'].astype('float64') - - def f(): - df = DataFrame() - df['foo'] = [] - return df - - tm.assert_frame_equal(f(), expected) - - def f(): - df = DataFrame() - df['foo'] = Series(range(len(df))) - return df - - tm.assert_frame_equal(f(), expected) - - def f(): - df = DataFrame() - tm.assert_index_equal(df.index, pd.Index([], dtype='object')) - df['foo'] = range(len(df)) - return df - - expected = DataFrame(columns=['foo'], - index=pd.Index([], dtype='int64')) - expected['foo'] = expected['foo'].astype('float64') - tm.assert_frame_equal(f(), expected) - - df = DataFrame() - tm.assert_index_equal(df.columns, pd.Index([], dtype=object)) - df2 = DataFrame() - df2[1] = Series([1], index=['foo']) - df.loc[:, 1] = Series([1], index=['foo']) - tm.assert_frame_equal(df, DataFrame([[1]], index=['foo'], columns=[1])) - tm.assert_frame_equal(df, df2) - - # no index to start - expected = DataFrame({0: Series(1, index=range(4))}, - columns=['A', 'B', 0]) - - df = DataFrame(columns=['A', 'B']) - df[0] = Series(1, index=range(4)) - df.dtypes - str(df) - tm.assert_frame_equal(df, expected) - - df = DataFrame(columns=['A', 'B']) - df.loc[:, 0] = Series(1, index=range(4)) - df.dtypes - str(df) - tm.assert_frame_equal(df, expected) - - def test_partial_set_empty_frame_row(self): - # GH5720, GH5744 - # don't create rows when empty - expected = 
DataFrame(columns=['A', 'B', 'New'], - index=pd.Index([], dtype='int64')) - expected['A'] = expected['A'].astype('int64') - expected['B'] = expected['B'].astype('float64') - expected['New'] = expected['New'].astype('float64') - - df = DataFrame({"A": [1, 2, 3], "B": [1.2, 4.2, 5.2]}) - y = df[df.A > 5] - y['New'] = np.nan - tm.assert_frame_equal(y, expected) - # tm.assert_frame_equal(y,expected) - - expected = DataFrame(columns=['a', 'b', 'c c', 'd']) - expected['d'] = expected['d'].astype('int64') - df = DataFrame(columns=['a', 'b', 'c c']) - df['d'] = 3 - tm.assert_frame_equal(df, expected) - tm.assert_series_equal(df['c c'], Series(name='c c', dtype=object)) - - # reindex columns is ok - df = DataFrame({"A": [1, 2, 3], "B": [1.2, 4.2, 5.2]}) - y = df[df.A > 5] - result = y.reindex(columns=['A', 'B', 'C']) - expected = DataFrame(columns=['A', 'B', 'C'], - index=pd.Index([], dtype='int64')) - expected['A'] = expected['A'].astype('int64') - expected['B'] = expected['B'].astype('float64') - expected['C'] = expected['C'].astype('float64') - tm.assert_frame_equal(result, expected) - - def test_partial_set_empty_frame_set_series(self): - # GH 5756 - # setting with empty Series - df = DataFrame(Series()) - tm.assert_frame_equal(df, DataFrame({0: Series()})) - - df = DataFrame(Series(name='foo')) - tm.assert_frame_equal(df, DataFrame({'foo': Series()})) - - def test_partial_set_empty_frame_empty_copy_assignment(self): - # GH 5932 - # copy on empty with assignment fails - df = DataFrame(index=[0]) - df = df.copy() - df['a'] = 0 - expected = DataFrame(0, index=[0], columns=['a']) - tm.assert_frame_equal(df, expected) - - def test_partial_set_empty_frame_empty_consistencies(self): - # GH 6171 - # consistency on empty frames - df = DataFrame(columns=['x', 'y']) - df['x'] = [1, 2] - expected = DataFrame(dict(x=[1, 2], y=[np.nan, np.nan])) - tm.assert_frame_equal(df, expected, check_dtype=False) - - df = DataFrame(columns=['x', 'y']) - df['x'] = ['1', '2'] - expected = DataFrame( - dict(x=['1', '2'], y=[np.nan, np.nan]), dtype=object) - tm.assert_frame_equal(df, expected) - - df = DataFrame(columns=['x', 'y']) - df.loc[0, 'x'] = 1 - expected = DataFrame(dict(x=[1], y=[np.nan])) - tm.assert_frame_equal(df, expected, check_dtype=False) - - def test_cache_updating(self): - # GH 4939, make sure to update the cache on setitem - - df = tm.makeDataFrame() - df['A'] # cache series - df.ix["Hello Friend"] = df.ix[0] - self.assertIn("Hello Friend", df['A'].index) - self.assertIn("Hello Friend", df['B'].index) - - panel = tm.makePanel() - panel.ix[0] # get first item into cache - panel.ix[:, :, 'A+1'] = panel.ix[:, :, 'A'] + 1 - self.assertIn("A+1", panel.ix[0].columns) - self.assertIn("A+1", panel.ix[1].columns) - - # 5216 - # make sure that we don't try to set a dead cache - a = np.random.rand(10, 3) - df = DataFrame(a, columns=['x', 'y', 'z']) - tuples = [(i, j) for i in range(5) for j in range(2)] - index = MultiIndex.from_tuples(tuples) - df.index = index - - # setting via chained assignment - # but actually works, since everything is a view - df.loc[0]['z'].iloc[0] = 1. - result = df.loc[(0, 0), 'z'] - self.assertEqual(result, 1) - - # correct setting - df.loc[(0, 0), 'z'] = 2 - result = df.loc[(0, 0), 'z'] - self.assertEqual(result, 2) - - # 10264 - df = DataFrame(np.zeros((5, 5), dtype='int64'), columns=[ - 'a', 'b', 'c', 'd', 'e'], index=range(5)) - df['f'] = 0 - df.f.values[3] = 1 - - # TODO(wesm): unused? 
- # y = df.iloc[np.arange(2, len(df))] - - df.f.values[3] = 2 - expected = DataFrame(np.zeros((5, 6), dtype='int64'), columns=[ - 'a', 'b', 'c', 'd', 'e', 'f'], index=range(5)) - expected.at[3, 'f'] = 2 - tm.assert_frame_equal(df, expected) - expected = Series([0, 0, 0, 2, 0], name='f') - tm.assert_series_equal(df.f, expected) - - def test_slice_consolidate_invalidate_item_cache(self): - - # this is chained assignment, but will 'work' - with option_context('chained_assignment', None): - - # #3970 - df = DataFrame({"aa": lrange(5), "bb": [2.2] * 5}) - - # Creates a second float block - df["cc"] = 0.0 - - # caches a reference to the 'bb' series - df["bb"] - - # repr machinery triggers consolidation - repr(df) - - # Assignment to wrong series - df['bb'].iloc[0] = 0.17 - df._clear_item_cache() - self.assertAlmostEqual(df['bb'][0], 0.17) - - def test_setitem_cache_updating(self): - # GH 5424 - cont = ['one', 'two', 'three', 'four', 'five', 'six', 'seven'] - - for do_ref in [False, False]: - df = DataFrame({'a': cont, - "b": cont[3:] + cont[:3], - 'c': np.arange(7)}) - - # ref the cache - if do_ref: - df.ix[0, "c"] - - # set it - df.ix[7, 'c'] = 1 - - self.assertEqual(df.ix[0, 'c'], 0.0) - self.assertEqual(df.ix[7, 'c'], 1.0) - - # GH 7084 - # not updating cache on series setting with slices - expected = DataFrame({'A': [600, 600, 600]}, - index=date_range('5/7/2014', '5/9/2014')) - out = DataFrame({'A': [0, 0, 0]}, - index=date_range('5/7/2014', '5/9/2014')) - df = DataFrame({'C': ['A', 'A', 'A'], 'D': [100, 200, 300]}) - - # loop through df to update out - six = Timestamp('5/7/2014') - eix = Timestamp('5/9/2014') - for ix, row in df.iterrows(): - out.loc[six:eix, row['C']] = out.loc[six:eix, row['C']] + row['D'] - - tm.assert_frame_equal(out, expected) - tm.assert_series_equal(out['A'], expected['A']) - - # try via a chain indexing - # this actually works - out = DataFrame({'A': [0, 0, 0]}, - index=date_range('5/7/2014', '5/9/2014')) - for ix, row in df.iterrows(): - v = out[row['C']][six:eix] + row['D'] - out[row['C']][six:eix] = v - - tm.assert_frame_equal(out, expected) - tm.assert_series_equal(out['A'], expected['A']) - - out = DataFrame({'A': [0, 0, 0]}, - index=date_range('5/7/2014', '5/9/2014')) - for ix, row in df.iterrows(): - out.loc[six:eix, row['C']] += row['D'] - - tm.assert_frame_equal(out, expected) - tm.assert_series_equal(out['A'], expected['A']) - - def test_setitem_chained_setfault(self): - - # GH6026 - # setfaults under numpy 1.7.1 (ok on 1.8) - data = ['right', 'left', 'left', 'left', 'right', 'left', 'timeout'] - mdata = ['right', 'left', 'left', 'left', 'right', 'left', 'none'] - - df = DataFrame({'response': np.array(data)}) - mask = df.response == 'timeout' - df.response[mask] = 'none' - tm.assert_frame_equal(df, DataFrame({'response': mdata})) - - recarray = np.rec.fromarrays([data], names=['response']) - df = DataFrame(recarray) - mask = df.response == 'timeout' - df.response[mask] = 'none' - tm.assert_frame_equal(df, DataFrame({'response': mdata})) - - df = DataFrame({'response': data, 'response1': data}) - mask = df.response == 'timeout' - df.response[mask] = 'none' - tm.assert_frame_equal(df, DataFrame({'response': mdata, - 'response1': data})) - - # GH 6056 - expected = DataFrame(dict(A=[np.nan, 'bar', 'bah', 'foo', 'bar'])) - df = DataFrame(dict(A=np.array(['foo', 'bar', 'bah', 'foo', 'bar']))) - df['A'].iloc[0] = np.nan - result = df.head() - tm.assert_frame_equal(result, expected) - - df = DataFrame(dict(A=np.array(['foo', 'bar', 'bah', 'foo', 'bar']))) - 
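# [Editor's note] A minimal sketch (not part of the patch) of the behavior the
# chained-assignment tests here guard: writing through two successive indexing
# calls may hit a temporary copy, so pandas raises SettingWithCopyError (or
# warns, depending on the 'chained_assignment' option), whereas a single .loc
# call always writes to the frame itself.
import pandas as pd

df = pd.DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]})
# df['A'][0] = 111    # chained: two indexing hops, the write may not stick
df.loc[0, 'A'] = 111  # one indexing call, guaranteed to modify df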
df.A.iloc[0] = np.nan - result = df.head() - tm.assert_frame_equal(result, expected) - - def test_detect_chained_assignment(self): - - pd.set_option('chained_assignment', 'raise') - - # work with the chain - expected = DataFrame([[-5, 1], [-6, 3]], columns=list('AB')) - df = DataFrame(np.arange(4).reshape(2, 2), - columns=list('AB'), dtype='int64') - self.assertIsNone(df.is_copy) - df['A'][0] = -5 - df['A'][1] = -6 - tm.assert_frame_equal(df, expected) - - # test with the chaining - df = DataFrame({'A': Series(range(2), dtype='int64'), - 'B': np.array(np.arange(2, 4), dtype=np.float64)}) - self.assertIsNone(df.is_copy) - - def f(): - df['A'][0] = -5 - - self.assertRaises(com.SettingWithCopyError, f) - - def f(): - df['A'][1] = np.nan - - self.assertRaises(com.SettingWithCopyError, f) - self.assertIsNone(df['A'].is_copy) - - # using a copy (the chain), fails - df = DataFrame({'A': Series(range(2), dtype='int64'), - 'B': np.array(np.arange(2, 4), dtype=np.float64)}) - - def f(): - df.loc[0]['A'] = -5 - - self.assertRaises(com.SettingWithCopyError, f) - - # doc example - df = DataFrame({'a': ['one', 'one', 'two', 'three', - 'two', 'one', 'six'], - 'c': Series(range(7), dtype='int64')}) - self.assertIsNone(df.is_copy) - expected = DataFrame({'a': ['one', 'one', 'two', 'three', - 'two', 'one', 'six'], - 'c': [42, 42, 2, 3, 4, 42, 6]}) - - def f(): - indexer = df.a.str.startswith('o') - df[indexer]['c'] = 42 - - self.assertRaises(com.SettingWithCopyError, f) - - expected = DataFrame({'A': [111, 'bbb', 'ccc'], 'B': [1, 2, 3]}) - df = DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]}) - - def f(): - df['A'][0] = 111 - - self.assertRaises(com.SettingWithCopyError, f) - - def f(): - df.loc[0]['A'] = 111 - - self.assertRaises(com.SettingWithCopyError, f) - - df.loc[0, 'A'] = 111 - tm.assert_frame_equal(df, expected) - - # make sure that is_copy is picked up reconstruction - # GH5475 - df = DataFrame({"A": [1, 2]}) - self.assertIsNone(df.is_copy) - with tm.ensure_clean('__tmp__pickle') as path: - df.to_pickle(path) - df2 = pd.read_pickle(path) - df2["B"] = df2["A"] - df2["B"] = df2["A"] - - # a suprious raise as we are setting the entire column here - # GH5597 - from string import ascii_letters as letters - - def random_text(nobs=100): - df = [] - for i in range(nobs): - idx = np.random.randint(len(letters), size=2) - idx.sort() - df.append([letters[idx[0]:idx[1]]]) - - return DataFrame(df, columns=['letters']) - - df = random_text(100000) - - # always a copy - x = df.iloc[[0, 1, 2]] - self.assertIsNotNone(x.is_copy) - x = df.iloc[[0, 1, 2, 4]] - self.assertIsNotNone(x.is_copy) - - # explicity copy - indexer = df.letters.apply(lambda x: len(x) > 10) - df = df.ix[indexer].copy() - self.assertIsNone(df.is_copy) - df['letters'] = df['letters'].apply(str.lower) - - # implicity take - df = random_text(100000) - indexer = df.letters.apply(lambda x: len(x) > 10) - df = df.ix[indexer] - self.assertIsNotNone(df.is_copy) - df['letters'] = df['letters'].apply(str.lower) - - # implicity take 2 - df = random_text(100000) - indexer = df.letters.apply(lambda x: len(x) > 10) - df = df.ix[indexer] - self.assertIsNotNone(df.is_copy) - df.loc[:, 'letters'] = df['letters'].apply(str.lower) - - # should be ok even though it's a copy! 
- self.assertIsNone(df.is_copy) - df['letters'] = df['letters'].apply(str.lower) - self.assertIsNone(df.is_copy) - - df = random_text(100000) - indexer = df.letters.apply(lambda x: len(x) > 10) - df.ix[indexer, 'letters'] = df.ix[indexer, 'letters'].apply(str.lower) - - # an identical take, so no copy - df = DataFrame({'a': [1]}).dropna() - self.assertIsNone(df.is_copy) - df['a'] += 1 - - # inplace ops - # original from: - # http://stackoverflow.com/questions/20508968/series-fillna-in-a-multiindex-dataframe-does-not-fill-is-this-a-bug - a = [12, 23] - b = [123, None] - c = [1234, 2345] - d = [12345, 23456] - tuples = [('eyes', 'left'), ('eyes', 'right'), ('ears', 'left'), - ('ears', 'right')] - events = {('eyes', 'left'): a, - ('eyes', 'right'): b, - ('ears', 'left'): c, - ('ears', 'right'): d} - multiind = MultiIndex.from_tuples(tuples, names=['part', 'side']) - zed = DataFrame(events, index=['a', 'b'], columns=multiind) - - def f(): - zed['eyes']['right'].fillna(value=555, inplace=True) - - self.assertRaises(com.SettingWithCopyError, f) - - df = DataFrame(np.random.randn(10, 4)) - s = df.iloc[:, 0].sort_values() - tm.assert_series_equal(s, df.iloc[:, 0].sort_values()) - tm.assert_series_equal(s, df[0].sort_values()) - - # false positives GH6025 - df = DataFrame({'column1': ['a', 'a', 'a'], 'column2': [4, 8, 9]}) - str(df) - df['column1'] = df['column1'] + 'b' - str(df) - df = df[df['column2'] != 8] - str(df) - df['column1'] = df['column1'] + 'c' - str(df) - - # from SO: - # http://stackoverflow.com/questions/24054495/potential-bug-setting-value-for-undefined-column-using-iloc - df = DataFrame(np.arange(0, 9), columns=['count']) - df['group'] = 'b' - - def f(): - df.iloc[0:5]['group'] = 'a' - - self.assertRaises(com.SettingWithCopyError, f) - - # mixed type setting - # same dtype & changing dtype - df = DataFrame(dict(A=date_range('20130101', periods=5), - B=np.random.randn(5), - C=np.arange(5, dtype='int64'), - D=list('abcde'))) - - def f(): - df.ix[2]['D'] = 'foo' - - self.assertRaises(com.SettingWithCopyError, f) - - def f(): - df.ix[2]['C'] = 'foo' - - self.assertRaises(com.SettingWithCopyError, f) - - def f(): - df['C'][2] = 'foo' + def test_index_type_coercion(self): - self.assertRaises(com.SettingWithCopyError, f) + with catch_warnings(record=True): - def test_setting_with_copy_bug(self): + # GH 11836 + # if we have an index type and set it with something that looks + # to numpy like the same, but is actually, not + # (e.g. 
setting with a float or string '0') + # then we need to coerce to object - # operating on a copy - df = pd.DataFrame({'a': list(range(4)), - 'b': list('ab..'), - 'c': ['a', 'b', np.nan, 'd']}) - mask = pd.isnull(df.c) + # integer indexes + for s in [Series(range(5)), + Series(range(5), index=range(1, 6))]: - def f(): - df[['c']][mask] = df[['b']][mask] - - self.assertRaises(com.SettingWithCopyError, f) - - # invalid warning as we are returning a new object - # GH 8730 - df1 = DataFrame({'x': Series(['a', 'b', 'c']), - 'y': Series(['d', 'e', 'f'])}) - df2 = df1[['x']] - - # this should not raise - df2['y'] = ['g', 'h', 'i'] - - def test_detect_chained_assignment_warnings(self): - - # warnings - with option_context('chained_assignment', 'warn'): - df = DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]}) - with tm.assert_produces_warning( - expected_warning=com.SettingWithCopyWarning): - df.loc[0]['A'] = 111 - - def test_float64index_slicing_bug(self): - # GH 5557, related to slicing a float index - ser = {256: 2321.0, - 1: 78.0, - 2: 2716.0, - 3: 0.0, - 4: 369.0, - 5: 0.0, - 6: 269.0, - 7: 0.0, - 8: 0.0, - 9: 0.0, - 10: 3536.0, - 11: 0.0, - 12: 24.0, - 13: 0.0, - 14: 931.0, - 15: 0.0, - 16: 101.0, - 17: 78.0, - 18: 9643.0, - 19: 0.0, - 20: 0.0, - 21: 0.0, - 22: 63761.0, - 23: 0.0, - 24: 446.0, - 25: 0.0, - 26: 34773.0, - 27: 0.0, - 28: 729.0, - 29: 78.0, - 30: 0.0, - 31: 0.0, - 32: 3374.0, - 33: 0.0, - 34: 1391.0, - 35: 0.0, - 36: 361.0, - 37: 0.0, - 38: 61808.0, - 39: 0.0, - 40: 0.0, - 41: 0.0, - 42: 6677.0, - 43: 0.0, - 44: 802.0, - 45: 0.0, - 46: 2691.0, - 47: 0.0, - 48: 3582.0, - 49: 0.0, - 50: 734.0, - 51: 0.0, - 52: 627.0, - 53: 70.0, - 54: 2584.0, - 55: 0.0, - 56: 324.0, - 57: 0.0, - 58: 605.0, - 59: 0.0, - 60: 0.0, - 61: 0.0, - 62: 3989.0, - 63: 10.0, - 64: 42.0, - 65: 0.0, - 66: 904.0, - 67: 0.0, - 68: 88.0, - 69: 70.0, - 70: 8172.0, - 71: 0.0, - 72: 0.0, - 73: 0.0, - 74: 64902.0, - 75: 0.0, - 76: 347.0, - 77: 0.0, - 78: 36605.0, - 79: 0.0, - 80: 379.0, - 81: 70.0, - 82: 0.0, - 83: 0.0, - 84: 3001.0, - 85: 0.0, - 86: 1630.0, - 87: 7.0, - 88: 364.0, - 89: 0.0, - 90: 67404.0, - 91: 9.0, - 92: 0.0, - 93: 0.0, - 94: 7685.0, - 95: 0.0, - 96: 1017.0, - 97: 0.0, - 98: 2831.0, - 99: 0.0, - 100: 2963.0, - 101: 0.0, - 102: 854.0, - 103: 0.0, - 104: 0.0, - 105: 0.0, - 106: 0.0, - 107: 0.0, - 108: 0.0, - 109: 0.0, - 110: 0.0, - 111: 0.0, - 112: 0.0, - 113: 0.0, - 114: 0.0, - 115: 0.0, - 116: 0.0, - 117: 0.0, - 118: 0.0, - 119: 0.0, - 120: 0.0, - 121: 0.0, - 122: 0.0, - 123: 0.0, - 124: 0.0, - 125: 0.0, - 126: 67744.0, - 127: 22.0, - 128: 264.0, - 129: 0.0, - 260: 197.0, - 268: 0.0, - 265: 0.0, - 269: 0.0, - 261: 0.0, - 266: 1198.0, - 267: 0.0, - 262: 2629.0, - 258: 775.0, - 257: 0.0, - 263: 0.0, - 259: 0.0, - 264: 163.0, - 250: 10326.0, - 251: 0.0, - 252: 1228.0, - 253: 0.0, - 254: 2769.0, - 255: 0.0} - - # smoke test for the repr - s = Series(ser) - result = s.value_counts() - str(result) - - def test_set_ix_out_of_bounds_axis_0(self): - df = pd.DataFrame( - randn(2, 5), index=["row%s" % i for i in range(2)], - columns=["col%s" % i for i in range(5)]) - self.assertRaises(ValueError, df.ix.__setitem__, (2, 0), 100) - - def test_set_ix_out_of_bounds_axis_1(self): - df = pd.DataFrame( - randn(5, 2), index=["row%s" % i for i in range(5)], - columns=["col%s" % i for i in range(2)]) - self.assertRaises(ValueError, df.ix.__setitem__, (0, 2), 100) - - def test_iloc_empty_list_indexer_is_ok(self): - from pandas.util.testing import makeCustomDataframe as mkdf - df = mkdf(5, 2) - # vertical empty - 
tm.assert_frame_equal(df.iloc[:, []], df.iloc[:, :0], - check_index_type=True, check_column_type=True) - # horizontal empty - tm.assert_frame_equal(df.iloc[[], :], df.iloc[:0, :], - check_index_type=True, check_column_type=True) - # horizontal empty - tm.assert_frame_equal(df.iloc[[]], df.iloc[:0, :], - check_index_type=True, - check_column_type=True) - - def test_loc_empty_list_indexer_is_ok(self): - from pandas.util.testing import makeCustomDataframe as mkdf - df = mkdf(5, 2) - # vertical empty - tm.assert_frame_equal(df.loc[:, []], df.iloc[:, :0], - check_index_type=True, check_column_type=True) - # horizontal empty - tm.assert_frame_equal(df.loc[[], :], df.iloc[:0, :], - check_index_type=True, check_column_type=True) - # horizontal empty - tm.assert_frame_equal(df.loc[[]], df.iloc[:0, :], - check_index_type=True, - check_column_type=True) - - def test_ix_empty_list_indexer_is_ok(self): - from pandas.util.testing import makeCustomDataframe as mkdf - df = mkdf(5, 2) - # vertical empty - tm.assert_frame_equal(df.ix[:, []], df.iloc[:, :0], - check_index_type=True, - check_column_type=True) - # horizontal empty - tm.assert_frame_equal(df.ix[[], :], df.iloc[:0, :], - check_index_type=True, - check_column_type=True) - # horizontal empty - tm.assert_frame_equal(df.ix[[]], df.iloc[:0, :], - check_index_type=True, - check_column_type=True) + assert s.index.is_integer() - def test_index_type_coercion(self): + for indexer in [lambda x: x.ix, + lambda x: x.loc, + lambda x: x]: + s2 = s.copy() + indexer(s2)[0.1] = 0 + assert s2.index.is_floating() + assert indexer(s2)[0.1] == 0 - # GH 11836 - # if we have an index type and set it with something that looks - # to numpy like the same, but is actually, not - # (e.g. setting with a float or string '0') - # then we need to coerce to object + s2 = s.copy() + indexer(s2)[0.0] = 0 + exp = s.index + if 0 not in s: + exp = Index(s.index.tolist() + [0]) + tm.assert_index_equal(s2.index, exp) - # integer indexes - for s in [Series(range(5)), - Series(range(5), index=range(1, 6))]: + s2 = s.copy() + indexer(s2)['0'] = 0 + assert s2.index.is_object() - self.assertTrue(s.index.is_integer()) + for s in [Series(range(5), index=np.arange(5.))]: - for indexer in [lambda x: x.ix, - lambda x: x.loc, - lambda x: x]: - s2 = s.copy() - indexer(s2)[0.1] = 0 - self.assertTrue(s2.index.is_floating()) - self.assertTrue(indexer(s2)[0.1] == 0) + assert s.index.is_floating() - s2 = s.copy() - indexer(s2)[0.0] = 0 - exp = s.index - if 0 not in s: - exp = Index(s.index.tolist() + [0]) - tm.assert_index_equal(s2.index, exp) + for idxr in [lambda x: x.ix, + lambda x: x.loc, + lambda x: x]: - s2 = s.copy() - indexer(s2)['0'] = 0 - self.assertTrue(s2.index.is_object()) + s2 = s.copy() + idxr(s2)[0.1] = 0 + assert s2.index.is_floating() + assert idxr(s2)[0.1] == 0 - for s in [Series(range(5), index=np.arange(5.))]: + s2 = s.copy() + idxr(s2)[0.0] = 0 + tm.assert_index_equal(s2.index, s.index) - self.assertTrue(s.index.is_floating()) + s2 = s.copy() + idxr(s2)['0'] = 0 + assert s2.index.is_object() - for idxr in [lambda x: x.ix, - lambda x: x.loc, - lambda x: x]: - s2 = s.copy() - idxr(s2)[0.1] = 0 - self.assertTrue(s2.index.is_floating()) - self.assertTrue(idxr(s2)[0.1] == 0) +class TestMisc(Base): - s2 = s.copy() - idxr(s2)[0.0] = 0 - tm.assert_index_equal(s2.index, s.index) + def test_indexer_caching(self): + # GH5727 + # make sure that indexers are in the _internal_names_set + n = 1000001 + arrays = [lrange(n), lrange(n)] + index = MultiIndex.from_tuples(lzip(*arrays)) + s = 
Series(np.zeros(n), index=index) + str(s) - s2 = s.copy() - idxr(s2)['0'] = 0 - self.assertTrue(s2.index.is_object()) + # setitem + expected = Series(np.ones(n), index=index) + s = Series(np.zeros(n), index=index) + s[s == 0] = 1 + tm.assert_series_equal(s, expected) def test_float_index_to_mixed(self): df = DataFrame({0.0: np.random.rand(10), 1.0: np.random.rand(10)}) @@ -5189,13 +686,6 @@ def test_float_index_to_mixed(self): 'a': [10] * 10}), df) - def test_duplicate_ix_returns_series(self): - df = DataFrame(np.random.randn(3, 3), index=[0.1, 0.2, 0.2], - columns=list('abc')) - r = df.ix[0.2, 'a'] - e = df.loc[0.2, 'a'] - tm.assert_series_equal(r, e) - def test_float_index_non_scalar_assignment(self): df = DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5]}, index=[1., 2., 3.]) df.loc[df.index[:2]] = 1 @@ -5210,9 +700,9 @@ def test_float_index_non_scalar_assignment(self): def test_float_index_at_iat(self): s = pd.Series([1, 2, 3], index=[0.1, 0.2, 0.3]) for el, item in s.iteritems(): - self.assertEqual(s.at[el], item) + assert s.at[el] == item for i in range(len(s)): - self.assertEqual(s.iat[i], i + 1) + assert s.iat[i] == i + 1 def test_rhs_alignment(self): # GH8258, tests that both rows & columns are aligned to what is @@ -5231,15 +721,18 @@ def run_tests(df, rhs, right): tm.assert_frame_equal(left, right) left = df.copy() - left.ix[s, l] = rhs + with catch_warnings(record=True): + left.ix[s, l] = rhs tm.assert_frame_equal(left, right) left = df.copy() - left.ix[i, j] = rhs + with catch_warnings(record=True): + left.ix[i, j] = rhs tm.assert_frame_equal(left, right) left = df.copy() - left.ix[r, c] = rhs + with catch_warnings(record=True): + left.ix[r, c] = rhs tm.assert_frame_equal(left, right) xs = np.arange(20).reshape(5, 4) @@ -5272,7 +765,7 @@ def assert_slices_equivalent(l_slc, i_slc): if not idx.is_integer: # For integer indices, ix and plain getitem are position-based. 
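+                # for label-based indexes, plain getitem and .loc
+                # must match the positional .iloc result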
tm.assert_series_equal(s[l_slc], s.iloc[i_slc]) - tm.assert_series_equal(s.ix[l_slc], s.iloc[i_slc]) + tm.assert_series_equal(s.loc[l_slc], s.iloc[i_slc]) for idx in [_mklbl('A', 20), np.arange(20) + 100, np.linspace(100, 150, 20)]: @@ -5283,42 +776,16 @@ def assert_slices_equivalent(l_slc, i_slc): assert_slices_equivalent(SLC[idx[13]:idx[9]:-1], SLC[13:8:-1]) assert_slices_equivalent(SLC[idx[9]:idx[13]:-1], SLC[:0]) - def test_multiindex_label_slicing_with_negative_step(self): - s = Series(np.arange(20), - MultiIndex.from_product([list('abcde'), np.arange(4)])) - SLC = pd.IndexSlice - - def assert_slices_equivalent(l_slc, i_slc): - tm.assert_series_equal(s.loc[l_slc], s.iloc[i_slc]) - tm.assert_series_equal(s[l_slc], s.iloc[i_slc]) - tm.assert_series_equal(s.ix[l_slc], s.iloc[i_slc]) - - assert_slices_equivalent(SLC[::-1], SLC[::-1]) - - assert_slices_equivalent(SLC['d'::-1], SLC[15::-1]) - assert_slices_equivalent(SLC[('d', )::-1], SLC[15::-1]) - - assert_slices_equivalent(SLC[:'d':-1], SLC[:11:-1]) - assert_slices_equivalent(SLC[:('d', ):-1], SLC[:11:-1]) - - assert_slices_equivalent(SLC['d':'b':-1], SLC[15:3:-1]) - assert_slices_equivalent(SLC[('d', ):'b':-1], SLC[15:3:-1]) - assert_slices_equivalent(SLC['d':('b', ):-1], SLC[15:3:-1]) - assert_slices_equivalent(SLC[('d', ):('b', ):-1], SLC[15:3:-1]) - assert_slices_equivalent(SLC['b':'d':-1], SLC[:0]) - - assert_slices_equivalent(SLC[('c', 2)::-1], SLC[10::-1]) - assert_slices_equivalent(SLC[:('c', 2):-1], SLC[:9:-1]) - assert_slices_equivalent(SLC[('e', 0):('c', 2):-1], SLC[16:9:-1]) - def test_slice_with_zero_step_raises(self): s = Series(np.arange(20), index=_mklbl('A', 20)) - self.assertRaisesRegexp(ValueError, 'slice step cannot be zero', - lambda: s[::0]) - self.assertRaisesRegexp(ValueError, 'slice step cannot be zero', - lambda: s.loc[::0]) - self.assertRaisesRegexp(ValueError, 'slice step cannot be zero', - lambda: s.ix[::0]) + tm.assert_raises_regex(ValueError, 'slice step cannot be zero', + lambda: s[::0]) + tm.assert_raises_regex(ValueError, 'slice step cannot be zero', + lambda: s.loc[::0]) + with catch_warnings(record=True): + tm.assert_raises_regex(ValueError, + 'slice step cannot be zero', + lambda: s.ix[::0]) def test_indexing_assignment_dict_already_exists(self): df = pd.DataFrame({'x': [1, 2, 6], @@ -5333,11 +800,13 @@ def test_indexing_assignment_dict_already_exists(self): def test_indexing_dtypes_on_empty(self): # Check that .iloc and .ix return correct dtypes GH9983 df = DataFrame({'a': [1, 2, 3], 'b': ['b', 'b2', 'b3']}) - df2 = df.ix[[], :] + with catch_warnings(record=True): + df2 = df.ix[[], :] - self.assertEqual(df2.loc[:, 'a'].dtype, np.int64) + assert df2.loc[:, 'a'].dtype == np.int64 tm.assert_series_equal(df2.loc[:, 'a'], df2.iloc[:, 0]) - tm.assert_series_equal(df2.loc[:, 'a'], df2.ix[:, 0]) + with catch_warnings(record=True): + tm.assert_series_equal(df2.loc[:, 'a'], df2.ix[:, 0]) def test_range_in_series_indexing(self): # range can cause an indexing error @@ -5369,7 +838,7 @@ def test_non_reducing_slice(self): ] for slice_ in slices: tslice_ = _non_reducing_slice(slice_) - self.assertTrue(isinstance(df.loc[tslice_], DataFrame)) + assert isinstance(df.loc[tslice_], DataFrame) def test_list_slice(self): # like dataframe getitem @@ -5384,33 +853,30 @@ def test_maybe_numeric_slice(self): df = pd.DataFrame({'A': [1, 2], 'B': ['c', 'd'], 'C': [True, False]}) result = _maybe_numeric_slice(df, slice_=None) expected = pd.IndexSlice[:, ['A']] - self.assertEqual(result, expected) + assert result == expected 
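+        # with include_bool=True the boolean column 'C' should be
+        # picked up alongside the numeric column 'A'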
result = _maybe_numeric_slice(df, None, include_bool=True) expected = pd.IndexSlice[:, ['A', 'C']] result = _maybe_numeric_slice(df, [1]) expected = [1] - self.assertEqual(result, expected) - - def test_multiindex_slice_first_level(self): - # GH 12697 - freq = ['a', 'b', 'c', 'd'] - idx = pd.MultiIndex.from_product([freq, np.arange(500)]) - df = pd.DataFrame(list(range(2000)), index=idx, columns=['Test']) - df_slice = df.loc[pd.IndexSlice[:, 30:70], :] - result = df_slice.loc['a'] - expected = pd.DataFrame(list(range(30, 71)), - columns=['Test'], - index=range(30, 71)) - tm.assert_frame_equal(result, expected) - result = df_slice.loc['d'] - expected = pd.DataFrame(list(range(1530, 1571)), - columns=['Test'], - index=range(30, 71)) + assert result == expected + + def test_partial_boolean_frame_indexing(self): + # GH 17170 + df = pd.DataFrame(np.arange(9.).reshape(3, 3), + index=list('abc'), + columns=list('ABC')) + index_df = pd.DataFrame(1, index=list('ab'), columns=list('AB')) + result = df[index_df.notnull()] + expected = pd.DataFrame(np.array([[0., 1., np.nan], + [3., 4., np.nan], + [np.nan] * 3]), + index=list('abc'), + columns=list('ABC')) tm.assert_frame_equal(result, expected) -class TestSeriesNoneCoercion(tm.TestCase): +class TestSeriesNoneCoercion(object): EXPECTED_RESULTS = [ # For numeric series, we should coerce to NaN. ([1, 2, 3], [np.nan, 2, 3]), @@ -5457,7 +923,7 @@ def test_coercion_with_loc_and_series(self): tm.assert_series_equal(start_series, expected_series) -class TestDataframeNoneCoercion(tm.TestCase): +class TestDataframeNoneCoercion(object): EXPECTED_SINGLE_ROW_RESULTS = [ # For numeric series, we should coerce to NaN. ([1, 2, 3], [np.nan, 2, 3]), @@ -5513,27 +979,3 @@ def test_none_coercion_mixed_dtypes(self): datetime(2000, 1, 3)], 'd': [None, 'b', 'c']}) tm.assert_frame_equal(start_dataframe, exp) - - -class TestTimedeltaIndexing(tm.TestCase): - - def test_boolean_indexing(self): - # GH 14946 - df = pd.DataFrame({'x': range(10)}) - df.index = pd.to_timedelta(range(10), unit='s') - conditions = [df['x'] > 3, df['x'] == 3, df['x'] < 3] - expected_data = [[0, 1, 2, 3, 10, 10, 10, 10, 10, 10], - [0, 1, 2, 10, 4, 5, 6, 7, 8, 9], - [10, 10, 10, 3, 4, 5, 6, 7, 8, 9]] - for cond, data in zip(conditions, expected_data): - result = df.copy() - result.loc[cond, 'x'] = 10 - expected = pd.DataFrame(data, - index=pd.to_timedelta(range(10), unit='s'), - columns=['x']) - tm.assert_frame_equal(expected, result) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/indexing/test_indexing_slow.py b/pandas/tests/indexing/test_indexing_slow.py index 5d563e20087b9..1b3fb18d9ff1d 100644 --- a/pandas/tests/indexing/test_indexing_slow.py +++ b/pandas/tests/indexing/test_indexing_slow.py @@ -6,13 +6,12 @@ import pandas as pd from pandas.core.api import Series, DataFrame, MultiIndex import pandas.util.testing as tm +import pytest -class TestIndexingSlow(tm.TestCase): +class TestIndexingSlow(object): - _multiprocess_can_split_ = True - - @tm.slow + @pytest.mark.slow def test_multiindex_get_loc(self): # GH7724, GH2646 with warnings.catch_warnings(record=True): @@ -29,10 +28,10 @@ def validate(mi, df, key): mask &= df.iloc[:, i] == k if not mask.any(): - self.assertNotIn(key[:i + 1], mi.index) + assert key[:i + 1] not in mi.index continue - self.assertIn(key[:i + 1], mi.index) + assert key[:i + 1] in mi.index right = df[mask].copy() if i + 1 != len(key): # partial key @@ -82,7 +81,7 @@ def loop(mi, df, 
keys): assert not mi.index.lexsort_depth < i loop(mi, df, keys) - @tm.slow + @pytest.mark.slow def test_large_dataframe_indexing(self): # GH10692 result = DataFrame({'x': range(10 ** 6)}, dtype='int64') @@ -90,7 +89,7 @@ def test_large_dataframe_indexing(self): expected = DataFrame({'x': range(10 ** 6 + 1)}, dtype='int64') tm.assert_frame_equal(result, expected) - @tm.slow + @pytest.mark.slow def test_large_mi_dataframe_indexing(self): # GH10645 result = MultiIndex.from_arrays([range(10 ** 6), range(10 ** 6)]) diff --git a/pandas/tests/indexing/test_interval.py b/pandas/tests/indexing/test_interval.py new file mode 100644 index 0000000000000..be6e5e1cffb2e --- /dev/null +++ b/pandas/tests/indexing/test_interval.py @@ -0,0 +1,245 @@ +import pytest +import numpy as np +import pandas as pd + +from pandas import Series, DataFrame, IntervalIndex, Interval +import pandas.util.testing as tm + + +class TestIntervalIndex(object): + + def setup_method(self, method): + self.s = Series(np.arange(5), IntervalIndex.from_breaks(np.arange(6))) + + def test_loc_with_scalar(self): + + s = self.s + expected = 0 + + result = s.loc[0.5] + assert result == expected + + result = s.loc[1] + assert result == expected + + with pytest.raises(KeyError): + s.loc[0] + + expected = s.iloc[:3] + tm.assert_series_equal(expected, s.loc[:3]) + tm.assert_series_equal(expected, s.loc[:2.5]) + tm.assert_series_equal(expected, s.loc[0.1:2.5]) + tm.assert_series_equal(expected, s.loc[-1:3]) + + expected = s.iloc[1:4] + tm.assert_series_equal(expected, s.loc[[1.5, 2.5, 3.5]]) + tm.assert_series_equal(expected, s.loc[[2, 3, 4]]) + tm.assert_series_equal(expected, s.loc[[1.5, 3, 4]]) + + expected = s.iloc[2:5] + tm.assert_series_equal(expected, s.loc[s >= 2]) + + def test_getitem_with_scalar(self): + + s = self.s + expected = 0 + + result = s[0.5] + assert result == expected + + result = s[1] + assert result == expected + + with pytest.raises(KeyError): + s[0] + + expected = s.iloc[:3] + tm.assert_series_equal(expected, s[:3]) + tm.assert_series_equal(expected, s[:2.5]) + tm.assert_series_equal(expected, s[0.1:2.5]) + tm.assert_series_equal(expected, s[-1:3]) + + expected = s.iloc[1:4] + tm.assert_series_equal(expected, s[[1.5, 2.5, 3.5]]) + tm.assert_series_equal(expected, s[[2, 3, 4]]) + tm.assert_series_equal(expected, s[[1.5, 3, 4]]) + + expected = s.iloc[2:5] + tm.assert_series_equal(expected, s[s >= 2]) + + def test_with_interval(self): + + s = self.s + expected = 0 + + result = s.loc[Interval(0, 1)] + assert result == expected + + result = s[Interval(0, 1)] + assert result == expected + + expected = s.iloc[3:5] + result = s.loc[Interval(3, 6)] + tm.assert_series_equal(expected, result) + + expected = s.iloc[3:5] + result = s.loc[[Interval(3, 6)]] + tm.assert_series_equal(expected, result) + + expected = s.iloc[3:5] + result = s.loc[[Interval(3, 5)]] + tm.assert_series_equal(expected, result) + + # missing + with pytest.raises(KeyError): + s.loc[Interval(-2, 0)] + + with pytest.raises(KeyError): + s[Interval(-2, 0)] + + with pytest.raises(KeyError): + s.loc[Interval(5, 6)] + + with pytest.raises(KeyError): + s[Interval(5, 6)] + + def test_with_slices(self): + + s = self.s + + # slice of interval + with pytest.raises(NotImplementedError): + s.loc[Interval(3, 6):] + + with pytest.raises(NotImplementedError): + s[Interval(3, 6):] + + expected = s.iloc[3:5] + result = s[[Interval(3, 6)]] + tm.assert_series_equal(expected, result) + + # slice of scalar with step != 1 + with pytest.raises(ValueError): + s[0:4:2] + + def 
test_with_overlaps(self):
+
+        s = self.s
+        expected = s.iloc[[3, 4, 3, 4]]
+        result = s.loc[[Interval(3, 6), Interval(3, 6)]]
+        tm.assert_series_equal(expected, result)
+
+        idx = IntervalIndex.from_tuples([(1, 5), (3, 7)])
+        s = Series(range(len(idx)), index=idx)
+
+        result = s[4]
+        expected = s
+        tm.assert_series_equal(expected, result)
+
+        result = s[[4]]
+        expected = s
+        tm.assert_series_equal(expected, result)
+
+        result = s.loc[[4]]
+        expected = s
+        tm.assert_series_equal(expected, result)
+
+        result = s[Interval(3, 5)]
+        expected = s
+        tm.assert_series_equal(expected, result)
+
+        result = s.loc[Interval(3, 5)]
+        expected = s
+        tm.assert_series_equal(expected, result)
+
+        # doesn't intersect unique set of intervals
+        with pytest.raises(KeyError):
+            s[[Interval(3, 5)]]
+
+        with pytest.raises(KeyError):
+            s.loc[[Interval(3, 5)]]
+
+    def test_non_unique(self):
+
+        idx = IntervalIndex.from_tuples([(1, 3), (3, 7)])
+
+        s = pd.Series(range(len(idx)), index=idx)
+
+        result = s.loc[Interval(1, 3)]
+        assert result == 0
+
+        result = s.loc[[Interval(1, 3)]]
+        expected = s.iloc[0:1]
+        tm.assert_series_equal(expected, result)
+
+    def test_non_unique_moar(self):
+
+        idx = IntervalIndex.from_tuples([(1, 3), (1, 3), (3, 7)])
+        s = Series(range(len(idx)), index=idx)
+
+        result = s.loc[Interval(1, 3)]
+        expected = s.iloc[[0, 1]]
+        tm.assert_series_equal(expected, result)
+
+        # non-unique index and slices not allowed
+        with pytest.raises(ValueError):
+            s.loc[Interval(1, 3):]
+
+        with pytest.raises(ValueError):
+            s[Interval(1, 3):]
+
+        # non-unique
+        with pytest.raises(ValueError):
+            s[[Interval(1, 3)]]
+
+    def test_non_matching(self):
+        s = self.s
+
+        # this is a departure from our current
+        # indexing scheme, but simpler
+        with pytest.raises(KeyError):
+            s.loc[[-1, 3, 4, 5]]
+
+        with pytest.raises(KeyError):
+            s.loc[[-1, 3]]
+
+    def test_large_series(self):
+        s = Series(np.arange(1000000),
+                   index=IntervalIndex.from_breaks(np.arange(1000001)))
+
+        result1 = s.loc[:80000]
+        result2 = s.loc[0:80000]
+        result3 = s.loc[0:80000:1]
+        tm.assert_series_equal(result1, result2)
+        tm.assert_series_equal(result1, result3)
+
+    def test_loc_getitem_frame(self):
+
+        df = DataFrame({'A': range(10)})
+        s = pd.cut(df.A, 5)
+        df['B'] = s
+        df = df.set_index('B')
+
+        result = df.loc[4]
+        expected = df.iloc[4:6]
+        tm.assert_frame_equal(result, expected)
+
+        with pytest.raises(KeyError):
+            df.loc[10]
+
+        # single list-like
+        result = df.loc[[4]]
+        expected = df.iloc[4:6]
+        tm.assert_frame_equal(result, expected)
+
+        # non-unique
+        result = df.loc[[4, 5]]
+        expected = df.take([4, 5, 4, 5])
+        tm.assert_frame_equal(result, expected)
+
+        with pytest.raises(KeyError):
+            df.loc[[10]]
+
+        # partial missing
+        with pytest.raises(KeyError):
+            df.loc[[10, 4]]
diff --git a/pandas/tests/indexing/test_ix.py b/pandas/tests/indexing/test_ix.py
new file mode 100644
index 0000000000000..dc9a591ee3101
--- /dev/null
+++ b/pandas/tests/indexing/test_ix.py
@@ -0,0 +1,335 @@
+""" test indexing with ix """
+
+import pytest
+
+from warnings import catch_warnings
+
+import numpy as np
+import pandas as pd
+
+from pandas.core.dtypes.common import is_scalar
+from pandas.compat import lrange
+from pandas import Series, DataFrame, option_context, MultiIndex
+from pandas.util import testing as tm
+from pandas.errors import PerformanceWarning
+
+
+class TestIX(object):
+
+    def test_ix_deprecation(self):
+        # GH 15114
+
+        df = DataFrame({'A': [1, 2, 3]})
+        with tm.assert_produces_warning(DeprecationWarning,
+                                        check_stacklevel=False):
+            df.ix[1, 'A']
+
+    def
test_ix_loc_setitem_consistency(self): + + # GH 5771 + # loc with slice and series + s = Series(0, index=[4, 5, 6]) + s.loc[4:5] += 1 + expected = Series([1, 1, 0], index=[4, 5, 6]) + tm.assert_series_equal(s, expected) + + # GH 5928 + # chained indexing assignment + df = DataFrame({'a': [0, 1, 2]}) + expected = df.copy() + with catch_warnings(record=True): + expected.ix[[0, 1, 2], 'a'] = -expected.ix[[0, 1, 2], 'a'] + + with catch_warnings(record=True): + df['a'].ix[[0, 1, 2]] = -df['a'].ix[[0, 1, 2]] + tm.assert_frame_equal(df, expected) + + df = DataFrame({'a': [0, 1, 2], 'b': [0, 1, 2]}) + with catch_warnings(record=True): + df['a'].ix[[0, 1, 2]] = -df['a'].ix[[0, 1, 2]].astype( + 'float64') + 0.5 + expected = DataFrame({'a': [0.5, -0.5, -1.5], 'b': [0, 1, 2]}) + tm.assert_frame_equal(df, expected) + + # GH 8607 + # ix setitem consistency + df = DataFrame({'timestamp': [1413840976, 1413842580, 1413760580], + 'delta': [1174, 904, 161], + 'elapsed': [7673, 9277, 1470]}) + expected = DataFrame({'timestamp': pd.to_datetime( + [1413840976, 1413842580, 1413760580], unit='s'), + 'delta': [1174, 904, 161], + 'elapsed': [7673, 9277, 1470]}) + + df2 = df.copy() + df2['timestamp'] = pd.to_datetime(df['timestamp'], unit='s') + tm.assert_frame_equal(df2, expected) + + df2 = df.copy() + df2.loc[:, 'timestamp'] = pd.to_datetime(df['timestamp'], unit='s') + tm.assert_frame_equal(df2, expected) + + df2 = df.copy() + with catch_warnings(record=True): + df2.ix[:, 2] = pd.to_datetime(df['timestamp'], unit='s') + tm.assert_frame_equal(df2, expected) + + def test_ix_loc_consistency(self): + + # GH 8613 + # some edge cases where ix/loc should return the same + # this is not an exhaustive case + + def compare(result, expected): + if is_scalar(expected): + assert result == expected + else: + assert expected.equals(result) + + # failure cases for .loc, but these work for .ix + df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD')) + for key in [slice(1, 3), tuple([slice(0, 2), slice(0, 2)]), + tuple([slice(0, 2), df.columns[0:2]])]: + + for index in [tm.makeStringIndex, tm.makeUnicodeIndex, + tm.makeDateIndex, tm.makePeriodIndex, + tm.makeTimedeltaIndex]: + df.index = index(len(df.index)) + with catch_warnings(record=True): + df.ix[key] + + pytest.raises(TypeError, lambda: df.loc[key]) + + df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'), + index=pd.date_range('2012-01-01', periods=5)) + + for key in ['2012-01-03', + '2012-01-31', + slice('2012-01-03', '2012-01-03'), + slice('2012-01-03', '2012-01-04'), + slice('2012-01-03', '2012-01-06', 2), + slice('2012-01-03', '2012-01-31'), + tuple([[True, True, True, False, True]]), ]: + + # getitem + + # if the expected raises, then compare the exceptions + try: + with catch_warnings(record=True): + expected = df.ix[key] + except KeyError: + pytest.raises(KeyError, lambda: df.loc[key]) + continue + + result = df.loc[key] + compare(result, expected) + + # setitem + df1 = df.copy() + df2 = df.copy() + + with catch_warnings(record=True): + df1.ix[key] = 10 + df2.loc[key] = 10 + compare(df2, df1) + + # edge cases + s = Series([1, 2, 3, 4], index=list('abde')) + + result1 = s['a':'c'] + with catch_warnings(record=True): + result2 = s.ix['a':'c'] + result3 = s.loc['a':'c'] + tm.assert_series_equal(result1, result2) + tm.assert_series_equal(result1, result3) + + # now work rather than raising KeyError + s = Series(range(5), [-2, -1, 1, 2, 3]) + + with catch_warnings(record=True): + result1 = s.ix[-10:3] + result2 = s.loc[-10:3] + 
tm.assert_series_equal(result1, result2)
+
+        with catch_warnings(record=True):
+            result1 = s.ix[0:3]
+            result2 = s.loc[0:3]
+        tm.assert_series_equal(result1, result2)
+
+    def test_ix_weird_slicing(self):
+        # http://stackoverflow.com/q/17056560/1240268
+        df = DataFrame({'one': [1, 2, 3, np.nan, np.nan],
+                        'two': [1, 2, 3, 4, 5]})
+        df.loc[df['one'] > 1, 'two'] = -df['two']
+
+        expected = DataFrame({'one': {0: 1.0,
+                                      1: 2.0,
+                                      2: 3.0,
+                                      3: np.nan,
+                                      4: np.nan},
+                              'two': {0: 1,
+                                      1: -2,
+                                      2: -3,
+                                      3: 4,
+                                      4: 5}})
+        tm.assert_frame_equal(df, expected)
+
+    def test_ix_general(self):
+
+        # ix general issues
+
+        # GH 2817
+        data = {'amount': {0: 700, 1: 600, 2: 222, 3: 333, 4: 444},
+                'col': {0: 3.5, 1: 3.5, 2: 4.0, 3: 4.0, 4: 4.0},
+                'year': {0: 2012, 1: 2011, 2: 2012, 3: 2012, 4: 2012}}
+        df = DataFrame(data).set_index(keys=['col', 'year'])
+        key = 4.0, 2012
+
+        # emits a PerformanceWarning, ok
+        with tm.assert_produces_warning(PerformanceWarning):
+            tm.assert_frame_equal(df.loc[key], df.iloc[2:])
+
+        # this is ok
+        df.sort_index(inplace=True)
+        res = df.loc[key]
+
+        # col has float dtype, result should be Float64Index
+        index = MultiIndex.from_arrays([[4.] * 3, [2012] * 3],
+                                       names=['col', 'year'])
+        expected = DataFrame({'amount': [222, 333, 444]}, index=index)
+        tm.assert_frame_equal(res, expected)
+
+    def test_ix_assign_column_mixed(self):
+        # GH #1142
+        df = DataFrame(tm.getSeriesData())
+        df['foo'] = 'bar'
+
+        orig = df.loc[:, 'B'].copy()
+        df.loc[:, 'B'] = df.loc[:, 'B'] + 1
+        tm.assert_series_equal(df.B, orig + 1)
+
+        # GH 3668, mixed frame with series value
+        df = DataFrame({'x': lrange(10), 'y': lrange(10, 20), 'z': 'bar'})
+        expected = df.copy()
+
+        for i in range(5):
+            indexer = i * 2
+            v = 1000 + i * 200
+            expected.loc[indexer, 'y'] = v
+            assert expected.loc[indexer, 'y'] == v
+
+        df.loc[df.x % 2 == 0, 'y'] = df.loc[df.x % 2 == 0, 'y'] * 100
+        tm.assert_frame_equal(df, expected)
+
+        # GH 4508, making sure consistency of assignments
+        df = DataFrame({'a': [1, 2, 3], 'b': [0, 1, 2]})
+        df.loc[[0, 2, ], 'b'] = [100, -100]
+        expected = DataFrame({'a': [1, 2, 3], 'b': [100, 1, -100]})
+        tm.assert_frame_equal(df, expected)
+
+        df = pd.DataFrame({'a': lrange(4)})
+        df['b'] = np.nan
+        df.loc[[1, 3], 'b'] = [100, -100]
+        expected = DataFrame({'a': [0, 1, 2, 3],
+                              'b': [np.nan, 100, np.nan, -100]})
+        tm.assert_frame_equal(df, expected)
+
+        # ok, but chained assignments are dangerous
+        # if we turn off chained assignment it will work
+        with option_context('chained_assignment', None):
+            df = pd.DataFrame({'a': lrange(4)})
+            df['b'] = np.nan
+            df['b'].loc[[1, 3]] = [100, -100]
+            tm.assert_frame_equal(df, expected)
+
+    def test_ix_get_set_consistency(self):
+
+        # GH 4544
+        # ix/loc get/set not consistent when
+        # a mixed int/string index
+        df = DataFrame(np.arange(16).reshape((4, 4)),
+                       columns=['a', 'b', 8, 'c'],
+                       index=['e', 7, 'f', 'g'])
+
+        with catch_warnings(record=True):
+            assert df.ix['e', 8] == 2
+        assert df.loc['e', 8] == 2
+
+        with catch_warnings(record=True):
+            df.ix['e', 8] = 42
+            assert df.ix['e', 8] == 42
+        assert df.loc['e', 8] == 42
+
+        df.loc['e', 8] = 45
+        with catch_warnings(record=True):
+            assert df.ix['e', 8] == 45
+        assert df.loc['e', 8] == 45
+
+    def test_ix_slicing_strings(self):
+        # see gh-3836
+        data = {'Classification':
+                ['SA EQUITY CFD', 'bbb', 'SA EQUITY', 'SA SSF', 'aaa'],
+                'Random': [1, 2, 3, 4, 5],
+                'X': ['correct', 'wrong', 'correct', 'correct', 'wrong']}
+        df = DataFrame(data)
+        x = df[~df.Classification.isin(['SA EQUITY CFD', 'SA EQUITY', 'SA SSF'
+                                        ])]
+
with catch_warnings(record=True):
+            df.ix[x.index, 'X'] = df['Classification']
+
+        expected = DataFrame({'Classification': {0: 'SA EQUITY CFD',
+                                                 1: 'bbb',
+                                                 2: 'SA EQUITY',
+                                                 3: 'SA SSF',
+                                                 4: 'aaa'},
+                              'Random': {0: 1,
+                                         1: 2,
+                                         2: 3,
+                                         3: 4,
+                                         4: 5},
+                              'X': {0: 'correct',
+                                    1: 'bbb',
+                                    2: 'correct',
+                                    3: 'correct',
+                                    4: 'aaa'}})  # bug was 4: 'bbb'
+
+        tm.assert_frame_equal(df, expected)
+
+    def test_ix_setitem_out_of_bounds_axis_0(self):
+        df = pd.DataFrame(
+            np.random.randn(2, 5), index=["row%s" % i for i in range(2)],
+            columns=["col%s" % i for i in range(5)])
+        with catch_warnings(record=True):
+            pytest.raises(ValueError, df.ix.__setitem__, (2, 0), 100)
+
+    def test_ix_setitem_out_of_bounds_axis_1(self):
+        df = pd.DataFrame(
+            np.random.randn(5, 2), index=["row%s" % i for i in range(5)],
+            columns=["col%s" % i for i in range(2)])
+        with catch_warnings(record=True):
+            pytest.raises(ValueError, df.ix.__setitem__, (0, 2), 100)
+
+    def test_ix_empty_list_indexer_is_ok(self):
+        with catch_warnings(record=True):
+            from pandas.util.testing import makeCustomDataframe as mkdf
+            df = mkdf(5, 2)
+            # vertical empty
+            tm.assert_frame_equal(df.ix[:, []], df.iloc[:, :0],
+                                  check_index_type=True,
+                                  check_column_type=True)
+            # horizontal empty
+            tm.assert_frame_equal(df.ix[[], :], df.iloc[:0, :],
+                                  check_index_type=True,
+                                  check_column_type=True)
+            # horizontal empty
+            tm.assert_frame_equal(df.ix[[]], df.iloc[:0, :],
+                                  check_index_type=True,
+                                  check_column_type=True)
+
+    def test_ix_duplicate_returns_series(self):
+        df = DataFrame(np.random.randn(3, 3), index=[0.1, 0.2, 0.2],
+                       columns=list('abc'))
+        with catch_warnings(record=True):
+            r = df.ix[0.2, 'a']
+        e = df.loc[0.2, 'a']
+        tm.assert_series_equal(r, e)
diff --git a/pandas/tests/indexing/test_loc.py b/pandas/tests/indexing/test_loc.py
new file mode 100644
index 0000000000000..3e863a59df67e
--- /dev/null
+++ b/pandas/tests/indexing/test_loc.py
@@ -0,0 +1,657 @@
+""" test label based indexing with loc """
+
+import itertools
+import pytest
+
+from warnings import catch_warnings
+import numpy as np
+
+import pandas as pd
+from pandas.compat import lrange, StringIO
+from pandas import (Series, DataFrame, Timestamp,
+                    date_range, MultiIndex)
+from pandas.util import testing as tm
+from pandas.tests.indexing.common import Base
+
+
+class TestLoc(Base):
+
+    def test_loc_getitem_dups(self):
+        # GH 5678
+        # repeated getitems on a dup index returning an ndarray
+        df = DataFrame(
+            np.random.random_sample((20, 5)),
+            index=['ABCDE' [x % 5] for x in range(20)])
+        expected = df.loc['A', 0]
+        result = df.loc[:, 0].loc['A']
+        tm.assert_series_equal(result, expected)
+
+    def test_loc_getitem_dups2(self):
+
+        # GH4726
+        # dup indexing with iloc/loc
+        df = DataFrame([[1, 2, 'foo', 'bar', Timestamp('20130101')]],
+                       columns=['a', 'a', 'a', 'a', 'a'], index=[1])
+        expected = Series([1, 2, 'foo', 'bar', Timestamp('20130101')],
+                          index=['a', 'a', 'a', 'a', 'a'], name=1)
+
+        result = df.iloc[0]
+        tm.assert_series_equal(result, expected)
+
+        result = df.loc[1]
+        tm.assert_series_equal(result, expected)
+
+    def test_loc_setitem_dups(self):
+
+        # GH 6541
+        df_orig = DataFrame(
+            {'me': list('rttti'),
+             'foo': list('aaade'),
+             'bar': np.arange(5, dtype='float64') * 1.34 + 2,
+             'bar2': np.arange(5, dtype='float64') * -.34 + 2}).set_index('me')
+
+        indexer = tuple(['r', ['bar', 'bar2']])
+        df = df_orig.copy()
+        df.loc[indexer] *= 2.0
+        tm.assert_series_equal(df.loc[indexer], 2.0 * df_orig.loc[indexer])
+
+        indexer = tuple(['r', 'bar'])
+        df = df_orig.copy()
+        df.loc[indexer]
*= 2.0 + assert df.loc[indexer] == 2.0 * df_orig.loc[indexer] + + indexer = tuple(['t', ['bar', 'bar2']]) + df = df_orig.copy() + df.loc[indexer] *= 2.0 + tm.assert_frame_equal(df.loc[indexer], 2.0 * df_orig.loc[indexer]) + + def test_loc_setitem_slice(self): + # GH10503 + + # assigning the same type should not change the type + df1 = DataFrame({'a': [0, 1, 1], + 'b': Series([100, 200, 300], dtype='uint32')}) + ix = df1['a'] == 1 + newb1 = df1.loc[ix, 'b'] + 1 + df1.loc[ix, 'b'] = newb1 + expected = DataFrame({'a': [0, 1, 1], + 'b': Series([100, 201, 301], dtype='uint32')}) + tm.assert_frame_equal(df1, expected) + + # assigning a new type should get the inferred type + df2 = DataFrame({'a': [0, 1, 1], 'b': [100, 200, 300]}, + dtype='uint64') + ix = df1['a'] == 1 + newb2 = df2.loc[ix, 'b'] + df1.loc[ix, 'b'] = newb2 + expected = DataFrame({'a': [0, 1, 1], 'b': [100, 200, 300]}, + dtype='uint64') + tm.assert_frame_equal(df2, expected) + + def test_loc_getitem_int(self): + + # int label + self.check_result('int label', 'loc', 2, 'ix', 2, + typs=['ints', 'uints'], axes=0) + self.check_result('int label', 'loc', 3, 'ix', 3, + typs=['ints', 'uints'], axes=1) + self.check_result('int label', 'loc', 4, 'ix', 4, + typs=['ints', 'uints'], axes=2) + self.check_result('int label', 'loc', 2, 'ix', 2, + typs=['label'], fails=KeyError) + + def test_loc_getitem_label(self): + + # label + self.check_result('label', 'loc', 'c', 'ix', 'c', typs=['labels'], + axes=0) + self.check_result('label', 'loc', 'null', 'ix', 'null', typs=['mixed'], + axes=0) + self.check_result('label', 'loc', 8, 'ix', 8, typs=['mixed'], axes=0) + self.check_result('label', 'loc', Timestamp('20130102'), 'ix', 1, + typs=['ts'], axes=0) + self.check_result('label', 'loc', 'c', 'ix', 'c', typs=['empty'], + fails=KeyError) + + def test_loc_getitem_label_out_of_range(self): + + # out of range label + self.check_result('label range', 'loc', 'f', 'ix', 'f', + typs=['ints', 'uints', 'labels', 'mixed', 'ts'], + fails=KeyError) + self.check_result('label range', 'loc', 'f', 'ix', 'f', + typs=['floats'], fails=TypeError) + self.check_result('label range', 'loc', 20, 'ix', 20, + typs=['ints', 'uints', 'mixed'], fails=KeyError) + self.check_result('label range', 'loc', 20, 'ix', 20, + typs=['labels'], fails=TypeError) + self.check_result('label range', 'loc', 20, 'ix', 20, typs=['ts'], + axes=0, fails=TypeError) + self.check_result('label range', 'loc', 20, 'ix', 20, typs=['floats'], + axes=0, fails=TypeError) + + def test_loc_getitem_label_list(self): + + # list of labels + self.check_result('list lbl', 'loc', [0, 2, 4], 'ix', [0, 2, 4], + typs=['ints', 'uints'], axes=0) + self.check_result('list lbl', 'loc', [3, 6, 9], 'ix', [3, 6, 9], + typs=['ints', 'uints'], axes=1) + self.check_result('list lbl', 'loc', [4, 8, 12], 'ix', [4, 8, 12], + typs=['ints', 'uints'], axes=2) + self.check_result('list lbl', 'loc', ['a', 'b', 'd'], 'ix', + ['a', 'b', 'd'], typs=['labels'], axes=0) + self.check_result('list lbl', 'loc', ['A', 'B', 'C'], 'ix', + ['A', 'B', 'C'], typs=['labels'], axes=1) + self.check_result('list lbl', 'loc', ['Z', 'Y', 'W'], 'ix', + ['Z', 'Y', 'W'], typs=['labels'], axes=2) + self.check_result('list lbl', 'loc', [2, 8, 'null'], 'ix', + [2, 8, 'null'], typs=['mixed'], axes=0) + self.check_result('list lbl', 'loc', + [Timestamp('20130102'), Timestamp('20130103')], 'ix', + [Timestamp('20130102'), Timestamp('20130103')], + typs=['ts'], axes=0) + + self.check_result('list lbl', 'loc', [0, 1, 2], 'indexer', [0, 1, 2], + typs=['empty'], 
fails=KeyError) + self.check_result('list lbl', 'loc', [0, 2, 3], 'ix', [0, 2, 3], + typs=['ints', 'uints'], axes=0, fails=KeyError) + self.check_result('list lbl', 'loc', [3, 6, 7], 'ix', [3, 6, 7], + typs=['ints', 'uints'], axes=1, fails=KeyError) + self.check_result('list lbl', 'loc', [4, 8, 10], 'ix', [4, 8, 10], + typs=['ints', 'uints'], axes=2, fails=KeyError) + + def test_loc_getitem_label_list_fails(self): + # fails + self.check_result('list lbl', 'loc', [20, 30, 40], 'ix', [20, 30, 40], + typs=['ints', 'uints'], axes=1, fails=KeyError) + self.check_result('list lbl', 'loc', [20, 30, 40], 'ix', [20, 30, 40], + typs=['ints', 'uints'], axes=2, fails=KeyError) + + def test_loc_getitem_label_array_like(self): + # array like + self.check_result('array like', 'loc', Series(index=[0, 2, 4]).index, + 'ix', [0, 2, 4], typs=['ints', 'uints'], axes=0) + self.check_result('array like', 'loc', Series(index=[3, 6, 9]).index, + 'ix', [3, 6, 9], typs=['ints', 'uints'], axes=1) + self.check_result('array like', 'loc', Series(index=[4, 8, 12]).index, + 'ix', [4, 8, 12], typs=['ints', 'uints'], axes=2) + + def test_loc_getitem_bool(self): + # boolean indexers + b = [True, False, True, False] + self.check_result('bool', 'loc', b, 'ix', b, + typs=['ints', 'uints', 'labels', + 'mixed', 'ts', 'floats']) + self.check_result('bool', 'loc', b, 'ix', b, typs=['empty'], + fails=KeyError) + + def test_loc_getitem_int_slice(self): + + # ok + self.check_result('int slice2', 'loc', slice(2, 4), 'ix', [2, 4], + typs=['ints', 'uints'], axes=0) + self.check_result('int slice2', 'loc', slice(3, 6), 'ix', [3, 6], + typs=['ints', 'uints'], axes=1) + self.check_result('int slice2', 'loc', slice(4, 8), 'ix', [4, 8], + typs=['ints', 'uints'], axes=2) + + # GH 3053 + # loc should treat integer slices like label slices + + index = MultiIndex.from_tuples([t for t in itertools.product( + [6, 7, 8], ['a', 'b'])]) + df = DataFrame(np.random.randn(6, 6), index, index) + result = df.loc[6:8, :] + expected = df + tm.assert_frame_equal(result, expected) + + index = MultiIndex.from_tuples([t + for t in itertools.product( + [10, 20, 30], ['a', 'b'])]) + df = DataFrame(np.random.randn(6, 6), index, index) + result = df.loc[20:30, :] + expected = df.iloc[2:] + tm.assert_frame_equal(result, expected) + + # doc examples + result = df.loc[10, :] + expected = df.iloc[0:2] + expected.index = ['a', 'b'] + tm.assert_frame_equal(result, expected) + + result = df.loc[:, 10] + # expected = df.ix[:,10] (this fails) + expected = df[10] + tm.assert_frame_equal(result, expected) + + def test_loc_to_fail(self): + + # GH3449 + df = DataFrame(np.random.random((3, 3)), + index=['a', 'b', 'c'], + columns=['e', 'f', 'g']) + + # raise a KeyError? 
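+        # (the integer lists are not labels of this string-labeled frame,
+        # and .loc should not fall back to positional indexing)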
+        pytest.raises(KeyError, df.loc.__getitem__,
+                      tuple([[1, 2], [1, 2]]))
+
+        # GH 7496
+        # loc should not fallback
+
+        s = Series()
+        s.loc[1] = 1
+        s.loc['a'] = 2
+
+        pytest.raises(KeyError, lambda: s.loc[-1])
+        pytest.raises(KeyError, lambda: s.loc[[-1, -2]])
+
+        pytest.raises(KeyError, lambda: s.loc[['4']])
+
+        s.loc[-1] = 3
+        result = s.loc[[-1, -2]]
+        expected = Series([3, np.nan], index=[-1, -2])
+        tm.assert_series_equal(result, expected)
+
+        s['a'] = 2
+        pytest.raises(KeyError, lambda: s.loc[[-2]])
+
+        del s['a']
+
+        def f():
+            s.loc[[-2]] = 0
+
+        pytest.raises(KeyError, f)
+
+        # inconsistency between .loc[values] and .loc[values,:]
+        # GH 7999
+        df = DataFrame([['a'], ['b']], index=[1, 2], columns=['value'])
+
+        def f():
+            df.loc[[3], :]
+
+        pytest.raises(KeyError, f)
+
+        def f():
+            df.loc[[3]]
+
+        pytest.raises(KeyError, f)
+
+    def test_loc_getitem_label_slice(self):
+
+        # label slices (with ints)
+        self.check_result('lab slice', 'loc', slice(1, 3),
+                          'ix', slice(1, 3),
+                          typs=['labels', 'mixed', 'empty', 'ts', 'floats'],
+                          fails=TypeError)
+
+        # real label slices
+        self.check_result('lab slice', 'loc', slice('a', 'c'),
+                          'ix', slice('a', 'c'), typs=['labels'], axes=0)
+        self.check_result('lab slice', 'loc', slice('A', 'C'),
+                          'ix', slice('A', 'C'), typs=['labels'], axes=1)
+        self.check_result('lab slice', 'loc', slice('W', 'Z'),
+                          'ix', slice('W', 'Z'), typs=['labels'], axes=2)
+
+        self.check_result('ts slice', 'loc', slice('20130102', '20130104'),
+                          'ix', slice('20130102', '20130104'),
+                          typs=['ts'], axes=0)
+        self.check_result('ts slice', 'loc', slice('20130102', '20130104'),
+                          'ix', slice('20130102', '20130104'),
+                          typs=['ts'], axes=1, fails=TypeError)
+        self.check_result('ts slice', 'loc', slice('20130102', '20130104'),
+                          'ix', slice('20130102', '20130104'),
+                          typs=['ts'], axes=2, fails=TypeError)
+
+        # GH 14316
+        self.check_result('ts slice rev', 'loc', slice('20130104', '20130102'),
+                          'indexer', [0, 1, 2], typs=['ts_rev'], axes=0)
+
+        self.check_result('mixed slice', 'loc', slice(2, 8), 'ix', slice(2, 8),
+                          typs=['mixed'], axes=0, fails=TypeError)
+        self.check_result('mixed slice', 'loc', slice(2, 8), 'ix', slice(2, 8),
+                          typs=['mixed'], axes=1, fails=KeyError)
+        self.check_result('mixed slice', 'loc', slice(2, 8), 'ix', slice(2, 8),
+                          typs=['mixed'], axes=2, fails=KeyError)
+
+        self.check_result('mixed slice', 'loc', slice(2, 4, 2), 'ix', slice(
+            2, 4, 2), typs=['mixed'], axes=0, fails=TypeError)
+
+    def test_loc_general(self):
+
+        df = DataFrame(
+            np.random.rand(4, 4), columns=['A', 'B', 'C', 'D'],
+            index=['A', 'B', 'C', 'D'])
+
+        # want this to work
+        result = df.loc[:, "A":"B"].iloc[0:2, :]
+        assert (result.columns == ['A', 'B']).all()
+        assert (result.index == ['A', 'B']).all()
+
+        # mixed type
+        result = DataFrame({'a': [Timestamp('20130101')], 'b': [1]}).iloc[0]
+        expected = Series([Timestamp('20130101'), 1], index=['a', 'b'], name=0)
+        tm.assert_series_equal(result, expected)
+        assert result.dtype == object
+
+    def test_loc_setitem_consistency(self):
+        # GH 6149
+        # coerce similarly for setitem and loc when rows have a null-slice
+        expected = DataFrame({'date': Series(0, index=range(5),
+                                             dtype=np.int64),
+                              'val': Series(range(5), dtype=np.int64)})
+
+        df = DataFrame({'date': date_range('2000-01-01', '2000-01-5'),
+                        'val': Series(
+                            range(5), dtype=np.int64)})
+        df.loc[:, 'date'] = 0
+        tm.assert_frame_equal(df, expected)
+
+        df = DataFrame({'date': date_range('2000-01-01', '2000-01-5'),
+                        'val': Series(range(5), dtype=np.int64)})
+        df.loc[:, 'date'] = np.array(0,
dtype=np.int64) + tm.assert_frame_equal(df, expected) + + df = DataFrame({'date': date_range('2000-01-01', '2000-01-5'), + 'val': Series(range(5), dtype=np.int64)}) + df.loc[:, 'date'] = np.array([0, 0, 0, 0, 0], dtype=np.int64) + tm.assert_frame_equal(df, expected) + + expected = DataFrame({'date': Series('foo', index=range(5)), + 'val': Series(range(5), dtype=np.int64)}) + df = DataFrame({'date': date_range('2000-01-01', '2000-01-5'), + 'val': Series(range(5), dtype=np.int64)}) + df.loc[:, 'date'] = 'foo' + tm.assert_frame_equal(df, expected) + + expected = DataFrame({'date': Series(1.0, index=range(5)), + 'val': Series(range(5), dtype=np.int64)}) + df = DataFrame({'date': date_range('2000-01-01', '2000-01-5'), + 'val': Series(range(5), dtype=np.int64)}) + df.loc[:, 'date'] = 1.0 + tm.assert_frame_equal(df, expected) + + def test_loc_setitem_consistency_empty(self): + # empty (essentially noops) + expected = DataFrame(columns=['x', 'y']) + expected['x'] = expected['x'].astype(np.int64) + df = DataFrame(columns=['x', 'y']) + df.loc[:, 'x'] = 1 + tm.assert_frame_equal(df, expected) + + df = DataFrame(columns=['x', 'y']) + df['x'] = 1 + tm.assert_frame_equal(df, expected) + + def test_loc_setitem_consistency_slice_column_len(self): + # .loc[:,column] setting with slice == len of the column + # GH10408 + data = """Level_0,,,Respondent,Respondent,Respondent,OtherCat,OtherCat +Level_1,,,Something,StartDate,EndDate,Yes/No,SomethingElse +Region,Site,RespondentID,,,,, +Region_1,Site_1,3987227376,A,5/25/2015 10:59,5/25/2015 11:22,Yes, +Region_1,Site_1,3980680971,A,5/21/2015 9:40,5/21/2015 9:52,Yes,Yes +Region_1,Site_2,3977723249,A,5/20/2015 8:27,5/20/2015 8:41,Yes, +Region_1,Site_2,3977723089,A,5/20/2015 8:33,5/20/2015 9:09,Yes,No""" + + df = pd.read_csv(StringIO(data), header=[0, 1], index_col=[0, 1, 2]) + df.loc[:, ('Respondent', 'StartDate')] = pd.to_datetime(df.loc[:, ( + 'Respondent', 'StartDate')]) + df.loc[:, ('Respondent', 'EndDate')] = pd.to_datetime(df.loc[:, ( + 'Respondent', 'EndDate')]) + df.loc[:, ('Respondent', 'Duration')] = df.loc[:, ( + 'Respondent', 'EndDate')] - df.loc[:, ('Respondent', 'StartDate')] + + df.loc[:, ('Respondent', 'Duration')] = df.loc[:, ( + 'Respondent', 'Duration')].astype('timedelta64[s]') + expected = Series([1380, 720, 840, 2160.], index=df.index, + name=('Respondent', 'Duration')) + tm.assert_series_equal(df[('Respondent', 'Duration')], expected) + + def test_loc_setitem_frame(self): + df = self.frame_labels + + result = df.iloc[0, 0] + + df.loc['a', 'A'] = 1 + result = df.loc['a', 'A'] + assert result == 1 + + result = df.iloc[0, 0] + assert result == 1 + + df.loc[:, 'B':'D'] = 0 + expected = df.loc[:, 'B':'D'] + result = df.iloc[:, 1:] + tm.assert_frame_equal(result, expected) + + # GH 6254 + # setting issue + df = DataFrame(index=[3, 5, 4], columns=['A']) + df.loc[[4, 3, 5], 'A'] = np.array([1, 2, 3], dtype='int64') + expected = DataFrame(dict(A=Series( + [1, 2, 3], index=[4, 3, 5]))).reindex(index=[3, 5, 4]) + tm.assert_frame_equal(df, expected) + + # GH 6252 + # setting with an empty frame + keys1 = ['@' + str(i) for i in range(5)] + val1 = np.arange(5, dtype='int64') + + keys2 = ['@' + str(i) for i in range(4)] + val2 = np.arange(4, dtype='int64') + + index = list(set(keys1).union(keys2)) + df = DataFrame(index=index) + df['A'] = np.nan + df.loc[keys1, 'A'] = val1 + + df['B'] = np.nan + df.loc[keys2, 'B'] = val2 + + expected = DataFrame(dict(A=Series(val1, index=keys1), B=Series( + val2, index=keys2))).reindex(index=index) + tm.assert_frame_equal(df, 
expected)
+
+        # GH 8669
+        # invalid coercion of nan -> int
+        df = DataFrame({'A': [1, 2, 3], 'B': np.nan})
+        df.loc[df.B > df.A, 'B'] = df.A
+        expected = DataFrame({'A': [1, 2, 3], 'B': np.nan})
+        tm.assert_frame_equal(df, expected)
+
+        # GH 6546
+        # setting with mixed labels
+        df = DataFrame({1: [1, 2], 2: [3, 4], 'a': ['a', 'b']})
+
+        result = df.loc[0, [1, 2]]
+        expected = Series([1, 3], index=[1, 2], dtype=object, name=0)
+        tm.assert_series_equal(result, expected)
+
+        expected = DataFrame({1: [5, 2], 2: [6, 4], 'a': ['a', 'b']})
+        df.loc[0, [1, 2]] = [5, 6]
+        tm.assert_frame_equal(df, expected)
+
+    def test_loc_setitem_frame_multiples(self):
+        # multiple setting
+        df = DataFrame({'A': ['foo', 'bar', 'baz'],
+                        'B': Series(
+                            range(3), dtype=np.int64)})
+        rhs = df.loc[1:2]
+        rhs.index = df.index[0:2]
+        df.loc[0:1] = rhs
+        expected = DataFrame({'A': ['bar', 'baz', 'baz'],
+                              'B': Series(
+                                  [1, 2, 2], dtype=np.int64)})
+        tm.assert_frame_equal(df, expected)
+
+        # multiple setting with frame on rhs (with M8)
+        df = DataFrame({'date': date_range('2000-01-01', '2000-01-5'),
+                        'val': Series(
+                            range(5), dtype=np.int64)})
+        expected = DataFrame({'date': [Timestamp('20000101'), Timestamp(
+            '20000102'), Timestamp('20000101'), Timestamp('20000102'),
+            Timestamp('20000103')],
+            'val': Series(
+            [0, 1, 0, 1, 2], dtype=np.int64)})
+        rhs = df.loc[0:2]
+        rhs.index = df.index[2:5]
+        df.loc[2:4] = rhs
+        tm.assert_frame_equal(df, expected)
+
+    def test_loc_coercion(self):
+
+        # 12411
+        df = DataFrame({'date': [pd.Timestamp('20130101').tz_localize('UTC'),
+                                 pd.NaT]})
+        expected = df.dtypes
+
+        result = df.iloc[[0]]
+        tm.assert_series_equal(result.dtypes, expected)
+
+        result = df.iloc[[1]]
+        tm.assert_series_equal(result.dtypes, expected)
+
+        # 12045
+        import datetime
+        df = DataFrame({'date': [datetime.datetime(2012, 1, 1),
+                                 datetime.datetime(1012, 1, 2)]})
+        expected = df.dtypes
+
+        result = df.iloc[[0]]
+        tm.assert_series_equal(result.dtypes, expected)
+
+        result = df.iloc[[1]]
+        tm.assert_series_equal(result.dtypes, expected)
+
+        # 11594
+        df = DataFrame({'text': ['some words'] + [None] * 9})
+        expected = df.dtypes
+
+        result = df.iloc[0:2]
+        tm.assert_series_equal(result.dtypes, expected)
+
+        result = df.iloc[3:]
+        tm.assert_series_equal(result.dtypes, expected)
+
+    def test_loc_non_unique(self):
+        # GH3659
+        # non-unique indexer with loc slice
+        # https://groups.google.com/forum/?fromgroups#!topic/pydata/zTm2No0crYs
+
+        # these are going to raise because we are non-monotonic
+        df = DataFrame({'A': [1, 2, 3, 4, 5, 6],
+                        'B': [3, 4, 5, 6, 7, 8]}, index=[0, 1, 0, 1, 2, 3])
+        pytest.raises(KeyError, df.loc.__getitem__,
+                      tuple([slice(1, None)]))
+        pytest.raises(KeyError, df.loc.__getitem__,
+                      tuple([slice(0, None)]))
+        pytest.raises(KeyError, df.loc.__getitem__, tuple([slice(1, 2)]))
+
+        # monotonic are ok
+        df = DataFrame({'A': [1, 2, 3, 4, 5, 6],
+                        'B': [3, 4, 5, 6, 7, 8]},
+                       index=[0, 1, 0, 1, 2, 3]).sort_index(axis=0)
+        result = df.loc[1:]
+        expected = DataFrame({'A': [2, 4, 5, 6], 'B': [4, 6, 7, 8]},
+                             index=[1, 1, 2, 3])
+        tm.assert_frame_equal(result, expected)
+
+        result = df.loc[0:]
+        tm.assert_frame_equal(result, df)
+
+        result = df.loc[1:2]
+        expected = DataFrame({'A': [2, 4, 5], 'B': [4, 6, 7]},
+                             index=[1, 1, 2])
+        tm.assert_frame_equal(result, expected)
+
+    def test_loc_non_unique_memory_error(self):
+
+        # GH 4280
+        # non-unique index with a large selection triggers a memory error
+
+        columns = list('ABCDEFG')
+
+        def gen_test(l, l2):
+            return pd.concat([
+                DataFrame(np.random.randn(l,
len(columns)), + index=lrange(l), columns=columns), + DataFrame(np.ones((l2, len(columns))), + index=[0] * l2, columns=columns)]) + + def gen_expected(df, mask): + l = len(mask) + return pd.concat([df.take([0], convert=False), + DataFrame(np.ones((l, len(columns))), + index=[0] * l, + columns=columns), + df.take(mask[1:], convert=False)]) + + df = gen_test(900, 100) + assert not df.index.is_unique + + mask = np.arange(100) + result = df.loc[mask] + expected = gen_expected(df, mask) + tm.assert_frame_equal(result, expected) + + df = gen_test(900000, 100000) + assert not df.index.is_unique + + mask = np.arange(100000) + result = df.loc[mask] + expected = gen_expected(df, mask) + tm.assert_frame_equal(result, expected) + + def test_loc_name(self): + # GH 3880 + df = DataFrame([[1, 1], [1, 1]]) + df.index.name = 'index_name' + result = df.iloc[[0, 1]].index.name + assert result == 'index_name' + + with catch_warnings(record=True): + result = df.ix[[0, 1]].index.name + assert result == 'index_name' + + result = df.loc[[0, 1]].index.name + assert result == 'index_name' + + def test_loc_empty_list_indexer_is_ok(self): + from pandas.util.testing import makeCustomDataframe as mkdf + df = mkdf(5, 2) + # vertical empty + tm.assert_frame_equal(df.loc[:, []], df.iloc[:, :0], + check_index_type=True, check_column_type=True) + # horizontal empty + tm.assert_frame_equal(df.loc[[], :], df.iloc[:0, :], + check_index_type=True, check_column_type=True) + # horizontal empty + tm.assert_frame_equal(df.loc[[]], df.iloc[:0, :], + check_index_type=True, + check_column_type=True) + + def test_identity_slice_returns_new_object(self): + # GH13873 + original_df = DataFrame({'a': [1, 2, 3]}) + sliced_df = original_df.loc[:] + assert sliced_df is not original_df + assert original_df[:] is not original_df + + # should be a shallow copy + original_df['a'] = [4, 4, 4] + assert (sliced_df['a'] == 4).all() + + # These should not return copies + assert original_df is original_df.loc[:, :] + df = DataFrame(np.random.randn(10, 4)) + assert df[0] is df.loc[:, 0] + + # Same tests for Series + original_series = Series([1, 2, 3, 4, 5, 6]) + sliced_series = original_series.loc[:] + assert sliced_series is not original_series + assert original_series[:] is not original_series + + original_series[:3] = [7, 8, 9] + assert all(sliced_series[:3] == [7, 8, 9]) diff --git a/pandas/tests/indexing/test_multiindex.py b/pandas/tests/indexing/test_multiindex.py new file mode 100644 index 0000000000000..c12bb8910ffc9 --- /dev/null +++ b/pandas/tests/indexing/test_multiindex.py @@ -0,0 +1,1308 @@ +from warnings import catch_warnings +import pytest +import numpy as np +import pandas as pd +from pandas import (Panel, Series, MultiIndex, DataFrame, + Timestamp, Index, date_range) +from pandas.util import testing as tm +from pandas.errors import PerformanceWarning, UnsortedIndexError +from pandas.tests.indexing.common import _mklbl + + +class TestMultiIndexBasic(object): + + def test_iloc_getitem_multiindex2(self): + # TODO(wesm): fix this + pytest.skip('this test was being suppressed, ' + 'needs to be fixed') + + arr = np.random.randn(3, 3) + df = DataFrame(arr, columns=[[2, 2, 4], [6, 8, 10]], + index=[[4, 4, 8], [8, 10, 12]]) + + rs = df.iloc[2] + xp = Series(arr[2], index=df.columns) + tm.assert_series_equal(rs, xp) + + rs = df.iloc[:, 2] + xp = Series(arr[:, 2], index=df.index) + tm.assert_series_equal(rs, xp) + + rs = df.iloc[2, 2] + xp = df.values[2, 2] + assert rs == xp + + # for multiple items + # GH 5528 + rs = df.iloc[[0, 1]] + xp = 
df.xs(4, drop_level=False) + tm.assert_frame_equal(rs, xp) + + tup = zip(*[['a', 'a', 'b', 'b'], ['x', 'y', 'x', 'y']]) + index = MultiIndex.from_tuples(tup) + df = DataFrame(np.random.randn(4, 4), index=index) + rs = df.iloc[[2, 3]] + xp = df.xs('b', drop_level=False) + tm.assert_frame_equal(rs, xp) + + def test_setitem_multiindex(self): + with catch_warnings(record=True): + + for index_fn in ('ix', 'loc'): + + def assert_equal(a, b): + assert a == b + + def check(target, indexers, value, compare_fn, expected=None): + fn = getattr(target, index_fn) + fn.__setitem__(indexers, value) + result = fn.__getitem__(indexers) + if expected is None: + expected = value + compare_fn(result, expected) + # GH7190 + index = pd.MultiIndex.from_product([np.arange(0, 100), + np.arange(0, 80)], + names=['time', 'firm']) + t, n = 0, 2 + df = DataFrame(np.nan, columns=['A', 'w', 'l', 'a', 'x', + 'X', 'd', 'profit'], + index=index) + check(target=df, indexers=((t, n), 'X'), value=0, + compare_fn=assert_equal) + + df = DataFrame(-999, columns=['A', 'w', 'l', 'a', 'x', + 'X', 'd', 'profit'], + index=index) + check(target=df, indexers=((t, n), 'X'), value=1, + compare_fn=assert_equal) + + df = DataFrame(columns=['A', 'w', 'l', 'a', 'x', + 'X', 'd', 'profit'], + index=index) + check(target=df, indexers=((t, n), 'X'), value=2, + compare_fn=assert_equal) + + # gh-7218: assigning with 0-dim arrays + df = DataFrame(-999, columns=['A', 'w', 'l', 'a', 'x', + 'X', 'd', 'profit'], + index=index) + check(target=df, + indexers=((t, n), 'X'), + value=np.array(3), + compare_fn=assert_equal, + expected=3, ) + + # GH5206 + df = pd.DataFrame(np.arange(25).reshape(5, 5), + columns='A,B,C,D,E'.split(','), dtype=float) + df['F'] = 99 + row_selection = df['A'] % 2 == 0 + col_selection = ['B', 'C'] + with catch_warnings(record=True): + df.ix[row_selection, col_selection] = df['F'] + output = pd.DataFrame(99., index=[0, 2, 4], columns=['B', 'C']) + with catch_warnings(record=True): + tm.assert_frame_equal(df.ix[row_selection, col_selection], + output) + check(target=df, + indexers=(row_selection, col_selection), + value=df['F'], + compare_fn=tm.assert_frame_equal, + expected=output, ) + + # GH11372 + idx = pd.MultiIndex.from_product([ + ['A', 'B', 'C'], + pd.date_range('2015-01-01', '2015-04-01', freq='MS')]) + cols = pd.MultiIndex.from_product([ + ['foo', 'bar'], + pd.date_range('2016-01-01', '2016-02-01', freq='MS')]) + + df = pd.DataFrame(np.random.random((12, 4)), + index=idx, columns=cols) + + subidx = pd.MultiIndex.from_tuples( + [('A', pd.Timestamp('2015-01-01')), + ('A', pd.Timestamp('2015-02-01'))]) + subcols = pd.MultiIndex.from_tuples( + [('foo', pd.Timestamp('2016-01-01')), + ('foo', pd.Timestamp('2016-02-01'))]) + + vals = pd.DataFrame(np.random.random((2, 2)), + index=subidx, columns=subcols) + check(target=df, + indexers=(subidx, subcols), + value=vals, + compare_fn=tm.assert_frame_equal, ) + # set all columns + vals = pd.DataFrame( + np.random.random((2, 4)), index=subidx, columns=cols) + check(target=df, + indexers=(subidx, slice(None, None, None)), + value=vals, + compare_fn=tm.assert_frame_equal, ) + # identity + copy = df.copy() + check(target=df, indexers=(df.index, df.columns), value=df, + compare_fn=tm.assert_frame_equal, expected=copy) + + def test_loc_getitem_series(self): + # GH14730 + # passing a series as a key with a MultiIndex + index = MultiIndex.from_product([[1, 2, 3], ['A', 'B', 'C']]) + x = Series(index=index, data=range(9), dtype=np.float64) + y = Series([1, 3]) + expected = Series( + data=[0, 1, 2, 
6, 7, 8], + index=MultiIndex.from_product([[1, 3], ['A', 'B', 'C']]), + dtype=np.float64) + result = x.loc[y] + tm.assert_series_equal(result, expected) + + result = x.loc[[1, 3]] + tm.assert_series_equal(result, expected) + + # GH15424 + y1 = Series([1, 3], index=[1, 2]) + result = x.loc[y1] + tm.assert_series_equal(result, expected) + + empty = Series(data=[], dtype=np.float64) + expected = Series([], index=MultiIndex( + levels=index.levels, labels=[[], []], dtype=np.float64)) + result = x.loc[empty] + tm.assert_series_equal(result, expected) + + def test_loc_getitem_array(self): + # GH15434 + # passing an array as a key with a MultiIndex + index = MultiIndex.from_product([[1, 2, 3], ['A', 'B', 'C']]) + x = Series(index=index, data=range(9), dtype=np.float64) + y = np.array([1, 3]) + expected = Series( + data=[0, 1, 2, 6, 7, 8], + index=MultiIndex.from_product([[1, 3], ['A', 'B', 'C']]), + dtype=np.float64) + result = x.loc[y] + tm.assert_series_equal(result, expected) + + # empty array: + empty = np.array([]) + expected = Series([], index=MultiIndex( + levels=index.levels, labels=[[], []], dtype=np.float64)) + result = x.loc[empty] + tm.assert_series_equal(result, expected) + + # 0-dim array (scalar): + scalar = np.int64(1) + expected = Series( + data=[0, 1, 2], + index=['A', 'B', 'C'], + dtype=np.float64) + result = x.loc[scalar] + tm.assert_series_equal(result, expected) + + def test_iloc_getitem_multiindex(self): + mi_labels = DataFrame(np.random.randn(4, 3), + columns=[['i', 'i', 'j'], ['A', 'A', 'B']], + index=[['i', 'i', 'j', 'k'], + ['X', 'X', 'Y', 'Y']]) + + mi_int = DataFrame(np.random.randn(3, 3), + columns=[[2, 2, 4], [6, 8, 10]], + index=[[4, 4, 8], [8, 10, 12]]) + + # the first row + rs = mi_int.iloc[0] + with catch_warnings(record=True): + xp = mi_int.ix[4].ix[8] + tm.assert_series_equal(rs, xp, check_names=False) + assert rs.name == (4, 8) + assert xp.name == 8 + + # 2nd (last) columns + rs = mi_int.iloc[:, 2] + with catch_warnings(record=True): + xp = mi_int.ix[:, 2] + tm.assert_series_equal(rs, xp) + + # corner column + rs = mi_int.iloc[2, 2] + with catch_warnings(record=True): + xp = mi_int.ix[:, 2].ix[2] + assert rs == xp + + # this is basically regular indexing + rs = mi_labels.iloc[2, 2] + with catch_warnings(record=True): + xp = mi_labels.ix['j'].ix[:, 'j'].ix[0, 0] + assert rs == xp + + def test_loc_multiindex(self): + + mi_labels = DataFrame(np.random.randn(3, 3), + columns=[['i', 'i', 'j'], ['A', 'A', 'B']], + index=[['i', 'i', 'j'], ['X', 'X', 'Y']]) + + mi_int = DataFrame(np.random.randn(3, 3), + columns=[[2, 2, 4], [6, 8, 10]], + index=[[4, 4, 8], [8, 10, 12]]) + + # the first row + rs = mi_labels.loc['i'] + with catch_warnings(record=True): + xp = mi_labels.ix['i'] + tm.assert_frame_equal(rs, xp) + + # 2nd (last) columns + rs = mi_labels.loc[:, 'j'] + with catch_warnings(record=True): + xp = mi_labels.ix[:, 'j'] + tm.assert_frame_equal(rs, xp) + + # corner column + rs = mi_labels.loc['j'].loc[:, 'j'] + with catch_warnings(record=True): + xp = mi_labels.ix['j'].ix[:, 'j'] + tm.assert_frame_equal(rs, xp) + + # with a tuple + rs = mi_labels.loc[('i', 'X')] + with catch_warnings(record=True): + xp = mi_labels.ix[('i', 'X')] + tm.assert_frame_equal(rs, xp) + + rs = mi_int.loc[4] + with catch_warnings(record=True): + xp = mi_int.ix[4] + tm.assert_frame_equal(rs, xp) + + def test_getitem_partial_int(self): + # GH 12416 + # with single item + l1 = [10, 20] + l2 = ['a', 'b'] + df = DataFrame(index=range(2), + columns=pd.MultiIndex.from_product([l1, l2])) + expected = 
DataFrame(index=range(2), + columns=l2) + result = df[20] + tm.assert_frame_equal(result, expected) + + # with list + expected = DataFrame(index=range(2), + columns=pd.MultiIndex.from_product([l1[1:], l2])) + result = df[[20]] + tm.assert_frame_equal(result, expected) + + # missing item: + with tm.assert_raises_regex(KeyError, '1'): + df[1] + with tm.assert_raises_regex(KeyError, "'\[1\] not in index'"): + df[[1]] + + def test_loc_multiindex_indexer_none(self): + + # GH6788 + # multi-index indexer is None (meaning take all) + attributes = ['Attribute' + str(i) for i in range(1)] + attribute_values = ['Value' + str(i) for i in range(5)] + + index = MultiIndex.from_product([attributes, attribute_values]) + df = 0.1 * np.random.randn(10, 1 * 5) + 0.5 + df = DataFrame(df, columns=index) + result = df[attributes] + tm.assert_frame_equal(result, df) + + # GH 7349 + # loc with a multi-index seems to be doing fallback + df = DataFrame(np.arange(12).reshape(-1, 1), + index=pd.MultiIndex.from_product([[1, 2, 3, 4], + [1, 2, 3]])) + + expected = df.loc[([1, 2], ), :] + result = df.loc[[1, 2]] + tm.assert_frame_equal(result, expected) + + def test_loc_multiindex_incomplete(self): + + # GH 7399 + # incomplete indexers + s = pd.Series(np.arange(15, dtype='int64'), + MultiIndex.from_product([range(5), ['a', 'b', 'c']])) + expected = s.loc[:, 'a':'c'] + + result = s.loc[0:4, 'a':'c'] + tm.assert_series_equal(result, expected) + + result = s.loc[:4, 'a':'c'] + tm.assert_series_equal(result, expected) + + result = s.loc[0:, 'a':'c'] + tm.assert_series_equal(result, expected) + + # GH 7400 + # multiindexer getitem with a list of indexers skips the wrong element + s = pd.Series(np.arange(15, dtype='int64'), + MultiIndex.from_product([range(5), ['a', 'b', 'c']])) + expected = s.iloc[[6, 7, 8, 12, 13, 14]] + result = s.loc[2:4:2, 'a':'c'] + tm.assert_series_equal(result, expected) + + def test_multiindex_perf_warn(self): + + df = DataFrame({'jim': [0, 0, 1, 1], + 'joe': ['x', 'x', 'z', 'y'], + 'jolie': np.random.rand(4)}).set_index(['jim', 'joe']) + + with tm.assert_produces_warning(PerformanceWarning, + clear=[pd.core.index]): + df.loc[(1, 'z')] + + df = df.iloc[[2, 1, 3, 0]] + with tm.assert_produces_warning(PerformanceWarning): + df.loc[(0, )] + + def test_series_getitem_multiindex(self): + + # GH 6018 + # series regression getitem with a multi-index + + s = Series([1, 2, 3]) + s.index = MultiIndex.from_tuples([(0, 0), (1, 1), (2, 1)]) + + result = s[:, 0] + expected = Series([1], index=[0]) + tm.assert_series_equal(result, expected) + + result = s.loc[:, 1] + expected = Series([2, 3], index=[1, 2]) + tm.assert_series_equal(result, expected) + + # xs + result = s.xs(0, level=0) + expected = Series([1], index=[0]) + tm.assert_series_equal(result, expected) + + result = s.xs(1, level=1) + expected = Series([2, 3], index=[1, 2]) + tm.assert_series_equal(result, expected) + + # GH6258 + dt = list(date_range('20130903', periods=3)) + idx = MultiIndex.from_product([list('AB'), dt]) + s = Series([1, 3, 4, 1, 3, 4], index=idx) + + result = s.xs('20130903', level=1) + expected = Series([1, 1], index=list('AB')) + tm.assert_series_equal(result, expected) + + # GH5684 + idx = MultiIndex.from_tuples([('a', 'one'), ('a', 'two'), ('b', 'one'), + ('b', 'two')]) + s = Series([1, 2, 3, 4], index=idx) + s.index.set_names(['L1', 'L2'], inplace=True) + result = s.xs('one', level='L2') + expected =
Series([1, 3], index=['a', 'b']) + expected.index.set_names(['L1'], inplace=True) + tm.assert_series_equal(result, expected) + + def test_xs_multiindex(self): + + # GH2903 + columns = MultiIndex.from_tuples( + [('a', 'foo'), ('a', 'bar'), ('b', 'hello'), + ('b', 'world')], names=['lvl0', 'lvl1']) + df = DataFrame(np.random.randn(4, 4), columns=columns) + df.sort_index(axis=1, inplace=True) + result = df.xs('a', level='lvl0', axis=1) + expected = df.iloc[:, 0:2].loc[:, 'a'] + tm.assert_frame_equal(result, expected) + + result = df.xs('foo', level='lvl1', axis=1) + expected = df.iloc[:, 1:2].copy() + expected.columns = expected.columns.droplevel('lvl1') + tm.assert_frame_equal(result, expected) + + def test_multiindex_setitem(self): + + # GH 3738 + # setting with a multi-index right hand side + arrays = [np.array(['bar', 'bar', 'baz', 'qux', 'qux', 'bar']), + np.array(['one', 'two', 'one', 'one', 'two', 'one']), + np.arange(0, 6, 1)] + + df_orig = pd.DataFrame(np.random.randn(6, 3), + index=arrays, + columns=['A', 'B', 'C']).sort_index() + + expected = df_orig.loc[['bar']] * 2 + df = df_orig.copy() + df.loc[['bar']] *= 2 + tm.assert_frame_equal(df.loc[['bar']], expected) + + # raise because these have differing levels + def f(): + df.loc['bar'] *= 2 + + pytest.raises(TypeError, f) + + # from SO + # http://stackoverflow.com/questions/24572040/pandas-access-the-level-of-multiindex-for-inplace-operation + df_orig = DataFrame.from_dict({'price': { + ('DE', 'Coal', 'Stock'): 2, + ('DE', 'Gas', 'Stock'): 4, + ('DE', 'Elec', 'Demand'): 1, + ('FR', 'Gas', 'Stock'): 5, + ('FR', 'Solar', 'SupIm'): 0, + ('FR', 'Wind', 'SupIm'): 0 + }}) + df_orig.index = MultiIndex.from_tuples(df_orig.index, + names=['Sit', 'Com', 'Type']) + + expected = df_orig.copy() + expected.iloc[[0, 2, 3]] *= 2 + + idx = pd.IndexSlice + df = df_orig.copy() + df.loc[idx[:, :, 'Stock'], :] *= 2 + tm.assert_frame_equal(df, expected) + + df = df_orig.copy() + df.loc[idx[:, :, 'Stock'], 'price'] *= 2 + tm.assert_frame_equal(df, expected) + + def test_getitem_duplicates_multiindex(self): + # GH 5725 the 'A' happens to be a valid Timestamp so this doesn't raise + # the appropriate error, only in PY3 of course!
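+ # Editor's note (illustrative, not part of the original test): the first + # index below uses levels ['D', 'B', 'C'], so df.val['A'] and df.val['X'] + # must both raise KeyError; the second uses levels ['A', 'B', 'C'], where + # the same lookup df.val['A'] succeeds even though 'A' also parses as a + # Timestamp.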
+ + index = MultiIndex(levels=[['D', 'B', 'C'], + [0, 26, 27, 37, 57, 67, 75, 82]], + labels=[[0, 0, 0, 1, 2, 2, 2, 2, 2, 2], + [1, 3, 4, 6, 0, 2, 2, 3, 5, 7]], + names=['tag', 'day']) + arr = np.random.randn(len(index), 1) + df = DataFrame(arr, index=index, columns=['val']) + result = df.val['D'] + expected = Series(arr.ravel()[0:3], name='val', index=Index( + [26, 37, 57], name='day')) + tm.assert_series_equal(result, expected) + + def f(): + df.val['A'] + + pytest.raises(KeyError, f) + + def f(): + df.val['X'] + + pytest.raises(KeyError, f) + + # A is treated as a special Timestamp + index = MultiIndex(levels=[['A', 'B', 'C'], + [0, 26, 27, 37, 57, 67, 75, 82]], + labels=[[0, 0, 0, 1, 2, 2, 2, 2, 2, 2], + [1, 3, 4, 6, 0, 2, 2, 3, 5, 7]], + names=['tag', 'day']) + df = DataFrame(arr, index=index, columns=['val']) + result = df.val['A'] + expected = Series(arr.ravel()[0:3], name='val', index=Index( + [26, 37, 57], name='day')) + tm.assert_series_equal(result, expected) + + def f(): + df.val['X'] + + pytest.raises(KeyError, f) + + # GH 7866 + # multi-index slicing with missing indexers + idx = pd.MultiIndex.from_product([['A', 'B', 'C'], + ['foo', 'bar', 'baz']], + names=['one', 'two']) + s = pd.Series(np.arange(9, dtype='int64'), index=idx).sort_index() + + exp_idx = pd.MultiIndex.from_product([['A'], ['foo', 'bar', 'baz']], + names=['one', 'two']) + expected = pd.Series(np.arange(3, dtype='int64'), + index=exp_idx).sort_index() + + result = s.loc[['A']] + tm.assert_series_equal(result, expected) + result = s.loc[['A', 'D']] + tm.assert_series_equal(result, expected) + + # not any values found + pytest.raises(KeyError, lambda: s.loc[['D']]) + + # empty ok + result = s.loc[[]] + expected = s.iloc[[]] + tm.assert_series_equal(result, expected) + + idx = pd.IndexSlice + expected = pd.Series([0, 3, 6], index=pd.MultiIndex.from_product( + [['A', 'B', 'C'], ['foo']], names=['one', 'two'])).sort_index() + + result = s.loc[idx[:, ['foo']]] + tm.assert_series_equal(result, expected) + result = s.loc[idx[:, ['foo', 'bah']]] + tm.assert_series_equal(result, expected) + + # GH 8737 + # empty indexer + multi_index = pd.MultiIndex.from_product((['foo', 'bar', 'baz'], + ['alpha', 'beta'])) + df = DataFrame( + np.random.randn(5, 6), index=range(5), columns=multi_index) + df = df.sort_index(level=0, axis=1) + + expected = DataFrame(index=range(5), + columns=multi_index.reindex([])[0]) + result1 = df.loc[:, ([], slice(None))] + result2 = df.loc[:, (['foo'], [])] + tm.assert_frame_equal(result1, expected) + tm.assert_frame_equal(result2, expected) + + # regression from < 0.14.0 + # GH 7914 + df = DataFrame([[np.mean, np.median], ['mean', 'median']], + columns=MultiIndex.from_tuples([('functs', 'mean'), + ('functs', 'median')]), + index=['function', 'name']) + result = df.loc['function', ('functs', 'mean')] + assert result == np.mean + + def test_multiindex_assignment(self): + + # GH3777 part 2 + + # mixed dtype + df = DataFrame(np.random.randint(5, 10, size=9).reshape(3, 3), + columns=list('abc'), + index=[[4, 4, 8], [8, 10, 12]]) + df['d'] = np.nan + arr = np.array([0., 1.]) + + with catch_warnings(record=True): + df.ix[4, 'd'] = arr + tm.assert_series_equal(df.ix[4, 'd'], + Series(arr, index=[8, 10], name='d')) + + # single dtype + df = DataFrame(np.random.randint(5, 10, size=9).reshape(3, 3), + columns=list('abc'), + index=[[4, 4, 8], [8, 10, 12]]) + + with catch_warnings(record=True): + df.ix[4, 'c'] = arr + exp = Series(arr, index=[8, 10], name='c', dtype='float64') + tm.assert_series_equal(df.ix[4, 
'c'], exp) + + # scalar ok + with catch_warnings(record=True): + df.ix[4, 'c'] = 10 + exp = Series(10, index=[8, 10], name='c', dtype='float64') + tm.assert_series_equal(df.ix[4, 'c'], exp) + + # invalid assignments + def f(): + with catch_warnings(record=True): + df.ix[4, 'c'] = [0, 1, 2, 3] + + pytest.raises(ValueError, f) + + def f(): + with catch_warnings(record=True): + df.ix[4, 'c'] = [0] + + pytest.raises(ValueError, f) + + # groupby example + NUM_ROWS = 100 + NUM_COLS = 10 + col_names = ['A' + num for num in + map(str, np.arange(NUM_COLS).tolist())] + index_cols = col_names[:5] + + df = DataFrame(np.random.randint(5, size=(NUM_ROWS, NUM_COLS)), + dtype=np.int64, columns=col_names) + df = df.set_index(index_cols).sort_index() + grp = df.groupby(level=index_cols[:4]) + df['new_col'] = np.nan + + f_index = np.arange(5) + + def f(name, df2): + return Series(np.arange(df2.shape[0]), + name=df2.index.values[0]).reindex(f_index) + + # TODO(wesm): unused? + # new_df = pd.concat([f(name, df2) for name, df2 in grp], axis=1).T + + # we are actually operating on a copy here + # but in this case, that's ok + for name, df2 in grp: + new_vals = np.arange(df2.shape[0]) + with catch_warnings(record=True): + df.ix[name, 'new_col'] = new_vals + + def test_multiindex_label_slicing_with_negative_step(self): + s = Series(np.arange(20), + MultiIndex.from_product([list('abcde'), np.arange(4)])) + SLC = pd.IndexSlice + + def assert_slices_equivalent(l_slc, i_slc): + tm.assert_series_equal(s.loc[l_slc], s.iloc[i_slc]) + tm.assert_series_equal(s[l_slc], s.iloc[i_slc]) + with catch_warnings(record=True): + tm.assert_series_equal(s.ix[l_slc], s.iloc[i_slc]) + + assert_slices_equivalent(SLC[::-1], SLC[::-1]) + + assert_slices_equivalent(SLC['d'::-1], SLC[15::-1]) + assert_slices_equivalent(SLC[('d', )::-1], SLC[15::-1]) + + assert_slices_equivalent(SLC[:'d':-1], SLC[:11:-1]) + assert_slices_equivalent(SLC[:('d', ):-1], SLC[:11:-1]) + + assert_slices_equivalent(SLC['d':'b':-1], SLC[15:3:-1]) + assert_slices_equivalent(SLC[('d', ):'b':-1], SLC[15:3:-1]) + assert_slices_equivalent(SLC['d':('b', ):-1], SLC[15:3:-1]) + assert_slices_equivalent(SLC[('d', ):('b', ):-1], SLC[15:3:-1]) + assert_slices_equivalent(SLC['b':'d':-1], SLC[:0]) + + assert_slices_equivalent(SLC[('c', 2)::-1], SLC[10::-1]) + assert_slices_equivalent(SLC[:('c', 2):-1], SLC[:9:-1]) + assert_slices_equivalent(SLC[('e', 0):('c', 2):-1], SLC[16:9:-1]) + + def test_multiindex_slice_first_level(self): + # GH 12697 + freq = ['a', 'b', 'c', 'd'] + idx = pd.MultiIndex.from_product([freq, np.arange(500)]) + df = pd.DataFrame(list(range(2000)), index=idx, columns=['Test']) + df_slice = df.loc[pd.IndexSlice[:, 30:70], :] + result = df_slice.loc['a'] + expected = pd.DataFrame(list(range(30, 71)), + columns=['Test'], + index=range(30, 71)) + tm.assert_frame_equal(result, expected) + result = df_slice.loc['d'] + expected = pd.DataFrame(list(range(1530, 1571)), + columns=['Test'], + index=range(30, 71)) + tm.assert_frame_equal(result, expected) + + def test_multiindex_symmetric_difference(self): + # GH 13490 + idx = MultiIndex.from_product([['a', 'b'], ['A', 'B']], + names=['a', 'b']) + result = idx ^ idx + assert result.names == idx.names + + idx2 = idx.copy().rename(['A', 'B']) + result = idx ^ idx2 + assert result.names == [None, None] + + +class TestMultiIndexSlicers(object): + + def test_per_axis_per_level_getitem(self): + + # GH6134 + # example test case + ix = MultiIndex.from_product([_mklbl('A', 5), _mklbl('B', 7), _mklbl( + 'C', 4), _mklbl('D', 2)]) + 
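# NOTE (editor's sketch): throughout this class, a tuple-of-slicers + # expression such as df.loc[(slice('A1', 'A3'), slice(None), + # ['C1', 'C3']), :] is checked against an 'expected' frame rebuilt by + # filtering the raw index tuples with a list comprehension, so each + # assertion verifies the slicer against a brute-force row selection. +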
df = DataFrame(np.arange(len(ix.get_values())), index=ix) + + result = df.loc[(slice('A1', 'A3'), slice(None), ['C1', 'C3']), :] + expected = df.loc[[tuple([a, b, c, d]) + for a, b, c, d in df.index.values + if (a == 'A1' or a == 'A2' or a == 'A3') and ( + c == 'C1' or c == 'C3')]] + tm.assert_frame_equal(result, expected) + + expected = df.loc[[tuple([a, b, c, d]) + for a, b, c, d in df.index.values + if (a == 'A1' or a == 'A2' or a == 'A3') and ( + c == 'C1' or c == 'C2' or c == 'C3')]] + result = df.loc[(slice('A1', 'A3'), slice(None), slice('C1', 'C3')), :] + tm.assert_frame_equal(result, expected) + + # test multi-index slicing with per axis and per index controls + index = MultiIndex.from_tuples([('A', 1), ('A', 2), + ('A', 3), ('B', 1)], + names=['one', 'two']) + columns = MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'), + ('b', 'foo'), ('b', 'bah')], + names=['lvl0', 'lvl1']) + + df = DataFrame( + np.arange(16, dtype='int64').reshape( + 4, 4), index=index, columns=columns) + df = df.sort_index(axis=0).sort_index(axis=1) + + # identity + result = df.loc[(slice(None), slice(None)), :] + tm.assert_frame_equal(result, df) + result = df.loc[(slice(None), slice(None)), (slice(None), slice(None))] + tm.assert_frame_equal(result, df) + result = df.loc[:, (slice(None), slice(None))] + tm.assert_frame_equal(result, df) + + # index + result = df.loc[(slice(None), [1]), :] + expected = df.iloc[[0, 3]] + tm.assert_frame_equal(result, expected) + + result = df.loc[(slice(None), 1), :] + expected = df.iloc[[0, 3]] + tm.assert_frame_equal(result, expected) + + # columns + result = df.loc[:, (slice(None), ['foo'])] + expected = df.iloc[:, [1, 3]] + tm.assert_frame_equal(result, expected) + + # both + result = df.loc[(slice(None), 1), (slice(None), ['foo'])] + expected = df.iloc[[0, 3], [1, 3]] + tm.assert_frame_equal(result, expected) + + result = df.loc['A', 'a'] + expected = DataFrame(dict(bar=[1, 5, 9], foo=[0, 4, 8]), + index=Index([1, 2, 3], name='two'), + columns=Index(['bar', 'foo'], name='lvl1')) + tm.assert_frame_equal(result, expected) + + result = df.loc[(slice(None), [1, 2]), :] + expected = df.iloc[[0, 1, 3]] + tm.assert_frame_equal(result, expected) + + # multi-level series + s = Series(np.arange(len(ix.get_values())), index=ix) + result = s.loc['A1':'A3', :, ['C1', 'C3']] + expected = s.loc[[tuple([a, b, c, d]) + for a, b, c, d in s.index.values + if (a == 'A1' or a == 'A2' or a == 'A3') and ( + c == 'C1' or c == 'C3')]] + tm.assert_series_equal(result, expected) + + # boolean indexers + result = df.loc[(slice(None), df.loc[:, ('a', 'bar')] > 5), :] + expected = df.iloc[[2, 3]] + tm.assert_frame_equal(result, expected) + + def f(): + df.loc[(slice(None), np.array([True, False])), :] + + pytest.raises(ValueError, f) + + # ambiguous cases + # these can be multiply interpreted (e.g. 
in this case + # as df.loc[slice(None),[1]] as well) + pytest.raises(KeyError, lambda: df.loc[slice(None), [1]]) + + result = df.loc[(slice(None), [1]), :] + expected = df.iloc[[0, 3]] + tm.assert_frame_equal(result, expected) + + # not lexsorted + assert df.index.lexsort_depth == 2 + df = df.sort_index(level=1, axis=0) + assert df.index.lexsort_depth == 0 + with tm.assert_raises_regex( + UnsortedIndexError, + 'MultiIndex slicing requires the index to be ' + r'lexsorted: slicing on levels \[1\], lexsort depth 0'): + df.loc[(slice(None), slice('bar')), :] + + # GH 16734: not sorted, but no real slicing + result = df.loc[(slice(None), df.loc[:, ('a', 'bar')] > 5), :] + tm.assert_frame_equal(result, df.iloc[[1, 3], :]) + + def test_multiindex_slicers_non_unique(self): + + # GH 7106 + # non-unique mi index support + df = (DataFrame(dict(A=['foo', 'foo', 'foo', 'foo'], + B=['a', 'a', 'a', 'a'], + C=[1, 2, 1, 3], + D=[1, 2, 3, 4])) + .set_index(['A', 'B', 'C']).sort_index()) + assert not df.index.is_unique + expected = (DataFrame(dict(A=['foo', 'foo'], B=['a', 'a'], + C=[1, 1], D=[1, 3])) + .set_index(['A', 'B', 'C']).sort_index()) + result = df.loc[(slice(None), slice(None), 1), :] + tm.assert_frame_equal(result, expected) + + # this is equivalent to an xs expression + result = df.xs(1, level=2, drop_level=False) + tm.assert_frame_equal(result, expected) + + df = (DataFrame(dict(A=['foo', 'foo', 'foo', 'foo'], + B=['a', 'a', 'a', 'a'], + C=[1, 2, 1, 2], + D=[1, 2, 3, 4])) + .set_index(['A', 'B', 'C']).sort_index()) + assert not df.index.is_unique + expected = (DataFrame(dict(A=['foo', 'foo'], B=['a', 'a'], + C=[1, 1], D=[1, 3])) + .set_index(['A', 'B', 'C']).sort_index()) + result = df.loc[(slice(None), slice(None), 1), :] + assert not result.index.is_unique + tm.assert_frame_equal(result, expected) + + # GH12896 + # numpy-implementation dependent bug + ints = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 13, 14, 14, 16, + 17, 18, 19, 200000, 200000] + n = len(ints) + idx = MultiIndex.from_arrays([['a'] * n, ints]) + result = Series([1] * n, index=idx) + result = result.sort_index() + result = result.loc[(slice(None), slice(100000))] + expected = Series([1] * (n - 2), index=idx[:-2]).sort_index() + tm.assert_series_equal(result, expected) + + def test_multiindex_slicers_datetimelike(self): + + # GH 7429 + # buggy/inconsistent behavior when slicing with datetime-like + import datetime + dates = [datetime.datetime(2012, 1, 1, 12, 12, 12) + + datetime.timedelta(days=i) for i in range(6)] + freq = [1, 2] + index = MultiIndex.from_product( + [dates, freq], names=['date', 'frequency']) + + df = DataFrame( + np.arange(6 * 2 * 4, dtype='int64').reshape( + -1, 4), index=index, columns=list('ABCD')) + + # multi-axis slicing + idx = pd.IndexSlice + expected = df.iloc[[0, 2, 4], [0, 1]] + result = df.loc[(slice(Timestamp('2012-01-01 12:12:12'), + Timestamp('2012-01-03 12:12:12')), + slice(1, 1)), slice('A', 'B')] + tm.assert_frame_equal(result, expected) + + result = df.loc[(idx[Timestamp('2012-01-01 12:12:12'):Timestamp( + '2012-01-03 12:12:12')], idx[1:1]), slice('A', 'B')] + tm.assert_frame_equal(result, expected) + + result = df.loc[(slice(Timestamp('2012-01-01 12:12:12'), + Timestamp('2012-01-03 12:12:12')), 1), + slice('A', 'B')] + tm.assert_frame_equal(result, expected) + + # with strings + result = df.loc[(slice('2012-01-01 12:12:12', '2012-01-03 12:12:12'), + slice(1, 1)), slice('A', 'B')] + tm.assert_frame_equal(result, expected) + + result = df.loc[(idx['2012-01-01 12:12:12':'2012-01-03 12:12:12'],
1), + idx['A', 'B']] + tm.assert_frame_equal(result, expected) + + def test_multiindex_slicers_edges(self): + # GH 8132 + # various edge cases + df = DataFrame( + {'A': ['A0'] * 5 + ['A1'] * 5 + ['A2'] * 5, + 'B': ['B0', 'B0', 'B1', 'B1', 'B2'] * 3, + 'DATE': ["2013-06-11", "2013-07-02", "2013-07-09", "2013-07-30", + "2013-08-06", "2013-06-11", "2013-07-02", "2013-07-09", + "2013-07-30", "2013-08-06", "2013-09-03", "2013-10-01", + "2013-07-09", "2013-08-06", "2013-09-03"], + 'VALUES': [22, 35, 14, 9, 4, 40, 18, 4, 2, 5, 1, 2, 3, 4, 2]}) + + df['DATE'] = pd.to_datetime(df['DATE']) + df1 = df.set_index(['A', 'B', 'DATE']) + df1 = df1.sort_index() + + # A1 - Get all values under "A0" and "A1" + result = df1.loc[(slice('A1')), :] + expected = df1.iloc[0:10] + tm.assert_frame_equal(result, expected) + + # A2 - Get all values from the start to "A2" + result = df1.loc[(slice('A2')), :] + expected = df1 + tm.assert_frame_equal(result, expected) + + # A3 - Get all values under "B1" or "B2" + result = df1.loc[(slice(None), slice('B1', 'B2')), :] + expected = df1.iloc[[2, 3, 4, 7, 8, 9, 12, 13, 14]] + tm.assert_frame_equal(result, expected) + + # A4 - Get all values between 2013-07-02 and 2013-07-09 + result = df1.loc[(slice(None), slice(None), + slice('20130702', '20130709')), :] + expected = df1.iloc[[1, 2, 6, 7, 12]] + tm.assert_frame_equal(result, expected) + + # B1 - Get all values in B0 that are also under A0, A1 and A2 + result = df1.loc[(slice('A2'), slice('B0')), :] + expected = df1.iloc[[0, 1, 5, 6, 10, 11]] + tm.assert_frame_equal(result, expected) + + # B2 - Get all values in B0, B1 and B2 (similar to what #2 is doing for + # the As) + result = df1.loc[(slice(None), slice('B2')), :] + expected = df1 + tm.assert_frame_equal(result, expected) + + # B3 - Get all values from B1 to B2 and up to 2013-08-06 + result = df1.loc[(slice(None), slice('B1', 'B2'), + slice('2013-08-06')), :] + expected = df1.iloc[[2, 3, 4, 7, 8, 9, 12, 13]] + tm.assert_frame_equal(result, expected) + + # B4 - Same as A4 but the start of the date slice is not a key. 
+ # shows indexing on a partial selection slice + result = df1.loc[(slice(None), slice(None), + slice('20130701', '20130709')), :] + expected = df1.iloc[[1, 2, 6, 7, 12]] + tm.assert_frame_equal(result, expected) + + def test_per_axis_per_level_doc_examples(self): + + # test index maker + idx = pd.IndexSlice + + # from indexing.rst / advanced + index = MultiIndex.from_product([_mklbl('A', 4), _mklbl('B', 2), + _mklbl('C', 4), _mklbl('D', 2)]) + columns = MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'), + ('b', 'foo'), ('b', 'bah')], + names=['lvl0', 'lvl1']) + df = DataFrame(np.arange(len(index) * len(columns), dtype='int64') + .reshape((len(index), len(columns))), + index=index, columns=columns) + result = df.loc[(slice('A1', 'A3'), slice(None), ['C1', 'C3']), :] + expected = df.loc[[tuple([a, b, c, d]) + for a, b, c, d in df.index.values + if (a == 'A1' or a == 'A2' or a == 'A3') and ( + c == 'C1' or c == 'C3')]] + tm.assert_frame_equal(result, expected) + result = df.loc[idx['A1':'A3', :, ['C1', 'C3']], :] + tm.assert_frame_equal(result, expected) + + result = df.loc[(slice(None), slice(None), ['C1', 'C3']), :] + expected = df.loc[[tuple([a, b, c, d]) + for a, b, c, d in df.index.values + if (c == 'C1' or c == 'C3')]] + tm.assert_frame_equal(result, expected) + result = df.loc[idx[:, :, ['C1', 'C3']], :] + tm.assert_frame_equal(result, expected) + + # not sorted + def f(): + df.loc['A1', ('a', slice('foo'))] + + pytest.raises(UnsortedIndexError, f) + + # GH 16734: not sorted, but no real slicing + tm.assert_frame_equal(df.loc['A1', (slice(None), 'foo')], + df.loc['A1'].iloc[:, [0, 2]]) + + df = df.sort_index(axis=1) + + # slicing + df.loc['A1', (slice(None), 'foo')] + df.loc[(slice(None), slice(None), ['C1', 'C3']), (slice(None), 'foo')] + + # setitem + df.loc(axis=0)[:, :, ['C1', 'C3']] = -10 + + def test_loc_axis_arguments(self): + + index = MultiIndex.from_product([_mklbl('A', 4), _mklbl('B', 2), + _mklbl('C', 4), _mklbl('D', 2)]) + columns = MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'), + ('b', 'foo'), ('b', 'bah')], + names=['lvl0', 'lvl1']) + df = DataFrame(np.arange(len(index) * len(columns), dtype='int64') + .reshape((len(index), len(columns))), + index=index, + columns=columns).sort_index().sort_index(axis=1) + + # axis 0 + result = df.loc(axis=0)['A1':'A3', :, ['C1', 'C3']] + expected = df.loc[[tuple([a, b, c, d]) + for a, b, c, d in df.index.values + if (a == 'A1' or a == 'A2' or a == 'A3') and ( + c == 'C1' or c == 'C3')]] + tm.assert_frame_equal(result, expected) + + result = df.loc(axis='index')[:, :, ['C1', 'C3']] + expected = df.loc[[tuple([a, b, c, d]) + for a, b, c, d in df.index.values + if (c == 'C1' or c == 'C3')]] + tm.assert_frame_equal(result, expected) + + # axis 1 + result = df.loc(axis=1)[:, 'foo'] + expected = df.loc[:, (slice(None), 'foo')] + tm.assert_frame_equal(result, expected) + + result = df.loc(axis='columns')[:, 'foo'] + expected = df.loc[:, (slice(None), 'foo')] + tm.assert_frame_equal(result, expected) + + # invalid axis + def f(): + df.loc(axis=-1)[:, :, ['C1', 'C3']] + + pytest.raises(ValueError, f) + + def f(): + df.loc(axis=2)[:, :, ['C1', 'C3']] + + pytest.raises(ValueError, f) + + def f(): + df.loc(axis='foo')[:, :, ['C1', 'C3']] + + pytest.raises(ValueError, f) + + def test_per_axis_per_level_setitem(self): + + # test index maker + idx = pd.IndexSlice + + # test multi-index slicing with per axis and per index controls + index = MultiIndex.from_tuples([('A', 1), ('A', 2), + ('A', 3), ('B', 1)], + names=['one', 'two']) + columns = 
MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'), + ('b', 'foo'), ('b', 'bah')], + names=['lvl0', 'lvl1']) + + df_orig = DataFrame( + np.arange(16, dtype='int64').reshape( + 4, 4), index=index, columns=columns) + df_orig = df_orig.sort_index(axis=0).sort_index(axis=1) + + # identity + df = df_orig.copy() + df.loc[(slice(None), slice(None)), :] = 100 + expected = df_orig.copy() + expected.iloc[:, :] = 100 + tm.assert_frame_equal(df, expected) + + df = df_orig.copy() + df.loc(axis=0)[:, :] = 100 + expected = df_orig.copy() + expected.iloc[:, :] = 100 + tm.assert_frame_equal(df, expected) + + df = df_orig.copy() + df.loc[(slice(None), slice(None)), (slice(None), slice(None))] = 100 + expected = df_orig.copy() + expected.iloc[:, :] = 100 + tm.assert_frame_equal(df, expected) + + df = df_orig.copy() + df.loc[:, (slice(None), slice(None))] = 100 + expected = df_orig.copy() + expected.iloc[:, :] = 100 + tm.assert_frame_equal(df, expected) + + # index + df = df_orig.copy() + df.loc[(slice(None), [1]), :] = 100 + expected = df_orig.copy() + expected.iloc[[0, 3]] = 100 + tm.assert_frame_equal(df, expected) + + df = df_orig.copy() + df.loc[(slice(None), 1), :] = 100 + expected = df_orig.copy() + expected.iloc[[0, 3]] = 100 + tm.assert_frame_equal(df, expected) + + df = df_orig.copy() + df.loc(axis=0)[:, 1] = 100 + expected = df_orig.copy() + expected.iloc[[0, 3]] = 100 + tm.assert_frame_equal(df, expected) + + # columns + df = df_orig.copy() + df.loc[:, (slice(None), ['foo'])] = 100 + expected = df_orig.copy() + expected.iloc[:, [1, 3]] = 100 + tm.assert_frame_equal(df, expected) + + # both + df = df_orig.copy() + df.loc[(slice(None), 1), (slice(None), ['foo'])] = 100 + expected = df_orig.copy() + expected.iloc[[0, 3], [1, 3]] = 100 + tm.assert_frame_equal(df, expected) + + df = df_orig.copy() + df.loc[idx[:, 1], idx[:, ['foo']]] = 100 + expected = df_orig.copy() + expected.iloc[[0, 3], [1, 3]] = 100 + tm.assert_frame_equal(df, expected) + + df = df_orig.copy() + df.loc['A', 'a'] = 100 + expected = df_orig.copy() + expected.iloc[0:3, 0:2] = 100 + tm.assert_frame_equal(df, expected) + + # setting with a list-like + df = df_orig.copy() + df.loc[(slice(None), 1), (slice(None), ['foo'])] = np.array( + [[100, 100], [100, 100]], dtype='int64') + expected = df_orig.copy() + expected.iloc[[0, 3], [1, 3]] = 100 + tm.assert_frame_equal(df, expected) + + # not enough values + df = df_orig.copy() + + def f(): + df.loc[(slice(None), 1), (slice(None), ['foo'])] = np.array( + [[100], [100, 100]], dtype='int64') + + pytest.raises(ValueError, f) + + def f(): + df.loc[(slice(None), 1), (slice(None), ['foo'])] = np.array( + [100, 100, 100, 100], dtype='int64') + + pytest.raises(ValueError, f) + + # with an alignable rhs + df = df_orig.copy() + df.loc[(slice(None), 1), (slice(None), ['foo'])] = df.loc[(slice( + None), 1), (slice(None), ['foo'])] * 5 + expected = df_orig.copy() + expected.iloc[[0, 3], [1, 3]] = expected.iloc[[0, 3], [1, 3]] * 5 + tm.assert_frame_equal(df, expected) + + df = df_orig.copy() + df.loc[(slice(None), 1), (slice(None), ['foo'])] *= df.loc[(slice( + None), 1), (slice(None), ['foo'])] + expected = df_orig.copy() + expected.iloc[[0, 3], [1, 3]] *= expected.iloc[[0, 3], [1, 3]] + tm.assert_frame_equal(df, expected) + + rhs = df_orig.loc[(slice(None), 1), (slice(None), ['foo'])].copy() + rhs.loc[:, ('c', 'bah')] = 10 + df = df_orig.copy() + df.loc[(slice(None), 1), (slice(None), ['foo'])] *= rhs + expected = df_orig.copy() + expected.iloc[[0, 3], [1, 3]] *= expected.iloc[[0, 3], [1, 3]] + 
tm.assert_frame_equal(df, expected) + + +class TestMultiIndexPanel(object): + + def test_iloc_getitem_panel_multiindex(self): + + with catch_warnings(record=True): + + # GH 7199 + # Panel with multi-index + multi_index = pd.MultiIndex.from_tuples([('ONE', 'one'), + ('TWO', 'two'), + ('THREE', 'three')], + names=['UPPER', 'lower']) + + simple_index = [x[0] for x in multi_index] + wd1 = Panel(items=['First', 'Second'], + major_axis=['a', 'b', 'c', 'd'], + minor_axis=multi_index) + + wd2 = Panel(items=['First', 'Second'], + major_axis=['a', 'b', 'c', 'd'], + minor_axis=simple_index) + + expected1 = wd1['First'].iloc[[True, True, True, False], [0, 2]] + result1 = wd1.iloc[0, [True, True, True, False], [0, 2]] # WRONG + tm.assert_frame_equal(result1, expected1) + + expected2 = wd2['First'].iloc[[True, True, True, False], [0, 2]] + result2 = wd2.iloc[0, [True, True, True, False], [0, 2]] + tm.assert_frame_equal(result2, expected2) + + expected1 = DataFrame(index=['a'], columns=multi_index, + dtype='float64') + result1 = wd1.iloc[0, [0], [0, 1, 2]] + tm.assert_frame_equal(result1, expected1) + + expected2 = DataFrame(index=['a'], columns=simple_index, + dtype='float64') + result2 = wd2.iloc[0, [0], [0, 1, 2]] + tm.assert_frame_equal(result2, expected2) + + # GH 7516 + mi = MultiIndex.from_tuples([(0, 'x'), (1, 'y'), (2, 'z')]) + p = Panel(np.arange(3 * 3 * 3, dtype='int64').reshape(3, 3, 3), + items=['a', 'b', 'c'], major_axis=mi, + minor_axis=['u', 'v', 'w']) + result = p.iloc[:, 1, 0] + expected = Series([3, 12, 21], index=['a', 'b', 'c'], name='u') + tm.assert_series_equal(result, expected) + + result = p.loc[:, (1, 'y'), 'u'] + tm.assert_series_equal(result, expected) + + def test_panel_setitem_with_multiindex(self): + + with catch_warnings(record=True): + # 10360 + # failing with a multi-index + arr = np.array([[[1, 2, 3], [0, 0, 0]], + [[0, 0, 0], [0, 0, 0]]], + dtype=np.float64) + + # reg index + axes = dict(items=['A', 'B'], major_axis=[0, 1], + minor_axis=['X', 'Y', 'Z']) + p1 = Panel(0., **axes) + p1.iloc[0, 0, :] = [1, 2, 3] + expected = Panel(arr, **axes) + tm.assert_panel_equal(p1, expected) + + # multi-indexes + axes['items'] = pd.MultiIndex.from_tuples( + [('A', 'a'), ('B', 'b')]) + p2 = Panel(0., **axes) + p2.iloc[0, 0, :] = [1, 2, 3] + expected = Panel(arr, **axes) + tm.assert_panel_equal(p2, expected) + + axes['major_axis'] = pd.MultiIndex.from_tuples( + [('A', 1), ('A', 2)]) + p3 = Panel(0., **axes) + p3.iloc[0, 0, :] = [1, 2, 3] + expected = Panel(arr, **axes) + tm.assert_panel_equal(p3, expected) + + axes['minor_axis'] = pd.MultiIndex.from_product( + [['X'], range(3)]) + p4 = Panel(0., **axes) + p4.iloc[0, 0, :] = [1, 2, 3] + expected = Panel(arr, **axes) + tm.assert_panel_equal(p4, expected) + + arr = np.array( + [[[1, 0, 0], [2, 0, 0]], [[0, 0, 0], [0, 0, 0]]], + dtype=np.float64) + p5 = Panel(0., **axes) + p5.iloc[0, :, 0] = [1, 2] + expected = Panel(arr, **axes) + tm.assert_panel_equal(p5, expected) diff --git a/pandas/tests/indexing/test_panel.py b/pandas/tests/indexing/test_panel.py new file mode 100644 index 0000000000000..2d4ffd6a4e783 --- /dev/null +++ b/pandas/tests/indexing/test_panel.py @@ -0,0 +1,219 @@ +import pytest +from warnings import catch_warnings + +import numpy as np +from pandas.util import testing as tm +from pandas import Panel, date_range, DataFrame + + +class TestPanel(object): + + def test_iloc_getitem_panel(self): + + with catch_warnings(record=True): + # GH 7189 + p = Panel(np.arange(4 * 3 * 2).reshape(4, 3, 2), + items=['A', 'B', 'C', 'D'], + 
major_axis=['a', 'b', 'c'], + minor_axis=['one', 'two']) + + result = p.iloc[1] + expected = p.loc['B'] + tm.assert_frame_equal(result, expected) + + result = p.iloc[1, 1] + expected = p.loc['B', 'b'] + tm.assert_series_equal(result, expected) + + result = p.iloc[1, 1, 1] + expected = p.loc['B', 'b', 'two'] + assert result == expected + + # slice + result = p.iloc[1:3] + expected = p.loc[['B', 'C']] + tm.assert_panel_equal(result, expected) + + result = p.iloc[:, 0:2] + expected = p.loc[:, ['a', 'b']] + tm.assert_panel_equal(result, expected) + + # list of integers + result = p.iloc[[0, 2]] + expected = p.loc[['A', 'C']] + tm.assert_panel_equal(result, expected) + + # neg indices + result = p.iloc[[-1, 1], [-1, 1]] + expected = p.loc[['D', 'B'], ['c', 'b']] + tm.assert_panel_equal(result, expected) + + # dup indices + result = p.iloc[[-1, -1, 1], [-1, 1]] + expected = p.loc[['D', 'D', 'B'], ['c', 'b']] + tm.assert_panel_equal(result, expected) + + # combined + result = p.iloc[0, [True, True], [0, 1]] + expected = p.loc['A', ['a', 'b'], ['one', 'two']] + tm.assert_frame_equal(result, expected) + + # out-of-bounds exception + with pytest.raises(IndexError): + p.iloc[tuple([10, 5])] + + def f(): + p.iloc[0, [True, True], [0, 1, 2]] + + pytest.raises(IndexError, f) + + # trying to use a label + with pytest.raises(ValueError): + p.iloc[tuple(['j', 'D'])] + + # GH + p = Panel( + np.random.rand(4, 3, 2), items=['A', 'B', 'C', 'D'], + major_axis=['U', 'V', 'W'], minor_axis=['X', 'Y']) + expected = p['A'] + + result = p.iloc[0, :, :] + tm.assert_frame_equal(result, expected) + + result = p.iloc[0, [True, True, True], :] + tm.assert_frame_equal(result, expected) + + result = p.iloc[0, [True, True, True], [0, 1]] + tm.assert_frame_equal(result, expected) + + def f(): + p.iloc[0, [True, True, True], [0, 1, 2]] + + pytest.raises(IndexError, f) + + def f(): + p.iloc[0, [True, True, True], [2]] + + pytest.raises(IndexError, f) + + def test_iloc_panel_issue(self): + + with catch_warnings(record=True): + # see gh-3617 + p = Panel(np.random.randn(4, 4, 4)) + + assert p.iloc[:3, :3, :3].shape == (3, 3, 3) + assert p.iloc[1, :3, :3].shape == (3, 3) + assert p.iloc[:3, 1, :3].shape == (3, 3) + assert p.iloc[:3, :3, 1].shape == (3, 3) + assert p.iloc[1, 1, :3].shape == (3, ) + assert p.iloc[1, :3, 1].shape == (3, ) + assert p.iloc[:3, 1, 1].shape == (3, ) + + def test_panel_getitem(self): + + with catch_warnings(record=True): + # GH4016, date selection returns a frame when a partial string + # selection is used + ind = date_range(start="2000", freq="D", periods=1000) + df = DataFrame( + np.random.randn( + len(ind), 5), index=ind, columns=list('ABCDE')) + panel = Panel(dict([('frame_' + c, df) for c in list('ABC')])) + + test2 = panel.loc[:, "2002":"2002-12-31"] + test1 = panel.loc[:, "2002"] + tm.assert_panel_equal(test1, test2) + + # GH8710 + # multi-element getting with a list + panel = tm.makePanel() + + expected = panel.iloc[[0, 1]] + + result = panel.loc[['ItemA', 'ItemB']] + tm.assert_panel_equal(result, expected) + + result = panel.loc[['ItemA', 'ItemB'], :, :] + tm.assert_panel_equal(result, expected) + + result = panel[['ItemA', 'ItemB']] + tm.assert_panel_equal(result, expected) + + result = panel.loc['ItemA':'ItemB'] + tm.assert_panel_equal(result, expected) + + with catch_warnings(record=True): + result = panel.ix[['ItemA', 'ItemB']] + tm.assert_panel_equal(result, expected) + + # with an object-like + # GH 9140 + class TestObject: + + def __str__(self): + return "TestObject" + + obj = TestObject() + + p
= Panel(np.random.randn(1, 5, 4), items=[obj], + major_axis=date_range('1/1/2000', periods=5), + minor_axis=['A', 'B', 'C', 'D']) + + expected = p.iloc[0] + result = p[obj] + tm.assert_frame_equal(result, expected) + + def test_panel_setitem(self): + + with catch_warnings(record=True): + # GH 7763 + # loc and setitem have setting differences + np.random.seed(0) + index = range(3) + columns = list('abc') + + panel = Panel({'A': DataFrame(np.random.randn(3, 3), + index=index, columns=columns), + 'B': DataFrame(np.random.randn(3, 3), + index=index, columns=columns), + 'C': DataFrame(np.random.randn(3, 3), + index=index, columns=columns)}) + + replace = DataFrame(np.eye(3, 3), index=range(3), columns=columns) + expected = Panel({'A': replace, 'B': replace, 'C': replace}) + + p = panel.copy() + for idx in list('ABC'): + p[idx] = replace + tm.assert_panel_equal(p, expected) + + p = panel.copy() + for idx in list('ABC'): + p.loc[idx, :, :] = replace + tm.assert_panel_equal(p, expected) + + def test_panel_assignment(self): + + with catch_warnings(record=True): + # GH3777 + wp = Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'], + major_axis=date_range('1/1/2000', periods=5), + minor_axis=['A', 'B', 'C', 'D']) + wp2 = Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'], + major_axis=date_range('1/1/2000', periods=5), + minor_axis=['A', 'B', 'C', 'D']) + + # TODO: unused? + # expected = wp.loc[['Item1', 'Item2'], :, ['A', 'B']] + + def f(): + wp.loc[['Item1', 'Item2'], :, ['A', 'B']] = wp2.loc[ + ['Item1', 'Item2'], :, ['A', 'B']] + + pytest.raises(NotImplementedError, f) + + # to_assign = wp2.loc[['Item1', 'Item2'], :, ['A', 'B']] + # wp.loc[['Item1', 'Item2'], :, ['A', 'B']] = to_assign + # result = wp.loc[['Item1', 'Item2'], :, ['A', 'B']] + # tm.assert_panel_equal(result,expected) diff --git a/pandas/tests/indexing/test_partial.py b/pandas/tests/indexing/test_partial.py new file mode 100644 index 0000000000000..93a85e247a787 --- /dev/null +++ b/pandas/tests/indexing/test_partial.py @@ -0,0 +1,591 @@ +""" +test setting *parts* of objects both positionally and label-based + +TODO: these should be split among the indexer tests +""" + +import pytest + +from warnings import catch_warnings +import numpy as np + +import pandas as pd +from pandas import Series, DataFrame, Panel, Index, date_range +from pandas.util import testing as tm + + +class TestPartialSetting(object): + + def test_partial_setting(self): + + # GH2578, allow ix and friends to partially set + + # series + s_orig = Series([1, 2, 3]) + + s = s_orig.copy() + s[5] = 5 + expected = Series([1, 2, 3, 5], index=[0, 1, 2, 5]) + tm.assert_series_equal(s, expected) + + s = s_orig.copy() + s.loc[5] = 5 + expected = Series([1, 2, 3, 5], index=[0, 1, 2, 5]) + tm.assert_series_equal(s, expected) + + s = s_orig.copy() + s[5] = 5. + expected = Series([1, 2, 3, 5.], index=[0, 1, 2, 5]) + tm.assert_series_equal(s, expected) + + s = s_orig.copy() + s.loc[5] = 5. + expected = Series([1, 2, 3, 5.], index=[0, 1, 2, 5]) + tm.assert_series_equal(s, expected) + + # iloc/iat raise + s = s_orig.copy() + + def f(): + s.iloc[3] = 5. + + pytest.raises(IndexError, f) + + def f(): + s.iat[3] = 5. + + pytest.raises(IndexError, f) + + # ## frame ## + + df_orig = DataFrame( + np.arange(6).reshape(3, 2), columns=['A', 'B'], dtype='int64') + + # iloc/iat raise + df = df_orig.copy() + + def f(): + df.iloc[4, 2] = 5. + + pytest.raises(IndexError, f) + + def f(): + df.iat[4, 2] = 5.
+ + pytest.raises(IndexError, f) + + # row setting where it exists + expected = DataFrame(dict({'A': [0, 4, 4], 'B': [1, 5, 5]})) + df = df_orig.copy() + df.iloc[1] = df.iloc[2] + tm.assert_frame_equal(df, expected) + + expected = DataFrame(dict({'A': [0, 4, 4], 'B': [1, 5, 5]})) + df = df_orig.copy() + df.loc[1] = df.loc[2] + tm.assert_frame_equal(df, expected) + + # like 2578, partial setting with dtype preservation + expected = DataFrame(dict({'A': [0, 2, 4, 4], 'B': [1, 3, 5, 5]})) + df = df_orig.copy() + df.loc[3] = df.loc[2] + tm.assert_frame_equal(df, expected) + + # single dtype frame, overwrite + expected = DataFrame(dict({'A': [0, 2, 4], 'B': [0, 2, 4]})) + df = df_orig.copy() + with catch_warnings(record=True): + df.ix[:, 'B'] = df.ix[:, 'A'] + tm.assert_frame_equal(df, expected) + + # mixed dtype frame, overwrite + expected = DataFrame(dict({'A': [0, 2, 4], 'B': Series([0, 2, 4])})) + df = df_orig.copy() + df['B'] = df['B'].astype(np.float64) + with catch_warnings(record=True): + df.ix[:, 'B'] = df.ix[:, 'A'] + tm.assert_frame_equal(df, expected) + + # single dtype frame, partial setting + expected = df_orig.copy() + expected['C'] = df['A'] + df = df_orig.copy() + with catch_warnings(record=True): + df.ix[:, 'C'] = df.ix[:, 'A'] + tm.assert_frame_equal(df, expected) + + # mixed frame, partial setting + expected = df_orig.copy() + expected['C'] = df['A'] + df = df_orig.copy() + with catch_warnings(record=True): + df.ix[:, 'C'] = df.ix[:, 'A'] + tm.assert_frame_equal(df, expected) + + with catch_warnings(record=True): + # ## panel ## + p_orig = Panel(np.arange(16).reshape(2, 4, 2), + items=['Item1', 'Item2'], + major_axis=pd.date_range('2001/1/12', periods=4), + minor_axis=['A', 'B'], dtype='float64') + + # panel setting via item + p_orig = Panel(np.arange(16).reshape(2, 4, 2), + items=['Item1', 'Item2'], + major_axis=pd.date_range('2001/1/12', periods=4), + minor_axis=['A', 'B'], dtype='float64') + expected = p_orig.copy() + expected['Item3'] = expected['Item1'] + p = p_orig.copy() + p.loc['Item3'] = p['Item1'] + tm.assert_panel_equal(p, expected) + + # panel with aligned series + expected = p_orig.copy() + expected = expected.transpose(2, 1, 0) + expected['C'] = DataFrame({'Item1': [30, 30, 30, 30], + 'Item2': [32, 32, 32, 32]}, + index=p_orig.major_axis) + expected = expected.transpose(2, 1, 0) + p = p_orig.copy() + p.loc[:, :, 'C'] = Series([30, 32], index=p_orig.items) + tm.assert_panel_equal(p, expected) + + # GH 8473 + dates = date_range('1/1/2000', periods=8) + df_orig = DataFrame(np.random.randn(8, 4), index=dates, + columns=['A', 'B', 'C', 'D']) + + expected = pd.concat([df_orig, DataFrame( + {'A': 7}, index=[dates[-1] + 1])]) + df = df_orig.copy() + df.loc[dates[-1] + 1, 'A'] = 7 + tm.assert_frame_equal(df, expected) + df = df_orig.copy() + df.at[dates[-1] + 1, 'A'] = 7 + tm.assert_frame_equal(df, expected) + + exp_other = DataFrame({0: 7}, index=[dates[-1] + 1]) + expected = pd.concat([df_orig, exp_other], axis=1) + + df = df_orig.copy() + df.loc[dates[-1] + 1, 0] = 7 + tm.assert_frame_equal(df, expected) + df = df_orig.copy() + df.at[dates[-1] + 1, 0] = 7 + tm.assert_frame_equal(df, expected) + + def test_partial_setting_mixed_dtype(self): + + # in a mixed dtype environment, try to preserve dtypes + # by appending + df = DataFrame([[True, 1], [False, 2]], columns=["female", "fitness"]) + + s = df.loc[1].copy() + s.name = 2 + expected = df.append(s) + + df.loc[2] = df.loc[1] + tm.assert_frame_equal(df, expected) + + # columns will align + df = DataFrame(columns=['A', 
'B']) + df.loc[0] = Series(1, index=range(4)) + tm.assert_frame_equal(df, DataFrame(columns=['A', 'B'], index=[0])) + + # columns will align + df = DataFrame(columns=['A', 'B']) + df.loc[0] = Series(1, index=['B']) + + exp = DataFrame([[np.nan, 1]], columns=['A', 'B'], + index=[0], dtype='float64') + tm.assert_frame_equal(df, exp) + + # list-like must conform + df = DataFrame(columns=['A', 'B']) + + def f(): + df.loc[0] = [1, 2, 3] + + pytest.raises(ValueError, f) + + # TODO: #15657, these are left as object and not coerced + df = DataFrame(columns=['A', 'B']) + df.loc[3] = [6, 7] + + exp = DataFrame([[6, 7]], index=[3], columns=['A', 'B'], + dtype='object') + tm.assert_frame_equal(df, exp) + + def test_series_partial_set(self): + # partial set with new index + # Regression from GH4825 + ser = Series([0.1, 0.2], index=[1, 2]) + + # loc + expected = Series([np.nan, 0.2, np.nan], index=[3, 2, 3]) + result = ser.loc[[3, 2, 3]] + tm.assert_series_equal(result, expected, check_index_type=True) + + expected = Series([np.nan, 0.2, np.nan, np.nan], index=[3, 2, 3, 'x']) + result = ser.loc[[3, 2, 3, 'x']] + tm.assert_series_equal(result, expected, check_index_type=True) + + expected = Series([0.2, 0.2, 0.1], index=[2, 2, 1]) + result = ser.loc[[2, 2, 1]] + tm.assert_series_equal(result, expected, check_index_type=True) + + expected = Series([0.2, 0.2, np.nan, 0.1], index=[2, 2, 'x', 1]) + result = ser.loc[[2, 2, 'x', 1]] + tm.assert_series_equal(result, expected, check_index_type=True) + + # raises as nothing is in the index + pytest.raises(KeyError, lambda: ser.loc[[3, 3, 3]]) + + expected = Series([0.2, 0.2, np.nan], index=[2, 2, 3]) + result = ser.loc[[2, 2, 3]] + tm.assert_series_equal(result, expected, check_index_type=True) + + expected = Series([0.3, np.nan, np.nan], index=[3, 4, 4]) + result = Series([0.1, 0.2, 0.3], index=[1, 2, 3]).loc[[3, 4, 4]] + tm.assert_series_equal(result, expected, check_index_type=True) + + expected = Series([np.nan, 0.3, 0.3], index=[5, 3, 3]) + result = Series([0.1, 0.2, 0.3, 0.4], + index=[1, 2, 3, 4]).loc[[5, 3, 3]] + tm.assert_series_equal(result, expected, check_index_type=True) + + expected = Series([np.nan, 0.4, 0.4], index=[5, 4, 4]) + result = Series([0.1, 0.2, 0.3, 0.4], + index=[1, 2, 3, 4]).loc[[5, 4, 4]] + tm.assert_series_equal(result, expected, check_index_type=True) + + expected = Series([0.4, np.nan, np.nan], index=[7, 2, 2]) + result = Series([0.1, 0.2, 0.3, 0.4], + index=[4, 5, 6, 7]).loc[[7, 2, 2]] + tm.assert_series_equal(result, expected, check_index_type=True) + + expected = Series([0.4, np.nan, np.nan], index=[4, 5, 5]) + result = Series([0.1, 0.2, 0.3, 0.4], + index=[1, 2, 3, 4]).loc[[4, 5, 5]] + tm.assert_series_equal(result, expected, check_index_type=True) + + # iloc + expected = Series([0.2, 0.2, 0.1, 0.1], index=[2, 2, 1, 1]) + result = ser.iloc[[1, 1, 0, 0]] + tm.assert_series_equal(result, expected, check_index_type=True) + + def test_series_partial_set_with_name(self): + # GH 11497 + + idx = Index([1, 2], dtype='int64', name='idx') + ser = Series([0.1, 0.2], index=idx, name='s') + + # loc + exp_idx = Index([3, 2, 3], dtype='int64', name='idx') + expected = Series([np.nan, 0.2, np.nan], index=exp_idx, name='s') + result = ser.loc[[3, 2, 3]] + tm.assert_series_equal(result, expected, check_index_type=True) + + exp_idx = Index([3, 2, 3, 'x'], dtype='object', name='idx') + expected = Series([np.nan, 0.2, np.nan, np.nan], index=exp_idx, + name='s') + result = ser.loc[[3, 2, 3, 'x']] + tm.assert_series_equal(result, expected,
check_index_type=True) + + exp_idx = Index([2, 2, 1], dtype='int64', name='idx') + expected = Series([0.2, 0.2, 0.1], index=exp_idx, name='s') + result = ser.loc[[2, 2, 1]] + tm.assert_series_equal(result, expected, check_index_type=True) + + exp_idx = Index([2, 2, 'x', 1], dtype='object', name='idx') + expected = Series([0.2, 0.2, np.nan, 0.1], index=exp_idx, name='s') + result = ser.loc[[2, 2, 'x', 1]] + tm.assert_series_equal(result, expected, check_index_type=True) + + # raises as nothing is in the index + pytest.raises(KeyError, lambda: ser.loc[[3, 3, 3]]) + + exp_idx = Index([2, 2, 3], dtype='int64', name='idx') + expected = Series([0.2, 0.2, np.nan], index=exp_idx, name='s') + result = ser.loc[[2, 2, 3]] + tm.assert_series_equal(result, expected, check_index_type=True) + + exp_idx = Index([3, 4, 4], dtype='int64', name='idx') + expected = Series([0.3, np.nan, np.nan], index=exp_idx, name='s') + idx = Index([1, 2, 3], dtype='int64', name='idx') + result = Series([0.1, 0.2, 0.3], index=idx, name='s').loc[[3, 4, 4]] + tm.assert_series_equal(result, expected, check_index_type=True) + + exp_idx = Index([5, 3, 3], dtype='int64', name='idx') + expected = Series([np.nan, 0.3, 0.3], index=exp_idx, name='s') + idx = Index([1, 2, 3, 4], dtype='int64', name='idx') + result = Series([0.1, 0.2, 0.3, 0.4], index=idx, + name='s').loc[[5, 3, 3]] + tm.assert_series_equal(result, expected, check_index_type=True) + + exp_idx = Index([5, 4, 4], dtype='int64', name='idx') + expected = Series([np.nan, 0.4, 0.4], index=exp_idx, name='s') + idx = Index([1, 2, 3, 4], dtype='int64', name='idx') + result = Series([0.1, 0.2, 0.3, 0.4], index=idx, + name='s').loc[[5, 4, 4]] + tm.assert_series_equal(result, expected, check_index_type=True) + + exp_idx = Index([7, 2, 2], dtype='int64', name='idx') + expected = Series([0.4, np.nan, np.nan], index=exp_idx, name='s') + idx = Index([4, 5, 6, 7], dtype='int64', name='idx') + result = Series([0.1, 0.2, 0.3, 0.4], index=idx, + name='s').loc[[7, 2, 2]] + tm.assert_series_equal(result, expected, check_index_type=True) + + exp_idx = Index([4, 5, 5], dtype='int64', name='idx') + expected = Series([0.4, np.nan, np.nan], index=exp_idx, name='s') + idx = Index([1, 2, 3, 4], dtype='int64', name='idx') + result = Series([0.1, 0.2, 0.3, 0.4], index=idx, + name='s').loc[[4, 5, 5]] + tm.assert_series_equal(result, expected, check_index_type=True) + + # iloc + exp_idx = Index([2, 2, 1, 1], dtype='int64', name='idx') + expected = Series([0.2, 0.2, 0.1, 0.1], index=exp_idx, name='s') + result = ser.iloc[[1, 1, 0, 0]] + tm.assert_series_equal(result, expected, check_index_type=True) + + def test_partial_set_invalid(self): + + # GH 4940 + # allow only setting of 'valid' values + + orig = tm.makeTimeDataFrame() + df = orig.copy() + + # don't allow non-string inserts + def f(): + with catch_warnings(record=True): + df.loc[100.0, :] = df.ix[0] + + pytest.raises(TypeError, f) + + def f(): + with catch_warnings(record=True): + df.loc[100, :] = df.ix[0] + + pytest.raises(TypeError, f) + + def f(): + with catch_warnings(record=True): + df.ix[100.0, :] = df.ix[0] + + pytest.raises(TypeError, f) + + def f(): + with catch_warnings(record=True): + df.ix[100, :] = df.ix[0] + + pytest.raises(ValueError, f) + + # allow object conversion here + df = orig.copy() + with catch_warnings(record=True): + df.loc['a', :] = df.ix[0] + exp = orig.append(pd.Series(df.ix[0], name='a')) + tm.assert_frame_equal(df, exp) + tm.assert_index_equal(df.index, + pd.Index(orig.index.tolist() + ['a'])) + assert
df.index.dtype == 'object' + + def test_partial_set_empty_series(self): + + # GH5226 + + # partially set with an empty object series + s = Series() + s.loc[1] = 1 + tm.assert_series_equal(s, Series([1], index=[1])) + s.loc[3] = 3 + tm.assert_series_equal(s, Series([1, 3], index=[1, 3])) + + s = Series() + s.loc[1] = 1. + tm.assert_series_equal(s, Series([1.], index=[1])) + s.loc[3] = 3. + tm.assert_series_equal(s, Series([1., 3.], index=[1, 3])) + + s = Series() + s.loc['foo'] = 1 + tm.assert_series_equal(s, Series([1], index=['foo'])) + s.loc['bar'] = 3 + tm.assert_series_equal(s, Series([1, 3], index=['foo', 'bar'])) + s.loc[3] = 4 + tm.assert_series_equal(s, Series([1, 3, 4], index=['foo', 'bar', 3])) + + def test_partial_set_empty_frame(self): + + # partially set with an empty object + # frame + df = DataFrame() + + def f(): + df.loc[1] = 1 + + pytest.raises(ValueError, f) + + def f(): + df.loc[1] = Series([1], index=['foo']) + + pytest.raises(ValueError, f) + + def f(): + df.loc[:, 1] = 1 + + pytest.raises(ValueError, f) + + # these work as they don't really change + # anything but the index + # GH5632 + expected = DataFrame(columns=['foo'], index=pd.Index( + [], dtype='int64')) + + def f(): + df = DataFrame() + df['foo'] = Series([], dtype='object') + return df + + tm.assert_frame_equal(f(), expected) + + def f(): + df = DataFrame() + df['foo'] = Series(df.index) + return df + + tm.assert_frame_equal(f(), expected) + + def f(): + df = DataFrame() + df['foo'] = df.index + return df + + tm.assert_frame_equal(f(), expected) + + expected = DataFrame(columns=['foo'], + index=pd.Index([], dtype='int64')) + expected['foo'] = expected['foo'].astype('float64') + + def f(): + df = DataFrame() + df['foo'] = [] + return df + + tm.assert_frame_equal(f(), expected) + + def f(): + df = DataFrame() + df['foo'] = Series(range(len(df))) + return df + + tm.assert_frame_equal(f(), expected) + + def f(): + df = DataFrame() + tm.assert_index_equal(df.index, pd.Index([], dtype='object')) + df['foo'] = range(len(df)) + return df + + expected = DataFrame(columns=['foo'], + index=pd.Index([], dtype='int64')) + expected['foo'] = expected['foo'].astype('float64') + tm.assert_frame_equal(f(), expected) + + df = DataFrame() + tm.assert_index_equal(df.columns, pd.Index([], dtype=object)) + df2 = DataFrame() + df2[1] = Series([1], index=['foo']) + df.loc[:, 1] = Series([1], index=['foo']) + tm.assert_frame_equal(df, DataFrame([[1]], index=['foo'], columns=[1])) + tm.assert_frame_equal(df, df2) + + # no index to start + expected = DataFrame({0: Series(1, index=range(4))}, + columns=['A', 'B', 0]) + + df = DataFrame(columns=['A', 'B']) + df[0] = Series(1, index=range(4)) + df.dtypes + str(df) + tm.assert_frame_equal(df, expected) + + df = DataFrame(columns=['A', 'B']) + df.loc[:, 0] = Series(1, index=range(4)) + df.dtypes + str(df) + tm.assert_frame_equal(df, expected) + + def test_partial_set_empty_frame_row(self): + # GH5720, GH5744 + # don't create rows when empty + expected = DataFrame(columns=['A', 'B', 'New'], + index=pd.Index([], dtype='int64')) + expected['A'] = expected['A'].astype('int64') + expected['B'] = expected['B'].astype('float64') + expected['New'] = expected['New'].astype('float64') + + df = DataFrame({"A": [1, 2, 3], "B": [1.2, 4.2, 5.2]}) + y = df[df.A > 5] + y['New'] = np.nan + tm.assert_frame_equal(y, expected) + # tm.assert_frame_equal(y,expected) + + expected = DataFrame(columns=['a', 'b', 'c c', 'd']) + expected['d'] = expected['d'].astype('int64') + df = DataFrame(columns=['a', 'b', 'c c']) 
+ df['d'] = 3 + tm.assert_frame_equal(df, expected) + tm.assert_series_equal(df['c c'], Series(name='c c', dtype=object)) + + # reindex columns is ok + df = DataFrame({"A": [1, 2, 3], "B": [1.2, 4.2, 5.2]}) + y = df[df.A > 5] + result = y.reindex(columns=['A', 'B', 'C']) + expected = DataFrame(columns=['A', 'B', 'C'], + index=pd.Index([], dtype='int64')) + expected['A'] = expected['A'].astype('int64') + expected['B'] = expected['B'].astype('float64') + expected['C'] = expected['C'].astype('float64') + tm.assert_frame_equal(result, expected) + + def test_partial_set_empty_frame_set_series(self): + # GH 5756 + # setting with empty Series + df = DataFrame(Series()) + tm.assert_frame_equal(df, DataFrame({0: Series()})) + + df = DataFrame(Series(name='foo')) + tm.assert_frame_equal(df, DataFrame({'foo': Series()})) + + def test_partial_set_empty_frame_empty_copy_assignment(self): + # GH 5932 + # copy on empty with assignment fails + df = DataFrame(index=[0]) + df = df.copy() + df['a'] = 0 + expected = DataFrame(0, index=[0], columns=['a']) + tm.assert_frame_equal(df, expected) + + def test_partial_set_empty_frame_empty_consistencies(self): + # GH 6171 + # consistency on empty frames + df = DataFrame(columns=['x', 'y']) + df['x'] = [1, 2] + expected = DataFrame(dict(x=[1, 2], y=[np.nan, np.nan])) + tm.assert_frame_equal(df, expected, check_dtype=False) + + df = DataFrame(columns=['x', 'y']) + df['x'] = ['1', '2'] + expected = DataFrame( + dict(x=['1', '2'], y=[np.nan, np.nan]), dtype=object) + tm.assert_frame_equal(df, expected) + + df = DataFrame(columns=['x', 'y']) + df.loc[0, 'x'] = 1 + expected = DataFrame(dict(x=[1], y=[np.nan])) + tm.assert_frame_equal(df, expected, check_dtype=False) diff --git a/pandas/tests/indexing/test_scalar.py b/pandas/tests/indexing/test_scalar.py new file mode 100644 index 0000000000000..7314ff6619049 --- /dev/null +++ b/pandas/tests/indexing/test_scalar.py @@ -0,0 +1,172 @@ +""" test scalar indexing, including at and iat """ + +import pytest + +import numpy as np + +from pandas import (Series, DataFrame, Timestamp, + Timedelta, date_range) +from pandas.util import testing as tm +from pandas.tests.indexing.common import Base + + +class TestScalar(Base): + + def test_at_and_iat_get(self): + def _check(f, func, values=False): + + if f is not None: + indicies = self.generate_indices(f, values) + for i in indicies: + result = getattr(f, func)[i] + expected = self.get_value(f, i, values) + tm.assert_almost_equal(result, expected) + + for o in self._objs: + + d = getattr(self, o) + + # iat + for f in [d['ints'], d['uints']]: + _check(f, 'iat', values=True) + + for f in [d['labels'], d['ts'], d['floats']]: + if f is not None: + pytest.raises(ValueError, self.check_values, f, 'iat') + + # at + for f in [d['ints'], d['uints'], d['labels'], + d['ts'], d['floats']]: + _check(f, 'at') + + def test_at_and_iat_set(self): + def _check(f, func, values=False): + + if f is not None: + indicies = self.generate_indices(f, values) + for i in indicies: + getattr(f, func)[i] = 1 + expected = self.get_value(f, i, values) + tm.assert_almost_equal(expected, 1) + + for t in self._objs: + + d = getattr(self, t) + + # iat + for f in [d['ints'], d['uints']]: + _check(f, 'iat', values=True) + + for f in [d['labels'], d['ts'], d['floats']]: + if f is not None: + pytest.raises(ValueError, _check, f, 'iat') + + # at + for f in [d['ints'], d['uints'], d['labels'], + d['ts'], d['floats']]: + _check(f, 'at') + + def test_at_iat_coercion(self): + + # as timestamp is not a tuple! 
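+ # Editor's note: besides positional equivalence, the checks below assert + # that scalar access boxes datetime-like values, e.g. s.iat[1] on a + # datetime64[ns] Series is expected to return Timestamp('2014-02-02') + # rather than a raw np.datetime64 (and Timedelta for timedelta64[ns]).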
+ dates = date_range('1/1/2000', periods=8) + df = DataFrame(np.random.randn(8, 4), + index=dates, + columns=['A', 'B', 'C', 'D']) + s = df['A'] + + result = s.at[dates[5]] + xp = s.values[5] + assert result == xp + + # GH 7729 + # make sure we are boxing the returns + s = Series(['2014-01-01', '2014-02-02'], dtype='datetime64[ns]') + expected = Timestamp('2014-02-02') + + for r in [lambda: s.iat[1], lambda: s.iloc[1]]: + result = r() + assert result == expected + + s = Series(['1 days', '2 days'], dtype='timedelta64[ns]') + expected = Timedelta('2 days') + + for r in [lambda: s.iat[1], lambda: s.iloc[1]]: + result = r() + assert result == expected + + def test_iat_invalid_args(self): + pass + + def test_imethods_with_dups(self): + + # GH6493 + # iat/iloc with dups + + s = Series(range(5), index=[1, 1, 2, 2, 3], dtype='int64') + result = s.iloc[2] + assert result == 2 + result = s.iat[2] + assert result == 2 + + pytest.raises(IndexError, lambda: s.iat[10]) + pytest.raises(IndexError, lambda: s.iat[-10]) + + result = s.iloc[[2, 3]] + expected = Series([2, 3], [2, 2], dtype='int64') + tm.assert_series_equal(result, expected) + + df = s.to_frame() + result = df.iloc[2] + expected = Series(2, index=[0], name=2) + tm.assert_series_equal(result, expected) + + result = df.iat[2, 0] + assert result == 2 + + def test_at_to_fail(self): + # at should not fallback + # GH 7814 + s = Series([1, 2, 3], index=list('abc')) + result = s.at['a'] + assert result == 1 + pytest.raises(ValueError, lambda: s.at[0]) + + df = DataFrame({'A': [1, 2, 3]}, index=list('abc')) + result = df.at['a', 'A'] + assert result == 1 + pytest.raises(ValueError, lambda: df.at['a', 0]) + + s = Series([1, 2, 3], index=[3, 2, 1]) + result = s.at[1] + assert result == 3 + pytest.raises(ValueError, lambda: s.at['a']) + + df = DataFrame({0: [1, 2, 3]}, index=[3, 2, 1]) + result = df.at[1, 0] + assert result == 3 + pytest.raises(ValueError, lambda: df.at['a', 0]) + + # GH 13822, incorrect error string with non-unique columns when missing + # column is accessed + df = DataFrame({'x': [1.], 'y': [2.], 'z': [3.]}) + df.columns = ['x', 'x', 'z'] + + # Check that we get the correct value in the KeyError + tm.assert_raises_regex(KeyError, r"\['y'\] not in index", + lambda: df[['x', 'y', 'z']]) + + def test_at_with_tz(self): + # gh-15822 + df = DataFrame({'name': ['John', 'Anderson'], + 'date': [Timestamp(2017, 3, 13, 13, 32, 56), + Timestamp(2017, 2, 16, 12, 10, 3)]}) + df['date'] = df['date'].dt.tz_localize('Asia/Shanghai') + + expected = Timestamp('2017-03-13 13:32:56+0800', tz='Asia/Shanghai') + + result = df.loc[0, 'date'] + assert result == expected + + result = df.at[0, 'date'] + assert result == expected diff --git a/pandas/tests/indexing/test_timedelta.py b/pandas/tests/indexing/test_timedelta.py new file mode 100644 index 0000000000000..32609362e49af --- /dev/null +++ b/pandas/tests/indexing/test_timedelta.py @@ -0,0 +1,49 @@ +import pytest + +import pandas as pd +from pandas.util import testing as tm + + +class TestTimedeltaIndexing(object): + def test_boolean_indexing(self): + # GH 14946 + df = pd.DataFrame({'x': range(10)}) + df.index = pd.to_timedelta(range(10), unit='s') + conditions = [df['x'] > 3, df['x'] == 3, df['x'] < 3] + expected_data = [[0, 1, 2, 3, 10, 10, 10, 10, 10, 10], + [0, 1, 2, 10, 4, 5, 6, 7, 8, 9], + [10, 10, 10, 3, 4, 5, 6, 7, 8, 9]] + for cond, data in zip(conditions, expected_data): + result = df.assign(x=df.mask(cond, 10).astype('int64')) + expected = pd.DataFrame(data, + index=pd.to_timedelta(range(10), 
unit='s'), + columns=['x'], + dtype='int64') + tm.assert_frame_equal(expected, result) + + @pytest.mark.parametrize( + "indexer, expected", + [(0, [20, 1, 2, 3, 4, 5, 6, 7, 8, 9]), + (slice(4, 8), [0, 1, 2, 3, 20, 20, 20, 20, 8, 9]), + ([3, 5], [0, 1, 2, 20, 4, 20, 6, 7, 8, 9])]) + def test_list_like_indexing(self, indexer, expected): + # GH 16637 + df = pd.DataFrame({'x': range(10)}, dtype="int64") + df.index = pd.to_timedelta(range(10), unit='s') + + df.loc[df.index[indexer], 'x'] = 20 + + expected = pd.DataFrame(expected, + index=pd.to_timedelta(range(10), unit='s'), + columns=['x'], + dtype="int64") + + tm.assert_frame_equal(expected, df) + + def test_string_indexing(self): + # GH 16896 + df = pd.DataFrame({'x': range(3)}, + index=pd.to_timedelta(range(3), unit='days')) + expected = df.iloc[0] + sliced = df.loc['0 days'] + tm.assert_series_equal(sliced, expected) diff --git a/vb_suite/source/_static/stub b/pandas/tests/internals/__init__.py similarity index 100% rename from vb_suite/source/_static/stub rename to pandas/tests/internals/__init__.py diff --git a/pandas/tests/internals/test_external_block.py b/pandas/tests/internals/test_external_block.py new file mode 100644 index 0000000000000..cccde76c3e1d9 --- /dev/null +++ b/pandas/tests/internals/test_external_block.py @@ -0,0 +1,29 @@ +# -*- coding: utf-8 -*- +# pylint: disable=W0102 + +import numpy as np + +import pandas as pd +from pandas.core.internals import Block, BlockManager, SingleBlockManager + + +class CustomBlock(Block): + + def formatting_values(self): + return np.array(["Val: {}".format(i) for i in self.values]) + + +def test_custom_repr(): + values = np.arange(3, dtype='int64') + + # series + block = CustomBlock(values, placement=slice(0, 3)) + + s = pd.Series(SingleBlockManager(block, pd.RangeIndex(3))) + assert repr(s) == '0 Val: 0\n1 Val: 1\n2 Val: 2\ndtype: int64' + + # dataframe + block = CustomBlock(values.reshape(1, -1), placement=slice(0, 1)) + blk_mgr = BlockManager([block], [['col'], range(3)]) + df = pd.DataFrame(blk_mgr) + assert repr(df) == ' col\n0 Val: 0\n1 Val: 1\n2 Val: 2' diff --git a/pandas/tests/test_internals.py b/pandas/tests/internals/test_internals.py similarity index 65% rename from pandas/tests/test_internals.py rename to pandas/tests/internals/test_internals.py index 5000d6d4510fb..0900d21b250ed 100644 --- a/pandas/tests/test_internals.py +++ b/pandas/tests/internals/test_internals.py @@ -2,32 +2,44 @@ # pylint: disable=W0102 from datetime import datetime, date - -import nose +import sys +import pytest import numpy as np import re +from distutils.version import LooseVersion import itertools from pandas import (Index, MultiIndex, DataFrame, DatetimeIndex, Series, Categorical) from pandas.compat import OrderedDict, lrange -from pandas.sparse.array import SparseArray +from pandas.core.sparse.array import SparseArray from pandas.core.internals import (BlockPlacement, SingleBlockManager, make_block, BlockManager) import pandas.core.algorithms as algos import pandas.util.testing as tm import pandas as pd -from pandas import lib +from pandas._libs import lib from pandas.util.testing import (assert_almost_equal, assert_frame_equal, randn, assert_series_equal) from pandas.compat import zip, u +# in 3.6.1 a c-api slicing function changed, see src/compat_helper.h +PY361 = sys.version >= LooseVersion('3.6.1') + + +@pytest.fixture +def mgr(): + return create_mgr( + 'a: f8; b: object; c: f8; d: object; e: f8;' + 'f: bool; g: i8; h: complex; i: datetime-1; j: datetime-2;' + 'k: M8[ns, US/Eastern]; l: M8[ns, 
CET];') + def assert_block_equal(left, right): tm.assert_numpy_array_equal(left.values, right.values) - assert (left.dtype == right.dtype) - tm.assertIsInstance(left.mgr_locs, lib.BlockPlacement) - tm.assertIsInstance(right.mgr_locs, lib.BlockPlacement) + assert left.dtype == right.dtype + assert isinstance(left.mgr_locs, lib.BlockPlacement) + assert isinstance(right.mgr_locs, lib.BlockPlacement) tm.assert_numpy_array_equal(left.mgr_locs.as_array, right.mgr_locs.as_array) @@ -180,11 +192,9 @@ def create_mgr(descr, item_shape=None): [mgr_items] + [np.arange(n) for n in item_shape]) -class TestBlock(tm.TestCase): +class TestBlock(object): - _multiprocess_can_split_ = True - - def setUp(self): + def setup_method(self, method): # self.fblock = get_float_ex() # a,c,e # self.cblock = get_complex_ex() # # self.oblock = get_obj_ex() @@ -199,11 +209,11 @@ def setUp(self): def test_constructor(self): int32block = create_block('i4', [0]) - self.assertEqual(int32block.dtype, np.int32) + assert int32block.dtype == np.int32 def test_pickle(self): def _check(blk): - assert_block_equal(self.round_trip_pickle(blk), blk) + assert_block_equal(tm.round_trip_pickle(blk), blk) _check(self.fblock) _check(self.cblock) @@ -211,14 +221,14 @@ def _check(blk): _check(self.bool_block) def test_mgr_locs(self): - tm.assertIsInstance(self.fblock.mgr_locs, lib.BlockPlacement) + assert isinstance(self.fblock.mgr_locs, lib.BlockPlacement) tm.assert_numpy_array_equal(self.fblock.mgr_locs.as_array, np.array([0, 2, 4], dtype=np.int64)) def test_attrs(self): - self.assertEqual(self.fblock.shape, self.fblock.values.shape) - self.assertEqual(self.fblock.dtype, self.fblock.values.dtype) - self.assertEqual(len(self.fblock), len(self.fblock.values)) + assert self.fblock.shape == self.fblock.values.shape + assert self.fblock.dtype == self.fblock.values.dtype + assert len(self.fblock) == len(self.fblock.values) def test_merge(self): avals = randn(2, 10) @@ -238,7 +248,7 @@ def test_merge(self): def test_copy(self): cop = self.fblock.copy() - self.assertIsNot(cop, self.fblock) + assert cop is not self.fblock assert_block_equal(self.fblock, cop) def test_reindex_index(self): @@ -253,104 +263,97 @@ def test_insert(self): def test_delete(self): newb = self.fblock.copy() newb.delete(0) - tm.assertIsInstance(newb.mgr_locs, lib.BlockPlacement) + assert isinstance(newb.mgr_locs, lib.BlockPlacement) tm.assert_numpy_array_equal(newb.mgr_locs.as_array, np.array([2, 4], dtype=np.int64)) - self.assertTrue((newb.values[0] == 1).all()) + assert (newb.values[0] == 1).all() newb = self.fblock.copy() newb.delete(1) - tm.assertIsInstance(newb.mgr_locs, lib.BlockPlacement) + assert isinstance(newb.mgr_locs, lib.BlockPlacement) tm.assert_numpy_array_equal(newb.mgr_locs.as_array, np.array([0, 4], dtype=np.int64)) - self.assertTrue((newb.values[1] == 2).all()) + assert (newb.values[1] == 2).all() newb = self.fblock.copy() newb.delete(2) tm.assert_numpy_array_equal(newb.mgr_locs.as_array, np.array([0, 2], dtype=np.int64)) - self.assertTrue((newb.values[1] == 1).all()) + assert (newb.values[1] == 1).all() newb = self.fblock.copy() - self.assertRaises(Exception, newb.delete, 3) + with pytest.raises(Exception): + newb.delete(3) def test_split_block_at(self): # with dup column support this method was taken out # GH3679 - raise nose.SkipTest("skipping for now") + pytest.skip("skipping for now") bs = list(self.fblock.split_block_at('a')) - self.assertEqual(len(bs), 1) - self.assertTrue(np.array_equal(bs[0].items, ['c', 'e'])) + assert len(bs) == 1 + assert 
np.array_equal(bs[0].items, ['c', 'e']) bs = list(self.fblock.split_block_at('c')) - self.assertEqual(len(bs), 2) - self.assertTrue(np.array_equal(bs[0].items, ['a'])) - self.assertTrue(np.array_equal(bs[1].items, ['e'])) + assert len(bs) == 2 + assert np.array_equal(bs[0].items, ['a']) + assert np.array_equal(bs[1].items, ['e']) bs = list(self.fblock.split_block_at('e')) - self.assertEqual(len(bs), 1) - self.assertTrue(np.array_equal(bs[0].items, ['a', 'c'])) + assert len(bs) == 1 + assert np.array_equal(bs[0].items, ['a', 'c']) # bblock = get_bool_ex(['f']) # bs = list(bblock.split_block_at('f')) - # self.assertEqual(len(bs), 0) + # assert len(bs) == 0 -class TestDatetimeBlock(tm.TestCase): - _multiprocess_can_split_ = True +class TestDatetimeBlock(object): def test_try_coerce_arg(self): block = create_block('datetime', [0]) # coerce None none_coerced = block._try_coerce_args(block.values, None)[2] - self.assertTrue(pd.Timestamp(none_coerced) is pd.NaT) + assert pd.Timestamp(none_coerced) is pd.NaT # coerce different types of date objects vals = (np.datetime64('2010-10-10'), datetime(2010, 10, 10), date(2010, 10, 10)) for val in vals: coerced = block._try_coerce_args(block.values, val)[2] - self.assertEqual(np.int64, type(coerced)) - self.assertEqual(pd.Timestamp('2010-10-10'), pd.Timestamp(coerced)) - + assert np.int64 == type(coerced) + assert pd.Timestamp('2010-10-10') == pd.Timestamp(coerced) -class TestBlockManager(tm.TestCase): - _multiprocess_can_split_ = True - def setUp(self): - self.mgr = create_mgr( - 'a: f8; b: object; c: f8; d: object; e: f8;' - 'f: bool; g: i8; h: complex; i: datetime-1; j: datetime-2;' - 'k: M8[ns, US/Eastern]; l: M8[ns, CET];') +class TestBlockManager(object): def test_constructor_corner(self): pass def test_attrs(self): mgr = create_mgr('a,b,c: f8-1; d,e,f: f8-2') - self.assertEqual(mgr.nblocks, 2) - self.assertEqual(len(mgr), 6) + assert mgr.nblocks == 2 + assert len(mgr) == 6 def test_is_mixed_dtype(self): - self.assertFalse(create_mgr('a,b:f8').is_mixed_type) - self.assertFalse(create_mgr('a:f8-1; b:f8-2').is_mixed_type) + assert not create_mgr('a,b:f8').is_mixed_type + assert not create_mgr('a:f8-1; b:f8-2').is_mixed_type - self.assertTrue(create_mgr('a,b:f8; c,d: f4').is_mixed_type) - self.assertTrue(create_mgr('a,b:f8; c,d: object').is_mixed_type) + assert create_mgr('a,b:f8; c,d: f4').is_mixed_type + assert create_mgr('a,b:f8; c,d: object').is_mixed_type def test_is_indexed_like(self): mgr1 = create_mgr('a,b: f8') mgr2 = create_mgr('a:i8; b:bool') mgr3 = create_mgr('a,b,c: f8') - self.assertTrue(mgr1._is_indexed_like(mgr1)) - self.assertTrue(mgr1._is_indexed_like(mgr2)) - self.assertTrue(mgr1._is_indexed_like(mgr3)) + assert mgr1._is_indexed_like(mgr1) + assert mgr1._is_indexed_like(mgr2) + assert mgr1._is_indexed_like(mgr3) - self.assertFalse(mgr1._is_indexed_like(mgr1.get_slice( - slice(-1), axis=1))) + assert not mgr1._is_indexed_like(mgr1.get_slice( + slice(-1), axis=1)) def test_duplicate_ref_loc_failure(self): tmp_mgr = create_mgr('a:bool; a: f8') @@ -359,61 +362,63 @@ def test_duplicate_ref_loc_failure(self): blocks[0].mgr_locs = np.array([0]) blocks[1].mgr_locs = np.array([0]) + # test trying to create block manager with overlapping ref locs - self.assertRaises(AssertionError, BlockManager, blocks, axes) + with pytest.raises(AssertionError): + BlockManager(blocks, axes) blocks[0].mgr_locs = np.array([0]) blocks[1].mgr_locs = np.array([1]) mgr = BlockManager(blocks, axes) mgr.iget(1) - def test_contains(self): - self.assertIn('a', self.mgr)
- self.assertNotIn('baz', self.mgr) + def test_contains(self, mgr): + assert 'a' in mgr + assert 'baz' not in mgr - def test_pickle(self): + def test_pickle(self, mgr): - mgr2 = self.round_trip_pickle(self.mgr) - assert_frame_equal(DataFrame(self.mgr), DataFrame(mgr2)) + mgr2 = tm.round_trip_pickle(mgr) + assert_frame_equal(DataFrame(mgr), DataFrame(mgr2)) # share ref_items - # self.assertIs(mgr2.blocks[0].ref_items, mgr2.blocks[1].ref_items) + # assert mgr2.blocks[0].ref_items is mgr2.blocks[1].ref_items # GH2431 - self.assertTrue(hasattr(mgr2, "_is_consolidated")) - self.assertTrue(hasattr(mgr2, "_known_consolidated")) + assert hasattr(mgr2, "_is_consolidated") + assert hasattr(mgr2, "_known_consolidated") # reset to False on load - self.assertFalse(mgr2._is_consolidated) - self.assertFalse(mgr2._known_consolidated) + assert not mgr2._is_consolidated + assert not mgr2._known_consolidated def test_non_unique_pickle(self): mgr = create_mgr('a,a,a:f8') - mgr2 = self.round_trip_pickle(mgr) + mgr2 = tm.round_trip_pickle(mgr) assert_frame_equal(DataFrame(mgr), DataFrame(mgr2)) mgr = create_mgr('a: f8; a: i8') - mgr2 = self.round_trip_pickle(mgr) + mgr2 = tm.round_trip_pickle(mgr) assert_frame_equal(DataFrame(mgr), DataFrame(mgr2)) def test_categorical_block_pickle(self): mgr = create_mgr('a: category') - mgr2 = self.round_trip_pickle(mgr) + mgr2 = tm.round_trip_pickle(mgr) assert_frame_equal(DataFrame(mgr), DataFrame(mgr2)) smgr = create_single_mgr('category') - smgr2 = self.round_trip_pickle(smgr) + smgr2 = tm.round_trip_pickle(smgr) assert_series_equal(Series(smgr), Series(smgr2)) - def test_get_scalar(self): - for item in self.mgr.items: - for i, index in enumerate(self.mgr.axes[1]): - res = self.mgr.get_scalar((item, index)) - exp = self.mgr.get(item, fastpath=False)[i] - self.assertEqual(res, exp) - exp = self.mgr.get(item).internal_values()[i] - self.assertEqual(res, exp) + def test_get_scalar(self, mgr): + for item in mgr.items: + for i, index in enumerate(mgr.axes[1]): + res = mgr.get_scalar((item, index)) + exp = mgr.get(item, fastpath=False)[i] + assert res == exp + exp = mgr.get(item).internal_values()[i] + assert res == exp def test_get(self): cols = Index(list('abc')) @@ -442,30 +447,21 @@ def test_set(self): tm.assert_numpy_array_equal(mgr.get('d').internal_values(), np.array(['foo'] * 3, dtype=np.object_)) - def test_insert(self): - self.mgr.insert(0, 'inserted', np.arange(N)) - - self.assertEqual(self.mgr.items[0], 'inserted') - assert_almost_equal(self.mgr.get('inserted'), np.arange(N)) - - for blk in self.mgr.blocks: - yield self.assertIs, self.mgr.items, blk.ref_items + def test_set_change_dtype(self, mgr): + mgr.set('baz', np.zeros(N, dtype=bool)) - def test_set_change_dtype(self): - self.mgr.set('baz', np.zeros(N, dtype=bool)) + mgr.set('baz', np.repeat('foo', N)) + assert mgr.get('baz').dtype == np.object_ - self.mgr.set('baz', np.repeat('foo', N)) - self.assertEqual(self.mgr.get('baz').dtype, np.object_) - - mgr2 = self.mgr.consolidate() + mgr2 = mgr.consolidate() mgr2.set('baz', np.repeat('foo', N)) - self.assertEqual(mgr2.get('baz').dtype, np.object_) + assert mgr2.get('baz').dtype == np.object_ mgr2.set('quux', randn(N).astype(int)) - self.assertEqual(mgr2.get('quux').dtype, np.int_) + assert mgr2.get('quux').dtype == np.int_ mgr2.set('quux', randn(N)) - self.assertEqual(mgr2.get('quux').dtype, np.float_) + assert mgr2.get('quux').dtype == np.float_ def test_set_change_dtype_slice(self): # GH8850 cols = MultiIndex.from_tuples([('1st', 'a'), ('2nd', 'b'), ('3rd', 'c') 
@@ -473,70 +469,69 @@ def test_set_change_dtype_slice(self): # GH8850 df = DataFrame([[1.0, 2, 3], [4.0, 5, 6]], columns=cols) df['2nd'] = df['2nd'] * 2.0 - self.assertEqual(sorted(df.blocks.keys()), ['float64', 'int64']) + assert sorted(df.blocks.keys()) == ['float64', 'int64'] assert_frame_equal(df.blocks['float64'], DataFrame( [[1.0, 4.0], [4.0, 10.0]], columns=cols[:2])) assert_frame_equal(df.blocks['int64'], DataFrame( [[3], [6]], columns=cols[2:])) - def test_copy(self): - cp = self.mgr.copy(deep=False) - for blk, cp_blk in zip(self.mgr.blocks, cp.blocks): + def test_copy(self, mgr): + cp = mgr.copy(deep=False) + for blk, cp_blk in zip(mgr.blocks, cp.blocks): # view assertion - self.assertTrue(cp_blk.equals(blk)) - self.assertTrue(cp_blk.values.base is blk.values.base) + assert cp_blk.equals(blk) + assert cp_blk.values.base is blk.values.base - cp = self.mgr.copy(deep=True) - for blk, cp_blk in zip(self.mgr.blocks, cp.blocks): + cp = mgr.copy(deep=True) + for blk, cp_blk in zip(mgr.blocks, cp.blocks): # copy assertion we either have a None for a base or in case of # some blocks it is an array (e.g. datetimetz), but was copied - self.assertTrue(cp_blk.equals(blk)) + assert cp_blk.equals(blk) if cp_blk.values.base is not None and blk.values.base is not None: - self.assertFalse(cp_blk.values.base is blk.values.base) + assert cp_blk.values.base is not blk.values.base else: - self.assertTrue(cp_blk.values.base is None and blk.values.base - is None) + assert cp_blk.values.base is None and blk.values.base is None def test_sparse(self): mgr = create_mgr('a: sparse-1; b: sparse-2') # what to test here? - self.assertEqual(mgr.as_matrix().dtype, np.float64) + assert mgr.as_matrix().dtype == np.float64 def test_sparse_mixed(self): mgr = create_mgr('a: sparse-1; b: sparse-2; c: f8') - self.assertEqual(len(mgr.blocks), 3) - self.assertIsInstance(mgr, BlockManager) + assert len(mgr.blocks) == 3 + assert isinstance(mgr, BlockManager) # what to test here? 
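+ # NB: the spec strings passed to create_mgr throughout this class follow
+ # a small convention, inferred from the assertions above (a sketch, not
+ # part of this patch): 'name: dtype' declares one column, comma-separated
+ # names share a block, and a '-N' suffix forces distinct blocks of the
+ # same dtype. For example:
+ #   mgr = create_mgr('a,b: f8-1; c: f8-2')  # three float64 columns...
+ #   assert mgr.nblocks == 2                 # ...in two separate f8 blocks
+ #   assert len(mgr) == 3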
def test_as_matrix_float(self): mgr = create_mgr('c: f4; d: f2; e: f8') - self.assertEqual(mgr.as_matrix().dtype, np.float64) + assert mgr.as_matrix().dtype == np.float64 mgr = create_mgr('c: f4; d: f2') - self.assertEqual(mgr.as_matrix().dtype, np.float32) + assert mgr.as_matrix().dtype == np.float32 def test_as_matrix_int_bool(self): mgr = create_mgr('a: bool-1; b: bool-2') - self.assertEqual(mgr.as_matrix().dtype, np.bool_) + assert mgr.as_matrix().dtype == np.bool_ mgr = create_mgr('a: i8-1; b: i8-2; c: i4; d: i2; e: u1') - self.assertEqual(mgr.as_matrix().dtype, np.int64) + assert mgr.as_matrix().dtype == np.int64 mgr = create_mgr('c: i4; d: i2; e: u1') - self.assertEqual(mgr.as_matrix().dtype, np.int32) + assert mgr.as_matrix().dtype == np.int32 def test_as_matrix_datetime(self): mgr = create_mgr('h: datetime-1; g: datetime-2') - self.assertEqual(mgr.as_matrix().dtype, 'M8[ns]') + assert mgr.as_matrix().dtype == 'M8[ns]' def test_as_matrix_datetime_tz(self): mgr = create_mgr('h: M8[ns, US/Eastern]; g: M8[ns, CET]') - self.assertEqual(mgr.get('h').dtype, 'datetime64[ns, US/Eastern]') - self.assertEqual(mgr.get('g').dtype, 'datetime64[ns, CET]') - self.assertEqual(mgr.as_matrix().dtype, 'object') + assert mgr.get('h').dtype == 'datetime64[ns, US/Eastern]' + assert mgr.get('g').dtype == 'datetime64[ns, CET]' + assert mgr.as_matrix().dtype == 'object' def test_astype(self): # coerce all @@ -544,9 +539,9 @@ def test_astype(self): for t in ['float16', 'float32', 'float64', 'int32', 'int64']: t = np.dtype(t) tmgr = mgr.astype(t) - self.assertEqual(tmgr.get('c').dtype.type, t) - self.assertEqual(tmgr.get('d').dtype.type, t) - self.assertEqual(tmgr.get('e').dtype.type, t) + assert tmgr.get('c').dtype.type == t + assert tmgr.get('d').dtype.type == t + assert tmgr.get('e').dtype.type == t # mixed mgr = create_mgr('a,b: object; c: bool; d: datetime;' @@ -554,24 +549,24 @@ def test_astype(self): for t in ['float16', 'float32', 'float64', 'int32', 'int64']: t = np.dtype(t) tmgr = mgr.astype(t, errors='ignore') - self.assertEqual(tmgr.get('c').dtype.type, t) - self.assertEqual(tmgr.get('e').dtype.type, t) - self.assertEqual(tmgr.get('f').dtype.type, t) - self.assertEqual(tmgr.get('g').dtype.type, t) + assert tmgr.get('c').dtype.type == t + assert tmgr.get('e').dtype.type == t + assert tmgr.get('f').dtype.type == t + assert tmgr.get('g').dtype.type == t - self.assertEqual(tmgr.get('a').dtype.type, np.object_) - self.assertEqual(tmgr.get('b').dtype.type, np.object_) + assert tmgr.get('a').dtype.type == np.object_ + assert tmgr.get('b').dtype.type == np.object_ if t != np.int64: - self.assertEqual(tmgr.get('d').dtype.type, np.datetime64) + assert tmgr.get('d').dtype.type == np.datetime64 else: - self.assertEqual(tmgr.get('d').dtype.type, t) + assert tmgr.get('d').dtype.type == t def test_convert(self): def _compare(old_mgr, new_mgr): """ compare the blocks, numeric compare ==, object don't """ old_blocks = set(old_mgr.blocks) new_blocks = set(new_mgr.blocks) - self.assertEqual(len(old_blocks), len(new_blocks)) + assert len(old_blocks) == len(new_blocks) # compare non-numeric for b in old_blocks: @@ -580,7 +575,7 @@ def _compare(old_mgr, new_mgr): if (b.values == nb.values).all(): found = True break - self.assertTrue(found) + assert found for b in new_blocks: found = False @@ -588,7 +583,7 @@ def _compare(old_mgr, new_mgr): if (b.values == ob.values).all(): found = True break - self.assertTrue(found) + assert found # noops mgr = create_mgr('f: i8; g: f8') @@ -605,11 +600,11 @@ def _compare(old_mgr, 
new_mgr): mgr.set('b', np.array(['2.'] * N, dtype=np.object_)) mgr.set('foo', np.array(['foo.'] * N, dtype=np.object_)) new_mgr = mgr.convert(numeric=True) - self.assertEqual(new_mgr.get('a').dtype, np.int64) - self.assertEqual(new_mgr.get('b').dtype, np.float64) - self.assertEqual(new_mgr.get('foo').dtype, np.object_) - self.assertEqual(new_mgr.get('f').dtype, np.int64) - self.assertEqual(new_mgr.get('g').dtype, np.float64) + assert new_mgr.get('a').dtype == np.int64 + assert new_mgr.get('b').dtype == np.float64 + assert new_mgr.get('foo').dtype == np.object_ + assert new_mgr.get('f').dtype == np.int64 + assert new_mgr.get('g').dtype == np.float64 mgr = create_mgr('a,b,foo: object; f: i4; bool: bool; dt: datetime;' 'i: i8; g: f8; h: f2') @@ -617,15 +612,15 @@ def _compare(old_mgr, new_mgr): mgr.set('b', np.array(['2.'] * N, dtype=np.object_)) mgr.set('foo', np.array(['foo.'] * N, dtype=np.object_)) new_mgr = mgr.convert(numeric=True) - self.assertEqual(new_mgr.get('a').dtype, np.int64) - self.assertEqual(new_mgr.get('b').dtype, np.float64) - self.assertEqual(new_mgr.get('foo').dtype, np.object_) - self.assertEqual(new_mgr.get('f').dtype, np.int32) - self.assertEqual(new_mgr.get('bool').dtype, np.bool_) - self.assertEqual(new_mgr.get('dt').dtype.type, np.datetime64) - self.assertEqual(new_mgr.get('i').dtype, np.int64) - self.assertEqual(new_mgr.get('g').dtype, np.float64) - self.assertEqual(new_mgr.get('h').dtype, np.float16) + assert new_mgr.get('a').dtype == np.int64 + assert new_mgr.get('b').dtype == np.float64 + assert new_mgr.get('foo').dtype == np.object_ + assert new_mgr.get('f').dtype == np.int32 + assert new_mgr.get('bool').dtype == np.bool_ + assert new_mgr.get('dt').dtype.type == np.datetime64 + assert new_mgr.get('i').dtype == np.int64 + assert new_mgr.get('g').dtype == np.float64 + assert new_mgr.get('h').dtype == np.float16 def test_interleave(self): @@ -633,49 +628,49 @@ def test_interleave(self): for dtype in ['f8', 'i8', 'object', 'bool', 'complex', 'M8[ns]', 'm8[ns]']: mgr = create_mgr('a: {0}'.format(dtype)) - self.assertEqual(mgr.as_matrix().dtype, dtype) + assert mgr.as_matrix().dtype == dtype mgr = create_mgr('a: {0}; b: {0}'.format(dtype)) - self.assertEqual(mgr.as_matrix().dtype, dtype) + assert mgr.as_matrix().dtype == dtype # will be converted according to the actual dtype of the underlying mgr = create_mgr('a: category') - self.assertEqual(mgr.as_matrix().dtype, 'i8') + assert mgr.as_matrix().dtype == 'i8' mgr = create_mgr('a: category; b: category') - self.assertEqual(mgr.as_matrix().dtype, 'i8'), + assert mgr.as_matrix().dtype == 'i8' mgr = create_mgr('a: category; b: category2') - self.assertEqual(mgr.as_matrix().dtype, 'object') + assert mgr.as_matrix().dtype == 'object' mgr = create_mgr('a: category2') - self.assertEqual(mgr.as_matrix().dtype, 'object') + assert mgr.as_matrix().dtype == 'object' mgr = create_mgr('a: category2; b: category2') - self.assertEqual(mgr.as_matrix().dtype, 'object') + assert mgr.as_matrix().dtype == 'object' # combinations mgr = create_mgr('a: f8') - self.assertEqual(mgr.as_matrix().dtype, 'f8') + assert mgr.as_matrix().dtype == 'f8' mgr = create_mgr('a: f8; b: i8') - self.assertEqual(mgr.as_matrix().dtype, 'f8') + assert mgr.as_matrix().dtype == 'f8' mgr = create_mgr('a: f4; b: i8') - self.assertEqual(mgr.as_matrix().dtype, 'f4') + assert mgr.as_matrix().dtype == 'f8' mgr = create_mgr('a: f4; b: i8; d: object') - self.assertEqual(mgr.as_matrix().dtype, 'object') + assert mgr.as_matrix().dtype == 'object' mgr = create_mgr('a: bool; b: 
i8') - self.assertEqual(mgr.as_matrix().dtype, 'object') + assert mgr.as_matrix().dtype == 'object' mgr = create_mgr('a: complex') - self.assertEqual(mgr.as_matrix().dtype, 'complex') + assert mgr.as_matrix().dtype == 'complex' mgr = create_mgr('a: f8; b: category') - self.assertEqual(mgr.as_matrix().dtype, 'object') + assert mgr.as_matrix().dtype == 'object' mgr = create_mgr('a: M8[ns]; b: category') - self.assertEqual(mgr.as_matrix().dtype, 'object') + assert mgr.as_matrix().dtype == 'object' mgr = create_mgr('a: M8[ns]; b: bool') - self.assertEqual(mgr.as_matrix().dtype, 'object') + assert mgr.as_matrix().dtype == 'object' mgr = create_mgr('a: M8[ns]; b: i8') - self.assertEqual(mgr.as_matrix().dtype, 'object') + assert mgr.as_matrix().dtype == 'object' mgr = create_mgr('a: m8[ns]; b: bool') - self.assertEqual(mgr.as_matrix().dtype, 'object') + assert mgr.as_matrix().dtype == 'object' mgr = create_mgr('a: m8[ns]; b: i8') - self.assertEqual(mgr.as_matrix().dtype, 'object') + assert mgr.as_matrix().dtype == 'object' mgr = create_mgr('a: M8[ns]; b: m8[ns]') - self.assertEqual(mgr.as_matrix().dtype, 'object') + assert mgr.as_matrix().dtype == 'object' def test_interleave_non_unique_cols(self): df = DataFrame([ @@ -686,26 +681,26 @@ def test_interleave_non_unique_cols(self): df_unique = df.copy() df_unique.columns = ['x', 'y'] - self.assertEqual(df_unique.values.shape, df.values.shape) + assert df_unique.values.shape == df.values.shape tm.assert_numpy_array_equal(df_unique.values[0], df.values[0]) tm.assert_numpy_array_equal(df_unique.values[1], df.values[1]) def test_consolidate(self): pass - def test_consolidate_ordering_issues(self): - self.mgr.set('f', randn(N)) - self.mgr.set('d', randn(N)) - self.mgr.set('b', randn(N)) - self.mgr.set('g', randn(N)) - self.mgr.set('h', randn(N)) - - # we have datetime/tz blocks in self.mgr - cons = self.mgr.consolidate() - self.assertEqual(cons.nblocks, 4) - cons = self.mgr.consolidate().get_numeric_data() - self.assertEqual(cons.nblocks, 1) - tm.assertIsInstance(cons.blocks[0].mgr_locs, lib.BlockPlacement) + def test_consolidate_ordering_issues(self, mgr): + mgr.set('f', randn(N)) + mgr.set('d', randn(N)) + mgr.set('b', randn(N)) + mgr.set('g', randn(N)) + mgr.set('h', randn(N)) + + # we have datetime/tz blocks in mgr + cons = mgr.consolidate() + assert cons.nblocks == 4 + cons = mgr.consolidate().get_numeric_data() + assert cons.nblocks == 1 + assert isinstance(cons.blocks[0].mgr_locs, lib.BlockPlacement) tm.assert_numpy_array_equal(cons.blocks[0].mgr_locs.as_array, np.arange(len(cons.items), dtype=np.int64)) @@ -718,7 +713,7 @@ def test_reindex_items(self): 'f: bool; g: f8-2') reindexed = mgr.reindex_axis(['g', 'c', 'a', 'd'], axis=0) - self.assertEqual(reindexed.nblocks, 2) + assert reindexed.nblocks == 2 tm.assert_index_equal(reindexed.items, pd.Index(['g', 'c', 'a', 'd'])) assert_almost_equal( mgr.get('g', fastpath=False), reindexed.get('g', fastpath=False)) @@ -752,9 +747,9 @@ def test_multiindex_xs(self): mgr.set_axis(1, index) result = mgr.xs('bar', axis=1) - self.assertEqual(result.shape, (6, 2)) - self.assertEqual(result.axes[1][0], ('bar', 'one')) - self.assertEqual(result.axes[1][1], ('bar', 'two')) + assert result.shape == (6, 2) + assert result.axes[1][0] == ('bar', 'one') + assert result.axes[1][1] == ('bar', 'two') def test_get_numeric_data(self): mgr = create_mgr('int: int; float: float; complex: complex;' @@ -830,11 +825,11 @@ def test_equals(self): # unique items bm1 = create_mgr('a,b,c: i8-1; d,e,f: i8-2') bm2 = 
BlockManager(bm1.blocks[::-1], bm1.axes) - self.assertTrue(bm1.equals(bm2)) + assert bm1.equals(bm2) bm1 = create_mgr('a,a,a: i8-1; b,b,b: i8-2') bm2 = BlockManager(bm1.blocks[::-1], bm1.axes) - self.assertTrue(bm1.equals(bm2)) + assert bm1.equals(bm2) def test_equals_block_order_different_dtypes(self): # GH 9330 @@ -852,19 +847,19 @@ def test_equals_block_order_different_dtypes(self): block_perms = itertools.permutations(bm.blocks) for bm_perm in block_perms: bm_this = BlockManager(bm_perm, bm.axes) - self.assertTrue(bm.equals(bm_this)) - self.assertTrue(bm_this.equals(bm)) + assert bm.equals(bm_this) + assert bm_this.equals(bm) def test_single_mgr_ctor(self): mgr = create_single_mgr('f8', num_rows=5) - self.assertEqual(mgr.as_matrix().tolist(), [0., 1., 2., 3., 4.]) + assert mgr.as_matrix().tolist() == [0., 1., 2., 3., 4.] def test_validate_bool_args(self): invalid_values = [1, "True", [1, 2, 3], 5.0] bm1 = create_mgr('a,b,c: i8-1; d,e,f: i8-2') for value in invalid_values: - with self.assertRaises(ValueError): + with pytest.raises(ValueError): bm1.replace_list([1], [2], inplace=value) @@ -922,32 +917,37 @@ def assert_slice_ok(mgr, axis, slobj): for mgr in self.MANAGERS: for ax in range(mgr.ndim): # slice - yield assert_slice_ok, mgr, ax, slice(None) - yield assert_slice_ok, mgr, ax, slice(3) - yield assert_slice_ok, mgr, ax, slice(100) - yield assert_slice_ok, mgr, ax, slice(1, 4) - yield assert_slice_ok, mgr, ax, slice(3, 0, -2) + assert_slice_ok(mgr, ax, slice(None)) + assert_slice_ok(mgr, ax, slice(3)) + assert_slice_ok(mgr, ax, slice(100)) + assert_slice_ok(mgr, ax, slice(1, 4)) + assert_slice_ok(mgr, ax, slice(3, 0, -2)) # boolean mask - yield assert_slice_ok, mgr, ax, np.array([], dtype=np.bool_) - yield (assert_slice_ok, mgr, ax, - np.ones(mgr.shape[ax], dtype=np.bool_)) - yield (assert_slice_ok, mgr, ax, - np.zeros(mgr.shape[ax], dtype=np.bool_)) + assert_slice_ok( + mgr, ax, np.array([], dtype=np.bool_)) + assert_slice_ok( + mgr, ax, + np.ones(mgr.shape[ax], dtype=np.bool_)) + assert_slice_ok( + mgr, ax, + np.zeros(mgr.shape[ax], dtype=np.bool_)) if mgr.shape[ax] >= 3: - yield (assert_slice_ok, mgr, ax, - np.arange(mgr.shape[ax]) % 3 == 0) - yield (assert_slice_ok, mgr, ax, np.array( - [True, True, False], dtype=np.bool_)) + assert_slice_ok( + mgr, ax, + np.arange(mgr.shape[ax]) % 3 == 0) + assert_slice_ok( + mgr, ax, np.array( + [True, True, False], dtype=np.bool_)) # fancy indexer - yield assert_slice_ok, mgr, ax, [] - yield assert_slice_ok, mgr, ax, lrange(mgr.shape[ax]) + assert_slice_ok(mgr, ax, []) + assert_slice_ok(mgr, ax, lrange(mgr.shape[ax])) if mgr.shape[ax] >= 3: - yield assert_slice_ok, mgr, ax, [0, 1, 2] - yield assert_slice_ok, mgr, ax, [-1, -2, -3] + assert_slice_ok(mgr, ax, [0, 1, 2]) + assert_slice_ok(mgr, ax, [-1, -2, -3]) def test_take(self): def assert_take_ok(mgr, axis, indexer): @@ -961,13 +961,13 @@ def assert_take_ok(mgr, axis, indexer): for mgr in self.MANAGERS: for ax in range(mgr.ndim): # take/fancy indexer - yield assert_take_ok, mgr, ax, [] - yield assert_take_ok, mgr, ax, [0, 0, 0] - yield assert_take_ok, mgr, ax, lrange(mgr.shape[ax]) + assert_take_ok(mgr, ax, []) + assert_take_ok(mgr, ax, [0, 0, 0]) + assert_take_ok(mgr, ax, lrange(mgr.shape[ax])) if mgr.shape[ax] >= 3: - yield assert_take_ok, mgr, ax, [0, 1, 2] - yield assert_take_ok, mgr, ax, [-1, -2, -3] + assert_take_ok(mgr, ax, [0, 1, 2]) + assert_take_ok(mgr, ax, [-1, -2, -3]) def test_reindex_axis(self): def assert_reindex_axis_is_ok(mgr, axis, new_labels, fill_value): @@ -985,25 +985,33 
@@ def assert_reindex_axis_is_ok(mgr, axis, new_labels, fill_value): for mgr in self.MANAGERS: for ax in range(mgr.ndim): for fill_value in (None, np.nan, 100.): - yield (assert_reindex_axis_is_ok, mgr, ax, - pd.Index([]), fill_value) - yield (assert_reindex_axis_is_ok, mgr, ax, mgr.axes[ax], - fill_value) - yield (assert_reindex_axis_is_ok, mgr, ax, - mgr.axes[ax][[0, 0, 0]], fill_value) - yield (assert_reindex_axis_is_ok, mgr, ax, - pd.Index(['foo', 'bar', 'baz']), fill_value) - yield (assert_reindex_axis_is_ok, mgr, ax, - pd.Index(['foo', mgr.axes[ax][0], 'baz']), - fill_value) + assert_reindex_axis_is_ok( + mgr, ax, + pd.Index([]), fill_value) + assert_reindex_axis_is_ok( + mgr, ax, mgr.axes[ax], + fill_value) + assert_reindex_axis_is_ok( + mgr, ax, + mgr.axes[ax][[0, 0, 0]], fill_value) + assert_reindex_axis_is_ok( + mgr, ax, + pd.Index(['foo', 'bar', 'baz']), fill_value) + assert_reindex_axis_is_ok( + mgr, ax, + pd.Index(['foo', mgr.axes[ax][0], 'baz']), + fill_value) if mgr.shape[ax] >= 3: - yield (assert_reindex_axis_is_ok, mgr, ax, - mgr.axes[ax][:-3], fill_value) - yield (assert_reindex_axis_is_ok, mgr, ax, - mgr.axes[ax][-3::-1], fill_value) - yield (assert_reindex_axis_is_ok, mgr, ax, - mgr.axes[ax][[0, 1, 2, 0, 1, 2]], fill_value) + assert_reindex_axis_is_ok( + mgr, ax, + mgr.axes[ax][:-3], fill_value) + assert_reindex_axis_is_ok( + mgr, ax, + mgr.axes[ax][-3::-1], fill_value) + assert_reindex_axis_is_ok( + mgr, ax, + mgr.axes[ax][[0, 1, 2, 0, 1, 2]], fill_value) def test_reindex_indexer(self): @@ -1022,33 +1030,41 @@ def assert_reindex_indexer_is_ok(mgr, axis, new_labels, indexer, for mgr in self.MANAGERS: for ax in range(mgr.ndim): for fill_value in (None, np.nan, 100.): - yield (assert_reindex_indexer_is_ok, mgr, ax, - pd.Index([]), [], fill_value) - yield (assert_reindex_indexer_is_ok, mgr, ax, - mgr.axes[ax], np.arange(mgr.shape[ax]), fill_value) - yield (assert_reindex_indexer_is_ok, mgr, ax, - pd.Index(['foo'] * mgr.shape[ax]), - np.arange(mgr.shape[ax]), fill_value) - - yield (assert_reindex_indexer_is_ok, mgr, ax, - mgr.axes[ax][::-1], np.arange(mgr.shape[ax]), - fill_value) - yield (assert_reindex_indexer_is_ok, mgr, ax, mgr.axes[ax], - np.arange(mgr.shape[ax])[::-1], fill_value) - yield (assert_reindex_indexer_is_ok, mgr, ax, - pd.Index(['foo', 'bar', 'baz']), - [0, 0, 0], fill_value) - yield (assert_reindex_indexer_is_ok, mgr, ax, - pd.Index(['foo', 'bar', 'baz']), - [-1, 0, -1], fill_value) - yield (assert_reindex_indexer_is_ok, mgr, ax, - pd.Index(['foo', mgr.axes[ax][0], 'baz']), - [-1, -1, -1], fill_value) + assert_reindex_indexer_is_ok( + mgr, ax, + pd.Index([]), [], fill_value) + assert_reindex_indexer_is_ok( + mgr, ax, + mgr.axes[ax], np.arange(mgr.shape[ax]), fill_value) + assert_reindex_indexer_is_ok( + mgr, ax, + pd.Index(['foo'] * mgr.shape[ax]), + np.arange(mgr.shape[ax]), fill_value) + assert_reindex_indexer_is_ok( + mgr, ax, + mgr.axes[ax][::-1], np.arange(mgr.shape[ax]), + fill_value) + assert_reindex_indexer_is_ok( + mgr, ax, mgr.axes[ax], + np.arange(mgr.shape[ax])[::-1], fill_value) + assert_reindex_indexer_is_ok( + mgr, ax, + pd.Index(['foo', 'bar', 'baz']), + [0, 0, 0], fill_value) + assert_reindex_indexer_is_ok( + mgr, ax, + pd.Index(['foo', 'bar', 'baz']), + [-1, 0, -1], fill_value) + assert_reindex_indexer_is_ok( + mgr, ax, + pd.Index(['foo', mgr.axes[ax][0], 'baz']), + [-1, -1, -1], fill_value) if mgr.shape[ax] >= 3: - yield (assert_reindex_indexer_is_ok, mgr, ax, - pd.Index(['foo', 'bar', 'baz']), - [0, 1, 2], fill_value) + 
assert_reindex_indexer_is_ok( + mgr, ax, + pd.Index(['foo', 'bar', 'baz']), + [0, 1, 2], fill_value) # test_get_slice(slice_like, axis) # take(indexer, axis) @@ -1056,25 +1072,26 @@ def assert_reindex_indexer_is_ok(mgr, axis, new_labels, indexer, # reindex_indexer(new_labels, indexer, axis) -class TestBlockPlacement(tm.TestCase): - _multiprocess_can_split_ = True +class TestBlockPlacement(object): def test_slice_len(self): - self.assertEqual(len(BlockPlacement(slice(0, 4))), 4) - self.assertEqual(len(BlockPlacement(slice(0, 4, 2))), 2) - self.assertEqual(len(BlockPlacement(slice(0, 3, 2))), 2) + assert len(BlockPlacement(slice(0, 4))) == 4 + assert len(BlockPlacement(slice(0, 4, 2))) == 2 + assert len(BlockPlacement(slice(0, 3, 2))) == 2 - self.assertEqual(len(BlockPlacement(slice(0, 1, 2))), 1) - self.assertEqual(len(BlockPlacement(slice(1, 0, -1))), 1) + assert len(BlockPlacement(slice(0, 1, 2))) == 1 + assert len(BlockPlacement(slice(1, 0, -1))) == 1 def test_zero_step_raises(self): - self.assertRaises(ValueError, BlockPlacement, slice(1, 1, 0)) - self.assertRaises(ValueError, BlockPlacement, slice(1, 2, 0)) + with pytest.raises(ValueError): + BlockPlacement(slice(1, 1, 0)) + with pytest.raises(ValueError): + BlockPlacement(slice(1, 2, 0)) def test_unbounded_slice_raises(self): def assert_unbounded_slice_error(slc): - self.assertRaisesRegexp(ValueError, "unbounded slice", - lambda: BlockPlacement(slc)) + tm.assert_raises_regex(ValueError, "unbounded slice", + lambda: BlockPlacement(slc)) assert_unbounded_slice_error(slice(None, None)) assert_unbounded_slice_error(slice(10, None)) @@ -1092,7 +1109,7 @@ def assert_unbounded_slice_error(slc): def test_not_slice_like_slices(self): def assert_not_slice_like(slc): - self.assertTrue(not BlockPlacement(slc).is_slice_like) + assert not BlockPlacement(slc).is_slice_like assert_not_slice_like(slice(0, 0)) assert_not_slice_like(slice(100, 0)) @@ -1100,12 +1117,12 @@ def assert_not_slice_like(slc): assert_not_slice_like(slice(100, 100, -1)) assert_not_slice_like(slice(0, 100, -1)) - self.assertTrue(not BlockPlacement(slice(0, 0)).is_slice_like) - self.assertTrue(not BlockPlacement(slice(100, 100)).is_slice_like) + assert not BlockPlacement(slice(0, 0)).is_slice_like + assert not BlockPlacement(slice(100, 100)).is_slice_like def test_array_to_slice_conversion(self): def assert_as_slice_equals(arr, slc): - self.assertEqual(BlockPlacement(arr).as_slice, slc) + assert BlockPlacement(arr).as_slice == slc assert_as_slice_equals([0], slice(0, 1, 1)) assert_as_slice_equals([100], slice(100, 101, 1)) @@ -1115,12 +1132,14 @@ def assert_as_slice_equals(arr, slc): assert_as_slice_equals([0, 100], slice(0, 200, 100)) assert_as_slice_equals([2, 1], slice(2, 0, -1)) - assert_as_slice_equals([2, 1, 0], slice(2, None, -1)) - assert_as_slice_equals([100, 0], slice(100, None, -100)) + + if not PY361: + assert_as_slice_equals([2, 1, 0], slice(2, None, -1)) + assert_as_slice_equals([100, 0], slice(100, None, -100)) def test_not_slice_like_arrays(self): def assert_not_slice_like(arr): - self.assertTrue(not BlockPlacement(arr).is_slice_like) + assert not BlockPlacement(arr).is_slice_like assert_not_slice_like([]) assert_not_slice_like([-1]) @@ -1133,13 +1152,13 @@ def assert_not_slice_like(arr): assert_not_slice_like([1, 1, 1]) def test_slice_iter(self): - self.assertEqual(list(BlockPlacement(slice(0, 3))), [0, 1, 2]) - self.assertEqual(list(BlockPlacement(slice(0, 0))), []) - self.assertEqual(list(BlockPlacement(slice(3, 0))), []) + assert list(BlockPlacement(slice(0, 
3))) == [0, 1, 2] + assert list(BlockPlacement(slice(0, 0))) == [] + assert list(BlockPlacement(slice(3, 0))) == [] - self.assertEqual(list(BlockPlacement(slice(3, 0, -1))), [3, 2, 1]) - self.assertEqual(list(BlockPlacement(slice(3, None, -1))), - [3, 2, 1, 0]) + if not PY361: + assert list(BlockPlacement(slice(3, 0, -1))) == [3, 2, 1] + assert list(BlockPlacement(slice(3, None, -1))) == [3, 2, 1, 0] def test_slice_to_array_conversion(self): def assert_as_array_equals(slc, asarray): @@ -1152,44 +1171,44 @@ def assert_as_array_equals(slc, asarray): assert_as_array_equals(slice(3, 0), []) assert_as_array_equals(slice(3, 0, -1), [3, 2, 1]) - assert_as_array_equals(slice(3, None, -1), [3, 2, 1, 0]) - assert_as_array_equals(slice(31, None, -10), [31, 21, 11, 1]) + + if not PY361: + assert_as_array_equals(slice(3, None, -1), [3, 2, 1, 0]) + assert_as_array_equals(slice(31, None, -10), [31, 21, 11, 1]) def test_blockplacement_add(self): bpl = BlockPlacement(slice(0, 5)) - self.assertEqual(bpl.add(1).as_slice, slice(1, 6, 1)) - self.assertEqual(bpl.add(np.arange(5)).as_slice, slice(0, 10, 2)) - self.assertEqual(list(bpl.add(np.arange(5, 0, -1))), [5, 5, 5, 5, 5]) + assert bpl.add(1).as_slice == slice(1, 6, 1) + assert bpl.add(np.arange(5)).as_slice == slice(0, 10, 2) + assert list(bpl.add(np.arange(5, 0, -1))) == [5, 5, 5, 5, 5] def test_blockplacement_add_int(self): def assert_add_equals(val, inc, result): - self.assertEqual(list(BlockPlacement(val).add(inc)), result) + assert list(BlockPlacement(val).add(inc)) == result assert_add_equals(slice(0, 0), 0, []) assert_add_equals(slice(1, 4), 0, [1, 2, 3]) assert_add_equals(slice(3, 0, -1), 0, [3, 2, 1]) - assert_add_equals(slice(2, None, -1), 0, [2, 1, 0]) assert_add_equals([1, 2, 4], 0, [1, 2, 4]) assert_add_equals(slice(0, 0), 10, []) assert_add_equals(slice(1, 4), 10, [11, 12, 13]) assert_add_equals(slice(3, 0, -1), 10, [13, 12, 11]) - assert_add_equals(slice(2, None, -1), 10, [12, 11, 10]) assert_add_equals([1, 2, 4], 10, [11, 12, 14]) assert_add_equals(slice(0, 0), -1, []) assert_add_equals(slice(1, 4), -1, [0, 1, 2]) - assert_add_equals(slice(3, 0, -1), -1, [2, 1, 0]) assert_add_equals([1, 2, 4], -1, [0, 1, 3]) - self.assertRaises(ValueError, - lambda: BlockPlacement(slice(1, 4)).add(-10)) - self.assertRaises(ValueError, - lambda: BlockPlacement([1, 2, 4]).add(-10)) - self.assertRaises(ValueError, - lambda: BlockPlacement(slice(2, None, -1)).add(-1)) + with pytest.raises(ValueError): + BlockPlacement(slice(1, 4)).add(-10) + with pytest.raises(ValueError): + BlockPlacement([1, 2, 4]).add(-10) + if not PY361: + assert_add_equals(slice(3, 0, -1), -1, [2, 1, 0]) + assert_add_equals(slice(2, None, -1), 0, [2, 1, 0]) + assert_add_equals(slice(2, None, -1), 10, [12, 11, 10]) -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + with pytest.raises(ValueError): + BlockPlacement(slice(2, None, -1)).add(-1) diff --git a/pandas/tests/io/__init__.py b/pandas/tests/io/__init__.py new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/pandas/io/tests/data/S4_EDUC1.dta b/pandas/tests/io/data/S4_EDUC1.dta similarity index 100% rename from pandas/io/tests/data/S4_EDUC1.dta rename to pandas/tests/io/data/S4_EDUC1.dta diff --git a/pandas/io/tests/data/banklist.csv b/pandas/tests/io/data/banklist.csv similarity index 100% rename from pandas/io/tests/data/banklist.csv rename to pandas/tests/io/data/banklist.csv diff --git a/pandas/io/tests/data/banklist.html 
b/pandas/tests/io/data/banklist.html similarity index 100% rename from pandas/io/tests/data/banklist.html rename to pandas/tests/io/data/banklist.html diff --git a/pandas/io/tests/data/blank.xls b/pandas/tests/io/data/blank.xls old mode 100755 new mode 100644 similarity index 100% rename from pandas/io/tests/data/blank.xls rename to pandas/tests/io/data/blank.xls diff --git a/pandas/io/tests/data/blank.xlsm b/pandas/tests/io/data/blank.xlsm old mode 100755 new mode 100644 similarity index 100% rename from pandas/io/tests/data/blank.xlsm rename to pandas/tests/io/data/blank.xlsm diff --git a/pandas/io/tests/data/blank.xlsx b/pandas/tests/io/data/blank.xlsx old mode 100755 new mode 100644 similarity index 100% rename from pandas/io/tests/data/blank.xlsx rename to pandas/tests/io/data/blank.xlsx diff --git a/pandas/io/tests/data/blank_with_header.xls b/pandas/tests/io/data/blank_with_header.xls old mode 100755 new mode 100644 similarity index 100% rename from pandas/io/tests/data/blank_with_header.xls rename to pandas/tests/io/data/blank_with_header.xls diff --git a/pandas/io/tests/data/blank_with_header.xlsm b/pandas/tests/io/data/blank_with_header.xlsm old mode 100755 new mode 100644 similarity index 100% rename from pandas/io/tests/data/blank_with_header.xlsm rename to pandas/tests/io/data/blank_with_header.xlsm diff --git a/pandas/io/tests/data/blank_with_header.xlsx b/pandas/tests/io/data/blank_with_header.xlsx old mode 100755 new mode 100644 similarity index 100% rename from pandas/io/tests/data/blank_with_header.xlsx rename to pandas/tests/io/data/blank_with_header.xlsx diff --git a/pandas/io/tests/data/categorical_0_14_1.pickle b/pandas/tests/io/data/categorical_0_14_1.pickle similarity index 100% rename from pandas/io/tests/data/categorical_0_14_1.pickle rename to pandas/tests/io/data/categorical_0_14_1.pickle diff --git a/pandas/io/tests/data/categorical_0_15_2.pickle b/pandas/tests/io/data/categorical_0_15_2.pickle similarity index 100% rename from pandas/io/tests/data/categorical_0_15_2.pickle rename to pandas/tests/io/data/categorical_0_15_2.pickle diff --git a/pandas/io/tests/data/computer_sales_page.html b/pandas/tests/io/data/computer_sales_page.html similarity index 100% rename from pandas/io/tests/data/computer_sales_page.html rename to pandas/tests/io/data/computer_sales_page.html diff --git a/pandas/tests/io/data/feather-0_3_1.feather b/pandas/tests/io/data/feather-0_3_1.feather new file mode 100644 index 0000000000000..5a2c7b3dcc684 Binary files /dev/null and b/pandas/tests/io/data/feather-0_3_1.feather differ diff --git a/pandas/tests/io/data/fixed_width_format.txt b/pandas/tests/io/data/fixed_width_format.txt new file mode 100644 index 0000000000000..bb487d8de7ef9 --- /dev/null +++ b/pandas/tests/io/data/fixed_width_format.txt @@ -0,0 +1,3 @@ +A B C +1 2 3 +4 5 6 diff --git a/pandas/io/tests/data/gbq_fake_job.txt b/pandas/tests/io/data/gbq_fake_job.txt similarity index 100% rename from pandas/io/tests/data/gbq_fake_job.txt rename to pandas/tests/io/data/gbq_fake_job.txt diff --git a/pandas/io/tests/data/html_encoding/chinese_utf-16.html b/pandas/tests/io/data/html_encoding/chinese_utf-16.html similarity index 100% rename from pandas/io/tests/data/html_encoding/chinese_utf-16.html rename to pandas/tests/io/data/html_encoding/chinese_utf-16.html diff --git a/pandas/io/tests/data/html_encoding/chinese_utf-32.html b/pandas/tests/io/data/html_encoding/chinese_utf-32.html similarity index 100% rename from pandas/io/tests/data/html_encoding/chinese_utf-32.html rename to 
pandas/tests/io/data/html_encoding/chinese_utf-32.html diff --git a/pandas/io/tests/data/html_encoding/chinese_utf-8.html b/pandas/tests/io/data/html_encoding/chinese_utf-8.html similarity index 100% rename from pandas/io/tests/data/html_encoding/chinese_utf-8.html rename to pandas/tests/io/data/html_encoding/chinese_utf-8.html diff --git a/pandas/io/tests/data/html_encoding/letz_latin1.html b/pandas/tests/io/data/html_encoding/letz_latin1.html similarity index 100% rename from pandas/io/tests/data/html_encoding/letz_latin1.html rename to pandas/tests/io/data/html_encoding/letz_latin1.html diff --git a/pandas/io/tests/data/iris.csv b/pandas/tests/io/data/iris.csv similarity index 100% rename from pandas/io/tests/data/iris.csv rename to pandas/tests/io/data/iris.csv diff --git a/pandas/io/tests/data/legacy_hdf/datetimetz_object.h5 b/pandas/tests/io/data/legacy_hdf/datetimetz_object.h5 similarity index 100% rename from pandas/io/tests/data/legacy_hdf/datetimetz_object.h5 rename to pandas/tests/io/data/legacy_hdf/datetimetz_object.h5 diff --git a/pandas/io/tests/data/legacy_hdf/legacy_0.10.h5 b/pandas/tests/io/data/legacy_hdf/legacy_0.10.h5 similarity index 100% rename from pandas/io/tests/data/legacy_hdf/legacy_0.10.h5 rename to pandas/tests/io/data/legacy_hdf/legacy_0.10.h5 diff --git a/pandas/io/tests/data/legacy_hdf/legacy_table.h5 b/pandas/tests/io/data/legacy_hdf/legacy_table.h5 similarity index 100% rename from pandas/io/tests/data/legacy_hdf/legacy_table.h5 rename to pandas/tests/io/data/legacy_hdf/legacy_table.h5 diff --git a/pandas/io/tests/data/legacy_hdf/legacy_table_0.11.h5 b/pandas/tests/io/data/legacy_hdf/legacy_table_0.11.h5 similarity index 100% rename from pandas/io/tests/data/legacy_hdf/legacy_table_0.11.h5 rename to pandas/tests/io/data/legacy_hdf/legacy_table_0.11.h5 diff --git a/pandas/tests/io/data/legacy_hdf/periodindex_0.20.1_x86_64_darwin_2.7.13.h5 b/pandas/tests/io/data/legacy_hdf/periodindex_0.20.1_x86_64_darwin_2.7.13.h5 new file mode 100644 index 0000000000000..6fb92d3c564bd Binary files /dev/null and b/pandas/tests/io/data/legacy_hdf/periodindex_0.20.1_x86_64_darwin_2.7.13.h5 differ diff --git a/pandas/io/tests/data/legacy_hdf/pytables_native.h5 b/pandas/tests/io/data/legacy_hdf/pytables_native.h5 similarity index 100% rename from pandas/io/tests/data/legacy_hdf/pytables_native.h5 rename to pandas/tests/io/data/legacy_hdf/pytables_native.h5 diff --git a/pandas/io/tests/data/legacy_hdf/pytables_native2.h5 b/pandas/tests/io/data/legacy_hdf/pytables_native2.h5 similarity index 100% rename from pandas/io/tests/data/legacy_hdf/pytables_native2.h5 rename to pandas/tests/io/data/legacy_hdf/pytables_native2.h5 diff --git a/pandas/io/tests/data/legacy_msgpack/0.16.0/0.16.0_x86_64_darwin_2.7.9.msgpack b/pandas/tests/io/data/legacy_msgpack/0.16.0/0.16.0_x86_64_darwin_2.7.9.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.16.0/0.16.0_x86_64_darwin_2.7.9.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.16.0/0.16.0_x86_64_darwin_2.7.9.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.16.2/0.16.2_AMD64_windows_2.7.10.msgpack b/pandas/tests/io/data/legacy_msgpack/0.16.2/0.16.2_AMD64_windows_2.7.10.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.16.2/0.16.2_AMD64_windows_2.7.10.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.16.2/0.16.2_AMD64_windows_2.7.10.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.16.2/0.16.2_AMD64_windows_3.4.3.msgpack 
b/pandas/tests/io/data/legacy_msgpack/0.16.2/0.16.2_AMD64_windows_3.4.3.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.16.2/0.16.2_AMD64_windows_3.4.3.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.16.2/0.16.2_AMD64_windows_3.4.3.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.16.2/0.16.2_x86_64_darwin_2.7.10.msgpack b/pandas/tests/io/data/legacy_msgpack/0.16.2/0.16.2_x86_64_darwin_2.7.10.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.16.2/0.16.2_x86_64_darwin_2.7.10.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.16.2/0.16.2_x86_64_darwin_2.7.10.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.16.2/0.16.2_x86_64_darwin_2.7.9.msgpack b/pandas/tests/io/data/legacy_msgpack/0.16.2/0.16.2_x86_64_darwin_2.7.9.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.16.2/0.16.2_x86_64_darwin_2.7.9.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.16.2/0.16.2_x86_64_darwin_2.7.9.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.16.2/0.16.2_x86_64_darwin_3.4.3.msgpack b/pandas/tests/io/data/legacy_msgpack/0.16.2/0.16.2_x86_64_darwin_3.4.3.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.16.2/0.16.2_x86_64_darwin_3.4.3.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.16.2/0.16.2_x86_64_darwin_3.4.3.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.16.2/0.16.2_x86_64_linux_2.7.10.msgpack b/pandas/tests/io/data/legacy_msgpack/0.16.2/0.16.2_x86_64_linux_2.7.10.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.16.2/0.16.2_x86_64_linux_2.7.10.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.16.2/0.16.2_x86_64_linux_2.7.10.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.16.2/0.16.2_x86_64_linux_3.4.3.msgpack b/pandas/tests/io/data/legacy_msgpack/0.16.2/0.16.2_x86_64_linux_3.4.3.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.16.2/0.16.2_x86_64_linux_3.4.3.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.16.2/0.16.2_x86_64_linux_3.4.3.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.0_AMD64_windows_2.7.11.msgpack b/pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.0_AMD64_windows_2.7.11.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.0_AMD64_windows_2.7.11.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.0_AMD64_windows_2.7.11.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.0_AMD64_windows_3.4.4.msgpack b/pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.0_AMD64_windows_3.4.4.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.0_AMD64_windows_3.4.4.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.0_AMD64_windows_3.4.4.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.0_x86_64_darwin_2.7.11.msgpack b/pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.0_x86_64_darwin_2.7.11.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.0_x86_64_darwin_2.7.11.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.0_x86_64_darwin_2.7.11.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.0_x86_64_darwin_3.4.4.msgpack b/pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.0_x86_64_darwin_3.4.4.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.0_x86_64_darwin_3.4.4.msgpack rename to 
pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.0_x86_64_darwin_3.4.4.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.0_x86_64_linux_2.7.11.msgpack b/pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.0_x86_64_linux_2.7.11.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.0_x86_64_linux_2.7.11.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.0_x86_64_linux_2.7.11.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.0_x86_64_linux_3.4.4.msgpack b/pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.0_x86_64_linux_3.4.4.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.0_x86_64_linux_3.4.4.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.0_x86_64_linux_3.4.4.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.1_AMD64_windows_2.7.11.msgpack b/pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.1_AMD64_windows_2.7.11.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.1_AMD64_windows_2.7.11.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.1_AMD64_windows_2.7.11.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.1_AMD64_windows_3.5.1.msgpack b/pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.1_AMD64_windows_3.5.1.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.17.0/0.17.1_AMD64_windows_3.5.1.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.17.0/0.17.1_AMD64_windows_3.5.1.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.17.1/0.17.1_AMD64_windows_2.7.11.msgpack b/pandas/tests/io/data/legacy_msgpack/0.17.1/0.17.1_AMD64_windows_2.7.11.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.17.1/0.17.1_AMD64_windows_2.7.11.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.17.1/0.17.1_AMD64_windows_2.7.11.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.17.1/0.17.1_AMD64_windows_3.5.1.msgpack b/pandas/tests/io/data/legacy_msgpack/0.17.1/0.17.1_AMD64_windows_3.5.1.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.17.1/0.17.1_AMD64_windows_3.5.1.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.17.1/0.17.1_AMD64_windows_3.5.1.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.17.1/0.17.1_x86_64_darwin_2.7.11.msgpack b/pandas/tests/io/data/legacy_msgpack/0.17.1/0.17.1_x86_64_darwin_2.7.11.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.17.1/0.17.1_x86_64_darwin_2.7.11.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.17.1/0.17.1_x86_64_darwin_2.7.11.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.17.1/0.17.1_x86_64_darwin_3.5.1.msgpack b/pandas/tests/io/data/legacy_msgpack/0.17.1/0.17.1_x86_64_darwin_3.5.1.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.17.1/0.17.1_x86_64_darwin_3.5.1.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.17.1/0.17.1_x86_64_darwin_3.5.1.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.17.1/0.17.1_x86_64_linux_2.7.11.msgpack b/pandas/tests/io/data/legacy_msgpack/0.17.1/0.17.1_x86_64_linux_2.7.11.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.17.1/0.17.1_x86_64_linux_2.7.11.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.17.1/0.17.1_x86_64_linux_2.7.11.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.17.1/0.17.1_x86_64_linux_3.4.4.msgpack 
b/pandas/tests/io/data/legacy_msgpack/0.17.1/0.17.1_x86_64_linux_3.4.4.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.17.1/0.17.1_x86_64_linux_3.4.4.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.17.1/0.17.1_x86_64_linux_3.4.4.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.18.0/0.18.0_AMD64_windows_2.7.11.msgpack b/pandas/tests/io/data/legacy_msgpack/0.18.0/0.18.0_AMD64_windows_2.7.11.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.18.0/0.18.0_AMD64_windows_2.7.11.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.18.0/0.18.0_AMD64_windows_2.7.11.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.18.0/0.18.0_AMD64_windows_3.5.1.msgpack b/pandas/tests/io/data/legacy_msgpack/0.18.0/0.18.0_AMD64_windows_3.5.1.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.18.0/0.18.0_AMD64_windows_3.5.1.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.18.0/0.18.0_AMD64_windows_3.5.1.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.18.0/0.18.0_x86_64_darwin_2.7.11.msgpack b/pandas/tests/io/data/legacy_msgpack/0.18.0/0.18.0_x86_64_darwin_2.7.11.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.18.0/0.18.0_x86_64_darwin_2.7.11.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.18.0/0.18.0_x86_64_darwin_2.7.11.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.18.0/0.18.0_x86_64_darwin_3.5.1.msgpack b/pandas/tests/io/data/legacy_msgpack/0.18.0/0.18.0_x86_64_darwin_3.5.1.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.18.0/0.18.0_x86_64_darwin_3.5.1.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.18.0/0.18.0_x86_64_darwin_3.5.1.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.18.1/0.18.1_x86_64_darwin_2.7.12.msgpack b/pandas/tests/io/data/legacy_msgpack/0.18.1/0.18.1_x86_64_darwin_2.7.12.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.18.1/0.18.1_x86_64_darwin_2.7.12.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.18.1/0.18.1_x86_64_darwin_2.7.12.msgpack diff --git a/pandas/io/tests/data/legacy_msgpack/0.18.1/0.18.1_x86_64_darwin_3.5.2.msgpack b/pandas/tests/io/data/legacy_msgpack/0.18.1/0.18.1_x86_64_darwin_3.5.2.msgpack similarity index 100% rename from pandas/io/tests/data/legacy_msgpack/0.18.1/0.18.1_x86_64_darwin_3.5.2.msgpack rename to pandas/tests/io/data/legacy_msgpack/0.18.1/0.18.1_x86_64_darwin_3.5.2.msgpack diff --git a/pandas/tests/io/data/legacy_msgpack/0.19.2/0.19.2_x86_64_darwin_2.7.12.msgpack b/pandas/tests/io/data/legacy_msgpack/0.19.2/0.19.2_x86_64_darwin_2.7.12.msgpack new file mode 100644 index 0000000000000..f2dc38766025e Binary files /dev/null and b/pandas/tests/io/data/legacy_msgpack/0.19.2/0.19.2_x86_64_darwin_2.7.12.msgpack differ diff --git a/pandas/tests/io/data/legacy_msgpack/0.19.2/0.19.2_x86_64_darwin_3.6.1.msgpack b/pandas/tests/io/data/legacy_msgpack/0.19.2/0.19.2_x86_64_darwin_3.6.1.msgpack new file mode 100644 index 0000000000000..4137629f53cf2 Binary files /dev/null and b/pandas/tests/io/data/legacy_msgpack/0.19.2/0.19.2_x86_64_darwin_3.6.1.msgpack differ diff --git a/pandas/io/tests/data/legacy_pickle/0.10.1/AMD64_windows_2.7.3.pickle b/pandas/tests/io/data/legacy_pickle/0.10.1/AMD64_windows_2.7.3.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.10.1/AMD64_windows_2.7.3.pickle rename to pandas/tests/io/data/legacy_pickle/0.10.1/AMD64_windows_2.7.3.pickle diff --git 
a/pandas/io/tests/data/legacy_pickle/0.10.1/x86_64_linux_2.7.3.pickle b/pandas/tests/io/data/legacy_pickle/0.10.1/x86_64_linux_2.7.3.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.10.1/x86_64_linux_2.7.3.pickle rename to pandas/tests/io/data/legacy_pickle/0.10.1/x86_64_linux_2.7.3.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.11.0/0.11.0_x86_64_linux_3.3.0.pickle b/pandas/tests/io/data/legacy_pickle/0.11.0/0.11.0_x86_64_linux_3.3.0.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.11.0/0.11.0_x86_64_linux_3.3.0.pickle rename to pandas/tests/io/data/legacy_pickle/0.11.0/0.11.0_x86_64_linux_3.3.0.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.11.0/x86_64_linux_2.7.3.pickle b/pandas/tests/io/data/legacy_pickle/0.11.0/x86_64_linux_2.7.3.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.11.0/x86_64_linux_2.7.3.pickle rename to pandas/tests/io/data/legacy_pickle/0.11.0/x86_64_linux_2.7.3.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.11.0/x86_64_linux_3.3.0.pickle b/pandas/tests/io/data/legacy_pickle/0.11.0/x86_64_linux_3.3.0.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.11.0/x86_64_linux_3.3.0.pickle rename to pandas/tests/io/data/legacy_pickle/0.11.0/x86_64_linux_3.3.0.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.12.0/0.12.0_AMD64_windows_2.7.3.pickle b/pandas/tests/io/data/legacy_pickle/0.12.0/0.12.0_AMD64_windows_2.7.3.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.12.0/0.12.0_AMD64_windows_2.7.3.pickle rename to pandas/tests/io/data/legacy_pickle/0.12.0/0.12.0_AMD64_windows_2.7.3.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.12.0/0.12.0_x86_64_linux_2.7.3.pickle b/pandas/tests/io/data/legacy_pickle/0.12.0/0.12.0_x86_64_linux_2.7.3.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.12.0/0.12.0_x86_64_linux_2.7.3.pickle rename to pandas/tests/io/data/legacy_pickle/0.12.0/0.12.0_x86_64_linux_2.7.3.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_AMD64_windows_2.7.3.pickle b/pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_AMD64_windows_2.7.3.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_AMD64_windows_2.7.3.pickle rename to pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_AMD64_windows_2.7.3.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_i686_linux_2.6.5.pickle b/pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_i686_linux_2.6.5.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_i686_linux_2.6.5.pickle rename to pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_i686_linux_2.6.5.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_i686_linux_2.7.3.pickle b/pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_i686_linux_2.7.3.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_i686_linux_2.7.3.pickle rename to pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_i686_linux_2.7.3.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_i686_linux_3.2.3.pickle b/pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_i686_linux_3.2.3.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_i686_linux_3.2.3.pickle rename to pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_i686_linux_3.2.3.pickle diff --git 
a/pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_x86_64_darwin_2.7.5.pickle b/pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_x86_64_darwin_2.7.5.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_x86_64_darwin_2.7.5.pickle rename to pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_x86_64_darwin_2.7.5.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_x86_64_darwin_2.7.6.pickle b/pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_x86_64_darwin_2.7.6.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_x86_64_darwin_2.7.6.pickle rename to pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_x86_64_darwin_2.7.6.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_x86_64_linux_2.7.3.pickle b/pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_x86_64_linux_2.7.3.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_x86_64_linux_2.7.3.pickle rename to pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_x86_64_linux_2.7.3.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_x86_64_linux_2.7.8.pickle b/pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_x86_64_linux_2.7.8.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_x86_64_linux_2.7.8.pickle rename to pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_x86_64_linux_2.7.8.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_x86_64_linux_3.3.0.pickle b/pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_x86_64_linux_3.3.0.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.13.0/0.13.0_x86_64_linux_3.3.0.pickle rename to pandas/tests/io/data/legacy_pickle/0.13.0/0.13.0_x86_64_linux_3.3.0.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.14.0/0.14.0_x86_64_darwin_2.7.6.pickle b/pandas/tests/io/data/legacy_pickle/0.14.0/0.14.0_x86_64_darwin_2.7.6.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.14.0/0.14.0_x86_64_darwin_2.7.6.pickle rename to pandas/tests/io/data/legacy_pickle/0.14.0/0.14.0_x86_64_darwin_2.7.6.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.14.0/0.14.0_x86_64_linux_2.7.8.pickle b/pandas/tests/io/data/legacy_pickle/0.14.0/0.14.0_x86_64_linux_2.7.8.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.14.0/0.14.0_x86_64_linux_2.7.8.pickle rename to pandas/tests/io/data/legacy_pickle/0.14.0/0.14.0_x86_64_linux_2.7.8.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.14.1/0.14.1_x86_64_darwin_2.7.12.pickle b/pandas/tests/io/data/legacy_pickle/0.14.1/0.14.1_x86_64_darwin_2.7.12.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.14.1/0.14.1_x86_64_darwin_2.7.12.pickle rename to pandas/tests/io/data/legacy_pickle/0.14.1/0.14.1_x86_64_darwin_2.7.12.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.14.1/0.14.1_x86_64_linux_2.7.8.pickle b/pandas/tests/io/data/legacy_pickle/0.14.1/0.14.1_x86_64_linux_2.7.8.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.14.1/0.14.1_x86_64_linux_2.7.8.pickle rename to pandas/tests/io/data/legacy_pickle/0.14.1/0.14.1_x86_64_linux_2.7.8.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.15.0/0.15.0_x86_64_darwin_2.7.12.pickle b/pandas/tests/io/data/legacy_pickle/0.15.0/0.15.0_x86_64_darwin_2.7.12.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.15.0/0.15.0_x86_64_darwin_2.7.12.pickle rename to 
pandas/tests/io/data/legacy_pickle/0.15.0/0.15.0_x86_64_darwin_2.7.12.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.15.0/0.15.0_x86_64_linux_2.7.8.pickle b/pandas/tests/io/data/legacy_pickle/0.15.0/0.15.0_x86_64_linux_2.7.8.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.15.0/0.15.0_x86_64_linux_2.7.8.pickle rename to pandas/tests/io/data/legacy_pickle/0.15.0/0.15.0_x86_64_linux_2.7.8.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.15.2/0.15.2_x86_64_darwin_2.7.9.pickle b/pandas/tests/io/data/legacy_pickle/0.15.2/0.15.2_x86_64_darwin_2.7.9.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.15.2/0.15.2_x86_64_darwin_2.7.9.pickle rename to pandas/tests/io/data/legacy_pickle/0.15.2/0.15.2_x86_64_darwin_2.7.9.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.16.0/0.16.0_x86_64_darwin_2.7.9.pickle b/pandas/tests/io/data/legacy_pickle/0.16.0/0.16.0_x86_64_darwin_2.7.9.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.16.0/0.16.0_x86_64_darwin_2.7.9.pickle rename to pandas/tests/io/data/legacy_pickle/0.16.0/0.16.0_x86_64_darwin_2.7.9.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.16.2/0.16.2_AMD64_windows_2.7.10.pickle b/pandas/tests/io/data/legacy_pickle/0.16.2/0.16.2_AMD64_windows_2.7.10.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.16.2/0.16.2_AMD64_windows_2.7.10.pickle rename to pandas/tests/io/data/legacy_pickle/0.16.2/0.16.2_AMD64_windows_2.7.10.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.16.2/0.16.2_AMD64_windows_3.4.3.pickle b/pandas/tests/io/data/legacy_pickle/0.16.2/0.16.2_AMD64_windows_3.4.3.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.16.2/0.16.2_AMD64_windows_3.4.3.pickle rename to pandas/tests/io/data/legacy_pickle/0.16.2/0.16.2_AMD64_windows_3.4.3.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.16.2/0.16.2_x86_64_darwin_2.7.10.pickle b/pandas/tests/io/data/legacy_pickle/0.16.2/0.16.2_x86_64_darwin_2.7.10.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.16.2/0.16.2_x86_64_darwin_2.7.10.pickle rename to pandas/tests/io/data/legacy_pickle/0.16.2/0.16.2_x86_64_darwin_2.7.10.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.16.2/0.16.2_x86_64_darwin_2.7.9.pickle b/pandas/tests/io/data/legacy_pickle/0.16.2/0.16.2_x86_64_darwin_2.7.9.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.16.2/0.16.2_x86_64_darwin_2.7.9.pickle rename to pandas/tests/io/data/legacy_pickle/0.16.2/0.16.2_x86_64_darwin_2.7.9.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.16.2/0.16.2_x86_64_darwin_3.4.3.pickle b/pandas/tests/io/data/legacy_pickle/0.16.2/0.16.2_x86_64_darwin_3.4.3.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.16.2/0.16.2_x86_64_darwin_3.4.3.pickle rename to pandas/tests/io/data/legacy_pickle/0.16.2/0.16.2_x86_64_darwin_3.4.3.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.16.2/0.16.2_x86_64_linux_2.7.10.pickle b/pandas/tests/io/data/legacy_pickle/0.16.2/0.16.2_x86_64_linux_2.7.10.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.16.2/0.16.2_x86_64_linux_2.7.10.pickle rename to pandas/tests/io/data/legacy_pickle/0.16.2/0.16.2_x86_64_linux_2.7.10.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.16.2/0.16.2_x86_64_linux_3.4.3.pickle b/pandas/tests/io/data/legacy_pickle/0.16.2/0.16.2_x86_64_linux_3.4.3.pickle similarity index 100% rename from 
pandas/io/tests/data/legacy_pickle/0.16.2/0.16.2_x86_64_linux_3.4.3.pickle rename to pandas/tests/io/data/legacy_pickle/0.16.2/0.16.2_x86_64_linux_3.4.3.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.17.0/0.17.0_AMD64_windows_2.7.11.pickle b/pandas/tests/io/data/legacy_pickle/0.17.0/0.17.0_AMD64_windows_2.7.11.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.17.0/0.17.0_AMD64_windows_2.7.11.pickle rename to pandas/tests/io/data/legacy_pickle/0.17.0/0.17.0_AMD64_windows_2.7.11.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.17.0/0.17.0_AMD64_windows_3.4.4.pickle b/pandas/tests/io/data/legacy_pickle/0.17.0/0.17.0_AMD64_windows_3.4.4.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.17.0/0.17.0_AMD64_windows_3.4.4.pickle rename to pandas/tests/io/data/legacy_pickle/0.17.0/0.17.0_AMD64_windows_3.4.4.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.17.0/0.17.0_x86_64_darwin_2.7.11.pickle b/pandas/tests/io/data/legacy_pickle/0.17.0/0.17.0_x86_64_darwin_2.7.11.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.17.0/0.17.0_x86_64_darwin_2.7.11.pickle rename to pandas/tests/io/data/legacy_pickle/0.17.0/0.17.0_x86_64_darwin_2.7.11.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.17.0/0.17.0_x86_64_darwin_3.4.4.pickle b/pandas/tests/io/data/legacy_pickle/0.17.0/0.17.0_x86_64_darwin_3.4.4.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.17.0/0.17.0_x86_64_darwin_3.4.4.pickle rename to pandas/tests/io/data/legacy_pickle/0.17.0/0.17.0_x86_64_darwin_3.4.4.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.17.0/0.17.0_x86_64_linux_2.7.11.pickle b/pandas/tests/io/data/legacy_pickle/0.17.0/0.17.0_x86_64_linux_2.7.11.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.17.0/0.17.0_x86_64_linux_2.7.11.pickle rename to pandas/tests/io/data/legacy_pickle/0.17.0/0.17.0_x86_64_linux_2.7.11.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.17.0/0.17.0_x86_64_linux_3.4.4.pickle b/pandas/tests/io/data/legacy_pickle/0.17.0/0.17.0_x86_64_linux_3.4.4.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.17.0/0.17.0_x86_64_linux_3.4.4.pickle rename to pandas/tests/io/data/legacy_pickle/0.17.0/0.17.0_x86_64_linux_3.4.4.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.17.0/0.17.1_AMD64_windows_2.7.11.pickle b/pandas/tests/io/data/legacy_pickle/0.17.0/0.17.1_AMD64_windows_2.7.11.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.17.0/0.17.1_AMD64_windows_2.7.11.pickle rename to pandas/tests/io/data/legacy_pickle/0.17.0/0.17.1_AMD64_windows_2.7.11.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.17.1/0.17.1_AMD64_windows_2.7.11.pickle b/pandas/tests/io/data/legacy_pickle/0.17.1/0.17.1_AMD64_windows_2.7.11.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.17.1/0.17.1_AMD64_windows_2.7.11.pickle rename to pandas/tests/io/data/legacy_pickle/0.17.1/0.17.1_AMD64_windows_2.7.11.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.17.1/0.17.1_x86_64_darwin_2.7.11.pickle b/pandas/tests/io/data/legacy_pickle/0.17.1/0.17.1_x86_64_darwin_2.7.11.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.17.1/0.17.1_x86_64_darwin_2.7.11.pickle rename to pandas/tests/io/data/legacy_pickle/0.17.1/0.17.1_x86_64_darwin_2.7.11.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.18.0/0.18.0_AMD64_windows_2.7.11.pickle 
b/pandas/tests/io/data/legacy_pickle/0.18.0/0.18.0_AMD64_windows_2.7.11.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.18.0/0.18.0_AMD64_windows_2.7.11.pickle rename to pandas/tests/io/data/legacy_pickle/0.18.0/0.18.0_AMD64_windows_2.7.11.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.18.0/0.18.0_AMD64_windows_3.5.1.pickle b/pandas/tests/io/data/legacy_pickle/0.18.0/0.18.0_AMD64_windows_3.5.1.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.18.0/0.18.0_AMD64_windows_3.5.1.pickle rename to pandas/tests/io/data/legacy_pickle/0.18.0/0.18.0_AMD64_windows_3.5.1.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.18.0/0.18.0_x86_64_darwin_2.7.11.pickle b/pandas/tests/io/data/legacy_pickle/0.18.0/0.18.0_x86_64_darwin_2.7.11.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.18.0/0.18.0_x86_64_darwin_2.7.11.pickle rename to pandas/tests/io/data/legacy_pickle/0.18.0/0.18.0_x86_64_darwin_2.7.11.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.18.0/0.18.0_x86_64_darwin_3.5.1.pickle b/pandas/tests/io/data/legacy_pickle/0.18.0/0.18.0_x86_64_darwin_3.5.1.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.18.0/0.18.0_x86_64_darwin_3.5.1.pickle rename to pandas/tests/io/data/legacy_pickle/0.18.0/0.18.0_x86_64_darwin_3.5.1.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.18.1/0.18.1_x86_64_darwin_2.7.12.pickle b/pandas/tests/io/data/legacy_pickle/0.18.1/0.18.1_x86_64_darwin_2.7.12.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.18.1/0.18.1_x86_64_darwin_2.7.12.pickle rename to pandas/tests/io/data/legacy_pickle/0.18.1/0.18.1_x86_64_darwin_2.7.12.pickle diff --git a/pandas/io/tests/data/legacy_pickle/0.18.1/0.18.1_x86_64_darwin_3.5.2.pickle b/pandas/tests/io/data/legacy_pickle/0.18.1/0.18.1_x86_64_darwin_3.5.2.pickle similarity index 100% rename from pandas/io/tests/data/legacy_pickle/0.18.1/0.18.1_x86_64_darwin_3.5.2.pickle rename to pandas/tests/io/data/legacy_pickle/0.18.1/0.18.1_x86_64_darwin_3.5.2.pickle diff --git a/pandas/tests/io/data/legacy_pickle/0.19.2/0.19.2_x86_64_darwin_2.7.12.pickle b/pandas/tests/io/data/legacy_pickle/0.19.2/0.19.2_x86_64_darwin_2.7.12.pickle new file mode 100644 index 0000000000000..d702ab444df62 Binary files /dev/null and b/pandas/tests/io/data/legacy_pickle/0.19.2/0.19.2_x86_64_darwin_2.7.12.pickle differ diff --git a/pandas/tests/io/data/legacy_pickle/0.19.2/0.19.2_x86_64_darwin_3.6.1.pickle b/pandas/tests/io/data/legacy_pickle/0.19.2/0.19.2_x86_64_darwin_3.6.1.pickle new file mode 100644 index 0000000000000..75ea95ff402c4 Binary files /dev/null and b/pandas/tests/io/data/legacy_pickle/0.19.2/0.19.2_x86_64_darwin_3.6.1.pickle differ diff --git a/pandas/io/tests/data/macau.html b/pandas/tests/io/data/macau.html similarity index 100% rename from pandas/io/tests/data/macau.html rename to pandas/tests/io/data/macau.html diff --git a/pandas/io/tests/data/nyse_wsj.html b/pandas/tests/io/data/nyse_wsj.html similarity index 100% rename from pandas/io/tests/data/nyse_wsj.html rename to pandas/tests/io/data/nyse_wsj.html diff --git a/pandas/io/tests/data/spam.html b/pandas/tests/io/data/spam.html similarity index 100% rename from pandas/io/tests/data/spam.html rename to pandas/tests/io/data/spam.html diff --git a/pandas/io/tests/data/stata10_115.dta b/pandas/tests/io/data/stata10_115.dta old mode 100755 new mode 100644 similarity index 100% rename from pandas/io/tests/data/stata10_115.dta rename to 
pandas/tests/io/data/stata10_115.dta diff --git a/pandas/io/tests/data/stata10_117.dta b/pandas/tests/io/data/stata10_117.dta old mode 100755 new mode 100644 similarity index 100% rename from pandas/io/tests/data/stata10_117.dta rename to pandas/tests/io/data/stata10_117.dta diff --git a/pandas/io/tests/data/stata11_115.dta b/pandas/tests/io/data/stata11_115.dta old mode 100755 new mode 100644 similarity index 100% rename from pandas/io/tests/data/stata11_115.dta rename to pandas/tests/io/data/stata11_115.dta diff --git a/pandas/io/tests/data/stata11_117.dta b/pandas/tests/io/data/stata11_117.dta old mode 100755 new mode 100644 similarity index 100% rename from pandas/io/tests/data/stata11_117.dta rename to pandas/tests/io/data/stata11_117.dta diff --git a/pandas/io/tests/data/stata12_117.dta b/pandas/tests/io/data/stata12_117.dta similarity index 100% rename from pandas/io/tests/data/stata12_117.dta rename to pandas/tests/io/data/stata12_117.dta diff --git a/pandas/io/tests/data/stata14_118.dta b/pandas/tests/io/data/stata14_118.dta similarity index 100% rename from pandas/io/tests/data/stata14_118.dta rename to pandas/tests/io/data/stata14_118.dta diff --git a/pandas/io/tests/data/stata15.dta b/pandas/tests/io/data/stata15.dta similarity index 100% rename from pandas/io/tests/data/stata15.dta rename to pandas/tests/io/data/stata15.dta diff --git a/pandas/io/tests/data/stata1_114.dta b/pandas/tests/io/data/stata1_114.dta similarity index 100% rename from pandas/io/tests/data/stata1_114.dta rename to pandas/tests/io/data/stata1_114.dta diff --git a/pandas/io/tests/data/stata1_117.dta b/pandas/tests/io/data/stata1_117.dta similarity index 100% rename from pandas/io/tests/data/stata1_117.dta rename to pandas/tests/io/data/stata1_117.dta diff --git a/pandas/io/tests/data/stata1_encoding.dta b/pandas/tests/io/data/stata1_encoding.dta similarity index 100% rename from pandas/io/tests/data/stata1_encoding.dta rename to pandas/tests/io/data/stata1_encoding.dta diff --git a/pandas/io/tests/data/stata2_113.dta b/pandas/tests/io/data/stata2_113.dta similarity index 100% rename from pandas/io/tests/data/stata2_113.dta rename to pandas/tests/io/data/stata2_113.dta diff --git a/pandas/io/tests/data/stata2_114.dta b/pandas/tests/io/data/stata2_114.dta similarity index 100% rename from pandas/io/tests/data/stata2_114.dta rename to pandas/tests/io/data/stata2_114.dta diff --git a/pandas/io/tests/data/stata2_115.dta b/pandas/tests/io/data/stata2_115.dta similarity index 100% rename from pandas/io/tests/data/stata2_115.dta rename to pandas/tests/io/data/stata2_115.dta diff --git a/pandas/io/tests/data/stata2_117.dta b/pandas/tests/io/data/stata2_117.dta similarity index 100% rename from pandas/io/tests/data/stata2_117.dta rename to pandas/tests/io/data/stata2_117.dta diff --git a/pandas/io/tests/data/stata3.csv b/pandas/tests/io/data/stata3.csv similarity index 100% rename from pandas/io/tests/data/stata3.csv rename to pandas/tests/io/data/stata3.csv diff --git a/pandas/io/tests/data/stata3_113.dta b/pandas/tests/io/data/stata3_113.dta similarity index 100% rename from pandas/io/tests/data/stata3_113.dta rename to pandas/tests/io/data/stata3_113.dta diff --git a/pandas/io/tests/data/stata3_114.dta b/pandas/tests/io/data/stata3_114.dta similarity index 100% rename from pandas/io/tests/data/stata3_114.dta rename to pandas/tests/io/data/stata3_114.dta diff --git a/pandas/io/tests/data/stata3_115.dta b/pandas/tests/io/data/stata3_115.dta similarity index 100% rename from pandas/io/tests/data/stata3_115.dta 
rename to pandas/tests/io/data/stata3_115.dta diff --git a/pandas/io/tests/data/stata3_117.dta b/pandas/tests/io/data/stata3_117.dta similarity index 100% rename from pandas/io/tests/data/stata3_117.dta rename to pandas/tests/io/data/stata3_117.dta diff --git a/pandas/io/tests/data/stata4_113.dta b/pandas/tests/io/data/stata4_113.dta similarity index 100% rename from pandas/io/tests/data/stata4_113.dta rename to pandas/tests/io/data/stata4_113.dta diff --git a/pandas/io/tests/data/stata4_114.dta b/pandas/tests/io/data/stata4_114.dta similarity index 100% rename from pandas/io/tests/data/stata4_114.dta rename to pandas/tests/io/data/stata4_114.dta diff --git a/pandas/io/tests/data/stata4_115.dta b/pandas/tests/io/data/stata4_115.dta similarity index 100% rename from pandas/io/tests/data/stata4_115.dta rename to pandas/tests/io/data/stata4_115.dta diff --git a/pandas/io/tests/data/stata4_117.dta b/pandas/tests/io/data/stata4_117.dta similarity index 100% rename from pandas/io/tests/data/stata4_117.dta rename to pandas/tests/io/data/stata4_117.dta diff --git a/pandas/io/tests/data/stata5.csv b/pandas/tests/io/data/stata5.csv similarity index 100% rename from pandas/io/tests/data/stata5.csv rename to pandas/tests/io/data/stata5.csv diff --git a/pandas/io/tests/data/stata5_113.dta b/pandas/tests/io/data/stata5_113.dta similarity index 100% rename from pandas/io/tests/data/stata5_113.dta rename to pandas/tests/io/data/stata5_113.dta diff --git a/pandas/io/tests/data/stata5_114.dta b/pandas/tests/io/data/stata5_114.dta similarity index 100% rename from pandas/io/tests/data/stata5_114.dta rename to pandas/tests/io/data/stata5_114.dta diff --git a/pandas/io/tests/data/stata5_115.dta b/pandas/tests/io/data/stata5_115.dta similarity index 100% rename from pandas/io/tests/data/stata5_115.dta rename to pandas/tests/io/data/stata5_115.dta diff --git a/pandas/io/tests/data/stata5_117.dta b/pandas/tests/io/data/stata5_117.dta similarity index 100% rename from pandas/io/tests/data/stata5_117.dta rename to pandas/tests/io/data/stata5_117.dta diff --git a/pandas/io/tests/data/stata6.csv b/pandas/tests/io/data/stata6.csv similarity index 100% rename from pandas/io/tests/data/stata6.csv rename to pandas/tests/io/data/stata6.csv diff --git a/pandas/io/tests/data/stata6_113.dta b/pandas/tests/io/data/stata6_113.dta similarity index 100% rename from pandas/io/tests/data/stata6_113.dta rename to pandas/tests/io/data/stata6_113.dta diff --git a/pandas/io/tests/data/stata6_114.dta b/pandas/tests/io/data/stata6_114.dta similarity index 100% rename from pandas/io/tests/data/stata6_114.dta rename to pandas/tests/io/data/stata6_114.dta diff --git a/pandas/io/tests/data/stata6_115.dta b/pandas/tests/io/data/stata6_115.dta similarity index 100% rename from pandas/io/tests/data/stata6_115.dta rename to pandas/tests/io/data/stata6_115.dta diff --git a/pandas/io/tests/data/stata6_117.dta b/pandas/tests/io/data/stata6_117.dta similarity index 100% rename from pandas/io/tests/data/stata6_117.dta rename to pandas/tests/io/data/stata6_117.dta diff --git a/pandas/io/tests/data/stata7_111.dta b/pandas/tests/io/data/stata7_111.dta similarity index 100% rename from pandas/io/tests/data/stata7_111.dta rename to pandas/tests/io/data/stata7_111.dta diff --git a/pandas/io/tests/data/stata7_115.dta b/pandas/tests/io/data/stata7_115.dta similarity index 100% rename from pandas/io/tests/data/stata7_115.dta rename to pandas/tests/io/data/stata7_115.dta diff --git a/pandas/io/tests/data/stata7_117.dta b/pandas/tests/io/data/stata7_117.dta 
similarity index 100% rename from pandas/io/tests/data/stata7_117.dta rename to pandas/tests/io/data/stata7_117.dta diff --git a/pandas/io/tests/data/stata8_113.dta b/pandas/tests/io/data/stata8_113.dta similarity index 100% rename from pandas/io/tests/data/stata8_113.dta rename to pandas/tests/io/data/stata8_113.dta diff --git a/pandas/io/tests/data/stata8_115.dta b/pandas/tests/io/data/stata8_115.dta similarity index 100% rename from pandas/io/tests/data/stata8_115.dta rename to pandas/tests/io/data/stata8_115.dta diff --git a/pandas/io/tests/data/stata8_117.dta b/pandas/tests/io/data/stata8_117.dta similarity index 100% rename from pandas/io/tests/data/stata8_117.dta rename to pandas/tests/io/data/stata8_117.dta diff --git a/pandas/io/tests/data/stata9_115.dta b/pandas/tests/io/data/stata9_115.dta similarity index 100% rename from pandas/io/tests/data/stata9_115.dta rename to pandas/tests/io/data/stata9_115.dta diff --git a/pandas/io/tests/data/stata9_117.dta b/pandas/tests/io/data/stata9_117.dta similarity index 100% rename from pandas/io/tests/data/stata9_117.dta rename to pandas/tests/io/data/stata9_117.dta diff --git a/pandas/io/tests/data/test1.csv b/pandas/tests/io/data/test1.csv similarity index 100% rename from pandas/io/tests/data/test1.csv rename to pandas/tests/io/data/test1.csv diff --git a/pandas/io/tests/data/test1.xls b/pandas/tests/io/data/test1.xls similarity index 100% rename from pandas/io/tests/data/test1.xls rename to pandas/tests/io/data/test1.xls diff --git a/pandas/io/tests/data/test1.xlsm b/pandas/tests/io/data/test1.xlsm similarity index 100% rename from pandas/io/tests/data/test1.xlsm rename to pandas/tests/io/data/test1.xlsm diff --git a/pandas/io/tests/data/test1.xlsx b/pandas/tests/io/data/test1.xlsx similarity index 100% rename from pandas/io/tests/data/test1.xlsx rename to pandas/tests/io/data/test1.xlsx diff --git a/pandas/io/tests/data/test2.xls b/pandas/tests/io/data/test2.xls similarity index 100% rename from pandas/io/tests/data/test2.xls rename to pandas/tests/io/data/test2.xls diff --git a/pandas/io/tests/data/test2.xlsm b/pandas/tests/io/data/test2.xlsm similarity index 100% rename from pandas/io/tests/data/test2.xlsm rename to pandas/tests/io/data/test2.xlsm diff --git a/pandas/io/tests/data/test2.xlsx b/pandas/tests/io/data/test2.xlsx similarity index 100% rename from pandas/io/tests/data/test2.xlsx rename to pandas/tests/io/data/test2.xlsx diff --git a/pandas/io/tests/data/test3.xls b/pandas/tests/io/data/test3.xls similarity index 100% rename from pandas/io/tests/data/test3.xls rename to pandas/tests/io/data/test3.xls diff --git a/pandas/io/tests/data/test3.xlsm b/pandas/tests/io/data/test3.xlsm similarity index 100% rename from pandas/io/tests/data/test3.xlsm rename to pandas/tests/io/data/test3.xlsm diff --git a/pandas/io/tests/data/test3.xlsx b/pandas/tests/io/data/test3.xlsx similarity index 100% rename from pandas/io/tests/data/test3.xlsx rename to pandas/tests/io/data/test3.xlsx diff --git a/pandas/io/tests/data/test4.xls b/pandas/tests/io/data/test4.xls similarity index 100% rename from pandas/io/tests/data/test4.xls rename to pandas/tests/io/data/test4.xls diff --git a/pandas/io/tests/data/test4.xlsm b/pandas/tests/io/data/test4.xlsm similarity index 100% rename from pandas/io/tests/data/test4.xlsm rename to pandas/tests/io/data/test4.xlsm diff --git a/pandas/io/tests/data/test4.xlsx b/pandas/tests/io/data/test4.xlsx similarity index 100% rename from pandas/io/tests/data/test4.xlsx rename to pandas/tests/io/data/test4.xlsx diff --git 
a/pandas/io/tests/data/test5.xls b/pandas/tests/io/data/test5.xls similarity index 100% rename from pandas/io/tests/data/test5.xls rename to pandas/tests/io/data/test5.xls diff --git a/pandas/io/tests/data/test5.xlsm b/pandas/tests/io/data/test5.xlsm similarity index 100% rename from pandas/io/tests/data/test5.xlsm rename to pandas/tests/io/data/test5.xlsm diff --git a/pandas/io/tests/data/test5.xlsx b/pandas/tests/io/data/test5.xlsx similarity index 100% rename from pandas/io/tests/data/test5.xlsx rename to pandas/tests/io/data/test5.xlsx diff --git a/pandas/io/tests/data/test_converters.xls b/pandas/tests/io/data/test_converters.xls similarity index 100% rename from pandas/io/tests/data/test_converters.xls rename to pandas/tests/io/data/test_converters.xls diff --git a/pandas/io/tests/data/test_converters.xlsm b/pandas/tests/io/data/test_converters.xlsm similarity index 100% rename from pandas/io/tests/data/test_converters.xlsm rename to pandas/tests/io/data/test_converters.xlsm diff --git a/pandas/io/tests/data/test_converters.xlsx b/pandas/tests/io/data/test_converters.xlsx similarity index 100% rename from pandas/io/tests/data/test_converters.xlsx rename to pandas/tests/io/data/test_converters.xlsx diff --git a/pandas/io/tests/data/test_index_name_pre17.xls b/pandas/tests/io/data/test_index_name_pre17.xls similarity index 100% rename from pandas/io/tests/data/test_index_name_pre17.xls rename to pandas/tests/io/data/test_index_name_pre17.xls diff --git a/pandas/io/tests/data/test_index_name_pre17.xlsm b/pandas/tests/io/data/test_index_name_pre17.xlsm similarity index 100% rename from pandas/io/tests/data/test_index_name_pre17.xlsm rename to pandas/tests/io/data/test_index_name_pre17.xlsm diff --git a/pandas/io/tests/data/test_index_name_pre17.xlsx b/pandas/tests/io/data/test_index_name_pre17.xlsx similarity index 100% rename from pandas/io/tests/data/test_index_name_pre17.xlsx rename to pandas/tests/io/data/test_index_name_pre17.xlsx diff --git a/pandas/io/tests/data/test_mmap.csv b/pandas/tests/io/data/test_mmap.csv similarity index 100% rename from pandas/io/tests/data/test_mmap.csv rename to pandas/tests/io/data/test_mmap.csv diff --git a/pandas/io/tests/data/test_multisheet.xls b/pandas/tests/io/data/test_multisheet.xls similarity index 100% rename from pandas/io/tests/data/test_multisheet.xls rename to pandas/tests/io/data/test_multisheet.xls diff --git a/pandas/io/tests/data/test_multisheet.xlsm b/pandas/tests/io/data/test_multisheet.xlsm similarity index 100% rename from pandas/io/tests/data/test_multisheet.xlsm rename to pandas/tests/io/data/test_multisheet.xlsm diff --git a/pandas/io/tests/data/test_multisheet.xlsx b/pandas/tests/io/data/test_multisheet.xlsx similarity index 100% rename from pandas/io/tests/data/test_multisheet.xlsx rename to pandas/tests/io/data/test_multisheet.xlsx diff --git a/pandas/io/tests/data/test_squeeze.xls b/pandas/tests/io/data/test_squeeze.xls similarity index 100% rename from pandas/io/tests/data/test_squeeze.xls rename to pandas/tests/io/data/test_squeeze.xls diff --git a/pandas/io/tests/data/test_squeeze.xlsm b/pandas/tests/io/data/test_squeeze.xlsm similarity index 100% rename from pandas/io/tests/data/test_squeeze.xlsm rename to pandas/tests/io/data/test_squeeze.xlsm diff --git a/pandas/io/tests/data/test_squeeze.xlsx b/pandas/tests/io/data/test_squeeze.xlsx similarity index 100% rename from pandas/io/tests/data/test_squeeze.xlsx rename to pandas/tests/io/data/test_squeeze.xlsx diff --git a/pandas/io/tests/data/test_types.xls 
b/pandas/tests/io/data/test_types.xls similarity index 100% rename from pandas/io/tests/data/test_types.xls rename to pandas/tests/io/data/test_types.xls diff --git a/pandas/io/tests/data/test_types.xlsm b/pandas/tests/io/data/test_types.xlsm similarity index 100% rename from pandas/io/tests/data/test_types.xlsm rename to pandas/tests/io/data/test_types.xlsm diff --git a/pandas/io/tests/data/test_types.xlsx b/pandas/tests/io/data/test_types.xlsx similarity index 100% rename from pandas/io/tests/data/test_types.xlsx rename to pandas/tests/io/data/test_types.xlsx diff --git a/pandas/io/tests/data/testdateoverflow.xls b/pandas/tests/io/data/testdateoverflow.xls similarity index 100% rename from pandas/io/tests/data/testdateoverflow.xls rename to pandas/tests/io/data/testdateoverflow.xls diff --git a/pandas/io/tests/data/testdateoverflow.xlsm b/pandas/tests/io/data/testdateoverflow.xlsm similarity index 100% rename from pandas/io/tests/data/testdateoverflow.xlsm rename to pandas/tests/io/data/testdateoverflow.xlsm diff --git a/pandas/io/tests/data/testdateoverflow.xlsx b/pandas/tests/io/data/testdateoverflow.xlsx similarity index 100% rename from pandas/io/tests/data/testdateoverflow.xlsx rename to pandas/tests/io/data/testdateoverflow.xlsx diff --git a/pandas/io/tests/data/testdtype.xls b/pandas/tests/io/data/testdtype.xls similarity index 100% rename from pandas/io/tests/data/testdtype.xls rename to pandas/tests/io/data/testdtype.xls diff --git a/pandas/io/tests/data/testdtype.xlsm b/pandas/tests/io/data/testdtype.xlsm similarity index 100% rename from pandas/io/tests/data/testdtype.xlsm rename to pandas/tests/io/data/testdtype.xlsm diff --git a/pandas/io/tests/data/testdtype.xlsx b/pandas/tests/io/data/testdtype.xlsx similarity index 100% rename from pandas/io/tests/data/testdtype.xlsx rename to pandas/tests/io/data/testdtype.xlsx diff --git a/pandas/io/tests/data/testmultiindex.xls b/pandas/tests/io/data/testmultiindex.xls similarity index 100% rename from pandas/io/tests/data/testmultiindex.xls rename to pandas/tests/io/data/testmultiindex.xls diff --git a/pandas/io/tests/data/testmultiindex.xlsm b/pandas/tests/io/data/testmultiindex.xlsm similarity index 100% rename from pandas/io/tests/data/testmultiindex.xlsm rename to pandas/tests/io/data/testmultiindex.xlsm diff --git a/pandas/io/tests/data/testmultiindex.xlsx b/pandas/tests/io/data/testmultiindex.xlsx similarity index 100% rename from pandas/io/tests/data/testmultiindex.xlsx rename to pandas/tests/io/data/testmultiindex.xlsx diff --git a/pandas/io/tests/data/testskiprows.xls b/pandas/tests/io/data/testskiprows.xls similarity index 100% rename from pandas/io/tests/data/testskiprows.xls rename to pandas/tests/io/data/testskiprows.xls diff --git a/pandas/io/tests/data/testskiprows.xlsm b/pandas/tests/io/data/testskiprows.xlsm similarity index 100% rename from pandas/io/tests/data/testskiprows.xlsm rename to pandas/tests/io/data/testskiprows.xlsm diff --git a/pandas/io/tests/data/testskiprows.xlsx b/pandas/tests/io/data/testskiprows.xlsx similarity index 100% rename from pandas/io/tests/data/testskiprows.xlsx rename to pandas/tests/io/data/testskiprows.xlsx diff --git a/pandas/io/tests/data/times_1900.xls b/pandas/tests/io/data/times_1900.xls similarity index 100% rename from pandas/io/tests/data/times_1900.xls rename to pandas/tests/io/data/times_1900.xls diff --git a/pandas/io/tests/data/times_1900.xlsm b/pandas/tests/io/data/times_1900.xlsm similarity index 100% rename from pandas/io/tests/data/times_1900.xlsm rename to 
pandas/tests/io/data/times_1900.xlsm
diff --git a/pandas/io/tests/data/times_1900.xlsx b/pandas/tests/io/data/times_1900.xlsx
similarity index 100%
rename from pandas/io/tests/data/times_1900.xlsx
rename to pandas/tests/io/data/times_1900.xlsx
diff --git a/pandas/io/tests/data/times_1904.xls b/pandas/tests/io/data/times_1904.xls
similarity index 100%
rename from pandas/io/tests/data/times_1904.xls
rename to pandas/tests/io/data/times_1904.xls
diff --git a/pandas/io/tests/data/times_1904.xlsm b/pandas/tests/io/data/times_1904.xlsm
similarity index 100%
rename from pandas/io/tests/data/times_1904.xlsm
rename to pandas/tests/io/data/times_1904.xlsm
diff --git a/pandas/io/tests/data/times_1904.xlsx b/pandas/tests/io/data/times_1904.xlsx
similarity index 100%
rename from pandas/io/tests/data/times_1904.xlsx
rename to pandas/tests/io/data/times_1904.xlsx
diff --git a/pandas/io/tests/data/tips.csv b/pandas/tests/io/data/tips.csv
similarity index 100%
rename from pandas/io/tests/data/tips.csv
rename to pandas/tests/io/data/tips.csv
diff --git a/pandas/io/tests/data/valid_markup.html b/pandas/tests/io/data/valid_markup.html
similarity index 100%
rename from pandas/io/tests/data/valid_markup.html
rename to pandas/tests/io/data/valid_markup.html
diff --git a/pandas/io/tests/data/wikipedia_states.html b/pandas/tests/io/data/wikipedia_states.html
similarity index 100%
rename from pandas/io/tests/data/wikipedia_states.html
rename to pandas/tests/io/data/wikipedia_states.html
diff --git a/pandas/tests/io/formats/__init__.py b/pandas/tests/io/formats/__init__.py
new file mode 100644
index 0000000000000..e69de29bb2d1d
diff --git a/pandas/io/tests/parser/data/unicode_series.csv b/pandas/tests/io/formats/data/unicode_series.csv
similarity index 100%
rename from pandas/io/tests/parser/data/unicode_series.csv
rename to pandas/tests/io/formats/data/unicode_series.csv
diff --git a/pandas/tests/io/formats/test_css.py b/pandas/tests/io/formats/test_css.py
new file mode 100644
index 0000000000000..c07856dc63602
--- /dev/null
+++ b/pandas/tests/io/formats/test_css.py
@@ -0,0 +1,186 @@
+import pytest
+
+from pandas.util import testing as tm
+from pandas.io.formats.css import CSSResolver, CSSWarning
+
+
+def assert_resolves(css, props, inherited=None):
+    resolve = CSSResolver()
+    actual = resolve(css, inherited=inherited)
+    assert props == actual
+
+
+def assert_same_resolution(css1, css2, inherited=None):
+    resolve = CSSResolver()
+    resolved1 = resolve(css1, inherited=inherited)
+    resolved2 = resolve(css2, inherited=inherited)
+    assert resolved1 == resolved2
+
+
+@pytest.mark.parametrize('name,norm,abnorm', [
+    ('whitespace', 'hello: world; foo: bar',
+     ' \t hello \t :\n world \n ; \n foo: \tbar\n\n'),
+    ('case', 'hello: world; foo: bar', 'Hello: WORLD; foO: bar'),
+    ('empty-decl', 'hello: world; foo: bar',
+     '; hello: world;; foo: bar;\n; ;'),
+    ('empty-list', '', ';'),
+])
+def test_css_parse_normalisation(name, norm, abnorm):
+    assert_same_resolution(norm, abnorm)
+
+
+@pytest.mark.parametrize(
+    'invalid_css,remainder', [
+        # No colon
+        ('hello-world', ''),
+        ('border-style: solid; hello-world', 'border-style: solid'),
+        ('border-style: solid; hello-world; font-weight: bold',
+         'border-style: solid; font-weight: bold'),
+        # Unclosed string fail
+        # Invalid size
+        ('font-size: blah', 'font-size: 1em'),
+        ('font-size: 1a2b', 'font-size: 1em'),
+        ('font-size: 1e5pt', 'font-size: 1em'),
+        ('font-size: 1+6pt', 'font-size: 1em'),
+        ('font-size: 1unknownunit', 'font-size: 1em'),
+        ('font-size: 10', 'font-size: 1em'),
+        ('font-size: 10 pt', 'font-size: 1em'),
+    ])
+def test_css_parse_invalid(invalid_css, remainder):
+    with tm.assert_produces_warning(CSSWarning):
+        assert_same_resolution(invalid_css, remainder)
+
+    # TODO: we should be checking that in other cases no warnings are raised
+
+
+@pytest.mark.parametrize(
+    'shorthand,expansions',
+    [('margin', ['margin-top', 'margin-right',
+                 'margin-bottom', 'margin-left']),
+     ('padding', ['padding-top', 'padding-right',
+                  'padding-bottom', 'padding-left']),
+     ('border-width', ['border-top-width', 'border-right-width',
+                       'border-bottom-width', 'border-left-width']),
+     ('border-color', ['border-top-color', 'border-right-color',
+                       'border-bottom-color', 'border-left-color']),
+     ('border-style', ['border-top-style', 'border-right-style',
+                       'border-bottom-style', 'border-left-style']),
+     ])
+def test_css_side_shorthands(shorthand, expansions):
+    top, right, bottom, left = expansions
+
+    assert_resolves('%s: 1pt' % shorthand,
+                    {top: '1pt', right: '1pt',
+                     bottom: '1pt', left: '1pt'})
+
+    assert_resolves('%s: 1pt 4pt' % shorthand,
+                    {top: '1pt', right: '4pt',
+                     bottom: '1pt', left: '4pt'})
+
+    assert_resolves('%s: 1pt 4pt 2pt' % shorthand,
+                    {top: '1pt', right: '4pt',
+                     bottom: '2pt', left: '4pt'})
+
+    assert_resolves('%s: 1pt 4pt 2pt 0pt' % shorthand,
+                    {top: '1pt', right: '4pt',
+                     bottom: '2pt', left: '0pt'})
+
+    with tm.assert_produces_warning(CSSWarning):
+        assert_resolves('%s: 1pt 1pt 1pt 1pt 1pt' % shorthand,
+                        {})
+
+
+@pytest.mark.parametrize('style,inherited,equiv', [
+    ('margin: 1px; margin: 2px', '',
+     'margin: 2px'),
+    ('margin: 1px', 'margin: 2px',
+     'margin: 1px'),
+    ('margin: 1px; margin: inherit', 'margin: 2px',
+     'margin: 2px'),
+    ('margin: 1px; margin-top: 2px', '',
+     'margin-left: 1px; margin-right: 1px; ' +
+     'margin-bottom: 1px; margin-top: 2px'),
+    ('margin-top: 2px', 'margin: 1px',
+     'margin: 1px; margin-top: 2px'),
+    ('margin: 1px', 'margin-top: 2px',
+     'margin: 1px'),
+    ('margin: 1px; margin-top: inherit', 'margin: 2px',
+     'margin: 1px; margin-top: 2px'),
+])
+def test_css_precedence(style, inherited, equiv):
+    resolve = CSSResolver()
+    inherited_props = resolve(inherited)
+    style_props = resolve(style, inherited=inherited_props)
+    equiv_props = resolve(equiv)
+    assert style_props == equiv_props
+
+
+@pytest.mark.parametrize('style,equiv', [
+    ('margin: 1px; margin-top: inherit',
+     'margin-bottom: 1px; margin-right: 1px; margin-left: 1px'),
+    ('margin-top: inherit', ''),
+    ('margin-top: initial', ''),
+])
+def test_css_none_absent(style, equiv):
+    assert_same_resolution(style, equiv)
+
+
+@pytest.mark.parametrize('size,resolved', [
+    ('xx-small', '6pt'),
+    ('x-small', '%fpt' % 7.5),
+    ('small', '%fpt' % 9.6),
+    ('medium', '12pt'),
+    ('large', '%fpt' % 13.5),
+    ('x-large', '18pt'),
+    ('xx-large', '24pt'),
+
+    ('8px', '6pt'),
+    ('1.25pc', '15pt'),
+    ('.25in', '18pt'),
+    ('02.54cm', '72pt'),
+    ('25.4mm', '72pt'),
+    ('101.6q', '72pt'),
+    ('101.6q', '72pt'),
+])
+@pytest.mark.parametrize('relative_to',  # invariant to inherited size
+                         [None, '16pt'])
+def test_css_absolute_font_size(size, relative_to, resolved):
+    if relative_to is None:
+        inherited = None
+    else:
+        inherited = {'font-size': relative_to}
+    assert_resolves('font-size: %s' % size, {'font-size': resolved},
+                    inherited=inherited)
+
+
+@pytest.mark.parametrize('size,relative_to,resolved', [
+    ('1em', None, '12pt'),
+    ('1.0em', None, '12pt'),
+    ('1.25em', None, '15pt'),
+    ('1em', '16pt', '16pt'),
+    ('1.0em', '16pt', '16pt'),
+    ('1.25em', '16pt', '20pt'),
+    ('1rem', '16pt', '12pt'),
+    ('1.0rem', '16pt', '12pt'),
+    ('1.25rem', '16pt', '15pt'),
+    ('100%', None, '12pt'),
+    ('125%', None, '15pt'),
+    ('100%', '16pt', '16pt'),
+    ('125%', '16pt', '20pt'),
+    ('2ex', None, '12pt'),
+    ('2.0ex', None, '12pt'),
+    ('2.50ex', None, '15pt'),
+    ('inherit', '16pt', '16pt'),
+
+    ('smaller', None, '10pt'),
+    ('smaller', '18pt', '15pt'),
+    ('larger', None, '%fpt' % 14.4),
+    ('larger', '15pt', '18pt'),
+])
+def test_css_relative_font_size(size, relative_to, resolved):
+    if relative_to is None:
+        inherited = None
+    else:
+        inherited = {'font-size': relative_to}
+    assert_resolves('font-size: %s' % size, {'font-size': resolved},
+                    inherited=inherited)
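A note for reviewers skimming the new CSS tests: the contract they pin down is easiest to see in isolation. The sketch below is only an illustration inferred from the assertions above (a `CSSResolver` instance called as `resolve(declarations, inherited=...)` and returning a dict of expanded properties); it is not part of the patch itself.

```python
from pandas.io.formats.css import CSSResolver

resolve = CSSResolver()

# Side shorthands expand to the four per-side properties
# (cf. test_css_side_shorthands).
resolve('margin: 1pt 4pt')
# -> {'margin-top': '1pt', 'margin-right': '4pt',
#     'margin-bottom': '1pt', 'margin-left': '4pt'}

# Relative font sizes resolve against the inherited size
# (cf. test_css_relative_font_size): 1.25em of 16pt is 20pt.
resolve('font-size: 1.25em', inherited={'font-size': '16pt'})
# -> {'font-size': '20pt'}
```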
diff --git a/pandas/tests/io/formats/test_eng_formatting.py b/pandas/tests/io/formats/test_eng_formatting.py
new file mode 100644
index 0000000000000..9d5773283176c
--- /dev/null
+++ b/pandas/tests/io/formats/test_eng_formatting.py
@@ -0,0 +1,193 @@
+import numpy as np
+import pandas as pd
+from pandas import DataFrame
+from pandas.compat import u
+import pandas.io.formats.format as fmt
+from pandas.util import testing as tm
+
+
+class TestEngFormatter(object):
+
+    def test_eng_float_formatter(self):
+        df = DataFrame({'A': [1.41, 141., 14100, 1410000.]})
+
+        fmt.set_eng_float_format()
+        result = df.to_string()
+        expected = (' A\n'
+                    '0 1.410E+00\n'
+                    '1 141.000E+00\n'
+                    '2 14.100E+03\n'
+                    '3 1.410E+06')
+        assert result == expected
+
+        fmt.set_eng_float_format(use_eng_prefix=True)
+        result = df.to_string()
+        expected = (' A\n'
+                    '0 1.410\n'
+                    '1 141.000\n'
+                    '2 14.100k\n'
+                    '3 1.410M')
+        assert result == expected
+
+        fmt.set_eng_float_format(accuracy=0)
+        result = df.to_string()
+        expected = (' A\n'
+                    '0 1E+00\n'
+                    '1 141E+00\n'
+                    '2 14E+03\n'
+                    '3 1E+06')
+        assert result == expected
+
+        tm.reset_display_options()
+
+    def compare(self, formatter, input, output):
+        formatted_input = formatter(input)
+        assert formatted_input == output
+
+    def compare_all(self, formatter, in_out):
+        """
+        Parameters:
+        -----------
+        formatter: EngFormatter under test
+        in_out: list of tuples. Each tuple = (number, expected_formatting)
+
+        It is tested if 'formatter(number) == expected_formatting'.
+        *number* should be >= 0 because formatter(-number) == fmt is also
+        tested. *fmt* is derived from *expected_formatting*
+        """
+        for input, output in in_out:
+            self.compare(formatter, input, output)
+            self.compare(formatter, -input, "-" + output[1:])
+
+    def test_exponents_with_eng_prefix(self):
+        formatter = fmt.EngFormatter(accuracy=3, use_eng_prefix=True)
+        f = np.sqrt(2)
+        in_out = [
+            (f * 10 ** -24, " 1.414y"), (f * 10 ** -23, " 14.142y"),
+            (f * 10 ** -22, " 141.421y"), (f * 10 ** -21, " 1.414z"),
+            (f * 10 ** -20, " 14.142z"), (f * 10 ** -19, " 141.421z"),
+            (f * 10 ** -18, " 1.414a"), (f * 10 ** -17, " 14.142a"),
+            (f * 10 ** -16, " 141.421a"), (f * 10 ** -15, " 1.414f"),
+            (f * 10 ** -14, " 14.142f"), (f * 10 ** -13, " 141.421f"),
+            (f * 10 ** -12, " 1.414p"), (f * 10 ** -11, " 14.142p"),
+            (f * 10 ** -10, " 141.421p"), (f * 10 ** -9, " 1.414n"),
+            (f * 10 ** -8, " 14.142n"), (f * 10 ** -7, " 141.421n"),
+            (f * 10 ** -6, " 1.414u"), (f * 10 ** -5, " 14.142u"),
+            (f * 10 ** -4, " 141.421u"), (f * 10 ** -3, " 1.414m"),
+            (f * 10 ** -2, " 14.142m"), (f * 10 ** -1, " 141.421m"),
+            (f * 10 ** 0, " 1.414"), (f * 10 ** 1, " 14.142"),
+            (f * 10 ** 2, " 141.421"), (f * 10 ** 3, " 1.414k"),
+            (f * 10 ** 4, " 14.142k"), (f * 10 ** 5, " 141.421k"),
+            (f * 10 ** 6, " 1.414M"), (f * 10 ** 7, " 14.142M"),
+            (f * 10 ** 8, " 141.421M"), (f * 10 ** 9, " 1.414G"),
+            (f * 10 ** 10, " 14.142G"), (f * 10 ** 11, " 141.421G"),
+            (f * 10 ** 12, " 1.414T"), (f * 10 ** 13, " 14.142T"),
+            (f * 10 ** 14, " 141.421T"), (f * 10 ** 15, " 1.414P"),
+            (f * 10 ** 16, " 14.142P"), (f * 10 ** 17, " 141.421P"),
+            (f * 10 ** 18, " 1.414E"), (f * 10 ** 19, " 14.142E"),
+            (f * 10 ** 20, " 141.421E"), (f * 10 ** 21, " 1.414Z"),
+            (f * 10 ** 22, " 14.142Z"), (f * 10 ** 23, " 141.421Z"),
+            (f * 10 ** 24, " 1.414Y"), (f * 10 ** 25, " 14.142Y"),
+            (f * 10 ** 26, " 141.421Y")]
+        self.compare_all(formatter, in_out)
+
+    def test_exponents_without_eng_prefix(self):
+        formatter = fmt.EngFormatter(accuracy=4, use_eng_prefix=False)
+        f = np.pi
+        in_out = [
+            (f * 10 ** -24, " 3.1416E-24"),
+            (f * 10 ** -23, " 31.4159E-24"),
+            (f * 10 ** -22, " 314.1593E-24"),
+            (f * 10 ** -21, " 3.1416E-21"),
+            (f * 10 ** -20, " 31.4159E-21"),
+            (f * 10 ** -19, " 314.1593E-21"),
+            (f * 10 ** -18, " 3.1416E-18"),
+            (f * 10 ** -17, " 31.4159E-18"),
+            (f * 10 ** -16, " 314.1593E-18"),
+            (f * 10 ** -15, " 3.1416E-15"),
+            (f * 10 ** -14, " 31.4159E-15"),
+            (f * 10 ** -13, " 314.1593E-15"),
+            (f * 10 ** -12, " 3.1416E-12"),
+            (f * 10 ** -11, " 31.4159E-12"),
+            (f * 10 ** -10, " 314.1593E-12"),
+            (f * 10 ** -9, " 3.1416E-09"),
+            (f * 10 ** -8, " 31.4159E-09"),
+            (f * 10 ** -7, " 314.1593E-09"),
+            (f * 10 ** -6, " 3.1416E-06"),
+            (f * 10 ** -5, " 31.4159E-06"),
+            (f * 10 ** -4, " 314.1593E-06"),
+            (f * 10 ** -3, " 3.1416E-03"),
+            (f * 10 ** -2, " 31.4159E-03"),
+            (f * 10 ** -1, " 314.1593E-03"),
+            (f * 10 ** 0, " 3.1416E+00"),
+            (f * 10 ** 1, " 31.4159E+00"),
+            (f * 10 ** 2, " 314.1593E+00"),
+            (f * 10 ** 3, " 3.1416E+03"),
+            (f * 10 ** 4, " 31.4159E+03"),
+            (f * 10 ** 5, " 314.1593E+03"),
+            (f * 10 ** 6, " 3.1416E+06"),
+            (f * 10 ** 7, " 31.4159E+06"),
+            (f * 10 ** 8, " 314.1593E+06"),
+            (f * 10 ** 9, " 3.1416E+09"),
+            (f * 10 ** 10, " 31.4159E+09"),
+            (f * 10 ** 11, " 314.1593E+09"),
+            (f * 10 ** 12, " 3.1416E+12"),
+            (f * 10 ** 13, " 31.4159E+12"),
+            (f * 10 ** 14, " 314.1593E+12"),
+            (f * 10 ** 15, " 3.1416E+15"),
+            (f * 10 ** 16, " 31.4159E+15"),
+            (f * 10 ** 17, " 314.1593E+15"),
+            (f * 10 ** 18, " 3.1416E+18"),
+            (f * 10 ** 19, " 31.4159E+18"),
+            (f * 10 ** 20, " 314.1593E+18"),
+            (f * 10 ** 21, " 3.1416E+21"),
+            (f * 10 ** 22, " 31.4159E+21"),
+            (f * 10 ** 23, " 314.1593E+21"),
+            (f * 10 ** 24, " 3.1416E+24"),
+            (f * 10 ** 25, " 31.4159E+24"),
+            (f * 10 ** 26, " 314.1593E+24")]
+        self.compare_all(formatter, in_out)
+
+    def test_rounding(self):
+        formatter = fmt.EngFormatter(accuracy=3, use_eng_prefix=True)
+        in_out = [(5.55555, ' 5.556'), (55.5555, ' 55.556'),
+                  (555.555, ' 555.555'), (5555.55, ' 5.556k'),
+                  (55555.5, ' 55.556k'), (555555, ' 555.555k')]
+        self.compare_all(formatter, in_out)
+
+        formatter = fmt.EngFormatter(accuracy=1, use_eng_prefix=True)
+        in_out = [(5.55555, ' 5.6'), (55.5555, ' 55.6'), (555.555, ' 555.6'),
+                  (5555.55, ' 5.6k'), (55555.5, ' 55.6k'), (555555, ' 555.6k')]
+        self.compare_all(formatter, in_out)
+
+        formatter = fmt.EngFormatter(accuracy=0, use_eng_prefix=True)
+        in_out = [(5.55555, ' 6'), (55.5555, ' 56'), (555.555, ' 556'),
+                  (5555.55, ' 6k'), (55555.5, ' 56k'), (555555, ' 556k')]
+        self.compare_all(formatter, in_out)
+
+        formatter = fmt.EngFormatter(accuracy=3, use_eng_prefix=True)
+        result = formatter(0)
+        assert result == u(' 0.000')
+
+    def test_nan(self):
+        # Issue #11981
+
+        formatter = fmt.EngFormatter(accuracy=1, use_eng_prefix=True)
+        result = formatter(np.nan)
+        assert result == u('NaN')
+
+        df = pd.DataFrame({'a': [1.5, 10.3, 20.5],
+                           'b': [50.3, 60.67, 70.12],
+                           'c': [100.2, 101.33, 120.33]})
+        pt = df.pivot_table(values='a', index='b', columns='c')
+        fmt.set_eng_float_format(accuracy=1)
+        result = pt.to_string()
+        assert 'NaN' in result
+        tm.reset_display_options()
+
+    def test_inf(self):
+        # Issue #11981
+
+        formatter = fmt.EngFormatter(accuracy=1, use_eng_prefix=True)
+        result = formatter(np.inf)
+        assert result == u('inf')
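Similarly, the `EngFormatter` behaviour asserted above reduces to a small contract. This sketch is illustrative only, with the `accuracy`/`use_eng_prefix` signature taken from the tests themselves; it is not part of the patch.

```python
import pandas.io.formats.format as fmt

# Engineering notation with SI prefixes, three digits of accuracy.
formatter = fmt.EngFormatter(accuracy=3, use_eng_prefix=True)
formatter(1414.0)   # ' 1.414k' (cf. test_exponents_with_eng_prefix)
formatter(-1414.0)  # '-1.414k' (negatives mirror positives)

# Without prefixes, exponents are kept in multiples of three.
formatter = fmt.EngFormatter(accuracy=4, use_eng_prefix=False)
formatter(3141.6)   # ' 3.1416E+03' (cf. test_exponents_without_eng_prefix)
```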
in the same place + r = repr(df) + for ix, l in enumerate(r.splitlines()): + if not r.split()[cand_col] == '...': + return False + return True + + +def has_vertically_truncated_repr(df): + r = repr(df) + only_dot_row = False + for row in r.splitlines(): + if re.match(r'^[\.\ ]+$', row): + only_dot_row = True + return only_dot_row + + +def has_truncated_repr(df): + return has_horizontally_truncated_repr( + df) or has_vertically_truncated_repr(df) + + +def has_doubly_truncated_repr(df): + return has_horizontally_truncated_repr( + df) and has_vertically_truncated_repr(df) + + +def has_expanded_repr(df): + r = repr(df) + for line in r.split('\n'): + if line.endswith('\\'): + return True + return False + + +class TestDataFrameFormatting(object): + + def setup_method(self, method): + self.warn_filters = warnings.filters + warnings.filterwarnings('ignore', category=FutureWarning, + module=".*format") + + self.frame = _frame.copy() + + def teardown_method(self, method): + warnings.filters = self.warn_filters + + def test_repr_embedded_ndarray(self): + arr = np.empty(10, dtype=[('err', object)]) + for i in range(len(arr)): + arr['err'][i] = np.random.randn(i) + + df = DataFrame(arr) + repr(df['err']) + repr(df) + df.to_string() + + def test_eng_float_formatter(self): + self.frame.loc[5] = 0 + + fmt.set_eng_float_format() + repr(self.frame) + + fmt.set_eng_float_format(use_eng_prefix=True) + repr(self.frame) + + fmt.set_eng_float_format(accuracy=0) + repr(self.frame) + tm.reset_display_options() + + def test_show_null_counts(self): + + df = DataFrame(1, columns=range(10), index=range(10)) + df.iloc[1, 1] = np.nan + + def check(null_counts, result): + buf = StringIO() + df.info(buf=buf, null_counts=null_counts) + assert ('non-null' in buf.getvalue()) is result + + with option_context('display.max_info_rows', 20, + 'display.max_info_columns', 20): + check(None, True) + check(True, True) + check(False, False) + + with option_context('display.max_info_rows', 5, + 'display.max_info_columns', 5): + check(None, False) + check(True, False) + check(False, False) + + def test_repr_tuples(self): + buf = StringIO() + + df = DataFrame({'tups': lzip(range(10), range(10))}) + repr(df) + df.to_string(col_space=10, buf=buf) + + def test_repr_truncation(self): + max_len = 20 + with option_context("display.max_colwidth", max_len): + df = DataFrame({'A': np.random.randn(10), + 'B': [tm.rands(np.random.randint( + max_len - 1, max_len + 1)) for i in range(10) + ]}) + r = repr(df) + r = r[r.find('\n') + 1:] + + adj = fmt._get_adjustment() + + for line, value in lzip(r.split('\n'), df['B']): + if adj.len(value) + 1 > max_len: + assert '...' in line + else: + assert '...' not in line + + with option_context("display.max_colwidth", 999999): + assert '...' not in repr(df) + + with option_context("display.max_colwidth", max_len + 2): + assert '...' 
+    def test_repr_chop_threshold(self):
+        df = DataFrame([[0.1, 0.5], [0.5, -0.1]])
+        pd.reset_option("display.chop_threshold")  # default None
+        assert repr(df) == '     0    1\n0  0.1  0.5\n1  0.5 -0.1'
+
+        with option_context("display.chop_threshold", 0.2):
+            assert repr(df) == '     0    1\n0  0.0  0.5\n1  0.5  0.0'
+
+        with option_context("display.chop_threshold", 0.6):
+            assert repr(df) == '     0    1\n0  0.0  0.0\n1  0.0  0.0'
+
+        with option_context("display.chop_threshold", None):
+            assert repr(df) == '     0    1\n0  0.1  0.5\n1  0.5 -0.1'
+
+    def test_repr_chop_threshold_column_below(self):
+        # GH 6839: validation case
+
+        df = pd.DataFrame([[10, 20, 30, 40],
+                           [8e-10, -1e-11, 2e-9, -2e-11]]).T
+
+        with option_context("display.chop_threshold", 0):
+            assert repr(df) == ('      0             1\n'
+                                '0  10.0  8.000000e-10\n'
+                                '1  20.0 -1.000000e-11\n'
+                                '2  30.0  2.000000e-09\n'
+                                '3  40.0 -2.000000e-11')
+
+        with option_context("display.chop_threshold", 1e-8):
+            assert repr(df) == ('      0             1\n'
+                                '0  10.0  0.000000e+00\n'
+                                '1  20.0  0.000000e+00\n'
+                                '2  30.0  0.000000e+00\n'
+                                '3  40.0  0.000000e+00')
+
+        with option_context("display.chop_threshold", 5e-11):
+            assert repr(df) == ('      0             1\n'
+                                '0  10.0  8.000000e-10\n'
+                                '1  20.0  0.000000e+00\n'
+                                '2  30.0  2.000000e-09\n'
+                                '3  40.0  0.000000e+00')
+
+    def test_repr_obeys_max_seq_limit(self):
+        with option_context("display.max_seq_items", 2000):
+            assert len(printing.pprint_thing(lrange(1000))) > 1000
+
+        with option_context("display.max_seq_items", 5):
+            assert len(printing.pprint_thing(lrange(1000))) < 100
+
+    def test_repr_set(self):
+        assert printing.pprint_thing(set([1])) == '{1}'
+
+    def test_repr_is_valid_construction_code(self):
+        # for the case of Index, where the repr is traditional rather than
+        # stylized
+        idx = Index(['a', 'b'])
+        res = eval("pd." + repr(idx))
+        tm.assert_series_equal(Series(res), Series(idx))
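The chop-threshold tests above boil down to this behavior; a minimal sketch, assuming default float formatting otherwise:

```python
import pandas as pd

df = pd.DataFrame([[0.1, 0.5], [0.5, -0.1]])
with pd.option_context('display.chop_threshold', 0.2):
    print(df)  # entries whose absolute value is below 0.2 display as 0.0
```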
+    def test_repr_should_return_str(self):
+        # http://docs.python.org/py3k/reference/datamodel.html#object.__repr__
+        # http://docs.python.org/reference/datamodel.html#object.__repr__
+        # "...The return value must be a string object."
+
+        # (str on py2.x, str (unicode) on py3)
+
+        data = [8, 5, 3, 5]
+        index1 = [u("\u03c3"), u("\u03c4"), u("\u03c5"), u("\u03c6")]
+        cols = [u("\u03c8")]
+        df = DataFrame(data, columns=cols, index=index1)
+        assert type(df.__repr__()) == str  # both py2 / 3
+
+    def test_repr_no_backslash(self):
+        with option_context('mode.sim_interactive', True):
+            df = DataFrame(np.random.randn(10, 4))
+            assert '\\' not in repr(df)
+
+    def test_expand_frame_repr(self):
+        df_small = DataFrame('hello', [0], [0])
+        df_wide = DataFrame('hello', [0], lrange(10))
+        df_tall = DataFrame('hello', lrange(30), lrange(5))
+
+        with option_context('mode.sim_interactive', True):
+            with option_context('display.max_columns', 10, 'display.width', 20,
+                                'display.max_rows', 20,
+                                'display.show_dimensions', True):
+                with option_context('display.expand_frame_repr', True):
+                    assert not has_truncated_repr(df_small)
+                    assert not has_expanded_repr(df_small)
+                    assert not has_truncated_repr(df_wide)
+                    assert has_expanded_repr(df_wide)
+                    assert has_vertically_truncated_repr(df_tall)
+                    assert has_expanded_repr(df_tall)
+
+                with option_context('display.expand_frame_repr', False):
+                    assert not has_truncated_repr(df_small)
+                    assert not has_expanded_repr(df_small)
+                    assert not has_horizontally_truncated_repr(df_wide)
+                    assert not has_expanded_repr(df_wide)
+                    assert has_vertically_truncated_repr(df_tall)
+                    assert not has_expanded_repr(df_tall)
+
+    def test_repr_non_interactive(self):
+        # in non interactive mode, there can be no dependency on the
+        # result of terminal auto size detection
+        df = DataFrame('hello', lrange(1000), lrange(5))
+
+        with option_context('mode.sim_interactive', False, 'display.width', 0,
+                            'display.max_rows', 5000):
+            assert not has_truncated_repr(df)
+            assert not has_expanded_repr(df)
+
+    def test_repr_max_columns_max_rows(self):
+        term_width, term_height = get_terminal_size()
+        if term_width < 10 or term_height < 10:
+            pytest.skip("terminal size too small, "
+                        "{0} x {1}".format(term_width, term_height))
+
+        def mkframe(n):
+            index = ['%05d' % i for i in range(n)]
+            return DataFrame(0, index, index)
+
+        df6 = mkframe(6)
+        df10 = mkframe(10)
+        with option_context('mode.sim_interactive', True):
+            with option_context('display.width', term_width * 2):
+                with option_context('display.max_rows', 5,
+                                    'display.max_columns', 5):
+                    assert not has_expanded_repr(mkframe(4))
+                    assert not has_expanded_repr(mkframe(5))
+                    assert not has_expanded_repr(df6)
+                    assert has_doubly_truncated_repr(df6)
+
+                with option_context('display.max_rows', 20,
+                                    'display.max_columns', 10):
+                    # Out of the max_columns boundary, but no expansion,
+                    # since the width is not exceeded
+                    assert not has_expanded_repr(df6)
+                    assert not has_truncated_repr(df6)
+
+                with option_context('display.max_rows', 9,
+                                    'display.max_columns', 10):
+                    # exceeding vertical bounds cannot result in expanded repr
+                    assert not has_expanded_repr(df10)
+                    assert has_vertically_truncated_repr(df10)
+
+            # width=None in terminal, auto detection
+            with option_context('display.max_columns', 100, 'display.max_rows',
+                                term_width * 20, 'display.width', None):
+                df = mkframe((term_width // 7) - 2)
+                assert not has_expanded_repr(df)
+                df = mkframe((term_width // 7) + 2)
+                printing.pprint_thing(df._repr_fits_horizontal_())
+                assert has_expanded_repr(df)
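`has_expanded_repr` keys off the block-wrapped layout produced when a frame is wider than `display.width` and `expand_frame_repr` is on; a small sketch (the width of 40 is arbitrary):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((3, 12)))
with pd.option_context('display.expand_frame_repr', True,
                       'display.width', 40,
                       'display.max_columns', 20):
    print(df)  # wraps into column blocks; non-final lines end with '\'
```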
+    def test_str_max_colwidth(self):
+        # GH 7856
+        df = pd.DataFrame([{'a': 'foo',
+                            'b': 'bar',
+                            'c': 'uncomfortably long line with lots of stuff',
+                            'd': 1}, {'a': 'foo',
+                                      'b': 'bar',
+                                      'c': 'stuff',
+                                      'd': 1}])
+        df.set_index(['a', 'b', 'c'])
+        assert str(df) == (
+            '     a    b                                           c  d\n'
+            '0  foo  bar  uncomfortably long line with lots of stuff  1\n'
+            '1  foo  bar                                       stuff  1')
+        with option_context('max_colwidth', 20):
+            assert str(df) == ('     a    b                    c  d\n'
+                               '0  foo  bar  uncomfortably lo...  1\n'
+                               '1  foo  bar                stuff  1')
+
+    def test_auto_detect(self):
+        term_width, term_height = get_terminal_size()
+        fac = 1.05  # Arbitrary large factor to exceed term width
+        cols = range(int(term_width * fac))
+        index = range(10)
+        df = DataFrame(index=index, columns=cols)
+        with option_context('mode.sim_interactive', True):
+            with option_context('max_rows', None):
+                with option_context('max_columns', None):
+                    # Wrap around with None
+                    assert has_expanded_repr(df)
+            with option_context('max_rows', 0):
+                with option_context('max_columns', 0):
+                    # Truncate with auto detection.
+                    assert has_horizontally_truncated_repr(df)
+
+            index = range(int(term_height * fac))
+            df = DataFrame(index=index, columns=cols)
+            with option_context('max_rows', 0):
+                with option_context('max_columns', None):
+                    # Wrap around with None
+                    assert has_expanded_repr(df)
+                    # Truncate vertically
+                    assert has_vertically_truncated_repr(df)
+
+            with option_context('max_rows', None):
+                with option_context('max_columns', 0):
+                    assert has_horizontally_truncated_repr(df)
+
+    def test_to_string_repr_unicode(self):
+        buf = StringIO()
+
+        unicode_values = [u('\u03c3')] * 10
+        unicode_values = np.array(unicode_values, dtype=object)
+        df = DataFrame({'unicode': unicode_values})
+        df.to_string(col_space=10, buf=buf)
+
+        # it works!
+        repr(df)
+
+        idx = Index(['abc', u('\u03c3a'), 'aegdvg'])
+        ser = Series(np.random.randn(len(idx)), idx)
+        rs = repr(ser).split('\n')
+        line_len = len(rs[0])
+        for line in rs[1:]:
+            try:
+                line = line.decode(get_option("display.encoding"))
+            except:
+                pass
+            if not line.startswith('dtype:'):
+                assert len(line) == line_len
+
+        # it works even if sys.stdin is None
+        _stdin = sys.stdin
+        try:
+            sys.stdin = None
+            repr(df)
+        finally:
+            sys.stdin = _stdin
+
+    def test_to_string_unicode_columns(self):
+        df = DataFrame({u('\u03c3'): np.arange(10.)})
+
+        buf = StringIO()
+        df.to_string(buf=buf)
+        buf.getvalue()
+
+        buf = StringIO()
+        df.info(buf=buf)
+        buf.getvalue()
+
+        result = self.frame.to_string()
+        assert isinstance(result, compat.text_type)
+
+    def test_to_string_utf8_columns(self):
+        n = u("\u05d0").encode('utf-8')
+
+        with option_context('display.max_rows', 1):
+            df = DataFrame([1, 2], columns=[n])
+            repr(df)
+
+    def test_to_string_unicode_two(self):
+        dm = DataFrame({u('c/\u03c3'): []})
+        buf = StringIO()
+        dm.to_string(buf)
+
+    def test_to_string_unicode_three(self):
+        dm = DataFrame(['\xc2'])
+        buf = StringIO()
+        dm.to_string(buf)
+
+    def test_to_string_with_formatters(self):
+        df = DataFrame({'int': [1, 2, 3],
+                        'float': [1.0, 2.0, 3.0],
+                        'object': [(1, 2), True, False]},
+                       columns=['int', 'float', 'object'])
+
+        formatters = [('int', lambda x: '0x%x' % x),
+                      ('float', lambda x: '[% 4.1f]' % x),
+                      ('object', lambda x: '-%s-' % str(x))]
+        result = df.to_string(formatters=dict(formatters))
+        result2 = df.to_string(formatters=lzip(*formatters)[1])
+        assert result == ('  int  float    object\n'
+                          '0 0x1  [ 1.0]  -(1, 2)-\n'
+                          '1 0x2  [ 2.0]    -True-\n'
+                          '2 0x3  [ 3.0]   -False-')
+        assert result == result2
+
+    def test_to_string_with_datetime64_monthformatter(self):
+        months = [datetime(2016, 1, 1), datetime(2016, 2, 2)]
+        x = DataFrame({'months': months})
+
+        def format_func(x):
+            return x.strftime('%Y-%m')
+        result = x.to_string(formatters={'months': format_func})
+        expected = 'months\n0 2016-01\n1
2016-02' + assert result.strip() == expected + + def test_to_string_with_datetime64_hourformatter(self): + + x = DataFrame({'hod': pd.to_datetime(['10:10:10.100', '12:12:12.120'], + format='%H:%M:%S.%f')}) + + def format_func(x): + return x.strftime('%H:%M') + + result = x.to_string(formatters={'hod': format_func}) + expected = 'hod\n0 10:10\n1 12:12' + assert result.strip() == expected + + def test_to_string_with_formatters_unicode(self): + df = DataFrame({u('c/\u03c3'): [1, 2, 3]}) + result = df.to_string(formatters={u('c/\u03c3'): lambda x: '%s' % x}) + assert result == u(' c/\u03c3\n') + '0 1\n1 2\n2 3' + + def test_east_asian_unicode_frame(self): + if PY3: + _rep = repr + else: + _rep = unicode # noqa + + # not alighned properly because of east asian width + + # mid col + df = DataFrame({'a': [u'あ', u'いいい', u'う', u'ええええええ'], + 'b': [1, 222, 33333, 4]}, + index=['a', 'bb', 'c', 'ddd']) + expected = (u" a b\na あ 1\n" + u"bb いいい 222\nc う 33333\n" + u"ddd ええええええ 4") + assert _rep(df) == expected + + # last col + df = DataFrame({'a': [1, 222, 33333, 4], + 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, + index=['a', 'bb', 'c', 'ddd']) + expected = (u" a b\na 1 あ\n" + u"bb 222 いいい\nc 33333 う\n" + u"ddd 4 ええええええ") + assert _rep(df) == expected + + # all col + df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], + 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, + index=['a', 'bb', 'c', 'ddd']) + expected = (u" a b\na あああああ あ\n" + u"bb い いいい\nc う う\n" + u"ddd えええ ええええええ") + assert _rep(df) == expected + + # column name + df = DataFrame({u'あああああ': [1, 222, 33333, 4], + 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, + index=['a', 'bb', 'c', 'ddd']) + expected = (u" b あああああ\na あ 1\n" + u"bb いいい 222\nc う 33333\n" + u"ddd ええええええ 4") + assert _rep(df) == expected + + # index + df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], + 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, + index=[u'あああ', u'いいいいいい', u'うう', u'え']) + expected = (u" a b\nあああ あああああ あ\n" + u"いいいいいい い いいい\nうう う う\n" + u"え えええ ええええええ") + assert _rep(df) == expected + + # index name + df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], + 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, + index=pd.Index([u'あ', u'い', u'うう', u'え'], + name=u'おおおお')) + expected = (u" a b\n" + u"おおおお \n" + u"あ あああああ あ\n" + u"い い いいい\n" + u"うう う う\n" + u"え えええ ええええええ") + assert _rep(df) == expected + + # all + df = DataFrame({u'あああ': [u'あああ', u'い', u'う', u'えええええ'], + u'いいいいい': [u'あ', u'いいい', u'う', u'ええ']}, + index=pd.Index([u'あ', u'いいい', u'うう', u'え'], + name=u'お')) + expected = (u" あああ いいいいい\n" + u"お \n" + u"あ あああ あ\n" + u"いいい い いいい\n" + u"うう う う\n" + u"え えええええ ええ") + assert _rep(df) == expected + + # MultiIndex + idx = pd.MultiIndex.from_tuples([(u'あ', u'いい'), (u'う', u'え'), ( + u'おおお', u'かかかか'), (u'き', u'くく')]) + df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], + 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, + index=idx) + expected = (u" a b\n" + u"あ いい あああああ あ\n" + u"う え い いいい\n" + u"おおお かかかか う う\n" + u"き くく えええ ええええええ") + assert _rep(df) == expected + + # truncate + with option_context('display.max_rows', 3, 'display.max_columns', 3): + df = pd.DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], + 'b': [u'あ', u'いいい', u'う', u'ええええええ'], + 'c': [u'お', u'か', u'ききき', u'くくくくくく'], + u'ああああ': [u'さ', u'し', u'す', u'せ']}, + columns=['a', 'b', 'c', u'ああああ']) + + expected = (u" a ... ああああ\n0 あああああ ... さ\n" + u".. ... ... ...\n3 えええ ... せ\n" + u"\n[4 rows x 4 columns]") + assert _rep(df) == expected + + df.index = [u'あああ', u'いいいい', u'う', 'aaa'] + expected = (u" a ... ああああ\nあああ あああああ ... さ\n" + u".. ... ... 
...\naaa えええ ... せ\n" + u"\n[4 rows x 4 columns]") + assert _rep(df) == expected + + # Emable Unicode option ----------------------------------------- + with option_context('display.unicode.east_asian_width', True): + + # mid col + df = DataFrame({'a': [u'あ', u'いいい', u'う', u'ええええええ'], + 'b': [1, 222, 33333, 4]}, + index=['a', 'bb', 'c', 'ddd']) + expected = (u" a b\na あ 1\n" + u"bb いいい 222\nc う 33333\n" + u"ddd ええええええ 4") + assert _rep(df) == expected + + # last col + df = DataFrame({'a': [1, 222, 33333, 4], + 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, + index=['a', 'bb', 'c', 'ddd']) + expected = (u" a b\na 1 あ\n" + u"bb 222 いいい\nc 33333 う\n" + u"ddd 4 ええええええ") + assert _rep(df) == expected + + # all col + df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], + 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, + index=['a', 'bb', 'c', 'ddd']) + expected = (u" a b\n" + u"a あああああ あ\n" + u"bb い いいい\n" + u"c う う\n" + u"ddd えええ ええええええ") + assert _rep(df) == expected + + # column name + df = DataFrame({u'あああああ': [1, 222, 33333, 4], + 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, + index=['a', 'bb', 'c', 'ddd']) + expected = (u" b あああああ\n" + u"a あ 1\n" + u"bb いいい 222\n" + u"c う 33333\n" + u"ddd ええええええ 4") + assert _rep(df) == expected + + # index + df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], + 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, + index=[u'あああ', u'いいいいいい', u'うう', u'え']) + expected = (u" a b\n" + u"あああ あああああ あ\n" + u"いいいいいい い いいい\n" + u"うう う う\n" + u"え えええ ええええええ") + assert _rep(df) == expected + + # index name + df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], + 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, + index=pd.Index([u'あ', u'い', u'うう', u'え'], + name=u'おおおお')) + expected = (u" a b\n" + u"おおおお \n" + u"あ あああああ あ\n" + u"い い いいい\n" + u"うう う う\n" + u"え えええ ええええええ") + assert _rep(df) == expected + + # all + df = DataFrame({u'あああ': [u'あああ', u'い', u'う', u'えええええ'], + u'いいいいい': [u'あ', u'いいい', u'う', u'ええ']}, + index=pd.Index([u'あ', u'いいい', u'うう', u'え'], + name=u'お')) + expected = (u" あああ いいいいい\n" + u"お \n" + u"あ あああ あ\n" + u"いいい い いいい\n" + u"うう う う\n" + u"え えええええ ええ") + assert _rep(df) == expected + + # MultiIndex + idx = pd.MultiIndex.from_tuples([(u'あ', u'いい'), (u'う', u'え'), ( + u'おおお', u'かかかか'), (u'き', u'くく')]) + df = DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], + 'b': [u'あ', u'いいい', u'う', u'ええええええ']}, + index=idx) + expected = (u" a b\n" + u"あ いい あああああ あ\n" + u"う え い いいい\n" + u"おおお かかかか う う\n" + u"き くく えええ ええええええ") + assert _rep(df) == expected + + # truncate + with option_context('display.max_rows', 3, 'display.max_columns', + 3): + + df = pd.DataFrame({'a': [u'あああああ', u'い', u'う', u'えええ'], + 'b': [u'あ', u'いいい', u'う', u'ええええええ'], + 'c': [u'お', u'か', u'ききき', u'くくくくくく'], + u'ああああ': [u'さ', u'し', u'す', u'せ']}, + columns=['a', 'b', 'c', u'ああああ']) + + expected = (u" a ... ああああ\n" + u"0 あああああ ... さ\n" + u".. ... ... ...\n" + u"3 えええ ... せ\n" + u"\n[4 rows x 4 columns]") + assert _rep(df) == expected + + df.index = [u'あああ', u'いいいい', u'う', 'aaa'] + expected = (u" a ... ああああ\n" + u"あああ あああああ ... さ\n" + u"... ... ... ...\n" + u"aaa えええ ... 
せ\n" + u"\n[4 rows x 4 columns]") + assert _rep(df) == expected + + # ambiguous unicode + df = DataFrame({u'あああああ': [1, 222, 33333, 4], + 'b': [u'あ', u'いいい', u'¡¡', u'ええええええ']}, + index=['a', 'bb', 'c', '¡¡¡']) + expected = (u" b あああああ\n" + u"a あ 1\n" + u"bb いいい 222\n" + u"c ¡¡ 33333\n" + u"¡¡¡ ええええええ 4") + assert _rep(df) == expected + + def test_to_string_buffer_all_unicode(self): + buf = StringIO() + + empty = DataFrame({u('c/\u03c3'): Series()}) + nonempty = DataFrame({u('c/\u03c3'): Series([1, 2, 3])}) + + print(empty, file=buf) + print(nonempty, file=buf) + + # this should work + buf.getvalue() + + def test_to_string_with_col_space(self): + df = DataFrame(np.random.random(size=(1, 3))) + c10 = len(df.to_string(col_space=10).split("\n")[1]) + c20 = len(df.to_string(col_space=20).split("\n")[1]) + c30 = len(df.to_string(col_space=30).split("\n")[1]) + assert c10 < c20 < c30 + + # GH 8230 + # col_space wasn't being applied with header=False + with_header = df.to_string(col_space=20) + with_header_row1 = with_header.splitlines()[1] + no_header = df.to_string(col_space=20, header=False) + assert len(with_header_row1) == len(no_header) + + def test_to_string_truncate_indices(self): + for index in [tm.makeStringIndex, tm.makeUnicodeIndex, tm.makeIntIndex, + tm.makeDateIndex, tm.makePeriodIndex]: + for column in [tm.makeStringIndex]: + for h in [10, 20]: + for w in [10, 20]: + with option_context("display.expand_frame_repr", + False): + df = DataFrame(index=index(h), columns=column(w)) + with option_context("display.max_rows", 15): + if h == 20: + assert has_vertically_truncated_repr(df) + else: + assert not has_vertically_truncated_repr( + df) + with option_context("display.max_columns", 15): + if w == 20: + assert has_horizontally_truncated_repr(df) + else: + assert not ( + has_horizontally_truncated_repr(df)) + with option_context("display.max_rows", 15, + "display.max_columns", 15): + if h == 20 and w == 20: + assert has_doubly_truncated_repr(df) + else: + assert not has_doubly_truncated_repr( + df) + + def test_to_string_truncate_multilevel(self): + arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], + ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] + df = DataFrame(index=arrays, columns=arrays) + with option_context("display.max_rows", 7, "display.max_columns", 7): + assert has_doubly_truncated_repr(df) + + def test_truncate_with_different_dtypes(self): + + # 11594, 12045 + # when truncated the dtypes of the splits can differ + + # 11594 + import datetime + s = Series([datetime.datetime(2012, 1, 1)] * 10 + + [datetime.datetime(1012, 1, 2)] + [ + datetime.datetime(2012, 1, 3)] * 10) + + with pd.option_context('display.max_rows', 8): + result = str(s) + assert 'object' in result + + # 12045 + df = DataFrame({'text': ['some words'] + [None] * 9}) + + with pd.option_context('display.max_rows', 8, + 'display.max_columns', 3): + result = str(df) + assert 'None' in result + assert 'NaN' not in result + + def test_datetimelike_frame(self): + + # GH 12211 + df = DataFrame( + {'date': [pd.Timestamp('20130101').tz_localize('UTC')] + + [pd.NaT] * 5}) + + with option_context("display.max_rows", 5): + result = str(df) + assert '2013-01-01 00:00:00+00:00' in result + assert 'NaT' in result + assert '...' 
in result + assert '[6 rows x 1 columns]' in result + + dts = [pd.Timestamp('2011-01-01', tz='US/Eastern')] * 5 + [pd.NaT] * 5 + df = pd.DataFrame({"dt": dts, + "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}) + with option_context('display.max_rows', 5): + expected = (' dt x\n' + '0 2011-01-01 00:00:00-05:00 1\n' + '1 2011-01-01 00:00:00-05:00 2\n' + '.. ... ..\n' + '8 NaT 9\n' + '9 NaT 10\n\n' + '[10 rows x 2 columns]') + assert repr(df) == expected + + dts = [pd.NaT] * 5 + [pd.Timestamp('2011-01-01', tz='US/Eastern')] * 5 + df = pd.DataFrame({"dt": dts, + "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}) + with option_context('display.max_rows', 5): + expected = (' dt x\n' + '0 NaT 1\n' + '1 NaT 2\n' + '.. ... ..\n' + '8 2011-01-01 00:00:00-05:00 9\n' + '9 2011-01-01 00:00:00-05:00 10\n\n' + '[10 rows x 2 columns]') + assert repr(df) == expected + + dts = ([pd.Timestamp('2011-01-01', tz='Asia/Tokyo')] * 5 + + [pd.Timestamp('2011-01-01', tz='US/Eastern')] * 5) + df = pd.DataFrame({"dt": dts, + "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}) + with option_context('display.max_rows', 5): + expected = (' dt x\n' + '0 2011-01-01 00:00:00+09:00 1\n' + '1 2011-01-01 00:00:00+09:00 2\n' + '.. ... ..\n' + '8 2011-01-01 00:00:00-05:00 9\n' + '9 2011-01-01 00:00:00-05:00 10\n\n' + '[10 rows x 2 columns]') + assert repr(df) == expected + + def test_nonunicode_nonascii_alignment(self): + df = DataFrame([["aa\xc3\xa4\xc3\xa4", 1], ["bbbb", 2]]) + rep_str = df.to_string() + lines = rep_str.split('\n') + assert len(lines[1]) == len(lines[2]) + + def test_unicode_problem_decoding_as_ascii(self): + dm = DataFrame({u('c/\u03c3'): Series({'test': np.nan})}) + compat.text_type(dm.to_string()) + + def test_string_repr_encoding(self): + filepath = tm.get_data_path('unicode_series.csv') + df = pd.read_csv(filepath, header=None, encoding='latin1') + repr(df) + repr(df[1]) + + def test_repr_corner(self): + # representing infs poses no problems + df = DataFrame({'foo': [-np.inf, np.inf]}) + repr(df) + + def test_frame_info_encoding(self): + index = ['\'Til There Was You (1997)', + 'ldum klaka (Cold Fever) (1994)'] + fmt.set_option('display.max_rows', 1) + df = DataFrame(columns=['a', 'b', 'c'], index=index) + repr(df) + repr(df.T) + fmt.set_option('display.max_rows', 200) + + def test_pprint_thing(self): + from pandas.io.formats.printing import pprint_thing as pp_t + + if PY3: + pytest.skip("doesn't work on Python 3") + + assert pp_t('a') == u('a') + assert pp_t(u('a')) == u('a') + assert pp_t(None) == 'None' + assert pp_t(u('\u05d0'), quote_strings=True) == u("u'\u05d0'") + assert pp_t(u('\u05d0'), quote_strings=False) == u('\u05d0') + assert (pp_t((u('\u05d0'), u('\u05d1')), quote_strings=True) == + u("(u'\u05d0', u'\u05d1')")) + assert (pp_t((u('\u05d0'), (u('\u05d1'), u('\u05d2'))), + quote_strings=True) == u("(u'\u05d0', " + "(u'\u05d1', u'\u05d2'))")) + assert (pp_t(('foo', u('\u05d0'), (u('\u05d0'), u('\u05d0'))), + quote_strings=True) == u("(u'foo', u'\u05d0', " + "(u'\u05d0', u'\u05d0'))")) + + # gh-2038: escape embedded tabs in string + assert "\t" not in pp_t("a\tb", escape_chars=("\t", )) + + def test_wide_repr(self): + with option_context('mode.sim_interactive', True, + 'display.show_dimensions', True): + max_cols = get_option('display.max_columns') + df = DataFrame(tm.rands_array(25, size=(10, max_cols - 1))) + set_option('display.expand_frame_repr', False) + rep_str = repr(df) + + assert "10 rows x %d columns" % (max_cols - 1) in rep_str + set_option('display.expand_frame_repr', True) + wide_repr = repr(df) + assert rep_str != 
wide_repr + + with option_context('display.width', 120): + wider_repr = repr(df) + assert len(wider_repr) < len(wide_repr) + + reset_option('display.expand_frame_repr') + + def test_wide_repr_wide_columns(self): + with option_context('mode.sim_interactive', True): + df = DataFrame(np.random.randn(5, 3), + columns=['a' * 90, 'b' * 90, 'c' * 90]) + rep_str = repr(df) + + assert len(rep_str.splitlines()) == 20 + + def test_wide_repr_named(self): + with option_context('mode.sim_interactive', True): + max_cols = get_option('display.max_columns') + df = DataFrame(tm.rands_array(25, size=(10, max_cols - 1))) + df.index.name = 'DataFrame Index' + set_option('display.expand_frame_repr', False) + + rep_str = repr(df) + set_option('display.expand_frame_repr', True) + wide_repr = repr(df) + assert rep_str != wide_repr + + with option_context('display.width', 150): + wider_repr = repr(df) + assert len(wider_repr) < len(wide_repr) + + for line in wide_repr.splitlines()[1::13]: + assert 'DataFrame Index' in line + + reset_option('display.expand_frame_repr') + + def test_wide_repr_multiindex(self): + with option_context('mode.sim_interactive', True): + midx = MultiIndex.from_arrays(tm.rands_array(5, size=(2, 10))) + max_cols = get_option('display.max_columns') + df = DataFrame(tm.rands_array(25, size=(10, max_cols - 1)), + index=midx) + df.index.names = ['Level 0', 'Level 1'] + set_option('display.expand_frame_repr', False) + rep_str = repr(df) + set_option('display.expand_frame_repr', True) + wide_repr = repr(df) + assert rep_str != wide_repr + + with option_context('display.width', 150): + wider_repr = repr(df) + assert len(wider_repr) < len(wide_repr) + + for line in wide_repr.splitlines()[1::13]: + assert 'Level 0 Level 1' in line + + reset_option('display.expand_frame_repr') + + def test_wide_repr_multiindex_cols(self): + with option_context('mode.sim_interactive', True): + max_cols = get_option('display.max_columns') + midx = MultiIndex.from_arrays(tm.rands_array(5, size=(2, 10))) + mcols = MultiIndex.from_arrays( + tm.rands_array(3, size=(2, max_cols - 1))) + df = DataFrame(tm.rands_array(25, (10, max_cols - 1)), + index=midx, columns=mcols) + df.index.names = ['Level 0', 'Level 1'] + set_option('display.expand_frame_repr', False) + rep_str = repr(df) + set_option('display.expand_frame_repr', True) + wide_repr = repr(df) + assert rep_str != wide_repr + + with option_context('display.width', 150): + wider_repr = repr(df) + assert len(wider_repr) < len(wide_repr) + + reset_option('display.expand_frame_repr') + + def test_wide_repr_unicode(self): + with option_context('mode.sim_interactive', True): + max_cols = get_option('display.max_columns') + df = DataFrame(tm.rands_array(25, size=(10, max_cols - 1))) + set_option('display.expand_frame_repr', False) + rep_str = repr(df) + set_option('display.expand_frame_repr', True) + wide_repr = repr(df) + assert rep_str != wide_repr + + with option_context('display.width', 150): + wider_repr = repr(df) + assert len(wider_repr) < len(wide_repr) + + reset_option('display.expand_frame_repr') + + def test_wide_repr_wide_long_columns(self): + with option_context('mode.sim_interactive', True): + df = DataFrame({'a': ['a' * 30, 'b' * 30], + 'b': ['c' * 70, 'd' * 80]}) + + result = repr(df) + assert 'ccccc' in result + assert 'ddddd' in result + + def test_long_series(self): + n = 1000 + s = Series( + np.random.randint(-50, 50, n), + index=['s%04d' % x for x in range(n)], dtype='int64') + + import re + str_rep = str(s) + nmatches = len(re.findall('dtype', str_rep)) + 
assert nmatches == 1 + + def test_index_with_nan(self): + # GH 2850 + df = DataFrame({'id1': {0: '1a3', + 1: '9h4'}, + 'id2': {0: np.nan, + 1: 'd67'}, + 'id3': {0: '78d', + 1: '79d'}, + 'value': {0: 123, + 1: 64}}) + + # multi-index + y = df.set_index(['id1', 'id2', 'id3']) + result = y.to_string() + expected = u( + ' value\nid1 id2 id3 \n' + '1a3 NaN 78d 123\n9h4 d67 79d 64') + assert result == expected + + # index + y = df.set_index('id2') + result = y.to_string() + expected = u( + ' id1 id3 value\nid2 \n' + 'NaN 1a3 78d 123\nd67 9h4 79d 64') + assert result == expected + + # with append (this failed in 0.12) + y = df.set_index(['id1', 'id2']).set_index('id3', append=True) + result = y.to_string() + expected = u( + ' value\nid1 id2 id3 \n' + '1a3 NaN 78d 123\n9h4 d67 79d 64') + assert result == expected + + # all-nan in mi + df2 = df.copy() + df2.loc[:, 'id2'] = np.nan + y = df2.set_index('id2') + result = y.to_string() + expected = u( + ' id1 id3 value\nid2 \n' + 'NaN 1a3 78d 123\nNaN 9h4 79d 64') + assert result == expected + + # partial nan in mi + df2 = df.copy() + df2.loc[:, 'id2'] = np.nan + y = df2.set_index(['id2', 'id3']) + result = y.to_string() + expected = u( + ' id1 value\nid2 id3 \n' + 'NaN 78d 1a3 123\n 79d 9h4 64') + assert result == expected + + df = DataFrame({'id1': {0: np.nan, + 1: '9h4'}, + 'id2': {0: np.nan, + 1: 'd67'}, + 'id3': {0: np.nan, + 1: '79d'}, + 'value': {0: 123, + 1: 64}}) + + y = df.set_index(['id1', 'id2', 'id3']) + result = y.to_string() + expected = u( + ' value\nid1 id2 id3 \n' + 'NaN NaN NaN 123\n9h4 d67 79d 64') + assert result == expected + + def test_to_string(self): + + # big mixed + biggie = DataFrame({'A': np.random.randn(200), + 'B': tm.makeStringIndex(200)}, + index=lrange(200)) + + biggie.loc[:20, 'A'] = np.nan + biggie.loc[:20, 'B'] = np.nan + s = biggie.to_string() + + buf = StringIO() + retval = biggie.to_string(buf=buf) + assert retval is None + assert buf.getvalue() == s + + assert isinstance(s, compat.string_types) + + # print in right order + result = biggie.to_string(columns=['B', 'A'], col_space=17, + float_format='%.5f'.__mod__) + lines = result.split('\n') + header = lines[0].strip().split() + joined = '\n'.join([re.sub(r'\s+', ' ', x).strip() for x in lines[1:]]) + recons = read_table(StringIO(joined), names=header, + header=None, sep=' ') + tm.assert_series_equal(recons['B'], biggie['B']) + assert recons['A'].count() == biggie['A'].count() + assert (np.abs(recons['A'].dropna() - + biggie['A'].dropna()) < 0.1).all() + + # expected = ['B', 'A'] + # assert header == expected + + result = biggie.to_string(columns=['A'], col_space=17) + header = result.split('\n')[0].strip().split() + expected = ['A'] + assert header == expected + + biggie.to_string(columns=['B', 'A'], + formatters={'A': lambda x: '%.1f' % x}) + + biggie.to_string(columns=['B', 'A'], float_format=str) + biggie.to_string(columns=['B', 'A'], col_space=12, float_format=str) + + frame = DataFrame(index=np.arange(200)) + frame.to_string() + + def test_to_string_no_header(self): + df = DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}) + + df_s = df.to_string(header=False) + expected = "0 1 4\n1 2 5\n2 3 6" + + assert df_s == expected + + def test_to_string_specified_header(self): + df = DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}) + + df_s = df.to_string(header=['X', 'Y']) + expected = ' X Y\n0 1 4\n1 2 5\n2 3 6' + + assert df_s == expected + + with pytest.raises(ValueError): + df.to_string(header=['X']) + + def test_to_string_no_index(self): + df = DataFrame({'x': [1, 2, 3], 
'y': [4, 5, 6]}) + + df_s = df.to_string(index=False) + expected = "x y\n1 4\n2 5\n3 6" + + assert df_s == expected + + def test_to_string_line_width_no_index(self): + df = DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}) + + df_s = df.to_string(line_width=1, index=False) + expected = "x \\\n1 \n2 \n3 \n\ny \n4 \n5 \n6" + + assert df_s == expected + + def test_to_string_float_formatting(self): + tm.reset_display_options() + fmt.set_option('display.precision', 5, 'display.column_space', 12, + 'display.notebook_repr_html', False) + + df = DataFrame({'x': [0, 0.25, 3456.000, 12e+45, 1.64e+6, 1.7e+8, + 1.253456, np.pi, -1e6]}) + + df_s = df.to_string() + + # Python 2.5 just wants me to be sad. And debian 32-bit + # sys.version_info[0] == 2 and sys.version_info[1] < 6: + if _three_digit_exp(): + expected = (' x\n0 0.00000e+000\n1 2.50000e-001\n' + '2 3.45600e+003\n3 1.20000e+046\n4 1.64000e+006\n' + '5 1.70000e+008\n6 1.25346e+000\n7 3.14159e+000\n' + '8 -1.00000e+006') + else: + expected = (' x\n0 0.00000e+00\n1 2.50000e-01\n' + '2 3.45600e+03\n3 1.20000e+46\n4 1.64000e+06\n' + '5 1.70000e+08\n6 1.25346e+00\n7 3.14159e+00\n' + '8 -1.00000e+06') + assert df_s == expected + + df = DataFrame({'x': [3234, 0.253]}) + df_s = df.to_string() + + expected = (' x\n' '0 3234.000\n' '1 0.253') + assert df_s == expected + + tm.reset_display_options() + assert get_option("display.precision") == 6 + + df = DataFrame({'x': [1e9, 0.2512]}) + df_s = df.to_string() + # Python 2.5 just wants me to be sad. And debian 32-bit + # sys.version_info[0] == 2 and sys.version_info[1] < 6: + if _three_digit_exp(): + expected = (' x\n' + '0 1.000000e+009\n' + '1 2.512000e-001') + else: + expected = (' x\n' + '0 1.000000e+09\n' + '1 2.512000e-01') + assert df_s == expected + + def test_to_string_small_float_values(self): + df = DataFrame({'a': [1.5, 1e-17, -5.5e-7]}) + + result = df.to_string() + # sadness per above + if '%.4g' % 1.7e8 == '1.7e+008': + expected = (' a\n' + '0 1.500000e+000\n' + '1 1.000000e-017\n' + '2 -5.500000e-007') + else: + expected = (' a\n' + '0 1.500000e+00\n' + '1 1.000000e-17\n' + '2 -5.500000e-07') + assert result == expected + + # but not all exactly zero + df = df * 0 + result = df.to_string() + expected = (' 0\n' '0 0\n' '1 0\n' '2 -0') + + def test_to_string_float_index(self): + index = Index([1.5, 2, 3, 4, 5]) + df = DataFrame(lrange(5), index=index) + + result = df.to_string() + expected = (' 0\n' + '1.5 0\n' + '2.0 1\n' + '3.0 2\n' + '4.0 3\n' + '5.0 4') + assert result == expected + + def test_to_string_ascii_error(self): + data = [('0 ', u(' .gitignore '), u(' 5 '), + ' \xe2\x80\xa2\xe2\x80\xa2\xe2\x80' + '\xa2\xe2\x80\xa2\xe2\x80\xa2')] + df = DataFrame(data) + + # it works! 
+ repr(df) + + def test_to_string_int_formatting(self): + df = DataFrame({'x': [-15, 20, 25, -35]}) + assert issubclass(df['x'].dtype.type, np.integer) + + output = df.to_string() + expected = (' x\n' '0 -15\n' '1 20\n' '2 25\n' '3 -35') + assert output == expected + + def test_to_string_index_formatter(self): + df = DataFrame([lrange(5), lrange(5, 10), lrange(10, 15)]) + + rs = df.to_string(formatters={'__index__': lambda x: 'abc' [x]}) + + xp = """\ + 0 1 2 3 4 +a 0 1 2 3 4 +b 5 6 7 8 9 +c 10 11 12 13 14\ +""" + + assert rs == xp + + def test_to_string_left_justify_cols(self): + tm.reset_display_options() + df = DataFrame({'x': [3234, 0.253]}) + df_s = df.to_string(justify='left') + expected = (' x \n' '0 3234.000\n' '1 0.253') + assert df_s == expected + + def test_to_string_format_na(self): + tm.reset_display_options() + df = DataFrame({'A': [np.nan, -1, -2.1234, 3, 4], + 'B': [np.nan, 'foo', 'foooo', 'fooooo', 'bar']}) + result = df.to_string() + + expected = (' A B\n' + '0 NaN NaN\n' + '1 -1.0000 foo\n' + '2 -2.1234 foooo\n' + '3 3.0000 fooooo\n' + '4 4.0000 bar') + assert result == expected + + df = DataFrame({'A': [np.nan, -1., -2., 3., 4.], + 'B': [np.nan, 'foo', 'foooo', 'fooooo', 'bar']}) + result = df.to_string() + + expected = (' A B\n' + '0 NaN NaN\n' + '1 -1.0 foo\n' + '2 -2.0 foooo\n' + '3 3.0 fooooo\n' + '4 4.0 bar') + assert result == expected + + def test_to_string_line_width(self): + df = DataFrame(123, lrange(10, 15), lrange(30)) + s = df.to_string(line_width=80) + assert max(len(l) for l in s.split('\n')) == 80 + + def test_show_dimensions(self): + df = DataFrame(123, lrange(10, 15), lrange(30)) + + with option_context('display.max_rows', 10, 'display.max_columns', 40, + 'display.width', 500, 'display.expand_frame_repr', + 'info', 'display.show_dimensions', True): + assert '5 rows' in str(df) + assert '5 rows' in df._repr_html_() + with option_context('display.max_rows', 10, 'display.max_columns', 40, + 'display.width', 500, 'display.expand_frame_repr', + 'info', 'display.show_dimensions', False): + assert '5 rows' not in str(df) + assert '5 rows' not in df._repr_html_() + with option_context('display.max_rows', 2, 'display.max_columns', 2, + 'display.width', 500, 'display.expand_frame_repr', + 'info', 'display.show_dimensions', 'truncate'): + assert '5 rows' in str(df) + assert '5 rows' in df._repr_html_() + with option_context('display.max_rows', 10, 'display.max_columns', 40, + 'display.width', 500, 'display.expand_frame_repr', + 'info', 'display.show_dimensions', 'truncate'): + assert '5 rows' not in str(df) + assert '5 rows' not in df._repr_html_() + + def test_repr_html(self): + self.frame._repr_html_() + + fmt.set_option('display.max_rows', 1, 'display.max_columns', 1) + self.frame._repr_html_() + + fmt.set_option('display.notebook_repr_html', False) + self.frame._repr_html_() + + tm.reset_display_options() + + df = DataFrame([[1, 2], [3, 4]]) + fmt.set_option('display.show_dimensions', True) + assert '2 rows' in df._repr_html_() + fmt.set_option('display.show_dimensions', False) + assert '2 rows' not in df._repr_html_() + + tm.reset_display_options() + + def test_repr_html_wide(self): + max_cols = get_option('display.max_columns') + df = DataFrame(tm.rands_array(25, size=(10, max_cols - 1))) + reg_repr = df._repr_html_() + assert "..." not in reg_repr + + wide_df = DataFrame(tm.rands_array(25, size=(10, max_cols + 1))) + wide_repr = wide_df._repr_html_() + assert "..." 
in wide_repr + + def test_repr_html_wide_multiindex_cols(self): + max_cols = get_option('display.max_columns') + + mcols = MultiIndex.from_product([np.arange(max_cols // 2), + ['foo', 'bar']], + names=['first', 'second']) + df = DataFrame(tm.rands_array(25, size=(10, len(mcols))), + columns=mcols) + reg_repr = df._repr_html_() + assert '...' not in reg_repr + + mcols = MultiIndex.from_product((np.arange(1 + (max_cols // 2)), + ['foo', 'bar']), + names=['first', 'second']) + df = DataFrame(tm.rands_array(25, size=(10, len(mcols))), + columns=mcols) + wide_repr = df._repr_html_() + assert '...' in wide_repr + + def test_repr_html_long(self): + with option_context('display.max_rows', 60): + max_rows = get_option('display.max_rows') + h = max_rows - 1 + df = DataFrame({'A': np.arange(1, 1 + h), + 'B': np.arange(41, 41 + h)}) + reg_repr = df._repr_html_() + assert '..' not in reg_repr + assert str(41 + max_rows // 2) in reg_repr + + h = max_rows + 1 + df = DataFrame({'A': np.arange(1, 1 + h), + 'B': np.arange(41, 41 + h)}) + long_repr = df._repr_html_() + assert '..' in long_repr + assert str(41 + max_rows // 2) not in long_repr + assert u('%d rows ') % h in long_repr + assert u('2 columns') in long_repr + + def test_repr_html_float(self): + with option_context('display.max_rows', 60): + + max_rows = get_option('display.max_rows') + h = max_rows - 1 + df = DataFrame({'idx': np.linspace(-10, 10, h), + 'A': np.arange(1, 1 + h), + 'B': np.arange(41, 41 + h)}).set_index('idx') + reg_repr = df._repr_html_() + assert '..' not in reg_repr + assert str(40 + h) in reg_repr + + h = max_rows + 1 + df = DataFrame({'idx': np.linspace(-10, 10, h), + 'A': np.arange(1, 1 + h), + 'B': np.arange(41, 41 + h)}).set_index('idx') + long_repr = df._repr_html_() + assert '..' in long_repr + assert '31' not in long_repr + assert u('%d rows ') % h in long_repr + assert u('2 columns') in long_repr + + def test_repr_html_long_multiindex(self): + max_rows = get_option('display.max_rows') + max_L1 = max_rows // 2 + + tuples = list(itertools.product(np.arange(max_L1), ['foo', 'bar'])) + idx = MultiIndex.from_tuples(tuples, names=['first', 'second']) + df = DataFrame(np.random.randn(max_L1 * 2, 2), index=idx, + columns=['A', 'B']) + reg_repr = df._repr_html_() + assert '...' not in reg_repr + + tuples = list(itertools.product(np.arange(max_L1 + 1), ['foo', 'bar'])) + idx = MultiIndex.from_tuples(tuples, names=['first', 'second']) + df = DataFrame(np.random.randn((max_L1 + 1) * 2, 2), index=idx, + columns=['A', 'B']) + long_repr = df._repr_html_() + assert '...' in long_repr + + def test_repr_html_long_and_wide(self): + max_cols = get_option('display.max_columns') + max_rows = get_option('display.max_rows') + + h, w = max_rows - 1, max_cols - 1 + df = DataFrame(dict((k, np.arange(1, 1 + h)) for k in np.arange(w))) + assert '...' not in df._repr_html_() + + h, w = max_rows + 1, max_cols + 1 + df = DataFrame(dict((k, np.arange(1, 1 + h)) for k in np.arange(w))) + assert '...' 
in df._repr_html_()
+
+    def test_info_repr(self):
+        max_rows = get_option('display.max_rows')
+        max_cols = get_option('display.max_columns')
+        # Long
+        h, w = max_rows + 1, max_cols - 1
+        df = DataFrame(dict((k, np.arange(1, 1 + h)) for k in np.arange(w)))
+        assert has_vertically_truncated_repr(df)
+        with option_context('display.large_repr', 'info'):
+            assert has_info_repr(df)
+
+        # Wide
+        h, w = max_rows - 1, max_cols + 1
+        df = DataFrame(dict((k, np.arange(1, 1 + h)) for k in np.arange(w)))
+        assert has_horizontally_truncated_repr(df)
+        with option_context('display.large_repr', 'info'):
+            assert has_info_repr(df)
+
+    def test_info_repr_max_cols(self):
+        # GH #6939
+        df = DataFrame(np.random.randn(10, 5))
+        with option_context('display.large_repr', 'info',
+                            'display.max_columns', 1,
+                            'display.max_info_columns', 4):
+            assert has_non_verbose_info_repr(df)
+
+        with option_context('display.large_repr', 'info',
+                            'display.max_columns', 1,
+                            'display.max_info_columns', 5):
+            assert not has_non_verbose_info_repr(df)
+
+        # test verbose overrides
+        # fmt.set_option('display.max_info_columns', 4)  # exceeded
+
+    def test_info_repr_html(self):
+        max_rows = get_option('display.max_rows')
+        max_cols = get_option('display.max_columns')
+        # Long
+        h, w = max_rows + 1, max_cols - 1
+        df = DataFrame(dict((k, np.arange(1, 1 + h)) for k in np.arange(w)))
+        assert r'&lt;class' not in df._repr_html_()
+        with option_context('display.large_repr', 'info'):
+            assert r'&lt;class' in df._repr_html_()
+
+        # Wide
+        h, w = max_rows - 1, max_cols + 1
+        df = DataFrame(dict((k, np.arange(1, 1 + h)) for k in np.arange(w)))
+        assert '<class' not in df._repr_html_()
+        with option_context('display.large_repr', 'info'):
+            assert '&lt;class' in df._repr_html_()
+
+
+class TestFloatArrayFormatter(object):
+
+    def test_too_long(self):
+        # GH 10451
+        with pd.option_context('display.precision', 4):
+            # need both a number > 1e6 and something that normally formats to
+            # having length > display.precision + 6
+            df = pd.DataFrame(dict(x=[12345.6789]))
+            assert str(df) == '            x\n0  12345.6789'
+            df = pd.DataFrame(dict(x=[2e6]))
+            assert str(df) == '           x\n0  2000000.0'
+            df = pd.DataFrame(dict(x=[12345.6789, 2e6]))
+            assert str(df) == '            x\n0  1.2346e+04\n1  2.0000e+06'
+
+
+class TestRepr_timedelta64(object):
+
+    def test_none(self):
+        delta_1d = pd.to_timedelta(1, unit='D')
+        delta_0d = pd.to_timedelta(0, unit='D')
+        delta_1s = pd.to_timedelta(1, unit='s')
+        delta_500ms = pd.to_timedelta(500, unit='ms')
+
+        drepr = lambda x: x._repr_base()
+        assert drepr(delta_1d) == "1 days"
+        assert drepr(-delta_1d) == "-1 days"
+        assert drepr(delta_0d) == "0 days"
+        assert drepr(delta_1s) == "0 days 00:00:01"
+        assert drepr(delta_500ms) == "0 days 00:00:00.500000"
+        assert drepr(delta_1d + delta_1s) == "1 days 00:00:01"
+        assert drepr(delta_1d + delta_500ms) == "1 days 00:00:00.500000"
+
+    def test_even_day(self):
+        delta_1d = pd.to_timedelta(1, unit='D')
+        delta_0d = pd.to_timedelta(0, unit='D')
+        delta_1s = pd.to_timedelta(1, unit='s')
+        delta_500ms = pd.to_timedelta(500, unit='ms')
+
+        drepr = lambda x: x._repr_base(format='even_day')
+        assert drepr(delta_1d) == "1 days"
+        assert drepr(-delta_1d) == "-1 days"
+        assert drepr(delta_0d) == "0 days"
+        assert drepr(delta_1s) == "0 days 00:00:01"
+        assert drepr(delta_500ms) == "0 days 00:00:00.500000"
+        assert drepr(delta_1d + delta_1s) == "1 days 00:00:01"
+        assert drepr(delta_1d + delta_500ms) == "1 days 00:00:00.500000"
+
+    def test_sub_day(self):
+        delta_1d = pd.to_timedelta(1, unit='D')
+        delta_0d = pd.to_timedelta(0, unit='D')
+        delta_1s = pd.to_timedelta(1, unit='s')
+        delta_500ms = pd.to_timedelta(500, unit='ms')
+
+        drepr = lambda x: x._repr_base(format='sub_day')
+        assert drepr(delta_1d) == "1 days"
+        assert drepr(-delta_1d) == "-1 days"
+        assert drepr(delta_0d) == "00:00:00"
+        assert
drepr(delta_1s) == "00:00:01" + assert drepr(delta_500ms) == "00:00:00.500000" + assert drepr(delta_1d + delta_1s) == "1 days 00:00:01" + assert drepr(delta_1d + delta_500ms) == "1 days 00:00:00.500000" + + def test_long(self): + delta_1d = pd.to_timedelta(1, unit='D') + delta_0d = pd.to_timedelta(0, unit='D') + delta_1s = pd.to_timedelta(1, unit='s') + delta_500ms = pd.to_timedelta(500, unit='ms') + + drepr = lambda x: x._repr_base(format='long') + assert drepr(delta_1d) == "1 days 00:00:00" + assert drepr(-delta_1d) == "-1 days +00:00:00" + assert drepr(delta_0d) == "0 days 00:00:00" + assert drepr(delta_1s) == "0 days 00:00:01" + assert drepr(delta_500ms) == "0 days 00:00:00.500000" + assert drepr(delta_1d + delta_1s) == "1 days 00:00:01" + assert drepr(delta_1d + delta_500ms) == "1 days 00:00:00.500000" + + def test_all(self): + delta_1d = pd.to_timedelta(1, unit='D') + delta_0d = pd.to_timedelta(0, unit='D') + delta_1ns = pd.to_timedelta(1, unit='ns') + + drepr = lambda x: x._repr_base(format='all') + assert drepr(delta_1d) == "1 days 00:00:00.000000000" + assert drepr(delta_0d) == "0 days 00:00:00.000000000" + assert drepr(delta_1ns) == "0 days 00:00:00.000000001" + + +class TestTimedelta64Formatter(object): + + def test_days(self): + x = pd.to_timedelta(list(range(5)) + [pd.NaT], unit='D') + result = fmt.Timedelta64Formatter(x, box=True).get_result() + assert result[0].strip() == "'0 days'" + assert result[1].strip() == "'1 days'" + + result = fmt.Timedelta64Formatter(x[1:2], box=True).get_result() + assert result[0].strip() == "'1 days'" + + result = fmt.Timedelta64Formatter(x, box=False).get_result() + assert result[0].strip() == "0 days" + assert result[1].strip() == "1 days" + + result = fmt.Timedelta64Formatter(x[1:2], box=False).get_result() + assert result[0].strip() == "1 days" + + def test_days_neg(self): + x = pd.to_timedelta(list(range(5)) + [pd.NaT], unit='D') + result = fmt.Timedelta64Formatter(-x, box=True).get_result() + assert result[0].strip() == "'0 days'" + assert result[1].strip() == "'-1 days'" + + def test_subdays(self): + y = pd.to_timedelta(list(range(5)) + [pd.NaT], unit='s') + result = fmt.Timedelta64Formatter(y, box=True).get_result() + assert result[0].strip() == "'00:00:00'" + assert result[1].strip() == "'00:00:01'" + + def test_subdays_neg(self): + y = pd.to_timedelta(list(range(5)) + [pd.NaT], unit='s') + result = fmt.Timedelta64Formatter(-y, box=True).get_result() + assert result[0].strip() == "'00:00:00'" + assert result[1].strip() == "'-1 days +23:59:59'" + + def test_zero(self): + x = pd.to_timedelta(list(range(1)) + [pd.NaT], unit='D') + result = fmt.Timedelta64Formatter(x, box=True).get_result() + assert result[0].strip() == "'0 days'" + + x = pd.to_timedelta(list(range(1)), unit='D') + result = fmt.Timedelta64Formatter(x, box=True).get_result() + assert result[0].strip() == "'0 days'" + + +class TestDatetime64Formatter(object): + + def test_mixed(self): + x = Series([datetime(2013, 1, 1), datetime(2013, 1, 1, 12), pd.NaT]) + result = fmt.Datetime64Formatter(x).get_result() + assert result[0].strip() == "2013-01-01 00:00:00" + assert result[1].strip() == "2013-01-01 12:00:00" + + def test_dates(self): + x = Series([datetime(2013, 1, 1), datetime(2013, 1, 2), pd.NaT]) + result = fmt.Datetime64Formatter(x).get_result() + assert result[0].strip() == "2013-01-01" + assert result[1].strip() == "2013-01-02" + + def test_date_nanos(self): + x = Series([Timestamp(200)]) + result = fmt.Datetime64Formatter(x).get_result() + assert result[0].strip() == 
"1970-01-01 00:00:00.000000200" + + def test_dates_display(self): + + # 10170 + # make sure that we are consistently display date formatting + x = Series(date_range('20130101 09:00:00', periods=5, freq='D')) + x.iloc[1] = np.nan + result = fmt.Datetime64Formatter(x).get_result() + assert result[0].strip() == "2013-01-01 09:00:00" + assert result[1].strip() == "NaT" + assert result[4].strip() == "2013-01-05 09:00:00" + + x = Series(date_range('20130101 09:00:00', periods=5, freq='s')) + x.iloc[1] = np.nan + result = fmt.Datetime64Formatter(x).get_result() + assert result[0].strip() == "2013-01-01 09:00:00" + assert result[1].strip() == "NaT" + assert result[4].strip() == "2013-01-01 09:00:04" + + x = Series(date_range('20130101 09:00:00', periods=5, freq='ms')) + x.iloc[1] = np.nan + result = fmt.Datetime64Formatter(x).get_result() + assert result[0].strip() == "2013-01-01 09:00:00.000" + assert result[1].strip() == "NaT" + assert result[4].strip() == "2013-01-01 09:00:00.004" + + x = Series(date_range('20130101 09:00:00', periods=5, freq='us')) + x.iloc[1] = np.nan + result = fmt.Datetime64Formatter(x).get_result() + assert result[0].strip() == "2013-01-01 09:00:00.000000" + assert result[1].strip() == "NaT" + assert result[4].strip() == "2013-01-01 09:00:00.000004" + + x = Series(date_range('20130101 09:00:00', periods=5, freq='N')) + x.iloc[1] = np.nan + result = fmt.Datetime64Formatter(x).get_result() + assert result[0].strip() == "2013-01-01 09:00:00.000000000" + assert result[1].strip() == "NaT" + assert result[4].strip() == "2013-01-01 09:00:00.000000004" + + def test_datetime64formatter_yearmonth(self): + x = Series([datetime(2016, 1, 1), datetime(2016, 2, 2)]) + + def format_func(x): + return x.strftime('%Y-%m') + + formatter = fmt.Datetime64Formatter(x, formatter=format_func) + result = formatter.get_result() + assert result == ['2016-01', '2016-02'] + + def test_datetime64formatter_hoursecond(self): + + x = Series(pd.to_datetime(['10:10:10.100', '12:12:12.120'], + format='%H:%M:%S.%f')) + + def format_func(x): + return x.strftime('%H:%M') + + formatter = fmt.Datetime64Formatter(x, formatter=format_func) + result = formatter.get_result() + assert result == ['10:10', '12:12'] + + +class TestNaTFormatting(object): + + def test_repr(self): + assert repr(pd.NaT) == "NaT" + + def test_str(self): + assert str(pd.NaT) == "NaT" + + +class TestDatetimeIndexFormat(object): + + def test_datetime(self): + formatted = pd.to_datetime([datetime(2003, 1, 1, 12), pd.NaT]).format() + assert formatted[0] == "2003-01-01 12:00:00" + assert formatted[1] == "NaT" + + def test_date(self): + formatted = pd.to_datetime([datetime(2003, 1, 1), pd.NaT]).format() + assert formatted[0] == "2003-01-01" + assert formatted[1] == "NaT" + + def test_date_tz(self): + formatted = pd.to_datetime([datetime(2013, 1, 1)], utc=True).format() + assert formatted[0] == "2013-01-01 00:00:00+00:00" + + formatted = pd.to_datetime( + [datetime(2013, 1, 1), pd.NaT], utc=True).format() + assert formatted[0] == "2013-01-01 00:00:00+00:00" + + def test_date_explict_date_format(self): + formatted = pd.to_datetime([datetime(2003, 2, 1), pd.NaT]).format( + date_format="%m-%d-%Y", na_rep="UT") + assert formatted[0] == "02-01-2003" + assert formatted[1] == "UT" + + +class TestDatetimeIndexUnicode(object): + + def test_dates(self): + text = str(pd.to_datetime([datetime(2013, 1, 1), datetime(2014, 1, 1) + ])) + assert "['2013-01-01'," in text + assert ", '2014-01-01']" in text + + def test_mixed(self): + text = 
str(pd.to_datetime([datetime(2013, 1, 1), datetime( + 2014, 1, 1, 12), datetime(2014, 1, 1)])) + assert "'2013-01-01 00:00:00'," in text + assert "'2014-01-01 00:00:00']" in text + + +class TestStringRepTimestamp(object): + + def test_no_tz(self): + dt_date = datetime(2013, 1, 2) + assert str(dt_date) == str(Timestamp(dt_date)) + + dt_datetime = datetime(2013, 1, 2, 12, 1, 3) + assert str(dt_datetime) == str(Timestamp(dt_datetime)) + + dt_datetime_us = datetime(2013, 1, 2, 12, 1, 3, 45) + assert str(dt_datetime_us) == str(Timestamp(dt_datetime_us)) + + ts_nanos_only = Timestamp(200) + assert str(ts_nanos_only) == "1970-01-01 00:00:00.000000200" + + ts_nanos_micros = Timestamp(1200) + assert str(ts_nanos_micros) == "1970-01-01 00:00:00.000001200" + + def test_tz_pytz(self): + dt_date = datetime(2013, 1, 2, tzinfo=pytz.utc) + assert str(dt_date) == str(Timestamp(dt_date)) + + dt_datetime = datetime(2013, 1, 2, 12, 1, 3, tzinfo=pytz.utc) + assert str(dt_datetime) == str(Timestamp(dt_datetime)) + + dt_datetime_us = datetime(2013, 1, 2, 12, 1, 3, 45, tzinfo=pytz.utc) + assert str(dt_datetime_us) == str(Timestamp(dt_datetime_us)) + + def test_tz_dateutil(self): + utc = dateutil.tz.tzutc() + + dt_date = datetime(2013, 1, 2, tzinfo=utc) + assert str(dt_date) == str(Timestamp(dt_date)) + + dt_datetime = datetime(2013, 1, 2, 12, 1, 3, tzinfo=utc) + assert str(dt_datetime) == str(Timestamp(dt_datetime)) + + dt_datetime_us = datetime(2013, 1, 2, 12, 1, 3, 45, tzinfo=utc) + assert str(dt_datetime_us) == str(Timestamp(dt_datetime_us)) + + def test_nat_representations(self): + for f in (str, repr, methodcaller('isoformat')): + assert f(pd.NaT) == 'NaT' + + +def test_format_percentiles(): + result = fmt.format_percentiles([0.01999, 0.02001, 0.5, 0.666666, 0.9999]) + expected = ['1.999%', '2.001%', '50%', '66.667%', '99.99%'] + assert result == expected + + result = fmt.format_percentiles([0, 0.5, 0.02001, 0.5, 0.666666, 0.9999]) + expected = ['0%', '50%', '2.0%', '50%', '66.67%', '99.99%'] + assert result == expected + + pytest.raises(ValueError, fmt.format_percentiles, [0.1, np.nan, 0.5]) + pytest.raises(ValueError, fmt.format_percentiles, [-0.001, 0.1, 0.5]) + pytest.raises(ValueError, fmt.format_percentiles, [2, 0.1, 0.5]) + pytest.raises(ValueError, fmt.format_percentiles, [0.1, 0.5, 'a']) diff --git a/pandas/tests/io/formats/test_printing.py b/pandas/tests/io/formats/test_printing.py new file mode 100644 index 0000000000000..ec34e7656e01f --- /dev/null +++ b/pandas/tests/io/formats/test_printing.py @@ -0,0 +1,220 @@ +# -*- coding: utf-8 -*- +import pytest + +import numpy as np +import pandas as pd + +from pandas import compat +import pandas.io.formats.printing as printing +import pandas.io.formats.format as fmt +import pandas.core.config as cf + + +def test_adjoin(): + data = [['a', 'b', 'c'], ['dd', 'ee', 'ff'], ['ggg', 'hhh', 'iii']] + expected = 'a dd ggg\nb ee hhh\nc ff iii' + + adjoined = printing.adjoin(2, *data) + + assert (adjoined == expected) + + +def test_repr_binary_type(): + import string + letters = string.ascii_letters + btype = compat.binary_type + try: + raw = btype(letters, encoding=cf.get_option('display.encoding')) + except TypeError: + raw = btype(letters) + b = compat.text_type(compat.bytes_to_str(raw)) + res = printing.pprint_thing(b, quote_strings=True) + assert res == repr(b) + res = printing.pprint_thing(b, quote_strings=False) + assert res == b + + +class TestFormattBase(object): + + def test_adjoin(self): + data = [['a', 'b', 'c'], ['dd', 'ee', 'ff'], ['ggg', 'hhh', 
'iii']] + expected = 'a dd ggg\nb ee hhh\nc ff iii' + + adjoined = printing.adjoin(2, *data) + + assert adjoined == expected + + def test_adjoin_unicode(self): + data = [[u'あ', 'b', 'c'], ['dd', u'ええ', 'ff'], ['ggg', 'hhh', u'いいい']] + expected = u'あ dd ggg\nb ええ hhh\nc ff いいい' + adjoined = printing.adjoin(2, *data) + assert adjoined == expected + + adj = fmt.EastAsianTextAdjustment() + + expected = u"""あ dd ggg +b ええ hhh +c ff いいい""" + + adjoined = adj.adjoin(2, *data) + assert adjoined == expected + cols = adjoined.split('\n') + assert adj.len(cols[0]) == 13 + assert adj.len(cols[1]) == 13 + assert adj.len(cols[2]) == 16 + + expected = u"""あ dd ggg +b ええ hhh +c ff いいい""" + + adjoined = adj.adjoin(7, *data) + assert adjoined == expected + cols = adjoined.split('\n') + assert adj.len(cols[0]) == 23 + assert adj.len(cols[1]) == 23 + assert adj.len(cols[2]) == 26 + + def test_justify(self): + adj = fmt.EastAsianTextAdjustment() + + def just(x, *args, **kwargs): + # wrapper to test single str + return adj.justify([x], *args, **kwargs)[0] + + assert just('abc', 5, mode='left') == 'abc ' + assert just('abc', 5, mode='center') == ' abc ' + assert just('abc', 5, mode='right') == ' abc' + assert just(u'abc', 5, mode='left') == 'abc ' + assert just(u'abc', 5, mode='center') == ' abc ' + assert just(u'abc', 5, mode='right') == ' abc' + + assert just(u'パンダ', 5, mode='left') == u'パンダ' + assert just(u'パンダ', 5, mode='center') == u'パンダ' + assert just(u'パンダ', 5, mode='right') == u'パンダ' + + assert just(u'パンダ', 10, mode='left') == u'パンダ ' + assert just(u'パンダ', 10, mode='center') == u' パンダ ' + assert just(u'パンダ', 10, mode='right') == u' パンダ' + + def test_east_asian_len(self): + adj = fmt.EastAsianTextAdjustment() + + assert adj.len('abc') == 3 + assert adj.len(u'abc') == 3 + + assert adj.len(u'パンダ') == 6 + assert adj.len(u'パンダ') == 5 + assert adj.len(u'パンダpanda') == 11 + assert adj.len(u'パンダpanda') == 10 + + def test_ambiguous_width(self): + adj = fmt.EastAsianTextAdjustment() + assert adj.len(u'¡¡ab') == 4 + + with cf.option_context('display.unicode.ambiguous_as_wide', True): + adj = fmt.EastAsianTextAdjustment() + assert adj.len(u'¡¡ab') == 6 + + data = [[u'あ', 'b', 'c'], ['dd', u'ええ', 'ff'], + ['ggg', u'¡¡ab', u'いいい']] + expected = u'あ dd ggg \nb ええ ¡¡ab\nc ff いいい' + adjoined = adj.adjoin(2, *data) + assert adjoined == expected + + +class TestTableSchemaRepr(object): + + @classmethod + def setup_class(cls): + pytest.importorskip('IPython') + + from IPython.core.interactiveshell import InteractiveShell + cls.display_formatter = InteractiveShell.instance().display_formatter + + def test_publishes(self): + + df = pd.DataFrame({"A": [1, 2]}) + objects = [df['A'], df, df] # dataframe / series + expected_keys = [ + {'text/plain', 'application/vnd.dataresource+json'}, + {'text/plain', 'text/html', 'application/vnd.dataresource+json'}, + ] + + opt = pd.option_context('display.html.table_schema', True) + for obj, expected in zip(objects, expected_keys): + with opt: + formatted = self.display_formatter.format(obj) + assert set(formatted[0].keys()) == expected + + with_latex = pd.option_context('display.latex.repr', True) + + with opt, with_latex: + formatted = self.display_formatter.format(obj) + + expected = {'text/plain', 'text/html', 'text/latex', + 'application/vnd.dataresource+json'} + assert set(formatted[0].keys()) == expected + + def test_publishes_not_implemented(self): + # column MultiIndex + # GH 15996 + midx = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c']]) + df = 
pd.DataFrame(np.random.randn(5, len(midx)), columns=midx) + + opt = pd.option_context('display.html.table_schema', True) + + with opt: + formatted = self.display_formatter.format(df) + + expected = {'text/plain', 'text/html'} + assert set(formatted[0].keys()) == expected + + def test_config_on(self): + df = pd.DataFrame({"A": [1, 2]}) + with pd.option_context("display.html.table_schema", True): + result = df._repr_data_resource_() + + assert result is not None + + def test_config_default_off(self): + df = pd.DataFrame({"A": [1, 2]}) + with pd.option_context("display.html.table_schema", False): + result = df._repr_data_resource_() + + assert result is None + + def test_enable_data_resource_formatter(self): + # GH 10491 + formatters = self.display_formatter.formatters + mimetype = 'application/vnd.dataresource+json' + + with pd.option_context('display.html.table_schema', True): + assert 'application/vnd.dataresource+json' in formatters + assert formatters[mimetype].enabled + + # still there, just disabled + assert 'application/vnd.dataresource+json' in formatters + assert not formatters[mimetype].enabled + + # able to re-set + with pd.option_context('display.html.table_schema', True): + assert 'application/vnd.dataresource+json' in formatters + assert formatters[mimetype].enabled + # smoke test that it works + self.display_formatter.format(cf) + + +# TODO: fix this broken test + +# def test_console_encode(): +# """ +# On Python 2, if sys.stdin.encoding is None (IPython with zmq frontend) +# common.console_encode should encode things as utf-8. +# """ +# if compat.PY3: +# pytest.skip + +# with tm.stdin_encoding(encoding=None): +# result = printing.console_encode(u"\u05d0") +# expected = u"\u05d0".encode('utf-8') +# assert (result == expected) diff --git a/pandas/tests/formats/test_style.py b/pandas/tests/io/formats/test_style.py similarity index 61% rename from pandas/tests/formats/test_style.py rename to pandas/tests/io/formats/test_style.py index 2fec04b9c1aa3..59d9f938734ab 100644 --- a/pandas/tests/formats/test_style.py +++ b/pandas/tests/io/formats/test_style.py @@ -1,33 +1,20 @@ -import os -from nose import SkipTest - import copy +import textwrap +import re + +import pytest import numpy as np import pandas as pd from pandas import DataFrame -from pandas.util.testing import TestCase import pandas.util.testing as tm -# Getting failures on a python 2.7 build with -# whenever we try to import jinja, whether it's installed or not. -# so we're explicitly skipping that one *before* we try to import -# jinja. We still need to export the imports as globals, -# since importing Styler tries to import jinja2. 
-job_name = os.environ.get('JOB_NAME', None) -if job_name == '27_slow_nnet_LOCALE': - raise SkipTest("No jinja") -try: - # Do try except on just jinja, so the only reason - # We skip is if jinja can't import, not something else - import jinja2 # noqa -except ImportError: - raise SkipTest("No Jinja2") -from pandas.formats.style import Styler, _get_level_lengths # noqa - - -class TestStyler(TestCase): - - def setUp(self): +jinja2 = pytest.importorskip('jinja2') +from pandas.io.formats.style import Styler, _get_level_lengths # noqa + + +class TestStyler(object): + + def setup_method(self, method): np.random.seed(24) self.s = DataFrame({'A': np.random.permutation(range(6))}) self.df = DataFrame({'A': [0, 1], 'B': np.random.randn(2)}) @@ -47,12 +34,12 @@ def h(x, foo='bar'): ] def test_init_non_pandas(self): - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): Styler([1, 2, 3]) def test_init_series(self): result = Styler(pd.Series([1, 2])) - self.assertEqual(result.data.ndim, 2) + assert result.data.ndim == 2 def test_repr_html_ok(self): self.styler._repr_html_() @@ -61,7 +48,7 @@ def test_update_ctx(self): self.styler._update_ctx(self.attrs) expected = {(0, 0): ['color: red'], (1, 0): ['color: blue']} - self.assertEqual(self.styler.ctx, expected) + assert self.styler.ctx == expected def test_update_ctx_flatten_multi(self): attrs = DataFrame({"A": ['color: red; foo: bar', @@ -69,7 +56,7 @@ def test_update_ctx_flatten_multi(self): self.styler._update_ctx(attrs) expected = {(0, 0): ['color: red', ' foo: bar'], (1, 0): ['color: blue', ' foo: baz']} - self.assertEqual(self.styler.ctx, expected) + assert self.styler.ctx == expected def test_update_ctx_flatten_multi_traliing_semi(self): attrs = DataFrame({"A": ['color: red; foo: bar;', @@ -77,38 +64,38 @@ def test_update_ctx_flatten_multi_traliing_semi(self): self.styler._update_ctx(attrs) expected = {(0, 0): ['color: red', ' foo: bar'], (1, 0): ['color: blue', ' foo: baz']} - self.assertEqual(self.styler.ctx, expected) + assert self.styler.ctx == expected def test_copy(self): s2 = copy.copy(self.styler) - self.assertTrue(self.styler is not s2) - self.assertTrue(self.styler.ctx is s2.ctx) # shallow - self.assertTrue(self.styler._todo is s2._todo) + assert self.styler is not s2 + assert self.styler.ctx is s2.ctx # shallow + assert self.styler._todo is s2._todo self.styler._update_ctx(self.attrs) self.styler.highlight_max() - self.assertEqual(self.styler.ctx, s2.ctx) - self.assertEqual(self.styler._todo, s2._todo) + assert self.styler.ctx == s2.ctx + assert self.styler._todo == s2._todo def test_deepcopy(self): s2 = copy.deepcopy(self.styler) - self.assertTrue(self.styler is not s2) - self.assertTrue(self.styler.ctx is not s2.ctx) - self.assertTrue(self.styler._todo is not s2._todo) + assert self.styler is not s2 + assert self.styler.ctx is not s2.ctx + assert self.styler._todo is not s2._todo self.styler._update_ctx(self.attrs) self.styler.highlight_max() - self.assertNotEqual(self.styler.ctx, s2.ctx) - self.assertEqual(s2._todo, []) - self.assertNotEqual(self.styler._todo, s2._todo) + assert self.styler.ctx != s2.ctx + assert s2._todo == [] + assert self.styler._todo != s2._todo def test_clear(self): s = self.df.style.highlight_max()._compute() - self.assertTrue(len(s.ctx) > 0) - self.assertTrue(len(s._todo) > 0) + assert len(s.ctx) > 0 + assert len(s._todo) > 0 s.clear() - self.assertTrue(len(s.ctx) == 0) - self.assertTrue(len(s._todo) == 0) + assert len(s.ctx) == 0 + assert len(s._todo) == 0 def test_render(self): df = 
pd.DataFrame({"A": [0, 1]}) @@ -117,6 +104,16 @@ def test_render(self): s.render() # it worked? + def test_render_empty_dfs(self): + empty_df = DataFrame() + es = Styler(empty_df) + es.render() + # An index but no columns + DataFrame(columns=['a']).style.render() + # A column but no index + DataFrame(index=['a']).style.render() + # No IndexError raised? + def test_render_double(self): df = pd.DataFrame({"A": [0, 1]}) style = lambda x: pd.Series(["color: red; border: 1px", @@ -132,16 +129,16 @@ def test_set_properties(self): # order is deterministic v = ["color: white", "size: 10px"] expected = {(0, 0): v, (1, 0): v} - self.assertEqual(result.keys(), expected.keys()) + assert result.keys() == expected.keys() for v1, v2 in zip(result.values(), expected.values()): - self.assertEqual(sorted(v1), sorted(v2)) + assert sorted(v1) == sorted(v2) def test_set_properties_subset(self): df = pd.DataFrame({'A': [0, 1]}) result = df.style.set_properties(subset=pd.IndexSlice[0, 'A'], color='white')._compute().ctx expected = {(0, 0): ['color: white']} - self.assertEqual(result, expected) + assert result == expected def test_empty_index_name_doesnt_display(self): # https://github.com/pandas-dev/pandas/pull/12090#issuecomment-180695902 @@ -155,24 +152,21 @@ def test_empty_index_name_doesnt_display(self): 'type': 'th', 'value': 'A', 'is_visible': True, - 'attributes': ["colspan=1"], }, {'class': 'col_heading level0 col1', 'display_value': 'B', 'type': 'th', 'value': 'B', 'is_visible': True, - 'attributes': ["colspan=1"], }, {'class': 'col_heading level0 col2', 'display_value': 'C', 'type': 'th', 'value': 'C', 'is_visible': True, - 'attributes': ["colspan=1"], }]] - self.assertEqual(result['head'], expected) + assert result['head'] == expected def test_index_name(self): # https://github.com/pandas-dev/pandas/issues/11655 @@ -182,17 +176,15 @@ def test_index_name(self): expected = [[{'class': 'blank level0', 'type': 'th', 'value': '', 'display_value': '', 'is_visible': True}, {'class': 'col_heading level0 col0', 'type': 'th', - 'value': 'B', 'display_value': 'B', - 'is_visible': True, 'attributes': ['colspan=1']}, + 'value': 'B', 'display_value': 'B', 'is_visible': True}, {'class': 'col_heading level0 col1', 'type': 'th', - 'value': 'C', 'display_value': 'C', - 'is_visible': True, 'attributes': ['colspan=1']}], + 'value': 'C', 'display_value': 'C', 'is_visible': True}], [{'class': 'index_name level0', 'type': 'th', 'value': 'A'}, {'class': 'blank', 'type': 'th', 'value': ''}, {'class': 'blank', 'type': 'th', 'value': ''}]] - self.assertEqual(result['head'], expected) + assert result['head'] == expected def test_multiindex_name(self): # https://github.com/pandas-dev/pandas/issues/11655 @@ -205,16 +197,14 @@ def test_multiindex_name(self): {'class': 'blank level0', 'type': 'th', 'value': '', 'display_value': '', 'is_visible': True}, {'class': 'col_heading level0 col0', 'type': 'th', - 'value': 'C', 'display_value': 'C', - 'is_visible': True, 'attributes': ['colspan=1'], - }], + 'value': 'C', 'display_value': 'C', 'is_visible': True}], [{'class': 'index_name level0', 'type': 'th', 'value': 'A'}, {'class': 'index_name level1', 'type': 'th', 'value': 'B'}, {'class': 'blank', 'type': 'th', 'value': ''}]] - self.assertEqual(result['head'], expected) + assert result['head'] == expected def test_numeric_columns(self): # https://github.com/pandas-dev/pandas/issues/12125 @@ -226,21 +216,21 @@ def test_apply_axis(self): df = pd.DataFrame({'A': [0, 0], 'B': [1, 1]}) f = lambda x: ['val: %s' % x.max() for v in x] result = 
df.style.apply(f, axis=1) - self.assertEqual(len(result._todo), 1) - self.assertEqual(len(result.ctx), 0) + assert len(result._todo) == 1 + assert len(result.ctx) == 0 result._compute() expected = {(0, 0): ['val: 1'], (0, 1): ['val: 1'], (1, 0): ['val: 1'], (1, 1): ['val: 1']} - self.assertEqual(result.ctx, expected) + assert result.ctx == expected result = df.style.apply(f, axis=0) expected = {(0, 0): ['val: 0'], (0, 1): ['val: 1'], (1, 0): ['val: 0'], (1, 1): ['val: 1']} result._compute() - self.assertEqual(result.ctx, expected) + assert result.ctx == expected result = df.style.apply(f) # default result._compute() - self.assertEqual(result.ctx, expected) + assert result.ctx == expected def test_apply_subset(self): axes = [0, 1] @@ -256,7 +246,7 @@ def test_apply_subset(self): for c, col in enumerate(self.df.columns) if row in self.df.loc[slice_].index and col in self.df.loc[slice_].columns) - self.assertEqual(result, expected) + assert result == expected def test_applymap_subset(self): def f(x): @@ -273,7 +263,7 @@ def f(x): for c, col in enumerate(self.df.columns) if row in self.df.loc[slice_].index and col in self.df.loc[slice_].columns) - self.assertEqual(result, expected) + assert result == expected def test_empty(self): df = pd.DataFrame({'A': [1, 0]}) @@ -284,9 +274,9 @@ def test_empty(self): result = s._translate()['cellstyle'] expected = [{'props': [['color', ' red']], 'selector': 'row0_col0'}, {'props': [['', '']], 'selector': 'row1_col0'}] - self.assertEqual(result, expected) + assert result == expected - def test_bar(self): + def test_bar_align_left(self): df = pd.DataFrame({'A': [0, 1, 2]}) result = df.style.bar()._compute().ctx expected = { @@ -298,7 +288,7 @@ def test_bar(self): 'background: linear-gradient(' '90deg,#d65f5f 100.0%, transparent 0%)'] } - self.assertEqual(result, expected) + assert result == expected result = df.style.bar(color='red', width=50)._compute().ctx expected = { @@ -310,16 +300,16 @@ def test_bar(self): 'background: linear-gradient(' '90deg,red 50.0%, transparent 0%)'] } - self.assertEqual(result, expected) + assert result == expected df['C'] = ['a'] * len(df) result = df.style.bar(color='red', width=50)._compute().ctx - self.assertEqual(result, expected) + assert result == expected df['C'] = df['C'].astype('category') result = df.style.bar(color='red', width=50)._compute().ctx - self.assertEqual(result, expected) + assert result == expected - def test_bar_0points(self): + def test_bar_align_left_0points(self): df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) result = df.style.bar()._compute().ctx expected = {(0, 0): ['width: 10em', ' height: 80%'], @@ -343,7 +333,7 @@ def test_bar_0points(self): (2, 2): ['width: 10em', ' height: 80%', 'background: linear-gradient(90deg,#d65f5f 100.0%' ', transparent 0%)']} - self.assertEqual(result, expected) + assert result == expected result = df.style.bar(axis=1)._compute().ctx expected = {(0, 0): ['width: 10em', ' height: 80%'], @@ -367,73 +357,193 @@ def test_bar_0points(self): (2, 2): ['width: 10em', ' height: 80%', 'background: linear-gradient(90deg,#d65f5f 100.0%' ', transparent 0%)']} - self.assertEqual(result, expected) + assert result == expected + + def test_bar_align_mid_pos_and_neg(self): + df = pd.DataFrame({'A': [-10, 0, 20, 90]}) + + result = df.style.bar(align='mid', color=[ + '#d65f5f', '#5fba7d'])._compute().ctx + + expected = {(0, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 0.0%, #d65f5f 0.0%, ' + '#d65f5f 10.0%, transparent 
10.0%)'], + (1, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 10.0%, ' + '#d65f5f 10.0%, #d65f5f 10.0%, ' + 'transparent 10.0%)'], + (2, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 10.0%, #5fba7d 10.0%' + ', #5fba7d 30.0%, transparent 30.0%)'], + (3, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 10.0%, ' + '#5fba7d 10.0%, #5fba7d 100.0%, ' + 'transparent 100.0%)']} + + assert result == expected + + def test_bar_align_mid_all_pos(self): + df = pd.DataFrame({'A': [10, 20, 50, 100]}) + + result = df.style.bar(align='mid', color=[ + '#d65f5f', '#5fba7d'])._compute().ctx + + expected = {(0, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 0.0%, #5fba7d 0.0%, ' + '#5fba7d 10.0%, transparent 10.0%)'], + (1, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 0.0%, #5fba7d 0.0%, ' + '#5fba7d 20.0%, transparent 20.0%)'], + (2, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 0.0%, #5fba7d 0.0%, ' + '#5fba7d 50.0%, transparent 50.0%)'], + (3, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 0.0%, #5fba7d 0.0%, ' + '#5fba7d 100.0%, transparent 100.0%)']} + + assert result == expected + + def test_bar_align_mid_all_neg(self): + df = pd.DataFrame({'A': [-100, -60, -30, -20]}) + + result = df.style.bar(align='mid', color=[ + '#d65f5f', '#5fba7d'])._compute().ctx + + expected = {(0, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 0.0%, ' + '#d65f5f 0.0%, #d65f5f 100.0%, ' + 'transparent 100.0%)'], + (1, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 40.0%, ' + '#d65f5f 40.0%, #d65f5f 100.0%, ' + 'transparent 100.0%)'], + (2, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 70.0%, ' + '#d65f5f 70.0%, #d65f5f 100.0%, ' + 'transparent 100.0%)'], + (3, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 80.0%, ' + '#d65f5f 80.0%, #d65f5f 100.0%, ' + 'transparent 100.0%)']} + assert result == expected + + def test_bar_align_zero_pos_and_neg(self): + # See https://github.com/pandas-dev/pandas/pull/14757 + df = pd.DataFrame({'A': [-10, 0, 20, 90]}) + + result = df.style.bar(align='zero', color=[ + '#d65f5f', '#5fba7d'], width=90)._compute().ctx + + expected = {(0, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 45.0%, ' + '#d65f5f 45.0%, #d65f5f 50%, ' + 'transparent 50%)'], + (1, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 50%, ' + '#5fba7d 50%, #5fba7d 50.0%, ' + 'transparent 50.0%)'], + (2, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 50%, #5fba7d 50%, ' + '#5fba7d 60.0%, transparent 60.0%)'], + (3, 0): ['width: 10em', ' height: 80%', + 'background: linear-gradient(90deg, ' + 'transparent 0%, transparent 50%, #5fba7d 50%, ' + '#5fba7d 95.0%, transparent 95.0%)']} + assert result == expected + + def test_bar_bad_align_raises(self): + df = pd.DataFrame({'A': [-100, -60, -30, -20]}) + with 
pytest.raises(ValueError): + df.style.bar(align='poorly', color=['#d65f5f', '#5fba7d']) def test_highlight_null(self, null_color='red'): df = pd.DataFrame({'A': [0, np.nan]}) result = df.style.highlight_null()._compute().ctx expected = {(0, 0): [''], (1, 0): ['background-color: red']} - self.assertEqual(result, expected) + assert result == expected def test_nonunique_raises(self): df = pd.DataFrame([[1, 2]], columns=['A', 'A']) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.style - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): Styler(df) def test_caption(self): styler = Styler(self.df, caption='foo') result = styler.render() - self.assertTrue(all(['caption' in result, 'foo' in result])) + assert all(['caption' in result, 'foo' in result]) styler = self.df.style result = styler.set_caption('baz') - self.assertTrue(styler is result) - self.assertEqual(styler.caption, 'baz') + assert styler is result + assert styler.caption == 'baz' def test_uuid(self): styler = Styler(self.df, uuid='abc123') result = styler.render() - self.assertTrue('abc123' in result) + assert 'abc123' in result styler = self.df.style result = styler.set_uuid('aaa') - self.assertTrue(result is styler) - self.assertEqual(result.uuid, 'aaa') + assert result is styler + assert result.uuid == 'aaa' + + def test_unique_id(self): + # See https://github.com/pandas-dev/pandas/issues/16780 + df = pd.DataFrame({'a': [1, 3, 5, 6], 'b': [2, 4, 12, 21]}) + result = df.style.render(uuid='test') + assert 'test' in result + ids = re.findall('id="(.*?)"', result) + assert np.unique(ids).size == len(ids) def test_table_styles(self): style = [{'selector': 'th', 'props': [('foo', 'bar')]}] styler = Styler(self.df, table_styles=style) result = ' '.join(styler.render().split()) - self.assertTrue('th { foo: bar; }' in result) + assert 'th { foo: bar; }' in result styler = self.df.style result = styler.set_table_styles(style) - self.assertTrue(styler is result) - self.assertEqual(styler.table_styles, style) + assert styler is result + assert styler.table_styles == style def test_table_attributes(self): attributes = 'class="foo" data-bar' styler = Styler(self.df, table_attributes=attributes) result = styler.render() - self.assertTrue('class="foo" data-bar' in result) + assert 'class="foo" data-bar' in result result = self.df.style.set_table_attributes(attributes).render() - self.assertTrue('class="foo" data-bar' in result) + assert 'class="foo" data-bar' in result def test_precision(self): with pd.option_context('display.precision', 10): s = Styler(self.df) - self.assertEqual(s.precision, 10) + assert s.precision == 10 s = Styler(self.df, precision=2) - self.assertEqual(s.precision, 2) + assert s.precision == 2 s2 = s.set_precision(4) - self.assertTrue(s is s2) - self.assertEqual(s.precision, 4) + assert s is s2 + assert s.precision == 4 def test_apply_none(self): def f(x): @@ -441,14 +551,14 @@ def f(x): index=x.index, columns=x.columns) result = (pd.DataFrame([[1, 2], [3, 4]]) .style.apply(f, axis=None)._compute().ctx) - self.assertEqual(result[(1, 1)], ['color: red']) + assert result[(1, 1)] == ['color: red'] def test_trim(self): result = self.df.style.render() # trim=True - self.assertEqual(result.count('#'), 0) + assert result.count('#') == 0 result = self.df.style.highlight_max().render() - self.assertEqual(result.count('#'), len(self.df.columns)) + assert result.count('#') == len(self.df.columns) def test_highlight_max(self): df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B']) @@ -460,25 
+570,25 @@ def test_highlight_max(self): df = -df attr = 'highlight_min' result = getattr(df.style, attr)()._compute().ctx - self.assertEqual(result[(1, 1)], ['background-color: yellow']) + assert result[(1, 1)] == ['background-color: yellow'] result = getattr(df.style, attr)(color='green')._compute().ctx - self.assertEqual(result[(1, 1)], ['background-color: green']) + assert result[(1, 1)] == ['background-color: green'] result = getattr(df.style, attr)(subset='A')._compute().ctx - self.assertEqual(result[(1, 0)], ['background-color: yellow']) + assert result[(1, 0)] == ['background-color: yellow'] result = getattr(df.style, attr)(axis=0)._compute().ctx expected = {(1, 0): ['background-color: yellow'], (1, 1): ['background-color: yellow'], (0, 1): [''], (0, 0): ['']} - self.assertEqual(result, expected) + assert result == expected result = getattr(df.style, attr)(axis=1)._compute().ctx expected = {(0, 1): ['background-color: yellow'], (1, 1): ['background-color: yellow'], (0, 0): [''], (1, 0): ['']} - self.assertEqual(result, expected) + assert result == expected # separate since we cant negate the strs df['C'] = ['a', 'b'] @@ -498,25 +608,23 @@ def test_export(self): result = style1.export() style2 = self.df.style style2.use(result) - self.assertEqual(style1._todo, style2._todo) + assert style1._todo == style2._todo style2.render() def test_display_format(self): df = pd.DataFrame(np.random.random(size=(2, 2))) ctx = df.style.format("{:0.1f}")._translate() - self.assertTrue(all(['display_value' in c for c in row] - for row in ctx['body'])) - self.assertTrue(all([len(c['display_value']) <= 3 for c in row[1:]] - for row in ctx['body'])) - self.assertTrue( - len(ctx['body'][0][1]['display_value'].lstrip('-')) <= 3) + assert all(['display_value' in c for c in row] for row in ctx['body']) + assert (all([len(c['display_value']) <= 3 for c in row[1:]] + for row in ctx['body'])) + assert len(ctx['body'][0][1]['display_value'].lstrip('-')) <= 3 def test_display_format_raises(self): df = pd.DataFrame(np.random.randn(2, 2)) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.style.format(5) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.style.format(True) def test_display_subset(self): @@ -525,78 +633,78 @@ def test_display_subset(self): ctx = df.style.format({"a": "{:0.1f}", "b": "{0:.2%}"}, subset=pd.IndexSlice[0, :])._translate() expected = '0.1' - self.assertEqual(ctx['body'][0][1]['display_value'], expected) - self.assertEqual(ctx['body'][1][1]['display_value'], '1.1234') - self.assertEqual(ctx['body'][0][2]['display_value'], '12.34%') + assert ctx['body'][0][1]['display_value'] == expected + assert ctx['body'][1][1]['display_value'] == '1.1234' + assert ctx['body'][0][2]['display_value'] == '12.34%' raw_11 = '1.1234' ctx = df.style.format("{:0.1f}", subset=pd.IndexSlice[0, :])._translate() - self.assertEqual(ctx['body'][0][1]['display_value'], expected) - self.assertEqual(ctx['body'][1][1]['display_value'], raw_11) + assert ctx['body'][0][1]['display_value'] == expected + assert ctx['body'][1][1]['display_value'] == raw_11 ctx = df.style.format("{:0.1f}", subset=pd.IndexSlice[0, :])._translate() - self.assertEqual(ctx['body'][0][1]['display_value'], expected) - self.assertEqual(ctx['body'][1][1]['display_value'], raw_11) + assert ctx['body'][0][1]['display_value'] == expected + assert ctx['body'][1][1]['display_value'] == raw_11 ctx = df.style.format("{:0.1f}", subset=pd.IndexSlice['a'])._translate() - 
self.assertEqual(ctx['body'][0][1]['display_value'], expected) - self.assertEqual(ctx['body'][0][2]['display_value'], '0.1234') + assert ctx['body'][0][1]['display_value'] == expected + assert ctx['body'][0][2]['display_value'] == '0.1234' ctx = df.style.format("{:0.1f}", subset=pd.IndexSlice[0, 'a'])._translate() - self.assertEqual(ctx['body'][0][1]['display_value'], expected) - self.assertEqual(ctx['body'][1][1]['display_value'], raw_11) + assert ctx['body'][0][1]['display_value'] == expected + assert ctx['body'][1][1]['display_value'] == raw_11 ctx = df.style.format("{:0.1f}", subset=pd.IndexSlice[[0, 1], ['a']])._translate() - self.assertEqual(ctx['body'][0][1]['display_value'], expected) - self.assertEqual(ctx['body'][1][1]['display_value'], '1.1') - self.assertEqual(ctx['body'][0][2]['display_value'], '0.1234') - self.assertEqual(ctx['body'][1][2]['display_value'], '1.1234') + assert ctx['body'][0][1]['display_value'] == expected + assert ctx['body'][1][1]['display_value'] == '1.1' + assert ctx['body'][0][2]['display_value'] == '0.1234' + assert ctx['body'][1][2]['display_value'] == '1.1234' def test_display_dict(self): df = pd.DataFrame([[.1234, .1234], [1.1234, 1.1234]], columns=['a', 'b']) ctx = df.style.format({"a": "{:0.1f}", "b": "{0:.2%}"})._translate() - self.assertEqual(ctx['body'][0][1]['display_value'], '0.1') - self.assertEqual(ctx['body'][0][2]['display_value'], '12.34%') + assert ctx['body'][0][1]['display_value'] == '0.1' + assert ctx['body'][0][2]['display_value'] == '12.34%' df['c'] = ['aaa', 'bbb'] ctx = df.style.format({"a": "{:0.1f}", "c": str.upper})._translate() - self.assertEqual(ctx['body'][0][1]['display_value'], '0.1') - self.assertEqual(ctx['body'][0][3]['display_value'], 'AAA') + assert ctx['body'][0][1]['display_value'] == '0.1' + assert ctx['body'][0][3]['display_value'] == 'AAA' def test_bad_apply_shape(self): df = pd.DataFrame([[1, 2], [3, 4]]) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.style._apply(lambda x: 'x', subset=pd.IndexSlice[[0, 1], :]) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.style._apply(lambda x: [''], subset=pd.IndexSlice[[0, 1], :]) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.style._apply(lambda x: ['', '', '', '']) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.style._apply(lambda x: ['', '', ''], subset=1) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.style._apply(lambda x: ['', '', ''], axis=1) def test_apply_bad_return(self): def f(x): return '' df = pd.DataFrame([[1, 2], [3, 4]]) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.style._apply(f, axis=None) def test_apply_bad_labels(self): def f(x): return pd.DataFrame(index=[1, 2], columns=['a', 'b']) df = pd.DataFrame([[1, 2], [3, 4]]) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.style._apply(f, axis=None) def test_get_level_lengths(self): @@ -620,28 +728,29 @@ def test_mi_sparse(self): df = pd.DataFrame({'A': [1, 2]}, index=pd.MultiIndex.from_arrays([['a', 'a'], [0, 1]])) + result = df.style._translate() body_0 = result['body'][0][0] expected_0 = { "value": "a", "display_value": "a", "is_visible": True, "type": "th", "attributes": ["rowspan=2"], - "class": "row_heading level0 row0", + "class": "row_heading level0 row0", "id": "level0_row0" } tm.assert_dict_equal(body_0, expected_0) body_1 = result['body'][0][1] expected_1 = { "value": 0, "display_value": 0, "is_visible": True, - "type": "th", 
"attributes": ["rowspan=1"], - "class": "row_heading level1 row0", + "type": "th", "class": "row_heading level1 row0", + "id": "level1_row0" } tm.assert_dict_equal(body_1, expected_1) body_10 = result['body'][1][0] expected_10 = { "value": 'a', "display_value": 'a', "is_visible": False, - "type": "th", "attributes": ["rowspan=1"], - "class": "row_heading level0 row1", + "type": "th", "class": "row_heading level0 row1", + "id": "level0_row1" } tm.assert_dict_equal(body_10, expected_10) @@ -651,20 +760,19 @@ def test_mi_sparse(self): 'is_visible': True, "display_value": ''}, {'type': 'th', 'class': 'blank level0', 'value': '', 'is_visible': True, 'display_value': ''}, - {'attributes': ['colspan=1'], 'class': 'col_heading level0 col0', - 'is_visible': True, 'type': 'th', 'value': 'A', - 'display_value': 'A'}] - self.assertEqual(head, expected) + {'type': 'th', 'class': 'col_heading level0 col0', 'value': 'A', + 'is_visible': True, 'display_value': 'A'}] + assert head == expected def test_mi_sparse_disabled(self): with pd.option_context('display.multi_sparse', False): df = pd.DataFrame({'A': [1, 2]}, index=pd.MultiIndex.from_arrays([['a', 'a'], - [0, 1]])) + [0, 1]])) result = df.style._translate() body = result['body'] for row in body: - self.assertEqual(row[0]['attributes'], ['rowspan=1']) + assert 'attributes' not in row[0] def test_mi_sparse_index_names(self): df = pd.DataFrame({'A': [1, 2]}, index=pd.MultiIndex.from_arrays( @@ -680,7 +788,7 @@ def test_mi_sparse_index_names(self): 'type': 'th'}, {'class': 'blank', 'value': '', 'type': 'th'}] - self.assertEqual(head, expected) + assert head == expected def test_mi_sparse_column_names(self): df = pd.DataFrame( @@ -700,48 +808,81 @@ def test_mi_sparse_column_names(self): 'type': 'th', 'is_visible': True}, {'class': 'index_name level1', 'value': 'col_1', 'display_value': 'col_1', 'is_visible': True, 'type': 'th'}, - {'attributes': ['colspan=1'], - 'class': 'col_heading level1 col0', + {'class': 'col_heading level1 col0', 'display_value': 1, 'is_visible': True, 'type': 'th', 'value': 1}, - {'attributes': ['colspan=1'], - 'class': 'col_heading level1 col1', + {'class': 'col_heading level1 col1', 'display_value': 0, 'is_visible': True, 'type': 'th', 'value': 0}, - {'attributes': ['colspan=1'], - 'class': 'col_heading level1 col2', + {'class': 'col_heading level1 col2', 'display_value': 1, 'is_visible': True, 'type': 'th', 'value': 1}, - {'attributes': ['colspan=1'], - 'class': 'col_heading level1 col3', + {'class': 'col_heading level1 col3', 'display_value': 0, 'is_visible': True, 'type': 'th', 'value': 0}, ] - self.assertEqual(head, expected) + assert head == expected -@tm.mplskip -class TestStylerMatplotlibDep(TestCase): +class TestStylerMatplotlibDep(object): def test_background_gradient(self): + tm._skip_if_no_mpl() df = pd.DataFrame([[1, 2], [2, 4]], columns=['A', 'B']) - for axis in [0, 1, 'index', 'columns']: - for cmap in [None, 'YlOrRd']: - result = df.style.background_gradient(cmap=cmap)._compute().ctx - self.assertTrue(all("#" in x[0] for x in result.values())) - self.assertEqual(result[(0, 0)], result[(0, 1)]) - self.assertEqual(result[(1, 0)], result[(1, 1)]) - - result = (df.style.background_gradient(subset=pd.IndexSlice[1, 'A']) - ._compute().ctx) - self.assertEqual(result[(1, 0)], ['background-color: #fff7fb']) + + for c_map in [None, 'YlOrRd']: + result = df.style.background_gradient(cmap=c_map)._compute().ctx + assert all("#" in x[0] for x in result.values()) + assert result[(0, 0)] == result[(0, 1)] + assert result[(1, 0)] == 
result[(1, 1)] + + result = df.style.background_gradient( + subset=pd.IndexSlice[1, 'A'])._compute().ctx + assert result[(1, 0)] == ['background-color: #fff7fb'] + + +def test_block_names(): + # catch accidental removal of a block + expected = { + 'before_style', 'style', 'table_styles', 'before_cellstyle', + 'cellstyle', 'before_table', 'table', 'caption', 'thead', 'tbody', + 'after_table', 'before_head_rows', 'head_tr', 'after_head_rows', + 'before_rows', 'tr', 'after_rows', + } + result = set(Styler.template.blocks) + assert result == expected + + +def test_from_custom_template(tmpdir): + p = tmpdir.mkdir("templates").join("myhtml.tpl") + p.write(textwrap.dedent("""\ + {% extends "html.tpl" %} + {% block table %} +
<h1>{{ table_title|default("My Table") }}</h1>
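{# super() renders the parent html.tpl table block beneath the custom heading #}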
+ {{ super() }} + {% endblock table %}""")) + result = Styler.from_custom_template(str(tmpdir.join('templates')), + 'myhtml.tpl') + assert issubclass(result, Styler) + assert result.env is not Styler.env + assert result.template is not Styler.template + styler = result(pd.DataFrame({"A": [1, 2]})) + assert styler.render() + + +def test_shim(): + # https://github.com/pandas-dev/pandas/pull/16059 + # Remove in 0.21 + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + from pandas.formats.style import Styler as _styler # noqa diff --git a/pandas/tests/io/formats/test_to_csv.py b/pandas/tests/io/formats/test_to_csv.py new file mode 100644 index 0000000000000..1073fbcef5aec --- /dev/null +++ b/pandas/tests/io/formats/test_to_csv.py @@ -0,0 +1,208 @@ +from pandas import DataFrame +import numpy as np +import pandas as pd +from pandas.util import testing as tm + + +class TestToCSV(object): + + def test_to_csv_quotechar(self): + df = DataFrame({'col': [1, 2]}) + expected = """\ +"","col" +"0","1" +"1","2" +""" + + with tm.ensure_clean('test.csv') as path: + df.to_csv(path, quoting=1) # 1=QUOTE_ALL + with open(path, 'r') as f: + assert f.read() == expected + + expected = """\ +$$,$col$ +$0$,$1$ +$1$,$2$ +""" + + with tm.ensure_clean('test.csv') as path: + df.to_csv(path, quoting=1, quotechar="$") + with open(path, 'r') as f: + assert f.read() == expected + + with tm.ensure_clean('test.csv') as path: + with tm.assert_raises_regex(TypeError, 'quotechar'): + df.to_csv(path, quoting=1, quotechar=None) + + def test_to_csv_doublequote(self): + df = DataFrame({'col': ['a"a', '"bb"']}) + expected = '''\ +"","col" +"0","a""a" +"1","""bb""" +''' + + with tm.ensure_clean('test.csv') as path: + df.to_csv(path, quoting=1, doublequote=True) # QUOTE_ALL + with open(path, 'r') as f: + assert f.read() == expected + + from _csv import Error + with tm.ensure_clean('test.csv') as path: + with tm.assert_raises_regex(Error, 'escapechar'): + df.to_csv(path, doublequote=False) # no escapechar set + + def test_to_csv_escapechar(self): + df = DataFrame({'col': ['a"a', '"bb"']}) + expected = '''\ +"","col" +"0","a\\"a" +"1","\\"bb\\"" +''' + + with tm.ensure_clean('test.csv') as path: # QUOTE_ALL + df.to_csv(path, quoting=1, doublequote=False, escapechar='\\') + with open(path, 'r') as f: + assert f.read() == expected + + df = DataFrame({'col': ['a,a', ',bb,']}) + expected = """\ +,col +0,a\\,a +1,\\,bb\\, +""" + + with tm.ensure_clean('test.csv') as path: + df.to_csv(path, quoting=3, escapechar='\\') # QUOTE_NONE + with open(path, 'r') as f: + assert f.read() == expected + + def test_csv_to_string(self): + df = DataFrame({'col': [1, 2]}) + expected = ',col\n0,1\n1,2\n' + assert df.to_csv() == expected + + def test_to_csv_decimal(self): + # GH 781 + df = DataFrame({'col1': [1], 'col2': ['a'], 'col3': [10.1]}) + + expected_default = ',col1,col2,col3\n0,1,a,10.1\n' + assert df.to_csv() == expected_default + + expected_european_excel = ';col1;col2;col3\n0;1;a;10,1\n' + assert df.to_csv(decimal=',', sep=';') == expected_european_excel + + expected_float_format_default = ',col1,col2,col3\n0,1,a,10.10\n' + assert df.to_csv(float_format='%.2f') == expected_float_format_default + + expected_float_format = ';col1;col2;col3\n0;1;a;10,10\n' + assert df.to_csv(decimal=',', sep=';', + float_format='%.2f') == expected_float_format + + # GH 11553: testing if decimal is taken into account for '0.0' + df = pd.DataFrame({'a': [0, 1.1], 'b': [2.2, 3.3], 'c': 1}) + expected = 'a,b,c\n0^0,2^2,1\n1^1,3^3,1\n' + assert 
df.to_csv(index=False, decimal='^') == expected + + # same but for an index + assert df.set_index('a').to_csv(decimal='^') == expected + + # same for a multi-index + assert df.set_index(['a', 'b']).to_csv(decimal="^") == expected + + def test_to_csv_float_format(self): + # testing if float_format is taken into account for the index + # GH 11553 + df = pd.DataFrame({'a': [0, 1], 'b': [2.2, 3.3], 'c': 1}) + expected = 'a,b,c\n0,2.20,1\n1,3.30,1\n' + assert df.set_index('a').to_csv(float_format='%.2f') == expected + + # same for a multi-index + assert df.set_index(['a', 'b']).to_csv( + float_format='%.2f') == expected + + def test_to_csv_na_rep(self): + # testing if NaN values are correctly represented in the index + # GH 11553 + df = DataFrame({'a': [0, np.NaN], 'b': [0, 1], 'c': [2, 3]}) + expected = "a,b,c\n0.0,0,2\n_,1,3\n" + assert df.set_index('a').to_csv(na_rep='_') == expected + assert df.set_index(['a', 'b']).to_csv(na_rep='_') == expected + + # now with an index containing only NaNs + df = DataFrame({'a': np.NaN, 'b': [0, 1], 'c': [2, 3]}) + expected = "a,b,c\n_,0,2\n_,1,3\n" + assert df.set_index('a').to_csv(na_rep='_') == expected + assert df.set_index(['a', 'b']).to_csv(na_rep='_') == expected + + # check if na_rep parameter does not break anything when no NaN + df = DataFrame({'a': 0, 'b': [0, 1], 'c': [2, 3]}) + expected = "a,b,c\n0,0,2\n0,1,3\n" + assert df.set_index('a').to_csv(na_rep='_') == expected + assert df.set_index(['a', 'b']).to_csv(na_rep='_') == expected + + def test_to_csv_date_format(self): + # GH 10209 + df_sec = DataFrame({'A': pd.date_range('20130101', periods=5, freq='s') + }) + df_day = DataFrame({'A': pd.date_range('20130101', periods=5, freq='d') + }) + + expected_default_sec = (',A\n0,2013-01-01 00:00:00\n1,' + '2013-01-01 00:00:01\n2,2013-01-01 00:00:02' + '\n3,2013-01-01 00:00:03\n4,' + '2013-01-01 00:00:04\n') + assert df_sec.to_csv() == expected_default_sec + + expected_ymdhms_day = (',A\n0,2013-01-01 00:00:00\n1,' + '2013-01-02 00:00:00\n2,2013-01-03 00:00:00' + '\n3,2013-01-04 00:00:00\n4,' + '2013-01-05 00:00:00\n') + assert (df_day.to_csv(date_format='%Y-%m-%d %H:%M:%S') == + expected_ymdhms_day) + + expected_ymd_sec = (',A\n0,2013-01-01\n1,2013-01-01\n2,' + '2013-01-01\n3,2013-01-01\n4,2013-01-01\n') + assert df_sec.to_csv(date_format='%Y-%m-%d') == expected_ymd_sec + + expected_default_day = (',A\n0,2013-01-01\n1,2013-01-02\n2,' + '2013-01-03\n3,2013-01-04\n4,2013-01-05\n') + assert df_day.to_csv() == expected_default_day + assert df_day.to_csv(date_format='%Y-%m-%d') == expected_default_day + + # testing if date_format parameter is taken into account for + # multi-indexed dataframes (GH 7791) + df_sec['B'] = 0 + df_sec['C'] = 1 + expected_ymd_sec = 'A,B,C\n2013-01-01,0,1\n' + df_sec_grouped = df_sec.groupby([pd.Grouper(key='A', freq='1h'), 'B']) + assert (df_sec_grouped.mean().to_csv(date_format='%Y-%m-%d') == + expected_ymd_sec) + + def test_to_csv_multi_index(self): + # see gh-6618 + df = DataFrame([1], columns=pd.MultiIndex.from_arrays([[1], [2]])) + + exp = ",1\n,2\n0,1\n" + assert df.to_csv() == exp + + exp = "1\n2\n1\n" + assert df.to_csv(index=False) == exp + + df = DataFrame([1], columns=pd.MultiIndex.from_arrays([[1], [2]]), + index=pd.MultiIndex.from_arrays([[1], [2]])) + + exp = ",,1\n,,2\n1,2,1\n" + assert df.to_csv() == exp + + exp = "1\n2\n1\n" + assert df.to_csv(index=False) == exp + + df = DataFrame( + [1], columns=pd.MultiIndex.from_arrays([['foo'], ['bar']])) + + exp = ",foo\n,bar\n0,1\n" + assert df.to_csv() == exp + + exp = 
"foo\nbar\n1\n" + assert df.to_csv(index=False) == exp diff --git a/pandas/tests/io/formats/test_to_excel.py b/pandas/tests/io/formats/test_to_excel.py new file mode 100644 index 0000000000000..26a9bb018f30a --- /dev/null +++ b/pandas/tests/io/formats/test_to_excel.py @@ -0,0 +1,214 @@ +"""Tests formatting as writer-agnostic ExcelCells + +ExcelFormatter is tested implicitly in pandas/tests/io/test_excel.py +""" + +import pytest + +from pandas.io.formats.excel import CSSToExcelConverter + + +@pytest.mark.parametrize('css,expected', [ + # FONT + # - name + ('font-family: foo,bar', {'font': {'name': 'foo'}}), + ('font-family: "foo bar",baz', {'font': {'name': 'foo bar'}}), + ('font-family: foo,\nbar', {'font': {'name': 'foo'}}), + ('font-family: foo, bar, baz', {'font': {'name': 'foo'}}), + ('font-family: bar, foo', {'font': {'name': 'bar'}}), + ('font-family: \'foo bar\', baz', {'font': {'name': 'foo bar'}}), + ('font-family: \'foo \\\'bar\', baz', {'font': {'name': 'foo \'bar'}}), + ('font-family: "foo \\"bar", baz', {'font': {'name': 'foo "bar'}}), + ('font-family: "foo ,bar", baz', {'font': {'name': 'foo ,bar'}}), + # - family + ('font-family: serif', {'font': {'name': 'serif', 'family': 1}}), + ('font-family: Serif', {'font': {'name': 'serif', 'family': 1}}), + ('font-family: roman, serif', {'font': {'name': 'roman', 'family': 1}}), + ('font-family: roman, sans-serif', {'font': {'name': 'roman', + 'family': 2}}), + ('font-family: roman, sans serif', {'font': {'name': 'roman'}}), + ('font-family: roman, sansserif', {'font': {'name': 'roman'}}), + ('font-family: roman, cursive', {'font': {'name': 'roman', 'family': 4}}), + ('font-family: roman, fantasy', {'font': {'name': 'roman', 'family': 5}}), + # - size + ('font-size: 1em', {'font': {'size': 12}}), + ('font-size: xx-small', {'font': {'size': 6}}), + ('font-size: x-small', {'font': {'size': 7.5}}), + ('font-size: small', {'font': {'size': 9.6}}), + ('font-size: medium', {'font': {'size': 12}}), + ('font-size: large', {'font': {'size': 13.5}}), + ('font-size: x-large', {'font': {'size': 18}}), + ('font-size: xx-large', {'font': {'size': 24}}), + ('font-size: 50%', {'font': {'size': 6}}), + # - bold + ('font-weight: 100', {'font': {'bold': False}}), + ('font-weight: 200', {'font': {'bold': False}}), + ('font-weight: 300', {'font': {'bold': False}}), + ('font-weight: 400', {'font': {'bold': False}}), + ('font-weight: normal', {'font': {'bold': False}}), + ('font-weight: lighter', {'font': {'bold': False}}), + ('font-weight: bold', {'font': {'bold': True}}), + ('font-weight: bolder', {'font': {'bold': True}}), + ('font-weight: 700', {'font': {'bold': True}}), + ('font-weight: 800', {'font': {'bold': True}}), + ('font-weight: 900', {'font': {'bold': True}}), + # - italic + ('font-style: italic', {'font': {'italic': True}}), + ('font-style: oblique', {'font': {'italic': True}}), + # - underline + ('text-decoration: underline', + {'font': {'underline': 'single'}}), + ('text-decoration: overline', + {}), + ('text-decoration: none', + {}), + # - strike + ('text-decoration: line-through', + {'font': {'strike': True}}), + ('text-decoration: underline line-through', + {'font': {'strike': True, 'underline': 'single'}}), + ('text-decoration: underline; text-decoration: line-through', + {'font': {'strike': True}}), + # - color + ('color: red', {'font': {'color': 'FF0000'}}), + ('color: #ff0000', {'font': {'color': 'FF0000'}}), + ('color: #f0a', {'font': {'color': 'FF00AA'}}), + # - shadow + ('text-shadow: none', {'font': {'shadow': False}}), + 
('text-shadow: 0px -0em 0px #CCC', {'font': {'shadow': False}}), + ('text-shadow: 0px -0em 0px #999', {'font': {'shadow': False}}), + ('text-shadow: 0px -0em 0px', {'font': {'shadow': False}}), + ('text-shadow: 2px -0em 0px #CCC', {'font': {'shadow': True}}), + ('text-shadow: 0px -2em 0px #CCC', {'font': {'shadow': True}}), + ('text-shadow: 0px -0em 2px #CCC', {'font': {'shadow': True}}), + ('text-shadow: 0px -0em 2px', {'font': {'shadow': True}}), + ('text-shadow: 0px -2em', {'font': {'shadow': True}}), + + # FILL + # - color, fillType + ('background-color: red', {'fill': {'fgColor': 'FF0000', + 'patternType': 'solid'}}), + ('background-color: #ff0000', {'fill': {'fgColor': 'FF0000', + 'patternType': 'solid'}}), + ('background-color: #f0a', {'fill': {'fgColor': 'FF00AA', + 'patternType': 'solid'}}), + # BORDER + # - style + ('border-style: solid', + {'border': {'top': {'style': 'medium'}, + 'bottom': {'style': 'medium'}, + 'left': {'style': 'medium'}, + 'right': {'style': 'medium'}}}), + ('border-style: solid; border-width: thin', + {'border': {'top': {'style': 'thin'}, + 'bottom': {'style': 'thin'}, + 'left': {'style': 'thin'}, + 'right': {'style': 'thin'}}}), + + ('border-top-style: solid; border-top-width: thin', + {'border': {'top': {'style': 'thin'}}}), + ('border-top-style: solid; border-top-width: 1pt', + {'border': {'top': {'style': 'thin'}}}), + ('border-top-style: solid', + {'border': {'top': {'style': 'medium'}}}), + ('border-top-style: solid; border-top-width: medium', + {'border': {'top': {'style': 'medium'}}}), + ('border-top-style: solid; border-top-width: 2pt', + {'border': {'top': {'style': 'medium'}}}), + ('border-top-style: solid; border-top-width: thick', + {'border': {'top': {'style': 'thick'}}}), + ('border-top-style: solid; border-top-width: 4pt', + {'border': {'top': {'style': 'thick'}}}), + + ('border-top-style: dotted', + {'border': {'top': {'style': 'mediumDashDotDot'}}}), + ('border-top-style: dotted; border-top-width: thin', + {'border': {'top': {'style': 'dotted'}}}), + ('border-top-style: dashed', + {'border': {'top': {'style': 'mediumDashed'}}}), + ('border-top-style: dashed; border-top-width: thin', + {'border': {'top': {'style': 'dashed'}}}), + ('border-top-style: double', + {'border': {'top': {'style': 'double'}}}), + # - color + ('border-style: solid; border-color: #0000ff', + {'border': {'top': {'style': 'medium', 'color': '0000FF'}, + 'right': {'style': 'medium', 'color': '0000FF'}, + 'bottom': {'style': 'medium', 'color': '0000FF'}, + 'left': {'style': 'medium', 'color': '0000FF'}}}), + ('border-top-style: double; border-top-color: blue', + {'border': {'top': {'style': 'double', 'color': '0000FF'}}}), + ('border-top-style: solid; border-top-color: #06c', + {'border': {'top': {'style': 'medium', 'color': '0066CC'}}}), + # ALIGNMENT + # - horizontal + ('text-align: center', + {'alignment': {'horizontal': 'center'}}), + ('text-align: left', + {'alignment': {'horizontal': 'left'}}), + ('text-align: right', + {'alignment': {'horizontal': 'right'}}), + ('text-align: justify', + {'alignment': {'horizontal': 'justify'}}), + # - vertical + ('vertical-align: top', + {'alignment': {'vertical': 'top'}}), + ('vertical-align: text-top', + {'alignment': {'vertical': 'top'}}), + ('vertical-align: middle', + {'alignment': {'vertical': 'center'}}), + ('vertical-align: bottom', + {'alignment': {'vertical': 'bottom'}}), + ('vertical-align: text-bottom', + {'alignment': {'vertical': 'bottom'}}), + # - wrap_text + ('white-space: nowrap', + {'alignment': {'wrap_text': 
False}}), + ('white-space: pre', + {'alignment': {'wrap_text': False}}), + ('white-space: pre-line', + {'alignment': {'wrap_text': False}}), + ('white-space: normal', + {'alignment': {'wrap_text': True}}), +]) +def test_css_to_excel(css, expected): + convert = CSSToExcelConverter() + assert expected == convert(css) + + +def test_css_to_excel_multiple(): + convert = CSSToExcelConverter() + actual = convert(''' + font-weight: bold; + text-decoration: underline; + color: red; + border-width: thin; + text-align: center; + vertical-align: top; + unused: something; + ''') + assert {"font": {"bold": True, "underline": "single", "color": "FF0000"}, + "border": {"top": {"style": "thin"}, + "right": {"style": "thin"}, + "bottom": {"style": "thin"}, + "left": {"style": "thin"}}, + "alignment": {"horizontal": "center", + "vertical": "top"}} == actual + + +@pytest.mark.parametrize('css,inherited,expected', [ + ('font-weight: bold', '', + {'font': {'bold': True}}), + ('', 'font-weight: bold', + {'font': {'bold': True}}), + ('font-weight: bold', 'font-style: italic', + {'font': {'bold': True, 'italic': True}}), + ('font-style: normal', 'font-style: italic', + {'font': {'italic': False}}), + ('font-style: inherit', '', {}), + ('font-style: normal; font-style: inherit', 'font-style: italic', + {'font': {'italic': True}}), +]) +def test_css_to_excel_inherited(css, inherited, expected): + convert = CSSToExcelConverter(inherited) + assert expected == convert(css) diff --git a/pandas/tests/io/formats/test_to_html.py b/pandas/tests/io/formats/test_to_html.py new file mode 100644 index 0000000000000..1e174c34221d5 --- /dev/null +++ b/pandas/tests/io/formats/test_to_html.py @@ -0,0 +1,1883 @@ +# -*- coding: utf-8 -*- + +import re +from textwrap import dedent +from datetime import datetime +from distutils.version import LooseVersion + +import pytest +import numpy as np +import pandas as pd +from pandas import compat, DataFrame, MultiIndex, option_context, Index +from pandas.compat import u, lrange, StringIO +from pandas.util import testing as tm +import pandas.io.formats.format as fmt + +div_style = '' +try: + import IPython + if IPython.__version__ < LooseVersion('3.0.0'): + div_style = ' style="max-width:1500px;overflow:auto;"' +except (ImportError, AttributeError): + pass + + +class TestToHTML(object): + + def test_to_html_with_col_space(self): + def check_with_width(df, col_space): + # check that col_space affects HTML generation + # and be very brittle about it. + html = df.to_html(col_space=col_space) + hdrs = [x for x in html.split(r"\n") if re.search(r"\s]", x)] + assert len(hdrs) > 0 + for h in hdrs: + assert "min-width" in h + assert str(col_space) in h + + df = DataFrame(np.random.random(size=(1, 3))) + + check_with_width(df, 30) + check_with_width(df, 50) + + def test_to_html_with_empty_string_label(self): + # GH3547, to_html regards empty string labels as repeated labels + data = {'c1': ['a', 'b'], 'c2': ['a', ''], 'data': [1, 2]} + df = DataFrame(data).set_index(['c1', 'c2']) + res = df.to_html() + assert "rowspan" not in res + + def test_to_html_unicode(self): + df = DataFrame({u('\u03c3'): np.arange(10.)}) + expected = u'\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
\u03c3
00.0
11.0
22.0
33.0
44.0
55.0
66.0
77.0
88.0
99.0
' # noqa + assert df.to_html() == expected + df = DataFrame({'A': [u('\u03c3')]}) + expected = u'\n \n \n \n \n \n \n \n \n \n \n \n \n
A
0\u03c3
' # noqa + assert df.to_html() == expected + + def test_to_html_decimal(self): + # GH 12031 + df = DataFrame({'A': [6.0, 3.1, 2.2]}) + result = df.to_html(decimal=',') + expected = ('\n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + '
A
06,0
13,1
22,2
') + assert result == expected + + def test_to_html_escaped(self): + a = 'str", + b: ""}, + 'co>l2': {a: "", + b: ""}} + rs = DataFrame(test_dict).to_html() + xp = """ + + + + + + + + + + + + + + + + + + + +
co<l1co>l2
str<ing1 &amp;<type 'str'><type 'str'>
stri>ng2 &amp;<type 'str'><type 'str'>
""" + + assert xp == rs + + def test_to_html_escape_disabled(self): + a = 'strbold", + b: "bold"}, + 'co>l2': {a: "bold", + b: "bold"}} + rs = DataFrame(test_dict).to_html(escape=False) + xp = """ + + + + + + + + + + + + + + + + + +
co + co>l2
str + boldbold
stri>ng2 &boldbold
""" + + assert xp == rs + + def test_to_html_multiindex_index_false(self): + # issue 8452 + df = DataFrame({ + 'a': range(2), + 'b': range(3, 5), + 'c': range(5, 7), + 'd': range(3, 5) + }) + df.columns = MultiIndex.from_product([['a', 'b'], ['c', 'd']]) + result = df.to_html(index=False) + expected = """\ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ab
cdcd
0353
1464
""" + + assert result == expected + + df.index = Index(df.index.values, name='idx') + result = df.to_html(index=False) + assert result == expected + + def test_to_html_multiindex_sparsify_false_multi_sparse(self): + with option_context('display.multi_sparse', False): + index = MultiIndex.from_arrays([[0, 0, 1, 1], [0, 1, 0, 1]], + names=['foo', None]) + + df = DataFrame([[0, 1], [2, 3], [4, 5], [6, 7]], index=index) + + result = df.to_html() + expected = """\ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
01
foo
0001
0123
1045
1167
""" + + assert result == expected + + df = DataFrame([[0, 1], [2, 3], [4, 5], [6, 7]], + columns=index[::2], index=index) + + result = df.to_html() + expected = """\ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
foo01
00
foo
0001
0123
1045
1167
""" + + assert result == expected + + def test_to_html_multiindex_sparsify(self): + index = MultiIndex.from_arrays([[0, 0, 1, 1], [0, 1, 0, 1]], + names=['foo', None]) + + df = DataFrame([[0, 1], [2, 3], [4, 5], [6, 7]], index=index) + + result = df.to_html() + expected = """ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
01
foo
0001
123
1045
167
""" + + assert result == expected + + df = DataFrame([[0, 1], [2, 3], [4, 5], [6, 7]], columns=index[::2], + index=index) + + result = df.to_html() + expected = """\ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
foo01
00
foo
0001
123
1045
167
""" + + assert result == expected + + def test_to_html_multiindex_odd_even_truncate(self): + # GH 14882 - Issue on truncation with odd length DataFrame + mi = MultiIndex.from_product([[100, 200, 300], + [10, 20, 30], + [1, 2, 3, 4, 5, 6, 7]], + names=['a', 'b', 'c']) + df = DataFrame({'n': range(len(mi))}, index=mi) + result = df.to_html(max_rows=60) + expected = """\ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
n
abc
1001010
21
32
43
54
65
76
2017
28
39
410
511
612
713
30114
215
316
417
518
619
720
20010121
222
323
424
525
626
727
20128
229
......
633
734
30135
236
337
438
539
640
741
30010142
243
344
445
546
647
748
20149
250
351
452
553
654
755
30156
257
358
459
560
661
762
""" + assert result == expected + + # Test that ... appears in a middle level + result = df.to_html(max_rows=56) + expected = """\ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
n
abc
1001010
21
32
43
54
65
76
2017
28
39
410
511
612
713
30114
215
316
417
518
619
720
20010121
222
323
424
525
626
727
.........
30135
236
337
438
539
640
741
30010142
243
344
445
546
647
748
20149
250
351
452
553
654
755
30156
257
358
459
560
661
762
""" + assert result == expected + + def test_to_html_index_formatter(self): + df = DataFrame([[0, 1], [2, 3], [4, 5], [6, 7]], columns=['foo', None], + index=lrange(4)) + + f = lambda x: 'abcd' [x] + result = df.to_html(formatters={'__index__': f}) + expected = """\ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
fooNone
a01
b23
c45
d67
""" + + assert result == expected + + def test_to_html_datetime64_monthformatter(self): + months = [datetime(2016, 1, 1), datetime(2016, 2, 2)] + x = DataFrame({'months': months}) + + def format_func(x): + return x.strftime('%Y-%m') + result = x.to_html(formatters={'months': format_func}) + expected = """\ + + + + + + + + + + + + + + + + + +
months
02016-01
12016-02
""" + assert result == expected + + def test_to_html_datetime64_hourformatter(self): + + x = DataFrame({'hod': pd.to_datetime(['10:10:10.100', '12:12:12.120'], + format='%H:%M:%S.%f')}) + + def format_func(x): + return x.strftime('%H:%M') + result = x.to_html(formatters={'hod': format_func}) + expected = """\ + + + + + + + + + + + + + + + + + +
hod
010:10
112:12
""" + assert result == expected + + def test_to_html_regression_GH6098(self): + df = DataFrame({ + u('clé1'): [u('a'), u('a'), u('b'), u('b'), u('a')], + u('clé2'): [u('1er'), u('2ème'), u('1er'), u('2ème'), u('1er')], + 'données1': np.random.randn(5), + 'données2': np.random.randn(5)}) + + # it works + df.pivot_table(index=[u('clé1')], columns=[u('clé2')])._repr_html_() + + def test_to_html_truncate(self): + pytest.skip("unreliable on travis") + index = pd.DatetimeIndex(start='20010101', freq='D', periods=20) + df = DataFrame(index=index, columns=range(20)) + fmt.set_option('display.max_rows', 8) + fmt.set_option('display.max_columns', 4) + result = df._repr_html_() + expected = '''\ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
01...1819
2001-01-01NaNNaN...NaNNaN
2001-01-02NaNNaN...NaNNaN
2001-01-03NaNNaN...NaNNaN
2001-01-04NaNNaN...NaNNaN
..................
2001-01-17NaNNaN...NaNNaN
2001-01-18NaNNaN...NaNNaN
2001-01-19NaNNaN...NaNNaN
2001-01-20NaNNaN...NaNNaN
+

20 rows × 20 columns

+'''.format(div_style) + if compat.PY2: + expected = expected.decode('utf-8') + assert result == expected + + def test_to_html_truncate_multi_index(self): + pytest.skip("unreliable on travis") + arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], + ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] + df = DataFrame(index=arrays, columns=arrays) + fmt.set_option('display.max_rows', 7) + fmt.set_option('display.max_columns', 7) + result = df._repr_html_() + expected = '''\ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
barbaz...fooqux
onetwoone...twoonetwo
baroneNaNNaNNaN...NaNNaNNaN
twoNaNNaNNaN...NaNNaNNaN
bazoneNaNNaNNaN...NaNNaNNaN
...........................
footwoNaNNaNNaN...NaNNaNNaN
quxoneNaNNaNNaN...NaNNaNNaN
twoNaNNaNNaN...NaNNaNNaN
+

8 rows × 8 columns

+'''.format(div_style) + if compat.PY2: + expected = expected.decode('utf-8') + assert result == expected + + def test_to_html_truncate_multi_index_sparse_off(self): + pytest.skip("unreliable on travis") + arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], + ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] + df = DataFrame(index=arrays, columns=arrays) + fmt.set_option('display.max_rows', 7) + fmt.set_option('display.max_columns', 7) + fmt.set_option('display.multi_sparse', False) + result = df._repr_html_() + expected = '''\ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
barbarbaz...fooquxqux
onetwoone...twoonetwo
baroneNaNNaNNaN...NaNNaNNaN
bartwoNaNNaNNaN...NaNNaNNaN
bazoneNaNNaNNaN...NaNNaNNaN
footwoNaNNaNNaN...NaNNaNNaN
quxoneNaNNaNNaN...NaNNaNNaN
quxtwoNaNNaNNaN...NaNNaNNaN
+

8 rows × 8 columns

+</div>'''.format(div_style)
+        if compat.PY2:
+            expected = expected.decode('utf-8')
+        assert result == expected
+
+    def test_to_html_border(self):
+        df = DataFrame({'A': [1, 2]})
+        result = df.to_html()
+        assert 'border="1"' in result
+
+    def test_to_html_border_option(self):
+        df = DataFrame({'A': [1, 2]})
+        with pd.option_context('display.html.border', 0):
+            result = df.to_html()
+            assert 'border="0"' in result
+            assert 'border="0"' in df._repr_html_()
+
+    def test_to_html_border_zero(self):
+        df = DataFrame({'A': [1, 2]})
+        result = df.to_html(border=0)
+        assert 'border="0"' in result
+
+    def test_display_option_warning(self):
+        with tm.assert_produces_warning(DeprecationWarning,
+                                        check_stacklevel=False):
+            pd.options.html.border
+
+    def test_to_html(self):
+        # big mixed
+        biggie = DataFrame({'A': np.random.randn(200),
+                            'B': tm.makeStringIndex(200)},
+                           index=lrange(200))
+
+        biggie.loc[:20, 'A'] = np.nan
+        biggie.loc[:20, 'B'] = np.nan
+        s = biggie.to_html()
+
+        buf = StringIO()
+        retval = biggie.to_html(buf=buf)
+        assert retval is None
+        assert buf.getvalue() == s
+
+        assert isinstance(s, compat.string_types)
+
+        biggie.to_html(columns=['B', 'A'], col_space=17)
+        biggie.to_html(columns=['B', 'A'],
+                       formatters={'A': lambda x: '%.1f' % x})
+
+        biggie.to_html(columns=['B', 'A'], float_format=str)
+        biggie.to_html(columns=['B', 'A'], col_space=12, float_format=str)
+
+        frame = DataFrame(index=np.arange(200))
+        frame.to_html()
+
+    def test_to_html_filename(self):
+        biggie = DataFrame({'A': np.random.randn(200),
+                            'B': tm.makeStringIndex(200)},
+                           index=lrange(200))
+
+        biggie.loc[:20, 'A'] = np.nan
+        biggie.loc[:20, 'B'] = np.nan
+        with tm.ensure_clean('test.html') as path:
+            biggie.to_html(path)
+            with open(path, 'r') as f:
+                s = biggie.to_html()
+                s2 = f.read()
+                assert s == s2
+
+        frame = DataFrame(index=np.arange(200))
+        with tm.ensure_clean('test.html') as path:
+            frame.to_html(path)
+            with open(path, 'r') as f:
+                assert frame.to_html() == f.read()
+
+    def test_to_html_with_no_bold(self):
+        x = DataFrame({'x': np.random.randn(5)})
+        ashtml = x.to_html(bold_rows=False)
+        assert '<strong' not in ashtml[ashtml.find("</thead>")]
+
+    def test_to_html_columns_arg(self):
+        frame = DataFrame(tm.getSeriesData())
+        result = frame.to_html(columns=['A'])
+        assert '<th>B</th>' not in result
+
+    def test_to_html_multiindex(self):
+        columns = MultiIndex.from_tuples(list(zip(np.arange(2).repeat(2),
+                                                  np.mod(lrange(4), 2))),
+                                         names=['CL0', 'CL1'])
+        df = DataFrame([list('abcd'), list('efgh')], columns=columns)
+        result = df.to_html(justify='left')
+        expected = ('<table border="1" class="dataframe">\n'
+                    '  <thead>\n'
+                    '    <tr>\n'
+                    '      <th>CL0</th>\n'
+                    '      <th colspan="2" halign="left">0</th>\n'
+                    '      <th colspan="2" halign="left">1</th>\n'
+                    '    </tr>\n'
+                    '    <tr>\n'
+                    '      <th>CL1</th>\n'
+                    '      <th>0</th>\n'
+                    '      <th>1</th>\n'
+                    '      <th>0</th>\n'
+                    '      <th>1</th>\n'
+                    '    </tr>\n'
+                    '  </thead>\n'
+                    '  <tbody>\n'
+                    '    <tr>\n'
+                    '      <th>0</th>\n'
+                    '      <td>a</td>\n'
+                    '      <td>b</td>\n'
+                    '      <td>c</td>\n'
+                    '      <td>d</td>\n'
+                    '    </tr>\n'
+                    '    <tr>\n'
+                    '      <th>1</th>\n'
+                    '      <td>e</td>\n'
+                    '      <td>f</td>\n'
+                    '      <td>g</td>\n'
+                    '      <td>h</td>\n'
+                    '    </tr>\n'
+                    '  </tbody>\n'
+                    '</table>')
+
+        assert result == expected
+
+        columns = MultiIndex.from_tuples(list(zip(
+            range(4), np.mod(
+                lrange(4), 2))))
+        df = DataFrame([list('abcd'), list('efgh')], columns=columns)
+
+        result = df.to_html(justify='right')
+        expected = ('<table border="1" class="dataframe">\n'
+                    '  <thead>\n'
+                    '    <tr>\n'
+                    '      <th></th>\n'
+                    '      <th>0</th>\n'
+                    '      <th>1</th>\n'
+                    '      <th>2</th>\n'
+                    '      <th>3</th>\n'
+                    '    </tr>\n'
+                    '    <tr>\n'
+                    '      <th></th>\n'
+                    '      <th>0</th>\n'
+                    '      <th>1</th>\n'
+                    '      <th>0</th>\n'
+                    '      <th>1</th>\n'
+                    '    </tr>\n'
+                    '  </thead>\n'
+                    '  <tbody>\n'
+                    '    <tr>\n'
+                    '      <th>0</th>\n'
+                    '      <td>a</td>\n'
+                    '      <td>b</td>\n'
+                    '      <td>c</td>\n'
+                    '      <td>d</td>\n'
+                    '    </tr>\n'
+                    '    <tr>\n'
+                    '      <th>1</th>\n'
+                    '      <td>e</td>\n'
+                    '      <td>f</td>\n'
+                    '      <td>g</td>\n'
+                    '      <td>h</td>\n'
+                    '    </tr>\n'
+                    '  </tbody>\n'
+                    '</table>')
+
+        assert result == expected
+
+    def test_to_html_justify(self):
+        df = DataFrame({'A': [6, 30000, 2],
+                        'B': [1, 2, 70000],
+                        'C': [223442, 0, 1]},
+                       columns=['A', 'B', 'C'])
+        result = df.to_html(justify='left')
+        expected = ('<table border="1" class="dataframe">\n'
+                    '  <thead>\n'
+                    '    <tr style="text-align: left;">\n'
+                    '      <th></th>\n'
+                    '      <th>A</th>\n'
+                    '      <th>B</th>\n'
+                    '      <th>C</th>\n'
+                    '    </tr>\n'
+                    '  </thead>\n'
+                    '  <tbody>\n'
+                    '    <tr>\n'
+                    '      <th>0</th>\n'
+                    '      <td>6</td>\n'
+                    '      <td>1</td>\n'
+                    '      <td>223442</td>\n'
+                    '    </tr>\n'
+                    '    <tr>\n'
+                    '      <th>1</th>\n'
+                    '      <td>30000</td>\n'
+                    '      <td>2</td>\n'
+                    '      <td>0</td>\n'
+                    '    </tr>\n'
+                    '    <tr>\n'
+                    '      <th>2</th>\n'
+                    '      <td>2</td>\n'
+                    '      <td>70000</td>\n'
+                    '      <td>1</td>\n'
+                    '    </tr>\n'
+                    '  </tbody>\n'
+                    '</table>')
+        assert result == expected
+
+        result = df.to_html(justify='right')
+        expected = ('<table border="1" class="dataframe">\n'
+                    '  <thead>\n'
+                    '    <tr style="text-align: right;">\n'
+                    '      <th></th>\n'
+                    '      <th>A</th>\n'
+                    '      <th>B</th>\n'
+                    '      <th>C</th>\n'
+                    '    </tr>\n'
+                    '  </thead>\n'
+                    '  <tbody>\n'
+                    '    <tr>\n'
+                    '      <th>0</th>\n'
+                    '      <td>6</td>\n'
+                    '      <td>1</td>\n'
+                    '      <td>223442</td>\n'
+                    '    </tr>\n'
+                    '    <tr>\n'
+                    '      <th>1</th>\n'
+                    '      <td>30000</td>\n'
+                    '      <td>2</td>\n'
+                    '      <td>0</td>\n'
+                    '    </tr>\n'
+                    '    <tr>\n'
+                    '      <th>2</th>\n'
+                    '      <td>2</td>\n'
+                    '      <td>70000</td>\n'
+                    '      <td>1</td>\n'
+                    '    </tr>\n'
+                    '  </tbody>\n'
+                    '</table>')
+        assert result == expected
+
+    def test_to_html_index(self):
+        index = ['foo', 'bar', 'baz']
+        df = DataFrame({'A': [1, 2, 3],
+                        'B': [1.2, 3.4, 5.6],
+                        'C': ['one', 'two', np.nan]},
+                       columns=['A', 'B', 'C'],
+                       index=index)
+        expected_with_index = ('<table border="1" class="dataframe">\n'
+                               '  <thead>\n'
+                               '    <tr style="text-align: right;">\n'
+                               '      <th></th>\n'
+                               '      <th>A</th>\n'
+                               '      <th>B</th>\n'
+                               '      <th>C</th>\n'
+                               '    </tr>\n'
+                               '  </thead>\n'
+                               '  <tbody>\n'
+                               '    <tr>\n'
+                               '      <th>foo</th>\n'
+                               '      <td>1</td>\n'
+                               '      <td>1.2</td>\n'
+                               '      <td>one</td>\n'
+                               '    </tr>\n'
+                               '    <tr>\n'
+                               '      <th>bar</th>\n'
+                               '      <td>2</td>\n'
+                               '      <td>3.4</td>\n'
+                               '      <td>two</td>\n'
+                               '    </tr>\n'
+                               '    <tr>\n'
+                               '      <th>baz</th>\n'
+                               '      <td>3</td>\n'
+                               '      <td>5.6</td>\n'
+                               '      <td>NaN</td>\n'
+                               '    </tr>\n'
+                               '  </tbody>\n'
+                               '</table>')
+        assert df.to_html() == expected_with_index
+
+        expected_without_index = ('<table border="1" class="dataframe">\n'
+                                  '  <thead>\n'
+                                  '    <tr style="text-align: right;">\n'
+                                  '      <th>A</th>\n'
+                                  '      <th>B</th>\n'
+                                  '      <th>C</th>\n'
+                                  '    </tr>\n'
+                                  '  </thead>\n'
+                                  '  <tbody>\n'
+                                  '    <tr>\n'
+                                  '      <td>1</td>\n'
+                                  '      <td>1.2</td>\n'
+                                  '      <td>one</td>\n'
+                                  '    </tr>\n'
+                                  '    <tr>\n'
+                                  '      <td>2</td>\n'
+                                  '      <td>3.4</td>\n'
+                                  '      <td>two</td>\n'
+                                  '    </tr>\n'
+                                  '    <tr>\n'
+                                  '      <td>3</td>\n'
+                                  '      <td>5.6</td>\n'
+                                  '      <td>NaN</td>\n'
+                                  '    </tr>\n'
+                                  '  </tbody>\n'
+                                  '</table>')
+        result = df.to_html(index=False)
+        for i in index:
+            assert i not in result
+        assert result == expected_without_index
+        df.index = Index(['foo', 'bar', 'baz'], name='idx')
+        expected_with_index = ('<table border="1" class="dataframe">\n'
+                               '  <thead>\n'
+                               '    <tr style="text-align: right;">\n'
+                               '      <th></th>\n'
+                               '      <th>A</th>\n'
+                               '      <th>B</th>\n'
+                               '      <th>C</th>\n'
+                               '    </tr>\n'
+                               '    <tr>\n'
+                               '      <th>idx</th>\n'
+                               '      <th></th>\n'
+                               '      <th></th>\n'
+                               '      <th></th>\n'
+                               '    </tr>\n'
+                               '  </thead>\n'
+                               '  <tbody>\n'
+                               '    <tr>\n'
+                               '      <th>foo</th>\n'
+                               '      <td>1</td>\n'
+                               '      <td>1.2</td>\n'
+                               '      <td>one</td>\n'
+                               '    </tr>\n'
+                               '    <tr>\n'
+                               '      <th>bar</th>\n'
+                               '      <td>2</td>\n'
+                               '      <td>3.4</td>\n'
+                               '      <td>two</td>\n'
+                               '    </tr>\n'
+                               '    <tr>\n'
+                               '      <th>baz</th>\n'
+                               '      <td>3</td>\n'
+                               '      <td>5.6</td>\n'
+                               '      <td>NaN</td>\n'
+                               '    </tr>\n'
+                               '  </tbody>\n'
+                               '</table>')
+        assert df.to_html() == expected_with_index
+        assert df.to_html(index=False) == expected_without_index
+
+        tuples = [('foo', 'car'), ('foo', 'bike'), ('bar', 'car')]
+        df.index = MultiIndex.from_tuples(tuples)
+
+        expected_with_index = ('<table border="1" class="dataframe">\n'
+                               '  <thead>\n'
+                               '    <tr style="text-align: right;">\n'
+                               '      <th></th>\n'
+                               '      <th></th>\n'
+                               '      <th>A</th>\n'
+                               '      <th>B</th>\n'
+                               '      <th>C</th>\n'
+                               '    </tr>\n'
+                               '  </thead>\n'
+                               '  <tbody>\n'
+                               '    <tr>\n'
+                               '      <th rowspan="2" valign="top">foo</th>\n'
+                               '      <th>car</th>\n'
+                               '      <td>1</td>\n'
+                               '      <td>1.2</td>\n'
+                               '      <td>one</td>\n'
+                               '    </tr>\n'
+                               '    <tr>\n'
+                               '      <th>bike</th>\n'
+                               '      <td>2</td>\n'
+                               '      <td>3.4</td>\n'
+                               '      <td>two</td>\n'
+                               '    </tr>\n'
+                               '    <tr>\n'
+                               '      <th>bar</th>\n'
+                               '      <th>car</th>\n'
+                               '      <td>3</td>\n'
+                               '      <td>5.6</td>\n'
+                               '      <td>NaN</td>\n'
+                               '    </tr>\n'
+                               '  </tbody>\n'
+                               '</table>')
+        assert df.to_html() == expected_with_index
+
+        result = df.to_html(index=False)
+        for i in ['foo', 'bar', 'car', 'bike']:
+            assert i not in result
+        # must be the same result as normal index
+        assert result == expected_without_index
+
+        df.index = MultiIndex.from_tuples(tuples, names=['idx1', 'idx2'])
+        expected_with_index = ('<table border="1" class="dataframe">\n'
+                               '  <thead>\n'
+                               '    <tr style="text-align: right;">\n'
+                               '      <th></th>\n'
+                               '      <th></th>\n'
+                               '      <th>A</th>\n'
+                               '      <th>B</th>\n'
+                               '      <th>C</th>\n'
+                               '    </tr>\n'
+                               '    <tr>\n'
+                               '      <th>idx1</th>\n'
+                               '      <th>idx2</th>\n'
+                               '      <th></th>\n'
+                               '      <th></th>\n'
+                               '      <th></th>\n'
+                               '    </tr>\n'
+                               '  </thead>\n'
+                               '  <tbody>\n'
+                               '    <tr>\n'
+                               '      <th rowspan="2" valign="top">foo</th>\n'
+                               '      <th>car</th>\n'
+                               '      <td>1</td>\n'
+                               '      <td>1.2</td>\n'
+                               '      <td>one</td>\n'
+                               '    </tr>\n'
+                               '    <tr>\n'
+                               '      <th>bike</th>\n'
+                               '      <td>2</td>\n'
+                               '      <td>3.4</td>\n'
+                               '      <td>two</td>\n'
+                               '    </tr>\n'
+                               '    <tr>\n'
+                               '      <th>bar</th>\n'
+                               '      <th>car</th>\n'
+                               '      <td>3</td>\n'
+                               '      <td>5.6</td>\n'
+                               '      <td>NaN</td>\n'
+                               '    </tr>\n'
+                               '  </tbody>\n'
+                               '</table>')
+        assert df.to_html() == expected_with_index
+        assert df.to_html(index=False) == expected_without_index
+
+    def test_to_html_with_classes(self):
+        df = DataFrame()
+        result = df.to_html(classes="sortable draggable")
+        expected = dedent("""
+
+            <table border="1" class="dataframe sortable draggable">
+              <thead>
+                <tr style="text-align: right;">
+                </tr>
+              </thead>
+              <tbody>
+              </tbody>
+            </table>
+
+        """).strip()
+        assert result == expected
+
+        result = df.to_html(classes=["sortable", "draggable"])
+        assert result == expected
+
+    def test_to_html_no_index_max_rows(self):
+        # GH https://github.com/pandas-dev/pandas/issues/14998
+        df = DataFrame({"A": [1, 2, 3, 4]})
+        result = df.to_html(index=False, max_rows=1)
+        expected = dedent("""\
+        <table border="1" class="dataframe">
+          <thead>
+            <tr style="text-align: right;">
+              <th>A</th>
+            </tr>
+          </thead>
+          <tbody>
+            <tr>
+              <td>1</td>
+            </tr>
+          </tbody>
+        </table>
+        """)
+        assert result == expected
+
+    def test_to_html_notebook_has_style(self):
+        df = pd.DataFrame({"A": [1, 2, 3]})
+        result = df.to_html(notebook=True)
+        assert "thead tr:only-child" in result
+
+    def test_to_html_notebook_has_no_style(self):
+        df = pd.DataFrame({"A": [1, 2, 3]})
+        result = df.to_html()
+        assert "thead tr:only-child" not in result
+
+    def test_to_html_with_index_names_false(self):
+        # gh-16493
+        df = pd.DataFrame({"A": [1, 2]}, index=pd.Index(['a', 'b'],
+                                                        name='myindexname'))
+        result = df.to_html(index_names=False)
+        assert 'myindexname' not in result
diff --git a/pandas/tests/io/formats/test_to_latex.py b/pandas/tests/io/formats/test_to_latex.py
new file mode 100644
index 0000000000000..aa86d1d9231fb
--- /dev/null
+++ b/pandas/tests/io/formats/test_to_latex.py
@@ -0,0 +1,538 @@
+from datetime import datetime
+
+import pytest
+
+import pandas as pd
+from pandas import DataFrame, compat, Series
+from pandas.util import testing as tm
+from pandas.compat import u
+import codecs
+
+
+@pytest.fixture
+def frame():
+    return DataFrame(tm.getSeriesData())
+
+
+class TestToLatex(object):
+
+    def test_to_latex_filename(self, frame):
+        with tm.ensure_clean('test.tex') as path:
+            frame.to_latex(path)
+
+            with open(path, 'r') as f:
+                assert frame.to_latex() == f.read()
+
+        # test with utf-8 and encoding option (GH 7061)
+        df = DataFrame([[u'au\xdfgangen']])
+        with tm.ensure_clean('test.tex') as path:
+            df.to_latex(path, encoding='utf-8')
+            with codecs.open(path, 'r', encoding='utf-8') as f:
+                assert df.to_latex() == f.read()
+
+        # test with utf-8 without encoding option
+        if compat.PY3:  # python3: pandas default encoding is utf-8
+            with tm.ensure_clean('test.tex') as path:
+                df.to_latex(path)
+                with codecs.open(path, 'r', encoding='utf-8') as f:
+                    assert df.to_latex() == f.read()
+        else:
+            # python2 default encoding is ascii, so an error should be raised
+            with tm.ensure_clean('test.tex') as path:
+                with pytest.raises(UnicodeEncodeError):
+                    df.to_latex(path)
+
+    def test_to_latex(self, frame):
+        # it works!
+ frame.to_latex() + + df = DataFrame({'a': [1, 2], 'b': ['b1', 'b2']}) + withindex_result = df.to_latex() + withindex_expected = r"""\begin{tabular}{lrl} +\toprule +{} & a & b \\ +\midrule +0 & 1 & b1 \\ +1 & 2 & b2 \\ +\bottomrule +\end{tabular} +""" + + assert withindex_result == withindex_expected + + withoutindex_result = df.to_latex(index=False) + withoutindex_expected = r"""\begin{tabular}{rl} +\toprule + a & b \\ +\midrule + 1 & b1 \\ + 2 & b2 \\ +\bottomrule +\end{tabular} +""" + + assert withoutindex_result == withoutindex_expected + + def test_to_latex_format(self, frame): + # GH Bug #9402 + frame.to_latex(column_format='ccc') + + df = DataFrame({'a': [1, 2], 'b': ['b1', 'b2']}) + withindex_result = df.to_latex(column_format='ccc') + withindex_expected = r"""\begin{tabular}{ccc} +\toprule +{} & a & b \\ +\midrule +0 & 1 & b1 \\ +1 & 2 & b2 \\ +\bottomrule +\end{tabular} +""" + + assert withindex_result == withindex_expected + + def test_to_latex_with_formatters(self): + df = DataFrame({'int': [1, 2, 3], + 'float': [1.0, 2.0, 3.0], + 'object': [(1, 2), True, False], + 'datetime64': [datetime(2016, 1, 1), + datetime(2016, 2, 5), + datetime(2016, 3, 3)]}) + + formatters = {'int': lambda x: '0x%x' % x, + 'float': lambda x: '[% 4.1f]' % x, + 'object': lambda x: '-%s-' % str(x), + 'datetime64': lambda x: x.strftime('%Y-%m'), + '__index__': lambda x: 'index: %s' % x} + result = df.to_latex(formatters=dict(formatters)) + + expected = r"""\begin{tabular}{llrrl} +\toprule +{} & datetime64 & float & int & object \\ +\midrule +index: 0 & 2016-01 & [ 1.0] & 0x1 & -(1, 2)- \\ +index: 1 & 2016-02 & [ 2.0] & 0x2 & -True- \\ +index: 2 & 2016-03 & [ 3.0] & 0x3 & -False- \\ +\bottomrule +\end{tabular} +""" + assert result == expected + + def test_to_latex_multiindex(self): + df = DataFrame({('x', 'y'): ['a']}) + result = df.to_latex() + expected = r"""\begin{tabular}{ll} +\toprule +{} & x \\ +{} & y \\ +\midrule +0 & a \\ +\bottomrule +\end{tabular} +""" + + assert result == expected + + result = df.T.to_latex() + expected = r"""\begin{tabular}{lll} +\toprule + & & 0 \\ +\midrule +x & y & a \\ +\bottomrule +\end{tabular} +""" + + assert result == expected + + df = DataFrame.from_dict({ + ('c1', 0): pd.Series(dict((x, x) for x in range(4))), + ('c1', 1): pd.Series(dict((x, x + 4) for x in range(4))), + ('c2', 0): pd.Series(dict((x, x) for x in range(4))), + ('c2', 1): pd.Series(dict((x, x + 4) for x in range(4))), + ('c3', 0): pd.Series(dict((x, x) for x in range(4))), + }).T + result = df.to_latex() + expected = r"""\begin{tabular}{llrrrr} +\toprule + & & 0 & 1 & 2 & 3 \\ +\midrule +c1 & 0 & 0 & 1 & 2 & 3 \\ + & 1 & 4 & 5 & 6 & 7 \\ +c2 & 0 & 0 & 1 & 2 & 3 \\ + & 1 & 4 & 5 & 6 & 7 \\ +c3 & 0 & 0 & 1 & 2 & 3 \\ +\bottomrule +\end{tabular} +""" + + assert result == expected + + # GH 14184 + df = df.T + df.columns.names = ['a', 'b'] + result = df.to_latex() + expected = r"""\begin{tabular}{lrrrrr} +\toprule +a & \multicolumn{2}{l}{c1} & \multicolumn{2}{l}{c2} & c3 \\ +b & 0 & 1 & 0 & 1 & 0 \\ +\midrule +0 & 0 & 4 & 0 & 4 & 0 \\ +1 & 1 & 5 & 1 & 5 & 1 \\ +2 & 2 & 6 & 2 & 6 & 2 \\ +3 & 3 & 7 & 3 & 7 & 3 \\ +\bottomrule +\end{tabular} +""" + assert result == expected + + # GH 10660 + df = pd.DataFrame({'a': [0, 0, 1, 1], + 'b': list('abab'), + 'c': [1, 2, 3, 4]}) + result = df.set_index(['a', 'b']).to_latex() + expected = r"""\begin{tabular}{llr} +\toprule + & & c \\ +a & b & \\ +\midrule +0 & a & 1 \\ + & b & 2 \\ +1 & a & 3 \\ + & b & 4 \\ +\bottomrule +\end{tabular} +""" + + assert result == expected 
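+        # groupby().describe() produces MultiIndex columns, so the
+        # rendering below is expected to collapse the top level into
+        # \multicolumn spans (see the expected string that follows)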
+ + result = df.groupby('a').describe().to_latex() + expected = r"""\begin{tabular}{lrrrrrrrr} +\toprule +{} & \multicolumn{8}{l}{c} \\ +{} & count & mean & std & min & 25\% & 50\% & 75\% & max \\ +a & & & & & & & & \\ +\midrule +0 & 2.0 & 1.5 & 0.707107 & 1.0 & 1.25 & 1.5 & 1.75 & 2.0 \\ +1 & 2.0 & 3.5 & 0.707107 & 3.0 & 3.25 & 3.5 & 3.75 & 4.0 \\ +\bottomrule +\end{tabular} +""" + + assert result == expected + + def test_to_latex_multicolumnrow(self): + df = pd.DataFrame({ + ('c1', 0): dict((x, x) for x in range(5)), + ('c1', 1): dict((x, x + 5) for x in range(5)), + ('c2', 0): dict((x, x) for x in range(5)), + ('c2', 1): dict((x, x + 5) for x in range(5)), + ('c3', 0): dict((x, x) for x in range(5)) + }) + result = df.to_latex() + expected = r"""\begin{tabular}{lrrrrr} +\toprule +{} & \multicolumn{2}{l}{c1} & \multicolumn{2}{l}{c2} & c3 \\ +{} & 0 & 1 & 0 & 1 & 0 \\ +\midrule +0 & 0 & 5 & 0 & 5 & 0 \\ +1 & 1 & 6 & 1 & 6 & 1 \\ +2 & 2 & 7 & 2 & 7 & 2 \\ +3 & 3 & 8 & 3 & 8 & 3 \\ +4 & 4 & 9 & 4 & 9 & 4 \\ +\bottomrule +\end{tabular} +""" + assert result == expected + + result = df.to_latex(multicolumn=False) + expected = r"""\begin{tabular}{lrrrrr} +\toprule +{} & c1 & & c2 & & c3 \\ +{} & 0 & 1 & 0 & 1 & 0 \\ +\midrule +0 & 0 & 5 & 0 & 5 & 0 \\ +1 & 1 & 6 & 1 & 6 & 1 \\ +2 & 2 & 7 & 2 & 7 & 2 \\ +3 & 3 & 8 & 3 & 8 & 3 \\ +4 & 4 & 9 & 4 & 9 & 4 \\ +\bottomrule +\end{tabular} +""" + assert result == expected + + result = df.T.to_latex(multirow=True) + expected = r"""\begin{tabular}{llrrrrr} +\toprule + & & 0 & 1 & 2 & 3 & 4 \\ +\midrule +\multirow{2}{*}{c1} & 0 & 0 & 1 & 2 & 3 & 4 \\ + & 1 & 5 & 6 & 7 & 8 & 9 \\ +\cline{1-7} +\multirow{2}{*}{c2} & 0 & 0 & 1 & 2 & 3 & 4 \\ + & 1 & 5 & 6 & 7 & 8 & 9 \\ +\cline{1-7} +c3 & 0 & 0 & 1 & 2 & 3 & 4 \\ +\bottomrule +\end{tabular} +""" + assert result == expected + + df.index = df.T.index + result = df.T.to_latex(multirow=True, multicolumn=True, + multicolumn_format='c') + expected = r"""\begin{tabular}{llrrrrr} +\toprule + & & \multicolumn{2}{c}{c1} & \multicolumn{2}{c}{c2} & c3 \\ + & & 0 & 1 & 0 & 1 & 0 \\ +\midrule +\multirow{2}{*}{c1} & 0 & 0 & 1 & 2 & 3 & 4 \\ + & 1 & 5 & 6 & 7 & 8 & 9 \\ +\cline{1-7} +\multirow{2}{*}{c2} & 0 & 0 & 1 & 2 & 3 & 4 \\ + & 1 & 5 & 6 & 7 & 8 & 9 \\ +\cline{1-7} +c3 & 0 & 0 & 1 & 2 & 3 & 4 \\ +\bottomrule +\end{tabular} +""" + assert result == expected + + def test_to_latex_escape(self): + a = 'a' + b = 'b' + + test_dict = {u('co^l1'): {a: "a", + b: "b"}, + u('co$e^x$'): {a: "a", + b: "b"}} + + unescaped_result = DataFrame(test_dict).to_latex(escape=False) + escaped_result = DataFrame(test_dict).to_latex( + ) # default: escape=True + + unescaped_expected = r'''\begin{tabular}{lll} +\toprule +{} & co$e^x$ & co^l1 \\ +\midrule +a & a & a \\ +b & b & b \\ +\bottomrule +\end{tabular} +''' + + escaped_expected = r'''\begin{tabular}{lll} +\toprule +{} & co\$e\textasciicircumx\$ & co\textasciicircuml1 \\ +\midrule +a & a & a \\ +b & b & b \\ +\bottomrule +\end{tabular} +''' + + assert unescaped_result == unescaped_expected + assert escaped_result == escaped_expected + + def test_to_latex_longtable(self, frame): + frame.to_latex(longtable=True) + + df = DataFrame({'a': [1, 2], 'b': ['b1', 'b2']}) + withindex_result = df.to_latex(longtable=True) + withindex_expected = r"""\begin{longtable}{lrl} +\toprule +{} & a & b \\ +\midrule +\endhead +\midrule +\multicolumn{3}{r}{{Continued on next page}} \\ +\midrule +\endfoot + +\bottomrule +\endlastfoot +0 & 1 & b1 \\ +1 & 2 & b2 \\ +\end{longtable} +""" + + assert withindex_result 
== withindex_expected + + withoutindex_result = df.to_latex(index=False, longtable=True) + withoutindex_expected = r"""\begin{longtable}{rl} +\toprule + a & b \\ +\midrule +\endhead +\midrule +\multicolumn{3}{r}{{Continued on next page}} \\ +\midrule +\endfoot + +\bottomrule +\endlastfoot + 1 & b1 \\ + 2 & b2 \\ +\end{longtable} +""" + + assert withoutindex_result == withoutindex_expected + + def test_to_latex_escape_special_chars(self): + special_characters = ['&', '%', '$', '#', '_', '{', '}', '~', '^', + '\\'] + df = DataFrame(data=special_characters) + observed = df.to_latex() + expected = r"""\begin{tabular}{ll} +\toprule +{} & 0 \\ +\midrule +0 & \& \\ +1 & \% \\ +2 & \$ \\ +3 & \# \\ +4 & \_ \\ +5 & \{ \\ +6 & \} \\ +7 & \textasciitilde \\ +8 & \textasciicircum \\ +9 & \textbackslash \\ +\bottomrule +\end{tabular} +""" + + assert observed == expected + + def test_to_latex_no_header(self): + # GH 7124 + df = DataFrame({'a': [1, 2], 'b': ['b1', 'b2']}) + withindex_result = df.to_latex(header=False) + withindex_expected = r"""\begin{tabular}{lrl} +\toprule +0 & 1 & b1 \\ +1 & 2 & b2 \\ +\bottomrule +\end{tabular} +""" + + assert withindex_result == withindex_expected + + withoutindex_result = df.to_latex(index=False, header=False) + withoutindex_expected = r"""\begin{tabular}{rl} +\toprule + 1 & b1 \\ + 2 & b2 \\ +\bottomrule +\end{tabular} +""" + + assert withoutindex_result == withoutindex_expected + + def test_to_latex_specified_header(self): + # GH 7124 + df = DataFrame({'a': [1, 2], 'b': ['b1', 'b2']}) + withindex_result = df.to_latex(header=['AA', 'BB']) + withindex_expected = r"""\begin{tabular}{lrl} +\toprule +{} & AA & BB \\ +\midrule +0 & 1 & b1 \\ +1 & 2 & b2 \\ +\bottomrule +\end{tabular} +""" + + assert withindex_result == withindex_expected + + withoutindex_result = df.to_latex(header=['AA', 'BB'], index=False) + withoutindex_expected = r"""\begin{tabular}{rl} +\toprule +AA & BB \\ +\midrule + 1 & b1 \\ + 2 & b2 \\ +\bottomrule +\end{tabular} +""" + + assert withoutindex_result == withoutindex_expected + + withoutescape_result = df.to_latex(header=['$A$', '$B$'], escape=False) + withoutescape_expected = r"""\begin{tabular}{lrl} +\toprule +{} & $A$ & $B$ \\ +\midrule +0 & 1 & b1 \\ +1 & 2 & b2 \\ +\bottomrule +\end{tabular} +""" + + assert withoutescape_result == withoutescape_expected + + with pytest.raises(ValueError): + df.to_latex(header=['A']) + + def test_to_latex_decimal(self, frame): + # GH 12031 + frame.to_latex() + + df = DataFrame({'a': [1.0, 2.1], 'b': ['b1', 'b2']}) + withindex_result = df.to_latex(decimal=',') + + withindex_expected = r"""\begin{tabular}{lrl} +\toprule +{} & a & b \\ +\midrule +0 & 1,0 & b1 \\ +1 & 2,1 & b2 \\ +\bottomrule +\end{tabular} +""" + + assert withindex_result == withindex_expected + + def test_to_latex_series(self): + s = Series(['a', 'b', 'c']) + withindex_result = s.to_latex() + withindex_expected = r"""\begin{tabular}{ll} +\toprule +{} & 0 \\ +\midrule +0 & a \\ +1 & b \\ +2 & c \\ +\bottomrule +\end{tabular} +""" + assert withindex_result == withindex_expected + + def test_to_latex_bold_rows(self): + # GH 16707 + df = pd.DataFrame({'a': [1, 2], 'b': ['b1', 'b2']}) + observed = df.to_latex(bold_rows=True) + expected = r"""\begin{tabular}{lrl} +\toprule +{} & a & b \\ +\midrule +\textbf{0} & 1 & b1 \\ +\textbf{1} & 2 & b2 \\ +\bottomrule +\end{tabular} +""" + assert observed == expected + + def test_to_latex_no_bold_rows(self): + # GH 16707 + df = pd.DataFrame({'a': [1, 2], 'b': ['b1', 'b2']}) + observed = 
df.to_latex(bold_rows=False)
+        expected = r"""\begin{tabular}{lrl}
+\toprule
+{} & a & b \\
+\midrule
+0 & 1 & b1 \\
+1 & 2 & b2 \\
+\bottomrule
+\end{tabular}
+"""
+        assert observed == expected
diff --git a/pandas/io/tests/generate_legacy_storage_files.py b/pandas/tests/io/generate_legacy_storage_files.py
old mode 100644
new mode 100755
similarity index 91%
rename from pandas/io/tests/generate_legacy_storage_files.py
rename to pandas/tests/io/generate_legacy_storage_files.py
index d0365cb2c30b3..996965999724e
--- a/pandas/io/tests/generate_legacy_storage_files.py
+++ b/pandas/tests/io/generate_legacy_storage_files.py
@@ -1,5 +1,8 @@
+#!/usr/bin/env python
+
 """ self-contained to write legacy storage (pickle/msgpack) files """
 from __future__ import print_function
+from warnings import catch_warnings
 from distutils.version import LooseVersion
 from pandas import (Series, DataFrame, Panel, SparseSeries, SparseDataFrame,
@@ -124,17 +127,23 @@ def create_data():
         mixed_dup=mixed_dup_df,
         dt_mixed_tzs=DataFrame({
             u'A': Timestamp('20130102', tz='US/Eastern'),
-            u'B': Timestamp('20130603', tz='CET')}, index=range(5))
+            u'B': Timestamp('20130603', tz='CET')}, index=range(5)),
+        dt_mixed2_tzs=DataFrame({
+            u'A': Timestamp('20130102', tz='US/Eastern'),
+            u'B': Timestamp('20130603', tz='CET'),
+            u'C': Timestamp('20130603', tz='UTC')}, index=range(5))
     )

-    mixed_dup_panel = Panel({u'ItemA': frame[u'float'],
-                             u'ItemB': frame[u'int']})
-    mixed_dup_panel.items = [u'ItemA', u'ItemA']
-    panel = dict(float=Panel({u'ItemA': frame[u'float'],
-                              u'ItemB': frame[u'float'] + 1}),
-                 dup=Panel(np.arange(30).reshape(3, 5, 2).astype(np.float64),
-                           items=[u'A', u'B', u'A']),
-                 mixed_dup=mixed_dup_panel)
+    with catch_warnings(record=True):
+        mixed_dup_panel = Panel({u'ItemA': frame[u'float'],
+                                 u'ItemB': frame[u'int']})
+        mixed_dup_panel.items = [u'ItemA', u'ItemA']
+        panel = dict(float=Panel({u'ItemA': frame[u'float'],
+                                  u'ItemB': frame[u'float'] + 1}),
+                     dup=Panel(
+                         np.arange(30).reshape(3, 5, 2).astype(np.float64),
+                         items=[u'A', u'B', u'A']),
+                     mixed_dup=mixed_dup_panel)

     cat = dict(int8=Categorical(list('abcdefg')),
                int16=Categorical(np.arange(1000)),
diff --git a/pandas/tests/io/json/__init__.py b/pandas/tests/io/json/__init__.py
new file mode 100644
index 0000000000000..e69de29bb2d1d
diff --git a/pandas/io/tests/json/data/tsframe_iso_v012.json b/pandas/tests/io/json/data/tsframe_iso_v012.json
similarity index 100%
rename from pandas/io/tests/json/data/tsframe_iso_v012.json
rename to pandas/tests/io/json/data/tsframe_iso_v012.json
diff --git a/pandas/io/tests/json/data/tsframe_v012.json b/pandas/tests/io/json/data/tsframe_v012.json
similarity index 100%
rename from pandas/io/tests/json/data/tsframe_v012.json
rename to pandas/tests/io/json/data/tsframe_v012.json
diff --git a/pandas/tests/io/json/test_json_table_schema.py b/pandas/tests/io/json/test_json_table_schema.py
new file mode 100644
index 0000000000000..e447a74b2b462
--- /dev/null
+++ b/pandas/tests/io/json/test_json_table_schema.py
@@ -0,0 +1,470 @@
+"""Tests for Table Schema integration."""
+import json
+from collections import OrderedDict
+
+import numpy as np
+import pandas as pd
+import pytest
+
+from pandas import DataFrame
+from pandas.core.dtypes.dtypes import (
+    PeriodDtype, CategoricalDtype, DatetimeTZDtype)
+from pandas.io.json.table_schema import (
+    as_json_table_type,
+    build_table_schema,
+    make_field,
+    set_default_names)
+
+
+class TestBuildSchema(object):
+
+    def setup_method(self, method):
+        self.df = DataFrame(
+            {'A': [1, 2, 3, 4],
+             'B': ['a', 'b', 'c', 'c'],
+             'C': pd.date_range('2016-01-01', freq='d', periods=4),
+             'D': pd.timedelta_range('1H', periods=4, freq='T'),
+             },
+            index=pd.Index(range(4), name='idx'))
+
+    def test_build_table_schema(self):
+        result = build_table_schema(self.df, version=False)
+        expected = {
+            'fields': [{'name': 'idx', 'type': 'integer'},
+                       {'name': 'A', 'type': 'integer'},
+                       {'name': 'B', 'type': 'string'},
+                       {'name': 'C', 'type': 'datetime'},
+                       {'name': 'D', 'type': 'duration'},
+                       ],
+            'primaryKey': ['idx']
+        }
+        assert result == expected
+        result = build_table_schema(self.df)
+        assert "pandas_version" in result
+
+    def test_series(self):
+        s = pd.Series([1, 2, 3], name='foo')
+        result = build_table_schema(s, version=False)
+        expected = {'fields': [{'name': 'index', 'type': 'integer'},
+                               {'name': 'foo', 'type': 'integer'}],
+                    'primaryKey': ['index']}
+        assert result == expected
+        result = build_table_schema(s)
+        assert 'pandas_version' in result
+
+    def test_series_unnamed(self):
+        result = build_table_schema(pd.Series([1, 2, 3]), version=False)
+        expected = {'fields': [{'name': 'index', 'type': 'integer'},
+                               {'name': 'values', 'type': 'integer'}],
+                    'primaryKey': ['index']}
+        assert result == expected
+
+    def test_multiindex(self):
+        df = self.df.copy()
+        idx = pd.MultiIndex.from_product([('a', 'b'), (1, 2)])
+        df.index = idx
+
+        result = build_table_schema(df, version=False)
+        expected = {
+            'fields': [{'name': 'level_0', 'type': 'string'},
+                       {'name': 'level_1', 'type': 'integer'},
+                       {'name': 'A', 'type': 'integer'},
+                       {'name': 'B', 'type': 'string'},
+                       {'name': 'C', 'type': 'datetime'},
+                       {'name': 'D', 'type': 'duration'},
+                       ],
+            'primaryKey': ['level_0', 'level_1']
+        }
+        assert result == expected
+
+        df.index.names = ['idx0', None]
+        expected['fields'][0]['name'] = 'idx0'
+        expected['primaryKey'] = ['idx0', 'level_1']
+        result = build_table_schema(df, version=False)
+        assert result == expected
+
+
+class TestTableSchemaType(object):
+
+    def test_as_json_table_type_int_data(self):
+        int_data = [1, 2, 3]
+        int_types = [np.int, np.int16, np.int32, np.int64]
+        for t in int_types:
+            assert as_json_table_type(np.array(
+                int_data, dtype=t)) == 'integer'
+
+    def test_as_json_table_type_float_data(self):
+        float_data = [1., 2., 3.]
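+        # every numpy float width is expected to collapse to the single
+        # Table Schema 'number' type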
+        float_types = [np.float, np.float16, np.float32, np.float64]
+        for t in float_types:
+            assert as_json_table_type(np.array(
+                float_data, dtype=t)) == 'number'
+
+    def test_as_json_table_type_bool_data(self):
+        bool_data = [True, False]
+        bool_types = [bool, np.bool]
+        for t in bool_types:
+            assert as_json_table_type(np.array(
+                bool_data, dtype=t)) == 'boolean'
+
+    def test_as_json_table_type_date_data(self):
+        date_data = [pd.to_datetime(['2016']),
+                     pd.to_datetime(['2016'], utc=True),
+                     pd.Series(pd.to_datetime(['2016'])),
+                     pd.Series(pd.to_datetime(['2016'], utc=True)),
+                     pd.period_range('2016', freq='A', periods=3)]
+        for arr in date_data:
+            assert as_json_table_type(arr) == 'datetime'
+
+    def test_as_json_table_type_string_data(self):
+        strings = [pd.Series(['a', 'b']), pd.Index(['a', 'b'])]
+        for t in strings:
+            assert as_json_table_type(t) == 'string'
+
+    def test_as_json_table_type_categorical_data(self):
+        assert as_json_table_type(pd.Categorical(['a'])) == 'any'
+        assert as_json_table_type(pd.Categorical([1])) == 'any'
+        assert as_json_table_type(pd.Series(pd.Categorical([1]))) == 'any'
+        assert as_json_table_type(pd.CategoricalIndex([1])) == 'any'
+        assert as_json_table_type(pd.Categorical([1])) == 'any'
+
+    # ------
+    # dtypes
+    # ------
+    def test_as_json_table_type_int_dtypes(self):
+        integers = [np.int, np.int16, np.int32, np.int64]
+        for t in integers:
+            assert as_json_table_type(t) == 'integer'
+
+    def test_as_json_table_type_float_dtypes(self):
+        floats = [np.float, np.float16, np.float32, np.float64]
+        for t in floats:
+            assert as_json_table_type(t) == 'number'
+
+    def test_as_json_table_type_bool_dtypes(self):
+        bools = [bool, np.bool]
+        for t in bools:
+            assert as_json_table_type(t) == 'boolean'
+
+    def test_as_json_table_type_date_dtypes(self):
+        # TODO: datetime.date? datetime.time?
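+        # datetime64 dtypes (plus the Period/DatetimeTZ dtypes imported
+        # at the top of this file) are all expected to map to 'datetime'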
+ dates = [np.datetime64, np.dtype("buffer in the C parser can be freed twice leading @@ -95,7 +100,7 @@ def test_dtype_and_names_error(self): 3.0 3 """ # fallback casting, but not castable - with tm.assertRaisesRegexp(ValueError, 'cannot safely convert'): + with tm.assert_raises_regex(ValueError, 'cannot safely convert'): self.read_csv(StringIO(data), sep=r'\s+', header=None, names=['a', 'b'], dtype={'a': np.int32}) @@ -107,22 +112,22 @@ def test_unsupported_dtype(self): df.to_csv(path) # valid but we don't support it (date) - self.assertRaises(TypeError, self.read_csv, path, - dtype={'A': 'datetime64', 'B': 'float64'}, - index_col=0) - self.assertRaises(TypeError, self.read_csv, path, - dtype={'A': 'datetime64', 'B': 'float64'}, - index_col=0, parse_dates=['B']) + pytest.raises(TypeError, self.read_csv, path, + dtype={'A': 'datetime64', 'B': 'float64'}, + index_col=0) + pytest.raises(TypeError, self.read_csv, path, + dtype={'A': 'datetime64', 'B': 'float64'}, + index_col=0, parse_dates=['B']) # valid but we don't support it - self.assertRaises(TypeError, self.read_csv, path, - dtype={'A': 'timedelta64', 'B': 'float64'}, - index_col=0) + pytest.raises(TypeError, self.read_csv, path, + dtype={'A': 'timedelta64', 'B': 'float64'}, + index_col=0) # valid but unsupported - fixed width unicode string - self.assertRaises(TypeError, self.read_csv, path, - dtype={'A': 'U8'}, - index_col=0) + pytest.raises(TypeError, self.read_csv, path, + dtype={'A': 'U8'}, + index_col=0) def test_precise_conversion(self): # see gh-8002 @@ -151,14 +156,14 @@ def error(val): precise_errors.append(error(precise_val)) # round-trip should match float() - self.assertEqual(roundtrip_val, float(text[2:])) + assert roundtrip_val == float(text[2:]) - self.assertTrue(sum(precise_errors) <= sum(normal_errors)) - self.assertTrue(max(precise_errors) <= max(normal_errors)) + assert sum(precise_errors) <= sum(normal_errors) + assert max(precise_errors) <= max(normal_errors) def test_pass_dtype_as_recarray(self): if compat.is_platform_windows() and self.low_memory: - raise nose.SkipTest( + pytest.skip( "segfaults on win-64, only when all tests are run") data = """\ @@ -172,8 +177,8 @@ def test_pass_dtype_as_recarray(self): FutureWarning, check_stacklevel=False): result = self.read_csv(StringIO(data), dtype={ 'one': 'u1', 1: 'S1'}, as_recarray=True) - self.assertEqual(result['one'].dtype, 'u1') - self.assertEqual(result['two'].dtype, 'S1') + assert result['one'].dtype == 'u1' + assert result['two'].dtype == 'S1' def test_usecols_dtypes(self): data = """\ @@ -194,8 +199,8 @@ def test_usecols_dtypes(self): converters={'a': str}, dtype={'b': int, 'c': float}, ) - self.assertTrue((result.dtypes == [object, np.int, np.float]).all()) - self.assertTrue((result2.dtypes == [object, np.float]).all()) + assert (result.dtypes == [object, np.int, np.float]).all() + assert (result2.dtypes == [object, np.float]).all() def test_disable_bool_parsing(self): # #2090 @@ -207,10 +212,10 @@ def test_disable_bool_parsing(self): No,No,No""" result = self.read_csv(StringIO(data), dtype=object) - self.assertTrue((result.dtypes == object).all()) + assert (result.dtypes == object).all() result = self.read_csv(StringIO(data), dtype=object, na_filter=False) - self.assertEqual(result['B'][2], '') + assert result['B'][2] == '' def test_custom_lineterminator(self): data = 'a,b,c~1,2,3~4,5,6' @@ -375,19 +380,19 @@ def test_internal_null_byte(self): def test_read_nrows_large(self): # gh-7626 - Read only nrows of data in for large inputs (>262144b) header_narrow = 
'\t'.join(['COL_HEADER_' + str(i) - for i in range(10)]) + '\n' + for i in range(10)]) + '\n' data_narrow = '\t'.join(['somedatasomedatasomedata1' - for i in range(10)]) + '\n' + for i in range(10)]) + '\n' header_wide = '\t'.join(['COL_HEADER_' + str(i) - for i in range(15)]) + '\n' + for i in range(15)]) + '\n' data_wide = '\t'.join(['somedatasomedatasomedata2' - for i in range(15)]) + '\n' + for i in range(15)]) + '\n' test_input = (header_narrow + data_narrow * 1050 + header_wide + data_wide * 2) df = self.read_csv(StringIO(test_input), sep='\t', nrows=1010) - self.assertTrue(df.size == 1010 * 10) + assert df.size == 1010 * 10 def test_float_precision_round_trip_with_text(self): # gh-15140 - This should not segfault on Python 2.7+ @@ -407,3 +412,87 @@ def test_large_difference_in_columns(self): expected = DataFrame([row.split(',')[0] for row in rows]) tm.assert_frame_equal(result, expected) + + def test_data_after_quote(self): + # see gh-15910 + + data = 'a\n1\n"b"a' + result = self.read_csv(StringIO(data)) + expected = DataFrame({'a': ['1', 'ba']}) + + tm.assert_frame_equal(result, expected) + + @tm.capture_stderr + def test_comment_whitespace_delimited(self): + test_input = """\ +1 2 +2 2 3 +3 2 3 # 3 fields +4 2 3# 3 fields +5 2 # 2 fields +6 2# 2 fields +7 # 1 field, NaN +8# 1 field, NaN +9 2 3 # skipped line +# comment""" + df = self.read_csv(StringIO(test_input), comment='#', header=None, + delimiter='\\s+', skiprows=0, + error_bad_lines=False) + error = sys.stderr.getvalue() + # skipped lines 2, 3, 4, 9 + for line_num in (2, 3, 4, 9): + assert 'Skipping line {}'.format(line_num) in error, error + expected = DataFrame([[1, 2], + [5, 2], + [6, 2], + [7, np.nan], + [8, np.nan]]) + tm.assert_frame_equal(df, expected) + + def test_file_like_no_next(self): + # gh-16530: the file-like need not have a "next" or "__next__" + # attribute despite having an "__iter__" attribute. + # + # NOTE: This is only true for the C engine, not Python engine. + class NoNextBuffer(StringIO): + def __next__(self): + raise AttributeError("No next method") + + next = __next__ + + data = "a\n1" + + expected = pd.DataFrame({"a": [1]}) + result = self.read_csv(NoNextBuffer(data)) + + tm.assert_frame_equal(result, expected) + + @pytest.mark.parametrize("tar_suffix", [".tar", ".tar.gz"]) + def test_read_tarfile(self, tar_suffix): + # see gh-16530 + # + # Unfortunately, Python's CSV library can't handle + # tarfile objects (expects string, not bytes when + # iterating through a file-like). + tar_path = os.path.join(self.dirpath, "tar_csv" + tar_suffix) + + with tarfile.open(tar_path, "r") as tar: + data_file = tar.extractfile("tar_data.csv") + + out = self.read_csv(data_file) + expected = pd.DataFrame({"a": [1]}) + tm.assert_frame_equal(out, expected) + + @pytest.mark.high_memory + def test_bytes_exceed_2gb(self): + """Read from a "CSV" that has a column larger than 2GB. 
+ + GH 16798 + """ + if self.low_memory: + pytest.skip("not a high_memory test") + + csv = StringIO('strings\n' + '\n'.join( + ['x' * (1 << 20) for _ in range(2100)])) + df = self.read_csv(csv, low_memory=False) + assert not df.empty diff --git a/pandas/io/tests/parser/comment.py b/pandas/tests/io/parser/comment.py similarity index 100% rename from pandas/io/tests/parser/comment.py rename to pandas/tests/io/parser/comment.py diff --git a/pandas/io/tests/parser/common.py b/pandas/tests/io/parser/common.py similarity index 82% rename from pandas/io/tests/parser/common.py rename to pandas/tests/io/parser/common.py index 9655c481b763a..cfc4a1d7c55eb 100644 --- a/pandas/io/tests/parser/common.py +++ b/pandas/tests/io/parser/common.py @@ -9,9 +9,9 @@ import sys from datetime import datetime -import nose +import pytest import numpy as np -from pandas.lib import Timestamp +from pandas._libs.lib import Timestamp import pandas as pd import pandas.util.testing as tm @@ -19,7 +19,8 @@ from pandas import compat from pandas.compat import (StringIO, BytesIO, PY3, range, lrange, u) -from pandas.io.common import DtypeWarning, EmptyDataError, URLError +from pandas.errors import DtypeWarning, EmptyDataError, ParserError +from pandas.io.common import URLError from pandas.io.parsers import TextFileReader, TextParser @@ -43,7 +44,7 @@ def test_empty_decimal_marker(self): """ # Parsers support only length-1 decimals msg = 'Only length-1 decimal markers supported' - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): self.read_csv(StringIO(data), decimal='') def test_bad_stream_exception(self): @@ -63,7 +64,7 @@ def test_bad_stream_exception(self): msg = "'utf-8' codec can't decode byte" else: msg = "'utf8' codec can't decode byte" - with tm.assertRaisesRegexp(UnicodeDecodeError, msg): + with tm.assert_raises_regex(UnicodeDecodeError, msg): self.read_csv(stream) stream.close() @@ -104,7 +105,7 @@ def test_squeeze(self): expected = Series([1, 2, 3], name=1, index=idx) result = self.read_table(StringIO(data), sep=',', index_col=0, header=None, squeeze=True) - tm.assertIsInstance(result, Series) + assert isinstance(result, Series) tm.assert_series_equal(result, expected) def test_squeeze_no_view(self): @@ -112,7 +113,7 @@ def test_squeeze_no_view(self): # Series should not be a view data = """time,data\n0,10\n1,11\n2,12\n4,14\n5,15\n3,13""" result = self.read_csv(StringIO(data), index_col='time', squeeze=True) - self.assertFalse(result._is_view) + assert not result._is_view def test_malformed(self): # see gh-6607 @@ -125,7 +126,7 @@ def test_malformed(self): 2,3,4 """ msg = 'Expected 3 fields in line 4, saw 5' - with tm.assertRaisesRegexp(Exception, msg): + with tm.assert_raises_regex(Exception, msg): self.read_table(StringIO(data), sep=',', header=1, comment='#') @@ -139,7 +140,7 @@ def test_malformed(self): 2,3,4 """ msg = 'Expected 3 fields in line 6, saw 5' - with tm.assertRaisesRegexp(Exception, msg): + with tm.assert_raises_regex(Exception, msg): it = self.read_table(StringIO(data), sep=',', header=1, comment='#', iterator=True, chunksize=1, @@ -156,7 +157,7 @@ def test_malformed(self): 2,3,4 """ msg = 'Expected 3 fields in line 6, saw 5' - with tm.assertRaisesRegexp(Exception, msg): + with tm.assert_raises_regex(Exception, msg): it = self.read_table(StringIO(data), sep=',', header=1, comment='#', iterator=True, chunksize=1, skiprows=[2]) @@ -172,7 +173,7 @@ def test_malformed(self): 2,3,4 """ msg = 'Expected 3 fields in line 6, saw 5' - with 
tm.assertRaisesRegexp(Exception, msg): + with tm.assert_raises_regex(Exception, msg): it = self.read_table(StringIO(data), sep=',', header=1, comment='#', iterator=True, chunksize=1, skiprows=[2]) @@ -189,7 +190,7 @@ def test_malformed(self): footer """ msg = 'Expected 3 fields in line 4, saw 5' - with tm.assertRaisesRegexp(Exception, msg): + with tm.assert_raises_regex(Exception, msg): self.read_table(StringIO(data), sep=',', header=1, comment='#', skipfooter=1) @@ -201,12 +202,12 @@ def test_quoting(self): Klosterdruckerei\tKlosterdruckerei (1609-1805)\t"Furststiftische Hofdruckerei, (1609-1805)\tGaller, Alois Klosterdruckerei\tKlosterdruckerei (1609-1805)\tHochfurstliche Buchhandlung """ # noqa - self.assertRaises(Exception, self.read_table, StringIO(bad_line_small), - sep='\t') + pytest.raises(Exception, self.read_table, StringIO(bad_line_small), + sep='\t') good_line_small = bad_line_small + '"' df = self.read_table(StringIO(good_line_small), sep='\t') - self.assertEqual(len(df), 3) + assert len(df) == 3 def test_unnamed_columns(self): data = """A,B,C,, @@ -219,30 +220,9 @@ def test_unnamed_columns(self): [11, 12, 13, 14, 15]], dtype=np.int64) df = self.read_table(StringIO(data), sep=',') tm.assert_almost_equal(df.values, expected) - self.assert_index_equal(df.columns, - Index(['A', 'B', 'C', 'Unnamed: 3', - 'Unnamed: 4'])) - - def test_duplicate_columns(self): - # TODO: add test for condition 'mangle_dupe_cols=False' - # once it is actually supported (gh-12935) - data = """A,A,B,B,B -1,2,3,4,5 -6,7,8,9,10 -11,12,13,14,15 -""" - - for method in ('read_csv', 'read_table'): - - # check default behavior - df = getattr(self, method)(StringIO(data), sep=',') - self.assertEqual(list(df.columns), - ['A', 'A.1', 'B', 'B.1', 'B.2']) - - df = getattr(self, method)(StringIO(data), sep=',', - mangle_dupe_cols=True) - self.assertEqual(list(df.columns), - ['A', 'A.1', 'B', 'B.1', 'B.2']) + tm.assert_index_equal(df.columns, + Index(['A', 'B', 'C', 'Unnamed: 3', + 'Unnamed: 4'])) def test_csv_mixed_type(self): data = """A,B,C @@ -260,29 +240,27 @@ def test_read_csv_dataframe(self): df = self.read_csv(self.csv1, index_col=0, parse_dates=True) df2 = self.read_table(self.csv1, sep=',', index_col=0, parse_dates=True) - self.assert_index_equal(df.columns, pd.Index(['A', 'B', 'C', 'D'])) - self.assertEqual(df.index.name, 'index') - self.assertIsInstance( + tm.assert_index_equal(df.columns, pd.Index(['A', 'B', 'C', 'D'])) + assert df.index.name == 'index' + assert isinstance( df.index[0], (datetime, np.datetime64, Timestamp)) - self.assertEqual(df.values.dtype, np.float64) + assert df.values.dtype == np.float64 tm.assert_frame_equal(df, df2) def test_read_csv_no_index_name(self): df = self.read_csv(self.csv2, index_col=0, parse_dates=True) df2 = self.read_table(self.csv2, sep=',', index_col=0, parse_dates=True) - self.assert_index_equal(df.columns, - pd.Index(['A', 'B', 'C', 'D', 'E'])) - self.assertIsInstance(df.index[0], - (datetime, np.datetime64, Timestamp)) - self.assertEqual(df.loc[:, ['A', 'B', 'C', 'D']].values.dtype, - np.float64) + tm.assert_index_equal(df.columns, + pd.Index(['A', 'B', 'C', 'D', 'E'])) + assert isinstance(df.index[0], (datetime, np.datetime64, Timestamp)) + assert df.loc[:, ['A', 'B', 'C', 'D']].values.dtype == np.float64 tm.assert_frame_equal(df, df2) def test_read_table_unicode(self): fin = BytesIO(u('\u0141aski, Jan;1').encode('utf-8')) df1 = self.read_table(fin, sep=";", encoding="utf-8", header=None) - tm.assertIsInstance(df1[0].values[0], compat.text_type) + assert 
isinstance(df1[0].values[0], compat.text_type) def test_read_table_wrong_num_columns(self): # too few! @@ -291,7 +269,7 @@ def test_read_table_wrong_num_columns(self): 6,7,8,9,10,11,12 11,12,13,14,15,16 """ - self.assertRaises(ValueError, self.read_csv, StringIO(data)) + pytest.raises(ValueError, self.read_csv, StringIO(data)) def test_read_duplicate_index_explicit(self): data = """index,A,B,C,D @@ -334,7 +312,7 @@ def test_parse_bools(self): True,3 """ data = self.read_csv(StringIO(data)) - self.assertEqual(data['A'].dtype, np.bool_) + assert data['A'].dtype == np.bool_ data = """A,B YES,1 @@ -346,7 +324,7 @@ def test_parse_bools(self): data = self.read_csv(StringIO(data), true_values=['yes', 'Yes', 'YES'], false_values=['no', 'NO', 'No']) - self.assertEqual(data['A'].dtype, np.bool_) + assert data['A'].dtype == np.bool_ data = """A,B TRUE,1 @@ -354,7 +332,7 @@ def test_parse_bools(self): TRUE,3 """ data = self.read_csv(StringIO(data)) - self.assertEqual(data['A'].dtype, np.bool_) + assert data['A'].dtype == np.bool_ data = """A,B foo,bar @@ -371,8 +349,8 @@ def test_int_conversion(self): 3.0,3 """ data = self.read_csv(StringIO(data)) - self.assertEqual(data['A'].dtype, np.float64) - self.assertEqual(data['B'].dtype, np.int64) + assert data['A'].dtype == np.float64 + assert data['B'].dtype == np.int64 def test_read_nrows(self): expected = self.read_csv(StringIO(self.data1))[:3] @@ -384,14 +362,17 @@ def test_read_nrows(self): df = self.read_csv(StringIO(self.data1), nrows=3.0) tm.assert_frame_equal(df, expected) - msg = "must be an integer" + msg = r"'nrows' must be an integer >=0" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): self.read_csv(StringIO(self.data1), nrows=1.2) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): self.read_csv(StringIO(self.data1), nrows='foo') + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(StringIO(self.data1), nrows=-1) + def test_read_chunksize(self): reader = self.read_csv(StringIO(self.data1), index_col=0, chunksize=2) df = self.read_csv(StringIO(self.data1), index_col=0) @@ -402,6 +383,45 @@ def test_read_chunksize(self): tm.assert_frame_equal(chunks[1], df[2:4]) tm.assert_frame_equal(chunks[2], df[4:]) + # with invalid chunksize value: + msg = r"'chunksize' must be an integer >=1" + + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(StringIO(self.data1), chunksize=1.3) + + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(StringIO(self.data1), chunksize='foo') + + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(StringIO(self.data1), chunksize=0) + + def test_read_chunksize_and_nrows(self): + + # gh-15755 + # With nrows + reader = self.read_csv(StringIO(self.data1), index_col=0, + chunksize=2, nrows=5) + df = self.read_csv(StringIO(self.data1), index_col=0, nrows=5) + + tm.assert_frame_equal(pd.concat(reader), df) + + # chunksize > nrows + reader = self.read_csv(StringIO(self.data1), index_col=0, + chunksize=8, nrows=5) + df = self.read_csv(StringIO(self.data1), index_col=0, nrows=5) + + tm.assert_frame_equal(pd.concat(reader), df) + + # with changing "size": + reader = self.read_csv(StringIO(self.data1), index_col=0, + chunksize=8, nrows=5) + df = self.read_csv(StringIO(self.data1), index_col=0, nrows=5) + + tm.assert_frame_equal(reader.get_chunk(size=2), df.iloc[:2]) + tm.assert_frame_equal(reader.get_chunk(size=4), df.iloc[2:5]) + with pytest.raises(StopIteration): + reader.get_chunk(size=3) + def 
test_read_chunksize_named(self): reader = self.read_csv( StringIO(self.data1), index_col='index', chunksize=2) @@ -422,7 +442,7 @@ def test_get_chunk_passed_chunksize(self): result = self.read_csv(StringIO(data), chunksize=2) piece = result.get_chunk() - self.assertEqual(len(piece), 2) + assert len(piece) == 2 def test_read_chunksize_generated_index(self): # GH 12185 @@ -477,7 +497,7 @@ def test_iterator(self): treader = self.read_table(StringIO(self.data1), sep=',', index_col=0, iterator=True) - tm.assertIsInstance(treader, TextFileReader) + assert isinstance(treader, TextFileReader) # gh-3967: stopping iteration when chunksize is specified data = """A,B,C @@ -496,15 +516,15 @@ def test_iterator(self): result = list(reader) expected = DataFrame(dict(A=[1, 4, 7], B=[2, 5, 8], C=[ 3, 6, 9]), index=['foo', 'bar', 'baz']) - self.assertEqual(len(result), 3) + assert len(result) == 3 tm.assert_frame_equal(pd.concat(result), expected) # skipfooter is not supported with the C parser yet if self.engine == 'python': # test bad parameter (skipfooter) reader = self.read_csv(StringIO(self.data1), index_col=0, - iterator=True, skipfooter=True) - self.assertRaises(ValueError, reader.read, 3) + iterator=True, skipfooter=1) + pytest.raises(ValueError, reader.read, 3) def test_pass_names_with_index(self): lines = self.data1.split('\n') @@ -600,7 +620,7 @@ def test_no_unnamed_index(self): 2 2 2 e f """ df = self.read_table(StringIO(data), sep=' ') - self.assertIsNone(df.index.name) + assert df.index.name is None def test_read_csv_parse_simple_list(self): text = """foo @@ -617,7 +637,7 @@ def test_read_csv_parse_simple_list(self): def test_url(self): # HTTP(S) url = ('https://raw.github.com/pandas-dev/pandas/master/' - 'pandas/io/tests/parser/data/salaries.csv') + 'pandas/tests/io/parser/data/salaries.csv') url_table = self.read_table(url) dirpath = tm.get_data_path() localtable = os.path.join(dirpath, 'salaries.csv') @@ -625,7 +645,7 @@ def test_url(self): tm.assert_frame_equal(url_table, local_table) # TODO: ftp testing - @tm.slow + @pytest.mark.slow def test_file(self): dirpath = tm.get_data_path() localtable = os.path.join(dirpath, 'salaries.csv') @@ -635,16 +655,29 @@ def test_file(self): url_table = self.read_table('file://localhost/' + localtable) except URLError: # fails on some systems - raise nose.SkipTest("failing on %s" % - ' '.join(platform.uname()).strip()) + pytest.skip("failing on %s" % + ' '.join(platform.uname()).strip()) tm.assert_frame_equal(url_table, local_table) + def test_path_pathlib(self): + df = tm.makeDataFrame() + result = tm.round_trip_pathlib(df.to_csv, + lambda p: self.read_csv(p, index_col=0)) + tm.assert_frame_equal(df, result) + + def test_path_localpath(self): + df = tm.makeDataFrame() + result = tm.round_trip_localpath( + df.to_csv, + lambda p: self.read_csv(p, index_col=0)) + tm.assert_frame_equal(df, result) + def test_nonexistent_path(self): # gh-2428: pls no segfault # gh-14086: raise more helpful FileNotFoundError path = '%s.csv' % tm.rands(10) - self.assertRaises(compat.FileNotFoundError, self.read_csv, path) + pytest.raises(compat.FileNotFoundError, self.read_csv, path) def test_missing_trailing_delimiters(self): data = """A,B,C,D @@ -652,7 +685,7 @@ def test_missing_trailing_delimiters(self): 1,3,3, 1,4,5""" result = self.read_csv(StringIO(data)) - self.assertTrue(result['D'].isnull()[1:].all()) + assert result['D'].isna()[1:].all() def test_skipinitialspace(self): s = ('"09-Apr-2012", "01:10:18.300", 2456026.548822908, 12849, ' @@ -666,7 +699,7 @@ def 
test_skipinitialspace(self): # it's 33 columns result = self.read_csv(sfile, names=lrange(33), na_values=['-9999.0'], header=None, skipinitialspace=True) - self.assertTrue(pd.isnull(result.iloc[0, 29])) + assert pd.isna(result.iloc[0, 29]) def test_utf16_bom_skiprows(self): # #2298 @@ -710,12 +743,12 @@ def test_utf16_example(self): # it works! and is the right length result = self.read_table(path, encoding='utf-16') - self.assertEqual(len(result), 50) + assert len(result) == 50 if not compat.PY3: buf = BytesIO(open(path, 'rb').read()) result = self.read_table(buf, encoding='utf-16') - self.assertEqual(len(result), 50) + assert len(result) == 50 def test_unicode_encoding(self): pth = tm.get_data_path('unicode_series.csv') @@ -726,7 +759,7 @@ def test_unicode_encoding(self): got = result[1][1632] expected = u('\xc1 k\xf6ldum klaka (Cold Fever) (1994)') - self.assertEqual(got, expected) + assert got == expected def test_trailing_delimiters(self): # #2442. grumble grumble @@ -751,10 +784,10 @@ def test_escapechar(self): result = self.read_csv(StringIO(data), escapechar='\\', quotechar='"', encoding='utf-8') - self.assertEqual(result['SEARCH_TERM'][2], - 'SLAGBORD, "Bergslagen", IKEA:s 1700-tals serie') - self.assertTrue(np.array_equal(result.columns, - ['SEARCH_TERM', 'ACTUAL_URL'])) + assert result['SEARCH_TERM'][2] == ('SLAGBORD, "Bergslagen", ' + 'IKEA:s 1700-tals serie') + tm.assert_index_equal(result.columns, + Index(['SEARCH_TERM', 'ACTUAL_URL'])) def test_int64_min_issues(self): # #2599 @@ -790,7 +823,7 @@ def test_parse_integers_above_fp_precision(self): 17007000002000192, 17007000002000194]}) - self.assertTrue(np.array_equal(result['Numbers'], expected['Numbers'])) + assert np.array_equal(result['Numbers'], expected['Numbers']) def test_chunks_have_consistent_numerical_type(self): integers = [str(i) for i in range(499999)] @@ -799,8 +832,8 @@ def test_chunks_have_consistent_numerical_type(self): with tm.assert_produces_warning(False): df = self.read_csv(StringIO(data)) # Assert that types were coerced. 
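         # (this ~500k-row column is deliberately large enough to be read
         # in several chunks; the point of the test is that coercion leaves
         # one consistent float64 dtype rather than per-chunk dtypes)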
- self.assertTrue(type(df.a[0]) is np.float64) - self.assertEqual(df.a.dtype, np.float) + assert type(df.a[0]) is np.float64 + assert df.a.dtype == np.float def test_warn_if_chunks_have_mismatched_type(self): warning_type = False @@ -814,17 +847,17 @@ def test_warn_if_chunks_have_mismatched_type(self): with tm.assert_produces_warning(warning_type): df = self.read_csv(StringIO(data)) - self.assertEqual(df.a.dtype, np.object) + assert df.a.dtype == np.object def test_integer_overflow_bug(self): # see gh-2601 data = "65248E10 11\n55555E55 22\n" result = self.read_csv(StringIO(data), header=None, sep=' ') - self.assertTrue(result[0].dtype == np.float64) + assert result[0].dtype == np.float64 result = self.read_csv(StringIO(data), header=None, sep=r'\s+') - self.assertTrue(result[0].dtype == np.float64) + assert result[0].dtype == np.float64 def test_catch_too_many_names(self): # see gh-5156 @@ -833,8 +866,8 @@ def test_catch_too_many_names(self): 4,,6 7,8,9 10,11,12\n""" - tm.assertRaises(ValueError, self.read_csv, StringIO(data), - header=0, names=['a', 'b', 'c', 'd']) + pytest.raises(ValueError, self.read_csv, StringIO(data), + header=0, names=['a', 'b', 'c', 'd']) def test_ignore_leading_whitespace(self): # see gh-3374, gh-6607 @@ -847,7 +880,7 @@ def test_chunk_begins_with_newline_whitespace(self): # see gh-10022 data = '\n hello\nworld\n' result = self.read_csv(StringIO(data), header=None) - self.assertEqual(len(result), 2) + assert len(result) == 2 # see gh-9735: this issue is C parser-specific (bug when # parsing whitespace and characters at chunk boundary) @@ -912,14 +945,14 @@ def test_int64_overflow(self): # 13007854817840016671868 > UINT64_MAX, so this # will overflow and return object as the dtype. result = self.read_csv(StringIO(data)) - self.assertTrue(result['ID'].dtype == object) + assert result['ID'].dtype == object # 13007854817840016671868 > UINT64_MAX, so attempts # to cast to either int64 or uint64 will result in # an OverflowError being raised. for conv in (np.int64, np.uint64): - self.assertRaises(OverflowError, self.read_csv, - StringIO(data), converters={'ID': conv}) + pytest.raises(OverflowError, self.read_csv, + StringIO(data), converters={'ID': conv}) # These numbers fall right inside the int64-uint64 range, # so they should be parsed as string. 
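         # (for reference: int64 tops out at 2**63 - 1 = 9223372036854775807
         # and uint64 at 2**64 - 1 = 18446744073709551615, so the value
         # 13007854817840016671868 above overflows both integer types)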
@@ -1039,18 +1072,18 @@ def test_eof_states(self): # ESCAPED_CHAR data = "a,b,c\n4,5,6\n\\" - self.assertRaises(Exception, self.read_csv, - StringIO(data), escapechar='\\') + pytest.raises(Exception, self.read_csv, + StringIO(data), escapechar='\\') # ESCAPE_IN_QUOTED_FIELD data = 'a,b,c\n4,5,6\n"\\' - self.assertRaises(Exception, self.read_csv, - StringIO(data), escapechar='\\') + pytest.raises(Exception, self.read_csv, + StringIO(data), escapechar='\\') # IN_QUOTED_FIELD data = 'a,b,c\n4,5,6\n"' - self.assertRaises(Exception, self.read_csv, - StringIO(data), escapechar='\\') + pytest.raises(Exception, self.read_csv, + StringIO(data), escapechar='\\') def test_uneven_lines_with_usecols(self): # See gh-12203 @@ -1063,7 +1096,7 @@ def test_uneven_lines_with_usecols(self): # make sure that an error is still thrown # when the 'usecols' parameter is not provided msg = r"Expected \d+ fields in line \d+, saw \d+" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): df = self.read_csv(StringIO(csv)) expected = DataFrame({ @@ -1089,10 +1122,10 @@ def test_read_empty_with_usecols(self): # throws the correct error, with or without usecols errmsg = "No columns to parse from file" - with tm.assertRaisesRegexp(EmptyDataError, errmsg): + with tm.assert_raises_regex(EmptyDataError, errmsg): self.read_csv(StringIO('')) - with tm.assertRaisesRegexp(EmptyDataError, errmsg): + with tm.assert_raises_regex(EmptyDataError, errmsg): self.read_csv(StringIO(''), usecols=usecols) expected = DataFrame(columns=usecols, index=[0], dtype=np.float64) @@ -1131,7 +1164,8 @@ def test_trailing_spaces(self): def test_raise_on_sep_with_delim_whitespace(self): # see gh-6607 data = 'a b c\n1 2 3' - with tm.assertRaisesRegexp(ValueError, 'you can only specify one'): + with tm.assert_raises_regex(ValueError, + 'you can only specify one'): self.read_table(StringIO(data), sep=r'\s', delim_whitespace=True) def test_single_char_leading_whitespace(self): @@ -1202,7 +1236,7 @@ def test_regex_separator(self): df = self.read_table(StringIO(data), sep=r'\s+') expected = self.read_csv(StringIO(re.sub('[ ]+', ',', data)), index_col=0) - self.assertIsNone(expected.index.name) + assert expected.index.name is None tm.assert_frame_equal(df, expected) data = ' a b c\n1 2 3 \n4 5 6\n 7 8 9' @@ -1211,6 +1245,7 @@ def test_regex_separator(self): columns=['a', 'b', 'c']) tm.assert_frame_equal(result, expected) + @tm.capture_stdout def test_verbose_import(self): text = """a,b,c,d one,1,2,3 @@ -1222,22 +1257,18 @@ def test_verbose_import(self): one,1,2,3 two,1,2,3""" - buf = StringIO() - sys.stdout = buf + # Engines are verbose in different ways. + self.read_csv(StringIO(text), verbose=True) + output = sys.stdout.getvalue() - try: # engines are verbose in different ways - self.read_csv(StringIO(text), verbose=True) - if self.engine == 'c': - self.assertIn('Tokenization took:', buf.getvalue()) - self.assertIn('Parser memory cleanup took:', buf.getvalue()) - else: # Python engine - self.assertEqual(buf.getvalue(), - 'Filled 3 NA values in column a\n') - finally: - sys.stdout = sys.__stdout__ + if self.engine == 'c': + assert 'Tokenization took:' in output + assert 'Parser memory cleanup took:' in output + else: # Python engine + assert output == 'Filled 3 NA values in column a\n' - buf = StringIO() - sys.stdout = buf + # Reset the stdout buffer. 
+ sys.stdout = StringIO() text = """a,b,c,d one,1,2,3 @@ -1249,20 +1280,19 @@ def test_verbose_import(self): seven,1,2,3 eight,1,2,3""" - try: # engines are verbose in different ways - self.read_csv(StringIO(text), verbose=True, index_col=0) - if self.engine == 'c': - self.assertIn('Tokenization took:', buf.getvalue()) - self.assertIn('Parser memory cleanup took:', buf.getvalue()) - else: # Python engine - self.assertEqual(buf.getvalue(), - 'Filled 1 NA values in column a\n') - finally: - sys.stdout = sys.__stdout__ + self.read_csv(StringIO(text), verbose=True, index_col=0) + output = sys.stdout.getvalue() + + # Engines are verbose in different ways. + if self.engine == 'c': + assert 'Tokenization took:' in output + assert 'Parser memory cleanup took:' in output + else: # Python engine + assert output == 'Filled 1 NA values in column a\n' def test_iteration_open_handle(self): if PY3: - raise nose.SkipTest( + pytest.skip( "won't work in Python 3 {0}".format(sys.version_info)) with tm.ensure_clean() as path: @@ -1275,8 +1305,8 @@ def test_iteration_open_handle(self): break if self.engine == 'c': - tm.assertRaises(Exception, self.read_table, - f, squeeze=True, header=None) + pytest.raises(Exception, self.read_table, + f, squeeze=True, header=None) else: result = self.read_table(f, squeeze=True, header=None) expected = Series(['DDD', 'EEE', 'FFF', 'GGG'], name=0) @@ -1293,9 +1323,9 @@ def test_1000_sep_with_decimal(self): 'C': [5, 10.] }) - tm.assert_equal(expected.A.dtype, 'int64') - tm.assert_equal(expected.B.dtype, 'float') - tm.assert_equal(expected.C.dtype, 'float') + assert expected.A.dtype == 'int64' + assert expected.B.dtype == 'float' + assert expected.C.dtype == 'float' df = self.read_csv(StringIO(data), sep='|', thousands=',', decimal='.') tm.assert_frame_equal(df, expected) @@ -1323,9 +1353,9 @@ def test_euro_decimal_format(self): 3;878,158;108013,434;GHI;rez;2,735694704""" df2 = self.read_csv(StringIO(data), sep=';', decimal=',') - self.assertEqual(df2['Number1'].dtype, float) - self.assertEqual(df2['Number2'].dtype, float) - self.assertEqual(df2['Number3'].dtype, float) + assert df2['Number1'].dtype == float + assert df2['Number2'].dtype == float + assert df2['Number3'].dtype == float def test_read_duplicate_names(self): # See gh-7160 @@ -1366,11 +1396,11 @@ def test_inf_parsing(self): def test_raise_on_no_columns(self): # single newline data = "\n" - self.assertRaises(EmptyDataError, self.read_csv, StringIO(data)) + pytest.raises(EmptyDataError, self.read_csv, StringIO(data)) # test with more than a single newline data = "\n\n\n" - self.assertRaises(EmptyDataError, self.read_csv, StringIO(data)) + pytest.raises(EmptyDataError, self.read_csv, StringIO(data)) def test_compact_ints_use_unsigned(self): # see gh-13323 @@ -1425,7 +1455,7 @@ def test_compact_ints_as_recarray(self): result = self.read_csv(StringIO(data), delimiter=',', header=None, compact_ints=True, as_recarray=True) ex_dtype = np.dtype([(str(i), 'i1') for i in range(4)]) - self.assertEqual(result.dtype, ex_dtype) + assert result.dtype == ex_dtype with tm.assert_produces_warning( FutureWarning, check_stacklevel=False): @@ -1433,7 +1463,7 @@ def test_compact_ints_as_recarray(self): as_recarray=True, compact_ints=True, use_unsigned=True) ex_dtype = np.dtype([(str(i), 'u1') for i in range(4)]) - self.assertEqual(result.dtype, ex_dtype) + assert result.dtype == ex_dtype def test_as_recarray(self): # basic test @@ -1526,7 +1556,7 @@ def test_null_byte_char(self): tm.assert_frame_equal(out, expected) else: msg = "NULL byte 
detected" - with tm.assertRaisesRegexp(csv.Error, msg): + with tm.assert_raises_regex(ParserError, msg): self.read_csv(StringIO(data), names=cols) def test_utf8_bom(self): @@ -1613,16 +1643,41 @@ def test_internal_eof_byte(self): result = self.read_csv(StringIO(data)) tm.assert_frame_equal(result, expected) + def test_internal_eof_byte_to_file(self): + # see gh-16559 + data = b'c1,c2\r\n"test \x1a test", test\r\n' + expected = pd.DataFrame([["test \x1a test", " test"]], + columns=["c1", "c2"]) + + path = '__%s__.csv' % tm.rands(10) + + with tm.ensure_clean(path) as path: + with open(path, "wb") as f: + f.write(data) + + result = self.read_csv(path) + tm.assert_frame_equal(result, expected) + + def test_sub_character(self): + # see gh-16893 + dirpath = tm.get_data_path() + filename = os.path.join(dirpath, "sub_char.csv") + + expected = DataFrame([[1, 2, 3]], columns=["a", "\x1ab", "c"]) + result = self.read_csv(filename) + + tm.assert_frame_equal(result, expected) + def test_file_handles(self): # GH 14418 - don't close user provided file handles fh = StringIO('a,b\n1,2') self.read_csv(fh) - self.assertFalse(fh.closed) + assert not fh.closed with open(self.csv1, 'r') as f: self.read_csv(f) - self.assertFalse(f.closed) + assert not f.closed # mmap not working with python engine if self.engine != 'python': @@ -1633,5 +1688,75 @@ def test_file_handles(self): self.read_csv(m) # closed attribute new in python 3.2 if PY3: - self.assertFalse(m.closed) + assert not m.closed m.close() + + def test_invalid_file_buffer(self): + # see gh-15337 + + class InvalidBuffer(object): + pass + + msg = "Invalid file path or buffer object type" + + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(InvalidBuffer()) + + # gh-16135: we want to ensure that "tell" and "seek" + # aren't actually being used when we call `read_csv` + # + # Thus, while the object may look "invalid" (these + # methods are attributes of the `StringIO` class), + # it is still a valid file-object for our purposes. + class NoSeekTellBuffer(StringIO): + def tell(self): + raise AttributeError("No tell method") + + def seek(self, pos, whence=0): + raise AttributeError("No seek method") + + data = "a\n1" + + expected = pd.DataFrame({"a": [1]}) + result = self.read_csv(NoSeekTellBuffer(data)) + + tm.assert_frame_equal(result, expected) + + if PY3: + from unittest import mock + + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(mock.Mock()) + + @tm.capture_stderr + def test_skip_bad_lines(self): + # see gh-15925 + data = 'a\n1\n1,2,3\n4\n5,6,7' + + with pytest.raises(ParserError): + self.read_csv(StringIO(data)) + + with pytest.raises(ParserError): + self.read_csv(StringIO(data), error_bad_lines=True) + + expected = DataFrame({'a': [1, 4]}) + + out = self.read_csv(StringIO(data), + error_bad_lines=False, + warn_bad_lines=False) + tm.assert_frame_equal(out, expected) + + val = sys.stderr.getvalue() + assert val == '' + + # Reset the stderr buffer. 
+ sys.stderr = StringIO() + + out = self.read_csv(StringIO(data), + error_bad_lines=False, + warn_bad_lines=True) + tm.assert_frame_equal(out, expected) + + val = sys.stderr.getvalue() + assert 'Skipping line 3' in val + assert 'Skipping line 5' in val diff --git a/pandas/io/tests/parser/compression.py b/pandas/tests/io/parser/compression.py similarity index 81% rename from pandas/io/tests/parser/compression.py rename to pandas/tests/io/parser/compression.py index e95617faf2071..797c12139656d 100644 --- a/pandas/io/tests/parser/compression.py +++ b/pandas/tests/io/parser/compression.py @@ -5,17 +5,15 @@ of the parsers defined in parsers.py """ -import nose +import pytest import pandas.util.testing as tm class CompressionTests(object): + def test_zip(self): - try: - import zipfile - except ImportError: - raise nose.SkipTest('need zipfile to run') + import zipfile with open(self.csv1, 'rb') as data_file: data = data_file.read() @@ -44,29 +42,27 @@ def test_zip(self): tmp.writestr(file_name, data) tmp.close() - self.assertRaisesRegexp(ValueError, 'Multiple files', - self.read_csv, path, compression='zip') + tm.assert_raises_regex(ValueError, 'Multiple files', + self.read_csv, path, compression='zip') - self.assertRaisesRegexp(ValueError, 'Multiple files', - self.read_csv, path, compression='infer') + tm.assert_raises_regex(ValueError, 'Multiple files', + self.read_csv, path, + compression='infer') with tm.ensure_clean() as path: tmp = zipfile.ZipFile(path, mode='w') tmp.close() - self.assertRaisesRegexp(ValueError, 'Zero files', - self.read_csv, path, compression='zip') + tm.assert_raises_regex(ValueError, 'Zero files', + self.read_csv, path, compression='zip') with tm.ensure_clean() as path: with open(path, 'wb') as f: - self.assertRaises(zipfile.BadZipfile, self.read_csv, - f, compression='zip') + pytest.raises(zipfile.BadZipfile, self.read_csv, + f, compression='zip') def test_gzip(self): - try: - import gzip - except ImportError: - raise nose.SkipTest('need gzip to run') + import gzip with open(self.csv1, 'rb') as data_file: data = data_file.read() @@ -92,10 +88,7 @@ def test_gzip(self): tm.assert_frame_equal(result, expected) def test_bz2(self): - try: - import bz2 - except ImportError: - raise nose.SkipTest('need bz2 to run') + import bz2 with open(self.csv1, 'rb') as data_file: data = data_file.read() @@ -109,8 +102,8 @@ def test_bz2(self): result = self.read_csv(path, compression='bz2') tm.assert_frame_equal(result, expected) - self.assertRaises(ValueError, self.read_csv, - path, compression='bz3') + pytest.raises(ValueError, self.read_csv, + path, compression='bz3') with open(path, 'rb') as fin: result = self.read_csv(fin, compression='bz2') @@ -166,5 +159,5 @@ def test_read_csv_infer_compression(self): def test_invalid_compression(self): msg = 'Unrecognized compression type: sfark' - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): self.read_csv('test_file.zip', compression='sfark') diff --git a/pandas/io/tests/parser/converters.py b/pandas/tests/io/parser/converters.py similarity index 86% rename from pandas/io/tests/parser/converters.py rename to pandas/tests/io/parser/converters.py index 68231d67534ee..1176b1e84e29b 100644 --- a/pandas/io/tests/parser/converters.py +++ b/pandas/tests/io/parser/converters.py @@ -7,23 +7,24 @@ from datetime import datetime -import nose +import pytest import numpy as np import pandas as pd import pandas.util.testing as tm -from pandas.lib import Timestamp +from pandas._libs.lib import Timestamp from pandas 
import DataFrame, Index from pandas.compat import parse_date, StringIO, lmap class ConverterTests(object): + def test_converters_type_must_be_dict(self): data = """index,A,B,C,D foo,2,3,4,5 """ - with tm.assertRaisesRegexp(TypeError, 'Type converters.+'): + with tm.assert_raises_regex(TypeError, 'Type converters.+'): self.read_csv(StringIO(data), converters=0) def test_converters(self): @@ -38,7 +39,7 @@ def test_converters(self): expected = self.read_csv(StringIO(data)) expected['D'] = expected['D'].map(parse_date) - tm.assertIsInstance(result['D'][0], (datetime, Timestamp)) + assert isinstance(result['D'][0], (datetime, Timestamp)) tm.assert_frame_equal(result, expected) tm.assert_frame_equal(result2, expected) @@ -55,7 +56,7 @@ def test_converters_no_implicit_conv(self): f = lambda x: x.strip() converter = {0: f} df = self.read_csv(StringIO(data), header=None, converters=converter) - self.assertEqual(df[0].dtype, object) + assert df[0].dtype == object def test_converters_euro_decimal_format(self): data = """Id;Number1;Number2;Text1;Text2;Number3 @@ -65,9 +66,9 @@ def test_converters_euro_decimal_format(self): f = lambda x: float(x.replace(",", ".")) converter = {'Number1': f, 'Number2': f, 'Number3': f} df2 = self.read_csv(StringIO(data), sep=';', converters=converter) - self.assertEqual(df2['Number1'].dtype, float) - self.assertEqual(df2['Number2'].dtype, float) - self.assertEqual(df2['Number3'].dtype, float) + assert df2['Number1'].dtype == float + assert df2['Number2'].dtype == float + assert df2['Number3'].dtype == float def test_converter_return_string_bug(self): # see gh-583 @@ -78,13 +79,13 @@ def test_converter_return_string_bug(self): f = lambda x: float(x.replace(",", ".")) converter = {'Number1': f, 'Number2': f, 'Number3': f} df2 = self.read_csv(StringIO(data), sep=';', converters=converter) - self.assertEqual(df2['Number1'].dtype, float) + assert df2['Number1'].dtype == float def test_converters_corner_with_nas(self): # skip aberration observed on Win64 Python 3.2.2 if hash(np.int64(-1)) != -2: - raise nose.SkipTest("skipping because of windows hash on Python" - " 3.2.2") + pytest.skip("skipping because of windows hash on Python" + " 3.2.2") data = """id,score,days 1,2,12 @@ -132,7 +133,7 @@ def convert_score(x): result = self.read_csv(fh, converters={'score': convert_score, 'days': convert_days}, na_values=['', None]) - self.assertTrue(pd.isnull(result['days'][1])) + assert pd.isna(result['days'][1]) fh = StringIO(data) result2 = self.read_csv(fh, converters={'score': convert_score, @@ -149,4 +150,4 @@ def test_converter_index_col_bug(self): xp = DataFrame({'B': [2, 4]}, index=Index([1, 3], name='A')) tm.assert_frame_equal(rs, xp) - self.assertEqual(rs.index.name, xp.index.name) + assert rs.index.name == xp.index.name diff --git a/pandas/io/tests/parser/data/iris.csv b/pandas/tests/io/parser/data/iris.csv similarity index 100% rename from pandas/io/tests/parser/data/iris.csv rename to pandas/tests/io/parser/data/iris.csv diff --git a/pandas/io/tests/parser/data/salaries.csv b/pandas/tests/io/parser/data/salaries.csv similarity index 100% rename from pandas/io/tests/parser/data/salaries.csv rename to pandas/tests/io/parser/data/salaries.csv diff --git a/pandas/io/tests/parser/data/salaries.csv.bz2 b/pandas/tests/io/parser/data/salaries.csv.bz2 similarity index 100% rename from pandas/io/tests/parser/data/salaries.csv.bz2 rename to pandas/tests/io/parser/data/salaries.csv.bz2 diff --git a/pandas/io/tests/parser/data/salaries.csv.gz 
b/pandas/tests/io/parser/data/salaries.csv.gz similarity index 100% rename from pandas/io/tests/parser/data/salaries.csv.gz rename to pandas/tests/io/parser/data/salaries.csv.gz diff --git a/pandas/io/tests/parser/data/salaries.csv.xz b/pandas/tests/io/parser/data/salaries.csv.xz similarity index 100% rename from pandas/io/tests/parser/data/salaries.csv.xz rename to pandas/tests/io/parser/data/salaries.csv.xz diff --git a/pandas/io/tests/parser/data/salaries.csv.zip b/pandas/tests/io/parser/data/salaries.csv.zip similarity index 100% rename from pandas/io/tests/parser/data/salaries.csv.zip rename to pandas/tests/io/parser/data/salaries.csv.zip diff --git a/pandas/io/tests/parser/data/sauron.SHIFT_JIS.csv b/pandas/tests/io/parser/data/sauron.SHIFT_JIS.csv similarity index 100% rename from pandas/io/tests/parser/data/sauron.SHIFT_JIS.csv rename to pandas/tests/io/parser/data/sauron.SHIFT_JIS.csv diff --git a/pandas/tests/io/parser/data/sub_char.csv b/pandas/tests/io/parser/data/sub_char.csv new file mode 100644 index 0000000000000..ff1fa777832c7 --- /dev/null +++ b/pandas/tests/io/parser/data/sub_char.csv @@ -0,0 +1,2 @@ +a,"b",c +1,2,3 \ No newline at end of file diff --git a/pandas/tests/io/parser/data/tar_csv.tar b/pandas/tests/io/parser/data/tar_csv.tar new file mode 100644 index 0000000000000..d1819550e0a00 Binary files /dev/null and b/pandas/tests/io/parser/data/tar_csv.tar differ diff --git a/pandas/tests/io/parser/data/tar_csv.tar.gz b/pandas/tests/io/parser/data/tar_csv.tar.gz new file mode 100644 index 0000000000000..b5a0f3e1b5805 Binary files /dev/null and b/pandas/tests/io/parser/data/tar_csv.tar.gz differ diff --git a/pandas/io/tests/parser/data/test1.csv b/pandas/tests/io/parser/data/test1.csv similarity index 100% rename from pandas/io/tests/parser/data/test1.csv rename to pandas/tests/io/parser/data/test1.csv diff --git a/pandas/io/tests/parser/data/test1.csv.bz2 b/pandas/tests/io/parser/data/test1.csv.bz2 similarity index 100% rename from pandas/io/tests/parser/data/test1.csv.bz2 rename to pandas/tests/io/parser/data/test1.csv.bz2 diff --git a/pandas/io/tests/parser/data/test1.csv.gz b/pandas/tests/io/parser/data/test1.csv.gz similarity index 100% rename from pandas/io/tests/parser/data/test1.csv.gz rename to pandas/tests/io/parser/data/test1.csv.gz diff --git a/pandas/io/tests/parser/data/test2.csv b/pandas/tests/io/parser/data/test2.csv similarity index 100% rename from pandas/io/tests/parser/data/test2.csv rename to pandas/tests/io/parser/data/test2.csv diff --git a/pandas/io/tests/parser/data/test_mmap.csv b/pandas/tests/io/parser/data/test_mmap.csv similarity index 100% rename from pandas/io/tests/parser/data/test_mmap.csv rename to pandas/tests/io/parser/data/test_mmap.csv diff --git a/pandas/io/tests/parser/data/tips.csv b/pandas/tests/io/parser/data/tips.csv similarity index 100% rename from pandas/io/tests/parser/data/tips.csv rename to pandas/tests/io/parser/data/tips.csv diff --git a/pandas/tests/formats/data/unicode_series.csv b/pandas/tests/io/parser/data/unicode_series.csv similarity index 100% rename from pandas/tests/formats/data/unicode_series.csv rename to pandas/tests/io/parser/data/unicode_series.csv diff --git a/pandas/io/tests/parser/data/utf16_ex.txt b/pandas/tests/io/parser/data/utf16_ex.txt similarity index 100% rename from pandas/io/tests/parser/data/utf16_ex.txt rename to pandas/tests/io/parser/data/utf16_ex.txt diff --git a/pandas/io/tests/parser/dialect.py b/pandas/tests/io/parser/dialect.py similarity index 95% rename from 
pandas/io/tests/parser/dialect.py rename to pandas/tests/io/parser/dialect.py index ee50cf812f72e..f756fe71bf684 100644 --- a/pandas/io/tests/parser/dialect.py +++ b/pandas/tests/io/parser/dialect.py @@ -9,7 +9,7 @@ from pandas import DataFrame from pandas.compat import StringIO -from pandas.io.common import ParserWarning +from pandas.errors import ParserWarning import pandas.util.testing as tm @@ -61,7 +61,7 @@ class InvalidDialect(object): data = 'a\n1' msg = 'Invalid dialect' - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): self.read_csv(StringIO(data), dialect=InvalidDialect) def test_dialect_conflict(self): diff --git a/pandas/io/tests/parser/dtypes.py b/pandas/tests/io/parser/dtypes.py similarity index 95% rename from pandas/io/tests/parser/dtypes.py rename to pandas/tests/io/parser/dtypes.py index abcd14e9499cb..7311c9200f269 100644 --- a/pandas/io/tests/parser/dtypes.py +++ b/pandas/tests/io/parser/dtypes.py @@ -5,17 +5,20 @@ for all of the parsers defined in parsers.py """ +import pytest + import numpy as np import pandas as pd import pandas.util.testing as tm from pandas import DataFrame, Series, Index, MultiIndex, Categorical from pandas.compat import StringIO -from pandas.types.dtypes import CategoricalDtype -from pandas.io.common import ParserWarning +from pandas.core.dtypes.dtypes import CategoricalDtype +from pandas.errors import ParserWarning class DtypeTests(object): + def test_passing_dtype(self): # see gh-6607 df = DataFrame(np.random.rand(5, 2).round(4), columns=list( @@ -39,9 +42,9 @@ def test_passing_dtype(self): tm.assert_frame_equal(result, df) # invalid dtype - self.assertRaises(TypeError, self.read_csv, path, - dtype={'A': 'foo', 'B': 'float64'}, - index_col=0) + pytest.raises(TypeError, self.read_csv, path, + dtype={'A': 'foo', 'B': 'float64'}, + index_col=0) # see gh-12048: empty frame actual = self.read_csv(StringIO('A,B'), dtype=str) @@ -57,8 +60,8 @@ def test_pass_dtype(self): 4,5.5""" result = self.read_csv(StringIO(data), dtype={'one': 'u1', 1: 'S1'}) - self.assertEqual(result['one'].dtype, 'u1') - self.assertEqual(result['two'].dtype, 'object') + assert result['one'].dtype == 'u1' + assert result['two'].dtype == 'object' def test_categorical_dtype(self): # GH 10153 @@ -212,9 +215,9 @@ def test_raise_on_passed_int_dtype_with_nas(self): 2001,106380451,10 2001,,11 2001,106380451,67""" - self.assertRaises(ValueError, self.read_csv, StringIO(data), - sep=",", skipinitialspace=True, - dtype={'DOY': np.int64}) + pytest.raises(ValueError, self.read_csv, StringIO(data), + sep=",", skipinitialspace=True, + dtype={'DOY': np.int64}) def test_dtype_with_converter(self): data = """a,b diff --git a/pandas/io/tests/parser/header.py b/pandas/tests/io/parser/header.py similarity index 79% rename from pandas/io/tests/parser/header.py rename to pandas/tests/io/parser/header.py index dc6d2ad1daa47..50ae4dae541ac 100644 --- a/pandas/io/tests/parser/header.py +++ b/pandas/tests/io/parser/header.py @@ -5,6 +5,8 @@ during parsing for all of the parsers defined in parsers.py """ +import pytest + import numpy as np import pandas.util.testing as tm @@ -17,7 +19,7 @@ class HeaderTests(object): def test_read_with_bad_header(self): errmsg = r"but only \d+ lines in file" - with tm.assertRaisesRegexp(ValueError, errmsg): + with tm.assert_raises_regex(ValueError, errmsg): s = StringIO(',,') self.read_csv(s, header=[10]) @@ -30,9 +32,9 @@ def test_bool_header_arg(self): a b""" for arg in [True, False]: - with tm.assertRaises(TypeError): + with 
pytest.raises(TypeError): self.read_csv(StringIO(data), header=arg) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): self.read_table(StringIO(data), header=arg) def test_no_header_prefix(self): @@ -48,9 +50,9 @@ def test_no_header_prefix(self): [11, 12, 13, 14, 15]], dtype=np.int64) tm.assert_almost_equal(df_pref.values, expected) - self.assert_index_equal(df_pref.columns, - Index(['Field0', 'Field1', 'Field2', - 'Field3', 'Field4'])) + tm.assert_index_equal(df_pref.columns, + Index(['Field0', 'Field1', 'Field2', + 'Field3', 'Field4'])) def test_header_with_index_col(self): data = """foo,1,2,3 @@ -60,7 +62,7 @@ def test_header_with_index_col(self): names = ['A', 'B', 'C'] df = self.read_csv(StringIO(data), names=names) - self.assertEqual(names, ['A', 'B', 'C']) + assert list(df.columns) == ['A', 'B', 'C'] values = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] expected = DataFrame(values, index=['foo', 'bar', 'baz'], @@ -117,27 +119,27 @@ def test_header_multi_index(self): # no as_recarray with tm.assert_produces_warning( FutureWarning, check_stacklevel=False): - self.assertRaises(ValueError, self.read_csv, - StringIO(data), header=[0, 1, 2, 3], - index_col=[0, 1], as_recarray=True, - tupleize_cols=False) - - # names - self.assertRaises(ValueError, self.read_csv, + pytest.raises(ValueError, self.read_csv, StringIO(data), header=[0, 1, 2, 3], - index_col=[0, 1], names=['foo', 'bar'], + index_col=[0, 1], as_recarray=True, tupleize_cols=False) + # names + pytest.raises(ValueError, self.read_csv, + StringIO(data), header=[0, 1, 2, 3], + index_col=[0, 1], names=['foo', 'bar'], + tupleize_cols=False) + # usecols - self.assertRaises(ValueError, self.read_csv, - StringIO(data), header=[0, 1, 2, 3], - index_col=[0, 1], usecols=['foo', 'bar'], - tupleize_cols=False) + pytest.raises(ValueError, self.read_csv, + StringIO(data), header=[0, 1, 2, 3], + index_col=[0, 1], usecols=['foo', 'bar'], + tupleize_cols=False) # non-numeric index_col - self.assertRaises(ValueError, self.read_csv, - StringIO(data), header=[0, 1, 2, 3], - index_col=['foo', 'bar'], tupleize_cols=False) + pytest.raises(ValueError, self.read_csv, + StringIO(data), header=[0, 1, 2, 3], + index_col=['foo', 'bar'], tupleize_cols=False) def test_header_multiindex_common_format(self): @@ -270,8 +272,24 @@ def test_no_header(self): tm.assert_almost_equal(df.values, expected) tm.assert_almost_equal(df.values, df2.values) - self.assert_index_equal(df_pref.columns, - Index(['X0', 'X1', 'X2', 'X3', 'X4'])) - self.assert_index_equal(df.columns, Index(lrange(5))) - - self.assert_index_equal(df2.columns, Index(names)) + tm.assert_index_equal(df_pref.columns, + Index(['X0', 'X1', 'X2', 'X3', 'X4'])) + tm.assert_index_equal(df.columns, Index(lrange(5))) + + tm.assert_index_equal(df2.columns, Index(names)) + + def test_non_int_header(self): + # GH 16338 + msg = 'header must be integer or list of integers' + data = """1,2\n3,4""" + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(StringIO(data), sep=',', header=['a', 'b']) + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(StringIO(data), sep=',', header='string_header') + + def test_singleton_header(self): + # See GH #7757 + data = """a,b,c\n0,1,2\n1,2,3""" + df = self.read_csv(StringIO(data), header=[0]) + expected = DataFrame({"a": [0, 1], "b": [1, 2], "c": [2, 3]}) + tm.assert_frame_equal(df, expected) diff --git a/pandas/io/tests/parser/index_col.py b/pandas/tests/io/parser/index_col.py similarity index 92% rename from pandas/io/tests/parser/index_col.py rename to 
pandas/tests/io/parser/index_col.py index 6eb15eb3e043c..ee9b210443636 100644 --- a/pandas/io/tests/parser/index_col.py +++ b/pandas/tests/io/parser/index_col.py @@ -6,6 +6,8 @@ the parsers defined in parsers.py """ +import pytest + import pandas.util.testing as tm from pandas import DataFrame, Index, MultiIndex @@ -29,8 +31,8 @@ def test_index_col_named(self): xp = self.read_csv(StringIO(data), header=0).set_index('ID') tm.assert_frame_equal(rs, xp) - self.assertRaises(ValueError, self.read_csv, StringIO(no_header), - index_col='ID') + pytest.raises(ValueError, self.read_csv, StringIO(no_header), + index_col='ID') data = """\ 1,2,3,4,hello @@ -43,16 +45,16 @@ def test_index_col_named(self): index=Index(['hello', 'world', 'foo'], name='message')) rs = self.read_csv(StringIO(data), names=names, index_col=['message']) tm.assert_frame_equal(xp, rs) - self.assertEqual(xp.index.name, rs.index.name) + assert xp.index.name == rs.index.name rs = self.read_csv(StringIO(data), names=names, index_col='message') tm.assert_frame_equal(xp, rs) - self.assertEqual(xp.index.name, rs.index.name) + assert xp.index.name == rs.index.name def test_index_col_is_true(self): # see gh-9798 - self.assertRaises(ValueError, self.read_csv, - StringIO(self.ts_data), index_col=True) + pytest.raises(ValueError, self.read_csv, + StringIO(self.ts_data), index_col=True) def test_infer_index_col(self): data = """A,B,C @@ -61,7 +63,7 @@ def test_infer_index_col(self): baz,7,8,9 """ data = self.read_csv(StringIO(data)) - self.assertTrue(data.index.equals(Index(['foo', 'bar', 'baz']))) + assert data.index.equals(Index(['foo', 'bar', 'baz'])) def test_empty_index_col_scenarios(self): data = 'x,y,z' diff --git a/pandas/tests/io/parser/mangle_dupes.py b/pandas/tests/io/parser/mangle_dupes.py new file mode 100644 index 0000000000000..e2efb1377f8b0 --- /dev/null +++ b/pandas/tests/io/parser/mangle_dupes.py @@ -0,0 +1,64 @@ +# -*- coding: utf-8 -*- + +""" +Tests that duplicate columns are handled appropriately when parsed by the +CSV engine. In general, the expected result is that they are either thoroughly +de-duplicated (if mangling requested) or ignored otherwise. +""" + +from pandas.compat import StringIO + + +class DupeColumnTests(object): + def test_basic(self): + # TODO: add test for condition "mangle_dupe_cols=False" + # once it is actually supported (gh-12935) + data = "a,a,b,b,b\n1,2,3,4,5" + + for method in ("read_csv", "read_table"): + # Check default behavior. 
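The new `mangle_dupes.py` module (this hunk and the next) pins down how duplicate headers are de-duplicated: repeats gain a `.1`, `.2`, ... suffix, and per gh-17060 a generated name that would itself collide is suffixed again. The same checks expressed as plain `read_csv` calls:

```python
from io import StringIO

import pandas as pd

# Default behavior (mangle_dupe_cols=True): number the repeats.
df = pd.read_csv(StringIO("a,a,b,b,b\n1,2,3,4,5"))
print(list(df.columns))  # ['a', 'a.1', 'b', 'b.1', 'b.2']

# Thorough mangling: 'a.1' is already taken, so the renamed
# duplicate becomes 'a.1.1' instead of clashing.
df = pd.read_csv(StringIO("a,a,a.1\n1,2,3"))
print(list(df.columns))  # ['a', 'a.1', 'a.1.1']
```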
+ expected = ["a", "a.1", "b", "b.1", "b.2"] + df = getattr(self, method)(StringIO(data), sep=",") + assert list(df.columns) == expected + + df = getattr(self, method)(StringIO(data), sep=",", + mangle_dupe_cols=True) + assert list(df.columns) == expected + + def test_thorough_mangle_columns(self): + # see gh-17060 + data = "a,a,a.1\n1,2,3" + df = self.read_csv(StringIO(data), sep=",", mangle_dupe_cols=True) + assert list(df.columns) == ["a", "a.1", "a.1.1"] + + data = "a,a,a.1,a.1.1,a.1.1.1,a.1.1.1.1\n1,2,3,4,5,6" + df = self.read_csv(StringIO(data), sep=",", mangle_dupe_cols=True) + assert list(df.columns) == ["a", "a.1", "a.1.1", "a.1.1.1", + "a.1.1.1.1", "a.1.1.1.1.1"] + + data = "a,a,a.3,a.1,a.2,a,a\n1,2,3,4,5,6,7" + df = self.read_csv(StringIO(data), sep=",", mangle_dupe_cols=True) + assert list(df.columns) == ["a", "a.1", "a.3", "a.1.1", + "a.2", "a.2.1", "a.3.1"] + + def test_thorough_mangle_names(self): + # see gh-17095 + data = "a,b,b\n1,2,3" + names = ["a.1", "a.1", "a.1.1"] + df = self.read_csv(StringIO(data), sep=",", names=names, + mangle_dupe_cols=True) + assert list(df.columns) == ["a.1", "a.1.1", "a.1.1.1"] + + data = "a,b,c,d,e,f\n1,2,3,4,5,6" + names = ["a", "a", "a.1", "a.1.1", "a.1.1.1", "a.1.1.1.1"] + df = self.read_csv(StringIO(data), sep=",", names=names, + mangle_dupe_cols=True) + assert list(df.columns) == ["a", "a.1", "a.1.1", "a.1.1.1", + "a.1.1.1.1", "a.1.1.1.1.1"] + + data = "a,b,c,d,e,f,g\n1,2,3,4,5,6,7" + names = ["a", "a", "a.3", "a.1", "a.2", "a", "a"] + df = self.read_csv(StringIO(data), sep=",", names=names, + mangle_dupe_cols=True) + assert list(df.columns) == ["a", "a.1", "a.3", "a.1.1", + "a.2", "a.2.1", "a.3.1"] diff --git a/pandas/io/tests/parser/multithread.py b/pandas/tests/io/parser/multithread.py similarity index 100% rename from pandas/io/tests/parser/multithread.py rename to pandas/tests/io/parser/multithread.py diff --git a/pandas/io/tests/parser/na_values.py b/pandas/tests/io/parser/na_values.py similarity index 93% rename from pandas/io/tests/parser/na_values.py rename to pandas/tests/io/parser/na_values.py index 2cbd7cdedf2ab..7fbf174e19eee 100644 --- a/pandas/io/tests/parser/na_values.py +++ b/pandas/tests/io/parser/na_values.py @@ -8,10 +8,10 @@ import numpy as np from numpy import nan -import pandas.io.parsers as parsers +import pandas.io.common as com import pandas.util.testing as tm -from pandas import DataFrame, MultiIndex +from pandas import DataFrame, Index, MultiIndex from pandas.compat import StringIO, range @@ -70,9 +70,9 @@ def test_non_string_na_values(self): def test_default_na_values(self): _NA_VALUES = set(['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', - '#N/A', 'N/A', 'NA', '#NA', 'NULL', 'NaN', - 'nan', '-NaN', '-nan', '#N/A N/A', '']) - self.assertEqual(_NA_VALUES, parsers._NA_VALUES) + '#N/A', 'N/A', 'n/a', 'NA', '#NA', 'NULL', 'null', + 'NaN', 'nan', '-NaN', '-nan', '#N/A N/A', '']) + assert _NA_VALUES == com._NA_VALUES nv = len(_NA_VALUES) def f(i, v): @@ -248,8 +248,8 @@ def test_na_trailing_columns(self): 2012-05-12,USD,SBUX,SELL,500""" result = self.read_csv(StringIO(data)) - self.assertEqual(result['Date'][1], '2012-05-12') - self.assertTrue(result['UnitPrice'].isnull().all()) + assert result['Date'][1] == '2012-05-12' + assert result['UnitPrice'].isna().all() def test_na_values_scalar(self): # see gh-12224 @@ -303,3 +303,12 @@ def test_na_values_uint64(self): expected = DataFrame([[str(2**63), 1], ['', 2]]) out = self.read_csv(StringIO(data), header=None) tm.assert_frame_equal(out, expected) + + def 
test_empty_na_values_no_default_with_index(self): + # see gh-15835 + data = "a,1\nb,2" + + expected = DataFrame({'1': [2]}, index=Index(["b"], name="a")) + out = self.read_csv(StringIO(data), keep_default_na=False, index_col=0) + + tm.assert_frame_equal(out, expected) diff --git a/pandas/io/tests/parser/parse_dates.py b/pandas/tests/io/parser/parse_dates.py similarity index 63% rename from pandas/io/tests/parser/parse_dates.py rename to pandas/tests/io/parser/parse_dates.py index e4af1ff70a498..e1ae1b577ea29 100644 --- a/pandas/io/tests/parser/parse_dates.py +++ b/pandas/tests/io/parser/parse_dates.py @@ -6,25 +6,28 @@ """ from distutils.version import LooseVersion -from datetime import datetime +from datetime import datetime, date -import nose +import pytest import numpy as np -import pandas.lib as lib -from pandas.lib import Timestamp +import pandas._libs.lib as lib +from pandas._libs.lib import Timestamp import pandas as pd import pandas.io.parsers as parsers -import pandas.tseries.tools as tools +import pandas.core.tools.datetimes as tools import pandas.util.testing as tm -from pandas import DataFrame, Series, Index, DatetimeIndex +import pandas.io.date_converters as conv +from pandas import DataFrame, Series, Index, DatetimeIndex, MultiIndex from pandas import compat from pandas.compat import parse_date, StringIO, lrange -from pandas.tseries.index import date_range +from pandas.compat.numpy import np_array_datetime64_compat +from pandas.core.indexes.datetimes import date_range class ParseDatesTests(object): + def test_separator_date_conflict(self): # Regression test for gh-4678: make sure thousands separator and # date parsing do not conflict. @@ -57,26 +60,26 @@ def func(*date_cols): prefix='X', parse_dates={'nominal': [1, 2], 'actual': [1, 3]}) - self.assertIn('nominal', df) - self.assertIn('actual', df) - self.assertNotIn('X1', df) - self.assertNotIn('X2', df) - self.assertNotIn('X3', df) + assert 'nominal' in df + assert 'actual' in df + assert 'X1' not in df + assert 'X2' not in df + assert 'X3' not in df d = datetime(1999, 1, 27, 19, 0) - self.assertEqual(df.loc[0, 'nominal'], d) + assert df.loc[0, 'nominal'] == d df = self.read_csv(StringIO(data), header=None, date_parser=func, parse_dates={'nominal': [1, 2], 'actual': [1, 3]}, keep_date_col=True) - self.assertIn('nominal', df) - self.assertIn('actual', df) + assert 'nominal' in df + assert 'actual' in df - self.assertIn(1, df) - self.assertIn(2, df) - self.assertIn(3, df) + assert 1 in df + assert 2 in df + assert 3 in df data = """\ KORD,19990127, 19:00:00, 18:56:00, 0.8100, 2.8100, 7.2000, 0.0000, 280.0000 @@ -89,23 +92,23 @@ def func(*date_cols): df = self.read_csv(StringIO(data), header=None, prefix='X', parse_dates=[[1, 2], [1, 3]]) - self.assertIn('X1_X2', df) - self.assertIn('X1_X3', df) - self.assertNotIn('X1', df) - self.assertNotIn('X2', df) - self.assertNotIn('X3', df) + assert 'X1_X2' in df + assert 'X1_X3' in df + assert 'X1' not in df + assert 'X2' not in df + assert 'X3' not in df d = datetime(1999, 1, 27, 19, 0) - self.assertEqual(df.loc[0, 'X1_X2'], d) + assert df.loc[0, 'X1_X2'] == d df = self.read_csv(StringIO(data), header=None, parse_dates=[[1, 2], [1, 3]], keep_date_col=True) - self.assertIn('1_2', df) - self.assertIn('1_3', df) - self.assertIn(1, df) - self.assertIn(2, df) - self.assertIn(3, df) + assert '1_2' in df + assert '1_3' in df + assert 1 in df + assert 2 in df + assert 3 in df data = '''\ KORD,19990127 19:00:00, 18:56:00, 0.8100, 2.8100, 7.2000, 0.0000, 280.0000 @@ -117,7 +120,7 @@ def 
func(*date_cols): df = self.read_csv(StringIO(data), sep=',', header=None, parse_dates=[1], index_col=1) d = datetime(1999, 1, 27, 19, 0) - self.assertEqual(df.index[0], d) + assert df.index[0] == d def test_multiple_date_cols_int_cast(self): data = ("KORD,19990127, 19:00:00, 18:56:00, 0.8100\n" @@ -132,7 +135,7 @@ def test_multiple_date_cols_int_cast(self): # it works! df = self.read_csv(StringIO(data), header=None, parse_dates=date_spec, date_parser=conv.parse_date_time) - self.assertIn('nominal', df) + assert 'nominal' in df def test_multiple_date_col_timestamp_parse(self): data = """05/31/2012,15:30:00.029,1306.25,1,E,0,,1306.25 @@ -141,7 +144,7 @@ def test_multiple_date_col_timestamp_parse(self): parse_dates=[[0, 1]], date_parser=Timestamp) ex_val = Timestamp('05/31/2012 15:30:00.029') - self.assertEqual(result['0_1'][0], ex_val) + assert result['0_1'][0] == ex_val def test_multiple_date_cols_with_header(self): data = """\ @@ -154,7 +157,7 @@ def test_multiple_date_cols_with_header(self): KORD,19990127, 23:00:00, 22:56:00, -0.5900, 1.7100, 4.6000, 0.0000, 280.0000""" df = self.read_csv(StringIO(data), parse_dates={'nominal': [1, 2]}) - self.assertNotIsInstance(df.nominal[0], compat.string_types) + assert not isinstance(df.nominal[0], compat.string_types) ts_data = """\ ID,date,nominalTime,actualTime,A,B,C,D,E @@ -167,8 +170,8 @@ def test_multiple_date_cols_with_header(self): """ def test_multiple_date_col_name_collision(self): - self.assertRaises(ValueError, self.read_csv, StringIO(self.ts_data), - parse_dates={'ID': [1, 2]}) + with pytest.raises(ValueError): + self.read_csv(StringIO(self.ts_data), parse_dates={'ID': [1, 2]}) data = """\ date_NominalTime,date,NominalTime,ActualTime,TDew,TAir,Windspeed,Precip,WindDir @@ -179,8 +182,8 @@ def test_multiple_date_col_name_collision(self): KORD5,19990127, 22:00:00, 21:56:00, -0.5900, 1.7100, 5.1000, 0.0000, 290.0000 KORD6,19990127, 23:00:00, 22:56:00, -0.5900, 1.7100, 4.6000, 0.0000, 280.0000""" # noqa - self.assertRaises(ValueError, self.read_csv, StringIO(data), - parse_dates=[[1, 2]]) + with pytest.raises(ValueError): + self.read_csv(StringIO(data), parse_dates=[[1, 2]]) def test_date_parser_int_bug(self): # See gh-3071 @@ -238,7 +241,7 @@ def test_parse_dates_implicit_first_col(self): """ df = self.read_csv(StringIO(data), parse_dates=True) expected = self.read_csv(StringIO(data), index_col=0, parse_dates=True) - self.assertIsInstance( + assert isinstance( df.index[0], (datetime, np.datetime64, Timestamp)) tm.assert_frame_equal(df, expected) @@ -267,9 +270,9 @@ def test_yy_format_with_yearfirst(self): # See gh-217 import dateutil if dateutil.__version__ >= LooseVersion('2.5.0'): - raise nose.SkipTest("testing yearfirst=True not-support" - "on datetutil < 2.5.0 this works but" - "is wrong") + pytest.skip("testing yearfirst=True not-support" + "on datetutil < 2.5.0 this works but" + "is wrong") rs = self.read_csv(StringIO(data), index_col=0, parse_dates=[['date', 'time']]) @@ -317,13 +320,13 @@ def test_multi_index_parse_dates(self): 20090103,three,c,4,5 """ df = self.read_csv(StringIO(data), index_col=[0, 1], parse_dates=True) - self.assertIsInstance(df.index.levels[0][0], - (datetime, np.datetime64, Timestamp)) + assert isinstance(df.index.levels[0][0], + (datetime, np.datetime64, Timestamp)) # specify columns out of order! 
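`test_multiple_date_col_name_collision` above also shows the idiomatic pytest replacement for `self.assertRaises(ValueError, f, *args)`: the `with pytest.raises(ValueError):` context manager. The collision it guards against is easy to reproduce standalone (the two-row sample here is hypothetical, trimmed from the test's data):

```python
from io import StringIO

import pytest
import pandas as pd

data = ("date_NominalTime,date,NominalTime\n"
        "KORD1,19990127, 19:00:00")

# parse_dates=[[1, 2]] would name the merged column 'date_NominalTime',
# colliding with the existing column of that name -> ValueError.
with pytest.raises(ValueError):
    pd.read_csv(StringIO(data), parse_dates=[[1, 2]])
```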
df2 = self.read_csv(StringIO(data), index_col=[1, 0], parse_dates=True) - self.assertIsInstance(df2.index.levels[1][0], - (datetime, np.datetime64, Timestamp)) + assert isinstance(df2.index.levels[1][0], + (datetime, np.datetime64, Timestamp)) def test_parse_dates_custom_euroformat(self): text = """foo,bar,baz @@ -344,11 +347,11 @@ def test_parse_dates_custom_euroformat(self): tm.assert_frame_equal(df, expected) parser = lambda d: parse_date(d, day_first=True) - self.assertRaises(TypeError, self.read_csv, - StringIO(text), skiprows=[0], - names=['time', 'Q', 'NTU'], index_col=0, - parse_dates=True, date_parser=parser, - na_values=['NA']) + pytest.raises(TypeError, self.read_csv, + StringIO(text), skiprows=[0], + names=['time', 'Q', 'NTU'], index_col=0, + parse_dates=True, date_parser=parser, + na_values=['NA']) def test_parse_tz_aware(self): # See gh-1693 @@ -358,15 +361,15 @@ def test_parse_tz_aware(self): # it works result = self.read_csv(data, index_col=0, parse_dates=True) stamp = result.index[0] - self.assertEqual(stamp.minute, 39) + assert stamp.minute == 39 try: - self.assertIs(result.index.tz, pytz.utc) + assert result.index.tz is pytz.utc except AssertionError: # hello Yaroslav arr = result.index.to_pydatetime() result = tools.to_datetime(arr, utc=True)[0] - self.assertEqual(stamp.minute, result.minute) - self.assertEqual(stamp.hour, result.hour) - self.assertEqual(stamp.day, result.day) + assert stamp.minute == result.minute + assert stamp.hour == result.hour + assert stamp.day == result.day def test_multiple_date_cols_index(self): data = """ @@ -399,7 +402,7 @@ def test_multiple_date_cols_chunked(self): chunks = list(reader) - self.assertNotIn('nominalTime', df) + assert 'nominalTime' not in df tm.assert_frame_equal(chunks[0], df[:2]) tm.assert_frame_equal(chunks[1], df[2:4]) @@ -432,11 +435,11 @@ def test_read_with_parse_dates_scalar_non_bool(self): data = """A,B,C 1,2,2003-11-1""" - tm.assertRaisesRegexp(TypeError, errmsg, self.read_csv, - StringIO(data), parse_dates="C") - tm.assertRaisesRegexp(TypeError, errmsg, self.read_csv, - StringIO(data), parse_dates="C", - index_col="C") + tm.assert_raises_regex(TypeError, errmsg, self.read_csv, + StringIO(data), parse_dates="C") + tm.assert_raises_regex(TypeError, errmsg, self.read_csv, + StringIO(data), parse_dates="C", + index_col="C") def test_read_with_parse_dates_invalid_type(self): errmsg = ("Only booleans, lists, and " @@ -445,19 +448,20 @@ def test_read_with_parse_dates_invalid_type(self): data = """A,B,C 1,2,2003-11-1""" - tm.assertRaisesRegexp(TypeError, errmsg, self.read_csv, - StringIO(data), parse_dates=(1,)) - tm.assertRaisesRegexp(TypeError, errmsg, self.read_csv, - StringIO(data), parse_dates=np.array([4, 5])) - tm.assertRaisesRegexp(TypeError, errmsg, self.read_csv, - StringIO(data), parse_dates=set([1, 3, 3])) + tm.assert_raises_regex(TypeError, errmsg, self.read_csv, + StringIO(data), parse_dates=(1,)) + tm.assert_raises_regex(TypeError, errmsg, + self.read_csv, StringIO(data), + parse_dates=np.array([4, 5])) + tm.assert_raises_regex(TypeError, errmsg, self.read_csv, + StringIO(data), parse_dates=set([1, 3, 3])) def test_parse_dates_empty_string(self): # see gh-2263 data = "Date, test\n2012-01-01, 1\n,2" result = self.read_csv(StringIO(data), parse_dates=["Date"], na_filter=False) - self.assertTrue(result['Date'].isnull()[1]) + assert result['Date'].isna()[1] def test_parse_dates_noconvert_thousands(self): # see gh-14066 @@ -490,3 +494,164 @@ def test_parse_dates_noconvert_thousands(self): result = 
self.read_csv(StringIO(data), index_col=[0, 1], parse_dates=True, thousands='.') tm.assert_frame_equal(result, expected) + + def test_parse_date_time_multi_level_column_name(self): + data = """\ +D,T,A,B +date, time,a,b +2001-01-05, 09:00:00, 0.0, 10. +2001-01-06, 00:00:00, 1.0, 11. +""" + datecols = {'date_time': [0, 1]} + result = self.read_csv(StringIO(data), sep=',', header=[0, 1], + parse_dates=datecols, + date_parser=conv.parse_date_time) + + expected_data = [[datetime(2001, 1, 5, 9, 0, 0), 0., 10.], + [datetime(2001, 1, 6, 0, 0, 0), 1., 11.]] + expected = DataFrame(expected_data, + columns=['date_time', ('A', 'a'), ('B', 'b')]) + tm.assert_frame_equal(result, expected) + + def test_parse_date_time(self): + dates = np.array(['2007/1/3', '2008/2/4'], dtype=object) + times = np.array(['05:07:09', '06:08:00'], dtype=object) + expected = np.array([datetime(2007, 1, 3, 5, 7, 9), + datetime(2008, 2, 4, 6, 8, 0)]) + + result = conv.parse_date_time(dates, times) + assert (result == expected).all() + + data = """\ +date, time, a, b +2001-01-05, 10:00:00, 0.0, 10. +2001-01-05, 00:00:00, 1., 11. +""" + datecols = {'date_time': [0, 1]} + df = self.read_csv(StringIO(data), sep=',', header=0, + parse_dates=datecols, + date_parser=conv.parse_date_time) + assert 'date_time' in df + assert df.date_time.loc[0] == datetime(2001, 1, 5, 10, 0, 0) + + data = ("KORD,19990127, 19:00:00, 18:56:00, 0.8100\n" + "KORD,19990127, 20:00:00, 19:56:00, 0.0100\n" + "KORD,19990127, 21:00:00, 20:56:00, -0.5900\n" + "KORD,19990127, 21:00:00, 21:18:00, -0.9900\n" + "KORD,19990127, 22:00:00, 21:56:00, -0.5900\n" + "KORD,19990127, 23:00:00, 22:56:00, -0.5900") + + date_spec = {'nominal': [1, 2], 'actual': [1, 3]} + df = self.read_csv(StringIO(data), header=None, parse_dates=date_spec, + date_parser=conv.parse_date_time) + + def test_parse_date_fields(self): + years = np.array([2007, 2008]) + months = np.array([1, 2]) + days = np.array([3, 4]) + result = conv.parse_date_fields(years, months, days) + expected = np.array([datetime(2007, 1, 3), datetime(2008, 2, 4)]) + assert (result == expected).all() + + data = ("year, month, day, a\n 2001 , 01 , 10 , 10.\n" + "2001 , 02 , 1 , 11.") + datecols = {'ymd': [0, 1, 2]} + df = self.read_csv(StringIO(data), sep=',', header=0, + parse_dates=datecols, + date_parser=conv.parse_date_fields) + assert 'ymd' in df + assert df.ymd.loc[0] == datetime(2001, 1, 10) + + def test_datetime_six_col(self): + years = np.array([2007, 2008]) + months = np.array([1, 2]) + days = np.array([3, 4]) + hours = np.array([5, 6]) + minutes = np.array([7, 8]) + seconds = np.array([9, 0]) + expected = np.array([datetime(2007, 1, 3, 5, 7, 9), + datetime(2008, 2, 4, 6, 8, 0)]) + + result = conv.parse_all_fields(years, months, days, + hours, minutes, seconds) + + assert (result == expected).all() + + data = """\ +year, month, day, hour, minute, second, a, b +2001, 01, 05, 10, 00, 0, 0.0, 10. +2001, 01, 5, 10, 0, 00, 1., 11. +""" + datecols = {'ymdHMS': [0, 1, 2, 3, 4, 5]} + df = self.read_csv(StringIO(data), sep=',', header=0, + parse_dates=datecols, + date_parser=conv.parse_all_fields) + assert 'ymdHMS' in df + assert df.ymdHMS.loc[0] == datetime(2001, 1, 5, 10, 0, 0) + + def test_datetime_fractional_seconds(self): + data = """\ +year, month, day, hour, minute, second, a, b +2001, 01, 05, 10, 00, 0.123456, 0.0, 10. +2001, 01, 5, 10, 0, 0.500000, 1., 11. 
+""" + datecols = {'ymdHMS': [0, 1, 2, 3, 4, 5]} + df = self.read_csv(StringIO(data), sep=',', header=0, + parse_dates=datecols, + date_parser=conv.parse_all_fields) + assert 'ymdHMS' in df + assert df.ymdHMS.loc[0] == datetime(2001, 1, 5, 10, 0, 0, + microsecond=123456) + assert df.ymdHMS.loc[1] == datetime(2001, 1, 5, 10, 0, 0, + microsecond=500000) + + def test_generic(self): + data = "year, month, day, a\n 2001, 01, 10, 10.\n 2001, 02, 1, 11." + datecols = {'ym': [0, 1]} + dateconverter = lambda y, m: date(year=int(y), month=int(m), day=1) + df = self.read_csv(StringIO(data), sep=',', header=0, + parse_dates=datecols, + date_parser=dateconverter) + assert 'ym' in df + assert df.ym.loc[0] == date(2001, 1, 1) + + def test_dateparser_resolution_if_not_ns(self): + # GH 10245 + data = """\ +date,time,prn,rxstatus +2013-11-03,19:00:00,126,00E80000 +2013-11-03,19:00:00,23,00E80000 +2013-11-03,19:00:00,13,00E80000 +""" + + def date_parser(date, time): + datetime = np_array_datetime64_compat( + date + 'T' + time + 'Z', dtype='datetime64[s]') + return datetime + + df = self.read_csv(StringIO(data), date_parser=date_parser, + parse_dates={'datetime': ['date', 'time']}, + index_col=['datetime', 'prn']) + + datetimes = np_array_datetime64_compat(['2013-11-03T19:00:00Z'] * 3, + dtype='datetime64[s]') + df_correct = DataFrame(data={'rxstatus': ['00E80000'] * 3}, + index=MultiIndex.from_tuples( + [(datetimes[0], 126), + (datetimes[1], 23), + (datetimes[2], 13)], + names=['datetime', 'prn'])) + tm.assert_frame_equal(df, df_correct) + + def test_parse_date_column_with_empty_string(self): + # GH 6428 + data = """case,opdate + 7,10/18/2006 + 7,10/18/2008 + 621, """ + result = self.read_csv(StringIO(data), parse_dates=['opdate']) + expected_data = [[7, '10/18/2006'], + [7, '10/18/2008'], + [621, ' ']] + expected = DataFrame(expected_data, columns=['case', 'opdate']) + tm.assert_frame_equal(result, expected) diff --git a/pandas/io/tests/parser/python_parser_only.py b/pandas/tests/io/parser/python_parser_only.py similarity index 76% rename from pandas/io/tests/parser/python_parser_only.py rename to pandas/tests/io/parser/python_parser_only.py index ad62aaa275127..a0784d3aeae2d 100644 --- a/pandas/io/tests/parser/python_parser_only.py +++ b/pandas/tests/io/parser/python_parser_only.py @@ -8,30 +8,33 @@ """ import csv -import sys -import nose +import pytest import pandas.util.testing as tm from pandas import DataFrame, Index from pandas import compat +from pandas.errors import ParserError from pandas.compat import StringIO, BytesIO, u class PythonParserTests(object): - def test_negative_skipfooter_raises(self): - text = """#foo,a,b,c -#foo,a,b,c -#foo,a,b,c -#foo,a,b,c -#foo,a,b,c -#foo,a,b,c -1/1/2000,1.,2.,3. 
-1/2/2000,4,5,6 -1/3/2000,7,8,9 -""" - with tm.assertRaisesRegexp( - ValueError, 'skip footer cannot be negative'): + def test_invalid_skipfooter(self): + text = "a\n1\n2" + + # see gh-15925 (comment) + msg = "skipfooter must be an integer" + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(StringIO(text), skipfooter="foo") + + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(StringIO(text), skipfooter=1.5) + + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(StringIO(text), skipfooter=True) + + msg = "skipfooter cannot be negative" + with tm.assert_raises_regex(ValueError, msg): self.read_csv(StringIO(text), skipfooter=-1) def test_sniff_delimiter(self): @@ -41,8 +44,8 @@ def test_sniff_delimiter(self): baz|7|8|9 """ data = self.read_csv(StringIO(text), index_col=0, sep=None) - self.assert_index_equal(data.index, - Index(['foo', 'bar', 'baz'], name='index')) + tm.assert_index_equal(data.index, + Index(['foo', 'bar', 'baz'], name='index')) data2 = self.read_csv(StringIO(text), index_col=0, delimiter='|') tm.assert_frame_equal(data, data2) @@ -78,7 +81,7 @@ def test_sniff_delimiter(self): def test_BytesIO_input(self): if not compat.PY3: - raise nose.SkipTest( + pytest.skip( "Bytes-related test - only needs to work on Python 3") data = BytesIO("שלום::1234\n562::123".encode('cp1255')) @@ -88,16 +91,9 @@ def test_BytesIO_input(self): def test_single_line(self): # see gh-6607: sniff separator - - buf = StringIO() - sys.stdout = buf - - try: - df = self.read_csv(StringIO('1,2'), names=['a', 'b'], - header=None, sep=None) - tm.assert_frame_equal(DataFrame({'a': [1], 'b': [2]}), df) - finally: - sys.stdout = sys.__stdout__ + df = self.read_csv(StringIO('1,2'), names=['a', 'b'], + header=None, sep=None) + tm.assert_frame_equal(DataFrame({'a': [1], 'b': [2]}), df) def test_skipfooter(self): # see gh-6607 @@ -129,7 +125,7 @@ def test_decompression_regex_sep(self): import gzip import bz2 except ImportError: - raise nose.SkipTest('need gzip and bz2 to run') + pytest.skip('need gzip and bz2 to run') with open(self.csv1, 'rb') as f: data = f.read() @@ -152,8 +148,8 @@ def test_decompression_regex_sep(self): result = self.read_csv(path, sep='::', compression='bz2') tm.assert_frame_equal(result, expected) - self.assertRaises(ValueError, self.read_csv, - path, compression='bz3') + pytest.raises(ValueError, self.read_csv, + path, compression='bz3') def test_read_table_buglet_4x_multiindex(self): # see gh-6607 @@ -164,7 +160,7 @@ def test_read_table_buglet_4x_multiindex(self): x q 30 3 -0.6662 -0.5243 -0.3580 0.89145 2.5838""" df = self.read_table(StringIO(text), sep=r'\s+') - self.assertEqual(df.index.names, ('one', 'two', 'three', 'four')) + assert df.index.names == ('one', 'two', 'three', 'four') # see gh-6893 data = ' A B C\na b c\n1 3 7 0 3 6\n3 1 4 1 5 9' @@ -212,27 +208,29 @@ def test_multi_char_sep_quotes(self): data = 'a,,b\n1,,a\n2,,"2,,b"' msg = 'ignored when a multi-char delimiter is used' - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ParserError, msg): self.read_csv(StringIO(data), sep=',,') # We expect no match, so there should be an assertion # error out of the inner context manager. 
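The nested pattern used here deserves a gloss: `tm.assert_raises_regex` (like the `assertRaisesRegexp` it replaces) raises `AssertionError` when the caught exception's message does not match the pattern, so wrapping it in `pytest.raises(AssertionError)` asserts a *non*-match. A minimal sketch, assuming that behavior:

```python
import pytest
import pandas.util.testing as tm

# The inner manager fails because "bar" never appears in the message;
# the outer manager turns that expected failure into a passing check.
with pytest.raises(AssertionError):
    with tm.assert_raises_regex(ValueError, "bar"):
        raise ValueError("foo")
```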
- with tm.assertRaises(AssertionError): - with tm.assertRaisesRegexp(ValueError, msg): + with pytest.raises(AssertionError): + with tm.assert_raises_regex(ParserError, msg): self.read_csv(StringIO(data), sep=',,', quoting=csv.QUOTE_NONE) def test_skipfooter_bad_row(self): # see gh-13879 + # see gh-15910 - data = 'a,b,c\ncat,foo,bar\ndog,foo,"baz' msg = 'parsing errors in the skipped footer rows' - with tm.assertRaisesRegexp(csv.Error, msg): - self.read_csv(StringIO(data), skipfooter=1) + for data in ('a\n1\n"b"a', + 'a,b,c\ncat,foo,bar\ndog,foo,"baz'): + with tm.assert_raises_regex(ParserError, msg): + self.read_csv(StringIO(data), skipfooter=1) - # We expect no match, so there should be an assertion - # error out of the inner context manager. - with tm.assertRaises(AssertionError): - with tm.assertRaisesRegexp(csv.Error, msg): - self.read_csv(StringIO(data)) + # We expect no match, so there should be an assertion + # error out of the inner context manager. + with pytest.raises(AssertionError): + with tm.assert_raises_regex(ParserError, msg): + self.read_csv(StringIO(data)) diff --git a/pandas/io/tests/parser/quoting.py b/pandas/tests/io/parser/quoting.py similarity index 82% rename from pandas/io/tests/parser/quoting.py rename to pandas/tests/io/parser/quoting.py index a692e03e868c7..15427aaf9825c 100644 --- a/pandas/io/tests/parser/quoting.py +++ b/pandas/tests/io/parser/quoting.py @@ -20,29 +20,29 @@ def test_bad_quote_char(self): # Python 2.x: "...must be an 1-character..." # Python 3.x: "...must be a 1-character..." msg = '"quotechar" must be a(n)? 1-character string' - tm.assertRaisesRegexp(TypeError, msg, self.read_csv, - StringIO(data), quotechar='foo') + tm.assert_raises_regex(TypeError, msg, self.read_csv, + StringIO(data), quotechar='foo') msg = 'quotechar must be set if quoting enabled' - tm.assertRaisesRegexp(TypeError, msg, self.read_csv, - StringIO(data), quotechar=None, - quoting=csv.QUOTE_MINIMAL) + tm.assert_raises_regex(TypeError, msg, self.read_csv, + StringIO(data), quotechar=None, + quoting=csv.QUOTE_MINIMAL) msg = '"quotechar" must be string, not int' - tm.assertRaisesRegexp(TypeError, msg, self.read_csv, - StringIO(data), quotechar=2) + tm.assert_raises_regex(TypeError, msg, self.read_csv, + StringIO(data), quotechar=2) def test_bad_quoting(self): data = '1,2,3' msg = '"quoting" must be an integer' - tm.assertRaisesRegexp(TypeError, msg, self.read_csv, - StringIO(data), quoting='foo') + tm.assert_raises_regex(TypeError, msg, self.read_csv, + StringIO(data), quoting='foo') # quoting must in the range [0, 3] msg = 'bad "quoting" value' - tm.assertRaisesRegexp(TypeError, msg, self.read_csv, - StringIO(data), quoting=5) + tm.assert_raises_regex(TypeError, msg, self.read_csv, + StringIO(data), quoting=5) def test_quote_char_basic(self): data = 'a,b,c\n1,2,"cat"' @@ -68,13 +68,13 @@ def test_null_quote_char(self): # sanity checks msg = 'quotechar must be set if quoting enabled' - tm.assertRaisesRegexp(TypeError, msg, self.read_csv, - StringIO(data), quotechar=None, - quoting=csv.QUOTE_MINIMAL) + tm.assert_raises_regex(TypeError, msg, self.read_csv, + StringIO(data), quotechar=None, + quoting=csv.QUOTE_MINIMAL) - tm.assertRaisesRegexp(TypeError, msg, self.read_csv, - StringIO(data), quotechar='', - quoting=csv.QUOTE_MINIMAL) + tm.assert_raises_regex(TypeError, msg, self.read_csv, + StringIO(data), quotechar='', + quoting=csv.QUOTE_MINIMAL) # no errors should be raised if quoting is None expected = DataFrame([[1, 2, 3]], diff --git a/pandas/io/tests/parser/skiprows.py 
b/pandas/tests/io/parser/skiprows.py similarity index 98% rename from pandas/io/tests/parser/skiprows.py rename to pandas/tests/io/parser/skiprows.py index c53e6a1579267..fb08ec0447267 100644 --- a/pandas/io/tests/parser/skiprows.py +++ b/pandas/tests/io/parser/skiprows.py @@ -12,7 +12,7 @@ import pandas.util.testing as tm from pandas import DataFrame -from pandas.io.common import EmptyDataError +from pandas.errors import EmptyDataError from pandas.compat import StringIO, range, lrange @@ -215,11 +215,11 @@ def test_skiprows_callable(self): skiprows = lambda x: True msg = "No columns to parse from file" - with tm.assertRaisesRegexp(EmptyDataError, msg): + with tm.assert_raises_regex(EmptyDataError, msg): self.read_csv(StringIO(data), skiprows=skiprows) # This is a bad callable and should raise. msg = "by zero" skiprows = lambda x: 1 / 0 - with tm.assertRaisesRegexp(ZeroDivisionError, msg): + with tm.assert_raises_regex(ZeroDivisionError, msg): self.read_csv(StringIO(data), skiprows=skiprows) diff --git a/pandas/io/tests/parser/test_network.py b/pandas/tests/io/parser/test_network.py similarity index 61% rename from pandas/io/tests/parser/test_network.py rename to pandas/tests/io/parser/test_network.py index d84c2ae3beb0c..3344243f8137a 100644 --- a/pandas/io/tests/parser/test_network.py +++ b/pandas/tests/io/parser/test_network.py @@ -6,83 +6,80 @@ """ import os -import nose -import functools -from itertools import product +import pytest import pandas.util.testing as tm from pandas import DataFrame from pandas.io.parsers import read_csv, read_table -class TestCompressedUrl(object): +@pytest.fixture(scope='module') +def salaries_table(): + path = os.path.join(tm.get_data_path(), 'salaries.csv') + return read_table(path) - compression_to_extension = { - 'gzip': '.gz', - 'bz2': '.bz2', - 'zip': '.zip', - 'xz': '.xz', - } - def __init__(self): - path = os.path.join(tm.get_data_path(), 'salaries.csv') - self.local_table = read_table(path) - self.base_url = ('https://github.com/pandas-dev/pandas/raw/master/' - 'pandas/io/tests/parser/data/salaries.csv') +@pytest.mark.network +@pytest.mark.parametrize( + "compression,extension", + [('gzip', '.gz'), ('bz2', '.bz2'), ('zip', '.zip'), + pytest.param('xz', '.xz', + marks=pytest.mark.skipif(not tm._check_if_lzma(), + reason='need backports.lzma ' + 'to run'))]) +@pytest.mark.parametrize('mode', ['explicit', 'infer']) +@pytest.mark.parametrize('engine', ['python', 'c']) +def test_compressed_urls(salaries_table, compression, extension, mode, engine): + check_compressed_urls(salaries_table, compression, extension, mode, engine) - @tm.network - def test_compressed_urls(self): - # Test reading compressed tables from URL. - msg = ('Test reading {}-compressed tables from URL: ' - 'compression="{}", engine="{}"') - - for compression, extension in self.compression_to_extension.items(): - url = self.base_url + extension - # args is a (compression, engine) tuple - for args in product([compression, 'infer'], ['python', 'c']): - # test_fxn is a workaround for more descriptive nose reporting. - # See http://stackoverflow.com/a/37393684/4651668. 
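The `test_network.py` rewrite above swaps nose's `yield`-based test generators (and the `functools.partial` naming workaround) for stacked `@pytest.mark.parametrize` decorators, so every compression/mode/engine combination is collected and reported as its own test. The mechanism in miniature, with hypothetical names:

```python
import pytest


@pytest.mark.parametrize("engine", ["python", "c"])
@pytest.mark.parametrize("compression,extension", [
    ("gzip", ".gz"),
    ("bz2", ".bz2"),
    ("zip", ".zip"),
])
def test_pairs(compression, extension, engine):
    # Collected as six independent tests (3 pairs x 2 engines),
    # each with its own generated id in the report.
    assert extension.startswith(".")
```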
- test_fxn = functools.partial(self.check_table) - test_fxn.description = msg.format(compression, *args) - yield (test_fxn, url) + args - - def check_table(self, url, compression, engine): - if url.endswith('.xz'): - tm._skip_if_no_lzma() - url_table = read_table(url, compression=compression, engine=engine) - tm.assert_frame_equal(url_table, self.local_table) - - -class TestS3(tm.TestCase): - - def setUp(self): + +@tm.network +def check_compressed_urls(salaries_table, compression, extension, mode, + engine): + # test reading compressed urls with various engines and + # extension inference + base_url = ('https://github.com/pandas-dev/pandas/raw/master/' + 'pandas/tests/io/parser/data/salaries.csv') + + url = base_url + extension + + if mode != 'explicit': + compression = mode + + url_table = read_table(url, compression=compression, engine=engine) + tm.assert_frame_equal(url_table, salaries_table) + + +class TestS3(object): + + def setup_method(self, method): try: import s3fs # noqa except ImportError: - raise nose.SkipTest("s3fs not installed") + pytest.skip("s3fs not installed") @tm.network def test_parse_public_s3_bucket(self): for ext, comp in [('', None), ('.gz', 'gzip'), ('.bz2', 'bz2')]: df = read_csv('s3://pandas-test/tips.csv' + ext, compression=comp) - self.assertTrue(isinstance(df, DataFrame)) - self.assertFalse(df.empty) + assert isinstance(df, DataFrame) + assert not df.empty tm.assert_frame_equal(read_csv( tm.get_data_path('tips.csv')), df) # Read public file from bucket with not-public contents df = read_csv('s3://cant_get_it/tips.csv') - self.assertTrue(isinstance(df, DataFrame)) - self.assertFalse(df.empty) + assert isinstance(df, DataFrame) + assert not df.empty tm.assert_frame_equal(read_csv(tm.get_data_path('tips.csv')), df) @tm.network def test_parse_public_s3n_bucket(self): # Read from AWS s3 as "s3n" URL df = read_csv('s3n://pandas-test/tips.csv', nrows=10) - self.assertTrue(isinstance(df, DataFrame)) - self.assertFalse(df.empty) + assert isinstance(df, DataFrame) + assert not df.empty tm.assert_frame_equal(read_csv( tm.get_data_path('tips.csv')).iloc[:10], df) @@ -90,8 +87,8 @@ def test_parse_public_s3n_bucket(self): def test_parse_public_s3a_bucket(self): # Read from AWS s3 as "s3a" URL df = read_csv('s3a://pandas-test/tips.csv', nrows=10) - self.assertTrue(isinstance(df, DataFrame)) - self.assertFalse(df.empty) + assert isinstance(df, DataFrame) + assert not df.empty tm.assert_frame_equal(read_csv( tm.get_data_path('tips.csv')).iloc[:10], df) @@ -100,8 +97,8 @@ def test_parse_public_s3_bucket_nrows(self): for ext, comp in [('', None), ('.gz', 'gzip'), ('.bz2', 'bz2')]: df = read_csv('s3://pandas-test/tips.csv' + ext, nrows=10, compression=comp) - self.assertTrue(isinstance(df, DataFrame)) - self.assertFalse(df.empty) + assert isinstance(df, DataFrame) + assert not df.empty tm.assert_frame_equal(read_csv( tm.get_data_path('tips.csv')).iloc[:10], df) @@ -113,13 +110,13 @@ def test_parse_public_s3_bucket_chunked(self): for ext, comp in [('', None), ('.gz', 'gzip'), ('.bz2', 'bz2')]: df_reader = read_csv('s3://pandas-test/tips.csv' + ext, chunksize=chunksize, compression=comp) - self.assertEqual(df_reader.chunksize, chunksize) + assert df_reader.chunksize == chunksize for i_chunk in [0, 1, 2]: # Read a couple of chunks and make sure we see them # properly. 
df = df_reader.get_chunk() - self.assertTrue(isinstance(df, DataFrame)) - self.assertFalse(df.empty) + assert isinstance(df, DataFrame) + assert not df.empty true_df = local_tips.iloc[ chunksize * i_chunk: chunksize * (i_chunk + 1)] tm.assert_frame_equal(true_df, df) @@ -133,12 +130,12 @@ def test_parse_public_s3_bucket_chunked_python(self): df_reader = read_csv('s3://pandas-test/tips.csv' + ext, chunksize=chunksize, compression=comp, engine='python') - self.assertEqual(df_reader.chunksize, chunksize) + assert df_reader.chunksize == chunksize for i_chunk in [0, 1, 2]: # Read a couple of chunks and make sure we see them properly. df = df_reader.get_chunk() - self.assertTrue(isinstance(df, DataFrame)) - self.assertFalse(df.empty) + assert isinstance(df, DataFrame) + assert not df.empty true_df = local_tips.iloc[ chunksize * i_chunk: chunksize * (i_chunk + 1)] tm.assert_frame_equal(true_df, df) @@ -148,8 +145,8 @@ def test_parse_public_s3_bucket_python(self): for ext, comp in [('', None), ('.gz', 'gzip'), ('.bz2', 'bz2')]: df = read_csv('s3://pandas-test/tips.csv' + ext, engine='python', compression=comp) - self.assertTrue(isinstance(df, DataFrame)) - self.assertFalse(df.empty) + assert isinstance(df, DataFrame) + assert not df.empty tm.assert_frame_equal(read_csv( tm.get_data_path('tips.csv')), df) @@ -158,8 +155,8 @@ def test_infer_s3_compression(self): for ext in ['', '.gz', '.bz2']: df = read_csv('s3://pandas-test/tips.csv' + ext, engine='python', compression='infer') - self.assertTrue(isinstance(df, DataFrame)) - self.assertFalse(df.empty) + assert isinstance(df, DataFrame) + assert not df.empty tm.assert_frame_equal(read_csv( tm.get_data_path('tips.csv')), df) @@ -168,21 +165,36 @@ def test_parse_public_s3_bucket_nrows_python(self): for ext, comp in [('', None), ('.gz', 'gzip'), ('.bz2', 'bz2')]: df = read_csv('s3://pandas-test/tips.csv' + ext, engine='python', nrows=10, compression=comp) - self.assertTrue(isinstance(df, DataFrame)) - self.assertFalse(df.empty) + assert isinstance(df, DataFrame) + assert not df.empty tm.assert_frame_equal(read_csv( tm.get_data_path('tips.csv')).iloc[:10], df) @tm.network def test_s3_fails(self): - with tm.assertRaises(IOError): + with pytest.raises(IOError): read_csv('s3://nyqpug/asdf.csv') # Receive a permission error when trying to read a private bucket. # It's irrelevant here that this isn't actually a table. 
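Throughout these files, plain classes with `setup_method` replace `tm.TestCase` and `setUp`: pytest invokes the hook before each test with no unittest inheritance required. The convention in miniature (names are illustrative):

```python
class TestExample(object):
    def setup_method(self, method):
        # Runs before every test; ``method`` is the test about to execute.
        self.data = "a,b\n1,2"

    def test_header(self):
        assert self.data.startswith("a,b")
```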
-        with tm.assertRaises(IOError):
+        with pytest.raises(IOError):
             read_csv('s3://cant_get_it/')

-if __name__ == '__main__':
-    nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
-                   exit=False)
+    @tm.network
+    def test_boto3_client_s3(self):
+        # see gh-16135
+        # note: the method needs the test_ prefix, or pytest never collects it
+
+        # boto3 is a dependency of s3fs
+        import boto3
+        client = boto3.client("s3")
+
+        key = "/tips.csv"
+        bucket = "pandas-test"
+        s3_object = client.get_object(Bucket=bucket, Key=key)
+
+        result = read_csv(s3_object["Body"])
+        assert isinstance(result, DataFrame)
+        assert not result.empty
+
+        expected = read_csv(tm.get_data_path('tips.csv'))
+        tm.assert_frame_equal(result, expected)
diff --git a/pandas/io/tests/parser/test_parsers.py b/pandas/tests/io/parser/test_parsers.py
similarity index 87%
rename from pandas/io/tests/parser/test_parsers.py
rename to pandas/tests/io/parser/test_parsers.py
index a90f546d37fc8..2fee2451c5e36 100644
--- a/pandas/io/tests/parser/test_parsers.py
+++ b/pandas/tests/io/parser/test_parsers.py
@@ -1,8 +1,6 @@
 # -*- coding: utf-8 -*-

 import os
-import nose
-
 import pandas.util.testing as tm

 from pandas import read_csv, read_table
@@ -21,6 +19,7 @@
 from .c_parser_only import CParserTests
 from .parse_dates import ParseDatesTests
 from .compression import CompressionTests
+from .mangle_dupes import DupeColumnTests
 from .multithread import MultithreadTests
 from .python_parser_only import PythonParserTests
 from .dtypes import DtypeTests
@@ -28,11 +27,13 @@

 class BaseParser(CommentTests, CompressionTests,
                  ConverterTests, DialectTests,
+                 DtypeTests, DupeColumnTests,
                  HeaderTests, IndexColTests,
                  MultithreadTests, NAvaluesTests,
                  ParseDatesTests, ParserTests,
                  SkipRowsTests, UsecolsTests,
-                 QuotingTests, DtypeTests):
+                 QuotingTests):
+
     def read_csv(self, *args, **kwargs):
         raise NotImplementedError

@@ -42,7 +43,7 @@ def read_table(self, *args, **kwargs):
     def float_precision_choices(self):
         raise AbstractMethodError(self)

-    def setUp(self):
+    def setup_method(self, method):
         self.dirpath = tm.get_data_path()
         self.csv1 = os.path.join(self.dirpath, 'test1.csv')
         self.csv2 = os.path.join(self.dirpath, 'test2.csv')
@@ -50,7 +51,7 @@ def setUp(self):
         self.csv_shiftjs = os.path.join(self.dirpath, 'sauron.SHIFT_JIS.csv')


-class TestCParserHighMemory(BaseParser, CParserTests, tm.TestCase):
+class TestCParserHighMemory(BaseParser, CParserTests):
     engine = 'c'
     low_memory = False
     float_precision_choices = [None, 'high', 'round_trip']
@@ -68,7 +69,7 @@ def read_table(self, *args, **kwds):
         return read_table(*args, **kwds)


-class TestCParserLowMemory(BaseParser, CParserTests, tm.TestCase):
+class TestCParserLowMemory(BaseParser, CParserTests):
     engine = 'c'
     low_memory = True
     float_precision_choices = [None, 'high', 'round_trip']
@@ -86,7 +87,7 @@ def read_table(self, *args, **kwds):
         return read_table(*args, **kwds)


-class TestPythonParser(BaseParser, PythonParserTests, tm.TestCase):
+class TestPythonParser(BaseParser, PythonParserTests):
     engine = 'python'
     float_precision_choices = [None]

@@ -99,7 +100,3 @@ def read_table(self, *args, **kwds):
         kwds = kwds.copy()
         kwds['engine'] = self.engine
         return read_table(*args, **kwds)
-
-if __name__ == '__main__':
-    nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
-                   exit=False)
diff --git a/pandas/io/tests/parser/test_read_fwf.py b/pandas/tests/io/parser/test_read_fwf.py
similarity index 90%
rename from pandas/io/tests/parser/test_read_fwf.py
rename to pandas/tests/io/parser/test_read_fwf.py
index a423355081ac3..ec1d1a2a51cdc 100644
---
a/pandas/io/tests/parser/test_read_fwf.py +++ b/pandas/tests/io/parser/test_read_fwf.py @@ -8,7 +8,7 @@ from datetime import datetime -import nose +import pytest import numpy as np import pandas as pd import pandas.util.testing as tm @@ -19,7 +19,7 @@ from pandas.io.parsers import read_csv, read_fwf, EmptyDataError -class TestFwfParsing(tm.TestCase): +class TestFwfParsing(object): def test_fwf(self): data_expected = """\ @@ -67,15 +67,16 @@ def test_fwf(self): StringIO(data3), colspecs=colspecs, delimiter='~', header=None) tm.assert_frame_equal(df, expected) - with tm.assertRaisesRegexp(ValueError, "must specify only one of"): + with tm.assert_raises_regex(ValueError, + "must specify only one of"): read_fwf(StringIO(data3), colspecs=colspecs, widths=[6, 10, 10, 7]) - with tm.assertRaisesRegexp(ValueError, "Must specify either"): + with tm.assert_raises_regex(ValueError, "Must specify either"): read_fwf(StringIO(data3), colspecs=None, widths=None) def test_BytesIO_input(self): if not compat.PY3: - raise nose.SkipTest( + pytest.skip( "Bytes-related test - only needs to work on Python 3") result = read_fwf(BytesIO("שלום\nשלום".encode('utf8')), widths=[ @@ -93,9 +94,9 @@ def test_fwf_colspecs_is_list_or_tuple(self): bar2,12,13,14,15 """ - with tm.assertRaisesRegexp(TypeError, - 'column specifications must be a list or ' - 'tuple.+'): + with tm.assert_raises_regex(TypeError, + 'column specifications must ' + 'be a list or tuple.+'): pd.io.parsers.FixedWidthReader(StringIO(data), {'a': 1}, ',', '#') @@ -109,8 +110,9 @@ def test_fwf_colspecs_is_list_or_tuple_of_two_element_tuples(self): bar2,12,13,14,15 """ - with tm.assertRaisesRegexp(TypeError, - 'Each column specification must be.+'): + with tm.assert_raises_regex(TypeError, + 'Each column specification ' + 'must be.+'): read_fwf(StringIO(data), [('a', 1)]) def test_fwf_colspecs_None(self): @@ -164,7 +166,7 @@ def test_fwf_regression(self): for c in df.columns: res = df.loc[:, c] - self.assertTrue(len(res)) + assert len(res) def test_fwf_for_uint8(self): data = """1421302965.213420 PRI=3 PGN=0xef00 DST=0x17 SRC=0x28 04 154 00 00 00 00 00 127 @@ -192,7 +194,7 @@ def test_fwf_compression(self): import gzip import bz2 except ImportError: - raise nose.SkipTest("Need gzip and bz2 to run this test") + pytest.skip("Need gzip and bz2 to run this test") data = """1111111111 2222222222 @@ -243,7 +245,7 @@ def test_bool_header_arg(self): a b""" for arg in [True, False]: - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): read_fwf(StringIO(data), header=arg) def test_full_file(self): @@ -333,7 +335,7 @@ def test_multiple_delimiters(self): def test_variable_width_unicode(self): if not compat.PY3: - raise nose.SkipTest( + pytest.skip( 'Bytes-related test - only needs to work on Python 3') test = """ שלום שלום @@ -401,5 +403,34 @@ def test_skiprows_inference_empty(self): 78 901 2 """.strip() - with tm.assertRaises(EmptyDataError): + with pytest.raises(EmptyDataError): read_fwf(StringIO(test), skiprows=3) + + def test_whitespace_preservation(self): + # Addresses Issue #16772 + data_expected = """ + a ,bbb + cc,dd """ + expected = read_csv(StringIO(data_expected), header=None) + + test_data = """ + a bbb + ccdd """ + result = read_fwf(StringIO(test_data), widths=[3, 3], + header=None, skiprows=[0], delimiter="\n\t") + + tm.assert_frame_equal(result, expected) + + def test_default_delimiter(self): + data_expected = """ +a,bbb +cc,dd""" + expected = read_csv(StringIO(data_expected), header=None) + + test_data = """ +a \tbbb +cc\tdd """ + result = 
read_fwf(StringIO(test_data), widths=[3, 3], + header=None, skiprows=[0]) + + tm.assert_frame_equal(result, expected) diff --git a/pandas/io/tests/parser/test_textreader.py b/pandas/tests/io/parser/test_textreader.py similarity index 77% rename from pandas/io/tests/parser/test_textreader.py rename to pandas/tests/io/parser/test_textreader.py index 98cb09cd85480..c9088d2ecc5e7 100644 --- a/pandas/io/tests/parser/test_textreader.py +++ b/pandas/tests/io/parser/test_textreader.py @@ -5,12 +5,13 @@ is integral to the C engine in parsers.py """ +import pytest + from pandas.compat import StringIO, BytesIO, map from pandas import compat import os import sys -import nose from numpy import nan import numpy as np @@ -21,13 +22,13 @@ import pandas.util.testing as tm -from pandas.parser import TextReader -import pandas.parser as parser +from pandas._libs.parsers import TextReader +import pandas._libs.parsers as parser -class TestTextReader(tm.TestCase): +class TestTextReader(object): - def setUp(self): + def setup_method(self, method): self.dirpath = tm.get_data_path() self.csv1 = os.path.join(self.dirpath, 'test1.csv') self.csv2 = os.path.join(self.dirpath, 'test2.csv') @@ -65,7 +66,7 @@ def test_string_factorize(self): data = 'a\nb\na\nb\na' reader = TextReader(StringIO(data), header=None) result = reader.read() - self.assertEqual(len(set(map(id, result[0]))), 2) + assert len(set(map(id, result[0]))) == 2 def test_skipinitialspace(self): data = ('a, b\n' @@ -77,12 +78,10 @@ def test_skipinitialspace(self): header=None) result = reader.read() - self.assert_numpy_array_equal(result[0], - np.array(['a', 'a', 'a', 'a'], - dtype=np.object_)) - self.assert_numpy_array_equal(result[1], - np.array(['b', 'b', 'b', 'b'], - dtype=np.object_)) + tm.assert_numpy_array_equal(result[0], np.array(['a', 'a', 'a', 'a'], + dtype=np.object_)) + tm.assert_numpy_array_equal(result[1], np.array(['b', 'b', 'b', 'b'], + dtype=np.object_)) def test_parse_booleans(self): data = 'True\nFalse\nTrue\nTrue' @@ -90,7 +89,7 @@ def test_parse_booleans(self): reader = TextReader(StringIO(data), header=None) result = reader.read() - self.assertEqual(result[0].dtype, np.bool_) + assert result[0].dtype == np.bool_ def test_delimit_whitespace(self): data = 'a b\na\t\t "b"\n"a"\t \t b' @@ -99,10 +98,10 @@ def test_delimit_whitespace(self): header=None) result = reader.read() - self.assert_numpy_array_equal(result[0], np.array(['a', 'a', 'a'], - dtype=np.object_)) - self.assert_numpy_array_equal(result[1], np.array(['b', 'b', 'b'], - dtype=np.object_)) + tm.assert_numpy_array_equal(result[0], np.array(['a', 'a', 'a'], + dtype=np.object_)) + tm.assert_numpy_array_equal(result[1], np.array(['b', 'b', 'b'], + dtype=np.object_)) def test_embedded_newline(self): data = 'a\n"hello\nthere"\nthis' @@ -111,7 +110,7 @@ def test_embedded_newline(self): result = reader.read() expected = np.array(['a', 'hello\nthere', 'this'], dtype=np.object_) - self.assert_numpy_array_equal(result[0], expected) + tm.assert_numpy_array_equal(result[0], expected) def test_euro_decimal(self): data = '12345,67\n345,678' @@ -143,6 +142,7 @@ def test_integer_thousands_alt(self): expected = DataFrame([123456, 12500]) tm.assert_frame_equal(result, expected) + @tm.capture_stderr def test_skip_bad_lines(self): # too many lines, see #2430 for why data = ('a:b:c\n' @@ -154,7 +154,7 @@ def test_skip_bad_lines(self): reader = TextReader(StringIO(data), delimiter=':', header=None) - self.assertRaises(parser.ParserError, reader.read) + pytest.raises(parser.ParserError, reader.read) 
reader = TextReader(StringIO(data), delimiter=':', header=None, @@ -166,19 +166,15 @@ def test_skip_bad_lines(self): 2: ['c', 'f', 'i', 'n']} assert_array_dicts_equal(result, expected) - stderr = sys.stderr - sys.stderr = StringIO() - try: - reader = TextReader(StringIO(data), delimiter=':', - header=None, - error_bad_lines=False, - warn_bad_lines=True) - reader.read() - val = sys.stderr.getvalue() - self.assertTrue('Skipping line 4' in val) - self.assertTrue('Skipping line 6' in val) - finally: - sys.stderr = stderr + reader = TextReader(StringIO(data), delimiter=':', + header=None, + error_bad_lines=False, + warn_bad_lines=True) + reader.read() + val = sys.stderr.getvalue() + + assert 'Skipping line 4' in val + assert 'Skipping line 6' in val def test_header_not_enough_lines(self): data = ('skip this\n' @@ -190,15 +186,15 @@ def test_header_not_enough_lines(self): reader = TextReader(StringIO(data), delimiter=',', header=2) header = reader.header expected = [['a', 'b', 'c']] - self.assertEqual(header, expected) + assert header == expected recs = reader.read() expected = {0: [1, 4], 1: [2, 5], 2: [3, 6]} assert_array_dicts_equal(expected, recs) # not enough rows - self.assertRaises(parser.ParserError, TextReader, StringIO(data), - delimiter=',', header=5, as_recarray=True) + pytest.raises(parser.ParserError, TextReader, StringIO(data), + delimiter=',', header=5, as_recarray=True) def test_header_not_enough_lines_as_recarray(self): data = ('skip this\n' @@ -211,15 +207,15 @@ def test_header_not_enough_lines_as_recarray(self): as_recarray=True) header = reader.header expected = [['a', 'b', 'c']] - self.assertEqual(header, expected) + assert header == expected recs = reader.read() expected = {'a': [1, 4], 'b': [2, 5], 'c': [3, 6]} assert_array_dicts_equal(expected, recs) # not enough rows - self.assertRaises(parser.ParserError, TextReader, StringIO(data), - delimiter=',', header=5, as_recarray=True) + pytest.raises(parser.ParserError, TextReader, StringIO(data), + delimiter=',', header=5, as_recarray=True) def test_escapechar(self): data = ('\\"hello world\"\n' @@ -254,18 +250,18 @@ def _make_reader(**kwds): reader = _make_reader(dtype='S5,i4') result = reader.read() - self.assertEqual(result[0].dtype, 'S5') + assert result[0].dtype == 'S5' ex_values = np.array(['a', 'aa', 'aaa', 'aaaa', 'aaaaa'], dtype='S5') - self.assertTrue((result[0] == ex_values).all()) - self.assertEqual(result[1].dtype, 'i4') + assert (result[0] == ex_values).all() + assert result[1].dtype == 'i4' reader = _make_reader(dtype='S4') result = reader.read() - self.assertEqual(result[0].dtype, 'S4') + assert result[0].dtype == 'S4' ex_values = np.array(['a', 'aa', 'aaa', 'aaaa', 'aaaa'], dtype='S4') - self.assertTrue((result[0] == ex_values).all()) - self.assertEqual(result[1].dtype, 'S4') + assert (result[0] == ex_values).all() + assert result[1].dtype == 'S4' def test_numpy_string_dtype_as_recarray(self): data = """\ @@ -281,10 +277,10 @@ def _make_reader(**kwds): reader = _make_reader(dtype='S4', as_recarray=True) result = reader.read() - self.assertEqual(result['0'].dtype, 'S4') + assert result['0'].dtype == 'S4' ex_values = np.array(['a', 'aa', 'aaa', 'aaaa', 'aaaa'], dtype='S4') - self.assertTrue((result['0'] == ex_values).all()) - self.assertEqual(result['1'].dtype, 'S4') + assert (result['0'] == ex_values).all() + assert result['1'].dtype == 'S4' def test_pass_dtype(self): data = """\ @@ -299,19 +295,19 @@ def _make_reader(**kwds): reader = _make_reader(dtype={'one': 'u1', 1: 'S1'}) result = reader.read() - 
self.assertEqual(result[0].dtype, 'u1') - self.assertEqual(result[1].dtype, 'S1') + assert result[0].dtype == 'u1' + assert result[1].dtype == 'S1' reader = _make_reader(dtype={'one': np.uint8, 1: object}) result = reader.read() - self.assertEqual(result[0].dtype, 'u1') - self.assertEqual(result[1].dtype, 'O') + assert result[0].dtype == 'u1' + assert result[1].dtype == 'O' reader = _make_reader(dtype={'one': np.dtype('u1'), 1: np.dtype('O')}) result = reader.read() - self.assertEqual(result[0].dtype, 'u1') - self.assertEqual(result[1].dtype, 'O') + assert result[0].dtype == 'u1' + assert result[1].dtype == 'O' def test_usecols(self): data = """\ @@ -328,9 +324,9 @@ def _make_reader(**kwds): result = reader.read() exp = _make_reader().read() - self.assertEqual(len(result), 2) - self.assertTrue((result[1] == exp[1]).all()) - self.assertTrue((result[2] == exp[2]).all()) + assert len(result) == 2 + assert (result[1] == exp[1]).all() + assert (result[2] == exp[2]).all() def test_cr_delimited(self): def _test(text, **kwargs): @@ -396,13 +392,9 @@ def test_empty_csv_input(self): # GH14867 df = read_csv(StringIO(), chunksize=20, header=None, names=['a', 'b', 'c']) - self.assertTrue(isinstance(df, TextFileReader)) + assert isinstance(df, TextFileReader) def assert_array_dicts_equal(left, right): for k, v in compat.iteritems(left): assert(np.array_equal(v, right[k])) - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/io/tests/parser/test_unsupported.py b/pandas/tests/io/parser/test_unsupported.py similarity index 68% rename from pandas/io/tests/parser/test_unsupported.py rename to pandas/tests/io/parser/test_unsupported.py index 4d93df16a0279..5d248f2fef59c 100644 --- a/pandas/io/tests/parser/test_unsupported.py +++ b/pandas/tests/io/parser/test_unsupported.py @@ -9,50 +9,47 @@ test suite as new feature support is added to the parsers. 
""" -import nose - import pandas.io.parsers as parsers import pandas.util.testing as tm from pandas.compat import StringIO -from pandas.io.common import ParserError +from pandas.errors import ParserError from pandas.io.parsers import read_csv, read_table +import pytest + + +@pytest.fixture(params=["python", "python-fwf"], ids=lambda val: val) +def python_engine(request): + return request.param + + +class TestUnsupportedFeatures(object): -class TestUnsupportedFeatures(tm.TestCase): def test_mangle_dupe_cols_false(self): # see gh-12935 data = 'a b c\n1 2 3' msg = 'is not supported' for engine in ('c', 'python'): - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): read_csv(StringIO(data), engine=engine, mangle_dupe_cols=False) - def test_nrows_and_chunksize(self): - data = 'a b c' - msg = "cannot be used together yet" - - for engine in ('c', 'python'): - with tm.assertRaisesRegexp(NotImplementedError, msg): - read_csv(StringIO(data), engine=engine, - nrows=10, chunksize=5) - def test_c_engine(self): # see gh-6607 data = 'a b c\n1 2 3' msg = 'does not support' # specify C engine with unsupported options (raise) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): read_table(StringIO(data), engine='c', sep=None, delim_whitespace=False) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): read_table(StringIO(data), engine='c', sep=r'\s') - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): read_table(StringIO(data), engine='c', quotechar=chr(128)) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): read_table(StringIO(data), engine='c', skipfooter=1) # specify C-unsupported options without python-unsupported options @@ -72,9 +69,9 @@ def test_c_engine(self): x q 30 3 -0.6662 -0.5243 -0.3580 0.89145 2.5838""" msg = 'Error tokenizing data' - with tm.assertRaisesRegexp(ParserError, msg): + with tm.assert_raises_regex(ParserError, msg): read_table(StringIO(text), sep='\\s+') - with tm.assertRaisesRegexp(ParserError, msg): + with tm.assert_raises_regex(ParserError, msg): read_table(StringIO(text), engine='c', sep='\\s+') msg = "Only length-1 thousands markers supported" @@ -82,17 +79,17 @@ def test_c_engine(self): 1|2,334|5 10|13|10. 
""" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): read_csv(StringIO(data), thousands=',,') - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): read_csv(StringIO(data), thousands='') msg = "Only length-1 line terminators supported" data = 'a,b,c~~1,2,3~~4,5,6' - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): read_csv(StringIO(data), lineterminator='~~') - def test_python_engine(self): + def test_python_engine(self, python_engine): from pandas.io.parsers import _python_unsupported as py_unsupported data = """1,2,3,, @@ -100,19 +97,36 @@ def test_python_engine(self): 1,2,3,4,5 1,2,,, 1,2,3,4,""" - engines = 'python', 'python-fwf' - for engine in engines: - for default in py_unsupported: - msg = ('The %r option is not supported ' - 'with the %r engine' % (default, engine)) + for default in py_unsupported: + msg = ('The %r option is not supported ' + 'with the %r engine' % (default, python_engine)) + + kwargs = {default: object()} + with tm.assert_raises_regex(ValueError, msg): + read_csv(StringIO(data), engine=python_engine, **kwargs) + + def test_python_engine_file_no_next(self, python_engine): + # see gh-16530 + class NoNextBuffer(object): + def __init__(self, csv_data): + self.data = csv_data + + def __iter__(self): + return self - kwargs = {default: object()} - with tm.assertRaisesRegexp(ValueError, msg): - read_csv(StringIO(data), engine=engine, **kwargs) + def read(self): + return self.data + data = "a\n1" + msg = "The 'python' engine cannot iterate" + + with tm.assert_raises_regex(ValueError, msg): + read_csv(NoNextBuffer(data), engine=python_engine) + + +class TestDeprecatedFeatures(object): -class TestDeprecatedFeatures(tm.TestCase): def test_deprecated_args(self): data = '1,2,3' @@ -121,8 +135,8 @@ def test_deprecated_args(self): 'as_recarray': True, 'buffer_lines': True, 'compact_ints': True, - 'skip_footer': True, 'use_unsigned': True, + 'skip_footer': 1, } engines = 'c', 'python' @@ -142,7 +156,3 @@ def test_deprecated_args(self): kwargs = {arg: non_default_val} read_csv(StringIO(data), engine=engine, **kwargs) - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/io/tests/parser/usecols.py b/pandas/tests/io/parser/usecols.py similarity index 86% rename from pandas/io/tests/parser/usecols.py rename to pandas/tests/io/parser/usecols.py index 4875282067fb3..f582e5037ca07 100644 --- a/pandas/io/tests/parser/usecols.py +++ b/pandas/tests/io/parser/usecols.py @@ -5,13 +5,13 @@ for all of the parsers defined in parsers.py """ -import nose +import pytest import numpy as np import pandas.util.testing as tm from pandas import DataFrame, Index -from pandas.lib import Timestamp +from pandas._libs.lib import Timestamp from pandas.compat import StringIO @@ -28,7 +28,7 @@ def test_raise_on_mixed_dtype_usecols(self): "all integers or a callable") usecols = [0, 'b', 2] - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): self.read_csv(StringIO(data), usecols=usecols) def test_usecols(self): @@ -43,9 +43,9 @@ def test_usecols(self): result2 = self.read_csv(StringIO(data), usecols=('b', 'c')) exp = self.read_csv(StringIO(data)) - self.assertEqual(len(result.columns), 2) - self.assertTrue((result['b'] == exp['b']).all()) - self.assertTrue((result['c'] == exp['c']).all()) + assert len(result.columns) == 2 + assert (result['b'] == 
exp['b']).all() + assert (result['c'] == exp['c']).all() tm.assert_frame_equal(result, result2) @@ -82,8 +82,8 @@ def test_usecols(self): tm.assert_frame_equal(result, expected) # length conflict, passed names and usecols disagree - self.assertRaises(ValueError, self.read_csv, StringIO(data), - names=['a', 'b'], usecols=[1], header=None) + pytest.raises(ValueError, self.read_csv, StringIO(data), + names=['a', 'b'], usecols=[1], header=None) def test_usecols_index_col_False(self): # see gh-9082 @@ -351,10 +351,10 @@ def test_usecols_with_mixed_encoding_strings(self): msg = ("'usecols' must either be all strings, all unicode, " "all integers or a callable") - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): self.read_csv(StringIO(s), usecols=[u'AAA', b'BBB']) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): self.read_csv(StringIO(s), usecols=[b'AAA', u'BBB']) def test_usecols_with_multibyte_characters(self): @@ -377,7 +377,7 @@ def test_usecols_with_multibyte_characters(self): tm.assert_frame_equal(df, expected) def test_usecols_with_multibyte_unicode_characters(self): - raise nose.SkipTest('TODO: see gh-13253') + pytest.skip('TODO: see gh-13253') s = '''あああ,いい,ううう,ええええ 0.056674973,8,True,a @@ -475,3 +475,54 @@ def test_uneven_length_cols(self): 'C': [3, 5, 4, 3, 3, 7]}) df = self.read_csv(StringIO(data), usecols=usecols) tm.assert_frame_equal(df, expected) + + def test_raise_on_usecols_names_mismatch(self): + # GH 14671 + data = 'a,b,c,d\n1,2,3,4\n5,6,7,8' + + if self.engine == 'c': + msg = 'Usecols do not match names' + else: + msg = 'is not in list' + + usecols = ['a', 'b', 'c', 'd'] + df = self.read_csv(StringIO(data), usecols=usecols) + expected = DataFrame({'a': [1, 5], 'b': [2, 6], 'c': [3, 7], + 'd': [4, 8]}) + tm.assert_frame_equal(df, expected) + + usecols = ['a', 'b', 'c', 'f'] + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(StringIO(data), usecols=usecols) + + usecols = ['a', 'b', 'f'] + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(StringIO(data), usecols=usecols) + + names = ['A', 'B', 'C', 'D'] + + df = self.read_csv(StringIO(data), header=0, names=names) + expected = DataFrame({'A': [1, 5], 'B': [2, 6], 'C': [3, 7], + 'D': [4, 8]}) + tm.assert_frame_equal(df, expected) + + # TODO: https://github.com/pandas-dev/pandas/issues/16469 + # usecols = ['A','C'] + # df = self.read_csv(StringIO(data), header=0, names=names, + # usecols=usecols) + # expected = DataFrame({'A': [1,5], 'C': [3,7]}) + # tm.assert_frame_equal(df, expected) + # + # usecols = [0,2] + # df = self.read_csv(StringIO(data), header=0, names=names, + # usecols=usecols) + # expected = DataFrame({'A': [1,5], 'C': [3,7]}) + # tm.assert_frame_equal(df, expected) + + usecols = ['A', 'B', 'C', 'f'] + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(StringIO(data), header=0, names=names, + usecols=usecols) + usecols = ['A', 'B', 'f'] + with tm.assert_raises_regex(ValueError, msg): + self.read_csv(StringIO(data), names=names, usecols=usecols) diff --git a/pandas/tests/io/sas/__init__.py b/pandas/tests/io/sas/__init__.py new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/pandas/io/tests/sas/data/DEMO_G.csv b/pandas/tests/io/sas/data/DEMO_G.csv similarity index 100% rename from pandas/io/tests/sas/data/DEMO_G.csv rename to pandas/tests/io/sas/data/DEMO_G.csv diff --git a/pandas/io/tests/sas/data/DEMO_G.xpt b/pandas/tests/io/sas/data/DEMO_G.xpt similarity index 100% rename 
from pandas/io/tests/sas/data/DEMO_G.xpt rename to pandas/tests/io/sas/data/DEMO_G.xpt diff --git a/pandas/io/tests/sas/data/DRXFCD_G.csv b/pandas/tests/io/sas/data/DRXFCD_G.csv similarity index 100% rename from pandas/io/tests/sas/data/DRXFCD_G.csv rename to pandas/tests/io/sas/data/DRXFCD_G.csv diff --git a/pandas/io/tests/sas/data/DRXFCD_G.xpt b/pandas/tests/io/sas/data/DRXFCD_G.xpt similarity index 100% rename from pandas/io/tests/sas/data/DRXFCD_G.xpt rename to pandas/tests/io/sas/data/DRXFCD_G.xpt diff --git a/pandas/io/tests/sas/data/SSHSV1_A.csv b/pandas/tests/io/sas/data/SSHSV1_A.csv similarity index 100% rename from pandas/io/tests/sas/data/SSHSV1_A.csv rename to pandas/tests/io/sas/data/SSHSV1_A.csv diff --git a/pandas/io/tests/sas/data/SSHSV1_A.xpt b/pandas/tests/io/sas/data/SSHSV1_A.xpt similarity index 100% rename from pandas/io/tests/sas/data/SSHSV1_A.xpt rename to pandas/tests/io/sas/data/SSHSV1_A.xpt diff --git a/pandas/io/tests/sas/data/airline.csv b/pandas/tests/io/sas/data/airline.csv similarity index 100% rename from pandas/io/tests/sas/data/airline.csv rename to pandas/tests/io/sas/data/airline.csv diff --git a/pandas/io/tests/sas/data/airline.sas7bdat b/pandas/tests/io/sas/data/airline.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/airline.sas7bdat rename to pandas/tests/io/sas/data/airline.sas7bdat diff --git a/pandas/tests/io/sas/data/datetime.csv b/pandas/tests/io/sas/data/datetime.csv new file mode 100644 index 0000000000000..6126f6d04eaf0 --- /dev/null +++ b/pandas/tests/io/sas/data/datetime.csv @@ -0,0 +1,5 @@ +Date1,Date2,DateTime,DateTimeHi,Taiw +1677-09-22,1677-09-22,1677-09-21 00:12:44,1677-09-21 00:12:43.145226,1912-01-01 +1960-01-01,1960-01-01,1960-01-01 00:00:00,1960-01-01 00:00:00.000000,1960-01-01 +2016-02-29,2016-02-29,2016-02-29 23:59:59,2016-02-29 23:59:59.123456,2016-02-29 +2262-04-11,2262-04-11,2262-04-11 23:47:16,2262-04-11 23:47:16.854774,2262-04-11 diff --git a/pandas/tests/io/sas/data/datetime.sas7bdat b/pandas/tests/io/sas/data/datetime.sas7bdat new file mode 100644 index 0000000000000..6469dbf29f8ee Binary files /dev/null and b/pandas/tests/io/sas/data/datetime.sas7bdat differ diff --git a/pandas/io/tests/sas/data/paxraw_d_short.csv b/pandas/tests/io/sas/data/paxraw_d_short.csv similarity index 100% rename from pandas/io/tests/sas/data/paxraw_d_short.csv rename to pandas/tests/io/sas/data/paxraw_d_short.csv diff --git a/pandas/io/tests/sas/data/paxraw_d_short.xpt b/pandas/tests/io/sas/data/paxraw_d_short.xpt similarity index 100% rename from pandas/io/tests/sas/data/paxraw_d_short.xpt rename to pandas/tests/io/sas/data/paxraw_d_short.xpt diff --git a/pandas/tests/io/sas/data/productsales.csv b/pandas/tests/io/sas/data/productsales.csv new file mode 100644 index 0000000000000..1f6a4424e1a97 --- /dev/null +++ b/pandas/tests/io/sas/data/productsales.csv @@ -0,0 +1,1441 @@ +ACTUAL,PREDICT,COUNTRY,REGION,DIVISION,PRODTYPE,PRODUCT,QUARTER,YEAR,MONTH +925,850,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,1993-01-01 +999,297,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,1993-02-01 +608,846,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,1993-03-01 +642,533,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1993,1993-04-01 +656,646,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1993,1993-05-01 +948,486,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1993,1993-06-01 +612,717,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1993,1993-07-01 +114,564,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1993,1993-08-01 +685,230,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1993,1993-09-01 
+657,494,CANADA,EAST,EDUCATION,FURNITURE,SOFA,4,1993,1993-10-01 +608,903,CANADA,EAST,EDUCATION,FURNITURE,SOFA,4,1993,1993-11-01 +353,266,CANADA,EAST,EDUCATION,FURNITURE,SOFA,4,1993,1993-12-01 +107,190,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1994,1994-01-01 +354,139,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1994,1994-02-01 +101,217,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1994,1994-03-01 +553,560,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1994,1994-04-01 +877,148,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1994,1994-05-01 +431,762,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1994,1994-06-01 +511,457,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1994,1994-07-01 +157,532,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1994,1994-08-01 +520,629,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1994,1994-09-01 +114,491,CANADA,EAST,EDUCATION,FURNITURE,SOFA,4,1994,1994-10-01 +277,0,CANADA,EAST,EDUCATION,FURNITURE,SOFA,4,1994,1994-11-01 +561,979,CANADA,EAST,EDUCATION,FURNITURE,SOFA,4,1994,1994-12-01 +220,585,CANADA,EAST,EDUCATION,FURNITURE,BED,1,1993,1993-01-01 +444,267,CANADA,EAST,EDUCATION,FURNITURE,BED,1,1993,1993-02-01 +178,487,CANADA,EAST,EDUCATION,FURNITURE,BED,1,1993,1993-03-01 +756,764,CANADA,EAST,EDUCATION,FURNITURE,BED,2,1993,1993-04-01 +329,312,CANADA,EAST,EDUCATION,FURNITURE,BED,2,1993,1993-05-01 +910,531,CANADA,EAST,EDUCATION,FURNITURE,BED,2,1993,1993-06-01 +530,536,CANADA,EAST,EDUCATION,FURNITURE,BED,3,1993,1993-07-01 +101,773,CANADA,EAST,EDUCATION,FURNITURE,BED,3,1993,1993-08-01 +515,143,CANADA,EAST,EDUCATION,FURNITURE,BED,3,1993,1993-09-01 +730,126,CANADA,EAST,EDUCATION,FURNITURE,BED,4,1993,1993-10-01 +993,862,CANADA,EAST,EDUCATION,FURNITURE,BED,4,1993,1993-11-01 +954,754,CANADA,EAST,EDUCATION,FURNITURE,BED,4,1993,1993-12-01 +267,410,CANADA,EAST,EDUCATION,FURNITURE,BED,1,1994,1994-01-01 +347,701,CANADA,EAST,EDUCATION,FURNITURE,BED,1,1994,1994-02-01 +991,204,CANADA,EAST,EDUCATION,FURNITURE,BED,1,1994,1994-03-01 +923,509,CANADA,EAST,EDUCATION,FURNITURE,BED,2,1994,1994-04-01 +437,378,CANADA,EAST,EDUCATION,FURNITURE,BED,2,1994,1994-05-01 +737,507,CANADA,EAST,EDUCATION,FURNITURE,BED,2,1994,1994-06-01 +104,49,CANADA,EAST,EDUCATION,FURNITURE,BED,3,1994,1994-07-01 +840,876,CANADA,EAST,EDUCATION,FURNITURE,BED,3,1994,1994-08-01 +704,66,CANADA,EAST,EDUCATION,FURNITURE,BED,3,1994,1994-09-01 +889,819,CANADA,EAST,EDUCATION,FURNITURE,BED,4,1994,1994-10-01 +107,351,CANADA,EAST,EDUCATION,FURNITURE,BED,4,1994,1994-11-01 +571,201,CANADA,EAST,EDUCATION,FURNITURE,BED,4,1994,1994-12-01 +688,209,CANADA,EAST,EDUCATION,OFFICE,TABLE,1,1993,1993-01-01 +544,51,CANADA,EAST,EDUCATION,OFFICE,TABLE,1,1993,1993-02-01 +954,135,CANADA,EAST,EDUCATION,OFFICE,TABLE,1,1993,1993-03-01 +445,47,CANADA,EAST,EDUCATION,OFFICE,TABLE,2,1993,1993-04-01 +829,379,CANADA,EAST,EDUCATION,OFFICE,TABLE,2,1993,1993-05-01 +464,758,CANADA,EAST,EDUCATION,OFFICE,TABLE,2,1993,1993-06-01 +968,475,CANADA,EAST,EDUCATION,OFFICE,TABLE,3,1993,1993-07-01 +842,343,CANADA,EAST,EDUCATION,OFFICE,TABLE,3,1993,1993-08-01 +721,507,CANADA,EAST,EDUCATION,OFFICE,TABLE,3,1993,1993-09-01 +966,269,CANADA,EAST,EDUCATION,OFFICE,TABLE,4,1993,1993-10-01 +332,699,CANADA,EAST,EDUCATION,OFFICE,TABLE,4,1993,1993-11-01 +328,824,CANADA,EAST,EDUCATION,OFFICE,TABLE,4,1993,1993-12-01 +355,497,CANADA,EAST,EDUCATION,OFFICE,TABLE,1,1994,1994-01-01 +506,44,CANADA,EAST,EDUCATION,OFFICE,TABLE,1,1994,1994-02-01 +585,522,CANADA,EAST,EDUCATION,OFFICE,TABLE,1,1994,1994-03-01 +634,378,CANADA,EAST,EDUCATION,OFFICE,TABLE,2,1994,1994-04-01 +662,689,CANADA,EAST,EDUCATION,OFFICE,TABLE,2,1994,1994-05-01 
+783,90,CANADA,EAST,EDUCATION,OFFICE,TABLE,2,1994,1994-06-01 +786,720,CANADA,EAST,EDUCATION,OFFICE,TABLE,3,1994,1994-07-01 +710,343,CANADA,EAST,EDUCATION,OFFICE,TABLE,3,1994,1994-08-01 +950,457,CANADA,EAST,EDUCATION,OFFICE,TABLE,3,1994,1994-09-01 +274,947,CANADA,EAST,EDUCATION,OFFICE,TABLE,4,1994,1994-10-01 +406,834,CANADA,EAST,EDUCATION,OFFICE,TABLE,4,1994,1994-11-01 +515,71,CANADA,EAST,EDUCATION,OFFICE,TABLE,4,1994,1994-12-01 +35,282,CANADA,EAST,EDUCATION,OFFICE,CHAIR,1,1993,1993-01-01 +995,538,CANADA,EAST,EDUCATION,OFFICE,CHAIR,1,1993,1993-02-01 +670,679,CANADA,EAST,EDUCATION,OFFICE,CHAIR,1,1993,1993-03-01 +406,601,CANADA,EAST,EDUCATION,OFFICE,CHAIR,2,1993,1993-04-01 +825,577,CANADA,EAST,EDUCATION,OFFICE,CHAIR,2,1993,1993-05-01 +467,908,CANADA,EAST,EDUCATION,OFFICE,CHAIR,2,1993,1993-06-01 +709,819,CANADA,EAST,EDUCATION,OFFICE,CHAIR,3,1993,1993-07-01 +522,687,CANADA,EAST,EDUCATION,OFFICE,CHAIR,3,1993,1993-08-01 +688,157,CANADA,EAST,EDUCATION,OFFICE,CHAIR,3,1993,1993-09-01 +956,111,CANADA,EAST,EDUCATION,OFFICE,CHAIR,4,1993,1993-10-01 +129,31,CANADA,EAST,EDUCATION,OFFICE,CHAIR,4,1993,1993-11-01 +687,790,CANADA,EAST,EDUCATION,OFFICE,CHAIR,4,1993,1993-12-01 +877,795,CANADA,EAST,EDUCATION,OFFICE,CHAIR,1,1994,1994-01-01 +845,379,CANADA,EAST,EDUCATION,OFFICE,CHAIR,1,1994,1994-02-01 +425,114,CANADA,EAST,EDUCATION,OFFICE,CHAIR,1,1994,1994-03-01 +899,475,CANADA,EAST,EDUCATION,OFFICE,CHAIR,2,1994,1994-04-01 +987,747,CANADA,EAST,EDUCATION,OFFICE,CHAIR,2,1994,1994-05-01 +641,372,CANADA,EAST,EDUCATION,OFFICE,CHAIR,2,1994,1994-06-01 +448,415,CANADA,EAST,EDUCATION,OFFICE,CHAIR,3,1994,1994-07-01 +341,955,CANADA,EAST,EDUCATION,OFFICE,CHAIR,3,1994,1994-08-01 +137,356,CANADA,EAST,EDUCATION,OFFICE,CHAIR,3,1994,1994-09-01 +235,316,CANADA,EAST,EDUCATION,OFFICE,CHAIR,4,1994,1994-10-01 +482,351,CANADA,EAST,EDUCATION,OFFICE,CHAIR,4,1994,1994-11-01 +678,164,CANADA,EAST,EDUCATION,OFFICE,CHAIR,4,1994,1994-12-01 +240,386,CANADA,EAST,EDUCATION,OFFICE,DESK,1,1993,1993-01-01 +605,113,CANADA,EAST,EDUCATION,OFFICE,DESK,1,1993,1993-02-01 +274,68,CANADA,EAST,EDUCATION,OFFICE,DESK,1,1993,1993-03-01 +422,885,CANADA,EAST,EDUCATION,OFFICE,DESK,2,1993,1993-04-01 +763,575,CANADA,EAST,EDUCATION,OFFICE,DESK,2,1993,1993-05-01 +561,743,CANADA,EAST,EDUCATION,OFFICE,DESK,2,1993,1993-06-01 +339,816,CANADA,EAST,EDUCATION,OFFICE,DESK,3,1993,1993-07-01 +877,203,CANADA,EAST,EDUCATION,OFFICE,DESK,3,1993,1993-08-01 +192,581,CANADA,EAST,EDUCATION,OFFICE,DESK,3,1993,1993-09-01 +604,815,CANADA,EAST,EDUCATION,OFFICE,DESK,4,1993,1993-10-01 +55,333,CANADA,EAST,EDUCATION,OFFICE,DESK,4,1993,1993-11-01 +87,40,CANADA,EAST,EDUCATION,OFFICE,DESK,4,1993,1993-12-01 +942,672,CANADA,EAST,EDUCATION,OFFICE,DESK,1,1994,1994-01-01 +912,23,CANADA,EAST,EDUCATION,OFFICE,DESK,1,1994,1994-02-01 +768,948,CANADA,EAST,EDUCATION,OFFICE,DESK,1,1994,1994-03-01 +951,291,CANADA,EAST,EDUCATION,OFFICE,DESK,2,1994,1994-04-01 +768,839,CANADA,EAST,EDUCATION,OFFICE,DESK,2,1994,1994-05-01 +978,864,CANADA,EAST,EDUCATION,OFFICE,DESK,2,1994,1994-06-01 +20,337,CANADA,EAST,EDUCATION,OFFICE,DESK,3,1994,1994-07-01 +298,95,CANADA,EAST,EDUCATION,OFFICE,DESK,3,1994,1994-08-01 +193,535,CANADA,EAST,EDUCATION,OFFICE,DESK,3,1994,1994-09-01 +336,191,CANADA,EAST,EDUCATION,OFFICE,DESK,4,1994,1994-10-01 +617,412,CANADA,EAST,EDUCATION,OFFICE,DESK,4,1994,1994-11-01 +709,711,CANADA,EAST,EDUCATION,OFFICE,DESK,4,1994,1994-12-01 +5,425,CANADA,EAST,CONSUMER,FURNITURE,SOFA,1,1993,1993-01-01 +164,215,CANADA,EAST,CONSUMER,FURNITURE,SOFA,1,1993,1993-02-01 
+422,948,CANADA,EAST,CONSUMER,FURNITURE,SOFA,1,1993,1993-03-01 +424,544,CANADA,EAST,CONSUMER,FURNITURE,SOFA,2,1993,1993-04-01 +854,764,CANADA,EAST,CONSUMER,FURNITURE,SOFA,2,1993,1993-05-01 +168,446,CANADA,EAST,CONSUMER,FURNITURE,SOFA,2,1993,1993-06-01 +8,957,CANADA,EAST,CONSUMER,FURNITURE,SOFA,3,1993,1993-07-01 +748,967,CANADA,EAST,CONSUMER,FURNITURE,SOFA,3,1993,1993-08-01 +682,11,CANADA,EAST,CONSUMER,FURNITURE,SOFA,3,1993,1993-09-01 +300,110,CANADA,EAST,CONSUMER,FURNITURE,SOFA,4,1993,1993-10-01 +672,263,CANADA,EAST,CONSUMER,FURNITURE,SOFA,4,1993,1993-11-01 +894,215,CANADA,EAST,CONSUMER,FURNITURE,SOFA,4,1993,1993-12-01 +944,965,CANADA,EAST,CONSUMER,FURNITURE,SOFA,1,1994,1994-01-01 +403,423,CANADA,EAST,CONSUMER,FURNITURE,SOFA,1,1994,1994-02-01 +596,753,CANADA,EAST,CONSUMER,FURNITURE,SOFA,1,1994,1994-03-01 +481,770,CANADA,EAST,CONSUMER,FURNITURE,SOFA,2,1994,1994-04-01 +503,263,CANADA,EAST,CONSUMER,FURNITURE,SOFA,2,1994,1994-05-01 +126,79,CANADA,EAST,CONSUMER,FURNITURE,SOFA,2,1994,1994-06-01 +721,441,CANADA,EAST,CONSUMER,FURNITURE,SOFA,3,1994,1994-07-01 +271,858,CANADA,EAST,CONSUMER,FURNITURE,SOFA,3,1994,1994-08-01 +721,667,CANADA,EAST,CONSUMER,FURNITURE,SOFA,3,1994,1994-09-01 +157,193,CANADA,EAST,CONSUMER,FURNITURE,SOFA,4,1994,1994-10-01 +991,394,CANADA,EAST,CONSUMER,FURNITURE,SOFA,4,1994,1994-11-01 +499,680,CANADA,EAST,CONSUMER,FURNITURE,SOFA,4,1994,1994-12-01 +284,414,CANADA,EAST,CONSUMER,FURNITURE,BED,1,1993,1993-01-01 +705,770,CANADA,EAST,CONSUMER,FURNITURE,BED,1,1993,1993-02-01 +737,679,CANADA,EAST,CONSUMER,FURNITURE,BED,1,1993,1993-03-01 +745,7,CANADA,EAST,CONSUMER,FURNITURE,BED,2,1993,1993-04-01 +633,713,CANADA,EAST,CONSUMER,FURNITURE,BED,2,1993,1993-05-01 +983,851,CANADA,EAST,CONSUMER,FURNITURE,BED,2,1993,1993-06-01 +591,944,CANADA,EAST,CONSUMER,FURNITURE,BED,3,1993,1993-07-01 +42,130,CANADA,EAST,CONSUMER,FURNITURE,BED,3,1993,1993-08-01 +771,485,CANADA,EAST,CONSUMER,FURNITURE,BED,3,1993,1993-09-01 +465,23,CANADA,EAST,CONSUMER,FURNITURE,BED,4,1993,1993-10-01 +296,193,CANADA,EAST,CONSUMER,FURNITURE,BED,4,1993,1993-11-01 +890,7,CANADA,EAST,CONSUMER,FURNITURE,BED,4,1993,1993-12-01 +312,919,CANADA,EAST,CONSUMER,FURNITURE,BED,1,1994,1994-01-01 +777,768,CANADA,EAST,CONSUMER,FURNITURE,BED,1,1994,1994-02-01 +364,854,CANADA,EAST,CONSUMER,FURNITURE,BED,1,1994,1994-03-01 +601,411,CANADA,EAST,CONSUMER,FURNITURE,BED,2,1994,1994-04-01 +823,736,CANADA,EAST,CONSUMER,FURNITURE,BED,2,1994,1994-05-01 +847,10,CANADA,EAST,CONSUMER,FURNITURE,BED,2,1994,1994-06-01 +490,311,CANADA,EAST,CONSUMER,FURNITURE,BED,3,1994,1994-07-01 +387,348,CANADA,EAST,CONSUMER,FURNITURE,BED,3,1994,1994-08-01 +688,458,CANADA,EAST,CONSUMER,FURNITURE,BED,3,1994,1994-09-01 +650,195,CANADA,EAST,CONSUMER,FURNITURE,BED,4,1994,1994-10-01 +447,658,CANADA,EAST,CONSUMER,FURNITURE,BED,4,1994,1994-11-01 +91,704,CANADA,EAST,CONSUMER,FURNITURE,BED,4,1994,1994-12-01 +197,807,CANADA,EAST,CONSUMER,OFFICE,TABLE,1,1993,1993-01-01 +51,861,CANADA,EAST,CONSUMER,OFFICE,TABLE,1,1993,1993-02-01 +570,873,CANADA,EAST,CONSUMER,OFFICE,TABLE,1,1993,1993-03-01 +423,933,CANADA,EAST,CONSUMER,OFFICE,TABLE,2,1993,1993-04-01 +524,355,CANADA,EAST,CONSUMER,OFFICE,TABLE,2,1993,1993-05-01 +416,794,CANADA,EAST,CONSUMER,OFFICE,TABLE,2,1993,1993-06-01 +789,645,CANADA,EAST,CONSUMER,OFFICE,TABLE,3,1993,1993-07-01 +551,700,CANADA,EAST,CONSUMER,OFFICE,TABLE,3,1993,1993-08-01 +400,831,CANADA,EAST,CONSUMER,OFFICE,TABLE,3,1993,1993-09-01 +361,800,CANADA,EAST,CONSUMER,OFFICE,TABLE,4,1993,1993-10-01 +189,830,CANADA,EAST,CONSUMER,OFFICE,TABLE,4,1993,1993-11-01 
+554,828,CANADA,EAST,CONSUMER,OFFICE,TABLE,4,1993,1993-12-01 +585,12,CANADA,EAST,CONSUMER,OFFICE,TABLE,1,1994,1994-01-01 +281,501,CANADA,EAST,CONSUMER,OFFICE,TABLE,1,1994,1994-02-01 +629,914,CANADA,EAST,CONSUMER,OFFICE,TABLE,1,1994,1994-03-01 +43,685,CANADA,EAST,CONSUMER,OFFICE,TABLE,2,1994,1994-04-01 +533,755,CANADA,EAST,CONSUMER,OFFICE,TABLE,2,1994,1994-05-01 +882,708,CANADA,EAST,CONSUMER,OFFICE,TABLE,2,1994,1994-06-01 +790,595,CANADA,EAST,CONSUMER,OFFICE,TABLE,3,1994,1994-07-01 +600,32,CANADA,EAST,CONSUMER,OFFICE,TABLE,3,1994,1994-08-01 +148,49,CANADA,EAST,CONSUMER,OFFICE,TABLE,3,1994,1994-09-01 +237,727,CANADA,EAST,CONSUMER,OFFICE,TABLE,4,1994,1994-10-01 +488,239,CANADA,EAST,CONSUMER,OFFICE,TABLE,4,1994,1994-11-01 +457,273,CANADA,EAST,CONSUMER,OFFICE,TABLE,4,1994,1994-12-01 +401,986,CANADA,EAST,CONSUMER,OFFICE,CHAIR,1,1993,1993-01-01 +181,544,CANADA,EAST,CONSUMER,OFFICE,CHAIR,1,1993,1993-02-01 +995,182,CANADA,EAST,CONSUMER,OFFICE,CHAIR,1,1993,1993-03-01 +120,197,CANADA,EAST,CONSUMER,OFFICE,CHAIR,2,1993,1993-04-01 +119,435,CANADA,EAST,CONSUMER,OFFICE,CHAIR,2,1993,1993-05-01 +319,974,CANADA,EAST,CONSUMER,OFFICE,CHAIR,2,1993,1993-06-01 +333,524,CANADA,EAST,CONSUMER,OFFICE,CHAIR,3,1993,1993-07-01 +923,688,CANADA,EAST,CONSUMER,OFFICE,CHAIR,3,1993,1993-08-01 +634,750,CANADA,EAST,CONSUMER,OFFICE,CHAIR,3,1993,1993-09-01 +493,155,CANADA,EAST,CONSUMER,OFFICE,CHAIR,4,1993,1993-10-01 +461,860,CANADA,EAST,CONSUMER,OFFICE,CHAIR,4,1993,1993-11-01 +304,102,CANADA,EAST,CONSUMER,OFFICE,CHAIR,4,1993,1993-12-01 +641,425,CANADA,EAST,CONSUMER,OFFICE,CHAIR,1,1994,1994-01-01 +992,224,CANADA,EAST,CONSUMER,OFFICE,CHAIR,1,1994,1994-02-01 +202,408,CANADA,EAST,CONSUMER,OFFICE,CHAIR,1,1994,1994-03-01 +770,524,CANADA,EAST,CONSUMER,OFFICE,CHAIR,2,1994,1994-04-01 +202,816,CANADA,EAST,CONSUMER,OFFICE,CHAIR,2,1994,1994-05-01 +14,515,CANADA,EAST,CONSUMER,OFFICE,CHAIR,2,1994,1994-06-01 +134,793,CANADA,EAST,CONSUMER,OFFICE,CHAIR,3,1994,1994-07-01 +977,460,CANADA,EAST,CONSUMER,OFFICE,CHAIR,3,1994,1994-08-01 +174,732,CANADA,EAST,CONSUMER,OFFICE,CHAIR,3,1994,1994-09-01 +429,435,CANADA,EAST,CONSUMER,OFFICE,CHAIR,4,1994,1994-10-01 +514,38,CANADA,EAST,CONSUMER,OFFICE,CHAIR,4,1994,1994-11-01 +784,616,CANADA,EAST,CONSUMER,OFFICE,CHAIR,4,1994,1994-12-01 +973,225,CANADA,EAST,CONSUMER,OFFICE,DESK,1,1993,1993-01-01 +511,402,CANADA,EAST,CONSUMER,OFFICE,DESK,1,1993,1993-02-01 +30,697,CANADA,EAST,CONSUMER,OFFICE,DESK,1,1993,1993-03-01 +895,567,CANADA,EAST,CONSUMER,OFFICE,DESK,2,1993,1993-04-01 +557,231,CANADA,EAST,CONSUMER,OFFICE,DESK,2,1993,1993-05-01 +282,372,CANADA,EAST,CONSUMER,OFFICE,DESK,2,1993,1993-06-01 +909,15,CANADA,EAST,CONSUMER,OFFICE,DESK,3,1993,1993-07-01 +276,866,CANADA,EAST,CONSUMER,OFFICE,DESK,3,1993,1993-08-01 +234,452,CANADA,EAST,CONSUMER,OFFICE,DESK,3,1993,1993-09-01 +479,663,CANADA,EAST,CONSUMER,OFFICE,DESK,4,1993,1993-10-01 +782,982,CANADA,EAST,CONSUMER,OFFICE,DESK,4,1993,1993-11-01 +755,813,CANADA,EAST,CONSUMER,OFFICE,DESK,4,1993,1993-12-01 +689,523,CANADA,EAST,CONSUMER,OFFICE,DESK,1,1994,1994-01-01 +496,871,CANADA,EAST,CONSUMER,OFFICE,DESK,1,1994,1994-02-01 +24,511,CANADA,EAST,CONSUMER,OFFICE,DESK,1,1994,1994-03-01 +379,819,CANADA,EAST,CONSUMER,OFFICE,DESK,2,1994,1994-04-01 +441,525,CANADA,EAST,CONSUMER,OFFICE,DESK,2,1994,1994-05-01 +49,13,CANADA,EAST,CONSUMER,OFFICE,DESK,2,1994,1994-06-01 +243,694,CANADA,EAST,CONSUMER,OFFICE,DESK,3,1994,1994-07-01 +295,782,CANADA,EAST,CONSUMER,OFFICE,DESK,3,1994,1994-08-01 +395,839,CANADA,EAST,CONSUMER,OFFICE,DESK,3,1994,1994-09-01 
+929,461,CANADA,EAST,CONSUMER,OFFICE,DESK,4,1994,1994-10-01 +997,303,CANADA,EAST,CONSUMER,OFFICE,DESK,4,1994,1994-11-01 +889,421,CANADA,EAST,CONSUMER,OFFICE,DESK,4,1994,1994-12-01 +72,421,CANADA,WEST,EDUCATION,FURNITURE,SOFA,1,1993,1993-01-01 +926,433,CANADA,WEST,EDUCATION,FURNITURE,SOFA,1,1993,1993-02-01 +850,394,CANADA,WEST,EDUCATION,FURNITURE,SOFA,1,1993,1993-03-01 +826,338,CANADA,WEST,EDUCATION,FURNITURE,SOFA,2,1993,1993-04-01 +651,764,CANADA,WEST,EDUCATION,FURNITURE,SOFA,2,1993,1993-05-01 +854,216,CANADA,WEST,EDUCATION,FURNITURE,SOFA,2,1993,1993-06-01 +899,96,CANADA,WEST,EDUCATION,FURNITURE,SOFA,3,1993,1993-07-01 +309,550,CANADA,WEST,EDUCATION,FURNITURE,SOFA,3,1993,1993-08-01 +943,636,CANADA,WEST,EDUCATION,FURNITURE,SOFA,3,1993,1993-09-01 +138,427,CANADA,WEST,EDUCATION,FURNITURE,SOFA,4,1993,1993-10-01 +99,652,CANADA,WEST,EDUCATION,FURNITURE,SOFA,4,1993,1993-11-01 +270,478,CANADA,WEST,EDUCATION,FURNITURE,SOFA,4,1993,1993-12-01 +862,18,CANADA,WEST,EDUCATION,FURNITURE,SOFA,1,1994,1994-01-01 +574,40,CANADA,WEST,EDUCATION,FURNITURE,SOFA,1,1994,1994-02-01 +359,453,CANADA,WEST,EDUCATION,FURNITURE,SOFA,1,1994,1994-03-01 +958,987,CANADA,WEST,EDUCATION,FURNITURE,SOFA,2,1994,1994-04-01 +791,26,CANADA,WEST,EDUCATION,FURNITURE,SOFA,2,1994,1994-05-01 +284,101,CANADA,WEST,EDUCATION,FURNITURE,SOFA,2,1994,1994-06-01 +190,969,CANADA,WEST,EDUCATION,FURNITURE,SOFA,3,1994,1994-07-01 +527,492,CANADA,WEST,EDUCATION,FURNITURE,SOFA,3,1994,1994-08-01 +112,263,CANADA,WEST,EDUCATION,FURNITURE,SOFA,3,1994,1994-09-01 +271,593,CANADA,WEST,EDUCATION,FURNITURE,SOFA,4,1994,1994-10-01 +643,923,CANADA,WEST,EDUCATION,FURNITURE,SOFA,4,1994,1994-11-01 +554,146,CANADA,WEST,EDUCATION,FURNITURE,SOFA,4,1994,1994-12-01 +211,305,CANADA,WEST,EDUCATION,FURNITURE,BED,1,1993,1993-01-01 +368,318,CANADA,WEST,EDUCATION,FURNITURE,BED,1,1993,1993-02-01 +778,417,CANADA,WEST,EDUCATION,FURNITURE,BED,1,1993,1993-03-01 +808,623,CANADA,WEST,EDUCATION,FURNITURE,BED,2,1993,1993-04-01 +46,761,CANADA,WEST,EDUCATION,FURNITURE,BED,2,1993,1993-05-01 +466,272,CANADA,WEST,EDUCATION,FURNITURE,BED,2,1993,1993-06-01 +18,988,CANADA,WEST,EDUCATION,FURNITURE,BED,3,1993,1993-07-01 +87,821,CANADA,WEST,EDUCATION,FURNITURE,BED,3,1993,1993-08-01 +765,962,CANADA,WEST,EDUCATION,FURNITURE,BED,3,1993,1993-09-01 +62,615,CANADA,WEST,EDUCATION,FURNITURE,BED,4,1993,1993-10-01 +13,523,CANADA,WEST,EDUCATION,FURNITURE,BED,4,1993,1993-11-01 +775,806,CANADA,WEST,EDUCATION,FURNITURE,BED,4,1993,1993-12-01 +636,586,CANADA,WEST,EDUCATION,FURNITURE,BED,1,1994,1994-01-01 +458,520,CANADA,WEST,EDUCATION,FURNITURE,BED,1,1994,1994-02-01 +206,908,CANADA,WEST,EDUCATION,FURNITURE,BED,1,1994,1994-03-01 +310,30,CANADA,WEST,EDUCATION,FURNITURE,BED,2,1994,1994-04-01 +813,247,CANADA,WEST,EDUCATION,FURNITURE,BED,2,1994,1994-05-01 +22,647,CANADA,WEST,EDUCATION,FURNITURE,BED,2,1994,1994-06-01 +742,55,CANADA,WEST,EDUCATION,FURNITURE,BED,3,1994,1994-07-01 +394,154,CANADA,WEST,EDUCATION,FURNITURE,BED,3,1994,1994-08-01 +957,344,CANADA,WEST,EDUCATION,FURNITURE,BED,3,1994,1994-09-01 +205,95,CANADA,WEST,EDUCATION,FURNITURE,BED,4,1994,1994-10-01 +198,665,CANADA,WEST,EDUCATION,FURNITURE,BED,4,1994,1994-11-01 +638,145,CANADA,WEST,EDUCATION,FURNITURE,BED,4,1994,1994-12-01 +155,925,CANADA,WEST,EDUCATION,OFFICE,TABLE,1,1993,1993-01-01 +688,395,CANADA,WEST,EDUCATION,OFFICE,TABLE,1,1993,1993-02-01 +730,749,CANADA,WEST,EDUCATION,OFFICE,TABLE,1,1993,1993-03-01 +208,279,CANADA,WEST,EDUCATION,OFFICE,TABLE,2,1993,1993-04-01 +525,288,CANADA,WEST,EDUCATION,OFFICE,TABLE,2,1993,1993-05-01 
+483,509,CANADA,WEST,EDUCATION,OFFICE,TABLE,2,1993,1993-06-01 +748,255,CANADA,WEST,EDUCATION,OFFICE,TABLE,3,1993,1993-07-01 +6,214,CANADA,WEST,EDUCATION,OFFICE,TABLE,3,1993,1993-08-01 +168,473,CANADA,WEST,EDUCATION,OFFICE,TABLE,3,1993,1993-09-01 +301,702,CANADA,WEST,EDUCATION,OFFICE,TABLE,4,1993,1993-10-01 +9,814,CANADA,WEST,EDUCATION,OFFICE,TABLE,4,1993,1993-11-01 +778,231,CANADA,WEST,EDUCATION,OFFICE,TABLE,4,1993,1993-12-01 +799,422,CANADA,WEST,EDUCATION,OFFICE,TABLE,1,1994,1994-01-01 +309,572,CANADA,WEST,EDUCATION,OFFICE,TABLE,1,1994,1994-02-01 +433,363,CANADA,WEST,EDUCATION,OFFICE,TABLE,1,1994,1994-03-01 +969,919,CANADA,WEST,EDUCATION,OFFICE,TABLE,2,1994,1994-04-01 +181,355,CANADA,WEST,EDUCATION,OFFICE,TABLE,2,1994,1994-05-01 +787,992,CANADA,WEST,EDUCATION,OFFICE,TABLE,2,1994,1994-06-01 +971,147,CANADA,WEST,EDUCATION,OFFICE,TABLE,3,1994,1994-07-01 +440,183,CANADA,WEST,EDUCATION,OFFICE,TABLE,3,1994,1994-08-01 +209,375,CANADA,WEST,EDUCATION,OFFICE,TABLE,3,1994,1994-09-01 +537,77,CANADA,WEST,EDUCATION,OFFICE,TABLE,4,1994,1994-10-01 +364,308,CANADA,WEST,EDUCATION,OFFICE,TABLE,4,1994,1994-11-01 +377,660,CANADA,WEST,EDUCATION,OFFICE,TABLE,4,1994,1994-12-01 +251,555,CANADA,WEST,EDUCATION,OFFICE,CHAIR,1,1993,1993-01-01 +607,455,CANADA,WEST,EDUCATION,OFFICE,CHAIR,1,1993,1993-02-01 +127,888,CANADA,WEST,EDUCATION,OFFICE,CHAIR,1,1993,1993-03-01 +513,652,CANADA,WEST,EDUCATION,OFFICE,CHAIR,2,1993,1993-04-01 +146,799,CANADA,WEST,EDUCATION,OFFICE,CHAIR,2,1993,1993-05-01 +917,249,CANADA,WEST,EDUCATION,OFFICE,CHAIR,2,1993,1993-06-01 +776,539,CANADA,WEST,EDUCATION,OFFICE,CHAIR,3,1993,1993-07-01 +330,198,CANADA,WEST,EDUCATION,OFFICE,CHAIR,3,1993,1993-08-01 +981,340,CANADA,WEST,EDUCATION,OFFICE,CHAIR,3,1993,1993-09-01 +862,152,CANADA,WEST,EDUCATION,OFFICE,CHAIR,4,1993,1993-10-01 +612,347,CANADA,WEST,EDUCATION,OFFICE,CHAIR,4,1993,1993-11-01 +607,565,CANADA,WEST,EDUCATION,OFFICE,CHAIR,4,1993,1993-12-01 +786,855,CANADA,WEST,EDUCATION,OFFICE,CHAIR,1,1994,1994-01-01 +160,87,CANADA,WEST,EDUCATION,OFFICE,CHAIR,1,1994,1994-02-01 +199,69,CANADA,WEST,EDUCATION,OFFICE,CHAIR,1,1994,1994-03-01 +972,807,CANADA,WEST,EDUCATION,OFFICE,CHAIR,2,1994,1994-04-01 +870,565,CANADA,WEST,EDUCATION,OFFICE,CHAIR,2,1994,1994-05-01 +494,798,CANADA,WEST,EDUCATION,OFFICE,CHAIR,2,1994,1994-06-01 +975,714,CANADA,WEST,EDUCATION,OFFICE,CHAIR,3,1994,1994-07-01 +760,17,CANADA,WEST,EDUCATION,OFFICE,CHAIR,3,1994,1994-08-01 +180,797,CANADA,WEST,EDUCATION,OFFICE,CHAIR,3,1994,1994-09-01 +256,422,CANADA,WEST,EDUCATION,OFFICE,CHAIR,4,1994,1994-10-01 +422,621,CANADA,WEST,EDUCATION,OFFICE,CHAIR,4,1994,1994-11-01 +859,661,CANADA,WEST,EDUCATION,OFFICE,CHAIR,4,1994,1994-12-01 +586,363,CANADA,WEST,EDUCATION,OFFICE,DESK,1,1993,1993-01-01 +441,910,CANADA,WEST,EDUCATION,OFFICE,DESK,1,1993,1993-02-01 +597,998,CANADA,WEST,EDUCATION,OFFICE,DESK,1,1993,1993-03-01 +717,95,CANADA,WEST,EDUCATION,OFFICE,DESK,2,1993,1993-04-01 +713,731,CANADA,WEST,EDUCATION,OFFICE,DESK,2,1993,1993-05-01 +591,718,CANADA,WEST,EDUCATION,OFFICE,DESK,2,1993,1993-06-01 +492,467,CANADA,WEST,EDUCATION,OFFICE,DESK,3,1993,1993-07-01 +170,126,CANADA,WEST,EDUCATION,OFFICE,DESK,3,1993,1993-08-01 +684,127,CANADA,WEST,EDUCATION,OFFICE,DESK,3,1993,1993-09-01 +981,746,CANADA,WEST,EDUCATION,OFFICE,DESK,4,1993,1993-10-01 +966,878,CANADA,WEST,EDUCATION,OFFICE,DESK,4,1993,1993-11-01 +439,27,CANADA,WEST,EDUCATION,OFFICE,DESK,4,1993,1993-12-01 +151,569,CANADA,WEST,EDUCATION,OFFICE,DESK,1,1994,1994-01-01 +602,812,CANADA,WEST,EDUCATION,OFFICE,DESK,1,1994,1994-02-01 
+187,603,CANADA,WEST,EDUCATION,OFFICE,DESK,1,1994,1994-03-01 +415,506,CANADA,WEST,EDUCATION,OFFICE,DESK,2,1994,1994-04-01 +61,185,CANADA,WEST,EDUCATION,OFFICE,DESK,2,1994,1994-05-01 +839,692,CANADA,WEST,EDUCATION,OFFICE,DESK,2,1994,1994-06-01 +596,565,CANADA,WEST,EDUCATION,OFFICE,DESK,3,1994,1994-07-01 +751,512,CANADA,WEST,EDUCATION,OFFICE,DESK,3,1994,1994-08-01 +460,86,CANADA,WEST,EDUCATION,OFFICE,DESK,3,1994,1994-09-01 +922,399,CANADA,WEST,EDUCATION,OFFICE,DESK,4,1994,1994-10-01 +153,672,CANADA,WEST,EDUCATION,OFFICE,DESK,4,1994,1994-11-01 +928,801,CANADA,WEST,EDUCATION,OFFICE,DESK,4,1994,1994-12-01 +951,730,CANADA,WEST,CONSUMER,FURNITURE,SOFA,1,1993,1993-01-01 +394,408,CANADA,WEST,CONSUMER,FURNITURE,SOFA,1,1993,1993-02-01 +615,982,CANADA,WEST,CONSUMER,FURNITURE,SOFA,1,1993,1993-03-01 +653,499,CANADA,WEST,CONSUMER,FURNITURE,SOFA,2,1993,1993-04-01 +180,307,CANADA,WEST,CONSUMER,FURNITURE,SOFA,2,1993,1993-05-01 +649,741,CANADA,WEST,CONSUMER,FURNITURE,SOFA,2,1993,1993-06-01 +921,640,CANADA,WEST,CONSUMER,FURNITURE,SOFA,3,1993,1993-07-01 +11,300,CANADA,WEST,CONSUMER,FURNITURE,SOFA,3,1993,1993-08-01 +696,929,CANADA,WEST,CONSUMER,FURNITURE,SOFA,3,1993,1993-09-01 +795,309,CANADA,WEST,CONSUMER,FURNITURE,SOFA,4,1993,1993-10-01 +550,340,CANADA,WEST,CONSUMER,FURNITURE,SOFA,4,1993,1993-11-01 +320,228,CANADA,WEST,CONSUMER,FURNITURE,SOFA,4,1993,1993-12-01 +845,1000,CANADA,WEST,CONSUMER,FURNITURE,SOFA,1,1994,1994-01-01 +245,21,CANADA,WEST,CONSUMER,FURNITURE,SOFA,1,1994,1994-02-01 +142,583,CANADA,WEST,CONSUMER,FURNITURE,SOFA,1,1994,1994-03-01 +717,506,CANADA,WEST,CONSUMER,FURNITURE,SOFA,2,1994,1994-04-01 +3,405,CANADA,WEST,CONSUMER,FURNITURE,SOFA,2,1994,1994-05-01 +790,556,CANADA,WEST,CONSUMER,FURNITURE,SOFA,2,1994,1994-06-01 +646,72,CANADA,WEST,CONSUMER,FURNITURE,SOFA,3,1994,1994-07-01 +230,103,CANADA,WEST,CONSUMER,FURNITURE,SOFA,3,1994,1994-08-01 +938,262,CANADA,WEST,CONSUMER,FURNITURE,SOFA,3,1994,1994-09-01 +629,102,CANADA,WEST,CONSUMER,FURNITURE,SOFA,4,1994,1994-10-01 +317,841,CANADA,WEST,CONSUMER,FURNITURE,SOFA,4,1994,1994-11-01 +812,159,CANADA,WEST,CONSUMER,FURNITURE,SOFA,4,1994,1994-12-01 +141,570,CANADA,WEST,CONSUMER,FURNITURE,BED,1,1993,1993-01-01 +64,375,CANADA,WEST,CONSUMER,FURNITURE,BED,1,1993,1993-02-01 +207,298,CANADA,WEST,CONSUMER,FURNITURE,BED,1,1993,1993-03-01 +435,32,CANADA,WEST,CONSUMER,FURNITURE,BED,2,1993,1993-04-01 +96,760,CANADA,WEST,CONSUMER,FURNITURE,BED,2,1993,1993-05-01 +252,338,CANADA,WEST,CONSUMER,FURNITURE,BED,2,1993,1993-06-01 +956,149,CANADA,WEST,CONSUMER,FURNITURE,BED,3,1993,1993-07-01 +633,343,CANADA,WEST,CONSUMER,FURNITURE,BED,3,1993,1993-08-01 +190,151,CANADA,WEST,CONSUMER,FURNITURE,BED,3,1993,1993-09-01 +227,44,CANADA,WEST,CONSUMER,FURNITURE,BED,4,1993,1993-10-01 +24,583,CANADA,WEST,CONSUMER,FURNITURE,BED,4,1993,1993-11-01 +420,230,CANADA,WEST,CONSUMER,FURNITURE,BED,4,1993,1993-12-01 +910,907,CANADA,WEST,CONSUMER,FURNITURE,BED,1,1994,1994-01-01 +709,783,CANADA,WEST,CONSUMER,FURNITURE,BED,1,1994,1994-02-01 +810,117,CANADA,WEST,CONSUMER,FURNITURE,BED,1,1994,1994-03-01 +723,416,CANADA,WEST,CONSUMER,FURNITURE,BED,2,1994,1994-04-01 +911,318,CANADA,WEST,CONSUMER,FURNITURE,BED,2,1994,1994-05-01 +230,888,CANADA,WEST,CONSUMER,FURNITURE,BED,2,1994,1994-06-01 +448,60,CANADA,WEST,CONSUMER,FURNITURE,BED,3,1994,1994-07-01 +945,596,CANADA,WEST,CONSUMER,FURNITURE,BED,3,1994,1994-08-01 +508,576,CANADA,WEST,CONSUMER,FURNITURE,BED,3,1994,1994-09-01 +262,576,CANADA,WEST,CONSUMER,FURNITURE,BED,4,1994,1994-10-01 +441,280,CANADA,WEST,CONSUMER,FURNITURE,BED,4,1994,1994-11-01 
+15,219,CANADA,WEST,CONSUMER,FURNITURE,BED,4,1994,1994-12-01 +795,133,CANADA,WEST,CONSUMER,OFFICE,TABLE,1,1993,1993-01-01 +301,273,CANADA,WEST,CONSUMER,OFFICE,TABLE,1,1993,1993-02-01 +304,86,CANADA,WEST,CONSUMER,OFFICE,TABLE,1,1993,1993-03-01 +49,400,CANADA,WEST,CONSUMER,OFFICE,TABLE,2,1993,1993-04-01 +576,364,CANADA,WEST,CONSUMER,OFFICE,TABLE,2,1993,1993-05-01 +669,63,CANADA,WEST,CONSUMER,OFFICE,TABLE,2,1993,1993-06-01 +325,929,CANADA,WEST,CONSUMER,OFFICE,TABLE,3,1993,1993-07-01 +272,344,CANADA,WEST,CONSUMER,OFFICE,TABLE,3,1993,1993-08-01 +80,768,CANADA,WEST,CONSUMER,OFFICE,TABLE,3,1993,1993-09-01 +46,668,CANADA,WEST,CONSUMER,OFFICE,TABLE,4,1993,1993-10-01 +223,407,CANADA,WEST,CONSUMER,OFFICE,TABLE,4,1993,1993-11-01 +774,536,CANADA,WEST,CONSUMER,OFFICE,TABLE,4,1993,1993-12-01 +784,657,CANADA,WEST,CONSUMER,OFFICE,TABLE,1,1994,1994-01-01 +92,215,CANADA,WEST,CONSUMER,OFFICE,TABLE,1,1994,1994-02-01 +67,966,CANADA,WEST,CONSUMER,OFFICE,TABLE,1,1994,1994-03-01 +747,674,CANADA,WEST,CONSUMER,OFFICE,TABLE,2,1994,1994-04-01 +686,574,CANADA,WEST,CONSUMER,OFFICE,TABLE,2,1994,1994-05-01 +93,266,CANADA,WEST,CONSUMER,OFFICE,TABLE,2,1994,1994-06-01 +192,680,CANADA,WEST,CONSUMER,OFFICE,TABLE,3,1994,1994-07-01 +51,362,CANADA,WEST,CONSUMER,OFFICE,TABLE,3,1994,1994-08-01 +498,412,CANADA,WEST,CONSUMER,OFFICE,TABLE,3,1994,1994-09-01 +546,431,CANADA,WEST,CONSUMER,OFFICE,TABLE,4,1994,1994-10-01 +485,94,CANADA,WEST,CONSUMER,OFFICE,TABLE,4,1994,1994-11-01 +925,345,CANADA,WEST,CONSUMER,OFFICE,TABLE,4,1994,1994-12-01 +292,445,CANADA,WEST,CONSUMER,OFFICE,CHAIR,1,1993,1993-01-01 +540,632,CANADA,WEST,CONSUMER,OFFICE,CHAIR,1,1993,1993-02-01 +21,855,CANADA,WEST,CONSUMER,OFFICE,CHAIR,1,1993,1993-03-01 +100,36,CANADA,WEST,CONSUMER,OFFICE,CHAIR,2,1993,1993-04-01 +49,250,CANADA,WEST,CONSUMER,OFFICE,CHAIR,2,1993,1993-05-01 +353,427,CANADA,WEST,CONSUMER,OFFICE,CHAIR,2,1993,1993-06-01 +911,367,CANADA,WEST,CONSUMER,OFFICE,CHAIR,3,1993,1993-07-01 +823,245,CANADA,WEST,CONSUMER,OFFICE,CHAIR,3,1993,1993-08-01 +278,893,CANADA,WEST,CONSUMER,OFFICE,CHAIR,3,1993,1993-09-01 +576,490,CANADA,WEST,CONSUMER,OFFICE,CHAIR,4,1993,1993-10-01 +655,88,CANADA,WEST,CONSUMER,OFFICE,CHAIR,4,1993,1993-11-01 +763,964,CANADA,WEST,CONSUMER,OFFICE,CHAIR,4,1993,1993-12-01 +88,62,CANADA,WEST,CONSUMER,OFFICE,CHAIR,1,1994,1994-01-01 +746,506,CANADA,WEST,CONSUMER,OFFICE,CHAIR,1,1994,1994-02-01 +927,680,CANADA,WEST,CONSUMER,OFFICE,CHAIR,1,1994,1994-03-01 +297,153,CANADA,WEST,CONSUMER,OFFICE,CHAIR,2,1994,1994-04-01 +291,403,CANADA,WEST,CONSUMER,OFFICE,CHAIR,2,1994,1994-05-01 +838,98,CANADA,WEST,CONSUMER,OFFICE,CHAIR,2,1994,1994-06-01 +112,376,CANADA,WEST,CONSUMER,OFFICE,CHAIR,3,1994,1994-07-01 +509,477,CANADA,WEST,CONSUMER,OFFICE,CHAIR,3,1994,1994-08-01 +472,50,CANADA,WEST,CONSUMER,OFFICE,CHAIR,3,1994,1994-09-01 +495,592,CANADA,WEST,CONSUMER,OFFICE,CHAIR,4,1994,1994-10-01 +1000,813,CANADA,WEST,CONSUMER,OFFICE,CHAIR,4,1994,1994-11-01 +241,740,CANADA,WEST,CONSUMER,OFFICE,CHAIR,4,1994,1994-12-01 +693,873,CANADA,WEST,CONSUMER,OFFICE,DESK,1,1993,1993-01-01 +903,459,CANADA,WEST,CONSUMER,OFFICE,DESK,1,1993,1993-02-01 +791,224,CANADA,WEST,CONSUMER,OFFICE,DESK,1,1993,1993-03-01 +108,562,CANADA,WEST,CONSUMER,OFFICE,DESK,2,1993,1993-04-01 +845,199,CANADA,WEST,CONSUMER,OFFICE,DESK,2,1993,1993-05-01 +452,275,CANADA,WEST,CONSUMER,OFFICE,DESK,2,1993,1993-06-01 +479,355,CANADA,WEST,CONSUMER,OFFICE,DESK,3,1993,1993-07-01 +410,947,CANADA,WEST,CONSUMER,OFFICE,DESK,3,1993,1993-08-01 +379,454,CANADA,WEST,CONSUMER,OFFICE,DESK,3,1993,1993-09-01 
+740,450,CANADA,WEST,CONSUMER,OFFICE,DESK,4,1993,1993-10-01 +471,575,CANADA,WEST,CONSUMER,OFFICE,DESK,4,1993,1993-11-01 +325,6,CANADA,WEST,CONSUMER,OFFICE,DESK,4,1993,1993-12-01 +455,847,CANADA,WEST,CONSUMER,OFFICE,DESK,1,1994,1994-01-01 +563,338,CANADA,WEST,CONSUMER,OFFICE,DESK,1,1994,1994-02-01 +879,517,CANADA,WEST,CONSUMER,OFFICE,DESK,1,1994,1994-03-01 +312,630,CANADA,WEST,CONSUMER,OFFICE,DESK,2,1994,1994-04-01 +587,381,CANADA,WEST,CONSUMER,OFFICE,DESK,2,1994,1994-05-01 +628,864,CANADA,WEST,CONSUMER,OFFICE,DESK,2,1994,1994-06-01 +486,416,CANADA,WEST,CONSUMER,OFFICE,DESK,3,1994,1994-07-01 +811,852,CANADA,WEST,CONSUMER,OFFICE,DESK,3,1994,1994-08-01 +990,815,CANADA,WEST,CONSUMER,OFFICE,DESK,3,1994,1994-09-01 +35,23,CANADA,WEST,CONSUMER,OFFICE,DESK,4,1994,1994-10-01 +764,527,CANADA,WEST,CONSUMER,OFFICE,DESK,4,1994,1994-11-01 +619,693,CANADA,WEST,CONSUMER,OFFICE,DESK,4,1994,1994-12-01 +996,977,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,1,1993,1993-01-01 +554,549,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,1,1993,1993-02-01 +540,951,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,1,1993,1993-03-01 +140,390,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,2,1993,1993-04-01 +554,204,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,2,1993,1993-05-01 +724,78,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,2,1993,1993-06-01 +693,613,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,3,1993,1993-07-01 +866,745,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,3,1993,1993-08-01 +833,56,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,3,1993,1993-09-01 +164,887,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,4,1993,1993-10-01 +753,651,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,4,1993,1993-11-01 +60,691,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,4,1993,1993-12-01 +688,767,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,1,1994,1994-01-01 +883,709,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,1,1994,1994-02-01 +109,417,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,1,1994,1994-03-01 +950,326,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,2,1994,1994-04-01 +438,599,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,2,1994,1994-05-01 +286,818,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,2,1994,1994-06-01 +342,13,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,3,1994,1994-07-01 +383,185,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,3,1994,1994-08-01 +80,140,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,3,1994,1994-09-01 +322,717,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,4,1994,1994-10-01 +749,852,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,4,1994,1994-11-01 +606,125,GERMANY,EAST,EDUCATION,FURNITURE,SOFA,4,1994,1994-12-01 +641,325,GERMANY,EAST,EDUCATION,FURNITURE,BED,1,1993,1993-01-01 +494,648,GERMANY,EAST,EDUCATION,FURNITURE,BED,1,1993,1993-02-01 +428,365,GERMANY,EAST,EDUCATION,FURNITURE,BED,1,1993,1993-03-01 +936,120,GERMANY,EAST,EDUCATION,FURNITURE,BED,2,1993,1993-04-01 +597,347,GERMANY,EAST,EDUCATION,FURNITURE,BED,2,1993,1993-05-01 +728,638,GERMANY,EAST,EDUCATION,FURNITURE,BED,2,1993,1993-06-01 +933,732,GERMANY,EAST,EDUCATION,FURNITURE,BED,3,1993,1993-07-01 +663,465,GERMANY,EAST,EDUCATION,FURNITURE,BED,3,1993,1993-08-01 +394,262,GERMANY,EAST,EDUCATION,FURNITURE,BED,3,1993,1993-09-01 +334,947,GERMANY,EAST,EDUCATION,FURNITURE,BED,4,1993,1993-10-01 +114,694,GERMANY,EAST,EDUCATION,FURNITURE,BED,4,1993,1993-11-01 +89,482,GERMANY,EAST,EDUCATION,FURNITURE,BED,4,1993,1993-12-01 +874,600,GERMANY,EAST,EDUCATION,FURNITURE,BED,1,1994,1994-01-01 +674,94,GERMANY,EAST,EDUCATION,FURNITURE,BED,1,1994,1994-02-01 +347,323,GERMANY,EAST,EDUCATION,FURNITURE,BED,1,1994,1994-03-01 +105,49,GERMANY,EAST,EDUCATION,FURNITURE,BED,2,1994,1994-04-01 +286,70,GERMANY,EAST,EDUCATION,FURNITURE,BED,2,1994,1994-05-01 
+669,844,GERMANY,EAST,EDUCATION,FURNITURE,BED,2,1994,1994-06-01 +786,773,GERMANY,EAST,EDUCATION,FURNITURE,BED,3,1994,1994-07-01 +104,68,GERMANY,EAST,EDUCATION,FURNITURE,BED,3,1994,1994-08-01 +770,110,GERMANY,EAST,EDUCATION,FURNITURE,BED,3,1994,1994-09-01 +263,42,GERMANY,EAST,EDUCATION,FURNITURE,BED,4,1994,1994-10-01 +900,171,GERMANY,EAST,EDUCATION,FURNITURE,BED,4,1994,1994-11-01 +630,644,GERMANY,EAST,EDUCATION,FURNITURE,BED,4,1994,1994-12-01 +597,408,GERMANY,EAST,EDUCATION,OFFICE,TABLE,1,1993,1993-01-01 +185,45,GERMANY,EAST,EDUCATION,OFFICE,TABLE,1,1993,1993-02-01 +175,522,GERMANY,EAST,EDUCATION,OFFICE,TABLE,1,1993,1993-03-01 +576,166,GERMANY,EAST,EDUCATION,OFFICE,TABLE,2,1993,1993-04-01 +957,885,GERMANY,EAST,EDUCATION,OFFICE,TABLE,2,1993,1993-05-01 +993,713,GERMANY,EAST,EDUCATION,OFFICE,TABLE,2,1993,1993-06-01 +500,838,GERMANY,EAST,EDUCATION,OFFICE,TABLE,3,1993,1993-07-01 +410,267,GERMANY,EAST,EDUCATION,OFFICE,TABLE,3,1993,1993-08-01 +592,967,GERMANY,EAST,EDUCATION,OFFICE,TABLE,3,1993,1993-09-01 +64,529,GERMANY,EAST,EDUCATION,OFFICE,TABLE,4,1993,1993-10-01 +208,656,GERMANY,EAST,EDUCATION,OFFICE,TABLE,4,1993,1993-11-01 +273,665,GERMANY,EAST,EDUCATION,OFFICE,TABLE,4,1993,1993-12-01 +906,419,GERMANY,EAST,EDUCATION,OFFICE,TABLE,1,1994,1994-01-01 +429,776,GERMANY,EAST,EDUCATION,OFFICE,TABLE,1,1994,1994-02-01 +961,971,GERMANY,EAST,EDUCATION,OFFICE,TABLE,1,1994,1994-03-01 +338,248,GERMANY,EAST,EDUCATION,OFFICE,TABLE,2,1994,1994-04-01 +472,486,GERMANY,EAST,EDUCATION,OFFICE,TABLE,2,1994,1994-05-01 +903,674,GERMANY,EAST,EDUCATION,OFFICE,TABLE,2,1994,1994-06-01 +299,603,GERMANY,EAST,EDUCATION,OFFICE,TABLE,3,1994,1994-07-01 +948,492,GERMANY,EAST,EDUCATION,OFFICE,TABLE,3,1994,1994-08-01 +931,512,GERMANY,EAST,EDUCATION,OFFICE,TABLE,3,1994,1994-09-01 +570,391,GERMANY,EAST,EDUCATION,OFFICE,TABLE,4,1994,1994-10-01 +97,313,GERMANY,EAST,EDUCATION,OFFICE,TABLE,4,1994,1994-11-01 +674,758,GERMANY,EAST,EDUCATION,OFFICE,TABLE,4,1994,1994-12-01 +468,304,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,1,1993,1993-01-01 +430,846,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,1,1993,1993-02-01 +893,912,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,1,1993,1993-03-01 +519,810,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,2,1993,1993-04-01 +267,122,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,2,1993,1993-05-01 +908,102,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,2,1993,1993-06-01 +176,161,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,3,1993,1993-07-01 +673,450,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,3,1993,1993-08-01 +798,215,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,3,1993,1993-09-01 +291,765,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,4,1993,1993-10-01 +583,557,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,4,1993,1993-11-01 +442,739,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,4,1993,1993-12-01 +951,811,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,1,1994,1994-01-01 +430,780,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,1,1994,1994-02-01 +559,645,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,1,1994,1994-03-01 +726,365,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,2,1994,1994-04-01 +944,597,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,2,1994,1994-05-01 +497,126,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,2,1994,1994-06-01 +388,655,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,3,1994,1994-07-01 +81,604,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,3,1994,1994-08-01 +111,280,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,3,1994,1994-09-01 +288,115,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,4,1994,1994-10-01 +845,205,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,4,1994,1994-11-01 +745,672,GERMANY,EAST,EDUCATION,OFFICE,CHAIR,4,1994,1994-12-01 +352,339,GERMANY,EAST,EDUCATION,OFFICE,DESK,1,1993,1993-01-01 
+234,70,GERMANY,EAST,EDUCATION,OFFICE,DESK,1,1993,1993-02-01 +167,528,GERMANY,EAST,EDUCATION,OFFICE,DESK,1,1993,1993-03-01 +606,220,GERMANY,EAST,EDUCATION,OFFICE,DESK,2,1993,1993-04-01 +670,691,GERMANY,EAST,EDUCATION,OFFICE,DESK,2,1993,1993-05-01 +764,197,GERMANY,EAST,EDUCATION,OFFICE,DESK,2,1993,1993-06-01 +659,239,GERMANY,EAST,EDUCATION,OFFICE,DESK,3,1993,1993-07-01 +996,50,GERMANY,EAST,EDUCATION,OFFICE,DESK,3,1993,1993-08-01 +424,135,GERMANY,EAST,EDUCATION,OFFICE,DESK,3,1993,1993-09-01 +899,972,GERMANY,EAST,EDUCATION,OFFICE,DESK,4,1993,1993-10-01 +392,475,GERMANY,EAST,EDUCATION,OFFICE,DESK,4,1993,1993-11-01 +555,868,GERMANY,EAST,EDUCATION,OFFICE,DESK,4,1993,1993-12-01 +860,451,GERMANY,EAST,EDUCATION,OFFICE,DESK,1,1994,1994-01-01 +114,565,GERMANY,EAST,EDUCATION,OFFICE,DESK,1,1994,1994-02-01 +943,116,GERMANY,EAST,EDUCATION,OFFICE,DESK,1,1994,1994-03-01 +365,385,GERMANY,EAST,EDUCATION,OFFICE,DESK,2,1994,1994-04-01 +249,375,GERMANY,EAST,EDUCATION,OFFICE,DESK,2,1994,1994-05-01 +192,357,GERMANY,EAST,EDUCATION,OFFICE,DESK,2,1994,1994-06-01 +328,230,GERMANY,EAST,EDUCATION,OFFICE,DESK,3,1994,1994-07-01 +311,829,GERMANY,EAST,EDUCATION,OFFICE,DESK,3,1994,1994-08-01 +576,971,GERMANY,EAST,EDUCATION,OFFICE,DESK,3,1994,1994-09-01 +915,280,GERMANY,EAST,EDUCATION,OFFICE,DESK,4,1994,1994-10-01 +522,853,GERMANY,EAST,EDUCATION,OFFICE,DESK,4,1994,1994-11-01 +625,953,GERMANY,EAST,EDUCATION,OFFICE,DESK,4,1994,1994-12-01 +873,874,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,1,1993,1993-01-01 +498,578,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,1,1993,1993-02-01 +808,768,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,1,1993,1993-03-01 +742,178,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,2,1993,1993-04-01 +744,916,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,2,1993,1993-05-01 +30,917,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,2,1993,1993-06-01 +747,633,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,3,1993,1993-07-01 +672,107,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,3,1993,1993-08-01 +564,523,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,3,1993,1993-09-01 +785,924,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,4,1993,1993-10-01 +825,481,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,4,1993,1993-11-01 +243,240,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,4,1993,1993-12-01 +959,819,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,1,1994,1994-01-01 +123,602,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,1,1994,1994-02-01 +714,538,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,1,1994,1994-03-01 +252,632,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,2,1994,1994-04-01 +715,952,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,2,1994,1994-05-01 +670,480,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,2,1994,1994-06-01 +81,700,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,3,1994,1994-07-01 +653,726,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,3,1994,1994-08-01 +795,526,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,3,1994,1994-09-01 +182,410,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,4,1994,1994-10-01 +725,307,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,4,1994,1994-11-01 +101,73,GERMANY,EAST,CONSUMER,FURNITURE,SOFA,4,1994,1994-12-01 +143,232,GERMANY,EAST,CONSUMER,FURNITURE,BED,1,1993,1993-01-01 +15,993,GERMANY,EAST,CONSUMER,FURNITURE,BED,1,1993,1993-02-01 +742,652,GERMANY,EAST,CONSUMER,FURNITURE,BED,1,1993,1993-03-01 +339,761,GERMANY,EAST,CONSUMER,FURNITURE,BED,2,1993,1993-04-01 +39,428,GERMANY,EAST,CONSUMER,FURNITURE,BED,2,1993,1993-05-01 +465,4,GERMANY,EAST,CONSUMER,FURNITURE,BED,2,1993,1993-06-01 +889,101,GERMANY,EAST,CONSUMER,FURNITURE,BED,3,1993,1993-07-01 +856,869,GERMANY,EAST,CONSUMER,FURNITURE,BED,3,1993,1993-08-01 +358,271,GERMANY,EAST,CONSUMER,FURNITURE,BED,3,1993,1993-09-01 
+452,633,GERMANY,EAST,CONSUMER,FURNITURE,BED,4,1993,1993-10-01 +387,481,GERMANY,EAST,CONSUMER,FURNITURE,BED,4,1993,1993-11-01 +824,302,GERMANY,EAST,CONSUMER,FURNITURE,BED,4,1993,1993-12-01 +185,245,GERMANY,EAST,CONSUMER,FURNITURE,BED,1,1994,1994-01-01 +151,941,GERMANY,EAST,CONSUMER,FURNITURE,BED,1,1994,1994-02-01 +419,721,GERMANY,EAST,CONSUMER,FURNITURE,BED,1,1994,1994-03-01 +643,893,GERMANY,EAST,CONSUMER,FURNITURE,BED,2,1994,1994-04-01 +63,898,GERMANY,EAST,CONSUMER,FURNITURE,BED,2,1994,1994-05-01 +202,94,GERMANY,EAST,CONSUMER,FURNITURE,BED,2,1994,1994-06-01 +332,962,GERMANY,EAST,CONSUMER,FURNITURE,BED,3,1994,1994-07-01 +723,71,GERMANY,EAST,CONSUMER,FURNITURE,BED,3,1994,1994-08-01 +148,108,GERMANY,EAST,CONSUMER,FURNITURE,BED,3,1994,1994-09-01 +840,71,GERMANY,EAST,CONSUMER,FURNITURE,BED,4,1994,1994-10-01 +601,767,GERMANY,EAST,CONSUMER,FURNITURE,BED,4,1994,1994-11-01 +962,323,GERMANY,EAST,CONSUMER,FURNITURE,BED,4,1994,1994-12-01 +166,982,GERMANY,EAST,CONSUMER,OFFICE,TABLE,1,1993,1993-01-01 +531,614,GERMANY,EAST,CONSUMER,OFFICE,TABLE,1,1993,1993-02-01 +963,839,GERMANY,EAST,CONSUMER,OFFICE,TABLE,1,1993,1993-03-01 +994,388,GERMANY,EAST,CONSUMER,OFFICE,TABLE,2,1993,1993-04-01 +978,296,GERMANY,EAST,CONSUMER,OFFICE,TABLE,2,1993,1993-05-01 +72,429,GERMANY,EAST,CONSUMER,OFFICE,TABLE,2,1993,1993-06-01 +33,901,GERMANY,EAST,CONSUMER,OFFICE,TABLE,3,1993,1993-07-01 +428,350,GERMANY,EAST,CONSUMER,OFFICE,TABLE,3,1993,1993-08-01 +413,581,GERMANY,EAST,CONSUMER,OFFICE,TABLE,3,1993,1993-09-01 +737,583,GERMANY,EAST,CONSUMER,OFFICE,TABLE,4,1993,1993-10-01 +85,92,GERMANY,EAST,CONSUMER,OFFICE,TABLE,4,1993,1993-11-01 +916,647,GERMANY,EAST,CONSUMER,OFFICE,TABLE,4,1993,1993-12-01 +785,771,GERMANY,EAST,CONSUMER,OFFICE,TABLE,1,1994,1994-01-01 +302,26,GERMANY,EAST,CONSUMER,OFFICE,TABLE,1,1994,1994-02-01 +1000,598,GERMANY,EAST,CONSUMER,OFFICE,TABLE,1,1994,1994-03-01 +458,715,GERMANY,EAST,CONSUMER,OFFICE,TABLE,2,1994,1994-04-01 +896,74,GERMANY,EAST,CONSUMER,OFFICE,TABLE,2,1994,1994-05-01 +615,580,GERMANY,EAST,CONSUMER,OFFICE,TABLE,2,1994,1994-06-01 +174,848,GERMANY,EAST,CONSUMER,OFFICE,TABLE,3,1994,1994-07-01 +651,118,GERMANY,EAST,CONSUMER,OFFICE,TABLE,3,1994,1994-08-01 +784,54,GERMANY,EAST,CONSUMER,OFFICE,TABLE,3,1994,1994-09-01 +121,929,GERMANY,EAST,CONSUMER,OFFICE,TABLE,4,1994,1994-10-01 +341,393,GERMANY,EAST,CONSUMER,OFFICE,TABLE,4,1994,1994-11-01 +615,820,GERMANY,EAST,CONSUMER,OFFICE,TABLE,4,1994,1994-12-01 +697,336,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,1,1993,1993-01-01 +215,299,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,1,1993,1993-02-01 +197,747,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,1,1993,1993-03-01 +205,154,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,2,1993,1993-04-01 +256,486,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,2,1993,1993-05-01 +377,251,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,2,1993,1993-06-01 +577,225,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,3,1993,1993-07-01 +686,77,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,3,1993,1993-08-01 +332,74,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,3,1993,1993-09-01 +534,596,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,4,1993,1993-10-01 +485,493,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,4,1993,1993-11-01 +594,782,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,4,1993,1993-12-01 +413,487,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,1,1994,1994-01-01 +13,127,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,1,1994,1994-02-01 +483,538,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,1,1994,1994-03-01 +820,94,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,2,1994,1994-04-01 +745,252,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,2,1994,1994-05-01 +79,722,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,2,1994,1994-06-01 
+36,536,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,3,1994,1994-07-01 +950,958,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,3,1994,1994-08-01 +74,466,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,3,1994,1994-09-01 +458,309,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,4,1994,1994-10-01 +609,680,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,4,1994,1994-11-01 +429,539,GERMANY,EAST,CONSUMER,OFFICE,CHAIR,4,1994,1994-12-01 +956,511,GERMANY,EAST,CONSUMER,OFFICE,DESK,1,1993,1993-01-01 +205,505,GERMANY,EAST,CONSUMER,OFFICE,DESK,1,1993,1993-02-01 +629,720,GERMANY,EAST,CONSUMER,OFFICE,DESK,1,1993,1993-03-01 +277,823,GERMANY,EAST,CONSUMER,OFFICE,DESK,2,1993,1993-04-01 +266,21,GERMANY,EAST,CONSUMER,OFFICE,DESK,2,1993,1993-05-01 +872,142,GERMANY,EAST,CONSUMER,OFFICE,DESK,2,1993,1993-06-01 +435,95,GERMANY,EAST,CONSUMER,OFFICE,DESK,3,1993,1993-07-01 +988,398,GERMANY,EAST,CONSUMER,OFFICE,DESK,3,1993,1993-08-01 +953,328,GERMANY,EAST,CONSUMER,OFFICE,DESK,3,1993,1993-09-01 +556,151,GERMANY,EAST,CONSUMER,OFFICE,DESK,4,1993,1993-10-01 +211,978,GERMANY,EAST,CONSUMER,OFFICE,DESK,4,1993,1993-11-01 +389,918,GERMANY,EAST,CONSUMER,OFFICE,DESK,4,1993,1993-12-01 +351,542,GERMANY,EAST,CONSUMER,OFFICE,DESK,1,1994,1994-01-01 +14,96,GERMANY,EAST,CONSUMER,OFFICE,DESK,1,1994,1994-02-01 +181,496,GERMANY,EAST,CONSUMER,OFFICE,DESK,1,1994,1994-03-01 +452,77,GERMANY,EAST,CONSUMER,OFFICE,DESK,2,1994,1994-04-01 +511,236,GERMANY,EAST,CONSUMER,OFFICE,DESK,2,1994,1994-05-01 +193,913,GERMANY,EAST,CONSUMER,OFFICE,DESK,2,1994,1994-06-01 +797,49,GERMANY,EAST,CONSUMER,OFFICE,DESK,3,1994,1994-07-01 +988,967,GERMANY,EAST,CONSUMER,OFFICE,DESK,3,1994,1994-08-01 +487,502,GERMANY,EAST,CONSUMER,OFFICE,DESK,3,1994,1994-09-01 +941,790,GERMANY,EAST,CONSUMER,OFFICE,DESK,4,1994,1994-10-01 +577,121,GERMANY,EAST,CONSUMER,OFFICE,DESK,4,1994,1994-11-01 +456,55,GERMANY,EAST,CONSUMER,OFFICE,DESK,4,1994,1994-12-01 +982,739,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,1,1993,1993-01-01 +593,683,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,1,1993,1993-02-01 +702,610,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,1,1993,1993-03-01 +528,248,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,2,1993,1993-04-01 +873,530,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,2,1993,1993-05-01 +301,889,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,2,1993,1993-06-01 +769,245,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,3,1993,1993-07-01 +724,473,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,3,1993,1993-08-01 +466,938,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,3,1993,1993-09-01 +774,150,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,4,1993,1993-10-01 +111,772,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,4,1993,1993-11-01 +954,201,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,4,1993,1993-12-01 +780,945,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,1,1994,1994-01-01 +210,177,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,1,1994,1994-02-01 +93,378,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,1,1994,1994-03-01 +332,83,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,2,1994,1994-04-01 +186,803,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,2,1994,1994-05-01 +782,398,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,2,1994,1994-06-01 +41,215,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,3,1994,1994-07-01 +222,194,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,3,1994,1994-08-01 +992,287,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,3,1994,1994-09-01 +477,410,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,4,1994,1994-10-01 +948,50,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,4,1994,1994-11-01 +817,204,GERMANY,WEST,EDUCATION,FURNITURE,SOFA,4,1994,1994-12-01 +597,239,GERMANY,WEST,EDUCATION,FURNITURE,BED,1,1993,1993-01-01 +649,637,GERMANY,WEST,EDUCATION,FURNITURE,BED,1,1993,1993-02-01 
+3,938,GERMANY,WEST,EDUCATION,FURNITURE,BED,1,1993,1993-03-01 +731,788,GERMANY,WEST,EDUCATION,FURNITURE,BED,2,1993,1993-04-01 +181,399,GERMANY,WEST,EDUCATION,FURNITURE,BED,2,1993,1993-05-01 +468,576,GERMANY,WEST,EDUCATION,FURNITURE,BED,2,1993,1993-06-01 +891,187,GERMANY,WEST,EDUCATION,FURNITURE,BED,3,1993,1993-07-01 +226,703,GERMANY,WEST,EDUCATION,FURNITURE,BED,3,1993,1993-08-01 +28,455,GERMANY,WEST,EDUCATION,FURNITURE,BED,3,1993,1993-09-01 +609,244,GERMANY,WEST,EDUCATION,FURNITURE,BED,4,1993,1993-10-01 +224,868,GERMANY,WEST,EDUCATION,FURNITURE,BED,4,1993,1993-11-01 +230,353,GERMANY,WEST,EDUCATION,FURNITURE,BED,4,1993,1993-12-01 +216,101,GERMANY,WEST,EDUCATION,FURNITURE,BED,1,1994,1994-01-01 +282,924,GERMANY,WEST,EDUCATION,FURNITURE,BED,1,1994,1994-02-01 +501,144,GERMANY,WEST,EDUCATION,FURNITURE,BED,1,1994,1994-03-01 +320,0,GERMANY,WEST,EDUCATION,FURNITURE,BED,2,1994,1994-04-01 +720,910,GERMANY,WEST,EDUCATION,FURNITURE,BED,2,1994,1994-05-01 +464,259,GERMANY,WEST,EDUCATION,FURNITURE,BED,2,1994,1994-06-01 +363,107,GERMANY,WEST,EDUCATION,FURNITURE,BED,3,1994,1994-07-01 +49,63,GERMANY,WEST,EDUCATION,FURNITURE,BED,3,1994,1994-08-01 +223,270,GERMANY,WEST,EDUCATION,FURNITURE,BED,3,1994,1994-09-01 +452,554,GERMANY,WEST,EDUCATION,FURNITURE,BED,4,1994,1994-10-01 +210,154,GERMANY,WEST,EDUCATION,FURNITURE,BED,4,1994,1994-11-01 +444,205,GERMANY,WEST,EDUCATION,FURNITURE,BED,4,1994,1994-12-01 +222,441,GERMANY,WEST,EDUCATION,OFFICE,TABLE,1,1993,1993-01-01 +678,183,GERMANY,WEST,EDUCATION,OFFICE,TABLE,1,1993,1993-02-01 +25,459,GERMANY,WEST,EDUCATION,OFFICE,TABLE,1,1993,1993-03-01 +57,810,GERMANY,WEST,EDUCATION,OFFICE,TABLE,2,1993,1993-04-01 +981,268,GERMANY,WEST,EDUCATION,OFFICE,TABLE,2,1993,1993-05-01 +740,916,GERMANY,WEST,EDUCATION,OFFICE,TABLE,2,1993,1993-06-01 +408,742,GERMANY,WEST,EDUCATION,OFFICE,TABLE,3,1993,1993-07-01 +966,522,GERMANY,WEST,EDUCATION,OFFICE,TABLE,3,1993,1993-08-01 +107,299,GERMANY,WEST,EDUCATION,OFFICE,TABLE,3,1993,1993-09-01 +488,677,GERMANY,WEST,EDUCATION,OFFICE,TABLE,4,1993,1993-10-01 +759,709,GERMANY,WEST,EDUCATION,OFFICE,TABLE,4,1993,1993-11-01 +504,310,GERMANY,WEST,EDUCATION,OFFICE,TABLE,4,1993,1993-12-01 +99,160,GERMANY,WEST,EDUCATION,OFFICE,TABLE,1,1994,1994-01-01 +503,698,GERMANY,WEST,EDUCATION,OFFICE,TABLE,1,1994,1994-02-01 +724,540,GERMANY,WEST,EDUCATION,OFFICE,TABLE,1,1994,1994-03-01 +309,901,GERMANY,WEST,EDUCATION,OFFICE,TABLE,2,1994,1994-04-01 +625,34,GERMANY,WEST,EDUCATION,OFFICE,TABLE,2,1994,1994-05-01 +294,536,GERMANY,WEST,EDUCATION,OFFICE,TABLE,2,1994,1994-06-01 +890,780,GERMANY,WEST,EDUCATION,OFFICE,TABLE,3,1994,1994-07-01 +501,716,GERMANY,WEST,EDUCATION,OFFICE,TABLE,3,1994,1994-08-01 +34,532,GERMANY,WEST,EDUCATION,OFFICE,TABLE,3,1994,1994-09-01 +203,871,GERMANY,WEST,EDUCATION,OFFICE,TABLE,4,1994,1994-10-01 +140,199,GERMANY,WEST,EDUCATION,OFFICE,TABLE,4,1994,1994-11-01 +845,845,GERMANY,WEST,EDUCATION,OFFICE,TABLE,4,1994,1994-12-01 +774,591,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,1,1993,1993-01-01 +645,378,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,1,1993,1993-02-01 +986,942,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,1,1993,1993-03-01 +296,686,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,2,1993,1993-04-01 +936,720,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,2,1993,1993-05-01 +341,546,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,2,1993,1993-06-01 +32,845,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,3,1993,1993-07-01 +277,667,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,3,1993,1993-08-01 +548,627,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,3,1993,1993-09-01 +727,142,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,4,1993,1993-10-01 
+812,655,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,4,1993,1993-11-01 +168,556,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,4,1993,1993-12-01 +150,459,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,1,1994,1994-01-01 +136,89,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,1,1994,1994-02-01 +695,726,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,1,1994,1994-03-01 +363,38,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,2,1994,1994-04-01 +853,60,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,2,1994,1994-05-01 +621,369,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,2,1994,1994-06-01 +764,381,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,3,1994,1994-07-01 +669,465,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,3,1994,1994-08-01 +772,981,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,3,1994,1994-09-01 +228,758,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,4,1994,1994-10-01 +261,31,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,4,1994,1994-11-01 +821,237,GERMANY,WEST,EDUCATION,OFFICE,CHAIR,4,1994,1994-12-01 +100,285,GERMANY,WEST,EDUCATION,OFFICE,DESK,1,1993,1993-01-01 +465,94,GERMANY,WEST,EDUCATION,OFFICE,DESK,1,1993,1993-02-01 +350,561,GERMANY,WEST,EDUCATION,OFFICE,DESK,1,1993,1993-03-01 +991,143,GERMANY,WEST,EDUCATION,OFFICE,DESK,2,1993,1993-04-01 +910,95,GERMANY,WEST,EDUCATION,OFFICE,DESK,2,1993,1993-05-01 +206,341,GERMANY,WEST,EDUCATION,OFFICE,DESK,2,1993,1993-06-01 +263,388,GERMANY,WEST,EDUCATION,OFFICE,DESK,3,1993,1993-07-01 +374,272,GERMANY,WEST,EDUCATION,OFFICE,DESK,3,1993,1993-08-01 +875,890,GERMANY,WEST,EDUCATION,OFFICE,DESK,3,1993,1993-09-01 +810,734,GERMANY,WEST,EDUCATION,OFFICE,DESK,4,1993,1993-10-01 +398,364,GERMANY,WEST,EDUCATION,OFFICE,DESK,4,1993,1993-11-01 +565,619,GERMANY,WEST,EDUCATION,OFFICE,DESK,4,1993,1993-12-01 +417,517,GERMANY,WEST,EDUCATION,OFFICE,DESK,1,1994,1994-01-01 +291,781,GERMANY,WEST,EDUCATION,OFFICE,DESK,1,1994,1994-02-01 +251,327,GERMANY,WEST,EDUCATION,OFFICE,DESK,1,1994,1994-03-01 +449,48,GERMANY,WEST,EDUCATION,OFFICE,DESK,2,1994,1994-04-01 +774,809,GERMANY,WEST,EDUCATION,OFFICE,DESK,2,1994,1994-05-01 +386,73,GERMANY,WEST,EDUCATION,OFFICE,DESK,2,1994,1994-06-01 +22,936,GERMANY,WEST,EDUCATION,OFFICE,DESK,3,1994,1994-07-01 +940,400,GERMANY,WEST,EDUCATION,OFFICE,DESK,3,1994,1994-08-01 +132,736,GERMANY,WEST,EDUCATION,OFFICE,DESK,3,1994,1994-09-01 +103,211,GERMANY,WEST,EDUCATION,OFFICE,DESK,4,1994,1994-10-01 +152,271,GERMANY,WEST,EDUCATION,OFFICE,DESK,4,1994,1994-11-01 +952,855,GERMANY,WEST,EDUCATION,OFFICE,DESK,4,1994,1994-12-01 +872,923,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,1,1993,1993-01-01 +748,854,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,1,1993,1993-02-01 +749,769,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,1,1993,1993-03-01 +876,271,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,2,1993,1993-04-01 +860,383,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,2,1993,1993-05-01 +900,29,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,2,1993,1993-06-01 +705,185,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,3,1993,1993-07-01 +913,351,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,3,1993,1993-08-01 +315,560,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,3,1993,1993-09-01 +466,840,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,4,1993,1993-10-01 +233,517,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,4,1993,1993-11-01 +906,949,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,4,1993,1993-12-01 +148,633,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,1,1994,1994-01-01 +661,636,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,1,1994,1994-02-01 +847,138,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,1,1994,1994-03-01 +768,481,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,2,1994,1994-04-01 +866,408,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,2,1994,1994-05-01 +475,130,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,2,1994,1994-06-01 
+112,813,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,3,1994,1994-07-01 +136,661,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,3,1994,1994-08-01 +763,311,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,3,1994,1994-09-01 +388,872,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,4,1994,1994-10-01 +996,643,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,4,1994,1994-11-01 +486,174,GERMANY,WEST,CONSUMER,FURNITURE,SOFA,4,1994,1994-12-01 +494,528,GERMANY,WEST,CONSUMER,FURNITURE,BED,1,1993,1993-01-01 +771,124,GERMANY,WEST,CONSUMER,FURNITURE,BED,1,1993,1993-02-01 +49,126,GERMANY,WEST,CONSUMER,FURNITURE,BED,1,1993,1993-03-01 +322,440,GERMANY,WEST,CONSUMER,FURNITURE,BED,2,1993,1993-04-01 +878,881,GERMANY,WEST,CONSUMER,FURNITURE,BED,2,1993,1993-05-01 +827,292,GERMANY,WEST,CONSUMER,FURNITURE,BED,2,1993,1993-06-01 +852,873,GERMANY,WEST,CONSUMER,FURNITURE,BED,3,1993,1993-07-01 +716,357,GERMANY,WEST,CONSUMER,FURNITURE,BED,3,1993,1993-08-01 +81,247,GERMANY,WEST,CONSUMER,FURNITURE,BED,3,1993,1993-09-01 +916,18,GERMANY,WEST,CONSUMER,FURNITURE,BED,4,1993,1993-10-01 +673,395,GERMANY,WEST,CONSUMER,FURNITURE,BED,4,1993,1993-11-01 +242,620,GERMANY,WEST,CONSUMER,FURNITURE,BED,4,1993,1993-12-01 +914,946,GERMANY,WEST,CONSUMER,FURNITURE,BED,1,1994,1994-01-01 +902,72,GERMANY,WEST,CONSUMER,FURNITURE,BED,1,1994,1994-02-01 +707,691,GERMANY,WEST,CONSUMER,FURNITURE,BED,1,1994,1994-03-01 +223,95,GERMANY,WEST,CONSUMER,FURNITURE,BED,2,1994,1994-04-01 +619,878,GERMANY,WEST,CONSUMER,FURNITURE,BED,2,1994,1994-05-01 +254,757,GERMANY,WEST,CONSUMER,FURNITURE,BED,2,1994,1994-06-01 +688,898,GERMANY,WEST,CONSUMER,FURNITURE,BED,3,1994,1994-07-01 +477,172,GERMANY,WEST,CONSUMER,FURNITURE,BED,3,1994,1994-08-01 +280,419,GERMANY,WEST,CONSUMER,FURNITURE,BED,3,1994,1994-09-01 +546,849,GERMANY,WEST,CONSUMER,FURNITURE,BED,4,1994,1994-10-01 +630,807,GERMANY,WEST,CONSUMER,FURNITURE,BED,4,1994,1994-11-01 +455,599,GERMANY,WEST,CONSUMER,FURNITURE,BED,4,1994,1994-12-01 +505,59,GERMANY,WEST,CONSUMER,OFFICE,TABLE,1,1993,1993-01-01 +823,790,GERMANY,WEST,CONSUMER,OFFICE,TABLE,1,1993,1993-02-01 +891,574,GERMANY,WEST,CONSUMER,OFFICE,TABLE,1,1993,1993-03-01 +840,96,GERMANY,WEST,CONSUMER,OFFICE,TABLE,2,1993,1993-04-01 +436,376,GERMANY,WEST,CONSUMER,OFFICE,TABLE,2,1993,1993-05-01 +168,352,GERMANY,WEST,CONSUMER,OFFICE,TABLE,2,1993,1993-06-01 +177,741,GERMANY,WEST,CONSUMER,OFFICE,TABLE,3,1993,1993-07-01 +727,12,GERMANY,WEST,CONSUMER,OFFICE,TABLE,3,1993,1993-08-01 +278,157,GERMANY,WEST,CONSUMER,OFFICE,TABLE,3,1993,1993-09-01 +443,10,GERMANY,WEST,CONSUMER,OFFICE,TABLE,4,1993,1993-10-01 +905,544,GERMANY,WEST,CONSUMER,OFFICE,TABLE,4,1993,1993-11-01 +881,817,GERMANY,WEST,CONSUMER,OFFICE,TABLE,4,1993,1993-12-01 +507,754,GERMANY,WEST,CONSUMER,OFFICE,TABLE,1,1994,1994-01-01 +363,425,GERMANY,WEST,CONSUMER,OFFICE,TABLE,1,1994,1994-02-01 +603,492,GERMANY,WEST,CONSUMER,OFFICE,TABLE,1,1994,1994-03-01 +473,485,GERMANY,WEST,CONSUMER,OFFICE,TABLE,2,1994,1994-04-01 +128,369,GERMANY,WEST,CONSUMER,OFFICE,TABLE,2,1994,1994-05-01 +105,560,GERMANY,WEST,CONSUMER,OFFICE,TABLE,2,1994,1994-06-01 +325,651,GERMANY,WEST,CONSUMER,OFFICE,TABLE,3,1994,1994-07-01 +711,326,GERMANY,WEST,CONSUMER,OFFICE,TABLE,3,1994,1994-08-01 +983,180,GERMANY,WEST,CONSUMER,OFFICE,TABLE,3,1994,1994-09-01 +241,935,GERMANY,WEST,CONSUMER,OFFICE,TABLE,4,1994,1994-10-01 +71,403,GERMANY,WEST,CONSUMER,OFFICE,TABLE,4,1994,1994-11-01 +395,345,GERMANY,WEST,CONSUMER,OFFICE,TABLE,4,1994,1994-12-01 +168,278,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,1,1993,1993-01-01 +512,376,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,1,1993,1993-02-01 
+291,104,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,1,1993,1993-03-01 +776,543,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,2,1993,1993-04-01 +271,798,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,2,1993,1993-05-01 +946,333,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,2,1993,1993-06-01 +195,833,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,3,1993,1993-07-01 +165,132,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,3,1993,1993-08-01 +238,629,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,3,1993,1993-09-01 +409,337,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,4,1993,1993-10-01 +720,300,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,4,1993,1993-11-01 +309,470,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,4,1993,1993-12-01 +812,875,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,1,1994,1994-01-01 +441,237,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,1,1994,1994-02-01 +500,272,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,1,1994,1994-03-01 +517,860,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,2,1994,1994-04-01 +924,415,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,2,1994,1994-05-01 +572,140,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,2,1994,1994-06-01 +768,367,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,3,1994,1994-07-01 +692,195,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,3,1994,1994-08-01 +28,245,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,3,1994,1994-09-01 +202,285,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,4,1994,1994-10-01 +76,98,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,4,1994,1994-11-01 +421,932,GERMANY,WEST,CONSUMER,OFFICE,CHAIR,4,1994,1994-12-01 +636,898,GERMANY,WEST,CONSUMER,OFFICE,DESK,1,1993,1993-01-01 +52,330,GERMANY,WEST,CONSUMER,OFFICE,DESK,1,1993,1993-02-01 +184,603,GERMANY,WEST,CONSUMER,OFFICE,DESK,1,1993,1993-03-01 +739,280,GERMANY,WEST,CONSUMER,OFFICE,DESK,2,1993,1993-04-01 +841,507,GERMANY,WEST,CONSUMER,OFFICE,DESK,2,1993,1993-05-01 +65,202,GERMANY,WEST,CONSUMER,OFFICE,DESK,2,1993,1993-06-01 +623,513,GERMANY,WEST,CONSUMER,OFFICE,DESK,3,1993,1993-07-01 +517,132,GERMANY,WEST,CONSUMER,OFFICE,DESK,3,1993,1993-08-01 +636,21,GERMANY,WEST,CONSUMER,OFFICE,DESK,3,1993,1993-09-01 +845,657,GERMANY,WEST,CONSUMER,OFFICE,DESK,4,1993,1993-10-01 +232,195,GERMANY,WEST,CONSUMER,OFFICE,DESK,4,1993,1993-11-01 +26,323,GERMANY,WEST,CONSUMER,OFFICE,DESK,4,1993,1993-12-01 +680,299,GERMANY,WEST,CONSUMER,OFFICE,DESK,1,1994,1994-01-01 +364,811,GERMANY,WEST,CONSUMER,OFFICE,DESK,1,1994,1994-02-01 +572,739,GERMANY,WEST,CONSUMER,OFFICE,DESK,1,1994,1994-03-01 +145,889,GERMANY,WEST,CONSUMER,OFFICE,DESK,2,1994,1994-04-01 +644,189,GERMANY,WEST,CONSUMER,OFFICE,DESK,2,1994,1994-05-01 +87,698,GERMANY,WEST,CONSUMER,OFFICE,DESK,2,1994,1994-06-01 +620,646,GERMANY,WEST,CONSUMER,OFFICE,DESK,3,1994,1994-07-01 +535,562,GERMANY,WEST,CONSUMER,OFFICE,DESK,3,1994,1994-08-01 +661,753,GERMANY,WEST,CONSUMER,OFFICE,DESK,3,1994,1994-09-01 +884,425,GERMANY,WEST,CONSUMER,OFFICE,DESK,4,1994,1994-10-01 +689,693,GERMANY,WEST,CONSUMER,OFFICE,DESK,4,1994,1994-11-01 +646,941,GERMANY,WEST,CONSUMER,OFFICE,DESK,4,1994,1994-12-01 +4,975,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,1,1993,1993-01-01 +813,455,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,1,1993,1993-02-01 +773,260,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,1,1993,1993-03-01 +205,69,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,2,1993,1993-04-01 +657,147,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,2,1993,1993-05-01 +154,533,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,2,1993,1993-06-01 +747,881,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,3,1993,1993-07-01 +787,457,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,3,1993,1993-08-01 +867,441,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,3,1993,1993-09-01 +307,859,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,4,1993,1993-10-01 +571,177,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,4,1993,1993-11-01 
+92,633,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,4,1993,1993-12-01 +269,382,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,1,1994,1994-01-01 +764,707,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,1,1994,1994-02-01 +662,566,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,1,1994,1994-03-01 +818,349,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,2,1994,1994-04-01 +617,128,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,2,1994,1994-05-01 +649,231,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,2,1994,1994-06-01 +895,258,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,3,1994,1994-07-01 +750,812,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,3,1994,1994-08-01 +738,362,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,3,1994,1994-09-01 +107,133,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,4,1994,1994-10-01 +278,60,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,4,1994,1994-11-01 +32,88,U.S.A.,EAST,EDUCATION,FURNITURE,SOFA,4,1994,1994-12-01 +129,378,U.S.A.,EAST,EDUCATION,FURNITURE,BED,1,1993,1993-01-01 +187,569,U.S.A.,EAST,EDUCATION,FURNITURE,BED,1,1993,1993-02-01 +670,186,U.S.A.,EAST,EDUCATION,FURNITURE,BED,1,1993,1993-03-01 +678,875,U.S.A.,EAST,EDUCATION,FURNITURE,BED,2,1993,1993-04-01 +423,636,U.S.A.,EAST,EDUCATION,FURNITURE,BED,2,1993,1993-05-01 +389,360,U.S.A.,EAST,EDUCATION,FURNITURE,BED,2,1993,1993-06-01 +257,677,U.S.A.,EAST,EDUCATION,FURNITURE,BED,3,1993,1993-07-01 +780,708,U.S.A.,EAST,EDUCATION,FURNITURE,BED,3,1993,1993-08-01 +159,158,U.S.A.,EAST,EDUCATION,FURNITURE,BED,3,1993,1993-09-01 +97,384,U.S.A.,EAST,EDUCATION,FURNITURE,BED,4,1993,1993-10-01 +479,927,U.S.A.,EAST,EDUCATION,FURNITURE,BED,4,1993,1993-11-01 +9,134,U.S.A.,EAST,EDUCATION,FURNITURE,BED,4,1993,1993-12-01 +614,273,U.S.A.,EAST,EDUCATION,FURNITURE,BED,1,1994,1994-01-01 +261,27,U.S.A.,EAST,EDUCATION,FURNITURE,BED,1,1994,1994-02-01 +115,209,U.S.A.,EAST,EDUCATION,FURNITURE,BED,1,1994,1994-03-01 +358,470,U.S.A.,EAST,EDUCATION,FURNITURE,BED,2,1994,1994-04-01 +133,219,U.S.A.,EAST,EDUCATION,FURNITURE,BED,2,1994,1994-05-01 +891,907,U.S.A.,EAST,EDUCATION,FURNITURE,BED,2,1994,1994-06-01 +702,778,U.S.A.,EAST,EDUCATION,FURNITURE,BED,3,1994,1994-07-01 +58,998,U.S.A.,EAST,EDUCATION,FURNITURE,BED,3,1994,1994-08-01 +606,194,U.S.A.,EAST,EDUCATION,FURNITURE,BED,3,1994,1994-09-01 +668,933,U.S.A.,EAST,EDUCATION,FURNITURE,BED,4,1994,1994-10-01 +813,708,U.S.A.,EAST,EDUCATION,FURNITURE,BED,4,1994,1994-11-01 +450,949,U.S.A.,EAST,EDUCATION,FURNITURE,BED,4,1994,1994-12-01 +956,579,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,1,1993,1993-01-01 +276,131,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,1,1993,1993-02-01 +889,689,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,1,1993,1993-03-01 +708,908,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,2,1993,1993-04-01 +14,524,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,2,1993,1993-05-01 +904,336,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,2,1993,1993-06-01 +272,916,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,3,1993,1993-07-01 +257,236,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,3,1993,1993-08-01 +343,965,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,3,1993,1993-09-01 +80,350,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,4,1993,1993-10-01 +530,599,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,4,1993,1993-11-01 +340,901,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,4,1993,1993-12-01 +595,935,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,1,1994,1994-01-01 +47,667,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,1,1994,1994-02-01 +279,104,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,1,1994,1994-03-01 +293,803,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,2,1994,1994-04-01 +162,64,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,2,1994,1994-05-01 +935,825,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,2,1994,1994-06-01 +689,839,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,3,1994,1994-07-01 
+484,184,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,3,1994,1994-08-01 +230,348,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,3,1994,1994-09-01 +164,904,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,4,1994,1994-10-01 +401,219,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,4,1994,1994-11-01 +607,381,U.S.A.,EAST,EDUCATION,OFFICE,TABLE,4,1994,1994-12-01 +229,524,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,1,1993,1993-01-01 +786,902,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,1,1993,1993-02-01 +92,212,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,1,1993,1993-03-01 +455,762,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,2,1993,1993-04-01 +409,182,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,2,1993,1993-05-01 +166,442,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,2,1993,1993-06-01 +277,919,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,3,1993,1993-07-01 +92,67,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,3,1993,1993-08-01 +631,741,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,3,1993,1993-09-01 +390,617,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,4,1993,1993-10-01 +403,214,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,4,1993,1993-11-01 +964,202,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,4,1993,1993-12-01 +223,788,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,1,1994,1994-01-01 +684,639,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,1,1994,1994-02-01 +645,336,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,1,1994,1994-03-01 +470,937,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,2,1994,1994-04-01 +424,399,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,2,1994,1994-05-01 +862,21,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,2,1994,1994-06-01 +736,125,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,3,1994,1994-07-01 +554,635,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,3,1994,1994-08-01 +790,229,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,3,1994,1994-09-01 +115,770,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,4,1994,1994-10-01 +853,622,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,4,1994,1994-11-01 +643,109,U.S.A.,EAST,EDUCATION,OFFICE,CHAIR,4,1994,1994-12-01 +794,975,U.S.A.,EAST,EDUCATION,OFFICE,DESK,1,1993,1993-01-01 +892,820,U.S.A.,EAST,EDUCATION,OFFICE,DESK,1,1993,1993-02-01 +728,123,U.S.A.,EAST,EDUCATION,OFFICE,DESK,1,1993,1993-03-01 +744,135,U.S.A.,EAST,EDUCATION,OFFICE,DESK,2,1993,1993-04-01 +678,535,U.S.A.,EAST,EDUCATION,OFFICE,DESK,2,1993,1993-05-01 +768,971,U.S.A.,EAST,EDUCATION,OFFICE,DESK,2,1993,1993-06-01 +234,166,U.S.A.,EAST,EDUCATION,OFFICE,DESK,3,1993,1993-07-01 +333,814,U.S.A.,EAST,EDUCATION,OFFICE,DESK,3,1993,1993-08-01 +968,557,U.S.A.,EAST,EDUCATION,OFFICE,DESK,3,1993,1993-09-01 +119,820,U.S.A.,EAST,EDUCATION,OFFICE,DESK,4,1993,1993-10-01 +469,486,U.S.A.,EAST,EDUCATION,OFFICE,DESK,4,1993,1993-11-01 +261,429,U.S.A.,EAST,EDUCATION,OFFICE,DESK,4,1993,1993-12-01 +984,65,U.S.A.,EAST,EDUCATION,OFFICE,DESK,1,1994,1994-01-01 +845,977,U.S.A.,EAST,EDUCATION,OFFICE,DESK,1,1994,1994-02-01 +374,410,U.S.A.,EAST,EDUCATION,OFFICE,DESK,1,1994,1994-03-01 +687,150,U.S.A.,EAST,EDUCATION,OFFICE,DESK,2,1994,1994-04-01 +157,630,U.S.A.,EAST,EDUCATION,OFFICE,DESK,2,1994,1994-05-01 +49,488,U.S.A.,EAST,EDUCATION,OFFICE,DESK,2,1994,1994-06-01 +817,112,U.S.A.,EAST,EDUCATION,OFFICE,DESK,3,1994,1994-07-01 +223,598,U.S.A.,EAST,EDUCATION,OFFICE,DESK,3,1994,1994-08-01 +433,705,U.S.A.,EAST,EDUCATION,OFFICE,DESK,3,1994,1994-09-01 +41,226,U.S.A.,EAST,EDUCATION,OFFICE,DESK,4,1994,1994-10-01 +396,979,U.S.A.,EAST,EDUCATION,OFFICE,DESK,4,1994,1994-11-01 +131,19,U.S.A.,EAST,EDUCATION,OFFICE,DESK,4,1994,1994-12-01 +521,204,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,1,1993,1993-01-01 +751,805,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,1,1993,1993-02-01 +45,549,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,1,1993,1993-03-01 +144,912,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,2,1993,1993-04-01 
+119,427,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,2,1993,1993-05-01 +728,1,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,2,1993,1993-06-01 +120,540,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,3,1993,1993-07-01 +657,940,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,3,1993,1993-08-01 +409,644,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,3,1993,1993-09-01 +881,821,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,4,1993,1993-10-01 +113,560,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,4,1993,1993-11-01 +831,309,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,4,1993,1993-12-01 +129,1000,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,1,1994,1994-01-01 +76,945,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,1,1994,1994-02-01 +260,931,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,1,1994,1994-03-01 +882,504,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,2,1994,1994-04-01 +157,950,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,2,1994,1994-05-01 +443,278,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,2,1994,1994-06-01 +111,225,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,3,1994,1994-07-01 +497,6,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,3,1994,1994-08-01 +321,124,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,3,1994,1994-09-01 +194,206,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,4,1994,1994-10-01 +684,320,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,4,1994,1994-11-01 +634,270,U.S.A.,EAST,CONSUMER,FURNITURE,SOFA,4,1994,1994-12-01 +622,278,U.S.A.,EAST,CONSUMER,FURNITURE,BED,1,1993,1993-01-01 +689,447,U.S.A.,EAST,CONSUMER,FURNITURE,BED,1,1993,1993-02-01 +120,170,U.S.A.,EAST,CONSUMER,FURNITURE,BED,1,1993,1993-03-01 +374,87,U.S.A.,EAST,CONSUMER,FURNITURE,BED,2,1993,1993-04-01 +926,384,U.S.A.,EAST,CONSUMER,FURNITURE,BED,2,1993,1993-05-01 +687,574,U.S.A.,EAST,CONSUMER,FURNITURE,BED,2,1993,1993-06-01 +600,585,U.S.A.,EAST,CONSUMER,FURNITURE,BED,3,1993,1993-07-01 +779,947,U.S.A.,EAST,CONSUMER,FURNITURE,BED,3,1993,1993-08-01 +223,984,U.S.A.,EAST,CONSUMER,FURNITURE,BED,3,1993,1993-09-01 +628,189,U.S.A.,EAST,CONSUMER,FURNITURE,BED,4,1993,1993-10-01 +326,364,U.S.A.,EAST,CONSUMER,FURNITURE,BED,4,1993,1993-11-01 +836,49,U.S.A.,EAST,CONSUMER,FURNITURE,BED,4,1993,1993-12-01 +361,851,U.S.A.,EAST,CONSUMER,FURNITURE,BED,1,1994,1994-01-01 +444,643,U.S.A.,EAST,CONSUMER,FURNITURE,BED,1,1994,1994-02-01 +501,143,U.S.A.,EAST,CONSUMER,FURNITURE,BED,1,1994,1994-03-01 +743,763,U.S.A.,EAST,CONSUMER,FURNITURE,BED,2,1994,1994-04-01 +861,987,U.S.A.,EAST,CONSUMER,FURNITURE,BED,2,1994,1994-05-01 +203,264,U.S.A.,EAST,CONSUMER,FURNITURE,BED,2,1994,1994-06-01 +762,439,U.S.A.,EAST,CONSUMER,FURNITURE,BED,3,1994,1994-07-01 +705,750,U.S.A.,EAST,CONSUMER,FURNITURE,BED,3,1994,1994-08-01 +153,37,U.S.A.,EAST,CONSUMER,FURNITURE,BED,3,1994,1994-09-01 +436,95,U.S.A.,EAST,CONSUMER,FURNITURE,BED,4,1994,1994-10-01 +428,79,U.S.A.,EAST,CONSUMER,FURNITURE,BED,4,1994,1994-11-01 +804,832,U.S.A.,EAST,CONSUMER,FURNITURE,BED,4,1994,1994-12-01 +805,649,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,1,1993,1993-01-01 +860,838,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,1,1993,1993-02-01 +104,439,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,1,1993,1993-03-01 +434,207,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,2,1993,1993-04-01 +912,804,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,2,1993,1993-05-01 +571,875,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,2,1993,1993-06-01 +267,473,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,3,1993,1993-07-01 +415,845,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,3,1993,1993-08-01 +261,91,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,3,1993,1993-09-01 +746,630,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,4,1993,1993-10-01 +30,185,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,4,1993,1993-11-01 +662,317,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,4,1993,1993-12-01 +916,88,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,1,1994,1994-01-01 
+415,607,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,1,1994,1994-02-01 +514,35,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,1,1994,1994-03-01 +756,680,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,2,1994,1994-04-01 +461,78,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,2,1994,1994-05-01 +460,117,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,2,1994,1994-06-01 +305,440,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,3,1994,1994-07-01 +198,652,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,3,1994,1994-08-01 +234,249,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,3,1994,1994-09-01 +638,658,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,4,1994,1994-10-01 +88,563,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,4,1994,1994-11-01 +751,737,U.S.A.,EAST,CONSUMER,OFFICE,TABLE,4,1994,1994-12-01 +816,789,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,1,1993,1993-01-01 +437,988,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,1,1993,1993-02-01 +715,220,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,1,1993,1993-03-01 +780,946,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,2,1993,1993-04-01 +245,986,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,2,1993,1993-05-01 +201,129,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,2,1993,1993-06-01 +815,433,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,3,1993,1993-07-01 +865,492,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,3,1993,1993-08-01 +634,306,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,3,1993,1993-09-01 +901,154,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,4,1993,1993-10-01 +789,206,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,4,1993,1993-11-01 +882,81,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,4,1993,1993-12-01 +953,882,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,1,1994,1994-01-01 +862,848,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,1,1994,1994-02-01 +628,664,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,1,1994,1994-03-01 +765,389,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,2,1994,1994-04-01 +741,182,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,2,1994,1994-05-01 +61,505,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,2,1994,1994-06-01 +470,861,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,3,1994,1994-07-01 +869,263,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,3,1994,1994-08-01 +650,400,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,3,1994,1994-09-01 +750,556,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,4,1994,1994-10-01 +602,497,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,4,1994,1994-11-01 +54,181,U.S.A.,EAST,CONSUMER,OFFICE,CHAIR,4,1994,1994-12-01 +384,619,U.S.A.,EAST,CONSUMER,OFFICE,DESK,1,1993,1993-01-01 +161,332,U.S.A.,EAST,CONSUMER,OFFICE,DESK,1,1993,1993-02-01 +977,669,U.S.A.,EAST,CONSUMER,OFFICE,DESK,1,1993,1993-03-01 +615,487,U.S.A.,EAST,CONSUMER,OFFICE,DESK,2,1993,1993-04-01 +783,994,U.S.A.,EAST,CONSUMER,OFFICE,DESK,2,1993,1993-05-01 +977,331,U.S.A.,EAST,CONSUMER,OFFICE,DESK,2,1993,1993-06-01 +375,739,U.S.A.,EAST,CONSUMER,OFFICE,DESK,3,1993,1993-07-01 +298,665,U.S.A.,EAST,CONSUMER,OFFICE,DESK,3,1993,1993-08-01 +104,921,U.S.A.,EAST,CONSUMER,OFFICE,DESK,3,1993,1993-09-01 +713,862,U.S.A.,EAST,CONSUMER,OFFICE,DESK,4,1993,1993-10-01 +556,662,U.S.A.,EAST,CONSUMER,OFFICE,DESK,4,1993,1993-11-01 +323,517,U.S.A.,EAST,CONSUMER,OFFICE,DESK,4,1993,1993-12-01 +391,352,U.S.A.,EAST,CONSUMER,OFFICE,DESK,1,1994,1994-01-01 +593,166,U.S.A.,EAST,CONSUMER,OFFICE,DESK,1,1994,1994-02-01 +906,859,U.S.A.,EAST,CONSUMER,OFFICE,DESK,1,1994,1994-03-01 +130,571,U.S.A.,EAST,CONSUMER,OFFICE,DESK,2,1994,1994-04-01 +613,976,U.S.A.,EAST,CONSUMER,OFFICE,DESK,2,1994,1994-05-01 +58,466,U.S.A.,EAST,CONSUMER,OFFICE,DESK,2,1994,1994-06-01 +314,79,U.S.A.,EAST,CONSUMER,OFFICE,DESK,3,1994,1994-07-01 +67,864,U.S.A.,EAST,CONSUMER,OFFICE,DESK,3,1994,1994-08-01 +654,623,U.S.A.,EAST,CONSUMER,OFFICE,DESK,3,1994,1994-09-01 +312,170,U.S.A.,EAST,CONSUMER,OFFICE,DESK,4,1994,1994-10-01 +349,662,U.S.A.,EAST,CONSUMER,OFFICE,DESK,4,1994,1994-11-01 
+415,763,U.S.A.,EAST,CONSUMER,OFFICE,DESK,4,1994,1994-12-01 +404,896,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,1,1993,1993-01-01 +22,973,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,1,1993,1993-02-01 +744,161,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,1,1993,1993-03-01 +804,934,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,2,1993,1993-04-01 +101,697,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,2,1993,1993-05-01 +293,116,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,2,1993,1993-06-01 +266,84,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,3,1993,1993-07-01 +372,604,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,3,1993,1993-08-01 +38,371,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,3,1993,1993-09-01 +385,783,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,4,1993,1993-10-01 +262,335,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,4,1993,1993-11-01 +961,321,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,4,1993,1993-12-01 +831,177,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,1,1994,1994-01-01 +579,371,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,1,1994,1994-02-01 +301,583,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,1,1994,1994-03-01 +693,364,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,2,1994,1994-04-01 +895,343,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,2,1994,1994-05-01 +320,854,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,2,1994,1994-06-01 +284,691,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,3,1994,1994-07-01 +362,387,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,3,1994,1994-08-01 +132,298,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,3,1994,1994-09-01 +42,635,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,4,1994,1994-10-01 +118,81,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,4,1994,1994-11-01 +42,375,U.S.A.,WEST,EDUCATION,FURNITURE,SOFA,4,1994,1994-12-01 +18,846,U.S.A.,WEST,EDUCATION,FURNITURE,BED,1,1993,1993-01-01 +512,933,U.S.A.,WEST,EDUCATION,FURNITURE,BED,1,1993,1993-02-01 +337,237,U.S.A.,WEST,EDUCATION,FURNITURE,BED,1,1993,1993-03-01 +167,964,U.S.A.,WEST,EDUCATION,FURNITURE,BED,2,1993,1993-04-01 +749,382,U.S.A.,WEST,EDUCATION,FURNITURE,BED,2,1993,1993-05-01 +890,610,U.S.A.,WEST,EDUCATION,FURNITURE,BED,2,1993,1993-06-01 +910,148,U.S.A.,WEST,EDUCATION,FURNITURE,BED,3,1993,1993-07-01 +403,837,U.S.A.,WEST,EDUCATION,FURNITURE,BED,3,1993,1993-08-01 +403,85,U.S.A.,WEST,EDUCATION,FURNITURE,BED,3,1993,1993-09-01 +661,425,U.S.A.,WEST,EDUCATION,FURNITURE,BED,4,1993,1993-10-01 +485,633,U.S.A.,WEST,EDUCATION,FURNITURE,BED,4,1993,1993-11-01 +789,515,U.S.A.,WEST,EDUCATION,FURNITURE,BED,4,1993,1993-12-01 +415,512,U.S.A.,WEST,EDUCATION,FURNITURE,BED,1,1994,1994-01-01 +418,156,U.S.A.,WEST,EDUCATION,FURNITURE,BED,1,1994,1994-02-01 +163,464,U.S.A.,WEST,EDUCATION,FURNITURE,BED,1,1994,1994-03-01 +298,813,U.S.A.,WEST,EDUCATION,FURNITURE,BED,2,1994,1994-04-01 +584,455,U.S.A.,WEST,EDUCATION,FURNITURE,BED,2,1994,1994-05-01 +797,366,U.S.A.,WEST,EDUCATION,FURNITURE,BED,2,1994,1994-06-01 +767,734,U.S.A.,WEST,EDUCATION,FURNITURE,BED,3,1994,1994-07-01 +984,451,U.S.A.,WEST,EDUCATION,FURNITURE,BED,3,1994,1994-08-01 +388,134,U.S.A.,WEST,EDUCATION,FURNITURE,BED,3,1994,1994-09-01 +924,547,U.S.A.,WEST,EDUCATION,FURNITURE,BED,4,1994,1994-10-01 +566,802,U.S.A.,WEST,EDUCATION,FURNITURE,BED,4,1994,1994-11-01 +390,61,U.S.A.,WEST,EDUCATION,FURNITURE,BED,4,1994,1994-12-01 +608,556,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,1,1993,1993-01-01 +840,202,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,1,1993,1993-02-01 +112,964,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,1,1993,1993-03-01 +288,112,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,2,1993,1993-04-01 +408,445,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,2,1993,1993-05-01 +876,884,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,2,1993,1993-06-01 +224,348,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,3,1993,1993-07-01 
+133,564,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,3,1993,1993-08-01 +662,568,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,3,1993,1993-09-01 +68,882,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,4,1993,1993-10-01 +626,542,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,4,1993,1993-11-01 +678,119,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,4,1993,1993-12-01 +361,248,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,1,1994,1994-01-01 +464,868,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,1,1994,1994-02-01 +681,841,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,1,1994,1994-03-01 +377,484,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,2,1994,1994-04-01 +222,986,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,2,1994,1994-05-01 +972,39,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,2,1994,1994-06-01 +56,930,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,3,1994,1994-07-01 +695,252,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,3,1994,1994-08-01 +908,794,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,3,1994,1994-09-01 +328,658,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,4,1994,1994-10-01 +891,139,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,4,1994,1994-11-01 +265,331,U.S.A.,WEST,EDUCATION,OFFICE,TABLE,4,1994,1994-12-01 +251,261,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,1,1993,1993-01-01 +783,122,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,1,1993,1993-02-01 +425,296,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,1,1993,1993-03-01 +859,391,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,2,1993,1993-04-01 +314,75,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,2,1993,1993-05-01 +153,731,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,2,1993,1993-06-01 +955,883,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,3,1993,1993-07-01 +654,707,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,3,1993,1993-08-01 +693,97,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,3,1993,1993-09-01 +757,390,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,4,1993,1993-10-01 +221,237,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,4,1993,1993-11-01 +942,496,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,4,1993,1993-12-01 +31,814,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,1,1994,1994-01-01 +540,765,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,1,1994,1994-02-01 +352,308,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,1,1994,1994-03-01 +904,327,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,2,1994,1994-04-01 +436,266,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,2,1994,1994-05-01 +281,699,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,2,1994,1994-06-01 +801,599,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,3,1994,1994-07-01 +273,950,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,3,1994,1994-08-01 +716,117,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,3,1994,1994-09-01 +902,632,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,4,1994,1994-10-01 +341,35,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,4,1994,1994-11-01 +155,562,U.S.A.,WEST,EDUCATION,OFFICE,CHAIR,4,1994,1994-12-01 +796,144,U.S.A.,WEST,EDUCATION,OFFICE,DESK,1,1993,1993-01-01 +257,142,U.S.A.,WEST,EDUCATION,OFFICE,DESK,1,1993,1993-02-01 +611,273,U.S.A.,WEST,EDUCATION,OFFICE,DESK,1,1993,1993-03-01 +6,915,U.S.A.,WEST,EDUCATION,OFFICE,DESK,2,1993,1993-04-01 +125,920,U.S.A.,WEST,EDUCATION,OFFICE,DESK,2,1993,1993-05-01 +745,294,U.S.A.,WEST,EDUCATION,OFFICE,DESK,2,1993,1993-06-01 +437,681,U.S.A.,WEST,EDUCATION,OFFICE,DESK,3,1993,1993-07-01 +906,86,U.S.A.,WEST,EDUCATION,OFFICE,DESK,3,1993,1993-08-01 +844,764,U.S.A.,WEST,EDUCATION,OFFICE,DESK,3,1993,1993-09-01 +413,269,U.S.A.,WEST,EDUCATION,OFFICE,DESK,4,1993,1993-10-01 +869,138,U.S.A.,WEST,EDUCATION,OFFICE,DESK,4,1993,1993-11-01 +403,834,U.S.A.,WEST,EDUCATION,OFFICE,DESK,4,1993,1993-12-01 +137,112,U.S.A.,WEST,EDUCATION,OFFICE,DESK,1,1994,1994-01-01 +922,921,U.S.A.,WEST,EDUCATION,OFFICE,DESK,1,1994,1994-02-01 +202,859,U.S.A.,WEST,EDUCATION,OFFICE,DESK,1,1994,1994-03-01 +955,442,U.S.A.,WEST,EDUCATION,OFFICE,DESK,2,1994,1994-04-01 
+781,593,U.S.A.,WEST,EDUCATION,OFFICE,DESK,2,1994,1994-05-01 +12,346,U.S.A.,WEST,EDUCATION,OFFICE,DESK,2,1994,1994-06-01 +931,312,U.S.A.,WEST,EDUCATION,OFFICE,DESK,3,1994,1994-07-01 +95,690,U.S.A.,WEST,EDUCATION,OFFICE,DESK,3,1994,1994-08-01 +795,344,U.S.A.,WEST,EDUCATION,OFFICE,DESK,3,1994,1994-09-01 +542,784,U.S.A.,WEST,EDUCATION,OFFICE,DESK,4,1994,1994-10-01 +935,639,U.S.A.,WEST,EDUCATION,OFFICE,DESK,4,1994,1994-11-01 +269,726,U.S.A.,WEST,EDUCATION,OFFICE,DESK,4,1994,1994-12-01 +197,596,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,1,1993,1993-01-01 +828,263,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,1,1993,1993-02-01 +461,194,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,1,1993,1993-03-01 +35,895,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,2,1993,1993-04-01 +88,502,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,2,1993,1993-05-01 +832,342,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,2,1993,1993-06-01 +900,421,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,3,1993,1993-07-01 +368,901,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,3,1993,1993-08-01 +201,474,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,3,1993,1993-09-01 +758,571,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,4,1993,1993-10-01 +504,511,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,4,1993,1993-11-01 +864,379,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,4,1993,1993-12-01 +574,68,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,1,1994,1994-01-01 +61,210,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,1,1994,1994-02-01 +565,478,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,1,1994,1994-03-01 +475,296,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,2,1994,1994-04-01 +44,664,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,2,1994,1994-05-01 +145,880,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,2,1994,1994-06-01 +813,607,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,3,1994,1994-07-01 +703,97,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,3,1994,1994-08-01 +757,908,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,3,1994,1994-09-01 +96,152,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,4,1994,1994-10-01 +860,622,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,4,1994,1994-11-01 +750,309,U.S.A.,WEST,CONSUMER,FURNITURE,SOFA,4,1994,1994-12-01 +585,912,U.S.A.,WEST,CONSUMER,FURNITURE,BED,1,1993,1993-01-01 +127,429,U.S.A.,WEST,CONSUMER,FURNITURE,BED,1,1993,1993-02-01 +669,580,U.S.A.,WEST,CONSUMER,FURNITURE,BED,1,1993,1993-03-01 +708,179,U.S.A.,WEST,CONSUMER,FURNITURE,BED,2,1993,1993-04-01 +830,119,U.S.A.,WEST,CONSUMER,FURNITURE,BED,2,1993,1993-05-01 +550,369,U.S.A.,WEST,CONSUMER,FURNITURE,BED,2,1993,1993-06-01 +762,882,U.S.A.,WEST,CONSUMER,FURNITURE,BED,3,1993,1993-07-01 +468,727,U.S.A.,WEST,CONSUMER,FURNITURE,BED,3,1993,1993-08-01 +151,823,U.S.A.,WEST,CONSUMER,FURNITURE,BED,3,1993,1993-09-01 +103,783,U.S.A.,WEST,CONSUMER,FURNITURE,BED,4,1993,1993-10-01 +876,884,U.S.A.,WEST,CONSUMER,FURNITURE,BED,4,1993,1993-11-01 +881,891,U.S.A.,WEST,CONSUMER,FURNITURE,BED,4,1993,1993-12-01 +116,909,U.S.A.,WEST,CONSUMER,FURNITURE,BED,1,1994,1994-01-01 +677,765,U.S.A.,WEST,CONSUMER,FURNITURE,BED,1,1994,1994-02-01 +477,180,U.S.A.,WEST,CONSUMER,FURNITURE,BED,1,1994,1994-03-01 +154,712,U.S.A.,WEST,CONSUMER,FURNITURE,BED,2,1994,1994-04-01 +331,175,U.S.A.,WEST,CONSUMER,FURNITURE,BED,2,1994,1994-05-01 +784,869,U.S.A.,WEST,CONSUMER,FURNITURE,BED,2,1994,1994-06-01 +563,820,U.S.A.,WEST,CONSUMER,FURNITURE,BED,3,1994,1994-07-01 +229,554,U.S.A.,WEST,CONSUMER,FURNITURE,BED,3,1994,1994-08-01 +451,126,U.S.A.,WEST,CONSUMER,FURNITURE,BED,3,1994,1994-09-01 +974,760,U.S.A.,WEST,CONSUMER,FURNITURE,BED,4,1994,1994-10-01 +484,446,U.S.A.,WEST,CONSUMER,FURNITURE,BED,4,1994,1994-11-01 +69,254,U.S.A.,WEST,CONSUMER,FURNITURE,BED,4,1994,1994-12-01 +755,516,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,1,1993,1993-01-01 
+331,779,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,1,1993,1993-02-01 +482,987,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,1,1993,1993-03-01 +632,318,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,2,1993,1993-04-01 +750,427,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,2,1993,1993-05-01 +618,86,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,2,1993,1993-06-01 +935,553,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,3,1993,1993-07-01 +716,315,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,3,1993,1993-08-01 +205,328,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,3,1993,1993-09-01 +215,521,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,4,1993,1993-10-01 +871,156,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,4,1993,1993-11-01 +552,841,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,4,1993,1993-12-01 +619,623,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,1,1994,1994-01-01 +701,849,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,1,1994,1994-02-01 +104,438,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,1,1994,1994-03-01 +114,719,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,2,1994,1994-04-01 +854,906,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,2,1994,1994-05-01 +563,267,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,2,1994,1994-06-01 +73,542,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,3,1994,1994-07-01 +427,552,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,3,1994,1994-08-01 +348,428,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,3,1994,1994-09-01 +148,158,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,4,1994,1994-10-01 +895,379,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,4,1994,1994-11-01 +394,142,U.S.A.,WEST,CONSUMER,OFFICE,TABLE,4,1994,1994-12-01 +792,588,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,1,1993,1993-01-01 +175,506,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,1,1993,1993-02-01 +208,382,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,1,1993,1993-03-01 +354,132,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,2,1993,1993-04-01 +163,652,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,2,1993,1993-05-01 +336,723,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,2,1993,1993-06-01 +804,682,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,3,1993,1993-07-01 +863,382,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,3,1993,1993-08-01 +326,125,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,3,1993,1993-09-01 +568,321,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,4,1993,1993-10-01 +691,922,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,4,1993,1993-11-01 +152,884,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,4,1993,1993-12-01 +565,38,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,1,1994,1994-01-01 +38,194,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,1,1994,1994-02-01 +185,996,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,1,1994,1994-03-01 +318,532,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,2,1994,1994-04-01 +960,391,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,2,1994,1994-05-01 +122,104,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,2,1994,1994-06-01 +400,22,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,3,1994,1994-07-01 +301,650,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,3,1994,1994-08-01 +909,143,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,3,1994,1994-09-01 +433,999,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,4,1994,1994-10-01 +508,415,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,4,1994,1994-11-01 +648,350,U.S.A.,WEST,CONSUMER,OFFICE,CHAIR,4,1994,1994-12-01 +793,342,U.S.A.,WEST,CONSUMER,OFFICE,DESK,1,1993,1993-01-01 +129,215,U.S.A.,WEST,CONSUMER,OFFICE,DESK,1,1993,1993-02-01 +481,52,U.S.A.,WEST,CONSUMER,OFFICE,DESK,1,1993,1993-03-01 +406,292,U.S.A.,WEST,CONSUMER,OFFICE,DESK,2,1993,1993-04-01 +512,862,U.S.A.,WEST,CONSUMER,OFFICE,DESK,2,1993,1993-05-01 +668,309,U.S.A.,WEST,CONSUMER,OFFICE,DESK,2,1993,1993-06-01 +551,886,U.S.A.,WEST,CONSUMER,OFFICE,DESK,3,1993,1993-07-01 +124,172,U.S.A.,WEST,CONSUMER,OFFICE,DESK,3,1993,1993-08-01 +655,912,U.S.A.,WEST,CONSUMER,OFFICE,DESK,3,1993,1993-09-01 +523,666,U.S.A.,WEST,CONSUMER,OFFICE,DESK,4,1993,1993-10-01 +739,656,U.S.A.,WEST,CONSUMER,OFFICE,DESK,4,1993,1993-11-01 
+87,145,U.S.A.,WEST,CONSUMER,OFFICE,DESK,4,1993,1993-12-01 +890,664,U.S.A.,WEST,CONSUMER,OFFICE,DESK,1,1994,1994-01-01 +665,639,U.S.A.,WEST,CONSUMER,OFFICE,DESK,1,1994,1994-02-01 +329,707,U.S.A.,WEST,CONSUMER,OFFICE,DESK,1,1994,1994-03-01 +417,891,U.S.A.,WEST,CONSUMER,OFFICE,DESK,2,1994,1994-04-01 +828,466,U.S.A.,WEST,CONSUMER,OFFICE,DESK,2,1994,1994-05-01 +298,451,U.S.A.,WEST,CONSUMER,OFFICE,DESK,2,1994,1994-06-01 +356,451,U.S.A.,WEST,CONSUMER,OFFICE,DESK,3,1994,1994-07-01 +909,874,U.S.A.,WEST,CONSUMER,OFFICE,DESK,3,1994,1994-08-01 +251,805,U.S.A.,WEST,CONSUMER,OFFICE,DESK,3,1994,1994-09-01 +526,426,U.S.A.,WEST,CONSUMER,OFFICE,DESK,4,1994,1994-10-01 +652,932,U.S.A.,WEST,CONSUMER,OFFICE,DESK,4,1994,1994-11-01 +573,581,U.S.A.,WEST,CONSUMER,OFFICE,DESK,4,1994,1994-12-01 diff --git a/pandas/io/tests/sas/data/productsales.sas7bdat b/pandas/tests/io/sas/data/productsales.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/productsales.sas7bdat rename to pandas/tests/io/sas/data/productsales.sas7bdat diff --git a/pandas/io/tests/sas/data/test1.sas7bdat b/pandas/tests/io/sas/data/test1.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test1.sas7bdat rename to pandas/tests/io/sas/data/test1.sas7bdat diff --git a/pandas/io/tests/sas/data/test10.sas7bdat b/pandas/tests/io/sas/data/test10.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test10.sas7bdat rename to pandas/tests/io/sas/data/test10.sas7bdat diff --git a/pandas/io/tests/sas/data/test11.sas7bdat b/pandas/tests/io/sas/data/test11.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test11.sas7bdat rename to pandas/tests/io/sas/data/test11.sas7bdat diff --git a/pandas/io/tests/sas/data/test12.sas7bdat b/pandas/tests/io/sas/data/test12.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test12.sas7bdat rename to pandas/tests/io/sas/data/test12.sas7bdat diff --git a/pandas/io/tests/sas/data/test13.sas7bdat b/pandas/tests/io/sas/data/test13.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test13.sas7bdat rename to pandas/tests/io/sas/data/test13.sas7bdat diff --git a/pandas/io/tests/sas/data/test14.sas7bdat b/pandas/tests/io/sas/data/test14.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test14.sas7bdat rename to pandas/tests/io/sas/data/test14.sas7bdat diff --git a/pandas/io/tests/sas/data/test15.sas7bdat b/pandas/tests/io/sas/data/test15.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test15.sas7bdat rename to pandas/tests/io/sas/data/test15.sas7bdat diff --git a/pandas/io/tests/sas/data/test16.sas7bdat b/pandas/tests/io/sas/data/test16.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test16.sas7bdat rename to pandas/tests/io/sas/data/test16.sas7bdat diff --git a/pandas/io/tests/sas/data/test2.sas7bdat b/pandas/tests/io/sas/data/test2.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test2.sas7bdat rename to pandas/tests/io/sas/data/test2.sas7bdat diff --git a/pandas/io/tests/sas/data/test3.sas7bdat b/pandas/tests/io/sas/data/test3.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test3.sas7bdat rename to pandas/tests/io/sas/data/test3.sas7bdat diff --git a/pandas/io/tests/sas/data/test4.sas7bdat b/pandas/tests/io/sas/data/test4.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test4.sas7bdat rename to pandas/tests/io/sas/data/test4.sas7bdat diff --git a/pandas/io/tests/sas/data/test5.sas7bdat 
b/pandas/tests/io/sas/data/test5.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test5.sas7bdat rename to pandas/tests/io/sas/data/test5.sas7bdat diff --git a/pandas/io/tests/sas/data/test6.sas7bdat b/pandas/tests/io/sas/data/test6.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test6.sas7bdat rename to pandas/tests/io/sas/data/test6.sas7bdat diff --git a/pandas/io/tests/sas/data/test7.sas7bdat b/pandas/tests/io/sas/data/test7.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test7.sas7bdat rename to pandas/tests/io/sas/data/test7.sas7bdat diff --git a/pandas/io/tests/sas/data/test8.sas7bdat b/pandas/tests/io/sas/data/test8.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test8.sas7bdat rename to pandas/tests/io/sas/data/test8.sas7bdat diff --git a/pandas/io/tests/sas/data/test9.sas7bdat b/pandas/tests/io/sas/data/test9.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test9.sas7bdat rename to pandas/tests/io/sas/data/test9.sas7bdat diff --git a/pandas/io/tests/sas/data/test_12659.csv b/pandas/tests/io/sas/data/test_12659.csv similarity index 100% rename from pandas/io/tests/sas/data/test_12659.csv rename to pandas/tests/io/sas/data/test_12659.csv diff --git a/pandas/io/tests/sas/data/test_12659.sas7bdat b/pandas/tests/io/sas/data/test_12659.sas7bdat similarity index 100% rename from pandas/io/tests/sas/data/test_12659.sas7bdat rename to pandas/tests/io/sas/data/test_12659.sas7bdat diff --git a/pandas/io/tests/sas/data/test_sas7bdat_1.csv b/pandas/tests/io/sas/data/test_sas7bdat_1.csv similarity index 100% rename from pandas/io/tests/sas/data/test_sas7bdat_1.csv rename to pandas/tests/io/sas/data/test_sas7bdat_1.csv diff --git a/pandas/io/tests/sas/data/test_sas7bdat_2.csv b/pandas/tests/io/sas/data/test_sas7bdat_2.csv similarity index 100% rename from pandas/io/tests/sas/data/test_sas7bdat_2.csv rename to pandas/tests/io/sas/data/test_sas7bdat_2.csv diff --git a/pandas/tests/io/sas/test_sas.py b/pandas/tests/io/sas/test_sas.py new file mode 100644 index 0000000000000..b85f6b6bbd5ce --- /dev/null +++ b/pandas/tests/io/sas/test_sas.py @@ -0,0 +1,16 @@ +from pandas.compat import StringIO +from pandas import read_sas + +import pandas.util.testing as tm + + +class TestSas(object): + + def test_sas_buffer_format(self): + # see gh-14947 + b = StringIO("") + + msg = ("If this is a buffer object rather than a string " + "name, you must specify a format string") + with tm.assert_raises_regex(ValueError, msg): + read_sas(b) diff --git a/pandas/io/tests/sas/test_sas7bdat.py b/pandas/tests/io/sas/test_sas7bdat.py similarity index 77% rename from pandas/io/tests/sas/test_sas7bdat.py rename to pandas/tests/io/sas/test_sas7bdat.py index 69073a90e9669..c3fb85811ca2a 100644 --- a/pandas/io/tests/sas/test_sas7bdat.py +++ b/pandas/tests/io/sas/test_sas7bdat.py @@ -6,9 +6,9 @@ import numpy as np -class TestSAS7BDAT(tm.TestCase): +class TestSAS7BDAT(object): - def setUp(self): + def setup_method(self, method): self.dirpath = tm.get_data_path() self.data = [] self.test_ix = [list(range(1, 16)), [16]] @@ -65,6 +65,27 @@ def test_from_iterator(self): tm.assert_frame_equal(df, df0.iloc[2:5, :]) rdr.close() + def test_path_pathlib(self): + tm._skip_if_no_pathlib() + from pathlib import Path + for j in 0, 1: + df0 = self.data[j] + for k in self.test_ix[j]: + fname = Path(os.path.join(self.dirpath, "test%d.sas7bdat" % k)) + df = pd.read_sas(fname, encoding='utf-8') + tm.assert_frame_equal(df, df0) + + def 
test_path_localpath(self): + tm._skip_if_no_localpath() + from py.path import local as LocalPath + for j in 0, 1: + df0 = self.data[j] + for k in self.test_ix[j]: + fname = LocalPath(os.path.join(self.dirpath, + "test%d.sas7bdat" % k)) + df = pd.read_sas(fname, encoding='utf-8') + tm.assert_frame_equal(df, df0) + def test_iterator_loop(self): # github #13654 for j in 0, 1: @@ -75,7 +96,7 @@ def test_iterator_loop(self): y = 0 for x in rdr: y += x.shape[0] - self.assertTrue(y == rdr.row_count) + assert y == rdr.row_count rdr.close() def test_iterator_read_too_much(self): @@ -118,8 +139,8 @@ def test_productsales(): fname = os.path.join(dirpath, "productsales.sas7bdat") df = pd.read_sas(fname, encoding='utf-8') fname = os.path.join(dirpath, "productsales.csv") - df0 = pd.read_csv(fname) - vn = ["ACTUAL", "PREDICT", "QUARTER", "YEAR", "MONTH"] + df0 = pd.read_csv(fname, parse_dates=['MONTH']) + vn = ["ACTUAL", "PREDICT", "QUARTER", "YEAR"] df0[vn] = df0[vn].astype(np.float64) tm.assert_frame_equal(df, df0) @@ -142,3 +163,14 @@ def test_airline(): df0 = pd.read_csv(fname) df0 = df0.astype(np.float64) tm.assert_frame_equal(df, df0, check_exact=False) + + +def test_date_time(): + # Support of different SAS date/datetime formats (PR #15871) + dirpath = tm.get_data_path() + fname = os.path.join(dirpath, "datetime.sas7bdat") + df = pd.read_sas(fname) + fname = os.path.join(dirpath, "datetime.csv") + df0 = pd.read_csv(fname, parse_dates=['Date1', 'Date2', 'DateTime', + 'DateTimeHi', 'Taiw']) + tm.assert_frame_equal(df, df0) diff --git a/pandas/io/tests/sas/test_xport.py b/pandas/tests/io/sas/test_xport.py similarity index 97% rename from pandas/io/tests/sas/test_xport.py rename to pandas/tests/io/sas/test_xport.py index fe2f7cb4bf4be..de31c3e36a8d5 100644 --- a/pandas/io/tests/sas/test_xport.py +++ b/pandas/tests/io/sas/test_xport.py @@ -16,9 +16,9 @@ def numeric_as_float(data): data[v] = data[v].astype(np.float64) -class TestXport(tm.TestCase): +class TestXport(object): - def setUp(self): + def setup_method(self, method): self.dirpath = tm.get_data_path() self.file01 = os.path.join(self.dirpath, "DEMO_G.xpt") self.file02 = os.path.join(self.dirpath, "SSHSV1_A.xpt") @@ -40,7 +40,7 @@ def test1_basic(self): # Test reading beyond end of file reader = read_sas(self.file01, format="xport", iterator=True) data = reader.read(num_rows + 100) - self.assertTrue(data.shape[0] == num_rows) + assert data.shape[0] == num_rows reader.close() # Test incremental read with `read` method. 
@@ -61,7 +61,7 @@ def test1_basic(self): for x in reader: m += x.shape[0] reader.close() - self.assertTrue(m == num_rows) + assert m == num_rows # Read full file with `read_sas` method data = read_sas(self.file01) diff --git a/pandas/io/tests/test_clipboard.py b/pandas/tests/io/test_clipboard.py similarity index 87% rename from pandas/io/tests/test_clipboard.py rename to pandas/tests/io/test_clipboard.py index 93d14077aeacf..940a331a9de84 100644 --- a/pandas/io/tests/test_clipboard.py +++ b/pandas/tests/io/test_clipboard.py @@ -1,8 +1,9 @@ # -*- coding: utf-8 -*- import numpy as np from numpy.random import randint +from textwrap import dedent -import nose +import pytest import pandas as pd from pandas import DataFrame @@ -10,20 +11,24 @@ from pandas import get_option from pandas.util import testing as tm from pandas.util.testing import makeCustomDataframe as mkdf -from pandas.util.clipboard.exceptions import PyperclipException +from pandas.io.clipboard.exceptions import PyperclipException +from pandas.io.clipboard import clipboard_set try: DataFrame({'A': [1, 2]}).to_clipboard() + _DEPS_INSTALLED = 1 except PyperclipException: - raise nose.SkipTest("clipboard primitives not installed") + _DEPS_INSTALLED = 0 -class TestClipboard(tm.TestCase): +@pytest.mark.single +@pytest.mark.skipif(not _DEPS_INSTALLED, + reason="clipboard primitives not installed") +class TestClipboard(object): @classmethod - def setUpClass(cls): - super(TestClipboard, cls).setUpClass() + def setup_class(cls): cls.data = {} cls.data['string'] = mkdf(5, 3, c_idx_type='s', r_idx_type='i', c_idx_names=[None], r_idx_names=[None]) @@ -54,12 +59,11 @@ def setUpClass(cls): 'es': 'en español'.split()}) # unicode round trip test for GH 13747, GH 12529 cls.data['utf8'] = pd.DataFrame({'a': ['µasd', 'Ωœ∑´'], - 'b': ['øπ∆˚¬', 'œ∑´®']}) + 'b': ['øπ∆˚¬', 'œ∑´®']}) cls.data_types = list(cls.data.keys()) @classmethod - def tearDownClass(cls): - super(TestClipboard, cls).tearDownClass() + def teardown_class(cls): del cls.data_types, cls.data def check_round_trip_frame(self, data_type, excel=None, sep=None, @@ -87,8 +91,6 @@ def test_round_trip_frame(self): self.check_round_trip_frame(dt) def test_read_clipboard_infer_excel(self): - from textwrap import dedent - from pandas.util.clipboard import clipboard_set text = dedent(""" John James Charlie Mingus @@ -99,7 +101,7 @@ def test_read_clipboard_infer_excel(self): df = pd.read_clipboard() # excel data is parsed correctly - self.assertEqual(df.iloc[1][1], 'Harry Carney') + assert df.iloc[1][1] == 'Harry Carney' # having diff tab counts doesn't trigger it text = dedent(""" @@ -123,9 +125,9 @@ def test_read_clipboard_infer_excel(self): def test_invalid_encoding(self): # test case for testing invalid encoding data = self.data['string'] - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): data.to_clipboard(encoding='ascii') - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): pd.read_clipboard(encoding='ascii') def test_round_trip_valid_encodings(self): diff --git a/pandas/tests/io/test_common.py b/pandas/tests/io/test_common.py new file mode 100644 index 0000000000000..30904593fedc4 --- /dev/null +++ b/pandas/tests/io/test_common.py @@ -0,0 +1,269 @@ +""" + Tests for the pandas.io.common functionalities +""" +import mmap +import pytest +import os +from os.path import isabs + +import pandas as pd +import pandas.util.testing as tm + +from pandas.io import common +from pandas.compat import is_platform_windows, StringIO + +from pandas import 
read_csv, concat + + +class CustomFSPath(object): + """For testing fspath on unknown objects""" + def __init__(self, path): + self.path = path + + def __fspath__(self): + return self.path + + +# Functions that consume a string path and return a string or path-like object +path_types = [str, CustomFSPath] + +try: + from pathlib import Path + path_types.append(Path) +except ImportError: + pass + +try: + from py.path import local as LocalPath + path_types.append(LocalPath) +except ImportError: + pass + +HERE = os.path.dirname(__file__) + + +class TestCommonIOCapabilities(object): + data1 = """index,A,B,C,D +foo,2,3,4,5 +bar,7,8,9,10 +baz,12,13,14,15 +qux,12,13,14,15 +foo2,12,13,14,15 +bar2,12,13,14,15 +""" + + def test_expand_user(self): + filename = '~/sometest' + expanded_name = common._expand_user(filename) + + assert expanded_name != filename + assert isabs(expanded_name) + assert os.path.expanduser(filename) == expanded_name + + def test_expand_user_normal_path(self): + filename = '/somefolder/sometest' + expanded_name = common._expand_user(filename) + + assert expanded_name == filename + assert os.path.expanduser(filename) == expanded_name + + def test_stringify_path_pathlib(self): + tm._skip_if_no_pathlib() + + rel_path = common._stringify_path(Path('.')) + assert rel_path == '.' + redundant_path = common._stringify_path(Path('foo//bar')) + assert redundant_path == os.path.join('foo', 'bar') + + def test_stringify_path_localpath(self): + tm._skip_if_no_localpath() + + path = os.path.join('foo', 'bar') + abs_path = os.path.abspath(path) + lpath = LocalPath(path) + assert common._stringify_path(lpath) == abs_path + + def test_stringify_path_fspath(self): + p = CustomFSPath('foo/bar.csv') + result = common._stringify_path(p) + assert result == 'foo/bar.csv' + + @pytest.mark.parametrize('extension,expected', [ + ('', None), + ('.gz', 'gzip'), + ('.bz2', 'bz2'), + ('.zip', 'zip'), + ('.xz', 'xz'), + ]) + @pytest.mark.parametrize('path_type', path_types) + def test_infer_compression_from_path(self, extension, expected, path_type): + path = path_type('foo/bar.csv' + extension) + compression = common._infer_compression(path, compression='infer') + assert compression == expected + + def test_get_filepath_or_buffer_with_path(self): + filename = '~/sometest' + filepath_or_buffer, _, _ = common.get_filepath_or_buffer(filename) + assert filepath_or_buffer != filename + assert isabs(filepath_or_buffer) + assert os.path.expanduser(filename) == filepath_or_buffer + + def test_get_filepath_or_buffer_with_buffer(self): + input_buffer = StringIO() + filepath_or_buffer, _, _ = common.get_filepath_or_buffer(input_buffer) + assert filepath_or_buffer == input_buffer + + def test_iterator(self): + reader = read_csv(StringIO(self.data1), chunksize=1) + result = concat(reader, ignore_index=True) + expected = read_csv(StringIO(self.data1)) + tm.assert_frame_equal(result, expected) + + # GH12153 + it = read_csv(StringIO(self.data1), chunksize=1) + first = next(it) + tm.assert_frame_equal(first, expected.iloc[[0]]) + tm.assert_frame_equal(concat(it), expected.iloc[1:]) + + @pytest.mark.parametrize('reader, module, path', [ + (pd.read_csv, 'os', os.path.join(HERE, 'data', 'iris.csv')), + (pd.read_table, 'os', os.path.join(HERE, 'data', 'iris.csv')), + (pd.read_fwf, 'os', os.path.join(HERE, 'data', + 'fixed_width_format.txt')), + (pd.read_excel, 'xlrd', os.path.join(HERE, 'data', 'test1.xlsx')), + (pd.read_feather, 'feather', os.path.join(HERE, 'data', + 'feather-0_3_1.feather')), + (pd.read_hdf, 'tables', 
os.path.join(HERE, 'data', 'legacy_hdf', + 'datetimetz_object.h5')), + (pd.read_stata, 'os', os.path.join(HERE, 'data', 'stata10_115.dta')), + (pd.read_sas, 'os', os.path.join(HERE, 'sas', 'data', + 'test1.sas7bdat')), + (pd.read_json, 'os', os.path.join(HERE, 'json', 'data', + 'tsframe_v012.json')), + (pd.read_msgpack, 'os', os.path.join(HERE, 'msgpack', 'data', + 'frame.mp')), + (pd.read_pickle, 'os', os.path.join(HERE, 'data', + 'categorical_0_14_1.pickle')), + ]) + def test_read_fspath_all(self, reader, module, path): + pytest.importorskip(module) + + mypath = CustomFSPath(path) + result = reader(mypath) + expected = reader(path) + if path.endswith('.pickle'): + # categorical + tm.assert_categorical_equal(result, expected) + else: + tm.assert_frame_equal(result, expected) + + @pytest.mark.parametrize('writer_name, writer_kwargs, module', [ + ('to_csv', {}, 'os'), + ('to_excel', {'engine': 'xlwt'}, 'xlwt'), + ('to_feather', {}, 'feather'), + ('to_html', {}, 'os'), + ('to_json', {}, 'os'), + ('to_latex', {}, 'os'), + ('to_msgpack', {}, 'os'), + ('to_pickle', {}, 'os'), + ('to_stata', {}, 'os'), + ]) + def test_write_fspath_all(self, writer_name, writer_kwargs, module): + p1 = tm.ensure_clean('string') + p2 = tm.ensure_clean('fspath') + df = pd.DataFrame({"A": [1, 2]}) + + with p1 as string, p2 as fspath: + pytest.importorskip(module) + mypath = CustomFSPath(fspath) + writer = getattr(df, writer_name) + + writer(string, **writer_kwargs) + with open(string, 'rb') as f: + expected = f.read() + + writer(mypath, **writer_kwargs) + with open(fspath, 'rb') as f: + result = f.read() + + assert result == expected + + def test_write_fspath_hdf5(self): + # Same test as write_fspath_all, except HDF5 files aren't + # necessarily byte-for-byte identical for a given dataframe, so we'll + # have to read and compare equality + pytest.importorskip('tables') + + df = pd.DataFrame({"A": [1, 2]}) + p1 = tm.ensure_clean('string') + p2 = tm.ensure_clean('fspath') + + with p1 as string, p2 as fspath: + mypath = CustomFSPath(fspath) + df.to_hdf(mypath, key='bar') + df.to_hdf(string, key='bar') + + result = pd.read_hdf(fspath, key='bar') + expected = pd.read_hdf(string, key='bar') + + tm.assert_frame_equal(result, expected) + + +class TestMMapWrapper(object): + + def setup_method(self, method): + self.mmap_file = os.path.join(tm.get_data_path(), + 'test_mmap.csv') + + def test_constructor_bad_file(self): + non_file = StringIO('I am not a file') + non_file.fileno = lambda: -1 + + # the error raised is different on Windows + if is_platform_windows(): + msg = "The parameter is incorrect" + err = OSError + else: + msg = "[Errno 22]" + err = mmap.error + + tm.assert_raises_regex(err, msg, common.MMapWrapper, non_file) + + target = open(self.mmap_file, 'r') + target.close() + + msg = "I/O operation on closed file" + tm.assert_raises_regex( + ValueError, msg, common.MMapWrapper, target) + + def test_get_attr(self): + with open(self.mmap_file, 'r') as target: + wrapper = common.MMapWrapper(target) + + attrs = dir(wrapper.mmap) + attrs = [attr for attr in attrs + if not attr.startswith('__')] + attrs.append('__next__') + + for attr in attrs: + assert hasattr(wrapper, attr) + + assert not hasattr(wrapper, 'foo') + + def test_next(self): + with open(self.mmap_file, 'r') as target: + wrapper = common.MMapWrapper(target) + lines = target.readlines() + + for line in lines: + next_line = next(wrapper) + assert next_line.strip() == line.strip() + + pytest.raises(StopIteration, next, wrapper) + + def test_unknown_engine(self): + 
with tm.ensure_clean() as path: + df = tm.makeDataFrame() + df.to_csv(path) + with tm.assert_raises_regex(ValueError, 'Unknown engine'): + read_csv(path, engine='pyt') diff --git a/pandas/io/tests/test_excel.py b/pandas/tests/io/test_excel.py similarity index 80% rename from pandas/io/tests/test_excel.py rename to pandas/tests/io/test_excel.py index 12aecfd50c3a6..92147b46097b8 100644 --- a/pandas/io/tests/test_excel.py +++ b/pandas/tests/io/test_excel.py @@ -5,17 +5,20 @@ import sys import os from distutils.version import LooseVersion +from functools import partial import warnings +from warnings import catch_warnings import operator import functools -import nose +import pytest from numpy import nan import numpy as np import pandas as pd from pandas import DataFrame, Index, MultiIndex +from pandas.io.formats.excel import ExcelFormatter from pandas.io.parsers import read_csv from pandas.io.excel import ( ExcelFile, ExcelWriter, read_excel, _XlwtWriter, _Openpyxl1Writer, @@ -32,30 +35,30 @@ def _skip_if_no_xlrd(): import xlrd ver = tuple(map(int, xlrd.__VERSION__.split(".")[:2])) if ver < (0, 9): - raise nose.SkipTest('xlrd < 0.9, skipping') + pytest.skip('xlrd < 0.9, skipping') except ImportError: - raise nose.SkipTest('xlrd not installed, skipping') + pytest.skip('xlrd not installed, skipping') def _skip_if_no_xlwt(): try: import xlwt # NOQA except ImportError: - raise nose.SkipTest('xlwt not installed, skipping') + pytest.skip('xlwt not installed, skipping') def _skip_if_no_openpyxl(): try: import openpyxl # NOQA except ImportError: - raise nose.SkipTest('openpyxl not installed, skipping') + pytest.skip('openpyxl not installed, skipping') def _skip_if_no_xlsxwriter(): try: import xlsxwriter # NOQA except ImportError: - raise nose.SkipTest('xlsxwriter not installed, skipping') + pytest.skip('xlsxwriter not installed, skipping') def _skip_if_no_excelsuite(): @@ -68,7 +71,7 @@ def _skip_if_no_s3fs(): try: import s3fs # noqa except ImportError: - raise nose.SkipTest('s3fs not installed, skipping') + pytest.skip('s3fs not installed, skipping') _seriesd = tm.getSeriesData() @@ -82,7 +85,7 @@ def _skip_if_no_s3fs(): class SharedItems(object): - def setUp(self): + def setup_method(self, method): self.dirpath = tm.get_data_path() self.frame = _frame.copy() self.frame2 = _frame2.copy() @@ -159,9 +162,9 @@ class ReadingTestsBase(SharedItems): # 3. Add a property engine_name, which is the name of the reader class. # For the reader this is not used for anything at the moment. - def setUp(self): + def setup_method(self, method): self.check_skip() - super(ReadingTestsBase, self).setUp() + super(ReadingTestsBase, self).setup_method(method) def test_parse_cols_int(self): @@ -285,7 +288,7 @@ def test_excel_table_sheet_by_index(self): tm.assert_frame_equal(df3, df4) import xlrd - with tm.assertRaises(xlrd.XLRDError): + with pytest.raises(xlrd.XLRDError): read_excel(excel, 'asdf') def test_excel_table(self): @@ -397,7 +400,7 @@ def test_reader_dtype(self): expected['c'] = ['001', '002', '003', '004'] tm.assert_frame_equal(actual, expected) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): actual = self.get_exceldf(basename, dtype={'d': 'int64'}) def test_reading_all_sheets(self): @@ -405,13 +408,13 @@ def test_reading_all_sheets(self): # Ensure a dict is returned. 
# See PR #9450 basename = 'test_multisheet' - dfs = self.get_exceldf(basename, sheetname=None) + dfs = self.get_exceldf(basename, sheet_name=None) # ensure this is not alphabetical to test order preservation expected_keys = ['Charlie', 'Alpha', 'Beta'] tm.assert_contains_all(expected_keys, dfs.keys()) # Issue 9930 # Ensure sheet order is preserved - tm.assert_equal(expected_keys, list(dfs.keys())) + assert expected_keys == list(dfs.keys()) def test_reading_multiple_specific_sheets(self): # Test reading specific sheetnames by specifying a mixed list @@ -422,7 +425,7 @@ def test_reading_multiple_specific_sheets(self): basename = 'test_multisheet' # Explicitly request duplicates. Only the set should be returned. expected_keys = [2, 'Charlie', 'Charlie'] - dfs = self.get_exceldf(basename, sheetname=expected_keys) + dfs = self.get_exceldf(basename, sheet_name=expected_keys) expected_keys = list(set(expected_keys)) tm.assert_contains_all(expected_keys, dfs.keys()) assert len(expected_keys) == len(dfs.keys()) @@ -432,7 +435,7 @@ def test_reading_all_sheets_with_blank(self): # In the case where some sheets are blank. # Issue #11711 basename = 'blank_with_header' - dfs = self.get_exceldf(basename, sheetname=None) + dfs = self.get_exceldf(basename, sheet_name=None) expected_keys = ['Sheet1', 'Sheet2', 'Sheet3'] tm.assert_contains_all(expected_keys, dfs.keys()) @@ -542,6 +545,22 @@ def test_date_conversion_overflow(self): result = self.get_exceldf('testdateoverflow') tm.assert_frame_equal(result, expected) + def test_sheet_name_and_sheetname(self): + # GH10559: Minor improvement: Change "sheet_name" to "sheetname" + # GH10969: DOC: Consistent var names (sheetname vs sheet_name) + # GH12604: CLN GH10559 Rename sheetname variable to sheet_name + dfref = self.get_csv_refdf('test1') + df1 = self.get_exceldf('test1', sheet_name='Sheet1') # doc + with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + df2 = self.get_exceldf('test1', sheetname='Sheet1') # bkwrd compat + + tm.assert_frame_equal(df1, dfref, check_names=False) + tm.assert_frame_equal(df2, dfref, check_names=False) + + def test_sheet_name_both_raises(self): + with tm.assert_raises_regex(TypeError, "Cannot specify both"): + self.get_exceldf('test1', sheetname='Sheet1', sheet_name='Sheet1') + class XlrdTests(ReadingTestsBase): """ @@ -575,13 +594,13 @@ def test_read_xlrd_Book(self): result = read_excel(xl, "SheetA") tm.assert_frame_equal(df, result) - result = read_excel(book, sheetname="SheetA", engine="xlrd") + result = read_excel(book, sheet_name="SheetA", engine="xlrd") tm.assert_frame_equal(df, result) @tm.network def test_read_from_http_url(self): url = ('https://raw.github.com/pandas-dev/pandas/master/' - 'pandas/io/tests/data/test1' + self.ext) + 'pandas/tests/io/data/test1' + self.ext) url_table = read_excel(url) local_table = self.get_exceldf('test1') tm.assert_frame_equal(url_table, local_table) @@ -595,12 +614,12 @@ def test_read_from_s3_url(self): local_table = self.get_exceldf('test1') tm.assert_frame_equal(url_table, local_table) - @tm.slow + @pytest.mark.slow def test_read_from_file_url(self): # FILE if sys.version_info[:2] < (2, 6): - raise nose.SkipTest("file:// not supported with Python < 2.6") + pytest.skip("file:// not supported with Python < 2.6") localtable = os.path.join(self.dirpath, 'test1' + self.ext) local_table = read_excel(localtable) @@ -610,8 +629,8 @@ def test_read_from_file_url(self): except URLError: # fails on some systems import platform - raise nose.SkipTest("failing on %s" % - ' 
'.join(platform.uname()).strip()) + pytest.skip("failing on %s" % + ' '.join(platform.uname()).strip()) tm.assert_frame_equal(url_table, local_table) @@ -654,7 +673,7 @@ def test_reader_closes_file(self): # parses okay read_excel(xlsx, 'Sheet1', index_col=0) - self.assertTrue(f.closed) + assert f.closed def test_creating_and_reading_multiple_sheets(self): # Test reading multiple sheets, from a runtime created excel file @@ -677,7 +696,7 @@ def tdf(sheetname): with ExcelWriter(pth) as ew: for sheetname, df in iteritems(dfs): df.to_excel(ew, sheetname) - dfs_returned = read_excel(pth, sheetname=sheets) + dfs_returned = read_excel(pth, sheet_name=sheets) for s in sheets: tm.assert_frame_equal(dfs[s], dfs_returned[s]) @@ -862,8 +881,42 @@ def test_excel_multindex_roundtrip(self): tm.assert_frame_equal( df, act, check_names=check_names) - def test_excel_oldindex_format(self): - # GH 4679 + def test_excel_old_index_format(self): + # see gh-4679 + filename = 'test_index_name_pre17' + self.ext + in_file = os.path.join(self.dirpath, filename) + + # We detect headers to determine if index names exist, so + # that "index" name in the "names" version of the data will + # now be interpreted as rows that include null data. + data = np.array([[None, None, None, None, None], + ['R0C0', 'R0C1', 'R0C2', 'R0C3', 'R0C4'], + ['R1C0', 'R1C1', 'R1C2', 'R1C3', 'R1C4'], + ['R2C0', 'R2C1', 'R2C2', 'R2C3', 'R2C4'], + ['R3C0', 'R3C1', 'R3C2', 'R3C3', 'R3C4'], + ['R4C0', 'R4C1', 'R4C2', 'R4C3', 'R4C4']]) + columns = ['C_l0_g0', 'C_l0_g1', 'C_l0_g2', 'C_l0_g3', 'C_l0_g4'] + mi = MultiIndex(levels=[['R0', 'R_l0_g0', 'R_l0_g1', + 'R_l0_g2', 'R_l0_g3', 'R_l0_g4'], + ['R1', 'R_l1_g0', 'R_l1_g1', + 'R_l1_g2', 'R_l1_g3', 'R_l1_g4']], + labels=[[0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]], + names=[None, None]) + si = Index(['R0', 'R_l0_g0', 'R_l0_g1', 'R_l0_g2', + 'R_l0_g3', 'R_l0_g4'], name=None) + + expected = pd.DataFrame(data, index=si, columns=columns) + + actual = pd.read_excel(in_file, 'single_names') + tm.assert_frame_equal(actual, expected) + + expected.index = mi + + actual = pd.read_excel(in_file, 'multi_names') + tm.assert_frame_equal(actual, expected) + + # The analogous versions of the "names" version data + # where there are explicitly no names for the indices. 
data = np.array([['R0C0', 'R0C1', 'R0C2', 'R0C3', 'R0C4'], ['R1C0', 'R1C1', 'R1C2', 'R1C3', 'R1C4'], ['R2C0', 'R2C1', 'R2C2', 'R2C3', 'R2C4'], @@ -875,66 +928,60 @@ def test_excel_oldindex_format(self): ['R_l1_g0', 'R_l1_g1', 'R_l1_g2', 'R_l1_g3', 'R_l1_g4']], labels=[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]], - names=['R0', 'R1']) + names=[None, None]) si = Index(['R_l0_g0', 'R_l0_g1', 'R_l0_g2', - 'R_l0_g3', 'R_l0_g4'], name='R0') - - in_file = os.path.join( - self.dirpath, 'test_index_name_pre17' + self.ext) + 'R_l0_g3', 'R_l0_g4'], name=None) expected = pd.DataFrame(data, index=si, columns=columns) - with tm.assert_produces_warning(FutureWarning): - actual = pd.read_excel( - in_file, 'single_names', has_index_names=True) - tm.assert_frame_equal(actual, expected) - expected.index.name = None actual = pd.read_excel(in_file, 'single_no_names') tm.assert_frame_equal(actual, expected) - with tm.assert_produces_warning(FutureWarning): - actual = pd.read_excel( - in_file, 'single_no_names', has_index_names=False) - tm.assert_frame_equal(actual, expected) expected.index = mi - with tm.assert_produces_warning(FutureWarning): - actual = pd.read_excel( - in_file, 'multi_names', has_index_names=True) - tm.assert_frame_equal(actual, expected) - expected.index.names = [None, None] actual = pd.read_excel(in_file, 'multi_no_names', index_col=[0, 1]) tm.assert_frame_equal(actual, expected, check_names=False) - with tm.assert_produces_warning(FutureWarning): - actual = pd.read_excel(in_file, 'multi_no_names', index_col=[0, 1], - has_index_names=False) - tm.assert_frame_equal(actual, expected, check_names=False) def test_read_excel_bool_header_arg(self): # GH 6114 for arg in [True, False]: - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): pd.read_excel(os.path.join(self.dirpath, 'test1' + self.ext), header=arg) def test_read_excel_chunksize(self): # GH 8011 - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): pd.read_excel(os.path.join(self.dirpath, 'test1' + self.ext), chunksize=100) def test_read_excel_parse_dates(self): - # GH 11544 - with tm.assertRaises(NotImplementedError): - pd.read_excel(os.path.join(self.dirpath, 'test1' + self.ext), - parse_dates=True) + # GH 11544, 12051 - def test_read_excel_date_parser(self): - # GH 11544 - with tm.assertRaises(NotImplementedError): - dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S') - pd.read_excel(os.path.join(self.dirpath, 'test1' + self.ext), - date_parser=dateparse) + df = DataFrame( + {'col': [1, 2, 3], + 'date_strings': pd.date_range('2012-01-01', periods=3)}) + df2 = df.copy() + df2['date_strings'] = df2['date_strings'].dt.strftime('%m/%d/%Y') + + with ensure_clean(self.ext) as pth: + df2.to_excel(pth) + + res = read_excel(pth) + tm.assert_frame_equal(df2, res) + + # no index_col specified when parse_dates is True + with tm.assert_produces_warning(): + res = read_excel(pth, parse_dates=True) + tm.assert_frame_equal(df2, res) + + res = read_excel(pth, parse_dates=['date_strings'], index_col=0) + tm.assert_frame_equal(df, res) + + dateparser = lambda x: pd.datetime.strptime(x, '%m/%d/%Y') + res = read_excel(pth, parse_dates=['date_strings'], + date_parser=dateparser, index_col=0) + tm.assert_frame_equal(df, res) def test_read_excel_skiprows_list(self): # GH 4903 @@ -972,19 +1019,19 @@ def test_read_excel_squeeze(self): tm.assert_series_equal(actual, expected) -class XlsReaderTests(XlrdTests, tm.TestCase): +class TestXlsReaderTests(XlrdTests): ext = '.xls' engine_name = 'xlrd' check_skip = 
staticmethod(_skip_if_no_xlrd) -class XlsxReaderTests(XlrdTests, tm.TestCase): +class TestXlsxReaderTests(XlrdTests): ext = '.xlsx' engine_name = 'xlrd' check_skip = staticmethod(_skip_if_no_xlrd) -class XlsmReaderTests(XlrdTests, tm.TestCase): +class TestXlsmReaderTests(XlrdTests): ext = '.xlsm' engine_name = 'xlrd' check_skip = staticmethod(_skip_if_no_xlrd) @@ -1002,14 +1049,14 @@ class ExcelWriterBase(SharedItems): # Test with MultiIndex and Hierarchical Rows as merged cells. merge_cells = True - def setUp(self): + def setup_method(self, method): self.check_skip() - super(ExcelWriterBase, self).setUp() + super(ExcelWriterBase, self).setup_method(method) self.option_name = 'io.excel.%s.writer' % self.ext.strip('.') self.prev_engine = get_option(self.option_name) set_option(self.option_name, self.engine_name) - def tearDown(self): + def teardown_method(self, method): set_option(self.option_name, self.prev_engine) def test_excel_sheet_by_name_raise(self): @@ -1023,7 +1070,7 @@ def test_excel_sheet_by_name_raise(self): df = read_excel(xl, 0) tm.assert_frame_equal(gt, df) - with tm.assertRaises(xlrd.XLRDError): + with pytest.raises(xlrd.XLRDError): read_excel(xl, '0') def test_excelwriter_contextmanager(self): @@ -1199,9 +1246,9 @@ def test_sheets(self): tm.assert_frame_equal(self.frame, recons) recons = read_excel(reader, 'test2', index_col=0) tm.assert_frame_equal(self.tsframe, recons) - self.assertEqual(2, len(reader.sheet_names)) - self.assertEqual('test1', reader.sheet_names[0]) - self.assertEqual('test2', reader.sheet_names[1]) + assert 2 == len(reader.sheet_names) + assert 'test1' == reader.sheet_names[0] + assert 'test2' == reader.sheet_names[1] def test_colaliases(self): _skip_if_no_xlrd() @@ -1245,7 +1292,7 @@ def test_roundtrip_indexlabels(self): index_col=0, ).astype(np.int64) frame.index.names = ['test'] - self.assertEqual(frame.index.names, recons.index.names) + assert frame.index.names == recons.index.names frame = (DataFrame(np.random.randn(10, 2)) >= 0) frame.to_excel(path, @@ -1257,7 +1304,7 @@ def test_roundtrip_indexlabels(self): index_col=0, ).astype(np.int64) frame.index.names = ['test'] - self.assertEqual(frame.index.names, recons.index.names) + assert frame.index.names == recons.index.names frame = (DataFrame(np.random.randn(10, 2)) >= 0) frame.to_excel(path, @@ -1299,7 +1346,7 @@ def test_excel_roundtrip_indexname(self): index_col=0) tm.assert_frame_equal(result, df) - self.assertEqual(result.index.name, 'foo') + assert result.index.name == 'foo' def test_excel_roundtrip_datetime(self): _skip_if_no_xlrd() @@ -1382,8 +1429,7 @@ def test_to_excel_multiindex(self): # round trip frame.to_excel(path, 'test1', merge_cells=self.merge_cells) reader = ExcelFile(path) - df = read_excel(reader, 'test1', index_col=[0, 1], - parse_dates=False) + df = read_excel(reader, 'test1', index_col=[0, 1]) tm.assert_frame_equal(frame, df) # GH13511 @@ -1424,8 +1470,7 @@ def test_to_excel_multiindex_cols(self): frame.to_excel(path, 'test1', merge_cells=self.merge_cells) reader = ExcelFile(path) df = read_excel(reader, 'test1', header=header, - index_col=[0, 1], - parse_dates=False) + index_col=[0, 1]) if not self.merge_cells: fm = frame.columns.format(sparsify=False, adjoin=False, names=False) @@ -1448,7 +1493,7 @@ def test_to_excel_multiindex_dates(self): index_col=[0, 1]) tm.assert_frame_equal(tsframe, recons) - self.assertEqual(recons.index.names, ('time', 'foo')) + assert recons.index.names == ('time', 'foo') def test_to_excel_multiindex_no_write_index(self): _skip_if_no_xlrd() @@ 
-1513,7 +1558,7 @@ def test_to_excel_unicode_filename(self): try: f = open(filename, 'wb') except UnicodeEncodeError: - raise nose.SkipTest('no unicode file names on this system') + pytest.skip('no unicode file names on this system') else: f.close() @@ -1555,28 +1600,27 @@ def test_to_excel_unicode_filename(self): # import xlwt # import xlrd # except ImportError: - # raise nose.SkipTest + # pytest.skip # filename = '__tmp_to_excel_header_styling_xls__.xls' # pdf.to_excel(filename, 'test1') # wbk = xlrd.open_workbook(filename, # formatting_info=True) - # self.assertEqual(["test1"], wbk.sheet_names()) + # assert ["test1"] == wbk.sheet_names() # ws = wbk.sheet_by_name('test1') - # self.assertEqual([(0, 1, 5, 7), (0, 1, 3, 5), (0, 1, 1, 3)], - # ws.merged_cells) + # assert [(0, 1, 5, 7), (0, 1, 3, 5), (0, 1, 1, 3)] == ws.merged_cells # for i in range(0, 2): # for j in range(0, 7): # xfx = ws.cell_xf_index(0, 0) # cell_xf = wbk.xf_list[xfx] # font = wbk.font_list - # self.assertEqual(1, font[cell_xf.font_index].bold) - # self.assertEqual(1, cell_xf.border.top_line_style) - # self.assertEqual(1, cell_xf.border.right_line_style) - # self.assertEqual(1, cell_xf.border.bottom_line_style) - # self.assertEqual(1, cell_xf.border.left_line_style) - # self.assertEqual(2, cell_xf.alignment.hor_align) + # assert 1 == font[cell_xf.font_index].bold + # assert 1 == cell_xf.border.top_line_style + # assert 1 == cell_xf.border.right_line_style + # assert 1 == cell_xf.border.bottom_line_style + # assert 1 == cell_xf.border.left_line_style + # assert 2 == cell_xf.alignment.hor_align # os.remove(filename) # def test_to_excel_header_styling_xlsx(self): # import StringIO @@ -1601,41 +1645,41 @@ def test_to_excel_unicode_filename(self): # import openpyxl # from openpyxl.cell import get_column_letter # except ImportError: - # raise nose.SkipTest + # pytest.skip # if openpyxl.__version__ < '1.6.1': - # raise nose.SkipTest + # pytest.skip # # test xlsx_styling # filename = '__tmp_to_excel_header_styling_xlsx__.xlsx' # pdf.to_excel(filename, 'test1') # wbk = openpyxl.load_workbook(filename) - # self.assertEqual(["test1"], wbk.get_sheet_names()) + # assert ["test1"] == wbk.get_sheet_names() # ws = wbk.get_sheet_by_name('test1') # xlsaddrs = ["%s2" % chr(i) for i in range(ord('A'), ord('H'))] # xlsaddrs += ["A%s" % i for i in range(1, 6)] # xlsaddrs += ["B1", "D1", "F1"] # for xlsaddr in xlsaddrs: # cell = ws.cell(xlsaddr) - # self.assertTrue(cell.style.font.bold) - # self.assertEqual(openpyxl.style.Border.BORDER_THIN, - # cell.style.borders.top.border_style) - # self.assertEqual(openpyxl.style.Border.BORDER_THIN, - # cell.style.borders.right.border_style) - # self.assertEqual(openpyxl.style.Border.BORDER_THIN, - # cell.style.borders.bottom.border_style) - # self.assertEqual(openpyxl.style.Border.BORDER_THIN, - # cell.style.borders.left.border_style) - # self.assertEqual(openpyxl.style.Alignment.HORIZONTAL_CENTER, - # cell.style.alignment.horizontal) + # assert cell.style.font.bold + # assert (openpyxl.style.Border.BORDER_THIN == + # cell.style.borders.top.border_style) + # assert (openpyxl.style.Border.BORDER_THIN == + # cell.style.borders.right.border_style) + # assert (openpyxl.style.Border.BORDER_THIN == + # cell.style.borders.bottom.border_style) + # assert (openpyxl.style.Border.BORDER_THIN == + # cell.style.borders.left.border_style) + # assert (openpyxl.style.Alignment.HORIZONTAL_CENTER == + # cell.style.alignment.horizontal) # mergedcells_addrs = ["C1", "E1", "G1"] # for maddr in mergedcells_addrs: - # 
self.assertTrue(ws.cell(maddr).merged) + # assert ws.cell(maddr).merged # os.remove(filename) def test_excel_010_hemstring(self): _skip_if_no_xlrd() if self.merge_cells: - raise nose.SkipTest('Skip tests for merged MI format.') + pytest.skip('Skip tests for merged MI format.') from pandas.util.testing import makeCustomDataframe as mkdf # ensure limited functionality in 0.10 @@ -1660,29 +1704,29 @@ def roundtrip(df, header=True, parser_hdr=0, index=True): # this if will be removed once multi column excel writing # is implemented for now fixing #9794 if j > 1: - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): res = roundtrip(df, use_headers, index=False) else: res = roundtrip(df, use_headers) if use_headers: - self.assertEqual(res.shape, (nrows, ncols + i)) + assert res.shape == (nrows, ncols + i) else: # first row taken as columns - self.assertEqual(res.shape, (nrows - 1, ncols + i)) + assert res.shape == (nrows - 1, ncols + i) # no nans for r in range(len(res.index)): for c in range(len(res.columns)): - self.assertTrue(res.iloc[r, c] is not np.nan) + assert res.iloc[r, c] is not np.nan res = roundtrip(DataFrame([0])) - self.assertEqual(res.shape, (1, 1)) - self.assertTrue(res.iloc[0, 0] is not np.nan) + assert res.shape == (1, 1) + assert res.iloc[0, 0] is not np.nan res = roundtrip(DataFrame([0]), False, None) - self.assertEqual(res.shape, (1, 2)) - self.assertTrue(res.iloc[0, 0] is not np.nan) + assert res.shape == (1, 2) + assert res.iloc[0, 0] is not np.nan def test_excel_010_hemstring_raises_NotImplementedError(self): # This test was failing only for j>1 and header=False, @@ -1690,7 +1734,7 @@ def test_excel_010_hemstring_raises_NotImplementedError(self): _skip_if_no_xlrd() if self.merge_cells: - raise nose.SkipTest('Skip tests for merged MI format.') + pytest.skip('Skip tests for merged MI format.') from pandas.util.testing import makeCustomDataframe as mkdf # ensure limited functionality in 0.10 @@ -1710,7 +1754,7 @@ def roundtrip2(df, header=True, parser_hdr=0, index=True): j = 2 i = 1 df = mkdf(nrows, ncols, r_idx_nlevels=i, c_idx_nlevels=j) - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): roundtrip2(df, header=False, index=False) def test_duplicated_columns(self): @@ -1769,7 +1813,7 @@ def test_invalid_columns(self): read_frame = read_excel(path, 'test1') tm.assert_frame_equal(expected, read_frame) - with tm.assertRaises(KeyError): + with pytest.raises(KeyError): write_frame.to_excel(path, 'test1', columns=['C', 'D']) def test_datetimes(self): @@ -1836,6 +1880,30 @@ def test_true_and_false_value_options(self): false_values=['bar']) tm.assert_frame_equal(read_frame, expected) + def test_freeze_panes(self): + # GH15160 + expected = DataFrame([[1, 2], [3, 4]], columns=['col1', 'col2']) + with ensure_clean(self.ext) as path: + expected.to_excel(path, "Sheet1", freeze_panes=(1, 1)) + result = read_excel(path) + tm.assert_frame_equal(expected, result) + + def test_path_pathlib(self): + df = tm.makeDataFrame() + writer = partial(df.to_excel, engine=self.engine_name) + reader = partial(pd.read_excel) + result = tm.round_trip_pathlib(writer, reader, + path="foo.{}".format(self.ext)) + tm.assert_frame_equal(df, result) + + def test_path_localpath(self): + df = tm.makeDataFrame() + writer = partial(df.to_excel, engine=self.engine_name) + reader = partial(pd.read_excel) + result = tm.round_trip_pathlib(writer, reader, + path="foo.{}".format(self.ext)) + tm.assert_frame_equal(df, result) + def 
raise_wrapper(major_ver): def versioned_raise_wrapper(orig_method): @@ -1847,7 +1915,7 @@ def wrapped(self, *args, **kwargs): else: msg = (r'Installed openpyxl is not supported at this ' r'time\. Use.+') - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): orig_method(self, *args, **kwargs) return wrapped return versioned_raise_wrapper @@ -1865,7 +1933,7 @@ def versioned_raise_on_incompat_version(cls): @raise_on_incompat_version(1) -class OpenpyxlTests(ExcelWriterBase, tm.TestCase): +class TestOpenpyxlTests(ExcelWriterBase): ext = '.xlsx' engine_name = 'openpyxl1' check_skip = staticmethod(lambda *args, **kwargs: None) @@ -1873,7 +1941,7 @@ class OpenpyxlTests(ExcelWriterBase, tm.TestCase): def test_to_excel_styleconverter(self): _skip_if_no_openpyxl() if not openpyxl_compat.is_compat(major_ver=1): - raise nose.SkipTest('incompatiable openpyxl version') + pytest.skip('incompatible openpyxl version') import openpyxl @@ -1885,40 +1953,40 @@ def test_to_excel_styleconverter(self): "alignment": {"horizontal": "center", "vertical": "top"}} xlsx_style = _Openpyxl1Writer._convert_to_style(hstyle) - self.assertTrue(xlsx_style.font.bold) - self.assertEqual(openpyxl.style.Border.BORDER_THIN, - xlsx_style.borders.top.border_style) - self.assertEqual(openpyxl.style.Border.BORDER_THIN, - xlsx_style.borders.right.border_style) - self.assertEqual(openpyxl.style.Border.BORDER_THIN, - xlsx_style.borders.bottom.border_style) - self.assertEqual(openpyxl.style.Border.BORDER_THIN, - xlsx_style.borders.left.border_style) - self.assertEqual(openpyxl.style.Alignment.HORIZONTAL_CENTER, - xlsx_style.alignment.horizontal) - self.assertEqual(openpyxl.style.Alignment.VERTICAL_TOP, - xlsx_style.alignment.vertical) + assert xlsx_style.font.bold + assert (openpyxl.style.Border.BORDER_THIN == + xlsx_style.borders.top.border_style) + assert (openpyxl.style.Border.BORDER_THIN == + xlsx_style.borders.right.border_style) + assert (openpyxl.style.Border.BORDER_THIN == + xlsx_style.borders.bottom.border_style) + assert (openpyxl.style.Border.BORDER_THIN == + xlsx_style.borders.left.border_style) + assert (openpyxl.style.Alignment.HORIZONTAL_CENTER == + xlsx_style.alignment.horizontal) + assert (openpyxl.style.Alignment.VERTICAL_TOP == + xlsx_style.alignment.vertical) def skip_openpyxl_gt21(cls): - """Skip a TestCase instance if openpyxl >= 2.2""" + """Skip test case if openpyxl >= 2.2""" @classmethod - def setUpClass(cls): + def setup_class(cls): _skip_if_no_openpyxl() import openpyxl ver = openpyxl.__version__ if (not (LooseVersion(ver) >= LooseVersion('2.0.0') and LooseVersion(ver) < LooseVersion('2.2.0'))): - raise nose.SkipTest("openpyxl %s >= 2.2" % str(ver)) + pytest.skip("openpyxl %s >= 2.2" % str(ver)) - cls.setUpClass = setUpClass + cls.setup_class = setup_class return cls @raise_on_incompat_version(2) @skip_openpyxl_gt21 -class Openpyxl20Tests(ExcelWriterBase, tm.TestCase): +class TestOpenpyxl20Tests(ExcelWriterBase): ext = '.xlsx' engine_name = 'openpyxl20' check_skip = staticmethod(lambda *args, **kwargs: None) @@ -1976,15 +2044,15 @@ def test_to_excel_styleconverter(self): protection = styles.Protection(locked=True, hidden=False) kw = _Openpyxl20Writer._convert_to_style_kwargs(hstyle) - self.assertEqual(kw['font'], font) - self.assertEqual(kw['border'], border) - self.assertEqual(kw['alignment'], alignment) - self.assertEqual(kw['fill'], fill) - self.assertEqual(kw['number_format'], number_format) - self.assertEqual(kw['protection'], protection) + assert kw['font'] == font 
+ assert kw['border'] == border + assert kw['alignment'] == alignment + assert kw['fill'] == fill + assert kw['number_format'] == number_format + assert kw['protection'] == protection def test_write_cells_merge_styled(self): - from pandas.formats.format import ExcelCell + from pandas.io.formats.excel import ExcelCell from openpyxl import styles sheet_name = 'merge_styled' @@ -2013,28 +2081,28 @@ def test_write_cells_merge_styled(self): wks = writer.sheets[sheet_name] xcell_b1 = wks['B1'] xcell_a2 = wks['A2'] - self.assertEqual(xcell_b1.style, openpyxl_sty_merged) - self.assertEqual(xcell_a2.style, openpyxl_sty_merged) + assert xcell_b1.style == openpyxl_sty_merged + assert xcell_a2.style == openpyxl_sty_merged def skip_openpyxl_lt22(cls): - """Skip a TestCase instance if openpyxl < 2.2""" + """Skip test case if openpyxl < 2.2""" @classmethod - def setUpClass(cls): + def setup_class(cls): _skip_if_no_openpyxl() import openpyxl ver = openpyxl.__version__ if LooseVersion(ver) < LooseVersion('2.2.0'): - raise nose.SkipTest("openpyxl %s < 2.2" % str(ver)) + pytest.skip("openpyxl %s < 2.2" % str(ver)) - cls.setUpClass = setUpClass + cls.setup_class = setup_class return cls @raise_on_incompat_version(2) @skip_openpyxl_lt22 -class Openpyxl22Tests(ExcelWriterBase, tm.TestCase): +class TestOpenpyxl22Tests(ExcelWriterBase): ext = '.xlsx' engine_name = 'openpyxl22' check_skip = staticmethod(lambda *args, **kwargs: None) @@ -2086,18 +2154,18 @@ def test_to_excel_styleconverter(self): protection = styles.Protection(locked=True, hidden=False) kw = _Openpyxl22Writer._convert_to_style_kwargs(hstyle) - self.assertEqual(kw['font'], font) - self.assertEqual(kw['border'], border) - self.assertEqual(kw['alignment'], alignment) - self.assertEqual(kw['fill'], fill) - self.assertEqual(kw['number_format'], number_format) - self.assertEqual(kw['protection'], protection) + assert kw['font'] == font + assert kw['border'] == border + assert kw['alignment'] == alignment + assert kw['fill'] == fill + assert kw['number_format'] == number_format + assert kw['protection'] == protection def test_write_cells_merge_styled(self): if not openpyxl_compat.is_compat(major_ver=2): - raise nose.SkipTest('incompatiable openpyxl version') + pytest.skip('incompatible openpyxl version') - from pandas.formats.format import ExcelCell + from pandas.io.formats.excel import ExcelCell sheet_name = 'merge_styled' @@ -2125,11 +2193,11 @@ def test_write_cells_merge_styled(self): wks = writer.sheets[sheet_name] xcell_b1 = wks['B1'] xcell_a2 = wks['A2'] - self.assertEqual(xcell_b1.font, openpyxl_sty_merged) - self.assertEqual(xcell_a2.font, openpyxl_sty_merged) + assert xcell_b1.font == openpyxl_sty_merged + assert xcell_a2.font == openpyxl_sty_merged -class XlwtTests(ExcelWriterBase, tm.TestCase): +class TestXlwtTests(ExcelWriterBase): ext = '.xls' engine_name = 'xlwt' check_skip = staticmethod(_skip_if_no_xlwt) @@ -2141,7 +2209,7 @@ def test_excel_raise_error_on_multiindex_columns_and_no_index(self): ('2014', 'height'), ('2014', 'weight')]) df = DataFrame(np.random.randn(10, 3), columns=cols) - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): with ensure_clean(self.ext) as path: df.to_excel(path, index=False) @@ -2177,16 +2245,16 @@ def test_to_excel_styleconverter(self): "alignment": {"horizontal": "center", "vertical": "top"}} xls_style = _XlwtWriter._convert_to_style(hstyle) - self.assertTrue(xls_style.font.bold) - self.assertEqual(xlwt.Borders.THIN, xls_style.borders.top) - 
self.assertEqual(xlwt.Borders.THIN, xls_style.borders.right) - self.assertEqual(xlwt.Borders.THIN, xls_style.borders.bottom) - self.assertEqual(xlwt.Borders.THIN, xls_style.borders.left) - self.assertEqual(xlwt.Alignment.HORZ_CENTER, xls_style.alignment.horz) - self.assertEqual(xlwt.Alignment.VERT_TOP, xls_style.alignment.vert) + assert xls_style.font.bold + assert xlwt.Borders.THIN == xls_style.borders.top + assert xlwt.Borders.THIN == xls_style.borders.right + assert xlwt.Borders.THIN == xls_style.borders.bottom + assert xlwt.Borders.THIN == xls_style.borders.left + assert xlwt.Alignment.HORZ_CENTER == xls_style.alignment.horz + assert xlwt.Alignment.VERT_TOP == xls_style.alignment.vert -class XlsxWriterTests(ExcelWriterBase, tm.TestCase): +class TestXlsxWriterTests(ExcelWriterBase): ext = '.xlsx' engine_name = 'xlsxwriter' check_skip = staticmethod(_skip_if_no_xlsxwriter) @@ -2236,10 +2304,10 @@ def test_column_format(self): except: read_num_format = cell.style.number_format._format_code - self.assertEqual(read_num_format, num_format) + assert read_num_format == num_format -class OpenpyxlTests_NoMerge(ExcelWriterBase, tm.TestCase): +class TestOpenpyxlTests_NoMerge(ExcelWriterBase): ext = '.xlsx' engine_name = 'openpyxl' check_skip = staticmethod(_skip_if_no_openpyxl) @@ -2248,7 +2316,7 @@ class OpenpyxlTests_NoMerge(ExcelWriterBase, tm.TestCase): merge_cells = False -class XlwtTests_NoMerge(ExcelWriterBase, tm.TestCase): +class TestXlwtTests_NoMerge(ExcelWriterBase): ext = '.xls' engine_name = 'xlwt' check_skip = staticmethod(_skip_if_no_xlwt) @@ -2257,7 +2325,7 @@ class XlwtTests_NoMerge(ExcelWriterBase, tm.TestCase): merge_cells = False -class XlsxWriterTests_NoMerge(ExcelWriterBase, tm.TestCase): +class TestXlsxWriterTests_NoMerge(ExcelWriterBase): ext = '.xlsx' engine_name = 'xlsxwriter' check_skip = staticmethod(_skip_if_no_xlsxwriter) @@ -2266,10 +2334,10 @@ class XlsxWriterTests_NoMerge(ExcelWriterBase, tm.TestCase): merge_cells = False -class ExcelWriterEngineTests(tm.TestCase): +class TestExcelWriterEngineTests(object): def test_ExcelWriter_dispatch(self): - with tm.assertRaisesRegexp(ValueError, 'No engine'): + with tm.assert_raises_regex(ValueError, 'No engine'): ExcelWriter('nothing') try: @@ -2278,17 +2346,17 @@ def test_ExcelWriter_dispatch(self): except ImportError: _skip_if_no_openpyxl() if not openpyxl_compat.is_compat(major_ver=1): - raise nose.SkipTest('incompatible openpyxl version') + pytest.skip('incompatible openpyxl version') writer_klass = _Openpyxl1Writer with ensure_clean('.xlsx') as path: writer = ExcelWriter(path) - tm.assertIsInstance(writer, writer_klass) + assert isinstance(writer, writer_klass) _skip_if_no_xlwt() with ensure_clean('.xls') as path: writer = ExcelWriter(path) - tm.assertIsInstance(writer, _XlwtWriter) + assert isinstance(writer, _XlwtWriter) def test_register_writer(self): # some awkward mocking to test out dispatch and such actually works @@ -2309,24 +2377,184 @@ def write_cells(self, *args, **kwargs): def check_called(func): func() - self.assertTrue(len(called_save) >= 1) - self.assertTrue(len(called_write_cells) >= 1) + assert len(called_save) >= 1 + assert len(called_write_cells) >= 1 del called_save[:] del called_write_cells[:] with pd.option_context('io.excel.xlsx.writer', 'dummy'): register_writer(DummyClass) writer = ExcelWriter('something.test') - tm.assertIsInstance(writer, DummyClass) + assert isinstance(writer, DummyClass) df = tm.makeCustomDataframe(1, 1) - panel = tm.makePanel() - func = lambda: df.to_excel('something.test') 
- check_called(func) - check_called(lambda: panel.to_excel('something.test')) - check_called(lambda: df.to_excel('something.xlsx')) - check_called(lambda: df.to_excel('something.xls', engine='dummy')) + + with catch_warnings(record=True): + panel = tm.makePanel() + func = lambda: df.to_excel('something.test') + check_called(func) + check_called(lambda: panel.to_excel('something.test')) + check_called(lambda: df.to_excel('something.xlsx')) + check_called( + lambda: df.to_excel( + 'something.xls', engine='dummy')) + + +@pytest.mark.parametrize('engine', [ + pytest.param('xlwt', + marks=pytest.mark.xfail(reason='xlwt does not support ' + 'openpyxl-compatible ' + 'style dicts')), + 'xlsxwriter', + 'openpyxl', +]) +def test_styler_to_excel(engine): + def style(df): + # XXX: RGB colors not supported in xlwt + return DataFrame([['font-weight: bold', '', ''], + ['', 'color: blue', ''], + ['', '', 'text-decoration: underline'], + ['border-style: solid', '', ''], + ['', 'font-style: italic', ''], + ['', '', 'text-align: right'], + ['background-color: red', '', ''], + ['', '', ''], + ['', '', ''], + ['', '', '']], + index=df.index, columns=df.columns) + + def assert_equal_style(cell1, cell2): + # XXX: should find a better way to check equality + assert cell1.alignment.__dict__ == cell2.alignment.__dict__ + assert cell1.border.__dict__ == cell2.border.__dict__ + assert cell1.fill.__dict__ == cell2.fill.__dict__ + assert cell1.font.__dict__ == cell2.font.__dict__ + assert cell1.number_format == cell2.number_format + assert cell1.protection.__dict__ == cell2.protection.__dict__ + + def custom_converter(css): + # use bold iff there is custom style attached to the cell + if css.strip(' \n;'): + return {'font': {'bold': True}} + return {} + + pytest.importorskip('jinja2') + pytest.importorskip(engine) + + if engine == 'openpyxl' and openpyxl_compat.is_compat(major_ver=1): + pytest.xfail('openpyxl1 does not support some openpyxl2-compatible ' + 'style dicts') + + # Prepare spreadsheets + + df = DataFrame(np.random.randn(10, 3)) + with ensure_clean('.xlsx' if engine != 'xlwt' else '.xls') as path: + writer = ExcelWriter(path, engine=engine) + df.to_excel(writer, sheet_name='frame') + df.style.to_excel(writer, sheet_name='unstyled') + styled = df.style.apply(style, axis=None) + styled.to_excel(writer, sheet_name='styled') + ExcelFormatter(styled, style_converter=custom_converter).write( + writer, sheet_name='custom') + + # For engines other than openpyxl 2, we only smoke test + if engine != 'openpyxl': + return + if not openpyxl_compat.is_compat(major_ver=2): + pytest.skip('incompatible openpyxl version') + + # (1) compare DataFrame.to_excel and Styler.to_excel when unstyled + n_cells = 0 + for col1, col2 in zip(writer.sheets['frame'].columns, + writer.sheets['unstyled'].columns): + assert len(col1) == len(col2) + for cell1, cell2 in zip(col1, col2): + assert cell1.value == cell2.value + assert_equal_style(cell1, cell2) + n_cells += 1 + + # ensure iteration actually happened: + assert n_cells == (10 + 1) * (3 + 1) + + # (2) check styling with default converter + n_cells = 0 + for col1, col2 in zip(writer.sheets['frame'].columns, + writer.sheets['styled'].columns): + assert len(col1) == len(col2) + for cell1, cell2 in zip(col1, col2): + ref = '%s%d' % (cell2.column, cell2.row) + # XXX: this isn't as strong a test as ideal; we should + # differences are exclusive + if ref == 'B2': + assert not cell1.font.bold + assert cell2.font.bold + elif ref == 'C3': + assert cell1.font.color.rgb != cell2.font.color.rgb + 
assert cell2.font.color.rgb == '000000FF' + elif ref == 'D4': + assert cell1.font.underline != cell2.font.underline + assert cell2.font.underline == 'single' + elif ref == 'B5': + assert not cell1.border.left.style + assert (cell2.border.top.style == + cell2.border.right.style == + cell2.border.bottom.style == + cell2.border.left.style == + 'medium') + elif ref == 'C6': + assert not cell1.font.italic + assert cell2.font.italic + elif ref == 'D7': + assert (cell1.alignment.horizontal != + cell2.alignment.horizontal) + assert cell2.alignment.horizontal == 'right' + elif ref == 'B8': + assert cell1.fill.fgColor.rgb != cell2.fill.fgColor.rgb + assert cell1.fill.patternType != cell2.fill.patternType + assert cell2.fill.fgColor.rgb == '00FF0000' + assert cell2.fill.patternType == 'solid' + else: + assert_equal_style(cell1, cell2) + + assert cell1.value == cell2.value + n_cells += 1 + + assert n_cells == (10 + 1) * (3 + 1) + + # (3) check styling with custom converter + n_cells = 0 + for col1, col2 in zip(writer.sheets['frame'].columns, + writer.sheets['custom'].columns): + assert len(col1) == len(col2) + for cell1, cell2 in zip(col1, col2): + ref = '%s%d' % (cell2.column, cell2.row) + if ref in ('B2', 'C3', 'D4', 'B5', 'C6', 'D7', 'B8'): + assert not cell1.font.bold + assert cell2.font.bold + else: + assert_equal_style(cell1, cell2) + + assert cell1.value == cell2.value + n_cells += 1 + + assert n_cells == (10 + 1) * (3 + 1) -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) +class TestFSPath(object): + + @pytest.mark.skipif(sys.version_info < (3, 6), reason='requires fspath') + def test_excelfile_fspath(self): + _skip_if_no_openpyxl() + with tm.ensure_clean('foo.xlsx') as path: + df = DataFrame({"A": [1, 2]}) + df.to_excel(path) + xl = ExcelFile(path) + result = os.fspath(xl) + assert result == path + + @pytest.mark.skipif(sys.version_info < (3, 6), reason='requires fspath') + # @pytest.mark.xfail + def test_excelwriter_fspath(self): + _skip_if_no_openpyxl() + with tm.ensure_clean('foo.xlsx') as path: + writer = ExcelWriter(path) + assert os.fspath(writer) == str(path) diff --git a/pandas/io/tests/test_feather.py b/pandas/tests/io/test_feather.py similarity index 54% rename from pandas/io/tests/test_feather.py rename to pandas/tests/io/test_feather.py index b8b85d7dbbece..dadfe7ca87e48 100644 --- a/pandas/io/tests/test_feather.py +++ b/pandas/tests/io/test_feather.py @@ -1,42 +1,37 @@ """ test feather-format compat """ -import nose +import pytest +feather = pytest.importorskip('feather') import numpy as np import pandas as pd - from pandas.io.feather_format import to_feather, read_feather -try: - import feather # noqa -except ImportError: - raise nose.SkipTest('no feather-format installed') - from feather import FeatherError -import pandas.util.testing as tm from pandas.util.testing import assert_frame_equal, ensure_clean +import pandas.util.testing as tm +from distutils.version import LooseVersion + +fv = LooseVersion(feather.__version__) -class TestFeather(tm.TestCase): - _multiprocess_can_split_ = True - def setUp(self): - pass +@pytest.mark.single +class TestFeather(object): def check_error_on_write(self, df, exc): # check that we are raising the exception # on writing - def f(): + with pytest.raises(exc): with ensure_clean() as path: to_feather(df, path) - self.assertRaises(exc, f) - def check_round_trip(self, df): + def check_round_trip(self, df, **kwargs): with ensure_clean() as path: to_feather(df, path) - result = 
read_feather(path) + result = read_feather(path, **kwargs) assert_frame_equal(result, df) def test_error(self): @@ -47,20 +42,26 @@ def test_error(self): def test_basic(self): - df = pd.DataFrame({'a': list('abc'), - 'b': list(range(1, 4)), - 'c': np.arange(3, 6).astype('u1'), - 'd': np.arange(4.0, 7.0, dtype='float64'), - 'e': [True, False, True], - 'f': pd.Categorical(list('abc')), - 'g': pd.date_range('20130101', periods=3), - 'h': pd.date_range('20130101', periods=3, - tz='US/Eastern'), - 'i': pd.date_range('20130101', periods=3, - freq='ns')}) - + df = pd.DataFrame({'string': list('abc'), + 'int': list(range(1, 4)), + 'uint': np.arange(3, 6).astype('u1'), + 'float': np.arange(4.0, 7.0, dtype='float64'), + 'float_with_null': [1., np.nan, 3], + 'bool': [True, False, True], + 'bool_with_null': [True, np.nan, False], + 'cat': pd.Categorical(list('abc')), + 'dt': pd.date_range('20130101', periods=3), + 'dttz': pd.date_range('20130101', periods=3, + tz='US/Eastern'), + 'dt_with_null': [pd.Timestamp('20130101'), pd.NaT, + pd.Timestamp('20130103')], + 'dtns': pd.date_range('20130101', periods=3, + freq='ns')}) + + assert df.dttz.dtype.tz.zone == 'US/Eastern' self.check_round_trip(df) + @pytest.mark.skipif(fv >= '0.4.0', reason='fixed in 0.4.0') def test_strided_data_issues(self): # strided data issuehttps://github.com/wesm/feather/issues/97 @@ -80,16 +81,29 @@ def test_stringify_columns(self): df = pd.DataFrame(np.arange(12).reshape(4, 3)).copy() self.check_error_on_write(df, ValueError) + @pytest.mark.skipif(fv >= '0.4.0', reason='fixed in 0.4.0') def test_unsupported(self): - # period - df = pd.DataFrame({'a': pd.period_range('2013', freq='M', periods=3)}) - self.check_error_on_write(df, ValueError) + # timedelta + df = pd.DataFrame({'a': pd.timedelta_range('1 day', periods=3)}) + self.check_error_on_write(df, FeatherError) # non-strings df = pd.DataFrame({'a': ['a', 1, 2.0]}) self.check_error_on_write(df, ValueError) + def test_unsupported_other(self): + + # period + df = pd.DataFrame({'a': pd.period_range('2013', freq='M', periods=3)}) + self.check_error_on_write(df, ValueError) + + @pytest.mark.skipif(fv < '0.4.0', reason='new in 0.4.0') + def test_rw_nthreads(self): + + df = pd.DataFrame({'A': np.arange(100000)}) + self.check_round_trip(df, nthreads=2) + def test_write_with_index(self): df = pd.DataFrame({'A': [1, 2, 3]}) @@ -117,7 +131,12 @@ def test_write_with_index(self): df.columns = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)]), self.check_error_on_write(df, ValueError) + def test_path_pathlib(self): + df = tm.makeDataFrame().reset_index() + result = tm.round_trip_pathlib(df.to_feather, pd.read_feather) + tm.assert_frame_equal(df, result) -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + def test_path_localpath(self): + df = tm.makeDataFrame().reset_index() + result = tm.round_trip_localpath(df.to_feather, pd.read_feather) + tm.assert_frame_equal(df, result) diff --git a/pandas/tests/io/test_gbq.py b/pandas/tests/io/test_gbq.py new file mode 100644 index 0000000000000..8f20fb2e75c8a --- /dev/null +++ b/pandas/tests/io/test_gbq.py @@ -0,0 +1,136 @@ +import pytest +from datetime import datetime +import pytz +import platform +from time import sleep +import os + +import numpy as np +import pandas as pd +from pandas import compat, DataFrame + +from pandas.compat import range + +pandas_gbq = pytest.importorskip('pandas_gbq') + +PROJECT_ID = None +PRIVATE_KEY_JSON_PATH = None +PRIVATE_KEY_JSON_CONTENTS = None 
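
Both of the new test modules above lean on the same guard: `pytest.importorskip` at module level replaces the old try/except plus `nose.SkipTest` dance. A minimal sketch, assuming the optional dependency may be absent:

    import pytest

    # skips the entire module at collection time when the optional
    # dependency is missing, instead of raising ImportError
    pandas_gbq = pytest.importorskip('pandas_gbq')

    def test_dependency_was_imported():
        # by the time any test in this module runs, the import succeeded
        assert pandas_gbq is not None

The return value is the imported module, so it can be bound and used exactly like a normal import, which is how `feather` and `pandas_gbq` are handled here.
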
+ +if compat.PY3: + DATASET_ID = 'pydata_pandas_bq_testing_py3' +else: + DATASET_ID = 'pydata_pandas_bq_testing_py2' + +TABLE_ID = 'new_test' +DESTINATION_TABLE = "{0}.{1}".format(DATASET_ID + "1", TABLE_ID) + +VERSION = platform.python_version() + + +def _skip_if_no_project_id(): + if not _get_project_id(): + pytest.skip( + "Cannot run integration tests without a project id") + + +def _skip_if_no_private_key_path(): + if not _get_private_key_path(): + pytest.skip("Cannot run integration tests without a " + "private key json file path") + + +def _in_travis_environment(): + return 'TRAVIS_BUILD_DIR' in os.environ and \ + 'GBQ_PROJECT_ID' in os.environ + + +def _get_project_id(): + if _in_travis_environment(): + return os.environ.get('GBQ_PROJECT_ID') + else: + return PROJECT_ID + + +def _get_private_key_path(): + if _in_travis_environment(): + return os.path.join(*[os.environ.get('TRAVIS_BUILD_DIR'), 'ci', + 'travis_gbq.json']) + else: + return PRIVATE_KEY_JSON_PATH + + +def clean_gbq_environment(private_key=None): + dataset = pandas_gbq.gbq._Dataset(_get_project_id(), + private_key=private_key) + + for i in range(1, 10): + if DATASET_ID + str(i) in dataset.datasets(): + dataset_id = DATASET_ID + str(i) + table = pandas_gbq.gbq._Table(_get_project_id(), dataset_id, + private_key=private_key) + for j in range(1, 20): + if TABLE_ID + str(j) in dataset.tables(dataset_id): + table.delete(TABLE_ID + str(j)) + + dataset.delete(dataset_id) + + +def make_mixed_dataframe_v2(test_size): + # create df to test for all BQ datatypes except RECORD + bools = np.random.randint(2, size=(1, test_size)).astype(bool) + flts = np.random.randn(1, test_size) + ints = np.random.randint(1, 10, size=(1, test_size)) + strs = np.random.randint(1, 10, size=(1, test_size)).astype(str) + times = [datetime.now(pytz.timezone('US/Arizona')) + for t in range(test_size)] + return DataFrame({'bools': bools[0], + 'flts': flts[0], + 'ints': ints[0], + 'strs': strs[0], + 'times': times[0]}, + index=range(test_size)) + + +@pytest.mark.xfail(reason="gbq having issues") +@pytest.mark.single +class TestToGBQIntegrationWithServiceAccountKeyPath(object): + + @classmethod + def setup_class(cls): + # - GLOBAL CLASS FIXTURES - + # put here any instruction you want to execute only *ONCE* *BEFORE* + # executing *ALL* tests described below. + + _skip_if_no_project_id() + _skip_if_no_private_key_path() + + clean_gbq_environment(_get_private_key_path()) + pandas_gbq.gbq._Dataset(_get_project_id(), + private_key=_get_private_key_path() + ).create(DATASET_ID + "1") + + @classmethod + def teardown_class(cls): + # - GLOBAL CLASS FIXTURES - + # put here any instruction you want to execute only *ONCE* *AFTER* + # executing all tests. + + clean_gbq_environment(_get_private_key_path()) + + def test_roundtrip(self): + destination_table = DESTINATION_TABLE + "1" + + test_size = 20001 + df = make_mixed_dataframe_v2(test_size) + + df.to_gbq(destination_table, _get_project_id(), chunksize=10000, + private_key=_get_private_key_path()) + + sleep(30) # <- Curses Google!!! 
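
The credential gating used by this class is worth calling out: the `_skip_if_no_*` helpers call `pytest.skip` from `setup_class`, which skips every test in the class rather than failing collection. A stripped-down sketch with a hypothetical class:

    import os
    import pytest

    class TestNeedsCredentials(object):

        @classmethod
        def setup_class(cls):
            # pytest.skip inside setup_class marks all tests in the
            # class as skipped when the environment is not configured
            if 'GBQ_PROJECT_ID' not in os.environ:
                pytest.skip('Cannot run integration tests '
                            'without a project id')

        def test_query(self):
            assert True
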
+ + result = pd.read_gbq("SELECT COUNT(*) AS num_rows FROM {0}" + .format(destination_table), + project_id=_get_project_id(), + private_key=_get_private_key_path()) + assert result['num_rows'][0] == test_size diff --git a/pandas/io/tests/test_html.py b/pandas/tests/io/test_html.py similarity index 84% rename from pandas/io/tests/test_html.py rename to pandas/tests/io/test_html.py index 9ac8def3a074d..6fc080c8d9090 100644 --- a/pandas/io/tests/test_html.py +++ b/pandas/tests/io/test_html.py @@ -3,16 +3,20 @@ import glob import os import re +import threading import warnings + +# imports needed for Python 3.x but will fail under Python 2.x try: - from importlib import import_module + from importlib import import_module, reload except ImportError: import_module = __import__ + from distutils.version import LooseVersion -import nose +import pytest import numpy as np from numpy.random import rand @@ -20,10 +24,11 @@ from pandas import (DataFrame, MultiIndex, read_csv, Timestamp, Index, date_range, Series) from pandas.compat import (map, zip, StringIO, string_types, BytesIO, - is_platform_windows) + is_platform_windows, PY3) from pandas.io.common import URLError, urlopen, file_path_to_url +import pandas.io.html from pandas.io.html import read_html -from pandas.parser import ParserError +from pandas._libs.parsers import ParserError import pandas.util.testing as tm from pandas.util.testing import makeCustomDataframe as mkdf, network @@ -39,7 +44,7 @@ def _have_module(module_name): def _skip_if_no(module_name): if not _have_module(module_name): - raise nose.SkipTest("{0!r} not found".format(module_name)) + pytest.skip("{0!r} not found".format(module_name)) def _skip_if_none_of(module_names): @@ -48,16 +53,16 @@ def _skip_if_none_of(module_names): if module_names == 'bs4': import bs4 if bs4.__version__ == LooseVersion('4.2.0'): - raise nose.SkipTest("Bad version of bs4: 4.2.0") + pytest.skip("Bad version of bs4: 4.2.0") else: not_found = [module_name for module_name in module_names if not _have_module(module_name)] if set(not_found) & set(module_names): - raise nose.SkipTest("{0!r} not found".format(not_found)) + pytest.skip("{0!r} not found".format(not_found)) if 'bs4' in module_names: import bs4 if bs4.__version__ == LooseVersion('4.2.0'): - raise nose.SkipTest("Bad version of bs4: 4.2.0") + pytest.skip("Bad version of bs4: 4.2.0") DATA_PATH = tm.get_data_path() @@ -93,14 +98,16 @@ def read_html(self, *args, **kwargs): return read_html(*args, **kwargs) -class TestReadHtml(tm.TestCase, ReadHtmlMixin): +class TestReadHtml(ReadHtmlMixin): flavor = 'bs4' spam_data = os.path.join(DATA_PATH, 'spam.html') + spam_data_kwargs = {} + if PY3: + spam_data_kwargs['encoding'] = 'UTF-8' banklist_data = os.path.join(DATA_PATH, 'banklist.html') @classmethod - def setUpClass(cls): - super(TestReadHtml, cls).setUpClass() + def setup_class(cls): _skip_if_none_of(('bs4', 'html5lib')) def test_to_html_compat(self): @@ -128,7 +135,7 @@ def test_spam_url(self): assert_framelist_equal(df1, df2) - @tm.slow + @pytest.mark.slow def test_banklist(self): df1 = self.read_html(self.banklist_data, '.*Florida.*', attrs={'id': 'table'}) @@ -144,31 +151,31 @@ def test_spam_no_types(self): df2 = self.read_html(self.spam_data, 'Unit') assert_framelist_equal(df1, df2) - self.assertEqual(df1[0].iloc[0, 0], 'Proximates') - self.assertEqual(df1[0].columns[0], 'Nutrient') + assert df1[0].iloc[0, 0] == 'Proximates' + assert df1[0].columns[0] == 'Nutrient' def test_spam_with_types(self): df1 = self.read_html(self.spam_data, '.*Water.*') df2 = 
self.read_html(self.spam_data, 'Unit') assert_framelist_equal(df1, df2) - self.assertEqual(df1[0].iloc[0, 0], 'Proximates') - self.assertEqual(df1[0].columns[0], 'Nutrient') + assert df1[0].iloc[0, 0] == 'Proximates' + assert df1[0].columns[0] == 'Nutrient' def test_spam_no_match(self): dfs = self.read_html(self.spam_data) for df in dfs: - tm.assertIsInstance(df, DataFrame) + assert isinstance(df, DataFrame) def test_banklist_no_match(self): dfs = self.read_html(self.banklist_data, attrs={'id': 'table'}) for df in dfs: - tm.assertIsInstance(df, DataFrame) + assert isinstance(df, DataFrame) def test_spam_header(self): df = self.read_html(self.spam_data, '.*Water.*', header=1)[0] - self.assertEqual(df.columns[0], 'Proximates') - self.assertFalse(df.empty) + assert df.columns[0] == 'Proximates' + assert not df.empty def test_skiprows_int(self): df1 = self.read_html(self.spam_data, '.*Water.*', skiprows=1) @@ -219,8 +226,8 @@ def test_skiprows_ndarray(self): assert_framelist_equal(df1, df2) def test_skiprows_invalid(self): - with tm.assertRaisesRegexp(TypeError, - 'is not a valid type for skipping rows'): + with tm.assert_raises_regex(TypeError, 'is not a valid type ' + 'for skipping rows'): self.read_html(self.spam_data, '.*Water.*', skiprows='asdf') def test_index(self): @@ -248,10 +255,10 @@ def test_infer_types(self): assert_framelist_equal(df1, df2) def test_string_io(self): - with open(self.spam_data) as f: + with open(self.spam_data, **self.spam_data_kwargs) as f: data1 = StringIO(f.read()) - with open(self.spam_data) as f: + with open(self.spam_data, **self.spam_data_kwargs) as f: data2 = StringIO(f.read()) df1 = self.read_html(data1, '.*Water.*') @@ -259,7 +266,7 @@ def test_string_io(self): assert_framelist_equal(df1, df2) def test_string(self): - with open(self.spam_data) as f: + with open(self.spam_data, **self.spam_data_kwargs) as f: data = f.read() df1 = self.read_html(data, '.*Water.*') @@ -268,41 +275,41 @@ def test_string(self): assert_framelist_equal(df1, df2) def test_file_like(self): - with open(self.spam_data) as f: + with open(self.spam_data, **self.spam_data_kwargs) as f: df1 = self.read_html(f, '.*Water.*') - with open(self.spam_data) as f: + with open(self.spam_data, **self.spam_data_kwargs) as f: df2 = self.read_html(f, 'Unit') assert_framelist_equal(df1, df2) @network def test_bad_url_protocol(self): - with tm.assertRaises(URLError): + with pytest.raises(URLError): self.read_html('git://github.com', match='.*Water.*') @network def test_invalid_url(self): try: - with tm.assertRaises(URLError): + with pytest.raises(URLError): self.read_html('http://www.a23950sdfa908sd.com', match='.*Water.*') except ValueError as e: - self.assertEqual(str(e), 'No tables found') + assert str(e) == 'No tables found' - @tm.slow + @pytest.mark.slow def test_file_url(self): url = self.banklist_data dfs = self.read_html(file_path_to_url(url), 'First', attrs={'id': 'table'}) - tm.assertIsInstance(dfs, list) + assert isinstance(dfs, list) for df in dfs: - tm.assertIsInstance(df, DataFrame) + assert isinstance(df, DataFrame) - @tm.slow + @pytest.mark.slow def test_invalid_table_attrs(self): url = self.banklist_data - with tm.assertRaisesRegexp(ValueError, 'No tables found'): + with tm.assert_raises_regex(ValueError, 'No tables found'): self.read_html(url, 'First Federal Bank of Florida', attrs={'id': 'tasdfable'}) @@ -310,67 +317,67 @@ def _bank_data(self, *args, **kwargs): return self.read_html(self.banklist_data, 'Metcalf', attrs={'id': 'table'}, *args, **kwargs) - @tm.slow + 
@pytest.mark.slow def test_multiindex_header(self): df = self._bank_data(header=[0, 1])[0] - tm.assertIsInstance(df.columns, MultiIndex) + assert isinstance(df.columns, MultiIndex) - @tm.slow + @pytest.mark.slow def test_multiindex_index(self): df = self._bank_data(index_col=[0, 1])[0] - tm.assertIsInstance(df.index, MultiIndex) + assert isinstance(df.index, MultiIndex) - @tm.slow + @pytest.mark.slow def test_multiindex_header_index(self): df = self._bank_data(header=[0, 1], index_col=[0, 1])[0] - tm.assertIsInstance(df.columns, MultiIndex) - tm.assertIsInstance(df.index, MultiIndex) + assert isinstance(df.columns, MultiIndex) + assert isinstance(df.index, MultiIndex) - @tm.slow + @pytest.mark.slow def test_multiindex_header_skiprows_tuples(self): df = self._bank_data(header=[0, 1], skiprows=1, tupleize_cols=True)[0] - tm.assertIsInstance(df.columns, Index) + assert isinstance(df.columns, Index) - @tm.slow + @pytest.mark.slow def test_multiindex_header_skiprows(self): df = self._bank_data(header=[0, 1], skiprows=1)[0] - tm.assertIsInstance(df.columns, MultiIndex) + assert isinstance(df.columns, MultiIndex) - @tm.slow + @pytest.mark.slow def test_multiindex_header_index_skiprows(self): df = self._bank_data(header=[0, 1], index_col=[0, 1], skiprows=1)[0] - tm.assertIsInstance(df.index, MultiIndex) - tm.assertIsInstance(df.columns, MultiIndex) + assert isinstance(df.index, MultiIndex) + assert isinstance(df.columns, MultiIndex) - @tm.slow + @pytest.mark.slow def test_regex_idempotency(self): url = self.banklist_data dfs = self.read_html(file_path_to_url(url), match=re.compile(re.compile('Florida')), attrs={'id': 'table'}) - tm.assertIsInstance(dfs, list) + assert isinstance(dfs, list) for df in dfs: - tm.assertIsInstance(df, DataFrame) + assert isinstance(df, DataFrame) def test_negative_skiprows(self): - with tm.assertRaisesRegexp(ValueError, - r'\(you passed a negative value\)'): + with tm.assert_raises_regex(ValueError, + r'\(you passed a negative value\)'): self.read_html(self.spam_data, 'Water', skiprows=-1) @network def test_multiple_matches(self): url = 'https://docs.python.org/2/' dfs = self.read_html(url, match='Python') - self.assertTrue(len(dfs) > 1) + assert len(dfs) > 1 @network def test_python_docs_table(self): url = 'https://docs.python.org/2/' dfs = self.read_html(url, match='Python') zz = [df.iloc[0, 0][0:4] for df in dfs] - self.assertEqual(sorted(zz), sorted(['Repo', 'What'])) + assert sorted(zz) == sorted(['Repo', 'What']) - @tm.slow + @pytest.mark.slow def test_thousands_macau_stats(self): all_non_nan_table_index = -2 macau_data = os.path.join(DATA_PATH, 'macau.html') @@ -378,16 +385,16 @@ def test_thousands_macau_stats(self): attrs={'class': 'style1'}) df = dfs[all_non_nan_table_index] - self.assertFalse(any(s.isnull().any() for _, s in df.iteritems())) + assert not any(s.isna().any() for _, s in df.iteritems()) - @tm.slow + @pytest.mark.slow def test_thousands_macau_index_col(self): all_non_nan_table_index = -2 macau_data = os.path.join(DATA_PATH, 'macau.html') dfs = self.read_html(macau_data, index_col=0, header=0) df = dfs[all_non_nan_table_index] - self.assertFalse(any(s.isnull().any() for _, s in df.iteritems())) + assert not any(s.isna().any() for _, s in df.iteritems()) def test_empty_tables(self): """ @@ -518,10 +525,10 @@ def test_nyse_wsj_commas_table(self): columns = Index(['Issue(Roll over for charts and headlines)', 'Volume', 'Price', 'Chg', '% Chg']) nrows = 100 - self.assertEqual(df.shape[0], nrows) - self.assert_index_equal(df.columns, columns) + assert 
df.shape[0] == nrows + tm.assert_index_equal(df.columns, columns) - @tm.slow + @pytest.mark.slow def test_banklist_header(self): from pandas.io.html import _remove_whitespace @@ -536,7 +543,7 @@ def try_remove_ws(x): ground_truth = read_csv(os.path.join(DATA_PATH, 'banklist.csv'), converters={'Updated Date': Timestamp, 'Closing Date': Timestamp}) - self.assertEqual(df.shape, ground_truth.shape) + assert df.shape == ground_truth.shape old = ['First Vietnamese American BankIn Vietnamese', 'Westernbank Puerto RicoEn Espanol', 'R-G Premier Bank of Puerto RicoEn Espanol', @@ -560,16 +567,16 @@ def try_remove_ws(x): coerce=True) tm.assert_frame_equal(converted, gtnew) - @tm.slow + @pytest.mark.slow def test_gold_canyon(self): gc = 'Gold Canyon' with open(self.banklist_data, 'r') as f: raw_text = f.read() - self.assertIn(gc, raw_text) + assert gc in raw_text df = self.read_html(self.banklist_data, 'Gold Canyon', attrs={'id': 'table'})[0] - self.assertIn(gc, df.to_string()) + assert gc in df.to_string() def test_different_number_of_rows(self): expected = """ @@ -652,9 +659,10 @@ def test_parse_dates_combine(self): def test_computer_sales_page(self): data = os.path.join(DATA_PATH, 'computer_sales_page.html') - with tm.assertRaisesRegexp(ParserError, r"Passed header=\[0,1\] are " - "too many rows for this multi_index " - "of columns"): + with tm.assert_raises_regex(ParserError, + r"Passed header=\[0,1\] are " + r"too many rows for this " + r"multi_index of columns"): self.read_html(data, header=[0, 1]) def test_wikipedia_states_table(self): @@ -662,7 +670,7 @@ def test_wikipedia_states_table(self): assert os.path.isfile(data), '%r is not a file' % data assert os.path.getsize(data), '%r is an empty file' % data result = self.read_html(data, 'Arizona', header=1)[0] - self.assertEqual(result['sq mi'].dtype, np.dtype('float64')) + assert result['sq mi'].dtype == np.dtype('float64') def test_decimal_rows(self): @@ -685,13 +693,13 @@ def test_decimal_rows(self): ''') expected = DataFrame(data={'Header': 1100.101}, index=[0]) result = self.read_html(data, decimal='#')[0] - nose.tools.assert_equal(result['Header'].dtype, np.dtype('float64')) + assert result['Header'].dtype == np.dtype('float64') tm.assert_frame_equal(result, expected) def test_bool_header_arg(self): # GH 6114 for arg in [True, False]: - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): read_html(self.spam_data, header=arg) def test_converters(self): @@ -760,18 +768,29 @@ def test_keep_default_na(self): html_df = read_html(html_data, keep_default_na=True)[0] tm.assert_frame_equal(expected_df, html_df) + def test_multiple_header_rows(self): + # Issue #13434 + expected_df = DataFrame(data=[("Hillary", 68, "D"), + ("Bernie", 74, "D"), + ("Donald", 69, "R")]) + expected_df.columns = [["Unnamed: 0_level_0", "Age", "Party"], + ["Name", "Unnamed: 1_level_1", + "Unnamed: 2_level_1"]] + html = expected_df.to_html(index=False) + html_df = read_html(html, )[0] + tm.assert_frame_equal(expected_df, html_df) + def _lang_enc(filename): return os.path.splitext(os.path.basename(filename))[0].split('_') -class TestReadHtmlEncoding(tm.TestCase): +class TestReadHtmlEncoding(object): files = glob.glob(os.path.join(DATA_PATH, 'html_encoding', '*.html')) flavor = 'bs4' @classmethod - def setUpClass(cls): - super(TestReadHtmlEncoding, cls).setUpClass() + def setup_class(cls): _skip_if_none_of((cls.flavor, 'html5lib')) def read_html(self, *args, **kwargs): @@ -812,17 +831,16 @@ class TestReadHtmlEncodingLxml(TestReadHtmlEncoding): flavor = 'lxml' 
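
`TestReadHtmlEncodingLxml` below shows the subclass side of the `setUpClass` to `setup_class` conversion: pytest's class-level hook is chained with an explicit `super()` call, just as unittest's was. A minimal sketch with hypothetical classes:

    class Base(object):

        @classmethod
        def setup_class(cls):
            cls.shared = []  # class-scoped state, built once per class

    class Derived(Base):

        @classmethod
        def setup_class(cls):
            super(Derived, cls).setup_class()  # chain the parent hook
            cls.shared.append('extra')
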
@classmethod - def setUpClass(cls): - super(TestReadHtmlEncodingLxml, cls).setUpClass() + def setup_class(cls): + super(TestReadHtmlEncodingLxml, cls).setup_class() _skip_if_no(cls.flavor) -class TestReadHtmlLxml(tm.TestCase, ReadHtmlMixin): +class TestReadHtmlLxml(ReadHtmlMixin): flavor = 'lxml' @classmethod - def setUpClass(cls): - super(TestReadHtmlLxml, cls).setUpClass() + def setup_class(cls): _skip_if_no('lxml') def test_data_fail(self): @@ -830,19 +848,19 @@ def test_data_fail(self): spam_data = os.path.join(DATA_PATH, 'spam.html') banklist_data = os.path.join(DATA_PATH, 'banklist.html') - with tm.assertRaises(XMLSyntaxError): + with pytest.raises(XMLSyntaxError): self.read_html(spam_data) - with tm.assertRaises(XMLSyntaxError): + with pytest.raises(XMLSyntaxError): self.read_html(banklist_data) def test_works_on_valid_markup(self): filename = os.path.join(DATA_PATH, 'valid_markup.html') dfs = self.read_html(filename, index_col=0) - tm.assertIsInstance(dfs, list) - tm.assertIsInstance(dfs[0], DataFrame) + assert isinstance(dfs, list) + assert isinstance(dfs[0], DataFrame) - @tm.slow + @pytest.mark.slow def test_fallback_success(self): _skip_if_none_of(('bs4', 'html5lib')) banklist_data = os.path.join(DATA_PATH, 'banklist.html') @@ -872,7 +890,7 @@ def test_computer_sales_page(self): def test_invalid_flavor(): url = 'google.com' - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): read_html(url, 'google', flavor='not a* valid**++ flaver') @@ -885,7 +903,7 @@ def get_elements_from_file(url, element='table'): return soup.find_all(element) -@tm.slow +@pytest.mark.slow def test_bs4_finds_tables(): filepath = os.path.join(DATA_PATH, "spam.html") with warnings.catch_warnings(): @@ -900,13 +918,13 @@ def get_lxml_elements(url, element): return doc.xpath('.//{0}'.format(element)) -@tm.slow +@pytest.mark.slow def test_lxml_finds_tables(): filepath = os.path.join(DATA_PATH, "spam.html") assert get_lxml_elements(filepath, 'table') -@tm.slow +@pytest.mark.slow def test_lxml_finds_tbody(): filepath = os.path.join(DATA_PATH, "spam.html") assert get_lxml_elements(filepath, 'tbody') @@ -919,6 +937,31 @@ def test_same_ordering(): dfs_bs4 = read_html(filename, index_col=0, flavor=['bs4']) assert_framelist_equal(dfs_lxml, dfs_bs4) -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + +class ErrorThread(threading.Thread): + def run(self): + try: + super(ErrorThread, self).run() + except Exception as e: + self.err = e + else: + self.err = None + + +@pytest.mark.slow +def test_importcheck_thread_safety(): + # see gh-16928 + + # force import check by reinitalising global vars in html.py + reload(pandas.io.html) + + filename = os.path.join(DATA_PATH, 'valid_markup.html') + helper_thread1 = ErrorThread(target=read_html, args=(filename,)) + helper_thread2 = ErrorThread(target=read_html, args=(filename,)) + + helper_thread1.start() + helper_thread2.start() + + while helper_thread1.is_alive() or helper_thread2.is_alive(): + pass + assert None is helper_thread1.err is helper_thread2.err diff --git a/pandas/io/tests/test_packers.py b/pandas/tests/io/test_packers.py similarity index 80% rename from pandas/io/tests/test_packers.py rename to pandas/tests/io/test_packers.py index 6b368bb2bb5ce..a28adcf1ee771 100644 --- a/pandas/io/tests/test_packers.py +++ b/pandas/tests/io/test_packers.py @@ -1,5 +1,6 @@ -import nose +import pytest +from warnings import catch_warnings import os import datetime import numpy as np @@ -10,7 +11,7 @@ from 
pandas.compat import u, PY3 from pandas import (Series, DataFrame, Panel, MultiIndex, bdate_range, date_range, period_range, Index, Categorical) -from pandas.core.common import PerformanceWarning +from pandas.errors import PerformanceWarning from pandas.io.packers import to_msgpack, read_msgpack import pandas.util.testing as tm from pandas.util.testing import (ensure_clean, @@ -22,7 +23,8 @@ from pandas.tests.test_panel import assert_panel_equal import pandas -from pandas import Timestamp, NaT, tslib +from pandas import Timestamp, NaT +from pandas._libs.tslib import iNaT nan = np.nan @@ -40,7 +42,21 @@ else: _ZLIB_INSTALLED = True -_multiprocess_can_split_ = False + +@pytest.fixture(scope='module') +def current_packers_data(): + # our current version packers data + from pandas.tests.io.generate_legacy_storage_files import ( + create_msgpack_data) + return create_msgpack_data() + + +@pytest.fixture(scope='module') +def all_packers_data(): + # our all of our current version packers data + from pandas.tests.io.generate_legacy_storage_files import ( + create_data) + return create_data() def check_arbitrary(a, b): @@ -74,12 +90,12 @@ def check_arbitrary(a, b): assert(a == b) -class TestPackers(tm.TestCase): +class TestPackers(object): - def setUp(self): + def setup_method(self, method): self.path = '__%s__.msg' % tm.rands(10) - def tearDown(self): + def teardown_method(self, method): pass def encode_decode(self, x, compress=None, **kwargs): @@ -118,6 +134,16 @@ def test_string_io(self): result = read_msgpack(p) tm.assert_frame_equal(result, df) + def test_path_pathlib(self): + df = tm.makeDataFrame() + result = tm.round_trip_pathlib(df.to_msgpack, read_msgpack) + tm.assert_frame_equal(df, result) + + def test_path_localpath(self): + df = tm.makeDataFrame() + result = tm.round_trip_localpath(df.to_msgpack, read_msgpack) + tm.assert_frame_equal(df, result) + def test_iterator_with_string_io(self): dfs = [DataFrame(np.random.randn(10, 2)) for i in range(5)] @@ -132,9 +158,9 @@ class A(object): def __init__(self): self.read = 0 - tm.assertRaises(ValueError, read_msgpack, path_or_buf=None) - tm.assertRaises(ValueError, read_msgpack, path_or_buf={}) - tm.assertRaises(ValueError, read_msgpack, path_or_buf=A()) + pytest.raises(ValueError, read_msgpack, path_or_buf=None) + pytest.raises(ValueError, read_msgpack, path_or_buf={}) + pytest.raises(ValueError, read_msgpack, path_or_buf=A()) class TestNumpy(TestPackers): @@ -147,7 +173,7 @@ def test_numpy_scalar_float(self): def test_numpy_scalar_complex(self): x = np.complex64(np.random.rand() + 1j * np.random.rand()) x_rec = self.encode_decode(x) - self.assertTrue(np.allclose(x, x_rec)) + assert np.allclose(x, x_rec) def test_scalar_float(self): x = np.random.rand() @@ -157,7 +183,7 @@ def test_scalar_float(self): def test_scalar_complex(self): x = np.random.rand() + 1j * np.random.rand() x_rec = self.encode_decode(x) - self.assertTrue(np.allclose(x, x_rec)) + assert np.allclose(x, x_rec) def test_list_numpy_float(self): x = [np.float32(np.random.rand()) for i in range(5)] @@ -170,13 +196,13 @@ def test_list_numpy_float(self): def test_list_numpy_float_complex(self): if not hasattr(np, 'complex128'): - raise nose.SkipTest('numpy cant handle complex128') + pytest.skip('numpy cant handle complex128') x = [np.float32(np.random.rand()) for i in range(5)] + \ [np.complex128(np.random.rand() + 1j * np.random.rand()) for i in range(5)] x_rec = self.encode_decode(x) - self.assertTrue(np.allclose(x, x_rec)) + assert np.allclose(x, x_rec) def test_list_float(self): x 
= [np.random.rand() for i in range(5)] @@ -191,7 +217,7 @@ def test_list_float_complex(self): x = [np.random.rand() for i in range(5)] + \ [(np.random.rand() + 1j * np.random.rand()) for i in range(5)] x_rec = self.encode_decode(x) - self.assertTrue(np.allclose(x, x_rec)) + assert np.allclose(x, x_rec) def test_dict_float(self): x = {'foo': 1.0, 'bar': 2.0} @@ -201,9 +227,10 @@ def test_dict_float(self): def test_dict_complex(self): x = {'foo': 1.0 + 1.0j, 'bar': 2.0 + 2.0j} x_rec = self.encode_decode(x) - self.assertEqual(x, x_rec) + tm.assert_dict_equal(x, x_rec) + for key in x: - self.assertEqual(type(x[key]), type(x_rec[key])) + tm.assert_class_equal(x[key], x_rec[key], obj="complex value") def test_dict_numpy_float(self): x = {'foo': np.float32(1.0), 'bar': np.float32(2.0)} @@ -214,9 +241,10 @@ def test_dict_numpy_complex(self): x = {'foo': np.complex128(1.0 + 1.0j), 'bar': np.complex128(2.0 + 2.0j)} x_rec = self.encode_decode(x) - self.assertEqual(x, x_rec) + tm.assert_dict_equal(x, x_rec) + for key in x: - self.assertEqual(type(x[key]), type(x_rec[key])) + tm.assert_class_equal(x[key], x_rec[key], obj="numpy complex128") def test_numpy_array_float(self): @@ -231,8 +259,8 @@ def test_numpy_array_float(self): def test_numpy_array_complex(self): x = (np.random.rand(5) + 1j * np.random.rand(5)).astype(np.complex128) x_rec = self.encode_decode(x) - self.assertTrue(all(map(lambda x, y: x == y, x, x_rec)) and - x.dtype == x_rec.dtype) + assert (all(map(lambda x, y: x == y, x, x_rec)) and + x.dtype == x_rec.dtype) def test_list_mixed(self): x = [1.0, np.float32(3.5), np.complex128(4.25), u('foo')] @@ -252,25 +280,25 @@ def test_timestamp(self): '20130101'), Timestamp('20130101', tz='US/Eastern'), Timestamp('201301010501')]: i_rec = self.encode_decode(i) - self.assertEqual(i, i_rec) + assert i == i_rec def test_nat(self): nat_rec = self.encode_decode(NaT) - self.assertIs(NaT, nat_rec) + assert NaT is nat_rec def test_datetimes(self): # fails under 2.6/win32 (np.datetime64 seems broken) if LooseVersion(sys.version) < '2.7': - raise nose.SkipTest('2.6 with np.datetime64 is broken') + pytest.skip('2.6 with np.datetime64 is broken') for i in [datetime.datetime(2013, 1, 1), datetime.datetime(2013, 1, 1, 5, 1), datetime.date(2013, 1, 1), np.datetime64(datetime.datetime(2013, 1, 5, 2, 15))]: i_rec = self.encode_decode(i) - self.assertEqual(i, i_rec) + assert i == i_rec def test_timedeltas(self): @@ -278,13 +306,13 @@ def test_timedeltas(self): datetime.timedelta(days=1, seconds=10), np.timedelta64(1000000)]: i_rec = self.encode_decode(i) - self.assertEqual(i, i_rec) + assert i == i_rec class TestIndex(TestPackers): - def setUp(self): - super(TestIndex, self).setUp() + def setup_method(self, method): + super(TestIndex, self).setup_method(method) self.d = { 'string': tm.makeStringIndex(100), @@ -297,6 +325,7 @@ def setUp(self): 'period': Index(period_range('2012-1-1', freq='M', periods=3)), 'date2': Index(date_range('2013-01-1', periods=10)), 'bdate': Index(bdate_range('2013-01-02', periods=10)), + 'cat': tm.makeCategoricalIndex(100) } self.mi = { @@ -310,36 +339,43 @@ def test_basic_index(self): for s, i in self.d.items(): i_rec = self.encode_decode(i) - self.assert_index_equal(i, i_rec) + tm.assert_index_equal(i, i_rec) # datetime with no freq (GH5506) i = Index([Timestamp('20130101'), Timestamp('20130103')]) i_rec = self.encode_decode(i) - self.assert_index_equal(i, i_rec) + tm.assert_index_equal(i, i_rec) # datetime with timezone i = Index([Timestamp('20130101 9:00:00'), Timestamp( '20130103 
11:00:00')]).tz_localize('US/Eastern') i_rec = self.encode_decode(i) - self.assert_index_equal(i, i_rec) + tm.assert_index_equal(i, i_rec) def test_multi_index(self): for s, i in self.mi.items(): i_rec = self.encode_decode(i) - self.assert_index_equal(i, i_rec) + tm.assert_index_equal(i, i_rec) def test_unicode(self): i = tm.makeUnicodeIndex(100) i_rec = self.encode_decode(i) - self.assert_index_equal(i, i_rec) + tm.assert_index_equal(i, i_rec) + + def categorical_index(self): + # GH15487 + df = DataFrame(np.random.randn(10, 2)) + df = df.astype({0: 'category'}).set_index(0) + result = self.encode_decode(df) + tm.assert_frame_equal(result, df) class TestSeries(TestPackers): - def setUp(self): - super(TestSeries, self).setUp() + def setup_method(self, method): + super(TestSeries, self).setup_method(method) self.d = {} @@ -351,7 +387,7 @@ def setUp(self): s.name = 'object' self.d['object'] = s - s = Series(tslib.iNaT, dtype='M8[ns]', index=range(5)) + s = Series(iNaT, dtype='M8[ns]', index=range(5)) self.d['date'] = s data = { @@ -386,8 +422,8 @@ def test_basic(self): class TestCategorical(TestPackers): - def setUp(self): - super(TestCategorical, self).setUp() + def setup_method(self, method): + super(TestCategorical, self).setup_method(method) self.d = {} @@ -409,8 +445,8 @@ def test_basic(self): class TestNDFrame(TestPackers): - def setUp(self): - super(TestNDFrame, self).setUp() + def setup_method(self, method): + super(TestNDFrame, self).setup_method(method) data = { 'A': [0., 1., 2., 3., np.nan], @@ -429,9 +465,10 @@ def setUp(self): 'int': DataFrame(dict(A=data['B'], B=Series(data['B']) + 1)), 'mixed': DataFrame(data)} - self.panel = { - 'float': Panel(dict(ItemA=self.frame['float'], - ItemB=self.frame['float'] + 1))} + with catch_warnings(record=True): + self.panel = { + 'float': Panel(dict(ItemA=self.frame['float'], + ItemB=self.frame['float'] + 1))} def test_basic_frame(self): @@ -441,9 +478,10 @@ def test_basic_frame(self): def test_basic_panel(self): - for s, i in self.panel.items(): - i_rec = self.encode_decode(i) - assert_panel_equal(i, i_rec) + with catch_warnings(record=True): + for s, i in self.panel.items(): + i_rec = self.encode_decode(i) + assert_panel_equal(i, i_rec) def test_multi(self): @@ -460,7 +498,7 @@ def test_multi(self): l = [self.frame['float'], self.frame['float'] .A, self.frame['float'].B, None] l_rec = self.encode_decode(l) - self.assertIsInstance(l_rec, tuple) + assert isinstance(l_rec, tuple) check_arbitrary(l, l_rec) def test_iterator(self): @@ -510,7 +548,7 @@ def _check_roundtrip(self, obj, comparator, **kwargs): # currently these are not implemetned # i_rec = self.encode_decode(obj) # comparator(obj, i_rec, **kwargs) - self.assertRaises(NotImplementedError, self.encode_decode, obj) + pytest.raises(NotImplementedError, self.encode_decode, obj) def test_sparse_series(self): @@ -551,7 +589,7 @@ class TestCompression(TestPackers): """See https://github.com/pandas-dev/pandas/pull/9783 """ - def setUp(self): + def setup_method(self, method): try: from sqlalchemy import create_engine self._create_sql_engine = create_engine @@ -560,7 +598,7 @@ def setUp(self): else: self._SQLALCHEMY_INSTALLED = True - super(TestCompression, self).setUp() + super(TestCompression, self).setup_method(method) data = { 'A': np.arange(1000, dtype=np.float64), 'B': np.arange(1000, dtype=np.int32), @@ -587,16 +625,16 @@ def _test_compression(self, compress): assert_frame_equal(value, expected) # make sure that we can write to the new frames for block in value._data.blocks: - 
self.assertTrue(block.values.flags.writeable) + assert block.values.flags.writeable def test_compression_zlib(self): if not _ZLIB_INSTALLED: - raise nose.SkipTest('no zlib') + pytest.skip('no zlib') self._test_compression('zlib') def test_compression_blosc(self): if not _BLOSC_INSTALLED: - raise nose.SkipTest('no blosc') + pytest.skip('no blosc') self._test_compression('blosc') def _test_compression_warns_when_decompress_caches(self, compress): @@ -636,31 +674,29 @@ def decompress(ob): # make sure that we can write to the new frames even though # we needed to copy the data for block in value._data.blocks: - self.assertTrue(block.values.flags.writeable) + assert block.values.flags.writeable # mutate the data in some way block.values[0] += rhs[block.dtype] for w in ws: # check the messages from our warnings - self.assertEqual( - str(w.message), - 'copying data after decompressing; this may mean that' - ' decompress is caching its result', - ) + assert str(w.message) == ('copying data after decompressing; ' + 'this may mean that decompress is ' + 'caching its result') for buf, control_buf in zip(not_garbage, control): # make sure none of our mutations above affected the # original buffers - self.assertEqual(buf, control_buf) + assert buf == control_buf def test_compression_warns_when_decompress_caches_zlib(self): if not _ZLIB_INSTALLED: - raise nose.SkipTest('no zlib') + pytest.skip('no zlib') self._test_compression_warns_when_decompress_caches('zlib') def test_compression_warns_when_decompress_caches_blosc(self): if not _BLOSC_INSTALLED: - raise nose.SkipTest('no blosc') + pytest.skip('no blosc') self._test_compression_warns_when_decompress_caches('blosc') def _test_small_strings_no_warn(self, compress): @@ -669,14 +705,14 @@ def _test_small_strings_no_warn(self, compress): empty_unpacked = self.encode_decode(empty, compress=compress) tm.assert_numpy_array_equal(empty_unpacked, empty) - self.assertTrue(empty_unpacked.flags.writeable) + assert empty_unpacked.flags.writeable char = np.array([ord(b'a')], dtype='uint8') with tm.assert_produces_warning(None): char_unpacked = self.encode_decode(char, compress=compress) tm.assert_numpy_array_equal(char_unpacked, char) - self.assertTrue(char_unpacked.flags.writeable) + assert char_unpacked.flags.writeable # if this test fails I am sorry because the interpreter is now in a # bad state where b'a' points to 98 == ord(b'b'). char_unpacked[0] = ord(b'b') @@ -684,7 +720,7 @@ def _test_small_strings_no_warn(self, compress): # we compare the ord of bytes b'a' with unicode u'a' because the should # always be the same (unless we were able to mutate the shared # character singleton in which case ord(b'a') == ord(b'b'). 
- self.assertEqual(ord(b'a'), ord(u'a')) + assert ord(b'a') == ord(u'a') tm.assert_numpy_array_equal( char_unpacked, np.array([ord(b'b')], dtype='uint8'), @@ -692,36 +728,36 @@ def _test_small_strings_no_warn(self, compress): def test_small_strings_no_warn_zlib(self): if not _ZLIB_INSTALLED: - raise nose.SkipTest('no zlib') + pytest.skip('no zlib') self._test_small_strings_no_warn('zlib') def test_small_strings_no_warn_blosc(self): if not _BLOSC_INSTALLED: - raise nose.SkipTest('no blosc') + pytest.skip('no blosc') self._test_small_strings_no_warn('blosc') def test_readonly_axis_blosc(self): # GH11880 if not _BLOSC_INSTALLED: - raise nose.SkipTest('no blosc') + pytest.skip('no blosc') df1 = DataFrame({'A': list('abcd')}) df2 = DataFrame(df1, index=[1., 2., 3., 4.]) - self.assertTrue(1 in self.encode_decode(df1['A'], compress='blosc')) - self.assertTrue(1. in self.encode_decode(df2['A'], compress='blosc')) + assert 1 in self.encode_decode(df1['A'], compress='blosc') + assert 1. in self.encode_decode(df2['A'], compress='blosc') def test_readonly_axis_zlib(self): # GH11880 df1 = DataFrame({'A': list('abcd')}) df2 = DataFrame(df1, index=[1., 2., 3., 4.]) - self.assertTrue(1 in self.encode_decode(df1['A'], compress='zlib')) - self.assertTrue(1. in self.encode_decode(df2['A'], compress='zlib')) + assert 1 in self.encode_decode(df1['A'], compress='zlib') + assert 1. in self.encode_decode(df2['A'], compress='zlib') def test_readonly_axis_blosc_to_sql(self): # GH11880 if not _BLOSC_INSTALLED: - raise nose.SkipTest('no blosc') + pytest.skip('no blosc') if not self._SQLALCHEMY_INSTALLED: - raise nose.SkipTest('no sqlalchemy') + pytest.skip('no sqlalchemy') expected = DataFrame({'A': list('abcd')}) df = self.encode_decode(expected, compress='blosc') eng = self._create_sql_engine("sqlite:///:memory:") @@ -733,9 +769,9 @@ def test_readonly_axis_blosc_to_sql(self): def test_readonly_axis_zlib_to_sql(self): # GH11880 if not _ZLIB_INSTALLED: - raise nose.SkipTest('no zlib') + pytest.skip('no zlib') if not self._SQLALCHEMY_INSTALLED: - raise nose.SkipTest('no sqlalchemy') + pytest.skip('no sqlalchemy') expected = DataFrame({'A': list('abcd')}) df = self.encode_decode(expected, compress='zlib') eng = self._create_sql_engine("sqlite:///:memory:") @@ -747,8 +783,8 @@ def test_readonly_axis_zlib_to_sql(self): class TestEncoding(TestPackers): - def setUp(self): - super(TestEncoding, self).setUp() + def setup_method(self, method): + super(TestEncoding, self).setup_method(method) data = { 'A': [compat.u('\u2019')] * 1000, 'B': np.arange(1000, dtype=np.int32), @@ -775,12 +811,21 @@ def test_default_encoding(self): for frame in compat.itervalues(self.frame): result = frame.to_msgpack() expected = frame.to_msgpack(encoding='utf8') - self.assertEqual(result, expected) + assert result == expected result = self.encode_decode(frame) assert_frame_equal(result, frame) -class TestMsgpack(): +def legacy_packers_versions(): + # yield the packers versions + path = tm.get_data_path('legacy_msgpack') + for v in os.listdir(path): + p = os.path.join(path, v) + if os.path.isdir(p): + yield v + + +class TestMsgpack(object): """ How to add msgpack tests: @@ -790,47 +835,38 @@ class TestMsgpack(): $ python generate_legacy_storage_files.py msgpack 3. Move the created pickle to "data/legacy_msgpack/" directory. - - NOTE: TestMsgpack can't be a subclass of tm.Testcase to use test generator. 
- http://stackoverflow.com/questions/6689537/nose-test-generators-inside-class """ - def setUp(self): - from pandas.io.tests.generate_legacy_storage_files import ( - create_msgpack_data, create_data) - self.data = create_msgpack_data() - self.all_data = create_data() - self.path = u('__%s__.msgpack' % tm.rands(10)) - self.minimum_structure = {'series': ['float', 'int', 'mixed', - 'ts', 'mi', 'dup'], - 'frame': ['float', 'int', 'mixed', 'mi'], - 'panel': ['float'], - 'index': ['int', 'date', 'period'], - 'mi': ['reg2']} - - def check_min_structure(self, data): + minimum_structure = {'series': ['float', 'int', 'mixed', + 'ts', 'mi', 'dup'], + 'frame': ['float', 'int', 'mixed', 'mi'], + 'panel': ['float'], + 'index': ['int', 'date', 'period'], + 'mi': ['reg2']} + + def check_min_structure(self, data, version): for typ, v in self.minimum_structure.items(): assert typ in data, '"{0}" not found in unpacked data'.format(typ) for kind in v: msg = '"{0}" not found in data["{1}"]'.format(kind, typ) assert kind in data[typ], msg - def compare(self, vf, version): + def compare(self, current_data, all_data, vf, version): # GH12277 encoding default used to be latin-1, now utf-8 if LooseVersion(version) < '0.18.0': data = read_msgpack(vf, encoding='latin-1') else: data = read_msgpack(vf) - self.check_min_structure(data) + self.check_min_structure(data, version) for typ, dv in data.items(): - assert typ in self.all_data, ('unpacked data contains ' - 'extra key "{0}"' - .format(typ)) + assert typ in all_data, ('unpacked data contains ' + 'extra key "{0}"' + .format(typ)) for dt, result in dv.items(): - assert dt in self.all_data[typ], ('data["{0}"] contains extra ' - 'key "{1}"'.format(typ, dt)) + assert dt in current_data[typ], ('data["{0}"] contains extra ' + 'key "{1}"'.format(typ, dt)) try: - expected = self.data[typ][dt] + expected = current_data[typ][dt] except KeyError: continue @@ -863,30 +899,24 @@ def compare_frame_dt_mixed_tzs(self, result, expected, typ, version): else: tm.assert_frame_equal(result, expected) - def read_msgpacks(self, version): + @pytest.mark.parametrize('version', legacy_packers_versions()) + def test_msgpacks_legacy(self, current_packers_data, all_packers_data, + version): - pth = tm.get_data_path('legacy_msgpack/{0}'.format(str(version))) + pth = tm.get_data_path('legacy_msgpack/{0}'.format(version)) n = 0 for f in os.listdir(pth): # GH12142 0.17 files packed in P2 can't be read in P3 if (compat.PY3 and version.startswith('0.17.') and - f.split('.')[-4][-1] == '2'): + f.split('.')[-4][-1] == '2'): continue vf = os.path.join(pth, f) try: - self.compare(vf, version) + with catch_warnings(record=True): + self.compare(current_packers_data, all_packers_data, + vf, version) except ImportError: # blosc not installed continue n += 1 assert n > 0, 'Msgpack files are not tested' - - def test_msgpack(self): - msgpack_path = tm.get_data_path('legacy_msgpack') - n = 0 - for v in os.listdir(msgpack_path): - pth = os.path.join(msgpack_path, v) - if os.path.isdir(pth): - yield self.read_msgpacks, v - n += 1 - assert n > 0, 'Msgpack files are not tested' diff --git a/pandas/tests/io/test_parquet.py b/pandas/tests/io/test_parquet.py new file mode 100644 index 0000000000000..78c72e2a05566 --- /dev/null +++ b/pandas/tests/io/test_parquet.py @@ -0,0 +1,378 @@ +""" test parquet compat """ + +import pytest +import datetime +from warnings import catch_warnings + +import numpy as np +import pandas as pd +from pandas.compat import PY3, is_platform_windows +from pandas.io.parquet import (to_parquet, 
read_parquet, get_engine, + PyArrowImpl, FastParquetImpl) +from pandas.util import testing as tm + +try: + import pyarrow # noqa + _HAVE_PYARROW = True +except ImportError: + _HAVE_PYARROW = False + +try: + import fastparquet # noqa + _HAVE_FASTPARQUET = True +except ImportError: + _HAVE_FASTPARQUET = False + + +# setup engines & skips +@pytest.fixture(params=[ + pytest.param('fastparquet', + marks=pytest.mark.skipif(not _HAVE_FASTPARQUET, + reason='fastparquet is ' + 'not installed')), + pytest.param('pyarrow', + marks=pytest.mark.skipif(not _HAVE_PYARROW, + reason='pyarrow is ' + 'not installed'))]) +def engine(request): + return request.param + + +@pytest.fixture +def pa(): + if not _HAVE_PYARROW: + pytest.skip("pyarrow is not installed") + if is_platform_windows(): + pytest.skip("pyarrow-parquet not building on windows") + return 'pyarrow' + + +@pytest.fixture +def fp(): + if not _HAVE_FASTPARQUET: + pytest.skip("fastparquet is not installed") + return 'fastparquet' + + +@pytest.fixture +def df_compat(): + return pd.DataFrame({'A': [1, 2, 3], 'B': 'foo'}) + + +@pytest.fixture +def df_cross_compat(): + df = pd.DataFrame({'a': list('abc'), + 'b': list(range(1, 4)), + 'c': np.arange(3, 6).astype('u1'), + 'd': np.arange(4.0, 7.0, dtype='float64'), + 'e': [True, False, True], + 'f': pd.date_range('20130101', periods=3), + 'g': pd.date_range('20130101', periods=3, + tz='US/Eastern'), + 'h': pd.date_range('20130101', periods=3, freq='ns')}) + return df + + +def test_invalid_engine(df_compat): + + with pytest.raises(ValueError): + df_compat.to_parquet('foo', 'bar') + + +def test_options_py(df_compat, pa): + # use the set option + + df = df_compat + with tm.ensure_clean() as path: + + with pd.option_context('io.parquet.engine', 'pyarrow'): + df.to_parquet(path) + + result = read_parquet(path, compression=None) + tm.assert_frame_equal(result, df) + + +def test_options_fp(df_compat, fp): + # use the set option + + df = df_compat + with tm.ensure_clean() as path: + + with pd.option_context('io.parquet.engine', 'fastparquet'): + df.to_parquet(path, compression=None) + + result = read_parquet(path, compression=None) + tm.assert_frame_equal(result, df) + + +def test_options_auto(df_compat, fp, pa): + + df = df_compat + with tm.ensure_clean() as path: + + with pd.option_context('io.parquet.engine', 'auto'): + df.to_parquet(path) + + result = read_parquet(path, compression=None) + tm.assert_frame_equal(result, df) + + +def test_options_get_engine(fp, pa): + assert isinstance(get_engine('pyarrow'), PyArrowImpl) + assert isinstance(get_engine('fastparquet'), FastParquetImpl) + + with pd.option_context('io.parquet.engine', 'pyarrow'): + assert isinstance(get_engine('auto'), PyArrowImpl) + assert isinstance(get_engine('pyarrow'), PyArrowImpl) + assert isinstance(get_engine('fastparquet'), FastParquetImpl) + + with pd.option_context('io.parquet.engine', 'fastparquet'): + assert isinstance(get_engine('auto'), FastParquetImpl) + assert isinstance(get_engine('pyarrow'), PyArrowImpl) + assert isinstance(get_engine('fastparquet'), FastParquetImpl) + + with pd.option_context('io.parquet.engine', 'auto'): + assert isinstance(get_engine('auto'), PyArrowImpl) + assert isinstance(get_engine('pyarrow'), PyArrowImpl) + assert isinstance(get_engine('fastparquet'), FastParquetImpl) + + +@pytest.mark.xfail(reason="fp does not ignore pa index __index_level_0__") +def test_cross_engine_pa_fp(df_cross_compat, pa, fp): + # cross-compat with differing reading/writing engines + + df = df_cross_compat + with tm.ensure_clean() as 
path: + df.to_parquet(path, engine=pa, compression=None) + + result = read_parquet(path, engine=fp, compression=None) + tm.assert_frame_equal(result, df) + + +@pytest.mark.xfail(reason="pyarrow reading fp in some cases") +def test_cross_engine_fp_pa(df_cross_compat, pa, fp): + # cross-compat with differing reading/writing engines + + df = df_cross_compat + with tm.ensure_clean() as path: + df.to_parquet(path, engine=fp, compression=None) + + result = read_parquet(path, engine=pa, compression=None) + tm.assert_frame_equal(result, df) + + +class Base(object): + + def check_error_on_write(self, df, engine, exc): + # check that we are raising the exception + # on writing + + with pytest.raises(exc): + with tm.ensure_clean() as path: + to_parquet(df, path, engine, compression=None) + + def check_round_trip(self, df, engine, expected=None, **kwargs): + + with tm.ensure_clean() as path: + df.to_parquet(path, engine, **kwargs) + result = read_parquet(path, engine) + + if expected is None: + expected = df + tm.assert_frame_equal(result, expected) + + # repeat + to_parquet(df, path, engine, **kwargs) + result = pd.read_parquet(path, engine) + + if expected is None: + expected = df + tm.assert_frame_equal(result, expected) + + +class TestBasic(Base): + + def test_error(self, engine): + + for obj in [pd.Series([1, 2, 3]), 1, 'foo', pd.Timestamp('20130101'), + np.array([1, 2, 3])]: + self.check_error_on_write(obj, engine, ValueError) + + def test_columns_dtypes(self, engine): + + df = pd.DataFrame({'string': list('abc'), + 'int': list(range(1, 4))}) + + # unicode + df.columns = [u'foo', u'bar'] + self.check_round_trip(df, engine, compression=None) + + def test_columns_dtypes_invalid(self, engine): + + df = pd.DataFrame({'string': list('abc'), + 'int': list(range(1, 4))}) + + # numeric + df.columns = [0, 1] + self.check_error_on_write(df, engine, ValueError) + + if PY3: + # bytes on PY3, on PY2 these are str + df.columns = [b'foo', b'bar'] + self.check_error_on_write(df, engine, ValueError) + + # python object + df.columns = [datetime.datetime(2011, 1, 1, 0, 0), + datetime.datetime(2011, 1, 1, 1, 1)] + self.check_error_on_write(df, engine, ValueError) + + def test_write_with_index(self, engine): + + df = pd.DataFrame({'A': [1, 2, 3]}) + self.check_round_trip(df, engine, compression=None) + + # non-default index + for index in [[2, 3, 4], + pd.date_range('20130101', periods=3), + list('abc'), + [1, 3, 4], + pd.MultiIndex.from_tuples([('a', 1), ('a', 2), + ('b', 1)]), + ]: + + df.index = index + self.check_error_on_write(df, engine, ValueError) + + # index with meta-data + df.index = [0, 1, 2] + df.index.name = 'foo' + self.check_error_on_write(df, engine, ValueError) + + # column multi-index + df.index = [0, 1, 2] + df.columns = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)]), + self.check_error_on_write(df, engine, ValueError) + + @pytest.mark.parametrize('compression', [None, 'gzip', 'snappy', 'brotli']) + def test_compression(self, engine, compression): + + if compression == 'snappy': + pytest.importorskip('snappy') + + elif compression == 'brotli': + pytest.importorskip('brotli') + + df = pd.DataFrame({'A': [1, 2, 3]}) + self.check_round_trip(df, engine, compression=compression) + + +class TestParquetPyArrow(Base): + + def test_basic(self, pa): + + df = pd.DataFrame({'string': list('abc'), + 'string_with_nan': ['a', np.nan, 'c'], + 'string_with_none': ['a', None, 'c'], + 'bytes': [b'foo', b'bar', b'baz'], + 'unicode': [u'foo', u'bar', u'baz'], + 'int': list(range(1, 4)), + 'uint': 
np.arange(3, 6).astype('u1'), + 'float': np.arange(4.0, 7.0, dtype='float64'), + 'float_with_nan': [2., np.nan, 3.], + 'bool': [True, False, True], + 'bool_with_none': [True, None, True], + 'datetime_ns': pd.date_range('20130101', periods=3), + 'datetime_with_nat': [pd.Timestamp('20130101'), + pd.NaT, + pd.Timestamp('20130103')] + }) + + self.check_round_trip(df, pa) + + def test_duplicate_columns(self, pa): + + # not currently able to handle duplicate columns + df = pd.DataFrame(np.arange(12).reshape(4, 3), + columns=list('aaa')).copy() + self.check_error_on_write(df, pa, ValueError) + + def test_unsupported(self, pa): + + # period + df = pd.DataFrame({'a': pd.period_range('2013', freq='M', periods=3)}) + self.check_error_on_write(df, pa, ValueError) + + # categorical + df = pd.DataFrame({'a': pd.Categorical(list('abc'))}) + self.check_error_on_write(df, pa, NotImplementedError) + + # timedelta + df = pd.DataFrame({'a': pd.timedelta_range('1 day', + periods=3)}) + self.check_error_on_write(df, pa, NotImplementedError) + + # mixed python objects + df = pd.DataFrame({'a': ['a', 1, 2.0]}) + self.check_error_on_write(df, pa, ValueError) + + +class TestParquetFastParquet(Base): + + def test_basic(self, fp): + + df = pd.DataFrame( + {'string': list('abc'), + 'string_with_nan': ['a', np.nan, 'c'], + 'string_with_none': ['a', None, 'c'], + 'bytes': [b'foo', b'bar', b'baz'], + 'unicode': [u'foo', u'bar', u'baz'], + 'int': list(range(1, 4)), + 'uint': np.arange(3, 6).astype('u1'), + 'float': np.arange(4.0, 7.0, dtype='float64'), + 'float_with_nan': [2., np.nan, 3.], + 'bool': [True, False, True], + 'datetime': pd.date_range('20130101', periods=3), + 'datetime_with_nat': [pd.Timestamp('20130101'), + pd.NaT, + pd.Timestamp('20130103')], + 'timedelta': pd.timedelta_range('1 day', periods=3), + }) + + self.check_round_trip(df, fp, compression=None) + + @pytest.mark.skip(reason="not supported") + def test_duplicate_columns(self, fp): + + # not currently able to handle duplicate columns + df = pd.DataFrame(np.arange(12).reshape(4, 3), + columns=list('aaa')).copy() + self.check_error_on_write(df, fp, ValueError) + + def test_bool_with_none(self, fp): + df = pd.DataFrame({'a': [True, None, False]}) + expected = pd.DataFrame({'a': [1.0, np.nan, 0.0]}, dtype='float16') + self.check_round_trip(df, fp, expected=expected, compression=None) + + def test_unsupported(self, fp): + + # period + df = pd.DataFrame({'a': pd.period_range('2013', freq='M', periods=3)}) + self.check_error_on_write(df, fp, ValueError) + + # mixed + df = pd.DataFrame({'a': ['a', 1, 2.0]}) + self.check_error_on_write(df, fp, ValueError) + + def test_categorical(self, fp): + df = pd.DataFrame({'a': pd.Categorical(list('abc'))}) + self.check_round_trip(df, fp, compression=None) + + def test_datetime_tz(self, fp): + # doesn't preserve tz + df = pd.DataFrame({'a': pd.date_range('20130101', periods=3, + tz='US/Eastern')}) + + # warns on the coercion + with catch_warnings(record=True): + self.check_round_trip(df, fp, df.astype('datetime64[ns]'), + compression=None) diff --git a/pandas/tests/io/test_pickle.py b/pandas/tests/io/test_pickle.py new file mode 100644 index 0000000000000..d56b36779efe7 --- /dev/null +++ b/pandas/tests/io/test_pickle.py @@ -0,0 +1,539 @@ +# pylint: disable=E1101,E1103,W0232 + +""" +manage legacy pickle tests + +How to add pickle tests: + +1. Install pandas version intended to output the pickle. + +2. Execute "generate_legacy_storage_files.py" to create the pickle. +$ python generate_legacy_storage_files.py pickle + +3. 
Move the created pickle to "data/legacy_pickle/" directory. +""" + +import pytest +from warnings import catch_warnings + +import os +from distutils.version import LooseVersion +import pandas as pd +from pandas import Index +from pandas.compat import is_platform_little_endian +import pandas +import pandas.util.testing as tm +from pandas.tseries.offsets import Day, MonthEnd +import shutil +import sys + + +@pytest.fixture(scope='module') +def current_pickle_data(): + # our current version pickle data + from pandas.tests.io.generate_legacy_storage_files import ( + create_pickle_data) + return create_pickle_data() + + +# --------------------- +# comparison functions +# --------------------- +def compare_element(result, expected, typ, version=None): + if isinstance(expected, Index): + tm.assert_index_equal(expected, result) + return + + if typ.startswith('sp_'): + comparator = getattr(tm, "assert_%s_equal" % typ) + comparator(result, expected, exact_indices=False) + elif typ == 'timestamp': + if expected is pd.NaT: + assert result is pd.NaT + else: + assert result == expected + assert result.freq == expected.freq + else: + comparator = getattr(tm, "assert_%s_equal" % + typ, tm.assert_almost_equal) + comparator(result, expected) + + +def compare(data, vf, version): + + # py3 compat when reading py2 pickle + try: + data = pandas.read_pickle(vf) + except (ValueError) as e: + if 'unsupported pickle protocol:' in str(e): + # trying to read a py3 pickle in py2 + return + else: + raise + + m = globals() + for typ, dv in data.items(): + for dt, result in dv.items(): + try: + expected = data[typ][dt] + except (KeyError): + if version in ('0.10.1', '0.11.0') and dt == 'reg': + break + else: + raise + + # use a specific comparator + # if available + comparator = "compare_{typ}_{dt}".format(typ=typ, dt=dt) + + comparator = m.get(comparator, m['compare_element']) + comparator(result, expected, typ, version) + return data + + +def compare_sp_series_ts(res, exp, typ, version): + # SparseTimeSeries integrated into SparseSeries in 0.12.0 + # and deprecated in 0.17.0 + if version and LooseVersion(version) <= "0.12.0": + tm.assert_sp_series_equal(res, exp, check_series_type=False) + else: + tm.assert_sp_series_equal(res, exp) + + +def compare_series_ts(result, expected, typ, version): + # GH 7748 + tm.assert_series_equal(result, expected) + assert result.index.freq == expected.index.freq + assert not result.index.freq.normalize + tm.assert_series_equal(result > 0, expected > 0) + + # GH 9291 + freq = result.index.freq + assert freq + Day(1) == Day(2) + + res = freq + pandas.Timedelta(hours=1) + assert isinstance(res, pandas.Timedelta) + assert res == pandas.Timedelta(days=1, hours=1) + + res = freq + pandas.Timedelta(nanoseconds=1) + assert isinstance(res, pandas.Timedelta) + assert res == pandas.Timedelta(days=1, nanoseconds=1) + + +def compare_series_dt_tz(result, expected, typ, version): + # 8260 + # dtype is object < 0.17.0 + if LooseVersion(version) < '0.17.0': + expected = expected.astype(object) + tm.assert_series_equal(result, expected) + else: + tm.assert_series_equal(result, expected) + + +def compare_series_cat(result, expected, typ, version): + # Categorical dtype is added in 0.15.0 + # ordered is changed in 0.16.0 + if LooseVersion(version) < '0.15.0': + tm.assert_series_equal(result, expected, check_dtype=False, + check_categorical=False) + elif LooseVersion(version) < '0.16.0': + tm.assert_series_equal(result, expected, check_categorical=False) + else: + tm.assert_series_equal(result, expected) + + 
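+# NOTE (editorial sketch, not part of the original patch): the
+# ``compare`` function above dispatches comparators by name.  For an
+# entry stored under data['series']['ts'] it builds the string
+# "compare_series_ts", looks it up in this module's globals(), and
+# falls back to ``compare_element`` when no specialized comparator
+# exists, roughly:
+#
+#     name = "compare_{typ}_{dt}".format(typ='series', dt='ts')
+#     comparator = globals().get(name, compare_element)
+#     comparator(result, expected, 'series', version)
+#
+# Supporting a new typ/dt pair therefore only requires defining a
+# matching ``compare_{typ}_{dt}`` function in this module.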
+def compare_frame_dt_mixed_tzs(result, expected, typ, version): + # 8260 + # dtype is object < 0.17.0 + if LooseVersion(version) < '0.17.0': + expected = expected.astype(object) + tm.assert_frame_equal(result, expected) + else: + tm.assert_frame_equal(result, expected) + + +def compare_frame_cat_onecol(result, expected, typ, version): + # Categorical dtype is added in 0.15.0 + # ordered is changed in 0.16.0 + if LooseVersion(version) < '0.15.0': + tm.assert_frame_equal(result, expected, check_dtype=False, + check_categorical=False) + elif LooseVersion(version) < '0.16.0': + tm.assert_frame_equal(result, expected, check_categorical=False) + else: + tm.assert_frame_equal(result, expected) + + +def compare_frame_cat_and_float(result, expected, typ, version): + compare_frame_cat_onecol(result, expected, typ, version) + + +def compare_index_period(result, expected, typ, version): + tm.assert_index_equal(result, expected) + assert isinstance(result.freq, MonthEnd) + assert result.freq == MonthEnd() + assert result.freqstr == 'M' + tm.assert_index_equal(result.shift(2), expected.shift(2)) + + +def compare_sp_frame_float(result, expected, typ, version): + if LooseVersion(version) <= '0.18.1': + tm.assert_sp_frame_equal(result, expected, exact_indices=False, + check_dtype=False) + else: + tm.assert_sp_frame_equal(result, expected) + + +# --------------------- +# tests +# --------------------- +def legacy_pickle_versions(): + # yield the pickle versions + path = tm.get_data_path('legacy_pickle') + for v in os.listdir(path): + p = os.path.join(path, v) + if os.path.isdir(p): + yield v + + +@pytest.mark.parametrize('version', legacy_pickle_versions()) +def test_pickles(current_pickle_data, version): + if not is_platform_little_endian(): + pytest.skip("known failure on non-little endian") + + pth = tm.get_data_path('legacy_pickle/{0}'.format(version)) + n = 0 + for f in os.listdir(pth): + vf = os.path.join(pth, f) + with catch_warnings(record=True): + data = compare(current_pickle_data, vf, version) + + if data is None: + continue + n += 1 + assert n > 0, ('Pickle files are not ' + 'tested: {version}'.format(version=version)) + + +def test_round_trip_current(current_pickle_data): + + try: + import cPickle as c_pickle + + def c_pickler(obj, path): + with open(path, 'wb') as fh: + c_pickle.dump(obj, fh, protocol=-1) + + def c_unpickler(path): + with open(path, 'rb') as fh: + fh.seek(0) + return c_pickle.load(fh) + except: + c_pickler = None + c_unpickler = None + + import pickle as python_pickle + + def python_pickler(obj, path): + with open(path, 'wb') as fh: + python_pickle.dump(obj, fh, protocol=-1) + + def python_unpickler(path): + with open(path, 'rb') as fh: + fh.seek(0) + return python_pickle.load(fh) + + data = current_pickle_data + for typ, dv in data.items(): + for dt, expected in dv.items(): + + for writer in [pd.to_pickle, c_pickler, python_pickler]: + if writer is None: + continue + + with tm.ensure_clean() as path: + + # test writing with each pickler + writer(expected, path) + + # test reading with each unpickler + result = pd.read_pickle(path) + compare_element(result, expected, typ) + + if c_unpickler is not None: + result = c_unpickler(path) + compare_element(result, expected, typ) + + result = python_unpickler(path) + compare_element(result, expected, typ) + + +def test_pickle_v0_14_1(): + + cat = pd.Categorical(values=['a', 'b', 'c'], ordered=False, + categories=['a', 'b', 'c', 'd']) + pickle_path = os.path.join(tm.get_data_path(), + 'categorical_0_14_1.pickle') + # This code was 
executed once on v0.14.1 to generate the pickle: + # + # cat = Categorical(labels=np.arange(3), levels=['a', 'b', 'c', 'd'], + # name='foobar') + # with open(pickle_path, 'wb') as f: pickle.dump(cat, f) + # + tm.assert_categorical_equal(cat, pd.read_pickle(pickle_path)) + + +def test_pickle_v0_15_2(): + # ordered -> _ordered + # GH 9347 + + cat = pd.Categorical(values=['a', 'b', 'c'], ordered=False, + categories=['a', 'b', 'c', 'd']) + pickle_path = os.path.join(tm.get_data_path(), + 'categorical_0_15_2.pickle') + # This code was executed once on v0.15.2 to generate the pickle: + # + # cat = Categorical(labels=np.arange(3), levels=['a', 'b', 'c', 'd'], + # name='foobar') + # with open(pickle_path, 'wb') as f: pickle.dump(cat, f) + # + tm.assert_categorical_equal(cat, pd.read_pickle(pickle_path)) + + +def test_pickle_path_pathlib(): + df = tm.makeDataFrame() + result = tm.round_trip_pathlib(df.to_pickle, pd.read_pickle) + tm.assert_frame_equal(df, result) + + +def test_pickle_path_localpath(): + df = tm.makeDataFrame() + result = tm.round_trip_localpath(df.to_pickle, pd.read_pickle) + tm.assert_frame_equal(df, result) + + +# --------------------- +# test pickle compression +# --------------------- + +@pytest.fixture +def get_random_path(): + return u'__%s__.pickle' % tm.rands(10) + + +class TestCompression(object): + + _compression_to_extension = { + None: ".none", + 'gzip': '.gz', + 'bz2': '.bz2', + 'zip': '.zip', + 'xz': '.xz', + } + + def compress_file(self, src_path, dest_path, compression): + if compression is None: + shutil.copyfile(src_path, dest_path) + return + + if compression == 'gzip': + import gzip + f = gzip.open(dest_path, "w") + elif compression == 'bz2': + import bz2 + f = bz2.BZ2File(dest_path, "w") + elif compression == 'zip': + import zipfile + zip_file = zipfile.ZipFile(dest_path, "w", + compression=zipfile.ZIP_DEFLATED) + zip_file.write(src_path, os.path.basename(src_path)) + elif compression == 'xz': + lzma = pandas.compat.import_lzma() + f = lzma.LZMAFile(dest_path, "w") + else: + msg = 'Unrecognized compression type: {}'.format(compression) + raise ValueError(msg) + + if compression != "zip": + with open(src_path, "rb") as fh: + f.write(fh.read()) + f.close() + + def decompress_file(self, src_path, dest_path, compression): + if compression is None: + shutil.copyfile(src_path, dest_path) + return + + if compression == 'gzip': + import gzip + f = gzip.open(src_path, "r") + elif compression == 'bz2': + import bz2 + f = bz2.BZ2File(src_path, "r") + elif compression == 'zip': + import zipfile + zip_file = zipfile.ZipFile(src_path) + zip_names = zip_file.namelist() + if len(zip_names) == 1: + f = zip_file.open(zip_names.pop()) + else: + raise ValueError('ZIP file {} error. Only one file per ZIP.' 
+ .format(src_path)) + elif compression == 'xz': + lzma = pandas.compat.import_lzma() + f = lzma.LZMAFile(src_path, "r") + else: + msg = 'Unrecognized compression type: {}'.format(compression) + raise ValueError(msg) + + with open(dest_path, "wb") as fh: + fh.write(f.read()) + f.close() + + @pytest.mark.parametrize('compression', [None, 'gzip', 'bz2', 'xz']) + def test_write_explicit(self, compression, get_random_path): + # issue 11666 + if compression == 'xz': + tm._skip_if_no_lzma() + + base = get_random_path + path1 = base + ".compressed" + path2 = base + ".raw" + + with tm.ensure_clean(path1) as p1, tm.ensure_clean(path2) as p2: + df = tm.makeDataFrame() + + # write to compressed file + df.to_pickle(p1, compression=compression) + + # decompress + self.decompress_file(p1, p2, compression=compression) + + # read decompressed file + df2 = pd.read_pickle(p2, compression=None) + + tm.assert_frame_equal(df, df2) + + @pytest.mark.parametrize('compression', ['', 'None', 'bad', '7z']) + def test_write_explicit_bad(self, compression, get_random_path): + with tm.assert_raises_regex(ValueError, + "Unrecognized compression type"): + with tm.ensure_clean(get_random_path) as path: + df = tm.makeDataFrame() + df.to_pickle(path, compression=compression) + + @pytest.mark.parametrize('ext', ['', '.gz', '.bz2', '.xz', '.no_compress']) + def test_write_infer(self, ext, get_random_path): + if ext == '.xz': + tm._skip_if_no_lzma() + + base = get_random_path + path1 = base + ext + path2 = base + ".raw" + compression = None + for c in self._compression_to_extension: + if self._compression_to_extension[c] == ext: + compression = c + break + + with tm.ensure_clean(path1) as p1, tm.ensure_clean(path2) as p2: + df = tm.makeDataFrame() + + # write to compressed file by inferred compression method + df.to_pickle(p1) + + # decompress + self.decompress_file(p1, p2, compression=compression) + + # read decompressed file + df2 = pd.read_pickle(p2, compression=None) + + tm.assert_frame_equal(df, df2) + + @pytest.mark.parametrize('compression', [None, 'gzip', 'bz2', 'xz', "zip"]) + def test_read_explicit(self, compression, get_random_path): + # issue 11666 + if compression == 'xz': + tm._skip_if_no_lzma() + + base = get_random_path + path1 = base + ".raw" + path2 = base + ".compressed" + + with tm.ensure_clean(path1) as p1, tm.ensure_clean(path2) as p2: + df = tm.makeDataFrame() + + # write to uncompressed file + df.to_pickle(p1, compression=None) + + # compress + self.compress_file(p1, p2, compression=compression) + + # read compressed file + df2 = pd.read_pickle(p2, compression=compression) + + tm.assert_frame_equal(df, df2) + + @pytest.mark.parametrize('ext', ['', '.gz', '.bz2', '.xz', '.zip', + '.no_compress']) + def test_read_infer(self, ext, get_random_path): + if ext == '.xz': + tm._skip_if_no_lzma() + + base = get_random_path + path1 = base + ".raw" + path2 = base + ext + compression = None + for c in self._compression_to_extension: + if self._compression_to_extension[c] == ext: + compression = c + break + + with tm.ensure_clean(path1) as p1, tm.ensure_clean(path2) as p2: + df = tm.makeDataFrame() + + # write to uncompressed file + df.to_pickle(p1, compression=None) + + # compress + self.compress_file(p1, p2, compression=compression) + + # read compressed file by inferred compression method + df2 = pd.read_pickle(p2) + + tm.assert_frame_equal(df, df2) + + +# --------------------- +# test pickle protocol +# --------------------- + +class TestProtocol(object): + + @pytest.mark.parametrize('protocol', [-1, 0, 1, 
2]) + def test_read(self, protocol, get_random_path): + with tm.ensure_clean(get_random_path) as path: + df = tm.makeDataFrame() + df.to_pickle(path, protocol=protocol) + df2 = pd.read_pickle(path) + tm.assert_frame_equal(df, df2) + + @pytest.mark.parametrize('protocol', [3, 4]) + @pytest.mark.skipif(sys.version_info[:2] >= (3, 4), + reason="Testing invalid parameters for " + "Python 2.x and 3.y (y < 4).") + def test_read_bad_versions(self, protocol, get_random_path): + # For Python 2.x (respectively 3.y with y < 4), [expected] + # HIGHEST_PROTOCOL should be 2 (respectively 3). Hence, the protocol + # parameter should not exceed 2 (respectively 3). + if sys.version_info[:2] < (3, 0): + expect_hp = 2 + else: + expect_hp = 3 + with tm.assert_raises_regex(ValueError, + "pickle protocol %d asked for; the highest" + " available protocol is %d" % (protocol, + expect_hp)): + with tm.ensure_clean(get_random_path) as path: + df = tm.makeDataFrame() + df.to_pickle(path, protocol=protocol) diff --git a/pandas/io/tests/test_pytables.py b/pandas/tests/io/test_pytables.py similarity index 70% rename from pandas/io/tests/test_pytables.py rename to pandas/tests/io/test_pytables.py index 40db10c42d5a7..f33ba7627101e 100644 --- a/pandas/io/tests/test_pytables.py +++ b/pandas/tests/io/test_pytables.py @@ -1,56 +1,44 @@ -import nose +import pytest import sys import os -import warnings import tempfile from contextlib import contextmanager +from warnings import catch_warnings import datetime +from datetime import timedelta import numpy as np import pandas import pandas as pd -from pandas import (Series, DataFrame, Panel, MultiIndex, Int64Index, +from pandas import (Series, DataFrame, Panel, Panel4D, MultiIndex, Int64Index, RangeIndex, Categorical, bdate_range, date_range, timedelta_range, Index, DatetimeIndex, - isnull) - -from pandas.compat import is_platform_windows, PY3, PY35 -from pandas.formats.printing import pprint_thing -from pandas.io.pytables import _tables, TableIterator -try: - _tables() -except ImportError as e: - raise nose.SkipTest(e) + isna) +from pandas.compat import is_platform_windows, PY3, PY35, BytesIO, text_type +from pandas.io.formats.printing import pprint_thing +tables = pytest.importorskip('tables') +from pandas.io.pytables import TableIterator from pandas.io.pytables import (HDFStore, get_store, Term, read_hdf, - IncompatibilityWarning, PerformanceWarning, - AttributeConflictWarning, DuplicateWarning, PossibleDataLossError, ClosedFileError) + from pandas.io import pytables as pytables import pandas.util.testing as tm from pandas.util.testing import (assert_panel4d_equal, assert_panel_equal, assert_frame_equal, assert_series_equal, - assert_produces_warning, set_timezone) from pandas import concat, Timestamp from pandas import compat from pandas.compat import range, lrange, u - -try: - import tables -except ImportError: - raise nose.SkipTest('no pytables') - from distutils.version import LooseVersion _default_compressor = ('blosc' if LooseVersion(tables.__version__) >= '2.2' else 'zlib') -_multiprocess_can_split_ = False # testing on windows/py3 seems to fault # for using compression @@ -133,61 +121,50 @@ def _maybe_remove(store, key): pass -@contextmanager -def compat_assert_produces_warning(w): - """ don't produce a warning under PY3 """ - if compat.PY3: - yield - else: - with tm.assert_produces_warning(expected_warning=w, - check_stacklevel=False): - yield - - -class Base(tm.TestCase): +class Base(object): @classmethod - def setUpClass(cls): - super(Base, cls).setUpClass() + def 
setup_class(cls): # Pytables 3.0.0 deprecates lots of things tm.reset_testing_mode() @classmethod - def tearDownClass(cls): - super(Base, cls).tearDownClass() + def teardown_class(cls): # Pytables 3.0.0 deprecates lots of things tm.set_testing_mode() - def setUp(self): - warnings.filterwarnings(action='ignore', category=FutureWarning) - + def setup_method(self, method): self.path = 'tmp.__%s__.h5' % tm.rands(10) - def tearDown(self): + def teardown_method(self, method): pass -class TestHDFStore(Base, tm.TestCase): +@pytest.mark.single +class TestHDFStore(Base): def test_factory_fun(self): path = create_tempfile(self.path) try: - with get_store(path) as tbl: - raise ValueError('blah') + with catch_warnings(record=True): + with get_store(path) as tbl: + raise ValueError('blah') except ValueError: pass finally: safe_remove(path) try: - with get_store(path) as tbl: - tbl['a'] = tm.makeDataFrame() - - with get_store(path) as tbl: - self.assertEqual(len(tbl), 1) - self.assertEqual(type(tbl['a']), DataFrame) + with catch_warnings(record=True): + with get_store(path) as tbl: + tbl['a'] = tm.makeDataFrame() + + with catch_warnings(record=True): + with get_store(path) as tbl: + assert len(tbl) == 1 + assert type(tbl['a']) == DataFrame finally: safe_remove(self.path) @@ -206,8 +183,8 @@ def test_context(self): tbl['a'] = tm.makeDataFrame() with HDFStore(path) as tbl: - self.assertEqual(len(tbl), 1) - self.assertEqual(type(tbl['a']), DataFrame) + assert len(tbl) == 1 + assert type(tbl['a']) == DataFrame finally: safe_remove(path) @@ -227,8 +204,10 @@ def roundtrip(key, obj, **kwargs): o = tm.makeDataFrame() assert_frame_equal(o, roundtrip('frame', o)) - o = tm.makePanel() - assert_panel_equal(o, roundtrip('panel', o)) + with catch_warnings(record=True): + + o = tm.makePanel() + assert_panel_equal(o, roundtrip('panel', o)) # table df = DataFrame(dict(A=lrange(5), B=lrange(5))) @@ -327,20 +306,20 @@ def test_api(self): # invalid df = tm.makeDataFrame() - self.assertRaises(ValueError, df.to_hdf, path, - 'df', append=True, format='f') - self.assertRaises(ValueError, df.to_hdf, path, - 'df', append=True, format='fixed') + pytest.raises(ValueError, df.to_hdf, path, + 'df', append=True, format='f') + pytest.raises(ValueError, df.to_hdf, path, + 'df', append=True, format='fixed') - self.assertRaises(TypeError, df.to_hdf, path, - 'df', append=True, format='foo') - self.assertRaises(TypeError, df.to_hdf, path, - 'df', append=False, format='bar') + pytest.raises(TypeError, df.to_hdf, path, + 'df', append=True, format='foo') + pytest.raises(TypeError, df.to_hdf, path, + 'df', append=False, format='bar') # File path doesn't exist path = "" - self.assertRaises(compat.FileNotFoundError, - read_hdf, path, 'df') + pytest.raises(compat.FileNotFoundError, + read_hdf, path, 'df') def test_api_default_format(self): @@ -351,16 +330,16 @@ def test_api_default_format(self): pandas.set_option('io.hdf.default_format', 'fixed') _maybe_remove(store, 'df') store.put('df', df) - self.assertFalse(store.get_storer('df').is_table) - self.assertRaises(ValueError, store.append, 'df2', df) + assert not store.get_storer('df').is_table + pytest.raises(ValueError, store.append, 'df2', df) pandas.set_option('io.hdf.default_format', 'table') _maybe_remove(store, 'df') store.put('df', df) - self.assertTrue(store.get_storer('df').is_table) + assert store.get_storer('df').is_table _maybe_remove(store, 'df2') store.append('df2', df) - self.assertTrue(store.get_storer('df').is_table) + assert store.get_storer('df').is_table 
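+        # NOTE (editorial sketch, not part of the original patch): the
+        # 'io.hdf.default_format' option exercised here selects the layout
+        # that put()/append() use when no explicit format= is passed;
+        # 'fixed' produces a non-appendable, non-queryable store while
+        # 'table' produces an appendable, queryable one, e.g.:
+        #
+        #     pandas.set_option('io.hdf.default_format', 'table')
+        #     store.put('df', df)  # stored as a table, is_table is True
+        #
+        # Resetting the option to None below restores the default.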
pandas.set_option('io.hdf.default_format', None) @@ -370,17 +349,17 @@ def test_api_default_format(self): pandas.set_option('io.hdf.default_format', 'fixed') df.to_hdf(path, 'df') - with get_store(path) as store: - self.assertFalse(store.get_storer('df').is_table) - self.assertRaises(ValueError, df.to_hdf, path, 'df2', append=True) + with HDFStore(path) as store: + assert not store.get_storer('df').is_table + pytest.raises(ValueError, df.to_hdf, path, 'df2', append=True) pandas.set_option('io.hdf.default_format', 'table') df.to_hdf(path, 'df3') with HDFStore(path) as store: - self.assertTrue(store.get_storer('df3').is_table) + assert store.get_storer('df3').is_table df.to_hdf(path, 'df4', append=True) with HDFStore(path) as store: - self.assertTrue(store.get_storer('df4').is_table) + assert store.get_storer('df4').is_table pandas.set_option('io.hdf.default_format', None) @@ -390,29 +369,33 @@ def test_keys(self): store['a'] = tm.makeTimeSeries() store['b'] = tm.makeStringSeries() store['c'] = tm.makeDataFrame() - store['d'] = tm.makePanel() - store['foo/bar'] = tm.makePanel() - self.assertEqual(len(store), 5) + with catch_warnings(record=True): + store['d'] = tm.makePanel() + store['foo/bar'] = tm.makePanel() + assert len(store) == 5 expected = set(['/a', '/b', '/c', '/d', '/foo/bar']) - self.assertTrue(set(store.keys()) == expected) - self.assertTrue(set(store) == expected) + assert set(store.keys()) == expected + assert set(store) == expected def test_iter_empty(self): with ensure_clean_store(self.path) as store: # GH 12221 - self.assertTrue(list(store) == []) + assert list(store) == [] def test_repr(self): with ensure_clean_store(self.path) as store: repr(store) + store.info() store['a'] = tm.makeTimeSeries() store['b'] = tm.makeStringSeries() store['c'] = tm.makeDataFrame() - store['d'] = tm.makePanel() - store['foo/bar'] = tm.makePanel() - store.append('e', tm.makePanel()) + + with catch_warnings(record=True): + store['d'] = tm.makePanel() + store['foo/bar'] = tm.makePanel() + store.append('e', tm.makePanel()) df = tm.makeDataFrame() df['obj1'] = 'foo' @@ -427,17 +410,18 @@ def test_repr(self): df['datetime1'] = datetime.datetime(2001, 1, 2, 0, 0) df['datetime2'] = datetime.datetime(2001, 1, 3, 0, 0) df.loc[3:6, ['obj1']] = np.nan - df = df.consolidate()._convert(datetime=True) + df = df._consolidate()._convert(datetime=True) - warnings.filterwarnings('ignore', category=PerformanceWarning) - store['df'] = df - warnings.filterwarnings('always', category=PerformanceWarning) + # PerformanceWarning + with catch_warnings(record=True): + store['df'] = df # make a random group in hdf space store._handle.create_group(store._handle.root, 'bah') - repr(store) - str(store) + assert store.filename in repr(store) + assert store.filename in str(store) + store.info() # storers with ensure_clean_store(self.path) as store: @@ -455,19 +439,18 @@ def test_contains(self): store['a'] = tm.makeTimeSeries() store['b'] = tm.makeDataFrame() store['foo/bar'] = tm.makeDataFrame() - self.assertIn('a', store) - self.assertIn('b', store) - self.assertNotIn('c', store) - self.assertIn('foo/bar', store) - self.assertIn('/foo/bar', store) - self.assertNotIn('/foo/b', store) - self.assertNotIn('bar', store) - - # GH 2694 - warnings.filterwarnings( - 'ignore', category=tables.NaturalNameWarning) - store['node())'] = tm.makeDataFrame() - self.assertIn('node())', store) + assert 'a' in store + assert 'b' in store + assert 'c' not in store + assert 'foo/bar' in store + assert '/foo/bar' in store + assert '/foo/b' not in 
store + assert 'bar' not in store + + # gh-2694: tables.NaturalNameWarning + with catch_warnings(record=True): + store['node())'] = tm.makeDataFrame() + assert 'node())' in store def test_versioning(self): @@ -478,9 +461,9 @@ def test_versioning(self): _maybe_remove(store, 'df1') store.append('df1', df[:10]) store.append('df1', df[10:]) - self.assertEqual(store.root.a._v_attrs.pandas_version, '0.15.2') - self.assertEqual(store.root.b._v_attrs.pandas_version, '0.15.2') - self.assertEqual(store.root.df1._v_attrs.pandas_version, '0.15.2') + assert store.root.a._v_attrs.pandas_version == '0.15.2' + assert store.root.b._v_attrs.pandas_version == '0.15.2' + assert store.root.df1._v_attrs.pandas_version == '0.15.2' # write a file and wipe its versioning _maybe_remove(store, 'df2') @@ -489,7 +472,7 @@ def test_versioning(self): # this is an error because its table_type is appendable, but no # version info store.get_node('df2')._v_attrs.pandas_version = None - self.assertRaises(Exception, store.select, 'df2') + pytest.raises(Exception, store.select, 'df2') def test_mode(self): @@ -501,11 +484,11 @@ def check(mode): # constructor if mode in ['r', 'r+']: - self.assertRaises(IOError, HDFStore, path, mode=mode) + pytest.raises(IOError, HDFStore, path, mode=mode) else: store = HDFStore(path, mode=mode) - self.assertEqual(store._handle.mode, mode) + assert store._handle.mode == mode store.close() with ensure_clean_path(self.path) as path: @@ -515,25 +498,25 @@ def check(mode): def f(): with HDFStore(path, mode=mode) as store: # noqa pass - self.assertRaises(IOError, f) + pytest.raises(IOError, f) else: with HDFStore(path, mode=mode) as store: - self.assertEqual(store._handle.mode, mode) + assert store._handle.mode == mode with ensure_clean_path(self.path) as path: # conv write if mode in ['r', 'r+']: - self.assertRaises(IOError, df.to_hdf, - path, 'df', mode=mode) + pytest.raises(IOError, df.to_hdf, + path, 'df', mode=mode) df.to_hdf(path, 'df', mode='w') else: df.to_hdf(path, 'df', mode=mode) # conv read if mode in ['w']: - self.assertRaises(ValueError, read_hdf, - path, 'df', mode=mode) + pytest.raises(ValueError, read_hdf, + path, 'df', mode=mode) else: result = read_hdf(path, 'df', mode=mode) assert_frame_equal(result, df) @@ -560,43 +543,43 @@ def test_reopen_handle(self): store['a'] = tm.makeTimeSeries() # invalid mode change - self.assertRaises(PossibleDataLossError, store.open, 'w') + pytest.raises(PossibleDataLossError, store.open, 'w') store.close() - self.assertFalse(store.is_open) + assert not store.is_open # truncation ok here store.open('w') - self.assertTrue(store.is_open) - self.assertEqual(len(store), 0) + assert store.is_open + assert len(store) == 0 store.close() - self.assertFalse(store.is_open) + assert not store.is_open store = HDFStore(path, mode='a') store['a'] = tm.makeTimeSeries() # reopen as read store.open('r') - self.assertTrue(store.is_open) - self.assertEqual(len(store), 1) - self.assertEqual(store._mode, 'r') + assert store.is_open + assert len(store) == 1 + assert store._mode == 'r' store.close() - self.assertFalse(store.is_open) + assert not store.is_open # reopen as append store.open('a') - self.assertTrue(store.is_open) - self.assertEqual(len(store), 1) - self.assertEqual(store._mode, 'a') + assert store.is_open + assert len(store) == 1 + assert store._mode == 'a' store.close() - self.assertFalse(store.is_open) + assert not store.is_open # reopen as append (again) store.open('a') - self.assertTrue(store.is_open) - self.assertEqual(len(store), 1) - 
self.assertEqual(store._mode, 'a') + assert store.is_open + assert len(store) == 1 + assert store._mode == 'a' store.close() - self.assertFalse(store.is_open) + assert not store.is_open def test_open_args(self): @@ -616,7 +599,7 @@ def test_open_args(self): store.close() # the file should not have actually been written - self.assertFalse(os.path.exists(path)) + assert not os.path.exists(path) def test_flush(self): @@ -637,7 +620,7 @@ def test_get(self): right = store['/a'] tm.assert_series_equal(left, right) - self.assertRaises(KeyError, store.get, 'b') + pytest.raises(KeyError, store.get, 'b') def test_getattr(self): @@ -658,10 +641,10 @@ def test_getattr(self): tm.assert_frame_equal(result, df) # errors - self.assertRaises(AttributeError, getattr, store, 'd') + pytest.raises(AttributeError, getattr, store, 'd') for x in ['mode', 'path', 'handle', 'complib']: - self.assertRaises(AttributeError, getattr, store, x) + pytest.raises(AttributeError, getattr, store, x) # not stores for x in ['mode', 'path', 'handle', 'complib']: @@ -681,17 +664,17 @@ def test_put(self): store.put('c', df[:10], format='table') # not OK, not a table - self.assertRaises( + pytest.raises( ValueError, store.put, 'b', df[10:], append=True) # node does not currently exist, test _is_table_type returns False # in this case # _maybe_remove(store, 'f') - # self.assertRaises(ValueError, store.put, 'f', df[10:], + # pytest.raises(ValueError, store.put, 'f', df[10:], # append=True) # can't put to a table (use append instead) - self.assertRaises(ValueError, store.put, 'c', df[10:], append=True) + pytest.raises(ValueError, store.put, 'c', df[10:], append=True) # overwrite table store.put('c', df[:10], format='table', append=False) @@ -733,25 +716,112 @@ def test_put_compression(self): tm.assert_frame_equal(store['c'], df) # can't compress if format='fixed' - self.assertRaises(ValueError, store.put, 'b', df, - format='fixed', complib='zlib') + pytest.raises(ValueError, store.put, 'b', df, + format='fixed', complib='zlib') def test_put_compression_blosc(self): - tm.skip_if_no_package('tables', '2.2', app='blosc support') + tm.skip_if_no_package('tables', min_version='2.2', + app='blosc support') if skip_compression: - raise nose.SkipTest("skipping on windows/PY3") + pytest.skip("skipping on windows/PY3") df = tm.makeTimeDataFrame() with ensure_clean_store(self.path) as store: # can't compress if format='fixed' - self.assertRaises(ValueError, store.put, 'b', df, - format='fixed', complib='blosc') + pytest.raises(ValueError, store.put, 'b', df, + format='fixed', complib='blosc') store.put('c', df, format='table', complib='blosc') tm.assert_frame_equal(store['c'], df) + def test_complibs_default_settings(self): + # GH15943 + df = tm.makeDataFrame() + + # Set complevel and check if complib is automatically set to + # default value + with ensure_clean_path(self.path) as tmpfile: + df.to_hdf(tmpfile, 'df', complevel=9) + result = pd.read_hdf(tmpfile, 'df') + tm.assert_frame_equal(result, df) + + with tables.open_file(tmpfile, mode='r') as h5file: + for node in h5file.walk_nodes(where='/df', classname='Leaf'): + assert node.filters.complevel == 9 + assert node.filters.complib == 'zlib' + + # Set complib and check to see if compression is disabled + with ensure_clean_path(self.path) as tmpfile: + df.to_hdf(tmpfile, 'df', complib='zlib') + result = pd.read_hdf(tmpfile, 'df') + tm.assert_frame_equal(result, df) + + with tables.open_file(tmpfile, mode='r') as h5file: + for node in h5file.walk_nodes(where='/df', classname='Leaf'): + assert 
node.filters.complevel == 0 + assert node.filters.complib is None + + # Check if not setting complib or complevel results in no compression + with ensure_clean_path(self.path) as tmpfile: + df.to_hdf(tmpfile, 'df') + result = pd.read_hdf(tmpfile, 'df') + tm.assert_frame_equal(result, df) + + with tables.open_file(tmpfile, mode='r') as h5file: + for node in h5file.walk_nodes(where='/df', classname='Leaf'): + assert node.filters.complevel == 0 + assert node.filters.complib is None + + # Check if file-defaults can be overridden on a per table basis + with ensure_clean_path(self.path) as tmpfile: + store = pd.HDFStore(tmpfile) + store.append('dfc', df, complevel=9, complib='blosc') + store.append('df', df) + store.close() + + with tables.open_file(tmpfile, mode='r') as h5file: + for node in h5file.walk_nodes(where='/df', classname='Leaf'): + assert node.filters.complevel == 0 + assert node.filters.complib is None + for node in h5file.walk_nodes(where='/dfc', classname='Leaf'): + assert node.filters.complevel == 9 + assert node.filters.complib == 'blosc' + + def test_complibs(self): + # GH14478 + df = tm.makeDataFrame() + + # Building list of all complibs and complevels tuples + all_complibs = tables.filters.all_complibs + # Remove lzo if it's not available on this platform + if not tables.which_lib_version('lzo'): + all_complibs.remove('lzo') + all_levels = range(0, 10) + all_tests = [(lib, lvl) for lib in all_complibs for lvl in all_levels] + + for (lib, lvl) in all_tests: + with ensure_clean_path(self.path) as tmpfile: + gname = 'foo' + + # Write and read file to see if data is consistent + df.to_hdf(tmpfile, gname, complib=lib, complevel=lvl) + result = pd.read_hdf(tmpfile, gname) + tm.assert_frame_equal(result, df) + + # Open file and check metadata + # for correct amount of compression + h5table = tables.open_file(tmpfile, mode='r') + for node in h5table.walk_nodes(where='/' + gname, + classname='Leaf'): + assert node.filters.complevel == lvl + if lvl == 0: + assert node.filters.complib is None + else: + assert node.filters.complib == lib + h5table.close() + def test_put_integer(self): # non-date, non-string index df = DataFrame(np.random.randn(50, 100)) @@ -771,16 +841,14 @@ def test_put_mixed_type(self): df['datetime1'] = datetime.datetime(2001, 1, 2, 0, 0) df['datetime2'] = datetime.datetime(2001, 1, 3, 0, 0) df.loc[3:6, ['obj1']] = np.nan - df = df.consolidate()._convert(datetime=True) + df = df._consolidate()._convert(datetime=True) with ensure_clean_store(self.path) as store: _maybe_remove(store, 'df') - # cannot use assert_produces_warning here for some reason - # a PendingDeprecationWarning is also raised? 
- warnings.filterwarnings('ignore', category=PerformanceWarning) - store.put('df', df) - warnings.filterwarnings('always', category=PerformanceWarning) + # PerformanceWarning + with catch_warnings(record=True): + store.put('df', df) expected = store.get('df') tm.assert_frame_equal(expected, df) @@ -788,40 +856,42 @@ def test_append(self): with ensure_clean_store(self.path) as store: - df = tm.makeTimeDataFrame() - _maybe_remove(store, 'df1') - store.append('df1', df[:10]) - store.append('df1', df[10:]) - tm.assert_frame_equal(store['df1'], df) - - _maybe_remove(store, 'df2') - store.put('df2', df[:10], format='table') - store.append('df2', df[10:]) - tm.assert_frame_equal(store['df2'], df) - - _maybe_remove(store, 'df3') - store.append('/df3', df[:10]) - store.append('/df3', df[10:]) - tm.assert_frame_equal(store['df3'], df) # this is allowed by almost always don't want to do it - with tm.assert_produces_warning( - expected_warning=tables.NaturalNameWarning): + # tables.NaturalNameWarning): + with catch_warnings(record=True): + + df = tm.makeTimeDataFrame() + _maybe_remove(store, 'df1') + store.append('df1', df[:10]) + store.append('df1', df[10:]) + tm.assert_frame_equal(store['df1'], df) + + _maybe_remove(store, 'df2') + store.put('df2', df[:10], format='table') + store.append('df2', df[10:]) + tm.assert_frame_equal(store['df2'], df) + + _maybe_remove(store, 'df3') + store.append('/df3', df[:10]) + store.append('/df3', df[10:]) + tm.assert_frame_equal(store['df3'], df) + + # this is allowed but you almost always don't want to do it + # tables.NaturalNameWarning _maybe_remove(store, '/df3 foo') store.append('/df3 foo', df[:10]) store.append('/df3 foo', df[10:]) tm.assert_frame_equal(store['df3 foo'], df) - # panel - wp = tm.makePanel() - _maybe_remove(store, 'wp1') - store.append('wp1', wp.iloc[:, :10, :]) - store.append('wp1', wp.iloc[:, 10:, :]) - assert_panel_equal(store['wp1'], wp) + # panel + wp = tm.makePanel() + _maybe_remove(store, 'wp1') + store.append('wp1', wp.iloc[:, :10, :]) + store.append('wp1', wp.iloc[:, 10:, :]) + assert_panel_equal(store['wp1'], wp) - # ndim - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): + # ndim p4d = tm.makePanel4D() _maybe_remove(store, 'p4d') store.append('p4d', p4d.iloc[:, :, :10, :]) @@ -845,42 +915,42 @@ def test_append(self): 'p4d2', p4d2, axes=['items', 'major_axis', 'minor_axis']) assert_panel4d_equal(store['p4d2'], p4d2) - # test using differt order of items on the non-index axes - _maybe_remove(store, 'wp1') - wp_append1 = wp.iloc[:, :10, :] - store.append('wp1', wp_append1) - wp_append2 = wp.iloc[:, 10:, :].reindex(items=wp.items[::-1]) - store.append('wp1', wp_append2) - assert_panel_equal(store['wp1'], wp) - - # dtype issues - mizxed type in a single object column - df = DataFrame(data=[[1, 2], [0, 1], [1, 2], [0, 0]]) - df['mixed_column'] = 'testing' - df.loc[2, 'mixed_column'] = np.nan - _maybe_remove(store, 'df') - store.append('df', df) - tm.assert_frame_equal(store['df'], df) - - # uints - test storage of uints - uint_data = DataFrame({ - 'u08': Series(np.random.randint(0, high=255, size=5), - dtype=np.uint8), - 'u16': Series(np.random.randint(0, high=65535, size=5), - dtype=np.uint16), - 'u32': Series(np.random.randint(0, high=2**30, size=5), - dtype=np.uint32), - 'u64': Series([2**58, 2**59, 2**60, 2**61, 2**62], - dtype=np.uint64)}, index=np.arange(5)) - _maybe_remove(store, 'uints') - store.append('uints', uint_data) - tm.assert_frame_equal(store['uints'], uint_data) - - # 
uints - test storage of uints in indexable columns - _maybe_remove(store, 'uints') - # 64-bit indices not yet supported - store.append('uints', uint_data, data_columns=[ - 'u08', 'u16', 'u32']) - tm.assert_frame_equal(store['uints'], uint_data) + # test using different order of items on the non-index axes + _maybe_remove(store, 'wp1') + wp_append1 = wp.iloc[:, :10, :] + store.append('wp1', wp_append1) + wp_append2 = wp.iloc[:, 10:, :].reindex(items=wp.items[::-1]) + store.append('wp1', wp_append2) + assert_panel_equal(store['wp1'], wp) + + # dtype issues - mixed type in a single object column + df = DataFrame(data=[[1, 2], [0, 1], [1, 2], [0, 0]]) + df['mixed_column'] = 'testing' + df.loc[2, 'mixed_column'] = np.nan + _maybe_remove(store, 'df') + store.append('df', df) + tm.assert_frame_equal(store['df'], df) + + # uints - test storage of uints + uint_data = DataFrame({ + 'u08': Series(np.random.randint(0, high=255, size=5), + dtype=np.uint8), + 'u16': Series(np.random.randint(0, high=65535, size=5), + dtype=np.uint16), + 'u32': Series(np.random.randint(0, high=2**30, size=5), + dtype=np.uint32), + 'u64': Series([2**58, 2**59, 2**60, 2**61, 2**62], + dtype=np.uint64)}, index=np.arange(5)) + _maybe_remove(store, 'uints') + store.append('uints', uint_data) + tm.assert_frame_equal(store['uints'], uint_data) + + # uints - test storage of uints in indexable columns + _maybe_remove(store, 'uints') + # 64-bit indices not yet supported + store.append('uints', uint_data, data_columns=[ + 'u08', 'u16', 'u32']) + tm.assert_frame_equal(store['uints'], uint_data) def test_append_series(self): @@ -894,27 +964,27 @@ def test_append_series(self): store.append('ss', ss) result = store['ss'] tm.assert_series_equal(result, ss) - self.assertIsNone(result.name) + assert result.name is None store.append('ts', ts) result = store['ts'] tm.assert_series_equal(result, ts) - self.assertIsNone(result.name) + assert result.name is None ns.name = 'foo' store.append('ns', ns) result = store['ns'] tm.assert_series_equal(result, ns) - self.assertEqual(result.name, ns.name) + assert result.name == ns.name # select on the values expected = ns[ns > 60] - result = store.select('ns', Term('foo>60')) + result = store.select('ns', 'foo>60') tm.assert_series_equal(result, expected) # select on the index and values expected = ns[(ns > 70) & (ns.index < 90)] - result = store.select('ns', [Term('foo>70'), Term('index<90')]) + result = store.select('ns', 'foo>70 and index<90') tm.assert_series_equal(result, expected) # multi-index @@ -961,15 +1031,16 @@ def check(format, index): else: # only support for fixed types (and they have a perf warning) - self.assertRaises(TypeError, check, 'table', index) - with tm.assert_produces_warning( - expected_warning=PerformanceWarning): + pytest.raises(TypeError, check, 'table', index) + + # PerformanceWarning + with catch_warnings(record=True): check('fixed', index) def test_encoding(self): if sys.byteorder != 'little': - raise nose.SkipTest('system byteorder is not little') + pytest.skip('system byteorder is not little') with ensure_clean_store(self.path) as store: df = DataFrame(dict(A='foo', B='bar'), index=range(5)) @@ -986,7 +1057,7 @@ def test_encoding(self): def test_latin_encoding(self): if compat.PY2: - self.assertRaisesRegexp( + tm.assert_raises_regex( TypeError, r'\[unicode\] is not implemented as a table column') return @@ -1156,15 +1227,17 @@ def test_append_all_nans(self): [[np.nan, np.nan, np.nan], [np.nan, 5, 6]], [[np.nan, np.nan, np.nan], [np.nan, 3, np.nan]]] - panel_with_missing 
= Panel(matrix, items=['Item1', 'Item2', 'Item3'], - major_axis=[1, 2], - minor_axis=['A', 'B', 'C']) + with catch_warnings(record=True): + panel_with_missing = Panel(matrix, + items=['Item1', 'Item2', 'Item3'], + major_axis=[1, 2], + minor_axis=['A', 'B', 'C']) - with ensure_clean_path(self.path) as path: - panel_with_missing.to_hdf( - path, 'panel_with_missing', format='table') - reloaded_panel = read_hdf(path, 'panel_with_missing') - tm.assert_panel_equal(panel_with_missing, reloaded_panel) + with ensure_clean_path(self.path) as path: + panel_with_missing.to_hdf( + path, 'panel_with_missing', format='table') + reloaded_panel = read_hdf(path, 'panel_with_missing') + tm.assert_panel_equal(panel_with_missing, reloaded_panel) def test_append_frame_column_oriented(self): @@ -1183,13 +1256,14 @@ def test_append_frame_column_oriented(self): # selection on the non-indexable result = store.select( - 'df1', ('columns=A', Term('index=df.index[0:4]'))) + 'df1', ('columns=A', 'index=df.index[0:4]')) expected = df.reindex(columns=['A'], index=df.index[0:4]) tm.assert_frame_equal(expected, result) # this isn't supported - self.assertRaises(TypeError, store.select, 'df1', ( - 'columns=A', Term('index>df.index[4]'))) + with pytest.raises(TypeError): + store.select('df1', + 'columns=A and index>df.index[4]') def test_append_with_different_block_ordering(self): @@ -1227,16 +1301,16 @@ def test_append_with_different_block_ordering(self): # store additonal fields in different blocks df['int16_2'] = Series([1] * len(df), dtype='int16') - self.assertRaises(ValueError, store.append, 'df', df) + pytest.raises(ValueError, store.append, 'df', df) # store multile additonal fields in different blocks df['float_3'] = Series([1.] * len(df), dtype='float64') - self.assertRaises(ValueError, store.append, 'df', df) + pytest.raises(ValueError, store.append, 'df', df) def test_ndim_indexables(self): # test using ndim tables in new ways - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): with ensure_clean_store(self.path) as store: p4d = tm.makePanel4D() @@ -1244,7 +1318,7 @@ def test_ndim_indexables(self): def check_indexers(key, indexers): for i, idx in enumerate(indexers): descr = getattr(store.root, key).table.description - self.assertTrue(getattr(descr, idx)._v_pos == i) + assert getattr(descr, idx)._v_pos == i # append then change (will take existing schema) indexers = ['items', 'major_axis', 'minor_axis'] @@ -1265,7 +1339,7 @@ def check_indexers(key, indexers): # pass incorrect number of axes _maybe_remove(store, 'p4d') - self.assertRaises(ValueError, store.append, 'p4d', p4d.iloc[ + pytest.raises(ValueError, store.append, 'p4d', p4d.iloc[ :, :, :10, :], axes=['major_axis', 'minor_axis']) # different than default indexables #1 @@ -1290,15 +1364,15 @@ def check_indexers(key, indexers): assert_panel4d_equal(result, expected) # partial selection2 - result = store.select('p4d', [Term( - 'labels=l1'), Term('items=ItemA'), Term('minor_axis=B')]) + result = store.select( + 'p4d', "labels='l1' and items='ItemA' and minor_axis='B'") expected = p4d.reindex( labels=['l1'], items=['ItemA'], minor_axis=['B']) assert_panel4d_equal(result, expected) # non-existant partial selection - result = store.select('p4d', [Term( - 'labels=l1'), Term('items=Item1'), Term('minor_axis=B')]) + result = store.select( + 'p4d', "labels='l1' and items='Item1' and minor_axis='B'") expected = p4d.reindex(labels=['l1'], items=[], minor_axis=['B']) assert_panel4d_equal(result, expected) @@ 
-1306,106 +1380,109 @@ def check_indexers(key, indexers): def test_append_with_strings(self): with ensure_clean_store(self.path) as store: - wp = tm.makePanel() - wp2 = wp.rename_axis( - dict([(x, "%s_extra" % x) for x in wp.minor_axis]), axis=2) - - def check_col(key, name, size): - self.assertEqual(getattr(store.get_storer( - key).table.description, name).itemsize, size) - - store.append('s1', wp, min_itemsize=20) - store.append('s1', wp2) - expected = concat([wp, wp2], axis=2) - expected = expected.reindex(minor_axis=sorted(expected.minor_axis)) - assert_panel_equal(store['s1'], expected) - check_col('s1', 'minor_axis', 20) - - # test dict format - store.append('s2', wp, min_itemsize={'minor_axis': 20}) - store.append('s2', wp2) - expected = concat([wp, wp2], axis=2) - expected = expected.reindex(minor_axis=sorted(expected.minor_axis)) - assert_panel_equal(store['s2'], expected) - check_col('s2', 'minor_axis', 20) - - # apply the wrong field (similar to #1) - store.append('s3', wp, min_itemsize={'major_axis': 20}) - self.assertRaises(ValueError, store.append, 's3', wp2) - - # test truncation of bigger strings - store.append('s4', wp) - self.assertRaises(ValueError, store.append, 's4', wp2) - - # avoid truncation on elements - df = DataFrame([[123, 'asdqwerty'], [345, 'dggnhebbsdfbdfb']]) - store.append('df_big', df) - tm.assert_frame_equal(store.select('df_big'), df) - check_col('df_big', 'values_block_1', 15) - - # appending smaller string ok - df2 = DataFrame([[124, 'asdqy'], [346, 'dggnhefbdfb']]) - store.append('df_big', df2) - expected = concat([df, df2]) - tm.assert_frame_equal(store.select('df_big'), expected) - check_col('df_big', 'values_block_1', 15) - - # avoid truncation on elements - df = DataFrame([[123, 'asdqwerty'], [345, 'dggnhebbsdfbdfb']]) - store.append('df_big2', df, min_itemsize={'values': 50}) - tm.assert_frame_equal(store.select('df_big2'), df) - check_col('df_big2', 'values_block_1', 50) - - # bigger string on next append - store.append('df_new', df) - df_new = DataFrame( - [[124, 'abcdefqhij'], [346, 'abcdefghijklmnopqrtsuvwxyz']]) - self.assertRaises(ValueError, store.append, 'df_new', df_new) - - # min_itemsize on Series index (GH 11412) - df = tm.makeMixedDataFrame().set_index('C') - store.append('ss', df['B'], min_itemsize={'index': 4}) - tm.assert_series_equal(store.select('ss'), df['B']) - - # same as above, with data_columns=True - store.append('ss2', df['B'], data_columns=True, - min_itemsize={'index': 4}) - tm.assert_series_equal(store.select('ss2'), df['B']) - - # min_itemsize in index without appending (GH 10381) - store.put('ss3', df, format='table', - min_itemsize={'index': 6}) - # just make sure there is a longer string: - df2 = df.copy().reset_index().assign(C='longer').set_index('C') - store.append('ss3', df2) - tm.assert_frame_equal(store.select('ss3'), - pd.concat([df, df2])) - - # same as above, with a Series - store.put('ss4', df['B'], format='table', - min_itemsize={'index': 6}) - store.append('ss4', df2['B']) - tm.assert_series_equal(store.select('ss4'), - pd.concat([df['B'], df2['B']])) - - # with nans - _maybe_remove(store, 'df') - df = tm.makeTimeDataFrame() - df['string'] = 'foo' - df.loc[1:4, 'string'] = np.nan - df['string2'] = 'bar' - df.loc[4:8, 'string2'] = np.nan - df['string3'] = 'bah' - df.loc[1:, 'string3'] = np.nan - store.append('df', df) - result = store.select('df') - tm.assert_frame_equal(result, df) + with catch_warnings(record=True): + wp = tm.makePanel() + wp2 = wp.rename_axis( + dict([(x, "%s_extra" % x) for x in 
wp.minor_axis]), axis=2) + + def check_col(key, name, size): + assert getattr(store.get_storer(key) + .table.description, name).itemsize == size + + store.append('s1', wp, min_itemsize=20) + store.append('s1', wp2) + expected = concat([wp, wp2], axis=2) + expected = expected.reindex( + minor_axis=sorted(expected.minor_axis)) + assert_panel_equal(store['s1'], expected) + check_col('s1', 'minor_axis', 20) + + # test dict format + store.append('s2', wp, min_itemsize={'minor_axis': 20}) + store.append('s2', wp2) + expected = concat([wp, wp2], axis=2) + expected = expected.reindex( + minor_axis=sorted(expected.minor_axis)) + assert_panel_equal(store['s2'], expected) + check_col('s2', 'minor_axis', 20) + + # apply the wrong field (similar to #1) + store.append('s3', wp, min_itemsize={'major_axis': 20}) + pytest.raises(ValueError, store.append, 's3', wp2) + + # test truncation of bigger strings + store.append('s4', wp) + pytest.raises(ValueError, store.append, 's4', wp2) + + # avoid truncation on elements + df = DataFrame([[123, 'asdqwerty'], [345, 'dggnhebbsdfbdfb']]) + store.append('df_big', df) + tm.assert_frame_equal(store.select('df_big'), df) + check_col('df_big', 'values_block_1', 15) + + # appending smaller string ok + df2 = DataFrame([[124, 'asdqy'], [346, 'dggnhefbdfb']]) + store.append('df_big', df2) + expected = concat([df, df2]) + tm.assert_frame_equal(store.select('df_big'), expected) + check_col('df_big', 'values_block_1', 15) + + # avoid truncation on elements + df = DataFrame([[123, 'asdqwerty'], [345, 'dggnhebbsdfbdfb']]) + store.append('df_big2', df, min_itemsize={'values': 50}) + tm.assert_frame_equal(store.select('df_big2'), df) + check_col('df_big2', 'values_block_1', 50) + + # bigger string on next append + store.append('df_new', df) + df_new = DataFrame( + [[124, 'abcdefqhij'], [346, 'abcdefghijklmnopqrtsuvwxyz']]) + pytest.raises(ValueError, store.append, 'df_new', df_new) + + # min_itemsize on Series index (GH 11412) + df = tm.makeMixedDataFrame().set_index('C') + store.append('ss', df['B'], min_itemsize={'index': 4}) + tm.assert_series_equal(store.select('ss'), df['B']) + + # same as above, with data_columns=True + store.append('ss2', df['B'], data_columns=True, + min_itemsize={'index': 4}) + tm.assert_series_equal(store.select('ss2'), df['B']) + + # min_itemsize in index without appending (GH 10381) + store.put('ss3', df, format='table', + min_itemsize={'index': 6}) + # just make sure there is a longer string: + df2 = df.copy().reset_index().assign(C='longer').set_index('C') + store.append('ss3', df2) + tm.assert_frame_equal(store.select('ss3'), + pd.concat([df, df2])) + + # same as above, with a Series + store.put('ss4', df['B'], format='table', + min_itemsize={'index': 6}) + store.append('ss4', df2['B']) + tm.assert_series_equal(store.select('ss4'), + pd.concat([df['B'], df2['B']])) + + # with nans + _maybe_remove(store, 'df') + df = tm.makeTimeDataFrame() + df['string'] = 'foo' + df.loc[1:4, 'string'] = np.nan + df['string2'] = 'bar' + df.loc[4:8, 'string2'] = np.nan + df['string3'] = 'bah' + df.loc[1:, 'string3'] = np.nan + store.append('df', df) + result = store.select('df') + tm.assert_frame_equal(result, df) with ensure_clean_store(self.path) as store: def check_col(key, name, size): - self.assertEqual(getattr(store.get_storer( - key).table.description, name).itemsize, size) + assert getattr(store.get_storer(key) + .table.description, name).itemsize == size df = DataFrame(dict(A='foo', B='bar'), index=range(10)) @@ -1413,13 +1490,13 @@ def check_col(key, name, 
size): _maybe_remove(store, 'df') store.append('df', df, min_itemsize={'A': 200}) check_col('df', 'A', 200) - self.assertEqual(store.get_storer('df').data_columns, ['A']) + assert store.get_storer('df').data_columns == ['A'] # a min_itemsize that creates a data_column2 _maybe_remove(store, 'df') store.append('df', df, data_columns=['B'], min_itemsize={'A': 200}) check_col('df', 'A', 200) - self.assertEqual(store.get_storer('df').data_columns, ['B', 'A']) + assert store.get_storer('df').data_columns == ['B', 'A'] # a min_itemsize that creates a data_column2 _maybe_remove(store, 'df') @@ -1427,7 +1504,7 @@ def check_col(key, name, size): 'B'], min_itemsize={'values': 200}) check_col('df', 'B', 200) check_col('df', 'values_block_0', 200) - self.assertEqual(store.get_storer('df').data_columns, ['B']) + assert store.get_storer('df').data_columns == ['B'] # infer the .typ on subsequent appends _maybe_remove(store, 'df') @@ -1439,8 +1516,8 @@ def check_col(key, name, size): df = DataFrame(['foo', 'foo', 'foo', 'barh', 'barh', 'barh'], columns=['A']) _maybe_remove(store, 'df') - self.assertRaises(ValueError, store.append, 'df', - df, min_itemsize={'foo': 20, 'foobar': 20}) + pytest.raises(ValueError, store.append, 'df', + df, min_itemsize={'foo': 20, 'foobar': 20}) def test_to_hdf_with_min_itemsize(self): @@ -1477,13 +1554,13 @@ def test_append_with_data_columns(self): assert(store._handle.root.df.table.cols.B.is_indexed is True) # data column searching - result = store.select('df', [Term('B>0')]) + result = store.select('df', 'B>0') expected = df[df.B > 0] tm.assert_frame_equal(result, expected) # data column searching (with an indexable and a data_columns) result = store.select( - 'df', [Term('B>0'), Term('index>df.index[3]')]) + 'df', 'B>0 and index>df.index[3]') df_new = df.reindex(index=df.index[4:]) expected = df_new[df_new.B > 0] tm.assert_frame_equal(result, expected) @@ -1495,14 +1572,14 @@ def test_append_with_data_columns(self): df_new.loc[5:6, 'string'] = 'bar' _maybe_remove(store, 'df') store.append('df', df_new, data_columns=['string']) - result = store.select('df', [Term('string=foo')]) + result = store.select('df', "string='foo'") expected = df_new[df_new.string == 'foo'] tm.assert_frame_equal(result, expected) # using min_itemsize and a data column def check_col(key, name, size): - self.assertEqual(getattr(store.get_storer( - key).table.description, name).itemsize, size) + assert getattr(store.get_storer(key) + .table.description, name).itemsize == size with ensure_clean_store(self.path) as store: _maybe_remove(store, 'df') @@ -1548,15 +1625,15 @@ def check_col(key, name, size): _maybe_remove(store, 'df') store.append( 'df', df_new, data_columns=['A', 'B', 'string', 'string2']) - result = store.select('df', [Term('string=foo'), Term( - 'string2=foo'), Term('A>0'), Term('B<0')]) + result = store.select('df', + "string='foo' and string2='foo'" + " and A>0 and B<0") expected = df_new[(df_new.string == 'foo') & ( df_new.string2 == 'foo') & (df_new.A > 0) & (df_new.B < 0)] tm.assert_frame_equal(result, expected, check_index_type=False) # yield an empty frame - result = store.select('df', [Term('string=foo'), Term( - 'string2=cool')]) + result = store.select('df', "string='foo' and string2='cool'") expected = df_new[(df_new.string == 'foo') & ( df_new.string2 == 'cool')] tm.assert_frame_equal(result, expected, check_index_type=False) @@ -1576,7 +1653,7 @@ def check_col(key, name, size): store.append('df_dc', df_dc, data_columns=['B', 'C', 'string', 'string2', 'datetime']) - result = 
store.select('df_dc', [Term('B>0')])
+            result = store.select('df_dc', 'B>0')
             expected = df_dc[df_dc.B > 0]
             tm.assert_frame_equal(result, expected, check_index_type=False)
@@ -1603,7 +1680,7 @@ def check_col(key, name, size):
             store.append('df_dc', df_dc, data_columns=[
                 'B', 'C', 'string', 'string2'])
-            result = store.select('df_dc', [Term('B>0')])
+            result = store.select('df_dc', 'B>0')
             expected = df_dc[df_dc.B > 0]
             tm.assert_frame_equal(result, expected)
@@ -1614,98 +1691,103 @@ def check_col(key, name, size):
             tm.assert_frame_equal(result, expected)

         with ensure_clean_store(self.path) as store:
-            # panel
-            # GH5717 not handling data_columns
-            np.random.seed(1234)
-            p = tm.makePanel()
+            with catch_warnings(record=True):
+                # panel
+                # GH5717 not handling data_columns
+                np.random.seed(1234)
+                p = tm.makePanel()

-            store.append('p1', p)
-            tm.assert_panel_equal(store.select('p1'), p)
+                store.append('p1', p)
+                tm.assert_panel_equal(store.select('p1'), p)

-            store.append('p2', p, data_columns=True)
-            tm.assert_panel_equal(store.select('p2'), p)
+                store.append('p2', p, data_columns=True)
+                tm.assert_panel_equal(store.select('p2'), p)

-            result = store.select('p2', where='ItemA>0')
-            expected = p.to_frame()
-            expected = expected[expected['ItemA'] > 0]
-            tm.assert_frame_equal(result.to_frame(), expected)
+                result = store.select('p2', where='ItemA>0')
+                expected = p.to_frame()
+                expected = expected[expected['ItemA'] > 0]
+                tm.assert_frame_equal(result.to_frame(), expected)

-            result = store.select('p2', where='ItemA>0 & minor_axis=["A","B"]')
-            expected = p.to_frame()
-            expected = expected[expected['ItemA'] > 0]
-            expected = expected[expected.reset_index(
-                level=['major']).index.isin(['A', 'B'])]
-            tm.assert_frame_equal(result.to_frame(), expected)
+                result = store.select(
+                    'p2', where='ItemA>0 & minor_axis=["A","B"]')
+                expected = p.to_frame()
+                expected = expected[expected['ItemA'] > 0]
+                expected = expected[expected.reset_index(
+                    level=['major']).index.isin(['A', 'B'])]
+                tm.assert_frame_equal(result.to_frame(), expected)

     def test_create_table_index(self):

         with ensure_clean_store(self.path) as store:

-            def col(t, column):
-                return getattr(store.get_storer(t).table.cols, column)
+            with catch_warnings(record=True):
+                def col(t, column):
+                    return getattr(store.get_storer(t).table.cols, column)

-            # index=False
-            wp = tm.makePanel()
-            store.append('p5', wp, index=False)
-            store.create_table_index('p5', columns=['major_axis'])
-            assert(col('p5', 'major_axis').is_indexed is True)
-            assert(col('p5', 'minor_axis').is_indexed is False)
-
-            # index=True
-            store.append('p5i', wp, index=True)
-            assert(col('p5i', 'major_axis').is_indexed is True)
-            assert(col('p5i', 'minor_axis').is_indexed is True)
-
-            # default optlevels
-            store.get_storer('p5').create_index()
-            assert(col('p5', 'major_axis').index.optlevel == 6)
-            assert(col('p5', 'minor_axis').index.kind == 'medium')
-
-            # let's change the indexing scheme
-            store.create_table_index('p5')
-            assert(col('p5', 'major_axis').index.optlevel == 6)
-            assert(col('p5', 'minor_axis').index.kind == 'medium')
-            store.create_table_index('p5', optlevel=9)
-            assert(col('p5', 'major_axis').index.optlevel == 9)
-            assert(col('p5', 'minor_axis').index.kind == 'medium')
-            store.create_table_index('p5', kind='full')
-            assert(col('p5', 'major_axis').index.optlevel == 9)
-            assert(col('p5', 'minor_axis').index.kind == 'full')
-            store.create_table_index('p5', optlevel=1, kind='light')
-            assert(col('p5', 'major_axis').index.optlevel == 1)
-            assert(col('p5', 'minor_axis').index.kind == 'light')
-
-            # data columns
-            df = tm.makeTimeDataFrame()
-            df['string'] = 'foo'
-            df['string2'] = 'bar'
-            store.append('f', df, data_columns=['string', 'string2'])
-            assert(col('f', 'index').is_indexed is True)
-            assert(col('f', 'string').is_indexed is True)
-            assert(col('f', 'string2').is_indexed is True)
-
-            # specify index=columns
-            store.append(
-                'f2', df, index=['string'], data_columns=['string', 'string2'])
-            assert(col('f2', 'index').is_indexed is False)
-            assert(col('f2', 'string').is_indexed is True)
-            assert(col('f2', 'string2').is_indexed is False)
+                # index=False
+                wp = tm.makePanel()
+                store.append('p5', wp, index=False)
+                store.create_table_index('p5', columns=['major_axis'])
+                assert(col('p5', 'major_axis').is_indexed is True)
+                assert(col('p5', 'minor_axis').is_indexed is False)
+
+                # index=True
+                store.append('p5i', wp, index=True)
+                assert(col('p5i', 'major_axis').is_indexed is True)
+                assert(col('p5i', 'minor_axis').is_indexed is True)
+
+                # default optlevels
+                store.get_storer('p5').create_index()
+                assert(col('p5', 'major_axis').index.optlevel == 6)
+                assert(col('p5', 'minor_axis').index.kind == 'medium')
+
+                # let's change the indexing scheme
+                store.create_table_index('p5')
+                assert(col('p5', 'major_axis').index.optlevel == 6)
+                assert(col('p5', 'minor_axis').index.kind == 'medium')
+                store.create_table_index('p5', optlevel=9)
+                assert(col('p5', 'major_axis').index.optlevel == 9)
+                assert(col('p5', 'minor_axis').index.kind == 'medium')
+                store.create_table_index('p5', kind='full')
+                assert(col('p5', 'major_axis').index.optlevel == 9)
+                assert(col('p5', 'minor_axis').index.kind == 'full')
+                store.create_table_index('p5', optlevel=1, kind='light')
+                assert(col('p5', 'major_axis').index.optlevel == 1)
+                assert(col('p5', 'minor_axis').index.kind == 'light')
+
+                # data columns
+                df = tm.makeTimeDataFrame()
+                df['string'] = 'foo'
+                df['string2'] = 'bar'
+                store.append('f', df, data_columns=['string', 'string2'])
+                assert(col('f', 'index').is_indexed is True)
+                assert(col('f', 'string').is_indexed is True)
+                assert(col('f', 'string2').is_indexed is True)
+
+                # specify index=columns
+                store.append(
+                    'f2', df, index=['string'],
+                    data_columns=['string', 'string2'])
+                assert(col('f2', 'index').is_indexed is False)
+                assert(col('f2', 'string').is_indexed is True)
+                assert(col('f2', 'string2').is_indexed is False)

-            # try to index a non-table
-            _maybe_remove(store, 'f2')
-            store.put('f2', df)
-            self.assertRaises(TypeError, store.create_table_index, 'f2')
+                # try to index a non-table
+                _maybe_remove(store, 'f2')
+                store.put('f2', df)
+                pytest.raises(TypeError, store.create_table_index, 'f2')

     def test_append_diff_item_order(self):

-        wp = tm.makePanel()
-        wp1 = wp.iloc[:, :10, :]
-        wp2 = wp.iloc[wp.items.get_indexer(['ItemC', 'ItemB', 'ItemA']),
-                      10:, :]
+        with catch_warnings(record=True):
+            wp = tm.makePanel()
+            wp1 = wp.iloc[:, :10, :]
+            wp2 = wp.iloc[wp.items.get_indexer(['ItemC', 'ItemB', 'ItemA']),
+                          10:, :]

-        with ensure_clean_store(self.path) as store:
-            store.put('panel', wp1, format='table')
-            self.assertRaises(ValueError, store.put, 'panel', wp2,
+            with ensure_clean_store(self.path) as store:
+                store.put('panel', wp1, format='table')
+                pytest.raises(ValueError, store.put, 'panel', wp2,
                               append=True)

     def test_append_hierarchical(self):
@@ -1757,10 +1839,10 @@ def test_column_multiindex(self):
                                   check_index_type=True,
                                   check_column_type=True)

-            self.assertRaises(ValueError, store.put, 'df2', df,
-                              format='table', data_columns=['A'])
-            self.assertRaises(ValueError, store.put, 'df3', df,
-                              format='table', data_columns=True)
+            pytest.raises(ValueError, store.put, 'df2', df,
+                          format='table', data_columns=['A'])
+            pytest.raises(ValueError, store.put, 'df3', df,
+                          format='table', data_columns=True)

         # appending multi-column on existing table (see GH 6167)
         with ensure_clean_store(self.path) as store:
@@ -1823,13 +1905,13 @@ def make_index(names=None):
             _maybe_remove(store, 'df')
             df = DataFrame(np.zeros((12, 2)), columns=[
                            'a', 'b'], index=make_index(['date', 'a', 't']))
-            self.assertRaises(ValueError, store.append, 'df', df)
+            pytest.raises(ValueError, store.append, 'df', df)

             # dup within level
             _maybe_remove(store, 'df')
             df = DataFrame(np.zeros((12, 2)), columns=['a', 'b'],
                            index=make_index(['date', 'date', 'date']))
-            self.assertRaises(ValueError, store.append, 'df', df)
+            pytest.raises(ValueError, store.append, 'df', df)

             # fully named
             _maybe_remove(store, 'df')
@@ -1888,26 +1970,25 @@ def test_pass_spec_to_storer(self):

         with ensure_clean_store(self.path) as store:
             store.put('df', df)
-            self.assertRaises(TypeError, store.select, 'df', columns=['A'])
-            self.assertRaises(TypeError, store.select,
-                              'df', where=[('columns=A')])
+            pytest.raises(TypeError, store.select, 'df', columns=['A'])
+            pytest.raises(TypeError, store.select,
+                          'df', where=[('columns=A')])

     def test_append_misc(self):

         with ensure_clean_store(self.path) as store:

-            with tm.assert_produces_warning(FutureWarning,
-                                            check_stacklevel=False):
+            with catch_warnings(record=True):

                 # unsupported data types for non-tables
                 p4d = tm.makePanel4D()
-                self.assertRaises(TypeError, store.put, 'p4d', p4d)
+                pytest.raises(TypeError, store.put, 'p4d', p4d)

             # unsupported data types
-            self.assertRaises(TypeError, store.put, 'abc', None)
-            self.assertRaises(TypeError, store.put, 'abc', '123')
-            self.assertRaises(TypeError, store.put, 'abc', 123)
-            self.assertRaises(TypeError, store.put, 'abc', np.arange(5))
+            pytest.raises(TypeError, store.put, 'abc', None)
+            pytest.raises(TypeError, store.put, 'abc', '123')
+            pytest.raises(TypeError, store.put, 'abc', 123)
+            pytest.raises(TypeError, store.put, 'abc', np.arange(5))

             df = tm.makeDataFrame()
             store.append('df', df, chunksize=1)
@@ -1930,15 +2011,16 @@ def check(obj, comparator):
         df['string'] = 'foo'
         df['float322'] = 1.
         df['float322'] = df['float322'].astype('float32')
-        df['bool'] = df['float322'] > 0
+        df['boolean'] = df['float322'] > 0
         df['time1'] = Timestamp('20130101')
         df['time2'] = Timestamp('20130102')
         check(df, tm.assert_frame_equal)

-        p = tm.makePanel()
-        check(p, assert_panel_equal)
+        with catch_warnings(record=True):
+            p = tm.makePanel()
+            check(p, assert_panel_equal)

-        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+        with catch_warnings(record=True):
             p4d = tm.makePanel4D()
             check(p4d, assert_panel4d_equal)
@@ -1948,7 +2030,7 @@ def check(obj, comparator):
             # 0 len
             df_empty = DataFrame(columns=list('ABC'))
             store.append('df', df_empty)
-            self.assertRaises(KeyError, store.select, 'df')
+            pytest.raises(KeyError, store.select, 'df')

             # repeated append of 0/non-zero frames
             df = DataFrame(np.random.rand(10, 3), columns=list('ABC'))
@@ -1962,21 +2044,23 @@ def check(obj, comparator):
             store.put('df2', df)
             assert_frame_equal(store.select('df2'), df)

-            # 0 len
-            p_empty = Panel(items=list('ABC'))
-            store.append('p', p_empty)
-            self.assertRaises(KeyError, store.select, 'p')
+            with catch_warnings(record=True):

-            # repeated append of 0/non-zero frames
-            p = Panel(np.random.randn(3, 4, 5), items=list('ABC'))
-            store.append('p', p)
-            assert_panel_equal(store.select('p'), p)
-            store.append('p', p_empty)
-            assert_panel_equal(store.select('p'), p)
+                # 0 len
+                p_empty = Panel(items=list('ABC'))
+                store.append('p', p_empty)
+                pytest.raises(KeyError, store.select, 'p')

-            # store
-            store.put('p2', p_empty)
-            assert_panel_equal(store.select('p2'), p_empty)
+                # repeated append of 0/non-zero frames
+                p = Panel(np.random.randn(3, 4, 5), items=list('ABC'))
+                store.append('p', p)
+                assert_panel_equal(store.select('p'), p)
+                store.append('p', p_empty)
+                assert_panel_equal(store.select('p'), p)
+
+                # store
+                store.put('p2', p_empty)
+                assert_panel_equal(store.select('p2'), p_empty)

     def test_append_raise(self):
@@ -1987,13 +2071,13 @@ def test_append_raise(self):
             # list in column
             df = tm.makeDataFrame()
             df['invalid'] = [['a']] * len(df)
-            self.assertEqual(df.dtypes['invalid'], np.object_)
-            self.assertRaises(TypeError, store.append, 'df', df)
+            assert df.dtypes['invalid'] == np.object_
+            pytest.raises(TypeError, store.append, 'df', df)

             # multiple invalid columns
             df['invalid2'] = [['a']] * len(df)
             df['invalid3'] = [['a']] * len(df)
-            self.assertRaises(TypeError, store.append, 'df', df)
+            pytest.raises(TypeError, store.append, 'df', df)

             # datetime with embedded nans as object
             df = tm.makeDataFrame()
@@ -2001,22 +2085,22 @@ def test_append_raise(self):
             s = s.astype(object)
             s[0:5] = np.nan
             df['invalid'] = s
-            self.assertEqual(df.dtypes['invalid'], np.object_)
-            self.assertRaises(TypeError, store.append, 'df', df)
+            assert df.dtypes['invalid'] == np.object_
+            pytest.raises(TypeError, store.append, 'df', df)

             # directly ndarray
-            self.assertRaises(TypeError, store.append, 'df', np.arange(10))
+            pytest.raises(TypeError, store.append, 'df', np.arange(10))

             # series directly
-            self.assertRaises(TypeError, store.append,
-                              'df', Series(np.arange(10)))
+            pytest.raises(TypeError, store.append,
+                          'df', Series(np.arange(10)))

             # appending an incompatible table
             df = tm.makeDataFrame()
             store.append('df', df)

             df['foo'] = 'foo'
-            self.assertRaises(ValueError, store.append, 'df', df)
+            pytest.raises(ValueError, store.append, 'df', df)

     def test_table_index_incompatible_dtypes(self):
         df1 = DataFrame({'a': [1, 2, 3]})
@@ -2025,8 +2109,8 @@ def test_table_index_incompatible_dtypes(self):
         with ensure_clean_store(self.path) as store:
             store.put('frame', df1, format='table')
-            self.assertRaises(TypeError, store.put, 'frame', df2,
-                              format='table', append=True)
+            pytest.raises(TypeError, store.put, 'frame', df2,
+                          format='table', append=True)

     def test_table_values_dtypes_roundtrip(self):
@@ -2040,7 +2124,7 @@ def test_table_values_dtypes_roundtrip(self):
             assert_series_equal(df2.dtypes, store['df_i8'].dtypes)

             # incompatible dtype
-            self.assertRaises(ValueError, store.append, 'df_i8', df1)
+            pytest.raises(ValueError, store.append, 'df_i8', df1)

             # check creation/storage/retrieval of float32 (a bit hacky to
             # actually create them though)
@@ -2057,7 +2141,7 @@ def test_table_values_dtypes_roundtrip(self):
             df1['string'] = 'foo'
             df1['float322'] = 1.
             df1['float322'] = df1['float322'].astype('float32')
-            df1['bool'] = df1['float32'] > 0
+            df1['boolean'] = df1['float32'] > 0
             df1['time1'] = Timestamp('20130101')
             df1['time2'] = Timestamp('20130102')
@@ -2066,8 +2150,8 @@ def test_table_values_dtypes_roundtrip(self):
             expected = Series({'float32': 2, 'float64': 1, 'int32': 1,
                                'bool': 1, 'int16': 1, 'int8': 1,
                                'int64': 1, 'object': 1, 'datetime64[ns]': 2})
-            result.sort()
-            expected.sort()
+            result = result.sort_index()
+            expected = expected.sort_index()
             tm.assert_series_equal(result, expected)

     def test_table_mixed_dtypes(self):
@@ -2086,27 +2170,31 @@ def test_table_mixed_dtypes(self):
         df['datetime1'] = datetime.datetime(2001, 1, 2, 0, 0)
         df['datetime2'] = datetime.datetime(2001, 1, 3, 0, 0)
         df.loc[3:6, ['obj1']] = np.nan
-        df = df.consolidate()._convert(datetime=True)
+        df = df._consolidate()._convert(datetime=True)

         with ensure_clean_store(self.path) as store:
             store.append('df1_mixed', df)
             tm.assert_frame_equal(store.select('df1_mixed'), df)

-        # panel
-        wp = tm.makePanel()
-        wp['obj1'] = 'foo'
-        wp['obj2'] = 'bar'
-        wp['bool1'] = wp['ItemA'] > 0
-        wp['bool2'] = wp['ItemB'] > 0
-        wp['int1'] = 1
-        wp['int2'] = 2
-        wp = wp.consolidate()
+        with catch_warnings(record=True):

-        with ensure_clean_store(self.path) as store:
-            store.append('p1_mixed', wp)
-            assert_panel_equal(store.select('p1_mixed'), wp)
+            # panel
+            wp = tm.makePanel()
+            wp['obj1'] = 'foo'
+            wp['obj2'] = 'bar'
+            wp['bool1'] = wp['ItemA'] > 0
+            wp['bool2'] = wp['ItemB'] > 0
+            wp['int1'] = 1
+            wp['int2'] = 2
+            wp = wp._consolidate()
+
+            with catch_warnings(record=True):

-        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+                with ensure_clean_store(self.path) as store:
+                    store.append('p1_mixed', wp)
+                    assert_panel_equal(store.select('p1_mixed'), wp)
+
+        with catch_warnings(record=True):
             # ndim
             wp = tm.makePanel4D()
             wp['obj1'] = 'foo'
@@ -2115,7 +2203,7 @@ def test_table_mixed_dtypes(self):
             wp['bool2'] = wp['l2'] > 0
             wp['int1'] = 1
             wp['int2'] = 2
-            wp = wp.consolidate()
+            wp = wp._consolidate()

             with ensure_clean_store(self.path) as store:
                 store.append('p4d_mixed', wp)
@@ -2135,7 +2223,7 @@ def test_unimplemented_dtypes_table_columns(self):
             for n, f in l:
                 df = tm.makeDataFrame()
                 df[n] = f
-                self.assertRaises(
+                pytest.raises(
                     TypeError, store.append, 'df1_%s' % n, df)

         # frame
@@ -2143,11 +2231,11 @@ def test_unimplemented_dtypes_table_columns(self):
         df['obj1'] = 'foo'
         df['obj2'] = 'bar'
         df['datetime1'] = datetime.date(2001, 1, 2)
-        df = df.consolidate()._convert(datetime=True)
+        df = df._consolidate()._convert(datetime=True)

         with ensure_clean_store(self.path) as store:
             # this fails because we have a date in the object block......
-            self.assertRaises(TypeError, store.append, 'df_unimplemented', df)
+            pytest.raises(TypeError, store.append, 'df_unimplemented', df)

     def test_calendar_roundtrip_issue(self):
@@ -2178,7 +2266,6 @@ def test_append_with_timedelta(self):

         # GH 3577
         # append timedelta
-        from datetime import timedelta
         df = DataFrame(dict(A=Timestamp('20130101'), B=[Timestamp(
             '20130101') + timedelta(days=i, seconds=10) for i in range(10)]))
         df['C'] = df['A'] - df['B']
@@ -2192,10 +2279,10 @@ def test_append_with_timedelta(self):
             result = store.select('df')
             assert_frame_equal(result, df)

-            result = store.select('df', Term("C<100000"))
+            result = store.select('df', where="C<100000")
             assert_frame_equal(result, df)

-            result = store.select('df', Term("C", "<", -3 * 86400))
+            result = store.select('df', where="Cfoo')
-            self.assertRaises(KeyError, store.remove, 'a', [crit1])
+            with catch_warnings(record=True):

-            # try to remove non-table (with crit)
-            # non-table ok (where = None)
-            wp = tm.makePanel(30)
-            store.put('wp', wp, format='table')
-            store.remove('wp', ["minor_axis=['A', 'D']"])
-            rs = store.select('wp')
-            expected = wp.reindex(minor_axis=['B', 'C'])
-            assert_panel_equal(rs, expected)
+                # non-existence
+                crit1 = 'index>foo'
+                pytest.raises(KeyError, store.remove, 'a', [crit1])

-            # empty where
-            _maybe_remove(store, 'wp')
-            store.put('wp', wp, format='table')
+                # try to remove non-table (with crit)
+                # non-table ok (where = None)
+                wp = tm.makePanel(30)
+                store.put('wp', wp, format='table')
+                store.remove('wp', ["minor_axis=['A', 'D']"])
+                rs = store.select('wp')
+                expected = wp.reindex(minor_axis=['B', 'C'])
+                assert_panel_equal(rs, expected)

-            # deleted number (entire table)
-            n = store.remove('wp', [])
-            self.assertTrue(n == 120)
+                # empty where
+                _maybe_remove(store, 'wp')
+                store.put('wp', wp, format='table')

-            # non - empty where
-            _maybe_remove(store, 'wp')
-            store.put('wp', wp, format='table')
-            self.assertRaises(ValueError, store.remove,
-                              'wp', ['foo'])
+                # deleted number (entire table)
+                n = store.remove('wp', [])
+                assert n == 120

-            # selectin non-table with a where
-            # store.put('wp2', wp, format='f')
-            # self.assertRaises(ValueError, store.remove,
-            #                   'wp2', [('column', ['A', 'D'])])
+                # non-empty where
+                _maybe_remove(store, 'wp')
+                store.put('wp', wp, format='table')
+                pytest.raises(ValueError, store.remove,
+                              'wp', ['foo'])

     def test_remove_startstop(self):
         # GH #4835 and #6177

         with ensure_clean_store(self.path) as store:

-            wp = tm.makePanel(30)
-
-            # start
-            _maybe_remove(store, 'wp1')
-            store.put('wp1', wp, format='t')
-            n = store.remove('wp1', start=32)
-            self.assertTrue(n == 120 - 32)
-            result = store.select('wp1')
-            expected = wp.reindex(major_axis=wp.major_axis[:32 // 4])
-            assert_panel_equal(result, expected)
-
-            _maybe_remove(store, 'wp2')
-            store.put('wp2', wp, format='t')
-            n = store.remove('wp2', start=-32)
-            self.assertTrue(n == 32)
-            result = store.select('wp2')
-            expected = wp.reindex(major_axis=wp.major_axis[:-32 // 4])
-            assert_panel_equal(result, expected)
-
-            # stop
-            _maybe_remove(store, 'wp3')
-            store.put('wp3', wp, format='t')
-            n = store.remove('wp3', stop=32)
-            self.assertTrue(n == 32)
-            result = store.select('wp3')
-            expected = wp.reindex(major_axis=wp.major_axis[32 // 4:])
-            assert_panel_equal(result, expected)
-
-            _maybe_remove(store, 'wp4')
-            store.put('wp4', wp, format='t')
-            n = store.remove('wp4', stop=-32)
-            self.assertTrue(n == 120 - 32)
-            result = store.select('wp4')
-            expected = wp.reindex(major_axis=wp.major_axis[-32 // 4:])
-            assert_panel_equal(result, expected)
-
-            # start n stop
-            _maybe_remove(store, 'wp5')
-            store.put('wp5', wp, format='t')
-            n = store.remove('wp5', start=16, stop=-16)
-            self.assertTrue(n == 120 - 32)
-            result = store.select('wp5')
-            expected = wp.reindex(major_axis=wp.major_axis[
-                :16 // 4].union(wp.major_axis[-16 // 4:]))
-            assert_panel_equal(result, expected)
-
-            _maybe_remove(store, 'wp6')
-            store.put('wp6', wp, format='t')
-            n = store.remove('wp6', start=16, stop=16)
-            self.assertTrue(n == 0)
-            result = store.select('wp6')
-            expected = wp.reindex(major_axis=wp.major_axis)
-            assert_panel_equal(result, expected)
-
-            # with where
-            _maybe_remove(store, 'wp7')
-
-            # TODO: unused?
-            date = wp.major_axis.take(np.arange(0, 30, 3))  # noqa
-
-            crit = Term('major_axis=date')
-            store.put('wp7', wp, format='t')
-            n = store.remove('wp7', where=[crit], stop=80)
-            self.assertTrue(n == 28)
-            result = store.select('wp7')
-            expected = wp.reindex(major_axis=wp.major_axis.difference(
-                wp.major_axis[np.arange(0, 20, 3)]))
-            assert_panel_equal(result, expected)
+            with catch_warnings(record=True):
+                wp = tm.makePanel(30)
+
+                # start
+                _maybe_remove(store, 'wp1')
+                store.put('wp1', wp, format='t')
+                n = store.remove('wp1', start=32)
+                assert n == 120 - 32
+                result = store.select('wp1')
+                expected = wp.reindex(major_axis=wp.major_axis[:32 // 4])
+                assert_panel_equal(result, expected)
+
+                _maybe_remove(store, 'wp2')
+                store.put('wp2', wp, format='t')
+                n = store.remove('wp2', start=-32)
+                assert n == 32
+                result = store.select('wp2')
+                expected = wp.reindex(major_axis=wp.major_axis[:-32 // 4])
+                assert_panel_equal(result, expected)
+
+                # stop
+                _maybe_remove(store, 'wp3')
+                store.put('wp3', wp, format='t')
+                n = store.remove('wp3', stop=32)
+                assert n == 32
+                result = store.select('wp3')
+                expected = wp.reindex(major_axis=wp.major_axis[32 // 4:])
+                assert_panel_equal(result, expected)
+
+                _maybe_remove(store, 'wp4')
+                store.put('wp4', wp, format='t')
+                n = store.remove('wp4', stop=-32)
+                assert n == 120 - 32
+                result = store.select('wp4')
+                expected = wp.reindex(major_axis=wp.major_axis[-32 // 4:])
+                assert_panel_equal(result, expected)
+
+                # start and stop
+                _maybe_remove(store, 'wp5')
+                store.put('wp5', wp, format='t')
+                n = store.remove('wp5', start=16, stop=-16)
+                assert n == 120 - 32
+                result = store.select('wp5')
+                expected = wp.reindex(
+                    major_axis=(wp.major_axis[:16 // 4]
+                                .union(wp.major_axis[-16 // 4:])))
+                assert_panel_equal(result, expected)
+
+                _maybe_remove(store, 'wp6')
+                store.put('wp6', wp, format='t')
+                n = store.remove('wp6', start=16, stop=16)
+                assert n == 0
+                result = store.select('wp6')
+                expected = wp.reindex(major_axis=wp.major_axis)
+                assert_panel_equal(result, expected)
+
+                # with where
+                _maybe_remove(store, 'wp7')
+
+                # TODO: unused?
+                date = wp.major_axis.take(np.arange(0, 30, 3))  # noqa
+
+                crit = 'major_axis=date'
+                store.put('wp7', wp, format='t')
+                n = store.remove('wp7', where=[crit], stop=80)
+                assert n == 28
+                result = store.select('wp7')
+                expected = wp.reindex(major_axis=wp.major_axis.difference(
+                    wp.major_axis[np.arange(0, 20, 3)]))
+                assert_panel_equal(result, expected)

     def test_remove_crit(self):

         with ensure_clean_store(self.path) as store:

-            wp = tm.makePanel(30)
-
-            # group row removal
-            _maybe_remove(store, 'wp3')
-            date4 = wp.major_axis.take([0, 1, 2, 4, 5, 6, 8, 9, 10])
-            crit4 = Term('major_axis=date4')
-            store.put('wp3', wp, format='t')
-            n = store.remove('wp3', where=[crit4])
-            self.assertTrue(n == 36)
-
-            result = store.select('wp3')
-            expected = wp.reindex(major_axis=wp.major_axis.difference(date4))
-            assert_panel_equal(result, expected)
-
-            # upper half
-            _maybe_remove(store, 'wp')
-            store.put('wp', wp, format='table')
-            date = wp.major_axis[len(wp.major_axis) // 2]
-
-            crit1 = Term('major_axis>date')
-            crit2 = Term("minor_axis=['A', 'D']")
-            n = store.remove('wp', where=[crit1])
-            self.assertTrue(n == 56)
-
-            n = store.remove('wp', where=[crit2])
-            self.assertTrue(n == 32)
-
-            result = store['wp']
-            expected = wp.truncate(after=date).reindex(minor=['B', 'C'])
-            assert_panel_equal(result, expected)
-
-            # individual row elements
-            _maybe_remove(store, 'wp2')
-            store.put('wp2', wp, format='table')
-
-            date1 = wp.major_axis[1:3]
-            crit1 = Term('major_axis=date1')
-            store.remove('wp2', where=[crit1])
-            result = store.select('wp2')
-            expected = wp.reindex(major_axis=wp.major_axis.difference(date1))
-            assert_panel_equal(result, expected)
-
-            date2 = wp.major_axis[5]
-            crit2 = Term('major_axis=date2')
-            store.remove('wp2', where=[crit2])
-            result = store['wp2']
-            expected = wp.reindex(major_axis=wp.major_axis.difference(date1)
-                                  .difference(Index([date2])))
-            assert_panel_equal(result, expected)
-
-            date3 = [wp.major_axis[7], wp.major_axis[9]]
-            crit3 = Term('major_axis=date3')
-            store.remove('wp2', where=[crit3])
-            result = store['wp2']
-            expected = wp.reindex(major_axis=wp.major_axis
-                                  .difference(date1)
-                                  .difference(Index([date2]))
-                                  .difference(Index(date3)))
-            assert_panel_equal(result, expected)
-
-            # corners
-            _maybe_remove(store, 'wp4')
-            store.put('wp4', wp, format='table')
-            n = store.remove(
-                'wp4', where=[Term('major_axis>wp.major_axis[-1]')])
-            result = store.select('wp4')
-            assert_panel_equal(result, wp)
+            with catch_warnings(record=True):
+                wp = tm.makePanel(30)
+
+                # group row removal
+                _maybe_remove(store, 'wp3')
+                date4 = wp.major_axis.take([0, 1, 2, 4, 5, 6, 8, 9, 10])
+                crit4 = 'major_axis=date4'
+                store.put('wp3', wp, format='t')
+                n = store.remove('wp3', where=[crit4])
+                assert n == 36
+
+                result = store.select('wp3')
+                expected = wp.reindex(
+                    major_axis=wp.major_axis.difference(date4))
+                assert_panel_equal(result, expected)
+
+                # upper half
+                _maybe_remove(store, 'wp')
+                store.put('wp', wp, format='table')
+                date = wp.major_axis[len(wp.major_axis) // 2]
+
+                crit1 = 'major_axis>date'
+                crit2 = "minor_axis=['A', 'D']"
+                n = store.remove('wp', where=[crit1])
+                assert n == 56
+
+                n = store.remove('wp', where=[crit2])
+                assert n == 32
+
+                result = store['wp']
+                expected = wp.truncate(after=date).reindex(minor=['B', 'C'])
+                assert_panel_equal(result, expected)
+
+                # individual row elements
+                _maybe_remove(store, 'wp2')
+                store.put('wp2', wp, format='table')
+
+                date1 = wp.major_axis[1:3]
+                crit1 = 'major_axis=date1'
+                store.remove('wp2', where=[crit1])
+                result = store.select('wp2')
+                expected = wp.reindex(
+                    major_axis=wp.major_axis.difference(date1))
+                assert_panel_equal(result, expected)
+
+                date2 = wp.major_axis[5]
+                crit2 = 'major_axis=date2'
+                store.remove('wp2', where=[crit2])
+                result = store['wp2']
+                expected = wp.reindex(
+                    major_axis=(wp.major_axis
+                                .difference(date1)
+                                .difference(Index([date2]))
+                                ))
+                assert_panel_equal(result, expected)
+
+                date3 = [wp.major_axis[7], wp.major_axis[9]]
+                crit3 = 'major_axis=date3'
+                store.remove('wp2', where=[crit3])
+                result = store['wp2']
+                expected = wp.reindex(major_axis=wp.major_axis
+                                      .difference(date1)
+                                      .difference(Index([date2]))
+                                      .difference(Index(date3)))
+                assert_panel_equal(result, expected)
+
+                # corners
+                _maybe_remove(store, 'wp4')
+                store.put('wp4', wp, format='table')
+                n = store.remove(
+                    'wp4', where="major_axis>wp.major_axis[-1]")
+                result = store.select('wp4')
+                assert_panel_equal(result, wp)

     def test_invalid_terms(self):

         with ensure_clean_store(self.path) as store:

-            with compat_assert_produces_warning(FutureWarning):
+            with catch_warnings(record=True):

                 df = tm.makeTimeDataFrame()
                 df['string'] = 'foo'
@@ -2453,19 +2545,19 @@ def test_invalid_terms(self):
                 store.put('p4d', p4d, format='table')

                 # some invalid terms
-                self.assertRaises(ValueError, store.select,
-                                  'wp', "minor=['A', 'B']")
-                self.assertRaises(ValueError, store.select,
-                                  'wp', ["index=['20121114']"])
-                self.assertRaises(ValueError, store.select, 'wp', [
+                pytest.raises(ValueError, store.select,
+                              'wp', "minor=['A', 'B']")
+                pytest.raises(ValueError, store.select,
+                              'wp', ["index=['20121114']"])
+                pytest.raises(ValueError, store.select, 'wp', [
                     "index=['20121114', '20121114']"])
-                self.assertRaises(TypeError, Term)
+                pytest.raises(TypeError, Term)

                 # more invalid
-                self.assertRaises(
+                pytest.raises(
                     ValueError, store.select, 'df', 'df.index[3]')
-                self.assertRaises(SyntaxError, store.select, 'df', 'index>')
-                self.assertRaises(
+                pytest.raises(SyntaxError, store.select, 'df', 'index>')
+                pytest.raises(
                     ValueError, store.select, 'wp',
                     "major_axis<'20000108' & minor_axis['A', 'B']")
@@ -2486,60 +2578,52 @@ def test_invalid_terms(self):
                            'ABCD'), index=date_range('20130101', periods=10))
             dfq.to_hdf(path, 'dfq', format='table')

-            self.assertRaises(ValueError, read_hdf, path,
-                              'dfq', where="A>0 or C>0")
+            pytest.raises(ValueError, read_hdf, path,
+                          'dfq', where="A>0 or C>0")

     def test_terms(self):

         with ensure_clean_store(self.path) as store:

-            wp = tm.makePanel()
-            wpneg = Panel.fromDict({-1: tm.makeDataFrame(),
-                                    0: tm.makeDataFrame(),
-                                    1: tm.makeDataFrame()})
-
-            with compat_assert_produces_warning(FutureWarning):
+            with catch_warnings(record=True):
+                wp = tm.makePanel()
+                wpneg = Panel.fromDict({-1: tm.makeDataFrame(),
+                                        0: tm.makeDataFrame(),
+                                        1: tm.makeDataFrame()})

                 p4d = tm.makePanel4D()
                 store.put('p4d', p4d, format='table')
-
-            store.put('wp', wp, format='table')
-            store.put('wpneg', wpneg, format='table')
-
-            # panel
-            result = store.select('wp', [Term(
-                'major_axis<"20000108"'), Term("minor_axis=['A', 'B']")])
-            expected = wp.truncate(after='20000108').reindex(minor=['A', 'B'])
-            assert_panel_equal(result, expected)
-
-            # with deprecation
-            result = store.select('wp', [Term(
-                'major_axis', '<', "20000108"), Term("minor_axis=['A', 'B']")])
-            expected = wp.truncate(after='20000108').reindex(minor=['A', 'B'])
-            tm.assert_panel_equal(result, expected)
+                store.put('wp', wp, format='table')
+                store.put('wpneg', wpneg, format='table')
+
+                # panel
+                result = store.select(
+                    'wp',
+                    "major_axis<'20000108' and minor_axis=['A', 'B']")
+                expected = wp.truncate(
+                    after='20000108').reindex(minor=['A', 'B'])
+                assert_panel_equal(result, expected)
+
+                # with deprecation
+                result = store.select(
+                    'wp', where=("major_axis<'20000108' "
+                                 "and minor_axis=['A', 'B']"))
+                expected = wp.truncate(
+                    after='20000108').reindex(minor=['A', 'B'])
+                tm.assert_panel_equal(result, expected)

             # p4d
-            with compat_assert_produces_warning(FutureWarning):
+            with catch_warnings(record=True):

                 result = store.select('p4d',
-                                      [Term('major_axis<"20000108"'),
-                                       Term("minor_axis=['A', 'B']"),
-                                       Term("items=['ItemA', 'ItemB']")])
+                                      ("major_axis<'20000108' and "
+                                       "minor_axis=['A', 'B'] and "
+                                       "items=['ItemA', 'ItemB']"))
                 expected = p4d.truncate(after='20000108').reindex(
                     minor=['A', 'B'], items=['ItemA', 'ItemB'])
                 assert_panel4d_equal(result, expected)

-            # back compat invalid terms
-            terms = [dict(field='major_axis', op='>', value='20121114'),
-                     [dict(field='major_axis', op='>', value='20121114')],
-                     ["minor_axis=['A','B']",
-                      dict(field='major_axis', op='>', value='20121114')]]
-            for t in terms:
-                with tm.assert_produces_warning(expected_warning=FutureWarning,
-                                                check_stacklevel=False):
-                    Term(t)
-
-            with compat_assert_produces_warning(FutureWarning):
+            with catch_warnings(record=True):

                 # valid terms
                 terms = [('major_axis=20121114'),
@@ -2561,133 +2645,80 @@ def test_terms(self):
                     store.select('p4d', t)

                 # valid for p4d only
-                terms = [(("labels=['l1', 'l2']"),),
-                         Term("labels=['l1', 'l2']"),
-                         ]
-
+                terms = ["labels=['l1', 'l2']"]
                 for t in terms:
                     store.select('p4d', t)

-            with tm.assertRaisesRegexp(TypeError,
-                                       'Only named functions are supported'):
-                store.select('wp', Term(
-                    'major_axis == (lambda x: x)("20130101")'))
+            with tm.assert_raises_regex(
+                    TypeError, 'Only named functions are supported'):
+                store.select(
+                    'wp',
+                    'major_axis == (lambda x: x)("20130101")')

-            # check USub node parsing
-            res = store.select('wpneg', Term('items == -1'))
-            expected = Panel({-1: wpneg[-1]})
-            tm.assert_panel_equal(res, expected)
+            with catch_warnings(record=True):
+                # check USub node parsing
+                res = store.select('wpneg', 'items == -1')
+                expected = Panel({-1: wpneg[-1]})
+                tm.assert_panel_equal(res, expected)

-            with tm.assertRaisesRegexp(NotImplementedError,
-                                       'Unary addition not supported'):
-                store.select('wpneg', Term('items == +1'))
+                with tm.assert_raises_regex(NotImplementedError,
+                                            'Unary addition '
+                                            'not supported'):
+                    store.select('wpneg', 'items == +1')

     def test_term_compat(self):
         with ensure_clean_store(self.path) as store:

-            wp = Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'],
-                       major_axis=date_range('1/1/2000', periods=5),
-                       minor_axis=['A', 'B', 'C', 'D'])
-            store.append('wp', wp)
-
-            result = store.select('wp', [Term('major_axis>20000102'),
-                                         Term('minor_axis', '=', ['A', 'B'])])
-            expected = wp.loc[:, wp.major_axis >
-                              Timestamp('20000102'), ['A', 'B']]
-            assert_panel_equal(result, expected)
-
-            store.remove('wp', Term('major_axis>20000103'))
-            result = store.select('wp')
-            expected = wp.loc[:, wp.major_axis <= Timestamp('20000103'), :]
-            assert_panel_equal(result, expected)
-
-        with ensure_clean_store(self.path) as store:
-
-            wp = Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'],
-                       major_axis=date_range('1/1/2000', periods=5),
-                       minor_axis=['A', 'B', 'C', 'D'])
-            store.append('wp', wp)
-
-            # stringified datetimes
-            result = store.select(
-                'wp', [Term('major_axis', '>', datetime.datetime(2000, 1, 2))])
-            expected = wp.loc[:, wp.major_axis > Timestamp('20000102')]
-            assert_panel_equal(result, expected)
-
-            result = store.select(
-                'wp', [Term('major_axis', '>',
-                            datetime.datetime(2000, 1, 2, 0, 0))])
-            expected = wp.loc[:, wp.major_axis > Timestamp('20000102')]
-            assert_panel_equal(result, expected)
-
-            result = store.select(
-                'wp', [Term('major_axis', '=',
-                            [datetime.datetime(2000, 1, 2, 0, 0),
-                             datetime.datetime(2000, 1, 3, 0, 0)])])
-            expected = wp.loc[:, [Timestamp('20000102'),
-                                  Timestamp('20000103')]]
-            assert_panel_equal(result, expected)
-
-            result = store.select('wp', [Term('minor_axis', '=', ['A', 'B'])])
-            expected = wp.loc[:, :, ['A', 'B']]
-            assert_panel_equal(result, expected)
-
-    def test_backwards_compat_without_term_object(self):
-        with ensure_clean_store(self.path) as store:
-
-            wp = Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'],
-                       major_axis=date_range('1/1/2000', periods=5),
-                       minor_axis=['A', 'B', 'C', 'D'])
-            store.append('wp', wp)
-            with assert_produces_warning(expected_warning=FutureWarning,
-                                         check_stacklevel=False):
-                result = store.select('wp', [('major_axis>20000102'),
-                                             ('minor_axis', '=', ['A', 'B'])])
-            expected = wp.loc[:,
-                              wp.major_axis > Timestamp('20000102'),
-                              ['A', 'B']]
-            assert_panel_equal(result, expected)
-
-            store.remove('wp', ('major_axis>20000103'))
-            result = store.select('wp')
-            expected = wp.loc[:, wp.major_axis <= Timestamp('20000103'), :]
-            assert_panel_equal(result, expected)
-
-        with ensure_clean_store(self.path) as store:
-
-            wp = Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'],
-                       major_axis=date_range('1/1/2000', periods=5),
-                       minor_axis=['A', 'B', 'C', 'D'])
-            store.append('wp', wp)
-
-            # stringified datetimes
-            with assert_produces_warning(expected_warning=FutureWarning,
-                                         check_stacklevel=False):
-                result = store.select('wp',
-                                      [('major_axis',
-                                        '>',
-                                        datetime.datetime(2000, 1, 2))])
-            expected = wp.loc[:, wp.major_axis > Timestamp('20000102')]
-            assert_panel_equal(result, expected)
-            with assert_produces_warning(expected_warning=FutureWarning,
-                                         check_stacklevel=False):
-                result = store.select('wp',
-                                      [('major_axis',
-                                        '>',
-                                        datetime.datetime(2000, 1, 2, 0, 0))])
-            expected = wp.loc[:, wp.major_axis > Timestamp('20000102')]
-            assert_panel_equal(result, expected)
-            with assert_produces_warning(expected_warning=FutureWarning,
-                                         check_stacklevel=False):
-                result = store.select('wp',
-                                      [('major_axis',
-                                        '=',
-                                        [datetime.datetime(2000, 1, 2, 0, 0),
-                                         datetime.datetime(2000, 1, 3, 0, 0)])]
-                                      )
-            expected = wp.loc[:, [Timestamp('20000102'),
-                                  Timestamp('20000103')]]
-            assert_panel_equal(result, expected)
+            with catch_warnings(record=True):
+                wp = Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'],
+                           major_axis=date_range('1/1/2000', periods=5),
+                           minor_axis=['A', 'B', 'C', 'D'])
+                store.append('wp', wp)
+
+                result = store.select(
+                    'wp', where=("major_axis>20000102 "
+                                 "and minor_axis=['A', 'B']"))
+                expected = wp.loc[:, wp.major_axis >
+                                  Timestamp('20000102'), ['A', 'B']]
+                assert_panel_equal(result, expected)
+
+                store.remove('wp', 'major_axis>20000103')
+                result = store.select('wp')
+                expected = wp.loc[:, wp.major_axis <= Timestamp('20000103'), :]
+                assert_panel_equal(result, expected)
+
+        with ensure_clean_store(self.path) as store:
+
+            with catch_warnings(record=True):
+                wp = Panel(np.random.randn(2, 5, 4),
+                           items=['Item1', 'Item2'],
+                           major_axis=date_range('1/1/2000', periods=5),
+                           minor_axis=['A', 'B', 'C', 'D'])
+                store.append('wp', wp)
+
+                # stringified datetimes
+                result = store.select(
+                    'wp', 'major_axis>datetime.datetime(2000, 1, 2)')
+                expected = wp.loc[:, wp.major_axis > Timestamp('20000102')]
+                assert_panel_equal(result, expected)
+
+                result = store.select(
+                    'wp', 'major_axis>datetime.datetime(2000, 1, 2, 0, 0)')
+                expected = wp.loc[:, wp.major_axis > Timestamp('20000102')]
+                assert_panel_equal(result, expected)
+
+                result = store.select(
+                    'wp',
+                    "major_axis=[datetime.datetime(2000, 1, 2, 0, 0), "
+                    "datetime.datetime(2000, 1, 3, 0, 0)]")
+                expected = wp.loc[:, [Timestamp('20000102'),
+                                      Timestamp('20000103')]]
+                assert_panel_equal(result, expected)
+
+                result = store.select(
+                    'wp', "minor_axis=['A', 'B']")
+                expected = wp.loc[:, :, ['A', 'B']]
+                assert_panel_equal(result, expected)

     def test_same_name_scoping(self):
@@ -2777,68 +2808,73 @@ def test_tuple_index(self):
         data = np.random.randn(30).reshape((3, 10))
         DF = DataFrame(data, index=idx, columns=col)

-        expected_warning = Warning if PY35 else PerformanceWarning
-        with tm.assert_produces_warning(expected_warning=expected_warning,
-                                        check_stacklevel=False):
+        with catch_warnings(record=True):
             self._check_roundtrip(DF, tm.assert_frame_equal)

     def test_index_types(self):

-        values = np.random.randn(2)
+        with catch_warnings(record=True):
+            values = np.random.randn(2)

-        func = lambda l, r: tm.assert_series_equal(l, r,
-                                                   check_dtype=True,
-                                                   check_index_type=True,
-                                                   check_series_type=True)
+            func = lambda l, r: tm.assert_series_equal(l, r,
+                                                       check_dtype=True,
+                                                       check_index_type=True,
+                                                       check_series_type=True)
+
+        with catch_warnings(record=True):
+            ser = Series(values, [0, 'y'])
+            self._check_roundtrip(ser, func)
+
+        with catch_warnings(record=True):
+            ser = Series(values, [datetime.datetime.today(), 0])
+            self._check_roundtrip(ser, func)
+
+        with catch_warnings(record=True):
+            ser = Series(values, ['y', 0])
+            self._check_roundtrip(ser, func)
+
+        with catch_warnings(record=True):
+            ser = Series(values, [datetime.date.today(), 'a'])
+            self._check_roundtrip(ser, func)
+
+        with catch_warnings(record=True):

-        # nose has a deprecation warning in 3.5
-        expected_warning = Warning if PY35 else PerformanceWarning
-        with tm.assert_produces_warning(expected_warning=expected_warning,
-                                        check_stacklevel=False):
             ser = Series(values, [0, 'y'])
             self._check_roundtrip(ser, func)

-        with tm.assert_produces_warning(expected_warning=expected_warning,
-                                        check_stacklevel=False):
             ser = Series(values, [datetime.datetime.today(), 0])
             self._check_roundtrip(ser, func)

-        with tm.assert_produces_warning(expected_warning=expected_warning,
-                                        check_stacklevel=False):
             ser = Series(values, ['y', 0])
             self._check_roundtrip(ser, func)

-        with tm.assert_produces_warning(expected_warning=expected_warning,
-                                        check_stacklevel=False):
             ser = Series(values, [datetime.date.today(), 'a'])
             self._check_roundtrip(ser, func)

-        with tm.assert_produces_warning(expected_warning=expected_warning,
-                                        check_stacklevel=False):
             ser = Series(values, [1.23, 'b'])
             self._check_roundtrip(ser, func)

-        ser = Series(values, [1, 1.53])
-        self._check_roundtrip(ser, func)
+            ser = Series(values, [1, 1.53])
+            self._check_roundtrip(ser, func)

-        ser = Series(values, [1, 5])
-        self._check_roundtrip(ser, func)
+            ser = Series(values, [1, 5])
+            self._check_roundtrip(ser, func)

-        ser = Series(values, [datetime.datetime(
-            2012, 1, 1), datetime.datetime(2012, 1, 2)])
-        self._check_roundtrip(ser, func)
+            ser = Series(values, [datetime.datetime(
+                2012, 1, 1), datetime.datetime(2012, 1, 2)])
+            self._check_roundtrip(ser, func)

     def test_timeseries_preepoch(self):

         if sys.version_info[0] == 2 and sys.version_info[1] < 7:
-            raise nose.SkipTest("won't work on Python < 2.7")
+            pytest.skip("won't work on Python < 2.7")

         dr = bdate_range('1/1/1940', '1/1/1960')
         ts = Series(np.random.randn(len(dr)), index=dr)
         try:
             self._check_roundtrip(ts, tm.assert_series_equal)
         except OverflowError:
-            raise nose.SkipTest('known failer on some windows platforms')
+            pytest.skip('known failure on some windows platforms')

     def test_frame(self):
@@ -2869,7 +2905,7 @@ def test_frame(self):
         df['foo'] = np.random.randn(len(df))
         store['df'] = df
         recons = store['df']
-        self.assertTrue(recons._data.is_consolidated())
+        assert recons._data.is_consolidated()

         # empty
         self._check_roundtrip(df[:0], tm.assert_frame_equal)
@@ -2939,6 +2975,27 @@ def test_store_index_name_with_tz(self):
             recons = store['frame']
             tm.assert_frame_equal(recons, df)

+    @pytest.mark.parametrize('table_format', ['table', 'fixed'])
+    def test_store_index_name_numpy_str(self, table_format):
+        # GH #13492
+        idx = pd.Index(pd.to_datetime([datetime.date(2000, 1, 1),
+                                       datetime.date(2000, 1, 2)]),
+                       name=u('cols\u05d2'))
+        idx1 = pd.Index(pd.to_datetime([datetime.date(2010, 1, 1),
+                                        datetime.date(2010, 1, 2)]),
+                        name=u('rows\u05d0'))
+        df = pd.DataFrame(np.arange(4).reshape(2, 2), columns=idx, index=idx1)
+
+        # This used to fail, returning numpy strings instead of python strings.
+        with ensure_clean_path(self.path) as path:
+            df.to_hdf(path, 'df', format=table_format)
+            df2 = read_hdf(path, 'df')
+
+            assert_frame_equal(df, df2, check_names=True)
+
+            assert type(df2.index.name) == text_type
+            assert type(df2.columns.name) == text_type
+
     def test_store_series_name(self):
         df = tm.makeDataFrame()
         series = df['A']
@@ -2958,7 +3015,7 @@ def _make_one():
             df['bool2'] = df['B'] > 0
             df['int1'] = 1
             df['int2'] = 2
-            return df.consolidate()
+            return df._consolidate()

         df1 = _make_one()
         df2 = _make_one()
@@ -2989,13 +3046,9 @@ def _make_one():

     def test_wide(self):

-        wp = tm.makePanel()
-        self._check_roundtrip(wp, assert_panel_equal)
-
-    def test_wide_table(self):
-
-        wp = tm.makePanel()
-        self._check_roundtrip_table(wp, assert_panel_equal)
+        with catch_warnings(record=True):
+            wp = tm.makePanel()
+            self._check_roundtrip(wp, assert_panel_equal)

     def test_select_with_dups(self):
@@ -3057,25 +3110,24 @@ def test_select_with_dups(self):
             assert_frame_equal(result, expected, by_blocks=True)

     def test_wide_table_dups(self):
-        wp = tm.makePanel()
         with ensure_clean_store(self.path) as store:
-            store.put('panel', wp, format='table')
-            store.put('panel', wp, format='table', append=True)
+            with catch_warnings(record=True):
+
+                wp = tm.makePanel()
+                store.put('panel', wp, format='table')
+                store.put('panel', wp, format='table', append=True)

-            with tm.assert_produces_warning(expected_warning=DuplicateWarning):
                 recons = store['panel']

-            assert_panel_equal(recons, wp)
+                assert_panel_equal(recons, wp)

     def test_long(self):
         def _check(left, right):
             assert_panel_equal(left.to_panel(), right.to_panel())

-        wp = tm.makePanel()
-        self._check_roundtrip(wp.to_frame(), _check)
-
-        # empty
-        # self._check_roundtrip(wp.to_frame()[:0], _check)
+        with catch_warnings(record=True):
+            wp = tm.makePanel()
+            self._check_roundtrip(wp.to_frame(), _check)

     def test_longpanel(self):
         pass
@@ -3122,70 +3174,72 @@ def test_sparse_with_compression(self):
                                 check_frame_type=True)

     def test_select(self):

-        wp = tm.makePanel()
         with ensure_clean_store(self.path) as store:

-            # put/select ok
-            _maybe_remove(store, 'wp')
-            store.put('wp', wp, format='table')
-            store.select('wp')
-
-            # non-table ok (where = None)
-            _maybe_remove(store, 'wp')
-            store.put('wp2', wp)
-            store.select('wp2')
-
-            # selection on the non-indexable with a large number of columns
-            wp = Panel(np.random.randn(100, 100, 100),
-                       items=['Item%03d' % i for i in range(100)],
-                       major_axis=date_range('1/1/2000', periods=100),
-                       minor_axis=['E%03d' % i for i in range(100)])
-
-            _maybe_remove(store, 'wp')
-            store.append('wp', wp)
-            items = ['Item%03d' % i for i in range(80)]
-            result = store.select('wp', Term('items=items'))
-            expected = wp.reindex(items=items)
-            assert_panel_equal(expected, result)
-
-            # selectin non-table with a where
-            # self.assertRaises(ValueError, store.select,
-            #                   'wp2', ('column', ['A', 'D']))
+            with catch_warnings(record=True):
+                wp = tm.makePanel()

-            # select with columns=
-            df = tm.makeTimeDataFrame()
-            _maybe_remove(store, 'df')
-            store.append('df', df)
-            result = store.select('df', columns=['A', 'B'])
-            expected = df.reindex(columns=['A', 'B'])
-            tm.assert_frame_equal(expected, result)
+                # put/select ok
+                _maybe_remove(store, 'wp')
+                store.put('wp', wp, format='table')
+                store.select('wp')
+
+                # non-table ok (where = None)
+                _maybe_remove(store, 'wp')
+                store.put('wp2', wp)
+                store.select('wp2')
+
+                # selection on the non-indexable with a large number of columns
+                wp = Panel(np.random.randn(100, 100, 100),
+                           items=['Item%03d' % i for i in range(100)],
+                           major_axis=date_range('1/1/2000', periods=100),
+                           minor_axis=['E%03d' % i for i in range(100)])
+
+                _maybe_remove(store, 'wp')
+                store.append('wp', wp)
+                items = ['Item%03d' % i for i in range(80)]
+                result = store.select('wp', 'items=items')
+                expected = wp.reindex(items=items)
+                assert_panel_equal(expected, result)
+
+                # selecting non-table with a where
+                # pytest.raises(ValueError, store.select,
+                #               'wp2', ('column', ['A', 'D']))
+
+                # select with columns=
+                df = tm.makeTimeDataFrame()
+                _maybe_remove(store, 'df')
+                store.append('df', df)
+                result = store.select('df', columns=['A', 'B'])
+                expected = df.reindex(columns=['A', 'B'])
+                tm.assert_frame_equal(expected, result)

-            # equivalentsly
-            result = store.select('df', [("columns=['A', 'B']")])
-            expected = df.reindex(columns=['A', 'B'])
-            tm.assert_frame_equal(expected, result)
+                # equivalently
+                result = store.select('df', [("columns=['A', 'B']")])
+                expected = df.reindex(columns=['A', 'B'])
+                tm.assert_frame_equal(expected, result)

-            # with a data column
-            _maybe_remove(store, 'df')
-            store.append('df', df, data_columns=['A'])
-            result = store.select('df', ['A > 0'], columns=['A', 'B'])
-            expected = df[df.A > 0].reindex(columns=['A', 'B'])
-            tm.assert_frame_equal(expected, result)
+                # with a data column
+                _maybe_remove(store, 'df')
+                store.append('df', df, data_columns=['A'])
+                result = store.select('df', ['A > 0'], columns=['A', 'B'])
+                expected = df[df.A > 0].reindex(columns=['A', 'B'])
+                tm.assert_frame_equal(expected, result)

-            # all a data columns
-            _maybe_remove(store, 'df')
-            store.append('df', df, data_columns=True)
-            result = store.select('df', ['A > 0'], columns=['A', 'B'])
-            expected = df[df.A > 0].reindex(columns=['A', 'B'])
-            tm.assert_frame_equal(expected, result)
+                # all as data columns
+                _maybe_remove(store, 'df')
+                store.append('df', df, data_columns=True)
+                result = store.select('df', ['A > 0'], columns=['A', 'B'])
+                expected = df[df.A > 0].reindex(columns=['A', 'B'])
+                tm.assert_frame_equal(expected, result)

-            # with a data column, but different columns
-            _maybe_remove(store, 'df')
-            store.append('df', df, data_columns=['A'])
-            result = store.select('df', ['A > 0'], columns=['C', 'D'])
-            expected = df[df.A > 0].reindex(columns=['C', 'D'])
-            tm.assert_frame_equal(expected, result)
+                # with a data column, but different columns
+                _maybe_remove(store, 'df')
+                store.append('df', df, data_columns=['A'])
+                result = store.select('df', ['A > 0'], columns=['C', 'D'])
+                expected = df[df.A > 0].reindex(columns=['C', 'D'])
+                tm.assert_frame_equal(expected, result)

     def test_select_dtypes(self):
@@ -3197,7 +3251,7 @@ def test_select_dtypes(self):
             _maybe_remove(store, 'df')
             store.append('df', df, data_columns=['ts', 'A'])

-            result = store.select('df', [Term("ts>=Timestamp('2012-02-01')")])
+            result = store.select('df', "ts>=Timestamp('2012-02-01')")
             expected = df[df.ts >= Timestamp('2012-02-01')]
             tm.assert_frame_equal(expected, result)
@@ -3212,15 +3266,15 @@ def test_select_dtypes(self):
             expected = (df[df.boolv == True]  # noqa
                         .reindex(columns=['A', 'boolv']))
             for v in [True, 'true', 1]:
-                result = store.select('df', Term(
-                    'boolv == %s' % str(v)), columns=['A', 'boolv'])
+                result = store.select('df', 'boolv == %s' % str(v),
+                                      columns=['A', 'boolv'])
                 tm.assert_frame_equal(expected, result)

             expected = (df[df.boolv == False]  # noqa
                         .reindex(columns=['A', 'boolv']))
             for v in [False, 'false', 0]:
-                result = store.select('df', Term(
-                    'boolv == %s' % str(v)), columns=['A', 'boolv'])
+                result = store.select(
+                    'df', 'boolv == %s' % str(v), columns=['A', 'boolv'])
                 tm.assert_frame_equal(expected, result)

             # integer index
@@ -3228,7 +3282,7 @@ def test_select_dtypes(self):
             _maybe_remove(store, 'df_int')
             store.append('df_int', df)
             result = store.select(
-                'df_int', [Term("index<10"), Term("columns=['A']")])
+                'df_int', "index<10 and columns=['A']")
             expected = df.reindex(index=list(df.index)[0:10], columns=['A'])
             tm.assert_frame_equal(expected, result)
@@ -3238,7 +3292,7 @@ def test_select_dtypes(self):
             _maybe_remove(store, 'df_float')
             store.append('df_float', df)
             result = store.select(
-                'df_float', [Term("index<10.0"), Term("columns=['A']")])
+                'df_float', "index<10.0 and columns=['A']")
             expected = df.reindex(index=list(df.index)[0:10], columns=['A'])
             tm.assert_frame_equal(expected, result)
@@ -3309,14 +3363,14 @@ def test_select_with_many_inputs(self):
             store.append('df', df, data_columns=['ts', 'A', 'B', 'users'])

             # regular select
-            result = store.select('df', [Term("ts>=Timestamp('2012-02-01')")])
+            result = store.select('df', "ts>=Timestamp('2012-02-01')")
             expected = df[df.ts >= Timestamp('2012-02-01')]
             tm.assert_frame_equal(expected, result)

             # small selector
             result = store.select(
-                'df', [Term("ts>=Timestamp('2012-02-01') & "
-                            "users=['a','b','c']")])
+                'df',
+                "ts>=Timestamp('2012-02-01') & users=['a','b','c']")
             expected = df[(df.ts >= Timestamp('2012-02-01')) &
                           df.users.isin(['a', 'b', 'c'])]
             tm.assert_frame_equal(expected, result)
@@ -3324,24 +3378,24 @@ def test_select_with_many_inputs(self):
             # big selector along the columns
             selector = ['a', 'b', 'c'] + ['a%03d' % i for i in range(60)]
             result = store.select(
-                'df', [Term("ts>=Timestamp('2012-02-01')"),
-                       Term('users=selector')])
+                'df',
+                "ts>=Timestamp('2012-02-01') and users=selector")
             expected = df[(df.ts >= Timestamp('2012-02-01')) &
                           df.users.isin(selector)]
             tm.assert_frame_equal(expected, result)

             selector = range(100, 200)
-            result = store.select('df', [Term('B=selector')])
+            result = store.select('df', 'B=selector')
             expected = df[df.B.isin(selector)]
             tm.assert_frame_equal(expected, result)
-            self.assertEqual(len(result), 100)
+            assert len(result) == 100

             # big selector along the index
             selector = Index(df.ts[0:100].values)
-            result = store.select('df', [Term('ts=selector')])
+            result = store.select('df', 'ts=selector')
             expected = df[df.ts.isin(selector.values)]
df[df.ts.isin(selector.values)] tm.assert_frame_equal(expected, result) - self.assertEqual(len(result), 100) + assert len(result) == 100 def test_select_iterator(self): @@ -3359,7 +3413,7 @@ def test_select_iterator(self): tm.assert_frame_equal(expected, result) results = [s for s in store.select('df', chunksize=100)] - self.assertEqual(len(results), 5) + assert len(results) == 5 result = concat(results) tm.assert_frame_equal(expected, result) @@ -3371,10 +3425,10 @@ def test_select_iterator(self): df = tm.makeTimeDataFrame(500) df.to_hdf(path, 'df_non_table') - self.assertRaises(TypeError, read_hdf, path, - 'df_non_table', chunksize=100) - self.assertRaises(TypeError, read_hdf, path, - 'df_non_table', iterator=True) + pytest.raises(TypeError, read_hdf, path, + 'df_non_table', chunksize=100) + pytest.raises(TypeError, read_hdf, path, + 'df_non_table', iterator=True) with ensure_clean_path(self.path) as path: @@ -3384,7 +3438,7 @@ def test_select_iterator(self): results = [s for s in read_hdf(path, 'df', chunksize=100)] result = concat(results) - self.assertEqual(len(results), 5) + assert len(results) == 5 tm.assert_frame_equal(result, df) tm.assert_frame_equal(result, read_hdf(path, 'df')) @@ -3409,17 +3463,6 @@ def test_select_iterator(self): result = concat(results) tm.assert_frame_equal(expected, result) - # where selection - # expected = store.select_as_multiple( - # ['df1', 'df2'], where= Term('A>0'), selector='df1') - # results = [] - # for s in store.select_as_multiple( - # ['df1', 'df2'], where= Term('A>0'), selector='df1', - # chunksize=25): - # results.append(s) - # result = concat(results) - # tm.assert_frame_equal(expected, result) - def test_select_iterator_complete_8014(self): # GH 8014 @@ -3548,7 +3591,7 @@ def test_select_iterator_non_complete_8014(self): where = "index > '%s'" % end_dt results = [s for s in store.select( 'df', where=where, chunksize=chunksize)] - self.assertEqual(0, len(results)) + assert 0 == len(results) def test_select_iterator_many_empty_frames(self): @@ -3580,7 +3623,7 @@ def test_select_iterator_many_empty_frames(self): results = [s for s in store.select( 'df', where=where, chunksize=chunksize)] - tm.assert_equal(1, len(results)) + assert len(results) == 1 result = concat(results) rexpected = expected[expected.index <= end_dt] tm.assert_frame_equal(rexpected, result) @@ -3591,7 +3634,7 @@ def test_select_iterator_many_empty_frames(self): 'df', where=where, chunksize=chunksize)] # should be 1, is 10 - tm.assert_equal(1, len(results)) + assert len(results) == 1 result = concat(results) rexpected = expected[(expected.index >= beg_dt) & (expected.index <= end_dt)] @@ -3609,7 +3652,7 @@ def test_select_iterator_many_empty_frames(self): 'df', where=where, chunksize=chunksize)] # should be [] - tm.assert_equal(0, len(results)) + assert len(results) == 0 def test_retain_index_attributes(self): @@ -3627,19 +3670,18 @@ def test_retain_index_attributes(self): for attr in ['freq', 'tz', 'name']: for idx in ['index', 'columns']: - self.assertEqual(getattr(getattr(df, idx), attr, None), - getattr(getattr(result, idx), attr, None)) + assert (getattr(getattr(df, idx), attr, None) == + getattr(getattr(result, idx), attr, None)) # try to append a table with a different frequency - with tm.assert_produces_warning( - expected_warning=AttributeConflictWarning): + with catch_warnings(record=True): df2 = DataFrame(dict( A=Series(lrange(3), index=date_range('2002-1-1', periods=3, freq='D')))) store.append('data', df2) - 
self.assertIsNone(store.get_storer('data').info['index']['freq']) + assert store.get_storer('data').info['index']['freq'] is None # this is ok _maybe_remove(store, 'df2') @@ -3656,9 +3698,8 @@ def test_retain_index_attributes(self): def test_retain_index_attributes2(self): with ensure_clean_path(self.path) as path: - expected_warning = Warning if PY35 else AttributeConflictWarning - with tm.assert_produces_warning(expected_warning=expected_warning, - check_stacklevel=False): + + with catch_warnings(record=True): df = DataFrame(dict( A=Series(lrange(3), @@ -3676,37 +3717,41 @@ def test_retain_index_attributes2(self): df = DataFrame(dict(A=Series(lrange(3), index=idx))) df.to_hdf(path, 'data', mode='w', append=True) - self.assertEqual(read_hdf(path, 'data').index.name, 'foo') + assert read_hdf(path, 'data').index.name == 'foo' - with tm.assert_produces_warning(expected_warning=expected_warning, - check_stacklevel=False): + with catch_warnings(record=True): idx2 = date_range('2001-1-1', periods=3, freq='H') idx2.name = 'bar' df2 = DataFrame(dict(A=Series(lrange(3), index=idx2))) df2.to_hdf(path, 'data', append=True) - self.assertIsNone(read_hdf(path, 'data').index.name) + assert read_hdf(path, 'data').index.name is None def test_panel_select(self): - wp = tm.makePanel() - with ensure_clean_store(self.path) as store: - store.put('wp', wp, format='table') - date = wp.major_axis[len(wp.major_axis) // 2] - crit1 = ('major_axis>=date') - crit2 = ("minor_axis=['A', 'D']") + with catch_warnings(record=True): - result = store.select('wp', [crit1, crit2]) - expected = wp.truncate(before=date).reindex(minor=['A', 'D']) - assert_panel_equal(result, expected) + wp = tm.makePanel() - result = store.select( - 'wp', ['major_axis>="20000124"', ("minor_axis=['A', 'B']")]) - expected = wp.truncate(before='20000124').reindex(minor=['A', 'B']) - assert_panel_equal(result, expected) + store.put('wp', wp, format='table') + date = wp.major_axis[len(wp.major_axis) // 2] + + crit1 = ('major_axis>=date') + crit2 = ("minor_axis=['A', 'D']") + + result = store.select('wp', [crit1, crit2]) + expected = wp.truncate(before=date).reindex(minor=['A', 'D']) + assert_panel_equal(result, expected) + + result = store.select( + 'wp', ['major_axis>="20000124"', + ("minor_axis=['A', 'B']")]) + expected = wp.truncate( + before='20000124').reindex(minor=['A', 'B']) + assert_panel_equal(result, expected) def test_frame_select(self): @@ -3717,7 +3762,7 @@ def test_frame_select(self): date = df.index[len(df) // 2] crit1 = Term('index>=date') - self.assertEqual(crit1.env.scope['date'], date) + assert crit1.env.scope['date'] == date crit2 = ("columns=['A', 'D']") crit3 = ('columns=A') @@ -3733,12 +3778,12 @@ def test_frame_select(self): # invalid terms df = tm.makeTimeDataFrame() store.append('df_time', df) - self.assertRaises( - ValueError, store.select, 'df_time', [Term("index>0")]) + pytest.raises( + ValueError, store.select, 'df_time', "index>0") # can't select if not written as table # store['frame'] = df - # self.assertRaises(ValueError, store.select, + # pytest.raises(ValueError, store.select, # 'frame', [crit1, crit2]) def test_frame_select_complex(self): @@ -3777,8 +3822,8 @@ def test_frame_select_complex(self): tm.assert_frame_equal(result, expected) # invert not implemented in numexpr :( - self.assertRaises(NotImplementedError, - store.select, 'df', '~(string="bar")') + pytest.raises(NotImplementedError, + store.select, 'df', '~(string="bar")') # invert ok for filters result = store.select('df', "~(columns=['A','B'])") @@ 
-3813,15 +3858,10 @@ def test_frame_select_complex2(self): hist.to_hdf(hh, 'df', mode='w', format='table') - expected = read_hdf(hh, 'df', where=Term('l1', '=', [2, 3, 4])) - - # list like - result = read_hdf(hh, 'df', where=Term( - 'l1', '=', selection.index.tolist())) - assert_frame_equal(result, expected) - l = selection.index.tolist() # noqa + expected = read_hdf(hh, 'df', where='l1=[2, 3, 4]') # sccope with list like + l = selection.index.tolist() # noqa store = HDFStore(hh) result = store.select('df', where='l1=l') assert_frame_equal(result, expected) @@ -3871,12 +3911,12 @@ def test_invalid_filtering(self): store.put('df', df, format='table') # not implemented - self.assertRaises(NotImplementedError, store.select, - 'df', "columns=['A'] | columns=['B']") + pytest.raises(NotImplementedError, store.select, + 'df', "columns=['A'] | columns=['B']") # in theory we could deal with this - self.assertRaises(NotImplementedError, store.select, - 'df', "columns=['A','B'] & columns=['C']") + pytest.raises(NotImplementedError, store.select, + 'df', "columns=['A','B'] & columns=['C']") def test_string_select(self): # GH 2973 @@ -3890,12 +3930,12 @@ def test_string_select(self): store.append('df', df, data_columns=['x']) - result = store.select('df', Term('x=none')) + result = store.select('df', 'x=none') expected = df[df.x == 'none'] assert_frame_equal(result, expected) try: - result = store.select('df', Term('x!=none')) + result = store.select('df', 'x!=none') expected = df[df.x != 'none'] assert_frame_equal(result, expected) except Exception as detail: @@ -3907,8 +3947,8 @@ def test_string_select(self): df2.loc[df2.x == '', 'x'] = np.nan store.append('df2', df2, data_columns=['x']) - result = store.select('df2', Term('x!=none')) - expected = df2[isnull(df2.x)] + result = store.select('df2', 'x!=none') + expected = df2[isna(df2.x)] assert_frame_equal(result, expected) # int ==/!= @@ -3917,11 +3957,11 @@ def test_string_select(self): store.append('df3', df, data_columns=['int']) - result = store.select('df3', Term('int=2')) + result = store.select('df3', 'int=2') expected = df[df.int == 2] assert_frame_equal(result, expected) - result = store.select('df3', Term('int!=2')) + result = store.select('df3', 'int!=2') expected = df[df.int != 2] assert_frame_equal(result, expected) @@ -3934,19 +3974,19 @@ def test_read_column(self): store.append('df', df) # error - self.assertRaises(KeyError, store.select_column, 'df', 'foo') + pytest.raises(KeyError, store.select_column, 'df', 'foo') def f(): store.select_column('df', 'index', where=['index>5']) - self.assertRaises(Exception, f) + pytest.raises(Exception, f) # valid result = store.select_column('df', 'index') tm.assert_almost_equal(result.values, Series(df.index).values) - self.assertIsInstance(result, Series) + assert isinstance(result, Series) # not a data indexable column - self.assertRaises( + pytest.raises( ValueError, store.select_column, 'df', 'values_block_0') # a data column @@ -4018,7 +4058,7 @@ def test_coordinates(self): result = store.select('df', where=c) expected = df.loc[3:4, :] tm.assert_frame_equal(result, expected) - self.assertIsInstance(c, Index) + assert isinstance(c, Index) # multiple tables _maybe_remove(store, 'df1') @@ -4056,14 +4096,14 @@ def test_coordinates(self): tm.assert_frame_equal(result, expected) # invalid - self.assertRaises(ValueError, store.select, 'df', - where=np.arange(len(df), dtype='float64')) - self.assertRaises(ValueError, store.select, 'df', - where=np.arange(len(df) + 1)) - self.assertRaises(ValueError, 
store.select, 'df', - where=np.arange(len(df)), start=5) - self.assertRaises(ValueError, store.select, 'df', - where=np.arange(len(df)), start=5, stop=10) + pytest.raises(ValueError, store.select, 'df', + where=np.arange(len(df), dtype='float64')) + pytest.raises(ValueError, store.select, 'df', + where=np.arange(len(df) + 1)) + pytest.raises(ValueError, store.select, 'df', + where=np.arange(len(df)), start=5) + pytest.raises(ValueError, store.select, 'df', + where=np.arange(len(df)), start=5, stop=10) # selection with filter selection = date_range('20000101', periods=500) @@ -4099,12 +4139,12 @@ def test_append_to_multiple(self): with ensure_clean_store(self.path) as store: # exceptions - self.assertRaises(ValueError, store.append_to_multiple, - {'df1': ['A', 'B'], 'df2': None}, df, - selector='df3') - self.assertRaises(ValueError, store.append_to_multiple, - {'df1': None, 'df2': None}, df, selector='df3') - self.assertRaises( + pytest.raises(ValueError, store.append_to_multiple, + {'df1': ['A', 'B'], 'df2': None}, df, + selector='df3') + pytest.raises(ValueError, store.append_to_multiple, + {'df1': None, 'df2': None}, df, selector='df3') + pytest.raises( ValueError, store.append_to_multiple, 'df1', df, 'df1') # regular operation @@ -4122,6 +4162,7 @@ def test_append_to_multiple_dropna(self): df = concat([df1, df2], axis=1) with ensure_clean_store(self.path) as store: + # dropna=True should guarantee rows are synchronized store.append_to_multiple( {'df1': ['A', 'B'], 'df2': None}, df, selector='df1', @@ -4132,14 +4173,27 @@ def test_append_to_multiple_dropna(self): tm.assert_index_equal(store.select('df1').index, store.select('df2').index) + @pytest.mark.xfail(run=False, + reason="append_to_multiple_dropna_false " + "is not raising as failed") + def test_append_to_multiple_dropna_false(self): + df1 = tm.makeTimeDataFrame() + df2 = tm.makeTimeDataFrame().rename(columns=lambda x: "%s_2" % x) + df1.iloc[1, df1.columns.get_indexer(['A', 'B'])] = np.nan + df = concat([df1, df2], axis=1) + + with ensure_clean_store(self.path) as store: + # dropna=False shouldn't synchronize row indexes store.append_to_multiple( - {'df1': ['A', 'B'], 'df2': None}, df, selector='df1', + {'df1a': ['A', 'B'], 'df2a': None}, df, selector='df1a', dropna=False) - self.assertRaises( - ValueError, store.select_as_multiple, ['df1', 'df2']) - assert not store.select('df1').index.equals( - store.select('df2').index) + + with pytest.raises(ValueError): + store.select_as_multiple(['df1a', 'df2a']) + + assert not store.select('df1a').index.equals( + store.select('df2a').index) def test_select_as_multiple(self): @@ -4150,25 +4204,25 @@ def test_select_as_multiple(self): with ensure_clean_store(self.path) as store: # no tables stored - self.assertRaises(Exception, store.select_as_multiple, - None, where=['A>0', 'B>0'], selector='df1') + pytest.raises(Exception, store.select_as_multiple, + None, where=['A>0', 'B>0'], selector='df1') store.append('df1', df1, data_columns=['A', 'B']) store.append('df2', df2) # exceptions - self.assertRaises(Exception, store.select_as_multiple, - None, where=['A>0', 'B>0'], selector='df1') - self.assertRaises(Exception, store.select_as_multiple, - [None], where=['A>0', 'B>0'], selector='df1') - self.assertRaises(KeyError, store.select_as_multiple, - ['df1', 'df3'], where=['A>0', 'B>0'], - selector='df1') - self.assertRaises(KeyError, store.select_as_multiple, - ['df3'], where=['A>0', 'B>0'], selector='df1') - self.assertRaises(KeyError, store.select_as_multiple, - ['df1', 'df2'], where=['A>0', 
'B>0'], - selector='df4') + pytest.raises(Exception, store.select_as_multiple, + None, where=['A>0', 'B>0'], selector='df1') + pytest.raises(Exception, store.select_as_multiple, + [None], where=['A>0', 'B>0'], selector='df1') + pytest.raises(KeyError, store.select_as_multiple, + ['df1', 'df3'], where=['A>0', 'B>0'], + selector='df1') + pytest.raises(KeyError, store.select_as_multiple, + ['df3'], where=['A>0', 'B>0'], selector='df1') + pytest.raises(KeyError, store.select_as_multiple, + ['df1', 'df2'], where=['A>0', 'B>0'], + selector='df4') # default select result = store.select('df1', ['A>0', 'B>0']) @@ -4187,24 +4241,24 @@ def test_select_as_multiple(self): tm.assert_frame_equal(result, expected) # multiple (diff selector) - result = store.select_as_multiple(['df1', 'df2'], where=[Term( - 'index>df2.index[4]')], selector='df2') + result = store.select_as_multiple( + ['df1', 'df2'], where='index>df2.index[4]', selector='df2') expected = concat([df1, df2], axis=1) expected = expected[5:] tm.assert_frame_equal(result, expected) # test exception for diff rows store.append('df3', tm.makeTimeDataFrame(nper=50)) - self.assertRaises(ValueError, store.select_as_multiple, - ['df1', 'df3'], where=['A>0', 'B>0'], - selector='df1') + pytest.raises(ValueError, store.select_as_multiple, + ['df1', 'df3'], where=['A>0', 'B>0'], + selector='df1') def test_nan_selection_bug_4858(self): # GH 4858; nan selection bug, only works for pytables >= 3.1 if LooseVersion(tables.__version__) < '3.1.0': - raise nose.SkipTest('tables version does not support fix for nan ' - 'selection bug: GH 4858') + pytest.skip('tables version does not support fix for nan ' - 'selection bug: GH 4858') with ensure_clean_store(self.path) as store: @@ -4230,17 +4284,32 @@ def test_start_stop_table(self): store.append('df', df) result = store.select( - 'df', [Term("columns=['A']")], start=0, stop=5) + 'df', "columns=['A']", start=0, stop=5) expected = df.loc[0:4, ['A']] tm.assert_frame_equal(result, expected) # out of range result = store.select( - 'df', [Term("columns=['A']")], start=30, stop=40) - self.assertTrue(len(result) == 0) + 'df', "columns=['A']", start=30, stop=40) + assert len(result) == 0 expected = df.loc[30:40, ['A']] tm.assert_frame_equal(result, expected) + def test_start_stop_multiple(self): + + # GH 16209 + with ensure_clean_store(self.path) as store: + + df = DataFrame({"foo": [1, 2], "bar": [1, 2]}) + + store.append_to_multiple({'selector': ['foo'], 'data': None}, df, + selector='selector') + result = store.select_as_multiple(['selector', 'data'], + selector='selector', start=0, + stop=1) + expected = df.loc[[0], ['foo', 'bar']] + tm.assert_frame_equal(result, expected) + def test_start_stop_fixed(self): with ensure_clean_store(self.path) as store: @@ -4284,7 +4353,7 @@ def test_start_stop_fixed(self): df.iloc[8:10, -2] = np.nan dfs = df.to_sparse() store.put('dfs', dfs) - with self.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): store.select('dfs', start=0, stop=5) def test_select_filter_corner(self): @@ -4296,14 +4365,57 @@ def test_select_filter_corner(self): with ensure_clean_store(self.path) as store: store.put('frame', df, format='table') - crit = Term('columns=df.columns[:75]') + crit = 'columns=df.columns[:75]' result = store.select('frame', [crit]) tm.assert_frame_equal(result, df.loc[:, df.columns[:75]]) - crit = Term('columns=df.columns[:75:2]') + crit = 'columns=df.columns[:75:2]' result = store.select('frame', [crit]) tm.assert_frame_equal(result, df.loc[:,
df.columns[:75:2]]) + def test_path_pathlib(self): + df = tm.makeDataFrame() + + result = tm.round_trip_pathlib( + lambda p: df.to_hdf(p, 'df'), + lambda p: pd.read_hdf(p, 'df')) + tm.assert_frame_equal(df, result) + + def test_path_pathlib_hdfstore(self): + df = tm.makeDataFrame() + + def writer(path): + with pd.HDFStore(path) as store: + df.to_hdf(store, 'df') + + def reader(path): + with pd.HDFStore(path) as store: + return pd.read_hdf(store, 'df') + + result = tm.round_trip_pathlib(writer, reader) + tm.assert_frame_equal(df, result) + + def test_pickle_path_localpath(self): + df = tm.makeDataFrame() + result = tm.round_trip_pathlib( + lambda p: df.to_hdf(p, 'df'), + lambda p: pd.read_hdf(p, 'df')) + tm.assert_frame_equal(df, result) + + def test_path_localpath_hdfstore(self): + df = tm.makeDataFrame() + + def writer(path): + with pd.HDFStore(path) as store: + df.to_hdf(store, 'df') + + def reader(path): + with pd.HDFStore(path) as store: + return pd.read_hdf(store, 'df') + + result = tm.round_trip_localpath(writer, reader) + tm.assert_frame_equal(df, result) + def _check_roundtrip(self, obj, comparator, compression=False, **kwargs): options = {} @@ -4337,11 +4449,11 @@ def _check_roundtrip_table(self, obj, comparator, compression=False): with ensure_clean_store(self.path, 'w', **options) as store: store.put('obj', obj, format='table') retrieved = store['obj'] - # sorted_obj = _test_sort(obj) + comparator(retrieved, obj) def test_multiple_open_close(self): - # GH 4409, open & close multiple times + # gh-4409: open & close multiple times with ensure_clean_path(self.path) as path: @@ -4350,11 +4462,12 @@ def test_multiple_open_close(self): # single store = HDFStore(path) - self.assertNotIn('CLOSED', str(store)) - self.assertTrue(store.is_open) + assert 'CLOSED' not in store.info() + assert store.is_open + store.close() - self.assertIn('CLOSED', str(store)) - self.assertFalse(store.is_open) + assert 'CLOSED' in store.info() + assert not store.is_open with ensure_clean_path(self.path) as path: @@ -4365,7 +4478,7 @@ def test_multiple_open_close(self): def f(): HDFStore(path) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) store1.close() else: @@ -4374,22 +4487,22 @@ def f(): store1 = HDFStore(path) store2 = HDFStore(path) - self.assertNotIn('CLOSED', str(store1)) - self.assertNotIn('CLOSED', str(store2)) - self.assertTrue(store1.is_open) - self.assertTrue(store2.is_open) + assert 'CLOSED' not in store1.info() + assert 'CLOSED' not in store2.info() + assert store1.is_open + assert store2.is_open store1.close() - self.assertIn('CLOSED', str(store1)) - self.assertFalse(store1.is_open) - self.assertNotIn('CLOSED', str(store2)) - self.assertTrue(store2.is_open) + assert 'CLOSED' in store1.info() + assert not store1.is_open + assert 'CLOSED' not in store2.info() + assert store2.is_open store2.close() - self.assertIn('CLOSED', str(store1)) - self.assertIn('CLOSED', str(store2)) - self.assertFalse(store1.is_open) - self.assertFalse(store2.is_open) + assert 'CLOSED' in store1.info() + assert 'CLOSED' in store2.info() + assert not store1.is_open + assert not store2.is_open # nested close store = HDFStore(path, mode='w') @@ -4398,12 +4511,12 @@ def f(): store2 = HDFStore(path) store2.append('df2', df) store2.close() - self.assertIn('CLOSED', str(store2)) - self.assertFalse(store2.is_open) + assert 'CLOSED' in store2.info() + assert not store2.is_open store.close() - self.assertIn('CLOSED', str(store)) - self.assertFalse(store.is_open) + assert 'CLOSED' in store.info() + assert not 
store.is_open # double closing store = HDFStore(path, mode='w') @@ -4411,12 +4524,12 @@ def f(): store2 = HDFStore(path) store.close() - self.assertIn('CLOSED', str(store)) - self.assertFalse(store.is_open) + assert 'CLOSED' in store.info() + assert not store.is_open store2.close() - self.assertIn('CLOSED', str(store2)) - self.assertFalse(store2.is_open) + assert 'CLOSED' in store2.info() + assert not store2.is_open # ops on a closed store with ensure_clean_path(self.path) as path: @@ -4427,21 +4540,21 @@ def f(): store = HDFStore(path) store.close() - self.assertRaises(ClosedFileError, store.keys) - self.assertRaises(ClosedFileError, lambda: 'df' in store) - self.assertRaises(ClosedFileError, lambda: len(store)) - self.assertRaises(ClosedFileError, lambda: store['df']) - self.assertRaises(ClosedFileError, lambda: store.df) - self.assertRaises(ClosedFileError, store.select, 'df') - self.assertRaises(ClosedFileError, store.get, 'df') - self.assertRaises(ClosedFileError, store.append, 'df2', df) - self.assertRaises(ClosedFileError, store.put, 'df3', df) - self.assertRaises(ClosedFileError, store.get_storer, 'df2') - self.assertRaises(ClosedFileError, store.remove, 'df2') + pytest.raises(ClosedFileError, store.keys) + pytest.raises(ClosedFileError, lambda: 'df' in store) + pytest.raises(ClosedFileError, lambda: len(store)) + pytest.raises(ClosedFileError, lambda: store['df']) + pytest.raises(AttributeError, lambda: store.df) + pytest.raises(ClosedFileError, store.select, 'df') + pytest.raises(ClosedFileError, store.get, 'df') + pytest.raises(ClosedFileError, store.append, 'df2', df) + pytest.raises(ClosedFileError, store.put, 'df3', df) + pytest.raises(ClosedFileError, store.get_storer, 'df2') + pytest.raises(ClosedFileError, store.remove, 'df2') def f(): store.select('df') - tm.assertRaisesRegexp(ClosedFileError, 'file is not open', f) + tm.assert_raises_regex(ClosedFileError, 'file is not open', f) def test_pytables_native_read(self): @@ -4449,55 +4562,46 @@ def test_pytables_native_read(self): tm.get_data_path('legacy_hdf/pytables_native.h5'), mode='r') as store: d2 = store['detector/readout'] - self.assertIsInstance(d2, DataFrame) + assert isinstance(d2, DataFrame) def test_pytables_native2_read(self): # fails on win/3.5 oddly if PY35 and is_platform_windows(): - raise nose.SkipTest("native2 read fails oddly on windows / 3.5") + pytest.skip("native2 read fails oddly on windows / 3.5") with ensure_clean_store( tm.get_data_path('legacy_hdf/pytables_native2.h5'), mode='r') as store: str(store) d1 = store['detector'] - self.assertIsInstance(d1, DataFrame) - - def test_legacy_read(self): - with ensure_clean_store( - tm.get_data_path('legacy_hdf/legacy.h5'), - mode='r') as store: - store['a'] - store['b'] - store['c'] - store['d'] + assert isinstance(d1, DataFrame) def test_legacy_table_read(self): # legacy table types with ensure_clean_store( tm.get_data_path('legacy_hdf/legacy_table.h5'), mode='r') as store: - store.select('df1') - store.select('df2') - store.select('wp1') - # force the frame - store.select('df2', typ='legacy_frame') + with catch_warnings(record=True): + store.select('df1') + store.select('df2') + store.select('wp1') + + # force the frame + store.select('df2', typ='legacy_frame') - # old version warning - with tm.assert_produces_warning( - expected_warning=IncompatibilityWarning): - self.assertRaises( - Exception, store.select, 'wp1', Term('minor_axis=B')) + # old version warning + pytest.raises( + Exception, store.select, 'wp1', 'minor_axis=B') df2 = store.select('df2') - 
result = store.select('df2', Term('index>df2.index[2]')) + result = store.select('df2', 'index>df2.index[2]') expected = df2[df2.index > df2.index[2]] assert_frame_equal(expected, result) def test_legacy_0_10_read(self): # legacy from 0.10 - with compat_assert_produces_warning(FutureWarning): + with catch_warnings(record=True): path = tm.get_data_path('legacy_hdf/legacy_0.10.h5') with ensure_clean_store(path, mode='r') as store: str(store) @@ -4521,7 +4625,7 @@ def test_legacy_0_11_read(self): def test_copy(self): - with compat_assert_produces_warning(FutureWarning): + with catch_warnings(record=True): def do_copy(f=None, new_f=None, keys=None, propindexes=True, **kwargs): @@ -4542,7 +4646,7 @@ def do_copy(f=None, new_f=None, keys=None, # check keys if keys is None: keys = store.keys() - self.assertEqual(set(keys), set(tstore.keys())) + assert set(keys) == set(tstore.keys()) # check indices & nrows for k in tstore.keys(): @@ -4550,14 +4654,13 @@ def do_copy(f=None, new_f=None, keys=None, new_t = tstore.get_storer(k) orig_t = store.get_storer(k) - self.assertEqual(orig_t.nrows, new_t.nrows) + assert orig_t.nrows == new_t.nrows # check propindexes if propindexes: for a in orig_t.axes: if a.is_indexed: - self.assertTrue( - new_t[a.name].is_indexed) + assert new_t[a.name].is_indexed finally: safe_close(store) @@ -4586,13 +4689,14 @@ def do_copy(f=None, new_f=None, keys=None, safe_remove(path) def test_legacy_table_write(self): - raise nose.SkipTest("cannot write legacy tables") + pytest.skip("cannot write legacy tables") store = HDFStore(tm.get_data_path( 'legacy_hdf/legacy_table_%s.h5' % pandas.__version__), 'a') df = tm.makeDataFrame() - wp = tm.makePanel() + with catch_warnings(record=True): + wp = tm.makePanel() index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']], @@ -4615,7 +4719,7 @@ def test_store_datetime_fractional_secs(self): dt = datetime.datetime(2012, 1, 2, 3, 4, 5, 123456) series = Series([0], [dt]) store['a'] = series - self.assertEqual(store['a'].index[0], dt) + assert store['a'].index[0] == dt def test_tseries_indices_series(self): @@ -4625,18 +4729,18 @@ def test_tseries_indices_series(self): store['a'] = ser result = store['a'] - assert_series_equal(result, ser) - self.assertEqual(type(result.index), type(ser.index)) - self.assertEqual(result.index.freq, ser.index.freq) + tm.assert_series_equal(result, ser) + assert result.index.freq == ser.index.freq + tm.assert_class_equal(result.index, ser.index, obj="series index") idx = tm.makePeriodIndex(10) ser = Series(np.random.randn(len(idx)), idx) store['a'] = ser result = store['a'] - assert_series_equal(result, ser) - self.assertEqual(type(result.index), type(ser.index)) - self.assertEqual(result.index.freq, ser.index.freq) + tm.assert_series_equal(result, ser) + assert result.index.freq == ser.index.freq + tm.assert_class_equal(result.index, ser.index, obj="series index") def test_tseries_indices_frame(self): @@ -4647,8 +4751,9 @@ def test_tseries_indices_frame(self): result = store['a'] assert_frame_equal(result, df) - self.assertEqual(type(result.index), type(df.index)) - self.assertEqual(result.index.freq, df.index.freq) + assert result.index.freq == df.index.freq + tm.assert_class_equal(result.index, df.index, + obj="dataframe index") idx = tm.makePeriodIndex(10) df = DataFrame(np.random.randn(len(idx), 3), idx) @@ -4656,14 +4761,16 @@ def test_tseries_indices_frame(self): result = store['a'] assert_frame_equal(result, df) - self.assertEqual(type(result.index), type(df.index)) -
self.assertEqual(result.index.freq, df.index.freq) + assert result.index.freq == df.index.freq + tm.assert_class_equal(result.index, df.index, + obj="dataframe index") def test_unicode_index(self): unicode_values = [u('\u03c3'), u('\u03c3\u03c3')] - with compat_assert_produces_warning(PerformanceWarning): + # PerformanceWarning + with catch_warnings(record=True): s = Series(np.random.randn(len(unicode_values)), unicode_values) self._check_roundtrip(s, tm.assert_series_equal) @@ -4696,7 +4803,7 @@ def test_store_datetime_mixed(self): # index=[np.arange(5).repeat(2), # np.tile(np.arange(2), 5)]) - # self.assertRaises(Exception, store.put, 'foo', df, format='table') + # pytest.raises(Exception, store.put, 'foo', df, format='table') def test_append_with_diff_col_name_types_raises_value_error(self): df = DataFrame(np.random.randn(10, 1)) @@ -4710,7 +4817,7 @@ def test_append_with_diff_col_name_types_raises_value_error(self): store.append(name, df) for d in (df2, df3, df4, df5): - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): store.append(name, d) def test_query_with_nested_special_character(self): @@ -4727,7 +4834,7 @@ def test_categorical(self): with ensure_clean_store(self.path) as store: - # basic + # Basic _maybe_remove(store, 's') s = Series(Categorical(['a', 'b', 'b', 'a', 'a', 'c'], categories=[ 'a', 'b', 'c', 'd'], ordered=False)) @@ -4743,12 +4850,13 @@ def test_categorical(self): tm.assert_series_equal(s, result) _maybe_remove(store, 'df') + df = DataFrame({"s": s, "vals": [1, 2, 3, 4, 5, 6]}) store.append('df', df, format='table') result = store.select('df') tm.assert_frame_equal(result, df) - # dtypes + # Dtypes s = Series([1, 1, 2, 2, 3, 4, 5]).astype('category') store.append('si', s) result = store.select('si') @@ -4759,17 +4867,18 @@ def test_categorical(self): result = store.select('si2') tm.assert_series_equal(result, s) - # multiple + # Multiple df2 = df.copy() df2['s2'] = Series(list('abcdefg')).astype('category') store.append('df2', df2) result = store.select('df2') tm.assert_frame_equal(result, df2) - # make sure the metadata is ok - self.assertTrue('/df2 ' in str(store)) - self.assertTrue('/df2/meta/values_block_0/meta' in str(store)) - self.assertTrue('/df2/meta/values_block_1/meta' in str(store)) + # Make sure the metadata is OK + info = store.info() + assert '/df2 ' in info + assert '/df2/meta/values_block_0/meta' in info + assert '/df2/meta/values_block_1/meta' in info # unordered s = Series(Categorical(['a', 'b', 'b', 'a', 'a', 'c'], categories=[ @@ -4778,7 +4887,7 @@ def test_categorical(self): result = store.select('s2') tm.assert_series_equal(result, s) - # query + # Query store.append('df3', df, data_columns=['s']) expected = df[df.s.isin(['b', 'c'])] result = store.select('df3', where=['s in ["b","c"]']) @@ -4796,7 +4905,7 @@ def test_categorical(self): result = store.select('df3', where=['s in ["f"]']) tm.assert_frame_equal(result, expected) - # appending with same categories is ok + # Appending with same categories is ok store.append('df3', df) df = concat([df, df]) @@ -4804,20 +4913,21 @@ def test_categorical(self): result = store.select('df3', where=['s in ["b","c"]']) tm.assert_frame_equal(result, expected) - # appending must have the same categories + # Appending must have the same categories df3 = df.copy() df3['s'].cat.remove_unused_categories(inplace=True) - self.assertRaises(ValueError, lambda: store.append('df3', df3)) + with pytest.raises(ValueError): + store.append('df3', df3) - # remove - # make sure meta data is removed (its a 
recursive removal so should - be) + # Remove, and make sure meta data is removed (it's a recursive + # removal so should be). result = store.select('df3/meta/s/meta') - self.assertIsNotNone(result) + assert result is not None store.remove('df3') - self.assertRaises( - KeyError, lambda: store.select('df3/meta/s/meta')) + + with pytest.raises(KeyError): + store.select('df3/meta/s/meta') def test_categorical_conversion(self): @@ -4853,15 +4963,15 @@ def test_duplicate_column_name(self): df = DataFrame(columns=["a", "a"], data=[[0, 0]]) with ensure_clean_path(self.path) as path: - self.assertRaises(ValueError, df.to_hdf, - path, 'df', format='fixed') + pytest.raises(ValueError, df.to_hdf, + path, 'df', format='fixed') df.to_hdf(path, 'df', format='table') other = read_hdf(path, 'df') tm.assert_frame_equal(df, other) - self.assertTrue(df.equals(other)) - self.assertTrue(other.equals(df)) + assert df.equals(other) + assert other.equals(df) def test_round_trip_equals(self): # GH 9330 @@ -4871,8 +4981,8 @@ def test_round_trip_equals(self): df.to_hdf(path, 'df', format='table') other = read_hdf(path, 'df') tm.assert_frame_equal(df, other) - self.assertTrue(df.equals(other)) - self.assertTrue(other.equals(df)) + assert df.equals(other) + assert other.equals(df) def test_preserve_timedeltaindex_type(self): # GH9635 @@ -4908,7 +5018,7 @@ def test_colums_multiindex_modified(self): cols2load = list('BCD') cols2load_original = list(cols2load) df_loaded = read_hdf(path, 'df', columns=cols2load) # noqa - self.assertTrue(cols2load_original == cols2load) + assert cols2load_original == cols2load def test_to_hdf_with_object_column_names(self): # GH9057 @@ -4928,18 +5038,21 @@ def test_to_hdf_with_object_column_names(self): for index in types_should_fail: df = DataFrame(np.random.randn(10, 2), columns=index(2)) with ensure_clean_path(self.path) as path: - with self.assertRaises( + with catch_warnings(record=True): + with pytest.raises( ValueError, msg=("cannot have non-object label " "DataIndexableCol")): - df.to_hdf(path, 'df', format='table', data_columns=True) + df.to_hdf(path, 'df', format='table', + data_columns=True) for index in types_should_run: df = DataFrame(np.random.randn(10, 2), columns=index(2)) with ensure_clean_path(self.path) as path: - df.to_hdf(path, 'df', format='table', data_columns=True) - result = pd.read_hdf( - path, 'df', where="index = [{0}]".format(df.index[0])) - assert(len(result)) + with catch_warnings(record=True): + df.to_hdf(path, 'df', format='table', data_columns=True) + result = pd.read_hdf( + path, 'df', where="index = [{0}]".format(df.index[0])) + assert(len(result)) def test_read_hdf_open_store(self): # GH10330 @@ -4956,7 +5069,7 @@ def test_read_hdf_open_store(self): store = HDFStore(path, mode='r') indirect = read_hdf(store, 'df') tm.assert_frame_equal(direct, indirect) - self.assertTrue(store.is_open) + assert store.is_open store.close() def test_read_hdf_iterator(self): @@ -4970,7 +5083,7 @@ def test_read_hdf_iterator(self): df.to_hdf(path, 'df', mode='w', format='t') direct = read_hdf(path, 'df') iterator = read_hdf(path, 'df', iterator=True) - self.assertTrue(isinstance(iterator, TableIterator)) + assert isinstance(iterator, TableIterator) indirect = next(iterator.__iter__()) tm.assert_frame_equal(direct, indirect) iterator.store.close() @@ -4981,21 +5094,22 @@ def test_read_hdf_errors(self): columns=list('ABCDE')) with ensure_clean_path(self.path) as path: - self.assertRaises(IOError, read_hdf, path, 'key')
df.to_hdf(path, 'df') store = HDFStore(path, mode='r') store.close() - self.assertRaises(IOError, read_hdf, store, 'df') - with open(path, mode='r') as store: - self.assertRaises(NotImplementedError, read_hdf, store, 'df') + pytest.raises(IOError, read_hdf, store, 'df') + + def test_read_hdf_generic_buffer_errors(self): + pytest.raises(NotImplementedError, read_hdf, BytesIO(b''), 'df') def test_invalid_complib(self): df = DataFrame(np.random.rand(4, 5), index=list('abcd'), columns=list('ABCDE')) with ensure_clean_path(self.path) as path: - self.assertRaises(ValueError, df.to_hdf, path, - 'df', complib='blosc:zlib') + with pytest.raises(ValueError): + df.to_hdf(path, 'df', complib='foolib') # GH10443 def test_read_nokey(self): @@ -5010,7 +5124,7 @@ def test_read_nokey(self): reread = read_hdf(path) assert_frame_equal(df, reread) df.to_hdf(path, 'df2', mode='a') - self.assertRaises(ValueError, read_hdf, path) + pytest.raises(ValueError, read_hdf, path) def test_read_nokey_table(self): # GH13231 @@ -5022,13 +5136,13 @@ def test_read_nokey_table(self): reread = read_hdf(path) assert_frame_equal(df, reread) df.to_hdf(path, 'df2', mode='a', format='table') - self.assertRaises(ValueError, read_hdf, path) + pytest.raises(ValueError, read_hdf, path) def test_read_nokey_empty(self): with ensure_clean_path(self.path) as path: store = HDFStore(path) store.close() - self.assertRaises(ValueError, read_hdf, path) + pytest.raises(ValueError, read_hdf, path) def test_read_from_pathlib_path(self): @@ -5077,7 +5191,7 @@ def test_query_long_float_literal(self): cutoff = 1000000000.0006 result = store.select('test', "A < %.4f" % cutoff) - self.assertTrue(result.empty) + assert result.empty cutoff = 1000000000.0010 result = store.select('test', "A > %.4f" % cutoff) @@ -5089,6 +5203,88 @@ def test_query_long_float_literal(self): expected = df.loc[[1], :] tm.assert_frame_equal(expected, result) + def test_query_compare_column_type(self): + # GH 15492 + df = pd.DataFrame({'date': ['2014-01-01', '2014-01-02'], + 'real_date': date_range('2014-01-01', periods=2), + 'float': [1.1, 1.2], + 'int': [1, 2]}, + columns=['date', 'real_date', 'float', 'int']) + + with ensure_clean_store(self.path) as store: + store.append('test', df, format='table', data_columns=True) + + ts = pd.Timestamp('2014-01-01') # noqa + result = store.select('test', where='real_date > ts') + expected = df.loc[[1], :] + tm.assert_frame_equal(expected, result) + + for op in ['<', '>', '==']: + # non strings to string column always fail + for v in [2.1, True, pd.Timestamp('2014-01-01'), + pd.Timedelta(1, 's')]: + query = 'date {op} v'.format(op=op) + with pytest.raises(TypeError): + result = store.select('test', where=query) + + # strings to other columns must be convertible to type + v = 'a' + for col in ['int', 'float', 'real_date']: + query = '{col} {op} v'.format(op=op, col=col) + with pytest.raises(ValueError): + result = store.select('test', where=query) + + for v, col in zip(['1', '1.1', '2014-01-01'], + ['int', 'float', 'real_date']): + query = '{col} {op} v'.format(op=op, col=col) + result = store.select('test', where=query) + + if op == '==': + expected = df.loc[[0], :] + elif op == '>': + expected = df.loc[[1], :] + else: + expected = df.loc[[], :] + tm.assert_frame_equal(expected, result) + + @pytest.mark.parametrize('format', ['fixed', 'table']) + def test_read_hdf_series_mode_r(self, format): + # GH 16583 + # Tests that reading a Series saved to an HDF file + # still works if a mode='r' argument is supplied + series = tm.makeFloatSeries() 
+ with ensure_clean_path(self.path) as path: + series.to_hdf(path, key='data', format=format) + result = pd.read_hdf(path, key='data', mode='r') + tm.assert_series_equal(result, series) + + @pytest.mark.skipif(sys.version_info < (3, 6), reason="Need python 3.6") + def test_fspath(self): + with tm.ensure_clean('foo.h5') as path: + with pd.HDFStore(path) as store: + assert os.fspath(store) == str(path) + + def test_read_py2_hdf_file_in_py3(self): + # GH 16781 + + # tests reading a PeriodIndex DataFrame written in Python2 in Python3 + + # the file was generated in Python 2.7 like so: + # + # df = pd.DataFrame([1.,2,3], index=pd.PeriodIndex( + # ['2015-01-01', '2015-01-02', '2015-01-05'], freq='B')) + # df.to_hdf('periodindex_0.20.1_x86_64_darwin_2.7.13.h5', 'p') + + expected = pd.DataFrame([1., 2, 3], index=pd.PeriodIndex( + ['2015-01-01', '2015-01-02', '2015-01-05'], freq='B')) + + with ensure_clean_store( + tm.get_data_path( + 'legacy_hdf/periodindex_0.20.1_x86_64_darwin_2.7.13.h5'), + mode='r') as store: + result = store['p'] + assert_frame_equal(result, expected) + class TestHDFComplexValues(Base): # GH10447 @@ -5160,7 +5356,7 @@ def test_complex_mixed_table(self): with ensure_clean_store(self.path) as store: store.append('df', df, data_columns=['A', 'B']) - result = store.select('df', where=Term('A>2')) + result = store.select('df', where='A>2') assert_frame_equal(df.loc[df.A > 2], result) with ensure_clean_path(self.path) as path: @@ -5169,28 +5365,30 @@ def test_complex_mixed_table(self): assert_frame_equal(df, reread) def test_complex_across_dimensions_fixed(self): - complex128 = np.array([1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j]) - s = Series(complex128, index=list('abcd')) - df = DataFrame({'A': s, 'B': s}) - p = Panel({'One': df, 'Two': df}) + with catch_warnings(record=True): + complex128 = np.array( + [1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j]) + s = Series(complex128, index=list('abcd')) + df = DataFrame({'A': s, 'B': s}) + p = Panel({'One': df, 'Two': df}) - objs = [s, df, p] - comps = [tm.assert_series_equal, tm.assert_frame_equal, - tm.assert_panel_equal] - for obj, comp in zip(objs, comps): - with ensure_clean_path(self.path) as path: - obj.to_hdf(path, 'obj', format='fixed') - reread = read_hdf(path, 'obj') - comp(obj, reread) + objs = [s, df, p] + comps = [tm.assert_series_equal, tm.assert_frame_equal, + tm.assert_panel_equal] + for obj, comp in zip(objs, comps): + with ensure_clean_path(self.path) as path: + obj.to_hdf(path, 'obj', format='fixed') + reread = read_hdf(path, 'obj') + comp(obj, reread) def test_complex_across_dimensions(self): complex128 = np.array([1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j]) s = Series(complex128, index=list('abcd')) df = DataFrame({'A': s, 'B': s}) - p = Panel({'One': df, 'Two': df}) - with compat_assert_produces_warning(FutureWarning): - p4d = pd.Panel4D({'i': p, 'ii': p}) + with catch_warnings(record=True): + p = Panel({'One': df, 'Two': df}) + p4d = Panel4D({'i': p, 'ii': p}) objs = [df, p, p4d] comps = [tm.assert_frame_equal, tm.assert_panel_equal, @@ -5209,15 +5407,15 @@ def test_complex_indexing_error(self): 'C': complex128}, index=list('abcd')) with ensure_clean_store(self.path) as store: - self.assertRaises(TypeError, store.append, - 'df', df, data_columns=['C']) + pytest.raises(TypeError, store.append, + 'df', df, data_columns=['C']) def test_complex_series_error(self): complex128 = np.array([1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j, 1.0 + 1.0j]) s = Series(complex128, index=list('abcd')) with 
ensure_clean_path(self.path) as path: - self.assertRaises(TypeError, s.to_hdf, path, 'obj', format='t') + pytest.raises(TypeError, s.to_hdf, path, 'obj', format='t') with ensure_clean_path(self.path) as path: s.to_hdf(path, 'obj', format='t', index=False) @@ -5235,7 +5433,7 @@ def test_complex_append(self): assert_frame_equal(pd.concat([df, df], 0), result) -class TestTimezones(Base, tm.TestCase): +class TestTimezones(Base): def _compare_with_tz(self, a, b): tm.assert_frame_equal(a, b) @@ -5252,11 +5450,10 @@ def _compare_with_tz(self, a, b): def test_append_with_timezones_dateutil(self): from datetime import timedelta - tm._skip_if_no_dateutil() # use maybe_get_tz instead of dateutil.tz.gettz to handle the windows # filename issues. - from pandas.tslib import maybe_get_tz + from pandas._libs.tslib import maybe_get_tz gettz = lambda x: maybe_get_tz('dateutil/' + x) # as columns @@ -5273,7 +5470,7 @@ def test_append_with_timezones_dateutil(self): # select with tz aware expected = df[df.A >= df.A[3]] - result = store.select('df_tz', where=Term('A>=df.A[3]')) + result = store.select('df_tz', where='A>=df.A[3]') self._compare_with_tz(result, expected) # ensure we include dates in DST and STD time here. @@ -5292,7 +5489,7 @@ def test_append_with_timezones_dateutil(self): tz=gettz('US/Eastern')), B=Timestamp('20130102', tz=gettz('EET'))), index=range(5)) - self.assertRaises(ValueError, store.append, 'df_tz', df) + pytest.raises(ValueError, store.append, 'df_tz', df) # this is ok _maybe_remove(store, 'df_tz') @@ -5306,7 +5503,7 @@ def test_append_with_timezones_dateutil(self): tz=gettz('US/Eastern')), B=Timestamp('20130102', tz=gettz('CET'))), index=range(5)) - self.assertRaises(ValueError, store.append, 'df_tz', df) + pytest.raises(ValueError, store.append, 'df_tz', df) # as index with ensure_clean_store(self.path) as store: @@ -5344,7 +5541,7 @@ def test_append_with_timezones_pytz(self): # select with tz aware self._compare_with_tz(store.select( - 'df_tz', where=Term('A>=df.A[3]')), df[df.A >= df.A[3]]) + 'df_tz', where='A>=df.A[3]'), df[df.A >= df.A[3]]) _maybe_remove(store, 'df_tz') # ensure we include dates in DST and STD time here. 
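Editor's note: most hunks in this patch apply the same two mechanical conversions, so a reference sketch may help reviewers: `unittest`-style `self.assertRaises(Err, f, *args)` becomes `pytest.raises(Err, f, *args)` or its context-manager form, and `Term('...')` where-clauses become plain strings. A minimal, self-contained illustration of the two equivalent `pytest.raises` idioms (the `divide` helper is hypothetical, not part of this patch):

```python
import pytest


def divide(a, b):
    # hypothetical helper used only to trigger an exception
    return a / b


def test_divide_call_form():
    # call form: pass the callable and its arguments separately
    pytest.raises(ZeroDivisionError, divide, 1, 0)


def test_divide_context_form():
    # context-manager form: wrap the failing statement(s)
    with pytest.raises(ZeroDivisionError):
        divide(1, 0)
```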
@@ -5359,7 +5556,7 @@ def test_append_with_timezones_pytz(self): df = DataFrame(dict(A=Timestamp('20130102', tz='US/Eastern'), B=Timestamp('20130102', tz='EET')), index=range(5)) - self.assertRaises(ValueError, store.append, 'df_tz', df) + pytest.raises(ValueError, store.append, 'df_tz', df) # this is ok _maybe_remove(store, 'df_tz') @@ -5372,7 +5569,7 @@ def test_append_with_timezones_pytz(self): df = DataFrame(dict(A=Timestamp('20130102', tz='US/Eastern'), B=Timestamp('20130102', tz='CET')), index=range(5)) - self.assertRaises(ValueError, store.append, 'df_tz', df) + pytest.raises(ValueError, store.append, 'df_tz', df) # as index with ensure_clean_store(self.path) as store: @@ -5403,7 +5600,7 @@ def test_tseries_select_index_column(self): with ensure_clean_store(self.path) as store: store.append('frame', frame) result = store.select_column('frame', 'index') - self.assertEqual(rng.tz, DatetimeIndex(result.values).tz) + assert rng.tz == DatetimeIndex(result.values).tz # check utc rng = date_range('1/1/2000', '1/30/2000', tz='UTC') @@ -5412,7 +5609,7 @@ def test_tseries_select_index_column(self): with ensure_clean_store(self.path) as store: store.append('frame', frame) result = store.select_column('frame', 'index') - self.assertEqual(rng.tz, result.dt.tz) + assert rng.tz == result.dt.tz # double check non-utc rng = date_range('1/1/2000', '1/30/2000', tz='US/Eastern') @@ -5421,7 +5618,7 @@ def test_tseries_select_index_column(self): with ensure_clean_store(self.path) as store: store.append('frame', frame) result = store.select_column('frame', 'index') - self.assertEqual(rng.tz, result.dt.tz) + assert rng.tz == result.dt.tz def test_timezones_fixed(self): with ensure_clean_store(self.path) as store: @@ -5451,8 +5648,8 @@ def test_fixed_offset_tz(self): with ensure_clean_store(self.path) as store: store['frame'] = frame recons = store['frame'] - self.assert_index_equal(recons.index, rng) - self.assertEqual(rng.tz, recons.index.tz) + tm.assert_index_equal(recons.index, rng) + assert rng.tz == recons.index.tz def test_store_timezone(self): # GH2852 @@ -5507,18 +5704,3 @@ def test_dst_transitions(self): store.append('df', df) result = store.select('df') assert_frame_equal(result, df) - - -def _test_sort(obj): - if isinstance(obj, DataFrame): - return obj.reindex(sorted(obj.index)) - elif isinstance(obj, Panel): - return obj.reindex(major=sorted(obj.major_axis)) - else: - raise ValueError('type not supported here') - - -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/io/test_s3.py b/pandas/tests/io/test_s3.py new file mode 100644 index 0000000000000..8c2a32af33765 --- /dev/null +++ b/pandas/tests/io/test_s3.py @@ -0,0 +1,8 @@ +from pandas.io.common import _is_s3_url + + +class TestS3URL(object): + + def test_is_s3_url(self): + assert _is_s3_url("s3://pandas/somethingelse.com") + assert not _is_s3_url("s4://pandas/somethingelse.com") diff --git a/pandas/io/tests/test_sql.py b/pandas/tests/io/test_sql.py similarity index 81% rename from pandas/io/tests/test_sql.py rename to pandas/tests/io/test_sql.py index 9e639f7ef6057..a7c42391effe6 100644 --- a/pandas/io/tests/test_sql.py +++ b/pandas/tests/io/test_sql.py @@ -18,26 +18,26 @@ """ from __future__ import print_function -import unittest +from warnings import catch_warnings +import pytest import sqlite3 import csv import os -import sys -import nose import warnings import numpy as np import pandas as pd from datetime import datetime, date, time 
-from pandas.types.common import (is_object_dtype, is_datetime64_dtype, - is_datetime64tz_dtype) -from pandas import DataFrame, Series, Index, MultiIndex, isnull, concat +from pandas.core.dtypes.common import ( + is_object_dtype, is_datetime64_dtype, + is_datetime64tz_dtype) +from pandas import DataFrame, Series, Index, MultiIndex, isna, concat from pandas import date_range, to_datetime, to_timedelta, Timestamp import pandas.compat as compat -from pandas.compat import StringIO, range, lrange, string_types, PY36 -from pandas.tseries.tools import format as date_format +from pandas.compat import range, lrange, string_types, PY36 +from pandas.core.tools.datetimes import format as date_format import pandas.io.sql as sql from pandas.io.sql import read_sql_table, read_sql_query @@ -178,7 +178,7 @@ class MixInBase(object): - def tearDown(self): + def teardown_method(self, method): for tbl in self._get_all_tables(): self.drop_table(tbl) self._close_conn() @@ -236,7 +236,7 @@ def _close_conn(self): pass -class PandasSQLTest(unittest.TestCase): +class PandasSQLTest(object): """ Base class with common private methods for SQLAlchemy and fallback cases. @@ -271,8 +271,7 @@ def _check_iris_loaded_frame(self, iris_frame): pytype = iris_frame.dtypes[0].type row = iris_frame.iloc[0] - self.assertTrue( - issubclass(pytype, np.floating), 'Loaded frame has incorrect type') + assert issubclass(pytype, np.floating) tm.equalContents(row.values, [5.1, 3.5, 1.4, 0.2, 'Iris-setosa']) def _load_test1_data(self): @@ -371,8 +370,7 @@ def _to_sql(self): self.drop_table('test_frame1') self.pandasSQL.to_sql(self.test_frame1, 'test_frame1') - self.assertTrue(self.pandasSQL.has_table( - 'test_frame1'), 'Table not written to DB') + assert self.pandasSQL.has_table('test_frame1') # Nuke table self.drop_table('test_frame1') @@ -386,11 +384,10 @@ def _to_sql_fail(self): self.pandasSQL.to_sql( self.test_frame1, 'test_frame1', if_exists='fail') - self.assertTrue(self.pandasSQL.has_table( - 'test_frame1'), 'Table not written to DB') + assert self.pandasSQL.has_table('test_frame1') - self.assertRaises(ValueError, self.pandasSQL.to_sql, - self.test_frame1, 'test_frame1', if_exists='fail') + pytest.raises(ValueError, self.pandasSQL.to_sql, + self.test_frame1, 'test_frame1', if_exists='fail') self.drop_table('test_frame1') @@ -402,15 +399,12 @@ def _to_sql_replace(self): # Add to table again self.pandasSQL.to_sql( self.test_frame1, 'test_frame1', if_exists='replace') - self.assertTrue(self.pandasSQL.has_table( - 'test_frame1'), 'Table not written to DB') + assert self.pandasSQL.has_table('test_frame1') num_entries = len(self.test_frame1) num_rows = self._count_rows('test_frame1') - self.assertEqual( - num_rows, num_entries, "not the same number of rows as entries") - + assert num_rows == num_entries self.drop_table('test_frame1') def _to_sql_append(self): @@ -423,15 +417,12 @@ def _to_sql_append(self): # Add to table again self.pandasSQL.to_sql( self.test_frame1, 'test_frame1', if_exists='append') - self.assertTrue(self.pandasSQL.has_table( - 'test_frame1'), 'Table not written to DB') + assert self.pandasSQL.has_table('test_frame1') num_entries = 2 * len(self.test_frame1) num_rows = self._count_rows('test_frame1') - self.assertEqual( - num_rows, num_entries, "not the same number of rows as entries") - + assert num_rows == num_entries self.drop_table('test_frame1') def _roundtrip(self): @@ -458,7 +449,7 @@ def _to_sql_save_index(self): columns=['A', 'B', 'C'], index=['A']) self.pandasSQL.to_sql(df, 'test_to_sql_saves_index') ix_cols = 
self._get_index_columns('test_to_sql_saves_index') - self.assertEqual(ix_cols, [['A', ], ]) + assert ix_cols == [['A', ], ] def _transaction_test(self): self.pandasSQL.execute("CREATE TABLE test_trans (A INT, B TEXT)") @@ -474,13 +465,13 @@ def _transaction_test(self): # ignore raised exception pass res = self.pandasSQL.read_query('SELECT * FROM test_trans') - self.assertEqual(len(res), 0) + assert len(res) == 0 # Make sure when transaction is committed, rows do get inserted with self.pandasSQL.run_transaction() as trans: trans.execute(ins_sql) res2 = self.pandasSQL.read_query('SELECT * FROM test_trans') - self.assertEqual(len(res2), 1) + assert len(res2) == 1 # ----------------------------------------------------------------------------- @@ -506,7 +497,7 @@ class _TestSQLApi(PandasSQLTest): flavor = 'sqlite' mode = None - def setUp(self): + def setup_method(self, method): self.conn = self.connect() self._load_iris_data() self._load_iris_view() @@ -527,19 +518,15 @@ def test_read_sql_view(self): def test_to_sql(self): sql.to_sql(self.test_frame1, 'test_frame1', self.conn) - self.assertTrue( - sql.has_table('test_frame1', self.conn), - 'Table not written to DB') + assert sql.has_table('test_frame1', self.conn) def test_to_sql_fail(self): sql.to_sql(self.test_frame1, 'test_frame2', self.conn, if_exists='fail') - self.assertTrue( - sql.has_table('test_frame2', self.conn), - 'Table not written to DB') + assert sql.has_table('test_frame2', self.conn) - self.assertRaises(ValueError, sql.to_sql, self.test_frame1, - 'test_frame2', self.conn, if_exists='fail') + pytest.raises(ValueError, sql.to_sql, self.test_frame1, + 'test_frame2', self.conn, if_exists='fail') def test_to_sql_replace(self): sql.to_sql(self.test_frame1, 'test_frame3', @@ -547,15 +534,12 @@ def test_to_sql_replace(self): # Add to table again sql.to_sql(self.test_frame1, 'test_frame3', self.conn, if_exists='replace') - self.assertTrue( - sql.has_table('test_frame3', self.conn), - 'Table not written to DB') + assert sql.has_table('test_frame3', self.conn) num_entries = len(self.test_frame1) num_rows = self._count_rows('test_frame3') - self.assertEqual( - num_rows, num_entries, "not the same number of rows as entries") + assert num_rows == num_entries def test_to_sql_append(self): sql.to_sql(self.test_frame1, 'test_frame4', @@ -564,15 +548,12 @@ def test_to_sql_append(self): # Add to table again sql.to_sql(self.test_frame1, 'test_frame4', self.conn, if_exists='append') - self.assertTrue( - sql.has_table('test_frame4', self.conn), - 'Table not written to DB') + assert sql.has_table('test_frame4', self.conn) num_entries = 2 * len(self.test_frame1) num_rows = self._count_rows('test_frame4') - self.assertEqual( - num_rows, num_entries, "not the same number of rows as entries") + assert num_rows == num_entries def test_to_sql_type_mapping(self): sql.to_sql(self.test_frame3, 'test_frame5', self.conn, index=False) @@ -587,8 +568,9 @@ def test_to_sql_series(self): tm.assert_frame_equal(s.to_frame(), s2) def test_to_sql_panel(self): - panel = tm.makePanel() - self.assertRaises(NotImplementedError, sql.to_sql, panel, + with catch_warnings(record=True): + panel = tm.makePanel() + pytest.raises(NotImplementedError, sql.to_sql, panel, 'test_panel', self.conn) def test_roundtrip(self): @@ -623,33 +605,25 @@ def test_date_parsing(self): # Test date parsing in read_sql # No Parsing df = sql.read_sql_query("SELECT * FROM types_test_data", self.conn) - self.assertFalse( - issubclass(df.DateCol.dtype.type, np.datetime64), - "DateCol loaded with incorrect
type") + assert not issubclass(df.DateCol.dtype.type, np.datetime64) df = sql.read_sql_query("SELECT * FROM types_test_data", self.conn, parse_dates=['DateCol']) - self.assertTrue( - issubclass(df.DateCol.dtype.type, np.datetime64), - "DateCol loaded with incorrect type") + assert issubclass(df.DateCol.dtype.type, np.datetime64) df = sql.read_sql_query("SELECT * FROM types_test_data", self.conn, parse_dates={'DateCol': '%Y-%m-%d %H:%M:%S'}) - self.assertTrue( - issubclass(df.DateCol.dtype.type, np.datetime64), - "DateCol loaded with incorrect type") + assert issubclass(df.DateCol.dtype.type, np.datetime64) df = sql.read_sql_query("SELECT * FROM types_test_data", self.conn, parse_dates=['IntDateCol']) - self.assertTrue(issubclass(df.IntDateCol.dtype.type, np.datetime64), - "IntDateCol loaded with incorrect type") + assert issubclass(df.IntDateCol.dtype.type, np.datetime64) df = sql.read_sql_query("SELECT * FROM types_test_data", self.conn, parse_dates={'IntDateCol': 's'}) - self.assertTrue(issubclass(df.IntDateCol.dtype.type, np.datetime64), - "IntDateCol loaded with incorrect type") + assert issubclass(df.IntDateCol.dtype.type, np.datetime64) def test_date_and_index(self): # Test case where same column appears in parse_date and index_col @@ -658,11 +632,8 @@ def test_date_and_index(self): index_col='DateCol', parse_dates=['DateCol', 'IntDateCol']) - self.assertTrue(issubclass(df.index.dtype.type, np.datetime64), - "DateCol loaded with incorrect type") - - self.assertTrue(issubclass(df.IntDateCol.dtype.type, np.datetime64), - "IntDateCol loaded with incorrect type") + assert issubclass(df.index.dtype.type, np.datetime64) + assert issubclass(df.IntDateCol.dtype.type, np.datetime64) def test_timedelta(self): @@ -677,7 +648,7 @@ def test_timedelta(self): def test_complex(self): df = DataFrame({'a': [1 + 1j, 2j]}) # Complex data type should raise error - self.assertRaises(ValueError, df.to_sql, 'test_complex', self.conn) + pytest.raises(ValueError, df.to_sql, 'test_complex', self.conn) def test_to_sql_index_label(self): temp_frame = DataFrame({'col1': range(4)}) @@ -685,29 +656,39 @@ def test_to_sql_index_label(self): # no index name, defaults to 'index' sql.to_sql(temp_frame, 'test_index_label', self.conn) frame = sql.read_sql_query('SELECT * FROM test_index_label', self.conn) - self.assertEqual(frame.columns[0], 'index') + assert frame.columns[0] == 'index' # specifying index_label sql.to_sql(temp_frame, 'test_index_label', self.conn, if_exists='replace', index_label='other_label') frame = sql.read_sql_query('SELECT * FROM test_index_label', self.conn) - self.assertEqual(frame.columns[0], 'other_label', - "Specified index_label not written to database") + assert frame.columns[0] == "other_label" # using the index name temp_frame.index.name = 'index_name' sql.to_sql(temp_frame, 'test_index_label', self.conn, if_exists='replace') frame = sql.read_sql_query('SELECT * FROM test_index_label', self.conn) - self.assertEqual(frame.columns[0], 'index_name', - "Index name not written to database") + assert frame.columns[0] == "index_name" # has index name, but specifying index_label sql.to_sql(temp_frame, 'test_index_label', self.conn, if_exists='replace', index_label='other_label') frame = sql.read_sql_query('SELECT * FROM test_index_label', self.conn) - self.assertEqual(frame.columns[0], 'other_label', - "Specified index_label not written to database") + assert frame.columns[0] == "other_label" + + # index name is integer + temp_frame.index.name = 0 + sql.to_sql(temp_frame, 'test_index_label', 
self.conn, + if_exists='replace') + frame = sql.read_sql_query('SELECT * FROM test_index_label', self.conn) + assert frame.columns[0] == "0" + + temp_frame.index.name = None + sql.to_sql(temp_frame, 'test_index_label', self.conn, + if_exists='replace', index_label=0) + frame = sql.read_sql_query('SELECT * FROM test_index_label', self.conn) + assert frame.columns[0] == "0" def test_to_sql_index_label_multiindex(self): temp_frame = DataFrame({'col1': range(4)}, @@ -717,35 +698,32 @@ def test_to_sql_index_label_multiindex(self): # no index name, defaults to 'level_0' and 'level_1' sql.to_sql(temp_frame, 'test_index_label', self.conn) frame = sql.read_sql_query('SELECT * FROM test_index_label', self.conn) - self.assertEqual(frame.columns[0], 'level_0') - self.assertEqual(frame.columns[1], 'level_1') + assert frame.columns[0] == 'level_0' + assert frame.columns[1] == 'level_1' # specifying index_label sql.to_sql(temp_frame, 'test_index_label', self.conn, if_exists='replace', index_label=['A', 'B']) frame = sql.read_sql_query('SELECT * FROM test_index_label', self.conn) - self.assertEqual(frame.columns[:2].tolist(), ['A', 'B'], - "Specified index_labels not written to database") + assert frame.columns[:2].tolist() == ['A', 'B'] # using the index name temp_frame.index.names = ['A', 'B'] sql.to_sql(temp_frame, 'test_index_label', self.conn, if_exists='replace') frame = sql.read_sql_query('SELECT * FROM test_index_label', self.conn) - self.assertEqual(frame.columns[:2].tolist(), ['A', 'B'], - "Index names not written to database") + assert frame.columns[:2].tolist() == ['A', 'B'] # has index name, but specifying index_label sql.to_sql(temp_frame, 'test_index_label', self.conn, if_exists='replace', index_label=['C', 'D']) frame = sql.read_sql_query('SELECT * FROM test_index_label', self.conn) - self.assertEqual(frame.columns[:2].tolist(), ['C', 'D'], - "Specified index_labels not written to database") + assert frame.columns[:2].tolist() == ['C', 'D'] # wrong length of index_label - self.assertRaises(ValueError, sql.to_sql, temp_frame, - 'test_index_label', self.conn, if_exists='replace', - index_label='C') + pytest.raises(ValueError, sql.to_sql, temp_frame, + 'test_index_label', self.conn, if_exists='replace', + index_label='C') def test_multiindex_roundtrip(self): df = DataFrame.from_records([(1, 2.1, 'line1'), (2, 1.5, 'line2')], @@ -763,27 +741,27 @@ def test_integer_col_names(self): def test_get_schema(self): create_sql = sql.get_schema(self.test_frame1, 'test', con=self.conn) - self.assertTrue('CREATE' in create_sql) + assert 'CREATE' in create_sql def test_get_schema_dtypes(self): float_frame = DataFrame({'a': [1.1, 1.2], 'b': [2.1, 2.2]}) dtype = sqlalchemy.Integer if self.mode == 'sqlalchemy' else 'INTEGER' create_sql = sql.get_schema(float_frame, 'test', con=self.conn, dtype={'b': dtype}) - self.assertTrue('CREATE' in create_sql) - self.assertTrue('INTEGER' in create_sql) + assert 'CREATE' in create_sql + assert 'INTEGER' in create_sql def test_get_schema_keys(self): frame = DataFrame({'Col1': [1.1, 1.2], 'Col2': [2.1, 2.2]}) create_sql = sql.get_schema(frame, 'test', con=self.conn, keys='Col1') constraint_sentence = 'CONSTRAINT test_pk PRIMARY KEY ("Col1")' - self.assertTrue(constraint_sentence in create_sql) + assert constraint_sentence in create_sql # multiple columns as key (GH10385) create_sql = sql.get_schema(self.test_frame1, 'test', con=self.conn, keys=['A', 'B']) constraint_sentence = 'CONSTRAINT test_pk PRIMARY KEY ("A", "B")' - self.assertTrue(constraint_sentence in create_sql) + 
assert constraint_sentence in create_sql def test_chunksize_read(self): df = DataFrame(np.random.randn(22, 5), columns=list('abcde')) @@ -800,7 +778,7 @@ def test_chunksize_read(self): for chunk in sql.read_sql_query("select * from test_chunksize", self.conn, chunksize=5): res2 = concat([res2, chunk], ignore_index=True) - self.assertEqual(len(chunk), sizes[i]) + assert len(chunk) == sizes[i] i += 1 tm.assert_frame_equal(res1, res2) @@ -814,7 +792,7 @@ def test_chunksize_read(self): for chunk in sql.read_sql_table("test_chunksize", self.conn, chunksize=5): res3 = concat([res3, chunk], ignore_index=True) - self.assertEqual(len(chunk), sizes[i]) + assert len(chunk) == sizes[i] i += 1 tm.assert_frame_equal(res1, res3) @@ -838,7 +816,18 @@ def test_unicode_column_name(self): df = DataFrame([[1, 2], [3, 4]], columns=[u'\xe9', u'b']) df.to_sql('test_unicode', self.conn, index=False) + def test_escaped_table_name(self): + # GH 13206 + df = DataFrame({'A': [0, 1, 2], 'B': [0.2, np.nan, 5.6]}) + df.to_sql('d1187b08-4943-4c8d-a7f6', self.conn, index=False) + + res = sql.read_sql_query('SELECT * FROM `d1187b08-4943-4c8d-a7f6`', + self.conn) + tm.assert_frame_equal(res, df) + + +@pytest.mark.single class TestSQLApi(SQLAlchemyMixIn, _TestSQLApi): """ Test the public API as it would be used directly @@ -854,7 +843,7 @@ def connect(self): if SQLALCHEMY_INSTALLED: return sqlalchemy.create_engine('sqlite:///:memory:') else: - raise nose.SkipTest('SQLAlchemy not installed') + pytest.skip('SQLAlchemy not installed') def test_read_table_columns(self): # test columns argument in read_table @@ -862,29 +851,24 @@ def test_read_table_columns(self): cols = ['A', 'B'] result = sql.read_sql_table('test_frame', self.conn, columns=cols) - self.assertEqual(result.columns.tolist(), cols, - "Columns not correctly selected") + assert result.columns.tolist() == cols def test_read_table_index_col(self): # test index_col argument in read_table sql.to_sql(self.test_frame1, 'test_frame', self.conn) result = sql.read_sql_table('test_frame', self.conn, index_col="index") - self.assertEqual(result.index.names, ["index"], - "index_col not correctly set") + assert result.index.names == ["index"] result = sql.read_sql_table( 'test_frame', self.conn, index_col=["A", "B"]) - self.assertEqual(result.index.names, ["A", "B"], - "index_col not correctly set") + assert result.index.names == ["A", "B"] result = sql.read_sql_table('test_frame', self.conn, index_col=["A", "B"], columns=["C", "D"]) - self.assertEqual(result.index.names, ["A", "B"], - "index_col not correctly set") - self.assertEqual(result.columns.tolist(), ["C", "D"], - "columns not set correctly whith index_col") + assert result.index.names == ["A", "B"] + assert result.columns.tolist() == ["C", "D"] def test_read_sql_delegate(self): iris_frame1 = sql.read_sql_query( @@ -911,10 +895,11 @@ def test_not_reflect_all_tables(self): sql.read_sql_table('other_table', self.conn) sql.read_sql_query('SELECT * FROM other_table', self.conn) # Verify some things - self.assertEqual(len(w), 0, "Warning triggered for other table") + assert len(w) == 0 def test_warning_case_insensitive_table_name(self): - # see GH7815. + # see gh-7815 + # + # We can't test that this warning is triggered, as the database # configuration would have to be altered. But here we test that # the warning is certainly NOT triggered in a normal case.
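Editor's note: the `assert len(w) == 0` checks above lean on the stdlib recording idiom that this patch also substitutes for `tm.assert_produces_warning` elsewhere. A minimal sketch of how `catch_warnings(record=True)` captures, or proves the absence of, warnings; the `might_warn` helper is hypothetical:

```python
import warnings


def might_warn(trigger):
    # hypothetical helper: warns only when asked to
    if trigger:
        warnings.warn("something happened", UserWarning)


def test_no_warning_emitted():
    with warnings.catch_warnings(record=True) as w:
        # "always" ensures every warning is recorded, not deduplicated
        warnings.simplefilter("always")
        might_warn(False)
    # the recorded list stays empty when the code under test is silent
    assert len(w) == 0
```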
@@ -924,8 +909,7 @@ def test_warning_case_insensitive_table_name(self): # This should not trigger a Warning self.test_frame1.to_sql('CaseSensitive', self.conn) # Verify some things - self.assertEqual( - len(w), 0, "Warning triggered for writing a table") + assert len(w) == 0 def _get_index_columns(self, tbl_name): from sqlalchemy.engine import reflection @@ -941,8 +925,7 @@ def test_sqlalchemy_type_mapping(self): utc=True)}) db = sql.SQLDatabase(self.conn) table = sql.SQLTable("test_type", db, frame=df) - self.assertTrue(isinstance( - table.table.c['time'].type, sqltypes.DateTime)) + assert isinstance(table.table.c['time'].type, sqltypes.DateTime) def test_database_uri_string(self): @@ -965,8 +948,15 @@ def test_database_uri_string(self): # using driver that will not be installed on Travis to trigger error # in sqlalchemy.create_engine -> test passing of this error to user + try: + # the rest of this test depends on pg8000's being absent + import pg8000 # noqa + pytest.skip("pg8000 is installed") + except ImportError: + pass + db_uri = "postgresql+pg8000://user:pass@host/dbname" - with tm.assertRaisesRegexp(ImportError, "pg8000"): + with tm.assert_raises_regex(ImportError, "pg8000"): sql.read_sql("select * from table", db_uri) def _make_iris_table_metadata(self): @@ -988,7 +978,7 @@ def test_query_by_text_obj(self): iris_df = sql.read_sql(name_text, self.conn, params={ 'name': 'Iris-versicolor'}) all_names = set(iris_df['Name']) - self.assertEqual(all_names, set(['Iris-versicolor'])) + assert all_names == set(['Iris-versicolor']) def test_query_by_select_obj(self): # WIP : GH10846 @@ -999,7 +989,7 @@ def test_query_by_select_obj(self): iris_df = sql.read_sql(name_select, self.conn, params={'name': 'Iris-setosa'}) all_names = set(iris_df['Name']) - self.assertEqual(all_names, set(['Iris-setosa'])) + assert all_names == set(['Iris-setosa']) class _EngineToConnMixin(object): @@ -1007,8 +997,8 @@ class _EngineToConnMixin(object): A mixin that causes setup_connect to create a conn rather than an engine. 
""" - def setUp(self): - super(_EngineToConnMixin, self).setUp() + def setup_method(self, method): + super(_EngineToConnMixin, self).setup_method(method) engine = self.conn conn = engine.connect() self.__tx = conn.begin() @@ -1016,18 +1006,20 @@ def setUp(self): self.__engine = engine self.conn = conn - def tearDown(self): + def teardown_method(self, method): self.__tx.rollback() self.conn.close() self.conn = self.__engine self.pandasSQL = sql.SQLDatabase(self.__engine) - super(_EngineToConnMixin, self).tearDown() + super(_EngineToConnMixin, self).teardown_method(method) +@pytest.mark.single class TestSQLApiConn(_EngineToConnMixin, TestSQLApi): pass +@pytest.mark.single class TestSQLiteFallbackApi(SQLiteMixIn, _TestSQLApi): """ Test the public sqlite connection fallback API @@ -1060,17 +1052,17 @@ def test_sql_open_close(self): def test_con_string_import_error(self): if not SQLALCHEMY_INSTALLED: conn = 'mysql://root@localhost/pandas_nosetest' - self.assertRaises(ImportError, sql.read_sql, "SELECT * FROM iris", - conn) + pytest.raises(ImportError, sql.read_sql, "SELECT * FROM iris", + conn) else: - raise nose.SkipTest('SQLAlchemy is installed') + pytest.skip('SQLAlchemy is installed') def test_read_sql_delegate(self): iris_frame1 = sql.read_sql_query("SELECT * FROM iris", self.conn) iris_frame2 = sql.read_sql("SELECT * FROM iris", self.conn) tm.assert_frame_equal(iris_frame1, iris_frame2) - self.assertRaises(sql.DatabaseError, sql.read_sql, 'iris', self.conn) + pytest.raises(sql.DatabaseError, sql.read_sql, 'iris', self.conn) def test_safe_names_warning(self): # GH 6798 @@ -1082,7 +1074,7 @@ def test_safe_names_warning(self): def test_get_schema2(self): # without providing a connection object (available for backwards comp) create_sql = sql.get_schema(self.test_frame1, 'test') - self.assertTrue('CREATE' in create_sql) + assert 'CREATE' in create_sql def _get_sqlite_column_type(self, schema, column): @@ -1099,8 +1091,7 @@ def test_sqlite_type_mapping(self): db = sql.SQLiteDatabase(self.conn) table = sql.SQLiteTable("test_type", db, frame=df) schema = table.sql_schema() - self.assertEqual(self._get_sqlite_column_type(schema, 'time'), - "TIMESTAMP") + assert self._get_sqlite_column_type(schema, 'time') == "TIMESTAMP" # ----------------------------------------------------------------------------- @@ -1118,7 +1109,7 @@ class _TestSQLAlchemy(SQLAlchemyMixIn, PandasSQLTest): flavor = None @classmethod - def setUpClass(cls): + def setup_class(cls): cls.setup_import() cls.setup_driver() @@ -1128,9 +1119,9 @@ def setUpClass(cls): conn.connect() except sqlalchemy.exc.OperationalError: msg = "{0} - can't connect to {1} server".format(cls, cls.flavor) - raise nose.SkipTest(msg) + pytest.skip(msg) - def setUp(self): + def setup_method(self, method): self.setup_connect() self._load_iris_data() @@ -1141,7 +1132,7 @@ def setUp(self): def setup_import(cls): # Skip this test if SQLAlchemy not available if not SQLALCHEMY_INSTALLED: - raise nose.SkipTest('SQLAlchemy not installed') + pytest.skip('SQLAlchemy not installed') @classmethod def setup_driver(cls): @@ -1158,7 +1149,7 @@ def setup_connect(self): # to test if connection can be made: self.conn.connect() except sqlalchemy.exc.OperationalError: - raise nose.SkipTest( + pytest.skip( "Can't connect to {0} server".format(self.flavor)) def test_aread_sql(self): @@ -1193,8 +1184,7 @@ def test_create_table(self): pandasSQL = sql.SQLDatabase(temp_conn) pandasSQL.to_sql(temp_frame, 'temp_frame') - self.assertTrue( - temp_conn.has_table('temp_frame'), 'Table not 
written to DB') + assert temp_conn.has_table('temp_frame') def test_drop_table(self): temp_conn = self.connect() @@ -1205,13 +1195,11 @@ def test_drop_table(self): pandasSQL = sql.SQLDatabase(temp_conn) pandasSQL.to_sql(temp_frame, 'temp_frame') - self.assertTrue( - temp_conn.has_table('temp_frame'), 'Table not written to DB') + assert temp_conn.has_table('temp_frame') pandasSQL.drop_table('temp_frame') - self.assertFalse( - temp_conn.has_table('temp_frame'), 'Table not deleted from DB') + assert not temp_conn.has_table('temp_frame') def test_roundtrip(self): self._roundtrip() @@ -1230,25 +1218,20 @@ def test_read_table_columns(self): iris_frame.columns.values, ['SepalLength', 'SepalLength']) def test_read_table_absent(self): - self.assertRaises( + pytest.raises( ValueError, sql.read_sql_table, "this_doesnt_exist", con=self.conn) def test_default_type_conversion(self): df = sql.read_sql_table("types_test_data", self.conn) - self.assertTrue(issubclass(df.FloatCol.dtype.type, np.floating), - "FloatCol loaded with incorrect type") - self.assertTrue(issubclass(df.IntCol.dtype.type, np.integer), - "IntCol loaded with incorrect type") - self.assertTrue(issubclass(df.BoolCol.dtype.type, np.bool_), - "BoolCol loaded with incorrect type") + assert issubclass(df.FloatCol.dtype.type, np.floating) + assert issubclass(df.IntCol.dtype.type, np.integer) + assert issubclass(df.BoolCol.dtype.type, np.bool_) # Int column with NA values stays as float - self.assertTrue(issubclass(df.IntColWithNull.dtype.type, np.floating), - "IntColWithNull loaded with incorrect type") + assert issubclass(df.IntColWithNull.dtype.type, np.floating) # Bool column with NA values becomes object - self.assertTrue(issubclass(df.BoolColWithNull.dtype.type, np.object), - "BoolColWithNull loaded with incorrect type") + assert issubclass(df.BoolColWithNull.dtype.type, np.object) def test_bigint(self): # int64 should be converted to BigInteger, GH7433 @@ -1263,8 +1246,7 @@ def test_default_date_load(self): # IMPORTANT - sqlite has no native date type, so shouldn't parse, but # MySQL SHOULD be converted. 
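The dtype assertions above encode a general rule worth spelling out before the date-handling checks continue below: SQL NULLs force upcasting, because NaN is a float and a missing boolean needs object dtype. A hedged sketch of the integer case, with illustrative column names (sqlite stands in for the SQLAlchemy-backed fixtures):

```python
import sqlite3

import numpy as np
import pandas as pd

conn = sqlite3.connect(':memory:')
pd.DataFrame({'IntCol': [1, 2],
              'IntColWithNull': [1, None]}).to_sql('types_demo', conn,
                                                   index=False)

out = pd.read_sql_query('SELECT * FROM types_demo', conn)
assert issubclass(out['IntCol'].dtype.type, np.integer)
# a NULL anywhere in the column forces float64, since NaN is a float
assert issubclass(out['IntColWithNull'].dtype.type, np.floating)
```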
- self.assertTrue(issubclass(df.DateCol.dtype.type, np.datetime64), - "DateCol loaded with incorrect type") + assert issubclass(df.DateCol.dtype.type, np.datetime64) def test_datetime_with_timezone(self): # edge case that converts postgresql datetime with time zone types @@ -1278,24 +1260,22 @@ def check(col): # "2000-01-01 00:00:00-08:00" should convert to # "2000-01-01 08:00:00" - self.assertEqual(col[0], Timestamp('2000-01-01 08:00:00')) + assert col[0] == Timestamp('2000-01-01 08:00:00') # "2000-06-01 00:00:00-07:00" should convert to # "2000-06-01 07:00:00" - self.assertEqual(col[1], Timestamp('2000-06-01 07:00:00')) + assert col[1] == Timestamp('2000-06-01 07:00:00') elif is_datetime64tz_dtype(col.dtype): - self.assertTrue(str(col.dt.tz) == 'UTC') + assert str(col.dt.tz) == 'UTC' # "2000-01-01 00:00:00-08:00" should convert to # "2000-01-01 08:00:00" - self.assertEqual(col[0], Timestamp( - '2000-01-01 08:00:00', tz='UTC')) + assert col[0] == Timestamp('2000-01-01 08:00:00', tz='UTC') # "2000-06-01 00:00:00-07:00" should convert to # "2000-06-01 07:00:00" - self.assertEqual(col[1], Timestamp( - '2000-06-01 07:00:00', tz='UTC')) + assert col[1] == Timestamp('2000-06-01 07:00:00', tz='UTC') else: raise AssertionError("DateCol loaded with incorrect type " @@ -1304,32 +1284,28 @@ def check(col): # GH11216 df = pd.read_sql_query("select * from types_test_data", self.conn) if not hasattr(df, 'DateColWithTz'): - raise nose.SkipTest("no column with datetime with time zone") + pytest.skip("no column with datetime with time zone") # this is parsed on Travis (linux), but not on macosx for some reason # even with the same versions of psycopg2 & sqlalchemy, possibly a # Postgrsql server version difference col = df.DateColWithTz - self.assertTrue(is_object_dtype(col.dtype) or - is_datetime64_dtype(col.dtype) or - is_datetime64tz_dtype(col.dtype), - "DateCol loaded with incorrect type -> {0}" - .format(col.dtype)) + assert (is_object_dtype(col.dtype) or + is_datetime64_dtype(col.dtype) or + is_datetime64tz_dtype(col.dtype)) df = pd.read_sql_query("select * from types_test_data", self.conn, parse_dates=['DateColWithTz']) if not hasattr(df, 'DateColWithTz'): - raise nose.SkipTest("no column with datetime with time zone") + pytest.skip("no column with datetime with time zone") check(df.DateColWithTz) df = pd.concat(list(pd.read_sql_query("select * from types_test_data", self.conn, chunksize=1)), ignore_index=True) col = df.DateColWithTz - self.assertTrue(is_datetime64tz_dtype(col.dtype), - "DateCol loaded with incorrect type -> {0}" - .format(col.dtype)) - self.assertTrue(str(col.dt.tz) == 'UTC') + assert is_datetime64tz_dtype(col.dtype) + assert str(col.dt.tz) == 'UTC' expected = sql.read_sql_table("types_test_data", self.conn) tm.assert_series_equal(df.DateColWithTz, expected.DateColWithTz @@ -1346,33 +1322,27 @@ def test_date_parsing(self): df = sql.read_sql_table("types_test_data", self.conn, parse_dates=['DateCol']) - self.assertTrue(issubclass(df.DateCol.dtype.type, np.datetime64), - "DateCol loaded with incorrect type") + assert issubclass(df.DateCol.dtype.type, np.datetime64) df = sql.read_sql_table("types_test_data", self.conn, parse_dates={'DateCol': '%Y-%m-%d %H:%M:%S'}) - self.assertTrue(issubclass(df.DateCol.dtype.type, np.datetime64), - "DateCol loaded with incorrect type") + assert issubclass(df.DateCol.dtype.type, np.datetime64) df = sql.read_sql_table("types_test_data", self.conn, parse_dates={ 'DateCol': {'format': '%Y-%m-%d %H:%M:%S'}}) - 
self.assertTrue(issubclass(df.DateCol.dtype.type, np.datetime64), - "IntDateCol loaded with incorrect type") + assert issubclass(df.DateCol.dtype.type, np.datetime64) df = sql.read_sql_table( "types_test_data", self.conn, parse_dates=['IntDateCol']) - self.assertTrue(issubclass(df.IntDateCol.dtype.type, np.datetime64), - "IntDateCol loaded with incorrect type") + assert issubclass(df.IntDateCol.dtype.type, np.datetime64) df = sql.read_sql_table( "types_test_data", self.conn, parse_dates={'IntDateCol': 's'}) - self.assertTrue(issubclass(df.IntDateCol.dtype.type, np.datetime64), - "IntDateCol loaded with incorrect type") + assert issubclass(df.IntDateCol.dtype.type, np.datetime64) df = sql.read_sql_table("types_test_data", self.conn, parse_dates={'IntDateCol': {'unit': 's'}}) - self.assertTrue(issubclass(df.IntDateCol.dtype.type, np.datetime64), - "IntDateCol loaded with incorrect type") + assert issubclass(df.IntDateCol.dtype.type, np.datetime64) def test_datetime(self): df = DataFrame({'A': date_range('2013-01-01 09:00:00', periods=3), @@ -1388,7 +1358,7 @@ def test_datetime(self): result = sql.read_sql_query('SELECT * FROM test_datetime', self.conn) result = result.drop('index', axis=1) if self.flavor == 'sqlite': - self.assertTrue(isinstance(result.loc[0, 'A'], string_types)) + assert isinstance(result.loc[0, 'A'], string_types) result['A'] = to_datetime(result['A']) tm.assert_frame_equal(result, df) else: @@ -1407,7 +1377,7 @@ def test_datetime_NaT(self): # with read_sql -> no type information -> sqlite has no native result = sql.read_sql_query('SELECT * FROM test_datetime', self.conn) if self.flavor == 'sqlite': - self.assertTrue(isinstance(result.loc[0, 'A'], string_types)) + assert isinstance(result.loc[0, 'A'], string_types) result['A'] = to_datetime(result['A'], errors='coerce') tm.assert_frame_equal(result, df) else: @@ -1540,16 +1510,16 @@ def test_dtype(self): meta = sqlalchemy.schema.MetaData(bind=self.conn) meta.reflect() sqltype = meta.tables['dtype_test2'].columns['B'].type - self.assertTrue(isinstance(sqltype, sqlalchemy.TEXT)) - self.assertRaises(ValueError, df.to_sql, - 'error', self.conn, dtype={'B': str}) + assert isinstance(sqltype, sqlalchemy.TEXT) + pytest.raises(ValueError, df.to_sql, + 'error', self.conn, dtype={'B': str}) # GH9083 df.to_sql('dtype_test3', self.conn, dtype={'B': sqlalchemy.String(10)}) meta.reflect() sqltype = meta.tables['dtype_test3'].columns['B'].type - self.assertTrue(isinstance(sqltype, sqlalchemy.String)) - self.assertEqual(sqltype.length, 10) + assert isinstance(sqltype, sqlalchemy.String) + assert sqltype.length == 10 # single dtype df.to_sql('single_dtype_test', self.conn, dtype=sqlalchemy.TEXT) @@ -1557,10 +1527,10 @@ def test_dtype(self): meta.reflect() sqltypea = meta.tables['single_dtype_test'].columns['A'].type sqltypeb = meta.tables['single_dtype_test'].columns['B'].type - self.assertTrue(isinstance(sqltypea, sqlalchemy.TEXT)) - self.assertTrue(isinstance(sqltypeb, sqlalchemy.TEXT)) + assert isinstance(sqltypea, sqlalchemy.TEXT) + assert isinstance(sqltypeb, sqlalchemy.TEXT) - def test_notnull_dtype(self): + def test_notna_dtype(self): cols = {'Bool': Series([True, None]), 'Date': Series([datetime(2012, 5, 1), None]), 'Int': Series([1, None], dtype='object'), @@ -1568,7 +1538,7 @@ def test_notnull_dtype(self): } df = DataFrame(cols) - tbl = 'notnull_dtype_test' + tbl = 'notna_dtype_test' df.to_sql(tbl, self.conn) returned_df = sql.read_sql_table(tbl, self.conn) # noqa meta = sqlalchemy.schema.MetaData(bind=self.conn) @@ -1580,10 
+1550,10 @@ def test_notnull_dtype(self): col_dict = meta.tables[tbl].columns - self.assertTrue(isinstance(col_dict['Bool'].type, my_type)) - self.assertTrue(isinstance(col_dict['Date'].type, sqltypes.DateTime)) - self.assertTrue(isinstance(col_dict['Int'].type, sqltypes.Integer)) - self.assertTrue(isinstance(col_dict['Float'].type, sqltypes.Float)) + assert isinstance(col_dict['Bool'].type, my_type) + assert isinstance(col_dict['Date'].type, sqltypes.DateTime) + assert isinstance(col_dict['Int'].type, sqltypes.Integer) + assert isinstance(col_dict['Float'].type, sqltypes.Float) def test_double_precision(self): V = 1.23456789101112131415 @@ -1600,19 +1570,18 @@ def test_double_precision(self): res = sql.read_sql_table('test_dtypes', self.conn) # check precision of float64 - self.assertEqual(np.round(df['f64'].iloc[0], 14), - np.round(res['f64'].iloc[0], 14)) + assert (np.round(df['f64'].iloc[0], 14) == + np.round(res['f64'].iloc[0], 14)) # check sql types meta = sqlalchemy.schema.MetaData(bind=self.conn) meta.reflect() col_dict = meta.tables['test_dtypes'].columns - self.assertEqual(str(col_dict['f32'].type), - str(col_dict['f64_as_f32'].type)) - self.assertTrue(isinstance(col_dict['f32'].type, sqltypes.Float)) - self.assertTrue(isinstance(col_dict['f64'].type, sqltypes.Float)) - self.assertTrue(isinstance(col_dict['i32'].type, sqltypes.Integer)) - self.assertTrue(isinstance(col_dict['i64'].type, sqltypes.BigInteger)) + assert str(col_dict['f32'].type) == str(col_dict['f64_as_f32'].type) + assert isinstance(col_dict['f32'].type, sqltypes.Float) + assert isinstance(col_dict['f64'].type, sqltypes.Float) + assert isinstance(col_dict['i32'].type, sqltypes.Integer) + assert isinstance(col_dict['i64'].type, sqltypes.BigInteger) def test_connectable_issue_example(self): # This tests the example raised in issue @@ -1665,7 +1634,7 @@ class Temporary(Base): class _TestSQLAlchemyConn(_EngineToConnMixin, _TestSQLAlchemy): def test_transactions(self): - raise nose.SkipTest( + pytest.skip( "Nested transactions rollbacks don't work with Pandas") @@ -1688,27 +1657,23 @@ def setup_driver(cls): def test_default_type_conversion(self): df = sql.read_sql_table("types_test_data", self.conn) - self.assertTrue(issubclass(df.FloatCol.dtype.type, np.floating), - "FloatCol loaded with incorrect type") - self.assertTrue(issubclass(df.IntCol.dtype.type, np.integer), - "IntCol loaded with incorrect type") + assert issubclass(df.FloatCol.dtype.type, np.floating) + assert issubclass(df.IntCol.dtype.type, np.integer) + # sqlite has no boolean type, so integer type is returned - self.assertTrue(issubclass(df.BoolCol.dtype.type, np.integer), - "BoolCol loaded with incorrect type") + assert issubclass(df.BoolCol.dtype.type, np.integer) # Int column with NA values stays as float - self.assertTrue(issubclass(df.IntColWithNull.dtype.type, np.floating), - "IntColWithNull loaded with incorrect type") + assert issubclass(df.IntColWithNull.dtype.type, np.floating) + # Non-native Bool column with NA values stays as float - self.assertTrue(issubclass(df.BoolColWithNull.dtype.type, np.floating), - "BoolColWithNull loaded with incorrect type") + assert issubclass(df.BoolColWithNull.dtype.type, np.floating) def test_default_date_load(self): df = sql.read_sql_table("types_test_data", self.conn) # IMPORTANT - sqlite has no native date type, so shouldn't parse, but - self.assertFalse(issubclass(df.DateCol.dtype.type, np.datetime64), - "DateCol loaded with incorrect type") + assert not issubclass(df.DateCol.dtype.type, np.datetime64) def 
test_bigint_warning(self): # test no warning for BIGINT (to support int64) is raised (GH7433) @@ -1718,7 +1683,7 @@ def test_bigint_warning(self): with warnings.catch_warnings(record=True) as w: warnings.simplefilter("always") sql.read_sql_table('test_bigintwarning', self.conn) - self.assertEqual(len(w), 0, "Warning triggered for other table") + assert len(w) == 0 class _TestMySQLAlchemy(object): @@ -1739,25 +1704,22 @@ def setup_driver(cls): import pymysql # noqa cls.driver = 'pymysql' except ImportError: - raise nose.SkipTest('pymysql not installed') + pytest.skip('pymysql not installed') def test_default_type_conversion(self): df = sql.read_sql_table("types_test_data", self.conn) - self.assertTrue(issubclass(df.FloatCol.dtype.type, np.floating), - "FloatCol loaded with incorrect type") - self.assertTrue(issubclass(df.IntCol.dtype.type, np.integer), - "IntCol loaded with incorrect type") + assert issubclass(df.FloatCol.dtype.type, np.floating) + assert issubclass(df.IntCol.dtype.type, np.integer) + # MySQL has no real BOOL type (it's an alias for TINYINT) - self.assertTrue(issubclass(df.BoolCol.dtype.type, np.integer), - "BoolCol loaded with incorrect type") + assert issubclass(df.BoolCol.dtype.type, np.integer) # Int column with NA values stays as float - self.assertTrue(issubclass(df.IntColWithNull.dtype.type, np.floating), - "IntColWithNull loaded with incorrect type") + assert issubclass(df.IntColWithNull.dtype.type, np.floating) + # Bool column with NA = int column with NA values => becomes float - self.assertTrue(issubclass(df.BoolColWithNull.dtype.type, np.floating), - "BoolColWithNull loaded with incorrect type") + assert issubclass(df.BoolColWithNull.dtype.type, np.floating) def test_read_procedure(self): # see GH7324. Although it is more an api test, it is added to the @@ -1808,7 +1770,7 @@ def setup_driver(cls): import psycopg2 # noqa cls.driver = 'psycopg2' except ImportError: - raise nose.SkipTest('psycopg2 not installed') + pytest.skip('psycopg2 not installed') def test_schema_support(self): # only test this for postgresql (schema's not supported in @@ -1837,8 +1799,8 @@ def test_schema_support(self): res4 = sql.read_sql_table('test_schema_other', self.conn, schema='other') tm.assert_frame_equal(df, res4) - self.assertRaises(ValueError, sql.read_sql_table, 'test_schema_other', - self.conn, schema='public') + pytest.raises(ValueError, sql.read_sql_table, 'test_schema_other', + self.conn, schema='public') # different if_exists options @@ -1875,26 +1837,32 @@ def test_schema_support(self): tm.assert_frame_equal(res1, res2) +@pytest.mark.single class TestMySQLAlchemy(_TestMySQLAlchemy, _TestSQLAlchemy): pass +@pytest.mark.single class TestMySQLAlchemyConn(_TestMySQLAlchemy, _TestSQLAlchemyConn): pass +@pytest.mark.single class TestPostgreSQLAlchemy(_TestPostgreSQLAlchemy, _TestSQLAlchemy): pass +@pytest.mark.single class TestPostgreSQLAlchemyConn(_TestPostgreSQLAlchemy, _TestSQLAlchemyConn): pass +@pytest.mark.single class TestSQLiteAlchemy(_TestSQLiteAlchemy, _TestSQLAlchemy): pass +@pytest.mark.single class TestSQLiteAlchemyConn(_TestSQLiteAlchemy, _TestSQLAlchemyConn): pass @@ -1902,6 +1870,7 @@ class TestSQLiteAlchemyConn(_TestSQLiteAlchemy, _TestSQLAlchemyConn): # ----------------------------------------------------------------------------- # -- Test Sqlite / MySQL fallback +@pytest.mark.single class TestSQLiteFallback(SQLiteMixIn, PandasSQLTest): """ Test the fallback mode against an in-memory sqlite database. 
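All of the test classes above follow the same conversion recipe: drop the `tm.TestCase` base class, rename `setUp`/`tearDown` to pytest's `setup_method`/`teardown_method`, and decorate with `@pytest.mark.single` so CI can select or deselect these database-bound tests with `-m single`. A minimal sketch of the resulting shape (class, method, and query are illustrative):

```python
import sqlite3

import pytest


@pytest.mark.single            # lets CI select these tests with ``-m single``
class TestFallbackSketch(object):       # plain object, not tm.TestCase

    def setup_method(self, method):     # pytest's replacement for setUp
        self.conn = sqlite3.connect(':memory:')

    def teardown_method(self, method):  # ...and for tearDown
        self.conn.close()

    def test_roundtrip(self):
        # bare assert replaces self.assertEqual
        assert self.conn.execute('SELECT 1').fetchone() == (1,)

    def test_optional_dependency(self):
        # raise nose.SkipTest(...) becomes a plain function call:
        pytest.skip('skipped for illustration')
```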
@@ -1913,7 +1882,7 @@ class TestSQLiteFallback(SQLiteMixIn, PandasSQLTest): def connect(cls): return sqlite3.connect(':memory:') - def setUp(self): + def setup_method(self, method): self.conn = self.connect() self.pandasSQL = sql.SQLiteDatabase(self.conn) @@ -1951,13 +1920,11 @@ def test_create_and_drop_table(self): self.pandasSQL.to_sql(temp_frame, 'drop_test_frame') - self.assertTrue(self.pandasSQL.has_table('drop_test_frame'), - 'Table not written to DB') + assert self.pandasSQL.has_table('drop_test_frame') self.pandasSQL.drop_table('drop_test_frame') - self.assertFalse(self.pandasSQL.has_table('drop_test_frame'), - 'Table not deleted from DB') + assert not self.pandasSQL.has_table('drop_test_frame') def test_roundtrip(self): self._roundtrip() @@ -2002,7 +1969,7 @@ def test_to_sql_save_index(self): def test_transactions(self): if PY36: - raise nose.SkipTest("not working on python > 3.5") + pytest.skip("not working on python > 3.5") self._transaction_test() def _get_sqlite_column_type(self, table, column): @@ -2014,7 +1981,7 @@ def _get_sqlite_column_type(self, table, column): def test_dtype(self): if self.flavor == 'mysql': - raise nose.SkipTest('Not applicable to MySQL legacy') + pytest.skip('Not applicable to MySQL legacy') cols = ['A', 'B'] data = [(0.8, True), (0.9, None)] @@ -2023,24 +1990,24 @@ def test_dtype(self): df.to_sql('dtype_test2', self.conn, dtype={'B': 'STRING'}) # sqlite stores Boolean values as INTEGER - self.assertEqual(self._get_sqlite_column_type( - 'dtype_test', 'B'), 'INTEGER') + assert self._get_sqlite_column_type( + 'dtype_test', 'B') == 'INTEGER' - self.assertEqual(self._get_sqlite_column_type( - 'dtype_test2', 'B'), 'STRING') - self.assertRaises(ValueError, df.to_sql, - 'error', self.conn, dtype={'B': bool}) + assert self._get_sqlite_column_type( + 'dtype_test2', 'B') == 'STRING' + pytest.raises(ValueError, df.to_sql, + 'error', self.conn, dtype={'B': bool}) # single dtype df.to_sql('single_dtype_test', self.conn, dtype='STRING') - self.assertEqual( - self._get_sqlite_column_type('single_dtype_test', 'A'), 'STRING') - self.assertEqual( - self._get_sqlite_column_type('single_dtype_test', 'B'), 'STRING') + assert self._get_sqlite_column_type( + 'single_dtype_test', 'A') == 'STRING' + assert self._get_sqlite_column_type( + 'single_dtype_test', 'B') == 'STRING' - def test_notnull_dtype(self): + def test_notna_dtype(self): if self.flavor == 'mysql': - raise nose.SkipTest('Not applicable to MySQL legacy') + pytest.skip('Not applicable to MySQL legacy') cols = {'Bool': Series([True, None]), 'Date': Series([datetime(2012, 5, 1), None]), @@ -2049,21 +2016,20 @@ def test_notnull_dtype(self): } df = DataFrame(cols) - tbl = 'notnull_dtype_test' + tbl = 'notna_dtype_test' df.to_sql(tbl, self.conn) - self.assertEqual(self._get_sqlite_column_type(tbl, 'Bool'), 'INTEGER') - self.assertEqual(self._get_sqlite_column_type( - tbl, 'Date'), 'TIMESTAMP') - self.assertEqual(self._get_sqlite_column_type(tbl, 'Int'), 'INTEGER') - self.assertEqual(self._get_sqlite_column_type(tbl, 'Float'), 'REAL') + assert self._get_sqlite_column_type(tbl, 'Bool') == 'INTEGER' + assert self._get_sqlite_column_type(tbl, 'Date') == 'TIMESTAMP' + assert self._get_sqlite_column_type(tbl, 'Int') == 'INTEGER' + assert self._get_sqlite_column_type(tbl, 'Float') == 'REAL' def test_illegal_names(self): # For sqlite, these should work fine df = DataFrame([[1, 2], [3, 4]], columns=['a', 'b']) # Raise error on blank - self.assertRaises(ValueError, df.to_sql, "", self.conn) + pytest.raises(ValueError, df.to_sql, 
"", self.conn) for ndx, weird_name in enumerate( ['test_weird_name]', 'test_weird_name[', @@ -2103,7 +2069,7 @@ def format_query(sql, *args): """ processed_args = [] for arg in args: - if isinstance(arg, float) and isnull(arg): + if isinstance(arg, float) and isna(arg): arg = None formatter = _formatters[type(arg)] @@ -2125,12 +2091,14 @@ def _skip_if_no_pymysql(): try: import pymysql # noqa except ImportError: - raise nose.SkipTest('pymysql not installed, skipping') + pytest.skip('pymysql not installed, skipping') -class TestXSQLite(SQLiteMixIn, tm.TestCase): +@pytest.mark.single +class TestXSQLite(SQLiteMixIn): - def setUp(self): + def setup_method(self, method): + self.method = method self.conn = sqlite3.connect(':memory:') def test_basic(self): @@ -2180,15 +2148,16 @@ def test_schema(self): for l in lines: tokens = l.split(' ') if len(tokens) == 2 and tokens[0] == 'A': - self.assertTrue(tokens[1] == 'DATETIME') + assert tokens[1] == 'DATETIME' frame = tm.makeTimeDataFrame() create_sql = sql.get_schema(frame, 'test', keys=['A', 'B']) lines = create_sql.splitlines() - self.assertTrue('PRIMARY KEY ("A", "B")' in create_sql) + assert 'PRIMARY KEY ("A", "B")' in create_sql cur = self.conn.cursor() cur.execute(create_sql) + @tm.capture_stdout def test_execute_fail(self): create_sql = """ CREATE TABLE test @@ -2205,14 +2174,10 @@ def test_execute_fail(self): sql.execute('INSERT INTO test VALUES("foo", "bar", 1.234)', self.conn) sql.execute('INSERT INTO test VALUES("foo", "baz", 2.567)', self.conn) - try: - sys.stdout = StringIO() - self.assertRaises(Exception, sql.execute, - 'INSERT INTO test VALUES("foo", "bar", 7)', - self.conn) - finally: - sys.stdout = sys.__stdout__ + with pytest.raises(Exception): + sql.execute('INSERT INTO test VALUES("foo", "bar", 7)', self.conn) + @tm.capture_stdout def test_execute_closed_connection(self): create_sql = """ CREATE TABLE test @@ -2228,15 +2193,12 @@ def test_execute_closed_connection(self): sql.execute('INSERT INTO test VALUES("foo", "bar", 1.234)', self.conn) self.conn.close() - try: - sys.stdout = StringIO() - self.assertRaises(Exception, tquery, "select * from test", - con=self.conn) - finally: - sys.stdout = sys.__stdout__ + + with pytest.raises(Exception): + tquery("select * from test", con=self.conn) # Initialize connection again (needed for tearDown) - self.setUp() + self.setup_method(self.method) def test_na_roundtrip(self): pass @@ -2277,7 +2239,7 @@ def test_onecolumn_of_integer(self): the_sum = sum([my_c0[0] for my_c0 in con_x.execute("select * from mono_df")]) # it should not fail, and gives 3 ( Issue #3628 ) - self.assertEqual(the_sum, 3) + assert the_sum == 3 result = sql.read_sql("select * from mono_df", con_x) tm.assert_frame_equal(result, mono_df) @@ -2297,48 +2259,47 @@ def clean_up(test_table_to_drop): self.drop_table(test_table_to_drop) # test if invalid value for if_exists raises appropriate error - self.assertRaises(ValueError, - sql.to_sql, - frame=df_if_exists_1, - con=self.conn, - name=table_name, - if_exists='notvalidvalue') + pytest.raises(ValueError, + sql.to_sql, + frame=df_if_exists_1, + con=self.conn, + name=table_name, + if_exists='notvalidvalue') clean_up(table_name) # test if_exists='fail' sql.to_sql(frame=df_if_exists_1, con=self.conn, name=table_name, if_exists='fail') - self.assertRaises(ValueError, - sql.to_sql, - frame=df_if_exists_1, - con=self.conn, - name=table_name, - if_exists='fail') + pytest.raises(ValueError, + sql.to_sql, + frame=df_if_exists_1, + con=self.conn, + name=table_name, + if_exists='fail') # 
test if_exists='replace' sql.to_sql(frame=df_if_exists_1, con=self.conn, name=table_name, if_exists='replace', index=False) - self.assertEqual(tquery(sql_select, con=self.conn), - [(1, 'A'), (2, 'B')]) + assert tquery(sql_select, con=self.conn) == [(1, 'A'), (2, 'B')] sql.to_sql(frame=df_if_exists_2, con=self.conn, name=table_name, if_exists='replace', index=False) - self.assertEqual(tquery(sql_select, con=self.conn), - [(3, 'C'), (4, 'D'), (5, 'E')]) + assert (tquery(sql_select, con=self.conn) == + [(3, 'C'), (4, 'D'), (5, 'E')]) clean_up(table_name) # test if_exists='append' sql.to_sql(frame=df_if_exists_1, con=self.conn, name=table_name, if_exists='fail', index=False) - self.assertEqual(tquery(sql_select, con=self.conn), - [(1, 'A'), (2, 'B')]) + assert tquery(sql_select, con=self.conn) == [(1, 'A'), (2, 'B')] sql.to_sql(frame=df_if_exists_2, con=self.conn, name=table_name, if_exists='append', index=False) - self.assertEqual(tquery(sql_select, con=self.conn), - [(1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E')]) + assert (tquery(sql_select, con=self.conn) == + [(1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E')]) clean_up(table_name) -class TestSQLFlavorDeprecation(tm.TestCase): +@pytest.mark.single +class TestSQLFlavorDeprecation(object): """ gh-13611: test that the 'flavor' parameter is appropriately deprecated by checking the @@ -2352,8 +2313,8 @@ def test_unsupported_flavor(self): msg = 'is not supported' for func in self.funcs: - tm.assertRaisesRegexp(ValueError, msg, getattr(sql, func), - self.con, flavor='mysql') + tm.assert_raises_regex(ValueError, msg, getattr(sql, func), + self.con, flavor='mysql') def test_deprecated_flavor(self): for func in self.funcs: @@ -2362,12 +2323,13 @@ def test_deprecated_flavor(self): getattr(sql, func)(self.con, flavor='sqlite') -@unittest.skip("gh-13611: there is no support for MySQL " - "if SQLAlchemy is not installed") -class TestXMySQL(MySQLMixIn, tm.TestCase): +@pytest.mark.single +@pytest.mark.skip(reason="gh-13611: there is no support for MySQL " + "if SQLAlchemy is not installed") +class TestXMySQL(MySQLMixIn): @classmethod - def setUpClass(cls): + def setup_class(cls): _skip_if_no_pymysql() # test connection @@ -2384,18 +2346,18 @@ def setUpClass(cls): try: pymysql.connect(read_default_group='pandas') except pymysql.ProgrammingError: - raise nose.SkipTest( + pytest.skip( "Create a group of connection parameters under the heading " "[pandas] in your system's mysql default file, " "typically located at ~/.my.cnf or /etc/.my.cnf. ") except pymysql.Error: - raise nose.SkipTest( + pytest.skip( "Cannot connect to database. " "Create a group of connection parameters under the heading " "[pandas] in your system's mysql default file, " "typically located at ~/.my.cnf or /etc/.my.cnf. ") - def setUp(self): + def setup_method(self, method): _skip_if_no_pymysql() import pymysql try: @@ -2410,17 +2372,19 @@ def setUp(self): try: self.conn = pymysql.connect(read_default_group='pandas') except pymysql.ProgrammingError: - raise nose.SkipTest( + pytest.skip( "Create a group of connection parameters under the heading " "[pandas] in your system's mysql default file, " "typically located at ~/.my.cnf or /etc/.my.cnf. ") except pymysql.Error: - raise nose.SkipTest( + pytest.skip( "Cannot connect to database. " "Create a group of connection parameters under the heading " "[pandas] in your system's mysql default file, " "typically located at ~/.my.cnf or /etc/.my.cnf. 
") + self.method = method + def test_basic(self): _skip_if_no_pymysql() frame = tm.makeTimeDataFrame() @@ -2490,17 +2454,18 @@ def test_schema(self): for l in lines: tokens = l.split(' ') if len(tokens) == 2 and tokens[0] == 'A': - self.assertTrue(tokens[1] == 'DATETIME') + assert tokens[1] == 'DATETIME' frame = tm.makeTimeDataFrame() drop_sql = "DROP TABLE IF EXISTS test" create_sql = sql.get_schema(frame, 'test', keys=['A', 'B']) lines = create_sql.splitlines() - self.assertTrue('PRIMARY KEY (`A`, `B`)' in create_sql) + assert 'PRIMARY KEY (`A`, `B`)' in create_sql cur = self.conn.cursor() cur.execute(drop_sql) cur.execute(create_sql) + @tm.capture_stdout def test_execute_fail(self): _skip_if_no_pymysql() drop_sql = "DROP TABLE IF EXISTS test" @@ -2520,14 +2485,10 @@ def test_execute_fail(self): sql.execute('INSERT INTO test VALUES("foo", "bar", 1.234)', self.conn) sql.execute('INSERT INTO test VALUES("foo", "baz", 2.567)', self.conn) - try: - sys.stdout = StringIO() - self.assertRaises(Exception, sql.execute, - 'INSERT INTO test VALUES("foo", "bar", 7)', - self.conn) - finally: - sys.stdout = sys.__stdout__ + with pytest.raises(Exception): + sql.execute('INSERT INTO test VALUES("foo", "bar", 7)', self.conn) + @tm.capture_stdout def test_execute_closed_connection(self): _skip_if_no_pymysql() drop_sql = "DROP TABLE IF EXISTS test" @@ -2546,15 +2507,12 @@ def test_execute_closed_connection(self): sql.execute('INSERT INTO test VALUES("foo", "bar", 1.234)', self.conn) self.conn.close() - try: - sys.stdout = StringIO() - self.assertRaises(Exception, tquery, "select * from test", - con=self.conn) - finally: - sys.stdout = sys.__stdout__ + + with pytest.raises(Exception): + tquery("select * from test", con=self.conn) # Initialize connection again (needed for tearDown) - self.setUp() + self.setup_method(self.method) def test_na_roundtrip(self): _skip_if_no_pymysql() @@ -2619,47 +2577,40 @@ def clean_up(test_table_to_drop): self.drop_table(test_table_to_drop) # test if invalid value for if_exists raises appropriate error - self.assertRaises(ValueError, - sql.to_sql, - frame=df_if_exists_1, - con=self.conn, - name=table_name, - if_exists='notvalidvalue') + pytest.raises(ValueError, + sql.to_sql, + frame=df_if_exists_1, + con=self.conn, + name=table_name, + if_exists='notvalidvalue') clean_up(table_name) # test if_exists='fail' sql.to_sql(frame=df_if_exists_1, con=self.conn, name=table_name, if_exists='fail', index=False) - self.assertRaises(ValueError, - sql.to_sql, - frame=df_if_exists_1, - con=self.conn, - name=table_name, - if_exists='fail') + pytest.raises(ValueError, + sql.to_sql, + frame=df_if_exists_1, + con=self.conn, + name=table_name, + if_exists='fail') # test if_exists='replace' sql.to_sql(frame=df_if_exists_1, con=self.conn, name=table_name, if_exists='replace', index=False) - self.assertEqual(tquery(sql_select, con=self.conn), - [(1, 'A'), (2, 'B')]) + assert tquery(sql_select, con=self.conn) == [(1, 'A'), (2, 'B')] sql.to_sql(frame=df_if_exists_2, con=self.conn, name=table_name, if_exists='replace', index=False) - self.assertEqual(tquery(sql_select, con=self.conn), - [(3, 'C'), (4, 'D'), (5, 'E')]) + assert (tquery(sql_select, con=self.conn) == + [(3, 'C'), (4, 'D'), (5, 'E')]) clean_up(table_name) # test if_exists='append' sql.to_sql(frame=df_if_exists_1, con=self.conn, name=table_name, if_exists='fail', index=False) - self.assertEqual(tquery(sql_select, con=self.conn), - [(1, 'A'), (2, 'B')]) + assert tquery(sql_select, con=self.conn) == [(1, 'A'), (2, 'B')] 
sql.to_sql(frame=df_if_exists_2, con=self.conn, name=table_name, if_exists='append', index=False) - self.assertEqual(tquery(sql_select, con=self.conn), - [(1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E')]) + assert (tquery(sql_select, con=self.conn) == + [(1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E')]) clean_up(table_name) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/io/tests/test_stata.py b/pandas/tests/io/test_stata.py similarity index 93% rename from pandas/io/tests/test_stata.py rename to pandas/tests/io/test_stata.py index 8cfd5d98fe05f..a414928d318c4 100644 --- a/pandas/io/tests/test_stata.py +++ b/pandas/tests/io/test_stata.py @@ -9,23 +9,23 @@ from datetime import datetime from distutils.version import LooseVersion -import nose import numpy as np import pandas as pd import pandas.util.testing as tm +import pytest from pandas import compat +from pandas._libs.tslib import NaT from pandas.compat import iterkeys +from pandas.core.dtypes.common import is_categorical_dtype from pandas.core.frame import DataFrame, Series from pandas.io.parsers import read_csv from pandas.io.stata import (read_stata, StataReader, InvalidColumnName, PossiblePrecisionLoss, StataMissingValue) -from pandas.tslib import NaT -from pandas.types.common import is_categorical_dtype -class TestStata(tm.TestCase): +class TestStata(object): - def setUp(self): + def setup_method(self, method): self.dirpath = tm.get_data_path() self.dta1_114 = os.path.join(self.dirpath, 'stata1_114.dta') self.dta1_117 = os.path.join(self.dirpath, 'stata1_117.dta') @@ -128,7 +128,7 @@ def test_read_dta1(self): def test_read_dta2(self): if LooseVersion(sys.version) < '2.7': - raise nose.SkipTest('datetime interp under 2.6 is faulty') + pytest.skip('datetime interp under 2.6 is faulty') expected = DataFrame.from_records( [ @@ -181,7 +181,7 @@ def test_read_dta2(self): w = [x for x in w if x.category is UserWarning] # should get warning for each call to read_dta - self.assertEqual(len(w), 3) + assert len(w) == 3 # buggy test because of the NaT comparison on certain platforms # Format 113 test fails since it does not support tc and tC formats @@ -283,7 +283,7 @@ def test_read_dta18(self): u'Floats': u'float data'} tm.assert_dict_equal(vl, vl_expected) - self.assertEqual(rdr.data_label, u'This is a Ünicode data label') + assert rdr.data_label == u'This is a Ünicode data label' def test_read_write_dta5(self): original = DataFrame([(np.nan, np.nan, np.nan, np.nan, np.nan)], @@ -351,12 +351,12 @@ def test_encoding(self): if compat.PY3: expected = raw.kreis1849[0] - self.assertEqual(result, expected) - self.assertIsInstance(result, compat.string_types) + assert result == expected + assert isinstance(result, compat.string_types) else: expected = raw.kreis1849.str.decode("latin-1")[0] - self.assertEqual(result, expected) - self.assertIsInstance(result, unicode) # noqa + assert result == expected + assert isinstance(result, unicode) # noqa with tm.ensure_clean() as path: encoded.to_stata(path, encoding='latin-1', write_index=False) @@ -377,7 +377,7 @@ def test_read_write_dta11(self): with warnings.catch_warnings(record=True) as w: original.to_stata(path, None) # should get a warning for that format. 
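The warning count asserted just below uses the recorded-warnings idiom that recurs throughout this suite; a self-contained sketch, where the warning text is synthetic and merely stands in for the Stata writer's invalid-column-name warning:

```python
import warnings


def write_with_rename():
    # stand-in for to_stata() renaming a column to a Stata-safe name
    warnings.warn('column renamed to be Stata-safe', UserWarning)


with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter('always')
    write_with_rename()

assert len(w) == 1                    # was: self.assertEqual(len(w), 1)
assert issubclass(w[0].category, UserWarning)
```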
- self.assertEqual(len(w), 1) + assert len(w) == 1 written_and_read_again = self.read_dta(path) tm.assert_frame_equal( @@ -405,7 +405,7 @@ def test_read_write_dta12(self): with warnings.catch_warnings(record=True) as w: original.to_stata(path, None) # should get a warning for that format. - self.assertEqual(len(w), 1) + assert len(w) == 1 written_and_read_again = self.read_dta(path) tm.assert_frame_equal( @@ -523,7 +523,7 @@ def test_no_index(self): with tm.ensure_clean() as path: original.to_stata(path, write_index=False) written_and_read_again = self.read_dta(path) - tm.assertRaises( + pytest.raises( KeyError, lambda: written_and_read_again['index_not_written']) def test_string_no_dates(self): @@ -647,10 +647,10 @@ def test_variable_labels(self): keys = ('var1', 'var2', 'var3') labels = ('label1', 'label2', 'label3') for k, v in compat.iteritems(sr_115): - self.assertTrue(k in sr_117) - self.assertTrue(v == sr_117[k]) - self.assertTrue(k in keys) - self.assertTrue(v in labels) + assert k in sr_117 + assert v == sr_117[k] + assert k in keys + assert v in labels def test_minimal_size_col(self): str_lens = (1, 100, 244) @@ -667,8 +667,8 @@ def test_minimal_size_col(self): variables = sr.varlist formats = sr.fmtlist for variable, fmt, typ in zip(variables, formats, typlist): - self.assertTrue(int(variable[1:]) == int(fmt[1:-1])) - self.assertTrue(int(variable[1:]) == typ) + assert int(variable[1:]) == int(fmt[1:-1]) + assert int(variable[1:]) == typ def test_excessively_long_string(self): str_lens = (1, 244, 500) @@ -677,7 +677,7 @@ def test_excessively_long_string(self): s['s' + str(str_len)] = Series(['a' * str_len, 'b' * str_len, 'c' * str_len]) original = DataFrame(s) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): with tm.ensure_clean() as path: original.to_stata(path) @@ -694,21 +694,21 @@ def test_missing_value_generator(self): offset = valid_range[t][1] for i in range(0, 27): val = StataMissingValue(offset + 1 + i) - self.assertTrue(val.string == expected_values[i]) + assert val.string == expected_values[i] # Test extremes for floats val = StataMissingValue(struct.unpack(' 0) + assert len(ax.get_children()) > 0 if layout is not None: - result = self._get_axes_layout(plotting._flatten(axes)) - self.assertEqual(result, layout) + result = self._get_axes_layout(_flatten(axes)) + assert result == layout - self.assert_numpy_array_equal( + tm.assert_numpy_array_equal( visible_axes[0].figure.get_size_inches(), np.array(figsize, dtype=np.float64)) @@ -378,7 +382,7 @@ def _flatten_visible(self, axes): axes : matplotlib Axes object, or its list-like """ - axes = plotting._flatten(axes) + axes = _flatten(axes) axes = [ax for ax in axes if ax.get_visible()] return axes @@ -406,8 +410,8 @@ def _check_has_errorbars(self, axes, xerr=0, yerr=0): xerr_count += 1 if has_yerr: yerr_count += 1 - self.assertEqual(xerr, xerr_count) - self.assertEqual(yerr, yerr_count) + assert xerr == xerr_count + assert yerr == yerr_count def _check_box_return_type(self, returned, return_type, expected_keys=None, check_ax_title=True): @@ -434,36 +438,36 @@ def _check_box_return_type(self, returned, return_type, expected_keys=None, if return_type is None: return_type = 'dict' - self.assertTrue(isinstance(returned, types[return_type])) + assert isinstance(returned, types[return_type]) if return_type == 'both': - self.assertIsInstance(returned.ax, Axes) - self.assertIsInstance(returned.lines, dict) + assert isinstance(returned.ax, Axes) + assert isinstance(returned.lines, dict) else: # should be fixed 
when the returning default is changed if return_type is None: for r in self._flatten_visible(returned): - self.assertIsInstance(r, Axes) + assert isinstance(r, Axes) return - self.assertTrue(isinstance(returned, Series)) + assert isinstance(returned, Series) - self.assertEqual(sorted(returned.keys()), sorted(expected_keys)) + assert sorted(returned.keys()) == sorted(expected_keys) for key, value in iteritems(returned): - self.assertTrue(isinstance(value, types[return_type])) + assert isinstance(value, types[return_type]) # check returned dict has correct mapping if return_type == 'axes': if check_ax_title: - self.assertEqual(value.get_title(), key) + assert value.get_title() == key elif return_type == 'both': if check_ax_title: - self.assertEqual(value.ax.get_title(), key) - self.assertIsInstance(value.ax, Axes) - self.assertIsInstance(value.lines, dict) + assert value.ax.get_title() == key + assert isinstance(value.ax, Axes) + assert isinstance(value.lines, dict) elif return_type == 'dict': line = value['medians'][0] axes = line.axes if self.mpl_ge_1_5_0 else line.get_axes() if check_ax_title: - self.assertEqual(axes.get_title(), key) + assert axes.get_title() == key else: raise AssertionError @@ -488,26 +492,26 @@ def is_grid_on(): spndx += 1 mpl.rc('axes', grid=False) obj.plot(kind=kind, **kws) - self.assertFalse(is_grid_on()) + assert not is_grid_on() self.plt.subplot(1, 4 * len(kinds), spndx) spndx += 1 mpl.rc('axes', grid=True) obj.plot(kind=kind, grid=False, **kws) - self.assertFalse(is_grid_on()) + assert not is_grid_on() if kind != 'pie': self.plt.subplot(1, 4 * len(kinds), spndx) spndx += 1 mpl.rc('axes', grid=True) obj.plot(kind=kind, **kws) - self.assertTrue(is_grid_on()) + assert is_grid_on() self.plt.subplot(1, 4 * len(kinds), spndx) spndx += 1 mpl.rc('axes', grid=False) obj.plot(kind=kind, grid=True, **kws) - self.assertTrue(is_grid_on()) + assert is_grid_on() def _maybe_unpack_cycler(self, rcParams, field='color'): """ diff --git a/pandas/tests/plotting/test_boxplot_method.py b/pandas/tests/plotting/test_boxplot_method.py index 289d48ba6d4cc..8fe119d28644c 100644 --- a/pandas/tests/plotting/test_boxplot_method.py +++ b/pandas/tests/plotting/test_boxplot_method.py @@ -1,6 +1,6 @@ # coding: utf-8 -import nose +import pytest import itertools import string from distutils.version import LooseVersion @@ -8,19 +8,20 @@ from pandas import Series, DataFrame, MultiIndex from pandas.compat import range, lzip import pandas.util.testing as tm -from pandas.util.testing import slow import numpy as np from numpy import random from numpy.random import randn -import pandas.tools.plotting as plotting +import pandas.plotting as plotting from pandas.tests.plotting.common import (TestPlotBase, _check_plot_works) """ Test cases for .boxplot method """ +tm._skip_if_no_mpl() + def _skip_if_mpl_14_or_dev_boxplot(): # GH 8382 @@ -28,13 +29,12 @@ def _skip_if_mpl_14_or_dev_boxplot(): # Don't need try / except since that's done at class level import matplotlib if str(matplotlib.__version__) >= LooseVersion('1.4'): - raise nose.SkipTest("Matplotlib Regression in 1.4 and current dev.") + pytest.skip("Matplotlib Regression in 1.4 and current dev.") -@tm.mplskip class TestDataFramePlots(TestPlotBase): - @slow + @pytest.mark.slow def test_boxplot_legacy(self): df = DataFrame(randn(6, 4), index=list(string.ascii_letters[:6]), @@ -54,7 +54,8 @@ def test_boxplot_legacy(self): _check_plot_works(df.boxplot, by='indic') with tm.assert_produces_warning(UserWarning): _check_plot_works(df.boxplot, by=['indic', 
'indic2']) - _check_plot_works(plotting.boxplot, data=df['one'], return_type='dict') + _check_plot_works(plotting._core.boxplot, data=df['one'], + return_type='dict') _check_plot_works(df.boxplot, notch=1, return_type='dict') with tm.assert_produces_warning(UserWarning): _check_plot_works(df.boxplot, by='indic', notch=1) @@ -70,34 +71,34 @@ def test_boxplot_legacy(self): fig, ax = self.plt.subplots() axes = df.boxplot('Col1', by='X', ax=ax) ax_axes = ax.axes if self.mpl_ge_1_5_0 else ax.get_axes() - self.assertIs(ax_axes, axes) + assert ax_axes is axes fig, ax = self.plt.subplots() axes = df.groupby('Y').boxplot(ax=ax, return_type='axes') ax_axes = ax.axes if self.mpl_ge_1_5_0 else ax.get_axes() - self.assertIs(ax_axes, axes['A']) + assert ax_axes is axes['A'] # Multiple columns with an ax argument should use same figure fig, ax = self.plt.subplots() with tm.assert_produces_warning(UserWarning): axes = df.boxplot(column=['Col1', 'Col2'], by='X', ax=ax, return_type='axes') - self.assertIs(axes['Col1'].get_figure(), fig) + assert axes['Col1'].get_figure() is fig # When by is None, check that all relevant lines are present in the # dict fig, ax = self.plt.subplots() d = df.boxplot(ax=ax, return_type='dict') lines = list(itertools.chain.from_iterable(d.values())) - self.assertEqual(len(ax.get_lines()), len(lines)) + assert len(ax.get_lines()) == len(lines) - @slow + @pytest.mark.slow def test_boxplot_return_type_none(self): # GH 12216; return_type=None & by=None -> axes result = self.hist_df.boxplot() - self.assertTrue(isinstance(result, self.plt.Axes)) + assert isinstance(result, self.plt.Axes) - @slow + @pytest.mark.slow def test_boxplot_return_type_legacy(self): # API change in https://github.com/pandas-dev/pandas/pull/7096 import matplotlib as mpl # noqa @@ -105,7 +106,7 @@ def test_boxplot_return_type_legacy(self): df = DataFrame(randn(6, 4), index=list(string.ascii_letters[:6]), columns=['one', 'two', 'three', 'four']) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.boxplot(return_type='NOTATYPE') result = df.boxplot() @@ -123,13 +124,13 @@ def test_boxplot_return_type_legacy(self): result = df.boxplot(return_type='both') self._check_box_return_type(result, 'both') - @slow + @pytest.mark.slow def test_boxplot_axis_limits(self): def _check_ax_limits(col, ax): y_min, y_max = ax.get_ylim() - self.assertTrue(y_min <= col.min()) - self.assertTrue(y_max >= col.max()) + assert y_min <= col.min() + assert y_max >= col.max() df = self.hist_df.copy() df['age'] = np.random.randint(1, 20, df.shape[0]) @@ -137,7 +138,7 @@ def _check_ax_limits(col, ax): height_ax, weight_ax = df.boxplot(['height', 'weight'], by='category') _check_ax_limits(df['height'], height_ax) _check_ax_limits(df['weight'], weight_ax) - self.assertEqual(weight_ax._sharey, height_ax) + assert weight_ax._sharey == height_ax # Two rows, one partial p = df.boxplot(['height', 'weight', 'age'], by='category') @@ -147,27 +148,34 @@ def _check_ax_limits(col, ax): _check_ax_limits(df['height'], height_ax) _check_ax_limits(df['weight'], weight_ax) _check_ax_limits(df['age'], age_ax) - self.assertEqual(weight_ax._sharey, height_ax) - self.assertEqual(age_ax._sharey, height_ax) - self.assertIsNone(dummy_ax._sharey) + assert weight_ax._sharey == height_ax + assert age_ax._sharey == height_ax + assert dummy_ax._sharey is None - @slow + @pytest.mark.slow def test_boxplot_empty_column(self): _skip_if_mpl_14_or_dev_boxplot() df = DataFrame(np.random.randn(20, 4)) df.loc[:, 0] = np.nan _check_plot_works(df.boxplot, 
return_type='axes') + @pytest.mark.slow + def test_figsize(self): + df = DataFrame(np.random.rand(10, 5), + columns=['A', 'B', 'C', 'D', 'E']) + result = df.boxplot(return_type='axes', figsize=(12, 8)) + assert result.figure.bbox_inches.width == 12 + assert result.figure.bbox_inches.height == 8 + def test_fontsize(self): df = DataFrame({"a": [1, 2, 3, 4, 5, 6]}) self._check_ticks_props(df.boxplot("a", fontsize=16), xlabelsize=16, ylabelsize=16) -@tm.mplskip class TestDataFrameGroupByPlots(TestPlotBase): - @slow + @pytest.mark.slow def test_boxplot_legacy(self): grouped = self.hist_df.groupby(by='gender') with tm.assert_produces_warning(UserWarning): @@ -197,7 +205,7 @@ def test_boxplot_legacy(self): return_type='axes') self._check_axes_shape(axes, axes_num=1, layout=(1, 1)) - @slow + @pytest.mark.slow def test_grouped_plot_fignums(self): n = 10 weight = Series(np.random.normal(166, 20, size=n)) @@ -208,26 +216,26 @@ def test_grouped_plot_fignums(self): gb = df.groupby('gender') res = gb.plot() - self.assertEqual(len(self.plt.get_fignums()), 2) - self.assertEqual(len(res), 2) + assert len(self.plt.get_fignums()) == 2 + assert len(res) == 2 tm.close() res = gb.boxplot(return_type='axes') - self.assertEqual(len(self.plt.get_fignums()), 1) - self.assertEqual(len(res), 2) + assert len(self.plt.get_fignums()) == 1 + assert len(res) == 2 tm.close() # now works with GH 5610 as gender is excluded res = df.groupby('gender').hist() tm.close() - @slow + @pytest.mark.slow def test_grouped_box_return_type(self): df = self.hist_df # old style: return_type=None result = df.boxplot(by='gender') - self.assertIsInstance(result, np.ndarray) + assert isinstance(result, np.ndarray) self._check_box_return_type( result, None, expected_keys=['height', 'weight', 'category']) @@ -258,17 +266,17 @@ def test_grouped_box_return_type(self): returned = df2.boxplot(by='category', return_type=t) self._check_box_return_type(returned, t, expected_keys=columns2) - @slow + @pytest.mark.slow def test_grouped_box_layout(self): df = self.hist_df - self.assertRaises(ValueError, df.boxplot, column=['weight', 'height'], - by=df.gender, layout=(1, 1)) - self.assertRaises(ValueError, df.boxplot, - column=['height', 'weight', 'category'], - layout=(2, 1), return_type='dict') - self.assertRaises(ValueError, df.boxplot, column=['weight', 'height'], - by=df.gender, layout=(-1, -1)) + pytest.raises(ValueError, df.boxplot, column=['weight', 'height'], + by=df.gender, layout=(1, 1)) + pytest.raises(ValueError, df.boxplot, + column=['height', 'weight', 'category'], + layout=(2, 1), return_type='dict') + pytest.raises(ValueError, df.boxplot, column=['weight', 'height'], + by=df.gender, layout=(-1, -1)) # _check_plot_works adds an ax so catch warning. 
see GH #13188 with tm.assert_produces_warning(UserWarning): @@ -332,7 +340,7 @@ def test_grouped_box_layout(self): return_type='dict') self._check_axes_shape(self.plt.gcf().axes, axes_num=3, layout=(1, 3)) - @slow + @pytest.mark.slow def test_grouped_box_multiple_axes(self): # GH 6970, GH 7069 df = self.hist_df @@ -355,8 +363,8 @@ def test_grouped_box_multiple_axes(self): by='gender', return_type='axes', ax=axes[0]) returned = np.array(list(returned.values)) self._check_axes_shape(returned, axes_num=3, layout=(1, 3)) - self.assert_numpy_array_equal(returned, axes[0]) - self.assertIs(returned[0].figure, fig) + tm.assert_numpy_array_equal(returned, axes[0]) + assert returned[0].figure is fig # draw on second row with tm.assert_produces_warning(UserWarning): @@ -365,10 +373,10 @@ def test_grouped_box_multiple_axes(self): return_type='axes', ax=axes[1]) returned = np.array(list(returned.values)) self._check_axes_shape(returned, axes_num=3, layout=(1, 3)) - self.assert_numpy_array_equal(returned, axes[1]) - self.assertIs(returned[0].figure, fig) + tm.assert_numpy_array_equal(returned, axes[1]) + assert returned[0].figure is fig - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): fig, axes = self.plt.subplots(2, 3) # pass different number of axes from required with tm.assert_produces_warning(UserWarning): @@ -378,8 +386,3 @@ def test_fontsize(self): df = DataFrame({"a": [1, 2, 3, 4, 5, 6], "b": [0, 0, 0, 1, 1, 1]}) self._check_ticks_props(df.boxplot("a", by="b", fontsize=16), xlabelsize=16, ylabelsize=16) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tseries/tests/test_converter.py b/pandas/tests/plotting/test_converter.py similarity index 80% rename from pandas/tseries/tests/test_converter.py rename to pandas/tests/plotting/test_converter.py index f6cf11c871bba..e1f64bed5598d 100644 --- a/pandas/tseries/tests/test_converter.py +++ b/pandas/tests/plotting/test_converter.py @@ -1,7 +1,6 @@ +import pytest from datetime import datetime, date -import nose - import numpy as np from pandas import Timestamp, Period, Index from pandas.compat import u @@ -9,18 +8,16 @@ from pandas.tseries.offsets import Second, Milli, Micro, Day from pandas.compat.numpy import np_datetime64_compat -try: - import pandas.tseries.converter as converter -except ImportError: - raise nose.SkipTest("no pandas.tseries.converter, skipping") +converter = pytest.importorskip('pandas.plotting._converter') def test_timtetonum_accepts_unicode(): assert (converter.time2num("00:01") == converter.time2num(u("00:01"))) -class TestDateTimeConverter(tm.TestCase): - def setUp(self): +class TestDateTimeConverter(object): + + def setup_method(self, method): self.dtc = converter.DatetimeConverter() self.tc = converter.TimeFormatter(None) @@ -32,35 +29,35 @@ def test_convert_accepts_unicode(self): def test_conversion(self): rs = self.dtc.convert(['2012-1-1'], None, None)[0] xp = datetime(2012, 1, 1).toordinal() - self.assertEqual(rs, xp) + assert rs == xp rs = self.dtc.convert('2012-1-1', None, None) - self.assertEqual(rs, xp) + assert rs == xp rs = self.dtc.convert(date(2012, 1, 1), None, None) - self.assertEqual(rs, xp) + assert rs == xp rs = self.dtc.convert(datetime(2012, 1, 1).toordinal(), None, None) - self.assertEqual(rs, xp) + assert rs == xp rs = self.dtc.convert('2012-1-1', None, None) - self.assertEqual(rs, xp) + assert rs == xp rs = self.dtc.convert(Timestamp('2012-1-1'), None, None) - self.assertEqual(rs, xp) + assert rs 
== xp # also testing datetime64 dtype (GH8614) rs = self.dtc.convert(np_datetime64_compat('2012-01-01'), None, None) - self.assertEqual(rs, xp) + assert rs == xp rs = self.dtc.convert(np_datetime64_compat( '2012-01-01 00:00:00+0000'), None, None) - self.assertEqual(rs, xp) + assert rs == xp rs = self.dtc.convert(np.array([ np_datetime64_compat('2012-01-01 00:00:00+0000'), np_datetime64_compat('2012-01-02 00:00:00+0000')]), None, None) - self.assertEqual(rs[0], xp) + assert rs[0] == xp # we have a tz-aware date (constructed to that when we turn to utc it # is the same as our sample) @@ -69,17 +66,17 @@ def test_conversion(self): .tz_convert('US/Eastern') ) rs = self.dtc.convert(ts, None, None) - self.assertEqual(rs, xp) + assert rs == xp rs = self.dtc.convert(ts.to_pydatetime(), None, None) - self.assertEqual(rs, xp) + assert rs == xp rs = self.dtc.convert(Index([ts - Day(1), ts]), None, None) - self.assertEqual(rs[1], xp) + assert rs[1] == xp rs = self.dtc.convert(Index([ts - Day(1), ts]).to_pydatetime(), None, None) - self.assertEqual(rs[1], xp) + assert rs[1] == xp def test_conversion_float(self): decimals = 9 @@ -104,7 +101,7 @@ def test_conversion_outofbounds_datetime(self): tm.assert_numpy_array_equal(rs, xp) rs = self.dtc.convert(values[0], None, None) xp = converter.dates.date2num(values[0]) - self.assertEqual(rs, xp) + assert rs == xp values = [datetime(1677, 1, 1, 12), datetime(1677, 1, 2, 12)] rs = self.dtc.convert(values, None, None) @@ -112,7 +109,7 @@ def test_conversion_outofbounds_datetime(self): tm.assert_numpy_array_equal(rs, xp) rs = self.dtc.convert(values[0], None, None) xp = converter.dates.date2num(values[0]) - self.assertEqual(rs, xp) + assert rs == xp def test_time_formatter(self): self.tc(90000) @@ -141,9 +138,17 @@ def _assert_less(ts1, ts2): _assert_less(ts, ts + Milli()) _assert_less(ts, ts + Micro(50)) + def test_convert_nested(self): + inner = [Timestamp('2017-01-01', Timestamp('2017-01-02'))] + data = [inner, inner] + result = self.dtc.convert(data, None, None) + expected = [self.dtc.convert(x, None, None) for x in data] + assert result == expected + -class TestPeriodConverter(tm.TestCase): - def setUp(self): +class TestPeriodConverter(object): + + def setup_method(self, method): self.pc = converter.PeriodConverter() class Axis(object): @@ -155,53 +160,52 @@ class Axis(object): def test_convert_accepts_unicode(self): r1 = self.pc.convert("2012-1-1", None, self.axis) r2 = self.pc.convert(u("2012-1-1"), None, self.axis) - self.assert_equal(r1, r2, - "PeriodConverter.convert should accept unicode") + assert r1 == r2 def test_conversion(self): rs = self.pc.convert(['2012-1-1'], None, self.axis)[0] xp = Period('2012-1-1').ordinal - self.assertEqual(rs, xp) + assert rs == xp rs = self.pc.convert('2012-1-1', None, self.axis) - self.assertEqual(rs, xp) + assert rs == xp rs = self.pc.convert([date(2012, 1, 1)], None, self.axis)[0] - self.assertEqual(rs, xp) + assert rs == xp rs = self.pc.convert(date(2012, 1, 1), None, self.axis) - self.assertEqual(rs, xp) + assert rs == xp rs = self.pc.convert([Timestamp('2012-1-1')], None, self.axis)[0] - self.assertEqual(rs, xp) + assert rs == xp rs = self.pc.convert(Timestamp('2012-1-1'), None, self.axis) - self.assertEqual(rs, xp) + assert rs == xp # FIXME # rs = self.pc.convert( # np_datetime64_compat('2012-01-01'), None, self.axis) - # self.assertEqual(rs, xp) + # assert rs == xp # # rs = self.pc.convert( # np_datetime64_compat('2012-01-01 00:00:00+0000'), # None, self.axis) - # self.assertEqual(rs, xp) + # assert rs == xp # # 
rs = self.pc.convert(np.array([ # np_datetime64_compat('2012-01-01 00:00:00+0000'), # np_datetime64_compat('2012-01-02 00:00:00+0000')]), # None, self.axis) - # self.assertEqual(rs[0], xp) + # assert rs[0] == xp def test_integer_passthrough(self): # GH9012 rs = self.pc.convert([0, 1], None, self.axis) xp = [0, 1] - self.assertEqual(rs, xp) - + assert rs == xp -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + def test_convert_nested(self): + data = ['2012-1-1', '2012-1-2'] + r1 = self.pc.convert([data, data], None, self.axis) + r2 = [self.pc.convert(data, None, self.axis) for _ in range(2)] + assert r1 == r2 diff --git a/pandas/tests/plotting/test_datetimelike.py b/pandas/tests/plotting/test_datetimelike.py index 6486c8aa21c1b..cff0c1c0b424e 100644 --- a/pandas/tests/plotting/test_datetimelike.py +++ b/pandas/tests/plotting/test_datetimelike.py @@ -1,31 +1,32 @@ +""" Test cases for time series specific (freq conversion, etc) """ + from datetime import datetime, timedelta, date, time -import nose +import pytest from pandas.compat import lrange, zip import numpy as np -from pandas import Index, Series, DataFrame - -from pandas.tseries.index import date_range, bdate_range +from pandas import Index, Series, DataFrame, NaT +from pandas.compat import is_platform_mac +from pandas.core.indexes.datetimes import date_range, bdate_range +from pandas.core.indexes.timedeltas import timedelta_range from pandas.tseries.offsets import DateOffset -from pandas.tseries.period import period_range, Period, PeriodIndex -from pandas.tseries.resample import DatetimeIndex +from pandas.core.indexes.period import period_range, Period, PeriodIndex +from pandas.core.resample import DatetimeIndex -from pandas.util.testing import assert_series_equal, ensure_clean, slow +from pandas.util.testing import assert_series_equal, ensure_clean import pandas.util.testing as tm from pandas.tests.plotting.common import (TestPlotBase, _skip_if_no_scipy_gaussian_kde) - -""" Test cases for time series specific (freq conversion, etc) """ +tm._skip_if_no_mpl() -@tm.mplskip class TestTSPlot(TestPlotBase): - def setUp(self): - TestPlotBase.setUp(self) + def setup_method(self, method): + TestPlotBase.setup_method(self, method) freq = ['S', 'T', 'H', 'D', 'W', 'M', 'Q', 'A'] idx = [period_range('12/31/1999', freq=x, periods=100) for x in freq] @@ -41,10 +42,10 @@ def setUp(self): columns=['A', 'B', 'C']) for x in idx] - def tearDown(self): + def teardown_method(self, method): tm.close() - @slow + @pytest.mark.slow def test_ts_plot_with_tz(self): # GH2877 index = date_range('1/1/2011', periods=2, freq='H', @@ -54,16 +55,15 @@ def test_ts_plot_with_tz(self): def test_fontsize_set_correctly(self): # For issue #8765 - import matplotlib.pyplot as plt # noqa df = DataFrame(np.random.randn(10, 9), index=range(10)) - ax = df.plot(fontsize=2) + fig, ax = self.plt.subplots() + df.plot(fontsize=2, ax=ax) for label in (ax.get_xticklabels() + ax.get_yticklabels()): - self.assertEqual(label.get_fontsize(), 2) + assert label.get_fontsize() == 2 - @slow + @pytest.mark.slow def test_frame_inferred(self): # inferred freq - import matplotlib.pyplot as plt # noqa idx = date_range('1/1/1987', freq='MS', periods=100) idx = DatetimeIndex(idx.values, freq=None) @@ -89,26 +89,24 @@ def test_is_error_nozeroindex(self): _check_plot_works(a.plot, yerr=a) def test_nonnumeric_exclude(self): - import matplotlib.pyplot as plt - idx = date_range('1/1/1987', freq='A', periods=3) df = 
DataFrame({'A': ["x", "y", "z"], 'B': [1, 2, 3]}, idx) - ax = df.plot() # it works - self.assertEqual(len(ax.get_lines()), 1) # B was plotted - plt.close(plt.gcf()) + fig, ax = self.plt.subplots() + df.plot(ax=ax) # it works + assert len(ax.get_lines()) == 1 # B was plotted + self.plt.close(fig) - self.assertRaises(TypeError, df['A'].plot) + pytest.raises(TypeError, df['A'].plot) - @slow + @pytest.mark.slow def test_tsplot(self): from pandas.tseries.plotting import tsplot - import matplotlib.pyplot as plt - ax = plt.gca() + _, ax = self.plt.subplots() ts = tm.makeTimeSeries() - f = lambda *args, **kwds: tsplot(s, plt.Axes.plot, *args, **kwds) + f = lambda *args, **kwds: tsplot(s, self.plt.Axes.plot, *args, **kwds) for s in self.period_ser: _check_plot_works(f, s.index.freq, ax=ax, series=s) @@ -122,90 +120,93 @@ def test_tsplot(self): for s in self.datetime_ser: _check_plot_works(s.plot, ax=ax) - ax = ts.plot(style='k') + _, ax = self.plt.subplots() + ts.plot(style='k', ax=ax) color = (0., 0., 0., 1) if self.mpl_ge_2_0_0 else (0., 0., 0.) - self.assertEqual(color, ax.get_lines()[0].get_color()) + assert color == ax.get_lines()[0].get_color() def test_both_style_and_color(self): - import matplotlib.pyplot as plt # noqa ts = tm.makeTimeSeries() - self.assertRaises(ValueError, ts.plot, style='b-', color='#000099') + pytest.raises(ValueError, ts.plot, style='b-', color='#000099') s = ts.reset_index(drop=True) - self.assertRaises(ValueError, s.plot, style='b-', color='#000099') + pytest.raises(ValueError, s.plot, style='b-', color='#000099') - @slow + @pytest.mark.slow def test_high_freq(self): freaks = ['ms', 'us'] for freq in freaks: + _, ax = self.plt.subplots() rng = date_range('1/1/2012', periods=100000, freq=freq) ser = Series(np.random.randn(len(rng)), rng) - _check_plot_works(ser.plot) + _check_plot_works(ser.plot, ax=ax) def test_get_datevalue(self): - from pandas.tseries.converter import get_datevalue - self.assertIsNone(get_datevalue(None, 'D')) - self.assertEqual(get_datevalue(1987, 'A'), 1987) - self.assertEqual(get_datevalue(Period(1987, 'A'), 'M'), - Period('1987-12', 'M').ordinal) - self.assertEqual(get_datevalue('1/1/1987', 'D'), - Period('1987-1-1', 'D').ordinal) - - @slow + from pandas.plotting._converter import get_datevalue + assert get_datevalue(None, 'D') is None + assert get_datevalue(1987, 'A') == 1987 + assert (get_datevalue(Period(1987, 'A'), 'M') == + Period('1987-12', 'M').ordinal) + assert (get_datevalue('1/1/1987', 'D') == + Period('1987-1-1', 'D').ordinal) + + @pytest.mark.slow def test_ts_plot_format_coord(self): def check_format_of_first_point(ax, expected_string): first_line = ax.get_lines()[0] first_x = first_line.get_xdata()[0].ordinal first_y = first_line.get_ydata()[0] try: - self.assertEqual(expected_string, - ax.format_coord(first_x, first_y)) + assert expected_string == ax.format_coord(first_x, first_y) except (ValueError): - raise nose.SkipTest("skipping test because issue forming " - "test comparison GH7664") + pytest.skip("skipping test because issue forming " + "test comparison GH7664") annual = Series(1, index=date_range('2014-01-01', periods=3, freq='A-DEC')) - check_format_of_first_point(annual.plot(), 't = 2014 y = 1.000000') + _, ax = self.plt.subplots() + annual.plot(ax=ax) + check_format_of_first_point(ax, 't = 2014 y = 1.000000') # note this is added to the annual plot already in existence, and # changes its freq field daily = Series(1, index=date_range('2014-01-01', periods=3, freq='D')) - check_format_of_first_point(daily.plot(), + 
daily.plot(ax=ax) + check_format_of_first_point(ax, 't = 2014-01-01 y = 1.000000') tm.close() # tsplot - import matplotlib.pyplot as plt + _, ax = self.plt.subplots() from pandas.tseries.plotting import tsplot - tsplot(annual, plt.Axes.plot) - check_format_of_first_point(plt.gca(), 't = 2014 y = 1.000000') - tsplot(daily, plt.Axes.plot) - check_format_of_first_point(plt.gca(), 't = 2014-01-01 y = 1.000000') + tsplot(annual, self.plt.Axes.plot, ax=ax) + check_format_of_first_point(ax, 't = 2014 y = 1.000000') + tsplot(daily, self.plt.Axes.plot, ax=ax) + check_format_of_first_point(ax, 't = 2014-01-01 y = 1.000000') - @slow + @pytest.mark.slow def test_line_plot_period_series(self): for s in self.period_ser: _check_plot_works(s.plot, s.index.freq) - @slow + @pytest.mark.slow def test_line_plot_datetime_series(self): for s in self.datetime_ser: _check_plot_works(s.plot, s.index.freq.rule_code) - @slow + @pytest.mark.slow def test_line_plot_period_frame(self): for df in self.period_df: _check_plot_works(df.plot, df.index.freq) - @slow + @pytest.mark.slow def test_line_plot_datetime_frame(self): for df in self.datetime_df: freq = df.index.to_period(df.index.freq.rule_code).freq _check_plot_works(df.plot, freq) - @slow + @pytest.mark.slow def test_line_plot_inferred_freq(self): for ser in self.datetime_ser: ser = Series(ser.values, Index(np.asarray(ser.index))) @@ -215,17 +216,14 @@ def test_line_plot_inferred_freq(self): _check_plot_works(ser.plot) def test_fake_inferred_business(self): - import matplotlib.pyplot as plt - fig = plt.gcf() - plt.clf() - fig.add_subplot(111) + _, ax = self.plt.subplots() rng = date_range('2001-1-1', '2001-1-10') ts = Series(lrange(len(rng)), rng) ts = ts[:3].append(ts[5:]) - ax = ts.plot() - self.assertFalse(hasattr(ax, 'freq')) + ts.plot(ax=ax) + assert not hasattr(ax, 'freq') - @slow + @pytest.mark.slow def test_plot_offset_freq(self): ser = tm.makeTimeSeries() _check_plot_works(ser.plot) @@ -234,25 +232,21 @@ def test_plot_offset_freq(self): ser = Series(np.random.randn(len(dr)), dr) _check_plot_works(ser.plot) - @slow + @pytest.mark.slow def test_plot_multiple_inferred_freq(self): dr = Index([datetime(2000, 1, 1), datetime(2000, 1, 6), datetime( 2000, 1, 11)]) ser = Series(np.random.randn(len(dr)), dr) _check_plot_works(ser.plot) - @slow + @pytest.mark.slow def test_uhf(self): - import pandas.tseries.converter as conv - import matplotlib.pyplot as plt - fig = plt.gcf() - plt.clf() - fig.add_subplot(111) - + import pandas.plotting._converter as conv idx = date_range('2012-6-22 21:59:51.960928', freq='L', periods=500) df = DataFrame(np.random.randn(len(idx), 2), idx) - ax = df.plot() + _, ax = self.plt.subplots() + df.plot(ax=ax) axis = ax.get_xaxis() tlocs = axis.get_ticklocs() @@ -261,96 +255,88 @@ def test_uhf(self): xp = conv._from_ordinal(loc).strftime('%H:%M:%S.%f') rs = str(label.get_text()) if len(rs): - self.assertEqual(xp, rs) + assert xp == rs - @slow + @pytest.mark.slow def test_irreg_hf(self): - import matplotlib.pyplot as plt - fig = plt.gcf() - plt.clf() - fig.add_subplot(111) - idx = date_range('2012-6-22 21:59:51', freq='S', periods=100) df = DataFrame(np.random.randn(len(idx), 2), idx) irreg = df.iloc[[0, 1, 3, 4]] - ax = irreg.plot() + _, ax = self.plt.subplots() + irreg.plot(ax=ax) diffs = Series(ax.get_lines()[0].get_xydata()[:, 0]).diff() sec = 1. 
/ 24 / 60 / 60 - self.assertTrue((np.fabs(diffs[1:] - [sec, sec * 2, sec]) < 1e-8).all( - )) + assert (np.fabs(diffs[1:] - [sec, sec * 2, sec]) < 1e-8).all() - plt.clf() - fig.add_subplot(111) + _, ax = self.plt.subplots() df2 = df.copy() df2.index = df.index.asobject - ax = df2.plot() + df2.plot(ax=ax) diffs = Series(ax.get_lines()[0].get_xydata()[:, 0]).diff() - self.assertTrue((np.fabs(diffs[1:] - sec) < 1e-8).all()) + assert (np.fabs(diffs[1:] - sec) < 1e-8).all() def test_irregular_datetime64_repr_bug(self): - import matplotlib.pyplot as plt ser = tm.makeTimeSeries() ser = ser[[0, 1, 2, 7]] - fig = plt.gcf() - plt.clf() - ax = fig.add_subplot(211) - ret = ser.plot() - self.assertIsNotNone(ret) + _, ax = self.plt.subplots() + + ret = ser.plot(ax=ax) + assert ret is not None for rs, xp in zip(ax.get_lines()[0].get_xdata(), ser.index): - self.assertEqual(rs, xp) + assert rs == xp def test_business_freq(self): - import matplotlib.pyplot as plt # noqa bts = tm.makePeriodSeries() - ax = bts.plot() - self.assertEqual(ax.get_lines()[0].get_xydata()[0, 0], - bts.index[0].ordinal) + _, ax = self.plt.subplots() + bts.plot(ax=ax) + assert ax.get_lines()[0].get_xydata()[0, 0] == bts.index[0].ordinal idx = ax.get_lines()[0].get_xdata() - self.assertEqual(PeriodIndex(data=idx).freqstr, 'B') + assert PeriodIndex(data=idx).freqstr == 'B' - @slow + @pytest.mark.slow def test_business_freq_convert(self): n = tm.N tm.N = 300 bts = tm.makeTimeSeries().asfreq('BM') tm.N = n ts = bts.to_period('M') - ax = bts.plot() - self.assertEqual(ax.get_lines()[0].get_xydata()[0, 0], - ts.index[0].ordinal) + _, ax = self.plt.subplots() + bts.plot(ax=ax) + assert ax.get_lines()[0].get_xydata()[0, 0] == ts.index[0].ordinal idx = ax.get_lines()[0].get_xdata() - self.assertEqual(PeriodIndex(data=idx).freqstr, 'M') + assert PeriodIndex(data=idx).freqstr == 'M' def test_nonzero_base(self): # GH2571 idx = (date_range('2012-12-20', periods=24, freq='H') + timedelta( minutes=30)) df = DataFrame(np.arange(24), index=idx) - ax = df.plot() + _, ax = self.plt.subplots() + df.plot(ax=ax) rs = ax.get_lines()[0].get_xdata() - self.assertFalse(Index(rs).is_normalized) + assert not Index(rs).is_normalized def test_dataframe(self): bts = DataFrame({'a': tm.makeTimeSeries()}) - ax = bts.plot() + _, ax = self.plt.subplots() + bts.plot(ax=ax) idx = ax.get_lines()[0].get_xdata() tm.assert_index_equal(bts.index.to_period(), PeriodIndex(idx)) - @slow + @pytest.mark.slow def test_axis_limits(self): - import matplotlib.pyplot as plt def _test(ax): xlim = ax.get_xlim() ax.set_xlim(xlim[0] - 5, xlim[1] + 10) ax.get_figure().canvas.draw() result = ax.get_xlim() - self.assertEqual(result[0], xlim[0] - 5) - self.assertEqual(result[1], xlim[1] + 10) + assert result[0] == xlim[0] - 5 + assert result[1] == xlim[1] + 10 # string expected = (Period('1/1/2000', ax.freq), @@ -358,8 +344,8 @@ def _test(ax): ax.set_xlim('1/1/2000', '4/1/2000') ax.get_figure().canvas.draw() result = ax.get_xlim() - self.assertEqual(int(result[0]), expected[0].ordinal) - self.assertEqual(int(result[1]), expected[1].ordinal) + assert int(result[0]) == expected[0].ordinal + assert int(result[1]) == expected[1].ordinal # datetim expected = (Period('1/1/2000', ax.freq), @@ -367,17 +353,19 @@ def _test(ax): ax.set_xlim(datetime(2000, 1, 1), datetime(2000, 4, 1)) ax.get_figure().canvas.draw() result = ax.get_xlim() - self.assertEqual(int(result[0]), expected[0].ordinal) - self.assertEqual(int(result[1]), expected[1].ordinal) + assert int(result[0]) == expected[0].ordinal + assert 
int(result[1]) == expected[1].ordinal fig = ax.get_figure() - plt.close(fig) + self.plt.close(fig) ser = tm.makeTimeSeries() - ax = ser.plot() + _, ax = self.plt.subplots() + ser.plot(ax=ax) _test(ax) + _, ax = self.plt.subplots() df = DataFrame({'a': ser, 'b': ser + 1}) - ax = df.plot() + df.plot(ax=ax) _test(ax) df = DataFrame({'a': ser, 'b': ser + 1}) @@ -387,277 +375,282 @@ def _test(ax): _test(ax) def test_get_finder(self): - import pandas.tseries.converter as conv + import pandas.plotting._converter as conv - self.assertEqual(conv.get_finder('B'), conv._daily_finder) - self.assertEqual(conv.get_finder('D'), conv._daily_finder) - self.assertEqual(conv.get_finder('M'), conv._monthly_finder) - self.assertEqual(conv.get_finder('Q'), conv._quarterly_finder) - self.assertEqual(conv.get_finder('A'), conv._annual_finder) - self.assertEqual(conv.get_finder('W'), conv._daily_finder) + assert conv.get_finder('B') == conv._daily_finder + assert conv.get_finder('D') == conv._daily_finder + assert conv.get_finder('M') == conv._monthly_finder + assert conv.get_finder('Q') == conv._quarterly_finder + assert conv.get_finder('A') == conv._annual_finder + assert conv.get_finder('W') == conv._daily_finder - @slow + @pytest.mark.slow def test_finder_daily(self): - import matplotlib.pyplot as plt xp = Period('1999-1-1', freq='B').ordinal day_lst = [10, 40, 252, 400, 950, 2750, 10000] for n in day_lst: rng = bdate_range('1999-1-1', periods=n) ser = Series(np.random.randn(len(rng)), rng) - ax = ser.plot() + _, ax = self.plt.subplots() + ser.plot(ax=ax) xaxis = ax.get_xaxis() rs = xaxis.get_majorticklocs()[0] - self.assertEqual(xp, rs) + assert xp == rs vmin, vmax = ax.get_xlim() ax.set_xlim(vmin + 0.9, vmax) rs = xaxis.get_majorticklocs()[0] - self.assertEqual(xp, rs) - plt.close(ax.get_figure()) + assert xp == rs + self.plt.close(ax.get_figure()) - @slow + @pytest.mark.slow def test_finder_quarterly(self): - import matplotlib.pyplot as plt xp = Period('1988Q1').ordinal yrs = [3.5, 11] for n in yrs: rng = period_range('1987Q2', periods=int(n * 4), freq='Q') ser = Series(np.random.randn(len(rng)), rng) - ax = ser.plot() + _, ax = self.plt.subplots() + ser.plot(ax=ax) xaxis = ax.get_xaxis() rs = xaxis.get_majorticklocs()[0] - self.assertEqual(rs, xp) + assert rs == xp (vmin, vmax) = ax.get_xlim() ax.set_xlim(vmin + 0.9, vmax) rs = xaxis.get_majorticklocs()[0] - self.assertEqual(xp, rs) - plt.close(ax.get_figure()) + assert xp == rs + self.plt.close(ax.get_figure()) - @slow + @pytest.mark.slow def test_finder_monthly(self): - import matplotlib.pyplot as plt xp = Period('Jan 1988').ordinal yrs = [1.15, 2.5, 4, 11] for n in yrs: rng = period_range('1987Q2', periods=int(n * 12), freq='M') ser = Series(np.random.randn(len(rng)), rng) - ax = ser.plot() + _, ax = self.plt.subplots() + ser.plot(ax=ax) xaxis = ax.get_xaxis() rs = xaxis.get_majorticklocs()[0] - self.assertEqual(rs, xp) + assert rs == xp vmin, vmax = ax.get_xlim() ax.set_xlim(vmin + 0.9, vmax) rs = xaxis.get_majorticklocs()[0] - self.assertEqual(xp, rs) - plt.close(ax.get_figure()) + assert xp == rs + self.plt.close(ax.get_figure()) def test_finder_monthly_long(self): rng = period_range('1988Q1', periods=24 * 12, freq='M') ser = Series(np.random.randn(len(rng)), rng) - ax = ser.plot() + _, ax = self.plt.subplots() + ser.plot(ax=ax) xaxis = ax.get_xaxis() rs = xaxis.get_majorticklocs()[0] xp = Period('1989Q1', 'M').ordinal - self.assertEqual(rs, xp) + assert rs == xp - @slow + @pytest.mark.slow def test_finder_annual(self): - import matplotlib.pyplot as 
plt xp = [1987, 1988, 1990, 1990, 1995, 2020, 2070, 2170] for i, nyears in enumerate([5, 10, 19, 49, 99, 199, 599, 1001]): rng = period_range('1987', periods=nyears, freq='A') ser = Series(np.random.randn(len(rng)), rng) - ax = ser.plot() + _, ax = self.plt.subplots() + ser.plot(ax=ax) xaxis = ax.get_xaxis() rs = xaxis.get_majorticklocs()[0] - self.assertEqual(rs, Period(xp[i], freq='A').ordinal) - plt.close(ax.get_figure()) + assert rs == Period(xp[i], freq='A').ordinal + self.plt.close(ax.get_figure()) - @slow + @pytest.mark.slow def test_finder_minutely(self): nminutes = 50 * 24 * 60 rng = date_range('1/1/1999', freq='Min', periods=nminutes) ser = Series(np.random.randn(len(rng)), rng) - ax = ser.plot() + _, ax = self.plt.subplots() + ser.plot(ax=ax) xaxis = ax.get_xaxis() rs = xaxis.get_majorticklocs()[0] xp = Period('1/1/1999', freq='Min').ordinal - self.assertEqual(rs, xp) + assert rs == xp def test_finder_hourly(self): nhours = 23 rng = date_range('1/1/1999', freq='H', periods=nhours) ser = Series(np.random.randn(len(rng)), rng) - ax = ser.plot() + _, ax = self.plt.subplots() + ser.plot(ax=ax) xaxis = ax.get_xaxis() rs = xaxis.get_majorticklocs()[0] xp = Period('1/1/1999', freq='H').ordinal - self.assertEqual(rs, xp) + assert rs == xp - @slow + @pytest.mark.slow def test_gaps(self): - import matplotlib.pyplot as plt - ts = tm.makeTimeSeries() ts[5:25] = np.nan - ax = ts.plot() + _, ax = self.plt.subplots() + ts.plot(ax=ax) lines = ax.get_lines() tm._skip_if_mpl_1_5() - self.assertEqual(len(lines), 1) + assert len(lines) == 1 l = lines[0] data = l.get_xydata() - tm.assertIsInstance(data, np.ma.core.MaskedArray) + assert isinstance(data, np.ma.core.MaskedArray) mask = data.mask - self.assertTrue(mask[5:25, 1].all()) - plt.close(ax.get_figure()) + assert mask[5:25, 1].all() + self.plt.close(ax.get_figure()) # irregular ts = tm.makeTimeSeries() ts = ts[[0, 1, 2, 5, 7, 9, 12, 15, 20]] ts[2:5] = np.nan - ax = ts.plot() + _, ax = self.plt.subplots() + ax = ts.plot(ax=ax) lines = ax.get_lines() - self.assertEqual(len(lines), 1) + assert len(lines) == 1 l = lines[0] data = l.get_xydata() - tm.assertIsInstance(data, np.ma.core.MaskedArray) + assert isinstance(data, np.ma.core.MaskedArray) mask = data.mask - self.assertTrue(mask[2:5, 1].all()) - plt.close(ax.get_figure()) + assert mask[2:5, 1].all() + self.plt.close(ax.get_figure()) # non-ts idx = [0, 1, 2, 5, 7, 9, 12, 15, 20] ser = Series(np.random.randn(len(idx)), idx) ser[2:5] = np.nan - ax = ser.plot() + _, ax = self.plt.subplots() + ser.plot(ax=ax) lines = ax.get_lines() - self.assertEqual(len(lines), 1) + assert len(lines) == 1 l = lines[0] data = l.get_xydata() - tm.assertIsInstance(data, np.ma.core.MaskedArray) + assert isinstance(data, np.ma.core.MaskedArray) mask = data.mask - self.assertTrue(mask[2:5, 1].all()) + assert mask[2:5, 1].all() - @slow + @pytest.mark.slow def test_gap_upsample(self): low = tm.makeTimeSeries() low[5:25] = np.nan - ax = low.plot() + _, ax = self.plt.subplots() + low.plot(ax=ax) idxh = date_range(low.index[0], low.index[-1], freq='12h') s = Series(np.random.randn(len(idxh)), idxh) s.plot(secondary_y=True) lines = ax.get_lines() - self.assertEqual(len(lines), 1) - self.assertEqual(len(ax.right_ax.get_lines()), 1) + assert len(lines) == 1 + assert len(ax.right_ax.get_lines()) == 1 l = lines[0] data = l.get_xydata() tm._skip_if_mpl_1_5() - tm.assertIsInstance(data, np.ma.core.MaskedArray) + assert isinstance(data, np.ma.core.MaskedArray) mask = data.mask - self.assertTrue(mask[5:25, 1].all()) + assert 
mask[5:25, 1].all() - @slow + @pytest.mark.slow def test_secondary_y(self): - import matplotlib.pyplot as plt - ser = Series(np.random.randn(10)) ser2 = Series(np.random.randn(10)) + fig, _ = self.plt.subplots() ax = ser.plot(secondary_y=True) - self.assertTrue(hasattr(ax, 'left_ax')) - self.assertFalse(hasattr(ax, 'right_ax')) - fig = ax.get_figure() + assert hasattr(ax, 'left_ax') + assert not hasattr(ax, 'right_ax') axes = fig.get_axes() l = ax.get_lines()[0] xp = Series(l.get_ydata(), l.get_xdata()) assert_series_equal(ser, xp) - self.assertEqual(ax.get_yaxis().get_ticks_position(), 'right') - self.assertFalse(axes[0].get_yaxis().get_visible()) - plt.close(fig) + assert ax.get_yaxis().get_ticks_position() == 'right' + assert not axes[0].get_yaxis().get_visible() + self.plt.close(fig) - ax2 = ser2.plot() - self.assertEqual(ax2.get_yaxis().get_ticks_position(), - self.default_tick_position) - plt.close(ax2.get_figure()) + _, ax2 = self.plt.subplots() + ser2.plot(ax=ax2) + assert (ax2.get_yaxis().get_ticks_position() == + self.default_tick_position) + self.plt.close(ax2.get_figure()) ax = ser2.plot() ax2 = ser.plot(secondary_y=True) - self.assertTrue(ax.get_yaxis().get_visible()) - self.assertFalse(hasattr(ax, 'left_ax')) - self.assertTrue(hasattr(ax, 'right_ax')) - self.assertTrue(hasattr(ax2, 'left_ax')) - self.assertFalse(hasattr(ax2, 'right_ax')) + assert ax.get_yaxis().get_visible() + assert not hasattr(ax, 'left_ax') + assert hasattr(ax, 'right_ax') + assert hasattr(ax2, 'left_ax') + assert not hasattr(ax2, 'right_ax') - @slow + @pytest.mark.slow def test_secondary_y_ts(self): - import matplotlib.pyplot as plt idx = date_range('1/1/2000', periods=10) ser = Series(np.random.randn(10), idx) ser2 = Series(np.random.randn(10), idx) + fig, _ = self.plt.subplots() ax = ser.plot(secondary_y=True) - self.assertTrue(hasattr(ax, 'left_ax')) - self.assertFalse(hasattr(ax, 'right_ax')) - fig = ax.get_figure() + assert hasattr(ax, 'left_ax') + assert not hasattr(ax, 'right_ax') axes = fig.get_axes() l = ax.get_lines()[0] xp = Series(l.get_ydata(), l.get_xdata()).to_timestamp() assert_series_equal(ser, xp) - self.assertEqual(ax.get_yaxis().get_ticks_position(), 'right') - self.assertFalse(axes[0].get_yaxis().get_visible()) - plt.close(fig) + assert ax.get_yaxis().get_ticks_position() == 'right' + assert not axes[0].get_yaxis().get_visible() + self.plt.close(fig) - ax2 = ser2.plot() - self.assertEqual(ax2.get_yaxis().get_ticks_position(), - self.default_tick_position) - plt.close(ax2.get_figure()) + _, ax2 = self.plt.subplots() + ser2.plot(ax=ax2) + assert (ax2.get_yaxis().get_ticks_position() == + self.default_tick_position) + self.plt.close(ax2.get_figure()) ax = ser2.plot() ax2 = ser.plot(secondary_y=True) - self.assertTrue(ax.get_yaxis().get_visible()) + assert ax.get_yaxis().get_visible() - @slow + @pytest.mark.slow def test_secondary_kde(self): + if not self.mpl_ge_1_5_0: + pytest.skip("mpl is not supported") tm._skip_if_no_scipy() _skip_if_no_scipy_gaussian_kde() - import matplotlib.pyplot as plt # noqa ser = Series(np.random.randn(10)) - ax = ser.plot(secondary_y=True, kind='density') - self.assertTrue(hasattr(ax, 'left_ax')) - self.assertFalse(hasattr(ax, 'right_ax')) - fig = ax.get_figure() + fig, ax = self.plt.subplots() + ax = ser.plot(secondary_y=True, kind='density', ax=ax) + assert hasattr(ax, 'left_ax') + assert not hasattr(ax, 'right_ax') axes = fig.get_axes() - self.assertEqual(axes[1].get_yaxis().get_ticks_position(), 'right') + assert axes[1].get_yaxis().get_ticks_position() == 
'right' - @slow + @pytest.mark.slow def test_secondary_bar(self): ser = Series(np.random.randn(10)) - ax = ser.plot(secondary_y=True, kind='bar') - fig = ax.get_figure() + fig, ax = self.plt.subplots() + ser.plot(secondary_y=True, kind='bar', ax=ax) axes = fig.get_axes() - self.assertEqual(axes[1].get_yaxis().get_ticks_position(), 'right') + assert axes[1].get_yaxis().get_ticks_position() == 'right' - @slow + @pytest.mark.slow def test_secondary_frame(self): df = DataFrame(np.random.randn(5, 3), columns=['a', 'b', 'c']) axes = df.plot(secondary_y=['a', 'c'], subplots=True) - self.assertEqual(axes[0].get_yaxis().get_ticks_position(), 'right') - self.assertEqual(axes[1].get_yaxis().get_ticks_position(), - self.default_tick_position) - self.assertEqual(axes[2].get_yaxis().get_ticks_position(), 'right') + assert axes[0].get_yaxis().get_ticks_position() == 'right' + assert (axes[1].get_yaxis().get_ticks_position() == + self.default_tick_position) + assert axes[2].get_yaxis().get_ticks_position() == 'right' - @slow + @pytest.mark.slow def test_secondary_bar_frame(self): df = DataFrame(np.random.randn(5, 3), columns=['a', 'b', 'c']) axes = df.plot(kind='bar', secondary_y=['a', 'c'], subplots=True) - self.assertEqual(axes[0].get_yaxis().get_ticks_position(), 'right') - self.assertEqual(axes[1].get_yaxis().get_ticks_position(), - self.default_tick_position) - self.assertEqual(axes[2].get_yaxis().get_ticks_position(), 'right') + assert axes[0].get_yaxis().get_ticks_position() == 'right' + assert (axes[1].get_yaxis().get_ticks_position() == + self.default_tick_position) + assert axes[2].get_yaxis().get_ticks_position() == 'right' def test_mixed_freq_regular_first(self): - import matplotlib.pyplot as plt # noqa + # TODO s1 = tm.makeTimeSeries() s2 = s1[[0, 5, 10, 11, 12, 13, 14, 15]] @@ -668,21 +661,21 @@ def test_mixed_freq_regular_first(self): lines = ax2.get_lines() idx1 = PeriodIndex(lines[0].get_xdata()) idx2 = PeriodIndex(lines[1].get_xdata()) - self.assertTrue(idx1.equals(s1.index.to_period('B'))) - self.assertTrue(idx2.equals(s2.index.to_period('B'))) + assert idx1.equals(s1.index.to_period('B')) + assert idx2.equals(s2.index.to_period('B')) left, right = ax2.get_xlim() pidx = s1.index.to_period() - self.assertEqual(left, pidx[0].ordinal) - self.assertEqual(right, pidx[-1].ordinal) + assert left == pidx[0].ordinal + assert right == pidx[-1].ordinal - @slow + @pytest.mark.slow def test_mixed_freq_irregular_first(self): - import matplotlib.pyplot as plt # noqa s1 = tm.makeTimeSeries() s2 = s1[[0, 5, 10, 11, 12, 13, 14, 15]] - s2.plot(style='g') - ax = s1.plot() - self.assertFalse(hasattr(ax, 'freq')) + _, ax = self.plt.subplots() + s2.plot(style='g', ax=ax) + s1.plot(ax=ax) + assert not hasattr(ax, 'freq') lines = ax.get_lines() x1 = lines[0].get_xdata() tm.assert_numpy_array_equal(x1, s2.index.asobject.values) @@ -691,30 +684,30 @@ def test_mixed_freq_irregular_first(self): def test_mixed_freq_regular_first_df(self): # GH 9852 - import matplotlib.pyplot as plt # noqa s1 = tm.makeTimeSeries().to_frame() s2 = s1.iloc[[0, 5, 10, 11, 12, 13, 14, 15], :] - ax = s1.plot() + _, ax = self.plt.subplots() + s1.plot(ax=ax) ax2 = s2.plot(style='g', ax=ax) lines = ax2.get_lines() idx1 = PeriodIndex(lines[0].get_xdata()) idx2 = PeriodIndex(lines[1].get_xdata()) - self.assertTrue(idx1.equals(s1.index.to_period('B'))) - self.assertTrue(idx2.equals(s2.index.to_period('B'))) + assert idx1.equals(s1.index.to_period('B')) + assert idx2.equals(s2.index.to_period('B')) left, right = ax2.get_xlim() pidx = 
s1.index.to_period() - self.assertEqual(left, pidx[0].ordinal) - self.assertEqual(right, pidx[-1].ordinal) + assert left == pidx[0].ordinal + assert right == pidx[-1].ordinal - @slow + @pytest.mark.slow def test_mixed_freq_irregular_first_df(self): # GH 9852 - import matplotlib.pyplot as plt # noqa s1 = tm.makeTimeSeries().to_frame() s2 = s1.iloc[[0, 5, 10, 11, 12, 13, 14, 15], :] - ax = s2.plot(style='g') - ax = s1.plot(ax=ax) - self.assertFalse(hasattr(ax, 'freq')) + _, ax = self.plt.subplots() + s2.plot(style='g', ax=ax) + s1.plot(ax=ax) + assert not hasattr(ax, 'freq') lines = ax.get_lines() x1 = lines[0].get_xdata() tm.assert_numpy_array_equal(x1, s2.index.asobject.values) @@ -726,12 +719,13 @@ def test_mixed_freq_hf_first(self): idxl = date_range('1/1/1999', periods=12, freq='M') high = Series(np.random.randn(len(idxh)), idxh) low = Series(np.random.randn(len(idxl)), idxl) - high.plot() - ax = low.plot() + _, ax = self.plt.subplots() + high.plot(ax=ax) + low.plot(ax=ax) for l in ax.get_lines(): - self.assertEqual(PeriodIndex(data=l.get_xdata()).freq, 'D') + assert PeriodIndex(data=l.get_xdata()).freq == 'D' - @slow + @pytest.mark.slow def test_mixed_freq_alignment(self): ts_ind = date_range('2012-01-01 13:00', '2012-01-02', freq='H') ts_data = np.random.randn(12) @@ -739,44 +733,46 @@ def test_mixed_freq_alignment(self): ts = Series(ts_data, index=ts_ind) ts2 = ts.asfreq('T').interpolate() - ax = ts.plot() - ts2.plot(style='r') + _, ax = self.plt.subplots() + ax = ts.plot(ax=ax) + ts2.plot(style='r', ax=ax) - self.assertEqual(ax.lines[0].get_xdata()[0], - ax.lines[1].get_xdata()[0]) + assert ax.lines[0].get_xdata()[0] == ax.lines[1].get_xdata()[0] - @slow + @pytest.mark.slow def test_mixed_freq_lf_first(self): - import matplotlib.pyplot as plt idxh = date_range('1/1/1999', periods=365, freq='D') idxl = date_range('1/1/1999', periods=12, freq='M') high = Series(np.random.randn(len(idxh)), idxh) low = Series(np.random.randn(len(idxl)), idxl) - low.plot(legend=True) - ax = high.plot(legend=True) + _, ax = self.plt.subplots() + low.plot(legend=True, ax=ax) + high.plot(legend=True, ax=ax) for l in ax.get_lines(): - self.assertEqual(PeriodIndex(data=l.get_xdata()).freq, 'D') + assert PeriodIndex(data=l.get_xdata()).freq == 'D' leg = ax.get_legend() - self.assertEqual(len(leg.texts), 2) - plt.close(ax.get_figure()) + assert len(leg.texts) == 2 + self.plt.close(ax.get_figure()) idxh = date_range('1/1/1999', periods=240, freq='T') idxl = date_range('1/1/1999', periods=4, freq='H') high = Series(np.random.randn(len(idxh)), idxh) low = Series(np.random.randn(len(idxl)), idxl) - low.plot() - ax = high.plot() + _, ax = self.plt.subplots() + low.plot(ax=ax) + high.plot(ax=ax) for l in ax.get_lines(): - self.assertEqual(PeriodIndex(data=l.get_xdata()).freq, 'T') + assert PeriodIndex(data=l.get_xdata()).freq == 'T' def test_mixed_freq_irreg_period(self): ts = tm.makeTimeSeries() irreg = ts[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 16, 17, 18, 29]] rng = period_range('1/3/2000', periods=30, freq='B') ps = Series(np.random.randn(len(rng)), rng) - irreg.plot() - ps.plot() + _, ax = self.plt.subplots() + irreg.plot(ax=ax) + ps.plot(ax=ax) def test_mixed_freq_shared_ax(self): @@ -790,10 +786,10 @@ def test_mixed_freq_shared_ax(self): s1.plot(ax=ax1) s2.plot(ax=ax2) - self.assertEqual(ax1.freq, 'M') - self.assertEqual(ax2.freq, 'M') - self.assertEqual(ax1.lines[0].get_xydata()[0, 0], - ax2.lines[0].get_xydata()[0, 0]) + assert ax1.freq == 'M' + assert ax2.freq == 'M' + assert (ax1.lines[0].get_xydata()[0, 0] 
== + ax2.lines[0].get_xydata()[0, 0]) # using twinx fig, ax1 = self.plt.subplots() @@ -801,8 +797,8 @@ def test_mixed_freq_shared_ax(self): s1.plot(ax=ax1) s2.plot(ax=ax2) - self.assertEqual(ax1.lines[0].get_xydata()[0, 0], - ax2.lines[0].get_xydata()[0, 0]) + assert (ax1.lines[0].get_xydata()[0, 0] == + ax2.lines[0].get_xydata()[0, 0]) # TODO (GH14330, GH14322) # plotting the irregular first does not yet work @@ -810,65 +806,79 @@ def test_mixed_freq_shared_ax(self): # ax2 = ax1.twinx() # s2.plot(ax=ax1) # s1.plot(ax=ax2) - # self.assertEqual(ax1.lines[0].get_xydata()[0, 0], - # ax2.lines[0].get_xydata()[0, 0]) + # assert (ax1.lines[0].get_xydata()[0, 0] == + # ax2.lines[0].get_xydata()[0, 0]) + + def test_nat_handling(self): + + _, ax = self.plt.subplots() + + dti = DatetimeIndex(['2015-01-01', NaT, '2015-01-03']) + s = Series(range(len(dti)), dti) + s.plot(ax=ax) + xdata = ax.get_lines()[0].get_xdata() + # plot x data is bounded by index values + assert s.index.min() <= Series(xdata).min() + assert Series(xdata).max() <= s.index.max() - @slow + @pytest.mark.slow def test_to_weekly_resampling(self): idxh = date_range('1/1/1999', periods=52, freq='W') idxl = date_range('1/1/1999', periods=12, freq='M') high = Series(np.random.randn(len(idxh)), idxh) low = Series(np.random.randn(len(idxl)), idxl) - high.plot() - ax = low.plot() + _, ax = self.plt.subplots() + high.plot(ax=ax) + low.plot(ax=ax) for l in ax.get_lines(): - self.assertEqual(PeriodIndex(data=l.get_xdata()).freq, idxh.freq) + assert PeriodIndex(data=l.get_xdata()).freq == idxh.freq # tsplot from pandas.tseries.plotting import tsplot - import matplotlib.pyplot as plt - tsplot(high, plt.Axes.plot) - lines = tsplot(low, plt.Axes.plot) + _, ax = self.plt.subplots() + tsplot(high, self.plt.Axes.plot, ax=ax) + lines = tsplot(low, self.plt.Axes.plot, ax=ax) for l in lines: - self.assertTrue(PeriodIndex(data=l.get_xdata()).freq, idxh.freq) + assert PeriodIndex(data=l.get_xdata()).freq == idxh.freq - @slow + @pytest.mark.slow def test_from_weekly_resampling(self): idxh = date_range('1/1/1999', periods=52, freq='W') idxl = date_range('1/1/1999', periods=12, freq='M') high = Series(np.random.randn(len(idxh)), idxh) low = Series(np.random.randn(len(idxl)), idxl) - low.plot() - ax = high.plot() + _, ax = self.plt.subplots() + low.plot(ax=ax) + high.plot(ax=ax) expected_h = idxh.to_period().asi8.astype(np.float64) expected_l = np.array([1514, 1519, 1523, 1527, 1531, 1536, 1540, 1544, 1549, 1553, 1558, 1562], dtype=np.float64) for l in ax.get_lines(): - self.assertTrue(PeriodIndex(data=l.get_xdata()).freq, idxh.freq) + assert PeriodIndex(data=l.get_xdata()).freq == idxh.freq xdata = l.get_xdata(orig=False) if len(xdata) == 12: # idxl lines - self.assert_numpy_array_equal(xdata, expected_l) + tm.assert_numpy_array_equal(xdata, expected_l) else: - self.assert_numpy_array_equal(xdata, expected_h) + tm.assert_numpy_array_equal(xdata, expected_h) tm.close() # tsplot from pandas.tseries.plotting import tsplot - import matplotlib.pyplot as plt - tsplot(low, plt.Axes.plot) - lines = tsplot(high, plt.Axes.plot) + _, ax = self.plt.subplots() + tsplot(low, self.plt.Axes.plot, ax=ax) + lines = tsplot(high, self.plt.Axes.plot, ax=ax) for l in lines: - self.assertTrue(PeriodIndex(data=l.get_xdata()).freq, idxh.freq) + assert PeriodIndex(data=l.get_xdata()).freq == idxh.freq xdata = l.get_xdata(orig=False) if len(xdata) == 12: # idxl lines - self.assert_numpy_array_equal(xdata, expected_l) + tm.assert_numpy_array_equal(xdata, expected_l) else: - 
self.assert_numpy_array_equal(xdata, expected_h) + tm.assert_numpy_array_equal(xdata, expected_h) - @slow + @pytest.mark.slow def test_from_resampling_area_line_mixed(self): idxh = date_range('1/1/1999', periods=52, freq='W') idxl = date_range('1/1/1999', periods=12, freq='M') @@ -879,8 +889,9 @@ def test_from_resampling_area_line_mixed(self): # low to high for kind1, kind2 in [('line', 'area'), ('area', 'line')]: - ax = low.plot(kind=kind1, stacked=True) - ax = high.plot(kind=kind2, stacked=True, ax=ax) + _, ax = self.plt.subplots() + low.plot(kind=kind1, stacked=True, ax=ax) + high.plot(kind=kind2, stacked=True, ax=ax) # check low dataframe result expected_x = np.array([1514, 1519, 1523, 1527, 1531, 1536, 1540, @@ -889,44 +900,43 @@ def test_from_resampling_area_line_mixed(self): expected_y = np.zeros(len(expected_x), dtype=np.float64) for i in range(3): l = ax.lines[i] - self.assertEqual(PeriodIndex(l.get_xdata()).freq, idxh.freq) - self.assert_numpy_array_equal(l.get_xdata(orig=False), - expected_x) + assert PeriodIndex(l.get_xdata()).freq == idxh.freq + tm.assert_numpy_array_equal(l.get_xdata(orig=False), + expected_x) # check stacked values are correct expected_y += low[i].values - self.assert_numpy_array_equal( - l.get_ydata(orig=False), expected_y) + tm.assert_numpy_array_equal(l.get_ydata(orig=False), + expected_y) # check high dataframe result expected_x = idxh.to_period().asi8.astype(np.float64) expected_y = np.zeros(len(expected_x), dtype=np.float64) for i in range(3): l = ax.lines[3 + i] - self.assertEqual(PeriodIndex(data=l.get_xdata()).freq, - idxh.freq) - self.assert_numpy_array_equal(l.get_xdata(orig=False), - expected_x) + assert PeriodIndex(data=l.get_xdata()).freq == idxh.freq + tm.assert_numpy_array_equal(l.get_xdata(orig=False), + expected_x) expected_y += high[i].values - self.assert_numpy_array_equal(l.get_ydata(orig=False), - expected_y) + tm.assert_numpy_array_equal(l.get_ydata(orig=False), + expected_y) # high to low for kind1, kind2 in [('line', 'area'), ('area', 'line')]: - ax = high.plot(kind=kind1, stacked=True) - ax = low.plot(kind=kind2, stacked=True, ax=ax) + _, ax = self.plt.subplots() + high.plot(kind=kind1, stacked=True, ax=ax) + low.plot(kind=kind2, stacked=True, ax=ax) # check high dataframe result expected_x = idxh.to_period().asi8.astype(np.float64) expected_y = np.zeros(len(expected_x), dtype=np.float64) for i in range(3): l = ax.lines[i] - self.assertEqual(PeriodIndex(data=l.get_xdata()).freq, - idxh.freq) - self.assert_numpy_array_equal( - l.get_xdata(orig=False), expected_x) + assert PeriodIndex(data=l.get_xdata()).freq == idxh.freq + tm.assert_numpy_array_equal(l.get_xdata(orig=False), + expected_x) expected_y += high[i].values - self.assert_numpy_array_equal( - l.get_ydata(orig=False), expected_y) + tm.assert_numpy_array_equal(l.get_ydata(orig=False), + expected_y) # check low dataframe result expected_x = np.array([1514, 1519, 1523, 1527, 1531, 1536, 1540, @@ -935,15 +945,14 @@ def test_from_resampling_area_line_mixed(self): expected_y = np.zeros(len(expected_x), dtype=np.float64) for i in range(3): l = ax.lines[3 + i] - self.assertEqual(PeriodIndex(data=l.get_xdata()).freq, - idxh.freq) - self.assert_numpy_array_equal(l.get_xdata(orig=False), - expected_x) + assert PeriodIndex(data=l.get_xdata()).freq == idxh.freq + tm.assert_numpy_array_equal(l.get_xdata(orig=False), + expected_x) expected_y += low[i].values - self.assert_numpy_array_equal(l.get_ydata(orig=False), - expected_y) + tm.assert_numpy_array_equal(l.get_ydata(orig=False), + 
expected_y) - @slow + @pytest.mark.slow def test_mixed_freq_second_millisecond(self): # GH 7772, GH 7760 idxh = date_range('2014-07-01 09:00', freq='S', periods=50) @@ -951,21 +960,23 @@ def test_mixed_freq_second_millisecond(self): high = Series(np.random.randn(len(idxh)), idxh) low = Series(np.random.randn(len(idxl)), idxl) # high to low - high.plot() - ax = low.plot() - self.assertEqual(len(ax.get_lines()), 2) + _, ax = self.plt.subplots() + high.plot(ax=ax) + low.plot(ax=ax) + assert len(ax.get_lines()) == 2 for l in ax.get_lines(): - self.assertEqual(PeriodIndex(data=l.get_xdata()).freq, 'L') + assert PeriodIndex(data=l.get_xdata()).freq == 'L' tm.close() # low to high - low.plot() - ax = high.plot() - self.assertEqual(len(ax.get_lines()), 2) + _, ax = self.plt.subplots() + low.plot(ax=ax) + high.plot(ax=ax) + assert len(ax.get_lines()) == 2 for l in ax.get_lines(): - self.assertEqual(PeriodIndex(data=l.get_xdata()).freq, 'L') + assert PeriodIndex(data=l.get_xdata()).freq == 'L' - @slow + @pytest.mark.slow def test_irreg_dtypes(self): # date idx = [date(2000, 1, 1), date(2000, 1, 5), date(2000, 1, 20)] @@ -976,9 +987,10 @@ def test_irreg_dtypes(self): idx = date_range('1/1/2000', periods=10) idx = idx[[0, 2, 5, 9]].asobject df = DataFrame(np.random.randn(len(idx), 3), idx) - _check_plot_works(df.plot) + _, ax = self.plt.subplots() + _check_plot_works(df.plot, ax=ax) - @slow + @pytest.mark.slow def test_time(self): t = datetime(1, 1, 1, 3, 30, 0) deltas = np.random.randint(1, 20, 3).cumsum() @@ -986,7 +998,8 @@ def test_time(self): df = DataFrame({'a': np.random.randn(len(ts)), 'b': np.random.randn(len(ts))}, index=ts) - ax = df.plot() + _, ax = self.plt.subplots() + df.plot(ax=ax) # verify tick labels ticks = ax.get_xticks() @@ -997,7 +1010,7 @@ def test_time(self): xp = l.get_text() if len(xp) > 0: rs = time(h, m, s).strftime('%H:%M:%S') - self.assertEqual(xp, rs) + assert xp == rs # change xlim ax.set_xlim('1:30', '5:00') @@ -1011,9 +1024,9 @@ def test_time(self): xp = l.get_text() if len(xp) > 0: rs = time(h, m, s).strftime('%H:%M:%S') - self.assertEqual(xp, rs) + assert xp == rs - @slow + @pytest.mark.slow def test_time_musec(self): t = datetime(1, 1, 1, 3, 30, 0) deltas = np.random.randint(1, 20, 3).cumsum() @@ -1022,7 +1035,8 @@ def test_time_musec(self): df = DataFrame({'a': np.random.randn(len(ts)), 'b': np.random.randn(len(ts))}, index=ts) - ax = df.plot() + _, ax = self.plt.subplots() + ax = df.plot(ax=ax) # verify tick labels ticks = ax.get_xticks() @@ -1037,144 +1051,143 @@ def test_time_musec(self): xp = l.get_text() if len(xp) > 0: rs = time(h, m, s).strftime('%H:%M:%S.%f') - self.assertEqual(xp, rs) + assert xp == rs - @slow + @pytest.mark.slow def test_secondary_upsample(self): idxh = date_range('1/1/1999', periods=365, freq='D') idxl = date_range('1/1/1999', periods=12, freq='M') high = Series(np.random.randn(len(idxh)), idxh) low = Series(np.random.randn(len(idxl)), idxl) - low.plot() - ax = high.plot(secondary_y=True) + _, ax = self.plt.subplots() + low.plot(ax=ax) + ax = high.plot(secondary_y=True, ax=ax) for l in ax.get_lines(): - self.assertEqual(PeriodIndex(l.get_xdata()).freq, 'D') - self.assertTrue(hasattr(ax, 'left_ax')) - self.assertFalse(hasattr(ax, 'right_ax')) + assert PeriodIndex(l.get_xdata()).freq == 'D' + assert hasattr(ax, 'left_ax') + assert not hasattr(ax, 'right_ax') for l in ax.left_ax.get_lines(): - self.assertEqual(PeriodIndex(l.get_xdata()).freq, 'D') + assert PeriodIndex(l.get_xdata()).freq == 'D' - @slow + @pytest.mark.slow def 
test_secondary_legend(self): - import matplotlib.pyplot as plt - fig = plt.gcf() - plt.clf() + fig = self.plt.figure() ax = fig.add_subplot(211) # ts df = tm.makeTimeDataFrame() - ax = df.plot(secondary_y=['A', 'B']) + df.plot(secondary_y=['A', 'B'], ax=ax) leg = ax.get_legend() - self.assertEqual(len(leg.get_lines()), 4) - self.assertEqual(leg.get_texts()[0].get_text(), 'A (right)') - self.assertEqual(leg.get_texts()[1].get_text(), 'B (right)') - self.assertEqual(leg.get_texts()[2].get_text(), 'C') - self.assertEqual(leg.get_texts()[3].get_text(), 'D') - self.assertIsNone(ax.right_ax.get_legend()) + assert len(leg.get_lines()) == 4 + assert leg.get_texts()[0].get_text() == 'A (right)' + assert leg.get_texts()[1].get_text() == 'B (right)' + assert leg.get_texts()[2].get_text() == 'C' + assert leg.get_texts()[3].get_text() == 'D' + assert ax.right_ax.get_legend() is None colors = set() for line in leg.get_lines(): colors.add(line.get_color()) # TODO: color cycle problems - self.assertEqual(len(colors), 4) + assert len(colors) == 4 + self.plt.close(fig) - plt.clf() + fig = self.plt.figure() ax = fig.add_subplot(211) - ax = df.plot(secondary_y=['A', 'C'], mark_right=False) + df.plot(secondary_y=['A', 'C'], mark_right=False, ax=ax) leg = ax.get_legend() - self.assertEqual(len(leg.get_lines()), 4) - self.assertEqual(leg.get_texts()[0].get_text(), 'A') - self.assertEqual(leg.get_texts()[1].get_text(), 'B') - self.assertEqual(leg.get_texts()[2].get_text(), 'C') - self.assertEqual(leg.get_texts()[3].get_text(), 'D') - - plt.clf() - ax = df.plot(kind='bar', secondary_y=['A']) + assert len(leg.get_lines()) == 4 + assert leg.get_texts()[0].get_text() == 'A' + assert leg.get_texts()[1].get_text() == 'B' + assert leg.get_texts()[2].get_text() == 'C' + assert leg.get_texts()[3].get_text() == 'D' + self.plt.close(fig) + + fig, ax = self.plt.subplots() + df.plot(kind='bar', secondary_y=['A'], ax=ax) leg = ax.get_legend() - self.assertEqual(leg.get_texts()[0].get_text(), 'A (right)') - self.assertEqual(leg.get_texts()[1].get_text(), 'B') + assert leg.get_texts()[0].get_text() == 'A (right)' + assert leg.get_texts()[1].get_text() == 'B' + self.plt.close(fig) - plt.clf() - ax = df.plot(kind='bar', secondary_y=['A'], mark_right=False) + fig, ax = self.plt.subplots() + df.plot(kind='bar', secondary_y=['A'], mark_right=False, ax=ax) leg = ax.get_legend() - self.assertEqual(leg.get_texts()[0].get_text(), 'A') - self.assertEqual(leg.get_texts()[1].get_text(), 'B') + assert leg.get_texts()[0].get_text() == 'A' + assert leg.get_texts()[1].get_text() == 'B' + self.plt.close(fig) - plt.clf() + fig = self.plt.figure() ax = fig.add_subplot(211) df = tm.makeTimeDataFrame() - ax = df.plot(secondary_y=['C', 'D']) + ax = df.plot(secondary_y=['C', 'D'], ax=ax) leg = ax.get_legend() - self.assertEqual(len(leg.get_lines()), 4) - self.assertIsNone(ax.right_ax.get_legend()) + assert len(leg.get_lines()) == 4 + assert ax.right_ax.get_legend() is None colors = set() for line in leg.get_lines(): colors.add(line.get_color()) # TODO: color cycle problems - self.assertEqual(len(colors), 4) + assert len(colors) == 4 + self.plt.close(fig) # non-ts df = tm.makeDataFrame() - plt.clf() + fig = self.plt.figure() ax = fig.add_subplot(211) - ax = df.plot(secondary_y=['A', 'B']) + ax = df.plot(secondary_y=['A', 'B'], ax=ax) leg = ax.get_legend() - self.assertEqual(len(leg.get_lines()), 4) - self.assertIsNone(ax.right_ax.get_legend()) + assert len(leg.get_lines()) == 4 + assert ax.right_ax.get_legend() is None colors = set() for line in 
leg.get_lines(): colors.add(line.get_color()) # TODO: color cycle problems - self.assertEqual(len(colors), 4) + assert len(colors) == 4 + self.plt.close() - plt.clf() + fig = self.plt.figure() ax = fig.add_subplot(211) - ax = df.plot(secondary_y=['C', 'D']) + ax = df.plot(secondary_y=['C', 'D'], ax=ax) leg = ax.get_legend() - self.assertEqual(len(leg.get_lines()), 4) - self.assertIsNone(ax.right_ax.get_legend()) + assert len(leg.get_lines()) == 4 + assert ax.right_ax.get_legend() is None colors = set() for line in leg.get_lines(): colors.add(line.get_color()) # TODO: color cycle problems - self.assertEqual(len(colors), 4) + assert len(colors) == 4 def test_format_date_axis(self): rng = date_range('1/1/2012', periods=12, freq='M') df = DataFrame(np.random.randn(len(rng), 3), rng) - ax = df.plot() + _, ax = self.plt.subplots() + ax = df.plot(ax=ax) xaxis = ax.get_xaxis() for l in xaxis.get_ticklabels(): if len(l.get_text()) > 0: - self.assertEqual(l.get_rotation(), 30) + assert l.get_rotation() == 30 - @slow + @pytest.mark.slow def test_ax_plot(self): - import matplotlib.pyplot as plt - x = DatetimeIndex(start='2012-01-02', periods=10, freq='D') y = lrange(len(x)) - fig = plt.figure() - ax = fig.add_subplot(111) + _, ax = self.plt.subplots() lines = ax.plot(x, y, label='Y') tm.assert_index_equal(DatetimeIndex(lines[0].get_xdata()), x) - @slow + @pytest.mark.slow def test_mpl_nopandas(self): - import matplotlib.pyplot as plt - dates = [date(2008, 12, 31), date(2009, 1, 31)] values1 = np.arange(10.0, 11.0, 0.5) values2 = np.arange(11.0, 12.0, 0.5) kw = dict(fmt='-', lw=4) - plt.close('all') - fig = plt.figure() - ax = fig.add_subplot(111) + _, ax = self.plt.subplots() ax.plot_date([x.toordinal() for x in dates], values1, **kw) ax.plot_date([x.toordinal() for x in dates], values2, **kw) @@ -1185,22 +1198,23 @@ def test_mpl_nopandas(self): exp = np.array([x.toordinal() for x in dates], dtype=np.float64) tm.assert_numpy_array_equal(line2.get_xydata()[:, 0], exp) - @slow + @pytest.mark.slow def test_irregular_ts_shared_ax_xlim(self): # GH 2960 ts = tm.makeTimeSeries()[:20] ts_irregular = ts[[1, 4, 5, 6, 8, 9, 10, 12, 13, 14, 15, 17, 18]] # plot the left section of the irregular series, then the right section - ax = ts_irregular[:5].plot() + _, ax = self.plt.subplots() + ts_irregular[:5].plot(ax=ax) ts_irregular[5:].plot(ax=ax) # check that axis limits are correct left, right = ax.get_xlim() - self.assertEqual(left, ts_irregular.index.min().toordinal()) - self.assertEqual(right, ts_irregular.index.max().toordinal()) + assert left == ts_irregular.index.min().toordinal() + assert right == ts_irregular.index.max().toordinal() - @slow + @pytest.mark.slow def test_secondary_y_non_ts_xlim(self): # GH 3490 - non-timeseries with secondary y index_1 = [1, 2, 3, 4] @@ -1208,15 +1222,16 @@ def test_secondary_y_non_ts_xlim(self): s1 = Series(1, index=index_1) s2 = Series(2, index=index_2) - ax = s1.plot() + _, ax = self.plt.subplots() + s1.plot(ax=ax) left_before, right_before = ax.get_xlim() s2.plot(secondary_y=True, ax=ax) left_after, right_after = ax.get_xlim() - self.assertEqual(left_before, left_after) - self.assertTrue(right_before < right_after) + assert left_before == left_after + assert right_before < right_after - @slow + @pytest.mark.slow def test_secondary_y_regular_ts_xlim(self): # GH 3490 - regular-timeseries with secondary y index_1 = date_range(start='2000-01-01', periods=4, freq='D') @@ -1224,52 +1239,146 @@ def test_secondary_y_regular_ts_xlim(self): s1 = Series(1, index=index_1) s2 = 
Series(2, index=index_2) - ax = s1.plot() + _, ax = self.plt.subplots() + s1.plot(ax=ax) left_before, right_before = ax.get_xlim() s2.plot(secondary_y=True, ax=ax) left_after, right_after = ax.get_xlim() - self.assertEqual(left_before, left_after) - self.assertTrue(right_before < right_after) + assert left_before == left_after + assert right_before < right_after - @slow + @pytest.mark.slow def test_secondary_y_mixed_freq_ts_xlim(self): # GH 3490 - mixed frequency timeseries with secondary y rng = date_range('2000-01-01', periods=10000, freq='min') ts = Series(1, index=rng) - ax = ts.plot() + _, ax = self.plt.subplots() + ts.plot(ax=ax) left_before, right_before = ax.get_xlim() ts.resample('D').mean().plot(secondary_y=True, ax=ax) left_after, right_after = ax.get_xlim() # a downsample should not have changed either limit - self.assertEqual(left_before, left_after) - self.assertEqual(right_before, right_after) + assert left_before == left_after + assert right_before == right_after - @slow + @pytest.mark.slow def test_secondary_y_irregular_ts_xlim(self): # GH 3490 - irregular-timeseries with secondary y ts = tm.makeTimeSeries()[:20] ts_irregular = ts[[1, 4, 5, 6, 8, 9, 10, 12, 13, 14, 15, 17, 18]] - ax = ts_irregular[:5].plot() + _, ax = self.plt.subplots() + ts_irregular[:5].plot(ax=ax) # plot higher-x values on secondary axis ts_irregular[5:].plot(secondary_y=True, ax=ax) # ensure secondary limits aren't overwritten by plot on primary ts_irregular[:5].plot(ax=ax) left, right = ax.get_xlim() - self.assertEqual(left, ts_irregular.index.min().toordinal()) - self.assertEqual(right, ts_irregular.index.max().toordinal()) + assert left == ts_irregular.index.min().toordinal() + assert right == ts_irregular.index.max().toordinal() def test_plot_outofbounds_datetime(self): # 2579 - checking this does not raise values = [date(1677, 1, 1), date(1677, 1, 2)] - self.plt.plot(values) + _, ax = self.plt.subplots() + ax.plot(values) values = [datetime(1677, 1, 1, 12), datetime(1677, 1, 2, 12)] - self.plt.plot(values) + ax.plot(values) + + def test_format_timedelta_ticks_narrow(self): + if is_platform_mac(): + pytest.skip("skip on mac for precision display issue on older mpl") + + expected_labels = [ + '00:00:00.00000000{:d}'.format(i) + for i in range(10)] + + rng = timedelta_range('0', periods=10, freq='ns') + df = DataFrame(np.random.randn(len(rng), 3), rng) + fig, ax = self.plt.subplots() + df.plot(fontsize=2, ax=ax) + fig.canvas.draw() + labels = ax.get_xticklabels() + assert len(labels) == len(expected_labels) + for l, l_expected in zip(labels, expected_labels): + assert l.get_text() == l_expected + + def test_format_timedelta_ticks_wide(self): + if is_platform_mac(): + pytest.skip("skip on mac for precision display issue on older mpl") + + expected_labels = [ + '00:00:00', + '1 days 03:46:40', + '2 days 07:33:20', + '3 days 11:20:00', + '4 days 15:06:40', + '5 days 18:53:20', + '6 days 22:40:00', + '8 days 02:26:40', + '' + ] + + rng = timedelta_range('0', periods=10, freq='1 d') + df = DataFrame(np.random.randn(len(rng), 3), rng) + fig, ax = self.plt.subplots() + ax = df.plot(fontsize=2, ax=ax) + fig.canvas.draw() + labels = ax.get_xticklabels() + assert len(labels) == len(expected_labels) + for l, l_expected in zip(labels, expected_labels): + assert l.get_text() == l_expected + + def test_timedelta_plot(self): + # test issue #8711 + s = Series(range(5), timedelta_range('1day', periods=5)) + _, ax = self.plt.subplots() + _check_plot_works(s.plot, ax=ax) + + # test long period + index = 
timedelta_range('1 day 2 hr 30 min 10 s',
+                                periods=10, freq='1 d')
+        s = Series(np.random.randn(len(index)), index)
+        _, ax = self.plt.subplots()
+        _check_plot_works(s.plot, ax=ax)
+
+        # test short period
+        index = timedelta_range('1 day 2 hr 30 min 10 s',
+                                periods=10, freq='1 ns')
+        s = Series(np.random.randn(len(index)), index)
+        _, ax = self.plt.subplots()
+        _check_plot_works(s.plot, ax=ax)
+
+    def test_hist(self):
+        # https://github.com/matplotlib/matplotlib/issues/8459
+        rng = date_range('1/1/2011', periods=10, freq='H')
+        x = rng
+        w1 = np.arange(0, 1, .1)
+        w2 = np.arange(0, 1, .1)[::-1]
+        _, ax = self.plt.subplots()
+        ax.hist([x, x], weights=[w1, w2])
+
+    @pytest.mark.slow
+    def test_overlapping_datetime(self):
+        # GH 6608
+        s1 = Series([1, 2, 3], index=[datetime(1995, 12, 31),
+                                      datetime(2000, 12, 31),
+                                      datetime(2005, 12, 31)])
+        s2 = Series([1, 2, 3], index=[datetime(1997, 12, 31),
+                                      datetime(2003, 12, 31),
+                                      datetime(2008, 12, 31)])
+
+        # plot first series, then add the second series to those axes,
+        # then try adding the first series again
+        _, ax = self.plt.subplots()
+        s1.plot(ax=ax)
+        s2.plot(ax=ax)
+        s1.plot(ax=ax)


 def _check_plot_works(f, freq=None, series=None, *args, **kwargs):
@@ -1309,8 +1418,3 @@ def _check_plot_works(f, freq=None, series=None, *args, **kwargs):
         plt.savefig(path)
     finally:
         plt.close(fig)
-
-
-if __name__ == '__main__':
-    nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
-                   exit=False)
diff --git a/pandas/tests/plotting/test_deprecated.py b/pandas/tests/plotting/test_deprecated.py
new file mode 100644
index 0000000000000..970de6ff881ab
--- /dev/null
+++ b/pandas/tests/plotting/test_deprecated.py
@@ -0,0 +1,59 @@
+# coding: utf-8
+
+import string
+
+import pandas as pd
+import pandas.util.testing as tm
+import pytest
+
+from numpy.random import randn
+
+import pandas.tools.plotting as plotting
+
+from pandas.tests.plotting.common import TestPlotBase
+
+
+"""
+Test cases for plot functions imported from deprecated
+pandas.tools.plotting
+"""
+
+tm._skip_if_no_mpl()
+
+
+class TestDeprecatedNameSpace(TestPlotBase):
+
+    @pytest.mark.slow
+    def test_scatter_plot_legacy(self):
+        tm._skip_if_no_scipy()
+
+        df = pd.DataFrame(randn(100, 2))
+
+        with tm.assert_produces_warning(FutureWarning):
+            plotting.scatter_matrix(df)
+
+        with tm.assert_produces_warning(FutureWarning):
+            pd.scatter_matrix(df)
+
+    @pytest.mark.slow
+    def test_boxplot_deprecated(self):
+        df = pd.DataFrame(randn(6, 4),
+                          index=list(string.ascii_letters[:6]),
+                          columns=['one', 'two', 'three', 'four'])
+        df['indic'] = ['foo', 'bar'] * 3
+
+        with tm.assert_produces_warning(FutureWarning):
+            plotting.boxplot(df, column=['one', 'two'],
+                             by='indic')
+
+    @pytest.mark.slow
+    def test_radviz_deprecated(self):
+        df = self.iris
+        with tm.assert_produces_warning(FutureWarning):
+            plotting.radviz(frame=df, class_column='Name')
+
+    @pytest.mark.slow
+    def test_plot_params(self):
+
+        with tm.assert_produces_warning(FutureWarning):
+            pd.plot_params['xaxis.compat'] = True
diff --git a/pandas/tests/plotting/test_frame.py b/pandas/tests/plotting/test_frame.py
index fba554b03f191..67098529a0111 100644
--- a/pandas/tests/plotting/test_frame.py
+++ b/pandas/tests/plotting/test_frame.py
@@ -1,6 +1,8 @@
 # coding: utf-8

-import nose
+""" Test cases for DataFrame.plot """
+
+import pytest

 import string
 import warnings
@@ -9,31 +11,26 @@
 import pandas as pd
 from pandas import (Series, DataFrame, MultiIndex, PeriodIndex, date_range,
                     bdate_range)
-from pandas.types.api import is_list_like
-from 
pandas.compat import (range, lrange, StringIO, lmap, lzip, u, zip, PY3) -from pandas.formats.printing import pprint_thing +from pandas.core.dtypes.api import is_list_like +from pandas.compat import range, lrange, lmap, lzip, u, zip, PY3 +from pandas.io.formats.printing import pprint_thing import pandas.util.testing as tm -from pandas.util.testing import slow - -from pandas.core.config import set_option import numpy as np from numpy.random import rand, randn -import pandas.tools.plotting as plotting +import pandas.plotting as plotting from pandas.tests.plotting.common import (TestPlotBase, _check_plot_works, _skip_if_no_scipy_gaussian_kde, _ok_for_gaussian_kde) - -""" Test cases for DataFrame.plot """ +tm._skip_if_no_mpl() -@tm.mplskip class TestDataFramePlots(TestPlotBase): - def setUp(self): - TestPlotBase.setUp(self) + def setup_method(self, method): + TestPlotBase.setup_method(self, method) import matplotlib as mpl mpl.rcdefaults() @@ -43,7 +40,7 @@ def setUp(self): "C": np.arange(20) + np.random.uniform( size=20)}) - @slow + @pytest.mark.slow def test_plot(self): df = self.tdf _check_plot_works(df.plot, grid=False) @@ -65,7 +62,7 @@ def test_plot(self): df = DataFrame({'x': [1, 2], 'y': [3, 4]}) # mpl >= 1.5.2 (or slightly below) throw AttributError - with tm.assertRaises((TypeError, AttributeError)): + with pytest.raises((TypeError, AttributeError)): df.plot.line(blarg=True) df = DataFrame(np.random.rand(10, 3), @@ -135,12 +132,39 @@ def test_plot(self): # passed ax should be used: fig, ax = self.plt.subplots() axes = df.plot.bar(subplots=True, ax=ax) - self.assertEqual(len(axes), 1) + assert len(axes) == 1 if self.mpl_ge_1_5_0: result = ax.axes else: result = ax.get_axes() # deprecated - self.assertIs(result, axes[0]) + assert result is axes[0] + + # GH 15516 + def test_mpl2_color_cycle_str(self): + # test CN mpl 2.0 color cycle + if self.mpl_ge_2_0_0: + colors = ['C' + str(x) for x in range(10)] + df = DataFrame(randn(10, 3), columns=['a', 'b', 'c']) + for c in colors: + _check_plot_works(df.plot, color=c) + else: + pytest.skip("not supported in matplotlib < 2.0.0") + + def test_color_single_series_list(self): + # GH 3486 + df = DataFrame({"A": [1, 2, 3]}) + _check_plot_works(df.plot, color=['red']) + + def test_rgb_tuple_color(self): + # GH 16695 + df = DataFrame({'x': [1, 2], 'y': [3, 4]}) + _check_plot_works(df.plot, x='x', y='y', color=(1, 0, 0)) + _check_plot_works(df.plot, x='x', y='y', color=(1, 0, 0, 0.5)) + + def test_color_empty_string(self): + df = DataFrame(randn(10, 2)) + with pytest.raises(ValueError): + df.plot(color='') def test_color_and_style_arguments(self): df = DataFrame({'x': [1, 2], 'y': [3, 4]}) @@ -149,35 +173,35 @@ def test_color_and_style_arguments(self): ax = df.plot(color=['red', 'black'], style=['-', '--']) # check that the linestyles are correctly set: linestyle = [line.get_linestyle() for line in ax.lines] - self.assertEqual(linestyle, ['-', '--']) + assert linestyle == ['-', '--'] # check that the colors are correctly set: color = [line.get_color() for line in ax.lines] - self.assertEqual(color, ['red', 'black']) + assert color == ['red', 'black'] # passing both 'color' and 'style' arguments should not be allowed # if there is a color symbol in the style strings: - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.plot(color=['red', 'black'], style=['k-', 'r--']) def test_nonnumeric_exclude(self): df = DataFrame({'A': ["x", "y", "z"], 'B': [1, 2, 3]}) ax = df.plot() - self.assertEqual(len(ax.get_lines()), 1) # B was plotted + 
assert len(ax.get_lines()) == 1 # B was plotted - @slow + @pytest.mark.slow def test_implicit_label(self): df = DataFrame(randn(10, 3), columns=['a', 'b', 'c']) ax = df.plot(x='a', y='b') self._check_text_labels(ax.xaxis.get_label(), 'a') - @slow + @pytest.mark.slow def test_donot_overwrite_index_name(self): # GH 8494 df = DataFrame(randn(2, 2), columns=['a', 'b']) df.index.name = 'NAME' df.plot(y='b', label='LABEL') - self.assertEqual(df.index.name, 'NAME') + assert df.index.name == 'NAME' - @slow + @pytest.mark.slow def test_plot_xy(self): # columns.inferred_type == 'string' df = self.tdf @@ -203,7 +227,7 @@ def test_plot_xy(self): # columns.inferred_type == 'mixed' # TODO add MultiIndex test - @slow + @pytest.mark.slow def test_logscales(self): df = DataFrame({'a': np.arange(100)}, index=np.arange(100)) ax = df.plot(logy=True) @@ -215,40 +239,41 @@ def test_logscales(self): ax = df.plot(loglog=True) self._check_ax_scales(ax, xaxis='log', yaxis='log') - @slow + @pytest.mark.slow def test_xcompat(self): import pandas as pd df = self.tdf ax = df.plot(x_compat=True) lines = ax.get_lines() - self.assertNotIsInstance(lines[0].get_xdata(), PeriodIndex) + assert not isinstance(lines[0].get_xdata(), PeriodIndex) tm.close() - pd.plot_params['xaxis.compat'] = True + pd.plotting.plot_params['xaxis.compat'] = True ax = df.plot() lines = ax.get_lines() - self.assertNotIsInstance(lines[0].get_xdata(), PeriodIndex) + assert not isinstance(lines[0].get_xdata(), PeriodIndex) tm.close() - pd.plot_params['x_compat'] = False + pd.plotting.plot_params['x_compat'] = False + ax = df.plot() lines = ax.get_lines() - self.assertNotIsInstance(lines[0].get_xdata(), PeriodIndex) - self.assertIsInstance(PeriodIndex(lines[0].get_xdata()), PeriodIndex) + assert not isinstance(lines[0].get_xdata(), PeriodIndex) + assert isinstance(PeriodIndex(lines[0].get_xdata()), PeriodIndex) tm.close() # useful if you're plotting a bunch together - with pd.plot_params.use('x_compat', True): + with pd.plotting.plot_params.use('x_compat', True): ax = df.plot() lines = ax.get_lines() - self.assertNotIsInstance(lines[0].get_xdata(), PeriodIndex) + assert not isinstance(lines[0].get_xdata(), PeriodIndex) tm.close() ax = df.plot() lines = ax.get_lines() - self.assertNotIsInstance(lines[0].get_xdata(), PeriodIndex) - self.assertIsInstance(PeriodIndex(lines[0].get_xdata()), PeriodIndex) + assert not isinstance(lines[0].get_xdata(), PeriodIndex) + assert isinstance(PeriodIndex(lines[0].get_xdata()), PeriodIndex) def test_period_compat(self): # GH 9012 @@ -279,7 +304,7 @@ def test_unsorted_index(self): rs = Series(rs[:, 1], rs[:, 0], dtype=np.int64, name='y') tm.assert_series_equal(rs, df.y) - @slow + @pytest.mark.slow def test_subplots(self): df = DataFrame(np.random.rand(10, 3), index=list(string.ascii_letters[:10])) @@ -287,7 +312,7 @@ def test_subplots(self): for kind in ['bar', 'barh', 'line', 'area']: axes = df.plot(kind=kind, subplots=True, sharex=True, legend=True) self._check_axes_shape(axes, axes_num=3, layout=(3, 1)) - self.assertEqual(axes.shape, (3, )) + assert axes.shape == (3, ) for ax, column in zip(axes, df.columns): self._check_legend_labels(ax, @@ -317,9 +342,9 @@ def test_subplots(self): axes = df.plot(kind=kind, subplots=True, legend=False) for ax in axes: - self.assertTrue(ax.get_legend() is None) + assert ax.get_legend() is None - @slow + @pytest.mark.slow def test_subplots_timeseries(self): idx = date_range(start='2014-07-01', freq='M', periods=10) df = DataFrame(np.random.rand(10, 3), index=idx) @@ -355,7 +380,7 @@ def 
test_subplots_timeseries(self): self._check_ticks_props(ax, xlabelsize=7, xrot=45, ylabelsize=7) - @slow + @pytest.mark.slow def test_subplots_layout(self): # GH 6667 df = DataFrame(np.random.rand(10, 3), @@ -363,45 +388,45 @@ def test_subplots_layout(self): axes = df.plot(subplots=True, layout=(2, 2)) self._check_axes_shape(axes, axes_num=3, layout=(2, 2)) - self.assertEqual(axes.shape, (2, 2)) + assert axes.shape == (2, 2) axes = df.plot(subplots=True, layout=(-1, 2)) self._check_axes_shape(axes, axes_num=3, layout=(2, 2)) - self.assertEqual(axes.shape, (2, 2)) + assert axes.shape == (2, 2) axes = df.plot(subplots=True, layout=(2, -1)) self._check_axes_shape(axes, axes_num=3, layout=(2, 2)) - self.assertEqual(axes.shape, (2, 2)) + assert axes.shape == (2, 2) axes = df.plot(subplots=True, layout=(1, 4)) self._check_axes_shape(axes, axes_num=3, layout=(1, 4)) - self.assertEqual(axes.shape, (1, 4)) + assert axes.shape == (1, 4) axes = df.plot(subplots=True, layout=(-1, 4)) self._check_axes_shape(axes, axes_num=3, layout=(1, 4)) - self.assertEqual(axes.shape, (1, 4)) + assert axes.shape == (1, 4) axes = df.plot(subplots=True, layout=(4, -1)) self._check_axes_shape(axes, axes_num=3, layout=(4, 1)) - self.assertEqual(axes.shape, (4, 1)) + assert axes.shape == (4, 1) - with tm.assertRaises(ValueError): - axes = df.plot(subplots=True, layout=(1, 1)) - with tm.assertRaises(ValueError): - axes = df.plot(subplots=True, layout=(-1, -1)) + with pytest.raises(ValueError): + df.plot(subplots=True, layout=(1, 1)) + with pytest.raises(ValueError): + df.plot(subplots=True, layout=(-1, -1)) # single column df = DataFrame(np.random.rand(10, 1), index=list(string.ascii_letters[:10])) axes = df.plot(subplots=True) self._check_axes_shape(axes, axes_num=1, layout=(1, 1)) - self.assertEqual(axes.shape, (1, )) + assert axes.shape == (1, ) axes = df.plot(subplots=True, layout=(3, 3)) self._check_axes_shape(axes, axes_num=1, layout=(3, 3)) - self.assertEqual(axes.shape, (3, 3)) + assert axes.shape == (3, 3) - @slow + @pytest.mark.slow def test_subplots_warnings(self): # GH 9464 warnings.simplefilter('error') @@ -416,7 +441,7 @@ def test_subplots_warnings(self): self.fail(w) warnings.simplefilter('default') - @slow + @pytest.mark.slow def test_subplots_multiple_axes(self): # GH 5353, 6970, GH 7069 fig, axes = self.plt.subplots(2, 3) @@ -426,18 +451,18 @@ def test_subplots_multiple_axes(self): returned = df.plot(subplots=True, ax=axes[0], sharex=False, sharey=False) self._check_axes_shape(returned, axes_num=3, layout=(1, 3)) - self.assertEqual(returned.shape, (3, )) - self.assertIs(returned[0].figure, fig) + assert returned.shape == (3, ) + assert returned[0].figure is fig # draw on second row returned = df.plot(subplots=True, ax=axes[1], sharex=False, sharey=False) self._check_axes_shape(returned, axes_num=3, layout=(1, 3)) - self.assertEqual(returned.shape, (3, )) - self.assertIs(returned[0].figure, fig) + assert returned.shape == (3, ) + assert returned[0].figure is fig self._check_axes_shape(axes, axes_num=6, layout=(2, 3)) tm.close() - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): fig, axes = self.plt.subplots(2, 3) # pass different number of axes from required df.plot(subplots=True, ax=axes) @@ -448,24 +473,23 @@ def test_subplots_multiple_axes(self): # TestDataFrameGroupByPlots.test_grouped_box_multiple_axes fig, axes = self.plt.subplots(2, 2) with warnings.catch_warnings(): - warnings.simplefilter('ignore') df = DataFrame(np.random.rand(10, 4), index=list(string.ascii_letters[:10])) 
returned = df.plot(subplots=True, ax=axes, layout=(2, 1), sharex=False, sharey=False) self._check_axes_shape(returned, axes_num=4, layout=(2, 2)) - self.assertEqual(returned.shape, (4, )) + assert returned.shape == (4, ) returned = df.plot(subplots=True, ax=axes, layout=(2, -1), sharex=False, sharey=False) self._check_axes_shape(returned, axes_num=4, layout=(2, 2)) - self.assertEqual(returned.shape, (4, )) + assert returned.shape == (4, ) returned = df.plot(subplots=True, ax=axes, layout=(-1, 2), sharex=False, sharey=False) self._check_axes_shape(returned, axes_num=4, layout=(2, 2)) - self.assertEqual(returned.shape, (4, )) + assert returned.shape == (4, ) # single column fig, axes = self.plt.subplots(1, 1) @@ -474,7 +498,7 @@ def test_subplots_multiple_axes(self): axes = df.plot(subplots=True, ax=[axes], sharex=False, sharey=False) self._check_axes_shape(axes, axes_num=1, layout=(1, 1)) - self.assertEqual(axes.shape, (1, )) + assert axes.shape == (1, ) def test_subplots_ts_share_axes(self): # GH 3964 @@ -517,36 +541,36 @@ def test_subplots_sharex_axes_existing_axes(self): for ax in axes.ravel(): self._check_visible(ax.get_yticklabels(), visible=True) - @slow + @pytest.mark.slow def test_subplots_dup_columns(self): # GH 10962 df = DataFrame(np.random.rand(5, 5), columns=list('aaaaa')) axes = df.plot(subplots=True) for ax in axes: self._check_legend_labels(ax, labels=['a']) - self.assertEqual(len(ax.lines), 1) + assert len(ax.lines) == 1 tm.close() axes = df.plot(subplots=True, secondary_y='a') for ax in axes: # (right) is only attached when subplots=False self._check_legend_labels(ax, labels=['a']) - self.assertEqual(len(ax.lines), 1) + assert len(ax.lines) == 1 tm.close() ax = df.plot(secondary_y='a') self._check_legend_labels(ax, labels=['a (right)'] * 5) - self.assertEqual(len(ax.lines), 0) - self.assertEqual(len(ax.right_ax.lines), 5) + assert len(ax.lines) == 0 + assert len(ax.right_ax.lines) == 5 def test_negative_log(self): df = - DataFrame(rand(6, 4), index=list(string.ascii_letters[:6]), columns=['x', 'y', 'z', 'four']) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.plot.area(logy=True) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.plot.area(loglog=True) def _compare_stacked_y_cood(self, normal_lines, stacked_lines): @@ -554,7 +578,7 @@ def _compare_stacked_y_cood(self, normal_lines, stacked_lines): for nl, sl in zip(normal_lines, stacked_lines): base += nl.get_data()[1] # get y coodinates sy = sl.get_data()[1] - self.assert_numpy_array_equal(base, sy) + tm.assert_numpy_array_equal(base, sy) def test_line_area_stacked(self): with tm.RNGContext(42): @@ -585,7 +609,7 @@ def test_line_area_stacked(self): self._compare_stacked_y_cood(ax1.lines[2:], ax2.lines[2:]) _check_plot_works(mixed_df.plot, stacked=False) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): mixed_df.plot(stacked=True) _check_plot_works(df.plot, kind=kind, logx=True, stacked=True) @@ -604,55 +628,55 @@ def test_line_area_nan_df(self): # remove nan for comparison purpose exp = np.array([1, 2, 3], dtype=np.float64) - self.assert_numpy_array_equal(np.delete(masked1.data, 2), exp) + tm.assert_numpy_array_equal(np.delete(masked1.data, 2), exp) exp = np.array([3, 2, 1], dtype=np.float64) - self.assert_numpy_array_equal(np.delete(masked2.data, 1), exp) - self.assert_numpy_array_equal( + tm.assert_numpy_array_equal(np.delete(masked2.data, 1), exp) + tm.assert_numpy_array_equal( masked1.mask, np.array([False, False, True, False])) - 
self.assert_numpy_array_equal( + tm.assert_numpy_array_equal( masked2.mask, np.array([False, True, False, False])) expected1 = np.array([1, 2, 0, 3], dtype=np.float64) expected2 = np.array([3, 0, 2, 1], dtype=np.float64) ax = _check_plot_works(d.plot, stacked=True) - self.assert_numpy_array_equal(ax.lines[0].get_ydata(), expected1) - self.assert_numpy_array_equal(ax.lines[1].get_ydata(), - expected1 + expected2) + tm.assert_numpy_array_equal(ax.lines[0].get_ydata(), expected1) + tm.assert_numpy_array_equal(ax.lines[1].get_ydata(), + expected1 + expected2) ax = _check_plot_works(d.plot.area) - self.assert_numpy_array_equal(ax.lines[0].get_ydata(), expected1) - self.assert_numpy_array_equal(ax.lines[1].get_ydata(), - expected1 + expected2) + tm.assert_numpy_array_equal(ax.lines[0].get_ydata(), expected1) + tm.assert_numpy_array_equal(ax.lines[1].get_ydata(), + expected1 + expected2) ax = _check_plot_works(d.plot.area, stacked=False) - self.assert_numpy_array_equal(ax.lines[0].get_ydata(), expected1) - self.assert_numpy_array_equal(ax.lines[1].get_ydata(), expected2) + tm.assert_numpy_array_equal(ax.lines[0].get_ydata(), expected1) + tm.assert_numpy_array_equal(ax.lines[1].get_ydata(), expected2) def test_line_lim(self): df = DataFrame(rand(6, 3), columns=['x', 'y', 'z']) ax = df.plot() xmin, xmax = ax.get_xlim() lines = ax.get_lines() - self.assertEqual(xmin, lines[0].get_data()[0][0]) - self.assertEqual(xmax, lines[0].get_data()[0][-1]) + assert xmin == lines[0].get_data()[0][0] + assert xmax == lines[0].get_data()[0][-1] ax = df.plot(secondary_y=True) xmin, xmax = ax.get_xlim() lines = ax.get_lines() - self.assertEqual(xmin, lines[0].get_data()[0][0]) - self.assertEqual(xmax, lines[0].get_data()[0][-1]) + assert xmin == lines[0].get_data()[0][0] + assert xmax == lines[0].get_data()[0][-1] axes = df.plot(secondary_y=True, subplots=True) self._check_axes_shape(axes, axes_num=3, layout=(3, 1)) for ax in axes: - self.assertTrue(hasattr(ax, 'left_ax')) - self.assertFalse(hasattr(ax, 'right_ax')) + assert hasattr(ax, 'left_ax') + assert not hasattr(ax, 'right_ax') xmin, xmax = ax.get_xlim() lines = ax.get_lines() - self.assertEqual(xmin, lines[0].get_data()[0][0]) - self.assertEqual(xmax, lines[0].get_data()[0][-1]) + assert xmin == lines[0].get_data()[0][0] + assert xmax == lines[0].get_data()[0][-1] def test_area_lim(self): df = DataFrame(rand(6, 4), columns=['x', 'y', 'z', 'four']) @@ -663,15 +687,15 @@ def test_area_lim(self): xmin, xmax = ax.get_xlim() ymin, ymax = ax.get_ylim() lines = ax.get_lines() - self.assertEqual(xmin, lines[0].get_data()[0][0]) - self.assertEqual(xmax, lines[0].get_data()[0][-1]) - self.assertEqual(ymin, 0) + assert xmin == lines[0].get_data()[0][0] + assert xmax == lines[0].get_data()[0][-1] + assert ymin == 0 ax = _check_plot_works(neg_df.plot.area, stacked=stacked) ymin, ymax = ax.get_ylim() - self.assertEqual(ymax, 0) + assert ymax == 0 - @slow + @pytest.mark.slow def test_bar_colors(self): import matplotlib.pyplot as plt default_colors = self._maybe_unpack_cycler(plt.rcParams) @@ -707,28 +731,28 @@ def test_bar_colors(self): self._check_colors(ax.patches[::5], facecolors=['green'] * 5) tm.close() - @slow + @pytest.mark.slow def test_bar_linewidth(self): df = DataFrame(randn(5, 5)) # regular ax = df.plot.bar(linewidth=2) for r in ax.patches: - self.assertEqual(r.get_linewidth(), 2) + assert r.get_linewidth() == 2 # stacked ax = df.plot.bar(stacked=True, linewidth=2) for r in ax.patches: - self.assertEqual(r.get_linewidth(), 2) + assert r.get_linewidth() == 2 # 
subplots axes = df.plot.bar(linewidth=2, subplots=True) self._check_axes_shape(axes, axes_num=5, layout=(5, 1)) for ax in axes: for r in ax.patches: - self.assertEqual(r.get_linewidth(), 2) + assert r.get_linewidth() == 2 - @slow + @pytest.mark.slow def test_bar_barwidth(self): df = DataFrame(randn(5, 5)) @@ -737,36 +761,36 @@ def test_bar_barwidth(self): # regular ax = df.plot.bar(width=width) for r in ax.patches: - self.assertEqual(r.get_width(), width / len(df.columns)) + assert r.get_width() == width / len(df.columns) # stacked ax = df.plot.bar(stacked=True, width=width) for r in ax.patches: - self.assertEqual(r.get_width(), width) + assert r.get_width() == width # horizontal regular ax = df.plot.barh(width=width) for r in ax.patches: - self.assertEqual(r.get_height(), width / len(df.columns)) + assert r.get_height() == width / len(df.columns) # horizontal stacked ax = df.plot.barh(stacked=True, width=width) for r in ax.patches: - self.assertEqual(r.get_height(), width) + assert r.get_height() == width # subplots axes = df.plot.bar(width=width, subplots=True) for ax in axes: for r in ax.patches: - self.assertEqual(r.get_width(), width) + assert r.get_width() == width # horizontal subplots axes = df.plot.barh(width=width, subplots=True) for ax in axes: for r in ax.patches: - self.assertEqual(r.get_height(), width) + assert r.get_height() == width - @slow + @pytest.mark.slow def test_bar_barwidth_position(self): df = DataFrame(randn(5, 5)) self._check_bar_alignment(df, kind='bar', stacked=False, width=0.9, @@ -782,7 +806,7 @@ def test_bar_barwidth_position(self): self._check_bar_alignment(df, kind='barh', subplots=True, width=0.9, position=0.2) - @slow + @pytest.mark.slow def test_bar_barwidth_position_int(self): # GH 12979 df = DataFrame(randn(5, 5)) @@ -791,10 +815,10 @@ def test_bar_barwidth_position_int(self): ax = df.plot.bar(stacked=True, width=w) ticks = ax.xaxis.get_ticklocs() tm.assert_numpy_array_equal(ticks, np.array([0, 1, 2, 3, 4])) - self.assertEqual(ax.get_xlim(), (-0.75, 4.75)) + assert ax.get_xlim() == (-0.75, 4.75) # check left-edge of bars - self.assertEqual(ax.patches[0].get_x(), -0.5) - self.assertEqual(ax.patches[-1].get_x(), 3.5) + assert ax.patches[0].get_x() == -0.5 + assert ax.patches[-1].get_x() == 3.5 self._check_bar_alignment(df, kind='bar', stacked=True, width=1) self._check_bar_alignment(df, kind='barh', stacked=False, width=1) @@ -802,36 +826,36 @@ def test_bar_barwidth_position_int(self): self._check_bar_alignment(df, kind='bar', subplots=True, width=1) self._check_bar_alignment(df, kind='barh', subplots=True, width=1) - @slow + @pytest.mark.slow def test_bar_bottom_left(self): df = DataFrame(rand(5, 5)) ax = df.plot.bar(stacked=False, bottom=1) result = [p.get_y() for p in ax.patches] - self.assertEqual(result, [1] * 25) + assert result == [1] * 25 ax = df.plot.bar(stacked=True, bottom=[-1, -2, -3, -4, -5]) result = [p.get_y() for p in ax.patches[:5]] - self.assertEqual(result, [-1, -2, -3, -4, -5]) + assert result == [-1, -2, -3, -4, -5] ax = df.plot.barh(stacked=False, left=np.array([1, 1, 1, 1, 1])) result = [p.get_x() for p in ax.patches] - self.assertEqual(result, [1] * 25) + assert result == [1] * 25 ax = df.plot.barh(stacked=True, left=[1, 2, 3, 4, 5]) result = [p.get_x() for p in ax.patches[:5]] - self.assertEqual(result, [1, 2, 3, 4, 5]) + assert result == [1, 2, 3, 4, 5] axes = df.plot.bar(subplots=True, bottom=-1) for ax in axes: result = [p.get_y() for p in ax.patches] - self.assertEqual(result, [-1] * 5) + assert result == [-1] * 5 axes = 
df.plot.barh(subplots=True, left=np.array([1, 1, 1, 1, 1])) for ax in axes: result = [p.get_x() for p in ax.patches] - self.assertEqual(result, [1] * 5) + assert result == [1] * 5 - @slow + @pytest.mark.slow def test_bar_nan(self): df = DataFrame({'A': [10, np.nan, 20], 'B': [5, 10, 20], @@ -839,17 +863,17 @@ def test_bar_nan(self): ax = df.plot.bar() expected = [10, 0, 20, 5, 10, 20, 1, 2, 3] result = [p.get_height() for p in ax.patches] - self.assertEqual(result, expected) + assert result == expected ax = df.plot.bar(stacked=True) result = [p.get_height() for p in ax.patches] - self.assertEqual(result, expected) + assert result == expected result = [p.get_y() for p in ax.patches] expected = [0.0, 0.0, 0.0, 10.0, 0.0, 20.0, 15.0, 10.0, 40.0] - self.assertEqual(result, expected) + assert result == expected - @slow + @pytest.mark.slow def test_bar_categorical(self): # GH 13019 df1 = pd.DataFrame(np.random.randn(6, 5), @@ -864,18 +888,18 @@ def test_bar_categorical(self): ax = df.plot.bar() ticks = ax.xaxis.get_ticklocs() tm.assert_numpy_array_equal(ticks, np.array([0, 1, 2, 3, 4, 5])) - self.assertEqual(ax.get_xlim(), (-0.5, 5.5)) + assert ax.get_xlim() == (-0.5, 5.5) # check left-edge of bars - self.assertEqual(ax.patches[0].get_x(), -0.25) - self.assertEqual(ax.patches[-1].get_x(), 5.15) + assert ax.patches[0].get_x() == -0.25 + assert ax.patches[-1].get_x() == 5.15 ax = df.plot.bar(stacked=True) tm.assert_numpy_array_equal(ticks, np.array([0, 1, 2, 3, 4, 5])) - self.assertEqual(ax.get_xlim(), (-0.5, 5.5)) - self.assertEqual(ax.patches[0].get_x(), -0.25) - self.assertEqual(ax.patches[-1].get_x(), 4.75) + assert ax.get_xlim() == (-0.5, 5.5) + assert ax.patches[0].get_x() == -0.25 + assert ax.patches[-1].get_x() == 4.75 - @slow + @pytest.mark.slow def test_plot_scatter(self): df = DataFrame(randn(6, 4), index=list(string.ascii_letters[:6]), @@ -884,16 +908,34 @@ def test_plot_scatter(self): _check_plot_works(df.plot.scatter, x='x', y='y') _check_plot_works(df.plot.scatter, x=1, y=2) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.plot.scatter(x='x') - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.plot.scatter(y='y') # GH 6951 axes = df.plot(x='x', y='y', kind='scatter', subplots=True) self._check_axes_shape(axes, axes_num=1, layout=(1, 1)) - @slow + @pytest.mark.slow + def test_plot_scatter_with_categorical_data(self): + # GH 16199 + df = pd.DataFrame({'x': [1, 2, 3, 4], + 'y': pd.Categorical(['a', 'b', 'a', 'c'])}) + + with pytest.raises(ValueError) as ve: + df.plot(x='x', y='y', kind='scatter') + ve.match('requires y column to be numeric') + + with pytest.raises(ValueError) as ve: + df.plot(x='y', y='x', kind='scatter') + ve.match('requires x column to be numeric') + + with pytest.raises(ValueError) as ve: + df.plot(x='y', y='y', kind='scatter') + ve.match('requires x column to be numeric') + + @pytest.mark.slow def test_plot_scatter_with_c(self): df = DataFrame(randn(6, 4), index=list(string.ascii_letters[:6]), @@ -903,25 +945,25 @@ def test_plot_scatter_with_c(self): df.plot.scatter(x=0, y=1, c=2)] for ax in axes: # default to Greys - self.assertEqual(ax.collections[0].cmap.name, 'Greys') + assert ax.collections[0].cmap.name == 'Greys' if self.mpl_ge_1_3_1: # n.b. 
there appears to be no public method to get the colorbar # label - self.assertEqual(ax.collections[0].colorbar._label, 'z') + assert ax.collections[0].colorbar._label == 'z' cm = 'cubehelix' ax = df.plot.scatter(x='x', y='y', c='z', colormap=cm) - self.assertEqual(ax.collections[0].cmap.name, cm) + assert ax.collections[0].cmap.name == cm # verify turning off colorbar works ax = df.plot.scatter(x='x', y='y', c='z', colorbar=False) - self.assertIs(ax.collections[0].colorbar, None) + assert ax.collections[0].colorbar is None # verify that we can still plot a solid color ax = df.plot.scatter(x=0, y=1, c='red') - self.assertIs(ax.collections[0].colorbar, None) + assert ax.collections[0].colorbar is None self._check_colors(ax.collections, facecolors=['r']) # Ensure that we can pass an np.array straight through to matplotlib, @@ -939,8 +981,8 @@ def test_plot_scatter_with_c(self): # identical to the values we supplied, normally we'd be on shaky ground # comparing floats for equality but here we expect them to be # identical. - self.assertTrue(np.array_equal(ax.collections[0].get_facecolor(), - rgba_array)) + tm.assert_numpy_array_equal(ax.collections[0] + .get_facecolor(), rgba_array) # we don't test the colors of the faces in this next plot because they # are dependent on the spring colormap, which may change its colors # later. @@ -949,7 +991,7 @@ def test_plot_scatter_with_c(self): def test_scatter_colors(self): df = DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3], 'c': [1, 2, 3]}) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.plot.scatter(x='a', y='b', c='c', color='green') default_colors = self._maybe_unpack_cycler(self.plt.rcParams) @@ -963,7 +1005,7 @@ def test_scatter_colors(self): tm.assert_numpy_array_equal(ax.collections[0].get_facecolor()[0], np.array([1, 1, 1, 1], dtype=np.float64)) - @slow + @pytest.mark.slow def test_plot_bar(self): df = DataFrame(randn(6, 4), index=list(string.ascii_letters[:6]), @@ -1020,8 +1062,8 @@ def _check_bar_alignment(self, df, kind='bar', stacked=False, # GH 7498 # compare margins between lim and bar edges - self.assertAlmostEqual(ax_min, min_edge - 0.25) - self.assertAlmostEqual(ax_max, max_edge + 0.25) + tm.assert_almost_equal(ax_min, min_edge - 0.25) + tm.assert_almost_equal(ax_max, max_edge + 0.25) p = ax.patches[0] if kind == 'bar' and (stacked is True or subplots is True): @@ -1041,20 +1083,20 @@ def _check_bar_alignment(self, df, kind='bar', stacked=False, raise ValueError # Check the ticks locates on integer - self.assertTrue((axis.get_ticklocs() == np.arange(len(df))).all()) + assert (axis.get_ticklocs() == np.arange(len(df))).all() if align == 'center': # Check whether the bar locates on center - self.assertAlmostEqual(axis.get_ticklocs()[0], center) + tm.assert_almost_equal(axis.get_ticklocs()[0], center) elif align == 'edge': # Check whether the bar's edge starts from the tick - self.assertAlmostEqual(axis.get_ticklocs()[0], edge) + tm.assert_almost_equal(axis.get_ticklocs()[0], edge) else: raise ValueError return axes - @slow + @pytest.mark.slow def test_bar_stacked_center(self): # GH2157 df = DataFrame({'A': [3] * 5, 'B': lrange(5)}, index=lrange(5)) @@ -1063,7 +1105,7 @@ def test_bar_stacked_center(self): self._check_bar_alignment(df, kind='barh', stacked=True) self._check_bar_alignment(df, kind='barh', stacked=True, width=0.9) - @slow + @pytest.mark.slow def test_bar_center(self): df = DataFrame({'A': [3] * 5, 'B': lrange(5)}, index=lrange(5)) self._check_bar_alignment(df, kind='bar', stacked=False) @@ -1071,7 +1113,7 
@@ def test_bar_center(self): self._check_bar_alignment(df, kind='barh', stacked=False) self._check_bar_alignment(df, kind='barh', stacked=False, width=0.9) - @slow + @pytest.mark.slow def test_bar_subplots_center(self): df = DataFrame({'A': [3] * 5, 'B': lrange(5)}, index=lrange(5)) self._check_bar_alignment(df, kind='bar', subplots=True) @@ -1079,7 +1121,7 @@ def test_bar_subplots_center(self): self._check_bar_alignment(df, kind='barh', subplots=True) self._check_bar_alignment(df, kind='barh', subplots=True, width=0.9) - @slow + @pytest.mark.slow def test_bar_align_single_column(self): df = DataFrame(randn(5)) self._check_bar_alignment(df, kind='bar', stacked=False) @@ -1089,7 +1131,7 @@ def test_bar_align_single_column(self): self._check_bar_alignment(df, kind='bar', subplots=True) self._check_bar_alignment(df, kind='barh', subplots=True) - @slow + @pytest.mark.slow def test_bar_edge(self): df = DataFrame({'A': [3] * 5, 'B': lrange(5)}, index=lrange(5)) @@ -1114,7 +1156,7 @@ def test_bar_edge(self): self._check_bar_alignment(df, kind='barh', subplots=True, width=0.9, align='edge') - @slow + @pytest.mark.slow def test_bar_log_no_subplots(self): # GH3254, GH3298 matplotlib/matplotlib#1882, #1892 # regressions in 1.2.1 @@ -1128,7 +1170,7 @@ def test_bar_log_no_subplots(self): ax = df.plot.bar(grid=True, log=True) tm.assert_numpy_array_equal(ax.yaxis.get_ticklocs(), expected) - @slow + @pytest.mark.slow def test_bar_log_subplots(self): expected = np.array([1., 10., 100., 1000.]) if not self.mpl_le_1_2_1: @@ -1140,7 +1182,7 @@ def test_bar_log_subplots(self): tm.assert_numpy_array_equal(ax[0].yaxis.get_ticklocs(), expected) tm.assert_numpy_array_equal(ax[1].yaxis.get_ticklocs(), expected) - @slow + @pytest.mark.slow def test_boxplot(self): df = self.hist_df series = df['height'] @@ -1151,7 +1193,7 @@ def test_boxplot(self): self._check_text_labels(ax.get_xticklabels(), labels) tm.assert_numpy_array_equal(ax.xaxis.get_ticklocs(), np.arange(1, len(numeric_cols) + 1)) - self.assertEqual(len(ax.lines), self.bp_n_objects * len(numeric_cols)) + assert len(ax.lines) == self.bp_n_objects * len(numeric_cols) # different warning on py3 if not PY3: @@ -1162,7 +1204,7 @@ def test_boxplot(self): self._check_ax_scales(axes, yaxis='log') for ax, label in zip(axes, labels): self._check_text_labels(ax.get_xticklabels(), [label]) - self.assertEqual(len(ax.lines), self.bp_n_objects) + assert len(ax.lines) == self.bp_n_objects axes = series.plot.box(rot=40) self._check_ticks_props(axes, xrot=40, yrot=0) @@ -1176,9 +1218,9 @@ def test_boxplot(self): labels = [pprint_thing(c) for c in numeric_cols] self._check_text_labels(ax.get_xticklabels(), labels) tm.assert_numpy_array_equal(ax.xaxis.get_ticklocs(), positions) - self.assertEqual(len(ax.lines), self.bp_n_objects * len(numeric_cols)) + assert len(ax.lines) == self.bp_n_objects * len(numeric_cols) - @slow + @pytest.mark.slow def test_boxplot_vertical(self): df = self.hist_df numeric_cols = df._get_numeric_data().columns @@ -1188,7 +1230,7 @@ def test_boxplot_vertical(self): ax = df.plot.box(rot=50, fontsize=8, vert=False) self._check_ticks_props(ax, xrot=0, yrot=50, ylabelsize=8) self._check_text_labels(ax.get_yticklabels(), labels) - self.assertEqual(len(ax.lines), self.bp_n_objects * len(numeric_cols)) + assert len(ax.lines) == self.bp_n_objects * len(numeric_cols) # _check_plot_works adds an ax so catch warning. 
see GH #13188 with tm.assert_produces_warning(UserWarning): @@ -1198,20 +1240,20 @@ def test_boxplot_vertical(self): self._check_ax_scales(axes, xaxis='log') for ax, label in zip(axes, labels): self._check_text_labels(ax.get_yticklabels(), [label]) - self.assertEqual(len(ax.lines), self.bp_n_objects) + assert len(ax.lines) == self.bp_n_objects positions = np.array([3, 2, 8]) ax = df.plot.box(positions=positions, vert=False) self._check_text_labels(ax.get_yticklabels(), labels) tm.assert_numpy_array_equal(ax.yaxis.get_ticklocs(), positions) - self.assertEqual(len(ax.lines), self.bp_n_objects * len(numeric_cols)) + assert len(ax.lines) == self.bp_n_objects * len(numeric_cols) - @slow + @pytest.mark.slow def test_boxplot_return_type(self): df = DataFrame(randn(6, 4), index=list(string.ascii_letters[:6]), columns=['one', 'two', 'three', 'four']) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.plot.box(return_type='NOTATYPE') result = df.plot.box(return_type='dict') @@ -1226,13 +1268,13 @@ def test_boxplot_return_type(self): result = df.plot.box(return_type='both') self._check_box_return_type(result, 'both') - @slow + @pytest.mark.slow def test_boxplot_subplots_return_type(self): df = self.hist_df # normal style: return_type=None result = df.plot.box(subplots=True) - self.assertIsInstance(result, Series) + assert isinstance(result, Series) self._check_box_return_type(result, None, expected_keys=[ 'height', 'weight', 'category']) @@ -1243,10 +1285,13 @@ def test_boxplot_subplots_return_type(self): expected_keys=['height', 'weight', 'category'], check_ax_title=False) - @slow + @pytest.mark.slow def test_kde_df(self): tm._skip_if_no_scipy() _skip_if_no_scipy_gaussian_kde() + if not self.mpl_ge_1_5_0: + pytest.skip("mpl is not supported") + df = DataFrame(randn(100, 4)) ax = _check_plot_works(df.plot, kind='kde') expected = [pprint_thing(c) for c in df.columns] @@ -1264,19 +1309,22 @@ def test_kde_df(self): axes = df.plot(kind='kde', logy=True, subplots=True) self._check_ax_scales(axes, yaxis='log') - @slow + @pytest.mark.slow def test_kde_missing_vals(self): tm._skip_if_no_scipy() _skip_if_no_scipy_gaussian_kde() + if not self.mpl_ge_1_5_0: + pytest.skip("mpl is not supported") + df = DataFrame(np.random.uniform(size=(100, 4))) df.loc[0, 0] = np.nan _check_plot_works(df.plot, kind='kde') - @slow + @pytest.mark.slow def test_hist_df(self): from matplotlib.patches import Rectangle if self.mpl_le_1_2_1: - raise nose.SkipTest("not supported in matplotlib <= 1.2.x") + pytest.skip("not supported in matplotlib <= 1.2.x") df = DataFrame(randn(100, 4)) series = df[0] @@ -1298,13 +1346,13 @@ def test_hist_df(self): ax = series.plot.hist(normed=True, cumulative=True, bins=4) # height of last bin (index 5) must be 1.0 rects = [x for x in ax.get_children() if isinstance(x, Rectangle)] - self.assertAlmostEqual(rects[-1].get_height(), 1.0) + tm.assert_almost_equal(rects[-1].get_height(), 1.0) tm.close() ax = series.plot.hist(cumulative=True, bins=4) rects = [x for x in ax.get_children() if isinstance(x, Rectangle)] - self.assertAlmostEqual(rects[-2].get_height(), 100.0) + tm.assert_almost_equal(rects[-2].get_height(), 100.0) tm.close() # if horizontal, yticklabels are rotated @@ -1320,19 +1368,19 @@ def _check_box_coord(self, patches, expected_y=None, expected_h=None, # dtype is depending on above values, no need to check if expected_y is not None: - self.assert_numpy_array_equal(result_y, expected_y, - check_dtype=False) + tm.assert_numpy_array_equal(result_y, expected_y, + 
check_dtype=False) if expected_h is not None: - self.assert_numpy_array_equal(result_height, expected_h, - check_dtype=False) + tm.assert_numpy_array_equal(result_height, expected_h, + check_dtype=False) if expected_x is not None: - self.assert_numpy_array_equal(result_x, expected_x, - check_dtype=False) + tm.assert_numpy_array_equal(result_x, expected_x, + check_dtype=False) if expected_w is not None: - self.assert_numpy_array_equal(result_width, expected_w, - check_dtype=False) + tm.assert_numpy_array_equal(result_width, expected_w, + check_dtype=False) - @slow + @pytest.mark.slow def test_hist_df_coord(self): normal_df = DataFrame({'A': np.repeat(np.array([1, 2, 3, 4, 5]), np.array([10, 9, 8, 7, 6])), @@ -1423,12 +1471,12 @@ def test_hist_df_coord(self): expected_x=np.array([0, 0, 0, 0, 0]), expected_w=np.array([6, 7, 8, 9, 10])) - @slow + @pytest.mark.slow def test_plot_int_columns(self): df = DataFrame(randn(100, 4)).cumsum() _check_plot_works(df.plot, legend=True) - @slow + @pytest.mark.slow def test_df_legend_labels(self): kinds = ['line', 'bar', 'barh', 'kde', 'area', 'hist'] df = DataFrame(rand(3, 3), columns=['a', 'b', 'c']) @@ -1495,7 +1543,7 @@ def test_df_legend_labels(self): self._check_text_labels(ax.xaxis.get_label(), 'a') ax = df5.plot(y='c', label='LABEL_c', ax=ax) self._check_legend_labels(ax, labels=['LABEL_b', 'LABEL_c']) - self.assertTrue(df5.columns.tolist() == ['b', 'c']) + assert df5.columns.tolist() == ['b', 'c'] def test_legend_name(self): multi = DataFrame(randn(4, 4), @@ -1521,7 +1569,7 @@ def test_legend_name(self): leg_title = ax.legend_.get_title() self._check_text_labels(leg_title, 'new') - @slow + @pytest.mark.slow def test_no_legend(self): kinds = ['line', 'bar', 'barh', 'kde', 'area', 'hist'] df = DataFrame(rand(3, 3), columns=['a', 'b', 'c']) @@ -1533,7 +1581,7 @@ def test_no_legend(self): ax = df.plot(kind=kind, legend=False) self._check_legend_labels(ax, visible=False) - @slow + @pytest.mark.slow def test_style_by_column(self): import matplotlib.pyplot as plt fig = plt.gcf() @@ -1547,20 +1595,20 @@ def test_style_by_column(self): fig.add_subplot(111) ax = df.plot(style=markers) for i, l in enumerate(ax.get_lines()[:len(markers)]): - self.assertEqual(l.get_marker(), markers[i]) + assert l.get_marker() == markers[i] - @slow + @pytest.mark.slow def test_line_label_none(self): s = Series([1, 2]) ax = s.plot() - self.assertEqual(ax.get_legend(), None) + assert ax.get_legend() is None ax = s.plot(legend=True) - self.assertEqual(ax.get_legend().get_texts()[0].get_text(), 'None') + assert ax.get_legend().get_texts()[0].get_text() == 'None' - @slow + @pytest.mark.slow + @tm.capture_stdout def test_line_colors(self): - import sys from matplotlib import cm custom_colors = 'rgcby' @@ -1569,16 +1617,13 @@ def test_line_colors(self): ax = df.plot(color=custom_colors) self._check_colors(ax.get_lines(), linecolors=custom_colors) - tmp = sys.stderr - sys.stderr = StringIO() - try: - tm.close() - ax2 = df.plot(colors=custom_colors) - lines2 = ax2.get_lines() - for l1, l2 in zip(ax.get_lines(), lines2): - self.assertEqual(l1.get_color(), l2.get_color()) - finally: - sys.stderr = tmp + tm.close() + + ax2 = df.plot(colors=custom_colors) + lines2 = ax2.get_lines() + + for l1, l2 in zip(ax.get_lines(), lines2): + assert l1.get_color() == l2.get_color() tm.close() @@ -1607,19 +1652,19 @@ def test_line_colors(self): self._check_colors(ax.get_lines(), linecolors=custom_colors) tm.close() - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): # Color contains 
shorthand hex value results in ValueError custom_colors = ['#F00', '#00F', '#FF0', '#000', '#FFF'] # Forced show plot _check_plot_works(df.plot, color=custom_colors) - @slow + @pytest.mark.slow def test_dont_modify_colors(self): colors = ['r', 'g', 'b'] pd.DataFrame(np.random.rand(10, 2)).plot(color=colors) - self.assertEqual(len(colors), 3) + assert len(colors) == 3 - @slow + @pytest.mark.slow def test_line_colors_and_styles_subplots(self): # GH 9894 from matplotlib import cm @@ -1664,7 +1709,7 @@ def test_line_colors_and_styles_subplots(self): self._check_colors(ax.get_lines(), linecolors=[c]) tm.close() - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): # Color contains shorthand hex value results in ValueError custom_colors = ['#F00', '#00F', '#FF0', '#000', '#FFF'] # Forced show plot @@ -1697,7 +1742,7 @@ def test_line_colors_and_styles_subplots(self): self._check_colors(ax.get_lines(), linecolors=[c]) tm.close() - @slow + @pytest.mark.slow def test_area_colors(self): from matplotlib import cm from matplotlib.collections import PolyCollection @@ -1720,7 +1765,7 @@ def test_area_colors(self): self._check_colors(linehandles, linecolors=custom_colors) for h in handles: - self.assertTrue(h.get_alpha() is None) + assert h.get_alpha() is None tm.close() ax = df.plot.area(colormap='jet') @@ -1737,7 +1782,7 @@ def test_area_colors(self): if not isinstance(x, PolyCollection)] self._check_colors(linehandles, linecolors=jet_colors) for h in handles: - self.assertTrue(h.get_alpha() is None) + assert h.get_alpha() is None tm.close() # When stacked=False, alpha is set to 0.5 @@ -1755,9 +1800,9 @@ def test_area_colors(self): linecolors = jet_colors self._check_colors(handles[:len(jet_colors)], linecolors=linecolors) for h in handles: - self.assertEqual(h.get_alpha(), 0.5) + assert h.get_alpha() == 0.5 - @slow + @pytest.mark.slow def test_hist_colors(self): default_colors = self._maybe_unpack_cycler(self.plt.rcParams) @@ -1791,10 +1836,12 @@ def test_hist_colors(self): self._check_colors(ax.patches[::10], facecolors=['green'] * 5) tm.close() - @slow + @pytest.mark.slow def test_kde_colors(self): tm._skip_if_no_scipy() _skip_if_no_scipy_gaussian_kde() + if not self.mpl_ge_1_5_0: + pytest.skip("mpl is not supported") from matplotlib import cm @@ -1814,10 +1861,12 @@ def test_kde_colors(self): rgba_colors = lmap(cm.jet, np.linspace(0, 1, len(df))) self._check_colors(ax.get_lines(), linecolors=rgba_colors) - @slow + @pytest.mark.slow def test_kde_colors_and_styles_subplots(self): tm._skip_if_no_scipy() _skip_if_no_scipy_gaussian_kde() + if not self.mpl_ge_1_5_0: + pytest.skip("mpl is not supported") from matplotlib import cm default_colors = self._maybe_unpack_cycler(self.plt.rcParams) @@ -1873,7 +1922,7 @@ def test_kde_colors_and_styles_subplots(self): self._check_colors(ax.get_lines(), linecolors=[c]) tm.close() - @slow + @pytest.mark.slow def test_boxplot_colors(self): def _check_colors(bp, box_c, whiskers_c, medians_c, caps_c='k', fliers_c=None): @@ -1934,7 +1983,7 @@ def _check_colors(bp, box_c, whiskers_c, medians_c, caps_c='k', _check_colors(bp, (0, 1, 0), (0, 1, 0), (0, 1, 0), (0, 1, 0), '#123456') - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): # Color contains invalid key results in ValueError df.plot.box(color=dict(boxes='red', xxxx='blue')) @@ -1961,13 +2010,13 @@ def test_unordered_ts(self): columns=['test']) ax = df.plot() xticks = ax.lines[0].get_xdata() - self.assertTrue(xticks[0] < xticks[1]) + assert xticks[0] < xticks[1] ydata = 
ax.lines[0].get_ydata() tm.assert_numpy_array_equal(ydata, np.array([1.0, 2.0, 3.0])) def test_kind_both_ways(self): df = DataFrame({'x': [1, 2, 3]}) - for kind in plotting._common_kinds: + for kind in plotting._core._common_kinds: if not _ok_for_gaussian_kde(kind): continue df.plot(kind=kind) @@ -1978,21 +2027,21 @@ def test_kind_both_ways(self): def test_all_invalid_plot_data(self): df = DataFrame(list('abcd')) - for kind in plotting._common_kinds: + for kind in plotting._core._common_kinds: if not _ok_for_gaussian_kde(kind): continue - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.plot(kind=kind) - @slow + @pytest.mark.slow def test_partially_invalid_plot_data(self): with tm.RNGContext(42): df = DataFrame(randn(10, 2), dtype=object) df[np.random.rand(df.shape[0]) > 0.5] = 'a' - for kind in plotting._common_kinds: + for kind in plotting._core._common_kinds: if not _ok_for_gaussian_kde(kind): continue - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.plot(kind=kind) with tm.RNGContext(42): @@ -2001,74 +2050,74 @@ def test_partially_invalid_plot_data(self): df = DataFrame(rand(10, 2), dtype=object) df[np.random.rand(df.shape[0]) > 0.5] = 'a' for kind in kinds: - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.plot(kind=kind) def test_invalid_kind(self): df = DataFrame(randn(10, 2)) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.plot(kind='aasdf') - @slow + @pytest.mark.slow def test_hexbin_basic(self): df = self.hexbin_df ax = df.plot.hexbin(x='A', y='B', gridsize=10) # TODO: need better way to test. This just does existence. - self.assertEqual(len(ax.collections), 1) + assert len(ax.collections) == 1 # GH 6951 axes = df.plot.hexbin(x='A', y='B', subplots=True) # hexbin should have 2 axes in the figure, 1 for plotting and another # is colorbar - self.assertEqual(len(axes[0].figure.axes), 2) + assert len(axes[0].figure.axes) == 2 # return value is single axes self._check_axes_shape(axes, axes_num=1, layout=(1, 1)) - @slow + @pytest.mark.slow def test_hexbin_with_c(self): df = self.hexbin_df ax = df.plot.hexbin(x='A', y='B', C='C') - self.assertEqual(len(ax.collections), 1) + assert len(ax.collections) == 1 ax = df.plot.hexbin(x='A', y='B', C='C', reduce_C_function=np.std) - self.assertEqual(len(ax.collections), 1) + assert len(ax.collections) == 1 - @slow + @pytest.mark.slow def test_hexbin_cmap(self): df = self.hexbin_df # Default to BuGn ax = df.plot.hexbin(x='A', y='B') - self.assertEqual(ax.collections[0].cmap.name, 'BuGn') + assert ax.collections[0].cmap.name == 'BuGn' cm = 'cubehelix' ax = df.plot.hexbin(x='A', y='B', colormap=cm) - self.assertEqual(ax.collections[0].cmap.name, cm) + assert ax.collections[0].cmap.name == cm - @slow + @pytest.mark.slow def test_no_color_bar(self): df = self.hexbin_df ax = df.plot.hexbin(x='A', y='B', colorbar=None) - self.assertIs(ax.collections[0].colorbar, None) + assert ax.collections[0].colorbar is None - @slow + @pytest.mark.slow def test_allow_cmap(self): df = self.hexbin_df ax = df.plot.hexbin(x='A', y='B', cmap='YlGn') - self.assertEqual(ax.collections[0].cmap.name, 'YlGn') + assert ax.collections[0].cmap.name == 'YlGn' - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.plot.hexbin(x='A', y='B', cmap='YlGn', colormap='BuGn') - @slow + @pytest.mark.slow def test_pie_df(self): df = DataFrame(np.random.rand(5, 3), columns=['X', 'Y', 'Z'], index=['a', 'b', 'c', 'd', 'e']) - with tm.assertRaises(ValueError): + with 
pytest.raises(ValueError): df.plot.pie() ax = _check_plot_works(df.plot.pie, y='Y') @@ -2081,11 +2130,11 @@ def test_pie_df(self): with tm.assert_produces_warning(UserWarning): axes = _check_plot_works(df.plot.pie, subplots=True) - self.assertEqual(len(axes), len(df.columns)) + assert len(axes) == len(df.columns) for ax in axes: self._check_text_labels(ax.texts, df.index) for ax, ylabel in zip(axes, df.columns): - self.assertEqual(ax.get_ylabel(), ylabel) + assert ax.get_ylabel() == ylabel labels = ['A', 'B', 'C', 'D', 'E'] color_args = ['r', 'g', 'b', 'c', 'm'] @@ -2093,7 +2142,7 @@ def test_pie_df(self): axes = _check_plot_works(df.plot.pie, subplots=True, labels=labels, colors=color_args) - self.assertEqual(len(axes), len(df.columns)) + assert len(axes) == len(df.columns) for ax in axes: self._check_text_labels(ax.texts, labels) @@ -2111,83 +2160,85 @@ def test_pie_df_nan(self): expected = list(base_expected) # force copy expected[i] = '' result = [x.get_text() for x in ax.texts] - self.assertEqual(result, expected) + assert result == expected # legend labels # NaN's not included in legend with subplots # see https://github.com/pandas-dev/pandas/issues/8390 - self.assertEqual([x.get_text() for x in - ax.get_legend().get_texts()], - base_expected[:i] + base_expected[i + 1:]) + assert ([x.get_text() for x in ax.get_legend().get_texts()] == + base_expected[:i] + base_expected[i + 1:]) - @slow + @pytest.mark.slow def test_errorbar_plot(self): - d = {'x': np.arange(12), 'y': np.arange(12, 0, -1)} - df = DataFrame(d) - d_err = {'x': np.ones(12) * 0.2, 'y': np.ones(12) * 0.4} - df_err = DataFrame(d_err) - - # check line plots - ax = _check_plot_works(df.plot, yerr=df_err, logy=True) - self._check_has_errorbars(ax, xerr=0, yerr=2) - ax = _check_plot_works(df.plot, yerr=df_err, logx=True, logy=True) - self._check_has_errorbars(ax, xerr=0, yerr=2) - ax = _check_plot_works(df.plot, yerr=df_err, loglog=True) - self._check_has_errorbars(ax, xerr=0, yerr=2) + with warnings.catch_warnings(): + d = {'x': np.arange(12), 'y': np.arange(12, 0, -1)} + df = DataFrame(d) + d_err = {'x': np.ones(12) * 0.2, 'y': np.ones(12) * 0.4} + df_err = DataFrame(d_err) - kinds = ['line', 'bar', 'barh'] - for kind in kinds: - ax = _check_plot_works(df.plot, yerr=df_err['x'], kind=kind) + # check line plots + ax = _check_plot_works(df.plot, yerr=df_err, logy=True) self._check_has_errorbars(ax, xerr=0, yerr=2) - ax = _check_plot_works(df.plot, yerr=d_err, kind=kind) + ax = _check_plot_works(df.plot, yerr=df_err, logx=True, logy=True) self._check_has_errorbars(ax, xerr=0, yerr=2) - ax = _check_plot_works(df.plot, yerr=df_err, xerr=df_err, - kind=kind) - self._check_has_errorbars(ax, xerr=2, yerr=2) - ax = _check_plot_works(df.plot, yerr=df_err['x'], xerr=df_err['x'], - kind=kind) - self._check_has_errorbars(ax, xerr=2, yerr=2) - ax = _check_plot_works(df.plot, xerr=0.2, yerr=0.2, kind=kind) - self._check_has_errorbars(ax, xerr=2, yerr=2) - # _check_plot_works adds an ax so catch warning. 
see GH #13188 - with tm.assert_produces_warning(UserWarning): + ax = _check_plot_works(df.plot, yerr=df_err, loglog=True) + self._check_has_errorbars(ax, xerr=0, yerr=2) + + kinds = ['line', 'bar', 'barh'] + for kind in kinds: + ax = _check_plot_works(df.plot, yerr=df_err['x'], kind=kind) + self._check_has_errorbars(ax, xerr=0, yerr=2) + ax = _check_plot_works(df.plot, yerr=d_err, kind=kind) + self._check_has_errorbars(ax, xerr=0, yerr=2) + ax = _check_plot_works(df.plot, yerr=df_err, xerr=df_err, + kind=kind) + self._check_has_errorbars(ax, xerr=2, yerr=2) + ax = _check_plot_works(df.plot, yerr=df_err['x'], + xerr=df_err['x'], + kind=kind) + self._check_has_errorbars(ax, xerr=2, yerr=2) + ax = _check_plot_works(df.plot, xerr=0.2, yerr=0.2, kind=kind) + self._check_has_errorbars(ax, xerr=2, yerr=2) + + # _check_plot_works adds an ax so catch warning. see GH #13188 axes = _check_plot_works(df.plot, yerr=df_err, xerr=df_err, subplots=True, kind=kind) - self._check_has_errorbars(axes, xerr=1, yerr=1) - - ax = _check_plot_works((df + 1).plot, yerr=df_err, - xerr=df_err, kind='bar', log=True) - self._check_has_errorbars(ax, xerr=2, yerr=2) + self._check_has_errorbars(axes, xerr=1, yerr=1) - # yerr is raw error values - ax = _check_plot_works(df['y'].plot, yerr=np.ones(12) * 0.4) - self._check_has_errorbars(ax, xerr=0, yerr=1) - ax = _check_plot_works(df.plot, yerr=np.ones((2, 12)) * 0.4) - self._check_has_errorbars(ax, xerr=0, yerr=2) + ax = _check_plot_works((df + 1).plot, yerr=df_err, + xerr=df_err, kind='bar', log=True) + self._check_has_errorbars(ax, xerr=2, yerr=2) - # yerr is iterator - import itertools - ax = _check_plot_works(df.plot, yerr=itertools.repeat(0.1, len(df))) - self._check_has_errorbars(ax, xerr=0, yerr=2) + # yerr is raw error values + ax = _check_plot_works(df['y'].plot, yerr=np.ones(12) * 0.4) + self._check_has_errorbars(ax, xerr=0, yerr=1) + ax = _check_plot_works(df.plot, yerr=np.ones((2, 12)) * 0.4) + self._check_has_errorbars(ax, xerr=0, yerr=2) - # yerr is column name - for yerr in ['yerr', u('誤差')]: - s_df = df.copy() - s_df[yerr] = np.ones(12) * 0.2 - ax = _check_plot_works(s_df.plot, yerr=yerr) + # yerr is iterator + import itertools + ax = _check_plot_works(df.plot, + yerr=itertools.repeat(0.1, len(df))) self._check_has_errorbars(ax, xerr=0, yerr=2) - ax = _check_plot_works(s_df.plot, y='y', x='x', yerr=yerr) - self._check_has_errorbars(ax, xerr=0, yerr=1) - with tm.assertRaises(ValueError): - df.plot(yerr=np.random.randn(11)) + # yerr is column name + for yerr in ['yerr', u('誤差')]: + s_df = df.copy() + s_df[yerr] = np.ones(12) * 0.2 + ax = _check_plot_works(s_df.plot, yerr=yerr) + self._check_has_errorbars(ax, xerr=0, yerr=2) + ax = _check_plot_works(s_df.plot, y='y', x='x', yerr=yerr) + self._check_has_errorbars(ax, xerr=0, yerr=1) + + with pytest.raises(ValueError): + df.plot(yerr=np.random.randn(11)) - df_err = DataFrame({'x': ['zzz'] * 12, 'y': ['zzz'] * 12}) - with tm.assertRaises((ValueError, TypeError)): - df.plot(yerr=df_err) + df_err = DataFrame({'x': ['zzz'] * 12, 'y': ['zzz'] * 12}) + with pytest.raises((ValueError, TypeError)): + df.plot(yerr=df_err) - @slow + @pytest.mark.slow def test_errorbar_with_integer_column_names(self): # test with integer column names df = DataFrame(np.random.randn(10, 2)) @@ -2197,7 +2248,7 @@ def test_errorbar_with_integer_column_names(self): ax = _check_plot_works(df.plot, y=0, yerr=1) self._check_has_errorbars(ax, xerr=0, yerr=1) - @slow + @pytest.mark.slow def test_errorbar_with_partial_columns(self): df = 
DataFrame(np.random.randn(10, 3)) df_err = DataFrame(np.random.randn(10, 2), columns=[0, 2]) @@ -2220,36 +2271,37 @@ def test_errorbar_with_partial_columns(self): ax = _check_plot_works(df.plot, yerr=err) self._check_has_errorbars(ax, xerr=0, yerr=1) - @slow + @pytest.mark.slow def test_errorbar_timeseries(self): - d = {'x': np.arange(12), 'y': np.arange(12, 0, -1)} - d_err = {'x': np.ones(12) * 0.2, 'y': np.ones(12) * 0.4} + with warnings.catch_warnings(): + d = {'x': np.arange(12), 'y': np.arange(12, 0, -1)} + d_err = {'x': np.ones(12) * 0.2, 'y': np.ones(12) * 0.4} - # check time-series plots - ix = date_range('1/1/2000', '1/1/2001', freq='M') - tdf = DataFrame(d, index=ix) - tdf_err = DataFrame(d_err, index=ix) + # check time-series plots + ix = date_range('1/1/2000', '1/1/2001', freq='M') + tdf = DataFrame(d, index=ix) + tdf_err = DataFrame(d_err, index=ix) - kinds = ['line', 'bar', 'barh'] - for kind in kinds: - ax = _check_plot_works(tdf.plot, yerr=tdf_err, kind=kind) - self._check_has_errorbars(ax, xerr=0, yerr=2) - ax = _check_plot_works(tdf.plot, yerr=d_err, kind=kind) - self._check_has_errorbars(ax, xerr=0, yerr=2) - ax = _check_plot_works(tdf.plot, y='y', yerr=tdf_err['x'], - kind=kind) - self._check_has_errorbars(ax, xerr=0, yerr=1) - ax = _check_plot_works(tdf.plot, y='y', yerr='x', kind=kind) - self._check_has_errorbars(ax, xerr=0, yerr=1) - ax = _check_plot_works(tdf.plot, yerr=tdf_err, kind=kind) - self._check_has_errorbars(ax, xerr=0, yerr=2) - # _check_plot_works adds an ax so catch warning. see GH #13188 - with tm.assert_produces_warning(UserWarning): + kinds = ['line', 'bar', 'barh'] + for kind in kinds: + ax = _check_plot_works(tdf.plot, yerr=tdf_err, kind=kind) + self._check_has_errorbars(ax, xerr=0, yerr=2) + ax = _check_plot_works(tdf.plot, yerr=d_err, kind=kind) + self._check_has_errorbars(ax, xerr=0, yerr=2) + ax = _check_plot_works(tdf.plot, y='y', yerr=tdf_err['x'], + kind=kind) + self._check_has_errorbars(ax, xerr=0, yerr=1) + ax = _check_plot_works(tdf.plot, y='y', yerr='x', kind=kind) + self._check_has_errorbars(ax, xerr=0, yerr=1) + ax = _check_plot_works(tdf.plot, yerr=tdf_err, kind=kind) + self._check_has_errorbars(ax, xerr=0, yerr=2) + + # _check_plot_works adds an ax so catch warning. 
see GH #13188 axes = _check_plot_works(tdf.plot, kind=kind, yerr=tdf_err, subplots=True) - self._check_has_errorbars(axes, xerr=0, yerr=1) + self._check_has_errorbars(axes, xerr=0, yerr=1) def test_errorbar_asymmetrical(self): @@ -2267,15 +2319,12 @@ def test_errorbar_asymmetrical(self): expected_0_0 = err[0, :, 0] * np.array([-1, 1]) tm.assert_almost_equal(yerr_0_0, expected_0_0) else: - self.assertEqual(ax.lines[7].get_ydata()[0], - data[0, 1] - err[1, 0, 0]) - self.assertEqual(ax.lines[8].get_ydata()[0], - data[0, 1] + err[1, 1, 0]) + assert ax.lines[7].get_ydata()[0] == data[0, 1] - err[1, 0, 0] + assert ax.lines[8].get_ydata()[0] == data[0, 1] + err[1, 1, 0] + assert ax.lines[5].get_xdata()[0] == -err[1, 0, 0] / 2 + assert ax.lines[6].get_xdata()[0] == err[1, 1, 0] / 2 - self.assertEqual(ax.lines[5].get_xdata()[0], -err[1, 0, 0] / 2) - self.assertEqual(ax.lines[6].get_xdata()[0], err[1, 1, 0] / 2) - - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.plot(yerr=err.T) tm.close() @@ -2287,9 +2336,9 @@ def test_table(self): _check_plot_works(df.plot, table=df) ax = df.plot() - self.assertTrue(len(ax.tables) == 0) + assert len(ax.tables) == 0 plotting.table(ax, df.T) - self.assertTrue(len(ax.tables) == 1) + assert len(ax.tables) == 1 def test_errorbar_scatter(self): df = DataFrame( @@ -2333,7 +2382,7 @@ def _check_errorbar_color(containers, expected, has_err='has_xerr'): self._check_has_errorbars(ax, xerr=0, yerr=1) _check_errorbar_color(ax.containers, 'green', has_err='has_yerr') - @slow + @pytest.mark.slow def test_sharex_and_ax(self): # https://github.com/pandas-dev/pandas/issues/9737 using gridspec, # the axis in fig.get_axis() are sorted differently than pandas @@ -2349,7 +2398,7 @@ def test_sharex_and_ax(self): def _check(axes): for ax in axes: - self.assertEqual(len(ax.lines), 1) + assert len(ax.lines) == 1 self._check_visible(ax.get_yticklabels(), visible=True) for ax in [axes[0], axes[2]]: self._check_visible(ax.get_xticklabels(), visible=False) @@ -2379,13 +2428,13 @@ def _check(axes): gs.tight_layout(plt.gcf()) for ax in axes: - self.assertEqual(len(ax.lines), 1) + assert len(ax.lines) == 1 self._check_visible(ax.get_yticklabels(), visible=True) self._check_visible(ax.get_xticklabels(), visible=True) self._check_visible(ax.get_xticklabels(minor=True), visible=True) tm.close() - @slow + @pytest.mark.slow def test_sharey_and_ax(self): # https://github.com/pandas-dev/pandas/issues/9737 using gridspec, # the axis in fig.get_axis() are sorted differently than pandas @@ -2401,7 +2450,7 @@ def test_sharey_and_ax(self): def _check(axes): for ax in axes: - self.assertEqual(len(ax.lines), 1) + assert len(ax.lines) == 1 self._check_visible(ax.get_xticklabels(), visible=True) self._check_visible( ax.get_xticklabels(minor=True), visible=True) @@ -2431,7 +2480,7 @@ def _check(axes): gs.tight_layout(plt.gcf()) for ax in axes: - self.assertEqual(len(ax.lines), 1) + assert len(ax.lines) == 1 self._check_visible(ax.get_yticklabels(), visible=True) self._check_visible(ax.get_xticklabels(), visible=True) self._check_visible(ax.get_xticklabels(minor=True), visible=True) @@ -2442,7 +2491,7 @@ def test_memory_leak(self): import gc results = {} - for kind in plotting._plot_klass.keys(): + for kind in plotting._core._plot_klass.keys(): if not _ok_for_gaussian_kde(kind): continue args = {} @@ -2464,11 +2513,11 @@ def test_memory_leak(self): gc.collect() for key in results: # check that every plot was collected - with tm.assertRaises(ReferenceError): + with 
pytest.raises(ReferenceError): # need to actually access something to get an error results[key].lines - @slow + @pytest.mark.slow def test_df_subplots_patterns_minorticks(self): # GH 10657 import matplotlib.pyplot as plt @@ -2481,7 +2530,7 @@ def test_df_subplots_patterns_minorticks(self): fig, axes = plt.subplots(2, 1, sharex=True) axes = df.plot(subplots=True, ax=axes) for ax in axes: - self.assertEqual(len(ax.lines), 1) + assert len(ax.lines) == 1 self._check_visible(ax.get_yticklabels(), visible=True) # xaxis of 1st ax must be hidden self._check_visible(axes[0].get_xticklabels(), visible=False) @@ -2494,7 +2543,7 @@ def test_df_subplots_patterns_minorticks(self): with tm.assert_produces_warning(UserWarning): axes = df.plot(subplots=True, ax=axes, sharex=True) for ax in axes: - self.assertEqual(len(ax.lines), 1) + assert len(ax.lines) == 1 self._check_visible(ax.get_yticklabels(), visible=True) # xaxis of 1st ax must be hidden self._check_visible(axes[0].get_xticklabels(), visible=False) @@ -2507,13 +2556,13 @@ def test_df_subplots_patterns_minorticks(self): fig, axes = plt.subplots(2, 1) axes = df.plot(subplots=True, ax=axes) for ax in axes: - self.assertEqual(len(ax.lines), 1) + assert len(ax.lines) == 1 self._check_visible(ax.get_yticklabels(), visible=True) self._check_visible(ax.get_xticklabels(), visible=True) self._check_visible(ax.get_xticklabels(minor=True), visible=True) tm.close() - @slow + @pytest.mark.slow def test_df_gridspec_patterns(self): # GH 10819 import matplotlib.pyplot as plt @@ -2541,9 +2590,9 @@ def _get_horizontal_grid(): for ax1, ax2 in [_get_vertical_grid(), _get_horizontal_grid()]: ax1 = ts.plot(ax=ax1) - self.assertEqual(len(ax1.lines), 1) + assert len(ax1.lines) == 1 ax2 = df.plot(ax=ax2) - self.assertEqual(len(ax2.lines), 2) + assert len(ax2.lines) == 2 for ax in [ax1, ax2]: self._check_visible(ax.get_yticklabels(), visible=True) self._check_visible(ax.get_xticklabels(), visible=True) @@ -2554,8 +2603,8 @@ def _get_horizontal_grid(): # subplots=True for ax1, ax2 in [_get_vertical_grid(), _get_horizontal_grid()]: axes = df.plot(subplots=True, ax=[ax1, ax2]) - self.assertEqual(len(ax1.lines), 1) - self.assertEqual(len(ax2.lines), 1) + assert len(ax1.lines) == 1 + assert len(ax2.lines) == 1 for ax in axes: self._check_visible(ax.get_yticklabels(), visible=True) self._check_visible(ax.get_xticklabels(), visible=True) @@ -2568,8 +2617,8 @@ def _get_horizontal_grid(): with tm.assert_produces_warning(UserWarning): axes = df.plot(subplots=True, ax=[ax1, ax2], sharex=True, sharey=True) - self.assertEqual(len(axes[0].lines), 1) - self.assertEqual(len(axes[1].lines), 1) + assert len(axes[0].lines) == 1 + assert len(axes[1].lines) == 1 for ax in [ax1, ax2]: # yaxis are visible because there is only one column self._check_visible(ax.get_yticklabels(), visible=True) @@ -2585,8 +2634,8 @@ def _get_horizontal_grid(): with tm.assert_produces_warning(UserWarning): axes = df.plot(subplots=True, ax=[ax1, ax2], sharex=True, sharey=True) - self.assertEqual(len(axes[0].lines), 1) - self.assertEqual(len(axes[1].lines), 1) + assert len(axes[0].lines) == 1 + assert len(axes[1].lines) == 1 self._check_visible(axes[0].get_yticklabels(), visible=True) # yaxis of axes1 (right) are hidden self._check_visible(axes[1].get_yticklabels(), visible=False) @@ -2611,7 +2660,7 @@ def _get_boxed_grid(): index=ts.index, columns=list('ABCD')) axes = df.plot(subplots=True, ax=axes) for ax in axes: - self.assertEqual(len(ax.lines), 1) + assert len(ax.lines) == 1 # axis are visible because these are 
not shared self._check_visible(ax.get_yticklabels(), visible=True) self._check_visible(ax.get_xticklabels(), visible=True) @@ -2623,7 +2672,7 @@ def _get_boxed_grid(): with tm.assert_produces_warning(UserWarning): axes = df.plot(subplots=True, ax=axes, sharex=True, sharey=True) for ax in axes: - self.assertEqual(len(ax.lines), 1) + assert len(ax.lines) == 1 for ax in [axes[0], axes[2]]: # left column self._check_visible(ax.get_yticklabels(), visible=True) for ax in [axes[1], axes[3]]: # right column @@ -2636,31 +2685,17 @@ def _get_boxed_grid(): self._check_visible(ax.get_xticklabels(minor=True), visible=True) tm.close() - @slow + @pytest.mark.slow def test_df_grid_settings(self): # Make sure plot defaults to rcParams['axes.grid'] setting, GH 9792 self._check_grid_settings( DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4]}), - plotting._dataframe_kinds, kws={'x': 'a', 'y': 'b'}) - - def test_option_mpl_style(self): - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - set_option('display.mpl_style', 'default') - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - set_option('display.mpl_style', None) - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - set_option('display.mpl_style', False) - - with tm.assertRaises(ValueError): - set_option('display.mpl_style', 'default2') + plotting._core._dataframe_kinds, kws={'x': 'a', 'y': 'b'}) def test_invalid_colormap(self): df = DataFrame(randn(3, 2), columns=['A', 'B']) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.plot(colormap='invalid_colormap') def test_plain_axes(self): @@ -2697,8 +2732,7 @@ def test_passed_bar_colors(self): color_tuples = [(0.9, 0, 0, 1), (0, 0.9, 0, 1), (0, 0, 0.9, 1)] colormap = mpl.colors.ListedColormap(color_tuples) barplot = pd.DataFrame([[1, 2, 3]]).plot(kind="bar", cmap=colormap) - self.assertEqual(color_tuples, [c.get_facecolor() - for c in barplot.patches]) + assert color_tuples == [c.get_facecolor() for c in barplot.patches] def test_rcParams_bar_colors(self): import matplotlib as mpl @@ -2710,8 +2744,24 @@ def test_rcParams_bar_colors(self): except (AttributeError, KeyError): # mpl 1.4 with mpl.rc_context(rc={'axes.color_cycle': color_tuples}): barplot = pd.DataFrame([[1, 2, 3]]).plot(kind="bar") - self.assertEqual(color_tuples, [c.get_facecolor() - for c in barplot.patches]) + assert color_tuples == [c.get_facecolor() for c in barplot.patches] + + @pytest.mark.parametrize('method', ['line', 'barh', 'bar']) + def test_secondary_axis_font_size(self, method): + # GH: 12565 + df = (pd.DataFrame(np.random.randn(15, 2), + columns=list('AB')) + .assign(C=lambda df: df.B.cumsum()) + .assign(D=lambda df: df.C * 1.1)) + + fontsize = 20 + sy = ['C', 'D'] + + kwargs = dict(secondary_y=sy, fontsize=fontsize, + mark_right=True) + ax = getattr(df.plot, method)(**kwargs) + self._check_ticks_props(axes=ax.right_ax, + ylabelsize=fontsize) def _generate_4_axes_via_gridspec(): @@ -2726,8 +2776,3 @@ def _generate_4_axes_via_gridspec(): ax_lr = plt.subplot(gs[1, 1]) return gs, [ax_tl, ax_ll, ax_tr, ax_lr] - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/plotting/test_groupby.py b/pandas/tests/plotting/test_groupby.py index 3c682fbfbb89e..de48b58133e9a 100644 --- a/pandas/tests/plotting/test_groupby.py +++ b/pandas/tests/plotting/test_groupby.py @@ -1,6 +1,7 @@ # coding: utf-8 -import nose +""" Test cases for GroupBy.plot """ + from pandas import 
Series, DataFrame import pandas.util.testing as tm @@ -9,11 +10,9 @@ from pandas.tests.plotting.common import TestPlotBase - -""" Test cases for GroupBy.plot """ +tm._skip_if_no_mpl() -@tm.mplskip class TestDataFrameGroupByPlots(TestPlotBase): def test_series_groupby_plotting_nominally_works(self): @@ -70,12 +69,7 @@ def test_plot_kwargs(self): res = df.groupby('z').plot(kind='scatter', x='x', y='y') # check that a scatter plot is effectively plotted: the axes should # contain a PathCollection from the scatter plot (GH11805) - self.assertEqual(len(res['a'].collections), 1) + assert len(res['a'].collections) == 1 res = df.groupby('z').plot.scatter(x='x', y='y') - self.assertEqual(len(res['a'].collections), 1) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + assert len(res['a'].collections) == 1 diff --git a/pandas/tests/plotting/test_hist_method.py b/pandas/tests/plotting/test_hist_method.py index bde5544390b85..5f7b2dd2d6ca9 100644 --- a/pandas/tests/plotting/test_hist_method.py +++ b/pandas/tests/plotting/test_hist_method.py @@ -1,33 +1,33 @@ # coding: utf-8 -import nose +""" Test cases for .hist method """ + +import pytest from pandas import Series, DataFrame import pandas.util.testing as tm -from pandas.util.testing import slow import numpy as np from numpy.random import randn -import pandas.tools.plotting as plotting +from pandas.plotting._core import grouped_hist from pandas.tests.plotting.common import (TestPlotBase, _check_plot_works) -""" Test cases for .hist method """ +tm._skip_if_no_mpl() -@tm.mplskip class TestSeriesPlots(TestPlotBase): - def setUp(self): - TestPlotBase.setUp(self) + def setup_method(self, method): + TestPlotBase.setup_method(self, method) import matplotlib as mpl mpl.rcdefaults() self.ts = tm.makeTimeSeries() self.ts.name = 'ts' - @slow + @pytest.mark.slow def test_hist_legacy(self): _check_plot_works(self.ts.hist) _check_plot_works(self.ts.hist, grid=False) @@ -48,25 +48,25 @@ def test_hist_legacy(self): _check_plot_works(self.ts.hist, figure=fig, ax=ax1) _check_plot_works(self.ts.hist, figure=fig, ax=ax2) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): self.ts.hist(by=self.ts.index, figure=fig) - @slow + @pytest.mark.slow def test_hist_bins_legacy(self): df = DataFrame(np.random.randn(10, 2)) ax = df.hist(bins=2)[0][0] - self.assertEqual(len(ax.patches), 2) + assert len(ax.patches) == 2 - @slow + @pytest.mark.slow def test_hist_layout(self): df = self.hist_df - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.height.hist(layout=(1, 1)) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.height.hist(layout=[1, 1]) - @slow + @pytest.mark.slow def test_hist_layout_with_by(self): df = self.hist_df @@ -112,7 +112,7 @@ def test_hist_layout_with_by(self): self._check_axes_shape( axes, axes_num=4, layout=(4, 2), figsize=(12, 7)) - @slow + @pytest.mark.slow def test_hist_no_overlap(self): from matplotlib.pyplot import subplot, gcf x = Series(randn(2)) @@ -123,28 +123,27 @@ def test_hist_no_overlap(self): y.hist() fig = gcf() axes = fig.axes if self.mpl_ge_1_5_0 else fig.get_axes() - self.assertEqual(len(axes), 2) + assert len(axes) == 2 - @slow + @pytest.mark.slow def test_hist_by_no_extra_plots(self): df = self.hist_df axes = df.height.hist(by=df.gender) # noqa - self.assertEqual(len(self.plt.get_fignums()), 1) + assert len(self.plt.get_fignums()) == 1 - @slow + @pytest.mark.slow def 
test_plot_fails_when_ax_differs_from_figure(self): from pylab import figure fig1 = figure() fig2 = figure() ax1 = fig1.add_subplot(111) - with tm.assertRaises(AssertionError): + with pytest.raises(AssertionError): self.ts.hist(ax=ax1, figure=fig2) -@tm.mplskip class TestDataFramePlots(TestPlotBase): - @slow + @pytest.mark.slow def test_hist_df_legacy(self): from matplotlib.patches import Rectangle with tm.assert_produces_warning(UserWarning): @@ -155,7 +154,7 @@ def test_hist_df_legacy(self): with tm.assert_produces_warning(UserWarning): axes = _check_plot_works(df.hist, grid=False) self._check_axes_shape(axes, axes_num=3, layout=(2, 2)) - self.assertFalse(axes[1, 1].get_visible()) + assert not axes[1, 1].get_visible() df = DataFrame(randn(100, 1)) _check_plot_works(df.hist) @@ -197,7 +196,7 @@ def test_hist_df_legacy(self): ax = ser.hist(normed=True, cumulative=True, bins=4) # height of last bin (index 5) must be 1.0 rects = [x for x in ax.get_children() if isinstance(x, Rectangle)] - self.assertAlmostEqual(rects[-1].get_height(), 1.0) + tm.assert_almost_equal(rects[-1].get_height(), 1.0) tm.close() ax = ser.hist(log=True) @@ -207,10 +206,10 @@ def test_hist_df_legacy(self): tm.close() # propagate attr exception from matplotlib.Axes.hist - with tm.assertRaises(AttributeError): + with pytest.raises(AttributeError): ser.hist(foo='bar') - @slow + @pytest.mark.slow def test_hist_layout(self): df = DataFrame(randn(100, 3)) @@ -232,20 +231,29 @@ def test_hist_layout(self): self._check_axes_shape(axes, axes_num=3, layout=expected) # layout too small for all 4 plots - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.hist(layout=(1, 1)) # invalid format for layout - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.hist(layout=(1,)) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.hist(layout=(-1, -1)) + @pytest.mark.slow + def test_tight_layout(self): + # GH 9351 + if self.mpl_ge_2_0_1: + df = DataFrame(randn(100, 3)) + _check_plot_works(df.hist) + self.plt.tight_layout() + + tm.close() + -@tm.mplskip class TestDataFrameGroupByPlots(TestPlotBase): - @slow + @pytest.mark.slow def test_grouped_hist_legacy(self): from matplotlib.patches import Rectangle @@ -253,7 +261,7 @@ def test_grouped_hist_legacy(self): df['C'] = np.random.randint(0, 4, 500) df['D'] = ['X'] * 500 - axes = plotting.grouped_hist(df.A, by=df.C) + axes = grouped_hist(df.A, by=df.C) self._check_axes_shape(axes, axes_num=4, layout=(2, 2)) tm.close() @@ -270,32 +278,31 @@ def test_grouped_hist_legacy(self): # make sure kwargs to hist are handled xf, yf = 20, 18 xrot, yrot = 30, 40 - axes = plotting.grouped_hist(df.A, by=df.C, normed=True, - cumulative=True, bins=4, - xlabelsize=xf, xrot=xrot, - ylabelsize=yf, yrot=yrot) + axes = grouped_hist(df.A, by=df.C, normed=True, cumulative=True, + bins=4, xlabelsize=xf, xrot=xrot, + ylabelsize=yf, yrot=yrot) # height of last bin (index 5) must be 1.0 for ax in axes.ravel(): rects = [x for x in ax.get_children() if isinstance(x, Rectangle)] height = rects[-1].get_height() - self.assertAlmostEqual(height, 1.0) + tm.assert_almost_equal(height, 1.0) self._check_ticks_props(axes, xlabelsize=xf, xrot=xrot, ylabelsize=yf, yrot=yrot) tm.close() - axes = plotting.grouped_hist(df.A, by=df.C, log=True) + axes = grouped_hist(df.A, by=df.C, log=True) # scale of y must be 'log' self._check_ax_scales(axes, yaxis='log') tm.close() # propagate attr exception from matplotlib.Axes.hist - with tm.assertRaises(AttributeError): -
plotting.grouped_hist(df.A, by=df.C, foo='bar') + with pytest.raises(AttributeError): + grouped_hist(df.A, by=df.C, foo='bar') with tm.assert_produces_warning(FutureWarning): df.hist(by='C', figsize='default') - @slow + @pytest.mark.slow def test_grouped_hist_legacy2(self): n = 10 weight = Series(np.random.normal(166, 20, size=n)) @@ -306,19 +313,19 @@ def test_grouped_hist_legacy2(self): 'gender': gender_int}) gb = df_int.groupby('gender') axes = gb.hist() - self.assertEqual(len(axes), 2) - self.assertEqual(len(self.plt.get_fignums()), 2) + assert len(axes) == 2 + assert len(self.plt.get_fignums()) == 2 tm.close() - @slow + @pytest.mark.slow def test_grouped_hist_layout(self): df = self.hist_df - self.assertRaises(ValueError, df.hist, column='weight', by=df.gender, - layout=(1, 1)) - self.assertRaises(ValueError, df.hist, column='height', by=df.category, - layout=(1, 3)) - self.assertRaises(ValueError, df.hist, column='height', by=df.category, - layout=(-1, -1)) + pytest.raises(ValueError, df.hist, column='weight', by=df.gender, + layout=(1, 1)) + pytest.raises(ValueError, df.hist, column='height', by=df.category, + layout=(1, 3)) + pytest.raises(ValueError, df.hist, column='height', by=df.category, + layout=(-1, -1)) with tm.assert_produces_warning(UserWarning): axes = _check_plot_works(df.hist, column='height', by=df.gender, @@ -359,7 +366,7 @@ def test_grouped_hist_layout(self): axes = df.hist(column=['height', 'weight', 'category']) self._check_axes_shape(axes, axes_num=3, layout=(2, 2)) - @slow + @pytest.mark.slow def test_grouped_hist_multiple_axes(self): # GH 6970, GH 7069 df = self.hist_df @@ -367,59 +374,54 @@ def test_grouped_hist_multiple_axes(self): fig, axes = self.plt.subplots(2, 3) returned = df.hist(column=['height', 'weight', 'category'], ax=axes[0]) self._check_axes_shape(returned, axes_num=3, layout=(1, 3)) - self.assert_numpy_array_equal(returned, axes[0]) - self.assertIs(returned[0].figure, fig) + tm.assert_numpy_array_equal(returned, axes[0]) + assert returned[0].figure is fig returned = df.hist(by='classroom', ax=axes[1]) self._check_axes_shape(returned, axes_num=3, layout=(1, 3)) - self.assert_numpy_array_equal(returned, axes[1]) - self.assertIs(returned[0].figure, fig) + tm.assert_numpy_array_equal(returned, axes[1]) + assert returned[0].figure is fig - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): fig, axes = self.plt.subplots(2, 3) # pass different number of axes from required axes = df.hist(column='height', ax=axes) - @slow + @pytest.mark.slow def test_axis_share_x(self): df = self.hist_df # GH4089 ax1, ax2 = df.hist(column='height', by=df.gender, sharex=True) # share x - self.assertTrue(ax1._shared_x_axes.joined(ax1, ax2)) - self.assertTrue(ax2._shared_x_axes.joined(ax1, ax2)) + assert ax1._shared_x_axes.joined(ax1, ax2) + assert ax2._shared_x_axes.joined(ax1, ax2) # don't share y - self.assertFalse(ax1._shared_y_axes.joined(ax1, ax2)) - self.assertFalse(ax2._shared_y_axes.joined(ax1, ax2)) + assert not ax1._shared_y_axes.joined(ax1, ax2) + assert not ax2._shared_y_axes.joined(ax1, ax2) - @slow + @pytest.mark.slow def test_axis_share_y(self): df = self.hist_df ax1, ax2 = df.hist(column='height', by=df.gender, sharey=True) # share y - self.assertTrue(ax1._shared_y_axes.joined(ax1, ax2)) - self.assertTrue(ax2._shared_y_axes.joined(ax1, ax2)) + assert ax1._shared_y_axes.joined(ax1, ax2) + assert ax2._shared_y_axes.joined(ax1, ax2) # don't share x - self.assertFalse(ax1._shared_x_axes.joined(ax1, ax2)) - 
self.assertFalse(ax2._shared_x_axes.joined(ax1, ax2)) + assert not ax1._shared_x_axes.joined(ax1, ax2) + assert not ax2._shared_x_axes.joined(ax1, ax2) - @slow + @pytest.mark.slow def test_axis_share_xy(self): df = self.hist_df ax1, ax2 = df.hist(column='height', by=df.gender, sharex=True, sharey=True) # share both x and y - self.assertTrue(ax1._shared_x_axes.joined(ax1, ax2)) - self.assertTrue(ax2._shared_x_axes.joined(ax1, ax2)) - - self.assertTrue(ax1._shared_y_axes.joined(ax1, ax2)) - self.assertTrue(ax2._shared_y_axes.joined(ax1, ax2)) - + assert ax1._shared_x_axes.joined(ax1, ax2) + assert ax2._shared_x_axes.joined(ax1, ax2) -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + assert ax1._shared_y_axes.joined(ax1, ax2) + assert ax2._shared_y_axes.joined(ax1, ax2) diff --git a/pandas/tests/plotting/test_misc.py b/pandas/tests/plotting/test_misc.py index 2650ce2879db7..c4795ea1e1eca 100644 --- a/pandas/tests/plotting/test_misc.py +++ b/pandas/tests/plotting/test_misc.py @@ -1,89 +1,55 @@ # coding: utf-8 -import nose +""" Test cases for misc plot functions """ + +import pytest -from pandas import Series, DataFrame +from pandas import DataFrame from pandas.compat import lmap import pandas.util.testing as tm -from pandas.util.testing import slow import numpy as np from numpy import random from numpy.random import randn -import pandas.tools.plotting as plotting -from pandas.tests.plotting.common import (TestPlotBase, _check_plot_works, - _ok_for_gaussian_kde) +import pandas.plotting as plotting +from pandas.tests.plotting.common import TestPlotBase, _check_plot_works -""" Test cases for misc plot functions """ +tm._skip_if_no_mpl() -@tm.mplskip class TestSeriesPlots(TestPlotBase): - def setUp(self): - TestPlotBase.setUp(self) + + def setup_method(self, method): + TestPlotBase.setup_method(self, method) import matplotlib as mpl mpl.rcdefaults() self.ts = tm.makeTimeSeries() self.ts.name = 'ts' - @slow + @pytest.mark.slow def test_autocorrelation_plot(self): - from pandas.tools.plotting import autocorrelation_plot + from pandas.plotting import autocorrelation_plot _check_plot_works(autocorrelation_plot, series=self.ts) _check_plot_works(autocorrelation_plot, series=self.ts.values) ax = autocorrelation_plot(self.ts, label='Test') self._check_legend_labels(ax, labels=['Test']) - @slow + @pytest.mark.slow def test_lag_plot(self): - from pandas.tools.plotting import lag_plot + from pandas.plotting import lag_plot _check_plot_works(lag_plot, series=self.ts) _check_plot_works(lag_plot, series=self.ts, lag=5) - @slow + @pytest.mark.slow def test_bootstrap_plot(self): - from pandas.tools.plotting import bootstrap_plot + from pandas.plotting import bootstrap_plot _check_plot_works(bootstrap_plot, series=self.ts, size=10) -@tm.mplskip class TestDataFramePlots(TestPlotBase): - @slow - def test_scatter_plot_legacy(self): - tm._skip_if_no_scipy() - - df = DataFrame(randn(100, 2)) - - def scat(**kwds): - return plotting.scatter_matrix(df, **kwds) - - with tm.assert_produces_warning(UserWarning): - _check_plot_works(scat) - with tm.assert_produces_warning(UserWarning): - _check_plot_works(scat, marker='+') - with tm.assert_produces_warning(UserWarning): - _check_plot_works(scat, vmin=0) - if _ok_for_gaussian_kde('kde'): - with tm.assert_produces_warning(UserWarning): - _check_plot_works(scat, diagonal='kde') - if _ok_for_gaussian_kde('density'): - with tm.assert_produces_warning(UserWarning): - _check_plot_works(scat, diagonal='density') - 
with tm.assert_produces_warning(UserWarning): - _check_plot_works(scat, diagonal='hist') - with tm.assert_produces_warning(UserWarning): - _check_plot_works(scat, range_padding=.1) - - def scat2(x, y, by=None, ax=None, figsize=None): - return plotting.scatter_plot(df, x, y, by, ax, figsize=None) - - _check_plot_works(scat2, x=0, y=1) - grouper = Series(np.repeat([1, 2, 3, 4, 5], 20), df.index) - with tm.assert_produces_warning(UserWarning): - _check_plot_works(scat2, x=0, y=1, by=grouper) def test_scatter_matrix_axis(self): tm._skip_if_no_scipy() @@ -122,9 +88,9 @@ def test_scatter_matrix_axis(self): self._check_ticks_props( axes, xlabelsize=8, xrot=90, ylabelsize=8, yrot=0) - @slow + @pytest.mark.slow def test_andrews_curves(self): - from pandas.tools.plotting import andrews_curves + from pandas.plotting import andrews_curves from matplotlib import cm df = self.iris @@ -187,9 +153,9 @@ def test_andrews_curves(self): with tm.assert_produces_warning(FutureWarning): andrews_curves(data=df, class_column='Name') - @slow + @pytest.mark.slow def test_parallel_coordinates(self): - from pandas.tools.plotting import parallel_coordinates + from pandas.plotting import parallel_coordinates from matplotlib import cm df = self.iris @@ -235,9 +201,29 @@ def test_parallel_coordinates(self): with tm.assert_produces_warning(FutureWarning): parallel_coordinates(df, 'Name', colors=colors) - @slow + def test_parallel_coordinates_with_sorted_labels(self): + """ For #15908 """ + from pandas.plotting import parallel_coordinates + + df = DataFrame({"feat": [i for i in range(30)], + "class": [2 for _ in range(10)] + + [3 for _ in range(10)] + + [1 for _ in range(10)]}) + ax = parallel_coordinates(df, 'class', sort_labels=True) + polylines, labels = ax.get_legend_handles_labels() + color_label_tuples = \ + zip([polyline.get_color() for polyline in polylines], labels) + ordered_color_label_tuples = sorted(color_label_tuples, + key=lambda x: x[1]) + prev_next_tuples = zip([i for i in ordered_color_label_tuples[0:-1]], + [i for i in ordered_color_label_tuples[1:]]) + for prev, nxt in prev_next_tuples: + # labels and colors are ordered strictly increasing + assert prev[1] < nxt[1] and prev[0] < nxt[0] + + @pytest.mark.slow def test_radviz(self): - from pandas.tools.plotting import radviz + from pandas.plotting import radviz from matplotlib import cm df = self.iris @@ -273,7 +259,7 @@ def test_radviz(self): handles, labels = ax.get_legend_handles_labels() self._check_colors(handles, facecolors=colors) - @slow + @pytest.mark.slow def test_subplot_titles(self): df = self.iris.drop('Name', axis=1).head() # Use the column names as the subplot titles @@ -281,25 +267,20 @@ def test_subplot_titles(self): # Case len(title) == len(df) plot = df.plot(subplots=True, title=title) - self.assertEqual([p.get_title() for p in plot], title) + assert [p.get_title() for p in plot] == title # Case len(title) > len(df) - self.assertRaises(ValueError, df.plot, subplots=True, - title=title + ["kittens > puppies"]) + pytest.raises(ValueError, df.plot, subplots=True, + title=title + ["kittens > puppies"]) # Case len(title) < len(df) - self.assertRaises(ValueError, df.plot, subplots=True, title=title[:2]) + pytest.raises(ValueError, df.plot, subplots=True, title=title[:2]) # Case subplots=False and title is of type list - self.assertRaises(ValueError, df.plot, subplots=False, title=title) + pytest.raises(ValueError, df.plot, subplots=False, title=title) # Case df with 3 numeric columns but layout of (2,2) plot = df.drop('SepalWidth',
axis=1).plot(subplots=True, layout=(2, 2), title=title[:-1]) title_list = [ax.get_title() for sublist in plot for ax in sublist] - self.assertEqual(title_list, title[:3] + ['']) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + assert title_list == title[:3] + [''] diff --git a/pandas/tests/plotting/test_series.py b/pandas/tests/plotting/test_series.py index f668c46a15173..8164ad74a190a 100644 --- a/pandas/tests/plotting/test_series.py +++ b/pandas/tests/plotting/test_series.py @@ -1,7 +1,10 @@ # coding: utf-8 -import nose +""" Test cases for Series.plot """ + + import itertools +import pytest from datetime import datetime @@ -9,25 +12,22 @@ from pandas import Series, DataFrame, date_range from pandas.compat import range, lrange import pandas.util.testing as tm -from pandas.util.testing import slow import numpy as np from numpy.random import randn -import pandas.tools.plotting as plotting +import pandas.plotting as plotting from pandas.tests.plotting.common import (TestPlotBase, _check_plot_works, _skip_if_no_scipy_gaussian_kde, _ok_for_gaussian_kde) +tm._skip_if_no_mpl() -""" Test cases for Series.plot """ - -@tm.mplskip class TestSeriesPlots(TestPlotBase): - def setUp(self): - TestPlotBase.setUp(self) + def setup_method(self, method): + TestPlotBase.setup_method(self, method) import matplotlib as mpl mpl.rcdefaults() @@ -40,7 +40,7 @@ def setUp(self): self.iseries = tm.makePeriodSeries() self.iseries.name = 'iseries' - @slow + @pytest.mark.slow def test_plot(self): _check_plot_works(self.ts.plot, label='foo') _check_plot_works(self.ts.plot, use_index=False) @@ -78,10 +78,11 @@ def test_plot(self): ax = _check_plot_works(self.ts.plot, subplots=True, layout=(1, -1)) self._check_axes_shape(ax, axes_num=1, layout=(1, 1)) - @slow + @pytest.mark.slow def test_plot_figsize_and_title(self): # figsize and title - ax = self.series.plot(title='Test', figsize=(16, 8)) + _, ax = self.plt.subplots() + ax = self.series.plot(title='Test', figsize=(16, 8), ax=ax) self._check_text_labels(ax.title, 'Test') self._check_axes_shape(ax, axes_num=1, layout=(1, 1), figsize=(16, 8)) @@ -92,74 +93,85 @@ def test_dont_modify_rcParams(self): else: key = 'axes.color_cycle' colors = self.plt.rcParams[key] - Series([1, 2, 3]).plot() - self.assertEqual(colors, self.plt.rcParams[key]) + _, ax = self.plt.subplots() + Series([1, 2, 3]).plot(ax=ax) + assert colors == self.plt.rcParams[key] def test_ts_line_lim(self): - ax = self.ts.plot() + fig, ax = self.plt.subplots() + ax = self.ts.plot(ax=ax) xmin, xmax = ax.get_xlim() lines = ax.get_lines() - self.assertEqual(xmin, lines[0].get_data(orig=False)[0][0]) - self.assertEqual(xmax, lines[0].get_data(orig=False)[0][-1]) + assert xmin == lines[0].get_data(orig=False)[0][0] + assert xmax == lines[0].get_data(orig=False)[0][-1] tm.close() - ax = self.ts.plot(secondary_y=True) + ax = self.ts.plot(secondary_y=True, ax=ax) xmin, xmax = ax.get_xlim() lines = ax.get_lines() - self.assertEqual(xmin, lines[0].get_data(orig=False)[0][0]) - self.assertEqual(xmax, lines[0].get_data(orig=False)[0][-1]) + assert xmin == lines[0].get_data(orig=False)[0][0] + assert xmax == lines[0].get_data(orig=False)[0][-1] def test_ts_area_lim(self): - ax = self.ts.plot.area(stacked=False) + _, ax = self.plt.subplots() + ax = self.ts.plot.area(stacked=False, ax=ax) xmin, xmax = ax.get_xlim() line = ax.get_lines()[0].get_data(orig=False)[0] - self.assertEqual(xmin, line[0]) - self.assertEqual(xmax, line[-1]) + assert xmin == line[0] + 
assert xmax == line[-1] tm.close() # GH 7471 - ax = self.ts.plot.area(stacked=False, x_compat=True) + _, ax = self.plt.subplots() + ax = self.ts.plot.area(stacked=False, x_compat=True, ax=ax) xmin, xmax = ax.get_xlim() line = ax.get_lines()[0].get_data(orig=False)[0] - self.assertEqual(xmin, line[0]) - self.assertEqual(xmax, line[-1]) + assert xmin == line[0] + assert xmax == line[-1] tm.close() tz_ts = self.ts.copy() tz_ts.index = tz_ts.tz_localize('GMT').tz_convert('CET') - ax = tz_ts.plot.area(stacked=False, x_compat=True) + _, ax = self.plt.subplots() + ax = tz_ts.plot.area(stacked=False, x_compat=True, ax=ax) xmin, xmax = ax.get_xlim() line = ax.get_lines()[0].get_data(orig=False)[0] - self.assertEqual(xmin, line[0]) - self.assertEqual(xmax, line[-1]) + assert xmin == line[0] + assert xmax == line[-1] tm.close() - ax = tz_ts.plot.area(stacked=False, secondary_y=True) + _, ax = self.plt.subplots() + ax = tz_ts.plot.area(stacked=False, secondary_y=True, ax=ax) xmin, xmax = ax.get_xlim() line = ax.get_lines()[0].get_data(orig=False)[0] - self.assertEqual(xmin, line[0]) - self.assertEqual(xmax, line[-1]) + assert xmin == line[0] + assert xmax == line[-1] def test_label(self): s = Series([1, 2]) - ax = s.plot(label='LABEL', legend=True) + _, ax = self.plt.subplots() + ax = s.plot(label='LABEL', legend=True, ax=ax) self._check_legend_labels(ax, labels=['LABEL']) self.plt.close() - ax = s.plot(legend=True) + _, ax = self.plt.subplots() + ax = s.plot(legend=True, ax=ax) self._check_legend_labels(ax, labels=['None']) self.plt.close() # get name from index s.name = 'NAME' - ax = s.plot(legend=True) + _, ax = self.plt.subplots() + ax = s.plot(legend=True, ax=ax) self._check_legend_labels(ax, labels=['NAME']) self.plt.close() # override the default - ax = s.plot(legend=True, label='LABEL') + _, ax = self.plt.subplots() + ax = s.plot(legend=True, label='LABEL', ax=ax) self._check_legend_labels(ax, labels=['LABEL']) self.plt.close() # Add label info, but don't draw - ax = s.plot(legend=False, label='LABEL') - self.assertEqual(ax.get_legend(), None) # Hasn't been drawn + _, ax = self.plt.subplots() + ax = s.plot(legend=False, label='LABEL', ax=ax) + assert ax.get_legend() is None # Hasn't been drawn ax.legend() # draw it self._check_legend_labels(ax, labels=['LABEL']) @@ -173,40 +185,44 @@ def test_line_area_nan_series(self): masked = ax.lines[0].get_ydata() # remove nan for comparison purpose exp = np.array([1, 2, 3], dtype=np.float64) - self.assert_numpy_array_equal(np.delete(masked.data, 2), exp) - self.assert_numpy_array_equal( + tm.assert_numpy_array_equal(np.delete(masked.data, 2), exp) + tm.assert_numpy_array_equal( masked.mask, np.array([False, False, True, False])) expected = np.array([1, 2, 0, 3], dtype=np.float64) ax = _check_plot_works(d.plot, stacked=True) - self.assert_numpy_array_equal(ax.lines[0].get_ydata(), expected) + tm.assert_numpy_array_equal(ax.lines[0].get_ydata(), expected) ax = _check_plot_works(d.plot.area) - self.assert_numpy_array_equal(ax.lines[0].get_ydata(), expected) + tm.assert_numpy_array_equal(ax.lines[0].get_ydata(), expected) ax = _check_plot_works(d.plot.area, stacked=False) - self.assert_numpy_array_equal(ax.lines[0].get_ydata(), expected) + tm.assert_numpy_array_equal(ax.lines[0].get_ydata(), expected) def test_line_use_index_false(self): s = Series([1, 2, 3], index=['a', 'b', 'c']) s.index.name = 'The Index' - ax = s.plot(use_index=False) + _, ax = self.plt.subplots() + ax = s.plot(use_index=False, ax=ax) label = ax.get_xlabel() - self.assertEqual(label, '')
- ax2 = s.plot.bar(use_index=False) + assert label == '' + _, ax = self.plt.subplots() + ax2 = s.plot.bar(use_index=False, ax=ax) label2 = ax2.get_xlabel() - self.assertEqual(label2, '') + assert label2 == '' - @slow + @pytest.mark.slow def test_bar_log(self): expected = np.array([1., 10., 100., 1000.]) if not self.mpl_le_1_2_1: expected = np.hstack((.1, expected, 1e4)) - ax = Series([200, 500]).plot.bar(log=True) + _, ax = self.plt.subplots() + ax = Series([200, 500]).plot.bar(log=True, ax=ax) tm.assert_numpy_array_equal(ax.yaxis.get_ticklocs(), expected) tm.close() - ax = Series([200, 500]).plot.barh(log=True) + _, ax = self.plt.subplots() + ax = Series([200, 500]).plot.barh(log=True, ax=ax) tm.assert_numpy_array_equal(ax.xaxis.get_ticklocs(), expected) tm.close() @@ -218,46 +234,52 @@ def test_bar_log(self): if self.mpl_ge_2_0_0: expected = np.hstack((1.0e-05, expected)) - ax = Series([0.1, 0.01, 0.001]).plot(log=True, kind='bar') + _, ax = self.plt.subplots() + ax = Series([0.1, 0.01, 0.001]).plot(log=True, kind='bar', ax=ax) ymin = 0.0007943282347242822 if self.mpl_ge_2_0_0 else 0.001 ymax = 0.12589254117941673 if self.mpl_ge_2_0_0 else .10000000000000001 res = ax.get_ylim() - self.assertAlmostEqual(res[0], ymin) - self.assertAlmostEqual(res[1], ymax) + tm.assert_almost_equal(res[0], ymin) + tm.assert_almost_equal(res[1], ymax) tm.assert_numpy_array_equal(ax.yaxis.get_ticklocs(), expected) tm.close() - ax = Series([0.1, 0.01, 0.001]).plot(log=True, kind='barh') + _, ax = self.plt.subplots() + ax = Series([0.1, 0.01, 0.001]).plot(log=True, kind='barh', ax=ax) res = ax.get_xlim() - self.assertAlmostEqual(res[0], ymin) - self.assertAlmostEqual(res[1], ymax) + tm.assert_almost_equal(res[0], ymin) + tm.assert_almost_equal(res[1], ymax) tm.assert_numpy_array_equal(ax.xaxis.get_ticklocs(), expected) - @slow + @pytest.mark.slow def test_bar_ignore_index(self): df = Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']) - ax = df.plot.bar(use_index=False) + _, ax = self.plt.subplots() + ax = df.plot.bar(use_index=False, ax=ax) self._check_text_labels(ax.get_xticklabels(), ['0', '1', '2', '3']) def test_rotation(self): df = DataFrame(randn(5, 5)) # Default rot 0 - axes = df.plot() + _, ax = self.plt.subplots() + axes = df.plot(ax=ax) self._check_ticks_props(axes, xrot=0) - axes = df.plot(rot=30) + _, ax = self.plt.subplots() + axes = df.plot(rot=30, ax=ax) self._check_ticks_props(axes, xrot=30) def test_irregular_datetime(self): rng = date_range('1/1/2000', '3/1/2000') rng = rng[[0, 1, 2, 3, 5, 9, 10, 11, 12]] ser = Series(randn(len(rng)), rng) - ax = ser.plot() + _, ax = self.plt.subplots() + ax = ser.plot(ax=ax) xp = datetime(1999, 1, 1).toordinal() ax.set_xlim('1/1/1999', '1/1/2001') - self.assertEqual(xp, ax.get_xlim()[0]) + assert xp == ax.get_xlim()[0] - @slow + @pytest.mark.slow def test_pie_series(self): # if sum of values is less than 1.0, pie handle them as rate and draw # semicircle. 
@@ -265,7 +287,7 @@ def test_pie_series(self): index=['a', 'b', 'c', 'd', 'e'], name='YLABEL') ax = _check_plot_works(series.plot.pie) self._check_text_labels(ax.texts, series.index) - self.assertEqual(ax.get_ylabel(), 'YLABEL') + assert ax.get_ylabel() == 'YLABEL' # without wedge labels ax = _check_plot_works(series.plot.pie, labels=None) @@ -295,10 +317,10 @@ def test_pie_series(self): expected_texts = list(next(it) for it in itertools.cycle(iters)) self._check_text_labels(ax.texts, expected_texts) for t in ax.texts: - self.assertEqual(t.get_fontsize(), 7) + assert t.get_fontsize() == 7 # includes negative value - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): series = Series([1, 2, 0, 4, -1], index=['a', 'b', 'c', 'd', 'e']) series.plot.pie() @@ -310,31 +332,35 @@ def test_pie_series(self): def test_pie_nan(self): s = Series([1, np.nan, 1, 1]) - ax = s.plot.pie(legend=True) + _, ax = self.plt.subplots() + ax = s.plot.pie(legend=True, ax=ax) expected = ['0', '', '2', '3'] result = [x.get_text() for x in ax.texts] - self.assertEqual(result, expected) + assert result == expected - @slow + @pytest.mark.slow def test_hist_df_kwargs(self): df = DataFrame(np.random.randn(10, 2)) - ax = df.plot.hist(bins=5) - self.assertEqual(len(ax.patches), 10) + _, ax = self.plt.subplots() + ax = df.plot.hist(bins=5, ax=ax) + assert len(ax.patches) == 10 - @slow + @pytest.mark.slow def test_hist_df_with_nonnumerics(self): # GH 9853 with tm.RNGContext(1): df = DataFrame( np.random.randn(10, 4), columns=['A', 'B', 'C', 'D']) df['E'] = ['x', 'y'] * 5 - ax = df.plot.hist(bins=5) - self.assertEqual(len(ax.patches), 20) + _, ax = self.plt.subplots() + ax = df.plot.hist(bins=5, ax=ax) + assert len(ax.patches) == 20 - ax = df.plot.hist() # bins=10 - self.assertEqual(len(ax.patches), 40) + _, ax = self.plt.subplots() + ax = df.plot.hist(ax=ax) # bins=10 + assert len(ax.patches) == 40 - @slow + @pytest.mark.slow def test_hist_legacy(self): _check_plot_works(self.ts.hist) _check_plot_works(self.ts.hist, grid=False) @@ -357,25 +383,25 @@ def test_hist_legacy(self): _check_plot_works(self.ts.hist, figure=fig, ax=ax1) _check_plot_works(self.ts.hist, figure=fig, ax=ax2) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): self.ts.hist(by=self.ts.index, figure=fig) - @slow + @pytest.mark.slow def test_hist_bins_legacy(self): df = DataFrame(np.random.randn(10, 2)) ax = df.hist(bins=2)[0][0] - self.assertEqual(len(ax.patches), 2) + assert len(ax.patches) == 2 - @slow + @pytest.mark.slow def test_hist_layout(self): df = self.hist_df - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.height.hist(layout=(1, 1)) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.height.hist(layout=[1, 1]) - @slow + @pytest.mark.slow def test_hist_layout_with_by(self): df = self.hist_df @@ -419,7 +445,7 @@ def test_hist_layout_with_by(self): self._check_axes_shape(axes, axes_num=4, layout=(4, 2), figsize=(12, 7)) - @slow + @pytest.mark.slow def test_hist_no_overlap(self): from matplotlib.pyplot import subplot, gcf x = Series(randn(2)) @@ -430,113 +456,126 @@ def test_hist_no_overlap(self): y.hist() fig = gcf() axes = fig.axes if self.mpl_ge_1_5_0 else fig.get_axes() - self.assertEqual(len(axes), 2) + assert len(axes) == 2 - @slow + @pytest.mark.slow def test_hist_secondary_legend(self): # GH 9610 df = DataFrame(np.random.randn(30, 4), columns=list('abcd')) # primary -> secondary - ax = df['a'].plot.hist(legend=True) + _, ax = self.plt.subplots() + ax = 
df['a'].plot.hist(legend=True, ax=ax) df['b'].plot.hist(ax=ax, legend=True, secondary_y=True) # both legends are drawn on left ax # left and right axis must be visible self._check_legend_labels(ax, labels=['a', 'b (right)']) - self.assertTrue(ax.get_yaxis().get_visible()) - self.assertTrue(ax.right_ax.get_yaxis().get_visible()) + assert ax.get_yaxis().get_visible() + assert ax.right_ax.get_yaxis().get_visible() tm.close() # secondary -> secondary - ax = df['a'].plot.hist(legend=True, secondary_y=True) + _, ax = self.plt.subplots() + ax = df['a'].plot.hist(legend=True, secondary_y=True, ax=ax) df['b'].plot.hist(ax=ax, legend=True, secondary_y=True) # both legends are drawn on left ax # left axis must be invisible, right axis must be visible self._check_legend_labels(ax.left_ax, labels=['a (right)', 'b (right)']) - self.assertFalse(ax.left_ax.get_yaxis().get_visible()) - self.assertTrue(ax.get_yaxis().get_visible()) + assert not ax.left_ax.get_yaxis().get_visible() + assert ax.get_yaxis().get_visible() tm.close() # secondary -> primary - ax = df['a'].plot.hist(legend=True, secondary_y=True) + _, ax = self.plt.subplots() + ax = df['a'].plot.hist(legend=True, secondary_y=True, ax=ax) # right axes is returned df['b'].plot.hist(ax=ax, legend=True) # both legends are drawn on left ax # left and right axis must be visible self._check_legend_labels(ax.left_ax, labels=['a (right)', 'b']) - self.assertTrue(ax.left_ax.get_yaxis().get_visible()) - self.assertTrue(ax.get_yaxis().get_visible()) + assert ax.left_ax.get_yaxis().get_visible() + assert ax.get_yaxis().get_visible() tm.close() - @slow + @pytest.mark.slow def test_df_series_secondary_legend(self): # GH 9779 df = DataFrame(np.random.randn(30, 3), columns=list('abc')) s = Series(np.random.randn(30), name='x') # primary -> secondary (without passing ax) - ax = df.plot() - s.plot(legend=True, secondary_y=True) + _, ax = self.plt.subplots() + ax = df.plot(ax=ax) + s.plot(legend=True, secondary_y=True, ax=ax) # both legends are drawn on left ax # left and right axis must be visible self._check_legend_labels(ax, labels=['a', 'b', 'c', 'x (right)']) - self.assertTrue(ax.get_yaxis().get_visible()) - self.assertTrue(ax.right_ax.get_yaxis().get_visible()) + assert ax.get_yaxis().get_visible() + assert ax.right_ax.get_yaxis().get_visible() tm.close() # primary -> secondary (with passing ax) - ax = df.plot() + _, ax = self.plt.subplots() + ax = df.plot(ax=ax) s.plot(ax=ax, legend=True, secondary_y=True) # both legends are drawn on left ax # left and right axis must be visible self._check_legend_labels(ax, labels=['a', 'b', 'c', 'x (right)']) - self.assertTrue(ax.get_yaxis().get_visible()) - self.assertTrue(ax.right_ax.get_yaxis().get_visible()) + assert ax.get_yaxis().get_visible() + assert ax.right_ax.get_yaxis().get_visible() tm.close() # secondary -> secondary (without passing ax) - ax = df.plot(secondary_y=True) - s.plot(legend=True, secondary_y=True) + _, ax = self.plt.subplots() + ax = df.plot(secondary_y=True, ax=ax) + s.plot(legend=True, secondary_y=True, ax=ax) # both legends are drawn on left ax # left axis must be invisible and right axis must be visible expected = ['a (right)', 'b (right)', 'c (right)', 'x (right)'] self._check_legend_labels(ax.left_ax, labels=expected) - self.assertFalse(ax.left_ax.get_yaxis().get_visible()) - self.assertTrue(ax.get_yaxis().get_visible()) + assert not ax.left_ax.get_yaxis().get_visible() + assert ax.get_yaxis().get_visible() tm.close() # secondary -> secondary (with passing ax) - ax = df.plot(secondary_y=True) + _,
ax = self.plt.subplots() + ax = df.plot(secondary_y=True, ax=ax) s.plot(ax=ax, legend=True, secondary_y=True) # both legends are drawn on left ax # left axis must be invisible and right axis must be visible expected = ['a (right)', 'b (right)', 'c (right)', 'x (right)'] self._check_legend_labels(ax.left_ax, expected) - self.assertFalse(ax.left_ax.get_yaxis().get_visible()) - self.assertTrue(ax.get_yaxis().get_visible()) + assert not ax.left_ax.get_yaxis().get_visible() + assert ax.get_yaxis().get_visible() tm.close() # secondary -> secondary (with passing ax) - ax = df.plot(secondary_y=True, mark_right=False) + _, ax = self.plt.subplots() + ax = df.plot(secondary_y=True, mark_right=False, ax=ax) s.plot(ax=ax, legend=True, secondary_y=True) # both legends are drawn on left ax # left axis must be invisible and right axis must be visible expected = ['a', 'b', 'c', 'x (right)'] self._check_legend_labels(ax.left_ax, expected) - self.assertFalse(ax.left_ax.get_yaxis().get_visible()) - self.assertTrue(ax.get_yaxis().get_visible()) + assert not ax.left_ax.get_yaxis().get_visible() + assert ax.get_yaxis().get_visible() tm.close() - @slow + @pytest.mark.slow def test_plot_fails_with_dupe_color_and_style(self): x = Series(randn(2)) - with tm.assertRaises(ValueError): - x.plot(style='k--', color='k') + with pytest.raises(ValueError): + _, ax = self.plt.subplots() + x.plot(style='k--', color='k', ax=ax) - @slow + @pytest.mark.slow def test_hist_kde(self): - ax = self.ts.plot.hist(logy=True) + if not self.mpl_ge_1_5_0: + pytest.skip("mpl is not supported") + + _, ax = self.plt.subplots() + ax = self.ts.plot.hist(logy=True, ax=ax) self._check_ax_scales(ax, yaxis='log') xlabels = ax.get_xticklabels() # ticks are values, thus ticklabels are blank @@ -548,118 +587,139 @@ def test_hist_kde(self): _skip_if_no_scipy_gaussian_kde() _check_plot_works(self.ts.plot.kde) _check_plot_works(self.ts.plot.density) - ax = self.ts.plot.kde(logy=True) + _, ax = self.plt.subplots() + ax = self.ts.plot.kde(logy=True, ax=ax) self._check_ax_scales(ax, yaxis='log') xlabels = ax.get_xticklabels() self._check_text_labels(xlabels, [''] * len(xlabels)) ylabels = ax.get_yticklabels() self._check_text_labels(ylabels, [''] * len(ylabels)) - @slow + @pytest.mark.slow def test_kde_kwargs(self): tm._skip_if_no_scipy() _skip_if_no_scipy_gaussian_kde() + if not self.mpl_ge_1_5_0: + pytest.skip("mpl is not supported") + from numpy import linspace _check_plot_works(self.ts.plot.kde, bw_method=.5, ind=linspace(-100, 100, 20)) _check_plot_works(self.ts.plot.density, bw_method=.5, ind=linspace(-100, 100, 20)) + _, ax = self.plt.subplots() ax = self.ts.plot.kde(logy=True, bw_method=.5, - ind=linspace(-100, 100, 20)) + ind=linspace(-100, 100, 20), ax=ax) self._check_ax_scales(ax, yaxis='log') self._check_text_labels(ax.yaxis.get_label(), 'Density') - @slow + @pytest.mark.slow def test_kde_missing_vals(self): tm._skip_if_no_scipy() _skip_if_no_scipy_gaussian_kde() + if not self.mpl_ge_1_5_0: + pytest.skip("mpl is not supported") + s = Series(np.random.uniform(size=50)) s[0] = np.nan axes = _check_plot_works(s.plot.kde) - # check if the values have any missing values - # GH14821 - self.assertTrue(any(~np.isnan(axes.lines[0].get_xdata())), - msg='Missing Values not dropped') - @slow + # gh-14821: check that the missing values were dropped + assert any(~np.isnan(axes.lines[0].get_xdata())) + + @pytest.mark.slow def test_hist_kwargs(self): - ax = self.ts.plot.hist(bins=5) - self.assertEqual(len(ax.patches), 5) + _, ax = self.plt.subplots() + ax = 
self.ts.plot.hist(bins=5, ax=ax) + assert len(ax.patches) == 5 self._check_text_labels(ax.yaxis.get_label(), 'Frequency') tm.close() if self.mpl_ge_1_3_1: - ax = self.ts.plot.hist(orientation='horizontal') + _, ax = self.plt.subplots() + ax = self.ts.plot.hist(orientation='horizontal', ax=ax) self._check_text_labels(ax.xaxis.get_label(), 'Frequency') tm.close() - ax = self.ts.plot.hist(align='left', stacked=True) + _, ax = self.plt.subplots() + ax = self.ts.plot.hist(align='left', stacked=True, ax=ax) tm.close() - @slow + @pytest.mark.slow def test_hist_kde_color(self): - ax = self.ts.plot.hist(logy=True, bins=10, color='b') + if not self.mpl_ge_1_5_0: + pytest.skip("mpl is not supported") + + _, ax = self.plt.subplots() + ax = self.ts.plot.hist(logy=True, bins=10, color='b', ax=ax) self._check_ax_scales(ax, yaxis='log') - self.assertEqual(len(ax.patches), 10) + assert len(ax.patches) == 10 self._check_colors(ax.patches, facecolors=['b'] * 10) tm._skip_if_no_scipy() _skip_if_no_scipy_gaussian_kde() - ax = self.ts.plot.kde(logy=True, color='r') + _, ax = self.plt.subplots() + ax = self.ts.plot.kde(logy=True, color='r', ax=ax) self._check_ax_scales(ax, yaxis='log') lines = ax.get_lines() - self.assertEqual(len(lines), 1) + assert len(lines) == 1 self._check_colors(lines, ['r']) - @slow + @pytest.mark.slow def test_boxplot_series(self): - ax = self.ts.plot.box(logy=True) + _, ax = self.plt.subplots() + ax = self.ts.plot.box(logy=True, ax=ax) self._check_ax_scales(ax, yaxis='log') xlabels = ax.get_xticklabels() self._check_text_labels(xlabels, [self.ts.name]) ylabels = ax.get_yticklabels() self._check_text_labels(ylabels, [''] * len(ylabels)) - @slow + @pytest.mark.slow def test_kind_both_ways(self): s = Series(range(3)) - for kind in plotting._common_kinds + plotting._series_kinds: + kinds = (plotting._core._common_kinds + + plotting._core._series_kinds) + _, ax = self.plt.subplots() + for kind in kinds: if not _ok_for_gaussian_kde(kind): continue - s.plot(kind=kind) + s.plot(kind=kind, ax=ax) getattr(s.plot, kind)() - @slow + @pytest.mark.slow def test_invalid_plot_data(self): s = Series(list('abcd')) - for kind in plotting._common_kinds: + _, ax = self.plt.subplots() + for kind in plotting._core._common_kinds: if not _ok_for_gaussian_kde(kind): continue - with tm.assertRaises(TypeError): - s.plot(kind=kind) + with pytest.raises(TypeError): + s.plot(kind=kind, ax=ax) - @slow + @pytest.mark.slow def test_valid_object_plot(self): s = Series(lrange(10), dtype=object) - for kind in plotting._common_kinds: + for kind in plotting._core._common_kinds: if not _ok_for_gaussian_kde(kind): continue _check_plot_works(s.plot, kind=kind) def test_partially_invalid_plot_data(self): s = Series(['a', 'b', 1.0, 2]) - for kind in plotting._common_kinds: + _, ax = self.plt.subplots() + for kind in plotting._core._common_kinds: if not _ok_for_gaussian_kde(kind): continue - with tm.assertRaises(TypeError): - s.plot(kind=kind) + with pytest.raises(TypeError): + s.plot(kind=kind, ax=ax) def test_invalid_kind(self): s = Series([1, 2]) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): s.plot(kind='aasdf') - @slow + @pytest.mark.slow def test_dup_datetime_index_plot(self): dr1 = date_range('1/1/2009', periods=4) dr2 = date_range('1/2/2009', periods=4) @@ -668,7 +728,7 @@ def test_dup_datetime_index_plot(self): s = Series(values, index=index) _check_plot_works(s.plot) - @slow + @pytest.mark.slow def test_errorbar_plot(self): s = Series(np.arange(10), name='x') @@ -703,81 +763,86 @@ def 
test_errorbar_plot(self): self._check_has_errorbars(ax, xerr=0, yerr=1) # check incorrect lengths and types - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): s.plot(yerr=np.arange(11)) s_err = ['zzz'] * 10 # in mpl 1.5+ this is a TypeError - with tm.assertRaises((ValueError, TypeError)): + with pytest.raises((ValueError, TypeError)): s.plot(yerr=s_err) def test_table(self): _check_plot_works(self.series.plot, table=True) _check_plot_works(self.series.plot, table=self.series) - @slow + @pytest.mark.slow def test_series_grid_settings(self): # Make sure plot defaults to rcParams['axes.grid'] setting, GH 9792 self._check_grid_settings(Series([1, 2, 3]), - plotting._series_kinds + - plotting._common_kinds) + plotting._core._series_kinds + + plotting._core._common_kinds) - @slow + @pytest.mark.slow def test_standard_colors(self): + from pandas.plotting._style import _get_standard_colors + for c in ['r', 'red', 'green', '#FF0000']: - result = plotting._get_standard_colors(1, color=c) - self.assertEqual(result, [c]) + result = _get_standard_colors(1, color=c) + assert result == [c] - result = plotting._get_standard_colors(1, color=[c]) - self.assertEqual(result, [c]) + result = _get_standard_colors(1, color=[c]) + assert result == [c] - result = plotting._get_standard_colors(3, color=c) - self.assertEqual(result, [c] * 3) + result = _get_standard_colors(3, color=c) + assert result == [c] * 3 - result = plotting._get_standard_colors(3, color=[c]) - self.assertEqual(result, [c] * 3) + result = _get_standard_colors(3, color=[c]) + assert result == [c] * 3 - @slow + @pytest.mark.slow def test_standard_colors_all(self): import matplotlib.colors as colors + from pandas.plotting._style import _get_standard_colors # multiple colors like mediumaquamarine for c in colors.cnames: - result = plotting._get_standard_colors(num_colors=1, color=c) - self.assertEqual(result, [c]) + result = _get_standard_colors(num_colors=1, color=c) + assert result == [c] - result = plotting._get_standard_colors(num_colors=1, color=[c]) - self.assertEqual(result, [c]) + result = _get_standard_colors(num_colors=1, color=[c]) + assert result == [c] - result = plotting._get_standard_colors(num_colors=3, color=c) - self.assertEqual(result, [c] * 3) + result = _get_standard_colors(num_colors=3, color=c) + assert result == [c] * 3 - result = plotting._get_standard_colors(num_colors=3, color=[c]) - self.assertEqual(result, [c] * 3) + result = _get_standard_colors(num_colors=3, color=[c]) + assert result == [c] * 3 # single letter colors like k for c in colors.ColorConverter.colors: - result = plotting._get_standard_colors(num_colors=1, color=c) - self.assertEqual(result, [c]) + result = _get_standard_colors(num_colors=1, color=c) + assert result == [c] - result = plotting._get_standard_colors(num_colors=1, color=[c]) - self.assertEqual(result, [c]) + result = _get_standard_colors(num_colors=1, color=[c]) + assert result == [c] - result = plotting._get_standard_colors(num_colors=3, color=c) - self.assertEqual(result, [c] * 3) + result = _get_standard_colors(num_colors=3, color=c) + assert result == [c] * 3 - result = plotting._get_standard_colors(num_colors=3, color=[c]) - self.assertEqual(result, [c] * 3) + result = _get_standard_colors(num_colors=3, color=[c]) + assert result == [c] * 3 def test_series_plot_color_kwargs(self): # GH1890 - ax = Series(np.arange(12) + 1).plot(color='green') + _, ax = self.plt.subplots() + ax = Series(np.arange(12) + 1).plot(color='green', ax=ax) self._check_colors(ax.get_lines(), 
linecolors=['green']) def test_time_series_plot_color_kwargs(self): # #1890 + _, ax = self.plt.subplots() ax = Series(np.arange(12) + 1, index=date_range( - '1/1/2000', periods=12)).plot(color='green') + '1/1/2000', periods=12)).plot(color='green', ax=ax) self._check_colors(ax.get_lines(), linecolors=['green']) def test_time_series_plot_color_with_empty_kwargs(self): @@ -792,14 +857,16 @@ def test_time_series_plot_color_with_empty_kwargs(self): ncolors = 3 + _, ax = self.plt.subplots() for i in range(ncolors): - ax = s.plot() + ax = s.plot(ax=ax) self._check_colors(ax.get_lines(), linecolors=def_colors[:ncolors]) def test_xticklabels(self): # GH11529 s = Series(np.arange(10), index=['P%02d' % i for i in range(10)]) - ax = s.plot(xticks=[0, 3, 5, 9]) + _, ax = self.plt.subplots() + ax = s.plot(xticks=[0, 3, 5, 9], ax=ax) exp = ['P%02d' % i for i in [0, 3, 5, 9]] self._check_text_labels(ax.get_xticklabels(), exp) @@ -811,8 +878,3 @@ def test_custom_business_day_freq(self): freq=CustomBusinessDay(holidays=['2014-05-26']))) _check_plot_works(s.plot) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/reshape/__init__.py b/pandas/tests/reshape/__init__.py new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/pandas/tools/tests/data/allow_exact_matches.csv b/pandas/tests/reshape/data/allow_exact_matches.csv similarity index 100% rename from pandas/tools/tests/data/allow_exact_matches.csv rename to pandas/tests/reshape/data/allow_exact_matches.csv diff --git a/pandas/tools/tests/data/allow_exact_matches_and_tolerance.csv b/pandas/tests/reshape/data/allow_exact_matches_and_tolerance.csv similarity index 100% rename from pandas/tools/tests/data/allow_exact_matches_and_tolerance.csv rename to pandas/tests/reshape/data/allow_exact_matches_and_tolerance.csv diff --git a/pandas/tools/tests/data/asof.csv b/pandas/tests/reshape/data/asof.csv similarity index 100% rename from pandas/tools/tests/data/asof.csv rename to pandas/tests/reshape/data/asof.csv diff --git a/pandas/tools/tests/data/asof2.csv b/pandas/tests/reshape/data/asof2.csv similarity index 100% rename from pandas/tools/tests/data/asof2.csv rename to pandas/tests/reshape/data/asof2.csv diff --git a/pandas/tools/tests/data/cut_data.csv b/pandas/tests/reshape/data/cut_data.csv similarity index 100% rename from pandas/tools/tests/data/cut_data.csv rename to pandas/tests/reshape/data/cut_data.csv diff --git a/pandas/tools/tests/data/quotes.csv b/pandas/tests/reshape/data/quotes.csv similarity index 100% rename from pandas/tools/tests/data/quotes.csv rename to pandas/tests/reshape/data/quotes.csv diff --git a/pandas/tools/tests/data/quotes2.csv b/pandas/tests/reshape/data/quotes2.csv similarity index 100% rename from pandas/tools/tests/data/quotes2.csv rename to pandas/tests/reshape/data/quotes2.csv diff --git a/pandas/tools/tests/data/tolerance.csv b/pandas/tests/reshape/data/tolerance.csv similarity index 100% rename from pandas/tools/tests/data/tolerance.csv rename to pandas/tests/reshape/data/tolerance.csv diff --git a/pandas/tools/tests/data/trades.csv b/pandas/tests/reshape/data/trades.csv similarity index 100% rename from pandas/tools/tests/data/trades.csv rename to pandas/tests/reshape/data/trades.csv diff --git a/pandas/tools/tests/data/trades2.csv b/pandas/tests/reshape/data/trades2.csv similarity index 100% rename from pandas/tools/tests/data/trades2.csv rename to pandas/tests/reshape/data/trades2.csv diff --git 
a/pandas/tools/tests/test_concat.py b/pandas/tests/reshape/test_concat.py similarity index 77% rename from pandas/tools/tests/test_concat.py rename to pandas/tests/reshape/test_concat.py index 2be7e75573d6e..6e646f9b29442 100644 --- a/pandas/tools/tests/test_concat.py +++ b/pandas/tests/reshape/test_concat.py @@ -1,27 +1,26 @@ -import nose +from warnings import catch_warnings +import dateutil import numpy as np from numpy.random import randn from datetime import datetime -from pandas.compat import StringIO, iteritems +from pandas.compat import StringIO, iteritems, PY2 import pandas as pd from pandas import (DataFrame, concat, - read_csv, isnull, Series, date_range, + read_csv, isna, Series, date_range, Index, Panel, MultiIndex, Timestamp, - DatetimeIndex, Categorical, CategoricalIndex) -from pandas.types.concat import union_categoricals + DatetimeIndex) from pandas.util import testing as tm from pandas.util.testing import (assert_frame_equal, - makeCustomDataframe as mkdf, - assert_almost_equal) + makeCustomDataframe as mkdf) +import pytest -class ConcatenateBase(tm.TestCase): - _multiprocess_can_split_ = True +class ConcatenateBase(object): - def setUp(self): + def setup_method(self, method): self.frame = DataFrame(tm.getSeriesData()) self.mixed_frame = self.frame.copy() self.mixed_frame['foo'] = 'bar' @@ -33,7 +32,7 @@ class TestConcatAppendCommon(ConcatenateBase): Test common dtype coercion rules between concat and append. """ - def setUp(self): + def setup_method(self, method): dt_data = [pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02'), @@ -67,14 +66,14 @@ def _check_expected_dtype(self, obj, label): """ if isinstance(obj, pd.Index): if label == 'bool': - self.assertEqual(obj.dtype, 'object') + assert obj.dtype == 'object' else: - self.assertEqual(obj.dtype, label) + assert obj.dtype == label elif isinstance(obj, pd.Series): if label.startswith('period'): - self.assertEqual(obj.dtype, 'object') + assert obj.dtype == 'object' else: - self.assertEqual(obj.dtype, label) + assert obj.dtype == label else: raise ValueError @@ -126,10 +125,12 @@ def test_concatlike_same_dtypes(self): tm.assert_index_equal(res, exp) # cannot append non-index - with tm.assertRaisesRegexp(TypeError, 'all inputs must be Index'): + with tm.assert_raises_regex(TypeError, + 'all inputs must be Index'): pd.Index(vals1).append(vals2) - with tm.assertRaisesRegexp(TypeError, 'all inputs must be Index'): + with tm.assert_raises_regex(TypeError, + 'all inputs must be Index'): pd.Index(vals1).append([pd.Index(vals2), vals3]) # ----- Series ----- # @@ -177,16 +178,16 @@ def test_concatlike_same_dtypes(self): # cannot append non-index msg = "cannot concatenate a non-NDFrame object" - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): pd.Series(vals1).append(vals2) - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): pd.Series(vals1).append([pd.Series(vals2), vals3]) - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): pd.concat([pd.Series(vals1), vals2]) - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): pd.concat([pd.Series(vals1), pd.Series(vals2), vals3]) def test_concatlike_dtypes_coercion(self): @@ -274,20 +275,20 @@ def test_concatlike_common_coerce_to_pandas_object(self): res = dti.append(tdi) tm.assert_index_equal(res, exp) - tm.assertIsInstance(res[0], pd.Timestamp) - tm.assertIsInstance(res[-1], pd.Timedelta) + assert isinstance(res[0], 
pd.Timestamp) + assert isinstance(res[-1], pd.Timedelta) dts = pd.Series(dti) tds = pd.Series(tdi) res = dts.append(tds) tm.assert_series_equal(res, pd.Series(exp, index=[0, 1, 0, 1])) - tm.assertIsInstance(res.iloc[0], pd.Timestamp) - tm.assertIsInstance(res.iloc[-1], pd.Timedelta) + assert isinstance(res.iloc[0], pd.Timestamp) + assert isinstance(res.iloc[-1], pd.Timedelta) res = pd.concat([dts, tds]) tm.assert_series_equal(res, pd.Series(exp, index=[0, 1, 0, 1])) - tm.assertIsInstance(res.iloc[0], pd.Timestamp) - tm.assertIsInstance(res.iloc[-1], pd.Timedelta) + assert isinstance(res.iloc[0], pd.Timestamp) + assert isinstance(res.iloc[-1], pd.Timedelta) def test_concatlike_datetimetz(self): # GH 7795 @@ -679,7 +680,7 @@ def test_concat_categorical_empty(self): tm.assert_series_equal(s1.append(s2, ignore_index=True), s2) s1 = pd.Series([], dtype='category') - s2 = pd.Series([]) + s2 = pd.Series([], dtype='object') # different dtype => not-category tm.assert_series_equal(pd.concat([s1, s2], ignore_index=True), s2) @@ -709,25 +710,25 @@ def test_append(self): end_frame = self.frame.reindex(end_index) appended = begin_frame.append(end_frame) - assert_almost_equal(appended['A'], self.frame['A']) + tm.assert_almost_equal(appended['A'], self.frame['A']) del end_frame['A'] partial_appended = begin_frame.append(end_frame) - self.assertIn('A', partial_appended) + assert 'A' in partial_appended partial_appended = end_frame.append(begin_frame) - self.assertIn('A', partial_appended) + assert 'A' in partial_appended # mixed type handling appended = self.mixed_frame[:5].append(self.mixed_frame[5:]) - assert_frame_equal(appended, self.mixed_frame) + tm.assert_frame_equal(appended, self.mixed_frame) # what to test here mixed_appended = self.mixed_frame[:5].append(self.frame[5:]) mixed_appended2 = self.frame[:5].append(self.mixed_frame[5:]) # all equal except 'foo' column - assert_frame_equal( + tm.assert_frame_equal( mixed_appended.reindex(columns=['A', 'B', 'C', 'D']), mixed_appended2.reindex(columns=['A', 'B', 'C', 'D'])) @@ -735,25 +736,24 @@ def test_append(self): empty = DataFrame({}) appended = self.frame.append(empty) - assert_frame_equal(self.frame, appended) - self.assertIsNot(appended, self.frame) + tm.assert_frame_equal(self.frame, appended) + assert appended is not self.frame appended = empty.append(self.frame) - assert_frame_equal(self.frame, appended) - self.assertIsNot(appended, self.frame) + tm.assert_frame_equal(self.frame, appended) + assert appended is not self.frame - # overlap - self.assertRaises(ValueError, self.frame.append, self.frame, - verify_integrity=True) + # Overlap + with pytest.raises(ValueError): + self.frame.append(self.frame, verify_integrity=True) - # new columns - # GH 6129 + # see gh-6129: new columns df = DataFrame({'a': {'x': 1, 'y': 2}, 'b': {'x': 3, 'y': 4}}) row = Series([5, 6, 7], index=['a', 'b', 'c'], name='z') expected = DataFrame({'a': {'x': 1, 'y': 2, 'z': 5}, 'b': { 'x': 3, 'y': 4, 'z': 6}, 'c': {'z': 7}}) result = df.append(row) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_append_length0_frame(self): df = DataFrame(columns=['A', 'B', 'C']) @@ -789,8 +789,8 @@ def test_append_different_columns(self): b = df[5:].loc[:, ['strings', 'ints', 'floats']] appended = a.append(b) - self.assertTrue(isnull(appended['strings'][0:4]).all()) - self.assertTrue(isnull(appended['bools'][5:]).all()) + assert isna(appended['strings'][0:4]).all() + assert isna(appended['bools'][5:]).all() def test_append_many(self): chunks = 
[self.frame[:5], self.frame[5:10], @@ -803,8 +803,8 @@ def test_append_many(self): chunks[-1]['foo'] = 'bar' result = chunks[0].append(chunks[1:]) tm.assert_frame_equal(result.loc[:, self.frame.columns], self.frame) - self.assertTrue((result['foo'][15:] == 'bar').all()) - self.assertTrue(result['foo'][:15].isnull().all()) + assert (result['foo'][15:] == 'bar').all() + assert result['foo'][:15].isna().all() def test_append_preserve_index_name(self): # #980 @@ -815,7 +815,7 @@ def test_append_preserve_index_name(self): df2 = df2.set_index(['A']) result = df1.append(df2) - self.assertEqual(result.index.name, 'A') + assert result.index.name == 'A' def test_append_dtype_coerce(self): @@ -850,46 +850,44 @@ def test_append_missing_column_proper_upcast(self): dtype=bool)}) appended = df1.append(df2, ignore_index=True) - self.assertEqual(appended['A'].dtype, 'f8') - self.assertEqual(appended['B'].dtype, 'O') + assert appended['A'].dtype == 'f8' + assert appended['B'].dtype == 'O' class TestConcatenate(ConcatenateBase): def test_concat_copy(self): - df = DataFrame(np.random.randn(4, 3)) df2 = DataFrame(np.random.randint(0, 10, size=4).reshape(4, 1)) df3 = DataFrame({5: 'foo'}, index=range(4)) - # these are actual copies + # These are actual copies. result = concat([df, df2, df3], axis=1, copy=True) + for b in result._data.blocks: - self.assertIsNone(b.values.base) + assert b.values.base is None - # these are the same + # These are the same. result = concat([df, df2, df3], axis=1, copy=False) + for b in result._data.blocks: if b.is_float: - self.assertTrue( - b.values.base is df._data.blocks[0].values.base) + assert b.values.base is df._data.blocks[0].values.base elif b.is_integer: - self.assertTrue( - b.values.base is df2._data.blocks[0].values.base) + assert b.values.base is df2._data.blocks[0].values.base elif b.is_object: - self.assertIsNotNone(b.values.base) + assert b.values.base is not None - # float block was consolidated + # Float block was consolidated. 
df4 = DataFrame(np.random.randn(4, 1)) result = concat([df, df2, df3, df4], axis=1, copy=False) for b in result._data.blocks: if b.is_float: - self.assertIsNone(b.values.base) + assert b.values.base is None elif b.is_integer: - self.assertTrue( - b.values.base is df2._data.blocks[0].values.base) + assert b.values.base is df2._data.blocks[0].values.base elif b.is_object: - self.assertIsNotNone(b.values.base) + assert b.values.base is not None def test_concat_with_group_keys(self): df = DataFrame(np.random.randn(4, 3)) @@ -935,9 +933,9 @@ def test_concat_keys_specific_levels(self): levels=[level], names=['group_key']) - self.assert_index_equal(result.columns.levels[0], - Index(level, name='group_key')) - self.assertEqual(result.columns.names[0], 'group_key') + tm.assert_index_equal(result.columns.levels[0], + Index(level, name='group_key')) + assert result.columns.names[0] == 'group_key' def test_concat_dataframe_keys_bug(self): t1 = DataFrame({ @@ -948,8 +946,7 @@ def test_concat_dataframe_keys_bug(self): # it works result = concat([t1, t2], axis=1, keys=['t1', 't2']) - self.assertEqual(list(result.columns), [('t1', 'value'), - ('t2', 'value')]) + assert list(result.columns) == [('t1', 'value'), ('t2', 'value')] def test_concat_series_partial_columns_names(self): # GH10698 @@ -1023,10 +1020,10 @@ def test_concat_multiindex_with_keys(self): columns=Index(['A', 'B', 'C'], name='exp')) result = concat([frame, frame], keys=[0, 1], names=['iteration']) - self.assertEqual(result.index.names, ('iteration',) + index.names) + assert result.index.names == ('iteration',) + index.names tm.assert_frame_equal(result.loc[0], frame) tm.assert_frame_equal(result.loc[1], frame) - self.assertEqual(result.index.nlevels, 3) + assert result.index.nlevels == 3 def test_concat_multiindex_with_tz(self): # GH 6606 @@ -1049,6 +1046,30 @@ def test_concat_multiindex_with_tz(self): result = concat([df, df]) tm.assert_frame_equal(result, expected) + def test_concat_multiindex_with_none_in_index_names(self): + # GH 15787 + index = pd.MultiIndex.from_product([[1], range(5)], + names=['level1', None]) + df = pd.DataFrame({'col': range(5)}, index=index, dtype=np.int32) + + result = concat([df, df], keys=[1, 2], names=['level2']) + index = pd.MultiIndex.from_product([[1, 2], [1], range(5)], + names=['level2', 'level1', None]) + expected = pd.DataFrame({'col': list(range(5)) * 2}, + index=index, dtype=np.int32) + assert_frame_equal(result, expected) + + result = concat([df, df[:2]], keys=[1, 2], names=['level2']) + level2 = [1] * 5 + [2] * 2 + level1 = [1] * 7 + no_name = list(range(5)) + list(range(2)) + tuples = list(zip(level2, level1, no_name)) + index = pd.MultiIndex.from_tuples(tuples, + names=['level2', 'level1', None]) + expected = pd.DataFrame({'col': no_name}, index=index, + dtype=np.int32) + assert_frame_equal(result, expected) + def test_concat_keys_and_levels(self): df = DataFrame(np.random.randn(1, 3)) df2 = DataFrame(np.random.randn(1, 4)) @@ -1067,35 +1088,34 @@ def test_concat_keys_and_levels(self): names=names + [None]) expected.index = exp_index - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) # no names - result = concat([df, df2, df, df2], keys=[('foo', 'one'), ('foo', 'two'), ('baz', 'one'), ('baz', 'two')], levels=levels) - self.assertEqual(result.index.names, (None,) * 3) + assert result.index.names == (None,) * 3 # no levels result = concat([df, df2, df, df2], keys=[('foo', 'one'), ('foo', 'two'), ('baz', 'one'), ('baz', 'two')], names=['first', 'second']) - 
self.assertEqual(result.index.names, ('first', 'second') + (None,)) - self.assert_index_equal(result.index.levels[0], - Index(['baz', 'foo'], name='first')) + assert result.index.names == ('first', 'second') + (None,) + tm.assert_index_equal(result.index.levels[0], + Index(['baz', 'foo'], name='first')) def test_concat_keys_levels_no_overlap(self): # GH #1406 df = DataFrame(np.random.randn(1, 3), index=['a']) df2 = DataFrame(np.random.randn(1, 4), index=['b']) - self.assertRaises(ValueError, concat, [df, df], - keys=['one', 'two'], levels=[['foo', 'bar', 'baz']]) + pytest.raises(ValueError, concat, [df, df], + keys=['one', 'two'], levels=[['foo', 'bar', 'baz']]) - self.assertRaises(ValueError, concat, [df, df2], - keys=['one', 'two'], levels=[['foo', 'bar', 'baz']]) + pytest.raises(ValueError, concat, [df, df2], + keys=['one', 'two'], levels=[['foo', 'bar', 'baz']]) def test_concat_rename_index(self): a = DataFrame(np.random.rand(3, 3), @@ -1114,7 +1134,7 @@ def test_concat_rename_index(self): exp.index.set_names(names, inplace=True) tm.assert_frame_equal(result, exp) - self.assertEqual(result.index.names, exp.index.names) + assert result.index.names == exp.index.names def test_crossed_dtypes_weird_corner(self): columns = ['A', 'B', 'C', 'D'] @@ -1139,7 +1159,7 @@ def test_crossed_dtypes_weird_corner(self): df2 = DataFrame(np.random.randn(1, 4), index=['b']) result = concat( [df, df2], keys=['one', 'two'], names=['first', 'second']) - self.assertEqual(result.index.names, ('first', 'second')) + assert result.index.names == ('first', 'second') def test_dups_index(self): # GH 4771 @@ -1285,8 +1305,9 @@ def test_concat_mixed_objs(self): assert_frame_equal(result, expected) # invalid concatenation of mixed dims - panel = tm.makePanel() - self.assertRaises(ValueError, lambda: concat([panel, s1], axis=1)) + with catch_warnings(record=True): + panel = tm.makePanel() + pytest.raises(ValueError, lambda: concat([panel, s1], axis=1)) def test_empty_dtype_coerce(self): @@ -1324,59 +1345,62 @@ def test_dtype_coerceion(self): tm.assert_series_equal(result.dtypes, df.dtypes) def test_panel_concat_other_axes(self): - panel = tm.makePanel() + with catch_warnings(record=True): + panel = tm.makePanel() - p1 = panel.iloc[:, :5, :] - p2 = panel.iloc[:, 5:, :] + p1 = panel.iloc[:, :5, :] + p2 = panel.iloc[:, 5:, :] - result = concat([p1, p2], axis=1) - tm.assert_panel_equal(result, panel) + result = concat([p1, p2], axis=1) + tm.assert_panel_equal(result, panel) - p1 = panel.iloc[:, :, :2] - p2 = panel.iloc[:, :, 2:] + p1 = panel.iloc[:, :, :2] + p2 = panel.iloc[:, :, 2:] - result = concat([p1, p2], axis=2) - tm.assert_panel_equal(result, panel) + result = concat([p1, p2], axis=2) + tm.assert_panel_equal(result, panel) - # if things are a bit misbehaved - p1 = panel.iloc[:2, :, :2] - p2 = panel.iloc[:, :, 2:] - p1['ItemC'] = 'baz' + # if things are a bit misbehaved + p1 = panel.iloc[:2, :, :2] + p2 = panel.iloc[:, :, 2:] + p1['ItemC'] = 'baz' - result = concat([p1, p2], axis=2) + result = concat([p1, p2], axis=2) - expected = panel.copy() - expected['ItemC'] = expected['ItemC'].astype('O') - expected.loc['ItemC', :, :2] = 'baz' - tm.assert_panel_equal(result, expected) + expected = panel.copy() + expected['ItemC'] = expected['ItemC'].astype('O') + expected.loc['ItemC', :, :2] = 'baz' + tm.assert_panel_equal(result, expected) def test_panel_concat_buglet(self): - # #2257 - def make_panel(): - index = 5 - cols = 3 + with catch_warnings(record=True): + # #2257 + def make_panel(): + index = 5 + cols = 3 - def df(): -
return DataFrame(np.random.randn(index, cols), - index=["I%s" % i for i in range(index)], - columns=["C%s" % i for i in range(cols)]) - return Panel(dict([("Item%s" % x, df()) for x in ['A', 'B', 'C']])) + def df(): + return DataFrame(np.random.randn(index, cols), + index=["I%s" % i for i in range(index)], + columns=["C%s" % i for i in range(cols)]) + return Panel(dict([("Item%s" % x, df()) + for x in ['A', 'B', 'C']])) - panel1 = make_panel() - panel2 = make_panel() + panel1 = make_panel() + panel2 = make_panel() - panel2 = panel2.rename_axis(dict([(x, "%s_1" % x) - for x in panel2.major_axis]), - axis=1) + panel2 = panel2.rename_axis(dict([(x, "%s_1" % x) + for x in panel2.major_axis]), + axis=1) - panel3 = panel2.rename_axis(lambda x: '%s_1' % x, axis=1) - panel3 = panel3.rename_axis(lambda x: '%s_1' % x, axis=2) + panel3 = panel2.rename_axis(lambda x: '%s_1' % x, axis=1) + panel3 = panel3.rename_axis(lambda x: '%s_1' % x, axis=2) - # it works! - concat([panel1, panel3], axis=1, verify_integrity=True) + # it works! + concat([panel1, panel3], axis=1, verify_integrity=True) def test_panel4d_concat(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): p4d = tm.makePanel4D() p1 = p4d.iloc[:, :, :5, :] @@ -1392,7 +1416,7 @@ def test_panel4d_concat(self): tm.assert_panel4d_equal(result, p4d) def test_panel4d_concat_mixed_type(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): p4d = tm.makePanel4D() # if things are a bit misbehaved @@ -1417,7 +1441,7 @@ def test_concat_series(self): result = concat(pieces) tm.assert_series_equal(result, ts) - self.assertEqual(result.name, ts.name) + assert result.name == ts.name result = concat(pieces, keys=[0, 1, 2]) expected = ts.copy() @@ -1454,8 +1478,8 @@ def test_concat_series_axis1(self): s2.name = None result = concat([s, s2], axis=1) - self.assertTrue(np.array_equal( - result.columns, Index(['A', 0], dtype='object'))) + tm.assert_index_equal(result.columns, + Index(['A', 0], dtype='object')) # must reindex, #2603 s = Series(randn(3), index=['c', 'a', 'b'], name='A') @@ -1477,18 +1501,18 @@ def test_concat_exclude_none(self): pieces = [df[:5], None, None, df[5:]] result = concat(pieces) tm.assert_frame_equal(result, df) - self.assertRaises(ValueError, concat, [None, None]) + pytest.raises(ValueError, concat, [None, None]) def test_concat_datetime64_block(self): - from pandas.tseries.index import date_range + from pandas.core.indexes.datetimes import date_range rng = date_range('1/1/2000', periods=10) df = DataFrame({'time': rng}) result = concat([df, df]) - self.assertTrue((result.iloc[:10]['time'] == rng).all()) - self.assertTrue((result.iloc[10:]['time'] == rng).all()) + assert (result.iloc[:10]['time'] == rng).all() + assert (result.iloc[10:]['time'] == rng).all() def test_concat_timedelta64_block(self): from pandas import to_timedelta @@ -1498,8 +1522,8 @@ def test_concat_timedelta64_block(self): df = DataFrame({'time': rng}) result = concat([df, df]) - self.assertTrue((result.iloc[:10]['time'] == rng).all()) - self.assertTrue((result.iloc[10:]['time'] == rng).all()) + assert (result.iloc[:10]['time'] == rng).all() + assert (result.iloc[10:]['time'] == rng).all() def test_concat_keys_with_none(self): # #1649 @@ -1515,283 +1539,6 @@ def test_concat_keys_with_none(self): keys=['b', 'c', 'd', 'e']) tm.assert_frame_equal(result, expected) - def test_union_categorical(self): - # GH 13361 - data = [ - (list('abc'), list('abd'), 
list('abcabd')), - ([0, 1, 2], [2, 3, 4], [0, 1, 2, 2, 3, 4]), - ([0, 1.2, 2], [2, 3.4, 4], [0, 1.2, 2, 2, 3.4, 4]), - - (['b', 'b', np.nan, 'a'], ['a', np.nan, 'c'], - ['b', 'b', np.nan, 'a', 'a', np.nan, 'c']), - - (pd.date_range('2014-01-01', '2014-01-05'), - pd.date_range('2014-01-06', '2014-01-07'), - pd.date_range('2014-01-01', '2014-01-07')), - - (pd.date_range('2014-01-01', '2014-01-05', tz='US/Central'), - pd.date_range('2014-01-06', '2014-01-07', tz='US/Central'), - pd.date_range('2014-01-01', '2014-01-07', tz='US/Central')), - - (pd.period_range('2014-01-01', '2014-01-05'), - pd.period_range('2014-01-06', '2014-01-07'), - pd.period_range('2014-01-01', '2014-01-07')), - ] - - for a, b, combined in data: - for box in [Categorical, CategoricalIndex, Series]: - result = union_categoricals([box(Categorical(a)), - box(Categorical(b))]) - expected = Categorical(combined) - tm.assert_categorical_equal(result, expected, - check_category_order=True) - - # new categories ordered by appearance - s = Categorical(['x', 'y', 'z']) - s2 = Categorical(['a', 'b', 'c']) - result = union_categoricals([s, s2]) - expected = Categorical(['x', 'y', 'z', 'a', 'b', 'c'], - categories=['x', 'y', 'z', 'a', 'b', 'c']) - tm.assert_categorical_equal(result, expected) - - s = Categorical([0, 1.2, 2], ordered=True) - s2 = Categorical([0, 1.2, 2], ordered=True) - result = union_categoricals([s, s2]) - expected = Categorical([0, 1.2, 2, 0, 1.2, 2], ordered=True) - tm.assert_categorical_equal(result, expected) - - # must exactly match types - s = Categorical([0, 1.2, 2]) - s2 = Categorical([2, 3, 4]) - msg = 'dtype of categories must be the same' - with tm.assertRaisesRegexp(TypeError, msg): - union_categoricals([s, s2]) - - msg = 'No Categoricals to union' - with tm.assertRaisesRegexp(ValueError, msg): - union_categoricals([]) - - def test_union_categoricals_nan(self): - # GH 13759 - res = union_categoricals([pd.Categorical([1, 2, np.nan]), - pd.Categorical([3, 2, np.nan])]) - exp = Categorical([1, 2, np.nan, 3, 2, np.nan]) - tm.assert_categorical_equal(res, exp) - - res = union_categoricals([pd.Categorical(['A', 'B']), - pd.Categorical(['B', 'B', np.nan])]) - exp = Categorical(['A', 'B', 'B', 'B', np.nan]) - tm.assert_categorical_equal(res, exp) - - val1 = [pd.Timestamp('2011-01-01'), pd.Timestamp('2011-03-01'), - pd.NaT] - val2 = [pd.NaT, pd.Timestamp('2011-01-01'), - pd.Timestamp('2011-02-01')] - - res = union_categoricals([pd.Categorical(val1), pd.Categorical(val2)]) - exp = Categorical(val1 + val2, - categories=[pd.Timestamp('2011-01-01'), - pd.Timestamp('2011-03-01'), - pd.Timestamp('2011-02-01')]) - tm.assert_categorical_equal(res, exp) - - # all NaN - res = union_categoricals([pd.Categorical([np.nan, np.nan]), - pd.Categorical(['X'])]) - exp = Categorical([np.nan, np.nan, 'X']) - tm.assert_categorical_equal(res, exp) - - res = union_categoricals([pd.Categorical([np.nan, np.nan]), - pd.Categorical([np.nan, np.nan])]) - exp = Categorical([np.nan, np.nan, np.nan, np.nan]) - tm.assert_categorical_equal(res, exp) - - def test_union_categoricals_empty(self): - # GH 13759 - res = union_categoricals([pd.Categorical([]), - pd.Categorical([])]) - exp = Categorical([]) - tm.assert_categorical_equal(res, exp) - - res = union_categoricals([pd.Categorical([]), - pd.Categorical([1.0])]) - exp = Categorical([1.0]) - tm.assert_categorical_equal(res, exp) - - # to make dtype equal - nanc = pd.Categorical(np.array([np.nan], dtype=np.float64)) - res = union_categoricals([nanc, - pd.Categorical([])]) - 
tm.assert_categorical_equal(res, nanc) - - def test_union_categorical_same_category(self): - # check fastpath - c1 = Categorical([1, 2, 3, 4], categories=[1, 2, 3, 4]) - c2 = Categorical([3, 2, 1, np.nan], categories=[1, 2, 3, 4]) - res = union_categoricals([c1, c2]) - exp = Categorical([1, 2, 3, 4, 3, 2, 1, np.nan], - categories=[1, 2, 3, 4]) - tm.assert_categorical_equal(res, exp) - - c1 = Categorical(['z', 'z', 'z'], categories=['x', 'y', 'z']) - c2 = Categorical(['x', 'x', 'x'], categories=['x', 'y', 'z']) - res = union_categoricals([c1, c2]) - exp = Categorical(['z', 'z', 'z', 'x', 'x', 'x'], - categories=['x', 'y', 'z']) - tm.assert_categorical_equal(res, exp) - - def test_union_categoricals_ordered(self): - c1 = Categorical([1, 2, 3], ordered=True) - c2 = Categorical([1, 2, 3], ordered=False) - - msg = 'Categorical.ordered must be the same' - with tm.assertRaisesRegexp(TypeError, msg): - union_categoricals([c1, c2]) - - res = union_categoricals([c1, c1]) - exp = Categorical([1, 2, 3, 1, 2, 3], ordered=True) - tm.assert_categorical_equal(res, exp) - - c1 = Categorical([1, 2, 3, np.nan], ordered=True) - c2 = Categorical([3, 2], categories=[1, 2, 3], ordered=True) - - res = union_categoricals([c1, c2]) - exp = Categorical([1, 2, 3, np.nan, 3, 2], ordered=True) - tm.assert_categorical_equal(res, exp) - - c1 = Categorical([1, 2, 3], ordered=True) - c2 = Categorical([1, 2, 3], categories=[3, 2, 1], ordered=True) - - msg = "to union ordered Categoricals, all categories must be the same" - with tm.assertRaisesRegexp(TypeError, msg): - union_categoricals([c1, c2]) - - def test_union_categoricals_sort(self): - # GH 13846 - c1 = Categorical(['x', 'y', 'z']) - c2 = Categorical(['a', 'b', 'c']) - result = union_categoricals([c1, c2], sort_categories=True) - expected = Categorical(['x', 'y', 'z', 'a', 'b', 'c'], - categories=['a', 'b', 'c', 'x', 'y', 'z']) - tm.assert_categorical_equal(result, expected) - - # fastpath - c1 = Categorical(['a', 'b'], categories=['b', 'a', 'c']) - c2 = Categorical(['b', 'c'], categories=['b', 'a', 'c']) - result = union_categoricals([c1, c2], sort_categories=True) - expected = Categorical(['a', 'b', 'b', 'c'], - categories=['a', 'b', 'c']) - tm.assert_categorical_equal(result, expected) - - c1 = Categorical(['a', 'b'], categories=['c', 'a', 'b']) - c2 = Categorical(['b', 'c'], categories=['c', 'a', 'b']) - result = union_categoricals([c1, c2], sort_categories=True) - expected = Categorical(['a', 'b', 'b', 'c'], - categories=['a', 'b', 'c']) - tm.assert_categorical_equal(result, expected) - - # fastpath - skip resort - c1 = Categorical(['a', 'b'], categories=['a', 'b', 'c']) - c2 = Categorical(['b', 'c'], categories=['a', 'b', 'c']) - result = union_categoricals([c1, c2], sort_categories=True) - expected = Categorical(['a', 'b', 'b', 'c'], - categories=['a', 'b', 'c']) - tm.assert_categorical_equal(result, expected) - - c1 = Categorical(['x', np.nan]) - c2 = Categorical([np.nan, 'b']) - result = union_categoricals([c1, c2], sort_categories=True) - expected = Categorical(['x', np.nan, np.nan, 'b'], - categories=['b', 'x']) - tm.assert_categorical_equal(result, expected) - - c1 = Categorical([np.nan]) - c2 = Categorical([np.nan]) - result = union_categoricals([c1, c2], sort_categories=True) - expected = Categorical([np.nan, np.nan], categories=[]) - tm.assert_categorical_equal(result, expected) - - c1 = Categorical([]) - c2 = Categorical([]) - result = union_categoricals([c1, c2], sort_categories=True) - expected = Categorical([]) - tm.assert_categorical_equal(result, 
expected) - - c1 = Categorical(['b', 'a'], categories=['b', 'a', 'c'], ordered=True) - c2 = Categorical(['a', 'c'], categories=['b', 'a', 'c'], ordered=True) - with tm.assertRaises(TypeError): - union_categoricals([c1, c2], sort_categories=True) - - def test_union_categoricals_sort_false(self): - # GH 13846 - c1 = Categorical(['x', 'y', 'z']) - c2 = Categorical(['a', 'b', 'c']) - result = union_categoricals([c1, c2], sort_categories=False) - expected = Categorical(['x', 'y', 'z', 'a', 'b', 'c'], - categories=['x', 'y', 'z', 'a', 'b', 'c']) - tm.assert_categorical_equal(result, expected) - - # fastpath - c1 = Categorical(['a', 'b'], categories=['b', 'a', 'c']) - c2 = Categorical(['b', 'c'], categories=['b', 'a', 'c']) - result = union_categoricals([c1, c2], sort_categories=False) - expected = Categorical(['a', 'b', 'b', 'c'], - categories=['b', 'a', 'c']) - tm.assert_categorical_equal(result, expected) - - # fastpath - skip resort - c1 = Categorical(['a', 'b'], categories=['a', 'b', 'c']) - c2 = Categorical(['b', 'c'], categories=['a', 'b', 'c']) - result = union_categoricals([c1, c2], sort_categories=False) - expected = Categorical(['a', 'b', 'b', 'c'], - categories=['a', 'b', 'c']) - tm.assert_categorical_equal(result, expected) - - c1 = Categorical(['x', np.nan]) - c2 = Categorical([np.nan, 'b']) - result = union_categoricals([c1, c2], sort_categories=False) - expected = Categorical(['x', np.nan, np.nan, 'b'], - categories=['x', 'b']) - tm.assert_categorical_equal(result, expected) - - c1 = Categorical([np.nan]) - c2 = Categorical([np.nan]) - result = union_categoricals([c1, c2], sort_categories=False) - expected = Categorical([np.nan, np.nan], categories=[]) - tm.assert_categorical_equal(result, expected) - - c1 = Categorical([]) - c2 = Categorical([]) - result = union_categoricals([c1, c2], sort_categories=False) - expected = Categorical([]) - tm.assert_categorical_equal(result, expected) - - c1 = Categorical(['b', 'a'], categories=['b', 'a', 'c'], ordered=True) - c2 = Categorical(['a', 'c'], categories=['b', 'a', 'c'], ordered=True) - result = union_categoricals([c1, c2], sort_categories=False) - expected = Categorical(['b', 'a', 'a', 'c'], - categories=['b', 'a', 'c'], ordered=True) - tm.assert_categorical_equal(result, expected) - - def test_union_categorical_unwrap(self): - # GH 14173 - c1 = Categorical(['a', 'b']) - c2 = pd.Series(['b', 'c'], dtype='category') - result = union_categoricals([c1, c2]) - expected = Categorical(['a', 'b', 'b', 'c']) - tm.assert_categorical_equal(result, expected) - - c2 = CategoricalIndex(c2) - result = union_categoricals([c1, c2]) - tm.assert_categorical_equal(result, expected) - - c1 = Series(c1) - result = union_categoricals([c1, c2]) - tm.assert_categorical_equal(result, expected) - - with tm.assertRaises(TypeError): - union_categoricals([c1, ['a', 'b', 'c']]) - def test_concat_bug_1719(self): ts1 = tm.makeTimeSeries() ts2 = tm.makeTimeSeries()[::2] @@ -1801,7 +1548,7 @@ def test_concat_bug_1719(self): left = concat([ts1, ts2], join='outer', axis=1) right = concat([ts2, ts1], join='outer', axis=1) - self.assertEqual(len(left), len(right)) + assert len(left) == len(right) def test_concat_bug_2972(self): ts0 = Series(np.zeros(5)) @@ -1829,13 +1576,23 @@ def test_concat_bug_3602(self): result = concat([df1, df2], axis=1) assert_frame_equal(result, expected) + def test_concat_inner_join_empty(self): + # GH 15328 + df_empty = pd.DataFrame() + df_a = pd.DataFrame({'a': [1, 2]}, index=[0, 1], dtype='int64') + df_expected = pd.DataFrame({'a': []}, 
index=[], dtype='int64') + + for how, expected in [('inner', df_expected), ('outer', df_a)]: + result = pd.concat([df_a, df_empty], axis=1, join=how) + assert_frame_equal(result, expected) + def test_concat_series_axis1_same_names_ignore_index(self): dates = date_range('01-Jan-2013', '01-Jan-2014', freq='MS')[0:-1] s1 = Series(randn(len(dates)), index=dates, name='value') s2 = Series(randn(len(dates)), index=dates, name='value') result = concat([s1, s2], axis=1, ignore_index=True) - self.assertTrue(np.array_equal(result.columns, [0, 1])) + assert np.array_equal(result.columns, [0, 1]) def test_concat_iterables(self): from collections import deque, Iterable @@ -1878,12 +1635,12 @@ def test_concat_invalid(self): # trying to concat a ndframe with a non-ndframe df1 = mkdf(10, 2) for obj in [1, dict(), [1, 2], (1, 2)]: - self.assertRaises(TypeError, lambda x: concat([df1, obj])) + pytest.raises(TypeError, lambda x: concat([df1, obj])) def test_concat_invalid_first_argument(self): df1 = mkdf(10, 2) df2 = mkdf(10, 2) - self.assertRaises(TypeError, concat, df1, df2) + pytest.raises(TypeError, concat, df1, df2) # generator ok though concat(DataFrame(np.random.rand(5, 5)) for _ in range(3)) @@ -1948,8 +1705,7 @@ def test_concat_tz_frame(self): assert_frame_equal(df2, df3) def test_concat_tz_series(self): - # GH 11755 - # tz and no tz + # gh-11755: tz and no tz x = Series(date_range('20151124 08:00', '20151124 09:00', freq='1h', tz='UTC')) @@ -1959,8 +1715,7 @@ def test_concat_tz_series(self): result = concat([x, y], ignore_index=True) tm.assert_series_equal(result, expected) - # GH 11887 - # concat tz and object + # gh-11887: concat tz and object x = Series(date_range('20151124 08:00', '20151124 09:00', freq='1h', tz='UTC')) @@ -1970,10 +1725,8 @@ def test_concat_tz_series(self): result = concat([x, y], ignore_index=True) tm.assert_series_equal(result, expected) - # 12217 - # 12306 fixed I think - - # Concat'ing two UTC times + # see gh-12217 and gh-12306 + # Concatenating two UTC times first = pd.DataFrame([[datetime(2016, 1, 1)]]) first[0] = first[0].dt.tz_localize('UTC') @@ -1981,9 +1734,9 @@ def test_concat_tz_series(self): second[0] = second[0].dt.tz_localize('UTC') result = pd.concat([first, second]) - self.assertEqual(result[0].dtype, 'datetime64[ns, UTC]') + assert result[0].dtype == 'datetime64[ns, UTC]' - # Concat'ing two London times + # Concatenating two London times first = pd.DataFrame([[datetime(2016, 1, 1)]]) first[0] = first[0].dt.tz_localize('Europe/London') @@ -1991,9 +1744,9 @@ def test_concat_tz_series(self): second[0] = second[0].dt.tz_localize('Europe/London') result = pd.concat([first, second]) - self.assertEqual(result[0].dtype, 'datetime64[ns, Europe/London]') + assert result[0].dtype == 'datetime64[ns, Europe/London]' - # Concat'ing 2+1 London times + # Concatenating 2+1 London times first = pd.DataFrame([[datetime(2016, 1, 1)], [datetime(2016, 1, 2)]]) first[0] = first[0].dt.tz_localize('Europe/London') @@ -2001,7 +1754,7 @@ def test_concat_tz_series(self): second[0] = second[0].dt.tz_localize('Europe/London') result = pd.concat([first, second]) - self.assertEqual(result[0].dtype, 'datetime64[ns, Europe/London]') + assert result[0].dtype == 'datetime64[ns, Europe/London]' # Concat'ing 1+2 London times first = pd.DataFrame([[datetime(2016, 1, 1)]]) @@ -2011,11 +1764,10 @@ def test_concat_tz_series(self): second[0] = second[0].dt.tz_localize('Europe/London') result = pd.concat([first, second]) - self.assertEqual(result[0].dtype, 'datetime64[ns, Europe/London]') + assert 
result[0].dtype == 'datetime64[ns, Europe/London]' def test_concat_tz_series_with_datetimelike(self): - # GH 12620 - # tz and timedelta + # see gh-12620: tz and timedelta x = [pd.Timestamp('2011-01-01', tz='US/Eastern'), pd.Timestamp('2011-02-01', tz='US/Eastern')] y = [pd.Timedelta('1 day'), pd.Timedelta('2 day')] @@ -2028,16 +1780,15 @@ def test_concat_tz_series_with_datetimelike(self): tm.assert_series_equal(result, pd.Series(x + y, dtype='object')) def test_concat_tz_series_tzlocal(self): - # GH 13583 - tm._skip_if_no_dateutil() - import dateutil + # see gh-13583 x = [pd.Timestamp('2011-01-01', tz=dateutil.tz.tzlocal()), pd.Timestamp('2011-02-01', tz=dateutil.tz.tzlocal())] y = [pd.Timestamp('2012-01-01', tz=dateutil.tz.tzlocal()), pd.Timestamp('2012-02-01', tz=dateutil.tz.tzlocal())] + result = concat([pd.Series(x), pd.Series(y)], ignore_index=True) tm.assert_series_equal(result, pd.Series(x + y)) - self.assertEqual(result.dtype, 'datetime64[ns, tzlocal()]') + assert result.dtype == 'datetime64[ns, tzlocal()]' def test_concat_period_series(self): x = Series(pd.PeriodIndex(['2015-11-01', '2015-12-01'], freq='D')) @@ -2045,7 +1796,7 @@ def test_concat_period_series(self): expected = Series([x[0], x[1], y[0], y[1]], dtype='object') result = concat([x, y], ignore_index=True) tm.assert_series_equal(result, expected) - self.assertEqual(result.dtype, 'object') + assert result.dtype == 'object' # different freq x = Series(pd.PeriodIndex(['2015-11-01', '2015-12-01'], freq='D')) @@ -2053,14 +1804,14 @@ def test_concat_period_series(self): expected = Series([x[0], x[1], y[0], y[1]], dtype='object') result = concat([x, y], ignore_index=True) tm.assert_series_equal(result, expected) - self.assertEqual(result.dtype, 'object') + assert result.dtype == 'object' x = Series(pd.PeriodIndex(['2015-11-01', '2015-12-01'], freq='D')) y = Series(pd.PeriodIndex(['2015-11-01', '2015-12-01'], freq='M')) expected = Series([x[0], x[1], y[0], y[1]], dtype='object') result = concat([x, y], ignore_index=True) tm.assert_series_equal(result, expected) - self.assertEqual(result.dtype, 'object') + assert result.dtype == 'object' # non-period x = Series(pd.PeriodIndex(['2015-11-01', '2015-12-01'], freq='D')) @@ -2068,14 +1819,14 @@ def test_concat_period_series(self): expected = Series([x[0], x[1], y[0], y[1]], dtype='object') result = concat([x, y], ignore_index=True) tm.assert_series_equal(result, expected) - self.assertEqual(result.dtype, 'object') + assert result.dtype == 'object' x = Series(pd.PeriodIndex(['2015-11-01', '2015-12-01'], freq='D')) y = Series(['A', 'B']) expected = Series([x[0], x[1], y[0], y[1]], dtype='object') result = concat([x, y], ignore_index=True) tm.assert_series_equal(result, expected) - self.assertEqual(result.dtype, 'object') + assert result.dtype == 'object' def test_concat_empty_series(self): # GH 11082 @@ -2105,7 +1856,7 @@ def test_default_index(self): s1 = pd.Series([1, 2, 3], name='x') s2 = pd.Series([4, 5, 6], name='y') res = pd.concat([s1, s2], axis=1, ignore_index=True) - self.assertIsInstance(res.columns, pd.RangeIndex) + assert isinstance(res.columns, pd.RangeIndex) exp = pd.DataFrame([[1, 4], [2, 5], [3, 6]]) # use check_index_type=True to check the result have # RangeIndex (default index) @@ -2116,7 +1867,7 @@ def test_default_index(self): s1 = pd.Series([1, 2, 3]) s2 = pd.Series([4, 5, 6]) res = pd.concat([s1, s2], axis=1, ignore_index=False) - self.assertIsInstance(res.columns, pd.RangeIndex) + assert isinstance(res.columns, pd.RangeIndex) exp = pd.DataFrame([[1, 4], [2, 5], 
[3, 6]]) exp.columns = pd.RangeIndex(2) tm.assert_frame_equal(res, exp, check_index_type=True, @@ -2172,7 +1923,59 @@ def test_concat_multiindex_dfs_with_deepcopy(self): result_no_copy = pd.concat(example_dict, names=['testname']) tm.assert_frame_equal(result_no_copy, expected) + def test_concat_categoricalindex(self): + # GH 16111, categories that aren't lexsorted + categories = [9, 0, 1, 2, 3] + + a = pd.Series(1, index=pd.CategoricalIndex([9, 0], + categories=categories)) + b = pd.Series(2, index=pd.CategoricalIndex([0, 1], + categories=categories)) + c = pd.Series(3, index=pd.CategoricalIndex([1, 2], + categories=categories)) + + result = pd.concat([a, b, c], axis=1) + + exp_idx = pd.CategoricalIndex([0, 1, 2, 9]) + exp = pd.DataFrame({0: [1, np.nan, np.nan, 1], + 1: [2, 2, np.nan, np.nan], + 2: [np.nan, 3, 3, np.nan]}, + columns=[0, 1, 2], + index=exp_idx) + tm.assert_frame_equal(result, exp) -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + def test_concat_order(self): + # GH 17344 + dfs = [pd.DataFrame(index=range(3), columns=['a', 1, None])] + dfs += [pd.DataFrame(index=range(3), columns=[None, 1, 'a']) + for i in range(100)] + result = pd.concat(dfs).columns + expected = dfs[0].columns + if PY2: + expected = expected.sort_values() + tm.assert_index_equal(result, expected) + + +@pytest.mark.parametrize('pdt', [pd.Series, pd.DataFrame, pd.Panel]) +@pytest.mark.parametrize('dt', np.sctypes['float']) +def test_concat_no_unnecessary_upcast(dt, pdt): + with catch_warnings(record=True): + # GH 13247 + dims = pdt().ndim + dfs = [pdt(np.array([1], dtype=dt, ndmin=dims)), + pdt(np.array([np.nan], dtype=dt, ndmin=dims)), + pdt(np.array([5], dtype=dt, ndmin=dims))] + x = pd.concat(dfs) + assert x.values.dtype == dt + + +@pytest.mark.parametrize('pdt', [pd.Series, pd.DataFrame, pd.Panel]) +@pytest.mark.parametrize('dt', np.sctypes['int']) +def test_concat_will_upcast(dt, pdt): + with catch_warnings(record=True): + dims = pdt().ndim + dfs = [pdt(np.array([1], dtype=dt, ndmin=dims)), + pdt(np.array([np.nan], ndmin=dims)), + pdt(np.array([5], dtype=dt, ndmin=dims))] + x = pd.concat(dfs) + assert x.values.dtype == 'float64' diff --git a/pandas/tools/tests/test_join.py b/pandas/tests/reshape/test_join.py similarity index 78% rename from pandas/tools/tests/test_join.py rename to pandas/tests/reshape/test_join.py index 4a2b64d080b4b..75c01fabea8f6 100644 --- a/pandas/tools/tests/test_join.py +++ b/pandas/tests/reshape/test_join.py @@ -1,30 +1,27 @@ # pylint: disable=E1103 -import nose - +from warnings import catch_warnings from numpy.random import randn import numpy as np +import pytest import pandas as pd from pandas.compat import lrange import pandas.compat as compat -from pandas.tools.merge import merge, concat from pandas.util.testing import assert_frame_equal -from pandas import DataFrame, MultiIndex, Series +from pandas import DataFrame, MultiIndex, Series, Index, merge, concat -import pandas._join as _join +from pandas._libs import join as libjoin import pandas.util.testing as tm -from pandas.tools.tests.test_merge import get_test_data, N, NGROUPS +from pandas.tests.reshape.test_merge import get_test_data, N, NGROUPS a_ = np.array -class TestJoin(tm.TestCase): - - _multiprocess_can_split_ = True +class TestJoin(object): - def setUp(self): + def setup_method(self, method): # aggregate multiple columns self.df = DataFrame({'key1': get_test_data(), 'key2': get_test_data(), @@ -51,7 +48,7 @@ def test_cython_left_outer_join(self): 
right = a_([1, 1, 0, 4, 2, 2, 1], dtype=np.int64) max_group = 5 - ls, rs = _join.left_outer_join(left, right, max_group) + ls, rs = libjoin.left_outer_join(left, right, max_group) exp_ls = left.argsort(kind='mergesort') exp_rs = right.argsort(kind='mergesort') @@ -67,15 +64,15 @@ def test_cython_left_outer_join(self): exp_rs = exp_rs.take(exp_ri) exp_rs[exp_ri == -1] = -1 - self.assert_numpy_array_equal(ls, exp_ls, check_dtype=False) - self.assert_numpy_array_equal(rs, exp_rs, check_dtype=False) + tm.assert_numpy_array_equal(ls, exp_ls, check_dtype=False) + tm.assert_numpy_array_equal(rs, exp_rs, check_dtype=False) def test_cython_right_outer_join(self): left = a_([0, 1, 2, 1, 2, 0, 0, 1, 2, 3, 3], dtype=np.int64) right = a_([1, 1, 0, 4, 2, 2, 1], dtype=np.int64) max_group = 5 - rs, ls = _join.left_outer_join(right, left, max_group) + rs, ls = libjoin.left_outer_join(right, left, max_group) exp_ls = left.argsort(kind='mergesort') exp_rs = right.argsort(kind='mergesort') @@ -93,15 +90,15 @@ def test_cython_right_outer_join(self): exp_rs = exp_rs.take(exp_ri) exp_rs[exp_ri == -1] = -1 - self.assert_numpy_array_equal(ls, exp_ls, check_dtype=False) - self.assert_numpy_array_equal(rs, exp_rs, check_dtype=False) + tm.assert_numpy_array_equal(ls, exp_ls, check_dtype=False) + tm.assert_numpy_array_equal(rs, exp_rs, check_dtype=False) def test_cython_inner_join(self): left = a_([0, 1, 2, 1, 2, 0, 0, 1, 2, 3, 3], dtype=np.int64) right = a_([1, 1, 0, 4, 2, 2, 1, 4], dtype=np.int64) max_group = 5 - ls, rs = _join.inner_join(left, right, max_group) + ls, rs = libjoin.inner_join(left, right, max_group) exp_ls = left.argsort(kind='mergesort') exp_rs = right.argsort(kind='mergesort') @@ -117,8 +114,8 @@ def test_cython_inner_join(self): exp_rs = exp_rs.take(exp_ri) exp_rs[exp_ri == -1] = -1 - self.assert_numpy_array_equal(ls, exp_ls, check_dtype=False) - self.assert_numpy_array_equal(rs, exp_rs, check_dtype=False) + tm.assert_numpy_array_equal(ls, exp_ls, check_dtype=False) + tm.assert_numpy_array_equal(rs, exp_rs, check_dtype=False) def test_left_outer_join(self): joined_key2 = merge(self.df, self.df2, on='key2') @@ -156,25 +153,25 @@ def test_handle_overlap(self): joined = merge(self.df, self.df2, on='key2', suffixes=['.foo', '.bar']) - self.assertIn('key1.foo', joined) - self.assertIn('key1.bar', joined) + assert 'key1.foo' in joined + assert 'key1.bar' in joined def test_handle_overlap_arbitrary_key(self): joined = merge(self.df, self.df2, left_on='key2', right_on='key1', suffixes=['.foo', '.bar']) - self.assertIn('key1.foo', joined) - self.assertIn('key2.bar', joined) + assert 'key1.foo' in joined + assert 'key2.bar' in joined def test_join_on(self): target = self.target source = self.source merged = target.join(source, on='C') - self.assert_series_equal(merged['MergedA'], target['A'], - check_names=False) - self.assert_series_equal(merged['MergedD'], target['D'], - check_names=False) + tm.assert_series_equal(merged['MergedA'], target['A'], + check_names=False) + tm.assert_series_equal(merged['MergedD'], target['D'], + check_names=False) # join with duplicates (fix regression from DataFrame/Matrix merge) df = DataFrame({'key': ['a', 'a', 'b', 'b', 'c']}) @@ -193,19 +190,19 @@ def test_join_on(self): columns=['three']) joined = df_a.join(df_b, on='one') joined = joined.join(df_c, on='one') - self.assertTrue(np.isnan(joined['two']['c'])) - self.assertTrue(np.isnan(joined['three']['c'])) + assert np.isnan(joined['two']['c']) + assert np.isnan(joined['three']['c']) # merge column not present -
self.assertRaises(KeyError, target.join, source, on='E') + pytest.raises(KeyError, target.join, source, on='E') # overlap source_copy = source.copy() source_copy['A'] = 0 - self.assertRaises(ValueError, target.join, source_copy, on='A') + pytest.raises(ValueError, target.join, source_copy, on='A') def test_join_on_fails_with_different_right_index(self): - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df = DataFrame({'a': np.random.choice(['m', 'f'], size=3), 'b': np.random.randn(3)}) df2 = DataFrame({'a': np.random.choice(['m', 'f'], size=10), @@ -214,7 +211,7 @@ def test_join_on_fails_with_different_right_index(self): merge(df, df2, left_on='a', right_index=True) def test_join_on_fails_with_different_left_index(self): - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df = DataFrame({'a': np.random.choice(['m', 'f'], size=3), 'b': np.random.randn(3)}, index=tm.makeCustomIndex(10, 2)) @@ -223,7 +220,7 @@ def test_join_on_fails_with_different_left_index(self): merge(df, df2, right_on='b', left_index=True) def test_join_on_fails_with_different_column_counts(self): - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df = DataFrame({'a': np.random.choice(['m', 'f'], size=3), 'b': np.random.randn(3)}) df2 = DataFrame({'a': np.random.choice(['m', 'f'], size=10), @@ -237,9 +234,9 @@ def test_join_on_fails_with_wrong_object_type(self): df = DataFrame({'a': [1, 1]}) for obj in wrongly_typed: - with tm.assertRaisesRegexp(ValueError, str(type(obj))): + with tm.assert_raises_regex(ValueError, str(type(obj))): merge(obj, df, left_on='a', right_on='a') - with tm.assertRaisesRegexp(ValueError, str(type(obj))): + with tm.assert_raises_regex(ValueError, str(type(obj))): merge(df, obj, left_on='a', right_on='a') def test_join_on_pass_vector(self): @@ -254,13 +251,13 @@ def test_join_with_len0(self): # nothing to merge merged = self.target.join(self.source.reindex([]), on='C') for col in self.source: - self.assertIn(col, merged) - self.assertTrue(merged[col].isnull().all()) + assert col in merged + assert merged[col].isna().all() merged2 = self.target.join(self.source.reindex([]), on='C', how='inner') - self.assert_index_equal(merged2.columns, merged.columns) - self.assertEqual(len(merged2), 0) + tm.assert_index_equal(merged2.columns, merged.columns) + assert len(merged2) == 0 def test_join_on_inner(self): df = DataFrame({'key': ['a', 'a', 'd', 'b', 'b', 'c']}) @@ -269,12 +266,12 @@ def test_join_on_inner(self): joined = df.join(df2, on='key', how='inner') expected = df.join(df2, on='key') - expected = expected[expected['value'].notnull()] - self.assert_series_equal(joined['key'], expected['key'], - check_dtype=False) - self.assert_series_equal(joined['value'], expected['value'], - check_dtype=False) - self.assert_index_equal(joined.index, expected.index) + expected = expected[expected['value'].notna()] + tm.assert_series_equal(joined['key'], expected['key'], + check_dtype=False) + tm.assert_series_equal(joined['value'], expected['value'], + check_dtype=False) + tm.assert_index_equal(joined.index, expected.index) def test_join_on_singlekey_list(self): df = DataFrame({'key': ['a', 'a', 'b', 'b', 'c']}) @@ -304,8 +301,8 @@ def test_join_index_mixed(self): df1 = DataFrame({'A': 1., 'B': 2, 'C': 'foo', 'D': True}, index=np.arange(10), columns=['A', 'B', 'C', 'D']) - self.assertEqual(df1['B'].dtype, np.int64) - self.assertEqual(df1['D'].dtype, np.bool_) + assert df1['B'].dtype == np.int64 + assert df1['D'].dtype == np.bool_ df2 = DataFrame({'A': 1., 'B': 
2, 'C': 'foo', 'D': True}, index=np.arange(0, 10, 2), @@ -373,22 +370,22 @@ def test_join_multiindex(self): df2 = df2.sort_index(level=0) joined = df1.join(df2, how='outer') - ex_index = index1._tuple_index.union(index2._tuple_index) + ex_index = Index(index1.values).union(Index(index2.values)) expected = df1.reindex(ex_index).join(df2.reindex(ex_index)) expected.index.names = index1.names assert_frame_equal(joined, expected) - self.assertEqual(joined.index.names, index1.names) + assert joined.index.names == index1.names df1 = df1.sort_index(level=1) df2 = df2.sort_index(level=1) joined = df1.join(df2, how='outer').sort_index(level=0) - ex_index = index1._tuple_index.union(index2._tuple_index) + ex_index = Index(index1.values).union(Index(index2.values)) expected = df1.reindex(ex_index).join(df2.reindex(ex_index)) expected.index.names = index1.names assert_frame_equal(joined, expected) - self.assertEqual(joined.index.names, index1.names) + assert joined.index.names == index1.names def test_join_inner_multiindex(self): key1 = ['bar', 'bar', 'bar', 'foo', 'foo', 'baz', 'baz', 'qux', @@ -425,7 +422,7 @@ def test_join_inner_multiindex(self): expected = expected.drop(['first', 'second'], axis=1) expected.index = joined.index - self.assertTrue(joined.index.is_monotonic) + assert joined.index.is_monotonic assert_frame_equal(joined, expected) # _assert_same_contents(expected, expected2.loc[:, expected.columns]) @@ -440,17 +437,17 @@ def test_join_hierarchical_mixed(self): # GH 9455, 12219 with tm.assert_produces_warning(UserWarning): result = merge(new_df, other_df, left_index=True, right_index=True) - self.assertTrue(('b', 'mean') in result) - self.assertTrue('b' in result) + assert ('b', 'mean') in result + assert 'b' in result def test_join_float64_float32(self): a = DataFrame(randn(10, 2), columns=['a', 'b'], dtype=np.float64) b = DataFrame(randn(10, 1), columns=['c'], dtype=np.float32) joined = a.join(b) - self.assertEqual(joined.dtypes['a'], 'float64') - self.assertEqual(joined.dtypes['b'], 'float64') - self.assertEqual(joined.dtypes['c'], 'float32') + assert joined.dtypes['a'] == 'float64' + assert joined.dtypes['b'] == 'float64' + assert joined.dtypes['c'] == 'float32' a = np.random.randint(0, 5, 100).astype('int64') b = np.random.random(100).astype('float64') @@ -459,10 +456,10 @@ def test_join_float64_float32(self): xpdf = DataFrame({'a': a, 'b': b, 'c': c}) s = DataFrame(np.random.random(5).astype('float32'), columns=['md']) rs = df.merge(s, left_on='a', right_index=True) - self.assertEqual(rs.dtypes['a'], 'int64') - self.assertEqual(rs.dtypes['b'], 'float64') - self.assertEqual(rs.dtypes['c'], 'float32') - self.assertEqual(rs.dtypes['md'], 'float32') + assert rs.dtypes['a'] == 'int64' + assert rs.dtypes['b'] == 'float64' + assert rs.dtypes['c'] == 'float32' + assert rs.dtypes['md'] == 'float32' xp = xpdf.merge(s, left_on='a', right_index=True) assert_frame_equal(rs, xp) @@ -534,7 +531,7 @@ def test_join_sort(self): # smoke test joined = left.join(right, on='key', sort=False) - self.assert_index_equal(joined.index, pd.Index(lrange(4))) + tm.assert_index_equal(joined.index, pd.Index(lrange(4))) def test_join_mixed_non_unique_index(self): # GH 12814, unorderable types in py3 with a non-unique index @@ -553,6 +550,18 @@ def test_join_mixed_non_unique_index(self): index=[1, 2, 2, 'a']) tm.assert_frame_equal(result, expected) + def test_join_non_unique_period_index(self): + # GH #16871 + index = pd.period_range('2016-01-01', periods=16, freq='M') + df = DataFrame([i for i in 
range(len(index))], + index=index, columns=['pnum']) + df2 = concat([df, df]) + result = df.join(df2, how='inner', rsuffix='_df2') + expected = DataFrame( + np.tile(np.arange(16, dtype=np.int64).repeat(2).reshape(-1, 1), 2), + columns=['pnum', 'pnum_df2'], index=df2.sort_index().index) + tm.assert_frame_equal(result, expected) + def test_mixed_type_join_with_suffix(self): # GH #916 df = DataFrame(np.random.randn(20, 6), @@ -592,7 +601,7 @@ def _check_diff_index(df_list, result, exp_index): joined = df_list[0].join(df_list[1:], how='inner') _check_diff_index(df_list, joined, df.index[2:8]) - self.assertRaises(ValueError, df_list[0].join, df_list[1:], on='a') + pytest.raises(ValueError, df_list[0].join, df_list[1:], on='a') def test_join_many_mixed(self): df = DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D']) @@ -634,86 +643,89 @@ def test_join_dups(self): assert_frame_equal(dta, expected) def test_panel_join(self): - panel = tm.makePanel() - tm.add_nans(panel) - - p1 = panel.iloc[:2, :10, :3] - p2 = panel.iloc[2:, 5:, 2:] - - # left join - result = p1.join(p2) - expected = p1.copy() - expected['ItemC'] = p2['ItemC'] - tm.assert_panel_equal(result, expected) - - # right join - result = p1.join(p2, how='right') - expected = p2.copy() - expected['ItemA'] = p1['ItemA'] - expected['ItemB'] = p1['ItemB'] - expected = expected.reindex(items=['ItemA', 'ItemB', 'ItemC']) - tm.assert_panel_equal(result, expected) - - # inner join - result = p1.join(p2, how='inner') - expected = panel.iloc[:, 5:10, 2:3] - tm.assert_panel_equal(result, expected) - - # outer join - result = p1.join(p2, how='outer') - expected = p1.reindex(major=panel.major_axis, - minor=panel.minor_axis) - expected = expected.join(p2.reindex(major=panel.major_axis, - minor=panel.minor_axis)) - tm.assert_panel_equal(result, expected) + with catch_warnings(record=True): + panel = tm.makePanel() + tm.add_nans(panel) + + p1 = panel.iloc[:2, :10, :3] + p2 = panel.iloc[2:, 5:, 2:] + + # left join + result = p1.join(p2) + expected = p1.copy() + expected['ItemC'] = p2['ItemC'] + tm.assert_panel_equal(result, expected) + + # right join + result = p1.join(p2, how='right') + expected = p2.copy() + expected['ItemA'] = p1['ItemA'] + expected['ItemB'] = p1['ItemB'] + expected = expected.reindex(items=['ItemA', 'ItemB', 'ItemC']) + tm.assert_panel_equal(result, expected) + + # inner join + result = p1.join(p2, how='inner') + expected = panel.iloc[:, 5:10, 2:3] + tm.assert_panel_equal(result, expected) + + # outer join + result = p1.join(p2, how='outer') + expected = p1.reindex(major=panel.major_axis, + minor=panel.minor_axis) + expected = expected.join(p2.reindex(major=panel.major_axis, + minor=panel.minor_axis)) + tm.assert_panel_equal(result, expected) def test_panel_join_overlap(self): - panel = tm.makePanel() - tm.add_nans(panel) - - p1 = panel.loc[['ItemA', 'ItemB', 'ItemC']] - p2 = panel.loc[['ItemB', 'ItemC']] - - # Expected index is - # - # ItemA, ItemB_p1, ItemC_p1, ItemB_p2, ItemC_p2 - joined = p1.join(p2, lsuffix='_p1', rsuffix='_p2') - p1_suf = p1.loc[['ItemB', 'ItemC']].add_suffix('_p1') - p2_suf = p2.loc[['ItemB', 'ItemC']].add_suffix('_p2') - no_overlap = panel.loc[['ItemA']] - expected = no_overlap.join(p1_suf.join(p2_suf)) - tm.assert_panel_equal(joined, expected) + with catch_warnings(record=True): + panel = tm.makePanel() + tm.add_nans(panel) + + p1 = panel.loc[['ItemA', 'ItemB', 'ItemC']] + p2 = panel.loc[['ItemB', 'ItemC']] + + # Expected index is + # + # ItemA, ItemB_p1, ItemC_p1, ItemB_p2, ItemC_p2 + joined = 
p1.join(p2, lsuffix='_p1', rsuffix='_p2') + p1_suf = p1.loc[['ItemB', 'ItemC']].add_suffix('_p1') + p2_suf = p2.loc[['ItemB', 'ItemC']].add_suffix('_p2') + no_overlap = panel.loc[['ItemA']] + expected = no_overlap.join(p1_suf.join(p2_suf)) + tm.assert_panel_equal(joined, expected) def test_panel_join_many(self): - tm.K = 10 - panel = tm.makePanel() - tm.K = 4 + with catch_warnings(record=True): + tm.K = 10 + panel = tm.makePanel() + tm.K = 4 - panels = [panel.iloc[:2], panel.iloc[2:6], panel.iloc[6:]] + panels = [panel.iloc[:2], panel.iloc[2:6], panel.iloc[6:]] - joined = panels[0].join(panels[1:]) - tm.assert_panel_equal(joined, panel) + joined = panels[0].join(panels[1:]) + tm.assert_panel_equal(joined, panel) - panels = [panel.iloc[:2, :-5], - panel.iloc[2:6, 2:], - panel.iloc[6:, 5:-7]] + panels = [panel.iloc[:2, :-5], + panel.iloc[2:6, 2:], + panel.iloc[6:, 5:-7]] - data_dict = {} - for p in panels: - data_dict.update(p.iteritems()) + data_dict = {} + for p in panels: + data_dict.update(p.iteritems()) - joined = panels[0].join(panels[1:], how='inner') - expected = pd.Panel.from_dict(data_dict, intersect=True) - tm.assert_panel_equal(joined, expected) + joined = panels[0].join(panels[1:], how='inner') + expected = pd.Panel.from_dict(data_dict, intersect=True) + tm.assert_panel_equal(joined, expected) - joined = panels[0].join(panels[1:], how='outer') - expected = pd.Panel.from_dict(data_dict, intersect=False) - tm.assert_panel_equal(joined, expected) + joined = panels[0].join(panels[1:], how='outer') + expected = pd.Panel.from_dict(data_dict, intersect=False) + tm.assert_panel_equal(joined, expected) - # edge cases - self.assertRaises(ValueError, panels[0].join, panels[1:], + # edge cases + pytest.raises(ValueError, panels[0].join, panels[1:], how='outer', lsuffix='foo', rsuffix='bar') - self.assertRaises(ValueError, panels[0].join, panels[1:], + pytest.raises(ValueError, panels[0].join, panels[1:], how='right') @@ -722,7 +734,7 @@ def _check_join(left, right, result, join_col, how='left', # some smoke tests for c in join_col: - assert(result[c].notnull().all()) + assert(result[c].notna().all()) left_grouped = left.groupby(join_col) right_grouped = right.groupby(join_col) @@ -785,7 +797,7 @@ def _assert_all_na(join_chunk, source_columns, join_col): for c in source_columns: if c in join_col: continue - assert(join_chunk[c].isnull().all()) + assert(join_chunk[c].isna().all()) def _join_by_hand(a, b, how='left'): @@ -799,8 +811,3 @@ def _join_by_hand(a, b, how='left'): for col, s in compat.iteritems(b_re): a_re[col] = s return a_re.reindex(columns=result_columns) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tools/tests/test_merge.py b/pandas/tests/reshape/test_merge.py similarity index 70% rename from pandas/tools/tests/test_merge.py rename to pandas/tests/reshape/test_merge.py index e08074649f7e8..338596d1523e4 100644 --- a/pandas/tools/tests/test_merge.py +++ b/pandas/tests/reshape/test_merge.py @@ -1,8 +1,7 @@ # pylint: disable=E1103 -import nose - -from datetime import datetime +import pytest +from datetime import datetime, date from numpy.random import randn from numpy import nan import numpy as np @@ -10,10 +9,11 @@ import pandas as pd from pandas.compat import lrange, lzip -from pandas.tools.merge import merge, concat, MergeError -from pandas.util.testing import (assert_frame_equal, - assert_series_equal, - slow) +from pandas.core.reshape.concat import concat +from 
pandas.core.reshape.merge import merge, MergeError +from pandas.util.testing import assert_frame_equal, assert_series_equal +from pandas.core.dtypes.dtypes import CategoricalDtype +from pandas.core.dtypes.common import is_categorical_dtype, is_object_dtype from pandas import DataFrame, Index, MultiIndex, Series, Categorical import pandas.util.testing as tm @@ -33,11 +33,9 @@ def get_test_data(ngroups=NGROUPS, n=N): return arr -class TestMerge(tm.TestCase): - - _multiprocess_can_split_ = True +class TestMerge(object): - def setUp(self): + def setup_method(self, method): # aggregate multiple columns self.df = DataFrame({'key1': get_test_data(), 'key2': get_test_data(), @@ -57,6 +55,14 @@ def setUp(self): self.right = DataFrame({'v2': np.random.randn(4)}, index=['d', 'b', 'c', 'a']) + def test_merge_inner_join_empty(self): + # GH 15328 + df_empty = pd.DataFrame() + df_a = pd.DataFrame({'a': [1, 2]}, index=[0, 1], dtype='int64') + result = pd.merge(df_empty, df_a, left_index=True, right_index=True) + expected = pd.DataFrame({'a': []}, index=[], dtype='int64') + assert_frame_equal(result, expected) + def test_merge_common(self): joined = merge(self.df, self.df2) exp = merge(self.df, self.df2, on=['key1', 'key2']) @@ -98,32 +104,32 @@ def test_merge_index_singlekey_inner(self): assert_frame_equal(result, expected.loc[:, result.columns]) def test_merge_misspecified(self): - self.assertRaises(ValueError, merge, self.left, self.right, - left_index=True) - self.assertRaises(ValueError, merge, self.left, self.right, - right_index=True) + pytest.raises(ValueError, merge, self.left, self.right, + left_index=True) + pytest.raises(ValueError, merge, self.left, self.right, + right_index=True) - self.assertRaises(ValueError, merge, self.left, self.left, - left_on='key', on='key') + pytest.raises(ValueError, merge, self.left, self.left, + left_on='key', on='key') - self.assertRaises(ValueError, merge, self.df, self.df2, - left_on=['key1'], right_on=['key1', 'key2']) + pytest.raises(ValueError, merge, self.df, self.df2, + left_on=['key1'], right_on=['key1', 'key2']) def test_index_and_on_parameters_confusion(self): - self.assertRaises(ValueError, merge, self.df, self.df2, how='left', - left_index=False, right_index=['key1', 'key2']) - self.assertRaises(ValueError, merge, self.df, self.df2, how='left', - left_index=['key1', 'key2'], right_index=False) - self.assertRaises(ValueError, merge, self.df, self.df2, how='left', - left_index=['key1', 'key2'], - right_index=['key1', 'key2']) + pytest.raises(ValueError, merge, self.df, self.df2, how='left', + left_index=False, right_index=['key1', 'key2']) + pytest.raises(ValueError, merge, self.df, self.df2, how='left', + left_index=['key1', 'key2'], right_index=False) + pytest.raises(ValueError, merge, self.df, self.df2, how='left', + left_index=['key1', 'key2'], + right_index=['key1', 'key2']) def test_merge_overlap(self): merged = merge(self.left, self.left, on='key') exp_len = (self.left['key'].value_counts() ** 2).sum() - self.assertEqual(len(merged), exp_len) - self.assertIn('v1_x', merged) - self.assertIn('v1_y', merged) + assert len(merged) == exp_len + assert 'v1_x' in merged + assert 'v1_y' in merged def test_merge_different_column_key_names(self): left = DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'], @@ -156,10 +162,10 @@ def test_merge_copy(self): right_index=True, copy=True) merged['a'] = 6 - self.assertTrue((left['a'] == 0).all()) + assert (left['a'] == 0).all() merged['d'] = 'peekaboo' - self.assertTrue((right['d'] == 'bar').all()) + assert (right['d'] 
== 'bar').all() def test_merge_nocopy(self): left = DataFrame({'a': 0, 'b': 1}, index=lrange(10)) @@ -169,10 +175,10 @@ def test_merge_nocopy(self): right_index=True, copy=False) merged['a'] = 6 - self.assertTrue((left['a'] == 6).all()) + assert (left['a'] == 6).all() merged['d'] = 'peekaboo' - self.assertTrue((right['d'] == 'peekaboo').all()) + assert (right['d'] == 'peekaboo').all() def test_intelligently_handle_join_key(self): # #733, be a bit more 1337 about not returning unconsolidated DataFrame @@ -196,7 +202,7 @@ def test_merge_join_key_dtype_cast(self): df1 = DataFrame({'key': [1], 'v1': [10]}) df2 = DataFrame({'key': [2], 'v1': [20]}) df = merge(df1, df2, how='outer') - self.assertEqual(df['key'].dtype, 'int64') + assert df['key'].dtype == 'int64' df1 = DataFrame({'key': [True], 'v1': [1]}) df2 = DataFrame({'key': [False], 'v1': [0]}) @@ -204,14 +210,14 @@ def test_merge_join_key_dtype_cast(self): # GH13169 # this really should be bool - self.assertEqual(df['key'].dtype, 'object') + assert df['key'].dtype == 'object' df1 = DataFrame({'val': [1]}) df2 = DataFrame({'val': [2]}) lkey = np.array([1]) rkey = np.array([2]) df = merge(df1, df2, left_on=lkey, right_on=rkey, how='outer') - self.assertEqual(df['key_0'].dtype, 'int64') + assert df['key_0'].dtype == 'int64' def test_handle_join_key_pass_array(self): left = DataFrame({'key': [1, 1, 2, 2, 3], @@ -223,8 +229,8 @@ def test_handle_join_key_pass_array(self): merged2 = merge(right, left, left_on=key, right_on='key', how='outer') assert_series_equal(merged['key'], merged2['key']) - self.assertTrue(merged['key'].notnull().all()) - self.assertTrue(merged2['key'].notnull().all()) + assert merged['key'].notna().all() + assert merged2['key'].notna().all() left = DataFrame({'value': lrange(5)}, columns=['value']) right = DataFrame({'rvalue': lrange(6)}) @@ -232,23 +238,23 @@ def test_handle_join_key_pass_array(self): rkey = np.array([1, 1, 2, 3, 4, 5]) merged = merge(left, right, left_on=lkey, right_on=rkey, how='outer') - self.assert_series_equal(merged['key_0'], - Series([1, 1, 1, 1, 2, 2, 3, 4, 5], - name='key_0')) + tm.assert_series_equal(merged['key_0'], Series([1, 1, 1, 1, 2, + 2, 3, 4, 5], + name='key_0')) left = DataFrame({'value': lrange(3)}) right = DataFrame({'rvalue': lrange(6)}) key = np.array([0, 1, 1, 2, 2, 3], dtype=np.int64) merged = merge(left, right, left_index=True, right_on=key, how='outer') - self.assert_series_equal(merged['key_0'], Series(key, name='key_0')) + tm.assert_series_equal(merged['key_0'], Series(key, name='key_0')) def test_no_overlap_more_informative_error(self): dt = datetime.now() df1 = DataFrame({'x': ['a']}, index=[dt]) df2 = DataFrame({'y': ['b', 'c']}, index=[dt, dt]) - self.assertRaises(MergeError, merge, df1, df2) + pytest.raises(MergeError, merge, df1, df2) def test_merge_non_unique_indexes(self): @@ -419,7 +425,7 @@ def test_merge_nosort(self): exp = merge(df, new, on='var3', sort=False) assert_frame_equal(result, exp) - self.assertTrue((df.var3.unique() == result.var3.unique()).all()) + assert (df.var3.unique() == result.var3.unique()).all() def test_merge_nan_right(self): df1 = DataFrame({"i1": [0, 1], "i2": [0, 1]}) @@ -453,7 +459,7 @@ def _constructor(self): nad = NotADataFrame(self.df) result = nad.merge(self.df2, on='key1') - tm.assertIsInstance(result, NotADataFrame) + assert isinstance(result, NotADataFrame) def test_join_append_timedeltas(self): @@ -493,7 +499,7 @@ def test_other_datetime_unit(self): df2 = s.astype(dtype).to_frame('days') # coerces to datetime64[ns], thus should not
be affected - self.assertEqual(df2['days'].dtype, 'datetime64[ns]') + assert df2['days'].dtype == 'datetime64[ns]' result = df1.merge(df2, left_on='entity_id', right_index=True) @@ -513,7 +519,7 @@ def test_other_timedelta_unit(self): 'timedelta64[ns]']: df2 = s.astype(dtype).to_frame('days') - self.assertEqual(df2['days'].dtype, dtype) + assert df2['days'].dtype == dtype result = df1.merge(df2, left_on='entity_id', right_index=True) @@ -543,7 +549,7 @@ def test_overlapping_columns_error_message(self): # #2649, #10639 df2.columns = ['key1', 'foo', 'foo'] - self.assertRaises(ValueError, merge, df, df2) + pytest.raises(ValueError, merge, df, df2) def test_merge_on_datetime64tz(self): @@ -576,8 +582,20 @@ def test_merge_on_datetime64tz(self): 'key': [1, 2, 3]}) result = pd.merge(left, right, on='key', how='outer') assert_frame_equal(result, expected) - self.assertEqual(result['value_x'].dtype, 'datetime64[ns, US/Eastern]') - self.assertEqual(result['value_y'].dtype, 'datetime64[ns, US/Eastern]') + assert result['value_x'].dtype == 'datetime64[ns, US/Eastern]' + assert result['value_y'].dtype == 'datetime64[ns, US/Eastern]' + + def test_merge_non_unique_period_index(self): + # GH #16871 + index = pd.period_range('2016-01-01', periods=16, freq='M') + df = DataFrame([i for i in range(len(index))], + index=index, columns=['pnum']) + df2 = concat([df, df]) + result = df.merge(df2, left_index=True, right_index=True, how='inner') + expected = DataFrame( + np.tile(np.arange(16, dtype=np.int64).repeat(2).reshape(-1, 1), 2), + columns=['pnum_x', 'pnum_y'], index=df2.sort_index().index) + tm.assert_frame_equal(result, expected) def test_merge_on_periods(self): left = pd.DataFrame({'key': pd.period_range('20151010', periods=2, @@ -608,8 +626,8 @@ def test_merge_on_periods(self): 'key': [1, 2, 3]}) result = pd.merge(left, right, on='key', how='outer') assert_frame_equal(result, expected) - self.assertEqual(result['value_x'].dtype, 'object') - self.assertEqual(result['value_y'].dtype, 'object') + assert result['value_x'].dtype == 'object' + assert result['value_y'].dtype == 'object' def test_indicator(self): # PR #10054. xref #7412 and closes #8790. 
@@ -657,46 +675,46 @@ def test_indicator(self): assert_frame_equal(test_custom_name, df_result_custom_name) # Check only accepts strings and booleans - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): merge(df1, df2, on='col1', how='outer', indicator=5) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df1.merge(df2, on='col1', how='outer', indicator=5) # Check result integrity test2 = merge(df1, df2, on='col1', how='left', indicator=True) - self.assertTrue((test2._merge != 'right_only').all()) + assert (test2._merge != 'right_only').all() test2 = df1.merge(df2, on='col1', how='left', indicator=True) - self.assertTrue((test2._merge != 'right_only').all()) + assert (test2._merge != 'right_only').all() test3 = merge(df1, df2, on='col1', how='right', indicator=True) - self.assertTrue((test3._merge != 'left_only').all()) + assert (test3._merge != 'left_only').all() test3 = df1.merge(df2, on='col1', how='right', indicator=True) - self.assertTrue((test3._merge != 'left_only').all()) + assert (test3._merge != 'left_only').all() test4 = merge(df1, df2, on='col1', how='inner', indicator=True) - self.assertTrue((test4._merge == 'both').all()) + assert (test4._merge == 'both').all() test4 = df1.merge(df2, on='col1', how='inner', indicator=True) - self.assertTrue((test4._merge == 'both').all()) + assert (test4._merge == 'both').all() # Check if working name in df for i in ['_right_indicator', '_left_indicator', '_merge']: df_badcolumn = DataFrame({'col1': [1, 2], i: [2, 2]}) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): merge(df1, df_badcolumn, on='col1', how='outer', indicator=True) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df1.merge(df_badcolumn, on='col1', how='outer', indicator=True) # Check for name conflict with custom name df_badcolumn = DataFrame( {'col1': [1, 2], 'custom_column_name': [2, 2]}) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): merge(df1, df_badcolumn, on='col1', how='outer', indicator='custom_column_name') - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df1.merge(df_badcolumn, on='col1', how='outer', indicator='custom_column_name') @@ -718,6 +736,130 @@ def test_indicator(self): how='outer', indicator=True) assert_frame_equal(test5, hand_coded_result) + def test_validation(self): + left = DataFrame({'a': ['a', 'b', 'c', 'd'], + 'b': ['cat', 'dog', 'weasel', 'horse']}, + index=range(4)) + + right = DataFrame({'a': ['a', 'b', 'c', 'd', 'e'], + 'c': ['meow', 'bark', 'um... weasel noise?', + 'nay', 'chirp']}, + index=range(5)) + + # Make sure no side effects. + left_copy = left.copy() + right_copy = right.copy() + + result = merge(left, right, left_index=True, right_index=True, + validate='1:1') + assert_frame_equal(left, left_copy) + assert_frame_equal(right, right_copy) + + # make sure merge still correct + expected = DataFrame({'a_x': ['a', 'b', 'c', 'd'], + 'b': ['cat', 'dog', 'weasel', 'horse'], + 'a_y': ['a', 'b', 'c', 'd'], + 'c': ['meow', 'bark', 'um... weasel noise?', + 'nay']}, + index=range(4), + columns=['a_x', 'b', 'a_y', 'c']) + + result = merge(left, right, left_index=True, right_index=True, + validate='one_to_one') + assert_frame_equal(result, expected) + + expected_2 = DataFrame({'a': ['a', 'b', 'c', 'd'], + 'b': ['cat', 'dog', 'weasel', 'horse'], + 'c': ['meow', 'bark', 'um... 
weasel noise?', + 'nay']}, + index=range(4)) + + result = merge(left, right, on='a', validate='1:1') + assert_frame_equal(left, left_copy) + assert_frame_equal(right, right_copy) + assert_frame_equal(result, expected_2) + + result = merge(left, right, on='a', validate='one_to_one') + assert_frame_equal(result, expected_2) + + # One index, one column + expected_3 = DataFrame({'b': ['cat', 'dog', 'weasel', 'horse'], + 'a': ['a', 'b', 'c', 'd'], + 'c': ['meow', 'bark', 'um... weasel noise?', + 'nay']}, + columns=['b', 'a', 'c'], + index=range(4)) + + left_index_reset = left.set_index('a') + result = merge(left_index_reset, right, left_index=True, + right_on='a', validate='one_to_one') + assert_frame_equal(result, expected_3) + + # Dups on right + right_w_dups = right.append(pd.DataFrame({'a': ['e'], 'c': ['moo']}, + index=[4])) + merge(left, right_w_dups, left_index=True, right_index=True, + validate='one_to_many') + + with pytest.raises(MergeError): + merge(left, right_w_dups, left_index=True, right_index=True, + validate='one_to_one') + + with pytest.raises(MergeError): + merge(left, right_w_dups, on='a', validate='one_to_one') + + # Dups on left + left_w_dups = left.append(pd.DataFrame({'a': ['a'], 'c': ['cow']}, + index=[3])) + merge(left_w_dups, right, left_index=True, right_index=True, + validate='many_to_one') + + with pytest.raises(MergeError): + merge(left_w_dups, right, left_index=True, right_index=True, + validate='one_to_one') + + with pytest.raises(MergeError): + merge(left_w_dups, right, on='a', validate='one_to_one') + + # Dups on both + merge(left_w_dups, right_w_dups, on='a', validate='many_to_many') + + with pytest.raises(MergeError): + merge(left_w_dups, right_w_dups, left_index=True, + right_index=True, validate='many_to_one') + + with pytest.raises(MergeError): + merge(left_w_dups, right_w_dups, on='a', + validate='one_to_many') + + # Check invalid arguments + with pytest.raises(ValueError): + merge(left, right, on='a', validate='jibberish') + + # Two column merge, dups in both, but jointly no dups. + left = DataFrame({'a': ['a', 'a', 'b', 'b'], + 'b': [0, 1, 0, 1], + 'c': ['cat', 'dog', 'weasel', 'horse']}, + index=range(4)) + + right = DataFrame({'a': ['a', 'a', 'b'], + 'b': [0, 1, 0], + 'd': ['meow', 'bark', 'um... weasel noise?']}, + index=range(3)) + + expected_multi = DataFrame({'a': ['a', 'a', 'b'], + 'b': [0, 1, 0], + 'c': ['cat', 'dog', 'weasel'], + 'd': ['meow', 'bark', + 'um... 
weasel noise?']}, + index=range(3)) + + with pytest.raises(MergeError): + merge(left, right, on='a', validate='1:1') + + result = merge(left, right, on=['a', 'b'], validate='1:1') + assert_frame_equal(result, expected_multi) + def _check_merge(x, y): for how in ['inner', 'left', 'outer']: @@ -731,9 +873,9 @@ def _check_merge(x, y): assert_frame_equal(result, expected, check_names=False) -class TestMergeMulti(tm.TestCase): +class TestMergeMulti(object): - def setUp(self): + def setup_method(self, method): self.index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']], labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], @@ -783,15 +925,15 @@ def run_asserts(left, right): for sort in [False, True]: res = left.join(right, on=icols, how='left', sort=sort) - self.assertTrue(len(left) < len(res) + 1) - self.assertFalse(res['4th'].isnull().any()) - self.assertFalse(res['5th'].isnull().any()) + assert len(left) < len(res) + 1 + assert not res['4th'].isna().any() + assert not res['5th'].isna().any() tm.assert_series_equal( res['4th'], - res['5th'], check_names=False) result = bind_cols(res.iloc[:, :-2]) tm.assert_series_equal(res['4th'], result, check_names=False) - self.assertTrue(result.name is None) + assert result.name is None if sort: tm.assert_frame_equal( @@ -1021,38 +1163,6 @@ def test_left_join_index_multi_match(self): expected.index = np.arange(len(expected)) tm.assert_frame_equal(result, expected) - def test_join_multi_dtypes(self): - - # test with multi dtypes in the join index - def _test(dtype1, dtype2): - left = DataFrame({'k1': np.array([0, 1, 2] * 8, dtype=dtype1), - 'k2': ['foo', 'bar'] * 12, - 'v': np.array(np.arange(24), dtype=np.int64)}) - - index = MultiIndex.from_tuples([(2, 'bar'), (1, 'foo')]) - right = DataFrame( - {'v2': np.array([5, 7], dtype=dtype2)}, index=index) - - result = left.join(right, on=['k1', 'k2']) - - expected = left.copy() - - if dtype2.kind == 'i': - dtype2 = np.dtype('float64') - expected['v2'] = np.array(np.nan, dtype=dtype2) - expected.loc[(expected.k1 == 2) & (expected.k2 == 'bar'), 'v2'] = 5 - expected.loc[(expected.k1 == 1) & (expected.k2 == 'foo'), 'v2'] = 7 - - tm.assert_frame_equal(result, expected) - - result = left.join(right, on=['k1', 'k2'], sort=True) - expected.sort_values(['k1', 'k2'], kind='mergesort', inplace=True) - tm.assert_frame_equal(result, expected) - - for d1 in [np.int64, np.int32, np.int16, np.int8, np.uint8]: - for d2 in [np.int64, np.float64, np.float32, np.float16]: - _test(np.dtype(d1), np.dtype(d2)) - def test_left_merge_na_buglet(self): left = DataFrame({'id': list('abcde'), 'v1': randn(5), 'v2': randn(5), 'dummy': list('abcde'), @@ -1095,137 +1205,6 @@ def test_merge_na_keys(self): tm.assert_frame_equal(result, expected) - @slow - def test_int64_overflow_issues(self): - from itertools import product - from collections import defaultdict - from pandas.core.groupby import _int64_overflow_possible - - # #2690, combinatorial explosion - df1 = DataFrame(np.random.randn(1000, 7), - columns=list('ABCDEF') + ['G1']) - df2 = DataFrame(np.random.randn(1000, 7), - columns=list('ABCDEF') + ['G2']) - - # it works! 
- result = merge(df1, df2, how='outer') - self.assertTrue(len(result) == 2000) - - low, high, n = -1 << 10, 1 << 10, 1 << 20 - left = DataFrame(np.random.randint(low, high, (n, 7)), - columns=list('ABCDEFG')) - left['left'] = left.sum(axis=1) - - # one-2-one match - i = np.random.permutation(len(left)) - right = left.iloc[i].copy() - right.columns = right.columns[:-1].tolist() + ['right'] - right.index = np.arange(len(right)) - right['right'] *= -1 - - out = merge(left, right, how='outer') - self.assertEqual(len(out), len(left)) - assert_series_equal(out['left'], - out['right'], check_names=False) - result = out.iloc[:, :-2].sum(axis=1) - assert_series_equal(out['left'], result, check_names=False) - self.assertTrue(result.name is None) - - out.sort_values(out.columns.tolist(), inplace=True) - out.index = np.arange(len(out)) - for how in ['left', 'right', 'outer', 'inner']: - assert_frame_equal(out, merge(left, right, how=how, sort=True)) - - # check that left merge w/ sort=False maintains left frame order - out = merge(left, right, how='left', sort=False) - assert_frame_equal(left, out[left.columns.tolist()]) - - out = merge(right, left, how='left', sort=False) - assert_frame_equal(right, out[right.columns.tolist()]) - - # one-2-many/none match - n = 1 << 11 - left = DataFrame(np.random.randint(low, high, (n, 7)).astype('int64'), - columns=list('ABCDEFG')) - - # confirm that this is checking what it is supposed to check - shape = left.apply(Series.nunique).values - self.assertTrue(_int64_overflow_possible(shape)) - - # add duplicates to left frame - left = concat([left, left], ignore_index=True) - - right = DataFrame(np.random.randint(low, high, (n // 2, 7)) - .astype('int64'), - columns=list('ABCDEFG')) - - # add duplicates & overlap with left to the right frame - i = np.random.choice(len(left), n) - right = concat([right, right, left.iloc[i]], ignore_index=True) - - left['left'] = np.random.randn(len(left)) - right['right'] = np.random.randn(len(right)) - - # shuffle left & right frames - i = np.random.permutation(len(left)) - left = left.iloc[i].copy() - left.index = np.arange(len(left)) - - i = np.random.permutation(len(right)) - right = right.iloc[i].copy() - right.index = np.arange(len(right)) - - # manually compute outer merge - ldict, rdict = defaultdict(list), defaultdict(list) - - for idx, row in left.set_index(list('ABCDEFG')).iterrows(): - ldict[idx].append(row['left']) - - for idx, row in right.set_index(list('ABCDEFG')).iterrows(): - rdict[idx].append(row['right']) - - vals = [] - for k, lval in ldict.items(): - rval = rdict.get(k, [np.nan]) - for lv, rv in product(lval, rval): - vals.append(k + tuple([lv, rv])) - - for k, rval in rdict.items(): - if k not in ldict: - for rv in rval: - vals.append(k + tuple([np.nan, rv])) - - def align(df): - df = df.sort_values(df.columns.tolist()) - df.index = np.arange(len(df)) - return df - - def verify_order(df): - kcols = list('ABCDEFG') - assert_frame_equal(df[kcols].copy(), - df[kcols].sort_values(kcols, kind='mergesort')) - - out = DataFrame(vals, columns=list('ABCDEFG') + ['left', 'right']) - out = align(out) - - jmask = {'left': out['left'].notnull(), - 'right': out['right'].notnull(), - 'inner': out['left'].notnull() & out['right'].notnull(), - 'outer': np.ones(len(out), dtype='bool')} - - for how in 'left', 'right', 'outer', 'inner': - mask = jmask[how] - frame = align(out[mask].copy()) - self.assertTrue(mask.all() ^ mask.any() or how == 'outer') - - for sort in [False, True]: - res = merge(left, right, how=how, sort=sort) - if 
sort: - verify_order(res) - - # as in GH9092 dtypes break with outer/right join - assert_frame_equal(frame, align(res), - check_dtype=how not in ('right', 'outer')) - def test_join_multi_levels(self): # GH 3662 @@ -1293,14 +1272,14 @@ def test_join_multi_levels(self): def f(): household.join(portfolio, how='inner') - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) portfolio2 = portfolio.copy() portfolio2.index.set_names(['household_id', 'foo']) def f(): portfolio2.join(portfolio, how='inner') - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) def test_join_multi_levels2(self): @@ -1341,7 +1320,7 @@ def test_join_multi_levels2(self): def f(): household.join(log_return, how='inner') - self.assertRaises(NotImplementedError, f) + pytest.raises(NotImplementedError, f) # this is the equivalency result = (merge(household.reset_index(), log_return.reset_index(), @@ -1369,9 +1348,251 @@ def f(): def f(): household.join(log_return, how='outer') - self.assertRaises(NotImplementedError, f) + pytest.raises(NotImplementedError, f) + + +@pytest.fixture +def df(): + return DataFrame( + {'A': ['foo', 'bar'], + 'B': Series(['foo', 'bar']).astype('category'), + 'C': [1, 2], + 'D': [1.0, 2.0], + 'E': Series([1, 2], dtype='uint64'), + 'F': Series([1, 2], dtype='int32')}) + + +class TestMergeDtypes(object): + + def test_different(self, df): + + # we expect differences by kind + # to be ok, while other differences should return object + + left = df + for col in df.columns: + right = DataFrame({'A': df[col]}) + result = pd.merge(left, right, on='A') + assert is_object_dtype(result.A.dtype) + @pytest.mark.parametrize('d1', [np.int64, np.int32, + np.int16, np.int8, np.uint8]) + @pytest.mark.parametrize('d2', [np.int64, np.float64, + np.float32, np.float16]) + def test_join_multi_dtypes(self, d1, d2): -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + dtype1 = np.dtype(d1) + dtype2 = np.dtype(d2) + + left = DataFrame({'k1': np.array([0, 1, 2] * 8, dtype=dtype1), + 'k2': ['foo', 'bar'] * 12, + 'v': np.array(np.arange(24), dtype=np.int64)}) + + index = MultiIndex.from_tuples([(2, 'bar'), (1, 'foo')]) + right = DataFrame({'v2': np.array([5, 7], dtype=dtype2)}, index=index) + + result = left.join(right, on=['k1', 'k2']) + + expected = left.copy() + + if dtype2.kind == 'i': + dtype2 = np.dtype('float64') + expected['v2'] = np.array(np.nan, dtype=dtype2) + expected.loc[(expected.k1 == 2) & (expected.k2 == 'bar'), 'v2'] = 5 + expected.loc[(expected.k1 == 1) & (expected.k2 == 'foo'), 'v2'] = 7 + + tm.assert_frame_equal(result, expected) + + result = left.join(right, on=['k1', 'k2'], sort=True) + expected.sort_values(['k1', 'k2'], kind='mergesort', inplace=True) + tm.assert_frame_equal(result, expected) + + +@pytest.fixture +def left(): + np.random.seed(1234) + return DataFrame( + {'X': Series(np.random.choice( + ['foo', 'bar'], + size=(10,))).astype('category', categories=['foo', 'bar']), + 'Y': np.random.choice(['one', 'two', 'three'], size=(10,))}) + + +@pytest.fixture +def right(): + np.random.seed(1234) + return DataFrame( + {'X': Series(['foo', 'bar']).astype('category', + categories=['foo', 'bar']), + 'Z': [1, 2]}) + + +class TestMergeCategorical(object): + + def test_identical(self, left): + # merging on the same, should preserve dtypes + merged = pd.merge(left, left, on='X') + result = merged.dtypes.sort_index() + expected = Series([CategoricalDtype(), + np.dtype('O'), + np.dtype('O')], + index=['X', 'Y_x', 'Y_y']) + 
assert_series_equal(result, expected) + + def test_basic(self, left, right): + # we have matching Categorical dtypes in X + # so should preserve the merged column + merged = pd.merge(left, right, on='X') + result = merged.dtypes.sort_index() + expected = Series([CategoricalDtype(), + np.dtype('O'), + np.dtype('int64')], + index=['X', 'Y', 'Z']) + assert_series_equal(result, expected) + + def test_other_columns(self, left, right): + # non-merge columns should preserve if possible + right = right.assign(Z=right.Z.astype('category')) + + merged = pd.merge(left, right, on='X') + result = merged.dtypes.sort_index() + expected = Series([CategoricalDtype(), + np.dtype('O'), + CategoricalDtype()], + index=['X', 'Y', 'Z']) + assert_series_equal(result, expected) + + # categories are preserved + assert left.X.values.is_dtype_equal(merged.X.values) + assert right.Z.values.is_dtype_equal(merged.Z.values) + + @pytest.mark.parametrize( + 'change', [lambda x: x, + lambda x: x.astype('category', + categories=['bar', 'foo']), + lambda x: x.astype('category', + categories=['foo', 'bar', 'bah']), + lambda x: x.astype('category', ordered=True)]) + @pytest.mark.parametrize('how', ['inner', 'outer', 'left', 'right']) + def test_dtype_on_merged_different(self, change, how, left, right): + # our merging columns, X now has 2 different dtypes + # so we must be object as a result + + X = change(right.X.astype('object')) + right = right.assign(X=X) + assert is_categorical_dtype(left.X.values) + assert not left.X.values.is_dtype_equal(right.X.values) + + merged = pd.merge(left, right, on='X', how=how) + + result = merged.dtypes.sort_index() + expected = Series([np.dtype('O'), + np.dtype('O'), + np.dtype('int64')], + index=['X', 'Y', 'Z']) + assert_series_equal(result, expected) + + def test_self_join_multiple_categories(self): + # GH 16767 + # non-duplicates should work with multiple categories + m = 5 + df = pd.DataFrame({ + 'a': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] * m, + 'b': ['t', 'w', 'x', 'y', 'z'] * 2 * m, + 'c': [letter + for each in ['m', 'n', 'u', 'p', 'o'] + for letter in [each] * 2 * m], + 'd': [letter + for each in ['aa', 'bb', 'cc', 'dd', 'ee', + 'ff', 'gg', 'hh', 'ii', 'jj'] + for letter in [each] * m]}) + + # change them all to categorical variables + df = df.apply(lambda x: x.astype('category')) + + # self-join should equal ourselves + result = pd.merge(df, df, on=list(df.columns)) + + assert_frame_equal(result, df) + + def test_dtype_on_categorical_dates(self): + # GH 16900 + # dates should not be coerced to ints + + df = pd.DataFrame( + [[date(2001, 1, 1), 1.1], + [date(2001, 1, 2), 1.3]], + columns=['date', 'num2'] + ) + df['date'] = df['date'].astype('category') + + df2 = pd.DataFrame( + [[date(2001, 1, 1), 1.3], + [date(2001, 1, 3), 1.4]], + columns=['date', 'num4'] + ) + df2['date'] = df2['date'].astype('category') + + expected_outer = pd.DataFrame([ + [pd.Timestamp('2001-01-01'), 1.1, 1.3], + [pd.Timestamp('2001-01-02'), 1.3, np.nan], + [pd.Timestamp('2001-01-03'), np.nan, 1.4]], + columns=['date', 'num2', 'num4'] + ) + result_outer = pd.merge(df, df2, how='outer', on=['date']) + assert_frame_equal(result_outer, expected_outer) + + expected_inner = pd.DataFrame( + [[pd.Timestamp('2001-01-01'), 1.1, 1.3]], + columns=['date', 'num2', 'num4'] + ) + result_inner = pd.merge(df, df2, how='inner', on=['date']) + assert_frame_equal(result_inner, expected_inner) + + +@pytest.fixture +def left_df(): + return DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0]) + + +@pytest.fixture +def 
right_df(): + return DataFrame({'b': [300, 100, 200]}, index=[3, 1, 2]) + + +class TestMergeOnIndexes(object): + + @pytest.mark.parametrize( + "how, sort, expected", + [('inner', False, DataFrame({'a': [20, 10], + 'b': [200, 100]}, + index=[2, 1])), + ('inner', True, DataFrame({'a': [10, 20], + 'b': [100, 200]}, + index=[1, 2])), + ('left', False, DataFrame({'a': [20, 10, 0], + 'b': [200, 100, np.nan]}, + index=[2, 1, 0])), + ('left', True, DataFrame({'a': [0, 10, 20], + 'b': [np.nan, 100, 200]}, + index=[0, 1, 2])), + ('right', False, DataFrame({'a': [np.nan, 10, 20], + 'b': [300, 100, 200]}, + index=[3, 1, 2])), + ('right', True, DataFrame({'a': [10, 20, np.nan], + 'b': [100, 200, 300]}, + index=[1, 2, 3])), + ('outer', False, DataFrame({'a': [0, 10, 20, np.nan], + 'b': [np.nan, 100, 200, 300]}, + index=[0, 1, 2, 3])), + ('outer', True, DataFrame({'a': [0, 10, 20, np.nan], + 'b': [np.nan, 100, 200, 300]}, + index=[0, 1, 2, 3]))]) + def test_merge_on_indexes(self, left_df, right_df, how, sort, expected): + + result = pd.merge(left_df, right_df, + left_index=True, + right_index=True, + how=how, + sort=sort) + tm.assert_frame_equal(result, expected) diff --git a/pandas/tools/tests/test_merge_asof.py b/pandas/tests/reshape/test_merge_asof.py similarity index 93% rename from pandas/tools/tests/test_merge_asof.py rename to pandas/tests/reshape/test_merge_asof.py index ef7b25008e80a..78bfa2ff8597c 100644 --- a/pandas/tools/tests/test_merge_asof.py +++ b/pandas/tests/reshape/test_merge_asof.py @@ -1,18 +1,17 @@ -import nose import os +import pytest import pytz import numpy as np import pandas as pd from pandas import (merge_asof, read_csv, to_datetime, Timedelta) -from pandas.tools.merge import MergeError +from pandas.core.reshape.merge import MergeError from pandas.util import testing as tm from pandas.util.testing import assert_frame_equal -class TestAsOfMerge(tm.TestCase): - _multiprocess_can_split_ = True +class TestAsOfMerge(object): def read_data(self, name, dedupe=False): path = os.path.join(tm.get_data_path(), name) @@ -24,7 +23,7 @@ def read_data(self, name, dedupe=False): x.time = to_datetime(x.time) return x - def setUp(self): + def setup_method(self, method): self.trades = self.read_data('trades.csv') self.quotes = self.read_data('quotes.csv', dedupe=True) @@ -149,6 +148,7 @@ def test_basic_categorical(self): trades.ticker = trades.ticker.astype('category') quotes = self.quotes.copy() quotes.ticker = quotes.ticker.astype('category') + expected.ticker = expected.ticker.astype('category') result = merge_asof(trades, quotes, on='time', @@ -201,14 +201,14 @@ def test_multi_index(self): # MultiIndex is prohibited trades = self.trades.set_index(['time', 'price']) quotes = self.quotes.set_index('time') - with self.assertRaises(MergeError): + with pytest.raises(MergeError): merge_asof(trades, quotes, left_index=True, right_index=True) trades = self.trades.set_index('time') quotes = self.quotes.set_index(['time', 'bid']) - with self.assertRaises(MergeError): + with pytest.raises(MergeError): merge_asof(trades, quotes, left_index=True, right_index=True) @@ -218,7 +218,7 @@ def test_on_and_index(self): # 'on' parameter and index together is prohibited trades = self.trades.set_index('time') quotes = self.quotes.set_index('time') - with self.assertRaises(MergeError): + with pytest.raises(MergeError): merge_asof(trades, quotes, left_on='price', left_index=True, @@ -226,7 +226,7 @@ def test_on_and_index(self): trades = self.trades.set_index('time') quotes = self.quotes.set_index('time') - with 
self.assertRaises(MergeError): + with pytest.raises(MergeError): merge_asof(trades, quotes, right_on='bid', left_index=True, @@ -369,6 +369,41 @@ def test_multiby_heterogeneous_types(self): by=['ticker', 'exch']) assert_frame_equal(result, expected) + def test_multiby_indexed(self): + # GH15676 + left = pd.DataFrame([ + [pd.to_datetime('20160602'), 1, 'a'], + [pd.to_datetime('20160602'), 2, 'a'], + [pd.to_datetime('20160603'), 1, 'b'], + [pd.to_datetime('20160603'), 2, 'b']], + columns=['time', 'k1', 'k2']).set_index('time') + + right = pd.DataFrame([ + [pd.to_datetime('20160502'), 1, 'a', 1.0], + [pd.to_datetime('20160502'), 2, 'a', 2.0], + [pd.to_datetime('20160503'), 1, 'b', 3.0], + [pd.to_datetime('20160503'), 2, 'b', 4.0]], + columns=['time', 'k1', 'k2', 'value']).set_index('time') + + expected = pd.DataFrame([ + [pd.to_datetime('20160602'), 1, 'a', 1.0], + [pd.to_datetime('20160602'), 2, 'a', 2.0], + [pd.to_datetime('20160603'), 1, 'b', 3.0], + [pd.to_datetime('20160603'), 2, 'b', 4.0]], + columns=['time', 'k1', 'k2', 'value']).set_index('time') + + result = pd.merge_asof(left, + right, + left_index=True, + right_index=True, + by=['k1', 'k2']) + + assert_frame_equal(expected, result) + + with pytest.raises(MergeError): + pd.merge_asof(left, right, left_index=True, right_index=True, + left_by=['k1', 'k2'], right_by=['k1']) + def test_basic2(self): expected = self.read_data('asof2.csv') @@ -398,18 +433,18 @@ def test_valid_join_keys(self): trades = self.trades quotes = self.quotes - with self.assertRaises(MergeError): + with pytest.raises(MergeError): merge_asof(trades, quotes, left_on='time', right_on='bid', by='ticker') - with self.assertRaises(MergeError): + with pytest.raises(MergeError): merge_asof(trades, quotes, on=['time', 'ticker'], by='ticker') - with self.assertRaises(MergeError): + with pytest.raises(MergeError): merge_asof(trades, quotes, by='ticker') @@ -440,7 +475,7 @@ def test_valid_allow_exact_matches(self): trades = self.trades quotes = self.quotes - with self.assertRaises(MergeError): + with pytest.raises(MergeError): merge_asof(trades, quotes, on='time', by='ticker', @@ -464,27 +499,27 @@ def test_valid_tolerance(self): tolerance=1) # incompat - with self.assertRaises(MergeError): + with pytest.raises(MergeError): merge_asof(trades, quotes, on='time', by='ticker', tolerance=1) # invalid - with self.assertRaises(MergeError): + with pytest.raises(MergeError): merge_asof(trades.reset_index(), quotes.reset_index(), on='index', by='ticker', tolerance=1.0) # invalid negative - with self.assertRaises(MergeError): + with pytest.raises(MergeError): merge_asof(trades, quotes, on='time', by='ticker', tolerance=-Timedelta('1s')) - with self.assertRaises(MergeError): + with pytest.raises(MergeError): merge_asof(trades.reset_index(), quotes.reset_index(), on='index', by='ticker', @@ -496,24 +531,24 @@ def test_non_sorted(self): quotes = self.quotes.sort_values('time', ascending=False) # we require that we are already sorted on time & quotes - self.assertFalse(trades.time.is_monotonic) - self.assertFalse(quotes.time.is_monotonic) - with self.assertRaises(ValueError): + assert not trades.time.is_monotonic + assert not quotes.time.is_monotonic + with pytest.raises(ValueError): merge_asof(trades, quotes, on='time', by='ticker') trades = self.trades.sort_values('time') - self.assertTrue(trades.time.is_monotonic) - self.assertFalse(quotes.time.is_monotonic) - with self.assertRaises(ValueError): + assert trades.time.is_monotonic + assert not quotes.time.is_monotonic + with 
pytest.raises(ValueError): merge_asof(trades, quotes, on='time', by='ticker') quotes = self.quotes.sort_values('time') - self.assertTrue(trades.time.is_monotonic) - self.assertTrue(quotes.time.is_monotonic) + assert trades.time.is_monotonic + assert quotes.time.is_monotonic # ok, though has dupes merge_asof(trades, self.quotes, @@ -687,7 +722,7 @@ def test_allow_exact_matches_and_tolerance3(self): # GH 13709 df1 = pd.DataFrame({ 'time': pd.to_datetime(['2016-07-15 13:30:00.030', - '2016-07-15 13:30:00.030']), + '2016-07-15 13:30:00.030']), 'username': ['bob', 'charlie']}) df2 = pd.DataFrame({ 'time': pd.to_datetime(['2016-07-15 13:30:00.000', @@ -857,7 +892,7 @@ def test_on_specialized_type(self): df1 = df1.sort_values('value').reset_index(drop=True) if dtype == np.float16: - with self.assertRaises(MergeError): + with pytest.raises(MergeError): pd.merge_asof(df1, df2, on='value') continue @@ -894,7 +929,7 @@ def test_on_specialized_type_by_int(self): df1 = df1.sort_values('value').reset_index(drop=True) if dtype == np.float16: - with self.assertRaises(MergeError): + with pytest.raises(MergeError): pd.merge_asof(df1, df2, on='value', by='key') else: result = pd.merge_asof(df1, df2, on='value', by='key') @@ -938,8 +973,3 @@ def test_on_float_by_int(self): columns=['symbol', 'exch', 'price', 'mpv']) assert_frame_equal(result, expected) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tools/tests/test_merge_ordered.py b/pandas/tests/reshape/test_merge_ordered.py similarity index 85% rename from pandas/tools/tests/test_merge_ordered.py rename to pandas/tests/reshape/test_merge_ordered.py index d163468abc88e..9b1806ee52c1d 100644 --- a/pandas/tools/tests/test_merge_ordered.py +++ b/pandas/tests/reshape/test_merge_ordered.py @@ -1,5 +1,3 @@ -import nose - import pandas as pd from pandas import DataFrame, merge_ordered from pandas.util import testing as tm @@ -8,9 +6,9 @@ from numpy import nan -class TestOrderedMerge(tm.TestCase): +class TestOrderedMerge(object): - def setUp(self): + def setup_method(self, method): self.left = DataFrame({'key': ['a', 'c', 'e'], 'lvalue': [1, 2., 3]}) @@ -42,10 +40,8 @@ def test_ffill(self): def test_multigroup(self): left = pd.concat([self.left, self.left], ignore_index=True) - # right = concat([self.right, self.right], ignore_index=True) left['group'] = ['a'] * 3 + ['b'] * 3 - # right['group'] = ['a'] * 4 + ['b'] * 4 result = merge_ordered(left, self.right, on='key', left_by='group', fill_method='ffill') @@ -61,7 +57,7 @@ def test_multigroup(self): assert_frame_equal(result, result2.loc[:, result.columns]) result = merge_ordered(left, self.right, on='key', left_by='group') - self.assertTrue(result['group'].notnull().all()) + assert result['group'].notna().all() def test_merge_type(self): class NotADataFrame(DataFrame): @@ -73,7 +69,7 @@ def _constructor(self): nad = NotADataFrame(self.left) result = nad.merge(self.right, on='key') - tm.assertIsInstance(result, NotADataFrame) + assert isinstance(result, NotADataFrame) def test_empty_sequence_concat(self): # GH 9157 @@ -87,12 +83,8 @@ def test_empty_sequence_concat(self): ([None, None], none_pat) ] for df_seq, pattern in test_cases: - tm.assertRaisesRegexp(ValueError, pattern, pd.concat, df_seq) + tm.assert_raises_regex(ValueError, pattern, pd.concat, df_seq) pd.concat([pd.DataFrame()]) pd.concat([None, pd.DataFrame()]) pd.concat([pd.DataFrame(), None]) - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', 
'-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tools/tests/test_pivot.py b/pandas/tests/reshape/test_pivot.py similarity index 79% rename from pandas/tools/tests/test_pivot.py rename to pandas/tests/reshape/test_pivot.py index 9cc520a7adb05..879ac96680fbb 100644 --- a/pandas/tools/tests/test_pivot.py +++ b/pandas/tests/reshape/test_pivot.py @@ -1,20 +1,23 @@ from datetime import datetime, date, timedelta +import pytest + + import numpy as np +from collections import OrderedDict import pandas as pd -from pandas import DataFrame, Series, Index, MultiIndex, Grouper -from pandas.tools.merge import concat -from pandas.tools.pivot import pivot_table, crosstab +from pandas import (DataFrame, Series, Index, MultiIndex, + Grouper, date_range, concat) +from pandas.core.reshape.pivot import pivot_table, crosstab from pandas.compat import range, product import pandas.util.testing as tm +from pandas.tseries.util import pivot_annual, isleapyear -class TestPivotTable(tm.TestCase): +class TestPivotTable(object): - _multiprocess_can_split_ = True - - def setUp(self): + def setup_method(self, method): self.data = DataFrame({'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'foo', 'foo', 'foo'], @@ -42,14 +45,14 @@ def test_pivot_table(self): pivot_table(self.data, values='D', index=index) if len(index) > 1: - self.assertEqual(table.index.names, tuple(index)) + assert table.index.names == tuple(index) else: - self.assertEqual(table.index.name, index[0]) + assert table.index.name == index[0] if len(columns) > 1: - self.assertEqual(table.columns.names, columns) + assert table.columns.names == columns else: - self.assertEqual(table.columns.name, columns[0]) + assert table.columns.name == columns[0] expected = self.data.groupby( index + [columns])['D'].agg(np.mean).unstack() @@ -87,6 +90,39 @@ def test_pivot_table_dropna(self): tm.assert_index_equal(pv_col.columns, m) tm.assert_index_equal(pv_ind.index, m) + def test_pivot_table_dropna_categoricals(self): + # GH 15193 + categories = ['a', 'b', 'c', 'd'] + + df = DataFrame({'A': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'], + 'B': [1, 2, 3, 1, 2, 3, 1, 2, 3], + 'C': range(0, 9)}) + + df['A'] = df['A'].astype('category', ordered=False, + categories=categories) + result_true = df.pivot_table(index='B', columns='A', values='C', + dropna=True) + expected_columns = Series(['a', 'b', 'c'], name='A') + expected_columns = expected_columns.astype('category', ordered=False, + categories=categories) + expected_index = Series([1, 2, 3], name='B') + expected_true = DataFrame([[0.0, 3.0, 6.0], + [1.0, 4.0, 7.0], + [2.0, 5.0, 8.0]], + index=expected_index, + columns=expected_columns,) + tm.assert_frame_equal(expected_true, result_true) + + result_false = df.pivot_table(index='B', columns='A', values='C', + dropna=False) + expected_columns = Series(['a', 'b', 'c', 'd'], name='A') + expected_false = DataFrame([[0.0, 3.0, 6.0, np.NaN], + [1.0, 4.0, 7.0, np.NaN], + [2.0, 5.0, 8.0, np.NaN]], + index=expected_index, + columns=expected_columns,) + tm.assert_frame_equal(expected_false, result_false) + def test_pass_array(self): result = self.data.pivot_table( 'D', index=self.data.A, columns=self.data.C) @@ -112,7 +148,7 @@ def test_pivot_dtypes(self): # can convert dtypes f = DataFrame({'a': ['cat', 'bat', 'cat', 'bat'], 'v': [ 1, 2, 3, 4], 'i': ['a', 'b', 'a', 'b']}) - self.assertEqual(f.dtypes['v'], 'int64') + assert f.dtypes['v'] == 'int64' z = pivot_table(f, values='v', index=['a'], columns=[ 'i'], fill_value=0, aggfunc=np.sum) @@ -123,7 +159,7 @@ 
def test_pivot_dtypes(self): # cannot convert dtypes f = DataFrame({'a': ['cat', 'bat', 'cat', 'bat'], 'v': [ 1.5, 2.5, 3.5, 4.5], 'i': ['a', 'b', 'a', 'b']}) - self.assertEqual(f.dtypes['v'], 'float64') + assert f.dtypes['v'] == 'float64' z = pivot_table(f, values='v', index=['a'], columns=[ 'i'], fill_value=0, aggfunc=np.mean) @@ -131,6 +167,24 @@ def test_pivot_dtypes(self): expected = Series(dict(float64=2)) tm.assert_series_equal(result, expected) + @pytest.mark.parametrize('columns,values', + [('bool1', ['float1', 'float2']), + ('bool1', ['float1', 'float2', 'bool1']), + ('bool2', ['float1', 'float2', 'bool1'])]) + def test_pivot_preserve_dtypes(self, columns, values): + # GH 7142 regression test + v = np.arange(5, dtype=np.float64) + df = DataFrame({'float1': v, 'float2': v + 2.0, + 'bool1': v <= 2, 'bool2': v <= 3}) + + df_res = df.reset_index().pivot_table( + index='index', columns=columns, values=values) + + result = dict(df_res.dtypes) + expected = {col: np.dtype('O') if col[0].startswith('b') + else np.dtype('float64') for col in df_res} + assert result == expected + def test_pivot_no_values(self): # GH 14380 idx = pd.DatetimeIndex(['2011-01-01', '2011-02-01', '2011-01-02', @@ -213,10 +267,10 @@ def test_pivot_index_with_nan(self): df.loc[1, 'b'] = df.loc[4, 'b'] = nan pv = df.pivot('a', 'b', 'c') - self.assertEqual(pv.notnull().values.sum(), len(df)) + assert pv.notna().values.sum() == len(df) for _, row in df.iterrows(): - self.assertEqual(pv.loc[row['a'], row['b']], row['c']) + assert pv.loc[row['a'], row['b']] == row['c'] tm.assert_frame_equal(df.pivot('b', 'a', 'c'), pv.T) @@ -305,7 +359,7 @@ def _check_output(result, values_col, index=['A', 'B'], expected_col_margins = self.data.groupby(index)[values_col].mean() tm.assert_series_equal(col_margins, expected_col_margins, check_names=False) - self.assertEqual(col_margins.name, margins_col) + assert col_margins.name == margins_col result = result.sort_index() index_margins = result.loc[(margins_col, '')].iloc[:-1] @@ -313,11 +367,11 @@ def _check_output(result, values_col, index=['A', 'B'], expected_ix_margins = self.data.groupby(columns)[values_col].mean() tm.assert_series_equal(index_margins, expected_ix_margins, check_names=False) - self.assertEqual(index_margins.name, (margins_col, '')) + assert index_margins.name == (margins_col, '') grand_total_margins = result.loc[(margins_col, ''), margins_col] expected_total_margins = self.data[values_col].mean() - self.assertEqual(grand_total_margins, expected_total_margins) + assert grand_total_margins == expected_total_margins # column specified result = self.data.pivot_table(values='D', index=['A', 'B'], @@ -346,18 +400,18 @@ def _check_output(result, values_col, index=['A', 'B'], aggfunc=np.mean) for value_col in table.columns: totals = table.loc[('All', ''), value_col] - self.assertEqual(totals, self.data[value_col].mean()) + assert totals == self.data[value_col].mean() # no rows rtable = self.data.pivot_table(columns=['AA', 'BB'], margins=True, aggfunc=np.mean) - tm.assertIsInstance(rtable, Series) + assert isinstance(rtable, Series) table = self.data.pivot_table(index=['AA', 'BB'], margins=True, aggfunc='mean') for item in ['DD', 'EE', 'FF']: totals = table.loc[('All', ''), item] - self.assertEqual(totals, self.data[item].mean()) + assert totals == self.data[item].mean() # issue number #8349: pivot_table with margins and dictionary aggfunc data = [ @@ -405,6 +459,41 @@ def _check_output(result, values_col, index=['A', 'B'], tm.assert_frame_equal(result['SALARY'], 
expected['SALARY']) + def test_margins_dtype(self): + # GH 17013 + + df = self.data.copy() + df[['D', 'E', 'F']] = np.arange(len(df) * 3).reshape(len(df), 3) + + mi_val = list(product(['bar', 'foo'], ['one', 'two'])) + [('All', '')] + mi = MultiIndex.from_tuples(mi_val, names=('A', 'B')) + expected = DataFrame({'dull': [12, 21, 3, 9, 45], + 'shiny': [33, 0, 36, 51, 120]}, + index=mi).rename_axis('C', axis=1) + expected['All'] = expected['dull'] + expected['shiny'] + + result = df.pivot_table(values='D', index=['A', 'B'], + columns='C', margins=True, + aggfunc=np.sum, fill_value=0) + + tm.assert_frame_equal(expected, result) + + @pytest.mark.xfail(reason='GH 17035 (len of floats is casted back to ' + 'floats)') + def test_margins_dtype_len(self): + mi_val = list(product(['bar', 'foo'], ['one', 'two'])) + [('All', '')] + mi = MultiIndex.from_tuples(mi_val, names=('A', 'B')) + expected = DataFrame({'dull': [1, 1, 2, 1, 5], + 'shiny': [2, 0, 2, 2, 6]}, + index=mi).rename_axis('C', axis=1) + expected['All'] = expected['dull'] + expected['shiny'] + + result = self.data.pivot_table(values='D', index=['A', 'B'], + columns='C', margins=True, + aggfunc=len, fill_value=0) + + tm.assert_frame_equal(expected, result) + def test_pivot_integer_columns(self): # caused by upstream bug in unstack @@ -478,10 +567,10 @@ def test_pivot_columns_lexsorted(self): columns=['Index', 'Symbol', 'Year'], aggfunc='mean') - self.assertTrue(pivoted.columns.is_monotonic) + assert pivoted.columns.is_monotonic def test_pivot_complex_aggfunc(self): - f = {'D': ['std'], 'E': ['sum']} + f = OrderedDict([('D', ['std']), ('E', ['sum'])]) expected = self.data.groupby(['A', 'B']).agg(f).unstack('B') result = self.data.pivot_table(index='A', columns='B', aggfunc=f) @@ -492,21 +581,21 @@ def test_margins_no_values_no_cols(self): result = self.data[['A', 'B']].pivot_table( index=['A', 'B'], aggfunc=len, margins=True) result_list = result.tolist() - self.assertEqual(sum(result_list[:-1]), result_list[-1]) + assert sum(result_list[:-1]) == result_list[-1] def test_margins_no_values_two_rows(self): # Regression test on pivot table: no values passed but rows are a # multi-index result = self.data[['A', 'B', 'C']].pivot_table( index=['A', 'B'], columns='C', aggfunc=len, margins=True) - self.assertEqual(result.All.tolist(), [3.0, 1.0, 4.0, 3.0, 11.0]) + assert result.All.tolist() == [3.0, 1.0, 4.0, 3.0, 11.0] def test_margins_no_values_one_row_one_col(self): # Regression test on pivot table: no values passed but row and col # defined result = self.data[['A', 'B']].pivot_table( index='A', columns='B', aggfunc=len, margins=True) - self.assertEqual(result.All.tolist(), [4.0, 7.0, 11.0]) + assert result.All.tolist() == [4.0, 7.0, 11.0] def test_margins_no_values_two_row_two_cols(self): # Regression test on pivot table: no values passed but rows and cols @@ -515,22 +604,22 @@ def test_margins_no_values_two_row_two_cols(self): 'e', 'f', 'g', 'h', 'i', 'j', 'k'] result = self.data[['A', 'B', 'C', 'D']].pivot_table( index=['A', 'B'], columns=['C', 'D'], aggfunc=len, margins=True) - self.assertEqual(result.All.tolist(), [3.0, 1.0, 4.0, 3.0, 11.0]) + assert result.All.tolist() == [3.0, 1.0, 4.0, 3.0, 11.0] def test_pivot_table_with_margins_set_margin_name(self): - # GH 3335 + # see gh-3335 for margin_name in ['foo', 'one', 666, None, ['a', 'b']]: - with self.assertRaises(ValueError): + with pytest.raises(ValueError): # multi-index index pivot_table(self.data, values='D', index=['A', 'B'], columns=['C'], margins=True, margins_name=margin_name) - with 
self.assertRaises(ValueError): + with pytest.raises(ValueError): # multi-index column pivot_table(self.data, values='D', index=['C'], columns=['A', 'B'], margins=True, margins_name=margin_name) - with self.assertRaises(ValueError): + with pytest.raises(ValueError): # non-multi-index index/column pivot_table(self.data, values='D', index=['A'], columns=['B'], margins=True, @@ -593,10 +682,10 @@ def test_pivot_timegrouper(self): values='Quantity', aggfunc=np.sum) tm.assert_frame_equal(result, expected.T) - self.assertRaises(KeyError, lambda: pivot_table( + pytest.raises(KeyError, lambda: pivot_table( df, index=Grouper(freq='6MS', key='foo'), columns='Buyer', values='Quantity', aggfunc=np.sum)) - self.assertRaises(KeyError, lambda: pivot_table( + pytest.raises(KeyError, lambda: pivot_table( df, index='Buyer', columns=Grouper(freq='6MS', key='foo'), values='Quantity', aggfunc=np.sum)) @@ -613,10 +702,10 @@ def test_pivot_timegrouper(self): values='Quantity', aggfunc=np.sum) tm.assert_frame_equal(result, expected.T) - self.assertRaises(ValueError, lambda: pivot_table( + pytest.raises(ValueError, lambda: pivot_table( df, index=Grouper(freq='6MS', level='foo'), columns='Buyer', values='Quantity', aggfunc=np.sum)) - self.assertRaises(ValueError, lambda: pivot_table( + pytest.raises(ValueError, lambda: pivot_table( df, index='Buyer', columns=Grouper(freq='6MS', level='foo'), values='Quantity', aggfunc=np.sum)) @@ -840,6 +929,8 @@ def test_pivot_table_margins_name_with_aggfunc_list(self): expected = pd.DataFrame(table.values, index=ix, columns=cols) tm.assert_frame_equal(table, expected) + @pytest.mark.xfail(reason='GH 17035 (np.mean of ints is casted back to ' + 'ints)') def test_categorical_margins(self): # GH 10989 df = pd.DataFrame({'x': np.arange(8), @@ -850,14 +941,23 @@ def test_categorical_margins(self): expected.index = Index([0, 1, 'All'], name='y') expected.columns = Index([0, 1, 'All'], name='z') - data = df.copy() - table = data.pivot_table('x', 'y', 'z', margins=True) + table = df.pivot_table('x', 'y', 'z', margins=True) tm.assert_frame_equal(table, expected) - data = df.copy() - data.y = data.y.astype('category') - data.z = data.z.astype('category') - table = data.pivot_table('x', 'y', 'z', margins=True) + @pytest.mark.xfail(reason='GH 17035 (np.mean of ints is casted back to ' + 'ints)') + def test_categorical_margins_category(self): + df = pd.DataFrame({'x': np.arange(8), + 'y': np.arange(8) // 4, + 'z': np.arange(8) % 2}) + + expected = pd.DataFrame([[1.0, 2.0, 1.5], [5, 6, 5.5], [3, 4, 3.5]]) + expected.index = Index([0, 1, 'All'], name='y') + expected.columns = Index([0, 1, 'All'], name='z') + + df.y = df.y.astype('category') + df.z = df.z.astype('category') + table = df.pivot_table('x', 'y', 'z', margins=True) tm.assert_frame_equal(table, expected) def test_categorical_aggfunc(self): @@ -907,10 +1007,48 @@ def test_categorical_pivot_index_ordering(self): columns=expected_columns) tm.assert_frame_equal(result, expected) + def test_pivot_table_not_series(self): + # GH 4386 + # pivot_table always returns a DataFrame + # when values is not list like and columns is None + # and aggfunc is not instance of list + df = DataFrame({'col1': [3, 4, 5], + 'col2': ['C', 'D', 'E'], + 'col3': [1, 3, 9]}) + + result = df.pivot_table('col1', index=['col3', 'col2'], aggfunc=np.sum) + m = MultiIndex.from_arrays([[1, 3, 9], + ['C', 'D', 'E']], + names=['col3', 'col2']) + expected = DataFrame([3, 4, 5], + index=m, columns=['col1']) -class TestCrosstab(tm.TestCase): + tm.assert_frame_equal(result, 
expected) - def setUp(self): + result = df.pivot_table( + 'col1', index='col3', columns='col2', aggfunc=np.sum + ) + expected = DataFrame([[3, np.NaN, np.NaN], + [np.NaN, 4, np.NaN], + [np.NaN, np.NaN, 5]], + index=Index([1, 3, 9], name='col3'), + columns=Index(['C', 'D', 'E'], name='col2')) + + tm.assert_frame_equal(result, expected) + + result = df.pivot_table('col1', index='col3', aggfunc=[np.sum]) + m = MultiIndex.from_arrays([['sum'], + ['col1']]) + expected = DataFrame([3, 4, 5], + index=Index([1, 3, 9], name='col3'), + columns=m) + + tm.assert_frame_equal(result, expected) + + +class TestCrosstab(object): + + def setup_method(self, method): df = DataFrame({'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'foo', 'foo', 'foo'], @@ -963,8 +1101,24 @@ def test_crosstab_ndarray(self): # assign arbitrary names result = crosstab(self.df['A'].values, self.df['C'].values) - self.assertEqual(result.index.name, 'row_0') - self.assertEqual(result.columns.name, 'col_0') + assert result.index.name == 'row_0' + assert result.columns.name == 'col_0' + + def test_crosstab_non_aligned(self): + # GH 17005 + a = pd.Series([0, 1, 1], index=['a', 'b', 'c']) + b = pd.Series([3, 4, 3, 4, 3], index=['a', 'b', 'c', 'd', 'f']) + c = np.array([3, 4, 3]) + + expected = pd.DataFrame([[1, 0], [1, 1]], + index=Index([0, 1], name='row_0'), + columns=Index([3, 4], name='col_0')) + + result = crosstab(a, b) + tm.assert_frame_equal(result, expected) + + result = crosstab(a, c) + tm.assert_frame_equal(result, expected) def test_crosstab_margins(self): a = np.random.randint(0, 7, size=100) @@ -976,8 +1130,8 @@ def test_crosstab_margins(self): result = crosstab(a, [b, c], rownames=['a'], colnames=('b', 'c'), margins=True) - self.assertEqual(result.index.names, ('a',)) - self.assertEqual(result.columns.names, ['b', 'c']) + assert result.index.names == ('a',) + assert result.columns.names == ['b', 'c'] all_cols = result['All', ''] exp_cols = df.groupby(['a']).size().astype('i8') @@ -997,6 +1151,43 @@ def test_crosstab_margins(self): exp_rows = exp_rows.fillna(0).astype(np.int64) tm.assert_series_equal(all_rows, exp_rows) + def test_crosstab_margins_set_margin_name(self): + # GH 15972 + a = np.random.randint(0, 7, size=100) + b = np.random.randint(0, 3, size=100) + c = np.random.randint(0, 5, size=100) + + df = DataFrame({'a': a, 'b': b, 'c': c}) + + result = crosstab(a, [b, c], rownames=['a'], colnames=('b', 'c'), + margins=True, margins_name='TOTAL') + + assert result.index.names == ('a',) + assert result.columns.names == ['b', 'c'] + + all_cols = result['TOTAL', ''] + exp_cols = df.groupby(['a']).size().astype('i8') + # to keep index.name + exp_margin = Series([len(df)], index=Index(['TOTAL'], name='a')) + exp_cols = exp_cols.append(exp_margin) + exp_cols.name = ('TOTAL', '') + + tm.assert_series_equal(all_cols, exp_cols) + + all_rows = result.loc['TOTAL'] + exp_rows = df.groupby(['b', 'c']).size().astype('i8') + exp_rows = exp_rows.append(Series([len(df)], index=[('TOTAL', '')])) + exp_rows.name = 'TOTAL' + + exp_rows = exp_rows.reindex(all_rows.index) + exp_rows = exp_rows.fillna(0).astype(np.int64) + tm.assert_series_equal(all_rows, exp_rows) + + for margins_name in [666, None, ['a', 'b']]: + with pytest.raises(ValueError): + crosstab(a, [b, c], rownames=['a'], colnames=('b', 'c'), + margins=True, margins_name=margins_name) + def test_crosstab_pass_values(self): a = np.random.randint(0, 7, size=100) b = np.random.randint(0, 3, size=100) @@ -1152,8 +1343,8 @@ def test_crosstab_normalize(self): 
pd.crosstab(df.a, df.b, normalize='index')) row_normal_margins = pd.DataFrame([[1.0, 0], - [0.25, 0.75], - [0.4, 0.6]], + [0.25, 0.75], + [0.4, 0.6]], index=pd.Index([1, 2, 'All'], name='a', dtype='object'), @@ -1165,8 +1356,8 @@ name='b')) all_normal_margins = pd.DataFrame([[0.2, 0, 0.2], - [0.2, 0.6, 0.8], - [0.4, 0.6, 1]], + [0.2, 0.6, 0.8], + [0.4, 0.6, 1]], index=pd.Index([1, 2, 'All'], name='a', dtype='object'), @@ -1247,22 +1438,22 @@ def test_crosstab_errors(self): 'c': [1, 1, np.nan, 1, 1]}) error = 'values cannot be used without an aggfunc.' - with tm.assertRaisesRegexp(ValueError, error): + with tm.assert_raises_regex(ValueError, error): pd.crosstab(df.a, df.b, values=df.c) error = 'aggfunc cannot be used without values' - with tm.assertRaisesRegexp(ValueError, error): + with tm.assert_raises_regex(ValueError, error): pd.crosstab(df.a, df.b, aggfunc=np.mean) error = 'Not a valid normalize argument' - with tm.assertRaisesRegexp(ValueError, error): + with tm.assert_raises_regex(ValueError, error): pd.crosstab(df.a, df.b, normalize='42') - with tm.assertRaisesRegexp(ValueError, error): + with tm.assert_raises_regex(ValueError, error): pd.crosstab(df.a, df.b, normalize=42) error = 'Not a valid margins argument' - with tm.assertRaisesRegexp(ValueError, error): + with tm.assert_raises_regex(ValueError, error): pd.crosstab(df.a, df.b, normalize='all', margins=42) def test_crosstab_with_categorial_columns(self): @@ -1322,8 +1513,115 @@ def test_crosstab_with_numpy_size(self): columns=expected_column) tm.assert_frame_equal(result, expected) + def test_crosstab_dup_index_names(self): + # GH 13279 + s = pd.Series(range(3), name='foo') + result = pd.crosstab(s, s) + expected_index = pd.Index(range(3), name='foo') + expected = pd.DataFrame(np.eye(3, dtype=np.int64), + index=expected_index, + columns=expected_index) + tm.assert_frame_equal(result, expected) + + +class TestPivotAnnual(object): + """ + New pandas version of scikits.timeseries pivot_annual + """ + + def test_daily(self): + rng = date_range('1/1/2000', '12/31/2004', freq='D') + ts = Series(np.random.randn(len(rng)), index=rng) + + with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + annual = pivot_annual(ts, 'D') + + doy = np.asarray(ts.index.dayofyear) + + with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + doy[(~isleapyear(ts.index.year)) & (doy >= 60)] += 1 + + for i in range(1, 367): + subset = ts[doy == i] + subset.index = [x.year for x in subset.index] -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + result = annual[i].dropna() + tm.assert_series_equal(result, subset, check_names=False) + assert result.name == i + + # check leap days + leaps = ts[(ts.index.month == 2) & (ts.index.day == 29)] + day = leaps.index.dayofyear[0] + leaps.index = leaps.index.year + leaps.name = 60 + tm.assert_series_equal(annual[day].dropna(), leaps) + + def test_hourly(self): + rng_hourly = date_range('1/1/1994', periods=(18 * 8760 + 4 * 24), + freq='H') + data_hourly = np.random.randint(100, 350, rng_hourly.size) + ts_hourly = Series(data_hourly, index=rng_hourly) + + grouped = ts_hourly.groupby(ts_hourly.index.year) + hoy = grouped.apply(lambda x: x.reset_index(drop=True)) + hoy = hoy.index.droplevel(0).values + + with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + hoy[~isleapyear(ts_hourly.index.year) & (hoy >= 1416)] += 24 + hoy += 1 + + with 
tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + annual = pivot_annual(ts_hourly) + + ts_hourly = ts_hourly.astype(float) + for i in [1, 1416, 1417, 1418, 1439, 1440, 1441, 8784]: + subset = ts_hourly[hoy == i] + subset.index = [x.year for x in subset.index] + + result = annual[i].dropna() + tm.assert_series_equal(result, subset, check_names=False) + assert result.name == i + + leaps = ts_hourly[(ts_hourly.index.month == 2) & ( + ts_hourly.index.day == 29) & (ts_hourly.index.hour == 0)] + hour = leaps.index.dayofyear[0] * 24 - 23 + leaps.index = leaps.index.year + leaps.name = 1417 + tm.assert_series_equal(annual[hour].dropna(), leaps) + + def test_weekly(self): + pass + + def test_monthly(self): + rng = date_range('1/1/2000', '12/31/2004', freq='M') + ts = Series(np.random.randn(len(rng)), index=rng) + + with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + annual = pivot_annual(ts, 'M') + + month = ts.index.month + for i in range(1, 13): + subset = ts[month == i] + subset.index = [x.year for x in subset.index] + result = annual[i].dropna() + tm.assert_series_equal(result, subset, check_names=False) + assert result.name == i + + def test_period_monthly(self): + pass + + def test_period_daily(self): + pass + + def test_period_weekly(self): + pass + + def test_isleapyear_deprecate(self): + with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + assert isleapyear(2000) + + with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + assert not isleapyear(2001) + + with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + assert isleapyear(2004) diff --git a/pandas/tests/test_reshape.py b/pandas/tests/reshape/test_reshape.py similarity index 80% rename from pandas/tests/test_reshape.py rename to pandas/tests/reshape/test_reshape.py index 603674ac01bc0..632d3b4ad2e7a 100644 --- a/pandas/tests/test_reshape.py +++ b/pandas/tests/reshape/test_reshape.py @@ -1,9 +1,9 @@ # -*- coding: utf-8 -*- # pylint: disable-msg=W0612,E1101 -import nose + +import pytest from pandas import DataFrame, Series -from pandas.core.sparse import SparseDataFrame import pandas as pd from numpy import nan @@ -11,16 +11,15 @@ from pandas.util.testing import assert_frame_equal -from pandas.core.reshape import (melt, lreshape, get_dummies, wide_to_long) +from pandas.core.reshape.reshape import ( + melt, lreshape, get_dummies, wide_to_long) import pandas.util.testing as tm from pandas.compat import range, u -_multiprocess_can_split_ = True - -class TestMelt(tm.TestCase): +class TestMelt(object): - def setUp(self): + def setup_method(self, method): self.df = tm.makeTimeDataFrame()[:10] self.df['id1'] = (self.df['A'] > 0).astype(np.int64) self.df['id2'] = (self.df['B'] > 0).astype(np.int64) @@ -34,23 +33,44 @@ def setUp(self): self.df1.columns = [list('ABC'), list('abc')] self.df1.columns.names = ['CAP', 'low'] - def test_default_col_names(self): + def test_top_level_method(self): result = melt(self.df) - self.assertEqual(result.columns.tolist(), ['variable', 'value']) + assert result.columns.tolist() == ['variable', 'value'] - result1 = melt(self.df, id_vars=['id1']) - self.assertEqual(result1.columns.tolist(), ['id1', 'variable', 'value' - ]) + def test_method_signatures(self): + tm.assert_frame_equal(self.df.melt(), + melt(self.df)) - result2 = melt(self.df, id_vars=['id1', 'id2']) - self.assertEqual(result2.columns.tolist(), ['id1', 'id2', 'variable', - 'value']) + tm.assert_frame_equal(self.df.melt(id_vars=['id1', 'id2'], + 
value_vars=['A', 'B']), + melt(self.df, + id_vars=['id1', 'id2'], + value_vars=['A', 'B'])) + + tm.assert_frame_equal(self.df.melt(var_name=self.var_name, + value_name=self.value_name), + melt(self.df, + var_name=self.var_name, + value_name=self.value_name)) + + tm.assert_frame_equal(self.df1.melt(col_level=0), + melt(self.df1, col_level=0)) + + def test_default_col_names(self): + result = self.df.melt() + assert result.columns.tolist() == ['variable', 'value'] + + result1 = self.df.melt(id_vars=['id1']) + assert result1.columns.tolist() == ['id1', 'variable', 'value'] + + result2 = self.df.melt(id_vars=['id1', 'id2']) + assert result2.columns.tolist() == ['id1', 'id2', 'variable', 'value'] def test_value_vars(self): - result3 = melt(self.df, id_vars=['id1', 'id2'], value_vars='A') - self.assertEqual(len(result3), 10) + result3 = self.df.melt(id_vars=['id1', 'id2'], value_vars='A') + assert len(result3) == 10 - result4 = melt(self.df, id_vars=['id1', 'id2'], value_vars=['A', 'B']) + result4 = self.df.melt(id_vars=['id1', 'id2'], value_vars=['A', 'B']) expected4 = DataFrame({'id1': self.df['id1'].tolist() * 2, 'id2': self.df['id2'].tolist() * 2, 'variable': ['A'] * 10 + ['B'] * 10, @@ -59,24 +79,61 @@ def test_value_vars(self): columns=['id1', 'id2', 'variable', 'value']) tm.assert_frame_equal(result4, expected4) + def test_value_vars_types(self): + # GH 15348 + expected = DataFrame({'id1': self.df['id1'].tolist() * 2, + 'id2': self.df['id2'].tolist() * 2, + 'variable': ['A'] * 10 + ['B'] * 10, + 'value': (self.df['A'].tolist() + + self.df['B'].tolist())}, + columns=['id1', 'id2', 'variable', 'value']) + + for type_ in (tuple, list, np.array): + result = self.df.melt(id_vars=['id1', 'id2'], + value_vars=type_(('A', 'B'))) + tm.assert_frame_equal(result, expected) + + def test_vars_work_with_multiindex(self): + expected = DataFrame({ + ('A', 'a'): self.df1[('A', 'a')], + 'CAP': ['B'] * len(self.df1), + 'low': ['b'] * len(self.df1), + 'value': self.df1[('B', 'b')], + }, columns=[('A', 'a'), 'CAP', 'low', 'value']) + + result = self.df1.melt(id_vars=[('A', 'a')], value_vars=[('B', 'b')]) + tm.assert_frame_equal(result, expected) + + def test_tuple_vars_fail_with_multiindex(self): + # melt should fail with an informative error message if + # the columns have a MultiIndex and a tuple is passed + # for id_vars or value_vars. 
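+        # Concretely (using the MultiIndex columns built in setup_method):
+        # self.df1.melt(id_vars=('A', 'a')) raises ValueError, while
+        # self.df1.melt(id_vars=[('A', 'a')]) selects the single
+        # ('A', 'a') column as intended.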
+ tuple_a = ('A', 'a') + list_a = [tuple_a] + tuple_b = ('B', 'b') + list_b = [tuple_b] + + for id_vars, value_vars in ((tuple_a, list_b), (list_a, tuple_b), + (tuple_a, tuple_b)): + with tm.assert_raises_regex(ValueError, r'MultiIndex'): + self.df1.melt(id_vars=id_vars, value_vars=value_vars) + def test_custom_var_name(self): - result5 = melt(self.df, var_name=self.var_name) - self.assertEqual(result5.columns.tolist(), ['var', 'value']) + result5 = self.df.melt(var_name=self.var_name) + assert result5.columns.tolist() == ['var', 'value'] - result6 = melt(self.df, id_vars=['id1'], var_name=self.var_name) - self.assertEqual(result6.columns.tolist(), ['id1', 'var', 'value']) + result6 = self.df.melt(id_vars=['id1'], var_name=self.var_name) + assert result6.columns.tolist() == ['id1', 'var', 'value'] - result7 = melt(self.df, id_vars=['id1', 'id2'], var_name=self.var_name) - self.assertEqual(result7.columns.tolist(), ['id1', 'id2', 'var', - 'value']) + result7 = self.df.melt(id_vars=['id1', 'id2'], var_name=self.var_name) + assert result7.columns.tolist() == ['id1', 'id2', 'var', 'value'] - result8 = melt(self.df, id_vars=['id1', 'id2'], value_vars='A', - var_name=self.var_name) - self.assertEqual(result8.columns.tolist(), ['id1', 'id2', 'var', - 'value']) + result8 = self.df.melt(id_vars=['id1', 'id2'], value_vars='A', + var_name=self.var_name) + assert result8.columns.tolist() == ['id1', 'id2', 'var', 'value'] - result9 = melt(self.df, id_vars=['id1', 'id2'], value_vars=['A', 'B'], - var_name=self.var_name) + result9 = self.df.melt(id_vars=['id1', 'id2'], value_vars=['A', 'B'], + var_name=self.var_name) expected9 = DataFrame({'id1': self.df['id1'].tolist() * 2, 'id2': self.df['id2'].tolist() * 2, self.var_name: ['A'] * 10 + ['B'] * 10, @@ -86,24 +143,22 @@ def test_custom_var_name(self): tm.assert_frame_equal(result9, expected9) def test_custom_value_name(self): - result10 = melt(self.df, value_name=self.value_name) - self.assertEqual(result10.columns.tolist(), ['variable', 'val']) + result10 = self.df.melt(value_name=self.value_name) + assert result10.columns.tolist() == ['variable', 'val'] - result11 = melt(self.df, id_vars=['id1'], value_name=self.value_name) - self.assertEqual(result11.columns.tolist(), ['id1', 'variable', 'val']) + result11 = self.df.melt(id_vars=['id1'], value_name=self.value_name) + assert result11.columns.tolist() == ['id1', 'variable', 'val'] - result12 = melt(self.df, id_vars=['id1', 'id2'], - value_name=self.value_name) - self.assertEqual(result12.columns.tolist(), ['id1', 'id2', 'variable', - 'val']) + result12 = self.df.melt(id_vars=['id1', 'id2'], + value_name=self.value_name) + assert result12.columns.tolist() == ['id1', 'id2', 'variable', 'val'] - result13 = melt(self.df, id_vars=['id1', 'id2'], value_vars='A', - value_name=self.value_name) - self.assertEqual(result13.columns.tolist(), ['id1', 'id2', 'variable', - 'val']) + result13 = self.df.melt(id_vars=['id1', 'id2'], value_vars='A', + value_name=self.value_name) + assert result13.columns.tolist() == ['id1', 'id2', 'variable', 'val'] - result14 = melt(self.df, id_vars=['id1', 'id2'], value_vars=['A', 'B'], - value_name=self.value_name) + result14 = self.df.melt(id_vars=['id1', 'id2'], value_vars=['A', 'B'], + value_name=self.value_name) expected14 = DataFrame({'id1': self.df['id1'].tolist() * 2, 'id2': self.df['id2'].tolist() * 2, 'variable': ['A'] * 10 + ['B'] * 10, @@ -115,26 +170,27 @@ def test_custom_value_name(self): def test_custom_var_and_value_name(self): - result15 = melt(self.df, 
var_name=self.var_name, - value_name=self.value_name) - self.assertEqual(result15.columns.tolist(), ['var', 'val']) + result15 = self.df.melt(var_name=self.var_name, + value_name=self.value_name) + assert result15.columns.tolist() == ['var', 'val'] - result16 = melt(self.df, id_vars=['id1'], var_name=self.var_name, - value_name=self.value_name) - self.assertEqual(result16.columns.tolist(), ['id1', 'var', 'val']) + result16 = self.df.melt(id_vars=['id1'], var_name=self.var_name, + value_name=self.value_name) + assert result16.columns.tolist() == ['id1', 'var', 'val'] - result17 = melt(self.df, id_vars=['id1', 'id2'], - var_name=self.var_name, value_name=self.value_name) - self.assertEqual(result17.columns.tolist(), ['id1', 'id2', 'var', 'val' - ]) + result17 = self.df.melt(id_vars=['id1', 'id2'], + var_name=self.var_name, + value_name=self.value_name) + assert result17.columns.tolist() == ['id1', 'id2', 'var', 'val'] - result18 = melt(self.df, id_vars=['id1', 'id2'], value_vars='A', - var_name=self.var_name, value_name=self.value_name) - self.assertEqual(result18.columns.tolist(), ['id1', 'id2', 'var', 'val' - ]) + result18 = self.df.melt(id_vars=['id1', 'id2'], value_vars='A', + var_name=self.var_name, + value_name=self.value_name) + assert result18.columns.tolist() == ['id1', 'id2', 'var', 'val'] - result19 = melt(self.df, id_vars=['id1', 'id2'], value_vars=['A', 'B'], - var_name=self.var_name, value_name=self.value_name) + result19 = self.df.melt(id_vars=['id1', 'id2'], value_vars=['A', 'B'], + var_name=self.var_name, + value_name=self.value_name) expected19 = DataFrame({'id1': self.df['id1'].tolist() * 2, 'id2': self.df['id2'].tolist() * 2, self.var_name: ['A'] * 10 + ['B'] * 10, @@ -146,25 +202,25 @@ def test_custom_var_and_value_name(self): df20 = self.df.copy() df20.columns.name = 'foo' - result20 = melt(df20) - self.assertEqual(result20.columns.tolist(), ['foo', 'value']) + result20 = df20.melt() + assert result20.columns.tolist() == ['foo', 'value'] def test_col_level(self): - res1 = melt(self.df1, col_level=0) - res2 = melt(self.df1, col_level='CAP') - self.assertEqual(res1.columns.tolist(), ['CAP', 'value']) - self.assertEqual(res2.columns.tolist(), ['CAP', 'value']) + res1 = self.df1.melt(col_level=0) + res2 = self.df1.melt(col_level='CAP') + assert res1.columns.tolist() == ['CAP', 'value'] + assert res2.columns.tolist() == ['CAP', 'value'] def test_multiindex(self): - res = pd.melt(self.df1) - self.assertEqual(res.columns.tolist(), ['CAP', 'low', 'value']) + res = self.df1.melt() + assert res.columns.tolist() == ['CAP', 'low', 'value'] -class TestGetDummies(tm.TestCase): +class TestGetDummies(object): sparse = False - def setUp(self): + def setup_method(self, method): self.df = DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'b', 'c'], 'C': [1, 2, 3]}) @@ -198,25 +254,31 @@ def test_basic_types(self): 'b': ['A', 'A', 'B', 'C', 'C'], 'c': [2, 3, 3, 3, 2]}) + expected = DataFrame({'a': [1, 0, 0], + 'b': [0, 1, 0], + 'c': [0, 0, 1]}, + dtype='uint8', + columns=list('abc')) if not self.sparse: - exp_df_type = DataFrame - exp_blk_type = pd.core.internals.IntBlock + compare = tm.assert_frame_equal else: - exp_df_type = SparseDataFrame - exp_blk_type = pd.core.internals.SparseBlock + expected = expected.to_sparse(fill_value=0, kind='integer') + compare = tm.assert_sp_frame_equal + + result = get_dummies(s_list, sparse=self.sparse) + compare(result, expected) - self.assertEqual( - type(get_dummies(s_list, sparse=self.sparse)), exp_df_type) - self.assertEqual( - type(get_dummies(s_series, 
sparse=self.sparse)), exp_df_type) + result = get_dummies(s_series, sparse=self.sparse) + compare(result, expected) - r = get_dummies(s_df, sparse=self.sparse, columns=s_df.columns) - self.assertEqual(type(r), exp_df_type) + result = get_dummies(s_df, sparse=self.sparse, columns=s_df.columns) + tm.assert_series_equal(result.get_dtype_counts(), + Series({'uint8': 8})) - r = get_dummies(s_df, sparse=self.sparse, columns=['a']) - self.assertEqual(type(r[['a_0']]._data.blocks[0]), exp_blk_type) - self.assertEqual(type(r[['a_1']]._data.blocks[0]), exp_blk_type) - self.assertEqual(type(r[['a_2']]._data.blocks[0]), exp_blk_type) + result = get_dummies(s_df, sparse=self.sparse, columns=['a']) + expected = Series({'uint8': 3, 'int64': 1, 'object': 1}).sort_values() + tm.assert_series_equal(result.get_dtype_counts().sort_values(), + expected) def test_just_na(self): just_na_list = [np.nan] @@ -228,13 +290,13 @@ def test_just_na(self): res_series_index = get_dummies(just_na_series_index, sparse=self.sparse) - self.assertEqual(res_list.empty, True) - self.assertEqual(res_series.empty, True) - self.assertEqual(res_series_index.empty, True) + assert res_list.empty + assert res_series.empty + assert res_series_index.empty - self.assertEqual(res_list.index.tolist(), [0]) - self.assertEqual(res_series.index.tolist(), [0]) - self.assertEqual(res_series_index.index.tolist(), ['A']) + assert res_list.index.tolist() == [0] + assert res_series.index.tolist() == [0] + assert res_series_index.index.tolist() == ['A'] def test_include_na(self): s = ['a', 'b', np.nan] @@ -360,11 +422,11 @@ def test_dataframe_dummies_prefix_sep(self): assert_frame_equal(result, expected) def test_dataframe_dummies_prefix_bad_length(self): - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): get_dummies(self.df, prefix=['too few'], sparse=self.sparse) def test_dataframe_dummies_prefix_sep_bad_length(self): - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): get_dummies(self.df, prefix_sep=['bad'], sparse=self.sparse) def test_dataframe_dummies_prefix_dict(self): @@ -420,8 +482,8 @@ def test_dataframe_dummies_with_categorical(self): 'cat_x', 'cat_y']] assert_frame_equal(result, expected) - # GH12402 Add a new parameter `drop_first` to avoid collinearity def test_basic_drop_first(self): + # GH12402 Add a new parameter `drop_first` to avoid collinearity # Basic case s_list = list('abc') s_series = Series(s_list) @@ -581,8 +643,12 @@ def test_dataframe_dummies_preserve_categorical_dtype(self): class TestGetDummiesSparse(TestGetDummies): sparse = True + @pytest.mark.xfail(reason='nan in index is problematic (GH 16894)') + def test_include_na(self): + super(TestGetDummiesSparse, self).test_include_na() + -class TestMakeAxisDummies(tm.TestCase): +class TestMakeAxisDummies(object): def test_preserve_categorical_dtype(self): # GH13854 @@ -595,7 +661,7 @@ def test_preserve_categorical_dtype(self): expected = DataFrame([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], index=midx, columns=cidx) - from pandas.core.reshape import make_axis_dummies + from pandas.core.reshape.reshape import make_axis_dummies result = make_axis_dummies(df) tm.assert_frame_equal(result, expected) @@ -603,7 +669,7 @@ def test_preserve_categorical_dtype(self): tm.assert_frame_equal(result, expected) -class TestLreshape(tm.TestCase): +class TestLreshape(object): def test_pairs(self): data = {'birthdt': ['08jan2009', '20dec2008', '30dec2008', '21dec2008', @@ -672,10 +738,10 @@ def test_pairs(self): spec = {'visitdt': ['visitdt%d' % i for i in range(1, 
3)], 'wt': ['wt%d' % i for i in range(1, 4)]} - self.assertRaises(ValueError, lreshape, df, spec) + pytest.raises(ValueError, lreshape, df, spec) -class TestWideToLong(tm.TestCase): +class TestWideToLong(object): def test_simple(self): np.random.seed(123) @@ -714,7 +780,7 @@ def test_stubs(self): # TODO: unused? df_long = pd.wide_to_long(df, stubs, i='id', j='age') # noqa - self.assertEqual(stubs, ['inc', 'edu']) + assert stubs == ['inc', 'edu'] def test_separating_character(self): # GH14779 @@ -915,7 +981,13 @@ def test_multiple_id_columns(self): long_frame = wide_to_long(df, 'ht', i=['famid', 'birth'], j='age') tm.assert_frame_equal(long_frame, exp_frame) - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + def test_non_unique_idvars(self): + # GH16382 + # Raise an error message if non unique id vars (i) are passed + df = pd.DataFrame({ + 'A_A1': [1, 2, 3, 4, 5], + 'B_B1': [1, 2, 3, 4, 5], + 'x': [1, 1, 1, 1, 1] + }) + with pytest.raises(ValueError): + wide_to_long(df, ['A_A', 'B_B'], i='x', j='colname') diff --git a/pandas/tests/reshape/test_tile.py b/pandas/tests/reshape/test_tile.py new file mode 100644 index 0000000000000..91000747b41bb --- /dev/null +++ b/pandas/tests/reshape/test_tile.py @@ -0,0 +1,514 @@ +import os +import pytest + +import numpy as np +from pandas.compat import zip + +from pandas import (Series, Index, isna, + to_datetime, DatetimeIndex, Timestamp, + Interval, IntervalIndex, Categorical, + cut, qcut, date_range) +import pandas.util.testing as tm + +from pandas.core.algorithms import quantile +import pandas.core.reshape.tile as tmod + + +class TestCut(object): + + def test_simple(self): + data = np.ones(5, dtype='int64') + result = cut(data, 4, labels=False) + expected = np.array([1, 1, 1, 1, 1]) + tm.assert_numpy_array_equal(result, expected, + check_dtype=False) + + def test_bins(self): + data = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]) + result, bins = cut(data, 3, retbins=True) + + intervals = IntervalIndex.from_breaks(bins.round(3)) + expected = intervals.take([0, 0, 0, 1, 2, 0]).astype('category') + tm.assert_categorical_equal(result, expected) + tm.assert_almost_equal(bins, np.array([0.1905, 3.36666667, + 6.53333333, 9.7])) + + def test_right(self): + data = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1, 2.575]) + result, bins = cut(data, 4, right=True, retbins=True) + intervals = IntervalIndex.from_breaks(bins.round(3)) + expected = intervals.astype('category').take([0, 0, 0, 2, 3, 0, 0]) + tm.assert_categorical_equal(result, expected) + tm.assert_almost_equal(bins, np.array([0.1905, 2.575, 4.95, + 7.325, 9.7])) + + def test_noright(self): + data = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1, 2.575]) + result, bins = cut(data, 4, right=False, retbins=True) + intervals = IntervalIndex.from_breaks(bins.round(3), closed='left') + expected = intervals.take([0, 0, 0, 2, 3, 0, 1]).astype('category') + tm.assert_categorical_equal(result, expected) + tm.assert_almost_equal(bins, np.array([0.2, 2.575, 4.95, + 7.325, 9.7095])) + + def test_arraylike(self): + data = [.2, 1.4, 2.5, 6.2, 9.7, 2.1] + result, bins = cut(data, 3, retbins=True) + intervals = IntervalIndex.from_breaks(bins.round(3)) + expected = intervals.take([0, 0, 0, 1, 2, 0]).astype('category') + tm.assert_categorical_equal(result, expected) + tm.assert_almost_equal(bins, np.array([0.1905, 3.36666667, + 6.53333333, 9.7])) + + def test_bins_from_intervalindex(self): + c = cut(range(5), 3) + expected = c + result = cut(range(5), bins=expected.categories) + 
tm.assert_categorical_equal(result, expected)
+
+        expected = Categorical.from_codes(np.append(c.codes, -1),
+                                          categories=c.categories,
+                                          ordered=True)
+        result = cut(range(6), bins=expected.categories)
+        tm.assert_categorical_equal(result, expected)
+
+        # doc example
+        # make sure we preserve the bins
+        ages = np.array([10, 15, 13, 12, 23, 25, 28, 59, 60])
+        c = cut(ages, bins=[0, 18, 35, 70])
+        expected = IntervalIndex.from_tuples([(0, 18), (18, 35), (35, 70)])
+        tm.assert_index_equal(c.categories, expected)
+
+        result = cut([25, 20, 50], bins=c.categories)
+        tm.assert_index_equal(result.categories, expected)
+        tm.assert_numpy_array_equal(result.codes,
+                                    np.array([1, 1, 2], dtype='int8'))
+
+    def test_bins_not_monotonic(self):
+        data = [.2, 1.4, 2.5, 6.2, 9.7, 2.1]
+        pytest.raises(ValueError, cut, data, [0.1, 1.5, 1, 10])
+
+    def test_wrong_num_labels(self):
+        data = [.2, 1.4, 2.5, 6.2, 9.7, 2.1]
+        pytest.raises(ValueError, cut, data, [0, 1, 10],
+                      labels=['foo', 'bar', 'baz'])
+
+    def test_cut_corner(self):
+        # empty input and a fractional number of bins are both invalid
+        pytest.raises(ValueError, cut, [], 2)
+
+        pytest.raises(ValueError, cut, [1, 2, 3], 0.5)
+
+    def test_cut_out_of_range_more(self):
+        # #1511
+        s = Series([0, -1, 0, 1, -3], name='x')
+        ind = cut(s, [0, 1], labels=False)
+        exp = Series([np.nan, np.nan, np.nan, 0, np.nan], name='x')
+        tm.assert_series_equal(ind, exp)
+
+    def test_labels(self):
+        arr = np.tile(np.arange(0, 1.01, 0.1), 4)
+
+        result, bins = cut(arr, 4, retbins=True)
+        ex_levels = IntervalIndex.from_breaks([-1e-3, 0.25, 0.5, 0.75, 1])
+        tm.assert_index_equal(result.categories, ex_levels)
+
+        result, bins = cut(arr, 4, retbins=True, right=False)
+        ex_levels = IntervalIndex.from_breaks([0, 0.25, 0.5, 0.75, 1 + 1e-3],
+                                              closed='left')
+        tm.assert_index_equal(result.categories, ex_levels)
+
+    def test_cut_pass_series_name_to_factor(self):
+        s = Series(np.random.randn(100), name='foo')
+
+        factor = cut(s, 4)
+        assert factor.name == 'foo'
+
+    def test_label_precision(self):
+        arr = np.arange(0, 0.73, 0.01)
+
+        result = cut(arr, 4, precision=2)
+        ex_levels = IntervalIndex.from_breaks([-0.00072, 0.18, 0.36,
+                                               0.54, 0.72])
+        tm.assert_index_equal(result.categories, ex_levels)
+
+    def test_na_handling(self):
+        arr = np.arange(0, 0.75, 0.01)
+        arr[::3] = np.nan
+
+        result = cut(arr, 4)
+
+        result_arr = np.asarray(result)
+
+        ex_arr = np.where(isna(arr), np.nan, result_arr)
+
+        tm.assert_almost_equal(result_arr, ex_arr)
+
+        result = cut(arr, 4, labels=False)
+        ex_result = np.where(isna(arr), np.nan, result)
+        tm.assert_almost_equal(result, ex_result)
+
+    def test_inf_handling(self):
+        data = np.arange(6)
+        data_ser = Series(data, dtype='int64')
+
+        bins = [-np.inf, 2, 4, np.inf]
+        result = cut(data, bins)
+        result_ser = cut(data_ser, bins)
+
+        ex_uniques = IntervalIndex.from_breaks(bins)
+        tm.assert_index_equal(result.categories, ex_uniques)
+        assert result[5] == Interval(4, np.inf)
+        assert result[0] == Interval(-np.inf, 2)
+        assert result_ser[5] == Interval(4, np.inf)
+        assert result_ser[0] == Interval(-np.inf, 2)
+
+    def test_qcut(self):
+        arr = np.random.randn(1000)
+
+        # We store the bins as an Index that has been rounded,
+        # so comparisons are a bit tricky.
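+        # Concretely, the interval edges on labels.categories below are
+        # rounded values, so they are compared to the raw quantiles with
+        # np.allclose(..., atol=1e-2) rather than checked for exact equality.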
+ labels, bins = qcut(arr, 4, retbins=True) + ex_bins = quantile(arr, [0, .25, .5, .75, 1.]) + result = labels.categories.left.values + assert np.allclose(result, ex_bins[:-1], atol=1e-2) + result = labels.categories.right.values + assert np.allclose(result, ex_bins[1:], atol=1e-2) + + ex_levels = cut(arr, ex_bins, include_lowest=True) + tm.assert_categorical_equal(labels, ex_levels) + + def test_qcut_bounds(self): + arr = np.random.randn(1000) + + factor = qcut(arr, 10, labels=False) + assert len(np.unique(factor)) == 10 + + def test_qcut_specify_quantiles(self): + arr = np.random.randn(100) + + factor = qcut(arr, [0, .25, .5, .75, 1.]) + expected = qcut(arr, 4) + tm.assert_categorical_equal(factor, expected) + + def test_qcut_all_bins_same(self): + tm.assert_raises_regex(ValueError, "edges.*unique", qcut, + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 3) + + def test_cut_out_of_bounds(self): + arr = np.random.randn(100) + + result = cut(arr, [-1, 0, 1]) + + mask = isna(result) + ex_mask = (arr < -1) | (arr > 1) + tm.assert_numpy_array_equal(mask, ex_mask) + + def test_cut_pass_labels(self): + arr = [50, 5, 10, 15, 20, 30, 70] + bins = [0, 25, 50, 100] + labels = ['Small', 'Medium', 'Large'] + + result = cut(arr, bins, labels=labels) + exp = Categorical(['Medium'] + 4 * ['Small'] + ['Medium', 'Large'], + categories=labels, + ordered=True) + tm.assert_categorical_equal(result, exp) + + result = cut(arr, bins, labels=Categorical.from_codes([0, 1, 2], + labels)) + exp = Categorical.from_codes([1] + 4 * [0] + [1, 2], labels) + tm.assert_categorical_equal(result, exp) + + # issue 16459 + labels = ['Good', 'Medium', 'Bad'] + result = cut(arr, 3, labels=labels) + exp = cut(arr, 3, labels=Categorical(labels, categories=labels, + ordered=True)) + tm.assert_categorical_equal(result, exp) + + def test_qcut_include_lowest(self): + values = np.arange(10) + + ii = qcut(values, 4) + + ex_levels = IntervalIndex.from_intervals( + [Interval(-0.001, 2.25), + Interval(2.25, 4.5), + Interval(4.5, 6.75), + Interval(6.75, 9)]) + tm.assert_index_equal(ii.categories, ex_levels) + + def test_qcut_nas(self): + arr = np.random.randn(100) + arr[:20] = np.nan + + result = qcut(arr, 4) + assert isna(result[:20]).all() + + def test_qcut_index(self): + result = qcut([0, 2], 2) + expected = Index([Interval(-0.001, 1), Interval(1, 2)]).astype( + 'category') + tm.assert_categorical_equal(result, expected) + + def test_round_frac(self): + # it works + result = cut(np.arange(11.), 2) + + result = cut(np.arange(11.) 
/ 1e10, 2) + + # #1979, negative numbers + + result = tmod._round_frac(-117.9998, precision=3) + assert result == -118 + result = tmod._round_frac(117.9998, precision=3) + assert result == 118 + + result = tmod._round_frac(117.9998, precision=2) + assert result == 118 + result = tmod._round_frac(0.000123456, precision=2) + assert result == 0.00012 + + def test_qcut_binning_issues(self): + # #1978, 1979 + path = os.path.join(tm.get_data_path(), 'cut_data.csv') + arr = np.loadtxt(path) + + result = qcut(arr, 20) + + starts = [] + ends = [] + for lev in np.unique(result): + s = lev.left + e = lev.right + assert s != e + + starts.append(float(s)) + ends.append(float(e)) + + for (sp, sn), (ep, en) in zip(zip(starts[:-1], starts[1:]), + zip(ends[:-1], ends[1:])): + assert sp < sn + assert ep < en + assert ep <= sn + + def test_cut_return_intervals(self): + s = Series([0, 1, 2, 3, 4, 5, 6, 7, 8]) + res = cut(s, 3) + exp_bins = np.linspace(0, 8, num=4).round(3) + exp_bins[0] -= 0.008 + exp = Series(IntervalIndex.from_breaks(exp_bins, closed='right').take( + [0, 0, 0, 1, 1, 1, 2, 2, 2])).astype('category', ordered=True) + tm.assert_series_equal(res, exp) + + def test_qcut_return_intervals(self): + s = Series([0, 1, 2, 3, 4, 5, 6, 7, 8]) + res = qcut(s, [0, 0.333, 0.666, 1]) + exp_levels = np.array([Interval(-0.001, 2.664), + Interval(2.664, 5.328), Interval(5.328, 8)]) + exp = Series(exp_levels.take([0, 0, 0, 1, 1, 1, 2, 2, 2])).astype( + 'category', ordered=True) + tm.assert_series_equal(res, exp) + + def test_series_retbins(self): + # GH 8589 + s = Series(np.arange(4)) + result, bins = cut(s, 2, retbins=True) + expected = Series(IntervalIndex.from_breaks( + [-0.003, 1.5, 3], closed='right').repeat(2)).astype('category', + ordered=True) + tm.assert_series_equal(result, expected) + + result, bins = qcut(s, 2, retbins=True) + expected = Series(IntervalIndex.from_breaks( + [-0.001, 1.5, 3], closed='right').repeat(2)).astype('category', + ordered=True) + tm.assert_series_equal(result, expected) + + def test_qcut_duplicates_bin(self): + # GH 7751 + values = [0, 0, 0, 0, 1, 2, 3] + expected = IntervalIndex.from_intervals([Interval(-0.001, 1), + Interval(1, 3)]) + + result = qcut(values, 3, duplicates='drop') + tm.assert_index_equal(result.categories, expected) + + pytest.raises(ValueError, qcut, values, 3) + pytest.raises(ValueError, qcut, values, 3, duplicates='raise') + + # invalid + pytest.raises(ValueError, qcut, values, 3, duplicates='foo') + + def test_single_quantile(self): + # issue 15431 + expected = Series([0, 0]) + + s = Series([9., 9.]) + result = qcut(s, 1, labels=False) + tm.assert_series_equal(result, expected) + result = qcut(s, 1) + intervals = IntervalIndex([Interval(8.999, 9.0), + Interval(8.999, 9.0)], closed='right') + expected = Series(intervals).astype('category', ordered=True) + tm.assert_series_equal(result, expected) + + s = Series([-9., -9.]) + expected = Series([0, 0]) + result = qcut(s, 1, labels=False) + tm.assert_series_equal(result, expected) + result = qcut(s, 1) + intervals = IntervalIndex([Interval(-9.001, -9.0), + Interval(-9.001, -9.0)], closed='right') + expected = Series(intervals).astype('category', ordered=True) + tm.assert_series_equal(result, expected) + + s = Series([0., 0.]) + expected = Series([0, 0]) + result = qcut(s, 1, labels=False) + tm.assert_series_equal(result, expected) + result = qcut(s, 1) + intervals = IntervalIndex([Interval(-0.001, 0.0), + Interval(-0.001, 0.0)], closed='right') + expected = Series(intervals).astype('category', ordered=True) + 
tm.assert_series_equal(result, expected) + + s = Series([9]) + expected = Series([0]) + result = qcut(s, 1, labels=False) + tm.assert_series_equal(result, expected) + result = qcut(s, 1) + intervals = IntervalIndex([Interval(8.999, 9.0)], closed='right') + expected = Series(intervals).astype('category', ordered=True) + tm.assert_series_equal(result, expected) + + s = Series([-9]) + expected = Series([0]) + result = qcut(s, 1, labels=False) + tm.assert_series_equal(result, expected) + result = qcut(s, 1) + intervals = IntervalIndex([Interval(-9.001, -9.0)], closed='right') + expected = Series(intervals).astype('category', ordered=True) + tm.assert_series_equal(result, expected) + + s = Series([0]) + expected = Series([0]) + result = qcut(s, 1, labels=False) + tm.assert_series_equal(result, expected) + result = qcut(s, 1) + intervals = IntervalIndex([Interval(-0.001, 0.0)], closed='right') + expected = Series(intervals).astype('category', ordered=True) + tm.assert_series_equal(result, expected) + + def test_single_bin(self): + # issue 14652 + expected = Series([0, 0]) + + s = Series([9., 9.]) + result = cut(s, 1, labels=False) + tm.assert_series_equal(result, expected) + + s = Series([-9., -9.]) + result = cut(s, 1, labels=False) + tm.assert_series_equal(result, expected) + + expected = Series([0]) + + s = Series([9]) + result = cut(s, 1, labels=False) + tm.assert_series_equal(result, expected) + + s = Series([-9]) + result = cut(s, 1, labels=False) + tm.assert_series_equal(result, expected) + + # issue 15428 + expected = Series([0, 0]) + + s = Series([0., 0.]) + result = cut(s, 1, labels=False) + tm.assert_series_equal(result, expected) + + expected = Series([0]) + + s = Series([0]) + result = cut(s, 1, labels=False) + tm.assert_series_equal(result, expected) + + def test_datetime_cut(self): + # GH 14714 + # testing for time data to be present as series + data = to_datetime(Series(['2013-01-01', '2013-01-02', '2013-01-03'])) + + result, bins = cut(data, 3, retbins=True) + expected = ( + Series(IntervalIndex.from_intervals([ + Interval(Timestamp('2012-12-31 23:57:07.200000'), + Timestamp('2013-01-01 16:00:00')), + Interval(Timestamp('2013-01-01 16:00:00'), + Timestamp('2013-01-02 08:00:00')), + Interval(Timestamp('2013-01-02 08:00:00'), + Timestamp('2013-01-03 00:00:00'))])) + .astype('category', ordered=True)) + + tm.assert_series_equal(result, expected) + + # testing for time data to be present as list + data = [np.datetime64('2013-01-01'), np.datetime64('2013-01-02'), + np.datetime64('2013-01-03')] + result, bins = cut(data, 3, retbins=True) + tm.assert_series_equal(Series(result), expected) + + # testing for time data to be present as ndarray + data = np.array([np.datetime64('2013-01-01'), + np.datetime64('2013-01-02'), + np.datetime64('2013-01-03')]) + result, bins = cut(data, 3, retbins=True) + tm.assert_series_equal(Series(result), expected) + + # testing for time data to be present as datetime index + data = DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03']) + result, bins = cut(data, 3, retbins=True) + tm.assert_series_equal(Series(result), expected) + + def test_datetime_bin(self): + data = [np.datetime64('2012-12-13'), np.datetime64('2012-12-15')] + bin_data = ['2012-12-12', '2012-12-14', '2012-12-16'] + expected = ( + Series(IntervalIndex.from_intervals([ + Interval(Timestamp(bin_data[0]), Timestamp(bin_data[1])), + Interval(Timestamp(bin_data[1]), Timestamp(bin_data[2]))])) + .astype('category', ordered=True)) + + for conv in [Timestamp, Timestamp, np.datetime64]: + bins 
= [conv(v) for v in bin_data]
+            result = cut(data, bins=bins)
+            tm.assert_series_equal(Series(result), expected)
+
+        bin_pydatetime = [Timestamp(v).to_pydatetime() for v in bin_data]
+        result = cut(data, bins=bin_pydatetime)
+        tm.assert_series_equal(Series(result), expected)
+
+        bins = to_datetime(bin_data)
+        result = cut(data, bins=bins)
+        tm.assert_series_equal(Series(result), expected)
+
+    def test_datetime_nan(self):
+
+        def f():
+            cut(date_range('20130101', periods=3), bins=[0, 2, 4])
+        pytest.raises(ValueError, f)
+
+        result = cut(date_range('20130102', periods=5),
+                     bins=date_range('20130101', periods=2))
+        mask = result.categories.isna()
+        tm.assert_numpy_array_equal(mask, np.array([False]))
+        mask = result.isna()
+        tm.assert_numpy_array_equal(
+            mask, np.array([False, True, True, True, True]))
+
+
+def curpath():
+    pth, _ = os.path.split(os.path.abspath(__file__))
+    return pth
diff --git a/pandas/tests/reshape/test_union_categoricals.py b/pandas/tests/reshape/test_union_categoricals.py
new file mode 100644
index 0000000000000..eb80fb54b4016
--- /dev/null
+++ b/pandas/tests/reshape/test_union_categoricals.py
@@ -0,0 +1,335 @@
+import pytest
+
+import numpy as np
+import pandas as pd
+from pandas import Categorical, Series, CategoricalIndex
+from pandas.core.dtypes.concat import union_categoricals
+from pandas.util import testing as tm
+
+
+class TestUnionCategoricals(object):
+
+    def test_union_categorical(self):
+        # GH 13361
+        data = [
+            (list('abc'), list('abd'), list('abcabd')),
+            ([0, 1, 2], [2, 3, 4], [0, 1, 2, 2, 3, 4]),
+            ([0, 1.2, 2], [2, 3.4, 4], [0, 1.2, 2, 2, 3.4, 4]),
+
+            (['b', 'b', np.nan, 'a'], ['a', np.nan, 'c'],
+             ['b', 'b', np.nan, 'a', 'a', np.nan, 'c']),
+
+            (pd.date_range('2014-01-01', '2014-01-05'),
+             pd.date_range('2014-01-06', '2014-01-07'),
+             pd.date_range('2014-01-01', '2014-01-07')),
+
+            (pd.date_range('2014-01-01', '2014-01-05', tz='US/Central'),
+             pd.date_range('2014-01-06', '2014-01-07', tz='US/Central'),
+             pd.date_range('2014-01-01', '2014-01-07', tz='US/Central')),
+
+            (pd.period_range('2014-01-01', '2014-01-05'),
+             pd.period_range('2014-01-06', '2014-01-07'),
+             pd.period_range('2014-01-01', '2014-01-07')),
+        ]
+
+        for a, b, combined in data:
+            for box in [Categorical, CategoricalIndex, Series]:
+                result = union_categoricals([box(Categorical(a)),
+                                             box(Categorical(b))])
+                expected = Categorical(combined)
+                tm.assert_categorical_equal(result, expected,
+                                            check_category_order=True)
+
+        # new categories ordered by appearance
+        s = Categorical(['x', 'y', 'z'])
+        s2 = Categorical(['a', 'b', 'c'])
+        result = union_categoricals([s, s2])
+        expected = Categorical(['x', 'y', 'z', 'a', 'b', 'c'],
+                               categories=['x', 'y', 'z', 'a', 'b', 'c'])
+        tm.assert_categorical_equal(result, expected)
+
+        s = Categorical([0, 1.2, 2], ordered=True)
+        s2 = Categorical([0, 1.2, 2], ordered=True)
+        result = union_categoricals([s, s2])
+        expected = Categorical([0, 1.2, 2, 0, 1.2, 2], ordered=True)
+        tm.assert_categorical_equal(result, expected)
+
+        # must exactly match types
+        s = Categorical([0, 1.2, 2])
+        s2 = Categorical([2, 3, 4])
+        msg = 'dtype of categories must be the same'
+        with tm.assert_raises_regex(TypeError, msg):
+            union_categoricals([s, s2])
+
+        msg = 'No Categoricals to union'
+        with tm.assert_raises_regex(ValueError, msg):
+            union_categoricals([])
+
+    def test_union_categoricals_nan(self):
+        # GH 13759
+        res = union_categoricals([pd.Categorical([1, 2, np.nan]),
+                                  pd.Categorical([3, 2, np.nan])])
+        exp = Categorical([1, 2, np.nan, 3, 2, np.nan])
+
tm.assert_categorical_equal(res, exp) + + res = union_categoricals([pd.Categorical(['A', 'B']), + pd.Categorical(['B', 'B', np.nan])]) + exp = Categorical(['A', 'B', 'B', 'B', np.nan]) + tm.assert_categorical_equal(res, exp) + + val1 = [pd.Timestamp('2011-01-01'), pd.Timestamp('2011-03-01'), + pd.NaT] + val2 = [pd.NaT, pd.Timestamp('2011-01-01'), + pd.Timestamp('2011-02-01')] + + res = union_categoricals([pd.Categorical(val1), pd.Categorical(val2)]) + exp = Categorical(val1 + val2, + categories=[pd.Timestamp('2011-01-01'), + pd.Timestamp('2011-03-01'), + pd.Timestamp('2011-02-01')]) + tm.assert_categorical_equal(res, exp) + + # all NaN + res = union_categoricals([pd.Categorical([np.nan, np.nan]), + pd.Categorical(['X'])]) + exp = Categorical([np.nan, np.nan, 'X']) + tm.assert_categorical_equal(res, exp) + + res = union_categoricals([pd.Categorical([np.nan, np.nan]), + pd.Categorical([np.nan, np.nan])]) + exp = Categorical([np.nan, np.nan, np.nan, np.nan]) + tm.assert_categorical_equal(res, exp) + + def test_union_categoricals_empty(self): + # GH 13759 + res = union_categoricals([pd.Categorical([]), + pd.Categorical([])]) + exp = Categorical([]) + tm.assert_categorical_equal(res, exp) + + res = union_categoricals([Categorical([]), + Categorical(['1'])]) + exp = Categorical(['1']) + tm.assert_categorical_equal(res, exp) + + def test_union_categorical_same_category(self): + # check fastpath + c1 = Categorical([1, 2, 3, 4], categories=[1, 2, 3, 4]) + c2 = Categorical([3, 2, 1, np.nan], categories=[1, 2, 3, 4]) + res = union_categoricals([c1, c2]) + exp = Categorical([1, 2, 3, 4, 3, 2, 1, np.nan], + categories=[1, 2, 3, 4]) + tm.assert_categorical_equal(res, exp) + + c1 = Categorical(['z', 'z', 'z'], categories=['x', 'y', 'z']) + c2 = Categorical(['x', 'x', 'x'], categories=['x', 'y', 'z']) + res = union_categoricals([c1, c2]) + exp = Categorical(['z', 'z', 'z', 'x', 'x', 'x'], + categories=['x', 'y', 'z']) + tm.assert_categorical_equal(res, exp) + + def test_union_categoricals_ordered(self): + c1 = Categorical([1, 2, 3], ordered=True) + c2 = Categorical([1, 2, 3], ordered=False) + + msg = 'Categorical.ordered must be the same' + with tm.assert_raises_regex(TypeError, msg): + union_categoricals([c1, c2]) + + res = union_categoricals([c1, c1]) + exp = Categorical([1, 2, 3, 1, 2, 3], ordered=True) + tm.assert_categorical_equal(res, exp) + + c1 = Categorical([1, 2, 3, np.nan], ordered=True) + c2 = Categorical([3, 2], categories=[1, 2, 3], ordered=True) + + res = union_categoricals([c1, c2]) + exp = Categorical([1, 2, 3, np.nan, 3, 2], ordered=True) + tm.assert_categorical_equal(res, exp) + + c1 = Categorical([1, 2, 3], ordered=True) + c2 = Categorical([1, 2, 3], categories=[3, 2, 1], ordered=True) + + msg = "to union ordered Categoricals, all categories must be the same" + with tm.assert_raises_regex(TypeError, msg): + union_categoricals([c1, c2]) + + def test_union_categoricals_ignore_order(self): + # GH 15219 + c1 = Categorical([1, 2, 3], ordered=True) + c2 = Categorical([1, 2, 3], ordered=False) + + res = union_categoricals([c1, c2], ignore_order=True) + exp = Categorical([1, 2, 3, 1, 2, 3]) + tm.assert_categorical_equal(res, exp) + + msg = 'Categorical.ordered must be the same' + with tm.assert_raises_regex(TypeError, msg): + union_categoricals([c1, c2], ignore_order=False) + + res = union_categoricals([c1, c1], ignore_order=True) + exp = Categorical([1, 2, 3, 1, 2, 3]) + tm.assert_categorical_equal(res, exp) + + res = union_categoricals([c1, c1], ignore_order=False) + exp = Categorical([1, 2, 
3, 1, 2, 3], + categories=[1, 2, 3], ordered=True) + tm.assert_categorical_equal(res, exp) + + c1 = Categorical([1, 2, 3, np.nan], ordered=True) + c2 = Categorical([3, 2], categories=[1, 2, 3], ordered=True) + + res = union_categoricals([c1, c2], ignore_order=True) + exp = Categorical([1, 2, 3, np.nan, 3, 2]) + tm.assert_categorical_equal(res, exp) + + c1 = Categorical([1, 2, 3], ordered=True) + c2 = Categorical([1, 2, 3], categories=[3, 2, 1], ordered=True) + + res = union_categoricals([c1, c2], ignore_order=True) + exp = Categorical([1, 2, 3, 1, 2, 3]) + tm.assert_categorical_equal(res, exp) + + res = union_categoricals([c2, c1], ignore_order=True, + sort_categories=True) + exp = Categorical([1, 2, 3, 1, 2, 3], categories=[1, 2, 3]) + tm.assert_categorical_equal(res, exp) + + c1 = Categorical([1, 2, 3], ordered=True) + c2 = Categorical([4, 5, 6], ordered=True) + result = union_categoricals([c1, c2], ignore_order=True) + expected = Categorical([1, 2, 3, 4, 5, 6]) + tm.assert_categorical_equal(result, expected) + + msg = "to union ordered Categoricals, all categories must be the same" + with tm.assert_raises_regex(TypeError, msg): + union_categoricals([c1, c2], ignore_order=False) + + with tm.assert_raises_regex(TypeError, msg): + union_categoricals([c1, c2]) + + def test_union_categoricals_sort(self): + # GH 13846 + c1 = Categorical(['x', 'y', 'z']) + c2 = Categorical(['a', 'b', 'c']) + result = union_categoricals([c1, c2], sort_categories=True) + expected = Categorical(['x', 'y', 'z', 'a', 'b', 'c'], + categories=['a', 'b', 'c', 'x', 'y', 'z']) + tm.assert_categorical_equal(result, expected) + + # fastpath + c1 = Categorical(['a', 'b'], categories=['b', 'a', 'c']) + c2 = Categorical(['b', 'c'], categories=['b', 'a', 'c']) + result = union_categoricals([c1, c2], sort_categories=True) + expected = Categorical(['a', 'b', 'b', 'c'], + categories=['a', 'b', 'c']) + tm.assert_categorical_equal(result, expected) + + c1 = Categorical(['a', 'b'], categories=['c', 'a', 'b']) + c2 = Categorical(['b', 'c'], categories=['c', 'a', 'b']) + result = union_categoricals([c1, c2], sort_categories=True) + expected = Categorical(['a', 'b', 'b', 'c'], + categories=['a', 'b', 'c']) + tm.assert_categorical_equal(result, expected) + + # fastpath - skip resort + c1 = Categorical(['a', 'b'], categories=['a', 'b', 'c']) + c2 = Categorical(['b', 'c'], categories=['a', 'b', 'c']) + result = union_categoricals([c1, c2], sort_categories=True) + expected = Categorical(['a', 'b', 'b', 'c'], + categories=['a', 'b', 'c']) + tm.assert_categorical_equal(result, expected) + + c1 = Categorical(['x', np.nan]) + c2 = Categorical([np.nan, 'b']) + result = union_categoricals([c1, c2], sort_categories=True) + expected = Categorical(['x', np.nan, np.nan, 'b'], + categories=['b', 'x']) + tm.assert_categorical_equal(result, expected) + + c1 = Categorical([np.nan]) + c2 = Categorical([np.nan]) + result = union_categoricals([c1, c2], sort_categories=True) + expected = Categorical([np.nan, np.nan], categories=[]) + tm.assert_categorical_equal(result, expected) + + c1 = Categorical([]) + c2 = Categorical([]) + result = union_categoricals([c1, c2], sort_categories=True) + expected = Categorical([]) + tm.assert_categorical_equal(result, expected) + + c1 = Categorical(['b', 'a'], categories=['b', 'a', 'c'], ordered=True) + c2 = Categorical(['a', 'c'], categories=['b', 'a', 'c'], ordered=True) + with pytest.raises(TypeError): + union_categoricals([c1, c2], sort_categories=True) + + def test_union_categoricals_sort_false(self): + # GH 13846 + 
c1 = Categorical(['x', 'y', 'z']) + c2 = Categorical(['a', 'b', 'c']) + result = union_categoricals([c1, c2], sort_categories=False) + expected = Categorical(['x', 'y', 'z', 'a', 'b', 'c'], + categories=['x', 'y', 'z', 'a', 'b', 'c']) + tm.assert_categorical_equal(result, expected) + + # fastpath + c1 = Categorical(['a', 'b'], categories=['b', 'a', 'c']) + c2 = Categorical(['b', 'c'], categories=['b', 'a', 'c']) + result = union_categoricals([c1, c2], sort_categories=False) + expected = Categorical(['a', 'b', 'b', 'c'], + categories=['b', 'a', 'c']) + tm.assert_categorical_equal(result, expected) + + # fastpath - skip resort + c1 = Categorical(['a', 'b'], categories=['a', 'b', 'c']) + c2 = Categorical(['b', 'c'], categories=['a', 'b', 'c']) + result = union_categoricals([c1, c2], sort_categories=False) + expected = Categorical(['a', 'b', 'b', 'c'], + categories=['a', 'b', 'c']) + tm.assert_categorical_equal(result, expected) + + c1 = Categorical(['x', np.nan]) + c2 = Categorical([np.nan, 'b']) + result = union_categoricals([c1, c2], sort_categories=False) + expected = Categorical(['x', np.nan, np.nan, 'b'], + categories=['x', 'b']) + tm.assert_categorical_equal(result, expected) + + c1 = Categorical([np.nan]) + c2 = Categorical([np.nan]) + result = union_categoricals([c1, c2], sort_categories=False) + expected = Categorical([np.nan, np.nan], categories=[]) + tm.assert_categorical_equal(result, expected) + + c1 = Categorical([]) + c2 = Categorical([]) + result = union_categoricals([c1, c2], sort_categories=False) + expected = Categorical([]) + tm.assert_categorical_equal(result, expected) + + c1 = Categorical(['b', 'a'], categories=['b', 'a', 'c'], ordered=True) + c2 = Categorical(['a', 'c'], categories=['b', 'a', 'c'], ordered=True) + result = union_categoricals([c1, c2], sort_categories=False) + expected = Categorical(['b', 'a', 'a', 'c'], + categories=['b', 'a', 'c'], ordered=True) + tm.assert_categorical_equal(result, expected) + + def test_union_categorical_unwrap(self): + # GH 14173 + c1 = Categorical(['a', 'b']) + c2 = pd.Series(['b', 'c'], dtype='category') + result = union_categoricals([c1, c2]) + expected = Categorical(['a', 'b', 'b', 'c']) + tm.assert_categorical_equal(result, expected) + + c2 = CategoricalIndex(c2) + result = union_categoricals([c1, c2]) + tm.assert_categorical_equal(result, expected) + + c1 = Series(c1) + result = union_categoricals([c1, c2]) + tm.assert_categorical_equal(result, expected) + + with pytest.raises(TypeError): + union_categoricals([c1, ['a', 'b', 'c']]) diff --git a/pandas/tests/reshape/test_util.py b/pandas/tests/reshape/test_util.py new file mode 100644 index 0000000000000..e4a9591b95c26 --- /dev/null +++ b/pandas/tests/reshape/test_util.py @@ -0,0 +1,49 @@ + +import numpy as np +from pandas import date_range, Index +import pandas.util.testing as tm +from pandas.core.reshape.util import cartesian_product + + +class TestCartesianProduct(object): + + def test_simple(self): + x, y = list('ABC'), [1, 22] + result1, result2 = cartesian_product([x, y]) + expected1 = np.array(['A', 'A', 'B', 'B', 'C', 'C']) + expected2 = np.array([1, 22, 1, 22, 1, 22]) + tm.assert_numpy_array_equal(result1, expected1) + tm.assert_numpy_array_equal(result2, expected2) + + def test_datetimeindex(self): + # regression test for GitHub issue #6439 + # make sure that the ordering on datetimeindex is consistent + x = date_range('2000-01-01', periods=2) + result1, result2 = [Index(y).day for y in cartesian_product([x, x])] + expected1 = Index([1, 1, 2, 2]) + expected2 = 
Index([1, 2, 1, 2]) + tm.assert_index_equal(result1, expected1) + tm.assert_index_equal(result2, expected2) + + def test_empty(self): + # product of empty factors + X = [[], [0, 1], []] + Y = [[], [], ['a', 'b', 'c']] + for x, y in zip(X, Y): + expected1 = np.array([], dtype=np.asarray(x).dtype) + expected2 = np.array([], dtype=np.asarray(y).dtype) + result1, result2 = cartesian_product([x, y]) + tm.assert_numpy_array_equal(result1, expected1) + tm.assert_numpy_array_equal(result2, expected2) + + # empty product (empty input): + result = cartesian_product([]) + expected = [] + assert result == expected + + def test_invalid_input(self): + invalid_inputs = [1, [1], [1, 2], [[1], 2], + 'a', ['a'], ['a', 'b'], [['a'], 'b']] + msg = "Input must be a list-like of list-likes" + for X in invalid_inputs: + tm.assert_raises_regex(TypeError, msg, cartesian_product, X=X) diff --git a/pandas/tests/scalar/__init__.py b/pandas/tests/scalar/__init__.py new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/pandas/tests/scalar/test_interval.py b/pandas/tests/scalar/test_interval.py new file mode 100644 index 0000000000000..d431db0b4ca4f --- /dev/null +++ b/pandas/tests/scalar/test_interval.py @@ -0,0 +1,139 @@ +from __future__ import division + +from pandas import Interval + +import pytest +import pandas.util.testing as tm + + +@pytest.fixture +def interval(): + return Interval(0, 1) + + +class TestInterval(object): + + def test_properties(self, interval): + assert interval.closed == 'right' + assert interval.left == 0 + assert interval.right == 1 + assert interval.mid == 0.5 + + def test_repr(self, interval): + assert repr(interval) == "Interval(0, 1, closed='right')" + assert str(interval) == "(0, 1]" + + interval_left = Interval(0, 1, closed='left') + assert repr(interval_left) == "Interval(0, 1, closed='left')" + assert str(interval_left) == "[0, 1)" + + def test_contains(self, interval): + assert 0.5 in interval + assert 1 in interval + assert 0 not in interval + + msg = "__contains__ not defined for two intervals" + with tm.assert_raises_regex(TypeError, msg): + interval in interval + + interval_both = Interval(0, 1, closed='both') + assert 0 in interval_both + assert 1 in interval_both + + interval_neither = Interval(0, 1, closed='neither') + assert 0 not in interval_neither + assert 0.5 in interval_neither + assert 1 not in interval_neither + + def test_equal(self): + assert Interval(0, 1) == Interval(0, 1, closed='right') + assert Interval(0, 1) != Interval(0, 1, closed='left') + assert Interval(0, 1) != 0 + + def test_comparison(self): + with tm.assert_raises_regex(TypeError, 'unorderable types'): + Interval(0, 1) < 2 + + assert Interval(0, 1) < Interval(1, 2) + assert Interval(0, 1) < Interval(0, 2) + assert Interval(0, 1) < Interval(0.5, 1.5) + assert Interval(0, 1) <= Interval(0, 1) + assert Interval(0, 1) > Interval(-1, 2) + assert Interval(0, 1) >= Interval(0, 1) + + def test_hash(self, interval): + # should not raise + hash(interval) + + def test_math_add(self, interval): + expected = Interval(1, 2) + actual = interval + 1 + assert expected == actual + + expected = Interval(1, 2) + actual = 1 + interval + assert expected == actual + + actual = interval + actual += 1 + assert expected == actual + + msg = "unsupported operand type\(s\) for \+" + with tm.assert_raises_regex(TypeError, msg): + interval + Interval(1, 2) + + with tm.assert_raises_regex(TypeError, msg): + interval + 'foo' + + def test_math_sub(self, interval): + expected = Interval(-1, 0) + actual = interval - 1 + 
assert expected == actual + + actual = interval + actual -= 1 + assert expected == actual + + msg = "unsupported operand type\(s\) for -" + with tm.assert_raises_regex(TypeError, msg): + interval - Interval(1, 2) + + with tm.assert_raises_regex(TypeError, msg): + interval - 'foo' + + def test_math_mult(self, interval): + expected = Interval(0, 2) + actual = interval * 2 + assert expected == actual + + expected = Interval(0, 2) + actual = 2 * interval + assert expected == actual + + actual = interval + actual *= 2 + assert expected == actual + + msg = "unsupported operand type\(s\) for \*" + with tm.assert_raises_regex(TypeError, msg): + interval * Interval(1, 2) + + msg = "can\'t multiply sequence by non-int" + with tm.assert_raises_regex(TypeError, msg): + interval * 'foo' + + def test_math_div(self, interval): + expected = Interval(0, 0.5) + actual = interval / 2.0 + assert expected == actual + + actual = interval + actual /= 2.0 + assert expected == actual + + msg = "unsupported operand type\(s\) for /" + with tm.assert_raises_regex(TypeError, msg): + interval / Interval(1, 2) + + with tm.assert_raises_regex(TypeError, msg): + interval / 'foo' diff --git a/pandas/tests/scalar/test_nat.py b/pandas/tests/scalar/test_nat.py new file mode 100644 index 0000000000000..6f852f2b394e1 --- /dev/null +++ b/pandas/tests/scalar/test_nat.py @@ -0,0 +1,254 @@ +import pytest + +from datetime import datetime, timedelta +import pytz + +import numpy as np +from pandas import (NaT, Index, Timestamp, Timedelta, Period, + DatetimeIndex, PeriodIndex, + TimedeltaIndex, Series, isna) +from pandas.util import testing as tm +from pandas._libs.tslib import iNaT + + +@pytest.mark.parametrize('nat, idx', [(Timestamp('NaT'), DatetimeIndex), + (Timedelta('NaT'), TimedeltaIndex), + (Period('NaT', freq='M'), PeriodIndex)]) +def test_nat_fields(nat, idx): + + for field in idx._field_ops: + + # weekday is a property of DTI, but a method + # on NaT/Timestamp for compat with datetime + if field == 'weekday': + continue + + result = getattr(NaT, field) + assert np.isnan(result) + + result = getattr(nat, field) + assert np.isnan(result) + + for field in idx._bool_ops: + + result = getattr(NaT, field) + assert result is False + + result = getattr(nat, field) + assert result is False + + +def test_nat_vector_field_access(): + idx = DatetimeIndex(['1/1/2000', None, None, '1/4/2000']) + + for field in DatetimeIndex._field_ops: + # weekday is a property of DTI, but a method + # on NaT/Timestamp for compat with datetime + if field == 'weekday': + continue + + result = getattr(idx, field) + expected = Index([getattr(x, field) for x in idx]) + tm.assert_index_equal(result, expected) + + s = Series(idx) + + for field in DatetimeIndex._field_ops: + + # weekday is a property of DTI, but a method + # on NaT/Timestamp for compat with datetime + if field == 'weekday': + continue + + result = getattr(s.dt, field) + expected = [getattr(x, field) for x in idx] + tm.assert_series_equal(result, Series(expected)) + + for field in DatetimeIndex._bool_ops: + result = getattr(s.dt, field) + expected = [getattr(x, field) for x in idx] + tm.assert_series_equal(result, Series(expected)) + + +@pytest.mark.parametrize('klass', [Timestamp, Timedelta, Period]) +def test_identity(klass): + assert klass(None) is NaT + + result = klass(np.nan) + assert result is NaT + + result = klass(None) + assert result is NaT + + result = klass(iNaT) + assert result is NaT + + result = klass(np.nan) + assert result is NaT + + result = klass(float('nan')) + assert result 
is NaT
+
+    result = klass(NaT)
+    assert result is NaT
+
+    result = klass('NaT')
+    assert result is NaT
+
+    assert isna(klass('nat'))
+
+
+@pytest.mark.parametrize('klass', [Timestamp, Timedelta, Period])
+def test_equality(klass):
+
+    # nat
+    if klass is not Period:
+        assert klass('').value == iNaT
+    assert klass('nat').value == iNaT
+    assert klass('NAT').value == iNaT
+    assert klass(None).value == iNaT
+    assert klass(np.nan).value == iNaT
+    assert isna(klass('nat'))
+
+
+@pytest.mark.parametrize('klass', [Timestamp, Timedelta])
+def test_round_nat(klass):
+    # GH14940
+    ts = klass('nat')
+    for method in ["round", "floor", "ceil"]:
+        round_method = getattr(ts, method)
+        for freq in ["s", "5s", "min", "5min", "h", "5h"]:
+            assert round_method(freq) is ts
+
+
+def test_NaT_methods():
+    # GH 9513
+    raise_methods = ['astimezone', 'combine', 'ctime', 'dst',
+                     'fromordinal', 'fromtimestamp', 'isocalendar',
+                     'strftime', 'strptime', 'time', 'timestamp',
+                     'timetuple', 'timetz', 'toordinal', 'tzname',
+                     'utcfromtimestamp', 'utcnow', 'utcoffset',
+                     'utctimetuple']
+    nat_methods = ['date', 'now', 'replace', 'to_datetime', 'today',
+                   'tz_convert', 'tz_localize']
+    nan_methods = ['weekday', 'isoweekday']
+
+    for method in raise_methods:
+        if hasattr(NaT, method):
+            with pytest.raises(ValueError):
+                getattr(NaT, method)()
+
+    for method in nan_methods:
+        if hasattr(NaT, method):
+            assert np.isnan(getattr(NaT, method)())
+
+    for method in nat_methods:
+        if hasattr(NaT, method):
+            # see gh-8254
+            exp_warning = None
+            if method == 'to_datetime':
+                exp_warning = FutureWarning
+            with tm.assert_produces_warning(
+                    exp_warning, check_stacklevel=False):
+                assert getattr(NaT, method)() is NaT
+
+    # GH 12300
+    assert NaT.isoformat() == 'NaT'
+
+
+@pytest.mark.parametrize('klass', [Timestamp, Timedelta])
+def test_isoformat(klass):
+
+    result = klass('NaT').isoformat()
+    expected = 'NaT'
+    assert result == expected
+
+
+def test_nat_arithmetic():
+    # GH 6873
+    i = 2
+    f = 1.5
+
+    for (left, right) in [(NaT, i), (NaT, f), (NaT, np.nan)]:
+        assert left / right is NaT
+        assert left * right is NaT
+        assert right * left is NaT
+        with pytest.raises(TypeError):
+            right / left
+
+    # Timestamp / datetime
+    t = Timestamp('2014-01-01')
+    dt = datetime(2014, 1, 1)
+    for (left, right) in [(NaT, NaT), (NaT, t), (NaT, dt)]:
+        # NaT __add__ or __sub__ Timestamp-like (or inverse) returns NaT
+        assert right + left is NaT
+        assert left + right is NaT
+        assert left - right is NaT
+        assert right - left is NaT
+
+    # timedelta-like
+    # offsets are tested in test_offsets.py
+
+    delta = timedelta(3600)
+    td = Timedelta('5s')
+
+    for (left, right) in [(NaT, delta), (NaT, td)]:
+        # NaT + timedelta-like returns NaT
+        assert right + left is NaT
+        assert left + right is NaT
+        assert right - left is NaT
+        assert left - right is NaT
+
+    # GH 11718
+    t_utc = Timestamp('2014-01-01', tz='UTC')
+    t_tz = Timestamp('2014-01-01', tz='US/Eastern')
+    dt_tz = pytz.timezone('Asia/Tokyo').localize(dt)
+
+    for (left, right) in [(NaT, t_utc), (NaT, t_tz),
+                          (NaT, dt_tz)]:
+        # NaT __add__ or __sub__ Timestamp-like (or inverse) returns NaT
+        assert right + left is NaT
+        assert left + right is NaT
+        assert left - right is NaT
+        assert right - left is NaT
+
+    # int addition / subtraction
+    for (left, right) in [(NaT, 2), (NaT, 0), (NaT, -3)]:
+        assert right + left is NaT
+        assert left + right is NaT
+        assert left - right is NaT
+        assert right - left is NaT
+
+
+def test_nat_arithmetic_index():
+    # GH 11718
+
+    dti = DatetimeIndex(['2011-01-01', '2011-01-02'], name='x')
+    exp = DatetimeIndex([NaT,
NaT], name='x') + tm.assert_index_equal(dti + NaT, exp) + tm.assert_index_equal(NaT + dti, exp) + + dti_tz = DatetimeIndex(['2011-01-01', '2011-01-02'], + tz='US/Eastern', name='x') + exp = DatetimeIndex([NaT, NaT], name='x', tz='US/Eastern') + tm.assert_index_equal(dti_tz + NaT, exp) + tm.assert_index_equal(NaT + dti_tz, exp) + + exp = TimedeltaIndex([NaT, NaT], name='x') + for (left, right) in [(NaT, dti), (NaT, dti_tz)]: + tm.assert_index_equal(left - right, exp) + tm.assert_index_equal(right - left, exp) + + # timedelta + tdi = TimedeltaIndex(['1 day', '2 day'], name='x') + exp = DatetimeIndex([NaT, NaT], name='x') + for (left, right) in [(NaT, tdi)]: + tm.assert_index_equal(left + right, exp) + tm.assert_index_equal(right + left, exp) + tm.assert_index_equal(left - right, exp) + tm.assert_index_equal(right - left, exp) + + +def test_nat_pinned_docstrings(): + # GH17327 + assert NaT.ctime.__doc__ == datetime.ctime.__doc__ diff --git a/pandas/tests/scalar/test_period.py b/pandas/tests/scalar/test_period.py new file mode 100644 index 0000000000000..a167c9c738b0b --- /dev/null +++ b/pandas/tests/scalar/test_period.py @@ -0,0 +1,1419 @@ +import pytest + +import pytz +import numpy as np +from datetime import datetime, date, timedelta + +import pandas as pd +import pandas.util.testing as tm +import pandas.core.indexes.period as period +from pandas.compat import text_type, iteritems +from pandas.compat.numpy import np_datetime64_compat + +from pandas._libs import tslib, period as libperiod +from pandas import Period, Timestamp, offsets +from pandas.tseries.frequencies import DAYS, MONTHS + + +class TestPeriodProperties(object): + "Test properties such as year, month, weekday, etc...." + + def test_is_leap_year(self): + # GH 13727 + for freq in ['A', 'M', 'D', 'H']: + p = Period('2000-01-01 00:00:00', freq=freq) + assert p.is_leap_year + assert isinstance(p.is_leap_year, bool) + + p = Period('1999-01-01 00:00:00', freq=freq) + assert not p.is_leap_year + + p = Period('2004-01-01 00:00:00', freq=freq) + assert p.is_leap_year + + p = Period('2100-01-01 00:00:00', freq=freq) + assert not p.is_leap_year + + def test_quarterly_negative_ordinals(self): + p = Period(ordinal=-1, freq='Q-DEC') + assert p.year == 1969 + assert p.quarter == 4 + assert isinstance(p, Period) + + p = Period(ordinal=-2, freq='Q-DEC') + assert p.year == 1969 + assert p.quarter == 3 + assert isinstance(p, Period) + + p = Period(ordinal=-2, freq='M') + assert p.year == 1969 + assert p.month == 11 + assert isinstance(p, Period) + + def test_period_cons_quarterly(self): + # bugs in scikits.timeseries + for month in MONTHS: + freq = 'Q-%s' % month + exp = Period('1989Q3', freq=freq) + assert '1989Q3' in str(exp) + stamp = exp.to_timestamp('D', how='end') + p = Period(stamp, freq=freq) + assert p == exp + + stamp = exp.to_timestamp('3D', how='end') + p = Period(stamp, freq=freq) + assert p == exp + + def test_period_cons_annual(self): + # bugs in scikits.timeseries + for month in MONTHS: + freq = 'A-%s' % month + exp = Period('1989', freq=freq) + stamp = exp.to_timestamp('D', how='end') + timedelta(days=30) + p = Period(stamp, freq=freq) + assert p == exp + 1 + assert isinstance(p, Period) + + def test_period_cons_weekly(self): + for num in range(10, 17): + daystr = '2011-02-%d' % num + for day in DAYS: + freq = 'W-%s' % day + + result = Period(daystr, freq=freq) + expected = Period(daystr, freq='D').asfreq(freq) + assert result == expected + assert isinstance(result, Period) + + def test_period_from_ordinal(self): + p = 
pd.Period('2011-01', freq='M') + res = pd.Period._from_ordinal(p.ordinal, freq='M') + assert p == res + assert isinstance(res, Period) + + def test_period_cons_nat(self): + p = Period('NaT', freq='M') + assert p is pd.NaT + + p = Period('nat', freq='W-SUN') + assert p is pd.NaT + + p = Period(tslib.iNaT, freq='D') + assert p is pd.NaT + + p = Period(tslib.iNaT, freq='3D') + assert p is pd.NaT + + p = Period(tslib.iNaT, freq='1D1H') + assert p is pd.NaT + + p = Period('NaT') + assert p is pd.NaT + + p = Period(tslib.iNaT) + assert p is pd.NaT + + def test_period_cons_mult(self): + p1 = Period('2011-01', freq='3M') + p2 = Period('2011-01', freq='M') + assert p1.ordinal == p2.ordinal + + assert p1.freq == offsets.MonthEnd(3) + assert p1.freqstr == '3M' + + assert p2.freq == offsets.MonthEnd() + assert p2.freqstr == 'M' + + result = p1 + 1 + assert result.ordinal == (p2 + 3).ordinal + assert result.freq == p1.freq + assert result.freqstr == '3M' + + result = p1 - 1 + assert result.ordinal == (p2 - 3).ordinal + assert result.freq == p1.freq + assert result.freqstr == '3M' + + msg = ('Frequency must be positive, because it' + ' represents span: -3M') + with tm.assert_raises_regex(ValueError, msg): + Period('2011-01', freq='-3M') + + msg = ('Frequency must be positive, because it' ' represents span: 0M') + with tm.assert_raises_regex(ValueError, msg): + Period('2011-01', freq='0M') + + def test_period_cons_combined(self): + p = [(Period('2011-01', freq='1D1H'), + Period('2011-01', freq='1H1D'), + Period('2011-01', freq='H')), + (Period(ordinal=1, freq='1D1H'), + Period(ordinal=1, freq='1H1D'), + Period(ordinal=1, freq='H'))] + + for p1, p2, p3 in p: + assert p1.ordinal == p3.ordinal + assert p2.ordinal == p3.ordinal + + assert p1.freq == offsets.Hour(25) + assert p1.freqstr == '25H' + + assert p2.freq == offsets.Hour(25) + assert p2.freqstr == '25H' + + assert p3.freq == offsets.Hour() + assert p3.freqstr == 'H' + + result = p1 + 1 + assert result.ordinal == (p3 + 25).ordinal + assert result.freq == p1.freq + assert result.freqstr == '25H' + + result = p2 + 1 + assert result.ordinal == (p3 + 25).ordinal + assert result.freq == p2.freq + assert result.freqstr == '25H' + + result = p1 - 1 + assert result.ordinal == (p3 - 25).ordinal + assert result.freq == p1.freq + assert result.freqstr == '25H' + + result = p2 - 1 + assert result.ordinal == (p3 - 25).ordinal + assert result.freq == p2.freq + assert result.freqstr == '25H' + + msg = ('Frequency must be positive, because it' + ' represents span: -25H') + with tm.assert_raises_regex(ValueError, msg): + Period('2011-01', freq='-1D1H') + with tm.assert_raises_regex(ValueError, msg): + Period('2011-01', freq='-1H1D') + with tm.assert_raises_regex(ValueError, msg): + Period(ordinal=1, freq='-1D1H') + with tm.assert_raises_regex(ValueError, msg): + Period(ordinal=1, freq='-1H1D') + + msg = ('Frequency must be positive, because it' + ' represents span: 0D') + with tm.assert_raises_regex(ValueError, msg): + Period('2011-01', freq='0D0H') + with tm.assert_raises_regex(ValueError, msg): + Period(ordinal=1, freq='0D0H') + + # You can only combine together day and intraday offsets + msg = ('Invalid frequency: 1W1D') + with tm.assert_raises_regex(ValueError, msg): + Period('2011-01', freq='1W1D') + msg = ('Invalid frequency: 1D1W') + with tm.assert_raises_regex(ValueError, msg): + Period('2011-01', freq='1D1W') + + def test_timestamp_tz_arg(self): + for case in ['Europe/Brussels', 'Asia/Tokyo', 'US/Pacific']: + p = Period('1/1/2005', 
freq='M').to_timestamp(tz=case) + exp = Timestamp('1/1/2005', tz='UTC').tz_convert(case) + exp_zone = pytz.timezone(case).normalize(p) + + assert p == exp + assert p.tz == exp_zone.tzinfo + assert p.tz == exp.tz + + p = Period('1/1/2005', freq='3H').to_timestamp(tz=case) + exp = Timestamp('1/1/2005', tz='UTC').tz_convert(case) + exp_zone = pytz.timezone(case).normalize(p) + + assert p == exp + assert p.tz == exp_zone.tzinfo + assert p.tz == exp.tz + + p = Period('1/1/2005', freq='A').to_timestamp(freq='A', tz=case) + exp = Timestamp('31/12/2005', tz='UTC').tz_convert(case) + exp_zone = pytz.timezone(case).normalize(p) + + assert p == exp + assert p.tz == exp_zone.tzinfo + assert p.tz == exp.tz + + p = Period('1/1/2005', freq='A').to_timestamp(freq='3H', tz=case) + exp = Timestamp('1/1/2005', tz='UTC').tz_convert(case) + exp_zone = pytz.timezone(case).normalize(p) + + assert p == exp + assert p.tz == exp_zone.tzinfo + assert p.tz == exp.tz + + def test_timestamp_tz_arg_dateutil(self): + from pandas._libs.tslib import _dateutil_gettz as gettz + from pandas._libs.tslib import maybe_get_tz + for case in ['dateutil/Europe/Brussels', 'dateutil/Asia/Tokyo', + 'dateutil/US/Pacific']: + p = Period('1/1/2005', freq='M').to_timestamp( + tz=maybe_get_tz(case)) + exp = Timestamp('1/1/2005', tz='UTC').tz_convert(case) + assert p == exp + assert p.tz == gettz(case.split('/', 1)[1]) + assert p.tz == exp.tz + + p = Period('1/1/2005', + freq='M').to_timestamp(freq='3H', tz=maybe_get_tz(case)) + exp = Timestamp('1/1/2005', tz='UTC').tz_convert(case) + assert p == exp + assert p.tz == gettz(case.split('/', 1)[1]) + assert p.tz == exp.tz + + def test_timestamp_tz_arg_dateutil_from_string(self): + from pandas._libs.tslib import _dateutil_gettz as gettz + p = Period('1/1/2005', + freq='M').to_timestamp(tz='dateutil/Europe/Brussels') + assert p.tz == gettz('Europe/Brussels') + + def test_timestamp_mult(self): + p = pd.Period('2011-01', freq='M') + assert p.to_timestamp(how='S') == pd.Timestamp('2011-01-01') + assert p.to_timestamp(how='E') == pd.Timestamp('2011-01-31') + + p = pd.Period('2011-01', freq='3M') + assert p.to_timestamp(how='S') == pd.Timestamp('2011-01-01') + assert p.to_timestamp(how='E') == pd.Timestamp('2011-03-31') + + def test_construction(self): + i1 = Period('1/1/2005', freq='M') + i2 = Period('Jan 2005') + + assert i1 == i2 + + i1 = Period('2005', freq='A') + i2 = Period('2005') + i3 = Period('2005', freq='a') + + assert i1 == i2 + assert i1 == i3 + + i4 = Period('2005', freq='M') + i5 = Period('2005', freq='m') + + pytest.raises(ValueError, i1.__ne__, i4) + assert i4 == i5 + + i1 = Period.now('Q') + i2 = Period(datetime.now(), freq='Q') + i3 = Period.now('q') + + assert i1 == i2 + assert i1 == i3 + + i1 = Period('1982', freq='min') + i2 = Period('1982', freq='MIN') + assert i1 == i2 + i2 = Period('1982', freq=('Min', 1)) + assert i1 == i2 + + i1 = Period(year=2005, month=3, day=1, freq='D') + i2 = Period('3/1/2005', freq='D') + assert i1 == i2 + + i3 = Period(year=2005, month=3, day=1, freq='d') + assert i1 == i3 + + i1 = Period('2007-01-01 09:00:00.001') + expected = Period(datetime(2007, 1, 1, 9, 0, 0, 1000), freq='L') + assert i1 == expected + + expected = Period(np_datetime64_compat( + '2007-01-01 09:00:00.001Z'), freq='L') + assert i1 == expected + + i1 = Period('2007-01-01 09:00:00.00101') + expected = Period(datetime(2007, 1, 1, 9, 0, 0, 1010), freq='U') + assert i1 == expected + + expected = Period(np_datetime64_compat('2007-01-01 09:00:00.00101Z'), + freq='U') + assert i1 == 
expected + + pytest.raises(ValueError, Period, ordinal=200701) + + pytest.raises(ValueError, Period, '2007-1-1', freq='X') + + def test_construction_bday(self): + + # Biz day construction, roll forward if non-weekday + i1 = Period('3/10/12', freq='B') + i2 = Period('3/10/12', freq='D') + assert i1 == i2.asfreq('B') + i2 = Period('3/11/12', freq='D') + assert i1 == i2.asfreq('B') + i2 = Period('3/12/12', freq='D') + assert i1 == i2.asfreq('B') + + i3 = Period('3/10/12', freq='b') + assert i1 == i3 + + i1 = Period(year=2012, month=3, day=10, freq='B') + i2 = Period('3/12/12', freq='B') + assert i1 == i2 + + def test_construction_quarter(self): + + i1 = Period(year=2005, quarter=1, freq='Q') + i2 = Period('1/1/2005', freq='Q') + assert i1 == i2 + + i1 = Period(year=2005, quarter=3, freq='Q') + i2 = Period('9/1/2005', freq='Q') + assert i1 == i2 + + i1 = Period('2005Q1') + i2 = Period(year=2005, quarter=1, freq='Q') + i3 = Period('2005q1') + assert i1 == i2 + assert i1 == i3 + + i1 = Period('05Q1') + assert i1 == i2 + lower = Period('05q1') + assert i1 == lower + + i1 = Period('1Q2005') + assert i1 == i2 + lower = Period('1q2005') + assert i1 == lower + + i1 = Period('1Q05') + assert i1 == i2 + lower = Period('1q05') + assert i1 == lower + + i1 = Period('4Q1984') + assert i1.year == 1984 + lower = Period('4q1984') + assert i1 == lower + + def test_construction_month(self): + + expected = Period('2007-01', freq='M') + i1 = Period('200701', freq='M') + assert i1 == expected + + i1 = Period('200701', freq='M') + assert i1 == expected + + i1 = Period(200701, freq='M') + assert i1 == expected + + i1 = Period(ordinal=200701, freq='M') + assert i1.year == 18695 + + i1 = Period(datetime(2007, 1, 1), freq='M') + i2 = Period('200701', freq='M') + assert i1 == i2 + + i1 = Period(date(2007, 1, 1), freq='M') + i2 = Period(datetime(2007, 1, 1), freq='M') + i3 = Period(np.datetime64('2007-01-01'), freq='M') + i4 = Period(np_datetime64_compat('2007-01-01 00:00:00Z'), freq='M') + i5 = Period(np_datetime64_compat('2007-01-01 00:00:00.000Z'), freq='M') + assert i1 == i2 + assert i1 == i3 + assert i1 == i4 + assert i1 == i5 + + def test_period_constructor_offsets(self): + assert (Period('1/1/2005', freq=offsets.MonthEnd()) == + Period('1/1/2005', freq='M')) + assert (Period('2005', freq=offsets.YearEnd()) == + Period('2005', freq='A')) + assert (Period('2005', freq=offsets.MonthEnd()) == + Period('2005', freq='M')) + assert (Period('3/10/12', freq=offsets.BusinessDay()) == + Period('3/10/12', freq='B')) + assert (Period('3/10/12', freq=offsets.Day()) == + Period('3/10/12', freq='D')) + + assert (Period(year=2005, quarter=1, + freq=offsets.QuarterEnd(startingMonth=12)) == + Period(year=2005, quarter=1, freq='Q')) + assert (Period(year=2005, quarter=2, + freq=offsets.QuarterEnd(startingMonth=12)) == + Period(year=2005, quarter=2, freq='Q')) + + assert (Period(year=2005, month=3, day=1, freq=offsets.Day()) == + Period(year=2005, month=3, day=1, freq='D')) + assert (Period(year=2012, month=3, day=10, freq=offsets.BDay()) == + Period(year=2012, month=3, day=10, freq='B')) + + expected = Period('2005-03-01', freq='3D') + assert (Period(year=2005, month=3, day=1, + freq=offsets.Day(3)) == expected) + assert Period(year=2005, month=3, day=1, freq='3D') == expected + + assert (Period(year=2012, month=3, day=10, + freq=offsets.BDay(3)) == + Period(year=2012, month=3, day=10, freq='3B')) + + assert (Period(200701, freq=offsets.MonthEnd()) == + Period(200701, freq='M')) + + i1 = Period(ordinal=200701, 
freq=offsets.MonthEnd()) + i2 = Period(ordinal=200701, freq='M') + assert i1 == i2 + assert i1.year == 18695 + assert i2.year == 18695 + + i1 = Period(datetime(2007, 1, 1), freq='M') + i2 = Period('200701', freq='M') + assert i1 == i2 + + i1 = Period(date(2007, 1, 1), freq='M') + i2 = Period(datetime(2007, 1, 1), freq='M') + i3 = Period(np.datetime64('2007-01-01'), freq='M') + i4 = Period(np_datetime64_compat('2007-01-01 00:00:00Z'), freq='M') + i5 = Period(np_datetime64_compat('2007-01-01 00:00:00.000Z'), freq='M') + assert i1 == i2 + assert i1 == i3 + assert i1 == i4 + assert i1 == i5 + + i1 = Period('2007-01-01 09:00:00.001') + expected = Period(datetime(2007, 1, 1, 9, 0, 0, 1000), freq='L') + assert i1 == expected + + expected = Period(np_datetime64_compat( + '2007-01-01 09:00:00.001Z'), freq='L') + assert i1 == expected + + i1 = Period('2007-01-01 09:00:00.00101') + expected = Period(datetime(2007, 1, 1, 9, 0, 0, 1010), freq='U') + assert i1 == expected + + expected = Period(np_datetime64_compat('2007-01-01 09:00:00.00101Z'), + freq='U') + assert i1 == expected + + pytest.raises(ValueError, Period, ordinal=200701) + + pytest.raises(ValueError, Period, '2007-1-1', freq='X') + + def test_freq_str(self): + i1 = Period('1982', freq='Min') + assert i1.freq == offsets.Minute() + assert i1.freqstr == 'T' + + def test_period_deprecated_freq(self): + cases = {"M": ["MTH", "MONTH", "MONTHLY", "Mth", "month", "monthly"], + "B": ["BUS", "BUSINESS", "BUSINESSLY", "WEEKDAY", "bus"], + "D": ["DAY", "DLY", "DAILY", "Day", "Dly", "Daily"], + "H": ["HR", "HOUR", "HRLY", "HOURLY", "hr", "Hour", "HRly"], + "T": ["minute", "MINUTE", "MINUTELY", "minutely"], + "S": ["sec", "SEC", "SECOND", "SECONDLY", "second"], + "L": ["MILLISECOND", "MILLISECONDLY", "millisecond"], + "U": ["MICROSECOND", "MICROSECONDLY", "microsecond"], + "N": ["NANOSECOND", "NANOSECONDLY", "nanosecond"]} + + msg = pd.tseries.frequencies._INVALID_FREQ_ERROR + for exp, freqs in iteritems(cases): + for freq in freqs: + with tm.assert_raises_regex(ValueError, msg): + Period('2016-03-01 09:00', freq=freq) + with tm.assert_raises_regex(ValueError, msg): + Period(ordinal=1, freq=freq) + + # check supported freq-aliases still works + p1 = Period('2016-03-01 09:00', freq=exp) + p2 = Period(ordinal=1, freq=exp) + assert isinstance(p1, Period) + assert isinstance(p2, Period) + + def test_hash(self): + assert (hash(Period('2011-01', freq='M')) == + hash(Period('2011-01', freq='M'))) + + assert (hash(Period('2011-01-01', freq='D')) != + hash(Period('2011-01', freq='M'))) + + assert (hash(Period('2011-01', freq='3M')) != + hash(Period('2011-01', freq='2M'))) + + assert (hash(Period('2011-01', freq='M')) != + hash(Period('2011-02', freq='M'))) + + def test_repr(self): + p = Period('Jan-2000') + assert '2000-01' in repr(p) + + p = Period('2000-12-15') + assert '2000-12-15' in repr(p) + + def test_repr_nat(self): + p = Period('nat', freq='M') + assert repr(tslib.NaT) in repr(p) + + def test_millisecond_repr(self): + p = Period('2000-01-01 12:15:02.123') + + assert repr(p) == "Period('2000-01-01 12:15:02.123', 'L')" + + def test_microsecond_repr(self): + p = Period('2000-01-01 12:15:02.123567') + + assert repr(p) == "Period('2000-01-01 12:15:02.123567', 'U')" + + def test_strftime(self): + p = Period('2000-1-1 12:34:12', freq='S') + res = p.strftime('%Y-%m-%d %H:%M:%S') + assert res == '2000-01-01 12:34:12' + assert isinstance(res, text_type) # GH3363 + + def test_sub_delta(self): + left, right = Period('2011', freq='A'), Period('2007', freq='A') + 
result = left - right
+        assert result == 4
+
+        with pytest.raises(period.IncompatibleFrequency):
+            left - Period('2007-01', freq='M')
+
+    def test_to_timestamp(self):
+        p = Period('1982', freq='A')
+        start_ts = p.to_timestamp(how='S')
+        aliases = ['s', 'StarT', 'BEGIn']
+        for a in aliases:
+            assert start_ts == p.to_timestamp('D', how=a)
+            # freq with mult should not affect the result
+            assert start_ts == p.to_timestamp('3D', how=a)
+
+        end_ts = p.to_timestamp(how='E')
+        aliases = ['e', 'end', 'FINIsH']
+        for a in aliases:
+            assert end_ts == p.to_timestamp('D', how=a)
+            assert end_ts == p.to_timestamp('3D', how=a)
+
+        from_lst = ['A', 'Q', 'M', 'W', 'B', 'D', 'H', 'Min', 'S']
+
+        def _ex(p):
+            return Timestamp((p + 1).start_time.value - 1)
+
+        for i, fcode in enumerate(from_lst):
+            p = Period('1982', freq=fcode)
+            result = p.to_timestamp().to_period(fcode)
+            assert result == p
+
+            assert p.start_time == p.to_timestamp(how='S')
+
+            assert p.end_time == _ex(p)
+
+        # Frequency other than daily
+
+        p = Period('1985', freq='A')
+
+        result = p.to_timestamp('H', how='end')
+        expected = datetime(1985, 12, 31, 23)
+        assert result == expected
+        result = p.to_timestamp('3H', how='end')
+        assert result == expected
+
+        result = p.to_timestamp('T', how='end')
+        expected = datetime(1985, 12, 31, 23, 59)
+        assert result == expected
+        result = p.to_timestamp('2T', how='end')
+        assert result == expected
+
+        result = p.to_timestamp(how='end')
+        expected = datetime(1985, 12, 31)
+        assert result == expected
+
+        expected = datetime(1985, 1, 1)
+        result = p.to_timestamp('H', how='start')
+        assert result == expected
+        result = p.to_timestamp('T', how='start')
+        assert result == expected
+        result = p.to_timestamp('S', how='start')
+        assert result == expected
+        result = p.to_timestamp('3H', how='start')
+        assert result == expected
+        result = p.to_timestamp('5S', how='start')
+        assert result == expected
+
+    def test_start_time(self):
+        freq_lst = ['A', 'Q', 'M', 'D', 'H', 'T', 'S']
+        xp = datetime(2012, 1, 1)
+        for f in freq_lst:
+            p = Period('2012', freq=f)
+            assert p.start_time == xp
+        assert Period('2012', freq='B').start_time == datetime(2012, 1, 2)
+        assert Period('2012', freq='W').start_time == datetime(2011, 12, 26)
+
+    def test_end_time(self):
+        p = Period('2012', freq='A')
+
+        def _ex(*args):
+            return Timestamp(Timestamp(datetime(*args)).value - 1)
+
+        xp = _ex(2013, 1, 1)
+        assert xp == p.end_time
+
+        p = Period('2012', freq='Q')
+        xp = _ex(2012, 4, 1)
+        assert xp == p.end_time
+
+        p = Period('2012', freq='M')
+        xp = _ex(2012, 2, 1)
+        assert xp == p.end_time
+
+        p = Period('2012', freq='D')
+        xp = _ex(2012, 1, 2)
+        assert xp == p.end_time
+
+        p = Period('2012', freq='H')
+        xp = _ex(2012, 1, 1, 1)
+        assert xp == p.end_time
+
+        p = Period('2012', freq='B')
+        xp = _ex(2012, 1, 3)
+        assert xp == p.end_time
+
+        p = Period('2012', freq='W')
+        xp = _ex(2012, 1, 2)
+        assert xp == p.end_time
+
+        # Test for GH 11738
+        p = Period('2012', freq='15D')
+        xp = _ex(2012, 1, 16)
+        assert xp == p.end_time
+
+        p = Period('2012', freq='1D1H')
+        xp = _ex(2012, 1, 2, 1)
+        assert xp == p.end_time
+
+        p = Period('2012', freq='1H1D')
+        xp = _ex(2012, 1, 2, 1)
+        assert xp == p.end_time
+
+    def test_anchor_week_end_time(self):
+        def _ex(*args):
+            return Timestamp(Timestamp(datetime(*args)).value - 1)
+
+        p = Period('2013-1-1', 'W-SAT')
+        xp = _ex(2013, 1, 6)
+        assert p.end_time == xp
+
+    def test_properties_annually(self):
+        # Test properties on Periods with annual frequency.
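+        # Note: the default annual alias 'A' resolves to 'A-DEC', so
+        # Period(freq='A', year=2007) covers 2007-01-01 through
+        # 2007-12-31; the year check below relies on that span.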
+        a_date = Period(freq='A', year=2007)
+        assert a_date.year == 2007
+
+    def test_properties_quarterly(self):
+        # Test properties on Periods with quarterly frequency.
+        qedec_date = Period(freq="Q-DEC", year=2007, quarter=1)
+        qejan_date = Period(freq="Q-JAN", year=2007, quarter=1)
+        qejun_date = Period(freq="Q-JUN", year=2007, quarter=1)
+        #
+        for x in range(3):
+            for qd in (qedec_date, qejan_date, qejun_date):
+                assert (qd + x).qyear == 2007
+                assert (qd + x).quarter == x + 1
+
+    def test_properties_monthly(self):
+        # Test properties on Periods with monthly frequency.
+        m_date = Period(freq='M', year=2007, month=1)
+        for x in range(11):
+            m_ival_x = m_date + x
+            assert m_ival_x.year == 2007
+            if 1 <= x + 1 <= 3:
+                assert m_ival_x.quarter == 1
+            elif 4 <= x + 1 <= 6:
+                assert m_ival_x.quarter == 2
+            elif 7 <= x + 1 <= 9:
+                assert m_ival_x.quarter == 3
+            elif 10 <= x + 1 <= 12:
+                assert m_ival_x.quarter == 4
+            assert m_ival_x.month == x + 1
+
+    def test_properties_weekly(self):
+        # Test properties on Periods with weekly frequency.
+        w_date = Period(freq='W', year=2007, month=1, day=7)
+        #
+        assert w_date.year == 2007
+        assert w_date.quarter == 1
+        assert w_date.month == 1
+        assert w_date.week == 1
+        assert (w_date - 1).week == 52
+        assert w_date.days_in_month == 31
+        assert Period(freq='W', year=2012,
+                      month=2, day=1).days_in_month == 29
+
+    def test_properties_weekly_legacy(self):
+        # Test properties on Periods with weekly frequency.
+        w_date = Period(freq='W', year=2007, month=1, day=7)
+        assert w_date.year == 2007
+        assert w_date.quarter == 1
+        assert w_date.month == 1
+        assert w_date.week == 1
+        assert (w_date - 1).week == 52
+        assert w_date.days_in_month == 31
+
+        exp = Period(freq='W', year=2012, month=2, day=1)
+        assert exp.days_in_month == 29
+
+        msg = pd.tseries.frequencies._INVALID_FREQ_ERROR
+        with tm.assert_raises_regex(ValueError, msg):
+            Period(freq='WK', year=2007, month=1, day=7)
+
+    def test_properties_daily(self):
+        # Test properties on Periods with daily frequency.
+        b_date = Period(freq='B', year=2007, month=1, day=1)
+        #
+        assert b_date.year == 2007
+        assert b_date.quarter == 1
+        assert b_date.month == 1
+        assert b_date.day == 1
+        assert b_date.weekday == 0
+        assert b_date.dayofyear == 1
+        assert b_date.days_in_month == 31
+        assert Period(freq='B', year=2012,
+                      month=2, day=1).days_in_month == 29
+
+        d_date = Period(freq='D', year=2007, month=1, day=1)
+
+        assert d_date.year == 2007
+        assert d_date.quarter == 1
+        assert d_date.month == 1
+        assert d_date.day == 1
+        assert d_date.weekday == 0
+        assert d_date.dayofyear == 1
+        assert d_date.days_in_month == 31
+        assert Period(freq='D', year=2012, month=2,
+                      day=1).days_in_month == 29
+
+    def test_properties_hourly(self):
+        # Test properties on Periods with hourly frequency.
+        h_date1 = Period(freq='H', year=2007, month=1, day=1, hour=0)
+        h_date2 = Period(freq='2H', year=2007, month=1, day=1, hour=0)
+
+        for h_date in [h_date1, h_date2]:
+            assert h_date.year == 2007
+            assert h_date.quarter == 1
+            assert h_date.month == 1
+            assert h_date.day == 1
+            assert h_date.weekday == 0
+            assert h_date.dayofyear == 1
+            assert h_date.hour == 0
+            assert h_date.days_in_month == 31
+            assert Period(freq='H', year=2012, month=2, day=1,
+                          hour=0).days_in_month == 29
+
+    def test_properties_minutely(self):
+        # Test properties on Periods with minutely frequency.
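+        # Note: a 'Min' Period covers a single minute (here
+        # 2007-01-01 00:00), so the calendar properties below are all
+        # read from the start of that minute.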
+        t_date = Period(freq='Min', year=2007, month=1, day=1, hour=0,
+                        minute=0)
+        #
+        assert t_date.year == 2007
+        assert t_date.quarter == 1
+        assert t_date.month == 1
+        assert t_date.day == 1
+        assert t_date.weekday == 0
+        assert t_date.dayofyear == 1
+        assert t_date.hour == 0
+        assert t_date.minute == 0
+        assert t_date.days_in_month == 31
+        assert Period(freq='Min', year=2012, month=2, day=1, hour=0,
+                      minute=0).days_in_month == 29
+
+    def test_properties_secondly(self):
+        # Test properties on Periods with secondly frequency.
+        s_date = Period(freq='S', year=2007, month=1, day=1, hour=0,
+                        minute=0, second=0)
+        #
+        assert s_date.year == 2007
+        assert s_date.quarter == 1
+        assert s_date.month == 1
+        assert s_date.day == 1
+        assert s_date.weekday == 0
+        assert s_date.dayofyear == 1
+        assert s_date.hour == 0
+        assert s_date.minute == 0
+        assert s_date.second == 0
+        assert s_date.days_in_month == 31
+        assert Period(freq='S', year=2012, month=2, day=1, hour=0,
+                      minute=0, second=0).days_in_month == 29
+
+    def test_pnow(self):
+
+        # deprecation, xref #13790
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            period.pnow('D')
+
+    def test_constructor_corner(self):
+        expected = Period('2007-01', freq='2M')
+        assert Period(year=2007, month=1, freq='2M') == expected
+
+        pytest.raises(ValueError, Period, datetime.now())
+        pytest.raises(ValueError, Period, datetime.now().date())
+        pytest.raises(ValueError, Period, 1.6, freq='D')
+        pytest.raises(ValueError, Period, ordinal=1.6, freq='D')
+        pytest.raises(ValueError, Period, ordinal=2, value=1, freq='D')
+        assert Period(None) is pd.NaT
+        pytest.raises(ValueError, Period, month=1)
+
+        p = Period('2007-01-01', freq='D')
+
+        result = Period(p, freq='A')
+        exp = Period('2007', freq='A')
+        assert result == exp
+
+    def test_constructor_infer_freq(self):
+        p = Period('2007-01-01')
+        assert p.freq == 'D'
+
+        p = Period('2007-01-01 07')
+        assert p.freq == 'H'
+
+        p = Period('2007-01-01 07:10')
+        assert p.freq == 'T'
+
+        p = Period('2007-01-01 07:10:15')
+        assert p.freq == 'S'
+
+        p = Period('2007-01-01 07:10:15.123')
+        assert p.freq == 'L'
+
+        p = Period('2007-01-01 07:10:15.123000')
+        assert p.freq == 'L'
+
+        p = Period('2007-01-01 07:10:15.123400')
+        assert p.freq == 'U'
+
+    def test_badinput(self):
+        pytest.raises(ValueError, Period, '-2000', 'A')
+        pytest.raises(tslib.DateParseError, Period, '0', 'A')
+        pytest.raises(tslib.DateParseError, Period, '1/1/-2000', 'A')
+
+    def test_multiples(self):
+        result1 = Period('1989', freq='2A')
+        result2 = Period('1989', freq='A')
+        assert result1.ordinal == result2.ordinal
+        assert result1.freqstr == '2A-DEC'
+        assert result2.freqstr == 'A-DEC'
+        assert result1.freq == offsets.YearEnd(2)
+        assert result2.freq == offsets.YearEnd()
+
+        assert (result1 + 1).ordinal == result1.ordinal + 2
+        assert (1 + result1).ordinal == result1.ordinal + 2
+        assert (result1 - 1).ordinal == result2.ordinal - 2
+        assert (-1 + result1).ordinal == result2.ordinal - 2
+
+    def test_round_trip(self):
+
+        p = Period('2000Q1')
+        new_p = tm.round_trip_pickle(p)
+        assert new_p == p
+
+
+class TestPeriodField(object):
+
+    def test_get_period_field_raises_on_out_of_range(self):
+        pytest.raises(ValueError, libperiod.get_period_field, -1, 0, 0)
+
+    def test_get_period_field_array_raises_on_out_of_range(self):
+        pytest.raises(ValueError, libperiod.get_period_field_arr, -1,
+                      np.empty(1), 0)
+
+
+class TestComparisons(object):
+
+    def setup_method(self, method):
+        self.january1 = Period('2000-01', 'M')
+        self.january2 = Period('2000-01', 'M')
+        self.february
= Period('2000-02', 'M') + self.march = Period('2000-03', 'M') + self.day = Period('2012-01-01', 'D') + + def test_equal(self): + assert self.january1 == self.january2 + + def test_equal_Raises_Value(self): + with pytest.raises(period.IncompatibleFrequency): + self.january1 == self.day + + def test_notEqual(self): + assert self.january1 != 1 + assert self.january1 != self.february + + def test_greater(self): + assert self.february > self.january1 + + def test_greater_Raises_Value(self): + with pytest.raises(period.IncompatibleFrequency): + self.january1 > self.day + + def test_greater_Raises_Type(self): + with pytest.raises(TypeError): + self.january1 > 1 + + def test_greaterEqual(self): + assert self.january1 >= self.january2 + + def test_greaterEqual_Raises_Value(self): + with pytest.raises(period.IncompatibleFrequency): + self.january1 >= self.day + + with pytest.raises(TypeError): + print(self.january1 >= 1) + + def test_smallerEqual(self): + assert self.january1 <= self.january2 + + def test_smallerEqual_Raises_Value(self): + with pytest.raises(period.IncompatibleFrequency): + self.january1 <= self.day + + def test_smallerEqual_Raises_Type(self): + with pytest.raises(TypeError): + self.january1 <= 1 + + def test_smaller(self): + assert self.january1 < self.february + + def test_smaller_Raises_Value(self): + with pytest.raises(period.IncompatibleFrequency): + self.january1 < self.day + + def test_smaller_Raises_Type(self): + with pytest.raises(TypeError): + self.january1 < 1 + + def test_sort(self): + periods = [self.march, self.january1, self.february] + correctPeriods = [self.january1, self.february, self.march] + assert sorted(periods) == correctPeriods + + def test_period_nat_comp(self): + p_nat = Period('NaT', freq='D') + p = Period('2011-01-01', freq='D') + + nat = pd.Timestamp('NaT') + t = pd.Timestamp('2011-01-01') + # confirm Period('NaT') work identical with Timestamp('NaT') + for left, right in [(p_nat, p), (p, p_nat), (p_nat, p_nat), (nat, t), + (t, nat), (nat, nat)]: + assert not left < right + assert not left > right + assert not left == right + assert left != right + assert not left <= right + assert not left >= right + + +class TestMethods(object): + + def test_add(self): + dt1 = Period(freq='D', year=2008, month=1, day=1) + dt2 = Period(freq='D', year=2008, month=1, day=2) + assert dt1 + 1 == dt2 + assert 1 + dt1 == dt2 + + def test_add_pdnat(self): + p = pd.Period('2011-01', freq='M') + assert p + pd.NaT is pd.NaT + assert pd.NaT + p is pd.NaT + + p = pd.Period('NaT', freq='M') + assert p + pd.NaT is pd.NaT + assert pd.NaT + p is pd.NaT + + def test_add_raises(self): + # GH 4731 + dt1 = Period(freq='D', year=2008, month=1, day=1) + dt2 = Period(freq='D', year=2008, month=1, day=2) + msg = r"unsupported operand type\(s\)" + with tm.assert_raises_regex(TypeError, msg): + dt1 + "str" + + msg = r"unsupported operand type\(s\)" + with tm.assert_raises_regex(TypeError, msg): + "str" + dt1 + + with tm.assert_raises_regex(TypeError, msg): + dt1 + dt2 + + def test_sub(self): + dt1 = Period('2011-01-01', freq='D') + dt2 = Period('2011-01-15', freq='D') + + assert dt1 - dt2 == -14 + assert dt2 - dt1 == 14 + + msg = r"Input has different freq=M from Period\(freq=D\)" + with tm.assert_raises_regex(period.IncompatibleFrequency, msg): + dt1 - pd.Period('2011-02', freq='M') + + def test_add_offset(self): + # freq is DateOffset + for freq in ['A', '2A', '3A']: + p = Period('2011', freq=freq) + exp = Period('2013', freq=freq) + assert p + offsets.YearEnd(2) == exp + assert 
offsets.YearEnd(2) + p == exp + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(365, 'D'), + timedelta(365)]: + with pytest.raises(period.IncompatibleFrequency): + p + o + + if isinstance(o, np.timedelta64): + with pytest.raises(TypeError): + o + p + else: + with pytest.raises(period.IncompatibleFrequency): + o + p + + for freq in ['M', '2M', '3M']: + p = Period('2011-03', freq=freq) + exp = Period('2011-05', freq=freq) + assert p + offsets.MonthEnd(2) == exp + assert offsets.MonthEnd(2) + p == exp + + exp = Period('2012-03', freq=freq) + assert p + offsets.MonthEnd(12) == exp + assert offsets.MonthEnd(12) + p == exp + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(365, 'D'), + timedelta(365)]: + with pytest.raises(period.IncompatibleFrequency): + p + o + + if isinstance(o, np.timedelta64): + with pytest.raises(TypeError): + o + p + else: + with pytest.raises(period.IncompatibleFrequency): + o + p + + # freq is Tick + for freq in ['D', '2D', '3D']: + p = Period('2011-04-01', freq=freq) + + exp = Period('2011-04-06', freq=freq) + assert p + offsets.Day(5) == exp + assert offsets.Day(5) + p == exp + + exp = Period('2011-04-02', freq=freq) + assert p + offsets.Hour(24) == exp + assert offsets.Hour(24) + p == exp + + exp = Period('2011-04-03', freq=freq) + assert p + np.timedelta64(2, 'D') == exp + with pytest.raises(TypeError): + np.timedelta64(2, 'D') + p + + exp = Period('2011-04-02', freq=freq) + assert p + np.timedelta64(3600 * 24, 's') == exp + with pytest.raises(TypeError): + np.timedelta64(3600 * 24, 's') + p + + exp = Period('2011-03-30', freq=freq) + assert p + timedelta(-2) == exp + assert timedelta(-2) + p == exp + + exp = Period('2011-04-03', freq=freq) + assert p + timedelta(hours=48) == exp + assert timedelta(hours=48) + p == exp + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(4, 'h'), + timedelta(hours=23)]: + with pytest.raises(period.IncompatibleFrequency): + p + o + + if isinstance(o, np.timedelta64): + with pytest.raises(TypeError): + o + p + else: + with pytest.raises(period.IncompatibleFrequency): + o + p + + for freq in ['H', '2H', '3H']: + p = Period('2011-04-01 09:00', freq=freq) + + exp = Period('2011-04-03 09:00', freq=freq) + assert p + offsets.Day(2) == exp + assert offsets.Day(2) + p == exp + + exp = Period('2011-04-01 12:00', freq=freq) + assert p + offsets.Hour(3) == exp + assert offsets.Hour(3) + p == exp + + exp = Period('2011-04-01 12:00', freq=freq) + assert p + np.timedelta64(3, 'h') == exp + with pytest.raises(TypeError): + np.timedelta64(3, 'h') + p + + exp = Period('2011-04-01 10:00', freq=freq) + assert p + np.timedelta64(3600, 's') == exp + with pytest.raises(TypeError): + np.timedelta64(3600, 's') + p + + exp = Period('2011-04-01 11:00', freq=freq) + assert p + timedelta(minutes=120) == exp + assert timedelta(minutes=120) + p == exp + + exp = Period('2011-04-05 12:00', freq=freq) + assert p + timedelta(days=4, minutes=180) == exp + assert timedelta(days=4, minutes=180) + p == exp + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(3200, 's'), + timedelta(hours=23, minutes=30)]: + with pytest.raises(period.IncompatibleFrequency): + p + o + + if isinstance(o, np.timedelta64): + with pytest.raises(TypeError): + o + p + else: + with pytest.raises(period.IncompatibleFrequency): + o + p + + def test_add_offset_nat(self): + # freq is DateOffset + for freq in ['A', '2A', '3A']: + p = 
Period('NaT', freq=freq) + for o in [offsets.YearEnd(2)]: + assert p + o is tslib.NaT + assert o + p is tslib.NaT + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(365, 'D'), + timedelta(365)]: + assert p + o is tslib.NaT + + if isinstance(o, np.timedelta64): + with pytest.raises(TypeError): + o + p + else: + assert o + p is tslib.NaT + + for freq in ['M', '2M', '3M']: + p = Period('NaT', freq=freq) + for o in [offsets.MonthEnd(2), offsets.MonthEnd(12)]: + assert p + o is tslib.NaT + + if isinstance(o, np.timedelta64): + with pytest.raises(TypeError): + o + p + else: + assert o + p is tslib.NaT + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(365, 'D'), + timedelta(365)]: + assert p + o is tslib.NaT + + if isinstance(o, np.timedelta64): + with pytest.raises(TypeError): + o + p + else: + assert o + p is tslib.NaT + + # freq is Tick + for freq in ['D', '2D', '3D']: + p = Period('NaT', freq=freq) + for o in [offsets.Day(5), offsets.Hour(24), np.timedelta64(2, 'D'), + np.timedelta64(3600 * 24, 's'), timedelta(-2), + timedelta(hours=48)]: + assert p + o is tslib.NaT + + if isinstance(o, np.timedelta64): + with pytest.raises(TypeError): + o + p + else: + assert o + p is tslib.NaT + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(4, 'h'), + timedelta(hours=23)]: + assert p + o is tslib.NaT + + if isinstance(o, np.timedelta64): + with pytest.raises(TypeError): + o + p + else: + assert o + p is tslib.NaT + + for freq in ['H', '2H', '3H']: + p = Period('NaT', freq=freq) + for o in [offsets.Day(2), offsets.Hour(3), np.timedelta64(3, 'h'), + np.timedelta64(3600, 's'), timedelta(minutes=120), + timedelta(days=4, minutes=180)]: + assert p + o is tslib.NaT + + if not isinstance(o, np.timedelta64): + assert o + p is tslib.NaT + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(3200, 's'), + timedelta(hours=23, minutes=30)]: + assert p + o is tslib.NaT + + if isinstance(o, np.timedelta64): + with pytest.raises(TypeError): + o + p + else: + assert o + p is tslib.NaT + + def test_sub_pdnat(self): + # GH 13071 + p = pd.Period('2011-01', freq='M') + assert p - pd.NaT is pd.NaT + assert pd.NaT - p is pd.NaT + + p = pd.Period('NaT', freq='M') + assert p - pd.NaT is pd.NaT + assert pd.NaT - p is pd.NaT + + def test_sub_offset(self): + # freq is DateOffset + for freq in ['A', '2A', '3A']: + p = Period('2011', freq=freq) + assert p - offsets.YearEnd(2) == Period('2009', freq=freq) + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(365, 'D'), + timedelta(365)]: + with pytest.raises(period.IncompatibleFrequency): + p - o + + for freq in ['M', '2M', '3M']: + p = Period('2011-03', freq=freq) + assert p - offsets.MonthEnd(2) == Period('2011-01', freq=freq) + assert p - offsets.MonthEnd(12) == Period('2010-03', freq=freq) + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(365, 'D'), + timedelta(365)]: + with pytest.raises(period.IncompatibleFrequency): + p - o + + # freq is Tick + for freq in ['D', '2D', '3D']: + p = Period('2011-04-01', freq=freq) + assert p - offsets.Day(5) == Period('2011-03-27', freq=freq) + assert p - offsets.Hour(24) == Period('2011-03-31', freq=freq) + assert p - np.timedelta64(2, 'D') == Period( + '2011-03-30', freq=freq) + assert p - np.timedelta64(3600 * 24, 's') == Period( + '2011-03-31', freq=freq) + assert p - timedelta(-2) == 
Period('2011-04-03', freq=freq) + assert p - timedelta(hours=48) == Period('2011-03-30', freq=freq) + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(4, 'h'), + timedelta(hours=23)]: + with pytest.raises(period.IncompatibleFrequency): + p - o + + for freq in ['H', '2H', '3H']: + p = Period('2011-04-01 09:00', freq=freq) + assert p - offsets.Day(2) == Period('2011-03-30 09:00', freq=freq) + assert p - offsets.Hour(3) == Period('2011-04-01 06:00', freq=freq) + assert p - np.timedelta64(3, 'h') == Period( + '2011-04-01 06:00', freq=freq) + assert p - np.timedelta64(3600, 's') == Period( + '2011-04-01 08:00', freq=freq) + assert p - timedelta(minutes=120) == Period( + '2011-04-01 07:00', freq=freq) + assert p - timedelta(days=4, minutes=180) == Period( + '2011-03-28 06:00', freq=freq) + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(3200, 's'), + timedelta(hours=23, minutes=30)]: + with pytest.raises(period.IncompatibleFrequency): + p - o + + def test_sub_offset_nat(self): + # freq is DateOffset + for freq in ['A', '2A', '3A']: + p = Period('NaT', freq=freq) + for o in [offsets.YearEnd(2)]: + assert p - o is tslib.NaT + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(365, 'D'), + timedelta(365)]: + assert p - o is tslib.NaT + + for freq in ['M', '2M', '3M']: + p = Period('NaT', freq=freq) + for o in [offsets.MonthEnd(2), offsets.MonthEnd(12)]: + assert p - o is tslib.NaT + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(365, 'D'), + timedelta(365)]: + assert p - o is tslib.NaT + + # freq is Tick + for freq in ['D', '2D', '3D']: + p = Period('NaT', freq=freq) + for o in [offsets.Day(5), offsets.Hour(24), np.timedelta64(2, 'D'), + np.timedelta64(3600 * 24, 's'), timedelta(-2), + timedelta(hours=48)]: + assert p - o is tslib.NaT + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(4, 'h'), + timedelta(hours=23)]: + assert p - o is tslib.NaT + + for freq in ['H', '2H', '3H']: + p = Period('NaT', freq=freq) + for o in [offsets.Day(2), offsets.Hour(3), np.timedelta64(3, 'h'), + np.timedelta64(3600, 's'), timedelta(minutes=120), + timedelta(days=4, minutes=180)]: + assert p - o is tslib.NaT + + for o in [offsets.YearBegin(2), offsets.MonthBegin(1), + offsets.Minute(), np.timedelta64(3200, 's'), + timedelta(hours=23, minutes=30)]: + assert p - o is tslib.NaT + + def test_nat_ops(self): + for freq in ['M', '2M', '3M']: + p = Period('NaT', freq=freq) + assert p + 1 is tslib.NaT + assert 1 + p is tslib.NaT + assert p - 1 is tslib.NaT + assert p - Period('2011-01', freq=freq) is tslib.NaT + assert Period('2011-01', freq=freq) - p is tslib.NaT + + def test_period_ops_offset(self): + p = Period('2011-04-01', freq='D') + result = p + offsets.Day() + exp = pd.Period('2011-04-02', freq='D') + assert result == exp + + result = p - offsets.Day(2) + exp = pd.Period('2011-03-30', freq='D') + assert result == exp + + msg = r"Input cannot be converted to Period\(freq=D\)" + with tm.assert_raises_regex(period.IncompatibleFrequency, msg): + p + offsets.Hour(2) + + with tm.assert_raises_regex(period.IncompatibleFrequency, msg): + p - offsets.Hour(2) + + +def test_period_immutable(): + # see gh-17116 + per = pd.Period('2014Q1') + with pytest.raises(AttributeError): + per.ordinal = 14 + + freq = per.freq + with pytest.raises(AttributeError): + per.freq = 2 * freq diff --git 
a/pandas/tests/scalar/test_period_asfreq.py b/pandas/tests/scalar/test_period_asfreq.py new file mode 100644 index 0000000000000..32cea60c333b7 --- /dev/null +++ b/pandas/tests/scalar/test_period_asfreq.py @@ -0,0 +1,716 @@ +import pandas as pd +from pandas import Period, offsets +from pandas.util import testing as tm +from pandas.tseries.frequencies import _period_code_map + + +class TestFreqConversion(object): + """Test frequency conversion of date objects""" + + def test_asfreq_corner(self): + val = Period(freq='A', year=2007) + result1 = val.asfreq('5t') + result2 = val.asfreq('t') + expected = Period('2007-12-31 23:59', freq='t') + assert result1.ordinal == expected.ordinal + assert result1.freqstr == '5T' + assert result2.ordinal == expected.ordinal + assert result2.freqstr == 'T' + + def test_conv_annual(self): + # frequency conversion tests: from Annual Frequency + + ival_A = Period(freq='A', year=2007) + + ival_AJAN = Period(freq="A-JAN", year=2007) + ival_AJUN = Period(freq="A-JUN", year=2007) + ival_ANOV = Period(freq="A-NOV", year=2007) + + ival_A_to_Q_start = Period(freq='Q', year=2007, quarter=1) + ival_A_to_Q_end = Period(freq='Q', year=2007, quarter=4) + ival_A_to_M_start = Period(freq='M', year=2007, month=1) + ival_A_to_M_end = Period(freq='M', year=2007, month=12) + ival_A_to_W_start = Period(freq='W', year=2007, month=1, day=1) + ival_A_to_W_end = Period(freq='W', year=2007, month=12, day=31) + ival_A_to_B_start = Period(freq='B', year=2007, month=1, day=1) + ival_A_to_B_end = Period(freq='B', year=2007, month=12, day=31) + ival_A_to_D_start = Period(freq='D', year=2007, month=1, day=1) + ival_A_to_D_end = Period(freq='D', year=2007, month=12, day=31) + ival_A_to_H_start = Period(freq='H', year=2007, month=1, day=1, hour=0) + ival_A_to_H_end = Period(freq='H', year=2007, month=12, day=31, + hour=23) + ival_A_to_T_start = Period(freq='Min', year=2007, month=1, day=1, + hour=0, minute=0) + ival_A_to_T_end = Period(freq='Min', year=2007, month=12, day=31, + hour=23, minute=59) + ival_A_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0, + minute=0, second=0) + ival_A_to_S_end = Period(freq='S', year=2007, month=12, day=31, + hour=23, minute=59, second=59) + + ival_AJAN_to_D_end = Period(freq='D', year=2007, month=1, day=31) + ival_AJAN_to_D_start = Period(freq='D', year=2006, month=2, day=1) + ival_AJUN_to_D_end = Period(freq='D', year=2007, month=6, day=30) + ival_AJUN_to_D_start = Period(freq='D', year=2006, month=7, day=1) + ival_ANOV_to_D_end = Period(freq='D', year=2007, month=11, day=30) + ival_ANOV_to_D_start = Period(freq='D', year=2006, month=12, day=1) + + assert ival_A.asfreq('Q', 'S') == ival_A_to_Q_start + assert ival_A.asfreq('Q', 'e') == ival_A_to_Q_end + assert ival_A.asfreq('M', 's') == ival_A_to_M_start + assert ival_A.asfreq('M', 'E') == ival_A_to_M_end + assert ival_A.asfreq('W', 'S') == ival_A_to_W_start + assert ival_A.asfreq('W', 'E') == ival_A_to_W_end + assert ival_A.asfreq('B', 'S') == ival_A_to_B_start + assert ival_A.asfreq('B', 'E') == ival_A_to_B_end + assert ival_A.asfreq('D', 'S') == ival_A_to_D_start + assert ival_A.asfreq('D', 'E') == ival_A_to_D_end + assert ival_A.asfreq('H', 'S') == ival_A_to_H_start + assert ival_A.asfreq('H', 'E') == ival_A_to_H_end + assert ival_A.asfreq('min', 'S') == ival_A_to_T_start + assert ival_A.asfreq('min', 'E') == ival_A_to_T_end + assert ival_A.asfreq('T', 'S') == ival_A_to_T_start + assert ival_A.asfreq('T', 'E') == ival_A_to_T_end + assert ival_A.asfreq('S', 'S') == ival_A_to_S_start + 
assert ival_A.asfreq('S', 'E') == ival_A_to_S_end + + assert ival_AJAN.asfreq('D', 'S') == ival_AJAN_to_D_start + assert ival_AJAN.asfreq('D', 'E') == ival_AJAN_to_D_end + + assert ival_AJUN.asfreq('D', 'S') == ival_AJUN_to_D_start + assert ival_AJUN.asfreq('D', 'E') == ival_AJUN_to_D_end + + assert ival_ANOV.asfreq('D', 'S') == ival_ANOV_to_D_start + assert ival_ANOV.asfreq('D', 'E') == ival_ANOV_to_D_end + + assert ival_A.asfreq('A') == ival_A + + def test_conv_quarterly(self): + # frequency conversion tests: from Quarterly Frequency + + ival_Q = Period(freq='Q', year=2007, quarter=1) + ival_Q_end_of_year = Period(freq='Q', year=2007, quarter=4) + + ival_QEJAN = Period(freq="Q-JAN", year=2007, quarter=1) + ival_QEJUN = Period(freq="Q-JUN", year=2007, quarter=1) + + ival_Q_to_A = Period(freq='A', year=2007) + ival_Q_to_M_start = Period(freq='M', year=2007, month=1) + ival_Q_to_M_end = Period(freq='M', year=2007, month=3) + ival_Q_to_W_start = Period(freq='W', year=2007, month=1, day=1) + ival_Q_to_W_end = Period(freq='W', year=2007, month=3, day=31) + ival_Q_to_B_start = Period(freq='B', year=2007, month=1, day=1) + ival_Q_to_B_end = Period(freq='B', year=2007, month=3, day=30) + ival_Q_to_D_start = Period(freq='D', year=2007, month=1, day=1) + ival_Q_to_D_end = Period(freq='D', year=2007, month=3, day=31) + ival_Q_to_H_start = Period(freq='H', year=2007, month=1, day=1, hour=0) + ival_Q_to_H_end = Period(freq='H', year=2007, month=3, day=31, hour=23) + ival_Q_to_T_start = Period(freq='Min', year=2007, month=1, day=1, + hour=0, minute=0) + ival_Q_to_T_end = Period(freq='Min', year=2007, month=3, day=31, + hour=23, minute=59) + ival_Q_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0, + minute=0, second=0) + ival_Q_to_S_end = Period(freq='S', year=2007, month=3, day=31, hour=23, + minute=59, second=59) + + ival_QEJAN_to_D_start = Period(freq='D', year=2006, month=2, day=1) + ival_QEJAN_to_D_end = Period(freq='D', year=2006, month=4, day=30) + + ival_QEJUN_to_D_start = Period(freq='D', year=2006, month=7, day=1) + ival_QEJUN_to_D_end = Period(freq='D', year=2006, month=9, day=30) + + assert ival_Q.asfreq('A') == ival_Q_to_A + assert ival_Q_end_of_year.asfreq('A') == ival_Q_to_A + + assert ival_Q.asfreq('M', 'S') == ival_Q_to_M_start + assert ival_Q.asfreq('M', 'E') == ival_Q_to_M_end + assert ival_Q.asfreq('W', 'S') == ival_Q_to_W_start + assert ival_Q.asfreq('W', 'E') == ival_Q_to_W_end + assert ival_Q.asfreq('B', 'S') == ival_Q_to_B_start + assert ival_Q.asfreq('B', 'E') == ival_Q_to_B_end + assert ival_Q.asfreq('D', 'S') == ival_Q_to_D_start + assert ival_Q.asfreq('D', 'E') == ival_Q_to_D_end + assert ival_Q.asfreq('H', 'S') == ival_Q_to_H_start + assert ival_Q.asfreq('H', 'E') == ival_Q_to_H_end + assert ival_Q.asfreq('Min', 'S') == ival_Q_to_T_start + assert ival_Q.asfreq('Min', 'E') == ival_Q_to_T_end + assert ival_Q.asfreq('S', 'S') == ival_Q_to_S_start + assert ival_Q.asfreq('S', 'E') == ival_Q_to_S_end + + assert ival_QEJAN.asfreq('D', 'S') == ival_QEJAN_to_D_start + assert ival_QEJAN.asfreq('D', 'E') == ival_QEJAN_to_D_end + assert ival_QEJUN.asfreq('D', 'S') == ival_QEJUN_to_D_start + assert ival_QEJUN.asfreq('D', 'E') == ival_QEJUN_to_D_end + + assert ival_Q.asfreq('Q') == ival_Q + + def test_conv_monthly(self): + # frequency conversion tests: from Monthly Frequency + + ival_M = Period(freq='M', year=2007, month=1) + ival_M_end_of_year = Period(freq='M', year=2007, month=12) + ival_M_end_of_quarter = Period(freq='M', year=2007, month=3) + ival_M_to_A = 
Period(freq='A', year=2007) + ival_M_to_Q = Period(freq='Q', year=2007, quarter=1) + ival_M_to_W_start = Period(freq='W', year=2007, month=1, day=1) + ival_M_to_W_end = Period(freq='W', year=2007, month=1, day=31) + ival_M_to_B_start = Period(freq='B', year=2007, month=1, day=1) + ival_M_to_B_end = Period(freq='B', year=2007, month=1, day=31) + ival_M_to_D_start = Period(freq='D', year=2007, month=1, day=1) + ival_M_to_D_end = Period(freq='D', year=2007, month=1, day=31) + ival_M_to_H_start = Period(freq='H', year=2007, month=1, day=1, hour=0) + ival_M_to_H_end = Period(freq='H', year=2007, month=1, day=31, hour=23) + ival_M_to_T_start = Period(freq='Min', year=2007, month=1, day=1, + hour=0, minute=0) + ival_M_to_T_end = Period(freq='Min', year=2007, month=1, day=31, + hour=23, minute=59) + ival_M_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0, + minute=0, second=0) + ival_M_to_S_end = Period(freq='S', year=2007, month=1, day=31, hour=23, + minute=59, second=59) + + assert ival_M.asfreq('A') == ival_M_to_A + assert ival_M_end_of_year.asfreq('A') == ival_M_to_A + assert ival_M.asfreq('Q') == ival_M_to_Q + assert ival_M_end_of_quarter.asfreq('Q') == ival_M_to_Q + + assert ival_M.asfreq('W', 'S') == ival_M_to_W_start + assert ival_M.asfreq('W', 'E') == ival_M_to_W_end + assert ival_M.asfreq('B', 'S') == ival_M_to_B_start + assert ival_M.asfreq('B', 'E') == ival_M_to_B_end + assert ival_M.asfreq('D', 'S') == ival_M_to_D_start + assert ival_M.asfreq('D', 'E') == ival_M_to_D_end + assert ival_M.asfreq('H', 'S') == ival_M_to_H_start + assert ival_M.asfreq('H', 'E') == ival_M_to_H_end + assert ival_M.asfreq('Min', 'S') == ival_M_to_T_start + assert ival_M.asfreq('Min', 'E') == ival_M_to_T_end + assert ival_M.asfreq('S', 'S') == ival_M_to_S_start + assert ival_M.asfreq('S', 'E') == ival_M_to_S_end + + assert ival_M.asfreq('M') == ival_M + + def test_conv_weekly(self): + # frequency conversion tests: from Weekly Frequency + ival_W = Period(freq='W', year=2007, month=1, day=1) + + ival_WSUN = Period(freq='W', year=2007, month=1, day=7) + ival_WSAT = Period(freq='W-SAT', year=2007, month=1, day=6) + ival_WFRI = Period(freq='W-FRI', year=2007, month=1, day=5) + ival_WTHU = Period(freq='W-THU', year=2007, month=1, day=4) + ival_WWED = Period(freq='W-WED', year=2007, month=1, day=3) + ival_WTUE = Period(freq='W-TUE', year=2007, month=1, day=2) + ival_WMON = Period(freq='W-MON', year=2007, month=1, day=1) + + ival_WSUN_to_D_start = Period(freq='D', year=2007, month=1, day=1) + ival_WSUN_to_D_end = Period(freq='D', year=2007, month=1, day=7) + ival_WSAT_to_D_start = Period(freq='D', year=2006, month=12, day=31) + ival_WSAT_to_D_end = Period(freq='D', year=2007, month=1, day=6) + ival_WFRI_to_D_start = Period(freq='D', year=2006, month=12, day=30) + ival_WFRI_to_D_end = Period(freq='D', year=2007, month=1, day=5) + ival_WTHU_to_D_start = Period(freq='D', year=2006, month=12, day=29) + ival_WTHU_to_D_end = Period(freq='D', year=2007, month=1, day=4) + ival_WWED_to_D_start = Period(freq='D', year=2006, month=12, day=28) + ival_WWED_to_D_end = Period(freq='D', year=2007, month=1, day=3) + ival_WTUE_to_D_start = Period(freq='D', year=2006, month=12, day=27) + ival_WTUE_to_D_end = Period(freq='D', year=2007, month=1, day=2) + ival_WMON_to_D_start = Period(freq='D', year=2006, month=12, day=26) + ival_WMON_to_D_end = Period(freq='D', year=2007, month=1, day=1) + + ival_W_end_of_year = Period(freq='W', year=2007, month=12, day=31) + ival_W_end_of_quarter = Period(freq='W', year=2007, month=3, 
day=31) + ival_W_end_of_month = Period(freq='W', year=2007, month=1, day=31) + ival_W_to_A = Period(freq='A', year=2007) + ival_W_to_Q = Period(freq='Q', year=2007, quarter=1) + ival_W_to_M = Period(freq='M', year=2007, month=1) + + if Period(freq='D', year=2007, month=12, day=31).weekday == 6: + ival_W_to_A_end_of_year = Period(freq='A', year=2007) + else: + ival_W_to_A_end_of_year = Period(freq='A', year=2008) + + if Period(freq='D', year=2007, month=3, day=31).weekday == 6: + ival_W_to_Q_end_of_quarter = Period(freq='Q', year=2007, quarter=1) + else: + ival_W_to_Q_end_of_quarter = Period(freq='Q', year=2007, quarter=2) + + if Period(freq='D', year=2007, month=1, day=31).weekday == 6: + ival_W_to_M_end_of_month = Period(freq='M', year=2007, month=1) + else: + ival_W_to_M_end_of_month = Period(freq='M', year=2007, month=2) + + ival_W_to_B_start = Period(freq='B', year=2007, month=1, day=1) + ival_W_to_B_end = Period(freq='B', year=2007, month=1, day=5) + ival_W_to_D_start = Period(freq='D', year=2007, month=1, day=1) + ival_W_to_D_end = Period(freq='D', year=2007, month=1, day=7) + ival_W_to_H_start = Period(freq='H', year=2007, month=1, day=1, hour=0) + ival_W_to_H_end = Period(freq='H', year=2007, month=1, day=7, hour=23) + ival_W_to_T_start = Period(freq='Min', year=2007, month=1, day=1, + hour=0, minute=0) + ival_W_to_T_end = Period(freq='Min', year=2007, month=1, day=7, + hour=23, minute=59) + ival_W_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0, + minute=0, second=0) + ival_W_to_S_end = Period(freq='S', year=2007, month=1, day=7, hour=23, + minute=59, second=59) + + assert ival_W.asfreq('A') == ival_W_to_A + assert ival_W_end_of_year.asfreq('A') == ival_W_to_A_end_of_year + + assert ival_W.asfreq('Q') == ival_W_to_Q + assert ival_W_end_of_quarter.asfreq('Q') == ival_W_to_Q_end_of_quarter + + assert ival_W.asfreq('M') == ival_W_to_M + assert ival_W_end_of_month.asfreq('M') == ival_W_to_M_end_of_month + + assert ival_W.asfreq('B', 'S') == ival_W_to_B_start + assert ival_W.asfreq('B', 'E') == ival_W_to_B_end + + assert ival_W.asfreq('D', 'S') == ival_W_to_D_start + assert ival_W.asfreq('D', 'E') == ival_W_to_D_end + + assert ival_WSUN.asfreq('D', 'S') == ival_WSUN_to_D_start + assert ival_WSUN.asfreq('D', 'E') == ival_WSUN_to_D_end + assert ival_WSAT.asfreq('D', 'S') == ival_WSAT_to_D_start + assert ival_WSAT.asfreq('D', 'E') == ival_WSAT_to_D_end + assert ival_WFRI.asfreq('D', 'S') == ival_WFRI_to_D_start + assert ival_WFRI.asfreq('D', 'E') == ival_WFRI_to_D_end + assert ival_WTHU.asfreq('D', 'S') == ival_WTHU_to_D_start + assert ival_WTHU.asfreq('D', 'E') == ival_WTHU_to_D_end + assert ival_WWED.asfreq('D', 'S') == ival_WWED_to_D_start + assert ival_WWED.asfreq('D', 'E') == ival_WWED_to_D_end + assert ival_WTUE.asfreq('D', 'S') == ival_WTUE_to_D_start + assert ival_WTUE.asfreq('D', 'E') == ival_WTUE_to_D_end + assert ival_WMON.asfreq('D', 'S') == ival_WMON_to_D_start + assert ival_WMON.asfreq('D', 'E') == ival_WMON_to_D_end + + assert ival_W.asfreq('H', 'S') == ival_W_to_H_start + assert ival_W.asfreq('H', 'E') == ival_W_to_H_end + assert ival_W.asfreq('Min', 'S') == ival_W_to_T_start + assert ival_W.asfreq('Min', 'E') == ival_W_to_T_end + assert ival_W.asfreq('S', 'S') == ival_W_to_S_start + assert ival_W.asfreq('S', 'E') == ival_W_to_S_end + + assert ival_W.asfreq('W') == ival_W + + msg = pd.tseries.frequencies._INVALID_FREQ_ERROR + with tm.assert_raises_regex(ValueError, msg): + ival_W.asfreq('WK') + + def test_conv_weekly_legacy(self): + # frequency conversion 
tests: from Weekly Frequency
+        msg = pd.tseries.frequencies._INVALID_FREQ_ERROR
+        with tm.assert_raises_regex(ValueError, msg):
+            Period(freq='WK', year=2007, month=1, day=1)
+
+        with tm.assert_raises_regex(ValueError, msg):
+            Period(freq='WK-SAT', year=2007, month=1, day=6)
+        with tm.assert_raises_regex(ValueError, msg):
+            Period(freq='WK-FRI', year=2007, month=1, day=5)
+        with tm.assert_raises_regex(ValueError, msg):
+            Period(freq='WK-THU', year=2007, month=1, day=4)
+        with tm.assert_raises_regex(ValueError, msg):
+            Period(freq='WK-WED', year=2007, month=1, day=3)
+        with tm.assert_raises_regex(ValueError, msg):
+            Period(freq='WK-TUE', year=2007, month=1, day=2)
+        with tm.assert_raises_regex(ValueError, msg):
+            Period(freq='WK-MON', year=2007, month=1, day=1)
+
+    def test_conv_business(self):
+        # frequency conversion tests: from Business Frequency
+
+        ival_B = Period(freq='B', year=2007, month=1, day=1)
+        ival_B_end_of_year = Period(freq='B', year=2007, month=12, day=31)
+        ival_B_end_of_quarter = Period(freq='B', year=2007, month=3, day=30)
+        ival_B_end_of_month = Period(freq='B', year=2007, month=1, day=31)
+        ival_B_end_of_week = Period(freq='B', year=2007, month=1, day=5)
+
+        ival_B_to_A = Period(freq='A', year=2007)
+        ival_B_to_Q = Period(freq='Q', year=2007, quarter=1)
+        ival_B_to_M = Period(freq='M', year=2007, month=1)
+        ival_B_to_W = Period(freq='W', year=2007, month=1, day=7)
+        ival_B_to_D = Period(freq='D', year=2007, month=1, day=1)
+        ival_B_to_H_start = Period(freq='H', year=2007, month=1, day=1, hour=0)
+        ival_B_to_H_end = Period(freq='H', year=2007, month=1, day=1, hour=23)
+        ival_B_to_T_start = Period(freq='Min', year=2007, month=1, day=1,
+                                   hour=0, minute=0)
+        ival_B_to_T_end = Period(freq='Min', year=2007, month=1, day=1,
+                                 hour=23, minute=59)
+        ival_B_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0,
+                                   minute=0, second=0)
+        ival_B_to_S_end = Period(freq='S', year=2007, month=1, day=1, hour=23,
+                                 minute=59, second=59)
+
+        assert ival_B.asfreq('A') == ival_B_to_A
+        assert ival_B_end_of_year.asfreq('A') == ival_B_to_A
+        assert ival_B.asfreq('Q') == ival_B_to_Q
+        assert ival_B_end_of_quarter.asfreq('Q') == ival_B_to_Q
+        assert ival_B.asfreq('M') == ival_B_to_M
+        assert ival_B_end_of_month.asfreq('M') == ival_B_to_M
+        assert ival_B.asfreq('W') == ival_B_to_W
+        assert ival_B_end_of_week.asfreq('W') == ival_B_to_W
+
+        assert ival_B.asfreq('D') == ival_B_to_D
+
+        assert ival_B.asfreq('H', 'S') == ival_B_to_H_start
+        assert ival_B.asfreq('H', 'E') == ival_B_to_H_end
+        assert ival_B.asfreq('Min', 'S') == ival_B_to_T_start
+        assert ival_B.asfreq('Min', 'E') == ival_B_to_T_end
+        assert ival_B.asfreq('S', 'S') == ival_B_to_S_start
+        assert ival_B.asfreq('S', 'E') == ival_B_to_S_end
+
+        assert ival_B.asfreq('B') == ival_B
+
+    def test_conv_daily(self):
+        # frequency conversion tests: from Daily Frequency
+
+        ival_D = Period(freq='D', year=2007, month=1, day=1)
+        ival_D_end_of_year = Period(freq='D', year=2007, month=12, day=31)
+        ival_D_end_of_quarter = Period(freq='D', year=2007, month=3, day=31)
+        ival_D_end_of_month = Period(freq='D', year=2007, month=1, day=31)
+        ival_D_end_of_week = Period(freq='D', year=2007, month=1, day=7)
+
+        ival_D_friday = Period(freq='D', year=2007, month=1, day=5)
+        ival_D_saturday = Period(freq='D', year=2007, month=1, day=6)
+        ival_D_sunday = Period(freq='D', year=2007, month=1, day=7)
+
+        # TODO: unused?
+        # ival_D_monday = Period(freq='D', year=2007, month=1, day=8)
+
+        ival_B_friday = Period(freq='B', year=2007, month=1, day=5)
+        ival_B_monday = Period(freq='B', year=2007, month=1, day=8)
+
+        ival_D_to_A = Period(freq='A', year=2007)
+
+        ival_Deoq_to_AJAN = Period(freq='A-JAN', year=2008)
+        ival_Deoq_to_AJUN = Period(freq='A-JUN', year=2007)
+        ival_Deoq_to_ADEC = Period(freq='A-DEC', year=2007)
+
+        ival_D_to_QEJAN = Period(freq="Q-JAN", year=2007, quarter=4)
+        ival_D_to_QEJUN = Period(freq="Q-JUN", year=2007, quarter=3)
+        ival_D_to_QEDEC = Period(freq="Q-DEC", year=2007, quarter=1)
+
+        ival_D_to_M = Period(freq='M', year=2007, month=1)
+        ival_D_to_W = Period(freq='W', year=2007, month=1, day=7)
+
+        ival_D_to_H_start = Period(freq='H', year=2007, month=1, day=1, hour=0)
+        ival_D_to_H_end = Period(freq='H', year=2007, month=1, day=1, hour=23)
+        ival_D_to_T_start = Period(freq='Min', year=2007, month=1, day=1,
+                                   hour=0, minute=0)
+        ival_D_to_T_end = Period(freq='Min', year=2007, month=1, day=1,
+                                 hour=23, minute=59)
+        ival_D_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0,
+                                   minute=0, second=0)
+        ival_D_to_S_end = Period(freq='S', year=2007, month=1, day=1, hour=23,
+                                 minute=59, second=59)
+
+        assert ival_D.asfreq('A') == ival_D_to_A
+
+        assert ival_D_end_of_quarter.asfreq('A-JAN') == ival_Deoq_to_AJAN
+        assert ival_D_end_of_quarter.asfreq('A-JUN') == ival_Deoq_to_AJUN
+        assert ival_D_end_of_quarter.asfreq('A-DEC') == ival_Deoq_to_ADEC
+
+        assert ival_D_end_of_year.asfreq('A') == ival_D_to_A
+        assert ival_D_end_of_quarter.asfreq('Q') == ival_D_to_QEDEC
+        assert ival_D.asfreq("Q-JAN") == ival_D_to_QEJAN
+        assert ival_D.asfreq("Q-JUN") == ival_D_to_QEJUN
+        assert ival_D.asfreq("Q-DEC") == ival_D_to_QEDEC
+        assert ival_D.asfreq('M') == ival_D_to_M
+        assert ival_D_end_of_month.asfreq('M') == ival_D_to_M
+        assert ival_D.asfreq('W') == ival_D_to_W
+        assert ival_D_end_of_week.asfreq('W') == ival_D_to_W
+
+        assert ival_D_friday.asfreq('B') == ival_B_friday
+        assert ival_D_saturday.asfreq('B', 'S') == ival_B_friday
+        assert ival_D_saturday.asfreq('B', 'E') == ival_B_monday
+        assert ival_D_sunday.asfreq('B', 'S') == ival_B_friday
+        assert ival_D_sunday.asfreq('B', 'E') == ival_B_monday
+
+        assert ival_D.asfreq('H', 'S') == ival_D_to_H_start
+        assert ival_D.asfreq('H', 'E') == ival_D_to_H_end
+        assert ival_D.asfreq('Min', 'S') == ival_D_to_T_start
+        assert ival_D.asfreq('Min', 'E') == ival_D_to_T_end
+        assert ival_D.asfreq('S', 'S') == ival_D_to_S_start
+        assert ival_D.asfreq('S', 'E') == ival_D_to_S_end
+
+        assert ival_D.asfreq('D') == ival_D
+
+    def test_conv_hourly(self):
+        # frequency conversion tests: from Hourly Frequency
+
+        ival_H = Period(freq='H', year=2007, month=1, day=1, hour=0)
+        ival_H_end_of_year = Period(freq='H', year=2007, month=12, day=31,
+                                    hour=23)
+        ival_H_end_of_quarter = Period(freq='H', year=2007, month=3, day=31,
+                                       hour=23)
+        ival_H_end_of_month = Period(freq='H', year=2007, month=1, day=31,
+                                     hour=23)
+        ival_H_end_of_week = Period(freq='H', year=2007, month=1, day=7,
+                                    hour=23)
+        ival_H_end_of_day = Period(freq='H', year=2007, month=1, day=1,
+                                   hour=23)
+        ival_H_end_of_bus = Period(freq='H', year=2007, month=1, day=1,
+                                   hour=23)
+
+        ival_H_to_A = Period(freq='A', year=2007)
+        ival_H_to_Q = Period(freq='Q', year=2007, quarter=1)
+        ival_H_to_M = Period(freq='M', year=2007, month=1)
+        ival_H_to_W = Period(freq='W', year=2007, month=1, day=7)
+        ival_H_to_D = Period(freq='D', year=2007, month=1, day=1)
+        ival_H_to_B = Period(freq='B', year=2007, month=1, day=1)
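+
+        # Note: for the sub-hourly targets below, how='S' anchors the
+        # converted Period at the first minute/second of the hour and
+        # how='E' at the last (minute=59, second=59).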
+ + ival_H_to_T_start = Period(freq='Min', year=2007, month=1, day=1, + hour=0, minute=0) + ival_H_to_T_end = Period(freq='Min', year=2007, month=1, day=1, hour=0, + minute=59) + ival_H_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0, + minute=0, second=0) + ival_H_to_S_end = Period(freq='S', year=2007, month=1, day=1, hour=0, + minute=59, second=59) + + assert ival_H.asfreq('A') == ival_H_to_A + assert ival_H_end_of_year.asfreq('A') == ival_H_to_A + assert ival_H.asfreq('Q') == ival_H_to_Q + assert ival_H_end_of_quarter.asfreq('Q') == ival_H_to_Q + assert ival_H.asfreq('M') == ival_H_to_M + assert ival_H_end_of_month.asfreq('M') == ival_H_to_M + assert ival_H.asfreq('W') == ival_H_to_W + assert ival_H_end_of_week.asfreq('W') == ival_H_to_W + assert ival_H.asfreq('D') == ival_H_to_D + assert ival_H_end_of_day.asfreq('D') == ival_H_to_D + assert ival_H.asfreq('B') == ival_H_to_B + assert ival_H_end_of_bus.asfreq('B') == ival_H_to_B + + assert ival_H.asfreq('Min', 'S') == ival_H_to_T_start + assert ival_H.asfreq('Min', 'E') == ival_H_to_T_end + assert ival_H.asfreq('S', 'S') == ival_H_to_S_start + assert ival_H.asfreq('S', 'E') == ival_H_to_S_end + + assert ival_H.asfreq('H') == ival_H + + def test_conv_minutely(self): + # frequency conversion tests: from Minutely Frequency + + ival_T = Period(freq='Min', year=2007, month=1, day=1, hour=0, + minute=0) + ival_T_end_of_year = Period(freq='Min', year=2007, month=12, day=31, + hour=23, minute=59) + ival_T_end_of_quarter = Period(freq='Min', year=2007, month=3, day=31, + hour=23, minute=59) + ival_T_end_of_month = Period(freq='Min', year=2007, month=1, day=31, + hour=23, minute=59) + ival_T_end_of_week = Period(freq='Min', year=2007, month=1, day=7, + hour=23, minute=59) + ival_T_end_of_day = Period(freq='Min', year=2007, month=1, day=1, + hour=23, minute=59) + ival_T_end_of_bus = Period(freq='Min', year=2007, month=1, day=1, + hour=23, minute=59) + ival_T_end_of_hour = Period(freq='Min', year=2007, month=1, day=1, + hour=0, minute=59) + + ival_T_to_A = Period(freq='A', year=2007) + ival_T_to_Q = Period(freq='Q', year=2007, quarter=1) + ival_T_to_M = Period(freq='M', year=2007, month=1) + ival_T_to_W = Period(freq='W', year=2007, month=1, day=7) + ival_T_to_D = Period(freq='D', year=2007, month=1, day=1) + ival_T_to_B = Period(freq='B', year=2007, month=1, day=1) + ival_T_to_H = Period(freq='H', year=2007, month=1, day=1, hour=0) + + ival_T_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0, + minute=0, second=0) + ival_T_to_S_end = Period(freq='S', year=2007, month=1, day=1, hour=0, + minute=0, second=59) + + assert ival_T.asfreq('A') == ival_T_to_A + assert ival_T_end_of_year.asfreq('A') == ival_T_to_A + assert ival_T.asfreq('Q') == ival_T_to_Q + assert ival_T_end_of_quarter.asfreq('Q') == ival_T_to_Q + assert ival_T.asfreq('M') == ival_T_to_M + assert ival_T_end_of_month.asfreq('M') == ival_T_to_M + assert ival_T.asfreq('W') == ival_T_to_W + assert ival_T_end_of_week.asfreq('W') == ival_T_to_W + assert ival_T.asfreq('D') == ival_T_to_D + assert ival_T_end_of_day.asfreq('D') == ival_T_to_D + assert ival_T.asfreq('B') == ival_T_to_B + assert ival_T_end_of_bus.asfreq('B') == ival_T_to_B + assert ival_T.asfreq('H') == ival_T_to_H + assert ival_T_end_of_hour.asfreq('H') == ival_T_to_H + + assert ival_T.asfreq('S', 'S') == ival_T_to_S_start + assert ival_T.asfreq('S', 'E') == ival_T_to_S_end + + assert ival_T.asfreq('Min') == ival_T + + def test_conv_secondly(self): + # frequency conversion tests: from Secondly
Frequency" + + ival_S = Period(freq='S', year=2007, month=1, day=1, hour=0, minute=0, + second=0) + ival_S_end_of_year = Period(freq='S', year=2007, month=12, day=31, + hour=23, minute=59, second=59) + ival_S_end_of_quarter = Period(freq='S', year=2007, month=3, day=31, + hour=23, minute=59, second=59) + ival_S_end_of_month = Period(freq='S', year=2007, month=1, day=31, + hour=23, minute=59, second=59) + ival_S_end_of_week = Period(freq='S', year=2007, month=1, day=7, + hour=23, minute=59, second=59) + ival_S_end_of_day = Period(freq='S', year=2007, month=1, day=1, + hour=23, minute=59, second=59) + ival_S_end_of_bus = Period(freq='S', year=2007, month=1, day=1, + hour=23, minute=59, second=59) + ival_S_end_of_hour = Period(freq='S', year=2007, month=1, day=1, + hour=0, minute=59, second=59) + ival_S_end_of_minute = Period(freq='S', year=2007, month=1, day=1, + hour=0, minute=0, second=59) + + ival_S_to_A = Period(freq='A', year=2007) + ival_S_to_Q = Period(freq='Q', year=2007, quarter=1) + ival_S_to_M = Period(freq='M', year=2007, month=1) + ival_S_to_W = Period(freq='W', year=2007, month=1, day=7) + ival_S_to_D = Period(freq='D', year=2007, month=1, day=1) + ival_S_to_B = Period(freq='B', year=2007, month=1, day=1) + ival_S_to_H = Period(freq='H', year=2007, month=1, day=1, hour=0) + ival_S_to_T = Period(freq='Min', year=2007, month=1, day=1, hour=0, + minute=0) + + assert ival_S.asfreq('A') == ival_S_to_A + assert ival_S_end_of_year.asfreq('A') == ival_S_to_A + assert ival_S.asfreq('Q') == ival_S_to_Q + assert ival_S_end_of_quarter.asfreq('Q') == ival_S_to_Q + assert ival_S.asfreq('M') == ival_S_to_M + assert ival_S_end_of_month.asfreq('M') == ival_S_to_M + assert ival_S.asfreq('W') == ival_S_to_W + assert ival_S_end_of_week.asfreq('W') == ival_S_to_W + assert ival_S.asfreq('D') == ival_S_to_D + assert ival_S_end_of_day.asfreq('D') == ival_S_to_D + assert ival_S.asfreq('B') == ival_S_to_B + assert ival_S_end_of_bus.asfreq('B') == ival_S_to_B + assert ival_S.asfreq('H') == ival_S_to_H + assert ival_S_end_of_hour.asfreq('H') == ival_S_to_H + assert ival_S.asfreq('Min') == ival_S_to_T + assert ival_S_end_of_minute.asfreq('Min') == ival_S_to_T + + assert ival_S.asfreq('S') == ival_S + + def test_asfreq_mult(self): + # normal freq to mult freq + p = Period(freq='A', year=2007) + # ordinal will not change + for freq in ['3A', offsets.YearEnd(3)]: + result = p.asfreq(freq) + expected = Period('2007', freq='3A') + + assert result == expected + assert result.ordinal == expected.ordinal + assert result.freq == expected.freq + # ordinal will not change + for freq in ['3A', offsets.YearEnd(3)]: + result = p.asfreq(freq, how='S') + expected = Period('2007', freq='3A') + + assert result == expected + assert result.ordinal == expected.ordinal + assert result.freq == expected.freq + + # mult freq to normal freq + p = Period(freq='3A', year=2007) + # ordinal will change because how=E is the default + for freq in ['A', offsets.YearEnd()]: + result = p.asfreq(freq) + expected = Period('2009', freq='A') + + assert result == expected + assert result.ordinal == expected.ordinal + assert result.freq == expected.freq + # ordinal will not change + for freq in ['A', offsets.YearEnd()]: + result = p.asfreq(freq, how='S') + expected = Period('2007', freq='A') + + assert result == expected + assert result.ordinal == expected.ordinal + assert result.freq == expected.freq + + p = Period(freq='A', year=2007) + for freq in ['2M', offsets.MonthEnd(2)]: + result = p.asfreq(freq) + expected = Period('2007-12', freq='2M') 
+ + assert result == expected + assert result.ordinal == expected.ordinal + assert result.freq == expected.freq + for freq in ['2M', offsets.MonthEnd(2)]: + result = p.asfreq(freq, how='S') + expected = Period('2007-01', freq='2M') + + assert result == expected + assert result.ordinal == expected.ordinal + assert result.freq == expected.freq + + p = Period(freq='3A', year=2007) + for freq in ['2M', offsets.MonthEnd(2)]: + result = p.asfreq(freq) + expected = Period('2009-12', freq='2M') + + assert result == expected + assert result.ordinal == expected.ordinal + assert result.freq == expected.freq + for freq in ['2M', offsets.MonthEnd(2)]: + result = p.asfreq(freq, how='S') + expected = Period('2007-01', freq='2M') + + assert result == expected + assert result.ordinal == expected.ordinal + assert result.freq == expected.freq + + def test_asfreq_combined(self): + # normal freq to combined freq + p = Period('2007', freq='H') + + # ordinal will not change + expected = Period('2007', freq='25H') + for freq, how in zip(['1D1H', '1H1D'], ['E', 'S']): + result = p.asfreq(freq, how=how) + assert result == expected + assert result.ordinal == expected.ordinal + assert result.freq == expected.freq + + # combined freq to normal freq + p1 = Period(freq='1D1H', year=2007) + p2 = Period(freq='1H1D', year=2007) + + # ordinal will change because how=E is the default + result1 = p1.asfreq('H') + result2 = p2.asfreq('H') + expected = Period('2007-01-02', freq='H') + assert result1 == expected + assert result1.ordinal == expected.ordinal + assert result1.freq == expected.freq + assert result2 == expected + assert result2.ordinal == expected.ordinal + assert result2.freq == expected.freq + + # ordinal will not change + result1 = p1.asfreq('H', how='S') + result2 = p2.asfreq('H', how='S') + expected = Period('2007-01-01', freq='H') + assert result1 == expected + assert result1.ordinal == expected.ordinal + assert result1.freq == expected.freq + assert result2 == expected + assert result2.ordinal == expected.ordinal + assert result2.freq == expected.freq + + def test_asfreq_MS(self): + initial = Period("2013") + + assert initial.asfreq(freq="M", how="S") == Period('2013-01', 'M') + + msg = pd.tseries.frequencies._INVALID_FREQ_ERROR + with tm.assert_raises_regex(ValueError, msg): + initial.asfreq(freq="MS", how="S") + + with tm.assert_raises_regex(ValueError, msg): + pd.Period('2013-01', 'MS') + + assert _period_code_map.get("MS") is None diff --git a/pandas/tests/scalar/test_timedelta.py b/pandas/tests/scalar/test_timedelta.py new file mode 100644 index 0000000000000..bc9a0388df9d9 --- /dev/null +++ b/pandas/tests/scalar/test_timedelta.py @@ -0,0 +1,691 @@ +""" test the scalar Timedelta """ +import pytest + +import numpy as np +from datetime import timedelta + +import pandas as pd +import pandas.util.testing as tm +from pandas.core.tools.timedeltas import _coerce_scalar_to_timedelta_type as ct +from pandas import (Timedelta, TimedeltaIndex, timedelta_range, Series, + to_timedelta, compat) +from pandas._libs.tslib import iNaT, NaTType + + +class TestTimedeltas(object): + _multiprocess_can_split_ = True + + def setup_method(self, method): + pass + + def test_construction(self): + + expected = np.timedelta64(10, 'D').astype('m8[ns]').view('i8') + assert Timedelta(10, unit='d').value == expected + assert Timedelta(10.0, unit='d').value == expected + assert Timedelta('10 days').value == expected + assert Timedelta(days=10).value == expected + assert Timedelta(days=10.0).value == expected + + expected += 
np.timedelta64(10, 's').astype('m8[ns]').view('i8') + assert Timedelta('10 days 00:00:10').value == expected + assert Timedelta(days=10, seconds=10).value == expected + assert Timedelta(days=10, milliseconds=10 * 1000).value == expected + assert (Timedelta(days=10, microseconds=10 * 1000 * 1000) + .value == expected) + + # gh-8757: test construction with np dtypes + timedelta_kwargs = {'days': 'D', + 'seconds': 's', + 'microseconds': 'us', + 'milliseconds': 'ms', + 'minutes': 'm', + 'hours': 'h', + 'weeks': 'W'} + npdtypes = [np.int64, np.int32, np.int16, np.float64, np.float32, + np.float16] + for npdtype in npdtypes: + for pykwarg, npkwarg in timedelta_kwargs.items(): + expected = np.timedelta64(1, npkwarg).astype( + 'm8[ns]').view('i8') + assert Timedelta(**{pykwarg: npdtype(1)}).value == expected + + # rounding cases + assert Timedelta(82739999850000).value == 82739999850000 + assert ('0 days 22:58:59.999850' in str(Timedelta(82739999850000))) + assert Timedelta(123072001000000).value == 123072001000000 + assert ('1 days 10:11:12.001' in str(Timedelta(123072001000000))) + + # string conversion with/without leading zero + # GH 9570 + assert Timedelta('0:00:00') == timedelta(hours=0) + assert Timedelta('00:00:00') == timedelta(hours=0) + assert Timedelta('-1:00:00') == -timedelta(hours=1) + assert Timedelta('-01:00:00') == -timedelta(hours=1) + + # more strings & abbrevs + # GH 8190 + assert Timedelta('1 h') == timedelta(hours=1) + assert Timedelta('1 hour') == timedelta(hours=1) + assert Timedelta('1 hr') == timedelta(hours=1) + assert Timedelta('1 hours') == timedelta(hours=1) + assert Timedelta('-1 hours') == -timedelta(hours=1) + assert Timedelta('1 m') == timedelta(minutes=1) + assert Timedelta('1.5 m') == timedelta(seconds=90) + assert Timedelta('1 minute') == timedelta(minutes=1) + assert Timedelta('1 minutes') == timedelta(minutes=1) + assert Timedelta('1 s') == timedelta(seconds=1) + assert Timedelta('1 second') == timedelta(seconds=1) + assert Timedelta('1 seconds') == timedelta(seconds=1) + assert Timedelta('1 ms') == timedelta(milliseconds=1) + assert Timedelta('1 milli') == timedelta(milliseconds=1) + assert Timedelta('1 millisecond') == timedelta(milliseconds=1) + assert Timedelta('1 us') == timedelta(microseconds=1) + assert Timedelta('1 micros') == timedelta(microseconds=1) + assert Timedelta('1 microsecond') == timedelta(microseconds=1) + assert Timedelta('1.5 microsecond') == Timedelta('00:00:00.000001500') + assert Timedelta('1 ns') == Timedelta('00:00:00.000000001') + assert Timedelta('1 nano') == Timedelta('00:00:00.000000001') + assert Timedelta('1 nanosecond') == Timedelta('00:00:00.000000001') + + # combos + assert Timedelta('10 days 1 hour') == timedelta(days=10, hours=1) + assert Timedelta('10 days 1 h') == timedelta(days=10, hours=1) + assert Timedelta('10 days 1 h 1m 1s') == timedelta( + days=10, hours=1, minutes=1, seconds=1) + assert Timedelta('-10 days 1 h 1m 1s') == -timedelta( + days=10, hours=1, minutes=1, seconds=1) + assert Timedelta('-10 days 1 h 1m 1s 3us') == -timedelta( + days=10, hours=1, minutes=1, seconds=1, microseconds=3) + assert Timedelta('-10 days 1 h 1.5m 1s 3us') == -timedelta( + days=10, hours=1, minutes=1, seconds=31, microseconds=3) + + # Currently invalid as it has a - on the hh:mm:dd part + # (only allowed on the days) + pytest.raises(ValueError, + lambda: Timedelta('-10 days -1 h 1.5m 1s 3us')) + + # only leading neg signs are allowed +
pytest.raises(ValueError, + lambda: Timedelta('10 days -1 h 1.5m 1s 3us')) + + # no units specified + pytest.raises(ValueError, lambda: Timedelta('3.1415')) + + # invalid construction + tm.assert_raises_regex(ValueError, "cannot construct a Timedelta", + lambda: Timedelta()) + tm.assert_raises_regex(ValueError, + "unit abbreviation w/o a number", + lambda: Timedelta('foo')) + tm.assert_raises_regex(ValueError, + "cannot construct a Timedelta from the " + "passed arguments, allowed keywords are ", + lambda: Timedelta(day=10)) + + # round-trip both for string and value + for v in ['1s', '-1s', '1us', '-1us', '1 day', '-1 day', + '-23:59:59.999999', '-1 days +23:59:59.999999', '-1ns', + '1ns', '-23:59:59.999999999']: + + td = Timedelta(v) + assert Timedelta(td.value) == td + + # str does not normally display nanos + if not td.nanoseconds: + assert Timedelta(str(td)) == td + assert Timedelta(td._repr_base(format='all')) == td + + # floats + expected = np.timedelta64( + 10, 's').astype('m8[ns]').view('i8') + np.timedelta64( + 500, 'ms').astype('m8[ns]').view('i8') + assert Timedelta(10.5, unit='s').value == expected + + # offset + assert (to_timedelta(pd.offsets.Hour(2)) == + Timedelta('0 days, 02:00:00')) + assert (Timedelta(pd.offsets.Hour(2)) == + Timedelta('0 days, 02:00:00')) + assert (Timedelta(pd.offsets.Second(2)) == + Timedelta('0 days, 00:00:02')) + + # gh-11995: unicode + expected = Timedelta('1H') + result = pd.Timedelta(u'1H') + assert result == expected + assert (to_timedelta(pd.offsets.Hour(2)) == + Timedelta(u'0 days, 02:00:00')) + + pytest.raises(ValueError, lambda: Timedelta(u'foo bar')) + + def test_overflow_on_construction(self): + # xref https://github.com/statsmodels/statsmodels/issues/3374 + value = pd.Timedelta('1day').value * 20169940 + pytest.raises(OverflowError, pd.Timedelta, value) + + def test_total_seconds_scalar(self): + # see gh-10939 + rng = Timedelta('1 days, 10:11:12.100123456') + expt = 1 * 86400 + 10 * 3600 + 11 * 60 + 12 + 100123456. 
/ 1e9 + tm.assert_almost_equal(rng.total_seconds(), expt) + + rng = Timedelta(np.nan) + assert np.isnan(rng.total_seconds()) + + def test_repr(self): + + assert (repr(Timedelta(10, unit='d')) == + "Timedelta('10 days 00:00:00')") + assert (repr(Timedelta(10, unit='s')) == + "Timedelta('0 days 00:00:10')") + assert (repr(Timedelta(10, unit='ms')) == + "Timedelta('0 days 00:00:00.010000')") + assert (repr(Timedelta(-10, unit='ms')) == + "Timedelta('-1 days +23:59:59.990000')") + + def test_conversion(self): + + for td in [Timedelta(10, unit='d'), + Timedelta('1 days, 10:11:12.012345')]: + pydt = td.to_pytimedelta() + assert td == Timedelta(pydt) + assert td == pydt + assert (isinstance(pydt, timedelta) and not isinstance( + pydt, Timedelta)) + + assert td == np.timedelta64(td.value, 'ns') + td64 = td.to_timedelta64() + + assert td64 == np.timedelta64(td.value, 'ns') + assert td == td64 + + assert isinstance(td64, np.timedelta64) + + # this is NOT equal and cannot be roundtriped (because of the nanos) + td = Timedelta('1 days, 10:11:12.012345678') + assert td != td.to_pytimedelta() + + def test_freq_conversion(self): + + # truediv + td = Timedelta('1 days 2 hours 3 ns') + result = td / np.timedelta64(1, 'D') + assert result == td.value / float(86400 * 1e9) + result = td / np.timedelta64(1, 's') + assert result == td.value / float(1e9) + result = td / np.timedelta64(1, 'ns') + assert result == td.value + + # floordiv + td = Timedelta('1 days 2 hours 3 ns') + result = td // np.timedelta64(1, 'D') + assert result == 1 + result = td // np.timedelta64(1, 's') + assert result == 93600 + result = td // np.timedelta64(1, 'ns') + assert result == td.value + + def test_fields(self): + def check(value): + # that we are int/long like + assert isinstance(value, (int, compat.long)) + + # compat to datetime.timedelta + rng = to_timedelta('1 days, 10:11:12') + assert rng.days == 1 + assert rng.seconds == 10 * 3600 + 11 * 60 + 12 + assert rng.microseconds == 0 + assert rng.nanoseconds == 0 + + pytest.raises(AttributeError, lambda: rng.hours) + pytest.raises(AttributeError, lambda: rng.minutes) + pytest.raises(AttributeError, lambda: rng.milliseconds) + + # GH 10050 + check(rng.days) + check(rng.seconds) + check(rng.microseconds) + check(rng.nanoseconds) + + td = Timedelta('-1 days, 10:11:12') + assert abs(td) == Timedelta('13:48:48') + assert str(td) == "-1 days +10:11:12" + assert -td == Timedelta('0 days 13:48:48') + assert -Timedelta('-1 days, 10:11:12').value == 49728000000000 + assert Timedelta('-1 days, 10:11:12').value == -49728000000000 + + rng = to_timedelta('-1 days, 10:11:12.100123456') + assert rng.days == -1 + assert rng.seconds == 10 * 3600 + 11 * 60 + 12 + assert rng.microseconds == 100 * 1000 + 123 + assert rng.nanoseconds == 456 + pytest.raises(AttributeError, lambda: rng.hours) + pytest.raises(AttributeError, lambda: rng.minutes) + pytest.raises(AttributeError, lambda: rng.milliseconds) + + # components + tup = pd.to_timedelta(-1, 'us').components + assert tup.days == -1 + assert tup.hours == 23 + assert tup.minutes == 59 + assert tup.seconds == 59 + assert tup.milliseconds == 999 + assert tup.microseconds == 999 + assert tup.nanoseconds == 0 + + # GH 10050 + check(tup.days) + check(tup.hours) + check(tup.minutes) + check(tup.seconds) + check(tup.milliseconds) + check(tup.microseconds) + check(tup.nanoseconds) + + tup = Timedelta('-1 days 1 us').components + assert tup.days == -2 + assert tup.hours == 23 + assert tup.minutes == 59 + assert tup.seconds == 59 + assert tup.milliseconds == 999 + 
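The component assertions here and just below rely on a sign convention that is easy to misread: a negative Timedelta carries its entire sign in `days`, and every finer field is the non-negative remainder. A sketch, assuming only that pandas is importable:

```python
import pandas as pd

# -1 microsecond normalises to -1 days plus 23:59:59.999999.
tup = pd.to_timedelta(-1, 'us').components
assert (tup.days, tup.hours, tup.minutes, tup.seconds) == (-1, 23, 59, 59)
assert (tup.milliseconds, tup.microseconds, tup.nanoseconds) == (999, 999, 0)

# str() follows the same convention.
assert str(pd.Timedelta('-1 days, 10:11:12')) == '-1 days +10:11:12'
```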
assert tup.microseconds == 999 + assert tup.nanoseconds == 0 + + def test_nat_converters(self): + assert to_timedelta('nat', box=False).astype('int64') == iNaT + assert to_timedelta('nan', box=False).astype('int64') == iNaT + + def testit(unit, transform): + + # array + result = to_timedelta(np.arange(5), unit=unit) + expected = TimedeltaIndex([np.timedelta64(i, transform(unit)) + for i in np.arange(5).tolist()]) + tm.assert_index_equal(result, expected) + + # scalar + result = to_timedelta(2, unit=unit) + expected = Timedelta(np.timedelta64(2, transform(unit)).astype( + 'timedelta64[ns]')) + assert result == expected + + # validate all units + # GH 6855 + for unit in ['Y', 'M', 'W', 'D', 'y', 'w', 'd']: + testit(unit, lambda x: x.upper()) + for unit in ['days', 'day', 'Day', 'Days']: + testit(unit, lambda x: 'D') + for unit in ['h', 'm', 's', 'ms', 'us', 'ns', 'H', 'S', 'MS', 'US', + 'NS']: + testit(unit, lambda x: x.lower()) + + # offsets + + # m + testit('T', lambda x: 'm') + + # ms + testit('L', lambda x: 'ms') + + def test_numeric_conversions(self): + assert ct(0) == np.timedelta64(0, 'ns') + assert ct(10) == np.timedelta64(10, 'ns') + assert ct(10, unit='ns') == np.timedelta64(10, 'ns').astype('m8[ns]') + + assert ct(10, unit='us') == np.timedelta64(10, 'us').astype('m8[ns]') + assert ct(10, unit='ms') == np.timedelta64(10, 'ms').astype('m8[ns]') + assert ct(10, unit='s') == np.timedelta64(10, 's').astype('m8[ns]') + assert ct(10, unit='d') == np.timedelta64(10, 'D').astype('m8[ns]') + + def test_timedelta_conversions(self): + assert (ct(timedelta(seconds=1)) == + np.timedelta64(1, 's').astype('m8[ns]')) + assert (ct(timedelta(microseconds=1)) == + np.timedelta64(1, 'us').astype('m8[ns]')) + assert (ct(timedelta(days=1)) == + np.timedelta64(1, 'D').astype('m8[ns]')) + + def test_round(self): + + t1 = Timedelta('1 days 02:34:56.789123456') + t2 = Timedelta('-1 days 02:34:56.789123456') + + for (freq, s1, s2) in [('N', t1, t2), + ('U', Timedelta('1 days 02:34:56.789123000'), + Timedelta('-1 days 02:34:56.789123000')), + ('L', Timedelta('1 days 02:34:56.789000000'), + Timedelta('-1 days 02:34:56.789000000')), + ('S', Timedelta('1 days 02:34:57'), + Timedelta('-1 days 02:34:57')), + ('2S', Timedelta('1 days 02:34:56'), + Timedelta('-1 days 02:34:56')), + ('5S', Timedelta('1 days 02:34:55'), + Timedelta('-1 days 02:34:55')), + ('T', Timedelta('1 days 02:35:00'), + Timedelta('-1 days 02:35:00')), + ('12T', Timedelta('1 days 02:36:00'), + Timedelta('-1 days 02:36:00')), + ('H', Timedelta('1 days 03:00:00'), + Timedelta('-1 days 03:00:00')), + ('d', Timedelta('1 days'), + Timedelta('-1 days'))]: + r1 = t1.round(freq) + assert r1 == s1 + r2 = t2.round(freq) + assert r2 == s2 + + # invalid + for freq in ['Y', 'M', 'foobar']: + pytest.raises(ValueError, lambda: t1.round(freq)) + + t1 = timedelta_range('1 days', periods=3, freq='1 min 2 s 3 us') + t2 = -1 * t1 + t1a = timedelta_range('1 days', periods=3, freq='1 min 2 s') + t1c = pd.TimedeltaIndex([1, 1, 1], unit='D') + + # note that negative times round DOWN! 
so don't give whole numbers + for (freq, s1, s2) in [('N', t1, t2), + ('U', t1, t2), + ('L', t1a, + TimedeltaIndex(['-1 days +00:00:00', + '-2 days +23:58:58', + '-2 days +23:57:56'], + dtype='timedelta64[ns]', + freq=None) + ), + ('S', t1a, + TimedeltaIndex(['-1 days +00:00:00', + '-2 days +23:58:58', + '-2 days +23:57:56'], + dtype='timedelta64[ns]', + freq=None) + ), + ('12T', t1c, + TimedeltaIndex(['-1 days', + '-1 days', + '-1 days'], + dtype='timedelta64[ns]', + freq=None) + ), + ('H', t1c, + TimedeltaIndex(['-1 days', + '-1 days', + '-1 days'], + dtype='timedelta64[ns]', + freq=None) + ), + ('d', t1c, + pd.TimedeltaIndex([-1, -1, -1], unit='D') + )]: + + r1 = t1.round(freq) + tm.assert_index_equal(r1, s1) + r2 = t2.round(freq) + tm.assert_index_equal(r2, s2) + + # invalid + for freq in ['Y', 'M', 'foobar']: + pytest.raises(ValueError, lambda: t1.round(freq)) + + def test_contains(self): + # Checking for any NaT-like objects + # GH 13603 + td = to_timedelta(range(5), unit='d') + pd.offsets.Hour(1) + for v in [pd.NaT, None, float('nan'), np.nan]: + assert not (v in td) + + td = to_timedelta([pd.NaT]) + for v in [pd.NaT, None, float('nan'), np.nan]: + assert (v in td) + + def test_identity(self): + + td = Timedelta(10, unit='d') + assert isinstance(td, Timedelta) + assert isinstance(td, timedelta) + + def test_short_format_converters(self): + def conv(v): + return v.astype('m8[ns]') + + assert ct('10') == np.timedelta64(10, 'ns') + assert ct('10ns') == np.timedelta64(10, 'ns') + assert ct('100') == np.timedelta64(100, 'ns') + assert ct('100ns') == np.timedelta64(100, 'ns') + + assert ct('1000') == np.timedelta64(1000, 'ns') + assert ct('1000ns') == np.timedelta64(1000, 'ns') + assert ct('1000NS') == np.timedelta64(1000, 'ns') + + assert ct('10us') == np.timedelta64(10000, 'ns') + assert ct('100us') == np.timedelta64(100000, 'ns') + assert ct('1000us') == np.timedelta64(1000000, 'ns') + assert ct('1000Us') == np.timedelta64(1000000, 'ns') + assert ct('1000uS') == np.timedelta64(1000000, 'ns') + + assert ct('1ms') == np.timedelta64(1000000, 'ns') + assert ct('10ms') == np.timedelta64(10000000, 'ns') + assert ct('100ms') == np.timedelta64(100000000, 'ns') + assert ct('1000ms') == np.timedelta64(1000000000, 'ns') + + assert ct('-1s') == -np.timedelta64(1000000000, 'ns') + assert ct('1s') == np.timedelta64(1000000000, 'ns') + assert ct('10s') == np.timedelta64(10000000000, 'ns') + assert ct('100s') == np.timedelta64(100000000000, 'ns') + assert ct('1000s') == np.timedelta64(1000000000000, 'ns') + + assert ct('1d') == conv(np.timedelta64(1, 'D')) + assert ct('-1d') == -conv(np.timedelta64(1, 'D')) + assert ct('1D') == conv(np.timedelta64(1, 'D')) + assert ct('10D') == conv(np.timedelta64(10, 'D')) + assert ct('100D') == conv(np.timedelta64(100, 'D')) + assert ct('1000D') == conv(np.timedelta64(1000, 'D')) + assert ct('10000D') == conv(np.timedelta64(10000, 'D')) + + # space + assert ct(' 10000D ') == conv(np.timedelta64(10000, 'D')) + assert ct(' - 10000D ') == -conv(np.timedelta64(10000, 'D')) + + # invalid + pytest.raises(ValueError, ct, '1foo') + pytest.raises(ValueError, ct, 'foo') + + def test_full_format_converters(self): + def conv(v): + return v.astype('m8[ns]') + + d1 = np.timedelta64(1, 'D') + + assert ct('1days') == conv(d1) + assert ct('1days,') == conv(d1) + assert ct('- 1days,') == -conv(d1) + + assert ct('00:00:01') == conv(np.timedelta64(1, 's')) + assert ct('06:00:01') == conv(np.timedelta64(6 * 3600 + 1, 's')) + assert ct('06:00:01.0') == conv(np.timedelta64(6 * 3600 + 1, 
's')) + assert ct('06:00:01.01') == conv(np.timedelta64( + 1000 * (6 * 3600 + 1) + 10, 'ms')) + + assert (ct('- 1days, 00:00:01') == + conv(-d1 + np.timedelta64(1, 's'))) + assert (ct('1days, 06:00:01') == + conv(d1 + np.timedelta64(6 * 3600 + 1, 's'))) + assert (ct('1days, 06:00:01.01') == + conv(d1 + np.timedelta64(1000 * (6 * 3600 + 1) + 10, 'ms'))) + + # invalid + pytest.raises(ValueError, ct, '- 1days, 00') + + def test_overflow(self): + # GH 9442 + s = Series(pd.date_range('20130101', periods=100000, freq='H')) + s[0] += pd.Timedelta('1s 1ms') + + # mean + result = (s - s.min()).mean() + expected = pd.Timedelta((pd.DatetimeIndex((s - s.min())).asi8 / len(s) + ).sum()) + + # the computation is converted to float so + # might be some loss of precision + assert np.allclose(result.value / 1000, expected.value / 1000) + + # sum + pytest.raises(ValueError, lambda: (s - s.min()).sum()) + s1 = s[0:10000] + pytest.raises(ValueError, lambda: (s1 - s1.min()).sum()) + s2 = s[0:1000] + result = (s2 - s2.min()).sum() + + def test_pickle(self): + + v = Timedelta('1 days 10:11:12.0123456') + v_p = tm.round_trip_pickle(v) + assert v == v_p + + def test_timedelta_hash_equality(self): + # GH 11129 + v = Timedelta(1, 'D') + td = timedelta(days=1) + assert hash(v) == hash(td) + + d = {td: 2} + assert d[v] == 2 + + tds = timedelta_range('1 second', periods=20) + assert all(hash(td) == hash(td.to_pytimedelta()) for td in tds) + + # python timedeltas drop ns resolution + ns_td = Timedelta(1, 'ns') + assert hash(ns_td) != hash(ns_td.to_pytimedelta()) + + def test_implementation_limits(self): + min_td = Timedelta(Timedelta.min) + max_td = Timedelta(Timedelta.max) + + # GH 12727 + # timedelta limits correspond to int64 boundaries + assert min_td.value == np.iinfo(np.int64).min + 1 + assert max_td.value == np.iinfo(np.int64).max + + # Beyond lower limit, a NAT before the Overflow + assert isinstance(min_td - Timedelta(1, 'ns'), NaTType) + + with pytest.raises(OverflowError): + min_td - Timedelta(2, 'ns') + + with pytest.raises(OverflowError): + max_td + Timedelta(1, 'ns') + + # Same tests using the internal nanosecond values + td = Timedelta(min_td.value - 1, 'ns') + assert isinstance(td, NaTType) + + with pytest.raises(OverflowError): + Timedelta(min_td.value - 2, 'ns') + + with pytest.raises(OverflowError): + Timedelta(max_td.value + 1, 'ns') + + def test_timedelta_arithmetic(self): + data = pd.Series(['nat', '32 days'], dtype='timedelta64[ns]') + deltas = [timedelta(days=1), Timedelta(1, unit='D')] + for delta in deltas: + result_method = data.add(delta) + result_operator = data + delta + expected = pd.Series(['nat', '33 days'], dtype='timedelta64[ns]') + tm.assert_series_equal(result_operator, expected) + tm.assert_series_equal(result_method, expected) + + result_method = data.sub(delta) + result_operator = data - delta + expected = pd.Series(['nat', '31 days'], dtype='timedelta64[ns]') + tm.assert_series_equal(result_operator, expected) + tm.assert_series_equal(result_method, expected) + # GH 9396 + result_method = data.div(delta) + result_operator = data / delta + expected = pd.Series([np.nan, 32.], dtype='float64') + tm.assert_series_equal(result_operator, expected) + tm.assert_series_equal(result_method, expected) + + def test_apply_to_timedelta(self): + timedelta_NaT = pd.to_timedelta('NaT') + + list_of_valid_strings = ['00:00:01', '00:00:02'] + a = pd.to_timedelta(list_of_valid_strings) + b = Series(list_of_valid_strings).apply(pd.to_timedelta) + # Can't compare until apply on a Series gives the 
correct dtype + # assert_series_equal(a, b) + + list_of_strings = ['00:00:01', np.nan, pd.NaT, timedelta_NaT] + + # TODO: unused? + a = pd.to_timedelta(list_of_strings) # noqa + b = Series(list_of_strings).apply(pd.to_timedelta) # noqa + # Can't compare until apply on a Series gives the correct dtype + # assert_series_equal(a, b) + + def test_components(self): + rng = timedelta_range('1 days, 10:11:12', periods=2, freq='s') + rng.components + + # with nat + s = Series(rng) + s[1] = np.nan + + result = s.dt.components + assert not result.iloc[0].isna().all() + assert result.iloc[1].isna().all() + + def test_isoformat(self): + td = Timedelta(days=6, minutes=50, seconds=3, + milliseconds=10, microseconds=10, nanoseconds=12) + expected = 'P6DT0H50M3.010010012S' + result = td.isoformat() + assert result == expected + + td = Timedelta(days=4, hours=12, minutes=30, seconds=5) + result = td.isoformat() + expected = 'P4DT12H30M5S' + assert result == expected + + td = Timedelta(nanoseconds=123) + result = td.isoformat() + expected = 'P0DT0H0M0.000000123S' + assert result == expected + + # trim nano + td = Timedelta(microseconds=10) + result = td.isoformat() + expected = 'P0DT0H0M0.00001S' + assert result == expected + + # trim micro + td = Timedelta(milliseconds=1) + result = td.isoformat() + expected = 'P0DT0H0M0.001S' + assert result == expected + + # don't strip every 0 + result = Timedelta(minutes=1).isoformat() + expected = 'P0DT0H1M0S' + assert result == expected + + def test_ops_error_str(self): + # GH 13624 + td = Timedelta('1 day') + + for l, r in [(td, 'a'), ('a', td)]: + + with pytest.raises(TypeError): + l + r + + with pytest.raises(TypeError): + l > r + + assert not l == r + assert l != r diff --git a/pandas/tests/scalar/test_timestamp.py b/pandas/tests/scalar/test_timestamp.py new file mode 100644 index 0000000000000..7cd1a7db0f9fe --- /dev/null +++ b/pandas/tests/scalar/test_timestamp.py @@ -0,0 +1,1488 @@ +""" test the scalar Timestamp """ + +import sys +import pytz +import pytest +import dateutil +import operator +import calendar +import numpy as np + +from dateutil.tz import tzutc +from pytz import timezone, utc +from datetime import datetime, timedelta +from distutils.version import LooseVersion +from pytz.exceptions import AmbiguousTimeError, NonExistentTimeError + +import pandas.util.testing as tm +from pandas.tseries import offsets, frequencies +from pandas._libs import tslib, period +from pandas._libs.tslib import get_timezone + +from pandas.compat import lrange, long +from pandas.util.testing import assert_series_equal +from pandas.compat.numpy import np_datetime64_compat +from pandas import (Timestamp, date_range, Period, Timedelta, compat, + Series, NaT, DataFrame, DatetimeIndex) +from pandas.tseries.frequencies import (RESO_DAY, RESO_HR, RESO_MIN, RESO_US, + RESO_MS, RESO_SEC) + + +class TestTimestamp(object): + + def test_constructor(self): + base_str = '2014-07-01 09:00' + base_dt = datetime(2014, 7, 1, 9) + base_expected = 1404205200000000000 + + # confirm base representation is correct + import calendar + assert (calendar.timegm(base_dt.timetuple()) * 1000000000 == + base_expected) + + tests = [(base_str, base_dt, base_expected), + ('2014-07-01 10:00', datetime(2014, 7, 1, 10), + base_expected + 3600 * 1000000000), + ('2014-07-01 09:00:00.000008000', + datetime(2014, 7, 1, 9, 0, 0, 8), + base_expected + 8000), + ('2014-07-01 09:00:00.000000005', + Timestamp('2014-07-01 09:00:00.000000005'), + base_expected + 5)] + + timezones = [(None, 0), ('UTC', 0), (pytz.utc, 0), 
('Asia/Tokyo', 9), + ('US/Eastern', -4), ('dateutil/US/Pacific', -7), + (pytz.FixedOffset(-180), -3), + (dateutil.tz.tzoffset(None, 18000), 5)] + + for date_str, date, expected in tests: + for result in [Timestamp(date_str), Timestamp(date)]: + # only with timestring + assert result.value == expected + assert tslib.pydt_to_i8(result) == expected + + # re-creation shouldn't affect to internal value + result = Timestamp(result) + assert result.value == expected + assert tslib.pydt_to_i8(result) == expected + + # with timezone + for tz, offset in timezones: + for result in [Timestamp(date_str, tz=tz), Timestamp(date, + tz=tz)]: + expected_tz = expected - offset * 3600 * 1000000000 + assert result.value == expected_tz + assert tslib.pydt_to_i8(result) == expected_tz + + # should preserve tz + result = Timestamp(result) + assert result.value == expected_tz + assert tslib.pydt_to_i8(result) == expected_tz + + # should convert to UTC + result = Timestamp(result, tz='UTC') + expected_utc = expected - offset * 3600 * 1000000000 + assert result.value == expected_utc + assert tslib.pydt_to_i8(result) == expected_utc + + def test_constructor_with_stringoffset(self): + # GH 7833 + base_str = '2014-07-01 11:00:00+02:00' + base_dt = datetime(2014, 7, 1, 9) + base_expected = 1404205200000000000 + + # confirm base representation is correct + import calendar + assert (calendar.timegm(base_dt.timetuple()) * 1000000000 == + base_expected) + + tests = [(base_str, base_expected), + ('2014-07-01 12:00:00+02:00', + base_expected + 3600 * 1000000000), + ('2014-07-01 11:00:00.000008000+02:00', base_expected + 8000), + ('2014-07-01 11:00:00.000000005+02:00', base_expected + 5)] + + timezones = [(None, 0), ('UTC', 0), (pytz.utc, 0), ('Asia/Tokyo', 9), + ('US/Eastern', -4), ('dateutil/US/Pacific', -7), + (pytz.FixedOffset(-180), -3), + (dateutil.tz.tzoffset(None, 18000), 5)] + + for date_str, expected in tests: + for result in [Timestamp(date_str)]: + # only with timestring + assert result.value == expected + assert tslib.pydt_to_i8(result) == expected + + # re-creation shouldn't affect to internal value + result = Timestamp(result) + assert result.value == expected + assert tslib.pydt_to_i8(result) == expected + + # with timezone + for tz, offset in timezones: + result = Timestamp(date_str, tz=tz) + expected_tz = expected + assert result.value == expected_tz + assert tslib.pydt_to_i8(result) == expected_tz + + # should preserve tz + result = Timestamp(result) + assert result.value == expected_tz + assert tslib.pydt_to_i8(result) == expected_tz + + # should convert to UTC + result = Timestamp(result, tz='UTC') + expected_utc = expected + assert result.value == expected_utc + assert tslib.pydt_to_i8(result) == expected_utc + + # This should be 2013-11-01 05:00 in UTC + # converted to Chicago tz + result = Timestamp('2013-11-01 00:00:00-0500', tz='America/Chicago') + assert result.value == Timestamp('2013-11-01 05:00').value + expected = "Timestamp('2013-11-01 00:00:00-0500', tz='America/Chicago')" # noqa + assert repr(result) == expected + assert result == eval(repr(result)) + + # This should be 2013-11-01 05:00 in UTC + # converted to Tokyo tz (+09:00) + result = Timestamp('2013-11-01 00:00:00-0500', tz='Asia/Tokyo') + assert result.value == Timestamp('2013-11-01 05:00').value + expected = "Timestamp('2013-11-01 14:00:00+0900', tz='Asia/Tokyo')" + assert repr(result) == expected + assert result == eval(repr(result)) + + # GH11708 + # This should be 2015-11-18 10:00 in UTC + # converted to Asia/Katmandu + result = 
Timestamp("2015-11-18 15:45:00+05:45", tz="Asia/Katmandu") + assert result.value == Timestamp("2015-11-18 10:00").value + expected = "Timestamp('2015-11-18 15:45:00+0545', tz='Asia/Katmandu')" + assert repr(result) == expected + assert result == eval(repr(result)) + + # This should be 2015-11-18 10:00 in UTC + # converted to Asia/Kolkata + result = Timestamp("2015-11-18 15:30:00+05:30", tz="Asia/Kolkata") + assert result.value == Timestamp("2015-11-18 10:00").value + expected = "Timestamp('2015-11-18 15:30:00+0530', tz='Asia/Kolkata')" + assert repr(result) == expected + assert result == eval(repr(result)) + + def test_constructor_invalid(self): + with tm.assert_raises_regex(TypeError, 'Cannot convert input'): + Timestamp(slice(2)) + with tm.assert_raises_regex(ValueError, 'Cannot convert Period'): + Timestamp(Period('1000-01-01')) + + def test_constructor_positional(self): + # see gh-10758 + with pytest.raises(TypeError): + Timestamp(2000, 1) + with pytest.raises(ValueError): + Timestamp(2000, 0, 1) + with pytest.raises(ValueError): + Timestamp(2000, 13, 1) + with pytest.raises(ValueError): + Timestamp(2000, 1, 0) + with pytest.raises(ValueError): + Timestamp(2000, 1, 32) + + # see gh-11630 + assert (repr(Timestamp(2015, 11, 12)) == + repr(Timestamp('20151112'))) + assert (repr(Timestamp(2015, 11, 12, 1, 2, 3, 999999)) == + repr(Timestamp('2015-11-12 01:02:03.999999'))) + + def test_constructor_keyword(self): + # GH 10758 + with pytest.raises(TypeError): + Timestamp(year=2000, month=1) + with pytest.raises(ValueError): + Timestamp(year=2000, month=0, day=1) + with pytest.raises(ValueError): + Timestamp(year=2000, month=13, day=1) + with pytest.raises(ValueError): + Timestamp(year=2000, month=1, day=0) + with pytest.raises(ValueError): + Timestamp(year=2000, month=1, day=32) + + assert (repr(Timestamp(year=2015, month=11, day=12)) == + repr(Timestamp('20151112'))) + + assert (repr(Timestamp(year=2015, month=11, day=12, hour=1, minute=2, + second=3, microsecond=999999)) == + repr(Timestamp('2015-11-12 01:02:03.999999'))) + + def test_constructor_fromordinal(self): + base = datetime(2000, 1, 1) + + ts = Timestamp.fromordinal(base.toordinal(), freq='D') + assert base == ts + assert ts.freq == 'D' + assert base.toordinal() == ts.toordinal() + + ts = Timestamp.fromordinal(base.toordinal(), tz='US/Eastern') + assert Timestamp('2000-01-01', tz='US/Eastern') == ts + assert base.toordinal() == ts.toordinal() + + def test_constructor_offset_depr(self): + # see gh-12160 + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + ts = Timestamp('2011-01-01', offset='D') + assert ts.freq == 'D' + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + assert ts.offset == 'D' + + msg = "Can only specify freq or offset, not both" + with tm.assert_raises_regex(TypeError, msg): + Timestamp('2011-01-01', offset='D', freq='D') + + def test_constructor_offset_depr_fromordinal(self): + # GH 12160 + base = datetime(2000, 1, 1) + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + ts = Timestamp.fromordinal(base.toordinal(), offset='D') + assert Timestamp('2000-01-01') == ts + assert ts.freq == 'D' + assert base.toordinal() == ts.toordinal() + + msg = "Can only specify freq or offset, not both" + with tm.assert_raises_regex(TypeError, msg): + Timestamp.fromordinal(base.toordinal(), offset='D', freq='D') + + def test_conversion(self): + # GH 9255 + ts = Timestamp('2000-01-01') + + result = ts.to_pydatetime() + expected = datetime(2000, 1, 1) + 
assert result == expected + assert type(result) == type(expected) + + result = ts.to_datetime64() + expected = np.datetime64(ts.value, 'ns') + assert result == expected + assert type(result) == type(expected) + assert result.dtype == expected.dtype + + def test_repr(self): + dates = ['2014-03-07', '2014-01-01 09:00', + '2014-01-01 00:00:00.000000001'] + + # dateutil zone change (only matters for repr) + if (dateutil.__version__ >= LooseVersion('2.3') and + (dateutil.__version__ <= LooseVersion('2.4.0') or + dateutil.__version__ >= LooseVersion('2.6.0'))): + timezones = ['UTC', 'Asia/Tokyo', 'US/Eastern', + 'dateutil/US/Pacific'] + else: + timezones = ['UTC', 'Asia/Tokyo', 'US/Eastern', + 'dateutil/America/Los_Angeles'] + + freqs = ['D', 'M', 'S', 'N'] + + for date in dates: + for tz in timezones: + for freq in freqs: + + # avoid to match with timezone name + freq_repr = "'{0}'".format(freq) + if tz.startswith('dateutil'): + tz_repr = tz.replace('dateutil', '') + else: + tz_repr = tz + + date_only = Timestamp(date) + assert date in repr(date_only) + assert tz_repr not in repr(date_only) + assert freq_repr not in repr(date_only) + assert date_only == eval(repr(date_only)) + + date_tz = Timestamp(date, tz=tz) + assert date in repr(date_tz) + assert tz_repr in repr(date_tz) + assert freq_repr not in repr(date_tz) + assert date_tz == eval(repr(date_tz)) + + date_freq = Timestamp(date, freq=freq) + assert date in repr(date_freq) + assert tz_repr not in repr(date_freq) + assert freq_repr in repr(date_freq) + assert date_freq == eval(repr(date_freq)) + + date_tz_freq = Timestamp(date, tz=tz, freq=freq) + assert date in repr(date_tz_freq) + assert tz_repr in repr(date_tz_freq) + assert freq_repr in repr(date_tz_freq) + assert date_tz_freq == eval(repr(date_tz_freq)) + + # This can cause the tz field to be populated, but it's redundant to + # include this information in the date-string. 
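test_repr builds on the property that a Timestamp's repr is executable Python. A sketch of the round-trip for the simple tz-name case (the fixed-offset case just below additionally needs `pytz` in scope so `eval` can resolve `pytz.FixedOffset`):

```python
from pandas import Timestamp

ts = Timestamp('2014-03-07 09:00', tz='Asia/Tokyo')
# repr() emits a valid constructor call, so eval() reconstructs an
# equal Timestamp.
assert ts == eval(repr(ts))
```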
+ date_with_utc_offset = Timestamp('2014-03-13 00:00:00-0400', tz=None) + assert '2014-03-13 00:00:00-0400' in repr(date_with_utc_offset) + assert 'tzoffset' not in repr(date_with_utc_offset) + assert 'pytz.FixedOffset(-240)' in repr(date_with_utc_offset) + expr = repr(date_with_utc_offset).replace("'pytz.FixedOffset(-240)'", + 'pytz.FixedOffset(-240)') + assert date_with_utc_offset == eval(expr) + + def test_bounds_with_different_units(self): + out_of_bounds_dates = ('1677-09-21', '2262-04-12', ) + + time_units = ('D', 'h', 'm', 's', 'ms', 'us') + + for date_string in out_of_bounds_dates: + for unit in time_units: + pytest.raises(ValueError, Timestamp, np.datetime64( + date_string, dtype='M8[%s]' % unit)) + + in_bounds_dates = ('1677-09-23', '2262-04-11', ) + + for date_string in in_bounds_dates: + for unit in time_units: + Timestamp(np.datetime64(date_string, dtype='M8[%s]' % unit)) + + def test_tz(self): + t = '2014-02-01 09:00' + ts = Timestamp(t) + local = ts.tz_localize('Asia/Tokyo') + assert local.hour == 9 + assert local == Timestamp(t, tz='Asia/Tokyo') + conv = local.tz_convert('US/Eastern') + assert conv == Timestamp('2014-01-31 19:00', tz='US/Eastern') + assert conv.hour == 19 + + # preserves nanosecond + ts = Timestamp(t) + offsets.Nano(5) + local = ts.tz_localize('Asia/Tokyo') + assert local.hour == 9 + assert local.nanosecond == 5 + conv = local.tz_convert('US/Eastern') + assert conv.nanosecond == 5 + assert conv.hour == 19 + + def test_tz_localize_ambiguous(self): + + ts = Timestamp('2014-11-02 01:00') + ts_dst = ts.tz_localize('US/Eastern', ambiguous=True) + ts_no_dst = ts.tz_localize('US/Eastern', ambiguous=False) + + rng = date_range('2014-11-02', periods=3, freq='H', tz='US/Eastern') + assert rng[1] == ts_dst + assert rng[2] == ts_no_dst + pytest.raises(ValueError, ts.tz_localize, 'US/Eastern', + ambiguous='infer') + + # GH 8025 + with tm.assert_raises_regex(TypeError, + 'Cannot localize tz-aware Timestamp, ' + 'use tz_convert for conversions'): + Timestamp('2011-01-01', tz='US/Eastern').tz_localize('Asia/Tokyo') + + with tm.assert_raises_regex(TypeError, + 'Cannot convert tz-naive Timestamp, ' + 'use tz_localize to localize'): + Timestamp('2011-01-01').tz_convert('Asia/Tokyo') + + def test_tz_localize_nonexistent(self): + # see gh-13057 + times = ['2015-03-08 02:00', '2015-03-08 02:30', + '2015-03-29 02:00', '2015-03-29 02:30'] + timezones = ['US/Eastern', 'US/Pacific', + 'Europe/Paris', 'Europe/Belgrade'] + for t, tz in zip(times, timezones): + ts = Timestamp(t) + pytest.raises(NonExistentTimeError, ts.tz_localize, + tz) + pytest.raises(NonExistentTimeError, ts.tz_localize, + tz, errors='raise') + assert ts.tz_localize(tz, errors='coerce') is NaT + + def test_tz_localize_errors_ambiguous(self): + # see gh-13057 + ts = Timestamp('2015-11-1 01:00') + pytest.raises(AmbiguousTimeError, + ts.tz_localize, 'US/Pacific', errors='coerce') + + def test_tz_localize_roundtrip(self): + for tz in ['UTC', 'Asia/Tokyo', 'US/Eastern', 'dateutil/US/Pacific']: + for t in ['2014-02-01 09:00', '2014-07-08 09:00', + '2014-11-01 17:00', '2014-11-05 00:00']: + ts = Timestamp(t) + localized = ts.tz_localize(tz) + assert localized == Timestamp(t, tz=tz) + + with pytest.raises(TypeError): + localized.tz_localize(tz) + + reset = localized.tz_localize(None) + assert reset == ts + assert reset.tzinfo is None + + def test_tz_convert_roundtrip(self): + for tz in ['UTC', 'Asia/Tokyo', 'US/Eastern', 'dateutil/US/Pacific']: + for t in ['2014-02-01 09:00', '2014-07-08 09:00', + '2014-11-01 17:00', 
'2014-11-05 00:00']: + ts = Timestamp(t, tz='UTC') + converted = ts.tz_convert(tz) + + reset = converted.tz_convert(None) + assert reset == Timestamp(t) + assert reset.tzinfo is None + assert reset == converted.tz_convert('UTC').tz_localize(None) + + def test_barely_oob_dts(self): + one_us = np.timedelta64(1).astype('timedelta64[us]') + + # By definition we can't go out of bounds in [ns], so we + # convert the datetime64s to [us] so we can go out of bounds + min_ts_us = np.datetime64(Timestamp.min).astype('M8[us]') + max_ts_us = np.datetime64(Timestamp.max).astype('M8[us]') + + # No error for the min/max datetimes + Timestamp(min_ts_us) + Timestamp(max_ts_us) + + # One us less than the minimum is an error + pytest.raises(ValueError, Timestamp, min_ts_us - one_us) + + # One us more than the maximum is an error + pytest.raises(ValueError, Timestamp, max_ts_us + one_us) + + def test_utc_z_designator(self): + assert get_timezone(Timestamp('2014-11-02 01:00Z').tzinfo) == 'UTC' + + def test_now(self): + # #9000 + ts_from_string = Timestamp('now') + ts_from_method = Timestamp.now() + ts_datetime = datetime.now() + + ts_from_string_tz = Timestamp('now', tz='US/Eastern') + ts_from_method_tz = Timestamp.now(tz='US/Eastern') + + # Check that the delta between the times is less than 1s (arbitrarily + # small) + delta = Timedelta(seconds=1) + assert abs(ts_from_method - ts_from_string) < delta + assert abs(ts_datetime - ts_from_method) < delta + assert abs(ts_from_method_tz - ts_from_string_tz) < delta + assert (abs(ts_from_string_tz.tz_localize(None) - + ts_from_method_tz.tz_localize(None)) < delta) + + def test_today(self): + + ts_from_string = Timestamp('today') + ts_from_method = Timestamp.today() + ts_datetime = datetime.today() + + ts_from_string_tz = Timestamp('today', tz='US/Eastern') + ts_from_method_tz = Timestamp.today(tz='US/Eastern') + + # Check that the delta between the times is less than 1s (arbitrarily + # small) + delta = Timedelta(seconds=1) + assert abs(ts_from_method - ts_from_string) < delta + assert abs(ts_datetime - ts_from_method) < delta + assert abs(ts_from_method_tz - ts_from_string_tz) < delta + assert (abs(ts_from_string_tz.tz_localize(None) - + ts_from_method_tz.tz_localize(None)) < delta) + + def test_asm8(self): + np.random.seed(7960929) + ns = [Timestamp.min.value, Timestamp.max.value, 1000] + + for n in ns: + assert (Timestamp(n).asm8.view('i8') == + np.datetime64(n, 'ns').view('i8') == n) + + assert (Timestamp('nat').asm8.view('i8') == + np.datetime64('nat', 'ns').view('i8')) + + def test_fields(self): + def check(value, equal): + # that we are int/long like + assert isinstance(value, (int, compat.long)) + assert value == equal + + # GH 10050 + ts = Timestamp('2015-05-10 09:06:03.000100001') + check(ts.year, 2015) + check(ts.month, 5) + check(ts.day, 10) + check(ts.hour, 9) + check(ts.minute, 6) + check(ts.second, 3) + pytest.raises(AttributeError, lambda: ts.millisecond) + check(ts.microsecond, 100) + check(ts.nanosecond, 1) + check(ts.dayofweek, 6) + check(ts.quarter, 2) + check(ts.dayofyear, 130) + check(ts.week, 19) + check(ts.daysinmonth, 31) + + # GH 13303 + ts = Timestamp('2014-12-31 23:59:00-05:00', tz='US/Eastern') + check(ts.year, 2014) + check(ts.month, 12) + check(ts.day, 31) + check(ts.hour, 23) + check(ts.minute, 59) + check(ts.second, 0) + pytest.raises(AttributeError, lambda: ts.millisecond) + check(ts.microsecond, 0) + check(ts.nanosecond, 0) + check(ts.dayofweek, 2) + check(ts.quarter, 4) + check(ts.dayofyear, 365) +
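The `check` calls above and below verify that every datetime field comes back as a plain integer. For orientation, a sketch using the same fixture (pandas assumed importable):

```python
from pandas import Timestamp

ts = Timestamp('2015-05-10 09:06:03.000100001')
assert (ts.year, ts.month, ts.day) == (2015, 5, 10)
# Nanoseconds live in their own field rather than in microsecond.
assert (ts.microsecond, ts.nanosecond) == (100, 1)
assert ts.dayofweek == 6  # Monday=0, so 2015-05-10 is a Sunday
```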
check(ts.week, 1) + check(ts.daysinmonth, 31) + + ts = Timestamp('2014-01-01 00:00:00+01:00') + starts = ['is_month_start', 'is_quarter_start', 'is_year_start'] + for start in starts: + assert getattr(ts, start) + ts = Timestamp('2014-12-31 23:59:59+01:00') + ends = ['is_month_end', 'is_year_end', 'is_quarter_end'] + for end in ends: + assert getattr(ts, end) + + def test_pprint(self): + # GH12622 + import pprint + nested_obj = {'foo': 1, + 'bar': [{'w': {'a': Timestamp('2011-01-01')}}] * 10} + result = pprint.pformat(nested_obj, width=50) + expected = r"""{'bar': [{'w': {'a': Timestamp('2011-01-01 00:00:00')}}, + {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, + {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, + {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, + {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, + {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, + {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, + {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, + {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, + {'w': {'a': Timestamp('2011-01-01 00:00:00')}}], + 'foo': 1}""" + assert result == expected + + def test_to_datetime_depr(self): + # see gh-8254 + ts = Timestamp('2011-01-01') + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + expected = datetime(2011, 1, 1) + result = ts.to_datetime() + assert result == expected + + def test_to_pydatetime_nonzero_nano(self): + ts = Timestamp('2011-01-01 9:00:00.123456789') + + # Warn the user of data loss (nanoseconds). + with tm.assert_produces_warning(UserWarning, + check_stacklevel=False): + expected = datetime(2011, 1, 1, 9, 0, 0, 123456) + result = ts.to_pydatetime() + assert result == expected + + def test_round(self): + + # round + dt = Timestamp('20130101 09:10:11') + result = dt.round('D') + expected = Timestamp('20130101') + assert result == expected + + dt = Timestamp('20130101 19:10:11') + result = dt.round('D') + expected = Timestamp('20130102') + assert result == expected + + dt = Timestamp('20130201 12:00:00') + result = dt.round('D') + expected = Timestamp('20130202') + assert result == expected + + dt = Timestamp('20130104 12:00:00') + result = dt.round('D') + expected = Timestamp('20130105') + assert result == expected + + dt = Timestamp('20130104 12:32:00') + result = dt.round('30Min') + expected = Timestamp('20130104 12:30:00') + assert result == expected + + dti = date_range('20130101 09:10:11', periods=5) + result = dti.round('D') + expected = date_range('20130101', periods=5) + tm.assert_index_equal(result, expected) + + # floor + dt = Timestamp('20130101 09:10:11') + result = dt.floor('D') + expected = Timestamp('20130101') + assert result == expected + + # ceil + dt = Timestamp('20130101 09:10:11') + result = dt.ceil('D') + expected = Timestamp('20130102') + assert result == expected + + # round with tz + dt = Timestamp('20130101 09:10:11', tz='US/Eastern') + result = dt.round('D') + expected = Timestamp('20130101', tz='US/Eastern') + assert result == expected + + dt = Timestamp('20130101 09:10:11', tz='US/Eastern') + result = dt.round('s') + assert result == dt + + dti = date_range('20130101 09:10:11', + periods=5).tz_localize('UTC').tz_convert('US/Eastern') + result = dti.round('D') + expected = date_range('20130101', periods=5).tz_localize('US/Eastern') + tm.assert_index_equal(result, expected) + + result = dti.round('s') + tm.assert_index_equal(result, dti) + + # invalid + for freq in ['Y', 'M', 'foobar']: + pytest.raises(ValueError, lambda: dti.round(freq)) + + # GH 14440 & 15578 + result =
Timestamp('2016-10-17 12:00:00.0015').round('ms')
+        expected = Timestamp('2016-10-17 12:00:00.002000')
+        assert result == expected
+
+        result = Timestamp('2016-10-17 12:00:00.00149').round('ms')
+        expected = Timestamp('2016-10-17 12:00:00.001000')
+        assert result == expected
+
+        ts = Timestamp('2016-10-17 12:00:00.0015')
+        for freq in ['us', 'ns']:
+            assert ts == ts.round(freq)
+
+        result = Timestamp('2016-10-17 12:00:00.001501031').round('10ns')
+        expected = Timestamp('2016-10-17 12:00:00.001501030')
+        assert result == expected
+
+        with tm.assert_produces_warning():
+            Timestamp('2016-10-17 12:00:00.001501031').round('1010ns')
+
+    def test_round_misc(self):
+        stamp = Timestamp('2000-01-05 05:09:15.13')
+
+        def _check_round(freq, expected):
+            result = stamp.round(freq=freq)
+            assert result == expected
+
+        for freq, expected in [('D', Timestamp('2000-01-05 00:00:00')),
+                               ('H', Timestamp('2000-01-05 05:00:00')),
+                               ('S', Timestamp('2000-01-05 05:09:15'))]:
+            _check_round(freq, expected)
+
+        msg = frequencies._INVALID_FREQ_ERROR
+        with tm.assert_raises_regex(ValueError, msg):
+            stamp.round('foo')
+
+    def test_class_ops_pytz(self):
+        def compare(x, y):
+            assert (int(Timestamp(x).value / 1e9) ==
+                    int(Timestamp(y).value / 1e9))
+
+        compare(Timestamp.now(), datetime.now())
+        compare(Timestamp.now('UTC'), datetime.now(timezone('UTC')))
+        compare(Timestamp.utcnow(), datetime.utcnow())
+        compare(Timestamp.today(), datetime.today())
+        current_time = calendar.timegm(datetime.now().utctimetuple())
+        compare(Timestamp.utcfromtimestamp(current_time),
+                datetime.utcfromtimestamp(current_time))
+        compare(Timestamp.fromtimestamp(current_time),
+                datetime.fromtimestamp(current_time))
+
+        date_component = datetime.utcnow()
+        time_component = (date_component + timedelta(minutes=10)).time()
+        compare(Timestamp.combine(date_component, time_component),
+                datetime.combine(date_component, time_component))
+
+    def test_class_ops_dateutil(self):
+        def compare(x, y):
+            assert (int(np.round(Timestamp(x).value / 1e9)) ==
+                    int(np.round(Timestamp(y).value / 1e9)))
+
+        compare(Timestamp.now(), datetime.now())
+        compare(Timestamp.now('UTC'), datetime.now(tzutc()))
+        compare(Timestamp.utcnow(), datetime.utcnow())
+        compare(Timestamp.today(), datetime.today())
+        current_time = calendar.timegm(datetime.now().utctimetuple())
+        compare(Timestamp.utcfromtimestamp(current_time),
+                datetime.utcfromtimestamp(current_time))
+        compare(Timestamp.fromtimestamp(current_time),
+                datetime.fromtimestamp(current_time))
+
+        date_component = datetime.utcnow()
+        time_component = (date_component + timedelta(minutes=10)).time()
+        compare(Timestamp.combine(date_component, time_component),
+                datetime.combine(date_component, time_component))
+
+    def test_basics_nanos(self):
+        val = np.int64(946684800000000000).view('M8[ns]')
+        stamp = Timestamp(val.view('i8') + 500)
+        assert stamp.year == 2000
+        assert stamp.month == 1
+        assert stamp.microsecond == 0
+        assert stamp.nanosecond == 500
+
+        # GH 14415
+        val = np.iinfo(np.int64).min + 80000000000000
+        stamp = Timestamp(val)
+        assert stamp.year == 1677
+        assert stamp.month == 9
+        assert stamp.day == 21
+        assert stamp.microsecond == 145224
+        assert stamp.nanosecond == 192
+
+    def test_unit(self):
+
+        def check(val, unit=None, h=1, s=1, us=0):
+            stamp = Timestamp(val, unit=unit)
+            assert stamp.year == 2000
+            assert stamp.month == 1
+            assert stamp.day == 1
+            assert stamp.hour == h
+            if unit != 'D':
+                assert stamp.minute == 1
+                assert stamp.second == s
+                assert stamp.microsecond == us
+            else:
+                assert stamp.minute == 0
+                assert stamp.second == 0
+                assert stamp.microsecond == 0
+            assert stamp.nanosecond == 0
+
+        ts = Timestamp('20000101 01:01:01')
+        val = ts.value
+        days = (ts - Timestamp('1970-01-01')).days
+
+        check(val)
+        check(val / long(1000), unit='us')
+        check(val / long(1000000), unit='ms')
+        check(val / long(1000000000), unit='s')
+        check(days, unit='D', h=0)
+
+        # using truediv, so these are like floats
+        if compat.PY3:
+            check((val + 500000) / long(1000000000), unit='s', us=500)
+            check((val + 500000000) / long(1000000000), unit='s', us=500000)
+            check((val + 500000) / long(1000000), unit='ms', us=500)
+
+        # get chopped in py2
+        else:
+            check((val + 500000) / long(1000000000), unit='s')
+            check((val + 500000000) / long(1000000000), unit='s')
+            check((val + 500000) / long(1000000), unit='ms')
+
+        # ok
+        check((val + 500000) / long(1000), unit='us', us=500)
+        check((val + 500000000) / long(1000000), unit='ms', us=500000)
+
+        # floats
+        check(val / 1000.0 + 5, unit='us', us=5)
+        check(val / 1000.0 + 5000, unit='us', us=5000)
+        check(val / 1000000.0 + 0.5, unit='ms', us=500)
+        check(val / 1000000.0 + 0.005, unit='ms', us=5)
+        check(val / 1000000000.0 + 0.5, unit='s', us=500000)
+        check(days + 0.5, unit='D', h=12)
+
+    def test_roundtrip(self):
+
+        # test value to string and back conversions
+        # further test accessors
+        base = Timestamp('20140101 00:00:00')
+
+        result = Timestamp(base.value + Timedelta('5ms').value)
+        assert result == Timestamp(str(base) + ".005000")
+        assert result.microsecond == 5000
+
+        result = Timestamp(base.value + Timedelta('5us').value)
+        assert result == Timestamp(str(base) + ".000005")
+        assert result.microsecond == 5
+
+        result = Timestamp(base.value + Timedelta('5ns').value)
+        assert result == Timestamp(str(base) + ".000000005")
+        assert result.nanosecond == 5
+        assert result.microsecond == 0
+
+        result = Timestamp(base.value + Timedelta('6ms 5us').value)
+        assert result == Timestamp(str(base) + ".006005")
+        assert result.microsecond == 5 + 6 * 1000
+
+        result = Timestamp(base.value + Timedelta('200ms 5us').value)
+        assert result == Timestamp(str(base) + ".200005")
+        assert result.microsecond == 5 + 200 * 1000
+
+    def test_comparison(self):
+        # 5-18-2012 00:00:00.000
+        stamp = long(1337299200000000000)
+
+        val = Timestamp(stamp)
+
+        assert val == val
+        assert not val != val
+        assert not val < val
+        assert val <= val
+        assert not val > val
+        assert val >= val
+
+        other = datetime(2012, 5, 18)
+        assert val == other
+        assert not val != other
+        assert not val < other
+        assert val <= other
+        assert not val > other
+        assert val >= other
+
+        other = Timestamp(stamp + 100)
+
+        assert val != other
+        assert val != other
+        assert val < other
+        assert val <= other
+        assert other > val
+        assert other >= val
+
+    def test_compare_invalid(self):
+
+        # GH 8058
+        val = Timestamp('20130101 12:01:02')
+        assert not val == 'foo'
+        assert not val == 10.0
+        assert not val == 1
+        assert not val == long(1)
+        assert not val == []
+        assert not val == {'foo': 1}
+        assert not val == np.float64(1)
+        assert not val == np.int64(1)
+
+        assert val != 'foo'
+        assert val != 10.0
+        assert val != 1
+        assert val != long(1)
+        assert val != []
+        assert val != {'foo': 1}
+        assert val != np.float64(1)
+        assert val != np.int64(1)
+
+        # ops testing
+        df = DataFrame(np.random.randn(5, 2))
+        a = df[0]
+        b = Series(np.random.randn(5))
+        b.name = Timestamp('2000-01-01')
+        tm.assert_series_equal(a / b, 1 / (b / a))
+
+    def test_cant_compare_tz_naive_w_aware(self):
+        # see gh-1404
+        a = Timestamp('3/12/2012')
+        b = Timestamp('3/12/2012', tz='utc')
+
+        pytest.raises(Exception, a.__eq__, b)
+        pytest.raises(Exception, a.__ne__, b)
+        pytest.raises(Exception, a.__lt__, b)
+        pytest.raises(Exception, a.__gt__, b)
+        pytest.raises(Exception, b.__eq__, a)
+        pytest.raises(Exception, b.__ne__, a)
+        pytest.raises(Exception, b.__lt__, a)
+        pytest.raises(Exception, b.__gt__, a)
+
+        if sys.version_info < (3, 3):
+            pytest.raises(Exception, a.__eq__, b.to_pydatetime())
+            pytest.raises(Exception, a.to_pydatetime().__eq__, b)
+        else:
+            assert not a == b.to_pydatetime()
+            assert not a.to_pydatetime() == b
+
+    def test_cant_compare_tz_naive_w_aware_explicit_pytz(self):
+        # see gh-1404
+        a = Timestamp('3/12/2012')
+        b = Timestamp('3/12/2012', tz=utc)
+
+        pytest.raises(Exception, a.__eq__, b)
+        pytest.raises(Exception, a.__ne__, b)
+        pytest.raises(Exception, a.__lt__, b)
+        pytest.raises(Exception, a.__gt__, b)
+        pytest.raises(Exception, b.__eq__, a)
+        pytest.raises(Exception, b.__ne__, a)
+        pytest.raises(Exception, b.__lt__, a)
+        pytest.raises(Exception, b.__gt__, a)
+
+        if sys.version_info < (3, 3):
+            pytest.raises(Exception, a.__eq__, b.to_pydatetime())
+            pytest.raises(Exception, a.to_pydatetime().__eq__, b)
+        else:
+            assert not a == b.to_pydatetime()
+            assert not a.to_pydatetime() == b
+
+    def test_cant_compare_tz_naive_w_aware_dateutil(self):
+        # see gh-1404
+        a = Timestamp('3/12/2012')
+        b = Timestamp('3/12/2012', tz=tzutc())
+
+        pytest.raises(Exception, a.__eq__, b)
+        pytest.raises(Exception, a.__ne__, b)
+        pytest.raises(Exception, a.__lt__, b)
+        pytest.raises(Exception, a.__gt__, b)
+        pytest.raises(Exception, b.__eq__, a)
+        pytest.raises(Exception, b.__ne__, a)
+        pytest.raises(Exception, b.__lt__, a)
+        pytest.raises(Exception, b.__gt__, a)
+
+        if sys.version_info < (3, 3):
+            pytest.raises(Exception, a.__eq__, b.to_pydatetime())
+            pytest.raises(Exception, a.to_pydatetime().__eq__, b)
+        else:
+            assert not a == b.to_pydatetime()
+            assert not a.to_pydatetime() == b
+
+    def test_delta_preserve_nanos(self):
+        val = Timestamp(long(1337299200000000123))
+        result = val + timedelta(1)
+        assert result.nanosecond == val.nanosecond
+
+    def test_frequency_misc(self):
+        assert (frequencies.get_freq_group('T') ==
+                frequencies.FreqGroup.FR_MIN)
+
+        code, stride = frequencies.get_freq_code(offsets.Hour())
+        assert code == frequencies.FreqGroup.FR_HR
+
+        code, stride = frequencies.get_freq_code((5, 'T'))
+        assert code == frequencies.FreqGroup.FR_MIN
+        assert stride == 5
+
+        offset = offsets.Hour()
+        result = frequencies.to_offset(offset)
+        assert result == offset
+
+        result = frequencies.to_offset((5, 'T'))
+        expected = offsets.Minute(5)
+        assert result == expected
+
+        pytest.raises(ValueError, frequencies.get_freq_code, (5, 'baz'))
+
+        pytest.raises(ValueError, frequencies.to_offset, '100foo')
+
+        pytest.raises(ValueError, frequencies.to_offset, ('', ''))
+
+        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+            result = frequencies.get_standard_freq(offsets.Hour())
+        assert result == 'H'
+
+    def test_hash_equivalent(self):
+        d = {datetime(2011, 1, 1): 5}
+        stamp = Timestamp(datetime(2011, 1, 1))
+        assert d[stamp] == 5
+
+    def test_timestamp_compare_scalars(self):
+        # case where ndim == 0
+        lhs = np.datetime64(datetime(2013, 12, 6))
+        rhs = Timestamp('now')
+        nat = Timestamp('nat')
+
+        ops = {'gt': 'lt',
+               'lt': 'gt',
+               'ge': 'le',
+               'le': 'ge',
+               'eq': 'eq',
+               'ne': 'ne'}
+
+        for left, right in ops.items():
+            left_f = getattr(operator, left)
+            right_f = getattr(operator, right)
+            expected = left_f(lhs, rhs)
+
+            result = right_f(rhs, lhs)
+            assert result == expected
+
+            expected = left_f(rhs, nat)
+            result = right_f(nat, rhs)
+            assert result == expected
+
+    def test_timestamp_compare_series(self):
+        # make sure we can compare Timestamps on the right AND left hand side
+        # GH4982
+        s = Series(date_range('20010101', periods=10), name='dates')
+        s_nat = s.copy(deep=True)
+
+        s[0] = Timestamp('nat')
+        s[3] = Timestamp('nat')
+
+        ops = {'lt': 'gt', 'le': 'ge', 'eq': 'eq', 'ne': 'ne'}
+
+        for left, right in ops.items():
+            left_f = getattr(operator, left)
+            right_f = getattr(operator, right)
+
+            # no nats
+            expected = left_f(s, Timestamp('20010109'))
+            result = right_f(Timestamp('20010109'), s)
+            tm.assert_series_equal(result, expected)
+
+            # nats
+            expected = left_f(s, Timestamp('nat'))
+            result = right_f(Timestamp('nat'), s)
+            tm.assert_series_equal(result, expected)
+
+            # compare to timestamp with series containing nats
+            expected = left_f(s_nat, Timestamp('20010109'))
+            result = right_f(Timestamp('20010109'), s_nat)
+            tm.assert_series_equal(result, expected)
+
+            # compare to nat with series containing nats
+            expected = left_f(s_nat, Timestamp('nat'))
+            result = right_f(Timestamp('nat'), s_nat)
+            tm.assert_series_equal(result, expected)
+
+    def test_is_leap_year(self):
+        # GH 13727
+        for tz in [None, 'UTC', 'US/Eastern', 'Asia/Tokyo']:
+            dt = Timestamp('2000-01-01 00:00:00', tz=tz)
+            assert dt.is_leap_year
+            assert isinstance(dt.is_leap_year, bool)
+
+            dt = Timestamp('1999-01-01 00:00:00', tz=tz)
+            assert not dt.is_leap_year
+
+            dt = Timestamp('2004-01-01 00:00:00', tz=tz)
+            assert dt.is_leap_year
+
+            dt = Timestamp('2100-01-01 00:00:00', tz=tz)
+            assert not dt.is_leap_year
+
+
+class TestTimestampNsOperations(object):
+
+    def setup_method(self, method):
+        self.timestamp = Timestamp(datetime.utcnow())
+
+    def assert_ns_timedelta(self, modified_timestamp, expected_value):
+        value = self.timestamp.value
+        modified_value = modified_timestamp.value
+
+        assert modified_value - value == expected_value
+
+    def test_timedelta_ns_arithmetic(self):
+        self.assert_ns_timedelta(self.timestamp + np.timedelta64(-123, 'ns'),
+                                 -123)
+
+    def test_timedelta_ns_based_arithmetic(self):
+        self.assert_ns_timedelta(self.timestamp + np.timedelta64(
+            1234567898, 'ns'), 1234567898)
+
+    def test_timedelta_us_arithmetic(self):
+        self.assert_ns_timedelta(self.timestamp + np.timedelta64(-123, 'us'),
+                                 -123000)
+
+    def test_timedelta_ms_arithmetic(self):
+        time = self.timestamp + np.timedelta64(-123, 'ms')
+        self.assert_ns_timedelta(time, -123000000)
+
+    def test_nanosecond_string_parsing(self):
+        ts = Timestamp('2013-05-01 07:15:45.123456789')
+        # GH 7878
+        expected_repr = '2013-05-01 07:15:45.123456789'
+        expected_value = 1367392545123456789
+        assert ts.value == expected_value
+        assert expected_repr in repr(ts)
+
+        ts = Timestamp('2013-05-01 07:15:45.123456789+09:00', tz='Asia/Tokyo')
+        assert ts.value == expected_value - 9 * 3600 * 1000000000
+        assert expected_repr in repr(ts)
+
+        ts = Timestamp('2013-05-01 07:15:45.123456789', tz='UTC')
+        assert ts.value == expected_value
+        assert expected_repr in repr(ts)
+
+        ts = Timestamp('2013-05-01 07:15:45.123456789', tz='US/Eastern')
+        assert ts.value == expected_value + 4 * 3600 * 1000000000
+        assert expected_repr in repr(ts)
+
+        # GH 10041
+        ts = Timestamp('20130501T071545.123456789')
+        assert ts.value == expected_value
+        assert expected_repr in repr(ts)
+
+    def test_nanosecond_timestamp(self):
+        # GH 7610
+        expected = 1293840000000000005
+        t = Timestamp('2011-01-01') + offsets.Nano(5)
+        assert repr(t) == "Timestamp('2011-01-01 00:00:00.000000005')"
+        assert t.value == expected
+        assert t.nanosecond == 5
+
+        t = Timestamp(t)
+        assert repr(t) == "Timestamp('2011-01-01 00:00:00.000000005')"
+        assert t.value == expected
+        assert t.nanosecond == 5
+
+        t = Timestamp(np_datetime64_compat('2011-01-01 00:00:00.000000005Z'))
+        assert repr(t) == "Timestamp('2011-01-01 00:00:00.000000005')"
+        assert t.value == expected
+        assert t.nanosecond == 5
+
+        expected = 1293840000000000010
+        t = t + offsets.Nano(5)
+        assert repr(t) == "Timestamp('2011-01-01 00:00:00.000000010')"
+        assert t.value == expected
+        assert t.nanosecond == 10
+
+        t = Timestamp(t)
+        assert repr(t) == "Timestamp('2011-01-01 00:00:00.000000010')"
+        assert t.value == expected
+        assert t.nanosecond == 10
+
+        t = Timestamp(np_datetime64_compat('2011-01-01 00:00:00.000000010Z'))
+        assert repr(t) == "Timestamp('2011-01-01 00:00:00.000000010')"
+        assert t.value == expected
+        assert t.nanosecond == 10
+
+
+class TestTimestampOps(object):
+
+    def test_timestamp_and_datetime(self):
+        assert ((Timestamp(datetime(2013, 10, 13)) -
+                 datetime(2013, 10, 12)).days == 1)
+        assert ((datetime(2013, 10, 12) -
+                 Timestamp(datetime(2013, 10, 13))).days == -1)
+
+    def test_timestamp_and_series(self):
+        timestamp_series = Series(date_range('2014-03-17', periods=2, freq='D',
+                                             tz='US/Eastern'))
+        first_timestamp = timestamp_series[0]
+
+        delta_series = Series([np.timedelta64(0, 'D'), np.timedelta64(1, 'D')])
+        assert_series_equal(timestamp_series - first_timestamp, delta_series)
+        assert_series_equal(first_timestamp - timestamp_series, -delta_series)
+
+    def test_addition_subtraction_types(self):
+        # Assert on the types resulting from Timestamp +/- various date/time
+        # objects
+        datetime_instance = datetime(2014, 3, 4)
+        timedelta_instance = timedelta(seconds=1)
+        # build a timestamp with a frequency, since then it supports
+        # addition/subtraction of integers
+        timestamp_instance = date_range(datetime_instance, periods=1,
+                                        freq='D')[0]
+
+        assert type(timestamp_instance + 1) == Timestamp
+        assert type(timestamp_instance - 1) == Timestamp
+
+        # Timestamp + datetime not supported, though subtraction is supported
+        # and yields timedelta more tests in tseries/base/tests/test_base.py
+        assert type(timestamp_instance - datetime_instance) == Timedelta
+        assert type(timestamp_instance + timedelta_instance) == Timestamp
+        assert type(timestamp_instance - timedelta_instance) == Timestamp
+
+        # Timestamp +/- datetime64 not supported, so not tested (could possibly
+        # assert error raised?)
+        timedelta64_instance = np.timedelta64(1, 'D')
+        assert type(timestamp_instance + timedelta64_instance) == Timestamp
+        assert type(timestamp_instance - timedelta64_instance) == Timestamp
+
+    def test_addition_subtraction_preserve_frequency(self):
+        timestamp_instance = date_range('2014-03-05', periods=1, freq='D')[0]
+        timedelta_instance = timedelta(days=1)
+        original_freq = timestamp_instance.freq
+
+        assert (timestamp_instance + 1).freq == original_freq
+        assert (timestamp_instance - 1).freq == original_freq
+        assert (timestamp_instance + timedelta_instance).freq == original_freq
+        assert (timestamp_instance - timedelta_instance).freq == original_freq
+
+        timedelta64_instance = np.timedelta64(1, 'D')
+        assert (timestamp_instance +
+                timedelta64_instance).freq == original_freq
+        assert (timestamp_instance -
+                timedelta64_instance).freq == original_freq
+
+    def test_resolution(self):
+
+        for freq, expected in zip(['A', 'Q', 'M', 'D', 'H', 'T',
+                                   'S', 'L', 'U'],
+                                  [RESO_DAY, RESO_DAY,
+                                   RESO_DAY, RESO_DAY,
+                                   RESO_HR, RESO_MIN,
+                                   RESO_SEC, RESO_MS,
+                                   RESO_US]):
+            for tz in [None, 'Asia/Tokyo', 'US/Eastern',
+                       'dateutil/US/Eastern']:
+                idx = date_range(start='2013-04-01', periods=30, freq=freq,
+                                 tz=tz)
+                result = period.resolution(idx.asi8, idx.tz)
+                assert result == expected
+
+
+class TestTimestampToJulianDate(object):
+
+    def test_compare_1700(self):
+        r = Timestamp('1700-06-23').to_julian_date()
+        assert r == 2342145.5
+
+    def test_compare_2000(self):
+        r = Timestamp('2000-04-12').to_julian_date()
+        assert r == 2451646.5
+
+    def test_compare_2100(self):
+        r = Timestamp('2100-08-12').to_julian_date()
+        assert r == 2488292.5
+
+    def test_compare_hour01(self):
+        r = Timestamp('2000-08-12T01:00:00').to_julian_date()
+        assert r == 2451768.5416666666666666
+
+    def test_compare_hour13(self):
+        r = Timestamp('2000-08-12T13:00:00').to_julian_date()
+        assert r == 2451769.0416666666666666
+
+
+class TestTimeSeries(object):
+
+    def test_timestamp_to_datetime(self):
+        rng = date_range('20090415', '20090519', tz='US/Eastern')
+
+        stamp = rng[0]
+        dtval = stamp.to_pydatetime()
+        assert stamp == dtval
+        assert stamp.tzinfo == dtval.tzinfo
+
+    def test_timestamp_to_datetime_dateutil(self):
+        rng = date_range('20090415', '20090519', tz='dateutil/US/Eastern')
+
+        stamp = rng[0]
+        dtval = stamp.to_pydatetime()
+        assert stamp == dtval
+        assert stamp.tzinfo == dtval.tzinfo
+
+    def test_timestamp_to_datetime_explicit_pytz(self):
+        rng = date_range('20090415', '20090519',
+                         tz=pytz.timezone('US/Eastern'))
+
+        stamp = rng[0]
+        dtval = stamp.to_pydatetime()
+        assert stamp == dtval
+        assert stamp.tzinfo == dtval.tzinfo
+
+    def test_timestamp_to_datetime_explicit_dateutil(self):
+        tm._skip_if_windows_python_3()
+
+        from pandas._libs.tslib import _dateutil_gettz as gettz
+        rng = date_range('20090415', '20090519', tz=gettz('US/Eastern'))
+
+        stamp = rng[0]
+        dtval = stamp.to_pydatetime()
+        assert stamp == dtval
+        assert stamp.tzinfo == dtval.tzinfo
+
+    def test_timestamp_fields(self):
+        # extra fields from DatetimeIndex like quarter and week
+        idx = tm.makeDateIndex(100)
+
+        fields = ['dayofweek', 'dayofyear', 'week', 'weekofyear', 'quarter',
+                  'days_in_month', 'is_month_start', 'is_month_end',
+                  'is_quarter_start', 'is_quarter_end', 'is_year_start',
+                  'is_year_end', 'weekday_name']
+        for f in fields:
+            expected = getattr(idx, f)[-1]
+            result = getattr(Timestamp(idx[-1]), f)
+            assert result == expected
+
+        assert idx.freq == Timestamp(idx[-1], idx.freq).freq
+        assert idx.freqstr == Timestamp(idx[-1], idx.freq).freqstr
+
+    def test_timestamp_date_out_of_range(self):
+        pytest.raises(ValueError, Timestamp, '1676-01-01')
+        pytest.raises(ValueError, Timestamp, '2263-01-01')
+
+        # see gh-1475
+        pytest.raises(ValueError, DatetimeIndex, ['1400-01-01'])
+        pytest.raises(ValueError, DatetimeIndex, [datetime(1400, 1, 1)])
+
+    def test_timestamp_repr(self):
+        # pre-1900
+        stamp = Timestamp('1850-01-01', tz='US/Eastern')
+        repr(stamp)
+
+        iso8601 = '1850-01-01 01:23:45.012345'
+        stamp = Timestamp(iso8601, tz='US/Eastern')
+        result = repr(stamp)
+        assert iso8601 in result
+
+    def test_timestamp_from_ordinal(self):
+
+        # GH 3042
+        dt = datetime(2011, 4, 16, 0, 0)
+        ts = Timestamp.fromordinal(dt.toordinal())
+        assert ts.to_pydatetime() == dt
+
+        # with a tzinfo
+        stamp = Timestamp('2011-4-16', tz='US/Eastern')
+        dt_tz = stamp.to_pydatetime()
+        ts = Timestamp.fromordinal(dt_tz.toordinal(), tz='US/Eastern')
+        assert ts.to_pydatetime() == dt_tz
+
+    def test_timestamp_compare_with_early_datetime(self):
+        # e.g. datetime.min
+        stamp = Timestamp('2012-01-01')
+
+        assert not stamp == datetime.min
+        assert not stamp == datetime(1600, 1, 1)
+        assert not stamp == datetime(2700, 1, 1)
+        assert stamp != datetime.min
+        assert stamp != datetime(1600, 1, 1)
+        assert stamp != datetime(2700, 1, 1)
+        assert stamp > datetime(1600, 1, 1)
+        assert stamp >= datetime(1600, 1, 1)
+        assert stamp < datetime(2700, 1, 1)
+        assert stamp <= datetime(2700, 1, 1)
+
+    def test_timestamp_equality(self):
+
+        # GH 11034
+        s = Series([Timestamp('2000-01-29 01:59:00'), 'NaT'])
+        result = s != s
+        assert_series_equal(result, Series([False, True]))
+        result = s != s[0]
+        assert_series_equal(result, Series([False, True]))
+        result = s != s[1]
+        assert_series_equal(result, Series([True, True]))
+
+        result = s == s
+        assert_series_equal(result, Series([True, False]))
+        result = s == s[0]
+        assert_series_equal(result, Series([True, False]))
+        result = s == s[1]
+        assert_series_equal(result, Series([False, False]))
+
+    def test_series_box_timestamp(self):
+        rng = date_range('20090415', '20090519', freq='B')
+        s = Series(rng)
+
+        assert isinstance(s[5], Timestamp)
+
+        rng = date_range('20090415', '20090519', freq='B')
+        s = Series(rng, index=rng)
+        assert isinstance(s[5], Timestamp)
+
+        assert isinstance(s.iat[5], Timestamp)
+
+    def test_frame_setitem_timestamp(self):
+        # 2155
+        columns = DatetimeIndex(start='1/1/2012', end='2/1/2012',
+                                freq=offsets.BDay())
+        index = lrange(10)
+        data = DataFrame(columns=columns, index=index)
+        t = datetime(2012, 11, 1)
+        ts = Timestamp(t)
+        data[ts] = np.nan  # works
+
+    def test_to_html_timestamp(self):
+        rng = date_range('2000-01-01', periods=10)
+        df = DataFrame(np.random.randn(10, 4), index=rng)
+
+        result = df.to_html()
+        assert '2000-01-01' in result
+
+    def test_series_map_box_timestamps(self):
+        # #2689, #2627
+        s = Series(date_range('1/1/2000', periods=10))
+
+        def f(x):
+            return (x.hour, x.day, x.month)
+
+        # it works!
+        s.map(f)
+        s.apply(f)
+        DataFrame(s).applymap(f)
+
+    def test_dti_slicing(self):
+        dti = DatetimeIndex(start='1/1/2005', end='12/1/2005', freq='M')
+        dti2 = dti[[1, 3, 5]]
+
+        v1 = dti2[0]
+        v2 = dti2[1]
+        v3 = dti2[2]
+
+        assert v1 == Timestamp('2/28/2005')
+        assert v2 == Timestamp('4/30/2005')
+        assert v3 == Timestamp('6/30/2005')
+
+        # don't carry freq through irregular slicing
+        assert dti2.freq is None
+
+    def test_woy_boundary(self):
+        # make sure weeks at year boundaries are correct
+        d = datetime(2013, 12, 31)
+        result = Timestamp(d).week
+        expected = 1  # ISO standard
+        assert result == expected
+
+        d = datetime(2008, 12, 28)
+        result = Timestamp(d).week
+        expected = 52  # ISO standard
+        assert result == expected
+
+        d = datetime(2009, 12, 31)
+        result = Timestamp(d).week
+        expected = 53  # ISO standard
+        assert result == expected
+
+        d = datetime(2010, 1, 1)
+        result = Timestamp(d).week
+        expected = 53  # ISO standard
+        assert result == expected
+
+        d = datetime(2010, 1, 3)
+        result = Timestamp(d).week
+        expected = 53  # ISO standard
+        assert result == expected
+
+        result = np.array([Timestamp(datetime(*args)).week
+                           for args in [(2000, 1, 1), (2000, 1, 2), (
+                               2005, 1, 1), (2005, 1, 2)]])
+        assert (result == [52, 52, 53, 53]).all()
+
+
+class TestTsUtil(object):
+
+    def test_min_valid(self):
+        # Ensure that Timestamp.min is a valid Timestamp
+        Timestamp(Timestamp.min)
+
+    def test_max_valid(self):
+        # Ensure that Timestamp.max is a valid Timestamp
+        Timestamp(Timestamp.max)
+
+    def test_to_datetime_bijective(self):
+        # Ensure that converting to datetime and back only loses precision
+        # by going from nanoseconds to microseconds.
+        exp_warning = None if Timestamp.max.nanosecond == 0 else UserWarning
+        with tm.assert_produces_warning(exp_warning, check_stacklevel=False):
+            assert (Timestamp(Timestamp.max.to_pydatetime()).value / 1000 ==
+                    Timestamp.max.value / 1000)
+
+        exp_warning = None if Timestamp.min.nanosecond == 0 else UserWarning
+        with tm.assert_produces_warning(exp_warning, check_stacklevel=False):
+            assert (Timestamp(Timestamp.min.to_pydatetime()).value / 1000 ==
+                    Timestamp.min.value / 1000)
diff --git a/pandas/tests/series/common.py b/pandas/tests/series/common.py
index 613961e1c670f..0c25dcb29c3b2 100644
--- a/pandas/tests/series/common.py
+++ b/pandas/tests/series/common.py
@@ -1,4 +1,4 @@
-from pandas.util.decorators import cache_readonly
+from pandas.util._decorators import cache_readonly
 import pandas.util.testing as tm
 import pandas as pd
diff --git a/pandas/tests/series/test_alter_axes.py b/pandas/tests/series/test_alter_axes.py
index 2ddfa27eea377..f33e19c7f6223 100644
--- a/pandas/tests/series/test_alter_axes.py
+++ b/pandas/tests/series/test_alter_axes.py
@@ -1,6 +1,8 @@
 # coding=utf-8
 # pylint: disable-msg=E1101,W0612
 
+import pytest
+
 from datetime import datetime
 
 import numpy as np
@@ -16,29 +18,27 @@
 from .common import TestData
 
 
-class TestSeriesAlterAxes(TestData, tm.TestCase):
-
-    _multiprocess_can_split_ = True
+class TestSeriesAlterAxes(TestData):
 
     def test_setindex(self):
         # wrong type
         series = self.series.copy()
-        self.assertRaises(TypeError, setattr, series, 'index', None)
+        pytest.raises(TypeError, setattr, series, 'index', None)
 
         # wrong length
         series = self.series.copy()
-        self.assertRaises(Exception, setattr, series, 'index',
-                          np.arange(len(series) - 1))
+        pytest.raises(Exception, setattr, series, 'index',
+                      np.arange(len(series) - 1))
 
         # works
         series = self.series.copy()
         series.index = np.arange(len(series))
-        tm.assertIsInstance(series.index, Index)
+        assert isinstance(series.index, Index)
 
     def test_rename(self):
         renamer = lambda x: x.strftime('%Y%m%d')
         renamed = self.ts.rename(renamer)
-        self.assertEqual(renamed.index[0], renamer(self.ts.index[0]))
+        assert renamed.index[0] == renamer(self.ts.index[0])
 
         # dict
         rename_dict = dict(zip(self.ts.index, renamed.index))
@@ -48,14 +48,14 @@ def test_rename(self):
         # partial dict
         s = Series(np.arange(4), index=['a', 'b', 'c', 'd'], dtype='int64')
         renamed = s.rename({'b': 'foo', 'd': 'bar'})
-        self.assert_index_equal(renamed.index, Index(['a', 'foo', 'c', 'bar']))
+        tm.assert_index_equal(renamed.index, Index(['a', 'foo', 'c', 'bar']))
 
         # index with name
         renamer = Series(np.arange(4),
                          index=Index(['a', 'b', 'c', 'd'], name='name'),
                          dtype='int64')
         renamed = renamer.rename({})
-        self.assertEqual(renamed.index.name, renamer.index.name)
+        assert renamed.index.name == renamer.index.name
 
     def test_rename_by_series(self):
         s = Series(range(5), name='foo')
@@ -68,51 +68,48 @@ def test_rename_set_name(self):
         s = Series(range(4), index=list('abcd'))
         for name in ['foo', 123, 123., datetime(2001, 11, 11), ('foo',)]:
             result = s.rename(name)
-            self.assertEqual(result.name, name)
-            self.assert_numpy_array_equal(result.index.values, s.index.values)
-            self.assertTrue(s.name is None)
+            assert result.name == name
+            tm.assert_numpy_array_equal(result.index.values, s.index.values)
+            assert s.name is None
 
     def test_rename_set_name_inplace(self):
         s = Series(range(3), index=list('abc'))
         for name in ['foo', 123, 123., datetime(2001, 11, 11), ('foo',)]:
             s.rename(name, inplace=True)
-            self.assertEqual(s.name, name)
+            assert s.name == name
 
             exp = np.array(['a', 'b', 'c'], dtype=np.object_)
-            self.assert_numpy_array_equal(s.index.values, exp)
+            tm.assert_numpy_array_equal(s.index.values, exp)
 
     def test_set_name_attribute(self):
         s = Series([1, 2, 3])
         s2 = Series([1, 2, 3], name='bar')
         for name in [7, 7., 'name', datetime(2001, 1, 1), (1,), u"\u05D0"]:
             s.name = name
-            self.assertEqual(s.name, name)
+            assert s.name == name
             s2.name = name
-            self.assertEqual(s2.name, name)
+            assert s2.name == name
 
     def test_set_name(self):
         s = Series([1, 2, 3])
         s2 = s._set_name('foo')
-        self.assertEqual(s2.name, 'foo')
-        self.assertTrue(s.name is None)
-        self.assertTrue(s is not s2)
+        assert s2.name == 'foo'
+        assert s.name is None
+        assert s is not s2
 
     def test_rename_inplace(self):
         renamer = lambda x: x.strftime('%Y%m%d')
         expected = renamer(self.ts.index[0])
         self.ts.rename(renamer, inplace=True)
-        self.assertEqual(self.ts.index[0], expected)
+        assert self.ts.index[0] == expected
 
     def test_set_index_makes_timeseries(self):
         idx = tm.makeDateIndex(10)
 
         s = Series(lrange(10))
         s.index = idx
-
-        with tm.assert_produces_warning(FutureWarning):
-            self.assertTrue(s.is_time_series)
-        self.assertTrue(s.index.is_all_dates)
+        assert s.index.is_all_dates
 
     def test_reset_index(self):
         df = tm.makeDataFrame()[:5]
@@ -121,10 +118,10 @@ def test_reset_index(self):
         ser.name = 'value'
         df = ser.reset_index()
-        self.assertIn('value', df)
+        assert 'value' in df
 
         df = ser.reset_index(name='value2')
-        self.assertIn('value2', df)
+        assert 'value2' in df
 
         # check inplace
         s = ser.reset_index(drop=True)
@@ -138,17 +135,56 @@ def test_reset_index(self):
                            [0, 1, 0, 1, 0, 1]])
         s = Series(np.random.randn(6), index=index)
         rs = s.reset_index(level=1)
-        self.assertEqual(len(rs.columns), 2)
+        assert len(rs.columns) == 2
 
         rs = s.reset_index(level=[0, 2], drop=True)
-        self.assert_index_equal(rs.index, Index(index.get_level_values(1)))
-        tm.assertIsInstance(rs, Series)
+        tm.assert_index_equal(rs.index, Index(index.get_level_values(1)))
+        assert isinstance(rs, Series)
+
+    def test_reset_index_level(self):
+        df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
+                          columns=['A', 'B', 'C'])
+
+        for levels in ['A', 'B'], [0, 1]:
+            # With MultiIndex
+            s = df.set_index(['A', 'B'])['C']
+
+            result = s.reset_index(level=levels[0])
+            tm.assert_frame_equal(result, df.set_index('B'))
+
+            result = s.reset_index(level=levels[:1])
+            tm.assert_frame_equal(result, df.set_index('B'))
+
+            result = s.reset_index(level=levels)
+            tm.assert_frame_equal(result, df)
+
+            result = df.set_index(['A', 'B']).reset_index(level=levels,
+                                                          drop=True)
+            tm.assert_frame_equal(result, df[['C']])
+
+            with tm.assert_raises_regex(KeyError, 'Level E '):
+                s.reset_index(level=['A', 'E'])
+
+            # With single-level Index
+            s = df.set_index('A')['B']
+
+            result = s.reset_index(level=levels[0])
+            tm.assert_frame_equal(result, df[['A', 'B']])
+
+            result = s.reset_index(level=levels[:1])
+            tm.assert_frame_equal(result, df[['A', 'B']])
+
+            result = s.reset_index(level=levels[0], drop=True)
+            tm.assert_series_equal(result, df['B'])
+
+            with tm.assert_raises_regex(IndexError, 'Too many levels'):
+                s.reset_index(level=[0, 1, 2])
 
     def test_reset_index_range(self):
         # GH 12071
         s = pd.Series(range(2), name='A', dtype='int64')
         series_result = s.reset_index()
-        tm.assertIsInstance(series_result.index, RangeIndex)
+        assert isinstance(series_result.index, RangeIndex)
         series_expected = pd.DataFrame([[0, 0], [1, 1]],
                                        columns=['index', 'A'],
                                        index=RangeIndex(stop=2))
@@ -188,3 +224,57 @@ def test_reorder_levels(self):
 
         result = s.reorder_levels(['L0', 'L0', 'L0'])
         assert_series_equal(result, expected)
+
+    def test_rename_axis_inplace(self):
+        # GH 15704
+        series = self.ts.copy()
+        expected = series.rename_axis('foo')
+        result = series.copy()
+        no_return = result.rename_axis('foo', inplace=True)
+
+        assert no_return is None
+        assert_series_equal(result, expected)
+
+    def test_set_axis_inplace(self):
+        # GH14636
+
+        s = Series(np.arange(4), index=[1, 3, 5, 7], dtype='int64')
+
+        expected = s.copy()
+        expected.index = list('abcd')
+
+        for axis in 0, 'index':
+            # inplace=True
+            # The FutureWarning comes from the fact that we would like to have
+            # inplace default to False some day
+            for inplace, warn in (None, FutureWarning), (True, None):
+                result = s.copy()
+                kwargs = {'inplace': inplace}
+                with tm.assert_produces_warning(warn):
+                    result.set_axis(list('abcd'), axis=axis, **kwargs)
+                tm.assert_series_equal(result, expected)
+
+        # inplace=False
+        result = s.set_axis(list('abcd'), axis=0, inplace=False)
+        tm.assert_series_equal(expected, result)
+
+        # omitting the "axis" parameter
+        with tm.assert_produces_warning(None):
+            result = s.set_axis(list('abcd'), inplace=False)
+        tm.assert_series_equal(result, expected)
+
+        # wrong values for the "axis" parameter
+        for axis in 2, 'foo':
+            with tm.assert_raises_regex(ValueError, 'No axis named'):
+                s.set_axis(list('abcd'), axis=axis, inplace=False)
+
+    def test_set_axis_prior_to_deprecation_signature(self):
+        s = Series(np.arange(4), index=[1, 3, 5, 7], dtype='int64')
+
+        expected = s.copy()
+        expected.index = list('abcd')
+
+        for axis in 0, 'index':
+            with tm.assert_produces_warning(FutureWarning):
+                result = s.set_axis(0, list('abcd'), inplace=False)
+            tm.assert_series_equal(result, expected)
diff --git a/pandas/tests/series/test_analytics.py b/pandas/tests/series/test_analytics.py
index 07e1be609670f..f1d044f7a1132 100644
--- a/pandas/tests/series/test_analytics.py
+++ b/pandas/tests/series/test_analytics.py
@@ -4,22 +4,22 @@
 from itertools import product
 from distutils.version import LooseVersion
 
-import nose
+import pytest
 
 from numpy import nan
 import numpy as np
 import pandas as pd
 
-from pandas import (Series, Categorical, DataFrame, isnull, notnull,
+from pandas import (Series, Categorical, DataFrame, isna, notna,
                     bdate_range, date_range, _np_version_under1p10)
 from pandas.core.index import MultiIndex
-from pandas.tseries.index import Timestamp
-from pandas.tseries.tdi import Timedelta
+from pandas.core.indexes.datetimes import Timestamp
+from pandas.core.indexes.timedeltas import Timedelta
 import pandas.core.config as cf
 import pandas.core.nanops as nanops
 
-from pandas.compat import lrange, range
+from pandas.compat import lrange, range, is_platform_windows
 from pandas import compat
 from pandas.util.testing import (assert_series_equal, assert_almost_equal,
                                  assert_frame_equal, assert_index_equal)
@@ -28,23 +28,25 @@
 from .common import TestData
 
 
-class TestSeriesAnalytics(TestData, tm.TestCase):
+skip_if_bottleneck_on_windows = (is_platform_windows() and
+                                 nanops._USE_BOTTLENECK)
 
-    _multiprocess_can_split_ = True
+
+class TestSeriesAnalytics(TestData):
 
     def test_sum_zero(self):
         arr = np.array([])
-        self.assertEqual(nanops.nansum(arr), 0)
+        assert nanops.nansum(arr) == 0
 
         arr = np.empty((10, 0))
-        self.assertTrue((nanops.nansum(arr, axis=1) == 0).all())
+        assert (nanops.nansum(arr, axis=1) == 0).all()
 
         # GH #844
         s = Series([], index=[])
-        self.assertEqual(s.sum(), 0)
+        assert s.sum() == 0
 
         df = DataFrame(np.empty((10, 0)))
-        self.assertTrue((df.sum(1) == 0).all())
+        assert (df.sum(1) == 0).all()
 
     def test_nansum_buglet(self):
         s = Series([1.0, np.nan], index=[0, 1])
@@ -60,19 +62,11 @@ def test_overflow(self):
 
             # no bottleneck
             result = s.sum(skipna=False)
-            self.assertEqual(int(result), v.sum(dtype='int64'))
+            assert int(result) == v.sum(dtype='int64')
             result = s.min(skipna=False)
-            self.assertEqual(int(result), 0)
+            assert int(result) == 0
             result = s.max(skipna=False)
-            self.assertEqual(int(result), v[-1])
-
-            # use bottleneck if available
-            result = s.sum()
-            self.assertEqual(int(result), v.sum(dtype='int64'))
-            result = s.min()
-            self.assertEqual(int(result), 0)
-            result = s.max()
-            self.assertEqual(int(result), v[-1])
+            assert int(result) == v[-1]
 
         for dtype in ['float32', 'float64']:
             v = np.arange(5000000, dtype=dtype)
@@ -80,20 +74,45 @@ def test_overflow(self):
 
             # no bottleneck
             result = s.sum(skipna=False)
-            self.assertEqual(result, v.sum(dtype=dtype))
+            assert result == v.sum(dtype=dtype)
             result = s.min(skipna=False)
-            self.assertTrue(np.allclose(float(result), 0.0))
+            assert np.allclose(float(result), 0.0)
            result = s.max(skipna=False)
-            self.assertTrue(np.allclose(float(result), v[-1]))
+            assert np.allclose(float(result), v[-1])
+
+    @pytest.mark.xfail(
+        skip_if_bottleneck_on_windows,
+        reason="buggy bottleneck with sum overflow on windows")
+    def test_overflow_with_bottleneck(self):
+        # GH 6915
+        # overflowing on the smaller int dtypes
+        for dtype in ['int32', 'int64']:
+            v = np.arange(5000000, dtype=dtype)
+            s = Series(v)
 
             # use bottleneck if available
             result = s.sum()
-            self.assertEqual(result, v.sum(dtype=dtype))
+            assert int(result) == v.sum(dtype='int64')
             result = s.min()
-            self.assertTrue(np.allclose(float(result), 0.0))
+            assert int(result) == 0
             result = s.max()
-            self.assertTrue(np.allclose(float(result), v[-1]))
+            assert int(result) == v[-1]
+
+        for dtype in ['float32', 'float64']:
+            v = np.arange(5000000, dtype=dtype)
+            s = Series(v)
+            # use bottleneck if available
+            result = s.sum()
+            assert result == v.sum(dtype=dtype)
+            result = s.min()
+            assert np.allclose(float(result), 0.0)
+            result = s.max()
+            assert np.allclose(float(result), v[-1])
+
+    @pytest.mark.xfail(
+        skip_if_bottleneck_on_windows,
+        reason="buggy bottleneck with sum overflow on windows")
     def test_sum(self):
         self._check_stat_op('sum', np.sum, check_allna=True)
@@ -106,16 +125,16 @@ def test_sum_inf(self):
         s[5:8] = np.inf
         s2[5:8] = np.nan
 
-        self.assertTrue(np.isinf(s.sum()))
+        assert np.isinf(s.sum())
 
         arr = np.random.randn(100, 100).astype('f4')
         arr[:, 2] = np.inf
 
-        with cf.option_context("mode.use_inf_as_null", True):
+        with cf.option_context("mode.use_inf_as_na", True):
             assert_almost_equal(s.sum(), s2.sum())
 
         res = nanops.nansum(arr, axis=1)
-        self.assertTrue(np.isinf(res).all())
+        assert np.isinf(res).all()
 
     def test_mean(self):
         self._check_stat_op('mean', np.mean)
@@ -125,17 +144,17 @@ def test_median(self):
 
         # test with integers, test failure
         int_ts = Series(np.ones(10, dtype=int), index=lrange(10))
-        self.assertAlmostEqual(np.median(int_ts), int_ts.median())
+        tm.assert_almost_equal(np.median(int_ts), int_ts.median())
 
     def test_mode(self):
         # No mode should be found.
         exp = Series([], dtype=np.float64)
         tm.assert_series_equal(Series([]).mode(), exp)
 
-        exp = Series([], dtype=np.int64)
+        exp = Series([1], dtype=np.int64)
         tm.assert_series_equal(Series([1]).mode(), exp)
 
-        exp = Series([], dtype=np.object)
+        exp = Series(['a', 'b', 'c'], dtype=np.object)
         tm.assert_series_equal(Series(['a', 'b', 'c']).mode(), exp)
 
         # Test numerical data types.
@@ -171,7 +190,8 @@ def test_mode(self):
         tm.assert_series_equal(s.mode(), exp)
 
         # Test datetime types.
-        exp = Series([], dtype="M8[ns]")
+        exp = Series(['1900-05-03', '2011-01-03',
+                      '2013-01-02'], dtype='M8[ns]')
         s = Series(['2011-01-03', '2013-01-02',
                     '1900-05-03'], dtype='M8[ns]')
         tm.assert_series_equal(s.mode(), exp)
@@ -182,7 +202,7 @@ def test_mode(self):
         tm.assert_series_equal(s.mode(), exp)
 
         # gh-5986: Test timedelta types.
-        exp = Series([], dtype='timedelta64[ns]')
+        exp = Series(['-1 days', '0 days', '1 days'], dtype='timedelta64[ns]')
         s = Series(['1 days', '-1 days', '0 days'], dtype='timedelta64[ns]')
         tm.assert_series_equal(s.mode(), exp)
@@ -202,13 +222,13 @@ def test_mode(self):
         s = Series([1, 2**63, 2**63], dtype=np.uint64)
         tm.assert_series_equal(s.mode(), exp)
 
-        exp = Series([], dtype=np.uint64)
+        exp = Series([1, 2**63], dtype=np.uint64)
         s = Series([1, 2**63], dtype=np.uint64)
         tm.assert_series_equal(s.mode(), exp)
 
         # Test category dtype.
         c = Categorical([1, 2])
-        exp = Categorical([], categories=[1, 2])
+        exp = Categorical([1, 2], categories=[1, 2])
         exp = Series(exp, dtype='category')
         tm.assert_series_equal(Series(c).mode(), exp)
@@ -249,10 +269,10 @@ def test_var_std(self):
         # 1 - element series with ddof=1
         s = self.ts.iloc[[0]]
         result = s.var(ddof=1)
-        self.assertTrue(isnull(result))
+        assert isna(result)
 
         result = s.std(ddof=1)
-        self.assertTrue(isnull(result))
+        assert isna(result)
 
     def test_sem(self):
         alt = lambda x: np.std(x, ddof=1) / np.sqrt(len(x))
@@ -266,7 +286,7 @@ def test_sem(self):
         # 1 - element series with ddof=1
         s = self.ts.iloc[[0]]
         result = s.sem(ddof=1)
-        self.assertTrue(isnull(result))
+        assert isna(result)
 
     def test_skew(self):
         tm._skip_if_no_scipy()
@@ -282,11 +302,11 @@ def test_skew(self):
             s = Series(np.ones(i))
             df = DataFrame(np.ones((i, i)))
             if i < min_N:
-                self.assertTrue(np.isnan(s.skew()))
-                self.assertTrue(np.isnan(df.skew()).all())
+                assert np.isnan(s.skew())
+                assert np.isnan(df.skew()).all()
             else:
-                self.assertEqual(0, s.skew())
-                self.assertTrue((df.skew() == 0).all())
+                assert 0 == s.skew()
+                assert (df.skew() == 0).all()
 
     def test_kurt(self):
         tm._skip_if_no_scipy()
@@ -299,7 +319,7 @@ def test_kurt(self):
                            labels=[[0, 0, 0, 0, 0, 0], [0, 1, 2, 0, 1, 2],
                                    [0, 1, 0, 1, 0, 1]])
         s = Series(np.random.randn(6), index=index)
-        self.assertAlmostEqual(s.kurt(), s.kurt(level=0)['bar'])
+        tm.assert_almost_equal(s.kurt(), s.kurt(level=0)['bar'])
 
         # test corner cases, kurt() returns NaN unless there's at least 4
         # values
@@ -308,11 +328,11 @@ def test_kurt(self):
             s = Series(np.ones(i))
             df = DataFrame(np.ones((i, i)))
             if i < min_N:
-                self.assertTrue(np.isnan(s.kurt()))
-                self.assertTrue(np.isnan(df.kurt()).all())
+                assert np.isnan(s.kurt())
+                assert np.isnan(df.kurt()).all()
             else:
-                self.assertEqual(0, s.kurt())
-                self.assertTrue((df.kurt() == 0).all())
+                assert 0 == s.kurt()
+                assert (df.kurt() == 0).all()
 
     def test_describe(self):
         s = Series([0, 1, 2, 3, 4], name='int_data')
@@ -321,31 +341,31 @@ def test_describe(self):
                           name='int_data',
                           index=['count', 'mean', 'std', 'min', '25%',
                                  '50%', '75%', 'max'])
-        self.assert_series_equal(result, expected)
+        tm.assert_series_equal(result, expected)
 
         s = Series([True, True, False, False, False], name='bool_data')
         result = s.describe()
         expected = Series([5, 2, False, 3], name='bool_data',
                           index=['count', 'unique', 'top', 'freq'])
-        self.assert_series_equal(result, expected)
+        tm.assert_series_equal(result, expected)
 
         s = Series(['a', 'a', 'b', 'c', 'd'], name='str_data')
         result = s.describe()
         expected = Series([5, 4, 'a', 2], name='str_data',
                           index=['count', 'unique', 'top', 'freq'])
-        self.assert_series_equal(result, expected)
+        tm.assert_series_equal(result, expected)
 
     def test_argsort(self):
         self._check_accum_op('argsort', check_dtype=False)
         argsorted = self.ts.argsort()
-        self.assertTrue(issubclass(argsorted.dtype.type, np.integer))
+        assert issubclass(argsorted.dtype.type, np.integer)
 
         # GH 2967 (introduced bug in 0.11-dev I think)
         s = Series([Timestamp('201301%02d' % (i + 1)) for i in range(5)])
-        self.assertEqual(s.dtype, 'datetime64[ns]')
+        assert s.dtype == 'datetime64[ns]'
         shifted = s.shift(-1)
-        self.assertEqual(shifted.dtype, 'datetime64[ns]')
-        self.assertTrue(isnull(shifted[4]))
+        assert shifted.dtype == 'datetime64[ns]'
+        assert isna(shifted[4])
 
         result = s.argsort()
         expected = Series(lrange(5), dtype='int64')
@@ -363,11 +383,12 @@ def test_argsort_stable(self):
         mexpected = np.argsort(s.values, kind='mergesort')
         qexpected = np.argsort(s.values, kind='quicksort')
 
-        self.assert_series_equal(mindexer, Series(mexpected),
-                                 check_dtype=False)
-        self.assert_series_equal(qindexer, Series(qexpected),
-                                 check_dtype=False)
-        self.assertFalse(np.array_equal(qindexer, mindexer))
+        tm.assert_series_equal(mindexer, Series(mexpected),
+                               check_dtype=False)
+        tm.assert_series_equal(qindexer, Series(qexpected),
+                               check_dtype=False)
+        pytest.raises(AssertionError, tm.assert_numpy_array_equal,
+                      qindexer, mindexer)
 
     def test_cumsum(self):
         self._check_accum_op('cumsum')
@@ -376,24 +397,24 @@ def test_cumprod(self):
         self._check_accum_op('cumprod')
 
     def test_cummin(self):
-        self.assert_numpy_array_equal(self.ts.cummin().values,
-                                      np.minimum.accumulate(np.array(self.ts)))
+        tm.assert_numpy_array_equal(self.ts.cummin().values,
+                                    np.minimum.accumulate(np.array(self.ts)))
         ts = self.ts.copy()
         ts[::2] = np.NaN
         result = ts.cummin()[1::2]
         expected = np.minimum.accumulate(ts.valid())
-        self.assert_series_equal(result, expected)
+        tm.assert_series_equal(result, expected)
 
     def test_cummax(self):
-        self.assert_numpy_array_equal(self.ts.cummax().values,
-                                      np.maximum.accumulate(np.array(self.ts)))
+        tm.assert_numpy_array_equal(self.ts.cummax().values,
+                                    np.maximum.accumulate(np.array(self.ts)))
         ts = self.ts.copy()
         ts[::2] = np.NaN
         result = ts.cummax()[1::2]
         expected = np.maximum.accumulate(ts.valid())
-        self.assert_series_equal(result, expected)
+        tm.assert_series_equal(result, expected)
 
     def test_cummin_datetime64(self):
         s = pd.Series(pd.to_datetime(['NaT', '2000-1-2', 'NaT', '2000-1-1',
@@ -402,13 +423,13 @@ def test_cummin_datetime64(self):
         expected = pd.Series(pd.to_datetime(['NaT', '2000-1-2', 'NaT',
                                              '2000-1-1', 'NaT', '2000-1-1']))
         result = s.cummin(skipna=True)
-        self.assert_series_equal(expected, result)
+        tm.assert_series_equal(expected, result)
 
         expected = pd.Series(pd.to_datetime(
             ['NaT', '2000-1-2', '2000-1-2', '2000-1-1', '2000-1-1', '2000-1-1'
             ]))
         result = s.cummin(skipna=False)
-        self.assert_series_equal(expected, result)
+        tm.assert_series_equal(expected, result)
 
     def test_cummax_datetime64(self):
         s = pd.Series(pd.to_datetime(['NaT', '2000-1-2', 'NaT', '2000-1-1',
@@ -417,13 +438,13 @@ def test_cummax_datetime64(self):
         expected = pd.Series(pd.to_datetime(['NaT', '2000-1-2', 'NaT',
                                              '2000-1-2', 'NaT', '2000-1-3']))
         result = s.cummax(skipna=True)
-        self.assert_series_equal(expected, result)
+        tm.assert_series_equal(expected, result)
 
         expected = pd.Series(pd.to_datetime(
            ['NaT', '2000-1-2', '2000-1-2', '2000-1-2', '2000-1-2', '2000-1-3'
            ]))
         result = s.cummax(skipna=False)
-        self.assert_series_equal(expected, result)
+        tm.assert_series_equal(expected, result)
 
     def test_cummin_timedelta64(self):
         s = pd.Series(pd.to_timedelta(['NaT',
@@ -440,7 +461,7 @@ def test_cummin_timedelta64(self):
                                               'NaT',
                                               '1 min', ]))
         result = s.cummin(skipna=True)
-        self.assert_series_equal(expected, result)
+        tm.assert_series_equal(expected, result)
 
         expected = pd.Series(pd.to_timedelta(['NaT',
                                               '2 min',
@@ -449,7 +470,7 @@ def test_cummin_timedelta64(self):
                                               '1 min',
                                               '1 min', ]))
         result = s.cummin(skipna=False)
-        self.assert_series_equal(expected, result)
+        tm.assert_series_equal(expected, result)
 
     def test_cummax_timedelta64(self):
         s = pd.Series(pd.to_timedelta(['NaT',
@@ -466,7 +487,7 @@ def test_cummax_timedelta64(self):
                                               'NaT',
                                               '3 min', ]))
         result = s.cummax(skipna=True)
-        self.assert_series_equal(expected, result)
+        tm.assert_series_equal(expected, result)
 
         expected = pd.Series(pd.to_timedelta(['NaT',
                                               '2 min',
@@ -475,11 +496,11 @@ def test_cummax_timedelta64(self):
                                               '2 min',
                                               '3 min', ]))
         result = s.cummax(skipna=False)
-        self.assert_series_equal(expected, result)
+        tm.assert_series_equal(expected, result)
 
     def test_npdiff(self):
-        raise nose.SkipTest("skipping due to Series no longer being an "
-                            "ndarray")
+        pytest.skip("skipping due to Series no longer being an "
+                    "ndarray")
 
         # no longer works as the return type of np.diff is now nd.array
         s = Series(np.arange(5))
@@ -500,11 +521,11 @@ def testit():
             # idxmax, idxmin, min, and max are valid for dates
             if name not in ['max', 'min']:
                 ds = Series(date_range('1/1/2001', periods=10))
-                self.assertRaises(TypeError, f, ds)
+                pytest.raises(TypeError, f, ds)
 
             # skipna or no
-            self.assertTrue(notnull(f(self.series)))
-            self.assertTrue(isnull(f(self.series, skipna=False)))
+            assert notna(f(self.series))
+            assert isna(f(self.series, skipna=False))
 
             # check the result is correct
             nona = self.series.dropna()
@@ -517,12 +538,12 @@ def testit():
             # xref 9422
            # bottleneck >= 1.0 give 0.0 for an allna Series sum
            try:
-                self.assertTrue(nanops._USE_BOTTLENECK)
+                assert nanops._USE_BOTTLENECK
                 import bottleneck as bn  # noqa
-                self.assertTrue(bn.__version__ >= LooseVersion('1.0'))
-                self.assertEqual(f(allna), 0.0)
+                assert bn.__version__ >= LooseVersion('1.0')
+                assert f(allna) == 0.0
             except:
-                self.assertTrue(np.isnan(f(allna)))
+                assert np.isnan(f(allna))
 
             # dtype=object with None, it works!
             s = Series([1, 2, 3, None, 5])
@@ -539,19 +560,19 @@ def testit():
                 s = Series(bdate_range('1/1/2000', periods=10))
                 res = f(s)
                 exp = alternate(s)
-                self.assertEqual(res, exp)
+                assert res == exp
 
             # check on string data
             if name not in ['sum', 'min', 'max']:
-                self.assertRaises(TypeError, f, Series(list('abc')))
+                pytest.raises(TypeError, f, Series(list('abc')))
 
             # Invalid axis.
-            self.assertRaises(ValueError, f, self.series, axis=1)
+            pytest.raises(ValueError, f, self.series, axis=1)
 
             # Unimplemented numeric_only parameter.
             if 'numeric_only' in compat.signature(f).args:
-                self.assertRaisesRegexp(NotImplementedError, name, f,
-                                        self.series, numeric_only=True)
+                tm.assert_raises_regex(NotImplementedError, name, f,
+                                       self.series, numeric_only=True)
 
         testit()
@@ -565,9 +586,9 @@ def testit():
     def _check_accum_op(self, name, check_dtype=True):
         func = getattr(np, name)
-        self.assert_numpy_array_equal(func(self.ts).values,
-                                      func(np.array(self.ts)),
-                                      check_dtype=check_dtype)
+        tm.assert_numpy_array_equal(func(self.ts).values,
+                                    func(np.array(self.ts)),
+                                    check_dtype=check_dtype)
 
         # with missing values
         ts = self.ts.copy()
@@ -576,8 +597,8 @@ def _check_accum_op(self, name, check_dtype=True):
 
         result = func(ts)[1::2]
         expected = func(np.array(ts.valid()))
-        self.assert_numpy_array_equal(result.values, expected,
-                                      check_dtype=False)
+        tm.assert_numpy_array_equal(result.values, expected,
+                                    check_dtype=False)
 
     def test_compress(self):
         cond = [True, False, True, False, False]
@@ -596,12 +617,12 @@ def test_numpy_compress(self):
         tm.assert_series_equal(np.compress(cond, s), expected)
 
         msg = "the 'axis' parameter is not supported"
-        tm.assertRaisesRegexp(ValueError, msg, np.compress,
-                              cond, s, axis=1)
+        tm.assert_raises_regex(ValueError, msg, np.compress,
+                               cond, s, axis=1)
 
         msg = "the 'out' parameter is not supported"
-        tm.assertRaisesRegexp(ValueError, msg, np.compress,
-                              cond, s, out=s)
+        tm.assert_raises_regex(ValueError, msg, np.compress,
+                               cond, s, out=s)
 
     def test_round(self):
         self.ts.index.name = "index_name"
@@ -609,7 +630,7 @@ def test_round(self):
         expected = Series(np.round(self.ts.values, 2),
                           index=self.ts.index, name='ts')
         assert_series_equal(result, expected)
-        self.assertEqual(result.name, self.ts.name)
+        assert result.name == self.ts.name
 
     def test_numpy_round(self):
         # See gh-12600
@@ -619,47 +640,48 @@ def test_numpy_round(self):
         assert_series_equal(out, expected)
 
         msg = "the 'out' parameter is not supported"
-        with tm.assertRaisesRegexp(ValueError, msg):
+        with tm.assert_raises_regex(ValueError, msg):
             np.round(s, decimals=0, out=s)
 
     def test_built_in_round(self):
         if not compat.PY3:
-            raise nose.SkipTest(
+            pytest.skip(
                 'build in round cannot be overriden prior to Python 3')
 
         s = Series([1.123, 2.123, 3.123], index=lrange(3))
         result = round(s)
         expected_rounded0 = Series([1., 2., 3.], index=lrange(3))
-        self.assert_series_equal(result, expected_rounded0)
+        tm.assert_series_equal(result, expected_rounded0)
 
         decimals = 2
         expected_rounded = Series([1.12, 2.12, 3.12], index=lrange(3))
         result = round(s, decimals)
-        self.assert_series_equal(result, expected_rounded)
+        tm.assert_series_equal(result, expected_rounded)
 
     def test_prod_numpy16_bug(self):
         s = Series([1., 1., 1.], index=lrange(3))
         result = s.prod()
-        self.assertNotIsInstance(result, Series)
+
+        assert not isinstance(result, Series)
 
     def test_all_any(self):
         ts = tm.makeTimeSeries()
         bool_series = ts > 0
-        self.assertFalse(bool_series.all())
-        self.assertTrue(bool_series.any())
+        assert not bool_series.all()
+        assert bool_series.any()
 
         # Alternative types, with implicit 'object' dtype.
         s = Series(['abc', True])
-        self.assertEqual('abc', s.any())  # 'abc' || True => 'abc'
+        assert 'abc' == s.any()  # 'abc' || True => 'abc'
 
     def test_all_any_params(self):
         # Check skipna, with implicit 'object' dtype.
         s1 = Series([np.nan, True])
         s2 = Series([np.nan, False])
-        self.assertTrue(s1.all(skipna=False))  # nan && True => True
-        self.assertTrue(s1.all(skipna=True))
-        self.assertTrue(np.isnan(s2.any(skipna=False)))  # nan || False => nan
-        self.assertFalse(s2.any(skipna=True))
+        assert s1.all(skipna=False)  # nan && True => True
+        assert s1.all(skipna=True)
+        assert np.isnan(s2.any(skipna=False))  # nan || False => nan
+        assert not s2.any(skipna=True)
 
         # Check level.
         s = pd.Series([False, False, True, True, False, True],
@@ -668,12 +690,12 @@ def test_all_any_params(self):
         assert_series_equal(s.any(level=0), Series([False, True, True]))
 
         # bool_only is not implemented with level option.
-        self.assertRaises(NotImplementedError, s.any, bool_only=True, level=0)
-        self.assertRaises(NotImplementedError, s.all, bool_only=True, level=0)
+        pytest.raises(NotImplementedError, s.any, bool_only=True, level=0)
+        pytest.raises(NotImplementedError, s.all, bool_only=True, level=0)
 
         # bool_only is not implemented alone.
-        self.assertRaises(NotImplementedError, s.any, bool_only=True)
-        self.assertRaises(NotImplementedError, s.all, bool_only=True)
+        pytest.raises(NotImplementedError, s.any, bool_only=True)
+        pytest.raises(NotImplementedError, s.all, bool_only=True)
 
     def test_modulo(self):
         with np.errstate(all='ignore'):
@@ -698,7 +720,7 @@ def test_modulo(self):
             p = p.astype('float64')
             result = p['first'] % p['second']
             result2 = p['second'] % p['first']
-            self.assertFalse(np.array_equal(result, result2))
+            assert not np.array_equal(result, result2)
 
             # GH 9144
             s = Series([0, 1])
@@ -718,23 +740,23 @@ def test_ops_consistency_on_empty(self):
 
         # float
         result = Series(dtype=float).sum()
-        self.assertEqual(result, 0)
+        assert result == 0
 
         result = Series(dtype=float).mean()
-        self.assertTrue(isnull(result))
+        assert isna(result)
 
         result = Series(dtype=float).median()
-        self.assertTrue(isnull(result))
+        assert isna(result)
 
         # timedelta64[ns]
         result = Series(dtype='m8[ns]').sum()
-        self.assertEqual(result, Timedelta(0))
+        assert result == Timedelta(0)
 
         result = Series(dtype='m8[ns]').mean()
-        self.assertTrue(result is pd.NaT)
+        assert result is pd.NaT
 
         result = Series(dtype='m8[ns]').median()
-        self.assertTrue(result is pd.NaT)
+        assert result is pd.NaT
 
     def test_corr(self):
         tm._skip_if_no_scipy()
@@ -742,30 +764,30 @@ def test_corr(self):
         import scipy.stats as stats
 
         # full overlap
-        self.assertAlmostEqual(self.ts.corr(self.ts), 1)
+        tm.assert_almost_equal(self.ts.corr(self.ts), 1)
 
         # partial overlap
-        self.assertAlmostEqual(self.ts[:15].corr(self.ts[5:]), 1)
+        tm.assert_almost_equal(self.ts[:15].corr(self.ts[5:]), 1)
 
-        self.assertTrue(isnull(self.ts[:15].corr(self.ts[5:], min_periods=12)))
+        assert isna(self.ts[:15].corr(self.ts[5:], min_periods=12))
 
         ts1 = self.ts[:15].reindex(self.ts.index)
         ts2 = self.ts[5:].reindex(self.ts.index)
-        self.assertTrue(isnull(ts1.corr(ts2, min_periods=12)))
+        assert isna(ts1.corr(ts2, min_periods=12))
 
         # No overlap
-        self.assertTrue(np.isnan(self.ts[::2].corr(self.ts[1::2])))
+        assert np.isnan(self.ts[::2].corr(self.ts[1::2]))
 
         # all NA
         cp = self.ts[:10].copy()
         cp[:] = np.nan
-        self.assertTrue(isnull(cp.corr(cp)))
+        assert isna(cp.corr(cp))
 
         A = tm.makeTimeSeries()
         B = tm.makeTimeSeries()
         result = A.corr(B)
         expected, _ = stats.pearsonr(A, B)
-        self.assertAlmostEqual(result, expected)
+        tm.assert_almost_equal(result, expected)
 
     def test_corr_rank(self):
         tm._skip_if_no_scipy()
@@ -779,16 +801,16 @@ def test_corr_rank(self):
         A[-5:] = A[:5]
         result = A.corr(B, method='kendall')
         expected = stats.kendalltau(A, B)[0]
-        self.assertAlmostEqual(result, expected)
+        tm.assert_almost_equal(result, expected)
 
         result = A.corr(B, method='spearman')
         expected = stats.spearmanr(A, B)[0]
-        self.assertAlmostEqual(result, expected)
+        tm.assert_almost_equal(result, expected)
 
         # these methods got rewritten in 0.8
         if scipy.__version__ < LooseVersion('0.9'):
-            raise nose.SkipTest("skipping corr rank because of scipy version "
-                                "{0}".format(scipy.__version__))
+            pytest.skip("skipping corr rank because of scipy version "
+                        "{0}".format(scipy.__version__))
 
         # results from R
         A = Series(
@@ -799,38 +821,38 @@ def test_corr_rank(self):
             1.17258718, -1.06009347, -0.10222060, -0.89076239, 0.89372375])
         kexp = 0.4319297
         sexp = 0.5853767
-        self.assertAlmostEqual(A.corr(B, method='kendall'), kexp)
-        self.assertAlmostEqual(A.corr(B, method='spearman'), sexp)
+        tm.assert_almost_equal(A.corr(B, method='kendall'), kexp)
+        tm.assert_almost_equal(A.corr(B, method='spearman'), sexp)
 
     def test_cov(self):
         # full overlap
-        self.assertAlmostEqual(self.ts.cov(self.ts), self.ts.std() ** 2)
+        tm.assert_almost_equal(self.ts.cov(self.ts), self.ts.std() ** 2)
 
         # partial overlap
-        self.assertAlmostEqual(self.ts[:15].cov(self.ts[5:]),
+        tm.assert_almost_equal(self.ts[:15].cov(self.ts[5:]),
                                self.ts[5:15].std() ** 2)
 
         # No overlap
-        self.assertTrue(np.isnan(self.ts[::2].cov(self.ts[1::2])))
+        assert np.isnan(self.ts[::2].cov(self.ts[1::2]))
 
         # all NA
         cp = self.ts[:10].copy()
         cp[:] = np.nan
-        self.assertTrue(isnull(cp.cov(cp)))
+        assert isna(cp.cov(cp))
 
         # min_periods
-        self.assertTrue(isnull(self.ts[:15].cov(self.ts[5:], min_periods=12)))
+        assert isna(self.ts[:15].cov(self.ts[5:], min_periods=12))
 
         ts1 = self.ts[:15].reindex(self.ts.index)
         ts2 = self.ts[5:].reindex(self.ts.index)
-        self.assertTrue(isnull(ts1.cov(ts2, min_periods=12)))
+        assert isna(ts1.cov(ts2, min_periods=12))
 
     def test_count(self):
-        self.assertEqual(self.ts.count(), len(self.ts))
+        assert self.ts.count() == len(self.ts)
 
         self.ts[::2] = np.NaN
 
-        self.assertEqual(self.ts.count(), np.isfinite(self.ts).sum())
+        assert self.ts.count() == np.isfinite(self.ts).sum()
 
         mi = MultiIndex.from_arrays([list('aabbcc'), [1, 2, 2, nan, 1, 2]])
         ts = Series(np.arange(len(mi)), index=mi)
@@ -858,15 +880,15 @@ def test_dot(self):
 
         # Check ndarray argument
         result = a.dot(b.values)
-        self.assertTrue(np.all(result == expected.values))
+        assert np.all(result == expected.values)
         assert_almost_equal(a.dot(b['2'].values), expected['2'])
 
         # Check series argument
         assert_almost_equal(a.dot(b['1']), expected['1'])
         assert_almost_equal(a.dot(b2['1']), expected['1'])
 
-        self.assertRaises(Exception, a.dot, a.values[:3])
-        self.assertRaises(ValueError, a.dot, b.T)
+        pytest.raises(Exception, a.dot, a.values[:3])
+        pytest.raises(ValueError, a.dot, b.T)
 
     def test_value_counts_nunique(self):
@@ -875,7 +897,7 @@ def test_value_counts_nunique(self):
         series[20:500] = np.nan
         series[10:20] = 5000
         result = series.nunique()
-        self.assertEqual(result, 11)
+        assert result == 11
 
     def test_unique(self):
@@ -883,24 +905,24 @@ def test_unique(self):
         s = Series([1.2345] * 100)
         s[::2] = np.nan
         result = s.unique()
-        self.assertEqual(len(result), 2)
+        assert len(result) == 2
 
         s = Series([1.2345] * 100, dtype='f4')
         s[::2] = np.nan
         result = s.unique()
-        self.assertEqual(len(result), 2)
+        assert len(result) == 2
 
         # NAs in object arrays #714
         s = Series(['foo'] * 100, dtype='O')
         s[::2] = np.nan
         result = s.unique()
-        self.assertEqual(len(result), 2)
+        assert len(result) == 2
 
         # decision about None
         s = Series([1, 2, 3, None, None, None], dtype=object)
         result = s.unique()
         expected = np.array([1, 2, 3, None], dtype=object)
-        self.assert_numpy_array_equal(result, expected)
+        tm.assert_numpy_array_equal(result, expected)
 
     def test_drop_duplicates(self):
         # check both int and object
@@ -919,17 +941,6 @@ def test_drop_duplicates(self):
             sc.drop_duplicates(keep='last', inplace=True)
             assert_series_equal(sc, s[~expected])
 
-            # deprecate take_last
-            with tm.assert_produces_warning(FutureWarning):
-                assert_series_equal(s.duplicated(take_last=True), expected)
-            with tm.assert_produces_warning(FutureWarning):
-                assert_series_equal(
-                    s.drop_duplicates(take_last=True), s[~expected])
-            sc = s.copy()
-            with tm.assert_produces_warning(FutureWarning):
-                sc.drop_duplicates(take_last=True, inplace=True)
-            assert_series_equal(sc, s[~expected])
-
             expected = Series([False, False, True, True])
             assert_series_equal(s.duplicated(keep=False), expected)
             assert_series_equal(s.drop_duplicates(keep=False), s[~expected])
@@ -953,17 +964,6 @@ def test_drop_duplicates(self):
             sc.drop_duplicates(keep='last', inplace=True)
             assert_series_equal(sc, s[~expected])
 
-            # deprecate take_last
-            with tm.assert_produces_warning(FutureWarning):
-                assert_series_equal(s.duplicated(take_last=True), expected)
-            with tm.assert_produces_warning(FutureWarning):
-                assert_series_equal(
-                    s.drop_duplicates(take_last=True), s[~expected])
-            sc = s.copy()
-            with tm.assert_produces_warning(FutureWarning):
-                sc.drop_duplicates(take_last=True, inplace=True)
-            assert_series_equal(sc, s[~expected])
-
             expected = Series([False, True, True, False, True, True, False])
             assert_series_equal(s.duplicated(keep=False), expected)
             assert_series_equal(s.drop_duplicates(keep=False), s[~expected])
@@ -971,125 +971,19 @@ def test_drop_duplicates(self):
             sc.drop_duplicates(keep=False, inplace=True)
             assert_series_equal(sc, s[~expected])
 
-    def test_rank(self):
-        tm._skip_if_no_scipy()
-        from scipy.stats import rankdata
-
-        self.ts[::2] = np.nan
-        self.ts[:10][::3] = 4.
-
-        ranks = self.ts.rank()
-        oranks = self.ts.astype('O').rank()
-
-        assert_series_equal(ranks, oranks)
-
-        mask = np.isnan(self.ts)
-        filled = self.ts.fillna(np.inf)
-
-        # rankdata returns a ndarray
-        exp = Series(rankdata(filled), index=filled.index, name='ts')
-        exp[mask] = np.nan
-
-        tm.assert_series_equal(ranks, exp)
-
-        iseries = Series(np.arange(5).repeat(2))
-
-        iranks = iseries.rank()
-        exp = iseries.astype(float).rank()
-        assert_series_equal(iranks, exp)
-        iseries = Series(np.arange(5)) + 1.0
-        exp = iseries / 5.0
-        iranks = iseries.rank(pct=True)
-
-        assert_series_equal(iranks, exp)
-
-        iseries = Series(np.repeat(1, 100))
-        exp = Series(np.repeat(0.505, 100))
-        iranks = iseries.rank(pct=True)
-        assert_series_equal(iranks, exp)
-
-        iseries[1] = np.nan
-        exp = Series(np.repeat(50.0 / 99.0, 100))
-        exp[1] = np.nan
-        iranks = iseries.rank(pct=True)
-        assert_series_equal(iranks, exp)
-
-        iseries = Series(np.arange(5)) + 1.0
-        iseries[4] = np.nan
-        exp = iseries / 4.0
-        iranks = iseries.rank(pct=True)
-        assert_series_equal(iranks, exp)
-
-        iseries = Series(np.repeat(np.nan, 100))
-        exp = iseries.copy()
-        iranks = iseries.rank(pct=True)
-        assert_series_equal(iranks, exp)
-
-        iseries = Series(np.arange(5)) + 1
-        iseries[4] = np.nan
-        exp = iseries / 4.0
-        iranks = iseries.rank(pct=True)
-        assert_series_equal(iranks, exp)
-
-        rng = date_range('1/1/1990', periods=5)
-        iseries = Series(np.arange(5), rng) + 1
-        iseries.iloc[4] = np.nan
-        exp = iseries / 4.0
-        iranks = iseries.rank(pct=True)
-        assert_series_equal(iranks, exp)
-
-        iseries = Series([1e-50, 1e-100, 1e-20, 1e-2, 1e-20 + 1e-30, 1e-1])
-        exp = Series([2, 1, 3, 5, 4, 6.0])
-        iranks = iseries.rank()
-        assert_series_equal(iranks, exp)
-
-        # GH 5968
-        iseries = Series(['3 day', '1 day 10m', '-2 day', pd.NaT],
-                         dtype='m8[ns]')
-        exp = Series([3, 2, 1, np.nan])
-        iranks = iseries.rank()
-        assert_series_equal(iranks, exp)
-
-        values = np.array(
-            [-50, -1, -1e-20, -1e-25, -1e-50, 0, 1e-40, 1e-20, 1e-10, 2, 40
-             ], dtype='float64')
-        random_order = np.random.permutation(len(values))
-        iseries = Series(values[random_order])
-        exp = Series(random_order + 1.0, dtype='float64')
-        iranks = iseries.rank()
-        assert_series_equal(iranks, exp)
-
-    def test_rank_signature(self):
-        s = Series([0, 1])
-        s.rank(method='average')
-        self.assertRaises(ValueError, s.rank, 'average')
-
-    def test_rank_inf(self):
-        raise nose.SkipTest('DataFrame.rank does not currently rank '
-                            'np.inf and -np.inf properly')
-
-        values = np.array(
-            [-np.inf, -50, -1, -1e-20, -1e-25, -1e-50, 0, 1e-40, 1e-20, 1e-10,
-             2, 40, np.inf], dtype='float64')
-        random_order = np.random.permutation(len(values))
-        iseries = Series(values[random_order])
-        exp = Series(random_order + 1.0, dtype='float64')
-        iranks = iseries.rank()
-        assert_series_equal(iranks, exp)
-
     def test_clip(self):
         val = self.ts.median()
 
-        self.assertEqual(self.ts.clip_lower(val).min(), val)
-        self.assertEqual(self.ts.clip_upper(val).max(), val)
+        assert self.ts.clip_lower(val).min() == val
+        assert self.ts.clip_upper(val).max() == val
 
-        self.assertEqual(self.ts.clip(lower=val).min(), val)
-        self.assertEqual(self.ts.clip(upper=val).max(), val)
+        assert self.ts.clip(lower=val).min() == val
+        assert self.ts.clip(upper=val).max() == val
 
         result = self.ts.clip(-0.5, 0.5)
         expected = np.clip(self.ts, -0.5, 0.5)
         assert_series_equal(result, expected)
-        tm.assertIsInstance(expected, Series)
+        assert isinstance(expected, Series)
 
     def test_clip_types_and_nulls(self):
@@ -1101,10 +995,21 @@ def test_clip_types_and_nulls(self):
thresh = s[2] l = s.clip_lower(thresh) u = s.clip_upper(thresh) - self.assertEqual(l[notnull(l)].min(), thresh) - self.assertEqual(u[notnull(u)].max(), thresh) - self.assertEqual(list(isnull(s)), list(isnull(l))) - self.assertEqual(list(isnull(s)), list(isnull(u))) + assert l[notna(l)].min() == thresh + assert u[notna(u)].max() == thresh + assert list(isna(s)) == list(isna(l)) + assert list(isna(s)) == list(isna(u)) + + def test_clip_with_na_args(self): + """Should process np.nan argument as None """ + # GH # 17276 + s = Series([1, 2, 3]) + + assert_series_equal(s.clip(np.nan), Series([1, 2, 3])) + assert_series_equal(s.clip(upper=[1, 1, np.nan]), Series([1, 2, 3])) + assert_series_equal(s.clip(lower=[1, np.nan, 1]), Series([1, 2, 3])) + assert_series_equal(s.clip(upper=np.nan, lower=np.nan), + Series([1, 2, 3])) def test_clip_against_series(self): # GH #6966 @@ -1117,20 +1022,33 @@ def test_clip_against_series(self): lower = Series([1.0, 2.0, 3.0]) upper = Series([1.5, 2.5, 3.5]) + assert_series_equal(s.clip(lower, upper), Series([1.0, 2.0, 3.5])) assert_series_equal(s.clip(1.5, upper), Series([1.5, 1.5, 3.5])) + @pytest.mark.parametrize("inplace", [True, False]) + @pytest.mark.parametrize("upper", [[1, 2, 3], np.asarray([1, 2, 3])]) + def test_clip_against_list_like(self, inplace, upper): + # GH #15390 + original = pd.Series([5, 6, 7]) + result = original.clip(upper=upper, inplace=inplace) + expected = pd.Series([1, 2, 3]) + + if inplace: + result = original + tm.assert_series_equal(result, expected, check_exact=True) + def test_clip_with_datetimes(self): # GH 11838 # naive and tz-aware datetimes t = Timestamp('2015-12-01 09:30:30') - s = Series([Timestamp('2015-12-01 09:30:00'), Timestamp( - '2015-12-01 09:31:00')]) + s = Series([Timestamp('2015-12-01 09:30:00'), + Timestamp('2015-12-01 09:31:00')]) result = s.clip(upper=t) - expected = Series([Timestamp('2015-12-01 09:30:00'), Timestamp( - '2015-12-01 09:30:30')]) + expected = Series([Timestamp('2015-12-01 09:30:00'), + Timestamp('2015-12-01 09:30:30')]) assert_series_equal(result, expected) t = Timestamp('2015-12-01 09:30:30', tz='US/Eastern') @@ -1185,13 +1103,25 @@ def test_isin(self): expected = Series([True, False, True, False, False, False, True, True]) assert_series_equal(result, expected) + # GH: 16012 + # This specific issue has to have a series over 1e6 in len, but the + # comparison array (in_list) must be large enough so that numpy doesn't + # do a manual masking trick that will avoid this issue altogether + s = Series(list('abcdefghijk' * 10 ** 5)) + # If numpy doesn't do the manual comparison/mask, these + # unorderable mixed types are what cause the exception in numpy + in_list = [-1, 'a', 'b', 'G', 'Y', 'Z', 'E', + 'K', 'E', 'S', 'I', 'R', 'R'] * 6 + + assert s.isin(in_list).sum() == 200000 + def test_isin_with_string_scalar(self): # GH4763 s = Series(['A', 'B', 'C', 'a', 'B', 'B', 'A', 'C']) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): s.isin('a') - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): s = Series(['aaa', 'b', 'c']) s.isin('aaa') @@ -1228,6 +1158,15 @@ def test_isin_with_i8(self): result = s.isin(s[0:2]) assert_series_equal(result, expected) + @pytest.mark.parametrize("empty", [[], Series(), np.array([])]) + def test_isin_empty(self, empty): + # see gh-16991 + s = Series(["a", "b"]) + expected = Series([False, False]) + + result = s.isin(empty) + tm.assert_series_equal(expected, result) + def test_timedelta64_analytics(self): from pandas import date_range @@ -1236,20 
+1175,20 @@ def test_timedelta64_analytics(self): Timestamp('20120101') result = td.idxmin() - self.assertEqual(result, 0) + assert result == 0 result = td.idxmax() - self.assertEqual(result, 2) + assert result == 2 # GH 2982 # with NaT td[0] = np.nan result = td.idxmin() - self.assertEqual(result, 1) + assert result == 1 result = td.idxmax() - self.assertEqual(result, 2) + assert result == 2 # abs s1 = Series(date_range('20120101', periods=3)) @@ -1266,145 +1205,145 @@ def test_timedelta64_analytics(self): # max/min result = td.max() expected = Timedelta('2 days') - self.assertEqual(result, expected) + assert result == expected result = td.min() expected = Timedelta('1 days') - self.assertEqual(result, expected) + assert result == expected def test_idxmin(self): # test idxmin - # _check_stat_op approach can not be used here because of isnull check. + # _check_stat_op approach can not be used here because of isna check. # add some NaNs self.series[5:15] = np.NaN # skipna or no - self.assertEqual(self.series[self.series.idxmin()], self.series.min()) - self.assertTrue(isnull(self.series.idxmin(skipna=False))) + assert self.series[self.series.idxmin()] == self.series.min() + assert isna(self.series.idxmin(skipna=False)) # no NaNs nona = self.series.dropna() - self.assertEqual(nona[nona.idxmin()], nona.min()) - self.assertEqual(nona.index.values.tolist().index(nona.idxmin()), - nona.values.argmin()) + assert nona[nona.idxmin()] == nona.min() + assert (nona.index.values.tolist().index(nona.idxmin()) == + nona.values.argmin()) # all NaNs allna = self.series * nan - self.assertTrue(isnull(allna.idxmin())) + assert isna(allna.idxmin()) # datetime64[ns] from pandas import date_range s = Series(date_range('20130102', periods=6)) result = s.idxmin() - self.assertEqual(result, 0) + assert result == 0 s[0] = np.nan result = s.idxmin() - self.assertEqual(result, 1) + assert result == 1 def test_numpy_argmin(self): # argmin is aliased to idxmin data = np.random.randint(0, 11, size=10) result = np.argmin(Series(data)) - self.assertEqual(result, np.argmin(data)) + assert result == np.argmin(data) if not _np_version_under1p10: msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.argmin, - Series(data), out=data) + tm.assert_raises_regex(ValueError, msg, np.argmin, + Series(data), out=data) def test_idxmax(self): # test idxmax - # _check_stat_op approach can not be used here because of isnull check. + # _check_stat_op approach can not be used here because of isna check. 
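The `idxmin`/`idxmax` hunks around here all pin down the same `skipna` behaviour for pandas of this vintage; a condensed, runnable sketch with illustrative data:

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, 1.0, 2.0])

assert s.idxmin() == 2                   # label of the minimum; NaNs skipped
assert s.idxmax() == 0                   # label of the maximum
assert pd.isna(s.idxmin(skipna=False))   # with skipna=False, any NaN poisons the result
```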
# add some NaNs self.series[5:15] = np.NaN # skipna or no - self.assertEqual(self.series[self.series.idxmax()], self.series.max()) - self.assertTrue(isnull(self.series.idxmax(skipna=False))) + assert self.series[self.series.idxmax()] == self.series.max() + assert isna(self.series.idxmax(skipna=False)) # no NaNs nona = self.series.dropna() - self.assertEqual(nona[nona.idxmax()], nona.max()) - self.assertEqual(nona.index.values.tolist().index(nona.idxmax()), - nona.values.argmax()) + assert nona[nona.idxmax()] == nona.max() + assert (nona.index.values.tolist().index(nona.idxmax()) == + nona.values.argmax()) # all NaNs allna = self.series * nan - self.assertTrue(isnull(allna.idxmax())) + assert isna(allna.idxmax()) from pandas import date_range s = Series(date_range('20130102', periods=6)) result = s.idxmax() - self.assertEqual(result, 5) + assert result == 5 s[5] = np.nan result = s.idxmax() - self.assertEqual(result, 4) + assert result == 4 # Float64Index # GH 5914 s = pd.Series([1, 2, 3], [1.1, 2.1, 3.1]) result = s.idxmax() - self.assertEqual(result, 3.1) + assert result == 3.1 result = s.idxmin() - self.assertEqual(result, 1.1) + assert result == 1.1 s = pd.Series(s.index, s.index) result = s.idxmax() - self.assertEqual(result, 3.1) + assert result == 3.1 result = s.idxmin() - self.assertEqual(result, 1.1) + assert result == 1.1 def test_numpy_argmax(self): # argmax is aliased to idxmax data = np.random.randint(0, 11, size=10) result = np.argmax(Series(data)) - self.assertEqual(result, np.argmax(data)) + assert result == np.argmax(data) if not _np_version_under1p10: msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.argmax, - Series(data), out=data) + tm.assert_raises_regex(ValueError, msg, np.argmax, + Series(data), out=data) def test_ptp(self): N = 1000 arr = np.random.randn(N) ser = Series(arr) - self.assertEqual(np.ptp(ser), np.ptp(arr)) + assert np.ptp(ser) == np.ptp(arr) # GH11163 s = Series([3, 5, np.nan, -3, 10]) - self.assertEqual(s.ptp(), 13) - self.assertTrue(pd.isnull(s.ptp(skipna=False))) + assert s.ptp() == 13 + assert pd.isna(s.ptp(skipna=False)) mi = pd.MultiIndex.from_product([['a', 'b'], [1, 2, 3]]) s = pd.Series([1, np.nan, 7, 3, 5, np.nan], index=mi) expected = pd.Series([6, 2], index=['a', 'b'], dtype=np.float64) - self.assert_series_equal(s.ptp(level=0), expected) + tm.assert_series_equal(s.ptp(level=0), expected) expected = pd.Series([np.nan, np.nan], index=['a', 'b']) - self.assert_series_equal(s.ptp(level=0, skipna=False), expected) + tm.assert_series_equal(s.ptp(level=0, skipna=False), expected) - with self.assertRaises(ValueError): + with pytest.raises(ValueError): s.ptp(axis=1) s = pd.Series(['a', 'b', 'c', 'd', 'e']) - with self.assertRaises(TypeError): + with pytest.raises(TypeError): s.ptp() - with self.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): s.ptp(numeric_only=True) def test_empty_timeseries_redections_return_nat(self): # covers #11245 for dtype in ('m8[ns]', 'm8[ns]', 'M8[ns]', 'M8[ns, UTC]'): - self.assertIs(Series([], dtype=dtype).min(), pd.NaT) - self.assertIs(Series([], dtype=dtype).max(), pd.NaT) + assert Series([], dtype=dtype).min() is pd.NaT + assert Series([], dtype=dtype).max() is pd.NaT def test_unique_data_ownership(self): # it works! 
#1807 @@ -1434,7 +1373,7 @@ def test_numpy_repeat(self): assert_series_equal(np.repeat(s, 2), expected) msg = "the 'axis' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.repeat, s, 2, axis=0) + tm.assert_raises_regex(ValueError, msg, np.repeat, s, 2, axis=0) def test_searchsorted(self): s = Series([1, 2, 3]) @@ -1453,7 +1392,7 @@ def test_searchsorted_numeric_dtypes_scalar(self): s = Series([1, 2, 90, 1000, 3e9]) r = s.searchsorted(30) e = 2 - self.assertEqual(r, e) + assert r == e r = s.searchsorted([30]) e = np.array([2], dtype=np.intp) @@ -1470,7 +1409,7 @@ def test_search_sorted_datetime64_scalar(self): v = pd.Timestamp('20120102') r = s.searchsorted(v) e = 1 - self.assertEqual(r, e) + assert r == e def test_search_sorted_datetime64_list(self): s = Series(pd.date_range('20120101', periods=10, freq='2D')) @@ -1489,111 +1428,26 @@ def test_searchsorted_sorter(self): def test_is_unique(self): # GH11946 s = Series(np.random.randint(0, 10, size=1000)) - self.assertFalse(s.is_unique) + assert not s.is_unique s = Series(np.arange(1000)) - self.assertTrue(s.is_unique) + assert s.is_unique def test_is_monotonic(self): s = Series(np.random.randint(0, 10, size=1000)) - self.assertFalse(s.is_monotonic) + assert not s.is_monotonic s = Series(np.arange(1000)) - self.assertTrue(s.is_monotonic) - self.assertTrue(s.is_monotonic_increasing) + assert s.is_monotonic + assert s.is_monotonic_increasing s = Series(np.arange(1000, 0, -1)) - self.assertTrue(s.is_monotonic_decreasing) + assert s.is_monotonic_decreasing s = Series(pd.date_range('20130101', periods=10)) - self.assertTrue(s.is_monotonic) - self.assertTrue(s.is_monotonic_increasing) + assert s.is_monotonic + assert s.is_monotonic_increasing s = Series(list(reversed(s.tolist()))) - self.assertFalse(s.is_monotonic) - self.assertTrue(s.is_monotonic_decreasing) - - def test_nsmallest_nlargest(self): - # float, int, datetime64 (use i8), timedelts64 (same), - # object that are numbers, object that are strings - - base = [3, 2, 1, 2, 5] - - s_list = [ - Series(base, dtype='int8'), - Series(base, dtype='int16'), - Series(base, dtype='int32'), - Series(base, dtype='int64'), - Series(base, dtype='float32'), - Series(base, dtype='float64'), - Series(base, dtype='uint8'), - Series(base, dtype='uint16'), - Series(base, dtype='uint32'), - Series(base, dtype='uint64'), - Series(base).astype('timedelta64[ns]'), - Series(pd.to_datetime(['2003', '2002', '2001', '2002', '2005'])), - ] - - raising = [ - Series([3., 2, 1, 2, '5'], dtype='object'), - Series([3., 2, 1, 2, 5], dtype='object'), - # not supported on some archs - # Series([3., 2, 1, 2, 5], dtype='complex256'), - Series([3., 2, 1, 2, 5], dtype='complex128'), - ] - - for r in raising: - dt = r.dtype - msg = "Cannot use method 'n(larg|small)est' with dtype %s" % dt - args = 2, len(r), 0, -1 - methods = r.nlargest, r.nsmallest - for method, arg in product(methods, args): - with tm.assertRaisesRegexp(TypeError, msg): - method(arg) - - for s in s_list: - - assert_series_equal(s.nsmallest(2), s.iloc[[2, 1]]) - - assert_series_equal(s.nsmallest(2, keep='last'), s.iloc[[2, 3]]) - with tm.assert_produces_warning(FutureWarning): - assert_series_equal( - s.nsmallest(2, take_last=True), s.iloc[[2, 3]]) - - assert_series_equal(s.nlargest(3), s.iloc[[4, 0, 1]]) - - assert_series_equal(s.nlargest(3, keep='last'), s.iloc[[4, 0, 3]]) - with tm.assert_produces_warning(FutureWarning): - assert_series_equal( - s.nlargest(3, take_last=True), s.iloc[[4, 0, 3]]) - - empty = s.iloc[0:0] - 
assert_series_equal(s.nsmallest(0), empty) - assert_series_equal(s.nsmallest(-1), empty) - assert_series_equal(s.nlargest(0), empty) - assert_series_equal(s.nlargest(-1), empty) - - assert_series_equal(s.nsmallest(len(s)), s.sort_values()) - assert_series_equal(s.nsmallest(len(s) + 1), s.sort_values()) - assert_series_equal(s.nlargest(len(s)), s.iloc[[4, 0, 1, 3, 2]]) - assert_series_equal(s.nlargest(len(s) + 1), - s.iloc[[4, 0, 1, 3, 2]]) - - s = Series([3., np.nan, 1, 2, 5]) - assert_series_equal(s.nlargest(), s.iloc[[4, 0, 3, 2]]) - assert_series_equal(s.nsmallest(), s.iloc[[2, 3, 0, 4]]) - - msg = 'keep must be either "first", "last"' - with tm.assertRaisesRegexp(ValueError, msg): - s.nsmallest(keep='invalid') - with tm.assertRaisesRegexp(ValueError, msg): - s.nlargest(keep='invalid') - - # GH 13412 - s = Series([1, 4, 3, 2], index=[0, 0, 1, 1]) - result = s.nlargest(3) - expected = s.sort_values(ascending=False).head(3) - assert_series_equal(result, expected) - result = s.nsmallest(3) - expected = s.sort_values().head(3) - assert_series_equal(result, expected) + assert not s.is_monotonic + assert s.is_monotonic_decreasing def test_sort_index_level(self): mi = MultiIndex.from_tuples([[1, 1, 3], [1, 1, 1]], names=list('ABC')) @@ -1629,7 +1483,7 @@ def test_apply_categorical(self): result = s.apply(lambda x: 'A') exp = pd.Series(['A'] * 7, name='XX', index=list('abcdefg')) tm.assert_series_equal(result, exp) - self.assertEqual(result.dtype, np.object) + assert result.dtype == np.object def test_shift_int(self): ts = self.ts.astype(int) @@ -1645,13 +1499,13 @@ def test_shift_categorical(self): sp1 = s.shift(1) assert_index_equal(s.index, sp1.index) - self.assertTrue(np.all(sp1.values.codes[:1] == -1)) - self.assertTrue(np.all(s.values.codes[:-1] == sp1.values.codes[1:])) + assert np.all(sp1.values.codes[:1] == -1) + assert np.all(s.values.codes[:-1] == sp1.values.codes[1:]) sn2 = s.shift(-2) assert_index_equal(s.index, sn2.index) - self.assertTrue(np.all(sn2.values.codes[-2:] == -1)) - self.assertTrue(np.all(s.values.codes[2:] == sn2.values.codes[:-2])) + assert np.all(sn2.values.codes[-2:] == -1) + assert np.all(s.values.codes[2:] == sn2.values.codes[:-2]) assert_index_equal(s.values.categories, sp1.values.categories) assert_index_equal(s.values.categories, sn2.values.categories) @@ -1664,7 +1518,7 @@ def test_reshape_non_2d(self): # see gh-4554 with tm.assert_produces_warning(FutureWarning): x = Series(np.random.random(201), name='x') - self.assertTrue(x.reshape(x.shape, ) is x) + assert x.reshape(x.shape, ) is x # see gh-2719 with tm.assert_produces_warning(FutureWarning): @@ -1672,34 +1526,36 @@ def test_reshape_non_2d(self): result = a.reshape(2, 2) expected = a.values.reshape(2, 2) tm.assert_numpy_array_equal(result, expected) - self.assertIsInstance(result, type(expected)) + assert isinstance(result, type(expected)) def test_reshape_2d_return_array(self): x = Series(np.random.random(201), name='x') with tm.assert_produces_warning(FutureWarning): result = x.reshape((-1, 1)) - self.assertNotIsInstance(result, Series) + assert not isinstance(result, Series) with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): result2 = np.reshape(x, (-1, 1)) - self.assertNotIsInstance(result2, Series) + assert not isinstance(result2, Series) with tm.assert_produces_warning(FutureWarning): result = x[:, None] expected = x.reshape((-1, 1)) - assert_almost_equal(result, expected) + tm.assert_almost_equal(result, expected) def test_reshape_bad_kwarg(self): a = Series([1, 2, 3, 4]) with 
tm.assert_produces_warning(FutureWarning, check_stacklevel=False): msg = "'foo' is an invalid keyword argument for this function" - tm.assertRaisesRegexp(TypeError, msg, a.reshape, (2, 2), foo=2) + tm.assert_raises_regex( + TypeError, msg, a.reshape, (2, 2), foo=2) with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): msg = r"reshape\(\) got an unexpected keyword argument 'foo'" - tm.assertRaisesRegexp(TypeError, msg, a.reshape, a.shape, foo=2) + tm.assert_raises_regex( + TypeError, msg, a.reshape, a.shape, foo=2) def test_numpy_reshape(self): a = Series([1, 2, 3, 4]) @@ -1708,7 +1564,7 @@ def test_numpy_reshape(self): result = np.reshape(a, (2, 2)) expected = a.values.reshape(2, 2) tm.assert_numpy_array_equal(result, expected) - self.assertIsInstance(result, type(expected)) + assert isinstance(result, type(expected)) with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): result = np.reshape(a, a.shape) @@ -1740,7 +1596,7 @@ def test_unstack(self): labels=[[0, 1, 2, 0, 1, 2], [0, 1, 0, 1, 0, 1]]) expected = DataFrame({'bar': s.values}, index=exp_index).sort_index(level=0) - unstacked = s.unstack(0) + unstacked = s.unstack(0).sort_index() assert_frame_equal(unstacked, expected) # GH5873 @@ -1869,3 +1725,109 @@ def test_value_counts_categorical_not_ordered(self): index=exp_idx, name='xxx') tm.assert_series_equal(s.value_counts(normalize=True), exp) tm.assert_series_equal(idx.value_counts(normalize=True), exp) + + +@pytest.fixture +def s_main_dtypes(): + df = pd.DataFrame( + {'datetime': pd.to_datetime(['2003', '2002', + '2001', '2002', + '2005']), + 'datetimetz': pd.to_datetime( + ['2003', '2002', + '2001', '2002', + '2005']).tz_localize('US/Eastern'), + 'timedelta': pd.to_timedelta(['3d', '2d', '1d', + '2d', '5d'])}) + + for dtype in ['int8', 'int16', 'int32', 'int64', + 'float32', 'float64', + 'uint8', 'uint16', 'uint32', 'uint64']: + df[dtype] = Series([3, 2, 1, 2, 5], dtype=dtype) + + return df + + +class TestNLargestNSmallest(object): + + @pytest.mark.parametrize( + "r", [Series([3., 2, 1, 2, '5'], dtype='object'), + Series([3., 2, 1, 2, 5], dtype='object'), + # not supported on some archs + # Series([3., 2, 1, 2, 5], dtype='complex256'), + Series([3., 2, 1, 2, 5], dtype='complex128'), + Series(list('abcde'), dtype='category'), + Series(list('abcde'))]) + def test_error(self, r): + dt = r.dtype + msg = ("Cannot use method 'n(larg|small)est' with " + "dtype {dt}".format(dt=dt)) + args = 2, len(r), 0, -1 + methods = r.nlargest, r.nsmallest + for method, arg in product(methods, args): + with tm.assert_raises_regex(TypeError, msg): + method(arg) + + @pytest.mark.parametrize( + "s", + [v for k, v in s_main_dtypes().iteritems()]) + def test_nsmallest_nlargest(self, s): + # float, int, datetime64 (use i8), timedelts64 (same), + # object that are numbers, object that are strings + + assert_series_equal(s.nsmallest(2), s.iloc[[2, 1]]) + assert_series_equal(s.nsmallest(2, keep='last'), s.iloc[[2, 3]]) + + empty = s.iloc[0:0] + assert_series_equal(s.nsmallest(0), empty) + assert_series_equal(s.nsmallest(-1), empty) + assert_series_equal(s.nlargest(0), empty) + assert_series_equal(s.nlargest(-1), empty) + + assert_series_equal(s.nsmallest(len(s)), s.sort_values()) + assert_series_equal(s.nsmallest(len(s) + 1), s.sort_values()) + assert_series_equal(s.nlargest(len(s)), s.iloc[[4, 0, 1, 3, 2]]) + assert_series_equal(s.nlargest(len(s) + 1), + s.iloc[[4, 0, 1, 3, 2]]) + + def test_misc(self): + + s = Series([3., np.nan, 1, 2, 5]) + assert_series_equal(s.nlargest(), 
s.iloc[[4, 0, 3, 2]]) + assert_series_equal(s.nsmallest(), s.iloc[[2, 3, 0, 4]]) + + msg = 'keep must be either "first", "last"' + with tm.assert_raises_regex(ValueError, msg): + s.nsmallest(keep='invalid') + with tm.assert_raises_regex(ValueError, msg): + s.nlargest(keep='invalid') + + # GH 15297 + s = Series([1] * 5, index=[1, 2, 3, 4, 5]) + expected_first = Series([1] * 3, index=[1, 2, 3]) + expected_last = Series([1] * 3, index=[5, 4, 3]) + + result = s.nsmallest(3) + assert_series_equal(result, expected_first) + + result = s.nsmallest(3, keep='last') + assert_series_equal(result, expected_last) + + result = s.nlargest(3) + assert_series_equal(result, expected_first) + + result = s.nlargest(3, keep='last') + assert_series_equal(result, expected_last) + + @pytest.mark.parametrize('n', range(1, 5)) + def test_n(self, n): + + # GH 13412 + s = Series([1, 4, 3, 2], index=[0, 0, 1, 1]) + result = s.nlargest(n) + expected = s.sort_values(ascending=False).head(n) + assert_series_equal(result, expected) + + result = s.nsmallest(n) + expected = s.sort_values().head(n) + assert_series_equal(result, expected) diff --git a/pandas/tests/series/test_misc_api.py b/pandas/tests/series/test_api.py similarity index 53% rename from pandas/tests/series/test_misc_api.py rename to pandas/tests/series/test_api.py index b1b06cc7be8a4..b7fbe803f8d3b 100644 --- a/pandas/tests/series/test_misc_api.py +++ b/pandas/tests/series/test_api.py @@ -1,15 +1,18 @@ # coding=utf-8 # pylint: disable-msg=E1101,W0612 +from collections import OrderedDict + +import pytest import numpy as np import pandas as pd from pandas import Index, Series, DataFrame, date_range -from pandas.tseries.index import Timestamp +from pandas.core.indexes.datetimes import Timestamp from pandas.compat import range from pandas import compat -import pandas.formats.printing as printing +import pandas.io.formats.printing as printing from pandas.util.testing import (assert_series_equal, ensure_clean) import pandas.util.testing as tm @@ -18,49 +21,58 @@ class SharedWithSparse(object): + """ + A collection of tests Series and SparseSeries can share. + + In generic tests on this class, use ``self._assert_series_equal()`` + which is implemented in sub-classes. 
+ """ + def _assert_series_equal(self, left, right): + """Dispatch to series class dependent assertion""" + raise NotImplementedError def test_scalarop_preserve_name(self): result = self.ts * 2 - self.assertEqual(result.name, self.ts.name) + assert result.name == self.ts.name def test_copy_name(self): result = self.ts.copy() - self.assertEqual(result.name, self.ts.name) + assert result.name == self.ts.name def test_copy_index_name_checking(self): # don't want to be able to modify the index stored elsewhere after # making a copy self.ts.index.name = None - self.assertIsNone(self.ts.index.name) - self.assertIs(self.ts, self.ts) + assert self.ts.index.name is None + assert self.ts is self.ts cp = self.ts.copy() cp.index.name = 'foo' printing.pprint_thing(self.ts.index.name) - self.assertIsNone(self.ts.index.name) + assert self.ts.index.name is None def test_append_preserve_name(self): result = self.ts[:5].append(self.ts[5:]) - self.assertEqual(result.name, self.ts.name) + assert result.name == self.ts.name def test_binop_maybe_preserve_name(self): # names match, preserve result = self.ts * self.ts - self.assertEqual(result.name, self.ts.name) + assert result.name == self.ts.name result = self.ts.mul(self.ts) - self.assertEqual(result.name, self.ts.name) + assert result.name == self.ts.name result = self.ts * self.ts[:-2] - self.assertEqual(result.name, self.ts.name) + assert result.name == self.ts.name # names don't match, don't preserve cp = self.ts.copy() cp.name = 'something else' result = self.ts + cp - self.assertIsNone(result.name) + assert result.name is None result = self.ts.add(cp) - self.assertIsNone(result.name) + assert result.name is None ops = ['add', 'sub', 'mul', 'div', 'truediv', 'floordiv', 'mod', 'pow'] ops = ops + ['r' + op for op in ops] @@ -68,27 +80,27 @@ def test_binop_maybe_preserve_name(self): # names match, preserve s = self.ts.copy() result = getattr(s, op)(s) - self.assertEqual(result.name, self.ts.name) + assert result.name == self.ts.name # names don't match, don't preserve cp = self.ts.copy() cp.name = 'changed' result = getattr(s, op)(cp) - self.assertIsNone(result.name) + assert result.name is None def test_combine_first_name(self): result = self.ts.combine_first(self.ts[:5]) - self.assertEqual(result.name, self.ts.name) + assert result.name == self.ts.name def test_getitem_preserve_name(self): result = self.ts[self.ts > 0] - self.assertEqual(result.name, self.ts.name) + assert result.name == self.ts.name result = self.ts[[0, 2, 4]] - self.assertEqual(result.name, self.ts.name) + assert result.name == self.ts.name result = self.ts[5:10] - self.assertEqual(result.name, self.ts.name) + assert result.name == self.ts.name def test_pickle(self): unp_series = self._pickle_roundtrip(self.series) @@ -105,122 +117,203 @@ def _pickle_roundtrip(self, obj): def test_argsort_preserve_name(self): result = self.ts.argsort() - self.assertEqual(result.name, self.ts.name) + assert result.name == self.ts.name def test_sort_index_name(self): result = self.ts.sort_index(ascending=False) - self.assertEqual(result.name, self.ts.name) + assert result.name == self.ts.name def test_to_sparse_pass_name(self): result = self.ts.to_sparse() - self.assertEqual(result.name, self.ts.name) - - -class TestSeriesMisc(TestData, SharedWithSparse, tm.TestCase): - - _multiprocess_can_split_ = True + assert result.name == self.ts.name + + def test_constructor_dict(self): + d = {'a': 0., 'b': 1., 'c': 2.} + result = self.series_klass(d) + expected = self.series_klass(d, index=sorted(d.keys())) + 
self._assert_series_equal(result, expected) + + result = self.series_klass(d, index=['b', 'c', 'd', 'a']) + expected = self.series_klass([1, 2, np.nan, 0], + index=['b', 'c', 'd', 'a']) + self._assert_series_equal(result, expected) + + def test_constructor_subclass_dict(self): + data = tm.TestSubDict((x, 10.0 * x) for x in range(10)) + series = self.series_klass(data) + expected = self.series_klass(dict(compat.iteritems(data))) + self._assert_series_equal(series, expected) + + def test_constructor_ordereddict(self): + # GH3283 + data = OrderedDict( + ('col%s' % i, np.random.random()) for i in range(12)) + + series = self.series_klass(data) + expected = self.series_klass(list(data.values()), list(data.keys())) + self._assert_series_equal(series, expected) + + # Test with subclass + class A(OrderedDict): + pass + + series = self.series_klass(A(data)) + self._assert_series_equal(series, expected) + + def test_constructor_dict_multiindex(self): + d = {('a', 'a'): 0., ('b', 'a'): 1., ('b', 'c'): 2.} + _d = sorted(d.items()) + result = self.series_klass(d) + expected = self.series_klass( + [x[1] for x in _d], + index=pd.MultiIndex.from_tuples([x[0] for x in _d])) + self._assert_series_equal(result, expected) + + d['z'] = 111. + _d.insert(0, ('z', d['z'])) + result = self.series_klass(d) + expected = self.series_klass([x[1] for x in _d], + index=pd.Index([x[0] for x in _d], + tupleize_cols=False)) + result = result.reindex(index=expected.index) + self._assert_series_equal(result, expected) + + def test_constructor_dict_timedelta_index(self): + # GH #12169 : Resample category data with timedelta index + # construct Series from dict as data and TimedeltaIndex as index + # will result NaN in result Series data + expected = self.series_klass( + data=['A', 'B', 'C'], + index=pd.to_timedelta([0, 10, 20], unit='s') + ) + + result = self.series_klass( + data={pd.to_timedelta(0, unit='s'): 'A', + pd.to_timedelta(10, unit='s'): 'B', + pd.to_timedelta(20, unit='s'): 'C'}, + index=pd.to_timedelta([0, 10, 20], unit='s') + ) + self._assert_series_equal(result, expected) + + +class TestSeriesMisc(TestData, SharedWithSparse): + + series_klass = Series + # SharedWithSparse tests use generic, series_klass-agnostic assertion + _assert_series_equal = staticmethod(tm.assert_series_equal) def test_tab_completion(self): # GH 9910 s = Series(list('abcd')) # Series of str values should have .str but not .dt/.cat in __dir__ - self.assertTrue('str' in dir(s)) - self.assertTrue('dt' not in dir(s)) - self.assertTrue('cat' not in dir(s)) + assert 'str' in dir(s) + assert 'dt' not in dir(s) + assert 'cat' not in dir(s) # similiarly for .dt s = Series(date_range('1/1/2015', periods=5)) - self.assertTrue('dt' in dir(s)) - self.assertTrue('str' not in dir(s)) - self.assertTrue('cat' not in dir(s)) + assert 'dt' in dir(s) + assert 'str' not in dir(s) + assert 'cat' not in dir(s) - # similiarly for .cat, but with the twist that str and dt should be - # there if the categories are of that type first cat and str + # Similarly for .cat, but with the twist that str and dt should be + # there if the categories are of that type first cat and str. 
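The `test_tab_completion` conversions here encode which accessors a Series advertises via `dir()`, which is what drives tab completion; a condensed, runnable version of those checks:

```python
import pandas as pd

s = pd.Series(list('abcd'))
assert 'str' in dir(s)      # string values expose the .str accessor
assert 'dt' not in dir(s)

s = pd.Series(pd.date_range('1/1/2015', periods=5))
assert 'dt' in dir(s)       # datetime values expose .dt instead
assert 'str' not in dir(s)

# a string categorical exposes .cat *and* .str, as the test below asserts
s = pd.Series(list('abbcd'), dtype='category')
assert 'cat' in dir(s) and 'str' in dir(s)
```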
s = Series(list('abbcd'), dtype="category") - self.assertTrue('cat' in dir(s)) - self.assertTrue('str' in dir(s)) # as it is a string categorical - self.assertTrue('dt' not in dir(s)) + assert 'cat' in dir(s) + assert 'str' in dir(s) # as it is a string categorical + assert 'dt' not in dir(s) # similar to cat and str s = Series(date_range('1/1/2015', periods=5)).astype("category") - self.assertTrue('cat' in dir(s)) - self.assertTrue('str' not in dir(s)) - self.assertTrue('dt' in dir(s)) # as it is a datetime categorical + assert 'cat' in dir(s) + assert 'str' not in dir(s) + assert 'dt' in dir(s) # as it is a datetime categorical def test_not_hashable(self): s_empty = Series() s = Series([1]) - self.assertRaises(TypeError, hash, s_empty) - self.assertRaises(TypeError, hash, s) + pytest.raises(TypeError, hash, s_empty) + pytest.raises(TypeError, hash, s) def test_contains(self): tm.assert_contains_all(self.ts.index, self.ts) def test_iter(self): for i, val in enumerate(self.series): - self.assertEqual(val, self.series[i]) + assert val == self.series[i] for i, val in enumerate(self.ts): - self.assertEqual(val, self.ts[i]) + assert val == self.ts[i] def test_iter_box(self): vals = [pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02')] s = pd.Series(vals) - self.assertEqual(s.dtype, 'datetime64[ns]') + assert s.dtype == 'datetime64[ns]' for res, exp in zip(s, vals): - self.assertIsInstance(res, pd.Timestamp) - self.assertEqual(res, exp) - self.assertIsNone(res.tz) + assert isinstance(res, pd.Timestamp) + assert res.tz is None + assert res == exp vals = [pd.Timestamp('2011-01-01', tz='US/Eastern'), pd.Timestamp('2011-01-02', tz='US/Eastern')] s = pd.Series(vals) - self.assertEqual(s.dtype, 'datetime64[ns, US/Eastern]') + + assert s.dtype == 'datetime64[ns, US/Eastern]' for res, exp in zip(s, vals): - self.assertIsInstance(res, pd.Timestamp) - self.assertEqual(res, exp) - self.assertEqual(res.tz, exp.tz) + assert isinstance(res, pd.Timestamp) + assert res.tz == exp.tz + assert res == exp # timedelta vals = [pd.Timedelta('1 days'), pd.Timedelta('2 days')] s = pd.Series(vals) - self.assertEqual(s.dtype, 'timedelta64[ns]') + assert s.dtype == 'timedelta64[ns]' for res, exp in zip(s, vals): - self.assertIsInstance(res, pd.Timedelta) - self.assertEqual(res, exp) + assert isinstance(res, pd.Timedelta) + assert res == exp # period (object dtype, not boxed) vals = [pd.Period('2011-01-01', freq='M'), pd.Period('2011-01-02', freq='M')] s = pd.Series(vals) - self.assertEqual(s.dtype, 'object') + assert s.dtype == 'object' for res, exp in zip(s, vals): - self.assertIsInstance(res, pd.Period) - self.assertEqual(res, exp) - self.assertEqual(res.freq, 'M') + assert isinstance(res, pd.Period) + assert res.freq == 'M' + assert res == exp def test_keys(self): # HACK: By doing this in two stages, we avoid 2to3 wrapping the call # to .keys() in a list() getkeys = self.ts.keys - self.assertIs(getkeys(), self.ts.index) + assert getkeys() is self.ts.index def test_values(self): - self.assert_almost_equal(self.ts.values, self.ts, check_dtype=False) + tm.assert_almost_equal(self.ts.values, self.ts, check_dtype=False) def test_iteritems(self): for idx, val in compat.iteritems(self.series): - self.assertEqual(val, self.series[idx]) + assert val == self.series[idx] for idx, val in compat.iteritems(self.ts): - self.assertEqual(val, self.ts[idx]) + assert val == self.ts[idx] + + # assert is lazy (genrators don't define reverse, lists do) + assert not hasattr(self.series.iteritems(), 'reverse') + + def test_items(self): + 
for idx, val in self.series.items(): + assert val == self.series[idx] + + for idx, val in self.ts.items(): + assert val == self.ts[idx] # assert is lazy (genrators don't define reverse, lists do) - self.assertFalse(hasattr(self.series.iteritems(), 'reverse')) + assert not hasattr(self.series.items(), 'reverse') def test_raise_on_info(self): s = Series(np.random.randn(10)) - with tm.assertRaises(AttributeError): + with pytest.raises(AttributeError): s.info() def test_copy(self): @@ -238,12 +331,12 @@ def test_copy(self): if deep is None or deep is True: # Did not modify original Series - self.assertTrue(np.isnan(s2[0])) - self.assertFalse(np.isnan(s[0])) + assert np.isnan(s2[0]) + assert not np.isnan(s[0]) else: # we DID modify the original Series - self.assertTrue(np.isnan(s2[0])) - self.assertTrue(np.isnan(s[0])) + assert np.isnan(s2[0]) + assert np.isnan(s[0]) # GH 11794 # copy of tz-aware @@ -274,9 +367,9 @@ def test_copy(self): def test_axis_alias(self): s = Series([1, 2, np.nan]) assert_series_equal(s.dropna(axis='rows'), s.dropna(axis='index')) - self.assertEqual(s.dropna().sum('rows'), 3) - self.assertEqual(s._get_axis_number('rows'), 0) - self.assertEqual(s._get_axis_name('rows'), 'index') + assert s.dropna().sum('rows') == 3 + assert s._get_axis_number('rows') == 0 + assert s._get_axis_name('rows') == 'index' def test_numpy_unique(self): # it works! @@ -293,19 +386,19 @@ def f(x): result = tsdf.apply(f) expected = tsdf.max() - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) # .item() s = Series([1]) result = s.item() - self.assertEqual(result, 1) - self.assertEqual(s.item(), s.iloc[0]) + assert result == 1 + assert s.item() == s.iloc[0] # using an ndarray like function s = Series(np.random.randn(10)) - result = np.ones_like(s) + result = Series(np.ones_like(s)) expected = Series(1, index=range(10), dtype='float64') - # assert_series_equal(result,expected) + tm.assert_series_equal(result, expected) # ravel s = Series(np.random.randn(10)) @@ -315,21 +408,21 @@ def f(x): # GH 6658 s = Series([0, 1., -1], index=list('abc')) result = np.compress(s > 0, s) - assert_series_equal(result, Series([1.], index=['b'])) + tm.assert_series_equal(result, Series([1.], index=['b'])) result = np.compress(s < -1, s) # result empty Index(dtype=object) as the same as original exp = Series([], dtype='float64', index=Index([], dtype='object')) - assert_series_equal(result, exp) + tm.assert_series_equal(result, exp) s = Series([0, 1., -1], index=[.1, .2, .3]) result = np.compress(s > 0, s) - assert_series_equal(result, Series([1.], index=[.2])) + tm.assert_series_equal(result, Series([1.], index=[.2])) result = np.compress(s < -1, s) # result empty Float64Index as the same as original exp = Series([], dtype='float64', index=Index([], dtype='float64')) - assert_series_equal(result, exp) + tm.assert_series_equal(result, exp) def test_str_attribute(self): # GH9068 @@ -341,12 +434,13 @@ def test_str_attribute(self): # str accessor only valid with string values s = Series(range(5)) - with self.assertRaisesRegexp(AttributeError, 'only use .str accessor'): + with tm.assert_raises_regex(AttributeError, + 'only use .str accessor'): s.str.repeat(2) def test_empty_method(self): s_empty = pd.Series() - tm.assert_equal(s_empty.empty, True) + assert s_empty.empty for full_series in [pd.Series([1]), pd.Series(index=[1])]: - tm.assert_equal(full_series.empty, False) + assert not full_series.empty diff --git a/pandas/tests/series/test_apply.py b/pandas/tests/series/test_apply.py index 
ec7ffde344d31..e3be5427588b3 100644 --- a/pandas/tests/series/test_apply.py +++ b/pandas/tests/series/test_apply.py @@ -1,45 +1,42 @@ # coding=utf-8 # pylint: disable-msg=E1101,W0612 +import pytest + +from collections import Counter, defaultdict, OrderedDict + import numpy as np import pandas as pd -from pandas import (Index, Series, DataFrame, isnull) +from pandas import (Index, Series, DataFrame, isna) from pandas.compat import lrange from pandas import compat -from pandas.util.testing import assert_series_equal +from pandas.util.testing import assert_series_equal, assert_frame_equal import pandas.util.testing as tm from .common import TestData -class TestSeriesApply(TestData, tm.TestCase): - - _multiprocess_can_split_ = True +class TestSeriesApply(TestData): def test_apply(self): with np.errstate(all='ignore'): - assert_series_equal(self.ts.apply(np.sqrt), np.sqrt(self.ts)) + tm.assert_series_equal(self.ts.apply(np.sqrt), np.sqrt(self.ts)) - # elementwise-apply + # element-wise apply import math - assert_series_equal(self.ts.apply(math.exp), np.exp(self.ts)) - - # how to handle Series result, #2316 - result = self.ts.apply(lambda x: Series( - [x, x ** 2], index=['x', 'x^2'])) - expected = DataFrame({'x': self.ts, 'x^2': self.ts ** 2}) - tm.assert_frame_equal(result, expected) + tm.assert_series_equal(self.ts.apply(math.exp), np.exp(self.ts)) # empty series s = Series(dtype=object, name='foo', index=pd.Index([], name='bar')) rs = s.apply(lambda x: x) tm.assert_series_equal(s, rs) + # check all metadata (GH 9322) - self.assertIsNot(s, rs) - self.assertIs(s.index, rs.index) - self.assertEqual(s.dtype, rs.dtype) - self.assertEqual(s.name, rs.name) + assert s is not rs + assert s.index is rs.index + assert s.dtype == rs.dtype + assert s.name == rs.name # index but no data s = Series(index=[1, 2, 3]) @@ -64,20 +61,27 @@ def test_apply_dont_convert_dtype(self): f = lambda x: x if x > 0 else np.nan result = s.apply(f, convert_dtype=False) - self.assertEqual(result.dtype, object) + assert result.dtype == object + + def test_with_string_args(self): + + for arg in ['sum', 'mean', 'min', 'max', 'std']: + result = self.ts.apply(arg) + expected = getattr(self.ts, arg)() + assert result == expected def test_apply_args(self): s = Series(['foo,bar']) result = s.apply(str.split, args=(',', )) - self.assertEqual(result[0], ['foo', 'bar']) - tm.assertIsInstance(result[0], list) + assert result[0] == ['foo', 'bar'] + assert isinstance(result[0], list) def test_apply_box(self): # ufunc will not be boxed. 
Same test cases as the test_map_box vals = [pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02')] s = pd.Series(vals) - self.assertEqual(s.dtype, 'datetime64[ns]') + assert s.dtype == 'datetime64[ns]' # boxed value must be Timestamp instance res = s.apply(lambda x: '{0}_{1}_{2}'.format(x.__class__.__name__, x.day, x.tz)) @@ -87,7 +91,7 @@ def test_apply_box(self): vals = [pd.Timestamp('2011-01-01', tz='US/Eastern'), pd.Timestamp('2011-01-02', tz='US/Eastern')] s = pd.Series(vals) - self.assertEqual(s.dtype, 'datetime64[ns, US/Eastern]') + assert s.dtype == 'datetime64[ns, US/Eastern]' res = s.apply(lambda x: '{0}_{1}_{2}'.format(x.__class__.__name__, x.day, x.tz)) exp = pd.Series(['Timestamp_1_US/Eastern', 'Timestamp_2_US/Eastern']) @@ -96,7 +100,7 @@ def test_apply_box(self): # timedelta vals = [pd.Timedelta('1 days'), pd.Timedelta('2 days')] s = pd.Series(vals) - self.assertEqual(s.dtype, 'timedelta64[ns]') + assert s.dtype == 'timedelta64[ns]' res = s.apply(lambda x: '{0}_{1}'.format(x.__class__.__name__, x.days)) exp = pd.Series(['Timedelta_1', 'Timedelta_2']) tm.assert_series_equal(res, exp) @@ -105,7 +109,7 @@ def test_apply_box(self): vals = [pd.Period('2011-01-01', freq='M'), pd.Period('2011-01-02', freq='M')] s = pd.Series(vals) - self.assertEqual(s.dtype, 'object') + assert s.dtype == 'object' res = s.apply(lambda x: '{0}_{1}'.format(x.__class__.__name__, x.freqstr)) exp = pd.Series(['Period_M', 'Period_M']) @@ -138,11 +142,189 @@ def f(x): exp = pd.Series(['Asia/Tokyo'] * 25, name='XX') tm.assert_series_equal(result, exp) + def test_apply_dict_depr(self): + + tsdf = pd.DataFrame(np.random.randn(10, 3), + columns=['A', 'B', 'C'], + index=pd.date_range('1/1/2000', periods=10)) + with tm.assert_produces_warning(FutureWarning): + tsdf.A.agg({'foo': ['sum', 'mean']}) -class TestSeriesMap(TestData, tm.TestCase): + +class TestSeriesAggregate(TestData): _multiprocess_can_split_ = True + def test_transform(self): + # transforming functions + + with np.errstate(all='ignore'): + + f_sqrt = np.sqrt(self.series) + f_abs = np.abs(self.series) + + # ufunc + result = self.series.transform(np.sqrt) + expected = f_sqrt.copy() + assert_series_equal(result, expected) + + result = self.series.apply(np.sqrt) + assert_series_equal(result, expected) + + # list-like + result = self.series.transform([np.sqrt]) + expected = f_sqrt.to_frame().copy() + expected.columns = ['sqrt'] + assert_frame_equal(result, expected) + + result = self.series.transform([np.sqrt]) + assert_frame_equal(result, expected) + + result = self.series.transform(['sqrt']) + assert_frame_equal(result, expected) + + # multiple items in list + # these are in the order as if we are applying both functions per + # series and then concatting + expected = pd.concat([f_sqrt, f_abs], axis=1) + expected.columns = ['sqrt', 'absolute'] + result = self.series.apply([np.sqrt, np.abs]) + assert_frame_equal(result, expected) + + result = self.series.transform(['sqrt', 'abs']) + expected.columns = ['sqrt', 'abs'] + assert_frame_equal(result, expected) + + # dict, provide renaming + expected = pd.concat([f_sqrt, f_abs], axis=1) + expected.columns = ['foo', 'bar'] + expected = expected.unstack().rename('series') + + result = self.series.apply({'foo': np.sqrt, 'bar': np.abs}) + assert_series_equal(result.reindex_like(expected), expected) + + def test_transform_and_agg_error(self): + # we are trying to transform with an aggregator + def f(): + self.series.transform(['min', 'max']) + pytest.raises(ValueError, f) + + def f(): + with 
np.errstate(all='ignore'): + self.series.agg(['sqrt', 'max']) + pytest.raises(ValueError, f) + + def f(): + with np.errstate(all='ignore'): + self.series.transform(['sqrt', 'max']) + pytest.raises(ValueError, f) + + def f(): + with np.errstate(all='ignore'): + self.series.agg({'foo': np.sqrt, 'bar': 'sum'}) + pytest.raises(ValueError, f) + + def test_demo(self): + # demonstration tests + s = Series(range(6), dtype='int64', name='series') + + result = s.agg(['min', 'max']) + expected = Series([0, 5], index=['min', 'max'], name='series') + tm.assert_series_equal(result, expected) + + result = s.agg({'foo': 'min'}) + expected = Series([0], index=['foo'], name='series') + tm.assert_series_equal(result, expected) + + # nested renaming + with tm.assert_produces_warning(FutureWarning): + result = s.agg({'foo': ['min', 'max']}) + + expected = DataFrame( + {'foo': [0, 5]}, + index=['min', 'max']).unstack().rename('series') + tm.assert_series_equal(result, expected) + + def test_multiple_aggregators_with_dict_api(self): + + s = Series(range(6), dtype='int64', name='series') + # nested renaming + with tm.assert_produces_warning(FutureWarning): + result = s.agg({'foo': ['min', 'max'], 'bar': ['sum', 'mean']}) + + expected = DataFrame( + {'foo': [5.0, np.nan, 0.0, np.nan], + 'bar': [np.nan, 2.5, np.nan, 15.0]}, + columns=['foo', 'bar'], + index=['max', 'mean', + 'min', 'sum']).unstack().rename('series') + tm.assert_series_equal(result.reindex_like(expected), expected) + + def test_agg_apply_evaluate_lambdas_the_same(self): + # test that we are evaluating row-by-row first + # before vectorized evaluation + result = self.series.apply(lambda x: str(x)) + expected = self.series.agg(lambda x: str(x)) + tm.assert_series_equal(result, expected) + + result = self.series.apply(str) + expected = self.series.agg(str) + tm.assert_series_equal(result, expected) + + def test_with_nested_series(self): + # GH 2316 + # .agg with a reducer and a transform, what to do + result = self.ts.apply(lambda x: Series( + [x, x ** 2], index=['x', 'x^2'])) + expected = DataFrame({'x': self.ts, 'x^2': self.ts ** 2}) + tm.assert_frame_equal(result, expected) + + result = self.ts.agg(lambda x: Series( + [x, x ** 2], index=['x', 'x^2'])) + tm.assert_frame_equal(result, expected) + + def test_replicate_describe(self): + # this also tests a result set that is all scalars + expected = self.series.describe() + result = self.series.apply(OrderedDict( + [('count', 'count'), + ('mean', 'mean'), + ('std', 'std'), + ('min', 'min'), + ('25%', lambda x: x.quantile(0.25)), + ('50%', 'median'), + ('75%', lambda x: x.quantile(0.75)), + ('max', 'max')])) + assert_series_equal(result, expected) + + def test_reduce(self): + # reductions with named functions + result = self.series.agg(['sum', 'mean']) + expected = Series([self.series.sum(), + self.series.mean()], + ['sum', 'mean'], + name=self.series.name) + assert_series_equal(result, expected) + + def test_non_callable_aggregates(self): + # test agg using non-callable series attributes + s = Series([1, 2, None]) + + # Calling agg w/ just a string arg same as calling s.arg + result = s.agg('size') + expected = s.size + assert result == expected + + # test when mixed w/ callable reducers + result = s.agg(['size', 'count', 'mean']) + expected = Series(OrderedDict({'size': 3.0, + 'count': 2.0, + 'mean': 1.5})) + assert_series_equal(result[expected.index], expected) + + +class TestSeriesMap(TestData): + def test_map(self): index, data = tm.getMixedTypeDict() @@ -152,17 +334,17 @@ def test_map(self): merged 
= target.map(source) for k, v in compat.iteritems(merged): - self.assertEqual(v, source[target[k]]) + assert v == source[target[k]] # input could be a dict merged = target.map(source.to_dict()) for k, v in compat.iteritems(merged): - self.assertEqual(v, source[target[k]]) + assert v == source[target[k]] # function result = self.ts.map(lambda x: x * 2) - self.assert_series_equal(result, self.ts * 2) + tm.assert_series_equal(result, self.ts * 2) # GH 10324 a = Series([1, 2, 3, 4]) @@ -170,9 +352,9 @@ def test_map(self): c = Series(["even", "odd", "even", "odd"]) exp = Series(["odd", "even", "odd", np.nan], dtype="category") - self.assert_series_equal(a.map(b), exp) + tm.assert_series_equal(a.map(b), exp) exp = Series(["odd", "even", "odd", np.nan]) - self.assert_series_equal(a.map(c), exp) + tm.assert_series_equal(a.map(c), exp) a = Series(['a', 'b', 'c', 'd']) b = Series([1, 2, 3, 4], @@ -180,9 +362,9 @@ def test_map(self): c = Series([1, 2, 3, 4], index=Index(['b', 'c', 'd', 'e'])) exp = Series([np.nan, 1, 2, 3]) - self.assert_series_equal(a.map(b), exp) + tm.assert_series_equal(a.map(b), exp) exp = Series([np.nan, 1, 2, 3]) - self.assert_series_equal(a.map(c), exp) + tm.assert_series_equal(a.map(c), exp) a = Series(['a', 'b', 'c', 'd']) b = Series(['B', 'C', 'D', 'E'], dtype='category', @@ -191,9 +373,9 @@ def test_map(self): exp = Series(pd.Categorical([np.nan, 'B', 'C', 'D'], categories=['B', 'C', 'D', 'E'])) - self.assert_series_equal(a.map(b), exp) + tm.assert_series_equal(a.map(b), exp) exp = Series([np.nan, 'B', 'C', 'D']) - self.assert_series_equal(a.map(c), exp) + tm.assert_series_equal(a.map(c), exp) def test_map_compat(self): # related GH 8024 @@ -206,25 +388,25 @@ def test_map_int(self): left = Series({'a': 1., 'b': 2., 'c': 3., 'd': 4}) right = Series({1: 11, 2: 22, 3: 33}) - self.assertEqual(left.dtype, np.float_) - self.assertTrue(issubclass(right.dtype.type, np.integer)) + assert left.dtype == np.float_ + assert issubclass(right.dtype.type, np.integer) merged = left.map(right) - self.assertEqual(merged.dtype, np.float_) - self.assertTrue(isnull(merged['d'])) - self.assertTrue(not isnull(merged['c'])) + assert merged.dtype == np.float_ + assert isna(merged['d']) + assert not isna(merged['c']) def test_map_type_inference(self): s = Series(lrange(3)) s2 = s.map(lambda x: np.where(x == 0, 0, 1)) - self.assertTrue(issubclass(s2.dtype.type, np.integer)) + assert issubclass(s2.dtype.type, np.integer) def test_map_decimal(self): from decimal import Decimal result = self.series.map(lambda x: Decimal(str(x))) - self.assertEqual(result.dtype, np.object_) - tm.assertIsInstance(result[0], Decimal) + assert result.dtype == np.object_ + assert isinstance(result[0], Decimal) def test_map_na_exclusion(self): s = Series([1.5, np.nan, 3, np.nan, 5]) @@ -248,10 +430,50 @@ def test_map_dict_with_tuple_keys(self): tm.assert_series_equal(df['labels'], df['expected_labels'], check_names=False) + def test_map_counter(self): + s = Series(['a', 'b', 'c'], index=[1, 2, 3]) + counter = Counter() + counter['b'] = 5 + counter['c'] += 1 + result = s.map(counter) + expected = Series([0, 5, 1], index=[1, 2, 3]) + assert_series_equal(result, expected) + + def test_map_defaultdict(self): + s = Series([1, 2, 3], index=['a', 'b', 'c']) + default_dict = defaultdict(lambda: 'blank') + default_dict[1] = 'stuff' + result = s.map(default_dict) + expected = Series(['stuff', 'blank', 'blank'], index=['a', 'b', 'c']) + assert_series_equal(result, expected) + + def test_map_dict_subclass_with_missing(self): + """ + Test 
Series.map with a dictionary subclass that defines __missing__, + i.e. sets a default value (GH #15999). + """ + class DictWithMissing(dict): + def __missing__(self, key): + return 'missing' + s = Series([1, 2, 3]) + dictionary = DictWithMissing({3: 'three'}) + result = s.map(dictionary) + expected = Series(['missing', 'missing', 'three']) + assert_series_equal(result, expected) + + def test_map_dict_subclass_without_missing(self): + class DictWithoutMissing(dict): + pass + s = Series([1, 2, 3]) + dictionary = DictWithoutMissing({3: 'three'}) + result = s.map(dictionary) + expected = Series([np.nan, np.nan, 'three']) + assert_series_equal(result, expected) + def test_map_box(self): vals = [pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02')] s = pd.Series(vals) - self.assertEqual(s.dtype, 'datetime64[ns]') + assert s.dtype == 'datetime64[ns]' # boxed value must be Timestamp instance res = s.map(lambda x: '{0}_{1}_{2}'.format(x.__class__.__name__, x.day, x.tz)) @@ -261,7 +483,7 @@ def test_map_box(self): vals = [pd.Timestamp('2011-01-01', tz='US/Eastern'), pd.Timestamp('2011-01-02', tz='US/Eastern')] s = pd.Series(vals) - self.assertEqual(s.dtype, 'datetime64[ns, US/Eastern]') + assert s.dtype == 'datetime64[ns, US/Eastern]' res = s.map(lambda x: '{0}_{1}_{2}'.format(x.__class__.__name__, x.day, x.tz)) exp = pd.Series(['Timestamp_1_US/Eastern', 'Timestamp_2_US/Eastern']) @@ -270,7 +492,7 @@ def test_map_box(self): # timedelta vals = [pd.Timedelta('1 days'), pd.Timedelta('2 days')] s = pd.Series(vals) - self.assertEqual(s.dtype, 'timedelta64[ns]') + assert s.dtype == 'timedelta64[ns]' res = s.map(lambda x: '{0}_{1}'.format(x.__class__.__name__, x.days)) exp = pd.Series(['Timedelta_1', 'Timedelta_2']) tm.assert_series_equal(res, exp) @@ -279,7 +501,7 @@ def test_map_box(self): vals = [pd.Period('2011-01-01', freq='M'), pd.Period('2011-01-02', freq='M')] s = pd.Series(vals) - self.assertEqual(s.dtype, 'object') + assert s.dtype == 'object' res = s.map(lambda x: '{0}_{1}'.format(x.__class__.__name__, x.freqstr)) exp = pd.Series(['Period_M', 'Period_M']) @@ -300,9 +522,9 @@ def test_map_categorical(self): result = s.map(lambda x: 'A') exp = pd.Series(['A'] * 7, name='XX', index=list('abcdefg')) tm.assert_series_equal(result, exp) - self.assertEqual(result.dtype, np.object) + assert result.dtype == np.object - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): s.map(lambda x: x, na_action='ignore') def test_map_datetimetz(self): @@ -323,7 +545,7 @@ def test_map_datetimetz(self): exp = pd.Series(list(range(24)) + [0], name='XX', dtype=np.int64) tm.assert_series_equal(result, exp) - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): s.map(lambda x: x, na_action='ignore') # not vectorized diff --git a/pandas/tests/series/test_asof.py b/pandas/tests/series/test_asof.py index e2092feab9004..3104d85601434 100644 --- a/pandas/tests/series/test_asof.py +++ b/pandas/tests/series/test_asof.py @@ -1,19 +1,17 @@ # coding=utf-8 -import nose +import pytest import numpy as np - -from pandas import (offsets, Series, notnull, - isnull, date_range, Timestamp) +from pandas import (offsets, Series, notna, + isna, date_range, Timestamp) import pandas.util.testing as tm from .common import TestData -class TestSeriesAsof(TestData, tm.TestCase): - _multiprocess_can_split_ = True +class TestSeriesAsof(TestData): def test_basic(self): @@ -25,21 +23,21 @@ def test_basic(self): dates = date_range('1/1/1990', periods=N * 3, freq='25s') result = 
ts.asof(dates) - self.assertTrue(notnull(result).all()) + assert notna(result).all() lb = ts.index[14] ub = ts.index[30] result = ts.asof(list(dates)) - self.assertTrue(notnull(result).all()) + assert notna(result).all() lb = ts.index[14] ub = ts.index[30] mask = (result.index >= lb) & (result.index < ub) rs = result[mask] - self.assertTrue((rs == ts[lb]).all()) + assert (rs == ts[lb]).all() val = result[result.index[result.index >= ub][0]] - self.assertEqual(ts[ub], val) + assert ts[ub] == val def test_scalar(self): @@ -52,20 +50,20 @@ def test_scalar(self): val1 = ts.asof(ts.index[7]) val2 = ts.asof(ts.index[19]) - self.assertEqual(val1, ts[4]) - self.assertEqual(val2, ts[14]) + assert val1 == ts[4] + assert val2 == ts[14] # accepts strings val1 = ts.asof(str(ts.index[7])) - self.assertEqual(val1, ts[4]) + assert val1 == ts[4] # in there result = ts.asof(ts.index[3]) - self.assertEqual(result, ts[3]) + assert result == ts[3] # no as of value d = ts.index[0] - offsets.BDay() - self.assertTrue(np.isnan(ts.asof(d))) + assert np.isnan(ts.asof(d)) def test_with_nan(self): # basic asof test @@ -100,19 +98,19 @@ def test_periodindex(self): dates = date_range('1/1/1990', periods=N * 3, freq='37min') result = ts.asof(dates) - self.assertTrue(notnull(result).all()) + assert notna(result).all() lb = ts.index[14] ub = ts.index[30] result = ts.asof(list(dates)) - self.assertTrue(notnull(result).all()) + assert notna(result).all() lb = ts.index[14] ub = ts.index[30] pix = PeriodIndex(result.index.values, freq='H') mask = (pix >= lb) & (pix < ub) rs = result[mask] - self.assertTrue((rs == ts[lb]).all()) + assert (rs == ts[lb]).all() ts[5:10] = np.nan ts[15:20] = np.nan @@ -120,19 +118,19 @@ def test_periodindex(self): val1 = ts.asof(ts.index[7]) val2 = ts.asof(ts.index[19]) - self.assertEqual(val1, ts[4]) - self.assertEqual(val2, ts[14]) + assert val1 == ts[4] + assert val2 == ts[14] # accepts strings val1 = ts.asof(str(ts.index[7])) - self.assertEqual(val1, ts[4]) + assert val1 == ts[4] # in there - self.assertEqual(ts.asof(ts.index[3]), ts[3]) + assert ts.asof(ts.index[3]) == ts[3] # no as of value d = ts.index[0].to_timestamp() - offsets.BDay() - self.assertTrue(isnull(ts.asof(d))) + assert isna(ts.asof(d)) def test_errors(self): @@ -142,17 +140,39 @@ def test_errors(self): Timestamp('20130102')]) # non-monotonic - self.assertFalse(s.index.is_monotonic) - with self.assertRaises(ValueError): + assert not s.index.is_monotonic + with pytest.raises(ValueError): s.asof(s.index[0]) # subset with Series N = 10 rng = date_range('1/1/1990', periods=N, freq='53s') s = Series(np.random.randn(N), index=rng) - with self.assertRaises(ValueError): + with pytest.raises(ValueError): s.asof(s.index[0], subset='foo') -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + def test_all_nans(self): + # GH 15713 + # series is all nans + result = Series([np.nan]).asof([0]) + expected = Series([np.nan]) + tm.assert_series_equal(result, expected) + + # testing non-default indexes + N = 50 + rng = date_range('1/1/1990', periods=N, freq='53s') + + dates = date_range('1/1/1990', periods=N * 3, freq='25s') + result = Series(np.nan, index=rng).asof(dates) + expected = Series(np.nan, index=dates) + tm.assert_series_equal(result, expected) + + # testing scalar input + date = date_range('1/1/1990', periods=N * 3, freq='25s')[0] + result = Series(np.nan, index=rng).asof(date) + assert isna(result) + + # test name is propagated + result = Series(np.nan, index=[1, 2, 3, 4], 
+                        name='test').asof([4, 5])
+        expected = Series(np.nan, index=[4, 5], name='test')
+        tm.assert_series_equal(result, expected)
diff --git a/pandas/tests/series/test_combine_concat.py b/pandas/tests/series/test_combine_concat.py
index 23261c2ef79e2..71ac00975af03 100644
--- a/pandas/tests/series/test_combine_concat.py
+++ b/pandas/tests/series/test_combine_concat.py
@@ -1,13 +1,15 @@
 # coding=utf-8
 # pylint: disable-msg=E1101,W0612

+import pytest
+
 from datetime import datetime

 from numpy import nan
 import numpy as np
 import pandas as pd

-from pandas import Series, DataFrame
+from pandas import Series, DataFrame, date_range, DatetimeIndex
 from pandas import compat

 from pandas.util.testing import assert_series_equal
@@ -16,22 +18,20 @@
 from .common import TestData


-class TestSeriesCombine(TestData, tm.TestCase):
-
-    _multiprocess_can_split_ = True
+class TestSeriesCombine(TestData):

     def test_append(self):
         appendedSeries = self.series.append(self.objSeries)
         for idx, value in compat.iteritems(appendedSeries):
             if idx in self.series.index:
-                self.assertEqual(value, self.series[idx])
+                assert value == self.series[idx]
             elif idx in self.objSeries.index:
-                self.assertEqual(value, self.objSeries[idx])
+                assert value == self.objSeries[idx]
             else:
                 self.fail("orphaned index!")

-        self.assertRaises(ValueError, self.ts.append, self.ts,
-                          verify_integrity=True)
+        pytest.raises(ValueError, self.ts.append, self.ts,
+                      verify_integrity=True)

     def test_append_many(self):
         pieces = [self.ts[:5], self.ts[5:10], self.ts[10:]]
@@ -55,9 +55,9 @@ def test_append_duplicates(self):
                                exp, check_index_type=True)

         msg = 'Indexes have overlapping values:'
-        with tm.assertRaisesRegexp(ValueError, msg):
+        with tm.assert_raises_regex(ValueError, msg):
             s1.append(s2, verify_integrity=True)
-        with tm.assertRaisesRegexp(ValueError, msg):
+        with tm.assert_raises_regex(ValueError, msg):
             pd.concat([s1, s2], verify_integrity=True)

     def test_combine_first(self):
@@ -70,14 +70,14 @@ def test_combine_first(self):

         # nothing used from the input
         combined = series.combine_first(series_copy)
-        self.assert_series_equal(combined, series)
+        tm.assert_series_equal(combined, series)

         # Holes filled from input
         combined = series_copy.combine_first(series)
-        self.assertTrue(np.isfinite(combined).all())
+        assert np.isfinite(combined).all()

-        self.assert_series_equal(combined[::2], series[::2])
-        self.assert_series_equal(combined[1::2], series_copy[1::2])
+        tm.assert_series_equal(combined[::2], series[::2])
+        tm.assert_series_equal(combined[1::2], series_copy[1::2])

         # mixed types
         index = tm.makeStringIndex(20)
@@ -117,9 +117,9 @@ def test_concat_empty_series_dtypes_roundtrips(self):
                   'M8[ns]'])

         for dtype in dtypes:
-            self.assertEqual(pd.concat([Series(dtype=dtype)]).dtype, dtype)
-            self.assertEqual(pd.concat([Series(dtype=dtype),
-                                        Series(dtype=dtype)]).dtype, dtype)
+            assert pd.concat([Series(dtype=dtype)]).dtype == dtype
+            assert pd.concat([Series(dtype=dtype),
+                              Series(dtype=dtype)]).dtype == dtype

         def int_result_type(dtype, dtype2):
             typs = set([dtype.kind, dtype2.kind])
@@ -155,58 +155,55 @@ def get_result_type(dtype, dtype2):
                 expected = get_result_type(dtype, dtype2)
                 result = pd.concat([Series(dtype=dtype), Series(dtype=dtype2)
                                     ]).dtype
-                self.assertEqual(result.kind, expected)
+                assert result.kind == expected

     def test_concat_empty_series_dtypes(self):
-        # bools
-        self.assertEqual(pd.concat([Series(dtype=np.bool_),
-                                    Series(dtype=np.int32)]).dtype, np.int32)
-        self.assertEqual(pd.concat([Series(dtype=np.bool_),
-                                    Series(dtype=np.float32)]).dtype,
-                         np.object_)
-
-        # datetimelike
-        self.assertEqual(pd.concat([Series(dtype='m8[ns]'),
-                                    Series(dtype=np.bool)]).dtype, np.object_)
-        self.assertEqual(pd.concat([Series(dtype='m8[ns]'),
-                                    Series(dtype=np.int64)]).dtype, np.object_)
-        self.assertEqual(pd.concat([Series(dtype='M8[ns]'),
-                                    Series(dtype=np.bool)]).dtype, np.object_)
-        self.assertEqual(pd.concat([Series(dtype='M8[ns]'),
-                                    Series(dtype=np.int64)]).dtype, np.object_)
-        self.assertEqual(pd.concat([Series(dtype='M8[ns]'),
-                                    Series(dtype=np.bool_),
-                                    Series(dtype=np.int64)]).dtype, np.object_)
+        # booleans
+        assert pd.concat([Series(dtype=np.bool_),
+                          Series(dtype=np.int32)]).dtype == np.int32
+        assert pd.concat([Series(dtype=np.bool_),
+                          Series(dtype=np.float32)]).dtype == np.object_
+
+        # datetime-like
+        assert pd.concat([Series(dtype='m8[ns]'),
+                          Series(dtype=np.bool)]).dtype == np.object_
+        assert pd.concat([Series(dtype='m8[ns]'),
+                          Series(dtype=np.int64)]).dtype == np.object_
+        assert pd.concat([Series(dtype='M8[ns]'),
+                          Series(dtype=np.bool)]).dtype == np.object_
+        assert pd.concat([Series(dtype='M8[ns]'),
+                          Series(dtype=np.int64)]).dtype == np.object_
+        assert pd.concat([Series(dtype='M8[ns]'),
+                          Series(dtype=np.bool_),
+                          Series(dtype=np.int64)]).dtype == np.object_

         # categorical
-        self.assertEqual(pd.concat([Series(dtype='category'),
-                                    Series(dtype='category')]).dtype,
-                         'category')
-        self.assertEqual(pd.concat([Series(dtype='category'),
-                                    Series(dtype='float64')]).dtype,
-                         'float64')
-        self.assertEqual(pd.concat([Series(dtype='category'),
-                                    Series(dtype='object')]).dtype, 'object')
+        assert pd.concat([Series(dtype='category'),
+                          Series(dtype='category')]).dtype == 'category'
+        assert pd.concat([Series(dtype='category'),
+                          Series(dtype='float64')]).dtype == 'float64'
+        assert pd.concat([Series(dtype='category'),
+                          Series(dtype='object')]).dtype == 'object'

         # sparse
         result = pd.concat([Series(dtype='float64').to_sparse(), Series(
             dtype='float64').to_sparse()])
-        self.assertEqual(result.dtype, np.float64)
-        self.assertEqual(result.ftype, 'float64:sparse')
+        assert result.dtype == np.float64
+        assert result.ftype == 'float64:sparse'

         result = pd.concat([Series(dtype='float64').to_sparse(), Series(
             dtype='float64')])
-        self.assertEqual(result.dtype, np.float64)
-        self.assertEqual(result.ftype, 'float64:sparse')
+        assert result.dtype == np.float64
+        assert result.ftype == 'float64:sparse'

         result = pd.concat([Series(dtype='float64').to_sparse(), Series(
             dtype='object')])
-        self.assertEqual(result.dtype, np.object_)
-        self.assertEqual(result.ftype, 'object:dense')
+        assert result.dtype == np.object_
+        assert result.ftype == 'object:dense'

     def test_combine_first_dt64(self):
-        from pandas.tseries.tools import to_datetime
+        from pandas.core.tools.datetimes import to_datetime
         s0 = to_datetime(Series(["2010", np.NaN]))
         s1 = to_datetime(Series([np.NaN, "2011"]))
         rs = s0.combine_first(s1)
@@ -218,3 +215,97 @@ def test_combine_first_dt64(self):
         rs = s0.combine_first(s1)
         xp = Series([datetime(2010, 1, 1), '2011'])
         assert_series_equal(rs, xp)
+
+
+class TestTimeseries(object):
+
+    def test_append_concat(self):
+        rng = date_range('5/8/2012 1:45', periods=10, freq='5T')
+        ts = Series(np.random.randn(len(rng)), rng)
+        df = DataFrame(np.random.randn(len(rng), 4), index=rng)
+
+        result = ts.append(ts)
+        result_df = df.append(df)
+        ex_index = DatetimeIndex(np.tile(rng.values, 2))
+        tm.assert_index_equal(result.index, ex_index)
+        tm.assert_index_equal(result_df.index, ex_index)
+
+        appended = rng.append(rng)
+        tm.assert_index_equal(appended, ex_index)
+
+        appended = rng.append([rng, rng])
+        ex_index = DatetimeIndex(np.tile(rng.values, 3))
+        tm.assert_index_equal(appended, ex_index)
+
+        # different index names
+        rng1 = rng.copy()
+        rng2 = rng.copy()
+        rng1.name = 'foo'
+        rng2.name = 'bar'
+        assert rng1.append(rng1).name == 'foo'
+        assert rng1.append(rng2).name is None
+
+    def test_append_concat_tz(self):
+        # see gh-2938
+        rng = date_range('5/8/2012 1:45', periods=10, freq='5T',
+                         tz='US/Eastern')
+        rng2 = date_range('5/8/2012 2:35', periods=10, freq='5T',
+                          tz='US/Eastern')
+        rng3 = date_range('5/8/2012 1:45', periods=20, freq='5T',
+                          tz='US/Eastern')
+        ts = Series(np.random.randn(len(rng)), rng)
+        df = DataFrame(np.random.randn(len(rng), 4), index=rng)
+        ts2 = Series(np.random.randn(len(rng2)), rng2)
+        df2 = DataFrame(np.random.randn(len(rng2), 4), index=rng2)
+
+        result = ts.append(ts2)
+        result_df = df.append(df2)
+        tm.assert_index_equal(result.index, rng3)
+        tm.assert_index_equal(result_df.index, rng3)
+
+        appended = rng.append(rng2)
+        tm.assert_index_equal(appended, rng3)
+
+    def test_append_concat_tz_explicit_pytz(self):
+        # see gh-2938
+        from pytz import timezone as timezone
+
+        rng = date_range('5/8/2012 1:45', periods=10, freq='5T',
+                         tz=timezone('US/Eastern'))
+        rng2 = date_range('5/8/2012 2:35', periods=10, freq='5T',
+                          tz=timezone('US/Eastern'))
+        rng3 = date_range('5/8/2012 1:45', periods=20, freq='5T',
+                          tz=timezone('US/Eastern'))
+        ts = Series(np.random.randn(len(rng)), rng)
+        df = DataFrame(np.random.randn(len(rng), 4), index=rng)
+        ts2 = Series(np.random.randn(len(rng2)), rng2)
+        df2 = DataFrame(np.random.randn(len(rng2), 4), index=rng2)
+
+        result = ts.append(ts2)
+        result_df = df.append(df2)
+        tm.assert_index_equal(result.index, rng3)
+        tm.assert_index_equal(result_df.index, rng3)
+
+        appended = rng.append(rng2)
+        tm.assert_index_equal(appended, rng3)
+
+    def test_append_concat_tz_dateutil(self):
+        # see gh-2938
+        rng = date_range('5/8/2012 1:45', periods=10, freq='5T',
+                         tz='dateutil/US/Eastern')
+        rng2 = date_range('5/8/2012 2:35', periods=10, freq='5T',
+                          tz='dateutil/US/Eastern')
+        rng3 = date_range('5/8/2012 1:45', periods=20, freq='5T',
+                          tz='dateutil/US/Eastern')
+        ts = Series(np.random.randn(len(rng)), rng)
+        df = DataFrame(np.random.randn(len(rng), 4), index=rng)
+        ts2 = Series(np.random.randn(len(rng2)), rng2)
+        df2 = DataFrame(np.random.randn(len(rng2), 4), index=rng2)
+
+        result = ts.append(ts2)
+        result_df = df.append(df2)
+        tm.assert_index_equal(result.index, rng3)
+        tm.assert_index_equal(result_df.index, rng3)
+
+        appended = rng.append(rng2)
+        tm.assert_index_equal(appended, rng3)
diff --git a/pandas/tests/series/test_constructors.py b/pandas/tests/series/test_constructors.py
index 05818b013ac52..3b95c2803dd9e 100644
--- a/pandas/tests/series/test_constructors.py
+++ b/pandas/tests/series/test_constructors.py
@@ -1,6 +1,8 @@
 # coding=utf-8
 # pylint: disable-msg=E1101,W0612

+import pytest
+
 from datetime import datetime, timedelta

 from numpy import nan
@@ -8,80 +10,73 @@
 import numpy.ma as ma

 import pandas as pd
-from pandas.types.common import is_categorical_dtype, is_datetime64tz_dtype
-from pandas import Index, Series, isnull, date_range, period_range
-from pandas.core.index import MultiIndex
-from pandas.tseries.index import Timestamp, DatetimeIndex
+from pandas.core.dtypes.common import (
+    is_categorical_dtype,
+    is_datetime64tz_dtype)
+from pandas import (Index, Series, isna, date_range,
+                    NaT, period_range, MultiIndex, IntervalIndex)
+from pandas.core.indexes.datetimes import Timestamp, DatetimeIndex

-import pandas.lib as lib
+from pandas._libs import lib
+from pandas._libs.tslib import iNaT

-from pandas.compat import lrange, range, zip, OrderedDict, long
-from pandas import compat
+from pandas.compat import lrange, range, zip, long
 from pandas.util.testing import assert_series_equal
 import pandas.util.testing as tm

 from .common import TestData


-class TestSeriesConstructors(TestData, tm.TestCase):
+class TestSeriesConstructors(TestData):

-    _multiprocess_can_split_ = True
+    def test_invalid_dtype(self):
+        # GH15520
+        msg = 'not understood'
+        invalid_list = [pd.Timestamp, 'pd.Timestamp', list]
+        for dtype in invalid_list:
+            with tm.assert_raises_regex(TypeError, msg):
+                Series([], name='time', dtype=dtype)

     def test_scalar_conversion(self):

         # Pass in scalar is disabled
         scalar = Series(0.5)
-        self.assertNotIsInstance(scalar, float)
-
-        # coercion
-        self.assertEqual(float(Series([1.])), 1.0)
-        self.assertEqual(int(Series([1.])), 1)
-        self.assertEqual(long(Series([1.])), 1)
+        assert not isinstance(scalar, float)

-    def test_TimeSeries_deprecation(self):
-
-        # deprecation TimeSeries, #10890
-        with tm.assert_produces_warning(FutureWarning):
-            pd.TimeSeries(1, index=date_range('20130101', periods=3))
+        # Coercion
+        assert float(Series([1.])) == 1.0
+        assert int(Series([1.])) == 1
+        assert long(Series([1.])) == 1

     def test_constructor(self):
-        # Recognize TimeSeries
-        with tm.assert_produces_warning(FutureWarning):
-            self.assertTrue(self.ts.is_time_series)
-        self.assertTrue(self.ts.index.is_all_dates)
+        assert self.ts.index.is_all_dates

         # Pass in Series
         derived = Series(self.ts)
-        with tm.assert_produces_warning(FutureWarning):
-            self.assertTrue(derived.is_time_series)
-        self.assertTrue(derived.index.is_all_dates)
+        assert derived.index.is_all_dates

-        self.assertTrue(tm.equalContents(derived.index, self.ts.index))
+        assert tm.equalContents(derived.index, self.ts.index)
         # Ensure new index is not created
-        self.assertEqual(id(self.ts.index), id(derived.index))
+        assert id(self.ts.index) == id(derived.index)

         # Mixed type Series
         mixed = Series(['hello', np.NaN], index=[0, 1])
-        self.assertEqual(mixed.dtype, np.object_)
-        self.assertIs(mixed[1], np.NaN)
+        assert mixed.dtype == np.object_
+        assert mixed[1] is np.NaN

-        with tm.assert_produces_warning(FutureWarning):
-            self.assertFalse(self.empty.is_time_series)
-        self.assertFalse(self.empty.index.is_all_dates)
-        with tm.assert_produces_warning(FutureWarning):
-            self.assertFalse(Series({}).is_time_series)
-        self.assertFalse(Series({}).index.is_all_dates)
-        self.assertRaises(Exception, Series, np.random.randn(3, 3),
-                          index=np.arange(3))
+        assert not self.empty.index.is_all_dates
+        assert not Series({}).index.is_all_dates
+        pytest.raises(Exception, Series, np.random.randn(3, 3),
+                      index=np.arange(3))

         mixed.name = 'Series'
         rs = Series(mixed).name
         xp = 'Series'
-        self.assertEqual(rs, xp)
+        assert rs == xp

         # raise on MultiIndex GH4187
         m = MultiIndex.from_arrays([[1, 2], [3, 4]])
-        self.assertRaises(NotImplementedError, Series, m)
+        pytest.raises(NotImplementedError, Series, m)

     def test_constructor_empty(self):
         empty = Series()
@@ -152,15 +147,15 @@ def test_constructor_categorical(self):
         tm.assert_categorical_equal(res.values, cat)

         # GH12574
-        self.assertRaises(
+        pytest.raises(
             ValueError, lambda: Series(pd.Categorical([1, 2, 3]),
                                        dtype='int64'))
         cat = Series(pd.Categorical([1, 2, 3]), dtype='category')
-        self.assertTrue(is_categorical_dtype(cat))
-        self.assertTrue(is_categorical_dtype(cat.dtype))
+        assert is_categorical_dtype(cat)
+        assert is_categorical_dtype(cat.dtype)
         s = Series([1, 2, 3], dtype='category')
-        self.assertTrue(is_categorical_dtype(s))
-        self.assertTrue(is_categorical_dtype(s.dtype))
+        assert is_categorical_dtype(s)
+        assert is_categorical_dtype(s.dtype)

     def test_constructor_maskedarray(self):
         data = ma.masked_all((3, ), dtype=float)
@@ -214,17 +209,16 @@ def test_constructor_maskedarray(self):
         expected = Series([True, True, False], index=index, dtype=bool)
         assert_series_equal(result, expected)

-        from pandas import tslib
         data = ma.masked_all((3, ), dtype='M8[ns]')
         result = Series(data)
-        expected = Series([tslib.iNaT, tslib.iNaT, tslib.iNaT], dtype='M8[ns]')
+        expected = Series([iNaT, iNaT, iNaT], dtype='M8[ns]')
         assert_series_equal(result, expected)

         data[0] = datetime(2001, 1, 1)
         data[2] = datetime(2001, 1, 3)
         index = ['a', 'b', 'c']
         result = Series(data, index=index)
-        expected = Series([datetime(2001, 1, 1), tslib.iNaT,
+        expected = Series([datetime(2001, 1, 1), iNaT,
                            datetime(2001, 1, 3)], index=index, dtype='M8[ns]')
         assert_series_equal(result, expected)

@@ -234,6 +228,13 @@ def test_constructor_maskedarray(self):
                            datetime(2001, 1, 3)], index=index, dtype='M8[ns]')
         assert_series_equal(result, expected)

+    def test_series_ctor_plus_datetimeindex(self):
+        rng = date_range('20090415', '20090519', freq='B')
+        data = dict((k, 1) for k in rng)
+
+        result = Series(data, index=rng)
+        assert result.index is rng
+
     def test_constructor_default_index(self):
         s = Series([0, 1, 2])
         tm.assert_index_equal(s.index, pd.Index(np.arange(3)))
@@ -242,14 +243,14 @@ def test_constructor_corner(self):
         df = tm.makeTimeDataFrame()
         objs = [df, df]
         s = Series(objs, index=[0, 1])
-        tm.assertIsInstance(s, Series)
+        assert isinstance(s, Series)

     def test_constructor_sanitize(self):
         s = Series(np.array([1., 1., 8.]), dtype='i8')
-        self.assertEqual(s.dtype, np.dtype('i8'))
+        assert s.dtype == np.dtype('i8')

         s = Series(np.array([1., 1., np.nan]), copy=True, dtype='i8')
-        self.assertEqual(s.dtype, np.dtype('f8'))
+        assert s.dtype == np.dtype('f8')

     def test_constructor_copy(self):
         # GH15125
@@ -263,16 +264,16 @@ def test_constructor_copy(self):

             # changes to the origin of the copy do not affect the copy
             x[0] = 2.
-            self.assertFalse(x.equals(y))
-            self.assertEqual(x[0], 2.)
-            self.assertEqual(y[0], 1.)
+            assert not x.equals(y)
+            assert x[0] == 2.
+            assert y[0] == 1.
     def test_constructor_pass_none(self):
         s = Series(None, index=lrange(5))
-        self.assertEqual(s.dtype, np.float64)
+        assert s.dtype == np.float64

         s = Series(None, index=lrange(5), dtype=object)
-        self.assertEqual(s.dtype, np.object_)
+        assert s.dtype == np.object_

         # GH 7431
         # inference on the index
@@ -283,12 +284,12 @@ def test_constructor_pass_none(self):

     def test_constructor_pass_nan_nat(self):
         # GH 13467
         exp = Series([np.nan, np.nan], dtype=np.float64)
-        self.assertEqual(exp.dtype, np.float64)
+        assert exp.dtype == np.float64
         tm.assert_series_equal(Series([np.nan, np.nan]), exp)
         tm.assert_series_equal(Series(np.array([np.nan, np.nan])), exp)

         exp = Series([pd.NaT, pd.NaT])
-        self.assertEqual(exp.dtype, 'datetime64[ns]')
+        assert exp.dtype == 'datetime64[ns]'
         tm.assert_series_equal(Series([pd.NaT, pd.NaT]), exp)
         tm.assert_series_equal(Series(np.array([pd.NaT, pd.NaT])), exp)

@@ -299,7 +300,7 @@ def test_constructor_pass_nan_nat(self):
         tm.assert_series_equal(Series(np.array([np.nan, pd.NaT])), exp)

     def test_constructor_cast(self):
-        self.assertRaises(ValueError, Series, ['a', 'b', 'c'], dtype=float)
+        pytest.raises(ValueError, Series, ['a', 'b', 'c'], dtype=float)

     def test_constructor_dtype_nocast(self):
         # 1572
@@ -308,7 +309,7 @@ def test_constructor_dtype_nocast(self):
         s2 = Series(s, dtype=np.int64)

         s2[1] = 5
-        self.assertEqual(s[1], 5)
+        assert s[1] == 5

     def test_constructor_datelike_coercion(self):
@@ -316,9 +317,9 @@ def test_constructor_datelike_coercion(self):

         # incorrectly inferring on datetimelike-looking when object dtype is
         # specified
         s = Series([Timestamp('20130101'), 'NOV'], dtype=object)
-        self.assertEqual(s.iloc[0], Timestamp('20130101'))
-        self.assertEqual(s.iloc[1], 'NOV')
-        self.assertTrue(s.dtype == object)
+        assert s.iloc[0] == Timestamp('20130101')
+        assert s.iloc[1] == 'NOV'
+        assert s.dtype == object

         # the dtype was being reset on the slicing and re-inferred to datetime
         # even though the blocks are mixed
@@ -332,31 +333,38 @@ def test_constructor_datelike_coercion(self):
                         'mat': mat}, index=belly)

         result = df.loc['3T19']
-        self.assertTrue(result.dtype == object)
+        assert result.dtype == object
         result = df.loc['216']
-        self.assertTrue(result.dtype == object)
+        assert result.dtype == object
+
+    def test_constructor_datetimes_with_nulls(self):
+        # gh-15869
+        for arr in [np.array([None, None, None, None,
+                              datetime.now(), None]),
+                    np.array([None, None, datetime.now(), None])]:
+            result = Series(arr)
+            assert result.dtype == 'M8[ns]'

     def test_constructor_dtype_datetime64(self):
-        import pandas.tslib as tslib

-        s = Series(tslib.iNaT, dtype='M8[ns]', index=lrange(5))
-        self.assertTrue(isnull(s).all())
+        s = Series(iNaT, dtype='M8[ns]', index=lrange(5))
+        assert isna(s).all()

         # in theory this should be all nulls, but since
         # we are not specifying a dtype, it is ambiguous
-        s = Series(tslib.iNaT, index=lrange(5))
-        self.assertFalse(isnull(s).all())
+        s = Series(iNaT, index=lrange(5))
+        assert not isna(s).all()

         s = Series(nan, dtype='M8[ns]', index=lrange(5))
-        self.assertTrue(isnull(s).all())
+        assert isna(s).all()

-        s = Series([datetime(2001, 1, 2, 0, 0), tslib.iNaT], dtype='M8[ns]')
-        self.assertTrue(isnull(s[1]))
-        self.assertEqual(s.dtype, 'M8[ns]')
+        s = Series([datetime(2001, 1, 2, 0, 0), iNaT], dtype='M8[ns]')
+        assert isna(s[1])
+        assert s.dtype == 'M8[ns]'

         s = Series([datetime(2001, 1, 2, 0, 0), nan], dtype='M8[ns]')
-        self.assertTrue(isnull(s[1]))
-        self.assertEqual(s.dtype, 'M8[ns]')
+        assert isna(s[1])
+        assert s.dtype == 'M8[ns]'

         # GH3416
         dates = [
@@ -366,32 +374,32 @@ def test_constructor_dtype_datetime64(self):
         ]

         s = Series(dates)
-        self.assertEqual(s.dtype, 'M8[ns]')
+        assert s.dtype == 'M8[ns]'

         s.iloc[0] = np.nan
-        self.assertEqual(s.dtype, 'M8[ns]')
+        assert s.dtype == 'M8[ns]'

         # invalid astypes
         for t in ['s', 'D', 'us', 'ms']:
-            self.assertRaises(TypeError, s.astype, 'M8[%s]' % t)
+            pytest.raises(TypeError, s.astype, 'M8[%s]' % t)

         # GH3414 related
-        self.assertRaises(TypeError, lambda x: Series(
+        pytest.raises(TypeError, lambda x: Series(
             Series(dates).astype('int') / 1000000, dtype='M8[ms]'))
-        self.assertRaises(TypeError,
-                          lambda x: Series(dates, dtype='datetime64'))
+        pytest.raises(TypeError,
+                      lambda x: Series(dates, dtype='datetime64'))

         # invalid dates can be held as object
         result = Series([datetime(2, 1, 1)])
-        self.assertEqual(result[0], datetime(2, 1, 1, 0, 0))
+        assert result[0] == datetime(2, 1, 1, 0, 0)

         result = Series([datetime(3000, 1, 1)])
-        self.assertEqual(result[0], datetime(3000, 1, 1, 0, 0))
+        assert result[0] == datetime(3000, 1, 1, 0, 0)

         # don't mix types
         result = Series([Timestamp('20130101'), 1], index=['a', 'b'])
-        self.assertEqual(result['a'], Timestamp('20130101'))
-        self.assertEqual(result['b'], 1)
+        assert result['a'] == Timestamp('20130101')
+        assert result['b'] == 1

         # GH6529
         # coerce datetime64 non-ns properly
@@ -416,45 +424,45 @@ def test_constructor_dtype_datetime64(self):
         dates2 = np.array([d.date() for d in dates.to_pydatetime()],
                           dtype=object)
         series1 = Series(dates2, dates)
-        self.assert_numpy_array_equal(series1.values, dates2)
-        self.assertEqual(series1.dtype, object)
+        tm.assert_numpy_array_equal(series1.values, dates2)
+        assert series1.dtype == object

         # these will correctly infer a datetime
         s = Series([None, pd.NaT, '2013-08-05 15:30:00.000001'])
-        self.assertEqual(s.dtype, 'datetime64[ns]')
+        assert s.dtype == 'datetime64[ns]'
         s = Series([np.nan, pd.NaT, '2013-08-05 15:30:00.000001'])
-        self.assertEqual(s.dtype, 'datetime64[ns]')
+        assert s.dtype == 'datetime64[ns]'
         s = Series([pd.NaT, None, '2013-08-05 15:30:00.000001'])
-        self.assertEqual(s.dtype, 'datetime64[ns]')
+        assert s.dtype == 'datetime64[ns]'
         s = Series([pd.NaT, np.nan, '2013-08-05 15:30:00.000001'])
-        self.assertEqual(s.dtype, 'datetime64[ns]')
+        assert s.dtype == 'datetime64[ns]'

         # tz-aware (UTC and other tz's)
         # GH 8411
         dr = date_range('20130101', periods=3)
-        self.assertTrue(Series(dr).iloc[0].tz is None)
+        assert Series(dr).iloc[0].tz is None
         dr = date_range('20130101', periods=3, tz='UTC')
-        self.assertTrue(str(Series(dr).iloc[0].tz) == 'UTC')
+        assert str(Series(dr).iloc[0].tz) == 'UTC'
         dr = date_range('20130101', periods=3, tz='US/Eastern')
-        self.assertTrue(str(Series(dr).iloc[0].tz) == 'US/Eastern')
+        assert str(Series(dr).iloc[0].tz) == 'US/Eastern'

         # non-convertible
         s = Series([1479596223000, -1479590, pd.NaT])
-        self.assertTrue(s.dtype == 'object')
-        self.assertTrue(s[2] is pd.NaT)
-        self.assertTrue('NaT' in str(s))
+        assert s.dtype == 'object'
+        assert s[2] is pd.NaT
+        assert 'NaT' in str(s)

         # if we passed a NaT it remains
         s = Series([datetime(2010, 1, 1), datetime(2, 1, 1), pd.NaT])
-        self.assertTrue(s.dtype == 'object')
-        self.assertTrue(s[2] is pd.NaT)
-        self.assertTrue('NaT' in str(s))
+        assert s.dtype == 'object'
+        assert s[2] is pd.NaT
+        assert 'NaT' in str(s)

         # if we passed a nan it remains
         s = Series([datetime(2010, 1, 1), datetime(2, 1, 1), np.nan])
-        self.assertTrue(s.dtype == 'object')
-        self.assertTrue(s[2] is np.nan)
-        self.assertTrue('NaN' in str(s))
+        assert s.dtype == 'object'
+        assert s[2] is np.nan
+        assert 'NaN' in str(s)

     def test_constructor_with_datetime_tz(self):

@@ -463,27 +471,27 @@ def test_constructor_with_datetime_tz(self):

         dr = date_range('20130101', periods=3, tz='US/Eastern')
         s = Series(dr)
-        self.assertTrue(s.dtype.name == 'datetime64[ns, US/Eastern]')
-        self.assertTrue(s.dtype == 'datetime64[ns, US/Eastern]')
-        self.assertTrue(is_datetime64tz_dtype(s.dtype))
-        self.assertTrue('datetime64[ns, US/Eastern]' in str(s))
+        assert s.dtype.name == 'datetime64[ns, US/Eastern]'
+        assert s.dtype == 'datetime64[ns, US/Eastern]'
+        assert is_datetime64tz_dtype(s.dtype)
+        assert 'datetime64[ns, US/Eastern]' in str(s)

         # export
         result = s.values
-        self.assertIsInstance(result, np.ndarray)
-        self.assertTrue(result.dtype == 'datetime64[ns]')
+        assert isinstance(result, np.ndarray)
+        assert result.dtype == 'datetime64[ns]'

         exp = pd.DatetimeIndex(result)
         exp = exp.tz_localize('UTC').tz_convert(tz=s.dt.tz)
-        self.assert_index_equal(dr, exp)
+        tm.assert_index_equal(dr, exp)

         # indexing
         result = s.iloc[0]
-        self.assertEqual(result, Timestamp('2013-01-01 00:00:00-0500',
-                                           tz='US/Eastern', freq='D'))
+        assert result == Timestamp('2013-01-01 00:00:00-0500',
+                                   tz='US/Eastern', freq='D')
         result = s[0]
-        self.assertEqual(result, Timestamp('2013-01-01 00:00:00-0500',
-                                           tz='US/Eastern', freq='D'))
+        assert result == Timestamp('2013-01-01 00:00:00-0500',
+                                   tz='US/Eastern', freq='D')

         result = s[Series([True, True, False], index=s.index)]
         assert_series_equal(result, s[0:2])
@@ -515,16 +523,16 @@ def test_constructor_with_datetime_tz(self):
         assert_series_equal(result, expected)

         # short str
-        self.assertTrue('datetime64[ns, US/Eastern]' in str(s))
+        assert 'datetime64[ns, US/Eastern]' in str(s)

         # formatting with NaT
         result = s.shift()
-        self.assertTrue('datetime64[ns, US/Eastern]' in str(result))
-        self.assertTrue('NaT' in str(result))
+        assert 'datetime64[ns, US/Eastern]' in str(result)
+        assert 'NaT' in str(result)

         # long str
         t = Series(date_range('20130101', periods=1000, tz='US/Eastern'))
-        self.assertTrue('datetime64[ns, US/Eastern]' in str(t))
+        assert 'datetime64[ns, US/Eastern]' in str(t)

         result = pd.DatetimeIndex(s, freq='infer')
         tm.assert_index_equal(result, dr)
@@ -532,19 +540,30 @@ def test_constructor_with_datetime_tz(self):
         # inference
         s = Series([pd.Timestamp('2013-01-01 13:00:00-0800', tz='US/Pacific'),
                     pd.Timestamp('2013-01-02 14:00:00-0800', tz='US/Pacific')])
-        self.assertTrue(s.dtype == 'datetime64[ns, US/Pacific]')
-        self.assertTrue(lib.infer_dtype(s) == 'datetime64')
+        assert s.dtype == 'datetime64[ns, US/Pacific]'
+        assert lib.infer_dtype(s) == 'datetime64'

         s = Series([pd.Timestamp('2013-01-01 13:00:00-0800', tz='US/Pacific'),
                     pd.Timestamp('2013-01-02 14:00:00-0800', tz='US/Eastern')])
-        self.assertTrue(s.dtype == 'object')
-        self.assertTrue(lib.infer_dtype(s) == 'datetime')
+        assert s.dtype == 'object'
+        assert lib.infer_dtype(s) == 'datetime'

         # with all NaT
         s = Series(pd.NaT, index=[0, 1], dtype='datetime64[ns, US/Eastern]')
         expected = Series(pd.DatetimeIndex(['NaT', 'NaT'], tz='US/Eastern'))
         assert_series_equal(s, expected)

+    def test_construction_interval(self):
+        # construction from interval & array of intervals
+        index = IntervalIndex.from_breaks(np.arange(3), closed='right')
+        result = Series(index)
+        repr(result)
+        str(result)
+        tm.assert_index_equal(Index(result.values), index)
+
+        result = Series(index.values)
+        tm.assert_index_equal(Index(result.values), index)
+
     def test_construction_consistency(self):

         # make sure that we are not re-localizing upon construction
@@ -569,7 +588,7 @@ def test_constructor_periodindex(self):

         expected = Series(pi.asobject)
         assert_series_equal(s, expected)

-        self.assertEqual(s.dtype, 'object')
+        assert s.dtype == 'object'

     def test_constructor_dict(self):
         d = {'a': 0., 'b': 1., 'c': 2.}
@@ -585,48 +604,6 @@ def test_constructor_dict(self):
         expected.iloc[1] = 1
         assert_series_equal(result, expected)

-    def test_constructor_dict_multiindex(self):
-        check = lambda result, expected: tm.assert_series_equal(
-            result, expected, check_dtype=True, check_series_type=True)
-        d = {('a', 'a'): 0., ('b', 'a'): 1., ('b', 'c'): 2.}
-        _d = sorted(d.items())
-        ser = Series(d)
-        expected = Series([x[1] for x in _d],
-                          index=MultiIndex.from_tuples([x[0] for x in _d]))
-        check(ser, expected)
-
-        d['z'] = 111.
-        _d.insert(0, ('z', d['z']))
-        ser = Series(d)
-        expected = Series([x[1] for x in _d], index=Index(
-            [x[0] for x in _d], tupleize_cols=False))
-        ser = ser.reindex(index=expected.index)
-        check(ser, expected)
-
-    def test_constructor_dict_timedelta_index(self):
-        # GH #12169 : Resample category data with timedelta index
-        # construct Series from dict as data and TimedeltaIndex as index
-        # will result NaN in result Series data
-        expected = Series(
-            data=['A', 'B', 'C'],
-            index=pd.to_timedelta([0, 10, 20], unit='s')
-        )
-
-        result = Series(
-            data={pd.to_timedelta(0, unit='s'): 'A',
-                  pd.to_timedelta(10, unit='s'): 'B',
-                  pd.to_timedelta(20, unit='s'): 'C'},
-            index=pd.to_timedelta([0, 10, 20], unit='s')
-        )
-        # this should work
-        assert_series_equal(result, expected)
-
-    def test_constructor_subclass_dict(self):
-        data = tm.TestSubDict((x, 10.0 * x) for x in range(10))
-        series = Series(data)
-        refseries = Series(dict(compat.iteritems(data)))
-        assert_series_equal(refseries, series)
-
     def test_constructor_dict_datetime64_index(self):
         # GH 9456
@@ -650,163 +627,201 @@ def create_data(constructor):
         assert_series_equal(result_datetime, expected)
         assert_series_equal(result_Timestamp, expected)

-    def test_orderedDict_ctor(self):
-        # GH3283
-        import pandas
-        import random
-        data = OrderedDict([('col%s' % i, random.random()) for i in range(12)])
-        s = pandas.Series(data)
-        self.assertTrue(all(s.values == list(data.values())))
-
-    def test_orderedDict_subclass_ctor(self):
-        # GH3283
-        import pandas
-        import random
-
-        class A(OrderedDict):
-            pass
-
-        data = A([('col%s' % i, random.random()) for i in range(12)])
-        s = pandas.Series(data)
-        self.assertTrue(all(s.values == list(data.values())))
-
     def test_constructor_list_of_tuples(self):
         data = [(1, 1), (2, 2), (2, 3)]
         s = Series(data)
-        self.assertEqual(list(s), data)
+        assert list(s) == data

     def test_constructor_tuple_of_tuples(self):
         data = ((1, 1), (2, 2), (2, 3))
         s = Series(data)
-        self.assertEqual(tuple(s), data)
+        assert tuple(s) == data

     def test_constructor_set(self):
         values = set([1, 2, 3, 4, 5])
-        self.assertRaises(TypeError, Series, values)
+        pytest.raises(TypeError, Series, values)
         values = frozenset(values)
-        self.assertRaises(TypeError, Series, values)
+        pytest.raises(TypeError, Series, values)

     def test_fromDict(self):
         data = {'a': 0, 'b': 1, 'c': 2, 'd': 3}

         series = Series(data)
-        self.assertTrue(tm.is_sorted(series.index))
+        assert tm.is_sorted(series.index)

         data = {'a': 0, 'b': '1', 'c': '2', 'd': datetime.now()}
         series = Series(data)
-        self.assertEqual(series.dtype, np.object_)
+        assert series.dtype == np.object_

         data = {'a': 0, 'b': '1', 'c': '2', 'd': '3'}
         series = Series(data)
-        self.assertEqual(series.dtype, np.object_)
+        assert series.dtype == np.object_

         data = {'a': '0', 'b': '1'}
         series = Series(data, dtype=float)
-        self.assertEqual(series.dtype, np.float64)
+        assert series.dtype == np.float64

     def test_fromValue(self):

         nans = Series(np.NaN, index=self.ts.index)
-        self.assertEqual(nans.dtype, np.float_)
-        self.assertEqual(len(nans), len(self.ts))
+        assert nans.dtype == np.float_
+        assert len(nans) == len(self.ts)

         strings = Series('foo', index=self.ts.index)
-        self.assertEqual(strings.dtype, np.object_)
-        self.assertEqual(len(strings), len(self.ts))
+        assert strings.dtype == np.object_
+        assert len(strings) == len(self.ts)

         d = datetime.now()
         dates = Series(d, index=self.ts.index)
-        self.assertEqual(dates.dtype, 'M8[ns]')
-        self.assertEqual(len(dates), len(self.ts))
+        assert dates.dtype == 'M8[ns]'
+        assert len(dates) == len(self.ts)

         # GH12336
         # Test construction of categorical series from value
         categorical = Series(0, index=self.ts.index, dtype="category")
         expected = Series(0, index=self.ts.index).astype("category")
-        self.assertEqual(categorical.dtype, 'category')
-        self.assertEqual(len(categorical), len(self.ts))
+        assert categorical.dtype == 'category'
+        assert len(categorical) == len(self.ts)
         tm.assert_series_equal(categorical, expected)

     def test_constructor_dtype_timedelta64(self):

         # basic
         td = Series([timedelta(days=i) for i in range(3)])
-        self.assertEqual(td.dtype, 'timedelta64[ns]')
+        assert td.dtype == 'timedelta64[ns]'

         td = Series([timedelta(days=1)])
-        self.assertEqual(td.dtype, 'timedelta64[ns]')
+        assert td.dtype == 'timedelta64[ns]'

         td = Series([timedelta(days=1), timedelta(days=2), np.timedelta64(
             1, 's')])
-        self.assertEqual(td.dtype, 'timedelta64[ns]')
+        assert td.dtype == 'timedelta64[ns]'

         # mixed with NaT
-        from pandas import tslib
-        td = Series([timedelta(days=1), tslib.NaT], dtype='m8[ns]')
-        self.assertEqual(td.dtype, 'timedelta64[ns]')
+        td = Series([timedelta(days=1), NaT], dtype='m8[ns]')
+        assert td.dtype == 'timedelta64[ns]'

         td = Series([timedelta(days=1), np.nan], dtype='m8[ns]')
-        self.assertEqual(td.dtype, 'timedelta64[ns]')
+        assert td.dtype == 'timedelta64[ns]'

         td = Series([np.timedelta64(300000000), pd.NaT], dtype='m8[ns]')
-        self.assertEqual(td.dtype, 'timedelta64[ns]')
+        assert td.dtype == 'timedelta64[ns]'

         # improved inference
         # GH5689
-        td = Series([np.timedelta64(300000000), pd.NaT])
-        self.assertEqual(td.dtype, 'timedelta64[ns]')
+        td = Series([np.timedelta64(300000000), NaT])
+        assert td.dtype == 'timedelta64[ns]'

         # because iNaT is int, not coerced to timedelta
-        td = Series([np.timedelta64(300000000), tslib.iNaT])
-        self.assertEqual(td.dtype, 'object')
+        td = Series([np.timedelta64(300000000), iNaT])
+        assert td.dtype == 'object'

         td = Series([np.timedelta64(300000000), np.nan])
-        self.assertEqual(td.dtype, 'timedelta64[ns]')
+        assert td.dtype == 'timedelta64[ns]'

         td = Series([pd.NaT, np.timedelta64(300000000)])
-        self.assertEqual(td.dtype, 'timedelta64[ns]')
+        assert td.dtype == 'timedelta64[ns]'

         td = Series([np.timedelta64(1, 's')])
-        self.assertEqual(td.dtype, 'timedelta64[ns]')
+        assert td.dtype == 'timedelta64[ns]'

         # these are frequency conversion astypes
         # for t in ['s', 'D', 'us', 'ms']:
-        #    self.assertRaises(TypeError, td.astype, 'm8[%s]' % t)
+        #    pytest.raises(TypeError, td.astype, 'm8[%s]' % t)

         # valid astype
         td.astype('int64')

         # invalid casting
-        self.assertRaises(TypeError, td.astype, 'int32')
+        pytest.raises(TypeError, td.astype, 'int32')

         # this is an invalid casting
         def f():
             Series([timedelta(days=1), 'foo'], dtype='m8[ns]')

-        self.assertRaises(Exception, f)
+        pytest.raises(Exception, f)

         # leave as object here
         td = Series([timedelta(days=i) for i in range(3)] + ['foo'])
-        self.assertEqual(td.dtype, 'object')
+        assert td.dtype == 'object'

         # these will correctly infer a timedelta
         s = Series([None, pd.NaT, '1 Day'])
-        self.assertEqual(s.dtype, 'timedelta64[ns]')
+        assert s.dtype == 'timedelta64[ns]'
         s = Series([np.nan, pd.NaT, '1 Day'])
-        self.assertEqual(s.dtype, 'timedelta64[ns]')
+        assert s.dtype == 'timedelta64[ns]'
         s = Series([pd.NaT, None, '1 Day'])
-        self.assertEqual(s.dtype, 'timedelta64[ns]')
+        assert s.dtype == 'timedelta64[ns]'
         s = Series([pd.NaT, np.nan, '1 Day'])
-        self.assertEqual(s.dtype, 'timedelta64[ns]')
+        assert s.dtype == 'timedelta64[ns]'
+
+    def test_NaT_scalar(self):
+        series = Series([0, 1000, 2000, iNaT], dtype='M8[ns]')
+
+        val = series[3]
+        assert isna(val)
+
+        series[2] = val
+        assert isna(series[2])
+
+    def test_NaT_cast(self):
+        # GH10747
+        result = Series([np.nan]).astype('M8[ns]')
+        expected = Series([NaT])
+        assert_series_equal(result, expected)

     def test_constructor_name_hashable(self):
         for n in [777, 777., 'name', datetime(2001, 11, 11), (1, ), u"\u05D0"]:
             for data in [[1, 2, 3], np.ones(3), {'a': 0, 'b': 1}]:
                 s = Series(data, name=n)
-                self.assertEqual(s.name, n)
+                assert s.name == n

     def test_constructor_name_unhashable(self):
         for n in [['name_list'], np.ones(2), {1: 2}]:
             for data in [['name_list'], np.ones(2), {1: 2}]:
-                self.assertRaises(TypeError, Series, data, name=n)
+                pytest.raises(TypeError, Series, data, name=n)
+
+    def test_auto_conversion(self):
+        series = Series(list(date_range('1/1/2000', periods=10)))
+        assert series.dtype == 'M8[ns]'
+
+    def test_constructor_cant_cast_datetime64(self):
+        msg = "Cannot cast datetime64 to "
+        with tm.assert_raises_regex(TypeError, msg):
+            Series(date_range('1/1/2000', periods=10), dtype=float)
+
+        with tm.assert_raises_regex(TypeError, msg):
+            Series(date_range('1/1/2000', periods=10), dtype=int)
+
+    def test_constructor_cast_object(self):
+        s = Series(date_range('1/1/2000', periods=10), dtype=object)
+        exp = Series(date_range('1/1/2000', periods=10))
+        tm.assert_series_equal(s, exp)
+
+    def test_constructor_generic_timestamp_deprecated(self):
+        # see gh-15524
+
+        with tm.assert_produces_warning(FutureWarning):
+            dtype = np.timedelta64
+            s = Series([], dtype=dtype)
+
+            assert s.empty
+            assert s.dtype == 'm8[ns]'
+
+        with tm.assert_produces_warning(FutureWarning):
+            dtype = np.datetime64
+            s = Series([], dtype=dtype)
+
+            assert s.empty
+            assert s.dtype == 'M8[ns]'
+
+        # These timestamps have the wrong frequencies,
+        # so an Exception should be raised now.
+ msg = "cannot convert timedeltalike" + with tm.assert_raises_regex(TypeError, msg): + Series([], dtype='m8[ps]') + + msg = "cannot convert datetimelike" + with tm.assert_raises_regex(TypeError, msg): + Series([], dtype='M8[ps]') diff --git a/pandas/tests/series/test_datetime_values.py b/pandas/tests/series/test_datetime_values.py index b9f999a6c6ffe..e810eadd2dee9 100644 --- a/pandas/tests/series/test_datetime_values.py +++ b/pandas/tests/series/test_datetime_values.py @@ -1,17 +1,17 @@ # coding=utf-8 # pylint: disable-msg=E1101,W0612 +import pytest + from datetime import datetime, date import numpy as np import pandas as pd -from pandas.types.common import is_integer_dtype, is_list_like +from pandas.core.dtypes.common import is_integer_dtype, is_list_like from pandas import (Index, Series, DataFrame, bdate_range, - date_range, period_range, timedelta_range) -from pandas.tseries.period import PeriodIndex -from pandas.tseries.index import Timestamp, DatetimeIndex -from pandas.tseries.tdi import TimedeltaIndex + date_range, period_range, timedelta_range, + PeriodIndex, Timestamp, DatetimeIndex, TimedeltaIndex) import pandas.core.common as com from pandas.util.testing import assert_series_equal @@ -20,30 +20,20 @@ from .common import TestData -class TestSeriesDatetimeValues(TestData, tm.TestCase): - - _multiprocess_can_split_ = True +class TestSeriesDatetimeValues(TestData): def test_dt_namespace_accessor(self): # GH 7207, 11128 # test .dt namespace accessor - ok_for_base = ['year', 'month', 'day', 'hour', 'minute', 'second', - 'weekofyear', 'week', 'dayofweek', 'weekday', - 'dayofyear', 'quarter', 'freq', 'days_in_month', - 'daysinmonth', 'is_leap_year'] - ok_for_period = ok_for_base + ['qyear', 'start_time', 'end_time'] + ok_for_period = PeriodIndex._datetimelike_ops ok_for_period_methods = ['strftime', 'to_timestamp', 'asfreq'] - ok_for_dt = ok_for_base + ['date', 'time', 'microsecond', 'nanosecond', - 'is_month_start', 'is_month_end', - 'is_quarter_start', 'is_quarter_end', - 'is_year_start', 'is_year_end', 'tz', - 'weekday_name'] + ok_for_dt = DatetimeIndex._datetimelike_ops ok_for_dt_methods = ['to_period', 'to_pydatetime', 'tz_localize', 'tz_convert', 'normalize', 'strftime', 'round', 'floor', 'ceil', 'weekday_name'] - ok_for_td = ['days', 'seconds', 'microseconds', 'nanoseconds'] + ok_for_td = TimedeltaIndex._datetimelike_ops ok_for_td_methods = ['components', 'to_pytimedelta', 'total_seconds', 'round', 'floor', 'ceil'] @@ -60,7 +50,7 @@ def compare(s, name): a = getattr(s.dt, prop) b = get_expected(s, prop) if not (is_list_like(a) and is_list_like(b)): - self.assertEqual(a, b) + assert a == b else: tm.assert_series_equal(a, b) @@ -80,8 +70,8 @@ def compare(s, name): getattr(s.dt, prop) result = s.dt.to_pydatetime() - self.assertIsInstance(result, np.ndarray) - self.assertTrue(result.dtype == object) + assert isinstance(result, np.ndarray) + assert result.dtype == object result = s.dt.tz_localize('US/Eastern') exp_values = DatetimeIndex(s.values).tz_localize('US/Eastern') @@ -89,10 +79,9 @@ def compare(s, name): tm.assert_series_equal(result, expected) tz_result = result.dt.tz - self.assertEqual(str(tz_result), 'US/Eastern') + assert str(tz_result) == 'US/Eastern' freq_result = s.dt.freq - self.assertEqual(freq_result, DatetimeIndex(s.values, - freq='infer').freq) + assert freq_result == DatetimeIndex(s.values, freq='infer').freq # let's localize, then convert result = s.dt.tz_localize('UTC').dt.tz_convert('US/Eastern') @@ -150,8 +139,8 @@ def compare(s, name): getattr(s.dt, prop) 
result = s.dt.to_pydatetime() - self.assertIsInstance(result, np.ndarray) - self.assertTrue(result.dtype == object) + assert isinstance(result, np.ndarray) + assert result.dtype == object result = s.dt.tz_convert('CET') expected = Series(s._values.tz_convert('CET'), @@ -159,18 +148,17 @@ def compare(s, name): tm.assert_series_equal(result, expected) tz_result = result.dt.tz - self.assertEqual(str(tz_result), 'CET') + assert str(tz_result) == 'CET' freq_result = s.dt.freq - self.assertEqual(freq_result, DatetimeIndex(s.values, - freq='infer').freq) + assert freq_result == DatetimeIndex(s.values, freq='infer').freq - # timedeltaindex + # timedelta index cases = [Series(timedelta_range('1 day', periods=5), index=list('abcde'), name='xxx'), Series(timedelta_range('1 day 01:23:45', periods=5, - freq='s'), name='xxx'), + freq='s'), name='xxx'), Series(timedelta_range('2 days 01:23:45.012345', periods=5, - freq='ms'), name='xxx')] + freq='ms'), name='xxx')] for s in cases: for prop in ok_for_td: # we test freq below @@ -181,20 +169,19 @@ def compare(s, name): getattr(s.dt, prop) result = s.dt.components - self.assertIsInstance(result, DataFrame) + assert isinstance(result, DataFrame) tm.assert_index_equal(result.index, s.index) result = s.dt.to_pytimedelta() - self.assertIsInstance(result, np.ndarray) - self.assertTrue(result.dtype == object) + assert isinstance(result, np.ndarray) + assert result.dtype == object result = s.dt.total_seconds() - self.assertIsInstance(result, pd.Series) - self.assertTrue(result.dtype == 'float64') + assert isinstance(result, pd.Series) + assert result.dtype == 'float64' freq_result = s.dt.freq - self.assertEqual(freq_result, TimedeltaIndex(s.values, - freq='infer').freq) + assert freq_result == TimedeltaIndex(s.values, freq='infer').freq # both index = date_range('20130101', periods=3, freq='D') @@ -228,7 +215,7 @@ def compare(s, name): getattr(s.dt, prop) freq_result = s.dt.freq - self.assertEqual(freq_result, PeriodIndex(s.values).freq) + assert freq_result == PeriodIndex(s.values).freq # test limited display api def get_dir(s): @@ -261,7 +248,7 @@ def get_dir(s): # no setting allowed s = Series(date_range('20130101', periods=5, freq='D'), name='xxx') - with tm.assertRaisesRegexp(ValueError, "modifications"): + with tm.assert_raises_regex(ValueError, "modifications"): s.dt.hour = 5 # trying to set a copy @@ -270,13 +257,13 @@ def get_dir(s): def f(): s.dt.hour[0] = 5 - self.assertRaises(com.SettingWithCopyError, f) + pytest.raises(com.SettingWithCopyError, f) def test_dt_accessor_no_new_attributes(self): # https://github.com/pandas-dev/pandas/issues/10673 s = Series(date_range('20130101', periods=5, freq='D')) - with tm.assertRaisesRegexp(AttributeError, - "You cannot add any new attribute"): + with tm.assert_raises_regex(AttributeError, + "You cannot add any new attribute"): s.dt.xlabel = "a" def test_strftime(self): @@ -321,13 +308,13 @@ def test_strftime(self): expected = np.array(['2015/03/01', '2015/03/02', '2015/03/03', '2015/03/04', '2015/03/05'], dtype=np.object_) # dtype may be S10 or U10 depending on python version - self.assert_numpy_array_equal(result, expected, check_dtype=False) + tm.assert_numpy_array_equal(result, expected, check_dtype=False) period_index = period_range('20150301', periods=5) result = period_index.strftime("%Y/%m/%d") expected = np.array(['2015/03/01', '2015/03/02', '2015/03/03', '2015/03/04', '2015/03/05'], dtype='=U10') - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) s = 
Series([datetime(2013, 1, 1, 2, 32, 59), datetime(2013, 1, 2, 14, 32, 1)]) @@ -376,28 +363,28 @@ def test_valid_dt_with_missing_values(self): def test_dt_accessor_api(self): # GH 9322 - from pandas.tseries.common import (CombinedDatetimelikeProperties, - DatetimeProperties) - self.assertIs(Series.dt, CombinedDatetimelikeProperties) + from pandas.core.indexes.accessors import ( + CombinedDatetimelikeProperties, DatetimeProperties) + assert Series.dt is CombinedDatetimelikeProperties s = Series(date_range('2000-01-01', periods=3)) - self.assertIsInstance(s.dt, DatetimeProperties) + assert isinstance(s.dt, DatetimeProperties) for s in [Series(np.arange(5)), Series(list('abcde')), Series(np.random.randn(5))]: - with tm.assertRaisesRegexp(AttributeError, - "only use .dt accessor"): + with tm.assert_raises_regex(AttributeError, + "only use .dt accessor"): s.dt - self.assertFalse(hasattr(s, 'dt')) + assert not hasattr(s, 'dt') def test_sub_of_datetime_from_TimeSeries(self): - from pandas.tseries.timedeltas import to_timedelta + from pandas.core.tools.timedeltas import to_timedelta from datetime import datetime a = Timestamp(datetime(1993, 0o1, 0o7, 13, 30, 00)) b = datetime(1993, 6, 22, 13, 30) a = Series([a]) result = to_timedelta(np.abs(a - b)) - self.assertEqual(result.dtype, 'timedelta64[ns]') + assert result.dtype == 'timedelta64[ns]' def test_between(self): s = Series(bdate_range('1/1/2000', periods=20).asobject) @@ -422,3 +409,13 @@ def test_date_tz(self): date(2015, 11, 22)]) assert_series_equal(s.dt.date, expected) assert_series_equal(s.apply(lambda x: x.date()), expected) + + def test_datetime_understood(self): + # Ensures it doesn't fail to create the right series + # reported in issue#16726 + series = pd.Series(pd.date_range("2012-01-01", periods=3)) + offset = pd.offsets.DateOffset(days=6) + result = series - offset + expected = pd.Series(pd.to_datetime([ + '2011-12-26', '2011-12-27', '2011-12-28'])) + tm.assert_series_equal(result, expected) diff --git a/pandas/tests/series/test_dtypes.py b/pandas/tests/series/test_dtypes.py index 1a1ff28bbb398..c214280ee8386 100644 --- a/pandas/tests/series/test_dtypes.py +++ b/pandas/tests/series/test_dtypes.py @@ -1,167 +1,225 @@ # coding=utf-8 # pylint: disable-msg=E1101,W0612 -import sys +import pytest + from datetime import datetime + +import sys import string +import warnings from numpy import nan import numpy as np -from pandas import Series -from pandas.tseries.index import Timestamp -from pandas.tseries.tdi import Timedelta +from pandas import Series, Timestamp, Timedelta, DataFrame, date_range from pandas.compat import lrange, range, u from pandas import compat -from pandas.util.testing import assert_series_equal import pandas.util.testing as tm from .common import TestData -class TestSeriesDtypes(TestData, tm.TestCase): - - _multiprocess_can_split_ = True +class TestSeriesDtypes(TestData): - def test_astype(self): + @pytest.mark.parametrize("dtype", ["float32", "float64", + "int64", "int32"]) + def test_astype(self, dtype): s = Series(np.random.randn(5), name='foo') + as_typed = s.astype(dtype) - for dtype in ['float32', 'float64', 'int64', 'int32']: - astyped = s.astype(dtype) - self.assertEqual(astyped.dtype, dtype) - self.assertEqual(astyped.name, s.name) + assert as_typed.dtype == dtype + assert as_typed.name == s.name def test_dtype(self): - self.assertEqual(self.ts.dtype, np.dtype('float64')) - self.assertEqual(self.ts.dtypes, np.dtype('float64')) - self.assertEqual(self.ts.ftype, 'float64:dense') - self.assertEqual(self.ts.ftypes, 
'float64:dense') - assert_series_equal(self.ts.get_dtype_counts(), Series(1, ['float64'])) - assert_series_equal(self.ts.get_ftype_counts(), Series( - 1, ['float64:dense'])) - - def test_astype_cast_nan_inf_int(self): - # GH14265, check nan and inf raise error when converting to int - types = [np.int32, np.int64] - values = [np.nan, np.inf] + assert self.ts.dtype == np.dtype('float64') + assert self.ts.dtypes == np.dtype('float64') + assert self.ts.ftype == 'float64:dense' + assert self.ts.ftypes == 'float64:dense' + tm.assert_series_equal(self.ts.get_dtype_counts(), + Series(1, ['float64'])) + tm.assert_series_equal(self.ts.get_ftype_counts(), + Series(1, ['float64:dense'])) + + @pytest.mark.parametrize("value", [np.nan, np.inf]) + @pytest.mark.parametrize("dtype", [np.int32, np.int64]) + def test_astype_cast_nan_inf_int(self, dtype, value): + # gh-14265: check NaN and inf raise error when converting to int msg = 'Cannot convert non-finite values \\(NA or inf\\) to integer' + s = Series([value]) - for this_type in types: - for this_val in values: - s = Series([this_val]) - with self.assertRaisesRegexp(ValueError, msg): - s.astype(this_type) + with tm.assert_raises_regex(ValueError, msg): + s.astype(dtype) - def test_astype_cast_object_int(self): + @pytest.mark.parametrize("dtype", [int, np.int8, np.int64]) + def test_astype_cast_object_int_fail(self, dtype): arr = Series(["car", "house", "tree", "1"]) + with pytest.raises(ValueError): + arr.astype(dtype) - self.assertRaises(ValueError, arr.astype, int) - self.assertRaises(ValueError, arr.astype, np.int64) - self.assertRaises(ValueError, arr.astype, np.int8) - + def test_astype_cast_object_int(self): arr = Series(['1', '2', '3', '4'], dtype=object) result = arr.astype(int) - self.assert_series_equal(result, Series(np.arange(1, 5))) - def test_astype_datetimes(self): - import pandas.tslib as tslib + tm.assert_series_equal(result, Series(np.arange(1, 5))) + def test_astype_datetimes(self): + import pandas._libs.tslib as tslib s = Series(tslib.iNaT, dtype='M8[ns]', index=lrange(5)) + s = s.astype('O') - self.assertEqual(s.dtype, np.object_) + assert s.dtype == np.object_ s = Series([datetime(2001, 1, 2, 0, 0)]) + s = s.astype('O') - self.assertEqual(s.dtype, np.object_) + assert s.dtype == np.object_ s = Series([datetime(2001, 1, 2, 0, 0) for i in range(3)]) + s[1] = np.nan - self.assertEqual(s.dtype, 'M8[ns]') - s = s.astype('O') - self.assertEqual(s.dtype, np.object_) + assert s.dtype == 'M8[ns]' - def test_astype_str(self): - # GH4405 - digits = string.digits - s1 = Series([digits * 10, tm.rands(63), tm.rands(64), tm.rands(1000)]) - s2 = Series([digits * 10, tm.rands(63), tm.rands(64), nan, 1.0]) - types = (compat.text_type, np.str_) - for typ in types: - for s in (s1, s2): - res = s.astype(typ) - expec = s.map(compat.text_type) - assert_series_equal(res, expec) - - # GH9757 - # Test str and unicode on python 2.x and just str on python 3.x - for tt in set([str, compat.text_type]): - ts = Series([Timestamp('2010-01-04 00:00:00')]) - s = ts.astype(tt) - expected = Series([tt('2010-01-04')]) - assert_series_equal(s, expected) - - ts = Series([Timestamp('2010-01-04 00:00:00', tz='US/Eastern')]) - s = ts.astype(tt) - expected = Series([tt('2010-01-04 00:00:00-05:00')]) - assert_series_equal(s, expected) - - td = Series([Timedelta(1, unit='d')]) - s = td.astype(tt) - expected = Series([tt('1 days 00:00:00.000000000')]) - assert_series_equal(s, expected) + s = s.astype('O') + assert s.dtype == np.object_ + + @pytest.mark.parametrize("dtype", 
[compat.text_type, np.str_]) + @pytest.mark.parametrize("series", [Series([string.digits * 10, + tm.rands(63), + tm.rands(64), + tm.rands(1000)]), + Series([string.digits * 10, + tm.rands(63), + tm.rands(64), nan, 1.0])]) + def test_astype_str_map(self, dtype, series): + # see gh-4405 + result = series.astype(dtype) + expected = series.map(compat.text_type) + tm.assert_series_equal(result, expected) + + @pytest.mark.parametrize("dtype", [str, compat.text_type]) + def test_astype_str_cast(self, dtype): + # see gh-9757: test str and unicode on python 2.x + # and just str on python 3.x + ts = Series([Timestamp('2010-01-04 00:00:00')]) + s = ts.astype(dtype) + + expected = Series([dtype('2010-01-04')]) + tm.assert_series_equal(s, expected) + + ts = Series([Timestamp('2010-01-04 00:00:00', tz='US/Eastern')]) + s = ts.astype(dtype) + + expected = Series([dtype('2010-01-04 00:00:00-05:00')]) + tm.assert_series_equal(s, expected) + + td = Series([Timedelta(1, unit='d')]) + s = td.astype(dtype) + + expected = Series([dtype('1 days 00:00:00.000000000')]) + tm.assert_series_equal(s, expected) def test_astype_unicode(self): - - # GH7758 - # a bit of magic is required to set default encoding encoding to utf-8 + # see gh-7758: A bit of magic is required to set + # default encoding to utf-8 digits = string.digits test_series = [ Series([digits * 10, tm.rands(63), tm.rands(64), tm.rands(1000)]), Series([u('データーサイエンス、お前はもう死んでいる')]), - ] former_encoding = None + if not compat.PY3: - # in python we can force the default encoding for this test + # In Python, we can force the default encoding for this test former_encoding = sys.getdefaultencoding() reload(sys) # noqa + sys.setdefaultencoding("utf-8") if sys.getdefaultencoding() == "utf-8": test_series.append(Series([u('野菜食べないとやばい') .encode("utf-8")])) + for s in test_series: res = s.astype("unicode") expec = s.map(compat.text_type) - assert_series_equal(res, expec) - # restore the former encoding + tm.assert_series_equal(res, expec) + + # Restore the former encoding if former_encoding is not None and former_encoding != "utf-8": reload(sys) # noqa sys.setdefaultencoding(former_encoding) - def test_astype_dict(self): - # GH7271 + @pytest.mark.parametrize("dtype_class", [dict, Series]) + def test_astype_dict_like(self, dtype_class): + # see gh-7271 s = Series(range(0, 10, 2), name='abc') - result = s.astype({'abc': str}) + dt1 = dtype_class({'abc': str}) + result = s.astype(dt1) expected = Series(['0', '2', '4', '6', '8'], name='abc') - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) - result = s.astype({'abc': 'float64'}) + dt2 = dtype_class({'abc': 'float64'}) + result = s.astype(dt2) expected = Series([0.0, 2.0, 4.0, 6.0, 8.0], dtype='float64', name='abc') - assert_series_equal(result, expected) - - self.assertRaises(KeyError, s.astype, {'abc': str, 'def': str}) - self.assertRaises(KeyError, s.astype, {0: str}) - - def test_complexx(self): - # GH4819 - # complex access for ndarray compat + tm.assert_series_equal(result, expected) + + dt3 = dtype_class({'abc': str, 'def': str}) + with pytest.raises(KeyError): + s.astype(dt3) + + dt4 = dtype_class({0: str}) + with pytest.raises(KeyError): + s.astype(dt4) + + # GH16717 + # if dtypes provided is empty, it should error + dt5 = dtype_class({}) + with pytest.raises(KeyError): + s.astype(dt5) + + def test_astype_generic_timestamp_deprecated(self): + # see gh-15524 + data = [1] + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + s = Series(data) + dtype = 
np.datetime64 + result = s.astype(dtype) + expected = Series(data, dtype=dtype) + tm.assert_series_equal(result, expected) + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + s = Series(data) + dtype = np.timedelta64 + result = s.astype(dtype) + expected = Series(data, dtype=dtype) + tm.assert_series_equal(result, expected) + + @pytest.mark.parametrize("dtype", np.typecodes['All']) + def test_astype_empty_constructor_equality(self, dtype): + # see gh-15524 + + if dtype not in ('S', 'V'): # poor support (if any) currently + with warnings.catch_warnings(record=True): + # Generic timestamp dtypes ('M' and 'm') are deprecated, + # but we test that already in series/test_constructors.py + + init_empty = Series([], dtype=dtype) + as_type_empty = Series([]).astype(dtype) + tm.assert_series_equal(init_empty, as_type_empty) + + def test_complex(self): + # see gh-4819: complex access for ndarray compat a = np.arange(5, dtype=np.float64) b = Series(a + 4j * a) + tm.assert_numpy_array_equal(a, b.real) tm.assert_numpy_array_equal(4 * a, b.imag) @@ -170,14 +228,61 @@ def test_complexx(self): tm.assert_numpy_array_equal(4 * a, b.imag) def test_arg_for_errors_in_astype(self): - # issue #14878 + # see gh-14878 + s = Series([1, 2, 3]) - sr = Series([1, 2, 3]) - - with self.assertRaises(ValueError): - sr.astype(np.float64, errors=False) + with pytest.raises(ValueError): + s.astype(np.float64, errors=False) with tm.assert_produces_warning(FutureWarning): - sr.astype(np.int8, raise_on_error=True) + s.astype(np.int8, raise_on_error=True) + + s.astype(np.int8, errors='raise') + + def test_intercept_astype_object(self): + series = Series(date_range('1/1/2000', periods=10)) + + # This test no longer makes sense, as + # Series is by default already M8[ns]. 
+        expected = series.astype('object')
+
+        df = DataFrame({'a': series,
+                        'b': np.random.randn(len(series))})
+        exp_dtypes = Series([np.dtype('datetime64[ns]'),
+                             np.dtype('float64')], index=['a', 'b'])
+        tm.assert_series_equal(df.dtypes, exp_dtypes)
+
+        result = df.values.squeeze()
+        assert (result[:, 0] == expected.values).all()
+
+        df = DataFrame({'a': series, 'b': ['foo'] * len(series)})
+
+        result = df.values.squeeze()
+        assert (result[:, 0] == expected.values).all()
+
+    def test_series_to_categorical(self):
+        # see gh-16524: test conversion of Series to Categorical
+        series = Series(['a', 'b', 'c'])
+
+        result = Series(series, dtype='category')
+        expected = Series(['a', 'b', 'c'], dtype='category')
+
+        tm.assert_series_equal(result, expected)
+
+    def test_infer_objects_series(self):
+        # GH 11221
+        actual = Series(np.array([1, 2, 3], dtype='O')).infer_objects()
+        expected = Series([1, 2, 3])
+        tm.assert_series_equal(actual, expected)
+
+        actual = Series(np.array([1, 2, 3, None], dtype='O')).infer_objects()
+        expected = Series([1., 2., 3., np.nan])
+        tm.assert_series_equal(actual, expected)
+
+        # only soft conversions; unconvertible values pass through unchanged
+        actual = (Series(np.array([1, 2, 3, None, 'a'], dtype='O'))
+                  .infer_objects())
+        expected = Series([1, 2, 3, None, 'a'])
-        sr.astype(np.int8, errors='raise')
+        assert actual.dtype == 'object'
+        tm.assert_series_equal(actual, expected)
diff --git a/pandas/tests/series/test_indexing.py b/pandas/tests/series/test_indexing.py
index e6209a853e958..45a92f6d6f50b 100644
--- a/pandas/tests/series/test_indexing.py
+++ b/pandas/tests/series/test_indexing.py
@@ -1,23 +1,28 @@
 # coding=utf-8
 # pylint: disable-msg=E1101,W0612

+import pytest
+
 from datetime import datetime, timedelta

 from numpy import nan
 import numpy as np
 import pandas as pd

-from pandas.types.common import is_integer, is_scalar
-from pandas import Index, Series, DataFrame, isnull, date_range
-from pandas.core.index import MultiIndex
+import pandas._libs.index as _index
+from pandas.core.dtypes.common import is_integer, is_scalar
+from pandas import (Index, Series, DataFrame, isna,
+                    date_range, NaT, MultiIndex,
+                    Timestamp, DatetimeIndex, Timedelta)

 from pandas.core.indexing import IndexingError
-from pandas.tseries.index import Timestamp
 from pandas.tseries.offsets import BDay
-from pandas.tseries.tdi import Timedelta
+from pandas._libs import tslib, lib

 from pandas.compat import lrange, range
 from pandas import compat
-from pandas.util.testing import assert_series_equal, assert_almost_equal
+from pandas.util.testing import (assert_series_equal,
+                                 assert_almost_equal,
+                                 assert_frame_equal)
 import pandas.util.testing as tm

 from pandas.tests.series.common import TestData
@@ -25,9 +30,7 @@
 JOIN_TYPES = ['inner', 'outer', 'left', 'right']


-class TestSeriesIndexing(TestData, tm.TestCase):
-
-    _multiprocess_can_split_ = True
+class TestSeriesIndexing(TestData):

     def test_get(self):

@@ -37,7 +40,7 @@ def test_get(self):

         result = s.get(25, 0)
         expected = 0
-        self.assertEqual(result, expected)
+        assert result == expected

         s = Series(np.array([43, 48, 60, 48, 50, 51, 50,
                              45, 57, 48, 56, 45, 51, 39, 55,
                              43, 54, 52, 51, 54]),
@@ -50,21 +53,36 @@ def test_get(self):

         result = s.get(25, 0)
         expected = 43
-        self.assertEqual(result, expected)
+        assert result == expected

         # GH 7407
         # with a boolean accessor
         df = pd.DataFrame({'i': [0] * 3, 'b': [False] * 3})
         vc = df.i.value_counts()
         result = vc.get(99, default='Missing')
-        self.assertEqual(result, 'Missing')
+        assert result == 'Missing'

         vc = df.b.value_counts()
         result = vc.get(False, default='Missing')
-        self.assertEqual(result, 3)
+        assert result == 3

         result = vc.get(True, default='Missing')
-        self.assertEqual(result, 'Missing')
+        assert result == 'Missing'
+
+    def test_get_nan(self):
+        # GH 8569
+        s = pd.Float64Index(range(10)).to_series()
+        assert s.get(np.nan) is None
+        assert s.get(np.nan, default='Missing') == 'Missing'
+
+        # ensure that fixing the above hasn't broken get
+        # with multiple elements
+        idx = [20, 30]
+        assert_series_equal(s.get(idx),
+                            Series([np.nan] * 2, index=idx))
+        idx = [np.nan, np.nan]
+        assert_series_equal(s.get(idx),
+                            Series([np.nan] * 2, index=idx))

     def test_delitem(self):

@@ -86,7 +104,7 @@ def test_delitem(self):
         def f():
             del s[0]

-        self.assertRaises(KeyError, f)
+        pytest.raises(KeyError, f)

         # only 1 left, del, add, del
         s = Series(1)
@@ -119,13 +137,13 @@ def test_getitem_setitem_ellipsis(self):
         assert_series_equal(result, s)

         s[...] = 5
-        self.assertTrue((result == 5).all())
+        assert (result == 5).all()

     def test_getitem_negative_out_of_bounds(self):
         s = Series(tm.rands_array(5, 10), index=tm.rands_array(10, 10))

-        self.assertRaises(IndexError, s.__getitem__, -11)
-        self.assertRaises(IndexError, s.__setitem__, -11, 'foo')
+        pytest.raises(IndexError, s.__getitem__, -11)
+        pytest.raises(IndexError, s.__setitem__, -11, 'foo')

     def test_pop(self):
         # GH 6600
@@ -133,7 +151,7 @@ def test_pop(self):
         k = df.iloc[4]

         result = k.pop('B')
-        self.assertEqual(result, 4)
+        assert result == 4

         expected = Series([0, 0], index=['A', 'C'], name=4)
         assert_series_equal(k, expected)
@@ -142,42 +160,29 @@ def test_getitem_get(self):
         idx1 = self.series.index[5]
         idx2 = self.objSeries.index[5]

-        self.assertEqual(self.series[idx1], self.series.get(idx1))
-        self.assertEqual(self.objSeries[idx2], self.objSeries.get(idx2))
+        assert self.series[idx1] == self.series.get(idx1)
+        assert self.objSeries[idx2] == self.objSeries.get(idx2)

-        self.assertEqual(self.series[idx1], self.series[5])
-        self.assertEqual(self.objSeries[idx2], self.objSeries[5])
+        assert self.series[idx1] == self.series[5]
+        assert self.objSeries[idx2] == self.objSeries[5]

-        self.assertEqual(
-            self.series.get(-1), self.series.get(self.series.index[-1]))
-        self.assertEqual(self.series[5], self.series.get(self.series.index[5]))
+        assert self.series.get(-1) == self.series.get(self.series.index[-1])
+        assert self.series[5] == self.series.get(self.series.index[5])

         # missing
         d = self.ts.index[0] - BDay()
-        self.assertRaises(KeyError, self.ts.__getitem__, d)
+        pytest.raises(KeyError, self.ts.__getitem__, d)

         # None
         # GH 5652
         for s in [Series(), Series(index=list('abc'))]:
             result = s.get(None)
-            self.assertIsNone(result)
+            assert result is None

-    def test_iget(self):
+    def test_iloc(self):
         s = Series(np.random.randn(10), index=lrange(0, 20, 2))

-        # 10711, deprecated
-        with tm.assert_produces_warning(FutureWarning):
-            s.iget(1)
-
-        # 10711, deprecated
-        with tm.assert_produces_warning(FutureWarning):
-            s.irow(1)
-
-        # 10711, deprecated
-        with tm.assert_produces_warning(FutureWarning):
-            s.iget_value(1)
-
         for i in range(len(s)):
             result = s.iloc[i]
             exp = s[s.index[i]]
@@ -190,16 +195,16 @@ def test_iget(self):

         # test slice is a view
         result[:] = 0
-        self.assertTrue((s[1:3] == 0).all())
+        assert (s[1:3] == 0).all()

         # list of integers
         result = s.iloc[[0, 2, 3, 4, 5]]
         expected = s.reindex(s.index[[0, 2, 3, 4, 5]])
         assert_series_equal(result, expected)

-    def test_iget_nonunique(self):
+    def test_iloc_nonunique(self):
         s = Series([0, 1, 2], index=[0, 1, 0])
-        self.assertEqual(s.iloc[2], 2)
+        assert s.iloc[2] == 2

     def test_getitem_regression(self):
         s = Series(lrange(5), index=lrange(5))
@@ -219,22 +224,22 @@ def test_getitem_setitem_slice_bug(self):
         s = Series(lrange(10), lrange(10))
         s[-12:] = 0
-        self.assertTrue((s == 0).all())
+        assert (s == 0).all()

         s[:-12] = 5
-        self.assertTrue((s == 0).all())
+        assert (s == 0).all()

     def test_getitem_int64(self):
         idx = np.int64(5)
-        self.assertEqual(self.ts[idx], self.ts[5])
+        assert self.ts[idx] == self.ts[5]

     def test_getitem_fancy(self):
         slice1 = self.series[[1, 2, 3]]
         slice2 = self.objSeries[[1, 2, 3]]
-        self.assertEqual(self.series.index[2], slice1.index[1])
-        self.assertEqual(self.objSeries.index[2], slice2.index[1])
-        self.assertEqual(self.series[2], slice1[1])
-        self.assertEqual(self.objSeries[2], slice2[1])
+        assert self.series.index[2] == slice1.index[1]
+        assert self.objSeries.index[2] == slice2.index[1]
+        assert self.series[2] == slice1[1]
+        assert self.objSeries[2] == slice2[1]

     def test_getitem_boolean(self):
         s = self.series
@@ -244,14 +249,14 @@ def test_getitem_boolean(self):
         result = s[list(mask)]
         expected = s[mask]
         assert_series_equal(result, expected)
-        self.assert_index_equal(result.index, s.index[mask])
+        tm.assert_index_equal(result.index, s.index[mask])

     def test_getitem_boolean_empty(self):
         s = Series([], dtype=np.int64)
         s.index.name = 'index_name'
-        s = s[s.isnull()]
-        self.assertEqual(s.index.name, 'index_name')
-        self.assertEqual(s.dtype, np.int64)
+        s = s[s.isna()]
+        assert s.index.name == 'index_name'
+        assert s.dtype == np.int64

         # GH5877
         # indexing with empty series
@@ -270,12 +275,12 @@ def test_getitem_boolean_empty(self):
         def f():
             s[Series([], dtype=bool)]

-        self.assertRaises(IndexingError, f)
+        pytest.raises(IndexingError, f)

         def f():
             s[Series([True], dtype=bool)]

-        self.assertRaises(IndexingError, f)
+        pytest.raises(IndexingError, f)

     def test_getitem_generator(self):
         gen = (x > 0 for x in self.series)
@@ -316,8 +321,8 @@ def test_getitem_boolean_object(self):

         # nans raise exception
         omask[5:10] = np.nan
-        self.assertRaises(Exception, s.__getitem__, omask)
-        self.assertRaises(Exception, s.__setitem__, omask, 5)
+        pytest.raises(Exception, s.__getitem__, omask)
+        pytest.raises(Exception, s.__setitem__, omask, 5)

     def test_getitem_setitem_boolean_corner(self):
         ts = self.ts
@@ -325,13 +330,13 @@

         # these used to raise...??
- self.assertRaises(Exception, ts.__getitem__, mask_shifted) - self.assertRaises(Exception, ts.__setitem__, mask_shifted, 1) + pytest.raises(Exception, ts.__getitem__, mask_shifted) + pytest.raises(Exception, ts.__setitem__, mask_shifted, 1) # ts[mask_shifted] # ts[mask_shifted] = 1 - self.assertRaises(Exception, ts.loc.__getitem__, mask_shifted) - self.assertRaises(Exception, ts.loc.__setitem__, mask_shifted, 1) + pytest.raises(Exception, ts.loc.__getitem__, mask_shifted) + pytest.raises(Exception, ts.loc.__setitem__, mask_shifted, 1) # ts.loc[mask_shifted] # ts.loc[mask_shifted] = 2 @@ -343,38 +348,242 @@ def test_getitem_setitem_slice_integers(self): assert_series_equal(result, expected) s[:4] = 0 - self.assertTrue((s[:4] == 0).all()) - self.assertTrue(not (s[4:] == 0).any()) + assert (s[:4] == 0).all() + assert not (s[4:] == 0).any() + + def test_getitem_setitem_datetime_tz_pytz(self): + from pytz import timezone as tz + from pandas import date_range + + N = 50 + # testing with timezone, GH #2785 + rng = date_range('1/1/1990', periods=N, freq='H', tz='US/Eastern') + ts = Series(np.random.randn(N), index=rng) + + # also test Timestamp tz handling, GH #2789 + result = ts.copy() + result["1990-01-01 09:00:00+00:00"] = 0 + result["1990-01-01 09:00:00+00:00"] = ts[4] + assert_series_equal(result, ts) + + result = ts.copy() + result["1990-01-01 03:00:00-06:00"] = 0 + result["1990-01-01 03:00:00-06:00"] = ts[4] + assert_series_equal(result, ts) + + # repeat with datetimes + result = ts.copy() + result[datetime(1990, 1, 1, 9, tzinfo=tz('UTC'))] = 0 + result[datetime(1990, 1, 1, 9, tzinfo=tz('UTC'))] = ts[4] + assert_series_equal(result, ts) + + result = ts.copy() + + # comparison dates with datetime MUST be localized! + date = tz('US/Central').localize(datetime(1990, 1, 1, 3)) + result[date] = 0 + result[date] = ts[4] + assert_series_equal(result, ts) + + def test_getitem_setitem_datetime_tz_dateutil(self): + from dateutil.tz import tzutc + from pandas._libs.tslib import _dateutil_gettz as gettz + + tz = lambda x: tzutc() if x == 'UTC' else gettz( + x) # handle special case for utc in dateutil + + from pandas import date_range + + N = 50 + + # testing with timezone, GH #2785 + rng = date_range('1/1/1990', periods=N, freq='H', + tz='America/New_York') + ts = Series(np.random.randn(N), index=rng) + + # also test Timestamp tz handling, GH #2789 + result = ts.copy() + result["1990-01-01 09:00:00+00:00"] = 0 + result["1990-01-01 09:00:00+00:00"] = ts[4] + assert_series_equal(result, ts) + + result = ts.copy() + result["1990-01-01 03:00:00-06:00"] = 0 + result["1990-01-01 03:00:00-06:00"] = ts[4] + assert_series_equal(result, ts) + + # repeat with datetimes + result = ts.copy() + result[datetime(1990, 1, 1, 9, tzinfo=tz('UTC'))] = 0 + result[datetime(1990, 1, 1, 9, tzinfo=tz('UTC'))] = ts[4] + assert_series_equal(result, ts) + + result = ts.copy() + result[datetime(1990, 1, 1, 3, tzinfo=tz('America/Chicago'))] = 0 + result[datetime(1990, 1, 1, 3, tzinfo=tz('America/Chicago'))] = ts[4] + assert_series_equal(result, ts) + + def test_getitem_setitem_datetimeindex(self): + N = 50 + # testing with timezone, GH #2785 + rng = date_range('1/1/1990', periods=N, freq='H', tz='US/Eastern') + ts = Series(np.random.randn(N), index=rng) + + result = ts["1990-01-01 04:00:00"] + expected = ts[4] + assert result == expected + + result = ts.copy() + result["1990-01-01 04:00:00"] = 0 + result["1990-01-01 04:00:00"] = ts[4] + assert_series_equal(result, ts) + + result = ts["1990-01-01 04:00:00":"1990-01-01 07:00:00"] + 
expected = ts[4:8] + assert_series_equal(result, expected) + + result = ts.copy() + result["1990-01-01 04:00:00":"1990-01-01 07:00:00"] = 0 + result["1990-01-01 04:00:00":"1990-01-01 07:00:00"] = ts[4:8] + assert_series_equal(result, ts) + + lb = "1990-01-01 04:00:00" + rb = "1990-01-01 07:00:00" + result = ts[(ts.index >= lb) & (ts.index <= rb)] + expected = ts[4:8] + assert_series_equal(result, expected) + + # repeat all the above with naive datetimes + result = ts[datetime(1990, 1, 1, 4)] + expected = ts[4] + assert result == expected + + result = ts.copy() + result[datetime(1990, 1, 1, 4)] = 0 + result[datetime(1990, 1, 1, 4)] = ts[4] + assert_series_equal(result, ts) + + result = ts[datetime(1990, 1, 1, 4):datetime(1990, 1, 1, 7)] + expected = ts[4:8] + assert_series_equal(result, expected) + + result = ts.copy() + result[datetime(1990, 1, 1, 4):datetime(1990, 1, 1, 7)] = 0 + result[datetime(1990, 1, 1, 4):datetime(1990, 1, 1, 7)] = ts[4:8] + assert_series_equal(result, ts) + + lb = datetime(1990, 1, 1, 4) + rb = datetime(1990, 1, 1, 7) + result = ts[(ts.index >= lb) & (ts.index <= rb)] + expected = ts[4:8] + assert_series_equal(result, expected) + + result = ts[ts.index[4]] + expected = ts[4] + assert result == expected + + result = ts[ts.index[4:8]] + expected = ts[4:8] + assert_series_equal(result, expected) + + result = ts.copy() + result[ts.index[4:8]] = 0 + result[4:8] = ts[4:8] + assert_series_equal(result, ts) + + # also test partial date slicing + result = ts["1990-01-02"] + expected = ts[24:48] + assert_series_equal(result, expected) + + result = ts.copy() + result["1990-01-02"] = 0 + result["1990-01-02"] = ts[24:48] + assert_series_equal(result, ts) + + def test_getitem_setitem_periodindex(self): + from pandas import period_range + + N = 50 + rng = period_range('1/1/1990', periods=N, freq='H') + ts = Series(np.random.randn(N), index=rng) + + result = ts["1990-01-01 04"] + expected = ts[4] + assert result == expected + + result = ts.copy() + result["1990-01-01 04"] = 0 + result["1990-01-01 04"] = ts[4] + assert_series_equal(result, ts) + + result = ts["1990-01-01 04":"1990-01-01 07"] + expected = ts[4:8] + assert_series_equal(result, expected) + + result = ts.copy() + result["1990-01-01 04":"1990-01-01 07"] = 0 + result["1990-01-01 04":"1990-01-01 07"] = ts[4:8] + assert_series_equal(result, ts) + + lb = "1990-01-01 04" + rb = "1990-01-01 07" + result = ts[(ts.index >= lb) & (ts.index <= rb)] + expected = ts[4:8] + assert_series_equal(result, expected) + + # GH 2782 + result = ts[ts.index[4]] + expected = ts[4] + assert result == expected + + result = ts[ts.index[4:8]] + expected = ts[4:8] + assert_series_equal(result, expected) + + result = ts.copy() + result[ts.index[4:8]] = 0 + result[4:8] = ts[4:8] + assert_series_equal(result, ts) + + def test_getitem_median_slice_bug(self): + index = date_range('20090415', '20090519', freq='2B') + s = Series(np.random.randn(13), index=index) + + indexer = [slice(6, 7, None)] + result = s[indexer] + expected = s[indexer[0]] + assert_series_equal(result, expected) def test_getitem_out_of_bounds(self): # don't segfault, GH #495 - self.assertRaises(IndexError, self.ts.__getitem__, len(self.ts)) + pytest.raises(IndexError, self.ts.__getitem__, len(self.ts)) # GH #917 s = Series([]) - self.assertRaises(IndexError, s.__getitem__, -1) + pytest.raises(IndexError, s.__getitem__, -1) def test_getitem_setitem_integers(self): # caused bug without test s = Series([1, 2, 3], ['a', 'b', 'c']) - self.assertEqual(s.iloc[0], s['a']) + assert s.iloc[0] == 
s['a'] s.iloc[0] = 5 - self.assertAlmostEqual(s['a'], 5) + tm.assert_almost_equal(s['a'], 5) def test_getitem_box_float64(self): value = self.ts[5] - tm.assertIsInstance(value, np.float64) + assert isinstance(value, np.float64) def test_getitem_ambiguous_keyerror(self): s = Series(lrange(10), index=lrange(0, 20, 2)) - self.assertRaises(KeyError, s.__getitem__, 1) - self.assertRaises(KeyError, s.loc.__getitem__, 1) + pytest.raises(KeyError, s.__getitem__, 1) + pytest.raises(KeyError, s.loc.__getitem__, 1) def test_getitem_unordered_dup(self): obj = Series(lrange(5), index=['c', 'a', 'a', 'b', 'b']) - self.assertTrue(is_scalar(obj['c'])) - self.assertEqual(obj['c'], 0) + assert is_scalar(obj['c']) + assert obj['c'] == 0 def test_getitem_dups_with_missing(self): @@ -395,13 +604,13 @@ def test_getitem_dataframe(self): rng = list(range(10)) s = pd.Series(10, index=rng) df = pd.DataFrame(rng, index=rng) - self.assertRaises(TypeError, s.__getitem__, df > 5) + pytest.raises(TypeError, s.__getitem__, df > 5) def test_getitem_callable(self): # GH 12533 s = pd.Series(4, index=list('ABCD')) result = s[lambda x: 'A'] - self.assertEqual(result, s.loc['A']) + assert result == s.loc['A'] result = s[lambda x: ['A', 'B']] tm.assert_series_equal(result, s.loc[['A', 'B']]) @@ -454,22 +663,20 @@ def test_slice(self): numSliceEnd = self.series[-10:] objSlice = self.objSeries[10:20] - self.assertNotIn(self.series.index[9], numSlice.index) - self.assertNotIn(self.objSeries.index[9], objSlice.index) + assert self.series.index[9] not in numSlice.index + assert self.objSeries.index[9] not in objSlice.index - self.assertEqual(len(numSlice), len(numSlice.index)) - self.assertEqual(self.series[numSlice.index[0]], - numSlice[numSlice.index[0]]) + assert len(numSlice) == len(numSlice.index) + assert self.series[numSlice.index[0]] == numSlice[numSlice.index[0]] - self.assertEqual(numSlice.index[1], self.series.index[11]) + assert numSlice.index[1] == self.series.index[11] + assert tm.equalContents(numSliceEnd, np.array(self.series)[-10:]) - self.assertTrue(tm.equalContents(numSliceEnd, np.array(self.series)[ - -10:])) - - # test return view + # Test return view. 
sl = self.series[10:20] sl[:] = 0 - self.assertTrue((self.series[10:20] == 0).all()) + + assert (self.series[10:20] == 0).all() def test_slice_can_reorder_not_uniquely_indexed(self): s = Series(1, index=['a', 'a', 'b', 'b', 'c']) @@ -477,27 +684,27 @@ def test_slice_can_reorder_not_uniquely_indexed(self): def test_slice_float_get_set(self): - self.assertRaises(TypeError, lambda: self.ts[4.0:10.0]) + pytest.raises(TypeError, lambda: self.ts[4.0:10.0]) def f(): self.ts[4.0:10.0] = 0 - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) - self.assertRaises(TypeError, self.ts.__getitem__, slice(4.5, 10.0)) - self.assertRaises(TypeError, self.ts.__setitem__, slice(4.5, 10.0), 0) + pytest.raises(TypeError, self.ts.__getitem__, slice(4.5, 10.0)) + pytest.raises(TypeError, self.ts.__setitem__, slice(4.5, 10.0), 0) def test_slice_floats2(self): s = Series(np.random.rand(10), index=np.arange(10, 20, dtype=float)) - self.assertEqual(len(s.loc[12.0:]), 8) - self.assertEqual(len(s.loc[12.5:]), 7) + assert len(s.loc[12.0:]) == 8 + assert len(s.loc[12.5:]) == 7 i = np.arange(10, 20, dtype=float) i[2] = 12.2 s.index = i - self.assertEqual(len(s.loc[12.0:]), 8) - self.assertEqual(len(s.loc[12.5:]), 7) + assert len(s.loc[12.0:]) == 8 + assert len(s.loc[12.5:]) == 7 def test_slice_float64(self): @@ -528,17 +735,17 @@ def test_setitem(self): self.ts[self.ts.index[5]] = np.NaN self.ts[[1, 2, 17]] = np.NaN self.ts[6] = np.NaN - self.assertTrue(np.isnan(self.ts[6])) - self.assertTrue(np.isnan(self.ts[2])) + assert np.isnan(self.ts[6]) + assert np.isnan(self.ts[2]) self.ts[np.isnan(self.ts)] = 5 - self.assertFalse(np.isnan(self.ts[2])) + assert not np.isnan(self.ts[2]) # caught this bug when writing tests series = Series(tm.makeIntIndex(20).astype(float), index=tm.makeIntIndex(20)) series[::2] = 0 - self.assertTrue((series[::2] == 0).all()) + assert (series[::2] == 0).all() # set item that's not contained s = self.series.copy() @@ -589,31 +796,31 @@ def test_setitem_dtypes(self): def test_set_value(self): idx = self.ts.index[10] res = self.ts.set_value(idx, 0) - self.assertIs(res, self.ts) - self.assertEqual(self.ts[idx], 0) + assert res is self.ts + assert self.ts[idx] == 0 # equiv s = self.series.copy() res = s.set_value('foobar', 0) - self.assertIs(res, s) - self.assertEqual(res.index[-1], 'foobar') - self.assertEqual(res['foobar'], 0) + assert res is s + assert res.index[-1] == 'foobar' + assert res['foobar'] == 0 s = self.series.copy() s.loc['foobar'] = 0 - self.assertEqual(s.index[-1], 'foobar') - self.assertEqual(s['foobar'], 0) + assert s.index[-1] == 'foobar' + assert s['foobar'] == 0 def test_setslice(self): sl = self.ts[5:20] - self.assertEqual(len(sl), len(sl.index)) - self.assertTrue(sl.index.is_unique) + assert len(sl) == len(sl.index) + assert sl.index.is_unique def test_basic_getitem_setitem_corner(self): # invalid tuples, e.g. self.ts[:, None] vs. self.ts[:, 2] - with tm.assertRaisesRegexp(ValueError, 'tuple-index'): + with tm.assert_raises_regex(ValueError, 'tuple-index'): self.ts[:, 2] - with tm.assertRaisesRegexp(ValueError, 'tuple-index'): + with tm.assert_raises_regex(ValueError, 'tuple-index'): self.ts[:, 2] = 2 # weird lists. 
[slice(0, 5)] will work but not two slices @@ -622,10 +829,10 @@ def test_basic_getitem_setitem_corner(self): assert_series_equal(result, expected) # OK - self.assertRaises(Exception, self.ts.__getitem__, - [5, slice(None, None)]) - self.assertRaises(Exception, self.ts.__setitem__, - [5, slice(None, None)], 2) + pytest.raises(Exception, self.ts.__getitem__, + [5, slice(None, None)]) + pytest.raises(Exception, self.ts.__setitem__, + [5, slice(None, None)], 2) def test_basic_getitem_with_labels(self): indices = self.ts.index[[5, 10, 15]] @@ -656,11 +863,11 @@ def test_basic_getitem_with_labels(self): index=['a', 'b', 'c']) expected = Timestamp('2011-01-01', tz='US/Eastern') result = s.loc['a'] - self.assertEqual(result, expected) + assert result == expected result = s.iloc[0] - self.assertEqual(result, expected) + assert result == expected result = s['a'] - self.assertEqual(result, expected) + assert result == expected def test_basic_setitem_with_labels(self): indices = self.ts.index[[5, 10, 15]] @@ -696,8 +903,8 @@ def test_basic_setitem_with_labels(self): inds_notfound = [0, 4, 5, 6] arr_inds_notfound = np.array([0, 4, 5, 6]) - self.assertRaises(Exception, s.__setitem__, inds_notfound, 0) - self.assertRaises(Exception, s.__setitem__, arr_inds_notfound, 0) + pytest.raises(Exception, s.__setitem__, inds_notfound, 0) + pytest.raises(Exception, s.__setitem__, arr_inds_notfound, 0) # GH12089 # with tz for values @@ -707,17 +914,17 @@ def test_basic_setitem_with_labels(self): expected = Timestamp('2011-01-03', tz='US/Eastern') s2.loc['a'] = expected result = s2.loc['a'] - self.assertEqual(result, expected) + assert result == expected s2 = s.copy() s2.iloc[0] = expected result = s2.iloc[0] - self.assertEqual(result, expected) + assert result == expected s2 = s.copy() s2['a'] = expected result = s2['a'] - self.assertEqual(result, expected) + assert result == expected def test_loc_getitem(self): inds = self.series.index[[3, 4, 7]] @@ -735,16 +942,16 @@ def test_loc_getitem(self): assert_series_equal(self.series.loc[mask], self.series[mask]) # ask for index value - self.assertEqual(self.ts.loc[d1], self.ts[d1]) - self.assertEqual(self.ts.loc[d2], self.ts[d2]) + assert self.ts.loc[d1] == self.ts[d1] + assert self.ts.loc[d2] == self.ts[d2] def test_loc_getitem_not_monotonic(self): d1, d2 = self.ts.index[[5, 15]] ts2 = self.ts[::2][[1, 2, 0]] - self.assertRaises(KeyError, ts2.loc.__getitem__, slice(d1, d2)) - self.assertRaises(KeyError, ts2.loc.__setitem__, slice(d1, d2), 0) + pytest.raises(KeyError, ts2.loc.__getitem__, slice(d1, d2)) + pytest.raises(KeyError, ts2.loc.__setitem__, slice(d1, d2), 0) def test_loc_getitem_setitem_integer_slice_keyerrors(self): s = Series(np.random.randn(10), index=lrange(0, 20, 2)) @@ -752,12 +959,12 @@ def test_loc_getitem_setitem_integer_slice_keyerrors(self): # this is OK cp = s.copy() cp.iloc[4:10] = 0 - self.assertTrue((cp.iloc[4:10] == 0).all()) + assert (cp.iloc[4:10] == 0).all() # so is this cp = s.copy() cp.iloc[3:11] = 0 - self.assertTrue((cp.iloc[3:11] == 0).values.all()) + assert (cp.iloc[3:11] == 0).values.all() result = s.iloc[2:6] result2 = s.loc[3:11] @@ -768,8 +975,8 @@ def test_loc_getitem_setitem_integer_slice_keyerrors(self): # non-monotonic, raise KeyError s2 = s.iloc[lrange(5) + lrange(5, 10)[::-1]] - self.assertRaises(KeyError, s2.loc.__getitem__, slice(3, 11)) - self.assertRaises(KeyError, s2.loc.__setitem__, slice(3, 11), 0) + pytest.raises(KeyError, s2.loc.__getitem__, slice(3, 11)) + pytest.raises(KeyError, s2.loc.__setitem__, slice(3, 11), 0) 
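The hunk above is typical of the mechanical conversion this patch applies across the whole test suite: `unittest.TestCase` helpers such as `self.assertRaises`, `self.assertEqual`, and `self.assertTrue` become `pytest.raises` and bare `assert` statements. A minimal, self-contained sketch of the target idiom, mirroring the non-monotonic slice case above (the test name and the `range`-based index shuffle here are illustrative, not part of the patch):

```python
import numpy as np
import pytest
from pandas import Series


def test_loc_slice_requires_monotonic_index():
    # Even-integer index, then reorder the tail so the index is no
    # longer monotonic (mirrors s.iloc[lrange(5) + lrange(5, 10)[::-1]]).
    s = Series(np.random.randn(10), index=list(range(0, 20, 2)))
    s2 = s.iloc[list(range(5)) + list(range(5, 10))[::-1]]

    # Label slicing a non-monotonic index raises KeyError; the context
    # manager replaces the old self.assertRaises(...) call style.
    with pytest.raises(KeyError):
        s2.loc[3:11]
```

The functional form used throughout the diff, `pytest.raises(KeyError, s2.loc.__getitem__, slice(3, 11))`, is equivalent; the context-manager form tends to read better when the failing operation is not expressible as a single plain call.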
def test_loc_getitem_iterator(self): idx = iter(self.series.index[:10]) @@ -780,7 +987,7 @@ def test_setitem_with_tz(self): for tz in ['US/Eastern', 'UTC', 'Asia/Tokyo']: orig = pd.Series(pd.date_range('2016-01-01', freq='H', periods=3, tz=tz)) - self.assertEqual(orig.dtype, 'datetime64[ns, {0}]'.format(tz)) + assert orig.dtype == 'datetime64[ns, {0}]'.format(tz) # scalar s = orig.copy() @@ -801,7 +1008,7 @@ def test_setitem_with_tz(self): # vector vals = pd.Series([pd.Timestamp('2011-01-01', tz=tz), pd.Timestamp('2012-01-01', tz=tz)], index=[1, 2]) - self.assertEqual(vals.dtype, 'datetime64[ns, {0}]'.format(tz)) + assert vals.dtype == 'datetime64[ns, {0}]'.format(tz) s[[1, 2]] = vals exp = pd.Series([pd.Timestamp('2016-01-01 00:00', tz=tz), @@ -822,14 +1029,14 @@ def test_setitem_with_tz_dst(self): tz = 'US/Eastern' orig = pd.Series(pd.date_range('2016-11-06', freq='H', periods=3, tz=tz)) - self.assertEqual(orig.dtype, 'datetime64[ns, {0}]'.format(tz)) + assert orig.dtype == 'datetime64[ns, {0}]'.format(tz) # scalar s = orig.copy() s[1] = pd.Timestamp('2011-01-01', tz=tz) - exp = pd.Series([pd.Timestamp('2016-11-06 00:00', tz=tz), - pd.Timestamp('2011-01-01 00:00', tz=tz), - pd.Timestamp('2016-11-06 02:00', tz=tz)]) + exp = pd.Series([pd.Timestamp('2016-11-06 00:00-04:00', tz=tz), + pd.Timestamp('2011-01-01 00:00-05:00', tz=tz), + pd.Timestamp('2016-11-06 01:00-05:00', tz=tz)]) tm.assert_series_equal(s, exp) s = orig.copy() @@ -843,7 +1050,7 @@ def test_setitem_with_tz_dst(self): # vector vals = pd.Series([pd.Timestamp('2011-01-01', tz=tz), pd.Timestamp('2012-01-01', tz=tz)], index=[1, 2]) - self.assertEqual(vals.dtype, 'datetime64[ns, {0}]'.format(tz)) + assert vals.dtype == 'datetime64[ns, {0}]'.format(tz) s[[1, 2]] = vals exp = pd.Series([pd.Timestamp('2016-11-06 00:00', tz=tz), @@ -887,8 +1094,13 @@ def test_where(self): rs = s2.where(cond[:3], -s2) assert_series_equal(rs, expected) - self.assertRaises(ValueError, s.where, 1) - self.assertRaises(ValueError, s.where, cond[:3].values, -s) + def test_where_error(self): + + s = Series(np.random.randn(5)) + cond = s > 0 + + pytest.raises(ValueError, s.where, 1) + pytest.raises(ValueError, s.where, cond[:3].values, -s) # GH 2745 s = Series([1, 2]) @@ -897,10 +1109,12 @@ def test_where(self): assert_series_equal(s, expected) # failures - self.assertRaises(ValueError, s.__setitem__, tuple([[[True, False]]]), - [0, 2, 3]) - self.assertRaises(ValueError, s.__setitem__, tuple([[[True, False]]]), - []) + pytest.raises(ValueError, s.__setitem__, tuple([[[True, False]]]), + [0, 2, 3]) + pytest.raises(ValueError, s.__setitem__, tuple([[[True, False]]]), + []) + + def test_where_unsafe(self): # unsafe dtype changes for dtype in [np.int8, np.int16, np.int32, np.int64, np.float16, @@ -910,7 +1124,7 @@ def test_where(self): s[mask] = lrange(2, 7) expected = Series(lrange(2, 7) + lrange(5, 10), dtype=dtype) assert_series_equal(s, expected) - self.assertEqual(s.dtype, expected.dtype) + assert s.dtype == expected.dtype # these are allowed operations, but are upcasted for dtype in [np.int64, np.float64]: @@ -920,7 +1134,7 @@ def test_where(self): s[mask] = values expected = Series(values + lrange(5, 10), dtype='float64') assert_series_equal(s, expected) - self.assertEqual(s.dtype, expected.dtype) + assert s.dtype == expected.dtype # GH 9731 s = Series(np.arange(10), dtype='int64') @@ -936,7 +1150,7 @@ def test_where(self): s = Series(np.arange(10), dtype=dtype) mask = s < 5 values = [2.5, 3.5, 4.5, 5.5, 6.5] - self.assertRaises(Exception, s.__setitem__, 
tuple(mask), values) + pytest.raises(Exception, s.__setitem__, tuple(mask), values) # GH3235 s = Series(np.arange(10), dtype='int64') @@ -944,7 +1158,7 @@ def test_where(self): s[mask] = lrange(2, 7) expected = Series(lrange(2, 7) + lrange(5, 10), dtype='int64') assert_series_equal(s, expected) - self.assertEqual(s.dtype, expected.dtype) + assert s.dtype == expected.dtype s = Series(np.arange(10), dtype='int64') mask = s > 5 @@ -958,12 +1172,12 @@ def test_where(self): def f(): s[mask] = [5, 4, 3, 2, 1] - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) def f(): s[mask] = [0] * 5 - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # dtype changes s = Series([1, 2, 3, 4]) @@ -976,14 +1190,68 @@ def f(): s = Series(range(10)).astype(float) s[8] = None result = s[8] - self.assertTrue(isnull(result)) + assert isna(result) s = Series(range(10)).astype(float) s[s > 8] = None - result = s[isnull(s)] + result = s[isna(s)] expected = Series(np.nan, index=[9]) assert_series_equal(result, expected) + def test_where_array_like(self): + # see gh-15414 + s = Series([1, 2, 3]) + cond = [False, True, True] + expected = Series([np.nan, 2, 3]) + klasses = [list, tuple, np.array, Series] + + for klass in klasses: + result = s.where(klass(cond)) + assert_series_equal(result, expected) + + def test_where_invalid_input(self): + # see gh-15414: only boolean arrays accepted + s = Series([1, 2, 3]) + msg = "Boolean array expected for the condition" + + conds = [ + [1, 0, 1], + Series([2, 5, 7]), + ["True", "False", "True"], + [Timestamp("2017-01-01"), + pd.NaT, Timestamp("2017-01-02")] + ] + + for cond in conds: + with tm.assert_raises_regex(ValueError, msg): + s.where(cond) + + msg = "Array conditional must be same shape as self" + with tm.assert_raises_regex(ValueError, msg): + s.where([True]) + + def test_where_ndframe_align(self): + msg = "Array conditional must be same shape as self" + s = Series([1, 2, 3]) + + cond = [True] + with tm.assert_raises_regex(ValueError, msg): + s.where(cond) + + expected = Series([1, np.nan, np.nan]) + + out = s.where(Series(cond)) + tm.assert_series_equal(out, expected) + + cond = np.array([False, True, False, True]) + with tm.assert_raises_regex(ValueError, msg): + s.where(cond) + + expected = Series([np.nan, 2, np.nan]) + + out = s.where(Series(cond)) + tm.assert_series_equal(out, expected) + def test_where_setitem_invalid(self): # GH 2702 @@ -995,7 +1263,7 @@ def test_where_setitem_invalid(self): def f(): s[0:3] = list(range(27)) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) s[0:3] = list(range(3)) expected = Series([0, 1, 2]) @@ -1007,7 +1275,7 @@ def f(): def f(): s[0:4:2] = list(range(27)) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) s = Series(list('abcdef')) s[0:4:2] = list(range(2)) @@ -1020,7 +1288,7 @@ def f(): def f(): s[:-1] = list(range(27)) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) s[-3:-1] = list(range(2)) expected = Series(['a', 'b', 'c', 0, 1, 'f']) @@ -1032,14 +1300,14 @@ def f(): def f(): s[[0, 1, 2]] = list(range(27)) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) s = Series(list('abc')) def f(): s[[0, 1, 2]] = list(range(2)) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # scalar s = Series(list('abc')) @@ -1113,9 +1381,9 @@ def test_where_dups(self): expected = Series([5, 11, 2, 5, 11, 2], index=[0, 1, 2, 0, 1, 2]) assert_series_equal(comb, expected) - def test_where_datetime(self): + def test_where_datetime_conversion(self): 
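The renames in this region (`test_where_datetime` → `test_where_datetime_conversion` above, and `test_where_timedelta` → `test_where_timedelta_coerce` further down) track a behavior change that the updated `expected` values encode: when the replacement values cannot be represented in the original `datetime64[ns]`/`timedelta64[ns]` dtype, `where` now coerces the result to the fill values' dtype instead of casting the fills back. A short sketch of the post-change behavior that the renamed test (whose body continues directly below) pins down:

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range('20130102', periods=2))
mask = np.array([False, False])

# Integers cannot be stored in a datetime64[ns] Series, so the result
# is coerced to the fill values' dtype rather than kept as datetime64.
rs = s.where(mask, [10, 10])
assert rs.dtype != 'datetime64[ns]'

# A mix of float and NaN coerces to object, matching the updated
# expected = Series([10, None], dtype='object') in the hunk below.
rs = s.where(mask, [10.0, np.nan])
assert rs.dtype == object
```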
s = Series(date_range('20130102', periods=2)) - expected = Series([10, 10], dtype='datetime64[ns]') + expected = Series([10, 10]) mask = np.array([False, False]) rs = s.where(mask, [10, 10]) @@ -1131,12 +1399,20 @@ def test_where_datetime(self): assert_series_equal(rs, expected) rs = s.where(mask, [10.0, np.nan]) - expected = Series([10, None], dtype='datetime64[ns]') + expected = Series([10, None], dtype='object') + assert_series_equal(rs, expected) + + # GH 15701 + timestamps = ['2016-12-31 12:00:04+00:00', + '2016-12-31 12:00:04.010000+00:00'] + s = Series([pd.Timestamp(t) for t in timestamps]) + rs = s.where(Series([False, True])) + expected = Series([pd.NaT, s[1]]) assert_series_equal(rs, expected) - def test_where_timedelta(self): + def test_where_timedelta_coerce(self): s = Series([1, 2], dtype='timedelta64[ns]') - expected = Series([10, 10], dtype='timedelta64[ns]') + expected = Series([10, 10]) mask = np.array([False, False]) rs = s.where(mask, [10, 10]) @@ -1152,7 +1428,7 @@ def test_where_timedelta(self): assert_series_equal(rs, expected) rs = s.where(mask, [10.0, np.nan]) - expected = Series([10, None], dtype='timedelta64[ns]') + expected = Series([10, None], dtype='object') assert_series_equal(rs, expected) def test_mask(self): @@ -1181,8 +1457,8 @@ def test_mask(self): rs2 = s2.mask(cond[:3], -s2) assert_series_equal(rs, rs2) - self.assertRaises(ValueError, s.mask, 1) - self.assertRaises(ValueError, s.mask, cond[:3].values, -s) + pytest.raises(ValueError, s.mask, 1) + pytest.raises(ValueError, s.mask, cond[:3].values, -s) # dtype changes s = Series([1, 2, 3, 4]) @@ -1247,33 +1523,33 @@ def test_ix_setitem(self): # set index value self.series.loc[d1] = 4 self.series.loc[d2] = 6 - self.assertEqual(self.series[d1], 4) - self.assertEqual(self.series[d2], 6) + assert self.series[d1] == 4 + assert self.series[d2] == 6 def test_where_numeric_with_string(self): # GH 9280 s = pd.Series([1, 2, 3]) w = s.where(s > 1, 'X') - self.assertFalse(is_integer(w[0])) - self.assertTrue(is_integer(w[1])) - self.assertTrue(is_integer(w[2])) - self.assertTrue(isinstance(w[0], str)) - self.assertTrue(w.dtype == 'object') + assert not is_integer(w[0]) + assert is_integer(w[1]) + assert is_integer(w[2]) + assert isinstance(w[0], str) + assert w.dtype == 'object' w = s.where(s > 1, ['X', 'Y', 'Z']) - self.assertFalse(is_integer(w[0])) - self.assertTrue(is_integer(w[1])) - self.assertTrue(is_integer(w[2])) - self.assertTrue(isinstance(w[0], str)) - self.assertTrue(w.dtype == 'object') + assert not is_integer(w[0]) + assert is_integer(w[1]) + assert is_integer(w[2]) + assert isinstance(w[0], str) + assert w.dtype == 'object' w = s.where(s > 1, np.array(['X', 'Y', 'Z'])) - self.assertFalse(is_integer(w[0])) - self.assertTrue(is_integer(w[1])) - self.assertTrue(is_integer(w[2])) - self.assertTrue(isinstance(w[0], str)) - self.assertTrue(w.dtype == 'object') + assert not is_integer(w[0]) + assert is_integer(w[1]) + assert is_integer(w[2]) + assert isinstance(w[0], str) + assert w.dtype == 'object' def test_setitem_boolean(self): mask = self.series > self.series.median() @@ -1303,8 +1579,8 @@ def test_ix_setitem_boolean(self): def test_ix_setitem_corner(self): inds = list(self.series.index[[5, 8, 12]]) self.series.loc[inds] = 5 - self.assertRaises(Exception, self.series.loc.__setitem__, - inds + ['foo'], 5) + pytest.raises(Exception, self.series.loc.__setitem__, + inds + ['foo'], 5) def test_get_set_boolean_different_order(self): ordered = self.series.sort_values() @@ -1345,29 +1621,29 @@ def 
test_setitem_na(self): def test_basic_indexing(self): s = Series(np.random.randn(5), index=['a', 'b', 'a', 'a', 'b']) - self.assertRaises(IndexError, s.__getitem__, 5) - self.assertRaises(IndexError, s.__setitem__, 5, 0) + pytest.raises(IndexError, s.__getitem__, 5) + pytest.raises(IndexError, s.__setitem__, 5, 0) - self.assertRaises(KeyError, s.__getitem__, 'c') + pytest.raises(KeyError, s.__getitem__, 'c') s = s.sort_index() - self.assertRaises(IndexError, s.__getitem__, 5) - self.assertRaises(IndexError, s.__setitem__, 5, 0) + pytest.raises(IndexError, s.__getitem__, 5) + pytest.raises(IndexError, s.__setitem__, 5, 0) def test_int_indexing(self): s = Series(np.random.randn(6), index=[0, 0, 1, 1, 2, 2]) - self.assertRaises(KeyError, s.__getitem__, 5) + pytest.raises(KeyError, s.__getitem__, 5) - self.assertRaises(KeyError, s.__getitem__, 'c') + pytest.raises(KeyError, s.__getitem__, 'c') # not monotonic s = Series(np.random.randn(6), index=[2, 2, 0, 0, 1, 1]) - self.assertRaises(KeyError, s.__getitem__, 5) + pytest.raises(KeyError, s.__getitem__, 5) - self.assertRaises(KeyError, s.__getitem__, 'c') + pytest.raises(KeyError, s.__getitem__, 'c') def test_datetime_indexing(self): from pandas import date_range @@ -1378,17 +1654,17 @@ def test_datetime_indexing(self): s = Series(len(index), index=index) stamp = Timestamp('1/8/2000') - self.assertRaises(KeyError, s.__getitem__, stamp) + pytest.raises(KeyError, s.__getitem__, stamp) s[stamp] = 0 - self.assertEqual(s[stamp], 0) + assert s[stamp] == 0 # not monotonic s = Series(len(index), index=index) s = s[::-1] - self.assertRaises(KeyError, s.__getitem__, stamp) + pytest.raises(KeyError, s.__getitem__, stamp) s[stamp] = 0 - self.assertEqual(s[stamp], 0) + assert s[stamp] == 0 def test_timedelta_assignment(self): # GH 8209 @@ -1443,7 +1719,7 @@ def test_underlying_data_conversion(self): df_tmp = df.iloc[ck] # noqa df["bb"].iloc[0] = .15 - self.assertEqual(df['bb'].iloc[0], 0.15) + assert df['bb'].iloc[0] == 0.15 pd.set_option('chained_assignment', 'raise') # GH 3217 @@ -1457,7 +1733,7 @@ def test_underlying_data_conversion(self): def test_preserveRefs(self): seq = self.ts[[5, 10, 15]] seq[1] = np.NaN - self.assertFalse(np.isnan(self.ts[10])) + assert not np.isnan(self.ts[10]) def test_drop(self): @@ -1486,8 +1762,8 @@ def test_drop(self): # single string/tuple-like s = Series(range(3), index=list('abc')) - self.assertRaises(ValueError, s.drop, 'bc') - self.assertRaises(ValueError, s.drop, ('a', )) + pytest.raises(ValueError, s.drop, 'bc') + pytest.raises(ValueError, s.drop, ('a', )) # errors='ignore' s = Series(range(3), index=list('abc')) @@ -1498,11 +1774,11 @@ def test_drop(self): assert_series_equal(result, expected) # bad axis - self.assertRaises(ValueError, s.drop, 'one', axis='columns') + pytest.raises(ValueError, s.drop, 'one', axis='columns') # GH 8522 s = Series([2, 3], index=[True, False]) - self.assertTrue(s.index.is_object()) + assert s.index.is_object() result = s.drop(True) expected = Series([3], index=[False]) assert_series_equal(result, expected) @@ -1516,9 +1792,9 @@ def _check_align(a, b, how='left', fill=None): diff_a = aa.index.difference(join_index) diff_b = ab.index.difference(join_index) if len(diff_a) > 0: - self.assertTrue((aa.reindex(diff_a) == fill).all()) + assert (aa.reindex(diff_a) == fill).all() if len(diff_b) > 0: - self.assertTrue((ab.reindex(diff_b) == fill).all()) + assert (ab.reindex(diff_b) == fill).all() ea = a.reindex(join_index) eb = b.reindex(join_index) @@ -1529,10 +1805,10 @@ def _check_align(a, b, 
how='left', fill=None): assert_series_equal(aa, ea) assert_series_equal(ab, eb) - self.assertEqual(aa.name, 'ts') - self.assertEqual(ea.name, 'ts') - self.assertEqual(ab.name, 'ts') - self.assertEqual(eb.name, 'ts') + assert aa.name == 'ts' + assert ea.name == 'ts' + assert ab.name == 'ts' + assert eb.name == 'ts' for kind in JOIN_TYPES: _check_align(self.ts[2:], self.ts[:-5], how=kind) @@ -1592,36 +1868,36 @@ def test_align_nocopy(self): a = self.ts.copy() ra, _ = a.align(b, join='left') ra[:5] = 5 - self.assertFalse((a[:5] == 5).any()) + assert not (a[:5] == 5).any() # do not copy a = self.ts.copy() ra, _ = a.align(b, join='left', copy=False) ra[:5] = 5 - self.assertTrue((a[:5] == 5).all()) + assert (a[:5] == 5).all() # do copy a = self.ts.copy() b = self.ts[:5].copy() _, rb = a.align(b, join='right') rb[:3] = 5 - self.assertFalse((b[:3] == 5).any()) + assert not (b[:3] == 5).any() # do not copy a = self.ts.copy() b = self.ts[:5].copy() _, rb = a.align(b, join='right', copy=False) rb[:2] = 5 - self.assertTrue((b[:2] == 5).all()) + assert (b[:2] == 5).all() - def test_align_sameindex(self): + def test_align_same_index(self): a, b = self.ts.align(self.ts, copy=False) - self.assertIs(a.index, self.ts.index) - self.assertIs(b.index, self.ts.index) + assert a.index is self.ts.index + assert b.index is self.ts.index - # a, b = self.ts.align(self.ts, copy=True) - # self.assertIsNot(a.index, self.ts.index) - # self.assertIsNot(b.index, self.ts.index) + a, b = self.ts.align(self.ts, copy=True) + assert a.index is not self.ts.index + assert b.index is not self.ts.index def test_align_multiindex(self): # GH 10665 @@ -1662,38 +1938,37 @@ def test_reindex(self): # __array_interface__ is not defined for older numpies # and on some pythons try: - self.assertTrue(np.may_share_memory(self.series.index, - identity.index)) - except (AttributeError): + assert np.may_share_memory(self.series.index, identity.index) + except AttributeError: pass - self.assertTrue(identity.index.is_(self.series.index)) - self.assertTrue(identity.index.identical(self.series.index)) + assert identity.index.is_(self.series.index) + assert identity.index.identical(self.series.index) subIndex = self.series.index[10:20] subSeries = self.series.reindex(subIndex) for idx, val in compat.iteritems(subSeries): - self.assertEqual(val, self.series[idx]) + assert val == self.series[idx] subIndex2 = self.ts.index[10:20] subTS = self.ts.reindex(subIndex2) for idx, val in compat.iteritems(subTS): - self.assertEqual(val, self.ts[idx]) + assert val == self.ts[idx] stuffSeries = self.ts.reindex(subIndex) - self.assertTrue(np.isnan(stuffSeries).all()) + assert np.isnan(stuffSeries).all() # This is extremely important for the Cython code to not screw up nonContigIndex = self.ts.index[::2] subNonContig = self.ts.reindex(nonContigIndex) for idx, val in compat.iteritems(subNonContig): - self.assertEqual(val, self.ts[idx]) + assert val == self.ts[idx] # return a copy the same index here result = self.ts.reindex() - self.assertFalse((result is self.ts)) + assert not (result is self.ts) def test_reindex_nan(self): ts = Series([2, 3, 5, 7], index=[1, 4, nan, 8]) @@ -1706,6 +1981,28 @@ def test_reindex_nan(self): # reindex coerces index.dtype to float, loc/iloc doesn't assert_series_equal(ts.reindex(i), ts.iloc[j], check_index_type=False) + def test_reindex_series_add_nat(self): + rng = date_range('1/1/2000 00:00:00', periods=10, freq='10s') + series = Series(rng) + + result = series.reindex(lrange(15)) + assert np.issubdtype(result.dtype, 
np.dtype('M8[ns]')) + + mask = result.isna() + assert mask[-5:].all() + assert not mask[:-5].any() + + def test_reindex_with_datetimes(self): + rng = date_range('1/1/2000', periods=20) + ts = Series(np.random.randn(20), index=rng) + + result = ts.reindex(list(ts.index[5:10])) + expected = ts[5:10] + tm.assert_series_equal(result, expected) + + result = ts[list(ts.index[5:10])] + tm.assert_series_equal(result, expected) + def test_reindex_corner(self): # (don't forget to fix this) I think it's fixed self.empty.reindex(self.ts.index, method='pad') # it works @@ -1719,7 +2016,7 @@ def test_reindex_corner(self): # bad fill method ts = self.ts[::2] - self.assertRaises(Exception, ts.reindex, self.ts.index, method='foo') + pytest.raises(Exception, ts.reindex, self.ts.index, method='foo') def test_reindex_pad(self): @@ -1790,11 +2087,11 @@ def test_reindex_int(self): reindexed_int = int_ts.reindex(self.ts.index) # if NaNs introduced - self.assertEqual(reindexed_int.dtype, np.float_) + assert reindexed_int.dtype == np.float_ # NO NaNs introduced reindexed_int = int_ts.reindex(int_ts.index[::2]) - self.assertEqual(reindexed_int.dtype, np.int_) + assert reindexed_int.dtype == np.int_ def test_reindex_bool(self): @@ -1806,18 +2103,18 @@ def test_reindex_bool(self): reindexed_bool = bool_ts.reindex(self.ts.index) # if NaNs introduced - self.assertEqual(reindexed_bool.dtype, np.object_) + assert reindexed_bool.dtype == np.object_ # NO NaNs introduced reindexed_bool = bool_ts.reindex(bool_ts.index[::2]) - self.assertEqual(reindexed_bool.dtype, np.bool_) + assert reindexed_bool.dtype == np.bool_ def test_reindex_bool_pad(self): # fail ts = self.ts[5:] bool_ts = Series(np.zeros(len(ts), dtype=bool), index=ts.index) filled_bool = bool_ts.reindex(self.ts.index, method='pad') - self.assertTrue(isnull(filled_bool[:5]).all()) + assert isna(filled_bool[:5]).all() def test_reindex_like(self): other = self.ts[::2] @@ -1859,7 +2156,7 @@ def test_reindex_fill_value(self): # don't upcast result = ints.reindex([1, 2, 3], fill_value=0) expected = Series([2, 3, 0], index=[1, 2, 3]) - self.assertTrue(issubclass(result.dtype.type, np.integer)) + assert issubclass(result.dtype.type, np.integer) assert_series_equal(result, expected) # ----------------------------------------------------------- @@ -1944,8 +2241,8 @@ def test_multilevel_preserve_name(self): result = s['foo'] result2 = s.loc['foo'] - self.assertEqual(result.name, s.name) - self.assertEqual(result2.name, s.name) + assert result.name == s.name + assert result2.name == s.name def test_setitem_scalar_into_readonly_backing_data(self): # GH14359: test that you cannot mutate a read only buffer @@ -1955,15 +2252,10 @@ def test_setitem_scalar_into_readonly_backing_data(self): series = Series(array) for n in range(len(series)): - with self.assertRaises(ValueError): + with pytest.raises(ValueError): series[n] = 1 - self.assertEqual( - array[n], - 0, - msg='even though the ValueError was raised, the underlying' - ' array was still mutated!', - ) + assert array[n] == 0 def test_setitem_slice_into_readonly_backing_data(self): # GH14359: test that you cannot mutate a read only buffer @@ -1972,16 +2264,432 @@ def test_setitem_slice_into_readonly_backing_data(self): array.flags.writeable = False # make the array immutable series = Series(array) - with self.assertRaises(ValueError): + with pytest.raises(ValueError): series[1:3] = 1 - self.assertTrue( - not array.any(), - msg='even though the ValueError was raised, the underlying' - ' array was still mutated!', - ) + assert not 
array.any() + + +class TestTimeSeriesDuplicates(object): + + def setup_method(self, method): + dates = [datetime(2000, 1, 2), datetime(2000, 1, 2), + datetime(2000, 1, 2), datetime(2000, 1, 3), + datetime(2000, 1, 3), datetime(2000, 1, 3), + datetime(2000, 1, 4), datetime(2000, 1, 4), + datetime(2000, 1, 4), datetime(2000, 1, 5)] + + self.dups = Series(np.random.randn(len(dates)), index=dates) + + def test_constructor(self): + assert isinstance(self.dups, Series) + assert isinstance(self.dups.index, DatetimeIndex) + + def test_is_unique_monotonic(self): + assert not self.dups.index.is_unique + + def test_index_unique(self): + uniques = self.dups.index.unique() + expected = DatetimeIndex([datetime(2000, 1, 2), datetime(2000, 1, 3), + datetime(2000, 1, 4), datetime(2000, 1, 5)]) + assert uniques.dtype == 'M8[ns]' # sanity + tm.assert_index_equal(uniques, expected) + assert self.dups.index.nunique() == 4 + + # #2563 + assert isinstance(uniques, DatetimeIndex) + + dups_local = self.dups.index.tz_localize('US/Eastern') + dups_local.name = 'foo' + result = dups_local.unique() + expected = DatetimeIndex(expected, name='foo') + expected = expected.tz_localize('US/Eastern') + assert result.tz is not None + assert result.name == 'foo' + tm.assert_index_equal(result, expected) + + # NaT, note this is excluded + arr = [1370745748 + t for t in range(20)] + [tslib.iNaT] + idx = DatetimeIndex(arr * 3) + tm.assert_index_equal(idx.unique(), DatetimeIndex(arr)) + assert idx.nunique() == 20 + assert idx.nunique(dropna=False) == 21 + + arr = [Timestamp('2013-06-09 02:42:28') + timedelta(seconds=t) + for t in range(20)] + [NaT] + idx = DatetimeIndex(arr * 3) + tm.assert_index_equal(idx.unique(), DatetimeIndex(arr)) + assert idx.nunique() == 20 + assert idx.nunique(dropna=False) == 21 + + def test_index_dupes_contains(self): + d = datetime(2011, 12, 5, 20, 30) + ix = DatetimeIndex([d, d]) + assert d in ix + + def test_duplicate_dates_indexing(self): + ts = self.dups + + uniques = ts.index.unique() + for date in uniques: + result = ts[date] + + mask = ts.index == date + total = (ts.index == date).sum() + expected = ts[mask] + if total > 1: + assert_series_equal(result, expected) + else: + assert_almost_equal(result, expected[0]) + + cp = ts.copy() + cp[date] = 0 + expected = Series(np.where(mask, 0, ts), index=ts.index) + assert_series_equal(cp, expected) + + pytest.raises(KeyError, ts.__getitem__, datetime(2000, 1, 6)) + + # new index + ts[datetime(2000, 1, 6)] = 0 + assert ts[datetime(2000, 1, 6)] == 0 + + def test_range_slice(self): + idx = DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/3/2000', + '1/4/2000']) + + ts = Series(np.random.randn(len(idx)), index=idx) + + result = ts['1/2/2000':] + expected = ts[1:] + assert_series_equal(result, expected) + + result = ts['1/2/2000':'1/3/2000'] + expected = ts[1:4] + assert_series_equal(result, expected) + + def test_groupby_average_dup_values(self): + result = self.dups.groupby(level=0).mean() + expected = self.dups.groupby(self.dups.index).mean() + assert_series_equal(result, expected) + + def test_indexing_over_size_cutoff(self): + import datetime + # #1821 + + old_cutoff = _index._SIZE_CUTOFF + try: + _index._SIZE_CUTOFF = 1000 + + # create large list of non periodic datetime + dates = [] + sec = datetime.timedelta(seconds=1) + half_sec = datetime.timedelta(microseconds=500000) + d = datetime.datetime(2011, 12, 5, 20, 30) + n = 1100 + for i in range(n): + dates.append(d) + dates.append(d + sec) + dates.append(d + sec + half_sec) + dates.append(d + sec + 
sec + half_sec) + d += 3 * sec + + # duplicate some values in the list + duplicate_positions = np.random.randint(0, len(dates) - 1, 20) + for p in duplicate_positions: + dates[p + 1] = dates[p] + + df = DataFrame(np.random.randn(len(dates), 4), + index=dates, + columns=list('ABCD')) + + pos = n * 3 + timestamp = df.index[pos] + assert timestamp in df.index + + # it works! + df.loc[timestamp] + assert len(df.loc[[timestamp]]) > 0 + finally: + _index._SIZE_CUTOFF = old_cutoff + + def test_indexing_unordered(self): + # GH 2437 + rng = date_range(start='2011-01-01', end='2011-01-15') + ts = Series(np.random.rand(len(rng)), index=rng) + ts2 = pd.concat([ts[0:4], ts[-4:], ts[4:-4]]) + + for t in ts.index: + # TODO: unused? + s = str(t) # noqa + + expected = ts[t] + result = ts2[t] + assert expected == result + + # GH 3448 (ranges) + def compare(slobj): + result = ts2[slobj].copy() + result = result.sort_index() + expected = ts[slobj] + assert_series_equal(result, expected) + + compare(slice('2011-01-01', '2011-01-15')) + compare(slice('2010-12-30', '2011-01-15')) + compare(slice('2011-01-01', '2011-01-16')) + + # partial ranges + compare(slice('2011-01-01', '2011-01-6')) + compare(slice('2011-01-06', '2011-01-8')) + compare(slice('2011-01-06', '2011-01-12')) + + # single values + result = ts2['2011'].sort_index() + expected = ts['2011'] + assert_series_equal(result, expected) + + # diff freq + rng = date_range(datetime(2005, 1, 1), periods=20, freq='M') + ts = Series(np.arange(len(rng)), index=rng) + ts = ts.take(np.random.permutation(20)) + + result = ts['2005'] + for t in result.index: + assert t.year == 2005 + + def test_indexing(self): + + idx = date_range("2001-1-1", periods=20, freq='M') + ts = Series(np.random.rand(len(idx)), index=idx) -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + # getting + + # GH 3070, make sure semantics work on Series/Frame + expected = ts['2001'] + expected.name = 'A' + + df = DataFrame(dict(A=ts)) + result = df['2001']['A'] + assert_series_equal(expected, result) + + # setting + ts['2001'] = 1 + expected = ts['2001'] + expected.name = 'A' + + df.loc['2001', 'A'] = 1 + + result = df['2001']['A'] + assert_series_equal(expected, result) + + # GH3546 (not including times on the last day) + idx = date_range(start='2013-05-31 00:00', end='2013-05-31 23:00', + freq='H') + ts = Series(lrange(len(idx)), index=idx) + expected = ts['2013-05'] + assert_series_equal(expected, ts) + + idx = date_range(start='2013-05-31 00:00', end='2013-05-31 23:59', + freq='S') + ts = Series(lrange(len(idx)), index=idx) + expected = ts['2013-05'] + assert_series_equal(expected, ts) + + idx = [Timestamp('2013-05-31 00:00'), + Timestamp(datetime(2013, 5, 31, 23, 59, 59, 999999))] + ts = Series(lrange(len(idx)), index=idx) + expected = ts['2013'] + assert_series_equal(expected, ts) + + # GH14826, indexing with a seconds resolution string / datetime object + df = DataFrame(np.random.rand(5, 5), + columns=['open', 'high', 'low', 'close', 'volume'], + index=date_range('2012-01-02 18:01:00', + periods=5, tz='US/Central', freq='s')) + expected = df.loc[[df.index[2]]] + + # this is a single date, so will raise + pytest.raises(KeyError, df.__getitem__, '2012-01-02 18:01:02', ) + pytest.raises(KeyError, df.__getitem__, df.index[2], ) + + +class TestDatetimeIndexing(object): + """ + Also test support for datetime64[ns] in Series / DataFrame + """ + + def setup_method(self, method): + dti = 
DatetimeIndex(start=datetime(2005, 1, 1), + end=datetime(2005, 1, 10), freq='Min') + self.series = Series(np.random.rand(len(dti)), dti) + + def test_fancy_getitem(self): + dti = DatetimeIndex(freq='WOM-1FRI', start=datetime(2005, 1, 1), + end=datetime(2010, 1, 1)) + + s = Series(np.arange(len(dti)), index=dti) + + assert s[48] == 48 + assert s['1/2/2009'] == 48 + assert s['2009-1-2'] == 48 + assert s[datetime(2009, 1, 2)] == 48 + assert s[lib.Timestamp(datetime(2009, 1, 2))] == 48 + pytest.raises(KeyError, s.__getitem__, '2009-1-3') + + assert_series_equal(s['3/6/2009':'2009-06-05'], + s[datetime(2009, 3, 6):datetime(2009, 6, 5)]) + + def test_fancy_setitem(self): + dti = DatetimeIndex(freq='WOM-1FRI', start=datetime(2005, 1, 1), + end=datetime(2010, 1, 1)) + + s = Series(np.arange(len(dti)), index=dti) + s[48] = -1 + assert s[48] == -1 + s['1/2/2009'] = -2 + assert s[48] == -2 + s['1/2/2009':'2009-06-05'] = -3 + assert (s[48:54] == -3).all() + + def test_dti_snap(self): + dti = DatetimeIndex(['1/1/2002', '1/2/2002', '1/3/2002', '1/4/2002', + '1/5/2002', '1/6/2002', '1/7/2002'], freq='D') + + res = dti.snap(freq='W-MON') + exp = date_range('12/31/2001', '1/7/2002', freq='w-mon') + exp = exp.repeat([3, 4]) + assert (res == exp).all() + + res = dti.snap(freq='B') + + exp = date_range('1/1/2002', '1/7/2002', freq='b') + exp = exp.repeat([1, 1, 1, 2, 2]) + assert (res == exp).all() + + def test_dti_reset_index_round_trip(self): + dti = DatetimeIndex(start='1/1/2001', end='6/1/2001', freq='D') + d1 = DataFrame({'v': np.random.rand(len(dti))}, index=dti) + d2 = d1.reset_index() + assert d2.dtypes[0] == np.dtype('M8[ns]') + d3 = d2.set_index('index') + assert_frame_equal(d1, d3, check_names=False) + + # #2329 + stamp = datetime(2012, 11, 22) + df = DataFrame([[stamp, 12.1]], columns=['Date', 'Value']) + df = df.set_index('Date') + + assert df.index[0] == stamp + assert df.reset_index()['Date'][0] == stamp + + def test_series_set_value(self): + # #1561 + + dates = [datetime(2001, 1, 1), datetime(2001, 1, 2)] + index = DatetimeIndex(dates) + + s = Series().set_value(dates[0], 1.) 
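One detail worth noting in the `test_series_set_value` hunk that starts above and continues below: `set_value` returns the Series itself, enlarged in place when the label is new (the `test_set_value` hunk earlier in this diff asserts `res is s` even after appending `'foobar'`), which is why each call's return value is captured. A minimal sketch of the same round trip, assuming the `set_value` API of this pandas era and using only calls that appear in the diff:

```python
from datetime import datetime

import numpy as np
from pandas import DatetimeIndex, Series

dates = [datetime(2001, 1, 1), datetime(2001, 1, 2)]

# set_value enlarges the Series for an unknown label and returns it,
# so the chained result is what the test compares against.
s = Series().set_value(dates[0], 1.)
s2 = s.set_value(dates[1], np.nan)

# The index is inferred as datetime64[ns], as the test's expected
# Series([1., np.nan], index=DatetimeIndex(dates)) implies.
expected = Series([1., np.nan], index=DatetimeIndex(dates))
assert s2.equals(expected)
```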
+ s2 = s.set_value(dates[1], np.nan) + + exp = Series([1., np.nan], index=index) + + assert_series_equal(s2, exp) + + # s = Series(index[:1], index[:1]) + # s2 = s.set_value(dates[1], index[1]) + # assert s2.values.dtype == 'M8[ns]' + + @pytest.mark.slow + def test_slice_locs_indexerror(self): + times = [datetime(2000, 1, 1) + timedelta(minutes=i * 10) + for i in range(100000)] + s = Series(lrange(100000), times) + s.loc[datetime(1900, 1, 1):datetime(2100, 1, 1)] + + def test_slicing_datetimes(self): + + # GH 7523 + + # unique + df = DataFrame(np.arange(4., dtype='float64'), + index=[datetime(2001, 1, i, 10, 00) + for i in [1, 2, 3, 4]]) + result = df.loc[datetime(2001, 1, 1, 10):] + assert_frame_equal(result, df) + result = df.loc[:datetime(2001, 1, 4, 10)] + assert_frame_equal(result, df) + result = df.loc[datetime(2001, 1, 1, 10):datetime(2001, 1, 4, 10)] + assert_frame_equal(result, df) + + result = df.loc[datetime(2001, 1, 1, 11):] + expected = df.iloc[1:] + assert_frame_equal(result, expected) + result = df.loc['20010101 11':] + assert_frame_equal(result, expected) + + # duplicates + df = pd.DataFrame(np.arange(5., dtype='float64'), + index=[datetime(2001, 1, i, 10, 00) + for i in [1, 2, 2, 3, 4]]) + + result = df.loc[datetime(2001, 1, 1, 10):] + assert_frame_equal(result, df) + result = df.loc[:datetime(2001, 1, 4, 10)] + assert_frame_equal(result, df) + result = df.loc[datetime(2001, 1, 1, 10):datetime(2001, 1, 4, 10)] + assert_frame_equal(result, df) + + result = df.loc[datetime(2001, 1, 1, 11):] + expected = df.iloc[1:] + assert_frame_equal(result, expected) + result = df.loc['20010101 11':] + assert_frame_equal(result, expected) + + def test_frame_datetime64_duplicated(self): + dates = date_range('2010-07-01', end='2010-08-05') + + tst = DataFrame({'symbol': 'AAA', 'date': dates}) + result = tst.duplicated(['date', 'symbol']) + assert (-result).all() + + tst = DataFrame({'date': dates}) + result = tst.duplicated() + assert (-result).all() + + +class TestNatIndexing(object): + + def setup_method(self, method): + self.series = Series(date_range('1/1/2000', periods=10)) + + # --------------------------------------------------------------------- + # NaT support + + def test_set_none_nan(self): + self.series[3] = None + assert self.series[3] is NaT + + self.series[3:5] = None + assert self.series[4] is NaT + + self.series[5] = np.nan + assert self.series[5] is NaT + + self.series[5:7] = np.nan + assert self.series[6] is NaT + + def test_nat_operations(self): + # GH 8617 + s = Series([0, pd.NaT], dtype='m8[ns]') + exp = s[0] + assert s.median() == exp + assert s.min() == exp + assert s.max() == exp + + def test_round_nat(self): + # GH14940 + s = Series([pd.NaT]) + expected = Series(pd.NaT) + for method in ["round", "floor", "ceil"]: + round_method = getattr(s.dt, method) + for freq in ["s", "5s", "min", "5min", "h", "5h"]: + assert_series_equal(round_method(freq), expected) diff --git a/pandas/tests/series/test_internals.py b/pandas/tests/series/test_internals.py index e3a0e056f4da1..79e23459ac992 100644 --- a/pandas/tests/series/test_internals.py +++ b/pandas/tests/series/test_internals.py @@ -1,22 +1,22 @@ # coding=utf-8 # pylint: disable-msg=E1101,W0612 +import pytest + from datetime import datetime from numpy import nan import numpy as np from pandas import Series -from pandas.tseries.index import Timestamp -import pandas.lib as lib +from pandas.core.indexes.datetimes import Timestamp +import pandas._libs.lib as lib from pandas.util.testing import assert_series_equal import 
pandas.util.testing as tm -class TestSeriesInternals(tm.TestCase): - - _multiprocess_can_split_ = True +class TestSeriesInternals(object): def test_convert_objects(self): @@ -116,7 +116,7 @@ def test_convert_objects(self): # r = s.copy() # r[0] = np.nan # result = r.convert_objects(convert_dates=True,convert_numeric=False) - # self.assertEqual(result.dtype, 'M8[ns]') + # assert result.dtype == 'M8[ns]' # dateutil parses some single letters into today's value as a date for x in 'abcdefghijklmnopqrstuvwxyz': @@ -282,7 +282,7 @@ def test_convert(self): # r = s.copy() # r[0] = np.nan # result = r._convert(convert_dates=True,convert_numeric=False) - # self.assertEqual(result.dtype, 'M8[ns]') + # assert result.dtype == 'M8[ns]' # dateutil parses some single letters into today's value as a date expected = Series([lib.NaT]) @@ -296,7 +296,7 @@ def test_convert(self): def test_convert_no_arg_error(self): s = Series(['1.0', '2']) - self.assertRaises(ValueError, s._convert) + pytest.raises(ValueError, s._convert) def test_convert_preserve_bool(self): s = Series([1, True, 3, 5], dtype=object) diff --git a/pandas/tests/series/test_io.py b/pandas/tests/series/test_io.py index 48528dc54adbd..503185de427f1 100644 --- a/pandas/tests/series/test_io.py +++ b/pandas/tests/series/test_io.py @@ -2,6 +2,8 @@ # pylint: disable-msg=E1101,W0612 from datetime import datetime +import collections +import pytest import numpy as np import pandas as pd @@ -16,9 +18,7 @@ from .common import TestData -class TestSeriesToCSV(TestData, tm.TestCase): - - _multiprocess_can_split_ = True +class TestSeriesToCSV(TestData): def test_from_csv(self): @@ -26,25 +26,25 @@ def test_from_csv(self): self.ts.to_csv(path) ts = Series.from_csv(path) assert_series_equal(self.ts, ts, check_names=False) - self.assertTrue(ts.name is None) - self.assertTrue(ts.index.name is None) + assert ts.name is None + assert ts.index.name is None # GH10483 self.ts.to_csv(path, header=True) ts_h = Series.from_csv(path, header=0) - self.assertTrue(ts_h.name == 'ts') + assert ts_h.name == 'ts' self.series.to_csv(path) series = Series.from_csv(path) - self.assertIsNone(series.name) - self.assertIsNone(series.index.name) + assert series.name is None + assert series.index.name is None assert_series_equal(self.series, series, check_names=False) - self.assertTrue(series.name is None) - self.assertTrue(series.index.name is None) + assert series.name is None + assert series.index.name is None self.series.to_csv(path, header=True) series_h = Series.from_csv(path, header=0) - self.assertTrue(series_h.name == 'series') + assert series_h.name == 'series' outfile = open(path, 'w') outfile.write('1998-01-01|1.0\n1999-01-01|2.0') @@ -107,12 +107,10 @@ def test_to_csv_path_is_none(self): # DataFrame.to_csv() which returned string s = Series([1, 2, 3]) csv_str = s.to_csv(path=None) - self.assertIsInstance(csv_str, str) - + assert isinstance(csv_str, str) -class TestSeriesIO(TestData, tm.TestCase): - _multiprocess_can_split_ = True +class TestSeriesIO(TestData): def test_to_frame(self): self.ts.name = None @@ -130,21 +128,18 @@ def test_to_frame(self): dict(testdifferent=self.ts.values), index=self.ts.index) assert_frame_equal(rs, xp) - def test_to_dict(self): - self.assert_series_equal(Series(self.ts.to_dict(), name='ts'), self.ts) - def test_timeseries_periodindex(self): # GH2891 from pandas import period_range prng = period_range('1/1/2011', '1/1/2012', freq='M') ts = Series(np.random.randn(len(prng)), prng) - new_ts = self.round_trip_pickle(ts) - 
self.assertEqual(new_ts.index.freq, 'M') + new_ts = tm.round_trip_pickle(ts) + assert new_ts.index.freq == 'M' def test_pickle_preserve_name(self): for n in [777, 777., 'name', datetime(2001, 11, 11), (1, 2)]: unpickled = self._pickle_roundtrip_name(tm.makeTimeSeries(name=n)) - self.assertEqual(unpickled.name, n) + assert unpickled.name == n def _pickle_roundtrip_name(self, obj): @@ -167,14 +162,25 @@ class SubclassedFrame(DataFrame): s = SubclassedSeries([1, 2, 3], name='X') result = s.to_frame() - self.assertTrue(isinstance(result, SubclassedFrame)) + assert isinstance(result, SubclassedFrame) expected = SubclassedFrame({'X': [1, 2, 3]}) assert_frame_equal(result, expected) + @pytest.mark.parametrize('mapping', ( + dict, + collections.defaultdict(list), + collections.OrderedDict)) + def test_to_dict(self, mapping): + # GH16122 + ts = TestData().ts + tm.assert_series_equal( + Series(ts.to_dict(mapping), name='ts'), ts) + from_method = Series(ts.to_dict(collections.Counter)) + from_constructor = Series(collections.Counter(ts.iteritems())) + tm.assert_series_equal(from_method, from_constructor) -class TestSeriesToList(TestData, tm.TestCase): - _multiprocess_can_split_ = True +class TestSeriesToList(TestData): def test_tolist(self): rs = self.ts.tolist() @@ -184,25 +190,25 @@ def test_tolist(self): # datetime64 s = Series(self.ts.index) rs = s.tolist() - self.assertEqual(self.ts.index[0], rs[0]) + assert self.ts.index[0] == rs[0] def test_tolist_np_int(self): # GH10904 for t in ['int8', 'int16', 'int32', 'int64']: s = pd.Series([1], dtype=t) - self.assertIsInstance(s.tolist()[0], (int, long)) + assert isinstance(s.tolist()[0], (int, long)) def test_tolist_np_uint(self): # GH10904 for t in ['uint8', 'uint16']: s = pd.Series([1], dtype=t) - self.assertIsInstance(s.tolist()[0], int) + assert isinstance(s.tolist()[0], int) for t in ['uint32', 'uint64']: s = pd.Series([1], dtype=t) - self.assertIsInstance(s.tolist()[0], long) + assert isinstance(s.tolist()[0], long) def test_tolist_np_float(self): # GH10904 for t in ['float16', 'float32', 'float64']: s = pd.Series([1], dtype=t) - self.assertIsInstance(s.tolist()[0], float) + assert isinstance(s.tolist()[0], float) diff --git a/pandas/tests/series/test_missing.py b/pandas/tests/series/test_missing.py index 3c82e4ed82969..01bf7274fd384 100644 --- a/pandas/tests/series/test_missing.py +++ b/pandas/tests/series/test_missing.py @@ -2,52 +2,70 @@ # pylint: disable-msg=E1101,W0612 import pytz +import pytest + from datetime import timedelta, datetime +from distutils.version import LooseVersion from numpy import nan import numpy as np import pandas as pd -from pandas import (Series, isnull, date_range, - MultiIndex, Index) -from pandas.tseries.index import Timestamp +from pandas import (Series, DataFrame, isna, date_range, + MultiIndex, Index, Timestamp, NaT, IntervalIndex) from pandas.compat import range -from pandas.util.testing import assert_series_equal +from pandas._libs.tslib import iNaT +from pandas.core.series import remove_na +from pandas.util.testing import assert_series_equal, assert_frame_equal import pandas.util.testing as tm from .common import TestData +try: + import scipy + _is_scipy_ge_0190 = scipy.__version__ >= LooseVersion('0.19.0') +except: + _is_scipy_ge_0190 = False + def _skip_if_no_pchip(): try: from scipy.interpolate import pchip_interpolate # noqa except ImportError: - import nose - raise nose.SkipTest('scipy.interpolate.pchip missing') + import pytest + pytest.skip('scipy.interpolate.pchip missing') def _skip_if_no_akima(): 
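Both scipy skip helpers (the body of `_skip_if_no_akima` continues below) are converted from nose's `raise nose.SkipTest(...)` to `pytest.skip(...)`, keeping the try/except structure intact. In later pytest-based suites the same guard is usually collapsed into `pytest.importorskip`; a hedged alternative sketch, not what this patch itself does:

```python
import pytest


def _skip_if_no_pchip():
    # importorskip() imports the module or skips the calling test,
    # replacing the try/except ImportError + pytest.skip() boilerplate.
    interpolate = pytest.importorskip('scipy.interpolate')
    if not hasattr(interpolate, 'pchip_interpolate'):
        pytest.skip('scipy.interpolate.pchip missing')
```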
try: from scipy.interpolate import Akima1DInterpolator # noqa except ImportError: - import nose - raise nose.SkipTest('scipy.interpolate.Akima1DInterpolator missing') + import pytest + pytest.skip('scipy.interpolate.Akima1DInterpolator missing') + +def _simple_ts(start, end, freq='D'): + rng = date_range(start, end, freq=freq) + return Series(np.random.randn(len(rng)), index=rng) -class TestSeriesMissingData(TestData, tm.TestCase): - _multiprocess_can_split_ = True +class TestSeriesMissingData(TestData): + + def test_remove_na_deprecation(self): + # see gh-16971 + with tm.assert_produces_warning(FutureWarning): + remove_na(Series([])) def test_timedelta_fillna(self): # GH 3371 - s = Series([Timestamp('20130101'), Timestamp('20130101'), Timestamp( - '20130102'), Timestamp('20130103 9:01:01')]) + s = Series([Timestamp('20130101'), Timestamp('20130101'), + Timestamp('20130102'), Timestamp('20130103 9:01:01')]) td = s.diff() # reg fillna result = td.fillna(0) - expected = Series([timedelta(0), timedelta(0), timedelta(1), timedelta( - days=1, seconds=9 * 3600 + 60 + 1)]) + expected = Series([timedelta(0), timedelta(0), timedelta(1), + timedelta(days=1, seconds=9 * 3600 + 60 + 1)]) assert_series_equal(result, expected) # interprested as seconds @@ -57,8 +75,9 @@ def test_timedelta_fillna(self): assert_series_equal(result, expected) result = td.fillna(timedelta(days=1, seconds=1)) - expected = Series([timedelta(days=1, seconds=1), timedelta( - 0), timedelta(1), timedelta(days=1, seconds=9 * 3600 + 60 + 1)]) + expected = Series([timedelta(days=1, seconds=1), timedelta(0), + timedelta(1), + timedelta(days=1, seconds=9 * 3600 + 60 + 1)]) assert_series_equal(result, expected) result = td.fillna(np.timedelta64(int(1e9))) @@ -66,9 +85,8 @@ def test_timedelta_fillna(self): timedelta(days=1, seconds=9 * 3600 + 60 + 1)]) assert_series_equal(result, expected) - from pandas import tslib - result = td.fillna(tslib.NaT) - expected = Series([tslib.NaT, timedelta(0), timedelta(1), + result = td.fillna(NaT) + expected = Series([NaT, timedelta(0), timedelta(1), timedelta(days=1, seconds=9 * 3600 + 60 + 1)], dtype='m8[ns]') assert_series_equal(result, expected) @@ -99,8 +117,7 @@ def test_datetime64_fillna(self): '20130101'), Timestamp('20130104'), Timestamp('20130103 9:01:01')]) assert_series_equal(result, expected) - from pandas import tslib - result = s.fillna(tslib.NaT) + result = s.fillna(NaT) expected = s assert_series_equal(result, expected) @@ -128,6 +145,7 @@ def test_datetime64_fillna(self): assert_series_equal(result, expected) def test_datetime64_tz_fillna(self): + for tz in ['US/Eastern', 'Asia/Tokyo']: # DatetimeBlock s = Series([Timestamp('2011-01-01 10:00'), pd.NaT, @@ -139,24 +157,24 @@ def test_datetime64_tz_fillna(self): Timestamp('2011-01-02 10:00'), Timestamp('2011-01-03 10:00'), Timestamp('2011-01-02 10:00')]) - self.assert_series_equal(expected, result) + tm.assert_series_equal(expected, result) # check s is not changed - self.assert_series_equal(pd.isnull(s), null_loc) + tm.assert_series_equal(pd.isna(s), null_loc) result = s.fillna(pd.Timestamp('2011-01-02 10:00', tz=tz)) expected = Series([Timestamp('2011-01-01 10:00'), Timestamp('2011-01-02 10:00', tz=tz), Timestamp('2011-01-03 10:00'), Timestamp('2011-01-02 10:00', tz=tz)]) - self.assert_series_equal(expected, result) - self.assert_series_equal(pd.isnull(s), null_loc) + tm.assert_series_equal(expected, result) + tm.assert_series_equal(pd.isna(s), null_loc) result = s.fillna('AAA') expected = Series([Timestamp('2011-01-01 10:00'), 'AAA', 
Timestamp('2011-01-03 10:00'), 'AAA'], dtype=object) - self.assert_series_equal(expected, result) - self.assert_series_equal(pd.isnull(s), null_loc) + tm.assert_series_equal(expected, result) + tm.assert_series_equal(pd.isna(s), null_loc) result = s.fillna({1: pd.Timestamp('2011-01-02 10:00', tz=tz), 3: pd.Timestamp('2011-01-04 10:00')}) @@ -164,8 +182,8 @@ def test_datetime64_tz_fillna(self): Timestamp('2011-01-02 10:00', tz=tz), Timestamp('2011-01-03 10:00'), Timestamp('2011-01-04 10:00')]) - self.assert_series_equal(expected, result) - self.assert_series_equal(pd.isnull(s), null_loc) + tm.assert_series_equal(expected, result) + tm.assert_series_equal(pd.isna(s), null_loc) result = s.fillna({1: pd.Timestamp('2011-01-02 10:00'), 3: pd.Timestamp('2011-01-04 10:00')}) @@ -173,31 +191,31 @@ def test_datetime64_tz_fillna(self): Timestamp('2011-01-02 10:00'), Timestamp('2011-01-03 10:00'), Timestamp('2011-01-04 10:00')]) - self.assert_series_equal(expected, result) - self.assert_series_equal(pd.isnull(s), null_loc) + tm.assert_series_equal(expected, result) + tm.assert_series_equal(pd.isna(s), null_loc) # DatetimeBlockTZ idx = pd.DatetimeIndex(['2011-01-01 10:00', pd.NaT, '2011-01-03 10:00', pd.NaT], tz=tz) s = pd.Series(idx) - self.assertEqual(s.dtype, 'datetime64[ns, {0}]'.format(tz)) - self.assert_series_equal(pd.isnull(s), null_loc) + assert s.dtype == 'datetime64[ns, {0}]'.format(tz) + tm.assert_series_equal(pd.isna(s), null_loc) result = s.fillna(pd.Timestamp('2011-01-02 10:00')) expected = Series([Timestamp('2011-01-01 10:00', tz=tz), Timestamp('2011-01-02 10:00'), Timestamp('2011-01-03 10:00', tz=tz), Timestamp('2011-01-02 10:00')]) - self.assert_series_equal(expected, result) - self.assert_series_equal(pd.isnull(s), null_loc) + tm.assert_series_equal(expected, result) + tm.assert_series_equal(pd.isna(s), null_loc) result = s.fillna(pd.Timestamp('2011-01-02 10:00', tz=tz)) idx = pd.DatetimeIndex(['2011-01-01 10:00', '2011-01-02 10:00', '2011-01-03 10:00', '2011-01-02 10:00'], tz=tz) expected = Series(idx) - self.assert_series_equal(expected, result) - self.assert_series_equal(pd.isnull(s), null_loc) + tm.assert_series_equal(expected, result) + tm.assert_series_equal(pd.isna(s), null_loc) result = s.fillna(pd.Timestamp('2011-01-02 10:00', tz=tz).to_pydatetime()) @@ -205,15 +223,15 @@ def test_datetime64_tz_fillna(self): '2011-01-03 10:00', '2011-01-02 10:00'], tz=tz) expected = Series(idx) - self.assert_series_equal(expected, result) - self.assert_series_equal(pd.isnull(s), null_loc) + tm.assert_series_equal(expected, result) + tm.assert_series_equal(pd.isna(s), null_loc) result = s.fillna('AAA') expected = Series([Timestamp('2011-01-01 10:00', tz=tz), 'AAA', Timestamp('2011-01-03 10:00', tz=tz), 'AAA'], dtype=object) - self.assert_series_equal(expected, result) - self.assert_series_equal(pd.isnull(s), null_loc) + tm.assert_series_equal(expected, result) + tm.assert_series_equal(pd.isna(s), null_loc) result = s.fillna({1: pd.Timestamp('2011-01-02 10:00', tz=tz), 3: pd.Timestamp('2011-01-04 10:00')}) @@ -221,8 +239,8 @@ def test_datetime64_tz_fillna(self): Timestamp('2011-01-02 10:00', tz=tz), Timestamp('2011-01-03 10:00', tz=tz), Timestamp('2011-01-04 10:00')]) - self.assert_series_equal(expected, result) - self.assert_series_equal(pd.isnull(s), null_loc) + tm.assert_series_equal(expected, result) + tm.assert_series_equal(pd.isna(s), null_loc) result = s.fillna({1: pd.Timestamp('2011-01-02 10:00', tz=tz), 3: pd.Timestamp('2011-01-04 10:00', tz=tz)}) @@ -230,8 +248,8 @@ def 
test_datetime64_tz_fillna(self): Timestamp('2011-01-02 10:00', tz=tz), Timestamp('2011-01-03 10:00', tz=tz), Timestamp('2011-01-04 10:00', tz=tz)]) - self.assert_series_equal(expected, result) - self.assert_series_equal(pd.isnull(s), null_loc) + tm.assert_series_equal(expected, result) + tm.assert_series_equal(pd.isna(s), null_loc) # filling with a naive/other zone, coerce to object result = s.fillna(Timestamp('20130101')) @@ -239,16 +257,62 @@ def test_datetime64_tz_fillna(self): Timestamp('2013-01-01'), Timestamp('2011-01-03 10:00', tz=tz), Timestamp('2013-01-01')]) - self.assert_series_equal(expected, result) - self.assert_series_equal(pd.isnull(s), null_loc) + tm.assert_series_equal(expected, result) + tm.assert_series_equal(pd.isna(s), null_loc) result = s.fillna(Timestamp('20130101', tz='US/Pacific')) expected = Series([Timestamp('2011-01-01 10:00', tz=tz), Timestamp('2013-01-01', tz='US/Pacific'), Timestamp('2011-01-03 10:00', tz=tz), Timestamp('2013-01-01', tz='US/Pacific')]) - self.assert_series_equal(expected, result) - self.assert_series_equal(pd.isnull(s), null_loc) + tm.assert_series_equal(expected, result) + tm.assert_series_equal(pd.isna(s), null_loc) + + # with timezone + # GH 15855 + df = pd.Series([pd.Timestamp('2012-11-11 00:00:00+01:00'), pd.NaT]) + exp = pd.Series([pd.Timestamp('2012-11-11 00:00:00+01:00'), + pd.Timestamp('2012-11-11 00:00:00+01:00')]) + assert_series_equal(df.fillna(method='pad'), exp) + + df = pd.Series([pd.NaT, pd.Timestamp('2012-11-11 00:00:00+01:00')]) + exp = pd.Series([pd.Timestamp('2012-11-11 00:00:00+01:00'), + pd.Timestamp('2012-11-11 00:00:00+01:00')]) + assert_series_equal(df.fillna(method='bfill'), exp) + + def test_fillna_consistency(self): + # GH 16402 + # fillna with a tz aware to a tz-naive, should result in object + + s = Series([Timestamp('20130101'), pd.NaT]) + + result = s.fillna(Timestamp('20130101', tz='US/Eastern')) + expected = Series([Timestamp('20130101'), + Timestamp('2013-01-01', tz='US/Eastern')], + dtype='object') + assert_series_equal(result, expected) + + # where (we ignore the raise_on_error) + result = s.where([True, False], + Timestamp('20130101', tz='US/Eastern'), + raise_on_error=False) + assert_series_equal(result, expected) + + result = s.where([True, False], + Timestamp('20130101', tz='US/Eastern'), + raise_on_error=True) + assert_series_equal(result, expected) + + # with a non-datetime + result = s.fillna('foo') + expected = Series([Timestamp('20130101'), + 'foo']) + assert_series_equal(result, expected) + + # assignment + s2 = s.copy() + s2[1] = 'foo' + assert_series_equal(s2, expected) def test_datetime64tz_fillna_round_issue(self): # GH 14872 @@ -268,6 +332,20 @@ def test_datetime64tz_fillna_round_issue(self): assert_series_equal(filled, expected) + def test_fillna_downcast(self): + # GH 15277 + # infer int64 from float64 + s = pd.Series([1., np.nan]) + result = s.fillna(0, downcast='infer') + expected = pd.Series([1, 0]) + assert_series_equal(result, expected) + + # infer int64 from float64 when fillna value is a dict + s = pd.Series([1., np.nan]) + result = s.fillna({1: 0}, downcast='infer') + expected = pd.Series([1, 0]) + assert_series_equal(result, expected) + def test_fillna_int(self): s = Series(np.random.randint(-100, 100, 50)) s.fillna(method='ffill', inplace=True) @@ -275,37 +353,97 @@ def test_fillna_int(self): def test_fillna_raise(self): s = Series(np.random.randint(-100, 100, 50)) - self.assertRaises(TypeError, s.fillna, [1, 2]) - self.assertRaises(TypeError, s.fillna, (1, 2)) + 
pytest.raises(TypeError, s.fillna, [1, 2]) + pytest.raises(TypeError, s.fillna, (1, 2)) + + # related GH 9217, make sure limit is an int and greater than 0 + s = Series([1, 2, 3, None]) + for limit in [-1, 0, 1., 2.]: + for method in ['backfill', 'bfill', 'pad', 'ffill', None]: + with pytest.raises(ValueError): + s.fillna(1, limit=limit, method=method) + + def test_fillna_nat(self): + series = Series([0, 1, 2, iNaT], dtype='M8[ns]') - def test_isnull_for_inf(self): + filled = series.fillna(method='pad') + filled2 = series.fillna(value=series.values[2]) + + expected = series.copy() + expected.values[3] = expected.values[2] + + assert_series_equal(filled, expected) + assert_series_equal(filled2, expected) + + df = DataFrame({'A': series}) + filled = df.fillna(method='pad') + filled2 = df.fillna(value=series.values[2]) + expected = DataFrame({'A': expected}) + assert_frame_equal(filled, expected) + assert_frame_equal(filled2, expected) + + series = Series([iNaT, 0, 1, 2], dtype='M8[ns]') + + filled = series.fillna(method='bfill') + filled2 = series.fillna(value=series[1]) + + expected = series.copy() + expected[0] = expected[1] + + assert_series_equal(filled, expected) + assert_series_equal(filled2, expected) + + df = DataFrame({'A': series}) + filled = df.fillna(method='bfill') + filled2 = df.fillna(value=series[1]) + expected = DataFrame({'A': expected}) + assert_frame_equal(filled, expected) + assert_frame_equal(filled2, expected) + + def test_isna_for_inf(self): s = Series(['a', np.inf, np.nan, 1.0]) - with pd.option_context('mode.use_inf_as_null', True): - r = s.isnull() + with pd.option_context('mode.use_inf_as_na', True): + r = s.isna() dr = s.dropna() e = Series([False, True, True, False]) de = Series(['a', 1.0], index=[0, 3]) tm.assert_series_equal(r, e) tm.assert_series_equal(dr, de) + @tm.capture_stdout + def test_isnull_for_inf_deprecated(self): + # gh-17115 + s = Series(['a', np.inf, np.nan, 1.0]) + with tm.assert_produces_warning(DeprecationWarning, + check_stacklevel=False): + pd.set_option('mode.use_inf_as_null', True) + r = s.isna() + dr = s.dropna() + pd.reset_option('mode.use_inf_as_null') + + e = Series([False, True, True, False]) + de = Series(['a', 1.0], index=[0, 3]) + tm.assert_series_equal(r, e) + tm.assert_series_equal(dr, de) + def test_fillna(self): ts = Series([0., 1., 2., 3., 4.], index=tm.makeDateIndex(5)) - self.assert_series_equal(ts, ts.fillna(method='ffill')) + tm.assert_series_equal(ts, ts.fillna(method='ffill')) ts[2] = np.NaN exp = Series([0., 1., 1., 3., 4.], index=ts.index) - self.assert_series_equal(ts.fillna(method='ffill'), exp) + tm.assert_series_equal(ts.fillna(method='ffill'), exp) exp = Series([0., 1., 3., 3., 4.], index=ts.index) - self.assert_series_equal(ts.fillna(method='backfill'), exp) + tm.assert_series_equal(ts.fillna(method='backfill'), exp) exp = Series([0., 1., 5., 3., 4.], index=ts.index) - self.assert_series_equal(ts.fillna(value=5), exp) + tm.assert_series_equal(ts.fillna(value=5), exp) - self.assertRaises(ValueError, ts.fillna) - self.assertRaises(ValueError, self.ts.fillna, value=0, method='ffill') + pytest.raises(ValueError, ts.fillna) + pytest.raises(ValueError, self.ts.fillna, value=0, method='ffill') # GH 5703 s1 = Series([np.nan]) @@ -379,7 +517,7 @@ def test_fillna_invalid_method(self): try: self.ts.fillna(method='ffil') except ValueError as inst: - self.assertIn('ffil', str(inst)) + assert 'ffil' in str(inst) def test_ffill(self): ts = Series([0., 1., 2., 3., 4.], index=tm.makeDateIndex(5)) @@ -399,34 +537,33 @@ def 
test_bfill(self): def test_timedelta64_nan(self): - from pandas import tslib td = Series([timedelta(days=i) for i in range(10)]) # nan ops on timedeltas td1 = td.copy() td1[0] = np.nan - self.assertTrue(isnull(td1[0])) - self.assertEqual(td1[0].value, tslib.iNaT) + assert isna(td1[0]) + assert td1[0].value == iNaT td1[0] = td[0] - self.assertFalse(isnull(td1[0])) + assert not isna(td1[0]) - td1[1] = tslib.iNaT - self.assertTrue(isnull(td1[1])) - self.assertEqual(td1[1].value, tslib.iNaT) + td1[1] = iNaT + assert isna(td1[1]) + assert td1[1].value == iNaT td1[1] = td[1] - self.assertFalse(isnull(td1[1])) + assert not isna(td1[1]) - td1[2] = tslib.NaT - self.assertTrue(isnull(td1[2])) - self.assertEqual(td1[2].value, tslib.iNaT) + td1[2] = NaT + assert isna(td1[2]) + assert td1[2].value == iNaT td1[2] = td[2] - self.assertFalse(isnull(td1[2])) + assert not isna(td1[2]) # boolean setting # this doesn't work, not sure numpy even supports it # result = td[(td>np.timedelta64(timedelta(days=3))) & # td val + expected = Series([x > val for x in series]) + tm.assert_series_equal(result, expected) - _multiprocess_can_split_ = True + val = series[5] + result = series > val + expected = Series([x > val for x in series]) + tm.assert_series_equal(result, expected) def test_comparisons(self): left = np.random.randn(10) @@ -108,8 +121,8 @@ def test_div(self): result = p['first'] / p['second'] assert_series_equal(result, p['first'].astype('float64'), check_names=False) - self.assertTrue(result.name is None) - self.assertFalse(np.array_equal(result, p['second'] / p['first'])) + assert result.name is None + assert not np.array_equal(result, p['second'] / p['first']) # inf signing s = Series([np.nan, 1., -1.]) @@ -219,7 +232,7 @@ def check(series, other): # check the values, name, and index separatly assert_almost_equal(np.asarray(result), expected) - self.assertEqual(result.name, series.name) + assert result.name == series.name assert_index_equal(result.index, series.index) check(self.ts, self.ts * 2) @@ -235,12 +248,12 @@ def test_operators_empty_int_corner(self): def test_operators_timedelta64(self): # invalid ops - self.assertRaises(Exception, self.objSeries.__add__, 1) - self.assertRaises(Exception, self.objSeries.__add__, - np.array(1, dtype=np.int64)) - self.assertRaises(Exception, self.objSeries.__sub__, 1) - self.assertRaises(Exception, self.objSeries.__sub__, - np.array(1, dtype=np.int64)) + pytest.raises(Exception, self.objSeries.__add__, 1) + pytest.raises(Exception, self.objSeries.__add__, + np.array(1, dtype=np.int64)) + pytest.raises(Exception, self.objSeries.__sub__, 1) + pytest.raises(Exception, self.objSeries.__sub__, + np.array(1, dtype=np.int64)) # seriese ops v1 = date_range('2012-1-1', periods=3, freq='D') @@ -249,25 +262,25 @@ def test_operators_timedelta64(self): xp = Series(1e9 * 3600 * 24, rs.index).astype('int64').astype('timedelta64[ns]') assert_series_equal(rs, xp) - self.assertEqual(rs.dtype, 'timedelta64[ns]') + assert rs.dtype == 'timedelta64[ns]' df = DataFrame(dict(A=v1)) td = Series([timedelta(days=i) for i in range(3)]) - self.assertEqual(td.dtype, 'timedelta64[ns]') + assert td.dtype == 'timedelta64[ns]' # series on the rhs result = df['A'] - df['A'].shift() - self.assertEqual(result.dtype, 'timedelta64[ns]') + assert result.dtype == 'timedelta64[ns]' result = df['A'] + td - self.assertEqual(result.dtype, 'M8[ns]') + assert result.dtype == 'M8[ns]' # scalar Timestamp on rhs maxa = df['A'].max() - tm.assertIsInstance(maxa, Timestamp) + assert isinstance(maxa, Timestamp) 
resultb = df['A'] - df['A'].max() - self.assertEqual(resultb.dtype, 'timedelta64[ns]') + assert resultb.dtype == 'timedelta64[ns]' # timestamp on lhs result = resultb + df['A'] @@ -281,11 +294,11 @@ def test_operators_timedelta64(self): expected = Series( [timedelta(days=4017 + i) for i in range(3)], name='A') assert_series_equal(result, expected) - self.assertEqual(result.dtype, 'm8[ns]') + assert result.dtype == 'm8[ns]' d = datetime(2001, 1, 1, 3, 4) resulta = df['A'] - d - self.assertEqual(resulta.dtype, 'm8[ns]') + assert resulta.dtype == 'm8[ns]' # roundtrip resultb = resulta + d @@ -296,31 +309,31 @@ def test_operators_timedelta64(self): resulta = df['A'] + td resultb = resulta - td assert_series_equal(resultb, df['A']) - self.assertEqual(resultb.dtype, 'M8[ns]') + assert resultb.dtype == 'M8[ns]' # roundtrip td = timedelta(minutes=5, seconds=3) resulta = df['A'] + td resultb = resulta - td assert_series_equal(df['A'], resultb) - self.assertEqual(resultb.dtype, 'M8[ns]') + assert resultb.dtype == 'M8[ns]' # inplace value = rs[2] + np.timedelta64(timedelta(minutes=5, seconds=1)) rs[2] += np.timedelta64(timedelta(minutes=5, seconds=1)) - self.assertEqual(rs[2], value) + assert rs[2] == value def test_operator_series_comparison_zerorank(self): # GH 13006 result = np.float64(0) > pd.Series([1, 2, 3]) expected = 0.0 > pd.Series([1, 2, 3]) - self.assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) result = pd.Series([1, 2, 3]) < np.float64(0) expected = pd.Series([1, 2, 3]) < 0.0 - self.assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) result = np.array([0, 1, 2])[0] > pd.Series([0, 1, 2]) expected = 0.0 > pd.Series([1, 2, 3]) - self.assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) def test_timedeltas_with_DateOffset(self): @@ -427,7 +440,7 @@ def test_timedelta64_operations_with_timedeltas(self): result = td1 - td2 expected = Series([timedelta(seconds=0)] * 3) - Series([timedelta( seconds=1)] * 3) - self.assertEqual(result.dtype, 'm8[ns]') + assert result.dtype == 'm8[ns]' assert_series_equal(result, expected) result2 = td2 - td1 @@ -445,7 +458,7 @@ def test_timedelta64_operations_with_timedeltas(self): result = td1 - td2 expected = Series([timedelta(seconds=0)] * 3) - Series([timedelta( seconds=1)] * 3) - self.assertEqual(result.dtype, 'm8[ns]') + assert result.dtype == 'm8[ns]' assert_series_equal(result, expected) result2 = td2 - td1 @@ -519,8 +532,8 @@ def test_timedelta64_operations_with_integers(self): for op in ['__add__', '__sub__']: sop = getattr(s1, op, None) if sop is not None: - self.assertRaises(TypeError, sop, 1) - self.assertRaises(TypeError, sop, s2.values) + pytest.raises(TypeError, sop, 1) + pytest.raises(TypeError, sop, s2.values) def test_timedelta64_conversions(self): startdate = Series(date_range('2013-01-01', '2013-01-03')) @@ -551,12 +564,12 @@ def test_timedelta64_conversions(self): # astype s = Series(date_range('20130101', periods=3)) result = s.astype(object) - self.assertIsInstance(result.iloc[0], datetime) - self.assertTrue(result.dtype == np.object_) + assert isinstance(result.iloc[0], datetime) + assert result.dtype == np.object_ result = s1.astype(object) - self.assertIsInstance(result.iloc[0], timedelta) - self.assertTrue(result.dtype == np.object_) + assert isinstance(result.iloc[0], timedelta) + assert result.dtype == np.object_ def test_timedelta64_equal_timedelta_supported_ops(self): ser = Series([Timestamp('20130301'), Timestamp('20130228 23:00:00'), @@ 
-599,7 +612,7 @@ def run_ops(ops, get_ser, test_ser): # defined for op_str in ops: op = getattr(get_ser, op_str, None) - with tm.assertRaisesRegexp(TypeError, 'operate'): + with tm.assert_raises_regex(TypeError, 'operate'): op(test_ser) # ## timedelta64 ### @@ -673,24 +686,23 @@ def run_ops(ops, get_ser, test_ser): assert_series_equal(result, exp) # odd numpy behavior with scalar timedeltas - if not _np_version_under1p8: - result = td1[0] + dt1 - exp = (dt1.dt.tz_localize(None) + td1[0]).dt.tz_localize(tz) - assert_series_equal(result, exp) + result = td1[0] + dt1 + exp = (dt1.dt.tz_localize(None) + td1[0]).dt.tz_localize(tz) + assert_series_equal(result, exp) - result = td2[0] + dt2 - exp = (dt2.dt.tz_localize(None) + td2[0]).dt.tz_localize(tz) - assert_series_equal(result, exp) + result = td2[0] + dt2 + exp = (dt2.dt.tz_localize(None) + td2[0]).dt.tz_localize(tz) + assert_series_equal(result, exp) result = dt1 - td1[0] exp = (dt1.dt.tz_localize(None) - td1[0]).dt.tz_localize(tz) assert_series_equal(result, exp) - self.assertRaises(TypeError, lambda: td1[0] - dt1) + pytest.raises(TypeError, lambda: td1[0] - dt1) result = dt2 - td2[0] exp = (dt2.dt.tz_localize(None) - td2[0]).dt.tz_localize(tz) assert_series_equal(result, exp) - self.assertRaises(TypeError, lambda: td2[0] - dt2) + pytest.raises(TypeError, lambda: td2[0] - dt2) result = dt1 + td1 exp = (dt1.dt.tz_localize(None) + td1).dt.tz_localize(tz) @@ -708,13 +720,11 @@ def run_ops(ops, get_ser, test_ser): exp = (dt2.dt.tz_localize(None) - td2).dt.tz_localize(tz) assert_series_equal(result, exp) - self.assertRaises(TypeError, lambda: td1 - dt1) - self.assertRaises(TypeError, lambda: td2 - dt2) + pytest.raises(TypeError, lambda: td1 - dt1) + pytest.raises(TypeError, lambda: td2 - dt2) def test_sub_datetime_compat(self): - # GH 14088 - tm._skip_if_no_pytz() - import pytz + # see gh-14088 s = Series([datetime(2016, 8, 23, 12, tzinfo=pytz.utc), pd.NaT]) dt = datetime(2016, 8, 22, 12, tzinfo=pytz.utc) exp = Series([Timedelta('1 days'), pd.NaT]) @@ -757,7 +767,7 @@ def test_ops_nat(self): assert_series_equal(datetime_series - single_nat_dtype_datetime, nat_series_dtype_timedelta) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): -single_nat_dtype_datetime + datetime_series assert_series_equal(datetime_series - single_nat_dtype_timedelta, @@ -776,7 +786,7 @@ def test_ops_nat(self): assert_series_equal(nat_series_dtype_timestamp - single_nat_dtype_datetime, nat_series_dtype_timedelta) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): -single_nat_dtype_datetime + nat_series_dtype_timestamp assert_series_equal(nat_series_dtype_timestamp - @@ -786,7 +796,7 @@ def test_ops_nat(self): nat_series_dtype_timestamp, nat_series_dtype_timestamp) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): timedelta_series - single_nat_dtype_datetime # addition @@ -870,13 +880,13 @@ def test_ops_nat(self): assert_series_equal(timedelta_series * nan, nat_series_dtype_timedelta) assert_series_equal(nan * timedelta_series, nat_series_dtype_timedelta) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): datetime_series * 1 - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): nat_series_dtype_timestamp * 1 - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): datetime_series * 1.0 - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): nat_series_dtype_timestamp * 1.0 # division @@ -885,9 +895,9 @@ def test_ops_nat(self): assert_series_equal(timedelta_series / 
2.0, Series([NaT, Timedelta('0.5s')])) assert_series_equal(timedelta_series / nan, nat_series_dtype_timedelta) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): nat_series_dtype_timestamp / 1.0 - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): nat_series_dtype_timestamp / 1 def test_ops_datetimelike_align(self): @@ -987,7 +997,7 @@ def test_comparison_operators_with_nas(self): # boolean &, |, ^ should work with object arrays and propagate NAs ops = ['and_', 'or_', 'xor'] - mask = s.isnull() + mask = s.isna() for bool_op in ops: f = getattr(operator, bool_op) @@ -1019,12 +1029,12 @@ def test_comparison_invalid(self): s2 = Series(date_range('20010101', periods=5)) for (x, y) in [(s, s2), (s2, s)]: - self.assertRaises(TypeError, lambda: x == y) - self.assertRaises(TypeError, lambda: x != y) - self.assertRaises(TypeError, lambda: x >= y) - self.assertRaises(TypeError, lambda: x > y) - self.assertRaises(TypeError, lambda: x < y) - self.assertRaises(TypeError, lambda: x <= y) + pytest.raises(TypeError, lambda: x == y) + pytest.raises(TypeError, lambda: x != y) + pytest.raises(TypeError, lambda: x >= y) + pytest.raises(TypeError, lambda: x > y) + pytest.raises(TypeError, lambda: x < y) + pytest.raises(TypeError, lambda: x <= y) def test_more_na_comparisons(self): for dtype in [None, object]: @@ -1122,11 +1132,11 @@ def test_nat_comparisons_scalar(self): def test_comparison_different_length(self): a = Series(['a', 'b', 'c']) b = Series(['b', 'a']) - self.assertRaises(ValueError, a.__lt__, b) + pytest.raises(ValueError, a.__lt__, b) a = Series([1, 2]) b = Series([2, 3, 4]) - self.assertRaises(ValueError, a.__eq__, b) + pytest.raises(ValueError, a.__eq__, b) def test_comparison_label_based(self): @@ -1205,7 +1215,7 @@ def test_comparison_label_based(self): assert_series_equal(result, expected) for v in [np.nan, 'foo']: - self.assertRaises(TypeError, lambda: t | v) + pytest.raises(TypeError, lambda: t | v) for v in [False, 0]: result = Series([True, False, True], index=index) | v @@ -1222,7 +1232,7 @@ def test_comparison_label_based(self): expected = Series([False, False, False], index=index) assert_series_equal(result, expected) for v in [np.nan]: - self.assertRaises(TypeError, lambda: t & v) + pytest.raises(TypeError, lambda: t & v) def test_comparison_flex_basic(self): left = pd.Series(np.random.randn(10)) @@ -1247,7 +1257,7 @@ def test_comparison_flex_basic(self): # msg = 'No axis named 1 for object type' for op in ['eq', 'ne', 'le', 'le', 'gt', 'ge']: - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): getattr(left, op)(right, axis=1) def test_comparison_flex_alignment(self): @@ -1300,13 +1310,13 @@ def test_return_dtypes_bool_op_costant(self): const = 2 for op in ['eq', 'ne', 'gt', 'lt', 'ge', 'le']: result = getattr(s, op)(const).get_dtype_counts() - self.assert_series_equal(result, Series([1], ['bool'])) + tm.assert_series_equal(result, Series([1], ['bool'])) # empty Series empty = s.iloc[:0] for op in ['eq', 'ne', 'gt', 'lt', 'ge', 'le']: result = getattr(empty, op)(const).get_dtype_counts() - self.assert_series_equal(result, Series([1], ['bool'])) + tm.assert_series_equal(result, Series([1], ['bool'])) def test_operators_bitwise(self): # GH 9016: support bitwise op for integer types @@ -1377,11 +1387,11 @@ def test_operators_bitwise(self): expected = Series([1, 1, 3, 3], dtype='int32') assert_series_equal(res, expected) - self.assertRaises(TypeError, lambda: s_1111 & 'a') - self.assertRaises(TypeError, lambda: 
s_1111 & ['a', 'b', 'c', 'd']) - self.assertRaises(TypeError, lambda: s_0123 & np.NaN) - self.assertRaises(TypeError, lambda: s_0123 & 3.14) - self.assertRaises(TypeError, lambda: s_0123 & [0.1, 4, 3.14, 2]) + pytest.raises(TypeError, lambda: s_1111 & 'a') + pytest.raises(TypeError, lambda: s_1111 & ['a', 'b', 'c', 'd']) + pytest.raises(TypeError, lambda: s_0123 & np.NaN) + pytest.raises(TypeError, lambda: s_0123 & 3.14) + pytest.raises(TypeError, lambda: s_0123 & [0.1, 4, 3.14, 2]) # s_0123 will be all false now because of reindexing like s_tft if compat.PY3: @@ -1424,7 +1434,7 @@ def test_scalar_na_cmp_corners(self): def tester(a, b): return a & b - self.assertRaises(TypeError, tester, s, datetime(2005, 1, 1)) + pytest.raises(TypeError, tester, s, datetime(2005, 1, 1)) s = Series([2, 3, 4, 5, 6, 7, 8, 9, datetime(2005, 1, 1)]) s[::2] = np.nan @@ -1441,8 +1451,8 @@ def tester(a, b): # this is an alignment issue; these are equivalent # https://github.com/pandas-dev/pandas/issues/5284 - self.assertRaises(ValueError, lambda: d.__and__(s, axis='columns')) - self.assertRaises(ValueError, tester, s, d) + pytest.raises(ValueError, lambda: d.__and__(s, axis='columns')) + pytest.raises(ValueError, tester, s, d) # this is wrong as its not a boolean result # result = d.__and__(s,axis='index') @@ -1453,10 +1463,10 @@ def test_operators_corner(self): empty = Series([], index=Index([])) result = series + empty - self.assertTrue(np.isnan(result).all()) + assert np.isnan(result).all() result = empty + Series([], index=Index([])) - self.assertEqual(len(result), 0) + assert len(result) == 0 # TODO: this returned NotImplemented earlier, what to do? # deltas = Series([timedelta(1)] * 5, index=np.arange(5)) @@ -1469,7 +1479,7 @@ def test_operators_corner(self): added = self.ts + int_ts expected = Series(self.ts.values[:-5] + int_ts.values, index=self.ts.index[:-5], name='ts') - self.assert_series_equal(added[:-5], expected) + tm.assert_series_equal(added[:-5], expected) def test_operators_reverse_object(self): # GH 56 @@ -1526,23 +1536,23 @@ def test_comp_ops_df_compat(self): for l, r in [(s1, s2), (s2, s1), (s3, s4), (s4, s3)]: msg = "Can only compare identically-labeled Series objects" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): l == r - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): l != r - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): l < r msg = "Can only compare identically-labeled DataFrame objects" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): l.to_frame() == r.to_frame() - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): l.to_frame() != r.to_frame() - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): l.to_frame() < r.to_frame() def test_bool_ops_df_compat(self): @@ -1616,10 +1626,10 @@ def test_series_frame_radd_bug(self): assert_frame_equal(result, expected) # really raise this time - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): datetime.now() + self.ts - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): self.ts + datetime.now() def test_series_radd_more(self): @@ -1632,7 +1642,7 @@ def test_series_radd_more(self): for d in data: for dtype in [None, object]: s = Series(d, dtype=dtype) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): 'foo_' + s for dtype in [None, 
object]:
@@ -1669,7 +1679,7 @@ def test_frame_radd_more(self):
         for d in data:
             for dtype in [None, object]:
                 s = DataFrame(d, dtype=dtype)
-                with tm.assertRaises(TypeError):
+                with pytest.raises(TypeError):
                     'foo_' + s
 
         for dtype in [None, object]:
@@ -1708,8 +1718,8 @@ def _check_fill(meth, op, a, b, fill_value=0):
             a = a.reindex(exp_index)
             b = b.reindex(exp_index)
 
-        amask = isnull(a)
-        bmask = isnull(b)
+        amask = isna(a)
+        bmask = isna(b)
 
         exp_values = []
         for i in range(len(exp_index)):
@@ -1764,8 +1774,8 @@ def _check_fill(meth, op, a, b, fill_value=0):
     def test_ne(self):
         ts = Series([3, 4, 5, 6, 7], [3, 4, 5, 6, 7], dtype=float)
         expected = [True, True, False, True, True]
-        self.assertTrue(tm.equalContents(ts.index != 5, expected))
-        self.assertTrue(tm.equalContents(~(ts.index == 5), expected))
+        assert tm.equalContents(ts.index != 5, expected)
+        assert tm.equalContents(~(ts.index == 5), expected)
 
     def test_operators_na_handling(self):
         from decimal import Decimal
@@ -1775,8 +1785,8 @@ def test_operators_na_handling(self):
         result = s + s.shift(1)
         result2 = s.shift(1) + s
-        self.assertTrue(isnull(result[0]))
-        self.assertTrue(isnull(result2[0]))
+        assert isna(result[0])
+        assert isna(result2[0])
 
         s = Series(['foo', 'bar', 'baz', np.nan])
         result = 'prefix_' + s
@@ -1845,3 +1855,50 @@ def test_op_duplicate_index(self):
         result = s1 + s2
         expected = pd.Series([11, 12, np.nan], index=[1, 1, 2])
         assert_series_equal(result, expected)
+
+    @pytest.mark.parametrize(
+        "test_input,error_type",
+        [
+            (pd.Series([]), ValueError),
+
+            # For strings, or any Series with dtype 'O'
+            (pd.Series(['foo', 'bar', 'baz']), TypeError),
+            (pd.Series([(1,), (2,)]), TypeError),
+
+            # For mixed data types
+            (
+                pd.Series(['foo', 'foo', 'bar', 'bar', None, np.nan, 'baz']),
+                TypeError
+            ),
+        ]
+    )
+    def test_assert_argminmax_raises(self, test_input, error_type):
+        """
+        Cases where ``Series.argmax`` and related should raise an exception
+        """
+        with pytest.raises(error_type):
+            test_input.argmin()
+        with pytest.raises(error_type):
+            test_input.argmin(skipna=False)
+        with pytest.raises(error_type):
+            test_input.argmax()
+        with pytest.raises(error_type):
+            test_input.argmax(skipna=False)
+
+    def test_argminmax_with_inf(self):
+        # For numeric data with NA and Inf (GH #13595)
+        s = pd.Series([0, -np.inf, np.inf, np.nan])
+
+        assert s.argmin() == 1
+        assert np.isnan(s.argmin(skipna=False))
+
+        assert s.argmax() == 2
+        assert np.isnan(s.argmax(skipna=False))
+
+        # Using old-style behavior that treats floating point nan, -inf, and
+        # +inf as missing
+        with pd.option_context('mode.use_inf_as_na', True):
+            assert s.argmin() == 0
+            assert np.isnan(s.argmin(skipna=False))
+            assert s.argmax() == 0
+            assert np.isnan(s.argmax(skipna=False))
diff --git a/pandas/tests/series/test_period.py b/pandas/tests/series/test_period.py
new file mode 100644
index 0000000000000..e907b0edd5c6a
--- /dev/null
+++ b/pandas/tests/series/test_period.py
@@ -0,0 +1,251 @@
+import numpy as np
+
+import pandas as pd
+import pandas.util.testing as tm
+import pandas.core.indexes.period as period
+from pandas import Series, period_range, DataFrame, Period
+
+
+def _permute(obj):
+    return obj.take(np.random.permutation(len(obj)))
+
+
+class TestSeriesPeriod(object):
+
+    def setup_method(self, method):
+        self.series = Series(period_range('2000-01-01', periods=10, freq='D'))
+
+    def test_auto_conversion(self):
+        series = Series(list(period_range('2000-01-01', periods=10, freq='D')))
+        assert series.dtype == 'object'
+
+        series = pd.Series([pd.Period('2011-01-01', freq='D'),
+ pd.Period('2011-02-01', freq='D')]) + assert series.dtype == 'object' + + def test_getitem(self): + assert self.series[1] == pd.Period('2000-01-02', freq='D') + + result = self.series[[2, 4]] + exp = pd.Series([pd.Period('2000-01-03', freq='D'), + pd.Period('2000-01-05', freq='D')], + index=[2, 4]) + tm.assert_series_equal(result, exp) + assert result.dtype == 'object' + + def test_isna(self): + # GH 13737 + s = Series([pd.Period('2011-01', freq='M'), + pd.Period('NaT', freq='M')]) + tm.assert_series_equal(s.isna(), Series([False, True])) + tm.assert_series_equal(s.notna(), Series([True, False])) + + def test_fillna(self): + # GH 13737 + s = Series([pd.Period('2011-01', freq='M'), + pd.Period('NaT', freq='M')]) + + res = s.fillna(pd.Period('2012-01', freq='M')) + exp = Series([pd.Period('2011-01', freq='M'), + pd.Period('2012-01', freq='M')]) + tm.assert_series_equal(res, exp) + assert res.dtype == 'object' + + res = s.fillna('XXX') + exp = Series([pd.Period('2011-01', freq='M'), 'XXX']) + tm.assert_series_equal(res, exp) + assert res.dtype == 'object' + + def test_dropna(self): + # GH 13737 + s = Series([pd.Period('2011-01', freq='M'), + pd.Period('NaT', freq='M')]) + tm.assert_series_equal(s.dropna(), + Series([pd.Period('2011-01', freq='M')])) + + def test_series_comparison_scalars(self): + val = pd.Period('2000-01-04', freq='D') + result = self.series > val + expected = pd.Series([x > val for x in self.series]) + tm.assert_series_equal(result, expected) + + val = self.series[5] + result = self.series > val + expected = pd.Series([x > val for x in self.series]) + tm.assert_series_equal(result, expected) + + def test_between(self): + left, right = self.series[[2, 7]] + result = self.series.between(left, right) + expected = (self.series >= left) & (self.series <= right) + tm.assert_series_equal(result, expected) + + # --------------------------------------------------------------------- + # NaT support + + """ + # ToDo: Enable when support period dtype + def test_NaT_scalar(self): + series = Series([0, 1000, 2000, iNaT], dtype='period[D]') + + val = series[3] + assert isna(val) + + series[2] = val + assert isna(series[2]) + + def test_NaT_cast(self): + result = Series([np.nan]).astype('period[D]') + expected = Series([NaT]) + tm.assert_series_equal(result, expected) + """ + + def test_set_none_nan(self): + # currently Period is stored as object dtype, not as NaT + self.series[3] = None + assert self.series[3] is None + + self.series[3:5] = None + assert self.series[4] is None + + self.series[5] = np.nan + assert np.isnan(self.series[5]) + + self.series[5:7] = np.nan + assert np.isnan(self.series[6]) + + def test_intercept_astype_object(self): + expected = self.series.astype('object') + + df = DataFrame({'a': self.series, + 'b': np.random.randn(len(self.series))}) + + result = df.values.squeeze() + assert (result[:, 0] == expected.values).all() + + df = DataFrame({'a': self.series, 'b': ['foo'] * len(self.series)}) + + result = df.values.squeeze() + assert (result[:, 0] == expected.values).all() + + def test_comp_series_period_scalar(self): + # GH 13200 + for freq in ['M', '2M', '3M']: + base = Series([Period(x, freq=freq) for x in + ['2011-01', '2011-02', '2011-03', '2011-04']]) + p = Period('2011-02', freq=freq) + + exp = pd.Series([False, True, False, False]) + tm.assert_series_equal(base == p, exp) + tm.assert_series_equal(p == base, exp) + + exp = pd.Series([True, False, True, True]) + tm.assert_series_equal(base != p, exp) + tm.assert_series_equal(p != base, exp) + + exp = 
pd.Series([False, False, True, True]) + tm.assert_series_equal(base > p, exp) + tm.assert_series_equal(p < base, exp) + + exp = pd.Series([True, False, False, False]) + tm.assert_series_equal(base < p, exp) + tm.assert_series_equal(p > base, exp) + + exp = pd.Series([False, True, True, True]) + tm.assert_series_equal(base >= p, exp) + tm.assert_series_equal(p <= base, exp) + + exp = pd.Series([True, True, False, False]) + tm.assert_series_equal(base <= p, exp) + tm.assert_series_equal(p >= base, exp) + + # different base freq + msg = "Input has different freq=A-DEC from Period" + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + base <= Period('2011', freq='A') + + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + Period('2011', freq='A') >= base + + def test_comp_series_period_series(self): + # GH 13200 + for freq in ['M', '2M', '3M']: + base = Series([Period(x, freq=freq) for x in + ['2011-01', '2011-02', '2011-03', '2011-04']]) + + s = Series([Period(x, freq=freq) for x in + ['2011-02', '2011-01', '2011-03', '2011-05']]) + + exp = Series([False, False, True, False]) + tm.assert_series_equal(base == s, exp) + + exp = Series([True, True, False, True]) + tm.assert_series_equal(base != s, exp) + + exp = Series([False, True, False, False]) + tm.assert_series_equal(base > s, exp) + + exp = Series([True, False, False, True]) + tm.assert_series_equal(base < s, exp) + + exp = Series([False, True, True, False]) + tm.assert_series_equal(base >= s, exp) + + exp = Series([True, False, True, True]) + tm.assert_series_equal(base <= s, exp) + + s2 = Series([Period(x, freq='A') for x in + ['2011', '2011', '2011', '2011']]) + + # different base freq + msg = "Input has different freq=A-DEC from Period" + with tm.assert_raises_regex( + period.IncompatibleFrequency, msg): + base <= s2 + + def test_comp_series_period_object(self): + # GH 13200 + base = Series([Period('2011', freq='A'), Period('2011-02', freq='M'), + Period('2013', freq='A'), Period('2011-04', freq='M')]) + + s = Series([Period('2012', freq='A'), Period('2011-01', freq='M'), + Period('2013', freq='A'), Period('2011-05', freq='M')]) + + exp = Series([False, False, True, False]) + tm.assert_series_equal(base == s, exp) + + exp = Series([True, True, False, True]) + tm.assert_series_equal(base != s, exp) + + exp = Series([False, True, False, False]) + tm.assert_series_equal(base > s, exp) + + exp = Series([True, False, False, True]) + tm.assert_series_equal(base < s, exp) + + exp = Series([False, True, True, False]) + tm.assert_series_equal(base >= s, exp) + + exp = Series([True, False, True, True]) + tm.assert_series_equal(base <= s, exp) + + def test_align_series(self): + rng = period_range('1/1/2000', '1/1/2010', freq='A') + ts = Series(np.random.randn(len(rng)), index=rng) + + result = ts + ts[::2] + expected = ts + ts + expected[1::2] = np.nan + tm.assert_series_equal(result, expected) + + result = ts + _permute(ts[::2]) + tm.assert_series_equal(result, expected) + + # it works! 
+ for kind in ['inner', 'outer', 'left', 'right']: + ts.align(ts[::2], join=kind) + msg = "Input has different freq=D from PeriodIndex\\(freq=A-DEC\\)" + with tm.assert_raises_regex(period.IncompatibleFrequency, msg): + ts + ts.asfreq('D', how="end") diff --git a/pandas/tests/series/test_quantile.py b/pandas/tests/series/test_quantile.py index 76db6c90a685f..cf5e3fe4f29b0 100644 --- a/pandas/tests/series/test_quantile.py +++ b/pandas/tests/series/test_quantile.py @@ -1,59 +1,56 @@ # coding=utf-8 # pylint: disable-msg=E1101,W0612 -import nose import numpy as np import pandas as pd -from pandas import (Index, Series, _np_version_under1p9) -from pandas.tseries.index import Timestamp -from pandas.types.common import is_integer +from pandas import Index, Series +from pandas.core.indexes.datetimes import Timestamp +from pandas.core.dtypes.common import is_integer import pandas.util.testing as tm from .common import TestData -class TestSeriesQuantile(TestData, tm.TestCase): +class TestSeriesQuantile(TestData): def test_quantile(self): - from numpy import percentile q = self.ts.quantile(0.1) - self.assertEqual(q, percentile(self.ts.valid(), 10)) + assert q == np.percentile(self.ts.valid(), 10) q = self.ts.quantile(0.9) - self.assertEqual(q, percentile(self.ts.valid(), 90)) + assert q == np.percentile(self.ts.valid(), 90) # object dtype q = Series(self.ts, dtype=object).quantile(0.9) - self.assertEqual(q, percentile(self.ts.valid(), 90)) + assert q == np.percentile(self.ts.valid(), 90) # datetime64[ns] dtype dts = self.ts.index.to_series() q = dts.quantile(.2) - self.assertEqual(q, Timestamp('2000-01-10 19:12:00')) + assert q == Timestamp('2000-01-10 19:12:00') # timedelta64[ns] dtype tds = dts.diff() q = tds.quantile(.25) - self.assertEqual(q, pd.to_timedelta('24:00:00')) + assert q == pd.to_timedelta('24:00:00') # GH7661 result = Series([np.timedelta64('NaT')]).sum() - self.assertTrue(result is pd.NaT) + assert result is pd.NaT msg = 'percentiles should all be in the interval \\[0, 1\\]' for invalid in [-1, 2, [0.5, -1], [0.5, 2]]: - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): self.ts.quantile(invalid) def test_quantile_multi(self): - from numpy import percentile qs = [.1, .9] result = self.ts.quantile(qs) - expected = pd.Series([percentile(self.ts.valid(), 10), - percentile(self.ts.valid(), 90)], + expected = pd.Series([np.percentile(self.ts.valid(), 10), + np.percentile(self.ts.valid(), 90)], index=qs, name=self.ts.name) tm.assert_series_equal(result, expected) @@ -71,59 +68,28 @@ def test_quantile_multi(self): tm.assert_series_equal(result, expected) def test_quantile_interpolation(self): - # GH #10174 - if _np_version_under1p9: - raise nose.SkipTest("Numpy version is under 1.9") - - from numpy import percentile + # see gh-10174 # interpolation = linear (default case) q = self.ts.quantile(0.1, interpolation='linear') - self.assertEqual(q, percentile(self.ts.valid(), 10)) + assert q == np.percentile(self.ts.valid(), 10) q1 = self.ts.quantile(0.1) - self.assertEqual(q1, percentile(self.ts.valid(), 10)) + assert q1 == np.percentile(self.ts.valid(), 10) # test with and without interpolation keyword - self.assertEqual(q, q1) + assert q == q1 def test_quantile_interpolation_dtype(self): # GH #10174 - if _np_version_under1p9: - raise nose.SkipTest("Numpy version is under 1.9") - - from numpy import percentile # interpolation = linear (default case) q = pd.Series([1, 3, 4]).quantile(0.5, interpolation='lower') - self.assertEqual(q, 
percentile(np.array([1, 3, 4]), 50)) - self.assertTrue(is_integer(q)) + assert q == np.percentile(np.array([1, 3, 4]), 50) + assert is_integer(q) q = pd.Series([1, 3, 4]).quantile(0.5, interpolation='higher') - self.assertEqual(q, percentile(np.array([1, 3, 4]), 50)) - self.assertTrue(is_integer(q)) - - def test_quantile_interpolation_np_lt_1p9(self): - # GH #10174 - if not _np_version_under1p9: - raise nose.SkipTest("Numpy version is greater than 1.9") - - from numpy import percentile - - # interpolation = linear (default case) - q = self.ts.quantile(0.1, interpolation='linear') - self.assertEqual(q, percentile(self.ts.valid(), 10)) - q1 = self.ts.quantile(0.1) - self.assertEqual(q1, percentile(self.ts.valid(), 10)) - - # interpolation other than linear - expErrMsg = "Interpolation methods other than " - with tm.assertRaisesRegexp(ValueError, expErrMsg): - self.ts.quantile(0.9, interpolation='nearest') - - # object dtype - with tm.assertRaisesRegexp(ValueError, expErrMsg): - q = Series(self.ts, dtype=object).quantile(0.7, - interpolation='higher') + assert q == np.percentile(np.array([1, 3, 4]), 50) + assert is_integer(q) def test_quantile_nan(self): @@ -131,14 +97,14 @@ def test_quantile_nan(self): s = pd.Series([1, 2, 3, 4, np.nan]) result = s.quantile(0.5) expected = 2.5 - self.assertEqual(result, expected) + assert result == expected # all nan/empty cases = [Series([]), Series([np.nan, np.nan])] for s in cases: res = s.quantile(0.5) - self.assertTrue(np.isnan(res)) + assert np.isnan(res) res = s.quantile([0.5]) tm.assert_series_equal(res, pd.Series([np.nan], index=[0.5])) @@ -167,7 +133,7 @@ def test_quantile_box(self): for case in cases: s = pd.Series(case, name='XXX') res = s.quantile(0.5) - self.assertEqual(res, case[1]) + assert res == case[1] res = s.quantile([0.5]) exp = pd.Series([case[1]], index=[0.5], name='XXX') @@ -175,12 +141,12 @@ def test_quantile_box(self): def test_datetime_timedelta_quantiles(self): # covers #9694 - self.assertTrue(pd.isnull(Series([], dtype='M8[ns]').quantile(.5))) - self.assertTrue(pd.isnull(Series([], dtype='m8[ns]').quantile(.5))) + assert pd.isna(Series([], dtype='M8[ns]').quantile(.5)) + assert pd.isna(Series([], dtype='m8[ns]').quantile(.5)) def test_quantile_nat(self): res = Series([pd.NaT, pd.NaT]).quantile(0.5) - self.assertTrue(res is pd.NaT) + assert res is pd.NaT res = Series([pd.NaT, pd.NaT]).quantile([0.5]) tm.assert_series_equal(res, pd.Series([pd.NaT], index=[0.5])) @@ -191,7 +157,7 @@ def test_quantile_empty(self): s = Series([], dtype='float64') res = s.quantile(0.5) - self.assertTrue(np.isnan(res)) + assert np.isnan(res) res = s.quantile([0.5]) exp = Series([np.nan], index=[0.5]) @@ -201,7 +167,7 @@ def test_quantile_empty(self): s = Series([], dtype='int64') res = s.quantile(0.5) - self.assertTrue(np.isnan(res)) + assert np.isnan(res) res = s.quantile([0.5]) exp = Series([np.nan], index=[0.5]) @@ -211,7 +177,7 @@ def test_quantile_empty(self): s = Series([], dtype='datetime64[ns]') res = s.quantile(0.5) - self.assertTrue(res is pd.NaT) + assert res is pd.NaT res = s.quantile([0.5]) exp = Series([pd.NaT], index=[0.5]) diff --git a/pandas/tests/series/test_rank.py b/pandas/tests/series/test_rank.py new file mode 100644 index 0000000000000..128a4cdd845e6 --- /dev/null +++ b/pandas/tests/series/test_rank.py @@ -0,0 +1,323 @@ +# -*- coding: utf-8 -*- +from pandas import compat + +import pytest + +from distutils.version import LooseVersion +from numpy import nan +import numpy as np + +from pandas import (Series, date_range, NaT) + +from 
pandas.compat import product +from pandas.util.testing import assert_series_equal +import pandas.util.testing as tm +from pandas.tests.series.common import TestData + + +class TestSeriesRank(TestData): + s = Series([1, 3, 4, 2, nan, 2, 1, 5, nan, 3]) + + results = { + 'average': np.array([1.5, 5.5, 7.0, 3.5, nan, + 3.5, 1.5, 8.0, nan, 5.5]), + 'min': np.array([1, 5, 7, 3, nan, 3, 1, 8, nan, 5]), + 'max': np.array([2, 6, 7, 4, nan, 4, 2, 8, nan, 6]), + 'first': np.array([1, 5, 7, 3, nan, 4, 2, 8, nan, 6]), + 'dense': np.array([1, 3, 4, 2, nan, 2, 1, 5, nan, 3]), + } + + def test_rank(self): + pytest.importorskip('scipy.stats.special') + rankdata = pytest.importorskip('scipy.stats.rankdata') + + self.ts[::2] = np.nan + self.ts[:10][::3] = 4. + + ranks = self.ts.rank() + oranks = self.ts.astype('O').rank() + + assert_series_equal(ranks, oranks) + + mask = np.isnan(self.ts) + filled = self.ts.fillna(np.inf) + + # rankdata returns a ndarray + exp = Series(rankdata(filled), index=filled.index, name='ts') + exp[mask] = np.nan + + tm.assert_series_equal(ranks, exp) + + iseries = Series(np.arange(5).repeat(2)) + + iranks = iseries.rank() + exp = iseries.astype(float).rank() + assert_series_equal(iranks, exp) + iseries = Series(np.arange(5)) + 1.0 + exp = iseries / 5.0 + iranks = iseries.rank(pct=True) + + assert_series_equal(iranks, exp) + + iseries = Series(np.repeat(1, 100)) + exp = Series(np.repeat(0.505, 100)) + iranks = iseries.rank(pct=True) + assert_series_equal(iranks, exp) + + iseries[1] = np.nan + exp = Series(np.repeat(50.0 / 99.0, 100)) + exp[1] = np.nan + iranks = iseries.rank(pct=True) + assert_series_equal(iranks, exp) + + iseries = Series(np.arange(5)) + 1.0 + iseries[4] = np.nan + exp = iseries / 4.0 + iranks = iseries.rank(pct=True) + assert_series_equal(iranks, exp) + + iseries = Series(np.repeat(np.nan, 100)) + exp = iseries.copy() + iranks = iseries.rank(pct=True) + assert_series_equal(iranks, exp) + + iseries = Series(np.arange(5)) + 1 + iseries[4] = np.nan + exp = iseries / 4.0 + iranks = iseries.rank(pct=True) + assert_series_equal(iranks, exp) + + rng = date_range('1/1/1990', periods=5) + iseries = Series(np.arange(5), rng) + 1 + iseries.iloc[4] = np.nan + exp = iseries / 4.0 + iranks = iseries.rank(pct=True) + assert_series_equal(iranks, exp) + + iseries = Series([1e-50, 1e-100, 1e-20, 1e-2, 1e-20 + 1e-30, 1e-1]) + exp = Series([2, 1, 3, 5, 4, 6.0]) + iranks = iseries.rank() + assert_series_equal(iranks, exp) + + # GH 5968 + iseries = Series(['3 day', '1 day 10m', '-2 day', NaT], + dtype='m8[ns]') + exp = Series([3, 2, 1, np.nan]) + iranks = iseries.rank() + assert_series_equal(iranks, exp) + + values = np.array( + [-50, -1, -1e-20, -1e-25, -1e-50, 0, 1e-40, 1e-20, 1e-10, 2, 40 + ], dtype='float64') + random_order = np.random.permutation(len(values)) + iseries = Series(values[random_order]) + exp = Series(random_order + 1.0, dtype='float64') + iranks = iseries.rank() + assert_series_equal(iranks, exp) + + def test_rank_categorical(self): + # GH issue #15420 rank incorrectly orders ordered categories + + # Test ascending/descending ranking for ordered categoricals + exp = Series([1., 2., 3., 4., 5., 6.]) + exp_desc = Series([6., 5., 4., 3., 2., 1.]) + ordered = Series( + ['first', 'second', 'third', 'fourth', 'fifth', 'sixth'] + ).astype( + 'category', + categories=['first', 'second', 'third', + 'fourth', 'fifth', 'sixth'], + ordered=True + ) + assert_series_equal(ordered.rank(), exp) + assert_series_equal(ordered.rank(ascending=False), exp_desc) + + # Unordered 
categoricals should be ranked as objects + unordered = Series( + ['first', 'second', 'third', 'fourth', 'fifth', 'sixth'], + ).astype( + 'category', + categories=['first', 'second', 'third', + 'fourth', 'fifth', 'sixth'], + ordered=False + ) + exp_unordered = Series([2., 4., 6., 3., 1., 5.]) + res = unordered.rank() + assert_series_equal(res, exp_unordered) + + unordered1 = Series( + [1, 2, 3, 4, 5, 6], + ).astype( + 'category', + categories=[1, 2, 3, 4, 5, 6], + ordered=False + ) + exp_unordered1 = Series([1., 2., 3., 4., 5., 6.]) + res1 = unordered1.rank() + assert_series_equal(res1, exp_unordered1) + + # Test na_option for rank data + na_ser = Series( + ['first', 'second', 'third', 'fourth', 'fifth', 'sixth', np.NaN] + ).astype( + 'category', + categories=[ + 'first', 'second', 'third', 'fourth', + 'fifth', 'sixth', 'seventh' + ], + ordered=True + ) + + exp_top = Series([2., 3., 4., 5., 6., 7., 1.]) + exp_bot = Series([1., 2., 3., 4., 5., 6., 7.]) + exp_keep = Series([1., 2., 3., 4., 5., 6., np.NaN]) + + assert_series_equal(na_ser.rank(na_option='top'), exp_top) + assert_series_equal(na_ser.rank(na_option='bottom'), exp_bot) + assert_series_equal(na_ser.rank(na_option='keep'), exp_keep) + + # Test na_option for rank data with ascending False + exp_top = Series([7., 6., 5., 4., 3., 2., 1.]) + exp_bot = Series([6., 5., 4., 3., 2., 1., 7.]) + exp_keep = Series([6., 5., 4., 3., 2., 1., np.NaN]) + + assert_series_equal( + na_ser.rank(na_option='top', ascending=False), + exp_top + ) + assert_series_equal( + na_ser.rank(na_option='bottom', ascending=False), + exp_bot + ) + assert_series_equal( + na_ser.rank(na_option='keep', ascending=False), + exp_keep + ) + + # Test with pct=True + na_ser = Series( + ['first', 'second', 'third', 'fourth', np.NaN], + ).astype( + 'category', + categories=['first', 'second', 'third', 'fourth'], + ordered=True + ) + exp_top = Series([0.4, 0.6, 0.8, 1., 0.2]) + exp_bot = Series([0.2, 0.4, 0.6, 0.8, 1.]) + exp_keep = Series([0.25, 0.5, 0.75, 1., np.NaN]) + + assert_series_equal(na_ser.rank(na_option='top', pct=True), exp_top) + assert_series_equal(na_ser.rank(na_option='bottom', pct=True), exp_bot) + assert_series_equal(na_ser.rank(na_option='keep', pct=True), exp_keep) + + def test_rank_signature(self): + s = Series([0, 1]) + s.rank(method='average') + pytest.raises(ValueError, s.rank, 'average') + + def test_rank_inf(self): + pytest.skip('DataFrame.rank does not currently rank ' + 'np.inf and -np.inf properly') + + values = np.array( + [-np.inf, -50, -1, -1e-20, -1e-25, -1e-50, 0, 1e-40, 1e-20, 1e-10, + 2, 40, np.inf], dtype='float64') + random_order = np.random.permutation(len(values)) + iseries = Series(values[random_order]) + exp = Series(random_order + 1.0, dtype='float64') + iranks = iseries.rank() + assert_series_equal(iranks, exp) + + def test_rank_tie_methods(self): + s = self.s + + def _check(s, expected, method='average'): + result = s.rank(method=method) + tm.assert_series_equal(result, Series(expected)) + + dtypes = [None, object] + disabled = set([(object, 'first')]) + results = self.results + + for method, dtype in product(results, dtypes): + if (dtype, method) in disabled: + continue + series = s if dtype is None else s.astype(dtype) + _check(series, results[method], method=method) + + def test_rank_methods_series(self): + pytest.importorskip('scipy.stats.special') + rankdata = pytest.importorskip('scipy.stats.rankdata') + import scipy + + xs = np.random.randn(9) + xs = np.concatenate([xs[i:] for i in range(0, 9, 2)]) # add duplicates + 
np.random.shuffle(xs) + + index = [chr(ord('a') + i) for i in range(len(xs))] + + for vals in [xs, xs + 1e6, xs * 1e-6]: + ts = Series(vals, index=index) + + for m in ['average', 'min', 'max', 'first', 'dense']: + result = ts.rank(method=m) + sprank = rankdata(vals, m if m != 'first' else 'ordinal') + expected = Series(sprank, index=index) + + if LooseVersion(scipy.__version__) >= '0.17.0': + expected = expected.astype('float64') + tm.assert_series_equal(result, expected) + + def test_rank_dense_method(self): + dtypes = ['O', 'f8', 'i8'] + in_out = [([1], [1]), + ([2], [1]), + ([0], [1]), + ([2, 2], [1, 1]), + ([1, 2, 3], [1, 2, 3]), + ([4, 2, 1], [3, 2, 1],), + ([1, 1, 5, 5, 3], [1, 1, 3, 3, 2]), + ([-5, -4, -3, -2, -1], [1, 2, 3, 4, 5])] + + for ser, exp in in_out: + for dtype in dtypes: + s = Series(ser).astype(dtype) + result = s.rank(method='dense') + expected = Series(exp).astype(result.dtype) + assert_series_equal(result, expected) + + def test_rank_descending(self): + dtypes = ['O', 'f8', 'i8'] + + for dtype, method in product(dtypes, self.results): + if 'i' in dtype: + s = self.s.dropna() + else: + s = self.s.astype(dtype) + + res = s.rank(ascending=False) + expected = (s.max() - s).rank() + assert_series_equal(res, expected) + + if method == 'first' and dtype == 'O': + continue + + expected = (s.max() - s).rank(method=method) + res2 = s.rank(method=method, ascending=False) + assert_series_equal(res2, expected) + + def test_rank_int(self): + s = self.s.dropna().astype('i8') + + for method, res in compat.iteritems(self.results): + result = s.rank(method=method) + expected = Series(res).dropna() + expected.index = result.index + assert_series_equal(result, expected) + + def test_rank_object_bug(self): + # GH 13445 + + # smoke tests + Series([np.nan] * 32).astype(object).rank(ascending=True) + Series([np.nan] * 32).astype(object).rank(ascending=False) diff --git a/pandas/tests/series/test_replace.py b/pandas/tests/series/test_replace.py index d80328ea3863a..2c07d87865f53 100644 --- a/pandas/tests/series/test_replace.py +++ b/pandas/tests/series/test_replace.py @@ -1,18 +1,17 @@ # coding=utf-8 # pylint: disable-msg=E1101,W0612 +import pytest + import numpy as np import pandas as pd -import pandas.lib as lib +import pandas._libs.lib as lib import pandas.util.testing as tm from .common import TestData -class TestSeriesReplace(TestData, tm.TestCase): - - _multiprocess_can_split_ = True - +class TestSeriesReplace(TestData): def test_replace(self): N = 100 ser = pd.Series(np.random.randn(N)) @@ -38,18 +37,18 @@ def test_replace(self): # replace list with a single value rs = ser.replace([np.nan, 'foo', 'bar'], -1) - self.assertTrue((rs[:5] == -1).all()) - self.assertTrue((rs[6:10] == -1).all()) - self.assertTrue((rs[20:30] == -1).all()) - self.assertTrue((pd.isnull(ser[:5])).all()) + assert (rs[:5] == -1).all() + assert (rs[6:10] == -1).all() + assert (rs[20:30] == -1).all() + assert (pd.isna(ser[:5])).all() # replace with different values rs = ser.replace({np.nan: -1, 'foo': -2, 'bar': -3}) - self.assertTrue((rs[:5] == -1).all()) - self.assertTrue((rs[6:10] == -2).all()) - self.assertTrue((rs[20:30] == -3).all()) - self.assertTrue((pd.isnull(ser[:5])).all()) + assert (rs[:5] == -1).all() + assert (rs[6:10] == -2).all() + assert (rs[20:30] == -3).all() + assert (pd.isna(ser[:5])).all() # replace with different values with 2 lists rs2 = ser.replace([np.nan, 'foo', 'bar'], [-1, -2, -3]) @@ -58,9 +57,9 @@ def test_replace(self): # replace inplace ser.replace([np.nan, 'foo', 'bar'], -1, 
inplace=True) - self.assertTrue((ser[:5] == -1).all()) - self.assertTrue((ser[6:10] == -1).all()) - self.assertTrue((ser[20:30] == -1).all()) + assert (ser[:5] == -1).all() + assert (ser[6:10] == -1).all() + assert (ser[20:30] == -1).all() ser = pd.Series([np.nan, 0, np.inf]) tm.assert_series_equal(ser.replace(np.nan, 0), ser.fillna(0)) @@ -75,11 +74,11 @@ def test_replace(self): tm.assert_series_equal(ser.replace(np.nan, 0), ser.fillna(0)) # malformed - self.assertRaises(ValueError, ser.replace, [1, 2, 3], [np.nan, 0]) + pytest.raises(ValueError, ser.replace, [1, 2, 3], [np.nan, 0]) # make sure that we aren't just masking a TypeError because bools don't # implement indexing - with tm.assertRaisesRegexp(TypeError, 'Cannot compare types .+'): + with tm.assert_raises_regex(TypeError, 'Cannot compare types .+'): ser.replace([1, 2], [np.nan, 0]) ser = pd.Series([0, 1, 2, 3, 4]) @@ -120,7 +119,7 @@ def test_replace_with_single_list(self): # make sure things don't get corrupted when fillna call fails s = ser.copy() - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): s.replace([1, 2, 3], inplace=True, method='crash_cymbal') tm.assert_series_equal(s, ser) @@ -134,8 +133,8 @@ def check_replace(to_rep, val, expected): tm.assert_series_equal(expected, r) tm.assert_series_equal(expected, sc) - # should NOT upcast to float - e = pd.Series([0, 1, 2, 3, 4]) + # MUST upcast to float + e = pd.Series([0., 1., 2., 3., 4.]) tr, v = [3], [3.0] check_replace(tr, v, e) @@ -154,8 +153,8 @@ def check_replace(to_rep, val, expected): tr, v = [3, 4], [3.5, pd.Timestamp('20130101')] check_replace(tr, v, e) - # casts to float - e = pd.Series([0, 1, 2, 3.5, 1]) + # casts to object + e = pd.Series([0, 1, 2, 3.5, True], dtype='object') tr, v = [3, 4], [3.5, True] check_replace(tr, v, e) @@ -187,7 +186,7 @@ def test_replace_bool_with_bool(self): def test_replace_with_dict_with_bool_keys(self): s = pd.Series([True, False, True]) - with tm.assertRaisesRegexp(TypeError, 'Cannot compare types .+'): + with tm.assert_raises_regex(TypeError, 'Cannot compare types .+'): s.replace({'asdf': 'asdb', True: 'yes'}) def test_replace2(self): @@ -201,18 +200,18 @@ def test_replace2(self): # replace list with a single value rs = ser.replace([np.nan, 'foo', 'bar'], -1) - self.assertTrue((rs[:5] == -1).all()) - self.assertTrue((rs[6:10] == -1).all()) - self.assertTrue((rs[20:30] == -1).all()) - self.assertTrue((pd.isnull(ser[:5])).all()) + assert (rs[:5] == -1).all() + assert (rs[6:10] == -1).all() + assert (rs[20:30] == -1).all() + assert (pd.isna(ser[:5])).all() # replace with different values rs = ser.replace({np.nan: -1, 'foo': -2, 'bar': -3}) - self.assertTrue((rs[:5] == -1).all()) - self.assertTrue((rs[6:10] == -2).all()) - self.assertTrue((rs[20:30] == -3).all()) - self.assertTrue((pd.isnull(ser[:5])).all()) + assert (rs[:5] == -1).all() + assert (rs[6:10] == -2).all() + assert (rs[20:30] == -3).all() + assert (pd.isna(ser[:5])).all() # replace with different values with 2 lists rs2 = ser.replace([np.nan, 'foo', 'bar'], [-1, -2, -3]) @@ -220,6 +219,33 @@ def test_replace2(self): # replace inplace ser.replace([np.nan, 'foo', 'bar'], -1, inplace=True) - self.assertTrue((ser[:5] == -1).all()) - self.assertTrue((ser[6:10] == -1).all()) - self.assertTrue((ser[20:30] == -1).all()) + assert (ser[:5] == -1).all() + assert (ser[6:10] == -1).all() + assert (ser[20:30] == -1).all() + + def test_replace_with_empty_dictlike(self): + # GH 15289 + s = pd.Series(list('abcd')) + tm.assert_series_equal(s, s.replace(dict())) + 
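The two flipped expectations above pin down the `Series.replace` casting rules: replacing an int with a float must upcast the whole result to float64, while a mixed float/bool replacement cannot stay numeric and falls back to object. The asserted behavior as a standalone sketch:

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4])

# int -> float replacement upcasts the whole Series to float64
assert s.replace([3], [3.0]).dtype == 'float64'    # [0., 1., 2., 3., 4.]

# float + bool replacements cannot stay numeric, so the result is object
assert s.replace([3, 4], [3.5, True]).dtype == 'object'  # [0, 1, 2, 3.5, True]
```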
tm.assert_series_equal(s, s.replace(pd.Series([]))) + + def test_replace_string_with_number(self): + # GH 15743 + s = pd.Series([1, 2, 3]) + result = s.replace('2', np.nan) + expected = pd.Series([1, 2, 3]) + tm.assert_series_equal(expected, result) + + def test_replace_unicode_with_number(self): + # GH 15743 + s = pd.Series([1, 2, 3]) + result = s.replace(u'2', np.nan) + expected = pd.Series([1, 2, 3]) + tm.assert_series_equal(expected, result) + + def test_replace_mixed_types_with_string(self): + # Testing mixed + s = pd.Series([1, 2, 3, '4', 4, 5]) + result = s.replace([2, '4'], np.nan) + expected = pd.Series([1, np.nan, 3, np.nan, 4, 5]) + tm.assert_series_equal(expected, result) diff --git a/pandas/tests/series/test_repr.py b/pandas/tests/series/test_repr.py index af52f6e712e61..c22e2ca8e0dc8 100644 --- a/pandas/tests/series/test_repr.py +++ b/pandas/tests/series/test_repr.py @@ -3,22 +3,22 @@ from datetime import datetime, timedelta +import sys + import numpy as np import pandas as pd -from pandas import (Index, Series, DataFrame, date_range) +from pandas import (Index, Series, DataFrame, date_range, option_context) from pandas.core.index import MultiIndex -from pandas.compat import StringIO, lrange, range, u +from pandas.compat import lrange, range, u from pandas import compat import pandas.util.testing as tm from .common import TestData -class TestSeriesRepr(TestData, tm.TestCase): - - _multiprocess_can_split_ = True +class TestSeriesRepr(TestData): def test_multilevel_name_print(self): index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', @@ -34,24 +34,29 @@ def test_multilevel_name_print(self): "qux one 7", " two 8", " three 9", "Name: sth, dtype: int64"] expected = "\n".join(expected) - self.assertEqual(repr(s), expected) + assert repr(s) == expected def test_name_printing(self): - # test small series + # Test small Series. s = Series([0, 1, 2]) + s.name = "test" - self.assertIn("Name: test", repr(s)) + assert "Name: test" in repr(s) + s.name = None - self.assertNotIn("Name:", repr(s)) - # test big series (diff code path) + assert "Name:" not in repr(s) + + # Test big Series (diff code path). 
s = Series(lrange(0, 1000)) + s.name = "test" - self.assertIn("Name: test", repr(s)) + assert "Name: test" in repr(s) + s.name = None - self.assertNotIn("Name:", repr(s)) + assert "Name:" not in repr(s) s = Series(index=date_range('20010101', '20020101'), name='test') - self.assertIn("Name: test", repr(s)) + assert "Name: test" in repr(s) def test_repr(self): str(self.ts) @@ -90,44 +95,39 @@ def test_repr(self): # 0 as name ser = Series(np.random.randn(100), name=0) rep_str = repr(ser) - self.assertIn("Name: 0", rep_str) + assert "Name: 0" in rep_str # tidy repr ser = Series(np.random.randn(1001), name=0) rep_str = repr(ser) - self.assertIn("Name: 0", rep_str) + assert "Name: 0" in rep_str ser = Series(["a\n\r\tb"], name="a\n\r\td", index=["a\n\r\tf"]) - self.assertFalse("\t" in repr(ser)) - self.assertFalse("\r" in repr(ser)) - self.assertFalse("a\n" in repr(ser)) + assert "\t" not in repr(ser) + assert "\r" not in repr(ser) + assert "a\n" not in repr(ser) # with empty series (#4651) s = Series([], dtype=np.int64, name='foo') - self.assertEqual(repr(s), 'Series([], Name: foo, dtype: int64)') + assert repr(s) == 'Series([], Name: foo, dtype: int64)' s = Series([], dtype=np.int64, name=None) - self.assertEqual(repr(s), 'Series([], dtype: int64)') + assert repr(s) == 'Series([], dtype: int64)' def test_tidy_repr(self): a = Series([u("\u05d0")] * 1000) a.name = 'title1' repr(a) # should not raise exception + @tm.capture_stderr def test_repr_bool_fails(self): s = Series([DataFrame(np.random.randn(2, 2)) for i in range(5)]) - import sys + # It works (with no Cython exception barf)! + repr(s) - buf = StringIO() - tmp = sys.stderr - sys.stderr = buf - try: - # it works (with no Cython exception barf)! - repr(s) - finally: - sys.stderr = tmp - self.assertEqual(buf.getvalue(), '') + output = sys.stderr.getvalue() + assert output == '' def test_repr_name_iterable_indexable(self): s = Series([1, 2, 3], name=np.int64(3)) @@ -148,7 +148,7 @@ def test_repr_should_return_str(self): data = [8, 5, 3, 5] index1 = [u("\u03c3"), u("\u03c4"), u("\u03c5"), u("\u03c6")] df = Series(data, index=index1) - self.assertTrue(type(df.__repr__() == str)) # both py2 / 3 + assert type(df.__repr__() == str) # both py2 / 3 def test_repr_max_rows(self): # GH 6863 @@ -176,7 +176,25 @@ def test_timeseries_repr_object_dtype(self): repr(ts) ts = tm.makeTimeSeries(1000) - self.assertTrue(repr(ts).splitlines()[-1].startswith('Freq:')) + assert repr(ts).splitlines()[-1].startswith('Freq:') ts2 = ts.iloc[np.random.randint(0, len(ts) - 1, 400)] repr(ts2).splitlines()[-1] + + def test_latex_repr(self): + result = r"""\begin{tabular}{ll} +\toprule +{} & 0 \\ +\midrule +0 & $\alpha$ \\ +1 & b \\ +2 & c \\ +\bottomrule +\end{tabular} +""" + with option_context('display.latex.escape', False, + 'display.latex.repr', True): + s = Series([r'$\alpha$', 'b', 'c']) + assert result == s._repr_latex_() + + assert s._repr_latex_() is None diff --git a/pandas/tests/series/test_sorting.py b/pandas/tests/series/test_sorting.py index fb3817eb84acd..40b0280de3719 100644 --- a/pandas/tests/series/test_sorting.py +++ b/pandas/tests/series/test_sorting.py @@ -1,58 +1,47 @@ # coding=utf-8 +import pytest + import numpy as np import random -from pandas import (DataFrame, Series, MultiIndex) +from pandas import DataFrame, Series, MultiIndex, IntervalIndex -from pandas.util.testing import (assert_series_equal, assert_almost_equal) +from pandas.util.testing import assert_series_equal, assert_almost_equal import pandas.util.testing as tm from .common import 
TestData -class TestSeriesSorting(TestData, tm.TestCase): - - _multiprocess_can_split_ = True - - def test_sort(self): +class TestSeriesSorting(TestData): + def test_sortlevel_deprecated(self): ts = self.ts.copy() - # 9816 deprecated - with tm.assert_produces_warning(FutureWarning): - ts.sort() # sorts inplace - self.assert_series_equal(ts, self.ts.sort_values()) + # see gh-9816 with tm.assert_produces_warning(FutureWarning): ts.sortlevel() - def test_order(self): - - # 9816 deprecated - with tm.assert_produces_warning(FutureWarning): - result = self.ts.order() - self.assert_series_equal(result, self.ts.sort_values()) - def test_sort_values(self): # check indexes are reordered corresponding with the values ser = Series([3, 2, 4, 1], ['A', 'B', 'C', 'D']) expected = Series([1, 2, 3, 4], ['D', 'B', 'A', 'C']) result = ser.sort_values() - self.assert_series_equal(expected, result) + tm.assert_series_equal(expected, result) ts = self.ts.copy() ts[:5] = np.NaN vals = ts.values result = ts.sort_values() - self.assertTrue(np.isnan(result[-5:]).all()) - self.assert_numpy_array_equal(result[:-5].values, np.sort(vals[5:])) + assert np.isnan(result[-5:]).all() + tm.assert_numpy_array_equal(result[:-5].values, np.sort(vals[5:])) # na_position result = ts.sort_values(na_position='first') - self.assertTrue(np.isnan(result[:5]).all()) - self.assert_numpy_array_equal(result[5:].values, np.sort(vals[5:])) + assert np.isnan(result[:5]).all() + tm.assert_numpy_array_equal(result[5:].values, np.sort(vals[5:])) # something object-type ser = Series(['A', 'B'], [1, 2]) @@ -66,12 +55,31 @@ def test_sort_values(self): ordered = ts.sort_values(ascending=False, na_position='first') assert_almost_equal(expected, ordered.valid().values) + # ascending=[False] should behave the same as ascending=False + ordered = ts.sort_values(ascending=[False]) + expected = ts.sort_values(ascending=False) + assert_series_equal(expected, ordered) + ordered = ts.sort_values(ascending=[False], na_position='first') + expected = ts.sort_values(ascending=False, na_position='first') + assert_series_equal(expected, ordered) + + pytest.raises(ValueError, + lambda: ts.sort_values(ascending=None)) + pytest.raises(ValueError, + lambda: ts.sort_values(ascending=[])) + pytest.raises(ValueError, + lambda: ts.sort_values(ascending=[1, 2, 3])) + pytest.raises(ValueError, + lambda: ts.sort_values(ascending=[False, False])) + pytest.raises(ValueError, + lambda: ts.sort_values(ascending='foobar')) + # inplace=True ts = self.ts.copy() ts.sort_values(ascending=False, inplace=True) - self.assert_series_equal(ts, self.ts.sort_values(ascending=False)) - self.assert_index_equal(ts.index, - self.ts.sort_values(ascending=False).index) + tm.assert_series_equal(ts, self.ts.sort_values(ascending=False)) + tm.assert_index_equal(ts.index, + self.ts.sort_values(ascending=False).index) # GH 5856/5853 # Series.sort_values operating on a view @@ -81,7 +89,7 @@ def test_sort_values(self): def f(): s.sort_values(inplace=True) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) def test_sort_index(self): rindex = list(self.ts.index) @@ -104,13 +112,13 @@ def test_sort_index(self): sorted_series = random_order.sort_index(axis=0) assert_series_equal(sorted_series, self.ts) - self.assertRaises(ValueError, lambda: random_order.sort_values(axis=1)) + pytest.raises(ValueError, lambda: random_order.sort_values(axis=1)) sorted_series = random_order.sort_index(level=0, axis=0) assert_series_equal(sorted_series, self.ts) - self.assertRaises(ValueError, - lambda: 
random_order.sort_index(level=0, axis=1)) + pytest.raises(ValueError, + lambda: random_order.sort_index(level=0, axis=1)) def test_sort_index_inplace(self): @@ -121,16 +129,17 @@ def test_sort_index_inplace(self): # descending random_order = self.ts.reindex(rindex) result = random_order.sort_index(ascending=False, inplace=True) - self.assertIs(result, None, - msg='sort_index() inplace should return None') - assert_series_equal(random_order, self.ts.reindex(self.ts.index[::-1])) + + assert result is None + tm.assert_series_equal(random_order, self.ts.reindex( + self.ts.index[::-1])) # ascending random_order = self.ts.reindex(rindex) result = random_order.sort_index(ascending=True, inplace=True) - self.assertIs(result, None, - msg='sort_index() inplace should return None') - assert_series_equal(random_order, self.ts) + + assert result is None + tm.assert_series_equal(random_order, self.ts) def test_sort_index_multiindex(self): @@ -171,3 +180,18 @@ def test_sort_index_na_position(self): expected_series_last = Series(index=[1, 2, 3, 3, 4, np.nan]) index_sorted_series = series.sort_index(na_position='last') assert_series_equal(expected_series_last, index_sorted_series) + + def test_sort_index_intervals(self): + s = Series([np.nan, 1, 2, 3], IntervalIndex.from_arrays( + [0, 1, 2, 3], + [1, 2, 3, 4])) + + result = s.sort_index() + expected = s + assert_series_equal(result, expected) + + result = s.sort_index(ascending=False) + expected = Series([3, 2, 1, np.nan], IntervalIndex.from_arrays( + [3, 2, 1, 0], + [4, 3, 2, 1])) + assert_series_equal(result, expected) diff --git a/pandas/tests/series/test_subclass.py b/pandas/tests/series/test_subclass.py index 5bcf258020349..37c8d7343f7f1 100644 --- a/pandas/tests/series/test_subclass.py +++ b/pandas/tests/series/test_subclass.py @@ -6,67 +6,63 @@ import pandas.util.testing as tm -class TestSeriesSubclassing(tm.TestCase): - - _multiprocess_can_split_ = True +class TestSeriesSubclassing(object): def test_indexing_sliced(self): s = tm.SubclassedSeries([1, 2, 3, 4], index=list('abcd')) res = s.loc[['a', 'b']] exp = tm.SubclassedSeries([1, 2], index=list('ab')) tm.assert_series_equal(res, exp) - tm.assertIsInstance(res, tm.SubclassedSeries) + assert isinstance(res, tm.SubclassedSeries) res = s.iloc[[2, 3]] exp = tm.SubclassedSeries([3, 4], index=list('cd')) tm.assert_series_equal(res, exp) - tm.assertIsInstance(res, tm.SubclassedSeries) + assert isinstance(res, tm.SubclassedSeries) res = s.loc[['a', 'b']] exp = tm.SubclassedSeries([1, 2], index=list('ab')) tm.assert_series_equal(res, exp) - tm.assertIsInstance(res, tm.SubclassedSeries) + assert isinstance(res, tm.SubclassedSeries) def test_to_frame(self): s = tm.SubclassedSeries([1, 2, 3, 4], index=list('abcd'), name='xxx') res = s.to_frame() exp = tm.SubclassedDataFrame({'xxx': [1, 2, 3, 4]}, index=list('abcd')) tm.assert_frame_equal(res, exp) - tm.assertIsInstance(res, tm.SubclassedDataFrame) - + assert isinstance(res, tm.SubclassedDataFrame) -class TestSparseSeriesSubclassing(tm.TestCase): - _multiprocess_can_split_ = True +class TestSparseSeriesSubclassing(object): def test_subclass_sparse_slice(self): # int64 s = tm.SubclassedSparseSeries([1, 2, 3, 4, 5]) exp = tm.SubclassedSparseSeries([2, 3, 4], index=[1, 2, 3]) tm.assert_sp_series_equal(s.loc[1:3], exp) - self.assertEqual(s.loc[1:3].dtype, np.int64) + assert s.loc[1:3].dtype == np.int64 exp = tm.SubclassedSparseSeries([2, 3], index=[1, 2]) tm.assert_sp_series_equal(s.iloc[1:3], exp) - self.assertEqual(s.iloc[1:3].dtype, np.int64) + assert 
s.iloc[1:3].dtype == np.int64 exp = tm.SubclassedSparseSeries([2, 3], index=[1, 2]) tm.assert_sp_series_equal(s[1:3], exp) - self.assertEqual(s[1:3].dtype, np.int64) + assert s[1:3].dtype == np.int64 # float64 s = tm.SubclassedSparseSeries([1., 2., 3., 4., 5.]) exp = tm.SubclassedSparseSeries([2., 3., 4.], index=[1, 2, 3]) tm.assert_sp_series_equal(s.loc[1:3], exp) - self.assertEqual(s.loc[1:3].dtype, np.float64) + assert s.loc[1:3].dtype == np.float64 exp = tm.SubclassedSparseSeries([2., 3.], index=[1, 2]) tm.assert_sp_series_equal(s.iloc[1:3], exp) - self.assertEqual(s.iloc[1:3].dtype, np.float64) + assert s.iloc[1:3].dtype == np.float64 exp = tm.SubclassedSparseSeries([2., 3.], index=[1, 2]) tm.assert_sp_series_equal(s[1:3], exp) - self.assertEqual(s[1:3].dtype, np.float64) + assert s[1:3].dtype == np.float64 def test_subclass_sparse_addition(self): s1 = tm.SubclassedSparseSeries([1, 3, 5]) diff --git a/pandas/tests/series/test_timeseries.py b/pandas/tests/series/test_timeseries.py index df7ab24430746..6018260708335 100644 --- a/pandas/tests/series/test_timeseries.py +++ b/pandas/tests/series/test_timeseries.py @@ -1,23 +1,39 @@ # coding=utf-8 # pylint: disable-msg=E1101,W0612 -from datetime import datetime +import pytest import numpy as np +from datetime import datetime, timedelta, time -from pandas import Index, Series, date_range, NaT -from pandas.tseries.index import DatetimeIndex -from pandas.tseries.offsets import BDay, BMonthEnd -from pandas.tseries.tdi import TimedeltaIndex - -from pandas.util.testing import assert_series_equal, assert_almost_equal +import pandas as pd import pandas.util.testing as tm +from pandas._libs.tslib import iNaT +from pandas.compat import lrange, StringIO, product +from pandas.core.indexes.timedeltas import TimedeltaIndex +from pandas.core.indexes.datetimes import DatetimeIndex +from pandas.tseries.offsets import BDay, BMonthEnd +from pandas import (Index, Series, date_range, NaT, concat, DataFrame, + Timestamp, to_datetime, offsets, + timedelta_range) +from pandas.util.testing import (assert_series_equal, assert_almost_equal, + assert_frame_equal, _skip_if_has_locale) from pandas.tests.series.common import TestData -class TestSeriesTimeSeries(TestData, tm.TestCase): - _multiprocess_can_split_ = True +def _simple_ts(start, end, freq='D'): + rng = date_range(start, end, freq=freq) + return Series(np.random.randn(len(rng)), index=rng) + + +def assert_range_equal(left, right): + assert (left.equals(right)) + assert (left.freq == right.freq) + assert (left.tz == right.tz) + + +class TestTimeSeries(TestData): def test_shift(self): shifted = self.ts.shift(1) @@ -59,7 +75,7 @@ def test_shift(self): assert_series_equal(shifted2, shifted3) assert_series_equal(ps, shifted2.shift(-1, 'B')) - self.assertRaises(ValueError, ps.shift, freq='D') + pytest.raises(ValueError, ps.shift, freq='D') # legacy support shifted4 = ps.shift(1, freq='B') @@ -90,7 +106,23 @@ def test_shift(self): # incompat tz s2 = Series(date_range('2000-01-01 09:00:00', periods=5, tz='CET'), name='foo') - self.assertRaises(ValueError, lambda: s - s2) + pytest.raises(ValueError, lambda: s - s2) + + def test_shift2(self): + ts = Series(np.random.randn(5), + index=date_range('1/1/2000', periods=5, freq='H')) + + result = ts.shift(1, freq='5T') + exp_index = ts.index.shift(1, freq='5T') + tm.assert_index_equal(result.index, exp_index) + + # GH #1063, multiple of same base + result = ts.shift(1, freq='4H') + exp_index = ts.index + offsets.Hour(4) + tm.assert_index_equal(result.index, exp_index) + + idx 
= DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-04']) + pytest.raises(ValueError, idx.shift, 1) def test_shift_dst(self): # GH 13926 @@ -99,25 +131,25 @@ def test_shift_dst(self): res = s.shift(0) tm.assert_series_equal(res, s) - self.assertEqual(res.dtype, 'datetime64[ns, US/Eastern]') + assert res.dtype == 'datetime64[ns, US/Eastern]' res = s.shift(1) exp_vals = [NaT] + dates.asobject.values.tolist()[:9] exp = Series(exp_vals) tm.assert_series_equal(res, exp) - self.assertEqual(res.dtype, 'datetime64[ns, US/Eastern]') + assert res.dtype == 'datetime64[ns, US/Eastern]' res = s.shift(-2) exp_vals = dates.asobject.values.tolist()[2:] + [NaT, NaT] exp = Series(exp_vals) tm.assert_series_equal(res, exp) - self.assertEqual(res.dtype, 'datetime64[ns, US/Eastern]') + assert res.dtype == 'datetime64[ns, US/Eastern]' for ex in [10, -10, 20, -20]: res = s.shift(ex) exp = Series([NaT] * 10, dtype='datetime64[ns, US/Eastern]') tm.assert_series_equal(res, exp) - self.assertEqual(res.dtype, 'datetime64[ns, US/Eastern]') + assert res.dtype == 'datetime64[ns, US/Eastern]' def test_tshift(self): # PeriodIndex @@ -133,7 +165,7 @@ def test_tshift(self): shifted3 = ps.tshift(freq=BDay()) assert_series_equal(shifted, shifted3) - self.assertRaises(ValueError, ps.tshift, freq='M') + pytest.raises(ValueError, ps.tshift, freq='M') # DatetimeIndex shifted = self.ts.tshift(1) @@ -152,7 +184,7 @@ def test_tshift(self): assert_series_equal(unshifted, inferred_ts) no_freq = self.ts[[0, 5, 7]] - self.assertRaises(ValueError, no_freq.tshift) + pytest.raises(ValueError, no_freq.tshift) def test_truncate(self): offset = BDay() @@ -200,209 +232,9 @@ def test_truncate(self): truncated = ts.truncate(before=self.ts.index[-1] + offset) assert (len(truncated) == 0) - self.assertRaises(ValueError, ts.truncate, - before=self.ts.index[-1] + offset, - after=self.ts.index[0] - offset) - - def test_getitem_setitem_datetimeindex(self): - from pandas import date_range - - N = 50 - # testing with timezone, GH #2785 - rng = date_range('1/1/1990', periods=N, freq='H', tz='US/Eastern') - ts = Series(np.random.randn(N), index=rng) - - result = ts["1990-01-01 04:00:00"] - expected = ts[4] - self.assertEqual(result, expected) - - result = ts.copy() - result["1990-01-01 04:00:00"] = 0 - result["1990-01-01 04:00:00"] = ts[4] - assert_series_equal(result, ts) - - result = ts["1990-01-01 04:00:00":"1990-01-01 07:00:00"] - expected = ts[4:8] - assert_series_equal(result, expected) - - result = ts.copy() - result["1990-01-01 04:00:00":"1990-01-01 07:00:00"] = 0 - result["1990-01-01 04:00:00":"1990-01-01 07:00:00"] = ts[4:8] - assert_series_equal(result, ts) - - lb = "1990-01-01 04:00:00" - rb = "1990-01-01 07:00:00" - result = ts[(ts.index >= lb) & (ts.index <= rb)] - expected = ts[4:8] - assert_series_equal(result, expected) - - # repeat all the above with naive datetimes - result = ts[datetime(1990, 1, 1, 4)] - expected = ts[4] - self.assertEqual(result, expected) - - result = ts.copy() - result[datetime(1990, 1, 1, 4)] = 0 - result[datetime(1990, 1, 1, 4)] = ts[4] - assert_series_equal(result, ts) - - result = ts[datetime(1990, 1, 1, 4):datetime(1990, 1, 1, 7)] - expected = ts[4:8] - assert_series_equal(result, expected) - - result = ts.copy() - result[datetime(1990, 1, 1, 4):datetime(1990, 1, 1, 7)] = 0 - result[datetime(1990, 1, 1, 4):datetime(1990, 1, 1, 7)] = ts[4:8] - assert_series_equal(result, ts) - - lb = datetime(1990, 1, 1, 4) - rb = datetime(1990, 1, 1, 7) - result = ts[(ts.index >= lb) & (ts.index <= rb)] - expected = ts[4:8] - 
assert_series_equal(result, expected) - - result = ts[ts.index[4]] - expected = ts[4] - self.assertEqual(result, expected) - - result = ts[ts.index[4:8]] - expected = ts[4:8] - assert_series_equal(result, expected) - - result = ts.copy() - result[ts.index[4:8]] = 0 - result[4:8] = ts[4:8] - assert_series_equal(result, ts) - - # also test partial date slicing - result = ts["1990-01-02"] - expected = ts[24:48] - assert_series_equal(result, expected) - - result = ts.copy() - result["1990-01-02"] = 0 - result["1990-01-02"] = ts[24:48] - assert_series_equal(result, ts) - - def test_getitem_setitem_datetime_tz_pytz(self): - tm._skip_if_no_pytz() - from pytz import timezone as tz - - from pandas import date_range - - N = 50 - # testing with timezone, GH #2785 - rng = date_range('1/1/1990', periods=N, freq='H', tz='US/Eastern') - ts = Series(np.random.randn(N), index=rng) - - # also test Timestamp tz handling, GH #2789 - result = ts.copy() - result["1990-01-01 09:00:00+00:00"] = 0 - result["1990-01-01 09:00:00+00:00"] = ts[4] - assert_series_equal(result, ts) - - result = ts.copy() - result["1990-01-01 03:00:00-06:00"] = 0 - result["1990-01-01 03:00:00-06:00"] = ts[4] - assert_series_equal(result, ts) - - # repeat with datetimes - result = ts.copy() - result[datetime(1990, 1, 1, 9, tzinfo=tz('UTC'))] = 0 - result[datetime(1990, 1, 1, 9, tzinfo=tz('UTC'))] = ts[4] - assert_series_equal(result, ts) - - result = ts.copy() - - # comparison dates with datetime MUST be localized! - date = tz('US/Central').localize(datetime(1990, 1, 1, 3)) - result[date] = 0 - result[date] = ts[4] - assert_series_equal(result, ts) - - def test_getitem_setitem_datetime_tz_dateutil(self): - tm._skip_if_no_dateutil() - from dateutil.tz import tzutc - from pandas.tslib import _dateutil_gettz as gettz - - tz = lambda x: tzutc() if x == 'UTC' else gettz( - x) # handle special case for utc in dateutil - - from pandas import date_range - - N = 50 - - # testing with timezone, GH #2785 - rng = date_range('1/1/1990', periods=N, freq='H', - tz='America/New_York') - ts = Series(np.random.randn(N), index=rng) - - # also test Timestamp tz handling, GH #2789 - result = ts.copy() - result["1990-01-01 09:00:00+00:00"] = 0 - result["1990-01-01 09:00:00+00:00"] = ts[4] - assert_series_equal(result, ts) - - result = ts.copy() - result["1990-01-01 03:00:00-06:00"] = 0 - result["1990-01-01 03:00:00-06:00"] = ts[4] - assert_series_equal(result, ts) - - # repeat with datetimes - result = ts.copy() - result[datetime(1990, 1, 1, 9, tzinfo=tz('UTC'))] = 0 - result[datetime(1990, 1, 1, 9, tzinfo=tz('UTC'))] = ts[4] - assert_series_equal(result, ts) - - result = ts.copy() - result[datetime(1990, 1, 1, 3, tzinfo=tz('America/Chicago'))] = 0 - result[datetime(1990, 1, 1, 3, tzinfo=tz('America/Chicago'))] = ts[4] - assert_series_equal(result, ts) - - def test_getitem_setitem_periodindex(self): - from pandas import period_range - - N = 50 - rng = period_range('1/1/1990', periods=N, freq='H') - ts = Series(np.random.randn(N), index=rng) - - result = ts["1990-01-01 04"] - expected = ts[4] - self.assertEqual(result, expected) - - result = ts.copy() - result["1990-01-01 04"] = 0 - result["1990-01-01 04"] = ts[4] - assert_series_equal(result, ts) - - result = ts["1990-01-01 04":"1990-01-01 07"] - expected = ts[4:8] - assert_series_equal(result, expected) - - result = ts.copy() - result["1990-01-01 04":"1990-01-01 07"] = 0 - result["1990-01-01 04":"1990-01-01 07"] = ts[4:8] - assert_series_equal(result, ts) - - lb = "1990-01-01 04" - rb = "1990-01-01 07" - 
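The hunks being deleted here exercise string-label indexing on datetime and period indexes; the behavior itself is unchanged by this PR. A compact standalone sketch of it (a naive hourly index is assumed for illustration):

```python
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/1990', periods=50, freq='H')
ts = pd.Series(np.random.randn(50), index=rng)

# a full timestamp string selects a single point...
assert ts["1990-01-01 04:00:00"] == ts[4]

# ...a string slice selects the closed label range...
assert ts["1990-01-01 04:00:00":"1990-01-01 07:00:00"].equals(ts[4:8])

# ...and a partial string selects the whole enclosed period (here, a day)
assert len(ts["1990-01-02"]) == 24
```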
result = ts[(ts.index >= lb) & (ts.index <= rb)] - expected = ts[4:8] - assert_series_equal(result, expected) - - # GH 2782 - result = ts[ts.index[4]] - expected = ts[4] - self.assertEqual(result, expected) - - result = ts[ts.index[4:8]] - expected = ts[4:8] - assert_series_equal(result, expected) - - result = ts.copy() - result[ts.index[4:8]] = 0 - result[4:8] = ts[4:8] - assert_series_equal(result, ts) + pytest.raises(ValueError, ts.truncate, + before=self.ts.index[-1] + offset, + after=self.ts.index[0] - offset) def test_asfreq(self): ts = Series([0., 1., 2.], index=[datetime(2009, 10, 30), datetime( @@ -410,25 +242,33 @@ def test_asfreq(self): daily_ts = ts.asfreq('B') monthly_ts = daily_ts.asfreq('BM') - assert_series_equal(monthly_ts, ts) + tm.assert_series_equal(monthly_ts, ts) daily_ts = ts.asfreq('B', method='pad') monthly_ts = daily_ts.asfreq('BM') - assert_series_equal(monthly_ts, ts) + tm.assert_series_equal(monthly_ts, ts) daily_ts = ts.asfreq(BDay()) monthly_ts = daily_ts.asfreq(BMonthEnd()) - assert_series_equal(monthly_ts, ts) + tm.assert_series_equal(monthly_ts, ts) result = ts[:0].asfreq('M') - self.assertEqual(len(result), 0) - self.assertIsNot(result, ts) + assert len(result) == 0 + assert result is not ts daily_ts = ts.asfreq('D', fill_value=-1) result = daily_ts.value_counts().sort_index() expected = Series([60, 1, 1, 1], index=[-1.0, 2.0, 1.0, 0.0]).sort_index() - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) + + def test_asfreq_datetimeindex_empty_series(self): + # GH 14320 + expected = Series(index=pd.DatetimeIndex( + ["2016-09-29 11:00"])).asfreq('H') + result = Series(index=pd.DatetimeIndex(["2016-09-29 11:00"]), + data=[3]).asfreq('H') + tm.assert_index_equal(expected.index, result.index) def test_diff(self): # Just run the function @@ -440,7 +280,7 @@ def test_diff(self): s = Series([a, b]) rs = s.diff() - self.assertEqual(rs[1], 1) + assert rs[1] == 1 # neg n rs = self.ts.diff(-1) @@ -503,10 +343,10 @@ def test_autocorr(self): # corr() with lag needs Series of at least length 2 if len(self.ts) <= 2: - self.assertTrue(np.isnan(corr1)) - self.assertTrue(np.isnan(corr2)) + assert np.isnan(corr1) + assert np.isnan(corr2) else: - self.assertEqual(corr1, corr2) + assert corr1 == corr2 # Choose a random lag between 1 and length of Series - 2 # and compare the result with the Series corr() function @@ -516,34 +356,34 @@ def test_autocorr(self): # corr() with lag needs Series of at least length 2 if len(self.ts) <= 2: - self.assertTrue(np.isnan(corr1)) - self.assertTrue(np.isnan(corr2)) + assert np.isnan(corr1) + assert np.isnan(corr2) else: - self.assertEqual(corr1, corr2) + assert corr1 == corr2 def test_first_last_valid(self): ts = self.ts.copy() ts[:5] = np.NaN index = ts.first_valid_index() - self.assertEqual(index, ts.index[5]) + assert index == ts.index[5] ts[-5:] = np.NaN index = ts.last_valid_index() - self.assertEqual(index, ts.index[-6]) + assert index == ts.index[-6] ts[:] = np.nan - self.assertIsNone(ts.last_valid_index()) - self.assertIsNone(ts.first_valid_index()) + assert ts.last_valid_index() is None + assert ts.first_valid_index() is None ser = Series([], index=[]) - self.assertIsNone(ser.last_valid_index()) - self.assertIsNone(ser.first_valid_index()) + assert ser.last_valid_index() is None + assert ser.first_valid_index() is None # GH12800 empty = Series() - self.assertIsNone(empty.last_valid_index()) - self.assertIsNone(empty.first_valid_index()) + assert empty.last_valid_index() is None + assert 
empty.first_valid_index() is None def test_mpl_compat_hack(self): result = self.ts[:, np.newaxis] @@ -553,10 +393,8 @@ def test_mpl_compat_hack(self): def test_timeseries_coercion(self): idx = tm.makeDateIndex(10000) ser = Series(np.random.randn(len(idx)), idx.astype(object)) - with tm.assert_produces_warning(FutureWarning): - self.assertTrue(ser.is_time_series) - self.assertTrue(ser.index.is_all_dates) - self.assertIsInstance(ser.index, DatetimeIndex) + assert ser.index.is_all_dates + assert isinstance(ser.index, DatetimeIndex) def test_empty_series_ops(self): # see issue #13844 @@ -565,10 +403,532 @@ def test_empty_series_ops(self): assert_series_equal(a, a + b) assert_series_equal(a, a - b) assert_series_equal(a, b + a) - self.assertRaises(TypeError, lambda x, y: x - y, b, a) + pytest.raises(TypeError, lambda x, y: x - y, b, a) + + def test_contiguous_boolean_preserve_freq(self): + rng = date_range('1/1/2000', '3/1/2000', freq='B') + + mask = np.zeros(len(rng), dtype=bool) + mask[10:20] = True + + masked = rng[mask] + expected = rng[10:20] + assert expected.freq is not None + assert_range_equal(masked, expected) + + mask[22] = True + masked = rng[mask] + assert masked.freq is None + + def test_to_datetime_unit(self): + + epoch = 1370745748 + s = Series([epoch + t for t in range(20)]) + result = to_datetime(s, unit='s') + expected = Series([Timestamp('2013-06-09 02:42:28') + timedelta( + seconds=t) for t in range(20)]) + assert_series_equal(result, expected) + + s = Series([epoch + t for t in range(20)]).astype(float) + result = to_datetime(s, unit='s') + expected = Series([Timestamp('2013-06-09 02:42:28') + timedelta( + seconds=t) for t in range(20)]) + assert_series_equal(result, expected) + + s = Series([epoch + t for t in range(20)] + [iNaT]) + result = to_datetime(s, unit='s') + expected = Series([Timestamp('2013-06-09 02:42:28') + timedelta( + seconds=t) for t in range(20)] + [NaT]) + assert_series_equal(result, expected) + + s = Series([epoch + t for t in range(20)] + [iNaT]).astype(float) + result = to_datetime(s, unit='s') + expected = Series([Timestamp('2013-06-09 02:42:28') + timedelta( + seconds=t) for t in range(20)] + [NaT]) + assert_series_equal(result, expected) + + # GH13834 + s = Series([epoch + t for t in np.arange(0, 2, .25)] + + [iNaT]).astype(float) + result = to_datetime(s, unit='s') + expected = Series([Timestamp('2013-06-09 02:42:28') + timedelta( + seconds=t) for t in np.arange(0, 2, .25)] + [NaT]) + assert_series_equal(result, expected) + + s = concat([Series([epoch + t for t in range(20)] + ).astype(float), Series([np.nan])], + ignore_index=True) + result = to_datetime(s, unit='s') + expected = Series([Timestamp('2013-06-09 02:42:28') + timedelta( + seconds=t) for t in range(20)] + [NaT]) + assert_series_equal(result, expected) + + result = to_datetime([1, 2, 'NaT', pd.NaT, np.nan], unit='D') + expected = DatetimeIndex([Timestamp('1970-01-02'), + Timestamp('1970-01-03')] + ['NaT'] * 3) + tm.assert_index_equal(result, expected) + + with pytest.raises(ValueError): + to_datetime([1, 2, 'foo'], unit='D') + with pytest.raises(ValueError): + to_datetime([1, 2, 111111111], unit='D') + + # coerce we can process + expected = DatetimeIndex([Timestamp('1970-01-02'), + Timestamp('1970-01-03')] + ['NaT'] * 1) + result = to_datetime([1, 2, 'foo'], unit='D', errors='coerce') + tm.assert_index_equal(result, expected) + + result = to_datetime([1, 2, 111111111], unit='D', errors='coerce') + tm.assert_index_equal(result, expected) + + def test_series_ctor_datetime64(self): + rng 
= date_range('1/1/2000 00:00:00', '1/1/2000 1:59:50', freq='10s') + dates = np.asarray(rng) + + series = Series(dates) + assert np.issubdtype(series.dtype, np.dtype('M8[ns]')) + + def test_series_repr_nat(self): + series = Series([0, 1000, 2000, iNaT], dtype='M8[ns]') + + result = repr(series) + expected = ('0 1970-01-01 00:00:00.000000\n' + '1 1970-01-01 00:00:00.000001\n' + '2 1970-01-01 00:00:00.000002\n' + '3 NaT\n' + 'dtype: datetime64[ns]') + assert result == expected + + def test_asfreq_keep_index_name(self): + # GH #9854 + index_name = 'bar' + index = pd.date_range('20130101', periods=20, name=index_name) + df = pd.DataFrame([x for x in range(20)], columns=['foo'], index=index) + + assert index_name == df.index.name + assert index_name == df.asfreq('10D').index.name + + def test_promote_datetime_date(self): + rng = date_range('1/1/2000', periods=20) + ts = Series(np.random.randn(20), index=rng) + + ts_slice = ts[5:] + ts2 = ts_slice.copy() + ts2.index = [x.date() for x in ts2.index] + + result = ts + ts2 + result2 = ts2 + ts + expected = ts + ts[5:] + assert_series_equal(result, expected) + assert_series_equal(result2, expected) + + # test asfreq + result = ts2.asfreq('4H', method='ffill') + expected = ts[5:].asfreq('4H', method='ffill') + assert_series_equal(result, expected) + + result = rng.get_indexer(ts2.index) + expected = rng.get_indexer(ts_slice.index) + tm.assert_numpy_array_equal(result, expected) + + def test_asfreq_normalize(self): + rng = date_range('1/1/2000 09:30', periods=20) + norm = date_range('1/1/2000', periods=20) + vals = np.random.randn(20) + ts = Series(vals, index=rng) + + result = ts.asfreq('D', normalize=True) + norm = date_range('1/1/2000', periods=20) + expected = Series(vals, index=norm) + + assert_series_equal(result, expected) + + vals = np.random.randn(20, 3) + ts = DataFrame(vals, index=rng) + + result = ts.asfreq('D', normalize=True) + expected = DataFrame(vals, index=norm) + + assert_frame_equal(result, expected) + + def test_first_subset(self): + ts = _simple_ts('1/1/2000', '1/1/2010', freq='12h') + result = ts.first('10d') + assert len(result) == 20 + + ts = _simple_ts('1/1/2000', '1/1/2010') + result = ts.first('10d') + assert len(result) == 10 + + result = ts.first('3M') + expected = ts[:'3/31/2000'] + assert_series_equal(result, expected) + + result = ts.first('21D') + expected = ts[:21] + assert_series_equal(result, expected) + + result = ts[:0].first('3M') + assert_series_equal(result, ts[:0]) + + def test_last_subset(self): + ts = _simple_ts('1/1/2000', '1/1/2010', freq='12h') + result = ts.last('10d') + assert len(result) == 20 + + ts = _simple_ts('1/1/2000', '1/1/2010') + result = ts.last('10d') + assert len(result) == 10 + + result = ts.last('21D') + expected = ts['12/12/2009':] + assert_series_equal(result, expected) + + result = ts.last('21D') + expected = ts[-21:] + assert_series_equal(result, expected) + + result = ts[:0].last('3M') + assert_series_equal(result, ts[:0]) + + def test_format_pre_1900_dates(self): + rng = date_range('1/1/1850', '1/1/1950', freq='A-DEC') + rng.format() + ts = Series(1, index=rng) + repr(ts) + + def test_at_time(self): + rng = date_range('1/1/2000', '1/5/2000', freq='5min') + ts = Series(np.random.randn(len(rng)), index=rng) + rs = ts.at_time(rng[1]) + assert (rs.index.hour == rng[1].hour).all() + assert (rs.index.minute == rng[1].minute).all() + assert (rs.index.second == rng[1].second).all() + + result = ts.at_time('9:30') + expected = ts.at_time(time(9, 30)) + assert_series_equal(result, expected) + 
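`at_time`, exercised just above and continued below on a DataFrame, is pure wall-clock selection: it keeps every row whose timestamp carries the given time of day, regardless of date, and accepts either a string or a `datetime.time`. Standalone:

```python
import numpy as np
import pandas as pd
from datetime import time

rng = pd.date_range('1/1/2000', '1/5/2000', freq='5min')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

# string and datetime.time spellings are equivalent
assert ts.at_time('9:30').equals(ts.at_time(time(9, 30)))

# every selected row really is at 09:30, across all dates in the index
sel = ts.at_time('9:30')
assert (sel.index.hour == 9).all() and (sel.index.minute == 30).all()
```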
+ df = DataFrame(np.random.randn(len(rng), 3), index=rng) + + result = ts[time(9, 30)] + result_df = df.loc[time(9, 30)] + expected = ts[(rng.hour == 9) & (rng.minute == 30)] + exp_df = df[(rng.hour == 9) & (rng.minute == 30)] + + # expected.index = date_range('1/1/2000', '1/4/2000') + + assert_series_equal(result, expected) + tm.assert_frame_equal(result_df, exp_df) + + chunk = df.loc['1/4/2000':] + result = chunk.loc[time(9, 30)] + expected = result_df[-1:] + tm.assert_frame_equal(result, expected) + + # midnight, everything + rng = date_range('1/1/2000', '1/31/2000') + ts = Series(np.random.randn(len(rng)), index=rng) + + result = ts.at_time(time(0, 0)) + assert_series_equal(result, ts) + + # time doesn't exist + rng = date_range('1/1/2012', freq='23Min', periods=384) + ts = Series(np.random.randn(len(rng)), rng) + rs = ts.at_time('16:00') + assert len(rs) == 0 + + def test_between(self): + series = Series(date_range('1/1/2000', periods=10)) + left, right = series[[2, 7]] + + result = series.between(left, right) + expected = (series >= left) & (series <= right) + assert_series_equal(result, expected) + def test_between_time(self): + rng = date_range('1/1/2000', '1/5/2000', freq='5min') + ts = Series(np.random.randn(len(rng)), index=rng) + stime = time(0, 0) + etime = time(1, 0) + + close_open = product([True, False], [True, False]) + for inc_start, inc_end in close_open: + filtered = ts.between_time(stime, etime, inc_start, inc_end) + exp_len = 13 * 4 + 1 + if not inc_start: + exp_len -= 5 + if not inc_end: + exp_len -= 4 + + assert len(filtered) == exp_len + for rs in filtered.index: + t = rs.time() + if inc_start: + assert t >= stime + else: + assert t > stime + + if inc_end: + assert t <= etime + else: + assert t < etime + + result = ts.between_time('00:00', '01:00') + expected = ts.between_time(stime, etime) + assert_series_equal(result, expected) + + # across midnight + rng = date_range('1/1/2000', '1/5/2000', freq='5min') + ts = Series(np.random.randn(len(rng)), index=rng) + stime = time(22, 0) + etime = time(9, 0) + + close_open = product([True, False], [True, False]) + for inc_start, inc_end in close_open: + filtered = ts.between_time(stime, etime, inc_start, inc_end) + exp_len = (12 * 11 + 1) * 4 + 1 + if not inc_start: + exp_len -= 4 + if not inc_end: + exp_len -= 4 + + assert len(filtered) == exp_len + for rs in filtered.index: + t = rs.time() + if inc_start: + assert (t >= stime) or (t <= etime) + else: + assert (t > stime) or (t <= etime) + + if inc_end: + assert (t <= etime) or (t >= stime) + else: + assert (t < etime) or (t >= stime) + + def test_between_time_types(self): + # GH11818 + rng = date_range('1/1/2000', '1/5/2000', freq='5min') + pytest.raises(ValueError, rng.indexer_between_time, + datetime(2010, 1, 2, 1), datetime(2010, 1, 2, 5)) + + frame = DataFrame({'A': 0}, index=rng) + pytest.raises(ValueError, frame.between_time, + datetime(2010, 1, 2, 1), datetime(2010, 1, 2, 5)) + + series = Series(0, index=rng) + pytest.raises(ValueError, series.between_time, + datetime(2010, 1, 2, 1), datetime(2010, 1, 2, 5)) + + def test_between_time_formats(self): + # GH11818 + _skip_if_has_locale() + + rng = date_range('1/1/2000', '1/5/2000', freq='5min') + ts = DataFrame(np.random.randn(len(rng), 2), index=rng) + + strings = [("2:00", "2:30"), ("0200", "0230"), ("2:00am", "2:30am"), + ("0200am", "0230am"), ("2:00:00", "2:30:00"), + ("020000", "023000"), ("2:00:00am", "2:30:00am"), + ("020000am", "023000am")] + expected_length = 28 + + for time_string in strings: + assert 
len(ts.between_time(*time_string)) == expected_length + + def test_to_period(self): + from pandas.core.indexes.period import period_range + + ts = _simple_ts('1/1/2000', '1/1/2001') + + pts = ts.to_period() + exp = ts.copy() + exp.index = period_range('1/1/2000', '1/1/2001') + assert_series_equal(pts, exp) + + pts = ts.to_period('M') + exp.index = exp.index.asfreq('M') + tm.assert_index_equal(pts.index, exp.index.asfreq('M')) + assert_series_equal(pts, exp) + + # GH 7606 without freq + idx = DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', + '2011-01-04']) + exp_idx = pd.PeriodIndex(['2011-01-01', '2011-01-02', '2011-01-03', + '2011-01-04'], freq='D') + + s = Series(np.random.randn(4), index=idx) + expected = s.copy() + expected.index = exp_idx + assert_series_equal(s.to_period(), expected) + + df = DataFrame(np.random.randn(4, 4), index=idx, columns=idx) + expected = df.copy() + expected.index = exp_idx + assert_frame_equal(df.to_period(), expected) + + expected = df.copy() + expected.columns = exp_idx + assert_frame_equal(df.to_period(axis=1), expected) + + def test_groupby_count_dateparseerror(self): + dr = date_range(start='1/1/2012', freq='5min', periods=10) + + # BAD Example, datetimes first + s = Series(np.arange(10), index=[dr, lrange(10)]) + grouped = s.groupby(lambda x: x[1] % 2 == 0) + result = grouped.count() + + s = Series(np.arange(10), index=[lrange(10), dr]) + grouped = s.groupby(lambda x: x[0] % 2 == 0) + expected = grouped.count() + + assert_series_equal(result, expected) -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + def test_to_csv_numpy_16_bug(self): + frame = DataFrame({'a': date_range('1/1/2000', periods=10)}) + + buf = StringIO() + frame.to_csv(buf) + + result = buf.getvalue() + assert '2000-01-01' in result + + def test_series_map_box_timedelta(self): + # GH 11349 + s = Series(timedelta_range('1 day 1 s', periods=5, freq='h')) + + def f(x): + return x.total_seconds() + + s.map(f) + s.apply(f) + DataFrame(s).applymap(f) + + def test_asfreq_resample_set_correct_freq(self): + # GH5613 + # we test if .asfreq() and .resample() set the correct value for .freq + df = pd.DataFrame({'date': ["2012-01-01", "2012-01-02", "2012-01-03"], + 'col': [1, 2, 3]}) + df = df.set_index(pd.to_datetime(df.date)) + + # testing the settings before calling .asfreq() and .resample() + assert df.index.freq is None + assert df.index.inferred_freq == 'D' + + # does .asfreq() set .freq correctly? + assert df.asfreq('D').index.freq == 'D' + + # does .resample() set .freq correctly? 
+ assert df.resample('D').asfreq().index.freq == 'D' + + def test_pickle(self): + + # GH4606 + p = tm.round_trip_pickle(NaT) + assert p is NaT + + idx = pd.to_datetime(['2013-01-01', NaT, '2014-01-06']) + idx_p = tm.round_trip_pickle(idx) + assert idx_p[0] == idx[0] + assert idx_p[1] is NaT + assert idx_p[2] == idx[2] + + # GH11002 + # don't infer freq + idx = date_range('1750-1-1', '2050-1-1', freq='7D') + idx_p = tm.round_trip_pickle(idx) + tm.assert_index_equal(idx, idx_p) + + def test_setops_preserve_freq(self): + for tz in [None, 'Asia/Tokyo', 'US/Eastern']: + rng = date_range('1/1/2000', '1/1/2002', name='idx', tz=tz) + + result = rng[:50].union(rng[50:100]) + assert result.name == rng.name + assert result.freq == rng.freq + assert result.tz == rng.tz + + result = rng[:50].union(rng[30:100]) + assert result.name == rng.name + assert result.freq == rng.freq + assert result.tz == rng.tz + + result = rng[:50].union(rng[60:100]) + assert result.name == rng.name + assert result.freq is None + assert result.tz == rng.tz + + result = rng[:50].intersection(rng[25:75]) + assert result.name == rng.name + assert result.freqstr == 'D' + assert result.tz == rng.tz + + nofreq = DatetimeIndex(list(rng[25:75]), name='other') + result = rng[:50].union(nofreq) + assert result.name is None + assert result.freq == rng.freq + assert result.tz == rng.tz + + result = rng[:50].intersection(nofreq) + assert result.name is None + assert result.freq == rng.freq + assert result.tz == rng.tz + + def test_min_max(self): + rng = date_range('1/1/2000', '12/31/2000') + rng2 = rng.take(np.random.permutation(len(rng))) + + the_min = rng2.min() + the_max = rng2.max() + assert isinstance(the_min, Timestamp) + assert isinstance(the_max, Timestamp) + assert the_min == rng[0] + assert the_max == rng[-1] + + assert rng.min() == rng[0] + assert rng.max() == rng[-1] + + def test_min_max_series(self): + rng = date_range('1/1/2000', periods=10, freq='4h') + lvls = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'] + df = DataFrame({'TS': rng, 'V': np.random.randn(len(rng)), 'L': lvls}) + + result = df.TS.max() + exp = Timestamp(df.TS.iat[-1]) + assert isinstance(result, Timestamp) + assert result == exp + + result = df.TS.min() + exp = Timestamp(df.TS.iat[0]) + assert isinstance(result, Timestamp) + assert result == exp + + def test_from_M8_structured(self): + dates = [(datetime(2012, 9, 9, 0, 0), datetime(2012, 9, 8, 15, 10))] + arr = np.array(dates, + dtype=[('Date', 'M8[us]'), ('Forecasting', 'M8[us]')]) + df = DataFrame(arr) + + assert df['Date'][0] == dates[0][0] + assert df['Forecasting'][0] == dates[0][1] + + s = Series(arr['Date']) + assert isinstance(s[0], Timestamp) + assert s[0] == dates[0][0] + + s = Series.from_array(arr['Date'], Index([0])) + assert s[0] == dates[0][0] + + def test_get_level_values_box(self): + from pandas import MultiIndex + + dates = date_range('1/1/2000', periods=4) + levels = [dates, [0, 1]] + labels = [[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]] + + index = MultiIndex(levels=levels, labels=labels) + + assert isinstance(index.get_level_values(0)[0], Timestamp) diff --git a/pandas/tests/series/test_validate.py b/pandas/tests/series/test_validate.py index cf0482b41c80a..a0cde5f81d021 100644 --- a/pandas/tests/series/test_validate.py +++ b/pandas/tests/series/test_validate.py @@ -1,33 +1,27 @@ -from unittest import TestCase from pandas.core.series import Series +import pytest +import pandas.util.testing as tm -class TestSeriesValidate(TestCase): - """Tests for error handling related to 
data types of method arguments.""" - s = Series([1, 2, 3, 4, 5]) - - def test_validate_bool_args(self): - # Tests for error handling related to boolean arguments. - invalid_values = [1, "True", [1, 2, 3], 5.0] - for value in invalid_values: - with self.assertRaises(ValueError): - self.s.reset_index(inplace=value) +@pytest.fixture +def series(): + return Series([1, 2, 3, 4, 5]) - with self.assertRaises(ValueError): - self.s._set_name(name='hello', inplace=value) - with self.assertRaises(ValueError): - self.s.sort_values(inplace=value) - - with self.assertRaises(ValueError): - self.s.sort_index(inplace=value) +class TestSeriesValidate(object): + """Tests for error handling related to data types of method arguments.""" - with self.assertRaises(ValueError): - self.s.sort_index(inplace=value) + @pytest.mark.parametrize("func", ["reset_index", "_set_name", + "sort_values", "sort_index", + "rename", "dropna"]) + @pytest.mark.parametrize("inplace", [1, "True", [1, 2, 3], 5.0]) + def test_validate_bool_args(self, series, func, inplace): + msg = "For argument \"inplace\" expected type bool" + kwargs = dict(inplace=inplace) - with self.assertRaises(ValueError): - self.s.rename(inplace=value) + if func == "_set_name": + kwargs["name"] = "hello" - with self.assertRaises(ValueError): - self.s.dropna(inplace=value) + with tm.assert_raises_regex(ValueError, msg): + getattr(series, func)(**kwargs) diff --git a/pandas/tests/sparse/__init__.py b/pandas/tests/sparse/__init__.py new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/pandas/tests/sparse/common.py b/pandas/tests/sparse/common.py new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/pandas/sparse/tests/test_arithmetics.py b/pandas/tests/sparse/test_arithmetics.py similarity index 94% rename from pandas/sparse/tests/test_arithmetics.py rename to pandas/tests/sparse/test_arithmetics.py index f24244b38c42b..f023cd0003910 100644 --- a/pandas/sparse/tests/test_arithmetics.py +++ b/pandas/tests/sparse/test_arithmetics.py @@ -3,9 +3,7 @@ import pandas.util.testing as tm -class TestSparseArrayArithmetics(tm.TestCase): - - _multiprocess_can_split_ = True +class TestSparseArrayArithmetics(object): _base = np.array _klass = pd.SparseArray @@ -69,9 +67,9 @@ def _check_numeric_ops(self, a, b, a_dense, b_dense): self._assert((b_dense ** a).to_dense(), b_dense ** a_dense) def _check_bool_result(self, res): - tm.assertIsInstance(res, self._klass) - self.assertEqual(res.dtype, np.bool) - self.assertIsInstance(res.fill_value, bool) + assert isinstance(res, self._klass) + assert res.dtype == np.bool + assert isinstance(res.fill_value, bool) def _check_comparison_ops(self, a, b, a_dense, b_dense): with np.errstate(invalid='ignore'): @@ -276,30 +274,30 @@ def test_int_array(self): for kind in ['integer', 'block']: a = self._klass(values, dtype=dtype, kind=kind) - self.assertEqual(a.dtype, dtype) + assert a.dtype == dtype b = self._klass(rvalues, dtype=dtype, kind=kind) - self.assertEqual(b.dtype, dtype) + assert b.dtype == dtype self._check_numeric_ops(a, b, values, rvalues) self._check_numeric_ops(a, b * 0, values, rvalues * 0) a = self._klass(values, fill_value=0, dtype=dtype, kind=kind) - self.assertEqual(a.dtype, dtype) + assert a.dtype == dtype b = self._klass(rvalues, dtype=dtype, kind=kind) - self.assertEqual(b.dtype, dtype) + assert b.dtype == dtype self._check_numeric_ops(a, b, values, rvalues) a = self._klass(values, fill_value=0, dtype=dtype, kind=kind) - self.assertEqual(a.dtype, dtype) + assert a.dtype == dtype b = 
self._klass(rvalues, fill_value=0, dtype=dtype, kind=kind) - self.assertEqual(b.dtype, dtype) + assert b.dtype == dtype self._check_numeric_ops(a, b, values, rvalues) a = self._klass(values, fill_value=1, dtype=dtype, kind=kind) - self.assertEqual(a.dtype, dtype) + assert a.dtype == dtype b = self._klass(rvalues, fill_value=2, dtype=dtype, kind=kind) - self.assertEqual(b.dtype, dtype) + assert b.dtype == dtype self._check_numeric_ops(a, b, values, rvalues) def test_int_array_comparison(self): @@ -366,24 +364,24 @@ def test_mixed_array_float_int(self): for kind in ['integer', 'block']: a = self._klass(values, kind=kind) b = self._klass(rvalues, kind=kind) - self.assertEqual(b.dtype, rdtype) + assert b.dtype == rdtype self._check_numeric_ops(a, b, values, rvalues) self._check_numeric_ops(a, b * 0, values, rvalues * 0) a = self._klass(values, kind=kind, fill_value=0) b = self._klass(rvalues, kind=kind) - self.assertEqual(b.dtype, rdtype) + assert b.dtype == rdtype self._check_numeric_ops(a, b, values, rvalues) a = self._klass(values, kind=kind, fill_value=0) b = self._klass(rvalues, kind=kind, fill_value=0) - self.assertEqual(b.dtype, rdtype) + assert b.dtype == rdtype self._check_numeric_ops(a, b, values, rvalues) a = self._klass(values, kind=kind, fill_value=1) b = self._klass(rvalues, kind=kind, fill_value=2) - self.assertEqual(b.dtype, rdtype) + assert b.dtype == rdtype self._check_numeric_ops(a, b, values, rvalues) def test_mixed_array_comparison(self): @@ -396,24 +394,24 @@ def test_mixed_array_comparison(self): for kind in ['integer', 'block']: a = self._klass(values, kind=kind) b = self._klass(rvalues, kind=kind) - self.assertEqual(b.dtype, rdtype) + assert b.dtype == rdtype self._check_comparison_ops(a, b, values, rvalues) self._check_comparison_ops(a, b * 0, values, rvalues * 0) a = self._klass(values, kind=kind, fill_value=0) b = self._klass(rvalues, kind=kind) - self.assertEqual(b.dtype, rdtype) + assert b.dtype == rdtype self._check_comparison_ops(a, b, values, rvalues) a = self._klass(values, kind=kind, fill_value=0) b = self._klass(rvalues, kind=kind, fill_value=0) - self.assertEqual(b.dtype, rdtype) + assert b.dtype == rdtype self._check_comparison_ops(a, b, values, rvalues) a = self._klass(values, kind=kind, fill_value=1) b = self._klass(rvalues, kind=kind, fill_value=2) - self.assertEqual(b.dtype, rdtype) + assert b.dtype == rdtype self._check_comparison_ops(a, b, values, rvalues) diff --git a/pandas/sparse/tests/test_array.py b/pandas/tests/sparse/test_array.py similarity index 76% rename from pandas/sparse/tests/test_array.py rename to pandas/tests/sparse/test_array.py index 592926f8e821d..b0a9182a265fe 100644 --- a/pandas/sparse/tests/test_array.py +++ b/pandas/tests/sparse/test_array.py @@ -1,108 +1,108 @@ from pandas.compat import range + import re import operator +import pytest import warnings from numpy import nan import numpy as np -from pandas import _np_version_under1p8 -from pandas.sparse.api import SparseArray, SparseSeries -from pandas._sparse import IntIndex -from pandas.util.testing import assert_almost_equal, assertRaisesRegexp +from pandas.core.sparse.api import SparseArray, SparseSeries +from pandas._libs.sparse import IntIndex +from pandas.util.testing import assert_almost_equal import pandas.util.testing as tm -class TestSparseArray(tm.TestCase): - _multiprocess_can_split_ = True +class TestSparseArray(object): - def setUp(self): + def setup_method(self, method): self.arr_data = np.array([nan, nan, 1, 2, 3, nan, 4, 5, nan, 6]) self.arr = 
SparseArray(self.arr_data) self.zarr = SparseArray([0, 0, 1, 2, 3, 0, 4, 5, 0, 6], fill_value=0) def test_constructor_dtype(self): arr = SparseArray([np.nan, 1, 2, np.nan]) - self.assertEqual(arr.dtype, np.float64) - self.assertTrue(np.isnan(arr.fill_value)) + assert arr.dtype == np.float64 + assert np.isnan(arr.fill_value) arr = SparseArray([np.nan, 1, 2, np.nan], fill_value=0) - self.assertEqual(arr.dtype, np.float64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.float64 + assert arr.fill_value == 0 arr = SparseArray([0, 1, 2, 4], dtype=np.float64) - self.assertEqual(arr.dtype, np.float64) - self.assertTrue(np.isnan(arr.fill_value)) + assert arr.dtype == np.float64 + assert np.isnan(arr.fill_value) arr = SparseArray([0, 1, 2, 4], dtype=np.int64) - self.assertEqual(arr.dtype, np.int64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.int64 + assert arr.fill_value == 0 arr = SparseArray([0, 1, 2, 4], fill_value=0, dtype=np.int64) - self.assertEqual(arr.dtype, np.int64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.int64 + assert arr.fill_value == 0 arr = SparseArray([0, 1, 2, 4], dtype=None) - self.assertEqual(arr.dtype, np.int64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.int64 + assert arr.fill_value == 0 arr = SparseArray([0, 1, 2, 4], fill_value=0, dtype=None) - self.assertEqual(arr.dtype, np.int64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.int64 + assert arr.fill_value == 0 def test_constructor_object_dtype(self): # GH 11856 arr = SparseArray(['A', 'A', np.nan, 'B'], dtype=np.object) - self.assertEqual(arr.dtype, np.object) - self.assertTrue(np.isnan(arr.fill_value)) + assert arr.dtype == np.object + assert np.isnan(arr.fill_value) arr = SparseArray(['A', 'A', np.nan, 'B'], dtype=np.object, fill_value='A') - self.assertEqual(arr.dtype, np.object) - self.assertEqual(arr.fill_value, 'A') + assert arr.dtype == np.object + assert arr.fill_value == 'A' def test_constructor_spindex_dtype(self): arr = SparseArray(data=[1, 2], sparse_index=IntIndex(4, [1, 2])) tm.assert_sp_array_equal(arr, SparseArray([np.nan, 1, 2, np.nan])) - self.assertEqual(arr.dtype, np.float64) - self.assertTrue(np.isnan(arr.fill_value)) + assert arr.dtype == np.float64 + assert np.isnan(arr.fill_value) arr = SparseArray(data=[1, 2, 3], sparse_index=IntIndex(4, [1, 2, 3]), dtype=np.int64, fill_value=0) exp = SparseArray([0, 1, 2, 3], dtype=np.int64, fill_value=0) tm.assert_sp_array_equal(arr, exp) - self.assertEqual(arr.dtype, np.int64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.int64 + assert arr.fill_value == 0 arr = SparseArray(data=[1, 2], sparse_index=IntIndex(4, [1, 2]), fill_value=0, dtype=np.int64) exp = SparseArray([0, 1, 2, 0], fill_value=0, dtype=np.int64) tm.assert_sp_array_equal(arr, exp) - self.assertEqual(arr.dtype, np.int64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.int64 + assert arr.fill_value == 0 arr = SparseArray(data=[1, 2, 3], sparse_index=IntIndex(4, [1, 2, 3]), dtype=None, fill_value=0) exp = SparseArray([0, 1, 2, 3], dtype=None) tm.assert_sp_array_equal(arr, exp) - self.assertEqual(arr.dtype, np.int64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.int64 + assert arr.fill_value == 0 # scalar input arr = SparseArray(data=1, sparse_index=IntIndex(1, [0]), dtype=None) exp = SparseArray([1], dtype=None) tm.assert_sp_array_equal(arr, exp) - self.assertEqual(arr.dtype, np.int64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.int64 + assert 
arr.fill_value == 0 arr = SparseArray(data=[1, 2], sparse_index=IntIndex(4, [1, 2]), fill_value=0, dtype=None) exp = SparseArray([0, 1, 2, 0], fill_value=0, dtype=None) tm.assert_sp_array_equal(arr, exp) - self.assertEqual(arr.dtype, np.int64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.int64 + assert arr.fill_value == 0 def test_sparseseries_roundtrip(self): # GH 13999 @@ -132,27 +132,25 @@ def test_sparseseries_roundtrip(self): def test_get_item(self): - self.assertTrue(np.isnan(self.arr[1])) - self.assertEqual(self.arr[2], 1) - self.assertEqual(self.arr[7], 5) + assert np.isnan(self.arr[1]) + assert self.arr[2] == 1 + assert self.arr[7] == 5 - self.assertEqual(self.zarr[0], 0) - self.assertEqual(self.zarr[2], 1) - self.assertEqual(self.zarr[7], 5) + assert self.zarr[0] == 0 + assert self.zarr[2] == 1 + assert self.zarr[7] == 5 errmsg = re.compile("bounds") - assertRaisesRegexp(IndexError, errmsg, lambda: self.arr[11]) - assertRaisesRegexp(IndexError, errmsg, lambda: self.arr[-11]) - self.assertEqual(self.arr[-1], self.arr[len(self.arr) - 1]) + tm.assert_raises_regex(IndexError, errmsg, lambda: self.arr[11]) + tm.assert_raises_regex(IndexError, errmsg, lambda: self.arr[-11]) + assert self.arr[-1] == self.arr[len(self.arr) - 1] def test_take(self): - self.assertTrue(np.isnan(self.arr.take(0))) - self.assertTrue(np.isscalar(self.arr.take(2))) + assert np.isnan(self.arr.take(0)) + assert np.isscalar(self.arr.take(2)) - # np.take in < 1.8 doesn't support scalar indexing - if not _np_version_under1p8: - self.assertEqual(self.arr.take(2), np.take(self.arr_data, 2)) - self.assertEqual(self.arr.take(6), np.take(self.arr_data, 6)) + assert self.arr.take(2) == np.take(self.arr_data, 2) + assert self.arr.take(6) == np.take(self.arr_data, 6) exp = SparseArray(np.take(self.arr_data, [2, 3])) tm.assert_sp_array_equal(self.arr.take([2, 3]), exp) @@ -178,21 +176,22 @@ def test_take_negative(self): tm.assert_sp_array_equal(self.arr.take([-4, -3, -2]), exp) def test_bad_take(self): - assertRaisesRegexp(IndexError, "bounds", lambda: self.arr.take(11)) - self.assertRaises(IndexError, lambda: self.arr.take(-11)) + tm.assert_raises_regex( + IndexError, "bounds", lambda: self.arr.take(11)) + pytest.raises(IndexError, lambda: self.arr.take(-11)) def test_take_invalid_kwargs(self): msg = r"take\(\) got an unexpected keyword argument 'foo'" - tm.assertRaisesRegexp(TypeError, msg, self.arr.take, - [2, 3], foo=2) + tm.assert_raises_regex(TypeError, msg, self.arr.take, + [2, 3], foo=2) msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, self.arr.take, - [2, 3], out=self.arr) + tm.assert_raises_regex(ValueError, msg, self.arr.take, + [2, 3], out=self.arr) msg = "the 'mode' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, self.arr.take, - [2, 3], mode='clip') + tm.assert_raises_regex(ValueError, msg, self.arr.take, + [2, 3], mode='clip') def test_take_filling(self): # similar tests as GH 12631 @@ -214,16 +213,16 @@ def test_take_filling(self): msg = ('When allow_fill=True and fill_value is not None, ' 'all indices must be >= -1') - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): sparse.take(np.array([1, 0, -2]), fill_value=True) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): sparse.take(np.array([1, 0, -5]), fill_value=True) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): sparse.take(np.array([1, -6])) - with 
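The `take`-related hunks swap `tm.assertRaises`/`assertRaisesRegexp` for `pytest.raises` and `tm.assert_raises_regex`. A small sketch of both replacements, assuming the `pandas.util.testing` helpers of this era:

```python
import numpy as np
import pytest
import pandas.util.testing as tm
from pandas.core.sparse.api import SparseArray

arr = SparseArray([np.nan, 1.0, 2.0, np.nan])  # length 4

# old: with tm.assertRaises(IndexError): ...
with pytest.raises(IndexError):
    arr.take(np.array([1, 5]))  # index 5 is out of bounds

# old: tm.assertRaisesRegexp(IndexError, "bounds", ...)
tm.assert_raises_regex(IndexError, "bounds", lambda: arr[11])
```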
tm.assertRaises(IndexError): + with pytest.raises(IndexError): sparse.take(np.array([1, 5])) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): sparse.take(np.array([1, 5]), fill_value=True) def test_take_filling_fill_value(self): @@ -246,16 +245,16 @@ def test_take_filling_fill_value(self): msg = ('When allow_fill=True and fill_value is not None, ' 'all indices must be >= -1') - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): sparse.take(np.array([1, 0, -2]), fill_value=True) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): sparse.take(np.array([1, 0, -5]), fill_value=True) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): sparse.take(np.array([1, -6])) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): sparse.take(np.array([1, 5])) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): sparse.take(np.array([1, 5]), fill_value=True) def test_take_filling_all_nan(self): @@ -268,11 +267,11 @@ def test_take_filling_all_nan(self): expected = SparseArray([np.nan, np.nan, np.nan]) tm.assert_sp_array_equal(result, expected) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): sparse.take(np.array([1, -6])) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): sparse.take(np.array([1, 5])) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): sparse.take(np.array([1, 5]), fill_value=True) def test_set_item(self): @@ -282,61 +281,61 @@ def setitem(): def setslice(): self.arr[1:5] = 2 - assertRaisesRegexp(TypeError, "item assignment", setitem) - assertRaisesRegexp(TypeError, "item assignment", setslice) + tm.assert_raises_regex(TypeError, "item assignment", setitem) + tm.assert_raises_regex(TypeError, "item assignment", setslice) def test_constructor_from_too_large_array(self): - assertRaisesRegexp(TypeError, "expected dimension <= 1 data", - SparseArray, np.arange(10).reshape((2, 5))) + tm.assert_raises_regex(TypeError, "expected dimension <= 1 data", + SparseArray, np.arange(10).reshape((2, 5))) def test_constructor_from_sparse(self): res = SparseArray(self.zarr) - self.assertEqual(res.fill_value, 0) + assert res.fill_value == 0 assert_almost_equal(res.sp_values, self.zarr.sp_values) def test_constructor_copy(self): cp = SparseArray(self.arr, copy=True) cp.sp_values[:3] = 0 - self.assertFalse((self.arr.sp_values[:3] == 0).any()) + assert not (self.arr.sp_values[:3] == 0).any() not_copy = SparseArray(self.arr) not_copy.sp_values[:3] = 0 - self.assertTrue((self.arr.sp_values[:3] == 0).all()) + assert (self.arr.sp_values[:3] == 0).all() def test_constructor_bool(self): # GH 10648 data = np.array([False, False, True, True, False, False]) arr = SparseArray(data, fill_value=False, dtype=bool) - self.assertEqual(arr.dtype, bool) + assert arr.dtype == bool tm.assert_numpy_array_equal(arr.sp_values, np.array([True, True])) tm.assert_numpy_array_equal(arr.sp_values, np.asarray(arr)) tm.assert_numpy_array_equal(arr.sp_index.indices, np.array([2, 3], np.int32)) for dense in [arr.to_dense(), arr.values]: - self.assertEqual(dense.dtype, bool) + assert dense.dtype == bool tm.assert_numpy_array_equal(dense, data) def test_constructor_bool_fill_value(self): arr = SparseArray([True, False, True], dtype=None) - self.assertEqual(arr.dtype, np.bool) - self.assertFalse(arr.fill_value) + assert arr.dtype == np.bool + assert not arr.fill_value arr = SparseArray([True, False, True], dtype=np.bool) - 
self.assertEqual(arr.dtype, np.bool) - self.assertFalse(arr.fill_value) + assert arr.dtype == np.bool + assert not arr.fill_value arr = SparseArray([True, False, True], dtype=np.bool, fill_value=True) - self.assertEqual(arr.dtype, np.bool) - self.assertTrue(arr.fill_value) + assert arr.dtype == np.bool + assert arr.fill_value def test_constructor_float32(self): # GH 10648 data = np.array([1., np.nan, 3], dtype=np.float32) arr = SparseArray(data, dtype=np.float32) - self.assertEqual(arr.dtype, np.float32) + assert arr.dtype == np.float32 tm.assert_numpy_array_equal(arr.sp_values, np.array([1, 3], dtype=np.float32)) tm.assert_numpy_array_equal(arr.sp_values, np.asarray(arr)) @@ -344,25 +343,25 @@ def test_constructor_float32(self): np.array([0, 2], dtype=np.int32)) for dense in [arr.to_dense(), arr.values]: - self.assertEqual(dense.dtype, np.float32) - self.assert_numpy_array_equal(dense, data) + assert dense.dtype == np.float32 + tm.assert_numpy_array_equal(dense, data) def test_astype(self): res = self.arr.astype('f8') res.sp_values[:3] = 27 - self.assertFalse((self.arr.sp_values[:3] == 27).any()) + assert not (self.arr.sp_values[:3] == 27).any() msg = "unable to coerce current fill_value nan to int64 dtype" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): self.arr.astype('i8') arr = SparseArray([0, np.nan, 0, 1]) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): arr.astype('i8') arr = SparseArray([0, np.nan, 0, 1], fill_value=0) msg = 'Cannot convert non-finite values \\(NA or inf\\) to integer' - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): arr.astype('i8') def test_astype_all(self): @@ -373,46 +372,46 @@ def test_astype_all(self): np.int32, np.int16, np.int8] for typ in types: res = arr.astype(typ) - self.assertEqual(res.dtype, typ) - self.assertEqual(res.sp_values.dtype, typ) + assert res.dtype == typ + assert res.sp_values.dtype == typ tm.assert_numpy_array_equal(res.values, vals.astype(typ)) def test_set_fill_value(self): arr = SparseArray([1., np.nan, 2.], fill_value=np.nan) arr.fill_value = 2 - self.assertEqual(arr.fill_value, 2) + assert arr.fill_value == 2 arr = SparseArray([1, 0, 2], fill_value=0, dtype=np.int64) arr.fill_value = 2 - self.assertEqual(arr.fill_value, 2) + assert arr.fill_value == 2 # coerces to int msg = "unable to set fill_value 3\\.1 to int64 dtype" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): arr.fill_value = 3.1 msg = "unable to set fill_value nan to int64 dtype" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): arr.fill_value = np.nan arr = SparseArray([True, False, True], fill_value=False, dtype=np.bool) arr.fill_value = True - self.assertTrue(arr.fill_value) + assert arr.fill_value # coerces to bool msg = "unable to set fill_value 0 to bool dtype" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): arr.fill_value = 0 msg = "unable to set fill_value nan to bool dtype" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): arr.fill_value = np.nan # invalid msg = "fill_value must be a scalar" for val in [[1, 2, 3], np.array([1, 2]), (1, 2, 3)]: - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): arr.fill_value = val def test_copy_shallow(self): @@ -497,10 +496,10 @@ def test_getslice_tuple(self): exp = 
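These hunks pin down `fill_value` coercion: an int64 `SparseArray` accepts any losslessly representable scalar as a new fill value but raises `ValueError` otherwise. A sketch, assuming pre-1.0 pandas:

```python
import numpy as np
from pandas.core.sparse.api import SparseArray

arr = SparseArray([1, 0, 2], fill_value=0, dtype=np.int64)
arr.fill_value = 2           # fine: an int64-compatible scalar
assert arr.fill_value == 2

try:
    arr.fill_value = np.nan  # nan cannot be held by an int64 array
except ValueError as err:
    assert "int64" in str(err)
else:
    raise AssertionError("expected ValueError")
```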
SparseArray(dense[4:, ], fill_value=0) tm.assert_sp_array_equal(res, exp) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): sparse[4:, :] - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): # check numpy compat dense[4:, :] @@ -522,19 +521,19 @@ def _check_op(op, first, second): res = op(first, second) exp = SparseArray(op(first.values, second.values), fill_value=first.fill_value) - tm.assertIsInstance(res, SparseArray) + assert isinstance(res, SparseArray) assert_almost_equal(res.values, exp.values) res2 = op(first, second.values) - tm.assertIsInstance(res2, SparseArray) + assert isinstance(res2, SparseArray) tm.assert_sp_array_equal(res, res2) res3 = op(first.values, second) - tm.assertIsInstance(res3, SparseArray) + assert isinstance(res3, SparseArray) tm.assert_sp_array_equal(res, res3) res4 = op(first, 4) - tm.assertIsInstance(res4, SparseArray) + assert isinstance(res4, SparseArray) # ignore this if the actual op raises (e.g. pow) try: @@ -547,7 +546,7 @@ def _check_op(op, first, second): def _check_inplace_op(op): tmp = arr1.copy() - self.assertRaises(NotImplementedError, op, tmp, arr2) + pytest.raises(NotImplementedError, op, tmp, arr2) with np.errstate(all='ignore'): bin_ops = [operator.add, operator.sub, operator.mul, @@ -563,7 +562,7 @@ def _check_inplace_op(op): def test_pickle(self): def _check_roundtrip(obj): - unpickled = self.round_trip_pickle(obj) + unpickled = tm.round_trip_pickle(obj) tm.assert_sp_array_equal(unpickled, obj) _check_roundtrip(self.arr) @@ -619,14 +618,14 @@ def test_fillna(self): # int dtype shouldn't have missing. No changes. s = SparseArray([0, 0, 0, 0]) - self.assertEqual(s.dtype, np.int64) - self.assertEqual(s.fill_value, 0) + assert s.dtype == np.int64 + assert s.fill_value == 0 res = s.fillna(-1) tm.assert_sp_array_equal(res, s) s = SparseArray([0, 0, 0, 0], fill_value=0) - self.assertEqual(s.dtype, np.int64) - self.assertEqual(s.fill_value, 0) + assert s.dtype == np.int64 + assert s.fill_value == 0 res = s.fillna(-1) exp = SparseArray([0, 0, 0, 0], fill_value=0) tm.assert_sp_array_equal(res, exp) @@ -634,8 +633,8 @@ def test_fillna(self): # fill_value can be nan if there is no missing hole. 
# only fill_value will be changed s = SparseArray([0, 0, 0, 0], fill_value=np.nan) - self.assertEqual(s.dtype, np.int64) - self.assertTrue(np.isnan(s.fill_value)) + assert s.dtype == np.int64 + assert np.isnan(s.fill_value) res = s.fillna(-1) exp = SparseArray([0, 0, 0, 0], fill_value=-1) tm.assert_sp_array_equal(res, exp) @@ -654,38 +653,39 @@ def test_fillna_overlap(self): tm.assert_sp_array_equal(res, exp) -class TestSparseArrayAnalytics(tm.TestCase): +class TestSparseArrayAnalytics(object): + def test_sum(self): data = np.arange(10).astype(float) out = SparseArray(data).sum() - self.assertEqual(out, 45.0) + assert out == 45.0 data[5] = np.nan out = SparseArray(data, fill_value=2).sum() - self.assertEqual(out, 40.0) + assert out == 40.0 out = SparseArray(data, fill_value=np.nan).sum() - self.assertEqual(out, 40.0) + assert out == 40.0 def test_numpy_sum(self): data = np.arange(10).astype(float) out = np.sum(SparseArray(data)) - self.assertEqual(out, 45.0) + assert out == 45.0 data[5] = np.nan out = np.sum(SparseArray(data, fill_value=2)) - self.assertEqual(out, 40.0) + assert out == 40.0 out = np.sum(SparseArray(data, fill_value=np.nan)) - self.assertEqual(out, 40.0) + assert out == 40.0 msg = "the 'dtype' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.sum, - SparseArray(data), dtype=np.int64) + tm.assert_raises_regex(ValueError, msg, np.sum, + SparseArray(data), dtype=np.int64) msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.sum, - SparseArray(data), out=out) + tm.assert_raises_regex(ValueError, msg, np.sum, + SparseArray(data), out=out) def test_cumsum(self): non_null_data = np.array([1, 2, 3, 4, 5], dtype=float) @@ -709,7 +709,7 @@ def test_cumsum(self): axis = 1 # SparseArray currently 1-D, so only axis = 0 is valid. 
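The analytics hunks assert that `SparseArray.sum()` skips NaN sparse points regardless of the array's `fill_value`. A sketch mirroring those expectations (pre-1.0 pandas assumed):

```python
import numpy as np
from pandas.core.sparse.api import SparseArray

data = np.arange(10).astype(float)   # 0 + 1 + ... + 9 == 45
assert SparseArray(data).sum() == 45.0

data[5] = np.nan                     # the 5 drops out of the total
assert SparseArray(data, fill_value=2).sum() == 40.0
assert SparseArray(data, fill_value=np.nan).sum() == 40.0
```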
msg = "axis\\(={axis}\\) out of bounds".format(axis=axis) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): SparseArray(data).cumsum(axis=axis) def test_numpy_cumsum(self): @@ -733,38 +733,38 @@ def test_numpy_cumsum(self): tm.assert_sp_array_equal(out, expected) msg = "the 'dtype' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.cumsum, - SparseArray(data), dtype=np.int64) + tm.assert_raises_regex(ValueError, msg, np.cumsum, + SparseArray(data), dtype=np.int64) msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.cumsum, - SparseArray(data), out=out) + tm.assert_raises_regex(ValueError, msg, np.cumsum, + SparseArray(data), out=out) def test_mean(self): data = np.arange(10).astype(float) out = SparseArray(data).mean() - self.assertEqual(out, 4.5) + assert out == 4.5 data[5] = np.nan out = SparseArray(data).mean() - self.assertEqual(out, 40.0 / 9) + assert out == 40.0 / 9 def test_numpy_mean(self): data = np.arange(10).astype(float) out = np.mean(SparseArray(data)) - self.assertEqual(out, 4.5) + assert out == 4.5 data[5] = np.nan out = np.mean(SparseArray(data)) - self.assertEqual(out, 40.0 / 9) + assert out == 40.0 / 9 msg = "the 'dtype' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.mean, - SparseArray(data), dtype=np.int64) + tm.assert_raises_regex(ValueError, msg, np.mean, + SparseArray(data), dtype=np.int64) msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.mean, - SparseArray(data), out=out) + tm.assert_raises_regex(ValueError, msg, np.mean, + SparseArray(data), out=out) def test_ufunc(self): # GH 13853 make sure ufunc is applied to fill_value @@ -810,9 +810,3 @@ def test_ufunc_args(self): sparse = SparseArray([1, -1, 0, -2], fill_value=0) result = SparseArray([2, 0, 1, -1], fill_value=1) tm.assert_sp_array_equal(np.add(sparse, 1), result) - - -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/sparse/tests/test_combine_concat.py b/pandas/tests/sparse/test_combine_concat.py similarity index 93% rename from pandas/sparse/tests/test_combine_concat.py rename to pandas/tests/sparse/test_combine_concat.py index fcdc6d9580dd5..15639fbe156c6 100644 --- a/pandas/sparse/tests/test_combine_concat.py +++ b/pandas/tests/sparse/test_combine_concat.py @@ -1,14 +1,11 @@ # pylint: disable-msg=E1101,W0612 -import nose # noqa import numpy as np import pandas as pd import pandas.util.testing as tm -class TestSparseSeriesConcat(tm.TestCase): - - _multiprocess_can_split_ = True +class TestSparseSeriesConcat(object): def test_concat(self): val1 = np.array([1, 2, np.nan, np.nan, 0, np.nan]) @@ -72,7 +69,7 @@ def test_concat_axis1_different_fill(self): res = pd.concat([sparse1, sparse2], axis=1) exp = pd.concat([pd.Series(val1, name='x'), pd.Series(val2, name='y')], axis=1) - self.assertIsInstance(res, pd.SparseDataFrame) + assert isinstance(res, pd.SparseDataFrame) tm.assert_frame_equal(res.to_dense(), exp) def test_concat_different_kind(self): @@ -125,11 +122,9 @@ def test_concat_sparse_dense(self): tm.assert_sp_series_equal(res, exp) -class TestSparseDataFrameConcat(tm.TestCase): - - _multiprocess_can_split_ = True +class TestSparseDataFrameConcat(object): - def setUp(self): + def setup_method(self, method): self.dense1 = pd.DataFrame({'A': [0., 1., 2., np.nan], 'B': [0., 0., 0., 0.], @@ -239,12 +234,12 @@ def 
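The `np.sum`/`np.cumsum` hunks keep pandas' numpy-compat guards intact: delegation to the sparse method works, while numpy-only keywords are rejected. A sketch, assuming the era's `pandas.util.testing` API:

```python
import numpy as np
import pandas.util.testing as tm
from pandas.core.sparse.api import SparseArray

arr = SparseArray([1.0, np.nan, 2.0])
out = np.cumsum(arr)                      # delegates to SparseArray.cumsum
tm.assert_sp_array_equal(out, arr.cumsum())

msg = "the 'dtype' parameter is not supported"
tm.assert_raises_regex(ValueError, msg, np.cumsum, arr, dtype=np.int64)
```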
test_concat_different_columns(self): # each columns keeps its fill_value, thus compare in dense res = pd.concat([sparse, sparse3]) exp = pd.concat([self.dense1, self.dense3]) - self.assertIsInstance(res, pd.SparseDataFrame) + assert isinstance(res, pd.SparseDataFrame) tm.assert_frame_equal(res.to_dense(), exp) res = pd.concat([sparse3, sparse]) exp = pd.concat([self.dense3, self.dense1]) - self.assertIsInstance(res, pd.SparseDataFrame) + assert isinstance(res, pd.SparseDataFrame) tm.assert_frame_equal(res.to_dense(), exp) def test_concat_series(self): @@ -314,12 +309,12 @@ def test_concat_axis1(self): # each columns keeps its fill_value, thus compare in dense res = pd.concat([sparse, sparse3], axis=1) exp = pd.concat([self.dense1, self.dense3], axis=1) - self.assertIsInstance(res, pd.SparseDataFrame) + assert isinstance(res, pd.SparseDataFrame) tm.assert_frame_equal(res.to_dense(), exp) res = pd.concat([sparse3, sparse], axis=1) exp = pd.concat([self.dense3, self.dense1], axis=1) - self.assertIsInstance(res, pd.SparseDataFrame) + assert isinstance(res, pd.SparseDataFrame) tm.assert_frame_equal(res.to_dense(), exp) def test_concat_sparse_dense(self): @@ -327,38 +322,32 @@ def test_concat_sparse_dense(self): res = pd.concat([sparse, self.dense2]) exp = pd.concat([self.dense1, self.dense2]) - self.assertIsInstance(res, pd.SparseDataFrame) + assert isinstance(res, pd.SparseDataFrame) tm.assert_frame_equal(res.to_dense(), exp) res = pd.concat([self.dense2, sparse]) exp = pd.concat([self.dense2, self.dense1]) - self.assertIsInstance(res, pd.SparseDataFrame) + assert isinstance(res, pd.SparseDataFrame) tm.assert_frame_equal(res.to_dense(), exp) sparse = self.dense1.to_sparse(fill_value=0) res = pd.concat([sparse, self.dense2]) exp = pd.concat([self.dense1, self.dense2]) - self.assertIsInstance(res, pd.SparseDataFrame) + assert isinstance(res, pd.SparseDataFrame) tm.assert_frame_equal(res.to_dense(), exp) res = pd.concat([self.dense2, sparse]) exp = pd.concat([self.dense2, self.dense1]) - self.assertIsInstance(res, pd.SparseDataFrame) + assert isinstance(res, pd.SparseDataFrame) tm.assert_frame_equal(res.to_dense(), exp) res = pd.concat([self.dense3, sparse], axis=1) exp = pd.concat([self.dense3, self.dense1], axis=1) - self.assertIsInstance(res, pd.SparseDataFrame) + assert isinstance(res, pd.SparseDataFrame) tm.assert_frame_equal(res, exp) res = pd.concat([sparse, self.dense3], axis=1) exp = pd.concat([self.dense1, self.dense3], axis=1) - self.assertIsInstance(res, pd.SparseDataFrame) + assert isinstance(res, pd.SparseDataFrame) tm.assert_frame_equal(res, exp) - - -if __name__ == '__main__': - import nose # noqa - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/sparse/tests/test_format.py b/pandas/tests/sparse/test_format.py similarity index 77% rename from pandas/sparse/tests/test_format.py rename to pandas/tests/sparse/test_format.py index 377eaa20565a2..d983bd209085a 100644 --- a/pandas/sparse/tests/test_format.py +++ b/pandas/tests/sparse/test_format.py @@ -13,9 +13,7 @@ use_32bit_repr = is_platform_windows() or is_platform_32bit() -class TestSparseSeriesFormatting(tm.TestCase): - - _multiprocess_can_split_ = True +class TestSparseSeriesFormatting(object): @property def dtype_format_for_platform(self): @@ -29,16 +27,16 @@ def test_sparse_max_row(self): "4 NaN\ndtype: float64\nBlockIndex\n" "Block locations: array([0, 3]{0})\n" "Block lengths: array([1, 1]{0})".format(dfm)) - self.assertEqual(result, exp) + assert result == exp with 
option_context("display.max_rows", 3): # GH 10560 result = repr(s) exp = ("0 1.0\n ... \n4 NaN\n" - "dtype: float64\nBlockIndex\n" + "Length: 5, dtype: float64\nBlockIndex\n" "Block locations: array([0, 3]{0})\n" "Block lengths: array([1, 1]{0})".format(dfm)) - self.assertEqual(result, exp) + assert result == exp def test_sparse_mi_max_row(self): idx = pd.MultiIndex.from_tuples([('A', 0), ('A', 1), ('B', 0), @@ -52,16 +50,17 @@ def test_sparse_mi_max_row(self): "dtype: float64\nBlockIndex\n" "Block locations: array([0, 3]{0})\n" "Block lengths: array([1, 1]{0})".format(dfm)) - self.assertEqual(result, exp) + assert result == exp - with option_context("display.max_rows", 3): + with option_context("display.max_rows", 3, + "display.show_dimensions", False): # GH 13144 result = repr(s) exp = ("A 0 1.0\n ... \nC 2 NaN\n" "dtype: float64\nBlockIndex\n" "Block locations: array([0, 3]{0})\n" "Block lengths: array([1, 1]{0})".format(dfm)) - self.assertEqual(result, exp) + assert result == exp def test_sparse_bool(self): # GH 13110 @@ -74,15 +73,15 @@ def test_sparse_bool(self): "dtype: bool\nBlockIndex\n" "Block locations: array([0, 3]{0})\n" "Block lengths: array([1, 1]{0})".format(dtype)) - self.assertEqual(result, exp) + assert result == exp with option_context("display.max_rows", 3): result = repr(s) exp = ("0 True\n ... \n5 False\n" - "dtype: bool\nBlockIndex\n" + "Length: 6, dtype: bool\nBlockIndex\n" "Block locations: array([0, 3]{0})\n" "Block lengths: array([1, 1]{0})".format(dtype)) - self.assertEqual(result, exp) + assert result == exp def test_sparse_int(self): # GH 13110 @@ -94,18 +93,19 @@ def test_sparse_int(self): "5 0\ndtype: int64\nBlockIndex\n" "Block locations: array([1, 4]{0})\n" "Block lengths: array([1, 1]{0})".format(dtype)) - self.assertEqual(result, exp) + assert result == exp - with option_context("display.max_rows", 3): + with option_context("display.max_rows", 3, + "display.show_dimensions", False): result = repr(s) exp = ("0 0\n ..\n5 0\n" "dtype: int64\nBlockIndex\n" "Block locations: array([1, 4]{0})\n" "Block lengths: array([1, 1]{0})".format(dtype)) - self.assertEqual(result, exp) + assert result == exp -class TestSparseDataFrameFormatting(tm.TestCase): +class TestSparseDataFrameFormatting(object): def test_sparse_frame(self): # GH 13110 @@ -114,7 +114,19 @@ def test_sparse_frame(self): 'C': [0, 0, 3, 0, 5], 'D': [np.nan, np.nan, np.nan, 1, 2]}) sparse = df.to_sparse() - self.assertEqual(repr(sparse), repr(df)) + assert repr(sparse) == repr(df) with option_context("display.max_rows", 3): - self.assertEqual(repr(sparse), repr(df)) + assert repr(sparse) == repr(df) + + def test_sparse_repr_after_set(self): + # GH 15488 + sdf = pd.SparseDataFrame([[np.nan, 1], [2, np.nan]]) + res = sdf.copy() + + # Ignore the warning + with pd.option_context('mode.chained_assignment', None): + sdf[0][1] = 2 # This line triggers the bug + + repr(sdf) + tm.assert_sp_frame_equal(sdf, res) diff --git a/pandas/sparse/tests/test_frame.py b/pandas/tests/sparse/test_frame.py similarity index 68% rename from pandas/sparse/tests/test_frame.py rename to pandas/tests/sparse/test_frame.py index b9e8a31393931..004af5066fe83 100644 --- a/pandas/sparse/tests/test_frame.py +++ b/pandas/tests/sparse/test_frame.py @@ -2,29 +2,38 @@ import operator +import pytest +from warnings import catch_warnings from numpy import nan import numpy as np import pandas as pd from pandas import Series, DataFrame, bdate_range, Panel -from pandas.tseries.index import DatetimeIndex +from pandas.core.dtypes.common import ( + 
is_bool_dtype, + is_float_dtype, + is_object_dtype, + is_float) +from pandas.core.indexes.datetimes import DatetimeIndex from pandas.tseries.offsets import BDay -import pandas.util.testing as tm +from pandas.util import testing as tm from pandas.compat import lrange from pandas import compat -import pandas.sparse.frame as spf +from pandas.core.sparse import frame as spf -from pandas._sparse import BlockIndex, IntIndex -from pandas.sparse.api import SparseSeries, SparseDataFrame, SparseArray -from pandas.tests.frame.test_misc_api import SharedWithSparse +from pandas._libs.sparse import BlockIndex, IntIndex +from pandas.core.sparse.api import SparseSeries, SparseDataFrame, SparseArray +from pandas.tests.frame.test_api import SharedWithSparse -class TestSparseDataFrame(tm.TestCase, SharedWithSparse): - +class TestSparseDataFrame(SharedWithSparse): klass = SparseDataFrame - _multiprocess_can_split_ = True - def setUp(self): + # SharedWithSparse tests use generic, klass-agnostic assertion + _assert_frame_equal = staticmethod(tm.assert_sp_frame_equal) + _assert_series_equal = staticmethod(tm.assert_sp_series_equal) + + def setup_method(self, method): self.data = {'A': [nan, nan, nan, 0, 1, 2, 3, 4, 5, 6], 'B': [0, 1, 2, nan, nan, nan, 3, 4, 5, 6], 'C': np.arange(10, dtype=np.float64), @@ -38,6 +47,8 @@ def setUp(self): self.frame = SparseDataFrame(self.data, index=self.dates) self.iframe = SparseDataFrame(self.data, index=self.dates, default_kind='integer') + self.mixed_frame = self.frame.copy(False) + self.mixed_frame['foo'] = pd.SparseArray(['bar'] * len(self.dates)) values = self.frame.values.copy() values[np.isnan(values)] = 0 @@ -69,33 +80,33 @@ def test_fill_value_when_combine_const(self): def test_as_matrix(self): empty = self.empty.as_matrix() - self.assertEqual(empty.shape, (0, 0)) + assert empty.shape == (0, 0) no_cols = SparseDataFrame(index=np.arange(10)) mat = no_cols.as_matrix() - self.assertEqual(mat.shape, (10, 0)) + assert mat.shape == (10, 0) no_index = SparseDataFrame(columns=np.arange(10)) mat = no_index.as_matrix() - self.assertEqual(mat.shape, (0, 10)) + assert mat.shape == (0, 10) def test_copy(self): cp = self.frame.copy() - tm.assertIsInstance(cp, SparseDataFrame) + assert isinstance(cp, SparseDataFrame) tm.assert_sp_frame_equal(cp, self.frame) # as of v0.15.0 # this is now identical (but not is_a ) - self.assertTrue(cp.index.identical(self.frame.index)) + assert cp.index.identical(self.frame.index) def test_constructor(self): for col, series in compat.iteritems(self.frame): - tm.assertIsInstance(series, SparseSeries) + assert isinstance(series, SparseSeries) - tm.assertIsInstance(self.iframe['A'].sp_index, IntIndex) + assert isinstance(self.iframe['A'].sp_index, IntIndex) # constructed zframe from matrix above - self.assertEqual(self.zframe['A'].fill_value, 0) + assert self.zframe['A'].fill_value == 0 tm.assert_numpy_array_equal(pd.SparseArray([1., 2., 3., 4., 5., 6.]), self.zframe['A'].values) tm.assert_numpy_array_equal(np.array([0., 0., 0., 0., 1., 2., @@ -105,7 +116,7 @@ def test_constructor(self): # construct no data sdf = SparseDataFrame(columns=np.arange(10), index=np.arange(10)) for col, series in compat.iteritems(sdf): - tm.assertIsInstance(series, SparseSeries) + assert isinstance(series, SparseSeries) # construct from nested dict data = {} @@ -128,7 +139,7 @@ def test_constructor(self): tm.assert_sp_frame_equal(cons, reindexed, exact_indices=False) # assert level parameter breaks reindex - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): 
self.frame.reindex(idx, level=0) repr(self.frame) @@ -142,21 +153,21 @@ def test_constructor_ndarray(self): tm.assert_sp_frame_equal(sp, self.frame.reindex(columns=['A'])) # raise on level argument - self.assertRaises(TypeError, self.frame.reindex, columns=['A'], - level=1) + pytest.raises(TypeError, self.frame.reindex, columns=['A'], + level=1) # wrong length index / columns - with tm.assertRaisesRegexp(ValueError, "^Index length"): + with tm.assert_raises_regex(ValueError, "^Index length"): SparseDataFrame(self.frame.values, index=self.frame.index[:-1]) - with tm.assertRaisesRegexp(ValueError, "^Column length"): + with tm.assert_raises_regex(ValueError, "^Column length"): SparseDataFrame(self.frame.values, columns=self.frame.columns[:-1]) # GH 9272 def test_constructor_empty(self): sp = SparseDataFrame() - self.assertEqual(len(sp.index), 0) - self.assertEqual(len(sp.columns), 0) + assert len(sp.index) == 0 + assert len(sp.columns) == 0 def test_constructor_dataframe(self): dense = self.frame.to_dense() @@ -166,16 +177,16 @@ def test_constructor_dataframe(self): def test_constructor_convert_index_once(self): arr = np.array([1.5, 2.5, 3.5]) sdf = SparseDataFrame(columns=lrange(4), index=arr) - self.assertTrue(sdf[0].index is sdf[1].index) + assert sdf[0].index is sdf[1].index def test_constructor_from_series(self): # GH 2873 x = Series(np.random.randn(10000), name='a') x = x.to_sparse(fill_value=0) - tm.assertIsInstance(x, SparseSeries) + assert isinstance(x, SparseSeries) df = SparseDataFrame(x) - tm.assertIsInstance(df, SparseDataFrame) + assert isinstance(df, SparseDataFrame) x = Series(np.random.randn(10000), name='a') y = Series(np.random.randn(10000), name='b') @@ -196,24 +207,24 @@ def test_constructor_from_series(self): def test_constructor_preserve_attr(self): # GH 13866 arr = pd.SparseArray([1, 0, 3, 0], dtype=np.int64, fill_value=0) - self.assertEqual(arr.dtype, np.int64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.int64 + assert arr.fill_value == 0 df = pd.SparseDataFrame({'x': arr}) - self.assertEqual(df['x'].dtype, np.int64) - self.assertEqual(df['x'].fill_value, 0) + assert df['x'].dtype == np.int64 + assert df['x'].fill_value == 0 s = pd.SparseSeries(arr, name='x') - self.assertEqual(s.dtype, np.int64) - self.assertEqual(s.fill_value, 0) + assert s.dtype == np.int64 + assert s.fill_value == 0 df = pd.SparseDataFrame(s) - self.assertEqual(df['x'].dtype, np.int64) - self.assertEqual(df['x'].fill_value, 0) + assert df['x'].dtype == np.int64 + assert df['x'].fill_value == 0 df = pd.SparseDataFrame({'x': s}) - self.assertEqual(df['x'].dtype, np.int64) - self.assertEqual(df['x'].fill_value, 0) + assert df['x'].dtype == np.int64 + assert df['x'].fill_value == 0 def test_constructor_nan_dataframe(self): # GH 10079 @@ -230,6 +241,18 @@ def test_constructor_nan_dataframe(self): dtype=float) tm.assert_sp_frame_equal(result, expected) + def test_type_coercion_at_construction(self): + # GH 15682 + result = pd.SparseDataFrame( + {'a': [1, 0, 0], 'b': [0, 1, 0], 'c': [0, 0, 1]}, dtype='uint8', + default_fill_value=0) + expected = pd.SparseDataFrame( + {'a': pd.SparseSeries([1, 0, 0], dtype='uint8'), + 'b': pd.SparseSeries([0, 1, 0], dtype='uint8'), + 'c': pd.SparseSeries([0, 0, 1], dtype='uint8')}, + default_fill_value=0) + tm.assert_sp_frame_equal(result, expected) + def test_dtypes(self): df = DataFrame(np.random.randn(10000, 4)) df.loc[:9998] = np.nan @@ -240,11 +263,11 @@ def test_dtypes(self): tm.assert_series_equal(result, expected) def test_shape(self): - # GH 
10452 - self.assertEqual(self.frame.shape, (10, 4)) - self.assertEqual(self.iframe.shape, (10, 4)) - self.assertEqual(self.zframe.shape, (10, 4)) - self.assertEqual(self.fill_frame.shape, (10, 4)) + # see gh-10452 + assert self.frame.shape == (10, 4) + assert self.iframe.shape == (10, 4) + assert self.zframe.shape == (10, 4) + assert self.fill_frame.shape == (10, 4) def test_str(self): df = DataFrame(np.random.randn(10000, 4)) @@ -261,7 +284,7 @@ def test_array_interface(self): def test_pickle(self): def _test_roundtrip(frame, orig): - result = self.round_trip_pickle(frame) + result = tm.round_trip_pickle(frame) tm.assert_sp_frame_equal(frame, result) tm.assert_frame_equal(result.to_dense(), orig, check_dtype=False) @@ -272,30 +295,30 @@ def test_dense_to_sparse(self): df = DataFrame({'A': [nan, nan, nan, 1, 2], 'B': [1, 2, nan, nan, nan]}) sdf = df.to_sparse() - tm.assertIsInstance(sdf, SparseDataFrame) - self.assertTrue(np.isnan(sdf.default_fill_value)) - tm.assertIsInstance(sdf['A'].sp_index, BlockIndex) + assert isinstance(sdf, SparseDataFrame) + assert np.isnan(sdf.default_fill_value) + assert isinstance(sdf['A'].sp_index, BlockIndex) tm.assert_frame_equal(sdf.to_dense(), df) sdf = df.to_sparse(kind='integer') - tm.assertIsInstance(sdf['A'].sp_index, IntIndex) + assert isinstance(sdf['A'].sp_index, IntIndex) df = DataFrame({'A': [0, 0, 0, 1, 2], 'B': [1, 2, 0, 0, 0]}, dtype=float) sdf = df.to_sparse(fill_value=0) - self.assertEqual(sdf.default_fill_value, 0) + assert sdf.default_fill_value == 0 tm.assert_frame_equal(sdf.to_dense(), df) def test_density(self): df = SparseSeries([nan, nan, nan, 0, 1, 2, 3, 4, 5, 6]) - self.assertEqual(df.density, 0.7) + assert df.density == 0.7 df = SparseDataFrame({'A': [nan, nan, nan, 0, 1, 2, 3, 4, 5, 6], 'B': [0, 1, 2, nan, nan, nan, 3, 4, 5, 6], 'C': np.arange(10), 'D': [0, 1, 2, 3, 4, 5, nan, nan, nan, nan]}) - self.assertEqual(df.density, 0.75) + assert df.density == 0.75 def test_sparse_to_dense(self): pass @@ -325,7 +348,7 @@ def _compare_to_dense(a, b, da, db, op): if isinstance(a, DataFrame) and isinstance(db, DataFrame): mixed_result = op(a, db) - tm.assertIsInstance(mixed_result, SparseDataFrame) + assert isinstance(mixed_result, SparseDataFrame) tm.assert_sp_frame_equal(mixed_result, sparse_result, exact_indices=False) @@ -368,10 +391,10 @@ def _compare_to_dense(a, b, da, db, op): def test_op_corners(self): empty = self.empty + self.empty - self.assertTrue(empty.empty) + assert empty.empty foo = self.frame + self.empty - tm.assertIsInstance(foo.index, DatetimeIndex) + assert isinstance(foo.index, DatetimeIndex) tm.assert_frame_equal(foo, self.frame * np.nan) foo = self.empty + self.frame @@ -388,42 +411,41 @@ def test_getitem(self): exp = sdf.reindex(columns=['a', 'b']) tm.assert_sp_frame_equal(result, exp) - self.assertRaises(Exception, sdf.__getitem__, ['a', 'd']) + pytest.raises(Exception, sdf.__getitem__, ['a', 'd']) - def test_icol(self): - # 10711 deprecated + def test_iloc(self): # 2227 result = self.frame.iloc[:, 0] - self.assertTrue(isinstance(result, SparseSeries)) + assert isinstance(result, SparseSeries) tm.assert_sp_series_equal(result, self.frame['A']) # preserve sparse index type. 
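The dense/sparse round-trip hunks above assert that `to_sparse()` defaults to a NaN fill value and inverts cleanly through `to_dense()`. A sketch under the same pre-1.0 API:

```python
import numpy as np
import pandas as pd
import pandas.util.testing as tm

df = pd.DataFrame({'A': [np.nan, np.nan, 1.0, 2.0],
                   'B': [1.0, 2.0, np.nan, np.nan]})
sdf = df.to_sparse()
assert isinstance(sdf, pd.SparseDataFrame)
assert np.isnan(sdf.default_fill_value)       # NaN is the default hole
tm.assert_frame_equal(sdf.to_dense(), df)     # lossless round trip
```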
#2251 data = {'A': [0, 1]} iframe = SparseDataFrame(data, default_kind='integer') - self.assertEqual(type(iframe['A'].sp_index), - type(iframe.iloc[:, 0].sp_index)) + tm.assert_class_equal(iframe['A'].sp_index, + iframe.iloc[:, 0].sp_index) def test_set_value(self): - # ok as the index gets conver to object + # ok, as the index gets converted to object frame = self.frame.copy() res = frame.set_value('foobar', 'B', 1.5) - self.assertEqual(res.index.dtype, 'object') + assert res.index.dtype == 'object' res = self.frame res.index = res.index.astype(object) res = self.frame.set_value('foobar', 'B', 1.5) - self.assertIsNot(res, self.frame) - self.assertEqual(res.index[-1], 'foobar') - self.assertEqual(res.get_value('foobar', 'B'), 1.5) + assert res is not self.frame + assert res.index[-1] == 'foobar' + assert res.get_value('foobar', 'B') == 1.5 res2 = res.set_value('foobar', 'qux', 1.5) - self.assertIsNot(res2, res) - self.assert_index_equal(res2.columns, - pd.Index(list(self.frame.columns) + ['qux'])) - self.assertEqual(res2.get_value('foobar', 'qux'), 1.5) + assert res2 is not res + tm.assert_index_equal(res2.columns, + pd.Index(list(self.frame.columns) + ['qux'])) + assert res2.get_value('foobar', 'qux') == 1.5 def test_fancy_index_misc(self): # axis = 0 @@ -448,8 +470,8 @@ def test_getitem_overload(self): subindex = self.frame.index[indexer] subframe = self.frame[indexer] - self.assert_index_equal(subindex, subframe.index) - self.assertRaises(Exception, self.frame.__getitem__, indexer[:-1]) + tm.assert_index_equal(subindex, subframe.index) + pytest.raises(Exception, self.frame.__getitem__, indexer[:-1]) def test_setitem(self): @@ -458,7 +480,7 @@ def _check_frame(frame, orig): # insert SparseSeries frame['E'] = frame['A'] - tm.assertIsInstance(frame['E'], SparseSeries) + assert isinstance(frame['E'], SparseSeries) tm.assert_sp_series_equal(frame['E'], frame['A'], check_names=False) @@ -468,11 +490,11 @@ def _check_frame(frame, orig): expected = to_insert.to_dense().reindex(frame.index) result = frame['E'].to_dense() tm.assert_series_equal(result, expected, check_names=False) - self.assertEqual(result.name, 'E') + assert result.name == 'E' # insert Series frame['F'] = frame['A'].to_dense() - tm.assertIsInstance(frame['F'], SparseSeries) + assert isinstance(frame['F'], SparseSeries) tm.assert_sp_series_equal(frame['F'], frame['A'], check_names=False) @@ -485,24 +507,24 @@ def _check_frame(frame, orig): # insert ndarray frame['H'] = np.random.randn(N) - tm.assertIsInstance(frame['H'], SparseSeries) + assert isinstance(frame['H'], SparseSeries) to_sparsify = np.random.randn(N) to_sparsify[N // 2:] = frame.default_fill_value frame['I'] = to_sparsify - self.assertEqual(len(frame['I'].sp_values), N // 2) + assert len(frame['I'].sp_values) == N // 2 # insert ndarray wrong size - self.assertRaises(Exception, frame.__setitem__, 'foo', - np.random.randn(N - 1)) + pytest.raises(Exception, frame.__setitem__, 'foo', + np.random.randn(N - 1)) # scalar value frame['J'] = 5 - self.assertEqual(len(frame['J'].sp_values), N) - self.assertTrue((frame['J'].sp_values == 5).all()) + assert len(frame['J'].sp_values) == N + assert (frame['J'].sp_values == 5).all() frame['K'] = frame.default_fill_value - self.assertEqual(len(frame['K'].sp_values), 0) + assert len(frame['K'].sp_values) == 0 self._check_all(_check_frame) @@ -529,25 +551,25 @@ def test_delitem(self): C = self.frame['C'] del self.frame['B'] - self.assertNotIn('B', self.frame) + assert 'B' not in self.frame tm.assert_sp_series_equal(self.frame['A'], A) 
tm.assert_sp_series_equal(self.frame['C'], C) del self.frame['D'] - self.assertNotIn('D', self.frame) + assert 'D' not in self.frame del self.frame['A'] - self.assertNotIn('A', self.frame) + assert 'A' not in self.frame def test_set_columns(self): self.frame.columns = self.frame.columns - self.assertRaises(Exception, setattr, self.frame, 'columns', - self.frame.columns[:-1]) + pytest.raises(Exception, setattr, self.frame, 'columns', + self.frame.columns[:-1]) def test_set_index(self): self.frame.index = self.frame.index - self.assertRaises(Exception, setattr, self.frame, 'index', - self.frame.index[:-1]) + pytest.raises(Exception, setattr, self.frame, 'index', + self.frame.index[:-1]) def test_append(self): a = self.frame[:5] @@ -564,20 +586,20 @@ def test_append(self): def test_apply(self): applied = self.frame.apply(np.sqrt) - tm.assertIsInstance(applied, SparseDataFrame) + assert isinstance(applied, SparseDataFrame) tm.assert_almost_equal(applied.values, np.sqrt(self.frame.values)) applied = self.fill_frame.apply(np.sqrt) - self.assertEqual(applied['A'].fill_value, np.sqrt(2)) + assert applied['A'].fill_value == np.sqrt(2) # agg / broadcast broadcasted = self.frame.apply(np.sum, broadcast=True) - tm.assertIsInstance(broadcasted, SparseDataFrame) + assert isinstance(broadcasted, SparseDataFrame) exp = self.frame.to_dense().apply(np.sum, broadcast=True) tm.assert_frame_equal(broadcasted.to_dense(), exp) - self.assertIs(self.empty.apply(np.sqrt), self.empty) + assert self.empty.apply(np.sqrt) is self.empty from pandas.core import nanops applied = self.frame.apply(np.sum) @@ -591,9 +613,9 @@ def test_apply_nonuq(self): res = sparse.apply(lambda s: s[0], axis=1) exp = orig.apply(lambda s: s[0], axis=1) # dtype must be kept - self.assertEqual(res.dtype, np.int64) + assert res.dtype == np.int64 # ToDo: apply must return subclassed dtype - self.assertIsInstance(res, pd.Series) + assert isinstance(res, pd.Series) tm.assert_series_equal(res.to_dense(), exp) # df.T breaks @@ -606,15 +628,15 @@ def test_apply_nonuq(self): def test_applymap(self): # just test that it works result = self.frame.applymap(lambda x: x * 2) - tm.assertIsInstance(result, SparseDataFrame) + assert isinstance(result, SparseDataFrame) def test_astype(self): sparse = pd.SparseDataFrame({'A': SparseArray([1, 2, 3, 4], dtype=np.int64), 'B': SparseArray([4, 5, 6, 7], dtype=np.int64)}) - self.assertEqual(sparse['A'].dtype, np.int64) - self.assertEqual(sparse['B'].dtype, np.int64) + assert sparse['A'].dtype == np.int64 + assert sparse['B'].dtype == np.int64 res = sparse.astype(np.float64) exp = pd.SparseDataFrame({'A': SparseArray([1., 2., 3., 4.], @@ -623,16 +645,16 @@ def test_astype(self): fill_value=0.)}, default_fill_value=np.nan) tm.assert_sp_frame_equal(res, exp) - self.assertEqual(res['A'].dtype, np.float64) - self.assertEqual(res['B'].dtype, np.float64) + assert res['A'].dtype == np.float64 + assert res['B'].dtype == np.float64 sparse = pd.SparseDataFrame({'A': SparseArray([0, 2, 0, 4], dtype=np.int64), 'B': SparseArray([0, 5, 0, 7], dtype=np.int64)}, default_fill_value=0) - self.assertEqual(sparse['A'].dtype, np.int64) - self.assertEqual(sparse['B'].dtype, np.int64) + assert sparse['A'].dtype == np.int64 + assert sparse['B'].dtype == np.int64 res = sparse.astype(np.float64) exp = pd.SparseDataFrame({'A': SparseArray([0., 2., 0., 4.], @@ -641,8 +663,8 @@ def test_astype(self): fill_value=0.)}, default_fill_value=0.) 
tm.assert_sp_frame_equal(res, exp) - self.assertEqual(res['A'].dtype, np.float64) - self.assertEqual(res['B'].dtype, np.float64) + assert res['A'].dtype == np.float64 + assert res['B'].dtype == np.float64 def test_astype_bool(self): sparse = pd.SparseDataFrame({'A': SparseArray([0, 2, 0, 4], @@ -652,8 +674,8 @@ def test_astype_bool(self): fill_value=0, dtype=np.int64)}, default_fill_value=0) - self.assertEqual(sparse['A'].dtype, np.int64) - self.assertEqual(sparse['B'].dtype, np.int64) + assert sparse['A'].dtype == np.int64 + assert sparse['B'].dtype == np.int64 res = sparse.astype(bool) exp = pd.SparseDataFrame({'A': SparseArray([False, True, False, True], @@ -664,8 +686,8 @@ def test_astype_bool(self): fill_value=False)}, default_fill_value=False) tm.assert_sp_frame_equal(res, exp) - self.assertEqual(res['A'].dtype, np.bool) - self.assertEqual(res['B'].dtype, np.bool) + assert res['A'].dtype == np.bool + assert res['B'].dtype == np.bool def test_fillna(self): df = self.zframe.reindex(lrange(5)) @@ -705,10 +727,63 @@ def test_fillna_fill_value(self): tm.assert_frame_equal(sparse.fillna(-1).to_dense(), df.fillna(-1), check_dtype=False) + def test_sparse_frame_pad_backfill_limit(self): + index = np.arange(10) + df = DataFrame(np.random.randn(10, 4), index=index) + sdf = df.to_sparse() + + result = sdf[:2].reindex(index, method='pad', limit=5) + + expected = sdf[:2].reindex(index).fillna(method='pad') + expected = expected.to_dense() + expected.values[-3:] = np.nan + expected = expected.to_sparse() + tm.assert_frame_equal(result, expected) + + result = sdf[-2:].reindex(index, method='backfill', limit=5) + + expected = sdf[-2:].reindex(index).fillna(method='backfill') + expected = expected.to_dense() + expected.values[:3] = np.nan + expected = expected.to_sparse() + tm.assert_frame_equal(result, expected) + + def test_sparse_frame_fillna_limit(self): + index = np.arange(10) + df = DataFrame(np.random.randn(10, 4), index=index) + sdf = df.to_sparse() + + result = sdf[:2].reindex(index) + result = result.fillna(method='pad', limit=5) + + expected = sdf[:2].reindex(index).fillna(method='pad') + expected = expected.to_dense() + expected.values[-3:] = np.nan + expected = expected.to_sparse() + tm.assert_frame_equal(result, expected) + + result = sdf[-2:].reindex(index) + result = result.fillna(method='backfill', limit=5) + + expected = sdf[-2:].reindex(index).fillna(method='backfill') + expected = expected.to_dense() + expected.values[:3] = np.nan + expected = expected.to_sparse() + tm.assert_frame_equal(result, expected) + def test_rename(self): - # just check this works - renamed = self.frame.rename(index=str) # noqa - renamed = self.frame.rename(columns=lambda x: '%s%d' % (x, len(x))) # noqa + result = self.frame.rename(index=str) + expected = SparseDataFrame(self.data, index=self.dates.strftime( + "%Y-%m-%d %H:%M:%S")) + tm.assert_sp_frame_equal(result, expected) + + result = self.frame.rename(columns=lambda x: '%s%d' % (x, len(x))) + data = {'A1': [nan, nan, nan, 0, 1, 2, 3, 4, 5, 6], + 'B1': [0, 1, 2, nan, nan, nan, 3, 4, 5, 6], + 'C1': np.arange(10, dtype=np.float64), + 'D1': [0, 1, 2, 3, 4, 5, nan, nan, nan, nan]} + expected = SparseDataFrame(data, index=self.dates) + tm.assert_sp_frame_equal(result, expected) def test_corr(self): res = self.frame.corr() @@ -727,10 +802,10 @@ def test_join(self): tm.assert_sp_frame_equal(joined, self.frame, exact_indices=False) right = self.frame.loc[:, ['B', 'D']] - self.assertRaises(Exception, left.join, right) + pytest.raises(Exception, left.join, 
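The `astype` hunks assert that casting int64 sparse columns to float64 updates both the stored values and each column's dtype. A compressed sketch (pre-1.0 pandas assumed):

```python
import numpy as np
import pandas as pd
from pandas.core.sparse.api import SparseArray

sparse = pd.SparseDataFrame({'A': SparseArray([1, 2, 3, 4],
                                              dtype=np.int64)})
assert sparse['A'].dtype == np.int64

res = sparse.astype(np.float64)
assert res['A'].dtype == np.float64
```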
right) - with tm.assertRaisesRegexp(ValueError, - 'Other Series must have a name'): + with tm.assert_raises_regex(ValueError, + 'Other Series must have a name'): self.frame.join(Series( np.random.randn(len(self.frame)), index=self.frame.index)) @@ -760,22 +835,22 @@ def _check_frame(frame): # length zero length_zero = frame.reindex([]) - self.assertEqual(len(length_zero), 0) - self.assertEqual(len(length_zero.columns), len(frame.columns)) - self.assertEqual(len(length_zero['A']), 0) + assert len(length_zero) == 0 + assert len(length_zero.columns) == len(frame.columns) + assert len(length_zero['A']) == 0 # frame being reindexed has length zero length_n = length_zero.reindex(index) - self.assertEqual(len(length_n), len(frame)) - self.assertEqual(len(length_n.columns), len(frame.columns)) - self.assertEqual(len(length_n['A']), len(frame)) + assert len(length_n) == len(frame) + assert len(length_n.columns) == len(frame.columns) + assert len(length_n['A']) == len(frame) # reindex columns reindexed = frame.reindex(columns=['A', 'B', 'Z']) - self.assertEqual(len(reindexed.columns), 3) + assert len(reindexed.columns) == 3 tm.assert_almost_equal(reindexed['Z'].fill_value, frame.default_fill_value) - self.assertTrue(np.isnan(reindexed['Z'].sp_values).all()) + assert np.isnan(reindexed['Z'].sp_values).all() _check_frame(self.frame) _check_frame(self.iframe) @@ -785,11 +860,11 @@ def _check_frame(frame): # with copy=False reindexed = self.frame.reindex(self.frame.index, copy=False) reindexed['F'] = reindexed['A'] - self.assertIn('F', self.frame) + assert 'F' in self.frame reindexed = self.frame.reindex(self.frame.index) reindexed['G'] = reindexed['A'] - self.assertNotIn('G', self.frame) + assert 'G' not in self.frame def test_reindex_fill_value(self): rng = bdate_range('20110110', periods=20) @@ -862,11 +937,11 @@ def test_reindex_method(self): tm.assert_sp_frame_equal(result, expected) # method='bfill' - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): sparse.reindex(columns=range(6), method='bfill') # method='ffill' - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): sparse.reindex(columns=range(6), method='ffill') def test_take(self): @@ -883,23 +958,25 @@ def _check(frame, orig): self._check_all(_check) def test_stack_sparse_frame(self): - def _check(frame): - dense_frame = frame.to_dense() # noqa + with catch_warnings(record=True): + + def _check(frame): + dense_frame = frame.to_dense() # noqa - wp = Panel.from_dict({'foo': frame}) - from_dense_lp = wp.to_frame() + wp = Panel.from_dict({'foo': frame}) + from_dense_lp = wp.to_frame() - from_sparse_lp = spf.stack_sparse_frame(frame) + from_sparse_lp = spf.stack_sparse_frame(frame) - self.assert_numpy_array_equal(from_dense_lp.values, - from_sparse_lp.values) + tm.assert_numpy_array_equal(from_dense_lp.values, + from_sparse_lp.values) - _check(self.frame) - _check(self.iframe) + _check(self.frame) + _check(self.iframe) - # for now - self.assertRaises(Exception, _check, self.zframe) - self.assertRaises(Exception, _check, self.fill_frame) + # for now + pytest.raises(Exception, _check, self.zframe) + pytest.raises(Exception, _check, self.fill_frame) def test_transpose(self): @@ -917,7 +994,6 @@ def _check(frame, orig): def test_shift(self): def _check(frame, orig): - shifted = frame.shift(0) exp = orig.shift(0) tm.assert_frame_equal(shifted.to_dense(), exp) @@ -932,12 +1008,14 @@ def _check(frame, orig): shifted = frame.shift(2, freq='B') exp = orig.shift(2, freq='B') - exp = 
exp.to_sparse(frame.default_fill_value) + exp = exp.to_sparse(frame.default_fill_value, + kind=frame.default_kind) tm.assert_frame_equal(shifted, exp) shifted = frame.shift(2, freq=BDay()) exp = orig.shift(2, freq=BDay()) - exp = exp.to_sparse(frame.default_fill_value) + exp = exp.to_sparse(frame.default_fill_value, + kind=frame.default_kind) tm.assert_frame_equal(shifted, exp) self._check_all(_check) @@ -972,7 +1050,7 @@ def test_numpy_transpose(self): tm.assert_sp_frame_equal(result, sdf) msg = "the 'axes' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.transpose, sdf, axes=1) + tm.assert_raises_regex(ValueError, msg, np.transpose, sdf, axes=1) def test_combine_first(self): df = self.frame @@ -1010,33 +1088,35 @@ def test_sparse_pow_issue(self): df = SparseDataFrame({'A': [nan, 0, 1]}) # note that 2 ** df works fine, also df ** 1 - result = 1**df + result = 1 ** df r1 = result.take([0], 1)['A'] r2 = result['A'] - self.assertEqual(len(r2.sp_values), len(r1.sp_values)) + assert len(r2.sp_values) == len(r1.sp_values) def test_as_blocks(self): df = SparseDataFrame({'A': [1.1, 3.3], 'B': [nan, -3.9]}, dtype='float64') df_blocks = df.blocks - self.assertEqual(list(df_blocks.keys()), ['float64']) + assert list(df_blocks.keys()) == ['float64'] tm.assert_frame_equal(df_blocks['float64'], df) + @pytest.mark.xfail(reason='nan column names in _init_dict problematic ' + '(GH 16894)') def test_nan_columnname(self): # GH 8822 nan_colname = DataFrame(Series(1.0, index=[0]), columns=[nan]) nan_colname_sparse = nan_colname.to_sparse() - self.assertTrue(np.isnan(nan_colname_sparse.columns[0])) + assert np.isnan(nan_colname_sparse.columns[0]) - def test_isnull(self): + def test_isna(self): # GH 8276 df = pd.SparseDataFrame({'A': [np.nan, np.nan, 1, 2, np.nan], 'B': [0, np.nan, np.nan, 2, np.nan]}) - res = df.isnull() + res = df.isna() exp = pd.SparseDataFrame({'A': [True, True, False, False, True], 'B': [False, True, True, False, True]}, default_fill_value=True) @@ -1047,18 +1127,18 @@ def test_isnull(self): df = pd.SparseDataFrame({'A': [0, 0, 1, 2, np.nan], 'B': [0, np.nan, 0, 2, np.nan]}, default_fill_value=0.) - res = df.isnull() - tm.assertIsInstance(res, pd.SparseDataFrame) + res = df.isna() + assert isinstance(res, pd.SparseDataFrame) exp = pd.DataFrame({'A': [False, False, False, False, True], 'B': [False, True, False, False, True]}) tm.assert_frame_equal(res.to_dense(), exp) - def test_isnotnull(self): + def test_notna(self): # GH 8276 df = pd.SparseDataFrame({'A': [np.nan, np.nan, 1, 2, np.nan], 'B': [0, np.nan, np.nan, 2, np.nan]}) - res = df.isnotnull() + res = df.notna() exp = pd.SparseDataFrame({'A': [False, False, True, True, False], 'B': [True, False, False, True, False]}, default_fill_value=False) @@ -1069,14 +1149,170 @@ def test_isnotnull(self): df = pd.SparseDataFrame({'A': [0, 0, 1, 2, np.nan], 'B': [0, np.nan, 0, 2, np.nan]}, default_fill_value=0.) 
- res = df.isnotnull() - tm.assertIsInstance(res, pd.SparseDataFrame) + res = df.notna() + assert isinstance(res, pd.SparseDataFrame) exp = pd.DataFrame({'A': [True, True, True, True, False], 'B': [True, False, True, True, False]}) tm.assert_frame_equal(res.to_dense(), exp) -class TestSparseDataFrameArithmetic(tm.TestCase): +@pytest.mark.parametrize('index', [None, list('abc')]) # noqa: F811 +@pytest.mark.parametrize('columns', [None, list('def')]) +@pytest.mark.parametrize('fill_value', [None, 0, np.nan]) +@pytest.mark.parametrize('dtype', [bool, int, float, np.uint16]) +def test_from_to_scipy(spmatrix, index, columns, fill_value, dtype): + # GH 4343 + tm.skip_if_no_package('scipy') + + # Make one ndarray and from it one sparse matrix, both to be used for + # constructing frames and comparing results + arr = np.eye(3, dtype=dtype) + # GH 16179 + arr[0, 1] = dtype(2) + try: + spm = spmatrix(arr) + assert spm.dtype == arr.dtype + except (TypeError, AssertionError): + # If conversion to sparse fails for this spmatrix type and arr.dtype, + # then the combination is not currently supported in NumPy, so we + # can just skip testing it thoroughly + return + + sdf = pd.SparseDataFrame(spm, index=index, columns=columns, + default_fill_value=fill_value) + + # Expected result construction is kind of tricky for all + # dtype-fill_value combinations; easiest to cast to something generic + # and except later on + rarr = arr.astype(object) + rarr[arr == 0] = np.nan + expected = pd.SparseDataFrame(rarr, index=index, columns=columns).fillna( + fill_value if fill_value is not None else np.nan) + + # Assert frame is as expected + sdf_obj = sdf.astype(object) + tm.assert_sp_frame_equal(sdf_obj, expected) + tm.assert_frame_equal(sdf_obj.to_dense(), expected.to_dense()) + + # Assert spmatrices equal + assert dict(sdf.to_coo().todok()) == dict(spm.todok()) + + # Ensure dtype is preserved if possible + was_upcast = ((fill_value is None or is_float(fill_value)) and + not is_object_dtype(dtype) and + not is_float_dtype(dtype)) + res_dtype = (bool if is_bool_dtype(dtype) else + float if was_upcast else + dtype) + tm.assert_contains_all(sdf.dtypes, {np.dtype(res_dtype)}) + assert sdf.to_coo().dtype == res_dtype + + # However, adding a str column results in an upcast to object + sdf['strings'] = np.arange(len(sdf)).astype(str) + assert sdf.to_coo().dtype == np.object_ + + +@pytest.mark.parametrize('fill_value', [None, 0, np.nan]) # noqa: F811 +def test_from_to_scipy_object(spmatrix, fill_value): + # GH 4343 + dtype = object + columns = list('cd') + index = list('ab') + tm.skip_if_no_package('scipy', max_version='0.19.0') + + # Make one ndarray and from it one sparse matrix, both to be used for + # constructing frames and comparing results + arr = np.eye(2, dtype=dtype) + try: + spm = spmatrix(arr) + assert spm.dtype == arr.dtype + except (TypeError, AssertionError): + # If conversion to sparse fails for this spmatrix type and arr.dtype, + # then the combination is not currently supported in NumPy, so we + # can just skip testing it thoroughly + return + + sdf = pd.SparseDataFrame(spm, index=index, columns=columns, + default_fill_value=fill_value) + + # Expected result construction is kind of tricky for all + # dtype-fill_value combinations; easiest to cast to something generic + # and except later on + rarr = arr.astype(object) + rarr[arr == 0] = np.nan + expected = pd.SparseDataFrame(rarr, index=index, columns=columns).fillna( + fill_value if fill_value is not None else np.nan) + + # Assert frame is as expected + 
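These hunks track the `isnull`/`isnotnull` to `isna`/`notna` rename that landed in this era of pandas (the old spellings remained as aliases for a while). A sketch of the new names on a sparse frame:

```python
import numpy as np
import pandas as pd
import pandas.util.testing as tm

df = pd.SparseDataFrame({'A': [np.nan, 1.0], 'B': [0.0, np.nan]})
res = df.isna()
assert isinstance(res, pd.SparseDataFrame)
# agrees with the dense result element-wise
tm.assert_frame_equal(res.to_dense(), df.to_dense().isna())
```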
sdf_obj = sdf.astype(object) + tm.assert_sp_frame_equal(sdf_obj, expected) + tm.assert_frame_equal(sdf_obj.to_dense(), expected.to_dense()) + + # Assert spmatrices equal + assert dict(sdf.to_coo().todok()) == dict(spm.todok()) + + # Ensure dtype is preserved if possible + res_dtype = object + tm.assert_contains_all(sdf.dtypes, {np.dtype(res_dtype)}) + assert sdf.to_coo().dtype == res_dtype + + +def test_from_scipy_correct_ordering(spmatrix): + # GH 16179 + tm.skip_if_no_package('scipy') + + arr = np.arange(1, 5).reshape(2, 2) + try: + spm = spmatrix(arr) + assert spm.dtype == arr.dtype + except (TypeError, AssertionError): + # If conversion to sparse fails for this spmatrix type and arr.dtype, + # then the combination is not currently supported in NumPy, so we + # can just skip testing it thoroughly + return + + sdf = pd.SparseDataFrame(spm) + expected = pd.SparseDataFrame(arr) + tm.assert_sp_frame_equal(sdf, expected) + tm.assert_frame_equal(sdf.to_dense(), expected.to_dense()) + + +def test_from_scipy_fillna(spmatrix): + # GH 16112 + tm.skip_if_no_package('scipy') + + arr = np.eye(3) + arr[1:, 0] = np.nan + + try: + spm = spmatrix(arr) + assert spm.dtype == arr.dtype + except (TypeError, AssertionError): + # If conversion to sparse fails for this spmatrix type and arr.dtype, + # then the combination is not currently supported in NumPy, so we + # can just skip testing it thoroughly + return + + sdf = pd.SparseDataFrame(spm).fillna(-1.0) + + # Returning frame should fill all nan values with -1.0 + expected = pd.SparseDataFrame({ + 0: pd.SparseSeries([1., -1, -1]), + 1: pd.SparseSeries([np.nan, 1, np.nan]), + 2: pd.SparseSeries([np.nan, np.nan, 1]), + }, default_fill_value=-1) + + # fill_value is expected to be what .fillna() above was called with + # We don't use -1 as initial fill_value in expected SparseSeries + # construction because this way we obtain "compressed" SparseArrays, + # avoiding having to construct them ourselves + for col in expected: + expected[col].fill_value = -1 + + tm.assert_sp_frame_equal(sdf, expected) + + +class TestSparseDataFrameArithmetic(object): def test_numeric_op_scalar(self): df = pd.DataFrame({'A': [nan, nan, 0, 1, ], @@ -1097,16 +1333,16 @@ def test_comparison_op_scalar(self): # comparison changes internal repr, compare with dense res = sparse > 1 - tm.assertIsInstance(res, pd.SparseDataFrame) + assert isinstance(res, pd.SparseDataFrame) tm.assert_frame_equal(res.to_dense(), df > 1) res = sparse != 0 - tm.assertIsInstance(res, pd.SparseDataFrame) + assert isinstance(res, pd.SparseDataFrame) tm.assert_frame_equal(res.to_dense(), df != 0) -class TestSparseDataFrameAnalytics(tm.TestCase): - def setUp(self): +class TestSparseDataFrameAnalytics(object): + def setup_method(self, method): self.data = {'A': [nan, nan, nan, 0, 1, 2, 3, 4, 5, 6], 'B': [0, 1, 2, nan, nan, nan, 3, 4, 5, 6], 'C': np.arange(10, dtype=float), @@ -1134,12 +1370,12 @@ def test_numpy_cumsum(self): tm.assert_sp_frame_equal(result, expected) msg = "the 'dtype' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.cumsum, - self.frame, dtype=np.int64) + tm.assert_raises_regex(ValueError, msg, np.cumsum, + self.frame, dtype=np.int64) msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.cumsum, - self.frame, out=result) + tm.assert_raises_regex(ValueError, msg, np.cumsum, + self.frame, out=result) def test_numpy_func_call(self): # no exception should be raised even though @@ -1149,8 +1385,3 @@ def test_numpy_func_call(self): 'std', 'min', 
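The new parametrized tests cover round-tripping between `SparseDataFrame` and `scipy.sparse` matrices (GH 4343). A minimal sketch, assuming SciPy is installed and using `csr_matrix` as one representative `spmatrix`:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

spm = csr_matrix(np.eye(3))                       # only the ones are stored
sdf = pd.SparseDataFrame(spm, index=list('abc'), columns=list('xyz'))

# stored entries survive the round trip through .to_coo()
assert dict(sdf.to_coo().todok()) == dict(spm.todok())
```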
'max'] for func in funcs: getattr(np, func)(self.frame) - -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/sparse/tests/test_groupby.py b/pandas/tests/sparse/test_groupby.py similarity index 94% rename from pandas/sparse/tests/test_groupby.py rename to pandas/tests/sparse/test_groupby.py index 0cb33f4ea0a56..c9049ed9743dd 100644 --- a/pandas/sparse/tests/test_groupby.py +++ b/pandas/tests/sparse/test_groupby.py @@ -4,11 +4,9 @@ import pandas.util.testing as tm -class TestSparseGroupBy(tm.TestCase): +class TestSparseGroupBy(object): - _multiprocess_can_split_ = True - - def setUp(self): + def setup_method(self, method): self.dense = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], 'B': ['one', 'one', 'two', 'three', diff --git a/pandas/sparse/tests/test_indexing.py b/pandas/tests/sparse/test_indexing.py similarity index 83% rename from pandas/sparse/tests/test_indexing.py rename to pandas/tests/sparse/test_indexing.py index a634c34139186..382cff4b9d0ac 100644 --- a/pandas/sparse/tests/test_indexing.py +++ b/pandas/tests/sparse/test_indexing.py @@ -1,16 +1,14 @@ # pylint: disable-msg=E1101,W0612 -import nose # noqa +import pytest import numpy as np import pandas as pd import pandas.util.testing as tm -class TestSparseSeriesIndexing(tm.TestCase): +class TestSparseSeriesIndexing(object): - _multiprocess_can_split_ = True - - def setUp(self): + def setup_method(self, method): self.orig = pd.Series([1, np.nan, np.nan, 3, np.nan]) self.sparse = self.orig.to_sparse() @@ -18,9 +16,9 @@ def test_getitem(self): orig = self.orig sparse = self.sparse - self.assertEqual(sparse[0], 1) - self.assertTrue(np.isnan(sparse[1])) - self.assertEqual(sparse[3], 3) + assert sparse[0] == 1 + assert np.isnan(sparse[1]) + assert sparse[3] == 3 result = sparse[[1, 3, 4]] exp = orig[[1, 3, 4]].to_sparse() @@ -55,23 +53,23 @@ def test_getitem_int_dtype(self): res = s[::2] exp = pd.SparseSeries([0, 2, 4, 6], index=[0, 2, 4, 6], name='xxx') tm.assert_sp_series_equal(res, exp) - self.assertEqual(res.dtype, np.int64) + assert res.dtype == np.int64 s = pd.SparseSeries([0, 1, 2, 3, 4, 5, 6], fill_value=0, name='xxx') res = s[::2] exp = pd.SparseSeries([0, 2, 4, 6], index=[0, 2, 4, 6], fill_value=0, name='xxx') tm.assert_sp_series_equal(res, exp) - self.assertEqual(res.dtype, np.int64) + assert res.dtype == np.int64 def test_getitem_fill_value(self): orig = pd.Series([1, np.nan, 0, 3, 0]) sparse = orig.to_sparse(fill_value=0) - self.assertEqual(sparse[0], 1) - self.assertTrue(np.isnan(sparse[1])) - self.assertEqual(sparse[2], 0) - self.assertEqual(sparse[3], 3) + assert sparse[0] == 1 + assert np.isnan(sparse[1]) + assert sparse[2] == 0 + assert sparse[3] == 3 result = sparse[[1, 3, 4]] exp = orig[[1, 3, 4]].to_sparse(fill_value=0) @@ -115,8 +113,8 @@ def test_loc(self): orig = self.orig sparse = self.sparse - self.assertEqual(sparse.loc[0], 1) - self.assertTrue(np.isnan(sparse.loc[1])) + assert sparse.loc[0] == 1 + assert np.isnan(sparse.loc[1]) result = sparse.loc[[1, 3, 4]] exp = orig.loc[[1, 3, 4]].to_sparse() @@ -127,7 +125,7 @@ def test_loc(self): exp = orig.loc[[1, 3, 4, 5]].to_sparse() tm.assert_sp_series_equal(result, exp) # padded with NaN - self.assertTrue(np.isnan(result[-1])) + assert np.isnan(result[-1]) # dense array result = sparse.loc[orig % 2 == 1] @@ -147,8 +145,8 @@ def test_loc_index(self): orig = pd.Series([1, np.nan, np.nan, 3, np.nan], index=list('ABCDE')) sparse = 
orig.to_sparse() - self.assertEqual(sparse.loc['A'], 1) - self.assertTrue(np.isnan(sparse.loc['B'])) + assert sparse.loc['A'] == 1 + assert np.isnan(sparse.loc['B']) result = sparse.loc[['A', 'C', 'D']] exp = orig.loc[['A', 'C', 'D']].to_sparse() @@ -172,8 +170,8 @@ def test_loc_index_fill_value(self): orig = pd.Series([1, np.nan, 0, 3, 0], index=list('ABCDE')) sparse = orig.to_sparse(fill_value=0) - self.assertEqual(sparse.loc['A'], 1) - self.assertTrue(np.isnan(sparse.loc['B'])) + assert sparse.loc['A'] == 1 + assert np.isnan(sparse.loc['B']) result = sparse.loc[['A', 'C', 'D']] exp = orig.loc[['A', 'C', 'D']].to_sparse(fill_value=0) @@ -211,8 +209,8 @@ def test_iloc(self): orig = self.orig sparse = self.sparse - self.assertEqual(sparse.iloc[3], 3) - self.assertTrue(np.isnan(sparse.iloc[2])) + assert sparse.iloc[3] == 3 + assert np.isnan(sparse.iloc[2]) result = sparse.iloc[[1, 3, 4]] exp = orig.iloc[[1, 3, 4]].to_sparse() @@ -222,16 +220,16 @@ def test_iloc(self): exp = orig.iloc[[1, -2, -4]].to_sparse() tm.assert_sp_series_equal(result, exp) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): sparse.iloc[[1, 3, 5]] def test_iloc_fill_value(self): orig = pd.Series([1, np.nan, 0, 3, 0]) sparse = orig.to_sparse(fill_value=0) - self.assertEqual(sparse.iloc[3], 3) - self.assertTrue(np.isnan(sparse.iloc[1])) - self.assertEqual(sparse.iloc[4], 0) + assert sparse.iloc[3] == 3 + assert np.isnan(sparse.iloc[1]) + assert sparse.iloc[4] == 0 result = sparse.iloc[[1, 3, 4]] exp = orig.iloc[[1, 3, 4]].to_sparse(fill_value=0) @@ -251,74 +249,74 @@ def test_iloc_slice_fill_value(self): def test_at(self): orig = pd.Series([1, np.nan, np.nan, 3, np.nan]) sparse = orig.to_sparse() - self.assertEqual(sparse.at[0], orig.at[0]) - self.assertTrue(np.isnan(sparse.at[1])) - self.assertTrue(np.isnan(sparse.at[2])) - self.assertEqual(sparse.at[3], orig.at[3]) - self.assertTrue(np.isnan(sparse.at[4])) + assert sparse.at[0] == orig.at[0] + assert np.isnan(sparse.at[1]) + assert np.isnan(sparse.at[2]) + assert sparse.at[3] == orig.at[3] + assert np.isnan(sparse.at[4]) orig = pd.Series([1, np.nan, np.nan, 3, np.nan], index=list('abcde')) sparse = orig.to_sparse() - self.assertEqual(sparse.at['a'], orig.at['a']) - self.assertTrue(np.isnan(sparse.at['b'])) - self.assertTrue(np.isnan(sparse.at['c'])) - self.assertEqual(sparse.at['d'], orig.at['d']) - self.assertTrue(np.isnan(sparse.at['e'])) + assert sparse.at['a'] == orig.at['a'] + assert np.isnan(sparse.at['b']) + assert np.isnan(sparse.at['c']) + assert sparse.at['d'] == orig.at['d'] + assert np.isnan(sparse.at['e']) def test_at_fill_value(self): orig = pd.Series([1, np.nan, 0, 3, 0], index=list('abcde')) sparse = orig.to_sparse(fill_value=0) - self.assertEqual(sparse.at['a'], orig.at['a']) - self.assertTrue(np.isnan(sparse.at['b'])) - self.assertEqual(sparse.at['c'], orig.at['c']) - self.assertEqual(sparse.at['d'], orig.at['d']) - self.assertEqual(sparse.at['e'], orig.at['e']) + assert sparse.at['a'] == orig.at['a'] + assert np.isnan(sparse.at['b']) + assert sparse.at['c'] == orig.at['c'] + assert sparse.at['d'] == orig.at['d'] + assert sparse.at['e'] == orig.at['e'] def test_iat(self): orig = self.orig sparse = self.sparse - self.assertEqual(sparse.iat[0], orig.iat[0]) - self.assertTrue(np.isnan(sparse.iat[1])) - self.assertTrue(np.isnan(sparse.iat[2])) - self.assertEqual(sparse.iat[3], orig.iat[3]) - self.assertTrue(np.isnan(sparse.iat[4])) + assert sparse.iat[0] == orig.iat[0] + assert np.isnan(sparse.iat[1]) + assert np.isnan(sparse.iat[2]) + 
assert sparse.iat[3] == orig.iat[3] + assert np.isnan(sparse.iat[4]) - self.assertTrue(np.isnan(sparse.iat[-1])) - self.assertEqual(sparse.iat[-5], orig.iat[-5]) + assert np.isnan(sparse.iat[-1]) + assert sparse.iat[-5] == orig.iat[-5] def test_iat_fill_value(self): orig = pd.Series([1, np.nan, 0, 3, 0]) sparse = orig.to_sparse() - self.assertEqual(sparse.iat[0], orig.iat[0]) - self.assertTrue(np.isnan(sparse.iat[1])) - self.assertEqual(sparse.iat[2], orig.iat[2]) - self.assertEqual(sparse.iat[3], orig.iat[3]) - self.assertEqual(sparse.iat[4], orig.iat[4]) + assert sparse.iat[0] == orig.iat[0] + assert np.isnan(sparse.iat[1]) + assert sparse.iat[2] == orig.iat[2] + assert sparse.iat[3] == orig.iat[3] + assert sparse.iat[4] == orig.iat[4] - self.assertEqual(sparse.iat[-1], orig.iat[-1]) - self.assertEqual(sparse.iat[-5], orig.iat[-5]) + assert sparse.iat[-1] == orig.iat[-1] + assert sparse.iat[-5] == orig.iat[-5] def test_get(self): s = pd.SparseSeries([1, np.nan, np.nan, 3, np.nan]) - self.assertEqual(s.get(0), 1) - self.assertTrue(np.isnan(s.get(1))) - self.assertIsNone(s.get(5)) + assert s.get(0) == 1 + assert np.isnan(s.get(1)) + assert s.get(5) is None s = pd.SparseSeries([1, np.nan, 0, 3, 0], index=list('ABCDE')) - self.assertEqual(s.get('A'), 1) - self.assertTrue(np.isnan(s.get('B'))) - self.assertEqual(s.get('C'), 0) - self.assertIsNone(s.get('XX')) + assert s.get('A') == 1 + assert np.isnan(s.get('B')) + assert s.get('C') == 0 + assert s.get('XX') is None s = pd.SparseSeries([1, np.nan, 0, 3, 0], index=list('ABCDE'), fill_value=0) - self.assertEqual(s.get('A'), 1) - self.assertTrue(np.isnan(s.get('B'))) - self.assertEqual(s.get('C'), 0) - self.assertIsNone(s.get('XX')) + assert s.get('A') == 1 + assert np.isnan(s.get('B')) + assert s.get('C') == 0 + assert s.get('XX') is None def test_take(self): orig = pd.Series([1, np.nan, np.nan, 3, np.nan], @@ -368,7 +366,7 @@ def test_reindex(self): exp = orig.reindex(['A', 'E', 'C', 'D']).to_sparse() tm.assert_sp_series_equal(res, exp) - def test_reindex_fill_value(self): + def test_fill_value_reindex(self): orig = pd.Series([1, np.nan, 0, 3, 0], index=list('ABCDE')) sparse = orig.to_sparse(fill_value=0) @@ -399,6 +397,23 @@ def test_reindex_fill_value(self): exp = orig.reindex(['A', 'E', 'C', 'D']).to_sparse(fill_value=0) tm.assert_sp_series_equal(res, exp) + def test_reindex_fill_value(self): + floats = pd.Series([1., 2., 3.]).to_sparse() + result = floats.reindex([1, 2, 3], fill_value=0) + expected = pd.Series([2., 3., 0], index=[1, 2, 3]).to_sparse() + tm.assert_sp_series_equal(result, expected) + + def test_reindex_nearest(self): + s = pd.Series(np.arange(10, dtype='float64')).to_sparse() + target = [0.1, 0.9, 1.5, 2.0] + actual = s.reindex(target, method='nearest') + expected = pd.Series(np.around(target), target).to_sparse() + tm.assert_sp_series_equal(expected, actual) + + actual = s.reindex(target, method='nearest', tolerance=0.2) + expected = pd.Series([0, 1, np.nan, 2], target).to_sparse() + tm.assert_sp_series_equal(expected, actual) + def tests_indexing_with_sparse(self): # GH 13985 @@ -425,15 +440,13 @@ def tests_indexing_with_sparse(self): msg = ("iLocation based boolean indexing cannot use an " "indexable as a mask") - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): s.iloc[indexer] class TestSparseSeriesMultiIndexing(TestSparseSeriesIndexing): - _multiprocess_can_split_ = True - - def setUp(self): + def setup_method(self, method): # Mi with duplicated values idx = 
pd.MultiIndex.from_tuples([('A', 0), ('A', 1), ('B', 0), ('C', 0), ('C', 1)]) @@ -444,9 +457,9 @@ def test_getitem_multi(self): orig = self.orig sparse = self.sparse - self.assertEqual(sparse[0], orig[0]) - self.assertTrue(np.isnan(sparse[1])) - self.assertEqual(sparse[3], orig[3]) + assert sparse[0] == orig[0] + assert np.isnan(sparse[1]) + assert sparse[3] == orig[3] tm.assert_sp_series_equal(sparse['A'], orig['A'].to_sparse()) tm.assert_sp_series_equal(sparse['B'], orig['B'].to_sparse()) @@ -473,9 +486,9 @@ def test_getitem_multi_tuple(self): orig = self.orig sparse = self.sparse - self.assertEqual(sparse['C', 0], orig['C', 0]) - self.assertTrue(np.isnan(sparse['A', 1])) - self.assertTrue(np.isnan(sparse['B', 0])) + assert sparse['C', 0] == orig['C', 0] + assert np.isnan(sparse['A', 1]) + assert np.isnan(sparse['B', 0]) def test_getitems_slice_multi(self): orig = self.orig @@ -508,6 +521,11 @@ def test_loc(self): exp = orig.loc[[1, 3, 4, 5]].to_sparse() tm.assert_sp_series_equal(result, exp) + # single element list (GH 15447) + result = sparse.loc[['A']] + exp = orig.loc[['A']].to_sparse() + tm.assert_sp_series_equal(result, exp) + # dense array result = sparse.loc[orig % 2 == 1] exp = orig.loc[orig % 2 == 1].to_sparse() @@ -526,9 +544,9 @@ def test_loc_multi_tuple(self): orig = self.orig sparse = self.sparse - self.assertEqual(sparse.loc['C', 0], orig.loc['C', 0]) - self.assertTrue(np.isnan(sparse.loc['A', 1])) - self.assertTrue(np.isnan(sparse.loc['B', 0])) + assert sparse.loc['C', 0] == orig.loc['C', 0] + assert np.isnan(sparse.loc['A', 1]) + assert np.isnan(sparse.loc['B', 0]) def test_loc_slice(self): orig = self.orig @@ -541,10 +559,37 @@ def test_loc_slice(self): orig.loc['A':'B'].to_sparse()) tm.assert_sp_series_equal(sparse.loc[:'B'], orig.loc[:'B'].to_sparse()) + def test_reindex(self): + # GH 15447 + orig = self.orig + sparse = self.sparse + + res = sparse.reindex([('A', 0), ('C', 1)]) + exp = orig.reindex([('A', 0), ('C', 1)]).to_sparse() + tm.assert_sp_series_equal(res, exp) + + # On specific level: + res = sparse.reindex(['A', 'C', 'B'], level=0) + exp = orig.reindex(['A', 'C', 'B'], level=0).to_sparse() + tm.assert_sp_series_equal(res, exp) + + # single element list (GH 15447) + res = sparse.reindex(['A'], level=0) + exp = orig.reindex(['A'], level=0).to_sparse() + tm.assert_sp_series_equal(res, exp) + + with pytest.raises(TypeError): + # Incomplete keys are not accepted for reindexing: + sparse.reindex(['A', 'C']) -class TestSparseDataFrameIndexing(tm.TestCase): + # "copy" argument: + res = sparse.reindex(sparse.index, copy=True) + exp = orig.reindex(orig.index, copy=True).to_sparse() + tm.assert_sp_series_equal(res, exp) + assert sparse is not res - _multiprocess_can_split_ = True + +class TestSparseDataFrameIndexing(object): def test_getitem(self): orig = pd.DataFrame([[1, np.nan, np.nan], @@ -600,9 +645,9 @@ def test_loc(self): columns=list('xyz')) sparse = orig.to_sparse() - self.assertEqual(sparse.loc[0, 'x'], 1) - self.assertTrue(np.isnan(sparse.loc[1, 'z'])) - self.assertEqual(sparse.loc[2, 'z'], 4) + assert sparse.loc[0, 'x'] == 1 + assert np.isnan(sparse.loc[1, 'z']) + assert sparse.loc[2, 'z'] == 4 tm.assert_sp_series_equal(sparse.loc[0], orig.loc[0].to_sparse()) tm.assert_sp_series_equal(sparse.loc[1], orig.loc[1].to_sparse()) @@ -657,9 +702,9 @@ def test_loc_index(self): index=list('abc'), columns=list('xyz')) sparse = orig.to_sparse() - self.assertEqual(sparse.loc['a', 'x'], 1) - self.assertTrue(np.isnan(sparse.loc['b', 'z'])) - 
self.assertEqual(sparse.loc['c', 'z'], 4) + assert sparse.loc['a', 'x'] == 1 + assert np.isnan(sparse.loc['b', 'z']) + assert sparse.loc['c', 'z'] == 4 tm.assert_sp_series_equal(sparse.loc['a'], orig.loc['a'].to_sparse()) tm.assert_sp_series_equal(sparse.loc['b'], orig.loc['b'].to_sparse()) @@ -717,8 +762,8 @@ def test_iloc(self): [np.nan, np.nan, 4]]) sparse = orig.to_sparse() - self.assertEqual(sparse.iloc[1, 1], 3) - self.assertTrue(np.isnan(sparse.iloc[2, 0])) + assert sparse.iloc[1, 1] == 3 + assert np.isnan(sparse.iloc[2, 0]) tm.assert_sp_series_equal(sparse.iloc[0], orig.loc[0].to_sparse()) tm.assert_sp_series_equal(sparse.iloc[1], orig.loc[1].to_sparse()) @@ -747,7 +792,7 @@ def test_iloc(self): exp = orig.iloc[[2], [1, 0]].to_sparse() tm.assert_sp_frame_equal(result, exp) - with tm.assertRaises(IndexError): + with pytest.raises(IndexError): sparse.iloc[[1, 3, 5]] def test_iloc_slice(self): @@ -765,10 +810,10 @@ def test_at(self): [0, np.nan, 5]], index=list('ABCD'), columns=list('xyz')) sparse = orig.to_sparse() - self.assertEqual(sparse.at['A', 'x'], orig.at['A', 'x']) - self.assertTrue(np.isnan(sparse.at['B', 'z'])) - self.assertTrue(np.isnan(sparse.at['C', 'y'])) - self.assertEqual(sparse.at['D', 'x'], orig.at['D', 'x']) + assert sparse.at['A', 'x'] == orig.at['A', 'x'] + assert np.isnan(sparse.at['B', 'z']) + assert np.isnan(sparse.at['C', 'y']) + assert sparse.at['D', 'x'] == orig.at['D', 'x'] def test_at_fill_value(self): orig = pd.DataFrame([[1, np.nan, 0], @@ -777,10 +822,10 @@ def test_at_fill_value(self): [0, np.nan, 5]], index=list('ABCD'), columns=list('xyz')) sparse = orig.to_sparse(fill_value=0) - self.assertEqual(sparse.at['A', 'x'], orig.at['A', 'x']) - self.assertTrue(np.isnan(sparse.at['B', 'z'])) - self.assertTrue(np.isnan(sparse.at['C', 'y'])) - self.assertEqual(sparse.at['D', 'x'], orig.at['D', 'x']) + assert sparse.at['A', 'x'] == orig.at['A', 'x'] + assert np.isnan(sparse.at['B', 'z']) + assert np.isnan(sparse.at['C', 'y']) + assert sparse.at['D', 'x'] == orig.at['D', 'x'] def test_iat(self): orig = pd.DataFrame([[1, np.nan, 0], @@ -789,13 +834,13 @@ def test_iat(self): [0, np.nan, 5]], index=list('ABCD'), columns=list('xyz')) sparse = orig.to_sparse() - self.assertEqual(sparse.iat[0, 0], orig.iat[0, 0]) - self.assertTrue(np.isnan(sparse.iat[1, 2])) - self.assertTrue(np.isnan(sparse.iat[2, 1])) - self.assertEqual(sparse.iat[2, 0], orig.iat[2, 0]) + assert sparse.iat[0, 0] == orig.iat[0, 0] + assert np.isnan(sparse.iat[1, 2]) + assert np.isnan(sparse.iat[2, 1]) + assert sparse.iat[2, 0] == orig.iat[2, 0] - self.assertTrue(np.isnan(sparse.iat[-1, -2])) - self.assertEqual(sparse.iat[-1, -1], orig.iat[-1, -1]) + assert np.isnan(sparse.iat[-1, -2]) + assert sparse.iat[-1, -1] == orig.iat[-1, -1] def test_iat_fill_value(self): orig = pd.DataFrame([[1, np.nan, 0], @@ -804,13 +849,13 @@ def test_iat_fill_value(self): [0, np.nan, 5]], index=list('ABCD'), columns=list('xyz')) sparse = orig.to_sparse(fill_value=0) - self.assertEqual(sparse.iat[0, 0], orig.iat[0, 0]) - self.assertTrue(np.isnan(sparse.iat[1, 2])) - self.assertTrue(np.isnan(sparse.iat[2, 1])) - self.assertEqual(sparse.iat[2, 0], orig.iat[2, 0]) + assert sparse.iat[0, 0] == orig.iat[0, 0] + assert np.isnan(sparse.iat[1, 2]) + assert np.isnan(sparse.iat[2, 1]) + assert sparse.iat[2, 0] == orig.iat[2, 0] - self.assertTrue(np.isnan(sparse.iat[-1, -2])) - self.assertEqual(sparse.iat[-1, -1], orig.iat[-1, -1]) + assert np.isnan(sparse.iat[-1, -2]) + assert sparse.iat[-1, -1] == orig.iat[-1, -1] def 
test_take(self): orig = pd.DataFrame([[1, np.nan, 0], @@ -907,8 +952,9 @@ def test_reindex_fill_value(self): tm.assert_sp_frame_equal(res, exp) -class TestMultitype(tm.TestCase): - def setUp(self): +class TestMultitype(object): + + def setup_method(self, method): self.cols = ['string', 'int', 'float', 'object'] self.string_series = pd.SparseSeries(['a', 'b', 'c']) @@ -926,7 +972,7 @@ def setUp(self): def test_frame_basic_dtypes(self): for _, row in self.sdf.iterrows(): - self.assertEqual(row.dtype, object) + assert row.dtype == object tm.assert_sp_series_equal(self.sdf['string'], self.string_series, check_names=False) tm.assert_sp_series_equal(self.sdf['int'], self.int_series, @@ -968,13 +1014,14 @@ def test_frame_indexing_multiple(self): def test_series_indexing_single(self): for i, idx in enumerate(self.cols): - self.assertEqual(self.ss.iloc[i], self.ss[idx]) - self.assertEqual(type(self.ss.iloc[i]), - type(self.ss[idx])) - self.assertEqual(self.ss['string'], 'a') - self.assertEqual(self.ss['int'], 1) - self.assertEqual(self.ss['float'], 1.1) - self.assertEqual(self.ss['object'], []) + assert self.ss.iloc[i] == self.ss[idx] + tm.assert_class_equal(self.ss.iloc[i], self.ss[idx], + obj="series index") + + assert self.ss['string'] == 'a' + assert self.ss['int'] == 1 + assert self.ss['float'] == 1.1 + assert self.ss['object'] == [] def test_series_indexing_multiple(self): tm.assert_sp_series_equal(self.ss.loc[['string', 'int']], diff --git a/pandas/sparse/tests/test_libsparse.py b/pandas/tests/sparse/test_libsparse.py similarity index 77% rename from pandas/sparse/tests/test_libsparse.py rename to pandas/tests/sparse/test_libsparse.py index c289b4a1b204f..4842ebdd103c4 100644 --- a/pandas/sparse/tests/test_libsparse.py +++ b/pandas/tests/sparse/test_libsparse.py @@ -1,14 +1,14 @@ from pandas import Series -import nose # noqa +import pytest import numpy as np import operator import pandas.util.testing as tm from pandas import compat -from pandas.sparse.array import IntIndex, BlockIndex, _make_index -import pandas._sparse as splib +from pandas.core.sparse.array import IntIndex, BlockIndex, _make_index +import pandas._libs.sparse as splib TEST_LENGTH = 20 @@ -42,7 +42,7 @@ def _check_case_dict(case): _check_case([], [], [], [], [], []) -class TestSparseIndexUnion(tm.TestCase): +class TestSparseIndexUnion(object): def test_index_make_union(self): def _check_case(xloc, xlen, yloc, ylen, eloc, elen): @@ -162,33 +162,33 @@ def test_intindex_make_union(self): b = IntIndex(5, np.array([0, 2], dtype=np.int32)) res = a.make_union(b) exp = IntIndex(5, np.array([0, 2, 3, 4], np.int32)) - self.assertTrue(res.equals(exp)) + assert res.equals(exp) a = IntIndex(5, np.array([], dtype=np.int32)) b = IntIndex(5, np.array([0, 2], dtype=np.int32)) res = a.make_union(b) exp = IntIndex(5, np.array([0, 2], np.int32)) - self.assertTrue(res.equals(exp)) + assert res.equals(exp) a = IntIndex(5, np.array([], dtype=np.int32)) b = IntIndex(5, np.array([], dtype=np.int32)) res = a.make_union(b) exp = IntIndex(5, np.array([], np.int32)) - self.assertTrue(res.equals(exp)) + assert res.equals(exp) a = IntIndex(5, np.array([0, 1, 2, 3, 4], dtype=np.int32)) b = IntIndex(5, np.array([0, 1, 2, 3, 4], dtype=np.int32)) res = a.make_union(b) exp = IntIndex(5, np.array([0, 1, 2, 3, 4], np.int32)) - self.assertTrue(res.equals(exp)) + assert res.equals(exp) a = IntIndex(5, np.array([0, 1], dtype=np.int32)) b = IntIndex(4, np.array([0, 1], dtype=np.int32)) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): 
a.make_union(b) -class TestSparseIndexIntersect(tm.TestCase): +class TestSparseIndexIntersect(object): def test_intersect(self): def _check_correct(a, b, expected): @@ -196,7 +196,7 @@ def _check_correct(a, b, expected): assert (result.equals(expected)) def _check_length_exc(a, longer): - nose.tools.assert_raises(Exception, a.intersect, longer) + pytest.raises(Exception, a.intersect, longer) def _check_case(xloc, xlen, yloc, ylen, eloc, elen): xindex = BlockIndex(TEST_LENGTH, xloc, xlen) @@ -213,19 +213,19 @@ def _check_case(xloc, xlen, yloc, ylen, eloc, elen): longer_index.to_int_index()) if compat.is_platform_windows(): - raise nose.SkipTest("segfaults on win-64 when all tests are run") + pytest.skip("segfaults on win-64 when all tests are run") check_cases(_check_case) def test_intersect_empty(self): xindex = IntIndex(4, np.array([], dtype=np.int32)) yindex = IntIndex(4, np.array([2, 3], dtype=np.int32)) - self.assertTrue(xindex.intersect(yindex).equals(xindex)) - self.assertTrue(yindex.intersect(xindex).equals(xindex)) + assert xindex.intersect(yindex).equals(xindex) + assert yindex.intersect(xindex).equals(xindex) xindex = xindex.to_block_index() yindex = yindex.to_block_index() - self.assertTrue(xindex.intersect(yindex).equals(xindex)) - self.assertTrue(yindex.intersect(xindex).equals(xindex)) + assert xindex.intersect(yindex).equals(xindex) + assert yindex.intersect(xindex).equals(xindex) def test_intersect_identical(self): cases = [IntIndex(5, np.array([1, 2], dtype=np.int32)), @@ -234,47 +234,45 @@ def test_intersect_identical(self): IntIndex(5, np.array([], dtype=np.int32))] for case in cases: - self.assertTrue(case.intersect(case).equals(case)) + assert case.intersect(case).equals(case) case = case.to_block_index() - self.assertTrue(case.intersect(case).equals(case)) + assert case.intersect(case).equals(case) -class TestSparseIndexCommon(tm.TestCase): - - _multiprocess_can_split_ = True +class TestSparseIndexCommon(object): def test_int_internal(self): idx = _make_index(4, np.array([2, 3], dtype=np.int32), kind='integer') - self.assertIsInstance(idx, IntIndex) - self.assertEqual(idx.npoints, 2) + assert isinstance(idx, IntIndex) + assert idx.npoints == 2 tm.assert_numpy_array_equal(idx.indices, np.array([2, 3], dtype=np.int32)) idx = _make_index(4, np.array([], dtype=np.int32), kind='integer') - self.assertIsInstance(idx, IntIndex) - self.assertEqual(idx.npoints, 0) + assert isinstance(idx, IntIndex) + assert idx.npoints == 0 tm.assert_numpy_array_equal(idx.indices, np.array([], dtype=np.int32)) idx = _make_index(4, np.array([0, 1, 2, 3], dtype=np.int32), kind='integer') - self.assertIsInstance(idx, IntIndex) - self.assertEqual(idx.npoints, 4) + assert isinstance(idx, IntIndex) + assert idx.npoints == 4 tm.assert_numpy_array_equal(idx.indices, np.array([0, 1, 2, 3], dtype=np.int32)) def test_block_internal(self): idx = _make_index(4, np.array([2, 3], dtype=np.int32), kind='block') - self.assertIsInstance(idx, BlockIndex) - self.assertEqual(idx.npoints, 2) + assert isinstance(idx, BlockIndex) + assert idx.npoints == 2 tm.assert_numpy_array_equal(idx.blocs, np.array([2], dtype=np.int32)) tm.assert_numpy_array_equal(idx.blengths, np.array([2], dtype=np.int32)) idx = _make_index(4, np.array([], dtype=np.int32), kind='block') - self.assertIsInstance(idx, BlockIndex) - self.assertEqual(idx.npoints, 0) + assert isinstance(idx, BlockIndex) + assert idx.npoints == 0 tm.assert_numpy_array_equal(idx.blocs, np.array([], dtype=np.int32)) tm.assert_numpy_array_equal(idx.blengths, @@ -282,8 
+280,8 @@ def test_block_internal(self): idx = _make_index(4, np.array([0, 1, 2, 3], dtype=np.int32), kind='block') - self.assertIsInstance(idx, BlockIndex) - self.assertEqual(idx.npoints, 4) + assert isinstance(idx, BlockIndex) + assert idx.npoints == 4 tm.assert_numpy_array_equal(idx.blocs, np.array([0], dtype=np.int32)) tm.assert_numpy_array_equal(idx.blengths, @@ -291,8 +289,8 @@ def test_block_internal(self): idx = _make_index(4, np.array([0, 2, 3], dtype=np.int32), kind='block') - self.assertIsInstance(idx, BlockIndex) - self.assertEqual(idx.npoints, 3) + assert isinstance(idx, BlockIndex) + assert idx.npoints == 3 tm.assert_numpy_array_equal(idx.blocs, np.array([0, 2], dtype=np.int32)) tm.assert_numpy_array_equal(idx.blengths, @@ -301,35 +299,35 @@ def test_block_internal(self): def test_lookup(self): for kind in ['integer', 'block']: idx = _make_index(4, np.array([2, 3], dtype=np.int32), kind=kind) - self.assertEqual(idx.lookup(-1), -1) - self.assertEqual(idx.lookup(0), -1) - self.assertEqual(idx.lookup(1), -1) - self.assertEqual(idx.lookup(2), 0) - self.assertEqual(idx.lookup(3), 1) - self.assertEqual(idx.lookup(4), -1) + assert idx.lookup(-1) == -1 + assert idx.lookup(0) == -1 + assert idx.lookup(1) == -1 + assert idx.lookup(2) == 0 + assert idx.lookup(3) == 1 + assert idx.lookup(4) == -1 idx = _make_index(4, np.array([], dtype=np.int32), kind=kind) for i in range(-1, 5): - self.assertEqual(idx.lookup(i), -1) + assert idx.lookup(i) == -1 idx = _make_index(4, np.array([0, 1, 2, 3], dtype=np.int32), kind=kind) - self.assertEqual(idx.lookup(-1), -1) - self.assertEqual(idx.lookup(0), 0) - self.assertEqual(idx.lookup(1), 1) - self.assertEqual(idx.lookup(2), 2) - self.assertEqual(idx.lookup(3), 3) - self.assertEqual(idx.lookup(4), -1) + assert idx.lookup(-1) == -1 + assert idx.lookup(0) == 0 + assert idx.lookup(1) == 1 + assert idx.lookup(2) == 2 + assert idx.lookup(3) == 3 + assert idx.lookup(4) == -1 idx = _make_index(4, np.array([0, 2, 3], dtype=np.int32), kind=kind) - self.assertEqual(idx.lookup(-1), -1) - self.assertEqual(idx.lookup(0), 0) - self.assertEqual(idx.lookup(1), -1) - self.assertEqual(idx.lookup(2), 1) - self.assertEqual(idx.lookup(3), 2) - self.assertEqual(idx.lookup(4), -1) + assert idx.lookup(-1) == -1 + assert idx.lookup(0) == 0 + assert idx.lookup(1) == -1 + assert idx.lookup(2) == 1 + assert idx.lookup(3) == 2 + assert idx.lookup(4) == -1 def test_lookup_array(self): for kind in ['integer', 'block']: @@ -337,11 +335,11 @@ def test_lookup_array(self): res = idx.lookup_array(np.array([-1, 0, 2], dtype=np.int32)) exp = np.array([-1, -1, 0], dtype=np.int32) - self.assert_numpy_array_equal(res, exp) + tm.assert_numpy_array_equal(res, exp) res = idx.lookup_array(np.array([4, 2, 1, 3], dtype=np.int32)) exp = np.array([-1, 0, -1, 1], dtype=np.int32) - self.assert_numpy_array_equal(res, exp) + tm.assert_numpy_array_equal(res, exp) idx = _make_index(4, np.array([], dtype=np.int32), kind=kind) res = idx.lookup_array(np.array([-1, 0, 2, 4], dtype=np.int32)) @@ -351,21 +349,21 @@ def test_lookup_array(self): kind=kind) res = idx.lookup_array(np.array([-1, 0, 2], dtype=np.int32)) exp = np.array([-1, 0, 2], dtype=np.int32) - self.assert_numpy_array_equal(res, exp) + tm.assert_numpy_array_equal(res, exp) res = idx.lookup_array(np.array([4, 2, 1, 3], dtype=np.int32)) exp = np.array([-1, 2, 1, 3], dtype=np.int32) - self.assert_numpy_array_equal(res, exp) + tm.assert_numpy_array_equal(res, exp) idx = _make_index(4, np.array([0, 2, 3], dtype=np.int32), kind=kind) res = 
idx.lookup_array(np.array([2, 1, 3, 0], dtype=np.int32)) exp = np.array([1, -1, 2, 0], dtype=np.int32) - self.assert_numpy_array_equal(res, exp) + tm.assert_numpy_array_equal(res, exp) res = idx.lookup_array(np.array([1, 4, 2, 5], dtype=np.int32)) exp = np.array([-1, -1, 1, -1], dtype=np.int32) - self.assert_numpy_array_equal(res, exp) + tm.assert_numpy_array_equal(res, exp) def test_lookup_basics(self): def _check(index): @@ -389,22 +387,20 @@ def _check(index): # corner cases -class TestBlockIndex(tm.TestCase): - - _multiprocess_can_split_ = True +class TestBlockIndex(object): def test_block_internal(self): idx = _make_index(4, np.array([2, 3], dtype=np.int32), kind='block') - self.assertIsInstance(idx, BlockIndex) - self.assertEqual(idx.npoints, 2) + assert isinstance(idx, BlockIndex) + assert idx.npoints == 2 tm.assert_numpy_array_equal(idx.blocs, np.array([2], dtype=np.int32)) tm.assert_numpy_array_equal(idx.blengths, np.array([2], dtype=np.int32)) idx = _make_index(4, np.array([], dtype=np.int32), kind='block') - self.assertIsInstance(idx, BlockIndex) - self.assertEqual(idx.npoints, 0) + assert isinstance(idx, BlockIndex) + assert idx.npoints == 0 tm.assert_numpy_array_equal(idx.blocs, np.array([], dtype=np.int32)) tm.assert_numpy_array_equal(idx.blengths, @@ -412,16 +408,16 @@ def test_block_internal(self): idx = _make_index(4, np.array([0, 1, 2, 3], dtype=np.int32), kind='block') - self.assertIsInstance(idx, BlockIndex) - self.assertEqual(idx.npoints, 4) + assert isinstance(idx, BlockIndex) + assert idx.npoints == 4 tm.assert_numpy_array_equal(idx.blocs, np.array([0], dtype=np.int32)) tm.assert_numpy_array_equal(idx.blengths, np.array([4], dtype=np.int32)) idx = _make_index(4, np.array([0, 2, 3], dtype=np.int32), kind='block') - self.assertIsInstance(idx, BlockIndex) - self.assertEqual(idx.npoints, 3) + assert isinstance(idx, BlockIndex) + assert idx.npoints == 3 tm.assert_numpy_array_equal(idx.blocs, np.array([0, 2], dtype=np.int32)) tm.assert_numpy_array_equal(idx.blengths, @@ -440,8 +436,8 @@ def test_make_block_boundary(self): def test_equals(self): index = BlockIndex(10, [0, 4], [2, 5]) - self.assertTrue(index.equals(index)) - self.assertFalse(index.equals(BlockIndex(10, [0, 4], [2, 6]))) + assert index.equals(index) + assert not index.equals(BlockIndex(10, [0, 4], [2, 6])) def test_check_integrity(self): locs = [] @@ -455,10 +451,10 @@ def test_check_integrity(self): index = BlockIndex(1, locs, lengths) # noqa # block extend beyond end - self.assertRaises(Exception, BlockIndex, 10, [5], [10]) + pytest.raises(Exception, BlockIndex, 10, [5], [10]) # block overlap - self.assertRaises(Exception, BlockIndex, 10, [2, 5], [5, 3]) + pytest.raises(Exception, BlockIndex, 10, [2, 5], [5, 3]) def test_to_int_index(self): locs = [0, 10] @@ -473,37 +469,73 @@ def test_to_int_index(self): def test_to_block_index(self): index = BlockIndex(10, [0, 5], [4, 5]) - self.assertIs(index.to_block_index(), index) + assert index.to_block_index() is index + + +class TestIntIndex(object): + + def test_check_integrity(self): + + # Too many indices than specified in self.length + msg = "Too many indices" + + with tm.assert_raises_regex(ValueError, msg): + IntIndex(length=1, indices=[1, 2, 3]) + + # No index can be negative. + msg = "No index can be less than zero" + + with tm.assert_raises_regex(ValueError, msg): + IntIndex(length=5, indices=[1, -2, 3]) + # No index can be negative. 
+ msg = "No index can be less than zero" -class TestIntIndex(tm.TestCase): + with tm.assert_raises_regex(ValueError, msg): + IntIndex(length=5, indices=[1, -2, 3]) - _multiprocess_can_split_ = True + # All indices must be less than the length. + msg = "All indices must be less than the length" + + with tm.assert_raises_regex(ValueError, msg): + IntIndex(length=5, indices=[1, 2, 5]) + + with tm.assert_raises_regex(ValueError, msg): + IntIndex(length=5, indices=[1, 2, 6]) + + # Indices must be strictly ascending. + msg = "Indices must be strictly increasing" + + with tm.assert_raises_regex(ValueError, msg): + IntIndex(length=5, indices=[1, 3, 2]) + + with tm.assert_raises_regex(ValueError, msg): + IntIndex(length=5, indices=[1, 3, 3]) def test_int_internal(self): idx = _make_index(4, np.array([2, 3], dtype=np.int32), kind='integer') - self.assertIsInstance(idx, IntIndex) - self.assertEqual(idx.npoints, 2) + assert isinstance(idx, IntIndex) + assert idx.npoints == 2 tm.assert_numpy_array_equal(idx.indices, np.array([2, 3], dtype=np.int32)) idx = _make_index(4, np.array([], dtype=np.int32), kind='integer') - self.assertIsInstance(idx, IntIndex) - self.assertEqual(idx.npoints, 0) + assert isinstance(idx, IntIndex) + assert idx.npoints == 0 tm.assert_numpy_array_equal(idx.indices, np.array([], dtype=np.int32)) idx = _make_index(4, np.array([0, 1, 2, 3], dtype=np.int32), kind='integer') - self.assertIsInstance(idx, IntIndex) - self.assertEqual(idx.npoints, 4) + assert isinstance(idx, IntIndex) + assert idx.npoints == 4 tm.assert_numpy_array_equal(idx.indices, np.array([0, 1, 2, 3], dtype=np.int32)) def test_equals(self): index = IntIndex(10, [0, 1, 2, 3, 4]) - self.assertTrue(index.equals(index)) - self.assertFalse(index.equals(IntIndex(10, [0, 1, 2, 3]))) + assert index.equals(index) + assert not index.equals(IntIndex(10, [0, 1, 2, 3])) def test_to_block_index(self): @@ -514,18 +546,18 @@ def _check_case(xloc, xlen, yloc, ylen, eloc, elen): # see if survive the round trip xbindex = xindex.to_int_index().to_block_index() ybindex = yindex.to_int_index().to_block_index() - tm.assertIsInstance(xbindex, BlockIndex) - self.assertTrue(xbindex.equals(xindex)) - self.assertTrue(ybindex.equals(yindex)) + assert isinstance(xbindex, BlockIndex) + assert xbindex.equals(xindex) + assert ybindex.equals(yindex) check_cases(_check_case) def test_to_int_index(self): index = IntIndex(10, [2, 3, 4, 5, 6]) - self.assertIs(index.to_int_index(), index) + assert index.to_int_index() is index -class TestSparseOperators(tm.TestCase): +class TestSparseOperators(object): def _op_tests(self, sparse_op, python_op): def _check_case(xloc, xlen, yloc, ylen, eloc, elen): @@ -546,9 +578,9 @@ def _check_case(xloc, xlen, yloc, ylen, eloc, elen): result_int_vals, ri_index, ifill = sparse_op(x, xdindex, xfill, y, ydindex, yfill) - self.assertTrue(rb_index.to_int_index().equals(ri_index)) + assert rb_index.to_int_index().equals(ri_index) tm.assert_numpy_array_equal(result_block_vals, result_int_vals) - self.assertEqual(bfill, ifill) + assert bfill == ifill # check versus Series... xseries = Series(x, xdindex.indices) @@ -566,8 +598,8 @@ def _check_case(xloc, xlen, yloc, ylen, eloc, elen): check_cases(_check_case) -# too cute? oh but how I abhor code duplication +# too cute? 
oh but how I abhor code duplication check_ops = ['add', 'sub', 'mul', 'truediv', 'floordiv'] @@ -585,9 +617,3 @@ def f(self): g = make_optestf(op) setattr(TestSparseOperators, g.__name__, g) del g - - -if __name__ == '__main__': - import nose # noqa - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/sparse/tests/test_list.py b/pandas/tests/sparse/test_list.py similarity index 83% rename from pandas/sparse/tests/test_list.py rename to pandas/tests/sparse/test_list.py index b117685b6e968..6c721ca813a21 100644 --- a/pandas/sparse/tests/test_list.py +++ b/pandas/tests/sparse/test_list.py @@ -1,18 +1,15 @@ from pandas.compat import range -import unittest from numpy import nan import numpy as np -from pandas.sparse.api import SparseList, SparseArray +from pandas.core.sparse.api import SparseList, SparseArray import pandas.util.testing as tm -class TestSparseList(unittest.TestCase): +class TestSparseList(object): - _multiprocess_can_split_ = True - - def setUp(self): + def setup_method(self, method): self.na_data = np.array([nan, nan, 1, 2, 3, nan, 4, 5, nan, 6]) self.zero_data = np.array([0, 0, 1, 2, 3, 0, 4, 5, 0, 6]) @@ -35,11 +32,11 @@ def test_len(self): arr = self.na_data splist = SparseList() splist.append(arr[:5]) - self.assertEqual(len(splist), 5) + assert len(splist) == 5 splist.append(arr[5]) - self.assertEqual(len(splist), 6) + assert len(splist) == 6 splist.append(arr[6:]) - self.assertEqual(len(splist), 10) + assert len(splist) == 10 def test_append_na(self): with tm.assert_produces_warning(FutureWarning): @@ -78,12 +75,12 @@ def test_consolidate(self): splist.append(arr[6:]) consol = splist.consolidate(inplace=False) - self.assertEqual(consol.nchunks, 1) - self.assertEqual(splist.nchunks, 3) + assert consol.nchunks == 1 + assert splist.nchunks == 3 tm.assert_sp_array_equal(consol.to_array(), exp_sparr) splist.consolidate() - self.assertEqual(splist.nchunks, 1) + assert splist.nchunks == 1 tm.assert_sp_array_equal(splist.to_array(), exp_sparr) def test_copy(self): @@ -98,7 +95,7 @@ def test_copy(self): cp = splist.copy() cp.append(arr[6:]) - self.assertEqual(splist.nchunks, 2) + assert splist.nchunks == 2 tm.assert_sp_array_equal(cp.to_array(), exp_sparr) def test_getitem(self): @@ -112,9 +109,3 @@ def test_getitem(self): for i in range(len(arr)): tm.assert_almost_equal(splist[i], arr[i]) tm.assert_almost_equal(splist[-i], arr[-i]) - - -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/sparse/tests/test_pivot.py b/pandas/tests/sparse/test_pivot.py similarity index 96% rename from pandas/sparse/tests/test_pivot.py rename to pandas/tests/sparse/test_pivot.py index 482a99a96194f..e7eba63e4e0b3 100644 --- a/pandas/sparse/tests/test_pivot.py +++ b/pandas/tests/sparse/test_pivot.py @@ -3,11 +3,9 @@ import pandas.util.testing as tm -class TestPivotTable(tm.TestCase): +class TestPivotTable(object): - _multiprocess_can_split_ = True - - def setUp(self): + def setup_method(self, method): self.dense = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], 'B': ['one', 'one', 'two', 'three', diff --git a/pandas/sparse/tests/test_series.py b/pandas/tests/sparse/test_series.py similarity index 85% rename from pandas/sparse/tests/test_series.py rename to pandas/tests/sparse/test_series.py index 06d76bdd4dd3d..b44314d4e733b 100644 --- a/pandas/sparse/tests/test_series.py +++ b/pandas/tests/sparse/test_series.py @@ -1,24 
+1,25 @@ # pylint: disable-msg=E1101,W0612 import operator +from datetime import datetime + +import pytest from numpy import nan import numpy as np import pandas as pd -from pandas import Series, DataFrame, bdate_range -from pandas.core.common import isnull +from pandas import Series, DataFrame, bdate_range, isna, compat from pandas.tseries.offsets import BDay import pandas.util.testing as tm from pandas.compat import range -from pandas import compat -from pandas.tools.util import cartesian_product +from pandas.core.reshape.util import cartesian_product -import pandas.sparse.frame as spf +import pandas.core.sparse.frame as spf -from pandas._sparse import BlockIndex, IntIndex -from pandas.sparse.api import SparseSeries -from pandas.tests.series.test_misc_api import SharedWithSparse +from pandas._libs.sparse import BlockIndex, IntIndex +from pandas.core.sparse.api import SparseSeries +from pandas.tests.series.test_api import SharedWithSparse def _test_data1(): @@ -55,10 +56,13 @@ def _test_data2_zero(): return arr, index -class TestSparseSeries(tm.TestCase, SharedWithSparse): - _multiprocess_can_split_ = True +class TestSparseSeries(SharedWithSparse): + + series_klass = SparseSeries + # SharedWithSparse tests use generic, series_klass-agnostic assertion + _assert_series_equal = staticmethod(tm.assert_sp_series_equal) - def setUp(self): + def setup_method(self, method): arr, index = _test_data1() date_index = bdate_range('1/1/2011', periods=len(index)) @@ -88,37 +92,49 @@ def setUp(self): self.ziseries2 = SparseSeries(arr, index=index, kind='integer', fill_value=0) + def test_constructor_dict_input(self): + # gh-16905 + constructor_dict = {1: 1.} + index = [0, 1, 2] + + # Series with index passed in + series = pd.Series(constructor_dict) + expected = SparseSeries(series, index=index) + + result = SparseSeries(constructor_dict, index=index) + tm.assert_sp_series_equal(result, expected) + + # Series with index and dictionary with no index + expected = SparseSeries(series) + + result = SparseSeries(constructor_dict) + tm.assert_sp_series_equal(result, expected) + def test_constructor_dtype(self): arr = SparseSeries([np.nan, 1, 2, np.nan]) - self.assertEqual(arr.dtype, np.float64) - self.assertTrue(np.isnan(arr.fill_value)) + assert arr.dtype == np.float64 + assert np.isnan(arr.fill_value) arr = SparseSeries([np.nan, 1, 2, np.nan], fill_value=0) - self.assertEqual(arr.dtype, np.float64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.float64 + assert arr.fill_value == 0 arr = SparseSeries([0, 1, 2, 4], dtype=np.int64, fill_value=np.nan) - self.assertEqual(arr.dtype, np.int64) - self.assertTrue(np.isnan(arr.fill_value)) + assert arr.dtype == np.int64 + assert np.isnan(arr.fill_value) arr = SparseSeries([0, 1, 2, 4], dtype=np.int64) - self.assertEqual(arr.dtype, np.int64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.int64 + assert arr.fill_value == 0 arr = SparseSeries([0, 1, 2, 4], fill_value=0, dtype=np.int64) - self.assertEqual(arr.dtype, np.int64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.int64 + assert arr.fill_value == 0 def test_iteration_and_str(self): [x for x in self.bseries] str(self.bseries) - def test_TimeSeries_deprecation(self): - - # deprecation TimeSeries, #10890 - with tm.assert_produces_warning(FutureWarning): - pd.SparseTimeSeries(1, index=pd.date_range('20130101', periods=3)) - def test_construct_DataFrame_with_sp_series(self): # it works! 
df = DataFrame({'col': self.bseries}) @@ -141,12 +157,12 @@ def test_construct_DataFrame_with_sp_series(self): def test_constructor_preserve_attr(self): arr = pd.SparseArray([1, 0, 3, 0], dtype=np.int64, fill_value=0) - self.assertEqual(arr.dtype, np.int64) - self.assertEqual(arr.fill_value, 0) + assert arr.dtype == np.int64 + assert arr.fill_value == 0 s = pd.SparseSeries(arr, name='x') - self.assertEqual(s.dtype, np.int64) - self.assertEqual(s.fill_value, 0) + assert s.dtype == np.int64 + assert s.fill_value == 0 def test_series_density(self): # GH2803 @@ -154,7 +170,7 @@ def test_series_density(self): ts[2:-2] = nan sts = ts.to_sparse() density = sts.density # don't die - self.assertEqual(density, 4 / 10.0) + assert density == 4 / 10.0 def test_sparse_to_dense(self): arr, index = _test_data1() @@ -209,12 +225,12 @@ def test_dense_to_sparse(self): iseries = series.to_sparse(kind='integer') tm.assert_sp_series_equal(bseries, self.bseries) tm.assert_sp_series_equal(iseries, self.iseries, check_names=False) - self.assertEqual(iseries.name, self.bseries.name) + assert iseries.name == self.bseries.name - self.assertEqual(len(series), len(bseries)) - self.assertEqual(len(series), len(iseries)) - self.assertEqual(series.shape, bseries.shape) - self.assertEqual(series.shape, iseries.shape) + assert len(series) == len(bseries) + assert len(series) == len(iseries) + assert series.shape == bseries.shape + assert series.shape == iseries.shape # non-NaN fill value series = self.zbseries.to_dense() @@ -222,26 +238,26 @@ def test_dense_to_sparse(self): ziseries = series.to_sparse(kind='integer', fill_value=0) tm.assert_sp_series_equal(zbseries, self.zbseries) tm.assert_sp_series_equal(ziseries, self.ziseries, check_names=False) - self.assertEqual(ziseries.name, self.zbseries.name) + assert ziseries.name == self.zbseries.name - self.assertEqual(len(series), len(zbseries)) - self.assertEqual(len(series), len(ziseries)) - self.assertEqual(series.shape, zbseries.shape) - self.assertEqual(series.shape, ziseries.shape) + assert len(series) == len(zbseries) + assert len(series) == len(ziseries) + assert series.shape == zbseries.shape + assert series.shape == ziseries.shape def test_to_dense_preserve_name(self): assert (self.bseries.name is not None) result = self.bseries.to_dense() - self.assertEqual(result.name, self.bseries.name) + assert result.name == self.bseries.name def test_constructor(self): # test setup guys - self.assertTrue(np.isnan(self.bseries.fill_value)) - tm.assertIsInstance(self.bseries.sp_index, BlockIndex) - self.assertTrue(np.isnan(self.iseries.fill_value)) - tm.assertIsInstance(self.iseries.sp_index, IntIndex) + assert np.isnan(self.bseries.fill_value) + assert isinstance(self.bseries.sp_index, BlockIndex) + assert np.isnan(self.iseries.fill_value) + assert isinstance(self.iseries.sp_index, IntIndex) - self.assertEqual(self.zbseries.fill_value, 0) + assert self.zbseries.fill_value == 0 tm.assert_numpy_array_equal(self.zbseries.values.values, self.bseries.to_dense().fillna(0).values) @@ -250,13 +266,13 @@ def _check_const(sparse, name): # use passed series name result = SparseSeries(sparse) tm.assert_sp_series_equal(result, sparse) - self.assertEqual(sparse.name, name) - self.assertEqual(result.name, name) + assert sparse.name == name + assert result.name == name # use passed name result = SparseSeries(sparse, name='x') tm.assert_sp_series_equal(result, sparse, check_names=False) - self.assertEqual(result.name, 'x') + assert result.name == 'x' _check_const(self.bseries, 'bseries') 
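+        # the same naming contract is checked for each test-series flavor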
_check_const(self.iseries, 'iseries') @@ -265,7 +281,7 @@ def _check_const(sparse, name): # Sparse time series works date_index = bdate_range('1/1/2000', periods=len(self.bseries)) s5 = SparseSeries(self.bseries, index=date_index) - tm.assertIsInstance(s5, SparseSeries) + assert isinstance(s5, SparseSeries) # pass Series bseries2 = SparseSeries(self.bseries.to_dense()) @@ -277,31 +293,31 @@ def _check_const(sparse, name): values = np.ones(self.bseries.npoints) sp = SparseSeries(values, sparse_index=self.bseries.sp_index) sp.sp_values[:5] = 97 - self.assertEqual(values[0], 97) + assert values[0] == 97 - self.assertEqual(len(sp), 20) - self.assertEqual(sp.shape, (20, )) + assert len(sp) == 20 + assert sp.shape == (20, ) # but can make it copy! sp = SparseSeries(values, sparse_index=self.bseries.sp_index, copy=True) sp.sp_values[:5] = 100 - self.assertEqual(values[0], 97) + assert values[0] == 97 - self.assertEqual(len(sp), 20) - self.assertEqual(sp.shape, (20, )) + assert len(sp) == 20 + assert sp.shape == (20, ) def test_constructor_scalar(self): data = 5 sp = SparseSeries(data, np.arange(100)) sp = sp.reindex(np.arange(200)) - self.assertTrue((sp.loc[:99] == data).all()) - self.assertTrue(isnull(sp.loc[100:]).all()) + assert (sp.loc[:99] == data).all() + assert isna(sp.loc[100:]).all() data = np.nan sp = SparseSeries(data, np.arange(100)) - self.assertEqual(len(sp), 100) - self.assertEqual(sp.shape, (100, )) + assert len(sp) == 100 + assert sp.shape == (100, ) def test_constructor_ndarray(self): pass @@ -310,20 +326,20 @@ def test_constructor_nonnan(self): arr = [0, 0, 0, nan, nan] sp_series = SparseSeries(arr, fill_value=0) tm.assert_numpy_array_equal(sp_series.values.values, np.array(arr)) - self.assertEqual(len(sp_series), 5) - self.assertEqual(sp_series.shape, (5, )) + assert len(sp_series) == 5 + assert sp_series.shape == (5, ) - # GH 9272 def test_constructor_empty(self): + # see gh-9272 sp = SparseSeries() - self.assertEqual(len(sp.index), 0) - self.assertEqual(sp.shape, (0, )) + assert len(sp.index) == 0 + assert sp.shape == (0, ) def test_copy_astype(self): cop = self.bseries.astype(np.float64) - self.assertIsNot(cop, self.bseries) - self.assertIs(cop.sp_index, self.bseries.sp_index) - self.assertEqual(cop.dtype, np.float64) + assert cop is not self.bseries + assert cop.sp_index is self.bseries.sp_index + assert cop.dtype == np.float64 cop2 = self.iseries.copy() @@ -332,8 +348,8 @@ def test_copy_astype(self): # test that data is copied cop[:5] = 97 - self.assertEqual(cop.sp_values[0], 97) - self.assertNotEqual(self.bseries.sp_values[0], 97) + assert cop.sp_values[0] == 97 + assert self.bseries.sp_values[0] != 97 # correct fill value zbcop = self.zbseries.copy() @@ -345,22 +361,22 @@ def test_copy_astype(self): # no deep copy view = self.bseries.copy(deep=False) view.sp_values[:5] = 5 - self.assertTrue((self.bseries.sp_values[:5] == 5).all()) + assert (self.bseries.sp_values[:5] == 5).all() def test_shape(self): - # GH 10452 - self.assertEqual(self.bseries.shape, (20, )) - self.assertEqual(self.btseries.shape, (20, )) - self.assertEqual(self.iseries.shape, (20, )) + # see gh-10452 + assert self.bseries.shape == (20, ) + assert self.btseries.shape == (20, ) + assert self.iseries.shape == (20, ) - self.assertEqual(self.bseries2.shape, (15, )) - self.assertEqual(self.iseries2.shape, (15, )) + assert self.bseries2.shape == (15, ) + assert self.iseries2.shape == (15, ) - self.assertEqual(self.zbseries2.shape, (15, )) - self.assertEqual(self.ziseries2.shape, (15, )) + assert 
self.zbseries2.shape == (15, ) + assert self.ziseries2.shape == (15, ) def test_astype(self): - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): self.bseries.astype(np.int64) def test_astype_all(self): @@ -371,12 +387,12 @@ def test_astype_all(self): np.int32, np.int16, np.int8] for typ in types: res = s.astype(typ) - self.assertEqual(res.dtype, typ) + assert res.dtype == typ tm.assert_series_equal(res.to_dense(), orig.astype(typ)) def test_kind(self): - self.assertEqual(self.bseries.kind, 'block') - self.assertEqual(self.iseries.kind, 'integer') + assert self.bseries.kind == 'block' + assert self.iseries.kind == 'integer' def test_to_frame(self): # GH 9850 @@ -397,7 +413,7 @@ def test_to_frame(self): def test_pickle(self): def _test_roundtrip(series): - unpickled = self.round_trip_pickle(series) + unpickled = tm.round_trip_pickle(series) tm.assert_sp_series_equal(series, unpickled) tm.assert_series_equal(series.to_dense(), unpickled.to_dense()) @@ -432,16 +448,16 @@ def _check_getitem(sp, dense): _check_getitem(self.ziseries, self.ziseries.to_dense()) # exception handling - self.assertRaises(Exception, self.bseries.__getitem__, - len(self.bseries) + 1) + pytest.raises(Exception, self.bseries.__getitem__, + len(self.bseries) + 1) # index not contained - self.assertRaises(Exception, self.btseries.__getitem__, - self.btseries.index[-1] + BDay()) + pytest.raises(Exception, self.btseries.__getitem__, + self.btseries.index[-1] + BDay()) def test_get_get_value(self): tm.assert_almost_equal(self.bseries.get(10), self.bseries[10]) - self.assertIsNone(self.bseries.get(len(self.bseries) + 1)) + assert self.bseries.get(len(self.bseries) + 1) is None dt = self.btseries.index[10] result = self.btseries.get(dt) @@ -454,22 +470,22 @@ def test_set_value(self): idx = self.btseries.index[7] self.btseries.set_value(idx, 0) - self.assertEqual(self.btseries[idx], 0) + assert self.btseries[idx] == 0 self.iseries.set_value('foobar', 0) - self.assertEqual(self.iseries.index[-1], 'foobar') - self.assertEqual(self.iseries['foobar'], 0) + assert self.iseries.index[-1] == 'foobar' + assert self.iseries['foobar'] == 0 def test_getitem_slice(self): idx = self.bseries.index res = self.bseries[::2] - tm.assertIsInstance(res, SparseSeries) + assert isinstance(res, SparseSeries) expected = self.bseries.reindex(idx[::2]) tm.assert_sp_series_equal(res, expected) res = self.bseries[:5] - tm.assertIsInstance(res, SparseSeries) + assert isinstance(res, SparseSeries) tm.assert_sp_series_equal(res, self.bseries.reindex(idx[:5])) res = self.bseries[5:] @@ -486,7 +502,7 @@ def _compare_with_dense(sp): def _compare(idx): dense_result = dense.take(idx).values sparse_result = sp.take(idx) - self.assertIsInstance(sparse_result, SparseSeries) + assert isinstance(sparse_result, SparseSeries) tm.assert_almost_equal(dense_result, sparse_result.values.values) @@ -496,8 +512,8 @@ def _compare(idx): self._check_all(_compare_with_dense) - self.assertRaises(Exception, self.bseries.take, - [0, len(self.bseries) + 1]) + pytest.raises(Exception, self.bseries.take, + [0, len(self.bseries) + 1]) # Corner case sp = SparseSeries(np.ones(10) * nan) @@ -512,16 +528,16 @@ def test_numpy_take(self): np.take(sp.to_dense(), indices, axis=0)) msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.take, - sp, indices, out=np.empty(sp.shape)) + tm.assert_raises_regex(ValueError, msg, np.take, + sp, indices, out=np.empty(sp.shape)) msg = "the 'mode' parameter is not supported" - 
tm.assertRaisesRegexp(ValueError, msg, np.take, - sp, indices, mode='clip') + tm.assert_raises_regex(ValueError, msg, np.take, + sp, indices, mode='clip') def test_setitem(self): self.bseries[5] = 7. - self.assertEqual(self.bseries[5], 7.) + assert self.bseries[5] == 7. def test_setslice(self): self.bseries[5:10] = 7. @@ -578,8 +594,8 @@ def check(a, b): def test_binary_operators(self): # skipping for now ##### - import nose - raise nose.SkipTest("skipping sparse binary operators test") + import pytest + pytest.skip("skipping sparse binary operators test") def _check_inplace_op(iop, op): tmp = self.bseries.copy() @@ -598,30 +614,30 @@ def test_abs(self): expected = SparseSeries([1, 2, 3], name='x') result = s.abs() tm.assert_sp_series_equal(result, expected) - self.assertEqual(result.name, 'x') + assert result.name == 'x' result = abs(s) tm.assert_sp_series_equal(result, expected) - self.assertEqual(result.name, 'x') + assert result.name == 'x' result = np.abs(s) tm.assert_sp_series_equal(result, expected) - self.assertEqual(result.name, 'x') + assert result.name == 'x' s = SparseSeries([1, -2, 2, -3], fill_value=-2, name='x') expected = SparseSeries([1, 2, 3], sparse_index=s.sp_index, fill_value=2, name='x') result = s.abs() tm.assert_sp_series_equal(result, expected) - self.assertEqual(result.name, 'x') + assert result.name == 'x' result = abs(s) tm.assert_sp_series_equal(result, expected) - self.assertEqual(result.name, 'x') + assert result.name == 'x' result = np.abs(s) tm.assert_sp_series_equal(result, expected) - self.assertEqual(result.name, 'x') + assert result.name == 'x' def test_reindex(self): def _compare_with_series(sps, new_index): @@ -646,7 +662,7 @@ def _compare_with_series(sps, new_index): # special cases same_index = self.bseries.reindex(self.bseries.index) tm.assert_sp_series_equal(self.bseries, same_index) - self.assertIsNot(same_index, self.bseries) + assert same_index is not self.bseries # corner cases sp = SparseSeries([], index=[]) @@ -657,7 +673,7 @@ def _compare_with_series(sps, new_index): # with copy=False reindexed = self.bseries.reindex(self.bseries.index, copy=True) reindexed.sp_values[:] = 1. - self.assertTrue((self.bseries.sp_values != 1.).all()) + assert (self.bseries.sp_values != 1.).all() reindexed = self.bseries.reindex(self.bseries.index, copy=False) reindexed.sp_values[:] = 1. 
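# A minimal sketch of the copy semantics exercised in the hunk above,
# assuming the 0.20-era sparse API this patch targets (SparseSeries was
# removed from later pandas versions):

import numpy as np
import pandas as pd

s = pd.SparseSeries([1., np.nan, 3.])
detached = s.reindex(s.index, copy=True)
detached.sp_values[:] = 9.          # mutate the reindexed copy only
assert (s.sp_values != 9.).all()    # copy=True detached the sp_values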
@@ -670,7 +686,7 @@ def _check(values, index1, index2, fill_value): first_series = SparseSeries(values, sparse_index=index1, fill_value=fill_value) reindexed = first_series.sparse_reindex(index2) - self.assertIs(reindexed.sp_index, index2) + assert reindexed.sp_index is index2 int_indices1 = index1.to_int_index().indices int_indices2 = index2.to_int_index().indices @@ -709,8 +725,8 @@ def _check_all(values, first, second): first_series = SparseSeries(values1, sparse_index=IntIndex(length, index1), fill_value=nan) - with tm.assertRaisesRegexp(TypeError, - 'new index must be a SparseIndex'): + with tm.assert_raises_regex(TypeError, + 'new index must be a SparseIndex'): reindexed = first_series.sparse_reindex(0) # noqa def test_repr(self): @@ -735,7 +751,7 @@ def _compare_with_dense(obj, op): sparse_result = getattr(obj, op)() series = obj.to_dense() dense_result = getattr(series, op)() - self.assertEqual(sparse_result, dense_result) + assert sparse_result == dense_result to_compare = ['count', 'sum', 'mean', 'std', 'var', 'skew'] @@ -771,12 +787,12 @@ def test_dropna(self): expected = expected[expected != 0] exp_arr = pd.SparseArray(expected.values, fill_value=0, kind='block') tm.assert_sp_array_equal(sp_valid.values, exp_arr) - self.assert_index_equal(sp_valid.index, expected.index) - self.assertEqual(len(sp_valid.sp_values), 2) + tm.assert_index_equal(sp_valid.index, expected.index) + assert len(sp_valid.sp_values) == 2 result = self.bseries.dropna() expected = self.bseries.to_dense().dropna() - self.assertNotIsInstance(result, SparseSeries) + assert not isinstance(result, SparseSeries) tm.assert_series_equal(result, expected) def test_homogenize(self): @@ -803,7 +819,7 @@ def _check_matches(indices, expected): # must have NaN fill value data = {'a': SparseSeries(np.arange(7), sparse_index=expected2, fill_value=0)} - with tm.assertRaisesRegexp(TypeError, "NaN fill value"): + with tm.assert_raises_regex(TypeError, "NaN fill value"): spf.homogenize(data) def test_fill_value_corner(self): @@ -811,13 +827,13 @@ def test_fill_value_corner(self): cop.fill_value = 0 result = self.bseries / cop - self.assertTrue(np.isnan(result.fill_value)) + assert np.isnan(result.fill_value) cop2 = self.zbseries.copy() cop2.fill_value = 1 result = cop2 / cop # 1 / 0 is inf - self.assertTrue(np.isinf(result.fill_value)) + assert np.isinf(result.fill_value) def test_fill_value_when_combine_const(self): # GH12723 @@ -825,13 +841,13 @@ def test_fill_value_when_combine_const(self): exp = s.fillna(0).add(2) res = s.add(2, fill_value=0) - self.assert_series_equal(res, exp) + tm.assert_series_equal(res, exp) def test_shift(self): series = SparseSeries([nan, 1., 2., 3., nan, nan], index=np.arange(6)) shifted = series.shift(0) - self.assertIsNot(shifted, series) + assert shifted is not series tm.assert_sp_series_equal(shifted, series) f = lambda s: s.shift(1) @@ -940,8 +956,9 @@ def test_combine_first(self): tm.assert_sp_series_equal(result, expected) -class TestSparseHandlingMultiIndexes(tm.TestCase): - def setUp(self): +class TestSparseHandlingMultiIndexes(object): + + def setup_method(self, method): miindex = pd.MultiIndex.from_product( [["x", "y"], ["10", "20"]], names=['row-foo', 'row-bar']) micol = pd.MultiIndex.from_product( @@ -965,10 +982,10 @@ def test_round_trip_preserve_multiindex_names(self): check_names=True) -class TestSparseSeriesScipyInteraction(tm.TestCase): +class TestSparseSeriesScipyInteraction(object): # Issue 8048: add SparseSeries coo methods - def setUp(self): + def setup_method(self, method): 
tm._skip_if_no_scipy() import scipy.sparse # SparseSeries inputs used in tests, the tests rely on the order @@ -1042,25 +1059,25 @@ def test_to_coo_text_names_text_row_levels_nosort(self): def test_to_coo_bad_partition_nonnull_intersection(self): ss = self.sparse_series[0] - self.assertRaises(ValueError, ss.to_coo, ['A', 'B', 'C'], ['C', 'D']) + pytest.raises(ValueError, ss.to_coo, ['A', 'B', 'C'], ['C', 'D']) def test_to_coo_bad_partition_small_union(self): ss = self.sparse_series[0] - self.assertRaises(ValueError, ss.to_coo, ['A'], ['C', 'D']) + pytest.raises(ValueError, ss.to_coo, ['A'], ['C', 'D']) def test_to_coo_nlevels_less_than_two(self): ss = self.sparse_series[0] ss.index = np.arange(len(ss.index)) - self.assertRaises(ValueError, ss.to_coo) + pytest.raises(ValueError, ss.to_coo) def test_to_coo_bad_ilevel(self): ss = self.sparse_series[0] - self.assertRaises(KeyError, ss.to_coo, ['A', 'B'], ['C', 'D', 'E']) + pytest.raises(KeyError, ss.to_coo, ['A', 'B'], ['C', 'D', 'E']) def test_to_coo_duplicate_index_entries(self): ss = pd.concat([self.sparse_series[0], self.sparse_series[0]]).to_sparse() - self.assertRaises(ValueError, ss.to_coo, ['A', 'B'], ['C', 'D']) + pytest.raises(ValueError, ss.to_coo, ['A', 'B'], ['C', 'D']) def test_from_coo_dense_index(self): ss = SparseSeries.from_coo(self.coo_matrices[0], dense_index=True) @@ -1102,8 +1119,8 @@ def _check_results_to_coo(self, results, check): # or compare directly as difference of sparse # assert(abs(A - A_result).max() < 1e-12) # max is failing in python # 2.6 - self.assertEqual(il, il_result) - self.assertEqual(jl, jl_result) + assert il == il_result + assert jl == jl_result def test_concat(self): val1 = np.array([1, 2, np.nan, np.nan, 0, np.nan]) @@ -1167,7 +1184,7 @@ def test_concat_axis1_different_fill(self): res = pd.concat([sparse1, sparse2], axis=1) exp = pd.concat([pd.Series(val1, name='x'), pd.Series(val2, name='y')], axis=1) - self.assertIsInstance(res, pd.SparseDataFrame) + assert isinstance(res, pd.SparseDataFrame) tm.assert_frame_equal(res.to_dense(), exp) def test_concat_different_kind(self): @@ -1273,11 +1290,11 @@ def test_value_counts_int(self): tm.assert_series_equal(sparse.value_counts(dropna=False), dense.value_counts(dropna=False)) - def test_isnull(self): + def test_isna(self): # GH 8276 s = pd.SparseSeries([np.nan, np.nan, 1, 2, np.nan], name='xxx') - res = s.isnull() + res = s.isna() exp = pd.SparseSeries([True, True, False, False, True], name='xxx', fill_value=True) tm.assert_sp_series_equal(res, exp) @@ -1285,16 +1302,16 @@ def test_isnull(self): # if fill_value is not nan, True can be included in sp_values s = pd.SparseSeries([np.nan, 0., 1., 2., 0.], name='xxx', fill_value=0.) - res = s.isnull() - tm.assertIsInstance(res, pd.SparseSeries) + res = s.isna() + assert isinstance(res, pd.SparseSeries) exp = pd.Series([True, False, False, False, False], name='xxx') tm.assert_series_equal(res.to_dense(), exp) - def test_isnotnull(self): + def test_notna(self): # GH 8276 s = pd.SparseSeries([np.nan, np.nan, 1, 2, np.nan], name='xxx') - res = s.isnotnull() + res = s.notna() exp = pd.SparseSeries([False, False, True, True, False], name='xxx', fill_value=False) tm.assert_sp_series_equal(res, exp) @@ -1302,8 +1319,8 @@ def test_isnotnull(self): # if fill_value is not nan, True can be included in sp_values s = pd.SparseSeries([np.nan, 0., 1., 2., 0.], name='xxx', fill_value=0.) 
- res = s.isnotnull() - tm.assertIsInstance(res, pd.SparseSeries) + res = s.notna() + assert isinstance(res, pd.SparseSeries) exp = pd.Series([False, True, True, True, True], name='xxx') tm.assert_series_equal(res.to_dense(), exp) @@ -1315,9 +1332,9 @@ def _dense_series_compare(s, f): tm.assert_series_equal(result.to_dense(), dense_result) -class TestSparseSeriesAnalytics(tm.TestCase): +class TestSparseSeriesAnalytics(object): - def setUp(self): + def setup_method(self, method): arr, index = _test_data1() self.bseries = SparseSeries(arr, index=index, kind='block', name='bseries') @@ -1337,7 +1354,7 @@ def test_cumsum(self): axis = 1 # Series is 1-D, so only axis = 0 is valid. msg = "No axis named {axis}".format(axis=axis) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): self.bseries.cumsum(axis=axis) def test_numpy_cumsum(self): @@ -1350,12 +1367,12 @@ def test_numpy_cumsum(self): tm.assert_series_equal(result, expected) msg = "the 'dtype' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.cumsum, - self.bseries, dtype=np.int64) + tm.assert_raises_regex(ValueError, msg, np.cumsum, + self.bseries, dtype=np.int64) msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.cumsum, - self.zbseries, out=result) + tm.assert_raises_regex(ValueError, msg, np.cumsum, + self.zbseries, out=result) def test_numpy_func_call(self): # no exception should be raised even though @@ -1368,7 +1385,16 @@ def test_numpy_func_call(self): getattr(np, func)(getattr(self, series)) -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) +@pytest.mark.parametrize( + 'datetime_type', (np.datetime64, + pd.Timestamp, + lambda x: datetime.strptime(x, '%Y-%m-%d'))) +def test_constructor_dict_datetime64_index(datetime_type): + # GH 9456 + dates = ['1984-02-19', '1988-11-06', '1989-12-03', '1990-03-15'] + values = [42544017.198965244, 1234565, 40512335.181958228, -1] + + result = SparseSeries(dict(zip(map(datetime_type, dates), values))) + expected = SparseSeries(values, map(pd.Timestamp, dates)) + + tm.assert_sp_series_equal(result, expected) diff --git a/pandas/tests/test_algos.py b/pandas/tests/test_algos.py index 99453b9793007..b26089ea7a822 100644 --- a/pandas/tests/test_algos.py +++ b/pandas/tests/test_algos.py @@ -1,26 +1,29 @@ # -*- coding: utf-8 -*- -from pandas.compat import range import numpy as np +import pytest + from numpy.random import RandomState from numpy import nan from datetime import datetime from itertools import permutations -from pandas import Series, Categorical, CategoricalIndex, Index +from pandas import (Series, Categorical, CategoricalIndex, + Timestamp, DatetimeIndex, + Index, IntervalIndex) import pandas as pd from pandas import compat -import pandas.algos as _algos -from pandas.compat import lrange +from pandas._libs import (groupby as libgroupby, algos as libalgos, + hashtable as ht) +from pandas._libs.hashtable import unique_label_indices +from pandas.compat import lrange, range import pandas.core.algorithms as algos import pandas.util.testing as tm -import pandas.hashtable as hashtable from pandas.compat.numpy import np_array_datetime64_compat from pandas.util.testing import assert_almost_equal -class TestMatch(tm.TestCase): - _multiprocess_can_split_ = True +class TestMatch(object): def test_ints(self): values = np.array([0, 2, 1]) @@ -28,16 +31,16 @@ def test_ints(self): result = algos.match(to_match, values) expected = 
np.array([0, 2, 1, 1, 0, 2, -1, 0], dtype=np.int64) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) result = Series(algos.match(to_match, values, np.nan)) expected = Series(np.array([0, 2, 1, 1, 0, 2, np.nan, 0])) tm.assert_series_equal(result, expected) - s = pd.Series(np.arange(5), dtype=np.float32) + s = Series(np.arange(5), dtype=np.float32) result = algos.match(s, [2, 4]) expected = np.array([-1, -1, 0, -1, 1], dtype=np.int64) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) result = Series(algos.match(s, [2, 4], np.nan)) expected = Series(np.array([np.nan, np.nan, 0, np.nan, 1])) @@ -49,142 +52,54 @@ def test_strings(self): result = algos.match(to_match, values) expected = np.array([1, 0, -1, 0, 1, 2, -1], dtype=np.int64) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) result = Series(algos.match(to_match, values, np.nan)) expected = Series(np.array([1, 0, np.nan, 0, 1, 2, np.nan])) tm.assert_series_equal(result, expected) -class TestSafeSort(tm.TestCase): - _multiprocess_can_split_ = True - - def test_basic_sort(self): - values = [3, 1, 2, 0, 4] - result = algos.safe_sort(values) - expected = np.array([0, 1, 2, 3, 4]) - tm.assert_numpy_array_equal(result, expected) - - values = list("baaacb") - result = algos.safe_sort(values) - expected = np.array(list("aaabbc")) - tm.assert_numpy_array_equal(result, expected) - - values = [] - result = algos.safe_sort(values) - expected = np.array([]) - tm.assert_numpy_array_equal(result, expected) - - def test_labels(self): - values = [3, 1, 2, 0, 4] - expected = np.array([0, 1, 2, 3, 4]) - - labels = [0, 1, 1, 2, 3, 0, -1, 4] - result, result_labels = algos.safe_sort(values, labels) - expected_labels = np.array([3, 1, 1, 2, 0, 3, -1, 4], dtype=np.intp) - tm.assert_numpy_array_equal(result, expected) - tm.assert_numpy_array_equal(result_labels, expected_labels) - - # na_sentinel - labels = [0, 1, 1, 2, 3, 0, 99, 4] - result, result_labels = algos.safe_sort(values, labels, - na_sentinel=99) - expected_labels = np.array([3, 1, 1, 2, 0, 3, 99, 4], dtype=np.intp) - tm.assert_numpy_array_equal(result, expected) - tm.assert_numpy_array_equal(result_labels, expected_labels) - - # out of bound indices - labels = [0, 101, 102, 2, 3, 0, 99, 4] - result, result_labels = algos.safe_sort(values, labels) - expected_labels = np.array([3, -1, -1, 2, 0, 3, -1, 4], dtype=np.intp) - tm.assert_numpy_array_equal(result, expected) - tm.assert_numpy_array_equal(result_labels, expected_labels) - - labels = [] - result, result_labels = algos.safe_sort(values, labels) - expected_labels = np.array([], dtype=np.intp) - tm.assert_numpy_array_equal(result, expected) - tm.assert_numpy_array_equal(result_labels, expected_labels) - - def test_mixed_integer(self): - values = np.array(['b', 1, 0, 'a', 0, 'b'], dtype=object) - result = algos.safe_sort(values) - expected = np.array([0, 0, 1, 'a', 'b', 'b'], dtype=object) - tm.assert_numpy_array_equal(result, expected) - - values = np.array(['b', 1, 0, 'a'], dtype=object) - labels = [0, 1, 2, 3, 0, -1, 1] - result, result_labels = algos.safe_sort(values, labels) - expected = np.array([0, 1, 'a', 'b'], dtype=object) - expected_labels = np.array([3, 1, 0, 2, 3, -1, 1], dtype=np.intp) - tm.assert_numpy_array_equal(result, expected) - tm.assert_numpy_array_equal(result_labels, expected_labels) - - def test_unsortable(self): - # GH 13714 - arr = np.array([1, 2, datetime.now(), 0, 3], 
dtype=object) - if compat.PY2 and not pd._np_version_under1p10: - # RuntimeWarning: tp_compare didn't return -1 or -2 for exception - with tm.assert_produces_warning(RuntimeWarning): - tm.assertRaises(TypeError, algos.safe_sort, arr) - else: - tm.assertRaises(TypeError, algos.safe_sort, arr) - - def test_exceptions(self): - with tm.assertRaisesRegexp(TypeError, - "Only list-like objects are allowed"): - algos.safe_sort(values=1) - - with tm.assertRaisesRegexp(TypeError, - "Only list-like objects or None"): - algos.safe_sort(values=[0, 1, 2], labels=1) - - with tm.assertRaisesRegexp(ValueError, "values should be unique"): - algos.safe_sort(values=[0, 1, 2, 1], labels=[0, 1]) - - -class TestFactorize(tm.TestCase): - _multiprocess_can_split_ = True +class TestFactorize(object): def test_basic(self): labels, uniques = algos.factorize(['a', 'b', 'b', 'a', 'a', 'c', 'c', 'c']) - self.assert_numpy_array_equal( + tm.assert_numpy_array_equal( uniques, np.array(['a', 'b', 'c'], dtype=object)) labels, uniques = algos.factorize(['a', 'b', 'b', 'a', 'a', 'c', 'c', 'c'], sort=True) exp = np.array([0, 1, 1, 0, 0, 2, 2, 2], dtype=np.intp) - self.assert_numpy_array_equal(labels, exp) + tm.assert_numpy_array_equal(labels, exp) exp = np.array(['a', 'b', 'c'], dtype=object) - self.assert_numpy_array_equal(uniques, exp) + tm.assert_numpy_array_equal(uniques, exp) labels, uniques = algos.factorize(list(reversed(range(5)))) exp = np.array([0, 1, 2, 3, 4], dtype=np.intp) - self.assert_numpy_array_equal(labels, exp) + tm.assert_numpy_array_equal(labels, exp) exp = np.array([4, 3, 2, 1, 0], dtype=np.int64) - self.assert_numpy_array_equal(uniques, exp) + tm.assert_numpy_array_equal(uniques, exp) labels, uniques = algos.factorize(list(reversed(range(5))), sort=True) exp = np.array([4, 3, 2, 1, 0], dtype=np.intp) - self.assert_numpy_array_equal(labels, exp) + tm.assert_numpy_array_equal(labels, exp) exp = np.array([0, 1, 2, 3, 4], dtype=np.int64) - self.assert_numpy_array_equal(uniques, exp) + tm.assert_numpy_array_equal(uniques, exp) labels, uniques = algos.factorize(list(reversed(np.arange(5.)))) exp = np.array([0, 1, 2, 3, 4], dtype=np.intp) - self.assert_numpy_array_equal(labels, exp) + tm.assert_numpy_array_equal(labels, exp) exp = np.array([4., 3., 2., 1., 0.], dtype=np.float64) - self.assert_numpy_array_equal(uniques, exp) + tm.assert_numpy_array_equal(uniques, exp) labels, uniques = algos.factorize(list(reversed(np.arange(5.))), sort=True) exp = np.array([4, 3, 2, 1, 0], dtype=np.intp) - self.assert_numpy_array_equal(labels, exp) + tm.assert_numpy_array_equal(labels, exp) exp = np.array([0., 1., 2., 3., 4.], dtype=np.float64) - self.assert_numpy_array_equal(uniques, exp) + tm.assert_numpy_array_equal(uniques, exp) def test_mixed(self): @@ -193,34 +108,34 @@ def test_mixed(self): labels, uniques = algos.factorize(x) exp = np.array([0, 0, -1, 1, 2, 3], dtype=np.intp) - self.assert_numpy_array_equal(labels, exp) + tm.assert_numpy_array_equal(labels, exp) exp = pd.Index(['A', 'B', 3.14, np.inf]) tm.assert_index_equal(uniques, exp) labels, uniques = algos.factorize(x, sort=True) exp = np.array([2, 2, -1, 3, 0, 1], dtype=np.intp) - self.assert_numpy_array_equal(labels, exp) + tm.assert_numpy_array_equal(labels, exp) exp = pd.Index([3.14, np.inf, 'A', 'B']) tm.assert_index_equal(uniques, exp) def test_datelike(self): # M8 - v1 = pd.Timestamp('20130101 09:00:00.00004') - v2 = pd.Timestamp('20130101') + v1 = Timestamp('20130101 09:00:00.00004') + v2 = Timestamp('20130101') x = Series([v1, v1, v1, v2, v2, v1]) labels, 
uniques = algos.factorize(x) exp = np.array([0, 0, 0, 1, 1, 0], dtype=np.intp) - self.assert_numpy_array_equal(labels, exp) - exp = pd.DatetimeIndex([v1, v2]) - self.assert_index_equal(uniques, exp) + tm.assert_numpy_array_equal(labels, exp) + exp = DatetimeIndex([v1, v2]) + tm.assert_index_equal(uniques, exp) labels, uniques = algos.factorize(x, sort=True) exp = np.array([1, 1, 1, 0, 0, 1], dtype=np.intp) - self.assert_numpy_array_equal(labels, exp) - exp = pd.DatetimeIndex([v2, v1]) - self.assert_index_equal(uniques, exp) + tm.assert_numpy_array_equal(labels, exp) + exp = DatetimeIndex([v2, v1]) + tm.assert_index_equal(uniques, exp) # period v1 = pd.Period('201302', freq='M') @@ -230,13 +145,13 @@ def test_datelike(self): # periods are not 'sorted' as they are converted back into an index labels, uniques = algos.factorize(x) exp = np.array([0, 0, 0, 1, 1, 0], dtype=np.intp) - self.assert_numpy_array_equal(labels, exp) - self.assert_index_equal(uniques, pd.PeriodIndex([v1, v2])) + tm.assert_numpy_array_equal(labels, exp) + tm.assert_index_equal(uniques, pd.PeriodIndex([v1, v2])) labels, uniques = algos.factorize(x, sort=True) exp = np.array([0, 0, 0, 1, 1, 0], dtype=np.intp) - self.assert_numpy_array_equal(labels, exp) - self.assert_index_equal(uniques, pd.PeriodIndex([v1, v2])) + tm.assert_numpy_array_equal(labels, exp) + tm.assert_index_equal(uniques, pd.PeriodIndex([v1, v2])) # GH 5986 v1 = pd.to_timedelta('1 day 1 min') @@ -244,26 +159,26 @@ def test_datelike(self): x = Series([v1, v2, v1, v1, v2, v2, v1]) labels, uniques = algos.factorize(x) exp = np.array([0, 1, 0, 0, 1, 1, 0], dtype=np.intp) - self.assert_numpy_array_equal(labels, exp) - self.assert_index_equal(uniques, pd.to_timedelta([v1, v2])) + tm.assert_numpy_array_equal(labels, exp) + tm.assert_index_equal(uniques, pd.to_timedelta([v1, v2])) labels, uniques = algos.factorize(x, sort=True) exp = np.array([1, 0, 1, 1, 0, 0, 1], dtype=np.intp) - self.assert_numpy_array_equal(labels, exp) - self.assert_index_equal(uniques, pd.to_timedelta([v2, v1])) + tm.assert_numpy_array_equal(labels, exp) + tm.assert_index_equal(uniques, pd.to_timedelta([v2, v1])) def test_factorize_nan(self): # nan should map to na_sentinel, not reverse_indexer[na_sentinel] # rizer.factorize should not raise an exception if na_sentinel indexes # outside of reverse_indexer key = np.array([1, 2, 1, np.nan], dtype='O') - rizer = hashtable.Factorizer(len(key)) + rizer = ht.Factorizer(len(key)) for na_sentinel in (-1, 20): ids = rizer.factorize(key, sort=True, na_sentinel=na_sentinel) expected = np.array([0, 1, 0, na_sentinel], dtype='int32') - self.assertEqual(len(set(key)), len(set(expected))) - self.assertTrue(np.array_equal( - pd.isnull(key), expected == na_sentinel)) + assert len(set(key)) == len(set(expected)) + tm.assert_numpy_array_equal(pd.isna(key), + expected == na_sentinel) # nan still maps to na_sentinel when sort=False key = np.array([0, np.nan, 1], dtype='O') @@ -273,19 +188,18 @@ def test_factorize_nan(self): ids = rizer.factorize(key, sort=False, na_sentinel=na_sentinel) # noqa expected = np.array([2, -1, 0], dtype='int32') - self.assertEqual(len(set(key)), len(set(expected))) - self.assertTrue( - np.array_equal(pd.isnull(key), expected == na_sentinel)) + assert len(set(key)) == len(set(expected)) + tm.assert_numpy_array_equal(pd.isna(key), expected == na_sentinel) def test_complex_sorting(self): # gh 12666 - check no segfault # Test not valid numpy versions older than 1.11 if pd._np_version_under1p11: - self.skipTest("Test valid only for numpy 
1.11+") + pytest.skip("Test valid only for numpy 1.11+") x17 = np.array([complex(i) for i in range(17)], dtype=object) - self.assertRaises(TypeError, algos.factorize, x17[::-1], sort=True) + pytest.raises(TypeError, algos.factorize, x17[::-1], sort=True) def test_uint64_factorize(self): data = np.array([2**63, 1, 2**63], dtype=np.uint64) @@ -305,20 +219,19 @@ def test_uint64_factorize(self): tm.assert_numpy_array_equal(uniques, exp_uniques) -class TestUnique(tm.TestCase): - _multiprocess_can_split_ = True +class TestUnique(object): def test_ints(self): arr = np.random.randint(0, 100, size=50) result = algos.unique(arr) - tm.assertIsInstance(result, np.ndarray) + assert isinstance(result, np.ndarray) def test_objects(self): arr = np.random.randint(0, 100, size=50).astype('O') result = algos.unique(arr) - tm.assertIsInstance(result, np.ndarray) + assert isinstance(result, np.ndarray) def test_object_refcount_bug(self): lst = ['A', 'B', 'C', 'D', 'E'] @@ -351,17 +264,17 @@ def test_datetime64_dtype_array_returned(self): '2015-01-01T00:00:00.000000000+0000']) result = algos.unique(dt_index) tm.assert_numpy_array_equal(result, expected) - self.assertEqual(result.dtype, expected.dtype) + assert result.dtype == expected.dtype - s = pd.Series(dt_index) + s = Series(dt_index) result = algos.unique(s) tm.assert_numpy_array_equal(result, expected) - self.assertEqual(result.dtype, expected.dtype) + assert result.dtype == expected.dtype arr = s.values result = algos.unique(arr) tm.assert_numpy_array_equal(result, expected) - self.assertEqual(result.dtype, expected.dtype) + assert result.dtype == expected.dtype def test_timedelta64_dtype_array_returned(self): # GH 9431 @@ -370,32 +283,155 @@ def test_timedelta64_dtype_array_returned(self): td_index = pd.to_timedelta([31200, 45678, 31200, 10000, 45678]) result = algos.unique(td_index) tm.assert_numpy_array_equal(result, expected) - self.assertEqual(result.dtype, expected.dtype) + assert result.dtype == expected.dtype - s = pd.Series(td_index) + s = Series(td_index) result = algos.unique(s) tm.assert_numpy_array_equal(result, expected) - self.assertEqual(result.dtype, expected.dtype) + assert result.dtype == expected.dtype arr = s.values result = algos.unique(arr) tm.assert_numpy_array_equal(result, expected) - self.assertEqual(result.dtype, expected.dtype) + assert result.dtype == expected.dtype def test_uint64_overflow(self): - s = pd.Series([1, 2, 2**63, 2**63], dtype=np.uint64) + s = Series([1, 2, 2**63, 2**63], dtype=np.uint64) exp = np.array([1, 2, 2**63], dtype=np.uint64) tm.assert_numpy_array_equal(algos.unique(s), exp) + def test_nan_in_object_array(self): + l = ['a', np.nan, 'c', 'c'] + result = pd.unique(l) + expected = np.array(['a', np.nan, 'c'], dtype=object) + tm.assert_numpy_array_equal(result, expected) + + def test_categorical(self): + + # we are expecting to return in the order + # of appearance + expected = pd.Categorical(list('bac'), + categories=list('bac')) + + # we are expecting to return in the order + # of the categories + expected_o = pd.Categorical(list('bac'), + categories=list('abc'), + ordered=True) + + # GH 15939 + c = pd.Categorical(list('baabc')) + result = c.unique() + tm.assert_categorical_equal(result, expected) + + result = algos.unique(c) + tm.assert_categorical_equal(result, expected) + + c = pd.Categorical(list('baabc'), ordered=True) + result = c.unique() + tm.assert_categorical_equal(result, expected_o) + + result = algos.unique(c) + tm.assert_categorical_equal(result, expected_o) + + # Series of categorical 
dtype + s = Series(pd.Categorical(list('baabc')), name='foo') + result = s.unique() + tm.assert_categorical_equal(result, expected) + + result = pd.unique(s) + tm.assert_categorical_equal(result, expected) + + # CI -> return CI + ci = pd.CategoricalIndex(pd.Categorical(list('baabc'), + categories=list('bac'))) + expected = pd.CategoricalIndex(expected) + result = ci.unique() + tm.assert_index_equal(result, expected) + + result = pd.unique(ci) + tm.assert_index_equal(result, expected) + + def test_datetime64tz_aware(self): + # GH 15939 + + result = Series( + pd.Index([Timestamp('20160101', tz='US/Eastern'), + Timestamp('20160101', tz='US/Eastern')])).unique() + expected = np.array([Timestamp('2016-01-01 00:00:00-0500', + tz='US/Eastern')], dtype=object) + tm.assert_numpy_array_equal(result, expected) + + result = pd.Index([Timestamp('20160101', tz='US/Eastern'), + Timestamp('20160101', tz='US/Eastern')]).unique() + expected = DatetimeIndex(['2016-01-01 00:00:00'], + dtype='datetime64[ns, US/Eastern]', freq=None) + tm.assert_index_equal(result, expected) + + result = pd.unique( + Series(pd.Index([Timestamp('20160101', tz='US/Eastern'), + Timestamp('20160101', tz='US/Eastern')]))) + expected = np.array([Timestamp('2016-01-01 00:00:00-0500', + tz='US/Eastern')], dtype=object) + tm.assert_numpy_array_equal(result, expected) + + result = pd.unique(pd.Index([Timestamp('20160101', tz='US/Eastern'), + Timestamp('20160101', tz='US/Eastern')])) + expected = DatetimeIndex(['2016-01-01 00:00:00'], + dtype='datetime64[ns, US/Eastern]', freq=None) + tm.assert_index_equal(result, expected) + + def test_order_of_appearance(self): + # 9346 + # light testing of guarantee of order of appearance + # these also are the doc-examples + result = pd.unique(Series([2, 1, 3, 3])) + tm.assert_numpy_array_equal(result, + np.array([2, 1, 3], dtype='int64')) + + result = pd.unique(Series([2] + [1] * 5)) + tm.assert_numpy_array_equal(result, + np.array([2, 1], dtype='int64')) + + result = pd.unique(Series([Timestamp('20160101'), + Timestamp('20160101')])) + expected = np.array(['2016-01-01T00:00:00.000000000'], + dtype='datetime64[ns]') + tm.assert_numpy_array_equal(result, expected) + + result = pd.unique(pd.Index( + [Timestamp('20160101', tz='US/Eastern'), + Timestamp('20160101', tz='US/Eastern')])) + expected = DatetimeIndex(['2016-01-01 00:00:00'], + dtype='datetime64[ns, US/Eastern]', + freq=None) + tm.assert_index_equal(result, expected) + + result = pd.unique(list('aabc')) + expected = np.array(['a', 'b', 'c'], dtype=object) + tm.assert_numpy_array_equal(result, expected) + + result = pd.unique(Series(pd.Categorical(list('aabc')))) + expected = pd.Categorical(list('abc')) + tm.assert_categorical_equal(result, expected) + + @pytest.mark.parametrize("arg, expected", [ + (('1', '1', '2'), np.array(['1', '2'], dtype=object)), + (('foo',), np.array(['foo'], dtype=object)) + ]) + def test_tuple_with_strings(self, arg, expected): + # see GH 17108 + result = pd.unique(arg) + tm.assert_numpy_array_equal(result, expected) + -class TestIsin(tm.TestCase): - _multiprocess_can_split_ = True +class TestIsin(object): def test_invalid(self): - self.assertRaises(TypeError, lambda: algos.isin(1, 1)) - self.assertRaises(TypeError, lambda: algos.isin(1, [1])) - self.assertRaises(TypeError, lambda: algos.isin([1], 1)) + pytest.raises(TypeError, lambda: algos.isin(1, 1)) + pytest.raises(TypeError, lambda: algos.isin(1, [1])) + pytest.raises(TypeError, lambda: algos.isin([1], 1)) def test_basic(self): @@ -407,15 +443,15 @@ def
test_basic(self): expected = np.array([True, False]) tm.assert_numpy_array_equal(result, expected) - result = algos.isin(pd.Series([1, 2]), [1]) + result = algos.isin(Series([1, 2]), [1]) expected = np.array([True, False]) tm.assert_numpy_array_equal(result, expected) - result = algos.isin(pd.Series([1, 2]), pd.Series([1])) + result = algos.isin(Series([1, 2]), Series([1])) expected = np.array([True, False]) tm.assert_numpy_array_equal(result, expected) - result = algos.isin(pd.Series([1, 2]), set([1])) + result = algos.isin(Series([1, 2]), set([1])) expected = np.array([True, False]) tm.assert_numpy_array_equal(result, expected) @@ -423,11 +459,11 @@ def test_basic(self): expected = np.array([True, False]) tm.assert_numpy_array_equal(result, expected) - result = algos.isin(pd.Series(['a', 'b']), pd.Series(['a'])) + result = algos.isin(Series(['a', 'b']), Series(['a'])) expected = np.array([True, False]) tm.assert_numpy_array_equal(result, expected) - result = algos.isin(pd.Series(['a', 'b']), set(['a'])) + result = algos.isin(Series(['a', 'b']), set(['a'])) expected = np.array([True, False]) tm.assert_numpy_array_equal(result, expected) @@ -435,6 +471,8 @@ def test_basic(self): expected = np.array([False, False]) tm.assert_numpy_array_equal(result, expected) + def test_i8(self): + arr = pd.date_range('20130101', periods=3).values result = algos.isin(arr, [arr[0]]) expected = np.array([True, False, False]) tm.assert_numpy_array_equal(result, expected) @@ -470,48 +508,69 @@ def test_large(self): expected[1] = True tm.assert_numpy_array_equal(result, expected) + def test_categorical_from_codes(self): + # GH 16639 + vals = np.array([0, 1, 2, 0]) + cats = ['a', 'b', 'c'] + Sd = pd.Series(pd.Categorical(1).from_codes(vals, cats)) + St = pd.Series(pd.Categorical(1).from_codes(np.array([0, 1]), cats)) + expected = np.array([True, True, False, True]) + result = algos.isin(Sd, St) + tm.assert_numpy_array_equal(expected, result) + + @pytest.mark.parametrize("empty", [[], pd.Series(), np.array([])]) + def test_empty(self, empty): + # see gh-16991 + vals = pd.Index(["a", "b"]) + expected = np.array([False, False]) + + result = algos.isin(vals, empty) + tm.assert_numpy_array_equal(expected, result) -class TestValueCounts(tm.TestCase): - _multiprocess_can_split_ = True + +class TestValueCounts(object): def test_value_counts(self): np.random.seed(1234) - from pandas.tools.tile import cut + from pandas.core.reshape.tile import cut arr = np.random.randn(4) factor = cut(arr, 4) - tm.assertIsInstance(factor, Categorical) + assert isinstance(factor, Categorical) result = algos.value_counts(factor) - cats = ['(-1.194, -0.535]', '(-0.535, 0.121]', '(0.121, 0.777]', - '(0.777, 1.433]'] - expected_index = CategoricalIndex(cats, cats, ordered=True) - expected = Series([1, 1, 1, 1], index=expected_index) + breaks = [-1.194, -0.535, 0.121, 0.777, 1.433] + expected_index = pd.IntervalIndex.from_breaks( + breaks).astype('category') + expected = Series([1, 1, 1, 1], + index=expected_index) tm.assert_series_equal(result.sort_index(), expected.sort_index()) def test_value_counts_bins(self): s = [1, 2, 3, 4] result = algos.value_counts(s, bins=1) - self.assertEqual(result.tolist(), [4]) - self.assertEqual(result.index[0], 0.997) + expected = Series([4], + index=IntervalIndex.from_tuples([(0.996, 4.0)])) + tm.assert_series_equal(result, expected) result = algos.value_counts(s, bins=2, sort=False) - self.assertEqual(result.tolist(), [2, 2]) - self.assertEqual(result.index[0], 0.997) - self.assertEqual(result.index[1], 2.5) + expected = Series([2, 2],
index=IntervalIndex.from_tuples([(0.996, 2.5), + (2.5, 4.0)])) + tm.assert_series_equal(result, expected) def test_value_counts_dtypes(self): result = algos.value_counts([1, 1.]) - self.assertEqual(len(result), 1) + assert len(result) == 1 result = algos.value_counts([1, 1.], bins=1) - self.assertEqual(len(result), 1) + assert len(result) == 1 result = algos.value_counts(Series([1, 1., '1'])) # object - self.assertEqual(len(result), 2) + assert len(result) == 2 - self.assertRaises(TypeError, lambda s: algos.value_counts(s, bins=1), - ['1', 1]) + pytest.raises(TypeError, lambda s: algos.value_counts(s, bins=1), + ['1', 1]) def test_value_counts_nat(self): td = Series([np.timedelta64(10000), pd.NaT], dtype='timedelta64[ns]') @@ -520,36 +579,37 @@ def test_value_counts_nat(self): for s in [td, dt]: vc = algos.value_counts(s) vc_with_na = algos.value_counts(s, dropna=False) - self.assertEqual(len(vc), 1) - self.assertEqual(len(vc_with_na), 2) + assert len(vc) == 1 + assert len(vc_with_na) == 2 - exp_dt = pd.Series({pd.Timestamp('2014-01-01 00:00:00'): 1}) + exp_dt = Series({Timestamp('2014-01-01 00:00:00'): 1}) tm.assert_series_equal(algos.value_counts(dt), exp_dt) # TODO same for (timedelta) def test_value_counts_datetime_outofbounds(self): # GH 13663 - s = pd.Series([datetime(3000, 1, 1), datetime(5000, 1, 1), - datetime(5000, 1, 1), datetime(6000, 1, 1), - datetime(3000, 1, 1), datetime(3000, 1, 1)]) + s = Series([datetime(3000, 1, 1), datetime(5000, 1, 1), + datetime(5000, 1, 1), datetime(6000, 1, 1), + datetime(3000, 1, 1), datetime(3000, 1, 1)]) res = s.value_counts() exp_index = pd.Index([datetime(3000, 1, 1), datetime(5000, 1, 1), datetime(6000, 1, 1)], dtype=object) - exp = pd.Series([3, 2, 1], index=exp_index) + exp = Series([3, 2, 1], index=exp_index) tm.assert_series_equal(res, exp) # GH 12424 - res = pd.to_datetime(pd.Series(['2362-01-01', np.nan]), + res = pd.to_datetime(Series(['2362-01-01', np.nan]), errors='ignore') - exp = pd.Series(['2362-01-01', np.nan], dtype=object) + exp = Series(['2362-01-01', np.nan], dtype=object) tm.assert_series_equal(res, exp) def test_categorical(self): s = Series(pd.Categorical(list('aaabbc'))) result = s.value_counts() - expected = pd.Series([3, 2, 1], - index=pd.CategoricalIndex(['a', 'b', 'c'])) + expected = Series([3, 2, 1], + index=pd.CategoricalIndex(['a', 'b', 'c'])) + tm.assert_series_equal(result, expected, check_index_type=True) # preserve order? 
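The updated expectations earlier in this TestValueCounts diff encode a behavioral change, not just an API rename: cut and binned value_counts now label their results with an IntervalIndex (optionally wrapped as categories) instead of float bin edges or formatted string labels. A short illustration of what the new assertions check, assuming the pandas API of this series (the exact interval breaks are data-dependent):

import pandas as pd

s = pd.Series([1, 2, 3, 4])
counts = s.value_counts(bins=2, sort=False)
# counts.index is now an IntervalIndex, e.g. [(0.996, 2.5], (2.5, 4.0]],
# rather than the float left edges (0.997 and 2.5) the old test asserted.
print(counts.index)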
@@ -562,13 +622,14 @@ def test_categorical_nans(self): s = Series(pd.Categorical(list('aaaaabbbcc'))) # 4,3,2,1 (nan) s.iloc[1] = np.nan result = s.value_counts() - expected = pd.Series([4, 3, 2], index=pd.CategoricalIndex( + expected = Series([4, 3, 2], index=pd.CategoricalIndex( + ['a', 'b', 'c'], categories=['a', 'b', 'c'])) tm.assert_series_equal(result, expected, check_index_type=True) result = s.value_counts(dropna=False) - expected = pd.Series([ + expected = Series([ 4, 3, 2, 1 - ], index=pd.CategoricalIndex(['a', 'b', 'c', np.nan])) + ], index=CategoricalIndex(['a', 'b', 'c', np.nan])) tm.assert_series_equal(result, expected, check_index_type=True) # out of order @@ -576,12 +637,12 @@ def test_categorical_nans(self): list('aaaaabbbcc'), ordered=True, categories=['b', 'a', 'c'])) s.iloc[1] = np.nan result = s.value_counts() - expected = pd.Series([4, 3, 2], index=pd.CategoricalIndex( + expected = Series([4, 3, 2], index=pd.CategoricalIndex( ['a', 'b', 'c'], categories=['b', 'a', 'c'], ordered=True)) tm.assert_series_equal(result, expected, check_index_type=True) result = s.value_counts(dropna=False) - expected = pd.Series([4, 3, 2, 1], index=pd.CategoricalIndex( + expected = Series([4, 3, 2, 1], index=pd.CategoricalIndex( ['a', 'b', 'c', np.nan], categories=['b', 'a', 'c'], ordered=True)) tm.assert_series_equal(result, expected, check_index_type=True) @@ -598,34 +659,34 @@ def test_dropna(self): # https://github.com/pandas-dev/pandas/issues/9443#issuecomment-73719328 tm.assert_series_equal( - pd.Series([True, True, False]).value_counts(dropna=True), - pd.Series([2, 1], index=[True, False])) + Series([True, True, False]).value_counts(dropna=True), + Series([2, 1], index=[True, False])) tm.assert_series_equal( - pd.Series([True, True, False]).value_counts(dropna=False), - pd.Series([2, 1], index=[True, False])) + Series([True, True, False]).value_counts(dropna=False), + Series([2, 1], index=[True, False])) tm.assert_series_equal( - pd.Series([True, True, False, None]).value_counts(dropna=True), - pd.Series([2, 1], index=[True, False])) + Series([True, True, False, None]).value_counts(dropna=True), + Series([2, 1], index=[True, False])) tm.assert_series_equal( - pd.Series([True, True, False, None]).value_counts(dropna=False), - pd.Series([2, 1, 1], index=[True, False, np.nan])) + Series([True, True, False, None]).value_counts(dropna=False), + Series([2, 1, 1], index=[True, False, np.nan])) tm.assert_series_equal( - pd.Series([10.3, 5., 5.]).value_counts(dropna=True), - pd.Series([2, 1], index=[5., 10.3])) + Series([10.3, 5., 5.]).value_counts(dropna=True), + Series([2, 1], index=[5., 10.3])) tm.assert_series_equal( - pd.Series([10.3, 5., 5.]).value_counts(dropna=False), - pd.Series([2, 1], index=[5., 10.3])) + Series([10.3, 5., 5.]).value_counts(dropna=False), + Series([2, 1], index=[5., 10.3])) tm.assert_series_equal( - pd.Series([10.3, 5., 5., None]).value_counts(dropna=True), - pd.Series([2, 1], index=[5., 10.3])) + Series([10.3, 5., 5., None]).value_counts(dropna=True), + Series([2, 1], index=[5., 10.3])) # 32-bit linux has a different ordering if not compat.is_platform_32bit(): - tm.assert_series_equal( - pd.Series([10.3, 5., 5., None]).value_counts(dropna=False), - pd.Series([2, 1, 1], index=[5., 10.3, np.nan])) + result = Series([10.3, 5., 5., None]).value_counts(dropna=False) + expected = Series([2, 1, 1], index=[5., 10.3, np.nan]) + tm.assert_series_equal(result, expected) def test_value_counts_normalized(self): # GH12558 @@ -654,12 +715,12 @@ def 
test_value_counts_uint64(self): expected = Series([1, 1], index=[-1, 2**63]) result = algos.value_counts(arr) - tm.assert_series_equal(result, expected) - + # 32-bit linux has a different ordering + if not compat.is_platform_32bit(): + tm.assert_series_equal(result, expected) -class TestDuplicated(tm.TestCase): - _multiprocess_can_split_ = True +class TestDuplicated(object): def test_duplicated_with_nas(self): keys = np.array([0, 1, np.nan, 0, 2, np.nan], dtype=object) @@ -739,15 +800,15 @@ def test_numeric_object_likes(self): tm.assert_numpy_array_equal(res_false, exp_false) # series - for s in [pd.Series(case), pd.Series(case, dtype='category')]: + for s in [Series(case), Series(case, dtype='category')]: res_first = s.duplicated(keep='first') - tm.assert_series_equal(res_first, pd.Series(exp_first)) + tm.assert_series_equal(res_first, Series(exp_first)) res_last = s.duplicated(keep='last') - tm.assert_series_equal(res_last, pd.Series(exp_last)) + tm.assert_series_equal(res_last, Series(exp_last)) res_false = s.duplicated(keep=False) - tm.assert_series_equal(res_false, pd.Series(exp_false)) + tm.assert_series_equal(res_false, Series(exp_false)) def test_datetime_likes(self): @@ -756,8 +817,8 @@ def test_datetime_likes(self): td = ['1 days', '2 days', '1 days', 'NaT', '3 days', '2 days', '4 days', '1 days', 'NaT', '6 days'] - cases = [np.array([pd.Timestamp(d) for d in dt]), - np.array([pd.Timestamp(d, tz='US/Eastern') for d in dt]), + cases = [np.array([Timestamp(d) for d in dt]), + np.array([Timestamp(d, tz='US/Eastern') for d in dt]), np.array([pd.Period(d, freq='D') for d in dt]), np.array([np.datetime64(d) for d in dt]), np.array([pd.Timedelta(d) for d in td])] @@ -791,24 +852,40 @@ def test_datetime_likes(self): tm.assert_numpy_array_equal(res_false, exp_false) # series - for s in [pd.Series(case), pd.Series(case, dtype='category'), - pd.Series(case, dtype=object)]: + for s in [Series(case), Series(case, dtype='category'), + Series(case, dtype=object)]: res_first = s.duplicated(keep='first') - tm.assert_series_equal(res_first, pd.Series(exp_first)) + tm.assert_series_equal(res_first, Series(exp_first)) res_last = s.duplicated(keep='last') - tm.assert_series_equal(res_last, pd.Series(exp_last)) + tm.assert_series_equal(res_last, Series(exp_last)) res_false = s.duplicated(keep=False) - tm.assert_series_equal(res_false, pd.Series(exp_false)) + tm.assert_series_equal(res_false, Series(exp_false)) def test_unique_index(self): cases = [pd.Index([1, 2, 3]), pd.RangeIndex(0, 3)] for case in cases: - self.assertTrue(case.is_unique) + assert case.is_unique tm.assert_numpy_array_equal(case.duplicated(), np.array([False, False, False])) + @pytest.mark.parametrize('arr, unique', [ + ([(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1)], + [(0, 0), (0, 1), (1, 0), (1, 1)]), + ([('b', 'c'), ('a', 'b'), ('a', 'b'), ('b', 'c')], + [('b', 'c'), ('a', 'b')]), + ([('a', 1), ('b', 2), ('a', 3), ('a', 1)], + [('a', 1), ('b', 2), ('a', 3)]), + ]) + def test_unique_tuples(self, arr, unique): + # https://github.com/pandas-dev/pandas/issues/16519 + expected = np.empty(len(unique), dtype=object) + expected[:] = unique + + result = pd.unique(arr) + tm.assert_numpy_array_equal(result, expected) + class GroupVarTestMixin(object): @@ -826,7 +903,7 @@ def test_group_var_generic_1d(self): expected_counts = counts + 3 self.algo(out, counts, values, labels) - self.assertTrue(np.allclose(out, expected_out, self.rtol)) + assert np.allclose(out, expected_out, self.rtol) tm.assert_numpy_array_equal(counts, 
expected_counts) def test_group_var_generic_1d_flat_labels(self): @@ -842,7 +919,7 @@ def test_group_var_generic_1d_flat_labels(self): self.algo(out, counts, values, labels) - self.assertTrue(np.allclose(out, expected_out, self.rtol)) + assert np.allclose(out, expected_out, self.rtol) tm.assert_numpy_array_equal(counts, expected_counts) def test_group_var_generic_2d_all_finite(self): @@ -857,7 +934,7 @@ def test_group_var_generic_2d_all_finite(self): expected_counts = counts + 2 self.algo(out, counts, values, labels) - self.assertTrue(np.allclose(out, expected_out, self.rtol)) + assert np.allclose(out, expected_out, self.rtol) tm.assert_numpy_array_equal(counts, expected_counts) def test_group_var_generic_2d_some_nan(self): @@ -889,16 +966,15 @@ def test_group_var_constant(self): self.algo(out, counts, values, labels) - self.assertEqual(counts[0], 3) - self.assertTrue(out[0, 0] >= 0) + assert counts[0] == 3 + assert out[0, 0] >= 0 tm.assert_almost_equal(out[0, 0], 0.0) -class TestGroupVarFloat64(tm.TestCase, GroupVarTestMixin): +class TestGroupVarFloat64(GroupVarTestMixin): __test__ = True - _multiprocess_can_split_ = True - algo = algos.algos.group_var_float64 + algo = libgroupby.group_var_float64 dtype = np.float64 rtol = 1e-5 @@ -914,63 +990,72 @@ def test_group_var_large_inputs(self): self.algo(out, counts, values, labels) - self.assertEqual(counts[0], 10 ** 6) + assert counts[0] == 10 ** 6 tm.assert_almost_equal(out[0, 0], 1.0 / 12, check_less_precise=True) -class TestGroupVarFloat32(tm.TestCase, GroupVarTestMixin): +class TestGroupVarFloat32(GroupVarTestMixin): __test__ = True - _multiprocess_can_split_ = True - algo = algos.algos.group_var_float32 + algo = libgroupby.group_var_float32 dtype = np.float32 rtol = 1e-2 -class TestHashTable(tm.TestCase): +class TestHashTable(object): def test_lookup_nan(self): xs = np.array([2.718, 3.14, np.nan, -7, 5, 2, 3]) - m = hashtable.Float64HashTable() + m = ht.Float64HashTable() m.map_locations(xs) - self.assert_numpy_array_equal(m.lookup(xs), - np.arange(len(xs), dtype=np.int64)) + tm.assert_numpy_array_equal(m.lookup(xs), np.arange(len(xs), + dtype=np.int64)) def test_lookup_overflow(self): xs = np.array([1, 2, 2**63], dtype=np.uint64) - m = hashtable.UInt64HashTable() + m = ht.UInt64HashTable() m.map_locations(xs) - self.assert_numpy_array_equal(m.lookup(xs), - np.arange(len(xs), dtype=np.int64)) + tm.assert_numpy_array_equal(m.lookup(xs), np.arange(len(xs), + dtype=np.int64)) def test_get_unique(self): - s = pd.Series([1, 2, 2**63, 2**63], dtype=np.uint64) + s = Series([1, 2, 2**63, 2**63], dtype=np.uint64) exp = np.array([1, 2, 2**63], dtype=np.uint64) - self.assert_numpy_array_equal(s.unique(), exp) + tm.assert_numpy_array_equal(s.unique(), exp) def test_vector_resize(self): # Test for memory errors after internal vector # reallocations (pull request #7157) - def _test_vector_resize(htable, uniques, dtype, nvals): + def _test_vector_resize(htable, uniques, dtype, nvals, safely_resizes): vals = np.array(np.random.randn(1000), dtype=dtype) - # get_labels appends to the vector + # get_labels may append to uniques htable.get_labels(vals[:nvals], uniques, 0, -1) - # to_array resizes the vector - uniques.to_array() - htable.get_labels(vals, uniques, 0, -1) + # to_array() sets an external_view_exists flag on uniques.
+ tmp = uniques.to_array() + oldshape = tmp.shape + # subsequent get_labels() calls can no longer append to it + # (for all but StringHashTables + ObjectVector) + if safely_resizes: + htable.get_labels(vals, uniques, 0, -1) + else: + with pytest.raises(ValueError) as excinfo: + htable.get_labels(vals, uniques, 0, -1) + assert str(excinfo.value).startswith('external reference') + uniques.to_array() # should not raise here + assert tmp.shape == oldshape test_cases = [ - (hashtable.PyObjectHashTable, hashtable.ObjectVector, 'object'), - (hashtable.StringHashTable, hashtable.ObjectVector, 'object'), - (hashtable.Float64HashTable, hashtable.Float64Vector, 'float64'), - (hashtable.Int64HashTable, hashtable.Int64Vector, 'int64'), - (hashtable.UInt64HashTable, hashtable.UInt64Vector, 'uint64')] + (ht.PyObjectHashTable, ht.ObjectVector, 'object', False), + (ht.StringHashTable, ht.ObjectVector, 'object', True), + (ht.Float64HashTable, ht.Float64Vector, 'float64', False), + (ht.Int64HashTable, ht.Int64Vector, 'int64', False), + (ht.UInt64HashTable, ht.UInt64Vector, 'uint64', False)] - for (tbl, vect, dtype) in test_cases: + for (tbl, vect, dtype, safely_resizes) in test_cases: # resizing to empty is a special case - _test_vector_resize(tbl(), vect(), dtype, 0) - _test_vector_resize(tbl(), vect(), dtype, 10) + _test_vector_resize(tbl(), vect(), dtype, 0, safely_resizes) + _test_vector_resize(tbl(), vect(), dtype, 10, safely_resizes) def test_quantile(): @@ -982,7 +1067,6 @@ def test_quantile(): def test_unique_label_indices(): - from pandas.hashtable import unique_label_indices a = np.random.randint(1, 1 << 10, 1 << 15).astype('i8') @@ -999,7 +1083,7 @@ def test_unique_label_indices(): check_dtype=False) -class TestRank(tm.TestCase): +class TestRank(object): def test_scipy_compat(self): tm._skip_if_no_scipy() @@ -1008,7 +1092,7 @@ def test_scipy_compat(self): def _check(arr): mask = ~np.isfinite(arr) arr = arr.copy() - result = _algos.rank_1d_float64(arr) + result = libalgos.rank_1d_float64(arr) arr[mask] = np.inf exp = rankdata(arr) exp[mask] = nan @@ -1035,7 +1119,7 @@ def test_too_many_ndims(self): arr = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]) msg = "Array with ndim > 2 are not supported" - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): algos.rank(arr) @@ -1044,31 +1128,30 @@ def test_pad_backfill_object_segfault(): old = np.array([], dtype='O') new = np.array([datetime(2010, 12, 31)], dtype='O') - result = _algos.pad_object(old, new) + result = libalgos.pad_object(old, new) expected = np.array([-1], dtype=np.int64) assert (np.array_equal(result, expected)) - result = _algos.pad_object(new, old) + result = libalgos.pad_object(new, old) expected = np.array([], dtype=np.int64) assert (np.array_equal(result, expected)) - result = _algos.backfill_object(old, new) + result = libalgos.backfill_object(old, new) expected = np.array([-1], dtype=np.int64) assert (np.array_equal(result, expected)) - result = _algos.backfill_object(new, old) + result = libalgos.backfill_object(new, old) expected = np.array([], dtype=np.int64) assert (np.array_equal(result, expected)) def test_arrmap(): values = np.array(['foo', 'foo', 'bar', 'bar', 'baz', 'qux'], dtype='O') - result = _algos.arrmap_object(values, lambda x: x in ['foo', 'bar']) + result = libalgos.arrmap_object(values, lambda x: x in ['foo', 'bar']) assert (result.dtype == np.bool_) -class TestTseriesUtil(tm.TestCase): - _multiprocess_can_split_ = True +class TestTseriesUtil(object): def test_combineFunc(self): 
pass @@ -1076,7 +1159,7 @@ def test_combineFunc(self): def test_reindex(self): pass - def test_isnull(self): + def test_isna(self): pass def test_groupby(self): @@ -1089,36 +1172,36 @@ def test_backfill(self): old = Index([1, 5, 10]) new = Index(lrange(12)) - filler = _algos.backfill_int64(old.values, new.values) + filler = libalgos.backfill_int64(old.values, new.values) expect_filler = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, -1], dtype=np.int64) - self.assert_numpy_array_equal(filler, expect_filler) + tm.assert_numpy_array_equal(filler, expect_filler) # corner case old = Index([1, 4]) new = Index(lrange(5, 10)) - filler = _algos.backfill_int64(old.values, new.values) + filler = libalgos.backfill_int64(old.values, new.values) expect_filler = np.array([-1, -1, -1, -1, -1], dtype=np.int64) - self.assert_numpy_array_equal(filler, expect_filler) + tm.assert_numpy_array_equal(filler, expect_filler) def test_pad(self): old = Index([1, 5, 10]) new = Index(lrange(12)) - filler = _algos.pad_int64(old.values, new.values) + filler = libalgos.pad_int64(old.values, new.values) expect_filler = np.array([-1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2], dtype=np.int64) - self.assert_numpy_array_equal(filler, expect_filler) + tm.assert_numpy_array_equal(filler, expect_filler) # corner case old = Index([5, 10]) new = Index(lrange(5)) - filler = _algos.pad_int64(old.values, new.values) + filler = libalgos.pad_int64(old.values, new.values) expect_filler = np.array([-1, -1, -1, -1, -1], dtype=np.int64) - self.assert_numpy_array_equal(filler, expect_filler) + tm.assert_numpy_array_equal(filler, expect_filler) def test_is_lexsorted(): @@ -1148,7 +1231,7 @@ def test_is_lexsorted(): 6, 5, 4, 3, 2, 1, 0])] - assert (not _algos.is_lexsorted(failure)) + assert (not libalgos.is_lexsorted(failure)) # def test_get_group_index(): # a = np.array([0, 1, 2, 0, 2, 1, 0, 0], dtype=np.int64) @@ -1164,7 +1247,7 @@ def test_groupsort_indexer(): a = np.random.randint(0, 1000, 100).astype(np.int64) b = np.random.randint(0, 1000, 100).astype(np.int64) - result = _algos.groupsort_indexer(a, 1000)[0] + result = libalgos.groupsort_indexer(a, 1000)[0] # need to use a stable sort expected = np.argsort(a, kind='mergesort') @@ -1172,7 +1255,7 @@ def test_groupsort_indexer(): # compare with lexsort key = a * 1000 + b - result = _algos.groupsort_indexer(key, 1000000)[0] + result = libalgos.groupsort_indexer(key, 1000000)[0] expected = np.lexsort((b, a)) assert (np.array_equal(result, expected)) @@ -1183,8 +1266,8 @@ def test_infinity_sort(): # itself. Instead, let's give our infinities a self-consistent # ordering, but outside the float extended real line. 
- Inf = _algos.Infinity() - NegInf = _algos.NegInfinity() + Inf = libalgos.Infinity() + NegInf = libalgos.NegInfinity() ref_nums = [NegInf, float("-inf"), -1e100, 0, 1e100, float("inf"), Inf] @@ -1202,14 +1285,14 @@ def test_infinity_sort(): assert sorted(perm) == ref_nums # smoke tests - np.array([_algos.Infinity()] * 32).argsort() - np.array([_algos.NegInfinity()] * 32).argsort() + np.array([libalgos.Infinity()] * 32).argsort() + np.array([libalgos.NegInfinity()] * 32).argsort() def test_ensure_platform_int(): arr = np.arange(100, dtype=np.intp) - result = _algos.ensure_platform_int(arr) + result = libalgos.ensure_platform_int(arr) assert (result is arr) @@ -1219,27 +1302,27 @@ def test_int64_add_overflow(): m = np.iinfo(np.int64).max n = np.iinfo(np.int64).min - with tm.assertRaisesRegexp(OverflowError, msg): + with tm.assert_raises_regex(OverflowError, msg): algos.checked_add_with_arr(np.array([m, m]), m) - with tm.assertRaisesRegexp(OverflowError, msg): + with tm.assert_raises_regex(OverflowError, msg): algos.checked_add_with_arr(np.array([m, m]), np.array([m, m])) - with tm.assertRaisesRegexp(OverflowError, msg): + with tm.assert_raises_regex(OverflowError, msg): algos.checked_add_with_arr(np.array([n, n]), n) - with tm.assertRaisesRegexp(OverflowError, msg): + with tm.assert_raises_regex(OverflowError, msg): algos.checked_add_with_arr(np.array([n, n]), np.array([n, n])) - with tm.assertRaisesRegexp(OverflowError, msg): + with tm.assert_raises_regex(OverflowError, msg): algos.checked_add_with_arr(np.array([m, n]), np.array([n, n])) - with tm.assertRaisesRegexp(OverflowError, msg): + with tm.assert_raises_regex(OverflowError, msg): algos.checked_add_with_arr(np.array([m, m]), np.array([m, m]), arr_mask=np.array([False, True])) - with tm.assertRaisesRegexp(OverflowError, msg): + with tm.assert_raises_regex(OverflowError, msg): algos.checked_add_with_arr(np.array([m, m]), np.array([m, m]), b_mask=np.array([False, True])) - with tm.assertRaisesRegexp(OverflowError, msg): + with tm.assert_raises_regex(OverflowError, msg): algos.checked_add_with_arr(np.array([m, m]), np.array([m, m]), arr_mask=np.array([False, True]), b_mask=np.array([False, True])) - with tm.assertRaisesRegexp(OverflowError, msg): + with tm.assert_raises_regex(OverflowError, msg): with tm.assert_produces_warning(RuntimeWarning): algos.checked_add_with_arr(np.array([m, m]), np.array([np.nan, m])) @@ -1247,31 +1330,48 @@ def test_int64_add_overflow(): # Check that the nan boolean arrays override whether or not # the addition overflows. We don't check the result but just # the fact that an OverflowError is not raised. 
- with tm.assertRaises(AssertionError): - with tm.assertRaisesRegexp(OverflowError, msg): + with pytest.raises(AssertionError): + with tm.assert_raises_regex(OverflowError, msg): algos.checked_add_with_arr(np.array([m, m]), np.array([m, m]), arr_mask=np.array([True, True])) - with tm.assertRaises(AssertionError): - with tm.assertRaisesRegexp(OverflowError, msg): + with pytest.raises(AssertionError): + with tm.assert_raises_regex(OverflowError, msg): algos.checked_add_with_arr(np.array([m, m]), np.array([m, m]), b_mask=np.array([True, True])) - with tm.assertRaises(AssertionError): - with tm.assertRaisesRegexp(OverflowError, msg): + with pytest.raises(AssertionError): + with tm.assert_raises_regex(OverflowError, msg): algos.checked_add_with_arr(np.array([m, m]), np.array([m, m]), arr_mask=np.array([True, False]), b_mask=np.array([False, True])) -class TestMode(tm.TestCase): +class TestMode(object): def test_no_mode(self): exp = Series([], dtype=np.float64) tm.assert_series_equal(algos.mode([]), exp) - exp = Series([], dtype=np.int) + def test_mode_single(self): + # GH 15714 + exp_single = [1] + data_single = [1] + + exp_multi = [1] + data_multi = [1, 1] + + for dt in np.typecodes['AllInteger'] + np.typecodes['Float']: + s = Series(data_single, dtype=dt) + exp = Series(exp_single, dtype=dt) + tm.assert_series_equal(algos.mode(s), exp) + + s = Series(data_multi, dtype=dt) + exp = Series(exp_multi, dtype=dt) + tm.assert_series_equal(algos.mode(s), exp) + + exp = Series([1], dtype=np.int) tm.assert_series_equal(algos.mode([1]), exp) - exp = Series([], dtype=np.object) + exp = Series(['a', 'b', 'c'], dtype=np.object) tm.assert_series_equal(algos.mode(['a', 'b', 'c']), exp) def test_number_mode(self): @@ -1307,7 +1407,8 @@ def test_strobj_mode(self): tm.assert_series_equal(algos.mode(s), exp) def test_datelike_mode(self): - exp = Series([], dtype="M8[ns]") + exp = Series(['1900-05-03', '2011-01-03', + '2013-01-02'], dtype="M8[ns]") s = Series(['2011-01-03', '2013-01-02', '1900-05-03'], dtype='M8[ns]') tm.assert_series_equal(algos.mode(s), exp) @@ -1318,7 +1419,8 @@ def test_datelike_mode(self): tm.assert_series_equal(algos.mode(s), exp) def test_timedelta_mode(self): - exp = Series([], dtype='timedelta64[ns]') + exp = Series(['-1 days', '0 days', '1 days'], + dtype='timedelta64[ns]') s = Series(['1 days', '-1 days', '0 days'], dtype='timedelta64[ns]') tm.assert_series_equal(algos.mode(s), exp) @@ -1338,26 +1440,29 @@ def test_uint64_overflow(self): s = Series([1, 2**63, 2**63], dtype=np.uint64) tm.assert_series_equal(algos.mode(s), exp) - exp = Series([], dtype=np.uint64) + exp = Series([1, 2**63], dtype=np.uint64) s = Series([1, 2**63], dtype=np.uint64) tm.assert_series_equal(algos.mode(s), exp) def test_categorical(self): c = Categorical([1, 2]) - exp = Series([], dtype=np.int64) - tm.assert_series_equal(algos.mode(c), exp) + exp = c + tm.assert_categorical_equal(algos.mode(c), exp) + tm.assert_categorical_equal(c.mode(), exp) c = Categorical([1, 'a', 'a']) - exp = Series(['a'], dtype=object) - tm.assert_series_equal(algos.mode(c), exp) + exp = Categorical(['a'], categories=[1, 'a']) + tm.assert_categorical_equal(algos.mode(c), exp) + tm.assert_categorical_equal(c.mode(), exp) c = Categorical([1, 1, 2, 3, 3]) - exp = Series([1, 3], dtype=np.int64) - tm.assert_series_equal(algos.mode(c), exp) + exp = Categorical([1, 3], categories=[1, 2, 3]) + tm.assert_categorical_equal(algos.mode(c), exp) + tm.assert_categorical_equal(c.mode(), exp) def test_index(self): idx = Index([1, 2, 3]) - exp = 
Series([], dtype=np.int64) + exp = Series([1, 2, 3], dtype=np.int64) tm.assert_series_equal(algos.mode(idx), exp) idx = Index([1, 'a', 'a']) @@ -1372,9 +1477,3 @@ def test_index(self): idx = Index(['1 day', '1 day', '-1 day', '-1 day 2 min', '2 min', '2 min'], dtype='timedelta64[ns]') tm.assert_series_equal(algos.mode(idx), exp) - - -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/test_base.py b/pandas/tests/test_base.py index f750936961831..9e92c7cf1a9b8 100644 --- a/pandas/tests/test_base.py +++ b/pandas/tests/test_base.py @@ -4,21 +4,22 @@ import re import sys from datetime import datetime, timedelta - +import pytest import numpy as np import pandas as pd import pandas.compat as compat -from pandas.types.common import (is_object_dtype, is_datetimetz, - needs_i8_conversion) +from pandas.core.dtypes.common import ( + is_object_dtype, is_datetimetz, + needs_i8_conversion) import pandas.util.testing as tm from pandas import (Series, Index, DatetimeIndex, TimedeltaIndex, PeriodIndex, - Timedelta) -from pandas.compat import u, StringIO + Timedelta, IntervalIndex, Interval) +from pandas.compat import StringIO, PYPY from pandas.compat.numpy import np_array_datetime64_compat -from pandas.core.base import (FrozenList, FrozenNDArray, PandasDelegate, - NoNewAttributesMixin) -from pandas.tseries.base import DatetimeIndexOpsMixin +from pandas.core.base import PandasDelegate, NoNewAttributesMixin +from pandas.core.indexes.datetimelike import DatetimeIndexOpsMixin +from pandas._libs.tslib import iNaT class CheckStringMixin(object): @@ -32,7 +33,7 @@ def test_string_methods_dont_fail(self): def test_tricky_container(self): if not hasattr(self, 'unicode_container'): - raise nose.SkipTest('Need unicode_container to test with this') + pytest.skip('Need unicode_container to test with this') repr(self.unicode_container) str(self.unicode_container) bytes(self.unicode_container) @@ -44,9 +45,10 @@ class CheckImmutable(object): mutable_regex = re.compile('does not support mutable operations') def check_mutable_error(self, *args, **kwargs): - # pass whatever functions you normally would to assertRaises (after the - # Exception kind) - tm.assertRaisesRegexp(TypeError, self.mutable_regex, *args, **kwargs) + # Pass whatever function you normally would to assert_raises_regex + # (after the Exception kind). 
+ tm.assert_raises_regex( + TypeError, self.mutable_regex, *args, **kwargs) def test_no_mutable_funcs(self): def setitem(): @@ -69,6 +71,7 @@ def delslice(): self.check_mutable_error(delslice) mutable_methods = getattr(self, "mutable_methods", []) + for meth in mutable_methods: self.check_mutable_error(getattr(self.container, meth)) @@ -79,74 +82,11 @@ def test_slicing_maintains_type(self): def check_result(self, result, expected, klass=None): klass = klass or self.klass - self.assertIsInstance(result, klass) - self.assertEqual(result, expected) - + assert isinstance(result, klass) + assert result == expected -class TestFrozenList(CheckImmutable, CheckStringMixin, tm.TestCase): - mutable_methods = ('extend', 'pop', 'remove', 'insert') - unicode_container = FrozenList([u("\u05d0"), u("\u05d1"), "c"]) - def setUp(self): - self.lst = [1, 2, 3, 4, 5] - self.container = FrozenList(self.lst) - self.klass = FrozenList - - def test_add(self): - result = self.container + (1, 2, 3) - expected = FrozenList(self.lst + [1, 2, 3]) - self.check_result(result, expected) - - result = (1, 2, 3) + self.container - expected = FrozenList([1, 2, 3] + self.lst) - self.check_result(result, expected) - - def test_inplace(self): - q = r = self.container - q += [5] - self.check_result(q, self.lst + [5]) - # other shouldn't be mutated - self.check_result(r, self.lst) - - -class TestFrozenNDArray(CheckImmutable, CheckStringMixin, tm.TestCase): - mutable_methods = ('put', 'itemset', 'fill') - unicode_container = FrozenNDArray([u("\u05d0"), u("\u05d1"), "c"]) - - def setUp(self): - self.lst = [3, 5, 7, -2] - self.container = FrozenNDArray(self.lst) - self.klass = FrozenNDArray - - def test_shallow_copying(self): - original = self.container.copy() - self.assertIsInstance(self.container.view(), FrozenNDArray) - self.assertFalse(isinstance( - self.container.view(np.ndarray), FrozenNDArray)) - self.assertIsNot(self.container.view(), self.container) - self.assert_numpy_array_equal(self.container, original) - # shallow copy should be the same too - self.assertIsInstance(self.container._shallow_copy(), FrozenNDArray) - - # setting should not be allowed - def testit(container): - container[0] = 16 - - self.check_mutable_error(testit, self.container) - - def test_values(self): - original = self.container.view(np.ndarray).copy() - n = original[0] + 15 - vals = self.container.values() - self.assert_numpy_array_equal(original, vals) - self.assertIsNot(original, vals) - vals[0] = n - self.assertIsInstance(self.container, pd.core.base.FrozenNDArray) - self.assert_numpy_array_equal(self.container.values(), original) - self.assertEqual(vals[0], n) - - -class TestPandasDelegate(tm.TestCase): +class TestPandasDelegate(object): class Delegator(object): _properties = ['foo'] @@ -169,7 +109,7 @@ class Delegate(PandasDelegate): def __init__(self, obj): self.obj = obj - def setUp(self): + def setup_method(self, method): pass def test_invalida_delgation(self): @@ -192,18 +132,19 @@ def test_invalida_delgation(self): def f(): delegate.foo - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) def f(): delegate.foo = 5 - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) def f(): delegate.foo() - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) + @pytest.mark.skipif(PYPY, reason="not relevant for PyPy") def test_memory_usage(self): # Delegate does not implement memory_usage. 
# Check that we fall back to in-built `__sizeof__` @@ -212,7 +153,7 @@ def test_memory_usage(self): sys.getsizeof(delegate) -class Ops(tm.TestCase): +class Ops(object): def _allow_na_ops(self, obj): """Whether to skip test cases including NaN""" @@ -222,7 +163,7 @@ def _allow_na_ops(self, obj): return False return True - def setUp(self): + def setup_method(self, method): self.bool_index = tm.makeBoolIndex(10, name='a') self.int_index = tm.makeIntIndex(10, name='a') self.float_index = tm.makeFloatIndex(10, name='a') @@ -277,22 +218,22 @@ def check_ops_properties(self, props, filter=None, ignore_failures=False): tm.assert_index_equal(result, expected) elif isinstance(result, np.ndarray) and isinstance(expected, np.ndarray): - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) else: - self.assertEqual(result, expected) + assert result == expected # freq raises AttributeError on an Int64Index because its not - # defined we mostly care about Series hwere anyhow + # defined we mostly care about Series here anyhow if not ignore_failures: for o in self.not_valid_objs: # an object that is datetimelike will raise a TypeError, # otherwise an AttributeError if issubclass(type(o), DatetimeIndexOpsMixin): - self.assertRaises(TypeError, lambda: getattr(o, op)) + pytest.raises(TypeError, lambda: getattr(o, op)) else: - self.assertRaises(AttributeError, - lambda: getattr(o, op)) + pytest.raises(AttributeError, + lambda: getattr(o, op)) def test_binary_ops_docs(self): from pandas import DataFrame, Panel @@ -310,19 +251,17 @@ def test_binary_ops_docs(self): operand2 = 'other' op = op_map[op_name] expected_str = ' '.join([operand1, op, operand2]) - self.assertTrue(expected_str in getattr(klass, - op_name).__doc__) + assert expected_str in getattr(klass, op_name).__doc__ # reverse version of the binary ops expected_str = ' '.join([operand2, op, operand1]) - self.assertTrue(expected_str in getattr(klass, 'r' + - op_name).__doc__) + assert expected_str in getattr(klass, 'r' + op_name).__doc__ class TestIndexOps(Ops): - def setUp(self): - super(TestIndexOps, self).setUp() + def setup_method(self, method): + super(TestIndexOps, self).setup_method(method) self.is_valid_objs = [o for o in self.objs if o._allow_index_ops] self.not_valid_objs = [o for o in self.objs if not o._allow_index_ops] @@ -337,55 +276,57 @@ def test_none_comparison(self): # noinspection PyComparisonWithNone result = o == None # noqa - self.assertFalse(result.iat[0]) - self.assertFalse(result.iat[1]) + assert not result.iat[0] + assert not result.iat[1] # noinspection PyComparisonWithNone result = o != None # noqa - self.assertTrue(result.iat[0]) - self.assertTrue(result.iat[1]) + assert result.iat[0] + assert result.iat[1] result = None == o # noqa - self.assertFalse(result.iat[0]) - self.assertFalse(result.iat[1]) + assert not result.iat[0] + assert not result.iat[1] # this fails for numpy < 1.9 # and oddly for *some* platforms # result = None != o # noqa - # self.assertTrue(result.iat[0]) - # self.assertTrue(result.iat[1]) + # assert result.iat[0] + # assert result.iat[1] result = None > o - self.assertFalse(result.iat[0]) - self.assertFalse(result.iat[1]) + assert not result.iat[0] + assert not result.iat[1] result = o < None - self.assertFalse(result.iat[0]) - self.assertFalse(result.iat[1]) + assert not result.iat[0] + assert not result.iat[1] def test_ndarray_compat_properties(self): for o in self.objs: + # Check that we work. 
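The loop that follows checks the ndarray-compatibility surface shared by `Index` and `Series`. A standalone illustration of the same properties (assumes only pandas):

```python
import pandas as pd

idx = pd.Index([1, 2, 3])
assert idx.ndim == 1 and idx.size == 3 and idx.shape == (3,)
# .item() extracts a scalar and is valid only for length-1 containers:
assert pd.Index([1]).item() == 1
```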
+ for p in ['shape', 'dtype', 'flags', 'T', + 'strides', 'itemsize', 'nbytes']: + assert getattr(o, p, None) is not None - # check that we work - for p in ['shape', 'dtype', 'flags', 'T', 'strides', 'itemsize', - 'nbytes']: - self.assertIsNotNone(getattr(o, p, None)) - self.assertTrue(hasattr(o, 'base')) + assert hasattr(o, 'base') - # if we have a datetimelike dtype then needs a view to work + # If we have a datetime-like dtype then needs a view to work # but the user is responsible for that try: - self.assertIsNotNone(o.data) + assert o.data is not None except ValueError: pass - self.assertRaises(ValueError, o.item) # len > 1 - self.assertEqual(o.ndim, 1) - self.assertEqual(o.size, len(o)) + with pytest.raises(ValueError): + o.item() # len > 1 + + assert o.ndim == 1 + assert o.size == len(o) - self.assertEqual(Index([1]).item(), 1) - self.assertEqual(Series([1]).item(), 1) + assert Index([1]).item() == 1 + assert Series([1]).item() == 1 def test_ops(self): for op in ['max', 'min']: @@ -397,12 +338,12 @@ def test_ops(self): expected = pd.Period(ordinal=getattr(o._values, op)(), freq=o.freq) try: - self.assertEqual(result, expected) + assert result == expected except TypeError: # comparing tz-aware series with np.array results in # TypeError expected = expected.astype('M8[ns]').astype('int64') - self.assertEqual(result.value, expected) + assert result.value == expected def test_nanops(self): # GH 7261 @@ -410,43 +351,43 @@ def test_nanops(self): for klass in [Index, Series]: obj = klass([np.nan, 2.0]) - self.assertEqual(getattr(obj, op)(), 2.0) + assert getattr(obj, op)() == 2.0 obj = klass([np.nan]) - self.assertTrue(pd.isnull(getattr(obj, op)())) + assert pd.isna(getattr(obj, op)()) obj = klass([]) - self.assertTrue(pd.isnull(getattr(obj, op)())) + assert pd.isna(getattr(obj, op)()) obj = klass([pd.NaT, datetime(2011, 11, 1)]) # check DatetimeIndex monotonic path - self.assertEqual(getattr(obj, op)(), datetime(2011, 11, 1)) + assert getattr(obj, op)() == datetime(2011, 11, 1) obj = klass([pd.NaT, datetime(2011, 11, 1), pd.NaT]) # check DatetimeIndex non-monotonic path - self.assertEqual(getattr(obj, op)(), datetime(2011, 11, 1)) + assert getattr(obj, op)(), datetime(2011, 11, 1) # argmin/max obj = Index(np.arange(5, dtype='int64')) - self.assertEqual(obj.argmin(), 0) - self.assertEqual(obj.argmax(), 4) + assert obj.argmin() == 0 + assert obj.argmax() == 4 obj = Index([np.nan, 1, np.nan, 2]) - self.assertEqual(obj.argmin(), 1) - self.assertEqual(obj.argmax(), 3) + assert obj.argmin() == 1 + assert obj.argmax() == 3 obj = Index([np.nan]) - self.assertEqual(obj.argmin(), -1) - self.assertEqual(obj.argmax(), -1) + assert obj.argmin() == -1 + assert obj.argmax() == -1 obj = Index([pd.NaT, datetime(2011, 11, 1), datetime(2011, 11, 2), pd.NaT]) - self.assertEqual(obj.argmin(), 1) - self.assertEqual(obj.argmax(), 2) + assert obj.argmin() == 1 + assert obj.argmax() == 2 obj = Index([pd.NaT]) - self.assertEqual(obj.argmin(), -1) - self.assertEqual(obj.argmax(), -1) + assert obj.argmin() == -1 + assert obj.argmax() == -1 def test_value_counts_unique_nunique(self): for orig in self.objs: @@ -474,31 +415,31 @@ def test_value_counts_unique_nunique(self): o = klass(rep, index=idx, name='a') # check values has the same dtype as the original - self.assertEqual(o.dtype, orig.dtype) + assert o.dtype == orig.dtype expected_s = Series(range(10, 0, -1), index=expected_index, dtype='int64', name='a') result = o.value_counts() tm.assert_series_equal(result, expected_s) - self.assertTrue(result.index.name is 
None) - self.assertEqual(result.name, 'a') + assert result.index.name is None + assert result.name == 'a' result = o.unique() if isinstance(o, Index): - self.assertTrue(isinstance(result, o.__class__)) - self.assert_index_equal(result, orig) + assert isinstance(result, o.__class__) + tm.assert_index_equal(result, orig) elif is_datetimetz(o): # datetimetz Series returns array of Timestamp - self.assertEqual(result[0], orig[0]) + assert result[0] == orig[0] for r in result: - self.assertIsInstance(r, pd.Timestamp) + assert isinstance(r, pd.Timestamp) tm.assert_numpy_array_equal(result, orig._values.asobject.values) else: tm.assert_numpy_array_equal(result, orig.values) - self.assertEqual(o.nunique(), len(np.unique(o.values))) + assert o.nunique() == len(np.unique(o.values)) def test_value_counts_unique_nunique_null(self): @@ -515,21 +456,21 @@ def test_value_counts_unique_nunique_null(self): if is_datetimetz(o): if isinstance(o, DatetimeIndex): v = o.asi8 - v[0:2] = pd.tslib.iNaT + v[0:2] = iNaT values = o._shallow_copy(v) else: o = o.copy() - o[0:2] = pd.tslib.iNaT + o[0:2] = iNaT values = o._values elif needs_i8_conversion(o): - values[0:2] = pd.tslib.iNaT + values[0:2] = iNaT values = o._shallow_copy(values) else: values[0:2] = null_obj # check values has the same dtype as the original - self.assertEqual(values.dtype, o.dtype) + assert values.dtype == o.dtype # create repeated values, 'n'th element is repeated by n+1 # times @@ -550,15 +491,15 @@ def test_value_counts_unique_nunique_null(self): o.name = 'a' # check values has the same dtype as the original - self.assertEqual(o.dtype, orig.dtype) + assert o.dtype == orig.dtype # check values correctly have NaN nanloc = np.zeros(len(o), dtype=np.bool) nanloc[:3] = True if isinstance(o, Index): - self.assert_numpy_array_equal(pd.isnull(o), nanloc) + tm.assert_numpy_array_equal(pd.isna(o), nanloc) else: exp = pd.Series(nanloc, o.index, name='a') - self.assert_series_equal(pd.isnull(o), exp) + tm.assert_series_equal(pd.isna(o), exp) expected_s_na = Series(list(range(10, 2, -1)) + [3], index=expected_index[9:0:-1], @@ -569,12 +510,12 @@ def test_value_counts_unique_nunique_null(self): result_s_na = o.value_counts(dropna=False) tm.assert_series_equal(result_s_na, expected_s_na) - self.assertTrue(result_s_na.index.name is None) - self.assertEqual(result_s_na.name, 'a') + assert result_s_na.index.name is None + assert result_s_na.name == 'a' result_s = o.value_counts() tm.assert_series_equal(o.value_counts(), expected_s) - self.assertTrue(result_s.index.name is None) - self.assertEqual(result_s.name, 'a') + assert result_s.index.name is None + assert result_s.name == 'a' result = o.unique() if isinstance(o, Index): @@ -584,15 +525,15 @@ def test_value_counts_unique_nunique_null(self): # unable to compare NaT / nan tm.assert_numpy_array_equal(result[1:], values[2:].asobject.values) - self.assertIs(result[0], pd.NaT) + assert result[0] is pd.NaT else: tm.assert_numpy_array_equal(result[1:], values[2:]) - self.assertTrue(pd.isnull(result[0])) - self.assertEqual(result.dtype, orig.dtype) + assert pd.isna(result[0]) + assert result.dtype == orig.dtype - self.assertEqual(o.nunique(), 8) - self.assertEqual(o.nunique(dropna=False), 9) + assert o.nunique() == 8 + assert o.nunique(dropna=False) == 9 def test_value_counts_inferred(self): klasses = [Index, Series] @@ -609,7 +550,7 @@ def test_value_counts_inferred(self): exp = np.unique(np.array(s_values, dtype=np.object_)) tm.assert_numpy_array_equal(s.unique(), exp) - self.assertEqual(s.nunique(), 4) + 
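As background for these assertions, `value_counts`, `unique`, and `nunique` are mutually consistent by construction. A quick standalone sketch:

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"])
counts = s.value_counts()          # one row per distinct value
assert counts.sum() == len(s)
assert s.nunique() == len(s.unique()) == 3
```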
assert s.nunique() == 4 # don't sort, have to sort after the fact as not sorting is # platform-dep hist = s.value_counts(sort=False).sort_values() @@ -633,15 +574,14 @@ def test_value_counts_bins(self): s = klass(s_values) # bins - self.assertRaises(TypeError, - lambda bins: s.value_counts(bins=bins), 1) + pytest.raises(TypeError, lambda bins: s.value_counts(bins=bins), 1) s1 = Series([1, 1, 2, 3]) res1 = s1.value_counts(bins=1) - exp1 = Series({0.998: 4}) + exp1 = Series({Interval(0.997, 3.0): 4}) tm.assert_series_equal(res1, exp1) res1n = s1.value_counts(bins=1, normalize=True) - exp1n = Series({0.998: 1.0}) + exp1n = Series({Interval(0.997, 3.0): 1.0}) tm.assert_series_equal(res1n, exp1n) if isinstance(s1, Index): @@ -650,20 +590,22 @@ def test_value_counts_bins(self): exp = np.array([1, 2, 3], dtype=np.int64) tm.assert_numpy_array_equal(s1.unique(), exp) - self.assertEqual(s1.nunique(), 3) + assert s1.nunique() == 3 + + # these return the same + res4 = s1.value_counts(bins=4, dropna=True) + intervals = IntervalIndex.from_breaks([0.997, 1.5, 2.0, 2.5, 3.0]) + exp4 = Series([2, 1, 1, 0], index=intervals.take([0, 3, 1, 2])) + tm.assert_series_equal(res4, exp4) - res4 = s1.value_counts(bins=4) - exp4 = Series({0.998: 2, - 1.5: 1, - 2.0: 0, - 2.5: 1}, index=[0.998, 2.5, 1.5, 2.0]) + res4 = s1.value_counts(bins=4, dropna=False) + intervals = IntervalIndex.from_breaks([0.997, 1.5, 2.0, 2.5, 3.0]) + exp4 = Series([2, 1, 1, 0], index=intervals.take([0, 3, 1, 2])) tm.assert_series_equal(res4, exp4) + res4n = s1.value_counts(bins=4, normalize=True) - exp4n = Series( - {0.998: 0.5, - 1.5: 0.25, - 2.0: 0.0, - 2.5: 0.25}, index=[0.998, 2.5, 1.5, 2.0]) + exp4n = Series([0.5, 0.25, 0.25, 0], + index=intervals.take([0, 3, 1, 2])) tm.assert_series_equal(res4n, exp4n) # handle NA's properly @@ -679,7 +621,7 @@ def test_value_counts_bins(self): else: exp = np.array(['a', 'b', np.nan, 'd'], dtype=object) tm.assert_numpy_array_equal(s.unique(), exp) - self.assertEqual(s.nunique(), 3) + assert s.nunique() == 3 s = klass({}) expected = Series([], dtype=np.int64) @@ -687,13 +629,12 @@ def test_value_counts_bins(self): check_index_type=False) # returned dtype differs depending on original if isinstance(s, Index): - self.assert_index_equal(s.unique(), Index([]), - exact=False) + tm.assert_index_equal(s.unique(), Index([]), exact=False) else: - self.assert_numpy_array_equal(s.unique(), np.array([]), - check_dtype=False) + tm.assert_numpy_array_equal(s.unique(), np.array([]), + check_dtype=False) - self.assertEqual(s.nunique(), 0) + assert s.nunique() == 0 def test_value_counts_datetime64(self): klasses = [Index, Series] @@ -726,14 +667,14 @@ def test_value_counts_datetime64(self): else: tm.assert_numpy_array_equal(s.unique(), expected) - self.assertEqual(s.nunique(), 3) + assert s.nunique() == 3 # with NaT s = df['dt'].copy() s = klass([v for v in s.values] + [pd.NaT]) result = s.value_counts() - self.assertEqual(result.index.dtype, 'datetime64[ns]') + assert result.index.dtype == 'datetime64[ns]' tm.assert_series_equal(result, expected_s) result = s.value_counts(dropna=False) @@ -741,7 +682,7 @@ def test_value_counts_datetime64(self): tm.assert_series_equal(result, expected_s) unique = s.unique() - self.assertEqual(unique.dtype, 'datetime64[ns]') + assert unique.dtype == 'datetime64[ns]' # numpy_array_equal cannot compare pd.NaT if isinstance(s, Index): @@ -749,10 +690,10 @@ def test_value_counts_datetime64(self): tm.assert_index_equal(unique, exp_idx) else: tm.assert_numpy_array_equal(unique[:3], expected) - 
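The bin-related expectations above encode this patch's behavior change: binned counts are now indexed by `Interval` objects rather than bare bin edges. A standalone sketch mirroring the `Series([1, 1, 2, 3])` case from the test:

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3])
res = s.value_counts(bins=1)
# A single bin spanning all values; its label is an Interval, not a float.
assert res.iloc[0] == 4
assert isinstance(res.index[0], pd.Interval)
```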
self.assertTrue(pd.isnull(unique[3])) + assert pd.isna(unique[3]) - self.assertEqual(s.nunique(), 3) - self.assertEqual(s.nunique(dropna=False), 4) + assert s.nunique() == 3 + assert s.nunique(dropna=False) == 4 # timedelta64[ns] td = df.dt - df.dt + timedelta(1) @@ -786,14 +727,14 @@ def test_factorize(self): exp_uniques = o labels, uniques = o.factorize() - self.assert_numpy_array_equal(labels, exp_arr) + tm.assert_numpy_array_equal(labels, exp_arr) if isinstance(o, Series): - self.assert_index_equal(uniques, Index(orig), - check_names=False) + tm.assert_index_equal(uniques, Index(orig), + check_names=False) else: # factorize explicitly resets name - self.assert_index_equal(uniques, exp_uniques, - check_names=False) + tm.assert_index_equal(uniques, exp_uniques, + check_names=False) def test_factorize_repeated(self): for orig in self.objs: @@ -816,24 +757,24 @@ def test_factorize_repeated(self): dtype=np.intp) labels, uniques = n.factorize(sort=True) - self.assert_numpy_array_equal(labels, exp_arr) + tm.assert_numpy_array_equal(labels, exp_arr) if isinstance(o, Series): - self.assert_index_equal(uniques, Index(orig).sort_values(), - check_names=False) + tm.assert_index_equal(uniques, Index(orig).sort_values(), + check_names=False) else: - self.assert_index_equal(uniques, o, check_names=False) + tm.assert_index_equal(uniques, o, check_names=False) exp_arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4], np.intp) labels, uniques = n.factorize(sort=False) - self.assert_numpy_array_equal(labels, exp_arr) + tm.assert_numpy_array_equal(labels, exp_arr) if isinstance(o, Series): expected = Index(o.iloc[5:10].append(o.iloc[:5])) - self.assert_index_equal(uniques, expected, check_names=False) + tm.assert_index_equal(uniques, expected, check_names=False) else: expected = o[5:10].append(o[:5]) - self.assert_index_equal(uniques, expected, check_names=False) + tm.assert_index_equal(uniques, expected, check_names=False) def test_duplicated_drop_duplicates_index(self): # GH 4060 @@ -851,13 +792,13 @@ def test_duplicated_drop_duplicates_index(self): expected = np.array([False] * len(original), dtype=bool) duplicated = original.duplicated() tm.assert_numpy_array_equal(duplicated, expected) - self.assertTrue(duplicated.dtype == bool) + assert duplicated.dtype == bool result = original.drop_duplicates() tm.assert_index_equal(result, original) - self.assertFalse(result is original) + assert result is not original # has_duplicates - self.assertFalse(original.has_duplicates) + assert not original.has_duplicates # create repeated values, 3rd and 5th values are duplicated idx = original[list(range(len(original))) + [5, 3]] @@ -865,7 +806,7 @@ def test_duplicated_drop_duplicates_index(self): dtype=bool) duplicated = idx.duplicated() tm.assert_numpy_array_equal(duplicated, expected) - self.assertTrue(duplicated.dtype == bool) + assert duplicated.dtype == bool tm.assert_index_equal(idx.drop_duplicates(), original) base = [False] * len(idx) @@ -875,19 +816,10 @@ def test_duplicated_drop_duplicates_index(self): duplicated = idx.duplicated(keep='last') tm.assert_numpy_array_equal(duplicated, expected) - self.assertTrue(duplicated.dtype == bool) + assert duplicated.dtype == bool result = idx.drop_duplicates(keep='last') tm.assert_index_equal(result, idx[~expected]) - # deprecate take_last - with tm.assert_produces_warning(FutureWarning): - duplicated = idx.duplicated(take_last=True) - tm.assert_numpy_array_equal(duplicated, expected) - self.assertTrue(duplicated.dtype == bool) - with 
tm.assert_produces_warning(FutureWarning): - result = idx.drop_duplicates(take_last=True) - tm.assert_index_equal(result, idx[~expected]) - base = [False] * len(original) + [True, True] base[3] = True base[5] = True @@ -895,11 +827,11 @@ def test_duplicated_drop_duplicates_index(self): duplicated = idx.duplicated(keep=False) tm.assert_numpy_array_equal(duplicated, expected) - self.assertTrue(duplicated.dtype == bool) + assert duplicated.dtype == bool result = idx.drop_duplicates(keep=False) tm.assert_index_equal(result, idx[~expected]) - with tm.assertRaisesRegexp( + with tm.assert_raises_regex( TypeError, r"drop_duplicates\(\) got an unexpected " "keyword argument"): idx.drop_duplicates(inplace=True) @@ -910,7 +842,7 @@ def test_duplicated_drop_duplicates_index(self): tm.assert_series_equal(original.duplicated(), expected) result = original.drop_duplicates() tm.assert_series_equal(result, original) - self.assertFalse(result is original) + assert result is not original idx = original.index[list(range(len(original))) + [5, 3]] values = original._values[list(range(len(original))) + [5, 3]] @@ -930,13 +862,6 @@ def test_duplicated_drop_duplicates_index(self): tm.assert_series_equal(s.drop_duplicates(keep='last'), s[~np.array(base)]) - # deprecate take_last - with tm.assert_produces_warning(FutureWarning): - tm.assert_series_equal( - s.duplicated(take_last=True), expected) - with tm.assert_produces_warning(FutureWarning): - tm.assert_series_equal(s.drop_duplicates(take_last=True), - s[~np.array(base)]) base = [False] * len(original) + [True, True] base[3] = True base[5] = True @@ -977,11 +902,11 @@ def test_fillna(self): # values will not be changed result = o.fillna(o.astype(object).values[0]) if isinstance(o, Index): - self.assert_index_equal(o, result) + tm.assert_index_equal(o, result) else: - self.assert_series_equal(o, result) + tm.assert_series_equal(o, result) # check shallow_copied - self.assertFalse(o is result) + assert o is not result for null_obj in [np.nan, None]: for orig in self.objs: @@ -1007,16 +932,17 @@ def test_fillna(self): o = klass(values) # check values has the same dtype as the original - self.assertEqual(o.dtype, orig.dtype) + assert o.dtype == orig.dtype result = o.fillna(fill_value) if isinstance(o, Index): - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) else: - self.assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) # check shallow_copied - self.assertFalse(o is result) + assert o is not result + @pytest.mark.skipif(PYPY, reason="not relevant for PyPy") def test_memory_usage(self): for o in self.objs: res = o.memory_usage() @@ -1025,36 +951,34 @@ def test_memory_usage(self): if (is_object_dtype(o) or (isinstance(o, Series) and is_object_dtype(o.index))): # if there are objects, only deep will pick them up - self.assertTrue(res_deep > res) + assert res_deep > res else: - self.assertEqual(res, res_deep) + assert res == res_deep if isinstance(o, Series): - self.assertEqual( - (o.memory_usage(index=False) + - o.index.memory_usage()), - o.memory_usage(index=True) - ) + assert ((o.memory_usage(index=False) + + o.index.memory_usage()) == + o.memory_usage(index=True)) # sys.getsizeof will call the .memory_usage with # deep=True, and add on some GC overhead diff = res_deep - sys.getsizeof(o) - self.assertTrue(abs(diff) < 100) + assert abs(diff) < 100 def test_searchsorted(self): # See gh-12238 for o in self.objs: index = np.searchsorted(o, max(o)) - self.assertTrue(0 <= index <= len(o)) + assert 0 <= index 
<= len(o) index = np.searchsorted(o, max(o), sorter=range(len(o))) - self.assertTrue(0 <= index <= len(o)) + assert 0 <= index <= len(o) def test_validate_bool_args(self): invalid_values = [1, "True", [1, 2, 3], 5.0] for value in invalid_values: - with self.assertRaises(ValueError): + with pytest.raises(ValueError): self.int_series.drop_duplicates(inplace=value) @@ -1070,10 +994,10 @@ def test_transpose(self): def test_transpose_non_default_axes(self): for obj in self.objs: - tm.assertRaisesRegexp(ValueError, self.errmsg, - obj.transpose, 1) - tm.assertRaisesRegexp(ValueError, self.errmsg, - obj.transpose, axes=1) + tm.assert_raises_regex(ValueError, self.errmsg, + obj.transpose, 1) + tm.assert_raises_regex(ValueError, self.errmsg, + obj.transpose, axes=1) def test_numpy_transpose(self): for obj in self.objs: @@ -1082,34 +1006,28 @@ def test_numpy_transpose(self): else: tm.assert_series_equal(np.transpose(obj), obj) - tm.assertRaisesRegexp(ValueError, self.errmsg, - np.transpose, obj, axes=1) + tm.assert_raises_regex(ValueError, self.errmsg, + np.transpose, obj, axes=1) -class TestNoNewAttributesMixin(tm.TestCase): +class TestNoNewAttributesMixin(object): def test_mixin(self): class T(NoNewAttributesMixin): pass t = T() - self.assertFalse(hasattr(t, "__frozen")) + assert not hasattr(t, "__frozen") + t.a = "test" - self.assertEqual(t.a, "test") + assert t.a == "test" + t._freeze() - # self.assertTrue("__frozen" not in dir(t)) - self.assertIs(getattr(t, "__frozen"), True) + assert "__frozen" in dir(t) + assert getattr(t, "__frozen") def f(): t.b = "test" - self.assertRaises(AttributeError, f) - self.assertFalse(hasattr(t, "b")) - - -if __name__ == '__main__': - import nose - - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - # '--with-coverage', '--cover-package=pandas.core'], - exit=False) + pytest.raises(AttributeError, f) + assert not hasattr(t, "b") diff --git a/pandas/tests/test_categorical.py b/pandas/tests/test_categorical.py index 745914d3e7ef5..7bbe220378993 100644 --- a/pandas/tests/test_categorical.py +++ b/pandas/tests/test_categorical.py @@ -1,40 +1,42 @@ # -*- coding: utf-8 -*- # pylint: disable=E1101,E1103,W0232 +from warnings import catch_warnings +import pytest import sys from datetime import datetime from distutils.version import LooseVersion import numpy as np -from pandas.types.dtypes import CategoricalDtype -from pandas.types.common import (is_categorical_dtype, - is_object_dtype, - is_float_dtype, - is_integer_dtype) +from pandas.core.dtypes.dtypes import CategoricalDtype +from pandas.core.dtypes.common import ( + is_categorical_dtype, + is_float_dtype, + is_integer_dtype) import pandas as pd import pandas.compat as compat import pandas.util.testing as tm -from pandas import (Categorical, Index, Series, DataFrame, PeriodIndex, - Timestamp, CategoricalIndex, isnull) -from pandas.compat import range, lrange, u, PY3 +from pandas import (Categorical, Index, Series, DataFrame, + Timestamp, CategoricalIndex, isna, + date_range, DatetimeIndex, + period_range, PeriodIndex, + timedelta_range, TimedeltaIndex, NaT, + Interval, IntervalIndex) +from pandas.compat import range, lrange, u, PY3, PYPY from pandas.core.config import option_context -# GH 12066 -# flake8: noqa +class TestCategorical(object): -class TestCategorical(tm.TestCase): - _multiprocess_can_split_ = True - - def setUp(self): + def setup_method(self, method): self.factor = Categorical(['a', 'b', 'b', 'a', 'a', 'c', 'c', 'c'], ordered=True) def test_getitem(self): - 
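From here the patch moves into `test_categorical.py`. As a refresher on the semantics under test: indexing a `Categorical` returns category values, and item assignment must use an existing category. A standalone sketch:

```python
import pandas as pd

c = pd.Categorical(["a", "b", "b", "a"], ordered=True)
assert c[0] == "a" and c[-1] == "a"
c[0] = "b"                     # allowed: "b" is already a category
assert list(c) == ["b", "b", "b", "a"]
```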
self.assertEqual(self.factor[0], 'a') - self.assertEqual(self.factor[-1], 'c') + assert self.factor[0] == 'a' + assert self.factor[-1] == 'c' subf = self.factor[[0, 1, 2]] tm.assert_numpy_array_equal(subf._codes, @@ -52,7 +54,7 @@ def test_getitem_listlike(self): c = Categorical(np.random.randint(0, 5, size=150000).astype(np.int8)) result = c.codes[np.array([100000]).astype(np.int64)] expected = c[np.array([100000]).astype(np.int64)].codes - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) def test_getitem_category_type(self): # GH 14580 @@ -80,9 +82,9 @@ def test_setitem(self): # int/positional c = self.factor.copy() c[0] = 'b' - self.assertEqual(c[0], 'b') + assert c[0] == 'b' c[-1] = 'a' - self.assertEqual(c[-1], 'a') + assert c[-1] == 'a' # boolean c = self.factor.copy() @@ -93,7 +95,7 @@ def test_setitem(self): expected = Categorical(['c', 'b', 'b', 'a', 'a', 'c', 'c', 'c'], ordered=True) - self.assert_categorical_equal(c, expected) + tm.assert_categorical_equal(c, expected) def test_setitem_listlike(self): @@ -108,19 +110,39 @@ def test_setitem_listlike(self): # we are asserting the code result here # which maps to the -1000 category result = c.codes[np.array([100000]).astype(np.int64)] - self.assertEqual(result, np.array([5], dtype='int8')) + tm.assert_numpy_array_equal(result, np.array([5], dtype='int8')) + + def test_constructor_empty(self): + # GH 17248 + c = Categorical([]) + expected = Index([]) + tm.assert_index_equal(c.categories, expected) + + c = Categorical([], categories=[1, 2, 3]) + expected = pd.Int64Index([1, 2, 3]) + tm.assert_index_equal(c.categories, expected) def test_constructor_unsortable(self): # it works! arr = np.array([1, 2, 3, datetime.now()], dtype='O') factor = Categorical(arr, ordered=False) - self.assertFalse(factor.ordered) + assert not factor.ordered # this however will raise as cannot be sorted - self.assertRaises( + pytest.raises( TypeError, lambda: Categorical(arr, ordered=True)) + def test_constructor_interval(self): + result = Categorical([Interval(1, 2), Interval(2, 3), Interval(3, 6)], + ordered=True) + ii = IntervalIndex.from_intervals([Interval(1, 2), + Interval(2, 3), + Interval(3, 6)]) + exp = Categorical(ii, ordered=True) + tm.assert_categorical_equal(result, exp) + tm.assert_index_equal(result.categories, ii) + def test_is_equal_dtype(self): # test dtype comparisons between cats @@ -128,48 +150,42 @@ def test_is_equal_dtype(self): c1 = Categorical(list('aabca'), categories=list('abc'), ordered=False) c2 = Categorical(list('aabca'), categories=list('cab'), ordered=False) c3 = Categorical(list('aabca'), categories=list('cab'), ordered=True) - self.assertTrue(c1.is_dtype_equal(c1)) - self.assertTrue(c2.is_dtype_equal(c2)) - self.assertTrue(c3.is_dtype_equal(c3)) - self.assertFalse(c1.is_dtype_equal(c2)) - self.assertFalse(c1.is_dtype_equal(c3)) - self.assertFalse(c1.is_dtype_equal(Index(list('aabca')))) - self.assertFalse(c1.is_dtype_equal(c1.astype(object))) - self.assertTrue(c1.is_dtype_equal(CategoricalIndex(c1))) - self.assertFalse(c1.is_dtype_equal( + assert c1.is_dtype_equal(c1) + assert c2.is_dtype_equal(c2) + assert c3.is_dtype_equal(c3) + assert not c1.is_dtype_equal(c2) + assert not c1.is_dtype_equal(c3) + assert not c1.is_dtype_equal(Index(list('aabca'))) + assert not c1.is_dtype_equal(c1.astype(object)) + assert c1.is_dtype_equal(CategoricalIndex(c1)) + assert not (c1.is_dtype_equal( CategoricalIndex(c1, categories=list('cab')))) - 
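For reference, `is_dtype_equal` compares both the category values and their order (plus the `ordered` flag), which is why the reordered variants above compare unequal. A standalone sketch:

```python
import pandas as pd

c1 = pd.Categorical(list("aabca"), categories=list("abc"), ordered=False)
c2 = pd.Categorical(list("aabca"), categories=list("cab"), ordered=False)
assert c1.is_dtype_equal(c1)
assert not c1.is_dtype_equal(c2)   # same categories, different order
```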
self.assertFalse(c1.is_dtype_equal(CategoricalIndex(c1, ordered=True))) + assert not c1.is_dtype_equal(CategoricalIndex(c1, ordered=True)) def test_constructor(self): exp_arr = np.array(["a", "b", "c", "a", "b", "c"], dtype=np.object_) c1 = Categorical(exp_arr) - self.assert_numpy_array_equal(c1.__array__(), exp_arr) + tm.assert_numpy_array_equal(c1.__array__(), exp_arr) c2 = Categorical(exp_arr, categories=["a", "b", "c"]) - self.assert_numpy_array_equal(c2.__array__(), exp_arr) + tm.assert_numpy_array_equal(c2.__array__(), exp_arr) c2 = Categorical(exp_arr, categories=["c", "b", "a"]) - self.assert_numpy_array_equal(c2.__array__(), exp_arr) + tm.assert_numpy_array_equal(c2.__array__(), exp_arr) # categories must be unique def f(): Categorical([1, 2], [1, 2, 2]) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) def f(): Categorical(["a", "b"], ["a", "b", "b"]) - self.assertRaises(ValueError, f) - - def f(): - with tm.assert_produces_warning(FutureWarning): - Categorical([1, 2], [1, 2, np.nan, np.nan]) - - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # The default should be unordered c1 = Categorical(["a", "b", "c", "a"]) - self.assertFalse(c1.ordered) + assert not c1.ordered # Categorical as input c1 = Categorical(["a", "b", "c", "a"]) @@ -186,8 +202,8 @@ def f(): c1 = Categorical(["a", "b", "c", "a"], categories=["a", "c", "b"]) c2 = Categorical(c1, categories=["a", "b", "c"]) - self.assert_numpy_array_equal(c1.__array__(), c2.__array__()) - self.assert_index_equal(c2.categories, Index(["a", "b", "c"])) + tm.assert_numpy_array_equal(c1.__array__(), c2.__array__()) + tm.assert_index_equal(c2.categories, Index(["a", "b", "c"])) # Series of dtype category c1 = Categorical(["a", "b", "c", "a"], categories=["a", "b", "c", "d"]) @@ -210,68 +226,45 @@ def f(): # This should result in integer categories, not float! cat = pd.Categorical([1, 2, 3, np.nan], categories=[1, 2, 3]) - self.assertTrue(is_integer_dtype(cat.categories)) + assert is_integer_dtype(cat.categories) # https://github.com/pandas-dev/pandas/issues/3678 cat = pd.Categorical([np.nan, 1, 2, 3]) - self.assertTrue(is_integer_dtype(cat.categories)) + assert is_integer_dtype(cat.categories) # this should result in floats cat = pd.Categorical([np.nan, 1, 2., 3]) - self.assertTrue(is_float_dtype(cat.categories)) + assert is_float_dtype(cat.categories) cat = pd.Categorical([np.nan, 1., 2., 3.]) - self.assertTrue(is_float_dtype(cat.categories)) - - # Deprecating NaNs in categoires (GH #10748) - # preserve int as far as possible by converting to object if NaN is in - # categories - with tm.assert_produces_warning(FutureWarning): - cat = pd.Categorical([np.nan, 1, 2, 3], - categories=[np.nan, 1, 2, 3]) - self.assertTrue(is_object_dtype(cat.categories)) + assert is_float_dtype(cat.categories) # This doesn't work -> this would probably need some kind of "remember # the original type" feature to try to cast the array interface result # to... 
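The removals above retire the deprecated NaN-in-categories support; elsewhere in this patch the constructor is changed to raise `ValueError` for null categories, while nulls among the *values* remain representable as the sentinel code -1. A standalone sketch:

```python
import numpy as np
import pandas as pd

c = pd.Categorical([np.nan, 1, 2, 3])
assert c.codes[0] == -1            # a null value maps to code -1
# A null *category* is rejected outright after this change:
try:
    pd.Categorical([np.nan, "a"], categories=[np.nan, "a"])
except ValueError:
    pass
```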
- # vals = np.asarray(cat[cat.notnull()]) - # self.assertTrue(is_integer_dtype(vals)) - with tm.assert_produces_warning(FutureWarning): - cat = pd.Categorical([np.nan, "a", "b", "c"], - categories=[np.nan, "a", "b", "c"]) - self.assertTrue(is_object_dtype(cat.categories)) - # but don't do it for floats - with tm.assert_produces_warning(FutureWarning): - cat = pd.Categorical([np.nan, 1., 2., 3.], - categories=[np.nan, 1., 2., 3.]) - self.assertTrue(is_float_dtype(cat.categories)) + # vals = np.asarray(cat[cat.notna()]) + # assert is_integer_dtype(vals) # corner cases cat = pd.Categorical([1]) - self.assertTrue(len(cat.categories) == 1) - self.assertTrue(cat.categories[0] == 1) - self.assertTrue(len(cat.codes) == 1) - self.assertTrue(cat.codes[0] == 0) + assert len(cat.categories) == 1 + assert cat.categories[0] == 1 + assert len(cat.codes) == 1 + assert cat.codes[0] == 0 cat = pd.Categorical(["a"]) - self.assertTrue(len(cat.categories) == 1) - self.assertTrue(cat.categories[0] == "a") - self.assertTrue(len(cat.codes) == 1) - self.assertTrue(cat.codes[0] == 0) + assert len(cat.categories) == 1 + assert cat.categories[0] == "a" + assert len(cat.codes) == 1 + assert cat.codes[0] == 0 # Scalars should be converted to lists cat = pd.Categorical(1) - self.assertTrue(len(cat.categories) == 1) - self.assertTrue(cat.categories[0] == 1) - self.assertTrue(len(cat.codes) == 1) - self.assertTrue(cat.codes[0] == 0) - - cat = pd.Categorical([1], categories=1) - self.assertTrue(len(cat.categories) == 1) - self.assertTrue(cat.categories[0] == 1) - self.assertTrue(len(cat.codes) == 1) - self.assertTrue(cat.codes[0] == 0) + assert len(cat.categories) == 1 + assert cat.categories[0] == 1 + assert len(cat.codes) == 1 + assert cat.codes[0] == 0 # Catch old style constructor useage: two arrays, codes + categories # We can only catch two cases: @@ -296,6 +289,26 @@ def f(): c = Categorical(np.array([], dtype='int64'), # noqa categories=[3, 2, 1], ordered=True) + def test_constructor_not_sequence(self): + # https://github.com/pandas-dev/pandas/issues/16022 + with pytest.raises(TypeError): + Categorical(['a', 'b'], categories='a') + + def test_constructor_with_null(self): + + # Cannot have NaN in categories + with pytest.raises(ValueError): + pd.Categorical([np.nan, "a", "b", "c"], + categories=[np.nan, "a", "b", "c"]) + + with pytest.raises(ValueError): + pd.Categorical([None, "a", "b", "c"], + categories=[None, "a", "b", "c"]) + + with pytest.raises(ValueError): + pd.Categorical(DatetimeIndex(['nat', '20160101']), + categories=[NaT, Timestamp('20160101')]) + def test_constructor_with_index(self): ci = CategoricalIndex(list('aabbca'), categories=list('cab')) tm.assert_categorical_equal(ci.values, Categorical(ci)) @@ -306,7 +319,7 @@ def test_constructor_with_index(self): categories=ci.categories)) def test_constructor_with_generator(self): - # This was raising an Error in isnull(single_val).any() because isnull + # This was raising an Error in isna(single_val).any() because isna # returned a scalar for a generator xrange = range @@ -342,7 +355,7 @@ def test_constructor_with_datetimelike(self): expected = type(dtl)(s) expected.freq = None tm.assert_index_equal(c.categories, expected) - self.assert_numpy_array_equal(c.codes, np.arange(5, dtype='int8')) + tm.assert_numpy_array_equal(c.codes, np.arange(5, dtype='int8')) # with NaT s2 = s.copy() @@ -353,10 +366,10 @@ def test_constructor_with_datetimelike(self): tm.assert_index_equal(c.categories, expected) exp = np.array([0, 1, 2, 3, -1], dtype=np.int8) - 
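The `from_codes` cases above are all validation paths; the happy path pairs in-range integer codes with unique, non-null categories. A standalone sketch:

```python
import pandas as pd

# Codes index into the categories list; -1 would mark a missing value.
c = pd.Categorical.from_codes([0, 1, 2, 0], categories=["a", "b", "c"])
assert list(c) == ["a", "b", "c", "a"]
```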
self.assert_numpy_array_equal(c.codes, exp) + tm.assert_numpy_array_equal(c.codes, exp) result = repr(c) - self.assertTrue('NaT' in result) + assert 'NaT' in result def test_constructor_from_index_series_datetimetz(self): idx = pd.date_range('2015-01-01 10:00', freq='D', periods=3, @@ -405,25 +418,31 @@ def test_from_codes(self): def f(): Categorical.from_codes([1, 2], [1, 2]) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # no int codes def f(): Categorical.from_codes(["a"], [1, 2]) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # no unique categories def f(): Categorical.from_codes([0, 1, 2], ["a", "a", "b"]) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) + + # NaN categories included + def f(): + Categorical.from_codes([0, 1, 2], ["a", "b", np.nan]) + + pytest.raises(ValueError, f) # too negative def f(): Categorical.from_codes([-2, 1, 2], ["a", "b", "c"]) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) exp = Categorical(["a", "b", "c"], ordered=False) res = Categorical.from_codes([0, 1, 2], ["a", "b", "c"]) @@ -442,10 +461,10 @@ def test_validate_ordered(self): # This should be a boolean. ordered = np.array([0, 1, 2]) - with tm.assertRaisesRegexp(exp_err, exp_msg): + with tm.assert_raises_regex(exp_err, exp_msg): Categorical([1, 2, 3], ordered=ordered) - with tm.assertRaisesRegexp(exp_err, exp_msg): + with tm.assert_raises_regex(exp_err, exp_msg): Categorical.from_codes([0, 0, 1], categories=['a', 'b', 'c'], ordered=ordered) @@ -480,11 +499,11 @@ def test_comparisons(self): other = self.factor[np.random.permutation(n)] result = self.factor == other expected = np.asarray(self.factor) == np.asarray(other) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) result = self.factor == 'd' expected = np.repeat(False, len(self.factor)) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) # comparisons with categoricals cat_rev = pd.Categorical(["a", "b", "c"], categories=["c", "b", "a"], @@ -498,21 +517,21 @@ def test_comparisons(self): # comparisons need to take categories ordering into account res_rev = cat_rev > cat_rev_base exp_rev = np.array([True, False, False]) - self.assert_numpy_array_equal(res_rev, exp_rev) + tm.assert_numpy_array_equal(res_rev, exp_rev) res_rev = cat_rev < cat_rev_base exp_rev = np.array([False, False, True]) - self.assert_numpy_array_equal(res_rev, exp_rev) + tm.assert_numpy_array_equal(res_rev, exp_rev) res = cat > cat_base exp = np.array([False, False, True]) - self.assert_numpy_array_equal(res, exp) + tm.assert_numpy_array_equal(res, exp) # Only categories with same categories can be compared def f(): cat > cat_rev - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) cat_rev_base2 = pd.Categorical( ["b", "b", "b"], categories=["c", "b", "a", "d"]) @@ -520,35 +539,35 @@ def f(): def f(): cat_rev > cat_rev_base2 - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) # Only categories with same ordering information can be compared cat_unorderd = cat.set_ordered(False) - self.assertFalse((cat > cat).any()) + assert not (cat > cat).any() def f(): cat > cat_unorderd - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) # comparison (in both directions) with Series will raise s = Series(["b", "b", "b"]) - self.assertRaises(TypeError, lambda: cat > s) - self.assertRaises(TypeError, lambda: cat_rev > s) - self.assertRaises(TypeError, lambda: s < cat) - 
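The comparison tests above all rest on one rule: for an ordered `Categorical`, `<` and `>` follow the declared category order, not the lexical order of the values. A standalone sketch mirroring the `cat_rev > "b"` case:

```python
import numpy as np
import pandas as pd

cat_rev = pd.Categorical(list("abc"), categories=list("cba"), ordered=True)
# With category order c < b < a, the value "a" ranks highest:
np.testing.assert_array_equal(np.asarray(cat_rev > "b"),
                              np.array([True, False, False]))
```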
self.assertRaises(TypeError, lambda: s < cat_rev) + pytest.raises(TypeError, lambda: cat > s) + pytest.raises(TypeError, lambda: cat_rev > s) + pytest.raises(TypeError, lambda: s < cat) + pytest.raises(TypeError, lambda: s < cat_rev) # comparison with numpy.array will raise in both direction, but only on # newer numpy versions a = np.array(["b", "b", "b"]) - self.assertRaises(TypeError, lambda: cat > a) - self.assertRaises(TypeError, lambda: cat_rev > a) + pytest.raises(TypeError, lambda: cat > a) + pytest.raises(TypeError, lambda: cat_rev > a) # The following work via '__array_priority__ = 1000' # works only on numpy >= 1.7.1 if LooseVersion(np.__version__) > "1.7.1": - self.assertRaises(TypeError, lambda: a < cat) - self.assertRaises(TypeError, lambda: a < cat_rev) + pytest.raises(TypeError, lambda: a < cat) + pytest.raises(TypeError, lambda: a < cat_rev) # Make sure that unequal comparison take the categories order in # account @@ -556,7 +575,7 @@ def f(): list("abc"), categories=list("cba"), ordered=True) exp = np.array([True, False, False]) res = cat_rev > "b" - self.assert_numpy_array_equal(res, exp) + tm.assert_numpy_array_equal(res, exp) def test_argsort(self): c = Categorical([5, 3, 1, 4, 2], ordered=True) @@ -576,17 +595,16 @@ def test_numpy_argsort(self): tm.assert_numpy_array_equal(np.argsort(c), expected, check_dtype=False) - msg = "the 'kind' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.argsort, - c, kind='mergesort') + tm.assert_numpy_array_equal(np.argsort(c, kind='mergesort'), expected, + check_dtype=False) msg = "the 'axis' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.argsort, - c, axis=0) + tm.assert_raises_regex(ValueError, msg, np.argsort, + c, axis=0) msg = "the 'order' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.argsort, - c, order='C') + tm.assert_raises_regex(ValueError, msg, np.argsort, + c, order='C') def test_na_flags_int_categories(self): # #1457 @@ -598,7 +616,7 @@ def test_na_flags_int_categories(self): cat = Categorical(labels, categories, fastpath=True) repr(cat) - self.assert_numpy_array_equal(isnull(cat), labels == -1) + tm.assert_numpy_array_equal(isna(cat), labels == -1) def test_categories_none(self): factor = Categorical(['a', 'b', 'b', 'a', @@ -608,7 +626,7 @@ def test_categories_none(self): def test_describe(self): # string type desc = self.factor.describe() - self.assertTrue(self.factor.ordered) + assert self.factor.ordered exp_index = pd.CategoricalIndex(['a', 'b', 'c'], name='categories', ordered=self.factor.ordered) expected = DataFrame({'counts': [3, 2, 3], @@ -650,64 +668,40 @@ def test_describe(self): name='categories')) tm.assert_frame_equal(desc, expected) - # NA as a category - with tm.assert_produces_warning(FutureWarning): - cat = pd.Categorical(["a", "c", "c", np.nan], - categories=["b", "a", "c", np.nan]) - result = cat.describe() - - expected = DataFrame([[0, 0], [1, 0.25], [2, 0.5], [1, 0.25]], - columns=['counts', 'freqs'], - index=pd.CategoricalIndex(['b', 'a', 'c', np.nan], - name='categories')) - tm.assert_frame_equal(result, expected, check_categorical=False) - - # NA as an unused category - with tm.assert_produces_warning(FutureWarning): - cat = pd.Categorical(["a", "c", "c"], - categories=["b", "a", "c", np.nan]) - result = cat.describe() - - exp_idx = pd.CategoricalIndex( - ['b', 'a', 'c', np.nan], name='categories') - expected = DataFrame([[0, 0], [1, 1 / 3.], [2, 2 / 3.], [0, 0]], - columns=['counts', 'freqs'], index=exp_idx) - 
tm.assert_frame_equal(result, expected, check_categorical=False) - def test_print(self): expected = ["[a, b, b, a, a, c, c, c]", "Categories (3, object): [a < b < c]"] expected = "\n".join(expected) actual = repr(self.factor) - self.assertEqual(actual, expected) + assert actual == expected def test_big_print(self): factor = Categorical([0, 1, 2, 0, 1, 2] * 100, ['a', 'b', 'c'], - name='cat', fastpath=True) + fastpath=True) expected = ["[a, b, c, a, b, ..., b, c, a, b, c]", "Length: 600", "Categories (3, object): [a, b, c]"] expected = "\n".join(expected) actual = repr(factor) - self.assertEqual(actual, expected) + assert actual == expected def test_empty_print(self): factor = Categorical([], ["a", "b", "c"]) expected = ("[], Categories (3, object): [a, b, c]") # hack because array_repr changed in numpy > 1.6.x actual = repr(factor) - self.assertEqual(actual, expected) + assert actual == expected - self.assertEqual(expected, actual) + assert expected == actual factor = Categorical([], ["a", "b", "c"], ordered=True) expected = ("[], Categories (3, object): [a < b < c]") actual = repr(factor) - self.assertEqual(expected, actual) + assert expected == actual factor = Categorical([], []) expected = ("[], Categories (0, object): []") - self.assertEqual(expected, repr(factor)) + assert expected == repr(factor) def test_print_none_width(self): # GH10087 @@ -716,7 +710,7 @@ def test_print_none_width(self): "dtype: category\nCategories (4, int64): [1, 2, 3, 4]") with option_context("display.width", None): - self.assertEqual(exp, repr(a)) + assert exp == repr(a) def test_unicode_print(self): if PY3: @@ -730,28 +724,37 @@ def test_unicode_print(self): Length: 60 Categories (3, object): [aaaaa, bb, cccc]""" - self.assertEqual(_rep(c), expected) + assert _rep(c) == expected - c = pd.Categorical([u'ああああ', u'いいいいい', u'ううううううう'] - * 20) + c = pd.Categorical([u'ああああ', u'いいいいい', u'ううううううう'] * 20) expected = u"""\ [ああああ, いいいいい, ううううううう, ああああ, いいいいい, ..., いいいいい, ううううううう, ああああ, いいいいい, ううううううう] Length: 60 Categories (3, object): [ああああ, いいいいい, ううううううう]""" # noqa - self.assertEqual(_rep(c), expected) + assert _rep(c) == expected # unicode option should not affect to Categorical, as it doesn't care # the repr width with option_context('display.unicode.east_asian_width', True): - c = pd.Categorical([u'ああああ', u'いいいいい', u'ううううううう'] - * 20) + c = pd.Categorical([u'ああああ', u'いいいいい', u'ううううううう'] * 20) expected = u"""[ああああ, いいいいい, ううううううう, ああああ, いいいいい, ..., いいいいい, ううううううう, ああああ, いいいいい, ううううううう] Length: 60 Categories (3, object): [ああああ, いいいいい, ううううううう]""" # noqa - self.assertEqual(_rep(c), expected) + assert _rep(c) == expected + + def test_tab_complete_warning(self, ip): + # https://github.com/pandas-dev/pandas/issues/16409 + pytest.importorskip('IPython', minversion="6.0.0") + from IPython.core.completer import provisionalcompleter + + code = "import pandas as pd; c = pd.Categorical([])" + ip.run_code(code) + with tm.assert_produces_warning(None): + with provisionalcompleter('ignore'): + list(ip.Completer.completions('c.', 1)) def test_periodindex(self): idx1 = PeriodIndex(['2014-01', '2014-01', '2014-02', '2014-02', @@ -761,8 +764,8 @@ def test_periodindex(self): str(cat1) exp_arr = np.array([0, 0, 1, 1, 2, 2], dtype=np.int8) exp_idx = PeriodIndex(['2014-01', '2014-02', '2014-03'], freq='M') - self.assert_numpy_array_equal(cat1._codes, exp_arr) - self.assert_index_equal(cat1.categories, exp_idx) + tm.assert_numpy_array_equal(cat1._codes, exp_arr) + tm.assert_index_equal(cat1.categories, exp_idx) idx2 = 
PeriodIndex(['2014-03', '2014-03', '2014-02', '2014-01', '2014-03', '2014-01'], freq='M') @@ -770,8 +773,8 @@ def test_periodindex(self): str(cat2) exp_arr = np.array([2, 2, 1, 0, 2, 0], dtype=np.int8) exp_idx2 = PeriodIndex(['2014-01', '2014-02', '2014-03'], freq='M') - self.assert_numpy_array_equal(cat2._codes, exp_arr) - self.assert_index_equal(cat2.categories, exp_idx2) + tm.assert_numpy_array_equal(cat2._codes, exp_arr) + tm.assert_index_equal(cat2.categories, exp_idx2) idx3 = PeriodIndex(['2013-12', '2013-11', '2013-10', '2013-09', '2013-08', '2013-07', '2013-05'], freq='M') @@ -779,81 +782,81 @@ def test_periodindex(self): exp_arr = np.array([6, 5, 4, 3, 2, 1, 0], dtype=np.int8) exp_idx = PeriodIndex(['2013-05', '2013-07', '2013-08', '2013-09', '2013-10', '2013-11', '2013-12'], freq='M') - self.assert_numpy_array_equal(cat3._codes, exp_arr) - self.assert_index_equal(cat3.categories, exp_idx) + tm.assert_numpy_array_equal(cat3._codes, exp_arr) + tm.assert_index_equal(cat3.categories, exp_idx) def test_categories_assigments(self): s = pd.Categorical(["a", "b", "c", "a"]) exp = np.array([1, 2, 3, 1], dtype=np.int64) s.categories = [1, 2, 3] - self.assert_numpy_array_equal(s.__array__(), exp) - self.assert_index_equal(s.categories, Index([1, 2, 3])) + tm.assert_numpy_array_equal(s.__array__(), exp) + tm.assert_index_equal(s.categories, Index([1, 2, 3])) # lengthen def f(): s.categories = [1, 2, 3, 4] - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # shorten def f(): s.categories = [1, 2] - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) def test_construction_with_ordered(self): # GH 9347, 9190 cat = Categorical([0, 1, 2]) - self.assertFalse(cat.ordered) + assert not cat.ordered cat = Categorical([0, 1, 2], ordered=False) - self.assertFalse(cat.ordered) + assert not cat.ordered cat = Categorical([0, 1, 2], ordered=True) - self.assertTrue(cat.ordered) + assert cat.ordered def test_ordered_api(self): # GH 9347 cat1 = pd.Categorical(["a", "c", "b"], ordered=False) - self.assert_index_equal(cat1.categories, Index(['a', 'b', 'c'])) - self.assertFalse(cat1.ordered) + tm.assert_index_equal(cat1.categories, Index(['a', 'b', 'c'])) + assert not cat1.ordered cat2 = pd.Categorical(["a", "c", "b"], categories=['b', 'c', 'a'], ordered=False) - self.assert_index_equal(cat2.categories, Index(['b', 'c', 'a'])) - self.assertFalse(cat2.ordered) + tm.assert_index_equal(cat2.categories, Index(['b', 'c', 'a'])) + assert not cat2.ordered cat3 = pd.Categorical(["a", "c", "b"], ordered=True) - self.assert_index_equal(cat3.categories, Index(['a', 'b', 'c'])) - self.assertTrue(cat3.ordered) + tm.assert_index_equal(cat3.categories, Index(['a', 'b', 'c'])) + assert cat3.ordered cat4 = pd.Categorical(["a", "c", "b"], categories=['b', 'c', 'a'], ordered=True) - self.assert_index_equal(cat4.categories, Index(['b', 'c', 'a'])) - self.assertTrue(cat4.ordered) + tm.assert_index_equal(cat4.categories, Index(['b', 'c', 'a'])) + assert cat4.ordered def test_set_ordered(self): cat = Categorical(["a", "b", "c", "a"], ordered=True) cat2 = cat.as_unordered() - self.assertFalse(cat2.ordered) + assert not cat2.ordered cat2 = cat.as_ordered() - self.assertTrue(cat2.ordered) + assert cat2.ordered cat2.as_unordered(inplace=True) - self.assertFalse(cat2.ordered) + assert not cat2.ordered cat2.as_ordered(inplace=True) - self.assertTrue(cat2.ordered) + assert cat2.ordered - self.assertTrue(cat2.set_ordered(True).ordered) - self.assertFalse(cat2.set_ordered(False).ordered) + assert 
cat2.set_ordered(True).ordered + assert not cat2.set_ordered(False).ordered cat2.set_ordered(True, inplace=True) - self.assertTrue(cat2.ordered) + assert cat2.ordered cat2.set_ordered(False, inplace=True) - self.assertFalse(cat2.ordered) + assert not cat2.ordered # removed in 0.19.0 msg = "can\'t set attribute" - with tm.assertRaisesRegexp(AttributeError, msg): + with tm.assert_raises_regex(AttributeError, msg): cat.ordered = True - with tm.assertRaisesRegexp(AttributeError, msg): + with tm.assert_raises_regex(AttributeError, msg): cat.ordered = False def test_set_categories(self): @@ -862,106 +865,104 @@ def test_set_categories(self): exp_values = np.array(["a", "b", "c", "a"], dtype=np.object_) res = cat.set_categories(["c", "b", "a"], inplace=True) - self.assert_index_equal(cat.categories, exp_categories) - self.assert_numpy_array_equal(cat.__array__(), exp_values) - self.assertIsNone(res) + tm.assert_index_equal(cat.categories, exp_categories) + tm.assert_numpy_array_equal(cat.__array__(), exp_values) + assert res is None res = cat.set_categories(["a", "b", "c"]) # cat must be the same as before - self.assert_index_equal(cat.categories, exp_categories) - self.assert_numpy_array_equal(cat.__array__(), exp_values) + tm.assert_index_equal(cat.categories, exp_categories) + tm.assert_numpy_array_equal(cat.__array__(), exp_values) # only res is changed exp_categories_back = Index(["a", "b", "c"]) - self.assert_index_equal(res.categories, exp_categories_back) - self.assert_numpy_array_equal(res.__array__(), exp_values) + tm.assert_index_equal(res.categories, exp_categories_back) + tm.assert_numpy_array_equal(res.__array__(), exp_values) # not all "old" included in "new" -> all not included ones are now # np.nan cat = Categorical(["a", "b", "c", "a"], ordered=True) res = cat.set_categories(["a"]) - self.assert_numpy_array_equal(res.codes, - np.array([0, -1, -1, 0], dtype=np.int8)) + tm.assert_numpy_array_equal(res.codes, np.array([0, -1, -1, 0], + dtype=np.int8)) # still not all "old" in "new" res = cat.set_categories(["a", "b", "d"]) - self.assert_numpy_array_equal(res.codes, - np.array([0, 1, -1, 0], dtype=np.int8)) - self.assert_index_equal(res.categories, Index(["a", "b", "d"])) + tm.assert_numpy_array_equal(res.codes, np.array([0, 1, -1, 0], + dtype=np.int8)) + tm.assert_index_equal(res.categories, Index(["a", "b", "d"])) # all "old" included in "new" cat = cat.set_categories(["a", "b", "c", "d"]) exp_categories = Index(["a", "b", "c", "d"]) - self.assert_index_equal(cat.categories, exp_categories) + tm.assert_index_equal(cat.categories, exp_categories) # internals... c = Categorical([1, 2, 3, 4, 1], categories=[1, 2, 3, 4], ordered=True) - self.assert_numpy_array_equal(c._codes, - np.array([0, 1, 2, 3, 0], dtype=np.int8)) - self.assert_index_equal(c.categories, Index([1, 2, 3, 4])) + tm.assert_numpy_array_equal(c._codes, np.array([0, 1, 2, 3, 0], + dtype=np.int8)) + tm.assert_index_equal(c.categories, Index([1, 2, 3, 4])) exp = np.array([1, 2, 3, 4, 1], dtype=np.int64) - self.assert_numpy_array_equal(c.get_values(), exp) + tm.assert_numpy_array_equal(c.get_values(), exp) # all "pointers" to '4' must be changed from 3 to 0,... 
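The remapping that comment describes is worth seeing concretely: reordering the categories rewrites every code while `get_values()` stays the same. A standalone sketch of the internals checked here:

```python
import numpy as np
import pandas as pd

c = pd.Categorical([1, 2, 3, 4, 1], categories=[1, 2, 3, 4], ordered=True)
np.testing.assert_array_equal(c._codes,
                              np.array([0, 1, 2, 3, 0], dtype=np.int8))
c2 = c.set_categories([4, 3, 2, 1])
# Same values, new codes: each code is the value's position in the new list.
np.testing.assert_array_equal(c2._codes,
                              np.array([3, 2, 1, 0, 3], dtype=np.int8))
```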
         c = c.set_categories([4, 3, 2, 1])  # positions are changed
-        self.assert_numpy_array_equal(c._codes,
-                                      np.array([3, 2, 1, 0, 3], dtype=np.int8))
+        tm.assert_numpy_array_equal(c._codes, np.array([3, 2, 1, 0, 3],
+                                                       dtype=np.int8))
         # categories are now in new order
-        self.assert_index_equal(c.categories, Index([4, 3, 2, 1]))
+        tm.assert_index_equal(c.categories, Index([4, 3, 2, 1]))

         # output is the same
         exp = np.array([1, 2, 3, 4, 1], dtype=np.int64)
-        self.assert_numpy_array_equal(c.get_values(), exp)
-        self.assertTrue(c.min(), 4)
-        self.assertTrue(c.max(), 1)
+        tm.assert_numpy_array_equal(c.get_values(), exp)
+        assert c.min() == 4
+        assert c.max() == 1

         # set_categories should set the ordering if specified
         c2 = c.set_categories([4, 3, 2, 1], ordered=False)
-        self.assertFalse(c2.ordered)
-        self.assert_numpy_array_equal(c.get_values(), c2.get_values())
+        assert not c2.ordered
+
+        tm.assert_numpy_array_equal(c.get_values(), c2.get_values())

         # set_categories should pass thru the ordering
         c2 = c.set_ordered(False).set_categories([4, 3, 2, 1])
-        self.assertFalse(c2.ordered)
-        self.assert_numpy_array_equal(c.get_values(), c2.get_values())
+        assert not c2.ordered
+
+        tm.assert_numpy_array_equal(c.get_values(), c2.get_values())

     def test_rename_categories(self):
         cat = pd.Categorical(["a", "b", "c", "a"])

         # inplace=False: the old one must not be changed
         res = cat.rename_categories([1, 2, 3])
-        self.assert_numpy_array_equal(res.__array__(),
-                                      np.array([1, 2, 3, 1], dtype=np.int64))
-        self.assert_index_equal(res.categories, Index([1, 2, 3]))
+        tm.assert_numpy_array_equal(res.__array__(), np.array([1, 2, 3, 1],
+                                                              dtype=np.int64))
+        tm.assert_index_equal(res.categories, Index([1, 2, 3]))

         exp_cat = np.array(["a", "b", "c", "a"], dtype=np.object_)
-        self.assert_numpy_array_equal(cat.__array__(), exp_cat)
+        tm.assert_numpy_array_equal(cat.__array__(), exp_cat)

         exp_cat = Index(["a", "b", "c"])
-        self.assert_index_equal(cat.categories, exp_cat)
+        tm.assert_index_equal(cat.categories, exp_cat)

         res = cat.rename_categories([1, 2, 3], inplace=True)

         # and now inplace
-        self.assertIsNone(res)
-        self.assert_numpy_array_equal(cat.__array__(),
-                                      np.array([1, 2, 3, 1], dtype=np.int64))
-        self.assert_index_equal(cat.categories, Index([1, 2, 3]))
+        assert res is None
+        tm.assert_numpy_array_equal(cat.__array__(), np.array([1, 2, 3, 1],
+                                                              dtype=np.int64))
+        tm.assert_index_equal(cat.categories, Index([1, 2, 3]))

-        # lengthen
-        def f():
+        # Lengthen
+        with pytest.raises(ValueError):
             cat.rename_categories([1, 2, 3, 4])
-        self.assertRaises(ValueError, f)
-
-        # shorten
-        def f():
+        # Shorten
+        with pytest.raises(ValueError):
             cat.rename_categories([1, 2])
-        self.assertRaises(ValueError, f)
-
     def test_reorder_categories(self):
         cat = Categorical(["a", "b", "c", "a"], ordered=True)
         old = cat.copy()
@@ -971,14 +972,14 @@ def test_reorder_categories(self):

         # first inplace == False
         res = cat.reorder_categories(["c", "b", "a"])
         # cat must be the same as before
-        self.assert_categorical_equal(cat, old)
+        tm.assert_categorical_equal(cat, old)
         # only res is changed
-        self.assert_categorical_equal(res, new)
+        tm.assert_categorical_equal(res, new)

         # inplace == True
         res = cat.reorder_categories(["c", "b", "a"], inplace=True)
-        self.assertIsNone(res)
-        self.assert_categorical_equal(cat, new)
+        assert res is None
+        tm.assert_categorical_equal(cat, new)

         # not all "old" included in "new"
         cat = Categorical(["a", "b", "c", "a"], ordered=True)
@@ -986,19 +987,19 @@ def test_reorder_categories(self):
         def f():
             cat.reorder_categories(["a"])

-        self.assertRaises(ValueError, f)
+        pytest.raises(ValueError, f)

         # still not all "old" in "new"
         def f():
             cat.reorder_categories(["a", "b", "d"])

-        self.assertRaises(ValueError, f)
+        pytest.raises(ValueError, f)

         # all "old" included in "new", but too long
         def f():
             cat.reorder_categories(["a", "b", "c", "d"])

-        self.assertRaises(ValueError, f)
+        pytest.raises(ValueError, f)

     def test_add_categories(self):
         cat = Categorical(["a", "b", "c", "a"], ordered=True)
@@ -1008,23 +1009,23 @@ def test_add_categories(self):

         # first inplace == False
         res = cat.add_categories("d")
-        self.assert_categorical_equal(cat, old)
-        self.assert_categorical_equal(res, new)
+        tm.assert_categorical_equal(cat, old)
+        tm.assert_categorical_equal(res, new)

         res = cat.add_categories(["d"])
-        self.assert_categorical_equal(cat, old)
-        self.assert_categorical_equal(res, new)
+        tm.assert_categorical_equal(cat, old)
+        tm.assert_categorical_equal(res, new)

         # inplace == True
         res = cat.add_categories("d", inplace=True)
-        self.assert_categorical_equal(cat, new)
-        self.assertIsNone(res)
+        tm.assert_categorical_equal(cat, new)
+        assert res is None

         # new is in old categories
         def f():
             cat.add_categories(["d"])

-        self.assertRaises(ValueError, f)
+        pytest.raises(ValueError, f)

         # GH 9927
         cat = Categorical(list("abc"), ordered=True)
@@ -1032,13 +1033,13 @@ def f():
             list("abc"), categories=list("abcde"), ordered=True)
         # test with Series, np.array, index, list
         res = cat.add_categories(Series(["d", "e"]))
-        self.assert_categorical_equal(res, expected)
+        tm.assert_categorical_equal(res, expected)
         res = cat.add_categories(np.array(["d", "e"]))
-        self.assert_categorical_equal(res, expected)
+        tm.assert_categorical_equal(res, expected)
         res = cat.add_categories(Index(["d", "e"]))
-        self.assert_categorical_equal(res, expected)
+        tm.assert_categorical_equal(res, expected)
         res = cat.add_categories(["d", "e"])
-        self.assert_categorical_equal(res, expected)
+        tm.assert_categorical_equal(res, expected)

     def test_remove_categories(self):
         cat = Categorical(["a", "b", "c", "a"], ordered=True)
@@ -1048,23 +1049,23 @@ def test_remove_categories(self):

         # first inplace == False
         res = cat.remove_categories("c")
-        self.assert_categorical_equal(cat, old)
-        self.assert_categorical_equal(res, new)
+        tm.assert_categorical_equal(cat, old)
+        tm.assert_categorical_equal(res, new)

         res = cat.remove_categories(["c"])
-        self.assert_categorical_equal(cat, old)
-        self.assert_categorical_equal(res, new)
+        tm.assert_categorical_equal(cat, old)
+        tm.assert_categorical_equal(res, new)

         # inplace == True
         res = cat.remove_categories("c", inplace=True)
-        self.assert_categorical_equal(cat, new)
-        self.assertIsNone(res)
+        tm.assert_categorical_equal(cat, new)
+        assert res is None

         # removal is not in categories
         def f():
             cat.remove_categories(["c"])

-        self.assertRaises(ValueError, f)
+        pytest.raises(ValueError, f)

     def test_remove_unused_categories(self):
         c = Categorical(["a", "b", "c", "d", "a"],
@@ -1072,33 +1073,33 @@ def test_remove_unused_categories(self):
         exp_categories_all = Index(["a", "b", "c", "d", "e"])
         exp_categories_dropped = Index(["a", "b", "c", "d"])

-        self.assert_index_equal(c.categories, exp_categories_all)
+        tm.assert_index_equal(c.categories, exp_categories_all)

         res = c.remove_unused_categories()
-        self.assert_index_equal(res.categories, exp_categories_dropped)
-        self.assert_index_equal(c.categories, exp_categories_all)
+        tm.assert_index_equal(res.categories, exp_categories_dropped)
+        tm.assert_index_equal(c.categories, exp_categories_all)

         res = c.remove_unused_categories(inplace=True)
-        self.assert_index_equal(c.categories, exp_categories_dropped)
-        self.assertIsNone(res)
+        tm.assert_index_equal(c.categories, exp_categories_dropped)
+        assert res is None

         # with NaN values (GH11599)
         c = Categorical(["a", "b", "c", np.nan],
                         categories=["a", "b", "c", "d", "e"])
         res = c.remove_unused_categories()
-        self.assert_index_equal(res.categories,
-                                Index(np.array(["a", "b", "c"])))
+        tm.assert_index_equal(res.categories,
+                              Index(np.array(["a", "b", "c"])))
         exp_codes = np.array([0, 1, 2, -1], dtype=np.int8)
-        self.assert_numpy_array_equal(res.codes, exp_codes)
-        self.assert_index_equal(c.categories, exp_categories_all)
+        tm.assert_numpy_array_equal(res.codes, exp_codes)
+        tm.assert_index_equal(c.categories, exp_categories_all)

         val = ['F', np.nan, 'D', 'B', 'D', 'F', np.nan]
         cat = pd.Categorical(values=val, categories=list('ABCDEFG'))
         out = cat.remove_unused_categories()
-        self.assert_index_equal(out.categories, Index(['B', 'D', 'F']))
+        tm.assert_index_equal(out.categories, Index(['B', 'D', 'F']))
         exp_codes = np.array([2, -1, 1, 0, 1, 2, -1], dtype=np.int8)
-        self.assert_numpy_array_equal(out.codes, exp_codes)
-        self.assertEqual(out.get_values().tolist(), val)
+        tm.assert_numpy_array_equal(out.codes, exp_codes)
+        assert out.get_values().tolist() == val

         alpha = list('abcdefghijklmnopqrstuvwxyz')
         val = np.random.choice(alpha[::2], 10000).astype('object')
@@ -1106,118 +1107,46 @@ def test_remove_unused_categories(self):
         cat = pd.Categorical(values=val, categories=alpha)
         out = cat.remove_unused_categories()
-        self.assertEqual(out.get_values().tolist(), val.tolist())
+        assert out.get_values().tolist() == val.tolist()

     def test_nan_handling(self):

         # Nans are represented as -1 in codes
         c = Categorical(["a", "b", np.nan, "a"])
-        self.assert_index_equal(c.categories, Index(["a", "b"]))
-        self.assert_numpy_array_equal(c._codes,
-                                      np.array([0, 1, -1, 0], dtype=np.int8))
+        tm.assert_index_equal(c.categories, Index(["a", "b"]))
+        tm.assert_numpy_array_equal(c._codes, np.array([0, 1, -1, 0],
+                                                       dtype=np.int8))
         c[1] = np.nan
-        self.assert_index_equal(c.categories, Index(["a", "b"]))
-        self.assert_numpy_array_equal(c._codes,
-                                      np.array([0, -1, -1, 0], dtype=np.int8))
-
-        # If categories have nan included, the code should point to that
-        # instead
-        with tm.assert_produces_warning(FutureWarning):
-            c = Categorical(["a", "b", np.nan, "a"],
-                            categories=["a", "b", np.nan])
-        self.assert_index_equal(c.categories, Index(["a", "b", np.nan]))
-        self.assert_numpy_array_equal(c._codes,
-                                      np.array([0, 1, 2, 0], dtype=np.int8))
-        c[1] = np.nan
-        self.assert_index_equal(c.categories, Index(["a", "b", np.nan]))
-        self.assert_numpy_array_equal(c._codes,
-                                      np.array([0, 2, 2, 0], dtype=np.int8))
-
-        # Changing categories should also make the replaced category np.nan
-        c = Categorical(["a", "b", "c", "a"])
-        with tm.assert_produces_warning(FutureWarning):
-            c.categories = ["a", "b", np.nan]  # noqa
-
-        self.assert_index_equal(c.categories, Index(["a", "b", np.nan]))
-        self.assert_numpy_array_equal(c._codes,
-                                      np.array([0, 1, 2, 0], dtype=np.int8))
+        tm.assert_index_equal(c.categories, Index(["a", "b"]))
+        tm.assert_numpy_array_equal(c._codes, np.array([0, -1, -1, 0],
+                                                       dtype=np.int8))

         # Adding nan to categories should make assigned nan point to the
         # category!
c = Categorical(["a", "b", np.nan, "a"]) - self.assert_index_equal(c.categories, Index(["a", "b"])) - self.assert_numpy_array_equal(c._codes, - np.array([0, 1, -1, 0], dtype=np.int8)) - with tm.assert_produces_warning(FutureWarning): - c.set_categories(["a", "b", np.nan], rename=True, inplace=True) - - self.assert_index_equal(c.categories, Index(["a", "b", np.nan])) - self.assert_numpy_array_equal(c._codes, - np.array([0, 1, -1, 0], dtype=np.int8)) - c[1] = np.nan - self.assert_index_equal(c.categories, Index(["a", "b", np.nan])) - self.assert_numpy_array_equal(c._codes, - np.array([0, 2, -1, 0], dtype=np.int8)) - - # Remove null categories (GH 10156) - cases = [([1.0, 2.0, np.nan], [1.0, 2.0]), - (['a', 'b', None], ['a', 'b']), - ([pd.Timestamp('2012-05-01'), pd.NaT], - [pd.Timestamp('2012-05-01')])] - - null_values = [np.nan, None, pd.NaT] - - for with_null, without in cases: - with tm.assert_produces_warning(FutureWarning): - base = Categorical([], with_null) - expected = Categorical([], without) + tm.assert_index_equal(c.categories, Index(["a", "b"])) + tm.assert_numpy_array_equal(c._codes, np.array([0, 1, -1, 0], + dtype=np.int8)) - for nullval in null_values: - result = base.remove_categories(nullval) - self.assert_categorical_equal(result, expected) - - # Different null values are indistinguishable - for i, j in [(0, 1), (0, 2), (1, 2)]: - nulls = [null_values[i], null_values[j]] - - def f(): - with tm.assert_produces_warning(FutureWarning): - Categorical([], categories=nulls) - - self.assertRaises(ValueError, f) - - def test_isnull(self): + def test_isna(self): exp = np.array([False, False, True]) c = Categorical(["a", "b", np.nan]) - res = c.isnull() - self.assert_numpy_array_equal(res, exp) + res = c.isna() - with tm.assert_produces_warning(FutureWarning): - c = Categorical(["a", "b", np.nan], categories=["a", "b", np.nan]) - res = c.isnull() - self.assert_numpy_array_equal(res, exp) - - # test both nan in categories and as -1 - exp = np.array([True, False, True]) - c = Categorical(["a", "b", np.nan]) - with tm.assert_produces_warning(FutureWarning): - c.set_categories(["a", "b", np.nan], rename=True, inplace=True) - c[0] = np.nan - res = c.isnull() - self.assert_numpy_array_equal(res, exp) + tm.assert_numpy_array_equal(res, exp) def test_codes_immutable(self): # Codes should be read only c = Categorical(["a", "b", "c", "a", np.nan]) exp = np.array([0, 1, 2, 0, -1], dtype='int8') - self.assert_numpy_array_equal(c.codes, exp) + tm.assert_numpy_array_equal(c.codes, exp) # Assignments to codes should raise def f(): c.codes = np.array([0, 1, 2, 0, 1], dtype='int8') - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # changes in the codes array should raise # np 1.6.1 raises RuntimeError rather than ValueError @@ -1226,76 +1155,76 @@ def f(): def f(): codes[4] = 1 - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # But even after getting the codes, the original array should still be # writeable! 
c[4] = "a" exp = np.array([0, 1, 2, 0, 0], dtype='int8') - self.assert_numpy_array_equal(c.codes, exp) + tm.assert_numpy_array_equal(c.codes, exp) c._codes[4] = 2 exp = np.array([0, 1, 2, 0, 2], dtype='int8') - self.assert_numpy_array_equal(c.codes, exp) + tm.assert_numpy_array_equal(c.codes, exp) def test_min_max(self): # unordered cats have no min/max cat = Categorical(["a", "b", "c", "d"], ordered=False) - self.assertRaises(TypeError, lambda: cat.min()) - self.assertRaises(TypeError, lambda: cat.max()) + pytest.raises(TypeError, lambda: cat.min()) + pytest.raises(TypeError, lambda: cat.max()) cat = Categorical(["a", "b", "c", "d"], ordered=True) _min = cat.min() _max = cat.max() - self.assertEqual(_min, "a") - self.assertEqual(_max, "d") + assert _min == "a" + assert _max == "d" cat = Categorical(["a", "b", "c", "d"], categories=['d', 'c', 'b', 'a'], ordered=True) _min = cat.min() _max = cat.max() - self.assertEqual(_min, "d") - self.assertEqual(_max, "a") + assert _min == "d" + assert _max == "a" cat = Categorical([np.nan, "b", "c", np.nan], categories=['d', 'c', 'b', 'a'], ordered=True) _min = cat.min() _max = cat.max() - self.assertTrue(np.isnan(_min)) - self.assertEqual(_max, "b") + assert np.isnan(_min) + assert _max == "b" _min = cat.min(numeric_only=True) - self.assertEqual(_min, "c") + assert _min == "c" _max = cat.max(numeric_only=True) - self.assertEqual(_max, "b") + assert _max == "b" cat = Categorical([np.nan, 1, 2, np.nan], categories=[5, 4, 3, 2, 1], ordered=True) _min = cat.min() _max = cat.max() - self.assertTrue(np.isnan(_min)) - self.assertEqual(_max, 1) + assert np.isnan(_min) + assert _max == 1 _min = cat.min(numeric_only=True) - self.assertEqual(_min, 2) + assert _min == 2 _max = cat.max(numeric_only=True) - self.assertEqual(_max, 1) + assert _max == 1 def test_unique(self): # categories are reordered based on value when ordered=False cat = Categorical(["a", "b"]) exp = Index(["a", "b"]) res = cat.unique() - self.assert_index_equal(res.categories, exp) - self.assert_categorical_equal(res, cat) + tm.assert_index_equal(res.categories, exp) + tm.assert_categorical_equal(res, cat) cat = Categorical(["a", "b", "a", "a"], categories=["a", "b", "c"]) res = cat.unique() - self.assert_index_equal(res.categories, exp) + tm.assert_index_equal(res.categories, exp) tm.assert_categorical_equal(res, Categorical(exp)) cat = Categorical(["c", "a", "b", "a", "a"], categories=["a", "b", "c"]) exp = Index(["c", "a", "b"]) res = cat.unique() - self.assert_index_equal(res.categories, exp) + tm.assert_index_equal(res.categories, exp) exp_cat = Categorical(exp, categories=['c', 'a', 'b']) tm.assert_categorical_equal(res, exp_cat) @@ -1304,7 +1233,7 @@ def test_unique(self): categories=["a", "b", "c"]) res = cat.unique() exp = Index(["b", "a"]) - self.assert_index_equal(res.categories, exp) + tm.assert_index_equal(res.categories, exp) exp_cat = Categorical(["b", np.nan, "a"], categories=["b", "a"]) tm.assert_categorical_equal(res, exp_cat) @@ -1373,13 +1302,14 @@ def test_mode(self): s = Categorical([1, 2, 3, 4, 5], categories=[5, 4, 3, 2, 1], ordered=True) res = s.mode() - exp = Categorical([], categories=[5, 4, 3, 2, 1], ordered=True) + exp = Categorical([5, 4, 3, 2, 1], + categories=[5, 4, 3, 2, 1], ordered=True) tm.assert_categorical_equal(res, exp) # NaN should not become the mode! 
         s = Categorical([np.nan, np.nan, np.nan, 4, 5],
                         categories=[5, 4, 3, 2, 1], ordered=True)
         res = s.mode()
-        exp = Categorical([], categories=[5, 4, 3, 2, 1], ordered=True)
+        exp = Categorical([5, 4], categories=[5, 4, 3, 2, 1], ordered=True)
         tm.assert_categorical_equal(res, exp)
         s = Categorical([np.nan, np.nan, np.nan, 4, 5, 4],
                         categories=[5, 4, 3, 2, 1], ordered=True)
@@ -1403,35 +1333,35 @@ def test_sort_values(self):
         # sort_values
         res = cat.sort_values()
         exp = np.array(["a", "b", "c", "d"], dtype=object)
-        self.assert_numpy_array_equal(res.__array__(), exp)
-        self.assert_index_equal(res.categories, cat.categories)
+        tm.assert_numpy_array_equal(res.__array__(), exp)
+        tm.assert_index_equal(res.categories, cat.categories)

         cat = Categorical(["a", "c", "b", "d"],
                           categories=["a", "b", "c", "d"], ordered=True)
         res = cat.sort_values()
         exp = np.array(["a", "b", "c", "d"], dtype=object)
-        self.assert_numpy_array_equal(res.__array__(), exp)
-        self.assert_index_equal(res.categories, cat.categories)
+        tm.assert_numpy_array_equal(res.__array__(), exp)
+        tm.assert_index_equal(res.categories, cat.categories)

         res = cat.sort_values(ascending=False)
         exp = np.array(["d", "c", "b", "a"], dtype=object)
-        self.assert_numpy_array_equal(res.__array__(), exp)
-        self.assert_index_equal(res.categories, cat.categories)
+        tm.assert_numpy_array_equal(res.__array__(), exp)
+        tm.assert_index_equal(res.categories, cat.categories)

         # sort (inplace order)
         cat1 = cat.copy()
         cat1.sort_values(inplace=True)
         exp = np.array(["a", "b", "c", "d"], dtype=object)
-        self.assert_numpy_array_equal(cat1.__array__(), exp)
-        self.assert_index_equal(res.categories, cat.categories)
+        tm.assert_numpy_array_equal(cat1.__array__(), exp)
+        tm.assert_index_equal(res.categories, cat.categories)

         # reverse
         cat = Categorical(["a", "c", "c", "b", "d"], ordered=True)
         res = cat.sort_values(ascending=False)
         exp_val = np.array(["d", "c", "c", "b", "a"], dtype=object)
         exp_categories = Index(["a", "b", "c", "d"])
-        self.assert_numpy_array_equal(res.__array__(), exp_val)
-        self.assert_index_equal(res.categories, exp_categories)
+        tm.assert_numpy_array_equal(res.__array__(), exp_val)
+        tm.assert_index_equal(res.categories, exp_categories)

     def test_sort_values_na_position(self):
         # see gh-12882
@@ -1440,93 +1370,58 @@ def test_sort_values_na_position(self):

         exp = np.array([2.0, 2.0, 5.0, np.nan, np.nan])
         res = cat.sort_values()  # default arguments
-        self.assert_numpy_array_equal(res.__array__(), exp)
-        self.assert_index_equal(res.categories, exp_categories)
+        tm.assert_numpy_array_equal(res.__array__(), exp)
+        tm.assert_index_equal(res.categories, exp_categories)

         exp = np.array([np.nan, np.nan, 2.0, 2.0, 5.0])
         res = cat.sort_values(ascending=True, na_position='first')
-        self.assert_numpy_array_equal(res.__array__(), exp)
-        self.assert_index_equal(res.categories, exp_categories)
+        tm.assert_numpy_array_equal(res.__array__(), exp)
+        tm.assert_index_equal(res.categories, exp_categories)

         exp = np.array([np.nan, np.nan, 5.0, 2.0, 2.0])
         res = cat.sort_values(ascending=False, na_position='first')
-        self.assert_numpy_array_equal(res.__array__(), exp)
-        self.assert_index_equal(res.categories, exp_categories)
+        tm.assert_numpy_array_equal(res.__array__(), exp)
+        tm.assert_index_equal(res.categories, exp_categories)

         exp = np.array([2.0, 2.0, 5.0, np.nan, np.nan])
         res = cat.sort_values(ascending=True, na_position='last')
-        self.assert_numpy_array_equal(res.__array__(), exp)
-        self.assert_index_equal(res.categories, exp_categories)
+        tm.assert_numpy_array_equal(res.__array__(), exp)
+        tm.assert_index_equal(res.categories, exp_categories)

         exp = np.array([5.0, 2.0, 2.0, np.nan, np.nan])
         res = cat.sort_values(ascending=False, na_position='last')
-        self.assert_numpy_array_equal(res.__array__(), exp)
-        self.assert_index_equal(res.categories, exp_categories)
+        tm.assert_numpy_array_equal(res.__array__(), exp)
+        tm.assert_index_equal(res.categories, exp_categories)

         cat = Categorical(["a", "c", "b", "d", np.nan], ordered=True)
         res = cat.sort_values(ascending=False, na_position='last')
         exp_val = np.array(["d", "c", "b", "a", np.nan], dtype=object)
         exp_categories = Index(["a", "b", "c", "d"])
-        self.assert_numpy_array_equal(res.__array__(), exp_val)
-        self.assert_index_equal(res.categories, exp_categories)
+        tm.assert_numpy_array_equal(res.__array__(), exp_val)
+        tm.assert_index_equal(res.categories, exp_categories)

         cat = Categorical(["a", "c", "b", "d", np.nan], ordered=True)
         res = cat.sort_values(ascending=False, na_position='first')
         exp_val = np.array([np.nan, "d", "c", "b", "a"], dtype=object)
         exp_categories = Index(["a", "b", "c", "d"])
-        self.assert_numpy_array_equal(res.__array__(), exp_val)
-        self.assert_index_equal(res.categories, exp_categories)
+        tm.assert_numpy_array_equal(res.__array__(), exp_val)
+        tm.assert_index_equal(res.categories, exp_categories)

     def test_slicing_directly(self):
         cat = Categorical(["a", "b", "c", "d", "a", "b", "c"])
         sliced = cat[3]
-        self.assertEqual(sliced, "d")
+        assert sliced == "d"
         sliced = cat[3:5]
         expected = Categorical(["d", "a"], categories=['a', 'b', 'c', 'd'])
-        self.assert_numpy_array_equal(sliced._codes, expected._codes)
+        tm.assert_numpy_array_equal(sliced._codes, expected._codes)
         tm.assert_index_equal(sliced.categories, expected.categories)

     def test_set_item_nan(self):
         cat = pd.Categorical([1, 2, 3])
-        exp = pd.Categorical([1, np.nan, 3], categories=[1, 2, 3])
-        cat[1] = np.nan
-        tm.assert_categorical_equal(cat, exp)
-
-        # if nan in categories, the proper code should be set!
-        cat = pd.Categorical([1, 2, 3, np.nan], categories=[1, 2, 3])
-        with tm.assert_produces_warning(FutureWarning):
-            cat.set_categories([1, 2, 3, np.nan], rename=True, inplace=True)
         cat[1] = np.nan
-        exp = np.array([0, 3, 2, -1], dtype=np.int8)
-        self.assert_numpy_array_equal(cat.codes, exp)
-
-        cat = pd.Categorical([1, 2, 3, np.nan], categories=[1, 2, 3])
-        with tm.assert_produces_warning(FutureWarning):
-            cat.set_categories([1, 2, 3, np.nan], rename=True, inplace=True)
-        cat[1:3] = np.nan
-        exp = np.array([0, 3, 3, -1], dtype=np.int8)
-        self.assert_numpy_array_equal(cat.codes, exp)
-
-        cat = pd.Categorical([1, 2, 3, np.nan], categories=[1, 2, 3])
-        with tm.assert_produces_warning(FutureWarning):
-            cat.set_categories([1, 2, 3, np.nan], rename=True, inplace=True)
-        cat[1:3] = [np.nan, 1]
-        exp = np.array([0, 3, 0, -1], dtype=np.int8)
-        self.assert_numpy_array_equal(cat.codes, exp)
-
-        cat = pd.Categorical([1, 2, 3, np.nan], categories=[1, 2, 3])
-        with tm.assert_produces_warning(FutureWarning):
-            cat.set_categories([1, 2, 3, np.nan], rename=True, inplace=True)
-        cat[1:3] = [np.nan, np.nan]
-        exp = np.array([0, 3, 3, -1], dtype=np.int8)
-        self.assert_numpy_array_equal(cat.codes, exp)
-
-        cat = pd.Categorical([1, 2, np.nan, 3], categories=[1, 2, 3])
-        with tm.assert_produces_warning(FutureWarning):
-            cat.set_categories([1, 2, 3, np.nan], rename=True, inplace=True)
-        cat[pd.isnull(cat)] = np.nan
-        exp = np.array([0, 1, 3, 2], dtype=np.int8)
-        self.assert_numpy_array_equal(cat.codes, exp)
+        exp = pd.Categorical([1, np.nan, 3], categories=[1, 2, 3])
+        tm.assert_categorical_equal(cat, exp)

     def test_shift(self):
         # GH 9416
@@ -1535,87 +1430,87 @@ def test_shift(self):
         # shift forward
         sp1 = cat.shift(1)
         xp1 = pd.Categorical([np.nan, 'a', 'b', 'c', 'd'])
-        self.assert_categorical_equal(sp1, xp1)
-        self.assert_categorical_equal(cat[:-1], sp1[1:])
+        tm.assert_categorical_equal(sp1, xp1)
+        tm.assert_categorical_equal(cat[:-1], sp1[1:])

         # shift back
         sn2 = cat.shift(-2)
         xp2 = pd.Categorical(['c', 'd', 'a', np.nan, np.nan],
                              categories=['a', 'b', 'c', 'd'])
-        self.assert_categorical_equal(sn2, xp2)
-        self.assert_categorical_equal(cat[2:], sn2[:-2])
+        tm.assert_categorical_equal(sn2, xp2)
+        tm.assert_categorical_equal(cat[2:], sn2[:-2])

         # shift by zero
-        self.assert_categorical_equal(cat, cat.shift(0))
+        tm.assert_categorical_equal(cat, cat.shift(0))

     def test_nbytes(self):
         cat = pd.Categorical([1, 2, 3])
         exp = cat._codes.nbytes + cat._categories.values.nbytes
-        self.assertEqual(cat.nbytes, exp)
+        assert cat.nbytes == exp

     def test_memory_usage(self):
         cat = pd.Categorical([1, 2, 3])

         # .categories is an index, so we include the hashtable
-        self.assertTrue(cat.nbytes > 0 and cat.nbytes <= cat.memory_usage())
-        self.assertTrue(cat.nbytes > 0 and
-                        cat.nbytes <= cat.memory_usage(deep=True))
+        assert 0 < cat.nbytes <= cat.memory_usage()
+        assert 0 < cat.nbytes <= cat.memory_usage(deep=True)

         cat = pd.Categorical(['foo', 'foo', 'bar'])
-        self.assertTrue(cat.memory_usage(deep=True) > cat.nbytes)
+        assert cat.memory_usage(deep=True) > cat.nbytes

-        # sys.getsizeof will call the .memory_usage with
-        # deep=True, and add on some GC overhead
-        diff = cat.memory_usage(deep=True) - sys.getsizeof(cat)
-        self.assertTrue(abs(diff) < 100)
+        if not PYPY:
+            # sys.getsizeof will call the .memory_usage with
+            # deep=True, and add on some GC overhead
+            diff = cat.memory_usage(deep=True) - sys.getsizeof(cat)
+            assert abs(diff) < 100

     def test_searchsorted(self):
         # https://github.com/pandas-dev/pandas/issues/8420
         # https://github.com/pandas-dev/pandas/issues/14522

         c1 = pd.Categorical(['cheese', 'milk', 'apple', 'bread', 'bread'],
-                           categories=['cheese', 'milk', 'apple', 'bread'],
-                           ordered=True)
+                            categories=['cheese', 'milk', 'apple', 'bread'],
+                            ordered=True)
         s1 = pd.Series(c1)
         c2 = pd.Categorical(['cheese', 'milk', 'apple', 'bread', 'bread'],
-                           categories=['cheese', 'milk', 'apple', 'bread'],
-                           ordered=False)
+                            categories=['cheese', 'milk', 'apple', 'bread'],
+                            ordered=False)
         s2 = pd.Series(c2)

         # Searching for single item argument, side='left' (default)
         res_cat = c1.searchsorted('apple')
         res_ser = s1.searchsorted('apple')
         exp = np.array([2], dtype=np.intp)
-        self.assert_numpy_array_equal(res_cat, exp)
-        self.assert_numpy_array_equal(res_ser, exp)
+        tm.assert_numpy_array_equal(res_cat, exp)
+        tm.assert_numpy_array_equal(res_ser, exp)

         # Searching for single item array, side='left' (default)
         res_cat = c1.searchsorted(['bread'])
         res_ser = s1.searchsorted(['bread'])
         exp = np.array([3], dtype=np.intp)
-        self.assert_numpy_array_equal(res_cat, exp)
-        self.assert_numpy_array_equal(res_ser, exp)
+        tm.assert_numpy_array_equal(res_cat, exp)
+        tm.assert_numpy_array_equal(res_ser, exp)

         # Searching for several items array, side='right'
         res_cat = c1.searchsorted(['apple', 'bread'], side='right')
         res_ser = s1.searchsorted(['apple', 'bread'], side='right')
         exp = np.array([3, 5], dtype=np.intp)
-        self.assert_numpy_array_equal(res_cat, exp)
-        self.assert_numpy_array_equal(res_ser, exp)
+        tm.assert_numpy_array_equal(res_cat, exp)
+        tm.assert_numpy_array_equal(res_ser, exp)

         # Searching for a single value that is not from the Categorical
-        self.assertRaises(ValueError, lambda: c1.searchsorted('cucumber'))
-        self.assertRaises(ValueError, lambda: s1.searchsorted('cucumber'))
+        pytest.raises(ValueError, lambda: c1.searchsorted('cucumber'))
+        pytest.raises(ValueError, lambda: s1.searchsorted('cucumber'))

         # Searching for multiple values, one of which is not from the
         # Categorical
-        self.assertRaises(ValueError,
-                          lambda: c1.searchsorted(['bread', 'cucumber']))
-        self.assertRaises(ValueError,
-                          lambda: s1.searchsorted(['bread', 'cucumber']))
+        pytest.raises(ValueError,
+                      lambda: c1.searchsorted(['bread', 'cucumber']))
+        pytest.raises(ValueError,
+                      lambda: s1.searchsorted(['bread', 'cucumber']))

         # searchsorted call for unordered Categorical
-        self.assertRaises(ValueError, lambda: c2.searchsorted('apple'))
-        self.assertRaises(ValueError, lambda: s2.searchsorted('apple'))
+        pytest.raises(ValueError, lambda: c2.searchsorted('apple'))
+        pytest.raises(ValueError, lambda: s2.searchsorted('apple'))

         with tm.assert_produces_warning(FutureWarning):
             res = c1.searchsorted(v=['bread'])
@@ -1629,37 +1524,28 @@ def test_deprecated_labels(self):
         exp = cat.codes
         with tm.assert_produces_warning(FutureWarning):
             res = cat.labels
-        self.assert_numpy_array_equal(res, exp)
+        tm.assert_numpy_array_equal(res, exp)

     def test_deprecated_from_array(self):
         # GH13854, `.from_array` is deprecated
         with tm.assert_produces_warning(FutureWarning):
             Categorical.from_array([0, 1])

-    def test_removed_names_produces_warning(self):
-
-        # 10482
-        with tm.assert_produces_warning(UserWarning):
-            Categorical([0, 1], name="a")
-
-        with tm.assert_produces_warning(UserWarning):
-            Categorical.from_codes([1, 2], ["a", "b", "c"], name="a")
-
     def test_datetime_categorical_comparison(self):
         dt_cat = pd.Categorical(
             pd.date_range('2014-01-01', periods=3), ordered=True)
-        self.assert_numpy_array_equal(dt_cat > dt_cat[0],
-                                      np.array([False, True, True]))
-        self.assert_numpy_array_equal(dt_cat[0] < dt_cat,
-                                      np.array([False, True, True]))
+        tm.assert_numpy_array_equal(dt_cat > dt_cat[0],
+                                    np.array([False, True, True]))
+        tm.assert_numpy_array_equal(dt_cat[0] < dt_cat,
+                                    np.array([False, True, True]))

     def test_reflected_comparison_with_scalars(self):
         # GH8658
         cat = pd.Categorical([1, 2, 3], ordered=True)
-        self.assert_numpy_array_equal(cat > cat[0],
-                                      np.array([False, True, True]))
-        self.assert_numpy_array_equal(cat[0] < cat,
-                                      np.array([False, True, True]))
+        tm.assert_numpy_array_equal(cat > cat[0],
+                                    np.array([False, True, True]))
+        tm.assert_numpy_array_equal(cat[0] < cat,
+                                    np.array([False, True, True]))

     def test_comparison_with_unknown_scalars(self):
         # https://github.com/pandas-dev/pandas/issues/9836#issuecomment-92123057
@@ -1667,15 +1553,15 @@ def test_comparison_with_unknown_scalars(self):
         # for unequal comps, but not for equal/not equal
         cat = pd.Categorical([1, 2, 3], ordered=True)

-        self.assertRaises(TypeError, lambda: cat < 4)
-        self.assertRaises(TypeError, lambda: cat > 4)
-        self.assertRaises(TypeError, lambda: 4 < cat)
-        self.assertRaises(TypeError, lambda: 4 > cat)
+        pytest.raises(TypeError, lambda: cat < 4)
+        pytest.raises(TypeError, lambda: cat > 4)
+        pytest.raises(TypeError, lambda: 4 < cat)
+        pytest.raises(TypeError, lambda: 4 > cat)

-        self.assert_numpy_array_equal(cat == 4,
-                                      np.array([False, False, False]))
-        self.assert_numpy_array_equal(cat != 4,
-                                      np.array([True, True, True]))
+        tm.assert_numpy_array_equal(cat == 4,
+                                    np.array([False, False, False]))
+        tm.assert_numpy_array_equal(cat != 4,
+                                    np.array([True, True, True]))

     def test_map(self):
         c = pd.Categorical(list('ABABC'), categories=list('CBA'),
@@ -1697,53 +1583,55 @@ def test_map(self):
         tm.assert_index_equal(result, Index(np.array([1] * 5, dtype=np.int64)))

     def test_validate_inplace(self):
-        cat = Categorical(['A','B','B','C','A'])
-        invalid_values = [1, "True", [1,2,3], 5.0]
+        cat = Categorical(['A', 'B', 'B', 'C', 'A'])
+        invalid_values = [1, "True", [1, 2, 3], 5.0]

         for value in invalid_values:
-            with self.assertRaises(ValueError):
+            with pytest.raises(ValueError):
                 cat.set_ordered(value=True, inplace=value)

-            with self.assertRaises(ValueError):
+            with pytest.raises(ValueError):
                 cat.as_ordered(inplace=value)

-            with self.assertRaises(ValueError):
+            with pytest.raises(ValueError):
                 cat.as_unordered(inplace=value)

-            with self.assertRaises(ValueError):
-                cat.set_categories(['X','Y','Z'], rename=True, inplace=value)
+            with pytest.raises(ValueError):
+                cat.set_categories(['X', 'Y', 'Z'], rename=True, inplace=value)

-            with self.assertRaises(ValueError):
-                cat.rename_categories(['X','Y','Z'], inplace=value)
+            with pytest.raises(ValueError):
+                cat.rename_categories(['X', 'Y', 'Z'], inplace=value)

-            with self.assertRaises(ValueError):
-                cat.reorder_categories(['X','Y','Z'], ordered=True, inplace=value)
+            with pytest.raises(ValueError):
+                cat.reorder_categories(
+                    ['X', 'Y', 'Z'], ordered=True, inplace=value)

-            with self.assertRaises(ValueError):
-                cat.add_categories(new_categories=['D','E','F'], inplace=value)
+            with pytest.raises(ValueError):
+                cat.add_categories(
+                    new_categories=['D', 'E', 'F'], inplace=value)

-            with self.assertRaises(ValueError):
-                cat.remove_categories(removals=['D','E','F'], inplace=value)
+            with pytest.raises(ValueError):
+                cat.remove_categories(removals=['D', 'E', 'F'], inplace=value)

-            with self.assertRaises(ValueError):
+            with pytest.raises(ValueError):
                 cat.remove_unused_categories(inplace=value)

-            with self.assertRaises(ValueError):
+            with pytest.raises(ValueError):
                 cat.sort_values(inplace=value)


-class TestCategoricalAsBlock(tm.TestCase):
-    _multiprocess_can_split_ = True
+class TestCategoricalAsBlock(object):

-    def setUp(self):
+    def setup_method(self, method):
         self.factor = Categorical(['a', 'b', 'b', 'a', 'a', 'c', 'c', 'c'])

         df = DataFrame({'value': np.random.randint(0, 10000, 100)})
         labels = ["{0} - {1}".format(i, i + 499) for i in range(0, 10000, 500)]
+        cat_labels = Categorical(labels, labels)

         df = df.sort_values(by=['value'], ascending=True)
-        df['value_group'] = pd.cut(df.value, range(0, 10500, 500), right=False,
-                                   labels=labels)
+        df['value_group'] = pd.cut(df.value, range(0, 10500, 500),
                                    right=False, labels=cat_labels)
         self.cat = df

     def test_dtypes(self):
@@ -1771,30 +1659,30 @@ def test_codes_dtypes(self):

         # GH 8453
         result = Categorical(['foo', 'bar', 'baz'])
-        self.assertTrue(result.codes.dtype == 'int8')
+        assert result.codes.dtype == 'int8'

         result = Categorical(['foo%05d' % i for i in range(400)])
-        self.assertTrue(result.codes.dtype == 'int16')
+        assert result.codes.dtype == 'int16'

         result = Categorical(['foo%05d' % i for i in range(40000)])
-        self.assertTrue(result.codes.dtype == 'int32')
+        assert result.codes.dtype == 'int32'

         # adding cats
         result = Categorical(['foo', 'bar', 'baz'])
-        self.assertTrue(result.codes.dtype == 'int8')
+        assert result.codes.dtype == 'int8'
         result = result.add_categories(['foo%05d' % i for i in range(400)])
-        self.assertTrue(result.codes.dtype == 'int16')
+        assert result.codes.dtype == 'int16'

         # removing cats
         result = result.remove_categories(['foo%05d' % i for i in range(300)])
-        self.assertTrue(result.codes.dtype == 'int8')
+        assert result.codes.dtype == 'int8'

     def test_basic(self):

         # test basic creation / coercion of categoricals
         s = Series(self.factor, name='A')
-        self.assertEqual(s.dtype, 'category')
-        self.assertEqual(len(s), len(self.factor))
+        assert s.dtype == 'category'
+        assert len(s) == len(self.factor)
         str(s.values)
         str(s)
@@ -1804,14 +1692,14 @@ def test_basic(self):
         tm.assert_series_equal(result, s)
         result = df.iloc[:, 0]
         tm.assert_series_equal(result, s)
-        self.assertEqual(len(df), len(self.factor))
+        assert len(df) == len(self.factor)
         str(df.values)
         str(df)

         df = DataFrame({'A': s})
         result = df['A']
         tm.assert_series_equal(result, s)
-        self.assertEqual(len(df), len(self.factor))
+        assert len(df) == len(self.factor)
         str(df.values)
         str(df)
@@ -1821,8 +1709,8 @@ def test_basic(self):
         result2 = df['B']
         tm.assert_series_equal(result1, s)
         tm.assert_series_equal(result2, s, check_names=False)
-        self.assertEqual(result2.name, 'B')
-        self.assertEqual(len(df), len(self.factor))
+        assert result2.name == 'B'
+        assert len(df) == len(self.factor)
         str(df.values)
         str(df)
@@ -1835,13 +1723,13 @@ def test_basic(self):

         expected = x.iloc[0].person_name
         result = x.person_name.iloc[0]
-        self.assertEqual(result, expected)
+        assert result == expected

         result = x.person_name[0]
-        self.assertEqual(result, expected)
+        assert result == expected

         result = x.person_name.loc[0]
-        self.assertEqual(result, expected)
+        assert result == expected

     def test_creation_astype(self):
         l = ["a", "b", "c", "a"]
@@ -1948,20 +1836,22 @@ def test_construction_frame(self):
         tm.assert_frame_equal(df, expected)

         # invalid (shape)
-        self.assertRaises(
+        pytest.raises(
             ValueError,
             lambda: DataFrame([pd.Categorical(list('abc')),
                                pd.Categorical(list('abdefg'))]))

         # ndim > 1
-        self.assertRaises(NotImplementedError,
-                          lambda: pd.Categorical(np.array([list('abcd')])))
+        pytest.raises(NotImplementedError,
+                      lambda: pd.Categorical(np.array([list('abcd')])))

     def test_reshaping(self):

-        p = tm.makePanel()
-        p['str'] = 'foo'
-        df = p.to_frame()
+        with catch_warnings(record=True):
+            p = tm.makePanel()
+            p['str'] = 'foo'
+            df = p.to_frame()
+
         df['category'] = df['str'].astype('category')
         result = df['category'].unstack()
@@ -2005,67 +1895,47 @@ def test_sideeffects_free(self):
         # other one, IF you specify copy!
         cat = Categorical(["a", "b", "c", "a"])
         s = pd.Series(cat, copy=True)
-        self.assertFalse(s.cat is cat)
+        assert s.cat is not cat
         s.cat.categories = [1, 2, 3]
         exp_s = np.array([1, 2, 3, 1], dtype=np.int64)
         exp_cat = np.array(["a", "b", "c", "a"], dtype=np.object_)
-        self.assert_numpy_array_equal(s.__array__(), exp_s)
-        self.assert_numpy_array_equal(cat.__array__(), exp_cat)
+        tm.assert_numpy_array_equal(s.__array__(), exp_s)
+        tm.assert_numpy_array_equal(cat.__array__(), exp_cat)

         # setting
         s[0] = 2
         exp_s2 = np.array([2, 2, 3, 1], dtype=np.int64)
-        self.assert_numpy_array_equal(s.__array__(), exp_s2)
-        self.assert_numpy_array_equal(cat.__array__(), exp_cat)
+        tm.assert_numpy_array_equal(s.__array__(), exp_s2)
+        tm.assert_numpy_array_equal(cat.__array__(), exp_cat)

         # however, copy is False by default
         # so this WILL change values
         cat = Categorical(["a", "b", "c", "a"])
         s = pd.Series(cat)
-        self.assertTrue(s.values is cat)
+        assert s.values is cat
         s.cat.categories = [1, 2, 3]
         exp_s = np.array([1, 2, 3, 1], dtype=np.int64)
-        self.assert_numpy_array_equal(s.__array__(), exp_s)
-        self.assert_numpy_array_equal(cat.__array__(), exp_s)
+        tm.assert_numpy_array_equal(s.__array__(), exp_s)
+        tm.assert_numpy_array_equal(cat.__array__(), exp_s)

         s[0] = 2
         exp_s2 = np.array([2, 2, 3, 1], dtype=np.int64)
-        self.assert_numpy_array_equal(s.__array__(), exp_s2)
-        self.assert_numpy_array_equal(cat.__array__(), exp_s2)
+        tm.assert_numpy_array_equal(s.__array__(), exp_s2)
+        tm.assert_numpy_array_equal(cat.__array__(), exp_s2)

     def test_nan_handling(self):

-        # Nans are represented as -1 in labels
+        # NaNs are represented as -1 in labels
         s = Series(Categorical(["a", "b", np.nan, "a"]))
-        self.assert_index_equal(s.cat.categories, Index(["a", "b"]))
-        self.assert_numpy_array_equal(s.values.codes,
-                                      np.array([0, 1, -1, 0], dtype=np.int8))
-
-        # If categories have nan included, the label should point to that
-        # instead
-        with tm.assert_produces_warning(FutureWarning):
-            s2 = Series(Categorical(["a", "b", np.nan, "a"],
-                                    categories=["a", "b", np.nan]))
-
-        exp_cat = Index(["a", "b", np.nan])
-        self.assert_index_equal(s2.cat.categories, exp_cat)
-        self.assert_numpy_array_equal(s2.values.codes,
-                                      np.array([0, 1, 2, 0], dtype=np.int8))
-
-        # Changing categories should also make the replaced category np.nan
-        s3 = Series(Categorical(["a", "b", "c", "a"]))
-        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
-            s3.cat.categories = ["a", "b", np.nan]
-
-        exp_cat = Index(["a", "b", np.nan])
-        self.assert_index_equal(s3.cat.categories, exp_cat)
-        self.assert_numpy_array_equal(s3.values.codes,
-                                      np.array([0, 1, 2, 0], dtype=np.int8))
+        tm.assert_index_equal(s.cat.categories, Index(["a", "b"]))
+        tm.assert_numpy_array_equal(s.values.codes,
+                                    np.array([0, 1, -1, 0], dtype=np.int8))

     def test_cat_accessor(self):
         s = Series(Categorical(["a", "b", np.nan, "a"]))
-        self.assert_index_equal(s.cat.categories, Index(["a", "b"]))
-        self.assertEqual(s.cat.ordered, False)
+        tm.assert_index_equal(s.cat.categories, Index(["a", "b"]))
+        assert not s.cat.ordered
+
+        exp = Categorical(["a", "b", np.nan, "a"], categories=["b", "a"])
         s.cat.set_categories(["b", "a"], inplace=True)
         tm.assert_categorical_equal(s.values, exp)
@@ -2073,10 +1943,9 @@ def test_cat_accessor(self):
         res = s.cat.set_categories(["b", "a"])
         tm.assert_categorical_equal(res.values, exp)

-        exp = Categorical(["a", "b", np.nan, "a"], categories=["b", "a"])
         s[:] = "a"
         s = s.cat.remove_unused_categories()
-        self.assert_index_equal(s.cat.categories, Index(["a"]))
+        tm.assert_index_equal(s.cat.categories, Index(["a"]))

     def test_sequence_like(self):
@@ -2104,15 +1973,15 @@ def test_sequence_like(self):
     def test_series_delegations(self):

         # invalid accessor
-        self.assertRaises(AttributeError, lambda: Series([1, 2, 3]).cat)
-        tm.assertRaisesRegexp(
+        pytest.raises(AttributeError, lambda: Series([1, 2, 3]).cat)
+        tm.assert_raises_regex(
             AttributeError,
             r"Can only use .cat accessor with a 'category' dtype",
             lambda: Series([1, 2, 3]).cat)
-        self.assertRaises(AttributeError, lambda: Series(['a', 'b', 'c']).cat)
-        self.assertRaises(AttributeError, lambda: Series(np.arange(5.)).cat)
-        self.assertRaises(AttributeError,
-                          lambda: Series([Timestamp('20130101')]).cat)
+        pytest.raises(AttributeError, lambda: Series(['a', 'b', 'c']).cat)
+        pytest.raises(AttributeError, lambda: Series(np.arange(5.)).cat)
+        pytest.raises(AttributeError,
+                      lambda: Series([Timestamp('20130101')]).cat)

         # Series should delegate calls to '.categories', '.codes', '.ordered'
         # and the methods '.set_categories()' 'drop_unused_categories()' to the
@@ -2122,16 +1991,16 @@ def test_series_delegations(self):
         tm.assert_index_equal(s.cat.categories, exp_categories)
         s.cat.categories = [1, 2, 3]
         exp_categories = Index([1, 2, 3])
-        self.assert_index_equal(s.cat.categories, exp_categories)
+        tm.assert_index_equal(s.cat.categories, exp_categories)

         exp_codes = Series([0, 1, 2, 0], dtype='int8')
         tm.assert_series_equal(s.cat.codes, exp_codes)

-        self.assertEqual(s.cat.ordered, True)
+        assert s.cat.ordered
         s = s.cat.as_unordered()
-        self.assertEqual(s.cat.ordered, False)
+        assert not s.cat.ordered
         s.cat.as_ordered(inplace=True)
-        self.assertEqual(s.cat.ordered, True)
+        assert s.cat.ordered

         # reorder
         s = Series(Categorical(["a", "b", "c", "a"], ordered=True))
@@ -2139,8 +2008,8 @@ def test_series_delegations(self):
         exp_values = np.array(["a", "b", "c", "a"], dtype=np.object_)
         s = s.cat.set_categories(["c", "b", "a"])
         tm.assert_index_equal(s.cat.categories, exp_categories)
-        self.assert_numpy_array_equal(s.values.__array__(), exp_values)
-        self.assert_numpy_array_equal(s.__array__(), exp_values)
+        tm.assert_numpy_array_equal(s.values.__array__(), exp_values)
+        tm.assert_numpy_array_equal(s.__array__(), exp_values)

         # remove unused categories
         s = Series(Categorical(["a", "b", "b", "a"], categories=["a", "b", "c"
@@ -2148,16 +2017,16 @@ def test_series_delegations(self):
         exp_categories = Index(["a", "b"])
         exp_values = np.array(["a", "b", "b", "a"], dtype=np.object_)
         s = s.cat.remove_unused_categories()
-        self.assert_index_equal(s.cat.categories, exp_categories)
-        self.assert_numpy_array_equal(s.values.__array__(), exp_values)
-        self.assert_numpy_array_equal(s.__array__(), exp_values)
+        tm.assert_index_equal(s.cat.categories, exp_categories)
+        tm.assert_numpy_array_equal(s.values.__array__(), exp_values)
+        tm.assert_numpy_array_equal(s.__array__(), exp_values)

         # This method is likely to be confused, so test that it raises an error
         # on wrong inputs:
         def f():
             s.set_categories([4, 3, 2, 1])

-        self.assertRaises(Exception, f)
+        pytest.raises(Exception, f)
         # right: s.cat.set_categories([4,3,2,1])

     def test_series_functions_no_warnings(self):
@@ -2169,9 +2038,10 @@ def test_series_functions_no_warnings(self):

     def test_assignment_to_dataframe(self):
         # assignment
-        df = DataFrame({'value': np.array(np.random.randint(0, 10000, 100),
-                                          dtype='int32')})
-        labels = ["{0} - {1}".format(i, i + 499) for i in range(0, 10000, 500)]
+        df = DataFrame({'value': np.array(
+            np.random.randint(0, 10000, 100), dtype='int32')})
+        labels = Categorical(["{0} - {1}".format(i, i + 499)
+                              for i in range(0, 10000, 500)])

         df = df.sort_values(by=['value'], ascending=True)
         s = pd.cut(df.value, range(0, 10500, 500), right=False, labels=labels)
@@ -2195,11 +2065,11 @@ def test_assignment_to_dataframe(self):
         result1 = df['D']
         result2 = df['E']
-        self.assert_categorical_equal(result1._data._block.values, d)
+        tm.assert_categorical_equal(result1._data._block.values, d)

         # sorting
         s.name = 'E'
-        self.assert_series_equal(result2.sort_index(), s.sort_index())
+        tm.assert_series_equal(result2.sort_index(), s.sort_index())

         cat = pd.Categorical([1, 2, 3, 10], categories=[1, 2, 3, 4, 10])
         df = pd.DataFrame(pd.Series(cat))
@@ -2208,7 +2078,7 @@ def test_describe(self):

         # Categoricals should not show up together with numerical columns
         result = self.cat.describe()
-        self.assertEqual(len(result.columns), 1)
+        assert len(result.columns) == 1

         # In a frame, describe() for the cat should be the same as for string
         # arrays (count, unique, top, freq)
@@ -2224,82 +2094,82 @@ def test_describe(self):
         cat = pd.Series(pd.Categorical(["a", "b", "c", "c"]))
         df3 = pd.DataFrame({"cat": cat, "s": ["a", "b", "c", "c"]})
         res = df3.describe()
-        self.assert_numpy_array_equal(res["cat"].values, res["s"].values)
+        tm.assert_numpy_array_equal(res["cat"].values, res["s"].values)

     def test_repr(self):
         a = pd.Series(pd.Categorical([1, 2, 3, 4]))
         exp = u("0 1\n1 2\n2 3\n3 4\n" +
                 "dtype: category\nCategories (4, int64): [1, 2, 3, 4]")
-        self.assertEqual(exp, a.__unicode__())
+        assert exp == a.__unicode__()

         a = pd.Series(pd.Categorical(["a", "b"] * 25))
         exp = u("0 a\n1 b\n" + " ..\n" + "48 a\n49 b\n" +
-                "dtype: category\nCategories (2, object): [a, b]")
+                "Length: 50, dtype: category\nCategories (2, object): [a, b]")
         with option_context("display.max_rows", 5):
-            self.assertEqual(exp, repr(a))
+            assert exp == repr(a)

         levs = list("abcdefghijklmnopqrstuvwxyz")
         a = pd.Series(pd.Categorical(
             ["a", "b"], categories=levs, ordered=True))
         exp = u("0 a\n1 b\n" + "dtype: category\n"
                 "Categories (26, object): [a < b < c < d ... w < x < y < z]")
-        self.assertEqual(exp, a.__unicode__())
+        assert exp == a.__unicode__()

     def test_categorical_repr(self):
         c = pd.Categorical([1, 2, 3])
         exp = """[1, 2, 3]
Categories (3, int64): [1, 2, 3]"""

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical([1, 2, 3, 1, 2, 3], categories=[1, 2, 3])
         exp = """[1, 2, 3, 1, 2, 3]
Categories (3, int64): [1, 2, 3]"""

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical([1, 2, 3, 4, 5] * 10)
         exp = """[1, 2, 3, 4, 5, ..., 1, 2, 3, 4, 5]
Length: 50
Categories (5, int64): [1, 2, 3, 4, 5]"""

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical(np.arange(20))
         exp = """[0, 1, 2, 3, 4, ..., 15, 16, 17, 18, 19]
Length: 20
Categories (20, int64): [0, 1, 2, 3, ..., 16, 17, 18, 19]"""

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

     def test_categorical_repr_ordered(self):
         c = pd.Categorical([1, 2, 3], ordered=True)
         exp = """[1, 2, 3]
Categories (3, int64): [1 < 2 < 3]"""

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical([1, 2, 3, 1, 2, 3], categories=[1, 2, 3],
                            ordered=True)
         exp = """[1, 2, 3, 1, 2, 3]
Categories (3, int64): [1 < 2 < 3]"""

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical([1, 2, 3, 4, 5] * 10, ordered=True)
         exp = """[1, 2, 3, 4, 5, ..., 1, 2, 3, 4, 5]
Length: 50
Categories (5, int64): [1 < 2 < 3 < 4 < 5]"""

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical(np.arange(20), ordered=True)
         exp = """[0, 1, 2, 3, 4, ..., 15, 16, 17, 18, 19]
Length: 20
Categories (20, int64): [0 < 1 < 2 < 3 ... 16 < 17 < 18 < 19]"""

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

     def test_categorical_repr_datetime(self):
         idx = pd.date_range('2011-01-01 09:00', freq='H', periods=5)
@@ -2314,7 +2184,7 @@ def test_categorical_repr_datetime(self):
             "2011-01-01 10:00:00, 2011-01-01 11:00:00,\n"
            "                     2011-01-01 12:00:00, "
            "2011-01-01 13:00:00]""")

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical(idx.append(idx), categories=idx)
         exp = (
@@ -2327,7 +2197,7 @@ def test_categorical_repr_datetime(self):
            "                     2011-01-01 12:00:00, "
            "2011-01-01 13:00:00]")

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         idx = pd.date_range('2011-01-01 09:00', freq='H', periods=5,
                             tz='US/Eastern')
@@ -2343,7 +2213,7 @@ def test_categorical_repr_datetime(self):
            "                                             "
            "2011-01-01 13:00:00-05:00]")

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical(idx.append(idx), categories=idx)
         exp = (
@@ -2359,7 +2229,7 @@ def test_categorical_repr_datetime(self):
            "                                             "
            "2011-01-01 13:00:00-05:00]")

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

     def test_categorical_repr_datetime_ordered(self):
         idx = pd.date_range('2011-01-01 09:00', freq='H', periods=5)
@@ -2368,14 +2238,14 @@ def test_categorical_repr_datetime_ordered(self):
Categories (5, datetime64[ns]): [2011-01-01 09:00:00 < 2011-01-01 10:00:00 < 2011-01-01 11:00:00 <
                                 2011-01-01 12:00:00 < 2011-01-01 13:00:00]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical(idx.append(idx), categories=idx, ordered=True)
         exp = """[2011-01-01 09:00:00, 2011-01-01 10:00:00, 2011-01-01 11:00:00, 2011-01-01 12:00:00, 2011-01-01 13:00:00, 2011-01-01 09:00:00, 2011-01-01 10:00:00, 2011-01-01 11:00:00, 2011-01-01 12:00:00, 2011-01-01 13:00:00]
Categories (5, datetime64[ns]): [2011-01-01 09:00:00 < 2011-01-01 10:00:00 < 2011-01-01 11:00:00 <
                                 2011-01-01 12:00:00 < 2011-01-01 13:00:00]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         idx = pd.date_range('2011-01-01 09:00', freq='H', periods=5,
                             tz='US/Eastern')
@@ -2385,73 +2255,73 @@ def test_categorical_repr_datetime_ordered(self):
                                              2011-01-01 11:00:00-05:00 < 2011-01-01 12:00:00-05:00 <
                                              2011-01-01 13:00:00-05:00]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical(idx.append(idx), categories=idx, ordered=True)
         exp = """[2011-01-01 09:00:00-05:00, 2011-01-01 10:00:00-05:00, 2011-01-01 11:00:00-05:00, 2011-01-01 12:00:00-05:00, 2011-01-01 13:00:00-05:00, 2011-01-01 09:00:00-05:00, 2011-01-01 10:00:00-05:00, 2011-01-01 11:00:00-05:00, 2011-01-01 12:00:00-05:00, 2011-01-01 13:00:00-05:00]
Categories (5, datetime64[ns, US/Eastern]): [2011-01-01 09:00:00-05:00 < 2011-01-01 10:00:00-05:00 <
                                             2011-01-01 11:00:00-05:00 < 2011-01-01 12:00:00-05:00 <
-                                            2011-01-01 13:00:00-05:00]"""
+                                            2011-01-01 13:00:00-05:00]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

     def test_categorical_repr_period(self):
         idx = pd.period_range('2011-01-01 09:00', freq='H', periods=5)
         c = pd.Categorical(idx)
         exp = """[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00]
Categories (5, period[H]): [2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00,
-                            2011-01-01 13:00]"""
+                            2011-01-01 13:00]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical(idx.append(idx), categories=idx)
         exp = """[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00, 2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00]
Categories (5, period[H]): [2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00,
-                            2011-01-01 13:00]"""
+                            2011-01-01 13:00]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         idx = pd.period_range('2011-01', freq='M', periods=5)
         c = pd.Categorical(idx)
         exp = """[2011-01, 2011-02, 2011-03, 2011-04, 2011-05]
Categories (5, period[M]): [2011-01, 2011-02, 2011-03, 2011-04, 2011-05]"""

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical(idx.append(idx), categories=idx)
         exp = """[2011-01, 2011-02, 2011-03, 2011-04, 2011-05, 2011-01, 2011-02, 2011-03, 2011-04, 2011-05]
-Categories (5, period[M]): [2011-01, 2011-02, 2011-03, 2011-04, 2011-05]"""
+Categories (5, period[M]): [2011-01, 2011-02, 2011-03, 2011-04, 2011-05]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

     def test_categorical_repr_period_ordered(self):
         idx = pd.period_range('2011-01-01 09:00', freq='H', periods=5)
         c = pd.Categorical(idx, ordered=True)
         exp = """[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00]
Categories (5, period[H]): [2011-01-01 09:00 < 2011-01-01 10:00 < 2011-01-01 11:00 < 2011-01-01 12:00 <
-                            2011-01-01 13:00]"""
+                            2011-01-01 13:00]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical(idx.append(idx), categories=idx, ordered=True)
         exp = """[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00, 2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00]
Categories (5, period[H]): [2011-01-01 09:00 < 2011-01-01 10:00 < 2011-01-01 11:00 < 2011-01-01 12:00 <
-                            2011-01-01 13:00]"""
+                            2011-01-01 13:00]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         idx = pd.period_range('2011-01', freq='M', periods=5)
         c = pd.Categorical(idx, ordered=True)
         exp = """[2011-01, 2011-02, 2011-03, 2011-04, 2011-05]
Categories (5, period[M]): [2011-01 < 2011-02 < 2011-03 < 2011-04 < 2011-05]"""

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical(idx.append(idx), categories=idx, ordered=True)
         exp = """[2011-01, 2011-02, 2011-03, 2011-04, 2011-05, 2011-01, 2011-02, 2011-03, 2011-04, 2011-05]
-Categories (5, period[M]): [2011-01 < 2011-02 < 2011-03 < 2011-04 < 2011-05]"""
+Categories (5, period[M]): [2011-01 < 2011-02 < 2011-03 < 2011-04 < 2011-05]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

     def test_categorical_repr_timedelta(self):
         idx = pd.timedelta_range('1 days', periods=5)
@@ -2459,13 +2329,13 @@ def test_categorical_repr_timedelta(self):
         exp = """[1 days, 2 days, 3 days, 4 days, 5 days]
Categories (5, timedelta64[ns]): [1 days, 2 days, 3 days, 4 days, 5 days]"""

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical(idx.append(idx), categories=idx)
         exp = """[1 days, 2 days, 3 days, 4 days, 5 days, 1 days, 2 days, 3 days, 4 days, 5 days]
-Categories (5, timedelta64[ns]): [1 days, 2 days, 3 days, 4 days, 5 days]"""
+Categories (5, timedelta64[ns]): [1 days, 2 days, 3 days, 4 days, 5 days]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         idx = pd.timedelta_range('1 hours', periods=20)
         c = pd.Categorical(idx)
@@ -2473,32 +2343,32 @@ def test_categorical_repr_timedelta(self):
Length: 20
Categories (20, timedelta64[ns]): [0 days 01:00:00, 1 days 01:00:00, 2 days 01:00:00,
                                   3 days 01:00:00, ..., 16 days 01:00:00, 17 days 01:00:00,
-                                  18 days 01:00:00, 19 days 01:00:00]"""
+                                  18 days 01:00:00, 19 days 01:00:00]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical(idx.append(idx), categories=idx)
         exp = """[0 days 01:00:00, 1 days 01:00:00, 2 days 01:00:00, 3 days 01:00:00, 4 days 01:00:00, ..., 15 days 01:00:00, 16 days 01:00:00, 17 days 01:00:00, 18 days 01:00:00, 19 days 01:00:00]
Length: 40
Categories (20, timedelta64[ns]): [0 days 01:00:00, 1 days 01:00:00, 2 days 01:00:00,
                                   3 days 01:00:00, ..., 16 days 01:00:00, 17 days 01:00:00,
-                                  18 days 01:00:00, 19 days 01:00:00]"""
+                                  18 days 01:00:00, 19 days 01:00:00]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

     def test_categorical_repr_timedelta_ordered(self):
         idx = pd.timedelta_range('1 days', periods=5)
         c = pd.Categorical(idx, ordered=True)
         exp = """[1 days, 2 days, 3 days, 4 days, 5 days]
-Categories (5, timedelta64[ns]): [1 days < 2 days < 3 days < 4 days < 5 days]"""
+Categories (5, timedelta64[ns]): [1 days < 2 days < 3 days < 4 days < 5 days]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical(idx.append(idx), categories=idx, ordered=True)
         exp = """[1 days, 2 days, 3 days, 4 days, 5 days, 1 days, 2 days, 3 days, 4 days, 5 days]
-Categories (5, timedelta64[ns]): [1 days < 2 days < 3 days < 4 days < 5 days]"""
+Categories (5, timedelta64[ns]): [1 days < 2 days < 3 days < 4 days < 5 days]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         idx = pd.timedelta_range('1 hours', periods=20)
         c = pd.Categorical(idx, ordered=True)
@@ -2506,18 +2376,18 @@ def test_categorical_repr_timedelta_ordered(self):
Length: 20
Categories (20, timedelta64[ns]): [0 days 01:00:00 < 1 days 01:00:00 < 2 days 01:00:00 <
                                   3 days 01:00:00 ... 16 days 01:00:00 < 17 days 01:00:00 <
-                                  18 days 01:00:00 < 19 days 01:00:00]"""
+                                  18 days 01:00:00 < 19 days 01:00:00]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

         c = pd.Categorical(idx.append(idx), categories=idx, ordered=True)
         exp = """[0 days 01:00:00, 1 days 01:00:00, 2 days 01:00:00, 3 days 01:00:00, 4 days 01:00:00, ..., 15 days 01:00:00, 16 days 01:00:00, 17 days 01:00:00, 18 days 01:00:00, 19 days 01:00:00]
Length: 40
Categories (20, timedelta64[ns]): [0 days 01:00:00 < 1 days 01:00:00 < 2 days 01:00:00 <
                                   3 days 01:00:00 ... 16 days 01:00:00 < 17 days 01:00:00 <
-                                  18 days 01:00:00 < 19 days 01:00:00]"""
+                                  18 days 01:00:00 < 19 days 01:00:00]"""  # noqa

-        self.assertEqual(repr(c), exp)
+        assert repr(c) == exp

     def test_categorical_series_repr(self):
         s = pd.Series(pd.Categorical([1, 2, 3]))
         exp = """0 1
@@ -2527,7 +2397,7 @@ def test_categorical_series_repr(self):
dtype: category
Categories (3, int64): [1, 2, 3]"""

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

         s = pd.Series(pd.Categorical(np.arange(10)))
         exp = """0 0
@@ -2543,7 +2413,7 @@ def test_categorical_series_repr(self):
dtype: category
Categories (10, int64): [0, 1, 2, 3, ..., 6, 7, 8, 9]"""

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

     def test_categorical_series_repr_ordered(self):
         s = pd.Series(pd.Categorical([1, 2, 3], ordered=True))
         exp = """0 1
@@ -2553,7 +2423,7 @@ def test_categorical_series_repr_ordered(self):
dtype: category
Categories (3, int64): [1 < 2 < 3]"""

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

         s = pd.Series(pd.Categorical(np.arange(10), ordered=True))
         exp = """0 0
@@ -2569,7 +2439,7 @@ def test_categorical_series_repr_ordered(self):
dtype: category
Categories (10, int64): [0 < 1 < 2 < 3 ... 6 < 7 < 8 < 9]"""

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

     def test_categorical_series_repr_datetime(self):
         idx = pd.date_range('2011-01-01 09:00', freq='H', periods=5)
@@ -2581,9 +2451,9 @@ def test_categorical_series_repr_datetime(self):
4   2011-01-01 13:00:00
dtype: category
Categories (5, datetime64[ns]): [2011-01-01 09:00:00, 2011-01-01 10:00:00, 2011-01-01 11:00:00,
-                                 2011-01-01 12:00:00, 2011-01-01 13:00:00]"""
+                                 2011-01-01 12:00:00, 2011-01-01 13:00:00]"""  # noqa

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

         idx = pd.date_range('2011-01-01 09:00', freq='H', periods=5,
                             tz='US/Eastern')
@@ -2596,9 +2466,9 @@ def test_categorical_series_repr_datetime(self):
dtype: category
Categories (5, datetime64[ns, US/Eastern]): [2011-01-01 09:00:00-05:00, 2011-01-01 10:00:00-05:00,
                                             2011-01-01 11:00:00-05:00, 2011-01-01 12:00:00-05:00,
-                                            2011-01-01 13:00:00-05:00]"""
+                                            2011-01-01 13:00:00-05:00]"""  # noqa

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

     def test_categorical_series_repr_datetime_ordered(self):
         idx = pd.date_range('2011-01-01 09:00', freq='H', periods=5)
@@ -2610,9 +2480,9 @@ def test_categorical_series_repr_datetime_ordered(self):
4   2011-01-01 13:00:00
dtype: category
Categories (5, datetime64[ns]): [2011-01-01 09:00:00 < 2011-01-01 10:00:00 < 2011-01-01 11:00:00 <
-                                 2011-01-01 12:00:00 < 2011-01-01 13:00:00]"""
+                                 2011-01-01 12:00:00 < 2011-01-01 13:00:00]"""  # noqa

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

         idx = pd.date_range('2011-01-01 09:00', freq='H', periods=5,
                             tz='US/Eastern')
@@ -2625,9 +2495,9 @@ def test_categorical_series_repr_datetime_ordered(self):
dtype: category
Categories (5, datetime64[ns, US/Eastern]): [2011-01-01 09:00:00-05:00 < 2011-01-01 10:00:00-05:00 <
                                             2011-01-01 11:00:00-05:00 < 2011-01-01 12:00:00-05:00 <
-                                            2011-01-01 13:00:00-05:00]"""
+                                            2011-01-01 13:00:00-05:00]"""  # noqa

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

     def test_categorical_series_repr_period(self):
         idx = pd.period_range('2011-01-01 09:00', freq='H', periods=5)
@@ -2639,9 +2509,9 @@ def test_categorical_series_repr_period(self):
4   2011-01-01 13:00
dtype: category
Categories (5, period[H]): [2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00,
-                            2011-01-01 13:00]"""
+                            2011-01-01 13:00]"""  # noqa

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

         idx = pd.period_range('2011-01', freq='M', periods=5)
         s = pd.Series(pd.Categorical(idx))
@@ -2653,7 +2523,7 @@ def test_categorical_series_repr_period(self):
dtype: category
Categories (5, period[M]): [2011-01, 2011-02, 2011-03, 2011-04, 2011-05]"""

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

     def test_categorical_series_repr_period_ordered(self):
         idx = pd.period_range('2011-01-01 09:00', freq='H', periods=5)
@@ -2665,9 +2535,9 @@ def test_categorical_series_repr_period_ordered(self):
4   2011-01-01 13:00
dtype: category
Categories (5, period[H]): [2011-01-01 09:00 < 2011-01-01 10:00 < 2011-01-01 11:00 < 2011-01-01 12:00 <
-                            2011-01-01 13:00]"""
+                            2011-01-01 13:00]"""  # noqa

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

         idx = pd.period_range('2011-01', freq='M', periods=5)
         s = pd.Series(pd.Categorical(idx, ordered=True))
@@ -2679,7 +2549,7 @@ def test_categorical_series_repr_period_ordered(self):
dtype: category
Categories (5, period[M]): [2011-01 < 2011-02 < 2011-03 < 2011-04 < 2011-05]"""

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

     def test_categorical_series_repr_timedelta(self):
         idx = pd.timedelta_range('1 days', periods=5)
@@ -2692,7 +2562,7 @@ def test_categorical_series_repr_timedelta(self):
dtype: category
Categories (5, timedelta64[ns]): [1 days, 2 days, 3 days, 4 days, 5 days]"""

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

         idx = pd.timedelta_range('1 hours', periods=10)
         s = pd.Series(pd.Categorical(idx))
@@ -2709,9 +2579,9 @@ def test_categorical_series_repr_timedelta(self):
dtype: category
Categories (10, timedelta64[ns]): [0 days 01:00:00, 1 days 01:00:00, 2 days 01:00:00,
                                   3 days 01:00:00, ..., 6 days 01:00:00, 7 days 01:00:00,
-                                  8 days 01:00:00, 9 days 01:00:00]"""
+                                  8 days 01:00:00, 9 days 01:00:00]"""  # noqa

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

     def test_categorical_series_repr_timedelta_ordered(self):
         idx = pd.timedelta_range('1 days', periods=5)
@@ -2722,9 +2592,9 @@ def test_categorical_series_repr_timedelta_ordered(self):
3   4 days
4   5 days
dtype: category
-Categories (5, timedelta64[ns]): [1 days < 2 days < 3 days < 4 days < 5 days]"""
+Categories (5, timedelta64[ns]): [1 days < 2 days < 3 days < 4 days < 5 days]"""  # noqa

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

         idx = pd.timedelta_range('1 hours', periods=10)
         s = pd.Series(pd.Categorical(idx, ordered=True))
@@ -2741,27 +2611,27 @@ def test_categorical_series_repr_timedelta_ordered(self):
dtype: category
Categories (10, timedelta64[ns]): [0 days 01:00:00 < 1 days 01:00:00 < 2 days 01:00:00 <
                                   3 days 01:00:00 ... 6 days 01:00:00 < 7 days 01:00:00 <
-                                  8 days 01:00:00 < 9 days 01:00:00]"""
+                                  8 days 01:00:00 < 9 days 01:00:00]"""  # noqa

-        self.assertEqual(repr(s), exp)
+        assert repr(s) == exp

     def test_categorical_index_repr(self):
         idx = pd.CategoricalIndex(pd.Categorical([1, 2, 3]))
-        exp = """CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category')"""
-        self.assertEqual(repr(idx), exp)
+        exp = """CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=False, dtype='category')"""  # noqa
+        assert repr(idx) == exp

         i = pd.CategoricalIndex(pd.Categorical(np.arange(10)))
-        exp = """CategoricalIndex([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], categories=[0, 1, 2, 3, 4, 5, 6, 7, ...], ordered=False, dtype='category')"""
-        self.assertEqual(repr(i), exp)
+        exp = """CategoricalIndex([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], categories=[0, 1, 2, 3, 4, 5, 6, 7, ...], ordered=False, dtype='category')"""  # noqa
+        assert repr(i) == exp

     def test_categorical_index_repr_ordered(self):
         i = pd.CategoricalIndex(pd.Categorical([1, 2, 3], ordered=True))
-        exp = """CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=True, dtype='category')"""
-        self.assertEqual(repr(i), exp)
+        exp = """CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=True, dtype='category')"""  # noqa
+        assert repr(i) == exp

         i = pd.CategoricalIndex(pd.Categorical(np.arange(10), ordered=True))
-        exp = """CategoricalIndex([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], categories=[0, 1, 2, 3, 4, 5, 6, 7, ...], ordered=True, dtype='category')"""
-        self.assertEqual(repr(i), exp)
+        exp = """CategoricalIndex([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], categories=[0, 1, 2, 3, 4, 5, 6, 7, ...], ordered=True, dtype='category')"""  # noqa
+        assert repr(i) == exp

     def test_categorical_index_repr_datetime(self):
         idx = pd.date_range('2011-01-01 09:00', freq='H', periods=5)
@@ -2769,9 +2639,9 @@ def test_categorical_index_repr_datetime(self):
         exp = """CategoricalIndex(['2011-01-01 09:00:00', '2011-01-01 10:00:00',
                  '2011-01-01 11:00:00', '2011-01-01 12:00:00',
                  '2011-01-01 13:00:00'],
-                 categories=[2011-01-01 09:00:00, 2011-01-01 10:00:00, 2011-01-01 11:00:00, 2011-01-01 12:00:00, 2011-01-01 13:00:00], ordered=False, dtype='category')"""
+                 categories=[2011-01-01 09:00:00, 2011-01-01 10:00:00, 2011-01-01 11:00:00, 2011-01-01 12:00:00, 2011-01-01 13:00:00], ordered=False, dtype='category')"""  # noqa

-        self.assertEqual(repr(i), exp)
+        assert repr(i) == exp

         idx = pd.date_range('2011-01-01 09:00', freq='H', periods=5,
                             tz='US/Eastern')
@@ -2779,9 +2649,9 @@ def test_categorical_index_repr_datetime(self):
         exp = """CategoricalIndex(['2011-01-01 09:00:00-05:00', '2011-01-01 10:00:00-05:00',
                  '2011-01-01 11:00:00-05:00', '2011-01-01 12:00:00-05:00',
                  '2011-01-01 13:00:00-05:00'],
-                 categories=[2011-01-01 09:00:00-05:00, 2011-01-01 10:00:00-05:00, 2011-01-01 11:00:00-05:00, 2011-01-01 12:00:00-05:00, 2011-01-01 13:00:00-05:00], ordered=False, dtype='category')"""
+                 categories=[2011-01-01 09:00:00-05:00, 2011-01-01 10:00:00-05:00, 2011-01-01 11:00:00-05:00, 2011-01-01 12:00:00-05:00, 2011-01-01 13:00:00-05:00], ordered=False, dtype='category')"""  # noqa

-        self.assertEqual(repr(i), exp)
+        assert repr(i) == exp

     def test_categorical_index_repr_datetime_ordered(self):
         idx = pd.date_range('2011-01-01 09:00', freq='H', periods=5)
@@ -2789,9 +2659,9 @@ def test_categorical_index_repr_datetime_ordered(self):
         exp = """CategoricalIndex(['2011-01-01 09:00:00', '2011-01-01 10:00:00',
                  '2011-01-01 11:00:00', '2011-01-01 12:00:00',
                  '2011-01-01 13:00:00'],
-                 categories=[2011-01-01 09:00:00, 2011-01-01 10:00:00, 2011-01-01 11:00:00, 2011-01-01 12:00:00, 2011-01-01 13:00:00], ordered=True, dtype='category')"""
+                 categories=[2011-01-01 09:00:00, 2011-01-01 10:00:00, 2011-01-01 11:00:00, 2011-01-01 12:00:00, 2011-01-01 13:00:00], ordered=True, dtype='category')"""  # noqa

-        self.assertEqual(repr(i), exp)
+        assert repr(i) == exp

         idx = pd.date_range('2011-01-01 09:00', freq='H', periods=5,
                             tz='US/Eastern')
@@ -2799,9 +2669,9 @@ def test_categorical_index_repr_datetime_ordered(self):
         exp = """CategoricalIndex(['2011-01-01 09:00:00-05:00', '2011-01-01 10:00:00-05:00',
                  '2011-01-01 11:00:00-05:00', '2011-01-01 12:00:00-05:00',
                  '2011-01-01 13:00:00-05:00'],
-                 categories=[2011-01-01 09:00:00-05:00, 2011-01-01 10:00:00-05:00, 2011-01-01 11:00:00-05:00, 2011-01-01 12:00:00-05:00, 2011-01-01 13:00:00-05:00], ordered=True, dtype='category')"""
+                 categories=[2011-01-01 09:00:00-05:00, 2011-01-01 10:00:00-05:00, 2011-01-01 11:00:00-05:00, 2011-01-01 12:00:00-05:00, 2011-01-01 13:00:00-05:00], ordered=True, dtype='category')"""  # noqa

-        self.assertEqual(repr(i), exp)
+        assert repr(i) == exp

         i = pd.CategoricalIndex(pd.Categorical(idx.append(idx), ordered=True))
         exp = """CategoricalIndex(['2011-01-01 09:00:00-05:00', '2011-01-01 10:00:00-05:00',
@@ -2809,68 +2679,68 @@ def test_categorical_index_repr_datetime_ordered(self):
                  '2011-01-01 13:00:00-05:00', '2011-01-01 09:00:00-05:00',
                  '2011-01-01 10:00:00-05:00', '2011-01-01 11:00:00-05:00',
                  '2011-01-01 12:00:00-05:00', '2011-01-01 13:00:00-05:00'],
-                 categories=[2011-01-01 09:00:00-05:00, 2011-01-01 10:00:00-05:00, 2011-01-01 11:00:00-05:00, 2011-01-01 12:00:00-05:00, 2011-01-01 13:00:00-05:00], ordered=True, dtype='category')"""
+                 categories=[2011-01-01 09:00:00-05:00, 2011-01-01 10:00:00-05:00, 2011-01-01 11:00:00-05:00, 2011-01-01 12:00:00-05:00, 2011-01-01 13:00:00-05:00], ordered=True, dtype='category')"""  # noqa

-        self.assertEqual(repr(i), exp)
+        assert repr(i) == exp

     def test_categorical_index_repr_period(self):

         # test all length
         idx = pd.period_range('2011-01-01 09:00', freq='H', periods=1)
         i = pd.CategoricalIndex(pd.Categorical(idx))
-        exp = """CategoricalIndex(['2011-01-01 09:00'], categories=[2011-01-01 09:00], ordered=False, dtype='category')"""
-        self.assertEqual(repr(i), exp)
+        exp = """CategoricalIndex(['2011-01-01 09:00'], categories=[2011-01-01 09:00], ordered=False, dtype='category')"""  # noqa
+        assert repr(i) == exp

         idx = pd.period_range('2011-01-01 09:00', freq='H', periods=2)
         i = pd.CategoricalIndex(pd.Categorical(idx))
-        exp = """CategoricalIndex(['2011-01-01 09:00', '2011-01-01 10:00'], categories=[2011-01-01 09:00, 2011-01-01 10:00], ordered=False, dtype='category')"""
-        self.assertEqual(repr(i), exp)
+        exp = """CategoricalIndex(['2011-01-01 09:00', '2011-01-01 10:00'], categories=[2011-01-01 09:00, 2011-01-01 10:00], ordered=False, dtype='category')"""  # noqa
+        assert repr(i) == exp

         idx = pd.period_range('2011-01-01 09:00', freq='H', periods=3)
         i = pd.CategoricalIndex(pd.Categorical(idx))
-        exp = """CategoricalIndex(['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00'], categories=[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00], ordered=False, dtype='category')"""
-        self.assertEqual(repr(i), exp)
+        exp = """CategoricalIndex(['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00'], categories=[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00], ordered=False, dtype='category')"""  # noqa
+        assert repr(i) == exp

         idx = pd.period_range('2011-01-01 09:00', freq='H', periods=5)
         i = pd.CategoricalIndex(pd.Categorical(idx))
         exp = """CategoricalIndex(['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00',
                  '2011-01-01 12:00', '2011-01-01 13:00'],
-                 categories=[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00], ordered=False, dtype='category')"""
+                 categories=[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00], ordered=False, dtype='category')"""  # noqa

-        self.assertEqual(repr(i), exp)
+        assert repr(i) == exp

         i = pd.CategoricalIndex(pd.Categorical(idx.append(idx)))
         exp = """CategoricalIndex(['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00',
                  '2011-01-01 12:00', '2011-01-01 13:00', '2011-01-01 09:00',
                  '2011-01-01 10:00', '2011-01-01 11:00', '2011-01-01 12:00',
                  '2011-01-01 13:00'],
-                 categories=[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00], ordered=False, dtype='category')"""
+                 categories=[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00], ordered=False, dtype='category')"""  # noqa

-        self.assertEqual(repr(i), exp)
+        assert repr(i) == exp

         idx = pd.period_range('2011-01', freq='M', periods=5)
         i = pd.CategoricalIndex(pd.Categorical(idx))
-        exp = """CategoricalIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05'], categories=[2011-01, 2011-02, 2011-03, 2011-04, 2011-05], ordered=False, dtype='category')"""
-        self.assertEqual(repr(i), exp)
+        exp = """CategoricalIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05'], categories=[2011-01, 2011-02, 2011-03, 2011-04, 2011-05], ordered=False, dtype='category')"""  # noqa
+        assert repr(i) == exp

     def test_categorical_index_repr_period_ordered(self):
         idx = pd.period_range('2011-01-01 09:00', freq='H', periods=5)
         i = pd.CategoricalIndex(pd.Categorical(idx, ordered=True))
         exp = """CategoricalIndex(['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00',
                  '2011-01-01 12:00', '2011-01-01 13:00'],
-                 categories=[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00], ordered=True, dtype='category')"""
+                 categories=[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00], ordered=True, dtype='category')"""  # noqa

-        self.assertEqual(repr(i), exp)
+        assert repr(i) == exp

         idx = pd.period_range('2011-01', freq='M', periods=5)
         i = pd.CategoricalIndex(pd.Categorical(idx, ordered=True))
-        exp = """CategoricalIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05'], categories=[2011-01, 2011-02, 2011-03, 2011-04, 2011-05], ordered=True, dtype='category')"""
-        self.assertEqual(repr(i), exp)
+        exp = """CategoricalIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05'], categories=[2011-01, 2011-02, 2011-03, 2011-04, 2011-05], ordered=True, dtype='category')"""  # noqa
+        assert repr(i) == exp

     def test_categorical_index_repr_timedelta(self):
         idx = pd.timedelta_range('1 days', periods=5)
         i = pd.CategoricalIndex(pd.Categorical(idx))
-        exp = """CategoricalIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], categories=[1 days 00:00:00, 2 days 00:00:00, 3 days 00:00:00, 4 days 00:00:00, 5 days 00:00:00], ordered=False, dtype='category')"""
-        self.assertEqual(repr(i), exp)
+        exp = """CategoricalIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], categories=[1 days 00:00:00, 2 days 00:00:00, 3 days 00:00:00, 4 days 00:00:00, 5 days 00:00:00], ordered=False, dtype='category')"""  # noqa
+        assert repr(i) == exp

         idx = pd.timedelta_range('1 hours', periods=10)
         i = pd.CategoricalIndex(pd.Categorical(idx))
@@ -2878,15 +2748,15 @@ def
test_categorical_index_repr_timedelta(self): '3 days 01:00:00', '4 days 01:00:00', '5 days 01:00:00', '6 days 01:00:00', '7 days 01:00:00', '8 days 01:00:00', '9 days 01:00:00'], - categories=[0 days 01:00:00, 1 days 01:00:00, 2 days 01:00:00, 3 days 01:00:00, 4 days 01:00:00, 5 days 01:00:00, 6 days 01:00:00, 7 days 01:00:00, ...], ordered=False, dtype='category')""" + categories=[0 days 01:00:00, 1 days 01:00:00, 2 days 01:00:00, 3 days 01:00:00, 4 days 01:00:00, 5 days 01:00:00, 6 days 01:00:00, 7 days 01:00:00, ...], ordered=False, dtype='category')""" # noqa - self.assertEqual(repr(i), exp) + assert repr(i) == exp def test_categorical_index_repr_timedelta_ordered(self): idx = pd.timedelta_range('1 days', periods=5) i = pd.CategoricalIndex(pd.Categorical(idx, ordered=True)) - exp = """CategoricalIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], categories=[1 days 00:00:00, 2 days 00:00:00, 3 days 00:00:00, 4 days 00:00:00, 5 days 00:00:00], ordered=True, dtype='category')""" - self.assertEqual(repr(i), exp) + exp = """CategoricalIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], categories=[1 days 00:00:00, 2 days 00:00:00, 3 days 00:00:00, 4 days 00:00:00, 5 days 00:00:00], ordered=True, dtype='category')""" # noqa + assert repr(i) == exp idx = pd.timedelta_range('1 hours', periods=10) i = pd.CategoricalIndex(pd.Categorical(idx, ordered=True)) @@ -2894,9 +2764,9 @@ def test_categorical_index_repr_timedelta_ordered(self): '3 days 01:00:00', '4 days 01:00:00', '5 days 01:00:00', '6 days 01:00:00', '7 days 01:00:00', '8 days 01:00:00', '9 days 01:00:00'], - categories=[0 days 01:00:00, 1 days 01:00:00, 2 days 01:00:00, 3 days 01:00:00, 4 days 01:00:00, 5 days 01:00:00, 6 days 01:00:00, 7 days 01:00:00, ...], ordered=True, dtype='category')""" + categories=[0 days 01:00:00, 1 days 01:00:00, 2 days 01:00:00, 3 days 01:00:00, 4 days 01:00:00, 5 days 01:00:00, 6 days 01:00:00, 7 days 01:00:00, ...], ordered=True, dtype='category')""" # noqa - self.assertEqual(repr(i), exp) + assert repr(i) == exp def test_categorical_frame(self): # normal DataFrame @@ -2912,7 +2782,7 @@ def test_categorical_frame(self): 4 2011-01-01 13:00:00-05:00 2011-05""" df = pd.DataFrame({'dt': pd.Categorical(dt), 'p': pd.Categorical(p)}) - self.assertEqual(repr(df), exp) + assert repr(df) == exp def test_info(self): @@ -2921,11 +2791,13 @@ def test_info(self): df = DataFrame({'int64': np.random.randint(100, size=n)}) df['category'] = Series(np.array(list('abcdefghij')).take( np.random.randint(0, 10, size=n))).astype('category') - df.isnull() - df.info() + df.isna() + buf = compat.StringIO() + df.info(buf=buf) df2 = df[df['category'] == 'd'] - df2.info() + buf = compat.StringIO() + df2.info(buf=buf) def test_groupby_sort(self): @@ -2942,36 +2814,36 @@ def test_groupby_sort(self): def test_min_max(self): # unordered cats have no min/max cat = Series(Categorical(["a", "b", "c", "d"], ordered=False)) - self.assertRaises(TypeError, lambda: cat.min()) - self.assertRaises(TypeError, lambda: cat.max()) + pytest.raises(TypeError, lambda: cat.min()) + pytest.raises(TypeError, lambda: cat.max()) cat = Series(Categorical(["a", "b", "c", "d"], ordered=True)) _min = cat.min() _max = cat.max() - self.assertEqual(_min, "a") - self.assertEqual(_max, "d") + assert _min == "a" + assert _max == "d" cat = Series(Categorical(["a", "b", "c", "d"], categories=[ 'd', 'c', 'b', 'a'], ordered=True)) _min = cat.min() _max = cat.max() - self.assertEqual(_min, "d") - self.assertEqual(_max, "a") + assert _min == "d" + assert _max == 
"a" cat = Series(Categorical( [np.nan, "b", "c", np.nan], categories=['d', 'c', 'b', 'a' ], ordered=True)) _min = cat.min() _max = cat.max() - self.assertTrue(np.isnan(_min)) - self.assertEqual(_max, "b") + assert np.isnan(_min) + assert _max == "b" cat = Series(Categorical( [np.nan, 1, 2, np.nan], categories=[5, 4, 3, 2, 1], ordered=True)) _min = cat.min() _max = cat.max() - self.assertTrue(np.isnan(_min)) - self.assertEqual(_max, 1) + assert np.isnan(_min) + assert _max == 1 def test_mode(self): s = Series(Categorical([1, 1, 2, 4, 5, 5, 5], @@ -2989,7 +2861,8 @@ def test_mode(self): s = Series(Categorical([1, 2, 3, 4, 5], categories=[5, 4, 3, 2, 1], ordered=True)) res = s.mode() - exp = Series(Categorical([], categories=[5, 4, 3, 2, 1], ordered=True)) + exp = Series(Categorical([5, 4, 3, 2, 1], categories=[5, 4, 3, 2, 1], + ordered=True)) tm.assert_series_equal(res, exp) def test_value_counts(self): @@ -3045,13 +2918,15 @@ def test_value_counts_with_nan(self): tm.assert_series_equal(res, exp) # we don't exclude the count of None and sort by counts - exp = pd.Series([3, 2, 1], index=pd.CategoricalIndex([np.nan, "a", "b"])) + exp = pd.Series( + [3, 2, 1], index=pd.CategoricalIndex([np.nan, "a", "b"])) res = s.value_counts(dropna=False) tm.assert_series_equal(res, exp) # When we aren't sorting by counts, and np.nan isn't a # category, it should be last. - exp = pd.Series([2, 1, 3], index=pd.CategoricalIndex(["a", "b", np.nan])) + exp = pd.Series( + [2, 1, 3], index=pd.CategoricalIndex(["a", "b", np.nan])) res = s.value_counts(dropna=False, sort=False) tm.assert_series_equal(res, exp) @@ -3163,7 +3038,7 @@ def f(x): # GH 9603 df = pd.DataFrame({'a': [1, 0, 0, 0]}) - c = pd.cut(df.a, [0, 1, 2, 3, 4]) + c = pd.cut(df.a, [0, 1, 2, 3, 4], labels=pd.Categorical(list('abcd'))) result = df.groupby(c).apply(len) exp_index = pd.CategoricalIndex(c.values.categories, @@ -3185,28 +3060,23 @@ def test_pivot_table(self): [Categorical(["a", "b", "z"], ordered=True), Categorical(["c", "d", "y"], ordered=True)], names=['A', 'B']) - expected = Series([1, 2, np.nan, 3, 4, np.nan, np.nan, np.nan, np.nan], - index=exp_index, name='values') - tm.assert_series_equal(result, expected) + expected = DataFrame( + {'values': [1, 2, np.nan, 3, 4, np.nan, np.nan, np.nan, np.nan]}, + index=exp_index) + tm.assert_frame_equal(result, expected) def test_count(self): s = Series(Categorical([np.nan, 1, 2, np.nan], categories=[5, 4, 3, 2, 1], ordered=True)) result = s.count() - self.assertEqual(result, 2) + assert result == 2 def test_sort_values(self): c = Categorical(["a", "b", "b", "a"], ordered=False) cat = Series(c.copy()) - # 'order' was deprecated in gh-10726 - # 'sort' was deprecated in gh-12882 - for func in ('order', 'sort'): - with tm.assert_produces_warning(FutureWarning): - getattr(c, func)() - # sort in the categories order expected = Series( Categorical(["a", "a", "b", "b"], @@ -3217,17 +3087,17 @@ def test_sort_values(self): cat = Series(Categorical(["a", "c", "b", "d"], ordered=True)) res = cat.sort_values() exp = np.array(["a", "b", "c", "d"], dtype=np.object_) - self.assert_numpy_array_equal(res.__array__(), exp) + tm.assert_numpy_array_equal(res.__array__(), exp) cat = Series(Categorical(["a", "c", "b", "d"], categories=[ "a", "b", "c", "d"], ordered=True)) res = cat.sort_values() exp = np.array(["a", "b", "c", "d"], dtype=np.object_) - self.assert_numpy_array_equal(res.__array__(), exp) + tm.assert_numpy_array_equal(res.__array__(), exp) res = cat.sort_values(ascending=False) exp = np.array(["d", "c", "b", 
"a"], dtype=np.object_) - self.assert_numpy_array_equal(res.__array__(), exp) + tm.assert_numpy_array_equal(res.__array__(), exp) raw_cat1 = Categorical(["a", "b", "c", "d"], categories=["a", "b", "c", "d"], ordered=False) @@ -3242,14 +3112,14 @@ def test_sort_values(self): # Cats must be sorted in a dataframe res = df.sort_values(by=["string"], ascending=False) exp = np.array(["d", "c", "b", "a"], dtype=np.object_) - self.assert_numpy_array_equal(res["sort"].values.__array__(), exp) - self.assertEqual(res["sort"].dtype, "category") + tm.assert_numpy_array_equal(res["sort"].values.__array__(), exp) + assert res["sort"].dtype == "category" res = df.sort_values(by=["sort"], ascending=False) exp = df.sort_values(by=["string"], ascending=True) - self.assert_series_equal(res["values"], exp["values"]) - self.assertEqual(res["sort"].dtype, "category") - self.assertEqual(res["unsort"].dtype, "category") + tm.assert_series_equal(res["values"], exp["values"]) + assert res["sort"].dtype == "category" + assert res["unsort"].dtype == "category" # unordered cat, but we allow this df.sort_values(by=["unsort"], ascending=False) @@ -3275,12 +3145,12 @@ def test_slicing(self): cat = Series(Categorical([1, 2, 3, 4])) reversed = cat[::-1] exp = np.array([4, 3, 2, 1], dtype=np.int64) - self.assert_numpy_array_equal(reversed.__array__(), exp) + tm.assert_numpy_array_equal(reversed.__array__(), exp) df = DataFrame({'value': (np.arange(100) + 1).astype('int64')}) df['D'] = pd.cut(df.value, bins=[0, 25, 50, 75, 100]) - expected = Series([11, '(0, 25]'], index=['value', 'D'], name=10) + expected = Series([11, Interval(0, 25)], index=['value', 'D'], name=10) result = df.iloc[10] tm.assert_series_equal(result, expected) @@ -3290,7 +3160,7 @@ def test_slicing(self): result = df.iloc[10:20] tm.assert_frame_equal(result, expected) - expected = Series([9, '(0, 25]'], index=['value', 'D'], name=8) + expected = Series([9, Interval(0, 25)], index=['value', 'D'], name=8) result = df.loc[8] tm.assert_series_equal(result, expected) @@ -3331,70 +3201,70 @@ def test_slicing_and_getting_ops(self): # frame res_df = df.iloc[2:4, :] tm.assert_frame_equal(res_df, exp_df) - self.assertTrue(is_categorical_dtype(res_df["cats"])) + assert is_categorical_dtype(res_df["cats"]) # row res_row = df.iloc[2, :] tm.assert_series_equal(res_row, exp_row) - tm.assertIsInstance(res_row["cats"], compat.string_types) + assert isinstance(res_row["cats"], compat.string_types) # col res_col = df.iloc[:, 0] tm.assert_series_equal(res_col, exp_col) - self.assertTrue(is_categorical_dtype(res_col)) + assert is_categorical_dtype(res_col) # single value res_val = df.iloc[2, 0] - self.assertEqual(res_val, exp_val) + assert res_val == exp_val # loc # frame res_df = df.loc["j":"k", :] tm.assert_frame_equal(res_df, exp_df) - self.assertTrue(is_categorical_dtype(res_df["cats"])) + assert is_categorical_dtype(res_df["cats"]) # row res_row = df.loc["j", :] tm.assert_series_equal(res_row, exp_row) - tm.assertIsInstance(res_row["cats"], compat.string_types) + assert isinstance(res_row["cats"], compat.string_types) # col res_col = df.loc[:, "cats"] tm.assert_series_equal(res_col, exp_col) - self.assertTrue(is_categorical_dtype(res_col)) + assert is_categorical_dtype(res_col) # single value res_val = df.loc["j", "cats"] - self.assertEqual(res_val, exp_val) + assert res_val == exp_val # ix # frame # res_df = df.loc["j":"k",[0,1]] # doesn't work? 
res_df = df.loc["j":"k", :] tm.assert_frame_equal(res_df, exp_df) - self.assertTrue(is_categorical_dtype(res_df["cats"])) + assert is_categorical_dtype(res_df["cats"]) # row res_row = df.loc["j", :] tm.assert_series_equal(res_row, exp_row) - tm.assertIsInstance(res_row["cats"], compat.string_types) + assert isinstance(res_row["cats"], compat.string_types) # col res_col = df.loc[:, "cats"] tm.assert_series_equal(res_col, exp_col) - self.assertTrue(is_categorical_dtype(res_col)) + assert is_categorical_dtype(res_col) # single value res_val = df.loc["j", df.columns[0]] - self.assertEqual(res_val, exp_val) + assert res_val == exp_val # iat res_val = df.iat[2, 0] - self.assertEqual(res_val, exp_val) + assert res_val == exp_val # at res_val = df.at["j", "cats"] - self.assertEqual(res_val, exp_val) + assert res_val == exp_val # fancy indexing exp_fancy = df.iloc[[2]] @@ -3406,32 +3276,32 @@ def test_slicing_and_getting_ops(self): # get_value res_val = df.get_value("j", "cats") - self.assertEqual(res_val, exp_val) + assert res_val == exp_val # i : int, slice, or sequence of integers res_row = df.iloc[2] tm.assert_series_equal(res_row, exp_row) - tm.assertIsInstance(res_row["cats"], compat.string_types) + assert isinstance(res_row["cats"], compat.string_types) res_df = df.iloc[slice(2, 4)] tm.assert_frame_equal(res_df, exp_df) - self.assertTrue(is_categorical_dtype(res_df["cats"])) + assert is_categorical_dtype(res_df["cats"]) res_df = df.iloc[[2, 3]] tm.assert_frame_equal(res_df, exp_df) - self.assertTrue(is_categorical_dtype(res_df["cats"])) + assert is_categorical_dtype(res_df["cats"]) res_col = df.iloc[:, 0] tm.assert_series_equal(res_col, exp_col) - self.assertTrue(is_categorical_dtype(res_col)) + assert is_categorical_dtype(res_col) res_df = df.iloc[:, slice(0, 2)] tm.assert_frame_equal(res_df, df) - self.assertTrue(is_categorical_dtype(res_df["cats"])) + assert is_categorical_dtype(res_df["cats"]) res_df = df.iloc[:, [0, 1]] tm.assert_frame_equal(res_df, df) - self.assertTrue(is_categorical_dtype(res_df["cats"])) + assert is_categorical_dtype(res_df["cats"]) def test_slicing_doc_examples(self): @@ -3538,7 +3408,7 @@ def f(): df = orig.copy() df.iloc[2, 0] = "c" - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # - assign a complete row (mixed values) -> exp_single_row df = orig.copy() @@ -3550,7 +3420,7 @@ def f(): df = orig.copy() df.iloc[2, :] = ["c", 2] - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # - assign multiple rows (mixed values) -> exp_multi_row df = orig.copy() @@ -3561,7 +3431,7 @@ def f(): df = orig.copy() df.iloc[2:4, :] = [["c", 2], ["c", 2]] - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # assign a part of a column with dtype == categorical -> # exp_parts_cats_col @@ -3569,13 +3439,13 @@ def f(): df.iloc[2:4, 0] = pd.Categorical(["b", "b"], categories=["a", "b"]) tm.assert_frame_equal(df, exp_parts_cats_col) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): # different categories -> not sure if this should fail or pass df = orig.copy() df.iloc[2:4, 0] = pd.Categorical( ["b", "b"], categories=["a", "b", "c"]) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): # different values df = orig.copy() df.iloc[2:4, 0] = pd.Categorical( @@ -3587,7 +3457,7 @@ def f(): df.iloc[2:4, 0] = ["b", "b"] tm.assert_frame_equal(df, exp_parts_cats_col) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.iloc[2:4, 0] = ["c", "c"] # loc @@ -3606,7 +3476,7 @@ def f(): df = orig.copy() 
df.loc["j", "cats"] = "c" - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # - assign a complete row (mixed values) -> exp_single_row df = orig.copy() @@ -3618,7 +3488,7 @@ def f(): df = orig.copy() df.loc["j", :] = ["c", 2] - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # - assign multiple rows (mixed values) -> exp_multi_row df = orig.copy() @@ -3629,7 +3499,7 @@ def f(): df = orig.copy() df.loc["j":"k", :] = [["c", 2], ["c", 2]] - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # assign a part of a column with dtype == categorical -> # exp_parts_cats_col @@ -3638,13 +3508,13 @@ def f(): ["b", "b"], categories=["a", "b"]) tm.assert_frame_equal(df, exp_parts_cats_col) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): # different categories -> not sure if this should fail or pass df = orig.copy() df.loc["j":"k", "cats"] = pd.Categorical( ["b", "b"], categories=["a", "b", "c"]) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): # different values df = orig.copy() df.loc["j":"k", "cats"] = pd.Categorical( @@ -3656,7 +3526,7 @@ def f(): df.loc["j":"k", "cats"] = ["b", "b"] tm.assert_frame_equal(df, exp_parts_cats_col) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.loc["j":"k", "cats"] = ["c", "c"] # loc @@ -3675,7 +3545,7 @@ def f(): df = orig.copy() df.loc["j", df.columns[0]] = "c" - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # - assign a complete row (mixed values) -> exp_single_row df = orig.copy() @@ -3687,7 +3557,7 @@ def f(): df = orig.copy() df.loc["j", :] = ["c", 2] - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # - assign multiple rows (mixed values) -> exp_multi_row df = orig.copy() @@ -3698,21 +3568,22 @@ def f(): df = orig.copy() df.loc["j":"k", :] = [["c", 2], ["c", 2]] - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # assign a part of a column with dtype == categorical -> # exp_parts_cats_col df = orig.copy() - df.loc["j":"k", df.columns[0]] = pd.Categorical(["b", "b"], categories=["a", "b"]) + df.loc["j":"k", df.columns[0]] = pd.Categorical( + ["b", "b"], categories=["a", "b"]) tm.assert_frame_equal(df, exp_parts_cats_col) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): # different categories -> not sure if this should fail or pass df = orig.copy() df.loc["j":"k", df.columns[0]] = pd.Categorical( ["b", "b"], categories=["a", "b", "c"]) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): # different values df = orig.copy() df.loc["j":"k", df.columns[0]] = pd.Categorical( @@ -3724,7 +3595,7 @@ def f(): df.loc["j":"k", df.columns[0]] = ["b", "b"] tm.assert_frame_equal(df, exp_parts_cats_col) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.loc["j":"k", df.columns[0]] = ["c", "c"] # iat @@ -3737,7 +3608,7 @@ def f(): df = orig.copy() df.iat[2, 0] = "c" - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # at # - assign a single value -> exp_single_cats_value @@ -3750,7 +3621,7 @@ def f(): df = orig.copy() df.at["j", "cats"] = "c" - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # fancy indexing catsf = pd.Categorical(["a", "a", "c", "c", "a", "a", "a"], @@ -3775,7 +3646,7 @@ def f(): df = orig.copy() df.set_value("j", "cats", "c") - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # Assigning a Category to parts of a int/... 
column uses the values of # the Categorical @@ -3865,20 +3736,20 @@ def test_comparisons(self): def f(): cat > cat_rev - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) # categorical cannot be compared to Series or numpy array, and also # not the other way around - self.assertRaises(TypeError, lambda: cat > s) - self.assertRaises(TypeError, lambda: cat_rev > s) - self.assertRaises(TypeError, lambda: cat > a) - self.assertRaises(TypeError, lambda: cat_rev > a) + pytest.raises(TypeError, lambda: cat > s) + pytest.raises(TypeError, lambda: cat_rev > s) + pytest.raises(TypeError, lambda: cat > a) + pytest.raises(TypeError, lambda: cat_rev > a) - self.assertRaises(TypeError, lambda: s < cat) - self.assertRaises(TypeError, lambda: s < cat_rev) + pytest.raises(TypeError, lambda: s < cat) + pytest.raises(TypeError, lambda: s < cat_rev) - self.assertRaises(TypeError, lambda: a < cat) - self.assertRaises(TypeError, lambda: a < cat_rev) + pytest.raises(TypeError, lambda: a < cat) + pytest.raises(TypeError, lambda: a < cat_rev) # unequal comparison should raise for unordered cats cat = Series(Categorical(list("abc"))) @@ -3886,26 +3757,26 @@ def f(): def f(): cat > "b" - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) cat = Series(Categorical(list("abc"), ordered=False)) def f(): cat > "b" - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) # https://github.com/pandas-dev/pandas/issues/9836#issuecomment-92123057 # and following comparisons with scalars not in categories should raise # for unequal comps, but not for equal/not equal cat = Series(Categorical(list("abc"), ordered=True)) - self.assertRaises(TypeError, lambda: cat < "d") - self.assertRaises(TypeError, lambda: cat > "d") - self.assertRaises(TypeError, lambda: "d" < cat) - self.assertRaises(TypeError, lambda: "d" > cat) + pytest.raises(TypeError, lambda: cat < "d") + pytest.raises(TypeError, lambda: cat > "d") + pytest.raises(TypeError, lambda: "d" < cat) + pytest.raises(TypeError, lambda: "d" > cat) - self.assert_series_equal(cat == "d", Series([False, False, False])) - self.assert_series_equal(cat != "d", Series([True, True, True])) + tm.assert_series_equal(cat == "d", Series([False, False, False])) + tm.assert_series_equal(cat != "d", Series([True, True, True])) # And test NaN handling...
cat = Series(Categorical(["a", "b", "c", np.nan])) @@ -3925,45 +3796,82 @@ def test_cat_equality(self): f = Categorical(list('acb')) # vs scalar - self.assertFalse((a == 'a').all()) - self.assertTrue(((a != 'a') == ~(a == 'a')).all()) + assert not (a == 'a').all() + assert ((a != 'a') == ~(a == 'a')).all() - self.assertFalse(('a' == a).all()) - self.assertTrue((a == 'a')[0]) - self.assertTrue(('a' == a)[0]) - self.assertFalse(('a' != a)[0]) + assert not ('a' == a).all() + assert (a == 'a')[0] + assert ('a' == a)[0] + assert not ('a' != a)[0] # vs list-like - self.assertTrue((a == a).all()) - self.assertFalse((a != a).all()) + assert (a == a).all() + assert not (a != a).all() - self.assertTrue((a == list(a)).all()) - self.assertTrue((a == b).all()) - self.assertTrue((b == a).all()) - self.assertTrue(((~(a == b)) == (a != b)).all()) - self.assertTrue(((~(b == a)) == (b != a)).all()) + assert (a == list(a)).all() + assert (a == b).all() + assert (b == a).all() + assert ((~(a == b)) == (a != b)).all() + assert ((~(b == a)) == (b != a)).all() - self.assertFalse((a == c).all()) - self.assertFalse((c == a).all()) - self.assertFalse((a == d).all()) - self.assertFalse((d == a).all()) + assert not (a == c).all() + assert not (c == a).all() + assert not (a == d).all() + assert not (d == a).all() # vs a cat-like - self.assertTrue((a == e).all()) - self.assertTrue((e == a).all()) - self.assertFalse((a == f).all()) - self.assertFalse((f == a).all()) + assert (a == e).all() + assert (e == a).all() + assert not (a == f).all() + assert not (f == a).all() - self.assertTrue(((~(a == e) == (a != e)).all())) - self.assertTrue(((~(e == a) == (e != a)).all())) - self.assertTrue(((~(a == f) == (a != f)).all())) - self.assertTrue(((~(f == a) == (f != a)).all())) + assert ((~(a == e) == (a != e)).all()) + assert ((~(e == a) == (e != a)).all()) + assert ((~(a == f) == (a != f)).all()) + assert ((~(f == a) == (f != a)).all()) # non-equality is not comparable - self.assertRaises(TypeError, lambda: a < b) - self.assertRaises(TypeError, lambda: b < a) - self.assertRaises(TypeError, lambda: a > b) - self.assertRaises(TypeError, lambda: b > a) + pytest.raises(TypeError, lambda: a < b) + pytest.raises(TypeError, lambda: b < a) + pytest.raises(TypeError, lambda: a > b) + pytest.raises(TypeError, lambda: b > a) + + @pytest.mark.parametrize('ctor', [ + lambda *args, **kwargs: Categorical(*args, **kwargs), + lambda *args, **kwargs: Series(Categorical(*args, **kwargs)), + ]) + def test_unordered_different_order_equal(self, ctor): + # https://github.com/pandas-dev/pandas/issues/16014 + c1 = ctor(['a', 'b'], categories=['a', 'b'], ordered=False) + c2 = ctor(['a', 'b'], categories=['b', 'a'], ordered=False) + assert (c1 == c2).all() + + c1 = ctor(['a', 'b'], categories=['a', 'b'], ordered=False) + c2 = ctor(['b', 'a'], categories=['b', 'a'], ordered=False) + assert (c1 != c2).all() + + c1 = ctor(['a', 'a'], categories=['a', 'b'], ordered=False) + c2 = ctor(['b', 'b'], categories=['b', 'a'], ordered=False) + assert (c1 != c2).all() + + c1 = ctor(['a', 'a'], categories=['a', 'b'], ordered=False) + c2 = ctor(['a', 'b'], categories=['b', 'a'], ordered=False) + result = c1 == c2 + tm.assert_numpy_array_equal(np.array(result), np.array([True, False])) + + def test_unordered_different_categories_raises(self): + c1 = Categorical(['a', 'b'], categories=['a', 'b'], ordered=False) + c2 = Categorical(['a', 'c'], categories=['c', 'a'], ordered=False) + with tm.assert_raises_regex(TypeError, + "Categoricals can only be compared"): + c1 == c2 + 
+ def test_compare_different_lengths(self): + c1 = Categorical([], categories=['a', 'b']) + c2 = Categorical([], categories=['a']) + msg = "Categories are different lengths" + with tm.assert_raises_regex(TypeError, msg): + c1 == c2 def test_concat_append(self): cat = pd.Categorical(["a", "b"], categories=["a", "b"]) @@ -4000,19 +3908,18 @@ def test_concat_append_gh7864(self): df1 = df[0:3] df2 = df[3:] - self.assert_index_equal(df['grade'].cat.categories, - df1['grade'].cat.categories) - self.assert_index_equal(df['grade'].cat.categories, - df2['grade'].cat.categories) + tm.assert_index_equal(df['grade'].cat.categories, + df1['grade'].cat.categories) + tm.assert_index_equal(df['grade'].cat.categories, + df2['grade'].cat.categories) dfx = pd.concat([df1, df2]) - self.assert_index_equal(df['grade'].cat.categories, - dfx['grade'].cat.categories) + tm.assert_index_equal(df['grade'].cat.categories, + dfx['grade'].cat.categories) dfa = df1.append(df2) - self.assert_index_equal(df['grade'].cat.categories, - dfa['grade'].cat.categories) - + tm.assert_index_equal(df['grade'].cat.categories, + dfa['grade'].cat.categories) def test_concat_preserve(self): @@ -4042,7 +3949,7 @@ def test_concat_preserve(self): res = pd.concat([df2, df2]) exp = DataFrame({'A': pd.concat([a, a]), 'B': pd.concat([b, b]).astype( - 'category', categories=list('cab'))}) + 'category', categories=list('cab'))}) tm.assert_frame_equal(res, exp) def test_categorical_index_preserver(self): @@ -4052,19 +3959,19 @@ df2 = DataFrame({'A': a, 'B': b.astype('category', categories=list('cab')) - }).set_index('B') + }).set_index('B') result = pd.concat([df2, df2]) expected = DataFrame({'A': pd.concat([a, a]), 'B': pd.concat([b, b]).astype( 'category', categories=list('cab')) - }).set_index('B') + }).set_index('B') tm.assert_frame_equal(result, expected) # wrong categories df3 = DataFrame({'A': a, 'B': pd.Categorical(b, categories=list('abc')) - }).set_index('B') - self.assertRaises(TypeError, lambda: pd.concat([df2, df3])) + }).set_index('B') + pytest.raises(TypeError, lambda: pd.concat([df2, df3])) def test_merge(self): # GH 9426 @@ -4095,9 +4002,12 @@ def test_merge(self): expected = df.copy() # object-cat + # note that we propagate the category + # because we don't have any matching rows cright = right.copy() cright['d'] = cright['d'].astype('category') result = pd.merge(left, cright, how='left', left_on='b', right_on='c') + expected['d'] = expected['d'].astype('category', categories=['null']) tm.assert_frame_equal(result, expected) # cat-object @@ -4119,15 +4029,15 @@ def test_repeat(self): cat = pd.Categorical(["a", "b"], categories=["a", "b"]) exp = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b"]) res = cat.repeat(2) - self.assert_categorical_equal(res, exp) + tm.assert_categorical_equal(res, exp) def test_numpy_repeat(self): cat = pd.Categorical(["a", "b"], categories=["a", "b"]) exp = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b"]) - self.assert_categorical_equal(np.repeat(cat, 2), exp) + tm.assert_categorical_equal(np.repeat(cat, 2), exp) msg = "the 'axis' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.repeat, cat, 2, axis=1) + tm.assert_raises_regex(ValueError, msg, np.repeat, cat, 2, axis=1) def test_reshape(self): cat = pd.Categorical([], categories=["a", "b"]) @@ -4135,34 +4045,34 @@ with tm.assert_produces_warning(FutureWarning): cat = pd.Categorical([], categories=["a", "b"]) -
self.assert_categorical_equal(cat.reshape(0), cat) + tm.assert_categorical_equal(cat.reshape(0), cat) with tm.assert_produces_warning(FutureWarning): cat = pd.Categorical([], categories=["a", "b"]) - self.assert_categorical_equal(cat.reshape((5, -1)), cat) + tm.assert_categorical_equal(cat.reshape((5, -1)), cat) with tm.assert_produces_warning(FutureWarning): cat = pd.Categorical(["a", "b"], categories=["a", "b"]) - self.assert_categorical_equal(cat.reshape(cat.shape), cat) + tm.assert_categorical_equal(cat.reshape(cat.shape), cat) with tm.assert_produces_warning(FutureWarning): cat = pd.Categorical(["a", "b"], categories=["a", "b"]) - self.assert_categorical_equal(cat.reshape(cat.size), cat) + tm.assert_categorical_equal(cat.reshape(cat.size), cat) with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): msg = "can only specify one unknown dimension" cat = pd.Categorical(["a", "b"], categories=["a", "b"]) - tm.assertRaisesRegexp(ValueError, msg, cat.reshape, (-2, -1)) + tm.assert_raises_regex(ValueError, msg, cat.reshape, (-2, -1)) def test_numpy_reshape(self): with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): cat = pd.Categorical(["a", "b"], categories=["a", "b"]) - self.assert_categorical_equal(np.reshape(cat, cat.shape), cat) + tm.assert_categorical_equal(np.reshape(cat, cat.shape), cat) with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): msg = "the 'order' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.reshape, - cat, cat.shape, order='F') + tm.assert_raises_regex(ValueError, msg, np.reshape, + cat, cat.shape, order='F') def test_na_actions(self): @@ -4186,7 +4096,7 @@ def test_na_actions(self): def f(): df.fillna(value={"cats": 4, "vals": "c"}) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) res = df.fillna(method='pad') tm.assert_frame_equal(res, df_exp_fill) @@ -4244,7 +4154,7 @@ def test_astype_to_other(self): expected = s tm.assert_series_equal(s.astype('category'), expected) tm.assert_series_equal(s.astype(CategoricalDtype()), expected) - self.assertRaises(ValueError, lambda: s.astype('float64')) + pytest.raises(ValueError, lambda: s.astype('float64')) cat = Series(Categorical(['a', 'b', 'b', 'a', 'a', 'c', 'c', 'c'])) exp = Series(['a', 'b', 'b', 'a', 'a', 'c', 'c', 'c']) @@ -4282,7 +4192,7 @@ def cmp(a, b): # invalid conversion (these are NOT a dtype) for invalid in [lambda x: x.astype(pd.Categorical), lambda x: x.astype('object').astype(pd.Categorical)]: - self.assertRaises(TypeError, lambda: invalid(s)) + pytest.raises(TypeError, lambda: invalid(s)) def test_astype_categorical(self): @@ -4290,7 +4200,7 @@ def test_astype_categorical(self): tm.assert_categorical_equal(cat, cat.astype('category')) tm.assert_almost_equal(np.array(cat), cat.astype('object')) - self.assertRaises(ValueError, lambda: cat.astype(float)) + pytest.raises(ValueError, lambda: cat.astype(float)) def test_to_records(self): @@ -4317,28 +4227,28 @@ def test_numeric_like_ops(self): # numeric ops should not succeed for op in ['__add__', '__sub__', '__mul__', '__truediv__']: - self.assertRaises(TypeError, - lambda: getattr(self.cat, op)(self.cat)) + pytest.raises(TypeError, + lambda: getattr(self.cat, op)(self.cat)) # reduction ops should not succeed (unless specifically defined, e.g. 
# min/max) s = self.cat['value_group'] for op in ['kurt', 'skew', 'var', 'std', 'mean', 'sum', 'median']: - self.assertRaises(TypeError, - lambda: getattr(s, op)(numeric_only=False)) + pytest.raises(TypeError, + lambda: getattr(s, op)(numeric_only=False)) # mad technically works because it takes always the numeric data # numpy ops s = pd.Series(pd.Categorical([1, 2, 3, 4])) - self.assertRaises(TypeError, lambda: np.sum(s)) + pytest.raises(TypeError, lambda: np.sum(s)) # numeric ops on a Series for op in ['__add__', '__sub__', '__mul__', '__truediv__']: - self.assertRaises(TypeError, lambda: getattr(s, op)(2)) + pytest.raises(TypeError, lambda: getattr(s, op)(2)) # invalid ufunc - self.assertRaises(TypeError, lambda: np.log(s)) + pytest.raises(TypeError, lambda: np.log(s)) def test_cat_tab_completition(self): # test the tab completion display @@ -4358,20 +4268,21 @@ def get_dir(s): def test_cat_accessor_api(self): # GH 9322 from pandas.core.categorical import CategoricalAccessor - self.assertIs(Series.cat, CategoricalAccessor) + assert Series.cat is CategoricalAccessor s = Series(list('aabbcde')).astype('category') - self.assertIsInstance(s.cat, CategoricalAccessor) + assert isinstance(s.cat, CategoricalAccessor) invalid = Series([1]) - with tm.assertRaisesRegexp(AttributeError, "only use .cat accessor"): + with tm.assert_raises_regex(AttributeError, + "only use .cat accessor"): invalid.cat - self.assertFalse(hasattr(invalid, 'cat')) + assert not hasattr(invalid, 'cat') def test_cat_accessor_no_new_attributes(self): # https://github.com/pandas-dev/pandas/issues/10673 c = Series(list('aabbcde')).astype('category') - with tm.assertRaisesRegexp(AttributeError, - "You cannot add any new attribute"): + with tm.assert_raises_regex(AttributeError, + "You cannot add any new attribute"): c.cat.xlabel = "a" def test_str_accessor_api_for_categorical(self): @@ -4380,7 +4291,7 @@ def test_str_accessor_api_for_categorical(self): s = Series(list('aabb')) s = s + " " + s c = s.astype('category') - self.assertIsInstance(c.str, StringMethods) + assert isinstance(c.str, StringMethods) # str functions, which need special arguments special_func_defs = [ @@ -4391,8 +4302,8 @@ def test_str_accessor_api_for_categorical(self): ('decode', ("UTF-8",), {}), ('encode', ("UTF-8",), {}), ('endswith', ("a",), {}), - ('extract', ("([a-z]*) ",), {"expand":False}), - ('extract', ("([a-z]*) ",), {"expand":True}), + ('extract', ("([a-z]*) ",), {"expand": False}), + ('extract', ("([a-z]*) ",), {"expand": True}), ('extractall', ("([a-z]*) ",), {}), ('find', ("a",), {}), ('findall', ("a",), {}), @@ -4426,10 +4337,10 @@ def test_str_accessor_api_for_categorical(self): # * `translate` has different interfaces for py2 vs. 
py3 _ignore_names = ["get", "join", "translate"] - str_func_names = [f - for f in dir(s.str) - if not (f.startswith("_") or f in _special_func_names - or f in _ignore_names)] + str_func_names = [f for f in dir(s.str) if not ( + f.startswith("_") or + f in _special_func_names or + f in _ignore_names)] func_defs = [(f, (), {}) for f in str_func_names] func_defs.extend(special_func_defs) @@ -4444,17 +4355,15 @@ def test_str_accessor_api_for_categorical(self): tm.assert_series_equal(res, exp) invalid = Series([1, 2, 3]).astype('category') - with tm.assertRaisesRegexp(AttributeError, - "Can only use .str accessor with string"): + with tm.assert_raises_regex(AttributeError, + "Can only use .str " + "accessor with string"): invalid.str - self.assertFalse(hasattr(invalid, 'str')) + assert not hasattr(invalid, 'str') def test_dt_accessor_api_for_categorical(self): # https://github.com/pandas-dev/pandas/issues/10661 - from pandas.tseries.common import Properties - from pandas.tseries.index import date_range, DatetimeIndex - from pandas.tseries.period import period_range, PeriodIndex - from pandas.tseries.tdi import timedelta_range, TimedeltaIndex + from pandas.core.indexes.accessors import Properties s_dr = Series(date_range('1/1/2015', periods=5, tz="MET")) c_dr = s_dr.astype("category") @@ -4465,12 +4374,16 @@ def test_dt_accessor_api_for_categorical(self): s_tdr = Series(timedelta_range('1 days', '10 days')) c_tdr = s_tdr.astype("category") + # only testing field (like .day) + # and bool (is_month_start) + get_ops = lambda x: x._datetimelike_ops + test_data = [ - ("Datetime", DatetimeIndex._datetimelike_ops, s_dr, c_dr), - ("Period", PeriodIndex._datetimelike_ops, s_pr, c_pr), - ("Timedelta", TimedeltaIndex._datetimelike_ops, s_tdr, c_tdr)] + ("Datetime", get_ops(DatetimeIndex), s_dr, c_dr), + ("Period", get_ops(PeriodIndex), s_pr, c_pr), + ("Timedelta", get_ops(TimedeltaIndex), s_tdr, c_tdr)] - self.assertIsInstance(c_dr.dt, Properties) + assert isinstance(c_dr.dt, Properties) special_func_defs = [ ('strftime', ("%Y-%m-%d",), {}), @@ -4478,12 +4391,13 @@ def test_dt_accessor_api_for_categorical(self): ('round', ("D",), {}), ('floor', ("D",), {}), ('ceil', ("D",), {}), + ('asfreq', ("D",), {}), # ('tz_localize', ("UTC",), {}), ] _special_func_names = [f[0] for f in special_func_defs] # the series is already localized - _ignore_names = ['tz_localize'] + _ignore_names = ['tz_localize', 'components'] for name, attr_names, s, c in test_data: func_names = [f @@ -4505,7 +4419,7 @@ def test_dt_accessor_api_for_categorical(self): elif isinstance(res, pd.Series): tm.assert_series_equal(res, exp) else: - tm.assert_numpy_array_equal(res, exp) + tm.assert_almost_equal(res, exp) for attr in attr_names: try: @@ -4520,13 +4434,13 @@ def test_dt_accessor_api_for_categorical(self): elif isinstance(res, pd.Series): tm.assert_series_equal(res, exp) else: - tm.assert_numpy_array_equal(res, exp) + tm.assert_almost_equal(res, exp) invalid = Series([1, 2, 3]).astype('category') - with tm.assertRaisesRegexp( + with tm.assert_raises_regex( AttributeError, "Can only use .dt accessor with datetimelike"): invalid.dt - self.assertFalse(hasattr(invalid, 'str')) + assert not hasattr(invalid, 'str') def test_concat_categorical(self): # See GH 10177 @@ -4548,38 +4462,22 @@ def test_concat_categorical(self): tm.assert_frame_equal(res, exp) -class TestCategoricalSubclassing(tm.TestCase): - - _multiprocess_can_split_ = True +class TestCategoricalSubclassing(object): def test_constructor(self): sc = tm.SubclassedCategorical(['a', 
'b', 'c']) - self.assertIsInstance(sc, tm.SubclassedCategorical) + assert isinstance(sc, tm.SubclassedCategorical) tm.assert_categorical_equal(sc, Categorical(['a', 'b', 'c'])) def test_from_array(self): sc = tm.SubclassedCategorical.from_codes([1, 0, 2], ['a', 'b', 'c']) - self.assertIsInstance(sc, tm.SubclassedCategorical) + assert isinstance(sc, tm.SubclassedCategorical) exp = Categorical.from_codes([1, 0, 2], ['a', 'b', 'c']) tm.assert_categorical_equal(sc, exp) def test_map(self): sc = tm.SubclassedCategorical(['a', 'b', 'c']) res = sc.map(lambda x: x.upper()) - self.assertIsInstance(res, tm.SubclassedCategorical) - exp = Categorical(['A', 'B', 'C']) - tm.assert_categorical_equal(res, exp) - - def test_map(self): - sc = tm.SubclassedCategorical(['a', 'b', 'c']) - res = sc.map(lambda x: x.upper()) - self.assertIsInstance(res, tm.SubclassedCategorical) + assert isinstance(res, tm.SubclassedCategorical) exp = Categorical(['A', 'B', 'C']) tm.assert_categorical_equal(res, exp) - - -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - # '--with-coverage', '--cover-package=pandas.core'] - exit=False) diff --git a/pandas/tests/test_common.py b/pandas/tests/test_common.py index 09dd3f7ab517c..57479be4d989f 100644 --- a/pandas/tests/test_common.py +++ b/pandas/tests/test_common.py @@ -1,6 +1,9 @@ # -*- coding: utf-8 -*- -import nose +import pytest +import collections +from functools import partial + import numpy as np from pandas import Series, Timestamp @@ -8,15 +11,14 @@ import pandas.core.common as com import pandas.util.testing as tm -_multiprocess_can_split_ = True - def test_mut_exclusive(): msg = "mutually exclusive arguments: '[ab]' and '[ab]'" - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): com._mut_exclusive(a=1, b=2) assert com._mut_exclusive(a=1, b=None) == 1 assert com._mut_exclusive(major=None, major_axis=None) is None + assert com._mut_exclusive(a=None, b=2) == 2 def test_get_callable_name(): @@ -145,21 +147,21 @@ def test_random_state(): import numpy.random as npr # Check with seed state = com._random_state(5) - tm.assert_equal(state.uniform(), npr.RandomState(5).uniform()) + assert state.uniform() == npr.RandomState(5).uniform() # Check with random state object state2 = npr.RandomState(10) - tm.assert_equal( - com._random_state(state2).uniform(), npr.RandomState(10).uniform()) + assert (com._random_state(state2).uniform() == + npr.RandomState(10).uniform()) # check with no arg random state assert com._random_state() is np.random # Error for floats or strings - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): com._random_state('test') - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): com._random_state(5.5) @@ -198,6 +200,24 @@ def test_dict_compat(): assert (com._dict_compat(data_unchanged) == data_unchanged) -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) +def test_standardize_mapping(): + # No uninitialized defaultdicts + with pytest.raises(TypeError): + com.standardize_mapping(collections.defaultdict) + + # No non-mapping subtypes, instance + with pytest.raises(TypeError): + com.standardize_mapping([]) + + # No non-mapping subtypes, class + with pytest.raises(TypeError): + com.standardize_mapping(list) + + fill = {'bad': 'data'} + assert (com.standardize_mapping(fill) == dict) + + # Convert instance to type + assert (com.standardize_mapping({}) == dict) + + dd = 
collections.defaultdict(list) + assert isinstance(com.standardize_mapping(dd), partial) diff --git a/pandas/tests/test_compat.py b/pandas/tests/test_compat.py index 68c0b81eb18ce..ff9d09c033164 100644 --- a/pandas/tests/test_compat.py +++ b/pandas/tests/test_compat.py @@ -6,21 +6,23 @@ from pandas.compat import (range, zip, map, filter, lrange, lzip, lmap, lfilter, builtins, iterkeys, itervalues, iteritems, next) -import pandas.util.testing as tm -class TestBuiltinIterators(tm.TestCase): +class TestBuiltinIterators(object): - def check_result(self, actual, expected, lengths): + @classmethod + def check_result(cls, actual, expected, lengths): for (iter_res, list_res), exp, length in zip(actual, expected, lengths): - self.assertNotIsInstance(iter_res, list) - tm.assertIsInstance(list_res, list) + assert not isinstance(iter_res, list) + assert isinstance(list_res, list) + iter_res = list(iter_res) - self.assertEqual(len(list_res), length) - self.assertEqual(len(iter_res), length) - self.assertEqual(iter_res, exp) - self.assertEqual(list_res, exp) + + assert len(list_res) == length + assert len(iter_res) == length + assert iter_res == exp + assert list_res == exp def test_range(self): actual1 = range(10) @@ -64,6 +66,6 @@ def test_zip(self): self.check_result(actual, expected, lengths) def test_dict_iterators(self): - self.assertEqual(next(itervalues({1: 2})), 2) - self.assertEqual(next(iterkeys({1: 2})), 1) - self.assertEqual(next(iteritems({1: 2})), (1, 2)) + assert next(itervalues({1: 2})) == 2 + assert next(iterkeys({1: 2})) == 1 + assert next(iteritems({1: 2})) == (1, 2) diff --git a/pandas/tests/test_config.py b/pandas/tests/test_config.py index ed8c37fd6dd20..8d6f36ac6a798 100644 --- a/pandas/tests/test_config.py +++ b/pandas/tests/test_config.py @@ -1,29 +1,36 @@ # -*- coding: utf-8 -*- +import pytest + import pandas as pd -import unittest -import warnings +import warnings -class TestConfig(unittest.TestCase): - _multiprocess_can_split_ = True - def __init__(self, *args): - super(TestConfig, self).__init__(*args) +class TestConfig(object): + @classmethod + def setup_class(cls): from copy import deepcopy - self.cf = pd.core.config - self.gc = deepcopy(getattr(self.cf, '_global_config')) - self.do = deepcopy(getattr(self.cf, '_deprecated_options')) - self.ro = deepcopy(getattr(self.cf, '_registered_options')) - def setUp(self): + cls.cf = pd.core.config + cls.gc = deepcopy(getattr(cls.cf, '_global_config')) + cls.do = deepcopy(getattr(cls.cf, '_deprecated_options')) + cls.ro = deepcopy(getattr(cls.cf, '_registered_options')) + + def setup_method(self, method): setattr(self.cf, '_global_config', {}) - setattr( - self.cf, 'options', self.cf.DictWrapper(self.cf._global_config)) + setattr(self.cf, 'options', self.cf.DictWrapper( + self.cf._global_config)) setattr(self.cf, '_deprecated_options', {}) setattr(self.cf, '_registered_options', {}) - def tearDown(self): + # Our test fixture in conftest.py sets "chained_assignment" + # to "raise" only after all test methods have been setup. + # However, after this setup, there is no longer any + # "chained_assignment" option, so re-register it. 
+ self.cf.register_option('chained_assignment', 'raise') + + def teardown_method(self, method): setattr(self.cf, '_global_config', self.gc) setattr(self.cf, '_deprecated_options', self.do) setattr(self.cf, '_registered_options', self.ro) @@ -31,36 +38,36 @@ def tearDown(self): def test_api(self): # the pandas object exposes the user API - self.assertTrue(hasattr(pd, 'get_option')) - self.assertTrue(hasattr(pd, 'set_option')) - self.assertTrue(hasattr(pd, 'reset_option')) - self.assertTrue(hasattr(pd, 'describe_option')) + assert hasattr(pd, 'get_option') + assert hasattr(pd, 'set_option') + assert hasattr(pd, 'reset_option') + assert hasattr(pd, 'describe_option') def test_is_one_of_factory(self): v = self.cf.is_one_of_factory([None, 12]) v(12) v(None) - self.assertRaises(ValueError, v, 1.1) + pytest.raises(ValueError, v, 1.1) def test_register_option(self): self.cf.register_option('a', 1, 'doc') # can't register an already registered option - self.assertRaises(KeyError, self.cf.register_option, 'a', 1, 'doc') + pytest.raises(KeyError, self.cf.register_option, 'a', 1, 'doc') # can't register an already registered option - self.assertRaises(KeyError, self.cf.register_option, 'a.b.c.d1', 1, - 'doc') - self.assertRaises(KeyError, self.cf.register_option, 'a.b.c.d2', 1, - 'doc') + pytest.raises(KeyError, self.cf.register_option, 'a.b.c.d1', 1, + 'doc') + pytest.raises(KeyError, self.cf.register_option, 'a.b.c.d2', 1, + 'doc') # no python keywords - self.assertRaises(ValueError, self.cf.register_option, 'for', 0) - self.assertRaises(ValueError, self.cf.register_option, 'a.for.b', 0) + pytest.raises(ValueError, self.cf.register_option, 'for', 0) + pytest.raises(ValueError, self.cf.register_option, 'a.for.b', 0) # must be valid identifier (ensure attribute access works) - self.assertRaises(ValueError, self.cf.register_option, - 'Oh my Goddess!', 0) + pytest.raises(ValueError, self.cf.register_option, + 'Oh my Goddess!', 0) # we can register options several levels deep # without predefining the intermediate steps @@ -83,56 +90,42 @@ def test_describe_option(self): self.cf.register_option('l', "foo") # non-existent keys raise KeyError - self.assertRaises(KeyError, self.cf.describe_option, 'no.such.key') + pytest.raises(KeyError, self.cf.describe_option, 'no.such.key') # we can get the description for any key we registered - self.assertTrue( - 'doc' in self.cf.describe_option('a', _print_desc=False)) - self.assertTrue( - 'doc2' in self.cf.describe_option('b', _print_desc=False)) - self.assertTrue( - 'precated' in self.cf.describe_option('b', _print_desc=False)) - - self.assertTrue( - 'doc3' in self.cf.describe_option('c.d.e1', _print_desc=False)) - self.assertTrue( - 'doc4' in self.cf.describe_option('c.d.e2', _print_desc=False)) + assert 'doc' in self.cf.describe_option('a', _print_desc=False) + assert 'doc2' in self.cf.describe_option('b', _print_desc=False) + assert 'precated' in self.cf.describe_option('b', _print_desc=False) + assert 'doc3' in self.cf.describe_option('c.d.e1', _print_desc=False) + assert 'doc4' in self.cf.describe_option('c.d.e2', _print_desc=False) # if no doc is specified we get a default message # saying "description not available" - self.assertTrue( - 'vailable' in self.cf.describe_option('f', _print_desc=False)) - self.assertTrue( - 'vailable' in self.cf.describe_option('g.h', _print_desc=False)) - self.assertTrue( - 'precated' in self.cf.describe_option('g.h', _print_desc=False)) - self.assertTrue( - 'k' in self.cf.describe_option('g.h', _print_desc=False)) + assert 
'vailable' in self.cf.describe_option('f', _print_desc=False) + assert 'vailable' in self.cf.describe_option('g.h', _print_desc=False) + assert 'precated' in self.cf.describe_option('g.h', _print_desc=False) + assert 'k' in self.cf.describe_option('g.h', _print_desc=False) # default is reported - self.assertTrue( - 'foo' in self.cf.describe_option('l', _print_desc=False)) + assert 'foo' in self.cf.describe_option('l', _print_desc=False) # current value is reported - self.assertFalse( - 'bar' in self.cf.describe_option('l', _print_desc=False)) + assert 'bar' not in self.cf.describe_option('l', _print_desc=False) self.cf.set_option("l", "bar") - self.assertTrue( - 'bar' in self.cf.describe_option('l', _print_desc=False)) + assert 'bar' in self.cf.describe_option('l', _print_desc=False) def test_case_insensitive(self): self.cf.register_option('KanBAN', 1, 'doc') - self.assertTrue( - 'doc' in self.cf.describe_option('kanbaN', _print_desc=False)) - self.assertEqual(self.cf.get_option('kanBaN'), 1) + assert 'doc' in self.cf.describe_option('kanbaN', _print_desc=False) + assert self.cf.get_option('kanBaN') == 1 self.cf.set_option('KanBan', 2) - self.assertEqual(self.cf.get_option('kAnBaN'), 2) + assert self.cf.get_option('kAnBaN') == 2 # gets of non-existent keys fail - self.assertRaises(KeyError, self.cf.get_option, 'no_such_option') + pytest.raises(KeyError, self.cf.get_option, 'no_such_option') self.cf.deprecate_option('KanBan') - self.assertTrue(self.cf._is_deprecated('kAnBaN')) + assert self.cf._is_deprecated('kAnBaN') def test_get_option(self): self.cf.register_option('a', 1, 'doc') @@ -140,118 +133,118 @@ def test_get_option(self): self.cf.register_option('b.b', None, 'doc2') # gets of existing keys succeed - self.assertEqual(self.cf.get_option('a'), 1) - self.assertEqual(self.cf.get_option('b.c'), 'hullo') - self.assertTrue(self.cf.get_option('b.b') is None) + assert self.cf.get_option('a') == 1 + assert self.cf.get_option('b.c') == 'hullo' + assert self.cf.get_option('b.b') is None # gets of non-existent keys fail - self.assertRaises(KeyError, self.cf.get_option, 'no_such_option') + pytest.raises(KeyError, self.cf.get_option, 'no_such_option') def test_set_option(self): self.cf.register_option('a', 1, 'doc') self.cf.register_option('b.c', 'hullo', 'doc2') self.cf.register_option('b.b', None, 'doc2') - self.assertEqual(self.cf.get_option('a'), 1) - self.assertEqual(self.cf.get_option('b.c'), 'hullo') - self.assertTrue(self.cf.get_option('b.b') is None) + assert self.cf.get_option('a') == 1 + assert self.cf.get_option('b.c') == 'hullo' + assert self.cf.get_option('b.b') is None self.cf.set_option('a', 2) self.cf.set_option('b.c', 'wurld') self.cf.set_option('b.b', 1.1) - self.assertEqual(self.cf.get_option('a'), 2) - self.assertEqual(self.cf.get_option('b.c'), 'wurld') - self.assertEqual(self.cf.get_option('b.b'), 1.1) + assert self.cf.get_option('a') == 2 + assert self.cf.get_option('b.c') == 'wurld' + assert self.cf.get_option('b.b') == 1.1 - self.assertRaises(KeyError, self.cf.set_option, 'no.such.key', None) + pytest.raises(KeyError, self.cf.set_option, 'no.such.key', None) def test_set_option_empty_args(self): - self.assertRaises(ValueError, self.cf.set_option) + pytest.raises(ValueError, self.cf.set_option) def test_set_option_uneven_args(self): - self.assertRaises(ValueError, self.cf.set_option, 'a.b', 2, 'b.c') + pytest.raises(ValueError, self.cf.set_option, 'a.b', 2, 'b.c') def test_set_option_invalid_single_argument_type(self): - self.assertRaises(ValueError, self.cf.set_option, 
2) + pytest.raises(ValueError, self.cf.set_option, 2) def test_set_option_multiple(self): self.cf.register_option('a', 1, 'doc') self.cf.register_option('b.c', 'hullo', 'doc2') self.cf.register_option('b.b', None, 'doc2') - self.assertEqual(self.cf.get_option('a'), 1) - self.assertEqual(self.cf.get_option('b.c'), 'hullo') - self.assertTrue(self.cf.get_option('b.b') is None) + assert self.cf.get_option('a') == 1 + assert self.cf.get_option('b.c') == 'hullo' + assert self.cf.get_option('b.b') is None self.cf.set_option('a', '2', 'b.c', None, 'b.b', 10.0) - self.assertEqual(self.cf.get_option('a'), '2') - self.assertTrue(self.cf.get_option('b.c') is None) - self.assertEqual(self.cf.get_option('b.b'), 10.0) + assert self.cf.get_option('a') == '2' + assert self.cf.get_option('b.c') is None + assert self.cf.get_option('b.b') == 10.0 def test_validation(self): self.cf.register_option('a', 1, 'doc', validator=self.cf.is_int) self.cf.register_option('b.c', 'hullo', 'doc2', validator=self.cf.is_text) - self.assertRaises(ValueError, self.cf.register_option, 'a.b.c.d2', - 'NO', 'doc', validator=self.cf.is_int) + pytest.raises(ValueError, self.cf.register_option, 'a.b.c.d2', + 'NO', 'doc', validator=self.cf.is_int) self.cf.set_option('a', 2) # int is_int self.cf.set_option('b.c', 'wurld') # str is_str - self.assertRaises( + pytest.raises( ValueError, self.cf.set_option, 'a', None) # None not is_int - self.assertRaises(ValueError, self.cf.set_option, 'a', 'ab') - self.assertRaises(ValueError, self.cf.set_option, 'b.c', 1) + pytest.raises(ValueError, self.cf.set_option, 'a', 'ab') + pytest.raises(ValueError, self.cf.set_option, 'b.c', 1) validator = self.cf.is_one_of_factory([None, self.cf.is_callable]) self.cf.register_option('b', lambda: None, 'doc', validator=validator) self.cf.set_option('b', '%.1f'.format) # Formatter is callable self.cf.set_option('b', None) # Formatter is none (default) - self.assertRaises(ValueError, self.cf.set_option, 'b', '%.1f') + pytest.raises(ValueError, self.cf.set_option, 'b', '%.1f') def test_reset_option(self): self.cf.register_option('a', 1, 'doc', validator=self.cf.is_int) self.cf.register_option('b.c', 'hullo', 'doc2', validator=self.cf.is_str) - self.assertEqual(self.cf.get_option('a'), 1) - self.assertEqual(self.cf.get_option('b.c'), 'hullo') + assert self.cf.get_option('a') == 1 + assert self.cf.get_option('b.c') == 'hullo' self.cf.set_option('a', 2) self.cf.set_option('b.c', 'wurld') - self.assertEqual(self.cf.get_option('a'), 2) - self.assertEqual(self.cf.get_option('b.c'), 'wurld') + assert self.cf.get_option('a') == 2 + assert self.cf.get_option('b.c') == 'wurld' self.cf.reset_option('a') - self.assertEqual(self.cf.get_option('a'), 1) - self.assertEqual(self.cf.get_option('b.c'), 'wurld') + assert self.cf.get_option('a') == 1 + assert self.cf.get_option('b.c') == 'wurld' self.cf.reset_option('b.c') - self.assertEqual(self.cf.get_option('a'), 1) - self.assertEqual(self.cf.get_option('b.c'), 'hullo') + assert self.cf.get_option('a') == 1 + assert self.cf.get_option('b.c') == 'hullo' def test_reset_option_all(self): self.cf.register_option('a', 1, 'doc', validator=self.cf.is_int) self.cf.register_option('b.c', 'hullo', 'doc2', validator=self.cf.is_str) - self.assertEqual(self.cf.get_option('a'), 1) - self.assertEqual(self.cf.get_option('b.c'), 'hullo') + assert self.cf.get_option('a') == 1 + assert self.cf.get_option('b.c') == 'hullo' self.cf.set_option('a', 2) self.cf.set_option('b.c', 'wurld') - self.assertEqual(self.cf.get_option('a'), 2) - 
self.assertEqual(self.cf.get_option('b.c'), 'wurld') + assert self.cf.get_option('a') == 2 + assert self.cf.get_option('b.c') == 'wurld' self.cf.reset_option("all") - self.assertEqual(self.cf.get_option('a'), 1) - self.assertEqual(self.cf.get_option('b.c'), 'hullo') + assert self.cf.get_option('a') == 1 + assert self.cf.get_option('b.c') == 'hullo' def test_deprecate_option(self): # we can deprecate non-existent options self.cf.deprecate_option('foo') - self.assertTrue(self.cf._is_deprecated('foo')) + assert self.cf._is_deprecated('foo') with warnings.catch_warnings(record=True) as w: warnings.simplefilter('always') try: @@ -261,9 +254,8 @@ def test_deprecate_option(self): else: self.fail("Nonexistent option didn't raise KeyError") - self.assertEqual(len(w), 1) # should have raised one warning - self.assertTrue( - 'deprecated' in str(w[-1])) # we get the default message + assert len(w) == 1 # should have raised one warning + assert 'deprecated' in str(w[-1]) # we get the default message self.cf.register_option('a', 1, 'doc', validator=self.cf.is_int) self.cf.register_option('b.c', 'hullo', 'doc2') @@ -274,13 +266,11 @@ def test_deprecate_option(self): warnings.simplefilter('always') self.cf.get_option('a') - self.assertEqual(len(w), 1) # should have raised one warning - self.assertTrue( - 'eprecated' in str(w[-1])) # we get the default message - self.assertTrue( - 'nifty_ver' in str(w[-1])) # with the removal_ver quoted + assert len(w) == 1 # should have raised one warning + assert 'eprecated' in str(w[-1]) # we get the default message + assert 'nifty_ver' in str(w[-1]) # with the removal_ver quoted - self.assertRaises( + pytest.raises( KeyError, self.cf.deprecate_option, 'a') # can't depr. twice self.cf.deprecate_option('b.c', 'zounds!') @@ -288,66 +278,60 @@ def test_deprecate_option(self): warnings.simplefilter('always') self.cf.get_option('b.c') - self.assertEqual(len(w), 1) # should have raised one warning - self.assertTrue( - 'zounds!' in str(w[-1])) # we get the custom message + assert len(w) == 1 # should have raised one warning + assert 'zounds!' 
in str(w[-1]) # we get the custom message # test rerouting keys self.cf.register_option('d.a', 'foo', 'doc2') self.cf.register_option('d.dep', 'bar', 'doc2') - self.assertEqual(self.cf.get_option('d.a'), 'foo') - self.assertEqual(self.cf.get_option('d.dep'), 'bar') + assert self.cf.get_option('d.a') == 'foo' + assert self.cf.get_option('d.dep') == 'bar' self.cf.deprecate_option('d.dep', rkey='d.a') # reroute d.dep to d.a with warnings.catch_warnings(record=True) as w: warnings.simplefilter('always') - self.assertEqual(self.cf.get_option('d.dep'), 'foo') + assert self.cf.get_option('d.dep') == 'foo' - self.assertEqual(len(w), 1) # should have raised one warning - self.assertTrue( - 'eprecated' in str(w[-1])) # we get the custom message + assert len(w) == 1 # should have raised one warning + assert 'eprecated' in str(w[-1]) # we get the custom message with warnings.catch_warnings(record=True) as w: warnings.simplefilter('always') self.cf.set_option('d.dep', 'baz') # should overwrite "d.a" - self.assertEqual(len(w), 1) # should have raised one warning - self.assertTrue( - 'eprecated' in str(w[-1])) # we get the custom message + assert len(w) == 1 # should have raised one warning + assert 'eprecated' in str(w[-1]) # we get the custom message with warnings.catch_warnings(record=True) as w: warnings.simplefilter('always') - self.assertEqual(self.cf.get_option('d.dep'), 'baz') + assert self.cf.get_option('d.dep') == 'baz' - self.assertEqual(len(w), 1) # should have raised one warning - self.assertTrue( - 'eprecated' in str(w[-1])) # we get the custom message + assert len(w) == 1 # should have raised one warning + assert 'eprecated' in str(w[-1]) # we get the custom message def test_config_prefix(self): with self.cf.config_prefix("base"): self.cf.register_option('a', 1, "doc1") self.cf.register_option('b', 2, "doc2") - self.assertEqual(self.cf.get_option('a'), 1) - self.assertEqual(self.cf.get_option('b'), 2) + assert self.cf.get_option('a') == 1 + assert self.cf.get_option('b') == 2 self.cf.set_option('a', 3) self.cf.set_option('b', 4) - self.assertEqual(self.cf.get_option('a'), 3) - self.assertEqual(self.cf.get_option('b'), 4) + assert self.cf.get_option('a') == 3 + assert self.cf.get_option('b') == 4 - self.assertEqual(self.cf.get_option('base.a'), 3) - self.assertEqual(self.cf.get_option('base.b'), 4) - self.assertTrue( - 'doc1' in self.cf.describe_option('base.a', _print_desc=False)) - self.assertTrue( - 'doc2' in self.cf.describe_option('base.b', _print_desc=False)) + assert self.cf.get_option('base.a') == 3 + assert self.cf.get_option('base.b') == 4 + assert 'doc1' in self.cf.describe_option('base.a', _print_desc=False) + assert 'doc2' in self.cf.describe_option('base.b', _print_desc=False) self.cf.reset_option('base.a') self.cf.reset_option('base.b') with self.cf.config_prefix("base"): - self.assertEqual(self.cf.get_option('a'), 1) - self.assertEqual(self.cf.get_option('b'), 2) + assert self.cf.get_option('a') == 1 + assert self.cf.get_option('b') == 2 def test_callback(self): k = [None] @@ -362,21 +346,21 @@ def callback(key): del k[-1], v[-1] self.cf.set_option("d.a", "fooz") - self.assertEqual(k[-1], "d.a") - self.assertEqual(v[-1], "fooz") + assert k[-1] == "d.a" + assert v[-1] == "fooz" del k[-1], v[-1] self.cf.set_option("d.b", "boo") - self.assertEqual(k[-1], "d.b") - self.assertEqual(v[-1], "boo") + assert k[-1] == "d.b" + assert v[-1] == "boo" del k[-1], v[-1] self.cf.reset_option("d.b") - self.assertEqual(k[-1], "d.b") + assert k[-1] == "d.b" def test_set_ContextManager(self): 
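# helper: assert that option "a" currently equals val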
def eq(val): - self.assertEqual(self.cf.get_option("a"), val) + assert self.cf.get_option("a") == val self.cf.register_option('a', 0) eq(0) @@ -406,22 +390,22 @@ def f3(key): self.cf.register_option('c', 0, cb=f3) options = self.cf.options - self.assertEqual(options.a, 0) + assert options.a == 0 with self.cf.option_context("a", 15): - self.assertEqual(options.a, 15) + assert options.a == 15 options.a = 500 - self.assertEqual(self.cf.get_option("a"), 500) + assert self.cf.get_option("a") == 500 self.cf.reset_option("a") - self.assertEqual(options.a, self.cf.get_option("a", 0)) + assert options.a == self.cf.get_option("a", 0) - self.assertRaises(KeyError, f) - self.assertRaises(KeyError, f2) + pytest.raises(KeyError, f) + pytest.raises(KeyError, f2) # make sure callback kicks when using this form of setting options.c = 1 - self.assertEqual(len(holder), 1) + assert len(holder) == 1 def test_option_context_scope(self): # Ensure that creating a context does not affect the existing @@ -436,11 +420,11 @@ def test_option_context_scope(self): # Ensure creating contexts didn't affect the current context. ctx = self.cf.option_context(option_name, context_value) - self.assertEqual(self.cf.get_option(option_name), original_value) + assert self.cf.get_option(option_name) == original_value # Ensure the correct value is available inside the context. with ctx: - self.assertEqual(self.cf.get_option(option_name), context_value) + assert self.cf.get_option(option_name) == context_value # Ensure the current context is reset - self.assertEqual(self.cf.get_option(option_name), original_value) + assert self.cf.get_option(option_name) == original_value diff --git a/pandas/tests/test_downstream.py b/pandas/tests/test_downstream.py new file mode 100644 index 0000000000000..61f0c992225c6 --- /dev/null +++ b/pandas/tests/test_downstream.py @@ -0,0 +1,107 @@ +""" +Testing that we work in the downstream packages +""" +import pytest +import numpy as np # noqa +from pandas import DataFrame +from pandas.compat import PY36 +from pandas.util import testing as tm +import importlib + + +def import_module(name): + # we *only* want to skip if the module is truly not available + # and NOT just an actual import error because of pandas changes + + if PY36: + try: + return importlib.import_module(name) + except ModuleNotFoundError: # noqa + pytest.skip("skipping as {} not available".format(name)) + + else: + try: + return importlib.import_module(name) + except ImportError as e: + if "No module named" in str(e) and name in str(e): + pytest.skip("skipping as {} not available".format(name)) + raise + + +@pytest.fixture +def df(): + return DataFrame({'A': [1, 2, 3]}) + + +def test_dask(df): + + toolz = import_module('toolz') # noqa + dask = import_module('dask') # noqa + + import dask.dataframe as dd + + ddf = dd.from_pandas(df, npartitions=3) + assert ddf.A is not None + assert ddf.compute() is not None + + +def test_xarray(df): + + xarray = import_module('xarray') # noqa + + assert df.to_xarray() is not None + + +@tm.network +def test_statsmodels(): + + statsmodels = import_module('statsmodels') # noqa + import statsmodels.api as sm + import statsmodels.formula.api as smf + df = sm.datasets.get_rdataset("Guerry", "HistData").data + smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=df).fit() + + +def test_scikit_learn(df): + + sklearn = import_module('sklearn') # noqa + from sklearn import svm, datasets + + digits = datasets.load_digits() + clf = svm.SVC(gamma=0.001, C=100.) 
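+ # smoke test: fit on all digit samples but the last, then predict the
+ # held-out sample; the test only checks that fit/predict run end-to-end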
+ clf.fit(digits.data[:-1], digits.target[:-1]) + clf.predict(digits.data[-1:]) + + +def test_seaborn(): + + seaborn = import_module('seaborn') + tips = seaborn.load_dataset("tips") + seaborn.stripplot(x="day", y="total_bill", data=tips) + + +def test_pandas_gbq(df): + + pandas_gbq = import_module('pandas_gbq') # noqa + + +@tm.network +def test_pandas_datareader(): + + pandas_datareader = import_module('pandas_datareader') # noqa + pandas_datareader.get_data_google('AAPL') + + +def test_geopandas(): + + geopandas = import_module('geopandas') # noqa + fp = geopandas.datasets.get_path('naturalearth_lowres') + assert geopandas.read_file(fp) is not None + + +def test_pyarrow(df): + + pyarrow = import_module('pyarrow') # noqa + table = pyarrow.Table.from_pandas(df) + result = table.to_pandas() + tm.assert_frame_equal(result, df) diff --git a/pandas/tests/test_errors.py b/pandas/tests/test_errors.py new file mode 100644 index 0000000000000..babf88ef1df8d --- /dev/null +++ b/pandas/tests/test_errors.py @@ -0,0 +1,52 @@ +# -*- coding: utf-8 -*- + +import pytest +from warnings import catch_warnings +import pandas # noqa +import pandas as pd + + +@pytest.mark.parametrize( + "exc", ['UnsupportedFunctionCall', 'UnsortedIndexError', + 'OutOfBoundsDatetime', + 'ParserError', 'PerformanceWarning', 'DtypeWarning', + 'EmptyDataError', 'ParserWarning', 'MergeError']) +def test_exception_importable(exc): + from pandas import errors + e = getattr(errors, exc) + assert e is not None + + # check that we can raise on them + with pytest.raises(e): + raise e() + + +def test_catch_oob(): + from pandas import errors + + try: + pd.Timestamp('15000101') + except errors.OutOfBoundsDatetime: + pass + + +def test_error_rename(): + # see gh-12665 + from pandas.errors import ParserError + from pandas.io.common import CParserError + + try: + raise CParserError() + except ParserError: + pass + + try: + raise ParserError() + except CParserError: + pass + + with catch_warnings(record=True): + try: + raise ParserError() + except pd.parser.CParserError: + pass diff --git a/pandas/tests/test_expressions.py b/pandas/tests/test_expressions.py index c037f02f20609..2b972477ae999 100644 --- a/pandas/tests/test_expressions.py +++ b/pandas/tests/test_expressions.py @@ -2,33 +2,25 @@ from __future__ import print_function # pylint: disable-msg=W0612,E1101 -import nose +from warnings import catch_warnings import re +import operator +import pytest from numpy.random import randn -import operator import numpy as np from pandas.core.api import DataFrame, Panel -from pandas.computation import expressions as expr -from pandas import compat +from pandas.core.computation import expressions as expr +from pandas import compat, _np_version_under1p11, _np_version_under1p13 from pandas.util.testing import (assert_almost_equal, assert_series_equal, assert_frame_equal, assert_panel_equal, - assert_panel4d_equal, slow) -from pandas.formats.printing import pprint_thing + assert_panel4d_equal) +from pandas.io.formats.printing import pprint_thing import pandas.util.testing as tm -if not expr._USE_NUMEXPR: - try: - import numexpr # noqa - except ImportError: - msg = "don't have" - else: - msg = "not using" - raise nose.SkipTest("{0} numexpr".format(msg)) - _frame = DataFrame(randn(10000, 4), columns=list('ABCD'), dtype='float64') _frame2 = DataFrame(randn(100, 4), columns=list('ABCD'), dtype='float64') _mixed = DataFrame({'A': _frame['A'].copy(), @@ -41,26 +33,32 @@ 'D': _frame2['D'].astype('int32')}) _integer = DataFrame( np.random.randint(1, 100, - 
size=(10001, 4)), columns=list('ABCD'), dtype='int64') + size=(10001, 4)), + columns=list('ABCD'), dtype='int64') _integer2 = DataFrame(np.random.randint(1, 100, size=(101, 4)), columns=list('ABCD'), dtype='int64') -_frame_panel = Panel(dict(ItemA=_frame.copy(), ItemB=( - _frame.copy() + 3), ItemC=_frame.copy(), ItemD=_frame.copy())) -_frame2_panel = Panel(dict(ItemA=_frame2.copy(), ItemB=(_frame2.copy() + 3), - ItemC=_frame2.copy(), ItemD=_frame2.copy())) -_integer_panel = Panel(dict(ItemA=_integer, ItemB=(_integer + 34).astype( - 'int64'))) -_integer2_panel = Panel(dict(ItemA=_integer2, ItemB=(_integer2 + 34).astype( - 'int64'))) -_mixed_panel = Panel(dict(ItemA=_mixed, ItemB=(_mixed + 3))) -_mixed2_panel = Panel(dict(ItemA=_mixed2, ItemB=(_mixed2 + 3))) +with catch_warnings(record=True): + _frame_panel = Panel(dict(ItemA=_frame.copy(), + ItemB=(_frame.copy() + 3), + ItemC=_frame.copy(), + ItemD=_frame.copy())) + _frame2_panel = Panel(dict(ItemA=_frame2.copy(), + ItemB=(_frame2.copy() + 3), + ItemC=_frame2.copy(), + ItemD=_frame2.copy())) + _integer_panel = Panel(dict(ItemA=_integer, + ItemB=(_integer + 34).astype('int64'))) + _integer2_panel = Panel(dict(ItemA=_integer2, + ItemB=(_integer2 + 34).astype('int64'))) + _mixed_panel = Panel(dict(ItemA=_mixed, ItemB=(_mixed + 3))) + _mixed2_panel = Panel(dict(ItemA=_mixed2, ItemB=(_mixed2 + 3))) -class TestExpressions(tm.TestCase): - _multiprocess_can_split_ = False +@pytest.mark.skipif(not expr._USE_NUMEXPR, reason='not using numexpr') +class TestExpressions(object): - def setUp(self): + def setup_method(self, method): self.frame = _frame.copy() self.frame2 = _frame2.copy() @@ -69,17 +67,23 @@ def setUp(self): self.integer = _integer.copy() self._MIN_ELEMENTS = expr._MIN_ELEMENTS - def tearDown(self): + def teardown_method(self, method): expr._MIN_ELEMENTS = self._MIN_ELEMENTS - @nose.tools.nottest - def run_arithmetic_test(self, df, other, assert_func, check_dtype=False, - test_flex=True): + def run_arithmetic(self, df, other, assert_func, check_dtype=False, + test_flex=True): expr._MIN_ELEMENTS = 0 operations = ['add', 'sub', 'mul', 'mod', 'truediv', 'floordiv', 'pow'] if not compat.PY3: operations.append('div') for arith in operations: + + # numpy >= 1.11 doesn't handle integers + # raised to integer powers + # https://github.com/pandas-dev/pandas/issues/15363 + if arith == 'pow' and not _np_version_under1p11: + continue + operator_name = arith if arith == 'div': operator_name = 'truediv' @@ -92,6 +96,7 @@ def run_arithmetic_test(self, df, other, assert_func, check_dtype=False, expr.set_use_numexpr(False) expected = op(df, other) expr.set_use_numexpr(True) + result = op(df, other) try: if check_dtype: @@ -103,15 +108,14 @@ def run_arithmetic_test(self, df, other, assert_func, check_dtype=False, raise def test_integer_arithmetic(self): - self.run_arithmetic_test(self.integer, self.integer, - assert_frame_equal) - self.run_arithmetic_test(self.integer.iloc[:, 0], - self.integer.iloc[:, 0], assert_series_equal, - check_dtype=True) - - @nose.tools.nottest - def run_binary_test(self, df, other, assert_func, test_flex=False, - numexpr_ops=set(['gt', 'lt', 'ge', 'le', 'eq', 'ne'])): + self.run_arithmetic(self.integer, self.integer, + assert_frame_equal) + self.run_arithmetic(self.integer.iloc[:, 0], + self.integer.iloc[:, 0], assert_series_equal, + check_dtype=True) + + def run_binary(self, df, other, assert_func, test_flex=False, + numexpr_ops=set(['gt', 'lt', 'ge', 'le', 'eq', 'ne'])): """ tests solely that the result is the same whether or 
not numexpr is enabled. Need to test whether the function does the correct thing @@ -145,46 +149,46 @@ def run_binary_test(self, df, other, assert_func, test_flex=False, def run_frame(self, df, other, binary_comp=None, run_binary=True, **kwargs): - self.run_arithmetic_test(df, other, assert_frame_equal, - test_flex=False, **kwargs) - self.run_arithmetic_test(df, other, assert_frame_equal, test_flex=True, - **kwargs) + self.run_arithmetic(df, other, assert_frame_equal, + test_flex=False, **kwargs) + self.run_arithmetic(df, other, assert_frame_equal, test_flex=True, + **kwargs) if run_binary: if binary_comp is None: expr.set_use_numexpr(False) binary_comp = other + 1 expr.set_use_numexpr(True) - self.run_binary_test(df, binary_comp, assert_frame_equal, - test_flex=False, **kwargs) - self.run_binary_test(df, binary_comp, assert_frame_equal, - test_flex=True, **kwargs) + self.run_binary(df, binary_comp, assert_frame_equal, + test_flex=False, **kwargs) + self.run_binary(df, binary_comp, assert_frame_equal, + test_flex=True, **kwargs) def run_series(self, ser, other, binary_comp=None, **kwargs): - self.run_arithmetic_test(ser, other, assert_series_equal, - test_flex=False, **kwargs) - self.run_arithmetic_test(ser, other, assert_almost_equal, - test_flex=True, **kwargs) + self.run_arithmetic(ser, other, assert_series_equal, + test_flex=False, **kwargs) + self.run_arithmetic(ser, other, assert_almost_equal, + test_flex=True, **kwargs) # series doesn't uses vec_compare instead of numexpr... # if binary_comp is None: # binary_comp = other + 1 - # self.run_binary_test(ser, binary_comp, assert_frame_equal, + # self.run_binary(ser, binary_comp, assert_frame_equal, # test_flex=False, **kwargs) - # self.run_binary_test(ser, binary_comp, assert_frame_equal, + # self.run_binary(ser, binary_comp, assert_frame_equal, # test_flex=True, **kwargs) def run_panel(self, panel, other, binary_comp=None, run_binary=True, assert_func=assert_panel_equal, **kwargs): - self.run_arithmetic_test(panel, other, assert_func, test_flex=False, - **kwargs) - self.run_arithmetic_test(panel, other, assert_func, test_flex=True, - **kwargs) + self.run_arithmetic(panel, other, assert_func, test_flex=False, + **kwargs) + self.run_arithmetic(panel, other, assert_func, test_flex=True, + **kwargs) if run_binary: if binary_comp is None: binary_comp = other + 1 - self.run_binary_test(panel, binary_comp, assert_func, - test_flex=False, **kwargs) - self.run_binary_test(panel, binary_comp, assert_func, - test_flex=True, **kwargs) + self.run_binary(panel, binary_comp, assert_func, + test_flex=False, **kwargs) + self.run_binary(panel, binary_comp, assert_func, + test_flex=True, **kwargs) def test_integer_arithmetic_frame(self): self.run_frame(self.integer, self.integer) @@ -192,7 +196,7 @@ def test_integer_arithmetic_frame(self): def test_integer_arithmetic_series(self): self.run_series(self.integer.iloc[:, 0], self.integer.iloc[:, 0]) - @slow + @pytest.mark.slow def test_integer_panel(self): self.run_panel(_integer2_panel, np.random.randint(1, 100)) @@ -202,13 +206,13 @@ def test_float_arithemtic_frame(self): def test_float_arithmetic_series(self): self.run_series(self.frame2.iloc[:, 0], self.frame2.iloc[:, 0]) - @slow + @pytest.mark.slow def test_float_panel(self): self.run_panel(_frame2_panel, np.random.randn() + 0.1, binary_comp=0.8) - @slow + @pytest.mark.slow def test_panel4d(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): self.run_panel(tm.makePanel4D(), 
np.random.randn() + 0.5, assert_func=assert_panel4d_equal, binary_comp=3) @@ -222,50 +226,50 @@ def test_mixed_arithmetic_series(self): for col in self.mixed2.columns: self.run_series(self.mixed2[col], self.mixed2[col], binary_comp=4) - @slow + @pytest.mark.slow def test_mixed_panel(self): self.run_panel(_mixed2_panel, np.random.randint(1, 100), binary_comp=-2) def test_float_arithemtic(self): - self.run_arithmetic_test(self.frame, self.frame, assert_frame_equal) - self.run_arithmetic_test(self.frame.iloc[:, 0], self.frame.iloc[:, 0], - assert_series_equal, check_dtype=True) + self.run_arithmetic(self.frame, self.frame, assert_frame_equal) + self.run_arithmetic(self.frame.iloc[:, 0], self.frame.iloc[:, 0], + assert_series_equal, check_dtype=True) def test_mixed_arithmetic(self): - self.run_arithmetic_test(self.mixed, self.mixed, assert_frame_equal) + self.run_arithmetic(self.mixed, self.mixed, assert_frame_equal) for col in self.mixed.columns: - self.run_arithmetic_test(self.mixed[col], self.mixed[col], - assert_series_equal) + self.run_arithmetic(self.mixed[col], self.mixed[col], + assert_series_equal) def test_integer_with_zeros(self): self.integer *= np.random.randint(0, 2, size=np.shape(self.integer)) - self.run_arithmetic_test(self.integer, self.integer, - assert_frame_equal) - self.run_arithmetic_test(self.integer.iloc[:, 0], - self.integer.iloc[:, 0], assert_series_equal) + self.run_arithmetic(self.integer, self.integer, + assert_frame_equal) + self.run_arithmetic(self.integer.iloc[:, 0], + self.integer.iloc[:, 0], assert_series_equal) def test_invalid(self): # no op result = expr._can_use_numexpr(operator.add, None, self.frame, self.frame, 'evaluate') - self.assertFalse(result) + assert not result # mixed result = expr._can_use_numexpr(operator.add, '+', self.mixed, self.frame, 'evaluate') - self.assertFalse(result) + assert not result # min elements result = expr._can_use_numexpr(operator.add, '+', self.frame2, self.frame2, 'evaluate') - self.assertFalse(result) + assert not result # ok, we only check on first part of expression result = expr._can_use_numexpr(operator.add, '+', self.frame, self.frame2, 'evaluate') - self.assertTrue(result) + assert result def test_binary_ops(self): def testit(): @@ -275,6 +279,13 @@ def testit(): for op, op_str in [('add', '+'), ('sub', '-'), ('mul', '*'), ('div', '/'), ('pow', '**')]: + + # numpy >= 1.11 doesn't handle integers + # raised to integer powers + # https://github.com/pandas-dev/pandas/issues/15363 + if op == 'pow' and not _np_version_under1p11: + continue + if op == 'div': op = getattr(operator, 'truediv', None) else: @@ -282,7 +293,7 @@ def testit(): if op is not None: result = expr._can_use_numexpr(op, op_str, f, f, 'evaluate') - self.assertNotEqual(result, f._is_mixed_type) + assert result != f._is_mixed_type result = expr.evaluate(op, op_str, f, f, use_numexpr=True) @@ -297,7 +308,7 @@ def testit(): result = expr._can_use_numexpr(op, op_str, f2, f2, 'evaluate') - self.assertFalse(result) + assert not result expr.set_use_numexpr(False) testit() @@ -325,7 +336,7 @@ def testit(): result = expr._can_use_numexpr(op, op_str, f11, f12, 'evaluate') - self.assertNotEqual(result, f11._is_mixed_type) + assert result != f11._is_mixed_type result = expr.evaluate(op, op_str, f11, f12, use_numexpr=True) @@ -338,7 +349,7 @@ def testit(): result = expr._can_use_numexpr(op, op_str, f21, f22, 'evaluate') - self.assertFalse(result) + assert not result expr.set_use_numexpr(False) testit() @@ -379,22 +390,22 @@ def 
test_bool_ops_raise_on_arithmetic(self): f = getattr(operator, name) err_msg = re.escape(msg % op) - with tm.assertRaisesRegexp(NotImplementedError, err_msg): + with tm.assert_raises_regex(NotImplementedError, err_msg): f(df, df) - with tm.assertRaisesRegexp(NotImplementedError, err_msg): + with tm.assert_raises_regex(NotImplementedError, err_msg): f(df.a, df.b) - with tm.assertRaisesRegexp(NotImplementedError, err_msg): + with tm.assert_raises_regex(NotImplementedError, err_msg): f(df.a, True) - with tm.assertRaisesRegexp(NotImplementedError, err_msg): + with tm.assert_raises_regex(NotImplementedError, err_msg): f(False, df.a) - with tm.assertRaisesRegexp(TypeError, err_msg): + with tm.assert_raises_regex(TypeError, err_msg): f(False, df) - with tm.assertRaisesRegexp(TypeError, err_msg): + with tm.assert_raises_regex(TypeError, err_msg): f(df, True) def test_bool_ops_warn_on_arithmetic(self): @@ -409,6 +420,10 @@ def test_bool_ops_warn_on_arithmetic(self): f = getattr(operator, name) fe = getattr(operator, sub_funcs[subs[op]]) + # >= 1.13.0 these are now TypeErrors + if op == '-' and not _np_version_under1p13: + continue + with tm.use_numexpr(True, min_elements=5): with tm.assert_produces_warning(check_stacklevel=False): r = f(df, df) @@ -439,9 +454,3 @@ def test_bool_ops_warn_on_arithmetic(self): r = f(df, True) e = fe(df, True) tm.assert_frame_equal(r, e) - - -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/test_generic.py b/pandas/tests/test_generic.py deleted file mode 100644 index f7b7ae8c66382..0000000000000 --- a/pandas/tests/test_generic.py +++ /dev/null @@ -1,2025 +0,0 @@ -# -*- coding: utf-8 -*- -# pylint: disable-msg=E1101,W0612 - -from operator import methodcaller -import nose -import numpy as np -from numpy import nan -import pandas as pd - -from pandas.types.common import is_scalar -from pandas import (Index, Series, DataFrame, Panel, isnull, - date_range, period_range, Panel4D) -from pandas.core.index import MultiIndex - -import pandas.formats.printing as printing - -from pandas.compat import range, zip, PY3 -from pandas import compat -from pandas.util.testing import (assertRaisesRegexp, - assert_series_equal, - assert_frame_equal, - assert_panel_equal, - assert_panel4d_equal, - assert_almost_equal) - -import pandas.util.testing as tm - - -# ---------------------------------------------------------------------- -# Generic types test cases - - -class Generic(object): - - _multiprocess_can_split_ = True - - def setUp(self): - pass - - @property - def _ndim(self): - return self._typ._AXIS_LEN - - def _axes(self): - """ return the axes for my object typ """ - return self._typ._AXIS_ORDERS - - def _construct(self, shape, value=None, dtype=None, **kwargs): - """ construct an object for the given shape - if value is specified use that if its a scalar - if value is an array, repeat it as needed """ - - if isinstance(shape, int): - shape = tuple([shape] * self._ndim) - if value is not None: - if is_scalar(value): - if value == 'empty': - arr = None - - # remove the info axis - kwargs.pop(self._typ._info_axis_name, None) - else: - arr = np.empty(shape, dtype=dtype) - arr.fill(value) - else: - fshape = np.prod(shape) - arr = value.ravel() - new_shape = fshape / arr.shape[0] - if fshape % arr.shape[0] != 0: - raise Exception("invalid value passed in _construct") - - arr = np.repeat(arr, new_shape).reshape(shape) - else: - arr = np.random.randn(*shape) - return self._typ(arr, 
dtype=dtype, **kwargs) - - def _compare(self, result, expected): - self._comparator(result, expected) - - def test_rename(self): - - # single axis - idx = list('ABCD') - # relabeling values passed into self.rename - args = [ - str.lower, - {x: x.lower() for x in idx}, - Series({x: x.lower() for x in idx}), - ] - - for axis in self._axes(): - kwargs = {axis: idx} - obj = self._construct(4, **kwargs) - - for arg in args: - # rename a single axis - result = obj.rename(**{axis: arg}) - expected = obj.copy() - setattr(expected, axis, list('abcd')) - self._compare(result, expected) - - # multiple axes at once - - def test_rename_axis(self): - idx = list('ABCD') - # relabeling values passed into self.rename - args = [ - str.lower, - {x: x.lower() for x in idx}, - Series({x: x.lower() for x in idx}), - ] - - for axis in self._axes(): - kwargs = {axis: idx} - obj = self._construct(4, **kwargs) - - for arg in args: - # rename a single axis - result = obj.rename_axis(arg, axis=axis) - expected = obj.copy() - setattr(expected, axis, list('abcd')) - self._compare(result, expected) - # scalar values - for arg in ['foo', None]: - result = obj.rename_axis(arg, axis=axis) - expected = obj.copy() - getattr(expected, axis).name = arg - self._compare(result, expected) - - def test_get_numeric_data(self): - - n = 4 - kwargs = {} - for i in range(self._ndim): - kwargs[self._typ._AXIS_NAMES[i]] = list(range(n)) - - # get the numeric data - o = self._construct(n, **kwargs) - result = o._get_numeric_data() - self._compare(result, o) - - # non-inclusion - result = o._get_bool_data() - expected = self._construct(n, value='empty', **kwargs) - self._compare(result, expected) - - # get the bool data - arr = np.array([True, True, False, True]) - o = self._construct(n, value=arr, **kwargs) - result = o._get_numeric_data() - self._compare(result, o) - - # _get_numeric_data is includes _get_bool_data, so can't test for - # non-inclusion - - def test_get_default(self): - - # GH 7725 - d0 = "a", "b", "c", "d" - d1 = np.arange(4, dtype='int64') - others = "e", 10 - - for data, index in ((d0, d1), (d1, d0)): - s = Series(data, index=index) - for i, d in zip(index, data): - self.assertEqual(s.get(i), d) - self.assertEqual(s.get(i, d), d) - self.assertEqual(s.get(i, "z"), d) - for other in others: - self.assertEqual(s.get(other, "z"), "z") - self.assertEqual(s.get(other, other), other) - - def test_nonzero(self): - - # GH 4633 - # look at the boolean/nonzero behavior for objects - obj = self._construct(shape=4) - self.assertRaises(ValueError, lambda: bool(obj == 0)) - self.assertRaises(ValueError, lambda: bool(obj == 1)) - self.assertRaises(ValueError, lambda: bool(obj)) - - obj = self._construct(shape=4, value=1) - self.assertRaises(ValueError, lambda: bool(obj == 0)) - self.assertRaises(ValueError, lambda: bool(obj == 1)) - self.assertRaises(ValueError, lambda: bool(obj)) - - obj = self._construct(shape=4, value=np.nan) - self.assertRaises(ValueError, lambda: bool(obj == 0)) - self.assertRaises(ValueError, lambda: bool(obj == 1)) - self.assertRaises(ValueError, lambda: bool(obj)) - - # empty - obj = self._construct(shape=0) - self.assertRaises(ValueError, lambda: bool(obj)) - - # invalid behaviors - - obj1 = self._construct(shape=4, value=1) - obj2 = self._construct(shape=4, value=1) - - def f(): - if obj1: - printing.pprint_thing("this works and shouldn't") - - self.assertRaises(ValueError, f) - self.assertRaises(ValueError, lambda: obj1 and obj2) - self.assertRaises(ValueError, lambda: obj1 or obj2) - 
self.assertRaises(ValueError, lambda: not obj1) - - def test_numpy_1_7_compat_numeric_methods(self): - # GH 4435 - # numpy in 1.7 tries to pass addtional arguments to pandas functions - - o = self._construct(shape=4) - for op in ['min', 'max', 'max', 'var', 'std', 'prod', 'sum', 'cumsum', - 'cumprod', 'median', 'skew', 'kurt', 'compound', 'cummax', - 'cummin', 'all', 'any']: - f = getattr(np, op, None) - if f is not None: - f(o) - - def test_downcast(self): - # test close downcasting - - o = self._construct(shape=4, value=9, dtype=np.int64) - result = o.copy() - result._data = o._data.downcast(dtypes='infer') - self._compare(result, o) - - o = self._construct(shape=4, value=9.) - expected = o.astype(np.int64) - result = o.copy() - result._data = o._data.downcast(dtypes='infer') - self._compare(result, expected) - - o = self._construct(shape=4, value=9.5) - result = o.copy() - result._data = o._data.downcast(dtypes='infer') - self._compare(result, o) - - # are close - o = self._construct(shape=4, value=9.000000000005) - result = o.copy() - result._data = o._data.downcast(dtypes='infer') - expected = o.astype(np.int64) - self._compare(result, expected) - - def test_constructor_compound_dtypes(self): - # GH 5191 - # compound dtypes should raise not-implementederror - - def f(dtype): - return self._construct(shape=3, dtype=dtype) - - self.assertRaises(NotImplementedError, f, [("A", "datetime64[h]"), - ("B", "str"), - ("C", "int32")]) - - # these work (though results may be unexpected) - f('int64') - f('float64') - f('M8[ns]') - - def check_metadata(self, x, y=None): - for m in x._metadata: - v = getattr(x, m, None) - if y is None: - self.assertIsNone(v) - else: - self.assertEqual(v, getattr(y, m, None)) - - def test_metadata_propagation(self): - # check that the metadata matches up on the resulting ops - - o = self._construct(shape=3) - o.name = 'foo' - o2 = self._construct(shape=3) - o2.name = 'bar' - - # TODO - # Once panel can do non-trivial combine operations - # (currently there is an a raise in the Panel arith_ops to prevent - # this, though it actually does work) - # can remove all of these try: except: blocks on the actual operations - - # ---------- - # preserving - # ---------- - - # simple ops with scalars - for op in ['__add__', '__sub__', '__truediv__', '__mul__']: - result = getattr(o, op)(1) - self.check_metadata(o, result) - - # ops with like - for op in ['__add__', '__sub__', '__truediv__', '__mul__']: - try: - result = getattr(o, op)(o) - self.check_metadata(o, result) - except (ValueError, AttributeError): - pass - - # simple boolean - for op in ['__eq__', '__le__', '__ge__']: - v1 = getattr(o, op)(o) - self.check_metadata(o, v1) - - try: - self.check_metadata(o, v1 & v1) - except (ValueError): - pass - - try: - self.check_metadata(o, v1 | v1) - except (ValueError): - pass - - # combine_first - try: - result = o.combine_first(o2) - self.check_metadata(o, result) - except (AttributeError): - pass - - # --------------------------- - # non-preserving (by default) - # --------------------------- - - # add non-like - try: - result = o + o2 - self.check_metadata(result) - except (ValueError, AttributeError): - pass - - # simple boolean - for op in ['__eq__', '__le__', '__ge__']: - - # this is a name matching op - v1 = getattr(o, op)(o) - - v2 = getattr(o, op)(o2) - self.check_metadata(v2) - - try: - self.check_metadata(v1 & v2) - except (ValueError): - pass - - try: - self.check_metadata(v1 | v2) - except (ValueError): - pass - - def test_head_tail(self): - # GH5370 - - o = 
self._construct(shape=10) - - # check all index types - for index in [tm.makeFloatIndex, tm.makeIntIndex, tm.makeStringIndex, - tm.makeUnicodeIndex, tm.makeDateIndex, - tm.makePeriodIndex]: - axis = o._get_axis_name(0) - setattr(o, axis, index(len(getattr(o, axis)))) - - # Panel + dims - try: - o.head() - except (NotImplementedError): - raise nose.SkipTest('not implemented on {0}'.format( - o.__class__.__name__)) - - self._compare(o.head(), o.iloc[:5]) - self._compare(o.tail(), o.iloc[-5:]) - - # 0-len - self._compare(o.head(0), o.iloc[0:0]) - self._compare(o.tail(0), o.iloc[0:0]) - - # bounded - self._compare(o.head(len(o) + 1), o) - self._compare(o.tail(len(o) + 1), o) - - # neg index - self._compare(o.head(-3), o.head(7)) - self._compare(o.tail(-3), o.tail(7)) - - def test_sample(self): - # Fixes issue: 2419 - - o = self._construct(shape=10) - - ### - # Check behavior of random_state argument - ### - - # Check for stability when receives seed or random state -- run 10 - # times. - for test in range(10): - seed = np.random.randint(0, 100) - self._compare( - o.sample(n=4, random_state=seed), o.sample(n=4, - random_state=seed)) - self._compare( - o.sample(frac=0.7, random_state=seed), o.sample( - frac=0.7, random_state=seed)) - - self._compare( - o.sample(n=4, random_state=np.random.RandomState(test)), - o.sample(n=4, random_state=np.random.RandomState(test))) - - self._compare( - o.sample(frac=0.7, random_state=np.random.RandomState(test)), - o.sample(frac=0.7, random_state=np.random.RandomState(test))) - - os1, os2 = [], [] - for _ in range(2): - np.random.seed(test) - os1.append(o.sample(n=4)) - os2.append(o.sample(frac=0.7)) - self._compare(*os1) - self._compare(*os2) - - # Check for error when random_state argument invalid. - with tm.assertRaises(ValueError): - o.sample(random_state='astring!') - - ### - # Check behavior of `frac` and `N` - ### - - # Giving both frac and N throws error - with tm.assertRaises(ValueError): - o.sample(n=3, frac=0.3) - - # Check that raises right error for negative lengths - with tm.assertRaises(ValueError): - o.sample(n=-3) - with tm.assertRaises(ValueError): - o.sample(frac=-0.3) - - # Make sure float values of `n` give error - with tm.assertRaises(ValueError): - o.sample(n=3.2) - - # Check lengths are right - self.assertTrue(len(o.sample(n=4) == 4)) - self.assertTrue(len(o.sample(frac=0.34) == 3)) - self.assertTrue(len(o.sample(frac=0.36) == 4)) - - ### - # Check weights - ### - - # Weight length must be right - with tm.assertRaises(ValueError): - o.sample(n=3, weights=[0, 1]) - - with tm.assertRaises(ValueError): - bad_weights = [0.5] * 11 - o.sample(n=3, weights=bad_weights) - - with tm.assertRaises(ValueError): - bad_weight_series = Series([0, 0, 0.2]) - o.sample(n=4, weights=bad_weight_series) - - # Check won't accept negative weights - with tm.assertRaises(ValueError): - bad_weights = [-0.1] * 10 - o.sample(n=3, weights=bad_weights) - - # Check inf and -inf throw errors: - with tm.assertRaises(ValueError): - weights_with_inf = [0.1] * 10 - weights_with_inf[0] = np.inf - o.sample(n=3, weights=weights_with_inf) - - with tm.assertRaises(ValueError): - weights_with_ninf = [0.1] * 10 - weights_with_ninf[0] = -np.inf - o.sample(n=3, weights=weights_with_ninf) - - # All zeros raises errors - zero_weights = [0] * 10 - with tm.assertRaises(ValueError): - o.sample(n=3, weights=zero_weights) - - # All missing weights - nan_weights = [np.nan] * 10 - with tm.assertRaises(ValueError): - o.sample(n=3, weights=nan_weights) - - # Check np.nan are replaced by 
zeros. - weights_with_nan = [np.nan] * 10 - weights_with_nan[5] = 0.5 - self._compare( - o.sample(n=1, axis=0, weights=weights_with_nan), o.iloc[5:6]) - - # Check None are also replaced by zeros. - weights_with_None = [None] * 10 - weights_with_None[5] = 0.5 - self._compare( - o.sample(n=1, axis=0, weights=weights_with_None), o.iloc[5:6]) - - def test_size_compat(self): - # GH8846 - # size property should be defined - - o = self._construct(shape=10) - self.assertTrue(o.size == np.prod(o.shape)) - self.assertTrue(o.size == 10 ** len(o.axes)) - - def test_split_compat(self): - # xref GH8846 - o = self._construct(shape=10) - self.assertTrue(len(np.array_split(o, 5)) == 5) - self.assertTrue(len(np.array_split(o, 2)) == 2) - - def test_unexpected_keyword(self): # GH8597 - df = DataFrame(np.random.randn(5, 2), columns=['jim', 'joe']) - ca = pd.Categorical([0, 0, 2, 2, 3, np.nan]) - ts = df['joe'].copy() - ts[2] = np.nan - - with assertRaisesRegexp(TypeError, 'unexpected keyword'): - df.drop('joe', axis=1, in_place=True) - - with assertRaisesRegexp(TypeError, 'unexpected keyword'): - df.reindex([1, 0], inplace=True) - - with assertRaisesRegexp(TypeError, 'unexpected keyword'): - ca.fillna(0, inplace=True) - - with assertRaisesRegexp(TypeError, 'unexpected keyword'): - ts.fillna(0, in_place=True) - - # See gh-12301 - def test_stat_unexpected_keyword(self): - obj = self._construct(5) - starwars = 'Star Wars' - errmsg = 'unexpected keyword' - - with assertRaisesRegexp(TypeError, errmsg): - obj.max(epic=starwars) # stat_function - with assertRaisesRegexp(TypeError, errmsg): - obj.var(epic=starwars) # stat_function_ddof - with assertRaisesRegexp(TypeError, errmsg): - obj.sum(epic=starwars) # cum_function - with assertRaisesRegexp(TypeError, errmsg): - obj.any(epic=starwars) # logical_function - - def test_api_compat(self): - - # GH 12021 - # compat for __name__, __qualname__ - - obj = self._construct(5) - for func in ['sum', 'cumsum', 'any', 'var']: - f = getattr(obj, func) - self.assertEqual(f.__name__, func) - if PY3: - self.assertTrue(f.__qualname__.endswith(func)) - - def test_stat_non_defaults_args(self): - obj = self._construct(5) - out = np.array([0]) - errmsg = "the 'out' parameter is not supported" - - with assertRaisesRegexp(ValueError, errmsg): - obj.max(out=out) # stat_function - with assertRaisesRegexp(ValueError, errmsg): - obj.var(out=out) # stat_function_ddof - with assertRaisesRegexp(ValueError, errmsg): - obj.sum(out=out) # cum_function - with assertRaisesRegexp(ValueError, errmsg): - obj.any(out=out) # logical_function - - def test_clip(self): - lower = 1 - upper = 3 - col = np.arange(5) - - obj = self._construct(len(col), value=col) - - if isinstance(obj, Panel): - msg = "clip is not supported yet for panels" - tm.assertRaisesRegexp(NotImplementedError, msg, - obj.clip, lower=lower, - upper=upper) - - else: - out = obj.clip(lower=lower, upper=upper) - expected = self._construct(len(col), value=col - .clip(lower, upper)) - self._compare(out, expected) - - bad_axis = 'foo' - msg = ('No axis named {axis} ' - 'for object').format(axis=bad_axis) - assertRaisesRegexp(ValueError, msg, obj.clip, - lower=lower, upper=upper, - axis=bad_axis) - - def test_truncate_out_of_bounds(self): - # GH11382 - - # small - shape = [int(2e3)] + ([1] * (self._ndim - 1)) - small = self._construct(shape, dtype='int8') - self._compare(small.truncate(), small) - self._compare(small.truncate(before=0, after=3e3), small) - self._compare(small.truncate(before=-1, after=2e3), small) - - # big - shape = [int(2e6)] + 
([1] * (self._ndim - 1)) - big = self._construct(shape, dtype='int8') - self._compare(big.truncate(), big) - self._compare(big.truncate(before=0, after=3e6), big) - self._compare(big.truncate(before=-1, after=2e6), big) - - def test_numpy_clip(self): - lower = 1 - upper = 3 - col = np.arange(5) - - obj = self._construct(len(col), value=col) - - if isinstance(obj, Panel): - msg = "clip is not supported yet for panels" - tm.assertRaisesRegexp(NotImplementedError, msg, - np.clip, obj, - lower, upper) - else: - out = np.clip(obj, lower, upper) - expected = self._construct(len(col), value=col - .clip(lower, upper)) - self._compare(out, expected) - - msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, - np.clip, obj, - lower, upper, out=col) - - def test_validate_bool_args(self): - df = DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}) - invalid_values = [1, "True", [1, 2, 3], 5.0] - - for value in invalid_values: - with self.assertRaises(ValueError): - super(DataFrame, df).rename_axis(mapper={'a': 'x', 'b': 'y'}, - axis=1, inplace=value) - - with self.assertRaises(ValueError): - super(DataFrame, df).drop('a', axis=1, inplace=value) - - with self.assertRaises(ValueError): - super(DataFrame, df).sort_index(inplace=value) - - with self.assertRaises(ValueError): - super(DataFrame, df).consolidate(inplace=value) - - with self.assertRaises(ValueError): - super(DataFrame, df).fillna(value=0, inplace=value) - - with self.assertRaises(ValueError): - super(DataFrame, df).replace(to_replace=1, value=7, - inplace=value) - - with self.assertRaises(ValueError): - super(DataFrame, df).interpolate(inplace=value) - - with self.assertRaises(ValueError): - super(DataFrame, df)._where(cond=df.a > 2, inplace=value) - - with self.assertRaises(ValueError): - super(DataFrame, df).mask(cond=df.a > 2, inplace=value) - - -class TestSeries(tm.TestCase, Generic): - _typ = Series - _comparator = lambda self, x, y: assert_series_equal(x, y) - - def setUp(self): - self.ts = tm.makeTimeSeries() # Was at top level in test_series - self.ts.name = 'ts' - - self.series = tm.makeStringSeries() - self.series.name = 'series' - - def test_rename_mi(self): - s = Series([11, 21, 31], - index=MultiIndex.from_tuples( - [("A", x) for x in ["a", "B", "c"]])) - s.rename(str.lower) - - def test_set_axis_name(self): - s = Series([1, 2, 3], index=['a', 'b', 'c']) - funcs = ['rename_axis', '_set_axis_name'] - name = 'foo' - for func in funcs: - result = methodcaller(func, name)(s) - self.assertTrue(s.index.name is None) - self.assertEqual(result.index.name, name) - - def test_set_axis_name_mi(self): - s = Series([11, 21, 31], index=MultiIndex.from_tuples( - [("A", x) for x in ["a", "B", "c"]], - names=['l1', 'l2']) - ) - funcs = ['rename_axis', '_set_axis_name'] - for func in funcs: - result = methodcaller(func, ['L1', 'L2'])(s) - self.assertTrue(s.index.name is None) - self.assertEqual(s.index.names, ['l1', 'l2']) - self.assertTrue(result.index.name is None) - self.assertTrue(result.index.names, ['L1', 'L2']) - - def test_set_axis_name_raises(self): - s = pd.Series([1]) - with tm.assertRaises(ValueError): - s._set_axis_name(name='a', axis=1) - - def test_get_numeric_data_preserve_dtype(self): - - # get the numeric data - o = Series([1, 2, 3]) - result = o._get_numeric_data() - self._compare(result, o) - - o = Series([1, '2', 3.]) - result = o._get_numeric_data() - expected = Series([], dtype=object, index=pd.Index([], dtype=object)) - self._compare(result, expected) - - o = Series([True, False, True]) - result = 
o._get_numeric_data() - self._compare(result, o) - - o = Series([True, False, True]) - result = o._get_bool_data() - self._compare(result, o) - - o = Series(date_range('20130101', periods=3)) - result = o._get_numeric_data() - expected = Series([], dtype='M8[ns]', index=pd.Index([], dtype=object)) - self._compare(result, expected) - - def test_nonzero_single_element(self): - - # allow single item via bool method - s = Series([True]) - self.assertTrue(s.bool()) - - s = Series([False]) - self.assertFalse(s.bool()) - - # single item nan to raise - for s in [Series([np.nan]), Series([pd.NaT]), Series([True]), - Series([False])]: - self.assertRaises(ValueError, lambda: bool(s)) - - for s in [Series([np.nan]), Series([pd.NaT])]: - self.assertRaises(ValueError, lambda: s.bool()) - - # multiple bool are still an error - for s in [Series([True, True]), Series([False, False])]: - self.assertRaises(ValueError, lambda: bool(s)) - self.assertRaises(ValueError, lambda: s.bool()) - - # single non-bool are an error - for s in [Series([1]), Series([0]), Series(['a']), Series([0.0])]: - self.assertRaises(ValueError, lambda: bool(s)) - self.assertRaises(ValueError, lambda: s.bool()) - - def test_metadata_propagation_indiv(self): - # check that the metadata matches up on the resulting ops - - o = Series(range(3), range(3)) - o.name = 'foo' - o2 = Series(range(3), range(3)) - o2.name = 'bar' - - result = o.T - self.check_metadata(o, result) - - # resample - ts = Series(np.random.rand(1000), - index=date_range('20130101', periods=1000, freq='s'), - name='foo') - result = ts.resample('1T').mean() - self.check_metadata(ts, result) - - result = ts.resample('1T').min() - self.check_metadata(ts, result) - - result = ts.resample('1T').apply(lambda x: x.sum()) - self.check_metadata(ts, result) - - _metadata = Series._metadata - _finalize = Series.__finalize__ - Series._metadata = ['name', 'filename'] - o.filename = 'foo' - o2.filename = 'bar' - - def finalize(self, other, method=None, **kwargs): - for name in self._metadata: - if method == 'concat' and name == 'filename': - value = '+'.join([getattr( - o, name) for o in other.objs if getattr(o, name, None) - ]) - object.__setattr__(self, name, value) - else: - object.__setattr__(self, name, getattr(other, name, None)) - - return self - - Series.__finalize__ = finalize - - result = pd.concat([o, o2]) - self.assertEqual(result.filename, 'foo+bar') - self.assertIsNone(result.name) - - # reset - Series._metadata = _metadata - Series.__finalize__ = _finalize - - def test_describe(self): - self.series.describe() - self.ts.describe() - - def test_describe_objects(self): - s = Series(['a', 'b', 'b', np.nan, np.nan, np.nan, 'c', 'd', 'a', 'a']) - result = s.describe() - expected = Series({'count': 7, 'unique': 4, - 'top': 'a', 'freq': 3, 'second': 'b', - 'second_freq': 2}, index=result.index) - assert_series_equal(result, expected) - - dt = list(self.ts.index) - dt.append(dt[0]) - ser = Series(dt) - rs = ser.describe() - min_date = min(dt) - max_date = max(dt) - xp = Series({'count': len(dt), - 'unique': len(self.ts.index), - 'first': min_date, 'last': max_date, 'freq': 2, - 'top': min_date}, index=rs.index) - assert_series_equal(rs, xp) - - def test_describe_empty(self): - result = pd.Series().describe() - - self.assertEqual(result['count'], 0) - self.assertTrue(result.drop('count').isnull().all()) - - nanSeries = Series([np.nan]) - nanSeries.name = 'NaN' - result = nanSeries.describe() - self.assertEqual(result['count'], 0) - 
self.assertTrue(result.drop('count').isnull().all()) - - def test_describe_none(self): - noneSeries = Series([None]) - noneSeries.name = 'None' - expected = Series([0, 0], index=['count', 'unique'], name='None') - assert_series_equal(noneSeries.describe(), expected) - - def test_to_xarray(self): - - tm._skip_if_no_xarray() - from xarray import DataArray - - s = Series([]) - s.index.name = 'foo' - result = s.to_xarray() - self.assertEqual(len(result), 0) - self.assertEqual(len(result.coords), 1) - assert_almost_equal(list(result.coords.keys()), ['foo']) - self.assertIsInstance(result, DataArray) - - def testit(index, check_index_type=True, check_categorical=True): - s = Series(range(6), index=index(6)) - s.index.name = 'foo' - result = s.to_xarray() - repr(result) - self.assertEqual(len(result), 6) - self.assertEqual(len(result.coords), 1) - assert_almost_equal(list(result.coords.keys()), ['foo']) - self.assertIsInstance(result, DataArray) - - # idempotency - assert_series_equal(result.to_series(), s, - check_index_type=check_index_type, - check_categorical=check_categorical) - - for index in [tm.makeFloatIndex, tm.makeIntIndex, - tm.makeStringIndex, tm.makeUnicodeIndex, - tm.makeDateIndex, tm.makePeriodIndex, - tm.makeTimedeltaIndex]: - testit(index) - - # not idempotent - testit(tm.makeCategoricalIndex, check_index_type=False, - check_categorical=False) - - s = Series(range(6)) - s.index.name = 'foo' - s.index = pd.MultiIndex.from_product([['a', 'b'], range(3)], - names=['one', 'two']) - result = s.to_xarray() - self.assertEqual(len(result), 2) - assert_almost_equal(list(result.coords.keys()), ['one', 'two']) - self.assertIsInstance(result, DataArray) - assert_series_equal(result.to_series(), s) - - -class TestDataFrame(tm.TestCase, Generic): - _typ = DataFrame - _comparator = lambda self, x, y: assert_frame_equal(x, y) - - def test_rename_mi(self): - df = DataFrame([ - 11, 21, 31 - ], index=MultiIndex.from_tuples([("A", x) for x in ["a", "B", "c"]])) - df.rename(str.lower) - - def test_set_axis_name(self): - df = pd.DataFrame([[1, 2], [3, 4]]) - funcs = ['_set_axis_name', 'rename_axis'] - for func in funcs: - result = methodcaller(func, 'foo')(df) - self.assertTrue(df.index.name is None) - self.assertEqual(result.index.name, 'foo') - - result = methodcaller(func, 'cols', axis=1)(df) - self.assertTrue(df.columns.name is None) - self.assertEqual(result.columns.name, 'cols') - - def test_set_axis_name_mi(self): - df = DataFrame( - np.empty((3, 3)), - index=MultiIndex.from_tuples([("A", x) for x in list('aBc')]), - columns=MultiIndex.from_tuples([('C', x) for x in list('xyz')]) - ) - - level_names = ['L1', 'L2'] - funcs = ['_set_axis_name', 'rename_axis'] - for func in funcs: - result = methodcaller(func, level_names)(df) - self.assertEqual(result.index.names, level_names) - self.assertEqual(result.columns.names, [None, None]) - - result = methodcaller(func, level_names, axis=1)(df) - self.assertEqual(result.columns.names, ["L1", "L2"]) - self.assertEqual(result.index.names, [None, None]) - - def test_nonzero_single_element(self): - - # allow single item via bool method - df = DataFrame([[True]]) - self.assertTrue(df.bool()) - - df = DataFrame([[False]]) - self.assertFalse(df.bool()) - - df = DataFrame([[False, False]]) - self.assertRaises(ValueError, lambda: df.bool()) - self.assertRaises(ValueError, lambda: bool(df)) - - def test_get_numeric_data_preserve_dtype(self): - - # get the numeric data - o = DataFrame({'A': [1, '2', 3.]}) - result = o._get_numeric_data() - expected = 
DataFrame(index=[0, 1, 2], dtype=object) - self._compare(result, expected) - - def test_describe(self): - tm.makeDataFrame().describe() - tm.makeMixedDataFrame().describe() - tm.makeTimeDataFrame().describe() - - def test_describe_percentiles_percent_or_raw(self): - msg = 'percentiles should all be in the interval \\[0, 1\\]' - - df = tm.makeDataFrame() - with tm.assertRaisesRegexp(ValueError, msg): - df.describe(percentiles=[10, 50, 100]) - - with tm.assertRaisesRegexp(ValueError, msg): - df.describe(percentiles=[2]) - - with tm.assertRaisesRegexp(ValueError, msg): - df.describe(percentiles=[-2]) - - def test_describe_percentiles_equivalence(self): - df = tm.makeDataFrame() - d1 = df.describe() - d2 = df.describe(percentiles=[.25, .75]) - assert_frame_equal(d1, d2) - - def test_describe_percentiles_insert_median(self): - df = tm.makeDataFrame() - d1 = df.describe(percentiles=[.25, .75]) - d2 = df.describe(percentiles=[.25, .5, .75]) - assert_frame_equal(d1, d2) - self.assertTrue('25%' in d1.index) - self.assertTrue('75%' in d2.index) - - # none above - d1 = df.describe(percentiles=[.25, .45]) - d2 = df.describe(percentiles=[.25, .45, .5]) - assert_frame_equal(d1, d2) - self.assertTrue('25%' in d1.index) - self.assertTrue('45%' in d2.index) - - # none below - d1 = df.describe(percentiles=[.75, 1]) - d2 = df.describe(percentiles=[.5, .75, 1]) - assert_frame_equal(d1, d2) - self.assertTrue('75%' in d1.index) - self.assertTrue('100%' in d2.index) - - # edge - d1 = df.describe(percentiles=[0, 1]) - d2 = df.describe(percentiles=[0, .5, 1]) - assert_frame_equal(d1, d2) - self.assertTrue('0%' in d1.index) - self.assertTrue('100%' in d2.index) - - def test_describe_percentiles_insert_median_ndarray(self): - # GH14908 - df = tm.makeDataFrame() - result = df.describe(percentiles=np.array([.25, .75])) - expected = df.describe(percentiles=[.25, .75]) - assert_frame_equal(result, expected) - - def test_describe_percentiles_unique(self): - # GH13104 - df = tm.makeDataFrame() - with self.assertRaises(ValueError): - df.describe(percentiles=[0.1, 0.2, 0.4, 0.5, 0.2, 0.6]) - with self.assertRaises(ValueError): - df.describe(percentiles=[0.1, 0.2, 0.4, 0.2, 0.6]) - - def test_describe_percentiles_formatting(self): - # GH13104 - df = tm.makeDataFrame() - - # default - result = df.describe().index - expected = Index(['count', 'mean', 'std', 'min', '25%', '50%', '75%', - 'max'], - dtype='object') - tm.assert_index_equal(result, expected) - - result = df.describe(percentiles=[0.0001, 0.0005, 0.001, 0.999, - 0.9995, 0.9999]).index - expected = Index(['count', 'mean', 'std', 'min', '0.01%', '0.05%', - '0.1%', '50%', '99.9%', '99.95%', '99.99%', 'max'], - dtype='object') - tm.assert_index_equal(result, expected) - - result = df.describe(percentiles=[0.00499, 0.005, 0.25, 0.50, - 0.75]).index - expected = Index(['count', 'mean', 'std', 'min', '0.499%', '0.5%', - '25%', '50%', '75%', 'max'], - dtype='object') - tm.assert_index_equal(result, expected) - - result = df.describe(percentiles=[0.00499, 0.01001, 0.25, 0.50, - 0.75]).index - expected = Index(['count', 'mean', 'std', 'min', '0.5%', '1.0%', - '25%', '50%', '75%', 'max'], - dtype='object') - tm.assert_index_equal(result, expected) - - def test_describe_column_index_type(self): - # GH13288 - df = pd.DataFrame([1, 2, 3, 4]) - df.columns = pd.Index([0], dtype=object) - result = df.describe().columns - expected = Index([0], dtype=object) - tm.assert_index_equal(result, expected) - - df = pd.DataFrame({'A': list("BCDE"), 0: [1, 2, 3, 4]}) - result = 
df.describe().columns - expected = Index([0], dtype=object) - tm.assert_index_equal(result, expected) - - def test_describe_no_numeric(self): - df = DataFrame({'A': ['foo', 'foo', 'bar'] * 8, - 'B': ['a', 'b', 'c', 'd'] * 6}) - desc = df.describe() - expected = DataFrame(dict((k, v.describe()) - for k, v in compat.iteritems(df)), - columns=df.columns) - assert_frame_equal(desc, expected) - - ts = tm.makeTimeSeries() - df = DataFrame({'time': ts.index}) - desc = df.describe() - self.assertEqual(desc.time['first'], min(ts.index)) - - def test_describe_empty(self): - df = DataFrame() - tm.assertRaisesRegexp(ValueError, 'DataFrame without columns', - df.describe) - - df = DataFrame(columns=['A', 'B']) - result = df.describe() - expected = DataFrame(0, columns=['A', 'B'], index=['count', 'unique']) - tm.assert_frame_equal(result, expected) - - def test_describe_empty_int_columns(self): - df = DataFrame([[0, 1], [1, 2]]) - desc = df[df[0] < 0].describe() # works - assert_series_equal(desc.xs('count'), - Series([0, 0], dtype=float, name='count')) - self.assertTrue(isnull(desc.iloc[1:]).all().all()) - - def test_describe_objects(self): - df = DataFrame({"C1": ['a', 'a', 'c'], "C2": ['d', 'd', 'f']}) - result = df.describe() - expected = DataFrame({"C1": [3, 2, 'a', 2], "C2": [3, 2, 'd', 2]}, - index=['count', 'unique', 'top', 'freq']) - assert_frame_equal(result, expected) - - df = DataFrame({"C1": pd.date_range('2010-01-01', periods=4, freq='D') - }) - df.loc[4] = pd.Timestamp('2010-01-04') - result = df.describe() - expected = DataFrame({"C1": [5, 4, pd.Timestamp('2010-01-04'), 2, - pd.Timestamp('2010-01-01'), - pd.Timestamp('2010-01-04')]}, - index=['count', 'unique', 'top', 'freq', - 'first', 'last']) - assert_frame_equal(result, expected) - - # mix time and str - df['C2'] = ['a', 'a', 'b', 'c', 'a'] - result = df.describe() - expected['C2'] = [5, 3, 'a', 3, np.nan, np.nan] - assert_frame_equal(result, expected) - - # just str - expected = DataFrame({'C2': [5, 3, 'a', 4]}, - index=['count', 'unique', 'top', 'freq']) - result = df[['C2']].describe() - - # mix of time, str, numeric - df['C3'] = [2, 4, 6, 8, 2] - result = df.describe() - expected = DataFrame({"C3": [5., 4.4, 2.607681, 2., 2., 4., 6., 8.]}, - index=['count', 'mean', 'std', 'min', '25%', - '50%', '75%', 'max']) - assert_frame_equal(result, expected) - assert_frame_equal(df.describe(), df[['C3']].describe()) - - assert_frame_equal(df[['C1', 'C3']].describe(), df[['C3']].describe()) - assert_frame_equal(df[['C2', 'C3']].describe(), df[['C3']].describe()) - - def test_describe_typefiltering(self): - df = DataFrame({'catA': ['foo', 'foo', 'bar'] * 8, - 'catB': ['a', 'b', 'c', 'd'] * 6, - 'numC': np.arange(24, dtype='int64'), - 'numD': np.arange(24.) 
+ .5, - 'ts': tm.makeTimeSeries()[:24].index}) - - descN = df.describe() - expected_cols = ['numC', 'numD', ] - expected = DataFrame(dict((k, df[k].describe()) - for k in expected_cols), - columns=expected_cols) - assert_frame_equal(descN, expected) - - desc = df.describe(include=['number']) - assert_frame_equal(desc, descN) - desc = df.describe(exclude=['object', 'datetime']) - assert_frame_equal(desc, descN) - desc = df.describe(include=['float']) - assert_frame_equal(desc, descN.drop('numC', 1)) - - descC = df.describe(include=['O']) - expected_cols = ['catA', 'catB'] - expected = DataFrame(dict((k, df[k].describe()) - for k in expected_cols), - columns=expected_cols) - assert_frame_equal(descC, expected) - - descD = df.describe(include=['datetime']) - assert_series_equal(descD.ts, df.ts.describe()) - - desc = df.describe(include=['object', 'number', 'datetime']) - assert_frame_equal(desc.loc[:, ["numC", "numD"]].dropna(), descN) - assert_frame_equal(desc.loc[:, ["catA", "catB"]].dropna(), descC) - descDs = descD.sort_index() # the index order change for mixed-types - assert_frame_equal(desc.loc[:, "ts":].dropna().sort_index(), descDs) - - desc = df.loc[:, 'catA':'catB'].describe(include='all') - assert_frame_equal(desc, descC) - desc = df.loc[:, 'numC':'numD'].describe(include='all') - assert_frame_equal(desc, descN) - - desc = df.describe(percentiles=[], include='all') - cnt = Series(data=[4, 4, 6, 6, 6], - index=['catA', 'catB', 'numC', 'numD', 'ts']) - assert_series_equal(desc.count(), cnt) - self.assertTrue('count' in desc.index) - self.assertTrue('unique' in desc.index) - self.assertTrue('50%' in desc.index) - self.assertTrue('first' in desc.index) - - desc = df.drop("ts", 1).describe(percentiles=[], include='all') - assert_series_equal(desc.count(), cnt.drop("ts")) - self.assertTrue('first' not in desc.index) - desc = df.drop(["numC", "numD"], 1).describe(percentiles=[], - include='all') - assert_series_equal(desc.count(), cnt.drop(["numC", "numD"])) - self.assertTrue('50%' not in desc.index) - - def test_describe_typefiltering_category_bool(self): - df = DataFrame({'A_cat': pd.Categorical(['foo', 'foo', 'bar'] * 8), - 'B_str': ['a', 'b', 'c', 'd'] * 6, - 'C_bool': [True] * 12 + [False] * 12, - 'D_num': np.arange(24.) + .5, - 'E_ts': tm.makeTimeSeries()[:24].index}) - - desc = df.describe() - expected_cols = ['D_num'] - expected = DataFrame(dict((k, df[k].describe()) - for k in expected_cols), - columns=expected_cols) - assert_frame_equal(desc, expected) - - desc = df.describe(include=["category"]) - self.assertTrue(desc.columns.tolist() == ["A_cat"]) - - # 'all' includes numpy-dtypes + category - desc1 = df.describe(include="all") - desc2 = df.describe(include=[np.generic, "category"]) - assert_frame_equal(desc1, desc2) - - def test_describe_timedelta(self): - df = DataFrame({"td": pd.to_timedelta(np.arange(24) % 20, "D")}) - self.assertTrue(df.describe().loc["mean"][0] == pd.to_timedelta( - "8d4h")) - - def test_describe_typefiltering_dupcol(self): - df = DataFrame({'catA': ['foo', 'foo', 'bar'] * 8, - 'catB': ['a', 'b', 'c', 'd'] * 6, - 'numC': np.arange(24), - 'numD': np.arange(24.) + .5, - 'ts': tm.makeTimeSeries()[:24].index}) - s = df.describe(include='all').shape[1] - df = pd.concat([df, df], axis=1) - s2 = df.describe(include='all').shape[1] - self.assertTrue(s2 == 2 * s) - - def test_describe_typefiltering_groupby(self): - df = DataFrame({'catA': ['foo', 'foo', 'bar'] * 8, - 'catB': ['a', 'b', 'c', 'd'] * 6, - 'numC': np.arange(24), - 'numD': np.arange(24.) 
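A compact illustration of the `include`/`exclude` filtering that `test_describe_typefiltering` checks above (the column names are made up for the example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'cat': ['a', 'b'] * 3,
                   'num': np.arange(6),
                   'ts': pd.date_range('2014-01-01', periods=6)})

df.describe()                    # numeric columns only (the default)
df.describe(include=['object'])  # object columns: count/unique/top/freq
df.describe(include='all')       # every column; inapplicable rows become NaN
```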
+ .5, - 'ts': tm.makeTimeSeries()[:24].index}) - G = df.groupby('catA') - self.assertTrue(G.describe(include=['number']).shape == (16, 2)) - self.assertTrue(G.describe(include=['number', 'object']).shape == (22, - 3)) - self.assertTrue(G.describe(include='all').shape == (26, 4)) - - def test_describe_multi_index_df_column_names(self): - """ Test that column names persist after the describe operation.""" - - df = pd.DataFrame( - {'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], - 'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], - 'C': np.random.randn(8), - 'D': np.random.randn(8)}) - - # GH 11517 - # test for hierarchical index - hierarchical_index_df = df.groupby(['A', 'B']).mean().T - self.assertTrue(hierarchical_index_df.columns.names == ['A', 'B']) - self.assertTrue(hierarchical_index_df.describe().columns.names == - ['A', 'B']) - - # test for non-hierarchical index - non_hierarchical_index_df = df.groupby(['A']).mean().T - self.assertTrue(non_hierarchical_index_df.columns.names == ['A']) - self.assertTrue(non_hierarchical_index_df.describe().columns.names == - ['A']) - - def test_metadata_propagation_indiv(self): - - # groupby - df = DataFrame( - {'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], - 'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], - 'C': np.random.randn(8), - 'D': np.random.randn(8)}) - result = df.groupby('A').sum() - self.check_metadata(df, result) - - # resample - df = DataFrame(np.random.randn(1000, 2), - index=date_range('20130101', periods=1000, freq='s')) - result = df.resample('1T') - self.check_metadata(df, result) - - # merging with override - # GH 6923 - _metadata = DataFrame._metadata - _finalize = DataFrame.__finalize__ - - np.random.seed(10) - df1 = DataFrame(np.random.randint(0, 4, (3, 2)), columns=['a', 'b']) - df2 = DataFrame(np.random.randint(0, 4, (3, 2)), columns=['c', 'd']) - DataFrame._metadata = ['filename'] - df1.filename = 'fname1.csv' - df2.filename = 'fname2.csv' - - def finalize(self, other, method=None, **kwargs): - - for name in self._metadata: - if method == 'merge': - left, right = other.left, other.right - value = getattr(left, name, '') + '|' + getattr(right, - name, '') - object.__setattr__(self, name, value) - else: - object.__setattr__(self, name, getattr(other, name, '')) - - return self - - DataFrame.__finalize__ = finalize - result = df1.merge(df2, left_on=['a'], right_on=['c'], how='inner') - self.assertEqual(result.filename, 'fname1.csv|fname2.csv') - - # concat - # GH 6927 - DataFrame._metadata = ['filename'] - df1 = DataFrame(np.random.randint(0, 4, (3, 2)), columns=list('ab')) - df1.filename = 'foo' - - def finalize(self, other, method=None, **kwargs): - for name in self._metadata: - if method == 'concat': - value = '+'.join([getattr( - o, name) for o in other.objs if getattr(o, name, None) - ]) - object.__setattr__(self, name, value) - else: - object.__setattr__(self, name, getattr(other, name, None)) - - return self - - DataFrame.__finalize__ = finalize - - result = pd.concat([df1, df1]) - self.assertEqual(result.filename, 'foo+foo') - - # reset - DataFrame._metadata = _metadata - DataFrame.__finalize__ = _finalize - - def test_tz_convert_and_localize(self): - l0 = date_range('20140701', periods=5, freq='D') - - # TODO: l1 should be a PeriodIndex for testing - # after GH2106 is addressed - with tm.assertRaises(NotImplementedError): - period_range('20140701', periods=1).tz_convert('UTC') - with tm.assertRaises(NotImplementedError): - period_range('20140701', 
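`test_metadata_propagation_indiv` above temporarily swaps in a custom `DataFrame.__finalize__` to control how `_metadata` attributes flow through `merge` and `concat`. A sketch of the underlying pattern, with the caveat that which operations call `__finalize__` varies by pandas version (`TaggedFrame` and `filename` are illustrative names):

```python
import pandas as pd

class TaggedFrame(pd.DataFrame):
    _metadata = ['filename']      # extra attributes pandas should carry along

    @property
    def _constructor(self):
        return TaggedFrame

df = TaggedFrame({'a': [1, 2, 3]})
df.filename = 'fname1.csv'

# copy() ends with .__finalize__(self), so the tag survives the operation:
assert df.copy().filename == 'fname1.csv'
```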
periods=1).tz_localize('UTC') - # l1 = period_range('20140701', periods=5, freq='D') - l1 = date_range('20140701', periods=5, freq='D') - - int_idx = Index(range(5)) - - for fn in ['tz_localize', 'tz_convert']: - - if fn == 'tz_convert': - l0 = l0.tz_localize('UTC') - l1 = l1.tz_localize('UTC') - - for idx in [l0, l1]: - - l0_expected = getattr(idx, fn)('US/Pacific') - l1_expected = getattr(idx, fn)('US/Pacific') - - df1 = DataFrame(np.ones(5), index=l0) - df1 = getattr(df1, fn)('US/Pacific') - self.assert_index_equal(df1.index, l0_expected) - - # MultiIndex - # GH7846 - df2 = DataFrame(np.ones(5), MultiIndex.from_arrays([l0, l1])) - - df3 = getattr(df2, fn)('US/Pacific', level=0) - self.assertFalse(df3.index.levels[0].equals(l0)) - self.assert_index_equal(df3.index.levels[0], l0_expected) - self.assert_index_equal(df3.index.levels[1], l1) - self.assertFalse(df3.index.levels[1].equals(l1_expected)) - - df3 = getattr(df2, fn)('US/Pacific', level=1) - self.assert_index_equal(df3.index.levels[0], l0) - self.assertFalse(df3.index.levels[0].equals(l0_expected)) - self.assert_index_equal(df3.index.levels[1], l1_expected) - self.assertFalse(df3.index.levels[1].equals(l1)) - - df4 = DataFrame(np.ones(5), - MultiIndex.from_arrays([int_idx, l0])) - - # TODO: untested - df5 = getattr(df4, fn)('US/Pacific', level=1) # noqa - - self.assert_index_equal(df3.index.levels[0], l0) - self.assertFalse(df3.index.levels[0].equals(l0_expected)) - self.assert_index_equal(df3.index.levels[1], l1_expected) - self.assertFalse(df3.index.levels[1].equals(l1)) - - # Bad Inputs - for fn in ['tz_localize', 'tz_convert']: - # Not DatetimeIndex / PeriodIndex - with tm.assertRaisesRegexp(TypeError, 'DatetimeIndex'): - df = DataFrame(index=int_idx) - df = getattr(df, fn)('US/Pacific') - - # Not DatetimeIndex / PeriodIndex - with tm.assertRaisesRegexp(TypeError, 'DatetimeIndex'): - df = DataFrame(np.ones(5), - MultiIndex.from_arrays([int_idx, l0])) - df = getattr(df, fn)('US/Pacific', level=0) - - # Invalid level - with tm.assertRaisesRegexp(ValueError, 'not valid'): - df = DataFrame(index=l0) - df = getattr(df, fn)('US/Pacific', level=1) - - def test_set_attribute(self): - # Test for consistent setattr behavior when an attribute and a column - # have the same name (Issue #8994) - df = DataFrame({'x': [1, 2, 3]}) - - df.y = 2 - df['y'] = [2, 4, 6] - df.y = 5 - - self.assertEqual(df.y, 5) - assert_series_equal(df['y'], Series([2, 4, 6], name='y')) - - def test_pct_change(self): - # GH 11150 - pnl = DataFrame([np.arange(0, 40, 10), np.arange(0, 40, 10), np.arange( - 0, 40, 10)]).astype(np.float64) - pnl.iat[1, 0] = np.nan - pnl.iat[1, 1] = np.nan - pnl.iat[2, 3] = 60 - - mask = pnl.isnull() - - for axis in range(2): - expected = pnl.ffill(axis=axis) / pnl.ffill(axis=axis).shift( - axis=axis) - 1 - expected[mask] = np.nan - result = pnl.pct_change(axis=axis, fill_method='pad') - - self.assert_frame_equal(result, expected) - - def test_to_xarray(self): - - tm._skip_if_no_xarray() - from xarray import Dataset - - df = DataFrame({'a': list('abc'), - 'b': list(range(1, 4)), - 'c': np.arange(3, 6).astype('u1'), - 'd': np.arange(4.0, 7.0, dtype='float64'), - 'e': [True, False, True], - 'f': pd.Categorical(list('abc')), - 'g': pd.date_range('20130101', periods=3), - 'h': pd.date_range('20130101', - periods=3, - tz='US/Eastern')} - ) - - df.index.name = 'foo' - result = df[0:0].to_xarray() - self.assertEqual(result.dims['foo'], 0) - self.assertIsInstance(result, Dataset) - - for index in [tm.makeFloatIndex, tm.makeIntIndex, - 
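`test_tz_convert_and_localize` above (gh-7846) verifies that `tz_localize` and `tz_convert` can target a single level of a `MultiIndex`. Roughly:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2014-07-01', periods=5, freq='D')
df = pd.DataFrame(np.ones(5),
                  index=pd.MultiIndex.from_arrays([idx, idx], names=['d0', 'd1']))

localized = df.tz_localize('US/Pacific', level=0)
localized.index.levels[0].tz    # US/Pacific; level 1 is left naive
```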
tm.makeStringIndex, tm.makeUnicodeIndex, - tm.makeDateIndex, tm.makePeriodIndex, - tm.makeCategoricalIndex, tm.makeTimedeltaIndex]: - df.index = index(3) - df.index.name = 'foo' - df.columns.name = 'bar' - result = df.to_xarray() - self.assertEqual(result.dims['foo'], 3) - self.assertEqual(len(result.coords), 1) - self.assertEqual(len(result.data_vars), 8) - assert_almost_equal(list(result.coords.keys()), ['foo']) - self.assertIsInstance(result, Dataset) - - # idempotency - # categoricals are not preserved - # datetimes w/tz are not preserved - # column names are lost - expected = df.copy() - expected['f'] = expected['f'].astype(object) - expected['h'] = expected['h'].astype('datetime64[ns]') - expected.columns.name = None - assert_frame_equal(result.to_dataframe(), expected, - check_index_type=False, check_categorical=False) - - # available in 0.7.1 - # MultiIndex - df.index = pd.MultiIndex.from_product([['a'], range(3)], - names=['one', 'two']) - result = df.to_xarray() - self.assertEqual(result.dims['one'], 1) - self.assertEqual(result.dims['two'], 3) - self.assertEqual(len(result.coords), 2) - self.assertEqual(len(result.data_vars), 8) - assert_almost_equal(list(result.coords.keys()), ['one', 'two']) - self.assertIsInstance(result, Dataset) - - result = result.to_dataframe() - expected = df.copy() - expected['f'] = expected['f'].astype(object) - expected['h'] = expected['h'].astype('datetime64[ns]') - expected.columns.name = None - assert_frame_equal(result, - expected, - check_index_type=False) - - -class TestPanel(tm.TestCase, Generic): - _typ = Panel - _comparator = lambda self, x, y: assert_panel_equal(x, y, by_blocks=True) - - def test_to_xarray(self): - - tm._skip_if_no_xarray() - from xarray import DataArray - - p = tm.makePanel() - - result = p.to_xarray() - self.assertIsInstance(result, DataArray) - self.assertEqual(len(result.coords), 3) - assert_almost_equal(list(result.coords.keys()), - ['items', 'major_axis', 'minor_axis']) - self.assertEqual(len(result.dims), 3) - - # idempotency - assert_panel_equal(result.to_pandas(), p) - - -class TestPanel4D(tm.TestCase, Generic): - _typ = Panel4D - _comparator = lambda self, x, y: assert_panel4d_equal(x, y, by_blocks=True) - - def test_sample(self): - raise nose.SkipTest("sample on Panel4D") - - def test_to_xarray(self): - - tm._skip_if_no_xarray() - from xarray import DataArray - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - p = tm.makePanel4D() - - result = p.to_xarray() - self.assertIsInstance(result, DataArray) - self.assertEqual(len(result.coords), 4) - assert_almost_equal(list(result.coords.keys()), - ['labels', 'items', 'major_axis', - 'minor_axis']) - self.assertEqual(len(result.dims), 4) - - # non-convertible - self.assertRaises(ValueError, lambda: result.to_pandas()) - -# run all the tests, but wrap each in a warning catcher -for t in ['test_rename', 'test_rename_axis', 'test_get_numeric_data', - 'test_get_default', 'test_nonzero', - 'test_numpy_1_7_compat_numeric_methods', - 'test_downcast', 'test_constructor_compound_dtypes', - 'test_head_tail', - 'test_size_compat', 'test_split_compat', - 'test_unexpected_keyword', - 'test_stat_unexpected_keyword', 'test_api_compat', - 'test_stat_non_defaults_args', - 'test_clip', 'test_truncate_out_of_bounds', 'test_numpy_clip', - 'test_metadata_propagation']: - - def f(): - def tester(self): - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - return getattr(super(TestPanel4D, self), t)() - return tester - - setattr(TestPanel4D, t, 
f()) - - -class TestNDFrame(tm.TestCase): - # tests that don't fit elsewhere - - def test_sample(sel): - # Fixes issue: 2419 - # additional specific object based tests - - # A few dataframe test with degenerate weights. - easy_weight_list = [0] * 10 - easy_weight_list[5] = 1 - - df = pd.DataFrame({'col1': range(10, 20), - 'col2': range(20, 30), - 'colString': ['a'] * 10, - 'easyweights': easy_weight_list}) - sample1 = df.sample(n=1, weights='easyweights') - assert_frame_equal(sample1, df.iloc[5:6]) - - # Ensure proper error if string given as weight for Series, panel, or - # DataFrame with axis = 1. - s = Series(range(10)) - with tm.assertRaises(ValueError): - s.sample(n=3, weights='weight_column') - - panel = pd.Panel(items=[0, 1, 2], major_axis=[2, 3, 4], - minor_axis=[3, 4, 5]) - with tm.assertRaises(ValueError): - panel.sample(n=1, weights='weight_column') - - with tm.assertRaises(ValueError): - df.sample(n=1, weights='weight_column', axis=1) - - # Check weighting key error - with tm.assertRaises(KeyError): - df.sample(n=3, weights='not_a_real_column_name') - - # Check that re-normalizes weights that don't sum to one. - weights_less_than_1 = [0] * 10 - weights_less_than_1[0] = 0.5 - tm.assert_frame_equal( - df.sample(n=1, weights=weights_less_than_1), df.iloc[:1]) - - ### - # Test axis argument - ### - - # Test axis argument - df = pd.DataFrame({'col1': range(10), 'col2': ['a'] * 10}) - second_column_weight = [0, 1] - assert_frame_equal( - df.sample(n=1, axis=1, weights=second_column_weight), df[['col2']]) - - # Different axis arg types - assert_frame_equal(df.sample(n=1, axis='columns', - weights=second_column_weight), - df[['col2']]) - - weight = [0] * 10 - weight[5] = 0.5 - assert_frame_equal(df.sample(n=1, axis='rows', weights=weight), - df.iloc[5:6]) - assert_frame_equal(df.sample(n=1, axis='index', weights=weight), - df.iloc[5:6]) - - # Check out of range axis values - with tm.assertRaises(ValueError): - df.sample(n=1, axis=2) - - with tm.assertRaises(ValueError): - df.sample(n=1, axis='not_a_name') - - with tm.assertRaises(ValueError): - s = pd.Series(range(10)) - s.sample(n=1, axis=1) - - # Test weight length compared to correct axis - with tm.assertRaises(ValueError): - df.sample(n=1, axis=1, weights=[0.5] * 10) - - # Check weights with axis = 1 - easy_weight_list = [0] * 3 - easy_weight_list[2] = 1 - - df = pd.DataFrame({'col1': range(10, 20), - 'col2': range(20, 30), - 'colString': ['a'] * 10}) - sample1 = df.sample(n=1, axis=1, weights=easy_weight_list) - assert_frame_equal(sample1, df[['colString']]) - - # Test default axes - p = pd.Panel(items=['a', 'b', 'c'], major_axis=[2, 4, 6], - minor_axis=[1, 3, 5]) - assert_panel_equal( - p.sample(n=3, random_state=42), p.sample(n=3, axis=1, - random_state=42)) - assert_frame_equal( - df.sample(n=3, random_state=42), df.sample(n=3, axis=0, - random_state=42)) - - # Test that function aligns weights with frame - df = DataFrame( - {'col1': [5, 6, 7], - 'col2': ['a', 'b', 'c'], }, index=[9, 5, 3]) - s = Series([1, 0, 0], index=[3, 5, 9]) - assert_frame_equal(df.loc[[3]], df.sample(1, weights=s)) - - # Weights have index values to be dropped because not in - # sampled DataFrame - s2 = Series([0.001, 0, 10000], index=[3, 5, 10]) - assert_frame_equal(df.loc[[3]], df.sample(1, weights=s2)) - - # Weights have empty values to be filed with zeros - s3 = Series([0.01, 0], index=[3, 5]) - assert_frame_equal(df.loc[[3]], df.sample(1, weights=s3)) - - # No overlap in weight and sampled DataFrame indices - s4 = Series([1, 0], index=[1, 2]) - with 
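This `test_sample` (continuing below) pins down the `weights` semantics: a column name is accepted when sampling DataFrame rows, a weights `Series` is aligned on the sampled axis, and weights that do not sum to one are re-normalized. For instance:

```python
import pandas as pd

df = pd.DataFrame({'col1': range(10, 20), 'col2': range(20, 30)})

w = [0] * 10
w[5] = 0.5                 # re-normalized to 1, so row 5 is certain
df.sample(n=1, weights=w)  # always df.iloc[5:6]

# A weights Series is aligned on the index first; missing labels count as 0.
s = pd.Series([1, 0, 0], index=[5, 6, 7])
df.sample(n=1, weights=s)  # row 5 again
```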
tm.assertRaises(ValueError): - df.sample(1, weights=s4) - - def test_squeeze(self): - # noop - for s in [tm.makeFloatSeries(), tm.makeStringSeries(), - tm.makeObjectSeries()]: - tm.assert_series_equal(s.squeeze(), s) - for df in [tm.makeTimeDataFrame()]: - tm.assert_frame_equal(df.squeeze(), df) - for p in [tm.makePanel()]: - tm.assert_panel_equal(p.squeeze(), p) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - for p4d in [tm.makePanel4D()]: - tm.assert_panel4d_equal(p4d.squeeze(), p4d) - - # squeezing - df = tm.makeTimeDataFrame().reindex(columns=['A']) - tm.assert_series_equal(df.squeeze(), df['A']) - - p = tm.makePanel().reindex(items=['ItemA']) - tm.assert_frame_equal(p.squeeze(), p['ItemA']) - - p = tm.makePanel().reindex(items=['ItemA'], minor_axis=['A']) - tm.assert_series_equal(p.squeeze(), p.loc['ItemA', :, 'A']) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - p4d = tm.makePanel4D().reindex(labels=['label1']) - tm.assert_panel_equal(p4d.squeeze(), p4d['label1']) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - p4d = tm.makePanel4D().reindex(labels=['label1'], items=['ItemA']) - tm.assert_frame_equal(p4d.squeeze(), p4d.loc['label1', 'ItemA']) - - # don't fail with 0 length dimensions GH11229 & GH8999 - empty_series = pd.Series([], name='five') - empty_frame = pd.DataFrame([empty_series]) - empty_panel = pd.Panel({'six': empty_frame}) - - [tm.assert_series_equal(empty_series, higher_dim.squeeze()) - for higher_dim in [empty_series, empty_frame, empty_panel]] - - def test_numpy_squeeze(self): - s = tm.makeFloatSeries() - tm.assert_series_equal(np.squeeze(s), s) - - df = tm.makeTimeDataFrame().reindex(columns=['A']) - tm.assert_series_equal(np.squeeze(df), df['A']) - - msg = "the 'axis' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, - np.squeeze, s, axis=0) - - def test_transpose(self): - msg = (r"transpose\(\) got multiple values for " - r"keyword argument 'axes'") - for s in [tm.makeFloatSeries(), tm.makeStringSeries(), - tm.makeObjectSeries()]: - # calls implementation in pandas/core/base.py - tm.assert_series_equal(s.transpose(), s) - for df in [tm.makeTimeDataFrame()]: - tm.assert_frame_equal(df.transpose().transpose(), df) - for p in [tm.makePanel()]: - tm.assert_panel_equal(p.transpose(2, 0, 1) - .transpose(1, 2, 0), p) - tm.assertRaisesRegexp(TypeError, msg, p.transpose, - 2, 0, 1, axes=(2, 0, 1)) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - for p4d in [tm.makePanel4D()]: - tm.assert_panel4d_equal(p4d.transpose(2, 0, 3, 1) - .transpose(1, 3, 0, 2), p4d) - tm.assertRaisesRegexp(TypeError, msg, p4d.transpose, - 2, 0, 3, 1, axes=(2, 0, 3, 1)) - - def test_numpy_transpose(self): - msg = "the 'axes' parameter is not supported" - - s = tm.makeFloatSeries() - tm.assert_series_equal( - np.transpose(s), s) - tm.assertRaisesRegexp(ValueError, msg, - np.transpose, s, axes=1) - - df = tm.makeTimeDataFrame() - tm.assert_frame_equal(np.transpose( - np.transpose(df)), df) - tm.assertRaisesRegexp(ValueError, msg, - np.transpose, df, axes=1) - - p = tm.makePanel() - tm.assert_panel_equal(np.transpose( - np.transpose(p, axes=(2, 0, 1)), - axes=(1, 2, 0)), p) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - p4d = tm.makePanel4D() - tm.assert_panel4d_equal(np.transpose( - np.transpose(p4d, axes=(2, 0, 3, 1)), - axes=(1, 3, 0, 2)), p4d) - - def test_take(self): - indices = [1, 5, -2, 6, 3, -1] - for s in [tm.makeFloatSeries(), 
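The `squeeze` tests in this block assert two things: `squeeze()` is a no-op when no axis has length one, and otherwise it drops the length-1 axes, with `np.squeeze` delegating to the method. Briefly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0]})

df.squeeze()                 # single column -> the Series df['A']
np.squeeze(df)               # same result via NumPy's delegation
pd.Series([1.0]).squeeze()   # all axes length 1 -> the scalar 1.0
```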
tm.makeStringSeries(), - tm.makeObjectSeries()]: - out = s.take(indices) - expected = Series(data=s.values.take(indices), - index=s.index.take(indices)) - tm.assert_series_equal(out, expected) - for df in [tm.makeTimeDataFrame()]: - out = df.take(indices) - expected = DataFrame(data=df.values.take(indices, axis=0), - index=df.index.take(indices), - columns=df.columns) - tm.assert_frame_equal(out, expected) - - indices = [-3, 2, 0, 1] - for p in [tm.makePanel()]: - out = p.take(indices) - expected = Panel(data=p.values.take(indices, axis=0), - items=p.items.take(indices), - major_axis=p.major_axis, - minor_axis=p.minor_axis) - tm.assert_panel_equal(out, expected) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - for p4d in [tm.makePanel4D()]: - out = p4d.take(indices) - expected = Panel4D(data=p4d.values.take(indices, axis=0), - labels=p4d.labels.take(indices), - major_axis=p4d.major_axis, - minor_axis=p4d.minor_axis, - items=p4d.items) - tm.assert_panel4d_equal(out, expected) - - def test_take_invalid_kwargs(self): - indices = [-3, 2, 0, 1] - s = tm.makeFloatSeries() - df = tm.makeTimeDataFrame() - p = tm.makePanel() - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - p4d = tm.makePanel4D() - - for obj in (s, df, p, p4d): - msg = r"take\(\) got an unexpected keyword argument 'foo'" - tm.assertRaisesRegexp(TypeError, msg, obj.take, - indices, foo=2) - - msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, obj.take, - indices, out=indices) - - msg = "the 'mode' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, obj.take, - indices, mode='clip') - - def test_equals(self): - s1 = pd.Series([1, 2, 3], index=[0, 2, 1]) - s2 = s1.copy() - self.assertTrue(s1.equals(s2)) - - s1[1] = 99 - self.assertFalse(s1.equals(s2)) - - # NaNs compare as equal - s1 = pd.Series([1, np.nan, 3, np.nan], index=[0, 2, 1, 3]) - s2 = s1.copy() - self.assertTrue(s1.equals(s2)) - - s2[0] = 9.9 - self.assertFalse(s1.equals(s2)) - - idx = MultiIndex.from_tuples([(0, 'a'), (1, 'b'), (2, 'c')]) - s1 = Series([1, 2, np.nan], index=idx) - s2 = s1.copy() - self.assertTrue(s1.equals(s2)) - - # Add object dtype column with nans - index = np.random.random(10) - df1 = DataFrame( - np.random.random(10, ), index=index, columns=['floats']) - df1['text'] = 'the sky is so blue. 
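This `test_equals` (continuing below) encodes the key property of `.equals()`: NaNs in matching positions compare equal, unlike elementwise `==`, while dtype and index differences break equality. E.g.:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, np.nan, 3.0])
s2 = s1.copy()

(s1 == s2).all()                  # False: NaN != NaN elementwise
s1.equals(s2)                     # True: matching NaNs count as equal
s1.equals(s1.astype('float32'))   # False: dtype matters
```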
we could use more chocolate.'.split( - ) - df1['start'] = date_range('2000-1-1', periods=10, freq='T') - df1['end'] = date_range('2000-1-1', periods=10, freq='D') - df1['diff'] = df1['end'] - df1['start'] - df1['bool'] = (np.arange(10) % 3 == 0) - df1.loc[::2] = nan - df2 = df1.copy() - self.assertTrue(df1['text'].equals(df2['text'])) - self.assertTrue(df1['start'].equals(df2['start'])) - self.assertTrue(df1['end'].equals(df2['end'])) - self.assertTrue(df1['diff'].equals(df2['diff'])) - self.assertTrue(df1['bool'].equals(df2['bool'])) - self.assertTrue(df1.equals(df2)) - self.assertFalse(df1.equals(object)) - - # different dtype - different = df1.copy() - different['floats'] = different['floats'].astype('float32') - self.assertFalse(df1.equals(different)) - - # different index - different_index = -index - different = df2.set_index(different_index) - self.assertFalse(df1.equals(different)) - - # different columns - different = df2.copy() - different.columns = df2.columns[::-1] - self.assertFalse(df1.equals(different)) - - # DatetimeIndex - index = pd.date_range('2000-1-1', periods=10, freq='T') - df1 = df1.set_index(index) - df2 = df1.copy() - self.assertTrue(df1.equals(df2)) - - # MultiIndex - df3 = df1.set_index(['text'], append=True) - df2 = df1.set_index(['text'], append=True) - self.assertTrue(df3.equals(df2)) - - df2 = df1.set_index(['floats'], append=True) - self.assertFalse(df3.equals(df2)) - - # NaN in index - df3 = df1.set_index(['floats'], append=True) - df2 = df1.set_index(['floats'], append=True) - self.assertTrue(df3.equals(df2)) - - # GH 8437 - a = pd.Series([False, np.nan]) - b = pd.Series([False, np.nan]) - c = pd.Series(index=range(2)) - d = pd.Series(index=range(2)) - e = pd.Series(index=range(2)) - f = pd.Series(index=range(2)) - c[:-1] = d[:-1] = e[0] = f[0] = False - self.assertTrue(a.equals(a)) - self.assertTrue(a.equals(b)) - self.assertTrue(a.equals(c)) - self.assertTrue(a.equals(d)) - self.assertFalse(a.equals(e)) - self.assertTrue(e.equals(f)) - - def test_describe_raises(self): - with tm.assertRaises(NotImplementedError): - tm.makePanel().describe() - - def test_pipe(self): - df = DataFrame({'A': [1, 2, 3]}) - f = lambda x, y: x ** y - result = df.pipe(f, 2) - expected = DataFrame({'A': [1, 4, 9]}) - self.assert_frame_equal(result, expected) - - result = df.A.pipe(f, 2) - self.assert_series_equal(result, expected.A) - - def test_pipe_tuple(self): - df = DataFrame({'A': [1, 2, 3]}) - f = lambda x, y: y - result = df.pipe((f, 'y'), 0) - self.assert_frame_equal(result, df) - - result = df.A.pipe((f, 'y'), 0) - self.assert_series_equal(result, df.A) - - def test_pipe_tuple_error(self): - df = DataFrame({"A": [1, 2, 3]}) - f = lambda x, y: y - with tm.assertRaises(ValueError): - df.pipe((f, 'y'), x=1, y=0) - - with tm.assertRaises(ValueError): - df.A.pipe((f, 'y'), x=1, y=0) - - def test_pipe_panel(self): - wp = Panel({'r1': DataFrame({"A": [1, 2, 3]})}) - f = lambda x, y: x + y - result = wp.pipe(f, 2) - expected = wp + 2 - assert_panel_equal(result, expected) - - result = wp.pipe((f, 'y'), x=1) - expected = wp + 1 - assert_panel_equal(result, expected) - - with tm.assertRaises(ValueError): - result = wp.pipe((f, 'y'), x=1, y=1) - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/test_join.py b/pandas/tests/test_join.py index bfdb77f3fb350..cde1cab37d09c 100644 --- a/pandas/tests/test_join.py +++ b/pandas/tests/test_join.py @@ -1,15 +1,14 @@ # -*- coding: utf-8 -*- import 
numpy as np -from pandas import Index +from pandas import Index, DataFrame, Categorical, merge -import pandas._join as _join +from pandas._libs import join as _join import pandas.util.testing as tm -from pandas.util.testing import assert_almost_equal +from pandas.util.testing import assert_almost_equal, assert_frame_equal -class TestIndexer(tm.TestCase): - _multiprocess_can_split_ = True +class TestIndexer(object): def test_outer_join_indexer(self): typemap = [('int32', _join.outer_join_indexer_int32), @@ -24,9 +23,9 @@ def test_outer_join_indexer(self): empty = np.array([], dtype=dtype) result, lindexer, rindexer = indexer(left, right) - tm.assertIsInstance(result, np.ndarray) - tm.assertIsInstance(lindexer, np.ndarray) - tm.assertIsInstance(rindexer, np.ndarray) + assert isinstance(result, np.ndarray) + assert isinstance(lindexer, np.ndarray) + assert isinstance(rindexer, np.ndarray) tm.assert_numpy_array_equal(result, np.arange(5, dtype=dtype)) exp = np.array([0, 1, 2, -1, -1], dtype=np.int64) tm.assert_numpy_array_equal(lindexer, exp) @@ -195,7 +194,41 @@ def test_inner_join_indexer2(): assert_almost_equal(ridx, exp_ridx) -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) +def test_merge_join_categorical_multiindex(): + # From issue 16627 + a = {'Cat1': Categorical(['a', 'b', 'a', 'c', 'a', 'b'], + ['a', 'b', 'c']), + 'Int1': [0, 1, 0, 1, 0, 0]} + a = DataFrame(a) + + b = {'Cat': Categorical(['a', 'b', 'c', 'a', 'b', 'c'], + ['a', 'b', 'c']), + 'Int': [0, 0, 0, 1, 1, 1], + 'Factor': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]} + b = DataFrame(b).set_index(['Cat', 'Int'])['Factor'] + + expected = merge(a, b.reset_index(), left_on=['Cat1', 'Int1'], + right_on=['Cat', 'Int'], how='left') + result = a.join(b, on=['Cat1', 'Int1']) + expected = expected.drop(['Cat', 'Int'], axis=1) + assert_frame_equal(expected, result) + + # Same test, but with ordered categorical + a = {'Cat1': Categorical(['a', 'b', 'a', 'c', 'a', 'b'], + ['b', 'a', 'c'], + ordered=True), + 'Int1': [0, 1, 0, 1, 0, 0]} + a = DataFrame(a) + + b = {'Cat': Categorical(['a', 'b', 'c', 'a', 'b', 'c'], + ['b', 'a', 'c'], + ordered=True), + 'Int': [0, 0, 0, 1, 1, 1], + 'Factor': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]} + b = DataFrame(b).set_index(['Cat', 'Int'])['Factor'] + + expected = merge(a, b.reset_index(), left_on=['Cat1', 'Int1'], + right_on=['Cat', 'Int'], how='left') + result = a.join(b, on=['Cat1', 'Int1']) + expected = expected.drop(['Cat', 'Int'], axis=1) + assert_frame_equal(expected, result) diff --git a/pandas/tests/test_lib.py b/pandas/tests/test_lib.py index 945f8004687cd..2662720bb436d 100644 --- a/pandas/tests/test_lib.py +++ b/pandas/tests/test_lib.py @@ -1,29 +1,31 @@ # -*- coding: utf-8 -*- +import pytest + import numpy as np import pandas as pd -import pandas.lib as lib +import pandas._libs.lib as lib import pandas.util.testing as tm -class TestMisc(tm.TestCase): +class TestMisc(object): def test_max_len_string_array(self): arr = a = np.array(['foo', 'b', np.nan], dtype='object') - self.assertTrue(lib.max_len_string_array(arr), 3) + assert lib.max_len_string_array(arr) == 3 # unicode arr = a.astype('U').astype(object) - self.assertTrue(lib.max_len_string_array(arr), 3) + assert lib.max_len_string_array(arr) == 3 # bytes for python3 arr = a.astype('S').astype(object) - self.assertTrue(lib.max_len_string_array(arr), 3) + assert lib.max_len_string_array(arr) == 3 # raises - tm.assertRaises(TypeError, - lambda: lib.max_len_string_array(arr.astype('U'))) 
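The new `test_merge_join_categorical_multiindex` above (gh-16627) asserts that joining on categorical key columns against a categorically indexed `Series` matches the `reset_index` + `merge` equivalent. Condensed from the test's own data:

```python
import pandas as pd

left = pd.DataFrame({'Cat1': pd.Categorical(list('abacab'), list('abc')),
                     'Int1': [0, 1, 0, 1, 0, 0]})
right = pd.DataFrame({'Cat': pd.Categorical(list('abcabc'), list('abc')),
                      'Int': [0, 0, 0, 1, 1, 1],
                      'Factor': [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]}
                     ).set_index(['Cat', 'Int'])['Factor']

left.join(right, on=['Cat1', 'Int1'])   # left join; previously raised
```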
+ pytest.raises(TypeError, + lambda: lib.max_len_string_array(arr.astype('U'))) def test_fast_unique_multiple_list_gen_sort(self): keys = [['p', 'a'], ['n', 'd'], ['a', 's']] @@ -39,7 +41,7 @@ def test_fast_unique_multiple_list_gen_sort(self): tm.assert_numpy_array_equal(np.array(out), expected) -class TestIndexing(tm.TestCase): +class TestIndexing(object): def test_maybe_indices_to_slice_left_edge(self): target = np.arange(100) @@ -47,32 +49,36 @@ def test_maybe_indices_to_slice_left_edge(self): # slice indices = np.array([], dtype=np.int64) maybe_slice = lib.maybe_indices_to_slice(indices, len(target)) - self.assertTrue(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(target[indices], target[maybe_slice]) + + assert isinstance(maybe_slice, slice) + tm.assert_numpy_array_equal(target[indices], target[maybe_slice]) for end in [1, 2, 5, 20, 99]: for step in [1, 2, 4]: indices = np.arange(0, end, step, dtype=np.int64) maybe_slice = lib.maybe_indices_to_slice(indices, len(target)) - self.assertTrue(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(target[indices], - target[maybe_slice]) + + assert isinstance(maybe_slice, slice) + tm.assert_numpy_array_equal(target[indices], + target[maybe_slice]) # reverse indices = indices[::-1] maybe_slice = lib.maybe_indices_to_slice(indices, len(target)) - self.assertTrue(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(target[indices], - target[maybe_slice]) + + assert isinstance(maybe_slice, slice) + tm.assert_numpy_array_equal(target[indices], + target[maybe_slice]) # not slice for case in [[2, 1, 2, 0], [2, 2, 1, 0], [0, 1, 2, 1], [-2, 0, 2], [2, 0, -2]]: indices = np.array(case, dtype=np.int64) maybe_slice = lib.maybe_indices_to_slice(indices, len(target)) - self.assertFalse(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(maybe_slice, indices) - self.assert_numpy_array_equal(target[indices], target[maybe_slice]) + + assert not isinstance(maybe_slice, slice) + tm.assert_numpy_array_equal(maybe_slice, indices) + tm.assert_numpy_array_equal(target[indices], target[maybe_slice]) def test_maybe_indices_to_slice_right_edge(self): target = np.arange(100) @@ -82,42 +88,49 @@ def test_maybe_indices_to_slice_right_edge(self): for step in [1, 2, 4]: indices = np.arange(start, 99, step, dtype=np.int64) maybe_slice = lib.maybe_indices_to_slice(indices, len(target)) - self.assertTrue(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(target[indices], - target[maybe_slice]) + + assert isinstance(maybe_slice, slice) + tm.assert_numpy_array_equal(target[indices], + target[maybe_slice]) # reverse indices = indices[::-1] maybe_slice = lib.maybe_indices_to_slice(indices, len(target)) - self.assertTrue(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(target[indices], - target[maybe_slice]) + + assert isinstance(maybe_slice, slice) + tm.assert_numpy_array_equal(target[indices], + target[maybe_slice]) # not slice indices = np.array([97, 98, 99, 100], dtype=np.int64) maybe_slice = lib.maybe_indices_to_slice(indices, len(target)) - self.assertFalse(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(maybe_slice, indices) - with self.assertRaises(IndexError): + + assert not isinstance(maybe_slice, slice) + tm.assert_numpy_array_equal(maybe_slice, indices) + + with pytest.raises(IndexError): target[indices] - with self.assertRaises(IndexError): + with pytest.raises(IndexError): target[maybe_slice] indices = np.array([100, 99, 98, 97], dtype=np.int64) maybe_slice = 
lib.maybe_indices_to_slice(indices, len(target)) - self.assertFalse(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(maybe_slice, indices) - with self.assertRaises(IndexError): + + assert not isinstance(maybe_slice, slice) + tm.assert_numpy_array_equal(maybe_slice, indices) + + with pytest.raises(IndexError): target[indices] - with self.assertRaises(IndexError): + with pytest.raises(IndexError): target[maybe_slice] for case in [[99, 97, 99, 96], [99, 99, 98, 97], [98, 98, 97, 96]]: indices = np.array(case, dtype=np.int64) maybe_slice = lib.maybe_indices_to_slice(indices, len(target)) - self.assertFalse(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(maybe_slice, indices) - self.assert_numpy_array_equal(target[indices], target[maybe_slice]) + + assert not isinstance(maybe_slice, slice) + tm.assert_numpy_array_equal(maybe_slice, indices) + tm.assert_numpy_array_equal(target[indices], target[maybe_slice]) def test_maybe_indices_to_slice_both_edges(self): target = np.arange(10) @@ -126,22 +139,22 @@ def test_maybe_indices_to_slice_both_edges(self): for step in [1, 2, 4, 5, 8, 9]: indices = np.arange(0, 9, step, dtype=np.int64) maybe_slice = lib.maybe_indices_to_slice(indices, len(target)) - self.assertTrue(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(target[indices], target[maybe_slice]) + assert isinstance(maybe_slice, slice) + tm.assert_numpy_array_equal(target[indices], target[maybe_slice]) # reverse indices = indices[::-1] maybe_slice = lib.maybe_indices_to_slice(indices, len(target)) - self.assertTrue(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(target[indices], target[maybe_slice]) + assert isinstance(maybe_slice, slice) + tm.assert_numpy_array_equal(target[indices], target[maybe_slice]) # not slice for case in [[4, 2, 0, -2], [2, 2, 1, 0], [0, 1, 2, 1]]: indices = np.array(case, dtype=np.int64) maybe_slice = lib.maybe_indices_to_slice(indices, len(target)) - self.assertFalse(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(maybe_slice, indices) - self.assert_numpy_array_equal(target[indices], target[maybe_slice]) + assert not isinstance(maybe_slice, slice) + tm.assert_numpy_array_equal(maybe_slice, indices) + tm.assert_numpy_array_equal(target[indices], target[maybe_slice]) def test_maybe_indices_to_slice_middle(self): target = np.arange(100) @@ -151,54 +164,57 @@ def test_maybe_indices_to_slice_middle(self): for step in [1, 2, 4, 20]: indices = np.arange(start, end, step, dtype=np.int64) maybe_slice = lib.maybe_indices_to_slice(indices, len(target)) - self.assertTrue(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(target[indices], - target[maybe_slice]) + + assert isinstance(maybe_slice, slice) + tm.assert_numpy_array_equal(target[indices], + target[maybe_slice]) # reverse indices = indices[::-1] maybe_slice = lib.maybe_indices_to_slice(indices, len(target)) - self.assertTrue(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(target[indices], - target[maybe_slice]) + + assert isinstance(maybe_slice, slice) + tm.assert_numpy_array_equal(target[indices], + target[maybe_slice]) # not slice for case in [[14, 12, 10, 12], [12, 12, 11, 10], [10, 11, 12, 11]]: indices = np.array(case, dtype=np.int64) maybe_slice = lib.maybe_indices_to_slice(indices, len(target)) - self.assertFalse(isinstance(maybe_slice, slice)) - self.assert_numpy_array_equal(maybe_slice, indices) - self.assert_numpy_array_equal(target[indices], target[maybe_slice]) + + assert not isinstance(maybe_slice, slice) + 
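`maybe_indices_to_slice` is an internal helper (note its import moving to `pandas._libs.lib` in this diff): monotonic, evenly spaced, in-bounds indices collapse to a `slice`, while anything else is returned unchanged as an ndarray. A sketch against the internal API as these tests exercise it:

```python
import numpy as np
import pandas._libs.lib as lib   # internal module, not public API

target = np.arange(100)

even = np.arange(10, 20, 2, dtype=np.int64)
isinstance(lib.maybe_indices_to_slice(even, len(target)), slice)     # True

ragged = np.array([2, 1, 2, 0], dtype=np.int64)
isinstance(lib.maybe_indices_to_slice(ragged, len(target)), slice)   # False
```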
tm.assert_numpy_array_equal(maybe_slice, indices) + tm.assert_numpy_array_equal(target[indices], target[maybe_slice]) def test_maybe_booleans_to_slice(self): arr = np.array([0, 0, 1, 1, 1, 0, 1], dtype=np.uint8) result = lib.maybe_booleans_to_slice(arr) - self.assertTrue(result.dtype == np.bool_) + assert result.dtype == np.bool_ result = lib.maybe_booleans_to_slice(arr[:0]) - self.assertTrue(result == slice(0, 0)) + assert result == slice(0, 0) def test_get_reverse_indexer(self): indexer = np.array([-1, -1, 1, 2, 0, -1, 3, 4], dtype=np.int64) result = lib.get_reverse_indexer(indexer, 5) expected = np.array([4, 2, 3, 6, 7], dtype=np.int64) - self.assertTrue(np.array_equal(result, expected)) + assert np.array_equal(result, expected) -class TestNullObj(tm.TestCase): +class TestNAObj(object): - _1d_methods = ['isnullobj', 'isnullobj_old'] - _2d_methods = ['isnullobj2d', 'isnullobj2d_old'] + _1d_methods = ['isnaobj', 'isnaobj_old'] + _2d_methods = ['isnaobj2d', 'isnaobj2d_old'] def _check_behavior(self, arr, expected): - for method in TestNullObj._1d_methods: + for method in TestNAObj._1d_methods: result = getattr(lib, method)(arr) tm.assert_numpy_array_equal(result, expected) arr = np.atleast_2d(arr) expected = np.atleast_2d(expected) - for method in TestNullObj._2d_methods: + for method in TestNAObj._2d_methods: result = getattr(lib, method)(arr) tm.assert_numpy_array_equal(result, expected) @@ -221,7 +237,7 @@ def test_empty_arr(self): self._check_behavior(arr, expected) def test_empty_str_inp(self): - arr = np.array([""]) # empty but not null + arr = np.array([""]) # empty but not na expected = np.array([False]) self._check_behavior(arr, expected) @@ -232,10 +248,3 @@ def test_empty_like(self): expected = np.array([True]) self._check_behavior(arr, expected) - - -if __name__ == '__main__': - import nose - - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/test_msgpack/test_except.py b/pandas/tests/test_msgpack/test_except.py deleted file mode 100644 index 76b91bb375bbc..0000000000000 --- a/pandas/tests/test_msgpack/test_except.py +++ /dev/null @@ -1,33 +0,0 @@ -# coding: utf-8 - -import unittest -from pandas.msgpack import packb, unpackb - - -class DummyException(Exception): - pass - - -class TestExceptions(unittest.TestCase): - - def test_raise_on_find_unsupported_value(self): - import datetime - self.assertRaises(TypeError, packb, datetime.datetime.now()) - - def test_raise_from_object_hook(self): - def hook(obj): - raise DummyException - - self.assertRaises(DummyException, unpackb, packb({}), object_hook=hook) - self.assertRaises(DummyException, unpackb, packb({'fizz': 'buzz'}), - object_hook=hook) - self.assertRaises(DummyException, unpackb, packb({'fizz': 'buzz'}), - object_pairs_hook=hook) - self.assertRaises(DummyException, unpackb, - packb({'fizz': {'buzz': 'spam'}}), object_hook=hook) - self.assertRaises(DummyException, unpackb, - packb({'fizz': {'buzz': 'spam'}}), - object_pairs_hook=hook) - - def test_invalidvalue(self): - self.assertRaises(ValueError, unpackb, b'\xd9\x97#DL_') diff --git a/pandas/tests/test_multilevel.py b/pandas/tests/test_multilevel.py old mode 100755 new mode 100644 index 37bfe667b0205..a765e2c4ca1bf --- a/pandas/tests/test_multilevel.py +++ b/pandas/tests/test_multilevel.py @@ -3,31 +3,27 @@ from warnings import catch_warnings import datetime import itertools -import nose +import pytest +import pytz from numpy.random import randn import numpy as np from pandas.core.index import Index, MultiIndex 
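The renames here and just below (`isnull`/`notnull` to `isna`/`notna`, `TestNullObj` to `TestNAObj`, `isnullobj` to `isnaobj`) belong to the 0.21 "NA" naming sweep; the old public spellings are kept as aliases:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan])

pd.isna(s).equals(pd.isnull(s))     # True: isnull stays as an alias
pd.notna(s).equals(pd.notnull(s))   # True
```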
-from pandas import Panel, DataFrame, Series, notnull, isnull, Timestamp +from pandas import Panel, DataFrame, Series, notna, isna, Timestamp -from pandas.types.common import is_float_dtype, is_integer_dtype -from pandas.util.testing import (assert_almost_equal, assert_series_equal, - assert_frame_equal, assertRaisesRegexp) +from pandas.core.dtypes.common import is_float_dtype, is_integer_dtype import pandas.core.common as com import pandas.util.testing as tm from pandas.compat import (range, lrange, StringIO, lzip, u, product as cart_product, zip) import pandas as pd +import pandas._libs.index as _index -import pandas.index as _index +class Base(object): -class TestMultiLevel(tm.TestCase): - - _multiprocess_can_split_ = True - - def setUp(self): + def setup_method(self, method): index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']], @@ -60,6 +56,9 @@ def setUp(self): inplace=True) self.ymd.index.set_names(['year', 'month', 'day'], inplace=True) + +class TestMultiLevel(Base): + def test_append(self): a, b = self.frame[:5], self.frame[5:] @@ -70,8 +69,6 @@ def test_append(self): tm.assert_series_equal(result, self.frame['A']) def test_append_index(self): - tm._skip_if_no_pytz() - idx1 = Index([1.1, 1.2, 1.3]) idx2 = pd.date_range('2011-01-01', freq='D', periods=3, tz='Asia/Tokyo') @@ -82,58 +79,57 @@ def test_append_index(self): result = idx1.append(midx_lv2) - # GH 7112 - import pytz + # see gh-7112 tz = pytz.timezone('Asia/Tokyo') - expected_tuples = [(1.1, datetime.datetime(2011, 1, 1, tzinfo=tz)), - (1.2, datetime.datetime(2011, 1, 2, tzinfo=tz)), - (1.3, datetime.datetime(2011, 1, 3, tzinfo=tz))] + expected_tuples = [(1.1, tz.localize(datetime.datetime(2011, 1, 1))), + (1.2, tz.localize(datetime.datetime(2011, 1, 2))), + (1.3, tz.localize(datetime.datetime(2011, 1, 3)))] expected = Index([1.1, 1.2, 1.3] + expected_tuples) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) result = midx_lv2.append(idx1) expected = Index(expected_tuples + [1.1, 1.2, 1.3]) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) result = midx_lv2.append(midx_lv2) expected = MultiIndex.from_arrays([idx1.append(idx1), idx2.append(idx2)]) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) result = midx_lv2.append(midx_lv3) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) result = midx_lv3.append(midx_lv2) expected = Index._simple_new( - np.array([(1.1, datetime.datetime(2011, 1, 1, tzinfo=tz), 'A'), - (1.2, datetime.datetime(2011, 1, 2, tzinfo=tz), 'B'), - (1.3, datetime.datetime(2011, 1, 3, tzinfo=tz), 'C')] + + np.array([(1.1, tz.localize(datetime.datetime(2011, 1, 1)), 'A'), + (1.2, tz.localize(datetime.datetime(2011, 1, 2)), 'B'), + (1.3, tz.localize(datetime.datetime(2011, 1, 3)), 'C')] + expected_tuples), None) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) def test_dataframe_constructor(self): multi = DataFrame(np.random.randn(4, 4), index=[np.array(['a', 'a', 'b', 'b']), np.array(['x', 'y', 'x', 'y'])]) - tm.assertIsInstance(multi.index, MultiIndex) - self.assertNotIsInstance(multi.columns, MultiIndex) + assert isinstance(multi.index, MultiIndex) + assert not isinstance(multi.columns, MultiIndex) multi = DataFrame(np.random.randn(4, 4), columns=[['a', 'a', 'b', 'b'], ['x', 'y', 'x', 'y']]) - tm.assertIsInstance(multi.columns, MultiIndex) + assert isinstance(multi.columns, MultiIndex) def 
test_series_constructor(self): multi = Series(1., index=[np.array(['a', 'a', 'b', 'b']), np.array( ['x', 'y', 'x', 'y'])]) - tm.assertIsInstance(multi.index, MultiIndex) + assert isinstance(multi.index, MultiIndex) multi = Series(1., index=[['a', 'a', 'b', 'b'], ['x', 'y', 'x', 'y']]) - tm.assertIsInstance(multi.index, MultiIndex) + assert isinstance(multi.index, MultiIndex) multi = Series(lrange(4), index=[['a', 'a', 'b', 'b'], ['x', 'y', 'x', 'y']]) - tm.assertIsInstance(multi.index, MultiIndex) + assert isinstance(multi.index, MultiIndex) def test_reindex_level(self): # axis=0 @@ -141,18 +137,18 @@ def test_reindex_level(self): result = month_sums.reindex(self.ymd.index, level=1) expected = self.ymd.groupby(level='month').transform(np.sum) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) # Series result = month_sums['A'].reindex(self.ymd.index, level=1) expected = self.ymd['A'].groupby(level='month').transform(np.sum) - assert_series_equal(result, expected, check_names=False) + tm.assert_series_equal(result, expected, check_names=False) # axis=1 month_sums = self.ymd.T.sum(axis=1, level='month') result = month_sums.reindex(columns=self.ymd.index, level=1) expected = self.ymd.groupby(level='month').transform(np.sum).T - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_binops_level(self): def _check_op(opname): @@ -162,7 +158,7 @@ def _check_op(opname): broadcasted = self.ymd.groupby(level='month').transform(np.sum) expected = op(self.ymd, broadcasted) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) # Series op = getattr(Series, opname) @@ -171,7 +167,7 @@ def _check_op(opname): np.sum) expected = op(self.ymd['A'], broadcasted) expected.name = 'A' - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) _check_op('sub') _check_op('add') @@ -180,8 +176,8 @@ def _check_op(opname): def test_pickle(self): def _test_roundtrip(frame): - unpickled = self.round_trip_pickle(frame) - assert_frame_equal(frame, unpickled) + unpickled = tm.round_trip_pickle(frame) + tm.assert_frame_equal(frame, unpickled) _test_roundtrip(self.frame) _test_roundtrip(self.frame.T) @@ -191,74 +187,30 @@ def _test_roundtrip(frame): def test_reindex(self): expected = self.frame.iloc[[0, 3]] reindexed = self.frame.loc[[('foo', 'one'), ('bar', 'one')]] - assert_frame_equal(reindexed, expected) + tm.assert_frame_equal(reindexed, expected) with catch_warnings(record=True): reindexed = self.frame.ix[[('foo', 'one'), ('bar', 'one')]] - assert_frame_equal(reindexed, expected) + tm.assert_frame_equal(reindexed, expected) def test_reindex_preserve_levels(self): new_index = self.ymd.index[::10] chunk = self.ymd.reindex(new_index) - self.assertIs(chunk.index, new_index) + assert chunk.index is new_index chunk = self.ymd.loc[new_index] - self.assertIs(chunk.index, new_index) + assert chunk.index is new_index with catch_warnings(record=True): chunk = self.ymd.ix[new_index] - self.assertIs(chunk.index, new_index) + assert chunk.index is new_index ymdT = self.ymd.T chunk = ymdT.reindex(columns=new_index) - self.assertIs(chunk.columns, new_index) + assert chunk.columns is new_index chunk = ymdT.loc[:, new_index] - self.assertIs(chunk.columns, new_index) - - def test_sort_index_preserve_levels(self): - result = self.frame.sort_index() - self.assertEqual(result.index.names, self.frame.index.names) - - def test_sorting_repr_8017(self): - - np.random.seed(0) - data = np.random.randn(3, 4) - - for gen, extra in 
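On the expected-value rewrite a few hunks up, where `datetime(..., tzinfo=tz)` became `tz.localize(datetime(...))`: attaching a pytz zone via `tzinfo=` picks up the zone's first (LMT) offset, while `localize()` resolves the correct UTC offset for that instant:

```python
import datetime
import pytz

tz = pytz.timezone('Asia/Tokyo')

datetime.datetime(2011, 1, 1, tzinfo=tz).utcoffset()    # +9:19 (LMT), wrong
tz.localize(datetime.datetime(2011, 1, 1)).utcoffset()  # +9:00, correct
```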
[([1., 3., 2., 5.], 4.), ([1, 3, 2, 5], 4), - ([Timestamp('20130101'), Timestamp('20130103'), - Timestamp('20130102'), Timestamp('20130105')], - Timestamp('20130104')), - (['1one', '3one', '2one', '5one'], '4one')]: - columns = MultiIndex.from_tuples([('red', i) for i in gen]) - df = DataFrame(data, index=list('def'), columns=columns) - df2 = pd.concat([df, - DataFrame('world', index=list('def'), - columns=MultiIndex.from_tuples( - [('red', extra)]))], axis=1) - - # check that the repr is good - # make sure that we have a correct sparsified repr - # e.g. only 1 header of read - self.assertEqual(str(df2).splitlines()[0].split(), ['red']) - - # GH 8017 - # sorting fails after columns added - - # construct single-dtype then sort - result = df.copy().sort_index(axis=1) - expected = df.iloc[:, [0, 2, 1, 3]] - assert_frame_equal(result, expected) - - result = df2.sort_index(axis=1) - expected = df2.iloc[:, [0, 2, 1, 4, 3]] - assert_frame_equal(result, expected) - - # setitem then sort - result = df.copy() - result[('red', extra)] = 'world' - result = result.sort_index(axis=1) - assert_frame_equal(result, expected) + assert chunk.columns is new_index def test_repr_to_string(self): repr(self.frame) @@ -279,15 +231,17 @@ def test_repr_name_coincide(self): df = DataFrame({'value': [0, 1]}, index=index) lines = repr(df).split('\n') - self.assertTrue(lines[2].startswith('a 0 foo')) + assert lines[2].startswith('a 0 foo') def test_getitem_simple(self): df = self.frame.T col = df['foo', 'one'] - assert_almost_equal(col.values, df.values[:, 0]) - self.assertRaises(KeyError, df.__getitem__, ('foo', 'four')) - self.assertRaises(KeyError, df.__getitem__, 'foobar') + tm.assert_almost_equal(col.values, df.values[:, 0]) + with pytest.raises(KeyError): + df[('foo', 'four')] + with pytest.raises(KeyError): + df['foobar'] def test_series_getitem(self): s = self.ymd['A'] @@ -299,46 +253,46 @@ def test_series_getitem(self): expected = s.reindex(s.index[42:65]) expected.index = expected.index.droplevel(0).droplevel(0) - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) result = s[2000, 3, 10] expected = s[49] - self.assertEqual(result, expected) + assert result == expected # fancy expected = s.reindex(s.index[49:51]) result = s.loc[[(2000, 3, 10), (2000, 3, 13)]] - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) with catch_warnings(record=True): result = s.ix[[(2000, 3, 10), (2000, 3, 13)]] - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) # key error - self.assertRaises(KeyError, s.__getitem__, (2000, 3, 4)) + pytest.raises(KeyError, s.__getitem__, (2000, 3, 4)) def test_series_getitem_corner(self): s = self.ymd['A'] # don't segfault, GH #495 # out of bounds access - self.assertRaises(IndexError, s.__getitem__, len(self.ymd)) + pytest.raises(IndexError, s.__getitem__, len(self.ymd)) # generator result = s[(x > 0 for x in s)] expected = s[s > 0] - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) def test_series_setitem(self): s = self.ymd['A'] s[2000, 3] = np.nan - self.assertTrue(isnull(s.values[42:65]).all()) - self.assertTrue(notnull(s.values[:42]).all()) - self.assertTrue(notnull(s.values[65:]).all()) + assert isna(s.values[42:65]).all() + assert notna(s.values[:42]).all() + assert notna(s.values[65:]).all() s[2000, 3, 10] = np.nan - self.assertTrue(isnull(s[49])) + assert isna(s[49]) def test_series_slice_partial(self): pass @@ -349,36 +303,36 @@ def 
test_frame_getitem_setitem_boolean(self): result = df[df > 0] expected = df.where(df > 0) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) df[df > 0] = 5 values[values > 0] = 5 - assert_almost_equal(df.values, values) + tm.assert_almost_equal(df.values, values) df[df == 5] = 0 values[values == 5] = 0 - assert_almost_equal(df.values, values) + tm.assert_almost_equal(df.values, values) # a df that needs alignment first df[df[:-1] < 0] = 2 np.putmask(values[:-1], values[:-1] < 0, 2) - assert_almost_equal(df.values, values) + tm.assert_almost_equal(df.values, values) - with assertRaisesRegexp(TypeError, 'boolean values only'): + with tm.assert_raises_regex(TypeError, 'boolean values only'): df[df * 0] = 2 def test_frame_getitem_setitem_slice(self): # getitem result = self.frame.iloc[:4] expected = self.frame[:4] - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) # setitem cp = self.frame.copy() cp.iloc[:4] = 0 - self.assertTrue((cp.values[:4] == 0).all()) - self.assertTrue((cp.values[4:] != 0).all()) + assert (cp.values[:4] == 0).all() + assert (cp.values[4:] != 0).all() def test_frame_getitem_setitem_multislice(self): levels = [['t1', 't2'], ['a', 'b', 'c']] @@ -387,25 +341,25 @@ def test_frame_getitem_setitem_multislice(self): df = DataFrame({'value': [1, 2, 3, 7, 8]}, index=midx) result = df.loc[:, 'value'] - assert_series_equal(df['value'], result) + tm.assert_series_equal(df['value'], result) with catch_warnings(record=True): result = df.ix[:, 'value'] - assert_series_equal(df['value'], result) + tm.assert_series_equal(df['value'], result) result = df.loc[df.index[1:3], 'value'] - assert_series_equal(df['value'][1:3], result) + tm.assert_series_equal(df['value'][1:3], result) result = df.loc[:, :] - assert_frame_equal(df, result) + tm.assert_frame_equal(df, result) result = df df.loc[:, 'value'] = 10 result['value'] = 10 - assert_frame_equal(df, result) + tm.assert_frame_equal(df, result) df.loc[:, :] = 10 - assert_frame_equal(df, result) + tm.assert_frame_equal(df, result) def test_frame_getitem_multicolumn_empty_level(self): f = DataFrame({'a': ['1', '2', '3'], 'b': ['2', '3', '4']}) @@ -415,7 +369,7 @@ def test_frame_getitem_multicolumn_empty_level(self): result = f['level1 item1'] expected = DataFrame([['1'], ['2'], ['3']], index=f.index, columns=['level3 item1']) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_frame_setitem_multi_column(self): df = DataFrame(randn(10, 4), columns=[['a', 'a', 'b', 'b'], @@ -423,12 +377,12 @@ def test_frame_setitem_multi_column(self): cp = df.copy() cp['a'] = cp['b'] - assert_frame_equal(cp['a'], cp['b']) + tm.assert_frame_equal(cp['a'], cp['b']) # set with ndarray cp = df.copy() cp['a'] = cp['b'].values - assert_frame_equal(cp['a'], cp['b']) + tm.assert_frame_equal(cp['a'], cp['b']) # --------------------------------------- # #1803 @@ -437,7 +391,7 @@ def test_frame_setitem_multi_column(self): # Works, but adds a column instead of updating the two existing ones df['A'] = 0.0 # Doesn't work - self.assertTrue((df['A'].values == 0).all()) + assert (df['A'].values == 0).all() # it broadcasts df['B', '1'] = [1, 2, 3] @@ -446,11 +400,11 @@ def test_frame_setitem_multi_column(self): sliced_a1 = df['A', '1'] sliced_a2 = df['A', '2'] sliced_b1 = df['B', '1'] - assert_series_equal(sliced_a1, sliced_b1, check_names=False) - assert_series_equal(sliced_a2, sliced_b1, check_names=False) - self.assertEqual(sliced_a1.name, ('A', '1')) - 
self.assertEqual(sliced_a2.name, ('A', '2')) - self.assertEqual(sliced_b1.name, ('B', '1')) + tm.assert_series_equal(sliced_a1, sliced_b1, check_names=False) + tm.assert_series_equal(sliced_a2, sliced_b1, check_names=False) + assert sliced_a1.name == ('A', '1') + assert sliced_a2.name == ('A', '2') + assert sliced_b1.name == ('B', '1') def test_getitem_tuple_plus_slice(self): # GH #671 @@ -467,9 +421,9 @@ def test_getitem_tuple_plus_slice(self): with catch_warnings(record=True): expected3 = idf.ix[0, 0] - assert_series_equal(result, expected) - assert_series_equal(result, expected2) - assert_series_equal(result, expected3) + tm.assert_series_equal(result, expected) + tm.assert_series_equal(result, expected2) + tm.assert_series_equal(result, expected3) def test_getitem_setitem_tuple_plus_columns(self): # GH #1013 @@ -478,26 +432,14 @@ def test_getitem_setitem_tuple_plus_columns(self): result = df.loc[(2000, 1, 6), ['A', 'B', 'C']] expected = df.loc[2000, 1, 6][['A', 'B', 'C']] - assert_series_equal(result, expected) - - def test_getitem_multilevel_index_tuple_unsorted(self): - index_columns = list("abc") - df = DataFrame([[0, 1, 0, "x"], [0, 0, 1, "y"]], - columns=index_columns + ["data"]) - df = df.set_index(index_columns) - query_index = df.index[:1] - rs = df.loc[query_index, "data"] - - xp_idx = MultiIndex.from_tuples([(0, 1, 0)], names=['a', 'b', 'c']) - xp = Series(['x'], index=xp_idx, name='data') - assert_series_equal(rs, xp) + tm.assert_series_equal(result, expected) def test_xs(self): xs = self.frame.xs(('bar', 'two')) xs2 = self.frame.loc[('bar', 'two')] - assert_series_equal(xs, xs2) - assert_almost_equal(xs.values, self.frame.values[4]) + tm.assert_series_equal(xs, xs2) + tm.assert_almost_equal(xs.values, self.frame.values[4]) # GH 6574 # missing values in returned index should be preserved @@ -516,18 +458,18 @@ def test_xs(self): ['xbcde', np.nan, 'zbcde', 'ybcde'], name='a2')) result = df.xs('z', level='a1') - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_xs_partial(self): result = self.frame.xs('foo') result2 = self.frame.loc['foo'] expected = self.frame.T['foo'].T - assert_frame_equal(result, expected) - assert_frame_equal(result, result2) + tm.assert_frame_equal(result, expected) + tm.assert_frame_equal(result, result2) result = self.ymd.xs((2000, 4)) expected = self.ymd.loc[2000, 4] - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) # ex from #1796 index = MultiIndex(levels=[['foo', 'bar'], ['one', 'two'], [-1, 1]], @@ -539,14 +481,14 @@ def test_xs_partial(self): result = df.xs(['foo', 'one']) expected = df.loc['foo', 'one'] - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_xs_level(self): result = self.frame.xs('two', level='second') expected = self.frame[self.frame.index.get_level_values(1) == 'two'] expected.index = expected.index.droplevel(1) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) index = MultiIndex.from_tuples([('x', 'y', 'z'), ('a', 'b', 'c'), ( 'p', 'q', 'r')]) @@ -554,7 +496,7 @@ def test_xs_level(self): result = df.xs('c', level=2) expected = df[1:2] expected.index = expected.index.droplevel(2) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) # this is a copy in 0.14 result = self.frame.xs('two', level='second') @@ -564,7 +506,7 @@ def test_xs_level(self): def f(x): x[:] = 10 - self.assertRaises(com.SettingWithCopyError, f, result) + pytest.raises(com.SettingWithCopyError, f, 
result) def test_xs_level_multiple(self): from pandas import read_table @@ -578,7 +520,7 @@ def test_xs_level_multiple(self): result = df.xs(('a', 4), level=['one', 'four']) expected = df.xs('a').xs(4, level='four') - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) # this is a copy in 0.14 result = df.xs(('a', 4), level=['one', 'four']) @@ -588,7 +530,7 @@ def test_xs_level_multiple(self): def f(x): x[:] = 10 - self.assertRaises(com.SettingWithCopyError, f, result) + pytest.raises(com.SettingWithCopyError, f, result) # GH2107 dates = lrange(20111201, 20111205) @@ -599,7 +541,7 @@ def f(x): rs = df.xs(20111201, level='date') xp = df.loc[20111201, :] - assert_frame_equal(rs, xp) + tm.assert_frame_equal(rs, xp) def test_xs_level0(self): from pandas import read_table @@ -613,29 +555,29 @@ def test_xs_level0(self): result = df.xs('a', level=0) expected = df.xs('a') - self.assertEqual(len(result), 2) - assert_frame_equal(result, expected) + assert len(result) == 2 + tm.assert_frame_equal(result, expected) def test_xs_level_series(self): s = self.frame['A'] result = s[:, 'two'] expected = self.frame.xs('two', level=1)['A'] - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) s = self.ymd['A'] result = s[2000, 5] expected = self.ymd.loc[2000, 5]['A'] - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) # not implementing this for now - self.assertRaises(TypeError, s.__getitem__, (2000, slice(3, 4))) + pytest.raises(TypeError, s.__getitem__, (2000, slice(3, 4))) # result = s[2000, 3:4] # lv =s.index.get_level_values(1) # expected = s[(lv == 3) | (lv == 4)] # expected.index = expected.index.droplevel(0) - # assert_series_equal(result, expected) + # tm.assert_series_equal(result, expected) # can do this though @@ -651,15 +593,15 @@ def test_getitem_toplevel(self): result = df['foo'] expected = df.reindex(columns=df.columns[:3]) expected.columns = expected.columns.droplevel(0) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) result = df['bar'] result2 = df.loc[:, 'bar'] expected = df.reindex(columns=df.columns[3:5]) expected.columns = expected.columns.droplevel(0) - assert_frame_equal(result, expected) - assert_frame_equal(result, result2) + tm.assert_frame_equal(result, expected) + tm.assert_frame_equal(result, result2) def test_getitem_setitem_slice_integers(self): index = MultiIndex(levels=[[0, 1, 2], [0, 2]], @@ -669,19 +611,19 @@ def test_getitem_setitem_slice_integers(self): columns=['a', 'b', 'c', 'd']) res = frame.loc[1:2] exp = frame.reindex(frame.index[2:]) - assert_frame_equal(res, exp) + tm.assert_frame_equal(res, exp) frame.loc[1:2] = 7 - self.assertTrue((frame.loc[1:2] == 7).values.all()) + assert (frame.loc[1:2] == 7).values.all() series = Series(np.random.randn(len(index)), index=index) res = series.loc[1:2] exp = series.reindex(series.index[2:]) - assert_series_equal(res, exp) + tm.assert_series_equal(res, exp) series.loc[1:2] = 7 - self.assertTrue((series.loc[1:2] == 7).values.all()) + assert (series.loc[1:2] == 7).values.all() def test_getitem_int(self): levels = [[0, 1], [0, 1, 2]] @@ -693,15 +635,15 @@ def test_getitem_int(self): result = frame.loc[1] expected = frame[-3:] expected.index = expected.index.droplevel(0) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) # raises exception - self.assertRaises(KeyError, frame.loc.__getitem__, 3) + pytest.raises(KeyError, frame.loc.__getitem__, 3) # however this will work result = 
self.frame.iloc[2] expected = self.frame.xs(self.frame.index[2]) - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) def test_getitem_partial(self): ymd = self.ymd.T @@ -709,51 +651,43 @@ def test_getitem_partial(self): expected = ymd.reindex(columns=ymd.columns[ymd.columns.labels[1] == 1]) expected.columns = expected.columns.droplevel(0).droplevel(0) - assert_frame_equal(result, expected) - - def test_getitem_slice_not_sorted(self): - df = self.frame.sort_index(level=1).T - - # buglet with int typechecking - result = df.iloc[:, :np.int32(3)] - expected = df.reindex(columns=df.columns[:3]) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_setitem_change_dtype(self): dft = self.frame.T s = dft['foo', 'two'] dft['foo', 'two'] = s > s.median() - assert_series_equal(dft['foo', 'two'], s > s.median()) - # tm.assertIsInstance(dft._data.blocks[1].items, MultiIndex) + tm.assert_series_equal(dft['foo', 'two'], s > s.median()) + # assert isinstance(dft._data.blocks[1].items, MultiIndex) reindexed = dft.reindex(columns=[('foo', 'two')]) - assert_series_equal(reindexed['foo', 'two'], s > s.median()) + tm.assert_series_equal(reindexed['foo', 'two'], s > s.median()) def test_frame_setitem_ix(self): self.frame.loc[('bar', 'two'), 'B'] = 5 - self.assertEqual(self.frame.loc[('bar', 'two'), 'B'], 5) + assert self.frame.loc[('bar', 'two'), 'B'] == 5 # with integer labels df = self.frame.copy() df.columns = lrange(3) df.loc[('bar', 'two'), 1] = 7 - self.assertEqual(df.loc[('bar', 'two'), 1], 7) + assert df.loc[('bar', 'two'), 1] == 7 with catch_warnings(record=True): df = self.frame.copy() df.columns = lrange(3) df.ix[('bar', 'two'), 1] = 7 - self.assertEqual(df.loc[('bar', 'two'), 1], 7) + assert df.loc[('bar', 'two'), 1] == 7 def test_fancy_slice_partial(self): result = self.frame.loc['bar':'baz'] expected = self.frame[3:7] - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) result = self.ymd.loc[(2000, 2):(2000, 4)] lev = self.ymd.index.labels[1] expected = self.ymd[(lev >= 1) & (lev <= 3)] - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_getitem_partial_column_select(self): idx = MultiIndex(labels=[[0, 0, 0], [0, 1, 1], [1, 0, 1]], @@ -762,54 +696,18 @@ def test_getitem_partial_column_select(self): result = df.loc[('a', 'y'), :] expected = df.loc[('a', 'y')] - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) result = df.loc[('a', 'y'), [1, 0]] expected = df.loc[('a', 'y')][[1, 0]] - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) with catch_warnings(record=True): result = df.ix[('a', 'y'), [1, 0]] - assert_frame_equal(result, expected) - - self.assertRaises(KeyError, df.loc.__getitem__, - (('a', 'foo'), slice(None, None))) - - def test_sort_index_level(self): - df = self.frame.copy() - df.index = np.arange(len(df)) - - # axis=1 + tm.assert_frame_equal(result, expected) - # series - a_sorted = self.frame['A'].sort_index(level=0) - - # preserve names - self.assertEqual(a_sorted.index.names, self.frame.index.names) - - # inplace - rs = self.frame.copy() - rs.sort_index(level=0, inplace=True) - assert_frame_equal(rs, self.frame.sort_index(level=0)) - - def test_sort_index_level_large_cardinality(self): - - # #2684 (int64) - index = MultiIndex.from_arrays([np.arange(4000)] * 3) - df = DataFrame(np.random.randn(4000), index=index, dtype=np.int64) - - # it works! 
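
For readers tracking the conversion pattern running through this patch, here is a minimal, self-contained sketch of the pytest-style idioms the unittest-style assertions are migrated to. The two-level `frame` built here is a hand-made stand-in for the test class fixture (an assumption, not part of the patch), and `pandas.util.testing` is the module this diff elsewhere aliases as `tm`:

```python
# Hedged sketch of the assertion migration applied throughout this patch.
# `frame` below is an assumption standing in for the class fixture.
import numpy as np
import pandas as pd
import pandas.util.testing as tm  # the module this patch aliases as `tm`
import pytest

index = pd.MultiIndex(levels=[[0, 1], [0, 1, 2]],
                      labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])
frame = pd.DataFrame(np.random.randn(6, 2), index=index)

# plain `assert` replaces self.assertEqual / self.assertTrue
assert len(frame.loc[1]) == 3

# `with pytest.raises(...)` replaces self.assertRaises
with pytest.raises(KeyError):
    frame.loc[3]

# namespaced tm.assert_frame_equal replaces the bare-imported helper
expected = frame[-3:]
expected.index = expected.index.droplevel(0)
tm.assert_frame_equal(frame.loc[1], expected)
```
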
- result = df.sort_index(level=0) - self.assertTrue(result.index.lexsort_depth == 3) - - # #2684 (int32) - index = MultiIndex.from_arrays([np.arange(4000)] * 3) - df = DataFrame(np.random.randn(4000), index=index, dtype=np.int32) - - # it works! - result = df.sort_index(level=0) - self.assertTrue((result.dtypes.values == df.dtypes.values).all()) - self.assertTrue(result.index.lexsort_depth == 3) + pytest.raises(KeyError, df.loc.__getitem__, + (('a', 'foo'), slice(None, None))) def test_delevel_infer_dtype(self): tuples = [tuple @@ -819,42 +717,19 @@ def test_delevel_infer_dtype(self): df = DataFrame(np.random.randn(8, 3), columns=['A', 'B', 'C'], index=index) deleveled = df.reset_index() - self.assertTrue(is_integer_dtype(deleveled['prm1'])) - self.assertTrue(is_float_dtype(deleveled['prm2'])) + assert is_integer_dtype(deleveled['prm1']) + assert is_float_dtype(deleveled['prm2']) def test_reset_index_with_drop(self): deleveled = self.ymd.reset_index(drop=True) - self.assertEqual(len(deleveled.columns), len(self.ymd.columns)) + assert len(deleveled.columns) == len(self.ymd.columns) deleveled = self.series.reset_index() - tm.assertIsInstance(deleveled, DataFrame) - self.assertEqual(len(deleveled.columns), - len(self.series.index.levels) + 1) + assert isinstance(deleveled, DataFrame) + assert len(deleveled.columns) == len(self.series.index.levels) + 1 deleveled = self.series.reset_index(drop=True) - tm.assertIsInstance(deleveled, Series) - - def test_sort_index_level_by_name(self): - self.frame.index.names = ['first', 'second'] - result = self.frame.sort_index(level='second') - expected = self.frame.sort_index(level=1) - assert_frame_equal(result, expected) - - def test_sort_index_level_mixed(self): - sorted_before = self.frame.sort_index(level=1) - - df = self.frame.copy() - df['foo'] = 'bar' - sorted_after = df.sort_index(level=1) - assert_frame_equal(sorted_before, sorted_after.drop(['foo'], axis=1)) - - dft = self.frame.T - sorted_before = dft.sort_index(level=1, axis=1) - dft['foo', 'three'] = 'bar' - - sorted_after = dft.sort_index(level=1, axis=1) - assert_frame_equal(sorted_before.drop([('foo', 'three')], axis=1), - sorted_after.drop([('foo', 'three')], axis=1)) + assert isinstance(deleveled, Series) def test_count_level(self): def _check_counts(frame, axis=0): @@ -863,7 +738,7 @@ def _check_counts(frame, axis=0): result = frame.count(axis=axis, level=i) expected = frame.groupby(axis=axis, level=i).count() expected = expected.reindex_like(result).astype('i8') - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) self.frame.iloc[1, [1, 2]] = np.nan self.frame.iloc[7, [0, 1]] = np.nan @@ -877,7 +752,8 @@ def _check_counts(frame, axis=0): # can't call with level on regular DataFrame df = tm.makeTimeDataFrame() - assertRaisesRegexp(TypeError, 'hierarchical', df.count, level=0) + tm.assert_raises_regex( + TypeError, 'hierarchical', df.count, level=0) self.frame['D'] = 'foo' result = self.frame.count(level=0, numeric_only=True) @@ -893,30 +769,31 @@ def test_count_level_series(self): result = s.count(level=0) expected = s.groupby(level=0).count() - assert_series_equal(result.astype('f8'), - expected.reindex(result.index).fillna(0)) + tm.assert_series_equal( + result.astype('f8'), expected.reindex(result.index).fillna(0)) result = s.count(level=1) expected = s.groupby(level=1).count() - assert_series_equal(result.astype('f8'), - expected.reindex(result.index).fillna(0)) + tm.assert_series_equal( + result.astype('f8'), expected.reindex(result.index).fillna(0)) def 
test_count_level_corner(self): s = self.frame['A'][:0] result = s.count(level=0) expected = Series(0, index=s.index.levels[0], name='A') - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) df = self.frame[:0] result = df.count(level=0) expected = DataFrame({}, index=s.index.levels[0], columns=df.columns).fillna(0).astype(np.int64) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_get_level_number_out_of_bounds(self): - with assertRaisesRegexp(IndexError, "Too many levels"): + with tm.assert_raises_regex(IndexError, "Too many levels"): self.frame.index._get_level_number(2) - with assertRaisesRegexp(IndexError, "not a valid level number"): + with tm.assert_raises_regex(IndexError, + "not a valid level number"): self.frame.index._get_level_number(-3) def test_unstack(self): @@ -938,56 +815,56 @@ def test_unstack_multiple_no_empty_columns(self): unstacked = s.unstack([1, 2]) expected = unstacked.dropna(axis=1, how='all') - assert_frame_equal(unstacked, expected) + tm.assert_frame_equal(unstacked, expected) def test_stack(self): # regular roundtrip unstacked = self.ymd.unstack() restacked = unstacked.stack() - assert_frame_equal(restacked, self.ymd) + tm.assert_frame_equal(restacked, self.ymd) unlexsorted = self.ymd.sort_index(level=2) unstacked = unlexsorted.unstack(2) restacked = unstacked.stack() - assert_frame_equal(restacked.sort_index(level=0), self.ymd) + tm.assert_frame_equal(restacked.sort_index(level=0), self.ymd) unlexsorted = unlexsorted[::-1] unstacked = unlexsorted.unstack(1) restacked = unstacked.stack().swaplevel(1, 2) - assert_frame_equal(restacked.sort_index(level=0), self.ymd) + tm.assert_frame_equal(restacked.sort_index(level=0), self.ymd) unlexsorted = unlexsorted.swaplevel(0, 1) unstacked = unlexsorted.unstack(0).swaplevel(0, 1, axis=1) restacked = unstacked.stack(0).swaplevel(1, 2) - assert_frame_equal(restacked.sort_index(level=0), self.ymd) + tm.assert_frame_equal(restacked.sort_index(level=0), self.ymd) # columns unsorted unstacked = self.ymd.unstack() unstacked = unstacked.sort_index(axis=1, ascending=False) restacked = unstacked.stack() - assert_frame_equal(restacked, self.ymd) + tm.assert_frame_equal(restacked, self.ymd) # more than 2 levels in the columns unstacked = self.ymd.unstack(1).unstack(1) result = unstacked.stack(1) expected = self.ymd.unstack() - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) result = unstacked.stack(2) expected = self.ymd.unstack(1) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) result = unstacked.stack(0) expected = self.ymd.stack().unstack(1).unstack(1) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) # not all levels present in each echelon unstacked = self.ymd.unstack(2).loc[:, ::3] stacked = unstacked.stack().stack() ymd_stacked = self.ymd.stack() - assert_series_equal(stacked, ymd_stacked.reindex(stacked.index)) + tm.assert_series_equal(stacked, ymd_stacked.reindex(stacked.index)) # stack with negative number result = self.ymd.unstack(0).stack(-2) @@ -995,8 +872,8 @@ def test_stack(self): # GH10417 def check(left, right): - assert_series_equal(left, right) - self.assertFalse(left.index.is_unique) + tm.assert_series_equal(left, right) + assert not left.index.is_unique li, ri = left.index, right.index tm.assert_index_equal(li, ri) @@ -1051,7 +928,7 @@ def test_unstack_odd_failure(self): result = df.unstack(2) recons = result.stack() - assert_frame_equal(recons, df) + 
tm.assert_frame_equal(recons, df) def test_stack_mixed_dtype(self): df = self.frame.T @@ -1059,10 +936,10 @@ def test_stack_mixed_dtype(self): df = df.sort_index(level=1, axis=1) stacked = df.stack() - result = df['foo'].stack() - assert_series_equal(stacked['foo'], result, check_names=False) - self.assertIs(result.name, None) - self.assertEqual(stacked['bar'].dtype, np.float_) + result = df['foo'].stack().sort_index() + tm.assert_series_equal(stacked['foo'], result, check_names=False) + assert result.name is None + assert stacked['bar'].dtype == np.float_ def test_unstack_bug(self): df = DataFrame({'state': ['naive', 'naive', 'naive', 'activ', 'activ', @@ -1076,73 +953,74 @@ def test_unstack_bug(self): unstacked = result.unstack() restacked = unstacked.stack() - assert_series_equal(restacked, - result.reindex(restacked.index).astype(float)) + tm.assert_series_equal( + restacked, result.reindex(restacked.index).astype(float)) def test_stack_unstack_preserve_names(self): unstacked = self.frame.unstack() - self.assertEqual(unstacked.index.name, 'first') - self.assertEqual(unstacked.columns.names, ['exp', 'second']) + assert unstacked.index.name == 'first' + assert unstacked.columns.names == ['exp', 'second'] restacked = unstacked.stack() - self.assertEqual(restacked.index.names, self.frame.index.names) + assert restacked.index.names == self.frame.index.names def test_unstack_level_name(self): result = self.frame.unstack('second') expected = self.frame.unstack(level=1) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_stack_level_name(self): unstacked = self.frame.unstack('second') result = unstacked.stack('exp') expected = self.frame.unstack().stack(0) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) result = self.frame.stack('exp') expected = self.frame.stack() - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) def test_stack_unstack_multiple(self): unstacked = self.ymd.unstack(['year', 'month']) expected = self.ymd.unstack('year').unstack('month') - assert_frame_equal(unstacked, expected) - self.assertEqual(unstacked.columns.names, expected.columns.names) + tm.assert_frame_equal(unstacked, expected) + assert unstacked.columns.names == expected.columns.names # series s = self.ymd['A'] s_unstacked = s.unstack(['year', 'month']) - assert_frame_equal(s_unstacked, expected['A']) + tm.assert_frame_equal(s_unstacked, expected['A']) restacked = unstacked.stack(['year', 'month']) restacked = restacked.swaplevel(0, 1).swaplevel(1, 2) restacked = restacked.sort_index(level=0) - assert_frame_equal(restacked, self.ymd) - self.assertEqual(restacked.index.names, self.ymd.index.names) + tm.assert_frame_equal(restacked, self.ymd) + assert restacked.index.names == self.ymd.index.names # GH #451 unstacked = self.ymd.unstack([1, 2]) expected = self.ymd.unstack(1).unstack(1).dropna(axis=1, how='all') - assert_frame_equal(unstacked, expected) + tm.assert_frame_equal(unstacked, expected) unstacked = self.ymd.unstack([2, 1]) expected = self.ymd.unstack(2).unstack(1).dropna(axis=1, how='all') - assert_frame_equal(unstacked, expected.loc[:, unstacked.columns]) + tm.assert_frame_equal(unstacked, expected.loc[:, unstacked.columns]) def test_stack_names_and_numbers(self): unstacked = self.ymd.unstack(['year', 'month']) # Can't use mixture of names and numbers to stack - with assertRaisesRegexp(ValueError, "level should contain"): + with tm.assert_raises_regex(ValueError, "level should contain"): unstacked.stack([0, 
'month']) def test_stack_multiple_out_of_bounds(self): # nlevels == 3 unstacked = self.ymd.unstack(['year', 'month']) - with assertRaisesRegexp(IndexError, "Too many levels"): + with tm.assert_raises_regex(IndexError, "Too many levels"): unstacked.stack([2, 3]) - with assertRaisesRegexp(IndexError, "not a valid level number"): + with tm.assert_raises_regex(IndexError, + "not a valid level number"): unstacked.stack([-4, -3]) def test_unstack_period_series(self): @@ -1165,9 +1043,9 @@ def test_unstack_period_series(self): columns=['A', 'B']) expected.columns.name = 'str' - assert_frame_equal(result1, expected) - assert_frame_equal(result2, expected) - assert_frame_equal(result3, expected.T) + tm.assert_frame_equal(result1, expected) + tm.assert_frame_equal(result2, expected) + tm.assert_frame_equal(result3, expected.T) idx1 = pd.PeriodIndex(['2013-01', '2013-01', '2013-02', '2013-02', '2013-03', '2013-03'], freq='M', name='period1') @@ -1191,9 +1069,9 @@ def test_unstack_period_series(self): [6, 5, np.nan, np.nan, np.nan, np.nan]], index=e_idx, columns=e_cols) - assert_frame_equal(result1, expected) - assert_frame_equal(result2, expected) - assert_frame_equal(result3, expected.T) + tm.assert_frame_equal(result1, expected) + tm.assert_frame_equal(result2, expected) + tm.assert_frame_equal(result3, expected.T) def test_unstack_period_frame(self): # GH 4342 @@ -1218,8 +1096,8 @@ def test_unstack_period_frame(self): expected = DataFrame([[5, 1, 6, 2, 6, 1], [4, 2, 3, 3, 5, 4]], index=e_1, columns=e_cols) - assert_frame_equal(result1, expected) - assert_frame_equal(result2, expected) + tm.assert_frame_equal(result1, expected) + tm.assert_frame_equal(result2, expected) e_1 = pd.PeriodIndex(['2014-01', '2014-02', '2014-01', '2014-02'], freq='M', name='period1') @@ -1229,7 +1107,7 @@ def test_unstack_period_frame(self): expected = DataFrame([[5, 4, 2, 3], [1, 2, 6, 5], [6, 3, 1, 4]], index=e_2, columns=e_cols) - assert_frame_equal(result3, expected) + tm.assert_frame_equal(result3, expected) def test_stack_multiple_bug(self): """ bug when some uniques are not present in the data #3170""" @@ -1247,7 +1125,7 @@ def test_stack_multiple_bug(self): rs = down.stack('ID') xp = unst.loc[:, ['VAR1']].resample('W-THU').mean().stack('ID') xp.columns.name = 'Params' - assert_frame_equal(rs, xp) + tm.assert_frame_equal(rs, xp) def test_stack_dropna(self): # GH #3997 @@ -1255,10 +1133,10 @@ def test_stack_dropna(self): df = df.set_index(['A', 'B']) stacked = df.unstack().stack(dropna=False) - self.assertTrue(len(stacked) > len(stacked.dropna())) + assert len(stacked) > len(stacked.dropna()) stacked = df.unstack().stack(dropna=True) - assert_frame_equal(stacked, stacked.dropna()) + tm.assert_frame_equal(stacked, stacked.dropna()) def test_unstack_multiple_hierarchical(self): df = DataFrame(index=[[0, 0, 0, 0, 1, 1, 1, 1], @@ -1281,7 +1159,7 @@ def test_groupby_transform(self): applied = grouped.apply(lambda x: x * 2) expected = grouped.transform(lambda x: x * 2) result = applied.reindex(expected.index) - assert_series_equal(result, expected, check_names=False) + tm.assert_series_equal(result, expected, check_names=False) def test_unstack_sparse_keyspace(self): # memory problems with naive impl #2278 @@ -1310,10 +1188,41 @@ def test_unstack_unobserved_keys(self): df = DataFrame(np.random.randn(4, 2), index=index) result = df.unstack() - self.assertEqual(len(result.columns), 4) + assert len(result.columns) == 4 recons = result.stack() - assert_frame_equal(recons, df) + tm.assert_frame_equal(recons, df) + + def 
test_stack_order_with_unsorted_levels(self): + # GH 16323 + + def manual_compare_stacked(df, df_stacked, lev0, lev1): + assert all(df.loc[row, col] == + df_stacked.loc[(row, col[lev0]), col[lev1]] + for row in df.index for col in df.columns) + + # deep check for 1-row case + for width in [2, 3]: + levels_poss = itertools.product( + itertools.permutations([0, 1, 2], width), + repeat=2) + + for levels in levels_poss: + columns = MultiIndex(levels=levels, + labels=[[0, 0, 1, 1], + [0, 1, 0, 1]]) + df = DataFrame(columns=columns, data=[range(4)]) + for stack_lev in range(2): + df_stacked = df.stack(stack_lev) + manual_compare_stacked(df, df_stacked, + stack_lev, 1 - stack_lev) + + # check multi-row case + mi = MultiIndex(levels=[["A", "C", "B"], ["B", "A", "C"]], + labels=[np.repeat(range(3), 3), np.tile(range(3), 3)]) + df = DataFrame(columns=mi, index=range(5), + data=np.arange(5 * len(mi)).reshape(5, -1)) + manual_compare_stacked(df, df.stack(0), 0, 1) def test_groupby_corner(self): midx = MultiIndex(levels=[['foo'], ['bar'], ['baz']], @@ -1334,7 +1243,7 @@ def test_groupby_level_no_obs(self): grouped = df1.groupby(axis=1, level=0) result = grouped.sum() - self.assertTrue((result.columns == ['f2', 'f3']).all()) + assert (result.columns == ['f2', 'f3']).all() def test_join(self): a = self.frame.loc[self.frame.index[:5], ['A']] @@ -1344,69 +1253,70 @@ def test_join(self): expected = self.frame.copy() expected.values[np.isnan(joined.values)] = np.nan - self.assertFalse(np.isnan(joined.values).all()) + assert not np.isnan(joined.values).all() - assert_frame_equal(joined, expected, check_names=False - ) # TODO what should join do with names ? + # TODO what should join do with names ? + tm.assert_frame_equal(joined, expected, check_names=False) def test_swaplevel(self): swapped = self.frame['A'].swaplevel() swapped2 = self.frame['A'].swaplevel(0) swapped3 = self.frame['A'].swaplevel(0, 1) swapped4 = self.frame['A'].swaplevel('first', 'second') - self.assertFalse(swapped.index.equals(self.frame.index)) - assert_series_equal(swapped, swapped2) - assert_series_equal(swapped, swapped3) - assert_series_equal(swapped, swapped4) + assert not swapped.index.equals(self.frame.index) + tm.assert_series_equal(swapped, swapped2) + tm.assert_series_equal(swapped, swapped3) + tm.assert_series_equal(swapped, swapped4) back = swapped.swaplevel() back2 = swapped.swaplevel(0) back3 = swapped.swaplevel(0, 1) back4 = swapped.swaplevel('second', 'first') - self.assertTrue(back.index.equals(self.frame.index)) - assert_series_equal(back, back2) - assert_series_equal(back, back3) - assert_series_equal(back, back4) + assert back.index.equals(self.frame.index) + tm.assert_series_equal(back, back2) + tm.assert_series_equal(back, back3) + tm.assert_series_equal(back, back4) ft = self.frame.T swapped = ft.swaplevel('first', 'second', axis=1) exp = self.frame.swaplevel('first', 'second').T - assert_frame_equal(swapped, exp) + tm.assert_frame_equal(swapped, exp) def test_swaplevel_panel(self): - panel = Panel({'ItemA': self.frame, 'ItemB': self.frame * 2}) - expected = panel.copy() - expected.major_axis = expected.major_axis.swaplevel(0, 1) + with catch_warnings(record=True): + panel = Panel({'ItemA': self.frame, 'ItemB': self.frame * 2}) + expected = panel.copy() + expected.major_axis = expected.major_axis.swaplevel(0, 1) - for result in (panel.swaplevel(axis='major'), - panel.swaplevel(0, axis='major'), - panel.swaplevel(0, 1, axis='major')): - tm.assert_panel_equal(result, expected) + for result in 
(panel.swaplevel(axis='major'), + panel.swaplevel(0, axis='major'), + panel.swaplevel(0, 1, axis='major')): + tm.assert_panel_equal(result, expected) def test_reorder_levels(self): result = self.ymd.reorder_levels(['month', 'day', 'year']) expected = self.ymd.swaplevel(0, 1).swaplevel(1, 2) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) result = self.ymd['A'].reorder_levels(['month', 'day', 'year']) expected = self.ymd['A'].swaplevel(0, 1).swaplevel(1, 2) - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) result = self.ymd.T.reorder_levels(['month', 'day', 'year'], axis=1) expected = self.ymd.T.swaplevel(0, 1, axis=1).swaplevel(1, 2, axis=1) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) - with assertRaisesRegexp(TypeError, 'hierarchical axis'): + with tm.assert_raises_regex(TypeError, 'hierarchical axis'): self.ymd.reorder_levels([1, 2], axis=1) - with assertRaisesRegexp(IndexError, 'Too many levels'): + with tm.assert_raises_regex(IndexError, 'Too many levels'): self.ymd.index.reorder_levels([1, 2, 3]) def test_insert_index(self): df = self.ymd[:5].T df[2000, 1, 10] = df[2000, 1, 7] - tm.assertIsInstance(df.columns, MultiIndex) - self.assertTrue((df[2000, 1, 10] == df[2000, 1, 7]).all()) + assert isinstance(df.columns, MultiIndex) + assert (df[2000, 1, 10] == df[2000, 1, 7]).all() def test_alignment(self): x = Series(data=[1, 2, 3], index=MultiIndex.from_tuples([("A", 1), ( @@ -1418,29 +1328,13 @@ def test_alignment(self): res = x - y exp_index = x.index.union(y.index) exp = x.reindex(exp_index) - y.reindex(exp_index) - assert_series_equal(res, exp) + tm.assert_series_equal(res, exp) # hit non-monotonic code path res = x[::-1] - y[::-1] exp_index = x.index.union(y.index) exp = x.reindex(exp_index) - y.reindex(exp_index) - assert_series_equal(res, exp) - - def test_is_lexsorted(self): - levels = [[0, 1], [0, 1, 2]] - - index = MultiIndex(levels=levels, - labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]]) - self.assertTrue(index.is_lexsorted()) - - index = MultiIndex(levels=levels, - labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 2, 1]]) - self.assertFalse(index.is_lexsorted()) - - index = MultiIndex(levels=levels, - labels=[[0, 0, 1, 0, 1, 1], [0, 1, 0, 2, 2, 1]]) - self.assertFalse(index.is_lexsorted()) - self.assertEqual(index.lexsort_depth, 0) + tm.assert_series_equal(res, exp) def test_frame_getitem_view(self): df = self.frame.T.copy() @@ -1448,7 +1342,7 @@ def test_frame_getitem_view(self): # this works because we are modifying the underlying array # really a no-no df['foo'].values[:] = 0 - self.assertTrue((df['foo'].values == 0).all()) + assert (df['foo'].values == 0).all() # but not if it's mixed-type df['foo', 'four'] = 'foo' @@ -1459,50 +1353,13 @@ def f(): df['foo']['one'] = 2 return df - self.assertRaises(com.SettingWithCopyError, f) + pytest.raises(com.SettingWithCopyError, f) try: df = f() except: pass - self.assertTrue((df['foo', 'one'] == 0).all()) - - def test_frame_getitem_not_sorted(self): - df = self.frame.T - df['foo', 'four'] = 'foo' - - arrays = [np.array(x) for x in zip(*df.columns._tuple_index)] - - result = df['foo'] - result2 = df.loc[:, 'foo'] - expected = df.reindex(columns=df.columns[arrays[0] == 'foo']) - expected.columns = expected.columns.droplevel(0) - assert_frame_equal(result, expected) - assert_frame_equal(result2, expected) - - df = df.T - result = df.xs('foo') - result2 = df.loc['foo'] - expected = df.reindex(df.index[arrays[0] == 'foo']) - expected.index = 
expected.index.droplevel(0) - assert_frame_equal(result, expected) - assert_frame_equal(result2, expected) - - def test_series_getitem_not_sorted(self): - arrays = [['bar', 'bar', 'baz', 'baz', 'qux', 'qux', 'foo', 'foo'], - ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] - tuples = lzip(*arrays) - index = MultiIndex.from_tuples(tuples) - s = Series(randn(8), index=index) - - arrays = [np.array(x) for x in zip(*index._tuple_index)] - - result = s['qux'] - result2 = s.loc['qux'] - expected = s[arrays[0] == 'qux'] - expected.index = expected.index.droplevel(0) - assert_series_equal(result, expected) - assert_series_equal(result2, expected) + assert (df['foo', 'one'] == 0).all() def test_count(self): frame = self.frame.copy() @@ -1510,27 +1367,27 @@ def test_count(self): result = frame.count(level='b') expect = self.frame.count(level=1) - assert_frame_equal(result, expect, check_names=False) + tm.assert_frame_equal(result, expect, check_names=False) result = frame.count(level='a') expect = self.frame.count(level=0) - assert_frame_equal(result, expect, check_names=False) + tm.assert_frame_equal(result, expect, check_names=False) series = self.series.copy() series.index.names = ['a', 'b'] result = series.count(level='b') expect = self.series.count(level=1) - assert_series_equal(result, expect, check_names=False) - self.assertEqual(result.index.name, 'b') + tm.assert_series_equal(result, expect, check_names=False) + assert result.index.name == 'b' result = series.count(level='a') expect = self.series.count(level=0) - assert_series_equal(result, expect, check_names=False) - self.assertEqual(result.index.name, 'a') + tm.assert_series_equal(result, expect, check_names=False) + assert result.index.name == 'a' - self.assertRaises(KeyError, series.count, 'x') - self.assertRaises(KeyError, frame.count, level='x') + pytest.raises(KeyError, series.count, 'x') + pytest.raises(KeyError, frame.count, level='x') AGG_FUNCTIONS = ['sum', 'prod', 'min', 'max', 'median', 'mean', 'skew', 'mad', 'std', 'var', 'sem'] @@ -1543,7 +1400,7 @@ def test_series_group_min_max(self): # skipna=True leftside = grouped.agg(aggf) rightside = getattr(self.series, op)(level=level, skipna=skipna) - assert_series_equal(leftside, rightside) + tm.assert_series_equal(leftside, rightside) def test_frame_group_ops(self): self.frame.iloc[1, [1, 2]] = np.nan @@ -1552,6 +1409,7 @@ def test_frame_group_ops(self): for op, level, axis, skipna in cart_product(self.AGG_FUNCTIONS, lrange(2), lrange(2), [False, True]): + if axis == 0: frame = self.frame else: @@ -1572,17 +1430,17 @@ def aggf(x): # for good measure, groupby detail level_index = frame._get_axis(axis).levels[level] - self.assert_index_equal(leftside._get_axis(axis), level_index) - self.assert_index_equal(rightside._get_axis(axis), level_index) + tm.assert_index_equal(leftside._get_axis(axis), level_index) + tm.assert_index_equal(rightside._get_axis(axis), level_index) - assert_frame_equal(leftside, rightside) + tm.assert_frame_equal(leftside, rightside) def test_stat_op_corner(self): obj = Series([10.0], index=MultiIndex.from_tuples([(2, 3)])) result = obj.sum(level=0) expected = Series([10.0], index=[2]) - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) def test_frame_any_all_group(self): df = DataFrame( @@ -1593,11 +1451,11 @@ def test_frame_any_all_group(self): result = df.any(level=0) ex = DataFrame({'data': [False, True]}, index=['one', 'two']) - assert_frame_equal(result, ex) + tm.assert_frame_equal(result, ex) result = 
df.all(level=0) ex = DataFrame({'data': [False, False]}, index=['one', 'two']) - assert_frame_equal(result, ex) + tm.assert_frame_equal(result, ex) def test_std_var_pass_ddof(self): index = MultiIndex.from_arrays([np.arange(5).repeat(10), np.tile( @@ -1610,20 +1468,20 @@ def test_std_var_pass_ddof(self): result = getattr(df[0], meth)(level=0, ddof=ddof) expected = df[0].groupby(level=0).agg(alt) - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) result = getattr(df, meth)(level=0, ddof=ddof) expected = df.groupby(level=0).agg(alt) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_frame_series_agg_multiple_levels(self): result = self.ymd.sum(level=['year', 'month']) expected = self.ymd.groupby(level=['year', 'month']).sum() - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) result = self.ymd['A'].sum(level=['year', 'month']) expected = self.ymd['A'].groupby(level=['year', 'month']).sum() - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) def test_groupby_multilevel(self): result = self.ymd.groupby(level=[0, 1]).mean() @@ -1633,12 +1491,12 @@ def test_groupby_multilevel(self): expected = self.ymd.groupby([k1, k2]).mean() - assert_frame_equal(result, expected, check_names=False - ) # TODO groupby with level_values drops names - self.assertEqual(result.index.names, self.ymd.index.names[:2]) + # TODO groupby with level_values drops names + tm.assert_frame_equal(result, expected, check_names=False) + assert result.index.names == self.ymd.index.names[:2] result2 = self.ymd.groupby(level=self.ymd.index.names[:2]).mean() - assert_frame_equal(result, result2) + tm.assert_frame_equal(result, result2) def test_groupby_multilevel_with_transform(self): pass @@ -1648,18 +1506,18 @@ def test_multilevel_consolidate(self): 'bar', 'one'), ('bar', 'two')]) df = DataFrame(np.random.randn(4, 4), index=index, columns=index) df['Totals', ''] = df.sum(1) - df = df.consolidate() + df = df._consolidate() def test_ix_preserve_names(self): result = self.ymd.loc[2000] result2 = self.ymd['A'].loc[2000] - self.assertEqual(result.index.names, self.ymd.index.names[1:]) - self.assertEqual(result2.index.names, self.ymd.index.names[1:]) + assert result.index.names == self.ymd.index.names[1:] + assert result2.index.names == self.ymd.index.names[1:] result = self.ymd.loc[2000, 2] result2 = self.ymd['A'].loc[2000, 2] - self.assertEqual(result.index.name, self.ymd.index.names[2]) - self.assertEqual(result2.index.name, self.ymd.index.names[2]) + assert result.index.name == self.ymd.index.names[2] + assert result2.index.name == self.ymd.index.names[2] def test_partial_set(self): # GH #397 @@ -1667,19 +1525,19 @@ def test_partial_set(self): exp = self.ymd.copy() df.loc[2000, 4] = 0 exp.loc[2000, 4].values[:] = 0 - assert_frame_equal(df, exp) + tm.assert_frame_equal(df, exp) df['A'].loc[2000, 4] = 1 exp['A'].loc[2000, 4].values[:] = 1 - assert_frame_equal(df, exp) + tm.assert_frame_equal(df, exp) df.loc[2000] = 5 exp.loc[2000].values[:] = 5 - assert_frame_equal(df, exp) + tm.assert_frame_equal(df, exp) # this works...for now df['A'].iloc[14] = 5 - self.assertEqual(df['A'][14], 5) + assert df['A'][14] == 5 def test_unstack_preserve_types(self): # GH #403 @@ -1687,9 +1545,9 @@ def test_unstack_preserve_types(self): self.ymd['F'] = 2 unstacked = self.ymd.unstack('month') - self.assertEqual(unstacked['A', 1].dtype, np.float64) - self.assertEqual(unstacked['E', 1].dtype, np.object_) - 
self.assertEqual(unstacked['F', 1].dtype, np.float64) + assert unstacked['A', 1].dtype == np.float64 + assert unstacked['E', 1].dtype == np.object_ + assert unstacked['F', 1].dtype == np.float64 def test_unstack_group_index_overflow(self): labels = np.tile(np.arange(500), 2) @@ -1700,11 +1558,11 @@ def test_unstack_group_index_overflow(self): s = Series(np.arange(1000), index=index) result = s.unstack() - self.assertEqual(result.shape, (500, 2)) + assert result.shape == (500, 2) # test roundtrip stacked = result.stack() - assert_series_equal(s, stacked.reindex(s.index)) + tm.assert_series_equal(s, stacked.reindex(s.index)) # put it at beginning index = MultiIndex(levels=[[0, 1]] + [level] * 8, @@ -1712,7 +1570,7 @@ def test_unstack_group_index_overflow(self): s = Series(np.arange(1000), index=index) result = s.unstack(0) - self.assertEqual(result.shape, (500, 2)) + assert result.shape == (500, 2) # put it in middle index = MultiIndex(levels=[level] * 4 + [[0, 1]] + [level] * 4, @@ -1721,34 +1579,34 @@ def test_unstack_group_index_overflow(self): s = Series(np.arange(1000), index=index) result = s.unstack(4) - self.assertEqual(result.shape, (500, 2)) + assert result.shape == (500, 2) def test_getitem_lowerdim_corner(self): - self.assertRaises(KeyError, self.frame.loc.__getitem__, - (('bar', 'three'), 'B')) + pytest.raises(KeyError, self.frame.loc.__getitem__, + (('bar', 'three'), 'B')) # in theory should be inserting in a sorted space???? self.frame.loc[('bar', 'three'), 'B'] = 0 - self.assertEqual(self.frame.sort_index().loc[('bar', 'three'), 'B'], 0) + assert self.frame.sort_index().loc[('bar', 'three'), 'B'] == 0 # --------------------------------------------------------------------- # AMBIGUOUS CASES! def test_partial_ix_missing(self): - raise nose.SkipTest("skipping for now") + pytest.skip("skipping for now") result = self.ymd.loc[2000, 0] expected = self.ymd.loc[2000]['A'] - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) # need to put in some work here # self.ymd.loc[2000, 0] = 0 - # self.assertTrue((self.ymd.loc[2000]['A'] == 0).all()) + # assert (self.ymd.loc[2000]['A'] == 0).all() # Pretty sure the second (and maybe even the first) is already wrong. 
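
The partial-indexing cases flagged as ambiguous here all go through the year/month/day `ymd` fixture; the frame below is a hand-built approximation of that fixture (an assumption, not the fixture itself), sketched to show the level-dropping behaviour the surrounding assertions rely on:

```python
# Hedged sketch of partial .loc indexing on a year/month/day MultiIndex.
# `ymd` here is a hand-built approximation of the test fixture.
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([[2000], [1, 2], [1, 2, 3]],
                                 names=['year', 'month', 'day'])
ymd = pd.DataFrame({'A': np.arange(6.0)}, index=idx)

# selecting a single outer key drops that level from the result
assert ymd.loc[2000].index.names == ['month', 'day']

# a tuple key consumes one level per element, leaving only 'day'
assert ymd.loc[(2000, 2)].index.name == 'day'
```
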
- self.assertRaises(Exception, self.ymd.loc.__getitem__, (2000, 6)) - self.assertRaises(Exception, self.ymd.loc.__getitem__, (2000, 6), 0) + pytest.raises(Exception, self.ymd.loc.__getitem__, (2000, 6)) + pytest.raises(Exception, self.ymd.loc.__getitem__, (2000, 6), 0) # --------------------------------------------------------------------- @@ -1769,17 +1627,17 @@ def test_level_with_tuples(self): result2 = series.loc[('foo', 'bar', 0)] expected = series[:2] expected.index = expected.index.droplevel(0) - assert_series_equal(result, expected) - assert_series_equal(result2, expected) + tm.assert_series_equal(result, expected) + tm.assert_series_equal(result2, expected) - self.assertRaises(KeyError, series.__getitem__, (('foo', 'bar', 0), 2)) + pytest.raises(KeyError, series.__getitem__, (('foo', 'bar', 0), 2)) result = frame.loc[('foo', 'bar', 0)] result2 = frame.xs(('foo', 'bar', 0)) expected = frame[:2] expected.index = expected.index.droplevel(0) - assert_frame_equal(result, expected) - assert_frame_equal(result2, expected) + tm.assert_frame_equal(result, expected) + tm.assert_frame_equal(result2, expected) index = MultiIndex(levels=[[('foo', 'bar'), ('foo', 'baz'), ( 'foo', 'qux')], [0, 1]], @@ -1792,49 +1650,56 @@ def test_level_with_tuples(self): result2 = series.loc[('foo', 'bar')] expected = series[:2] expected.index = expected.index.droplevel(0) - assert_series_equal(result, expected) - assert_series_equal(result2, expected) + tm.assert_series_equal(result, expected) + tm.assert_series_equal(result2, expected) result = frame.loc[('foo', 'bar')] result2 = frame.xs(('foo', 'bar')) expected = frame[:2] expected.index = expected.index.droplevel(0) - assert_frame_equal(result, expected) - assert_frame_equal(result2, expected) + tm.assert_frame_equal(result, expected) + tm.assert_frame_equal(result2, expected) def test_int_series_slicing(self): s = self.ymd['A'] result = s[5:] expected = s.reindex(s.index[5:]) - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) exp = self.ymd['A'].copy() s[5:] = 0 exp.values[5:] = 0 - self.assert_numpy_array_equal(s.values, exp.values) + tm.assert_numpy_array_equal(s.values, exp.values) result = self.ymd[5:] expected = self.ymd.reindex(s.index[5:]) - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) + + @pytest.mark.parametrize('unicode_strings', [True, False]) + def test_mixed_depth_get(self, unicode_strings): + # If unicode_strings is True, the column labels in dataframe + # construction will use unicode strings in Python 2 (pull request + # #17099). 
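
The parametrized `test_mixed_depth_get` introduced just above exercises columns whose short keys are padded with empty strings. As a standalone illustration (hand-built frame and an assumed `pandas.util.testing` import, not part of the patch), selecting the short key returns the padded column as a Series renamed to that key:

```python
# Hedged sketch of "mixed depth" column selection: short keys are padded
# with '' and selecting 'a' squeezes the ('a', '', '') column to a Series.
import numpy as np
import pandas as pd
import pandas.util.testing as tm

arrays = [['a', 'top', 'top'],
          ['', 'OD', 'OD'],
          ['', 'wx', 'wy']]
columns = pd.MultiIndex.from_tuples(sorted(zip(*arrays)))
df = pd.DataFrame(np.random.randn(4, 3), columns=columns)

result = df['a']
expected = df['a', '', ''].rename('a')
tm.assert_series_equal(result, expected)
```
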
- def test_mixed_depth_get(self): arrays = [['a', 'top', 'top', 'routine1', 'routine1', 'routine2'], ['', 'OD', 'OD', 'result1', 'result2', 'result1'], ['', 'wx', 'wy', '', '', '']] + if unicode_strings: + arrays = [[u(s) for s in arr] for arr in arrays] + tuples = sorted(zip(*arrays)) index = MultiIndex.from_tuples(tuples) - df = DataFrame(randn(4, 6), columns=index) + df = DataFrame(np.random.randn(4, 6), columns=index) result = df['a'] - expected = df['a', '', ''] - assert_series_equal(result, expected, check_names=False) - self.assertEqual(result.name, 'a') + expected = df['a', '', ''].rename('a') + tm.assert_series_equal(result, expected) result = df['routine1', 'result1'] expected = df['routine1', 'result1', ''] - assert_series_equal(result, expected, check_names=False) - self.assertEqual(result.name, ('routine1', 'result1')) + expected = expected.rename(('routine1', 'result1')) + tm.assert_series_equal(result, expected) def test_mixed_depth_insert(self): arrays = [['a', 'top', 'top', 'routine1', 'routine1', 'routine2'], @@ -1849,7 +1714,7 @@ def test_mixed_depth_insert(self): expected = df.copy() result['b'] = [1, 2, 3, 4] expected['b', '', ''] = [1, 2, 3, 4] - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_mixed_depth_drop(self): arrays = [['a', 'top', 'top', 'routine1', 'routine1', 'routine2'], @@ -1862,16 +1727,16 @@ def test_mixed_depth_drop(self): result = df.drop('a', axis=1) expected = df.drop([('a', '', '')], axis=1) - assert_frame_equal(expected, result) + tm.assert_frame_equal(expected, result) result = df.drop(['top'], axis=1) expected = df.drop([('top', 'OD', 'wx')], axis=1) expected = expected.drop([('top', 'OD', 'wy')], axis=1) - assert_frame_equal(expected, result) + tm.assert_frame_equal(expected, result) result = df.drop(('top', 'OD', 'wx'), axis=1) expected = df.drop([('top', 'OD', 'wx')], axis=1) - assert_frame_equal(expected, result) + tm.assert_frame_equal(expected, result) expected = df.drop([('top', 'OD', 'wy')], axis=1) expected = df.drop('top', axis=1) @@ -1879,7 +1744,7 @@ def test_mixed_depth_drop(self): result = df.drop('result1', level=1, axis=1) expected = df.drop([('routine1', 'result1', ''), ('routine2', 'result1', '')], axis=1) - assert_frame_equal(expected, result) + tm.assert_frame_equal(expected, result) def test_drop_nonunique(self): df = DataFrame([["x-a", "x", "a", 1.5], ["x-a", "x", "a", 1.2], @@ -1900,7 +1765,7 @@ def test_drop_nonunique(self): result.index = expected.index - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_mixed_depth_pop(self): arrays = [['a', 'top', 'top', 'routine1', 'routine1', 'routine2'], @@ -1915,32 +1780,32 @@ def test_mixed_depth_pop(self): df2 = df.copy() result = df1.pop('a') expected = df2.pop(('a', '', '')) - assert_series_equal(expected, result, check_names=False) - assert_frame_equal(df1, df2) - self.assertEqual(result.name, 'a') + tm.assert_series_equal(expected, result, check_names=False) + tm.assert_frame_equal(df1, df2) + assert result.name == 'a' expected = df1['top'] df1 = df1.drop(['top'], axis=1) result = df2.pop('top') - assert_frame_equal(expected, result) - assert_frame_equal(df1, df2) + tm.assert_frame_equal(expected, result) + tm.assert_frame_equal(df1, df2) def test_reindex_level_partial_selection(self): result = self.frame.reindex(['foo', 'qux'], level=0) expected = self.frame.iloc[[0, 1, 2, 7, 8, 9]] - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) result = 
self.frame.T.reindex_axis(['foo', 'qux'], axis=1, level=0) - assert_frame_equal(result, expected.T) + tm.assert_frame_equal(result, expected.T) result = self.frame.loc[['foo', 'qux']] - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) result = self.frame['A'].loc[['foo', 'qux']] - assert_series_equal(result, expected['A']) + tm.assert_series_equal(result, expected['A']) result = self.frame.T.loc[:, ['foo', 'qux']] - assert_frame_equal(result, expected.T) + tm.assert_frame_equal(result, expected.T) def test_setitem_multiple_partial(self): expected = self.frame.copy() @@ -1948,45 +1813,45 @@ def test_setitem_multiple_partial(self): result.loc[['foo', 'bar']] = 0 expected.loc['foo'] = 0 expected.loc['bar'] = 0 - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) expected = self.frame.copy() result = self.frame.copy() result.loc['foo':'bar'] = 0 expected.loc['foo'] = 0 expected.loc['bar'] = 0 - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) expected = self.frame['A'].copy() result = self.frame['A'].copy() result.loc[['foo', 'bar']] = 0 expected.loc['foo'] = 0 expected.loc['bar'] = 0 - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) expected = self.frame['A'].copy() result = self.frame['A'].copy() result.loc['foo':'bar'] = 0 expected.loc['foo'] = 0 expected.loc['bar'] = 0 - assert_series_equal(result, expected) + tm.assert_series_equal(result, expected) def test_drop_level(self): result = self.frame.drop(['bar', 'qux'], level='first') expected = self.frame.iloc[[0, 1, 2, 5, 6]] - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) result = self.frame.drop(['two'], level='second') expected = self.frame.iloc[[0, 2, 3, 6, 7, 9]] - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) result = self.frame.T.drop(['bar', 'qux'], axis=1, level='first') expected = self.frame.iloc[[0, 1, 2, 5, 6]].T - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) result = self.frame.T.drop(['two'], axis=1, level='second') expected = self.frame.iloc[[0, 2, 3, 6, 7, 9]].T - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_drop_level_nonunique_datetime(self): # GH 12701 @@ -2001,11 +1866,11 @@ def test_drop_level_nonunique_datetime(self): df['tstamp'] = idxdt df = df.set_index('tstamp', append=True) ts = pd.Timestamp('201603231600') - self.assertFalse(df.index.is_unique) + assert not df.index.is_unique result = df.drop(ts, level='tstamp') expected = df.loc[idx != 4] - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_drop_preserve_names(self): index = MultiIndex.from_arrays([[0, 0, 0, 1, 1, 1], @@ -2015,7 +1880,7 @@ def test_drop_preserve_names(self): df = DataFrame(np.random.randn(6, 3), index=index) result = df.drop([(0, 2)]) - self.assertEqual(result.index.names, ('one', 'two')) + assert result.index.names == ('one', 'two') def test_unicode_repr_issues(self): levels = [Index([u('a/\u03c3'), u('b/\u03c3'), u('c/\u03c3')]), @@ -2044,7 +1909,7 @@ def test_dataframe_insert_column_all_na(self): df = DataFrame([[1, 2], [3, 4], [5, 6]], index=mix) s = Series({(1, 1): 1, (1, 2): 2}) df['new'] = s - self.assertTrue(df['new'].isnull().all()) + assert df['new'].isna().all() def test_join_segfault(self): # 1532 @@ -2060,11 +1925,11 @@ def test_set_column_scalar_with_ix(self): subset = self.frame.index[[1, 4, 5]] self.frame.loc[subset] = 99 - 
self.assertTrue((self.frame.loc[subset].values == 99).all()) + assert (self.frame.loc[subset].values == 99).all() col = self.frame['B'] col[subset] = 97 - self.assertTrue((self.frame.loc[subset, 'B'] == 97).all()) + assert (self.frame.loc[subset, 'B'] == 97).all() def test_frame_dict_constructor_empty_series(self): s1 = Series([ @@ -2090,8 +1955,8 @@ def test_indexing_ambiguity_bug_1678(self): result = frame.iloc[:, 1] exp = frame.loc[:, ('Ohio', 'Red')] - tm.assertIsInstance(result, Series) - assert_series_equal(result, exp) + assert isinstance(result, Series) + tm.assert_series_equal(result, exp) def test_nonunique_assignment_1750(self): df = DataFrame([[1, 1, "x", "X"], [1, 1, "y", "Y"], [1, 2, "z", "Z"]], @@ -2102,7 +1967,7 @@ def test_nonunique_assignment_1750(self): df.loc[ix, "C"] = '_' - self.assertTrue((df.xs((1, 1))['C'] == '_').all()) + assert (df.xs((1, 1))['C'] == '_').all() def test_indexing_over_hashtable_size_cutoff(self): n = 10000 @@ -2114,9 +1979,9 @@ def test_indexing_over_hashtable_size_cutoff(self): MultiIndex.from_arrays((["a"] * n, np.arange(n)))) # hai it works! - self.assertEqual(s[("a", 5)], 5) - self.assertEqual(s[("a", 6)], 6) - self.assertEqual(s[("a", 7)], 7) + assert s[("a", 5)] == 5 + assert s[("a", 6)] == 6 + assert s[("a", 7)] == 7 _index._SIZE_CUTOFF = old_cutoff @@ -2156,8 +2021,8 @@ def test_tuples_have_na(self): labels=[[1, 1, 1, 1, -1, 0, 0, 0], [0, 1, 2, 3, 0, 1, 2, 3]]) - self.assertTrue(isnull(index[4][0])) - self.assertTrue(isnull(index.values[4][0])) + assert isna(index[4][0]) + assert isna(index.values[4][0]) def test_duplicate_groupby_issues(self): idx_tp = [('600809', '20061231'), ('600809', '20070331'), @@ -2168,7 +2033,7 @@ def test_duplicate_groupby_issues(self): s = Series(dt, index=idx) result = s.groupby(s.index).first() - self.assertEqual(len(result), 3) + assert len(result) == 3 def test_duplicate_mi(self): # GH 4516 @@ -2183,7 +2048,7 @@ def test_duplicate_mi(self): ['foo', 'bar', 5.0, 5]], columns=list('ABCD')).set_index(['A', 'B']) result = df.loc[('foo', 'bar')] - assert_frame_equal(result, expected) + tm.assert_frame_equal(result, expected) def test_duplicated_drop_duplicates(self): # GH 4060 @@ -2193,35 +2058,24 @@ def test_duplicated_drop_duplicates(self): [False, False, False, True, False, False], dtype=bool) duplicated = idx.duplicated() tm.assert_numpy_array_equal(duplicated, expected) - self.assertTrue(duplicated.dtype == bool) + assert duplicated.dtype == bool expected = MultiIndex.from_arrays(([1, 2, 3, 2, 3], [1, 1, 1, 2, 2])) tm.assert_index_equal(idx.drop_duplicates(), expected) expected = np.array([True, False, False, False, False, False]) duplicated = idx.duplicated(keep='last') tm.assert_numpy_array_equal(duplicated, expected) - self.assertTrue(duplicated.dtype == bool) + assert duplicated.dtype == bool expected = MultiIndex.from_arrays(([2, 3, 1, 2, 3], [1, 1, 1, 2, 2])) tm.assert_index_equal(idx.drop_duplicates(keep='last'), expected) expected = np.array([True, False, False, True, False, False]) duplicated = idx.duplicated(keep=False) tm.assert_numpy_array_equal(duplicated, expected) - self.assertTrue(duplicated.dtype == bool) + assert duplicated.dtype == bool expected = MultiIndex.from_arrays(([2, 3, 2, 3], [1, 1, 2, 2])) tm.assert_index_equal(idx.drop_duplicates(keep=False), expected) - # deprecate take_last - expected = np.array([True, False, False, False, False, False]) - with tm.assert_produces_warning(FutureWarning): - duplicated = idx.duplicated(take_last=True) - tm.assert_numpy_array_equal(duplicated, 
expected) - self.assertTrue(duplicated.dtype == bool) - expected = MultiIndex.from_arrays(([2, 3, 1, 2, 3], [1, 1, 1, 2, 2])) - with tm.assert_produces_warning(FutureWarning): - tm.assert_index_equal( - idx.drop_duplicates(take_last=True), expected) - def test_multiindex_set_index(self): # segfault in #3308 d = {'t1': [2, 2.5, 3], 't2': [4, 5, 6]} @@ -2244,8 +2098,8 @@ def test_datetimeindex(self): expected1 = pd.DatetimeIndex(['2013-04-01 9:00', '2013-04-02 9:00', '2013-04-03 9:00'], tz='Asia/Tokyo') - self.assert_index_equal(idx.levels[0], expected1) - self.assert_index_equal(idx.levels[1], idx2) + tm.assert_index_equal(idx.levels[0], expected1) + tm.assert_index_equal(idx.levels[1], idx2) # from datetime combos # GH 7888 @@ -2256,8 +2110,8 @@ def test_datetimeindex(self): for d1, d2 in itertools.product( [date1, date2, date3], [date1, date2, date3]): index = pd.MultiIndex.from_product([[d1], [d2]]) - self.assertIsInstance(index.levels[0], pd.DatetimeIndex) - self.assertIsInstance(index.levels[1], pd.DatetimeIndex) + assert isinstance(index.levels[0], pd.DatetimeIndex) + assert isinstance(index.levels[1], pd.DatetimeIndex) def test_constructor_with_tz(self): @@ -2291,14 +2145,14 @@ def test_set_index_datetime(self): expected = expected.tz_localize('UTC').tz_convert('US/Pacific') df = df.set_index('label', append=True) - self.assert_index_equal(df.index.levels[0], expected) - self.assert_index_equal(df.index.levels[1], - pd.Index(['a', 'b'], name='label')) + tm.assert_index_equal(df.index.levels[0], expected) + tm.assert_index_equal(df.index.levels[1], + pd.Index(['a', 'b'], name='label')) df = df.swaplevel(0, 1) - self.assert_index_equal(df.index.levels[0], - pd.Index(['a', 'b'], name='label')) - self.assert_index_equal(df.index.levels[1], expected) + tm.assert_index_equal(df.index.levels[0], + pd.Index(['a', 'b'], name='label')) + tm.assert_index_equal(df.index.levels[1], expected) df = DataFrame(np.random.random(6)) idx1 = pd.DatetimeIndex(['2011-07-19 07:00:00', '2011-07-19 08:00:00', @@ -2321,14 +2175,14 @@ def test_set_index_datetime(self): expected2 = pd.DatetimeIndex(['2012-04-01 09:00', '2012-04-02 09:00'], tz='US/Eastern') - self.assert_index_equal(df.index.levels[0], expected1) - self.assert_index_equal(df.index.levels[1], expected2) - self.assert_index_equal(df.index.levels[2], idx3) + tm.assert_index_equal(df.index.levels[0], expected1) + tm.assert_index_equal(df.index.levels[1], expected2) + tm.assert_index_equal(df.index.levels[2], idx3) # GH 7092 - self.assert_index_equal(df.index.get_level_values(0), idx1) - self.assert_index_equal(df.index.get_level_values(1), idx2) - self.assert_index_equal(df.index.get_level_values(2), idx3) + tm.assert_index_equal(df.index.get_level_values(0), idx1) + tm.assert_index_equal(df.index.get_level_values(1), idx2) + tm.assert_index_equal(df.index.get_level_values(2), idx3) def test_reset_index_datetime(self): # GH 3950 @@ -2353,7 +2207,7 @@ def test_reset_index_datetime(self): expected['idx1'] = expected['idx1'].apply( lambda d: pd.Timestamp(d, tz=tz)) - assert_frame_equal(df.reset_index(), expected) + tm.assert_frame_equal(df.reset_index(), expected) idx3 = pd.date_range('1/1/2012', periods=5, freq='MS', tz='Europe/Paris', name='idx3') @@ -2380,7 +2234,7 @@ def test_reset_index_datetime(self): lambda d: pd.Timestamp(d, tz=tz)) expected['idx3'] = expected['idx3'].apply( lambda d: pd.Timestamp(d, tz='Europe/Paris')) - assert_frame_equal(df.reset_index(), expected) + tm.assert_frame_equal(df.reset_index(), expected) # GH 7793 idx = 
pd.MultiIndex.from_product([['a', 'b'], pd.date_range( @@ -2398,7 +2252,7 @@ def test_reset_index_datetime(self): columns=['level_0', 'level_1', 'a']) expected['level_1'] = expected['level_1'].apply( lambda d: pd.Timestamp(d, freq='D', tz=tz)) - assert_frame_equal(df.reset_index(), expected) + tm.assert_frame_equal(df.reset_index(), expected) def test_reset_index_period(self): # GH 7746 @@ -2417,7 +2271,61 @@ def test_reset_index_period(self): 'feature': ['a', 'b', 'c'] * 3, 'a': np.arange(9, dtype='int64') }, columns=['month', 'feature', 'a']) - assert_frame_equal(df.reset_index(), expected) + tm.assert_frame_equal(df.reset_index(), expected) + + def test_reset_index_multiindex_columns(self): + levels = [['A', ''], ['B', 'b']] + df = pd.DataFrame([[0, 2], [1, 3]], + columns=pd.MultiIndex.from_tuples(levels)) + result = df[['B']].rename_axis('A').reset_index() + tm.assert_frame_equal(result, df) + + # gh-16120: already existing column + with tm.assert_raises_regex(ValueError, + ("cannot insert \('A', ''\), " + "already exists")): + df.rename_axis('A').reset_index() + + # gh-16164: multiindex (tuple) full key + result = df.set_index([('A', '')]).reset_index() + tm.assert_frame_equal(result, df) + + # with additional (unnamed) index level + idx_col = pd.DataFrame([[0], [1]], + columns=pd.MultiIndex.from_tuples([('level_0', + '')])) + expected = pd.concat([idx_col, df[[('B', 'b'), ('A', '')]]], axis=1) + result = df.set_index([('B', 'b')], append=True).reset_index() + tm.assert_frame_equal(result, expected) + + # with index name which is a too long tuple... + with tm.assert_raises_regex(ValueError, + ("Item must have length equal to number " + "of levels.")): + df.rename_axis([('C', 'c', 'i')]).reset_index() + + # or too short... + levels = [['A', 'a', ''], ['B', 'b', 'i']] + df2 = pd.DataFrame([[0, 2], [1, 3]], + columns=pd.MultiIndex.from_tuples(levels)) + idx_col = pd.DataFrame([[0], [1]], + columns=pd.MultiIndex.from_tuples([('C', + 'c', + 'ii')])) + expected = pd.concat([idx_col, df2], axis=1) + result = df2.rename_axis([('C', 'c')]).reset_index(col_fill='ii') + tm.assert_frame_equal(result, expected) + + # ... 
which is incompatible with col_fill=None
+        with tm.assert_raises_regex(ValueError,
+                                    ("col_fill=None is incompatible with "
+                                     "incomplete column name \('C', 'c'\)")):
+            df2.rename_axis([('C', 'c')]).reset_index(col_fill=None)
+
+        # with col_level != 0
+        result = df2.rename_axis([('c', 'ii')]).reset_index(col_level=1,
+                                                            col_fill='C')
+        tm.assert_frame_equal(result, expected)
 
     def test_set_index_period(self):
         # GH 6631
@@ -2435,13 +2343,13 @@ def test_set_index_period(self):
         expected1 = pd.period_range('2011-01-01', periods=3, freq='M')
         expected2 = pd.period_range('2013-01-01 09:00', periods=2, freq='H')
-        self.assert_index_equal(df.index.levels[0], expected1)
-        self.assert_index_equal(df.index.levels[1], expected2)
-        self.assert_index_equal(df.index.levels[2], idx3)
+        tm.assert_index_equal(df.index.levels[0], expected1)
+        tm.assert_index_equal(df.index.levels[1], expected2)
+        tm.assert_index_equal(df.index.levels[2], idx3)
 
-        self.assert_index_equal(df.index.get_level_values(0), idx1)
-        self.assert_index_equal(df.index.get_level_values(1), idx2)
-        self.assert_index_equal(df.index.get_level_values(2), idx3)
+        tm.assert_index_equal(df.index.get_level_values(0), idx1)
+        tm.assert_index_equal(df.index.get_level_values(1), idx2)
+        tm.assert_index_equal(df.index.get_level_values(2), idx3)
 
     def test_repeat(self):
         # GH 9361
@@ -2477,9 +2385,429 @@ def test_iloc_mi(self):
         result = pd.DataFrame([[df_mi.iloc[r, c] for c in range(2)]
                                for r in range(5)])
-        assert_frame_equal(result, expected)
+        tm.assert_frame_equal(result, expected)
+
+
+class TestSorted(Base):
+    """ everything you wanted to test about sorting """
+
+    def test_sort_index_preserve_levels(self):
+        result = self.frame.sort_index()
+        assert result.index.names == self.frame.index.names
+
+    def test_sorting_repr_8017(self):
+
+        np.random.seed(0)
+        data = np.random.randn(3, 4)
+
+        for gen, extra in [([1., 3., 2., 5.], 4.), ([1, 3, 2, 5], 4),
+                           ([Timestamp('20130101'), Timestamp('20130103'),
+                             Timestamp('20130102'), Timestamp('20130105')],
+                            Timestamp('20130104')),
+                           (['1one', '3one', '2one', '5one'], '4one')]:
+            columns = MultiIndex.from_tuples([('red', i) for i in gen])
+            df = DataFrame(data, index=list('def'), columns=columns)
+            df2 = pd.concat([df,
+                             DataFrame('world', index=list('def'),
+                                       columns=MultiIndex.from_tuples(
+                                           [('red', extra)]))], axis=1)
+
+            # check that the repr is good
+            # make sure that we have a correct sparsified repr
+            # e.g.
only 1 header of read + assert str(df2).splitlines()[0].split() == ['red'] + + # GH 8017 + # sorting fails after columns added + + # construct single-dtype then sort + result = df.copy().sort_index(axis=1) + expected = df.iloc[:, [0, 2, 1, 3]] + tm.assert_frame_equal(result, expected) + + result = df2.sort_index(axis=1) + expected = df2.iloc[:, [0, 2, 1, 4, 3]] + tm.assert_frame_equal(result, expected) + + # setitem then sort + result = df.copy() + result[('red', extra)] = 'world' + + result = result.sort_index(axis=1) + tm.assert_frame_equal(result, expected) + + def test_sort_index_level(self): + df = self.frame.copy() + df.index = np.arange(len(df)) + + # axis=1 + + # series + a_sorted = self.frame['A'].sort_index(level=0) + + # preserve names + assert a_sorted.index.names == self.frame.index.names + + # inplace + rs = self.frame.copy() + rs.sort_index(level=0, inplace=True) + tm.assert_frame_equal(rs, self.frame.sort_index(level=0)) + + def test_sort_index_level_large_cardinality(self): + + # #2684 (int64) + index = MultiIndex.from_arrays([np.arange(4000)] * 3) + df = DataFrame(np.random.randn(4000), index=index, dtype=np.int64) + + # it works! + result = df.sort_index(level=0) + assert result.index.lexsort_depth == 3 + + # #2684 (int32) + index = MultiIndex.from_arrays([np.arange(4000)] * 3) + df = DataFrame(np.random.randn(4000), index=index, dtype=np.int32) + + # it works! + result = df.sort_index(level=0) + assert (result.dtypes.values == df.dtypes.values).all() + assert result.index.lexsort_depth == 3 + + def test_sort_index_level_by_name(self): + self.frame.index.names = ['first', 'second'] + result = self.frame.sort_index(level='second') + expected = self.frame.sort_index(level=1) + tm.assert_frame_equal(result, expected) + + def test_sort_index_level_mixed(self): + sorted_before = self.frame.sort_index(level=1) + + df = self.frame.copy() + df['foo'] = 'bar' + sorted_after = df.sort_index(level=1) + tm.assert_frame_equal(sorted_before, + sorted_after.drop(['foo'], axis=1)) + + dft = self.frame.T + sorted_before = dft.sort_index(level=1, axis=1) + dft['foo', 'three'] = 'bar' + + sorted_after = dft.sort_index(level=1, axis=1) + tm.assert_frame_equal(sorted_before.drop([('foo', 'three')], axis=1), + sorted_after.drop([('foo', 'three')], axis=1)) + + def test_is_lexsorted(self): + levels = [[0, 1], [0, 1, 2]] + + index = MultiIndex(levels=levels, + labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]]) + assert index.is_lexsorted() + + index = MultiIndex(levels=levels, + labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 2, 1]]) + assert not index.is_lexsorted() + + index = MultiIndex(levels=levels, + labels=[[0, 0, 1, 0, 1, 1], [0, 1, 0, 2, 2, 1]]) + assert not index.is_lexsorted() + assert index.lexsort_depth == 0 + + def test_getitem_multilevel_index_tuple_not_sorted(self): + index_columns = list("abc") + df = DataFrame([[0, 1, 0, "x"], [0, 0, 1, "y"]], + columns=index_columns + ["data"]) + df = df.set_index(index_columns) + query_index = df.index[:1] + rs = df.loc[query_index, "data"] + + xp_idx = MultiIndex.from_tuples([(0, 1, 0)], names=['a', 'b', 'c']) + xp = Series(['x'], index=xp_idx, name='data') + tm.assert_series_equal(rs, xp) + + def test_getitem_slice_not_sorted(self): + df = self.frame.sort_index(level=1).T + + # buglet with int typechecking + result = df.iloc[:, :np.int32(3)] + expected = df.reindex(columns=df.columns[:3]) + tm.assert_frame_equal(result, expected) + + def test_frame_getitem_not_sorted2(self): + # 13431 + df = DataFrame({'col1': ['b', 'd', 'b', 'a'], + 'col2': [3, 1, 
1, 2], + 'data': ['one', 'two', 'three', 'four']}) + + df2 = df.set_index(['col1', 'col2']) + df2_original = df2.copy() + + df2.index.set_levels(['b', 'd', 'a'], level='col1', inplace=True) + df2.index.set_labels([0, 1, 0, 2], level='col1', inplace=True) + assert not df2.index.is_lexsorted() + assert not df2.index.is_monotonic + + assert df2_original.index.equals(df2.index) + expected = df2.sort_index() + assert expected.index.is_lexsorted() + assert expected.index.is_monotonic + result = df2.sort_index(level=0) + assert result.index.is_lexsorted() + assert result.index.is_monotonic + tm.assert_frame_equal(result, expected) + + def test_frame_getitem_not_sorted(self): + df = self.frame.T + df['foo', 'four'] = 'foo' + + arrays = [np.array(x) for x in zip(*df.columns.values)] + + result = df['foo'] + result2 = df.loc[:, 'foo'] + expected = df.reindex(columns=df.columns[arrays[0] == 'foo']) + expected.columns = expected.columns.droplevel(0) + tm.assert_frame_equal(result, expected) + tm.assert_frame_equal(result2, expected) -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + df = df.T + result = df.xs('foo') + result2 = df.loc['foo'] + expected = df.reindex(df.index[arrays[0] == 'foo']) + expected.index = expected.index.droplevel(0) + tm.assert_frame_equal(result, expected) + tm.assert_frame_equal(result2, expected) + + def test_series_getitem_not_sorted(self): + arrays = [['bar', 'bar', 'baz', 'baz', 'qux', 'qux', 'foo', 'foo'], + ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] + tuples = lzip(*arrays) + index = MultiIndex.from_tuples(tuples) + s = Series(randn(8), index=index) + + arrays = [np.array(x) for x in zip(*index.values)] + + result = s['qux'] + result2 = s.loc['qux'] + expected = s[arrays[0] == 'qux'] + expected.index = expected.index.droplevel(0) + tm.assert_series_equal(result, expected) + tm.assert_series_equal(result2, expected) + + def test_sort_index_and_reconstruction(self): + + # 15622 + # lexsortedness should be identical + # across MultiIndex consruction methods + + df = DataFrame([[1, 1], [2, 2]], index=list('ab')) + expected = DataFrame([[1, 1], [2, 2], [1, 1], [2, 2]], + index=MultiIndex.from_tuples([(0.5, 'a'), + (0.5, 'b'), + (0.8, 'a'), + (0.8, 'b')])) + assert expected.index.is_lexsorted() + + result = DataFrame( + [[1, 1], [2, 2], [1, 1], [2, 2]], + index=MultiIndex.from_product([[0.5, 0.8], list('ab')])) + result = result.sort_index() + assert result.index.is_lexsorted() + assert result.index.is_monotonic + + tm.assert_frame_equal(result, expected) + + result = DataFrame( + [[1, 1], [2, 2], [1, 1], [2, 2]], + index=MultiIndex(levels=[[0.5, 0.8], ['a', 'b']], + labels=[[0, 0, 1, 1], [0, 1, 0, 1]])) + result = result.sort_index() + assert result.index.is_lexsorted() + + tm.assert_frame_equal(result, expected) + + concatted = pd.concat([df, df], keys=[0.8, 0.5]) + result = concatted.sort_index() + + assert result.index.is_lexsorted() + assert result.index.is_monotonic + + tm.assert_frame_equal(result, expected) + + # 14015 + df = DataFrame([[1, 2], [6, 7]], + columns=MultiIndex.from_tuples( + [(0, '20160811 12:00:00'), + (0, '20160809 12:00:00')], + names=['l1', 'Date'])) + + df.columns.set_levels(pd.to_datetime(df.columns.levels[1]), + level=1, + inplace=True) + assert not df.columns.is_lexsorted() + assert not df.columns.is_monotonic + result = df.sort_index(axis=1) + assert result.columns.is_lexsorted() + assert result.columns.is_monotonic + result = df.sort_index(axis=1, level=1) + 
assert result.columns.is_lexsorted() + assert result.columns.is_monotonic + + def test_sort_index_and_reconstruction_doc_example(self): + # doc example + df = DataFrame({'value': [1, 2, 3, 4]}, + index=MultiIndex( + levels=[['a', 'b'], ['bb', 'aa']], + labels=[[0, 0, 1, 1], [0, 1, 0, 1]])) + assert df.index.is_lexsorted() + assert not df.index.is_monotonic + + # sort it + expected = DataFrame({'value': [2, 1, 4, 3]}, + index=MultiIndex( + levels=[['a', 'b'], ['aa', 'bb']], + labels=[[0, 0, 1, 1], [0, 1, 0, 1]])) + result = df.sort_index() + assert result.index.is_lexsorted() + assert result.index.is_monotonic + + tm.assert_frame_equal(result, expected) + + # reconstruct + result = df.sort_index().copy() + result.index = result.index._sort_levels_monotonic() + assert result.index.is_lexsorted() + assert result.index.is_monotonic + + tm.assert_frame_equal(result, expected) + + def test_sort_index_reorder_on_ops(self): + # 15687 + df = pd.DataFrame( + np.random.randn(8, 2), + index=MultiIndex.from_product( + [['a', 'b'], + ['big', 'small'], + ['red', 'blu']], + names=['letter', 'size', 'color']), + columns=['near', 'far']) + df = df.sort_index() + + def my_func(group): + group.index = ['newz', 'newa'] + return group + + result = df.groupby(level=['letter', 'size']).apply( + my_func).sort_index() + expected = MultiIndex.from_product( + [['a', 'b'], + ['big', 'small'], + ['newa', 'newz']], + names=['letter', 'size', None]) + + tm.assert_index_equal(result.index, expected) + + def test_sort_non_lexsorted(self): + # degenerate case where we sort but don't + # have a satisfying result :< + # GH 15797 + idx = MultiIndex([['A', 'B', 'C'], + ['c', 'b', 'a']], + [[0, 1, 2, 0, 1, 2], + [0, 2, 1, 1, 0, 2]]) + + df = DataFrame({'col': range(len(idx))}, + index=idx, + dtype='int64') + assert df.index.is_lexsorted() is False + assert df.index.is_monotonic is False + + sorted = df.sort_index() + assert sorted.index.is_lexsorted() is True + assert sorted.index.is_monotonic is True + + expected = DataFrame( + {'col': [1, 4, 5, 2]}, + index=MultiIndex.from_tuples([('B', 'a'), ('B', 'c'), + ('C', 'a'), ('C', 'b')]), + dtype='int64') + result = sorted.loc[pd.IndexSlice['B':'C', 'a':'c'], :] + tm.assert_frame_equal(result, expected) + + def test_sort_index_nan(self): + # GH 14784 + # incorrect sorting w.r.t. 
nans + tuples = [[12, 13], [np.nan, np.nan], [np.nan, 3], [1, 2]] + mi = MultiIndex.from_tuples(tuples) + + df = DataFrame(np.arange(16).reshape(4, 4), + index=mi, columns=list('ABCD')) + s = Series(np.arange(4), index=mi) + + df2 = DataFrame({ + 'date': pd.to_datetime([ + '20121002', '20121007', '20130130', '20130202', '20130305', + '20121002', '20121207', '20130130', '20130202', '20130305', + '20130202', '20130305' + ]), + 'user_id': [1, 1, 1, 1, 1, 3, 3, 3, 5, 5, 5, 5], + 'whole_cost': [1790, np.nan, 280, 259, np.nan, 623, 90, 312, + np.nan, 301, 359, 801], + 'cost': [12, 15, 10, 24, 39, 1, 0, np.nan, 45, 34, 1, 12] + }).set_index(['date', 'user_id']) + + # sorting frame, default nan position is last + result = df.sort_index() + expected = df.iloc[[3, 0, 2, 1], :] + tm.assert_frame_equal(result, expected) + + # sorting frame, nan position last + result = df.sort_index(na_position='last') + expected = df.iloc[[3, 0, 2, 1], :] + tm.assert_frame_equal(result, expected) + + # sorting frame, nan position first + result = df.sort_index(na_position='first') + expected = df.iloc[[1, 2, 3, 0], :] + tm.assert_frame_equal(result, expected) + + # sorting frame with removed rows + result = df2.dropna().sort_index() + expected = df2.sort_index().dropna() + tm.assert_frame_equal(result, expected) + + # sorting series, default nan position is last + result = s.sort_index() + expected = s.iloc[[3, 0, 2, 1]] + tm.assert_series_equal(result, expected) + + # sorting series, nan position last + result = s.sort_index(na_position='last') + expected = s.iloc[[3, 0, 2, 1]] + tm.assert_series_equal(result, expected) + + # sorting series, nan position first + result = s.sort_index(na_position='first') + expected = s.iloc[[1, 2, 3, 0]] + tm.assert_series_equal(result, expected) + + def test_sort_ascending_list(self): + # GH: 16934 + + # Set up a Series with a three level MultiIndex + arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], + ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'], + [4, 3, 2, 1, 4, 3, 2, 1]] + tuples = list(zip(*arrays)) + index = pd.MultiIndex.from_tuples(tuples, + names=['first', 'second', 'third']) + s = pd.Series(range(8), index=index) + + # Sort with boolean ascending + result = s.sort_index(level=['third', 'first'], ascending=False) + expected = s.iloc[[4, 0, 5, 1, 6, 2, 7, 3]] + tm.assert_series_equal(result, expected) + + # Sort with list of boolean ascending + result = s.sort_index(level=['third', 'first'], + ascending=[False, True]) + expected = s.iloc[[0, 4, 1, 5, 2, 6, 3, 7]] + tm.assert_series_equal(result, expected) diff --git a/pandas/tests/test_nanops.py b/pandas/tests/test_nanops.py index dd3a49de55d73..9305504f8d5e3 100644 --- a/pandas/tests/test_nanops.py +++ b/pandas/tests/test_nanops.py @@ -3,19 +3,22 @@ from functools import partial +import pytest import warnings import numpy as np -from pandas import Series, isnull -from pandas.types.common import is_integer_dtype + +import pandas as pd +from pandas import Series, isna +from pandas.core.dtypes.common import is_integer_dtype import pandas.core.nanops as nanops import pandas.util.testing as tm use_bn = nanops._USE_BOTTLENECK -class TestnanopsDataFrame(tm.TestCase): +class TestnanopsDataFrame(object): - def setUp(self): + def setup_method(self, method): np.random.seed(11235) nanops._USE_BOTTLENECK = False @@ -115,7 +118,7 @@ def setUp(self): self.arr_float_nan_inf_1d = self.arr_float_nan_inf[:, 0, 0] self.arr_nan_nan_inf_1d = self.arr_nan_nan_inf[:, 0, 0] - def tearDown(self): + def 
teardown_method(self, method): nanops._USE_BOTTLENECK = use_bn def check_results(self, targ, res, axis, check_dtype=True): @@ -337,16 +340,13 @@ def test_nanmean_overflow(self): # In the previous implementation mean can overflow for int dtypes, it # is now consistent with numpy - # numpy < 1.9.0 is not computing this correctly - from distutils.version import LooseVersion - if LooseVersion(np.__version__) >= '1.9.0': - for a in [2 ** 55, -2 ** 55, 20150515061816532]: - s = Series(a, index=range(500), dtype=np.int64) - result = s.mean() - np_result = s.values.mean() - self.assertEqual(result, a) - self.assertEqual(result, np_result) - self.assertTrue(result.dtype == np.float64) + for a in [2 ** 55, -2 ** 55, 20150515061816532]: + s = Series(a, index=range(500), dtype=np.int64) + result = s.mean() + np_result = s.values.mean() + assert result == a + assert result == np_result + assert result.dtype == np.float64 def test_returned_dtype(self): @@ -361,15 +361,9 @@ def test_returned_dtype(self): for method in group_a + group_b: result = getattr(s, method)() if is_integer_dtype(dtype) and method in group_a: - self.assertTrue( - result.dtype == np.float64, - "return dtype expected from %s is np.float64, " - "got %s instead" % (method, result.dtype)) + assert result.dtype == np.float64 else: - self.assertTrue( - result.dtype == dtype, - "return dtype expected from %s is %s, " - "got %s instead" % (method, dtype, result.dtype)) + assert result.dtype == dtype def test_nanmedian(self): with warnings.catch_warnings(record=True): @@ -388,12 +382,12 @@ def test_nanstd(self): allow_tdelta=True, allow_obj='convert') def test_nansem(self): - tm.skip_if_no_package('scipy.stats') - tm._skip_if_scipy_0_17() + tm.skip_if_no_package('scipy', min_version='0.17.0') from scipy.stats import sem - self.check_funs_ddof(nanops.nansem, sem, allow_complex=False, - allow_str=False, allow_date=False, - allow_tdelta=True, allow_obj='convert') + with np.errstate(invalid='ignore'): + self.check_funs_ddof(nanops.nansem, sem, allow_complex=False, + allow_str=False, allow_date=False, + allow_tdelta=False, allow_obj='convert') def _minmax_wrap(self, value, axis=None, func=None): res = func(value, axis) @@ -412,7 +406,7 @@ def test_nanmax(self): def _argminmax_wrap(self, value, axis=None, func=None): res = func(value, axis) nans = np.min(value, axis) - nullnan = isnull(nans) + nullnan = isna(nans) if res.ndim: res[nullnan] = -1 elif (hasattr(nullnan, 'all') and nullnan.all() or @@ -448,21 +442,23 @@ def _skew_kurt_wrap(self, values, axis=None, func=None): return result def test_nanskew(self): - tm.skip_if_no_package('scipy.stats') - tm._skip_if_scipy_0_17() + tm.skip_if_no_package('scipy', min_version='0.17.0') from scipy.stats import skew func = partial(self._skew_kurt_wrap, func=skew) - self.check_funs(nanops.nanskew, func, allow_complex=False, - allow_str=False, allow_date=False, allow_tdelta=False) + with np.errstate(invalid='ignore'): + self.check_funs(nanops.nanskew, func, allow_complex=False, + allow_str=False, allow_date=False, + allow_tdelta=False) def test_nankurt(self): - tm.skip_if_no_package('scipy.stats') - tm._skip_if_scipy_0_17() + tm.skip_if_no_package('scipy', min_version='0.17.0') from scipy.stats import kurtosis func1 = partial(kurtosis, fisher=True) func = partial(self._skew_kurt_wrap, func=func1) - self.check_funs(nanops.nankurt, func, allow_complex=False, - allow_str=False, allow_date=False, allow_tdelta=False) + with np.errstate(invalid='ignore'): + self.check_funs(nanops.nankurt, func, allow_complex=False, 
+ allow_str=False, allow_date=False, + allow_tdelta=False) def test_nanprod(self): self.check_funs(nanops.nanprod, np.prod, allow_str=False, @@ -654,9 +650,9 @@ def check_bool(self, func, value, correct, *args, **kwargs): try: res0 = func(value, *args, **kwargs) if correct: - self.assertTrue(res0) + assert res0 else: - self.assertFalse(res0) + assert not res0 except BaseException as exc: exc.args += ('dim: %s' % getattr(value, 'ndim', value), ) raise @@ -733,67 +729,62 @@ def test__isfinite(self): raise def test__bn_ok_dtype(self): - self.assertTrue(nanops._bn_ok_dtype(self.arr_float.dtype, 'test')) - self.assertTrue(nanops._bn_ok_dtype(self.arr_complex.dtype, 'test')) - self.assertTrue(nanops._bn_ok_dtype(self.arr_int.dtype, 'test')) - self.assertTrue(nanops._bn_ok_dtype(self.arr_bool.dtype, 'test')) - self.assertTrue(nanops._bn_ok_dtype(self.arr_str.dtype, 'test')) - self.assertTrue(nanops._bn_ok_dtype(self.arr_utf.dtype, 'test')) - self.assertFalse(nanops._bn_ok_dtype(self.arr_date.dtype, 'test')) - self.assertFalse(nanops._bn_ok_dtype(self.arr_tdelta.dtype, 'test')) - self.assertFalse(nanops._bn_ok_dtype(self.arr_obj.dtype, 'test')) + assert nanops._bn_ok_dtype(self.arr_float.dtype, 'test') + assert nanops._bn_ok_dtype(self.arr_complex.dtype, 'test') + assert nanops._bn_ok_dtype(self.arr_int.dtype, 'test') + assert nanops._bn_ok_dtype(self.arr_bool.dtype, 'test') + assert nanops._bn_ok_dtype(self.arr_str.dtype, 'test') + assert nanops._bn_ok_dtype(self.arr_utf.dtype, 'test') + assert not nanops._bn_ok_dtype(self.arr_date.dtype, 'test') + assert not nanops._bn_ok_dtype(self.arr_tdelta.dtype, 'test') + assert not nanops._bn_ok_dtype(self.arr_obj.dtype, 'test') -class TestEnsureNumeric(tm.TestCase): +class TestEnsureNumeric(object): def test_numeric_values(self): # Test integer - self.assertEqual(nanops._ensure_numeric(1), 1, 'Failed for int') + assert nanops._ensure_numeric(1) == 1 + # Test float - self.assertEqual(nanops._ensure_numeric(1.1), 1.1, 'Failed for float') + assert nanops._ensure_numeric(1.1) == 1.1 + # Test complex - self.assertEqual(nanops._ensure_numeric(1 + 2j), 1 + 2j, - 'Failed for complex') + assert nanops._ensure_numeric(1 + 2j) == 1 + 2j def test_ndarray(self): # Test numeric ndarray values = np.array([1, 2, 3]) - self.assertTrue(np.allclose(nanops._ensure_numeric(values), values), - 'Failed for numeric ndarray') + assert np.allclose(nanops._ensure_numeric(values), values) # Test object ndarray o_values = values.astype(object) - self.assertTrue(np.allclose(nanops._ensure_numeric(o_values), values), - 'Failed for object ndarray') + assert np.allclose(nanops._ensure_numeric(o_values), values) # Test convertible string ndarray s_values = np.array(['1', '2', '3'], dtype=object) - self.assertTrue(np.allclose(nanops._ensure_numeric(s_values), values), - 'Failed for convertible string ndarray') + assert np.allclose(nanops._ensure_numeric(s_values), values) # Test non-convertible string ndarray s_values = np.array(['foo', 'bar', 'baz'], dtype=object) - self.assertRaises(ValueError, lambda: nanops._ensure_numeric(s_values)) + pytest.raises(ValueError, lambda: nanops._ensure_numeric(s_values)) def test_convertable_values(self): - self.assertTrue(np.allclose(nanops._ensure_numeric('1'), 1.0), - 'Failed for convertible integer string') - self.assertTrue(np.allclose(nanops._ensure_numeric('1.1'), 1.1), - 'Failed for convertible float string') - self.assertTrue(np.allclose(nanops._ensure_numeric('1+1j'), 1 + 1j), - 'Failed for convertible complex string') + assert 
np.allclose(nanops._ensure_numeric('1'), 1.0) + assert np.allclose(nanops._ensure_numeric('1.1'), 1.1) + assert np.allclose(nanops._ensure_numeric('1+1j'), 1 + 1j) def test_non_convertable_values(self): - self.assertRaises(TypeError, lambda: nanops._ensure_numeric('foo')) - self.assertRaises(TypeError, lambda: nanops._ensure_numeric({})) - self.assertRaises(TypeError, lambda: nanops._ensure_numeric([])) + pytest.raises(TypeError, lambda: nanops._ensure_numeric('foo')) + pytest.raises(TypeError, lambda: nanops._ensure_numeric({})) + pytest.raises(TypeError, lambda: nanops._ensure_numeric([])) -class TestNanvarFixedValues(tm.TestCase): +class TestNanvarFixedValues(object): # xref GH10242 - def setUp(self): + def setup_method(self, method): # Samples from a normal distribution. self.variance = variance = 3.0 self.samples = self.prng.normal(scale=variance ** 0.5, size=100000) @@ -880,14 +871,14 @@ def test_ground_truth(self): for ddof in range(3): var = nanops.nanvar(samples, skipna=True, axis=axis, ddof=ddof) tm.assert_almost_equal(var[:3], variance[axis, ddof]) - self.assertTrue(np.isnan(var[3])) + assert np.isnan(var[3]) # Test nanstd. for axis in range(2): for ddof in range(3): std = nanops.nanstd(samples, skipna=True, axis=axis, ddof=ddof) tm.assert_almost_equal(std[:3], variance[axis, ddof] ** 0.5) - self.assertTrue(np.isnan(std[3])) + assert np.isnan(std[3]) def test_nanstd_roundoff(self): # Regression test for GH 10242 (test data taken from GH 10489). Ensure @@ -895,18 +886,18 @@ def test_nanstd_roundoff(self): data = Series(766897346 * np.ones(10)) for ddof in range(3): result = data.std(ddof=ddof) - self.assertEqual(result, 0.0) + assert result == 0.0 @property def prng(self): return np.random.RandomState(1234) -class TestNanskewFixedValues(tm.TestCase): +class TestNanskewFixedValues(object): # xref GH 11974 - def setUp(self): + def setup_method(self, method): # Test data + skewness value (computed with scipy.stats.skew) self.samples = np.sin(np.linspace(0, 1, 200)) self.actual_skew = -0.1875895205961754 @@ -916,20 +907,20 @@ def test_constant_series(self): for val in [3075.2, 3075.3, 3075.5]: data = val * np.ones(300) skew = nanops.nanskew(data) - self.assertEqual(skew, 0.0) + assert skew == 0.0 def test_all_finite(self): alpha, beta = 0.3, 0.1 left_tailed = self.prng.beta(alpha, beta, size=100) - self.assertLess(nanops.nanskew(left_tailed), 0) + assert nanops.nanskew(left_tailed) < 0 alpha, beta = 0.1, 0.3 right_tailed = self.prng.beta(alpha, beta, size=100) - self.assertGreater(nanops.nanskew(right_tailed), 0) + assert nanops.nanskew(right_tailed) > 0 def test_ground_truth(self): skew = nanops.nanskew(self.samples) - self.assertAlmostEqual(skew, self.actual_skew) + tm.assert_almost_equal(skew, self.actual_skew) def test_axis(self): samples = np.vstack([self.samples, @@ -940,7 +931,7 @@ def test_axis(self): def test_nans(self): samples = np.hstack([self.samples, np.nan]) skew = nanops.nanskew(samples, skipna=False) - self.assertTrue(np.isnan(skew)) + assert np.isnan(skew) def test_nans_skipna(self): samples = np.hstack([self.samples, np.nan]) @@ -952,11 +943,11 @@ def prng(self): return np.random.RandomState(1234) -class TestNankurtFixedValues(tm.TestCase): +class TestNankurtFixedValues(object): # xref GH 11974 - def setUp(self): + def setup_method(self, method): # Test data + kurtosis value (computed with scipy.stats.kurtosis) self.samples = np.sin(np.linspace(0, 1, 200)) self.actual_kurt = -1.2058303433799713 @@ -966,20 +957,20 @@ def test_constant_series(self): for val in 
[3075.2, 3075.3, 3075.5]: data = val * np.ones(300) kurt = nanops.nankurt(data) - self.assertEqual(kurt, 0.0) + assert kurt == 0.0 def test_all_finite(self): alpha, beta = 0.3, 0.1 left_tailed = self.prng.beta(alpha, beta, size=100) - self.assertLess(nanops.nankurt(left_tailed), 0) + assert nanops.nankurt(left_tailed) < 0 alpha, beta = 0.1, 0.3 right_tailed = self.prng.beta(alpha, beta, size=100) - self.assertGreater(nanops.nankurt(right_tailed), 0) + assert nanops.nankurt(right_tailed) > 0 def test_ground_truth(self): kurt = nanops.nankurt(self.samples) - self.assertAlmostEqual(kurt, self.actual_kurt) + tm.assert_almost_equal(kurt, self.actual_kurt) def test_axis(self): samples = np.vstack([self.samples, @@ -990,7 +981,7 @@ def test_axis(self): def test_nans(self): samples = np.hstack([self.samples, np.nan]) kurt = nanops.nankurt(samples, skipna=False) - self.assertTrue(np.isnan(kurt)) + assert np.isnan(kurt) def test_nans_skipna(self): samples = np.hstack([self.samples, np.nan]) @@ -1002,7 +993,14 @@ def prng(self): return np.random.RandomState(1234) -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure', '-s' - ], exit=False) +def test_use_bottleneck(): + + if nanops._BOTTLENECK_INSTALLED: + + pd.set_option('use_bottleneck', True) + assert pd.get_option('use_bottleneck') + + pd.set_option('use_bottleneck', False) + assert not pd.get_option('use_bottleneck') + + pd.set_option('use_bottleneck', use_bn) diff --git a/pandas/tests/test_panel.py b/pandas/tests/test_panel.py index d79081a06dbc0..a6113f231f8f2 100644 --- a/pandas/tests/test_panel.py +++ b/pandas/tests/test_panel.py @@ -1,74 +1,85 @@ # -*- coding: utf-8 -*- # pylint: disable=W0612,E1101 +from warnings import catch_warnings from datetime import datetime - import operator -import nose +import pytest import numpy as np import pandas as pd -from pandas.types.common import is_float_dtype -from pandas import (Series, DataFrame, Index, date_range, isnull, notnull, +from pandas.core.dtypes.common import is_float_dtype +from pandas.core.dtypes.missing import remove_na_arraylike +from pandas import (Series, DataFrame, Index, date_range, isna, notna, pivot, MultiIndex) from pandas.core.nanops import nanall, nanany from pandas.core.panel import Panel -from pandas.core.series import remove_na -from pandas.formats.printing import pprint_thing +from pandas.io.formats.printing import pprint_thing from pandas import compat from pandas.compat import range, lrange, StringIO, OrderedDict, signature from pandas.tseries.offsets import BDay, MonthEnd from pandas.util.testing import (assert_panel_equal, assert_frame_equal, assert_series_equal, assert_almost_equal, - ensure_clean, assertRaisesRegexp, - makeCustomDataframe as mkdf, - makeMixedDataFrame) + ensure_clean, makeMixedDataFrame, + makeCustomDataframe as mkdf) import pandas.core.panel as panelm import pandas.util.testing as tm +def make_test_panel(): + with catch_warnings(record=True): + _panel = tm.makePanel() + tm.add_nans(_panel) + _panel = _panel.copy() + return _panel + + class PanelTests(object): panel = None def test_pickle(self): - unpickled = self.round_trip_pickle(self.panel) - assert_frame_equal(unpickled['ItemA'], self.panel['ItemA']) + with catch_warnings(record=True): + unpickled = tm.round_trip_pickle(self.panel) + assert_frame_equal(unpickled['ItemA'], self.panel['ItemA']) def test_rank(self): - self.assertRaises(NotImplementedError, lambda: self.panel.rank()) + with catch_warnings(record=True): + 
pytest.raises(NotImplementedError, lambda: self.panel.rank()) def test_cumsum(self): - cumsum = self.panel.cumsum() - assert_frame_equal(cumsum['ItemA'], self.panel['ItemA'].cumsum()) + with catch_warnings(record=True): + cumsum = self.panel.cumsum() + assert_frame_equal(cumsum['ItemA'], self.panel['ItemA'].cumsum()) def not_hashable(self): - c_empty = Panel() - c = Panel(Panel([[[1]]])) - self.assertRaises(TypeError, hash, c_empty) - self.assertRaises(TypeError, hash, c) + with catch_warnings(record=True): + c_empty = Panel() + c = Panel(Panel([[[1]]])) + pytest.raises(TypeError, hash, c_empty) + pytest.raises(TypeError, hash, c) class SafeForLongAndSparse(object): - _multiprocess_can_split_ = True def test_repr(self): repr(self.panel) def test_copy_names(self): - for attr in ('major_axis', 'minor_axis'): - getattr(self.panel, attr).name = None - cp = self.panel.copy() - getattr(cp, attr).name = 'foo' - self.assertIsNone(getattr(self.panel, attr).name) + with catch_warnings(record=True): + for attr in ('major_axis', 'minor_axis'): + getattr(self.panel, attr).name = None + cp = self.panel.copy() + getattr(cp, attr).name = 'foo' + assert getattr(self.panel, attr).name is None def test_iter(self): tm.equalContents(list(self.panel), self.panel.items) def test_count(self): - f = lambda s: notnull(s).sum() + f = lambda s: notna(s).sum() self._check_stat_op('count', f, obj=self.panel, has_skipna=False) def test_sum(self): @@ -82,7 +93,7 @@ def test_prod(self): def test_median(self): def wrapper(x): - if isnull(x).any(): + if isna(x).any(): return np.nan return np.median(x) @@ -98,7 +109,7 @@ def test_skew(self): try: from scipy.stats import skew except ImportError: - raise nose.SkipTest("no scipy.stats.skew") + pytest.skip("no scipy.stats.skew") def this_skew(x): if len(x) < 3: @@ -107,10 +118,6 @@ def this_skew(x): self._check_stat_op('skew', this_skew) - # def test_mad(self): - # f = lambda x: np.abs(x - x.mean()).mean() - # self._check_stat_op('mad', f) - def test_var(self): def alt(x): if len(x) < 2: @@ -148,7 +155,7 @@ def _check_stat_op(self, name, alternative, obj=None, has_skipna=True): if has_skipna: def skipna_wrapper(x): - nona = remove_na(x) + nona = remove_na_arraylike(x) if len(nona) == 0: return np.nan return alternative(nona) @@ -168,16 +175,15 @@ def wrapper(x): if not tm._incompat_bottleneck_version(name): assert_frame_equal(result, obj.apply(skipna_wrapper, axis=i)) - self.assertRaises(Exception, f, axis=obj.ndim) + pytest.raises(Exception, f, axis=obj.ndim) # Unimplemented numeric_only parameter. if 'numeric_only' in signature(f).args: - self.assertRaisesRegexp(NotImplementedError, name, f, - numeric_only=True) + tm.assert_raises_regex(NotImplementedError, name, f, + numeric_only=True) class SafeForSparse(object): - _multiprocess_can_split_ = True @classmethod def assert_panel_equal(cls, x, y): @@ -198,38 +204,38 @@ def test_set_axis(self): self.panel.items = new_items if hasattr(self.panel, '_item_cache'): - self.assertNotIn('ItemA', self.panel._item_cache) - self.assertIs(self.panel.items, new_items) + assert 'ItemA' not in self.panel._item_cache + assert self.panel.items is new_items # TODO: unused? item = self.panel[0] # noqa self.panel.major_axis = new_major - self.assertIs(self.panel[0].index, new_major) - self.assertIs(self.panel.major_axis, new_major) + assert self.panel[0].index is new_major + assert self.panel.major_axis is new_major # TODO: unused? 
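The changes in this file follow the same mechanical recipe as the rest of the patch: drop the `tm.TestCase` base class, rename `setUp`/`tearDown` to `setup_method`/`teardown_method`, turn `self.assert*` calls into bare `assert` statements or `tm.assert_*` helpers, route expected exceptions through `pytest.raises`, and wrap deprecated-API usage in `catch_warnings`. A minimal sketch of that pattern, with illustrative names that are not part of the patch itself:

```python
# Sketch of the unittest -> pytest conversion applied throughout this
# patch; the class and test names below are invented for illustration.
import warnings

import pytest


class TestConverted(object):                # was: class TestFoo(tm.TestCase)

    def setup_method(self, method):         # was: def setUp(self)
        self.values = [1, 2, 3]

    def teardown_method(self, method):      # was: def tearDown(self)
        self.values = None

    def test_plain_asserts(self):
        # was: self.assertEqual(...) / self.assertTrue(...)
        assert len(self.values) == 3
        assert all(v > 0 for v in self.values)

    def test_expected_exception(self):
        # was: self.assertRaises(ZeroDivisionError, ...)
        with pytest.raises(ZeroDivisionError):
            1 / 0

    def test_deprecated_call(self):
        # deprecated APIs (e.g. Panel) are exercised inside catch_warnings
        # so the FutureWarning does not leak out of the test
        with warnings.catch_warnings(record=True):
            warnings.simplefilter('always')
            warnings.warn('deprecated', FutureWarning)
```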
item = self.panel[0] # noqa self.panel.minor_axis = new_minor - self.assertIs(self.panel[0].columns, new_minor) - self.assertIs(self.panel.minor_axis, new_minor) + assert self.panel[0].columns is new_minor + assert self.panel.minor_axis is new_minor def test_get_axis_number(self): - self.assertEqual(self.panel._get_axis_number('items'), 0) - self.assertEqual(self.panel._get_axis_number('major'), 1) - self.assertEqual(self.panel._get_axis_number('minor'), 2) + assert self.panel._get_axis_number('items') == 0 + assert self.panel._get_axis_number('major') == 1 + assert self.panel._get_axis_number('minor') == 2 - with tm.assertRaisesRegexp(ValueError, "No axis named foo"): + with tm.assert_raises_regex(ValueError, "No axis named foo"): self.panel._get_axis_number('foo') - with tm.assertRaisesRegexp(ValueError, "No axis named foo"): + with tm.assert_raises_regex(ValueError, "No axis named foo"): self.panel.__ge__(self.panel, axis='foo') def test_get_axis_name(self): - self.assertEqual(self.panel._get_axis_name(0), 'items') - self.assertEqual(self.panel._get_axis_name(1), 'major_axis') - self.assertEqual(self.panel._get_axis_name(2), 'minor_axis') + assert self.panel._get_axis_name(0) == 'items' + assert self.panel._get_axis_name(1) == 'major_axis' + assert self.panel._get_axis_name(2) == 'minor_axis' def test_get_plane_axes(self): # what to do here? @@ -240,47 +246,48 @@ def test_get_plane_axes(self): index, columns = self.panel._get_plane_axes(0) def test_truncate(self): - dates = self.panel.major_axis - start, end = dates[1], dates[5] - - trunced = self.panel.truncate(start, end, axis='major') - expected = self.panel['ItemA'].truncate(start, end) + with catch_warnings(record=True): + dates = self.panel.major_axis + start, end = dates[1], dates[5] - assert_frame_equal(trunced['ItemA'], expected) + trunced = self.panel.truncate(start, end, axis='major') + expected = self.panel['ItemA'].truncate(start, end) - trunced = self.panel.truncate(before=start, axis='major') - expected = self.panel['ItemA'].truncate(before=start) + assert_frame_equal(trunced['ItemA'], expected) - assert_frame_equal(trunced['ItemA'], expected) + trunced = self.panel.truncate(before=start, axis='major') + expected = self.panel['ItemA'].truncate(before=start) - trunced = self.panel.truncate(after=end, axis='major') - expected = self.panel['ItemA'].truncate(after=end) + assert_frame_equal(trunced['ItemA'], expected) - assert_frame_equal(trunced['ItemA'], expected) + trunced = self.panel.truncate(after=end, axis='major') + expected = self.panel['ItemA'].truncate(after=end) - # XXX test other axes + assert_frame_equal(trunced['ItemA'], expected) def test_arith(self): - self._test_op(self.panel, operator.add) - self._test_op(self.panel, operator.sub) - self._test_op(self.panel, operator.mul) - self._test_op(self.panel, operator.truediv) - self._test_op(self.panel, operator.floordiv) - self._test_op(self.panel, operator.pow) - - self._test_op(self.panel, lambda x, y: y + x) - self._test_op(self.panel, lambda x, y: y - x) - self._test_op(self.panel, lambda x, y: y * x) - self._test_op(self.panel, lambda x, y: y / x) - self._test_op(self.panel, lambda x, y: y ** x) - - self._test_op(self.panel, lambda x, y: x + y) # panel + 1 - self._test_op(self.panel, lambda x, y: x - y) # panel - 1 - self._test_op(self.panel, lambda x, y: x * y) # panel * 1 - self._test_op(self.panel, lambda x, y: x / y) # panel / 1 - self._test_op(self.panel, lambda x, y: x ** y) # panel ** 1 - - self.assertRaises(Exception, self.panel.__add__, 
self.panel['ItemA']) + with catch_warnings(record=True): + self._test_op(self.panel, operator.add) + self._test_op(self.panel, operator.sub) + self._test_op(self.panel, operator.mul) + self._test_op(self.panel, operator.truediv) + self._test_op(self.panel, operator.floordiv) + self._test_op(self.panel, operator.pow) + + self._test_op(self.panel, lambda x, y: y + x) + self._test_op(self.panel, lambda x, y: y - x) + self._test_op(self.panel, lambda x, y: y * x) + self._test_op(self.panel, lambda x, y: y / x) + self._test_op(self.panel, lambda x, y: y ** x) + + self._test_op(self.panel, lambda x, y: x + y) # panel + 1 + self._test_op(self.panel, lambda x, y: x - y) # panel - 1 + self._test_op(self.panel, lambda x, y: x * y) # panel * 1 + self._test_op(self.panel, lambda x, y: x / y) # panel / 1 + self._test_op(self.panel, lambda x, y: x ** y) # panel ** 1 + + pytest.raises(Exception, self.panel.__add__, + self.panel['ItemA']) @staticmethod def _test_op(panel, op): @@ -296,96 +303,103 @@ def test_iteritems(self): for k, v in self.panel.iteritems(): pass - self.assertEqual(len(list(self.panel.iteritems())), - len(self.panel.items)) + assert len(list(self.panel.iteritems())) == len(self.panel.items) def test_combineFrame(self): - def check_op(op, name): - # items - df = self.panel['ItemA'] + with catch_warnings(record=True): + def check_op(op, name): + # items + df = self.panel['ItemA'] - func = getattr(self.panel, name) + func = getattr(self.panel, name) - result = func(df, axis='items') + result = func(df, axis='items') - assert_frame_equal(result['ItemB'], op(self.panel['ItemB'], df)) + assert_frame_equal( + result['ItemB'], op(self.panel['ItemB'], df)) - # major - xs = self.panel.major_xs(self.panel.major_axis[0]) - result = func(xs, axis='major') + # major + xs = self.panel.major_xs(self.panel.major_axis[0]) + result = func(xs, axis='major') - idx = self.panel.major_axis[1] + idx = self.panel.major_axis[1] - assert_frame_equal(result.major_xs(idx), - op(self.panel.major_xs(idx), xs)) + assert_frame_equal(result.major_xs(idx), + op(self.panel.major_xs(idx), xs)) - # minor - xs = self.panel.minor_xs(self.panel.minor_axis[0]) - result = func(xs, axis='minor') + # minor + xs = self.panel.minor_xs(self.panel.minor_axis[0]) + result = func(xs, axis='minor') - idx = self.panel.minor_axis[1] + idx = self.panel.minor_axis[1] - assert_frame_equal(result.minor_xs(idx), - op(self.panel.minor_xs(idx), xs)) + assert_frame_equal(result.minor_xs(idx), + op(self.panel.minor_xs(idx), xs)) - ops = ['add', 'sub', 'mul', 'truediv', 'floordiv', 'pow', 'mod'] - if not compat.PY3: - ops.append('div') + ops = ['add', 'sub', 'mul', 'truediv', 'floordiv', 'pow', 'mod'] + if not compat.PY3: + ops.append('div') - for op in ops: - try: - check_op(getattr(operator, op), op) - except: - pprint_thing("Failing operation: %r" % op) - raise - if compat.PY3: - try: - check_op(operator.truediv, 'div') - except: - pprint_thing("Failing operation: %r" % 'div') - raise + for op in ops: + try: + check_op(getattr(operator, op), op) + except: + pprint_thing("Failing operation: %r" % op) + raise + if compat.PY3: + try: + check_op(operator.truediv, 'div') + except: + pprint_thing("Failing operation: %r" % 'div') + raise def test_combinePanel(self): - result = self.panel.add(self.panel) - self.assert_panel_equal(result, self.panel * 2) + with catch_warnings(record=True): + result = self.panel.add(self.panel) + assert_panel_equal(result, self.panel * 2) def test_neg(self): - self.assert_panel_equal(-self.panel, self.panel * -1) + with 
catch_warnings(record=True): + assert_panel_equal(-self.panel, self.panel * -1) # issue 7692 def test_raise_when_not_implemented(self): - p = Panel(np.arange(3 * 4 * 5).reshape(3, 4, 5), - items=['ItemA', 'ItemB', 'ItemC'], - major_axis=pd.date_range('20130101', periods=4), - minor_axis=list('ABCDE')) - d = p.sum(axis=1).iloc[0] - ops = ['add', 'sub', 'mul', 'truediv', 'floordiv', 'div', 'mod', 'pow'] - for op in ops: - with self.assertRaises(NotImplementedError): - getattr(p, op)(d, axis=0) + with catch_warnings(record=True): + p = Panel(np.arange(3 * 4 * 5).reshape(3, 4, 5), + items=['ItemA', 'ItemB', 'ItemC'], + major_axis=pd.date_range('20130101', periods=4), + minor_axis=list('ABCDE')) + d = p.sum(axis=1).iloc[0] + ops = ['add', 'sub', 'mul', 'truediv', + 'floordiv', 'div', 'mod', 'pow'] + for op in ops: + with pytest.raises(NotImplementedError): + getattr(p, op)(d, axis=0) def test_select(self): - p = self.panel + with catch_warnings(record=True): + p = self.panel - # select items - result = p.select(lambda x: x in ('ItemA', 'ItemC'), axis='items') - expected = p.reindex(items=['ItemA', 'ItemC']) - self.assert_panel_equal(result, expected) + # select items + result = p.select(lambda x: x in ('ItemA', 'ItemC'), axis='items') + expected = p.reindex(items=['ItemA', 'ItemC']) + assert_panel_equal(result, expected) - # select major_axis - result = p.select(lambda x: x >= datetime(2000, 1, 15), axis='major') - new_major = p.major_axis[p.major_axis >= datetime(2000, 1, 15)] - expected = p.reindex(major=new_major) - self.assert_panel_equal(result, expected) + # select major_axis + result = p.select(lambda x: x >= datetime( + 2000, 1, 15), axis='major') + new_major = p.major_axis[p.major_axis >= datetime(2000, 1, 15)] + expected = p.reindex(major=new_major) + assert_panel_equal(result, expected) - # select minor_axis - result = p.select(lambda x: x in ('D', 'A'), axis=2) - expected = p.reindex(minor=['A', 'D']) - self.assert_panel_equal(result, expected) + # select minor_axis + result = p.select(lambda x: x in ('D', 'A'), axis=2) + expected = p.reindex(minor=['A', 'D']) + assert_panel_equal(result, expected) - # corner case, empty thing - result = p.select(lambda x: x in ('foo', ), axis='items') - self.assert_panel_equal(result, p.reindex(items=[])) + # corner case, empty thing + result = p.select(lambda x: x in ('foo', ), axis='items') + assert_panel_equal(result, p.reindex(items=[])) def test_get_value(self): for item in self.panel.items: @@ -397,219 +411,232 @@ def test_get_value(self): def test_abs(self): - result = self.panel.abs() - result2 = abs(self.panel) - expected = np.abs(self.panel) - self.assert_panel_equal(result, expected) - self.assert_panel_equal(result2, expected) - - df = self.panel['ItemA'] - result = df.abs() - result2 = abs(df) - expected = np.abs(df) - assert_frame_equal(result, expected) - assert_frame_equal(result2, expected) + with catch_warnings(record=True): + result = self.panel.abs() + result2 = abs(self.panel) + expected = np.abs(self.panel) + assert_panel_equal(result, expected) + assert_panel_equal(result2, expected) - s = df['A'] - result = s.abs() - result2 = abs(s) - expected = np.abs(s) - assert_series_equal(result, expected) - assert_series_equal(result2, expected) - self.assertEqual(result.name, 'A') - self.assertEqual(result2.name, 'A') + df = self.panel['ItemA'] + result = df.abs() + result2 = abs(df) + expected = np.abs(df) + assert_frame_equal(result, expected) + assert_frame_equal(result2, expected) + + s = df['A'] + result = s.abs() + result2 = 
abs(s) + expected = np.abs(s) + assert_series_equal(result, expected) + assert_series_equal(result2, expected) + assert result.name == 'A' + assert result2.name == 'A' class CheckIndexing(object): - _multiprocess_can_split_ = True - def test_getitem(self): - self.assertRaises(Exception, self.panel.__getitem__, 'ItemQ') + pytest.raises(Exception, self.panel.__getitem__, 'ItemQ') def test_delitem_and_pop(self): - expected = self.panel['ItemA'] - result = self.panel.pop('ItemA') - assert_frame_equal(expected, result) - self.assertNotIn('ItemA', self.panel.items) + with catch_warnings(record=True): + expected = self.panel['ItemA'] + result = self.panel.pop('ItemA') + assert_frame_equal(expected, result) + assert 'ItemA' not in self.panel.items - del self.panel['ItemB'] - self.assertNotIn('ItemB', self.panel.items) - self.assertRaises(Exception, self.panel.__delitem__, 'ItemB') + del self.panel['ItemB'] + assert 'ItemB' not in self.panel.items + pytest.raises(Exception, self.panel.__delitem__, 'ItemB') - values = np.empty((3, 3, 3)) - values[0] = 0 - values[1] = 1 - values[2] = 2 + values = np.empty((3, 3, 3)) + values[0] = 0 + values[1] = 1 + values[2] = 2 - panel = Panel(values, lrange(3), lrange(3), lrange(3)) + panel = Panel(values, lrange(3), lrange(3), lrange(3)) - # did we delete the right row? + # did we delete the right row? - panelc = panel.copy() - del panelc[0] - assert_frame_equal(panelc[1], panel[1]) - assert_frame_equal(panelc[2], panel[2]) + panelc = panel.copy() + del panelc[0] + tm.assert_frame_equal(panelc[1], panel[1]) + tm.assert_frame_equal(panelc[2], panel[2]) - panelc = panel.copy() - del panelc[1] - assert_frame_equal(panelc[0], panel[0]) - assert_frame_equal(panelc[2], panel[2]) + panelc = panel.copy() + del panelc[1] + tm.assert_frame_equal(panelc[0], panel[0]) + tm.assert_frame_equal(panelc[2], panel[2]) - panelc = panel.copy() - del panelc[2] - assert_frame_equal(panelc[1], panel[1]) - assert_frame_equal(panelc[0], panel[0]) + panelc = panel.copy() + del panelc[2] + tm.assert_frame_equal(panelc[1], panel[1]) + tm.assert_frame_equal(panelc[0], panel[0]) def test_setitem(self): - # LongPanel with one item - lp = self.panel.filter(['ItemA', 'ItemB']).to_frame() - with tm.assertRaises(ValueError): - self.panel['ItemE'] = lp + with catch_warnings(record=True): + + # LongPanel with one item + lp = self.panel.filter(['ItemA', 'ItemB']).to_frame() + with pytest.raises(ValueError): + self.panel['ItemE'] = lp - # DataFrame - df = self.panel['ItemA'][2:].filter(items=['A', 'B']) - self.panel['ItemF'] = df - self.panel['ItemE'] = df + # DataFrame + df = self.panel['ItemA'][2:].filter(items=['A', 'B']) + self.panel['ItemF'] = df + self.panel['ItemE'] = df - df2 = self.panel['ItemF'] + df2 = self.panel['ItemF'] - assert_frame_equal(df, df2.reindex(index=df.index, columns=df.columns)) + assert_frame_equal(df, df2.reindex( + index=df.index, columns=df.columns)) - # scalar - self.panel['ItemG'] = 1 - self.panel['ItemE'] = True - self.assertEqual(self.panel['ItemG'].values.dtype, np.int64) - self.assertEqual(self.panel['ItemE'].values.dtype, np.bool_) + # scalar + self.panel['ItemG'] = 1 + self.panel['ItemE'] = True + assert self.panel['ItemG'].values.dtype == np.int64 + assert self.panel['ItemE'].values.dtype == np.bool_ - # object dtype - self.panel['ItemQ'] = 'foo' - self.assertEqual(self.panel['ItemQ'].values.dtype, np.object_) + # object dtype + self.panel['ItemQ'] = 'foo' + assert self.panel['ItemQ'].values.dtype == np.object_ - # boolean dtype - self.panel['ItemP'] = 
self.panel['ItemA'] > 0 - self.assertEqual(self.panel['ItemP'].values.dtype, np.bool_) + # boolean dtype + self.panel['ItemP'] = self.panel['ItemA'] > 0 + assert self.panel['ItemP'].values.dtype == np.bool_ - self.assertRaises(TypeError, self.panel.__setitem__, 'foo', + pytest.raises(TypeError, self.panel.__setitem__, 'foo', self.panel.loc[['ItemP']]) - # bad shape - p = Panel(np.random.randn(4, 3, 2)) - with tm.assertRaisesRegexp(ValueError, - r"shape of value must be \(3, 2\), " - r"shape of given object was \(4, 2\)"): - p[0] = np.random.randn(4, 2) + # bad shape + p = Panel(np.random.randn(4, 3, 2)) + with tm.assert_raises_regex(ValueError, + r"shape of value must be " + r"\(3, 2\), shape of given " + r"object was \(4, 2\)"): + p[0] = np.random.randn(4, 2) def test_setitem_ndarray(self): - timeidx = date_range(start=datetime(2009, 1, 1), - end=datetime(2009, 12, 31), - freq=MonthEnd()) - lons_coarse = np.linspace(-177.5, 177.5, 72) - lats_coarse = np.linspace(-87.5, 87.5, 36) - P = Panel(items=timeidx, major_axis=lons_coarse, - minor_axis=lats_coarse) - data = np.random.randn(72 * 36).reshape((72, 36)) - key = datetime(2009, 2, 28) - P[key] = data - - assert_almost_equal(P[key].values, data) + with catch_warnings(record=True): + timeidx = date_range(start=datetime(2009, 1, 1), + end=datetime(2009, 12, 31), + freq=MonthEnd()) + lons_coarse = np.linspace(-177.5, 177.5, 72) + lats_coarse = np.linspace(-87.5, 87.5, 36) + P = Panel(items=timeidx, major_axis=lons_coarse, + minor_axis=lats_coarse) + data = np.random.randn(72 * 36).reshape((72, 36)) + key = datetime(2009, 2, 28) + P[key] = data + + assert_almost_equal(P[key].values, data) def test_set_minor_major(self): - # GH 11014 - df1 = DataFrame(['a', 'a', 'a', np.nan, 'a', np.nan]) - df2 = DataFrame([1.0, np.nan, 1.0, np.nan, 1.0, 1.0]) - panel = Panel({'Item1': df1, 'Item2': df2}) - - newminor = notnull(panel.iloc[:, :, 0]) - panel.loc[:, :, 'NewMinor'] = newminor - assert_frame_equal(panel.loc[:, :, 'NewMinor'], - newminor.astype(object)) - - newmajor = notnull(panel.iloc[:, 0, :]) - panel.loc[:, 'NewMajor', :] = newmajor - assert_frame_equal(panel.loc[:, 'NewMajor', :], - newmajor.astype(object)) + with catch_warnings(record=True): + # GH 11014 + df1 = DataFrame(['a', 'a', 'a', np.nan, 'a', np.nan]) + df2 = DataFrame([1.0, np.nan, 1.0, np.nan, 1.0, 1.0]) + panel = Panel({'Item1': df1, 'Item2': df2}) + + newminor = notna(panel.iloc[:, :, 0]) + panel.loc[:, :, 'NewMinor'] = newminor + assert_frame_equal(panel.loc[:, :, 'NewMinor'], + newminor.astype(object)) + + newmajor = notna(panel.iloc[:, 0, :]) + panel.loc[:, 'NewMajor', :] = newmajor + assert_frame_equal(panel.loc[:, 'NewMajor', :], + newmajor.astype(object)) def test_major_xs(self): - ref = self.panel['ItemA'] + with catch_warnings(record=True): + ref = self.panel['ItemA'] - idx = self.panel.major_axis[5] - xs = self.panel.major_xs(idx) + idx = self.panel.major_axis[5] + xs = self.panel.major_xs(idx) - result = xs['ItemA'] - assert_series_equal(result, ref.xs(idx), check_names=False) - self.assertEqual(result.name, 'ItemA') + result = xs['ItemA'] + assert_series_equal(result, ref.xs(idx), check_names=False) + assert result.name == 'ItemA' - # not contained - idx = self.panel.major_axis[0] - BDay() - self.assertRaises(Exception, self.panel.major_xs, idx) + # not contained + idx = self.panel.major_axis[0] - BDay() + pytest.raises(Exception, self.panel.major_xs, idx) def test_major_xs_mixed(self): - self.panel['ItemD'] = 'foo' - xs = self.panel.major_xs(self.panel.major_axis[0]) - 
self.assertEqual(xs['ItemA'].dtype, np.float64) - self.assertEqual(xs['ItemD'].dtype, np.object_) + with catch_warnings(record=True): + self.panel['ItemD'] = 'foo' + xs = self.panel.major_xs(self.panel.major_axis[0]) + assert xs['ItemA'].dtype == np.float64 + assert xs['ItemD'].dtype == np.object_ def test_minor_xs(self): - ref = self.panel['ItemA'] + with catch_warnings(record=True): + ref = self.panel['ItemA'] - idx = self.panel.minor_axis[1] - xs = self.panel.minor_xs(idx) + idx = self.panel.minor_axis[1] + xs = self.panel.minor_xs(idx) - assert_series_equal(xs['ItemA'], ref[idx], check_names=False) + assert_series_equal(xs['ItemA'], ref[idx], check_names=False) - # not contained - self.assertRaises(Exception, self.panel.minor_xs, 'E') + # not contained + pytest.raises(Exception, self.panel.minor_xs, 'E') def test_minor_xs_mixed(self): - self.panel['ItemD'] = 'foo' + with catch_warnings(record=True): + self.panel['ItemD'] = 'foo' - xs = self.panel.minor_xs('D') - self.assertEqual(xs['ItemA'].dtype, np.float64) - self.assertEqual(xs['ItemD'].dtype, np.object_) + xs = self.panel.minor_xs('D') + assert xs['ItemA'].dtype == np.float64 + assert xs['ItemD'].dtype == np.object_ def test_xs(self): - itemA = self.panel.xs('ItemA', axis=0) - expected = self.panel['ItemA'] - assert_frame_equal(itemA, expected) + with catch_warnings(record=True): + itemA = self.panel.xs('ItemA', axis=0) + expected = self.panel['ItemA'] + tm.assert_frame_equal(itemA, expected) - # get a view by default - itemA_view = self.panel.xs('ItemA', axis=0) - itemA_view.values[:] = np.nan - self.assertTrue(np.isnan(self.panel['ItemA'].values).all()) + # Get a view by default. + itemA_view = self.panel.xs('ItemA', axis=0) + itemA_view.values[:] = np.nan - # mixed-type yields a copy - self.panel['strings'] = 'foo' - result = self.panel.xs('D', axis=2) - self.assertIsNotNone(result.is_copy) + assert np.isnan(self.panel['ItemA'].values).all() + + # Mixed-type yields a copy. 
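The two branches of this test capture a subtlety worth spelling out: while the data holds a single dtype, a cross-section can be served as a view onto the underlying block, so writes propagate back to the parent; once the `'strings'` item forces a second block, the same operation must materialize a copy. The storage rule is roughly the same as NumPy's basic-versus-advanced indexing (an analogy only, not the pandas code path):

```python
# Rough NumPy analogy for the view-vs-copy behavior asserted above.
import numpy as np

arr = np.arange(60.0).reshape(3, 4, 5)   # items x major x minor, Panel-like

view = arr[0]                 # basic indexing returns a view
view[:] = -1.0
assert (arr[0] == -1.0).all()            # the parent sees the write

copied = arr[:, :, [0, 2]]    # advanced (list) indexing returns a copy
copied[:] = 99.0
assert not (arr[:, :, [0, 2]] == 99.0).any()   # the parent is untouched
```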
+ self.panel['strings'] = 'foo' + result = self.panel.xs('D', axis=2) + assert result.is_copy is not None def test_getitem_fancy_labels(self): - p = self.panel + with catch_warnings(record=True): + p = self.panel - items = p.items[[1, 0]] - dates = p.major_axis[::2] - cols = ['D', 'C', 'F'] + items = p.items[[1, 0]] + dates = p.major_axis[::2] + cols = ['D', 'C', 'F'] - # all 3 specified - assert_panel_equal(p.loc[items, dates, cols], - p.reindex(items=items, major=dates, minor=cols)) + # all 3 specified + assert_panel_equal(p.loc[items, dates, cols], + p.reindex(items=items, major=dates, minor=cols)) - # 2 specified - assert_panel_equal(p.loc[:, dates, cols], - p.reindex(major=dates, minor=cols)) + # 2 specified + assert_panel_equal(p.loc[:, dates, cols], + p.reindex(major=dates, minor=cols)) - assert_panel_equal(p.loc[items, :, cols], - p.reindex(items=items, minor=cols)) + assert_panel_equal(p.loc[items, :, cols], + p.reindex(items=items, minor=cols)) - assert_panel_equal(p.loc[items, dates, :], - p.reindex(items=items, major=dates)) + assert_panel_equal(p.loc[items, dates, :], + p.reindex(items=items, major=dates)) - # only 1 - assert_panel_equal(p.loc[items, :, :], p.reindex(items=items)) + # only 1 + assert_panel_equal(p.loc[items, :, :], p.reindex(items=items)) - assert_panel_equal(p.loc[:, dates, :], p.reindex(major=dates)) + assert_panel_equal(p.loc[:, dates, :], p.reindex(major=dates)) - assert_panel_equal(p.loc[:, :, cols], p.reindex(minor=cols)) + assert_panel_equal(p.loc[:, :, cols], p.reindex(minor=cols)) def test_getitem_fancy_slice(self): pass @@ -649,187 +676,195 @@ def test_getitem_fancy_xs(self): assert_series_equal(p.loc[:, date, col], p.major_xs(date).loc[col]) def test_getitem_fancy_xs_check_view(self): - item = 'ItemB' - date = self.panel.major_axis[5] - - # make sure it's always a view - NS = slice(None, None) - - # DataFrames - comp = assert_frame_equal - self._check_view(item, comp) - self._check_view((item, NS), comp) - self._check_view((item, NS, NS), comp) - self._check_view((NS, date), comp) - self._check_view((NS, date, NS), comp) - self._check_view((NS, NS, 'C'), comp) - - # Series - comp = assert_series_equal - self._check_view((item, date), comp) - self._check_view((item, date, NS), comp) - self._check_view((item, NS, 'C'), comp) - self._check_view((NS, date, 'C'), comp) + with catch_warnings(record=True): + item = 'ItemB' + date = self.panel.major_axis[5] + + # make sure it's always a view + NS = slice(None, None) + + # DataFrames + comp = assert_frame_equal + self._check_view(item, comp) + self._check_view((item, NS), comp) + self._check_view((item, NS, NS), comp) + self._check_view((NS, date), comp) + self._check_view((NS, date, NS), comp) + self._check_view((NS, NS, 'C'), comp) + + # Series + comp = assert_series_equal + self._check_view((item, date), comp) + self._check_view((item, date, NS), comp) + self._check_view((item, NS, 'C'), comp) + self._check_view((NS, date, 'C'), comp) def test_getitem_callable(self): - p = self.panel - # GH 12533 + with catch_warnings(record=True): + p = self.panel + # GH 12533 - assert_frame_equal(p[lambda x: 'ItemB'], p.loc['ItemB']) - assert_panel_equal(p[lambda x: ['ItemB', 'ItemC']], - p.loc[['ItemB', 'ItemC']]) + assert_frame_equal(p[lambda x: 'ItemB'], p.loc['ItemB']) + assert_panel_equal(p[lambda x: ['ItemB', 'ItemC']], + p.loc[['ItemB', 'ItemC']]) def test_ix_setitem_slice_dataframe(self): - a = Panel(items=[1, 2, 3], major_axis=[11, 22, 33], - minor_axis=[111, 222, 333]) - b = DataFrame(np.random.randn(2, 3), 
index=[111, 333], - columns=[1, 2, 3]) + with catch_warnings(record=True): + a = Panel(items=[1, 2, 3], major_axis=[11, 22, 33], + minor_axis=[111, 222, 333]) + b = DataFrame(np.random.randn(2, 3), index=[111, 333], + columns=[1, 2, 3]) - a.loc[:, 22, [111, 333]] = b + a.loc[:, 22, [111, 333]] = b - assert_frame_equal(a.loc[:, 22, [111, 333]], b) + assert_frame_equal(a.loc[:, 22, [111, 333]], b) def test_ix_align(self): - from pandas import Series - b = Series(np.random.randn(10), name=0) - b.sort() - df_orig = Panel(np.random.randn(3, 10, 2)) - df = df_orig.copy() + with catch_warnings(record=True): + from pandas import Series + b = Series(np.random.randn(10), name=0) + b.sort_values() + df_orig = Panel(np.random.randn(3, 10, 2)) + df = df_orig.copy() - df.loc[0, :, 0] = b - assert_series_equal(df.loc[0, :, 0].reindex(b.index), b) + df.loc[0, :, 0] = b + assert_series_equal(df.loc[0, :, 0].reindex(b.index), b) - df = df_orig.swapaxes(0, 1) - df.loc[:, 0, 0] = b - assert_series_equal(df.loc[:, 0, 0].reindex(b.index), b) + df = df_orig.swapaxes(0, 1) + df.loc[:, 0, 0] = b + assert_series_equal(df.loc[:, 0, 0].reindex(b.index), b) - df = df_orig.swapaxes(1, 2) - df.loc[0, 0, :] = b - assert_series_equal(df.loc[0, 0, :].reindex(b.index), b) + df = df_orig.swapaxes(1, 2) + df.loc[0, 0, :] = b + assert_series_equal(df.loc[0, 0, :].reindex(b.index), b) def test_ix_frame_align(self): - p_orig = tm.makePanel() - df = p_orig.iloc[0].copy() - assert_frame_equal(p_orig['ItemA'], df) - - p = p_orig.copy() - p.iloc[0, :, :] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.iloc[0] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.iloc[0, :, :] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.iloc[0] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.loc['ItemA'] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.loc['ItemA', :, :] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p['ItemA'] = df - assert_panel_equal(p, p_orig) - - p = p_orig.copy() - p.iloc[0, [0, 1, 3, 5], -2:] = df - out = p.iloc[0, [0, 1, 3, 5], -2:] - assert_frame_equal(out, df.iloc[[0, 1, 3, 5], [2, 3]]) - - # GH3830, panel assignent by values/frame - for dtype in ['float64', 'int64']: - - panel = Panel(np.arange(40).reshape((2, 4, 5)), - items=['a1', 'a2'], dtype=dtype) - df1 = panel.iloc[0] - df2 = panel.iloc[1] - - tm.assert_frame_equal(panel.loc['a1'], df1) - tm.assert_frame_equal(panel.loc['a2'], df2) - - # Assignment by Value Passes for 'a2' - panel.loc['a2'] = df1.values - tm.assert_frame_equal(panel.loc['a1'], df1) - tm.assert_frame_equal(panel.loc['a2'], df1) - - # Assignment by DataFrame Ok w/o loc 'a2' - panel['a2'] = df2 - tm.assert_frame_equal(panel.loc['a1'], df1) - tm.assert_frame_equal(panel.loc['a2'], df2) - - # Assignment by DataFrame Fails for 'a2' - panel.loc['a2'] = df2 - tm.assert_frame_equal(panel.loc['a1'], df1) - tm.assert_frame_equal(panel.loc['a2'], df2) + with catch_warnings(record=True): + p_orig = tm.makePanel() + df = p_orig.iloc[0].copy() + assert_frame_equal(p_orig['ItemA'], df) + + p = p_orig.copy() + p.iloc[0, :, :] = df + assert_panel_equal(p, p_orig) + + p = p_orig.copy() + p.iloc[0] = df + assert_panel_equal(p, p_orig) + + p = p_orig.copy() + p.iloc[0, :, :] = df + assert_panel_equal(p, p_orig) + + p = p_orig.copy() + p.iloc[0] = df + assert_panel_equal(p, p_orig) + + p = p_orig.copy() + p.loc['ItemA'] = df + assert_panel_equal(p, p_orig) + + p = p_orig.copy() + p.loc['ItemA', :, :] = df + assert_panel_equal(p, p_orig) 
+
+            p = p_orig.copy()
+            p['ItemA'] = df
+            assert_panel_equal(p, p_orig)
+
+            p = p_orig.copy()
+            p.iloc[0, [0, 1, 3, 5], -2:] = df
+            out = p.iloc[0, [0, 1, 3, 5], -2:]
+            assert_frame_equal(out, df.iloc[[0, 1, 3, 5], [2, 3]])
+
+            # GH3830, panel assignment by values/frame
+            for dtype in ['float64', 'int64']:
+
+                panel = Panel(np.arange(40).reshape((2, 4, 5)),
+                              items=['a1', 'a2'], dtype=dtype)
+                df1 = panel.iloc[0]
+                df2 = panel.iloc[1]
+
+                tm.assert_frame_equal(panel.loc['a1'], df1)
+                tm.assert_frame_equal(panel.loc['a2'], df2)
+
+                # Assignment by Value Passes for 'a2'
+                panel.loc['a2'] = df1.values
+                tm.assert_frame_equal(panel.loc['a1'], df1)
+                tm.assert_frame_equal(panel.loc['a2'], df1)
+
+                # Assignment by DataFrame Ok w/o loc 'a2'
+                panel['a2'] = df2
+                tm.assert_frame_equal(panel.loc['a1'], df1)
+                tm.assert_frame_equal(panel.loc['a2'], df2)
+
+                # Assignment by DataFrame Fails for 'a2'
+                panel.loc['a2'] = df2
+                tm.assert_frame_equal(panel.loc['a1'], df1)
+                tm.assert_frame_equal(panel.loc['a2'], df2)

     def _check_view(self, indexer, comp):
         cp = self.panel.copy()
         obj = cp.loc[indexer]
         obj.values[:] = 0
-        self.assertTrue((obj.values == 0).all())
+        assert (obj.values == 0).all()
         comp(cp.loc[indexer].reindex_like(obj), obj)

     def test_logical_with_nas(self):
-        d = Panel({'ItemA': {'a': [np.nan, False]},
-                   'ItemB': {'a': [True, True]}})
+        with catch_warnings(record=True):
+            d = Panel({'ItemA': {'a': [np.nan, False]},
+                       'ItemB': {'a': [True, True]}})

-        result = d['ItemA'] | d['ItemB']
-        expected = DataFrame({'a': [np.nan, True]})
-        assert_frame_equal(result, expected)
+            result = d['ItemA'] | d['ItemB']
+            expected = DataFrame({'a': [np.nan, True]})
+            assert_frame_equal(result, expected)

-        # this is autodowncasted here
-        result = d['ItemA'].fillna(False) | d['ItemB']
-        expected = DataFrame({'a': [True, True]})
-        assert_frame_equal(result, expected)
+            # this is autodowncasted here
+            result = d['ItemA'].fillna(False) | d['ItemB']
+            expected = DataFrame({'a': [True, True]})
+            assert_frame_equal(result, expected)

     def test_neg(self):
-        # what to do?
-        assert_panel_equal(-self.panel, -1 * self.panel)
+        with catch_warnings(record=True):
+            assert_panel_equal(-self.panel, -1 * self.panel)

     def test_invert(self):
-        assert_panel_equal(-(self.panel < 0), ~(self.panel < 0))
+        with catch_warnings(record=True):
+            assert_panel_equal(-(self.panel < 0), ~(self.panel < 0))

     def test_comparisons(self):
-        p1 = tm.makePanel()
-        p2 = tm.makePanel()
+        with catch_warnings(record=True):
+            p1 = tm.makePanel()
+            p2 = tm.makePanel()

-        tp = p1.reindex(items=p1.items + ['foo'])
-        df = p1[p1.items[0]]
+            tp = p1.reindex(items=p1.items + ['foo'])
+            df = p1[p1.items[0]]

-        def test_comp(func):
+            def test_comp(func):

-            # versus same index
-            result = func(p1, p2)
-            self.assert_numpy_array_equal(result.values,
-                                          func(p1.values, p2.values))
+                # versus same index
+                result = func(p1, p2)
+                tm.assert_numpy_array_equal(result.values,
+                                            func(p1.values, p2.values))

-            # versus non-indexed same objs
-            self.assertRaises(Exception, func, p1, tp)
+                # versus non-indexed same objs
+                pytest.raises(Exception, func, p1, tp)

-            # versus different objs
-            self.assertRaises(Exception, func, p1, df)
+                # versus different objs
+                pytest.raises(Exception, func, p1, df)

-            # versus scalar
-            result3 = func(self.panel, 0)
-            self.assert_numpy_array_equal(result3.values,
-                                          func(self.panel.values, 0))
+                # versus scalar
+                result3 = func(self.panel, 0)
+                tm.assert_numpy_array_equal(result3.values,
+                                            func(self.panel.values, 0))

-        with np.errstate(invalid='ignore'):
-            test_comp(operator.eq)
-            test_comp(operator.ne)
-            test_comp(operator.lt)
-            test_comp(operator.gt)
-            test_comp(operator.ge)
-            test_comp(operator.le)
+            with np.errstate(invalid='ignore'):
+                test_comp(operator.eq)
+                test_comp(operator.ne)
+                test_comp(operator.lt)
+                test_comp(operator.gt)
+                test_comp(operator.ge)
+                test_comp(operator.le)

     def test_get_value(self):
         for item in self.panel.items:
@@ -838,330 +873,356 @@ def test_get_value(self):
                 result = self.panel.get_value(item, mjr, mnr)
                 expected = self.panel[item][mnr][mjr]
                 assert_almost_equal(result, expected)
-        with tm.assertRaisesRegexp(TypeError,
-                                   "There must be an argument for each axis"):
+        with tm.assert_raises_regex(TypeError,
+                                    "There must be an argument "
+                                    "for each axis"):
             self.panel.get_value('a')

     def test_set_value(self):
-        for item in self.panel.items:
-            for mjr in self.panel.major_axis[::2]:
-                for mnr in self.panel.minor_axis:
-                    self.panel.set_value(item, mjr, mnr, 1.)
-                    assert_almost_equal(self.panel[item][mnr][mjr], 1.)
+        with catch_warnings(record=True):
+            for item in self.panel.items:
+                for mjr in self.panel.major_axis[::2]:
+                    for mnr in self.panel.minor_axis:
+                        self.panel.set_value(item, mjr, mnr, 1.)
+                        tm.assert_almost_equal(self.panel[item][mnr][mjr], 1.)

-        # resize
-        res = self.panel.set_value('ItemE', 'foo', 'bar', 1.5)
-        tm.assertIsInstance(res, Panel)
-        self.assertIsNot(res, self.panel)
-        self.assertEqual(res.get_value('ItemE', 'foo', 'bar'), 1.5)
+            # resize
+            res = self.panel.set_value('ItemE', 'foo', 'bar', 1.5)
+            assert isinstance(res, Panel)
+            assert res is not self.panel
+            assert res.get_value('ItemE', 'foo', 'bar') == 1.5

-        res3 = self.panel.set_value('ItemE', 'foobar', 'baz', 5)
-        self.assertTrue(is_float_dtype(res3['ItemE'].values))
-        with tm.assertRaisesRegexp(TypeError,
-                                   "There must be an argument for each axis"
-                                   " plus the value provided"):
-            self.panel.set_value('a')
+            res3 = self.panel.set_value('ItemE', 'foobar', 'baz', 5)
+            assert is_float_dtype(res3['ItemE'].values)
+            msg = ("There must be an argument for each "
+                   "axis plus the value provided")
+            with tm.assert_raises_regex(TypeError, msg):
+                self.panel.set_value('a')


-_panel = tm.makePanel()
-tm.add_nans(_panel)
-
-class TestPanel(tm.TestCase, PanelTests, CheckIndexing, SafeForLongAndSparse,
+class TestPanel(PanelTests, CheckIndexing, SafeForLongAndSparse,
                 SafeForSparse):
-    _multiprocess_can_split_ = True

     @classmethod
     def assert_panel_equal(cls, x, y):
         assert_panel_equal(x, y)

-    def setUp(self):
-        self.panel = _panel.copy()
+    def setup_method(self, method):
+        self.panel = make_test_panel()
         self.panel.major_axis.name = None
         self.panel.minor_axis.name = None
         self.panel.items.name = None

     def test_constructor(self):
-        # with BlockManager
-        wp = Panel(self.panel._data)
-        self.assertIs(wp._data, self.panel._data)
-
-        wp = Panel(self.panel._data, copy=True)
-        self.assertIsNot(wp._data, self.panel._data)
-        assert_panel_equal(wp, self.panel)
-
-        # strings handled prop
-        wp = Panel([[['foo', 'foo', 'foo', ], ['foo', 'foo', 'foo']]])
-        self.assertEqual(wp.values.dtype, np.object_)
-
-        vals = self.panel.values
-
-        # no copy
-        wp = Panel(vals)
-        self.assertIs(wp.values, vals)
-
-        # copy
-        wp = Panel(vals, copy=True)
-        self.assertIsNot(wp.values, vals)
-
-        # GH #8285, test when scalar data is used to construct a Panel
-        # if dtype is not passed, it should be inferred
-        value_and_dtype = [(1, 'int64'), (3.14, 'float64'),
-                           ('foo', np.object_)]
-        for (val, dtype) in value_and_dtype:
-            wp = Panel(val, items=range(2), major_axis=range(3),
-                       minor_axis=range(4))
-            vals = np.empty((2, 3, 4), dtype=dtype)
-            vals.fill(val)
-            assert_panel_equal(wp, Panel(vals, dtype=dtype))
-
-        # test the case when dtype is passed
-        wp = Panel(1, items=range(2), major_axis=range(3), minor_axis=range(4),
-                   dtype='float32')
-        vals = np.empty((2, 3, 4), dtype='float32')
-        vals.fill(1)
-        assert_panel_equal(wp, Panel(vals, dtype='float32'))
+        with catch_warnings(record=True):
+            # with BlockManager
+            wp = Panel(self.panel._data)
+            assert wp._data is self.panel._data
+
+            wp = Panel(self.panel._data, copy=True)
+            assert wp._data is not self.panel._data
+            tm.assert_panel_equal(wp, self.panel)
+
+            # strings handled properly
+            wp = Panel([[['foo', 'foo', 'foo', ], ['foo', 'foo', 'foo']]])
+            assert wp.values.dtype == np.object_
+
+            vals = self.panel.values
+
+            # no copy
+            wp = Panel(vals)
+            assert wp.values is vals
+
+            # copy
+            wp = Panel(vals, copy=True)
+            assert wp.values is not vals
+
+            # GH #8285, test when scalar data is used to construct a Panel
+            # if dtype is not passed, it should be inferred
+            value_and_dtype = [(1, 'int64'), (3.14, 'float64'),
+                               ('foo', np.object_)]
+            for (val, dtype) in value_and_dtype:
+                wp = Panel(val, items=range(2), major_axis=range(3),
+                           minor_axis=range(4))
+                vals = np.empty((2, 3, 4), dtype=dtype)
+                vals.fill(val)
+
+                tm.assert_panel_equal(wp, Panel(vals, dtype=dtype))
+
+            # test the case when dtype is passed
+            wp = Panel(1, items=range(2), major_axis=range(3),
+                       minor_axis=range(4),
+                       dtype='float32')
+            vals = np.empty((2, 3, 4), dtype='float32')
+            vals.fill(1)
+
+            tm.assert_panel_equal(wp, Panel(vals, dtype='float32'))

     def test_constructor_cast(self):
-        zero_filled = self.panel.fillna(0)
+        with catch_warnings(record=True):
+            zero_filled = self.panel.fillna(0)

-        casted = Panel(zero_filled._data, dtype=int)
-        casted2 = Panel(zero_filled.values, dtype=int)
+            casted = Panel(zero_filled._data, dtype=int)
+            casted2 = Panel(zero_filled.values, dtype=int)

-        exp_values = zero_filled.values.astype(int)
-        assert_almost_equal(casted.values, exp_values)
-        assert_almost_equal(casted2.values, exp_values)
+            exp_values = zero_filled.values.astype(int)
+            assert_almost_equal(casted.values, exp_values)
+            assert_almost_equal(casted2.values, exp_values)

-        casted = Panel(zero_filled._data, dtype=np.int32)
-        casted2 = Panel(zero_filled.values, dtype=np.int32)
+            casted = Panel(zero_filled._data, dtype=np.int32)
+            casted2 = Panel(zero_filled.values, dtype=np.int32)

-        exp_values = zero_filled.values.astype(np.int32)
-        assert_almost_equal(casted.values, exp_values)
-        assert_almost_equal(casted2.values, exp_values)
+            exp_values = zero_filled.values.astype(np.int32)
+            assert_almost_equal(casted.values, exp_values)
+            assert_almost_equal(casted2.values, exp_values)

-        # can't cast
-        data = [[['foo', 'bar', 'baz']]]
-        self.assertRaises(ValueError, Panel, data, dtype=float)
+            # can't cast
+            data = [[['foo', 'bar', 'baz']]]
+            pytest.raises(ValueError, Panel, data, dtype=float)

     def test_constructor_empty_panel(self):
-        empty = Panel()
-        self.assertEqual(len(empty.items), 0)
-        self.assertEqual(len(empty.major_axis), 0)
-        self.assertEqual(len(empty.minor_axis), 0)
+        with catch_warnings(record=True):
+            empty = Panel()
+            assert len(empty.items) == 0
+            assert len(empty.major_axis) == 0
+            assert len(empty.minor_axis) == 0

     def test_constructor_observe_dtype(self):
-        # GH #411
-        panel = Panel(items=lrange(3), major_axis=lrange(3),
-                      minor_axis=lrange(3), dtype='O')
-        self.assertEqual(panel.values.dtype, np.object_)
+        with catch_warnings(record=True):
+            # GH #411
+            panel = Panel(items=lrange(3), major_axis=lrange(3),
+                          minor_axis=lrange(3), dtype='O')
+            assert panel.values.dtype == np.object_

     def test_constructor_dtypes(self):
-        # GH #797
-
-        def _check_dtype(panel, dtype):
-            for i in panel.items:
-                self.assertEqual(panel[i].values.dtype.name, dtype)
-
-        # only nan holding types allowed here
-        for dtype in ['float64', 'float32', 'object']:
-            panel = Panel(items=lrange(2), major_axis=lrange(10),
-                          minor_axis=lrange(5), dtype=dtype)
-            _check_dtype(panel, dtype)
-
-        for dtype in ['float64', 'float32', 'int64', 'int32', 'object']:
-            panel = Panel(np.array(np.random.randn(2, 10, 5), dtype=dtype),
-                          items=lrange(2),
-                          major_axis=lrange(10),
-                          minor_axis=lrange(5), dtype=dtype)
-            _check_dtype(panel, dtype)
-
-        for dtype in ['float64', 'float32', 'int64', 'int32', 'object']:
-            panel = Panel(np.array(np.random.randn(2, 10, 5), dtype='O'),
-                          items=lrange(2),
-                          major_axis=lrange(10),
-                          minor_axis=lrange(5), dtype=dtype)
-            _check_dtype(panel, dtype)
-
-        for dtype in ['float64', 'float32', 'int64', 'int32', 'object']:
-            panel = Panel(np.random.randn(2, 10, 5), items=lrange(
-                2), major_axis=lrange(10), minor_axis=lrange(5), dtype=dtype)
-            _check_dtype(panel, dtype)
-
-        for dtype in ['float64', 'float32', 'int64', 'int32', 'object']:
-            df1 = DataFrame(np.random.randn(2, 5),
-                            index=lrange(2), columns=lrange(5))
-            df2 = DataFrame(np.random.randn(2, 5),
-                            index=lrange(2), columns=lrange(5))
-            panel = Panel.from_dict({'a': df1, 'b': df2}, dtype=dtype)
-            _check_dtype(panel, dtype)
+        with catch_warnings(record=True):
+            # GH #797
+
+            def _check_dtype(panel, dtype):
+                for i in panel.items:
+                    assert panel[i].values.dtype.name == dtype
+
+            # only nan holding types allowed here
+            for dtype in ['float64', 'float32', 'object']:
+                panel = Panel(items=lrange(2), major_axis=lrange(10),
+                              minor_axis=lrange(5), dtype=dtype)
+                _check_dtype(panel, dtype)
+
+            for dtype in ['float64', 'float32', 'int64', 'int32', 'object']:
+                panel = Panel(np.array(np.random.randn(2, 10, 5),
+                                       dtype=dtype),
+                              items=lrange(2),
+                              major_axis=lrange(10),
+                              minor_axis=lrange(5), dtype=dtype)
+                _check_dtype(panel, dtype)
+
+            for dtype in ['float64', 'float32', 'int64', 'int32', 'object']:
+                panel = Panel(np.array(np.random.randn(2, 10, 5), dtype='O'),
+                              items=lrange(2),
+                              major_axis=lrange(10),
+                              minor_axis=lrange(5), dtype=dtype)
+                _check_dtype(panel, dtype)
+
+            for dtype in ['float64', 'float32', 'int64', 'int32', 'object']:
+                panel = Panel(
+                    np.random.randn(2, 10, 5),
+                    items=lrange(2), major_axis=lrange(10),
+                    minor_axis=lrange(5),
+                    dtype=dtype)
+                _check_dtype(panel, dtype)
+
+            for dtype in ['float64', 'float32', 'int64', 'int32', 'object']:
+                df1 = DataFrame(np.random.randn(2, 5),
+                                index=lrange(2), columns=lrange(5))
+                df2 = DataFrame(np.random.randn(2, 5),
+                                index=lrange(2), columns=lrange(5))
+                panel = Panel.from_dict({'a': df1, 'b': df2}, dtype=dtype)
+                _check_dtype(panel, dtype)

     def test_constructor_fails_with_not_3d_input(self):
-        with tm.assertRaisesRegexp(ValueError,
-                                   "The number of dimensions required is 3"):
-            Panel(np.random.randn(10, 2))
+        with catch_warnings(record=True):
+            with tm.assert_raises_regex(ValueError, "The number of dimensions required is 3"):  # noqa
+                Panel(np.random.randn(10, 2))

     def test_consolidate(self):
-        self.assertTrue(self.panel._data.is_consolidated())
+        with catch_warnings(record=True):
+            assert self.panel._data.is_consolidated()

-        self.panel['foo'] = 1.
-        self.assertFalse(self.panel._data.is_consolidated())
+            self.panel['foo'] = 1.
+            assert not self.panel._data.is_consolidated()

-        panel = self.panel.consolidate()
-        self.assertTrue(panel._data.is_consolidated())
+            panel = self.panel._consolidate()
+            assert panel._data.is_consolidated()

     def test_ctor_dict(self):
-        itema = self.panel['ItemA']
-        itemb = self.panel['ItemB']
+        with catch_warnings(record=True):
+            itema = self.panel['ItemA']
+            itemb = self.panel['ItemB']

-        d = {'A': itema, 'B': itemb[5:]}
-        d2 = {'A': itema._series, 'B': itemb[5:]._series}
-        d3 = {'A': None,
-              'B': DataFrame(itemb[5:]._series),
-              'C': DataFrame(itema._series)}
+            d = {'A': itema, 'B': itemb[5:]}
+            d2 = {'A': itema._series, 'B': itemb[5:]._series}
+            d3 = {'A': None,
+                  'B': DataFrame(itemb[5:]._series),
+                  'C': DataFrame(itema._series)}

-        wp = Panel.from_dict(d)
-        wp2 = Panel.from_dict(d2)  # nested Dict
+            wp = Panel.from_dict(d)
+            wp2 = Panel.from_dict(d2)  # nested Dict

-        # TODO: unused?
-        wp3 = Panel.from_dict(d3)  # noqa
+            # TODO: unused?
+            wp3 = Panel.from_dict(d3)  # noqa

-        self.assert_index_equal(wp.major_axis, self.panel.major_axis)
-        assert_panel_equal(wp, wp2)
+            tm.assert_index_equal(wp.major_axis, self.panel.major_axis)
+            assert_panel_equal(wp, wp2)

-        # intersect
-        wp = Panel.from_dict(d, intersect=True)
-        self.assert_index_equal(wp.major_axis, itemb.index[5:])
+            # intersect
+            wp = Panel.from_dict(d, intersect=True)
+            tm.assert_index_equal(wp.major_axis, itemb.index[5:])

-        # use constructor
-        assert_panel_equal(Panel(d), Panel.from_dict(d))
-        assert_panel_equal(Panel(d2), Panel.from_dict(d2))
-        assert_panel_equal(Panel(d3), Panel.from_dict(d3))
+            # use constructor
+            assert_panel_equal(Panel(d), Panel.from_dict(d))
+            assert_panel_equal(Panel(d2), Panel.from_dict(d2))
+            assert_panel_equal(Panel(d3), Panel.from_dict(d3))

-        # a pathological case
-        d4 = {'A': None, 'B': None}
+            # a pathological case
+            d4 = {'A': None, 'B': None}

-        # TODO: unused?
-        wp4 = Panel.from_dict(d4)  # noqa
+            # TODO: unused?
+            wp4 = Panel.from_dict(d4)  # noqa

-        assert_panel_equal(Panel(d4), Panel(items=['A', 'B']))
+            assert_panel_equal(Panel(d4), Panel(items=['A', 'B']))

-        # cast
-        dcasted = dict((k, v.reindex(wp.major_axis).fillna(0))
-                       for k, v in compat.iteritems(d))
-        result = Panel(dcasted, dtype=int)
-        expected = Panel(dict((k, v.astype(int))
-                              for k, v in compat.iteritems(dcasted)))
-        assert_panel_equal(result, expected)
+            # cast
+            dcasted = dict((k, v.reindex(wp.major_axis).fillna(0))
+                           for k, v in compat.iteritems(d))
+            result = Panel(dcasted, dtype=int)
+            expected = Panel(dict((k, v.astype(int))
+                                  for k, v in compat.iteritems(dcasted)))
+            assert_panel_equal(result, expected)

-        result = Panel(dcasted, dtype=np.int32)
-        expected = Panel(dict((k, v.astype(np.int32))
-                              for k, v in compat.iteritems(dcasted)))
-        assert_panel_equal(result, expected)
+            result = Panel(dcasted, dtype=np.int32)
+            expected = Panel(dict((k, v.astype(np.int32))
+                                  for k, v in compat.iteritems(dcasted)))
+            assert_panel_equal(result, expected)

     def test_constructor_dict_mixed(self):
-        data = dict((k, v.values) for k, v in self.panel.iteritems())
-        result = Panel(data)
-        exp_major = Index(np.arange(len(self.panel.major_axis)))
-        self.assert_index_equal(result.major_axis, exp_major)
+        with catch_warnings(record=True):
+            data = dict((k, v.values) for k, v in self.panel.iteritems())
+            result = Panel(data)
+            exp_major = Index(np.arange(len(self.panel.major_axis)))
+            tm.assert_index_equal(result.major_axis, exp_major)

-        result = Panel(data, items=self.panel.items,
-                       major_axis=self.panel.major_axis,
-                       minor_axis=self.panel.minor_axis)
-        assert_panel_equal(result, self.panel)
+            result = Panel(data, items=self.panel.items,
+                           major_axis=self.panel.major_axis,
+                           minor_axis=self.panel.minor_axis)
+            assert_panel_equal(result, self.panel)

-        data['ItemC'] = self.panel['ItemC']
-        result = Panel(data)
-        assert_panel_equal(result, self.panel)
+            data['ItemC'] = self.panel['ItemC']
+            result = Panel(data)
+            assert_panel_equal(result, self.panel)

-        # corner, blow up
-        data['ItemB'] = data['ItemB'][:-1]
-        self.assertRaises(Exception, Panel, data)
+            # corner, blow up
+            data['ItemB'] = data['ItemB'][:-1]
+            pytest.raises(Exception, Panel, data)

-        data['ItemB'] = self.panel['ItemB'].values[:, :-1]
-        self.assertRaises(Exception, Panel, data)
+            data['ItemB'] = self.panel['ItemB'].values[:, :-1]
+            pytest.raises(Exception, Panel, data)

     def test_ctor_orderedDict(self):
-        keys = list(set(np.random.randint(0, 5000, 100)))[
-            :50]  # unique random int keys
-        d = OrderedDict([(k, mkdf(10, 5)) for k in keys])
-        p = Panel(d)
-        self.assertTrue(list(p.items) == keys)
+        with catch_warnings(record=True):
+            keys = list(set(np.random.randint(0, 5000, 100)))[
+                :50]  # unique random int keys
+            d = OrderedDict([(k, mkdf(10, 5)) for k in keys])
+            p = Panel(d)
+            assert list(p.items) == keys

-        p = Panel.from_dict(d)
-        self.assertTrue(list(p.items) == keys)
+            p = Panel.from_dict(d)
+            assert list(p.items) == keys

     def test_constructor_resize(self):
-        data = self.panel._data
-        items = self.panel.items[:-1]
-        major = self.panel.major_axis[:-1]
-        minor = self.panel.minor_axis[:-1]
-
-        result = Panel(data, items=items, major_axis=major, minor_axis=minor)
-        expected = self.panel.reindex(items=items, major=major, minor=minor)
-        assert_panel_equal(result, expected)
+        with catch_warnings(record=True):
+            data = self.panel._data
+            items = self.panel.items[:-1]
+            major = self.panel.major_axis[:-1]
+            minor = self.panel.minor_axis[:-1]
+
+            result = Panel(data, items=items,
+                           major_axis=major, minor_axis=minor)
+            expected = self.panel.reindex(
+                items=items, major=major, minor=minor)
+            assert_panel_equal(result, expected)

-        result = Panel(data, items=items, major_axis=major)
-        expected = self.panel.reindex(items=items, major=major)
-        assert_panel_equal(result, expected)
+            result = Panel(data, items=items, major_axis=major)
+            expected = self.panel.reindex(items=items, major=major)
+            assert_panel_equal(result, expected)

-        result = Panel(data, items=items)
-        expected = self.panel.reindex(items=items)
-        assert_panel_equal(result, expected)
+            result = Panel(data, items=items)
+            expected = self.panel.reindex(items=items)
+            assert_panel_equal(result, expected)

-        result = Panel(data, minor_axis=minor)
-        expected = self.panel.reindex(minor=minor)
-        assert_panel_equal(result, expected)
+            result = Panel(data, minor_axis=minor)
+            expected = self.panel.reindex(minor=minor)
+            assert_panel_equal(result, expected)

     def test_from_dict_mixed_orient(self):
-        df = tm.makeDataFrame()
-        df['foo'] = 'bar'
+        with catch_warnings(record=True):
+            df = tm.makeDataFrame()
+            df['foo'] = 'bar'

-        data = {'k1': df, 'k2': df}
+            data = {'k1': df, 'k2': df}

-        panel = Panel.from_dict(data, orient='minor')
+            panel = Panel.from_dict(data, orient='minor')

-        self.assertEqual(panel['foo'].values.dtype, np.object_)
-        self.assertEqual(panel['A'].values.dtype, np.float64)
+            assert panel['foo'].values.dtype == np.object_
+            assert panel['A'].values.dtype == np.float64

     def test_constructor_error_msgs(self):
-        def testit():
-            Panel(np.random.randn(3, 4, 5), lrange(4), lrange(5), lrange(5))
-
-        assertRaisesRegexp(ValueError,
-                           r"Shape of passed values is \(3, 4, 5\), "
-                           r"indices imply \(4, 5, 5\)",
-                           testit)
-
-        def testit():
-            Panel(np.random.randn(3, 4, 5), lrange(5), lrange(4), lrange(5))
-
-        assertRaisesRegexp(ValueError,
-                           r"Shape of passed values is \(3, 4, 5\), "
-                           r"indices imply \(5, 4, 5\)",
-                           testit)
-
-        def testit():
-            Panel(np.random.randn(3, 4, 5), lrange(5), lrange(5), lrange(4))
-
-        assertRaisesRegexp(ValueError,
-                           r"Shape of passed values is \(3, 4, 5\), "
-                           r"indices imply \(5, 5, 4\)",
-                           testit)
+        with catch_warnings(record=True):
+            def testit():
+                Panel(np.random.randn(3, 4, 5),
+                      lrange(4), lrange(5), lrange(5))
+
+            tm.assert_raises_regex(ValueError,
+                                   r"Shape of passed values is "
+                                   r"\(3, 4, 5\), indices imply "
+                                   r"\(4, 5, 5\)",
+                                   testit)
+
+            def testit():
+                Panel(np.random.randn(3, 4, 5),
+                      lrange(5), lrange(4), lrange(5))
+
+            tm.assert_raises_regex(ValueError,
+                                   r"Shape of passed values is "
+                                   r"\(3, 4, 5\), indices imply "
+                                   r"\(5, 4, 5\)",
+                                   testit)
+
+            def testit():
+                Panel(np.random.randn(3, 4, 5),
+                      lrange(5), lrange(5), lrange(4))
+
+            tm.assert_raises_regex(ValueError,
+                                   r"Shape of passed values is "
+                                   r"\(3, 4, 5\), indices imply "
+                                   r"\(5, 5, 4\)",
+                                   testit)

     def test_conform(self):
-        df = self.panel['ItemA'][:-5].filter(items=['A', 'B'])
-        conformed = self.panel.conform(df)
+        with catch_warnings(record=True):
+            df = self.panel['ItemA'][:-5].filter(items=['A', 'B'])
+            conformed = self.panel.conform(df)

-        tm.assert_index_equal(conformed.index, self.panel.major_axis)
-        tm.assert_index_equal(conformed.columns, self.panel.minor_axis)
+            tm.assert_index_equal(conformed.index, self.panel.major_axis)
+            tm.assert_index_equal(conformed.columns, self.panel.minor_axis)

     def test_convert_objects(self):
+        with catch_warnings(record=True):

-        # GH 4937
-        p = Panel(dict(A=dict(a=['1', '1.0'])))
-        expected = Panel(dict(A=dict(a=[1, 1.0])))
-        result = p._convert(numeric=True, coerce=True)
-        assert_panel_equal(result, expected)
+            # GH 4937
+            p = Panel(dict(A=dict(a=['1', '1.0'])))
+            expected = Panel(dict(A=dict(a=[1, 1.0])))
+            result = p._convert(numeric=True, coerce=True)
+            assert_panel_equal(result, expected)

     def test_dtypes(self):

@@ -1170,875 +1231,940 @@ def test_dtypes(self):
         assert_series_equal(result, expected)

     def test_astype(self):
-        # GH7271
-        data = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
-        panel = Panel(data, ['a', 'b'], ['c', 'd'], ['e', 'f'])
+        with catch_warnings(record=True):
+            # GH7271
+            data = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
+            panel = Panel(data, ['a', 'b'], ['c', 'd'], ['e', 'f'])

-        str_data = np.array([[['1', '2'], ['3', '4']],
-                             [['5', '6'], ['7', '8']]])
-        expected = Panel(str_data, ['a', 'b'], ['c', 'd'], ['e', 'f'])
-        assert_panel_equal(panel.astype(str), expected)
+            str_data = np.array([[['1', '2'], ['3', '4']],
+                                 [['5', '6'], ['7', '8']]])
+            expected = Panel(str_data, ['a', 'b'], ['c', 'd'], ['e', 'f'])
+            assert_panel_equal(panel.astype(str), expected)

-        self.assertRaises(NotImplementedError, panel.astype, {0: str})
+            pytest.raises(NotImplementedError, panel.astype, {0: str})

     def test_apply(self):
-        # GH1148
-
-        # ufunc
-        applied = self.panel.apply(np.sqrt)
-        with np.errstate(invalid='ignore'):
-            expected = np.sqrt(self.panel.values)
-        assert_almost_equal(applied.values, expected)
-
-        # ufunc same shape
-        result = self.panel.apply(lambda x: x * 2, axis='items')
-        expected = self.panel * 2
-        assert_panel_equal(result, expected)
-        result = self.panel.apply(lambda x: x * 2, axis='major_axis')
-        expected = self.panel * 2
-        assert_panel_equal(result, expected)
-        result = self.panel.apply(lambda x: x * 2, axis='minor_axis')
-        expected = self.panel * 2
-        assert_panel_equal(result, expected)
-
-        # reduction to DataFrame
-        result = self.panel.apply(lambda x: x.dtype, axis='items')
-        expected = DataFrame(np.dtype('float64'), index=self.panel.major_axis,
-                             columns=self.panel.minor_axis)
-        assert_frame_equal(result, expected)
-        result = self.panel.apply(lambda x: x.dtype, axis='major_axis')
-        expected = DataFrame(np.dtype('float64'), index=self.panel.minor_axis,
-                             columns=self.panel.items)
-        assert_frame_equal(result, expected)
-        result = self.panel.apply(lambda x: x.dtype, axis='minor_axis')
-        expected = DataFrame(np.dtype('float64'), index=self.panel.major_axis,
-                             columns=self.panel.items)
-        assert_frame_equal(result, expected)
-
-        # reductions via other dims
-        expected = self.panel.sum(0)
-        result = self.panel.apply(lambda x: x.sum(), axis='items')
-        assert_frame_equal(result, expected)
-        expected = self.panel.sum(1)
-        result = self.panel.apply(lambda x: x.sum(), axis='major_axis')
-        assert_frame_equal(result, expected)
-        expected = self.panel.sum(2)
-        result = self.panel.apply(lambda x: x.sum(), axis='minor_axis')
-        assert_frame_equal(result, expected)
+        with catch_warnings(record=True):
+            # GH1148
+
+            # ufunc
+            applied = self.panel.apply(np.sqrt)
+            with np.errstate(invalid='ignore'):
+                expected = np.sqrt(self.panel.values)
+            assert_almost_equal(applied.values, expected)
+
+            # ufunc same shape
+            result = self.panel.apply(lambda x: x * 2, axis='items')
+            expected = self.panel * 2
+            assert_panel_equal(result, expected)
+            result = self.panel.apply(lambda x: x * 2, axis='major_axis')
+            expected = self.panel * 2
+            assert_panel_equal(result, expected)
+            result = self.panel.apply(lambda x: x * 2, axis='minor_axis')
+            expected = self.panel * 2
+            assert_panel_equal(result, expected)

-        # pass kwargs
-        result = self.panel.apply(lambda x, y: x.sum() + y, axis='items', y=5)
-        expected = self.panel.sum(0) + 5
-        assert_frame_equal(result, expected)
+            # reduction to DataFrame
+            result = self.panel.apply(lambda x: x.dtype, axis='items')
+            expected = DataFrame(np.dtype('float64'),
+                                 index=self.panel.major_axis,
+                                 columns=self.panel.minor_axis)
+            assert_frame_equal(result, expected)
+            result = self.panel.apply(lambda x: x.dtype, axis='major_axis')
+            expected = DataFrame(np.dtype('float64'),
+                                 index=self.panel.minor_axis,
+                                 columns=self.panel.items)
+            assert_frame_equal(result, expected)
+            result = self.panel.apply(lambda x: x.dtype, axis='minor_axis')
+            expected = DataFrame(np.dtype('float64'),
+                                 index=self.panel.major_axis,
+                                 columns=self.panel.items)
+            assert_frame_equal(result, expected)
+
+            # reductions via other dims
+            expected = self.panel.sum(0)
+            result = self.panel.apply(lambda x: x.sum(), axis='items')
+            assert_frame_equal(result, expected)
+            expected = self.panel.sum(1)
+            result = self.panel.apply(lambda x: x.sum(), axis='major_axis')
+            assert_frame_equal(result, expected)
+            expected = self.panel.sum(2)
+            result = self.panel.apply(lambda x: x.sum(), axis='minor_axis')
+            assert_frame_equal(result, expected)
+
+            # pass kwargs
+            result = self.panel.apply(
+                lambda x, y: x.sum() + y, axis='items', y=5)
+            expected = self.panel.sum(0) + 5
+            assert_frame_equal(result, expected)

     def test_apply_slabs(self):
+        with catch_warnings(record=True):

-        # same shape as original
-        result = self.panel.apply(lambda x: x * 2,
-                                  axis=['items', 'major_axis'])
-        expected = (self.panel * 2).transpose('minor_axis', 'major_axis',
-                                              'items')
-        assert_panel_equal(result, expected)
-        result = self.panel.apply(lambda x: x * 2,
-                                  axis=['major_axis', 'items'])
-        assert_panel_equal(result, expected)
-
-        result = self.panel.apply(lambda x: x * 2,
-                                  axis=['items', 'minor_axis'])
-        expected = (self.panel * 2).transpose('major_axis', 'minor_axis',
-                                              'items')
-        assert_panel_equal(result, expected)
-        result = self.panel.apply(lambda x: x * 2,
-                                  axis=['minor_axis', 'items'])
-        assert_panel_equal(result, expected)
-
-        result = self.panel.apply(lambda x: x * 2,
-                                  axis=['major_axis', 'minor_axis'])
-        expected = self.panel * 2
-        assert_panel_equal(result, expected)
-        result = self.panel.apply(lambda x: x * 2,
-                                  axis=['minor_axis', 'major_axis'])
-        assert_panel_equal(result, expected)
-
-        # reductions
-        result = self.panel.apply(lambda x: x.sum(0), axis=[
-            'items', 'major_axis'
-        ])
-        expected = self.panel.sum(1).T
-        assert_frame_equal(result, expected)
+            # same shape as original
+            result = self.panel.apply(lambda x: x * 2,
+                                      axis=['items', 'major_axis'])
+            expected = (self.panel * 2).transpose('minor_axis', 'major_axis',
+                                                  'items')
+            assert_panel_equal(result, expected)
+            result = self.panel.apply(lambda x: x * 2,
+                                      axis=['major_axis', 'items'])
+            assert_panel_equal(result, expected)

-        result = self.panel.apply(lambda x: x.sum(1), axis=[
-            'items', 'major_axis'
-        ])
-        expected = self.panel.sum(0)
-        assert_frame_equal(result, expected)
+            result = self.panel.apply(lambda x: x * 2,
+                                      axis=['items', 'minor_axis'])
+            expected = (self.panel * 2).transpose('major_axis', 'minor_axis',
+                                                  'items')
+            assert_panel_equal(result, expected)
+            result = self.panel.apply(lambda x: x * 2,
+                                      axis=['minor_axis', 'items'])
+            assert_panel_equal(result, expected)
+
+            result = self.panel.apply(lambda x: x * 2,
+                                      axis=['major_axis', 'minor_axis'])
+            expected = self.panel * 2
+            assert_panel_equal(result, expected)
+            result = self.panel.apply(lambda x: x * 2,
+                                      axis=['minor_axis', 'major_axis'])
+            assert_panel_equal(result, expected)

-        # transforms
-        f = lambda x: ((x.T - x.mean(1)) / x.std(1)).T
+            # reductions
+            result = self.panel.apply(lambda x: x.sum(0), axis=[
+                'items', 'major_axis'
+            ])
+            expected = self.panel.sum(1).T
+            assert_frame_equal(result, expected)

         # make sure that we don't trigger any warnings
-        with tm.assert_produces_warning(False):
+        with catch_warnings(record=True):
+            result = self.panel.apply(lambda x: x.sum(1), axis=[
+                'items', 'major_axis'
+            ])
+            expected = self.panel.sum(0)
+            assert_frame_equal(result, expected)
+
+            # transforms
+            f = lambda x: ((x.T - x.mean(1)) / x.std(1)).T
+
+            # make sure that we don't trigger any warnings
             result = self.panel.apply(f, axis=['items', 'major_axis'])
             expected = Panel(dict([(ax, f(self.panel.loc[:, :, ax]))
                                    for ax in self.panel.minor_axis]))
             assert_panel_equal(result, expected)

-        result = self.panel.apply(f, axis=['major_axis', 'minor_axis'])
-        expected = Panel(dict([(ax, f(self.panel.loc[ax]))
-                               for ax in self.panel.items]))
-        assert_panel_equal(result, expected)
-
-        result = self.panel.apply(f, axis=['minor_axis', 'items'])
-        expected = Panel(dict([(ax, f(self.panel.loc[:, ax]))
-                               for ax in self.panel.major_axis]))
-        assert_panel_equal(result, expected)
-
-        # with multi-indexes
-        # GH7469
-        index = MultiIndex.from_tuples([('one', 'a'), ('one', 'b'), (
-            'two', 'a'), ('two', 'b')])
-        dfa = DataFrame(np.array(np.arange(12, dtype='int64')).reshape(
-            4, 3), columns=list("ABC"), index=index)
-        dfb = DataFrame(np.array(np.arange(10, 22, dtype='int64')).reshape(
-            4, 3), columns=list("ABC"), index=index)
-        p = Panel({'f': dfa, 'g': dfb})
-        result = p.apply(lambda x: x.sum(), axis=0)
-
-        # on windows this will be in32
-        result = result.astype('int64')
-        expected = p.sum(0)
-        assert_frame_equal(result, expected)
+            result = self.panel.apply(f, axis=['major_axis', 'minor_axis'])
+            expected = Panel(dict([(ax, f(self.panel.loc[ax]))
+                                   for ax in self.panel.items]))
+            assert_panel_equal(result, expected)
+
+            result = self.panel.apply(f, axis=['minor_axis', 'items'])
+            expected = Panel(dict([(ax, f(self.panel.loc[:, ax]))
+                                   for ax in self.panel.major_axis]))
+            assert_panel_equal(result, expected)
+
+            # with multi-indexes
+            # GH7469
+            index = MultiIndex.from_tuples([('one', 'a'), ('one', 'b'), (
+                'two', 'a'), ('two', 'b')])
+            dfa = DataFrame(np.array(np.arange(12, dtype='int64')).reshape(
+                4, 3), columns=list("ABC"), index=index)
+            dfb = DataFrame(np.array(np.arange(10, 22, dtype='int64')).reshape(
+                4, 3), columns=list("ABC"), index=index)
+            p = Panel({'f': dfa, 'g': dfb})
+            result = p.apply(lambda x: x.sum(), axis=0)
+
+            # on windows this will be int32
+            result = result.astype('int64')
+            expected = p.sum(0)
+            assert_frame_equal(result, expected)

     def test_apply_no_or_zero_ndim(self):
-        # GH10332
-        self.panel = Panel(np.random.rand(5, 5, 5))
+        with catch_warnings(record=True):
+            # GH10332
+            self.panel = Panel(np.random.rand(5, 5, 5))

-        result_int = self.panel.apply(lambda df: 0, axis=[1, 2])
-        result_float = self.panel.apply(lambda df: 0.0, axis=[1, 2])
-        result_int64 = self.panel.apply(lambda df: np.int64(0), axis=[1, 2])
-        result_float64 = self.panel.apply(lambda df: np.float64(0.0),
-                                          axis=[1, 2])
+            result_int = self.panel.apply(lambda df: 0, axis=[1, 2])
+            result_float = self.panel.apply(lambda df: 0.0, axis=[1, 2])
+            result_int64 = self.panel.apply(
+                lambda df: np.int64(0), axis=[1, 2])
+            result_float64 = self.panel.apply(lambda df: np.float64(0.0),
+                                              axis=[1, 2])

-        expected_int = expected_int64 = Series([0] * 5)
-        expected_float = expected_float64 = Series([0.0] * 5)
+            expected_int = expected_int64 = Series([0] * 5)
+            expected_float = expected_float64 = Series([0.0] * 5)

-        assert_series_equal(result_int, expected_int)
-        assert_series_equal(result_int64, expected_int64)
-        assert_series_equal(result_float, expected_float)
-        assert_series_equal(result_float64, expected_float64)
+            assert_series_equal(result_int, expected_int)
+            assert_series_equal(result_int64, expected_int64)
+            assert_series_equal(result_float, expected_float)
+            assert_series_equal(result_float64, expected_float64)

     def test_reindex(self):
-        ref = self.panel['ItemB']
+        with catch_warnings(record=True):
+            ref = self.panel['ItemB']

-        # items
-        result = self.panel.reindex(items=['ItemA', 'ItemB'])
-        assert_frame_equal(result['ItemB'], ref)
+            # items
+            result = self.panel.reindex(items=['ItemA', 'ItemB'])
+            assert_frame_equal(result['ItemB'], ref)

-        # major
-        new_major = list(self.panel.major_axis[:10])
-        result = self.panel.reindex(major=new_major)
-        assert_frame_equal(result['ItemB'], ref.reindex(index=new_major))
+            # major
+            new_major = list(self.panel.major_axis[:10])
+            result = self.panel.reindex(major=new_major)
+            assert_frame_equal(result['ItemB'], ref.reindex(index=new_major))

-        # raise exception put both major and major_axis
-        self.assertRaises(Exception, self.panel.reindex, major_axis=new_major,
+            # raise exception when both major and major_axis are passed
+            pytest.raises(Exception, self.panel.reindex,
+                          major_axis=new_major,
                           major=new_major)

-        # minor
-        new_minor = list(self.panel.minor_axis[:2])
-        result = self.panel.reindex(minor=new_minor)
-        assert_frame_equal(result['ItemB'], ref.reindex(columns=new_minor))
+            # minor
+            new_minor = list(self.panel.minor_axis[:2])
+            result = self.panel.reindex(minor=new_minor)
+            assert_frame_equal(result['ItemB'], ref.reindex(columns=new_minor))

-        # this ok
-        result = self.panel.reindex()
-        assert_panel_equal(result, self.panel)
-        self.assertFalse(result is self.panel)
+            # this ok
+            result = self.panel.reindex()
+            assert_panel_equal(result, self.panel)
+            assert result is not self.panel

-        # with filling
-        smaller_major = self.panel.major_axis[::5]
-        smaller = self.panel.reindex(major=smaller_major)
+            # with filling
+            smaller_major = self.panel.major_axis[::5]
+            smaller = self.panel.reindex(major=smaller_major)

-        larger = smaller.reindex(major=self.panel.major_axis, method='pad')
+            larger = smaller.reindex(major=self.panel.major_axis,
+                                     method='pad')

-        assert_frame_equal(larger.major_xs(self.panel.major_axis[1]),
-                           smaller.major_xs(smaller_major[0]))
+            assert_frame_equal(larger.major_xs(self.panel.major_axis[1]),
+                               smaller.major_xs(smaller_major[0]))

-        # don't necessarily copy
-        result = self.panel.reindex(major=self.panel.major_axis, copy=False)
-        assert_panel_equal(result, self.panel)
-        self.assertTrue(result is self.panel)
+            # don't necessarily copy
+            result = self.panel.reindex(
+                major=self.panel.major_axis, copy=False)
+            assert_panel_equal(result, self.panel)
+            assert result is self.panel

     def test_reindex_multi(self):
-
-        # with and without copy full reindexing
-        result = self.panel.reindex(items=self.panel.items,
-                                    major=self.panel.major_axis,
-                                    minor=self.panel.minor_axis, copy=False)
-
-        self.assertIs(result.items, self.panel.items)
-        self.assertIs(result.major_axis, self.panel.major_axis)
-        self.assertIs(result.minor_axis, self.panel.minor_axis)
-
-        result = self.panel.reindex(items=self.panel.items,
-                                    major=self.panel.major_axis,
-                                    minor=self.panel.minor_axis, copy=False)
-        assert_panel_equal(result, self.panel)
-
-        # multi-axis indexing consistency
-        # GH 5900
-        df = DataFrame(np.random.randn(4, 3))
-        p = Panel({'Item1': df})
-        expected = Panel({'Item1': df})
-        expected['Item2'] = np.nan
-
-        items = ['Item1', 'Item2']
-        major_axis = np.arange(4)
-        minor_axis = np.arange(3)
-
-        results = []
-        results.append(p.reindex(items=items, major_axis=major_axis,
-                                 copy=True))
-        results.append(p.reindex(items=items, major_axis=major_axis,
-                                 copy=False))
-        results.append(p.reindex(items=items, minor_axis=minor_axis,
-                                 copy=True))
-        results.append(p.reindex(items=items, minor_axis=minor_axis,
-                                 copy=False))
-        results.append(p.reindex(items=items, major_axis=major_axis,
-                                 minor_axis=minor_axis, copy=True))
-        results.append(p.reindex(items=items, major_axis=major_axis,
-                                 minor_axis=minor_axis, copy=False))
-
-        for i, r in enumerate(results):
-            assert_panel_equal(expected, r)
+        with catch_warnings(record=True):
+
+            # with and without copy full reindexing
+            result = self.panel.reindex(
+                items=self.panel.items,
+                major=self.panel.major_axis,
+                minor=self.panel.minor_axis, copy=False)
+
+            assert result.items is self.panel.items
+            assert result.major_axis is self.panel.major_axis
+            assert result.minor_axis is self.panel.minor_axis
+
+            result = self.panel.reindex(
+                items=self.panel.items,
+                major=self.panel.major_axis,
+                minor=self.panel.minor_axis, copy=False)
+            assert_panel_equal(result, self.panel)
+
+            # multi-axis indexing consistency
+            # GH 5900
+            df = DataFrame(np.random.randn(4, 3))
+            p = Panel({'Item1': df})
+            expected = Panel({'Item1': df})
+            expected['Item2'] = np.nan
+
+            items = ['Item1', 'Item2']
+            major_axis = np.arange(4)
+            minor_axis = np.arange(3)
+
+            results = []
+            results.append(p.reindex(items=items, major_axis=major_axis,
+                                     copy=True))
+            results.append(p.reindex(items=items, major_axis=major_axis,
+                                     copy=False))
+            results.append(p.reindex(items=items, minor_axis=minor_axis,
+                                     copy=True))
+            results.append(p.reindex(items=items, minor_axis=minor_axis,
+                                     copy=False))
+            results.append(p.reindex(items=items, major_axis=major_axis,
+                                     minor_axis=minor_axis, copy=True))
+            results.append(p.reindex(items=items, major_axis=major_axis,
+                                     minor_axis=minor_axis, copy=False))
+
+            for i, r in enumerate(results):
+                assert_panel_equal(expected, r)

     def test_reindex_like(self):
-        # reindex_like
-        smaller = self.panel.reindex(items=self.panel.items[:-1],
-                                     major=self.panel.major_axis[:-1],
-                                     minor=self.panel.minor_axis[:-1])
-        smaller_like = self.panel.reindex_like(smaller)
-        assert_panel_equal(smaller, smaller_like)
+        with catch_warnings(record=True):
+            # reindex_like
+            smaller = self.panel.reindex(items=self.panel.items[:-1],
+                                         major=self.panel.major_axis[:-1],
+                                         minor=self.panel.minor_axis[:-1])
+            smaller_like = self.panel.reindex_like(smaller)
+            assert_panel_equal(smaller, smaller_like)

     def test_take(self):
-        # axis == 0
-        result = self.panel.take([2, 0, 1], axis=0)
-        expected = self.panel.reindex(items=['ItemC', 'ItemA', 'ItemB'])
-        assert_panel_equal(result, expected)
+        with catch_warnings(record=True):
+            # axis == 0
+            result = self.panel.take([2, 0, 1], axis=0)
+            expected = self.panel.reindex(items=['ItemC', 'ItemA', 'ItemB'])
+            assert_panel_equal(result, expected)

-        # axis >= 1
-        result = self.panel.take([3, 0, 1, 2], axis=2)
-        expected = self.panel.reindex(minor=['D', 'A', 'B', 'C'])
-        assert_panel_equal(result, expected)
+            # axis >= 1
+            result = self.panel.take([3, 0, 1, 2], axis=2)
+            expected = self.panel.reindex(minor=['D', 'A', 'B', 'C'])
+            assert_panel_equal(result, expected)

-        # neg indicies ok
-        expected = self.panel.reindex(minor=['D', 'D', 'B', 'C'])
-        result = self.panel.take([3, -1, 1, 2], axis=2)
-        assert_panel_equal(result, expected)
+            # negative indices ok
+            expected = self.panel.reindex(minor=['D', 'D', 'B', 'C'])
+            result = self.panel.take([3, -1, 1, 2], axis=2)
+            assert_panel_equal(result, expected)

-        self.assertRaises(Exception, self.panel.take, [4, 0, 1, 2], axis=2)
+            pytest.raises(Exception, self.panel.take, [4, 0, 1, 2], axis=2)

     def test_sort_index(self):
-        import random
-
-        ritems = list(self.panel.items)
-        rmajor = list(self.panel.major_axis)
-        rminor = list(self.panel.minor_axis)
-        random.shuffle(ritems)
-        random.shuffle(rmajor)
-        random.shuffle(rminor)
-
-        random_order = self.panel.reindex(items=ritems)
-        sorted_panel = random_order.sort_index(axis=0)
-        assert_panel_equal(sorted_panel, self.panel)
-
-        # descending
-        random_order = self.panel.reindex(items=ritems)
-        sorted_panel = random_order.sort_index(axis=0, ascending=False)
-        assert_panel_equal(sorted_panel,
-                           self.panel.reindex(items=self.panel.items[::-1]))
-
-        random_order = self.panel.reindex(major=rmajor)
-        sorted_panel = random_order.sort_index(axis=1)
-        assert_panel_equal(sorted_panel, self.panel)
-
-        random_order = self.panel.reindex(minor=rminor)
-        sorted_panel = random_order.sort_index(axis=2)
-        assert_panel_equal(sorted_panel, self.panel)
+        with catch_warnings(record=True):
+            import random
+
+            ritems = list(self.panel.items)
+            rmajor = list(self.panel.major_axis)
+            rminor = list(self.panel.minor_axis)
+            random.shuffle(ritems)
+            random.shuffle(rmajor)
+            random.shuffle(rminor)
+
+            random_order = self.panel.reindex(items=ritems)
+            sorted_panel = random_order.sort_index(axis=0)
+            assert_panel_equal(sorted_panel, self.panel)
+
+            # descending
+            random_order = self.panel.reindex(items=ritems)
+            sorted_panel = random_order.sort_index(axis=0, ascending=False)
+            assert_panel_equal(
+                sorted_panel,
+                self.panel.reindex(items=self.panel.items[::-1]))
+
+            random_order = self.panel.reindex(major=rmajor)
+            sorted_panel = random_order.sort_index(axis=1)
+            assert_panel_equal(sorted_panel, self.panel)
+
+            random_order = self.panel.reindex(minor=rminor)
+            sorted_panel = random_order.sort_index(axis=2)
+            assert_panel_equal(sorted_panel, self.panel)

     def test_fillna(self):
-        filled = self.panel.fillna(0)
-        self.assertTrue(np.isfinite(filled.values).all())
-
-        filled = self.panel.fillna(method='backfill')
-        assert_frame_equal(filled['ItemA'],
-                           self.panel['ItemA'].fillna(method='backfill'))
-
-        panel = self.panel.copy()
-        panel['str'] = 'foo'
-
-        filled = panel.fillna(method='backfill')
-        assert_frame_equal(filled['ItemA'],
-                           panel['ItemA'].fillna(method='backfill'))
-
-        empty = self.panel.reindex(items=[])
-        filled = empty.fillna(0)
-        assert_panel_equal(filled, empty)
-
-        self.assertRaises(ValueError, self.panel.fillna)
-        self.assertRaises(ValueError, self.panel.fillna, 5, method='ffill')
-
-        self.assertRaises(TypeError, self.panel.fillna, [1, 2])
-        self.assertRaises(TypeError, self.panel.fillna, (1, 2))
-
-        # limit not implemented when only value is specified
-        p = Panel(np.random.randn(3, 4, 5))
-        p.iloc[0:2, 0:2, 0:2] = np.nan
-        self.assertRaises(NotImplementedError, lambda: p.fillna(999, limit=1))
-
-        # Test in place fillNA
-        # Expected result
-        expected = Panel([[[0, 1], [2, 1]], [[10, 11], [12, 11]]],
-                         items=['a', 'b'], minor_axis=['x', 'y'],
-                         dtype=np.float64)
-        # method='ffill'
-        p1 = Panel([[[0, 1], [2, np.nan]], [[10, 11], [12, np.nan]]],
-                   items=['a', 'b'], minor_axis=['x', 'y'],
-                   dtype=np.float64)
-        p1.fillna(method='ffill', inplace=True)
-        assert_panel_equal(p1, expected)
-
-        # method='bfill'
-        p2 = Panel([[[0, np.nan], [2, 1]], [[10, np.nan], [12, 11]]],
-                   items=['a', 'b'], minor_axis=['x', 'y'], dtype=np.float64)
-        p2.fillna(method='bfill', inplace=True)
-        assert_panel_equal(p2, expected)
+        with catch_warnings(record=True):
+            filled = self.panel.fillna(0)
+            assert np.isfinite(filled.values).all()
+
+            filled = self.panel.fillna(method='backfill')
+            assert_frame_equal(filled['ItemA'],
+                               self.panel['ItemA'].fillna(method='backfill'))
+
+            panel = self.panel.copy()
+            panel['str'] = 'foo'
+
+            filled = panel.fillna(method='backfill')
+            assert_frame_equal(filled['ItemA'],
+                               panel['ItemA'].fillna(method='backfill'))
+
+            empty = self.panel.reindex(items=[])
+            filled = empty.fillna(0)
+            assert_panel_equal(filled, empty)
+
+            pytest.raises(ValueError, self.panel.fillna)
+            pytest.raises(ValueError, self.panel.fillna, 5, method='ffill')
+
+            pytest.raises(TypeError, self.panel.fillna, [1, 2])
+            pytest.raises(TypeError, self.panel.fillna, (1, 2))
+
+            # limit not implemented when only value is specified
+            p = Panel(np.random.randn(3, 4, 5))
+            p.iloc[0:2, 0:2, 0:2] = np.nan
+            pytest.raises(NotImplementedError,
+                          lambda: p.fillna(999, limit=1))
+
+            # Test in place fillNA
+            # Expected result
+            expected = Panel([[[0, 1], [2, 1]], [[10, 11], [12, 11]]],
+                             items=['a', 'b'], minor_axis=['x', 'y'],
+                             dtype=np.float64)
+            # method='ffill'
+            p1 = Panel([[[0, 1], [2, np.nan]], [[10, 11], [12, np.nan]]],
+                       items=['a', 'b'], minor_axis=['x', 'y'],
+                       dtype=np.float64)
+            p1.fillna(method='ffill', inplace=True)
+            assert_panel_equal(p1, expected)
+
+            # method='bfill'
+            p2 = Panel([[[0, np.nan], [2, 1]], [[10, np.nan], [12, 11]]],
+                       items=['a', 'b'], minor_axis=['x', 'y'],
+                       dtype=np.float64)
+            p2.fillna(method='bfill', inplace=True)
+            assert_panel_equal(p2, expected)

     def test_ffill_bfill(self):
-        assert_panel_equal(self.panel.ffill(),
-                           self.panel.fillna(method='ffill'))
-        assert_panel_equal(self.panel.bfill(),
-                           self.panel.fillna(method='bfill'))
+        with catch_warnings(record=True):
+            assert_panel_equal(self.panel.ffill(),
+                               self.panel.fillna(method='ffill'))
+            assert_panel_equal(self.panel.bfill(),
+                               self.panel.fillna(method='bfill'))

     def test_truncate_fillna_bug(self):
-        # #1823
-        result = self.panel.truncate(before=None, after=None, axis='items')
+        with catch_warnings(record=True):
+            # #1823
+            result = self.panel.truncate(before=None, after=None,
+                                         axis='items')

-        # it works!
-        result.fillna(value=0.0)
+            # it works!
+            result.fillna(value=0.0)

     def test_swapaxes(self):
-        result = self.panel.swapaxes('items', 'minor')
-        self.assertIs(result.items, self.panel.minor_axis)
+        with catch_warnings(record=True):
+            result = self.panel.swapaxes('items', 'minor')
+            assert result.items is self.panel.minor_axis

-        result = self.panel.swapaxes('items', 'major')
-        self.assertIs(result.items, self.panel.major_axis)
+            result = self.panel.swapaxes('items', 'major')
+            assert result.items is self.panel.major_axis

-        result = self.panel.swapaxes('major', 'minor')
-        self.assertIs(result.major_axis, self.panel.minor_axis)
+            result = self.panel.swapaxes('major', 'minor')
+            assert result.major_axis is self.panel.minor_axis

-        panel = self.panel.copy()
-        result = panel.swapaxes('major', 'minor')
-        panel.values[0, 0, 1] = np.nan
-        expected = panel.swapaxes('major', 'minor')
-        assert_panel_equal(result, expected)
+            panel = self.panel.copy()
+            result = panel.swapaxes('major', 'minor')
+            panel.values[0, 0, 1] = np.nan
+            expected = panel.swapaxes('major', 'minor')
+            assert_panel_equal(result, expected)

-        # this should also work
-        result = self.panel.swapaxes(0, 1)
-        self.assertIs(result.items, self.panel.major_axis)
+            # this should also work
+            result = self.panel.swapaxes(0, 1)
+            assert result.items is self.panel.major_axis

-        # this works, but return a copy
-        result = self.panel.swapaxes('items', 'items')
-        assert_panel_equal(self.panel, result)
-        self.assertNotEqual(id(self.panel), id(result))
+            # this works, but returns a copy
+            result = self.panel.swapaxes('items', 'items')
+            assert_panel_equal(self.panel, result)
+            assert id(self.panel) != id(result)

     def test_transpose(self):
-        result = self.panel.transpose('minor', 'major', 'items')
-        expected = self.panel.swapaxes('items', 'minor')
-        assert_panel_equal(result, expected)
+        with catch_warnings(record=True):
+            result = self.panel.transpose('minor', 'major', 'items')
+            expected = self.panel.swapaxes('items', 'minor')
+            assert_panel_equal(result, expected)

-        # test kwargs
-        result = self.panel.transpose(items='minor', major='major',
-                                      minor='items')
-        expected = self.panel.swapaxes('items', 'minor')
-        assert_panel_equal(result, expected)
+            # test kwargs
+            result = self.panel.transpose(items='minor', major='major',
+                                          minor='items')
+            expected = self.panel.swapaxes('items', 'minor')
+            assert_panel_equal(result, expected)

-        # text mixture of args
-        result = self.panel.transpose('minor', major='major', minor='items')
-        expected = self.panel.swapaxes('items', 'minor')
-        assert_panel_equal(result, expected)
+            # test mixture of args
+            result = self.panel.transpose(
+                'minor', major='major', minor='items')
+            expected = self.panel.swapaxes('items', 'minor')
+            assert_panel_equal(result, expected)

-        result = self.panel.transpose('minor', 'major', minor='items')
-        expected = self.panel.swapaxes('items', 'minor')
-        assert_panel_equal(result, expected)
+            result = self.panel.transpose('minor',
+                                          'major',
+                                          minor='items')
+            expected = self.panel.swapaxes('items', 'minor')
+            assert_panel_equal(result, expected)

-        # duplicate axes
-        with tm.assertRaisesRegexp(TypeError,
-                                   'not enough/duplicate arguments'):
-            self.panel.transpose('minor', maj='major', minor='items')
+            # duplicate axes
+            with tm.assert_raises_regex(TypeError,
+                                        'not enough/duplicate arguments'):
+                self.panel.transpose('minor', maj='major', minor='items')

-        with tm.assertRaisesRegexp(ValueError, 'repeated axis in transpose'):
-            self.panel.transpose('minor', 'major', major='minor',
-                                 minor='items')
+            with tm.assert_raises_regex(ValueError,
+                                        'repeated axis in transpose'):
+                self.panel.transpose('minor', 'major', major='minor',
+                                     minor='items')

-        result = self.panel.transpose(2, 1, 0)
-        assert_panel_equal(result, expected)
+            result = self.panel.transpose(2, 1, 0)
+            assert_panel_equal(result, expected)

-        result = self.panel.transpose('minor', 'items', 'major')
-        expected = self.panel.swapaxes('items', 'minor')
-        expected = expected.swapaxes('major', 'minor')
-        assert_panel_equal(result, expected)
+            result = self.panel.transpose('minor', 'items', 'major')
+            expected = self.panel.swapaxes('items', 'minor')
+            expected = expected.swapaxes('major', 'minor')
+            assert_panel_equal(result, expected)

-        result = self.panel.transpose(2, 0, 1)
-        assert_panel_equal(result, expected)
+            result = self.panel.transpose(2, 0, 1)
+            assert_panel_equal(result, expected)

-        self.assertRaises(ValueError, self.panel.transpose, 0, 0, 1)
+            pytest.raises(ValueError, self.panel.transpose, 0, 0, 1)

     def test_transpose_copy(self):
-        panel = self.panel.copy()
-        result = panel.transpose(2, 0, 1, copy=True)
-        expected = panel.swapaxes('items', 'minor')
-        expected = expected.swapaxes('major', 'minor')
-        assert_panel_equal(result, expected)
+        with catch_warnings(record=True):
+            panel = self.panel.copy()
+            result = panel.transpose(2, 0, 1, copy=True)
+            expected = panel.swapaxes('items', 'minor')
+            expected = expected.swapaxes('major', 'minor')
+            assert_panel_equal(result, expected)

-        panel.values[0, 1, 1] = np.nan
-        self.assertTrue(notnull(result.values[1, 0, 1]))
+            panel.values[0, 1, 1] = np.nan
+            assert notna(result.values[1, 0, 1])

     def test_to_frame(self):
-        # filtered
-        filtered = self.panel.to_frame()
-        expected = self.panel.to_frame().dropna(how='any')
-        assert_frame_equal(filtered, expected)
-
-        # unfiltered
-        unfiltered = self.panel.to_frame(filter_observations=False)
-        assert_panel_equal(unfiltered.to_panel(), self.panel)
-
-        # names
-        self.assertEqual(unfiltered.index.names, ('major', 'minor'))
-
-        # unsorted, round trip
-        df = self.panel.to_frame(filter_observations=False)
-        unsorted = df.take(np.random.permutation(len(df)))
-        pan = unsorted.to_panel()
-        assert_panel_equal(pan, self.panel)
-
-        # preserve original index names
-        df = DataFrame(np.random.randn(6, 2),
-                       index=[['a', 'a', 'b', 'b', 'c', 'c'],
-                              [0, 1, 0, 1, 0, 1]],
-                       columns=['one', 'two'])
-        df.index.names = ['foo', 'bar']
-        df.columns.name = 'baz'
-
-        rdf = df.to_panel().to_frame()
-        self.assertEqual(rdf.index.names, df.index.names)
-        self.assertEqual(rdf.columns.names, df.columns.names)
+        with catch_warnings(record=True):
+            # filtered
+            filtered = self.panel.to_frame()
+            expected = self.panel.to_frame().dropna(how='any')
+            assert_frame_equal(filtered, expected)
+
+            # unfiltered
+            unfiltered = self.panel.to_frame(filter_observations=False)
+            assert_panel_equal(unfiltered.to_panel(), self.panel)
+
+            # names
+            assert unfiltered.index.names == ('major', 'minor')
+
+            # unsorted, round trip
+            df = self.panel.to_frame(filter_observations=False)
+            unsorted = df.take(np.random.permutation(len(df)))
+            pan = unsorted.to_panel()
+            assert_panel_equal(pan, self.panel)
+
+            # preserve original index names
+            df = DataFrame(np.random.randn(6, 2),
+                           index=[['a', 'a', 'b', 'b', 'c', 'c'],
+                                  [0, 1, 0, 1, 0, 1]],
+                           columns=['one', 'two'])
+            df.index.names = ['foo', 'bar']
+            df.columns.name = 'baz'
+
+            rdf = df.to_panel().to_frame()
+            assert rdf.index.names == df.index.names
+            assert rdf.columns.names == df.columns.names

     def test_to_frame_mixed(self):
-        panel = self.panel.fillna(0)
-        panel['str'] = 'foo'
-        panel['bool'] = panel['ItemA'] > 0
-
-        lp = panel.to_frame()
-        wp = lp.to_panel()
-        self.assertEqual(wp['bool'].values.dtype, np.bool_)
-        # Previously, this was mutating the underlying index and changing its
-        # name
-        assert_frame_equal(wp['bool'], panel['bool'], check_names=False)
-
-        # GH 8704
-        # with categorical
-        df = panel.to_frame()
-        df['category'] = df['str'].astype('category')
-
-        # to_panel
-        # TODO: this converts back to object
-        p = df.to_panel()
-        expected = panel.copy()
-        expected['category'] = 'foo'
-        assert_panel_equal(p, expected)
+        with catch_warnings(record=True):
+            panel = self.panel.fillna(0)
+            panel['str'] = 'foo'
+            panel['bool'] = panel['ItemA'] > 0
+
+            lp = panel.to_frame()
+            wp = lp.to_panel()
+            assert wp['bool'].values.dtype == np.bool_
+            # Previously, this was mutating the underlying
+            # index and changing its name
+            assert_frame_equal(wp['bool'], panel['bool'], check_names=False)
+
+            # GH 8704
+            # with categorical
+            df = panel.to_frame()
+            df['category'] = df['str'].astype('category')
+
+            # to_panel
+            # TODO: this converts back to object
+            p = df.to_panel()
+            expected = panel.copy()
+            expected['category'] = 'foo'
+            assert_panel_equal(p, expected)

     def test_to_frame_multi_major(self):
-        idx = MultiIndex.from_tuples([(1, 'one'), (1, 'two'), (2, 'one'), (
-            2, 'two')])
-        df = DataFrame([[1, 'a', 1], [2, 'b', 1], [3, 'c', 1], [4, 'd', 1]],
-                       columns=['A', 'B', 'C'], index=idx)
-        wp = Panel({'i1': df, 'i2': df})
-        expected_idx = MultiIndex.from_tuples(
-            [
-                (1, 'one', 'A'), (1, 'one', 'B'),
-                (1, 'one', 'C'), (1, 'two', 'A'),
-                (1, 'two', 'B'), (1, 'two', 'C'),
-                (2, 'one', 'A'), (2, 'one', 'B'),
-                (2, 'one', 'C'), (2, 'two', 'A'),
-                (2, 'two', 'B'), (2, 'two', 'C')
-            ],
-            names=[None, None, 'minor'])
-        expected = DataFrame({'i1': [1, 'a', 1, 2, 'b', 1, 3,
-                                     'c', 1, 4, 'd', 1],
-                              'i2': [1, 'a', 1, 2, 'b',
-                                     1, 3, 'c', 1, 4, 'd', 1]},
-                             index=expected_idx)
-        result = wp.to_frame()
-        assert_frame_equal(result, expected)
-
-        wp.iloc[0, 0].iloc[0] = np.nan  # BUG on setting. GH #5773
-        result = wp.to_frame()
-        assert_frame_equal(result, expected[1:])
-
-        idx = MultiIndex.from_tuples([(1, 'two'), (1, 'one'), (2, 'one'), (
-            np.nan, 'two')])
-        df = DataFrame([[1, 'a', 1], [2, 'b', 1], [3, 'c', 1], [4, 'd', 1]],
-                       columns=['A', 'B', 'C'], index=idx)
-        wp = Panel({'i1': df, 'i2': df})
-        ex_idx = MultiIndex.from_tuples([(1, 'two', 'A'), (1, 'two', 'B'),
-                                         (1, 'two', 'C'),
-                                         (1, 'one', 'A'),
-                                         (1, 'one', 'B'),
-                                         (1, 'one', 'C'),
-                                         (2, 'one', 'A'),
-                                         (2, 'one', 'B'),
-                                         (2, 'one', 'C'),
-                                         (np.nan, 'two', 'A'),
-                                         (np.nan, 'two', 'B'),
-                                         (np.nan, 'two', 'C')],
-                                        names=[None, None, 'minor'])
-        expected.index = ex_idx
-        result = wp.to_frame()
-        assert_frame_equal(result, expected)
+        with catch_warnings(record=True):
+            idx = MultiIndex.from_tuples(
+                [(1, 'one'), (1, 'two'), (2, 'one'), (2, 'two')])
+            df = DataFrame([[1, 'a', 1], [2, 'b', 1],
+                            [3, 'c', 1], [4, 'd', 1]],
+                           columns=['A', 'B', 'C'], index=idx)
+            wp = Panel({'i1': df, 'i2': df})
+            expected_idx = MultiIndex.from_tuples(
+                [
+                    (1, 'one', 'A'), (1, 'one', 'B'),
+                    (1, 'one', 'C'), (1, 'two', 'A'),
+                    (1, 'two', 'B'), (1, 'two', 'C'),
+                    (2, 'one', 'A'), (2, 'one', 'B'),
+                    (2, 'one', 'C'), (2, 'two', 'A'),
+                    (2, 'two', 'B'), (2, 'two', 'C')
+                ],
+                names=[None, None, 'minor'])
+            expected = DataFrame({'i1': [1, 'a', 1, 2, 'b', 1, 3,
+                                         'c', 1, 4, 'd', 1],
+                                  'i2': [1, 'a', 1, 2, 'b',
+                                         1, 3, 'c', 1, 4, 'd', 1]},
+                                 index=expected_idx)
+            result = wp.to_frame()
+            assert_frame_equal(result, expected)
+
+            wp.iloc[0, 0].iloc[0] = np.nan  # BUG on setting. GH #5773
+            result = wp.to_frame()
+            assert_frame_equal(result, expected[1:])
+
+            idx = MultiIndex.from_tuples(
+                [(1, 'two'), (1, 'one'), (2, 'one'), (np.nan, 'two')])
+            df = DataFrame([[1, 'a', 1], [2, 'b', 1],
+                            [3, 'c', 1], [4, 'd', 1]],
+                           columns=['A', 'B', 'C'], index=idx)
+            wp = Panel({'i1': df, 'i2': df})
+            ex_idx = MultiIndex.from_tuples([(1, 'two', 'A'), (1, 'two', 'B'),
+                                             (1, 'two', 'C'),
+                                             (1, 'one', 'A'),
+                                             (1, 'one', 'B'),
+                                             (1, 'one', 'C'),
+                                             (2, 'one', 'A'),
+                                             (2, 'one', 'B'),
+                                             (2, 'one', 'C'),
+                                             (np.nan, 'two', 'A'),
+                                             (np.nan, 'two', 'B'),
+                                             (np.nan, 'two', 'C')],
+                                            names=[None, None, 'minor'])
+            expected.index = ex_idx
+            result = wp.to_frame()
+            assert_frame_equal(result, expected)

     def test_to_frame_multi_major_minor(self):
-        cols = MultiIndex(levels=[['C_A', 'C_B'], ['C_1', 'C_2']],
-                          labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
-        idx = MultiIndex.from_tuples([(1, 'one'), (1, 'two'), (2, 'one'), (
-            2, 'two'), (3, 'three'), (4, 'four')])
-        df = DataFrame([[1, 2, 11, 12], [3, 4, 13, 14],
-                        ['a', 'b', 'w', 'x'],
-                        ['c', 'd', 'y', 'z'], [-1, -2, -3, -4],
-                        [-5, -6, -7, -8]], columns=cols, index=idx)
-        wp = Panel({'i1': df, 'i2': df})
-
-        exp_idx = MultiIndex.from_tuples(
-            [(1, 'one', 'C_A', 'C_1'), (1, 'one', 'C_A', 'C_2'),
-             (1, 'one', 'C_B', 'C_1'), (1, 'one', 'C_B', 'C_2'),
-             (1, 'two', 'C_A', 'C_1'), (1, 'two', 'C_A', 'C_2'),
-             (1, 'two', 'C_B', 'C_1'), (1, 'two', 'C_B', 'C_2'),
-             (2, 'one', 'C_A', 'C_1'), (2, 'one', 'C_A', 'C_2'),
-             (2, 'one', 'C_B', 'C_1'), (2, 'one', 'C_B', 'C_2'),
-             (2, 'two', 'C_A', 'C_1'), (2, 'two', 'C_A', 'C_2'),
-             (2, 'two', 'C_B', 'C_1'), (2, 'two', 'C_B', 'C_2'),
-             (3, 'three', 'C_A', 'C_1'), (3, 'three', 'C_A', 'C_2'),
-             (3, 'three', 'C_B', 'C_1'), (3, 'three', 'C_B', 'C_2'),
-             (4, 'four', 'C_A', 'C_1'), (4, 'four', 'C_A', 'C_2'),
-             (4, 'four', 'C_B', 'C_1'), (4, 'four', 'C_B', 'C_2')],
-            names=[None, None, None, None])
-        exp_val = [[1, 1], [2, 2], [11, 11], [12, 12], [3, 3], [4, 4],
-                   [13, 13], [14, 14], ['a', 'a'], ['b', 'b'],
['w', 'w'], - ['x', 'x'], ['c', 'c'], ['d', 'd'], ['y', 'y'], ['z', 'z'], - [-1, -1], [-2, -2], [-3, -3], [-4, -4], [-5, -5], [-6, -6], - [-7, -7], [-8, -8]] - result = wp.to_frame() - expected = DataFrame(exp_val, columns=['i1', 'i2'], index=exp_idx) - assert_frame_equal(result, expected) + with catch_warnings(record=True): + cols = MultiIndex(levels=[['C_A', 'C_B'], ['C_1', 'C_2']], + labels=[[0, 0, 1, 1], [0, 1, 0, 1]]) + idx = MultiIndex.from_tuples([(1, 'one'), (1, 'two'), (2, 'one'), ( + 2, 'two'), (3, 'three'), (4, 'four')]) + df = DataFrame([[1, 2, 11, 12], [3, 4, 13, 14], + ['a', 'b', 'w', 'x'], + ['c', 'd', 'y', 'z'], [-1, -2, -3, -4], + [-5, -6, -7, -8]], columns=cols, index=idx) + wp = Panel({'i1': df, 'i2': df}) + + exp_idx = MultiIndex.from_tuples( + [(1, 'one', 'C_A', 'C_1'), (1, 'one', 'C_A', 'C_2'), + (1, 'one', 'C_B', 'C_1'), (1, 'one', 'C_B', 'C_2'), + (1, 'two', 'C_A', 'C_1'), (1, 'two', 'C_A', 'C_2'), + (1, 'two', 'C_B', 'C_1'), (1, 'two', 'C_B', 'C_2'), + (2, 'one', 'C_A', 'C_1'), (2, 'one', 'C_A', 'C_2'), + (2, 'one', 'C_B', 'C_1'), (2, 'one', 'C_B', 'C_2'), + (2, 'two', 'C_A', 'C_1'), (2, 'two', 'C_A', 'C_2'), + (2, 'two', 'C_B', 'C_1'), (2, 'two', 'C_B', 'C_2'), + (3, 'three', 'C_A', 'C_1'), (3, 'three', 'C_A', 'C_2'), + (3, 'three', 'C_B', 'C_1'), (3, 'three', 'C_B', 'C_2'), + (4, 'four', 'C_A', 'C_1'), (4, 'four', 'C_A', 'C_2'), + (4, 'four', 'C_B', 'C_1'), (4, 'four', 'C_B', 'C_2')], + names=[None, None, None, None]) + exp_val = [[1, 1], [2, 2], [11, 11], [12, 12], + [3, 3], [4, 4], + [13, 13], [14, 14], ['a', 'a'], + ['b', 'b'], ['w', 'w'], + ['x', 'x'], ['c', 'c'], ['d', 'd'], [ + 'y', 'y'], ['z', 'z'], + [-1, -1], [-2, -2], [-3, -3], [-4, -4], + [-5, -5], [-6, -6], + [-7, -7], [-8, -8]] + result = wp.to_frame() + expected = DataFrame(exp_val, columns=['i1', 'i2'], index=exp_idx) + assert_frame_equal(result, expected) def test_to_frame_multi_drop_level(self): - idx = MultiIndex.from_tuples([(1, 'one'), (2, 'one'), (2, 'two')]) - df = DataFrame({'A': [np.nan, 1, 2]}, index=idx) - wp = Panel({'i1': df, 'i2': df}) - result = wp.to_frame() - exp_idx = MultiIndex.from_tuples([(2, 'one', 'A'), (2, 'two', 'A')], - names=[None, None, 'minor']) - expected = DataFrame({'i1': [1., 2], 'i2': [1., 2]}, index=exp_idx) - assert_frame_equal(result, expected) + with catch_warnings(record=True): + idx = MultiIndex.from_tuples([(1, 'one'), (2, 'one'), (2, 'two')]) + df = DataFrame({'A': [np.nan, 1, 2]}, index=idx) + wp = Panel({'i1': df, 'i2': df}) + result = wp.to_frame() + exp_idx = MultiIndex.from_tuples( + [(2, 'one', 'A'), (2, 'two', 'A')], + names=[None, None, 'minor']) + expected = DataFrame({'i1': [1., 2], 'i2': [1., 2]}, index=exp_idx) + assert_frame_equal(result, expected) def test_to_panel_na_handling(self): - df = DataFrame(np.random.randint(0, 10, size=20).reshape((10, 2)), - index=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1], - [0, 1, 2, 3, 4, 5, 2, 3, 4, 5]]) + with catch_warnings(record=True): + df = DataFrame(np.random.randint(0, 10, size=20).reshape((10, 2)), + index=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1], + [0, 1, 2, 3, 4, 5, 2, 3, 4, 5]]) - panel = df.to_panel() - self.assertTrue(isnull(panel[0].loc[1, [0, 1]]).all()) + panel = df.to_panel() + assert isna(panel[0].loc[1, [0, 1]]).all() def test_to_panel_duplicates(self): # #2441 - df = DataFrame({'a': [0, 0, 1], 'b': [1, 1, 1], 'c': [1, 2, 3]}) - idf = df.set_index(['a', 'b']) - assertRaisesRegexp(ValueError, 'non-uniquely indexed', idf.to_panel) + with catch_warnings(record=True): + df = DataFrame({'a': [0, 0, 1], 'b': [1, 1, 
1], 'c': [1, 2, 3]}) + idf = df.set_index(['a', 'b']) + tm.assert_raises_regex( + ValueError, 'non-uniquely indexed', idf.to_panel) def test_panel_dups(self): + with catch_warnings(record=True): - # GH 4960 - # duplicates in an index + # GH 4960 + # duplicates in an index - # items - data = np.random.randn(5, 100, 5) - no_dup_panel = Panel(data, items=list("ABCDE")) - panel = Panel(data, items=list("AACDE")) + # items + data = np.random.randn(5, 100, 5) + no_dup_panel = Panel(data, items=list("ABCDE")) + panel = Panel(data, items=list("AACDE")) - expected = no_dup_panel['A'] - result = panel.iloc[0] - assert_frame_equal(result, expected) + expected = no_dup_panel['A'] + result = panel.iloc[0] + assert_frame_equal(result, expected) - expected = no_dup_panel['E'] - result = panel.loc['E'] - assert_frame_equal(result, expected) + expected = no_dup_panel['E'] + result = panel.loc['E'] + assert_frame_equal(result, expected) - expected = no_dup_panel.loc[['A', 'B']] - expected.items = ['A', 'A'] - result = panel.loc['A'] - assert_panel_equal(result, expected) + expected = no_dup_panel.loc[['A', 'B']] + expected.items = ['A', 'A'] + result = panel.loc['A'] + assert_panel_equal(result, expected) - # major - data = np.random.randn(5, 5, 5) - no_dup_panel = Panel(data, major_axis=list("ABCDE")) - panel = Panel(data, major_axis=list("AACDE")) + # major + data = np.random.randn(5, 5, 5) + no_dup_panel = Panel(data, major_axis=list("ABCDE")) + panel = Panel(data, major_axis=list("AACDE")) - expected = no_dup_panel.loc[:, 'A'] - result = panel.iloc[:, 0] - assert_frame_equal(result, expected) + expected = no_dup_panel.loc[:, 'A'] + result = panel.iloc[:, 0] + assert_frame_equal(result, expected) - expected = no_dup_panel.loc[:, 'E'] - result = panel.loc[:, 'E'] - assert_frame_equal(result, expected) + expected = no_dup_panel.loc[:, 'E'] + result = panel.loc[:, 'E'] + assert_frame_equal(result, expected) - expected = no_dup_panel.loc[:, ['A', 'B']] - expected.major_axis = ['A', 'A'] - result = panel.loc[:, 'A'] - assert_panel_equal(result, expected) + expected = no_dup_panel.loc[:, ['A', 'B']] + expected.major_axis = ['A', 'A'] + result = panel.loc[:, 'A'] + assert_panel_equal(result, expected) - # minor - data = np.random.randn(5, 100, 5) - no_dup_panel = Panel(data, minor_axis=list("ABCDE")) - panel = Panel(data, minor_axis=list("AACDE")) + # minor + data = np.random.randn(5, 100, 5) + no_dup_panel = Panel(data, minor_axis=list("ABCDE")) + panel = Panel(data, minor_axis=list("AACDE")) - expected = no_dup_panel.loc[:, :, 'A'] - result = panel.iloc[:, :, 0] - assert_frame_equal(result, expected) + expected = no_dup_panel.loc[:, :, 'A'] + result = panel.iloc[:, :, 0] + assert_frame_equal(result, expected) - expected = no_dup_panel.loc[:, :, 'E'] - result = panel.loc[:, :, 'E'] - assert_frame_equal(result, expected) + expected = no_dup_panel.loc[:, :, 'E'] + result = panel.loc[:, :, 'E'] + assert_frame_equal(result, expected) - expected = no_dup_panel.loc[:, :, ['A', 'B']] - expected.minor_axis = ['A', 'A'] - result = panel.loc[:, :, 'A'] - assert_panel_equal(result, expected) + expected = no_dup_panel.loc[:, :, ['A', 'B']] + expected.minor_axis = ['A', 'A'] + result = panel.loc[:, :, 'A'] + assert_panel_equal(result, expected) def test_filter(self): pass def test_compound(self): - compounded = self.panel.compound() + with catch_warnings(record=True): + compounded = self.panel.compound() - assert_series_equal(compounded['ItemA'], - (1 + self.panel['ItemA']).product(0) - 1, - check_names=False) + 
assert_series_equal(compounded['ItemA'], + (1 + self.panel['ItemA']).product(0) - 1, + check_names=False) def test_shift(self): - # major - idx = self.panel.major_axis[0] - idx_lag = self.panel.major_axis[1] - shifted = self.panel.shift(1) - assert_frame_equal(self.panel.major_xs(idx), shifted.major_xs(idx_lag)) - - # minor - idx = self.panel.minor_axis[0] - idx_lag = self.panel.minor_axis[1] - shifted = self.panel.shift(1, axis='minor') - assert_frame_equal(self.panel.minor_xs(idx), shifted.minor_xs(idx_lag)) - - # items - idx = self.panel.items[0] - idx_lag = self.panel.items[1] - shifted = self.panel.shift(1, axis='items') - assert_frame_equal(self.panel[idx], shifted[idx_lag]) - - # negative numbers, #2164 - result = self.panel.shift(-1) - expected = Panel(dict((i, f.shift(-1)[:-1]) - for i, f in self.panel.iteritems())) - assert_panel_equal(result, expected) - - # mixed dtypes #6959 - data = [('item ' + ch, makeMixedDataFrame()) for ch in list('abcde')] - data = dict(data) - mixed_panel = Panel.from_dict(data, orient='minor') - shifted = mixed_panel.shift(1) - assert_series_equal(mixed_panel.dtypes, shifted.dtypes) + with catch_warnings(record=True): + # major + idx = self.panel.major_axis[0] + idx_lag = self.panel.major_axis[1] + shifted = self.panel.shift(1) + assert_frame_equal(self.panel.major_xs(idx), + shifted.major_xs(idx_lag)) + + # minor + idx = self.panel.minor_axis[0] + idx_lag = self.panel.minor_axis[1] + shifted = self.panel.shift(1, axis='minor') + assert_frame_equal(self.panel.minor_xs(idx), + shifted.minor_xs(idx_lag)) + + # items + idx = self.panel.items[0] + idx_lag = self.panel.items[1] + shifted = self.panel.shift(1, axis='items') + assert_frame_equal(self.panel[idx], shifted[idx_lag]) + + # negative numbers, #2164 + result = self.panel.shift(-1) + expected = Panel(dict((i, f.shift(-1)[:-1]) + for i, f in self.panel.iteritems())) + assert_panel_equal(result, expected) + + # mixed dtypes #6959 + data = [('item ' + ch, makeMixedDataFrame()) + for ch in list('abcde')] + data = dict(data) + mixed_panel = Panel.from_dict(data, orient='minor') + shifted = mixed_panel.shift(1) + assert_series_equal(mixed_panel.dtypes, shifted.dtypes) def test_tshift(self): # PeriodIndex - ps = tm.makePeriodPanel() - shifted = ps.tshift(1) - unshifted = shifted.tshift(-1) + with catch_warnings(record=True): + ps = tm.makePeriodPanel() + shifted = ps.tshift(1) + unshifted = shifted.tshift(-1) - assert_panel_equal(unshifted, ps) + assert_panel_equal(unshifted, ps) - shifted2 = ps.tshift(freq='B') - assert_panel_equal(shifted, shifted2) + shifted2 = ps.tshift(freq='B') + assert_panel_equal(shifted, shifted2) - shifted3 = ps.tshift(freq=BDay()) - assert_panel_equal(shifted, shifted3) + shifted3 = ps.tshift(freq=BDay()) + assert_panel_equal(shifted, shifted3) - assertRaisesRegexp(ValueError, 'does not match', ps.tshift, freq='M') + tm.assert_raises_regex(ValueError, 'does not match', + ps.tshift, freq='M') - # DatetimeIndex - panel = _panel - shifted = panel.tshift(1) - unshifted = shifted.tshift(-1) + # DatetimeIndex + panel = make_test_panel() + shifted = panel.tshift(1) + unshifted = shifted.tshift(-1) - assert_panel_equal(panel, unshifted) + assert_panel_equal(panel, unshifted) - shifted2 = panel.tshift(freq=panel.major_axis.freq) - assert_panel_equal(shifted, shifted2) + shifted2 = panel.tshift(freq=panel.major_axis.freq) + assert_panel_equal(shifted, shifted2) - inferred_ts = Panel(panel.values, items=panel.items, - major_axis=Index(np.asarray(panel.major_axis)), - 
minor_axis=panel.minor_axis) - shifted = inferred_ts.tshift(1) - unshifted = shifted.tshift(-1) - assert_panel_equal(shifted, panel.tshift(1)) - assert_panel_equal(unshifted, inferred_ts) + inferred_ts = Panel(panel.values, items=panel.items, + major_axis=Index(np.asarray(panel.major_axis)), + minor_axis=panel.minor_axis) + shifted = inferred_ts.tshift(1) + unshifted = shifted.tshift(-1) + assert_panel_equal(shifted, panel.tshift(1)) + assert_panel_equal(unshifted, inferred_ts) - no_freq = panel.iloc[:, [0, 5, 7], :] - self.assertRaises(ValueError, no_freq.tshift) + no_freq = panel.iloc[:, [0, 5, 7], :] + pytest.raises(ValueError, no_freq.tshift) def test_pct_change(self): - df1 = DataFrame({'c1': [1, 2, 5], 'c2': [3, 4, 6]}) - df2 = df1 + 1 - df3 = DataFrame({'c1': [3, 4, 7], 'c2': [5, 6, 8]}) - wp = Panel({'i1': df1, 'i2': df2, 'i3': df3}) - # major, 1 - result = wp.pct_change() # axis='major' - expected = Panel({'i1': df1.pct_change(), - 'i2': df2.pct_change(), - 'i3': df3.pct_change()}) - assert_panel_equal(result, expected) - result = wp.pct_change(axis=1) - assert_panel_equal(result, expected) - # major, 2 - result = wp.pct_change(periods=2) - expected = Panel({'i1': df1.pct_change(2), - 'i2': df2.pct_change(2), - 'i3': df3.pct_change(2)}) - assert_panel_equal(result, expected) - # minor, 1 - result = wp.pct_change(axis='minor') - expected = Panel({'i1': df1.pct_change(axis=1), - 'i2': df2.pct_change(axis=1), - 'i3': df3.pct_change(axis=1)}) - assert_panel_equal(result, expected) - result = wp.pct_change(axis=2) - assert_panel_equal(result, expected) - # minor, 2 - result = wp.pct_change(periods=2, axis='minor') - expected = Panel({'i1': df1.pct_change(periods=2, axis=1), - 'i2': df2.pct_change(periods=2, axis=1), - 'i3': df3.pct_change(periods=2, axis=1)}) - assert_panel_equal(result, expected) - # items, 1 - result = wp.pct_change(axis='items') - expected = Panel({'i1': DataFrame({'c1': [np.nan, np.nan, np.nan], - 'c2': [np.nan, np.nan, np.nan]}), - 'i2': DataFrame({'c1': [1, 0.5, .2], - 'c2': [1. / 3, 0.25, 1. / 6]}), - 'i3': DataFrame({'c1': [.5, 1. / 3, 1. / 6], - 'c2': [.25, .2, 1. / 7]})}) - assert_panel_equal(result, expected) - result = wp.pct_change(axis=0) - assert_panel_equal(result, expected) - # items, 2 - result = wp.pct_change(periods=2, axis='items') - expected = Panel({'i1': DataFrame({'c1': [np.nan, np.nan, np.nan], - 'c2': [np.nan, np.nan, np.nan]}), - 'i2': DataFrame({'c1': [np.nan, np.nan, np.nan], - 'c2': [np.nan, np.nan, np.nan]}), - 'i3': DataFrame({'c1': [2, 1, .4], - 'c2': [2. / 3, .5, 1. 
/ 3]})}) - assert_panel_equal(result, expected) + with catch_warnings(record=True): + df1 = DataFrame({'c1': [1, 2, 5], 'c2': [3, 4, 6]}) + df2 = df1 + 1 + df3 = DataFrame({'c1': [3, 4, 7], 'c2': [5, 6, 8]}) + wp = Panel({'i1': df1, 'i2': df2, 'i3': df3}) + # major, 1 + result = wp.pct_change() # axis='major' + expected = Panel({'i1': df1.pct_change(), + 'i2': df2.pct_change(), + 'i3': df3.pct_change()}) + assert_panel_equal(result, expected) + result = wp.pct_change(axis=1) + assert_panel_equal(result, expected) + # major, 2 + result = wp.pct_change(periods=2) + expected = Panel({'i1': df1.pct_change(2), + 'i2': df2.pct_change(2), + 'i3': df3.pct_change(2)}) + assert_panel_equal(result, expected) + # minor, 1 + result = wp.pct_change(axis='minor') + expected = Panel({'i1': df1.pct_change(axis=1), + 'i2': df2.pct_change(axis=1), + 'i3': df3.pct_change(axis=1)}) + assert_panel_equal(result, expected) + result = wp.pct_change(axis=2) + assert_panel_equal(result, expected) + # minor, 2 + result = wp.pct_change(periods=2, axis='minor') + expected = Panel({'i1': df1.pct_change(periods=2, axis=1), + 'i2': df2.pct_change(periods=2, axis=1), + 'i3': df3.pct_change(periods=2, axis=1)}) + assert_panel_equal(result, expected) + # items, 1 + result = wp.pct_change(axis='items') + expected = Panel( + {'i1': DataFrame({'c1': [np.nan, np.nan, np.nan], + 'c2': [np.nan, np.nan, np.nan]}), + 'i2': DataFrame({'c1': [1, 0.5, .2], + 'c2': [1. / 3, 0.25, 1. / 6]}), + 'i3': DataFrame({'c1': [.5, 1. / 3, 1. / 6], + 'c2': [.25, .2, 1. / 7]})}) + assert_panel_equal(result, expected) + result = wp.pct_change(axis=0) + assert_panel_equal(result, expected) + # items, 2 + result = wp.pct_change(periods=2, axis='items') + expected = Panel( + {'i1': DataFrame({'c1': [np.nan, np.nan, np.nan], + 'c2': [np.nan, np.nan, np.nan]}), + 'i2': DataFrame({'c1': [np.nan, np.nan, np.nan], + 'c2': [np.nan, np.nan, np.nan]}), + 'i3': DataFrame({'c1': [2, 1, .4], + 'c2': [2. / 3, .5, 1. 
/ 3]})}) + assert_panel_equal(result, expected) def test_round(self): - values = [[[-3.2, 2.2], [0, -4.8213], [3.123, 123.12], - [-1566.213, 88.88], [-12, 94.5]], - [[-5.82, 3.5], [6.21, -73.272], [-9.087, 23.12], - [272.212, -99.99], [23, -76.5]]] - evalues = [[[float(np.around(i)) for i in j] for j in k] - for k in values] - p = Panel(values, items=['Item1', 'Item2'], - major_axis=pd.date_range('1/1/2000', periods=5), - minor_axis=['A', 'B']) - expected = Panel(evalues, items=['Item1', 'Item2'], - major_axis=pd.date_range('1/1/2000', periods=5), - minor_axis=['A', 'B']) - result = p.round() - self.assert_panel_equal(expected, result) + with catch_warnings(record=True): + values = [[[-3.2, 2.2], [0, -4.8213], [3.123, 123.12], + [-1566.213, 88.88], [-12, 94.5]], + [[-5.82, 3.5], [6.21, -73.272], [-9.087, 23.12], + [272.212, -99.99], [23, -76.5]]] + evalues = [[[float(np.around(i)) for i in j] for j in k] + for k in values] + p = Panel(values, items=['Item1', 'Item2'], + major_axis=pd.date_range('1/1/2000', periods=5), + minor_axis=['A', 'B']) + expected = Panel(evalues, items=['Item1', 'Item2'], + major_axis=pd.date_range('1/1/2000', periods=5), + minor_axis=['A', 'B']) + result = p.round() + assert_panel_equal(expected, result) def test_numpy_round(self): - values = [[[-3.2, 2.2], [0, -4.8213], [3.123, 123.12], - [-1566.213, 88.88], [-12, 94.5]], - [[-5.82, 3.5], [6.21, -73.272], [-9.087, 23.12], - [272.212, -99.99], [23, -76.5]]] - evalues = [[[float(np.around(i)) for i in j] for j in k] - for k in values] - p = Panel(values, items=['Item1', 'Item2'], - major_axis=pd.date_range('1/1/2000', periods=5), - minor_axis=['A', 'B']) - expected = Panel(evalues, items=['Item1', 'Item2'], - major_axis=pd.date_range('1/1/2000', periods=5), - minor_axis=['A', 'B']) - result = np.round(p) - self.assert_panel_equal(expected, result) - - msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, np.round, p, out=p) + with catch_warnings(record=True): + values = [[[-3.2, 2.2], [0, -4.8213], [3.123, 123.12], + [-1566.213, 88.88], [-12, 94.5]], + [[-5.82, 3.5], [6.21, -73.272], [-9.087, 23.12], + [272.212, -99.99], [23, -76.5]]] + evalues = [[[float(np.around(i)) for i in j] for j in k] + for k in values] + p = Panel(values, items=['Item1', 'Item2'], + major_axis=pd.date_range('1/1/2000', periods=5), + minor_axis=['A', 'B']) + expected = Panel(evalues, items=['Item1', 'Item2'], + major_axis=pd.date_range('1/1/2000', periods=5), + minor_axis=['A', 'B']) + result = np.round(p) + assert_panel_equal(expected, result) + + msg = "the 'out' parameter is not supported" + tm.assert_raises_regex(ValueError, msg, np.round, p, out=p) def test_multiindex_get(self): - ind = MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2)], - names=['first', 'second']) - wp = Panel(np.random.random((4, 5, 5)), - items=ind, - major_axis=np.arange(5), - minor_axis=np.arange(5)) - f1 = wp['a'] - f2 = wp.loc['a'] - assert_panel_equal(f1, f2) - - self.assertTrue((f1.items == [1, 2]).all()) - self.assertTrue((f2.items == [1, 2]).all()) - - ind = MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)], - names=['first', 'second']) + with catch_warnings(record=True): + ind = MultiIndex.from_tuples( + [('a', 1), ('a', 2), ('b', 1), ('b', 2)], + names=['first', 'second']) + wp = Panel(np.random.random((4, 5, 5)), + items=ind, + major_axis=np.arange(5), + minor_axis=np.arange(5)) + f1 = wp['a'] + f2 = wp.loc['a'] + assert_panel_equal(f1, f2) + + assert (f1.items == [1, 2]).all() + assert (f2.items == [1, 
2]).all() + + ind = MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)], + names=['first', 'second']) def test_multiindex_blocks(self): - ind = MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)], - names=['first', 'second']) - wp = Panel(self.panel._data) - wp.items = ind - f1 = wp['a'] - self.assertTrue((f1.items == [1, 2]).all()) + with catch_warnings(record=True): + ind = MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)], + names=['first', 'second']) + wp = Panel(self.panel._data) + wp.items = ind + f1 = wp['a'] + assert (f1.items == [1, 2]).all() - f1 = wp[('b', 1)] - self.assertTrue((f1.columns == ['A', 'B', 'C', 'D']).all()) + f1 = wp[('b', 1)] + assert (f1.columns == ['A', 'B', 'C', 'D']).all() def test_repr_empty(self): - empty = Panel() - repr(empty) + with catch_warnings(record=True): + empty = Panel() + repr(empty) def test_rename(self): - mapper = {'ItemA': 'foo', 'ItemB': 'bar', 'ItemC': 'baz'} + with catch_warnings(record=True): + mapper = {'ItemA': 'foo', 'ItemB': 'bar', 'ItemC': 'baz'} - renamed = self.panel.rename_axis(mapper, axis=0) - exp = Index(['foo', 'bar', 'baz']) - self.assert_index_equal(renamed.items, exp) + renamed = self.panel.rename_axis(mapper, axis=0) + exp = Index(['foo', 'bar', 'baz']) + tm.assert_index_equal(renamed.items, exp) - renamed = self.panel.rename_axis(str.lower, axis=2) - exp = Index(['a', 'b', 'c', 'd']) - self.assert_index_equal(renamed.minor_axis, exp) + renamed = self.panel.rename_axis(str.lower, axis=2) + exp = Index(['a', 'b', 'c', 'd']) + tm.assert_index_equal(renamed.minor_axis, exp) - # don't copy - renamed_nocopy = self.panel.rename_axis(mapper, axis=0, copy=False) - renamed_nocopy['foo'] = 3. - self.assertTrue((self.panel['ItemA'].values == 3).all()) + # don't copy + renamed_nocopy = self.panel.rename_axis(mapper, axis=0, copy=False) + renamed_nocopy['foo'] = 3. + assert (self.panel['ItemA'].values == 3).all() def test_get_attr(self): assert_frame_equal(self.panel['ItemA'], self.panel.ItemA) @@ -2050,12 +2176,13 @@ def test_get_attr(self): assert_frame_equal(self.panel['i'], self.panel.i) def test_from_frame_level1_unsorted(self): - tuples = [('MSFT', 3), ('MSFT', 2), ('AAPL', 2), ('AAPL', 1), - ('MSFT', 1)] - midx = MultiIndex.from_tuples(tuples) - df = DataFrame(np.random.rand(5, 4), index=midx) - p = df.to_panel() - assert_frame_equal(p.minor_xs(2), df.xs(2, level=1).sort_index()) + with catch_warnings(record=True): + tuples = [('MSFT', 3), ('MSFT', 2), ('AAPL', 2), ('AAPL', 1), + ('MSFT', 1)] + midx = MultiIndex.from_tuples(tuples) + df = DataFrame(np.random.rand(5, 4), index=midx) + p = df.to_panel() + assert_frame_equal(p.minor_xs(2), df.xs(2, level=1).sort_index()) def test_to_excel(self): try: @@ -2064,7 +2191,7 @@ def test_to_excel(self): import openpyxl # noqa from pandas.io.excel import ExcelFile except ImportError: - raise nose.SkipTest("need xlwt xlrd openpyxl") + pytest.skip("need xlwt xlrd openpyxl") for ext in ['xls', 'xlsx']: with ensure_clean('__tmp__.' + ext) as path: @@ -2072,7 +2199,7 @@ def test_to_excel(self): try: reader = ExcelFile(path) except ImportError: - raise nose.SkipTest("need xlwt xlrd openpyxl") + pytest.skip("need xlwt xlrd openpyxl") for item, df in self.panel.iteritems(): recdf = reader.parse(str(item), index_col=0) @@ -2084,297 +2211,335 @@ def test_to_excel_xlsxwriter(self): import xlsxwriter # noqa from pandas.io.excel import ExcelFile except ImportError: - raise nose.SkipTest("Requires xlrd and xlsxwriter. Skipping test.") + pytest.skip("Requires xlrd and xlsxwriter. 
Skipping test.") with ensure_clean('__tmp__.xlsx') as path: self.panel.to_excel(path, engine='xlsxwriter') try: reader = ExcelFile(path) except ImportError as e: - raise nose.SkipTest("cannot write excel file: %s" % e) + pytest.skip("cannot write excel file: %s" % e) for item, df in self.panel.iteritems(): recdf = reader.parse(str(item), index_col=0) assert_frame_equal(df, recdf) def test_dropna(self): - p = Panel(np.random.randn(4, 5, 6), major_axis=list('abcde')) - p.loc[:, ['b', 'd'], 0] = np.nan + with catch_warnings(record=True): + p = Panel(np.random.randn(4, 5, 6), major_axis=list('abcde')) + p.loc[:, ['b', 'd'], 0] = np.nan - result = p.dropna(axis=1) - exp = p.loc[:, ['a', 'c', 'e'], :] - assert_panel_equal(result, exp) - inp = p.copy() - inp.dropna(axis=1, inplace=True) - assert_panel_equal(inp, exp) + result = p.dropna(axis=1) + exp = p.loc[:, ['a', 'c', 'e'], :] + assert_panel_equal(result, exp) + inp = p.copy() + inp.dropna(axis=1, inplace=True) + assert_panel_equal(inp, exp) - result = p.dropna(axis=1, how='all') - assert_panel_equal(result, p) + result = p.dropna(axis=1, how='all') + assert_panel_equal(result, p) - p.loc[:, ['b', 'd'], :] = np.nan - result = p.dropna(axis=1, how='all') - exp = p.loc[:, ['a', 'c', 'e'], :] - assert_panel_equal(result, exp) + p.loc[:, ['b', 'd'], :] = np.nan + result = p.dropna(axis=1, how='all') + exp = p.loc[:, ['a', 'c', 'e'], :] + assert_panel_equal(result, exp) - p = Panel(np.random.randn(4, 5, 6), items=list('abcd')) - p.loc[['b'], :, 0] = np.nan + p = Panel(np.random.randn(4, 5, 6), items=list('abcd')) + p.loc[['b'], :, 0] = np.nan - result = p.dropna() - exp = p.loc[['a', 'c', 'd']] - assert_panel_equal(result, exp) + result = p.dropna() + exp = p.loc[['a', 'c', 'd']] + assert_panel_equal(result, exp) - result = p.dropna(how='all') - assert_panel_equal(result, p) + result = p.dropna(how='all') + assert_panel_equal(result, p) - p.loc['b'] = np.nan - result = p.dropna(how='all') - exp = p.loc[['a', 'c', 'd']] - assert_panel_equal(result, exp) + p.loc['b'] = np.nan + result = p.dropna(how='all') + exp = p.loc[['a', 'c', 'd']] + assert_panel_equal(result, exp) def test_drop(self): - df = DataFrame({"A": [1, 2], "B": [3, 4]}) - panel = Panel({"One": df, "Two": df}) + with catch_warnings(record=True): + df = DataFrame({"A": [1, 2], "B": [3, 4]}) + panel = Panel({"One": df, "Two": df}) - def check_drop(drop_val, axis_number, aliases, expected): - try: - actual = panel.drop(drop_val, axis=axis_number) - assert_panel_equal(actual, expected) - for alias in aliases: - actual = panel.drop(drop_val, axis=alias) + def check_drop(drop_val, axis_number, aliases, expected): + try: + actual = panel.drop(drop_val, axis=axis_number) assert_panel_equal(actual, expected) - except AssertionError: - pprint_thing("Failed with axis_number %d and aliases: %s" % - (axis_number, aliases)) - raise - # Items - expected = Panel({"One": df}) - check_drop('Two', 0, ['items'], expected) - - self.assertRaises(ValueError, panel.drop, 'Three') - - # errors = 'ignore' - dropped = panel.drop('Three', errors='ignore') - assert_panel_equal(dropped, panel) - dropped = panel.drop(['Two', 'Three'], errors='ignore') - expected = Panel({"One": df}) - assert_panel_equal(dropped, expected) - - # Major - exp_df = DataFrame({"A": [2], "B": [4]}, index=[1]) - expected = Panel({"One": exp_df, "Two": exp_df}) - check_drop(0, 1, ['major_axis', 'major'], expected) - - exp_df = DataFrame({"A": [1], "B": [3]}, index=[0]) - expected = Panel({"One": exp_df, "Two": exp_df}) - check_drop([1], 1, 
['major_axis', 'major'], expected) - - # Minor - exp_df = df[['B']] - expected = Panel({"One": exp_df, "Two": exp_df}) - check_drop(["A"], 2, ['minor_axis', 'minor'], expected) - - exp_df = df[['A']] - expected = Panel({"One": exp_df, "Two": exp_df}) - check_drop("B", 2, ['minor_axis', 'minor'], expected) + for alias in aliases: + actual = panel.drop(drop_val, axis=alias) + assert_panel_equal(actual, expected) + except AssertionError: + pprint_thing("Failed with axis_number %d and aliases: %s" % + (axis_number, aliases)) + raise + # Items + expected = Panel({"One": df}) + check_drop('Two', 0, ['items'], expected) + + pytest.raises(ValueError, panel.drop, 'Three') + + # errors = 'ignore' + dropped = panel.drop('Three', errors='ignore') + assert_panel_equal(dropped, panel) + dropped = panel.drop(['Two', 'Three'], errors='ignore') + expected = Panel({"One": df}) + assert_panel_equal(dropped, expected) + + # Major + exp_df = DataFrame({"A": [2], "B": [4]}, index=[1]) + expected = Panel({"One": exp_df, "Two": exp_df}) + check_drop(0, 1, ['major_axis', 'major'], expected) + + exp_df = DataFrame({"A": [1], "B": [3]}, index=[0]) + expected = Panel({"One": exp_df, "Two": exp_df}) + check_drop([1], 1, ['major_axis', 'major'], expected) + + # Minor + exp_df = df[['B']] + expected = Panel({"One": exp_df, "Two": exp_df}) + check_drop(["A"], 2, ['minor_axis', 'minor'], expected) + + exp_df = df[['A']] + expected = Panel({"One": exp_df, "Two": exp_df}) + check_drop("B", 2, ['minor_axis', 'minor'], expected) def test_update(self): - pan = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.]], - [[1.5, np.nan, 3.], [1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.]]]) + with catch_warnings(record=True): + pan = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.]], + [[1.5, np.nan, 3.], [1.5, np.nan, 3.], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.]]]) - other = Panel([[[3.6, 2., np.nan], [np.nan, np.nan, 7]]], items=[1]) + other = Panel( + [[[3.6, 2., np.nan], [np.nan, np.nan, 7]]], items=[1]) - pan.update(other) + pan.update(other) - expected = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.], [1.5, np.nan, 3.]], - [[3.6, 2., 3], [1.5, np.nan, 7], [1.5, np.nan, 3.], - [1.5, np.nan, 3.]]]) + expected = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], + [1.5, np.nan, 3.], [1.5, np.nan, 3.]], + [[3.6, 2., 3], [1.5, np.nan, 7], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.]]]) - assert_panel_equal(pan, expected) + assert_panel_equal(pan, expected) def test_update_from_dict(self): - pan = Panel({'one': DataFrame([[1.5, np.nan, 3], [1.5, np.nan, 3], - [1.5, np.nan, 3.], [1.5, np.nan, 3.]]), - 'two': DataFrame([[1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.], [1.5, np.nan, 3.]])}) - - other = {'two': DataFrame([[3.6, 2., np.nan], [np.nan, np.nan, 7]])} - - pan.update(other) - - expected = Panel( - {'two': DataFrame([[3.6, 2., 3], [1.5, np.nan, 7], - [1.5, np.nan, 3.], [1.5, np.nan, 3.]]), - 'one': DataFrame([[1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.], [1.5, np.nan, 3.]])}) - - assert_panel_equal(pan, expected) + with catch_warnings(record=True): + pan = Panel({'one': DataFrame([[1.5, np.nan, 3], + [1.5, np.nan, 3], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.]]), + 'two': DataFrame([[1.5, np.nan, 3.], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.]])}) + + other = {'two': DataFrame( + [[3.6, 2., np.nan], [np.nan, np.nan, 7]])} + + pan.update(other) + + expected = Panel( + {'two': DataFrame([[3.6, 2., 
3], + [1.5, np.nan, 7], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.]]), + 'one': DataFrame([[1.5, np.nan, 3.], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.]])}) + + assert_panel_equal(pan, expected) def test_update_nooverwrite(self): - pan = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.]], - [[1.5, np.nan, 3.], [1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.]]]) + with catch_warnings(record=True): + pan = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.]], + [[1.5, np.nan, 3.], [1.5, np.nan, 3.], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.]]]) - other = Panel([[[3.6, 2., np.nan], [np.nan, np.nan, 7]]], items=[1]) + other = Panel( + [[[3.6, 2., np.nan], [np.nan, np.nan, 7]]], items=[1]) - pan.update(other, overwrite=False) + pan.update(other, overwrite=False) - expected = Panel([[[1.5, np.nan, 3], [1.5, np.nan, 3], - [1.5, np.nan, 3.], [1.5, np.nan, 3.]], - [[1.5, 2., 3.], [1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.]]]) + expected = Panel([[[1.5, np.nan, 3], [1.5, np.nan, 3], + [1.5, np.nan, 3.], [1.5, np.nan, 3.]], + [[1.5, 2., 3.], [1.5, np.nan, 3.], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.]]]) - assert_panel_equal(pan, expected) + assert_panel_equal(pan, expected) def test_update_filtered(self): - pan = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.]], - [[1.5, np.nan, 3.], [1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.]]]) + with catch_warnings(record=True): + pan = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.]], + [[1.5, np.nan, 3.], [1.5, np.nan, 3.], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.]]]) - other = Panel([[[3.6, 2., np.nan], [np.nan, np.nan, 7]]], items=[1]) + other = Panel( + [[[3.6, 2., np.nan], [np.nan, np.nan, 7]]], items=[1]) - pan.update(other, filter_func=lambda x: x > 2) + pan.update(other, filter_func=lambda x: x > 2) - expected = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.], [1.5, np.nan, 3.]], - [[1.5, np.nan, 3], [1.5, np.nan, 7], - [1.5, np.nan, 3.], [1.5, np.nan, 3.]]]) + expected = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], + [1.5, np.nan, 3.], [1.5, np.nan, 3.]], + [[1.5, np.nan, 3], [1.5, np.nan, 7], + [1.5, np.nan, 3.], [1.5, np.nan, 3.]]]) - assert_panel_equal(pan, expected) + assert_panel_equal(pan, expected) def test_update_raise(self): - pan = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.]], - [[1.5, np.nan, 3.], [1.5, np.nan, 3.], [1.5, np.nan, 3.], - [1.5, np.nan, 3.]]]) - - self.assertRaises(Exception, pan.update, *(pan, ), + with catch_warnings(record=True): + pan = Panel([[[1.5, np.nan, 3.], [1.5, np.nan, 3.], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.]], + [[1.5, np.nan, 3.], [1.5, np.nan, 3.], + [1.5, np.nan, 3.], + [1.5, np.nan, 3.]]]) + + pytest.raises(Exception, pan.update, *(pan, ), **{'raise_conflict': True}) def test_all_any(self): - self.assertTrue((self.panel.all(axis=0).values == nanall( - self.panel, axis=0)).all()) - self.assertTrue((self.panel.all(axis=1).values == nanall( - self.panel, axis=1).T).all()) - self.assertTrue((self.panel.all(axis=2).values == nanall( - self.panel, axis=2).T).all()) - self.assertTrue((self.panel.any(axis=0).values == nanany( - self.panel, axis=0)).all()) - self.assertTrue((self.panel.any(axis=1).values == nanany( - self.panel, axis=1).T).all()) - self.assertTrue((self.panel.any(axis=2).values == nanany( - self.panel, axis=2).T).all()) + assert 
(self.panel.all(axis=0).values == nanall( + self.panel, axis=0)).all() + assert (self.panel.all(axis=1).values == nanall( + self.panel, axis=1).T).all() + assert (self.panel.all(axis=2).values == nanall( + self.panel, axis=2).T).all() + assert (self.panel.any(axis=0).values == nanany( + self.panel, axis=0)).all() + assert (self.panel.any(axis=1).values == nanany( + self.panel, axis=1).T).all() + assert (self.panel.any(axis=2).values == nanany( + self.panel, axis=2).T).all() def test_all_any_unhandled(self): - self.assertRaises(NotImplementedError, self.panel.all, bool_only=True) - self.assertRaises(NotImplementedError, self.panel.any, bool_only=True) + pytest.raises(NotImplementedError, self.panel.all, bool_only=True) + pytest.raises(NotImplementedError, self.panel.any, bool_only=True) + + # GH issue 15960 + def test_sort_values(self): + pytest.raises(NotImplementedError, self.panel.sort_values) + pytest.raises(NotImplementedError, self.panel.sort_values, 'ItemA') -class TestLongPanel(tm.TestCase): +class TestLongPanel(object): """ LongPanel no longer exists, but... """ - _multiprocess_can_split_ = True - - def setUp(self): - import warnings - warnings.filterwarnings(action='ignore', category=FutureWarning) - - panel = tm.makePanel() - tm.add_nans(panel) + def setup_method(self, method): + panel = make_test_panel() self.panel = panel.to_frame() self.unfiltered_panel = panel.to_frame(filter_observations=False) def test_ops_differently_indexed(self): - # trying to set non-identically indexed panel - wp = self.panel.to_panel() - wp2 = wp.reindex(major=wp.major_axis[:-1]) - lp2 = wp2.to_frame() + with catch_warnings(record=True): + # trying to set non-identically indexed panel + wp = self.panel.to_panel() + wp2 = wp.reindex(major=wp.major_axis[:-1]) + lp2 = wp2.to_frame() - result = self.panel + lp2 - assert_frame_equal(result.reindex(lp2.index), lp2 * 2) + result = self.panel + lp2 + assert_frame_equal(result.reindex(lp2.index), lp2 * 2) - # careful, mutation - self.panel['foo'] = lp2['ItemA'] - assert_series_equal(self.panel['foo'].reindex(lp2.index), lp2['ItemA'], - check_names=False) + # careful, mutation + self.panel['foo'] = lp2['ItemA'] + assert_series_equal(self.panel['foo'].reindex(lp2.index), + lp2['ItemA'], + check_names=False) def test_ops_scalar(self): - result = self.panel.mul(2) - expected = DataFrame.__mul__(self.panel, 2) - assert_frame_equal(result, expected) + with catch_warnings(record=True): + result = self.panel.mul(2) + expected = DataFrame.__mul__(self.panel, 2) + assert_frame_equal(result, expected) def test_combineFrame(self): - wp = self.panel.to_panel() - result = self.panel.add(wp['ItemA'].stack(), axis=0) - assert_frame_equal(result.to_panel()['ItemA'], wp['ItemA'] * 2) + with catch_warnings(record=True): + wp = self.panel.to_panel() + result = self.panel.add(wp['ItemA'].stack(), axis=0) + assert_frame_equal(result.to_panel()['ItemA'], wp['ItemA'] * 2) def test_combinePanel(self): - wp = self.panel.to_panel() - result = self.panel.add(self.panel) - wide_result = result.to_panel() - assert_frame_equal(wp['ItemA'] * 2, wide_result['ItemA']) + with catch_warnings(record=True): + wp = self.panel.to_panel() + result = self.panel.add(self.panel) + wide_result = result.to_panel() + assert_frame_equal(wp['ItemA'] * 2, wide_result['ItemA']) - # one item - result = self.panel.add(self.panel.filter(['ItemA'])) + # one item + result = self.panel.add(self.panel.filter(['ItemA'])) def test_combine_scalar(self): - result = self.panel.mul(2) - expected = 
DataFrame(self.panel._data) * 2 - assert_frame_equal(result, expected) + with catch_warnings(record=True): + result = self.panel.mul(2) + expected = DataFrame(self.panel._data) * 2 + assert_frame_equal(result, expected) def test_combine_series(self): - s = self.panel['ItemA'][:10] - result = self.panel.add(s, axis=0) - expected = DataFrame.add(self.panel, s, axis=0) - assert_frame_equal(result, expected) + with catch_warnings(record=True): + s = self.panel['ItemA'][:10] + result = self.panel.add(s, axis=0) + expected = DataFrame.add(self.panel, s, axis=0) + assert_frame_equal(result, expected) - s = self.panel.iloc[5] - result = self.panel + s - expected = DataFrame.add(self.panel, s, axis=1) - assert_frame_equal(result, expected) + s = self.panel.iloc[5] + result = self.panel + s + expected = DataFrame.add(self.panel, s, axis=1) + assert_frame_equal(result, expected) def test_operators(self): - wp = self.panel.to_panel() - result = (self.panel + 1).to_panel() - assert_frame_equal(wp['ItemA'] + 1, result['ItemA']) + with catch_warnings(record=True): + wp = self.panel.to_panel() + result = (self.panel + 1).to_panel() + assert_frame_equal(wp['ItemA'] + 1, result['ItemA']) def test_arith_flex_panel(self): - ops = ['add', 'sub', 'mul', 'div', 'truediv', 'pow', 'floordiv', 'mod'] - if not compat.PY3: - aliases = {} - else: - aliases = {'div': 'truediv'} - self.panel = self.panel.to_panel() - - for n in [np.random.randint(-50, -1), np.random.randint(1, 50), 0]: - for op in ops: - alias = aliases.get(op, op) - f = getattr(operator, alias) - exp = f(self.panel, n) - result = getattr(self.panel, op)(n) - assert_panel_equal(result, exp, check_panel_type=True) - - # rops - r_f = lambda x, y: f(y, x) - exp = r_f(self.panel, n) - result = getattr(self.panel, 'r' + op)(n) - assert_panel_equal(result, exp) + with catch_warnings(record=True): + ops = ['add', 'sub', 'mul', 'div', + 'truediv', 'pow', 'floordiv', 'mod'] + if not compat.PY3: + aliases = {} + else: + aliases = {'div': 'truediv'} + self.panel = self.panel.to_panel() + + for n in [np.random.randint(-50, -1), np.random.randint(1, 50), 0]: + for op in ops: + alias = aliases.get(op, op) + f = getattr(operator, alias) + exp = f(self.panel, n) + result = getattr(self.panel, op)(n) + assert_panel_equal(result, exp, check_panel_type=True) + + # rops + r_f = lambda x, y: f(y, x) + exp = r_f(self.panel, n) + result = getattr(self.panel, 'r' + op)(n) + assert_panel_equal(result, exp) def test_sort(self): def is_sorted(arr): return (arr[1:] > arr[:-1]).any() sorted_minor = self.panel.sort_index(level=1) - self.assertTrue(is_sorted(sorted_minor.index.labels[1])) + assert is_sorted(sorted_minor.index.labels[1]) sorted_major = sorted_minor.sort_index(level=0) - self.assertTrue(is_sorted(sorted_major.index.labels[0])) + assert is_sorted(sorted_major.index.labels[0]) def test_to_string(self): buf = StringIO() @@ -2383,153 +2548,140 @@ def test_to_string(self): def test_to_sparse(self): if isinstance(self.panel, Panel): msg = 'sparsifying is not supported' - tm.assertRaisesRegexp(NotImplementedError, msg, - self.panel.to_sparse) + tm.assert_raises_regex(NotImplementedError, msg, + self.panel.to_sparse) def test_truncate(self): - dates = self.panel.index.levels[0] - start, end = dates[1], dates[5] + with catch_warnings(record=True): + dates = self.panel.index.levels[0] + start, end = dates[1], dates[5] - trunced = self.panel.truncate(start, end).to_panel() - expected = self.panel.to_panel()['ItemA'].truncate(start, end) + trunced = self.panel.truncate(start, 
end).to_panel()
+            expected = self.panel.to_panel()['ItemA'].truncate(start, end)

-        # TODO truncate drops index.names
-        assert_frame_equal(trunced['ItemA'], expected, check_names=False)
+            # TODO truncate drops index.names
+            assert_frame_equal(trunced['ItemA'], expected, check_names=False)

-        trunced = self.panel.truncate(before=start).to_panel()
-        expected = self.panel.to_panel()['ItemA'].truncate(before=start)
+            trunced = self.panel.truncate(before=start).to_panel()
+            expected = self.panel.to_panel()['ItemA'].truncate(before=start)

-        # TODO truncate drops index.names
-        assert_frame_equal(trunced['ItemA'], expected, check_names=False)
+            # TODO truncate drops index.names
+            assert_frame_equal(trunced['ItemA'], expected, check_names=False)

-        trunced = self.panel.truncate(after=end).to_panel()
-        expected = self.panel.to_panel()['ItemA'].truncate(after=end)
+            trunced = self.panel.truncate(after=end).to_panel()
+            expected = self.panel.to_panel()['ItemA'].truncate(after=end)

-        # TODO truncate drops index.names
-        assert_frame_equal(trunced['ItemA'], expected, check_names=False)
+            # TODO truncate drops index.names
+            assert_frame_equal(trunced['ItemA'], expected, check_names=False)

-        # truncate on dates that aren't in there
-        wp = self.panel.to_panel()
-        new_index = wp.major_axis[::5]
+            # truncate on dates that aren't in there
+            wp = self.panel.to_panel()
+            new_index = wp.major_axis[::5]

-        wp2 = wp.reindex(major=new_index)
+            wp2 = wp.reindex(major=new_index)

-        lp2 = wp2.to_frame()
-        lp_trunc = lp2.truncate(wp.major_axis[2], wp.major_axis[-2])
+            lp2 = wp2.to_frame()
+            lp_trunc = lp2.truncate(wp.major_axis[2], wp.major_axis[-2])

-        wp_trunc = wp2.truncate(wp.major_axis[2], wp.major_axis[-2])
+            wp_trunc = wp2.truncate(wp.major_axis[2], wp.major_axis[-2])

-        assert_panel_equal(wp_trunc, lp_trunc.to_panel())
+            assert_panel_equal(wp_trunc, lp_trunc.to_panel())

-        # throw proper exception
-        self.assertRaises(Exception, lp2.truncate, wp.major_axis[-2],
+            # throw proper exception
+            pytest.raises(Exception, lp2.truncate, wp.major_axis[-2],
                           wp.major_axis[2])

     def test_axis_dummies(self):
-        from pandas.core.reshape import make_axis_dummies
+        from pandas.core.reshape.reshape import make_axis_dummies

         minor_dummies = make_axis_dummies(self.panel, 'minor').astype(np.uint8)
-        self.assertEqual(len(minor_dummies.columns),
-                         len(self.panel.index.levels[1]))
+        assert len(minor_dummies.columns) == len(self.panel.index.levels[1])

         major_dummies = make_axis_dummies(self.panel, 'major').astype(np.uint8)
-        self.assertEqual(len(major_dummies.columns),
-                         len(self.panel.index.levels[0]))
+        assert len(major_dummies.columns) == len(self.panel.index.levels[0])

         mapping = {'A': 'one', 'B': 'one', 'C': 'two', 'D': 'two'}

         transformed = make_axis_dummies(self.panel, 'minor',
                                         transform=mapping.get).astype(np.uint8)
-        self.assertEqual(len(transformed.columns), 2)
-        self.assert_index_equal(transformed.columns, Index(['one', 'two']))
+        assert len(transformed.columns) == 2
+        tm.assert_index_equal(transformed.columns, Index(['one', 'two']))

         # TODO: test correctness

     def test_get_dummies(self):
-        from pandas.core.reshape import get_dummies, make_axis_dummies
+        from pandas.core.reshape.reshape import get_dummies, make_axis_dummies

         self.panel['Label'] = self.panel.index.labels[1]
         minor_dummies = make_axis_dummies(self.panel, 'minor').astype(np.uint8)
         dummies = get_dummies(self.panel['Label'])
-        self.assert_numpy_array_equal(dummies.values, minor_dummies.values)
+        tm.assert_numpy_array_equal(dummies.values, minor_dummies.values)

     def test_mean(self):
-        means = self.panel.mean(level='minor')
+        with catch_warnings(record=True):
+            means = self.panel.mean(level='minor')

-        # test versus Panel version
-        wide_means = self.panel.to_panel().mean('major')
-        assert_frame_equal(means, wide_means)
+            # test versus Panel version
+            wide_means = self.panel.to_panel().mean('major')
+            assert_frame_equal(means, wide_means)

     def test_sum(self):
-        sums = self.panel.sum(level='minor')
+        with catch_warnings(record=True):
+            sums = self.panel.sum(level='minor')

-        # test versus Panel version
-        wide_sums = self.panel.to_panel().sum('major')
-        assert_frame_equal(sums, wide_sums)
+            # test versus Panel version
+            wide_sums = self.panel.to_panel().sum('major')
+            assert_frame_equal(sums, wide_sums)

     def test_count(self):
-        index = self.panel.index
+        with catch_warnings(record=True):
+            index = self.panel.index

-        major_count = self.panel.count(level=0)['ItemA']
-        labels = index.labels[0]
-        for i, idx in enumerate(index.levels[0]):
-            self.assertEqual(major_count[i], (labels == i).sum())
+            major_count = self.panel.count(level=0)['ItemA']
+            labels = index.labels[0]
+            for i, idx in enumerate(index.levels[0]):
+                assert major_count[i] == (labels == i).sum()

-        minor_count = self.panel.count(level=1)['ItemA']
-        labels = index.labels[1]
-        for i, idx in enumerate(index.levels[1]):
-            self.assertEqual(minor_count[i], (labels == i).sum())
+            minor_count = self.panel.count(level=1)['ItemA']
+            labels = index.labels[1]
+            for i, idx in enumerate(index.levels[1]):
+                assert minor_count[i] == (labels == i).sum()

     def test_join(self):
-        lp1 = self.panel.filter(['ItemA', 'ItemB'])
-        lp2 = self.panel.filter(['ItemC'])
+        with catch_warnings(record=True):
+            lp1 = self.panel.filter(['ItemA', 'ItemB'])
+            lp2 = self.panel.filter(['ItemC'])

-        joined = lp1.join(lp2)
+            joined = lp1.join(lp2)

-        self.assertEqual(len(joined.columns), 3)
+            assert len(joined.columns) == 3

-        self.assertRaises(Exception, lp1.join,
+            pytest.raises(Exception, lp1.join,
                           self.panel.filter(['ItemB', 'ItemC']))

     def test_pivot(self):
-        from pandas.core.reshape import _slow_pivot
-
-        one, two, three = (np.array([1, 2, 3, 4, 5]),
-                           np.array(['a', 'b', 'c', 'd', 'e']),
-                           np.array([1, 2, 3, 5, 4.]))
-        df = pivot(one, two, three)
-        self.assertEqual(df['a'][1], 1)
-        self.assertEqual(df['b'][2], 2)
-        self.assertEqual(df['c'][3], 3)
-        self.assertEqual(df['d'][4], 5)
-        self.assertEqual(df['e'][5], 4)
-        assert_frame_equal(df, _slow_pivot(one, two, three))
-
-        # weird overlap, TODO: test?
-        a, b, c = (np.array([1, 2, 3, 4, 4]),
-                   np.array(['a', 'a', 'a', 'a', 'a']),
-                   np.array([1., 2., 3., 4., 5.]))
-        self.assertRaises(Exception, pivot, a, b, c)
-
-        # corner case, empty
-        df = pivot(np.array([]), np.array([]), np.array([]))
-
-
-def test_monotonic():
-    pos = np.array([1, 2, 3, 5])
-
-    def _monotonic(arr):
-        return not (arr[1:] < arr[:-1]).any()
-
-    assert _monotonic(pos)
-
-    neg = np.array([1, 2, 3, 4, 3])
-
-    assert not _monotonic(neg)
-
-    neg2 = np.array([5, 1, 2, 3, 4, 5])
-
-    assert not _monotonic(neg2)
+        with catch_warnings(record=True):
+            from pandas.core.reshape.reshape import _slow_pivot
+
+            one, two, three = (np.array([1, 2, 3, 4, 5]),
+                               np.array(['a', 'b', 'c', 'd', 'e']),
+                               np.array([1, 2, 3, 5, 4.]))
+            df = pivot(one, two, three)
+            assert df['a'][1] == 1
+            assert df['b'][2] == 2
+            assert df['c'][3] == 3
+            assert df['d'][4] == 5
+            assert df['e'][5] == 4
+            assert_frame_equal(df, _slow_pivot(one, two, three))
+
+            # weird overlap, TODO: test?
+ a, b, c = (np.array([1, 2, 3, 4, 4]), + np.array(['a', 'a', 'a', 'a', 'a']), + np.array([1., 2., 3., 4., 5.])) + pytest.raises(Exception, pivot, a, b, c) + + # corner case, empty + df = pivot(np.array([]), np.array([]), np.array([])) def test_panel_index(): @@ -2538,8 +2690,3 @@ def test_panel_index(): np.repeat([1, 2, 3], 4)], names=['time', 'panel']) tm.assert_index_equal(index, expected) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/test_panel4d.py b/pandas/tests/test_panel4d.py index 0769b8916a11b..863671feb4ed8 100644 --- a/pandas/tests/test_panel4d.py +++ b/pandas/tests/test_panel4d.py @@ -2,21 +2,18 @@ from datetime import datetime from pandas.compat import range, lrange import operator -import nose - +import pytest +from warnings import catch_warnings import numpy as np -from pandas.types.common import is_float_dtype -from pandas import Series, Index, isnull, notnull +from pandas import Series, Index, isna, notna +from pandas.core.dtypes.common import is_float_dtype +from pandas.core.dtypes.missing import remove_na_arraylike from pandas.core.panel import Panel from pandas.core.panel4d import Panel4D -from pandas.core.series import remove_na from pandas.tseries.offsets import BDay -from pandas.util.testing import (assert_panel_equal, - assert_panel4d_equal, - assert_frame_equal, - assert_series_equal, +from pandas.util.testing import (assert_frame_equal, assert_series_equal, assert_almost_equal) import pandas.util.testing as tm @@ -29,8 +26,6 @@ def add_nans(panel4d): class SafeForLongAndSparse(object): - _multiprocess_can_split_ = True - def test_repr(self): repr(self.panel4d) @@ -38,7 +33,7 @@ def test_iter(self): tm.equalContents(list(self.panel4d), self.panel4d.labels) def test_count(self): - f = lambda s: notnull(s).sum() + f = lambda s: notna(s).sum() self._check_stat_op('count', f, obj=self.panel4d, has_skipna=False) def test_sum(self): @@ -52,7 +47,7 @@ def test_prod(self): def test_median(self): def wrapper(x): - if isnull(x).any(): + if isna(x).any(): return np.nan return np.median(x) @@ -68,7 +63,7 @@ def test_skew(self): try: from scipy.stats import skew except ImportError: - raise nose.SkipTest("no scipy.stats.skew") + pytest.skip("no scipy.stats.skew") def this_skew(x): if len(x) < 3: @@ -123,7 +118,7 @@ def _check_stat_op(self, name, alternative, obj=None, has_skipna=True): if has_skipna: def skipna_wrapper(x): - nona = remove_na(x) + nona = remove_na_arraylike(x) if len(nona) == 0: return np.nan return alternative(nona) @@ -131,81 +126,76 @@ def skipna_wrapper(x): def wrapper(x): return alternative(np.asarray(x)) - for i in range(obj.ndim): - result = f(axis=i, skipna=False) - assert_panel_equal(result, obj.apply(wrapper, axis=i)) + with catch_warnings(record=True): + for i in range(obj.ndim): + result = f(axis=i, skipna=False) + expected = obj.apply(wrapper, axis=i) + tm.assert_panel_equal(result, expected) else: skipna_wrapper = alternative wrapper = alternative - for i in range(obj.ndim): - result = f(axis=i) - if not tm._incompat_bottleneck_version(name): - assert_panel_equal(result, obj.apply(skipna_wrapper, axis=i)) + with catch_warnings(record=True): + for i in range(obj.ndim): + result = f(axis=i) + if not tm._incompat_bottleneck_version(name): + expected = obj.apply(skipna_wrapper, axis=i) + tm.assert_panel_equal(result, expected) - self.assertRaises(Exception, f, axis=obj.ndim) + pytest.raises(Exception, f, axis=obj.ndim) class SafeForSparse(object): - 
_multiprocess_can_split_ = True
-
-    @classmethod
-    def assert_panel_equal(cls, x, y):
-        assert_panel_equal(x, y)
-
-    @classmethod
-    def assert_panel4d_equal(cls, x, y):
-        assert_panel4d_equal(x, y)
-
     def test_get_axis(self):
-        assert(self.panel4d._get_axis(0) is self.panel4d.labels)
-        assert(self.panel4d._get_axis(1) is self.panel4d.items)
-        assert(self.panel4d._get_axis(2) is self.panel4d.major_axis)
-        assert(self.panel4d._get_axis(3) is self.panel4d.minor_axis)
+        assert self.panel4d._get_axis(0) is self.panel4d.labels
+        assert self.panel4d._get_axis(1) is self.panel4d.items
+        assert self.panel4d._get_axis(2) is self.panel4d.major_axis
+        assert self.panel4d._get_axis(3) is self.panel4d.minor_axis

     def test_set_axis(self):
-        new_labels = Index(np.arange(len(self.panel4d.labels)))
+        with catch_warnings(record=True):
+            new_labels = Index(np.arange(len(self.panel4d.labels)))

-        # TODO: unused?
-        # new_items = Index(np.arange(len(self.panel4d.items)))
+            # TODO: unused?
+            # new_items = Index(np.arange(len(self.panel4d.items)))

-        new_major = Index(np.arange(len(self.panel4d.major_axis)))
-        new_minor = Index(np.arange(len(self.panel4d.minor_axis)))
+            new_major = Index(np.arange(len(self.panel4d.major_axis)))
+            new_minor = Index(np.arange(len(self.panel4d.minor_axis)))

-        # ensure changes propagate to potentially prior-cached items too
+            # ensure changes propagate to potentially prior-cached items too

-        # TODO: unused?
-        # label = self.panel4d['l1']
+            # TODO: unused?
+            # label = self.panel4d['l1']

-        self.panel4d.labels = new_labels
+            self.panel4d.labels = new_labels

-        if hasattr(self.panel4d, '_item_cache'):
-            self.assertNotIn('l1', self.panel4d._item_cache)
-        self.assertIs(self.panel4d.labels, new_labels)
+            if hasattr(self.panel4d, '_item_cache'):
+                assert 'l1' not in self.panel4d._item_cache
+            assert self.panel4d.labels is new_labels

-        self.panel4d.major_axis = new_major
-        self.assertIs(self.panel4d[0].major_axis, new_major)
-        self.assertIs(self.panel4d.major_axis, new_major)
+            self.panel4d.major_axis = new_major
+            assert self.panel4d[0].major_axis is new_major
+            assert self.panel4d.major_axis is new_major

-        self.panel4d.minor_axis = new_minor
-        self.assertIs(self.panel4d[0].minor_axis, new_minor)
-        self.assertIs(self.panel4d.minor_axis, new_minor)
+            self.panel4d.minor_axis = new_minor
+            assert self.panel4d[0].minor_axis is new_minor
+            assert self.panel4d.minor_axis is new_minor

     def test_get_axis_number(self):
-        self.assertEqual(self.panel4d._get_axis_number('labels'), 0)
-        self.assertEqual(self.panel4d._get_axis_number('items'), 1)
-        self.assertEqual(self.panel4d._get_axis_number('major'), 2)
-        self.assertEqual(self.panel4d._get_axis_number('minor'), 3)
+        assert self.panel4d._get_axis_number('labels') == 0
+        assert self.panel4d._get_axis_number('items') == 1
+        assert self.panel4d._get_axis_number('major') == 2
+        assert self.panel4d._get_axis_number('minor') == 3

     def test_get_axis_name(self):
-        self.assertEqual(self.panel4d._get_axis_name(0), 'labels')
-        self.assertEqual(self.panel4d._get_axis_name(1), 'items')
-        self.assertEqual(self.panel4d._get_axis_name(2), 'major_axis')
-        self.assertEqual(self.panel4d._get_axis_name(3), 'minor_axis')
+        assert self.panel4d._get_axis_name(0) == 'labels'
+        assert self.panel4d._get_axis_name(1) == 'items'
+        assert self.panel4d._get_axis_name(2) == 'major_axis'
+        assert self.panel4d._get_axis_name(3) == 'minor_axis'

     def test_arith(self):
-        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+        with catch_warnings(record=True):
             self._test_op(self.panel4d,
operator.add) self._test_op(self.panel4d, operator.sub) self._test_op(self.panel4d, operator.mul) @@ -219,13 +209,13 @@ def test_arith(self): self._test_op(self.panel4d, lambda x, y: y / x) self._test_op(self.panel4d, lambda x, y: y ** x) - self.assertRaises(Exception, self.panel4d.__add__, - self.panel4d['l1']) + pytest.raises(Exception, self.panel4d.__add__, + self.panel4d['l1']) @staticmethod def _test_op(panel4d, op): result = op(panel4d, 1) - assert_panel_equal(result['l1'], op(panel4d['l1'], 1)) + tm.assert_panel_equal(result['l1'], op(panel4d['l1'], 1)) def test_keys(self): tm.equalContents(list(self.panel4d.keys()), self.panel4d.labels) @@ -233,48 +223,48 @@ def test_keys(self): def test_iteritems(self): """Test panel4d.iteritems()""" - self.assertEqual(len(list(self.panel4d.iteritems())), - len(self.panel4d.labels)) + assert (len(list(self.panel4d.iteritems())) == + len(self.panel4d.labels)) def test_combinePanel4d(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): result = self.panel4d.add(self.panel4d) - self.assert_panel4d_equal(result, self.panel4d * 2) + tm.assert_panel4d_equal(result, self.panel4d * 2) def test_neg(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assert_panel4d_equal(-self.panel4d, self.panel4d * -1) + with catch_warnings(record=True): + tm.assert_panel4d_equal(-self.panel4d, self.panel4d * -1) def test_select(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): p = self.panel4d # select labels result = p.select(lambda x: x in ('l1', 'l3'), axis='labels') expected = p.reindex(labels=['l1', 'l3']) - self.assert_panel4d_equal(result, expected) + tm.assert_panel4d_equal(result, expected) # select items result = p.select(lambda x: x in ('ItemA', 'ItemC'), axis='items') expected = p.reindex(items=['ItemA', 'ItemC']) - self.assert_panel4d_equal(result, expected) + tm.assert_panel4d_equal(result, expected) # select major_axis result = p.select(lambda x: x >= datetime(2000, 1, 15), axis='major') new_major = p.major_axis[p.major_axis >= datetime(2000, 1, 15)] expected = p.reindex(major=new_major) - self.assert_panel4d_equal(result, expected) + tm.assert_panel4d_equal(result, expected) # select minor_axis result = p.select(lambda x: x in ('D', 'A'), axis=3) expected = p.reindex(minor=['A', 'D']) - self.assert_panel4d_equal(result, expected) + tm.assert_panel4d_equal(result, expected) # corner case, empty thing result = p.select(lambda x: x in ('foo',), axis='items') - self.assert_panel4d_equal(result, p.reindex(items=[])) + tm.assert_panel4d_equal(result, p.reindex(items=[])) def test_get_value(self): @@ -287,15 +277,15 @@ def test_get_value(self): def test_abs(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): result = self.panel4d.abs() expected = np.abs(self.panel4d) - self.assert_panel4d_equal(result, expected) + tm.assert_panel4d_equal(result, expected) p = self.panel4d['l1'] result = p.abs() expected = np.abs(p) - assert_panel_equal(result, expected) + tm.assert_panel_equal(result, expected) df = p['ItemA'] result = df.abs() @@ -305,22 +295,20 @@ def test_abs(self): class CheckIndexing(object): - _multiprocess_can_split_ = True - def test_getitem(self): - self.assertRaises(Exception, self.panel4d.__getitem__, 'ItemQ') + pytest.raises(Exception, self.panel4d.__getitem__, 'ItemQ') def test_delitem_and_pop(self): - with 
tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): expected = self.panel4d['l2'] result = self.panel4d.pop('l2') - assert_panel_equal(expected, result) - self.assertNotIn('l2', self.panel4d.labels) + tm.assert_panel_equal(expected, result) + assert 'l2' not in self.panel4d.labels del self.panel4d['l3'] - self.assertNotIn('l3', self.panel4d.labels) - self.assertRaises(Exception, self.panel4d.__delitem__, 'l3') + assert 'l3' not in self.panel4d.labels + pytest.raises(Exception, self.panel4d.__delitem__, 'l3') values = np.empty((4, 4, 4, 4)) values[0] = 0 @@ -334,63 +322,61 @@ def test_delitem_and_pop(self): # did we delete the right row? panel4dc = panel4d.copy() del panel4dc[0] - assert_panel_equal(panel4dc[1], panel4d[1]) - assert_panel_equal(panel4dc[2], panel4d[2]) - assert_panel_equal(panel4dc[3], panel4d[3]) + tm.assert_panel_equal(panel4dc[1], panel4d[1]) + tm.assert_panel_equal(panel4dc[2], panel4d[2]) + tm.assert_panel_equal(panel4dc[3], panel4d[3]) panel4dc = panel4d.copy() del panel4dc[1] - assert_panel_equal(panel4dc[0], panel4d[0]) - assert_panel_equal(panel4dc[2], panel4d[2]) - assert_panel_equal(panel4dc[3], panel4d[3]) + tm.assert_panel_equal(panel4dc[0], panel4d[0]) + tm.assert_panel_equal(panel4dc[2], panel4d[2]) + tm.assert_panel_equal(panel4dc[3], panel4d[3]) panel4dc = panel4d.copy() del panel4dc[2] - assert_panel_equal(panel4dc[1], panel4d[1]) - assert_panel_equal(panel4dc[0], panel4d[0]) - assert_panel_equal(panel4dc[3], panel4d[3]) + tm.assert_panel_equal(panel4dc[1], panel4d[1]) + tm.assert_panel_equal(panel4dc[0], panel4d[0]) + tm.assert_panel_equal(panel4dc[3], panel4d[3]) panel4dc = panel4d.copy() del panel4dc[3] - assert_panel_equal(panel4dc[1], panel4d[1]) - assert_panel_equal(panel4dc[2], panel4d[2]) - assert_panel_equal(panel4dc[0], panel4d[0]) + tm.assert_panel_equal(panel4dc[1], panel4d[1]) + tm.assert_panel_equal(panel4dc[2], panel4d[2]) + tm.assert_panel_equal(panel4dc[0], panel4d[0]) def test_setitem(self): - # LongPanel with one item - # lp = self.panel.filter(['ItemA', 'ItemB']).to_frame() - # self.assertRaises(Exception, self.panel.__setitem__, - # 'ItemE', lp) + with catch_warnings(record=True): - # Panel - p = Panel(dict( - ItemA=self.panel4d['l1']['ItemA'][2:].filter(items=['A', 'B']))) - self.panel4d['l4'] = p - self.panel4d['l5'] = p + # Panel + p = Panel(dict( + ItemA=self.panel4d['l1']['ItemA'][2:].filter( + items=['A', 'B']))) + self.panel4d['l4'] = p + self.panel4d['l5'] = p - p2 = self.panel4d['l4'] + p2 = self.panel4d['l4'] - assert_panel_equal(p, p2.reindex(items=p.items, - major_axis=p.major_axis, - minor_axis=p.minor_axis)) + tm.assert_panel_equal(p, p2.reindex(items=p.items, + major_axis=p.major_axis, + minor_axis=p.minor_axis)) - # scalar - self.panel4d['lG'] = 1 - self.panel4d['lE'] = True - self.assertEqual(self.panel4d['lG'].values.dtype, np.int64) - self.assertEqual(self.panel4d['lE'].values.dtype, np.bool_) + # scalar + self.panel4d['lG'] = 1 + self.panel4d['lE'] = True + assert self.panel4d['lG'].values.dtype == np.int64 + assert self.panel4d['lE'].values.dtype == np.bool_ - # object dtype - self.panel4d['lQ'] = 'foo' - self.assertEqual(self.panel4d['lQ'].values.dtype, np.object_) + # object dtype + self.panel4d['lQ'] = 'foo' + assert self.panel4d['lQ'].values.dtype == np.object_ - # boolean dtype - self.panel4d['lP'] = self.panel4d['l1'] > 0 - self.assertEqual(self.panel4d['lP'].values.dtype, np.bool_) + # boolean dtype + self.panel4d['lP'] = self.panel4d['l1'] > 0 + assert 
self.panel4d['lP'].values.dtype == np.bool_ def test_setitem_by_indexer(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): # Panel panel4dc = self.panel4d.copy() @@ -398,34 +384,34 @@ def test_setitem_by_indexer(self): def func(): self.panel4d.iloc[0] = p - self.assertRaises(NotImplementedError, func) + pytest.raises(NotImplementedError, func) # DataFrame panel4dc = self.panel4d.copy() df = panel4dc.iloc[0, 0] df.iloc[:] = 1 panel4dc.iloc[0, 0] = df - self.assertTrue((panel4dc.iloc[0, 0].values == 1).all()) + assert (panel4dc.iloc[0, 0].values == 1).all() # Series panel4dc = self.panel4d.copy() s = panel4dc.iloc[0, 0, :, 0] s.iloc[:] = 1 panel4dc.iloc[0, 0, :, 0] = s - self.assertTrue((panel4dc.iloc[0, 0, :, 0].values == 1).all()) + assert (panel4dc.iloc[0, 0, :, 0].values == 1).all() # scalar panel4dc = self.panel4d.copy() panel4dc.iloc[0] = 1 panel4dc.iloc[1] = True panel4dc.iloc[2] = 'foo' - self.assertTrue((panel4dc.iloc[0].values == 1).all()) - self.assertTrue(panel4dc.iloc[1].values.all()) - self.assertTrue((panel4dc.iloc[2].values == 'foo').all()) + assert (panel4dc.iloc[0].values == 1).all() + assert panel4dc.iloc[1].values.all() + assert (panel4dc.iloc[2].values == 'foo').all() def test_setitem_by_indexer_mixed_type(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): # GH 8702 self.panel4d['foo'] = 'bar' @@ -434,12 +420,12 @@ def test_setitem_by_indexer_mixed_type(self): panel4dc.iloc[0] = 1 panel4dc.iloc[1] = True panel4dc.iloc[2] = 'foo' - self.assertTrue((panel4dc.iloc[0].values == 1).all()) - self.assertTrue(panel4dc.iloc[1].values.all()) - self.assertTrue((panel4dc.iloc[2].values == 'foo').all()) + assert (panel4dc.iloc[0].values == 1).all() + assert panel4dc.iloc[1].values.all() + assert (panel4dc.iloc[2].values == 'foo').all() def test_comparisons(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): p1 = tm.makePanel4D() p2 = tm.makePanel4D() @@ -448,18 +434,18 @@ def test_comparisons(self): def test_comp(func): result = func(p1, p2) - self.assert_numpy_array_equal(result.values, - func(p1.values, p2.values)) + tm.assert_numpy_array_equal(result.values, + func(p1.values, p2.values)) # versus non-indexed same objs - self.assertRaises(Exception, func, p1, tp) + pytest.raises(Exception, func, p1, tp) # versus different objs - self.assertRaises(Exception, func, p1, p) + pytest.raises(Exception, func, p1, p) result3 = func(self.panel4d, 0) - self.assert_numpy_array_equal(result3.values, - func(self.panel4d.values, 0)) + tm.assert_numpy_array_equal(result3.values, + func(self.panel4d.values, 0)) with np.errstate(invalid='ignore'): test_comp(operator.eq) @@ -473,56 +459,62 @@ def test_major_xs(self): ref = self.panel4d['l1']['ItemA'] idx = self.panel4d.major_axis[5] - xs = self.panel4d.major_xs(idx) + with catch_warnings(record=True): + xs = self.panel4d.major_xs(idx) assert_series_equal(xs['l1'].T['ItemA'], ref.xs(idx), check_names=False) # not contained idx = self.panel4d.major_axis[0] - BDay() - self.assertRaises(Exception, self.panel4d.major_xs, idx) + pytest.raises(Exception, self.panel4d.major_xs, idx) def test_major_xs_mixed(self): self.panel4d['l4'] = 'foo' - xs = self.panel4d.major_xs(self.panel4d.major_axis[0]) - self.assertEqual(xs['l1']['A'].dtype, np.float64) - self.assertEqual(xs['l4']['A'].dtype, np.object_) + with catch_warnings(record=True): + xs = 
self.panel4d.major_xs(self.panel4d.major_axis[0]) + assert xs['l1']['A'].dtype == np.float64 + assert xs['l4']['A'].dtype == np.object_ def test_minor_xs(self): ref = self.panel4d['l1']['ItemA'] - idx = self.panel4d.minor_axis[1] - xs = self.panel4d.minor_xs(idx) + with catch_warnings(record=True): + idx = self.panel4d.minor_axis[1] + xs = self.panel4d.minor_xs(idx) assert_series_equal(xs['l1'].T['ItemA'], ref[idx], check_names=False) # not contained - self.assertRaises(Exception, self.panel4d.minor_xs, 'E') + pytest.raises(Exception, self.panel4d.minor_xs, 'E') def test_minor_xs_mixed(self): self.panel4d['l4'] = 'foo' - xs = self.panel4d.minor_xs('D') - self.assertEqual(xs['l1'].T['ItemA'].dtype, np.float64) - self.assertEqual(xs['l4'].T['ItemA'].dtype, np.object_) + with catch_warnings(record=True): + xs = self.panel4d.minor_xs('D') + assert xs['l1'].T['ItemA'].dtype == np.float64 + assert xs['l4'].T['ItemA'].dtype == np.object_ def test_xs(self): l1 = self.panel4d.xs('l1', axis=0) expected = self.panel4d['l1'] - assert_panel_equal(l1, expected) + tm.assert_panel_equal(l1, expected) - # view if possible + # View if possible l1_view = self.panel4d.xs('l1', axis=0) l1_view.values[:] = np.nan - self.assertTrue(np.isnan(self.panel4d['l1'].values).all()) + assert np.isnan(self.panel4d['l1'].values).all() - # mixed-type + # Mixed-type self.panel4d['strings'] = 'foo' - result = self.panel4d.xs('D', axis=3) - self.assertIsNotNone(result.is_copy) + with catch_warnings(record=True): + result = self.panel4d.xs('D', axis=3) + + assert result.is_copy is not None def test_getitem_fancy_labels(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): panel4d = self.panel4d labels = panel4d.labels[[1, 0]] @@ -531,34 +523,34 @@ def test_getitem_fancy_labels(self): cols = ['D', 'C', 'F'] # all 4 specified - assert_panel4d_equal(panel4d.loc[labels, items, dates, cols], - panel4d.reindex(labels=labels, items=items, - major=dates, minor=cols)) + tm.assert_panel4d_equal(panel4d.loc[labels, items, dates, cols], + panel4d.reindex(labels=labels, items=items, + major=dates, minor=cols)) # 3 specified - assert_panel4d_equal(panel4d.loc[:, items, dates, cols], - panel4d.reindex(items=items, major=dates, - minor=cols)) + tm.assert_panel4d_equal(panel4d.loc[:, items, dates, cols], + panel4d.reindex(items=items, major=dates, + minor=cols)) # 2 specified - assert_panel4d_equal(panel4d.loc[:, :, dates, cols], - panel4d.reindex(major=dates, minor=cols)) + tm.assert_panel4d_equal(panel4d.loc[:, :, dates, cols], + panel4d.reindex(major=dates, minor=cols)) - assert_panel4d_equal(panel4d.loc[:, items, :, cols], - panel4d.reindex(items=items, minor=cols)) + tm.assert_panel4d_equal(panel4d.loc[:, items, :, cols], + panel4d.reindex(items=items, minor=cols)) - assert_panel4d_equal(panel4d.loc[:, items, dates, :], - panel4d.reindex(items=items, major=dates)) + tm.assert_panel4d_equal(panel4d.loc[:, items, dates, :], + panel4d.reindex(items=items, major=dates)) # only 1 - assert_panel4d_equal(panel4d.loc[:, items, :, :], - panel4d.reindex(items=items)) + tm.assert_panel4d_equal(panel4d.loc[:, items, :, :], + panel4d.reindex(items=items)) - assert_panel4d_equal(panel4d.loc[:, :, dates, :], - panel4d.reindex(major=dates)) + tm.assert_panel4d_equal(panel4d.loc[:, :, dates, :], + panel4d.reindex(major=dates)) - assert_panel4d_equal(panel4d.loc[:, :, :, cols], - panel4d.reindex(minor=cols)) + tm.assert_panel4d_equal(panel4d.loc[:, :, :, cols], + panel4d.reindex(minor=cols)) def 
test_getitem_fancy_slice(self): pass @@ -578,62 +570,56 @@ def test_get_value(self): def test_set_value(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): for label in self.panel4d.labels: for item in self.panel4d.items: for mjr in self.panel4d.major_axis[::2]: for mnr in self.panel4d.minor_axis: self.panel4d.set_value(label, item, mjr, mnr, 1.) - assert_almost_equal( + tm.assert_almost_equal( self.panel4d[label][item][mnr][mjr], 1.) res3 = self.panel4d.set_value('l4', 'ItemE', 'foobar', 'baz', 5) - self.assertTrue(is_float_dtype(res3['l4'].values)) + assert is_float_dtype(res3['l4'].values) # resize res = self.panel4d.set_value('l4', 'ItemE', 'foo', 'bar', 1.5) - tm.assertIsInstance(res, Panel4D) - self.assertIsNot(res, self.panel4d) - self.assertEqual(res.get_value('l4', 'ItemE', 'foo', 'bar'), 1.5) + assert isinstance(res, Panel4D) + assert res is not self.panel4d + assert res.get_value('l4', 'ItemE', 'foo', 'bar') == 1.5 res3 = self.panel4d.set_value('l4', 'ItemE', 'foobar', 'baz', 5) - self.assertTrue(is_float_dtype(res3['l4'].values)) + assert is_float_dtype(res3['l4'].values) -class TestPanel4d(tm.TestCase, CheckIndexing, SafeForSparse, +class TestPanel4d(CheckIndexing, SafeForSparse, SafeForLongAndSparse): - _multiprocess_can_split_ = True - - @classmethod - def assert_panel4d_equal(cls, x, y): - assert_panel4d_equal(x, y) - - def setUp(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + def setup_method(self, method): + with catch_warnings(record=True): self.panel4d = tm.makePanel4D(nper=8) add_nans(self.panel4d) def test_constructor(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): panel4d = Panel4D(self.panel4d._data) - self.assertIs(panel4d._data, self.panel4d._data) + assert panel4d._data is self.panel4d._data panel4d = Panel4D(self.panel4d._data, copy=True) - self.assertIsNot(panel4d._data, self.panel4d._data) - assert_panel4d_equal(panel4d, self.panel4d) + assert panel4d._data is not self.panel4d._data + tm.assert_panel4d_equal(panel4d, self.panel4d) vals = self.panel4d.values # no copy panel4d = Panel4D(vals) - self.assertIs(panel4d.values, vals) + assert panel4d.values is vals # copy panel4d = Panel4D(vals, copy=True) - self.assertIsNot(panel4d.values, vals) + assert panel4d.values is not vals # GH #8285, test when scalar data is used to construct a Panel4D # if dtype is not passed, it should be inferred @@ -645,7 +631,7 @@ def test_constructor(self): vals = np.empty((2, 3, 4, 5), dtype=dtype) vals.fill(val) expected = Panel4D(vals, dtype=dtype) - assert_panel4d_equal(panel4d, expected) + tm.assert_panel4d_equal(panel4d, expected) # test the case when dtype is passed panel4d = Panel4D(1, labels=range(2), items=range( @@ -654,10 +640,10 @@ def test_constructor(self): vals.fill(1) expected = Panel4D(vals, dtype='float32') - assert_panel4d_equal(panel4d, expected) + tm.assert_panel4d_equal(panel4d, expected) def test_constructor_cast(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): zero_filled = self.panel4d.fillna(0) casted = Panel4D(zero_filled._data, dtype=int) @@ -676,60 +662,60 @@ def test_constructor_cast(self): # can't cast data = [[['foo', 'bar', 'baz']]] - self.assertRaises(ValueError, Panel, data, dtype=float) + pytest.raises(ValueError, Panel, data, dtype=float) def test_consolidate(self): - with tm.assert_produces_warning(FutureWarning, 
check_stacklevel=False): - self.assertTrue(self.panel4d._data.is_consolidated()) + with catch_warnings(record=True): + assert self.panel4d._data.is_consolidated() self.panel4d['foo'] = 1. - self.assertFalse(self.panel4d._data.is_consolidated()) + assert not self.panel4d._data.is_consolidated() - panel4d = self.panel4d.consolidate() - self.assertTrue(panel4d._data.is_consolidated()) + panel4d = self.panel4d._consolidate() + assert panel4d._data.is_consolidated() def test_ctor_dict(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): l1 = self.panel4d['l1'] l2 = self.panel4d['l2'] d = {'A': l1, 'B': l2.loc[['ItemB'], :, :]} panel4d = Panel4D(d) - assert_panel_equal(panel4d['A'], self.panel4d['l1']) - assert_frame_equal(panel4d.loc['B', 'ItemB', :, :], - self.panel4d.loc['l2', ['ItemB'], - :, :]['ItemB']) + tm.assert_panel_equal(panel4d['A'], self.panel4d['l1']) + tm.assert_frame_equal(panel4d.loc['B', 'ItemB', :, :], + self.panel4d.loc['l2', ['ItemB'], + :, :]['ItemB']) def test_constructor_dict_mixed(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): data = dict((k, v.values) for k, v in self.panel4d.iteritems()) result = Panel4D(data) exp_major = Index(np.arange(len(self.panel4d.major_axis))) - self.assert_index_equal(result.major_axis, exp_major) + tm.assert_index_equal(result.major_axis, exp_major) result = Panel4D(data, labels=self.panel4d.labels, items=self.panel4d.items, major_axis=self.panel4d.major_axis, minor_axis=self.panel4d.minor_axis) - assert_panel4d_equal(result, self.panel4d) + tm.assert_panel4d_equal(result, self.panel4d) data['l2'] = self.panel4d['l2'] result = Panel4D(data) - assert_panel4d_equal(result, self.panel4d) + tm.assert_panel4d_equal(result, self.panel4d) # corner, blow up data['l2'] = data['l2']['ItemB'] - self.assertRaises(Exception, Panel4D, data) + pytest.raises(Exception, Panel4D, data) data['l2'] = self.panel4d['l2'].values[:, :, :-1] - self.assertRaises(Exception, Panel4D, data) + pytest.raises(Exception, Panel4D, data) def test_constructor_resize(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): data = self.panel4d._data labels = self.panel4d.labels[:-1] items = self.panel4d.items[:-1] @@ -740,36 +726,39 @@ def test_constructor_resize(self): major_axis=major, minor_axis=minor) expected = self.panel4d.reindex( labels=labels, items=items, major=major, minor=minor) - assert_panel4d_equal(result, expected) + tm.assert_panel4d_equal(result, expected) result = Panel4D(data, items=items, major_axis=major) expected = self.panel4d.reindex(items=items, major=major) - assert_panel4d_equal(result, expected) + tm.assert_panel4d_equal(result, expected) result = Panel4D(data, items=items) expected = self.panel4d.reindex(items=items) - assert_panel4d_equal(result, expected) + tm.assert_panel4d_equal(result, expected) result = Panel4D(data, minor_axis=minor) expected = self.panel4d.reindex(minor=minor) - assert_panel4d_equal(result, expected) + tm.assert_panel4d_equal(result, expected) def test_conform(self): + with catch_warnings(record=True): - p = self.panel4d['l1'].filter(items=['ItemA', 'ItemB']) - conformed = self.panel4d.conform(p) + p = self.panel4d['l1'].filter(items=['ItemA', 'ItemB']) + conformed = self.panel4d.conform(p) - tm.assert_index_equal(conformed.items, self.panel4d.labels) - tm.assert_index_equal(conformed.major_axis, self.panel4d.major_axis) - 
tm.assert_index_equal(conformed.minor_axis, self.panel4d.minor_axis) + tm.assert_index_equal(conformed.items, self.panel4d.labels) + tm.assert_index_equal(conformed.major_axis, + self.panel4d.major_axis) + tm.assert_index_equal(conformed.minor_axis, + self.panel4d.minor_axis) def test_reindex(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): ref = self.panel4d['l2'] # labels result = self.panel4d.reindex(labels=['l1', 'l2']) - assert_panel_equal(result['l2'], ref) + tm.assert_panel_equal(result['l2'], ref) # items result = self.panel4d.reindex(items=['ItemA', 'ItemB']) @@ -782,8 +771,8 @@ def test_reindex(self): result['l2']['ItemB'], ref['ItemB'].reindex(index=new_major)) # raise exception put both major and major_axis - self.assertRaises(Exception, self.panel4d.reindex, - major_axis=new_major, major=new_major) + pytest.raises(Exception, self.panel4d.reindex, + major_axis=new_major, major=new_major) # minor new_minor = list(self.panel4d.minor_axis[:2]) @@ -798,8 +787,8 @@ def test_reindex(self): # don't necessarily copy result = self.panel4d.reindex() - assert_panel4d_equal(result, self.panel4d) - self.assertFalse(result is self.panel4d) + tm.assert_panel4d_equal(result, self.panel4d) + assert result is not self.panel4d # with filling smaller_major = self.panel4d.major_axis[::5] @@ -808,33 +797,34 @@ def test_reindex(self): larger = smaller.reindex(major=self.panel4d.major_axis, method='pad') - assert_panel_equal(larger.loc[:, :, self.panel4d.major_axis[1], :], - smaller.loc[:, :, smaller_major[0], :]) + tm.assert_panel_equal(larger.loc[:, :, + self.panel4d.major_axis[1], :], + smaller.loc[:, :, smaller_major[0], :]) # don't necessarily copy result = self.panel4d.reindex( major=self.panel4d.major_axis, copy=False) - assert_panel4d_equal(result, self.panel4d) - self.assertTrue(result is self.panel4d) + tm.assert_panel4d_equal(result, self.panel4d) + assert result is self.panel4d def test_not_hashable(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): p4D_empty = Panel4D() - self.assertRaises(TypeError, hash, p4D_empty) - self.assertRaises(TypeError, hash, self.panel4d) + pytest.raises(TypeError, hash, p4D_empty) + pytest.raises(TypeError, hash, self.panel4d) def test_reindex_like(self): # reindex_like - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): smaller = self.panel4d.reindex(labels=self.panel4d.labels[:-1], items=self.panel4d.items[:-1], major=self.panel4d.major_axis[:-1], minor=self.panel4d.minor_axis[:-1]) smaller_like = self.panel4d.reindex_like(smaller) - assert_panel4d_equal(smaller, smaller_like) + tm.assert_panel4d_equal(smaller, smaller_like) def test_sort_index(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): import random rlabels = list(self.panel4d.labels) @@ -848,47 +838,47 @@ def test_sort_index(self): random_order = self.panel4d.reindex(labels=rlabels) sorted_panel4d = random_order.sort_index(axis=0) - assert_panel4d_equal(sorted_panel4d, self.panel4d) + tm.assert_panel4d_equal(sorted_panel4d, self.panel4d) def test_fillna(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assertFalse(np.isfinite(self.panel4d.values).all()) + with catch_warnings(record=True): + assert not np.isfinite(self.panel4d.values).all() filled = self.panel4d.fillna(0) - 
self.assertTrue(np.isfinite(filled.values).all()) + assert np.isfinite(filled.values).all() - self.assertRaises(NotImplementedError, - self.panel4d.fillna, method='pad') + pytest.raises(NotImplementedError, + self.panel4d.fillna, method='pad') def test_swapaxes(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): result = self.panel4d.swapaxes('labels', 'items') - self.assertIs(result.items, self.panel4d.labels) + assert result.items is self.panel4d.labels result = self.panel4d.swapaxes('labels', 'minor') - self.assertIs(result.labels, self.panel4d.minor_axis) + assert result.labels is self.panel4d.minor_axis result = self.panel4d.swapaxes('items', 'minor') - self.assertIs(result.items, self.panel4d.minor_axis) + assert result.items is self.panel4d.minor_axis result = self.panel4d.swapaxes('items', 'major') - self.assertIs(result.items, self.panel4d.major_axis) + assert result.items is self.panel4d.major_axis result = self.panel4d.swapaxes('major', 'minor') - self.assertIs(result.major_axis, self.panel4d.minor_axis) + assert result.major_axis is self.panel4d.minor_axis # this should also work result = self.panel4d.swapaxes(0, 1) - self.assertIs(result.labels, self.panel4d.items) + assert result.labels is self.panel4d.items # this works, but return a copy result = self.panel4d.swapaxes('items', 'items') - assert_panel4d_equal(self.panel4d, result) - self.assertNotEqual(id(self.panel4d), id(result)) + tm.assert_panel4d_equal(self.panel4d, result) + assert id(self.panel4d) != id(result) def test_update(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): p4d = Panel4D([[[[1.5, np.nan, 3.], [1.5, np.nan, 3.], [1.5, np.nan, 3.], @@ -912,7 +902,7 @@ def test_update(self): [1.5, np.nan, 3.], [1.5, np.nan, 3.]]]]) - assert_panel4d_equal(p4d, expected) + tm.assert_panel4d_equal(p4d, expected) def test_dtypes(self): @@ -921,12 +911,12 @@ def test_dtypes(self): assert_series_equal(result, expected) def test_repr_empty(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): empty = Panel4D() repr(empty) def test_rename(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): mapper = {'l1': 'foo', 'l2': 'bar', @@ -934,23 +924,23 @@ def test_rename(self): renamed = self.panel4d.rename_axis(mapper, axis=0) exp = Index(['foo', 'bar', 'baz']) - self.assert_index_equal(renamed.labels, exp) + tm.assert_index_equal(renamed.labels, exp) renamed = self.panel4d.rename_axis(str.lower, axis=3) exp = Index(['a', 'b', 'c', 'd']) - self.assert_index_equal(renamed.minor_axis, exp) + tm.assert_index_equal(renamed.minor_axis, exp) # don't copy renamed_nocopy = self.panel4d.rename_axis(mapper, axis=0, copy=False) renamed_nocopy['foo'] = 3. 
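+            # copy=False means renamed_nocopy shares its data with the
+            # original, so assigning to 'foo' (the renamed 'l1') writes
+            # through to self.panel4d['l1'], as the next assert verifies.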
- self.assertTrue((self.panel4d['l1'].values == 3).all()) + assert (self.panel4d['l1'].values == 3).all() def test_get_attr(self): - assert_panel_equal(self.panel4d['l1'], self.panel4d.l1) - + tm.assert_panel_equal(self.panel4d['l1'], self.panel4d.l1) -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + # GH issue 15960 + def test_sort_values(self): + pytest.raises(NotImplementedError, self.panel4d.sort_values) + pytest.raises(NotImplementedError, self.panel4d.sort_values, 'ItemA') diff --git a/pandas/tests/test_panelnd.py b/pandas/tests/test_panelnd.py index 92805f3b30ec6..c473e3c09cc74 100644 --- a/pandas/tests/test_panelnd.py +++ b/pandas/tests/test_panelnd.py @@ -1,6 +1,7 @@ # -*- coding: utf-8 -*- -import nose +import pytest +from warnings import catch_warnings from pandas.core import panelnd from pandas.core.panel import Panel @@ -8,14 +9,14 @@ import pandas.util.testing as tm -class TestPanelnd(tm.TestCase): +class TestPanelnd(object): - def setUp(self): + def setup_method(self, method): pass def test_4d_construction(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): # create a 4D Panel4D = panelnd.create_nd_panel_factory( @@ -31,7 +32,7 @@ def test_4d_construction(self): def test_4d_construction_alt(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): # create a 4D Panel4D = panelnd.create_nd_panel_factory( @@ -48,22 +49,22 @@ def test_4d_construction_alt(self): def test_4d_construction_error(self): # create a 4D - self.assertRaises(Exception, - panelnd.create_nd_panel_factory, - klass_name='Panel4D', - orders=['labels', 'items', 'major_axis', - 'minor_axis'], - slices={'items': 'items', - 'major_axis': 'major_axis', - 'minor_axis': 'minor_axis'}, - slicer='foo', - aliases={'major': 'major_axis', - 'minor': 'minor_axis'}, - stat_axis=2) + pytest.raises(Exception, + panelnd.create_nd_panel_factory, + klass_name='Panel4D', + orders=['labels', 'items', 'major_axis', + 'minor_axis'], + slices={'items': 'items', + 'major_axis': 'major_axis', + 'minor_axis': 'minor_axis'}, + slicer='foo', + aliases={'major': 'major_axis', + 'minor': 'minor_axis'}, + stat_axis=2) def test_5d_construction(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): # create a 4D Panel4D = panelnd.create_nd_panel_factory( @@ -101,7 +102,3 @@ def test_5d_construction(self): # test a transpose # results = p5d.transpose(1,2,3,4,0) # expected = - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tseries/tests/test_resample.py b/pandas/tests/test_resample.py old mode 100755 new mode 100644 similarity index 87% rename from pandas/tseries/tests/test_resample.py rename to pandas/tests/test_resample.py index 56953541265a6..d42e37048d87f --- a/pandas/tseries/tests/test_resample.py +++ b/pandas/tests/test_resample.py @@ -1,33 +1,37 @@ # pylint: disable=E1101 +from warnings import catch_warnings from datetime import datetime, timedelta from functools import partial +from textwrap import dedent -import nose +import pytz +import pytest +import dateutil import numpy as np import pandas as pd import pandas.tseries.offsets as offsets import pandas.util.testing as tm -from pandas import (Series, DataFrame, Panel, Index, isnull, - notnull, Timestamp) +from pandas import (Series, DataFrame, 
Panel, Index, isna, + notna, Timestamp) -from pandas.types.generic import ABCSeries, ABCDataFrame +from pandas.core.dtypes.generic import ABCSeries, ABCDataFrame from pandas.compat import range, lrange, zip, product, OrderedDict from pandas.core.base import SpecificationError -from pandas.core.common import UnsupportedFunctionCall +from pandas.errors import UnsupportedFunctionCall from pandas.core.groupby import DataError from pandas.tseries.frequencies import MONTHS, DAYS from pandas.tseries.frequencies import to_offset -from pandas.tseries.index import date_range +from pandas.core.indexes.datetimes import date_range from pandas.tseries.offsets import Minute, BDay -from pandas.tseries.period import period_range, PeriodIndex, Period -from pandas.tseries.resample import (DatetimeIndex, TimeGrouper, - DatetimeIndexResampler) -from pandas.tseries.tdi import timedelta_range, TimedeltaIndex +from pandas.core.indexes.period import period_range, PeriodIndex, Period +from pandas.core.resample import (DatetimeIndex, TimeGrouper, + DatetimeIndexResampler) +from pandas.core.indexes.timedeltas import timedelta_range, TimedeltaIndex from pandas.util.testing import (assert_series_equal, assert_almost_equal, assert_frame_equal, assert_index_equal) -from pandas._period import IncompatibleFrequency +from pandas._libs.period import IncompatibleFrequency bday = BDay() @@ -49,10 +53,9 @@ def _simple_pts(start, end, freq='D'): return Series(np.random.randn(len(rng)), index=rng) -class TestResampleAPI(tm.TestCase): - _multiprocess_can_split_ = True +class TestResampleAPI(object): - def setUp(self): + def setup_method(self, method): dti = DatetimeIndex(start=datetime(2005, 1, 1), end=datetime(2005, 1, 10), freq='Min') @@ -63,21 +66,20 @@ def setUp(self): def test_str(self): r = self.series.resample('H') - self.assertTrue( - 'DatetimeIndexResampler [freq=, axis=0, closed=left, ' - 'label=left, convention=start, base=0]' in str(r)) + assert ('DatetimeIndexResampler [freq=, axis=0, closed=left, ' + 'label=left, convention=start, base=0]' in str(r)) def test_api(self): r = self.series.resample('H') result = r.mean() - self.assertIsInstance(result, Series) - self.assertEqual(len(result), 217) + assert isinstance(result, Series) + assert len(result) == 217 r = self.series.to_frame().resample('H') result = r.mean() - self.assertIsInstance(result, DataFrame) - self.assertEqual(len(result), 217) + assert isinstance(result, DataFrame) + assert len(result) == 217 def test_api_changes_v018(self): @@ -85,7 +87,7 @@ def test_api_changes_v018(self): # to .resample(......).how() r = self.series.resample('H') - self.assertIsInstance(r, DatetimeIndexResampler) + assert isinstance(r, DatetimeIndexResampler) for how in ['sum', 'mean', 'prod', 'min', 'max', 'var', 'std']: with tm.assert_produces_warning(FutureWarning, @@ -101,25 +103,25 @@ def test_api_changes_v018(self): tm.assert_frame_equal(result, expected) # compat for pandas-like methods - for how in ['sort_values', 'isnull']: + for how in ['sort_values', 'isna']: with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): getattr(r, how)() # invalids as these can be setting operations r = self.series.resample('H') - self.assertRaises(ValueError, lambda: r.iloc[0]) - self.assertRaises(ValueError, lambda: r.iat[0]) - self.assertRaises(ValueError, lambda: r.loc[0]) - self.assertRaises(ValueError, lambda: r.loc[ + pytest.raises(ValueError, lambda: r.iloc[0]) + pytest.raises(ValueError, lambda: r.iat[0]) + pytest.raises(ValueError, lambda: r.loc[0]) + 
pytest.raises(ValueError, lambda: r.loc[ Timestamp('2013-01-01 00:00:00', offset='H')]) - self.assertRaises(ValueError, lambda: r.at[ + pytest.raises(ValueError, lambda: r.at[ Timestamp('2013-01-01 00:00:00', offset='H')]) def f(): r[0] = 5 - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # str/repr r = self.series.resample('H') @@ -133,10 +135,10 @@ def f(): tm.assert_numpy_array_equal(np.array(r), np.array(r.mean())) # masquerade as Series/DataFrame as needed for API compat - self.assertTrue(isinstance(self.series.resample('H'), ABCSeries)) - self.assertFalse(isinstance(self.frame.resample('H'), ABCSeries)) - self.assertFalse(isinstance(self.series.resample('H'), ABCDataFrame)) - self.assertTrue(isinstance(self.frame.resample('H'), ABCDataFrame)) + assert isinstance(self.series.resample('H'), ABCSeries) + assert not isinstance(self.frame.resample('H'), ABCSeries) + assert not isinstance(self.series.resample('H'), ABCDataFrame) + assert isinstance(self.frame.resample('H'), ABCDataFrame) # bin numeric ops for op in ['__add__', '__mul__', '__truediv__', '__div__', '__sub__']: @@ -147,7 +149,7 @@ def f(): with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assertIsInstance(getattr(r, op)(2), pd.Series) + assert isinstance(getattr(r, op)(2), pd.Series) # unary numeric ops for op in ['__pos__', '__neg__', '__abs__', '__inv__']: @@ -158,7 +160,7 @@ def f(): with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assertIsInstance(getattr(r, op)(), pd.Series) + assert isinstance(getattr(r, op)(), pd.Series) # comparison ops for op in ['__lt__', '__le__', '__gt__', '__ge__', '__eq__', '__ne__']: @@ -166,7 +168,7 @@ def f(): with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assertIsInstance(getattr(r, op)(2), pd.Series) + assert isinstance(getattr(r, op)(2), pd.Series) # IPython introspection shouldn't trigger warning GH 13618 for op in ['_repr_json', '_repr_latex', @@ -179,7 +181,7 @@ def f(): df = self.series.to_frame('foo') # same as prior versions for DataFrame - self.assertRaises(KeyError, lambda: df.resample('H')[0]) + pytest.raises(KeyError, lambda: df.resample('H')[0]) # compat for Series # but we cannot be sure that we need a warning here @@ -187,13 +189,13 @@ def f(): check_stacklevel=False): result = self.series.resample('H')[0] expected = self.series.resample('H').mean()[0] - self.assertEqual(result, expected) + assert result == expected with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): result = self.series.resample('H')['2005-01-09 23:00:00'] expected = self.series.resample('H').mean()['2005-01-09 23:00:00'] - self.assertEqual(result, expected) + assert result == expected def test_groupby_resample_api(self): @@ -217,6 +219,20 @@ def test_groupby_resample_api(self): lambda x: x.resample('1D').ffill())[['val']] assert_frame_equal(result, expected) + def test_groupby_resample_on_api(self): + + # GH 15021 + # .groupby(...).resample(on=...) results in an unexpected + # keyword warning. 
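+        # Passing on='dates' resamples on that column instead of the
+        # index, so the result should match resampling after
+        # set_index('dates').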
+ df = pd.DataFrame({'key': ['A', 'B'] * 5, + 'dates': pd.date_range('2016-01-01', periods=10), + 'values': np.random.randn(10)}) + + expected = df.set_index('dates').groupby('key').resample('D').mean() + + result = df.groupby('key').resample('D', on='dates').mean() + assert_frame_equal(result, expected) + def test_plot_api(self): tm._skip_if_no_mpl() @@ -241,7 +257,7 @@ def test_getitem(self): tm.assert_index_equal(r._selected_obj.columns, self.frame.columns) r = self.frame.resample('H')['B'] - self.assertEqual(r._selected_obj.name, self.frame.columns[1]) + assert r._selected_obj.name == self.frame.columns[1] # technically this is allowed r = self.frame.resample('H')['A', 'B'] @@ -255,10 +271,10 @@ def test_getitem(self): def test_select_bad_cols(self): g = self.frame.resample('H') - self.assertRaises(KeyError, g.__getitem__, ['D']) + pytest.raises(KeyError, g.__getitem__, ['D']) - self.assertRaises(KeyError, g.__getitem__, ['A', 'D']) - with tm.assertRaisesRegexp(KeyError, '^[^A]+$'): + pytest.raises(KeyError, g.__getitem__, ['A', 'D']) + with tm.assert_raises_regex(KeyError, '^[^A]+$'): # A should not be referenced as a bad column... # will have to rethink regex if you change message! g[['A', 'D']] @@ -269,14 +285,13 @@ def test_attribute_access(self): tm.assert_series_equal(r.A.sum(), r['A'].sum()) # getting - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assertRaises(AttributeError, lambda: r.F) + pytest.raises(AttributeError, lambda: r.F) # setting def f(): r.F = 'bah' - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) def test_api_compat_before_use(self): @@ -358,7 +373,7 @@ def test_fillna(self): result = r.fillna(method='bfill') assert_series_equal(result, expected) - with self.assertRaises(ValueError): + with pytest.raises(ValueError): r.fillna(0) def test_apply_without_aggregation(self): @@ -381,8 +396,10 @@ def test_agg_consistency(self): r = df.resample('3T') - expected = r[['A', 'B', 'C']].agg({'r1': 'mean', 'r2': 'sum'}) - result = r.agg({'r1': 'mean', 'r2': 'sum'}) + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + expected = r[['A', 'B', 'C']].agg({'r1': 'mean', 'r2': 'sum'}) + result = r.agg({'r1': 'mean', 'r2': 'sum'}) assert_frame_equal(result, expected) # TODO: once GH 14008 is fixed, move these tests into @@ -446,7 +463,9 @@ def test_agg(self): expected.columns = pd.MultiIndex.from_tuples([('A', 'mean'), ('A', 'sum')]) for t in cases: - result = t.aggregate({'A': {'mean': 'mean', 'sum': 'sum'}}) + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + result = t.aggregate({'A': {'mean': 'mean', 'sum': 'sum'}}) assert_frame_equal(result, expected, check_like=True) expected = pd.concat([a_mean, a_sum, b_mean, b_sum], axis=1) @@ -455,8 +474,10 @@ def test_agg(self): ('B', 'mean2'), ('B', 'sum2')]) for t in cases: - result = t.aggregate({'A': {'mean': 'mean', 'sum': 'sum'}, - 'B': {'mean2': 'mean', 'sum2': 'sum'}}) + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + result = t.aggregate({'A': {'mean': 'mean', 'sum': 'sum'}, + 'B': {'mean2': 'mean', 'sum2': 'sum'}}) assert_frame_equal(result, expected, check_like=True) expected = pd.concat([a_mean, a_std, b_mean, b_std], axis=1) @@ -516,9 +537,12 @@ def test_agg_misc(self): ('result1', 'B'), ('result2', 'A'), ('result2', 'B')]) + for t in cases: - result = t[['A', 'B']].agg(OrderedDict([('result1', np.sum), - ('result2', np.mean)])) + with tm.assert_produces_warning(FutureWarning, + 
check_stacklevel=False): + result = t[['A', 'B']].agg(OrderedDict([('result1', np.sum), + ('result2', np.mean)])) assert_frame_equal(result, expected, check_like=True) # agg with different hows @@ -544,7 +568,9 @@ def test_agg_misc(self): # series like aggs for t in cases: - result = t['A'].agg({'A': ['sum', 'std']}) + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + result = t['A'].agg({'A': ['sum', 'std']}) expected = pd.concat([t['A'].sum(), t['A'].std()], axis=1) @@ -559,17 +585,22 @@ def test_agg_misc(self): ('A', 'std'), ('B', 'mean'), ('B', 'std')]) - result = t['A'].agg({'A': ['sum', 'std'], 'B': ['mean', 'std']}) + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + result = t['A'].agg({'A': ['sum', 'std'], + 'B': ['mean', 'std']}) assert_frame_equal(result, expected, check_like=True) # errors # invalid names in the agg specification for t in cases: def f(): - t[['A']].agg({'A': ['sum', 'std'], - 'B': ['mean', 'std']}) + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + t[['A']].agg({'A': ['sum', 'std'], + 'B': ['mean', 'std']}) - self.assertRaises(SpecificationError, f) + pytest.raises(SpecificationError, f) def test_agg_nested_dicts(self): @@ -596,7 +627,7 @@ def test_agg_nested_dicts(self): def f(): t.aggregate({'r1': {'A': ['mean', 'sum']}, 'r2': {'B': ['mean', 'sum']}}) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) for t in cases: expected = pd.concat([t['A'].mean(), t['A'].std(), t['B'].mean(), @@ -604,12 +635,16 @@ def f(): expected.columns = pd.MultiIndex.from_tuples([('ra', 'mean'), ( 'ra', 'std'), ('rb', 'mean'), ('rb', 'std')]) - result = t[['A', 'B']].agg({'A': {'ra': ['mean', 'std']}, - 'B': {'rb': ['mean', 'std']}}) + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + result = t[['A', 'B']].agg({'A': {'ra': ['mean', 'std']}, + 'B': {'rb': ['mean', 'std']}}) assert_frame_equal(result, expected, check_like=True) - result = t.agg({'A': {'ra': ['mean', 'std']}, - 'B': {'rb': ['mean', 'std']}}) + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + result = t.agg({'A': {'ra': ['mean', 'std']}, + 'B': {'rb': ['mean', 'std']}}) assert_frame_equal(result, expected, check_like=True) def test_selection_api_validation(self): @@ -625,23 +660,23 @@ def test_selection_api_validation(self): index=index) # non DatetimeIndex - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.resample('2D', level='v') - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.resample('2D', on='date', level='d') - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): df.resample('2D', on=['a', 'date']) - with tm.assertRaises(KeyError): + with pytest.raises(KeyError): df.resample('2D', level=['a', 'date']) # upsampling not allowed - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.resample('2D', level='d').asfreq() - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): df.resample('2D', on='date').asfreq() exp = df_exp.resample('2D').sum() @@ -721,7 +756,7 @@ def test_resample_interpolate(self): def test_raises_on_non_datetimelike_index(self): # this is a non datetimelike index xp = DataFrame() - self.assertRaises(TypeError, lambda: xp.resample('A').mean()) + pytest.raises(TypeError, lambda: xp.resample('A').mean()) def test_resample_empty_series(self): # GH12771 & GH12868 @@ -738,19 +773,8 @@ def test_resample_empty_series(self): expected = s.copy() expected.index = 
s.index._shallow_copy(freq=freq) assert_index_equal(result.index, expected.index) - self.assertEqual(result.index.freq, expected.index.freq) - - if (method == 'size' and - isinstance(result.index, PeriodIndex) and - freq in ['M', 'D']): - # GH12871 - TODO: name should propagate, but currently - # doesn't on lower / same frequency with PeriodIndex - assert_series_equal(result, expected, check_dtype=False, - check_names=False) - # this assert will break when fixed - self.assertTrue(result.name is None) - else: - assert_series_equal(result, expected, check_dtype=False) + assert result.index.freq == expected.index.freq + assert_series_equal(result, expected, check_dtype=False) def test_resample_empty_dataframe(self): # GH13212 @@ -759,15 +783,19 @@ def test_resample_empty_dataframe(self): for freq in ['M', 'D', 'H']: # count retains dimensions too - methods = downsample_methods + ['count'] + methods = downsample_methods + upsample_methods for method in methods: result = getattr(f.resample(freq), method)() + if method != 'size': + expected = f.copy() + else: + # GH14962 + expected = Series([]) - expected = f.copy() expected.index = f.index._shallow_copy(freq=freq) assert_index_equal(result.index, expected.index) - self.assertEqual(result.index.freq, expected.index.freq) - assert_frame_equal(result, expected, check_dtype=False) + assert result.index.freq == expected.index.freq + assert_almost_equal(result, expected, check_dtype=False) # test size for GH13212 (currently stays as df) @@ -817,19 +845,28 @@ def test_resample_loffset_arg_type(self): # GH 13022, 7687 - TODO: fix resample w/ TimedeltaIndex if isinstance(expected.index, TimedeltaIndex): - with tm.assertRaises(AssertionError): + with pytest.raises(AssertionError): assert_frame_equal(result_agg, expected) assert_frame_equal(result_how, expected) else: assert_frame_equal(result_agg, expected) assert_frame_equal(result_how, expected) + def test_apply_to_empty_series(self): + # GH 14313 + series = self.create_series()[:0] + + for freq in ['M', 'D', 'H']: + result = series.resample(freq).apply(lambda x: 1) + expected = series.resample(freq).apply(np.sum) + + assert_series_equal(result, expected, check_dtype=False) + -class TestDatetimeIndex(Base, tm.TestCase): - _multiprocess_can_split_ = True +class TestDatetimeIndex(Base): _index_factory = lambda x: date_range - def setUp(self): + def setup_method(self, method): dti = DatetimeIndex(start=datetime(2005, 1, 1), end=datetime(2005, 1, 10), freq='Min') @@ -863,8 +900,8 @@ def test_custom_grouper(self): for f in funcs: g._cython_agg_general(f) - self.assertEqual(g.ngroups, 2593) - self.assertTrue(notnull(g.mean()).all()) + assert g.ngroups == 2593 + assert notna(g.mean()).all() # construct expected val arr = [1] + [5] * 2592 @@ -880,8 +917,8 @@ def test_custom_grouper(self): index=dti, dtype='float64') r = df.groupby(b).agg(np.sum) - self.assertEqual(len(r.columns), 10) - self.assertEqual(len(r.index), 2593) + assert len(r.columns) == 10 + assert len(r.index) == 2593 def test_resample_basic(self): rng = date_range('1/1/2000 00:00:00', '1/1/2000 00:13:00', freq='min', @@ -893,7 +930,7 @@ def test_resample_basic(self): expected = Series([s[0], s[1:6].mean(), s[6:11].mean(), s[11:].mean()], index=exp_idx) assert_series_equal(result, expected) - self.assertEqual(result.index.name, 'index') + assert result.index.name == 'index' result = s.resample('5min', closed='left', label='right').mean() @@ -921,7 +958,7 @@ def test_resample_how(self): args = downsample_methods def _ohlc(group): - if 
isnull(group).all(): + if isna(group).all(): return np.repeat(np.nan, 4) return [group[0], group.max(), group.min(), group[-1]] @@ -937,7 +974,7 @@ def _ohlc(group): '5min', closed='right', label='right'), arg)() expected = s.groupby(grouplist).agg(func) - self.assertEqual(result.index.name, 'index') + assert result.index.name == 'index' if arg == 'ohlc': expected = DataFrame(expected.values.tolist()) expected.columns = ['open', 'high', 'low', 'close'] @@ -961,11 +998,11 @@ def test_numpy_compat(self): for func in ('min', 'max', 'sum', 'prod', 'mean', 'var', 'std'): - tm.assertRaisesRegexp(UnsupportedFunctionCall, msg, - getattr(r, func), - func, 1, 2, 3) - tm.assertRaisesRegexp(UnsupportedFunctionCall, msg, - getattr(r, func), axis=1) + tm.assert_raises_regex(UnsupportedFunctionCall, msg, + getattr(r, func), + func, 1, 2, 3) + tm.assert_raises_regex(UnsupportedFunctionCall, msg, + getattr(r, func), axis=1) def test_resample_how_callables(self): # GH 7929 @@ -977,6 +1014,7 @@ def fn(x, a=1): return str(type(x)) class fn_class: + def __call__(self, x): return str(type(x)) @@ -1094,52 +1132,51 @@ def test_resample_basic_from_daily(self): # to weekly result = s.resample('w-sun').last() - self.assertEqual(len(result), 3) - self.assertTrue((result.index.dayofweek == [6, 6, 6]).all()) - self.assertEqual(result.iloc[0], s['1/2/2005']) - self.assertEqual(result.iloc[1], s['1/9/2005']) - self.assertEqual(result.iloc[2], s.iloc[-1]) + assert len(result) == 3 + assert (result.index.dayofweek == [6, 6, 6]).all() + assert result.iloc[0] == s['1/2/2005'] + assert result.iloc[1] == s['1/9/2005'] + assert result.iloc[2] == s.iloc[-1] result = s.resample('W-MON').last() - self.assertEqual(len(result), 2) - self.assertTrue((result.index.dayofweek == [0, 0]).all()) - self.assertEqual(result.iloc[0], s['1/3/2005']) - self.assertEqual(result.iloc[1], s['1/10/2005']) + assert len(result) == 2 + assert (result.index.dayofweek == [0, 0]).all() + assert result.iloc[0] == s['1/3/2005'] + assert result.iloc[1] == s['1/10/2005'] result = s.resample('W-TUE').last() - self.assertEqual(len(result), 2) - self.assertTrue((result.index.dayofweek == [1, 1]).all()) - self.assertEqual(result.iloc[0], s['1/4/2005']) - self.assertEqual(result.iloc[1], s['1/10/2005']) + assert len(result) == 2 + assert (result.index.dayofweek == [1, 1]).all() + assert result.iloc[0] == s['1/4/2005'] + assert result.iloc[1] == s['1/10/2005'] result = s.resample('W-WED').last() - self.assertEqual(len(result), 2) - self.assertTrue((result.index.dayofweek == [2, 2]).all()) - self.assertEqual(result.iloc[0], s['1/5/2005']) - self.assertEqual(result.iloc[1], s['1/10/2005']) + assert len(result) == 2 + assert (result.index.dayofweek == [2, 2]).all() + assert result.iloc[0] == s['1/5/2005'] + assert result.iloc[1] == s['1/10/2005'] result = s.resample('W-THU').last() - self.assertEqual(len(result), 2) - self.assertTrue((result.index.dayofweek == [3, 3]).all()) - self.assertEqual(result.iloc[0], s['1/6/2005']) - self.assertEqual(result.iloc[1], s['1/10/2005']) + assert len(result) == 2 + assert (result.index.dayofweek == [3, 3]).all() + assert result.iloc[0] == s['1/6/2005'] + assert result.iloc[1] == s['1/10/2005'] result = s.resample('W-FRI').last() - self.assertEqual(len(result), 2) - self.assertTrue((result.index.dayofweek == [4, 4]).all()) - self.assertEqual(result.iloc[0], s['1/7/2005']) - self.assertEqual(result.iloc[1], s['1/10/2005']) + assert len(result) == 2 + assert (result.index.dayofweek == [4, 4]).all() + assert result.iloc[0] == 
s['1/7/2005'] + assert result.iloc[1] == s['1/10/2005'] # to biz day result = s.resample('B').last() - self.assertEqual(len(result), 7) - self.assertTrue((result.index.dayofweek == [ - 4, 0, 1, 2, 3, 4, 0 - ]).all()) - self.assertEqual(result.iloc[0], s['1/2/2005']) - self.assertEqual(result.iloc[1], s['1/3/2005']) - self.assertEqual(result.iloc[5], s['1/9/2005']) - self.assertEqual(result.index.name, 'index') + assert len(result) == 7 + assert (result.index.dayofweek == [4, 0, 1, 2, 3, 4, 0]).all() + + assert result.iloc[0] == s['1/2/2005'] + assert result.iloc[1] == s['1/3/2005'] + assert result.iloc[5] == s['1/9/2005'] + assert result.index.name == 'index' def test_resample_upsampling_picked_but_not_correct(self): @@ -1148,7 +1185,7 @@ def test_resample_upsampling_picked_but_not_correct(self): series = Series(1, index=dates) result = series.resample('D').mean() - self.assertEqual(result.index[0], dates[0]) + assert result.index[0] == dates[0] # GH 5955 # incorrect deciding to upsample when the axis frequency matches the @@ -1209,7 +1246,7 @@ def test_resample_loffset(self): loffset=Minute(1)).mean() assert_series_equal(result, expected) - self.assertEqual(result.index.freq, Minute(5)) + assert result.index.freq == Minute(5) # from daily dti = DatetimeIndex(start=datetime(2005, 1, 1), @@ -1219,7 +1256,7 @@ def test_resample_loffset(self): # to weekly result = ser.resample('w-sun').last() expected = ser.resample('w-sun', loffset=-bday).last() - self.assertEqual(result.index[0] - bday, expected.index[0]) + assert result.index[0] - bday == expected.index[0] def test_resample_loffset_count(self): # GH 12725 @@ -1252,11 +1289,11 @@ def test_resample_upsample(self): # to minutely, by padding result = s.resample('Min').pad() - self.assertEqual(len(result), 12961) - self.assertEqual(result[0], s[0]) - self.assertEqual(result[-1], s[-1]) + assert len(result) == 12961 + assert result[0] == s[0] + assert result[-1] == s[-1] - self.assertEqual(result.index.name, 'index') + assert result.index.name == 'index' def test_resample_how_method(self): # GH9915 @@ -1299,20 +1336,20 @@ def test_resample_ohlc(self): expect = s.groupby(grouper).agg(lambda x: x[-1]) result = s.resample('5Min').ohlc() - self.assertEqual(len(result), len(expect)) - self.assertEqual(len(result.columns), 4) + assert len(result) == len(expect) + assert len(result.columns) == 4 xs = result.iloc[-2] - self.assertEqual(xs['open'], s[-6]) - self.assertEqual(xs['high'], s[-6:-1].max()) - self.assertEqual(xs['low'], s[-6:-1].min()) - self.assertEqual(xs['close'], s[-2]) + assert xs['open'] == s[-6] + assert xs['high'] == s[-6:-1].max() + assert xs['low'] == s[-6:-1].min() + assert xs['close'] == s[-2] xs = result.iloc[0] - self.assertEqual(xs['open'], s[0]) - self.assertEqual(xs['high'], s[:5].max()) - self.assertEqual(xs['low'], s[:5].min()) - self.assertEqual(xs['close'], s[4]) + assert xs['open'] == s[0] + assert xs['high'] == s[:5].max() + assert xs['low'] == s[:5].min() + assert xs['close'] == s[4] def test_resample_ohlc_result(self): @@ -1322,10 +1359,10 @@ def test_resample_ohlc_result(self): s = Series(range(len(index)), index=index) a = s.loc[:'4-15-2000'].resample('30T').ohlc() - self.assertIsInstance(a, DataFrame) + assert isinstance(a, DataFrame) b = s.loc[:'4-14-2000'].resample('30T').ohlc() - self.assertIsInstance(b, DataFrame) + assert isinstance(b, DataFrame) # GH12348 # raising on odd period @@ -1389,9 +1426,9 @@ def test_resample_reresample(self): s = Series(np.random.rand(len(dti)), dti) bs = s.resample('B', 
closed='right', label='right').mean() result = bs.resample('8H').mean() - self.assertEqual(len(result), 22) - tm.assertIsInstance(result.index.freq, offsets.DateOffset) - self.assertEqual(result.index.freq, offsets.Hour(8)) + assert len(result) == 22 + assert isinstance(result.index.freq, offsets.DateOffset) + assert result.index.freq == offsets.Hour(8) def test_resample_timestamp_to_period(self): ts = _simple_ts('1/1/1990', '1/1/2000') @@ -1418,7 +1455,7 @@ def test_resample_timestamp_to_period(self): def test_ohlc_5min(self): def _ohlc(group): - if isnull(group).all(): + if isna(group).all(): return np.repeat(np.nan, 4) return [group[0], group.max(), group.min(), group[-1]] @@ -1428,13 +1465,13 @@ def _ohlc(group): resampled = ts.resample('5min', closed='right', label='right').ohlc() - self.assertTrue((resampled.loc['1/1/2000 00:00'] == ts[0]).all()) + assert (resampled.loc['1/1/2000 00:00'] == ts[0]).all() exp = _ohlc(ts[1:31]) - self.assertTrue((resampled.loc['1/1/2000 00:05'] == exp).all()) + assert (resampled.loc['1/1/2000 00:05'] == exp).all() exp = _ohlc(ts['1/1/2000 5:55:01':]) - self.assertTrue((resampled.loc['1/1/2000 6:00:00'] == exp).all()) + assert (resampled.loc['1/1/2000 6:00:00'] == exp).all() def test_downsample_non_unique(self): rng = date_range('1/1/2000', '2/29/2000') @@ -1444,7 +1481,7 @@ def test_downsample_non_unique(self): result = ts.resample('M').mean() expected = ts.groupby(lambda x: x.month).mean() - self.assertEqual(len(result), 2) + assert len(result) == 2 assert_almost_equal(result[0], expected[1]) assert_almost_equal(result[1], expected[2]) @@ -1454,7 +1491,7 @@ def test_asfreq_non_unique(self): rng2 = rng.repeat(2).values ts = Series(np.random.randn(len(rng2)), index=rng2) - self.assertRaises(Exception, ts.asfreq, 'B') + pytest.raises(Exception, ts.asfreq, 'B') def test_resample_axis1(self): rng = date_range('1/1/2000', '2/29/2000') @@ -1469,44 +1506,47 @@ def test_resample_panel(self): rng = date_range('1/1/2000', '6/30/2000') n = len(rng) - panel = Panel(np.random.randn(3, n, 5), - items=['one', 'two', 'three'], - major_axis=rng, - minor_axis=['a', 'b', 'c', 'd', 'e']) + with catch_warnings(record=True): + panel = Panel(np.random.randn(3, n, 5), + items=['one', 'two', 'three'], + major_axis=rng, + minor_axis=['a', 'b', 'c', 'd', 'e']) - result = panel.resample('M', axis=1).mean() + result = panel.resample('M', axis=1).mean() - def p_apply(panel, f): - result = {} - for item in panel.items: - result[item] = f(panel[item]) - return Panel(result, items=panel.items) + def p_apply(panel, f): + result = {} + for item in panel.items: + result[item] = f(panel[item]) + return Panel(result, items=panel.items) - expected = p_apply(panel, lambda x: x.resample('M').mean()) - tm.assert_panel_equal(result, expected) + expected = p_apply(panel, lambda x: x.resample('M').mean()) + tm.assert_panel_equal(result, expected) - panel2 = panel.swapaxes(1, 2) - result = panel2.resample('M', axis=2).mean() - expected = p_apply(panel2, lambda x: x.resample('M', axis=1).mean()) - tm.assert_panel_equal(result, expected) + panel2 = panel.swapaxes(1, 2) + result = panel2.resample('M', axis=2).mean() + expected = p_apply(panel2, + lambda x: x.resample('M', axis=1).mean()) + tm.assert_panel_equal(result, expected) def test_resample_panel_numpy(self): rng = date_range('1/1/2000', '6/30/2000') n = len(rng) - panel = Panel(np.random.randn(3, n, 5), - items=['one', 'two', 'three'], - major_axis=rng, - minor_axis=['a', 'b', 'c', 'd', 'e']) + with catch_warnings(record=True): + panel = 
Panel(np.random.randn(3, n, 5), + items=['one', 'two', 'three'], + major_axis=rng, + minor_axis=['a', 'b', 'c', 'd', 'e']) - result = panel.resample('M', axis=1).apply(lambda x: x.mean(1)) - expected = panel.resample('M', axis=1).mean() - tm.assert_panel_equal(result, expected) + result = panel.resample('M', axis=1).apply(lambda x: x.mean(1)) + expected = panel.resample('M', axis=1).mean() + tm.assert_panel_equal(result, expected) - panel = panel.swapaxes(1, 2) - result = panel.resample('M', axis=2).apply(lambda x: x.mean(2)) - expected = panel.resample('M', axis=2).mean() - tm.assert_panel_equal(result, expected) + panel = panel.swapaxes(1, 2) + result = panel.resample('M', axis=2).apply(lambda x: x.mean(2)) + expected = panel.resample('M', axis=2).mean() + tm.assert_panel_equal(result, expected) def test_resample_anchored_ticks(self): # If a fixed delta (5 minute, 4 hour) evenly divides a day, we should @@ -1551,7 +1591,7 @@ def test_resample_base(self): resampled = ts.resample('5min', base=2).mean() exp_rng = date_range('12/31/1999 23:57:00', '1/1/2000 01:57', freq='5min') - self.assert_index_equal(resampled.index, exp_rng) + tm.assert_index_equal(resampled.index, exp_rng) def test_resample_base_with_timedeltaindex(self): @@ -1565,8 +1605,8 @@ def test_resample_base_with_timedeltaindex(self): exp_without_base = timedelta_range(start='0s', end='25s', freq='2s') exp_with_base = timedelta_range(start='5s', end='29s', freq='2s') - self.assert_index_equal(without_base.index, exp_without_base) - self.assert_index_equal(with_base.index, exp_with_base) + tm.assert_index_equal(without_base.index, exp_without_base) + tm.assert_index_equal(with_base.index, exp_with_base) def test_resample_categorical_data_with_timedeltaindex(self): # GH #12169 @@ -1597,7 +1637,7 @@ def test_resample_to_period_monthly_buglet(self): result = ts.resample('M', kind='period').mean() exp_index = period_range('Jan-2000', 'Dec-2000', freq='M') - self.assert_index_equal(result.index, exp_index) + tm.assert_index_equal(result.index, exp_index) def test_period_with_agg(self): @@ -1641,10 +1681,32 @@ def test_resample_dtype_preservation(self): ).set_index('date') result = df.resample('1D').ffill() - self.assertEqual(result.val.dtype, np.int32) + assert result.val.dtype == np.int32 result = df.groupby('group').resample('1D').ffill() - self.assertEqual(result.val.dtype, np.int32) + assert result.val.dtype == np.int32 + + def test_resample_dtype_coerceion(self): + + pytest.importorskip('scipy.interpolate') + + # GH 16361 + df = {"a": [1, 3, 1, 4]} + df = pd.DataFrame( + df, index=pd.date_range("2017-01-01", "2017-01-04")) + + expected = (df.astype("float64") + .resample("H") + .mean() + ["a"] + .interpolate("cubic") + ) + + result = df.resample("H")["a"].mean().interpolate("cubic") + tm.assert_series_equal(result, expected) + + result = df.resample("H").mean()["a"].interpolate("cubic") + tm.assert_series_equal(result, expected) def test_weekly_resample_buglet(self): # #1327 @@ -1718,7 +1780,7 @@ def test_resample_anchored_intraday(self): ts = _simple_ts('2012-04-29 23:00', '2012-04-30 5:00', freq='h') resampled = ts.resample('M').mean() - self.assertEqual(len(resampled), 1) + assert len(resampled) == 1 def test_resample_anchored_monthstart(self): ts = _simple_ts('1/1/2000', '12/31/2002') @@ -1744,13 +1806,11 @@ def test_resample_anchored_multiday(self): # Ensure left closing works result = s.resample('2200L').mean() - self.assertEqual(result.index[-1], - pd.Timestamp('2014-10-15 23:00:02.000')) + assert result.index[-1] == 
pd.Timestamp('2014-10-15 23:00:02.000') # Ensure right closing works result = s.resample('2200L', label='right').mean() - self.assertEqual(result.index[-1], - pd.Timestamp('2014-10-15 23:00:04.200')) + assert result.index[-1] == pd.Timestamp('2014-10-15 23:00:04.200') def test_corner_cases(self): # miscellaneous test coverage @@ -1760,18 +1820,18 @@ def test_corner_cases(self): result = ts.resample('5t', closed='right', label='left').mean() ex_index = date_range('1999-12-31 23:55', periods=4, freq='5t') - self.assert_index_equal(result.index, ex_index) + tm.assert_index_equal(result.index, ex_index) len0pts = _simple_pts('2007-01', '2010-05', freq='M')[:0] # it works result = len0pts.resample('A-DEC').mean() - self.assertEqual(len(result), 0) + assert len(result) == 0 # resample to periods ts = _simple_ts('2000-04-28', '2000-04-30 11:00', freq='h') result = ts.resample('M', kind='period').mean() - self.assertEqual(len(result), 1) - self.assertEqual(result.index[0], Period('2000-04', freq='M')) + assert len(result) == 1 + assert result.index[0] == Period('2000-04', freq='M') def test_anchored_lowercase_buglet(self): dates = date_range('4/16/2012 20:00', periods=50000, freq='s') @@ -1786,7 +1846,7 @@ def test_upsample_apply_functions(self): ts = Series(np.random.randn(len(rng)), index=rng) result = ts.resample('20min').aggregate(['mean', 'sum']) - tm.assertIsInstance(result, DataFrame) + assert isinstance(result, DataFrame) def test_resample_not_monotonic(self): rng = pd.date_range('2012-06-12', periods=200, freq='h') @@ -1832,10 +1892,12 @@ def test_how_lambda_functions(self): tm.assert_series_equal(result['foo'], foo_exp) tm.assert_series_equal(result['bar'], bar_exp) + # this is a MI Series, so comparing the names of the results + # doesn't make sense result = ts.resample('M').aggregate({'foo': lambda x: x.mean(), 'bar': lambda x: x.std(ddof=1)}) - tm.assert_series_equal(result['foo'], foo_exp) - tm.assert_series_equal(result['bar'], bar_exp) + tm.assert_series_equal(result['foo'], foo_exp, check_names=False) + tm.assert_series_equal(result['bar'], bar_exp, check_names=False) def test_resample_unequal_times(self): # #1772 @@ -1915,7 +1977,7 @@ def test_resample_nunique(self): g = df.groupby(pd.Grouper(freq='D')) expected = df.groupby(pd.TimeGrouper('D')).ID.apply(lambda x: x.nunique()) - self.assertEqual(expected.name, 'ID') + assert expected.name == 'ID' for t in [r, g]: result = r.ID.nunique() @@ -1927,6 +1989,26 @@ def test_resample_nunique(self): result = df.ID.groupby(pd.Grouper(freq='D')).nunique() assert_series_equal(result, expected) + def test_resample_nunique_with_date_gap(self): + # GH 13453 + index = pd.date_range('1-1-2000', '2-15-2000', freq='h') + index2 = pd.date_range('4-15-2000', '5-15-2000', freq='h') + index3 = index.append(index2) + s = pd.Series(range(len(index3)), index=index3, dtype='int64') + r = s.resample('M') + + # Since all elements are unique, these should all be the same + results = [ + r.count(), + r.nunique(), + r.agg(pd.Series.nunique), + r.agg('nunique') + ] + + assert_series_equal(results[0], results[1]) + assert_series_equal(results[0], results[2]) + assert_series_equal(results[0], results[3]) + def test_resample_group_info(self): # GH10914 for n, k in product((10000, 100000), (10, 100, 1000)): dr = date_range(start='2015-08-27', periods=n // 10, freq='T') @@ -2121,8 +2203,7 @@ def test_resample_datetime_values(self): tm.assert_series_equal(res, exp) -class TestPeriodIndex(Base, tm.TestCase): - _multiprocess_can_split_ = True +class 
TestPeriodIndex(Base): _index_factory = lambda x: period_range def create_series(self): @@ -2206,10 +2287,10 @@ def test_selection(self): np.arange(len(index), dtype=np.int64), index], names=['v', 'd'])) - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): df.resample('2D', on='date') - with tm.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): df.resample('2D', level='d') def test_annual_upsample_D_s_f(self): @@ -2272,10 +2353,10 @@ def test_basic_downsample(self): def test_not_subperiod(self): # These are incompatible period rules for resampling ts = _simple_pts('1/1/1990', '6/30/1995', freq='w-wed') - self.assertRaises(ValueError, lambda: ts.resample('a-dec').mean()) - self.assertRaises(ValueError, lambda: ts.resample('q-mar').mean()) - self.assertRaises(ValueError, lambda: ts.resample('M').mean()) - self.assertRaises(ValueError, lambda: ts.resample('w-thu').mean()) + pytest.raises(ValueError, lambda: ts.resample('a-dec').mean()) + pytest.raises(ValueError, lambda: ts.resample('q-mar').mean()) + pytest.raises(ValueError, lambda: ts.resample('M').mean()) + pytest.raises(ValueError, lambda: ts.resample('w-thu').mean()) def test_basic_upsample(self): ts = _simple_pts('1/1/1990', '6/30/1995', freq='M') @@ -2376,15 +2457,12 @@ def test_resample_same_freq(self): def test_resample_incompat_freq(self): - with self.assertRaises(IncompatibleFrequency): + with pytest.raises(IncompatibleFrequency): pd.Series(range(3), index=pd.period_range( start='2000', periods=3, freq='M')).resample('W').mean() def test_with_local_timezone_pytz(self): - # GH5430 - tm._skip_if_no_pytz() - import pytz - + # see gh-5430 local_timezone = pytz.timezone('America/Los_Angeles') start = datetime(year=2013, month=11, day=1, hour=0, minute=0, @@ -2407,10 +2485,7 @@ def test_with_local_timezone_pytz(self): assert_series_equal(result, expected) def test_with_local_timezone_dateutil(self): - # GH5430 - tm._skip_if_no_dateutil() - import dateutil - + # see gh-5430 local_timezone = 'dateutil/America/Los_Angeles' start = datetime(year=2013, month=11, day=1, hour=0, minute=0, @@ -2502,7 +2577,7 @@ def test_resample_fill_missing(self): def test_cant_fill_missing_dups(self): rng = PeriodIndex([2000, 2005, 2005, 2007, 2007], freq='A') s = Series(np.random.randn(5), index=rng) - self.assertRaises(Exception, lambda: s.resample('A').ffill()) + pytest.raises(Exception, lambda: s.resample('A').ffill()) def test_resample_5minute(self): rng = period_range('1/1/2000', '1/5/2000', freq='T') @@ -2541,7 +2616,7 @@ def test_resample_weekly_all_na(self): result = ts.resample('W-THU').asfreq() - self.assertTrue(result.isnull().all()) + assert result.isna().all() result = ts.resample('W-THU').asfreq().ffill()[:-1] expected = ts.asfreq('W-THU').ffill() @@ -2619,7 +2694,7 @@ def test_closed_left_corner(self): ex_index = date_range(start='1/1/2012 9:30', freq='10min', periods=3) - self.assert_index_equal(result.index, ex_index) + tm.assert_index_equal(result.index, ex_index) assert_series_equal(result, exp) def test_quarterly_resampling(self): @@ -2646,8 +2721,8 @@ def test_resample_bms_2752(self): foo = pd.Series(index=pd.bdate_range('20000101', '20000201')) res1 = foo.resample("BMS").mean() res2 = foo.resample("BMS").mean().resample("B").mean() - self.assertEqual(res1.index[0], Timestamp('20000103')) - self.assertEqual(res1.index[0], res2.index[0]) + assert res1.index[0] == Timestamp('20000103') + assert res1.index[0] == res2.index[0] # def test_monthly_convention_span(self): # rng = 
period_range('2000-01', periods=3, freq='M') @@ -2729,9 +2804,16 @@ def test_evenly_divisible_with_no_extra_bins(self): result = df.resample('7D').sum() assert_frame_equal(result, expected) + def test_apply_to_empty_series(self): + # GH 14313 + series = self.create_series()[:0] + + for freq in ['M', 'D', 'H']: + with pytest.raises(TypeError): + series.resample(freq).apply(lambda x: 1) -class TestTimedeltaIndex(Base, tm.TestCase): - _multiprocess_can_split_ = True + +class TestTimedeltaIndex(Base): _index_factory = lambda x: timedelta_range def create_series(self): @@ -2752,8 +2834,9 @@ def test_asfreq_bug(self): assert_frame_equal(result, expected) -class TestResamplerGrouper(tm.TestCase): - def setUp(self): +class TestResamplerGrouper(object): + + def setup_method(self, method): self.frame = DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8, 'B': np.arange(40)}, index=date_range('1/1/2000', @@ -2777,6 +2860,19 @@ def test_back_compat_v180(self): expected = df.groupby('A').resample('4s').mean().ffill() assert_frame_equal(result, expected) + def test_tab_complete_ipython6_warning(self, ip): + from IPython.core.completer import provisionalcompleter + code = dedent("""\ + import pandas.util.testing as tm + s = tm.makeTimeSeries() + rs = s.resample("D") + """) + ip.run_code(code) + + with tm.assert_produces_warning(None): + with provisionalcompleter('ignore'): + list(ip.Completer.completions('rs.', 1)) + def test_deferred_with_groupby(self): # GH 12486 @@ -2924,11 +3020,11 @@ def test_consistency_with_window(self): df = self.frame expected = pd.Int64Index([1, 2, 3], name='A') result = df.groupby('A').resample('2s').mean() - self.assertEqual(result.index.nlevels, 2) + assert result.index.nlevels == 2 tm.assert_index_equal(result.index.levels[0], expected) result = df.groupby('A').rolling(20).mean() - self.assertEqual(result.index.nlevels, 2) + assert result.index.nlevels == 2 tm.assert_index_equal(result.index.levels[0], expected) def test_median_duplicate_columns(self): @@ -2946,8 +3042,9 @@ def test_median_duplicate_columns(self): assert_frame_equal(result, expected) -class TestTimeGrouper(tm.TestCase): - def setUp(self): +class TestTimeGrouper(object): + + def setup_method(self, method): self.ts = Series(np.random.randn(1000), index=date_range('1/1/2000', periods=1000)) @@ -3002,25 +3099,27 @@ def test_apply_iteration(self): # it works! 
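The test-class hunks above all apply the same nose-to-pytest migration: the `tm.TestCase` base class and the `_multiprocess_can_split_` flag are dropped, `setUp` becomes pytest's `setup_method` hook, and the `self.assertEqual`/`self.assertTrue` helpers become plain `assert` statements. A minimal, self-contained sketch of the pattern (the class name and test body here are illustrative, not code from this patch):

```python
import numpy as np
from pandas import Series, date_range


class TestSetupStyle(object):            # was: class ...(tm.TestCase)

    def setup_method(self, method):      # was: def setUp(self)
        # pytest runs this hook before each test; `method` is the
        # test function about to be executed
        self.ts = Series(np.random.randn(1000),
                         index=date_range('1/1/2000', periods=1000))

    def test_length(self):
        # a plain assert replaces the unittest-style self.assertEqual
        assert len(self.ts) == 1000
```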
result = grouped.apply(f) - self.assert_index_equal(result.index, df.index) + tm.assert_index_equal(result.index, df.index) def test_panel_aggregation(self): ind = pd.date_range('1/1/2000', periods=100) data = np.random.randn(2, len(ind), 4) - wp = pd.Panel(data, items=['Item1', 'Item2'], major_axis=ind, - minor_axis=['A', 'B', 'C', 'D']) - tg = TimeGrouper('M', axis=1) - _, grouper, _ = tg._get_grouper(wp) - bingrouped = wp.groupby(grouper) - binagg = bingrouped.mean() + with catch_warnings(record=True): + wp = Panel(data, items=['Item1', 'Item2'], major_axis=ind, + minor_axis=['A', 'B', 'C', 'D']) - def f(x): - assert (isinstance(x, Panel)) - return x.mean(1) + tg = TimeGrouper('M', axis=1) + _, grouper, _ = tg._get_grouper(wp) + bingrouped = wp.groupby(grouper) + binagg = bingrouped.mean() - result = bingrouped.agg(f) - tm.assert_panel_equal(result, binagg) + def f(x): + assert (isinstance(x, Panel)) + return x.mean(1) + + result = bingrouped.agg(f) + tm.assert_panel_equal(result, binagg) def test_fails_on_no_datetime_index(self): index_names = ('Int64Index', 'Index', 'Float64Index', 'MultiIndex') @@ -3031,17 +3130,18 @@ def test_fails_on_no_datetime_index(self): for name, func in zip(index_names, index_funcs): index = func(n) df = DataFrame({'a': np.random.randn(n)}, index=index) - with tm.assertRaisesRegexp(TypeError, - "Only valid with DatetimeIndex, " - "TimedeltaIndex or PeriodIndex, " - "but got an instance of %r" % name): + with tm.assert_raises_regex(TypeError, + "Only valid with " + "DatetimeIndex, TimedeltaIndex " + "or PeriodIndex, but got an " + "instance of %r" % name): df.groupby(TimeGrouper('D')) # PeriodIndex gives a specific error message df = DataFrame({'a': np.random.randn(n)}, index=tm.makePeriodIndex(n)) - with tm.assertRaisesRegexp(TypeError, - "axis must be a DatetimeIndex, but " - "got an instance of 'PeriodIndex'"): + with tm.assert_raises_regex(TypeError, + "axis must be a DatetimeIndex, but " + "got an instance of 'PeriodIndex'"): df.groupby(TimeGrouper('D')) def test_aaa_group_order(self): @@ -3170,12 +3270,7 @@ def test_aggregate_with_nat(self): dt_result = getattr(dt_grouped, func)() assert_series_equal(expected, dt_result) # GH 9925 - self.assertEqual(dt_result.index.name, 'key') + assert dt_result.index.name == 'key' # if NaT is included, 'var', 'std', 'mean', 'first','last' # and 'nth' doesn't work yet - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/test_sorting.py b/pandas/tests/test_sorting.py new file mode 100644 index 0000000000000..e58042961129d --- /dev/null +++ b/pandas/tests/test_sorting.py @@ -0,0 +1,438 @@ +import pytest +from itertools import product +from collections import defaultdict +import warnings +from datetime import datetime + +import numpy as np +from numpy import nan +import pandas as pd +from pandas.core import common as com +from pandas import DataFrame, MultiIndex, merge, concat, Series, compat +from pandas.util import testing as tm +from pandas.util.testing import assert_frame_equal, assert_series_equal +from pandas.core.sorting import (is_int64_overflow_possible, + decons_group_index, + get_group_index, + nargsort, + lexsort_indexer, + safe_sort) + + +class TestSorting(object): + + @pytest.mark.slow + def test_int64_overflow(self): + + B = np.concatenate((np.arange(1000), np.arange(1000), np.arange(500))) + A = np.arange(2500) + df = DataFrame({'A': A, + 'B': B, + 'C': A, + 'D': B, + 'E': A, + 'F': B, + 'G': A, + 'H': B, + 
'values': np.random.randn(2500)}) + + lg = df.groupby(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']) + rg = df.groupby(['H', 'G', 'F', 'E', 'D', 'C', 'B', 'A']) + + left = lg.sum()['values'] + right = rg.sum()['values'] + + exp_index, _ = left.index.sortlevel() + tm.assert_index_equal(left.index, exp_index) + + exp_index, _ = right.index.sortlevel(0) + tm.assert_index_equal(right.index, exp_index) + + tups = list(map(tuple, df[['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H' + ]].values)) + tups = com._asarray_tuplesafe(tups) + + expected = df.groupby(tups).sum()['values'] + + for k, v in compat.iteritems(expected): + assert left[k] == right[k[::-1]] + assert left[k] == v + assert len(left) == len(right) + + def test_int64_overflow_moar(self): + + # GH9096 + values = range(55109) + data = pd.DataFrame.from_dict({'a': values, + 'b': values, + 'c': values, + 'd': values}) + grouped = data.groupby(['a', 'b', 'c', 'd']) + assert len(grouped) == len(values) + + arr = np.random.randint(-1 << 12, 1 << 12, (1 << 15, 5)) + i = np.random.choice(len(arr), len(arr) * 4) + arr = np.vstack((arr, arr[i])) # add some duplicate rows + + i = np.random.permutation(len(arr)) + arr = arr[i] # shuffle rows + + df = DataFrame(arr, columns=list('abcde')) + df['jim'], df['joe'] = np.random.randn(2, len(df)) * 10 + gr = df.groupby(list('abcde')) + + # verify this is testing what it is supposed to test! + assert is_int64_overflow_possible(gr.grouper.shape) + + # manually compute groupings + jim, joe = defaultdict(list), defaultdict(list) + for key, a, b in zip(map(tuple, arr), df['jim'], df['joe']): + jim[key].append(a) + joe[key].append(b) + + assert len(gr) == len(jim) + mi = MultiIndex.from_tuples(jim.keys(), names=list('abcde')) + + def aggr(func): + f = lambda a: np.fromiter(map(func, a), dtype='f8') + arr = np.vstack((f(jim.values()), f(joe.values()))).T + res = DataFrame(arr, columns=['jim', 'joe'], index=mi) + return res.sort_index() + + assert_frame_equal(gr.mean(), aggr(np.mean)) + assert_frame_equal(gr.median(), aggr(np.median)) + + def test_lexsort_indexer(self): + keys = [[nan] * 5 + list(range(100)) + [nan] * 5] + # orders=True, na_position='last' + result = lexsort_indexer(keys, orders=True, na_position='last') + exp = list(range(5, 105)) + list(range(5)) + list(range(105, 110)) + tm.assert_numpy_array_equal(result, np.array(exp, dtype=np.intp)) + + # orders=True, na_position='first' + result = lexsort_indexer(keys, orders=True, na_position='first') + exp = list(range(5)) + list(range(105, 110)) + list(range(5, 105)) + tm.assert_numpy_array_equal(result, np.array(exp, dtype=np.intp)) + + # orders=False, na_position='last' + result = lexsort_indexer(keys, orders=False, na_position='last') + exp = list(range(104, 4, -1)) + list(range(5)) + list(range(105, 110)) + tm.assert_numpy_array_equal(result, np.array(exp, dtype=np.intp)) + + # orders=False, na_position='first' + result = lexsort_indexer(keys, orders=False, na_position='first') + exp = list(range(5)) + list(range(105, 110)) + list(range(104, 4, -1)) + tm.assert_numpy_array_equal(result, np.array(exp, dtype=np.intp)) + + def test_nargsort(self): + # np.argsort(items) places NaNs last + items = [nan] * 5 + list(range(100)) + [nan] * 5 + # np.argsort(items2) may not place NaNs first + items2 = np.array(items, dtype='O') + + try: + # GH 2785; due to a regression in NumPy 1.6.2 + np.argsort(np.array([[1, 2], [1, 3], [1, 2]], dtype='i')) + np.argsort(items2, kind='mergesort') + except TypeError: + pytest.skip('requested sort not available for type') + + # mergesort 
is the most difficult to get right because we want it to be + # stable. + + # According to numpy/core/tests/test_multiarray, """The number of + # sorted items must be greater than ~50 to check the actual algorithm + # because quick and merge sort fall over to insertion sort for small + # arrays.""" + + # mergesort, ascending=True, na_position='last' + result = nargsort(items, kind='mergesort', ascending=True, + na_position='last') + exp = list(range(5, 105)) + list(range(5)) + list(range(105, 110)) + tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) + + # mergesort, ascending=True, na_position='first' + result = nargsort(items, kind='mergesort', ascending=True, + na_position='first') + exp = list(range(5)) + list(range(105, 110)) + list(range(5, 105)) + tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) + + # mergesort, ascending=False, na_position='last' + result = nargsort(items, kind='mergesort', ascending=False, + na_position='last') + exp = list(range(104, 4, -1)) + list(range(5)) + list(range(105, 110)) + tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) + + # mergesort, ascending=False, na_position='first' + result = nargsort(items, kind='mergesort', ascending=False, + na_position='first') + exp = list(range(5)) + list(range(105, 110)) + list(range(104, 4, -1)) + tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) + + # mergesort, ascending=True, na_position='last' + result = nargsort(items2, kind='mergesort', ascending=True, + na_position='last') + exp = list(range(5, 105)) + list(range(5)) + list(range(105, 110)) + tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) + + # mergesort, ascending=True, na_position='first' + result = nargsort(items2, kind='mergesort', ascending=True, + na_position='first') + exp = list(range(5)) + list(range(105, 110)) + list(range(5, 105)) + tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) + + # mergesort, ascending=False, na_position='last' + result = nargsort(items2, kind='mergesort', ascending=False, + na_position='last') + exp = list(range(104, 4, -1)) + list(range(5)) + list(range(105, 110)) + tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) + + # mergesort, ascending=False, na_position='first' + result = nargsort(items2, kind='mergesort', ascending=False, + na_position='first') + exp = list(range(5)) + list(range(105, 110)) + list(range(104, 4, -1)) + tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False) + + +class TestMerge(object): + + @pytest.mark.slow + def test_int64_overflow_issues(self): + + # #2690, combinatorial explosion + df1 = DataFrame(np.random.randn(1000, 7), + columns=list('ABCDEF') + ['G1']) + df2 = DataFrame(np.random.randn(1000, 7), + columns=list('ABCDEF') + ['G2']) + + # it works! 
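The `nargsort` battery above pins `kind='mergesort'` because stability is the property being exercised, and, as the comment borrowed from NumPy notes, only arrays longer than roughly 50 elements actually reach the merge-sort path. A tiny illustration of what stability means for equal keys (the array values are arbitrary):

```python
import numpy as np

arr = np.array([2, 1, 2, 1])

# A stable sort keeps equal keys in their original relative order;
# mergesort is NumPy's stable kind.
stable = np.argsort(arr, kind='mergesort')

# the two 1s come from positions 1 and 3, in that order,
# followed by the two 2s from positions 0 and 2
assert stable.tolist() == [1, 3, 0, 2]
```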
+ result = merge(df1, df2, how='outer') + assert len(result) == 2000 + + low, high, n = -1 << 10, 1 << 10, 1 << 20 + left = DataFrame(np.random.randint(low, high, (n, 7)), + columns=list('ABCDEFG')) + left['left'] = left.sum(axis=1) + + # one-2-one match + i = np.random.permutation(len(left)) + right = left.iloc[i].copy() + right.columns = right.columns[:-1].tolist() + ['right'] + right.index = np.arange(len(right)) + right['right'] *= -1 + + out = merge(left, right, how='outer') + assert len(out) == len(left) + assert_series_equal(out['left'], - out['right'], check_names=False) + result = out.iloc[:, :-2].sum(axis=1) + assert_series_equal(out['left'], result, check_names=False) + assert result.name is None + + out.sort_values(out.columns.tolist(), inplace=True) + out.index = np.arange(len(out)) + for how in ['left', 'right', 'outer', 'inner']: + assert_frame_equal(out, merge(left, right, how=how, sort=True)) + + # check that left merge w/ sort=False maintains left frame order + out = merge(left, right, how='left', sort=False) + assert_frame_equal(left, out[left.columns.tolist()]) + + out = merge(right, left, how='left', sort=False) + assert_frame_equal(right, out[right.columns.tolist()]) + + # one-2-many/none match + n = 1 << 11 + left = DataFrame(np.random.randint(low, high, (n, 7)).astype('int64'), + columns=list('ABCDEFG')) + + # confirm that this is checking what it is supposed to check + shape = left.apply(Series.nunique).values + assert is_int64_overflow_possible(shape) + + # add duplicates to left frame + left = concat([left, left], ignore_index=True) + + right = DataFrame(np.random.randint(low, high, (n // 2, 7)) + .astype('int64'), + columns=list('ABCDEFG')) + + # add duplicates & overlap with left to the right frame + i = np.random.choice(len(left), n) + right = concat([right, right, left.iloc[i]], ignore_index=True) + + left['left'] = np.random.randn(len(left)) + right['right'] = np.random.randn(len(right)) + + # shuffle left & right frames + i = np.random.permutation(len(left)) + left = left.iloc[i].copy() + left.index = np.arange(len(left)) + + i = np.random.permutation(len(right)) + right = right.iloc[i].copy() + right.index = np.arange(len(right)) + + # manually compute outer merge + ldict, rdict = defaultdict(list), defaultdict(list) + + for idx, row in left.set_index(list('ABCDEFG')).iterrows(): + ldict[idx].append(row['left']) + + for idx, row in right.set_index(list('ABCDEFG')).iterrows(): + rdict[idx].append(row['right']) + + vals = [] + for k, lval in ldict.items(): + rval = rdict.get(k, [np.nan]) + for lv, rv in product(lval, rval): + vals.append(k + tuple([lv, rv])) + + for k, rval in rdict.items(): + if k not in ldict: + for rv in rval: + vals.append(k + tuple([np.nan, rv])) + + def align(df): + df = df.sort_values(df.columns.tolist()) + df.index = np.arange(len(df)) + return df + + def verify_order(df): + kcols = list('ABCDEFG') + assert_frame_equal(df[kcols].copy(), + df[kcols].sort_values(kcols, kind='mergesort')) + + out = DataFrame(vals, columns=list('ABCDEFG') + ['left', 'right']) + out = align(out) + + jmask = {'left': out['left'].notna(), + 'right': out['right'].notna(), + 'inner': out['left'].notna() & out['right'].notna(), + 'outer': np.ones(len(out), dtype='bool')} + + for how in 'left', 'right', 'outer', 'inner': + mask = jmask[how] + frame = align(out[mask].copy()) + assert mask.all() ^ mask.any() or how == 'outer' + + for sort in [False, True]: + res = merge(left, right, how=how, sort=sort) + if sort: + verify_order(res) + + # as in GH9092 dtypes 
break with outer/right join + assert_frame_equal(frame, align(res), + check_dtype=how not in ('right', 'outer')) + + +def test_decons(): + + def testit(label_list, shape): + group_index = get_group_index(label_list, shape, sort=True, xnull=True) + label_list2 = decons_group_index(group_index, shape) + + for a, b in zip(label_list, label_list2): + assert (np.array_equal(a, b)) + + shape = (4, 5, 6) + label_list = [np.tile([0, 1, 2, 3, 0, 1, 2, 3], 100), np.tile( + [0, 2, 4, 3, 0, 1, 2, 3], 100), np.tile( + [5, 1, 0, 2, 3, 0, 5, 4], 100)] + testit(label_list, shape) + + shape = (10000, 10000) + label_list = [np.tile(np.arange(10000), 5), np.tile(np.arange(10000), 5)] + testit(label_list, shape) + + +class TestSafeSort(object): + + def test_basic_sort(self): + values = [3, 1, 2, 0, 4] + result = safe_sort(values) + expected = np.array([0, 1, 2, 3, 4]) + tm.assert_numpy_array_equal(result, expected) + + values = list("baaacb") + result = safe_sort(values) + expected = np.array(list("aaabbc"), dtype='object') + tm.assert_numpy_array_equal(result, expected) + + values = [] + result = safe_sort(values) + expected = np.array([]) + tm.assert_numpy_array_equal(result, expected) + + def test_labels(self): + values = [3, 1, 2, 0, 4] + expected = np.array([0, 1, 2, 3, 4]) + + labels = [0, 1, 1, 2, 3, 0, -1, 4] + result, result_labels = safe_sort(values, labels) + expected_labels = np.array([3, 1, 1, 2, 0, 3, -1, 4], dtype=np.intp) + tm.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result_labels, expected_labels) + + # na_sentinel + labels = [0, 1, 1, 2, 3, 0, 99, 4] + result, result_labels = safe_sort(values, labels, + na_sentinel=99) + expected_labels = np.array([3, 1, 1, 2, 0, 3, 99, 4], dtype=np.intp) + tm.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result_labels, expected_labels) + + # out of bound indices + labels = [0, 101, 102, 2, 3, 0, 99, 4] + result, result_labels = safe_sort(values, labels) + expected_labels = np.array([3, -1, -1, 2, 0, 3, -1, 4], dtype=np.intp) + tm.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result_labels, expected_labels) + + labels = [] + result, result_labels = safe_sort(values, labels) + expected_labels = np.array([], dtype=np.intp) + tm.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result_labels, expected_labels) + + def test_mixed_integer(self): + values = np.array(['b', 1, 0, 'a', 0, 'b'], dtype=object) + result = safe_sort(values) + expected = np.array([0, 0, 1, 'a', 'b', 'b'], dtype=object) + tm.assert_numpy_array_equal(result, expected) + + values = np.array(['b', 1, 0, 'a'], dtype=object) + labels = [0, 1, 2, 3, 0, -1, 1] + result, result_labels = safe_sort(values, labels) + expected = np.array([0, 1, 'a', 'b'], dtype=object) + expected_labels = np.array([3, 1, 0, 2, 3, -1, 1], dtype=np.intp) + tm.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result_labels, expected_labels) + + def test_mixed_interger_from_list(self): + values = ['b', 1, 0, 'a', 0, 'b'] + result = safe_sort(values) + expected = np.array([0, 0, 1, 'a', 'b', 'b'], dtype=object) + tm.assert_numpy_array_equal(result, expected) + + def test_unsortable(self): + # GH 13714 + arr = np.array([1, 2, datetime.now(), 0, 3], dtype=object) + if compat.PY2 and not pd._np_version_under1p10: + # RuntimeWarning: tp_compare didn't return -1 or -2 for exception + with warnings.catch_warnings(): + pytest.raises(TypeError, safe_sort, arr) + else: + pytest.raises(TypeError, 
safe_sort, arr) + + def test_exceptions(self): + with tm.assert_raises_regex(TypeError, + "Only list-like objects are allowed"): + safe_sort(values=1) + + with tm.assert_raises_regex(TypeError, + "Only list-like objects or None"): + safe_sort(values=[0, 1, 2], labels=1) + + with tm.assert_raises_regex(ValueError, + "values should be unique"): + safe_sort(values=[0, 1, 2, 1], labels=[0, 1]) diff --git a/pandas/tests/test_stats.py b/pandas/tests/test_stats.py deleted file mode 100644 index 41d25b9662b5b..0000000000000 --- a/pandas/tests/test_stats.py +++ /dev/null @@ -1,192 +0,0 @@ -# -*- coding: utf-8 -*- -from pandas import compat -import nose - -from distutils.version import LooseVersion -from numpy import nan -import numpy as np - -from pandas import Series, DataFrame - -from pandas.compat import product -from pandas.util.testing import (assert_frame_equal, assert_series_equal) -import pandas.util.testing as tm - - -class TestRank(tm.TestCase): - _multiprocess_can_split_ = True - s = Series([1, 3, 4, 2, nan, 2, 1, 5, nan, 3]) - df = DataFrame({'A': s, 'B': s}) - - results = { - 'average': np.array([1.5, 5.5, 7.0, 3.5, nan, - 3.5, 1.5, 8.0, nan, 5.5]), - 'min': np.array([1, 5, 7, 3, nan, 3, 1, 8, nan, 5]), - 'max': np.array([2, 6, 7, 4, nan, 4, 2, 8, nan, 6]), - 'first': np.array([1, 5, 7, 3, nan, 4, 2, 8, nan, 6]), - 'dense': np.array([1, 3, 4, 2, nan, 2, 1, 5, nan, 3]), - } - - def test_rank_tie_methods(self): - s = self.s - - def _check(s, expected, method='average'): - result = s.rank(method=method) - tm.assert_series_equal(result, Series(expected)) - - dtypes = [None, object] - disabled = set([(object, 'first')]) - results = self.results - - for method, dtype in product(results, dtypes): - if (dtype, method) in disabled: - continue - series = s if dtype is None else s.astype(dtype) - _check(series, results[method], method=method) - - def test_rank_methods_series(self): - tm.skip_if_no_package('scipy', '0.13', 'scipy.stats.rankdata') - import scipy - from scipy.stats import rankdata - - xs = np.random.randn(9) - xs = np.concatenate([xs[i:] for i in range(0, 9, 2)]) # add duplicates - np.random.shuffle(xs) - - index = [chr(ord('a') + i) for i in range(len(xs))] - - for vals in [xs, xs + 1e6, xs * 1e-6]: - ts = Series(vals, index=index) - - for m in ['average', 'min', 'max', 'first', 'dense']: - result = ts.rank(method=m) - sprank = rankdata(vals, m if m != 'first' else 'ordinal') - expected = Series(sprank, index=index) - - if LooseVersion(scipy.__version__) >= '0.17.0': - expected = expected.astype('float64') - tm.assert_series_equal(result, expected) - - def test_rank_methods_frame(self): - tm.skip_if_no_package('scipy', '0.13', 'scipy.stats.rankdata') - import scipy - from scipy.stats import rankdata - - xs = np.random.randint(0, 21, (100, 26)) - xs = (xs - 10.0) / 10.0 - cols = [chr(ord('z') - i) for i in range(xs.shape[1])] - - for vals in [xs, xs + 1e6, xs * 1e-6]: - df = DataFrame(vals, columns=cols) - - for ax in [0, 1]: - for m in ['average', 'min', 'max', 'first', 'dense']: - result = df.rank(axis=ax, method=m) - sprank = np.apply_along_axis( - rankdata, ax, vals, - m if m != 'first' else 'ordinal') - sprank = sprank.astype(np.float64) - expected = DataFrame(sprank, columns=cols) - - if LooseVersion(scipy.__version__) >= '0.17.0': - expected = expected.astype('float64') - tm.assert_frame_equal(result, expected) - - def test_rank_dense_method(self): - dtypes = ['O', 'f8', 'i8'] - in_out = [([1], [1]), - ([2], [1]), - ([0], [1]), - ([2, 2], [1, 1]), - ([1, 2, 3], [1, 2, 3]), - 
([4, 2, 1], [3, 2, 1],), - ([1, 1, 5, 5, 3], [1, 1, 3, 3, 2]), - ([-5, -4, -3, -2, -1], [1, 2, 3, 4, 5])] - - for ser, exp in in_out: - for dtype in dtypes: - s = Series(ser).astype(dtype) - result = s.rank(method='dense') - expected = Series(exp).astype(result.dtype) - assert_series_equal(result, expected) - - def test_rank_descending(self): - dtypes = ['O', 'f8', 'i8'] - - for dtype, method in product(dtypes, self.results): - if 'i' in dtype: - s = self.s.dropna() - df = self.df.dropna() - else: - s = self.s.astype(dtype) - df = self.df.astype(dtype) - - res = s.rank(ascending=False) - expected = (s.max() - s).rank() - assert_series_equal(res, expected) - - res = df.rank(ascending=False) - expected = (df.max() - df).rank() - assert_frame_equal(res, expected) - - if method == 'first' and dtype == 'O': - continue - - expected = (s.max() - s).rank(method=method) - res2 = s.rank(method=method, ascending=False) - assert_series_equal(res2, expected) - - expected = (df.max() - df).rank(method=method) - - if dtype != 'O': - res2 = df.rank(method=method, ascending=False, - numeric_only=True) - assert_frame_equal(res2, expected) - - res3 = df.rank(method=method, ascending=False, - numeric_only=False) - assert_frame_equal(res3, expected) - - def test_rank_2d_tie_methods(self): - df = self.df - - def _check2d(df, expected, method='average', axis=0): - exp_df = DataFrame({'A': expected, 'B': expected}) - - if axis == 1: - df = df.T - exp_df = exp_df.T - - result = df.rank(method=method, axis=axis) - assert_frame_equal(result, exp_df) - - dtypes = [None, object] - disabled = set([(object, 'first')]) - results = self.results - - for method, axis, dtype in product(results, [0, 1], dtypes): - if (dtype, method) in disabled: - continue - frame = df if dtype is None else df.astype(dtype) - _check2d(frame, results[method], method=method, axis=axis) - - def test_rank_int(self): - s = self.s.dropna().astype('i8') - - for method, res in compat.iteritems(self.results): - result = s.rank(method=method) - expected = Series(res).dropna() - expected.index = result.index - assert_series_equal(result, expected) - - def test_rank_object_bug(self): - # GH 13445 - - # smoke tests - Series([np.nan] * 32).astype(object).rank(ascending=True) - Series([np.nan] * 32).astype(object).rank(ascending=False) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/test_strings.py b/pandas/tests/test_strings.py index f59127c853ed1..ec2b0b75b9eed 100644 --- a/pandas/tests/test_strings.py +++ b/pandas/tests/test_strings.py @@ -2,17 +2,16 @@ # pylint: disable-msg=E1101,W0612 from datetime import datetime, timedelta +import pytest import re -import nose - from numpy import nan as NA import numpy as np from numpy.random import randint from pandas.compat import range, u import pandas.compat as compat -from pandas import (Index, Series, DataFrame, isnull, MultiIndex, notnull) +from pandas import Index, Series, DataFrame, isna, MultiIndex, notna from pandas.util.testing import assert_series_equal import pandas.util.testing as tm @@ -20,21 +19,20 @@ import pandas.core.strings as strings -class TestStringMethods(tm.TestCase): - - _multiprocess_can_split_ = True +class TestStringMethods(object): def test_api(self): # GH 6106, GH 9322 - self.assertIs(Series.str, strings.StringMethods) - self.assertIsInstance(Series(['']).str, strings.StringMethods) + assert Series.str is strings.StringMethods + assert isinstance(Series(['']).str, strings.StringMethods) # 
GH 9184 invalid = Series([1]) - with tm.assertRaisesRegexp(AttributeError, "only use .str accessor"): + with tm.assert_raises_regex(AttributeError, + "only use .str accessor"): invalid.str - self.assertFalse(hasattr(invalid, 'str')) + assert not hasattr(invalid, 'str') def test_iter(self): # GH3638 @@ -43,7 +41,7 @@ def test_iter(self): for s in ds.str: # iter must yield a Series - tm.assertIsInstance(s, Series) + assert isinstance(s, Series) # indices of each yielded Series should be equal to the index of # the original Series @@ -51,13 +49,12 @@ def test_iter(self): for el in s: # each element of the series is either a basestring/str or nan - self.assertTrue(isinstance(el, compat.string_types) or - isnull(el)) + assert isinstance(el, compat.string_types) or isna(el) # desired behavior is to iterate until everything would be nan on the # next iter so make sure the last element of the iterator was 'l' in # this case since 'wikitravel' is the longest string - self.assertEqual(s.dropna().values.item(), 'l') + assert s.dropna().values.item() == 'l' def test_iter_empty(self): ds = Series([], dtype=object) @@ -69,8 +66,8 @@ def test_iter_empty(self): # nothing to iterate over so nothing defined values should remain # unchanged - self.assertEqual(i, 100) - self.assertEqual(s, 1) + assert i == 100 + assert s == 1 def test_iter_single_element(self): ds = Series(['a']) @@ -78,7 +75,7 @@ def test_iter_single_element(self): for i, s in enumerate(ds.str): pass - self.assertFalse(i) + assert not i assert_series_equal(ds, s) def test_iter_object_try_string(self): @@ -90,8 +87,8 @@ def test_iter_object_try_string(self): for i, s in enumerate(ds.str): pass - self.assertEqual(i, 100) - self.assertEqual(s, 'h') + assert i == 100 + assert s == 'h' def test_cat(self): one = np.array(['a', 'a', 'b', 'b', 'c', NA], dtype=np.object_) @@ -100,29 +97,29 @@ def test_cat(self): # single array result = strings.str_cat(one) exp = 'aabbc' - self.assertEqual(result, exp) + assert result == exp result = strings.str_cat(one, na_rep='NA') exp = 'aabbcNA' - self.assertEqual(result, exp) + assert result == exp result = strings.str_cat(one, na_rep='-') exp = 'aabbc-' - self.assertEqual(result, exp) + assert result == exp result = strings.str_cat(one, sep='_', na_rep='NA') exp = 'a_a_b_b_c_NA' - self.assertEqual(result, exp) + assert result == exp result = strings.str_cat(two, sep='-') exp = 'a-b-d-foo' - self.assertEqual(result, exp) + assert result == exp # Multiple arrays result = strings.str_cat(one, [two], na_rep='NA') exp = np.array(['aa', 'aNA', 'bb', 'bd', 'cfoo', 'NANA'], dtype=np.object_) - self.assert_numpy_array_equal(result, exp) + tm.assert_numpy_array_equal(result, exp) result = strings.str_cat(one, two) exp = np.array(['aa', NA, 'bb', 'bd', 'cfoo', NA], dtype=np.object_) @@ -138,7 +135,7 @@ def test_count(self): result = Series(values).str.count('f[o]+') exp = Series([1, 2, NA, 4]) - tm.assertIsInstance(result, Series) + assert isinstance(result, Series) tm.assert_series_equal(result, exp) # mixed @@ -149,7 +146,7 @@ def test_count(self): rs = Series(mixed).str.count('a') xp = Series([1, NA, 0, NA, NA, 0, NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_series_equal(rs, xp) # unicode @@ -161,7 +158,7 @@ def test_count(self): result = Series(values).str.count('f[o]+') exp = Series([1, 2, NA, 4]) - tm.assertIsInstance(result, Series) + assert isinstance(result, Series) tm.assert_series_equal(result, exp) def test_contains(self): @@ -180,7 +177,7 @@ def test_contains(self): 
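For readers who do not know the API that the `test_cat` hunks above exercise, this is the behaviour being asserted, restated through the public `Series.str.cat` wrapper with illustrative values:

```python
import numpy as np
from pandas import Series

s = Series(['a', 'b', np.nan, 'd'])

# with no `others`, the values collapse into a single string;
# `na_rep` stands in for missing entries and `sep` joins the pieces
assert s.str.cat(sep='_', na_rep='-') == 'a_b_-_d'

# with `others`, concatenation is element-wise
result = s.str.cat(['A', 'B', 'C', 'D'], na_rep='-')
assert result.tolist() == ['aA', 'bB', '-C', 'dD']
```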
values = ['foo', 'xyz', 'fooommm__foo', 'mmm_'] result = strings.str_contains(values, pat) expected = np.array([False, False, True, True]) - self.assertEqual(result.dtype, np.bool_) + assert result.dtype == np.bool_ tm.assert_numpy_array_equal(result, expected) # case insensitive using regex @@ -203,7 +200,7 @@ def test_contains(self): rs = Series(mixed).str.contains('o') xp = Series([False, NA, False, NA, NA, True, NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_series_equal(rs, xp) # unicode @@ -223,13 +220,13 @@ def test_contains(self): dtype=np.object_) result = strings.str_contains(values, pat) expected = np.array([False, False, True, True]) - self.assertEqual(result.dtype, np.bool_) + assert result.dtype == np.bool_ tm.assert_numpy_array_equal(result, expected) # na values = Series(['om', 'foo', np.nan]) res = values.str.contains('foo', na="foo") - self.assertEqual(res.loc[2], "foo") + assert res.loc[2] == "foo" def test_startswith(self): values = Series(['om', NA, 'foo_nom', 'nom', 'bar_foo', NA, 'foo']) @@ -247,7 +244,7 @@ def test_startswith(self): tm.assert_numpy_array_equal(rs, xp) rs = Series(mixed).str.startswith('f') - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) xp = Series([False, NA, False, NA, NA, True, NA, NA, NA]) tm.assert_series_equal(rs, xp) @@ -278,7 +275,7 @@ def test_endswith(self): rs = Series(mixed).str.endswith('f') xp = Series([False, NA, False, NA, NA, False, NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_series_equal(rs, xp) # unicode @@ -330,7 +327,7 @@ def test_lower_upper(self): mixed = mixed.str.upper() rs = Series(mixed).str.lower() xp = Series(['a', NA, 'b', NA, NA, 'foo', NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_series_equal(rs, xp) # unicode @@ -384,13 +381,11 @@ def test_swapcase(self): def test_casemethods(self): values = ['aaa', 'bbb', 'CCC', 'Dddd', 'eEEE'] s = Series(values) - self.assertEqual(s.str.lower().tolist(), [v.lower() for v in values]) - self.assertEqual(s.str.upper().tolist(), [v.upper() for v in values]) - self.assertEqual(s.str.title().tolist(), [v.title() for v in values]) - self.assertEqual(s.str.capitalize().tolist(), [ - v.capitalize() for v in values]) - self.assertEqual(s.str.swapcase().tolist(), [ - v.swapcase() for v in values]) + assert s.str.lower().tolist() == [v.lower() for v in values] + assert s.str.upper().tolist() == [v.upper() for v in values] + assert s.str.title().tolist() == [v.title() for v in values] + assert s.str.capitalize().tolist() == [v.capitalize() for v in values] + assert s.str.swapcase().tolist() == [v.swapcase() for v in values] def test_replace(self): values = Series(['fooBAD__barBAD', NA]) @@ -409,7 +404,7 @@ def test_replace(self): rs = Series(mixed).str.replace('BAD[_]*', '') xp = Series(['a', NA, 'b', NA, NA, 'foo', NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) # unicode @@ -434,7 +429,7 @@ def test_replace(self): for repl in (None, 3, {'a': 'b'}): for data in (['a', 'b', None], ['a', 'b', 'c', 'ad']): values = klass(data) - self.assertRaises(TypeError, values.str.replace, 'a', repl) + pytest.raises(TypeError, values.str.replace, 'a', repl) def test_replace_callable(self): # GH 15055 @@ -454,15 +449,15 @@ def test_replace_callable(self): r'(?(3)required )positional arguments?') repl = lambda: None - with tm.assertRaisesRegexp(TypeError, p_err): + with tm.assert_raises_regex(TypeError, 
p_err): values.str.replace('a', repl) repl = lambda m, x: None - with tm.assertRaisesRegexp(TypeError, p_err): + with tm.assert_raises_regex(TypeError, p_err): values.str.replace('a', repl) repl = lambda m, x, y=None: None - with tm.assertRaisesRegexp(TypeError, p_err): + with tm.assert_raises_regex(TypeError, p_err): values.str.replace('a', repl) # test regex named groups @@ -473,6 +468,68 @@ def test_replace_callable(self): exp = Series(['bAR', NA]) tm.assert_series_equal(result, exp) + def test_replace_compiled_regex(self): + # GH 15446 + values = Series(['fooBAD__barBAD', NA]) + + # test with compiled regex + pat = re.compile(r'BAD[_]*') + result = values.str.replace(pat, '') + exp = Series(['foobar', NA]) + tm.assert_series_equal(result, exp) + + # mixed + mixed = Series(['aBAD', NA, 'bBAD', True, datetime.today(), 'fooBAD', + None, 1, 2.]) + + rs = Series(mixed).str.replace(pat, '') + xp = Series(['a', NA, 'b', NA, NA, 'foo', NA, NA, NA]) + assert isinstance(rs, Series) + tm.assert_almost_equal(rs, xp) + + # unicode + values = Series([u('fooBAD__barBAD'), NA]) + + result = values.str.replace(pat, '') + exp = Series([u('foobar'), NA]) + tm.assert_series_equal(result, exp) + + result = values.str.replace(pat, '', n=1) + exp = Series([u('foobarBAD'), NA]) + tm.assert_series_equal(result, exp) + + # flags + unicode + values = Series([b"abcd,\xc3\xa0".decode("utf-8")]) + exp = Series([b"abcd, \xc3\xa0".decode("utf-8")]) + pat = re.compile(r"(?<=\w),(?=\w)", flags=re.UNICODE) + result = values.str.replace(pat, ", ") + tm.assert_series_equal(result, exp) + + # case and flags passed alongside a compiled regex are not + # allowed and raise a ValueError + values = Series(['fooBAD__barBAD__bad', NA]) + pat = re.compile(r'BAD[_]*') + + with tm.assert_raises_regex(ValueError, + "case and flags cannot be"): + result = values.str.replace(pat, '', flags=re.IGNORECASE) + + with tm.assert_raises_regex(ValueError, + "case and flags cannot be"): + result = values.str.replace(pat, '', case=False) + + with tm.assert_raises_regex(ValueError, + "case and flags cannot be"): + result = values.str.replace(pat, '', case=True) + + # test with callable + values = Series(['fooBAD__barBAD', NA]) + repl = lambda m: m.group(0).swapcase() + pat = re.compile('[a-z][A-Z]{2}') + result = values.str.replace(pat, repl, n=2) + exp = Series(['foObaD__baRbaD', NA]) + tm.assert_series_equal(result, exp) + def test_repeat(self): values = Series(['a', 'b', NA, 'c', NA, 'd']) @@ -490,7 +547,7 @@ def test_repeat(self): rs = Series(mixed).str.repeat(3) xp = Series(['aaa', NA, 'bbb', NA, NA, 'foofoofoo', NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_series_equal(rs, xp) # unicode @@ -504,64 +561,44 @@ def test_repeat(self): exp = Series([u('a'), u('bb'), NA, u('cccc'), NA, u('dddddd')]) tm.assert_series_equal(result, exp) - def test_deprecated_match(self): - # Old match behavior, deprecated (but still default) in 0.13 + def test_match(self): + # New match behavior introduced in 0.13 values = Series(['fooBAD__barBAD', NA, 'foo']) - - with tm.assert_produces_warning(): - result = values.str.match('.*(BAD[_]+).*(BAD)') - exp = Series([('BAD__', 'BAD'), NA, []]) - tm.assert_series_equal(result, exp) - - # mixed - mixed = Series(['aBAD_BAD', NA, 'BAD_b_BAD', True, datetime.today(), - 'foo', None, 1, 2.]) - - with tm.assert_produces_warning(): - rs = Series(mixed).str.match('.*(BAD[_]+).*(BAD)') - xp = Series([('BAD_', 'BAD'), NA, ('BAD_', 'BAD'), - NA, NA, [], NA, NA, NA]) - tm.assertIsInstance(rs, 
Series) - tm.assert_series_equal(rs, xp) - - # unicode - values = Series([u('fooBAD__barBAD'), NA, u('foo')]) - - with tm.assert_produces_warning(): - result = values.str.match('.*(BAD[_]+).*(BAD)') - exp = Series([(u('BAD__'), u('BAD')), NA, []]) + result = values.str.match('.*(BAD[_]+).*(BAD)') + exp = Series([True, NA, False]) tm.assert_series_equal(result, exp) - def test_match(self): - # New match behavior introduced in 0.13 values = Series(['fooBAD__barBAD', NA, 'foo']) - with tm.assert_produces_warning(): - result = values.str.match('.*(BAD[_]+).*(BAD)', as_indexer=True) + result = values.str.match('.*BAD[_]+.*BAD') exp = Series([True, NA, False]) tm.assert_series_equal(result, exp) - # If no groups, use new behavior even when as_indexer is False. - # (Old behavior is pretty much useless in this case.) + # test passing as_indexer still works but is ignored values = Series(['fooBAD__barBAD', NA, 'foo']) - result = values.str.match('.*BAD[_]+.*BAD', as_indexer=False) exp = Series([True, NA, False]) + with tm.assert_produces_warning(FutureWarning): + result = values.str.match('.*BAD[_]+.*BAD', as_indexer=True) tm.assert_series_equal(result, exp) + with tm.assert_produces_warning(FutureWarning): + result = values.str.match('.*BAD[_]+.*BAD', as_indexer=False) + tm.assert_series_equal(result, exp) + with tm.assert_produces_warning(FutureWarning): + result = values.str.match('.*(BAD[_]+).*(BAD)', as_indexer=True) + tm.assert_series_equal(result, exp) + pytest.raises(ValueError, values.str.match, '.*(BAD[_]+).*(BAD)', + as_indexer=False) # mixed mixed = Series(['aBAD_BAD', NA, 'BAD_b_BAD', True, datetime.today(), 'foo', None, 1, 2.]) - - with tm.assert_produces_warning(): - rs = Series(mixed).str.match('.*(BAD[_]+).*(BAD)', as_indexer=True) + rs = Series(mixed).str.match('.*(BAD[_]+).*(BAD)') xp = Series([True, NA, True, NA, NA, False, NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_series_equal(rs, xp) # unicode values = Series([u('fooBAD__barBAD'), NA, u('foo')]) - - with tm.assert_produces_warning(): - result = values.str.match('.*(BAD[_]+).*(BAD)', as_indexer=True) + result = values.str.match('.*(BAD[_]+).*(BAD)') exp = Series([True, NA, False]) tm.assert_series_equal(result, exp) @@ -612,7 +649,7 @@ def test_extract_expand_False(self): # Index only works with one regex group since # multi-group would expand to a frame idx = Index(['A1', 'A2', 'A3', 'A4', 'B5']) - with tm.assertRaisesRegexp(ValueError, "supported"): + with tm.assert_raises_regex(ValueError, "supported"): idx.str.extract('([AB])([123])', expand=False) # these should work for both Series and Index @@ -620,16 +657,16 @@ def test_extract_expand_False(self): # no groups s_or_idx = klass(['A1', 'B2', 'C3']) f = lambda: s_or_idx.str.extract('[ABC][123]', expand=False) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # only non-capturing groups f = lambda: s_or_idx.str.extract('(?:[AB]).*', expand=False) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # single group renames series/index properly s_or_idx = klass(['A1', 'A2']) result = s_or_idx.str.extract(r'(?PA)\d', expand=False) - self.assertEqual(result.name, 'uno') + assert result.name == 'uno' exp = klass(['A', 'A'], name='uno') if klass == Series: @@ -733,7 +770,7 @@ def check_index(index): r = s.str.extract(r'(?P[a-z])', expand=False) e = Series(['a', 'b', 'c'], name='sue') tm.assert_series_equal(r, e) - self.assertEqual(r.name, e.name) + assert r.name == e.name def 
test_extract_expand_True(self): # Contains tests like those in test_match and some others. @@ -765,16 +802,16 @@ def test_extract_expand_True(self): # no groups s_or_idx = klass(['A1', 'B2', 'C3']) f = lambda: s_or_idx.str.extract('[ABC][123]', expand=True) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # only non-capturing groups f = lambda: s_or_idx.str.extract('(?:[AB]).*', expand=True) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # single group renames series/index properly s_or_idx = klass(['A1', 'A2']) result_df = s_or_idx.str.extract(r'(?PA)\d', expand=True) - tm.assertIsInstance(result_df, DataFrame) + assert isinstance(result_df, DataFrame) result_series = result_df['uno'] assert_series_equal(result_series, Series(['A', 'A'], name='uno')) @@ -1088,7 +1125,7 @@ def test_extractall_errors(self): # no capture groups. (it returns DataFrame with one column for # each capture group) s = Series(['a3', 'b3', 'd4c2'], name='series_name') - with tm.assertRaisesRegexp(ValueError, "no capture groups"): + with tm.assert_raises_regex(ValueError, "no capture groups"): s.str.extractall(r'[a-z]') def test_extract_index_one_two_groups(self): @@ -1172,17 +1209,16 @@ def test_extractall_same_as_extract_subject_index(self): tm.assert_frame_equal(extract_one_noname, no_match_index) def test_empty_str_methods(self): - empty_str = empty = Series(dtype=str) + empty_str = empty = Series(dtype=object) empty_int = Series(dtype=int) empty_bool = Series(dtype=bool) - empty_list = Series(dtype=list) empty_bytes = Series(dtype=object) # GH7241 # (extract) on empty series tm.assert_series_equal(empty_str, empty.str.cat(empty)) - self.assertEqual('', empty.str.cat()) + assert '' == empty.str.cat() tm.assert_series_equal(empty_str, empty.str.title()) tm.assert_series_equal(empty_int, empty.str.count('a')) tm.assert_series_equal(empty_bool, empty.str.contains('a')) @@ -1206,25 +1242,24 @@ def test_empty_str_methods(self): DataFrame(columns=[0, 1], dtype=str), empty.str.extract('()()', expand=False)) tm.assert_frame_equal(DataFrame(dtype=str), empty.str.get_dummies()) - tm.assert_series_equal(empty_str, empty_list.str.join('')) + tm.assert_series_equal(empty_str, empty_str.str.join('')) tm.assert_series_equal(empty_int, empty.str.len()) - tm.assert_series_equal(empty_list, empty_list.str.findall('a')) + tm.assert_series_equal(empty_str, empty_str.str.findall('a')) tm.assert_series_equal(empty_int, empty.str.find('a')) tm.assert_series_equal(empty_int, empty.str.rfind('a')) tm.assert_series_equal(empty_str, empty.str.pad(42)) tm.assert_series_equal(empty_str, empty.str.center(42)) - tm.assert_series_equal(empty_list, empty.str.split('a')) - tm.assert_series_equal(empty_list, empty.str.rsplit('a')) - tm.assert_series_equal(empty_list, + tm.assert_series_equal(empty_str, empty.str.split('a')) + tm.assert_series_equal(empty_str, empty.str.rsplit('a')) + tm.assert_series_equal(empty_str, empty.str.partition('a', expand=False)) - tm.assert_series_equal(empty_list, + tm.assert_series_equal(empty_str, empty.str.rpartition('a', expand=False)) tm.assert_series_equal(empty_str, empty.str.slice(stop=1)) tm.assert_series_equal(empty_str, empty.str.slice(step=1)) tm.assert_series_equal(empty_str, empty.str.strip()) tm.assert_series_equal(empty_str, empty.str.lstrip()) tm.assert_series_equal(empty_str, empty.str.rstrip()) - tm.assert_series_equal(empty_str, empty.str.rstrip()) tm.assert_series_equal(empty_str, empty.str.wrap(42)) tm.assert_series_equal(empty_str, empty.str.get(0)) 
tm.assert_series_equal(empty_str, empty_bytes.str.decode('ascii')) @@ -1285,20 +1320,13 @@ def test_ismethods(self): tm.assert_series_equal(str_s.str.isupper(), Series(upper_e)) tm.assert_series_equal(str_s.str.istitle(), Series(title_e)) - self.assertEqual(str_s.str.isalnum().tolist(), [v.isalnum() - for v in values]) - self.assertEqual(str_s.str.isalpha().tolist(), [v.isalpha() - for v in values]) - self.assertEqual(str_s.str.isdigit().tolist(), [v.isdigit() - for v in values]) - self.assertEqual(str_s.str.isspace().tolist(), [v.isspace() - for v in values]) - self.assertEqual(str_s.str.islower().tolist(), [v.islower() - for v in values]) - self.assertEqual(str_s.str.isupper().tolist(), [v.isupper() - for v in values]) - self.assertEqual(str_s.str.istitle().tolist(), [v.istitle() - for v in values]) + assert str_s.str.isalnum().tolist() == [v.isalnum() for v in values] + assert str_s.str.isalpha().tolist() == [v.isalpha() for v in values] + assert str_s.str.isdigit().tolist() == [v.isdigit() for v in values] + assert str_s.str.isspace().tolist() == [v.isspace() for v in values] + assert str_s.str.islower().tolist() == [v.islower() for v in values] + assert str_s.str.isupper().tolist() == [v.isupper() for v in values] + assert str_s.str.istitle().tolist() == [v.istitle() for v in values] def test_isnumeric(self): # 0x00bc: ¼ VULGAR FRACTION ONE QUARTER @@ -1313,10 +1341,8 @@ def test_isnumeric(self): tm.assert_series_equal(s.str.isdecimal(), Series(decimal_e)) unicodes = [u'A', u'3', u'¼', u'★', u'፸', u'3', u'four'] - self.assertEqual(s.str.isnumeric().tolist(), [ - v.isnumeric() for v in unicodes]) - self.assertEqual(s.str.isdecimal().tolist(), [ - v.isdecimal() for v in unicodes]) + assert s.str.isnumeric().tolist() == [v.isnumeric() for v in unicodes] + assert s.str.isdecimal().tolist() == [v.isdecimal() for v in unicodes] values = ['A', np.nan, u'¼', u'★', np.nan, u'3', 'four'] s = Series(values) @@ -1375,7 +1401,7 @@ def test_join(self): rs = Series(mixed).str.split('_').str.join('_') xp = Series(['a_b', NA, 'asdf_cas_asdf', NA, NA, 'foo', NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) # unicode @@ -1387,7 +1413,7 @@ def test_len(self): values = Series(['foo', 'fooo', 'fooooo', np.nan, 'fooooooo']) result = values.str.len() - exp = values.map(lambda x: len(x) if notnull(x) else NA) + exp = values.map(lambda x: len(x) if notna(x) else NA) tm.assert_series_equal(result, exp) # mixed @@ -1397,7 +1423,7 @@ def test_len(self): rs = Series(mixed).str.len() xp = Series([3, NA, 13, NA, NA, 3, NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) # unicode @@ -1405,7 +1431,7 @@ def test_len(self): 'fooooooo')]) result = values.str.len() - exp = values.map(lambda x: len(x) if notnull(x) else NA) + exp = values.map(lambda x: len(x) if notna(x) else NA) tm.assert_series_equal(result, exp) def test_findall(self): @@ -1422,7 +1448,7 @@ def test_findall(self): rs = Series(mixed).str.findall('BAD[_]*') xp = Series([['BAD__', 'BAD'], NA, [], NA, NA, ['BAD'], NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) # unicode @@ -1470,12 +1496,12 @@ def test_find(self): dtype=np.int64) tm.assert_numpy_array_equal(result.values, expected) - with tm.assertRaisesRegexp(TypeError, - "expected a string object, not int"): + with tm.assert_raises_regex(TypeError, + "expected a string object, not int"): result = values.str.find(0) - with 
tm.assertRaisesRegexp(TypeError, - "expected a string object, not int"): + with tm.assert_raises_regex(TypeError, + "expected a string object, not int"): result = values.str.rfind(0) def test_find_nan(self): @@ -1545,11 +1571,13 @@ def _check(result, expected): dtype=np.int64) tm.assert_numpy_array_equal(result.values, expected) - with tm.assertRaisesRegexp(ValueError, "substring not found"): + with tm.assert_raises_regex(ValueError, + "substring not found"): result = s.str.index('DE') - with tm.assertRaisesRegexp(TypeError, - "expected a string object, not int"): + with tm.assert_raises_regex(TypeError, + "expected a string " + "object, not int"): result = s.str.index(0) # test with nan @@ -1581,7 +1609,7 @@ def test_pad(self): rs = Series(mixed).str.pad(5, side='left') xp = Series([' a', NA, ' b', NA, NA, ' ee', NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) mixed = Series(['a', NA, 'b', True, datetime.today(), 'ee', None, 1, 2. @@ -1590,7 +1618,7 @@ def test_pad(self): rs = Series(mixed).str.pad(5, side='right') xp = Series(['a ', NA, 'b ', NA, NA, 'ee ', NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) mixed = Series(['a', NA, 'b', True, datetime.today(), 'ee', None, 1, 2. @@ -1599,7 +1627,7 @@ def test_pad(self): rs = Series(mixed).str.pad(5, side='both') xp = Series([' a ', NA, ' b ', NA, NA, ' ee ', NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) # unicode @@ -1633,12 +1661,14 @@ def test_pad_fillchar(self): exp = Series(['XXaXX', 'XXbXX', NA, 'XXcXX', NA, 'eeeeee']) tm.assert_almost_equal(result, exp) - with tm.assertRaisesRegexp(TypeError, - "fillchar must be a character, not str"): + with tm.assert_raises_regex(TypeError, + "fillchar must be a " + "character, not str"): result = values.str.pad(5, fillchar='XY') - with tm.assertRaisesRegexp(TypeError, - "fillchar must be a character, not int"): + with tm.assert_raises_regex(TypeError, + "fillchar must be a " + "character, not int"): result = values.str.pad(5, fillchar=5) def test_pad_width(self): @@ -1646,8 +1676,9 @@ def test_pad_width(self): s = Series(['1', '22', 'a', 'bb']) for f in ['center', 'ljust', 'rjust', 'zfill', 'pad']: - with tm.assertRaisesRegexp(TypeError, - "width must be of integer type, not*"): + with tm.assert_raises_regex(TypeError, + "width must be of " + "integer type, not*"): getattr(s.str, f)('f') def test_translate(self): @@ -1679,7 +1710,7 @@ def _check(result, expected): expected = klass(['abcde', 'abcc', 'cddd', 'cde']) _check(result, expected) else: - with tm.assertRaisesRegexp( + with tm.assert_raises_regex( ValueError, "deletechars is not a valid argument"): result = s.str.translate(table, deletechars='fg') @@ -1711,19 +1742,19 @@ def test_center_ljust_rjust(self): rs = Series(mixed).str.center(5) xp = Series([' a ', NA, ' b ', NA, NA, ' c ', ' eee ', NA, NA, NA ]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) rs = Series(mixed).str.ljust(5) xp = Series(['a ', NA, 'b ', NA, NA, 'c ', 'eee ', NA, NA, NA ]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) rs = Series(mixed).str.rjust(5) xp = Series([' a', NA, ' b', NA, NA, ' c', ' eee', NA, NA, NA ]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) # unicode @@ -1768,28 +1799,34 @@ def test_center_ljust_rjust_fillchar(self): 
# If fillchar is not a character, normal str raises TypeError # 'aaa'.ljust(5, 'XY') # TypeError: must be char, not str - with tm.assertRaisesRegexp(TypeError, - "fillchar must be a character, not str"): + with tm.assert_raises_regex(TypeError, + "fillchar must be a " + "character, not str"): result = values.str.center(5, fillchar='XY') - with tm.assertRaisesRegexp(TypeError, - "fillchar must be a character, not str"): + with tm.assert_raises_regex(TypeError, + "fillchar must be a " + "character, not str"): result = values.str.ljust(5, fillchar='XY') - with tm.assertRaisesRegexp(TypeError, - "fillchar must be a character, not str"): + with tm.assert_raises_regex(TypeError, + "fillchar must be a " + "character, not str"): result = values.str.rjust(5, fillchar='XY') - with tm.assertRaisesRegexp(TypeError, - "fillchar must be a character, not int"): + with tm.assert_raises_regex(TypeError, + "fillchar must be a " + "character, not int"): result = values.str.center(5, fillchar=1) - with tm.assertRaisesRegexp(TypeError, - "fillchar must be a character, not int"): + with tm.assert_raises_regex(TypeError, + "fillchar must be a " + "character, not int"): result = values.str.ljust(5, fillchar=1) - with tm.assertRaisesRegexp(TypeError, - "fillchar must be a character, not int"): + with tm.assert_raises_regex(TypeError, + "fillchar must be a " + "character, not int"): result = values.str.rjust(5, fillchar=1) def test_zfill(self): @@ -1835,11 +1872,11 @@ def test_split(self): result = mixed.str.split('_') exp = Series([['a', 'b', 'c'], NA, ['d', 'e', 'f'], NA, NA, NA, NA, NA ]) - tm.assertIsInstance(result, Series) + assert isinstance(result, Series) tm.assert_almost_equal(result, exp) result = mixed.str.split('_', expand=False) - tm.assertIsInstance(result, Series) + assert isinstance(result, Series) tm.assert_almost_equal(result, exp) # unicode @@ -1880,11 +1917,11 @@ def test_rsplit(self): result = mixed.str.rsplit('_') exp = Series([['a', 'b', 'c'], NA, ['d', 'e', 'f'], NA, NA, NA, NA, NA ]) - tm.assertIsInstance(result, Series) + assert isinstance(result, Series) tm.assert_almost_equal(result, exp) result = mixed.str.rsplit('_', expand=False) - tm.assertIsInstance(result, Series) + assert isinstance(result, Series) tm.assert_almost_equal(result, exp) # unicode @@ -1914,9 +1951,9 @@ def test_split_noargs(self): s = Series(['Wes McKinney', 'Travis Oliphant']) result = s.str.split() expected = ['Travis', 'Oliphant'] - self.assertEqual(result[1], expected) + assert result[1] == expected result = s.str.rsplit() - self.assertEqual(result[1], expected) + assert result[1] == expected def test_split_maxsplit(self): # re.split 0, str.split -1 @@ -1971,7 +2008,7 @@ def test_split_to_dataframe(self): index=['preserve', 'me']) tm.assert_frame_equal(result, exp) - with tm.assertRaisesRegexp(ValueError, "expand must be"): + with tm.assert_raises_regex(ValueError, "expand must be"): s.str.split('_', expand="not_a_boolean") def test_split_to_multiindex_expand(self): @@ -1979,14 +2016,14 @@ def test_split_to_multiindex_expand(self): result = idx.str.split('_', expand=True) exp = idx tm.assert_index_equal(result, exp) - self.assertEqual(result.nlevels, 1) + assert result.nlevels == 1 idx = Index(['some_equal_splits', 'with_no_nans']) result = idx.str.split('_', expand=True) exp = MultiIndex.from_tuples([('some', 'equal', 'splits'), ( 'with', 'no', 'nans')]) tm.assert_index_equal(result, exp) - self.assertEqual(result.nlevels, 3) + assert result.nlevels == 3 idx = Index(['some_unequal_splits',
'one_of_these_things_is_not']) result = idx.str.split('_', expand=True) @@ -1994,9 +2031,9 @@ def test_split_to_multiindex_expand(self): ), ('one', 'of', 'these', 'things', 'is', 'not')]) tm.assert_index_equal(result, exp) - self.assertEqual(result.nlevels, 6) + assert result.nlevels == 6 - with tm.assertRaisesRegexp(ValueError, "expand must be"): + with tm.assert_raises_regex(ValueError, "expand must be"): idx.str.split('_', expand="not_a_boolean") def test_rsplit_to_dataframe_expand(self): @@ -2033,21 +2070,21 @@ def test_rsplit_to_multiindex_expand(self): result = idx.str.rsplit('_', expand=True) exp = idx tm.assert_index_equal(result, exp) - self.assertEqual(result.nlevels, 1) + assert result.nlevels == 1 idx = Index(['some_equal_splits', 'with_no_nans']) result = idx.str.rsplit('_', expand=True) exp = MultiIndex.from_tuples([('some', 'equal', 'splits'), ( 'with', 'no', 'nans')]) tm.assert_index_equal(result, exp) - self.assertEqual(result.nlevels, 3) + assert result.nlevels == 3 idx = Index(['some_equal_splits', 'with_no_nans']) result = idx.str.rsplit('_', expand=True, n=1) exp = MultiIndex.from_tuples([('some_equal', 'splits'), ('with_no', 'nans')]) tm.assert_index_equal(result, exp) - self.assertEqual(result.nlevels, 2) + assert result.nlevels == 2 def test_split_with_name(self): # GH 12617 @@ -2065,12 +2102,12 @@ def test_split_with_name(self): idx = Index(['a,b', 'c,d'], name='xxx') res = idx.str.split(',') exp = Index([['a', 'b'], ['c', 'd']], name='xxx') - self.assertTrue(res.nlevels, 1) + assert res.nlevels == 1 tm.assert_index_equal(res, exp) res = idx.str.split(',', expand=True) exp = MultiIndex.from_tuples([('a', 'b'), ('c', 'd')]) - self.assertTrue(res.nlevels, 2) + assert res.nlevels == 2 tm.assert_index_equal(res, exp) def test_partition_series(self): @@ -2136,9 +2173,9 @@ def test_partition_series(self): # compare to standard lib values = Series(['A_B_C', 'B_C_D', 'E_F_G', 'EFGHEF']) result = values.str.partition('_', expand=False).tolist() - self.assertEqual(result, [v.partition('_') for v in values]) + assert result == [v.partition('_') for v in values] result = values.str.rpartition('_', expand=False).tolist() - self.assertEqual(result, [v.rpartition('_') for v in values]) + assert result == [v.rpartition('_') for v in values] def test_partition_index(self): values = Index(['a_b_c', 'c_d_e', 'f_g_h']) @@ -2147,25 +2184,25 @@ def test_partition_index(self): exp = Index(np.array([('a', '_', 'b_c'), ('c', '_', 'd_e'), ('f', '_', 'g_h')])) tm.assert_index_equal(result, exp) - self.assertEqual(result.nlevels, 1) + assert result.nlevels == 1 result = values.str.rpartition('_', expand=False) exp = Index(np.array([('a_b', '_', 'c'), ('c_d', '_', 'e'), ( 'f_g', '_', 'h')])) tm.assert_index_equal(result, exp) - self.assertEqual(result.nlevels, 1) + assert result.nlevels == 1 result = values.str.partition('_') exp = Index([('a', '_', 'b_c'), ('c', '_', 'd_e'), ('f', '_', 'g_h')]) tm.assert_index_equal(result, exp) - self.assertTrue(isinstance(result, MultiIndex)) - self.assertEqual(result.nlevels, 3) + assert isinstance(result, MultiIndex) + assert result.nlevels == 3 result = values.str.rpartition('_') exp = Index([('a_b', '_', 'c'), ('c_d', '_', 'e'), ('f_g', '_', 'h')]) tm.assert_index_equal(result, exp) - self.assertTrue(isinstance(result, MultiIndex)) - self.assertEqual(result.nlevels, 3) + assert isinstance(result, MultiIndex) + assert result.nlevels == 3 def test_partition_to_dataframe(self): values = Series(['a_b_c', 'c_d_e', NA, 'f_g_h']) @@ -2210,13 +2247,13 @@ def 
test_partition_with_name(self): idx = Index(['a,b', 'c,d'], name='xxx') res = idx.str.partition(',') exp = MultiIndex.from_tuples([('a', ',', 'b'), ('c', ',', 'd')]) - self.assertTrue(res.nlevels, 3) + assert res.nlevels == 3 tm.assert_index_equal(res, exp) # should preserve name res = idx.str.partition(',', expand=False) exp = Index(np.array([('a', ',', 'b'), ('c', ',', 'd')]), name='xxx') - self.assertTrue(res.nlevels, 1) + assert res.nlevels == 1 tm.assert_index_equal(res, exp) def test_pipe_failures(self): @@ -2244,7 +2281,7 @@ def test_slice(self): (3, 0, -1)]: try: result = values.str.slice(start, stop, step) - expected = Series([s[start:stop:step] if not isnull(s) else NA + expected = Series([s[start:stop:step] if not isna(s) else NA for s in values]) tm.assert_series_equal(result, expected) except: @@ -2258,7 +2295,7 @@ def test_slice(self): rs = Series(mixed).str.slice(2, 5) xp = Series(['foo', NA, 'bar', NA, NA, NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) rs = Series(mixed).str.slice(2, 5, -1) @@ -2336,19 +2373,19 @@ def test_strip_lstrip_rstrip_mixed(self): rs = Series(mixed).str.strip() xp = Series(['aa', NA, 'bb', NA, NA, NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) rs = Series(mixed).str.lstrip() xp = Series(['aa ', NA, 'bb \t\n', NA, NA, NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) rs = Series(mixed).str.rstrip() xp = Series([' aa', NA, ' bb', NA, NA, NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) def test_strip_lstrip_rstrip_unicode(self): @@ -2437,7 +2474,7 @@ def test_get(self): rs = Series(mixed).str.split('_').str.get(1) xp = Series(['b', NA, 'd', NA, NA, NA, NA, NA]) - tm.assertIsInstance(rs, Series) + assert isinstance(rs, Series) tm.assert_almost_equal(rs, xp) # unicode @@ -2555,20 +2592,21 @@ def test_match_findall_flags(self): pat = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})' - with tm.assert_produces_warning(FutureWarning): - result = data.str.match(pat, flags=re.IGNORECASE) + result = data.str.extract(pat, flags=re.IGNORECASE, expand=True) + assert result.iloc[0].tolist() == ['dave', 'google', 'com'] - self.assertEqual(result[0], ('dave', 'google', 'com')) + result = data.str.match(pat, flags=re.IGNORECASE) + assert result[0] result = data.str.findall(pat, flags=re.IGNORECASE) - self.assertEqual(result[0][0], ('dave', 'google', 'com')) + assert result[0][0] == ('dave', 'google', 'com') result = data.str.count(pat, flags=re.IGNORECASE) - self.assertEqual(result[0], 1) + assert result[0] == 1 with tm.assert_produces_warning(UserWarning): result = data.str.contains(pat, flags=re.IGNORECASE) - self.assertEqual(result[0], True) + assert result[0] def test_encode_decode(self): base = Series([u('a'), u('b'), u('a\xe4')]) @@ -2583,7 +2621,7 @@ def test_encode_decode(self): def test_encode_decode_errors(self): encodeBase = Series([u('a'), u('b'), u('a\x9d')]) - self.assertRaises(UnicodeEncodeError, encodeBase.str.encode, 'cp1252') + pytest.raises(UnicodeEncodeError, encodeBase.str.encode, 'cp1252') f = lambda x: x.encode('cp1252', 'ignore') result = encodeBase.str.encode('cp1252', 'ignore') @@ -2592,7 +2630,7 @@ def test_encode_decode_errors(self): decodeBase = Series([b'a', b'b', b'a\x9d']) - self.assertRaises(UnicodeDecodeError, decodeBase.str.decode, 'cp1252') + pytest.raises(UnicodeDecodeError, 
decodeBase.str.decode, 'cp1252') f = lambda x: x.decode('cp1252', 'ignore') result = decodeBase.str.decode('cp1252', 'ignore') @@ -2616,7 +2654,8 @@ def test_normalize(self): result = s.str.normalize('NFC') tm.assert_series_equal(result, expected) - with tm.assertRaisesRegexp(ValueError, "invalid normalization form"): + with tm.assert_raises_regex(ValueError, + "invalid normalization form"): s.str.normalize('xxx') s = Index([u'ABC', u'123', u'アイエ']) @@ -2635,19 +2674,19 @@ def test_cat_on_filtered_index(self): str_month = df.month.astype('str') str_both = str_year.str.cat(str_month, sep=' ') - self.assertEqual(str_both.loc[1], '2011 2') + assert str_both.loc[1] == '2011 2' str_multiple = str_year.str.cat([str_month, str_month], sep=' ') - self.assertEqual(str_multiple.loc[1], '2011 2 2') + assert str_multiple.loc[1] == '2011 2 2' def test_str_cat_raises_intuitive_error(self): # https://github.com/pandas-dev/pandas/issues/11334 s = Series(['a', 'b', 'c', 'd']) message = "Did you mean to supply a `sep` keyword?" - with tm.assertRaisesRegexp(ValueError, message): + with tm.assert_raises_regex(ValueError, message): s.str.cat('|') - with tm.assertRaisesRegexp(ValueError, message): + with tm.assert_raises_regex(ValueError, message): s.str.cat(' ') def test_index_str_accessor_visibility(self): @@ -2669,15 +2708,15 @@ def test_index_str_accessor_visibility(self): (['aa', datetime(2011, 1, 1)], 'mixed')] for values, tp in cases: idx = Index(values) - self.assertTrue(isinstance(Series(values).str, StringMethods)) - self.assertTrue(isinstance(idx.str, StringMethods)) - self.assertEqual(idx.inferred_type, tp) + assert isinstance(Series(values).str, StringMethods) + assert isinstance(idx.str, StringMethods) + assert idx.inferred_type == tp for values, tp in cases: idx = Index(values) - self.assertTrue(isinstance(Series(values).str, StringMethods)) - self.assertTrue(isinstance(idx.str, StringMethods)) - self.assertEqual(idx.inferred_type, tp) + assert isinstance(Series(values).str, StringMethods) + assert isinstance(idx.str, StringMethods) + assert idx.inferred_type == tp cases = [([1, np.nan], 'floating'), ([datetime(2011, 1, 1)], 'datetime64'), @@ -2685,38 +2724,33 @@ def test_index_str_accessor_visibility(self): for values, tp in cases: idx = Index(values) message = 'Can only use .str accessor with string values' - with self.assertRaisesRegexp(AttributeError, message): + with tm.assert_raises_regex(AttributeError, message): Series(values).str - with self.assertRaisesRegexp(AttributeError, message): + with tm.assert_raises_regex(AttributeError, message): idx.str - self.assertEqual(idx.inferred_type, tp) + assert idx.inferred_type == tp # MultiIndex has mixed dtype, but not allow to use accessor idx = MultiIndex.from_tuples([('a', 'b'), ('a', 'b')]) - self.assertEqual(idx.inferred_type, 'mixed') + assert idx.inferred_type == 'mixed' message = 'Can only use .str accessor with Index, not MultiIndex' - with self.assertRaisesRegexp(AttributeError, message): + with tm.assert_raises_regex(AttributeError, message): idx.str def test_str_accessor_no_new_attributes(self): # https://github.com/pandas-dev/pandas/issues/10673 s = Series(list('aabbcde')) - with tm.assertRaisesRegexp(AttributeError, - "You cannot add any new attribute"): + with tm.assert_raises_regex(AttributeError, + "You cannot add any new attribute"): s.str.xlabel = "a" def test_method_on_bytes(self): lhs = Series(np.array(list('abc'), 'S1').astype(object)) rhs = Series(np.array(list('def'), 'S1').astype(object)) if compat.PY3: - 
self.assertRaises(TypeError, lhs.str.cat, rhs) + pytest.raises(TypeError, lhs.str.cat, rhs) else: result = lhs.str.cat(rhs) expected = Series(np.array( ['ad', 'be', 'cf'], 'S2').astype(object)) tm.assert_series_equal(result, expected) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/test_take.py b/pandas/tests/test_take.py index 98b3b474f785d..7b97b0e975df3 100644 --- a/pandas/tests/test_take.py +++ b/pandas/tests/test_take.py @@ -2,22 +2,17 @@ import re from datetime import datetime -import nose import numpy as np from pandas.compat import long import pandas.core.algorithms as algos import pandas.util.testing as tm -from pandas.tslib import iNaT +from pandas._libs.tslib import iNaT -_multiprocess_can_split_ = True - -class TestTake(tm.TestCase): +class TestTake(object): # standard incompatible fill error fill_error = re.compile("Incompatible type for fill_value") - _multiprocess_can_split_ = True - def test_1d_with_out(self): def _test_dtype(dtype, can_hold_na, writeable=True): data = np.random.randint(0, 2, 4).astype(dtype) @@ -37,7 +32,7 @@ def _test_dtype(dtype, can_hold_na, writeable=True): expected[3] = np.nan tm.assert_almost_equal(out, expected) else: - with tm.assertRaisesRegexp(TypeError, self.fill_error): + with tm.assert_raises_regex(TypeError, self.fill_error): algos.take_1d(data, indexer, out=out) # no exception o/w data.take(indexer, out=out) @@ -128,7 +123,8 @@ def _test_dtype(dtype, can_hold_na, writeable=True): tm.assert_almost_equal(out1, expected1) else: for i, out in enumerate([out0, out1]): - with tm.assertRaisesRegexp(TypeError, self.fill_error): + with tm.assert_raises_regex(TypeError, + self.fill_error): algos.take_nd(data, indexer, out=out, axis=i) # no exception o/w data.take(indexer, out=out, axis=i) @@ -240,7 +236,8 @@ def _test_dtype(dtype, can_hold_na): tm.assert_almost_equal(out2, expected2) else: for i, out in enumerate([out0, out1, out2]): - with tm.assertRaisesRegexp(TypeError, self.fill_error): + with tm.assert_raises_regex(TypeError, + self.fill_error): algos.take_nd(data, indexer, out=out, axis=i) # no exception o/w data.take(indexer, out=out, axis=i) @@ -353,24 +350,24 @@ def test_1d_bool(self): result = algos.take_1d(arr, [0, 2, 2, 1]) expected = arr.take([0, 2, 2, 1]) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) result = algos.take_1d(arr, [0, 2, -1]) - self.assertEqual(result.dtype, np.object_) + assert result.dtype == np.object_ def test_2d_bool(self): arr = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=bool) result = algos.take_nd(arr, [0, 2, 2, 1]) expected = arr.take([0, 2, 2, 1], axis=0) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) result = algos.take_nd(arr, [0, 2, 2, 1], axis=1) expected = arr.take([0, 2, 2, 1], axis=1) - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) result = algos.take_nd(arr, [0, 2, -1]) - self.assertEqual(result.dtype, np.object_) + assert result.dtype == np.object_ def test_2d_float32(self): arr = np.random.randn(4, 3).astype(np.float32) @@ -448,8 +445,3 @@ def test_2d_datetime64(self): expected = arr.take(indexer, axis=1) expected[:, [2, 4]] = datetime(2007, 1, 1) tm.assert_almost_equal(result, expected) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tests/test_window.py 
b/pandas/tests/test_window.py index dc23469976e35..1cc0ad8bb4041 100644 --- a/pandas/tests/test_window.py +++ b/pandas/tests/test_window.py @@ -1,22 +1,22 @@ from itertools import product -import nose +import pytest import sys import warnings +from warnings import catch_warnings -from nose.tools import assert_raises -from datetime import datetime +from datetime import datetime, timedelta from numpy.random import randn import numpy as np from distutils.version import LooseVersion import pandas as pd -from pandas import (Series, DataFrame, Panel, bdate_range, isnull, - notnull, concat, Timestamp) +from pandas import (Series, DataFrame, bdate_range, isna, + notna, concat, Timestamp, Index) import pandas.stats.moments as mom import pandas.core.window as rwindow import pandas.tseries.offsets as offsets from pandas.core.base import SpecificationError -from pandas.core.common import UnsupportedFunctionCall +from pandas.errors import UnsupportedFunctionCall import pandas.util.testing as tm from pandas.compat import range, zip, PY3 @@ -30,9 +30,7 @@ def assert_equal(left, right): tm.assert_frame_equal(left, right) -class Base(tm.TestCase): - - _multiprocess_can_split_ = True +class Base(object): _nan_locs = np.arange(20, 40) _inf_locs = np.array([]) @@ -50,7 +48,7 @@ def _create_data(self): class TestApi(Base): - def setUp(self): + def setup_method(self, method): self._create_data() def test_getitem(self): @@ -59,7 +57,7 @@ def test_getitem(self): tm.assert_index_equal(r._selected_obj.columns, self.frame.columns) r = self.frame.rolling(window=5)[1] - self.assertEqual(r._selected_obj.name, self.frame.columns[1]) + assert r._selected_obj.name == self.frame.columns[1] # technically this is allowed r = self.frame.rolling(window=5)[1, 3] @@ -73,10 +71,10 @@ def test_getitem(self): def test_select_bad_cols(self): df = DataFrame([[1, 2]], columns=['A', 'B']) g = df.rolling(window=5) - self.assertRaises(KeyError, g.__getitem__, ['C']) # g[['C']] + pytest.raises(KeyError, g.__getitem__, ['C']) # g[['C']] - self.assertRaises(KeyError, g.__getitem__, ['A', 'C']) # g[['A', 'C']] - with tm.assertRaisesRegexp(KeyError, '^[^A]+$'): + pytest.raises(KeyError, g.__getitem__, ['A', 'C']) # g[['A', 'C']] + with tm.assert_raises_regex(KeyError, '^[^A]+$'): # A should not be referenced as a bad column... # will have to rethink regex if you change message! 
g[['A', 'C']] @@ -86,7 +84,7 @@ def test_attribute_access(self): df = DataFrame([[1, 2]], columns=['A', 'B']) r = df.rolling(window=5) tm.assert_series_equal(r.A.sum(), r['A'].sum()) - self.assertRaises(AttributeError, lambda: r.F) + pytest.raises(AttributeError, lambda: r.F) def tests_skip_nuisance(self): @@ -136,16 +134,18 @@ def test_agg(self): expected.columns = ['mean', 'sum'] tm.assert_frame_equal(result, expected) - result = r.aggregate({'A': {'mean': 'mean', 'sum': 'sum'}}) + with catch_warnings(record=True): + result = r.aggregate({'A': {'mean': 'mean', 'sum': 'sum'}}) expected = pd.concat([a_mean, a_sum], axis=1) expected.columns = pd.MultiIndex.from_tuples([('A', 'mean'), ('A', 'sum')]) tm.assert_frame_equal(result, expected, check_like=True) - result = r.aggregate({'A': {'mean': 'mean', - 'sum': 'sum'}, - 'B': {'mean2': 'mean', - 'sum2': 'sum'}}) + with catch_warnings(record=True): + result = r.aggregate({'A': {'mean': 'mean', + 'sum': 'sum'}, + 'B': {'mean2': 'mean', + 'sum2': 'sum'}}) expected = pd.concat([a_mean, a_sum, b_mean, b_sum], axis=1) exp_cols = [('A', 'mean'), ('A', 'sum'), ('B', 'mean2'), ('B', 'sum2')] expected.columns = pd.MultiIndex.from_tuples(exp_cols) @@ -174,7 +174,7 @@ def test_agg_consistency(self): tm.assert_index_equal(result, expected) result = r['A'].agg([np.sum, np.mean]).columns - expected = pd.Index(['sum', 'mean']) + expected = Index(['sum', 'mean']) tm.assert_index_equal(result, expected) result = r.agg({'A': [np.sum, np.mean]}).columns @@ -191,18 +191,20 @@ def f(): r.aggregate({'r1': {'A': ['mean', 'sum']}, 'r2': {'B': ['mean', 'sum']}}) - self.assertRaises(SpecificationError, f) + pytest.raises(SpecificationError, f) expected = pd.concat([r['A'].mean(), r['A'].std(), r['B'].mean(), r['B'].std()], axis=1) expected.columns = pd.MultiIndex.from_tuples([('ra', 'mean'), ( 'ra', 'std'), ('rb', 'mean'), ('rb', 'std')]) - result = r[['A', 'B']].agg({'A': {'ra': ['mean', 'std']}, - 'B': {'rb': ['mean', 'std']}}) + with catch_warnings(record=True): + result = r[['A', 'B']].agg({'A': {'ra': ['mean', 'std']}, + 'B': {'rb': ['mean', 'std']}}) tm.assert_frame_equal(result, expected, check_like=True) - result = r.agg({'A': {'ra': ['mean', 'std']}, - 'B': {'rb': ['mean', 'std']}}) + with catch_warnings(record=True): + result = r.agg({'A': {'ra': ['mean', 'std']}, + 'B': {'rb': ['mean', 'std']}}) expected.columns = pd.MultiIndex.from_tuples([('A', 'ra', 'mean'), ( 'A', 'ra', 'std'), ('B', 'rb', 'mean'), ('B', 'rb', 'std')]) tm.assert_frame_equal(result, expected, check_like=True) @@ -247,7 +249,7 @@ def test_count_nonnumeric_types(self): tm.assert_frame_equal(result, expected) result = df.rolling(1).count() - expected = df.notnull().astype(float) + expected = df.notna().astype(float) tm.assert_frame_equal(result, expected) def test_window_with_args(self): @@ -279,8 +281,8 @@ def test_preserve_metadata(self): s2 = s.rolling(30).sum() s3 = s.rolling(20).sum() - self.assertEqual(s2.name, 'foo') - self.assertEqual(s3.name, 'foo') + assert s2.name == 'foo' + assert s3.name == 'foo' def test_how_compat(self): # in prior versions, we would allow how to be used in the resample @@ -294,8 +296,7 @@ def test_how_compat(self): for op in ['mean', 'sum', 'std', 'var', 'kurt', 'skew']: for t in ['rolling', 'expanding']: - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): + with catch_warnings(record=True): dfunc = getattr(pd, "{0}_{1}".format(t, op)) if dfunc is None: @@ -314,7 +315,7 @@ def test_how_compat(self): class TestWindow(Base): - def 
setUp(self): + def setup_method(self, method): self._create_data() def test_constructor(self): @@ -335,13 +336,13 @@ def test_constructor(self): # not valid for w in [2., 'foo', np.array([2])]: - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(win_type='boxcar', window=2, min_periods=w) - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(win_type='boxcar', window=2, min_periods=1, center=w) for wt in ['foobar', 1]: - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(win_type=wt, window=2) def test_numpy_compat(self): @@ -351,15 +352,15 @@ def test_numpy_compat(self): msg = "numpy operations are not valid with window objects" for func in ('sum', 'mean'): - tm.assertRaisesRegexp(UnsupportedFunctionCall, msg, - getattr(w, func), 1, 2, 3) - tm.assertRaisesRegexp(UnsupportedFunctionCall, msg, - getattr(w, func), dtype=np.float64) + tm.assert_raises_regex(UnsupportedFunctionCall, msg, + getattr(w, func), 1, 2, 3) + tm.assert_raises_regex(UnsupportedFunctionCall, msg, + getattr(w, func), dtype=np.float64) class TestRolling(Base): - def setUp(self): + def setup_method(self, method): self._create_data() def test_doc_string(self): @@ -383,16 +384,16 @@ def test_constructor(self): # GH 13383 c(0) - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(-1) # not valid for w in [2., 'foo', np.array([2])]: - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(window=w) - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(window=2, min_periods=w) - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(window=2, min_periods=1, center=w) def test_constructor_with_win_type(self): @@ -401,9 +402,47 @@ def test_constructor_with_win_type(self): for o in [self.series, self.frame]: c = o.rolling c(0, win_type='boxcar') - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(-1, win_type='boxcar') + def test_constructor_with_timedelta_window(self): + # GH 15440 + n = 10 + df = pd.DataFrame({'value': np.arange(n)}, + index=pd.date_range('2015-12-24', + periods=n, + freq="D")) + expected_data = np.append([0., 1.], np.arange(3., 27., 3)) + for window in [timedelta(days=3), pd.Timedelta(days=3)]: + result = df.rolling(window=window).sum() + expected = pd.DataFrame({'value': expected_data}, + index=pd.date_range('2015-12-24', + periods=n, + freq="D")) + tm.assert_frame_equal(result, expected) + expected = df.rolling('3D').sum() + tm.assert_frame_equal(result, expected) + + @pytest.mark.parametrize( + 'window', [timedelta(days=3), pd.Timedelta(days=3), '3D']) + def test_constructor_with_timedelta_window_and_minperiods(self, window): + # GH 15305 + n = 10 + df = pd.DataFrame({'value': np.arange(n)}, + index=pd.date_range('2017-08-08', + periods=n, + freq="D")) + expected = pd.DataFrame({'value': np.append([np.NaN, 1.], + np.arange(3., 27., 3))}, + index=pd.date_range('2017-08-08', + periods=n, + freq="D")) + result_roll_sum = df.rolling(window=window, min_periods=2).sum() + result_roll_generic = df.rolling(window=window, + min_periods=2).apply(sum) + tm.assert_frame_equal(result_roll_sum, expected) + tm.assert_frame_equal(result_roll_generic, expected) + def test_numpy_compat(self): # see gh-12811 r = rwindow.Rolling(Series([2, 4, 6]), window=2) @@ -411,15 +450,46 @@ def test_numpy_compat(self): msg = "numpy operations are not valid with window objects" for func in ('std', 'mean', 'sum', 'max', 'min', 'var'): - tm.assertRaisesRegexp(UnsupportedFunctionCall, 
msg, - getattr(r, func), 1, 2, 3) - tm.assertRaisesRegexp(UnsupportedFunctionCall, msg, - getattr(r, func), dtype=np.float64) + tm.assert_raises_regex(UnsupportedFunctionCall, msg, + getattr(r, func), 1, 2, 3) + tm.assert_raises_regex(UnsupportedFunctionCall, msg, + getattr(r, func), dtype=np.float64) + + def test_closed(self): + df = DataFrame({'A': [0, 1, 2, 3, 4]}) + # closed only allowed for datetimelike + with pytest.raises(ValueError): + df.rolling(window=3, closed='neither') + + @pytest.mark.parametrize('roller', ['1s', 1]) + def tests_empty_df_rolling(self, roller): + # GH 15819 Verifies that datetime and integer rolling windows can be + # applied to empty DataFrames + expected = DataFrame() + result = DataFrame().rolling(roller).sum() + tm.assert_frame_equal(result, expected) + + # Verifies that datetime and integer rolling windows can be applied to + # empty DataFrames with datetime index + expected = DataFrame(index=pd.DatetimeIndex([])) + result = DataFrame(index=pd.DatetimeIndex([])).rolling(roller).sum() + tm.assert_frame_equal(result, expected) + + def test_multi_index_names(self): + + # GH 16789, 16825 + cols = pd.MultiIndex.from_product([['A', 'B'], ['C', 'D', 'E']], + names=['1', '2']) + df = pd.DataFrame(np.ones((10, 6)), columns=cols) + result = df.rolling(3).cov() + + tm.assert_index_equal(result.columns, df.columns) + assert result.index.names == [None, '1', '2'] class TestExpanding(Base): - def setUp(self): + def setup_method(self, method): self._create_data() def test_doc_string(self): @@ -441,9 +511,9 @@ def test_constructor(self): # not valid for w in [2., 'foo', np.array([2])]: - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(min_periods=w) - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(min_periods=1, center=w) def test_numpy_compat(self): @@ -453,15 +523,35 @@ def test_numpy_compat(self): msg = "numpy operations are not valid with window objects" for func in ('std', 'mean', 'sum', 'max', 'min', 'var'): - tm.assertRaisesRegexp(UnsupportedFunctionCall, msg, - getattr(e, func), 1, 2, 3) - tm.assertRaisesRegexp(UnsupportedFunctionCall, msg, - getattr(e, func), dtype=np.float64) + tm.assert_raises_regex(UnsupportedFunctionCall, msg, + getattr(e, func), 1, 2, 3) + tm.assert_raises_regex(UnsupportedFunctionCall, msg, + getattr(e, func), dtype=np.float64) + + @pytest.mark.parametrize( + 'expander', + [1, pytest.param('ls', marks=pytest.mark.xfail( + reason='GH 16425 expanding with ' + 'offset not supported'))]) + def test_empty_df_expanding(self, expander): + # GH 15819 Verifies that datetime and integer expanding windows can be + # applied to empty DataFrames + + expected = DataFrame() + result = DataFrame().expanding(expander).sum() + tm.assert_frame_equal(result, expected) + + # Verifies that datetime and integer expanding windows can be applied + # to empty DataFrames with datetime index + expected = DataFrame(index=pd.DatetimeIndex([])) + result = DataFrame( + index=pd.DatetimeIndex([])).expanding(expander).sum() + tm.assert_frame_equal(result, expected) class TestEWM(Base): - def setUp(self): + def setup_method(self, method): self._create_data() def test_doc_string(self): @@ -484,28 +574,28 @@ def test_constructor(self): c(halflife=0.75, alpha=None) # not valid: mutually exclusive - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(com=0.5, alpha=0.5) - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(span=1.5, halflife=0.75) - with self.assertRaises(ValueError): + 
with pytest.raises(ValueError): c(alpha=0.5, span=1.5) # not valid: com < 0 - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(com=-0.5) # not valid: span < 1 - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(span=0.5) # not valid: halflife <= 0 - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(halflife=0) # not valid: alpha <= 0 or alpha > 1 for alpha in (-0.5, 1.5): - with self.assertRaises(ValueError): + with pytest.raises(ValueError): c(alpha=alpha) def test_numpy_compat(self): @@ -515,30 +605,30 @@ def test_numpy_compat(self): msg = "numpy operations are not valid with window objects" for func in ('std', 'mean', 'var'): - tm.assertRaisesRegexp(UnsupportedFunctionCall, msg, - getattr(e, func), 1, 2, 3) - tm.assertRaisesRegexp(UnsupportedFunctionCall, msg, - getattr(e, func), dtype=np.float64) + tm.assert_raises_regex(UnsupportedFunctionCall, msg, + getattr(e, func), 1, 2, 3) + tm.assert_raises_regex(UnsupportedFunctionCall, msg, + getattr(e, func), dtype=np.float64) class TestDeprecations(Base): """ test that we are catching deprecation warnings """ - def setUp(self): + def setup_method(self, method): self._create_data() def test_deprecations(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): mom.rolling_mean(np.ones(10), 3, center=True, axis=0) mom.rolling_mean(Series(np.ones(10)), 3, center=True, axis=0) -# GH #12373 : rolling functions error on float32 data +# gh-12373 : rolling functions error on float32 data # make sure rolling functions works for different dtypes # -# NOTE that these are yielded tests and so _create_data is -# explicity called, nor do these inherit from unittest.TestCase +# NOTE that these are yielded tests and so _create_data +# is explicitly called. 
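For background on that NOTE: nose collected generator tests that yielded `(check_function, *args)` tuples, a protocol pytest dropped, so this patch either calls the check helper directly in a loop (as in `test_dtypes` below) or, where cases should be reported individually, rewrites them with `@pytest.mark.parametrize`. A sketch of the before/after shapes, using a made-up `check_square` helper:

```python
import pytest

def check_square(x, expected):
    assert x * x == expected

# nose style: the runner invoked each yielded (func, *args) tuple as a case;
# unsupported under modern pytest
def test_squares_yielded():
    for x, expected in [(2, 4), (3, 9)]:
        yield check_square, x, expected

# direct-call style used in this patch: one test, checks run in a loop
def test_squares_direct():
    for x, expected in [(2, 4), (3, 9)]:
        check_square(x, expected)

# parametrized style: each case is collected and reported separately
@pytest.mark.parametrize("x,expected", [(2, 4), (3, 9)])
def test_squares_param(x, expected):
    check_square(x, expected)
```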
# # further note that we are only checking rolling for fully dtype # compliance (though both expanding and ewm inherit) @@ -631,7 +721,7 @@ def test_dtypes(self): f = self.funcs[f_name] d = self.data[d_name] exp = self.expects[d_name][f_name] - yield self.check_dtypes, f, f_name, d, d_name, exp + self.check_dtypes(f, f_name, d, d_name, exp) def check_dtypes(self, f, f_name, d, d_name, exp): roll = d.rolling(window=self.window) @@ -728,7 +818,8 @@ def check_dtypes(self, f, f_name, d, d_name, exp): else: # other methods not Implemented ATM - assert_raises(NotImplementedError, f, roll) + with pytest.raises(NotImplementedError): + f(roll) class TestDtype_timedelta(DatetimeLike): @@ -743,13 +834,13 @@ class TestDtype_datetime64UTC(DatetimeLike): dtype = 'datetime64[ns, UTC]' def _create_data(self): - raise nose.SkipTest("direct creation of extension dtype " - "datetime64[ns, UTC] is not supported ATM") + pytest.skip("direct creation of extension dtype " + "datetime64[ns, UTC] is not supported ATM") class TestMoments(Base): - def setUp(self): + def setup_method(self, method): self._create_data() def test_centered_axis_validation(self): @@ -758,7 +849,7 @@ def test_centered_axis_validation(self): Series(np.ones(10)).rolling(window=3, center=True, axis=0).mean() # bad axis - with self.assertRaises(ValueError): + with pytest.raises(ValueError): Series(np.ones(10)).rolling(window=3, center=True, axis=1).mean() # ok ok @@ -768,7 +859,7 @@ def test_centered_axis_validation(self): axis=1).mean() # bad axis - with self.assertRaises(ValueError): + with pytest.raises(ValueError): (DataFrame(np.ones((10, 10))) .rolling(window=3, center=True, axis=2).mean()) @@ -793,7 +884,7 @@ def test_cmov_mean(self): xp = np.array([np.nan, np.nan, 9.962, 11.27, 11.564, 12.516, 12.818, 12.952, np.nan, np.nan]) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): rs = mom.rolling_mean(vals, 5, center=True) tm.assert_almost_equal(xp, rs) @@ -810,7 +901,7 @@ def test_cmov_window(self): xp = np.array([np.nan, np.nan, 9.962, 11.27, 11.564, 12.516, 12.818, 12.952, np.nan, np.nan]) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): rs = mom.rolling_window(vals, 5, 'boxcar', center=True) tm.assert_almost_equal(xp, rs) @@ -825,22 +916,22 @@ def test_cmov_window_corner(self): # all nan vals = np.empty(10, dtype=float) vals.fill(np.nan) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): rs = mom.rolling_window(vals, 5, 'boxcar', center=True) - self.assertTrue(np.isnan(rs).all()) + assert np.isnan(rs).all() # empty vals = np.array([]) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): rs = mom.rolling_window(vals, 5, 'boxcar', center=True) - self.assertEqual(len(rs), 0) + assert len(rs) == 0 # shorter than window vals = np.random.randn(5) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): rs = mom.rolling_window(vals, 10, 'boxcar') - self.assertTrue(np.isnan(rs).all()) - self.assertEqual(len(rs), 5) + assert np.isnan(rs).all() + assert len(rs) == 5 def test_cmov_window_frame(self): # Gh 8238 @@ -861,7 +952,7 @@ def test_cmov_window_frame(self): tm.assert_frame_equal(DataFrame(xp), rs) # invalid method - with self.assertRaises(AttributeError): + with pytest.raises(AttributeError): (DataFrame(vals).rolling(5, win_type='boxcar', center=True) .std()) 
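The hunks that follow repeatedly replace `tm.assert_produces_warning(FutureWarning, check_stacklevel=False)` with a bare `catch_warnings(record=True)`: the deprecated `pandas.stats.moments` helpers still need to execute, but their FutureWarnings are trapped and discarded rather than asserted on. The idiom in isolation, with a stand-in for the deprecated function (stdlib only):

```python
import warnings

def rolling_mean_legacy(values, window):
    # stand-in for a deprecated helper such as mom.rolling_mean
    warnings.warn("use .rolling(window).mean() instead", FutureWarning)
    return [sum(values[max(0, i - window + 1):i + 1]) /
            len(values[max(0, i - window + 1):i + 1])
            for i in range(len(values))]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is emitted
    result = rolling_mean_legacy([1.0, 2.0, 3.0, 4.0], window=2)

assert result[-1] == 3.5
# the warning was recorded rather than printed or turned into an error
assert any(issubclass(w.category, FutureWarning) for w in caught)
```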
@@ -1016,44 +1107,55 @@ def test_cmov_window_special_linear_range(self): tm.assert_series_equal(xp, rs) def test_rolling_median(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): self._check_moment_func(mom.rolling_median, np.median, name='median') def test_rolling_min(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): self._check_moment_func(mom.rolling_min, np.min, name='min') - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): a = np.array([1, 2, 3, 4, 5]) b = mom.rolling_min(a, window=100, min_periods=1) tm.assert_almost_equal(b, np.ones(len(a))) - self.assertRaises(ValueError, mom.rolling_min, np.array([1, 2, 3]), - window=3, min_periods=5) + pytest.raises(ValueError, mom.rolling_min, np.array([1, 2, 3]), + window=3, min_periods=5) def test_rolling_max(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): self._check_moment_func(mom.rolling_max, np.max, name='max') - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): a = np.array([1, 2, 3, 4, 5], dtype=np.float64) b = mom.rolling_max(a, window=100, min_periods=1) tm.assert_almost_equal(a, b) - self.assertRaises(ValueError, mom.rolling_max, np.array([1, 2, 3]), - window=3, min_periods=5) + pytest.raises(ValueError, mom.rolling_max, np.array([1, 2, 3]), + window=3, min_periods=5) def test_rolling_quantile(self): - qs = [.1, .5, .9] + qs = [0.0, .1, .5, .9, 1.0] def scoreatpercentile(a, per): values = np.sort(a, axis=0) - idx = per / 1. * (values.shape[0] - 1) - return values[int(idx)] + idx = int(per / 1. * (values.shape[0] - 1)) + + if idx == values.shape[0] - 1: + retval = values[-1] + + else: + qlow = float(idx) / float(values.shape[0] - 1) + qhig = float(idx + 1) / float(values.shape[0] - 1) + vlow = values[idx] + vhig = values[idx + 1] + retval = vlow + (vhig - vlow) * (per - qlow) / (qhig - qlow) + + return retval for q in qs: @@ -1068,6 +1170,42 @@ def alt(x): self._check_moment_func(f, alt, name='quantile', quantile=q) + def test_rolling_quantile_np_percentile(self): + # #9413: Tests that rolling window's quantile default behavior + # is analogous to Numpy's percentile + row = 10 + col = 5 + idx = pd.date_range(20100101, periods=row, freq='B') + df = pd.DataFrame(np.random.rand(row * col).reshape((row, -1)), + index=idx) + + df_quantile = df.quantile([0.25, 0.5, 0.75], axis=0) + np_percentile = np.percentile(df, [25, 50, 75], axis=0) + + tm.assert_almost_equal(df_quantile.values, np.array(np_percentile)) + + def test_rolling_quantile_series(self): + # #16211: Tests that rolling window's quantile default behavior + # is analogous to pd.Series' quantile + arr = np.arange(100) + s = pd.Series(arr) + q1 = s.quantile(0.1) + q2 = s.rolling(100).quantile(0.1).iloc[-1] + + tm.assert_almost_equal(q1, q2) + + def test_rolling_quantile_param(self): + ser = Series([0.0, .1, .5, .9, 1.0]) + + with pytest.raises(ValueError): + ser.rolling(3).quantile(-0.1) + + with pytest.raises(ValueError): + ser.rolling(3).quantile(10.0) + + with pytest.raises(TypeError): + ser.rolling(3).quantile('foo') + def test_rolling_apply(self): # suppress warnings about empty slices, as we are deliberately testing # with a 0-length Series @@ -1104,11 +1242,11 @@ def test_rolling_apply_out_of_bounds(self): arr = np.arange(4) # it works!
- with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): result = mom.rolling_apply(arr, 10, np.sum) - self.assertTrue(isnull(result).all()) + assert isna(result).all() - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): result = mom.rolling_apply(arr, 10, np.sum, min_periods=1) tm.assert_almost_equal(result, result) @@ -1119,22 +1257,22 @@ def test_rolling_std(self): name='std', ddof=0) def test_rolling_std_1obs(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): result = mom.rolling_std(np.array([1., 2., 3., 4., 5.]), 1, min_periods=1) expected = np.array([np.nan] * 5) tm.assert_almost_equal(result, expected) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): result = mom.rolling_std(np.array([1., 2., 3., 4., 5.]), 1, min_periods=1, ddof=0) expected = np.zeros(5) tm.assert_almost_equal(result, expected) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): result = mom.rolling_std(np.array([np.nan, np.nan, 3., 4., 5.]), 3, min_periods=2) - self.assertTrue(np.isnan(result[2])) + assert np.isnan(result[2]) def test_rolling_std_neg_sqrt(self): # unit test from Bottleneck @@ -1144,13 +1282,13 @@ def test_rolling_std_neg_sqrt(self): a = np.array([0.0011448196318903589, 0.00028718669878572767, 0.00028718669878572767, 0.00028718669878572767, 0.00028718669878572767]) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): b = mom.rolling_std(a, window=3) - self.assertTrue(np.isfinite(b[2:]).all()) + assert np.isfinite(b[2:]).all() - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): b = mom.ewmstd(a, span=3) - self.assertTrue(np.isfinite(b[2:]).all()) + assert np.isfinite(b[2:]).all() def test_rolling_var(self): self._check_moment_func(mom.rolling_var, lambda x: np.var(x, ddof=1), @@ -1162,7 +1300,7 @@ def test_rolling_skew(self): try: from scipy.stats import skew except ImportError: - raise nose.SkipTest('no scipy') + pytest.skip('no scipy') self._check_moment_func(mom.rolling_skew, lambda x: skew(x, bias=False), name='skew') @@ -1170,14 +1308,14 @@ def test_rolling_kurt(self): try: from scipy.stats import kurtosis except ImportError: - raise nose.SkipTest('no scipy') + pytest.skip('no scipy') self._check_moment_func(mom.rolling_kurt, lambda x: kurtosis(x, bias=False), name='kurt') def test_fperr_robustness(self): # TODO: remove this once python 2.5 out of picture if PY3: - raise nose.SkipTest("doesn't work on python 3") + pytest.skip("doesn't work on python 3") # #2114 data = 
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x1a@\xaa\xaa\xaa\xaa\xaa\xaa\x02@8\x8e\xe38\x8e\xe3\xe8?z\t\xed%\xb4\x97\xd0?\xa2\x0c<\xdd\x9a\x1f\xb6?\x82\xbb\xfa&y\x7f\x9d?\xac\'\xa7\xc4P\xaa\x83?\x90\xdf\xde\xb0k8j?`\xea\xe9u\xf2zQ?*\xe37\x9d\x98N7?\xe2.\xf5&v\x13\x1f?\xec\xc9\xf8\x19\xa4\xb7\x04?\x90b\xf6w\x85\x9f\xeb>\xb5A\xa4\xfaXj\xd2>F\x02\xdb\xf8\xcb\x8d\xb8>.\xac<\xfb\x87^\xa0>\xe8:\xa6\xf9_\xd3\x85>\xfb?\xe2cUU\xfd?\xfc\x7fA\xed8\x8e\xe3?\xa5\xaa\xac\x91\xf6\x12\xca?n\x1cs\xb6\xf9a\xb1?\xe8%D\xf3L-\x97?5\xddZD\x11\xe7~?#>\xe7\x82\x0b\x9ad?\xd9R4Y\x0fxK?;7x;\nP2?N\xf4JO\xb8j\x18?4\xf81\x8a%G\x00?\x9a\xf5\x97\r2\xb4\xe5>\xcd\x9c\xca\xbcB\xf0\xcc>3\x13\x87(\xd7J\xb3>\x99\x19\xb4\xe0\x1e\xb9\x99>ff\xcd\x95\x14&\x81>\x88\x88\xbc\xc7p\xddf>`\x0b\xa6_\x96|N>@\xb2n\xea\x0eS4>U\x98\x938i\x19\x1b>\x8eeb\xd0\xf0\x10\x02>\xbd\xdc-k\x96\x16\xe8=(\x93\x1e\xf2\x0e\x0f\xd0=\xe0n\xd3Bii\xb5=*\xe9\x19Y\x8c\x8c\x9c=\xc6\xf0\xbb\x90]\x08\x83=]\x96\xfa\xc0|`i=>d\xfc\xd5\xfd\xeaP=R0\xfb\xc7\xa7\x8e6=\xc2\x95\xf9_\x8a\x13\x1e=\xd6c\xa6\xea\x06\r\x04=r\xda\xdd8\t\xbc\xea<\xf6\xe6\x93\xd0\xb0\xd2\xd1<\x9d\xdeok\x96\xc3\xb7<&~\xea9s\xaf\x9f\xb8\x02@\xc6\xd2&\xfd\xa8\xf5\xe8?\xd9\xe1\x19\xfe\xc5\xa3\xd0?v\x82"\xa8\xb2/\xb6?\x9dX\x835\xee\x94\x9d?h\x90W\xce\x9e\xb8\x83?\x8a\xc0th~Kj?\\\x80\xf8\x9a\xa9\x87Q?%\xab\xa0\xce\x8c_7?1\xe4\x80\x13\x11*\x1f? \x98\x00\r\xb6\xc6\x04?\x80u\xabf\x9d\xb3\xeb>UNrD\xbew\xd2>\x1c\x13C[\xa8\x9f\xb8>\x12b\xd7m-\x1fQ@\xe3\x85>\xe6\x91)l\x00/m>Da\xc6\xf2\xaatS>\x05\xd7]\xee\xe3\xf09>' # noqa @@ -1186,27 +1324,27 @@ def test_fperr_robustness(self): if sys.byteorder != "little": arr = arr.byteswap().newbyteorder() - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): result = mom.rolling_sum(arr, 2) - self.assertTrue((result[1:] >= 0).all()) + assert (result[1:] >= 0).all() - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): result = mom.rolling_mean(arr, 2) - self.assertTrue((result[1:] >= 0).all()) + assert (result[1:] >= 0).all() - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): result = mom.rolling_var(arr, 2) - self.assertTrue((result[1:] >= 0).all()) + assert (result[1:] >= 0).all() # #2527, ugh arr = np.array([0.00012456, 0.0003, 0]) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): result = mom.rolling_mean(arr, 1) - self.assertTrue(result[-1] >= 0) + assert result[-1] >= 0 - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): result = mom.rolling_mean(-arr, 1) - self.assertTrue(result[-1] <= 0) + assert result[-1] <= 0 def _check_moment_func(self, f, static_comp, name=None, window=50, has_min_periods=True, has_center=True, @@ -1259,16 +1397,16 @@ def get_result(arr, window, min_periods=None, center=False): # min_periods is working correctly result = get_result(arr, 20, min_periods=15) - self.assertTrue(np.isnan(result[23])) - self.assertFalse(np.isnan(result[24])) + assert np.isnan(result[23]) + assert not np.isnan(result[24]) - self.assertFalse(np.isnan(result[-6])) - self.assertTrue(np.isnan(result[-5])) + assert not np.isnan(result[-6]) + assert np.isnan(result[-5]) arr2 = randn(20) result = get_result(arr2, 10, min_periods=5) - self.assertTrue(isnull(result[3])) - self.assertTrue(notnull(result[4])) + assert 
isna(result[3]) + assert notna(result[4]) # min_periods=0 result0 = get_result(arr, 20, min_periods=0) @@ -1290,7 +1428,7 @@ def get_result(arr, window, min_periods=None, center=False): expected = get_result( np.concatenate((arr, np.array([np.NaN] * 9))), 20)[9:] - self.assert_numpy_array_equal(result, expected) + tm.assert_numpy_array_equal(result, expected) if test_stable: result = get_result(self.arr + 1e9, window) @@ -1306,8 +1444,8 @@ def get_result(arr, window, min_periods=None, center=False): expected = get_result(self.arr, len(self.arr), min_periods=minp) nan_mask = np.isnan(result) - self.assertTrue(np.array_equal(nan_mask, np.isnan( - expected))) + tm.assert_numpy_array_equal(nan_mask, np.isnan(expected)) + nan_mask = ~nan_mask tm.assert_almost_equal(result[nan_mask], expected[nan_mask]) @@ -1315,7 +1453,8 @@ def get_result(arr, window, min_periods=None, center=False): result = get_result(self.arr, len(self.arr) + 1) expected = get_result(self.arr, len(self.arr)) nan_mask = np.isnan(result) - self.assertTrue(np.array_equal(nan_mask, np.isnan(expected))) + tm.assert_numpy_array_equal(nan_mask, np.isnan(expected)) + nan_mask = ~nan_mask tm.assert_almost_equal(result[nan_mask], expected[nan_mask]) @@ -1329,23 +1468,21 @@ def get_result(obj, window, min_periods=None, freq=None, center=False): # catch a freq deprecation warning if freq is provided and not # None - w = FutureWarning if freq is not None else None - with tm.assert_produces_warning(w, check_stacklevel=False): + with catch_warnings(record=True): r = obj.rolling(window=window, min_periods=min_periods, freq=freq, center=center) return getattr(r, name)(**kwargs) # check via the moments API - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): + with catch_warnings(record=True): return f(obj, window=window, min_periods=min_periods, freq=freq, center=center, **kwargs) series_result = get_result(self.series, window=50) frame_result = get_result(self.frame, window=50) - tm.assertIsInstance(series_result, Series) - self.assertEqual(type(frame_result), DataFrame) + assert isinstance(series_result, Series) + assert type(frame_result) == DataFrame # check time_rule works if has_time_rule: @@ -1369,7 +1506,7 @@ def get_result(obj, window, min_periods=None, freq=None, center=False): trunc_series = self.series[::2].truncate(prev_date, last_date) trunc_frame = self.frame[::2].truncate(prev_date, last_date) - self.assertAlmostEqual(series_result[-1], + tm.assert_almost_equal(series_result[-1], static_comp(trunc_series)) tm.assert_series_equal(frame_result.xs(last_date), @@ -1421,9 +1558,9 @@ def test_ewma(self): arr = np.zeros(1000) arr[5] = 1 - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): result = mom.ewma(arr, span=100, adjust=False).sum() - self.assertTrue(np.abs(result - 1) < 1e-2) + assert np.abs(result - 1) < 1e-2 s = Series([1.0, 2.0, 4.0, 8.0]) @@ -1508,31 +1645,31 @@ def test_ewmvol(self): self._check_ew(mom.ewmvol, name='vol') def test_ewma_span_com_args(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): A = mom.ewma(self.arr, com=9.5) B = mom.ewma(self.arr, span=20) tm.assert_almost_equal(A, B) - self.assertRaises(ValueError, mom.ewma, self.arr, com=9.5, span=20) - self.assertRaises(ValueError, mom.ewma, self.arr) + pytest.raises(ValueError, mom.ewma, self.arr, com=9.5, span=20) + pytest.raises(ValueError, mom.ewma, self.arr) def test_ewma_halflife_arg(self): - with 
tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): A = mom.ewma(self.arr, com=13.932726172912965) B = mom.ewma(self.arr, halflife=10.0) tm.assert_almost_equal(A, B) - self.assertRaises(ValueError, mom.ewma, self.arr, span=20, - halflife=50) - self.assertRaises(ValueError, mom.ewma, self.arr, com=9.5, - halflife=50) - self.assertRaises(ValueError, mom.ewma, self.arr, com=9.5, span=20, - halflife=50) - self.assertRaises(ValueError, mom.ewma, self.arr) + pytest.raises(ValueError, mom.ewma, self.arr, span=20, + halflife=50) + pytest.raises(ValueError, mom.ewma, self.arr, com=9.5, + halflife=50) + pytest.raises(ValueError, mom.ewma, self.arr, com=9.5, span=20, + halflife=50) + pytest.raises(ValueError, mom.ewma, self.arr) def test_ewma_alpha_old_api(self): # GH 10789 - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): a = mom.ewma(self.arr, alpha=0.61722699889169674) b = mom.ewma(self.arr, com=0.62014947789973052) c = mom.ewma(self.arr, span=2.240298955799461) @@ -1543,14 +1680,14 @@ def test_ewma_alpha_old_api(self): def test_ewma_alpha_arg_old_api(self): # GH 10789 - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assertRaises(ValueError, mom.ewma, self.arr) - self.assertRaises(ValueError, mom.ewma, self.arr, - com=10.0, alpha=0.5) - self.assertRaises(ValueError, mom.ewma, self.arr, - span=10.0, alpha=0.5) - self.assertRaises(ValueError, mom.ewma, self.arr, - halflife=10.0, alpha=0.5) + with catch_warnings(record=True): + pytest.raises(ValueError, mom.ewma, self.arr) + pytest.raises(ValueError, mom.ewma, self.arr, + com=10.0, alpha=0.5) + pytest.raises(ValueError, mom.ewma, self.arr, + span=10.0, alpha=0.5) + pytest.raises(ValueError, mom.ewma, self.arr, + halflife=10.0, alpha=0.5) def test_ewm_alpha(self): # GH 10789 @@ -1566,47 +1703,46 @@ def test_ewm_alpha(self): def test_ewm_alpha_arg(self): # GH 10789 s = Series(self.arr) - self.assertRaises(ValueError, s.ewm) - self.assertRaises(ValueError, s.ewm, com=10.0, alpha=0.5) - self.assertRaises(ValueError, s.ewm, span=10.0, alpha=0.5) - self.assertRaises(ValueError, s.ewm, halflife=10.0, alpha=0.5) + pytest.raises(ValueError, s.ewm) + pytest.raises(ValueError, s.ewm, com=10.0, alpha=0.5) + pytest.raises(ValueError, s.ewm, span=10.0, alpha=0.5) + pytest.raises(ValueError, s.ewm, halflife=10.0, alpha=0.5) def test_ewm_domain_checks(self): # GH 12492 s = Series(self.arr) # com must satisfy: com >= 0 - self.assertRaises(ValueError, s.ewm, com=-0.1) + pytest.raises(ValueError, s.ewm, com=-0.1) s.ewm(com=0.0) s.ewm(com=0.1) # span must satisfy: span >= 1 - self.assertRaises(ValueError, s.ewm, span=-0.1) - self.assertRaises(ValueError, s.ewm, span=0.0) - self.assertRaises(ValueError, s.ewm, span=0.9) + pytest.raises(ValueError, s.ewm, span=-0.1) + pytest.raises(ValueError, s.ewm, span=0.0) + pytest.raises(ValueError, s.ewm, span=0.9) s.ewm(span=1.0) s.ewm(span=1.1) # halflife must satisfy: halflife > 0 - self.assertRaises(ValueError, s.ewm, halflife=-0.1) - self.assertRaises(ValueError, s.ewm, halflife=0.0) + pytest.raises(ValueError, s.ewm, halflife=-0.1) + pytest.raises(ValueError, s.ewm, halflife=0.0) s.ewm(halflife=0.1) # alpha must satisfy: 0 < alpha <= 1 - self.assertRaises(ValueError, s.ewm, alpha=-0.1) - self.assertRaises(ValueError, s.ewm, alpha=0.0) + pytest.raises(ValueError, s.ewm, alpha=-0.1) + pytest.raises(ValueError, s.ewm, alpha=0.0) s.ewm(alpha=0.1) s.ewm(alpha=1.0) - 
self.assertRaises(ValueError, s.ewm, alpha=1.1) + pytest.raises(ValueError, s.ewm, alpha=1.1) def test_ew_empty_arrays(self): arr = np.array([], dtype=np.float64) funcs = [mom.ewma, mom.ewmvol, mom.ewmvar] for f in funcs: - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): + with catch_warnings(record=True): result = f(arr, 3) tm.assert_almost_equal(result, arr) def _check_ew(self, func, name=None): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): self._check_ew_ndarray(func, name=name) self._check_ew_structures(func, name=name) @@ -1624,19 +1760,19 @@ def _check_ew_ndarray(self, func, preserve_nan=False, name=None): # check min_periods # GH 7898 result = func(s, 50, min_periods=2) - self.assertTrue(np.isnan(result.values[:11]).all()) - self.assertFalse(np.isnan(result.values[11:]).any()) + assert np.isnan(result.values[:11]).all() + assert not np.isnan(result.values[11:]).any() for min_periods in (0, 1): result = func(s, 50, min_periods=min_periods) if func == mom.ewma: - self.assertTrue(np.isnan(result.values[:10]).all()) - self.assertFalse(np.isnan(result.values[10:]).any()) + assert np.isnan(result.values[:10]).all() + assert not np.isnan(result.values[10:]).any() else: # ewmstd, ewmvol, ewmvar (with bias=False) require at least two # values - self.assertTrue(np.isnan(result.values[:11]).all()) - self.assertFalse(np.isnan(result.values[11:]).any()) + assert np.isnan(result.values[:11]).all() + assert not np.isnan(result.values[11:]).any() # check series of length 0 result = func(Series([]), 50, min_periods=min_periods) @@ -1653,14 +1789,168 @@ def _check_ew_ndarray(self, func, preserve_nan=False, name=None): # pass in ints result2 = func(np.arange(50), span=10) - self.assertEqual(result2.dtype, np.float_) + assert result2.dtype == np.float_ def _check_ew_structures(self, func, name): series_result = getattr(self.series.ewm(com=10), name)() - tm.assertIsInstance(series_result, Series) + assert isinstance(series_result, Series) frame_result = getattr(self.frame.ewm(com=10), name)() - self.assertEqual(type(frame_result), DataFrame) + assert type(frame_result) == DataFrame + + +class TestPairwise(object): + + # GH 7738 + df1s = [DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=[0, 1]), + DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=[1, 0]), + DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=[1, 1]), + DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], + columns=['C', 'C']), + DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=[1., 0]), + DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=[0., 1]), + DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=['C', 1]), + DataFrame([[2., 4.], [1., 2.], [5., 2.], [8., 1.]], + columns=[1, 0.]), + DataFrame([[2, 4.], [1, 2.], [5, 2.], [8, 1.]], + columns=[0, 1.]), + DataFrame([[2, 4], [1, 2], [5, 2], [8, 1.]], + columns=[1., 'X']), ] + df2 = DataFrame([[None, 1, 1], [None, 1, 2], + [None, 3, 2], [None, 8, 1]], columns=['Y', 'Z', 'X']) + s = Series([1, 1, 3, 8]) + + def compare(self, result, expected): + + # since we have sorted the results + # we can only compare non-nans + result = result.dropna().values + expected = expected.dropna().values + + tm.assert_numpy_array_equal(result, expected) + + @pytest.mark.parametrize('f', [lambda x: x.cov(), lambda x: x.corr()]) + def test_no_flex(self, f): + + # DataFrame methods (which do not call _flex_binary_moment()) + + results = [f(df) for df in self.df1s] + for (df, result) in zip(self.df1s, results): + 
tm.assert_index_equal(result.index, df.columns) + tm.assert_index_equal(result.columns, df.columns) + for i, result in enumerate(results): + if i > 0: + self.compare(result, results[0]) + + @pytest.mark.parametrize( + 'f', [lambda x: x.expanding().cov(pairwise=True), + lambda x: x.expanding().corr(pairwise=True), + lambda x: x.rolling(window=3).cov(pairwise=True), + lambda x: x.rolling(window=3).corr(pairwise=True), + lambda x: x.ewm(com=3).cov(pairwise=True), + lambda x: x.ewm(com=3).corr(pairwise=True)]) + def test_pairwise_with_self(self, f): + + # DataFrame with itself, pairwise=True + results = [f(df) for df in self.df1s] + for (df, result) in zip(self.df1s, results): + tm.assert_index_equal(result.index.levels[0], + df.index, + check_names=False) + tm.assert_index_equal(result.index.levels[1], + df.columns, + check_names=False) + tm.assert_index_equal(result.columns, df.columns) + for i, result in enumerate(results): + if i > 0: + self.compare(result, results[0]) + + @pytest.mark.parametrize( + 'f', [lambda x: x.expanding().cov(pairwise=False), + lambda x: x.expanding().corr(pairwise=False), + lambda x: x.rolling(window=3).cov(pairwise=False), + lambda x: x.rolling(window=3).corr(pairwise=False), + lambda x: x.ewm(com=3).cov(pairwise=False), + lambda x: x.ewm(com=3).corr(pairwise=False), ]) + def test_no_pairwise_with_self(self, f): + + # DataFrame with itself, pairwise=False + results = [f(df) for df in self.df1s] + for (df, result) in zip(self.df1s, results): + tm.assert_index_equal(result.index, df.index) + tm.assert_index_equal(result.columns, df.columns) + for i, result in enumerate(results): + if i > 0: + self.compare(result, results[0]) + + @pytest.mark.parametrize( + 'f', [lambda x, y: x.expanding().cov(y, pairwise=True), + lambda x, y: x.expanding().corr(y, pairwise=True), + lambda x, y: x.rolling(window=3).cov(y, pairwise=True), + lambda x, y: x.rolling(window=3).corr(y, pairwise=True), + lambda x, y: x.ewm(com=3).cov(y, pairwise=True), + lambda x, y: x.ewm(com=3).corr(y, pairwise=True), ]) + def test_pairwise_with_other(self, f): + + # DataFrame with another DataFrame, pairwise=True + results = [f(df, self.df2) for df in self.df1s] + for (df, result) in zip(self.df1s, results): + tm.assert_index_equal(result.index.levels[0], + df.index, + check_names=False) + tm.assert_index_equal(result.index.levels[1], + self.df2.columns, + check_names=False) + for i, result in enumerate(results): + if i > 0: + self.compare(result, results[0]) + + @pytest.mark.parametrize( + 'f', [lambda x, y: x.expanding().cov(y, pairwise=False), + lambda x, y: x.expanding().corr(y, pairwise=False), + lambda x, y: x.rolling(window=3).cov(y, pairwise=False), + lambda x, y: x.rolling(window=3).corr(y, pairwise=False), + lambda x, y: x.ewm(com=3).cov(y, pairwise=False), + lambda x, y: x.ewm(com=3).corr(y, pairwise=False), ]) + def test_no_pairwise_with_other(self, f): + + # DataFrame with another DataFrame, pairwise=False + results = [f(df, self.df2) if df.columns.is_unique else None + for df in self.df1s] + for (df, result) in zip(self.df1s, results): + if result is not None: + with catch_warnings(record=True): + # we can have int and str columns + expected_index = df.index.union(self.df2.index) + expected_columns = df.columns.union(self.df2.columns) + tm.assert_index_equal(result.index, expected_index) + tm.assert_index_equal(result.columns, expected_columns) + else: + tm.assert_raises_regex( + ValueError, "'arg1' columns are not unique", f, df, + self.df2) + tm.assert_raises_regex( + ValueError, "'arg2' 
columns are not unique", f, + self.df2, df) + + @pytest.mark.parametrize( + 'f', [lambda x, y: x.expanding().cov(y), + lambda x, y: x.expanding().corr(y), + lambda x, y: x.rolling(window=3).cov(y), + lambda x, y: x.rolling(window=3).corr(y), + lambda x, y: x.ewm(com=3).cov(y), + lambda x, y: x.ewm(com=3).corr(y), ]) + def test_pairwise_with_series(self, f): + + # DataFrame with a Series + results = ([f(df, self.s) for df in self.df1s] + + [f(self.s, df) for df in self.df1s]) + for (df, result) in zip(self.df1s, results): + tm.assert_index_equal(result.index, df.index) + tm.assert_index_equal(result.columns, df.columns) + for i, result in enumerate(results): + if i > 0: + self.compare(result, results[0]) # create the data only once as we are not setting it @@ -1705,10 +1995,10 @@ def create_dataframes(): def is_constant(x): values = x.values.ravel() - return len(set(values[notnull(values)])) == 1 + return len(set(values[notna(values)])) == 1 def no_nans(x): - return x.notnull().all().all() + return x.notna().all().all() # data is a tuple(object, is_contant, no_nans) data = create_series() + create_dataframes() @@ -1719,6 +2009,15 @@ def no_nans(x): _consistency_data = _create_consistency_data() +def _rolling_consistency_cases(): + for window in [1, 2, 3, 10, 20]: + for min_periods in set([0, 1, 2, 3, 4, window]): + if min_periods and (min_periods > window): + continue + for center in [False, True]: + yield window, min_periods, center + + class TestMomentsConsistency(Base): base_functions = [ (lambda v: Series(v).count(), None, 'count'), @@ -1768,7 +2067,7 @@ def _create_data(self): super(TestMomentsConsistency, self)._create_data() self.data = _consistency_data - def setUp(self): + def setup_method(self, method): self._create_data() def _test_moments_consistency(self, min_periods, count, mean, mock_mean, @@ -1778,7 +2077,7 @@ def _test_moments_consistency(self, min_periods, count, mean, mock_mean, var_debiasing_factors=None): def _non_null_values(x): values = x.values.ravel() - return set(values[notnull(values)].tolist()) + return set(values[notna(values)].tolist()) for (x, is_constant, no_nans) in self.data: count_x = count(x) @@ -1791,7 +2090,8 @@ def _non_null_values(x): # check that correlation of a series with itself is either 1 or NaN corr_x_x = corr(x, x) - # self.assertTrue(_non_null_values(corr_x_x).issubset(set([1.]))) # + + # assert _non_null_values(corr_x_x).issubset(set([1.])) # restore once rolling_cov(x, x) is identically equal to var(x) if is_constant: @@ -1821,11 +2121,11 @@ def _non_null_values(x): # check that var(x), std(x), and cov(x) are all >= 0 var_x = var(x) std_x = std(x) - self.assertFalse((var_x < 0).any().any()) - self.assertFalse((std_x < 0).any().any()) + assert not (var_x < 0).any().any() + assert not (std_x < 0).any().any() if cov: cov_x_x = cov(x, x) - self.assertFalse((cov_x_x < 0).any().any()) + assert not (cov_x_x < 0).any().any() # check that var(x) == cov(x, x) assert_equal(var_x, cov_x_x) @@ -1840,7 +2140,7 @@ def _non_null_values(x): if is_constant: # check that variance of constant series is identically 0 - self.assertFalse((var_x > 0).any().any()) + assert not (var_x > 0).any().any() expected = x * np.nan expected[count_x >= max(min_periods, 1)] = 0. 
if var is var_unbiased: @@ -1849,7 +2149,7 @@ def _non_null_values(x): if isinstance(x, Series): for (y, is_constant, no_nans) in self.data: - if not x.isnull().equals(y.isnull()): + if not x.isna().equals(y.isna()): # can only easily test two Series with similar # structure continue @@ -1885,8 +2185,12 @@ def _non_null_values(x): assert_equal(cov_x_y, mean_x_times_y - (mean_x * mean_y)) - @tm.slow - def test_ewm_consistency(self): + @pytest.mark.slow + @pytest.mark.parametrize( + 'min_periods, adjust, ignore_na', product([0, 1, 2, 3, 4], + [True, False], + [False, True])) + def test_ewm_consistency(self, min_periods, adjust, ignore_na): def _weights(s, com, adjust, ignore_na): if isinstance(s, DataFrame): if not len(s.columns): @@ -1902,8 +2206,8 @@ def _weights(s, com, adjust, ignore_na): w = Series(np.nan, index=s.index) alpha = 1. / (1. + com) if ignore_na: - w[s.notnull()] = _weights(s[s.notnull()], com=com, - adjust=adjust, ignore_na=False) + w[s.notna()] = _weights(s[s.notna()], com=com, + adjust=adjust, ignore_na=False) elif adjust: for i in range(len(s)): if s.iat[i] == s.iat[i]: @@ -1940,52 +2244,51 @@ def _ewma(s, com, min_periods, adjust, ignore_na): return result com = 3. - for min_periods, adjust, ignore_na in product([0, 1, 2, 3, 4], - [True, False], - [False, True]): - # test consistency between different ewm* moments - self._test_moments_consistency( - min_periods=min_periods, - count=lambda x: x.expanding().count(), - mean=lambda x: x.ewm(com=com, min_periods=min_periods, - adjust=adjust, - ignore_na=ignore_na).mean(), - mock_mean=lambda x: _ewma(x, com=com, - min_periods=min_periods, - adjust=adjust, - ignore_na=ignore_na), - corr=lambda x, y: x.ewm(com=com, min_periods=min_periods, - adjust=adjust, - ignore_na=ignore_na).corr(y), - var_unbiased=lambda x: ( - x.ewm(com=com, min_periods=min_periods, - adjust=adjust, - ignore_na=ignore_na).var(bias=False)), - std_unbiased=lambda x: ( - x.ewm(com=com, min_periods=min_periods, - adjust=adjust, ignore_na=ignore_na) - .std(bias=False)), - cov_unbiased=lambda x, y: ( - x.ewm(com=com, min_periods=min_periods, - adjust=adjust, ignore_na=ignore_na) - .cov(y, bias=False)), - var_biased=lambda x: ( - x.ewm(com=com, min_periods=min_periods, - adjust=adjust, ignore_na=ignore_na) - .var(bias=True)), - std_biased=lambda x: x.ewm(com=com, min_periods=min_periods, - adjust=adjust, - ignore_na=ignore_na).std(bias=True), - cov_biased=lambda x, y: ( - x.ewm(com=com, min_periods=min_periods, - adjust=adjust, ignore_na=ignore_na) - .cov(y, bias=True)), - var_debiasing_factors=lambda x: ( - _variance_debiasing_factors(x, com=com, adjust=adjust, - ignore_na=ignore_na))) - - @tm.slow - def test_expanding_consistency(self): + # test consistency between different ewm* moments + self._test_moments_consistency( + min_periods=min_periods, + count=lambda x: x.expanding().count(), + mean=lambda x: x.ewm(com=com, min_periods=min_periods, + adjust=adjust, + ignore_na=ignore_na).mean(), + mock_mean=lambda x: _ewma(x, com=com, + min_periods=min_periods, + adjust=adjust, + ignore_na=ignore_na), + corr=lambda x, y: x.ewm(com=com, min_periods=min_periods, + adjust=adjust, + ignore_na=ignore_na).corr(y), + var_unbiased=lambda x: ( + x.ewm(com=com, min_periods=min_periods, + adjust=adjust, + ignore_na=ignore_na).var(bias=False)), + std_unbiased=lambda x: ( + x.ewm(com=com, min_periods=min_periods, + adjust=adjust, ignore_na=ignore_na) + .std(bias=False)), + cov_unbiased=lambda x, y: ( + x.ewm(com=com, min_periods=min_periods, + adjust=adjust, 
ignore_na=ignore_na) + .cov(y, bias=False)), + var_biased=lambda x: ( + x.ewm(com=com, min_periods=min_periods, + adjust=adjust, ignore_na=ignore_na) + .var(bias=True)), + std_biased=lambda x: x.ewm(com=com, min_periods=min_periods, + adjust=adjust, + ignore_na=ignore_na).std(bias=True), + cov_biased=lambda x, y: ( + x.ewm(com=com, min_periods=min_periods, + adjust=adjust, ignore_na=ignore_na) + .cov(y, bias=True)), + var_debiasing_factors=lambda x: ( + _variance_debiasing_factors(x, com=com, adjust=adjust, + ignore_na=ignore_na))) + + @pytest.mark.slow + @pytest.mark.parametrize( + 'min_periods', [0, 1, 2, 3, 4]) + def test_expanding_consistency(self, min_periods): # suppress warnings about empty slices, as we are deliberately testing # with empty/0-length Series/DataFrames @@ -1994,87 +2297,72 @@ def test_expanding_consistency(self): message=".*(empty slice|0 for slice).*", category=RuntimeWarning) - for min_periods in [0, 1, 2, 3, 4]: - - # test consistency between different expanding_* moments - self._test_moments_consistency( - min_periods=min_periods, - count=lambda x: x.expanding().count(), - mean=lambda x: x.expanding( - min_periods=min_periods).mean(), - mock_mean=lambda x: x.expanding( - min_periods=min_periods).sum() / x.expanding().count(), - corr=lambda x, y: x.expanding( - min_periods=min_periods).corr(y), - var_unbiased=lambda x: x.expanding( - min_periods=min_periods).var(), - std_unbiased=lambda x: x.expanding( - min_periods=min_periods).std(), - cov_unbiased=lambda x, y: x.expanding( - min_periods=min_periods).cov(y), - var_biased=lambda x: x.expanding( - min_periods=min_periods).var(ddof=0), - std_biased=lambda x: x.expanding( - min_periods=min_periods).std(ddof=0), - cov_biased=lambda x, y: x.expanding( - min_periods=min_periods).cov(y, ddof=0), - var_debiasing_factors=lambda x: ( - x.expanding().count() / - (x.expanding().count() - 1.) - .replace(0., np.nan))) - - # test consistency between expanding_xyz() and either (a) - # expanding_apply of Series.xyz(), or (b) expanding_apply of - # np.nanxyz() - for (x, is_constant, no_nans) in self.data: - functions = self.base_functions - - # GH 8269 - if no_nans: - functions = self.base_functions + self.no_nan_functions - for (f, require_min_periods, name) in functions: - expanding_f = getattr( - x.expanding(min_periods=min_periods), name) - - if (require_min_periods and - (min_periods is not None) and - (min_periods < require_min_periods)): - continue - - if name == 'count': - expanding_f_result = expanding_f() - expanding_apply_f_result = x.expanding( - min_periods=0).apply(func=f) + # test consistency between different expanding_* moments + self._test_moments_consistency( + min_periods=min_periods, + count=lambda x: x.expanding().count(), + mean=lambda x: x.expanding( + min_periods=min_periods).mean(), + mock_mean=lambda x: x.expanding( + min_periods=min_periods).sum() / x.expanding().count(), + corr=lambda x, y: x.expanding( + min_periods=min_periods).corr(y), + var_unbiased=lambda x: x.expanding( + min_periods=min_periods).var(), + std_unbiased=lambda x: x.expanding( + min_periods=min_periods).std(), + cov_unbiased=lambda x, y: x.expanding( + min_periods=min_periods).cov(y), + var_biased=lambda x: x.expanding( + min_periods=min_periods).var(ddof=0), + std_biased=lambda x: x.expanding( + min_periods=min_periods).std(ddof=0), + cov_biased=lambda x, y: x.expanding( + min_periods=min_periods).cov(y, ddof=0), + var_debiasing_factors=lambda x: ( + x.expanding().count() / + (x.expanding().count() - 1.) 
+ .replace(0., np.nan))) + + # test consistency between expanding_xyz() and either (a) + # expanding_apply of Series.xyz(), or (b) expanding_apply of + # np.nanxyz() + for (x, is_constant, no_nans) in self.data: + functions = self.base_functions + + # GH 8269 + if no_nans: + functions = self.base_functions + self.no_nan_functions + for (f, require_min_periods, name) in functions: + expanding_f = getattr( + x.expanding(min_periods=min_periods), name) + + if (require_min_periods and + (min_periods is not None) and + (min_periods < require_min_periods)): + continue + + if name == 'count': + expanding_f_result = expanding_f() + expanding_apply_f_result = x.expanding( + min_periods=0).apply(func=f) + else: + if name in ['cov', 'corr']: + expanding_f_result = expanding_f( + pairwise=False) else: - if name in ['cov', 'corr']: - expanding_f_result = expanding_f( - pairwise=False) - else: - expanding_f_result = expanding_f() - expanding_apply_f_result = x.expanding( - min_periods=min_periods).apply(func=f) - - if not tm._incompat_bottleneck_version(name): - assert_equal(expanding_f_result, - expanding_apply_f_result) - - if (name in ['cov', 'corr']) and isinstance(x, - DataFrame): - # test pairwise=True - expanding_f_result = expanding_f(x, pairwise=True) - expected = Panel(items=x.index, - major_axis=x.columns, - minor_axis=x.columns) - for i, _ in enumerate(x.columns): - for j, _ in enumerate(x.columns): - expected.iloc[:, i, j] = getattr( - x.iloc[:, i].expanding( - min_periods=min_periods), - name)(x.iloc[:, j]) - tm.assert_panel_equal(expanding_f_result, expected) - - @tm.slow - def test_rolling_consistency(self): + expanding_f_result = expanding_f() + expanding_apply_f_result = x.expanding( + min_periods=min_periods).apply(func=f) + + if not tm._incompat_bottleneck_version(name): + assert_equal(expanding_f_result, + expanding_apply_f_result) + + @pytest.mark.slow + @pytest.mark.parametrize( + 'window,min_periods,center', list(_rolling_consistency_cases())) + def test_rolling_consistency(self, window, min_periods, center): # suppress warnings about empty slices, as we are deliberately testing # with empty/0-length Series/DataFrames @@ -2083,119 +2371,91 @@ def test_rolling_consistency(self): message=".*(empty slice|0 for slice).*", category=RuntimeWarning) - def cases(): - for window in [1, 2, 3, 10, 20]: - for min_periods in set([0, 1, 2, 3, 4, window]): - if min_periods and (min_periods > window): - continue - for center in [False, True]: - yield window, min_periods, center - - for window, min_periods, center in cases(): - # test consistency between different rolling_* moments - self._test_moments_consistency( - min_periods=min_periods, - count=lambda x: ( - x.rolling(window=window, center=center) - .count()), - mean=lambda x: ( - x.rolling(window=window, min_periods=min_periods, - center=center).mean()), - mock_mean=lambda x: ( - x.rolling(window=window, - min_periods=min_periods, - center=center).sum() - .divide(x.rolling(window=window, - min_periods=min_periods, - center=center).count())), - corr=lambda x, y: ( - x.rolling(window=window, min_periods=min_periods, - center=center).corr(y)), - - var_unbiased=lambda x: ( - x.rolling(window=window, min_periods=min_periods, - center=center).var()), - - std_unbiased=lambda x: ( - x.rolling(window=window, min_periods=min_periods, - center=center).std()), - - cov_unbiased=lambda x, y: ( - x.rolling(window=window, min_periods=min_periods, - center=center).cov(y)), - - var_biased=lambda x: ( - x.rolling(window=window, min_periods=min_periods, - 
center=center).var(ddof=0)), - - std_biased=lambda x: ( - x.rolling(window=window, min_periods=min_periods, - center=center).std(ddof=0)), - - cov_biased=lambda x, y: ( - x.rolling(window=window, min_periods=min_periods, - center=center).cov(y, ddof=0)), - var_debiasing_factors=lambda x: ( - x.rolling(window=window, center=center).count() - .divide((x.rolling(window=window, center=center) - .count() - 1.) - .replace(0., np.nan)))) - - # test consistency between rolling_xyz() and either (a) - # rolling_apply of Series.xyz(), or (b) rolling_apply of - # np.nanxyz() - for (x, is_constant, no_nans) in self.data: - functions = self.base_functions - - # GH 8269 - if no_nans: - functions = self.base_functions + self.no_nan_functions - for (f, require_min_periods, name) in functions: - rolling_f = getattr( - x.rolling(window=window, center=center, - min_periods=min_periods), name) - - if require_min_periods and ( - min_periods is not None) and ( - min_periods < require_min_periods): - continue + # test consistency between different rolling_* moments + self._test_moments_consistency( + min_periods=min_periods, + count=lambda x: ( + x.rolling(window=window, center=center) + .count()), + mean=lambda x: ( + x.rolling(window=window, min_periods=min_periods, + center=center).mean()), + mock_mean=lambda x: ( + x.rolling(window=window, + min_periods=min_periods, + center=center).sum() + .divide(x.rolling(window=window, + min_periods=min_periods, + center=center).count())), + corr=lambda x, y: ( + x.rolling(window=window, min_periods=min_periods, + center=center).corr(y)), - if name == 'count': - rolling_f_result = rolling_f() - rolling_apply_f_result = x.rolling( - window=window, min_periods=0, - center=center).apply(func=f) + var_unbiased=lambda x: ( + x.rolling(window=window, min_periods=min_periods, + center=center).var()), + + std_unbiased=lambda x: ( + x.rolling(window=window, min_periods=min_periods, + center=center).std()), + + cov_unbiased=lambda x, y: ( + x.rolling(window=window, min_periods=min_periods, + center=center).cov(y)), + + var_biased=lambda x: ( + x.rolling(window=window, min_periods=min_periods, + center=center).var(ddof=0)), + + std_biased=lambda x: ( + x.rolling(window=window, min_periods=min_periods, + center=center).std(ddof=0)), + + cov_biased=lambda x, y: ( + x.rolling(window=window, min_periods=min_periods, + center=center).cov(y, ddof=0)), + var_debiasing_factors=lambda x: ( + x.rolling(window=window, center=center).count() + .divide((x.rolling(window=window, center=center) + .count() - 1.) 
+ .replace(0., np.nan)))) + + # test consistency between rolling_xyz() and either (a) + # rolling_apply of Series.xyz(), or (b) rolling_apply of + # np.nanxyz() + for (x, is_constant, no_nans) in self.data: + functions = self.base_functions + + # GH 8269 + if no_nans: + functions = self.base_functions + self.no_nan_functions + for (f, require_min_periods, name) in functions: + rolling_f = getattr( + x.rolling(window=window, center=center, + min_periods=min_periods), name) + + if require_min_periods and ( + min_periods is not None) and ( + min_periods < require_min_periods): + continue + + if name == 'count': + rolling_f_result = rolling_f() + rolling_apply_f_result = x.rolling( + window=window, min_periods=0, + center=center).apply(func=f) + else: + if name in ['cov', 'corr']: + rolling_f_result = rolling_f( + pairwise=False) else: - if name in ['cov', 'corr']: - rolling_f_result = rolling_f( - pairwise=False) - else: - rolling_f_result = rolling_f() - rolling_apply_f_result = x.rolling( - window=window, min_periods=min_periods, - center=center).apply(func=f) - if not tm._incompat_bottleneck_version(name): - assert_equal(rolling_f_result, - rolling_apply_f_result) - - if (name in ['cov', 'corr']) and isinstance( - x, DataFrame): - # test pairwise=True - rolling_f_result = rolling_f(x, - pairwise=True) - expected = Panel(items=x.index, - major_axis=x.columns, - minor_axis=x.columns) - for i, _ in enumerate(x.columns): - for j, _ in enumerate(x.columns): - expected.iloc[:, i, j] = ( - getattr( - x.iloc[:, i] - .rolling(window=window, - min_periods=min_periods, - center=center), - name)(x.iloc[:, j])) - tm.assert_panel_equal(rolling_f_result, expected) + rolling_f_result = rolling_f() + rolling_apply_f_result = x.rolling( + window=window, min_periods=min_periods, + center=center).apply(func=f) + if not tm._incompat_bottleneck_version(name): + assert_equal(rolling_f_result, + rolling_apply_f_result) # binary moments def test_rolling_cov(self): @@ -2232,16 +2492,16 @@ def _check_pairwise_moment(self, dispatch, name, **kwargs): def get_result(obj, obj2=None): return getattr(getattr(obj, dispatch)(**kwargs), name)(obj2) - panel = get_result(self.frame) - actual = panel.loc[:, 1, 5] + result = get_result(self.frame) + result = result.loc[(slice(None), 1), 5] + result.index = result.index.droplevel(1) expected = get_result(self.frame[1], self.frame[5]) - tm.assert_series_equal(actual, expected, check_names=False) - self.assertEqual(actual.name, 5) + tm.assert_series_equal(result, expected, check_names=False) def test_flex_binary_moment(self): # GH3155 # don't blow the stack - self.assertRaises(TypeError, rwindow._flex_binary_moment, 5, 6, None) + pytest.raises(TypeError, rwindow._flex_binary_moment, 5, 6, None) def test_corr_sanity(self): # GH 3155 @@ -2251,16 +2511,15 @@ def test_corr_sanity(self): [0.84780328, 0.33394331], [0.78369152, 0.63919667]])) res = df[0].rolling(5, center=True).corr(df[1]) - self.assertTrue(all([np.abs(np.nan_to_num(x)) <= 1 for x in res])) + assert all([np.abs(np.nan_to_num(x)) <= 1 for x in res]) # and some fuzzing - for i in range(10): + for _ in range(10): df = DataFrame(np.random.rand(30, 2)) res = df[0].rolling(5, center=True).corr(df[1]) try: - self.assertTrue(all([np.abs(np.nan_to_num(x)) <= 1 for x in res - ])) - except: + assert all([np.abs(np.nan_to_num(x)) <= 1 for x in res]) + except AssertionError: print(res) def test_flex_binary_frame(self): @@ -2310,16 +2569,16 @@ def func(A, B, com, **kwargs): B[-10:] = np.NaN result = func(A, B, 20, min_periods=5) - 
self.assertTrue(np.isnan(result.values[:14]).all()) - self.assertFalse(np.isnan(result.values[14:]).any()) + assert np.isnan(result.values[:14]).all() + assert not np.isnan(result.values[14:]).any() # GH 7898 for min_periods in (0, 1, 2): result = func(A, B, 20, min_periods=min_periods) # binary functions (ewmcov, ewmcorr) with bias=False require at # least two values - self.assertTrue(np.isnan(result.values[:11]).all()) - self.assertFalse(np.isnan(result.values[11:]).any()) + assert np.isnan(result.values[:11]).all() + assert not np.isnan(result.values[11:]).any() # check series of length 0 result = func(Series([]), Series([]), 50, min_periods=min_periods) @@ -2330,7 +2589,7 @@ def func(A, B, com, **kwargs): Series([1.]), Series([1.]), 50, min_periods=min_periods) tm.assert_series_equal(result, Series([np.NaN])) - self.assertRaises(Exception, func, A, randn(50), 20, min_periods=5) + pytest.raises(Exception, func, A, randn(50), 20, min_periods=5) def test_expanding_apply(self): ser = Series([]) @@ -2404,17 +2663,14 @@ def test_expanding_cov_pairwise(self): rolling_result = self.frame.rolling(window=len(self.frame), min_periods=1).corr() - for i in result.items: - tm.assert_almost_equal(result[i], rolling_result[i]) + tm.assert_frame_equal(result, rolling_result) def test_expanding_corr_pairwise(self): result = self.frame.expanding().corr() rolling_result = self.frame.rolling(window=len(self.frame), min_periods=1).corr() - - for i in result.items: - tm.assert_almost_equal(result[i], rolling_result[i]) + tm.assert_frame_equal(result, rolling_result) def test_expanding_cov_diff_index(self): # GH 7512 @@ -2482,8 +2738,6 @@ def test_rolling_functions_window_non_shrinkage(self): s_expected = Series(np.nan, index=s.index) df = DataFrame([[1, 5], [3, 2], [3, 9], [-1, 0]], columns=['A', 'B']) df_expected = DataFrame(np.nan, index=df.index, columns=df.columns) - df_expected_panel = Panel(items=df.index, major_axis=df.columns, - minor_axis=df.columns) functions = [lambda x: (x.rolling(window=10, min_periods=5) .cov(x, pairwise=False)), @@ -2515,13 +2769,24 @@ def test_rolling_functions_window_non_shrinkage(self): # scipy needed for rolling_window continue + def test_rolling_functions_window_non_shrinkage_binary(self): + + # corr/cov return a MI DataFrame + df = DataFrame([[1, 5], [3, 2], [3, 9], [-1, 0]], + columns=Index(['A', 'B'], name='foo'), + index=Index(range(4), name='bar')) + df_expected = DataFrame( + columns=Index(['A', 'B'], name='foo'), + index=pd.MultiIndex.from_product([df.index, df.columns], + names=['bar', 'foo']), + dtype='float64') functions = [lambda x: (x.rolling(window=10, min_periods=5) .cov(x, pairwise=True)), lambda x: (x.rolling(window=10, min_periods=5) .corr(x, pairwise=True))] for f in functions: - df_result_panel = f(df) - tm.assert_panel_equal(df_result_panel, df_expected_panel) + df_result = f(df) + tm.assert_frame_equal(df_result, df_expected) def test_moment_functions_zero_length(self): # GH 8056 @@ -2529,13 +2794,9 @@ def test_moment_functions_zero_length(self): s_expected = s df1 = DataFrame() df1_expected = df1 - df1_expected_panel = Panel(items=df1.index, major_axis=df1.columns, - minor_axis=df1.columns) df2 = DataFrame(columns=['a']) df2['a'] = df2['a'].astype('float64') df2_expected = df2 - df2_expected_panel = Panel(items=df2.index, major_axis=df2.columns, - minor_axis=df2.columns) functions = [lambda x: x.expanding().count(), lambda x: x.expanding(min_periods=5).cov( @@ -2588,6 +2849,23 @@ def test_moment_functions_zero_length(self): # scipy needed for 
rolling_window continue + def test_moment_functions_zero_length_pairwise(self): + + df1 = DataFrame() + df1_expected = df1 + df2 = DataFrame(columns=Index(['a'], name='foo'), + index=Index([], name='bar')) + df2['a'] = df2['a'].astype('float64') + + df1_expected = DataFrame( + index=pd.MultiIndex.from_product([df1.index, df1.columns]), + columns=Index([])) + df2_expected = DataFrame( + index=pd.MultiIndex.from_product([df2.index, df2.columns], + names=['bar', 'foo']), + columns=Index(['a'], name='foo'), + dtype='float64') + functions = [lambda x: (x.expanding(min_periods=5) .cov(x, pairwise=True)), lambda x: (x.expanding(min_periods=5) @@ -2598,24 +2876,33 @@ def test_moment_functions_zero_length(self): .corr(x, pairwise=True)), ] for f in functions: - df1_result_panel = f(df1) - tm.assert_panel_equal(df1_result_panel, df1_expected_panel) + df1_result = f(df1) + tm.assert_frame_equal(df1_result, df1_expected) - df2_result_panel = f(df2) - tm.assert_panel_equal(df2_result_panel, df2_expected_panel) + df2_result = f(df2) + tm.assert_frame_equal(df2_result, df2_expected) def test_expanding_cov_pairwise_diff_length(self): # GH 7512 - df1 = DataFrame([[1, 5], [3, 2], [3, 9]], columns=['A', 'B']) - df1a = DataFrame([[1, 5], [3, 9]], index=[0, 2], columns=['A', 'B']) - df2 = DataFrame([[5, 6], [None, None], [2, 1]], columns=['X', 'Y']) - df2a = DataFrame([[5, 6], [2, 1]], index=[0, 2], columns=['X', 'Y']) - result1 = df1.expanding().cov(df2a, pairwise=True)[2] - result2 = df1.expanding().cov(df2a, pairwise=True)[2] - result3 = df1a.expanding().cov(df2, pairwise=True)[2] - result4 = df1a.expanding().cov(df2a, pairwise=True)[2] - expected = DataFrame([[-3., -5.], [-6., -10.]], index=['A', 'B'], - columns=['X', 'Y']) + df1 = DataFrame([[1, 5], [3, 2], [3, 9]], + columns=Index(['A', 'B'], name='foo')) + df1a = DataFrame([[1, 5], [3, 9]], + index=[0, 2], + columns=Index(['A', 'B'], name='foo')) + df2 = DataFrame([[5, 6], [None, None], [2, 1]], + columns=Index(['X', 'Y'], name='foo')) + df2a = DataFrame([[5, 6], [2, 1]], + index=[0, 2], + columns=Index(['X', 'Y'], name='foo')) + # TODO: xref gh-15826 + # .loc is not preserving the names + result1 = df1.expanding().cov(df2a, pairwise=True).loc[2] + result2 = df1.expanding().cov(df2a, pairwise=True).loc[2] + result3 = df1a.expanding().cov(df2, pairwise=True).loc[2] + result4 = df1a.expanding().cov(df2a, pairwise=True).loc[2] + expected = DataFrame([[-3.0, -6.0], [-5.0, -10.0]], + columns=Index(['A', 'B'], name='foo'), + index=Index(['X', 'Y'], name='foo')) tm.assert_frame_equal(result1, expected) tm.assert_frame_equal(result2, expected) tm.assert_frame_equal(result3, expected) @@ -2623,149 +2910,30 @@ def test_expanding_cov_pairwise_diff_length(self): def test_expanding_corr_pairwise_diff_length(self): # GH 7512 - df1 = DataFrame([[1, 2], [3, 2], [3, 4]], columns=['A', 'B']) - df1a = DataFrame([[1, 2], [3, 4]], index=[0, 2], columns=['A', 'B']) - df2 = DataFrame([[5, 6], [None, None], [2, 1]], columns=['X', 'Y']) - df2a = DataFrame([[5, 6], [2, 1]], index=[0, 2], columns=['X', 'Y']) - result1 = df1.expanding().corr(df2, pairwise=True)[2] - result2 = df1.expanding().corr(df2a, pairwise=True)[2] - result3 = df1a.expanding().corr(df2, pairwise=True)[2] - result4 = df1a.expanding().corr(df2a, pairwise=True)[2] - expected = DataFrame([[-1.0, -1.0], [-1.0, -1.0]], index=['A', 'B'], - columns=['X', 'Y']) + df1 = DataFrame([[1, 2], [3, 2], [3, 4]], + columns=['A', 'B'], + index=Index(range(3), name='bar')) + df1a = DataFrame([[1, 2], [3, 4]], + index=Index([0, 
2], name='bar'), + columns=['A', 'B']) + df2 = DataFrame([[5, 6], [None, None], [2, 1]], + columns=['X', 'Y'], + index=Index(range(3), name='bar')) + df2a = DataFrame([[5, 6], [2, 1]], + index=Index([0, 2], name='bar'), + columns=['X', 'Y']) + result1 = df1.expanding().corr(df2, pairwise=True).loc[2] + result2 = df1.expanding().corr(df2a, pairwise=True).loc[2] + result3 = df1a.expanding().corr(df2, pairwise=True).loc[2] + result4 = df1a.expanding().corr(df2a, pairwise=True).loc[2] + expected = DataFrame([[-1.0, -1.0], [-1.0, -1.0]], + columns=['A', 'B'], + index=Index(['X', 'Y'])) tm.assert_frame_equal(result1, expected) tm.assert_frame_equal(result2, expected) tm.assert_frame_equal(result3, expected) tm.assert_frame_equal(result4, expected) - def test_pairwise_stats_column_names_order(self): - # GH 7738 - df1s = [DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=[0, 1]), - DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=[1, 0]), - DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=[1, 1]), - DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], - columns=['C', 'C']), - DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=[1., 0]), - DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=[0., 1]), - DataFrame([[2, 4], [1, 2], [5, 2], [8, 1]], columns=['C', 1]), - DataFrame([[2., 4.], [1., 2.], [5., 2.], [8., 1.]], - columns=[1, 0.]), - DataFrame([[2, 4.], [1, 2.], [5, 2.], [8, 1.]], - columns=[0, 1.]), - DataFrame([[2, 4], [1, 2], [5, 2], [8, 1.]], - columns=[1., 'X']), ] - df2 = DataFrame([[None, 1, 1], [None, 1, 2], - [None, 3, 2], [None, 8, 1]], columns=['Y', 'Z', 'X']) - s = Series([1, 1, 3, 8]) - - # suppress warnings about incomparable objects, as we are deliberately - # testing with such column labels - with warnings.catch_warnings(): - warnings.filterwarnings("ignore", - message=".*incomparable objects.*", - category=RuntimeWarning) - - # DataFrame methods (which do not call _flex_binary_moment()) - for f in [lambda x: x.cov(), lambda x: x.corr(), ]: - results = [f(df) for df in df1s] - for (df, result) in zip(df1s, results): - tm.assert_index_equal(result.index, df.columns) - tm.assert_index_equal(result.columns, df.columns) - for i, result in enumerate(results): - if i > 0: - # compare internal values, as columns can be different - self.assert_numpy_array_equal(result.values, - results[0].values) - - # DataFrame with itself, pairwise=True - for f in [lambda x: x.expanding().cov(pairwise=True), - lambda x: x.expanding().corr(pairwise=True), - lambda x: x.rolling(window=3).cov(pairwise=True), - lambda x: x.rolling(window=3).corr(pairwise=True), - lambda x: x.ewm(com=3).cov(pairwise=True), - lambda x: x.ewm(com=3).corr(pairwise=True), ]: - results = [f(df) for df in df1s] - for (df, result) in zip(df1s, results): - tm.assert_index_equal(result.items, df.index) - tm.assert_index_equal(result.major_axis, df.columns) - tm.assert_index_equal(result.minor_axis, df.columns) - for i, result in enumerate(results): - if i > 0: - self.assert_numpy_array_equal(result.values, - results[0].values) - - # DataFrame with itself, pairwise=False - for f in [lambda x: x.expanding().cov(pairwise=False), - lambda x: x.expanding().corr(pairwise=False), - lambda x: x.rolling(window=3).cov(pairwise=False), - lambda x: x.rolling(window=3).corr(pairwise=False), - lambda x: x.ewm(com=3).cov(pairwise=False), - lambda x: x.ewm(com=3).corr(pairwise=False), ]: - results = [f(df) for df in df1s] - for (df, result) in zip(df1s, results): - tm.assert_index_equal(result.index, df.index) - 
tm.assert_index_equal(result.columns, df.columns) - for i, result in enumerate(results): - if i > 0: - self.assert_numpy_array_equal(result.values, - results[0].values) - - # DataFrame with another DataFrame, pairwise=True - for f in [lambda x, y: x.expanding().cov(y, pairwise=True), - lambda x, y: x.expanding().corr(y, pairwise=True), - lambda x, y: x.rolling(window=3).cov(y, pairwise=True), - lambda x, y: x.rolling(window=3).corr(y, pairwise=True), - lambda x, y: x.ewm(com=3).cov(y, pairwise=True), - lambda x, y: x.ewm(com=3).corr(y, pairwise=True), ]: - results = [f(df, df2) for df in df1s] - for (df, result) in zip(df1s, results): - tm.assert_index_equal(result.items, df.index) - tm.assert_index_equal(result.major_axis, df.columns) - tm.assert_index_equal(result.minor_axis, df2.columns) - for i, result in enumerate(results): - if i > 0: - self.assert_numpy_array_equal(result.values, - results[0].values) - - # DataFrame with another DataFrame, pairwise=False - for f in [lambda x, y: x.expanding().cov(y, pairwise=False), - lambda x, y: x.expanding().corr(y, pairwise=False), - lambda x, y: x.rolling(window=3).cov(y, pairwise=False), - lambda x, y: x.rolling(window=3).corr(y, pairwise=False), - lambda x, y: x.ewm(com=3).cov(y, pairwise=False), - lambda x, y: x.ewm(com=3).corr(y, pairwise=False), ]: - results = [f(df, df2) if df.columns.is_unique else None - for df in df1s] - for (df, result) in zip(df1s, results): - if result is not None: - expected_index = df.index.union(df2.index) - expected_columns = df.columns.union(df2.columns) - tm.assert_index_equal(result.index, expected_index) - tm.assert_index_equal(result.columns, expected_columns) - else: - tm.assertRaisesRegexp( - ValueError, "'arg1' columns are not unique", f, df, - df2) - tm.assertRaisesRegexp( - ValueError, "'arg2' columns are not unique", f, - df2, df) - - # DataFrame with a Series - for f in [lambda x, y: x.expanding().cov(y), - lambda x, y: x.expanding().corr(y), - lambda x, y: x.rolling(window=3).cov(y), - lambda x, y: x.rolling(window=3).corr(y), - lambda x, y: x.ewm(com=3).cov(y), - lambda x, y: x.ewm(com=3).corr(y), ]: - results = [f(df, s) for df in df1s] + [f(s, df) for df in df1s] - for (df, result) in zip(df1s, results): - tm.assert_index_equal(result.index, df.index) - tm.assert_index_equal(result.columns, df.columns) - for i, result in enumerate(results): - if i > 0: - self.assert_numpy_array_equal(result.values, - results[0].values) - def test_rolling_skew_edge_cases(self): all_nan = Series([np.NaN] * 5) @@ -2826,13 +2994,13 @@ def _check_expanding_ndarray(self, func, static_comp, has_min_periods=True, # min_periods is working correctly result = func(arr, min_periods=15) - self.assertTrue(np.isnan(result[13])) - self.assertFalse(np.isnan(result[14])) + assert np.isnan(result[13]) + assert not np.isnan(result[14]) arr2 = randn(20) result = func(arr2, min_periods=5) - self.assertTrue(isnull(result[3])) - self.assertTrue(notnull(result[4])) + assert isna(result[3]) + assert notna(result[4]) # min_periods=0 result0 = func(arr, min_periods=0) @@ -2844,9 +3012,9 @@ def _check_expanding_ndarray(self, func, static_comp, has_min_periods=True, def _check_expanding_structures(self, func): series_result = func(self.series) - tm.assertIsInstance(series_result, Series) + assert isinstance(series_result, Series) frame_result = func(self.frame) - self.assertEqual(type(frame_result), DataFrame) + assert type(frame_result) == DataFrame def _check_expanding(self, func, static_comp, has_min_periods=True, has_time_rule=True, 
preserve_nan=True): @@ -2872,7 +3040,7 @@ def test_rolling_max_gh6297(self): expected = Series([1.0, 2.0, 6.0, 4.0, 5.0], index=[datetime(1975, 1, i, 0) for i in range(1, 6)]) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): x = series.rolling(window=1, freq='D').max() tm.assert_series_equal(expected, x) @@ -2891,14 +3059,14 @@ def test_rolling_max_how_resample(self): # Default how should be max expected = Series([0.0, 1.0, 2.0, 3.0, 20.0], index=[datetime(1975, 1, i, 0) for i in range(1, 6)]) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): x = series.rolling(window=1, freq='D').max() tm.assert_series_equal(expected, x) # Now specify median (10.0) expected = Series([0.0, 1.0, 2.0, 3.0, 10.0], index=[datetime(1975, 1, i, 0) for i in range(1, 6)]) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): x = series.rolling(window=1, freq='D').max(how='median') tm.assert_series_equal(expected, x) @@ -2906,7 +3074,7 @@ def test_rolling_max_how_resample(self): v = (4.0 + 10.0 + 20.0) / 3.0 expected = Series([0.0, 1.0, 2.0, 3.0, v], index=[datetime(1975, 1, i, 0) for i in range(1, 6)]) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): x = series.rolling(window=1, freq='D').max(how='mean') tm.assert_series_equal(expected, x) @@ -2925,7 +3093,7 @@ def test_rolling_min_how_resample(self): # Default how should be min expected = Series([0.0, 1.0, 2.0, 3.0, 4.0], index=[datetime(1975, 1, i, 0) for i in range(1, 6)]) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): r = series.rolling(window=1, freq='D') tm.assert_series_equal(expected, r.min()) @@ -2944,7 +3112,7 @@ def test_rolling_median_how_resample(self): # Default how should be median expected = Series([0.0, 1.0, 2.0, 3.0, 10], index=[datetime(1975, 1, i, 0) for i in range(1, 6)]) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + with catch_warnings(record=True): x = series.rolling(window=1, freq='D').median() tm.assert_series_equal(expected, x) @@ -2966,15 +3134,15 @@ def test_rolling_min_max_numeric_types(self): # correctness result = (DataFrame(np.arange(20, dtype=data_type)) .rolling(window=5).max()) - self.assertEqual(result.dtypes[0], np.dtype("f8")) + assert result.dtypes[0] == np.dtype("f8") result = (DataFrame(np.arange(20, dtype=data_type)) .rolling(window=5).min()) - self.assertEqual(result.dtypes[0], np.dtype("f8")) + assert result.dtypes[0] == np.dtype("f8") -class TestGrouperGrouping(tm.TestCase): +class TestGrouperGrouping(object): - def setUp(self): + def setup_method(self, method): self.series = Series(np.arange(10)) self.frame = DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8, 'B': np.arange(40)}) @@ -2983,12 +3151,12 @@ def test_mutated(self): def f(): self.frame.groupby('A', foo=1) - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) g = self.frame.groupby('A') - self.assertFalse(g.mutated) + assert not g.mutated g = self.frame.groupby('A', mutated=True) - self.assertTrue(g.mutated) + assert g.mutated def test_getitem(self): g = self.frame.groupby('A') @@ -3117,12 +3285,12 @@ def test_expanding_apply(self): tm.assert_frame_equal(result, expected) -class TestRollingTS(tm.TestCase): +class TestRollingTS(object): # rolling time-series friendly # xref GH13327 - def setUp(self): + def setup_method(self, 
method): self.regular = DataFrame({'A': pd.date_range('20130101', periods=5, @@ -3152,16 +3320,16 @@ def test_valid(self): df = self.regular # not a valid freq - with self.assertRaises(ValueError): + with pytest.raises(ValueError): df.rolling(window='foobar') # not a datetimelike index - with self.assertRaises(ValueError): + with pytest.raises(ValueError): df.reset_index().rolling(window='foobar') # non-fixed freqs for freq in ['2MS', pd.offsets.MonthBegin(2)]: - with self.assertRaises(ValueError): + with pytest.raises(ValueError): df.rolling(window=freq) for freq in ['1D', pd.offsets.Day(2), '2ms']: @@ -3169,11 +3337,11 @@ def test_valid(self): # non-integer min_periods for minp in [1.0, 'foo', np.array([1, 2, 3])]: - with self.assertRaises(ValueError): + with pytest.raises(ValueError): df.rolling(window='1D', min_periods=minp) # center is not implemented - with self.assertRaises(NotImplementedError): + with pytest.raises(NotImplementedError): df.rolling(window='1D', center=True) def test_on(self): @@ -3181,7 +3349,7 @@ def test_on(self): df = self.regular # not a valid column - with self.assertRaises(ValueError): + with pytest.raises(ValueError): df.rolling(window='2s', on='foobar') # column is valid @@ -3190,7 +3358,7 @@ def test_on(self): df.rolling(window='2d', on='C').sum() # invalid columns - with self.assertRaises(ValueError): + with pytest.raises(ValueError): df.rolling(window='2d', on='B') # ok even though on non-selected @@ -3204,22 +3372,22 @@ def test_monotonic_on(self): freq='s'), 'B': range(5)}) - self.assertTrue(df.A.is_monotonic) + assert df.A.is_monotonic df.rolling('2s', on='A').sum() df = df.set_index('A') - self.assertTrue(df.index.is_monotonic) + assert df.index.is_monotonic df.rolling('2s').sum() # non-monotonic df.index = reversed(df.index.tolist()) - self.assertFalse(df.index.is_monotonic) + assert not df.index.is_monotonic - with self.assertRaises(ValueError): + with pytest.raises(ValueError): df.rolling('2s').sum() df = df.reset_index() - with self.assertRaises(ValueError): + with pytest.raises(ValueError): df.rolling('2s', on='A').sum() def test_frame_on(self): @@ -3331,6 +3499,45 @@ def test_min_periods(self): result = df.rolling('2s', min_periods=1).sum() tm.assert_frame_equal(result, expected) + def test_closed(self): + + # xref GH13965 + + df = DataFrame({'A': [1] * 5}, + index=[pd.Timestamp('20130101 09:00:01'), + pd.Timestamp('20130101 09:00:02'), + pd.Timestamp('20130101 09:00:03'), + pd.Timestamp('20130101 09:00:04'), + pd.Timestamp('20130101 09:00:06')]) + + # closed must be 'right', 'left', 'both', 'neither' + with pytest.raises(ValueError): + self.regular.rolling(window='2s', closed="blabla") + + expected = df.copy() + expected["A"] = [1.0, 2, 2, 2, 1] + result = df.rolling('2s', closed='right').sum() + tm.assert_frame_equal(result, expected) + + # default should be 'right' + result = df.rolling('2s').sum() + tm.assert_frame_equal(result, expected) + + expected = df.copy() + expected["A"] = [1.0, 2, 3, 3, 2] + result = df.rolling('2s', closed='both').sum() + tm.assert_frame_equal(result, expected) + + expected = df.copy() + expected["A"] = [np.nan, 1.0, 2, 2, 1] + result = df.rolling('2s', closed='left').sum() + tm.assert_frame_equal(result, expected) + + expected = df.copy() + expected["A"] = [np.nan, 1.0, 1, 1, np.nan] + result = df.rolling('2s', closed='neither').sum() + tm.assert_frame_equal(result, expected) + def test_ragged_sum(self): df = self.ragged @@ -3410,7 +3617,7 @@ def test_ragged_quantile(self): result = df.rolling(window='2s', 
min_periods=1).quantile(0.5) expected = df.copy() - expected['B'] = [0.0, 1, 1.0, 3.0, 3.0] + expected['B'] = [0.0, 1, 1.5, 3.0, 3.5] tm.assert_frame_equal(result, expected) def test_ragged_std(self): @@ -3563,11 +3770,11 @@ def test_perf_min(self): freq='s')) expected = dfp.rolling(2, min_periods=1).min() result = dfp.rolling('2s').min() - self.assertTrue(((result - expected) < 0.01).all().bool()) + assert ((result - expected) < 0.01).all().bool() expected = dfp.rolling(200, min_periods=1).min() result = dfp.rolling('200s').min() - self.assertTrue(((result - expected) < 0.01).all().bool()) + assert ((result - expected) < 0.01).all().bool() def test_ragged_max(self): @@ -3679,3 +3886,44 @@ def test_groupby_monotonic(self): lambda x: x.rolling('180D')['amount'].sum()) result = df.groupby('name').rolling('180D', on='date')['amount'].sum() tm.assert_series_equal(result, expected) + + def test_non_monotonic(self): + # GH 13966 (similar to #15130, closed by #15175) + + dates = pd.date_range(start='2016-01-01 09:30:00', + periods=20, freq='s') + df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8, + 'B': np.concatenate((dates, dates)), + 'C': np.arange(40)}) + + result = df.groupby('A').rolling('4s', on='B').C.mean() + expected = df.set_index('B').groupby('A').apply( + lambda x: x.rolling('4s')['C'].mean()) + tm.assert_series_equal(result, expected) + + df2 = df.sort_values('B') + result = df2.groupby('A').rolling('4s', on='B').C.mean() + tm.assert_series_equal(result, expected) + + def test_rolling_cov_offset(self): + # GH16058 + + idx = pd.date_range('2017-01-01', periods=24, freq='1h') + ss = pd.Series(np.arange(len(idx)), index=idx) + + result = ss.rolling('2h').cov() + expected = pd.Series([np.nan] + [0.5 for _ in range(len(idx) - 1)], + index=idx) + tm.assert_series_equal(result, expected) + + expected2 = ss.rolling(2, min_periods=1).cov() + tm.assert_series_equal(result, expected2) + + result = ss.rolling('3h').cov() + expected = pd.Series([np.nan, 0.5] + + [1.0 for _ in range(len(idx) - 2)], + index=idx) + tm.assert_series_equal(result, expected) + + expected2 = ss.rolling(3, min_periods=1).cov() + tm.assert_series_equal(result, expected2) diff --git a/pandas/tests/tools/__init__.py b/pandas/tests/tools/__init__.py new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/pandas/tools/tests/test_util.py b/pandas/tests/tools/test_numeric.py similarity index 71% rename from pandas/tools/tests/test_util.py rename to pandas/tests/tools/test_numeric.py index 8a8960a057926..1d13ba93ba759 100644 --- a/pandas/tools/tests/test_util.py +++ b/pandas/tests/tools/test_numeric.py @@ -1,124 +1,30 @@ -import os -import locale -import codecs -import nose +import pytest import decimal import numpy as np +import pandas as pd +from pandas import to_numeric + +from pandas.util import testing as tm from numpy import iinfo -import pandas as pd -from pandas import (date_range, Index, _np_version_under1p9) -import pandas.util.testing as tm -from pandas.tools.util import cartesian_product, to_numeric - -CURRENT_LOCALE = locale.getlocale() -LOCALE_OVERRIDE = os.environ.get('LOCALE_OVERRIDE', None) - - -class TestCartesianProduct(tm.TestCase): - - def test_simple(self): - x, y = list('ABC'), [1, 22] - result1, result2 = cartesian_product([x, y]) - expected1 = np.array(['A', 'A', 'B', 'B', 'C', 'C']) - expected2 = np.array([1, 22, 1, 22, 1, 22]) - tm.assert_numpy_array_equal(result1, expected1) - tm.assert_numpy_array_equal(result2, expected2) - - def test_datetimeindex(self): - # regression test for 
GitHub issue #6439 - # make sure that the ordering on datetimeindex is consistent - x = date_range('2000-01-01', periods=2) - result1, result2 = [Index(y).day for y in cartesian_product([x, x])] - expected1 = np.array([1, 1, 2, 2], dtype=np.int32) - expected2 = np.array([1, 2, 1, 2], dtype=np.int32) - tm.assert_numpy_array_equal(result1, expected1) - tm.assert_numpy_array_equal(result2, expected2) + +class TestToNumeric(object): def test_empty(self): - # product of empty factors - X = [[], [0, 1], []] - Y = [[], [], ['a', 'b', 'c']] - for x, y in zip(X, Y): - expected1 = np.array([], dtype=np.asarray(x).dtype) - expected2 = np.array([], dtype=np.asarray(y).dtype) - result1, result2 = cartesian_product([x, y]) - tm.assert_numpy_array_equal(result1, expected1) - tm.assert_numpy_array_equal(result2, expected2) - - # empty product (empty input): - result = cartesian_product([]) - expected = [] - tm.assert_equal(result, expected) - - def test_invalid_input(self): - invalid_inputs = [1, [1], [1, 2], [[1], 2], - 'a', ['a'], ['a', 'b'], [['a'], 'b']] - msg = "Input must be a list-like of list-likes" - for X in invalid_inputs: - tm.assertRaisesRegexp(TypeError, msg, cartesian_product, X=X) - - -class TestLocaleUtils(tm.TestCase): - - @classmethod - def setUpClass(cls): - super(TestLocaleUtils, cls).setUpClass() - cls.locales = tm.get_locales() - - if not cls.locales: - raise nose.SkipTest("No locales found") - - tm._skip_if_windows() - - @classmethod - def tearDownClass(cls): - super(TestLocaleUtils, cls).tearDownClass() - del cls.locales - - def test_get_locales(self): - # all systems should have at least a single locale - assert len(tm.get_locales()) > 0 - - def test_get_locales_prefix(self): - if len(self.locales) == 1: - raise nose.SkipTest("Only a single locale found, no point in " - "trying to test filtering locale prefixes") - first_locale = self.locales[0] - assert len(tm.get_locales(prefix=first_locale[:2])) > 0 - - def test_set_locale(self): - if len(self.locales) == 1: - raise nose.SkipTest("Only a single locale found, no point in " - "trying to test setting another locale") - - if LOCALE_OVERRIDE is None: - lang, enc = 'it_CH', 'UTF-8' - elif LOCALE_OVERRIDE == 'C': - lang, enc = 'en_US', 'ascii' - else: - lang, enc = LOCALE_OVERRIDE.split('.') - - enc = codecs.lookup(enc).name - new_locale = lang, enc - - if not tm._can_set_locale(new_locale): - with tm.assertRaises(locale.Error): - with tm.set_locale(new_locale): - pass - else: - with tm.set_locale(new_locale) as normalized_locale: - new_lang, new_enc = normalized_locale.split('.') - new_enc = codecs.lookup(enc).name - normalized_locale = new_lang, new_enc - self.assertEqual(normalized_locale, new_locale) - - current_locale = locale.getlocale() - self.assertEqual(current_locale, CURRENT_LOCALE) - - -class TestToNumeric(tm.TestCase): + # see gh-16302 + s = pd.Series([], dtype=object) + + res = to_numeric(s) + expected = pd.Series([], dtype=np.int64) + + tm.assert_series_equal(res, expected) + + # Original issue example + res = to_numeric(s, errors='coerce', downcast='integer') + expected = pd.Series([], dtype=np.int8) + + tm.assert_series_equal(res, expected) def test_series(self): s = pd.Series(['1', '-3.14', '7']) @@ -148,7 +54,7 @@ def test_series_numeric(self): def test_error(self): s = pd.Series([1, -3.14, 'apple']) msg = 'Unable to parse string "apple" at position 2' - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): to_numeric(s, errors='raise') res = to_numeric(s, errors='ignore') @@ 
-161,13 +67,13 @@ def test_error(self): s = pd.Series(['orange', 1, -3.14, 'apple']) msg = 'Unable to parse string "orange" at position 0' - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): to_numeric(s, errors='raise') def test_error_seen_bool(self): s = pd.Series([True, False, 'apple']) msg = 'Unable to parse string "apple" at position 2' - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): to_numeric(s, errors='raise') res = to_numeric(s, errors='ignore') @@ -258,24 +164,24 @@ def test_all_nan(self): def test_type_check(self): # GH 11776 df = pd.DataFrame({'a': [1, -3.14, 7], 'b': ['4', '5', '6']}) - with tm.assertRaisesRegexp(TypeError, "1-d array"): + with tm.assert_raises_regex(TypeError, "1-d array"): to_numeric(df) for errors in ['ignore', 'raise', 'coerce']: - with tm.assertRaisesRegexp(TypeError, "1-d array"): + with tm.assert_raises_regex(TypeError, "1-d array"): to_numeric(df, errors=errors) def test_scalar(self): - self.assertEqual(pd.to_numeric(1), 1) - self.assertEqual(pd.to_numeric(1.1), 1.1) + assert pd.to_numeric(1) == 1 + assert pd.to_numeric(1.1) == 1.1 - self.assertEqual(pd.to_numeric('1'), 1) - self.assertEqual(pd.to_numeric('1.1'), 1.1) + assert pd.to_numeric('1') == 1 + assert pd.to_numeric('1.1') == 1.1 - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): to_numeric('XX', errors='raise') - self.assertEqual(to_numeric('XX', errors='ignore'), 'XX') - self.assertTrue(np.isnan(to_numeric('XX', errors='coerce'))) + assert to_numeric('XX', errors='ignore') == 'XX' + assert np.isnan(to_numeric('XX', errors='coerce')) def test_numeric_dtypes(self): idx = pd.Index([1, 2, 3], name='xxx') @@ -362,7 +268,7 @@ def test_non_hashable(self): res = pd.to_numeric(s, errors='ignore') tm.assert_series_equal(res, pd.Series([[10.0, 2], 1.0, 'apple'])) - with self.assertRaisesRegexp(TypeError, "Invalid object type"): + with tm.assert_raises_regex(TypeError, "Invalid object type"): pd.to_numeric(s) def test_downcast(self): @@ -383,7 +289,7 @@ def test_downcast(self): smallest_float_dtype = float_32_char for data in (mixed_data, int_data, date_data): - with self.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): pd.to_numeric(data, downcast=invalid_downcast) expected = np.array([1, 2, 3], dtype=np.int64) @@ -449,9 +355,6 @@ def test_downcast(self): def test_downcast_limits(self): # Test the limits of each downcast. Bug: #14401. - # Check to make sure numpy is new enough to run this test. 
- if _np_version_under1p9: - raise nose.SkipTest("Numpy version is under 1.9") i = 'integer' u = 'unsigned' @@ -477,9 +380,4 @@ def test_downcast_limits(self): for dtype, downcast, min_max in dtype_downcast_min_max: series = pd.to_numeric(pd.Series(min_max), downcast=downcast) - tm.assert_equal(series.dtype, dtype) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + assert series.dtype == dtype diff --git a/pandas/tests/tseries/__init__.py b/pandas/tests/tseries/__init__.py new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/pandas/tseries/tests/data/cday-0.14.1.pickle b/pandas/tests/tseries/data/cday-0.14.1.pickle similarity index 100% rename from pandas/tseries/tests/data/cday-0.14.1.pickle rename to pandas/tests/tseries/data/cday-0.14.1.pickle diff --git a/pandas/tseries/tests/data/dateoffset_0_15_2.pickle b/pandas/tests/tseries/data/dateoffset_0_15_2.pickle similarity index 100% rename from pandas/tseries/tests/data/dateoffset_0_15_2.pickle rename to pandas/tests/tseries/data/dateoffset_0_15_2.pickle diff --git a/pandas/tseries/tests/test_frequencies.py b/pandas/tests/tseries/test_frequencies.py similarity index 66% rename from pandas/tseries/tests/test_frequencies.py rename to pandas/tests/tseries/test_frequencies.py index dfb7b26371d7a..4bcd0b49db7e0 100644 --- a/pandas/tseries/tests/test_frequencies.py +++ b/pandas/tests/tseries/test_frequencies.py @@ -1,16 +1,17 @@ from datetime import datetime, timedelta from pandas.compat import range +import pytest import numpy as np from pandas import (Index, DatetimeIndex, Timestamp, Series, date_range, period_range) import pandas.tseries.frequencies as frequencies -from pandas.tseries.tools import to_datetime +from pandas.core.tools.datetimes import to_datetime import pandas.tseries.offsets as offsets -from pandas.tseries.period import PeriodIndex +from pandas.core.indexes.period import PeriodIndex import pandas.compat as compat from pandas.compat import is_platform_windows @@ -18,7 +19,7 @@ from pandas import Timedelta -class TestToOffset(tm.TestCase): +class TestToOffset(object): def test_to_offset_multiple(self): freqstr = '2h30min' @@ -100,7 +101,8 @@ def test_to_offset_multiple(self): assert (result == expected) # malformed - with tm.assertRaisesRegexp(ValueError, 'Invalid frequency: 2h20m'): + with tm.assert_raises_regex(ValueError, + 'Invalid frequency: 2h20m'): frequencies.to_offset('2h20m') def test_to_offset_negative(self): @@ -122,17 +124,23 @@ def test_to_offset_negative(self): def test_to_offset_invalid(self): # GH 13930 - with tm.assertRaisesRegexp(ValueError, 'Invalid frequency: U1'): + with tm.assert_raises_regex(ValueError, + 'Invalid frequency: U1'): frequencies.to_offset('U1') - with tm.assertRaisesRegexp(ValueError, 'Invalid frequency: -U'): + with tm.assert_raises_regex(ValueError, + 'Invalid frequency: -U'): frequencies.to_offset('-U') - with tm.assertRaisesRegexp(ValueError, 'Invalid frequency: 3U1'): + with tm.assert_raises_regex(ValueError, + 'Invalid frequency: 3U1'): frequencies.to_offset('3U1') - with tm.assertRaisesRegexp(ValueError, 'Invalid frequency: -2-3U'): + with tm.assert_raises_regex(ValueError, + 'Invalid frequency: -2-3U'): frequencies.to_offset('-2-3U') - with tm.assertRaisesRegexp(ValueError, 'Invalid frequency: -2D:3H'): + with tm.assert_raises_regex(ValueError, + 'Invalid frequency: -2D:3H'): frequencies.to_offset('-2D:3H') - with tm.assertRaisesRegexp(ValueError, 'Invalid frequency: 1.5.0S'): + with 
tm.assert_raises_regex(ValueError, + 'Invalid frequency: 1.5.0S'): frequencies.to_offset('1.5.0S') # split offsets with spaces are valid @@ -145,10 +153,11 @@ def test_to_offset_invalid(self): # special cases assert frequencies.to_offset('2SMS-15') == offsets.SemiMonthBegin(2) - with tm.assertRaisesRegexp(ValueError, - 'Invalid frequency: 2SMS-15-15'): + with tm.assert_raises_regex(ValueError, + 'Invalid frequency: 2SMS-15-15'): frequencies.to_offset('2SMS-15-15') - with tm.assertRaisesRegexp(ValueError, 'Invalid frequency: 2SMS-15D'): + with tm.assert_raises_regex(ValueError, + 'Invalid frequency: 2SMS-15D'): frequencies.to_offset('2SMS-15D') def test_to_offset_leading_zero(self): @@ -198,7 +207,7 @@ def test_to_offset_pd_timedelta(self): assert (expected == result) td = Timedelta(microseconds=0) - tm.assertRaises(ValueError, lambda: frequencies.to_offset(td)) + pytest.raises(ValueError, lambda: frequencies.to_offset(td)) def test_anchored_shortcuts(self): result = frequencies.to_offset('W') @@ -239,14 +248,28 @@ def test_anchored_shortcuts(self): # ensure invalid cases fail as expected invalid_anchors = ['SM-0', 'SM-28', 'SM-29', - 'SM-FOO', 'BSM', 'SM--1' + 'SM-FOO', 'BSM', 'SM--1', 'SMS-1', 'SMS-28', 'SMS-30', - 'SMS-BAR', 'BSMS', 'SMS--2'] + 'SMS-BAR', 'SMS-BYR', 'BSMS', + 'SMS--2'] for invalid_anchor in invalid_anchors: - with tm.assertRaisesRegexp(ValueError, 'Invalid frequency: '): + with tm.assert_raises_regex(ValueError, + 'Invalid frequency: '): frequencies.to_offset(invalid_anchor) +def test_ms_vs_MS(): + left = frequencies.get_offset('ms') + right = frequencies.get_offset('MS') + assert left == offsets.Milli() + assert right == offsets.MonthBegin() + + +def test_rule_aliases(): + rule = frequencies.to_offset('10us') + assert rule == offsets.Micro(10) + + def test_get_rule_month(): result = frequencies._get_rule_month('W') assert (result == 'DEC') @@ -270,11 +293,15 @@ def test_get_rule_month(): result = frequencies._get_rule_month('A-DEC') assert (result == 'DEC') + result = frequencies._get_rule_month('Y-DEC') + assert (result == 'DEC') result = frequencies._get_rule_month(offsets.YearEnd()) assert (result == 'DEC') result = frequencies._get_rule_month('A-MAY') assert (result == 'MAY') + result = frequencies._get_rule_month('Y-MAY') + assert (result == 'MAY') result = frequencies._get_rule_month(offsets.YearEnd(month=5)) assert (result == 'MAY') @@ -283,6 +310,10 @@ def test_period_str_to_code(): assert (frequencies._period_str_to_code('A') == 1000) assert (frequencies._period_str_to_code('A-DEC') == 1000) assert (frequencies._period_str_to_code('A-JAN') == 1001) + assert (frequencies._period_str_to_code('Y') == 1000) + assert (frequencies._period_str_to_code('Y-DEC') == 1000) + assert (frequencies._period_str_to_code('Y-JAN') == 1001) + assert (frequencies._period_str_to_code('Q') == 2000) assert (frequencies._period_str_to_code('Q-DEC') == 2000) assert (frequencies._period_str_to_code('Q-FEB') == 2002) @@ -293,7 +324,7 @@ def _assert_depr(freq, expected, aliases): msg = frequencies._INVALID_FREQ_ERROR for alias in aliases: - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): frequencies._period_str_to_code(alias) _assert_depr("M", 3000, ["MTH", "MONTH", "MONTHLY"]) @@ -320,185 +351,197 @@ def _assert_depr(freq, expected, aliases): assert (frequencies._period_str_to_code('NS') == 12000) -class TestFrequencyCode(tm.TestCase): +class TestFrequencyCode(object): def test_freq_code(self): - self.assertEqual(frequencies.get_freq('A'), 1000) 
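The assertion rewrites in these hunks all follow one mechanical mapping from `unittest`-style helpers to pytest-style code: `self.assertEqual(a, b)` becomes `assert a == b`, `tm.assertRaisesRegexp` becomes `tm.assert_raises_regex`, and bare `assertRaises` calls become `pytest.raises`. A condensed sketch of that mapping, reusing calls that appear in this file (the test function itself is hypothetical):

```python
# Before/after of the conversion pattern; the old unittest forms are
# shown in comments. `tm` is pandas.util.testing, as in these modules.
import pytest

import pandas.tseries.frequencies as frequencies
import pandas.util.testing as tm


def test_conversion_patterns():
    # was: self.assertEqual(frequencies.get_freq('A'), 1000)
    assert frequencies.get_freq('A') == 1000

    # was: with tm.assertRaisesRegexp(ValueError, 'Invalid frequency: -U'):
    with tm.assert_raises_regex(ValueError, 'Invalid frequency: -U'):
        frequencies.to_offset('-U')

    # was: tm.assertRaises(ValueError, lambda: frequencies.to_offset('2h20m'))
    with pytest.raises(ValueError):
        frequencies.to_offset('2h20m')
```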
- self.assertEqual(frequencies.get_freq('3A'), 1000) - self.assertEqual(frequencies.get_freq('-1A'), 1000) + assert frequencies.get_freq('A') == 1000 + assert frequencies.get_freq('3A') == 1000 + assert frequencies.get_freq('-1A') == 1000 - self.assertEqual(frequencies.get_freq('W'), 4000) - self.assertEqual(frequencies.get_freq('W-MON'), 4001) - self.assertEqual(frequencies.get_freq('W-FRI'), 4005) + assert frequencies.get_freq('Y') == 1000 + assert frequencies.get_freq('3Y') == 1000 + assert frequencies.get_freq('-1Y') == 1000 + + assert frequencies.get_freq('W') == 4000 + assert frequencies.get_freq('W-MON') == 4001 + assert frequencies.get_freq('W-FRI') == 4005 for freqstr, code in compat.iteritems(frequencies._period_code_map): result = frequencies.get_freq(freqstr) - self.assertEqual(result, code) + assert result == code result = frequencies.get_freq_group(freqstr) - self.assertEqual(result, code // 1000 * 1000) + assert result == code // 1000 * 1000 result = frequencies.get_freq_group(code) - self.assertEqual(result, code // 1000 * 1000) + assert result == code // 1000 * 1000 def test_freq_group(self): - self.assertEqual(frequencies.get_freq_group('A'), 1000) - self.assertEqual(frequencies.get_freq_group('3A'), 1000) - self.assertEqual(frequencies.get_freq_group('-1A'), 1000) - self.assertEqual(frequencies.get_freq_group('A-JAN'), 1000) - self.assertEqual(frequencies.get_freq_group('A-MAY'), 1000) - self.assertEqual(frequencies.get_freq_group(offsets.YearEnd()), 1000) - self.assertEqual(frequencies.get_freq_group( - offsets.YearEnd(month=1)), 1000) - self.assertEqual(frequencies.get_freq_group( - offsets.YearEnd(month=5)), 1000) - - self.assertEqual(frequencies.get_freq_group('W'), 4000) - self.assertEqual(frequencies.get_freq_group('W-MON'), 4000) - self.assertEqual(frequencies.get_freq_group('W-FRI'), 4000) - self.assertEqual(frequencies.get_freq_group(offsets.Week()), 4000) - self.assertEqual(frequencies.get_freq_group( - offsets.Week(weekday=1)), 4000) - self.assertEqual(frequencies.get_freq_group( - offsets.Week(weekday=5)), 4000) + assert frequencies.get_freq_group('A') == 1000 + assert frequencies.get_freq_group('3A') == 1000 + assert frequencies.get_freq_group('-1A') == 1000 + assert frequencies.get_freq_group('A-JAN') == 1000 + assert frequencies.get_freq_group('A-MAY') == 1000 + + assert frequencies.get_freq_group('Y') == 1000 + assert frequencies.get_freq_group('3Y') == 1000 + assert frequencies.get_freq_group('-1Y') == 1000 + assert frequencies.get_freq_group('Y-JAN') == 1000 + assert frequencies.get_freq_group('Y-MAY') == 1000 + + assert frequencies.get_freq_group(offsets.YearEnd()) == 1000 + assert frequencies.get_freq_group(offsets.YearEnd(month=1)) == 1000 + assert frequencies.get_freq_group(offsets.YearEnd(month=5)) == 1000 + + assert frequencies.get_freq_group('W') == 4000 + assert frequencies.get_freq_group('W-MON') == 4000 + assert frequencies.get_freq_group('W-FRI') == 4000 + assert frequencies.get_freq_group(offsets.Week()) == 4000 + assert frequencies.get_freq_group(offsets.Week(weekday=1)) == 4000 + assert frequencies.get_freq_group(offsets.Week(weekday=5)) == 4000 def test_get_to_timestamp_base(self): tsb = frequencies.get_to_timestamp_base - self.assertEqual(tsb(frequencies.get_freq_code('D')[0]), - frequencies.get_freq_code('D')[0]) - self.assertEqual(tsb(frequencies.get_freq_code('W')[0]), - frequencies.get_freq_code('D')[0]) - self.assertEqual(tsb(frequencies.get_freq_code('M')[0]), - frequencies.get_freq_code('D')[0]) + assert 
(tsb(frequencies.get_freq_code('D')[0]) == + frequencies.get_freq_code('D')[0]) + assert (tsb(frequencies.get_freq_code('W')[0]) == + frequencies.get_freq_code('D')[0]) + assert (tsb(frequencies.get_freq_code('M')[0]) == + frequencies.get_freq_code('D')[0]) - self.assertEqual(tsb(frequencies.get_freq_code('S')[0]), - frequencies.get_freq_code('S')[0]) - self.assertEqual(tsb(frequencies.get_freq_code('T')[0]), - frequencies.get_freq_code('S')[0]) - self.assertEqual(tsb(frequencies.get_freq_code('H')[0]), - frequencies.get_freq_code('S')[0]) + assert (tsb(frequencies.get_freq_code('S')[0]) == + frequencies.get_freq_code('S')[0]) + assert (tsb(frequencies.get_freq_code('T')[0]) == + frequencies.get_freq_code('S')[0]) + assert (tsb(frequencies.get_freq_code('H')[0]) == + frequencies.get_freq_code('S')[0]) def test_freq_to_reso(self): Reso = frequencies.Resolution - self.assertEqual(Reso.get_str_from_freq('A'), 'year') - self.assertEqual(Reso.get_str_from_freq('Q'), 'quarter') - self.assertEqual(Reso.get_str_from_freq('M'), 'month') - self.assertEqual(Reso.get_str_from_freq('D'), 'day') - self.assertEqual(Reso.get_str_from_freq('H'), 'hour') - self.assertEqual(Reso.get_str_from_freq('T'), 'minute') - self.assertEqual(Reso.get_str_from_freq('S'), 'second') - self.assertEqual(Reso.get_str_from_freq('L'), 'millisecond') - self.assertEqual(Reso.get_str_from_freq('U'), 'microsecond') - self.assertEqual(Reso.get_str_from_freq('N'), 'nanosecond') + assert Reso.get_str_from_freq('A') == 'year' + assert Reso.get_str_from_freq('Q') == 'quarter' + assert Reso.get_str_from_freq('M') == 'month' + assert Reso.get_str_from_freq('D') == 'day' + assert Reso.get_str_from_freq('H') == 'hour' + assert Reso.get_str_from_freq('T') == 'minute' + assert Reso.get_str_from_freq('S') == 'second' + assert Reso.get_str_from_freq('L') == 'millisecond' + assert Reso.get_str_from_freq('U') == 'microsecond' + assert Reso.get_str_from_freq('N') == 'nanosecond' for freq in ['A', 'Q', 'M', 'D', 'H', 'T', 'S', 'L', 'U', 'N']: # check roundtrip result = Reso.get_freq(Reso.get_str_from_freq(freq)) - self.assertEqual(freq, result) + assert freq == result for freq in ['D', 'H', 'T', 'S', 'L', 'U']: result = Reso.get_freq(Reso.get_str(Reso.get_reso_from_freq(freq))) - self.assertEqual(freq, result) + assert freq == result def test_resolution_bumping(self): - # GH 14378 + # see gh-14378 Reso = frequencies.Resolution - self.assertEqual(Reso.get_stride_from_decimal(1.5, 'T'), (90, 'S')) - self.assertEqual(Reso.get_stride_from_decimal(62.4, 'T'), (3744, 'S')) - self.assertEqual(Reso.get_stride_from_decimal(1.04, 'H'), (3744, 'S')) - self.assertEqual(Reso.get_stride_from_decimal(1, 'D'), (1, 'D')) - self.assertEqual(Reso.get_stride_from_decimal(0.342931, 'H'), - (1234551600, 'U')) - self.assertEqual(Reso.get_stride_from_decimal(1.2345, 'D'), - (106660800, 'L')) + assert Reso.get_stride_from_decimal(1.5, 'T') == (90, 'S') + assert Reso.get_stride_from_decimal(62.4, 'T') == (3744, 'S') + assert Reso.get_stride_from_decimal(1.04, 'H') == (3744, 'S') + assert Reso.get_stride_from_decimal(1, 'D') == (1, 'D') + assert (Reso.get_stride_from_decimal(0.342931, 'H') == + (1234551600, 'U')) + assert Reso.get_stride_from_decimal(1.2345, 'D') == (106660800, 'L') - with self.assertRaises(ValueError): + with pytest.raises(ValueError): Reso.get_stride_from_decimal(0.5, 'N') # too much precision in the input can prevent - with self.assertRaises(ValueError): + with pytest.raises(ValueError): Reso.get_stride_from_decimal(0.3429324798798269273987982, 'H') def 
test_get_freq_code(self): - # freqstr - self.assertEqual(frequencies.get_freq_code('A'), - (frequencies.get_freq('A'), 1)) - self.assertEqual(frequencies.get_freq_code('3D'), - (frequencies.get_freq('D'), 3)) - self.assertEqual(frequencies.get_freq_code('-2M'), - (frequencies.get_freq('M'), -2)) + # frequency str + assert (frequencies.get_freq_code('A') == + (frequencies.get_freq('A'), 1)) + assert (frequencies.get_freq_code('3D') == + (frequencies.get_freq('D'), 3)) + assert (frequencies.get_freq_code('-2M') == + (frequencies.get_freq('M'), -2)) # tuple - self.assertEqual(frequencies.get_freq_code(('D', 1)), - (frequencies.get_freq('D'), 1)) - self.assertEqual(frequencies.get_freq_code(('A', 3)), - (frequencies.get_freq('A'), 3)) - self.assertEqual(frequencies.get_freq_code(('M', -2)), - (frequencies.get_freq('M'), -2)) + assert (frequencies.get_freq_code(('D', 1)) == + (frequencies.get_freq('D'), 1)) + assert (frequencies.get_freq_code(('A', 3)) == + (frequencies.get_freq('A'), 3)) + assert (frequencies.get_freq_code(('M', -2)) == + (frequencies.get_freq('M'), -2)) + # numeric tuple - self.assertEqual(frequencies.get_freq_code((1000, 1)), (1000, 1)) + assert frequencies.get_freq_code((1000, 1)) == (1000, 1) # offsets - self.assertEqual(frequencies.get_freq_code(offsets.Day()), - (frequencies.get_freq('D'), 1)) - self.assertEqual(frequencies.get_freq_code(offsets.Day(3)), - (frequencies.get_freq('D'), 3)) - self.assertEqual(frequencies.get_freq_code(offsets.Day(-2)), - (frequencies.get_freq('D'), -2)) - - self.assertEqual(frequencies.get_freq_code(offsets.MonthEnd()), - (frequencies.get_freq('M'), 1)) - self.assertEqual(frequencies.get_freq_code(offsets.MonthEnd(3)), - (frequencies.get_freq('M'), 3)) - self.assertEqual(frequencies.get_freq_code(offsets.MonthEnd(-2)), - (frequencies.get_freq('M'), -2)) - - self.assertEqual(frequencies.get_freq_code(offsets.Week()), - (frequencies.get_freq('W'), 1)) - self.assertEqual(frequencies.get_freq_code(offsets.Week(3)), - (frequencies.get_freq('W'), 3)) - self.assertEqual(frequencies.get_freq_code(offsets.Week(-2)), - (frequencies.get_freq('W'), -2)) - - # monday is weekday=0 - self.assertEqual(frequencies.get_freq_code(offsets.Week(weekday=1)), - (frequencies.get_freq('W-TUE'), 1)) - self.assertEqual(frequencies.get_freq_code(offsets.Week(3, weekday=0)), - (frequencies.get_freq('W-MON'), 3)) - self.assertEqual( - frequencies.get_freq_code(offsets.Week(-2, weekday=4)), - (frequencies.get_freq('W-FRI'), -2)) + assert (frequencies.get_freq_code(offsets.Day()) == + (frequencies.get_freq('D'), 1)) + assert (frequencies.get_freq_code(offsets.Day(3)) == + (frequencies.get_freq('D'), 3)) + assert (frequencies.get_freq_code(offsets.Day(-2)) == + (frequencies.get_freq('D'), -2)) + + assert (frequencies.get_freq_code(offsets.MonthEnd()) == + (frequencies.get_freq('M'), 1)) + assert (frequencies.get_freq_code(offsets.MonthEnd(3)) == + (frequencies.get_freq('M'), 3)) + assert (frequencies.get_freq_code(offsets.MonthEnd(-2)) == + (frequencies.get_freq('M'), -2)) + + assert (frequencies.get_freq_code(offsets.Week()) == + (frequencies.get_freq('W'), 1)) + assert (frequencies.get_freq_code(offsets.Week(3)) == + (frequencies.get_freq('W'), 3)) + assert (frequencies.get_freq_code(offsets.Week(-2)) == + (frequencies.get_freq('W'), -2)) + + # Monday is weekday=0 + assert (frequencies.get_freq_code(offsets.Week(weekday=1)) == + (frequencies.get_freq('W-TUE'), 1)) + assert (frequencies.get_freq_code(offsets.Week(3, weekday=0)) == + (frequencies.get_freq('W-MON'), 3)) + 
assert (frequencies.get_freq_code(offsets.Week(-2, weekday=4)) == + (frequencies.get_freq('W-FRI'), -2)) _dti = DatetimeIndex -class TestFrequencyInference(tm.TestCase): +class TestFrequencyInference(object): + def test_raise_if_period_index(self): index = PeriodIndex(start="1/1/1990", periods=20, freq="M") - self.assertRaises(TypeError, frequencies.infer_freq, index) + pytest.raises(TypeError, frequencies.infer_freq, index) def test_raise_if_too_few(self): index = _dti(['12/31/1998', '1/3/1999']) - self.assertRaises(ValueError, frequencies.infer_freq, index) + pytest.raises(ValueError, frequencies.infer_freq, index) def test_business_daily(self): + index = _dti(['01/01/1999', '1/4/1999', '1/5/1999']) + assert frequencies.infer_freq(index) == 'B' + + def test_business_daily_look_alike(self): + # GH 16624, do not infer 'B' when 'weekend' (2-day gap) in wrong place index = _dti(['12/31/1998', '1/3/1999', '1/4/1999']) - self.assertEqual(frequencies.infer_freq(index), 'B') + assert frequencies.infer_freq(index) is None def test_day(self): self._check_tick(timedelta(1), 'D') def test_day_corner(self): index = _dti(['1/1/2000', '1/2/2000', '1/3/2000']) - self.assertEqual(frequencies.infer_freq(index), 'D') + assert frequencies.infer_freq(index) == 'D' def test_non_datetimeindex(self): dates = to_datetime(['1/1/2000', '1/2/2000', '1/3/2000']) - self.assertEqual(frequencies.infer_freq(dates), 'D') + assert frequencies.infer_freq(dates) == 'D' def test_hour(self): self._check_tick(timedelta(hours=1), 'H') @@ -527,16 +570,16 @@ def _check_tick(self, base_delta, code): exp_freq = '%d%s' % (i, code) else: exp_freq = code - self.assertEqual(frequencies.infer_freq(index), exp_freq) + assert frequencies.infer_freq(index) == exp_freq index = _dti([b + base_delta * 7] + [b + base_delta * j for j in range( 3)]) - self.assertIsNone(frequencies.infer_freq(index)) + assert frequencies.infer_freq(index) is None index = _dti([b + base_delta * j for j in range(3)] + [b + base_delta * 7]) - self.assertIsNone(frequencies.infer_freq(index)) + assert frequencies.infer_freq(index) is None def test_weekly(self): days = ['MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN'] @@ -554,7 +597,7 @@ def test_week_of_month(self): def test_fifth_week_of_month(self): # Only supports freq up to WOM-4. See #9425 func = lambda: date_range('2014-01-01', freq='WOM-5MON') - self.assertRaises(ValueError, func) + pytest.raises(ValueError, func) def test_fifth_week_of_month_infer(self): # Only attempts to infer up to WOM-4. 
See #9425 @@ -572,7 +615,7 @@ def test_monthly(self): def test_monthly_ambiguous(self): rng = _dti(['1/31/2000', '2/29/2000', '3/31/2000']) - self.assertEqual(rng.inferred_freq, 'M') + assert rng.inferred_freq == 'M' def test_business_monthly(self): self._check_generated_range('1/1/2000', 'BM') @@ -594,7 +637,7 @@ def test_business_annual(self): def test_annual_ambiguous(self): rng = _dti(['1/31/2000', '1/31/2001', '1/31/2002']) - self.assertEqual(rng.inferred_freq, 'A-JAN') + assert rng.inferred_freq == 'A-JAN' def _check_generated_range(self, start, freq): freq = freq.upper() @@ -602,41 +645,45 @@ def _check_generated_range(self, start, freq): gen = date_range(start, periods=7, freq=freq) index = _dti(gen.values) if not freq.startswith('Q-'): - self.assertEqual(frequencies.infer_freq(index), gen.freqstr) + assert frequencies.infer_freq(index) == gen.freqstr else: inf_freq = frequencies.infer_freq(index) - self.assertTrue((inf_freq == 'Q-DEC' and gen.freqstr in ( - 'Q', 'Q-DEC', 'Q-SEP', 'Q-JUN', 'Q-MAR')) or ( - inf_freq == 'Q-NOV' and gen.freqstr in ( - 'Q-NOV', 'Q-AUG', 'Q-MAY', 'Q-FEB')) or ( - inf_freq == 'Q-OCT' and gen.freqstr in ( - 'Q-OCT', 'Q-JUL', 'Q-APR', 'Q-JAN'))) + is_dec_range = inf_freq == 'Q-DEC' and gen.freqstr in ( + 'Q', 'Q-DEC', 'Q-SEP', 'Q-JUN', 'Q-MAR') + is_nov_range = inf_freq == 'Q-NOV' and gen.freqstr in ( + 'Q-NOV', 'Q-AUG', 'Q-MAY', 'Q-FEB') + is_oct_range = inf_freq == 'Q-OCT' and gen.freqstr in ( + 'Q-OCT', 'Q-JUL', 'Q-APR', 'Q-JAN') + assert is_dec_range or is_nov_range or is_oct_range gen = date_range(start, periods=5, freq=freq) index = _dti(gen.values) + if not freq.startswith('Q-'): - self.assertEqual(frequencies.infer_freq(index), gen.freqstr) + assert frequencies.infer_freq(index) == gen.freqstr else: inf_freq = frequencies.infer_freq(index) - self.assertTrue((inf_freq == 'Q-DEC' and gen.freqstr in ( - 'Q', 'Q-DEC', 'Q-SEP', 'Q-JUN', 'Q-MAR')) or ( - inf_freq == 'Q-NOV' and gen.freqstr in ( - 'Q-NOV', 'Q-AUG', 'Q-MAY', 'Q-FEB')) or ( - inf_freq == 'Q-OCT' and gen.freqstr in ( - 'Q-OCT', 'Q-JUL', 'Q-APR', 'Q-JAN'))) + is_dec_range = inf_freq == 'Q-DEC' and gen.freqstr in ( + 'Q', 'Q-DEC', 'Q-SEP', 'Q-JUN', 'Q-MAR') + is_nov_range = inf_freq == 'Q-NOV' and gen.freqstr in ( + 'Q-NOV', 'Q-AUG', 'Q-MAY', 'Q-FEB') + is_oct_range = inf_freq == 'Q-OCT' and gen.freqstr in ( + 'Q-OCT', 'Q-JUL', 'Q-APR', 'Q-JAN') + + assert is_dec_range or is_nov_range or is_oct_range def test_infer_freq(self): rng = period_range('1959Q2', '2009Q3', freq='Q') rng = Index(rng.to_timestamp('D', how='e').asobject) - self.assertEqual(rng.inferred_freq, 'Q-DEC') + assert rng.inferred_freq == 'Q-DEC' rng = period_range('1959Q2', '2009Q3', freq='Q-NOV') rng = Index(rng.to_timestamp('D', how='e').asobject) - self.assertEqual(rng.inferred_freq, 'Q-NOV') + assert rng.inferred_freq == 'Q-NOV' rng = period_range('1959Q2', '2009Q3', freq='Q-OCT') rng = Index(rng.to_timestamp('D', how='e').asobject) - self.assertEqual(rng.inferred_freq, 'Q-OCT') + assert rng.inferred_freq == 'Q-OCT' def test_infer_freq_tz(self): @@ -656,7 +703,7 @@ def test_infer_freq_tz(self): 'US/Pacific', 'US/Eastern']: for expected, dates in compat.iteritems(freqs): idx = DatetimeIndex(dates, tz=tz) - self.assertEqual(idx.inferred_freq, expected) + assert idx.inferred_freq == expected def test_infer_freq_tz_transition(self): # Tests for #8772 @@ -672,11 +719,11 @@ def test_infer_freq_tz_transition(self): for freq in freqs: idx = date_range(date_pair[0], date_pair[ 1], freq=freq, tz=tz) - 
self.assertEqual(idx.inferred_freq, freq) + assert idx.inferred_freq == freq index = date_range("2013-11-03", periods=5, freq="3H").tz_localize("America/Chicago") - self.assertIsNone(index.inferred_freq) + assert index.inferred_freq is None def test_infer_freq_businesshour(self): # GH 7905 @@ -684,21 +731,21 @@ def test_infer_freq_businesshour(self): ['2014-07-01 09:00', '2014-07-01 10:00', '2014-07-01 11:00', '2014-07-01 12:00', '2014-07-01 13:00', '2014-07-01 14:00']) # hourly freq in a day must result in 'H' - self.assertEqual(idx.inferred_freq, 'H') + assert idx.inferred_freq == 'H' idx = DatetimeIndex( ['2014-07-01 09:00', '2014-07-01 10:00', '2014-07-01 11:00', '2014-07-01 12:00', '2014-07-01 13:00', '2014-07-01 14:00', '2014-07-01 15:00', '2014-07-01 16:00', '2014-07-02 09:00', '2014-07-02 10:00', '2014-07-02 11:00']) - self.assertEqual(idx.inferred_freq, 'BH') + assert idx.inferred_freq == 'BH' idx = DatetimeIndex( ['2014-07-04 09:00', '2014-07-04 10:00', '2014-07-04 11:00', '2014-07-04 12:00', '2014-07-04 13:00', '2014-07-04 14:00', '2014-07-04 15:00', '2014-07-04 16:00', '2014-07-07 09:00', '2014-07-07 10:00', '2014-07-07 11:00']) - self.assertEqual(idx.inferred_freq, 'BH') + assert idx.inferred_freq == 'BH' idx = DatetimeIndex( ['2014-07-04 09:00', '2014-07-04 10:00', '2014-07-04 11:00', @@ -709,12 +756,12 @@ def test_infer_freq_businesshour(self): '2014-07-07 16:00', '2014-07-08 09:00', '2014-07-08 10:00', '2014-07-08 11:00', '2014-07-08 12:00', '2014-07-08 13:00', '2014-07-08 14:00', '2014-07-08 15:00', '2014-07-08 16:00']) - self.assertEqual(idx.inferred_freq, 'BH') + assert idx.inferred_freq == 'BH' def test_not_monotonic(self): rng = _dti(['1/31/2000', '1/31/2001', '1/31/2002']) rng = rng[::-1] - self.assertEqual(rng.inferred_freq, '-1A-JAN') + assert rng.inferred_freq == '-1A-JAN' def test_non_datetimeindex2(self): rng = _dti(['1/31/2000', '1/31/2001', '1/31/2002']) @@ -722,21 +769,20 @@ def test_non_datetimeindex2(self): vals = rng.to_pydatetime() result = frequencies.infer_freq(vals) - self.assertEqual(result, rng.inferred_freq) + assert result == rng.inferred_freq def test_invalid_index_types(self): # test all index types for i in [tm.makeIntIndex(10), tm.makeFloatIndex(10), tm.makePeriodIndex(10)]: - self.assertRaises(TypeError, lambda: frequencies.infer_freq(i)) + pytest.raises(TypeError, lambda: frequencies.infer_freq(i)) # GH 10822 # odd error message on conversions to datetime for unicode if not is_platform_windows(): for i in [tm.makeStringIndex(10), tm.makeUnicodeIndex(10)]: - self.assertRaises(ValueError, - lambda: frequencies.infer_freq(i)) + pytest.raises(ValueError, lambda: frequencies.infer_freq(i)) def test_string_datetimelike_compat(self): @@ -745,7 +791,7 @@ def test_string_datetimelike_compat(self): '2004-04']) result = frequencies.infer_freq(Index(['2004-01', '2004-02', '2004-03', '2004-04'])) - self.assertEqual(result, expected) + assert result == expected def test_series(self): @@ -754,51 +800,45 @@ def test_series(self): # invalid type of Series for s in [Series(np.arange(10)), Series(np.arange(10.))]: - self.assertRaises(TypeError, lambda: frequencies.infer_freq(s)) + pytest.raises(TypeError, lambda: frequencies.infer_freq(s)) # a non-convertible string - self.assertRaises(ValueError, - lambda: frequencies.infer_freq( - Series(['foo', 'bar']))) + pytest.raises(ValueError, lambda: frequencies.infer_freq( + Series(['foo', 'bar']))) # cannot infer on PeriodIndex for freq in [None, 'L']: s = Series(period_range('2013', periods=10, freq=freq)) - 
self.assertRaises(TypeError, lambda: frequencies.infer_freq(s)) - for freq in ['Y']: - - msg = frequencies._INVALID_FREQ_ERROR - with tm.assertRaisesRegexp(ValueError, msg): - s = Series(period_range('2013', periods=10, freq=freq)) - self.assertRaises(TypeError, lambda: frequencies.infer_freq(s)) + pytest.raises(TypeError, lambda: frequencies.infer_freq(s)) # DateTimeIndex for freq in ['M', 'L', 'S']: s = Series(date_range('20130101', periods=10, freq=freq)) inferred = frequencies.infer_freq(s) - self.assertEqual(inferred, freq) + assert inferred == freq s = Series(date_range('20130101', '20130110')) inferred = frequencies.infer_freq(s) - self.assertEqual(inferred, 'D') + assert inferred == 'D' def test_legacy_offset_warnings(self): freqs = ['WEEKDAY', 'EOM', 'W@MON', 'W@TUE', 'W@WED', 'W@THU', 'W@FRI', 'W@SAT', 'W@SUN', 'Q@JAN', 'Q@FEB', 'Q@MAR', 'A@JAN', 'A@FEB', 'A@MAR', 'A@APR', 'A@MAY', 'A@JUN', 'A@JUL', 'A@AUG', 'A@SEP', 'A@OCT', 'A@NOV', 'A@DEC', - 'WOM@1MON', 'WOM@2MON', 'WOM@3MON', 'WOM@4MON', - 'WOM@1TUE', 'WOM@2TUE', 'WOM@3TUE', 'WOM@4TUE', - 'WOM@1WED', 'WOM@2WED', 'WOM@3WED', 'WOM@4WED', - 'WOM@1THU', 'WOM@2THU', 'WOM@3THU', 'WOM@4THU' - 'WOM@1FRI', 'WOM@2FRI', 'WOM@3FRI', 'WOM@4FRI'] + 'Y@JAN', 'WOM@1MON', 'WOM@2MON', 'WOM@3MON', + 'WOM@4MON', 'WOM@1TUE', 'WOM@2TUE', 'WOM@3TUE', + 'WOM@4TUE', 'WOM@1WED', 'WOM@2WED', 'WOM@3WED', + 'WOM@4WED', 'WOM@1THU', 'WOM@2THU', 'WOM@3THU', + 'WOM@4THU', 'WOM@1FRI', 'WOM@2FRI', 'WOM@3FRI', + 'WOM@4FRI'] msg = frequencies._INVALID_FREQ_ERROR for freq in freqs: - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): frequencies.get_offset(freq) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): date_range('2011-01-01', periods=5, freq=freq) diff --git a/pandas/tseries/tests/test_holiday.py b/pandas/tests/tseries/test_holiday.py similarity index 75% rename from pandas/tseries/tests/test_holiday.py rename to pandas/tests/tseries/test_holiday.py index 62446e8e637c6..3ea7e5b8620f2 100644 --- a/pandas/tseries/tests/test_holiday.py +++ b/pandas/tests/tseries/test_holiday.py @@ -1,3 +1,5 @@ +import pytest + from datetime import datetime import pandas.util.testing as tm from pandas import compat @@ -15,11 +17,11 @@ USLaborDay, USColumbusDay, USMartinLutherKingJr, USPresidentsDay) from pytz import utc -import nose -class TestCalendar(tm.TestCase): - def setUp(self): +class TestCalendar(object): + + def setup_method(self, method): self.holiday_list = [ datetime(2012, 1, 2), datetime(2012, 1, 16), @@ -47,14 +49,15 @@ def test_calendar(self): Timestamp(self.start_date), Timestamp(self.end_date)) - self.assertEqual(list(holidays.to_pydatetime()), self.holiday_list) - self.assertEqual(list(holidays_1.to_pydatetime()), self.holiday_list) - self.assertEqual(list(holidays_2.to_pydatetime()), self.holiday_list) + assert list(holidays.to_pydatetime()) == self.holiday_list + assert list(holidays_1.to_pydatetime()) == self.holiday_list + assert list(holidays_2.to_pydatetime()) == self.holiday_list def test_calendar_caching(self): # Test for issue #9552 class TestCalendar(AbstractHolidayCalendar): + def __init__(self, name=None, rules=None): super(TestCalendar, self).__init__(name=name, rules=rules) @@ -79,27 +82,22 @@ def test_calendar_observance_dates(self): def test_rule_from_name(self): USFedCal = get_calendar('USFederalHolidayCalendar') - self.assertEqual(USFedCal.rule_from_name( - 'Thanksgiving'), USThanksgivingDay) + assert USFedCal.rule_from_name('Thanksgiving') == 
USThanksgivingDay -class TestHoliday(tm.TestCase): - def setUp(self): +class TestHoliday(object): + + def setup_method(self, method): self.start_date = datetime(2011, 1, 1) self.end_date = datetime(2020, 12, 31) def check_results(self, holiday, start, end, expected): - self.assertEqual(list(holiday.dates(start, end)), expected) + assert list(holiday.dates(start, end)) == expected + # Verify that timezone info is preserved. - self.assertEqual( - list( - holiday.dates( - utc.localize(Timestamp(start)), - utc.localize(Timestamp(end)), - ) - ), - [utc.localize(dt) for dt in expected], - ) + assert (list(holiday.dates(utc.localize(Timestamp(start)), + utc.localize(Timestamp(end)))) == + [utc.localize(dt) for dt in expected]) def test_usmemorialday(self): self.check_results(holiday=USMemorialDay, @@ -230,7 +228,7 @@ def test_holidays_within_dates(self): for rule, dates in compat.iteritems(holidays): empty_dates = rule.dates(start_date, end_date) - self.assertEqual(empty_dates.tolist(), []) + assert empty_dates.tolist() == [] if isinstance(dates, tuple): dates = [dates] @@ -251,8 +249,8 @@ def test_argument_types(self): Timestamp(self.start_date), Timestamp(self.end_date)) - self.assert_index_equal(holidays, holidays_1) - self.assert_index_equal(holidays, holidays_2) + tm.assert_index_equal(holidays, holidays_1) + tm.assert_index_equal(holidays, holidays_2) def test_special_holidays(self): base_date = [datetime(2012, 5, 28)] @@ -262,17 +260,15 @@ def test_special_holidays(self): end_date=datetime(2012, 12, 31), offset=DateOffset(weekday=MO(1))) - self.assertEqual(base_date, - holiday_1.dates(self.start_date, self.end_date)) - self.assertEqual(base_date, - holiday_2.dates(self.start_date, self.end_date)) + assert base_date == holiday_1.dates(self.start_date, self.end_date) + assert base_date == holiday_2.dates(self.start_date, self.end_date) def test_get_calendar(self): class TestCalendar(AbstractHolidayCalendar): rules = [] calendar = get_calendar('TestCalendar') - self.assertEqual(TestCalendar, calendar.__class__) + assert TestCalendar == calendar.__class__ def test_factory(self): class_1 = HolidayCalendarFactory('MemorialDay', @@ -283,13 +279,14 @@ def test_factory(self): USThanksgivingDay) class_3 = HolidayCalendarFactory('Combined', class_1, class_2) - self.assertEqual(len(class_1.rules), 1) - self.assertEqual(len(class_2.rules), 1) - self.assertEqual(len(class_3.rules), 2) + assert len(class_1.rules) == 1 + assert len(class_2.rules) == 1 + assert len(class_3.rules) == 2 + +class TestObservanceRules(object): -class TestObservanceRules(tm.TestCase): - def setUp(self): + def setup_method(self, method): self.we = datetime(2014, 4, 9) self.th = datetime(2014, 4, 10) self.fr = datetime(2014, 4, 11) @@ -299,64 +296,65 @@ def setUp(self): self.tu = datetime(2014, 4, 15) def test_next_monday(self): - self.assertEqual(next_monday(self.sa), self.mo) - self.assertEqual(next_monday(self.su), self.mo) + assert next_monday(self.sa) == self.mo + assert next_monday(self.su) == self.mo def test_next_monday_or_tuesday(self): - self.assertEqual(next_monday_or_tuesday(self.sa), self.mo) - self.assertEqual(next_monday_or_tuesday(self.su), self.tu) - self.assertEqual(next_monday_or_tuesday(self.mo), self.tu) + assert next_monday_or_tuesday(self.sa) == self.mo + assert next_monday_or_tuesday(self.su) == self.tu + assert next_monday_or_tuesday(self.mo) == self.tu def test_previous_friday(self): - self.assertEqual(previous_friday(self.sa), self.fr) - self.assertEqual(previous_friday(self.su), self.fr) + assert 
previous_friday(self.sa) == self.fr + assert previous_friday(self.su) == self.fr def test_sunday_to_monday(self): - self.assertEqual(sunday_to_monday(self.su), self.mo) + assert sunday_to_monday(self.su) == self.mo def test_nearest_workday(self): - self.assertEqual(nearest_workday(self.sa), self.fr) - self.assertEqual(nearest_workday(self.su), self.mo) - self.assertEqual(nearest_workday(self.mo), self.mo) + assert nearest_workday(self.sa) == self.fr + assert nearest_workday(self.su) == self.mo + assert nearest_workday(self.mo) == self.mo def test_weekend_to_monday(self): - self.assertEqual(weekend_to_monday(self.sa), self.mo) - self.assertEqual(weekend_to_monday(self.su), self.mo) - self.assertEqual(weekend_to_monday(self.mo), self.mo) + assert weekend_to_monday(self.sa) == self.mo + assert weekend_to_monday(self.su) == self.mo + assert weekend_to_monday(self.mo) == self.mo def test_next_workday(self): - self.assertEqual(next_workday(self.sa), self.mo) - self.assertEqual(next_workday(self.su), self.mo) - self.assertEqual(next_workday(self.mo), self.tu) + assert next_workday(self.sa) == self.mo + assert next_workday(self.su) == self.mo + assert next_workday(self.mo) == self.tu def test_previous_workday(self): - self.assertEqual(previous_workday(self.sa), self.fr) - self.assertEqual(previous_workday(self.su), self.fr) - self.assertEqual(previous_workday(self.tu), self.mo) + assert previous_workday(self.sa) == self.fr + assert previous_workday(self.su) == self.fr + assert previous_workday(self.tu) == self.mo def test_before_nearest_workday(self): - self.assertEqual(before_nearest_workday(self.sa), self.th) - self.assertEqual(before_nearest_workday(self.su), self.fr) - self.assertEqual(before_nearest_workday(self.tu), self.mo) + assert before_nearest_workday(self.sa) == self.th + assert before_nearest_workday(self.su) == self.fr + assert before_nearest_workday(self.tu) == self.mo def test_after_nearest_workday(self): - self.assertEqual(after_nearest_workday(self.sa), self.mo) - self.assertEqual(after_nearest_workday(self.su), self.tu) - self.assertEqual(after_nearest_workday(self.fr), self.mo) + assert after_nearest_workday(self.sa) == self.mo + assert after_nearest_workday(self.su) == self.tu + assert after_nearest_workday(self.fr) == self.mo -class TestFederalHolidayCalendar(tm.TestCase): +class TestFederalHolidayCalendar(object): - # Test for issue 10278 - def test_no_mlk_before_1984(self): + def test_no_mlk_before_1986(self): + # see gh-10278 class MLKCalendar(AbstractHolidayCalendar): rules = [USMartinLutherKingJr] holidays = MLKCalendar().holidays(start='1984', end='1988').to_pydatetime().tolist() + # Testing to make sure holiday is not incorrectly observed before 1986 - self.assertEqual(holidays, [datetime(1986, 1, 20, 0, 0), datetime( - 1987, 1, 19, 0, 0)]) + assert holidays == [datetime(1986, 1, 20, 0, 0), + datetime(1987, 1, 19, 0, 0)] def test_memorial_day(self): class MemorialDay(AbstractHolidayCalendar): @@ -364,29 +362,24 @@ class MemorialDay(AbstractHolidayCalendar): holidays = MemorialDay().holidays(start='1971', end='1980').to_pydatetime().tolist() - # Fixes 5/31 error and checked manually against wikipedia - self.assertEqual(holidays, [datetime(1971, 5, 31, 0, 0), - datetime(1972, 5, 29, 0, 0), - datetime(1973, 5, 28, 0, 0), - datetime(1974, 5, 27, 0, - 0), datetime(1975, 5, 26, 0, 0), - datetime(1976, 5, 31, 0, - 0), datetime(1977, 5, 30, 0, 0), - datetime(1978, 5, 29, 0, - 0), datetime(1979, 5, 28, 0, 0)]) + # Fixes 5/31 error and checked manually against Wikipedia + assert 
holidays == [datetime(1971, 5, 31, 0, 0), + datetime(1972, 5, 29, 0, 0), + datetime(1973, 5, 28, 0, 0), + datetime(1974, 5, 27, 0, 0), + datetime(1975, 5, 26, 0, 0), + datetime(1976, 5, 31, 0, 0), + datetime(1977, 5, 30, 0, 0), + datetime(1978, 5, 29, 0, 0), + datetime(1979, 5, 28, 0, 0)] -class TestHolidayConflictingArguments(tm.TestCase): - # GH 10217 +class TestHolidayConflictingArguments(object): def test_both_offset_observance_raises(self): - with self.assertRaises(NotImplementedError): + # see gh-10217 + with pytest.raises(NotImplementedError): Holiday("Cyber Monday", month=11, day=1, offset=[DateOffset(weekday=SA(4))], observance=next_monday) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tseries/tests/test_offsets.py b/pandas/tests/tseries/test_offsets.py similarity index 83% rename from pandas/tseries/tests/test_offsets.py rename to pandas/tests/tseries/test_offsets.py index 768e9212e6c42..e03b3e0a85e5e 100644 --- a/pandas/tseries/tests/test_offsets.py +++ b/pandas/tests/tseries/test_offsets.py @@ -2,10 +2,10 @@ from distutils.version import LooseVersion from datetime import date, datetime, timedelta from dateutil.relativedelta import relativedelta + +import pytest from pandas.compat import range, iteritems from pandas import compat -import nose -from nose.tools import assert_raises import numpy as np @@ -15,7 +15,8 @@ from pandas.tseries.frequencies import (_offset_map, get_freq_code, _get_freq_str, _INVALID_FREQ_ERROR, get_offset, get_standard_freq) -from pandas.tseries.index import _to_m8, DatetimeIndex, _daterange_cache +from pandas.core.indexes.datetimes import ( + _to_m8, DatetimeIndex, _daterange_cache) from pandas.tseries.offsets import (BDay, CDay, BQuarterEnd, BMonthEnd, BusinessHour, WeekOfMonth, CBMonthEnd, CustomBusinessHour, WeekDay, @@ -27,18 +28,16 @@ QuarterEnd, BusinessMonthEnd, FY5253, Milli, Nano, Easter, FY5253Quarter, LastWeekOfMonth, CacheableOffset) -from pandas.tseries.tools import (format, ole2datetime, parse_time_string, - to_datetime, DateParseError) +from pandas.core.tools.datetimes import ( + format, ole2datetime, parse_time_string, + to_datetime, DateParseError) import pandas.tseries.offsets as offsets from pandas.io.pickle import read_pickle -from pandas.tslib import normalize_date, NaT, Timestamp, Timedelta -import pandas.tslib as tslib -from pandas.util.testing import assertRaisesRegexp +from pandas._libs.tslib import normalize_date, NaT, Timestamp, Timedelta +import pandas._libs.tslib as tslib import pandas.util.testing as tm from pandas.tseries.holiday import USFederalHolidayCalendar -_multiprocess_can_split_ = True - def test_monthrange(): import calendar @@ -60,7 +59,8 @@ def test_ole2datetime(): actual = ole2datetime(60000) assert actual == datetime(2064, 4, 8) - assert_raises(ValueError, ole2datetime, 60) + with pytest.raises(ValueError): + ole2datetime(60) def test_to_datetime1(): @@ -83,7 +83,7 @@ def test_normalize_date(): def test_to_m8(): valb = datetime(2007, 10, 1) valu = _to_m8(valb) - tm.assertIsInstance(valu, np.datetime64) + assert isinstance(valu, np.datetime64) # assert valu == np.datetime64(datetime(2007,10,1)) # def test_datetime64_box(): @@ -97,7 +97,7 @@ def test_to_m8(): ##### -class Base(tm.TestCase): +class Base(object): _offset = None _offset_types = [getattr(offsets, o) for o in offsets.__all__] @@ -145,29 +145,27 @@ def test_apply_out_of_range(self): offset = self._get_offset(self._offset, value=10000) result = 
Timestamp('20080101') + offset - self.assertIsInstance(result, datetime) - self.assertIsNone(result.tzinfo) + assert isinstance(result, datetime) + assert result.tzinfo is None - tm._skip_if_no_pytz() - tm._skip_if_no_dateutil() # Check tz is preserved for tz in self.timezones: t = Timestamp('20080101', tz=tz) result = t + offset - self.assertIsInstance(result, datetime) - self.assertEqual(t.tzinfo, result.tzinfo) + assert isinstance(result, datetime) + assert t.tzinfo == result.tzinfo - except (tslib.OutOfBoundsDatetime): + except tslib.OutOfBoundsDatetime: raise except (ValueError, KeyError) as e: - raise nose.SkipTest( + pytest.skip( "cannot create out_of_range offset: {0} {1}".format( str(self).split('.')[-1], e)) class TestCommon(Base): - def setUp(self): + def setup_method(self, method): # exected value created by Base._get_offset # are applied to 2011/01/01 09:00 (Saturday) # used for .apply and .rollforward @@ -218,25 +216,25 @@ def test_return_type(self): # make sure that we are returning a Timestamp result = Timestamp('20080101') + offset - self.assertIsInstance(result, Timestamp) + assert isinstance(result, Timestamp) # make sure that we are returning NaT - self.assertTrue(NaT + offset is NaT) - self.assertTrue(offset + NaT is NaT) + assert NaT + offset is NaT + assert offset + NaT is NaT - self.assertTrue(NaT - offset is NaT) - self.assertTrue((-offset).apply(NaT) is NaT) + assert NaT - offset is NaT + assert (-offset).apply(NaT) is NaT def test_offset_n(self): for offset_klass in self.offset_types: offset = self._get_offset(offset_klass) - self.assertEqual(offset.n, 1) + assert offset.n == 1 neg_offset = offset * -1 - self.assertEqual(neg_offset.n, -1) + assert neg_offset.n == -1 mul_offset = offset * 3 - self.assertEqual(mul_offset.n, 3) + assert mul_offset.n == 3 def test_offset_freqstr(self): for offset_klass in self.offset_types: @@ -247,7 +245,7 @@ def test_offset_freqstr(self): "", 'LWOM-SAT', ): code = get_offset(freqstr) - self.assertEqual(offset.rule_code, code) + assert offset.rule_code == code def _check_offsetfunc_works(self, offset, funcname, dt, expected, normalize=False): @@ -255,12 +253,12 @@ def _check_offsetfunc_works(self, offset, funcname, dt, expected, func = getattr(offset_s, funcname) result = func(dt) - self.assertTrue(isinstance(result, Timestamp)) - self.assertEqual(result, expected) + assert isinstance(result, Timestamp) + assert result == expected result = func(Timestamp(dt)) - self.assertTrue(isinstance(result, Timestamp)) - self.assertEqual(result, expected) + assert isinstance(result, Timestamp) + assert result == expected # see gh-14101 exp_warning = None @@ -275,31 +273,28 @@ def _check_offsetfunc_works(self, offset, funcname, dt, expected, with tm.assert_produces_warning(exp_warning, check_stacklevel=False): result = func(ts) - self.assertTrue(isinstance(result, Timestamp)) + assert isinstance(result, Timestamp) if normalize is False: - self.assertEqual(result, expected + Nano(5)) + assert result == expected + Nano(5) else: - self.assertEqual(result, expected) + assert result == expected if isinstance(dt, np.datetime64): # test tz when input is datetime or Timestamp return - tm._skip_if_no_pytz() - tm._skip_if_no_dateutil() - for tz in self.timezones: expected_localize = expected.tz_localize(tz) tz_obj = tslib.maybe_get_tz(tz) dt_tz = tslib._localize_pydatetime(dt, tz_obj) result = func(dt_tz) - self.assertTrue(isinstance(result, Timestamp)) - self.assertEqual(result, expected_localize) + assert isinstance(result, Timestamp) + assert result == 
expected_localize result = func(Timestamp(dt, tz=tz)) - self.assertTrue(isinstance(result, Timestamp)) - self.assertEqual(result, expected_localize) + assert isinstance(result, Timestamp) + assert result == expected_localize # see gh-14101 exp_warning = None @@ -314,11 +309,11 @@ def _check_offsetfunc_works(self, offset, funcname, dt, expected, with tm.assert_produces_warning(exp_warning, check_stacklevel=False): result = func(ts) - self.assertTrue(isinstance(result, Timestamp)) + assert isinstance(result, Timestamp) if normalize is False: - self.assertEqual(result, expected_localize + Nano(5)) + assert result == expected_localize + Nano(5) else: - self.assertEqual(result, expected_localize) + assert result == expected_localize def test_apply(self): sdt = datetime(2011, 1, 1, 9, 0) @@ -442,18 +437,18 @@ def test_onOffset(self): for offset in self.offset_types: dt = self.expecteds[offset.__name__] offset_s = self._get_offset(offset) - self.assertTrue(offset_s.onOffset(dt)) + assert offset_s.onOffset(dt) # when normalize=True, onOffset checks time is 00:00:00 offset_n = self._get_offset(offset, normalize=True) - self.assertFalse(offset_n.onOffset(dt)) + assert not offset_n.onOffset(dt) if offset in (BusinessHour, CustomBusinessHour): # In default BusinessHour (9:00-17:00), normalized time # cannot be in business hour range continue date = datetime(dt.year, dt.month, dt.day) - self.assertTrue(offset_n.onOffset(date)) + assert offset_n.onOffset(date) def test_add(self): dt = datetime(2011, 1, 1, 9, 0) @@ -465,15 +460,14 @@ def test_add(self): result_dt = dt + offset_s result_ts = Timestamp(dt) + offset_s for result in [result_dt, result_ts]: - self.assertTrue(isinstance(result, Timestamp)) - self.assertEqual(result, expected) + assert isinstance(result, Timestamp) + assert result == expected - tm._skip_if_no_pytz() for tz in self.timezones: expected_localize = expected.tz_localize(tz) result = Timestamp(dt, tz=tz) + offset_s - self.assertTrue(isinstance(result, Timestamp)) - self.assertEqual(result, expected_localize) + assert isinstance(result, Timestamp) + assert result == expected_localize # normalize=True offset_s = self._get_offset(offset, normalize=True) @@ -482,14 +476,14 @@ def test_add(self): result_dt = dt + offset_s result_ts = Timestamp(dt) + offset_s for result in [result_dt, result_ts]: - self.assertTrue(isinstance(result, Timestamp)) - self.assertEqual(result, expected) + assert isinstance(result, Timestamp) + assert result == expected for tz in self.timezones: expected_localize = expected.tz_localize(tz) result = Timestamp(dt, tz=tz) + offset_s - self.assertTrue(isinstance(result, Timestamp)) - self.assertEqual(result, expected_localize) + assert isinstance(result, Timestamp) + assert result == expected_localize def test_pickle_v0_15_2(self): offsets = {'DateOffset': DateOffset(years=1), @@ -506,9 +500,8 @@ def test_pickle_v0_15_2(self): class TestDateOffset(Base): - _multiprocess_can_split_ = True - def setUp(self): + def setup_method(self, method): self.d = Timestamp(datetime(2008, 1, 2)) _offset_map.clear() @@ -542,14 +535,13 @@ def test_eq(self): offset1 = DateOffset(days=1) offset2 = DateOffset(days=365) - self.assertNotEqual(offset1, offset2) + assert offset1 != offset2 class TestBusinessDay(Base): - _multiprocess_can_split_ = True _offset = BDay - def setUp(self): + def setup_method(self, method): self.d = datetime(2008, 1, 1) self.offset = BDay() @@ -560,10 +552,10 @@ def test_different_normalize_equals(self): offset = BDay() offset2 = BDay() offset2.normalize = True - 
self.assertEqual(offset, offset2) + assert offset == offset2 def test_repr(self): - self.assertEqual(repr(self.offset), '') + assert repr(self.offset) == '' assert repr(self.offset2) == '<2 * BusinessDays>' expected = '' @@ -575,49 +567,49 @@ def test_with_offset(self): assert (self.d + offset) == datetime(2008, 1, 2, 2) def testEQ(self): - self.assertEqual(self.offset2, self.offset2) + assert self.offset2 == self.offset2 def test_mul(self): pass def test_hash(self): - self.assertEqual(hash(self.offset2), hash(self.offset2)) + assert hash(self.offset2) == hash(self.offset2) def testCall(self): - self.assertEqual(self.offset2(self.d), datetime(2008, 1, 3)) + assert self.offset2(self.d) == datetime(2008, 1, 3) def testRAdd(self): - self.assertEqual(self.d + self.offset2, self.offset2 + self.d) + assert self.d + self.offset2 == self.offset2 + self.d def testSub(self): off = self.offset2 - self.assertRaises(Exception, off.__sub__, self.d) - self.assertEqual(2 * off - off, off) + pytest.raises(Exception, off.__sub__, self.d) + assert 2 * off - off == off - self.assertEqual(self.d - self.offset2, self.d + BDay(-2)) + assert self.d - self.offset2 == self.d + BDay(-2) def testRSub(self): - self.assertEqual(self.d - self.offset2, (-self.offset2).apply(self.d)) + assert self.d - self.offset2 == (-self.offset2).apply(self.d) def testMult1(self): - self.assertEqual(self.d + 10 * self.offset, self.d + BDay(10)) + assert self.d + 10 * self.offset == self.d + BDay(10) def testMult2(self): - self.assertEqual(self.d + (-5 * BDay(-10)), self.d + BDay(50)) + assert self.d + (-5 * BDay(-10)) == self.d + BDay(50) def testRollback1(self): - self.assertEqual(BDay(10).rollback(self.d), self.d) + assert BDay(10).rollback(self.d) == self.d def testRollback2(self): - self.assertEqual( - BDay(10).rollback(datetime(2008, 1, 5)), datetime(2008, 1, 4)) + assert (BDay(10).rollback(datetime(2008, 1, 5)) == + datetime(2008, 1, 4)) def testRollforward1(self): - self.assertEqual(BDay(10).rollforward(self.d), self.d) + assert BDay(10).rollforward(self.d) == self.d def testRollforward2(self): - self.assertEqual( - BDay(10).rollforward(datetime(2008, 1, 5)), datetime(2008, 1, 7)) + assert (BDay(10).rollforward(datetime(2008, 1, 5)) == + datetime(2008, 1, 7)) def test_roll_date_object(self): offset = BDay() @@ -625,17 +617,17 @@ def test_roll_date_object(self): dt = date(2012, 9, 15) result = offset.rollback(dt) - self.assertEqual(result, datetime(2012, 9, 14)) + assert result == datetime(2012, 9, 14) result = offset.rollforward(dt) - self.assertEqual(result, datetime(2012, 9, 17)) + assert result == datetime(2012, 9, 17) offset = offsets.Day() result = offset.rollback(dt) - self.assertEqual(result, datetime(2012, 9, 15)) + assert result == datetime(2012, 9, 15) result = offset.rollforward(dt) - self.assertEqual(result, datetime(2012, 9, 15)) + assert result == datetime(2012, 9, 15) def test_onOffset(self): tests = [(BDay(), datetime(2008, 1, 1), True), @@ -693,41 +685,40 @@ def test_apply_large_n(self): dt = datetime(2012, 10, 23) result = dt + BDay(10) - self.assertEqual(result, datetime(2012, 11, 6)) + assert result == datetime(2012, 11, 6) result = dt + BDay(100) - BDay(100) - self.assertEqual(result, dt) + assert result == dt off = BDay() * 6 rs = datetime(2012, 1, 1) - off xp = datetime(2011, 12, 23) - self.assertEqual(rs, xp) + assert rs == xp st = datetime(2011, 12, 18) rs = st + off xp = datetime(2011, 12, 26) - self.assertEqual(rs, xp) + assert rs == xp off = BDay() * 10 rs = datetime(2014, 1, 5) + off # see #5890 xp = 
datetime(2014, 1, 17) - self.assertEqual(rs, xp) + assert rs == xp def test_apply_corner(self): - self.assertRaises(TypeError, BDay().apply, BMonthEnd()) + pytest.raises(TypeError, BDay().apply, BMonthEnd()) def test_offsets_compare_equal(self): # root cause of #456 offset1 = BDay() offset2 = BDay() - self.assertFalse(offset1 != offset2) + assert not offset1 != offset2 class TestBusinessHour(Base): - _multiprocess_can_split_ = True _offset = BusinessHour - def setUp(self): + def setup_method(self, method): self.d = datetime(2014, 7, 1, 10, 00) self.offset1 = BusinessHour() @@ -744,11 +735,11 @@ def setUp(self): def test_constructor_errors(self): from datetime import time as dt_time - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): BusinessHour(start=dt_time(11, 0, 5)) - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): BusinessHour(start='AAA') - with tm.assertRaises(ValueError): + with pytest.raises(ValueError): BusinessHour(start='14:00:05') def test_different_normalize_equals(self): @@ -756,125 +747,113 @@ def test_different_normalize_equals(self): offset = self._offset() offset2 = self._offset() offset2.normalize = True - self.assertEqual(offset, offset2) + assert offset == offset2 def test_repr(self): - self.assertEqual(repr(self.offset1), '') - self.assertEqual(repr(self.offset2), - '<3 * BusinessHours: BH=09:00-17:00>') - self.assertEqual(repr(self.offset3), - '<-1 * BusinessHour: BH=09:00-17:00>') - self.assertEqual(repr(self.offset4), - '<-4 * BusinessHours: BH=09:00-17:00>') - - self.assertEqual(repr(self.offset5), '') - self.assertEqual(repr(self.offset6), '') - self.assertEqual(repr(self.offset7), - '<-2 * BusinessHours: BH=21:30-06:30>') + assert repr(self.offset1) == '' + assert repr(self.offset2) == '<3 * BusinessHours: BH=09:00-17:00>' + assert repr(self.offset3) == '<-1 * BusinessHour: BH=09:00-17:00>' + assert repr(self.offset4) == '<-4 * BusinessHours: BH=09:00-17:00>' + + assert repr(self.offset5) == '' + assert repr(self.offset6) == '' + assert repr(self.offset7) == '<-2 * BusinessHours: BH=21:30-06:30>' def test_with_offset(self): expected = Timestamp('2014-07-01 13:00') - self.assertEqual(self.d + BusinessHour() * 3, expected) - self.assertEqual(self.d + BusinessHour(n=3), expected) + assert self.d + BusinessHour() * 3 == expected + assert self.d + BusinessHour(n=3) == expected def testEQ(self): for offset in [self.offset1, self.offset2, self.offset3, self.offset4]: - self.assertEqual(offset, offset) + assert offset == offset - self.assertNotEqual(BusinessHour(), BusinessHour(-1)) - self.assertEqual(BusinessHour(start='09:00'), BusinessHour()) - self.assertNotEqual(BusinessHour(start='09:00'), - BusinessHour(start='09:01')) - self.assertNotEqual(BusinessHour(start='09:00', end='17:00'), - BusinessHour(start='17:00', end='09:01')) + assert BusinessHour() != BusinessHour(-1) + assert BusinessHour(start='09:00') == BusinessHour() + assert BusinessHour(start='09:00') != BusinessHour(start='09:01') + assert (BusinessHour(start='09:00', end='17:00') != + BusinessHour(start='17:00', end='09:01')) def test_hash(self): for offset in [self.offset1, self.offset2, self.offset3, self.offset4]: - self.assertEqual(hash(offset), hash(offset)) + assert hash(offset) == hash(offset) def testCall(self): - self.assertEqual(self.offset1(self.d), datetime(2014, 7, 1, 11)) - self.assertEqual(self.offset2(self.d), datetime(2014, 7, 1, 13)) - self.assertEqual(self.offset3(self.d), datetime(2014, 6, 30, 17)) - self.assertEqual(self.offset4(self.d), 
datetime(2014, 6, 30, 14)) + assert self.offset1(self.d) == datetime(2014, 7, 1, 11) + assert self.offset2(self.d) == datetime(2014, 7, 1, 13) + assert self.offset3(self.d) == datetime(2014, 6, 30, 17) + assert self.offset4(self.d) == datetime(2014, 6, 30, 14) def testRAdd(self): - self.assertEqual(self.d + self.offset2, self.offset2 + self.d) + assert self.d + self.offset2 == self.offset2 + self.d def testSub(self): off = self.offset2 - self.assertRaises(Exception, off.__sub__, self.d) - self.assertEqual(2 * off - off, off) + pytest.raises(Exception, off.__sub__, self.d) + assert 2 * off - off == off - self.assertEqual(self.d - self.offset2, self.d + self._offset(-3)) + assert self.d - self.offset2 == self.d + self._offset(-3) def testRSub(self): - self.assertEqual(self.d - self.offset2, (-self.offset2).apply(self.d)) + assert self.d - self.offset2 == (-self.offset2).apply(self.d) def testMult1(self): - self.assertEqual(self.d + 5 * self.offset1, self.d + self._offset(5)) + assert self.d + 5 * self.offset1 == self.d + self._offset(5) def testMult2(self): - self.assertEqual(self.d + (-3 * self._offset(-2)), - self.d + self._offset(6)) + assert self.d + (-3 * self._offset(-2)) == self.d + self._offset(6) def testRollback1(self): - self.assertEqual(self.offset1.rollback(self.d), self.d) - self.assertEqual(self.offset2.rollback(self.d), self.d) - self.assertEqual(self.offset3.rollback(self.d), self.d) - self.assertEqual(self.offset4.rollback(self.d), self.d) - self.assertEqual(self.offset5.rollback(self.d), - datetime(2014, 6, 30, 14, 30)) - self.assertEqual(self.offset6.rollback( - self.d), datetime(2014, 7, 1, 5, 0)) - self.assertEqual(self.offset7.rollback( - self.d), datetime(2014, 7, 1, 6, 30)) + assert self.offset1.rollback(self.d) == self.d + assert self.offset2.rollback(self.d) == self.d + assert self.offset3.rollback(self.d) == self.d + assert self.offset4.rollback(self.d) == self.d + assert self.offset5.rollback(self.d) == datetime(2014, 6, 30, 14, 30) + assert self.offset6.rollback(self.d) == datetime(2014, 7, 1, 5, 0) + assert self.offset7.rollback(self.d) == datetime(2014, 7, 1, 6, 30) d = datetime(2014, 7, 1, 0) - self.assertEqual(self.offset1.rollback(d), datetime(2014, 6, 30, 17)) - self.assertEqual(self.offset2.rollback(d), datetime(2014, 6, 30, 17)) - self.assertEqual(self.offset3.rollback(d), datetime(2014, 6, 30, 17)) - self.assertEqual(self.offset4.rollback(d), datetime(2014, 6, 30, 17)) - self.assertEqual(self.offset5.rollback( - d), datetime(2014, 6, 30, 14, 30)) - self.assertEqual(self.offset6.rollback(d), d) - self.assertEqual(self.offset7.rollback(d), d) + assert self.offset1.rollback(d) == datetime(2014, 6, 30, 17) + assert self.offset2.rollback(d) == datetime(2014, 6, 30, 17) + assert self.offset3.rollback(d) == datetime(2014, 6, 30, 17) + assert self.offset4.rollback(d) == datetime(2014, 6, 30, 17) + assert self.offset5.rollback(d) == datetime(2014, 6, 30, 14, 30) + assert self.offset6.rollback(d) == d + assert self.offset7.rollback(d) == d - self.assertEqual(self._offset(5).rollback(self.d), self.d) + assert self._offset(5).rollback(self.d) == self.d def testRollback2(self): - self.assertEqual(self._offset(-3) - .rollback(datetime(2014, 7, 5, 15, 0)), - datetime(2014, 7, 4, 17, 0)) + assert (self._offset(-3).rollback(datetime(2014, 7, 5, 15, 0)) == + datetime(2014, 7, 4, 17, 0)) def testRollforward1(self): - self.assertEqual(self.offset1.rollforward(self.d), self.d) - self.assertEqual(self.offset2.rollforward(self.d), self.d) - 
self.assertEqual(self.offset3.rollforward(self.d), self.d) - self.assertEqual(self.offset4.rollforward(self.d), self.d) - self.assertEqual(self.offset5.rollforward( - self.d), datetime(2014, 7, 1, 11, 0)) - self.assertEqual(self.offset6.rollforward( - self.d), datetime(2014, 7, 1, 20, 0)) - self.assertEqual(self.offset7.rollforward( - self.d), datetime(2014, 7, 1, 21, 30)) + assert self.offset1.rollforward(self.d) == self.d + assert self.offset2.rollforward(self.d) == self.d + assert self.offset3.rollforward(self.d) == self.d + assert self.offset4.rollforward(self.d) == self.d + assert (self.offset5.rollforward(self.d) == + datetime(2014, 7, 1, 11, 0)) + assert (self.offset6.rollforward(self.d) == + datetime(2014, 7, 1, 20, 0)) + assert (self.offset7.rollforward(self.d) == + datetime(2014, 7, 1, 21, 30)) d = datetime(2014, 7, 1, 0) - self.assertEqual(self.offset1.rollforward(d), datetime(2014, 7, 1, 9)) - self.assertEqual(self.offset2.rollforward(d), datetime(2014, 7, 1, 9)) - self.assertEqual(self.offset3.rollforward(d), datetime(2014, 7, 1, 9)) - self.assertEqual(self.offset4.rollforward(d), datetime(2014, 7, 1, 9)) - self.assertEqual(self.offset5.rollforward(d), datetime(2014, 7, 1, 11)) - self.assertEqual(self.offset6.rollforward(d), d) - self.assertEqual(self.offset7.rollforward(d), d) + assert self.offset1.rollforward(d) == datetime(2014, 7, 1, 9) + assert self.offset2.rollforward(d) == datetime(2014, 7, 1, 9) + assert self.offset3.rollforward(d) == datetime(2014, 7, 1, 9) + assert self.offset4.rollforward(d) == datetime(2014, 7, 1, 9) + assert self.offset5.rollforward(d) == datetime(2014, 7, 1, 11) + assert self.offset6.rollforward(d) == d + assert self.offset7.rollforward(d) == d - self.assertEqual(self._offset(5).rollforward(self.d), self.d) + assert self._offset(5).rollforward(self.d) == self.d def testRollforward2(self): - self.assertEqual(self._offset(-3) - .rollforward(datetime(2014, 7, 5, 16, 0)), - datetime(2014, 7, 7, 9)) + assert (self._offset(-3).rollforward(datetime(2014, 7, 5, 16, 0)) == + datetime(2014, 7, 7, 9)) def test_roll_date_object(self): offset = BusinessHour() @@ -882,10 +861,10 @@ def test_roll_date_object(self): dt = datetime(2014, 7, 6, 15, 0) result = offset.rollback(dt) - self.assertEqual(result, datetime(2014, 7, 4, 17)) + assert result == datetime(2014, 7, 4, 17) result = offset.rollforward(dt) - self.assertEqual(result, datetime(2014, 7, 7, 9)) + assert result == datetime(2014, 7, 7, 9) def test_normalize(self): tests = [] @@ -927,7 +906,7 @@ def test_normalize(self): for offset, cases in tests: for dt, expected in compat.iteritems(cases): - self.assertEqual(offset.apply(dt), expected) + assert offset.apply(dt) == expected def test_onOffset(self): tests = [] @@ -966,7 +945,7 @@ def test_onOffset(self): for offset, cases in tests: for dt, expected in compat.iteritems(cases): - self.assertEqual(offset.onOffset(dt), expected) + assert offset.onOffset(dt) == expected def test_opening_time(self): tests = [] @@ -1130,8 +1109,8 @@ def test_opening_time(self): for _offsets, cases in tests: for offset in _offsets: for dt, (exp_next, exp_prev) in compat.iteritems(cases): - self.assertEqual(offset._next_opening_time(dt), exp_next) - self.assertEqual(offset._prev_opening_time(dt), exp_prev) + assert offset._next_opening_time(dt) == exp_next + assert offset._prev_opening_time(dt) == exp_prev def test_apply(self): tests = [] @@ -1392,7 +1371,7 @@ def test_offsets_compare_equal(self): # root cause of #456 offset1 = self._offset() offset2 = self._offset() - 
-        self.assertFalse(offset1 != offset2)
+        assert not offset1 != offset2

     def test_datetimeindex(self):
         idx1 = DatetimeIndex(start='2014-07-04 15:00', end='2014-07-08 10:00',
@@ -1431,10 +1410,9 @@ def test_datetimeindex(self):


 class TestCustomBusinessHour(Base):
-    _multiprocess_can_split_ = True
     _offset = CustomBusinessHour

-    def setUp(self):
+    def setup_method(self, method):
         # 2014 Calendar to check custom holidays
         #   Sun Mon Tue Wed Thu Fri Sat
         #  6/22  23  24  25  26  27  28
@@ -1449,11 +1427,11 @@ def setUp(self):

     def test_constructor_errors(self):
         from datetime import time as dt_time
-        with tm.assertRaises(ValueError):
+        with pytest.raises(ValueError):
             CustomBusinessHour(start=dt_time(11, 0, 5))
-        with tm.assertRaises(ValueError):
+        with pytest.raises(ValueError):
             CustomBusinessHour(start='AAA')
-        with tm.assertRaises(ValueError):
+        with pytest.raises(ValueError):
             CustomBusinessHour(start='14:00:05')

     def test_different_normalize_equals(self):
@@ -1461,93 +1439,89 @@ def test_different_normalize_equals(self):
         offset = self._offset()
         offset2 = self._offset()
         offset2.normalize = True
-        self.assertEqual(offset, offset2)
+        assert offset == offset2

     def test_repr(self):
-        self.assertEqual(repr(self.offset1),
-                         '<CustomBusinessHour: CBH=09:00-17:00>')
-        self.assertEqual(repr(self.offset2),
-                         '<CustomBusinessHour: CBH=09:00-17:00>')
+        assert repr(self.offset1) == '<CustomBusinessHour: CBH=09:00-17:00>'
+        assert repr(self.offset2) == '<CustomBusinessHour: CBH=09:00-17:00>'

     def test_with_offset(self):
         expected = Timestamp('2014-07-01 13:00')
-        self.assertEqual(self.d + CustomBusinessHour() * 3, expected)
-        self.assertEqual(self.d + CustomBusinessHour(n=3), expected)
+        assert self.d + CustomBusinessHour() * 3 == expected
+        assert self.d + CustomBusinessHour(n=3) == expected

     def testEQ(self):
         for offset in [self.offset1, self.offset2]:
-            self.assertEqual(offset, offset)
+            assert offset == offset

-        self.assertNotEqual(CustomBusinessHour(), CustomBusinessHour(-1))
-        self.assertEqual(CustomBusinessHour(start='09:00'),
-                         CustomBusinessHour())
-        self.assertNotEqual(CustomBusinessHour(start='09:00'),
-                            CustomBusinessHour(start='09:01'))
-        self.assertNotEqual(CustomBusinessHour(start='09:00', end='17:00'),
-                            CustomBusinessHour(start='17:00', end='09:01'))
+        assert CustomBusinessHour() != CustomBusinessHour(-1)
+        assert (CustomBusinessHour(start='09:00') ==
+                CustomBusinessHour())
+        assert (CustomBusinessHour(start='09:00') !=
+                CustomBusinessHour(start='09:01'))
+        assert (CustomBusinessHour(start='09:00', end='17:00') !=
+                CustomBusinessHour(start='17:00', end='09:01'))

-        self.assertNotEqual(CustomBusinessHour(weekmask='Tue Wed Thu Fri'),
-                            CustomBusinessHour(weekmask='Mon Tue Wed Thu Fri'))
-        self.assertNotEqual(CustomBusinessHour(holidays=['2014-06-27']),
-                            CustomBusinessHour(holidays=['2014-06-28']))
+        assert (CustomBusinessHour(weekmask='Tue Wed Thu Fri') !=
+                CustomBusinessHour(weekmask='Mon Tue Wed Thu Fri'))
+        assert (CustomBusinessHour(holidays=['2014-06-27']) !=
+                CustomBusinessHour(holidays=['2014-06-28']))

     def test_hash(self):
-        self.assertEqual(hash(self.offset1), hash(self.offset1))
-        self.assertEqual(hash(self.offset2), hash(self.offset2))
+        assert hash(self.offset1) == hash(self.offset1)
+        assert hash(self.offset2) == hash(self.offset2)

     def testCall(self):
-        self.assertEqual(self.offset1(self.d), datetime(2014, 7, 1, 11))
-        self.assertEqual(self.offset2(self.d), datetime(2014, 7, 1, 11))
+        assert self.offset1(self.d) == datetime(2014, 7, 1, 11)
+        assert self.offset2(self.d) == datetime(2014, 7, 1, 11)

     def testRAdd(self):
-        self.assertEqual(self.d + self.offset2, self.offset2 + self.d)
+        assert self.d + self.offset2 == self.offset2 + self.d

     def testSub(self):
         off = self.offset2
-        self.assertRaises(Exception, off.__sub__, self.d)
-        self.assertEqual(2 * off - off, off)
+        pytest.raises(Exception, off.__sub__, self.d)
+        assert 2 * off - off == off

-        self.assertEqual(self.d - self.offset2, self.d - (2 * off - off))
+        assert self.d - self.offset2 == self.d - (2 * off - off)

     def testRSub(self):
-        self.assertEqual(self.d - self.offset2, (-self.offset2).apply(self.d))
+        assert self.d - self.offset2 == (-self.offset2).apply(self.d)

     def testMult1(self):
-        self.assertEqual(self.d + 5 * self.offset1, self.d + self._offset(5))
+        assert self.d + 5 * self.offset1 == self.d + self._offset(5)

     def testMult2(self):
-        self.assertEqual(self.d + (-3 * self._offset(-2)),
-                         self.d + self._offset(6))
+        assert self.d + (-3 * self._offset(-2)) == self.d + self._offset(6)

     def testRollback1(self):
-        self.assertEqual(self.offset1.rollback(self.d), self.d)
-        self.assertEqual(self.offset2.rollback(self.d), self.d)
+        assert self.offset1.rollback(self.d) == self.d
+        assert self.offset2.rollback(self.d) == self.d

         d = datetime(2014, 7, 1, 0)
+
         # 2014/07/01 is Tuesday, 06/30 is Monday(holiday)
-        self.assertEqual(self.offset1.rollback(d), datetime(2014, 6, 27, 17))
+        assert self.offset1.rollback(d) == datetime(2014, 6, 27, 17)

         # 2014/6/30 and 2014/6/27 are holidays
-        self.assertEqual(self.offset2.rollback(d), datetime(2014, 6, 26, 17))
+        assert self.offset2.rollback(d) == datetime(2014, 6, 26, 17)

     def testRollback2(self):
-        self.assertEqual(self._offset(-3)
-                         .rollback(datetime(2014, 7, 5, 15, 0)),
-                         datetime(2014, 7, 4, 17, 0))
+        assert (self._offset(-3).rollback(datetime(2014, 7, 5, 15, 0)) ==
+                datetime(2014, 7, 4, 17, 0))

     def testRollforward1(self):
-        self.assertEqual(self.offset1.rollforward(self.d), self.d)
-        self.assertEqual(self.offset2.rollforward(self.d), self.d)
+        assert self.offset1.rollforward(self.d) == self.d
+        assert self.offset2.rollforward(self.d) == self.d

         d = datetime(2014, 7, 1, 0)
-        self.assertEqual(self.offset1.rollforward(d), datetime(2014, 7, 1, 9))
-        self.assertEqual(self.offset2.rollforward(d), datetime(2014, 7, 1, 9))
+        assert self.offset1.rollforward(d) == datetime(2014, 7, 1, 9)
+        assert self.offset2.rollforward(d) == datetime(2014, 7, 1, 9)

     def testRollforward2(self):
-        self.assertEqual(self._offset(-3)
-                         .rollforward(datetime(2014, 7, 5, 16, 0)),
-                         datetime(2014, 7, 7, 9))
+        assert (self._offset(-3).rollforward(datetime(2014, 7, 5, 16, 0)) ==
+                datetime(2014, 7, 7, 9))

     def test_roll_date_object(self):
         offset = BusinessHour()
@@ -1555,10 +1529,10 @@ def test_roll_date_object(self):
         dt = datetime(2014, 7, 6, 15, 0)

         result = offset.rollback(dt)
-        self.assertEqual(result, datetime(2014, 7, 4, 17))
+        assert result == datetime(2014, 7, 4, 17)

         result = offset.rollforward(dt)
-        self.assertEqual(result, datetime(2014, 7, 7, 9))
+        assert result == datetime(2014, 7, 7, 9)

     def test_normalize(self):
         tests = []
@@ -1602,7 +1576,7 @@ def test_normalize(self):

         for offset, cases in tests:
             for dt, expected in compat.iteritems(cases):
-                self.assertEqual(offset.apply(dt), expected)
+                assert offset.apply(dt) == expected

     def test_onOffset(self):
         tests = []
@@ -1618,7 +1592,7 @@ def test_onOffset(self):

         for offset, cases in tests:
             for dt, expected in compat.iteritems(cases):
-                self.assertEqual(offset.onOffset(dt), expected)
+                assert offset.onOffset(dt) == expected

     def test_apply(self):
         tests = []
@@ -1692,10 +1666,9 @@ def test_apply_nanoseconds(self):


 class TestCustomBusinessDay(Base):
-    _multiprocess_can_split_ = True
     _offset = CDay

-    def setUp(self):
+    def setup_method(self, method):
         self.d = datetime(2008, 1, 1)
         self.nd = np_datetime64_compat('2008-01-01 00:00:00Z')
@@ -1707,7 +1680,7 @@ def test_different_normalize_equals(self):
         offset = CDay()
         offset2 = CDay()
         offset2.normalize = True
-        self.assertEqual(offset, offset2)
+        assert offset == offset2

     def test_repr(self):
         assert repr(self.offset) == '<CustomBusinessDay>'
@@ -1722,50 +1695,50 @@ def test_with_offset(self):
         assert (self.d + offset) == datetime(2008, 1, 2, 2)

     def testEQ(self):
-        self.assertEqual(self.offset2, self.offset2)
+        assert self.offset2 == self.offset2

     def test_mul(self):
         pass

     def test_hash(self):
-        self.assertEqual(hash(self.offset2), hash(self.offset2))
+        assert hash(self.offset2) == hash(self.offset2)

     def testCall(self):
-        self.assertEqual(self.offset2(self.d), datetime(2008, 1, 3))
-        self.assertEqual(self.offset2(self.nd), datetime(2008, 1, 3))
+        assert self.offset2(self.d) == datetime(2008, 1, 3)
+        assert self.offset2(self.nd) == datetime(2008, 1, 3)

     def testRAdd(self):
-        self.assertEqual(self.d + self.offset2, self.offset2 + self.d)
+        assert self.d + self.offset2 == self.offset2 + self.d

     def testSub(self):
         off = self.offset2
-        self.assertRaises(Exception, off.__sub__, self.d)
-        self.assertEqual(2 * off - off, off)
+        pytest.raises(Exception, off.__sub__, self.d)
+        assert 2 * off - off == off

-        self.assertEqual(self.d - self.offset2, self.d + CDay(-2))
+        assert self.d - self.offset2 == self.d + CDay(-2)

     def testRSub(self):
-        self.assertEqual(self.d - self.offset2, (-self.offset2).apply(self.d))
+        assert self.d - self.offset2 == (-self.offset2).apply(self.d)

     def testMult1(self):
-        self.assertEqual(self.d + 10 * self.offset, self.d + CDay(10))
+        assert self.d + 10 * self.offset == self.d + CDay(10)

     def testMult2(self):
-        self.assertEqual(self.d + (-5 * CDay(-10)), self.d + CDay(50))
+        assert self.d + (-5 * CDay(-10)) == self.d + CDay(50)

     def testRollback1(self):
-        self.assertEqual(CDay(10).rollback(self.d), self.d)
+        assert CDay(10).rollback(self.d) == self.d

     def testRollback2(self):
-        self.assertEqual(
-            CDay(10).rollback(datetime(2008, 1, 5)), datetime(2008, 1, 4))
+        assert (CDay(10).rollback(datetime(2008, 1, 5)) ==
+                datetime(2008, 1, 4))

     def testRollforward1(self):
-        self.assertEqual(CDay(10).rollforward(self.d), self.d)
+        assert CDay(10).rollforward(self.d) == self.d

     def testRollforward2(self):
-        self.assertEqual(
-            CDay(10).rollforward(datetime(2008, 1, 5)), datetime(2008, 1, 7))
+        assert (CDay(10).rollforward(datetime(2008, 1, 5)) ==
+                datetime(2008, 1, 7))

     def test_roll_date_object(self):
         offset = CDay()
@@ -1773,17 +1746,17 @@ def test_roll_date_object(self):
         dt = date(2012, 9, 15)

         result = offset.rollback(dt)
-        self.assertEqual(result, datetime(2012, 9, 14))
+        assert result == datetime(2012, 9, 14)

         result = offset.rollforward(dt)
-        self.assertEqual(result, datetime(2012, 9, 17))
+        assert result == datetime(2012, 9, 17)

         offset = offsets.Day()
         result = offset.rollback(dt)
-        self.assertEqual(result, datetime(2012, 9, 15))
+        assert result == datetime(2012, 9, 15)

         result = offset.rollforward(dt)
-        self.assertEqual(result, datetime(2012, 9, 15))
+        assert result == datetime(2012, 9, 15)

     def test_onOffset(self):
         tests = [(CDay(), datetime(2008, 1, 1), True),
@@ -1842,29 +1815,29 @@ def test_apply_large_n(self):
         dt = datetime(2012, 10, 23)

         result = dt + CDay(10)
-        self.assertEqual(result, datetime(2012, 11, 6))
+        assert result == datetime(2012, 11, 6)

         result = dt + CDay(100) - CDay(100)
-        self.assertEqual(result, dt)
+        assert result == dt

         off = CDay() * 6
         rs = datetime(2012, 1, 1) - off
         xp = datetime(2011, 12, 23)
-        self.assertEqual(rs, xp)
+        assert rs == xp

         st = datetime(2011, 12, 18)
         rs = st + off
         xp = datetime(2011, 12, 26)
-        self.assertEqual(rs, xp)
+        assert rs == xp

     def test_apply_corner(self):
-        self.assertRaises(Exception, CDay().apply, BMonthEnd())
+        pytest.raises(Exception, CDay().apply, BMonthEnd())

     def test_offsets_compare_equal(self):
         # root cause of #456
         offset1 = CDay()
         offset2 = CDay()
-        self.assertFalse(offset1 != offset2)
+        assert not offset1 != offset2

     def test_holidays(self):
         # Define a TradingDay offset
@@ -1875,7 +1848,7 @@ def test_holidays(self):
             dt = datetime(year, 4, 30)
             xp = datetime(year, 5, 2)
             rs = dt + tday
-            self.assertEqual(rs, xp)
+            assert rs == xp

     def test_weekmask(self):
         weekmask_saudi = 'Sat Sun Mon Tue Wed'  # Thu-Fri Weekend
@@ -1888,13 +1861,13 @@ def test_weekmask(self):
         xp_saudi = datetime(2013, 5, 4)
         xp_uae = datetime(2013, 5, 2)
         xp_egypt = datetime(2013, 5, 2)
-        self.assertEqual(xp_saudi, dt + bday_saudi)
-        self.assertEqual(xp_uae, dt + bday_uae)
-        self.assertEqual(xp_egypt, dt + bday_egypt)
+        assert xp_saudi == dt + bday_saudi
+        assert xp_uae == dt + bday_uae
+        assert xp_egypt == dt + bday_egypt

         xp2 = datetime(2013, 5, 5)
-        self.assertEqual(xp2, dt + 2 * bday_saudi)
-        self.assertEqual(xp2, dt + 2 * bday_uae)
-        self.assertEqual(xp2, dt + 2 * bday_egypt)
+        assert xp2 == dt + 2 * bday_saudi
+        assert xp2 == dt + 2 * bday_uae
+        assert xp2 == dt + 2 * bday_egypt

     def test_weekmask_and_holidays(self):
         weekmask_egypt = 'Sun Mon Tue Wed Thu'  # Fri-Sat Weekend
@@ -1903,7 +1876,7 @@ def test_weekmask_and_holidays(self):
         bday_egypt = CDay(holidays=holidays, weekmask=weekmask_egypt)
         dt = datetime(2013, 4, 30)
         xp_egypt = datetime(2013, 5, 5)
-        self.assertEqual(xp_egypt, dt + 2 * bday_egypt)
+        assert xp_egypt == dt + 2 * bday_egypt

     def test_calendar(self):
         calendar = USFederalHolidayCalendar()
@@ -1912,8 +1885,8 @@ def test_calendar(self):

     def test_roundtrip_pickle(self):
         def _check_roundtrip(obj):
-            unpickled = self.round_trip_pickle(obj)
-            self.assertEqual(unpickled, obj)
+            unpickled = tm.round_trip_pickle(obj)
+            assert unpickled == obj

         _check_roundtrip(self.offset)
         _check_roundtrip(self.offset2)
@@ -1926,56 +1899,54 @@ def test_pickle_compat_0_14_1(self):
         cday0_14_1 = read_pickle(os.path.join(pth, 'cday-0.14.1.pickle'))
         cday = CDay(holidays=hdays)
-        self.assertEqual(cday, cday0_14_1)
+        assert cday == cday0_14_1


 class CustomBusinessMonthBase(object):
-    _multiprocess_can_split_ = True

-    def setUp(self):
+    def setup_method(self, method):
         self.d = datetime(2008, 1, 1)

         self.offset = self._object()
         self.offset2 = self._object(2)

     def testEQ(self):
-        self.assertEqual(self.offset2, self.offset2)
+        assert self.offset2 == self.offset2

     def test_mul(self):
         pass

     def test_hash(self):
-        self.assertEqual(hash(self.offset2), hash(self.offset2))
+        assert hash(self.offset2) == hash(self.offset2)

     def testRAdd(self):
-        self.assertEqual(self.d + self.offset2, self.offset2 + self.d)
+        assert self.d + self.offset2 == self.offset2 + self.d

     def testSub(self):
         off = self.offset2
-        self.assertRaises(Exception, off.__sub__, self.d)
-        self.assertEqual(2 * off - off, off)
+        pytest.raises(Exception, off.__sub__, self.d)
+        assert 2 * off - off == off

-        self.assertEqual(self.d - self.offset2, self.d + self._object(-2))
+        assert self.d - self.offset2 == self.d + self._object(-2)

     def testRSub(self):
-        self.assertEqual(self.d - self.offset2, (-self.offset2).apply(self.d))
+        assert self.d - self.offset2 == (-self.offset2).apply(self.d)

     def testMult1(self):
-        self.assertEqual(self.d + 10 * self.offset, self.d + self._object(10))
+        assert self.d + 10 * self.offset == self.d + self._object(10)

     def testMult2(self):
-        self.assertEqual(self.d + (-5 * self._object(-10)),
-                         self.d + self._object(50))
+        assert self.d + (-5 * self._object(-10)) == self.d + self._object(50)

     def test_offsets_compare_equal(self):
         offset1 = self._object()
         offset2 = self._object()
-        self.assertFalse(offset1 != offset2)
+        assert not offset1 != offset2

     def test_roundtrip_pickle(self):
         def _check_roundtrip(obj):
-            unpickled = self.round_trip_pickle(obj)
-            self.assertEqual(unpickled, obj)
+            unpickled = tm.round_trip_pickle(obj)
+            assert unpickled == obj

         _check_roundtrip(self._object())
         _check_roundtrip(self._object(2))
@@ -1990,26 +1961,24 @@ def test_different_normalize_equals(self):
         offset = CBMonthEnd()
         offset2 = CBMonthEnd()
         offset2.normalize = True
-        self.assertEqual(offset, offset2)
+        assert offset == offset2

     def test_repr(self):
         assert repr(self.offset) == '<CustomBusinessMonthEnd>'
         assert repr(self.offset2) == '<2 * CustomBusinessMonthEnds>'

     def testCall(self):
-        self.assertEqual(self.offset2(self.d), datetime(2008, 2, 29))
+        assert self.offset2(self.d) == datetime(2008, 2, 29)

     def testRollback1(self):
-        self.assertEqual(
-            CDay(10).rollback(datetime(2007, 12, 31)), datetime(2007, 12, 31))
+        assert (CDay(10).rollback(datetime(2007, 12, 31)) ==
+                datetime(2007, 12, 31))

     def testRollback2(self):
-        self.assertEqual(CBMonthEnd(10).rollback(self.d),
-                         datetime(2007, 12, 31))
+        assert CBMonthEnd(10).rollback(self.d) == datetime(2007, 12, 31)

     def testRollforward1(self):
-        self.assertEqual(CBMonthEnd(10).rollforward(
-            self.d), datetime(2008, 1, 31))
+        assert CBMonthEnd(10).rollforward(self.d) == datetime(2008, 1, 31)

     def test_roll_date_object(self):
         offset = CBMonthEnd()
@@ -2017,17 +1986,17 @@ def test_roll_date_object(self):
         dt = date(2012, 9, 15)

         result = offset.rollback(dt)
-        self.assertEqual(result, datetime(2012, 8, 31))
+        assert result == datetime(2012, 8, 31)

         result = offset.rollforward(dt)
-        self.assertEqual(result, datetime(2012, 9, 28))
+        assert result == datetime(2012, 9, 28)

         offset = offsets.Day()
         result = offset.rollback(dt)
-        self.assertEqual(result, datetime(2012, 9, 15))
+        assert result == datetime(2012, 9, 15)

         result = offset.rollforward(dt)
-        self.assertEqual(result, datetime(2012, 9, 15))
+        assert result == datetime(2012, 9, 15)

     def test_onOffset(self):
         tests = [(CBMonthEnd(), datetime(2008, 1, 31), True),
@@ -2065,20 +2034,20 @@ def test_apply_large_n(self):
         dt = datetime(2012, 10, 23)

         result = dt + CBMonthEnd(10)
-        self.assertEqual(result, datetime(2013, 7, 31))
+        assert result == datetime(2013, 7, 31)

         result = dt + CDay(100) - CDay(100)
-        self.assertEqual(result, dt)
+        assert result == dt

         off = CBMonthEnd() * 6
         rs = datetime(2012, 1, 1) - off
         xp = datetime(2011, 7, 29)
-        self.assertEqual(rs, xp)
+        assert rs == xp

         st = datetime(2011, 12, 18)
         rs = st + off
         xp = datetime(2012, 5, 31)
-        self.assertEqual(rs, xp)
+        assert rs == xp

     def test_holidays(self):
         # Define a TradingDay offset
@@ -2086,17 +2055,16 @@ def test_holidays(self):
                     np.datetime64('2012-02-29')]
         bm_offset = CBMonthEnd(holidays=holidays)
         dt = datetime(2012, 1, 1)
-        self.assertEqual(dt + bm_offset, datetime(2012, 1, 30))
-        self.assertEqual(dt + 2 * bm_offset, datetime(2012, 2, 27))
+        assert dt + bm_offset == datetime(2012, 1, 30)
+        assert dt + 2 * bm_offset == datetime(2012, 2, 27)

     def test_datetimeindex(self):
         from pandas.tseries.holiday import USFederalHolidayCalendar
         hcal = USFederalHolidayCalendar()
         freq = CBMonthEnd(calendar=hcal)
-        self.assertEqual(DatetimeIndex(start='20120101', end='20130101',
-                                       freq=freq).tolist()[0],
-                         datetime(2012, 1, 31))
+        assert (DatetimeIndex(start='20120101', end='20130101',
+                              freq=freq).tolist()[0] == datetime(2012, 1, 31))


 class TestCustomBusinessMonthBegin(CustomBusinessMonthBase, Base):
@@ -2107,26 +2075,24 @@ def test_different_normalize_equals(self):
         offset = CBMonthBegin()
         offset2 = CBMonthBegin()
         offset2.normalize = True
-        self.assertEqual(offset, offset2)
+        assert offset == offset2

     def test_repr(self):
         assert repr(self.offset) == '<CustomBusinessMonthBegin>'
         assert repr(self.offset2) == '<2 * CustomBusinessMonthBegins>'

     def testCall(self):
-        self.assertEqual(self.offset2(self.d), datetime(2008, 3, 3))
+        assert self.offset2(self.d) == datetime(2008, 3, 3)

     def testRollback1(self):
-        self.assertEqual(
-            CDay(10).rollback(datetime(2007, 12, 31)), datetime(2007, 12, 31))
+        assert (CDay(10).rollback(datetime(2007, 12, 31)) ==
+                datetime(2007, 12, 31))

     def testRollback2(self):
-        self.assertEqual(CBMonthBegin(10).rollback(self.d),
-                         datetime(2008, 1, 1))
+        assert CBMonthBegin(10).rollback(self.d) == datetime(2008, 1, 1)

     def testRollforward1(self):
-        self.assertEqual(CBMonthBegin(10).rollforward(
-            self.d), datetime(2008, 1, 1))
+        assert CBMonthBegin(10).rollforward(self.d) == datetime(2008, 1, 1)

     def test_roll_date_object(self):
         offset = CBMonthBegin()
@@ -2134,17 +2100,17 @@ def test_roll_date_object(self):
         dt = date(2012, 9, 15)

         result = offset.rollback(dt)
-        self.assertEqual(result, datetime(2012, 9, 3))
+        assert result == datetime(2012, 9, 3)

         result = offset.rollforward(dt)
-        self.assertEqual(result, datetime(2012, 10, 1))
+        assert result == datetime(2012, 10, 1)

         offset = offsets.Day()
         result = offset.rollback(dt)
-        self.assertEqual(result, datetime(2012, 9, 15))
+        assert result == datetime(2012, 9, 15)

         result = offset.rollforward(dt)
-        self.assertEqual(result, datetime(2012, 9, 15))
+        assert result == datetime(2012, 9, 15)

     def test_onOffset(self):
         tests = [(CBMonthBegin(), datetime(2008, 1, 1), True),
@@ -2181,20 +2147,21 @@ def test_apply_large_n(self):
         dt = datetime(2012, 10, 23)

         result = dt + CBMonthBegin(10)
-        self.assertEqual(result, datetime(2013, 8, 1))
+        assert result == datetime(2013, 8, 1)

         result = dt + CDay(100) - CDay(100)
-        self.assertEqual(result, dt)
+        assert result == dt

         off = CBMonthBegin() * 6
         rs = datetime(2012, 1, 1) - off
         xp = datetime(2011, 7, 1)
-        self.assertEqual(rs, xp)
+        assert rs == xp

         st = datetime(2011, 12, 18)
         rs = st + off
+
         xp = datetime(2012, 6, 1)
-        self.assertEqual(rs, xp)
+        assert rs == xp

     def test_holidays(self):
         # Define a TradingDay offset
@@ -2202,15 +2169,15 @@ def test_holidays(self):
                     np.datetime64('2012-03-01')]
         bm_offset = CBMonthBegin(holidays=holidays)
         dt = datetime(2012, 1, 1)
-        self.assertEqual(dt + bm_offset, datetime(2012, 1, 2))
-        self.assertEqual(dt + 2 * bm_offset, datetime(2012, 2, 3))
+
+        assert dt + bm_offset == datetime(2012, 1, 2)
+        assert dt + 2 * bm_offset == datetime(2012, 2, 3)

     def test_datetimeindex(self):
         hcal = USFederalHolidayCalendar()
         cbmb = CBMonthBegin(calendar=hcal)
-        self.assertEqual(DatetimeIndex(start='20120101', end='20130101',
-                                       freq=cbmb).tolist()[0],
-                         datetime(2012, 1, 3))
+        assert (DatetimeIndex(start='20120101', end='20130101',
+                              freq=cbmb).tolist()[0] == datetime(2012, 1, 3))


 def assertOnOffset(offset, date, expected):
@@ -2224,20 +2191,20 @@ class TestWeek(Base):
     _offset = Week

     def test_repr(self):
-        self.assertEqual(repr(Week(weekday=0)), "<Week: weekday=0>")
-        self.assertEqual(repr(Week(n=-1, weekday=0)), "<-1 * Week: weekday=0>")
-        self.assertEqual(repr(Week(n=-2, weekday=0)),
-                         "<-2 * Weeks: weekday=0>")
+        assert repr(Week(weekday=0)) == "<Week: weekday=0>"
+        assert repr(Week(n=-1, weekday=0)) == "<-1 * Week: weekday=0>"
+        assert repr(Week(n=-2, weekday=0)) == "<-2 * Weeks: weekday=0>"

     def test_corner(self):
-        self.assertRaises(ValueError, Week, weekday=7)
-        assertRaisesRegexp(ValueError, "Day must be", Week, weekday=-1)
+        pytest.raises(ValueError, Week, weekday=7)
+        tm.assert_raises_regex(
+            ValueError, "Day must be", Week, weekday=-1)

     def test_isAnchored(self):
-        self.assertTrue(Week(weekday=0).isAnchored())
-        self.assertFalse(Week().isAnchored())
-        self.assertFalse(Week(2, weekday=2).isAnchored())
-        self.assertFalse(Week(2).isAnchored())
+        assert Week(weekday=0).isAnchored()
+        assert not Week().isAnchored()
+        assert not Week(2, weekday=2).isAnchored()
+        assert not Week(2).isAnchored()

     def test_offset(self):
         tests = []
@@ -2289,27 +2256,27 @@ def test_offsets_compare_equal(self):
         # root cause of #456
         offset1 = Week()
         offset2 = Week()
-        self.assertFalse(offset1 != offset2)
+        assert not offset1 != offset2


 class TestWeekOfMonth(Base):
     _offset = WeekOfMonth

     def test_constructor(self):
-        assertRaisesRegexp(ValueError, "^N cannot be 0", WeekOfMonth, n=0,
-                           week=1, weekday=1)
-        assertRaisesRegexp(ValueError, "^Week", WeekOfMonth, n=1, week=4,
-                           weekday=0)
-        assertRaisesRegexp(ValueError, "^Week", WeekOfMonth, n=1, week=-1,
-                           weekday=0)
-        assertRaisesRegexp(ValueError, "^Day", WeekOfMonth, n=1, week=0,
-                           weekday=-1)
-        assertRaisesRegexp(ValueError, "^Day", WeekOfMonth, n=1, week=0,
-                           weekday=7)
+        tm.assert_raises_regex(ValueError, "^N cannot be 0",
+                               WeekOfMonth, n=0, week=1, weekday=1)
+        tm.assert_raises_regex(ValueError, "^Week", WeekOfMonth,
+                               n=1, week=4, weekday=0)
+        tm.assert_raises_regex(ValueError, "^Week", WeekOfMonth,
+                               n=1, week=-1, weekday=0)
+        tm.assert_raises_regex(ValueError, "^Day", WeekOfMonth,
+                               n=1, week=0, weekday=-1)
+        tm.assert_raises_regex(ValueError, "^Day", WeekOfMonth,
+                               n=1, week=0, weekday=7)

     def test_repr(self):
-        self.assertEqual(repr(WeekOfMonth(weekday=1, week=2)),
-                         "<WeekOfMonth: week=2, weekday=1>")
+        assert (repr(WeekOfMonth(weekday=1, week=2)) ==
+                "<WeekOfMonth: week=2, weekday=1>")

     def test_offset(self):
         date1 = datetime(2011, 1, 4)  # 1st Tuesday of Month
@@ -2359,9 +2326,10 @@ def test_offset(self):

         # try subtracting
         result = datetime(2011, 2, 1) - WeekOfMonth(week=1, weekday=2)
-        self.assertEqual(result, datetime(2011, 1, 12))
+        assert result == datetime(2011, 1, 12)
+
         result = datetime(2011, 2, 3) - WeekOfMonth(week=0, weekday=2)
-        self.assertEqual(result, datetime(2011, 2, 2))
+        assert result == datetime(2011, 2, 2)

     def test_onOffset(self):
         test_cases = [
@@ -2375,19 +2343,20 @@ def test_onOffset(self):

         for week, weekday, dt, expected in test_cases:
             offset = WeekOfMonth(week=week, weekday=weekday)
-            self.assertEqual(offset.onOffset(dt), expected)
+            assert offset.onOffset(dt) == expected


 class TestLastWeekOfMonth(Base):
     _offset = LastWeekOfMonth

     def test_constructor(self):
-        assertRaisesRegexp(ValueError, "^N cannot be 0", LastWeekOfMonth, n=0,
-                           weekday=1)
+        tm.assert_raises_regex(ValueError, "^N cannot be 0",
+                               LastWeekOfMonth, n=0, weekday=1)

-        assertRaisesRegexp(ValueError, "^Day", LastWeekOfMonth, n=1,
-                           weekday=-1)
-        assertRaisesRegexp(ValueError, "^Day", LastWeekOfMonth, n=1, weekday=7)
+        tm.assert_raises_regex(ValueError, "^Day", LastWeekOfMonth, n=1,
+                               weekday=-1)
+        tm.assert_raises_regex(
+            ValueError, "^Day", LastWeekOfMonth, n=1, weekday=7)

     def test_offset(self):
         # Saturday
@@ -2396,13 +2365,13 @@ def test_offset(self):
         offset_sat = LastWeekOfMonth(n=1, weekday=5)

         one_day_before = (last_sat + timedelta(days=-1))
-        self.assertEqual(one_day_before + offset_sat, last_sat)
+        assert one_day_before + offset_sat == last_sat

         one_day_after = (last_sat + timedelta(days=+1))
-        self.assertEqual(one_day_after + offset_sat, next_sat)
+        assert one_day_after + offset_sat == next_sat

         # Test On that day
-        self.assertEqual(last_sat + offset_sat, next_sat)
+        assert last_sat + offset_sat == next_sat

         # Thursday
@@ -2411,23 +2380,22 @@ def test_offset(self):
         next_thurs = datetime(2013, 2, 28)

         one_day_before = last_thurs + timedelta(days=-1)
-        self.assertEqual(one_day_before + offset_thur, last_thurs)
+        assert one_day_before + offset_thur == last_thurs

         one_day_after = last_thurs + timedelta(days=+1)
-        self.assertEqual(one_day_after + offset_thur, next_thurs)
+        assert one_day_after + offset_thur == next_thurs

         # Test on that day
-        self.assertEqual(last_thurs + offset_thur, next_thurs)
+        assert last_thurs + offset_thur == next_thurs

         three_before = last_thurs + timedelta(days=-3)
-        self.assertEqual(three_before + offset_thur, last_thurs)
+        assert three_before + offset_thur == last_thurs

         two_after = last_thurs + timedelta(days=+2)
-        self.assertEqual(two_after + offset_thur, next_thurs)
+        assert two_after + offset_thur == next_thurs

         offset_sunday = LastWeekOfMonth(n=1, weekday=WeekDay.SUN)
-        self.assertEqual(datetime(2013, 7, 31) +
-                         offset_sunday, datetime(2013, 8, 25))
+        assert datetime(2013, 7, 31) + offset_sunday == datetime(2013, 8, 25)

     def test_onOffset(self):
         test_cases = [
@@ -2449,7 +2417,7 @@ def test_onOffset(self):

         for weekday, dt, expected in test_cases:
             offset = LastWeekOfMonth(weekday=weekday)
-            self.assertEqual(offset.onOffset(dt), expected, msg=date)
+            assert offset.onOffset(dt) == expected


 class TestBMonthBegin(Base):
@@ -2511,7 +2479,7 @@ def test_offsets_compare_equal(self):
         # root cause of #456
         offset1 = BMonthBegin()
         offset2 = BMonthBegin()
-        self.assertFalse(offset1 != offset2)
+        assert not offset1 != offset2


 class TestBMonthEnd(Base):
@@ -2560,7 +2528,7 @@ def test_normalize(self):

         result = dt + BMonthEnd(normalize=True)
         expected = dt.replace(hour=0) + BMonthEnd()
-        self.assertEqual(result, expected)
+        assert result == expected

     def test_onOffset(self):

@@ -2574,7 +2542,7 @@ def test_offsets_compare_equal(self):
         # root cause of #456
         offset1 = BMonthEnd()
         offset2 = BMonthEnd()
-        self.assertFalse(offset1 != offset2)
+        assert not offset1 != offset2


 class TestMonthBegin(Base):
@@ -2659,23 +2627,22 @@ def test_offset(self):
             for base, expected in compat.iteritems(cases):
                 assertEq(offset, base, expected)

-    # def test_day_of_month(self):
-    #     dt = datetime(2007, 1, 1)
-
-    #     offset = MonthEnd(day=20)
+    def test_day_of_month(self):
+        dt = datetime(2007, 1, 1)
+        offset = MonthEnd(day=20)

-    #     result = dt + offset
-    #     self.assertEqual(result, datetime(2007, 1, 20))
+        result = dt + offset
+        assert result == Timestamp(2007, 1, 31)

-    #     result = result + offset
-    #     self.assertEqual(result, datetime(2007, 2, 20))
+        result = result + offset
+        assert result == Timestamp(2007, 2, 28)

     def test_normalize(self):
         dt = datetime(2007, 1, 1, 3)

         result = dt + MonthEnd(normalize=True)
         expected = dt.replace(hour=0) + MonthEnd()
-        self.assertEqual(result, expected)
+        assert result == expected

     def test_onOffset(self):

@@ -2836,7 +2803,7 @@ def test_onOffset(self):

     def test_vectorized_offset_addition(self):
         for klass, assert_func in zip([Series, DatetimeIndex],
-                                      [self.assert_series_equal,
+                                      [tm.assert_series_equal,
                                        tm.assert_index_equal]):
             s = klass([Timestamp('2000-01-15 00:15:00', tz='US/Central'),
                        Timestamp('2000-02-15', tz='US/Central')], name='a')
@@ -3011,7 +2978,7 @@ def test_onOffset(self):

     def test_vectorized_offset_addition(self):
         for klass, assert_func in zip([Series, DatetimeIndex],
-                                      [self.assert_series_equal,
+                                      [tm.assert_series_equal,
                                        tm.assert_index_equal]):

             s = klass([Timestamp('2000-01-15 00:15:00', tz='US/Central'),
@@ -3037,17 +3004,17 @@ class TestBQuarterBegin(Base):
     _offset = BQuarterBegin

     def test_repr(self):
-        self.assertEqual(repr(BQuarterBegin()),
-                         "<BusinessQuarterBegin: startingMonth=3>")
-        self.assertEqual(repr(BQuarterBegin(startingMonth=3)),
-                         "<BusinessQuarterBegin: startingMonth=3>")
-        self.assertEqual(repr(BQuarterBegin(startingMonth=1)),
-                         "<BusinessQuarterBegin: startingMonth=1>")
+        assert (repr(BQuarterBegin()) ==
+                "<BusinessQuarterBegin: startingMonth=3>")
+        assert (repr(BQuarterBegin(startingMonth=3)) ==
+                "<BusinessQuarterBegin: startingMonth=3>")
+        assert (repr(BQuarterBegin(startingMonth=1)) ==
+                "<BusinessQuarterBegin: startingMonth=1>")

     def test_isAnchored(self):
-        self.assertTrue(BQuarterBegin(startingMonth=1).isAnchored())
-        self.assertTrue(BQuarterBegin().isAnchored())
-        self.assertFalse(BQuarterBegin(2, startingMonth=1).isAnchored())
+        assert BQuarterBegin(startingMonth=1).isAnchored()
+        assert BQuarterBegin().isAnchored()
+        assert not BQuarterBegin(2, startingMonth=1).isAnchored()

     def test_offset(self):
         tests = []
@@ -3124,24 +3091,24 @@ def test_offset(self):

         # corner
         offset = BQuarterBegin(n=-1, startingMonth=1)
-        self.assertEqual(datetime(2007, 4, 3) + offset, datetime(2007, 4, 2))
+        assert datetime(2007, 4, 3) + offset == datetime(2007, 4, 2)


 class TestBQuarterEnd(Base):
     _offset = BQuarterEnd

     def test_repr(self):
-        self.assertEqual(repr(BQuarterEnd()),
-                         "<BusinessQuarterEnd: startingMonth=3>")
-        self.assertEqual(repr(BQuarterEnd(startingMonth=3)),
-                         "<BusinessQuarterEnd: startingMonth=3>")
-        self.assertEqual(repr(BQuarterEnd(startingMonth=1)),
-                         "<BusinessQuarterEnd: startingMonth=1>")
+        assert (repr(BQuarterEnd()) ==
+                "<BusinessQuarterEnd: startingMonth=3>")
+        assert (repr(BQuarterEnd(startingMonth=3)) ==
+                "<BusinessQuarterEnd: startingMonth=3>")
+        assert (repr(BQuarterEnd(startingMonth=1)) ==
+                "<BusinessQuarterEnd: startingMonth=1>")

     def test_isAnchored(self):
-        self.assertTrue(BQuarterEnd(startingMonth=1).isAnchored())
-        self.assertTrue(BQuarterEnd().isAnchored())
-        self.assertFalse(BQuarterEnd(2, startingMonth=1).isAnchored())
+        assert BQuarterEnd(startingMonth=1).isAnchored()
+        assert BQuarterEnd().isAnchored()
+        assert not BQuarterEnd(2, startingMonth=1).isAnchored()

     def test_offset(self):
         tests = []
@@ -3201,7 +3168,7 @@ def test_offset(self):

         # corner
         offset = BQuarterEnd(n=-1, startingMonth=1)
-        self.assertEqual(datetime(2010, 1, 31) + offset, datetime(2010, 1, 29))
+        assert datetime(2010, 1, 31) + offset == datetime(2010, 1, 29)

     def test_onOffset(self):

@@ -3256,6 +3223,7 @@ def makeFY5253LastOfMonth(*args, **kwds):


 class TestFY5253LastOfMonth(Base):
+
     def test_onOffset(self):

         offset_lom_sat_aug = makeFY5253LastOfMonth(1, startingMonth=8,
@@ -3337,57 +3305,52 @@ def test_apply(self):
             current = data[0]
             for datum in data[1:]:
                 current = current + offset
-                self.assertEqual(current, datum)
+                assert current == datum


 class TestFY5253NearestEndMonth(Base):
+
     def test_get_target_month_end(self):
-        self.assertEqual(makeFY5253NearestEndMonth(startingMonth=8,
-                                                   weekday=WeekDay.SAT)
-                         .get_target_month_end(
-                             datetime(2013, 1, 1)), datetime(2013, 8, 31))
-        self.assertEqual(makeFY5253NearestEndMonth(startingMonth=12,
-                                                   weekday=WeekDay.SAT)
-                         .get_target_month_end(datetime(2013, 1, 1)),
-                         datetime(2013, 12, 31))
-        self.assertEqual(makeFY5253NearestEndMonth(startingMonth=2,
-                                                   weekday=WeekDay.SAT)
-                         .get_target_month_end(datetime(2013, 1, 1)),
-                         datetime(2013, 2, 28))
+        assert (makeFY5253NearestEndMonth(
+            startingMonth=8, weekday=WeekDay.SAT).get_target_month_end(
+            datetime(2013, 1, 1)) == datetime(2013, 8, 31))
+        assert (makeFY5253NearestEndMonth(
+            startingMonth=12, weekday=WeekDay.SAT).get_target_month_end(
+            datetime(2013, 1, 1)) == datetime(2013, 12, 31))
+        assert (makeFY5253NearestEndMonth(
+            startingMonth=2, weekday=WeekDay.SAT).get_target_month_end(
+            datetime(2013, 1, 1)) == datetime(2013, 2, 28))

     def test_get_year_end(self):
-        self.assertEqual(makeFY5253NearestEndMonth(startingMonth=8,
-                                                   weekday=WeekDay.SAT)
-                         .get_year_end(datetime(2013, 1, 1)),
-                         datetime(2013, 8, 31))
-        self.assertEqual(makeFY5253NearestEndMonth(startingMonth=8,
-                                                   weekday=WeekDay.SUN)
-                         .get_year_end(datetime(2013, 1, 1)),
-                         datetime(2013, 9, 1))
-        self.assertEqual(makeFY5253NearestEndMonth(startingMonth=8,
-                                                   weekday=WeekDay.FRI)
-                         .get_year_end(datetime(2013, 1, 1)),
-                         datetime(2013, 8, 30))
+        assert (makeFY5253NearestEndMonth(
+            startingMonth=8, weekday=WeekDay.SAT).get_year_end(
+            datetime(2013, 1, 1)) == datetime(2013, 8, 31))
+        assert (makeFY5253NearestEndMonth(
+            startingMonth=8, weekday=WeekDay.SUN).get_year_end(
+            datetime(2013, 1, 1)) == datetime(2013, 9, 1))
+        assert (makeFY5253NearestEndMonth(
+            startingMonth=8, weekday=WeekDay.FRI).get_year_end(
+            datetime(2013, 1, 1)) == datetime(2013, 8, 30))

         offset_n = FY5253(weekday=WeekDay.TUE, startingMonth=12,
                           variation="nearest")
-        self.assertEqual(offset_n.get_year_end(
-            datetime(2012, 1, 1)), datetime(2013, 1, 1))
-        self.assertEqual(offset_n.get_year_end(
-            datetime(2012, 1, 10)), datetime(2013, 1, 1))
-
-        self.assertEqual(offset_n.get_year_end(
-            datetime(2013, 1, 1)), datetime(2013, 12, 31))
-        self.assertEqual(offset_n.get_year_end(
-            datetime(2013, 1, 2)), datetime(2013, 12, 31))
-        self.assertEqual(offset_n.get_year_end(
-            datetime(2013, 1, 3)), datetime(2013, 12, 31))
-        self.assertEqual(offset_n.get_year_end(
-            datetime(2013, 1, 10)), datetime(2013, 12, 31))
+        assert (offset_n.get_year_end(datetime(2012, 1, 1)) ==
+                datetime(2013, 1, 1))
+        assert (offset_n.get_year_end(datetime(2012, 1, 10)) ==
+                datetime(2013, 1, 1))
+
+        assert (offset_n.get_year_end(datetime(2013, 1, 1)) ==
+                datetime(2013, 12, 31))
+        assert (offset_n.get_year_end(datetime(2013, 1, 2)) ==
+                datetime(2013, 12, 31))
+        assert (offset_n.get_year_end(datetime(2013, 1, 3)) ==
+                datetime(2013, 12, 31))
+        assert (offset_n.get_year_end(datetime(2013, 1, 10)) ==
+                datetime(2013, 12, 31))

         JNJ = FY5253(n=1, startingMonth=12, weekday=6, variation="nearest")
-        self.assertEqual(JNJ.get_year_end(
-            datetime(2006, 1, 1)), datetime(2006, 12, 31))
+        assert (JNJ.get_year_end(datetime(2006, 1, 1)) ==
+                datetime(2006, 12, 31))

     def test_onOffset(self):
         offset_lom_aug_sat = makeFY5253NearestEndMonth(1, startingMonth=8,
@@ -3502,42 +3465,35 @@ def test_apply(self):
             current = data[0]
             for datum in data[1:]:
                 current = current + offset
-                self.assertEqual(current, datum)
+                assert current == datum


 class TestFY5253LastOfMonthQuarter(Base):
+
     def test_isAnchored(self):
-        self.assertTrue(
-            makeFY5253LastOfMonthQuarter(startingMonth=1, weekday=WeekDay.SAT,
-                                         qtr_with_extra_week=4).isAnchored())
-        self.assertTrue(
-            makeFY5253LastOfMonthQuarter(weekday=WeekDay.SAT, startingMonth=3,
-                                         qtr_with_extra_week=4).isAnchored())
-        self.assertFalse(makeFY5253LastOfMonthQuarter(
+        assert makeFY5253LastOfMonthQuarter(
+            startingMonth=1, weekday=WeekDay.SAT,
+            qtr_with_extra_week=4).isAnchored()
+        assert makeFY5253LastOfMonthQuarter(
+            weekday=WeekDay.SAT, startingMonth=3,
+            qtr_with_extra_week=4).isAnchored()
+        assert not makeFY5253LastOfMonthQuarter(
             2, startingMonth=1, weekday=WeekDay.SAT,
-            qtr_with_extra_week=4).isAnchored())
+            qtr_with_extra_week=4).isAnchored()

     def test_equality(self):
-        self.assertEqual(makeFY5253LastOfMonthQuarter(startingMonth=1,
-                                                      weekday=WeekDay.SAT,
-                                                      qtr_with_extra_week=4),
-                         makeFY5253LastOfMonthQuarter(startingMonth=1,
-                                                      weekday=WeekDay.SAT,
-                                                      qtr_with_extra_week=4))
-        self.assertNotEqual(
-            makeFY5253LastOfMonthQuarter(
-                startingMonth=1, weekday=WeekDay.SAT,
-                qtr_with_extra_week=4),
-            makeFY5253LastOfMonthQuarter(
-                startingMonth=1, weekday=WeekDay.SUN,
-                qtr_with_extra_week=4))
-        self.assertNotEqual(
-            makeFY5253LastOfMonthQuarter(
-                startingMonth=1, weekday=WeekDay.SAT,
-                qtr_with_extra_week=4),
-            makeFY5253LastOfMonthQuarter(
-                startingMonth=2, weekday=WeekDay.SAT,
-                qtr_with_extra_week=4))
+        assert (makeFY5253LastOfMonthQuarter(
+            startingMonth=1, weekday=WeekDay.SAT,
+            qtr_with_extra_week=4) == makeFY5253LastOfMonthQuarter(
+            startingMonth=1, weekday=WeekDay.SAT, qtr_with_extra_week=4))
+        assert (makeFY5253LastOfMonthQuarter(
+            startingMonth=1, weekday=WeekDay.SAT,
+            qtr_with_extra_week=4) != makeFY5253LastOfMonthQuarter(
+            startingMonth=1, weekday=WeekDay.SUN, qtr_with_extra_week=4))
+        assert (makeFY5253LastOfMonthQuarter(
+            startingMonth=1, weekday=WeekDay.SAT,
+            qtr_with_extra_week=4) != makeFY5253LastOfMonthQuarter(
+            startingMonth=2, weekday=WeekDay.SAT, qtr_with_extra_week=4))

     def test_offset(self):
         offset = makeFY5253LastOfMonthQuarter(1, startingMonth=9,
@@ -3663,53 +3619,40 @@ def test_onOffset(self):

     def test_year_has_extra_week(self):
         # End of long Q1
-        self.assertTrue(
-            makeFY5253LastOfMonthQuarter(1, startingMonth=12,
-                                         weekday=WeekDay.SAT,
-                                         qtr_with_extra_week=1)
-            .year_has_extra_week(datetime(2011, 4, 2)))
+        assert makeFY5253LastOfMonthQuarter(
+            1, startingMonth=12, weekday=WeekDay.SAT,
+            qtr_with_extra_week=1).year_has_extra_week(datetime(2011, 4, 2))

         # Start of long Q1
-        self.assertTrue(
-            makeFY5253LastOfMonthQuarter(
-                1, startingMonth=12, weekday=WeekDay.SAT,
-                qtr_with_extra_week=1)
-            .year_has_extra_week(datetime(2010, 12, 26)))
+        assert makeFY5253LastOfMonthQuarter(
+            1, startingMonth=12, weekday=WeekDay.SAT,
+            qtr_with_extra_week=1).year_has_extra_week(datetime(2010, 12, 26))

         # End of year before year with long Q1
-        self.assertFalse(
-            makeFY5253LastOfMonthQuarter(
-                1, startingMonth=12, weekday=WeekDay.SAT,
-                qtr_with_extra_week=1)
-            .year_has_extra_week(datetime(2010, 12, 25)))
+        assert not makeFY5253LastOfMonthQuarter(
+            1, startingMonth=12, weekday=WeekDay.SAT,
+            qtr_with_extra_week=1).year_has_extra_week(datetime(2010, 12, 25))

         for year in [x
                      for x in range(1994, 2011 + 1)
                      if x not in [2011, 2005, 2000, 1994]]:
-            self.assertFalse(
-                makeFY5253LastOfMonthQuarter(
-                    1, startingMonth=12, weekday=WeekDay.SAT,
-                    qtr_with_extra_week=1)
-                .year_has_extra_week(datetime(year, 4, 2)))
+            assert not makeFY5253LastOfMonthQuarter(
+                1, startingMonth=12, weekday=WeekDay.SAT,
+                qtr_with_extra_week=1).year_has_extra_week(
+                datetime(year, 4, 2))

         # Other long years
-        self.assertTrue(
-            makeFY5253LastOfMonthQuarter(
-                1, startingMonth=12, weekday=WeekDay.SAT,
-                qtr_with_extra_week=1)
-            .year_has_extra_week(datetime(2005, 4, 2)))
+        assert makeFY5253LastOfMonthQuarter(
+            1, startingMonth=12, weekday=WeekDay.SAT,
+            qtr_with_extra_week=1).year_has_extra_week(datetime(2005, 4, 2))

-        self.assertTrue(
-            makeFY5253LastOfMonthQuarter(
-                1, startingMonth=12, weekday=WeekDay.SAT,
-                qtr_with_extra_week=1)
-            .year_has_extra_week(datetime(2000, 4, 2)))
+        assert makeFY5253LastOfMonthQuarter(
+            1, startingMonth=12, weekday=WeekDay.SAT,
+            qtr_with_extra_week=1).year_has_extra_week(datetime(2000, 4, 2))

-        self.assertTrue(
-            makeFY5253LastOfMonthQuarter(
-                1, startingMonth=12, weekday=WeekDay.SAT,
-                qtr_with_extra_week=1)
-            .year_has_extra_week(datetime(1994, 4, 2)))
+        assert makeFY5253LastOfMonthQuarter(
+            1, startingMonth=12, weekday=WeekDay.SAT,
+            qtr_with_extra_week=1).year_has_extra_week(datetime(1994, 4, 2))

     def test_get_weeks(self):
         sat_dec_1 = makeFY5253LastOfMonthQuarter(1, startingMonth=12,
@@ -3719,15 +3662,13 @@ def test_get_weeks(self):
                                                  weekday=WeekDay.SAT,
                                                  qtr_with_extra_week=4)

-        self.assertEqual(sat_dec_1.get_weeks(
-            datetime(2011, 4, 2)), [14, 13, 13, 13])
-        self.assertEqual(sat_dec_4.get_weeks(
-            datetime(2011, 4, 2)), [13, 13, 13, 14])
-        self.assertEqual(sat_dec_1.get_weeks(
-            datetime(2010, 12, 25)), [13, 13, 13, 13])
+        assert sat_dec_1.get_weeks(datetime(2011, 4, 2)) == [14, 13, 13, 13]
+        assert sat_dec_4.get_weeks(datetime(2011, 4, 2)) == [13, 13, 13, 14]
+        assert sat_dec_1.get_weeks(datetime(2010, 12, 25)) == [13, 13, 13, 13]


 class TestFY5253NearestEndMonthQuarter(Base):
+
     def test_onOffset(self):

         offset_nem_sat_aug_4 = makeFY5253NearestEndMonthQuarter(
@@ -3813,18 +3754,19 @@ def test_offset(self):


 class TestQuarterBegin(Base):
+
     def test_repr(self):
-        self.assertEqual(repr(QuarterBegin()),
-                         "<QuarterBegin: startingMonth=3>")
-        self.assertEqual(repr(QuarterBegin(startingMonth=3)),
-                         "<QuarterBegin: startingMonth=3>")
-        self.assertEqual(repr(QuarterBegin(startingMonth=1)),
-                         "<QuarterBegin: startingMonth=1>")
+        assert (repr(QuarterBegin()) ==
+                "<QuarterBegin: startingMonth=3>")
+        assert (repr(QuarterBegin(startingMonth=3)) ==
+                "<QuarterBegin: startingMonth=3>")
+        assert (repr(QuarterBegin(startingMonth=1)) ==
+                "<QuarterBegin: startingMonth=1>")

     def test_isAnchored(self):
-        self.assertTrue(QuarterBegin(startingMonth=1).isAnchored())
-        self.assertTrue(QuarterBegin().isAnchored())
-        self.assertFalse(QuarterBegin(2, startingMonth=1).isAnchored())
+        assert QuarterBegin(startingMonth=1).isAnchored()
+        assert QuarterBegin().isAnchored()
+        assert not QuarterBegin(2, startingMonth=1).isAnchored()

     def test_offset(self):
         tests = []
@@ -3886,23 +3828,24 @@ def test_offset(self):

         # corner
         offset = QuarterBegin(n=-1, startingMonth=1)
-        self.assertEqual(datetime(2010, 2, 1) + offset, datetime(2010, 1, 1))
+        assert datetime(2010, 2, 1) + offset == datetime(2010, 1, 1)


 class TestQuarterEnd(Base):
     _offset = QuarterEnd

     def test_repr(self):
-        self.assertEqual(repr(QuarterEnd()), "<QuarterEnd: startingMonth=3>")
-        self.assertEqual(repr(QuarterEnd(startingMonth=3)),
-                         "<QuarterEnd: startingMonth=3>")
-        self.assertEqual(repr(QuarterEnd(startingMonth=1)),
-                         "<QuarterEnd: startingMonth=1>")
+        assert (repr(QuarterEnd()) ==
+                "<QuarterEnd: startingMonth=3>")
+        assert (repr(QuarterEnd(startingMonth=3)) ==
+                "<QuarterEnd: startingMonth=3>")
+        assert (repr(QuarterEnd(startingMonth=1)) ==
+                "<QuarterEnd: startingMonth=1>")

     def test_isAnchored(self):
-        self.assertTrue(QuarterEnd(startingMonth=1).isAnchored())
-        self.assertTrue(QuarterEnd().isAnchored())
-        self.assertFalse(QuarterEnd(2, startingMonth=1).isAnchored())
+        assert QuarterEnd(startingMonth=1).isAnchored()
+        assert QuarterEnd().isAnchored()
+        assert not QuarterEnd(2, startingMonth=1).isAnchored()

     def test_offset(self):
         tests = []
@@ -3963,7 +3906,7 @@ def test_offset(self):

         # corner
         offset = QuarterEnd(n=-1, startingMonth=1)
-        self.assertEqual(datetime(2010, 2, 1) + offset, datetime(2010, 1, 31))
+        assert datetime(2010, 2, 1) + offset == datetime(2010, 1, 31)

     def test_onOffset(self):

@@ -4031,8 +3974,8 @@ class TestBYearBegin(Base):
     _offset = BYearBegin

     def test_misspecified(self):
-        self.assertRaises(ValueError, BYearBegin, month=13)
-        self.assertRaises(ValueError, BYearEnd, month=13)
+        pytest.raises(ValueError, BYearBegin, month=13)
+        pytest.raises(ValueError, BYearEnd, month=13)

     def test_offset(self):
         tests = []
@@ -4077,7 +4020,7 @@ class TestYearBegin(Base):
     _offset = YearBegin

     def test_misspecified(self):
-        self.assertRaises(ValueError, YearBegin, month=13)
+        pytest.raises(ValueError, YearBegin, month=13)

     def test_offset(self):
         tests = []
@@ -4167,9 +4110,10 @@ def test_onOffset(self):


 class TestBYearEndLagged(Base):
+
     def test_bad_month_fail(self):
-        self.assertRaises(Exception, BYearEnd, month=13)
-        self.assertRaises(Exception, BYearEnd, month=0)
+        pytest.raises(Exception, BYearEnd, month=13)
+        pytest.raises(Exception, BYearEnd, month=0)

     def test_offset(self):
         tests = []
@@ -4184,14 +4128,14 @@ def test_offset(self):

         for offset, cases in tests:
             for base, expected in compat.iteritems(cases):
-                self.assertEqual(base + offset, expected)
+                assert base + offset == expected

     def test_roll(self):
         offset = BYearEnd(month=6)
         date = datetime(2009, 11, 30)

-        self.assertEqual(offset.rollforward(date), datetime(2010, 6, 30))
-        self.assertEqual(offset.rollback(date), datetime(2009, 6, 30))
+        assert offset.rollforward(date) == datetime(2010, 6, 30)
+        assert offset.rollback(date) == datetime(2009, 6, 30)

     def test_onOffset(self):

@@ -4257,7 +4201,7 @@ class TestYearEnd(Base):
     _offset = YearEnd

     def test_misspecified(self):
-        self.assertRaises(ValueError, YearEnd, month=13)
+        pytest.raises(ValueError, YearEnd, month=13)

     def test_offset(self):
         tests = []
@@ -4306,6 +4250,7 @@ def test_onOffset(self):


 class TestYearEndDiffMonth(Base):
+
     def test_offset(self):
         tests = []

@@ -4383,7 +4328,7 @@ def test_Easter():
     assertEq(-Easter(2), datetime(2010, 4, 4), datetime(2008, 3, 23))


-class TestTicks(tm.TestCase):
+class TestTicks(object):

     ticks = [Hour, Minute, Second, Milli, Micro, Nano]

@@ -4398,8 +4343,8 @@ def test_ticks(self):
         for kls, expected in offsets:
             offset = kls(3)
             result = offset + Timedelta(hours=2)
-            self.assertTrue(isinstance(result, Timedelta))
-            self.assertEqual(result, expected)
+            assert isinstance(result, Timedelta)
+            assert result == expected

     def test_Hour(self):
         assertEq(Hour(), datetime(2010, 1, 1), datetime(2010, 1, 1, 1))
@@ -4407,10 +4352,10 @@ def test_Hour(self):
         assertEq(2 * Hour(), datetime(2010, 1, 1), datetime(2010, 1, 1, 2))
         assertEq(-1 * Hour(), datetime(2010, 1, 1, 1), datetime(2010, 1, 1))

-        self.assertEqual(Hour(3) + Hour(2), Hour(5))
-        self.assertEqual(Hour(3) - Hour(2), Hour())
+        assert Hour(3) + Hour(2) == Hour(5)
+        assert Hour(3) - Hour(2) == Hour()

-        self.assertNotEqual(Hour(4), Hour(1))
+        assert Hour(4) != Hour(1)

     def test_Minute(self):
         assertEq(Minute(), datetime(2010, 1, 1), datetime(2010, 1, 1, 0, 1))
@@ -4420,9 +4365,9 @@ def test_Minute(self):
         assertEq(-1 * Minute(), datetime(2010, 1, 1, 0, 1),
                  datetime(2010, 1, 1))

-        self.assertEqual(Minute(3) + Minute(2), Minute(5))
-        self.assertEqual(Minute(3) - Minute(2), Minute())
-        self.assertNotEqual(Minute(5), Minute())
+        assert Minute(3) + Minute(2) == Minute(5)
+        assert Minute(3) - Minute(2) == Minute()
+        assert Minute(5) != Minute()

     def test_Second(self):
         assertEq(Second(), datetime(2010, 1, 1), datetime(2010, 1, 1, 0, 0, 1))
@@ -4433,8 +4378,8 @@ def test_Second(self):
         assertEq(-1 * Second(), datetime(2010, 1, 1, 0, 0, 1),
                  datetime(2010, 1, 1))

-        self.assertEqual(Second(3) + Second(2), Second(5))
-        self.assertEqual(Second(3) - Second(2), Second())
+        assert Second(3) + Second(2) == Second(5)
+        assert Second(3) - Second(2) == Second()

     def test_Millisecond(self):
         assertEq(Milli(), datetime(2010, 1, 1),
@@ -4448,8 +4393,8 @@ def test_Millisecond(self):
         assertEq(-1 * Milli(), datetime(2010, 1, 1, 0, 0, 0, 1000),
                  datetime(2010, 1, 1))
-        self.assertEqual(Milli(3) + Milli(2), Milli(5))
-        self.assertEqual(Milli(3) - Milli(2), Milli())
+        assert Milli(3) + Milli(2) == Milli(5)
+        assert Milli(3) - Milli(2) == Milli()

     def test_MillisecondTimestampArithmetic(self):
         assertEq(Milli(), Timestamp('2010-01-01'),
@@ -4467,18 +4412,18 @@ def test_Microsecond(self):
         assertEq(-1 * Micro(), datetime(2010, 1, 1, 0, 0, 0, 1),
                  datetime(2010, 1, 1))

-        self.assertEqual(Micro(3) + Micro(2), Micro(5))
-        self.assertEqual(Micro(3) - Micro(2), Micro())
+        assert Micro(3) + Micro(2) == Micro(5)
+        assert Micro(3) - Micro(2) == Micro()

     def test_NanosecondGeneric(self):
         timestamp = Timestamp(datetime(2010, 1, 1))
-        self.assertEqual(timestamp.nanosecond, 0)
+        assert timestamp.nanosecond == 0

         result = timestamp + Nano(10)
-        self.assertEqual(result.nanosecond, 10)
+        assert result.nanosecond == 10

         reverse_result = Nano(10) + timestamp
-        self.assertEqual(reverse_result.nanosecond, 10)
+        assert reverse_result.nanosecond == 10

     def test_Nanosecond(self):
         timestamp = Timestamp(datetime(2010, 1, 1))
@@ -4487,44 +4432,44 @@ def test_Nanosecond(self):
         assertEq(2 * Nano(), timestamp, timestamp + np.timedelta64(2, 'ns'))
         assertEq(-1 * Nano(), timestamp + np.timedelta64(1, 'ns'), timestamp)

-        self.assertEqual(Nano(3) + Nano(2), Nano(5))
-        self.assertEqual(Nano(3) - Nano(2), Nano())
+        assert Nano(3) + Nano(2) == Nano(5)
+        assert Nano(3) - Nano(2) == Nano()

         # GH9284
-        self.assertEqual(Nano(1) + Nano(10), Nano(11))
-        self.assertEqual(Nano(5) + Micro(1), Nano(1005))
-        self.assertEqual(Micro(5) + Nano(1), Nano(5001))
+        assert Nano(1) + Nano(10) == Nano(11)
+        assert Nano(5) + Micro(1) == Nano(1005)
+        assert Micro(5) + Nano(1) == Nano(5001)

     def test_tick_zero(self):
         for t1 in self.ticks:
             for t2 in self.ticks:
-                self.assertEqual(t1(0), t2(0))
-                self.assertEqual(t1(0) + t2(0), t1(0))
+                assert t1(0) == t2(0)
+                assert t1(0) + t2(0) == t1(0)

                 if t1 is not Nano:
-                    self.assertEqual(t1(2) + t2(0), t1(2))
+                    assert t1(2) + t2(0) == t1(2)

             if t1 is Nano:
-                self.assertEqual(t1(2) + Nano(0), t1(2))
+                assert t1(2) + Nano(0) == t1(2)

     def test_tick_equalities(self):
         for t in self.ticks:
-            self.assertEqual(t(3), t(3))
-            self.assertEqual(t(), t(1))
+            assert t(3) == t(3)
+            assert t() == t(1)

             # not equals
-            self.assertNotEqual(t(3), t(2))
-            self.assertNotEqual(t(3), t(-3))
+            assert t(3) != t(2)
+            assert t(3) != t(-3)

     def test_tick_operators(self):
         for t in self.ticks:
-            self.assertEqual(t(3) + t(2), t(5))
-            self.assertEqual(t(3) - t(2), t(1))
-            self.assertEqual(t(800) + t(300), t(1100))
-            self.assertEqual(t(1000) - t(5), t(995))
+            assert t(3) + t(2) == t(5)
+            assert t(3) - t(2) == t(1)
+            assert t(800) + t(300) == t(1100)
+            assert t(1000) - t(5) == t(995)

     def test_tick_offset(self):
         for t in self.ticks:
-            self.assertFalse(t().isAnchored())
+            assert not t().isAnchored()

     def test_compare_ticks(self):
        for kls in self.ticks:
@@ -4532,41 +4477,39 @@ def test_compare_ticks(self):
            four = kls(4)

            for _ in range(10):
-                self.assertTrue(three < kls(4))
-                self.assertTrue(kls(3) < four)
-                self.assertTrue(four > kls(3))
-                self.assertTrue(kls(4) > three)
-                self.assertTrue(kls(3) == kls(3))
-                self.assertTrue(kls(3) != kls(4))
+                assert three < kls(4)
+                assert kls(3) < four
+                assert four > kls(3)
+                assert kls(4) > three
+                assert kls(3) == kls(3)
+                assert kls(3) != kls(4)

+
+class TestOffsetNames(object):

-class TestOffsetNames(tm.TestCase):
     def test_get_offset_name(self):
-        self.assertEqual(BDay().freqstr, 'B')
-        self.assertEqual(BDay(2).freqstr, '2B')
-        self.assertEqual(BMonthEnd().freqstr, 'BM')
-        self.assertEqual(Week(weekday=0).freqstr, 'W-MON')
-        self.assertEqual(Week(weekday=1).freqstr, 'W-TUE')
-        self.assertEqual(Week(weekday=2).freqstr, 'W-WED')
-        self.assertEqual(Week(weekday=3).freqstr, 'W-THU')
-        self.assertEqual(Week(weekday=4).freqstr, 'W-FRI')
-
-        self.assertEqual(LastWeekOfMonth(
-            weekday=WeekDay.SUN).freqstr, "LWOM-SUN")
-        self.assertEqual(
-            makeFY5253LastOfMonthQuarter(weekday=1, startingMonth=3,
-                                         qtr_with_extra_week=4).freqstr,
-            "REQ-L-MAR-TUE-4")
-        self.assertEqual(
-            makeFY5253NearestEndMonthQuarter(weekday=1, startingMonth=3,
                                             qtr_with_extra_week=3).freqstr,
-            "REQ-N-MAR-TUE-3")
+        assert BDay().freqstr == 'B'
+        assert BDay(2).freqstr == '2B'
+        assert BMonthEnd().freqstr == 'BM'
+        assert Week(weekday=0).freqstr == 'W-MON'
+        assert Week(weekday=1).freqstr == 'W-TUE'
+        assert Week(weekday=2).freqstr == 'W-WED'
+        assert Week(weekday=3).freqstr == 'W-THU'
+        assert Week(weekday=4).freqstr == 'W-FRI'
+
+        assert LastWeekOfMonth(weekday=WeekDay.SUN).freqstr == "LWOM-SUN"
+        assert (makeFY5253LastOfMonthQuarter(
+            weekday=1, startingMonth=3,
+            qtr_with_extra_week=4).freqstr == "REQ-L-MAR-TUE-4")
+        assert (makeFY5253NearestEndMonthQuarter(
+            weekday=1, startingMonth=3,
+            qtr_with_extra_week=3).freqstr == "REQ-N-MAR-TUE-3")


 def test_get_offset():
-    with tm.assertRaisesRegexp(ValueError, _INVALID_FREQ_ERROR):
+    with tm.assert_raises_regex(ValueError, _INVALID_FREQ_ERROR):
         get_offset('gibberish')
-    with tm.assertRaisesRegexp(ValueError, _INVALID_FREQ_ERROR):
+    with tm.assert_raises_regex(ValueError, _INVALID_FREQ_ERROR):
         get_offset('QS-JAN-B')

     pairs = [
@@ -4594,17 +4537,18 @@ def test_get_offset():
 def test_get_offset_legacy():
     pairs = [('w@Sat', Week(weekday=5))]
     for name, expected in pairs:
-        with tm.assertRaisesRegexp(ValueError, _INVALID_FREQ_ERROR):
+        with tm.assert_raises_regex(ValueError, _INVALID_FREQ_ERROR):
             get_offset(name)


-class TestParseTimeString(tm.TestCase):
+class TestParseTimeString(object):
+
     def test_parse_time_string(self):
         (date, parsed, reso) = parse_time_string('4Q1984')
         (date_lower, parsed_lower, reso_lower) = parse_time_string('4q1984')
-        self.assertEqual(date, date_lower)
-        self.assertEqual(parsed, parsed_lower)
-        self.assertEqual(reso, reso_lower)
+        assert date == date_lower
+        assert parsed == parsed_lower
+        assert reso == reso_lower

     def test_parse_time_quarter_w_dash(self):
         # https://github.com/pandas-dev/pandas/issue/9688
@@ -4614,13 +4558,13 @@ def test_parse_time_quarter_w_dash(self):
             (date_dash, parsed_dash, reso_dash) = parse_time_string(dashed)
             (date, parsed, reso) = parse_time_string(normal)

-            self.assertEqual(date_dash, date)
-            self.assertEqual(parsed_dash, parsed)
-            self.assertEqual(reso_dash, reso)
+            assert date_dash == date
+            assert parsed_dash == parsed
+            assert reso_dash == reso

-        self.assertRaises(DateParseError, parse_time_string, "-2Q1992")
-        self.assertRaises(DateParseError, parse_time_string, "2-Q1992")
-        self.assertRaises(DateParseError, parse_time_string, "4-4Q1992")
+        pytest.raises(DateParseError, parse_time_string, "-2Q1992")
+        pytest.raises(DateParseError, parse_time_string, "2-Q1992")
+        pytest.raises(DateParseError, parse_time_string, "4-4Q1992")


 def test_get_standard_freq():
@@ -4633,7 +4577,7 @@ def test_get_standard_freq():
     with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
         assert fstr == get_standard_freq(('W', 1))

-    with tm.assertRaisesRegexp(ValueError, _INVALID_FREQ_ERROR):
+    with tm.assert_raises_regex(ValueError, _INVALID_FREQ_ERROR):
         with tm.assert_produces_warning(FutureWarning,
                                         check_stacklevel=False):
             get_standard_freq('WeEk')
@@ -4642,7 +4586,7 @@ def test_get_standard_freq():
     with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
         assert fstr == get_standard_freq('5q')

-    with tm.assertRaisesRegexp(ValueError, _INVALID_FREQ_ERROR):
+    with tm.assert_raises_regex(ValueError, _INVALID_FREQ_ERROR):
         with tm.assert_produces_warning(FutureWarning,
                                         check_stacklevel=False):
             get_standard_freq('5QuarTer')
@@ -4660,30 +4604,31 @@ def test_quarterly_dont_normalize():
         assert (result.time() == date.time())


-class TestOffsetAliases(tm.TestCase):
-    def setUp(self):
+class TestOffsetAliases(object):
+
+    def setup_method(self, method):
         _offset_map.clear()

     def test_alias_equality(self):
         for k, v in compat.iteritems(_offset_map):
             if v is None:
                 continue
-            self.assertEqual(k, v.copy())
+            assert k == v.copy()

     def test_rule_code(self):
         lst = ['M', 'MS', 'BM', 'BMS', 'D', 'B', 'H', 'T', 'S', 'L', 'U']
         for k in lst:
-            self.assertEqual(k, get_offset(k).rule_code)
+            assert k == get_offset(k).rule_code
             # should be cached - this is kind of an internals test...
             assert k in _offset_map
-            self.assertEqual(k, (get_offset(k) * 3).rule_code)
+            assert k == (get_offset(k) * 3).rule_code

         suffix_lst = ['MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN']
         base = 'W'
         for v in suffix_lst:
             alias = '-'.join([base, v])
-            self.assertEqual(alias, get_offset(alias).rule_code)
-            self.assertEqual(alias, (get_offset(alias) * 5).rule_code)
+            assert alias == get_offset(alias).rule_code
+            assert alias == (get_offset(alias) * 5).rule_code

         suffix_lst = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG',
                       'SEP', 'OCT', 'NOV', 'DEC']
@@ -4691,15 +4636,15 @@ def test_rule_code(self):
         for base in base_lst:
             for v in suffix_lst:
                 alias = '-'.join([base, v])
-                self.assertEqual(alias, get_offset(alias).rule_code)
-                self.assertEqual(alias, (get_offset(alias) * 5).rule_code)
+                assert alias == get_offset(alias).rule_code
+                assert alias == (get_offset(alias) * 5).rule_code

         lst = ['M', 'D', 'B', 'H', 'T', 'S', 'L', 'U']
         for k in lst:
             code, stride = get_freq_code('3' + k)
-            self.assertTrue(isinstance(code, int))
-            self.assertEqual(stride, 3)
-            self.assertEqual(k, _get_freq_str(code))
+            assert isinstance(code, int)
+            assert stride == 3
+            assert k == _get_freq_str(code)


 def test_apply_ticks():
@@ -4740,35 +4685,35 @@ def get_all_subclasses(cls):
     return ret


-class TestCaching(tm.TestCase):
+class TestCaching(object):

     # as of GH 6479 (in 0.14.0), offset caching is turned off
     # as of v0.12.0 only BusinessMonth/Quarter were actually caching

-    def setUp(self):
+    def setup_method(self, method):
         _daterange_cache.clear()
         _offset_map.clear()

     def run_X_index_creation(self, cls):
         inst1 = cls()
         if not inst1.isAnchored():
-            self.assertFalse(inst1._should_cache(), cls)
+            assert not inst1._should_cache(), cls
             return

-        self.assertTrue(inst1._should_cache(), cls)
+        assert inst1._should_cache(), cls

         DatetimeIndex(start=datetime(2013, 1, 31), end=datetime(2013, 3, 31),
                       freq=inst1, normalize=True)
-        self.assertTrue(cls() in _daterange_cache, cls)
+        assert cls() in _daterange_cache, cls

     def test_should_cache_month_end(self):
-        self.assertFalse(MonthEnd()._should_cache())
+        assert not MonthEnd()._should_cache()

     def test_should_cache_bmonth_end(self):
-        self.assertFalse(BusinessMonthEnd()._should_cache())
+        assert not BusinessMonthEnd()._should_cache()

     def test_should_cache_week_month(self):
-        self.assertFalse(WeekOfMonth(weekday=1, week=2)._should_cache())
+        assert not WeekOfMonth(weekday=1, week=2)._should_cache()

    def test_all_cacheableoffsets(self):
        for subclass in get_all_subclasses(CacheableOffset):
@@ -4780,22 +4725,23 @@ def test_all_cacheableoffsets(self):
     def test_month_end_index_creation(self):
         DatetimeIndex(start=datetime(2013, 1, 31), end=datetime(2013, 3, 31),
                       freq=MonthEnd(), normalize=True)
-        self.assertFalse(MonthEnd() in _daterange_cache)
+        assert not MonthEnd() in _daterange_cache

     def test_bmonth_end_index_creation(self):
         DatetimeIndex(start=datetime(2013, 1, 31), end=datetime(2013, 3, 29),
                       freq=BusinessMonthEnd(), normalize=True)
-        self.assertFalse(BusinessMonthEnd() in _daterange_cache)
+        assert not BusinessMonthEnd() in _daterange_cache

     def test_week_of_month_index_creation(self):
         inst1 = WeekOfMonth(weekday=1, week=2)
         DatetimeIndex(start=datetime(2013, 1, 31), end=datetime(2013, 3, 29),
                       freq=inst1, normalize=True)
         inst2 = WeekOfMonth(weekday=1, week=2)
-        self.assertFalse(inst2 in _daterange_cache)
+        assert inst2 not in _daterange_cache


-class TestReprNames(tm.TestCase):
+class TestReprNames(object):
+
     def test_str_for_named_is_name(self):
         # look at all the amazing combinations!
         month_prefixes = ['A', 'AS', 'BA', 'BAS', 'Q', 'BQ', 'BQS', 'QS']
@@ -4810,7 +4756,7 @@ def test_str_for_named_is_name(self):
         _offset_map.clear()
         for name in names:
             offset = get_offset(name)
-            self.assertEqual(offset.freqstr, name)
+            assert offset.freqstr == name


 def get_utc_offset_hours(ts):
@@ -4819,7 +4765,7 @@ def get_utc_offset_hours(ts):
     return (o.days * 24 * 3600 + o.seconds) / 3600.0


-class TestDST(tm.TestCase):
+class TestDST(object):
     """
     test DateOffset additions over Daylight Savings Time
     """
@@ -4855,34 +4801,34 @@ def _test_offset(self, offset_name, offset_n, tstart, expected_utc_offset):
         t = tstart + offset
         if expected_utc_offset is not None:
-            self.assertTrue(get_utc_offset_hours(t) == expected_utc_offset)
+            assert get_utc_offset_hours(t) == expected_utc_offset

         if offset_name == 'weeks':
             # dates should match
-            self.assertTrue(t.date() == timedelta(days=7 * offset.kwds[
-                'weeks']) + tstart.date())
+            assert t.date() == timedelta(days=7 * offset.kwds[
+                'weeks']) + tstart.date()

             # expect the same day of week, hour of day, minute, second, ...
-            self.assertTrue(t.dayofweek == tstart.dayofweek and t.hour ==
-                            tstart.hour and t.minute == tstart.minute and
-                            t.second == tstart.second)
+            assert (t.dayofweek == tstart.dayofweek and
+                    t.hour == tstart.hour and
+                    t.minute == tstart.minute and
+                    t.second == tstart.second)
         elif offset_name == 'days':
             # dates should match
-            self.assertTrue(timedelta(offset.kwds['days']) + tstart.date() ==
-                            t.date())
+            assert timedelta(offset.kwds['days']) + tstart.date() == t.date()

             # expect the same hour of day, minute, second, ...
- self.assertTrue(t.hour == tstart.hour and - t.minute == tstart.minute and - t.second == tstart.second) + assert (t.hour == tstart.hour and + t.minute == tstart.minute and + t.second == tstart.second) elif offset_name in self.valid_date_offsets_singular: # expect the signular offset value to match between tstart and t datepart_offset = getattr(t, offset_name if offset_name != 'weekday' else 'dayofweek') - self.assertTrue(datepart_offset == offset.kwds[offset_name]) + assert datepart_offset == offset.kwds[offset_name] else: # the offset should be the same as if it was done in UTC - self.assertTrue(t == (tstart.tz_convert('UTC') + offset - ).tz_convert('US/Pacific')) + assert (t == (tstart.tz_convert('UTC') + offset) + .tz_convert('US/Pacific')) def _make_timestamp(self, string, hrs_offset, tz): if hrs_offset >= 0: @@ -4898,7 +4844,7 @@ def test_fallback_plural(self): hrs_pre = utc_offsets['utc_offset_daylight'] hrs_post = utc_offsets['utc_offset_standard'] - if dateutil.__version__ != LooseVersion('2.6.0'): + if dateutil.__version__ < LooseVersion('2.6.0'): # buggy ambiguous behavior in 2.6.0 # GH 14621 # https://github.com/dateutil/dateutil/issues/321 @@ -4906,6 +4852,9 @@ def test_fallback_plural(self): n=3, tstart=self._make_timestamp(self.ts_pre_fallback, hrs_pre, tz), expected_utc_offset=hrs_post) + elif dateutil.__version__ > LooseVersion('2.6.0'): + # fixed, but skip the test + continue def test_springforward_plural(self): # test moving from standard to daylight savings @@ -4955,9 +4904,4 @@ def test_all_offset_classes(self): for offset, test_values in iteritems(tests): first = Timestamp(test_values[0], tz='US/Eastern') + offset() second = Timestamp(test_values[1], tz='US/Eastern') - self.assertEqual(first, second, msg=str(offset)) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + assert first == second diff --git a/pandas/tseries/tests/test_timezones.py b/pandas/tests/tseries/test_timezones.py similarity index 72% rename from pandas/tseries/tests/test_timezones.py rename to pandas/tests/tseries/test_timezones.py index 64787b6e4e79a..a9ecfd797a32b 100644 --- a/pandas/tseries/tests/test_timezones.py +++ b/pandas/tests/tseries/test_timezones.py @@ -1,35 +1,27 @@ # pylint: disable-msg=E1101,W0612 -from datetime import datetime, timedelta, tzinfo, date -import nose +import pytest -import numpy as np import pytz -from distutils.version import LooseVersion -from pandas.types.dtypes import DatetimeTZDtype -from pandas import (Index, Series, DataFrame, isnull, Timestamp) - -from pandas import DatetimeIndex, to_datetime, NaT -from pandas import tslib +import dateutil +import numpy as np -import pandas.tseries.offsets as offsets -from pandas.tseries.index import bdate_range, date_range -import pandas.tseries.tools as tools +from dateutil.parser import parse from pytz import NonExistentTimeError +from distutils.version import LooseVersion +from dateutil.tz import tzlocal, tzoffset +from datetime import datetime, timedelta, tzinfo, date import pandas.util.testing as tm +import pandas.core.tools.datetimes as tools +import pandas.tseries.offsets as offsets +from pandas.compat import lrange, zip +from pandas.core.indexes.datetimes import bdate_range, date_range +from pandas.core.dtypes.dtypes import DatetimeTZDtype +from pandas._libs import tslib +from pandas import (Index, Series, DataFrame, isna, Timestamp, NaT, + DatetimeIndex, to_datetime) from pandas.util.testing import (assert_frame_equal, assert_series_equal, 
set_timezone) -from pandas.compat import lrange, zip - -try: - import pytz # noqa -except ImportError: - pass - -try: - import dateutil -except ImportError: - pass class FixedOffset(tzinfo): @@ -53,11 +45,7 @@ def dst(self, dt): fixed_off_no_name = FixedOffset(-330, None) -class TestTimeZoneSupportPytz(tm.TestCase): - _multiprocess_can_split_ = True - - def setUp(self): - tm._skip_if_no_pytz() +class TestTimeZoneSupportPytz(object): def tz(self, tz): # Construct a timezone object from a string. Overridden in subclass to @@ -82,18 +70,18 @@ def test_utc_to_local_no_modify(self): rng_eastern = rng.tz_convert(self.tzstr('US/Eastern')) # Values are unmodified - self.assertTrue(np.array_equal(rng.asi8, rng_eastern.asi8)) + assert np.array_equal(rng.asi8, rng_eastern.asi8) - self.assertTrue(self.cmptz(rng_eastern.tz, self.tz('US/Eastern'))) + assert self.cmptz(rng_eastern.tz, self.tz('US/Eastern')) def test_utc_to_local_no_modify_explicit(self): rng = date_range('3/11/2012', '3/12/2012', freq='H', tz='utc') rng_eastern = rng.tz_convert(self.tz('US/Eastern')) # Values are unmodified - self.assert_numpy_array_equal(rng.asi8, rng_eastern.asi8) + tm.assert_numpy_array_equal(rng.asi8, rng_eastern.asi8) - self.assertEqual(rng_eastern.tz, self.tz('US/Eastern')) + assert rng_eastern.tz == self.tz('US/Eastern') def test_localize_utc_conversion(self): # Localizing to time zone should: @@ -104,13 +92,13 @@ def test_localize_utc_conversion(self): converted = rng.tz_localize(self.tzstr('US/Eastern')) expected_naive = rng + offsets.Hour(5) - self.assert_numpy_array_equal(converted.asi8, expected_naive.asi8) + tm.assert_numpy_array_equal(converted.asi8, expected_naive.asi8) # DST ambiguity, this should fail rng = date_range('3/11/2012', '3/12/2012', freq='30T') # Is this really how it should fail?? - self.assertRaises(NonExistentTimeError, rng.tz_localize, - self.tzstr('US/Eastern')) + pytest.raises(NonExistentTimeError, rng.tz_localize, + self.tzstr('US/Eastern')) def test_localize_utc_conversion_explicit(self): # Localizing to time zone should: @@ -120,29 +108,29 @@ def test_localize_utc_conversion_explicit(self): rng = date_range('3/10/2012', '3/11/2012', freq='30T') converted = rng.tz_localize(self.tz('US/Eastern')) expected_naive = rng + offsets.Hour(5) - self.assertTrue(np.array_equal(converted.asi8, expected_naive.asi8)) + assert np.array_equal(converted.asi8, expected_naive.asi8) # DST ambiguity, this should fail rng = date_range('3/11/2012', '3/12/2012', freq='30T') # Is this really how it should fail?? 
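The `pytest.raises` conversions above exercise pandas' behavior at the spring-forward gap. As a standalone sketch, mirroring the test file's own imports and assuming pytz is installed:

```python
import pytest
from pandas import date_range
from pytz import NonExistentTimeError

# 02:00-02:59 on 2012-03-11 does not exist in US/Eastern (clocks jump
# from 02:00 to 03:00), so localizing a naive range covering the gap
# raises NonExistentTimeError.
rng = date_range('3/11/2012', '3/12/2012', freq='30T')

with pytest.raises(NonExistentTimeError):
    rng.tz_localize('US/Eastern')
```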
- self.assertRaises(NonExistentTimeError, rng.tz_localize, - self.tz('US/Eastern')) + pytest.raises(NonExistentTimeError, rng.tz_localize, + self.tz('US/Eastern')) def test_timestamp_tz_localize(self): stamp = Timestamp('3/11/2012 04:00') result = stamp.tz_localize(self.tzstr('US/Eastern')) expected = Timestamp('3/11/2012 04:00', tz=self.tzstr('US/Eastern')) - self.assertEqual(result.hour, expected.hour) - self.assertEqual(result, expected) + assert result.hour == expected.hour + assert result == expected def test_timestamp_tz_localize_explicit(self): stamp = Timestamp('3/11/2012 04:00') result = stamp.tz_localize(self.tz('US/Eastern')) expected = Timestamp('3/11/2012 04:00', tz=self.tz('US/Eastern')) - self.assertEqual(result.hour, expected.hour) - self.assertEqual(result, expected) + assert result.hour == expected.hour + assert result == expected def test_timestamp_constructed_by_date_and_tz(self): # Fix Issue 2993, Timestamp cannot be constructed by datetime.date @@ -151,8 +139,8 @@ def test_timestamp_constructed_by_date_and_tz(self): result = Timestamp(date(2012, 3, 11), tz=self.tzstr('US/Eastern')) expected = Timestamp('3/11/2012', tz=self.tzstr('US/Eastern')) - self.assertEqual(result.hour, expected.hour) - self.assertEqual(result, expected) + assert result.hour == expected.hour + assert result == expected def test_timestamp_constructed_by_date_and_tz_explicit(self): # Fix Issue 2993, Timestamp cannot be constructed by datetime.date @@ -161,16 +149,60 @@ def test_timestamp_constructed_by_date_and_tz_explicit(self): result = Timestamp(date(2012, 3, 11), tz=self.tz('US/Eastern')) expected = Timestamp('3/11/2012', tz=self.tz('US/Eastern')) - self.assertEqual(result.hour, expected.hour) - self.assertEqual(result, expected) + assert result.hour == expected.hour + assert result == expected + + def test_timestamp_constructor_near_dst_boundary(self): + # GH 11481 & 15777 + # Naive string timestamps were being localized incorrectly + # with tz_convert_single instead of tz_localize_to_utc + + for tz in ['Europe/Brussels', 'Europe/Prague']: + result = Timestamp('2015-10-25 01:00', tz=tz) + expected = Timestamp('2015-10-25 01:00').tz_localize(tz) + assert result == expected + + with pytest.raises(pytz.AmbiguousTimeError): + Timestamp('2015-10-25 02:00', tz=tz) + + result = Timestamp('2017-03-26 01:00', tz='Europe/Paris') + expected = Timestamp('2017-03-26 01:00').tz_localize('Europe/Paris') + assert result == expected + + with pytest.raises(pytz.NonExistentTimeError): + Timestamp('2017-03-26 02:00', tz='Europe/Paris') + + # GH 11708 + result = to_datetime("2015-11-18 15:30:00+05:30").tz_localize( + 'UTC').tz_convert('Asia/Kolkata') + expected = Timestamp('2015-11-18 15:30:00+0530', tz='Asia/Kolkata') + assert result == expected + + # GH 15823 + result = Timestamp('2017-03-26 00:00', tz='Europe/Paris') + expected = Timestamp('2017-03-26 00:00:00+0100', tz='Europe/Paris') + assert result == expected + + result = Timestamp('2017-03-26 01:00', tz='Europe/Paris') + expected = Timestamp('2017-03-26 01:00:00+0100', tz='Europe/Paris') + assert result == expected + + with pytest.raises(pytz.NonExistentTimeError): + Timestamp('2017-03-26 02:00', tz='Europe/Paris') + result = Timestamp('2017-03-26 02:00:00+0100', tz='Europe/Paris') + expected = Timestamp(result.value).tz_localize( + 'UTC').tz_convert('Europe/Paris') + assert result == expected + + result = Timestamp('2017-03-26 03:00', tz='Europe/Paris') + expected = Timestamp('2017-03-26 03:00:00+0200', tz='Europe/Paris') + assert result == expected def 
test_timestamp_to_datetime_tzoffset(self): - # tzoffset - from dateutil.tz import tzoffset tzinfo = tzoffset(None, 7200) expected = Timestamp('3/11/2012 04:00', tz=tzinfo) result = Timestamp(expected.to_pydatetime()) - self.assertEqual(expected, result) + assert expected == result def test_timedelta_push_over_dst_boundary(self): # #1389 @@ -183,7 +215,7 @@ def test_timedelta_push_over_dst_boundary(self): # spring forward, + "7" hours expected = Timestamp('3/11/2012 05:00', tz=self.tzstr('US/Eastern')) - self.assertEqual(result, expected) + assert result == expected def test_timedelta_push_over_dst_boundary_explicit(self): # #1389 @@ -196,7 +228,7 @@ def test_timedelta_push_over_dst_boundary_explicit(self): # spring forward, + "7" hours expected = Timestamp('3/11/2012 05:00', tz=self.tz('US/Eastern')) - self.assertEqual(result, expected) + assert result == expected def test_tz_localize_dti(self): dti = DatetimeIndex(start='1/1/2005', end='1/1/2005 0:00:30.256', @@ -206,20 +238,20 @@ def test_tz_localize_dti(self): dti_utc = DatetimeIndex(start='1/1/2005 05:00', end='1/1/2005 5:00:30.256', freq='L', tz='utc') - self.assert_numpy_array_equal(dti2.values, dti_utc.values) + tm.assert_numpy_array_equal(dti2.values, dti_utc.values) dti3 = dti2.tz_convert(self.tzstr('US/Pacific')) - self.assert_numpy_array_equal(dti3.values, dti_utc.values) + tm.assert_numpy_array_equal(dti3.values, dti_utc.values) dti = DatetimeIndex(start='11/6/2011 1:59', end='11/6/2011 2:00', freq='L') - self.assertRaises(pytz.AmbiguousTimeError, dti.tz_localize, - self.tzstr('US/Eastern')) + pytest.raises(pytz.AmbiguousTimeError, dti.tz_localize, + self.tzstr('US/Eastern')) dti = DatetimeIndex(start='3/13/2011 1:59', end='3/13/2011 2:00', freq='L') - self.assertRaises(pytz.NonExistentTimeError, dti.tz_localize, - self.tzstr('US/Eastern')) + pytest.raises(pytz.NonExistentTimeError, dti.tz_localize, + self.tzstr('US/Eastern')) def test_tz_localize_empty_series(self): # #2248 @@ -227,57 +259,57 @@ def test_tz_localize_empty_series(self): ts = Series() ts2 = ts.tz_localize('utc') - self.assertTrue(ts2.index.tz == pytz.utc) + assert ts2.index.tz == pytz.utc ts2 = ts.tz_localize(self.tzstr('US/Eastern')) - self.assertTrue(self.cmptz(ts2.index.tz, self.tz('US/Eastern'))) + assert self.cmptz(ts2.index.tz, self.tz('US/Eastern')) def test_astimezone(self): utc = Timestamp('3/11/2012 22:00', tz='UTC') expected = utc.tz_convert(self.tzstr('US/Eastern')) result = utc.astimezone(self.tzstr('US/Eastern')) - self.assertEqual(expected, result) - tm.assertIsInstance(result, Timestamp) + assert expected == result + assert isinstance(result, Timestamp) def test_create_with_tz(self): stamp = Timestamp('3/11/2012 05:00', tz=self.tzstr('US/Eastern')) - self.assertEqual(stamp.hour, 5) + assert stamp.hour == 5 rng = date_range('3/11/2012 04:00', periods=10, freq='H', tz=self.tzstr('US/Eastern')) - self.assertEqual(stamp, rng[1]) + assert stamp == rng[1] utc_stamp = Timestamp('3/11/2012 05:00', tz='utc') - self.assertIs(utc_stamp.tzinfo, pytz.utc) - self.assertEqual(utc_stamp.hour, 5) + assert utc_stamp.tzinfo is pytz.utc + assert utc_stamp.hour == 5 - stamp = Timestamp('3/11/2012 05:00').tz_localize('utc') - self.assertEqual(utc_stamp.hour, 5) + utc_stamp = Timestamp('3/11/2012 05:00').tz_localize('utc') + assert utc_stamp.hour == 5 def test_create_with_fixed_tz(self): off = FixedOffset(420, '+07:00') start = datetime(2012, 3, 11, 5, 0, 0, tzinfo=off) end = datetime(2012, 6, 11, 5, 0, 0, tzinfo=off) rng = date_range(start=start, end=end) - 
self.assertEqual(off, rng.tz) + assert off == rng.tz rng2 = date_range(start, periods=len(rng), tz=off) - self.assert_index_equal(rng, rng2) + tm.assert_index_equal(rng, rng2) rng3 = date_range('3/11/2012 05:00:00+07:00', '6/11/2012 05:00:00+07:00') - self.assertTrue((rng.values == rng3.values).all()) + assert (rng.values == rng3.values).all() def test_create_with_fixedoffset_noname(self): off = fixed_off_no_name start = datetime(2012, 3, 11, 5, 0, 0, tzinfo=off) end = datetime(2012, 6, 11, 5, 0, 0, tzinfo=off) rng = date_range(start=start, end=end) - self.assertEqual(off, rng.tz) + assert off == rng.tz idx = Index([start, end]) - self.assertEqual(off, idx.tz) + assert off == idx.tz def test_date_range_localize(self): rng = date_range('3/11/2012 03:00', periods=15, freq='H', @@ -287,33 +319,33 @@ def test_date_range_localize(self): rng3 = date_range('3/11/2012 03:00', periods=15, freq='H') rng3 = rng3.tz_localize('US/Eastern') - self.assert_index_equal(rng, rng3) + tm.assert_index_equal(rng, rng3) # DST transition time val = rng[0] exp = Timestamp('3/11/2012 03:00', tz='US/Eastern') - self.assertEqual(val.hour, 3) - self.assertEqual(exp.hour, 3) - self.assertEqual(val, exp) # same UTC value - self.assert_index_equal(rng[:2], rng2) + assert val.hour == 3 + assert exp.hour == 3 + assert val == exp # same UTC value + tm.assert_index_equal(rng[:2], rng2) # Right before the DST transition rng = date_range('3/11/2012 00:00', periods=2, freq='H', tz='US/Eastern') rng2 = DatetimeIndex(['3/11/2012 00:00', '3/11/2012 01:00'], tz='US/Eastern') - self.assert_index_equal(rng, rng2) + tm.assert_index_equal(rng, rng2) exp = Timestamp('3/11/2012 00:00', tz='US/Eastern') - self.assertEqual(exp.hour, 0) - self.assertEqual(rng[0], exp) + assert exp.hour == 0 + assert rng[0] == exp exp = Timestamp('3/11/2012 01:00', tz='US/Eastern') - self.assertEqual(exp.hour, 1) - self.assertEqual(rng[1], exp) + assert exp.hour == 1 + assert rng[1] == exp rng = date_range('3/11/2012 00:00', periods=10, freq='H', tz='US/Eastern') - self.assertEqual(rng[2].hour, 3) + assert rng[2].hour == 3 def test_utc_box_timestamp_and_localize(self): rng = date_range('3/11/2012', '3/12/2012', freq='H', tz='utc') @@ -323,16 +355,16 @@ def test_utc_box_timestamp_and_localize(self): expected = rng[-1].astimezone(tz) stamp = rng_eastern[-1] - self.assertEqual(stamp, expected) - self.assertEqual(stamp.tzinfo, expected.tzinfo) + assert stamp == expected + assert stamp.tzinfo == expected.tzinfo # right tzinfo rng = date_range('3/13/2012', '3/14/2012', freq='H', tz='utc') rng_eastern = rng.tz_convert(self.tzstr('US/Eastern')) # test not valid for dateutil timezones. 
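The fixed-offset hunks above boil down to the following sketch. `datetime.timezone` is used here as a stdlib stand-in for the file's custom `FixedOffset` helper; that substitution, and the expectation that pandas preserves the exact tzinfo object on the index, are assumptions worth checking against the targeted pandas version:

```python
from datetime import datetime, timedelta, timezone
import pandas as pd

# A constant-offset tzinfo with no DST, like the test file's FixedOffset.
off = timezone(timedelta(hours=7))

start = datetime(2012, 3, 11, 5, 0, 0, tzinfo=off)
end = datetime(2012, 6, 11, 5, 0, 0, tzinfo=off)

rng = pd.date_range(start=start, end=end)
assert rng.tz == off  # the fixed offset survives on the DatetimeIndex

# an equivalent range built from ISO-8601 strings carrying the same offset
rng2 = pd.date_range('3/11/2012 05:00:00+07:00', '6/11/2012 05:00:00+07:00')
assert (rng.values == rng2.values).all()  # identical UTC instants
```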
- # self.assertIn('EDT', repr(rng_eastern[0].tzinfo)) - self.assertTrue('EDT' in repr(rng_eastern[0].tzinfo) or 'tzfile' in - repr(rng_eastern[0].tzinfo)) + # assert 'EDT' in repr(rng_eastern[0].tzinfo) + assert ('EDT' in repr(rng_eastern[0].tzinfo) or + 'tzfile' in repr(rng_eastern[0].tzinfo)) def test_timestamp_tz_convert(self): strdates = ['1/1/2012', '3/1/2012', '4/1/2012'] @@ -341,7 +373,7 @@ def test_timestamp_tz_convert(self): conv = idx[0].tz_convert(self.tzstr('US/Pacific')) expected = idx.tz_convert(self.tzstr('US/Pacific'))[0] - self.assertEqual(conv, expected) + assert conv == expected def test_pass_dates_localize_to_utc(self): strdates = ['1/1/2012', '3/1/2012', '4/1/2012'] @@ -351,20 +383,20 @@ def test_pass_dates_localize_to_utc(self): fromdates = DatetimeIndex(strdates, tz=self.tzstr('US/Eastern')) - self.assertEqual(conv.tz, fromdates.tz) - self.assert_numpy_array_equal(conv.values, fromdates.values) + assert conv.tz == fromdates.tz + tm.assert_numpy_array_equal(conv.values, fromdates.values) def test_field_access_localize(self): strdates = ['1/1/2012', '3/1/2012', '4/1/2012'] rng = DatetimeIndex(strdates, tz=self.tzstr('US/Eastern')) - self.assertTrue((rng.hour == 0).all()) + assert (rng.hour == 0).all() # a more unusual time zone, #1946 dr = date_range('2011-10-02 00:00', freq='h', periods=10, tz=self.tzstr('America/Atikokan')) - expected = np.arange(10, dtype=np.int32) - self.assert_numpy_array_equal(dr.hour, expected) + expected = Index(np.arange(10, dtype=np.int64)) + tm.assert_index_equal(dr.hour, expected) def test_with_tz(self): tz = self.tz('US/Central') @@ -372,7 +404,7 @@ def test_with_tz(self): # just want it to work start = datetime(2011, 3, 12, tzinfo=pytz.utc) dr = bdate_range(start, periods=50, freq=offsets.Hour()) - self.assertIs(dr.tz, pytz.utc) + assert dr.tz is pytz.utc # DateRange with naive datetimes dr = bdate_range('1/1/2005', '1/1/2009', tz=pytz.utc) @@ -380,29 +412,29 @@ def test_with_tz(self): # normalized central = dr.tz_convert(tz) - self.assertIs(central.tz, tz) + assert central.tz is tz comp = self.localize(tz, central[0].to_pydatetime().replace( tzinfo=None)).tzinfo - self.assertIs(central[0].tz, comp) + assert central[0].tz is comp # compare vs a localized tz comp = self.localize(tz, dr[0].to_pydatetime().replace(tzinfo=None)).tzinfo - self.assertIs(central[0].tz, comp) + assert central[0].tz is comp # datetimes with tzinfo set dr = bdate_range(datetime(2005, 1, 1, tzinfo=pytz.utc), '1/1/2009', tz=pytz.utc) - self.assertRaises(Exception, bdate_range, - datetime(2005, 1, 1, tzinfo=pytz.utc), '1/1/2009', - tz=tz) + pytest.raises(Exception, bdate_range, + datetime(2005, 1, 1, tzinfo=pytz.utc), '1/1/2009', + tz=tz) def test_tz_localize(self): dr = bdate_range('1/1/2009', '1/1/2010') dr_utc = bdate_range('1/1/2009', '1/1/2010', tz=pytz.utc) localized = dr.tz_localize(pytz.utc) - self.assert_index_equal(dr_utc, localized) + tm.assert_index_equal(dr_utc, localized) def test_with_tz_ambiguous_times(self): tz = self.tz('US/Eastern') @@ -410,7 +442,7 @@ def test_with_tz_ambiguous_times(self): # March 13, 2011, spring forward, skip from 2 AM to 3 AM dr = date_range(datetime(2011, 3, 13, 1, 30), periods=3, freq=offsets.Hour()) - self.assertRaises(pytz.NonExistentTimeError, dr.tz_localize, tz) + pytest.raises(pytz.NonExistentTimeError, dr.tz_localize, tz) # after dst transition, it works dr = date_range(datetime(2011, 3, 13, 3, 30), periods=3, @@ -419,7 +451,7 @@ def test_with_tz_ambiguous_times(self): # November 6, 2011, fall back, repeat 2 AM hour dr 
= date_range(datetime(2011, 11, 6, 1, 30), periods=3, freq=offsets.Hour()) - self.assertRaises(pytz.AmbiguousTimeError, dr.tz_localize, tz) + pytest.raises(pytz.AmbiguousTimeError, dr.tz_localize, tz) # UTC is OK dr = date_range(datetime(2011, 3, 13), periods=48, @@ -431,7 +463,7 @@ def test_ambiguous_infer(self): tz = self.tz('US/Eastern') dr = date_range(datetime(2011, 11, 6, 0), periods=5, freq=offsets.Hour()) - self.assertRaises(pytz.AmbiguousTimeError, dr.tz_localize, tz) + pytest.raises(pytz.AmbiguousTimeError, dr.tz_localize, tz) # With repeated hours, we can infer the transition dr = date_range(datetime(2011, 11, 6, 0), periods=5, @@ -440,22 +472,22 @@ def test_ambiguous_infer(self): '11/06/2011 02:00', '11/06/2011 03:00'] di = DatetimeIndex(times) localized = di.tz_localize(tz, ambiguous='infer') - self.assert_index_equal(dr, localized) + tm.assert_index_equal(dr, localized) with tm.assert_produces_warning(FutureWarning): localized_old = di.tz_localize(tz, infer_dst=True) - self.assert_index_equal(dr, localized_old) - self.assert_index_equal(dr, DatetimeIndex(times, tz=tz, - ambiguous='infer')) + tm.assert_index_equal(dr, localized_old) + tm.assert_index_equal(dr, DatetimeIndex(times, tz=tz, + ambiguous='infer')) # When there is no dst transition, nothing special happens dr = date_range(datetime(2011, 6, 1, 0), periods=10, freq=offsets.Hour()) localized = dr.tz_localize(tz) localized_infer = dr.tz_localize(tz, ambiguous='infer') - self.assert_index_equal(localized, localized_infer) + tm.assert_index_equal(localized, localized_infer) with tm.assert_produces_warning(FutureWarning): localized_infer_old = dr.tz_localize(tz, infer_dst=True) - self.assert_index_equal(localized, localized_infer_old) + tm.assert_index_equal(localized, localized_infer_old) def test_ambiguous_flags(self): # November 6, 2011, fall back, repeat 2 AM hour @@ -471,33 +503,33 @@ def test_ambiguous_flags(self): di = DatetimeIndex(times) is_dst = [1, 1, 0, 0, 0] localized = di.tz_localize(tz, ambiguous=is_dst) - self.assert_index_equal(dr, localized) - self.assert_index_equal(dr, DatetimeIndex(times, tz=tz, - ambiguous=is_dst)) + tm.assert_index_equal(dr, localized) + tm.assert_index_equal(dr, DatetimeIndex(times, tz=tz, + ambiguous=is_dst)) localized = di.tz_localize(tz, ambiguous=np.array(is_dst)) - self.assert_index_equal(dr, localized) + tm.assert_index_equal(dr, localized) localized = di.tz_localize(tz, ambiguous=np.array(is_dst).astype('bool')) - self.assert_index_equal(dr, localized) + tm.assert_index_equal(dr, localized) # Test constructor localized = DatetimeIndex(times, tz=tz, ambiguous=is_dst) - self.assert_index_equal(dr, localized) + tm.assert_index_equal(dr, localized) # Test duplicate times where infer_dst fails times += times di = DatetimeIndex(times) # When the sizes are incompatible, make sure error is raised - self.assertRaises(Exception, di.tz_localize, tz, ambiguous=is_dst) + pytest.raises(Exception, di.tz_localize, tz, ambiguous=is_dst) # When sizes are compatible and there are repeats ('infer' won't work) is_dst = np.hstack((is_dst, is_dst)) localized = di.tz_localize(tz, ambiguous=is_dst) dr = dr.append(dr) - self.assert_index_equal(dr, localized) + tm.assert_index_equal(dr, localized) # When there is no dst transition, nothing special happens dr = date_range(datetime(2011, 6, 1, 0), periods=10, @@ -505,7 +537,7 @@ def test_ambiguous_flags(self): is_dst = np.array([1] * 10) localized = dr.tz_localize(tz) localized_is_dst = dr.tz_localize(tz, ambiguous=is_dst) - 
self.assert_index_equal(localized, localized_is_dst) + tm.assert_index_equal(localized, localized_is_dst) # construction with an ambiguous end-point # GH 11626 @@ -514,16 +546,24 @@ def test_ambiguous_flags(self): def f(): date_range("2013-10-26 23:00", "2013-10-27 01:00", tz="Europe/London", freq="H") - self.assertRaises(pytz.AmbiguousTimeError, f) + pytest.raises(pytz.AmbiguousTimeError, f) times = date_range("2013-10-26 23:00", "2013-10-27 01:00", freq="H", tz=tz, ambiguous='infer') - self.assertEqual(times[0], Timestamp('2013-10-26 23:00', tz=tz, - freq="H")) - if dateutil.__version__ != LooseVersion('2.6.0'): - # GH 14621 - self.assertEqual(times[-1], Timestamp('2013-10-27 01:00', tz=tz, - freq="H")) + assert times[0] == Timestamp('2013-10-26 23:00', tz=tz, freq="H") + + if str(tz).startswith('dateutil'): + if dateutil.__version__ < LooseVersion('2.6.0'): + # see gh-14621 + assert times[-1] == Timestamp('2013-10-27 01:00:00+0000', + tz=tz, freq="H") + elif dateutil.__version__ > LooseVersion('2.6.0'): + # fixed ambiguous behavior + assert times[-1] == Timestamp('2013-10-27 01:00:00+0100', + tz=tz, freq="H") + else: + assert times[-1] == Timestamp('2013-10-27 01:00:00+0000', + tz=tz, freq="H") def test_ambiguous_nat(self): tz = self.tz('US/Eastern') @@ -538,7 +578,7 @@ def test_ambiguous_nat(self): # left dtype is datetime64[ns, US/Eastern] # right is datetime64[ns, tzfile('/usr/share/zoneinfo/US/Eastern')] - self.assert_numpy_array_equal(di_test.values, localized.values) + tm.assert_numpy_array_equal(di_test.values, localized.values) def test_ambiguous_bool(self): # make sure that we are correctly accepting bool values as ambiguous @@ -550,13 +590,13 @@ def test_ambiguous_bool(self): def f(): t.tz_localize('US/Central') - self.assertRaises(pytz.AmbiguousTimeError, f) + pytest.raises(pytz.AmbiguousTimeError, f) result = t.tz_localize('US/Central', ambiguous=True) - self.assertEqual(result, expected0) + assert result == expected0 result = t.tz_localize('US/Central', ambiguous=False) - self.assertEqual(result, expected1) + assert result == expected1 s = Series([t]) expected0 = Series([expected0]) @@ -564,7 +604,7 @@ def f(): def f(): s.dt.tz_localize('US/Central') - self.assertRaises(pytz.AmbiguousTimeError, f) + pytest.raises(pytz.AmbiguousTimeError, f) result = s.dt.tz_localize('US/Central', ambiguous=True) assert_series_equal(result, expected0) @@ -584,10 +624,10 @@ def test_nonexistent_raise_coerce(self): times = ['2015-03-08 01:00', '2015-03-08 02:00', '2015-03-08 03:00'] index = DatetimeIndex(times) tz = 'US/Eastern' - self.assertRaises(NonExistentTimeError, - index.tz_localize, tz=tz) - self.assertRaises(NonExistentTimeError, - index.tz_localize, tz=tz, errors='raise') + pytest.raises(NonExistentTimeError, + index.tz_localize, tz=tz) + pytest.raises(NonExistentTimeError, + index.tz_localize, tz=tz, errors='raise') result = index.tz_localize(tz=tz, errors='coerce') test_times = ['2015-03-08 01:00-05:00', 'NaT', '2015-03-08 03:00-04:00'] @@ -617,23 +657,22 @@ def test_infer_tz(self): assert (tools._infer_tzinfo(start, end) is utc) end = self.localize(eastern, _end) - self.assertRaises(Exception, tools._infer_tzinfo, start, end) - self.assertRaises(Exception, tools._infer_tzinfo, end, start) + pytest.raises(Exception, tools._infer_tzinfo, start, end) + pytest.raises(Exception, tools._infer_tzinfo, end, start) def test_tz_string(self): result = date_range('1/1/2000', periods=10, tz=self.tzstr('US/Eastern')) expected = date_range('1/1/2000', periods=10, tz=self.tz('US/Eastern')) - 
self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) def test_take_dont_lose_meta(self): - tm._skip_if_no_pytz() rng = date_range('1/1/2000', periods=20, tz=self.tzstr('US/Eastern')) result = rng.take(lrange(5)) - self.assertEqual(result.tz, rng.tz) - self.assertEqual(result.freq, rng.freq) + assert result.tz == rng.tz + assert result.freq == rng.freq def test_index_with_timezone_repr(self): rng = date_range('4/13/2010', '5/6/2010') @@ -641,7 +680,7 @@ def test_index_with_timezone_repr(self): rng_eastern = rng.tz_localize(self.tzstr('US/Eastern')) rng_repr = repr(rng_eastern) - self.assertIn('2010-04-13 00:00:00', rng_repr) + assert '2010-04-13 00:00:00' in rng_repr def test_index_astype_asobject_tzinfos(self): # #1345 @@ -652,14 +691,14 @@ def test_index_astype_asobject_tzinfos(self): objs = rng.asobject for i, x in enumerate(objs): exval = rng[i] - self.assertEqual(x, exval) - self.assertEqual(x.tzinfo, exval.tzinfo) + assert x == exval + assert x.tzinfo == exval.tzinfo objs = rng.astype(object) for i, x in enumerate(objs): exval = rng[i] - self.assertEqual(x, exval) - self.assertEqual(x.tzinfo, exval.tzinfo) + assert x == exval + assert x.tzinfo == exval.tzinfo def test_localized_at_time_between_time(self): from datetime import time @@ -673,37 +712,37 @@ def test_localized_at_time_between_time(self): expected = ts.at_time(time(10, 0)).tz_localize(self.tzstr( 'US/Eastern')) assert_series_equal(result, expected) - self.assertTrue(self.cmptz(result.index.tz, self.tz('US/Eastern'))) + assert self.cmptz(result.index.tz, self.tz('US/Eastern')) t1, t2 = time(10, 0), time(11, 0) result = ts_local.between_time(t1, t2) expected = ts.between_time(t1, t2).tz_localize(self.tzstr('US/Eastern')) assert_series_equal(result, expected) - self.assertTrue(self.cmptz(result.index.tz, self.tz('US/Eastern'))) + assert self.cmptz(result.index.tz, self.tz('US/Eastern')) def test_string_index_alias_tz_aware(self): rng = date_range('1/1/2000', periods=10, tz=self.tzstr('US/Eastern')) ts = Series(np.random.randn(len(rng)), index=rng) result = ts['1/3/2000'] - self.assertAlmostEqual(result, ts[2]) + tm.assert_almost_equal(result, ts[2]) def test_fixed_offset(self): dates = [datetime(2000, 1, 1, tzinfo=fixed_off), datetime(2000, 1, 2, tzinfo=fixed_off), datetime(2000, 1, 3, tzinfo=fixed_off)] result = to_datetime(dates) - self.assertEqual(result.tz, fixed_off) + assert result.tz == fixed_off def test_fixedtz_topydatetime(self): dates = np.array([datetime(2000, 1, 1, tzinfo=fixed_off), datetime(2000, 1, 2, tzinfo=fixed_off), datetime(2000, 1, 3, tzinfo=fixed_off)]) result = to_datetime(dates).to_pydatetime() - self.assert_numpy_array_equal(dates, result) + tm.assert_numpy_array_equal(dates, result) result = to_datetime(dates)._mpl_repr() - self.assert_numpy_array_equal(dates, result) + tm.assert_numpy_array_equal(dates, result) def test_convert_tz_aware_datetime_datetime(self): # #1581 @@ -715,35 +754,32 @@ def test_convert_tz_aware_datetime_datetime(self): dates_aware = [self.localize(tz, x) for x in dates] result = to_datetime(dates_aware) - self.assertTrue(self.cmptz(result.tz, self.tz('US/Eastern'))) + assert self.cmptz(result.tz, self.tz('US/Eastern')) converted = to_datetime(dates_aware, utc=True) ex_vals = np.array([Timestamp(x).value for x in dates_aware]) - self.assert_numpy_array_equal(converted.asi8, ex_vals) - self.assertIs(converted.tz, pytz.utc) + tm.assert_numpy_array_equal(converted.asi8, ex_vals) + assert converted.tz is pytz.utc def test_to_datetime_utc(self): - from 
dateutil.parser import parse arr = np.array([parse('2012-06-13T01:39:00Z')], dtype=object) result = to_datetime(arr, utc=True) - self.assertIs(result.tz, pytz.utc) + assert result.tz is pytz.utc def test_to_datetime_tzlocal(self): - from dateutil.parser import parse - from dateutil.tz import tzlocal dt = parse('2012-06-13T01:39:00Z') dt = dt.replace(tzinfo=tzlocal()) arr = np.array([dt], dtype=object) result = to_datetime(arr, utc=True) - self.assertIs(result.tz, pytz.utc) + assert result.tz is pytz.utc rng = date_range('2012-11-03 03:00', '2012-11-05 03:00', tz=tzlocal()) arr = rng.to_pydatetime() result = to_datetime(arr, utc=True) - self.assertIs(result.tz, pytz.utc) + assert result.tz is pytz.utc def test_frame_no_datetime64_dtype(self): @@ -754,7 +790,7 @@ def test_frame_no_datetime64_dtype(self): dr_tz = dr.tz_localize(self.tzstr('US/Eastern')) e = DataFrame({'A': 'foo', 'B': dr_tz}, index=dr) tz_expected = DatetimeTZDtype('ns', dr_tz.tzinfo) - self.assertEqual(e['B'].dtype, tz_expected) + assert e['B'].dtype == tz_expected # GH 2810 (with timezones) datetimes_naive = [ts.to_pydatetime() for ts in dr] @@ -788,7 +824,7 @@ def test_shift_localized(self): dr_tz = dr.tz_localize(self.tzstr('US/Eastern')) result = dr_tz.shift(1, '10T') - self.assertEqual(result.tz, dr_tz.tz) + assert result.tz == dr_tz.tz def test_tz_aware_asfreq(self): dr = date_range('2011-12-01', '2012-07-20', freq='D', @@ -809,7 +845,7 @@ def test_tzaware_datetime_to_index(self): d = [datetime(2012, 8, 19, tzinfo=self.tz('US/Eastern'))] index = DatetimeIndex(d) - self.assertTrue(self.cmptz(index.tz, self.tz('US/Eastern'))) + assert self.cmptz(index.tz, self.tz('US/Eastern')) def test_date_range_span_dst_transition(self): # #1778 @@ -818,18 +854,18 @@ def test_date_range_span_dst_transition(self): dr = date_range('03/06/2012 00:00', periods=200, freq='W-FRI', tz='US/Eastern') - self.assertTrue((dr.hour == 0).all()) + assert (dr.hour == 0).all() dr = date_range('2012-11-02', periods=10, tz=self.tzstr('US/Eastern')) - self.assertTrue((dr.hour == 0).all()) + assert (dr.hour == 0).all() def test_convert_datetime_list(self): dr = date_range('2012-06-02', periods=10, tz=self.tzstr('US/Eastern'), name='foo') dr2 = DatetimeIndex(list(dr), name='foo') - self.assert_index_equal(dr, dr2) - self.assertEqual(dr.tz, dr2.tz) - self.assertEqual(dr2.name, 'foo') + tm.assert_index_equal(dr, dr2) + assert dr.tz == dr2.tz + assert dr2.name == 'foo' def test_frame_from_records_utc(self): rec = {'datum': 1.5, @@ -844,17 +880,16 @@ def test_frame_reset_index(self): roundtripped = df.reset_index().set_index('index') xp = df.index.tz rs = roundtripped.index.tz - self.assertEqual(xp, rs) + assert xp == rs def test_dateutil_tzoffset_support(self): - from dateutil.tz import tzoffset values = [188.5, 328.25] tzinfo = tzoffset(None, 7200) index = [datetime(2012, 5, 11, 11, tzinfo=tzinfo), datetime(2012, 5, 11, 12, tzinfo=tzinfo)] series = Series(data=values, index=index) - self.assertEqual(series.index.tz, tzinfo) + assert series.index.tz == tzinfo # it works! 
#2443 repr(series.index[0]) @@ -867,14 +902,14 @@ def test_getitem_pydatetime_tz(self): tz=self.tzstr('Europe/Berlin')) time_datetime = self.localize( self.tz('Europe/Berlin'), datetime(2012, 12, 24, 17, 0)) - self.assertEqual(ts[time_pandas], ts[time_datetime]) + assert ts[time_pandas] == ts[time_datetime] def test_index_drop_dont_lose_tz(self): # #2621 ind = date_range("2012-12-01", periods=10, tz="utc") ind = ind.drop(ind[-1]) - self.assertTrue(ind.tz is not None) + assert ind.tz is not None def test_datetimeindex_tz(self): """ Test different DatetimeIndex constructions with timezone @@ -890,21 +925,17 @@ def test_datetimeindex_tz(self): idx4 = DatetimeIndex(np.array(arr), tz=self.tzstr('US/Eastern')) for other in [idx2, idx3, idx4]: - self.assert_index_equal(idx1, other) + tm.assert_index_equal(idx1, other) def test_datetimeindex_tz_nat(self): idx = to_datetime([Timestamp("2013-1-1", tz=self.tzstr('US/Eastern')), NaT]) - self.assertTrue(isnull(idx[1])) - self.assertTrue(idx[0].tzinfo is not None) + assert isna(idx[1]) + assert idx[0].tzinfo is not None class TestTimeZoneSupportDateutil(TestTimeZoneSupportPytz): - _multiprocess_can_split_ = True - - def setUp(self): - tm._skip_if_no_dateutil() def tz(self, tz): """ @@ -931,17 +962,17 @@ def test_utc_with_system_utc(self): # Skipped on win32 due to dateutil bug tm._skip_if_windows() - from pandas.tslib import maybe_get_tz + from pandas._libs.tslib import maybe_get_tz # from system utc to real utc ts = Timestamp('2001-01-05 11:56', tz=maybe_get_tz('dateutil/UTC')) # check that the time hasn't changed. - self.assertEqual(ts, ts.tz_convert(dateutil.tz.tzutc())) + assert ts == ts.tz_convert(dateutil.tz.tzutc()) # from system utc to real utc ts = Timestamp('2001-01-05 11:56', tz=maybe_get_tz('dateutil/UTC')) # check that the time hasn't changed. 
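The `TestTimeZoneSupportDateutil` subclass above reruns the entire pytz suite against dateutil zones simply by overriding `tz()`/`tzstr()`. Outside the test harness, the same backend switch is a `'dateutil/'`-prefixed timezone string; a small sketch:

```python
import pandas as pd

# pandas resolves 'dateutil/...'-prefixed strings to dateutil zones
# instead of pytz ones; both backends denote the same UTC instant.
ts_du = pd.Timestamp('2012-03-11 04:00', tz='dateutil/US/Eastern')
ts_py = pd.Timestamp('2012-03-11 04:00', tz='US/Eastern')

assert ts_du == ts_py                    # same instant, different backend
assert ts_du == ts_du.tz_convert('UTC')  # conversion preserves the instant
```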
- self.assertEqual(ts, ts.tz_convert(dateutil.tz.tzutc())) + assert ts == ts.tz_convert(dateutil.tz.tzutc()) def test_tz_convert_hour_overflow_dst(self): # Regression test for: @@ -953,8 +984,8 @@ def test_tz_convert_hour_overflow_dst(self): '2009-05-12 09:50:32'] tt = to_datetime(ts).tz_localize('US/Eastern') ut = tt.tz_convert('UTC') - expected = np.array([13, 14, 13], dtype=np.int32) - self.assert_numpy_array_equal(ut.hour, expected) + expected = Index([13, 14, 13]) + tm.assert_index_equal(ut.hour, expected) # sorted case UTC -> US/Eastern ts = ['2008-05-12 13:50:00', @@ -962,8 +993,8 @@ def test_tz_convert_hour_overflow_dst(self): '2009-05-12 13:50:32'] tt = to_datetime(ts).tz_localize('UTC') ut = tt.tz_convert('US/Eastern') - expected = np.array([9, 9, 9], dtype=np.int32) - self.assert_numpy_array_equal(ut.hour, expected) + expected = Index([9, 9, 9]) + tm.assert_index_equal(ut.hour, expected) # unsorted case US/Eastern -> UTC ts = ['2008-05-12 09:50:00', @@ -971,8 +1002,8 @@ def test_tz_convert_hour_overflow_dst(self): '2008-05-12 09:50:32'] tt = to_datetime(ts).tz_localize('US/Eastern') ut = tt.tz_convert('UTC') - expected = np.array([13, 14, 13], dtype=np.int32) - self.assert_numpy_array_equal(ut.hour, expected) + expected = Index([13, 14, 13]) + tm.assert_index_equal(ut.hour, expected) # unsorted case UTC -> US/Eastern ts = ['2008-05-12 13:50:00', @@ -980,8 +1011,8 @@ def test_tz_convert_hour_overflow_dst(self): '2008-05-12 13:50:32'] tt = to_datetime(ts).tz_localize('UTC') ut = tt.tz_convert('US/Eastern') - expected = np.array([9, 9, 9], dtype=np.int32) - self.assert_numpy_array_equal(ut.hour, expected) + expected = Index([9, 9, 9]) + tm.assert_index_equal(ut.hour, expected) def test_tz_convert_hour_overflow_dst_timestamps(self): # Regression test for: @@ -995,8 +1026,8 @@ def test_tz_convert_hour_overflow_dst_timestamps(self): Timestamp('2009-05-12 09:50:32', tz=tz)] tt = to_datetime(ts) ut = tt.tz_convert('UTC') - expected = np.array([13, 14, 13], dtype=np.int32) - self.assert_numpy_array_equal(ut.hour, expected) + expected = Index([13, 14, 13]) + tm.assert_index_equal(ut.hour, expected) # sorted case UTC -> US/Eastern ts = [Timestamp('2008-05-12 13:50:00', tz='UTC'), @@ -1004,8 +1035,8 @@ def test_tz_convert_hour_overflow_dst_timestamps(self): Timestamp('2009-05-12 13:50:32', tz='UTC')] tt = to_datetime(ts) ut = tt.tz_convert('US/Eastern') - expected = np.array([9, 9, 9], dtype=np.int32) - self.assert_numpy_array_equal(ut.hour, expected) + expected = Index([9, 9, 9]) + tm.assert_index_equal(ut.hour, expected) # unsorted case US/Eastern -> UTC ts = [Timestamp('2008-05-12 09:50:00', tz=tz), @@ -1013,8 +1044,8 @@ def test_tz_convert_hour_overflow_dst_timestamps(self): Timestamp('2008-05-12 09:50:32', tz=tz)] tt = to_datetime(ts) ut = tt.tz_convert('UTC') - expected = np.array([13, 14, 13], dtype=np.int32) - self.assert_numpy_array_equal(ut.hour, expected) + expected = Index([13, 14, 13]) + tm.assert_index_equal(ut.hour, expected) # unsorted case UTC -> US/Eastern ts = [Timestamp('2008-05-12 13:50:00', tz='UTC'), @@ -1022,8 +1053,8 @@ def test_tz_convert_hour_overflow_dst_timestamps(self): Timestamp('2008-05-12 13:50:32', tz='UTC')] tt = to_datetime(ts) ut = tt.tz_convert('US/Eastern') - expected = np.array([9, 9, 9], dtype=np.int32) - self.assert_numpy_array_equal(ut.hour, expected) + expected = Index([9, 9, 9]) + tm.assert_index_equal(ut.hour, expected) def test_tslib_tz_convert_trans_pos_plus_1__bug(self): # Regression test for tslib.tz_convert(vals, tz1, tz2). 
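The substantive change in the hour-overflow hunks above is not the assertion style but the return type: datetime field accessors such as `.hour` now come back as an int64 `Index`, so the expectations move from `np.array(..., dtype=np.int32)` to `Index([...])` checked with `tm.assert_index_equal`. A sketch (the int64 dtype assumes the pandas version this diff targets):

```python
import numpy as np
import pandas as pd
import pandas.util.testing as tm

# 09:50 local is EDT (UTC-4) in May and EST (UTC-5) in December, so the
# UTC hours are 13 and 14; the .hour accessor returns an Index, not an
# int32 ndarray.
tt = pd.to_datetime(['2008-05-12 09:50:00',
                     '2008-12-12 09:50:35']).tz_localize('US/Eastern')
ut = tt.tz_convert('UTC')

tm.assert_index_equal(ut.hour, pd.Index([13, 14]))
assert ut.hour.dtype == np.int64
```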
@@ -1034,9 +1065,8 @@ def test_tslib_tz_convert_trans_pos_plus_1__bug(self): idx = idx.tz_localize('UTC') idx = idx.tz_convert('Europe/Moscow') - expected = np.repeat(np.array([3, 4, 5], dtype=np.int32), - np.array([n, n, 1])) - self.assert_numpy_array_equal(idx.hour, expected) + expected = np.repeat(np.array([3, 4, 5]), np.array([n, n, 1])) + tm.assert_index_equal(idx.hour, Index(expected)) def test_tslib_tz_convert_dst(self): for freq, n in [('H', 1), ('T', 60), ('S', 3600)]: @@ -1045,76 +1075,71 @@ def test_tslib_tz_convert_dst(self): tz='UTC') idx = idx.tz_convert('US/Eastern') expected = np.repeat(np.array([18, 19, 20, 21, 22, 23, - 0, 1, 3, 4, 5], dtype=np.int32), + 0, 1, 3, 4, 5]), np.array([n, n, n, n, n, n, n, n, n, n, 1])) - self.assert_numpy_array_equal(idx.hour, expected) + tm.assert_index_equal(idx.hour, Index(expected)) idx = date_range('2014-03-08 18:00', '2014-03-09 05:00', freq=freq, tz='US/Eastern') idx = idx.tz_convert('UTC') - expected = np.repeat(np.array([23, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], - dtype=np.int32), + expected = np.repeat(np.array([23, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), np.array([n, n, n, n, n, n, n, n, n, n, 1])) - self.assert_numpy_array_equal(idx.hour, expected) + tm.assert_index_equal(idx.hour, Index(expected)) # End DST idx = date_range('2014-11-01 23:00', '2014-11-02 09:00', freq=freq, tz='UTC') idx = idx.tz_convert('US/Eastern') expected = np.repeat(np.array([19, 20, 21, 22, 23, - 0, 1, 1, 2, 3, 4], dtype=np.int32), + 0, 1, 1, 2, 3, 4]), np.array([n, n, n, n, n, n, n, n, n, n, 1])) - self.assert_numpy_array_equal(idx.hour, expected) + tm.assert_index_equal(idx.hour, Index(expected)) idx = date_range('2014-11-01 18:00', '2014-11-02 05:00', freq=freq, tz='US/Eastern') idx = idx.tz_convert('UTC') expected = np.repeat(np.array([22, 23, 0, 1, 2, 3, 4, 5, 6, - 7, 8, 9, 10], dtype=np.int32), + 7, 8, 9, 10]), np.array([n, n, n, n, n, n, n, n, n, n, n, n, 1])) - self.assert_numpy_array_equal(idx.hour, expected) + tm.assert_index_equal(idx.hour, Index(expected)) # daily # Start DST idx = date_range('2014-03-08 00:00', '2014-03-09 00:00', freq='D', tz='UTC') idx = idx.tz_convert('US/Eastern') - self.assert_numpy_array_equal(idx.hour, - np.array([19, 19], dtype=np.int32)) + tm.assert_index_equal(idx.hour, Index([19, 19])) idx = date_range('2014-03-08 00:00', '2014-03-09 00:00', freq='D', tz='US/Eastern') idx = idx.tz_convert('UTC') - self.assert_numpy_array_equal(idx.hour, - np.array([5, 5], dtype=np.int32)) + tm.assert_index_equal(idx.hour, Index([5, 5])) # End DST idx = date_range('2014-11-01 00:00', '2014-11-02 00:00', freq='D', tz='UTC') idx = idx.tz_convert('US/Eastern') - self.assert_numpy_array_equal(idx.hour, - np.array([20, 20], dtype=np.int32)) + tm.assert_index_equal(idx.hour, Index([20, 20])) idx = date_range('2014-11-01 00:00', '2014-11-02 000:00', freq='D', tz='US/Eastern') idx = idx.tz_convert('UTC') - self.assert_numpy_array_equal(idx.hour, - np.array([4, 4], dtype=np.int32)) + tm.assert_index_equal(idx.hour, Index([4, 4])) def test_tzlocal(self): # GH 13583 ts = Timestamp('2011-01-01', tz=dateutil.tz.tzlocal()) - self.assertEqual(ts.tz, dateutil.tz.tzlocal()) - self.assertTrue("tz='tzlocal()')" in repr(ts)) + assert ts.tz == dateutil.tz.tzlocal() + assert "tz='tzlocal()')" in repr(ts) tz = tslib.maybe_get_tz('tzlocal()') - self.assertEqual(tz, dateutil.tz.tzlocal()) + assert tz == dateutil.tz.tzlocal() # get offset using normal datetime for test offset = dateutil.tz.tzlocal().utcoffset(datetime(2011, 1, 1)) offset = offset.total_seconds() * 
1000000000 - self.assertEqual(ts.value + offset, Timestamp('2011-01-01').value) + assert ts.value + offset == Timestamp('2011-01-01').value def test_tz_localize_tzlocal(self): # GH 13583 @@ -1143,7 +1168,8 @@ def test_tz_convert_tzlocal(self): tm.assert_numpy_array_equal(dti2.asi8, dti.asi8) -class TestTimeZoneCacheKey(tm.TestCase): +class TestTimeZoneCacheKey(object): + def test_cache_keys_are_distinct_for_pytz_vs_dateutil(self): tzs = pytz.common_timezones for tz_name in tzs: @@ -1155,17 +1181,12 @@ def test_cache_keys_are_distinct_for_pytz_vs_dateutil(self): if tz_d is None: # skip timezones that dateutil doesn't know about. continue - self.assertNotEqual(tslib._p_tz_cache_key( - tz_p), tslib._p_tz_cache_key(tz_d)) + assert tslib._p_tz_cache_key(tz_p) != tslib._p_tz_cache_key(tz_d) -class TestTimeZones(tm.TestCase): - _multiprocess_can_split_ = True +class TestTimeZones(object): timezones = ['UTC', 'Asia/Tokyo', 'US/Eastern', 'dateutil/US/Pacific'] - def setUp(self): - tm._skip_if_no_pytz() - def test_replace(self): # GH 14621 # GH 7825 @@ -1173,45 +1194,43 @@ def test_replace(self): dt = Timestamp('2016-01-01 09:00:00') result = dt.replace(hour=0) expected = Timestamp('2016-01-01 00:00:00') - self.assertEqual(result, expected) + assert result == expected for tz in self.timezones: dt = Timestamp('2016-01-01 09:00:00', tz=tz) result = dt.replace(hour=0) expected = Timestamp('2016-01-01 00:00:00', tz=tz) - self.assertEqual(result, expected) + assert result == expected # we preserve nanoseconds dt = Timestamp('2016-01-01 09:00:00.000000123', tz=tz) result = dt.replace(hour=0) expected = Timestamp('2016-01-01 00:00:00.000000123', tz=tz) - self.assertEqual(result, expected) + assert result == expected # test all dt = Timestamp('2016-01-01 09:00:00.000000123', tz=tz) result = dt.replace(year=2015, month=2, day=2, hour=0, minute=5, second=5, microsecond=5, nanosecond=5) expected = Timestamp('2015-02-02 00:05:05.000005005', tz=tz) - self.assertEqual(result, expected) + assert result == expected # error def f(): dt.replace(foo=5) - self.assertRaises(TypeError, f) + pytest.raises(TypeError, f) def f(): dt.replace(hour=0.1) - self.assertRaises(ValueError, f) + pytest.raises(ValueError, f) # assert conversion to naive is the same as replacing tzinfo with None dt = Timestamp('2013-11-03 01:59:59.999999-0400', tz='US/Eastern') - self.assertEqual(dt.tz_localize(None), dt.replace(tzinfo=None)) + assert dt.tz_localize(None) == dt.replace(tzinfo=None) def test_ambiguous_compat(self): # validate that pytz and dateutil are compat for dst # when the transition happens - tm._skip_if_no_dateutil() - tm._skip_if_no_pytz() pytz_zone = 'Europe/London' dateutil_zone = 'dateutil/Europe/London' @@ -1219,37 +1238,42 @@ def test_ambiguous_compat(self): .tz_localize(pytz_zone, ambiguous=0)) result_dateutil = (Timestamp('2013-10-27 01:00:00') .tz_localize(dateutil_zone, ambiguous=0)) - self.assertEqual(result_pytz.value, result_dateutil.value) - self.assertEqual(result_pytz.value, 1382835600000000000) - - # dateutil 2.6 buggy w.r.t. ambiguous=0 - if dateutil.__version__ != LooseVersion('2.6.0'): - # GH 14621 - # https://github.com/dateutil/dateutil/issues/321 - self.assertEqual(result_pytz.to_pydatetime().tzname(), - result_dateutil.to_pydatetime().tzname()) - self.assertEqual(str(result_pytz), str(result_dateutil)) + assert result_pytz.value == result_dateutil.value + assert result_pytz.value == 1382835600000000000 + + if dateutil.__version__ < LooseVersion('2.6.0'): + # dateutil 2.6 buggy w.r.t. 
ambiguous=0 + # see gh-14621 + # see https://github.com/dateutil/dateutil/issues/321 + assert (result_pytz.to_pydatetime().tzname() == + result_dateutil.to_pydatetime().tzname()) + assert str(result_pytz) == str(result_dateutil) + elif dateutil.__version__ > LooseVersion('2.6.0'): + # fixed ambiguous behavior + assert result_pytz.to_pydatetime().tzname() == 'GMT' + assert result_dateutil.to_pydatetime().tzname() == 'BST' + assert str(result_pytz) != str(result_dateutil) # 1 hour difference result_pytz = (Timestamp('2013-10-27 01:00:00') .tz_localize(pytz_zone, ambiguous=1)) result_dateutil = (Timestamp('2013-10-27 01:00:00') .tz_localize(dateutil_zone, ambiguous=1)) - self.assertEqual(result_pytz.value, result_dateutil.value) - self.assertEqual(result_pytz.value, 1382832000000000000) + assert result_pytz.value == result_dateutil.value + assert result_pytz.value == 1382832000000000000 # dateutil < 2.6 is buggy w.r.t. ambiguous timezones if dateutil.__version__ > LooseVersion('2.5.3'): - # GH 14621 - self.assertEqual(str(result_pytz), str(result_dateutil)) - self.assertEqual(result_pytz.to_pydatetime().tzname(), - result_dateutil.to_pydatetime().tzname()) + # see gh-14621 + assert str(result_pytz) == str(result_dateutil) + assert (result_pytz.to_pydatetime().tzname() == + result_dateutil.to_pydatetime().tzname()) def test_index_equals_with_tz(self): left = date_range('1/1/2011', periods=100, freq='H', tz='utc') right = date_range('1/1/2011', periods=100, freq='H', tz='US/Eastern') - self.assertFalse(left.equals(right)) + assert not left.equals(right) def test_tz_localize_naive(self): rng = date_range('1/1/2011', periods=100, freq='H') @@ -1257,7 +1281,7 @@ def test_tz_localize_naive(self): conv = rng.tz_localize('US/Pacific') exp = date_range('1/1/2011', periods=100, freq='H', tz='US/Pacific') - self.assert_index_equal(conv, exp) + tm.assert_index_equal(conv, exp) def test_tz_localize_roundtrip(self): for tz in self.timezones: @@ -1271,12 +1295,12 @@ def test_tz_localize_roundtrip(self): tz=tz) tm.assert_index_equal(localized, expected) - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): localized.tz_localize(tz) reset = localized.tz_localize(None) tm.assert_index_equal(reset, idx) - self.assertTrue(reset.tzinfo is None) + assert reset.tzinfo is None def test_series_frame_tz_localize(self): @@ -1284,48 +1308,48 @@ def test_series_frame_tz_localize(self): ts = Series(1, index=rng) result = ts.tz_localize('utc') - self.assertEqual(result.index.tz.zone, 'UTC') + assert result.index.tz.zone == 'UTC' df = DataFrame({'a': 1}, index=rng) result = df.tz_localize('utc') expected = DataFrame({'a': 1}, rng.tz_localize('UTC')) - self.assertEqual(result.index.tz.zone, 'UTC') + assert result.index.tz.zone == 'UTC' assert_frame_equal(result, expected) df = df.T result = df.tz_localize('utc', axis=1) - self.assertEqual(result.columns.tz.zone, 'UTC') + assert result.columns.tz.zone == 'UTC' assert_frame_equal(result, expected.T) # Can't localize if already tz-aware rng = date_range('1/1/2011', periods=100, freq='H', tz='utc') ts = Series(1, index=rng) - tm.assertRaisesRegexp(TypeError, 'Already tz-aware', ts.tz_localize, - 'US/Eastern') + tm.assert_raises_regex(TypeError, 'Already tz-aware', + ts.tz_localize, 'US/Eastern') def test_series_frame_tz_convert(self): rng = date_range('1/1/2011', periods=200, freq='D', tz='US/Eastern') ts = Series(1, index=rng) result = ts.tz_convert('Europe/Berlin') - self.assertEqual(result.index.tz.zone, 'Europe/Berlin') + assert result.index.tz.zone == 
'Europe/Berlin' df = DataFrame({'a': 1}, index=rng) result = df.tz_convert('Europe/Berlin') expected = DataFrame({'a': 1}, rng.tz_convert('Europe/Berlin')) - self.assertEqual(result.index.tz.zone, 'Europe/Berlin') + assert result.index.tz.zone == 'Europe/Berlin' assert_frame_equal(result, expected) df = df.T result = df.tz_convert('Europe/Berlin', axis=1) - self.assertEqual(result.columns.tz.zone, 'Europe/Berlin') + assert result.columns.tz.zone == 'Europe/Berlin' assert_frame_equal(result, expected.T) # can't convert tz-naive rng = date_range('1/1/2011', periods=200, freq='D') ts = Series(1, index=rng) - tm.assertRaisesRegexp(TypeError, "Cannot convert tz-naive", - ts.tz_convert, 'US/Eastern') + tm.assert_raises_regex(TypeError, "Cannot convert tz-naive", + ts.tz_convert, 'US/Eastern') def test_tz_convert_roundtrip(self): for tz in self.timezones: @@ -1350,7 +1374,7 @@ def test_tz_convert_roundtrip(self): converted = idx.tz_convert(tz) reset = converted.tz_convert(None) tm.assert_index_equal(reset, expected) - self.assertTrue(reset.tzinfo is None) + assert reset.tzinfo is None tm.assert_index_equal(reset, converted.tz_convert( 'UTC').tz_localize(None)) @@ -1362,12 +1386,12 @@ def test_join_utc_convert(self): for how in ['inner', 'outer', 'left', 'right']: result = left.join(left[:-5], how=how) - tm.assertIsInstance(result, DatetimeIndex) - self.assertEqual(result.tz, left.tz) + assert isinstance(result, DatetimeIndex) + assert result.tz == left.tz result = left.join(right[:-5], how=how) - tm.assertIsInstance(result, DatetimeIndex) - self.assertEqual(result.tz.zone, 'UTC') + assert isinstance(result, DatetimeIndex) + assert result.tz.zone == 'UTC' def test_join_aware(self): rng = date_range('1/1/2011', periods=10, freq='H') @@ -1375,8 +1399,8 @@ def test_join_aware(self): ts_utc = ts.tz_localize('utc') - self.assertRaises(Exception, ts.__add__, ts_utc) - self.assertRaises(Exception, ts_utc.__add__, ts) + pytest.raises(Exception, ts.__add__, ts_utc) + pytest.raises(Exception, ts_utc.__add__, ts) test1 = DataFrame(np.zeros((6, 3)), index=date_range("2012-11-15 00:00:00", periods=6, @@ -1389,8 +1413,8 @@ def test_join_aware(self): result = test1.join(test2, how='outer') ex_index = test1.index.union(test2.index) - self.assert_index_equal(result.index, ex_index) - self.assertTrue(result.index.tz.zone == 'US/Central') + tm.assert_index_equal(result.index, ex_index) + assert result.index.tz.zone == 'US/Central' # non-overlapping rng = date_range("2012-11-15 00:00:00", periods=6, freq="H", @@ -1400,7 +1424,7 @@ def test_join_aware(self): tz="US/Eastern") result = rng.union(rng2) - self.assertTrue(result.tz.zone == 'UTC') + assert result.tz.zone == 'UTC' def test_align_aware(self): idx1 = date_range('2001', periods=5, freq='H', tz='US/Eastern') @@ -1408,30 +1432,30 @@ def test_align_aware(self): df1 = DataFrame(np.random.randn(len(idx1), 3), idx1) df2 = DataFrame(np.random.randn(len(idx2), 3), idx2) new1, new2 = df1.align(df2) - self.assertEqual(df1.index.tz, new1.index.tz) - self.assertEqual(df2.index.tz, new2.index.tz) + assert df1.index.tz == new1.index.tz + assert df2.index.tz == new2.index.tz # # different timezones convert to UTC # frame df1_central = df1.tz_convert('US/Central') new1, new2 = df1.align(df1_central) - self.assertEqual(new1.index.tz, pytz.UTC) - self.assertEqual(new2.index.tz, pytz.UTC) + assert new1.index.tz == pytz.UTC + assert new2.index.tz == pytz.UTC # series new1, new2 = df1[0].align(df1_central[0]) - self.assertEqual(new1.index.tz, pytz.UTC) - 
self.assertEqual(new2.index.tz, pytz.UTC) + assert new1.index.tz == pytz.UTC + assert new2.index.tz == pytz.UTC # combination new1, new2 = df1.align(df1_central[0], axis=0) - self.assertEqual(new1.index.tz, pytz.UTC) - self.assertEqual(new2.index.tz, pytz.UTC) + assert new1.index.tz == pytz.UTC + assert new2.index.tz == pytz.UTC df1[0].align(df1_central, axis=0) - self.assertEqual(new1.index.tz, pytz.UTC) - self.assertEqual(new2.index.tz, pytz.UTC) + assert new1.index.tz == pytz.UTC + assert new2.index.tz == pytz.UTC def test_append_aware(self): rng1 = date_range('1/1/2011 01:00', periods=1, freq='H', @@ -1446,7 +1470,7 @@ def test_append_aware(self): tz='US/Eastern') exp = Series([1, 2], index=exp_index) assert_series_equal(ts_result, exp) - self.assertEqual(ts_result.index.tz, rng1.tz) + assert ts_result.index.tz == rng1.tz rng1 = date_range('1/1/2011 01:00', periods=1, freq='H', tz='UTC') rng2 = date_range('1/1/2011 02:00', periods=1, freq='H', tz='UTC') @@ -1459,7 +1483,7 @@ def test_append_aware(self): exp = Series([1, 2], index=exp_index) assert_series_equal(ts_result, exp) utc = rng1.tz - self.assertEqual(utc, ts_result.index.tz) + assert utc == ts_result.index.tz # GH 7795 # different tz coerces to object dtype, not UTC @@ -1490,7 +1514,7 @@ def test_append_dst(self): tz='US/Eastern') exp = Series([1, 2, 3, 10, 11, 12], index=exp_index) assert_series_equal(ts_result, exp) - self.assertEqual(ts_result.index.tz, rng1.tz) + assert ts_result.index.tz == rng1.tz def test_append_aware_naive(self): rng1 = date_range('1/1/2011 01:00', periods=1, freq='H') @@ -1500,8 +1524,8 @@ def test_append_aware_naive(self): ts2 = Series(np.random.randn(len(rng2)), index=rng2) ts_result = ts1.append(ts2) - self.assertTrue(ts_result.index.equals(ts1.index.asobject.append( - ts2.index.asobject))) + assert ts_result.index.equals(ts1.index.asobject.append( + ts2.index.asobject)) # mixed rng1 = date_range('1/1/2011 01:00', periods=1, freq='H') @@ -1509,8 +1533,8 @@ def test_append_aware_naive(self): ts1 = Series(np.random.randn(len(rng1)), index=rng1) ts2 = Series(np.random.randn(len(rng2)), index=rng2) ts_result = ts1.append(ts2) - self.assertTrue(ts_result.index.equals(ts1.index.asobject.append( - ts2.index))) + assert ts_result.index.equals(ts1.index.asobject.append( + ts2.index)) def test_equal_join_ensure_utc(self): rng = date_range('1/1/2011', periods=10, freq='H', tz='US/Eastern') @@ -1519,18 +1543,18 @@ def test_equal_join_ensure_utc(self): ts_moscow = ts.tz_convert('Europe/Moscow') result = ts + ts_moscow - self.assertIs(result.index.tz, pytz.utc) + assert result.index.tz is pytz.utc result = ts_moscow + ts - self.assertIs(result.index.tz, pytz.utc) + assert result.index.tz is pytz.utc df = DataFrame({'a': ts}) df_moscow = df.tz_convert('Europe/Moscow') result = df + df_moscow - self.assertIs(result.index.tz, pytz.utc) + assert result.index.tz is pytz.utc result = df_moscow + df - self.assertIs(result.index.tz, pytz.utc) + assert result.index.tz is pytz.utc def test_arith_utc_convert(self): rng = date_range('1/1/2011', periods=100, freq='H', tz='utc') @@ -1549,7 +1573,7 @@ def test_arith_utc_convert(self): uts2 = ts2.tz_convert('utc') expected = uts1 + uts2 - self.assertEqual(result.index.tz, pytz.UTC) + assert result.index.tz == pytz.UTC assert_series_equal(result, expected) def test_intersection(self): @@ -1558,9 +1582,9 @@ def test_intersection(self): left = rng[10:90][::-1] right = rng[20:80][::-1] - self.assertEqual(left.tz, rng.tz) + assert left.tz == rng.tz result = left.intersection(right) - 
self.assertEqual(result.tz, left.tz) + assert result.tz == left.tz def test_timestamp_equality_different_timezones(self): utc_range = date_range('1/1/2000', periods=20, tz='UTC') @@ -1568,19 +1592,19 @@ def test_timestamp_equality_different_timezones(self): berlin_range = utc_range.tz_convert('Europe/Berlin') for a, b, c in zip(utc_range, eastern_range, berlin_range): - self.assertEqual(a, b) - self.assertEqual(b, c) - self.assertEqual(a, c) + assert a == b + assert b == c + assert a == c - self.assertTrue((utc_range == eastern_range).all()) - self.assertTrue((utc_range == berlin_range).all()) - self.assertTrue((berlin_range == eastern_range).all()) + assert (utc_range == eastern_range).all() + assert (utc_range == berlin_range).all() + assert (berlin_range == eastern_range).all() def test_datetimeindex_tz(self): rng = date_range('03/12/2012 00:00', periods=10, freq='W-FRI', tz='US/Eastern') rng2 = DatetimeIndex(data=rng, tz='US/Eastern') - self.assert_index_equal(rng, rng2) + tm.assert_index_equal(rng, rng2) def test_normalize_tz(self): rng = date_range('1/1/2000 9:30', periods=10, freq='D', @@ -1589,33 +1613,30 @@ def test_normalize_tz(self): result = rng.normalize() expected = date_range('1/1/2000', periods=10, freq='D', tz='US/Eastern') - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) - self.assertTrue(result.is_normalized) - self.assertFalse(rng.is_normalized) + assert result.is_normalized + assert not rng.is_normalized rng = date_range('1/1/2000 9:30', periods=10, freq='D', tz='UTC') result = rng.normalize() expected = date_range('1/1/2000', periods=10, freq='D', tz='UTC') - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) - self.assertTrue(result.is_normalized) - self.assertFalse(rng.is_normalized) + assert result.is_normalized + assert not rng.is_normalized - from dateutil.tz import tzlocal rng = date_range('1/1/2000 9:30', periods=10, freq='D', tz=tzlocal()) result = rng.normalize() expected = date_range('1/1/2000', periods=10, freq='D', tz=tzlocal()) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) - self.assertTrue(result.is_normalized) - self.assertFalse(rng.is_normalized) + assert result.is_normalized + assert not rng.is_normalized def test_normalize_tz_local(self): - # GH 13459 - from dateutil.tz import tzlocal - + # see gh-13459 timezones = ['US/Pacific', 'US/Eastern', 'UTC', 'Asia/Kolkata', 'Asia/Shanghai', 'Australia/Canberra'] @@ -1627,15 +1648,15 @@ def test_normalize_tz_local(self): result = rng.normalize() expected = date_range('1/1/2000', periods=10, freq='D', tz=tzlocal()) - self.assert_index_equal(result, expected) + tm.assert_index_equal(result, expected) - self.assertTrue(result.is_normalized) - self.assertFalse(rng.is_normalized) + assert result.is_normalized + assert not rng.is_normalized def test_tzaware_offset(self): dates = date_range('2012-11-01', periods=3, tz='US/Pacific') offset = dates + offsets.Hour(5) - self.assertEqual(dates[0] + offsets.Hour(5), offset[0]) + assert dates[0] + offsets.Hour(5) == offset[0] # GH 6818 for tz in ['UTC', 'US/Pacific', 'Asia/Tokyo']: @@ -1644,47 +1665,91 @@ def test_tzaware_offset(self): '2010-11-01 07:00'], freq='H', tz=tz) offset = dates + offsets.Hour(5) - self.assert_index_equal(offset, expected) + tm.assert_index_equal(offset, expected) offset = dates + np.timedelta64(5, 'h') - self.assert_index_equal(offset, expected) + tm.assert_index_equal(offset, expected) offset = dates + timedelta(hours=5) - 
-            self.assert_index_equal(offset, expected)
+            tm.assert_index_equal(offset, expected)

     def test_nat(self):
         # GH 5546
         dates = [NaT]
         idx = DatetimeIndex(dates)
         idx = idx.tz_localize('US/Pacific')
-        self.assert_index_equal(idx, DatetimeIndex(dates, tz='US/Pacific'))
+        tm.assert_index_equal(idx, DatetimeIndex(dates, tz='US/Pacific'))
         idx = idx.tz_convert('US/Eastern')
-        self.assert_index_equal(idx, DatetimeIndex(dates, tz='US/Eastern'))
+        tm.assert_index_equal(idx, DatetimeIndex(dates, tz='US/Eastern'))
         idx = idx.tz_convert('UTC')
-        self.assert_index_equal(idx, DatetimeIndex(dates, tz='UTC'))
+        tm.assert_index_equal(idx, DatetimeIndex(dates, tz='UTC'))

         dates = ['2010-12-01 00:00', '2010-12-02 00:00', NaT]
         idx = DatetimeIndex(dates)
         idx = idx.tz_localize('US/Pacific')
-        self.assert_index_equal(idx, DatetimeIndex(dates, tz='US/Pacific'))
+        tm.assert_index_equal(idx, DatetimeIndex(dates, tz='US/Pacific'))
         idx = idx.tz_convert('US/Eastern')
         expected = ['2010-12-01 03:00', '2010-12-02 03:00', NaT]
-        self.assert_index_equal(idx, DatetimeIndex(expected, tz='US/Eastern'))
+        tm.assert_index_equal(idx, DatetimeIndex(expected, tz='US/Eastern'))

         idx = idx + offsets.Hour(5)
         expected = ['2010-12-01 08:00', '2010-12-02 08:00', NaT]
-        self.assert_index_equal(idx, DatetimeIndex(expected, tz='US/Eastern'))
+        tm.assert_index_equal(idx, DatetimeIndex(expected, tz='US/Eastern'))
         idx = idx.tz_convert('US/Pacific')
         expected = ['2010-12-01 05:00', '2010-12-02 05:00', NaT]
-        self.assert_index_equal(idx, DatetimeIndex(expected, tz='US/Pacific'))
+        tm.assert_index_equal(idx, DatetimeIndex(expected, tz='US/Pacific'))

         idx = idx + np.timedelta64(3, 'h')
         expected = ['2010-12-01 08:00', '2010-12-02 08:00', NaT]
-        self.assert_index_equal(idx, DatetimeIndex(expected, tz='US/Pacific'))
+        tm.assert_index_equal(idx, DatetimeIndex(expected, tz='US/Pacific'))

         idx = idx.tz_convert('US/Eastern')
         expected = ['2010-12-01 11:00', '2010-12-02 11:00', NaT]
-        self.assert_index_equal(idx, DatetimeIndex(expected, tz='US/Eastern'))
-
-
-if __name__ == '__main__':
-    nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
-                   exit=False)
+        tm.assert_index_equal(idx, DatetimeIndex(expected, tz='US/Eastern'))
+
+
+class TestTslib(object):
+
+    def test_tslib_tz_convert(self):
+        def compare_utc_to_local(tz_didx, utc_didx):
+            f = lambda x: tslib.tz_convert_single(x, 'UTC', tz_didx.tz)
+            result = tslib.tz_convert(tz_didx.asi8, 'UTC', tz_didx.tz)
+            result_single = np.vectorize(f)(tz_didx.asi8)
+            tm.assert_numpy_array_equal(result, result_single)
+
+        def compare_local_to_utc(tz_didx, utc_didx):
+            f = lambda x: tslib.tz_convert_single(x, tz_didx.tz, 'UTC')
+            result = tslib.tz_convert(utc_didx.asi8, tz_didx.tz, 'UTC')
+            result_single = np.vectorize(f)(utc_didx.asi8)
+            tm.assert_numpy_array_equal(result, result_single)
+
+        for tz in ['UTC', 'Asia/Tokyo', 'US/Eastern', 'Europe/Moscow']:
+            # US: 2014-03-09 - 2014-11-11
+            # MOSCOW: 2014-10-26 / 2014-12-31
+            tz_didx = date_range('2014-03-01', '2015-01-10', freq='H', tz=tz)
+            utc_didx = date_range('2014-03-01', '2015-01-10', freq='H')
+            compare_utc_to_local(tz_didx, utc_didx)
+            # local tz to UTC can differ in hourly (or higher) freqs because
+            # of DST
+            compare_local_to_utc(tz_didx, utc_didx)
+
+            tz_didx = date_range('2000-01-01', '2020-01-01', freq='D', tz=tz)
+            utc_didx = date_range('2000-01-01', '2020-01-01', freq='D')
+            compare_utc_to_local(tz_didx, utc_didx)
+            compare_local_to_utc(tz_didx, utc_didx)
+
+            tz_didx = date_range('2000-01-01', '2100-01-01', freq='A', tz=tz)
+            utc_didx = date_range('2000-01-01', '2100-01-01', freq='A')
+            compare_utc_to_local(tz_didx, utc_didx)
+            compare_local_to_utc(tz_didx, utc_didx)
+
+        # Check empty array
+        result = tslib.tz_convert(np.array([], dtype=np.int64),
+                                  tslib.maybe_get_tz('US/Eastern'),
+                                  tslib.maybe_get_tz('Asia/Tokyo'))
+        tm.assert_numpy_array_equal(result, np.array([], dtype=np.int64))
+
+        # Check all-NaT array
+        result = tslib.tz_convert(np.array([tslib.iNaT], dtype=np.int64),
+                                  tslib.maybe_get_tz('US/Eastern'),
+                                  tslib.maybe_get_tz('Asia/Tokyo'))
+        tm.assert_numpy_array_equal(result, np.array(
+            [tslib.iNaT], dtype=np.int64))
diff --git a/pandas/tests/types/test_cast.py b/pandas/tests/types/test_cast.py
deleted file mode 100644
index 56a14a51105ca..0000000000000
--- a/pandas/tests/types/test_cast.py
+++ /dev/null
@@ -1,285 +0,0 @@
-# -*- coding: utf-8 -*-
-
-"""
-These test the private routines in types/cast.py
-
-"""
-
-
-import nose
-from datetime import datetime
-import numpy as np
-
-from pandas import Timedelta, Timestamp
-from pandas.types.cast import (_possibly_downcast_to_dtype,
-                               _possibly_convert_objects,
-                               _infer_dtype_from_scalar,
-                               _maybe_convert_string_to_object,
-                               _maybe_convert_scalar,
-                               _find_common_type)
-from pandas.types.dtypes import (CategoricalDtype,
-                                 DatetimeTZDtype, PeriodDtype)
-from pandas.util import testing as tm
-
-_multiprocess_can_split_ = True
-
-
-class TestPossiblyDowncast(tm.TestCase):
-
-    def test_downcast_conv(self):
-        # test downcasting
-
-        arr = np.array([8.5, 8.6, 8.7, 8.8, 8.9999999999995])
-        result = _possibly_downcast_to_dtype(arr, 'infer')
-        assert (np.array_equal(result, arr))
-
-        arr = np.array([8., 8., 8., 8., 8.9999999999995])
-        result = _possibly_downcast_to_dtype(arr, 'infer')
-        expected = np.array([8, 8, 8, 8, 9])
-        assert (np.array_equal(result, expected))
-
-        arr = np.array([8., 8., 8., 8., 9.0000000000005])
-        result = _possibly_downcast_to_dtype(arr, 'infer')
-        expected = np.array([8, 8, 8, 8, 9])
-        assert (np.array_equal(result, expected))
-
-        # conversions
-
-        expected = np.array([1, 2])
-        for dtype in [np.float64, object, np.int64]:
-            arr = np.array([1.0, 2.0], dtype=dtype)
-            result = _possibly_downcast_to_dtype(arr, 'infer')
-            tm.assert_almost_equal(result, expected, check_dtype=False)
-
-        for dtype in [np.float64, object]:
-            expected = np.array([1.0, 2.0, np.nan], dtype=dtype)
-            arr = np.array([1.0, 2.0, np.nan], dtype=dtype)
-            result = _possibly_downcast_to_dtype(arr, 'infer')
-            tm.assert_almost_equal(result, expected)
-
-        # empties
-        for dtype in [np.int32, np.float64, np.float32, np.bool_,
-                      np.int64, object]:
-            arr = np.array([], dtype=dtype)
-            result = _possibly_downcast_to_dtype(arr, 'int64')
-            tm.assert_almost_equal(result, np.array([], dtype=np.int64))
-            assert result.dtype == np.int64
-
-    def test_datetimelikes_nan(self):
-        arr = np.array([1, 2, np.nan])
-        exp = np.array([1, 2, np.datetime64('NaT')], dtype='datetime64[ns]')
-        res = _possibly_downcast_to_dtype(arr, 'datetime64[ns]')
-        tm.assert_numpy_array_equal(res, exp)
-
-        exp = np.array([1, 2, np.timedelta64('NaT')], dtype='timedelta64[ns]')
-        res = _possibly_downcast_to_dtype(arr, 'timedelta64[ns]')
-        tm.assert_numpy_array_equal(res, exp)
-
-
-class TestInferDtype(tm.TestCase):
-
-    def test_infer_dtype_from_scalar(self):
-        # Test that _infer_dtype_from_scalar is returning correct dtype for int
-        # and float.
-
-        for dtypec in [np.uint8, np.int8, np.uint16, np.int16, np.uint32,
-                       np.int32, np.uint64, np.int64]:
-            data = dtypec(12)
-            dtype, val = _infer_dtype_from_scalar(data)
-            self.assertEqual(dtype, type(data))
-
-        data = 12
-        dtype, val = _infer_dtype_from_scalar(data)
-        self.assertEqual(dtype, np.int64)
-
-        for dtypec in [np.float16, np.float32, np.float64]:
-            data = dtypec(12)
-            dtype, val = _infer_dtype_from_scalar(data)
-            self.assertEqual(dtype, dtypec)
-
-        data = np.float(12)
-        dtype, val = _infer_dtype_from_scalar(data)
-        self.assertEqual(dtype, np.float64)
-
-        for data in [True, False]:
-            dtype, val = _infer_dtype_from_scalar(data)
-            self.assertEqual(dtype, np.bool_)
-
-        for data in [np.complex64(1), np.complex128(1)]:
-            dtype, val = _infer_dtype_from_scalar(data)
-            self.assertEqual(dtype, np.complex_)
-
-        import datetime
-        for data in [np.datetime64(1, 'ns'), Timestamp(1),
-                     datetime.datetime(2000, 1, 1, 0, 0)]:
-            dtype, val = _infer_dtype_from_scalar(data)
-            self.assertEqual(dtype, 'M8[ns]')
-
-        for data in [np.timedelta64(1, 'ns'), Timedelta(1),
-                     datetime.timedelta(1)]:
-            dtype, val = _infer_dtype_from_scalar(data)
-            self.assertEqual(dtype, 'm8[ns]')
-
-        for data in [datetime.date(2000, 1, 1),
-                     Timestamp(1, tz='US/Eastern'), 'foo']:
-            dtype, val = _infer_dtype_from_scalar(data)
-            self.assertEqual(dtype, np.object_)
-
-
-class TestMaybe(tm.TestCase):
-
-    def test_maybe_convert_string_to_array(self):
-        result = _maybe_convert_string_to_object('x')
-        tm.assert_numpy_array_equal(result, np.array(['x'], dtype=object))
-        self.assertTrue(result.dtype == object)
-
-        result = _maybe_convert_string_to_object(1)
-        self.assertEqual(result, 1)
-
-        arr = np.array(['x', 'y'], dtype=str)
-        result = _maybe_convert_string_to_object(arr)
-        tm.assert_numpy_array_equal(result, np.array(['x', 'y'], dtype=object))
-        self.assertTrue(result.dtype == object)
-
-        # unicode
-        arr = np.array(['x', 'y']).astype('U')
-        result = _maybe_convert_string_to_object(arr)
-        tm.assert_numpy_array_equal(result, np.array(['x', 'y'], dtype=object))
-        self.assertTrue(result.dtype == object)
-
-        # object
-        arr = np.array(['x', 2], dtype=object)
-        result = _maybe_convert_string_to_object(arr)
-        tm.assert_numpy_array_equal(result, np.array(['x', 2], dtype=object))
-        self.assertTrue(result.dtype == object)
-
-    def test_maybe_convert_scalar(self):
-
-        # pass thru
-        result = _maybe_convert_scalar('x')
-        self.assertEqual(result, 'x')
-        result = _maybe_convert_scalar(np.array([1]))
-        self.assertEqual(result, np.array([1]))
-
-        # leave scalar dtype
-        result = _maybe_convert_scalar(np.int64(1))
-        self.assertEqual(result, np.int64(1))
-        result = _maybe_convert_scalar(np.int32(1))
-        self.assertEqual(result, np.int32(1))
-        result = _maybe_convert_scalar(np.float32(1))
-        self.assertEqual(result, np.float32(1))
-        result = _maybe_convert_scalar(np.int64(1))
-        self.assertEqual(result, np.float64(1))
-
-        # coerce
-        result = _maybe_convert_scalar(1)
-        self.assertEqual(result, np.int64(1))
-        result = _maybe_convert_scalar(1.0)
-        self.assertEqual(result, np.float64(1))
-        result = _maybe_convert_scalar(Timestamp('20130101'))
-        self.assertEqual(result, Timestamp('20130101').value)
-        result = _maybe_convert_scalar(datetime(2013, 1, 1))
-        self.assertEqual(result, Timestamp('20130101').value)
-        result = _maybe_convert_scalar(Timedelta('1 day 1 min'))
-        self.assertEqual(result, Timedelta('1 day 1 min').value)
-
-
-class TestConvert(tm.TestCase):
-
-    def test_possibly_convert_objects_copy(self):
-        values = np.array([1, 2])
-
-        out = _possibly_convert_objects(values, copy=False)
-        self.assertTrue(values is out)
-
-        out = _possibly_convert_objects(values, copy=True)
-        self.assertTrue(values is not out)
-
-        values = np.array(['apply', 'banana'])
-        out = _possibly_convert_objects(values, copy=False)
-        self.assertTrue(values is out)
-
-        out = _possibly_convert_objects(values, copy=True)
-        self.assertTrue(values is not out)
-
-
-class TestCommonTypes(tm.TestCase):
-
-    def test_numpy_dtypes(self):
-        # (source_types, destination_type)
-        testcases = (
-            # identity
-            ((np.int64,), np.int64),
-            ((np.uint64,), np.uint64),
-            ((np.float32,), np.float32),
-            ((np.object,), np.object),
-
-            # into ints
-            ((np.int16, np.int64), np.int64),
-            ((np.int32, np.uint32), np.int64),
-            ((np.uint16, np.uint64), np.uint64),
-
-            # into floats
-            ((np.float16, np.float32), np.float32),
-            ((np.float16, np.int16), np.float32),
-            ((np.float32, np.int16), np.float32),
-            ((np.uint64, np.int64), np.float64),
-            ((np.int16, np.float64), np.float64),
-            ((np.float16, np.int64), np.float64),
-
-            # into others
-            ((np.complex128, np.int32), np.complex128),
-            ((np.object, np.float32), np.object),
-            ((np.object, np.int16), np.object),
-
-            ((np.dtype('datetime64[ns]'), np.dtype('datetime64[ns]')),
-             np.dtype('datetime64[ns]')),
-            ((np.dtype('timedelta64[ns]'), np.dtype('timedelta64[ns]')),
-             np.dtype('timedelta64[ns]')),
-
-            ((np.dtype('datetime64[ns]'), np.dtype('datetime64[ms]')),
-             np.dtype('datetime64[ns]')),
-            ((np.dtype('timedelta64[ms]'), np.dtype('timedelta64[ns]')),
-             np.dtype('timedelta64[ns]')),
-
-            ((np.dtype('datetime64[ns]'), np.dtype('timedelta64[ns]')),
-             np.object),
-            ((np.dtype('datetime64[ns]'), np.int64), np.object)
-        )
-        for src, common in testcases:
-            self.assertEqual(_find_common_type(src), common)
-
-        with tm.assertRaises(ValueError):
-            # empty
-            _find_common_type([])
-
-    def test_categorical_dtype(self):
-        dtype = CategoricalDtype()
-        self.assertEqual(_find_common_type([dtype]), 'category')
-        self.assertEqual(_find_common_type([dtype, dtype]), 'category')
-        self.assertEqual(_find_common_type([np.object, dtype]), np.object)
-
-    def test_datetimetz_dtype(self):
-        dtype = DatetimeTZDtype(unit='ns', tz='US/Eastern')
-        self.assertEqual(_find_common_type([dtype, dtype]),
-                         'datetime64[ns, US/Eastern]')
-
-        for dtype2 in [DatetimeTZDtype(unit='ns', tz='Asia/Tokyo'),
-                       np.dtype('datetime64[ns]'), np.object, np.int64]:
-            self.assertEqual(_find_common_type([dtype, dtype2]), np.object)
-            self.assertEqual(_find_common_type([dtype2, dtype]), np.object)
-
-    def test_period_dtype(self):
-        dtype = PeriodDtype(freq='D')
-        self.assertEqual(_find_common_type([dtype, dtype]), 'period[D]')
-
-        for dtype2 in [DatetimeTZDtype(unit='ns', tz='Asia/Tokyo'),
-                       PeriodDtype(freq='2D'), PeriodDtype(freq='H'),
-                       np.dtype('datetime64[ns]'), np.object, np.int64]:
-            self.assertEqual(_find_common_type([dtype, dtype2]), np.object)
-            self.assertEqual(_find_common_type([dtype2, dtype]), np.object)
-
-
-if __name__ == '__main__':
-    nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
-                   exit=False)
diff --git a/pandas/tests/types/test_common.py b/pandas/tests/types/test_common.py
deleted file mode 100644
index 4d6f50862c562..0000000000000
--- a/pandas/tests/types/test_common.py
+++ /dev/null
@@ -1,62 +0,0 @@
-# -*- coding: utf-8 -*-
-
-import nose
-import numpy as np
-
-from pandas.types.dtypes import DatetimeTZDtype, PeriodDtype, CategoricalDtype
-from pandas.types.common import pandas_dtype, is_dtype_equal
-
-import pandas.util.testing as tm
-
-_multiprocess_can_split_ = True
-
-
-class TestPandasDtype(tm.TestCase):
-
-    def test_numpy_dtype(self):
-        for dtype in ['M8[ns]', 'm8[ns]', 'object', 'float64', 'int64']:
-            self.assertEqual(pandas_dtype(dtype), np.dtype(dtype))
-
-    def test_numpy_string_dtype(self):
-        # do not parse freq-like string as period dtype
-        self.assertEqual(pandas_dtype('U'), np.dtype('U'))
-        self.assertEqual(pandas_dtype('S'), np.dtype('S'))
-
-    def test_datetimetz_dtype(self):
-        for dtype in ['datetime64[ns, US/Eastern]',
-                      'datetime64[ns, Asia/Tokyo]',
-                      'datetime64[ns, UTC]']:
-            self.assertIs(pandas_dtype(dtype), DatetimeTZDtype(dtype))
-            self.assertEqual(pandas_dtype(dtype), DatetimeTZDtype(dtype))
-            self.assertEqual(pandas_dtype(dtype), dtype)
-
-    def test_categorical_dtype(self):
-        self.assertEqual(pandas_dtype('category'), CategoricalDtype())
-
-    def test_period_dtype(self):
-        for dtype in ['period[D]', 'period[3M]', 'period[U]',
-                      'Period[D]', 'Period[3M]', 'Period[U]']:
-            self.assertIs(pandas_dtype(dtype), PeriodDtype(dtype))
-            self.assertEqual(pandas_dtype(dtype), PeriodDtype(dtype))
-            self.assertEqual(pandas_dtype(dtype), dtype)
-
-
-def test_dtype_equal():
-    assert is_dtype_equal(np.int64, np.int64)
-    assert not is_dtype_equal(np.int64, np.float64)
-
-    p1 = PeriodDtype('D')
-    p2 = PeriodDtype('D')
-    assert is_dtype_equal(p1, p2)
-    assert not is_dtype_equal(np.int64, p1)
-
-    p3 = PeriodDtype('2D')
-    assert not is_dtype_equal(p1, p3)
-
-    assert not DatetimeTZDtype.is_dtype(np.int64)
-    assert not PeriodDtype.is_dtype(np.int64)
-
-
-if __name__ == '__main__':
-    nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
-                   exit=False)
diff --git a/pandas/tests/types/test_dtypes.py b/pandas/tests/types/test_dtypes.py
deleted file mode 100644
index f190c85404ff9..0000000000000
--- a/pandas/tests/types/test_dtypes.py
+++ /dev/null
@@ -1,360 +0,0 @@
-# -*- coding: utf-8 -*-
-from itertools import product
-
-import nose
-import numpy as np
-import pandas as pd
-from pandas import Series, Categorical, date_range
-
-from pandas.types.dtypes import DatetimeTZDtype, PeriodDtype, CategoricalDtype
-from pandas.types.common import (is_categorical_dtype, is_categorical,
-                                 is_datetime64tz_dtype, is_datetimetz,
-                                 is_period_dtype, is_period,
-                                 is_dtype_equal, is_datetime64_ns_dtype,
-                                 is_datetime64_dtype,
-                                 is_datetime64_any_dtype, is_string_dtype,
-                                 _coerce_to_dtype)
-import pandas.util.testing as tm
-
-_multiprocess_can_split_ = True
-
-
-class Base(object):
-
-    def test_hash(self):
-        hash(self.dtype)
-
-    def test_equality_invalid(self):
-        self.assertRaises(self.dtype == 'foo')
-        self.assertFalse(is_dtype_equal(self.dtype, np.int64))
-
-    def test_numpy_informed(self):
-
-        # np.dtype doesn't know about our new dtype
-        def f():
-            np.dtype(self.dtype)
-
-        self.assertRaises(TypeError, f)
-
-        self.assertNotEqual(self.dtype, np.str_)
-        self.assertNotEqual(np.str_, self.dtype)
-
-    def test_pickle(self):
-        result = self.round_trip_pickle(self.dtype)
-        self.assertEqual(result, self.dtype)
-
-
-class TestCategoricalDtype(Base, tm.TestCase):
-
-    def setUp(self):
-        self.dtype = CategoricalDtype()
-
-    def test_hash_vs_equality(self):
-        # make sure that we satisfy is semantics
-        dtype = self.dtype
-        dtype2 = CategoricalDtype()
-        self.assertTrue(dtype == dtype2)
-        self.assertTrue(dtype2 == dtype)
-        self.assertTrue(dtype is dtype2)
-        self.assertTrue(dtype2 is dtype)
-        self.assertTrue(hash(dtype) == hash(dtype2))
-
-    def test_equality(self):
-        self.assertTrue(is_dtype_equal(self.dtype, 'category'))
-        self.assertTrue(is_dtype_equal(self.dtype, CategoricalDtype()))
-        self.assertFalse(is_dtype_equal(self.dtype, 'foo'))
-
-    def test_construction_from_string(self):
-        result = CategoricalDtype.construct_from_string('category')
-        self.assertTrue(is_dtype_equal(self.dtype, result))
-        self.assertRaises(
-            TypeError, lambda: CategoricalDtype.construct_from_string('foo'))
-
-    def test_is_dtype(self):
-        self.assertTrue(CategoricalDtype.is_dtype(self.dtype))
-        self.assertTrue(CategoricalDtype.is_dtype('category'))
-        self.assertTrue(CategoricalDtype.is_dtype(CategoricalDtype()))
-        self.assertFalse(CategoricalDtype.is_dtype('foo'))
-        self.assertFalse(CategoricalDtype.is_dtype(np.float64))
-
-    def test_basic(self):
-
-        self.assertTrue(is_categorical_dtype(self.dtype))
-
-        factor = Categorical(['a', 'b', 'b', 'a', 'a', 'c', 'c', 'c'])
-
-        s = Series(factor, name='A')
-
-        # dtypes
-        self.assertTrue(is_categorical_dtype(s.dtype))
-        self.assertTrue(is_categorical_dtype(s))
-        self.assertFalse(is_categorical_dtype(np.dtype('float64')))
-
-        self.assertTrue(is_categorical(s.dtype))
-        self.assertTrue(is_categorical(s))
-        self.assertFalse(is_categorical(np.dtype('float64')))
-        self.assertFalse(is_categorical(1.0))
-
-
-class TestDatetimeTZDtype(Base, tm.TestCase):
-
-    def setUp(self):
-        self.dtype = DatetimeTZDtype('ns', 'US/Eastern')
-
-    def test_hash_vs_equality(self):
-        # make sure that we satisfy is semantics
-        dtype = self.dtype
-        dtype2 = DatetimeTZDtype('ns', 'US/Eastern')
-        dtype3 = DatetimeTZDtype(dtype2)
-        self.assertTrue(dtype == dtype2)
-        self.assertTrue(dtype2 == dtype)
-        self.assertTrue(dtype3 == dtype)
-        self.assertTrue(dtype is dtype2)
-        self.assertTrue(dtype2 is dtype)
-        self.assertTrue(dtype3 is dtype)
-        self.assertTrue(hash(dtype) == hash(dtype2))
-        self.assertTrue(hash(dtype) == hash(dtype3))
-
-    def test_construction(self):
-        self.assertRaises(ValueError,
-                          lambda: DatetimeTZDtype('ms', 'US/Eastern'))
-
-    def test_subclass(self):
-        a = DatetimeTZDtype('datetime64[ns, US/Eastern]')
-        b = DatetimeTZDtype('datetime64[ns, CET]')
-
-        self.assertTrue(issubclass(type(a), type(a)))
-        self.assertTrue(issubclass(type(a), type(b)))
-
-    def test_coerce_to_dtype(self):
-        self.assertEqual(_coerce_to_dtype('datetime64[ns, US/Eastern]'),
-                         DatetimeTZDtype('ns', 'US/Eastern'))
-        self.assertEqual(_coerce_to_dtype('datetime64[ns, Asia/Tokyo]'),
-                         DatetimeTZDtype('ns', 'Asia/Tokyo'))
-
-    def test_compat(self):
-        self.assertTrue(is_datetime64tz_dtype(self.dtype))
-        self.assertTrue(is_datetime64tz_dtype('datetime64[ns, US/Eastern]'))
-        self.assertTrue(is_datetime64_any_dtype(self.dtype))
-        self.assertTrue(is_datetime64_any_dtype('datetime64[ns, US/Eastern]'))
-        self.assertTrue(is_datetime64_ns_dtype(self.dtype))
-        self.assertTrue(is_datetime64_ns_dtype('datetime64[ns, US/Eastern]'))
-        self.assertFalse(is_datetime64_dtype(self.dtype))
-        self.assertFalse(is_datetime64_dtype('datetime64[ns, US/Eastern]'))
-
-    def test_construction_from_string(self):
-        result = DatetimeTZDtype('datetime64[ns, US/Eastern]')
-        self.assertTrue(is_dtype_equal(self.dtype, result))
-        result = DatetimeTZDtype.construct_from_string(
-            'datetime64[ns, US/Eastern]')
-        self.assertTrue(is_dtype_equal(self.dtype, result))
-        self.assertRaises(TypeError,
-                          lambda: DatetimeTZDtype.construct_from_string('foo'))
-
-    def test_is_dtype(self):
-        self.assertTrue(DatetimeTZDtype.is_dtype(self.dtype))
-        self.assertTrue(DatetimeTZDtype.is_dtype('datetime64[ns, US/Eastern]'))
-        self.assertFalse(DatetimeTZDtype.is_dtype('foo'))
-        self.assertTrue(DatetimeTZDtype.is_dtype(DatetimeTZDtype(
-            'ns', 'US/Pacific')))
-        self.assertFalse(DatetimeTZDtype.is_dtype(np.float64))
-
-    def test_equality(self):
-        self.assertTrue(is_dtype_equal(self.dtype,
-                                       'datetime64[ns, US/Eastern]'))
-        self.assertTrue(is_dtype_equal(self.dtype, DatetimeTZDtype(
-            'ns', 'US/Eastern')))
-        self.assertFalse(is_dtype_equal(self.dtype, 'foo'))
-        self.assertFalse(is_dtype_equal(self.dtype, DatetimeTZDtype('ns',
-                                                                    'CET')))
-        self.assertFalse(is_dtype_equal(
-            DatetimeTZDtype('ns', 'US/Eastern'), DatetimeTZDtype(
-                'ns', 'US/Pacific')))
-
-        # numpy compat
-        self.assertTrue(is_dtype_equal(np.dtype("M8[ns]"), "datetime64[ns]"))
-
-    def test_basic(self):
-
-        self.assertTrue(is_datetime64tz_dtype(self.dtype))
-
-        dr = date_range('20130101', periods=3, tz='US/Eastern')
-        s = Series(dr, name='A')
-
-        # dtypes
-        self.assertTrue(is_datetime64tz_dtype(s.dtype))
-        self.assertTrue(is_datetime64tz_dtype(s))
-        self.assertFalse(is_datetime64tz_dtype(np.dtype('float64')))
-        self.assertFalse(is_datetime64tz_dtype(1.0))
-
-        self.assertTrue(is_datetimetz(s))
-        self.assertTrue(is_datetimetz(s.dtype))
-        self.assertFalse(is_datetimetz(np.dtype('float64')))
-        self.assertFalse(is_datetimetz(1.0))
-
-    def test_dst(self):
-
-        dr1 = date_range('2013-01-01', periods=3, tz='US/Eastern')
-        s1 = Series(dr1, name='A')
-        self.assertTrue(is_datetimetz(s1))
-
-        dr2 = date_range('2013-08-01', periods=3, tz='US/Eastern')
-        s2 = Series(dr2, name='A')
-        self.assertTrue(is_datetimetz(s2))
-        self.assertEqual(s1.dtype, s2.dtype)
-
-    def test_parser(self):
-        # pr #11245
-        for tz, constructor in product(('UTC', 'US/Eastern'),
-                                       ('M8', 'datetime64')):
-            self.assertEqual(
-                DatetimeTZDtype('%s[ns, %s]' % (constructor, tz)),
-                DatetimeTZDtype('ns', tz),
-            )
-
-    def test_empty(self):
-        dt = DatetimeTZDtype()
-        with tm.assertRaises(AttributeError):
-            str(dt)
-
-
-class TestPeriodDtype(Base, tm.TestCase):
-
-    def setUp(self):
-        self.dtype = PeriodDtype('D')
-
-    def test_construction(self):
-        with tm.assertRaises(ValueError):
-            PeriodDtype('xx')
-
-        for s in ['period[D]', 'Period[D]', 'D']:
-            dt = PeriodDtype(s)
-            self.assertEqual(dt.freq, pd.tseries.offsets.Day())
-            self.assertTrue(is_period_dtype(dt))
-
-        for s in ['period[3D]', 'Period[3D]', '3D']:
-            dt = PeriodDtype(s)
-            self.assertEqual(dt.freq, pd.tseries.offsets.Day(3))
-            self.assertTrue(is_period_dtype(dt))
-
-        for s in ['period[26H]', 'Period[26H]', '26H',
-                  'period[1D2H]', 'Period[1D2H]', '1D2H']:
-            dt = PeriodDtype(s)
-            self.assertEqual(dt.freq, pd.tseries.offsets.Hour(26))
-            self.assertTrue(is_period_dtype(dt))
-
-    def test_subclass(self):
-        a = PeriodDtype('period[D]')
-        b = PeriodDtype('period[3D]')
-
-        self.assertTrue(issubclass(type(a), type(a)))
-        self.assertTrue(issubclass(type(a), type(b)))
-
-    def test_identity(self):
-        self.assertEqual(PeriodDtype('period[D]'),
-                         PeriodDtype('period[D]'))
-        self.assertIs(PeriodDtype('period[D]'),
-                      PeriodDtype('period[D]'))
-
-        self.assertEqual(PeriodDtype('period[3D]'),
-                         PeriodDtype('period[3D]'))
-        self.assertIs(PeriodDtype('period[3D]'),
-                      PeriodDtype('period[3D]'))
-
-        self.assertEqual(PeriodDtype('period[1S1U]'),
-                         PeriodDtype('period[1000001U]'))
-        self.assertIs(PeriodDtype('period[1S1U]'),
-                      PeriodDtype('period[1000001U]'))
-
-    def test_coerce_to_dtype(self):
-        self.assertEqual(_coerce_to_dtype('period[D]'),
-                         PeriodDtype('period[D]'))
-        self.assertEqual(_coerce_to_dtype('period[3M]'),
-                         PeriodDtype('period[3M]'))
-
-    def test_compat(self):
-        self.assertFalse(is_datetime64_ns_dtype(self.dtype))
-        self.assertFalse(is_datetime64_ns_dtype('period[D]'))
-        self.assertFalse(is_datetime64_dtype(self.dtype))
-        self.assertFalse(is_datetime64_dtype('period[D]'))
-
-    def test_construction_from_string(self):
-        result = PeriodDtype('period[D]')
-        self.assertTrue(is_dtype_equal(self.dtype, result))
-        result = PeriodDtype.construct_from_string('period[D]')
-        self.assertTrue(is_dtype_equal(self.dtype, result))
-        with tm.assertRaises(TypeError):
-            PeriodDtype.construct_from_string('foo')
-        with tm.assertRaises(TypeError):
-            PeriodDtype.construct_from_string('period[foo]')
-        with tm.assertRaises(TypeError):
-            PeriodDtype.construct_from_string('foo[D]')
-
-        with tm.assertRaises(TypeError):
-            PeriodDtype.construct_from_string('datetime64[ns]')
-        with tm.assertRaises(TypeError):
-            PeriodDtype.construct_from_string('datetime64[ns, US/Eastern]')
-
-    def test_is_dtype(self):
-        self.assertTrue(PeriodDtype.is_dtype(self.dtype))
-        self.assertTrue(PeriodDtype.is_dtype('period[D]'))
-        self.assertTrue(PeriodDtype.is_dtype('period[3D]'))
-        self.assertTrue(PeriodDtype.is_dtype(PeriodDtype('3D')))
-        self.assertTrue(PeriodDtype.is_dtype('period[U]'))
-        self.assertTrue(PeriodDtype.is_dtype('period[S]'))
-        self.assertTrue(PeriodDtype.is_dtype(PeriodDtype('U')))
-        self.assertTrue(PeriodDtype.is_dtype(PeriodDtype('S')))
-
-        self.assertFalse(PeriodDtype.is_dtype('D'))
-        self.assertFalse(PeriodDtype.is_dtype('3D'))
-        self.assertFalse(PeriodDtype.is_dtype('U'))
-        self.assertFalse(PeriodDtype.is_dtype('S'))
-        self.assertFalse(PeriodDtype.is_dtype('foo'))
-        self.assertFalse(PeriodDtype.is_dtype(np.object_))
-        self.assertFalse(PeriodDtype.is_dtype(np.int64))
-        self.assertFalse(PeriodDtype.is_dtype(np.float64))
-
-    def test_equality(self):
-        self.assertTrue(is_dtype_equal(self.dtype, 'period[D]'))
-        self.assertTrue(is_dtype_equal(self.dtype, PeriodDtype('D')))
-        self.assertTrue(is_dtype_equal(self.dtype, PeriodDtype('D')))
-        self.assertTrue(is_dtype_equal(PeriodDtype('D'), PeriodDtype('D')))
-
-        self.assertFalse(is_dtype_equal(self.dtype, 'D'))
-        self.assertFalse(is_dtype_equal(PeriodDtype('D'), PeriodDtype('2D')))
-
-    def test_basic(self):
-        self.assertTrue(is_period_dtype(self.dtype))
-
-        pidx = pd.period_range('2013-01-01 09:00', periods=5, freq='H')
-
-        self.assertTrue(is_period_dtype(pidx.dtype))
-        self.assertTrue(is_period_dtype(pidx))
-        self.assertTrue(is_period(pidx))
-
-        s = Series(pidx, name='A')
-        # dtypes
-        # series results in object dtype currently,
-        # is_period checks period_arraylike
-        self.assertFalse(is_period_dtype(s.dtype))
-        self.assertFalse(is_period_dtype(s))
-        self.assertTrue(is_period(s))
-
-        self.assertFalse(is_period_dtype(np.dtype('float64')))
-        self.assertFalse(is_period_dtype(1.0))
-        self.assertFalse(is_period(np.dtype('float64')))
-        self.assertFalse(is_period(1.0))
-
-    def test_empty(self):
-        dt = PeriodDtype()
-        with tm.assertRaises(AttributeError):
-            str(dt)
-
-    def test_not_string(self):
-        # though PeriodDtype has object kind, it cannot be string
-        self.assertFalse(is_string_dtype(PeriodDtype('D')))
-
-
-if __name__ == '__main__':
-    nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
-                   exit=False)
diff --git a/pandas/tests/types/test_generic.py b/pandas/tests/types/test_generic.py
deleted file mode 100644
index 28600687e8062..0000000000000
--- a/pandas/tests/types/test_generic.py
+++ /dev/null
@@ -1,48 +0,0 @@
-# -*- coding: utf-8 -*-
-
-import nose
-import numpy as np
-import pandas as pd
-import pandas.util.testing as tm
-from pandas.types import generic as gt
-
-_multiprocess_can_split_ = True
-
-
-class TestABCClasses(tm.TestCase):
-    tuples = [[1, 2, 2], ['red', 'blue', 'red']]
-    multi_index = pd.MultiIndex.from_arrays(tuples, names=('number', 'color'))
-    datetime_index = pd.to_datetime(['2000/1/1', '2010/1/1'])
-    timedelta_index = pd.to_timedelta(np.arange(5), unit='s')
-    period_index = pd.period_range('2000/1/1', '2010/1/1/', freq='M')
-    categorical = pd.Categorical([1, 2, 3], categories=[2, 3, 1])
-    categorical_df = pd.DataFrame({"values": [1, 2, 3]}, index=categorical)
-    df = pd.DataFrame({'names': ['a', 'b', 'c']}, index=multi_index)
-    sparse_series = pd.Series([1, 2, 3]).to_sparse()
-    sparse_array = pd.SparseArray(np.random.randn(10))
-
-    def test_abc_types(self):
-        self.assertIsInstance(pd.Index(['a', 'b', 'c']), gt.ABCIndex)
-        self.assertIsInstance(pd.Int64Index([1, 2, 3]), gt.ABCInt64Index)
-        self.assertIsInstance(pd.UInt64Index([1, 2, 3]), gt.ABCUInt64Index)
-        self.assertIsInstance(pd.Float64Index([1, 2, 3]), gt.ABCFloat64Index)
-        self.assertIsInstance(self.multi_index, gt.ABCMultiIndex)
-        self.assertIsInstance(self.datetime_index, gt.ABCDatetimeIndex)
-        self.assertIsInstance(self.timedelta_index, gt.ABCTimedeltaIndex)
-        self.assertIsInstance(self.period_index, gt.ABCPeriodIndex)
-        self.assertIsInstance(self.categorical_df.index,
-                              gt.ABCCategoricalIndex)
-        self.assertIsInstance(pd.Index(['a', 'b', 'c']), gt.ABCIndexClass)
-        self.assertIsInstance(pd.Int64Index([1, 2, 3]), gt.ABCIndexClass)
-        self.assertIsInstance(pd.Series([1, 2, 3]), gt.ABCSeries)
-        self.assertIsInstance(self.df, gt.ABCDataFrame)
-        self.assertIsInstance(self.df.to_panel(), gt.ABCPanel)
-        self.assertIsInstance(self.sparse_series, gt.ABCSparseSeries)
-        self.assertIsInstance(self.sparse_array, gt.ABCSparseArray)
-        self.assertIsInstance(self.categorical, gt.ABCCategorical)
-        self.assertIsInstance(pd.Period('2012', freq='A-DEC'), gt.ABCPeriod)
-
-
-if __name__ == '__main__':
-    nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
-                   exit=False)
diff --git a/pandas/tests/util/__init__.py b/pandas/tests/util/__init__.py
new file mode 100644
index 0000000000000..e69de29bb2d1d
diff --git a/pandas/tools/tests/test_hashing.py b/pandas/tests/util/test_hashing.py
similarity index 67%
rename from pandas/tools/tests/test_hashing.py
rename to pandas/tests/util/test_hashing.py
index fb1f187ddd5c0..289592939e3da 100644
--- a/pandas/tools/tests/test_hashing.py
+++ b/pandas/tests/util/test_hashing.py
@@ -1,16 +1,19 @@
+import pytest
+import datetime
+
+from warnings import catch_warnings

 import numpy as np
 import pandas as pd

 from pandas import DataFrame, Series, Index, MultiIndex
-from pandas.tools.hashing import hash_array, hash_tuples, hash_pandas_object
+from pandas.util import hash_array, hash_pandas_object
+from pandas.core.util.hashing import hash_tuples, hash_tuple, _hash_scalar
 import pandas.util.testing as tm


-class TestHashing(tm.TestCase):
-
-    _multiprocess_can_split_ = True
+class TestHashing(object):

-    def setUp(self):
+    def setup_method(self, method):
         self.df = DataFrame(
             {'i32': np.array([1, 2, 3] * 3, dtype='int32'),
              'f32': np.array([None, 2.5, 3.5] * 3, dtype='float32'),
@@ -46,7 +49,7 @@ def test_hash_array_mixed(self):

     def test_hash_array_errors(self):
         for val in [5, 'foo', pd.Timestamp('20130101')]:
-            self.assertRaises(TypeError, hash_array, val)
+            pytest.raises(TypeError, hash_array, val)

     def check_equal(self, obj, **kwargs):
         a = hash_pandas_object(obj, **kwargs)
@@ -66,28 +69,78 @@ def check_not_equal_with_index(self, obj):
         a = hash_pandas_object(obj, index=True)
         b = hash_pandas_object(obj, index=False)
         if len(obj):
-            self.assertFalse((a == b).all())
+            assert not (a == b).all()

     def test_hash_tuples(self):
         tups = [(1, 'one'), (1, 'two'), (2, 'one')]
         result = hash_tuples(tups)
         expected = hash_pandas_object(MultiIndex.from_tuples(tups)).values
-        self.assert_numpy_array_equal(result, expected)
+        tm.assert_numpy_array_equal(result, expected)

         result = hash_tuples(tups[0])
-        self.assertEqual(result, expected[0])
+        assert result == expected[0]
+
+    def test_hash_tuple(self):
+        # test equivalence between hash_tuples and hash_tuple
+        for tup in [(1, 'one'), (1, np.nan), (1.0, pd.NaT, 'A'),
+                    ('A', pd.Timestamp("2012-01-01"))]:
+            result = hash_tuple(tup)
+            expected = hash_tuples([tup])[0]
+            assert result == expected
+
+    def test_hash_scalar(self):
+        for val in [1, 1.4, 'A', b'A', u'A', pd.Timestamp("2012-01-01"),
+                    pd.Timestamp("2012-01-01", tz='Europe/Brussels'),
+                    datetime.datetime(2012, 1, 1),
+                    pd.Timestamp("2012-01-01", tz='EST').to_pydatetime(),
+                    pd.Timedelta('1 days'), datetime.timedelta(1),
+                    pd.Period('2012-01-01', freq='D'), pd.Interval(0, 1),
+                    np.nan, pd.NaT, None]:
+            result = _hash_scalar(val)
+            expected = hash_array(np.array([val], dtype=object),
+                                  categorize=True)
+            assert result[0] == expected[0]

     def test_hash_tuples_err(self):
         for val in [5, 'foo', pd.Timestamp('20130101')]:
-            self.assertRaises(TypeError, hash_tuples, val)
+            pytest.raises(TypeError, hash_tuples, val)

     def test_multiindex_unique(self):
         mi = MultiIndex.from_tuples([(118, 472), (236, 118),
                                      (51, 204), (102, 51)])
-        self.assertTrue(mi.is_unique)
+        assert mi.is_unique
         result = hash_pandas_object(mi)
-        self.assertTrue(result.is_unique)
+        assert result.is_unique
+
+    def test_multiindex_objects(self):
+        mi = MultiIndex(levels=[['b', 'd', 'a'], [1, 2, 3]],
+                        labels=[[0, 1, 0, 2], [2, 0, 0, 1]],
+                        names=['col1', 'col2'])
+        recons = mi._sort_levels_monotonic()
+
+        # these are equal
+        assert mi.equals(recons)
+        assert Index(mi.values).equals(Index(recons.values))
+
+        # _hashed_values and hash_pandas_object(..., index=False)
+        # equivalency
+        expected = hash_pandas_object(
+            mi, index=False).values
+        result = mi._hashed_values
+        tm.assert_numpy_array_equal(result, expected)
+
+        expected = hash_pandas_object(
+            recons, index=False).values
+        result = recons._hashed_values
+        tm.assert_numpy_array_equal(result, expected)
+
+        expected = mi._hashed_values
+        result = recons._hashed_values
+
+        # values should match, but in different order
+        tm.assert_numpy_array_equal(np.sort(result),
+                                    np.sort(expected))

     def test_hash_pandas_object(self):

@@ -154,13 +207,28 @@ def test_categorical_consistency(self):
         tm.assert_series_equal(h1, h2)
         tm.assert_series_equal(h1, h3)

+    def test_categorical_with_nan_consistency(self):
+        c = pd.Categorical.from_codes(
+            [-1, 0, 1, 2, 3, 4],
+            categories=pd.date_range('2012-01-01', periods=5, name='B'))
+        expected = hash_array(c, categorize=False)
+        c = pd.Categorical.from_codes(
+            [-1, 0],
+            categories=[pd.Timestamp('2012-01-01')])
+        result = hash_array(c, categorize=False)
+        assert result[0] in expected
+        assert result[1] in expected
+
     def test_pandas_errors(self):
-        for obj in [pd.Timestamp('20130101'), tm.makePanel()]:
-            def f():
-                hash_pandas_object(f)
+        for obj in [pd.Timestamp('20130101')]:
+            with pytest.raises(TypeError):
+                hash_pandas_object(obj)

-            self.assertRaises(TypeError, f)
+        with catch_warnings(record=True):
+            obj = tm.makePanel()
+            with pytest.raises(TypeError):
+                hash_pandas_object(obj)

     def test_hash_keys(self):
         # using different hash keys, should have different hashes
@@ -170,13 +238,13 @@ def test_hash_keys(self):
         obj = Series(list('abc'))
         a = hash_pandas_object(obj, hash_key='9876543210123456')
         b = hash_pandas_object(obj, hash_key='9876543210123465')
-        self.assertTrue((a != b).all())
+        assert (a != b).all()

     def test_invalid_key(self):
         # this only matters for object dtypes
         def f():
             hash_pandas_object(Series(list('abc')), hash_key='foo')
-        self.assertRaises(ValueError, f)
+        pytest.raises(ValueError, f)

     def test_alread_encoded(self):
         # if already encoded then ok
@@ -195,13 +263,13 @@ def test_same_len_hash_collisions(self):
             length = 2**(l + 8) + 1
             s = tm.rands_array(length, 2)
             result = hash_array(s, 'utf8')
-            self.assertFalse(result[0] == result[1])
+            assert not result[0] == result[1]

         for l in range(8):
             length = 2**(l + 8)
             s = tm.rands_array(length, 2)
             result = hash_array(s, 'utf8')
-            self.assertFalse(result[0] == result[1])
+            assert not result[0] == result[1]

     def test_hash_collisions(self):

@@ -213,12 +281,27 @@ def test_hash_collisions(self):

         # these should be different!
         result1 = hash_array(np.asarray(L[0:1], dtype=object), 'utf8')
         expected1 = np.array([14963968704024874985], dtype=np.uint64)
-        self.assert_numpy_array_equal(result1, expected1)
+        tm.assert_numpy_array_equal(result1, expected1)

         result2 = hash_array(np.asarray(L[1:2], dtype=object), 'utf8')
         expected2 = np.array([16428432627716348016], dtype=np.uint64)
-        self.assert_numpy_array_equal(result2, expected2)
+        tm.assert_numpy_array_equal(result2, expected2)

         result = hash_array(np.asarray(L, dtype=object), 'utf8')
-        self.assert_numpy_array_equal(
+        tm.assert_numpy_array_equal(
             result, np.concatenate([expected1, expected2], axis=0))
+
+
+def test_deprecation():
+
+    with tm.assert_produces_warning(DeprecationWarning,
+                                    check_stacklevel=False):
+        from pandas.tools.hashing import hash_pandas_object
+        obj = Series(list('abc'))
+        hash_pandas_object(obj, hash_key='9876543210123456')
+
+    with tm.assert_produces_warning(DeprecationWarning,
+                                    check_stacklevel=False):
+        from pandas.tools.hashing import hash_array
+        obj = np.array([1, 2, 3])
+        hash_array(obj, hash_key='9876543210123456')
diff --git a/pandas/tests/test_testing.py b/pandas/tests/util/test_testing.py
similarity index 73%
rename from pandas/tests/test_testing.py
rename to pandas/tests/util/test_testing.py
index e2f295a5343bc..fe7c3b99987f5 100644
--- a/pandas/tests/test_testing.py
+++ b/pandas/tests/util/test_testing.py
@@ -1,32 +1,26 @@
 # -*- coding: utf-8 -*-
 import pandas as pd
-import unittest
-import nose
+import pytest
 import numpy as np
 import sys

 from pandas import Series, DataFrame
 import pandas.util.testing as tm
-from pandas.util.testing import (assert_almost_equal, assertRaisesRegexp,
-                                 raise_with_traceback, assert_index_equal,
-                                 assert_series_equal, assert_frame_equal,
-                                 assert_numpy_array_equal,
-                                 RNGContext, assertRaises,
-                                 skip_if_no_package_deco)
+from pandas.util.testing import (assert_almost_equal, raise_with_traceback,
+                                 assert_index_equal, assert_series_equal,
+                                 assert_frame_equal, assert_numpy_array_equal,
+                                 RNGContext)
 from pandas.compat import is_platform_windows

-# let's get meta.
-
-class TestAssertAlmostEqual(tm.TestCase):
-    _multiprocess_can_split_ = True
+class TestAssertAlmostEqual(object):

     def _assert_almost_equal_both(self, a, b, **kwargs):
         assert_almost_equal(a, b, **kwargs)
         assert_almost_equal(b, a, **kwargs)

     def _assert_not_almost_equal_both(self, a, b, **kwargs):
-        self.assertRaises(AssertionError, assert_almost_equal, a, b, **kwargs)
-        self.assertRaises(AssertionError, assert_almost_equal, b, a, **kwargs)
+        pytest.raises(AssertionError, assert_almost_equal, a, b, **kwargs)
+        pytest.raises(AssertionError, assert_almost_equal, b, a, **kwargs)

     def test_assert_almost_equal_numbers(self):
         self._assert_almost_equal_both(1.1, 1.1)
@@ -132,12 +126,12 @@ def test_assert_almost_equal_inf(self):
                                        dtype=np.object_))

     def test_assert_almost_equal_pandas(self):
-        self.assert_almost_equal(pd.Index([1., 1.1]),
-                                 pd.Index([1., 1.100001]))
-        self.assert_almost_equal(pd.Series([1., 1.1]),
-                                 pd.Series([1., 1.100001]))
-        self.assert_almost_equal(pd.DataFrame({'a': [1., 1.1]}),
-                                 pd.DataFrame({'a': [1., 1.100001]}))
+        tm.assert_almost_equal(pd.Index([1., 1.1]),
+                               pd.Index([1., 1.100001]))
+        tm.assert_almost_equal(pd.Series([1., 1.1]),
+                               pd.Series([1., 1.100001]))
+        tm.assert_almost_equal(pd.DataFrame({'a': [1., 1.1]}),
+                               pd.DataFrame({'a': [1., 1.100001]}))

     def test_assert_almost_equal_object(self):
         a = [pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-01')]
@@ -145,17 +139,16 @@ def test_assert_almost_equal_object(self):
         self._assert_almost_equal_both(a, b)


-class TestUtilTesting(tm.TestCase):
-    _multiprocess_can_split_ = True
+class TestUtilTesting(object):

     def test_raise_with_traceback(self):
-        with assertRaisesRegexp(LookupError, "error_text"):
+        with tm.assert_raises_regex(LookupError, "error_text"):
             try:
                 raise ValueError("THIS IS AN ERROR")
             except ValueError as e:
                 e = LookupError("error_text")
                 raise_with_traceback(e)
-        with assertRaisesRegexp(LookupError, "error_text"):
+        with tm.assert_raises_regex(LookupError, "error_text"):
             try:
                 raise ValueError("This is another error")
             except ValueError:
@@ -164,13 +157,13 @@ def test_raise_with_traceback(self):
             raise_with_traceback(e, traceback)


-class TestAssertNumpyArrayEqual(tm.TestCase):
+class TestAssertNumpyArrayEqual(object):

     def test_numpy_array_equal_message(self):

         if is_platform_windows():
-            raise nose.SkipTest("windows has incomparable line-endings "
-                                "and uses L on the shape")
+            pytest.skip("windows has incomparable line-endings "
+                        "and uses L on the shape")

         expected = """numpy array are different

@@ -178,18 +171,18 @@ def test_numpy_array_equal_message(self):
 \\[left\\]: \\(2,\\)
 \\[right\\]: \\(3,\\)"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_numpy_array_equal(np.array([1, 2]), np.array([3, 4, 5]))

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_almost_equal(np.array([1, 2]), np.array([3, 4, 5]))

         # scalar comparison
         expected = """Expected type """
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_numpy_array_equal(1, 2)
         expected = """expected 2\\.00000 but got 1\\.00000, with decimal 5"""
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_almost_equal(1, 2)

         # array / scalar array comparison
@@ -199,10 +192,10 @@ def test_numpy_array_equal_message(self):
 \\[left\\]: ndarray
 \\[right\\]: int"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             # numpy_array_equal only accepts np.ndarray
             assert_numpy_array_equal(np.array([1]), 1)
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_almost_equal(np.array([1]), 1)

         # scalar / array comparison
@@ -212,9 +205,9 @@ def test_numpy_array_equal_message(self):
 \\[left\\]: int
 \\[right\\]: ndarray"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_numpy_array_equal(1, np.array([1]))
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_almost_equal(1, np.array([1]))

         expected = """numpy array are different

@@ -223,10 +216,10 @@ def test_numpy_array_equal_message(self):
 \\[left\\]: \\[nan, 2\\.0, 3\\.0\\]
 \\[right\\]: \\[1\\.0, nan, 3\\.0\\]"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_numpy_array_equal(np.array([np.nan, 2, 3]),
                                      np.array([1, np.nan, 3]))
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_almost_equal(np.array([np.nan, 2, 3]),
                                 np.array([1, np.nan, 3]))

@@ -236,9 +229,9 @@ def test_numpy_array_equal_message(self):
 \\[left\\]: \\[1, 2\\]
 \\[right\\]: \\[1, 3\\]"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_numpy_array_equal(np.array([1, 2]), np.array([1, 3]))
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_almost_equal(np.array([1, 2]), np.array([1, 3]))

         expected = """numpy array are different

@@ -247,7 +240,7 @@ def test_numpy_array_equal_message(self):
 \\[left\\]: \\[1\\.1, 2\\.000001\\]
 \\[right\\]: \\[1\\.1, 2.0\\]"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_numpy_array_equal(
                 np.array([1.1, 2.000001]), np.array([1.1, 2.0]))

@@ -260,10 +253,10 @@ def test_numpy_array_equal_message(self):
 \\[left\\]: \\[\\[1, 2\\], \\[3, 4\\], \\[5, 6\\]\\]
 \\[right\\]: \\[\\[1, 3\\], \\[3, 4\\], \\[5, 6\\]\\]"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_numpy_array_equal(np.array([[1, 2], [3, 4], [5, 6]]),
                                      np.array([[1, 3], [3, 4], [5, 6]]))
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_almost_equal(np.array([[1, 2], [3, 4], [5, 6]]),
                                 np.array([[1, 3], [3, 4], [5, 6]]))

@@ -273,10 +266,10 @@ def test_numpy_array_equal_message(self):
 \\[left\\]: \\[\\[1, 2\\], \\[3, 4\\]\\]
 \\[right\\]: \\[\\[1, 3\\], \\[3, 4\\]\\]"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_numpy_array_equal(np.array([[1, 2], [3, 4]]),
                                      np.array([[1, 3], [3, 4]]))
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_almost_equal(np.array([[1, 2], [3, 4]]),
                                 np.array([[1, 3], [3, 4]]))

@@ -287,18 +280,18 @@ def test_numpy_array_equal_message(self):
 \\[left\\]: \\(2,\\)
 \\[right\\]: \\(3,\\)"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_numpy_array_equal(np.array([1, 2]), np.array([3, 4, 5]),
                                      obj='Index')
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_almost_equal(np.array([1, 2]), np.array([3, 4, 5]),
                                 obj='Index')

     def test_numpy_array_equal_object_message(self):

         if is_platform_windows():
-            raise nose.SkipTest("windows has incomparable line-endings "
-                                "and uses L on the shape")
+            pytest.skip("windows has incomparable line-endings "
+                        "and uses L on the shape")

         a = np.array([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-01')])
         b = np.array([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02')])
@@ -309,9 +302,9 @@ def test_numpy_array_equal_object_message(self):
 \\[left\\]: \\[2011-01-01 00:00:00, 2011-01-01 00:00:00\\]
 \\[right\\]: \\[2011-01-01 00:00:00, 2011-01-02 00:00:00\\]"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_numpy_array_equal(a, b)
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_almost_equal(a, b)

     def test_numpy_array_equal_copy_flag(self):
@@ -319,10 +312,10 @@ def test_numpy_array_equal_copy_flag(self):
         b = a.copy()
         c = a.view()
         expected = r'array\(\[1, 2, 3\]\) is not array\(\[1, 2, 3\]\)'
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_numpy_array_equal(a, b, check_same='same')
         expected = r'array\(\[1, 2, 3\]\) is array\(\[1, 2, 3\]\)'
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_numpy_array_equal(a, c, check_same='copy')

     def test_assert_almost_equal_iterable_message(self):
@@ -333,7 +326,7 @@ def test_assert_almost_equal_iterable_message(self):
 \\[left\\]: 2
 \\[right\\]: 3"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_almost_equal([1, 2], [3, 4, 5])

         expected = """Iterable are different

@@ -342,12 +335,11 @@ def test_assert_almost_equal_iterable_message(self):
 \\[left\\]: \\[1, 2\\]
 \\[right\\]: \\[1, 3\\]"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_almost_equal([1, 2], [1, 3])


-class TestAssertIndexEqual(unittest.TestCase):
-    _multiprocess_can_split_ = True
+class TestAssertIndexEqual(object):

     def test_index_equal_message(self):

@@ -361,7 +353,7 @@ def test_index_equal_message(self):
         idx1 = pd.Index([1, 2, 3])
         idx2 = pd.MultiIndex.from_tuples([('A', 1), ('A', 2),
                                           ('B', 3), ('B', 4)])
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2, exact=False)

         expected = """MultiIndex level \\[1\\] are different
@@ -374,9 +366,9 @@ def test_index_equal_message(self):
                                           ('B', 3), ('B', 4)])
         idx2 = pd.MultiIndex.from_tuples([('A', 1), ('A', 2),
                                           ('B', 3), ('B', 4)])
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2)
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2, check_exact=False)

         expected = """Index are different

@@ -387,9 +379,9 @@ def test_index_equal_message(self):

         idx1 = pd.Index([1, 2, 3])
         idx2 = pd.Index([1, 2, 3, 4])
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2)
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2, check_exact=False)

         expected = """Index are different

@@ -400,9 +392,9 @@ def test_index_equal_message(self):

         idx1 = pd.Index([1, 2, 3])
         idx2 = pd.Index([1, 2, 3.0])
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2, exact=True)
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2, exact=True, check_exact=False)

         expected = """Index are different

@@ -413,7 +405,7 @@ def test_index_equal_message(self):

         idx1 = pd.Index([1, 2, 3.])
         idx2 = pd.Index([1, 2, 3.0000000001])
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2)

         # must success
@@ -427,9 +419,9 @@ def test_index_equal_message(self):

         idx1 = pd.Index([1, 2, 3.])
         idx2 = pd.Index([1, 2, 3.0001])
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2)
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2, check_exact=False)
         # must success
         assert_index_equal(idx1, idx2, check_exact=False,
@@ -443,9 +435,9 @@ def test_index_equal_message(self):

         idx1 = pd.Index([1, 2, 3])
         idx2 = pd.Index([1, 2, 4])
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2)
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2, check_less_precise=True)

         expected = """MultiIndex level \\[1\\] are different
@@ -458,9 +450,9 @@ def test_index_equal_message(self):
                                           ('B', 3), ('B', 4)])
         idx2 = pd.MultiIndex.from_tuples([('A', 1), ('A', 2),
                                           ('B', 3), ('B', 4)])
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2)
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2, check_exact=False)

     def test_index_equal_metadata_message(self):
@@ -473,7 +465,7 @@ def test_index_equal_metadata_message(self):

         idx1 = pd.Index([1, 2, 3])
         idx2 = pd.Index([1, 2, 3], name='x')
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2)

         # same name, should pass
@@ -490,20 +482,19 @@ def test_index_equal_metadata_message(self):

         idx1 = pd.Index([1, 2, 3], name=np.nan)
         idx2 = pd.Index([1, 2, 3], name=pd.NaT)
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_index_equal(idx1, idx2)


-class TestAssertSeriesEqual(tm.TestCase):
-    _multiprocess_can_split_ = True
+class TestAssertSeriesEqual(object):

     def _assert_equal(self, x, y, **kwargs):
         assert_series_equal(x, y, **kwargs)
         assert_series_equal(y, x, **kwargs)

     def _assert_not_equal(self, a, b, **kwargs):
-        self.assertRaises(AssertionError, assert_series_equal, a, b, **kwargs)
-        self.assertRaises(AssertionError, assert_series_equal, b, a, **kwargs)
+        pytest.raises(AssertionError, assert_series_equal, a, b, **kwargs)
+        pytest.raises(AssertionError, assert_series_equal, b, a, **kwargs)

     def test_equal(self):
         self._assert_equal(Series(range(3)), Series(range(3)))
@@ -527,27 +518,27 @@ def test_less_precise(self):
         s1 = Series([0.12345], dtype='float64')
         s2 = Series([0.12346], dtype='float64')

-        self.assertRaises(AssertionError, assert_series_equal, s1, s2)
+        pytest.raises(AssertionError, assert_series_equal, s1, s2)
         self._assert_equal(s1, s2, check_less_precise=True)
         for i in range(4):
             self._assert_equal(s1, s2, check_less_precise=i)
-        self.assertRaises(AssertionError, assert_series_equal, s1, s2, 10)
+        pytest.raises(AssertionError, assert_series_equal, s1, s2, 10)

         s1 = Series([0.12345], dtype='float32')
         s2 = Series([0.12346], dtype='float32')

-        self.assertRaises(AssertionError, assert_series_equal, s1, s2)
+        pytest.raises(AssertionError, assert_series_equal, s1, s2)
         self._assert_equal(s1, s2, check_less_precise=True)
         for i in range(4):
             self._assert_equal(s1, s2, check_less_precise=i)
-        self.assertRaises(AssertionError, assert_series_equal, s1, s2, 10)
+        pytest.raises(AssertionError, assert_series_equal, s1, s2, 10)

         # even less than less precise
         s1 = Series([0.1235], dtype='float32')
         s2 = Series([0.1236], dtype='float32')

-        self.assertRaises(AssertionError, assert_series_equal, s1, s2)
-        self.assertRaises(AssertionError, assert_series_equal, s1, s2, True)
+        pytest.raises(AssertionError, assert_series_equal, s1, s2)
+        pytest.raises(AssertionError, assert_series_equal, s1, s2, True)

     def test_index_dtype(self):
         df1 = DataFrame.from_records(
@@ -573,7 +564,7 @@ def test_series_equal_message(self):
 \\[left\\]: 3, RangeIndex\\(start=0, stop=3, step=1\\)
 \\[right\\]: 4, RangeIndex\\(start=0, stop=4, step=1\\)"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_series_equal(pd.Series([1, 2, 3]), pd.Series([1, 2, 3, 4]))

         expected = """Series are different

@@ -582,23 +573,36 @@ def test_series_equal_message(self):
 \\[left\\]: \\[1, 2, 3\\]
 \\[right\\]: \\[1, 2, 4\\]"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_series_equal(pd.Series([1, 2, 3]), pd.Series([1, 2, 4]))
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_series_equal(pd.Series([1, 2, 3]), pd.Series([1, 2, 4]),
                                 check_less_precise=True)


-class TestAssertFrameEqual(tm.TestCase):
-    _multiprocess_can_split_ = True
+class TestAssertFrameEqual(object):

     def _assert_equal(self, x, y, **kwargs):
         assert_frame_equal(x, y, **kwargs)
         assert_frame_equal(y, x, **kwargs)

     def _assert_not_equal(self, a, b, **kwargs):
-        self.assertRaises(AssertionError, assert_frame_equal, a, b, **kwargs)
-        self.assertRaises(AssertionError, assert_frame_equal, b, a, **kwargs)
+        pytest.raises(AssertionError, assert_frame_equal, a, b, **kwargs)
+        pytest.raises(AssertionError, assert_frame_equal, b, a, **kwargs)
+
+    def test_equal_with_different_row_order(self):
+        # check_like=True ignores row-column orderings
+        df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
+                           index=['a', 'b', 'c'])
+        df2 = pd.DataFrame({'A': [3, 2, 1], 'B': [6, 5, 4]},
+                           index=['c', 'b', 'a'])
+
+        self._assert_equal(df1, df2, check_like=True)
+        self._assert_not_equal(df1, df2)
+
+    def test_not_equal_with_different_shape(self):
+        self._assert_not_equal(pd.DataFrame({'A': [1, 2, 3]}),
+                               pd.DataFrame({'A': [1, 2, 3, 4]}))

     def test_index_dtype(self):
         df1 = DataFrame.from_records(
@@ -627,21 +631,11 @@ def test_frame_equal_message(self):

         expected = """DataFrame are different

-DataFrame shape \\(number of rows\\) are different
-\\[left\\]: 3, RangeIndex\\(start=0, stop=3, step=1\\)
-\\[right\\]: 4, RangeIndex\\(start=0, stop=4, step=1\\)"""
-
-        with assertRaisesRegexp(AssertionError, expected):
-            assert_frame_equal(pd.DataFrame({'A': [1, 2, 3]}),
-                               pd.DataFrame({'A': [1, 2, 3, 4]}))
+DataFrame shape mismatch
+\\[left\\]: \\(3, 2\\)
+\\[right\\]: \\(3, 1\\)"""

-        expected = """DataFrame are different
-
-DataFrame shape \\(number of columns\\) are different
-\\[left\\]: 2, Index\\(\\[u?'A', u?'B'\\], dtype='object'\\)
-\\[right\\]: 1, Index\\(\\[u?'A'\\], dtype='object'\\)"""
-
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_frame_equal(pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}),
                                pd.DataFrame({'A': [1, 2, 3]}))

@@ -651,7 +645,7 @@ def test_frame_equal_message(self):
 \\[left\\]: Index\\(\\[u?'a', u?'b', u?'c'\\], dtype='object'\\)
 \\[right\\]: Index\\(\\[u?'a', u?'b', u?'d'\\], dtype='object'\\)"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_frame_equal(pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                                             index=['a', 'b', 'c']),
                                pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
@@ -663,7 +657,7 @@ def test_frame_equal_message(self):
 \\[left\\]: Index\\(\\[u?'A', u?'B'\\], dtype='object'\\)
 \\[right\\]: Index\\(\\[u?'A', u?'b'\\], dtype='object'\\)"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_frame_equal(pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                                             index=['a', 'b', 'c']),
                                pd.DataFrame({'A': [1, 2, 3], 'b': [4, 5, 6]},
@@ -675,33 +669,17 @@ def test_frame_equal_message(self):
 \\[left\\]: \\[4, 5, 6\\]
 \\[right\\]: \\[4, 5, 7\\]"""

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_frame_equal(pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}),
                                pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 7]}))

-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             assert_frame_equal(pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}),
                                pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 7]}),
                                by_blocks=True)


-class TestIsInstance(tm.TestCase):
-
-    def test_isinstance(self):
-
-        expected = "Expected type "
-        with assertRaisesRegexp(AssertionError, expected):
-            tm.assertIsInstance(1, pd.Series)
-
-    def test_notisinstance(self):
-
-        expected = "Input must not be type "
-        with assertRaisesRegexp(AssertionError, expected):
-            tm.assertNotIsInstance(pd.Series([1]), pd.Series)
-
-
-class TestAssertCategoricalEqual(unittest.TestCase):
-    _multiprocess_can_split_ = True
+class TestAssertCategoricalEqual(object):

     def test_categorical_equal_message(self):

@@ -713,7 +691,7 @@ def test_categorical_equal_message(self):

         a = pd.Categorical([1, 2, 3, 4])
         b = pd.Categorical([1, 2, 3, 5])
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             tm.assert_categorical_equal(a, b)

         expected = """Categorical\\.codes are different

@@ -724,7 +702,7 @@ def test_categorical_equal_message(self):

         a = pd.Categorical([1, 2, 4, 3], categories=[1, 2, 3, 4])
         b = pd.Categorical([1, 2, 3, 4], categories=[1, 2, 3, 4])
-        with assertRaisesRegexp(AssertionError, expected):
+        with tm.assert_raises_regex(AssertionError, expected):
             tm.assert_categorical_equal(a, b)

         expected = """Categorical are different

@@ -735,11 +713,11 @@ def test_categorical_equal_message(self):

         a = pd.Categorical([1, 2, 3, 4], ordered=False)
         b = pd.Categorical([1, 2, 3, 4], ordered=True)
assertRaisesRegexp(AssertionError, expected): + with tm.assert_raises_regex(AssertionError, expected): tm.assert_categorical_equal(a, b) -class TestRNGContext(unittest.TestCase): +class TestRNGContext(object): def test_RNGContext(self): expected0 = 1.764052345967664 @@ -747,63 +725,17 @@ def test_RNGContext(self): with RNGContext(0): with RNGContext(1): - self.assertEqual(np.random.randn(), expected1) - self.assertEqual(np.random.randn(), expected0) - - -class TestDeprecatedTests(tm.TestCase): - - def test_warning(self): - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assertEquals(1, 1) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assertNotEquals(1, 2) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assert_(True) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assertAlmostEquals(1.0, 1.0000000001) + assert np.random.randn() == expected1 + assert np.random.randn() == expected0 - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assertNotAlmostEquals(1, 2) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - tm.assert_isinstance(Series([1, 2]), Series, msg='xxx') - - -class TestLocale(tm.TestCase): +class TestLocale(object): def test_locale(self): if sys.platform == 'win32': - raise nose.SkipTest( + pytest.skip( "skipping on win platforms as locale not available") # GH9744 locales = tm.get_locales() - self.assertTrue(len(locales) >= 1) - - -def test_skiptest_deco(): - from nose import SkipTest - - @skip_if_no_package_deco("fakepackagename") - def f(): - pass - with assertRaises(SkipTest): - f() - - @skip_if_no_package_deco("numpy") - def f(): - pass - # hack to ensure that SkipTest is *not* raised - with assertRaises(ValueError): - f() - raise ValueError - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + assert len(locales) >= 1 diff --git a/pandas/tests/test_util.py b/pandas/tests/util/test_util.py similarity index 71% rename from pandas/tests/test_util.py rename to pandas/tests/util/test_util.py index ed82604035358..abd82cfa89f94 100644 --- a/pandas/tests/test_util.py +++ b/pandas/tests/util/test_util.py @@ -1,22 +1,25 @@ # -*- coding: utf-8 -*- -import nose - -from collections import OrderedDict +import os +import locale +import codecs import sys -import unittest from uuid import uuid4 +from collections import OrderedDict + +import pytest +from pandas.compat import intern from pandas.util._move import move_into_mutable_buffer, BadMove, stolenbuf -from pandas.util.decorators import deprecate_kwarg -from pandas.util.validators import (validate_args, validate_kwargs, - validate_args_and_kwargs, - validate_bool_kwarg) +from pandas.util._decorators import deprecate_kwarg +from pandas.util._validators import (validate_args, validate_kwargs, + validate_args_and_kwargs, + validate_bool_kwarg) import pandas.util.testing as tm -class TestDecorators(tm.TestCase): +class TestDecorators(object): - def setUp(self): + def setup_method(self, method): @deprecate_kwarg('old', 'new') def _f1(new=False): return new @@ -37,7 +40,7 @@ def test_deprecate_kwarg(self): x = 78 with tm.assert_produces_warning(FutureWarning): result = self.f1(old=x) - self.assertIs(result, x) + assert result is x with tm.assert_produces_warning(None): self.f1(new=x) @@ -45,24 +48,24 @@ def test_dict_deprecate_kwarg(self): x = 'yes' with 
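The mechanical conversion applied throughout these hunks, shown once on a toy class (the names below are illustrative, not from this diff): `tm.TestCase` becomes a plain `object`, `setUp` becomes `setup_method`, and `self.assert*` calls become bare `assert`s or `pytest.raises`.

```python
import pytest


class TestExample(object):               # was: class TestExample(tm.TestCase)

    def setup_method(self, method):      # was: def setUp(self)
        self.x = 1

    def test_value(self):
        assert self.x == 1               # was: self.assertEqual(self.x, 1)

    def test_error(self):
        with pytest.raises(ZeroDivisionError):  # was: self.assertRaises(...)
            1 / 0
```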
tm.assert_produces_warning(FutureWarning): result = self.f2(old=x) - self.assertEqual(result, True) + assert result def test_missing_deprecate_kwarg(self): x = 'bogus' with tm.assert_produces_warning(FutureWarning): result = self.f2(old=x) - self.assertEqual(result, 'bogus') + assert result == 'bogus' def test_callable_deprecate_kwarg(self): x = 5 with tm.assert_produces_warning(FutureWarning): result = self.f3(old=x) - self.assertEqual(result, x + 1) - with tm.assertRaises(TypeError): + assert result == x + 1 + with pytest.raises(TypeError): self.f3(old='hello') def test_bad_deprecate_kwarg(self): - with tm.assertRaises(TypeError): + with pytest.raises(TypeError): @deprecate_kwarg('old', 'new', 0) def f4(new=None): pass @@ -83,12 +86,12 @@ def test_rands_array(): assert(len(arr[1, 1]) == 7) -class TestValidateArgs(tm.TestCase): +class TestValidateArgs(object): fname = 'func' def test_bad_min_fname_arg_count(self): msg = "'max_fname_arg_count' must be non-negative" - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): validate_args(self.fname, (None,), -1, 'foo') def test_bad_arg_length_max_value_single(self): @@ -103,7 +106,7 @@ def test_bad_arg_length_max_value_single(self): .format(fname=self.fname, max_length=max_length, actual_length=actual_length)) - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): validate_args(self.fname, args, min_fname_arg_count, compat_args) @@ -120,7 +123,7 @@ def test_bad_arg_length_max_value_multiple(self): .format(fname=self.fname, max_length=max_length, actual_length=actual_length)) - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): validate_args(self.fname, args, min_fname_arg_count, compat_args) @@ -139,7 +142,7 @@ def test_not_all_defaults(self): arg_vals = (1, -1, 3) for i in range(1, 3): - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): validate_args(self.fname, arg_vals[:i], 2, compat_args) def test_validation(self): @@ -153,7 +156,7 @@ def test_validation(self): validate_args(self.fname, (1, None), 2, compat_args) -class TestValidateKwargs(tm.TestCase): +class TestValidateKwargs(object): fname = 'func' def test_bad_kwarg(self): @@ -168,7 +171,7 @@ def test_bad_kwarg(self): r"keyword argument '{arg}'".format( fname=self.fname, arg=badarg)) - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): validate_kwargs(self.fname, kwargs, compat_args) def test_not_all_none(self): @@ -189,7 +192,7 @@ def test_not_all_none(self): kwargs = dict(zip(kwarg_keys[:i], kwarg_vals[:i])) - with tm.assertRaisesRegexp(ValueError, msg): + with tm.assert_raises_regex(ValueError, msg): validate_kwargs(self.fname, kwargs, compat_args) def test_validation(self): @@ -208,17 +211,18 @@ def test_validate_bool_kwarg(self): for name in arg_names: for value in invalid_values: - with tm.assertRaisesRegexp(ValueError, - ("For argument \"%s\" expected " - "type bool, received type %s") % - (name, type(value).__name__)): + with tm.assert_raises_regex(ValueError, + "For argument \"%s\" " + "expected type bool, " + "received type %s" % + (name, type(value).__name__)): validate_bool_kwarg(value, name) for value in valid_values: - tm.assert_equal(validate_bool_kwarg(value, name), value) + assert validate_bool_kwarg(value, name) == value -class TestValidateKwargsAndArgs(tm.TestCase): +class TestValidateKwargsAndArgs(object): fname = 'func' def 
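How `deprecate_kwarg` behaves from the caller's side, sketched against the private `pandas.util._decorators` path this diff introduces:

```python
from pandas.util._decorators import deprecate_kwarg
import pandas.util.testing as tm


@deprecate_kwarg(old_arg_name='old', new_arg_name='new')
def f(new=False):
    return new


# Passing the old keyword warns and forwards the value to the new one.
with tm.assert_produces_warning(FutureWarning):
    assert f(old=True) is True

# The new keyword is accepted silently.
with tm.assert_produces_warning(None):
    assert f(new=True) is True
```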
test_invalid_total_length_max_length_one(self): @@ -234,7 +238,7 @@ def test_invalid_total_length_max_length_one(self): .format(fname=self.fname, max_length=max_length, actual_length=actual_length)) - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): validate_args_and_kwargs(self.fname, args, kwargs, min_fname_arg_count, compat_args) @@ -252,7 +256,7 @@ def test_invalid_total_length_max_length_multiple(self): .format(fname=self.fname, max_length=max_length, actual_length=actual_length)) - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): validate_args_and_kwargs(self.fname, args, kwargs, min_fname_arg_count, compat_args) @@ -271,17 +275,17 @@ def test_no_args_with_kwargs(self): args = () kwargs = {'foo': -5, bad_arg: 2} - tm.assertRaisesRegexp(ValueError, msg, - validate_args_and_kwargs, - self.fname, args, kwargs, - min_fname_arg_count, compat_args) + tm.assert_raises_regex(ValueError, msg, + validate_args_and_kwargs, + self.fname, args, kwargs, + min_fname_arg_count, compat_args) args = (-5, 2) kwargs = {} - tm.assertRaisesRegexp(ValueError, msg, - validate_args_and_kwargs, - self.fname, args, kwargs, - min_fname_arg_count, compat_args) + tm.assert_raises_regex(ValueError, msg, + validate_args_and_kwargs, + self.fname, args, kwargs, + min_fname_arg_count, compat_args) def test_duplicate_argument(self): min_fname_arg_count = 2 @@ -295,7 +299,7 @@ def test_duplicate_argument(self): msg = (r"{fname}\(\) got multiple values for keyword " r"argument '{arg}'".format(fname=self.fname, arg='foo')) - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): validate_args_and_kwargs(self.fname, args, kwargs, min_fname_arg_count, compat_args) @@ -315,13 +319,14 @@ def test_validation(self): compat_args) -class TestMove(tm.TestCase): +class TestMove(object): + def test_cannot_create_instance_of_stolenbuffer(self): """Stolen buffers need to be created through the smart constructor ``move_into_mutable_buffer`` which has a bunch of checks in it. """ msg = "cannot create 'pandas.util._move.stolenbuf' instances" - with tm.assertRaisesRegexp(TypeError, msg): + with tm.assert_raises_regex(TypeError, msg): stolenbuf() def test_more_than_one_ref(self): @@ -330,9 +335,9 @@ def test_more_than_one_ref(self): """ b = b'testing' - with tm.assertRaises(BadMove) as e: + with pytest.raises(BadMove) as e: def handle_success(type_, value, tb): - self.assertIs(value.args[0], b) + assert value.args[0] is b return type(e).handle_success(e, type_, value, tb) # super e.handle_success = handle_success @@ -351,11 +356,11 @@ def test_exactly_one_ref(self): as_stolen_buf = move_into_mutable_buffer(b[:-3]) # materialize as bytearray to show that it is mutable - self.assertEqual(bytearray(as_stolen_buf), b'test') + assert bytearray(as_stolen_buf) == b'test' - @unittest.skipIf( + @pytest.mark.skipif( sys.version_info[0] > 2, - 'bytes objects cannot be interned in py3', + reason='bytes objects cannot be interned in py3', ) def test_interned(self): salt = uuid4().hex @@ -379,19 +384,14 @@ def ref_capture(ob): refcount[0] = sys.getrefcount(ob) - 2 return ob - with tm.assertRaises(BadMove): + with pytest.raises(BadMove): # If we intern the string it will still have one reference but now # it is in the intern table so if other people intern the same # string while the mutable buffer holds the first string they will # be the same instance. 
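What the `BadMove` tests above are exercising, condensed into a standalone sketch that mirrors `test_exactly_one_ref` and `test_more_than_one_ref`:

```python
import pytest
from pandas.util._move import move_into_mutable_buffer, BadMove

b = b'testing'

# A fresh slice has exactly one reference, so its buffer can be stolen
# and exposed as mutable memory.
buf = move_into_mutable_buffer(b[:-3])
assert bytearray(buf) == b'test'

# A named bytes object has other references; stealing its buffer would
# corrupt them, so the smart constructor refuses with BadMove.
with pytest.raises(BadMove):
    move_into_mutable_buffer(b)
```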
move_into_mutable_buffer(ref_capture(intern(make_string()))) # noqa - self.assertEqual( - refcount[0], - 1, - msg='The BadMove was probably raised for refcount reasons instead' - ' of interning reasons', - ) + assert refcount[0] == 1 def test_numpy_errstate_is_default(): @@ -401,9 +401,69 @@ def test_numpy_errstate_is_default(): import numpy as np from pandas.compat import numpy # noqa # The errstate should be unchanged after that import. - tm.assert_equal(np.geterr(), expected) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) + assert np.geterr() == expected + + +class TestLocaleUtils(object): + + @classmethod + def setup_class(cls): + cls.locales = tm.get_locales() + cls.current_locale = locale.getlocale() + + if not cls.locales: + pytest.skip("No locales found") + + tm._skip_if_windows() + + @classmethod + def teardown_class(cls): + del cls.locales + del cls.current_locale + + def test_get_locales(self): + # all systems should have at least a single locale + assert len(tm.get_locales()) > 0 + + def test_get_locales_prefix(self): + if len(self.locales) == 1: + pytest.skip("Only a single locale found, no point in " + "trying to test filtering locale prefixes") + first_locale = self.locales[0] + assert len(tm.get_locales(prefix=first_locale[:2])) > 0 + + def test_set_locale(self): + if len(self.locales) == 1: + pytest.skip("Only a single locale found, no point in " + "trying to test setting another locale") + + if all(x is None for x in self.current_locale): + # Not sure why, but on some travis runs with pytest, + # getlocale() returned (None, None). + pytest.skip("Current locale is not set.") + + locale_override = os.environ.get('LOCALE_OVERRIDE', None) + + if locale_override is None: + lang, enc = 'it_CH', 'UTF-8' + elif locale_override == 'C': + lang, enc = 'en_US', 'ascii' + else: + lang, enc = locale_override.split('.') + + enc = codecs.lookup(enc).name + new_locale = lang, enc + + if not tm._can_set_locale(new_locale): + with pytest.raises(locale.Error): + with tm.set_locale(new_locale): + pass + else: + with tm.set_locale(new_locale) as normalized_locale: + new_lang, new_enc = normalized_locale.split('.') + new_enc = codecs.lookup(enc).name + normalized_locale = new_lang, new_enc + assert normalized_locale == new_locale + + current_locale = locale.getlocale() + assert current_locale == self.current_locale diff --git a/pandas/tools/hashing.py b/pandas/tools/hashing.py index 800e0b8815443..ba38710b607af 100644 --- a/pandas/tools/hashing.py +++ b/pandas/tools/hashing.py @@ -1,259 +1,18 @@ -""" -data hash pandas / numpy objects -""" -import itertools +import warnings +import sys -import numpy as np -from pandas import _hash, Series, factorize, Categorical, Index, MultiIndex -import pandas.core.algorithms as algos -from pandas.lib import is_bool_array -from pandas.types.generic import ABCIndexClass, ABCSeries, ABCDataFrame -from pandas.types.common import (is_categorical_dtype, is_numeric_dtype, - is_datetime64_dtype, is_timedelta64_dtype, - is_list_like) +m = sys.modules['pandas.tools.hashing'] +for t in ['hash_pandas_object', 'hash_array']: -# 16 byte long hashing key -_default_hash_key = '0123456789123456' + def outer(t=t): + def wrapper(*args, **kwargs): + from pandas import util + warnings.warn("pandas.tools.hashing is deprecated and will be " + "removed in a future version, import " + "from pandas.util", + DeprecationWarning, stacklevel=3) + return getattr(util, t)(*args, **kwargs) + return wrapper -def 
_combine_hash_arrays(arrays, num_items): - """ - Parameters - ---------- - arrays : generator - num_items : int - - Should be the same as CPython's tupleobject.c - """ - try: - first = next(arrays) - except StopIteration: - return np.array([], dtype=np.uint64) - - arrays = itertools.chain([first], arrays) - - mult = np.uint64(1000003) - out = np.zeros_like(first) + np.uint64(0x345678) - for i, a in enumerate(arrays): - inverse_i = num_items - i - out ^= a - out *= mult - mult += np.uint64(82520 + inverse_i + inverse_i) - assert i + 1 == num_items, 'Fed in wrong num_items' - out += np.uint64(97531) - return out - - -def hash_pandas_object(obj, index=True, encoding='utf8', hash_key=None, - categorize=True): - """ - Return a data hash of the Index/Series/DataFrame - - .. versionadded:: 0.19.2 - - Parameters - ---------- - index : boolean, default True - include the index in the hash (if Series/DataFrame) - encoding : string, default 'utf8' - encoding for data & key when strings - hash_key : string key to encode, default to _default_hash_key - categorize : bool, default True - Whether to first categorize object arrays before hashing. This is more - efficient when the array contains duplicate values. - - .. versionadded:: 0.20.0 - - Returns - ------- - Series of uint64, same length as the object - - """ - if hash_key is None: - hash_key = _default_hash_key - - if isinstance(obj, MultiIndex): - return Series(hash_tuples(obj, encoding, hash_key), - dtype='uint64', copy=False) - - if isinstance(obj, ABCIndexClass): - h = hash_array(obj.values, encoding, hash_key, - categorize).astype('uint64', copy=False) - h = Series(h, index=obj, dtype='uint64', copy=False) - elif isinstance(obj, ABCSeries): - h = hash_array(obj.values, encoding, hash_key, - categorize).astype('uint64', copy=False) - if index: - index_iter = (hash_pandas_object(obj.index, - index=False, - encoding=encoding, - hash_key=hash_key, - categorize=categorize).values - for _ in [None]) - arrays = itertools.chain([h], index_iter) - h = _combine_hash_arrays(arrays, 2) - - h = Series(h, index=obj.index, dtype='uint64', copy=False) - - elif isinstance(obj, ABCDataFrame): - hashes = (hash_array(series.values) for _, series in obj.iteritems()) - num_items = len(obj.columns) - if index: - index_hash_generator = (hash_pandas_object(obj.index, - index=False, - encoding=encoding, - hash_key=hash_key, - categorize=categorize).values # noqa - for _ in [None]) - num_items += 1 - hashes = itertools.chain(hashes, index_hash_generator) - h = _combine_hash_arrays(hashes, num_items) - - h = Series(h, index=obj.index, dtype='uint64', copy=False) - else: - raise TypeError("Unexpected type for hashing %s" % type(obj)) - return h - - -def hash_tuples(vals, encoding='utf8', hash_key=None): - """ - Hash a MultiIndex / list-of-tuples efficiently - - ..
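Since `_combine_hash_arrays` is the heart of the row hashing being removed here, a pure-NumPy restatement of its folding scheme (the same constants as CPython's tuple hash) may help; `combine_uint64_hashes` is an illustrative name, not pandas API:

```python
import numpy as np


def combine_uint64_hashes(columns):
    # Fold a list of per-column uint64 hash arrays into one hash per row,
    # mirroring the loop in _combine_hash_arrays above.
    mult = np.uint64(1000003)
    out = np.zeros_like(columns[0]) + np.uint64(0x345678)
    num_items = len(columns)
    for i, col in enumerate(columns):
        inverse_i = num_items - i
        out ^= col
        out *= mult
        mult += np.uint64(82520 + 2 * inverse_i)
    out += np.uint64(97531)
    return out


a = np.array([1, 2, 3], dtype=np.uint64)
b = np.array([4, 5, 6], dtype=np.uint64)
print(combine_uint64_hashes([a, b]))  # one uint64 per row
```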
versionadded:: 0.20.0 - - Parameters - ---------- - vals : MultiIndex, list-of-tuples, or single tuple - encoding : string, default 'utf8' - hash_key : string key to encode, default to _default_hash_key - - Returns - ------- - ndarray of hashed values array - """ - - is_tuple = False - if isinstance(vals, tuple): - vals = [vals] - is_tuple = True - elif not is_list_like(vals): - raise TypeError("must be convertible to a list-of-tuples") - - if not isinstance(vals, MultiIndex): - vals = MultiIndex.from_tuples(vals) - - # create a list-of-ndarrays - def get_level_values(num): - unique = vals.levels[num] # .values - labels = vals.labels[num] - filled = algos.take_1d(unique._values, labels, - fill_value=unique._na_value) - return filled - - vals = [get_level_values(level) - for level in range(vals.nlevels)] - - # hash the list-of-ndarrays - hashes = (hash_array(l, encoding=encoding, hash_key=hash_key) - for l in vals) - h = _combine_hash_arrays(hashes, len(vals)) - if is_tuple: - h = h[0] - - return h - - -def _hash_categorical(c, encoding, hash_key): - """ - Hash a Categorical by hashing its categories, and then mapping the codes - to the hashes - - Parameters - ---------- - c : Categorical - encoding : string, default 'utf8' - hash_key : string key to encode, default to _default_hash_key - - Returns - ------- - ndarray of hashed values array, same size as len(c) - """ - cat_hashed = hash_array(c.categories.values, encoding, hash_key, - categorize=False).astype(np.uint64, copy=False) - return c.rename_categories(cat_hashed).astype(np.uint64, copy=False) - - -def hash_array(vals, encoding='utf8', hash_key=None, categorize=True): - """ - Given a 1d array, return an array of deterministic integers. - - .. versionadded:: 0.19.2 - - Parameters - ---------- - vals : ndarray, Categorical - encoding : string, default 'utf8' - encoding for data & key when strings - hash_key : string key to encode, default to _default_hash_key - categorize : bool, default True - Whether to first categorize object arrays before hashing. This is more - efficient when the array contains duplicate values. - - .. versionadded:: 0.20.0 - - Returns - ------- - 1d uint64 numpy array of hash values, same length as the vals - - """ - - if not hasattr(vals, 'dtype'): - raise TypeError("must pass a ndarray-like") - - if hash_key is None: - hash_key = _default_hash_key - - # For categoricals, we hash the categories, then remap the codes to the - # hash values. (This check is above the complex check so that we don't ask - # numpy if categorical is a subdtype of complex, as it will choke.) - if is_categorical_dtype(vals.dtype): - return _hash_categorical(vals, encoding, hash_key) - - # we'll be working with everything as 64-bit values, so handle this - # 128-bit value early - if np.issubdtype(vals.dtype, np.complex128): - return hash_array(vals.real) + 23 * hash_array(vals.imag) - - # First, turn whatever array this is into unsigned 64-bit ints, if we can - # manage it. - if is_bool_array(vals): - vals = vals.astype('u8') - elif (is_datetime64_dtype(vals) or - is_timedelta64_dtype(vals)): - vals = vals.view('i8').astype('u8', copy=False) - elif (is_numeric_dtype(vals) and vals.dtype.itemsize <= 8): - vals = vals.view('u{}'.format(vals.dtype.itemsize)).astype('u8') - else: - # With repeated values, it's MUCH faster to categorize object dtypes, - # then hash and rename categories. We allow skipping the categorization - # when the values are known/likely to be unique.
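Caller-facing behaviour of the relocated functions, assuming the post-move public home `pandas.util` that the new shim forwards to:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})

# One deterministic uint64 per row (index included by default).
h = pd.util.hash_pandas_object(df, index=True)
print(h.dtype)  # uint64

# The old location still resolves through the deprecation shim below,
# but warns at call time.
from pandas.tools.hashing import hash_pandas_object  # DeprecationWarning on use
```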
- if categorize: - codes, categories = factorize(vals, sort=False) - cat = Categorical(codes, Index(categories), - ordered=False, fastpath=True) - return _hash_categorical(cat, encoding, hash_key) - - try: - vals = _hash.hash_object_array(vals, hash_key, encoding) - except TypeError: - # we have mixed types - vals = _hash.hash_object_array(vals.astype(str).astype(object), - hash_key, encoding) - - # Then, redistribute these 64-bit ints within the space of 64-bit ints - vals ^= vals >> 30 - vals *= np.uint64(0xbf58476d1ce4e5b9) - vals ^= vals >> 27 - vals *= np.uint64(0x94d049bb133111eb) - vals ^= vals >> 31 - return vals + setattr(m, t, outer(t)) diff --git a/pandas/tools/merge.py b/pandas/tools/merge.py index 3fbd83a6f3245..cd58aa2c7f923 100644 --- a/pandas/tools/merge.py +++ b/pandas/tools/merge.py @@ -1,2031 +1,17 @@ -""" -SQL-style merge routines -""" - -import copy import warnings -import string - -import numpy as np -from pandas.compat import range, lrange, lzip, zip, map, filter -import pandas.compat as compat - -from pandas import (Categorical, DataFrame, Series, - Index, MultiIndex, Timedelta) -from pandas.core.categorical import (_factorize_from_iterable, - _factorize_from_iterables) -from pandas.core.frame import _merge_doc -from pandas.types.generic import ABCSeries -from pandas.types.common import (is_datetime64tz_dtype, - is_datetime64_dtype, - needs_i8_conversion, - is_int64_dtype, - is_integer_dtype, - is_float_dtype, - is_integer, - is_int_or_datetime_dtype, - is_dtype_equal, - is_bool, - is_list_like, - _ensure_int64, - _ensure_float64, - _ensure_object, - _get_dtype) -from pandas.types.missing import na_value_for_dtype - -from pandas.core.generic import NDFrame -from pandas.core.index import (_get_combined_index, - _ensure_index, _get_consensus_names, - _all_indexes_same) -from pandas.core.internals import (items_overlap_with_suffix, - concatenate_block_managers) -from pandas.util.decorators import Appender, Substitution - -import pandas.core.algorithms as algos -import pandas.core.common as com -import pandas.types.concat as _concat - -import pandas._join as _join -import pandas.hashtable as _hash - - -@Substitution('\nleft : DataFrame') -@Appender(_merge_doc, indents=0) -def merge(left, right, how='inner', on=None, left_on=None, right_on=None, - left_index=False, right_index=False, sort=False, - suffixes=('_x', '_y'), copy=True, indicator=False): - op = _MergeOperation(left, right, how=how, on=on, left_on=left_on, - right_on=right_on, left_index=left_index, - right_index=right_index, sort=sort, suffixes=suffixes, - copy=copy, indicator=indicator) - return op.get_result() -if __debug__: - merge.__doc__ = _merge_doc % '\nleft : DataFrame' - - -class MergeError(ValueError): - pass - - -def _groupby_and_merge(by, on, left, right, _merge_pieces, - check_duplicates=True): - """ - groupby & merge; we are always performing a left-by type operation - - Parameters - ---------- - by: field to group - on: duplicates field - left: left frame - right: right frame - _merge_pieces: function for merging - check_duplicates: boolean, default True - should we check & clean duplicates - """ - - pieces = [] - if not isinstance(by, (list, tuple)): - by = [by] - - lby = left.groupby(by, sort=False) - - # if we can groupby the rhs - # then we can get vastly better perf - try: - - # we will check & remove duplicates if indicated - if check_duplicates: - if on is None: - on = [] - elif not isinstance(on, (list, tuple)): - on = [on] - - if right.duplicated(by + on).any(): - right = 
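The module-hollowing pattern used above for `pandas.tools.hashing`, restated on a toy module; all names below (`oldmod`, `newapi`, `_make_forwarder`) are illustrative. Note that binding the name through a default argument (`def outer(t=t)` in the original) sidesteps Python's late-binding closure pitfall inside the loop.

```python
import sys
import types
import warnings

# Stand-in for the real replacement module.
newapi = types.ModuleType('newapi')
newapi.hash_array = lambda values: [hash(v) for v in values]

oldmod = types.ModuleType('oldmod.hashing')
sys.modules['oldmod.hashing'] = oldmod

for name in ['hash_array']:

    def _make_forwarder(name=name):      # default arg freezes `name` now
        def wrapper(*args, **kwargs):
            warnings.warn("oldmod.hashing is deprecated; import from newapi",
                          DeprecationWarning, stacklevel=2)
            return getattr(newapi, name)(*args, **kwargs)
        return wrapper

    setattr(oldmod, name, _make_forwarder())

print(oldmod.hash_array([1, 2, 3]))      # forwards, warning on each call
```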
right.drop_duplicates(by + on, keep='last') - rby = right.groupby(by, sort=False) - except KeyError: - rby = None - - for key, lhs in lby: - - if rby is None: - rhs = right - else: - try: - rhs = right.take(rby.indices[key]) - except KeyError: - # key doesn't exist in left - lcols = lhs.columns.tolist() - cols = lcols + [r for r in right.columns - if r not in set(lcols)] - merged = lhs.reindex(columns=cols) - merged.index = range(len(merged)) - pieces.append(merged) - continue - - merged = _merge_pieces(lhs, rhs) - - # make sure join keys are in the merged - # TODO, should _merge_pieces do this? - for k in by: - try: - if k in merged: - merged[k] = key - except: - pass - - pieces.append(merged) - - # preserve the original order - # if we have a missing piece this can be reset - result = concat(pieces, ignore_index=True) - result = result.reindex(columns=pieces[0].columns, copy=False) - return result, lby - - -def ordered_merge(left, right, on=None, - left_on=None, right_on=None, - left_by=None, right_by=None, - fill_method=None, suffixes=('_x', '_y')): - - warnings.warn("ordered_merge is deprecated and replaced by merge_ordered", - FutureWarning, stacklevel=2) - return merge_ordered(left, right, on=on, - left_on=left_on, right_on=right_on, - left_by=left_by, right_by=right_by, - fill_method=fill_method, suffixes=suffixes) - - -def merge_ordered(left, right, on=None, - left_on=None, right_on=None, - left_by=None, right_by=None, - fill_method=None, suffixes=('_x', '_y'), - how='outer'): - """Perform merge with optional filling/interpolation designed for ordered - data like time series data. Optionally perform group-wise merge (see - examples) - - Parameters - ---------- - left : DataFrame - right : DataFrame - on : label or list - Field names to join on. Must be found in both DataFrames. - left_on : label or list, or array-like - Field names to join on in left DataFrame. Can be a vector or list of - vectors of the length of the DataFrame to use a particular vector as - the join key instead of columns - right_on : label or list, or array-like - Field names to join on in right DataFrame or vector/list of vectors per - left_on docs - left_by : column name or list of column names - Group left DataFrame by group columns and merge piece by piece with - right DataFrame - right_by : column name or list of column names - Group right DataFrame by group columns and merge piece by piece with - left DataFrame - fill_method : {'ffill', None}, default None - Interpolation method for data - suffixes : 2-length sequence (tuple, list, ...) - Suffix to apply to overlapping column names in the left and right - side, respectively - how : {'left', 'right', 'outer', 'inner'}, default 'outer' - * left: use only keys from left frame (SQL: left outer join) - * right: use only keys from right frame (SQL: right outer join) - * outer: use union of keys from both frames (SQL: full outer join) - * inner: use intersection of keys from both frames (SQL: inner join) - - .. versionadded:: 0.19.0 - - Examples - -------- - >>> A >>> B - key lvalue group key rvalue - 0 a 1 a 0 b 1 - 1 c 2 a 1 c 2 - 2 e 3 a 2 d 3 - 3 a 1 b - 4 c 2 b - 5 e 3 b - - >>> ordered_merge(A, B, fill_method='ffill', left_by='group') - key lvalue group rvalue - 0 a 1 a NaN - 1 b 1 a 1 - 2 c 2 a 2 - 3 d 2 a 3 - 4 e 3 a 3 - 5 f 3 a 4 - 6 a 1 b NaN - 7 b 1 b 1 - 8 c 2 b 2 - 9 d 2 b 3 - 10 e 3 b 3 - 11 f 3 b 4 - - Returns - ------- - merged : DataFrame - The output type will be the same as 'left', if it is a subclass - of DataFrame.
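A runnable version of the `merge_ordered` doctest above, constructing the `A`/`B` frames the example assumes:

```python
import pandas as pd

A = pd.DataFrame({'key': ['a', 'c', 'e', 'a', 'c', 'e'],
                  'lvalue': [1, 2, 3, 1, 2, 3],
                  'group': list('aaabbb')})
B = pd.DataFrame({'key': ['b', 'c', 'd'], 'rvalue': [1, 2, 3]})

# Group-wise ordered merge on 'key', forward-filling within each group.
print(pd.merge_ordered(A, B, fill_method='ffill', left_by='group'))
```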
- - See also - -------- - merge - merge_asof - - """ - def _merger(x, y): - # perform the ordered merge operation - op = _OrderedMerge(x, y, on=on, left_on=left_on, right_on=right_on, - suffixes=suffixes, fill_method=fill_method, - how=how) - return op.get_result() - - if left_by is not None and right_by is not None: - raise ValueError('Can only group either left or right frames') - elif left_by is not None: - result, _ = _groupby_and_merge(left_by, on, left, right, - lambda x, y: _merger(x, y), - check_duplicates=False) - elif right_by is not None: - result, _ = _groupby_and_merge(right_by, on, right, left, - lambda x, y: _merger(y, x), - check_duplicates=False) - else: - result = _merger(left, right) - return result - -ordered_merge.__doc__ = merge_ordered.__doc__ - - -def merge_asof(left, right, on=None, - left_on=None, right_on=None, - left_index=False, right_index=False, - by=None, left_by=None, right_by=None, - suffixes=('_x', '_y'), - tolerance=None, - allow_exact_matches=True, - direction='backward'): - """Perform an asof merge. This is similar to a left-join except that we - match on nearest key rather than equal keys. - - Both DataFrames must be sorted by the key. - - For each row in the left DataFrame: - - - A "backward" search selects the last row in the right DataFrame whose - 'on' key is less than or equal to the left's key. - - - A "forward" search selects the first row in the right DataFrame whose - 'on' key is greater than or equal to the left's key. - - - A "nearest" search selects the row in the right DataFrame whose 'on' - key is closest in absolute distance to the left's key. - - The default is "backward" and is compatible with versions below 0.20.0. - The direction parameter was added in version 0.20.0 and introduces - "forward" and "nearest". - - Optionally match on equivalent keys with 'by' before searching with 'on'. - - .. versionadded:: 0.19.0 - - Parameters - ---------- - left : DataFrame - right : DataFrame - on : label - Field name to join on. Must be found in both DataFrames. - The data MUST be ordered. Furthermore this must be a numeric column, - such as datetimelike, integer, or float. On or left_on/right_on - must be given. - left_on : label - Field name to join on in left DataFrame. - right_on : label - Field name to join on in right DataFrame. - left_index : boolean - Use the index of the left DataFrame as the join key. - - .. versionadded:: 0.19.2 - - right_index : boolean - Use the index of the right DataFrame as the join key. - - .. versionadded:: 0.19.2 - - by : column name or list of column names - Match on these columns before performing merge operation. - left_by : column name - Field names to match on in the left DataFrame. - - .. versionadded:: 0.19.2 - - right_by : column name - Field names to match on in the right DataFrame. - - .. versionadded:: 0.19.2 - - suffixes : 2-length sequence (tuple, list, ...) - Suffix to apply to overlapping column names in the left and right - side, respectively - tolerance : integer or Timedelta, optional, default None - select asof tolerance within this range; must be compatible - with the merge index. - allow_exact_matches : boolean, default True - - - If True, allow matching the same 'on' value - (i.e. less-than-or-equal-to / greater-than-or-equal-to) - - If False, don't match the same 'on' value - (i.e., strictly less-than / strictly greater-than) - - direction : 'backward' (default), 'forward', or 'nearest' - Whether to search for prior, subsequent, or closest matches. - - ..
versionadded:: 0.20.0 - - Returns - ------- - merged : DataFrame - - Examples - -------- - >>> left - a left_val - 0 1 a - 1 5 b - 2 10 c - - >>> right - a right_val - 0 1 1 - 1 2 2 - 2 3 3 - 3 6 6 - 4 7 7 - - >>> pd.merge_asof(left, right, on='a') - a left_val right_val - 0 1 a 1 - 1 5 b 3 - 2 10 c 7 - - >>> pd.merge_asof(left, right, on='a', allow_exact_matches=False) - a left_val right_val - 0 1 a NaN - 1 5 b 3.0 - 2 10 c 7.0 - - >>> pd.merge_asof(left, right, on='a', direction='forward') - a left_val right_val - 0 1 a 1.0 - 1 5 b 6.0 - 2 10 c NaN - - >>> pd.merge_asof(left, right, on='a', direction='nearest') - a left_val right_val - 0 1 a 1 - 1 5 b 6 - 2 10 c 7 - - We can use indexed DataFrames as well. - - >>> left - left_val - 1 a - 5 b - 10 c - - >>> right - right_val - 1 1 - 2 2 - 3 3 - 6 6 - 7 7 - - >>> pd.merge_asof(left, right, left_index=True, right_index=True) - left_val right_val - 1 a 1 - 5 b 3 - 10 c 7 - - Here is a real-world time-series example - - >>> quotes - time ticker bid ask - 0 2016-05-25 13:30:00.023 GOOG 720.50 720.93 - 1 2016-05-25 13:30:00.023 MSFT 51.95 51.96 - 2 2016-05-25 13:30:00.030 MSFT 51.97 51.98 - 3 2016-05-25 13:30:00.041 MSFT 51.99 52.00 - 4 2016-05-25 13:30:00.048 GOOG 720.50 720.93 - 5 2016-05-25 13:30:00.049 AAPL 97.99 98.01 - 6 2016-05-25 13:30:00.072 GOOG 720.50 720.88 - 7 2016-05-25 13:30:00.075 MSFT 52.01 52.03 - - >>> trades - time ticker price quantity - 0 2016-05-25 13:30:00.023 MSFT 51.95 75 - 1 2016-05-25 13:30:00.038 MSFT 51.95 155 - 2 2016-05-25 13:30:00.048 GOOG 720.77 100 - 3 2016-05-25 13:30:00.048 GOOG 720.92 100 - 4 2016-05-25 13:30:00.048 AAPL 98.00 100 - - By default we are taking the asof of the quotes - - >>> pd.merge_asof(trades, quotes, - ... on='time', - ... by='ticker') - time ticker price quantity bid ask - 0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96 - 1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98 - 2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93 - 3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93 - 4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN - - We only asof within 2ms between the quote time and the trade time - - >>> pd.merge_asof(trades, quotes, - ... on='time', - ... by='ticker', - ... tolerance=pd.Timedelta('2ms')) - time ticker price quantity bid ask - 0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96 - 1 2016-05-25 13:30:00.038 MSFT 51.95 155 NaN NaN - 2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93 - 3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93 - 4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN - - We only asof within 10ms between the quote time and the trade time - and we exclude exact matches on time. However *prior* data will - propagate forward - - >>> pd.merge_asof(trades, quotes, - ... on='time', - ... by='ticker', - ... tolerance=pd.Timedelta('10ms'), - ...
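The frames assumed by the first doctest above, for readers who want to run it:

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 5, 10], 'left_val': ['a', 'b', 'c']})
right = pd.DataFrame({'a': [1, 2, 3, 6, 7],
                      'right_val': [1, 2, 3, 6, 7]})

# Backward search (the default): each left row takes the last right row
# whose 'a' is less than or equal to its own.
print(pd.merge_asof(left, right, on='a'))
```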
allow_exact_matches=False) - time ticker price quantity bid ask - 0 2016-05-25 13:30:00.023 MSFT 51.95 75 NaN NaN - 1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98 - 2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93 - 3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93 - 4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN - - See also - -------- - merge - merge_ordered - - """ - op = _AsOfMerge(left, right, - on=on, left_on=left_on, right_on=right_on, - left_index=left_index, right_index=right_index, - by=by, left_by=left_by, right_by=right_by, - suffixes=suffixes, - how='asof', tolerance=tolerance, - allow_exact_matches=allow_exact_matches, - direction=direction) - return op.get_result() - - -# TODO: transformations?? -# TODO: only copy DataFrames when modification necessary -class _MergeOperation(object): - """ - Perform a database (SQL) merge operation between two DataFrame objects - using either columns as keys or their row indexes - """ - _merge_type = 'merge' - - def __init__(self, left, right, how='inner', on=None, - left_on=None, right_on=None, axis=1, - left_index=False, right_index=False, sort=True, - suffixes=('_x', '_y'), copy=True, indicator=False): - self.left = self.orig_left = left - self.right = self.orig_right = right - self.how = how - self.axis = axis - - self.on = com._maybe_make_list(on) - self.left_on = com._maybe_make_list(left_on) - self.right_on = com._maybe_make_list(right_on) - - self.copy = copy - self.suffixes = suffixes - self.sort = sort - - self.left_index = left_index - self.right_index = right_index - - self.indicator = indicator - - if isinstance(self.indicator, compat.string_types): - self.indicator_name = self.indicator - elif isinstance(self.indicator, bool): - self.indicator_name = '_merge' if self.indicator else None - else: - raise ValueError( - 'indicator option can only accept boolean or string arguments') - - if not isinstance(left, DataFrame): - raise ValueError( - 'can not merge DataFrame with instance of ' - 'type {0}'.format(type(left))) - if not isinstance(right, DataFrame): - raise ValueError( - 'can not merge DataFrame with instance of ' - 'type {0}'.format(type(right))) - - if not is_bool(left_index): - raise ValueError( - 'left_index parameter must be of type bool, not ' - '{0}'.format(type(left_index))) - if not is_bool(right_index): - raise ValueError( - 'right_index parameter must be of type bool, not ' - '{0}'.format(type(right_index))) - - # warn user when merging between different levels - if left.columns.nlevels != right.columns.nlevels: - msg = ('merging between different levels can give an unintended ' - 'result ({0} levels on the left, {1} on the right)') - msg = msg.format(left.columns.nlevels, right.columns.nlevels) - warnings.warn(msg, UserWarning) - - self._validate_specification() - - # note this function has side effects - (self.left_join_keys, - self.right_join_keys, - self.join_names) = self._get_merge_keys() - - def get_result(self): - if self.indicator: - self.left, self.right = self._indicator_pre_merge( - self.left, self.right) - - join_index, left_indexer, right_indexer = self._get_join_info() - - ldata, rdata = self.left._data, self.right._data - lsuf, rsuf = self.suffixes - - llabels, rlabels = items_overlap_with_suffix(ldata.items, lsuf, - rdata.items, rsuf) - - lindexers = {1: left_indexer} if left_indexer is not None else {} - rindexers = {1: right_indexer} if right_indexer is not None else {} - - result_data = concatenate_block_managers( - [(ldata, lindexers), (rdata, rindexers)], - 
axes=[llabels.append(rlabels), join_index], - concat_axis=0, copy=self.copy) - - typ = self.left._constructor - result = typ(result_data).__finalize__(self, method=self._merge_type) - - if self.indicator: - result = self._indicator_post_merge(result) - - self._maybe_add_join_keys(result, left_indexer, right_indexer) - - return result - - def _indicator_pre_merge(self, left, right): - - columns = left.columns.union(right.columns) - - for i in ['_left_indicator', '_right_indicator']: - if i in columns: - raise ValueError("Cannot use `indicator=True` option when " - "data contains a column named {}".format(i)) - if self.indicator_name in columns: - raise ValueError( - "Cannot use name of an existing column for indicator column") - - left = left.copy() - right = right.copy() - - left['_left_indicator'] = 1 - left['_left_indicator'] = left['_left_indicator'].astype('int8') - - right['_right_indicator'] = 2 - right['_right_indicator'] = right['_right_indicator'].astype('int8') - - return left, right - - def _indicator_post_merge(self, result): - - result['_left_indicator'] = result['_left_indicator'].fillna(0) - result['_right_indicator'] = result['_right_indicator'].fillna(0) - - result[self.indicator_name] = Categorical((result['_left_indicator'] + - result['_right_indicator']), - categories=[1, 2, 3]) - result[self.indicator_name] = ( - result[self.indicator_name] - .cat.rename_categories(['left_only', 'right_only', 'both'])) - - result = result.drop(labels=['_left_indicator', '_right_indicator'], - axis=1) - return result - - def _maybe_add_join_keys(self, result, left_indexer, right_indexer): - - left_has_missing = None - right_has_missing = None - - keys = zip(self.join_names, self.left_on, self.right_on) - for i, (name, lname, rname) in enumerate(keys): - if not _should_fill(lname, rname): - continue - - take_left, take_right = None, None - - if name in result: - - if left_indexer is not None and right_indexer is not None: - if name in self.left: - - if left_has_missing is None: - left_has_missing = (left_indexer == -1).any() - - if left_has_missing: - take_right = self.right_join_keys[i] - - if not is_dtype_equal(result[name].dtype, - self.left[name].dtype): - take_left = self.left[name]._values - - elif name in self.right: - - if right_has_missing is None: - right_has_missing = (right_indexer == -1).any() - - if right_has_missing: - take_left = self.left_join_keys[i] - - if not is_dtype_equal(result[name].dtype, - self.right[name].dtype): - take_right = self.right[name]._values - - elif left_indexer is not None \ - and isinstance(self.left_join_keys[i], np.ndarray): - - take_left = self.left_join_keys[i] - take_right = self.right_join_keys[i] - - if take_left is not None or take_right is not None: - - if take_left is None: - lvals = result[name]._values - else: - lfill = na_value_for_dtype(take_left.dtype) - lvals = algos.take_1d(take_left, left_indexer, - fill_value=lfill) - - if take_right is None: - rvals = result[name]._values - else: - rfill = na_value_for_dtype(take_right.dtype) - rvals = algos.take_1d(take_right, right_indexer, - fill_value=rfill) - - # if we have an all missing left_indexer - # make sure to just use the right values - mask = left_indexer == -1 - if mask.all(): - key_col = rvals - else: - key_col = Index(lvals).where(~mask, rvals) - - if name in result: - result[name] = key_col - else: - result.insert(i, name or 'key_%d' % i, key_col) - - def _get_join_indexers(self): - """ return the join indexers """ - return _get_join_indexers(self.left_join_keys, - 
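What `_indicator_pre_merge` and `_indicator_post_merge` produce from the user's perspective: the two helper columns are dropped, and what remains is a categorical `_merge` column with exactly the categories renamed above.

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'l': ['a', 'b', 'c']})
right = pd.DataFrame({'key': [2, 3, 4], 'r': ['x', 'y', 'z']})

out = pd.merge(left, right, on='key', how='outer', indicator=True)

print(list(out['_merge'].cat.categories))  # ['left_only', 'right_only', 'both']
print(out['_merge'].tolist())
```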
self.right_join_keys, - sort=self.sort, - how=self.how) - - def _get_join_info(self): - left_ax = self.left._data.axes[self.axis] - right_ax = self.right._data.axes[self.axis] - - if self.left_index and self.right_index and self.how != 'asof': - join_index, left_indexer, right_indexer = \ - left_ax.join(right_ax, how=self.how, return_indexers=True) - elif self.right_index and self.how == 'left': - join_index, left_indexer, right_indexer = \ - _left_join_on_index(left_ax, right_ax, self.left_join_keys, - sort=self.sort) - - elif self.left_index and self.how == 'right': - join_index, right_indexer, left_indexer = \ - _left_join_on_index(right_ax, left_ax, self.right_join_keys, - sort=self.sort) - else: - (left_indexer, - right_indexer) = self._get_join_indexers() - - if self.right_index: - if len(self.left) > 0: - join_index = self.left.index.take(left_indexer) - else: - join_index = self.right.index.take(right_indexer) - left_indexer = np.array([-1] * len(join_index)) - elif self.left_index: - if len(self.right) > 0: - join_index = self.right.index.take(right_indexer) - else: - join_index = self.left.index.take(left_indexer) - right_indexer = np.array([-1] * len(join_index)) - else: - join_index = Index(np.arange(len(left_indexer))) - - if len(join_index) == 0: - join_index = join_index.astype(object) - return join_index, left_indexer, right_indexer - - def _get_merge_data(self): - """ - Handles overlapping column names etc. - """ - ldata, rdata = self.left._data, self.right._data - lsuf, rsuf = self.suffixes - - llabels, rlabels = items_overlap_with_suffix( - ldata.items, lsuf, rdata.items, rsuf) - - if not llabels.equals(ldata.items): - ldata = ldata.copy(deep=False) - ldata.set_axis(0, llabels) - - if not rlabels.equals(rdata.items): - rdata = rdata.copy(deep=False) - rdata.set_axis(0, rlabels) - - return ldata, rdata - - def _get_merge_keys(self): - """ - Note: has side effects (copy/delete key columns) - - Parameters - ---------- - left - right - on - - Returns - ------- - left_keys, right_keys - """ - left_keys = [] - right_keys = [] - join_names = [] - right_drop = [] - left_drop = [] - left, right = self.left, self.right - - is_lkey = lambda x: isinstance( - x, (np.ndarray, ABCSeries)) and len(x) == len(left) - is_rkey = lambda x: isinstance( - x, (np.ndarray, ABCSeries)) and len(x) == len(right) - - # Note that pd.merge_asof() has separate 'on' and 'by' parameters. A - # user could, for example, request 'left_index' and 'left_by'. In a - # regular pd.merge(), users cannot specify both 'left_index' and - # 'left_on'. (Instead, users have a MultiIndex). That means the - # self.left_on in this function is always empty in a pd.merge(), but - # a pd.merge_asof(left_index=True, left_by=...) will result in a - # self.left_on array with a None in the middle of it. This requires - # a work-around as designated in the code below. - # See _validate_specification() for where this happens. - - # ugh, spaghetti re #733 - if _any(self.left_on) and _any(self.right_on): - for lk, rk in zip(self.left_on, self.right_on): - if is_lkey(lk): - left_keys.append(lk) - if is_rkey(rk): - right_keys.append(rk) - join_names.append(None) # what to do? 
- else: - if rk is not None: - right_keys.append(right[rk]._values) - join_names.append(rk) - else: - # work-around for merge_asof(right_index=True) - right_keys.append(right.index) - join_names.append(right.index.name) - else: - if not is_rkey(rk): - if rk is not None: - right_keys.append(right[rk]._values) - else: - # work-around for merge_asof(right_index=True) - right_keys.append(right.index) - if lk is not None and lk == rk: - # avoid key upcast in corner case (length-0) - if len(left) > 0: - right_drop.append(rk) - else: - left_drop.append(lk) - else: - right_keys.append(rk) - if lk is not None: - left_keys.append(left[lk]._values) - join_names.append(lk) - else: - # work-around for merge_asof(left_index=True) - left_keys.append(left.index) - join_names.append(left.index.name) - elif _any(self.left_on): - for k in self.left_on: - if is_lkey(k): - left_keys.append(k) - join_names.append(None) - else: - left_keys.append(left[k]._values) - join_names.append(k) - if isinstance(self.right.index, MultiIndex): - right_keys = [lev._values.take(lab) - for lev, lab in zip(self.right.index.levels, - self.right.index.labels)] - else: - right_keys = [self.right.index.values] - elif _any(self.right_on): - for k in self.right_on: - if is_rkey(k): - right_keys.append(k) - join_names.append(None) - else: - right_keys.append(right[k]._values) - join_names.append(k) - if isinstance(self.left.index, MultiIndex): - left_keys = [lev._values.take(lab) - for lev, lab in zip(self.left.index.levels, - self.left.index.labels)] - else: - left_keys = [self.left.index.values] - - if left_drop: - self.left = self.left.drop(left_drop, axis=1) - - if right_drop: - self.right = self.right.drop(right_drop, axis=1) - - return left_keys, right_keys, join_names - - def _validate_specification(self): - # Hm, any way to make this logic less complicated?? 
- if self.on is None and self.left_on is None and self.right_on is None: - - if self.left_index and self.right_index: - self.left_on, self.right_on = (), () - elif self.left_index: - if self.right_on is None: - raise MergeError('Must pass right_on or right_index=True') - elif self.right_index: - if self.left_on is None: - raise MergeError('Must pass left_on or left_index=True') - else: - # use the common columns - common_cols = self.left.columns.intersection( - self.right.columns) - if len(common_cols) == 0: - raise MergeError('No common columns to perform merge on') - if not common_cols.is_unique: - raise MergeError("Data columns not unique: %s" - % repr(common_cols)) - self.left_on = self.right_on = common_cols - elif self.on is not None: - if self.left_on is not None or self.right_on is not None: - raise MergeError('Can only pass argument "on" OR "left_on" ' - 'and "right_on", not a combination of both.') - self.left_on = self.right_on = self.on - elif self.left_on is not None: - n = len(self.left_on) - if self.right_index: - if len(self.left_on) != self.right.index.nlevels: - raise ValueError('len(left_on) must equal the number ' - 'of levels in the index of "right"') - self.right_on = [None] * n - elif self.right_on is not None: - n = len(self.right_on) - if self.left_index: - if len(self.right_on) != self.left.index.nlevels: - raise ValueError('len(right_on) must equal the number ' - 'of levels in the index of "left"') - self.left_on = [None] * n - if len(self.right_on) != len(self.left_on): - raise ValueError("len(right_on) must equal len(left_on)") - - -def _get_join_indexers(left_keys, right_keys, sort=False, how='inner', - **kwargs): - """ - - Parameters - ---------- - - Returns - ------- - - """ - from functools import partial - - assert len(left_keys) == len(right_keys), \ - 'left_key and right_keys must be the same length' - - # bind `sort` arg. of _factorize_keys - fkeys = partial(_factorize_keys, sort=sort) - - # get left & right join labels and num. of levels at each location - llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys))) - - # get flat i8 keys from label lists - lkey, rkey = _get_join_keys(llab, rlab, shape, sort) - - # factorize keys to a dense i8 space - # `count` is the num. 
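Two of the `_validate_specification` branches above, observable through the public API; `MergeError` subclasses `ValueError` per the class definition earlier in this file, so catching `ValueError` suffices:

```python
import pandas as pd

a = pd.DataFrame({'x': [1]})
b = pd.DataFrame({'y': [2]})

# No `on`, no index flags, and no shared column names: nothing to infer.
try:
    pd.merge(a, b)
except ValueError as e:
    print(e)   # No common columns to perform merge on

# Likewise, `on` cannot be combined with left_on/right_on.
try:
    pd.merge(a, b, on='x', left_on='x')
except ValueError as e:
    print(e)
```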
of unique keys - # set(lkey) | set(rkey) == range(count) - lkey, rkey, count = fkeys(lkey, rkey) - - # preserve left frame order if how == 'left' and sort == False - kwargs = copy.copy(kwargs) - if how == 'left': - kwargs['sort'] = sort - join_func = _join_functions[how] - - return join_func(lkey, rkey, count, **kwargs) - - -class _OrderedMerge(_MergeOperation): - _merge_type = 'ordered_merge' - - def __init__(self, left, right, on=None, left_on=None, right_on=None, - left_index=False, right_index=False, axis=1, - suffixes=('_x', '_y'), copy=True, - fill_method=None, how='outer'): - - self.fill_method = fill_method - _MergeOperation.__init__(self, left, right, on=on, left_on=left_on, - left_index=left_index, - right_index=right_index, - right_on=right_on, axis=axis, - how=how, suffixes=suffixes, - sort=True # factorize sorts - ) - - def get_result(self): - join_index, left_indexer, right_indexer = self._get_join_info() - - # this is a bit kludgy - ldata, rdata = self.left._data, self.right._data - lsuf, rsuf = self.suffixes - - llabels, rlabels = items_overlap_with_suffix(ldata.items, lsuf, - rdata.items, rsuf) - - if self.fill_method == 'ffill': - left_join_indexer = _join.ffill_indexer(left_indexer) - right_join_indexer = _join.ffill_indexer(right_indexer) - else: - left_join_indexer = left_indexer - right_join_indexer = right_indexer - - lindexers = { - 1: left_join_indexer} if left_join_indexer is not None else {} - rindexers = { - 1: right_join_indexer} if right_join_indexer is not None else {} - - result_data = concatenate_block_managers( - [(ldata, lindexers), (rdata, rindexers)], - axes=[llabels.append(rlabels), join_index], - concat_axis=0, copy=self.copy) - - typ = self.left._constructor - result = typ(result_data).__finalize__(self, method=self._merge_type) - - self._maybe_add_join_keys(result, left_indexer, right_indexer) - - return result - - -def _asof_function(direction, on_type): - return getattr(_join, 'asof_join_%s_%s' % (direction, on_type), None) - - -def _asof_by_function(direction, on_type, by_type): - return getattr(_join, 'asof_join_%s_%s_by_%s' % - (direction, on_type, by_type), None) - - -_type_casters = { - 'int64_t': _ensure_int64, - 'double': _ensure_float64, - 'object': _ensure_object, -} - -_cython_types = { - 'uint8': 'uint8_t', - 'uint32': 'uint32_t', - 'uint16': 'uint16_t', - 'uint64': 'uint64_t', - 'int8': 'int8_t', - 'int32': 'int32_t', - 'int16': 'int16_t', - 'int64': 'int64_t', - 'float16': 'error', - 'float32': 'float', - 'float64': 'double', -} - - -def _get_cython_type(dtype): - """ Given a dtype, return a C name like 'int64_t' or 'double' """ - type_name = _get_dtype(dtype).name - ctype = _cython_types.get(type_name, 'object') - if ctype == 'error': - raise MergeError('unsupported type: ' + type_name) - return ctype - - -def _get_cython_type_upcast(dtype): - """ Upcast a dtype to 'int64_t', 'double', or 'object' """ - if is_integer_dtype(dtype): - return 'int64_t' - elif is_float_dtype(dtype): - return 'double' - else: - return 'object' - - -class _AsOfMerge(_OrderedMerge): - _merge_type = 'asof_merge' - - def __init__(self, left, right, on=None, left_on=None, right_on=None, - left_index=False, right_index=False, - by=None, left_by=None, right_by=None, - axis=1, suffixes=('_x', '_y'), copy=True, - fill_method=None, - how='asof', tolerance=None, - allow_exact_matches=True, - direction='backward'): - - self.by = by - self.left_by = left_by - self.right_by = right_by - self.tolerance = tolerance - self.allow_exact_matches = allow_exact_matches - 
self.direction = direction - - _OrderedMerge.__init__(self, left, right, on=on, left_on=left_on, - right_on=right_on, left_index=left_index, - right_index=right_index, axis=axis, - how=how, suffixes=suffixes, - fill_method=fill_method) - - def _validate_specification(self): - super(_AsOfMerge, self)._validate_specification() - - # we only allow on to be a single item for on - if len(self.left_on) != 1 and not self.left_index: - raise MergeError("can only asof on a key for left") - - if len(self.right_on) != 1 and not self.right_index: - raise MergeError("can only asof on a key for right") - - if self.left_index and isinstance(self.left.index, MultiIndex): - raise MergeError("left can only have one index") - - if self.right_index and isinstance(self.right.index, MultiIndex): - raise MergeError("right can only have one index") - - # set 'by' columns - if self.by is not None: - if self.left_by is not None or self.right_by is not None: - raise MergeError('Can only pass by OR left_by ' - 'and right_by') - self.left_by = self.right_by = self.by - if self.left_by is None and self.right_by is not None: - raise MergeError('missing left_by') - if self.left_by is not None and self.right_by is None: - raise MergeError('missing right_by') - - # add by to our key-list so we can have it in the - # output as a key - if self.left_by is not None: - if not is_list_like(self.left_by): - self.left_by = [self.left_by] - if not is_list_like(self.right_by): - self.right_by = [self.right_by] - - self.left_on = self.left_by + list(self.left_on) - self.right_on = self.right_by + list(self.right_on) - - # check 'direction' is valid - if self.direction not in ['backward', 'forward', 'nearest']: - raise MergeError('direction invalid: ' + self.direction) - - @property - def _asof_key(self): - """ This is our asof key, the 'on' """ - return self.left_on[-1] - - def _get_merge_keys(self): - - # note this function has side effects - (left_join_keys, - right_join_keys, - join_names) = super(_AsOfMerge, self)._get_merge_keys() - - # validate index types are the same - for lk, rk in zip(left_join_keys, right_join_keys): - if not is_dtype_equal(lk.dtype, rk.dtype): - raise MergeError("incompatible merge keys, " - "must be the same type") - - # validate tolerance; must be a Timedelta if we have a DTI - if self.tolerance is not None: - - if self.left_index: - lt = self.left.index - else: - lt = left_join_keys[-1] - - msg = "incompatible tolerance, must be compat " \ - "with type {0}".format(type(lt)) - - if is_datetime64_dtype(lt) or is_datetime64tz_dtype(lt): - if not isinstance(self.tolerance, Timedelta): - raise MergeError(msg) - if self.tolerance < Timedelta(0): - raise MergeError("tolerance must be positive") - - elif is_int64_dtype(lt): - if not is_integer(self.tolerance): - raise MergeError(msg) - if self.tolerance < 0: - raise MergeError("tolerance must be positive") - - else: - raise MergeError("key must be integer or timestamp") - - # validate allow_exact_matches - if not is_bool(self.allow_exact_matches): - raise MergeError("allow_exact_matches must be boolean, " - "passed {0}".format(self.allow_exact_matches)) - - return left_join_keys, right_join_keys, join_names - - def _get_join_indexers(self): - """ return the join indexers """ - - def flip(xs): - """ unlike np.transpose, this returns an array of tuples """ - labels = list(string.ascii_lowercase[:len(xs)]) - dtypes = [x.dtype for x in xs] - labeled_dtypes = list(zip(labels, dtypes)) - return np.array(lzip(*xs), labeled_dtypes) - - # values to compare - 
left_values = (self.left.index.values if self.left_index else - self.left_join_keys[-1]) - right_values = (self.right.index.values if self.right_index else - self.right_join_keys[-1]) - tolerance = self.tolerance - - # we required sortedness in the join keys - msg = " keys must be sorted" - if not Index(left_values).is_monotonic: - raise ValueError('left' + msg) - if not Index(right_values).is_monotonic: - raise ValueError('right' + msg) - - # initial type conversion as needed - if needs_i8_conversion(left_values): - left_values = left_values.view('i8') - right_values = right_values.view('i8') - if tolerance is not None: - tolerance = tolerance.value - - # a "by" parameter requires special handling - if self.left_by is not None: - if len(self.left_join_keys) > 2: - # get tuple representation of values if more than one - left_by_values = flip(self.left_join_keys[0:-1]) - right_by_values = flip(self.right_join_keys[0:-1]) - else: - left_by_values = self.left_join_keys[0] - right_by_values = self.right_join_keys[0] - - # upcast 'by' parameter because HashTable is limited - by_type = _get_cython_type_upcast(left_by_values.dtype) - by_type_caster = _type_casters[by_type] - left_by_values = by_type_caster(left_by_values) - right_by_values = by_type_caster(right_by_values) - - # choose appropriate function by type - on_type = _get_cython_type(left_values.dtype) - func = _asof_by_function(self.direction, on_type, by_type) - return func(left_values, - right_values, - left_by_values, - right_by_values, - self.allow_exact_matches, - tolerance) - else: - # choose appropriate function by type - on_type = _get_cython_type(left_values.dtype) - func = _asof_function(self.direction, on_type) - return func(left_values, - right_values, - self.allow_exact_matches, - tolerance) - - -def _get_multiindex_indexer(join_keys, index, sort): - from functools import partial - - # bind `sort` argument - fkeys = partial(_factorize_keys, sort=sort) - - # left & right join labels and num. 
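The `is_monotonic` check above, seen from the caller's side: `merge_asof` rejects unsorted 'on' keys outright.

```python
import pandas as pd

left = pd.DataFrame({'a': [5, 1, 10], 'v': [1, 2, 3]})   # not monotonic
right = pd.DataFrame({'a': [1, 2, 3], 'w': [7, 8, 9]})

try:
    pd.merge_asof(left, right, on='a')
except ValueError as e:
    print(e)   # left keys must be sorted

# Fine once sorted.
print(pd.merge_asof(left.sort_values('a'), right, on='a'))
```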
of levels at each location - rlab, llab, shape = map(list, zip(* map(fkeys, index.levels, join_keys))) - if sort: - rlab = list(map(np.take, rlab, index.labels)) - else: - i8copy = lambda a: a.astype('i8', subok=False, copy=True) - rlab = list(map(i8copy, index.labels)) - - # fix right labels if there were any nulls - for i in range(len(join_keys)): - mask = index.labels[i] == -1 - if mask.any(): - # check if there already was any nulls at this location - # if there was, it is factorized to `shape[i] - 1` - a = join_keys[i][llab[i] == shape[i] - 1] - if a.size == 0 or not a[0] != a[0]: - shape[i] += 1 - - rlab[i][mask] = shape[i] - 1 - - # get flat i8 join keys - lkey, rkey = _get_join_keys(llab, rlab, shape, sort) - - # factorize keys to a dense i8 space - lkey, rkey, count = fkeys(lkey, rkey) - - return _join.left_outer_join(lkey, rkey, count, sort=sort) - - -def _get_single_indexer(join_key, index, sort=False): - left_key, right_key, count = _factorize_keys(join_key, index, sort=sort) - - left_indexer, right_indexer = _join.left_outer_join( - _ensure_int64(left_key), - _ensure_int64(right_key), - count, sort=sort) - - return left_indexer, right_indexer - - -def _left_join_on_index(left_ax, right_ax, join_keys, sort=False): - if len(join_keys) > 1: - if not ((isinstance(right_ax, MultiIndex) and - len(join_keys) == right_ax.nlevels)): - raise AssertionError("If more than one join key is given then " - "'right_ax' must be a MultiIndex and the " - "number of join keys must be the number of " - "levels in right_ax") - - left_indexer, right_indexer = \ - _get_multiindex_indexer(join_keys, right_ax, sort=sort) - else: - jkey = join_keys[0] - - left_indexer, right_indexer = \ - _get_single_indexer(jkey, right_ax, sort=sort) - - if sort or len(left_ax) != len(left_indexer): - # if asked to sort or there are 1-to-many matches - join_index = left_ax.take(left_indexer) - return join_index, left_indexer, right_indexer - - # left frame preserves order & length of its index - return left_ax, None, right_indexer - - -def _right_outer_join(x, y, max_groups): - right_indexer, left_indexer = _join.left_outer_join(y, x, max_groups) - return left_indexer, right_indexer - -_join_functions = { - 'inner': _join.inner_join, - 'left': _join.left_outer_join, - 'right': _right_outer_join, - 'outer': _join.full_outer_join, -} - - -def _factorize_keys(lk, rk, sort=True): - if is_datetime64tz_dtype(lk) and is_datetime64tz_dtype(rk): - lk = lk.values - rk = rk.values - if is_int_or_datetime_dtype(lk) and is_int_or_datetime_dtype(rk): - klass = _hash.Int64Factorizer - lk = _ensure_int64(com._values_from_object(lk)) - rk = _ensure_int64(com._values_from_object(rk)) - else: - klass = _hash.Factorizer - lk = _ensure_object(lk) - rk = _ensure_object(rk) - - rizer = klass(max(len(lk), len(rk))) - - llab = rizer.factorize(lk) - rlab = rizer.factorize(rk) - - count = rizer.get_count() - - if sort: - uniques = rizer.uniques.to_array() - llab, rlab = _sort_labels(uniques, llab, rlab) - - # NA group - lmask = llab == -1 - lany = lmask.any() - rmask = rlab == -1 - rany = rmask.any() - - if lany or rany: - if lany: - np.putmask(llab, lmask, count) - if rany: - np.putmask(rlab, rmask, count) - count += 1 - - return llab, rlab, count - - -def _sort_labels(uniques, left, right): - if not isinstance(uniques, np.ndarray): - # tuplesafe - uniques = Index(uniques).values - - l = len(left) - labels = np.concatenate([left, right]) - - _, new_labels = algos.safe_sort(uniques, labels, na_sentinel=-1) - new_labels = _ensure_int64(new_labels) 
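# Editor's example (not part of the diff): what _factorize_keys achieves,
# re-created with public API calls only. Both key arrays are mapped into one
# dense integer label space so the cython join routines can operate on int64
# labels; the key values are made up.
import numpy as np
import pandas as pd

lk = np.array(['a', 'c', 'b'], dtype=object)
rk = np.array(['b', 'a', 'd'], dtype=object)

codes, uniques = pd.factorize(np.concatenate([lk, rk]))
llab, rlab = codes[:len(lk)], codes[len(lk):]
count = len(uniques)
# llab == [0, 1, 2], rlab == [2, 0, 3], count == 4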
- new_left, new_right = new_labels[:l], new_labels[l:] - - return new_left, new_right - - -def _get_join_keys(llab, rlab, shape, sort): - from pandas.core.groupby import _int64_overflow_possible - - # how many levels can be done without overflow - pred = lambda i: not _int64_overflow_possible(shape[:i]) - nlev = next(filter(pred, range(len(shape), 0, -1))) - - # get keys for the first `nlev` levels - stride = np.prod(shape[1:nlev], dtype='i8') - lkey = stride * llab[0].astype('i8', subok=False, copy=False) - rkey = stride * rlab[0].astype('i8', subok=False, copy=False) - - for i in range(1, nlev): - stride //= shape[i] - lkey += llab[i] * stride - rkey += rlab[i] * stride - - if nlev == len(shape): # all done! - return lkey, rkey - - # densify current keys to avoid overflow - lkey, rkey, count = _factorize_keys(lkey, rkey, sort=sort) - - llab = [lkey] + llab[nlev:] - rlab = [rkey] + rlab[nlev:] - shape = [count] + shape[nlev:] - - return _get_join_keys(llab, rlab, shape, sort) - -# --------------------------------------------------------------------- -# Concatenate DataFrame objects - - -def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, - keys=None, levels=None, names=None, verify_integrity=False, - copy=True): - """ - Concatenate pandas objects along a particular axis with optional set logic - along the other axes. - - Can also add a layer of hierarchical indexing on the concatenation axis, - which may be useful if the labels are the same (or overlapping) on - the passed axis number. - - Parameters - ---------- - objs : a sequence or mapping of Series, DataFrame, or Panel objects - If a dict is passed, the sorted keys will be used as the `keys` - argument, unless it is passed, in which case the values will be - selected (see below). Any None objects will be dropped silently unless - they are all None in which case a ValueError will be raised - axis : {0/'index', 1/'columns'}, default 0 - The axis to concatenate along - join : {'inner', 'outer'}, default 'outer' - How to handle indexes on other axis(es) - join_axes : list of Index objects - Specific indexes to use for the other n - 1 axes instead of performing - inner/outer set logic - ignore_index : boolean, default False - If True, do not use the index values along the concatenation axis. The - resulting axis will be labeled 0, ..., n - 1. This is useful if you are - concatenating objects where the concatenation axis does not have - meaningful indexing information. Note the index values on the other - axes are still respected in the join. - keys : sequence, default None - If multiple levels passed, should contain tuples. Construct - hierarchical index using the passed keys as the outermost level - levels : list of sequences, default None - Specific levels (unique values) to use for constructing a - MultiIndex. Otherwise they will be inferred from the keys - names : list, default None - Names for the levels in the resulting hierarchical index - verify_integrity : boolean, default False - Check whether the new concatenated axis contains duplicates. This can - be very expensive relative to the actual data concatenation - copy : boolean, default True - If False, do not copy data unnecessarily - - Returns - ------- - concatenated : type of objects - - Notes - ----- - The keys, levels, and names arguments are all optional. - - A walkthrough of how this method fits in with other tools for combining - panda objects can be found `here - `__. 
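# Editor's example (not part of the diff): the mixed-radix key packing done
# in _get_join_keys above, run by hand for shape = [3, 4]. Each
# (level0, level1) label pair maps to one unique int64 scalar; the label
# arrays are made up.
import numpy as np

shape = [3, 4]
llab = [np.array([0, 1, 2]), np.array([3, 0, 1])]

stride = np.prod(shape[1:], dtype='i8')   # 4
lkey = stride * llab[0].astype('i8')      # [0, 4, 8]
stride //= shape[1]                       # 1
lkey += llab[1] * stride                  # [3, 4, 9], i.e. l0 * 4 + l1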
- - See Also - -------- - Series.append - DataFrame.append - DataFrame.join - DataFrame.merge - - Examples - -------- - Combine two ``Series``. - - >>> s1 = pd.Series(['a', 'b']) - >>> s2 = pd.Series(['c', 'd']) - >>> pd.concat([s1, s2]) - 0 a - 1 b - 0 c - 1 d - dtype: object - - Clear the existing index and reset it in the result - by setting the ``ignore_index`` option to ``True``. - - >>> pd.concat([s1, s2], ignore_index=True) - 0 a - 1 b - 2 c - 3 d - dtype: object - - Add a hierarchical index at the outermost level of - the data with the ``keys`` option. - - >>> pd.concat([s1, s2], keys=['s1', 's2',]) - s1 0 a - 1 b - s2 0 c - 1 d - dtype: object - - Label the index keys you create with the ``names`` option. - - >>> pd.concat([s1, s2], keys=['s1', 's2'], - ... names=['Series name', 'Row ID']) - Series name Row ID - s1 0 a - 1 b - s2 0 c - 1 d - dtype: object - - Combine two ``DataFrame`` objects with identical columns. - - >>> df1 = pd.DataFrame([['a', 1], ['b', 2]], - ... columns=['letter', 'number']) - >>> df1 - letter number - 0 a 1 - 1 b 2 - >>> df2 = pd.DataFrame([['c', 3], ['d', 4]], - ... columns=['letter', 'number']) - >>> df2 - letter number - 0 c 3 - 1 d 4 - >>> pd.concat([df1, df2]) - letter number - 0 a 1 - 1 b 2 - 0 c 3 - 1 d 4 - - Combine ``DataFrame`` objects with overlapping columns - and return everything. Columns outside the intersection will - be filled with ``NaN`` values. - - >>> df3 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']], - ... columns=['letter', 'number', 'animal']) - >>> df3 - letter number animal - 0 c 3 cat - 1 d 4 dog - >>> pd.concat([df1, df3]) - animal letter number - 0 NaN a 1 - 1 NaN b 2 - 0 cat c 3 - 1 dog d 4 - - Combine ``DataFrame`` objects with overlapping columns - and return only those that are shared by passing ``inner`` to - the ``join`` keyword argument. - - >>> pd.concat([df1, df3], join="inner") - letter number - 0 a 1 - 1 b 2 - 0 c 3 - 1 d 4 - - Combine ``DataFrame`` objects horizontally along the x axis by - passing in ``axis=1``. - - >>> df4 = pd.DataFrame([['bird', 'polly'], ['monkey', 'george']], - ... columns=['animal', 'name']) - >>> pd.concat([df1, df4], axis=1) - letter number animal name - 0 a 1 bird polly - 1 b 2 monkey george - - Prevent the result from including duplicate index values with the - ``verify_integrity`` option. 
- - >>> df5 = pd.DataFrame([1], index=['a']) - >>> df5 - 0 - a 1 - >>> df6 = pd.DataFrame([2], index=['a']) - >>> df6 - 0 - a 2 - >>> pd.concat([df5, df6], verify_integrity=True) - ValueError: Indexes have overlapping values: ['a'] - """ - op = _Concatenator(objs, axis=axis, join_axes=join_axes, - ignore_index=ignore_index, join=join, - keys=keys, levels=levels, names=names, - verify_integrity=verify_integrity, - copy=copy) - return op.get_result() - - -class _Concatenator(object): - """ - Orchestrates a concatenation operation for BlockManagers - """ - - def __init__(self, objs, axis=0, join='outer', join_axes=None, - keys=None, levels=None, names=None, - ignore_index=False, verify_integrity=False, copy=True): - if isinstance(objs, (NDFrame, compat.string_types)): - raise TypeError('first argument must be an iterable of pandas ' - 'objects, you passed an object of type ' - '"{0}"'.format(type(objs).__name__)) - - if join == 'outer': - self.intersect = False - elif join == 'inner': - self.intersect = True - else: # pragma: no cover - raise ValueError('Only can inner (intersect) or outer (union) ' - 'join the other axis') - - if isinstance(objs, dict): - if keys is None: - keys = sorted(objs) - objs = [objs[k] for k in keys] - else: - objs = list(objs) - - if len(objs) == 0: - raise ValueError('No objects to concatenate') - - if keys is None: - objs = [obj for obj in objs if obj is not None] - else: - # #1649 - clean_keys = [] - clean_objs = [] - for k, v in zip(keys, objs): - if v is None: - continue - clean_keys.append(k) - clean_objs.append(v) - objs = clean_objs - name = getattr(keys, 'name', None) - keys = Index(clean_keys, name=name) - - if len(objs) == 0: - raise ValueError('All objects passed were None') - - # consolidate data & figure out what our result ndim is going to be - ndims = set() - for obj in objs: - if not isinstance(obj, NDFrame): - raise TypeError("cannot concatenate a non-NDFrame object") - - # consolidate - obj.consolidate(inplace=True) - ndims.add(obj.ndim) - - # get the sample - # want the higest ndim that we have, and must be non-empty - # unless all objs are empty - sample = None - if len(ndims) > 1: - max_ndim = max(ndims) - for obj in objs: - if obj.ndim == max_ndim and np.sum(obj.shape): - sample = obj - break - - else: - # filter out the empties if we have not multi-index possibiltes - # note to keep empty Series as it affect to result columns / name - non_empties = [obj for obj in objs - if sum(obj.shape) > 0 or isinstance(obj, Series)] - - if (len(non_empties) and (keys is None and names is None and - levels is None and join_axes is None)): - objs = non_empties - sample = objs[0] - - if sample is None: - sample = objs[0] - self.objs = objs - - # Standardize axis parameter to int - if isinstance(sample, Series): - axis = DataFrame()._get_axis_number(axis) - else: - axis = sample._get_axis_number(axis) - - # Need to flip BlockManager axis in the DataFrame special case - self._is_frame = isinstance(sample, DataFrame) - if self._is_frame: - axis = 1 if axis == 0 else 0 - - self._is_series = isinstance(sample, ABCSeries) - if not 0 <= axis <= sample.ndim: - raise AssertionError("axis must be between 0 and {0}, " - "input was {1}".format(sample.ndim, axis)) - - # if we have mixed ndims, then convert to highest ndim - # creating column numbers as needed - if len(ndims) > 1: - current_column = 0 - max_ndim = sample.ndim - self.objs, objs = [], self.objs - for obj in objs: - - ndim = obj.ndim - if ndim == max_ndim: - pass - - elif ndim != max_ndim - 1: - raise 
ValueError("cannot concatenate unaligned mixed " - "dimensional NDFrame objects") - - else: - name = getattr(obj, 'name', None) - if ignore_index or name is None: - name = current_column - current_column += 1 - - # doing a row-wise concatenation so need everything - # to line up - if self._is_frame and axis == 1: - name = 0 - obj = sample._constructor({name: obj}) - - self.objs.append(obj) - - # note: this is the BlockManager axis (since DataFrame is transposed) - self.axis = axis - self.join_axes = join_axes - self.keys = keys - self.names = names or getattr(keys, 'names', None) - self.levels = levels - - self.ignore_index = ignore_index - self.verify_integrity = verify_integrity - self.copy = copy - - self.new_axes = self._get_new_axes() - - def get_result(self): - - # series only - if self._is_series: - - # stack blocks - if self.axis == 0: - # concat Series with length to keep dtype as much - non_empties = [x for x in self.objs if len(x) > 0] - if len(non_empties) > 0: - values = [x._values for x in non_empties] - else: - values = [x._values for x in self.objs] - new_data = _concat._concat_compat(values) - - name = com._consensus_name_attr(self.objs) - cons = _concat._get_series_result_type(new_data) - - return (cons(new_data, index=self.new_axes[0], - name=name, dtype=new_data.dtype) - .__finalize__(self, method='concat')) - - # combine as columns in a frame - else: - data = dict(zip(range(len(self.objs)), self.objs)) - cons = _concat._get_series_result_type(data) - - index, columns = self.new_axes - df = cons(data, index=index) - df.columns = columns - return df.__finalize__(self, method='concat') - - # combine block managers - else: - mgrs_indexers = [] - for obj in self.objs: - mgr = obj._data - indexers = {} - for ax, new_labels in enumerate(self.new_axes): - if ax == self.axis: - # Suppress reindexing on concat axis - continue - - obj_labels = mgr.axes[ax] - if not new_labels.equals(obj_labels): - indexers[ax] = obj_labels.reindex(new_labels)[1] - - mgrs_indexers.append((obj._data, indexers)) - - new_data = concatenate_block_managers( - mgrs_indexers, self.new_axes, concat_axis=self.axis, - copy=self.copy) - if not self.copy: - new_data._consolidate_inplace() - - cons = _concat._get_frame_result_type(new_data, self.objs) - return (cons._from_axes(new_data, self.new_axes) - .__finalize__(self, method='concat')) - - def _get_result_dim(self): - if self._is_series and self.axis == 1: - return 2 - else: - return self.objs[0].ndim - - def _get_new_axes(self): - ndim = self._get_result_dim() - new_axes = [None] * ndim - - if self.join_axes is None: - for i in range(ndim): - if i == self.axis: - continue - new_axes[i] = self._get_comb_axis(i) - else: - if len(self.join_axes) != ndim - 1: - raise AssertionError("length of join_axes must not be " - "equal to {0}".format(ndim - 1)) - - # ufff... - indices = lrange(ndim) - indices.remove(self.axis) - - for i, ax in zip(indices, self.join_axes): - new_axes[i] = ax - - new_axes[self.axis] = self._get_concat_axis() - return new_axes - - def _get_comb_axis(self, i): - if self._is_series: - all_indexes = [x.index for x in self.objs] - else: - try: - all_indexes = [x._data.axes[i] for x in self.objs] - except IndexError: - types = [type(x).__name__ for x in self.objs] - raise TypeError("Cannot concatenate list of %s" % types) - - return _get_combined_index(all_indexes, intersect=self.intersect) - - def _get_concat_axis(self): - """ - Return index to be used along concatenation axis. 
- """ - if self._is_series: - if self.axis == 0: - indexes = [x.index for x in self.objs] - elif self.ignore_index: - idx = com._default_index(len(self.objs)) - return idx - elif self.keys is None: - names = [None] * len(self.objs) - num = 0 - has_names = False - for i, x in enumerate(self.objs): - if not isinstance(x, Series): - raise TypeError("Cannot concatenate type 'Series' " - "with object of type " - "%r" % type(x).__name__) - if x.name is not None: - names[i] = x.name - has_names = True - else: - names[i] = num - num += 1 - if has_names: - return Index(names) - else: - return com._default_index(len(self.objs)) - else: - return _ensure_index(self.keys) - else: - indexes = [x._data.axes[self.axis] for x in self.objs] - - if self.ignore_index: - idx = com._default_index(sum(len(i) for i in indexes)) - return idx - - if self.keys is None: - concat_axis = _concat_indexes(indexes) - else: - concat_axis = _make_concat_multiindex(indexes, self.keys, - self.levels, self.names) - - self._maybe_check_integrity(concat_axis) - - return concat_axis - - def _maybe_check_integrity(self, concat_index): - if self.verify_integrity: - if not concat_index.is_unique: - overlap = concat_index.get_duplicates() - raise ValueError('Indexes have overlapping values: %s' - % str(overlap)) - - -def _concat_indexes(indexes): - return indexes[0].append(indexes[1:]) - - -def _make_concat_multiindex(indexes, keys, levels=None, names=None): - - if ((levels is None and isinstance(keys[0], tuple)) or - (levels is not None and len(levels) > 1)): - zipped = lzip(*keys) - if names is None: - names = [None] * len(zipped) - - if levels is None: - _, levels = _factorize_from_iterables(zipped) - else: - levels = [_ensure_index(x) for x in levels] - else: - zipped = [keys] - if names is None: - names = [None] - - if levels is None: - levels = [_ensure_index(keys)] - else: - levels = [_ensure_index(x) for x in levels] - - if not _all_indexes_same(indexes): - label_list = [] - - # things are potentially different sizes, so compute the exact labels - # for each level and pass those to MultiIndex.from_arrays - - for hlevel, level in zip(zipped, levels): - to_concat = [] - for key, index in zip(hlevel, indexes): - try: - i = level.get_loc(key) - except KeyError: - raise ValueError('Key %s not in level %s' - % (str(key), str(level))) - - to_concat.append(np.repeat(i, len(index))) - label_list.append(np.concatenate(to_concat)) - - concat_index = _concat_indexes(indexes) - - # these go at the end - if isinstance(concat_index, MultiIndex): - levels.extend(concat_index.levels) - label_list.extend(concat_index.labels) - else: - codes, categories = _factorize_from_iterable(concat_index) - levels.append(categories) - label_list.append(codes) - - if len(names) == len(levels): - names = list(names) - else: - # make sure that all of the passed indices have the same nlevels - if not len(set([idx.nlevels for idx in indexes])) == 1: - raise AssertionError("Cannot concat indices that do" - " not have the same number of levels") - - # also copies - names = names + _get_consensus_names(indexes) - - return MultiIndex(levels=levels, labels=label_list, names=names, - verify_integrity=False) - - new_index = indexes[0] - n = len(new_index) - kpieces = len(indexes) - - # also copies - new_names = list(names) - new_levels = list(levels) - - # construct labels - new_labels = [] - - # do something a bit more speedy - - for hlevel, level in zip(zipped, levels): - hlevel = _ensure_index(hlevel) - mapped = level.get_indexer(hlevel) - - mask = mapped == -1 - 
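# Editor's example (not part of the diff): the behaviour implemented by
# _make_concat_multiindex above. Passing `keys` (plus optional `names` and
# `levels`) stacks an outer level onto the concatenation axis; the frames
# are made up.
import pandas as pd

df1 = pd.DataFrame({'v': [1, 2]})
df2 = pd.DataFrame({'v': [3]})

out = pd.concat([df1, df2], keys=['x', 'y'], names=['piece', None])
# out.index: MultiIndex [('x', 0), ('x', 1), ('y', 0)], names ['piece', None]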
if mask.any(): - raise ValueError('Values not found in passed level: %s' - % str(hlevel[mask])) - - new_labels.append(np.repeat(mapped, n)) - - if isinstance(new_index, MultiIndex): - new_levels.extend(new_index.levels) - new_labels.extend([np.tile(lab, kpieces) for lab in new_index.labels]) - else: - new_levels.append(new_index) - new_labels.append(np.tile(np.arange(n), kpieces)) - - if len(new_names) < len(new_levels): - new_names.extend(new_index.names) - - return MultiIndex(levels=new_levels, labels=new_labels, names=new_names, - verify_integrity=False) +# back-compat of pseudo-public API +def concat_wrap(): -def _should_fill(lname, rname): - if (not isinstance(lname, compat.string_types) or - not isinstance(rname, compat.string_types)): - return True - return lname == rname + def wrapper(*args, **kwargs): + warnings.warn("pandas.tools.merge.concat is deprecated. " + "import from the public API: " + "pandas.concat instead", + FutureWarning, stacklevel=3) + import pandas as pd + return pd.concat(*args, **kwargs) + return wrapper -def _any(x): - return x is not None and len(x) > 0 and any([y is not None for y in x]) +concat = concat_wrap() diff --git a/pandas/tools/plotting.py b/pandas/tools/plotting.py index 012d67d29cc3f..a68da67a219e2 100644 --- a/pandas/tools/plotting.py +++ b/pandas/tools/plotting.py @@ -1,4030 +1,20 @@ -# being a bit too dynamic -# pylint: disable=E1101 -from __future__ import division - +import sys import warnings -import re -from math import ceil -from collections import namedtuple -from contextlib import contextmanager -from distutils.version import LooseVersion - -import numpy as np - -from pandas.types.common import (is_list_like, - is_integer, - is_number, - is_hashable, - is_iterator) -from pandas.types.missing import isnull, notnull - -from pandas.util.decorators import cache_readonly, deprecate_kwarg -from pandas.core.base import PandasObject - -from pandas.core.common import AbstractMethodError, _try_sort -from pandas.core.generic import _shared_docs, _shared_doc_kwargs -from pandas.core.index import Index, MultiIndex -from pandas.core.series import Series, remove_na -from pandas.tseries.period import PeriodIndex -from pandas.compat import range, lrange, lmap, map, zip, string_types -import pandas.compat as compat -from pandas.formats.printing import pprint_thing -from pandas.util.decorators import Appender -try: # mpl optional - import pandas.tseries.converter as conv - conv.register() # needs to override so set_xlim works with str/number -except ImportError: - pass - - -# Extracted from https://gist.github.com/huyng/816622 -# this is the rcParams set when setting display.with_mpl_style -# to True. 
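# Editor's note (not part of the diff): the mpl_stylesheet dict defined below
# is a plain matplotlib rcParams mapping, so applying it manually amounts to
# an rcParams.update; a guarded sketch using a made-up two-key subset:
try:
    import matplotlib as mpl
    mpl.rcParams.update({'axes.grid': True, 'figure.facecolor': 'white'})
except ImportError:
    pass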
-mpl_stylesheet = { - 'axes.axisbelow': True, - 'axes.color_cycle': ['#348ABD', - '#7A68A6', - '#A60628', - '#467821', - '#CF4457', - '#188487', - '#E24A33'], - 'axes.edgecolor': '#bcbcbc', - 'axes.facecolor': '#eeeeee', - 'axes.grid': True, - 'axes.labelcolor': '#555555', - 'axes.labelsize': 'large', - 'axes.linewidth': 1.0, - 'axes.titlesize': 'x-large', - 'figure.edgecolor': 'white', - 'figure.facecolor': 'white', - 'figure.figsize': (6.0, 4.0), - 'figure.subplot.hspace': 0.5, - 'font.family': 'monospace', - 'font.monospace': ['Andale Mono', - 'Nimbus Mono L', - 'Courier New', - 'Courier', - 'Fixed', - 'Terminal', - 'monospace'], - 'font.size': 10, - 'interactive': True, - 'keymap.all_axes': ['a'], - 'keymap.back': ['left', 'c', 'backspace'], - 'keymap.forward': ['right', 'v'], - 'keymap.fullscreen': ['f'], - 'keymap.grid': ['g'], - 'keymap.home': ['h', 'r', 'home'], - 'keymap.pan': ['p'], - 'keymap.save': ['s'], - 'keymap.xscale': ['L', 'k'], - 'keymap.yscale': ['l'], - 'keymap.zoom': ['o'], - 'legend.fancybox': True, - 'lines.antialiased': True, - 'lines.linewidth': 1.0, - 'patch.antialiased': True, - 'patch.edgecolor': '#EEEEEE', - 'patch.facecolor': '#348ABD', - 'patch.linewidth': 0.5, - 'toolbar': 'toolbar2', - 'xtick.color': '#555555', - 'xtick.direction': 'in', - 'xtick.major.pad': 6.0, - 'xtick.major.size': 0.0, - 'xtick.minor.pad': 6.0, - 'xtick.minor.size': 0.0, - 'ytick.color': '#555555', - 'ytick.direction': 'in', - 'ytick.major.pad': 6.0, - 'ytick.major.size': 0.0, - 'ytick.minor.pad': 6.0, - 'ytick.minor.size': 0.0 -} - - -def _mpl_le_1_2_1(): - try: - import matplotlib as mpl - return (str(mpl.__version__) <= LooseVersion('1.2.1') and - str(mpl.__version__)[0] != '0') - except ImportError: - return False - - -def _mpl_ge_1_3_1(): - try: - import matplotlib - # The or v[0] == '0' is because their versioneer is - # messed up on dev - return (matplotlib.__version__ >= LooseVersion('1.3.1') or - matplotlib.__version__[0] == '0') - except ImportError: - return False - - -def _mpl_ge_1_4_0(): - try: - import matplotlib - return (matplotlib.__version__ >= LooseVersion('1.4') or - matplotlib.__version__[0] == '0') - except ImportError: - return False - - -def _mpl_ge_1_5_0(): - try: - import matplotlib - return (matplotlib.__version__ >= LooseVersion('1.5') or - matplotlib.__version__[0] == '0') - except ImportError: - return False - - -def _mpl_ge_2_0_0(): - try: - import matplotlib - return matplotlib.__version__ >= LooseVersion('2.0') - except ImportError: - return False - -if _mpl_ge_1_5_0(): - # Compat with mp 1.5, which uses cycler. - import cycler - colors = mpl_stylesheet.pop('axes.color_cycle') - mpl_stylesheet['axes.prop_cycle'] = cycler.cycler('color', colors) - - -def _get_standard_kind(kind): - return {'density': 'kde'}.get(kind, kind) - - -def _get_standard_colors(num_colors=None, colormap=None, color_type='default', - color=None): - import matplotlib.pyplot as plt - - if color is None and colormap is not None: - if isinstance(colormap, compat.string_types): - import matplotlib.cm as cm - cmap = colormap - colormap = cm.get_cmap(colormap) - if colormap is None: - raise ValueError("Colormap {0} is not recognized".format(cmap)) - colors = lmap(colormap, np.linspace(0, 1, num=num_colors)) - elif color is not None: - if colormap is not None: - warnings.warn("'color' and 'colormap' cannot be used " - "simultaneously. 
Using 'color'") - colors = list(color) if is_list_like(color) else color - else: - if color_type == 'default': - # need to call list() on the result to copy so we don't - # modify the global rcParams below - try: - colors = [c['color'] - for c in list(plt.rcParams['axes.prop_cycle'])] - except KeyError: - colors = list(plt.rcParams.get('axes.color_cycle', - list('bgrcmyk'))) - if isinstance(colors, compat.string_types): - colors = list(colors) - elif color_type == 'random': - import random - - def random_color(column): - random.seed(column) - return [random.random() for _ in range(3)] - - colors = lmap(random_color, lrange(num_colors)) - else: - raise ValueError("color_type must be either 'default' or 'random'") - - if isinstance(colors, compat.string_types): - import matplotlib.colors - conv = matplotlib.colors.ColorConverter() - - def _maybe_valid_colors(colors): - try: - [conv.to_rgba(c) for c in colors] - return True - except ValueError: - return False - - # check whether the string can be convertable to single color - maybe_single_color = _maybe_valid_colors([colors]) - # check whether each character can be convertable to colors - maybe_color_cycle = _maybe_valid_colors(list(colors)) - if maybe_single_color and maybe_color_cycle and len(colors) > 1: - msg = ("'{0}' can be parsed as both single color and " - "color cycle. Specify each color using a list " - "like ['{0}'] or {1}") - raise ValueError(msg.format(colors, list(colors))) - elif maybe_single_color: - colors = [colors] - else: - # ``colors`` is regarded as color cycle. - # mpl will raise error any of them is invalid - pass - - if len(colors) != num_colors: - multiple = num_colors // len(colors) - 1 - mod = num_colors % len(colors) - - colors += multiple * colors - colors += colors[:mod] - - return colors - - -class _Options(dict): - """ - Stores pandas plotting options. - Allows for parameter aliasing so you can just use parameter names that are - the same as the plot function parameters, but is stored in a canonical - format that makes it easy to breakdown into groups later - """ - - # alias so the names are same as plotting method parameter names - _ALIASES = {'x_compat': 'xaxis.compat'} - _DEFAULT_KEYS = ['xaxis.compat'] - - def __init__(self): - self['xaxis.compat'] = False - - def __getitem__(self, key): - key = self._get_canonical_key(key) - if key not in self: - raise ValueError('%s is not a valid pandas plotting option' % key) - return super(_Options, self).__getitem__(key) - - def __setitem__(self, key, value): - key = self._get_canonical_key(key) - return super(_Options, self).__setitem__(key, value) - - def __delitem__(self, key): - key = self._get_canonical_key(key) - if key in self._DEFAULT_KEYS: - raise ValueError('Cannot remove default parameter %s' % key) - return super(_Options, self).__delitem__(key) - - def __contains__(self, key): - key = self._get_canonical_key(key) - return super(_Options, self).__contains__(key) - - def reset(self): - """ - Reset the option store to its initial state - - Returns - ------- - None - """ - self.__init__() - - def _get_canonical_key(self, key): - return self._ALIASES.get(key, key) - - @contextmanager - def use(self, key, value): - """ - Temporarily set a parameter value using the with statement. - Aliasing allowed. 
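# Editor's example (not part of the diff): using the _Options store that is
# exported below as `plot_params`. In this era it is also re-exported at the
# top level as pd.plot_params, and 'x_compat' is the documented alias for the
# canonical 'xaxis.compat' key.
import pandas as pd

with pd.plot_params.use('x_compat', True):
    pass  # plots drawn in this block skip the pandas date-axis machinery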
- """ - old_value = self[key] - try: - self[key] = value - yield self - finally: - self[key] = old_value - - -plot_params = _Options() - - -def scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, - diagonal='hist', marker='.', density_kwds=None, - hist_kwds=None, range_padding=0.05, **kwds): - """ - Draw a matrix of scatter plots. - - Parameters - ---------- - frame : DataFrame - alpha : float, optional - amount of transparency applied - figsize : (float,float), optional - a tuple (width, height) in inches - ax : Matplotlib axis object, optional - grid : bool, optional - setting this to True will show the grid - diagonal : {'hist', 'kde'} - pick between 'kde' and 'hist' for - either Kernel Density Estimation or Histogram - plot in the diagonal - marker : str, optional - Matplotlib marker type, default '.' - hist_kwds : other plotting keyword arguments - To be passed to hist function - density_kwds : other plotting keyword arguments - To be passed to kernel density estimate plot - range_padding : float, optional - relative extension of axis range in x and y - with respect to (x_max - x_min) or (y_max - y_min), - default 0.05 - kwds : other plotting keyword arguments - To be passed to scatter function - - Examples - -------- - >>> df = DataFrame(np.random.randn(1000, 4), columns=['A','B','C','D']) - >>> scatter_matrix(df, alpha=0.2) - """ - import matplotlib.pyplot as plt - - df = frame._get_numeric_data() - n = df.columns.size - naxes = n * n - fig, axes = _subplots(naxes=naxes, figsize=figsize, ax=ax, - squeeze=False) - - # no gaps between subplots - fig.subplots_adjust(wspace=0, hspace=0) - - mask = notnull(df) - - marker = _get_marker_compat(marker) - - hist_kwds = hist_kwds or {} - density_kwds = density_kwds or {} - - # workaround because `c='b'` is hardcoded in matplotlibs scatter method - kwds.setdefault('c', plt.rcParams['patch.facecolor']) - - boundaries_list = [] - for a in df.columns: - values = df[a].values[mask[a].values] - rmin_, rmax_ = np.min(values), np.max(values) - rdelta_ext = (rmax_ - rmin_) * range_padding / 2. - boundaries_list.append((rmin_ - rdelta_ext, rmax_ + rdelta_ext)) - - for i, a in zip(lrange(n), df.columns): - for j, b in zip(lrange(n), df.columns): - ax = axes[i, j] - - if i == j: - values = df[a].values[mask[a].values] - - # Deal with the diagonal by drawing a histogram there. 
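# Editor's example (not part of the diff): exercising the 'kde' diagonal
# branch handled just below (requires scipy for gaussian_kde). The frame is
# random made-up data, and the import path is the new pandas.plotting home
# this PR points users toward.
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

df = pd.DataFrame(np.random.randn(200, 3), columns=['A', 'B', 'C'])
axes = scatter_matrix(df, alpha=0.3, diagonal='kde',
                      density_kwds={'linewidth': 2})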
- if diagonal == 'hist': - ax.hist(values, **hist_kwds) - - elif diagonal in ('kde', 'density'): - from scipy.stats import gaussian_kde - y = values - gkde = gaussian_kde(y) - ind = np.linspace(y.min(), y.max(), 1000) - ax.plot(ind, gkde.evaluate(ind), **density_kwds) - - ax.set_xlim(boundaries_list[i]) - - else: - common = (mask[a] & mask[b]).values - - ax.scatter(df[b][common], df[a][common], - marker=marker, alpha=alpha, **kwds) - - ax.set_xlim(boundaries_list[j]) - ax.set_ylim(boundaries_list[i]) - - ax.set_xlabel(b) - ax.set_ylabel(a) - - if j != 0: - ax.yaxis.set_visible(False) - if i != n - 1: - ax.xaxis.set_visible(False) - - if len(df.columns) > 1: - lim1 = boundaries_list[0] - locs = axes[0][1].yaxis.get_majorticklocs() - locs = locs[(lim1[0] <= locs) & (locs <= lim1[1])] - adj = (locs - lim1[0]) / (lim1[1] - lim1[0]) - - lim0 = axes[0][0].get_ylim() - adj = adj * (lim0[1] - lim0[0]) + lim0[0] - axes[0][0].yaxis.set_ticks(adj) - - if np.all(locs == locs.astype(int)): - # if all ticks are int - locs = locs.astype(int) - axes[0][0].yaxis.set_ticklabels(locs) - - _set_ticks_props(axes, xlabelsize=8, xrot=90, ylabelsize=8, yrot=0) - - return axes - - -def _gca(): - import matplotlib.pyplot as plt - return plt.gca() - - -def _gcf(): - import matplotlib.pyplot as plt - return plt.gcf() - - -def _get_marker_compat(marker): - import matplotlib.lines as mlines - import matplotlib as mpl - if mpl.__version__ < '1.1.0' and marker == '.': - return 'o' - if marker not in mlines.lineMarkers: - return 'o' - return marker - - -def radviz(frame, class_column, ax=None, color=None, colormap=None, **kwds): - """RadViz - a multivariate data visualization algorithm - - Parameters: - ----------- - frame: DataFrame - class_column: str - Column name containing class names - ax: Matplotlib axis object, optional - color: list or tuple, optional - Colors to use for the different classes - colormap : str or matplotlib colormap object, default None - Colormap to select colors from. If string, load colormap with that name - from matplotlib. 
- kwds: keywords - Options to pass to matplotlib scatter plotting method - - Returns: - -------- - ax: Matplotlib axis object - """ - import matplotlib.pyplot as plt - import matplotlib.patches as patches - - def normalize(series): - a = min(series) - b = max(series) - return (series - a) / (b - a) - - n = len(frame) - classes = frame[class_column].drop_duplicates() - class_col = frame[class_column] - df = frame.drop(class_column, axis=1).apply(normalize) - - if ax is None: - ax = plt.gca(xlim=[-1, 1], ylim=[-1, 1]) - - to_plot = {} - colors = _get_standard_colors(num_colors=len(classes), colormap=colormap, - color_type='random', color=color) - - for kls in classes: - to_plot[kls] = [[], []] - - m = len(frame.columns) - 1 - s = np.array([(np.cos(t), np.sin(t)) - for t in [2.0 * np.pi * (i / float(m)) - for i in range(m)]]) - - for i in range(n): - row = df.iloc[i].values - row_ = np.repeat(np.expand_dims(row, axis=1), 2, axis=1) - y = (s * row_).sum(axis=0) / row.sum() - kls = class_col.iat[i] - to_plot[kls][0].append(y[0]) - to_plot[kls][1].append(y[1]) - - for i, kls in enumerate(classes): - ax.scatter(to_plot[kls][0], to_plot[kls][1], color=colors[i], - label=pprint_thing(kls), **kwds) - ax.legend() - - ax.add_patch(patches.Circle((0.0, 0.0), radius=1.0, facecolor='none')) - - for xy, name in zip(s, df.columns): - - ax.add_patch(patches.Circle(xy, radius=0.025, facecolor='gray')) - - if xy[0] < 0.0 and xy[1] < 0.0: - ax.text(xy[0] - 0.025, xy[1] - 0.025, name, - ha='right', va='top', size='small') - elif xy[0] < 0.0 and xy[1] >= 0.0: - ax.text(xy[0] - 0.025, xy[1] + 0.025, name, - ha='right', va='bottom', size='small') - elif xy[0] >= 0.0 and xy[1] < 0.0: - ax.text(xy[0] + 0.025, xy[1] - 0.025, name, - ha='left', va='top', size='small') - elif xy[0] >= 0.0 and xy[1] >= 0.0: - ax.text(xy[0] + 0.025, xy[1] + 0.025, name, - ha='left', va='bottom', size='small') - - ax.axis('equal') - return ax - - -@deprecate_kwarg(old_arg_name='data', new_arg_name='frame') -def andrews_curves(frame, class_column, ax=None, samples=200, color=None, - colormap=None, **kwds): - """ - Generates a matplotlib plot of Andrews curves, for visualising clusters of - multivariate data. - - Andrews curves have the functional form: - - f(t) = x_1/sqrt(2) + x_2 sin(t) + x_3 cos(t) + - x_4 sin(2t) + x_5 cos(2t) + ... - - Where x coefficients correspond to the values of each dimension and t is - linearly spaced between -pi and +pi. Each row of frame then corresponds to - a single curve. - - Parameters: - ----------- - frame : DataFrame - Data to be plotted, preferably normalized to (0.0, 1.0) - class_column : Name of the column containing class names - ax : matplotlib axes object, default None - samples : Number of points to plot in each curve - color: list or tuple, optional - Colors to use for the different classes - colormap : str or matplotlib colormap object, default None - Colormap to select colors from. If string, load colormap with that name - from matplotlib. - kwds: keywords - Options to pass to matplotlib plotting method - - Returns: - -------- - ax: Matplotlib axis object - - """ - from math import sqrt, pi - import matplotlib.pyplot as plt - - def function(amplitudes): - def f(t): - x1 = amplitudes[0] - result = x1 / sqrt(2.0) - - # Take the rest of the coefficients and resize them - # appropriately. Take a copy of amplitudes as otherwise numpy - # deletes the element from amplitudes itself. 
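# Editor's example (not part of the diff): the finite Fourier series the
# closure below evaluates, written out for a made-up three-column row
# (x1, x2, x3), where f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t).
import numpy as np

x1, x2, x3 = 1.0, 2.0, 3.0
t = np.linspace(-np.pi, np.pi, 5)
f_t = x1 / np.sqrt(2.0) + x2 * np.sin(t) + x3 * np.cos(t)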
- coeffs = np.delete(np.copy(amplitudes), 0) - coeffs.resize(int((coeffs.size + 1) / 2), 2) - - # Generate the harmonics and arguments for the sin and cos - # functions. - harmonics = np.arange(0, coeffs.shape[0]) + 1 - trig_args = np.outer(harmonics, t) - - result += np.sum(coeffs[:, 0, np.newaxis] * np.sin(trig_args) + - coeffs[:, 1, np.newaxis] * np.cos(trig_args), - axis=0) - return result - return f - - n = len(frame) - class_col = frame[class_column] - classes = frame[class_column].drop_duplicates() - df = frame.drop(class_column, axis=1) - t = np.linspace(-pi, pi, samples) - used_legends = set([]) - - color_values = _get_standard_colors(num_colors=len(classes), - colormap=colormap, color_type='random', - color=color) - colors = dict(zip(classes, color_values)) - if ax is None: - ax = plt.gca(xlim=(-pi, pi)) - for i in range(n): - row = df.iloc[i].values - f = function(row) - y = f(t) - kls = class_col.iat[i] - label = pprint_thing(kls) - if label not in used_legends: - used_legends.add(label) - ax.plot(t, y, color=colors[kls], label=label, **kwds) - else: - ax.plot(t, y, color=colors[kls], **kwds) - - ax.legend(loc='upper right') - ax.grid() - return ax - - -def bootstrap_plot(series, fig=None, size=50, samples=500, **kwds): - """Bootstrap plot. - - Parameters: - ----------- - series: Time series - fig: matplotlib figure object, optional - size: number of data points to consider during each sampling - samples: number of times the bootstrap procedure is performed - kwds: optional keyword arguments for plotting commands, must be accepted - by both hist and plot - - Returns: - -------- - fig: matplotlib figure - """ - import random - import matplotlib.pyplot as plt - - # random.sample(ndarray, int) fails on python 3.3, sigh - data = list(series.values) - samplings = [random.sample(data, size) for _ in range(samples)] - - means = np.array([np.mean(sampling) for sampling in samplings]) - medians = np.array([np.median(sampling) for sampling in samplings]) - midranges = np.array([(min(sampling) + max(sampling)) * 0.5 - for sampling in samplings]) - if fig is None: - fig = plt.figure() - x = lrange(samples) - axes = [] - ax1 = fig.add_subplot(2, 3, 1) - ax1.set_xlabel("Sample") - axes.append(ax1) - ax1.plot(x, means, **kwds) - ax2 = fig.add_subplot(2, 3, 2) - ax2.set_xlabel("Sample") - axes.append(ax2) - ax2.plot(x, medians, **kwds) - ax3 = fig.add_subplot(2, 3, 3) - ax3.set_xlabel("Sample") - axes.append(ax3) - ax3.plot(x, midranges, **kwds) - ax4 = fig.add_subplot(2, 3, 4) - ax4.set_xlabel("Mean") - axes.append(ax4) - ax4.hist(means, **kwds) - ax5 = fig.add_subplot(2, 3, 5) - ax5.set_xlabel("Median") - axes.append(ax5) - ax5.hist(medians, **kwds) - ax6 = fig.add_subplot(2, 3, 6) - ax6.set_xlabel("Midrange") - axes.append(ax6) - ax6.hist(midranges, **kwds) - for axis in axes: - plt.setp(axis.get_xticklabels(), fontsize=8) - plt.setp(axis.get_yticklabels(), fontsize=8) - return fig - - -@deprecate_kwarg(old_arg_name='colors', new_arg_name='color') -@deprecate_kwarg(old_arg_name='data', new_arg_name='frame', stacklevel=3) -def parallel_coordinates(frame, class_column, cols=None, ax=None, color=None, - use_columns=False, xticks=None, colormap=None, - axvlines=True, axvlines_kwds=None, **kwds): - """Parallel coordinates plotting. 
- - Parameters - ---------- - frame: DataFrame - class_column: str - Column name containing class names - cols: list, optional - A list of column names to use - ax: matplotlib.axis, optional - matplotlib axis object - color: list or tuple, optional - Colors to use for the different classes - use_columns: bool, optional - If true, columns will be used as xticks - xticks: list or tuple, optional - A list of values to use for xticks - colormap: str or matplotlib colormap, default None - Colormap to use for line colors. - axvlines: bool, optional - If true, vertical lines will be added at each xtick - axvlines_kwds: keywords, optional - Options to be passed to axvline method for vertical lines - kwds: keywords - Options to pass to matplotlib plotting method - - Returns - ------- - ax: matplotlib axis object - - Examples - -------- - >>> from pandas import read_csv - >>> from pandas.tools.plotting import parallel_coordinates - >>> from matplotlib import pyplot as plt - >>> df = read_csv('https://raw.github.com/pandas-dev/pandas/master' - '/pandas/tests/data/iris.csv') - >>> parallel_coordinates(df, 'Name', color=('#556270', - '#4ECDC4', '#C7F464')) - >>> plt.show() - """ - if axvlines_kwds is None: - axvlines_kwds = {'linewidth': 1, 'color': 'black'} - import matplotlib.pyplot as plt - - n = len(frame) - classes = frame[class_column].drop_duplicates() - class_col = frame[class_column] - - if cols is None: - df = frame.drop(class_column, axis=1) - else: - df = frame[cols] - - used_legends = set([]) - - ncols = len(df.columns) - - # determine values to use for xticks - if use_columns is True: - if not np.all(np.isreal(list(df.columns))): - raise ValueError('Columns must be numeric to be used as xticks') - x = df.columns - elif xticks is not None: - if not np.all(np.isreal(xticks)): - raise ValueError('xticks specified must be numeric') - elif len(xticks) != ncols: - raise ValueError('Length of xticks must match number of columns') - x = xticks - else: - x = lrange(ncols) - - if ax is None: - ax = plt.gca() - - color_values = _get_standard_colors(num_colors=len(classes), - colormap=colormap, color_type='random', - color=color) - - colors = dict(zip(classes, color_values)) - - for i in range(n): - y = df.iloc[i].values - kls = class_col.iat[i] - label = pprint_thing(kls) - if label not in used_legends: - used_legends.add(label) - ax.plot(x, y, color=colors[kls], label=label, **kwds) - else: - ax.plot(x, y, color=colors[kls], **kwds) - - if axvlines: - for i in x: - ax.axvline(i, **axvlines_kwds) - - ax.set_xticks(x) - ax.set_xticklabels(df.columns) - ax.set_xlim(x[0], x[-1]) - ax.legend(loc='upper right') - ax.grid() - return ax - - -def lag_plot(series, lag=1, ax=None, **kwds): - """Lag plot for time series. - - Parameters: - ----------- - series: Time series - lag: lag of the scatter plot, default 1 - ax: Matplotlib axis object, optional - kwds: Matplotlib scatter method keyword arguments, optional - - Returns: - -------- - ax: Matplotlib axis object - """ - import matplotlib.pyplot as plt - - # workaround because `c='b'` is hardcoded in matplotlibs scatter method - kwds.setdefault('c', plt.rcParams['patch.facecolor']) - - data = series.values - y1 = data[:-lag] - y2 = data[lag:] - if ax is None: - ax = plt.gca() - ax.set_xlabel("y(t)") - ax.set_ylabel("y(t + %s)" % lag) - ax.scatter(y1, y2, **kwds) - return ax - - -def autocorrelation_plot(series, ax=None, **kwds): - """Autocorrelation plot for time series. 
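# Editor's example (not part of the diff): a lag_plot usage sketch via the
# new pandas.plotting location. Strongly autocorrelated input (a made-up
# sine wave here) shows up as structure along the diagonal of the scatter.
import numpy as np
import pandas as pd
from pandas.plotting import lag_plot

s = pd.Series(np.sin(np.linspace(0, 8 * np.pi, 400)))
ax = lag_plot(s, lag=1)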
- - Parameters: - ----------- - series: Time series - ax: Matplotlib axis object, optional - kwds : keywords - Options to pass to matplotlib plotting method - - Returns: - ----------- - ax: Matplotlib axis object - """ - import matplotlib.pyplot as plt - n = len(series) - data = np.asarray(series) - if ax is None: - ax = plt.gca(xlim=(1, n), ylim=(-1.0, 1.0)) - mean = np.mean(data) - c0 = np.sum((data - mean) ** 2) / float(n) - - def r(h): - return ((data[:n - h] - mean) * - (data[h:] - mean)).sum() / float(n) / c0 - x = np.arange(n) + 1 - y = lmap(r, x) - z95 = 1.959963984540054 - z99 = 2.5758293035489004 - ax.axhline(y=z99 / np.sqrt(n), linestyle='--', color='grey') - ax.axhline(y=z95 / np.sqrt(n), color='grey') - ax.axhline(y=0.0, color='black') - ax.axhline(y=-z95 / np.sqrt(n), color='grey') - ax.axhline(y=-z99 / np.sqrt(n), linestyle='--', color='grey') - ax.set_xlabel("Lag") - ax.set_ylabel("Autocorrelation") - ax.plot(x, y, **kwds) - if 'label' in kwds: - ax.legend() - ax.grid() - return ax - - -class MPLPlot(object): - """ - Base class for assembling a pandas plot using matplotlib - - Parameters - ---------- - data : - - """ - - @property - def _kind(self): - """Specify kind str. Must be overridden in child class""" - raise NotImplementedError - - _layout_type = 'vertical' - _default_rot = 0 - orientation = None - _pop_attributes = ['label', 'style', 'logy', 'logx', 'loglog', - 'mark_right', 'stacked'] - _attr_defaults = {'logy': False, 'logx': False, 'loglog': False, - 'mark_right': True, 'stacked': False} - - def __init__(self, data, kind=None, by=None, subplots=False, sharex=None, - sharey=False, use_index=True, - figsize=None, grid=None, legend=True, rot=None, - ax=None, fig=None, title=None, xlim=None, ylim=None, - xticks=None, yticks=None, - sort_columns=False, fontsize=None, - secondary_y=False, colormap=None, - table=False, layout=None, **kwds): - - self.data = data - self.by = by - - self.kind = kind - - self.sort_columns = sort_columns - - self.subplots = subplots - - if sharex is None: - if ax is None: - self.sharex = True - else: - # if we get an axis, the users should do the visibility - # setting... - self.sharex = False - else: - self.sharex = sharex - - self.sharey = sharey - self.figsize = figsize - self.layout = layout - - self.xticks = xticks - self.yticks = yticks - self.xlim = xlim - self.ylim = ylim - self.title = title - self.use_index = use_index - - self.fontsize = fontsize - - if rot is not None: - self.rot = rot - # need to know for format_date_labels since it's rotated to 30 by - # default - self._rot_set = True - else: - self._rot_set = False - self.rot = self._default_rot - - if grid is None: - grid = False if secondary_y else self.plt.rcParams['axes.grid'] - - self.grid = grid - self.legend = legend - self.legend_handles = [] - self.legend_labels = [] - - for attr in self._pop_attributes: - value = kwds.pop(attr, self._attr_defaults.get(attr, None)) - setattr(self, attr, value) - - self.ax = ax - self.fig = fig - self.axes = None - - # parse errorbar input if given - xerr = kwds.pop('xerr', None) - yerr = kwds.pop('yerr', None) - self.errors = {} - for kw, err in zip(['xerr', 'yerr'], [xerr, yerr]): - self.errors[kw] = self._parse_errorbars(kw, err) - - if not isinstance(secondary_y, (bool, tuple, list, np.ndarray, Index)): - secondary_y = [secondary_y] - self.secondary_y = secondary_y - - # ugly TypeError if user passes matplotlib's `cmap` name. - # Probably better to accept either. 
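# Editor's example (not part of the diff): the cmap/colormap aliasing rule
# enforced just below; either spelling works alone, but passing both raises.
# The frame is made-up random data.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 3))
df.plot(colormap='Greens')                   # fine
# df.plot(colormap='Greens', cmap='Greens')  # TypeError: only specify one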
- if 'cmap' in kwds and colormap: - raise TypeError("Only specify one of `cmap` and `colormap`.") - elif 'cmap' in kwds: - self.colormap = kwds.pop('cmap') - else: - self.colormap = colormap - - self.table = table - - self.kwds = kwds - - self._validate_color_args() - - def _validate_color_args(self): - if 'color' not in self.kwds and 'colors' in self.kwds: - warnings.warn(("'colors' is being deprecated. Please use 'color'" - "instead of 'colors'")) - colors = self.kwds.pop('colors') - self.kwds['color'] = colors - - if ('color' in self.kwds and self.nseries == 1): - # support series.plot(color='green') - self.kwds['color'] = [self.kwds['color']] - - if ('color' in self.kwds or 'colors' in self.kwds) and \ - self.colormap is not None: - warnings.warn("'color' and 'colormap' cannot be used " - "simultaneously. Using 'color'") - - if 'color' in self.kwds and self.style is not None: - if is_list_like(self.style): - styles = self.style - else: - styles = [self.style] - # need only a single match - for s in styles: - if re.match('^[a-z]+?', s) is not None: - raise ValueError( - "Cannot pass 'style' string with a color " - "symbol and 'color' keyword argument. Please" - " use one or the other or pass 'style' " - "without a color symbol") - - def _iter_data(self, data=None, keep_index=False, fillna=None): - if data is None: - data = self.data - if fillna is not None: - data = data.fillna(fillna) - - # TODO: unused? - # if self.sort_columns: - # columns = _try_sort(data.columns) - # else: - # columns = data.columns - - for col, values in data.iteritems(): - if keep_index is True: - yield col, values - else: - yield col, values.values - - @property - def nseries(self): - if self.data.ndim == 1: - return 1 - else: - return self.data.shape[1] - - def draw(self): - self.plt.draw_if_interactive() - - def generate(self): - self._args_adjust() - self._compute_plot_data() - self._setup_subplots() - self._make_plot() - self._add_table() - self._make_legend() - self._adorn_subplots() - - for ax in self.axes: - self._post_plot_logic_common(ax, self.data) - self._post_plot_logic(ax, self.data) - - def _args_adjust(self): - pass - - def _has_plotted_object(self, ax): - """check whether ax has data""" - return (len(ax.lines) != 0 or - len(ax.artists) != 0 or - len(ax.containers) != 0) - - def _maybe_right_yaxis(self, ax, axes_num): - if not self.on_right(axes_num): - # secondary axes may be passed via ax kw - return self._get_ax_layer(ax) - - if hasattr(ax, 'right_ax'): - # if it has right_ax proparty, ``ax`` must be left axes - return ax.right_ax - elif hasattr(ax, 'left_ax'): - # if it has left_ax proparty, ``ax`` must be right axes - return ax - else: - # otherwise, create twin axes - orig_ax, new_ax = ax, ax.twinx() - # TODO: use Matplotlib public API when available - new_ax._get_lines = orig_ax._get_lines - new_ax._get_patches_for_fill = orig_ax._get_patches_for_fill - orig_ax.right_ax, new_ax.left_ax = new_ax, orig_ax - - if not self._has_plotted_object(orig_ax): # no data on left y - orig_ax.get_yaxis().set_visible(False) - return new_ax - - def _setup_subplots(self): - if self.subplots: - fig, axes = _subplots(naxes=self.nseries, - sharex=self.sharex, sharey=self.sharey, - figsize=self.figsize, ax=self.ax, - layout=self.layout, - layout_type=self._layout_type) - else: - if self.ax is None: - fig = self.plt.figure(figsize=self.figsize) - axes = fig.add_subplot(111) - else: - fig = self.ax.get_figure() - if self.figsize is not None: - fig.set_size_inches(self.figsize) - axes = self.ax - - axes = 
_flatten(axes) - - if self.logx or self.loglog: - [a.set_xscale('log') for a in axes] - if self.logy or self.loglog: - [a.set_yscale('log') for a in axes] - - self.fig = fig - self.axes = axes - - @property - def result(self): - """ - Return result axes - """ - if self.subplots: - if self.layout is not None and not is_list_like(self.ax): - return self.axes.reshape(*self.layout) - else: - return self.axes - else: - sec_true = isinstance(self.secondary_y, bool) and self.secondary_y - all_sec = (is_list_like(self.secondary_y) and - len(self.secondary_y) == self.nseries) - if (sec_true or all_sec): - # if all data is plotted on secondary, return right axes - return self._get_ax_layer(self.axes[0], primary=False) - else: - return self.axes[0] - - def _compute_plot_data(self): - data = self.data - - if isinstance(data, Series): - label = self.label - if label is None and data.name is None: - label = 'None' - data = data.to_frame(name=label) - - numeric_data = data._convert(datetime=True)._get_numeric_data() - - try: - is_empty = numeric_data.empty - except AttributeError: - is_empty = not len(numeric_data) - - # no empty frames or series allowed - if is_empty: - raise TypeError('Empty {0!r}: no numeric data to ' - 'plot'.format(numeric_data.__class__.__name__)) - - self.data = numeric_data - - def _make_plot(self): - raise AbstractMethodError(self) - - def _add_table(self): - if self.table is False: - return - elif self.table is True: - data = self.data.transpose() - else: - data = self.table - ax = self._get_ax(0) - table(ax, data) - - def _post_plot_logic_common(self, ax, data): - """Common post process for each axes""" - labels = [pprint_thing(key) for key in data.index] - labels = dict(zip(range(len(data.index)), labels)) - - if self.orientation == 'vertical' or self.orientation is None: - if self._need_to_set_index: - xticklabels = [labels.get(x, '') for x in ax.get_xticks()] - ax.set_xticklabels(xticklabels) - self._apply_axis_properties(ax.xaxis, rot=self.rot, - fontsize=self.fontsize) - self._apply_axis_properties(ax.yaxis, fontsize=self.fontsize) - elif self.orientation == 'horizontal': - if self._need_to_set_index: - yticklabels = [labels.get(y, '') for y in ax.get_yticks()] - ax.set_yticklabels(yticklabels) - self._apply_axis_properties(ax.yaxis, rot=self.rot, - fontsize=self.fontsize) - self._apply_axis_properties(ax.xaxis, fontsize=self.fontsize) - else: # pragma no cover - raise ValueError - - def _post_plot_logic(self, ax, data): - """Post process for each axes. 
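# Editor's example (not part of the diff): the secondary-axis machinery above
# (_maybe_right_yaxis / on_right) as seen from the user side. Columns named
# in secondary_y are drawn on a twinx axis and, with mark_right, get a
# ' (right)' legend suffix. The frame is made up.
import numpy as np
import pandas as pd

df = pd.DataFrame({'small': np.arange(10.0),
                   'big': np.arange(10.0) * 1000.0})
ax = df.plot(secondary_y=['big'], mark_right=True)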
Overridden in child classes""" - pass - - def _adorn_subplots(self): - """Common post process unrelated to data""" - if len(self.axes) > 0: - all_axes = self._get_subplots() - nrows, ncols = self._get_axes_layout() - _handle_shared_axes(axarr=all_axes, nplots=len(all_axes), - naxes=nrows * ncols, nrows=nrows, - ncols=ncols, sharex=self.sharex, - sharey=self.sharey) - - for ax in self.axes: - if self.yticks is not None: - ax.set_yticks(self.yticks) - - if self.xticks is not None: - ax.set_xticks(self.xticks) - - if self.ylim is not None: - ax.set_ylim(self.ylim) - - if self.xlim is not None: - ax.set_xlim(self.xlim) - - ax.grid(self.grid) - - if self.title: - if self.subplots: - if is_list_like(self.title): - if len(self.title) != self.nseries: - msg = ('The length of `title` must equal the number ' - 'of columns if using `title` of type `list` ' - 'and `subplots=True`.\n' - 'length of title = {}\n' - 'number of columns = {}').format( - len(self.title), self.nseries) - raise ValueError(msg) - - for (ax, title) in zip(self.axes, self.title): - ax.set_title(title) - else: - self.fig.suptitle(self.title) - else: - if is_list_like(self.title): - msg = ('Using `title` of type `list` is not supported ' - 'unless `subplots=True` is passed') - raise ValueError(msg) - self.axes[0].set_title(self.title) - - def _apply_axis_properties(self, axis, rot=None, fontsize=None): - labels = axis.get_majorticklabels() + axis.get_minorticklabels() - for label in labels: - if rot is not None: - label.set_rotation(rot) - if fontsize is not None: - label.set_fontsize(fontsize) - - @property - def legend_title(self): - if not isinstance(self.data.columns, MultiIndex): - name = self.data.columns.name - if name is not None: - name = pprint_thing(name) - return name - else: - stringified = map(pprint_thing, - self.data.columns.names) - return ','.join(stringified) - - def _add_legend_handle(self, handle, label, index=None): - if label is not None: - if self.mark_right and index is not None: - if self.on_right(index): - label = label + ' (right)' - self.legend_handles.append(handle) - self.legend_labels.append(label) - - def _make_legend(self): - ax, leg = self._get_ax_legend(self.axes[0]) - - handles = [] - labels = [] - title = '' - - if not self.subplots: - if leg is not None: - title = leg.get_title().get_text() - handles = leg.legendHandles - labels = [x.get_text() for x in leg.get_texts()] - - if self.legend: - if self.legend == 'reverse': - self.legend_handles = reversed(self.legend_handles) - self.legend_labels = reversed(self.legend_labels) - - handles += self.legend_handles - labels += self.legend_labels - if self.legend_title is not None: - title = self.legend_title - - if len(handles) > 0: - ax.legend(handles, labels, loc='best', title=title) - - elif self.subplots and self.legend: - for ax in self.axes: - if ax.get_visible(): - ax.legend(loc='best') - - def _get_ax_legend(self, ax): - leg = ax.get_legend() - other_ax = (getattr(ax, 'left_ax', None) or - getattr(ax, 'right_ax', None)) - other_leg = None - if other_ax is not None: - other_leg = other_ax.get_legend() - if leg is None and other_leg is not None: - leg = other_leg - ax = other_ax - return ax, leg - - @cache_readonly - def plt(self): - import matplotlib.pyplot as plt - return plt - - @staticmethod - def mpl_ge_1_3_1(): - return _mpl_ge_1_3_1() - - @staticmethod - def mpl_ge_1_5_0(): - return _mpl_ge_1_5_0() - - _need_to_set_index = False - - def _get_xticks(self, convert_period=False): - index = self.data.index - is_datetype = index.inferred_type 
in ('datetime', 'date', - 'datetime64', 'time') - - if self.use_index: - if convert_period and isinstance(index, PeriodIndex): - self.data = self.data.reindex(index=index.sort_values()) - x = self.data.index.to_timestamp()._mpl_repr() - elif index.is_numeric(): - """ - Matplotlib supports numeric values or datetime objects as - xaxis values. Taking LBYL approach here, by the time - matplotlib raises exception when using non numeric/datetime - values for xaxis, several actions are already taken by plt. - """ - x = index._mpl_repr() - elif is_datetype: - self.data = self.data.sort_index() - x = self.data.index._mpl_repr() - else: - self._need_to_set_index = True - x = lrange(len(index)) - else: - x = lrange(len(index)) - - return x - - @classmethod - def _plot(cls, ax, x, y, style=None, is_errorbar=False, **kwds): - mask = isnull(y) - if mask.any(): - y = np.ma.array(y) - y = np.ma.masked_where(mask, y) - - if isinstance(x, Index): - x = x._mpl_repr() - - if is_errorbar: - if 'xerr' in kwds: - kwds['xerr'] = np.array(kwds.get('xerr')) - if 'yerr' in kwds: - kwds['yerr'] = np.array(kwds.get('yerr')) - return ax.errorbar(x, y, **kwds) - else: - # prevent style kwarg from going to errorbar, where it is - # unsupported - if style is not None: - args = (x, y, style) - else: - args = (x, y) - return ax.plot(*args, **kwds) - - def _get_index_name(self): - if isinstance(self.data.index, MultiIndex): - name = self.data.index.names - if any(x is not None for x in name): - name = ','.join([pprint_thing(x) for x in name]) - else: - name = None - else: - name = self.data.index.name - if name is not None: - name = pprint_thing(name) - - return name - - @classmethod - def _get_ax_layer(cls, ax, primary=True): - """get left (primary) or right (secondary) axes""" - if primary: - return getattr(ax, 'left_ax', ax) - else: - return getattr(ax, 'right_ax', ax) - - def _get_ax(self, i): - # get the twinx ax if appropriate - if self.subplots: - ax = self.axes[i] - ax = self._maybe_right_yaxis(ax, i) - self.axes[i] = ax - else: - ax = self.axes[0] - ax = self._maybe_right_yaxis(ax, i) - - ax.get_yaxis().set_visible(True) - return ax - - def on_right(self, i): - if isinstance(self.secondary_y, bool): - return self.secondary_y - - if isinstance(self.secondary_y, (tuple, list, np.ndarray, Index)): - return self.data.columns[i] in self.secondary_y - - def _apply_style_colors(self, colors, kwds, col_num, label): - """ - Manage style and color based on column number and its label. - Returns tuple of appropriate style and kwds which "color" may be added. 
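# --------------------------------------------------------------------
# [editor's sketch, not part of the original file] The NaN-masking
# idiom used by MPLPlot._plot above: missing values are masked so
# matplotlib leaves a gap instead of interpolating across them.
import matplotlib.pyplot as plt
import numpy as np

y = np.array([1.0, np.nan, 3.0, 4.0])
y_masked = np.ma.masked_where(np.isnan(y), np.ma.array(y))
fig, ax = plt.subplots()
ax.plot(np.arange(len(y)), y_masked)  # gap at index 1
# --------------------------------------------------------------------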
- """ - style = None - if self.style is not None: - if isinstance(self.style, list): - try: - style = self.style[col_num] - except IndexError: - pass - elif isinstance(self.style, dict): - style = self.style.get(label, style) - else: - style = self.style - - has_color = 'color' in kwds or self.colormap is not None - nocolor_style = style is None or re.match('[a-z]+', style) is None - if (has_color or self.subplots) and nocolor_style: - kwds['color'] = colors[col_num % len(colors)] - return style, kwds - - def _get_colors(self, num_colors=None, color_kwds='color'): - if num_colors is None: - num_colors = self.nseries - - return _get_standard_colors(num_colors=num_colors, - colormap=self.colormap, - color=self.kwds.get(color_kwds)) - - def _parse_errorbars(self, label, err): - """ - Look for error keyword arguments and return the actual errorbar data - or return the error DataFrame/dict - - Error bars can be specified in several ways: - Series: the user provides a pandas.Series object of the same - length as the data - ndarray: provides a np.ndarray of the same length as the data - DataFrame/dict: error values are paired with keys matching the - key in the plotted DataFrame - str: the name of the column within the plotted DataFrame - """ - - if err is None: - return None - - from pandas import DataFrame, Series - - def match_labels(data, e): - e = e.reindex_axis(data.index) - return e - - # key-matched DataFrame - if isinstance(err, DataFrame): - - err = match_labels(self.data, err) - # key-matched dict - elif isinstance(err, dict): - pass - - # Series of error values - elif isinstance(err, Series): - # broadcast error series across data - err = match_labels(self.data, err) - err = np.atleast_2d(err) - err = np.tile(err, (self.nseries, 1)) - - # errors are a column in the dataframe - elif isinstance(err, string_types): - evalues = self.data[err].values - self.data = self.data[self.data.columns.drop(err)] - err = np.atleast_2d(evalues) - err = np.tile(err, (self.nseries, 1)) - - elif is_list_like(err): - if is_iterator(err): - err = np.atleast_2d(list(err)) - else: - # raw error values - err = np.atleast_2d(err) - - err_shape = err.shape - - # asymmetrical error bars - if err.ndim == 3: - if (err_shape[0] != self.nseries) or \ - (err_shape[1] != 2) or \ - (err_shape[2] != len(self.data)): - msg = "Asymmetrical error bars should be provided " + \ - "with the shape (%u, 2, %u)" % \ - (self.nseries, len(self.data)) - raise ValueError(msg) - - # broadcast errors to each data series - if len(err) == 1: - err = np.tile(err, (self.nseries, 1)) - - elif is_number(err): - err = np.tile([err], (self.nseries, len(self.data))) - - else: - msg = "No valid %s detected" % label - raise ValueError(msg) - - return err - - def _get_errorbars(self, label=None, index=None, xerr=True, yerr=True): - from pandas import DataFrame - errors = {} - - for kw, flag in zip(['xerr', 'yerr'], [xerr, yerr]): - if flag: - err = self.errors[kw] - # user provided label-matched dataframe of errors - if isinstance(err, (DataFrame, dict)): - if label is not None and label in err.keys(): - err = err[label] - else: - err = None - elif index is not None and err is not None: - err = err[index] - - if err is not None: - errors[kw] = err - return errors - - def _get_subplots(self): - from matplotlib.axes import Subplot - return [ax for ax in self.axes[0].get_figure().get_axes() - if isinstance(ax, Subplot)] - - def _get_axes_layout(self): - axes = self._get_subplots() - x_set = set() - y_set = set() - for ax in axes: - # check axes 
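# --------------------------------------------------------------------
# [editor's sketch, not part of the original file] The error-bar
# inputs accepted by _parse_errorbars above, for a 3-row, 2-column
# frame; the data are illustrative.
import numpy as np
import pandas as pd

frame = pd.DataFrame({'a': [1., 2., 3.], 'b': [4., 5., 6.]})
frame.plot(yerr=0.5)                  # scalar, broadcast everywhere
frame.plot(yerr=np.ones((2, 3)))      # one row of errors per series
frame.plot(yerr=np.ones((2, 2, 3)))   # asymmetric: (nseries, 2, npoints)
frame.plot(y='a', yerr='b')           # column 'b' supplies the errors
# --------------------------------------------------------------------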
coordinates to estimate layout
-            points = ax.get_position().get_points()
-            x_set.add(points[0][0])
-            y_set.add(points[0][1])
-        return (len(y_set), len(x_set))
-
-
-class PlanePlot(MPLPlot):
-    """
-    Abstract class for plotting on plane, currently scatter and hexbin.
-    """
-
-    _layout_type = 'single'
-
-    def __init__(self, data, x, y, **kwargs):
-        MPLPlot.__init__(self, data, **kwargs)
-        if x is None or y is None:
-            raise ValueError(self._kind + ' requires an x and y column')
-        if is_integer(x) and not self.data.columns.holds_integer():
-            x = self.data.columns[x]
-        if is_integer(y) and not self.data.columns.holds_integer():
-            y = self.data.columns[y]
-        self.x = x
-        self.y = y
-
-    @property
-    def nseries(self):
-        return 1
-
-    def _post_plot_logic(self, ax, data):
-        x, y = self.x, self.y
-        ax.set_ylabel(pprint_thing(y))
-        ax.set_xlabel(pprint_thing(x))
-
-
-class ScatterPlot(PlanePlot):
-    _kind = 'scatter'
-
-    def __init__(self, data, x, y, s=None, c=None, **kwargs):
-        if s is None:
-            # hide the matplotlib default for size, in case we want to change
-            # the handling of this argument later
-            s = 20
-        super(ScatterPlot, self).__init__(data, x, y, s=s, **kwargs)
-        if is_integer(c) and not self.data.columns.holds_integer():
-            c = self.data.columns[c]
-        self.c = c
-
-    def _make_plot(self):
-        x, y, c, data = self.x, self.y, self.c, self.data
-        ax = self.axes[0]
-
-        c_is_column = is_hashable(c) and c in self.data.columns
-
-        # plot a colorbar only if a colormap is provided or necessary
-        cb = self.kwds.pop('colorbar', self.colormap or c_is_column)
-
-        # pandas uses colormap, matplotlib uses cmap.
-        cmap = self.colormap or 'Greys'
-        cmap = self.plt.cm.get_cmap(cmap)
-        color = self.kwds.pop("color", None)
-        if c is not None and color is not None:
-            raise TypeError('Specify exactly one of `c` and `color`')
-        elif c is None and color is None:
-            c_values = self.plt.rcParams['patch.facecolor']
-        elif color is not None:
-            c_values = color
-        elif c_is_column:
-            c_values = self.data[c].values
-        else:
-            c_values = c
-
-        if self.legend and hasattr(self, 'label'):
-            label = self.label
-        else:
-            label = None
-        scatter = ax.scatter(data[x].values, data[y].values, c=c_values,
-                             label=label, cmap=cmap, **self.kwds)
-        if cb:
-            img = ax.collections[0]
-            kws = dict(ax=ax)
-            if self.mpl_ge_1_3_1():
-                kws['label'] = c if c_is_column else ''
-            self.fig.colorbar(img, **kws)
-
-        if label is not None:
-            self._add_legend_handle(scatter, label)
-        else:
-            self.legend = False
-
-        errors_x = self._get_errorbars(label=x, index=0, yerr=False)
-        errors_y = self._get_errorbars(label=y, index=0, xerr=False)
-        if len(errors_x) > 0 or len(errors_y) > 0:
-            err_kwds = dict(errors_x, **errors_y)
-            err_kwds['ecolor'] = scatter.get_facecolor()[0]
-            ax.errorbar(data[x].values, data[y].values,
-                        linestyle='none', **err_kwds)
-
-
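# --------------------------------------------------------------------
# [editor's sketch, not part of the original file] ScatterPlot's
# `c`-as-column path above: when `c` names a column, its values color
# the points and a colorbar is drawn by default.
import numpy as np
import pandas as pd

pts = pd.DataFrame(np.random.rand(20, 3), columns=['x', 'y', 'z'])
pts.plot(kind='scatter', x='x', y='y', c='z')        # colorbar added
pts.plot(kind='scatter', x='x', y='y', color='red')  # plain color
# --------------------------------------------------------------------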
-class HexBinPlot(PlanePlot):
-    _kind = 'hexbin'
-
-    def __init__(self, data, x, y, C=None, **kwargs):
-        super(HexBinPlot, self).__init__(data, x, y, **kwargs)
-        if is_integer(C) and not self.data.columns.holds_integer():
-            C = self.data.columns[C]
-        self.C = C
-
-    def _make_plot(self):
-        x, y, data, C = self.x, self.y, self.data, self.C
-        ax = self.axes[0]
-        # pandas uses colormap, matplotlib uses cmap.
-        cmap = self.colormap or 'BuGn'
-        cmap = self.plt.cm.get_cmap(cmap)
-        cb = self.kwds.pop('colorbar', True)
-
-        if C is None:
-            c_values = None
-        else:
-            c_values = data[C].values
-
-        ax.hexbin(data[x].values, data[y].values, C=c_values, cmap=cmap,
-                  **self.kwds)
-        if cb:
-            img = ax.collections[0]
-            self.fig.colorbar(img, ax=ax)
-
-    def _make_legend(self):
-        pass
-
-
-class LinePlot(MPLPlot):
-    _kind = 'line'
-    _default_rot = 0
-    orientation = 'vertical'
-
-    def __init__(self, data, **kwargs):
-        MPLPlot.__init__(self, data, **kwargs)
-        if self.stacked:
-            self.data = self.data.fillna(value=0)
-        self.x_compat = plot_params['x_compat']
-        if 'x_compat' in self.kwds:
-            self.x_compat = bool(self.kwds.pop('x_compat'))
-
-    def _is_ts_plot(self):
-        # this is slightly deceptive
-        return not self.x_compat and self.use_index and self._use_dynamic_x()
-
-    def _use_dynamic_x(self):
-        from pandas.tseries.plotting import _use_dynamic_x
-        return _use_dynamic_x(self._get_ax(0), self.data)
-
-    def _make_plot(self):
-        if self._is_ts_plot():
-            from pandas.tseries.plotting import _maybe_convert_index
-            data = _maybe_convert_index(self._get_ax(0), self.data)
-
-            x = data.index  # dummy, not used
-            plotf = self._ts_plot
-            it = self._iter_data(data=data, keep_index=True)
-        else:
-            x = self._get_xticks(convert_period=True)
-            plotf = self._plot
-            it = self._iter_data()
-
-        stacking_id = self._get_stacking_id()
-        is_errorbar = any(e is not None for e in self.errors.values())
-
-        colors = self._get_colors()
-        for i, (label, y) in enumerate(it):
-            ax = self._get_ax(i)
-            kwds = self.kwds.copy()
-            style, kwds = self._apply_style_colors(colors, kwds, i, label)
-
-            errors = self._get_errorbars(label=label, index=i)
-            kwds = dict(kwds, **errors)
-
-            label = pprint_thing(label)  # .encode('utf-8')
-            kwds['label'] = label
-
-            newlines = plotf(ax, x, y, style=style, column_num=i,
-                             stacking_id=stacking_id,
-                             is_errorbar=is_errorbar,
-                             **kwds)
-            self._add_legend_handle(newlines[0], label, index=i)
-
-            lines = _get_all_lines(ax)
-            left, right = _get_xlim(lines)
-            ax.set_xlim(left, right)
-
-    @classmethod
-    def _plot(cls, ax, x, y, style=None, column_num=None,
-              stacking_id=None, **kwds):
-        # column_num is used to get the target column from plotf in line and
-        # area plots
-        if column_num == 0:
-            cls._initialize_stacker(ax, stacking_id, len(y))
-        y_values = cls._get_stacked_values(ax, stacking_id, y, kwds['label'])
-        lines = MPLPlot._plot(ax, x, y_values, style=style, **kwds)
-        cls._update_stacker(ax, stacking_id, y)
-        return lines
-
-    @classmethod
-    def _ts_plot(cls, ax, x, data, style=None, **kwds):
-        from pandas.tseries.plotting import (_maybe_resample,
-                                             _decorate_axes,
-                                             format_dateaxis)
-        # accept x to be consistent with normal plot func,
-        # x is not passed to tsplot as it uses data.index as x coordinate
-        # column_num must be in kwds for stacking purpose
-        freq, data = _maybe_resample(data, ax, kwds)
-
-        # Set ax with freq info
-        _decorate_axes(ax, freq, kwds)
-        # digging deeper
-        if hasattr(ax, 'left_ax'):
-            _decorate_axes(ax.left_ax, freq, kwds)
-        if hasattr(ax, 'right_ax'):
-            _decorate_axes(ax.right_ax, freq, kwds)
-        ax._plot_data.append((data, cls._kind, kwds))
-
-        lines = cls._plot(ax, data.index, data.values, style=style, **kwds)
-        # set date formatter, locators and rescale limits
-        format_dateaxis(ax, ax.freq)
-        return lines
-
-    def _get_stacking_id(self):
-        if self.stacked:
-            return id(self.data)
-        else:
-            return None
-
-    @classmethod
-    def _initialize_stacker(cls, ax, stacking_id, n):
-        if stacking_id is None:
-
return - if not hasattr(ax, '_stacker_pos_prior'): - ax._stacker_pos_prior = {} - if not hasattr(ax, '_stacker_neg_prior'): - ax._stacker_neg_prior = {} - ax._stacker_pos_prior[stacking_id] = np.zeros(n) - ax._stacker_neg_prior[stacking_id] = np.zeros(n) - - @classmethod - def _get_stacked_values(cls, ax, stacking_id, values, label): - if stacking_id is None: - return values - if not hasattr(ax, '_stacker_pos_prior'): - # stacker may not be initialized for subplots - cls._initialize_stacker(ax, stacking_id, len(values)) - - if (values >= 0).all(): - return ax._stacker_pos_prior[stacking_id] + values - elif (values <= 0).all(): - return ax._stacker_neg_prior[stacking_id] + values - - raise ValueError('When stacked is True, each column must be either ' - 'all positive or negative.' - '{0} contains both positive and negative values' - .format(label)) - - @classmethod - def _update_stacker(cls, ax, stacking_id, values): - if stacking_id is None: - return - if (values >= 0).all(): - ax._stacker_pos_prior[stacking_id] += values - elif (values <= 0).all(): - ax._stacker_neg_prior[stacking_id] += values - - def _post_plot_logic(self, ax, data): - condition = (not self._use_dynamic_x() and - data.index.is_all_dates and - not self.subplots or - (self.subplots and self.sharex)) - - index_name = self._get_index_name() - - if condition: - # irregular TS rotated 30 deg. by default - # probably a better place to check / set this. - if not self._rot_set: - self.rot = 30 - format_date_labels(ax, rot=self.rot) - - if index_name is not None and self.use_index: - ax.set_xlabel(index_name) - - -class AreaPlot(LinePlot): - _kind = 'area' - - def __init__(self, data, **kwargs): - kwargs.setdefault('stacked', True) - data = data.fillna(value=0) - LinePlot.__init__(self, data, **kwargs) - - if not self.stacked: - # use smaller alpha to distinguish overlap - self.kwds.setdefault('alpha', 0.5) - - if self.logy or self.loglog: - raise ValueError("Log-y scales are not supported in area plot") - - @classmethod - def _plot(cls, ax, x, y, style=None, column_num=None, - stacking_id=None, is_errorbar=False, **kwds): - - if column_num == 0: - cls._initialize_stacker(ax, stacking_id, len(y)) - y_values = cls._get_stacked_values(ax, stacking_id, y, kwds['label']) - - # need to remove label, because subplots uses mpl legend as it is - line_kwds = kwds.copy() - if cls.mpl_ge_1_5_0(): - line_kwds.pop('label') - lines = MPLPlot._plot(ax, x, y_values, style=style, **line_kwds) - - # get data from the line to get coordinates for fill_between - xdata, y_values = lines[0].get_data(orig=False) - - # unable to use ``_get_stacked_values`` here to get starting point - if stacking_id is None: - start = np.zeros(len(y)) - elif (y >= 0).all(): - start = ax._stacker_pos_prior[stacking_id] - elif (y <= 0).all(): - start = ax._stacker_neg_prior[stacking_id] - else: - start = np.zeros(len(y)) - - if 'color' not in kwds: - kwds['color'] = lines[0].get_color() - - rect = ax.fill_between(xdata, start, y_values, **kwds) - cls._update_stacker(ax, stacking_id, y) - - # LinePlot expects list of artists - res = [rect] if cls.mpl_ge_1_5_0() else lines - return res - - def _add_legend_handle(self, handle, label, index=None): - if not self.mpl_ge_1_5_0(): - from matplotlib.patches import Rectangle - # Because fill_between isn't supported in legend, - # specifically add Rectangle handle here - alpha = self.kwds.get('alpha', None) - handle = Rectangle((0, 0), 1, 1, fc=handle.get_color(), - alpha=alpha) - LinePlot._add_legend_handle(self, handle, label, 
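# --------------------------------------------------------------------
# [editor's sketch, not part of the original file] The invariant
# enforced by _get_stacked_values above: with stacked=True, every
# column must be entirely non-negative or entirely non-positive.
import pandas as pd

tidy = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 1, 0]})
tidy.plot(kind='area')            # area plots stack by default
tidy.plot(kind='line', stacked=True)
# pd.DataFrame({'a': [1, -2, 3]}).plot(kind='line', stacked=True)
#                                   # would raise the ValueError above
# --------------------------------------------------------------------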
index=index) - - def _post_plot_logic(self, ax, data): - LinePlot._post_plot_logic(self, ax, data) - - if self.ylim is None: - if (data >= 0).all().all(): - ax.set_ylim(0, None) - elif (data <= 0).all().all(): - ax.set_ylim(None, 0) - - -class BarPlot(MPLPlot): - _kind = 'bar' - _default_rot = 90 - orientation = 'vertical' - - def __init__(self, data, **kwargs): - self.bar_width = kwargs.pop('width', 0.5) - pos = kwargs.pop('position', 0.5) - kwargs.setdefault('align', 'center') - self.tick_pos = np.arange(len(data)) - - self.bottom = kwargs.pop('bottom', 0) - self.left = kwargs.pop('left', 0) - - self.log = kwargs.pop('log', False) - MPLPlot.__init__(self, data, **kwargs) - - if self.stacked or self.subplots: - self.tickoffset = self.bar_width * pos - if kwargs['align'] == 'edge': - self.lim_offset = self.bar_width / 2 - else: - self.lim_offset = 0 - else: - if kwargs['align'] == 'edge': - w = self.bar_width / self.nseries - self.tickoffset = self.bar_width * (pos - 0.5) + w * 0.5 - self.lim_offset = w * 0.5 - else: - self.tickoffset = self.bar_width * pos - self.lim_offset = 0 - - self.ax_pos = self.tick_pos - self.tickoffset - - def _args_adjust(self): - if is_list_like(self.bottom): - self.bottom = np.array(self.bottom) - if is_list_like(self.left): - self.left = np.array(self.left) - - @classmethod - def _plot(cls, ax, x, y, w, start=0, log=False, **kwds): - return ax.bar(x, y, w, bottom=start, log=log, **kwds) - - @property - def _start_base(self): - return self.bottom - - def _make_plot(self): - import matplotlib as mpl - - colors = self._get_colors() - ncolors = len(colors) - - pos_prior = neg_prior = np.zeros(len(self.data)) - K = self.nseries - - for i, (label, y) in enumerate(self._iter_data(fillna=0)): - ax = self._get_ax(i) - kwds = self.kwds.copy() - kwds['color'] = colors[i % ncolors] - - errors = self._get_errorbars(label=label, index=i) - kwds = dict(kwds, **errors) - - label = pprint_thing(label) - - if (('yerr' in kwds) or ('xerr' in kwds)) \ - and (kwds.get('ecolor') is None): - kwds['ecolor'] = mpl.rcParams['xtick.color'] - - start = 0 - if self.log and (y >= 1).all(): - start = 1 - start = start + self._start_base - - if self.subplots: - w = self.bar_width / 2 - rect = self._plot(ax, self.ax_pos + w, y, self.bar_width, - start=start, label=label, - log=self.log, **kwds) - ax.set_title(label) - elif self.stacked: - mask = y > 0 - start = np.where(mask, pos_prior, neg_prior) + self._start_base - w = self.bar_width / 2 - rect = self._plot(ax, self.ax_pos + w, y, self.bar_width, - start=start, label=label, - log=self.log, **kwds) - pos_prior = pos_prior + np.where(mask, y, 0) - neg_prior = neg_prior + np.where(mask, 0, y) - else: - w = self.bar_width / K - rect = self._plot(ax, self.ax_pos + (i + 0.5) * w, y, w, - start=start, label=label, - log=self.log, **kwds) - self._add_legend_handle(rect, label, index=i) - - def _post_plot_logic(self, ax, data): - if self.use_index: - str_index = [pprint_thing(key) for key in data.index] - else: - str_index = [pprint_thing(key) for key in range(data.shape[0])] - name = self._get_index_name() - - s_edge = self.ax_pos[0] - 0.25 + self.lim_offset - e_edge = self.ax_pos[-1] + 0.25 + self.bar_width + self.lim_offset - - self._decorate_ticks(ax, name, str_index, s_edge, e_edge) - - def _decorate_ticks(self, ax, name, ticklabels, start_edge, end_edge): - ax.set_xlim((start_edge, end_edge)) - ax.set_xticks(self.tick_pos) - ax.set_xticklabels(ticklabels) - if name is not None and self.use_index: - ax.set_xlabel(name) - - -class 
BarhPlot(BarPlot): - _kind = 'barh' - _default_rot = 0 - orientation = 'horizontal' - - @property - def _start_base(self): - return self.left - - @classmethod - def _plot(cls, ax, x, y, w, start=0, log=False, **kwds): - return ax.barh(x, y, w, left=start, log=log, **kwds) - - def _decorate_ticks(self, ax, name, ticklabels, start_edge, end_edge): - # horizontal bars - ax.set_ylim((start_edge, end_edge)) - ax.set_yticks(self.tick_pos) - ax.set_yticklabels(ticklabels) - if name is not None and self.use_index: - ax.set_ylabel(name) - - -class HistPlot(LinePlot): - _kind = 'hist' - - def __init__(self, data, bins=10, bottom=0, **kwargs): - self.bins = bins # use mpl default - self.bottom = bottom - # Do not call LinePlot.__init__ which may fill nan - MPLPlot.__init__(self, data, **kwargs) - - def _args_adjust(self): - if is_integer(self.bins): - # create common bin edge - values = (self.data._convert(datetime=True)._get_numeric_data()) - values = np.ravel(values) - values = values[~isnull(values)] - - hist, self.bins = np.histogram( - values, bins=self.bins, - range=self.kwds.get('range', None), - weights=self.kwds.get('weights', None)) - - if is_list_like(self.bottom): - self.bottom = np.array(self.bottom) - - @classmethod - def _plot(cls, ax, y, style=None, bins=None, bottom=0, column_num=0, - stacking_id=None, **kwds): - if column_num == 0: - cls._initialize_stacker(ax, stacking_id, len(bins) - 1) - y = y[~isnull(y)] - - base = np.zeros(len(bins) - 1) - bottom = bottom + \ - cls._get_stacked_values(ax, stacking_id, base, kwds['label']) - # ignore style - n, bins, patches = ax.hist(y, bins=bins, bottom=bottom, **kwds) - cls._update_stacker(ax, stacking_id, n) - return patches - - def _make_plot(self): - colors = self._get_colors() - stacking_id = self._get_stacking_id() - - for i, (label, y) in enumerate(self._iter_data()): - ax = self._get_ax(i) - - kwds = self.kwds.copy() - - label = pprint_thing(label) - kwds['label'] = label - - style, kwds = self._apply_style_colors(colors, kwds, i, label) - if style is not None: - kwds['style'] = style - - kwds = self._make_plot_keywords(kwds, y) - artists = self._plot(ax, y, column_num=i, - stacking_id=stacking_id, **kwds) - self._add_legend_handle(artists[0], label, index=i) - - def _make_plot_keywords(self, kwds, y): - """merge BoxPlot/KdePlot properties to passed kwds""" - # y is required for KdePlot - kwds['bottom'] = self.bottom - kwds['bins'] = self.bins - return kwds - - def _post_plot_logic(self, ax, data): - if self.orientation == 'horizontal': - ax.set_xlabel('Frequency') - else: - ax.set_ylabel('Frequency') - - @property - def orientation(self): - if self.kwds.get('orientation', None) == 'horizontal': - return 'horizontal' - else: - return 'vertical' - - -class KdePlot(HistPlot): - _kind = 'kde' - orientation = 'vertical' - - def __init__(self, data, bw_method=None, ind=None, **kwargs): - MPLPlot.__init__(self, data, **kwargs) - self.bw_method = bw_method - self.ind = ind - - def _args_adjust(self): - pass - - def _get_ind(self, y): - if self.ind is None: - # np.nanmax() and np.nanmin() ignores the missing values - sample_range = np.nanmax(y) - np.nanmin(y) - ind = np.linspace(np.nanmin(y) - 0.5 * sample_range, - np.nanmax(y) + 0.5 * sample_range, 1000) - else: - ind = self.ind - return ind - - @classmethod - def _plot(cls, ax, y, style=None, bw_method=None, ind=None, - column_num=None, stacking_id=None, **kwds): - from scipy.stats import gaussian_kde - from scipy import __version__ as spv - - y = remove_na(y) - - if LooseVersion(spv) >= 
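# --------------------------------------------------------------------
# [editor's sketch, not part of the original file] What
# HistPlot._args_adjust above does when `bins` is an integer: one set
# of edges is computed over all columns so the histograms line up.
import numpy as np
import pandas as pd

hdf = pd.DataFrame({'a': np.random.randn(100),
                    'b': np.random.randn(100) + 1})
flat = hdf.values.ravel()
_, shared_edges = np.histogram(flat[~np.isnan(flat)], bins=10)
hdf.plot(kind='hist', bins=10)  # both columns use the common edges
# --------------------------------------------------------------------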
'0.11.0': - gkde = gaussian_kde(y, bw_method=bw_method) - else: - gkde = gaussian_kde(y) - if bw_method is not None: - msg = ('bw_method was added in Scipy 0.11.0.' + - ' Scipy version in use is %s.' % spv) - warnings.warn(msg) - - y = gkde.evaluate(ind) - lines = MPLPlot._plot(ax, ind, y, style=style, **kwds) - return lines - - def _make_plot_keywords(self, kwds, y): - kwds['bw_method'] = self.bw_method - kwds['ind'] = self._get_ind(y) - return kwds - - def _post_plot_logic(self, ax, data): - ax.set_ylabel('Density') - - -class PiePlot(MPLPlot): - _kind = 'pie' - _layout_type = 'horizontal' - - def __init__(self, data, kind=None, **kwargs): - data = data.fillna(value=0) - if (data < 0).any().any(): - raise ValueError("{0} doesn't allow negative values".format(kind)) - MPLPlot.__init__(self, data, kind=kind, **kwargs) - - def _args_adjust(self): - self.grid = False - self.logy = False - self.logx = False - self.loglog = False - - def _validate_color_args(self): - pass - - def _make_plot(self): - colors = self._get_colors( - num_colors=len(self.data), color_kwds='colors') - self.kwds.setdefault('colors', colors) - - for i, (label, y) in enumerate(self._iter_data()): - ax = self._get_ax(i) - if label is not None: - label = pprint_thing(label) - ax.set_ylabel(label) - - kwds = self.kwds.copy() - - def blank_labeler(label, value): - if value == 0: - return '' - else: - return label - - idx = [pprint_thing(v) for v in self.data.index] - labels = kwds.pop('labels', idx) - # labels is used for each wedge's labels - # Blank out labels for values of 0 so they don't overlap - # with nonzero wedges - if labels is not None: - blabels = [blank_labeler(l, value) for - l, value in zip(labels, y)] - else: - blabels = None - results = ax.pie(y, labels=blabels, **kwds) - - if kwds.get('autopct', None) is not None: - patches, texts, autotexts = results - else: - patches, texts = results - autotexts = [] - - if self.fontsize is not None: - for t in texts + autotexts: - t.set_fontsize(self.fontsize) - - # leglabels is used for legend labels - leglabels = labels if labels is not None else idx - for p, l in zip(patches, leglabels): - self._add_legend_handle(p, l) - - -class BoxPlot(LinePlot): - _kind = 'box' - _layout_type = 'horizontal' - - _valid_return_types = (None, 'axes', 'dict', 'both') - # namedtuple to hold results - BP = namedtuple("Boxplot", ['ax', 'lines']) - - def __init__(self, data, return_type='axes', **kwargs): - # Do not call LinePlot.__init__ which may fill nan - if return_type not in self._valid_return_types: - raise ValueError( - "return_type must be {None, 'axes', 'dict', 'both'}") - - self.return_type = return_type - MPLPlot.__init__(self, data, **kwargs) - - def _args_adjust(self): - if self.subplots: - # Disable label ax sharing. 
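# --------------------------------------------------------------------
# [editor's sketch, not part of the original file] PiePlot's
# blank_labeler above: zero-valued wedges get an empty label so they
# do not overlap their neighbours; negative values are rejected.
import pandas as pd

shares = pd.Series([3, 0, 2], index=['a', 'b', 'c'], name='shares')
shares.plot(kind='pie', autopct='%.1f%%')  # wedge 'b' is unlabeled
# pd.Series([1, -1]).plot(kind='pie')      # ValueError: negative values
# --------------------------------------------------------------------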
Otherwise, all subplots shows last - # column label - if self.orientation == 'vertical': - self.sharex = False - else: - self.sharey = False - - @classmethod - def _plot(cls, ax, y, column_num=None, return_type='axes', **kwds): - if y.ndim == 2: - y = [remove_na(v) for v in y] - # Boxplot fails with empty arrays, so need to add a NaN - # if any cols are empty - # GH 8181 - y = [v if v.size > 0 else np.array([np.nan]) for v in y] - else: - y = remove_na(y) - bp = ax.boxplot(y, **kwds) - - if return_type == 'dict': - return bp, bp - elif return_type == 'both': - return cls.BP(ax=ax, lines=bp), bp - else: - return ax, bp - - def _validate_color_args(self): - if 'color' in self.kwds: - if self.colormap is not None: - warnings.warn("'color' and 'colormap' cannot be used " - "simultaneously. Using 'color'") - self.color = self.kwds.pop('color') - - if isinstance(self.color, dict): - valid_keys = ['boxes', 'whiskers', 'medians', 'caps'] - for key, values in compat.iteritems(self.color): - if key not in valid_keys: - raise ValueError("color dict contains invalid " - "key '{0}' " - "The key must be either {1}" - .format(key, valid_keys)) - else: - self.color = None - - # get standard colors for default - colors = _get_standard_colors(num_colors=3, - colormap=self.colormap, - color=None) - # use 2 colors by default, for box/whisker and median - # flier colors isn't needed here - # because it can be specified by ``sym`` kw - self._boxes_c = colors[0] - self._whiskers_c = colors[0] - self._medians_c = colors[2] - self._caps_c = 'k' # mpl default - - def _get_colors(self, num_colors=None, color_kwds='color'): - pass - - def maybe_color_bp(self, bp): - if isinstance(self.color, dict): - boxes = self.color.get('boxes', self._boxes_c) - whiskers = self.color.get('whiskers', self._whiskers_c) - medians = self.color.get('medians', self._medians_c) - caps = self.color.get('caps', self._caps_c) - else: - # Other types are forwarded to matplotlib - # If None, use default colors - boxes = self.color or self._boxes_c - whiskers = self.color or self._whiskers_c - medians = self.color or self._medians_c - caps = self.color or self._caps_c - - from matplotlib.artist import setp - setp(bp['boxes'], color=boxes, alpha=1) - setp(bp['whiskers'], color=whiskers, alpha=1) - setp(bp['medians'], color=medians, alpha=1) - setp(bp['caps'], color=caps, alpha=1) - - def _make_plot(self): - if self.subplots: - self._return_obj = Series() - - for i, (label, y) in enumerate(self._iter_data()): - ax = self._get_ax(i) - kwds = self.kwds.copy() - - ret, bp = self._plot(ax, y, column_num=i, - return_type=self.return_type, **kwds) - self.maybe_color_bp(bp) - self._return_obj[label] = ret - - label = [pprint_thing(label)] - self._set_ticklabels(ax, label) - else: - y = self.data.values.T - ax = self._get_ax(0) - kwds = self.kwds.copy() - - ret, bp = self._plot(ax, y, column_num=0, - return_type=self.return_type, **kwds) - self.maybe_color_bp(bp) - self._return_obj = ret - - labels = [l for l, _ in self._iter_data()] - labels = [pprint_thing(l) for l in labels] - if not self.use_index: - labels = [pprint_thing(key) for key in range(len(labels))] - self._set_ticklabels(ax, labels) - - def _set_ticklabels(self, ax, labels): - if self.orientation == 'vertical': - ax.set_xticklabels(labels) - else: - ax.set_yticklabels(labels) - - def _make_legend(self): - pass - - def _post_plot_logic(self, ax, data): - pass - - @property - def orientation(self): - if self.kwds.get('vert', True): - return 'vertical' - else: - return 'horizontal' - - 
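# --------------------------------------------------------------------
# [editor's sketch, not part of the original file] The color dict and
# return_type options implemented by BoxPlot above.
import numpy as np
import pandas as pd

box_df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b'])
box_df.plot(kind='box',
            color={'boxes': 'DarkGreen', 'whiskers': 'DarkOrange',
                   'medians': 'DarkBlue', 'caps': 'Gray'})
ax, lines = box_df.plot(kind='box', return_type='both')  # namedtuple
# --------------------------------------------------------------------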
@property - def result(self): - if self.return_type is None: - return super(BoxPlot, self).result - else: - return self._return_obj - - -# kinds supported by both dataframe and series -_common_kinds = ['line', 'bar', 'barh', - 'kde', 'density', 'area', 'hist', 'box'] -# kinds supported by dataframe -_dataframe_kinds = ['scatter', 'hexbin'] -# kinds supported only by series or dataframe single column -_series_kinds = ['pie'] -_all_kinds = _common_kinds + _dataframe_kinds + _series_kinds - -_klasses = [LinePlot, BarPlot, BarhPlot, KdePlot, HistPlot, BoxPlot, - ScatterPlot, HexBinPlot, AreaPlot, PiePlot] - -_plot_klass = {} -for klass in _klasses: - _plot_klass[klass._kind] = klass - - -def _plot(data, x=None, y=None, subplots=False, - ax=None, kind='line', **kwds): - kind = _get_standard_kind(kind.lower().strip()) - if kind in _all_kinds: - klass = _plot_klass[kind] - else: - raise ValueError("%r is not a valid plot kind" % kind) - - from pandas import DataFrame - if kind in _dataframe_kinds: - if isinstance(data, DataFrame): - plot_obj = klass(data, x=x, y=y, subplots=subplots, ax=ax, - kind=kind, **kwds) - else: - raise ValueError("plot kind %r can only be used for data frames" - % kind) - - elif kind in _series_kinds: - if isinstance(data, DataFrame): - if y is None and subplots is False: - msg = "{0} requires either y column or 'subplots=True'" - raise ValueError(msg.format(kind)) - elif y is not None: - if is_integer(y) and not data.columns.holds_integer(): - y = data.columns[y] - # converted to series actually. copy to not modify - data = data[y].copy() - data.index.name = y - plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds) - else: - if isinstance(data, DataFrame): - if x is not None: - if is_integer(x) and not data.columns.holds_integer(): - x = data.columns[x] - data = data.set_index(x) - - if y is not None: - if is_integer(y) and not data.columns.holds_integer(): - y = data.columns[y] - label = kwds['label'] if 'label' in kwds else y - series = data[y].copy() # Don't modify - series.name = label - - for kw in ['xerr', 'yerr']: - if (kw in kwds) and \ - (isinstance(kwds[kw], string_types) or - is_integer(kwds[kw])): - try: - kwds[kw] = data[kwds[kw]] - except (IndexError, KeyError, TypeError): - pass - data = series - plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds) - - plot_obj.generate() - plot_obj.draw() - return plot_obj.result - - -df_kind = """- 'scatter' : scatter plot - - 'hexbin' : hexbin plot""" -series_kind = "" - -df_coord = """x : label or position, default None - y : label or position, default None - Allows plotting of one column versus another""" -series_coord = "" - -df_unique = """stacked : boolean, default False in line and - bar plots, and True in area plot. If True, create stacked plot. 
- sort_columns : boolean, default False - Sort column names to determine plot ordering - secondary_y : boolean or sequence, default False - Whether to plot on the secondary y-axis - If a list/tuple, which columns to plot on secondary y-axis""" -series_unique = """label : label argument to provide to plot - secondary_y : boolean or sequence of ints, default False - If True then y-axis will be on the right""" - -df_ax = """ax : matplotlib axes object, default None - subplots : boolean, default False - Make separate subplots for each column - sharex : boolean, default True if ax is None else False - In case subplots=True, share x axis and set some x axis labels to - invisible; defaults to True if ax is None otherwise False if an ax - is passed in; Be aware, that passing in both an ax and sharex=True - will alter all x axis labels for all axis in a figure! - sharey : boolean, default False - In case subplots=True, share y axis and set some y axis labels to - invisible - layout : tuple (optional) - (rows, columns) for the layout of subplots""" -series_ax = """ax : matplotlib axes object - If not passed, uses gca()""" - -df_note = """- If `kind` = 'scatter' and the argument `c` is the name of a dataframe - column, the values of that column are used to color each point. - - If `kind` = 'hexbin', you can control the size of the bins with the - `gridsize` argument. By default, a histogram of the counts around each - `(x, y)` point is computed. You can specify alternative aggregations - by passing values to the `C` and `reduce_C_function` arguments. - `C` specifies the value at each `(x, y)` point and `reduce_C_function` - is a function of one argument that reduces all the values in a bin to - a single number (e.g. `mean`, `max`, `sum`, `std`).""" -series_note = "" - -_shared_doc_df_kwargs = dict(klass='DataFrame', klass_obj='df', - klass_kind=df_kind, klass_coord=df_coord, - klass_ax=df_ax, klass_unique=df_unique, - klass_note=df_note) -_shared_doc_series_kwargs = dict(klass='Series', klass_obj='s', - klass_kind=series_kind, - klass_coord=series_coord, klass_ax=series_ax, - klass_unique=series_unique, - klass_note=series_note) - -_shared_docs['plot'] = """ - Make plots of %(klass)s using matplotlib / pylab. - - *New in version 0.17.0:* Each plot kind has a corresponding method on the - ``%(klass)s.plot`` accessor: - ``%(klass_obj)s.plot(kind='line')`` is equivalent to - ``%(klass_obj)s.plot.line()``. - - Parameters - ---------- - data : %(klass)s - %(klass_coord)s - kind : str - - 'line' : line plot (default) - - 'bar' : vertical bar plot - - 'barh' : horizontal bar plot - - 'hist' : histogram - - 'box' : boxplot - - 'kde' : Kernel Density Estimation plot - - 'density' : same as 'kde' - - 'area' : area plot - - 'pie' : pie plot - %(klass_kind)s - %(klass_ax)s - figsize : a tuple (width, height) in inches - use_index : boolean, default True - Use index as ticks for x axis - title : string or list - Title to use for the plot. If a string is passed, print the string at - the top of the figure. If a list is passed and `subplots` is True, - print each item in the list above the corresponding subplot. 
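# --------------------------------------------------------------------
# [editor's sketch, not part of the original file] The secondary_y /
# mark_right options documented above: listed columns move to the
# right-hand axis and get "(right)" appended to their legend label.
import numpy as np
import pandas as pd

two = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b'])
two.plot(secondary_y=['b'])                    # 'b (right)' in legend
two.plot(secondary_y=['b'], mark_right=False)  # plain 'b'
# --------------------------------------------------------------------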
- grid : boolean, default None (matlab style default) - Axis grid lines - legend : False/True/'reverse' - Place legend on axis subplots - style : list or dict - matplotlib line style per column - logx : boolean, default False - Use log scaling on x axis - logy : boolean, default False - Use log scaling on y axis - loglog : boolean, default False - Use log scaling on both x and y axes - xticks : sequence - Values to use for the xticks - yticks : sequence - Values to use for the yticks - xlim : 2-tuple/list - ylim : 2-tuple/list - rot : int, default None - Rotation for ticks (xticks for vertical, yticks for horizontal plots) - fontsize : int, default None - Font size for xticks and yticks - colormap : str or matplotlib colormap object, default None - Colormap to select colors from. If string, load colormap with that name - from matplotlib. - colorbar : boolean, optional - If True, plot colorbar (only relevant for 'scatter' and 'hexbin' plots) - position : float - Specify relative alignments for bar plot layout. - From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5 (center) - layout : tuple (optional) - (rows, columns) for the layout of the plot - table : boolean, Series or DataFrame, default False - If True, draw a table using the data in the DataFrame and the data will - be transposed to meet matplotlib's default layout. - If a Series or DataFrame is passed, use passed data to draw a table. - yerr : DataFrame, Series, array-like, dict and str - See :ref:`Plotting with Error Bars ` for - detail. - xerr : same types as yerr. - %(klass_unique)s - mark_right : boolean, default True - When using a secondary_y axis, automatically mark the column - labels with "(right)" in the legend - kwds : keywords - Options to pass to matplotlib plotting method - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - - Notes - ----- - - - See matplotlib documentation online for more on this subject - - If `kind` = 'bar' or 'barh', you can specify relative alignments - for bar plot layout by `position` keyword. - From 0 (left/bottom-end) to 1 (right/top-end). 
Default is 0.5 (center) - %(klass_note)s - - """ - - -@Appender(_shared_docs['plot'] % _shared_doc_df_kwargs) -def plot_frame(data, x=None, y=None, kind='line', ax=None, - subplots=False, sharex=None, sharey=False, layout=None, - figsize=None, use_index=True, title=None, grid=None, - legend=True, style=None, logx=False, logy=False, loglog=False, - xticks=None, yticks=None, xlim=None, ylim=None, - rot=None, fontsize=None, colormap=None, table=False, - yerr=None, xerr=None, - secondary_y=False, sort_columns=False, - **kwds): - return _plot(data, kind=kind, x=x, y=y, ax=ax, - subplots=subplots, sharex=sharex, sharey=sharey, - layout=layout, figsize=figsize, use_index=use_index, - title=title, grid=grid, legend=legend, - style=style, logx=logx, logy=logy, loglog=loglog, - xticks=xticks, yticks=yticks, xlim=xlim, ylim=ylim, - rot=rot, fontsize=fontsize, colormap=colormap, table=table, - yerr=yerr, xerr=xerr, - secondary_y=secondary_y, sort_columns=sort_columns, - **kwds) - - -@Appender(_shared_docs['plot'] % _shared_doc_series_kwargs) -def plot_series(data, kind='line', ax=None, # Series unique - figsize=None, use_index=True, title=None, grid=None, - legend=False, style=None, logx=False, logy=False, loglog=False, - xticks=None, yticks=None, xlim=None, ylim=None, - rot=None, fontsize=None, colormap=None, table=False, - yerr=None, xerr=None, - label=None, secondary_y=False, # Series unique - **kwds): - - import matplotlib.pyplot as plt - """ - If no axes is specified, check whether there are existing figures - If there is no existing figures, _gca() will - create a figure with the default figsize, causing the figsize=parameter to - be ignored. - """ - if ax is None and len(plt.get_fignums()) > 0: - ax = _gca() - ax = MPLPlot._get_ax_layer(ax) - return _plot(data, kind=kind, ax=ax, - figsize=figsize, use_index=use_index, title=title, - grid=grid, legend=legend, - style=style, logx=logx, logy=logy, loglog=loglog, - xticks=xticks, yticks=yticks, xlim=xlim, ylim=ylim, - rot=rot, fontsize=fontsize, colormap=colormap, table=table, - yerr=yerr, xerr=xerr, - label=label, secondary_y=secondary_y, - **kwds) - - -_shared_docs['boxplot'] = """ - Make a box plot from DataFrame column optionally grouped by some columns or - other inputs - - Parameters - ---------- - data : the pandas object holding the data - column : column name or list of names, or vector - Can be any valid input to groupby - by : string or sequence - Column in the DataFrame to group by - ax : Matplotlib axes object, optional - fontsize : int or string - rot : label rotation angle - figsize : A tuple (width, height) in inches - grid : Setting this to True will show the grid - layout : tuple (optional) - (rows, columns) for the layout of the plot - return_type : {None, 'axes', 'dict', 'both'}, default None - The kind of object to return. The default is ``axes`` - 'axes' returns the matplotlib axes the boxplot is drawn on; - 'dict' returns a dictionary whose values are the matplotlib - Lines of the boxplot; - 'both' returns a namedtuple with the axes and dict. - - When grouping with ``by``, a Series mapping columns to ``return_type`` - is returned, unless ``return_type`` is None, in which case a NumPy - array of axes is returned with the same shape as ``layout``. - See the prose documentation for more. 
- - kwds : other plotting keyword arguments to be passed to matplotlib boxplot - function - - Returns - ------- - lines : dict - ax : matplotlib Axes - (ax, lines): namedtuple - - Notes - ----- - Use ``return_type='dict'`` when you want to tweak the appearance - of the lines after plotting. In this case a dict containing the Lines - making up the boxes, caps, fliers, medians, and whiskers is returned. - """ - - -@Appender(_shared_docs['boxplot'] % _shared_doc_kwargs) -def boxplot(data, column=None, by=None, ax=None, fontsize=None, - rot=0, grid=True, figsize=None, layout=None, return_type=None, - **kwds): - - # validate return_type: - if return_type not in BoxPlot._valid_return_types: - raise ValueError("return_type must be {'axes', 'dict', 'both'}") - - from pandas import Series, DataFrame - if isinstance(data, Series): - data = DataFrame({'x': data}) - column = 'x' - - def _get_colors(): - return _get_standard_colors(color=kwds.get('color'), num_colors=1) - - def maybe_color_bp(bp): - if 'color' not in kwds: - from matplotlib.artist import setp - setp(bp['boxes'], color=colors[0], alpha=1) - setp(bp['whiskers'], color=colors[0], alpha=1) - setp(bp['medians'], color=colors[2], alpha=1) - - def plot_group(keys, values, ax): - keys = [pprint_thing(x) for x in keys] - values = [remove_na(v) for v in values] - bp = ax.boxplot(values, **kwds) - if fontsize is not None: - ax.tick_params(axis='both', labelsize=fontsize) - if kwds.get('vert', 1): - ax.set_xticklabels(keys, rotation=rot) - else: - ax.set_yticklabels(keys, rotation=rot) - maybe_color_bp(bp) - - # Return axes in multiplot case, maybe revisit later # 985 - if return_type == 'dict': - return bp - elif return_type == 'both': - return BoxPlot.BP(ax=ax, lines=bp) - else: - return ax - - colors = _get_colors() - if column is None: - columns = None - else: - if isinstance(column, (list, tuple)): - columns = column - else: - columns = [column] - - if by is not None: - # Prefer array return type for 2-D plots to match the subplot layout - # https://github.com/pandas-dev/pandas/pull/12216#issuecomment-241175580 - result = _grouped_plot_by_column(plot_group, data, columns=columns, - by=by, grid=grid, figsize=figsize, - ax=ax, layout=layout, - return_type=return_type) - else: - if return_type is None: - return_type = 'axes' - if layout is not None: - raise ValueError("The 'layout' keyword is not supported when " - "'by' is None") - - if ax is None: - ax = _gca() - data = data._get_numeric_data() - if columns is None: - columns = data.columns - else: - data = data[columns] - - result = plot_group(columns, data.values.T, ax) - ax.grid(grid) - - return result - - -def format_date_labels(ax, rot): - # mini version of autofmt_xdate - try: - for label in ax.get_xticklabels(): - label.set_ha('right') - label.set_rotation(rot) - fig = ax.get_figure() - fig.subplots_adjust(bottom=0.2) - except Exception: # pragma: no cover - pass - - -def scatter_plot(data, x, y, by=None, ax=None, figsize=None, grid=False, - **kwargs): - """ - Make a scatter plot from two DataFrame columns - - Parameters - ---------- - data : DataFrame - x : Column name for the x-axis values - y : Column name for the y-axis values - ax : Matplotlib axis object - figsize : A tuple (width, height) in inches - grid : Setting this to True will show the grid - kwargs : other plotting keyword arguments - To be passed to scatter function - - Returns - ------- - fig : matplotlib.Figure - """ - import matplotlib.pyplot as plt - - # workaround because `c='b'` is hardcoded in matplotlibs scatter 
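# --------------------------------------------------------------------
# [editor's sketch, not part of the original file] The `by` grouping
# path of the boxplot function above, via DataFrame.boxplot.
import numpy as np
import pandas as pd

obs = pd.DataFrame({'vals': np.random.randn(20),
                    'grp': ['x', 'y'] * 10})
obs.boxplot(column='vals', by='grp', rot=45)  # one box per group
# --------------------------------------------------------------------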
method - kwargs.setdefault('c', plt.rcParams['patch.facecolor']) - - def plot_group(group, ax): - xvals = group[x].values - yvals = group[y].values - ax.scatter(xvals, yvals, **kwargs) - ax.grid(grid) - - if by is not None: - fig = _grouped_plot(plot_group, data, by=by, figsize=figsize, ax=ax) - else: - if ax is None: - fig = plt.figure() - ax = fig.add_subplot(111) - else: - fig = ax.get_figure() - plot_group(data, ax) - ax.set_ylabel(pprint_thing(y)) - ax.set_xlabel(pprint_thing(x)) - - ax.grid(grid) - - return fig - - -def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None, - xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, - sharey=False, figsize=None, layout=None, bins=10, **kwds): - """ - Draw histogram of the DataFrame's series using matplotlib / pylab. - - Parameters - ---------- - data : DataFrame - column : string or sequence - If passed, will be used to limit data to a subset of columns - by : object, optional - If passed, then used to form histograms for separate groups - grid : boolean, default True - Whether to show axis grid lines - xlabelsize : int, default None - If specified changes the x-axis label size - xrot : float, default None - rotation of x axis labels - ylabelsize : int, default None - If specified changes the y-axis label size - yrot : float, default None - rotation of y axis labels - ax : matplotlib axes object, default None - sharex : boolean, default True if ax is None else False - In case subplots=True, share x axis and set some x axis labels to - invisible; defaults to True if ax is None otherwise False if an ax - is passed in; Be aware, that passing in both an ax and sharex=True - will alter all x axis labels for all subplots in a figure! - sharey : boolean, default False - In case subplots=True, share y axis and set some y axis labels to - invisible - figsize : tuple - The size of the figure to create in inches by default - layout : tuple, optional - Tuple of (rows, columns) for the layout of the histograms - bins : integer, default 10 - Number of histogram bins to be used - kwds : other plotting keyword arguments - To be passed to hist function - """ - - if by is not None: - axes = grouped_hist(data, column=column, by=by, ax=ax, grid=grid, - figsize=figsize, sharex=sharex, sharey=sharey, - layout=layout, bins=bins, xlabelsize=xlabelsize, - xrot=xrot, ylabelsize=ylabelsize, - yrot=yrot, **kwds) - return axes - - if column is not None: - if not isinstance(column, (list, np.ndarray, Index)): - column = [column] - data = data[column] - data = data._get_numeric_data() - naxes = len(data.columns) - - fig, axes = _subplots(naxes=naxes, ax=ax, squeeze=False, - sharex=sharex, sharey=sharey, figsize=figsize, - layout=layout) - _axes = _flatten(axes) - - for i, col in enumerate(_try_sort(data.columns)): - ax = _axes[i] - ax.hist(data[col].dropna().values, bins=bins, **kwds) - ax.set_title(col) - ax.grid(grid) - - _set_ticks_props(axes, xlabelsize=xlabelsize, xrot=xrot, - ylabelsize=ylabelsize, yrot=yrot) - fig.subplots_adjust(wspace=0.3, hspace=0.3) - - return axes - - -def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None, - xrot=None, ylabelsize=None, yrot=None, figsize=None, - bins=10, **kwds): - """ - Draw histogram of the input series using matplotlib - - Parameters - ---------- - by : object, optional - If passed, then used to form histograms for separate groups - ax : matplotlib axis object - If not passed, uses gca() - grid : boolean, default True - Whether to show axis grid lines - xlabelsize : int, default None - 
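# --------------------------------------------------------------------
# [editor's sketch, not part of the original file] hist_series /
# grouped_hist above: a `by` array splits the Series into one
# histogram per group.
import numpy as np
import pandas as pd

ser = pd.Series(np.random.randn(40))
ser.hist(bins=15)                                # single axes
ser.hist(by=np.repeat(['a', 'b'], 20), bins=15)  # one subplot per group
# --------------------------------------------------------------------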
If specified changes the x-axis label size - xrot : float, default None - rotation of x axis labels - ylabelsize : int, default None - If specified changes the y-axis label size - yrot : float, default None - rotation of y axis labels - figsize : tuple, default None - figure size in inches by default - bins: integer, default 10 - Number of histogram bins to be used - kwds : keywords - To be passed to the actual plotting function - - Notes - ----- - See matplotlib documentation online for more on this - - """ - import matplotlib.pyplot as plt - - if by is None: - if kwds.get('layout', None) is not None: - raise ValueError("The 'layout' keyword is not supported when " - "'by' is None") - # hack until the plotting interface is a bit more unified - fig = kwds.pop('figure', plt.gcf() if plt.get_fignums() else - plt.figure(figsize=figsize)) - if (figsize is not None and tuple(figsize) != - tuple(fig.get_size_inches())): - fig.set_size_inches(*figsize, forward=True) - if ax is None: - ax = fig.gca() - elif ax.get_figure() != fig: - raise AssertionError('passed axis not bound to passed figure') - values = self.dropna().values - - ax.hist(values, bins=bins, **kwds) - ax.grid(grid) - axes = np.array([ax]) - - _set_ticks_props(axes, xlabelsize=xlabelsize, xrot=xrot, - ylabelsize=ylabelsize, yrot=yrot) - - else: - if 'figure' in kwds: - raise ValueError("Cannot pass 'figure' when using the " - "'by' argument, since a new 'Figure' instance " - "will be created") - axes = grouped_hist(self, by=by, ax=ax, grid=grid, figsize=figsize, - bins=bins, xlabelsize=xlabelsize, xrot=xrot, - ylabelsize=ylabelsize, yrot=yrot, **kwds) - - if hasattr(axes, 'ndim'): - if axes.ndim == 1 and len(axes) == 1: - return axes[0] - return axes - - -def grouped_hist(data, column=None, by=None, ax=None, bins=50, figsize=None, - layout=None, sharex=False, sharey=False, rot=90, grid=True, - xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, - **kwargs): - """ - Grouped histogram - - Parameters - ---------- - data: Series/DataFrame - column: object, optional - by: object, optional - ax: axes, optional - bins: int, default 50 - figsize: tuple, optional - layout: optional - sharex: boolean, default False - sharey: boolean, default False - rot: int, default 90 - grid: bool, default True - kwargs: dict, keyword arguments passed to matplotlib.Axes.hist - - Returns - ------- - axes: collection of Matplotlib Axes - """ - def plot_group(group, ax): - ax.hist(group.dropna().values, bins=bins, **kwargs) - - xrot = xrot or rot - - fig, axes = _grouped_plot(plot_group, data, column=column, - by=by, sharex=sharex, sharey=sharey, ax=ax, - figsize=figsize, layout=layout, rot=rot) - - _set_ticks_props(axes, xlabelsize=xlabelsize, xrot=xrot, - ylabelsize=ylabelsize, yrot=yrot) - - fig.subplots_adjust(bottom=0.15, top=0.9, left=0.1, right=0.9, - hspace=0.5, wspace=0.3) - return axes - - -def boxplot_frame_groupby(grouped, subplots=True, column=None, fontsize=None, - rot=0, grid=True, ax=None, figsize=None, - layout=None, **kwds): - """ - Make box plots from DataFrameGroupBy data. 
- - Parameters - ---------- - grouped : Grouped DataFrame - subplots : - * ``False`` - no subplots will be used - * ``True`` - create a subplot for each group - column : column name or list of names, or vector - Can be any valid input to groupby - fontsize : int or string - rot : label rotation angle - grid : Setting this to True will show the grid - ax : Matplotlib axis object, default None - figsize : A tuple (width, height) in inches - layout : tuple (optional) - (rows, columns) for the layout of the plot - kwds : other plotting keyword arguments to be passed to matplotlib boxplot - function - - Returns - ------- - dict of key/value = group key/DataFrame.boxplot return value - or DataFrame.boxplot return value in case subplots=figures=False - - Examples - -------- - >>> import pandas - >>> import numpy as np - >>> import itertools - >>> - >>> tuples = [t for t in itertools.product(range(1000), range(4))] - >>> index = pandas.MultiIndex.from_tuples(tuples, names=['lvl0', 'lvl1']) - >>> data = np.random.randn(len(index),4) - >>> df = pandas.DataFrame(data, columns=list('ABCD'), index=index) - >>> - >>> grouped = df.groupby(level='lvl1') - >>> boxplot_frame_groupby(grouped) - >>> - >>> grouped = df.unstack(level='lvl1').groupby(level=0, axis=1) - >>> boxplot_frame_groupby(grouped, subplots=False) - """ - if subplots is True: - naxes = len(grouped) - fig, axes = _subplots(naxes=naxes, squeeze=False, - ax=ax, sharex=False, sharey=True, - figsize=figsize, layout=layout) - axes = _flatten(axes) - - ret = Series() - for (key, group), ax in zip(grouped, axes): - d = group.boxplot(ax=ax, column=column, fontsize=fontsize, - rot=rot, grid=grid, **kwds) - ax.set_title(pprint_thing(key)) - ret.loc[key] = d - fig.subplots_adjust(bottom=0.15, top=0.9, left=0.1, - right=0.9, wspace=0.2) - else: - from pandas.tools.merge import concat - keys, frames = zip(*grouped) - if grouped.axis == 0: - df = concat(frames, keys=keys, axis=1) - else: - if len(frames) > 1: - df = frames[0].join(frames[1::]) - else: - df = frames[0] - ret = df.boxplot(column=column, fontsize=fontsize, rot=rot, - grid=grid, ax=ax, figsize=figsize, - layout=layout, **kwds) - return ret - - -def _grouped_plot(plotf, data, column=None, by=None, numeric_only=True, - figsize=None, sharex=True, sharey=True, layout=None, - rot=0, ax=None, **kwargs): - from pandas import DataFrame - - if figsize == 'default': - # allowed to specify mpl default with 'default' - warnings.warn("figsize='default' is deprecated. 
Specify figure "
-                      "size by tuple instead", FutureWarning, stacklevel=4)
-        figsize = None
-
-    grouped = data.groupby(by)
-    if column is not None:
-        grouped = grouped[column]
-
-    naxes = len(grouped)
-    fig, axes = _subplots(naxes=naxes, figsize=figsize,
-                          sharex=sharex, sharey=sharey, ax=ax,
-                          layout=layout)
-
-    _axes = _flatten(axes)
-
-    for i, (key, group) in enumerate(grouped):
-        ax = _axes[i]
-        if numeric_only and isinstance(group, DataFrame):
-            group = group._get_numeric_data()
-        plotf(group, ax, **kwargs)
-        ax.set_title(pprint_thing(key))
-
-    return fig, axes
-
-
-def _grouped_plot_by_column(plotf, data, columns=None, by=None,
-                            numeric_only=True, grid=False,
-                            figsize=None, ax=None, layout=None,
-                            return_type=None, **kwargs):
-    grouped = data.groupby(by)
-    if columns is None:
-        if not isinstance(by, (list, tuple)):
-            by = [by]
-        columns = data._get_numeric_data().columns.difference(by)
-    naxes = len(columns)
-    fig, axes = _subplots(naxes=naxes, sharex=True, sharey=True,
-                          figsize=figsize, ax=ax, layout=layout)
-
-    _axes = _flatten(axes)
-
-    result = Series()
-    ax_values = []
-
-    for i, col in enumerate(columns):
-        ax = _axes[i]
-        gp_col = grouped[col]
-        keys, values = zip(*gp_col)
-        re_plotf = plotf(keys, values, ax, **kwargs)
-        ax.set_title(col)
-        ax.set_xlabel(pprint_thing(by))
-        ax_values.append(re_plotf)
-        ax.grid(grid)
-
-    result = Series(ax_values, index=columns)
-
-    # Return axes in multiplot case, maybe revisit later # 985
-    if return_type is None:
-        result = axes
-
-    byline = by[0] if len(by) == 1 else by
-    fig.suptitle('Boxplot grouped by %s' % byline)
-    fig.subplots_adjust(bottom=0.15, top=0.9, left=0.1, right=0.9, wspace=0.2)
-
-    return result
-
-
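# --------------------------------------------------------------------
# [editor's sketch, not part of the original file] Usage of the
# `table` helper defined just below; the import path reflects this
# module's location at the time.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.tools.plotting import table

tbl_df = pd.DataFrame(np.random.rand(3, 2), columns=['a', 'b'])
fig, ax = plt.subplots()
ax.set_axis_off()
table(ax, tbl_df)  # rowLabels/colLabels default to index/columns
# --------------------------------------------------------------------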
-def table(ax, data, rowLabels=None, colLabels=None,
-          **kwargs):
-    """
-    Helper function to convert DataFrame and Series to matplotlib.table
-
-    Parameters
-    ----------
-    `ax`: Matplotlib axes object
-    `data`: DataFrame or Series
-        data for table contents
-    `kwargs`: keywords, optional
-        keyword arguments which are passed to matplotlib.table.table.
-        If `rowLabels` or `colLabels` is not specified, data index or column
-        name will be used.
-
-    Returns
-    -------
-    matplotlib table object
-    """
-    from pandas import DataFrame
-    if isinstance(data, Series):
-        data = DataFrame(data, columns=[data.name])
-    elif isinstance(data, DataFrame):
-        pass
-    else:
-        raise ValueError('Input data must be DataFrame or Series')
-
-    if rowLabels is None:
-        rowLabels = data.index
-
-    if colLabels is None:
-        colLabels = data.columns
-
-    cellText = data.values
-
-    import matplotlib.table
-    table = matplotlib.table.table(ax, cellText=cellText,
-                                   rowLabels=rowLabels,
-                                   colLabels=colLabels, **kwargs)
-    return table
-
-
-def _get_layout(nplots, layout=None, layout_type='box'):
-    if layout is not None:
-        if not isinstance(layout, (tuple, list)) or len(layout) != 2:
-            raise ValueError('Layout must be a tuple of (rows, columns)')
-
-        nrows, ncols = layout
-
-        # Python 2 compat
-        ceil_ = lambda x: int(ceil(x))
-        if nrows == -1 and ncols > 0:
-            layout = nrows, ncols = (ceil_(float(nplots) / ncols), ncols)
-        elif ncols == -1 and nrows > 0:
-            layout = nrows, ncols = (nrows, ceil_(float(nplots) / nrows))
-        elif ncols <= 0 and nrows <= 0:
-            msg = "At least one dimension of layout must be positive"
-            raise ValueError(msg)
-
-        if nrows * ncols < nplots:
-            raise ValueError('Layout of %sx%s must be larger than '
-                             'required size %s' % (nrows, ncols, nplots))
-
-        return layout
-
-    if layout_type == 'single':
-        return (1, 1)
-    elif layout_type == 'horizontal':
-        return (1, nplots)
-    elif layout_type == 'vertical':
-        return (nplots, 1)
-
-    layouts = {1: (1, 1), 2: (1, 2), 3: (2, 2), 4: (2, 2)}
-    try:
-        return layouts[nplots]
-    except KeyError:
-        k = 1
-        while k ** 2 < nplots:
-            k += 1
-
-        if (k - 1) * k >= nplots:
-            return k, (k - 1)
-        else:
-            return k, k
-
-# copied from matplotlib/pyplot.py and modified for pandas.plotting
-
-
-def _subplots(naxes=None, sharex=False, sharey=False, squeeze=True,
-              subplot_kw=None, ax=None, layout=None, layout_type='box',
-              **fig_kw):
-    """Create a figure with a set of subplots already made.
-
-    This utility wrapper makes it convenient to create common layouts of
-    subplots, including the enclosing figure object, in a single call.
-
-    Keyword arguments:
-
-    naxes : int
-      Number of required axes. Exceeded axes are set invisible. Default is
-      nrows * ncols.
-
-    sharex : bool
-      If True, the X axis will be shared amongst all subplots.
-
-    sharey : bool
-      If True, the Y axis will be shared amongst all subplots.
-
-    squeeze : bool
-
-      If True, extra dimensions are squeezed out from the returned axis object:
-        - if only one subplot is constructed (nrows=ncols=1), the resulting
-          single Axis object is returned as a scalar.
-        - Nx1 or 1xN subplots are returned as a 1-d numpy object array of
-          Axis objects.
-        - NxM subplots with N>1 and M>1 are returned as a 2-d array.
-
-      If False, no squeezing at all is done: the returned axis object is always
-      a 2-d array containing Axis instances, even if it ends up being 1x1.
-
-    subplot_kw : dict
-      Dict with keywords passed to the add_subplot() call used to create each
-      subplot.
-
-    ax : Matplotlib axis object, optional
-
-    layout : tuple
-      Number of rows and columns of the subplot grid.
-      If not specified, calculated from naxes and layout_type
-
-    layout_type : {'box', 'horizontal', 'vertical'}, default 'box'
-      Specify how to layout the subplot grid.
-
-    fig_kw : Other keyword arguments to be passed to the figure() call.
-        Note that all keywords not recognized above will be
-        automatically included here.
-
-# copied from matplotlib/pyplot.py and modified for pandas.plotting
-
-
-def _subplots(naxes=None, sharex=False, sharey=False, squeeze=True,
-              subplot_kw=None, ax=None, layout=None, layout_type='box',
-              **fig_kw):
-    """Create a figure with a set of subplots already made.
-
-    This utility wrapper makes it convenient to create common layouts of
-    subplots, including the enclosing figure object, in a single call.
-
-    Keyword arguments:
-
-    naxes : int
-        Number of required axes. Axes in excess of naxes are set invisible.
-        Default is nrows * ncols.
-
-    sharex : bool
-        If True, the X axis will be shared amongst all subplots.
-
-    sharey : bool
-        If True, the Y axis will be shared amongst all subplots.
-
-    squeeze : bool
-
-        If True, extra dimensions are squeezed out from the returned axis
-        object:
-        - if only one subplot is constructed (nrows=ncols=1), the resulting
-          single Axis object is returned as a scalar.
-        - for Nx1 or 1xN subplots, the returned object is a 1-d numpy object
-          array of Axis objects.
-        - NxM subplots with N>1 and M>1 are returned as a 2-d array.
-
-        If False, no squeezing at all is done: the returned axis object is
-        always a 2-d array containing Axis instances, even if it ends up
-        being 1x1.
-
-    subplot_kw : dict
-        Dict with keywords passed to the add_subplot() call used to create
-        each subplot.
-
-    ax : Matplotlib axis object, optional
-
-    layout : tuple
-        Number of rows and columns of the subplot grid.
-        If not specified, calculated from naxes and layout_type.
-
-    layout_type : {'box', 'horizontal', 'vertical'}, default 'box'
-        Specify how to lay out the subplot grid.
-
-    fig_kw : Other keyword arguments to be passed to the figure() call.
-        Note that all keywords not recognized above will be
-        automatically included here.
-
-    Returns:
-
-    fig, ax : tuple
-        - fig is the Matplotlib Figure object
-        - ax can be either a single axis object or an array of axis objects
-          if more than one subplot was created. The dimensions of the
-          resulting array can be controlled with the squeeze keyword, see
-          above.
-
-    **Examples:**
-
-    x = np.linspace(0, 2*np.pi, 400)
-    y = np.sin(x**2)
-
-    # Just a figure and one subplot
-    f, ax = plt.subplots()
-    ax.plot(x, y)
-    ax.set_title('Simple plot')
-
-    # Two subplots, unpack the output array immediately
-    f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
-    ax1.plot(x, y)
-    ax1.set_title('Sharing Y axis')
-    ax2.scatter(x, y)
-
-    # Four polar axes
-    plt.subplots(2, 2, subplot_kw=dict(polar=True))
-    """
-    import matplotlib.pyplot as plt
-
-    if subplot_kw is None:
-        subplot_kw = {}
-
-    if ax is None:
-        fig = plt.figure(**fig_kw)
-    else:
-        if is_list_like(ax):
-            ax = _flatten(ax)
-            if layout is not None:
-                warnings.warn("When passing multiple axes, layout keyword is "
-                              "ignored", UserWarning)
-            if sharex or sharey:
-                warnings.warn("When passing multiple axes, sharex and sharey "
-                              "are ignored. These settings must be specified "
-                              "when creating axes", UserWarning,
-                              stacklevel=4)
-            if len(ax) == naxes:
-                fig = ax[0].get_figure()
-                return fig, ax
-            else:
-                raise ValueError("The number of passed axes must be {0}, the "
-                                 "same as the output plot".format(naxes))
-
-        fig = ax.get_figure()
-        # if ax is passed and the number of subplots is 1, return ax as it is
-        if naxes == 1:
-            if squeeze:
-                return fig, ax
-            else:
-                return fig, _flatten(ax)
-        else:
-            warnings.warn("To output multiple subplots, the figure containing "
-                          "the passed axes is being cleared", UserWarning,
-                          stacklevel=4)
-            fig.clear()
-
-    nrows, ncols = _get_layout(naxes, layout=layout, layout_type=layout_type)
-    nplots = nrows * ncols
-
-    # Create empty object array to hold all axes. It's easiest to make it 1-d
-    # so we can just append subplots upon creation, and then reshape it to the
-    # final (nrows, ncols) grid at the end.
-    axarr = np.empty(nplots, dtype=object)
-
-    # Create first subplot separately, so we can share it if requested
-    ax0 = fig.add_subplot(nrows, ncols, 1, **subplot_kw)
-
-    if sharex:
-        subplot_kw['sharex'] = ax0
-    if sharey:
-        subplot_kw['sharey'] = ax0
-    axarr[0] = ax0
-
-    # Note off-by-one counting because add_subplot uses the MATLAB 1-based
-    # convention.
-    for i in range(1, nplots):
-        kwds = subplot_kw.copy()
-        # Set sharex and sharey to None for blank/dummy axes, these can
-        # interfere with proper axis limits on the visible axes if
-        # they share axes e.g. issue #7528
-        if i >= naxes:
-            kwds['sharex'] = None
-            kwds['sharey'] = None
-        ax = fig.add_subplot(nrows, ncols, i + 1, **kwds)
-        axarr[i] = ax
-
-    if naxes != nplots:
-        for ax in axarr[naxes:]:
-            ax.set_visible(False)
-
-    _handle_shared_axes(axarr, nplots, naxes, nrows, ncols, sharex, sharey)
-
-    if squeeze:
-        # Reshape the array to have the final desired dimension (nrow,ncol),
-        # though discarding unneeded dimensions that equal 1. If we only have
-        # one subplot, just return it instead of a 1-element array.
-        if nplots == 1:
-            axes = axarr[0]
-        else:
-            axes = axarr.reshape(nrows, ncols).squeeze()
-    else:
-        # returned axis array will be always 2-d, even if nrows=ncols=1
-        axes = axarr.reshape(nrows, ncols)
-
-    return fig, axes
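One detail worth noting from `_get_layout`/`_subplots` above: a single layout dimension may be passed as -1 and is then inferred from the number of plots. A usage sketch through the public `DataFrame.plot` API, assuming a matplotlib backend is available:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(20, 3), columns=list('abc'))
# one layout dimension may be -1; _get_layout fills it in as
# ceil(nplots / known_dim), here ceil(3 / 2) == 2 columns
axes = df.plot(subplots=True, layout=(2, -1))
print(axes.shape)  # (2, 2) -- the unused fourth slot is set invisible
```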
-
-
-def _remove_labels_from_axis(axis):
-    for t in axis.get_majorticklabels():
-        t.set_visible(False)
-
-    try:
-        # set_visible will not be effective if
-        # minor axis has NullLocator and NullFormatter (default)
-        import matplotlib.ticker as ticker
-        if isinstance(axis.get_minor_locator(), ticker.NullLocator):
-            axis.set_minor_locator(ticker.AutoLocator())
-        if isinstance(axis.get_minor_formatter(), ticker.NullFormatter):
-            axis.set_minor_formatter(ticker.FormatStrFormatter(''))
-        for t in axis.get_minorticklabels():
-            t.set_visible(False)
-    except Exception:  # pragma: no cover
-        raise
-    axis.get_label().set_visible(False)
-
-
-def _handle_shared_axes(axarr, nplots, naxes, nrows, ncols, sharex, sharey):
-    if nplots > 1:
-
-        if nrows > 1:
-            try:
-                # first find out the ax layout,
-                # so that we can correctly handle 'gaps'
-                layout = np.zeros((nrows + 1, ncols + 1), dtype=np.bool)
-                for ax in axarr:
-                    layout[ax.rowNum, ax.colNum] = ax.get_visible()
-
-                for ax in axarr:
-                    # only the last row of subplots should get x labels -> all
-                    # others off. The layout check handles the case that the
-                    # subplot is the last in its column, because there is no
-                    # subplot/gap below it.
-                    if not layout[ax.rowNum + 1, ax.colNum]:
-                        continue
-                    if sharex or len(ax.get_shared_x_axes()
-                                     .get_siblings(ax)) > 1:
-                        _remove_labels_from_axis(ax.xaxis)
-
-            except IndexError:
-                # if gridspec is used, ax.rowNum and ax.colNum may differ
-                # from the layout shape. In this case, use last_row logic.
-                for ax in axarr:
-                    if ax.is_last_row():
-                        continue
-                    if sharex or len(ax.get_shared_x_axes()
-                                     .get_siblings(ax)) > 1:
-                        _remove_labels_from_axis(ax.xaxis)
-
-        if ncols > 1:
-            for ax in axarr:
-                # only the first column should get y labels -> set all others
-                # off. As we only have labels in the first column and we
-                # always have a subplot there, we can skip the layout test.
-                if ax.is_first_col():
-                    continue
-                if sharey or len(ax.get_shared_y_axes().get_siblings(ax)) > 1:
-                    _remove_labels_from_axis(ax.yaxis)
-
-
-def _flatten(axes):
-    if not is_list_like(axes):
-        return np.array([axes])
-    elif isinstance(axes, (np.ndarray, Index)):
-        return axes.ravel()
-    return np.array(axes)
-
-
-def _get_all_lines(ax):
-    lines = ax.get_lines()
-
-    if hasattr(ax, 'right_ax'):
-        lines += ax.right_ax.get_lines()
-
-    if hasattr(ax, 'left_ax'):
-        lines += ax.left_ax.get_lines()
-
-    return lines
-
-
-def _get_xlim(lines):
-    left, right = np.inf, -np.inf
-    for l in lines:
-        x = l.get_xdata(orig=False)
-        left = min(x[0], left)
-        right = max(x[-1], right)
-    return left, right
-
-
-def _set_ticks_props(axes, xlabelsize=None, xrot=None,
-                     ylabelsize=None, yrot=None):
-    import matplotlib.pyplot as plt
-
-    for ax in _flatten(axes):
-        if xlabelsize is not None:
-            plt.setp(ax.get_xticklabels(), fontsize=xlabelsize)
-        if xrot is not None:
-            plt.setp(ax.get_xticklabels(), rotation=xrot)
-        if ylabelsize is not None:
-            plt.setp(ax.get_yticklabels(), fontsize=ylabelsize)
-        if yrot is not None:
-            plt.setp(ax.get_yticklabels(), rotation=yrot)
-    return axes
-
-
-class BasePlotMethods(PandasObject):
-
-    def __init__(self, data):
-        self._data = data
-
-    def __call__(self, *args, **kwargs):
-        raise NotImplementedError
-
-
-class SeriesPlotMethods(BasePlotMethods):
-    """Series plotting accessor and method
-
-    Examples
-    --------
-    >>> s.plot.line()
-    >>> s.plot.bar()
-    >>>
s.plot.hist() - - Plotting methods can also be accessed by calling the accessor as a method - with the ``kind`` argument: - ``s.plot(kind='line')`` is equivalent to ``s.plot.line()`` - """ - - def __call__(self, kind='line', ax=None, - figsize=None, use_index=True, title=None, grid=None, - legend=False, style=None, logx=False, logy=False, - loglog=False, xticks=None, yticks=None, - xlim=None, ylim=None, - rot=None, fontsize=None, colormap=None, table=False, - yerr=None, xerr=None, - label=None, secondary_y=False, **kwds): - return plot_series(self._data, kind=kind, ax=ax, figsize=figsize, - use_index=use_index, title=title, grid=grid, - legend=legend, style=style, logx=logx, logy=logy, - loglog=loglog, xticks=xticks, yticks=yticks, - xlim=xlim, ylim=ylim, rot=rot, fontsize=fontsize, - colormap=colormap, table=table, yerr=yerr, - xerr=xerr, label=label, secondary_y=secondary_y, - **kwds) - __call__.__doc__ = plot_series.__doc__ - - def line(self, **kwds): - """ - Line plot - - .. versionadded:: 0.17.0 - - Parameters - ---------- - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='line', **kwds) - - def bar(self, **kwds): - """ - Vertical bar plot - - .. versionadded:: 0.17.0 - - Parameters - ---------- - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='bar', **kwds) - - def barh(self, **kwds): - """ - Horizontal bar plot - - .. versionadded:: 0.17.0 - - Parameters - ---------- - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='barh', **kwds) - - def box(self, **kwds): - """ - Boxplot - - .. versionadded:: 0.17.0 - - Parameters - ---------- - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='box', **kwds) - - def hist(self, bins=10, **kwds): - """ - Histogram - - .. versionadded:: 0.17.0 - - Parameters - ---------- - bins: integer, default 10 - Number of histogram bins to be used - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='hist', bins=bins, **kwds) - - def kde(self, **kwds): - """ - Kernel Density Estimate plot - - .. versionadded:: 0.17.0 - - Parameters - ---------- - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='kde', **kwds) - - density = kde - - def area(self, **kwds): - """ - Area plot - - .. versionadded:: 0.17.0 - - Parameters - ---------- - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='area', **kwds) - - def pie(self, **kwds): - """ - Pie chart - - .. versionadded:: 0.17.0 - - Parameters - ---------- - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. 
- - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='pie', **kwds) - - -class FramePlotMethods(BasePlotMethods): - """DataFrame plotting accessor and method - - Examples - -------- - >>> df.plot.line() - >>> df.plot.scatter('x', 'y') - >>> df.plot.hexbin() - - These plotting methods can also be accessed by calling the accessor as a - method with the ``kind`` argument: - ``df.plot(kind='line')`` is equivalent to ``df.plot.line()`` - """ - - def __call__(self, x=None, y=None, kind='line', ax=None, - subplots=False, sharex=None, sharey=False, layout=None, - figsize=None, use_index=True, title=None, grid=None, - legend=True, style=None, logx=False, logy=False, loglog=False, - xticks=None, yticks=None, xlim=None, ylim=None, - rot=None, fontsize=None, colormap=None, table=False, - yerr=None, xerr=None, - secondary_y=False, sort_columns=False, **kwds): - return plot_frame(self._data, kind=kind, x=x, y=y, ax=ax, - subplots=subplots, sharex=sharex, sharey=sharey, - layout=layout, figsize=figsize, use_index=use_index, - title=title, grid=grid, legend=legend, style=style, - logx=logx, logy=logy, loglog=loglog, xticks=xticks, - yticks=yticks, xlim=xlim, ylim=ylim, rot=rot, - fontsize=fontsize, colormap=colormap, table=table, - yerr=yerr, xerr=xerr, secondary_y=secondary_y, - sort_columns=sort_columns, **kwds) - __call__.__doc__ = plot_frame.__doc__ - - def line(self, x=None, y=None, **kwds): - """ - Line plot - - .. versionadded:: 0.17.0 - - Parameters - ---------- - x, y : label or position, optional - Coordinates for each point. - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='line', x=x, y=y, **kwds) - - def bar(self, x=None, y=None, **kwds): - """ - Vertical bar plot - - .. versionadded:: 0.17.0 - - Parameters - ---------- - x, y : label or position, optional - Coordinates for each point. - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='bar', x=x, y=y, **kwds) - - def barh(self, x=None, y=None, **kwds): - """ - Horizontal bar plot - - .. versionadded:: 0.17.0 - - Parameters - ---------- - x, y : label or position, optional - Coordinates for each point. - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='barh', x=x, y=y, **kwds) - - def box(self, by=None, **kwds): - """ - Boxplot - - .. versionadded:: 0.17.0 - - Parameters - ---------- - by : string or sequence - Column in the DataFrame to group by. - \*\*kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='box', by=by, **kwds) - - def hist(self, by=None, bins=10, **kwds): - """ - Histogram - - .. versionadded:: 0.17.0 - - Parameters - ---------- - by : string or sequence - Column in the DataFrame to group by. - bins: integer, default 10 - Number of histogram bins to be used - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='hist', by=by, bins=bins, **kwds) - - def kde(self, **kwds): - """ - Kernel Density Estimate plot - - .. 
versionadded:: 0.17.0 - - Parameters - ---------- - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='kde', **kwds) - - density = kde - - def area(self, x=None, y=None, **kwds): - """ - Area plot - - .. versionadded:: 0.17.0 - - Parameters - ---------- - x, y : label or position, optional - Coordinates for each point. - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='area', x=x, y=y, **kwds) - - def pie(self, y=None, **kwds): - """ - Pie chart - - .. versionadded:: 0.17.0 - - Parameters - ---------- - y : label or position, optional - Column to plot. - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='pie', y=y, **kwds) - - def scatter(self, x, y, s=None, c=None, **kwds): - """ - Scatter plot - - .. versionadded:: 0.17.0 - - Parameters - ---------- - x, y : label or position, optional - Coordinates for each point. - s : scalar or array_like, optional - Size of each point. - c : label or position, optional - Color of each point. - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. - - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - return self(kind='scatter', x=x, y=y, c=c, s=s, **kwds) - - def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None, - **kwds): - """ - Hexbin plot - - .. versionadded:: 0.17.0 - - Parameters - ---------- - x, y : label or position, optional - Coordinates for each point. - C : label or position, optional - The value at each `(x, y)` point. - reduce_C_function : callable, optional - Function of one argument that reduces all the values in a bin to - a single number (e.g. `mean`, `max`, `sum`, `std`). - gridsize : int, optional - Number of bins. - **kwds : optional - Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. 
- - Returns - ------- - axes : matplotlib.AxesSubplot or np.array of them - """ - if reduce_C_function is not None: - kwds['reduce_C_function'] = reduce_C_function - if gridsize is not None: - kwds['gridsize'] = gridsize - return self(kind='hexbin', x=x, y=y, C=C, **kwds) - - -if __name__ == '__main__': - # import pandas.rpy.common as com - # sales = com.load_data('sanfrancisco.home.sales', package='nutshell') - # top10 = sales['zip'].value_counts()[:10].index - # sales2 = sales[sales.zip.isin(top10)] - # _ = scatter_plot(sales2, 'squarefeet', 'price', by='zip') - # plt.show() +import pandas.plotting as _plotting - import matplotlib.pyplot as plt +# back-compat of public API +# deprecate these functions +m = sys.modules['pandas.tools.plotting'] +for t in [t for t in dir(_plotting) if not t.startswith('_')]: - import pandas.tools.plotting as plots - import pandas.core.frame as fr - reload(plots) # noqa - reload(fr) # noqa - from pandas.core.frame import DataFrame + def outer(t=t): - data = DataFrame([[3, 6, -5], [4, 8, 2], [4, 9, -6], - [4, 9, -3], [2, 5, -1]], - columns=['A', 'B', 'C']) - data.plot(kind='barh', stacked=True) + def wrapper(*args, **kwargs): + warnings.warn("'pandas.tools.plotting.{t}' is deprecated, " + "import 'pandas.plotting.{t}' instead.".format(t=t), + FutureWarning, stacklevel=2) + return getattr(_plotting, t)(*args, **kwargs) + return wrapper - plt.show() + setattr(m, t, outer(t)) diff --git a/pandas/tools/tests/test_tile.py b/pandas/tools/tests/test_tile.py deleted file mode 100644 index 5c7cee862ccd3..0000000000000 --- a/pandas/tools/tests/test_tile.py +++ /dev/null @@ -1,358 +0,0 @@ -import os -import nose - -import numpy as np -from pandas.compat import zip - -from pandas import Series, Index -import pandas.util.testing as tm -from pandas.util.testing import assertRaisesRegexp -import pandas.core.common as com - -from pandas.core.algorithms import quantile -from pandas.tools.tile import cut, qcut -import pandas.tools.tile as tmod -from pandas import to_datetime, DatetimeIndex, Timestamp - - -class TestCut(tm.TestCase): - - def test_simple(self): - data = np.ones(5) - result = cut(data, 4, labels=False) - desired = np.array([1, 1, 1, 1, 1]) - tm.assert_numpy_array_equal(result, desired, - check_dtype=False) - - def test_bins(self): - data = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]) - result, bins = cut(data, 3, retbins=True) - - exp_codes = np.array([0, 0, 0, 1, 2, 0], dtype=np.int8) - tm.assert_numpy_array_equal(result.codes, exp_codes) - exp = np.array([0.1905, 3.36666667, 6.53333333, 9.7]) - tm.assert_almost_equal(bins, exp) - - def test_right(self): - data = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1, 2.575]) - result, bins = cut(data, 4, right=True, retbins=True) - exp_codes = np.array([0, 0, 0, 2, 3, 0, 0], dtype=np.int8) - tm.assert_numpy_array_equal(result.codes, exp_codes) - exp = np.array([0.1905, 2.575, 4.95, 7.325, 9.7]) - tm.assert_numpy_array_equal(bins, exp) - - def test_noright(self): - data = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1, 2.575]) - result, bins = cut(data, 4, right=False, retbins=True) - exp_codes = np.array([0, 0, 0, 2, 3, 0, 1], dtype=np.int8) - tm.assert_numpy_array_equal(result.codes, exp_codes) - exp = np.array([0.2, 2.575, 4.95, 7.325, 9.7095]) - tm.assert_almost_equal(bins, exp) - - def test_arraylike(self): - data = [.2, 1.4, 2.5, 6.2, 9.7, 2.1] - result, bins = cut(data, 3, retbins=True) - exp_codes = np.array([0, 0, 0, 1, 2, 0], dtype=np.int8) - tm.assert_numpy_array_equal(result.codes, exp_codes) - exp = np.array([0.1905, 3.36666667, 
6.53333333, 9.7]) - tm.assert_almost_equal(bins, exp) - - def test_bins_not_monotonic(self): - data = [.2, 1.4, 2.5, 6.2, 9.7, 2.1] - self.assertRaises(ValueError, cut, data, [0.1, 1.5, 1, 10]) - - def test_wrong_num_labels(self): - data = [.2, 1.4, 2.5, 6.2, 9.7, 2.1] - self.assertRaises(ValueError, cut, data, [0, 1, 10], - labels=['foo', 'bar', 'baz']) - - def test_cut_corner(self): - # h3h - self.assertRaises(ValueError, cut, [], 2) - - self.assertRaises(ValueError, cut, [1, 2, 3], 0.5) - - def test_cut_out_of_range_more(self): - # #1511 - s = Series([0, -1, 0, 1, -3], name='x') - ind = cut(s, [0, 1], labels=False) - exp = Series([np.nan, np.nan, np.nan, 0, np.nan], name='x') - tm.assert_series_equal(ind, exp) - - def test_labels(self): - arr = np.tile(np.arange(0, 1.01, 0.1), 4) - - result, bins = cut(arr, 4, retbins=True) - ex_levels = Index(['(-0.001, 0.25]', '(0.25, 0.5]', '(0.5, 0.75]', - '(0.75, 1]']) - self.assert_index_equal(result.categories, ex_levels) - - result, bins = cut(arr, 4, retbins=True, right=False) - ex_levels = Index(['[0, 0.25)', '[0.25, 0.5)', '[0.5, 0.75)', - '[0.75, 1.001)']) - self.assert_index_equal(result.categories, ex_levels) - - def test_cut_pass_series_name_to_factor(self): - s = Series(np.random.randn(100), name='foo') - - factor = cut(s, 4) - self.assertEqual(factor.name, 'foo') - - def test_label_precision(self): - arr = np.arange(0, 0.73, 0.01) - - result = cut(arr, 4, precision=2) - ex_levels = Index(['(-0.00072, 0.18]', '(0.18, 0.36]', - '(0.36, 0.54]', '(0.54, 0.72]']) - self.assert_index_equal(result.categories, ex_levels) - - def test_na_handling(self): - arr = np.arange(0, 0.75, 0.01) - arr[::3] = np.nan - - result = cut(arr, 4) - - result_arr = np.asarray(result) - - ex_arr = np.where(com.isnull(arr), np.nan, result_arr) - - tm.assert_almost_equal(result_arr, ex_arr) - - result = cut(arr, 4, labels=False) - ex_result = np.where(com.isnull(arr), np.nan, result) - tm.assert_almost_equal(result, ex_result) - - def test_inf_handling(self): - data = np.arange(6) - data_ser = Series(data, dtype='int64') - - result = cut(data, [-np.inf, 2, 4, np.inf]) - result_ser = cut(data_ser, [-np.inf, 2, 4, np.inf]) - - ex_categories = Index(['(-inf, 2]', '(2, 4]', '(4, inf]']) - - tm.assert_index_equal(result.categories, ex_categories) - tm.assert_index_equal(result_ser.cat.categories, ex_categories) - self.assertEqual(result[5], '(4, inf]') - self.assertEqual(result[0], '(-inf, 2]') - self.assertEqual(result_ser[5], '(4, inf]') - self.assertEqual(result_ser[0], '(-inf, 2]') - - def test_qcut(self): - arr = np.random.randn(1000) - - labels, bins = qcut(arr, 4, retbins=True) - ex_bins = quantile(arr, [0, .25, .5, .75, 1.]) - tm.assert_almost_equal(bins, ex_bins) - - ex_levels = cut(arr, ex_bins, include_lowest=True) - self.assert_categorical_equal(labels, ex_levels) - - def test_qcut_bounds(self): - arr = np.random.randn(1000) - - factor = qcut(arr, 10, labels=False) - self.assertEqual(len(np.unique(factor)), 10) - - def test_qcut_specify_quantiles(self): - arr = np.random.randn(100) - - factor = qcut(arr, [0, .25, .5, .75, 1.]) - expected = qcut(arr, 4) - tm.assert_categorical_equal(factor, expected) - - def test_qcut_all_bins_same(self): - assertRaisesRegexp(ValueError, "edges.*unique", qcut, - [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 3) - - def test_cut_out_of_bounds(self): - arr = np.random.randn(100) - - result = cut(arr, [-1, 0, 1]) - - mask = result.codes == -1 - ex_mask = (arr < -1) | (arr > 1) - self.assert_numpy_array_equal(mask, ex_mask) - - def 
test_cut_pass_labels(self): - arr = [50, 5, 10, 15, 20, 30, 70] - bins = [0, 25, 50, 100] - labels = ['Small', 'Medium', 'Large'] - - result = cut(arr, bins, labels=labels) - - exp = cut(arr, bins) - exp.categories = labels - - tm.assert_categorical_equal(result, exp) - - def test_qcut_include_lowest(self): - values = np.arange(10) - - cats = qcut(values, 4) - - ex_levels = ['[0, 2.25]', '(2.25, 4.5]', '(4.5, 6.75]', '(6.75, 9]'] - self.assertTrue((cats.categories == ex_levels).all()) - - def test_qcut_nas(self): - arr = np.random.randn(100) - arr[:20] = np.nan - - result = qcut(arr, 4) - self.assertTrue(com.isnull(result[:20]).all()) - - def test_label_formatting(self): - self.assertEqual(tmod._trim_zeros('1.000'), '1') - - # it works - result = cut(np.arange(11.), 2) - - result = cut(np.arange(11.) / 1e10, 2) - - # #1979, negative numbers - - result = tmod._format_label(-117.9998, precision=3) - self.assertEqual(result, '-118') - result = tmod._format_label(117.9998, precision=3) - self.assertEqual(result, '118') - - def test_qcut_binning_issues(self): - # #1978, 1979 - path = os.path.join(tm.get_data_path(), 'cut_data.csv') - arr = np.loadtxt(path) - - result = qcut(arr, 20) - - starts = [] - ends = [] - for lev in result.categories: - s, e = lev[1:-1].split(',') - - self.assertTrue(s != e) - - starts.append(float(s)) - ends.append(float(e)) - - for (sp, sn), (ep, en) in zip(zip(starts[:-1], starts[1:]), - zip(ends[:-1], ends[1:])): - self.assertTrue(sp < sn) - self.assertTrue(ep < en) - self.assertTrue(ep <= sn) - - def test_cut_return_categorical(self): - from pandas import Categorical - s = Series([0, 1, 2, 3, 4, 5, 6, 7, 8]) - res = cut(s, 3) - exp = Series(Categorical.from_codes([0, 0, 0, 1, 1, 1, 2, 2, 2], - ["(-0.008, 2.667]", - "(2.667, 5.333]", "(5.333, 8]"], - ordered=True)) - tm.assert_series_equal(res, exp) - - def test_qcut_return_categorical(self): - from pandas import Categorical - s = Series([0, 1, 2, 3, 4, 5, 6, 7, 8]) - res = qcut(s, [0, 0.333, 0.666, 1]) - exp = Series(Categorical.from_codes([0, 0, 0, 1, 1, 1, 2, 2, 2], - ["[0, 2.664]", - "(2.664, 5.328]", "(5.328, 8]"], - ordered=True)) - tm.assert_series_equal(res, exp) - - def test_series_retbins(self): - # GH 8589 - s = Series(np.arange(4)) - result, bins = cut(s, 2, retbins=True) - tm.assert_numpy_array_equal(result.cat.codes.values, - np.array([0, 0, 1, 1], dtype=np.int8)) - tm.assert_numpy_array_equal(bins, np.array([-0.003, 1.5, 3])) - - result, bins = qcut(s, 2, retbins=True) - tm.assert_numpy_array_equal(result.cat.codes.values, - np.array([0, 0, 1, 1], dtype=np.int8)) - tm.assert_numpy_array_equal(bins, np.array([0, 1.5, 3])) - - def test_qcut_duplicates_bin(self): - # GH 7751 - values = [0, 0, 0, 0, 1, 2, 3] - result_levels = ['[0, 1]', '(1, 3]'] - - cats = qcut(values, 3, duplicates='drop') - self.assertTrue((cats.categories == result_levels).all()) - - self.assertRaises(ValueError, qcut, values, 3) - self.assertRaises(ValueError, qcut, values, 3, duplicates='raise') - - # invalid - self.assertRaises(ValueError, qcut, values, 3, duplicates='foo') - - def test_single_bin(self): - # issue 14652 - expected = Series([0, 0]) - - s = Series([9., 9.]) - result = cut(s, 1, labels=False) - tm.assert_series_equal(result, expected) - - s = Series([-9., -9.]) - result = cut(s, 1, labels=False) - tm.assert_series_equal(result, expected) - - def test_datetime_cut(self): - # GH 14714 - # testing for time data to be present as series - data = to_datetime(Series(['2013-01-01', '2013-01-02', '2013-01-03'])) - result, bins 
= cut(data, 3, retbins=True) - expected = Series(['(2012-12-31 23:57:07.200000, 2013-01-01 16:00:00]', - '(2013-01-01 16:00:00, 2013-01-02 08:00:00]', - '(2013-01-02 08:00:00, 2013-01-03 00:00:00]'], - ).astype("category", ordered=True) - tm.assert_series_equal(result, expected) - - # testing for time data to be present as list - data = [np.datetime64('2013-01-01'), np.datetime64('2013-01-02'), - np.datetime64('2013-01-03')] - result, bins = cut(data, 3, retbins=True) - tm.assert_series_equal(Series(result), expected) - - # testing for time data to be present as ndarray - data = np.array([np.datetime64('2013-01-01'), - np.datetime64('2013-01-02'), - np.datetime64('2013-01-03')]) - result, bins = cut(data, 3, retbins=True) - tm.assert_series_equal(Series(result), expected) - - # testing for time data to be present as datetime index - data = DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03']) - result, bins = cut(data, 3, retbins=True) - tm.assert_series_equal(Series(result), expected) - - def test_datetime_bin(self): - data = [np.datetime64('2012-12-13'), np.datetime64('2012-12-15')] - bin_data = ['2012-12-12', '2012-12-14', '2012-12-16'] - expected = Series(['(2012-12-12 00:00:00, 2012-12-14 00:00:00]', - '(2012-12-14 00:00:00, 2012-12-16 00:00:00]'], - ).astype("category", ordered=True) - - for conv in [Timestamp, Timestamp, np.datetime64]: - bins = [conv(v) for v in bin_data] - result = cut(data, bins=bins) - tm.assert_series_equal(Series(result), expected) - - bin_pydatetime = [Timestamp(v).to_pydatetime() for v in bin_data] - result = cut(data, bins=bin_pydatetime) - tm.assert_series_equal(Series(result), expected) - - bins = to_datetime(bin_data) - result = cut(data, bins=bin_pydatetime) - tm.assert_series_equal(Series(result), expected) - - -def curpath(): - pth, _ = os.path.split(os.path.abspath(__file__)) - return pth - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tseries/api.py b/pandas/tseries/api.py index 9a07983b4d951..2094791ecdc60 100644 --- a/pandas/tseries/api.py +++ b/pandas/tseries/api.py @@ -1,14 +1,8 @@ """ - +Timeseries API """ # flake8: noqa -from pandas.tseries.index import DatetimeIndex, date_range, bdate_range from pandas.tseries.frequencies import infer_freq -from pandas.tseries.tdi import Timedelta, TimedeltaIndex, timedelta_range -from pandas.tseries.period import Period, PeriodIndex, period_range, pnow -from pandas.tseries.resample import TimeGrouper -from pandas.tseries.timedeltas import to_timedelta -from pandas.lib import NaT import pandas.tseries.offsets as offsets diff --git a/pandas/tseries/converter.py b/pandas/tseries/converter.py index 95ff9578fa3ee..df603c4d880d8 100644 --- a/pandas/tseries/converter.py +++ b/pandas/tseries/converter.py @@ -1,1002 +1,11 @@ -from datetime import datetime, timedelta -import datetime as pydt -import numpy as np - -from dateutil.relativedelta import relativedelta - -import matplotlib.units as units -import matplotlib.dates as dates - -from matplotlib.ticker import Formatter, AutoLocator, Locator -from matplotlib.transforms import nonsingular - - -from pandas.types.common import (is_float, is_integer, - is_integer_dtype, - is_float_dtype, - is_datetime64_ns_dtype, - is_period_arraylike, - ) - -from pandas.compat import lrange -import pandas.compat as compat -import pandas.lib as lib -import pandas.core.common as com -from pandas.core.index import Index - -from pandas.core.series import Series -from pandas.tseries.index import 
date_range -import pandas.tseries.tools as tools -import pandas.tseries.frequencies as frequencies -from pandas.tseries.frequencies import FreqGroup -from pandas.tseries.period import Period, PeriodIndex - -# constants -HOURS_PER_DAY = 24. -MIN_PER_HOUR = 60. -SEC_PER_MIN = 60. - -SEC_PER_HOUR = SEC_PER_MIN * MIN_PER_HOUR -SEC_PER_DAY = SEC_PER_HOUR * HOURS_PER_DAY - -MUSEC_PER_DAY = 1e6 * SEC_PER_DAY - - -def _mpl_le_2_0_0(): - try: - import matplotlib - return matplotlib.compare_versions('2.0.0', matplotlib.__version__) - except ImportError: - return False - - -def register(): - units.registry[lib.Timestamp] = DatetimeConverter() - units.registry[Period] = PeriodConverter() - units.registry[pydt.datetime] = DatetimeConverter() - units.registry[pydt.date] = DatetimeConverter() - units.registry[pydt.time] = TimeConverter() - units.registry[np.datetime64] = DatetimeConverter() - - -def _to_ordinalf(tm): - tot_sec = (tm.hour * 3600 + tm.minute * 60 + tm.second + - float(tm.microsecond / 1e6)) - return tot_sec - - -def time2num(d): - if isinstance(d, compat.string_types): - parsed = tools.to_datetime(d) - if not isinstance(parsed, datetime): - raise ValueError('Could not parse time %s' % d) - return _to_ordinalf(parsed.time()) - if isinstance(d, pydt.time): - return _to_ordinalf(d) - return d - - -class TimeConverter(units.ConversionInterface): - - @staticmethod - def convert(value, unit, axis): - valid_types = (str, pydt.time) - if (isinstance(value, valid_types) or is_integer(value) or - is_float(value)): - return time2num(value) - if isinstance(value, Index): - return value.map(time2num) - if isinstance(value, (list, tuple, np.ndarray, Index)): - return [time2num(x) for x in value] - return value - - @staticmethod - def axisinfo(unit, axis): - if unit != 'time': - return None - - majloc = AutoLocator() - majfmt = TimeFormatter(majloc) - return units.AxisInfo(majloc=majloc, majfmt=majfmt, label='time') - - @staticmethod - def default_units(x, axis): - return 'time' - - -# time formatter -class TimeFormatter(Formatter): - - def __init__(self, locs): - self.locs = locs - - def __call__(self, x, pos=0): - fmt = '%H:%M:%S' - s = int(x) - ms = int((x - s) * 1e3) - us = int((x - s) * 1e6 - ms) - m, s = divmod(s, 60) - h, m = divmod(m, 60) - _, h = divmod(h, 24) - if us != 0: - fmt += '.%6f' - elif ms != 0: - fmt += '.%3f' - - return pydt.time(h, m, s, us).strftime(fmt) - - -# Period Conversion - - -class PeriodConverter(dates.DateConverter): - - @staticmethod - def convert(values, units, axis): - if not hasattr(axis, 'freq'): - raise TypeError('Axis must have `freq` set to convert to Periods') - valid_types = (compat.string_types, datetime, - Period, pydt.date, pydt.time) - if (isinstance(values, valid_types) or is_integer(values) or - is_float(values)): - return get_datevalue(values, axis.freq) - if isinstance(values, PeriodIndex): - return values.asfreq(axis.freq)._values - if isinstance(values, Index): - return values.map(lambda x: get_datevalue(x, axis.freq)) - if is_period_arraylike(values): - return PeriodIndex(values, freq=axis.freq)._values - if isinstance(values, (list, tuple, np.ndarray, Index)): - return [get_datevalue(x, axis.freq) for x in values] - return values - - -def get_datevalue(date, freq): - if isinstance(date, Period): - return date.asfreq(freq).ordinal - elif isinstance(date, (compat.string_types, datetime, - pydt.date, pydt.time)): - return Period(date, freq).ordinal - elif (is_integer(date) or is_float(date) or - (isinstance(date, (np.ndarray, Index)) and (date.size == 
1))): - return date - elif date is None: - return None - raise ValueError("Unrecognizable date '%s'" % date) - - -def _dt_to_float_ordinal(dt): - """ - Convert :mod:`datetime` to the Gregorian date as UTC float days, - preserving hours, minutes, seconds and microseconds. Return value - is a :func:`float`. - """ - if (isinstance(dt, (np.ndarray, Index, Series) - ) and is_datetime64_ns_dtype(dt)): - base = dates.epoch2num(dt.asi8 / 1.0E9) - else: - base = dates.date2num(dt) - return base - - -# Datetime Conversion -class DatetimeConverter(dates.DateConverter): - - @staticmethod - def convert(values, unit, axis): - def try_parse(values): - try: - return _dt_to_float_ordinal(tools.to_datetime(values)) - except Exception: - return values - - if isinstance(values, (datetime, pydt.date)): - return _dt_to_float_ordinal(values) - elif isinstance(values, np.datetime64): - return _dt_to_float_ordinal(lib.Timestamp(values)) - elif isinstance(values, pydt.time): - return dates.date2num(values) - elif (is_integer(values) or is_float(values)): - return values - elif isinstance(values, compat.string_types): - return try_parse(values) - elif isinstance(values, (list, tuple, np.ndarray, Index)): - if isinstance(values, Index): - values = values.values - if not isinstance(values, np.ndarray): - values = com._asarray_tuplesafe(values) - - if is_integer_dtype(values) or is_float_dtype(values): - return values - - try: - values = tools.to_datetime(values) - if isinstance(values, Index): - values = _dt_to_float_ordinal(values) - else: - values = [_dt_to_float_ordinal(x) for x in values] - except Exception: - values = _dt_to_float_ordinal(values) - - return values - - @staticmethod - def axisinfo(unit, axis): - """ - Return the :class:`~matplotlib.units.AxisInfo` for *unit*. - - *unit* is a tzinfo instance or None. - The *axis* argument is required but not used. - """ - tz = unit - - majloc = PandasAutoDateLocator(tz=tz) - majfmt = PandasAutoDateFormatter(majloc, tz=tz) - datemin = pydt.date(2000, 1, 1) - datemax = pydt.date(2010, 1, 1) - - return units.AxisInfo(majloc=majloc, majfmt=majfmt, label='', - default_limits=(datemin, datemax)) - - -class PandasAutoDateFormatter(dates.AutoDateFormatter): - - def __init__(self, locator, tz=None, defaultfmt='%Y-%m-%d'): - dates.AutoDateFormatter.__init__(self, locator, tz, defaultfmt) - # matplotlib.dates._UTC has no _utcoffset called by pandas - if self._tz is dates.UTC: - self._tz._utcoffset = self._tz.utcoffset(None) - - # For mpl > 2.0 the format strings are controlled via rcparams - # so do not mess with them. For mpl < 2.0 change the second - # break point and add a musec break point - if _mpl_le_2_0_0(): - self.scaled[1. / SEC_PER_DAY] = '%H:%M:%S' - self.scaled[1. / MUSEC_PER_DAY] = '%H:%M:%S.%f' - - -class PandasAutoDateLocator(dates.AutoDateLocator): - - def get_locator(self, dmin, dmax): - 'Pick the best locator based on a distance.' - delta = relativedelta(dmax, dmin) - - num_days = ((delta.years * 12.0) + delta.months * 31.0) + delta.days - num_sec = (delta.hours * 60.0 + delta.minutes) * 60.0 + delta.seconds - tot_sec = num_days * 86400. 
+ num_sec - - if abs(tot_sec) < self.minticks: - self._freq = -1 - locator = MilliSecondLocator(self.tz) - locator.set_axis(self.axis) - - locator.set_view_interval(*self.axis.get_view_interval()) - locator.set_data_interval(*self.axis.get_data_interval()) - return locator - - return dates.AutoDateLocator.get_locator(self, dmin, dmax) - - def _get_unit(self): - return MilliSecondLocator.get_unit_generic(self._freq) - - -class MilliSecondLocator(dates.DateLocator): - - UNIT = 1. / (24 * 3600 * 1000) - - def __init__(self, tz): - dates.DateLocator.__init__(self, tz) - self._interval = 1. - - def _get_unit(self): - return self.get_unit_generic(-1) - - @staticmethod - def get_unit_generic(freq): - unit = dates.RRuleLocator.get_unit_generic(freq) - if unit < 0: - return MilliSecondLocator.UNIT - return unit - - def __call__(self): - # if no data have been set, this will tank with a ValueError - try: - dmin, dmax = self.viewlim_to_dt() - except ValueError: - return [] - - if dmin > dmax: - dmax, dmin = dmin, dmax - # We need to cap at the endpoints of valid datetime - - # TODO(wesm) unused? - # delta = relativedelta(dmax, dmin) - # try: - # start = dmin - delta - # except ValueError: - # start = _from_ordinal(1.0) - - # try: - # stop = dmax + delta - # except ValueError: - # # The magic number! - # stop = _from_ordinal(3652059.9999999) - - nmax, nmin = dates.date2num((dmax, dmin)) - - num = (nmax - nmin) * 86400 * 1000 - max_millis_ticks = 6 - for interval in [1, 10, 50, 100, 200, 500]: - if num <= interval * (max_millis_ticks - 1): - self._interval = interval - break - else: - # We went through the whole loop without breaking, default to 1 - self._interval = 1000. - - estimate = (nmax - nmin) / (self._get_unit() * self._get_interval()) - - if estimate > self.MAXTICKS * 2: - raise RuntimeError(('MillisecondLocator estimated to generate %d ' - 'ticks from %s to %s: exceeds Locator.MAXTICKS' - '* 2 (%d) ') % - (estimate, dmin, dmax, self.MAXTICKS * 2)) - - freq = '%dL' % self._get_interval() - tz = self.tz.tzname(None) - st = _from_ordinal(dates.date2num(dmin)) # strip tz - ed = _from_ordinal(dates.date2num(dmax)) - all_dates = date_range(start=st, end=ed, freq=freq, tz=tz).asobject - - try: - if len(all_dates) > 0: - locs = self.raise_if_exceeds(dates.date2num(all_dates)) - return locs - except Exception: # pragma: no cover - pass - - lims = dates.date2num([dmin, dmax]) - return lims - - def _get_interval(self): - return self._interval - - def autoscale(self): - """ - Set the view limits to include the data range. - """ - dmin, dmax = self.datalim_to_dt() - if dmin > dmax: - dmax, dmin = dmin, dmax - - # We need to cap at the endpoints of valid datetime - - # TODO(wesm): unused? - - # delta = relativedelta(dmax, dmin) - # try: - # start = dmin - delta - # except ValueError: - # start = _from_ordinal(1.0) - - # try: - # stop = dmax + delta - # except ValueError: - # # The magic number! 
- # stop = _from_ordinal(3652059.9999999) - - dmin, dmax = self.datalim_to_dt() - - vmin = dates.date2num(dmin) - vmax = dates.date2num(dmax) - - return self.nonsingular(vmin, vmax) - - -def _from_ordinal(x, tz=None): - ix = int(x) - dt = datetime.fromordinal(ix) - remainder = float(x) - ix - hour, remainder = divmod(24 * remainder, 1) - minute, remainder = divmod(60 * remainder, 1) - second, remainder = divmod(60 * remainder, 1) - microsecond = int(1e6 * remainder) - if microsecond < 10: - microsecond = 0 # compensate for rounding errors - dt = datetime(dt.year, dt.month, dt.day, int(hour), int(minute), - int(second), microsecond) - if tz is not None: - dt = dt.astimezone(tz) - - if microsecond > 999990: # compensate for rounding errors - dt += timedelta(microseconds=1e6 - microsecond) - - return dt - -# Fixed frequency dynamic tick locators and formatters - -# ------------------------------------------------------------------------- -# --- Locators --- -# ------------------------------------------------------------------------- - - -def _get_default_annual_spacing(nyears): - """ - Returns a default spacing between consecutive ticks for annual data. - """ - if nyears < 11: - (min_spacing, maj_spacing) = (1, 1) - elif nyears < 20: - (min_spacing, maj_spacing) = (1, 2) - elif nyears < 50: - (min_spacing, maj_spacing) = (1, 5) - elif nyears < 100: - (min_spacing, maj_spacing) = (5, 10) - elif nyears < 200: - (min_spacing, maj_spacing) = (5, 25) - elif nyears < 600: - (min_spacing, maj_spacing) = (10, 50) - else: - factor = nyears // 1000 + 1 - (min_spacing, maj_spacing) = (factor * 20, factor * 100) - return (min_spacing, maj_spacing) - - -def period_break(dates, period): - """ - Returns the indices where the given period changes. - - Parameters - ---------- - dates : PeriodIndex - Array of intervals to monitor. - period : string - Name of the period to monitor. - """ - current = getattr(dates, period) - previous = getattr(dates - 1, period) - return (current - previous).nonzero()[0] - - -def has_level_label(label_flags, vmin): - """ - Returns true if the ``label_flags`` indicate there is at least one label - for this level. - - if the minimum view limit is not an exact integer, then the first tick - label won't be shown, so we must adjust for that. 
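-
-    For example, ``label_flags=np.array([0])`` with ``vmin=0.2`` means the
-    only flagged label sits at the clipped first tick, so this returns False;
-    with ``vmin=0.0`` that label stays visible and this returns True.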
- """ - if label_flags.size == 0 or (label_flags.size == 1 and - label_flags[0] == 0 and - vmin % 1 > 0.0): - return False - else: - return True - - -def _daily_finder(vmin, vmax, freq): - periodsperday = -1 - - if freq >= FreqGroup.FR_HR: - if freq == FreqGroup.FR_NS: - periodsperday = 24 * 60 * 60 * 1000000000 - elif freq == FreqGroup.FR_US: - periodsperday = 24 * 60 * 60 * 1000000 - elif freq == FreqGroup.FR_MS: - periodsperday = 24 * 60 * 60 * 1000 - elif freq == FreqGroup.FR_SEC: - periodsperday = 24 * 60 * 60 - elif freq == FreqGroup.FR_MIN: - periodsperday = 24 * 60 - elif freq == FreqGroup.FR_HR: - periodsperday = 24 - else: # pragma: no cover - raise ValueError("unexpected frequency: %s" % freq) - periodsperyear = 365 * periodsperday - periodspermonth = 28 * periodsperday - - elif freq == FreqGroup.FR_BUS: - periodsperyear = 261 - periodspermonth = 19 - elif freq == FreqGroup.FR_DAY: - periodsperyear = 365 - periodspermonth = 28 - elif frequencies.get_freq_group(freq) == FreqGroup.FR_WK: - periodsperyear = 52 - periodspermonth = 3 - else: # pragma: no cover - raise ValueError("unexpected frequency") - - # save this for later usage - vmin_orig = vmin - - (vmin, vmax) = (Period(ordinal=int(vmin), freq=freq), - Period(ordinal=int(vmax), freq=freq)) - span = vmax.ordinal - vmin.ordinal + 1 - dates_ = PeriodIndex(start=vmin, end=vmax, freq=freq) - # Initialize the output - info = np.zeros(span, - dtype=[('val', np.int64), ('maj', bool), - ('min', bool), ('fmt', '|S20')]) - info['val'][:] = dates_._values - info['fmt'][:] = '' - info['maj'][[0, -1]] = True - # .. and set some shortcuts - info_maj = info['maj'] - info_min = info['min'] - info_fmt = info['fmt'] - - def first_label(label_flags): - if (label_flags[0] == 0) and (label_flags.size > 1) and \ - ((vmin_orig % 1) > 0.0): - return label_flags[1] - else: - return label_flags[0] - - # Case 1. 
Less than a month - if span <= periodspermonth: - day_start = period_break(dates_, 'day') - month_start = period_break(dates_, 'month') - - def _hour_finder(label_interval, force_year_start): - _hour = dates_.hour - _prev_hour = (dates_ - 1).hour - hour_start = (_hour - _prev_hour) != 0 - info_maj[day_start] = True - info_min[hour_start & (_hour % label_interval == 0)] = True - year_start = period_break(dates_, 'year') - info_fmt[hour_start & (_hour % label_interval == 0)] = '%H:%M' - info_fmt[day_start] = '%H:%M\n%d-%b' - info_fmt[year_start] = '%H:%M\n%d-%b\n%Y' - if force_year_start and not has_level_label(year_start, vmin_orig): - info_fmt[first_label(day_start)] = '%H:%M\n%d-%b\n%Y' - - def _minute_finder(label_interval): - hour_start = period_break(dates_, 'hour') - _minute = dates_.minute - _prev_minute = (dates_ - 1).minute - minute_start = (_minute - _prev_minute) != 0 - info_maj[hour_start] = True - info_min[minute_start & (_minute % label_interval == 0)] = True - year_start = period_break(dates_, 'year') - info_fmt = info['fmt'] - info_fmt[minute_start & (_minute % label_interval == 0)] = '%H:%M' - info_fmt[day_start] = '%H:%M\n%d-%b' - info_fmt[year_start] = '%H:%M\n%d-%b\n%Y' - - def _second_finder(label_interval): - minute_start = period_break(dates_, 'minute') - _second = dates_.second - _prev_second = (dates_ - 1).second - second_start = (_second - _prev_second) != 0 - info['maj'][minute_start] = True - info['min'][second_start & (_second % label_interval == 0)] = True - year_start = period_break(dates_, 'year') - info_fmt = info['fmt'] - info_fmt[second_start & (_second % - label_interval == 0)] = '%H:%M:%S' - info_fmt[day_start] = '%H:%M:%S\n%d-%b' - info_fmt[year_start] = '%H:%M:%S\n%d-%b\n%Y' - - if span < periodsperday / 12000.0: - _second_finder(1) - elif span < periodsperday / 6000.0: - _second_finder(2) - elif span < periodsperday / 2400.0: - _second_finder(5) - elif span < periodsperday / 1200.0: - _second_finder(10) - elif span < periodsperday / 800.0: - _second_finder(15) - elif span < periodsperday / 400.0: - _second_finder(30) - elif span < periodsperday / 150.0: - _minute_finder(1) - elif span < periodsperday / 70.0: - _minute_finder(2) - elif span < periodsperday / 24.0: - _minute_finder(5) - elif span < periodsperday / 12.0: - _minute_finder(15) - elif span < periodsperday / 6.0: - _minute_finder(30) - elif span < periodsperday / 2.5: - _hour_finder(1, False) - elif span < periodsperday / 1.5: - _hour_finder(2, False) - elif span < periodsperday * 1.25: - _hour_finder(3, False) - elif span < periodsperday * 2.5: - _hour_finder(6, True) - elif span < periodsperday * 4: - _hour_finder(12, True) - else: - info_maj[month_start] = True - info_min[day_start] = True - year_start = period_break(dates_, 'year') - info_fmt = info['fmt'] - info_fmt[day_start] = '%d' - info_fmt[month_start] = '%d\n%b' - info_fmt[year_start] = '%d\n%b\n%Y' - if not has_level_label(year_start, vmin_orig): - if not has_level_label(month_start, vmin_orig): - info_fmt[first_label(day_start)] = '%d\n%b\n%Y' - else: - info_fmt[first_label(month_start)] = '%d\n%b\n%Y' - - # Case 2. 
Less than three months - elif span <= periodsperyear // 4: - month_start = period_break(dates_, 'month') - info_maj[month_start] = True - if freq < FreqGroup.FR_HR: - info['min'] = True - else: - day_start = period_break(dates_, 'day') - info['min'][day_start] = True - week_start = period_break(dates_, 'week') - year_start = period_break(dates_, 'year') - info_fmt[week_start] = '%d' - info_fmt[month_start] = '\n\n%b' - info_fmt[year_start] = '\n\n%b\n%Y' - if not has_level_label(year_start, vmin_orig): - if not has_level_label(month_start, vmin_orig): - info_fmt[first_label(week_start)] = '\n\n%b\n%Y' - else: - info_fmt[first_label(month_start)] = '\n\n%b\n%Y' - # Case 3. Less than 14 months ............... - elif span <= 1.15 * periodsperyear: - year_start = period_break(dates_, 'year') - month_start = period_break(dates_, 'month') - week_start = period_break(dates_, 'week') - info_maj[month_start] = True - info_min[week_start] = True - info_min[year_start] = False - info_min[month_start] = False - info_fmt[month_start] = '%b' - info_fmt[year_start] = '%b\n%Y' - if not has_level_label(year_start, vmin_orig): - info_fmt[first_label(month_start)] = '%b\n%Y' - # Case 4. Less than 2.5 years ............... - elif span <= 2.5 * periodsperyear: - year_start = period_break(dates_, 'year') - quarter_start = period_break(dates_, 'quarter') - month_start = period_break(dates_, 'month') - info_maj[quarter_start] = True - info_min[month_start] = True - info_fmt[quarter_start] = '%b' - info_fmt[year_start] = '%b\n%Y' - # Case 4. Less than 4 years ................. - elif span <= 4 * periodsperyear: - year_start = period_break(dates_, 'year') - month_start = period_break(dates_, 'month') - info_maj[year_start] = True - info_min[month_start] = True - info_min[year_start] = False - - month_break = dates_[month_start].month - jan_or_jul = month_start[(month_break == 1) | (month_break == 7)] - info_fmt[jan_or_jul] = '%b' - info_fmt[year_start] = '%b\n%Y' - # Case 5. Less than 11 years ................ - elif span <= 11 * periodsperyear: - year_start = period_break(dates_, 'year') - quarter_start = period_break(dates_, 'quarter') - info_maj[year_start] = True - info_min[quarter_start] = True - info_min[year_start] = False - info_fmt[year_start] = '%Y' - # Case 6. More than 12 years ................ 
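-    # (illustration: for a span of ~30 years, _get_default_annual_spacing
-    #  returns (1, 5) -- a minor tick every year and a major '%Y' label on
-    #  every year divisible by 5)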
- else: - year_start = period_break(dates_, 'year') - year_break = dates_[year_start].year - nyears = span / periodsperyear - (min_anndef, maj_anndef) = _get_default_annual_spacing(nyears) - major_idx = year_start[(year_break % maj_anndef == 0)] - info_maj[major_idx] = True - minor_idx = year_start[(year_break % min_anndef == 0)] - info_min[minor_idx] = True - info_fmt[major_idx] = '%Y' - - return info - - -def _monthly_finder(vmin, vmax, freq): - periodsperyear = 12 - - vmin_orig = vmin - (vmin, vmax) = (int(vmin), int(vmax)) - span = vmax - vmin + 1 - - # Initialize the output - info = np.zeros(span, - dtype=[('val', int), ('maj', bool), ('min', bool), - ('fmt', '|S8')]) - info['val'] = np.arange(vmin, vmax + 1) - dates_ = info['val'] - info['fmt'] = '' - year_start = (dates_ % 12 == 0).nonzero()[0] - info_maj = info['maj'] - info_fmt = info['fmt'] - - if span <= 1.15 * periodsperyear: - info_maj[year_start] = True - info['min'] = True - - info_fmt[:] = '%b' - info_fmt[year_start] = '%b\n%Y' - - if not has_level_label(year_start, vmin_orig): - if dates_.size > 1: - idx = 1 - else: - idx = 0 - info_fmt[idx] = '%b\n%Y' - - elif span <= 2.5 * periodsperyear: - quarter_start = (dates_ % 3 == 0).nonzero() - info_maj[year_start] = True - # TODO: Check the following : is it really info['fmt'] ? - info['fmt'][quarter_start] = True - info['min'] = True - - info_fmt[quarter_start] = '%b' - info_fmt[year_start] = '%b\n%Y' - - elif span <= 4 * periodsperyear: - info_maj[year_start] = True - info['min'] = True - - jan_or_jul = (dates_ % 12 == 0) | (dates_ % 12 == 6) - info_fmt[jan_or_jul] = '%b' - info_fmt[year_start] = '%b\n%Y' - - elif span <= 11 * periodsperyear: - quarter_start = (dates_ % 3 == 0).nonzero() - info_maj[year_start] = True - info['min'][quarter_start] = True - - info_fmt[year_start] = '%Y' - - else: - nyears = span / periodsperyear - (min_anndef, maj_anndef) = _get_default_annual_spacing(nyears) - years = dates_[year_start] // 12 + 1 - major_idx = year_start[(years % maj_anndef == 0)] - info_maj[major_idx] = True - info['min'][year_start[(years % min_anndef == 0)]] = True - - info_fmt[major_idx] = '%Y' - - return info - - -def _quarterly_finder(vmin, vmax, freq): - periodsperyear = 4 - vmin_orig = vmin - (vmin, vmax) = (int(vmin), int(vmax)) - span = vmax - vmin + 1 - - info = np.zeros(span, - dtype=[('val', int), ('maj', bool), ('min', bool), - ('fmt', '|S8')]) - info['val'] = np.arange(vmin, vmax + 1) - info['fmt'] = '' - dates_ = info['val'] - info_maj = info['maj'] - info_fmt = info['fmt'] - year_start = (dates_ % 4 == 0).nonzero()[0] - - if span <= 3.5 * periodsperyear: - info_maj[year_start] = True - info['min'] = True - - info_fmt[:] = 'Q%q' - info_fmt[year_start] = 'Q%q\n%F' - if not has_level_label(year_start, vmin_orig): - if dates_.size > 1: - idx = 1 - else: - idx = 0 - info_fmt[idx] = 'Q%q\n%F' - - elif span <= 11 * periodsperyear: - info_maj[year_start] = True - info['min'] = True - info_fmt[year_start] = '%F' - - else: - years = dates_[year_start] // 4 + 1 - nyears = span / periodsperyear - (min_anndef, maj_anndef) = _get_default_annual_spacing(nyears) - major_idx = year_start[(years % maj_anndef == 0)] - info_maj[major_idx] = True - info['min'][year_start[(years % min_anndef == 0)]] = True - info_fmt[major_idx] = '%F' - - return info - - -def _annual_finder(vmin, vmax, freq): - (vmin, vmax) = (int(vmin), int(vmax + 1)) - span = vmax - vmin + 1 - - info = np.zeros(span, - dtype=[('val', int), ('maj', bool), ('min', bool), - ('fmt', '|S8')]) - info['val'] = 
np.arange(vmin, vmax + 1) - info['fmt'] = '' - dates_ = info['val'] - - (min_anndef, maj_anndef) = _get_default_annual_spacing(span) - major_idx = dates_ % maj_anndef == 0 - info['maj'][major_idx] = True - info['min'][(dates_ % min_anndef == 0)] = True - info['fmt'][major_idx] = '%Y' - - return info - - -def get_finder(freq): - if isinstance(freq, compat.string_types): - freq = frequencies.get_freq(freq) - fgroup = frequencies.get_freq_group(freq) - - if fgroup == FreqGroup.FR_ANN: - return _annual_finder - elif fgroup == FreqGroup.FR_QTR: - return _quarterly_finder - elif freq == FreqGroup.FR_MTH: - return _monthly_finder - elif ((freq >= FreqGroup.FR_BUS) or fgroup == FreqGroup.FR_WK): - return _daily_finder - else: # pragma: no cover - errmsg = "Unsupported frequency: %s" % (freq) - raise NotImplementedError(errmsg) - - -class TimeSeries_DateLocator(Locator): - """ - Locates the ticks along an axis controlled by a :class:`Series`. - - Parameters - ---------- - freq : {var} - Valid frequency specifier. - minor_locator : {False, True}, optional - Whether the locator is for minor ticks (True) or not. - dynamic_mode : {True, False}, optional - Whether the locator should work in dynamic mode. - base : {int}, optional - quarter : {int}, optional - month : {int}, optional - day : {int}, optional - """ - - def __init__(self, freq, minor_locator=False, dynamic_mode=True, - base=1, quarter=1, month=1, day=1, plot_obj=None): - if isinstance(freq, compat.string_types): - freq = frequencies.get_freq(freq) - self.freq = freq - self.base = base - (self.quarter, self.month, self.day) = (quarter, month, day) - self.isminor = minor_locator - self.isdynamic = dynamic_mode - self.offset = 0 - self.plot_obj = plot_obj - self.finder = get_finder(freq) - - def _get_default_locs(self, vmin, vmax): - "Returns the default locations of ticks." - - if self.plot_obj.date_axis_info is None: - self.plot_obj.date_axis_info = self.finder(vmin, vmax, self.freq) - - locator = self.plot_obj.date_axis_info - - if self.isminor: - return np.compress(locator['min'], locator['val']) - return np.compress(locator['maj'], locator['val']) - - def __call__(self): - 'Return the locations of the ticks.' - # axis calls Locator.set_axis inside set_m_formatter - vi = tuple(self.axis.get_view_interval()) - if vi != self.plot_obj.view_interval: - self.plot_obj.date_axis_info = None - self.plot_obj.view_interval = vi - vmin, vmax = vi - if vmax < vmin: - vmin, vmax = vmax, vmin - if self.isdynamic: - locs = self._get_default_locs(vmin, vmax) - else: # pragma: no cover - base = self.base - (d, m) = divmod(vmin, base) - vmin = (d + 1) * base - locs = lrange(vmin, vmax + 1, base) - return locs - - def autoscale(self): - """ - Sets the view limits to the nearest multiples of base that contain the - data. - """ - # requires matplotlib >= 0.98.0 - (vmin, vmax) = self.axis.get_data_interval() - - locs = self._get_default_locs(vmin, vmax) - (vmin, vmax) = locs[[0, -1]] - if vmin == vmax: - vmin -= 1 - vmax += 1 - return nonsingular(vmin, vmax) - -# ------------------------------------------------------------------------- -# --- Formatter --- -# ------------------------------------------------------------------------- - - -class TimeSeries_DateFormatter(Formatter): - """ - Formats the ticks along an axis controlled by a :class:`PeriodIndex`. - - Parameters - ---------- - freq : {int, string} - Valid frequency specifier. - minor_locator : {False, True} - Whether the current formatter should apply to minor ticks (True) or - major ticks (False). 
- dynamic_mode : {True, False} - Whether the formatter works in dynamic mode or not. - """ - - def __init__(self, freq, minor_locator=False, dynamic_mode=True, - plot_obj=None): - if isinstance(freq, compat.string_types): - freq = frequencies.get_freq(freq) - self.format = None - self.freq = freq - self.locs = [] - self.formatdict = None - self.isminor = minor_locator - self.isdynamic = dynamic_mode - self.offset = 0 - self.plot_obj = plot_obj - self.finder = get_finder(freq) - - def _set_default_format(self, vmin, vmax): - "Returns the default ticks spacing." - - if self.plot_obj.date_axis_info is None: - self.plot_obj.date_axis_info = self.finder(vmin, vmax, self.freq) - info = self.plot_obj.date_axis_info - - if self.isminor: - format = np.compress(info['min'] & np.logical_not(info['maj']), - info) - else: - format = np.compress(info['maj'], info) - self.formatdict = dict([(x, f) for (x, _, _, f) in format]) - return self.formatdict - - def set_locs(self, locs): - 'Sets the locations of the ticks' - # don't actually use the locs. This is just needed to work with - # matplotlib. Force to use vmin, vmax - self.locs = locs - - (vmin, vmax) = vi = tuple(self.axis.get_view_interval()) - if vi != self.plot_obj.view_interval: - self.plot_obj.date_axis_info = None - self.plot_obj.view_interval = vi - if vmax < vmin: - (vmin, vmax) = (vmax, vmin) - self._set_default_format(vmin, vmax) - - def __call__(self, x, pos=0): - if self.formatdict is None: - return '' - else: - fmt = self.formatdict.pop(x, '') - return Period(ordinal=int(x), freq=self.freq).strftime(fmt) +# flake8: noqa + +from pandas.plotting._converter import (register, time2num, + TimeConverter, TimeFormatter, + PeriodConverter, get_datevalue, + DatetimeConverter, + PandasAutoDateFormatter, + PandasAutoDateLocator, + MilliSecondLocator, get_finder, + TimeSeries_DateLocator, + TimeSeries_DateFormatter) diff --git a/pandas/tseries/frequencies.py b/pandas/tseries/frequencies.py index e0c602bf5a037..7f34bcaf52926 100644 --- a/pandas/tseries/frequencies.py +++ b/pandas/tseries/frequencies.py @@ -6,20 +6,21 @@ import numpy as np -from pandas.types.generic import ABCSeries -from pandas.types.common import (is_integer, - is_period_arraylike, - is_timedelta64_dtype, - is_datetime64_dtype) +from pandas.core.dtypes.generic import ABCSeries +from pandas.core.dtypes.common import ( + is_integer, + is_period_arraylike, + is_timedelta64_dtype, + is_datetime64_dtype) import pandas.core.algorithms as algos from pandas.core.algorithms import unique from pandas.tseries.offsets import DateOffset -from pandas.util.decorators import cache_readonly, deprecate_kwarg +from pandas.util._decorators import cache_readonly, deprecate_kwarg import pandas.tseries.offsets as offsets -import pandas.lib as lib -import pandas.tslib as tslib -from pandas.tslib import Timedelta + +from pandas._libs import lib, tslib +from pandas._libs.tslib import Timedelta from pytz import AmbiguousTimeError @@ -398,22 +399,27 @@ def _get_freq_str(base, mult=1): 'Q': 'Q', 'A': 'A', 'W': 'W', - 'M': 'M' + 'M': 'M', + 'Y': 'A', + 'BY': 'A', + 'YS': 'A', + 'BYS': 'A', } -need_suffix = ['QS', 'BQ', 'BQS', 'AS', 'BA', 'BAS'] +need_suffix = ['QS', 'BQ', 'BQS', 'YS', 'AS', 'BY', 'BA', 'BYS', 'BAS'] for __prefix in need_suffix: for _m in tslib._MONTHS: - _offset_to_period_map['%s-%s' % (__prefix, _m)] = \ - _offset_to_period_map[__prefix] + _alias = '{prefix}-{month}'.format(prefix=__prefix, month=_m) + _offset_to_period_map[_alias] = _offset_to_period_map[__prefix] for __prefix in ['A', 'Q']: 
for _m in tslib._MONTHS: - _alias = '%s-%s' % (__prefix, _m) + _alias = '{prefix}-{month}'.format(prefix=__prefix, month=_m) _offset_to_period_map[_alias] = _alias _days = ['MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN'] for _d in _days: - _offset_to_period_map['W-%s' % _d] = 'W-%s' % _d + _alias = 'W-{day}'.format(day=_d) + _offset_to_period_map[_alias] = _alias def get_period_alias(offset_str): @@ -426,9 +432,13 @@ def get_period_alias(offset_str): 'Q': 'Q-DEC', 'A': 'A-DEC', # YearEnd(month=12), + 'Y': 'A-DEC', 'AS': 'AS-JAN', # YearBegin(month=1), + 'YS': 'AS-JAN', 'BA': 'BA-DEC', # BYearEnd(month=12), + 'BY': 'BA-DEC', 'BAS': 'BAS-JAN', # BYearBegin(month=1), + 'BYS': 'BAS-JAN', 'Min': 'T', 'min': 'T', @@ -578,7 +588,7 @@ def _base_and_stride(freqstr): groups = opattern.match(freqstr) if not groups: - raise ValueError("Could not evaluate %s" % freqstr) + raise ValueError("Could not evaluate {freq}".format(freq=freqstr)) stride = groups.group(1) @@ -636,20 +646,6 @@ def get_offset(name): getOffset = get_offset -def get_offset_name(offset): - """ - Return rule name associated with a DateOffset object - - Examples - -------- - get_offset_name(BMonthEnd(1)) --> 'EOM' - """ - - msg = "get_offset_name(offset) is deprecated. Use offset.freqstr instead" - warnings.warn(msg, FutureWarning, stacklevel=2) - return offset.freqstr - - def get_standard_freq(freq): """ Return the standardized frequency string @@ -660,6 +656,7 @@ def get_standard_freq(freq): warnings.warn(msg, FutureWarning, stacklevel=2) return to_offset(freq).rule_code + # --------------------------------------------------------------------- # Period codes @@ -720,7 +717,17 @@ def get_standard_freq(freq): for _k, _v in compat.iteritems(_period_code_map): _reverse_period_code_map[_v] = _k -# Additional aliases +# Yearly aliases +year_aliases = {} + +for k, v in compat.iteritems(_period_code_map): + if k.startswith("A-"): + alias = "Y" + k[1:] + year_aliases[alias] = v + +_period_code_map.update(**year_aliases) +del year_aliases + _period_code_map.update({ "Q": 2000, # Quarterly - December year end (default quarterly) "A": 1000, # Annual @@ -769,8 +776,8 @@ def infer_freq(index, warn=True): if not (is_datetime64_dtype(values) or is_timedelta64_dtype(values) or values.dtype == object): - raise TypeError("cannot infer freq from a non-convertible " - "dtype on a Series of {0}".format(index.dtype)) + raise TypeError("cannot infer freq from a non-convertible dtype " + "on a Series of {dtype}".format(dtype=index.dtype)) index = values if is_period_arraylike(index): @@ -783,7 +790,7 @@ def infer_freq(index, warn=True): if isinstance(index, pd.Index) and not isinstance(index, pd.DatetimeIndex): if isinstance(index, (pd.Int64Index, pd.Float64Index)): raise TypeError("cannot infer freq from a non-convertible index " - "type {0}".format(type(index))) + "type {type}".format(type=type(index))) index = index.values if not isinstance(index, pd.DatetimeIndex): @@ -795,6 +802,7 @@ def infer_freq(index, warn=True): inferer = _FrequencyInferer(index, warn=warn) return inferer.get_freq() + _ONE_MICRO = long(1000) _ONE_MILLI = _ONE_MICRO * 1000 _ONE_SECOND = _ONE_MILLI * 1000 @@ -949,15 +957,17 @@ def _infer_daily_rule(self): if annual_rule: nyears = self.ydiffs[0] month = _month_aliases[self.rep_stamp.month] - return _maybe_add_count('%s-%s' % (annual_rule, month), nyears) + alias = '{prefix}-{month}'.format(prefix=annual_rule, month=month) + return _maybe_add_count(alias, nyears) quarterly_rule = self._get_quarterly_rule() if quarterly_rule: nquarters 
= self.mdiffs[0] / 3 mod_dict = {0: 12, 2: 11, 1: 10} month = _month_aliases[mod_dict[self.rep_stamp.month % 3]] - return _maybe_add_count('%s-%s' % (quarterly_rule, month), - nquarters) + alias = '{prefix}-{month}'.format(prefix=quarterly_rule, + month=month) + return _maybe_add_count(alias, nquarters) monthly_rule = self._get_monthly_rule() if monthly_rule: @@ -967,13 +977,12 @@ def _infer_daily_rule(self): days = self.deltas[0] / _ONE_DAY if days % 7 == 0: # Weekly - alias = _weekday_rule_aliases[self.rep_stamp.weekday()] - return _maybe_add_count('W-%s' % alias, days / 7) + day = _weekday_rule_aliases[self.rep_stamp.weekday()] + return _maybe_add_count('W-{day}'.format(day=day), days / 7) else: return _maybe_add_count('D', days) - # Business daily. Maybe - if self.day_deltas == [1, 3]: + if self._is_business_daily(): return 'B' wom_rule = self._get_wom_rule() @@ -1009,6 +1018,19 @@ def _get_monthly_rule(self): return {'cs': 'MS', 'bs': 'BMS', 'ce': 'M', 'be': 'BM'}.get(pos_check) + def _is_business_daily(self): + # quick check: cannot be business daily + if self.day_deltas != [1, 3]: + return False + + # probably business daily, but need to confirm + first_weekday = self.index[0].weekday() + shifts = np.diff(self.index.asi8) + shifts = np.floor_divide(shifts, _ONE_DAY) + weekdays = np.mod(first_weekday + np.cumsum(shifts), 7) + return np.all(((weekdays == 0) & (shifts == 3)) | + ((weekdays > 0) & (weekdays <= 4) & (shifts == 1))) + def _get_wom_rule(self): # wdiffs = unique(np.diff(self.index.week)) # We also need -47, -49, -48 to catch index spanning year boundary @@ -1029,7 +1051,7 @@ def _get_wom_rule(self): week = week_of_months[0] + 1 wd = _weekday_rule_aliases[weekdays[0]] - return 'WOM-%d%s' % (week, wd) + return 'WOM-{week}{weekday}'.format(week=week, weekday=wd) class _TimedeltaFrequencyInferer(_FrequencyInferer): @@ -1039,15 +1061,16 @@ def _infer_daily_rule(self): days = self.deltas[0] / _ONE_DAY if days % 7 == 0: # Weekly - alias = _weekday_rule_aliases[self.rep_stamp.weekday()] - return _maybe_add_count('W-%s' % alias, days / 7) + wd = _weekday_rule_aliases[self.rep_stamp.weekday()] + alias = 'W-{weekday}'.format(weekday=wd) + return _maybe_add_count(alias, days / 7) else: return _maybe_add_count('D', days) def _maybe_add_count(base, count): if count != 1: - return '%d%s' % (count, base) + return '{count}{base}'.format(count=int(count), base=base) else: return base diff --git a/pandas/tseries/holiday.py b/pandas/tseries/holiday.py index 31e40c6bcbb2c..d8bfa3013f8f7 100644 --- a/pandas/tseries/holiday.py +++ b/pandas/tseries/holiday.py @@ -174,16 +174,16 @@ class from pandas.tseries.offsets def __repr__(self): info = '' if self.year is not None: - info += 'year=%s, ' % self.year - info += 'month=%s, day=%s, ' % (self.month, self.day) + info += 'year={year}, '.format(year=self.year) + info += 'month={mon}, day={day}, '.format(mon=self.month, day=self.day) if self.offset is not None: - info += 'offset=%s' % self.offset + info += 'offset={offset}'.format(offset=self.offset) if self.observance is not None: - info += 'observance=%s' % self.observance + info += 'observance={obs}'.format(obs=self.observance) - repr = 'Holiday: %s (%s)' % (self.name, info) + repr = 'Holiday: {name} ({info})'.format(name=self.name, info=info) return repr def dates(self, start_date, end_date, return_name=False): @@ -286,6 +286,7 @@ def _apply_rule(self, dates): dates += offset return dates + holiday_calendars = {} @@ -364,7 +365,7 @@ def holidays(self, start=None, end=None, return_name=False): 
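`_is_business_daily`, added above, tightens the old `day_deltas == [1, 3]` heuristic: the one- and three-day gaps must also land on the right weekdays before `'B'` is inferred. A standalone sketch of that confirmation step (the constant and the helper name here are illustrative, not the pandas internals, and the quick `[1, 3]` pre-check is omitted):

```python
import numpy as np
import pandas as pd

ONE_DAY = 24 * 3600 * 10 ** 9  # nanoseconds per day, like _ONE_DAY

def looks_business_daily(index):
    # gaps between consecutive stamps, in whole days
    shifts = np.floor_divide(np.diff(index.asi8), ONE_DAY)
    # weekday of each stamp after the first, replayed from the gaps
    weekdays = np.mod(index[0].weekday() + np.cumsum(shifts), 7)
    # Mondays (0) must follow a 3-day gap; Tue-Fri (1..4) a 1-day gap
    return bool(np.all(((weekdays == 0) & (shifts == 3)) |
                       ((weekdays > 0) & (weekdays <= 4) & (shifts == 1))))

bdays = pd.date_range('2011-01-03', periods=10, freq='B')
assert looks_business_daily(bdays)
```

A three-day gap landing on Monday plus one-day gaps landing on Tuesday through Friday is exactly the signature of a business-daily index, which is what the vectorized check encodes.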
---------- start : starting date, datetime-like, optional end : ending date, datetime-like, optional - return_names : bool, optional + return_name : bool, optional If True, return a series that has dates and holiday names. False will only return a DatetimeIndex of dates. @@ -373,8 +374,8 @@ def holidays(self, start=None, end=None, return_name=False): DatetimeIndex of holidays """ if self.rules is None: - raise Exception('Holiday Calendar %s does not have any ' - 'rules specified' % self.name) + raise Exception('Holiday Calendar {name} does not have any ' + 'rules specified'.format(name=self.name)) if start is None: start = AbstractHolidayCalendar.start_date @@ -461,6 +462,7 @@ def merge(self, other, inplace=False): else: return holidays + USMemorialDay = Holiday('MemorialDay', month=5, day=31, offset=DateOffset(weekday=MO(-1))) USLaborDay = Holiday('Labor Day', month=9, day=1, diff --git a/pandas/tseries/interval.py b/pandas/tseries/interval.py deleted file mode 100644 index 6698c7e924758..0000000000000 --- a/pandas/tseries/interval.py +++ /dev/null @@ -1,38 +0,0 @@ - -from pandas.core.index import Index - - -class Interval(object): - """ - Represents an interval of time defined by two timestamps - """ - - def __init__(self, start, end): - self.start = start - self.end = end - - -class PeriodInterval(object): - """ - Represents an interval of time defined by two Period objects (time - ordinals) - """ - - def __init__(self, start, end): - self.start = start - self.end = end - - -class IntervalIndex(Index): - """ - - """ - - def __new__(self, starts, ends): - pass - - def dtype(self): - return self.values.dtype - -if __name__ == '__main__': - pass diff --git a/pandas/tseries/offsets.py b/pandas/tseries/offsets.py index 370dd00762896..7ccecaa84e6d6 100644 --- a/pandas/tseries/offsets.py +++ b/pandas/tseries/offsets.py @@ -3,15 +3,14 @@ from pandas import compat import numpy as np -from pandas.types.generic import ABCSeries, ABCDatetimeIndex, ABCPeriod -from pandas.tseries.tools import to_datetime, normalize_date +from pandas.core.dtypes.generic import ABCSeries, ABCDatetimeIndex, ABCPeriod +from pandas.core.tools.datetimes import to_datetime, normalize_date from pandas.core.common import AbstractMethodError # import after tools, dateutil check from dateutil.relativedelta import relativedelta, weekday from dateutil.easter import easter -import pandas.tslib as tslib -from pandas.tslib import Timestamp, OutOfBoundsDatetime, Timedelta +from pandas._libs import tslib, Timestamp, OutOfBoundsDatetime, Timedelta import functools import operator @@ -185,6 +184,7 @@ def __add__(date): ) _use_relativedelta = False _adjust_dst = False + _typ = "dateoffset" # default for prior pickles normalize = False @@ -261,10 +261,10 @@ def apply_index(self, i): """ if not type(self) is DateOffset: - raise NotImplementedError("DateOffset subclass %s " + raise NotImplementedError("DateOffset subclass {name} " "does not have a vectorized " - "implementation" - % (self.__class__.__name__,)) + "implementation".format( + name=self.__class__.__name__)) relativedelta_fast = set(['years', 'months', 'weeks', 'days', 'hours', 'minutes', 'seconds', 'microseconds']) @@ -295,10 +295,10 @@ def apply_index(self, i): return i + (self._offset * self.n) else: # relativedelta with other keywords + kwd = set(self.kwds) - relativedelta_fast raise NotImplementedError("DateOffset with relativedelta " - "keyword(s) %s not able to be " - "applied vectorized" % - (set(self.kwds) - relativedelta_fast),) + "keyword(s) {kwd} not able to be " + 
"applied vectorized".format(kwd=kwd)) def isAnchored(self): return (self.n == 1) @@ -339,19 +339,20 @@ def __repr__(self): if attr not in exclude: attrs.append('='.join((attr, repr(getattr(self, attr))))) + plural = '' if abs(self.n) != 1: plural = 's' - else: - plural = '' - n_str = "" + n_str = '' if self.n != 1: - n_str = "%s * " % self.n + n_str = '{n} * '.format(n=self.n) - out = '<%s' % n_str + className + plural + attrs_str = '' if attrs: - out += ': ' + ', '.join(attrs) - out += '>' + attrs_str = ': ' + ', '.join(attrs) + + repr_content = ''.join([n_str, className, plural, attrs_str]) + out = '<{content}>'.format(content=repr_content) return out @property @@ -501,7 +502,7 @@ def freqstr(self): return repr(self) if self.n != 1: - fstr = '%d%s' % (self.n, code) + fstr = '{n}{code}'.format(n=self.n, code=code) else: fstr = code @@ -509,7 +510,7 @@ def freqstr(self): @property def nanos(self): - raise ValueError("{0} is a non-fixed frequency".format(self)) + raise ValueError("{name} is a non-fixed frequency".format(name=self)) class SingleConstructorOffset(DateOffset): @@ -518,7 +519,7 @@ class SingleConstructorOffset(DateOffset): def _from_name(cls, suffix=None): # default _from_name calls cls with no args if suffix: - raise ValueError("Bad freq suffix %s" % suffix) + raise ValueError("Bad freq suffix {suffix}".format(suffix=suffix)) return cls() @@ -531,21 +532,21 @@ class BusinessMixin(object): def __repr__(self): className = getattr(self, '_outputName', self.__class__.__name__) + plural = '' if abs(self.n) != 1: plural = 's' - else: - plural = '' - n_str = "" + n_str = '' if self.n != 1: - n_str = "%s * " % self.n + n_str = '{n} * '.format(n=self.n) - out = '<%s' % n_str + className + plural + self._repr_attrs() + '>' + repr_content = ''.join([n_str, className, plural, self._repr_attrs()]) + out = '<{content}>'.format(content=repr_content) return out def _repr_attrs(self): if self.offset: - attrs = ['offset=%s' % repr(self.offset)] + attrs = ['offset={offset!r}'.format(offset=self.offset)] else: attrs = None out = '' @@ -601,7 +602,7 @@ def freqstr(self): return repr(self) if self.n != 1: - fstr = '%d%s' % (self.n, code) + fstr = '{n}{code}'.format(n=self.n, code=code) else: fstr = code @@ -777,12 +778,12 @@ def _get_business_hours_by_sec(self): # create dummy datetime to calcurate businesshours in a day dtstart = datetime(2014, 4, 1, self.start.hour, self.start.minute) until = datetime(2014, 4, 1, self.end.hour, self.end.minute) - return tslib.tot_seconds(until - dtstart) + return (until - dtstart).total_seconds() else: self.daytime = False dtstart = datetime(2014, 4, 1, self.start.hour, self.start.minute) until = datetime(2014, 4, 2, self.end.hour, self.end.minute) - return tslib.tot_seconds(until - dtstart) + return (until - dtstart).total_seconds() @apply_wraps def rollback(self, dt): @@ -906,7 +907,7 @@ def _onOffset(self, dt, businesshours): op = self._prev_opening_time(dt) else: op = self._next_opening_time(dt) - span = tslib.tot_seconds(dt - op) + span = (dt - op).total_seconds() if span <= businesshours: return True else: @@ -1109,7 +1110,8 @@ def name(self): if self.isAnchored: return self.rule_code else: - return "%s-%s" % (self.rule_code, _int_to_month[self.n]) + return "{code}-{month}".format(code=self.rule_code, + month=_int_to_month[self.n]) class MonthEnd(MonthOffset): @@ -1176,9 +1178,9 @@ def __init__(self, n=1, day_of_month=None, normalize=False, **kwds): else: self.day_of_month = int(day_of_month) if not self._min_day_of_month <= self.day_of_month <= 27: - raise 
ValueError('day_of_month must be ' - '{}<=day_of_month<=27, got {}'.format( - self._min_day_of_month, self.day_of_month)) + msg = 'day_of_month must be {min}<=day_of_month<=27, got {day}' + raise ValueError(msg.format(min=self._min_day_of_month, + day=self.day_of_month)) self.n = int(n) self.normalize = normalize self.kwds = kwds @@ -1190,7 +1192,7 @@ def _from_name(cls, suffix=None): @property def rule_code(self): - suffix = '-{}'.format(self.day_of_month) + suffix = '-{day_of_month}'.format(day_of_month=self.day_of_month) return self._prefix + suffix @apply_wraps @@ -1576,8 +1578,8 @@ def __init__(self, n=1, normalize=False, **kwds): if self.weekday is not None: if self.weekday < 0 or self.weekday > 6: - raise ValueError('Day must be 0<=day<=6, got %d' % - self.weekday) + raise ValueError('Day must be 0<=day<=6, got {day}' + .format(day=self.weekday)) self._inc = timedelta(weeks=1) self.kwds = kwds @@ -1597,7 +1599,6 @@ def apply(self, other): if otherDay != self.weekday: other = other + timedelta((self.weekday - otherDay) % 7) k = k - 1 - other = other for i in range(k): other = other + self._inc else: @@ -1631,7 +1632,7 @@ def onOffset(self, dt): def rule_code(self): suffix = '' if self.weekday is not None: - suffix = '-%s' % (_int_to_weekday[self.weekday]) + suffix = '-{weekday}'.format(weekday=_int_to_weekday[self.weekday]) return self._prefix + suffix @classmethod @@ -1652,6 +1653,7 @@ class WeekDay(object): SAT = 5 SUN = 6 + _int_to_weekday = { WeekDay.MON: 'MON', WeekDay.TUE: 'TUE', @@ -1696,11 +1698,11 @@ def __init__(self, n=1, normalize=False, **kwds): raise ValueError('N cannot be 0') if self.weekday < 0 or self.weekday > 6: - raise ValueError('Day must be 0<=day<=6, got %d' % - self.weekday) + raise ValueError('Day must be 0<=day<=6, got {day}' + .format(day=self.weekday)) if self.week < 0 or self.week > 3: - raise ValueError('Week must be 0<=day<=3, got %d' % - self.week) + raise ValueError('Week must be 0<=week<=3, got {week}' + .format(week=self.week)) self.kwds = kwds @@ -1746,15 +1748,18 @@ def onOffset(self, dt): @property def rule_code(self): - return '%s-%d%s' % (self._prefix, self.week + 1, - _int_to_weekday.get(self.weekday, '')) + weekday = _int_to_weekday.get(self.weekday, '') + return '{prefix}-{week}{weekday}'.format(prefix=self._prefix, + week=self.week + 1, + weekday=weekday) _prefix = 'WOM' @classmethod def _from_name(cls, suffix=None): if not suffix: - raise ValueError("Prefix %r requires a suffix." % (cls._prefix)) + raise ValueError("Prefix {prefix!r} requires a suffix." + .format(prefix=cls._prefix)) # TODO: handle n here... # only one digit weeks (1 --> week 0, 2 --> week 1, etc.) week = int(suffix[0]) - 1 @@ -1789,8 +1794,8 @@ def __init__(self, n=1, normalize=False, **kwds): raise ValueError('N cannot be 0') if self.weekday < 0 or self.weekday > 6: - raise ValueError('Day must be 0<=day<=6, got %d' % - self.weekday) + raise ValueError('Day must be 0<=day<=6, got {day}' + .format(day=self.weekday)) self.kwds = kwds @@ -1829,14 +1834,17 @@ def onOffset(self, dt): @property def rule_code(self): - return '%s-%s' % (self._prefix, _int_to_weekday.get(self.weekday, '')) + weekday = _int_to_weekday.get(self.weekday, '') + return '{prefix}-{weekday}'.format(prefix=self._prefix, + weekday=weekday) _prefix = 'LWOM' @classmethod def _from_name(cls, suffix=None): if not suffix: - raise ValueError("Prefix %r requires a suffix." % (cls._prefix)) + raise ValueError("Prefix {prefix!r} requires a suffix." + .format(prefix=cls._prefix)) # TODO: handle n here... 
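The `rule_code` rewrites in these hunks only change how the strings are assembled, not the codes themselves; note the `week + 1` in the `WOM` format call, because `week` is stored zero-based. A quick check against the public offsets:

```python
from pandas.tseries.offsets import Week, WeekOfMonth

# week=2 is the third occurrence; rule_code adds 1 when formatting
assert WeekOfMonth(week=2, weekday=4).rule_code == 'WOM-3FRI'
assert Week(weekday=0).rule_code == 'W-MON'
```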
weekday = _weekday_to_int[suffix] return cls(weekday=weekday) @@ -1876,7 +1884,8 @@ def _from_name(cls, suffix=None): @property def rule_code(self): - return '%s-%s' % (self._prefix, _int_to_month[self.startingMonth]) + month = _int_to_month[self.startingMonth] + return '{prefix}-{month}'.format(prefix=self._prefix, month=month) class BQuarterEnd(QuarterOffset): @@ -1924,6 +1933,7 @@ def onOffset(self, dt): modMonth = (dt.month - self.startingMonth) % 3 return BMonthEnd().onOffset(dt) and modMonth == 0 + _int_to_month = tslib._MONTH_ALIASES _month_to_int = dict((v, k) for k, v in _int_to_month.items()) @@ -2044,8 +2054,7 @@ def apply(self, other): @apply_index_wraps def apply_index(self, i): freq_month = 12 if self.startingMonth == 1 else self.startingMonth - 1 - # freq_month = self.startingMonth - freqstr = 'Q-%s' % (_int_to_month[freq_month],) + freqstr = 'Q-{month}'.format(month=_int_to_month[freq_month]) return self._beg_apply_index(i, freqstr) @@ -2070,7 +2079,8 @@ def _from_name(cls, suffix=None): @property def rule_code(self): - return '%s-%s' % (self._prefix, _int_to_month[self.month]) + month = _int_to_month[self.month] + return '{prefix}-{month}'.format(prefix=self._prefix, month=month) class BYearEnd(YearOffset): @@ -2245,7 +2255,7 @@ def _rollf(date): @apply_index_wraps def apply_index(self, i): freq_month = 12 if self.month == 1 else self.month - 1 - freqstr = 'A-%s' % (_int_to_month[freq_month],) + freqstr = 'A-{month}'.format(month=_int_to_month[freq_month]) return self._beg_apply_index(i, freqstr) def onOffset(self, dt): @@ -2311,7 +2321,8 @@ def __init__(self, n=1, normalize=False, **kwds): raise ValueError('N cannot be 0') if self.variation not in ["nearest", "last"]: - raise ValueError('%s is not a valid variation' % self.variation) + raise ValueError('{variation} is not a valid variation' + .format(variation=self.variation)) if self.variation == "nearest": weekday_offset = weekday(self.weekday) @@ -2437,8 +2448,9 @@ def _get_year_end_last(self, dt): @property def rule_code(self): + prefix = self._get_prefix() suffix = self.get_rule_code_suffix() - return "%s-%s" % (self._get_prefix(), suffix) + return "{prefix}-{suffix}".format(prefix=prefix, suffix=suffix) def _get_prefix(self): return self._prefix @@ -2450,9 +2462,11 @@ def _get_suffix_prefix(self): return self._suffix_prefix_last def get_rule_code_suffix(self): - return '%s-%s-%s' % (self._get_suffix_prefix(), - _int_to_month[self.startingMonth], - _int_to_weekday[self.weekday]) + prefix = self._get_suffix_prefix() + month = _int_to_month[self.startingMonth] + weekday = _int_to_weekday[self.weekday] + return '{prefix}-{month}-{weekday}'.format(prefix=prefix, month=month, + weekday=weekday) @classmethod def _parse_suffix(cls, varion_code, startingMonth_code, weekday_code): @@ -2462,7 +2476,7 @@ def _parse_suffix(cls, varion_code, startingMonth_code, weekday_code): variation = "last" else: raise ValueError( - "Unable to parse varion_code: %s" % (varion_code,)) + "Unable to parse varion_code: {code}".format(code=varion_code)) startingMonth = _month_to_int[startingMonth_code] weekday = _weekday_to_int[weekday_code] @@ -2627,8 +2641,9 @@ def onOffset(self, dt): @property def rule_code(self): suffix = self._offset.get_rule_code_suffix() - return "%s-%s" % (self._prefix, - "%s-%d" % (suffix, self.qtr_with_extra_week)) + qtr = self.qtr_with_extra_week + return "{prefix}-{suffix}-{qtr}".format(prefix=self._prefix, + suffix=suffix, qtr=qtr) @classmethod def _from_name(cls, *args): @@ -2711,8 +2726,8 @@ def __add__(self, 
other): except ApplyTypeError: return NotImplemented except OverflowError: - raise OverflowError("the add operation between {} and {} " - "will overflow".format(self, other)) + raise OverflowError("the add operation between {self} and {other} " + "will overflow".format(self=self, other=other)) def __eq__(self, other): if isinstance(other, compat.string_types): @@ -2770,7 +2785,8 @@ def apply(self, other): elif isinstance(other, type(self)): return type(self)(self.n + other.n) - raise ApplyTypeError('Unhandled type: %s' % type(other).__name__) + raise ApplyTypeError('Unhandled type: {type_str}' + .format(type_str=type(other).__name__)) _prefix = 'undefined' @@ -2799,6 +2815,7 @@ def _delta_to_tick(delta): else: # pragma: no cover return Nano(nanos) + _delta_to_nanoseconds = tslib._delta_to_nanoseconds @@ -2919,7 +2936,8 @@ def generate_range(start=None, end=None, periods=None, # faster than cur + offset next_date = offset.apply(cur) if next_date <= cur: - raise ValueError('Offset %s did not increment date' % offset) + raise ValueError('Offset {offset} did not increment date' + .format(offset=offset)) cur = next_date else: while cur >= end: @@ -2928,9 +2946,11 @@ def generate_range(start=None, end=None, periods=None, # faster than cur + offset next_date = offset.apply(cur) if next_date >= cur: - raise ValueError('Offset %s did not decrement date' % offset) + raise ValueError('Offset {offset} did not decrement date' + .format(offset=offset)) cur = next_date + prefix_mapping = dict((offset._prefix, offset) for offset in [ YearBegin, # 'AS' YearEnd, # 'A' diff --git a/pandas/tseries/plotting.py b/pandas/tseries/plotting.py index 89aecf2acc07e..302016907635d 100644 --- a/pandas/tseries/plotting.py +++ b/pandas/tseries/plotting.py @@ -1,313 +1,3 @@ -""" -Period formatters and locators adapted from scikits.timeseries by -Pierre GF Gerard-Marchant & Matt Knox -""" +# flake8: noqa -# TODO: Use the fact that axis can have units to simplify the process - -import numpy as np - -from matplotlib import pylab -from pandas.tseries.period import Period -from pandas.tseries.offsets import DateOffset -import pandas.tseries.frequencies as frequencies -from pandas.tseries.index import DatetimeIndex -from pandas.formats.printing import pprint_thing -import pandas.compat as compat - -from pandas.tseries.converter import (TimeSeries_DateLocator, - TimeSeries_DateFormatter) - -# --------------------------------------------------------------------- -# Plotting functions and monkey patches - - -def tsplot(series, plotf, ax=None, **kwargs): - """ - Plots a Series on the given Matplotlib axes or the current axes - - Parameters - ---------- - axes : Axes - series : Series - - Notes - _____ - Supports same kwargs as Axes.plot - - """ - # Used inferred freq is possible, need a test case for inferred - if ax is None: - import matplotlib.pyplot as plt - ax = plt.gca() - - freq, series = _maybe_resample(series, ax, kwargs) - - # Set ax with freq info - _decorate_axes(ax, freq, kwargs) - ax._plot_data.append((series, plotf, kwargs)) - lines = plotf(ax, series.index._mpl_repr(), series.values, **kwargs) - - # set date formatter, locators and rescale limits - format_dateaxis(ax, ax.freq) - return lines - - -def _maybe_resample(series, ax, kwargs): - # resample against axes freq if necessary - freq, ax_freq = _get_freq(ax, series) - - if freq is None: # pragma: no cover - raise ValueError('Cannot use dynamic axis without frequency info') - - # Convert DatetimeIndex to PeriodIndex - if isinstance(series.index, DatetimeIndex): - 
series = series.to_period(freq=freq) - - if ax_freq is not None and freq != ax_freq: - if frequencies.is_superperiod(freq, ax_freq): # upsample input - series = series.copy() - series.index = series.index.asfreq(ax_freq, how='s') - freq = ax_freq - elif _is_sup(freq, ax_freq): # one is weekly - how = kwargs.pop('how', 'last') - series = getattr(series.resample('D'), how)().dropna() - series = getattr(series.resample(ax_freq), how)().dropna() - freq = ax_freq - elif frequencies.is_subperiod(freq, ax_freq) or _is_sub(freq, ax_freq): - _upsample_others(ax, freq, kwargs) - ax_freq = freq - else: # pragma: no cover - raise ValueError('Incompatible frequency conversion') - return freq, series - - -def _is_sub(f1, f2): - return ((f1.startswith('W') and frequencies.is_subperiod('D', f2)) or - (f2.startswith('W') and frequencies.is_subperiod(f1, 'D'))) - - -def _is_sup(f1, f2): - return ((f1.startswith('W') and frequencies.is_superperiod('D', f2)) or - (f2.startswith('W') and frequencies.is_superperiod(f1, 'D'))) - - -def _upsample_others(ax, freq, kwargs): - legend = ax.get_legend() - lines, labels = _replot_ax(ax, freq, kwargs) - _replot_ax(ax, freq, kwargs) - - other_ax = None - if hasattr(ax, 'left_ax'): - other_ax = ax.left_ax - if hasattr(ax, 'right_ax'): - other_ax = ax.right_ax - - if other_ax is not None: - rlines, rlabels = _replot_ax(other_ax, freq, kwargs) - lines.extend(rlines) - labels.extend(rlabels) - - if (legend is not None and kwargs.get('legend', True) and - len(lines) > 0): - title = legend.get_title().get_text() - if title == 'None': - title = None - ax.legend(lines, labels, loc='best', title=title) - - -def _replot_ax(ax, freq, kwargs): - data = getattr(ax, '_plot_data', None) - - # clear current axes and data - ax._plot_data = [] - ax.clear() - - _decorate_axes(ax, freq, kwargs) - - lines = [] - labels = [] - if data is not None: - for series, plotf, kwds in data: - series = series.copy() - idx = series.index.asfreq(freq, how='S') - series.index = idx - ax._plot_data.append((series, plotf, kwds)) - - # for tsplot - if isinstance(plotf, compat.string_types): - from pandas.tools.plotting import _plot_klass - plotf = _plot_klass[plotf]._plot - - lines.append(plotf(ax, series.index._mpl_repr(), - series.values, **kwds)[0]) - labels.append(pprint_thing(series.name)) - - return lines, labels - - -def _decorate_axes(ax, freq, kwargs): - """Initialize axes for time-series plotting""" - if not hasattr(ax, '_plot_data'): - ax._plot_data = [] - - ax.freq = freq - xaxis = ax.get_xaxis() - xaxis.freq = freq - if not hasattr(ax, 'legendlabels'): - ax.legendlabels = [kwargs.get('label', None)] - else: - ax.legendlabels.append(kwargs.get('label', None)) - ax.view_interval = None - ax.date_axis_info = None - - -def _get_ax_freq(ax): - """ - Get the freq attribute of the ax object if set. 
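Among the plotting helpers being moved out of this module, `_maybe_resample` handles the awkward weekly case by hopping through daily frequency. A rough standalone equivalent of that two-step chain, using a plain `DatetimeIndex` and `'last'`, the default the deleted code pops from `kwargs`:

```python
import pandas as pd

s = pd.Series(range(60),
              index=pd.date_range('2000-01-01', periods=60, freq='W'))

# weekly -> daily -> target frequency, keeping the last valid
# observation at each step, as in the deleted _is_sup branch
how = 'last'
daily = getattr(s.resample('D'), how)().dropna()
monthly = getattr(daily.resample('M'), how)().dropna()
```

Weekly frequencies do not nest cleanly inside month-like frequencies, which is why the deleted code special-cases them instead of calling `asfreq` directly.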
- Also checks shared axes (eg when using secondary yaxis, sharex=True - or twinx) - """ - ax_freq = getattr(ax, 'freq', None) - if ax_freq is None: - # check for left/right ax in case of secondary yaxis - if hasattr(ax, 'left_ax'): - ax_freq = getattr(ax.left_ax, 'freq', None) - elif hasattr(ax, 'right_ax'): - ax_freq = getattr(ax.right_ax, 'freq', None) - if ax_freq is None: - # check if a shared ax (sharex/twinx) has already freq set - shared_axes = ax.get_shared_x_axes().get_siblings(ax) - if len(shared_axes) > 1: - for shared_ax in shared_axes: - ax_freq = getattr(shared_ax, 'freq', None) - if ax_freq is not None: - break - return ax_freq - - -def _get_freq(ax, series): - # get frequency from data - freq = getattr(series.index, 'freq', None) - if freq is None: - freq = getattr(series.index, 'inferred_freq', None) - - ax_freq = _get_ax_freq(ax) - - # use axes freq if no data freq - if freq is None: - freq = ax_freq - - # get the period frequency - if isinstance(freq, DateOffset): - freq = freq.rule_code - else: - freq = frequencies.get_base_alias(freq) - - freq = frequencies.get_period_alias(freq) - return freq, ax_freq - - -def _use_dynamic_x(ax, data): - freq = _get_index_freq(data) - ax_freq = _get_ax_freq(ax) - - if freq is None: # convert irregular if axes has freq info - freq = ax_freq - else: # do not use tsplot if irregular was plotted first - if (ax_freq is None) and (len(ax.get_lines()) > 0): - return False - - if freq is None: - return False - - if isinstance(freq, DateOffset): - freq = freq.rule_code - else: - freq = frequencies.get_base_alias(freq) - freq = frequencies.get_period_alias(freq) - - if freq is None: - return False - - # hack this for 0.10.1, creating more technical debt...sigh - if isinstance(data.index, DatetimeIndex): - base = frequencies.get_freq(freq) - x = data.index - if (base <= frequencies.FreqGroup.FR_DAY): - return x[:1].is_normalized - return Period(x[0], freq).to_timestamp(tz=x.tz) == x[0] - return True - - -def _get_index_freq(data): - freq = getattr(data.index, 'freq', None) - if freq is None: - freq = getattr(data.index, 'inferred_freq', None) - if freq == 'B': - weekdays = np.unique(data.index.dayofweek) - if (5 in weekdays) or (6 in weekdays): - freq = None - return freq - - -def _maybe_convert_index(ax, data): - # tsplot converts automatically, but don't want to convert index - # over and over for DataFrames - if isinstance(data.index, DatetimeIndex): - freq = getattr(data.index, 'freq', None) - - if freq is None: - freq = getattr(data.index, 'inferred_freq', None) - if isinstance(freq, DateOffset): - freq = freq.rule_code - - if freq is None: - freq = _get_ax_freq(ax) - - if freq is None: - raise ValueError('Could not get frequency alias for plotting') - - freq = frequencies.get_base_alias(freq) - freq = frequencies.get_period_alias(freq) - - data = data.to_period(freq=freq) - return data - - -# Patch methods for subplot. Only format_dateaxis is currently used. -# Do we need the rest for convenience? - - -def format_dateaxis(subplot, freq): - """ - Pretty-formats the date axis (x-axis). - - Major and minor ticks are automatically set for the frequency of the - current underlying series. As the dynamic mode is activated by - default, changing the limits of the x axis will intelligently change - the positions of the ticks. 
- """ - majlocator = TimeSeries_DateLocator(freq, dynamic_mode=True, - minor_locator=False, - plot_obj=subplot) - minlocator = TimeSeries_DateLocator(freq, dynamic_mode=True, - minor_locator=True, - plot_obj=subplot) - subplot.xaxis.set_major_locator(majlocator) - subplot.xaxis.set_minor_locator(minlocator) - - majformatter = TimeSeries_DateFormatter(freq, dynamic_mode=True, - minor_locator=False, - plot_obj=subplot) - minformatter = TimeSeries_DateFormatter(freq, dynamic_mode=True, - minor_locator=True, - plot_obj=subplot) - subplot.xaxis.set_major_formatter(majformatter) - subplot.xaxis.set_minor_formatter(minformatter) - - # x and y coord info - subplot.format_coord = lambda t, y: ( - "t = {0} y = {1:8f}".format(Period(ordinal=int(t), freq=freq), y)) - - pylab.draw_if_interactive() +from pandas.plotting._timeseries import tsplot diff --git a/pandas/tseries/tests/data/daterange_073.pickle b/pandas/tseries/tests/data/daterange_073.pickle deleted file mode 100644 index 0214a023e6338..0000000000000 Binary files a/pandas/tseries/tests/data/daterange_073.pickle and /dev/null differ diff --git a/pandas/tseries/tests/data/frame.pickle b/pandas/tseries/tests/data/frame.pickle deleted file mode 100644 index b3b100fb43022..0000000000000 Binary files a/pandas/tseries/tests/data/frame.pickle and /dev/null differ diff --git a/pandas/tseries/tests/data/series.pickle b/pandas/tseries/tests/data/series.pickle deleted file mode 100644 index 307a4ac265173..0000000000000 Binary files a/pandas/tseries/tests/data/series.pickle and /dev/null differ diff --git a/pandas/tseries/tests/data/series_daterange0.pickle b/pandas/tseries/tests/data/series_daterange0.pickle deleted file mode 100644 index 07e2144039f2e..0000000000000 Binary files a/pandas/tseries/tests/data/series_daterange0.pickle and /dev/null differ diff --git a/pandas/tseries/tests/test_base.py b/pandas/tseries/tests/test_base.py deleted file mode 100644 index bca50237081e1..0000000000000 --- a/pandas/tseries/tests/test_base.py +++ /dev/null @@ -1,2764 +0,0 @@ -from __future__ import print_function -from datetime import datetime, timedelta -import numpy as np -import pandas as pd -from pandas import (Series, Index, Int64Index, Timestamp, Period, - DatetimeIndex, PeriodIndex, TimedeltaIndex, - Timedelta, timedelta_range, date_range, Float64Index, - _np_version_under1p10) -import pandas.tslib as tslib -import pandas.tseries.period as period - -import pandas.util.testing as tm - -from pandas.tests.test_base import Ops - - -class TestDatetimeIndexOps(Ops): - tz = [None, 'UTC', 'Asia/Tokyo', 'US/Eastern', 'dateutil/Asia/Singapore', - 'dateutil/US/Pacific'] - - def setUp(self): - super(TestDatetimeIndexOps, self).setUp() - mask = lambda x: (isinstance(x, DatetimeIndex) or - isinstance(x, PeriodIndex)) - self.is_valid_objs = [o for o in self.objs if mask(o)] - self.not_valid_objs = [o for o in self.objs if not mask(o)] - - def test_ops_properties(self): - self.check_ops_properties( - ['year', 'month', 'day', 'hour', 'minute', 'second', 'weekofyear', - 'week', 'dayofweek', 'dayofyear', 'quarter']) - self.check_ops_properties(['date', 'time', 'microsecond', 'nanosecond', - 'is_month_start', 'is_month_end', - 'is_quarter_start', - 'is_quarter_end', 'is_year_start', - 'is_year_end', 'weekday_name'], - lambda x: isinstance(x, DatetimeIndex)) - - def test_ops_properties_basic(self): - - # sanity check that the behavior didn't change - # GH7206 - for op in ['year', 'day', 'second', 'weekday']: - self.assertRaises(TypeError, lambda x: getattr(self.dt_series, op)) - 
- # attribute access should still work! - s = Series(dict(year=2000, month=1, day=10)) - self.assertEqual(s.year, 2000) - self.assertEqual(s.month, 1) - self.assertEqual(s.day, 10) - self.assertRaises(AttributeError, lambda: s.weekday) - - def test_asobject_tolist(self): - idx = pd.date_range(start='2013-01-01', periods=4, freq='M', - name='idx') - expected_list = [Timestamp('2013-01-31'), - Timestamp('2013-02-28'), - Timestamp('2013-03-31'), - Timestamp('2013-04-30')] - expected = pd.Index(expected_list, dtype=object, name='idx') - result = idx.asobject - self.assertTrue(isinstance(result, Index)) - - self.assertEqual(result.dtype, object) - self.assert_index_equal(result, expected) - self.assertEqual(result.name, expected.name) - self.assertEqual(idx.tolist(), expected_list) - - idx = pd.date_range(start='2013-01-01', periods=4, freq='M', - name='idx', tz='Asia/Tokyo') - expected_list = [Timestamp('2013-01-31', tz='Asia/Tokyo'), - Timestamp('2013-02-28', tz='Asia/Tokyo'), - Timestamp('2013-03-31', tz='Asia/Tokyo'), - Timestamp('2013-04-30', tz='Asia/Tokyo')] - expected = pd.Index(expected_list, dtype=object, name='idx') - result = idx.asobject - self.assertTrue(isinstance(result, Index)) - self.assertEqual(result.dtype, object) - self.assert_index_equal(result, expected) - self.assertEqual(result.name, expected.name) - self.assertEqual(idx.tolist(), expected_list) - - idx = DatetimeIndex([datetime(2013, 1, 1), datetime(2013, 1, 2), - pd.NaT, datetime(2013, 1, 4)], name='idx') - expected_list = [Timestamp('2013-01-01'), - Timestamp('2013-01-02'), pd.NaT, - Timestamp('2013-01-04')] - expected = pd.Index(expected_list, dtype=object, name='idx') - result = idx.asobject - self.assertTrue(isinstance(result, Index)) - self.assertEqual(result.dtype, object) - self.assert_index_equal(result, expected) - self.assertEqual(result.name, expected.name) - self.assertEqual(idx.tolist(), expected_list) - - def test_minmax(self): - for tz in self.tz: - # monotonic - idx1 = pd.DatetimeIndex(['2011-01-01', '2011-01-02', - '2011-01-03'], tz=tz) - self.assertTrue(idx1.is_monotonic) - - # non-monotonic - idx2 = pd.DatetimeIndex(['2011-01-01', pd.NaT, '2011-01-03', - '2011-01-02', pd.NaT], tz=tz) - self.assertFalse(idx2.is_monotonic) - - for idx in [idx1, idx2]: - self.assertEqual(idx.min(), Timestamp('2011-01-01', tz=tz)) - self.assertEqual(idx.max(), Timestamp('2011-01-03', tz=tz)) - self.assertEqual(idx.argmin(), 0) - self.assertEqual(idx.argmax(), 2) - - for op in ['min', 'max']: - # Return NaT - obj = DatetimeIndex([]) - self.assertTrue(pd.isnull(getattr(obj, op)())) - - obj = DatetimeIndex([pd.NaT]) - self.assertTrue(pd.isnull(getattr(obj, op)())) - - obj = DatetimeIndex([pd.NaT, pd.NaT, pd.NaT]) - self.assertTrue(pd.isnull(getattr(obj, op)())) - - def test_numpy_minmax(self): - dr = pd.date_range(start='2016-01-15', end='2016-01-20') - - self.assertEqual(np.min(dr), - Timestamp('2016-01-15 00:00:00', freq='D')) - self.assertEqual(np.max(dr), - Timestamp('2016-01-20 00:00:00', freq='D')) - - errmsg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, errmsg, np.min, dr, out=0) - tm.assertRaisesRegexp(ValueError, errmsg, np.max, dr, out=0) - - self.assertEqual(np.argmin(dr), 0) - self.assertEqual(np.argmax(dr), 5) - - if not _np_version_under1p10: - errmsg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, errmsg, np.argmin, dr, out=0) - tm.assertRaisesRegexp(ValueError, errmsg, np.argmax, dr, out=0) - - def test_round(self): - for tz in self.tz: - rng = 
pd.date_range(start='2016-01-01', periods=5, - freq='30Min', tz=tz) - elt = rng[1] - - expected_rng = DatetimeIndex([ - Timestamp('2016-01-01 00:00:00', tz=tz, freq='30T'), - Timestamp('2016-01-01 00:00:00', tz=tz, freq='30T'), - Timestamp('2016-01-01 01:00:00', tz=tz, freq='30T'), - Timestamp('2016-01-01 02:00:00', tz=tz, freq='30T'), - Timestamp('2016-01-01 02:00:00', tz=tz, freq='30T'), - ]) - expected_elt = expected_rng[1] - - tm.assert_index_equal(rng.round(freq='H'), expected_rng) - self.assertEqual(elt.round(freq='H'), expected_elt) - - msg = pd.tseries.frequencies._INVALID_FREQ_ERROR - with tm.assertRaisesRegexp(ValueError, msg): - rng.round(freq='foo') - with tm.assertRaisesRegexp(ValueError, msg): - elt.round(freq='foo') - - msg = " is a non-fixed frequency" - tm.assertRaisesRegexp(ValueError, msg, rng.round, freq='M') - tm.assertRaisesRegexp(ValueError, msg, elt.round, freq='M') - - def test_repeat_range(self): - rng = date_range('1/1/2000', '1/1/2001') - - result = rng.repeat(5) - self.assertIsNone(result.freq) - self.assertEqual(len(result), 5 * len(rng)) - - for tz in self.tz: - index = pd.date_range('2001-01-01', periods=2, freq='D', tz=tz) - exp = pd.DatetimeIndex(['2001-01-01', '2001-01-01', - '2001-01-02', '2001-01-02'], tz=tz) - for res in [index.repeat(2), np.repeat(index, 2)]: - tm.assert_index_equal(res, exp) - self.assertIsNone(res.freq) - - index = pd.date_range('2001-01-01', periods=2, freq='2D', tz=tz) - exp = pd.DatetimeIndex(['2001-01-01', '2001-01-01', - '2001-01-03', '2001-01-03'], tz=tz) - for res in [index.repeat(2), np.repeat(index, 2)]: - tm.assert_index_equal(res, exp) - self.assertIsNone(res.freq) - - index = pd.DatetimeIndex(['2001-01-01', 'NaT', '2003-01-01'], - tz=tz) - exp = pd.DatetimeIndex(['2001-01-01', '2001-01-01', '2001-01-01', - 'NaT', 'NaT', 'NaT', - '2003-01-01', '2003-01-01', '2003-01-01'], - tz=tz) - for res in [index.repeat(3), np.repeat(index, 3)]: - tm.assert_index_equal(res, exp) - self.assertIsNone(res.freq) - - def test_repeat(self): - reps = 2 - msg = "the 'axis' parameter is not supported" - - for tz in self.tz: - rng = pd.date_range(start='2016-01-01', periods=2, - freq='30Min', tz=tz) - - expected_rng = DatetimeIndex([ - Timestamp('2016-01-01 00:00:00', tz=tz, freq='30T'), - Timestamp('2016-01-01 00:00:00', tz=tz, freq='30T'), - Timestamp('2016-01-01 00:30:00', tz=tz, freq='30T'), - Timestamp('2016-01-01 00:30:00', tz=tz, freq='30T'), - ]) - - res = rng.repeat(reps) - tm.assert_index_equal(res, expected_rng) - self.assertIsNone(res.freq) - - tm.assert_index_equal(np.repeat(rng, reps), expected_rng) - tm.assertRaisesRegexp(ValueError, msg, np.repeat, - rng, reps, axis=1) - - def test_representation(self): - - idx = [] - idx.append(DatetimeIndex([], freq='D')) - idx.append(DatetimeIndex(['2011-01-01'], freq='D')) - idx.append(DatetimeIndex(['2011-01-01', '2011-01-02'], freq='D')) - idx.append(DatetimeIndex( - ['2011-01-01', '2011-01-02', '2011-01-03'], freq='D')) - idx.append(DatetimeIndex( - ['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00' - ], freq='H', tz='Asia/Tokyo')) - idx.append(DatetimeIndex( - ['2011-01-01 09:00', '2011-01-01 10:00', pd.NaT], tz='US/Eastern')) - idx.append(DatetimeIndex( - ['2011-01-01 09:00', '2011-01-01 10:00', pd.NaT], tz='UTC')) - - exp = [] - exp.append("""DatetimeIndex([], dtype='datetime64[ns]', freq='D')""") - exp.append("DatetimeIndex(['2011-01-01'], dtype='datetime64[ns]', " - "freq='D')") - exp.append("DatetimeIndex(['2011-01-01', '2011-01-02'], " - "dtype='datetime64[ns]', 
freq='D')") - exp.append("DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], " - "dtype='datetime64[ns]', freq='D')") - exp.append("DatetimeIndex(['2011-01-01 09:00:00+09:00', " - "'2011-01-01 10:00:00+09:00', '2011-01-01 11:00:00+09:00']" - ", dtype='datetime64[ns, Asia/Tokyo]', freq='H')") - exp.append("DatetimeIndex(['2011-01-01 09:00:00-05:00', " - "'2011-01-01 10:00:00-05:00', 'NaT'], " - "dtype='datetime64[ns, US/Eastern]', freq=None)") - exp.append("DatetimeIndex(['2011-01-01 09:00:00+00:00', " - "'2011-01-01 10:00:00+00:00', 'NaT'], " - "dtype='datetime64[ns, UTC]', freq=None)""") - - with pd.option_context('display.width', 300): - for indx, expected in zip(idx, exp): - for func in ['__repr__', '__unicode__', '__str__']: - result = getattr(indx, func)() - self.assertEqual(result, expected) - - def test_representation_to_series(self): - idx1 = DatetimeIndex([], freq='D') - idx2 = DatetimeIndex(['2011-01-01'], freq='D') - idx3 = DatetimeIndex(['2011-01-01', '2011-01-02'], freq='D') - idx4 = DatetimeIndex( - ['2011-01-01', '2011-01-02', '2011-01-03'], freq='D') - idx5 = DatetimeIndex(['2011-01-01 09:00', '2011-01-01 10:00', - '2011-01-01 11:00'], freq='H', tz='Asia/Tokyo') - idx6 = DatetimeIndex(['2011-01-01 09:00', '2011-01-01 10:00', pd.NaT], - tz='US/Eastern') - idx7 = DatetimeIndex(['2011-01-01 09:00', '2011-01-02 10:15']) - - exp1 = """Series([], dtype: datetime64[ns])""" - - exp2 = """0 2011-01-01 -dtype: datetime64[ns]""" - - exp3 = """0 2011-01-01 -1 2011-01-02 -dtype: datetime64[ns]""" - - exp4 = """0 2011-01-01 -1 2011-01-02 -2 2011-01-03 -dtype: datetime64[ns]""" - - exp5 = """0 2011-01-01 09:00:00+09:00 -1 2011-01-01 10:00:00+09:00 -2 2011-01-01 11:00:00+09:00 -dtype: datetime64[ns, Asia/Tokyo]""" - - exp6 = """0 2011-01-01 09:00:00-05:00 -1 2011-01-01 10:00:00-05:00 -2 NaT -dtype: datetime64[ns, US/Eastern]""" - - exp7 = """0 2011-01-01 09:00:00 -1 2011-01-02 10:15:00 -dtype: datetime64[ns]""" - - with pd.option_context('display.width', 300): - for idx, expected in zip([idx1, idx2, idx3, idx4, - idx5, idx6, idx7], - [exp1, exp2, exp3, exp4, - exp5, exp6, exp7]): - result = repr(Series(idx)) - self.assertEqual(result, expected) - - def test_summary(self): - # GH9116 - idx1 = DatetimeIndex([], freq='D') - idx2 = DatetimeIndex(['2011-01-01'], freq='D') - idx3 = DatetimeIndex(['2011-01-01', '2011-01-02'], freq='D') - idx4 = DatetimeIndex( - ['2011-01-01', '2011-01-02', '2011-01-03'], freq='D') - idx5 = DatetimeIndex(['2011-01-01 09:00', '2011-01-01 10:00', - '2011-01-01 11:00'], - freq='H', tz='Asia/Tokyo') - idx6 = DatetimeIndex(['2011-01-01 09:00', '2011-01-01 10:00', pd.NaT], - tz='US/Eastern') - - exp1 = """DatetimeIndex: 0 entries -Freq: D""" - - exp2 = """DatetimeIndex: 1 entries, 2011-01-01 to 2011-01-01 -Freq: D""" - - exp3 = """DatetimeIndex: 2 entries, 2011-01-01 to 2011-01-02 -Freq: D""" - - exp4 = """DatetimeIndex: 3 entries, 2011-01-01 to 2011-01-03 -Freq: D""" - - exp5 = ("DatetimeIndex: 3 entries, 2011-01-01 09:00:00+09:00 " - "to 2011-01-01 11:00:00+09:00\n" - "Freq: H") - - exp6 = """DatetimeIndex: 3 entries, 2011-01-01 09:00:00-05:00 to NaT""" - - for idx, expected in zip([idx1, idx2, idx3, idx4, idx5, idx6], - [exp1, exp2, exp3, exp4, exp5, exp6]): - result = idx.summary() - self.assertEqual(result, expected) - - def test_resolution(self): - for freq, expected in zip(['A', 'Q', 'M', 'D', 'H', 'T', - 'S', 'L', 'U'], - ['day', 'day', 'day', 'day', 'hour', - 'minute', 'second', 'millisecond', - 'microsecond']): - for tz in self.tz: - idx = 
pd.date_range(start='2013-04-01', periods=30, freq=freq, - tz=tz) - self.assertEqual(idx.resolution, expected) - - def test_union(self): - for tz in self.tz: - # union - rng1 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) - other1 = pd.date_range('1/6/2000', freq='D', periods=5, tz=tz) - expected1 = pd.date_range('1/1/2000', freq='D', periods=10, tz=tz) - - rng2 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) - other2 = pd.date_range('1/4/2000', freq='D', periods=5, tz=tz) - expected2 = pd.date_range('1/1/2000', freq='D', periods=8, tz=tz) - - rng3 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) - other3 = pd.DatetimeIndex([], tz=tz) - expected3 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) - - for rng, other, expected in [(rng1, other1, expected1), - (rng2, other2, expected2), - (rng3, other3, expected3)]: - - result_union = rng.union(other) - tm.assert_index_equal(result_union, expected) - - def test_add_iadd(self): - for tz in self.tz: - - # offset - offsets = [pd.offsets.Hour(2), timedelta(hours=2), - np.timedelta64(2, 'h'), Timedelta(hours=2)] - - for delta in offsets: - rng = pd.date_range('2000-01-01', '2000-02-01', tz=tz) - result = rng + delta - expected = pd.date_range('2000-01-01 02:00', - '2000-02-01 02:00', tz=tz) - tm.assert_index_equal(result, expected) - rng += delta - tm.assert_index_equal(rng, expected) - - # int - rng = pd.date_range('2000-01-01 09:00', freq='H', periods=10, - tz=tz) - result = rng + 1 - expected = pd.date_range('2000-01-01 10:00', freq='H', periods=10, - tz=tz) - tm.assert_index_equal(result, expected) - rng += 1 - tm.assert_index_equal(rng, expected) - - idx = DatetimeIndex(['2011-01-01', '2011-01-02']) - msg = "cannot add a datelike to a DatetimeIndex" - with tm.assertRaisesRegexp(TypeError, msg): - idx + Timestamp('2011-01-01') - - with tm.assertRaisesRegexp(TypeError, msg): - Timestamp('2011-01-01') + idx - - def test_add_dti_dti(self): - # previously performed setop (deprecated in 0.16.0), now raises - # TypeError (GH14164) - - dti = date_range('20130101', periods=3) - dti_tz = date_range('20130101', periods=3).tz_localize('US/Eastern') - - with tm.assertRaises(TypeError): - dti + dti - - with tm.assertRaises(TypeError): - dti_tz + dti_tz - - with tm.assertRaises(TypeError): - dti_tz + dti - - with tm.assertRaises(TypeError): - dti + dti_tz - - def test_difference(self): - for tz in self.tz: - # diff - rng1 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) - other1 = pd.date_range('1/6/2000', freq='D', periods=5, tz=tz) - expected1 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) - - rng2 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) - other2 = pd.date_range('1/4/2000', freq='D', periods=5, tz=tz) - expected2 = pd.date_range('1/1/2000', freq='D', periods=3, tz=tz) - - rng3 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) - other3 = pd.DatetimeIndex([], tz=tz) - expected3 = pd.date_range('1/1/2000', freq='D', periods=5, tz=tz) - - for rng, other, expected in [(rng1, other1, expected1), - (rng2, other2, expected2), - (rng3, other3, expected3)]: - result_diff = rng.difference(other) - tm.assert_index_equal(result_diff, expected) - - def test_sub_isub(self): - for tz in self.tz: - - # offset - offsets = [pd.offsets.Hour(2), timedelta(hours=2), - np.timedelta64(2, 'h'), Timedelta(hours=2)] - - for delta in offsets: - rng = pd.date_range('2000-01-01', '2000-02-01', tz=tz) - expected = pd.date_range('1999-12-31 22:00', - '2000-01-31 22:00', tz=tz) - - result = rng - delta - 
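`test_add_iadd` and `test_sub_isub` above exercise four equivalent spellings of the same two-hour shift; the subtraction loop in progress here resumes below. Condensed to a single timezone as a self-contained check:

```python
from datetime import timedelta

import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', '2000-02-01', tz='US/Eastern')
expected = pd.date_range('1999-12-31 22:00', '2000-01-31 22:00',
                         tz='US/Eastern')

# offset, stdlib timedelta, numpy timedelta64 and pd.Timedelta
# must all shift the index identically
for delta in [pd.offsets.Hour(2), timedelta(hours=2),
              np.timedelta64(2, 'h'), pd.Timedelta(hours=2)]:
    pd.testing.assert_index_equal(rng - delta, expected)
```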
tm.assert_index_equal(result, expected) - rng -= delta - tm.assert_index_equal(rng, expected) - - # int - rng = pd.date_range('2000-01-01 09:00', freq='H', periods=10, - tz=tz) - result = rng - 1 - expected = pd.date_range('2000-01-01 08:00', freq='H', periods=10, - tz=tz) - tm.assert_index_equal(result, expected) - rng -= 1 - tm.assert_index_equal(rng, expected) - - def test_sub_dti_dti(self): - # previously performed setop (deprecated in 0.16.0), now changed to - # return subtraction -> TimeDeltaIndex (GH ...) - - dti = date_range('20130101', periods=3) - dti_tz = date_range('20130101', periods=3).tz_localize('US/Eastern') - dti_tz2 = date_range('20130101', periods=3).tz_localize('UTC') - expected = TimedeltaIndex([0, 0, 0]) - - result = dti - dti - tm.assert_index_equal(result, expected) - - result = dti_tz - dti_tz - tm.assert_index_equal(result, expected) - - with tm.assertRaises(TypeError): - dti_tz - dti - - with tm.assertRaises(TypeError): - dti - dti_tz - - with tm.assertRaises(TypeError): - dti_tz - dti_tz2 - - # isub - dti -= dti - tm.assert_index_equal(dti, expected) - - # different length raises ValueError - dti1 = date_range('20130101', periods=3) - dti2 = date_range('20130101', periods=4) - with tm.assertRaises(ValueError): - dti1 - dti2 - - # NaN propagation - dti1 = DatetimeIndex(['2012-01-01', np.nan, '2012-01-03']) - dti2 = DatetimeIndex(['2012-01-02', '2012-01-03', np.nan]) - expected = TimedeltaIndex(['1 days', np.nan, np.nan]) - result = dti2 - dti1 - tm.assert_index_equal(result, expected) - - def test_sub_period(self): - # GH 13078 - # not supported, check TypeError - p = pd.Period('2011-01-01', freq='D') - - for freq in [None, 'D']: - idx = pd.DatetimeIndex(['2011-01-01', '2011-01-02'], freq=freq) - - with tm.assertRaises(TypeError): - idx - p - - with tm.assertRaises(TypeError): - p - idx - - def test_comp_nat(self): - left = pd.DatetimeIndex([pd.Timestamp('2011-01-01'), pd.NaT, - pd.Timestamp('2011-01-03')]) - right = pd.DatetimeIndex([pd.NaT, pd.NaT, pd.Timestamp('2011-01-03')]) - - for l, r in [(left, right), (left.asobject, right.asobject)]: - result = l == r - expected = np.array([False, False, True]) - tm.assert_numpy_array_equal(result, expected) - - result = l != r - expected = np.array([True, True, False]) - tm.assert_numpy_array_equal(result, expected) - - expected = np.array([False, False, False]) - tm.assert_numpy_array_equal(l == pd.NaT, expected) - tm.assert_numpy_array_equal(pd.NaT == r, expected) - - expected = np.array([True, True, True]) - tm.assert_numpy_array_equal(l != pd.NaT, expected) - tm.assert_numpy_array_equal(pd.NaT != l, expected) - - expected = np.array([False, False, False]) - tm.assert_numpy_array_equal(l < pd.NaT, expected) - tm.assert_numpy_array_equal(pd.NaT > l, expected) - - def test_value_counts_unique(self): - # GH 7735 - for tz in self.tz: - idx = pd.date_range('2011-01-01 09:00', freq='H', periods=10) - # create repeated values, 'n'th element is repeated by n+1 times - idx = DatetimeIndex(np.repeat(idx.values, range(1, len(idx) + 1)), - tz=tz) - - exp_idx = pd.date_range('2011-01-01 18:00', freq='-1H', periods=10, - tz=tz) - expected = Series(range(10, 0, -1), index=exp_idx, dtype='int64') - - for obj in [idx, Series(idx)]: - tm.assert_series_equal(obj.value_counts(), expected) - - expected = pd.date_range('2011-01-01 09:00', freq='H', periods=10, - tz=tz) - tm.assert_index_equal(idx.unique(), expected) - - idx = DatetimeIndex(['2013-01-01 09:00', '2013-01-01 09:00', - '2013-01-01 09:00', '2013-01-01 08:00', - '2013-01-01 
08:00', pd.NaT], tz=tz) - - exp_idx = DatetimeIndex(['2013-01-01 09:00', '2013-01-01 08:00'], - tz=tz) - expected = Series([3, 2], index=exp_idx) - - for obj in [idx, Series(idx)]: - tm.assert_series_equal(obj.value_counts(), expected) - - exp_idx = DatetimeIndex(['2013-01-01 09:00', '2013-01-01 08:00', - pd.NaT], tz=tz) - expected = Series([3, 2, 1], index=exp_idx) - - for obj in [idx, Series(idx)]: - tm.assert_series_equal(obj.value_counts(dropna=False), - expected) - - tm.assert_index_equal(idx.unique(), exp_idx) - - def test_nonunique_contains(self): - # GH 9512 - for idx in map(DatetimeIndex, - ([0, 1, 0], [0, 0, -1], [0, -1, -1], - ['2015', '2015', '2016'], ['2015', '2015', '2014'])): - tm.assertIn(idx[0], idx) - - def test_order(self): - # with freq - idx1 = DatetimeIndex(['2011-01-01', '2011-01-02', - '2011-01-03'], freq='D', name='idx') - idx2 = DatetimeIndex(['2011-01-01 09:00', '2011-01-01 10:00', - '2011-01-01 11:00'], freq='H', - tz='Asia/Tokyo', name='tzidx') - - for idx in [idx1, idx2]: - ordered = idx.sort_values() - self.assert_index_equal(ordered, idx) - self.assertEqual(ordered.freq, idx.freq) - - ordered = idx.sort_values(ascending=False) - expected = idx[::-1] - self.assert_index_equal(ordered, expected) - self.assertEqual(ordered.freq, expected.freq) - self.assertEqual(ordered.freq.n, -1) - - ordered, indexer = idx.sort_values(return_indexer=True) - self.assert_index_equal(ordered, idx) - self.assert_numpy_array_equal(indexer, - np.array([0, 1, 2]), - check_dtype=False) - self.assertEqual(ordered.freq, idx.freq) - - ordered, indexer = idx.sort_values(return_indexer=True, - ascending=False) - expected = idx[::-1] - self.assert_index_equal(ordered, expected) - self.assert_numpy_array_equal(indexer, - np.array([2, 1, 0]), - check_dtype=False) - self.assertEqual(ordered.freq, expected.freq) - self.assertEqual(ordered.freq.n, -1) - - # without freq - for tz in self.tz: - idx1 = DatetimeIndex(['2011-01-01', '2011-01-03', '2011-01-05', - '2011-01-02', '2011-01-01'], - tz=tz, name='idx1') - exp1 = DatetimeIndex(['2011-01-01', '2011-01-01', '2011-01-02', - '2011-01-03', '2011-01-05'], - tz=tz, name='idx1') - - idx2 = DatetimeIndex(['2011-01-01', '2011-01-03', '2011-01-05', - '2011-01-02', '2011-01-01'], - tz=tz, name='idx2') - - exp2 = DatetimeIndex(['2011-01-01', '2011-01-01', '2011-01-02', - '2011-01-03', '2011-01-05'], - tz=tz, name='idx2') - - idx3 = DatetimeIndex([pd.NaT, '2011-01-03', '2011-01-05', - '2011-01-02', pd.NaT], tz=tz, name='idx3') - exp3 = DatetimeIndex([pd.NaT, pd.NaT, '2011-01-02', '2011-01-03', - '2011-01-05'], tz=tz, name='idx3') - - for idx, expected in [(idx1, exp1), (idx2, exp2), (idx3, exp3)]: - ordered = idx.sort_values() - self.assert_index_equal(ordered, expected) - self.assertIsNone(ordered.freq) - - ordered = idx.sort_values(ascending=False) - self.assert_index_equal(ordered, expected[::-1]) - self.assertIsNone(ordered.freq) - - ordered, indexer = idx.sort_values(return_indexer=True) - self.assert_index_equal(ordered, expected) - - exp = np.array([0, 4, 3, 1, 2]) - self.assert_numpy_array_equal(indexer, exp, check_dtype=False) - self.assertIsNone(ordered.freq) - - ordered, indexer = idx.sort_values(return_indexer=True, - ascending=False) - self.assert_index_equal(ordered, expected[::-1]) - - exp = np.array([2, 1, 3, 4, 0]) - self.assert_numpy_array_equal(indexer, exp, check_dtype=False) - self.assertIsNone(ordered.freq) - - def test_getitem(self): - idx1 = pd.date_range('2011-01-01', '2011-01-31', freq='D', name='idx') - idx2 = 
pd.date_range('2011-01-01', '2011-01-31', freq='D', - tz='Asia/Tokyo', name='idx') - - for idx in [idx1, idx2]: - result = idx[0] - self.assertEqual(result, Timestamp('2011-01-01', tz=idx.tz)) - - result = idx[0:5] - expected = pd.date_range('2011-01-01', '2011-01-05', freq='D', - tz=idx.tz, name='idx') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, expected.freq) - - result = idx[0:10:2] - expected = pd.date_range('2011-01-01', '2011-01-09', freq='2D', - tz=idx.tz, name='idx') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, expected.freq) - - result = idx[-20:-5:3] - expected = pd.date_range('2011-01-12', '2011-01-24', freq='3D', - tz=idx.tz, name='idx') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, expected.freq) - - result = idx[4::-1] - expected = DatetimeIndex(['2011-01-05', '2011-01-04', '2011-01-03', - '2011-01-02', '2011-01-01'], - freq='-1D', tz=idx.tz, name='idx') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, expected.freq) - - def test_drop_duplicates_metadata(self): - # GH 10115 - idx = pd.date_range('2011-01-01', '2011-01-31', freq='D', name='idx') - result = idx.drop_duplicates() - self.assert_index_equal(idx, result) - self.assertEqual(idx.freq, result.freq) - - idx_dup = idx.append(idx) - self.assertIsNone(idx_dup.freq) # freq is reset - result = idx_dup.drop_duplicates() - self.assert_index_equal(idx, result) - self.assertIsNone(result.freq) - - def test_drop_duplicates(self): - # to check Index/Series compat - base = pd.date_range('2011-01-01', '2011-01-31', freq='D', name='idx') - idx = base.append(base[:5]) - - res = idx.drop_duplicates() - tm.assert_index_equal(res, base) - res = Series(idx).drop_duplicates() - tm.assert_series_equal(res, Series(base)) - - res = idx.drop_duplicates(keep='last') - exp = base[5:].append(base[:5]) - tm.assert_index_equal(res, exp) - res = Series(idx).drop_duplicates(keep='last') - tm.assert_series_equal(res, Series(exp, index=np.arange(5, 36))) - - res = idx.drop_duplicates(keep=False) - tm.assert_index_equal(res, base[5:]) - res = Series(idx).drop_duplicates(keep=False) - tm.assert_series_equal(res, Series(base[5:], index=np.arange(5, 31))) - - def test_take(self): - # GH 10295 - idx1 = pd.date_range('2011-01-01', '2011-01-31', freq='D', name='idx') - idx2 = pd.date_range('2011-01-01', '2011-01-31', freq='D', - tz='Asia/Tokyo', name='idx') - - for idx in [idx1, idx2]: - result = idx.take([0]) - self.assertEqual(result, Timestamp('2011-01-01', tz=idx.tz)) - - result = idx.take([0, 1, 2]) - expected = pd.date_range('2011-01-01', '2011-01-03', freq='D', - tz=idx.tz, name='idx') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, expected.freq) - - result = idx.take([0, 2, 4]) - expected = pd.date_range('2011-01-01', '2011-01-05', freq='2D', - tz=idx.tz, name='idx') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, expected.freq) - - result = idx.take([7, 4, 1]) - expected = pd.date_range('2011-01-08', '2011-01-02', freq='-3D', - tz=idx.tz, name='idx') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, expected.freq) - - result = idx.take([3, 2, 5]) - expected = DatetimeIndex(['2011-01-04', '2011-01-03', - '2011-01-06'], - freq=None, tz=idx.tz, name='idx') - self.assert_index_equal(result, expected) - self.assertIsNone(result.freq) - - result = idx.take([-3, 2, 5]) - expected = DatetimeIndex(['2011-01-29', '2011-01-03', - '2011-01-06'], - freq=None, 
tz=idx.tz, name='idx')
-            self.assert_index_equal(result, expected)
-            self.assertIsNone(result.freq)
-
-    def test_take_invalid_kwargs(self):
-        idx = pd.date_range('2011-01-01', '2011-01-31', freq='D', name='idx')
-        indices = [1, 6, 5, 9, 10, 13, 15, 3]
-
-        msg = r"take\(\) got an unexpected keyword argument 'foo'"
-        tm.assertRaisesRegexp(TypeError, msg, idx.take,
-                              indices, foo=2)
-
-        msg = "the 'out' parameter is not supported"
-        tm.assertRaisesRegexp(ValueError, msg, idx.take,
-                              indices, out=indices)
-
-        msg = "the 'mode' parameter is not supported"
-        tm.assertRaisesRegexp(ValueError, msg, idx.take,
-                              indices, mode='clip')
-
-    def test_infer_freq(self):
-        # GH 11018
-        for freq in ['A', '2A', '-2A', 'Q', '-1Q', 'M', '-1M', 'D', '3D',
-                     '-3D', 'W', '-1W', 'H', '2H', '-2H', 'T', '2T', 'S',
-                     '-3S']:
-            idx = pd.date_range('2011-01-01 09:00:00', freq=freq, periods=10)
-            result = pd.DatetimeIndex(idx.asi8, freq='infer')
-            tm.assert_index_equal(idx, result)
-            self.assertEqual(result.freq, freq)
-
-    def test_nat_new(self):
-        idx = pd.date_range('2011-01-01', freq='D', periods=5, name='x')
-        result = idx._nat_new()
-        exp = pd.DatetimeIndex([pd.NaT] * 5, name='x')
-        tm.assert_index_equal(result, exp)
-
-        result = idx._nat_new(box=False)
-        exp = np.array([tslib.iNaT] * 5, dtype=np.int64)
-        tm.assert_numpy_array_equal(result, exp)
-
-    def test_shift(self):
-        # GH 9903
-        for tz in self.tz:
-            idx = pd.DatetimeIndex([], name='xxx', tz=tz)
-            tm.assert_index_equal(idx.shift(0, freq='H'), idx)
-            tm.assert_index_equal(idx.shift(3, freq='H'), idx)
-
-            idx = pd.DatetimeIndex(['2011-01-01 10:00', '2011-01-01 11:00',
-                                    '2011-01-01 12:00'], name='xxx', tz=tz)
-            tm.assert_index_equal(idx.shift(0, freq='H'), idx)
-            exp = pd.DatetimeIndex(['2011-01-01 13:00', '2011-01-01 14:00',
-                                    '2011-01-01 15:00'], name='xxx', tz=tz)
-            tm.assert_index_equal(idx.shift(3, freq='H'), exp)
-            exp = pd.DatetimeIndex(['2011-01-01 07:00', '2011-01-01 08:00',
-                                    '2011-01-01 09:00'], name='xxx', tz=tz)
-            tm.assert_index_equal(idx.shift(-3, freq='H'), exp)
-
-    def test_nat(self):
-        self.assertIs(pd.DatetimeIndex._na_value, pd.NaT)
-        self.assertIs(pd.DatetimeIndex([])._na_value, pd.NaT)
-
-        for tz in [None, 'US/Eastern', 'UTC']:
-            idx = pd.DatetimeIndex(['2011-01-01', '2011-01-02'], tz=tz)
-            self.assertTrue(idx._can_hold_na)
-
-            tm.assert_numpy_array_equal(idx._isnan, np.array([False, False]))
-            self.assertFalse(idx.hasnans)
-            tm.assert_numpy_array_equal(idx._nan_idxs,
-                                        np.array([], dtype=np.intp))
-
-            idx = pd.DatetimeIndex(['2011-01-01', 'NaT'], tz=tz)
-            self.assertTrue(idx._can_hold_na)
-
-            tm.assert_numpy_array_equal(idx._isnan, np.array([False, True]))
-            self.assertTrue(idx.hasnans)
-            tm.assert_numpy_array_equal(idx._nan_idxs,
-                                        np.array([1], dtype=np.intp))
-
-    def test_equals(self):
-        # GH 13107
-        for tz in [None, 'UTC', 'US/Eastern', 'Asia/Tokyo']:
-            idx = pd.DatetimeIndex(['2011-01-01', '2011-01-02', 'NaT'],
-                                   tz=tz)
-            self.assertTrue(idx.equals(idx))
-            self.assertTrue(idx.equals(idx.copy()))
-            self.assertTrue(idx.equals(idx.asobject))
-            self.assertTrue(idx.asobject.equals(idx))
-            self.assertTrue(idx.asobject.equals(idx.asobject))
-            self.assertFalse(idx.equals(list(idx)))
-            self.assertFalse(idx.equals(pd.Series(idx)))
-
-            idx2 = pd.DatetimeIndex(['2011-01-01', '2011-01-02', 'NaT'],
-                                    tz='US/Pacific')
-            self.assertFalse(idx.equals(idx2))
-            self.assertFalse(idx.equals(idx2.copy()))
-            self.assertFalse(idx.equals(idx2.asobject))
-            self.assertFalse(idx.asobject.equals(idx2))
-            self.assertFalse(idx.equals(list(idx2)))
-            self.assertFalse(idx.equals(pd.Series(idx2)))
-
-            # same internal, different tz
-            idx3 = pd.DatetimeIndex._simple_new(idx.asi8, tz='US/Pacific')
-            tm.assert_numpy_array_equal(idx.asi8, idx3.asi8)
-            self.assertFalse(idx.equals(idx3))
-            self.assertFalse(idx.equals(idx3.copy()))
-            self.assertFalse(idx.equals(idx3.asobject))
-            self.assertFalse(idx.asobject.equals(idx3))
-            self.assertFalse(idx.equals(list(idx3)))
-            self.assertFalse(idx.equals(pd.Series(idx3)))
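The `test_equals` cases above pin down how tz-sensitive `DatetimeIndex.equals` is. As a minimal standalone sketch of the same behaviour, outside the test harness (the tz strings are purely illustrative):

```python
import pandas as pd

idx = pd.DatetimeIndex(['2011-01-01', '2011-01-02', 'NaT'], tz='UTC')

# .equals() compares element-wise and is tz-sensitive
print(idx.equals(idx.copy()))   # True: same values, same tz
print(idx.equals(list(idx)))    # False: a plain list is not an Index

# identical wall-clock strings under a different tz are different instants
other = pd.DatetimeIndex(['2011-01-01', '2011-01-02', 'NaT'],
                         tz='US/Pacific')
print(idx.equals(other))        # False
```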
-class TestTimedeltaIndexOps(Ops):
-    def setUp(self):
-        super(TestTimedeltaIndexOps, self).setUp()
-        mask = lambda x: isinstance(x, TimedeltaIndex)
-        self.is_valid_objs = [o for o in self.objs if mask(o)]
-        self.not_valid_objs = []
-
-    def test_ops_properties(self):
-        self.check_ops_properties(['days', 'hours', 'minutes', 'seconds',
-                                   'milliseconds'])
-        self.check_ops_properties(['microseconds', 'nanoseconds'])
-
-    def test_asobject_tolist(self):
-        idx = timedelta_range(start='1 days', periods=4, freq='D', name='idx')
-        expected_list = [Timedelta('1 days'), Timedelta('2 days'),
-                         Timedelta('3 days'), Timedelta('4 days')]
-        expected = pd.Index(expected_list, dtype=object, name='idx')
-        result = idx.asobject
-        self.assertTrue(isinstance(result, Index))
-
-        self.assertEqual(result.dtype, object)
-        self.assert_index_equal(result, expected)
-        self.assertEqual(result.name, expected.name)
-        self.assertEqual(idx.tolist(), expected_list)
-
-        idx = TimedeltaIndex([timedelta(days=1), timedelta(days=2), pd.NaT,
-                              timedelta(days=4)], name='idx')
-        expected_list = [Timedelta('1 days'), Timedelta('2 days'), pd.NaT,
-                         Timedelta('4 days')]
-        expected = pd.Index(expected_list, dtype=object, name='idx')
-        result = idx.asobject
-        self.assertTrue(isinstance(result, Index))
-        self.assertEqual(result.dtype, object)
-        self.assert_index_equal(result, expected)
-        self.assertEqual(result.name, expected.name)
-        self.assertEqual(idx.tolist(), expected_list)
-
-    def test_minmax(self):
-
-        # monotonic
-        idx1 = TimedeltaIndex(['1 days', '2 days', '3 days'])
-        self.assertTrue(idx1.is_monotonic)
-
-        # non-monotonic
-        idx2 = TimedeltaIndex(['1 days', np.nan, '3 days', 'NaT'])
-        self.assertFalse(idx2.is_monotonic)
-
-        for idx in [idx1, idx2]:
-            self.assertEqual(idx.min(), Timedelta('1 days'))
-            self.assertEqual(idx.max(), Timedelta('3 days'))
-            self.assertEqual(idx.argmin(), 0)
-            self.assertEqual(idx.argmax(), 2)
-
-        for op in ['min', 'max']:
-            # Return NaT
-            obj = TimedeltaIndex([])
-            self.assertTrue(pd.isnull(getattr(obj, op)()))
-
-            obj = TimedeltaIndex([pd.NaT])
-            self.assertTrue(pd.isnull(getattr(obj, op)()))
-
-            obj = TimedeltaIndex([pd.NaT, pd.NaT, pd.NaT])
-            self.assertTrue(pd.isnull(getattr(obj, op)()))
-
-    def test_numpy_minmax(self):
-        dr = pd.date_range(start='2016-01-15', end='2016-01-20')
-        td = TimedeltaIndex(np.asarray(dr))
-
-        self.assertEqual(np.min(td), Timedelta('16815 days'))
-        self.assertEqual(np.max(td), Timedelta('16820 days'))
-
-        errmsg = "the 'out' parameter is not supported"
-        tm.assertRaisesRegexp(ValueError, errmsg, np.min, td, out=0)
-        tm.assertRaisesRegexp(ValueError, errmsg, np.max, td, out=0)
-
-        self.assertEqual(np.argmin(td), 0)
-        self.assertEqual(np.argmax(td), 5)
-
-        if not _np_version_under1p10:
-            errmsg = "the 'out' parameter is not supported"
-            tm.assertRaisesRegexp(ValueError, errmsg, np.argmin, td, out=0)
-            tm.assertRaisesRegexp(ValueError, errmsg, np.argmax, td, out=0)
-
-    def test_round(self):
-        td = pd.timedelta_range(start='16801 days', periods=5, freq='30Min')
-        elt = td[1]
-
-        expected_rng = TimedeltaIndex([
Timedelta('16801 days 00:00:00'), - Timedelta('16801 days 00:00:00'), - Timedelta('16801 days 01:00:00'), - Timedelta('16801 days 02:00:00'), - Timedelta('16801 days 02:00:00'), - ]) - expected_elt = expected_rng[1] - - tm.assert_index_equal(td.round(freq='H'), expected_rng) - self.assertEqual(elt.round(freq='H'), expected_elt) - - msg = pd.tseries.frequencies._INVALID_FREQ_ERROR - with self.assertRaisesRegexp(ValueError, msg): - td.round(freq='foo') - with tm.assertRaisesRegexp(ValueError, msg): - elt.round(freq='foo') - - msg = " is a non-fixed frequency" - tm.assertRaisesRegexp(ValueError, msg, td.round, freq='M') - tm.assertRaisesRegexp(ValueError, msg, elt.round, freq='M') - - def test_representation(self): - idx1 = TimedeltaIndex([], freq='D') - idx2 = TimedeltaIndex(['1 days'], freq='D') - idx3 = TimedeltaIndex(['1 days', '2 days'], freq='D') - idx4 = TimedeltaIndex(['1 days', '2 days', '3 days'], freq='D') - idx5 = TimedeltaIndex(['1 days 00:00:01', '2 days', '3 days']) - - exp1 = """TimedeltaIndex([], dtype='timedelta64[ns]', freq='D')""" - - exp2 = ("TimedeltaIndex(['1 days'], dtype='timedelta64[ns]', " - "freq='D')") - - exp3 = ("TimedeltaIndex(['1 days', '2 days'], " - "dtype='timedelta64[ns]', freq='D')") - - exp4 = ("TimedeltaIndex(['1 days', '2 days', '3 days'], " - "dtype='timedelta64[ns]', freq='D')") - - exp5 = ("TimedeltaIndex(['1 days 00:00:01', '2 days 00:00:00', " - "'3 days 00:00:00'], dtype='timedelta64[ns]', freq=None)") - - with pd.option_context('display.width', 300): - for idx, expected in zip([idx1, idx2, idx3, idx4, idx5], - [exp1, exp2, exp3, exp4, exp5]): - for func in ['__repr__', '__unicode__', '__str__']: - result = getattr(idx, func)() - self.assertEqual(result, expected) - - def test_representation_to_series(self): - idx1 = TimedeltaIndex([], freq='D') - idx2 = TimedeltaIndex(['1 days'], freq='D') - idx3 = TimedeltaIndex(['1 days', '2 days'], freq='D') - idx4 = TimedeltaIndex(['1 days', '2 days', '3 days'], freq='D') - idx5 = TimedeltaIndex(['1 days 00:00:01', '2 days', '3 days']) - - exp1 = """Series([], dtype: timedelta64[ns])""" - - exp2 = """0 1 days -dtype: timedelta64[ns]""" - - exp3 = """0 1 days -1 2 days -dtype: timedelta64[ns]""" - - exp4 = """0 1 days -1 2 days -2 3 days -dtype: timedelta64[ns]""" - - exp5 = """0 1 days 00:00:01 -1 2 days 00:00:00 -2 3 days 00:00:00 -dtype: timedelta64[ns]""" - - with pd.option_context('display.width', 300): - for idx, expected in zip([idx1, idx2, idx3, idx4, idx5], - [exp1, exp2, exp3, exp4, exp5]): - result = repr(pd.Series(idx)) - self.assertEqual(result, expected) - - def test_summary(self): - # GH9116 - idx1 = TimedeltaIndex([], freq='D') - idx2 = TimedeltaIndex(['1 days'], freq='D') - idx3 = TimedeltaIndex(['1 days', '2 days'], freq='D') - idx4 = TimedeltaIndex(['1 days', '2 days', '3 days'], freq='D') - idx5 = TimedeltaIndex(['1 days 00:00:01', '2 days', '3 days']) - - exp1 = """TimedeltaIndex: 0 entries -Freq: D""" - - exp2 = """TimedeltaIndex: 1 entries, 1 days to 1 days -Freq: D""" - - exp3 = """TimedeltaIndex: 2 entries, 1 days to 2 days -Freq: D""" - - exp4 = """TimedeltaIndex: 3 entries, 1 days to 3 days -Freq: D""" - - exp5 = ("TimedeltaIndex: 3 entries, 1 days 00:00:01 to 3 days " - "00:00:00") - - for idx, expected in zip([idx1, idx2, idx3, idx4, idx5], - [exp1, exp2, exp3, exp4, exp5]): - result = idx.summary() - self.assertEqual(result, expected) - - def test_add_iadd(self): - - # only test adding/sub offsets as + is now numeric - - # offset - offsets = [pd.offsets.Hour(2), 
timedelta(hours=2), - np.timedelta64(2, 'h'), Timedelta(hours=2)] - - for delta in offsets: - rng = timedelta_range('1 days', '10 days') - result = rng + delta - expected = timedelta_range('1 days 02:00:00', '10 days 02:00:00', - freq='D') - tm.assert_index_equal(result, expected) - rng += delta - tm.assert_index_equal(rng, expected) - - # int - rng = timedelta_range('1 days 09:00:00', freq='H', periods=10) - result = rng + 1 - expected = timedelta_range('1 days 10:00:00', freq='H', periods=10) - tm.assert_index_equal(result, expected) - rng += 1 - tm.assert_index_equal(rng, expected) - - def test_sub_isub(self): - # only test adding/sub offsets as - is now numeric - - # offset - offsets = [pd.offsets.Hour(2), timedelta(hours=2), - np.timedelta64(2, 'h'), Timedelta(hours=2)] - - for delta in offsets: - rng = timedelta_range('1 days', '10 days') - result = rng - delta - expected = timedelta_range('0 days 22:00:00', '9 days 22:00:00') - tm.assert_index_equal(result, expected) - rng -= delta - tm.assert_index_equal(rng, expected) - - # int - rng = timedelta_range('1 days 09:00:00', freq='H', periods=10) - result = rng - 1 - expected = timedelta_range('1 days 08:00:00', freq='H', periods=10) - tm.assert_index_equal(result, expected) - rng -= 1 - tm.assert_index_equal(rng, expected) - - idx = TimedeltaIndex(['1 day', '2 day']) - msg = "cannot subtract a datelike from a TimedeltaIndex" - with tm.assertRaisesRegexp(TypeError, msg): - idx - Timestamp('2011-01-01') - - result = Timestamp('2011-01-01') + idx - expected = DatetimeIndex(['2011-01-02', '2011-01-03']) - tm.assert_index_equal(result, expected) - - def test_ops_compat(self): - - offsets = [pd.offsets.Hour(2), timedelta(hours=2), - np.timedelta64(2, 'h'), Timedelta(hours=2)] - - rng = timedelta_range('1 days', '10 days', name='foo') - - # multiply - for offset in offsets: - self.assertRaises(TypeError, lambda: rng * offset) - - # divide - expected = Int64Index((np.arange(10) + 1) * 12, name='foo') - for offset in offsets: - result = rng / offset - tm.assert_index_equal(result, expected, exact=False) - - # divide with nats - rng = TimedeltaIndex(['1 days', pd.NaT, '2 days'], name='foo') - expected = Float64Index([12, np.nan, 24], name='foo') - for offset in offsets: - result = rng / offset - tm.assert_index_equal(result, expected) - - # don't allow division by NaT (make could in the future) - self.assertRaises(TypeError, lambda: rng / pd.NaT) - - def test_subtraction_ops(self): - - # with datetimes/timedelta and tdi/dti - tdi = TimedeltaIndex(['1 days', pd.NaT, '2 days'], name='foo') - dti = date_range('20130101', periods=3, name='bar') - td = Timedelta('1 days') - dt = Timestamp('20130101') - - self.assertRaises(TypeError, lambda: tdi - dt) - self.assertRaises(TypeError, lambda: tdi - dti) - self.assertRaises(TypeError, lambda: td - dt) - self.assertRaises(TypeError, lambda: td - dti) - - result = dt - dti - expected = TimedeltaIndex(['0 days', '-1 days', '-2 days'], name='bar') - tm.assert_index_equal(result, expected) - - result = dti - dt - expected = TimedeltaIndex(['0 days', '1 days', '2 days'], name='bar') - tm.assert_index_equal(result, expected) - - result = tdi - td - expected = TimedeltaIndex(['0 days', pd.NaT, '1 days'], name='foo') - tm.assert_index_equal(result, expected, check_names=False) - - result = td - tdi - expected = TimedeltaIndex(['0 days', pd.NaT, '-1 days'], name='foo') - tm.assert_index_equal(result, expected, check_names=False) - - result = dti - td - expected = DatetimeIndex( - ['20121231', '20130101', 
'20130102'], name='bar') - tm.assert_index_equal(result, expected, check_names=False) - - result = dt - tdi - expected = DatetimeIndex(['20121231', pd.NaT, '20121230'], name='foo') - tm.assert_index_equal(result, expected) - - def test_subtraction_ops_with_tz(self): - - # check that dt/dti subtraction ops with tz are validated - dti = date_range('20130101', periods=3) - ts = Timestamp('20130101') - dt = ts.to_pydatetime() - dti_tz = date_range('20130101', periods=3).tz_localize('US/Eastern') - ts_tz = Timestamp('20130101').tz_localize('US/Eastern') - ts_tz2 = Timestamp('20130101').tz_localize('CET') - dt_tz = ts_tz.to_pydatetime() - td = Timedelta('1 days') - - def _check(result, expected): - self.assertEqual(result, expected) - self.assertIsInstance(result, Timedelta) - - # scalars - result = ts - ts - expected = Timedelta('0 days') - _check(result, expected) - - result = dt_tz - ts_tz - expected = Timedelta('0 days') - _check(result, expected) - - result = ts_tz - dt_tz - expected = Timedelta('0 days') - _check(result, expected) - - # tz mismatches - self.assertRaises(TypeError, lambda: dt_tz - ts) - self.assertRaises(TypeError, lambda: dt_tz - dt) - self.assertRaises(TypeError, lambda: dt_tz - ts_tz2) - self.assertRaises(TypeError, lambda: dt - dt_tz) - self.assertRaises(TypeError, lambda: ts - dt_tz) - self.assertRaises(TypeError, lambda: ts_tz2 - ts) - self.assertRaises(TypeError, lambda: ts_tz2 - dt) - self.assertRaises(TypeError, lambda: ts_tz - ts_tz2) - - # with dti - self.assertRaises(TypeError, lambda: dti - ts_tz) - self.assertRaises(TypeError, lambda: dti_tz - ts) - self.assertRaises(TypeError, lambda: dti_tz - ts_tz2) - - result = dti_tz - dt_tz - expected = TimedeltaIndex(['0 days', '1 days', '2 days']) - tm.assert_index_equal(result, expected) - - result = dt_tz - dti_tz - expected = TimedeltaIndex(['0 days', '-1 days', '-2 days']) - tm.assert_index_equal(result, expected) - - result = dti_tz - ts_tz - expected = TimedeltaIndex(['0 days', '1 days', '2 days']) - tm.assert_index_equal(result, expected) - - result = ts_tz - dti_tz - expected = TimedeltaIndex(['0 days', '-1 days', '-2 days']) - tm.assert_index_equal(result, expected) - - result = td - td - expected = Timedelta('0 days') - _check(result, expected) - - result = dti_tz - td - expected = DatetimeIndex( - ['20121231', '20130101', '20130102'], tz='US/Eastern') - tm.assert_index_equal(result, expected) - - def test_dti_tdi_numeric_ops(self): - - # These are normally union/diff set-like ops - tdi = TimedeltaIndex(['1 days', pd.NaT, '2 days'], name='foo') - dti = date_range('20130101', periods=3, name='bar') - - # TODO(wesm): unused? 
- # td = Timedelta('1 days') - # dt = Timestamp('20130101') - - result = tdi - tdi - expected = TimedeltaIndex(['0 days', pd.NaT, '0 days'], name='foo') - tm.assert_index_equal(result, expected) - - result = tdi + tdi - expected = TimedeltaIndex(['2 days', pd.NaT, '4 days'], name='foo') - tm.assert_index_equal(result, expected) - - result = dti - tdi # name will be reset - expected = DatetimeIndex(['20121231', pd.NaT, '20130101']) - tm.assert_index_equal(result, expected) - - def test_sub_period(self): - # GH 13078 - # not supported, check TypeError - p = pd.Period('2011-01-01', freq='D') - - for freq in [None, 'H']: - idx = pd.TimedeltaIndex(['1 hours', '2 hours'], freq=freq) - - with tm.assertRaises(TypeError): - idx - p - - with tm.assertRaises(TypeError): - p - idx - - def test_addition_ops(self): - - # with datetimes/timedelta and tdi/dti - tdi = TimedeltaIndex(['1 days', pd.NaT, '2 days'], name='foo') - dti = date_range('20130101', periods=3, name='bar') - td = Timedelta('1 days') - dt = Timestamp('20130101') - - result = tdi + dt - expected = DatetimeIndex(['20130102', pd.NaT, '20130103'], name='foo') - tm.assert_index_equal(result, expected) - - result = dt + tdi - expected = DatetimeIndex(['20130102', pd.NaT, '20130103'], name='foo') - tm.assert_index_equal(result, expected) - - result = td + tdi - expected = TimedeltaIndex(['2 days', pd.NaT, '3 days'], name='foo') - tm.assert_index_equal(result, expected) - - result = tdi + td - expected = TimedeltaIndex(['2 days', pd.NaT, '3 days'], name='foo') - tm.assert_index_equal(result, expected) - - # unequal length - self.assertRaises(ValueError, lambda: tdi + dti[0:1]) - self.assertRaises(ValueError, lambda: tdi[0:1] + dti) - - # random indexes - self.assertRaises(TypeError, lambda: tdi + Int64Index([1, 2, 3])) - - # this is a union! 
- # self.assertRaises(TypeError, lambda : Int64Index([1,2,3]) + tdi) - - result = tdi + dti # name will be reset - expected = DatetimeIndex(['20130102', pd.NaT, '20130105']) - tm.assert_index_equal(result, expected) - - result = dti + tdi # name will be reset - expected = DatetimeIndex(['20130102', pd.NaT, '20130105']) - tm.assert_index_equal(result, expected) - - result = dt + td - expected = Timestamp('20130102') - self.assertEqual(result, expected) - - result = td + dt - expected = Timestamp('20130102') - self.assertEqual(result, expected) - - def test_comp_nat(self): - left = pd.TimedeltaIndex([pd.Timedelta('1 days'), pd.NaT, - pd.Timedelta('3 days')]) - right = pd.TimedeltaIndex([pd.NaT, pd.NaT, pd.Timedelta('3 days')]) - - for l, r in [(left, right), (left.asobject, right.asobject)]: - result = l == r - expected = np.array([False, False, True]) - tm.assert_numpy_array_equal(result, expected) - - result = l != r - expected = np.array([True, True, False]) - tm.assert_numpy_array_equal(result, expected) - - expected = np.array([False, False, False]) - tm.assert_numpy_array_equal(l == pd.NaT, expected) - tm.assert_numpy_array_equal(pd.NaT == r, expected) - - expected = np.array([True, True, True]) - tm.assert_numpy_array_equal(l != pd.NaT, expected) - tm.assert_numpy_array_equal(pd.NaT != l, expected) - - expected = np.array([False, False, False]) - tm.assert_numpy_array_equal(l < pd.NaT, expected) - tm.assert_numpy_array_equal(pd.NaT > l, expected) - - def test_value_counts_unique(self): - # GH 7735 - - idx = timedelta_range('1 days 09:00:00', freq='H', periods=10) - # create repeated values, 'n'th element is repeated by n+1 times - idx = TimedeltaIndex(np.repeat(idx.values, range(1, len(idx) + 1))) - - exp_idx = timedelta_range('1 days 18:00:00', freq='-1H', periods=10) - expected = Series(range(10, 0, -1), index=exp_idx, dtype='int64') - - for obj in [idx, Series(idx)]: - tm.assert_series_equal(obj.value_counts(), expected) - - expected = timedelta_range('1 days 09:00:00', freq='H', periods=10) - tm.assert_index_equal(idx.unique(), expected) - - idx = TimedeltaIndex(['1 days 09:00:00', '1 days 09:00:00', - '1 days 09:00:00', '1 days 08:00:00', - '1 days 08:00:00', pd.NaT]) - - exp_idx = TimedeltaIndex(['1 days 09:00:00', '1 days 08:00:00']) - expected = Series([3, 2], index=exp_idx) - - for obj in [idx, Series(idx)]: - tm.assert_series_equal(obj.value_counts(), expected) - - exp_idx = TimedeltaIndex(['1 days 09:00:00', '1 days 08:00:00', - pd.NaT]) - expected = Series([3, 2, 1], index=exp_idx) - - for obj in [idx, Series(idx)]: - tm.assert_series_equal(obj.value_counts(dropna=False), expected) - - tm.assert_index_equal(idx.unique(), exp_idx) - - def test_nonunique_contains(self): - # GH 9512 - for idx in map(TimedeltaIndex, ([0, 1, 0], [0, 0, -1], [0, -1, -1], - ['00:01:00', '00:01:00', '00:02:00'], - ['00:01:00', '00:01:00', '00:00:01'])): - tm.assertIn(idx[0], idx) - - def test_unknown_attribute(self): - # GH 9680 - tdi = pd.timedelta_range(start=0, periods=10, freq='1s') - ts = pd.Series(np.random.normal(size=10), index=tdi) - self.assertNotIn('foo', ts.__dict__.keys()) - self.assertRaises(AttributeError, lambda: ts.foo) - - def test_order(self): - # GH 10295 - idx1 = TimedeltaIndex(['1 day', '2 day', '3 day'], freq='D', - name='idx') - idx2 = TimedeltaIndex( - ['1 hour', '2 hour', '3 hour'], freq='H', name='idx') - - for idx in [idx1, idx2]: - ordered = idx.sort_values() - self.assert_index_equal(ordered, idx) - self.assertEqual(ordered.freq, idx.freq) - - ordered = 
idx.sort_values(ascending=False)
-            expected = idx[::-1]
-            self.assert_index_equal(ordered, expected)
-            self.assertEqual(ordered.freq, expected.freq)
-            self.assertEqual(ordered.freq.n, -1)
-
-            ordered, indexer = idx.sort_values(return_indexer=True)
-            self.assert_index_equal(ordered, idx)
-            self.assert_numpy_array_equal(indexer,
-                                          np.array([0, 1, 2]),
-                                          check_dtype=False)
-            self.assertEqual(ordered.freq, idx.freq)
-
-            ordered, indexer = idx.sort_values(return_indexer=True,
-                                               ascending=False)
-            self.assert_index_equal(ordered, idx[::-1])
-            self.assertEqual(ordered.freq, expected.freq)
-            self.assertEqual(ordered.freq.n, -1)
-
-        idx1 = TimedeltaIndex(['1 hour', '3 hour', '5 hour',
-                               '2 hour ', '1 hour'], name='idx1')
-        exp1 = TimedeltaIndex(['1 hour', '1 hour', '2 hour',
-                               '3 hour', '5 hour'], name='idx1')
-
-        idx2 = TimedeltaIndex(['1 day', '3 day', '5 day',
-                               '2 day', '1 day'], name='idx2')
-        exp2 = TimedeltaIndex(['1 day', '1 day', '2 day',
-                               '3 day', '5 day'], name='idx2')
-
-        # TODO(wesm): the NaT cases below are not exercised yet
-        # idx3 = TimedeltaIndex([pd.NaT, '3 minute', '5 minute',
-        #                        '2 minute', pd.NaT], name='idx3')
-        # exp3 = TimedeltaIndex([pd.NaT, pd.NaT, '2 minute', '3 minute',
-        #                        '5 minute'], name='idx3')
-
-        for idx, expected in [(idx1, exp1), (idx2, exp2)]:
-            ordered = idx.sort_values()
-            self.assert_index_equal(ordered, expected)
-            self.assertIsNone(ordered.freq)
-
-            ordered = idx.sort_values(ascending=False)
-            self.assert_index_equal(ordered, expected[::-1])
-            self.assertIsNone(ordered.freq)
-
-            ordered, indexer = idx.sort_values(return_indexer=True)
-            self.assert_index_equal(ordered, expected)
-
-            exp = np.array([0, 4, 3, 1, 2])
-            self.assert_numpy_array_equal(indexer, exp, check_dtype=False)
-            self.assertIsNone(ordered.freq)
-
-            ordered, indexer = idx.sort_values(return_indexer=True,
-                                               ascending=False)
-            self.assert_index_equal(ordered, expected[::-1])
-
-            exp = np.array([2, 1, 3, 4, 0])
-            self.assert_numpy_array_equal(indexer, exp, check_dtype=False)
-            self.assertIsNone(ordered.freq)
-
-    def test_getitem(self):
-        idx1 = pd.timedelta_range('1 day', '31 day', freq='D', name='idx')
-
-        for idx in [idx1]:
-            result = idx[0]
-            self.assertEqual(result, pd.Timedelta('1 day'))
-
-            result = idx[0:5]
-            expected = pd.timedelta_range('1 day', '5 day', freq='D',
-                                          name='idx')
-            self.assert_index_equal(result, expected)
-            self.assertEqual(result.freq, expected.freq)
-
-            result = idx[0:10:2]
-            expected = pd.timedelta_range('1 day', '9 day', freq='2D',
-                                          name='idx')
-            self.assert_index_equal(result, expected)
-            self.assertEqual(result.freq, expected.freq)
-
-            result = idx[-20:-5:3]
-            expected = pd.timedelta_range('12 day', '24 day', freq='3D',
-                                          name='idx')
-            self.assert_index_equal(result, expected)
-            self.assertEqual(result.freq, expected.freq)
-
-            result = idx[4::-1]
-            expected = TimedeltaIndex(['5 day', '4 day', '3 day',
-                                       '2 day', '1 day'],
-                                      freq='-1D', name='idx')
-            self.assert_index_equal(result, expected)
-            self.assertEqual(result.freq, expected.freq)
-
-    def test_drop_duplicates_metadata(self):
-        # GH 10115
-        idx = pd.timedelta_range('1 day', '31 day', freq='D', name='idx')
-        result = idx.drop_duplicates()
-        self.assert_index_equal(idx, result)
-        self.assertEqual(idx.freq, result.freq)
-
-        idx_dup = idx.append(idx)
-        self.assertIsNone(idx_dup.freq)  # freq is reset
-        result = idx_dup.drop_duplicates()
-        self.assert_index_equal(idx, result)
-        self.assertIsNone(result.freq)
-
-    def test_drop_duplicates(self):
-        # to check Index/Series compat
-        base = pd.timedelta_range('1 day', '31 day',
freq='D', name='idx') - idx = base.append(base[:5]) - - res = idx.drop_duplicates() - tm.assert_index_equal(res, base) - res = Series(idx).drop_duplicates() - tm.assert_series_equal(res, Series(base)) - - res = idx.drop_duplicates(keep='last') - exp = base[5:].append(base[:5]) - tm.assert_index_equal(res, exp) - res = Series(idx).drop_duplicates(keep='last') - tm.assert_series_equal(res, Series(exp, index=np.arange(5, 36))) - - res = idx.drop_duplicates(keep=False) - tm.assert_index_equal(res, base[5:]) - res = Series(idx).drop_duplicates(keep=False) - tm.assert_series_equal(res, Series(base[5:], index=np.arange(5, 31))) - - def test_take(self): - # GH 10295 - idx1 = pd.timedelta_range('1 day', '31 day', freq='D', name='idx') - - for idx in [idx1]: - result = idx.take([0]) - self.assertEqual(result, pd.Timedelta('1 day')) - - result = idx.take([-1]) - self.assertEqual(result, pd.Timedelta('31 day')) - - result = idx.take([0, 1, 2]) - expected = pd.timedelta_range('1 day', '3 day', freq='D', - name='idx') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, expected.freq) - - result = idx.take([0, 2, 4]) - expected = pd.timedelta_range('1 day', '5 day', freq='2D', - name='idx') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, expected.freq) - - result = idx.take([7, 4, 1]) - expected = pd.timedelta_range('8 day', '2 day', freq='-3D', - name='idx') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, expected.freq) - - result = idx.take([3, 2, 5]) - expected = TimedeltaIndex(['4 day', '3 day', '6 day'], name='idx') - self.assert_index_equal(result, expected) - self.assertIsNone(result.freq) - - result = idx.take([-3, 2, 5]) - expected = TimedeltaIndex(['29 day', '3 day', '6 day'], name='idx') - self.assert_index_equal(result, expected) - self.assertIsNone(result.freq) - - def test_take_invalid_kwargs(self): - idx = pd.timedelta_range('1 day', '31 day', freq='D', name='idx') - indices = [1, 6, 5, 9, 10, 13, 15, 3] - - msg = r"take\(\) got an unexpected keyword argument 'foo'" - tm.assertRaisesRegexp(TypeError, msg, idx.take, - indices, foo=2) - - msg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, idx.take, - indices, out=indices) - - msg = "the 'mode' parameter is not supported" - tm.assertRaisesRegexp(ValueError, msg, idx.take, - indices, mode='clip') - - def test_infer_freq(self): - # GH 11018 - for freq in ['D', '3D', '-3D', 'H', '2H', '-2H', 'T', '2T', 'S', '-3S' - ]: - idx = pd.timedelta_range('1', freq=freq, periods=10) - result = pd.TimedeltaIndex(idx.asi8, freq='infer') - tm.assert_index_equal(idx, result) - self.assertEqual(result.freq, freq) - - def test_nat_new(self): - - idx = pd.timedelta_range('1', freq='D', periods=5, name='x') - result = idx._nat_new() - exp = pd.TimedeltaIndex([pd.NaT] * 5, name='x') - tm.assert_index_equal(result, exp) - - result = idx._nat_new(box=False) - exp = np.array([tslib.iNaT] * 5, dtype=np.int64) - tm.assert_numpy_array_equal(result, exp) - - def test_shift(self): - # GH 9903 - idx = pd.TimedeltaIndex([], name='xxx') - tm.assert_index_equal(idx.shift(0, freq='H'), idx) - tm.assert_index_equal(idx.shift(3, freq='H'), idx) - - idx = pd.TimedeltaIndex(['5 hours', '6 hours', '9 hours'], name='xxx') - tm.assert_index_equal(idx.shift(0, freq='H'), idx) - exp = pd.TimedeltaIndex(['8 hours', '9 hours', '12 hours'], name='xxx') - tm.assert_index_equal(idx.shift(3, freq='H'), exp) - exp = pd.TimedeltaIndex(['2 hours', '3 hours', '6 hours'], name='xxx') 
- tm.assert_index_equal(idx.shift(-3, freq='H'), exp) - - tm.assert_index_equal(idx.shift(0, freq='T'), idx) - exp = pd.TimedeltaIndex(['05:03:00', '06:03:00', '9:03:00'], - name='xxx') - tm.assert_index_equal(idx.shift(3, freq='T'), exp) - exp = pd.TimedeltaIndex(['04:57:00', '05:57:00', '8:57:00'], - name='xxx') - tm.assert_index_equal(idx.shift(-3, freq='T'), exp) - - def test_repeat(self): - index = pd.timedelta_range('1 days', periods=2, freq='D') - exp = pd.TimedeltaIndex(['1 days', '1 days', '2 days', '2 days']) - for res in [index.repeat(2), np.repeat(index, 2)]: - tm.assert_index_equal(res, exp) - self.assertIsNone(res.freq) - - index = TimedeltaIndex(['1 days', 'NaT', '3 days']) - exp = TimedeltaIndex(['1 days', '1 days', '1 days', - 'NaT', 'NaT', 'NaT', - '3 days', '3 days', '3 days']) - for res in [index.repeat(3), np.repeat(index, 3)]: - tm.assert_index_equal(res, exp) - self.assertIsNone(res.freq) - - def test_nat(self): - self.assertIs(pd.TimedeltaIndex._na_value, pd.NaT) - self.assertIs(pd.TimedeltaIndex([])._na_value, pd.NaT) - - idx = pd.TimedeltaIndex(['1 days', '2 days']) - self.assertTrue(idx._can_hold_na) - - tm.assert_numpy_array_equal(idx._isnan, np.array([False, False])) - self.assertFalse(idx.hasnans) - tm.assert_numpy_array_equal(idx._nan_idxs, - np.array([], dtype=np.intp)) - - idx = pd.TimedeltaIndex(['1 days', 'NaT']) - self.assertTrue(idx._can_hold_na) - - tm.assert_numpy_array_equal(idx._isnan, np.array([False, True])) - self.assertTrue(idx.hasnans) - tm.assert_numpy_array_equal(idx._nan_idxs, - np.array([1], dtype=np.intp)) - - def test_equals(self): - # GH 13107 - idx = pd.TimedeltaIndex(['1 days', '2 days', 'NaT']) - self.assertTrue(idx.equals(idx)) - self.assertTrue(idx.equals(idx.copy())) - self.assertTrue(idx.equals(idx.asobject)) - self.assertTrue(idx.asobject.equals(idx)) - self.assertTrue(idx.asobject.equals(idx.asobject)) - self.assertFalse(idx.equals(list(idx))) - self.assertFalse(idx.equals(pd.Series(idx))) - - idx2 = pd.TimedeltaIndex(['2 days', '1 days', 'NaT']) - self.assertFalse(idx.equals(idx2)) - self.assertFalse(idx.equals(idx2.copy())) - self.assertFalse(idx.equals(idx2.asobject)) - self.assertFalse(idx.asobject.equals(idx2)) - self.assertFalse(idx.asobject.equals(idx2.asobject)) - self.assertFalse(idx.equals(list(idx2))) - self.assertFalse(idx.equals(pd.Series(idx2))) - - -class TestPeriodIndexOps(Ops): - def setUp(self): - super(TestPeriodIndexOps, self).setUp() - mask = lambda x: (isinstance(x, DatetimeIndex) or - isinstance(x, PeriodIndex)) - self.is_valid_objs = [o for o in self.objs if mask(o)] - self.not_valid_objs = [o for o in self.objs if not mask(o)] - - def test_ops_properties(self): - self.check_ops_properties( - ['year', 'month', 'day', 'hour', 'minute', 'second', 'weekofyear', - 'week', 'dayofweek', 'dayofyear', 'quarter']) - self.check_ops_properties(['qyear'], - lambda x: isinstance(x, PeriodIndex)) - - def test_asobject_tolist(self): - idx = pd.period_range(start='2013-01-01', periods=4, freq='M', - name='idx') - expected_list = [pd.Period('2013-01-31', freq='M'), - pd.Period('2013-02-28', freq='M'), - pd.Period('2013-03-31', freq='M'), - pd.Period('2013-04-30', freq='M')] - expected = pd.Index(expected_list, dtype=object, name='idx') - result = idx.asobject - self.assertTrue(isinstance(result, Index)) - self.assertEqual(result.dtype, object) - self.assert_index_equal(result, expected) - self.assertEqual(result.name, expected.name) - self.assertEqual(idx.tolist(), expected_list) - - idx = PeriodIndex(['2013-01-01', 
'2013-01-02', 'NaT', - '2013-01-04'], freq='D', name='idx') - expected_list = [pd.Period('2013-01-01', freq='D'), - pd.Period('2013-01-02', freq='D'), - pd.Period('NaT', freq='D'), - pd.Period('2013-01-04', freq='D')] - expected = pd.Index(expected_list, dtype=object, name='idx') - result = idx.asobject - self.assertTrue(isinstance(result, Index)) - self.assertEqual(result.dtype, object) - tm.assert_index_equal(result, expected) - for i in [0, 1, 3]: - self.assertEqual(result[i], expected[i]) - self.assertIs(result[2], pd.NaT) - self.assertEqual(result.name, expected.name) - - result_list = idx.tolist() - for i in [0, 1, 3]: - self.assertEqual(result_list[i], expected_list[i]) - self.assertIs(result_list[2], pd.NaT) - - def test_minmax(self): - - # monotonic - idx1 = pd.PeriodIndex([pd.NaT, '2011-01-01', '2011-01-02', - '2011-01-03'], freq='D') - self.assertTrue(idx1.is_monotonic) - - # non-monotonic - idx2 = pd.PeriodIndex(['2011-01-01', pd.NaT, '2011-01-03', - '2011-01-02', pd.NaT], freq='D') - self.assertFalse(idx2.is_monotonic) - - for idx in [idx1, idx2]: - self.assertEqual(idx.min(), pd.Period('2011-01-01', freq='D')) - self.assertEqual(idx.max(), pd.Period('2011-01-03', freq='D')) - self.assertEqual(idx1.argmin(), 1) - self.assertEqual(idx2.argmin(), 0) - self.assertEqual(idx1.argmax(), 3) - self.assertEqual(idx2.argmax(), 2) - - for op in ['min', 'max']: - # Return NaT - obj = PeriodIndex([], freq='M') - result = getattr(obj, op)() - self.assertIs(result, tslib.NaT) - - obj = PeriodIndex([pd.NaT], freq='M') - result = getattr(obj, op)() - self.assertIs(result, tslib.NaT) - - obj = PeriodIndex([pd.NaT, pd.NaT, pd.NaT], freq='M') - result = getattr(obj, op)() - self.assertIs(result, tslib.NaT) - - def test_numpy_minmax(self): - pr = pd.period_range(start='2016-01-15', end='2016-01-20') - - self.assertEqual(np.min(pr), Period('2016-01-15', freq='D')) - self.assertEqual(np.max(pr), Period('2016-01-20', freq='D')) - - errmsg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, errmsg, np.min, pr, out=0) - tm.assertRaisesRegexp(ValueError, errmsg, np.max, pr, out=0) - - self.assertEqual(np.argmin(pr), 0) - self.assertEqual(np.argmax(pr), 5) - - if not _np_version_under1p10: - errmsg = "the 'out' parameter is not supported" - tm.assertRaisesRegexp(ValueError, errmsg, np.argmin, pr, out=0) - tm.assertRaisesRegexp(ValueError, errmsg, np.argmax, pr, out=0) - - def test_representation(self): - # GH 7601 - idx1 = PeriodIndex([], freq='D') - idx2 = PeriodIndex(['2011-01-01'], freq='D') - idx3 = PeriodIndex(['2011-01-01', '2011-01-02'], freq='D') - idx4 = PeriodIndex(['2011-01-01', '2011-01-02', '2011-01-03'], - freq='D') - idx5 = PeriodIndex(['2011', '2012', '2013'], freq='A') - idx6 = PeriodIndex(['2011-01-01 09:00', '2012-02-01 10:00', - 'NaT'], freq='H') - idx7 = pd.period_range('2013Q1', periods=1, freq="Q") - idx8 = pd.period_range('2013Q1', periods=2, freq="Q") - idx9 = pd.period_range('2013Q1', periods=3, freq="Q") - idx10 = PeriodIndex(['2011-01-01', '2011-02-01'], freq='3D') - - exp1 = """PeriodIndex([], dtype='period[D]', freq='D')""" - - exp2 = """PeriodIndex(['2011-01-01'], dtype='period[D]', freq='D')""" - - exp3 = ("PeriodIndex(['2011-01-01', '2011-01-02'], dtype='period[D]', " - "freq='D')") - - exp4 = ("PeriodIndex(['2011-01-01', '2011-01-02', '2011-01-03'], " - "dtype='period[D]', freq='D')") - - exp5 = ("PeriodIndex(['2011', '2012', '2013'], dtype='period[A-DEC]', " - "freq='A-DEC')") - - exp6 = ("PeriodIndex(['2011-01-01 09:00', '2012-02-01 10:00', 
'NaT'], " - "dtype='period[H]', freq='H')") - - exp7 = ("PeriodIndex(['2013Q1'], dtype='period[Q-DEC]', " - "freq='Q-DEC')") - - exp8 = ("PeriodIndex(['2013Q1', '2013Q2'], dtype='period[Q-DEC]', " - "freq='Q-DEC')") - - exp9 = ("PeriodIndex(['2013Q1', '2013Q2', '2013Q3'], " - "dtype='period[Q-DEC]', freq='Q-DEC')") - - exp10 = ("PeriodIndex(['2011-01-01', '2011-02-01'], " - "dtype='period[3D]', freq='3D')") - - for idx, expected in zip([idx1, idx2, idx3, idx4, idx5, - idx6, idx7, idx8, idx9, idx10], - [exp1, exp2, exp3, exp4, exp5, - exp6, exp7, exp8, exp9, exp10]): - for func in ['__repr__', '__unicode__', '__str__']: - result = getattr(idx, func)() - self.assertEqual(result, expected) - - def test_representation_to_series(self): - # GH 10971 - idx1 = PeriodIndex([], freq='D') - idx2 = PeriodIndex(['2011-01-01'], freq='D') - idx3 = PeriodIndex(['2011-01-01', '2011-01-02'], freq='D') - idx4 = PeriodIndex(['2011-01-01', '2011-01-02', - '2011-01-03'], freq='D') - idx5 = PeriodIndex(['2011', '2012', '2013'], freq='A') - idx6 = PeriodIndex(['2011-01-01 09:00', '2012-02-01 10:00', - 'NaT'], freq='H') - - idx7 = pd.period_range('2013Q1', periods=1, freq="Q") - idx8 = pd.period_range('2013Q1', periods=2, freq="Q") - idx9 = pd.period_range('2013Q1', periods=3, freq="Q") - - exp1 = """Series([], dtype: object)""" - - exp2 = """0 2011-01-01 -dtype: object""" - - exp3 = """0 2011-01-01 -1 2011-01-02 -dtype: object""" - - exp4 = """0 2011-01-01 -1 2011-01-02 -2 2011-01-03 -dtype: object""" - - exp5 = """0 2011 -1 2012 -2 2013 -dtype: object""" - - exp6 = """0 2011-01-01 09:00 -1 2012-02-01 10:00 -2 NaT -dtype: object""" - - exp7 = """0 2013Q1 -dtype: object""" - - exp8 = """0 2013Q1 -1 2013Q2 -dtype: object""" - - exp9 = """0 2013Q1 -1 2013Q2 -2 2013Q3 -dtype: object""" - - for idx, expected in zip([idx1, idx2, idx3, idx4, idx5, - idx6, idx7, idx8, idx9], - [exp1, exp2, exp3, exp4, exp5, - exp6, exp7, exp8, exp9]): - result = repr(pd.Series(idx)) - self.assertEqual(result, expected) - - def test_summary(self): - # GH9116 - idx1 = PeriodIndex([], freq='D') - idx2 = PeriodIndex(['2011-01-01'], freq='D') - idx3 = PeriodIndex(['2011-01-01', '2011-01-02'], freq='D') - idx4 = PeriodIndex( - ['2011-01-01', '2011-01-02', '2011-01-03'], freq='D') - idx5 = PeriodIndex(['2011', '2012', '2013'], freq='A') - idx6 = PeriodIndex( - ['2011-01-01 09:00', '2012-02-01 10:00', 'NaT'], freq='H') - - idx7 = pd.period_range('2013Q1', periods=1, freq="Q") - idx8 = pd.period_range('2013Q1', periods=2, freq="Q") - idx9 = pd.period_range('2013Q1', periods=3, freq="Q") - - exp1 = """PeriodIndex: 0 entries -Freq: D""" - - exp2 = """PeriodIndex: 1 entries, 2011-01-01 to 2011-01-01 -Freq: D""" - - exp3 = """PeriodIndex: 2 entries, 2011-01-01 to 2011-01-02 -Freq: D""" - - exp4 = """PeriodIndex: 3 entries, 2011-01-01 to 2011-01-03 -Freq: D""" - - exp5 = """PeriodIndex: 3 entries, 2011 to 2013 -Freq: A-DEC""" - - exp6 = """PeriodIndex: 3 entries, 2011-01-01 09:00 to NaT -Freq: H""" - - exp7 = """PeriodIndex: 1 entries, 2013Q1 to 2013Q1 -Freq: Q-DEC""" - - exp8 = """PeriodIndex: 2 entries, 2013Q1 to 2013Q2 -Freq: Q-DEC""" - - exp9 = """PeriodIndex: 3 entries, 2013Q1 to 2013Q3 -Freq: Q-DEC""" - - for idx, expected in zip([idx1, idx2, idx3, idx4, idx5, - idx6, idx7, idx8, idx9], - [exp1, exp2, exp3, exp4, exp5, - exp6, exp7, exp8, exp9]): - result = idx.summary() - self.assertEqual(result, expected) - - def test_resolution(self): - for freq, expected in zip(['A', 'Q', 'M', 'D', 'H', - 'T', 'S', 'L', 'U'], - ['day', 'day', 'day', 'day', - 
'hour', 'minute', 'second',
-                                   'millisecond', 'microsecond']):
-
-            idx = pd.period_range(start='2013-04-01', periods=30, freq=freq)
-            self.assertEqual(idx.resolution, expected)
-
-    def test_union(self):
-        # union
-        rng1 = pd.period_range('1/1/2000', freq='D', periods=5)
-        other1 = pd.period_range('1/6/2000', freq='D', periods=5)
-        expected1 = pd.period_range('1/1/2000', freq='D', periods=10)
-
-        rng2 = pd.period_range('1/1/2000', freq='D', periods=5)
-        other2 = pd.period_range('1/4/2000', freq='D', periods=5)
-        expected2 = pd.period_range('1/1/2000', freq='D', periods=8)
-
-        rng3 = pd.period_range('1/1/2000', freq='D', periods=5)
-        other3 = pd.PeriodIndex([], freq='D')
-        expected3 = pd.period_range('1/1/2000', freq='D', periods=5)
-
-        rng4 = pd.period_range('2000-01-01 09:00', freq='H', periods=5)
-        other4 = pd.period_range('2000-01-02 09:00', freq='H', periods=5)
-        expected4 = pd.PeriodIndex(['2000-01-01 09:00', '2000-01-01 10:00',
-                                    '2000-01-01 11:00', '2000-01-01 12:00',
-                                    '2000-01-01 13:00', '2000-01-02 09:00',
-                                    '2000-01-02 10:00', '2000-01-02 11:00',
-                                    '2000-01-02 12:00', '2000-01-02 13:00'],
-                                   freq='H')
-
-        rng5 = pd.PeriodIndex(['2000-01-01 09:01', '2000-01-01 09:03',
-                               '2000-01-01 09:05'], freq='T')
-        other5 = pd.PeriodIndex(['2000-01-01 09:01', '2000-01-01 09:05',
-                                 '2000-01-01 09:08'], freq='T')
-        expected5 = pd.PeriodIndex(['2000-01-01 09:01', '2000-01-01 09:03',
-                                    '2000-01-01 09:05', '2000-01-01 09:08'],
-                                   freq='T')
-
-        rng6 = pd.period_range('2000-01-01', freq='M', periods=7)
-        other6 = pd.period_range('2000-04-01', freq='M', periods=7)
-        expected6 = pd.period_range('2000-01-01', freq='M', periods=10)
-
-        rng7 = pd.period_range('2003-01-01', freq='A', periods=5)
-        other7 = pd.period_range('1998-01-01', freq='A', periods=8)
-        expected7 = pd.period_range('1998-01-01', freq='A', periods=10)
-
-        for rng, other, expected in [(rng1, other1, expected1),
-                                     (rng2, other2, expected2),
-                                     (rng3, other3, expected3),
-                                     (rng4, other4, expected4),
-                                     (rng5, other5, expected5),
-                                     (rng6, other6, expected6),
-                                     (rng7, other7, expected7)]:
-
-            result_union = rng.union(other)
-            tm.assert_index_equal(result_union, expected)
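For reference, the union semantics these fixtures encode can be reproduced with a few lines of plain pandas; this is only a sketch of the behaviour being asserted (overlapping daily ranges merge into one contiguous range, and the freq is preserved):

```python
import pandas as pd

rng = pd.period_range('2000-01-01', freq='D', periods=5)
other = pd.period_range('2000-01-04', freq='D', periods=5)

# overlapping daily PeriodIndexes union into a single contiguous range
result = rng.union(other)
print(result.equals(pd.period_range('2000-01-01', freq='D', periods=8)))
# True
```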
-    def test_add_iadd(self):
-        rng = pd.period_range('1/1/2000', freq='D', periods=5)
-        other = pd.period_range('1/6/2000', freq='D', periods=5)
-
-        # previously performed setop union, now raises TypeError (GH14164)
-        with tm.assertRaises(TypeError):
-            rng + other
-
-        with tm.assertRaises(TypeError):
-            rng += other
-
-        # offset
-        # DateOffset
-        rng = pd.period_range('2014', '2024', freq='A')
-        result = rng + pd.offsets.YearEnd(5)
-        expected = pd.period_range('2019', '2029', freq='A')
-        tm.assert_index_equal(result, expected)
-        rng += pd.offsets.YearEnd(5)
-        tm.assert_index_equal(rng, expected)
-
-        for o in [pd.offsets.YearBegin(2), pd.offsets.MonthBegin(1),
-                  pd.offsets.Minute(), np.timedelta64(365, 'D'),
-                  timedelta(365), Timedelta(days=365)]:
-            msg = ('Input has different freq(=.+)? '
-                   'from PeriodIndex\\(freq=A-DEC\\)')
-            with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg):
-                rng + o
-
-        rng = pd.period_range('2014-01', '2016-12', freq='M')
-        result = rng + pd.offsets.MonthEnd(5)
-        expected = pd.period_range('2014-06', '2017-05', freq='M')
-        tm.assert_index_equal(result, expected)
-        rng += pd.offsets.MonthEnd(5)
-        tm.assert_index_equal(rng, expected)
-
-        for o in [pd.offsets.YearBegin(2), pd.offsets.MonthBegin(1),
-                  pd.offsets.Minute(), np.timedelta64(365, 'D'),
-                  timedelta(365), Timedelta(days=365)]:
-            rng = pd.period_range('2014-01', '2016-12', freq='M')
-            msg = 'Input has different freq(=.+)? from PeriodIndex\\(freq=M\\)'
-            with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg):
-                rng + o
-
-        # Tick
-        offsets = [pd.offsets.Day(3), timedelta(days=3),
-                   np.timedelta64(3, 'D'), pd.offsets.Hour(72),
-                   timedelta(minutes=60 * 24 * 3), np.timedelta64(72, 'h'),
-                   Timedelta('72:00:00')]
-        for delta in offsets:
-            rng = pd.period_range('2014-05-01', '2014-05-15', freq='D')
-            result = rng + delta
-            expected = pd.period_range('2014-05-04', '2014-05-18', freq='D')
-            tm.assert_index_equal(result, expected)
-            rng += delta
-            tm.assert_index_equal(rng, expected)
-
-        for o in [pd.offsets.YearBegin(2), pd.offsets.MonthBegin(1),
-                  pd.offsets.Minute(), np.timedelta64(4, 'h'),
-                  timedelta(hours=23), Timedelta('23:00:00')]:
-            rng = pd.period_range('2014-05-01', '2014-05-15', freq='D')
-            msg = 'Input has different freq(=.+)? from PeriodIndex\\(freq=D\\)'
-            with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg):
-                rng + o
-
-        offsets = [pd.offsets.Hour(2), timedelta(hours=2),
-                   np.timedelta64(2, 'h'), pd.offsets.Minute(120),
-                   timedelta(minutes=120), np.timedelta64(120, 'm'),
-                   Timedelta(minutes=120)]
-        for delta in offsets:
-            rng = pd.period_range('2014-01-01 10:00', '2014-01-05 10:00',
-                                  freq='H')
-            result = rng + delta
-            expected = pd.period_range('2014-01-01 12:00', '2014-01-05 12:00',
-                                       freq='H')
-            tm.assert_index_equal(result, expected)
-            rng += delta
-            tm.assert_index_equal(rng, expected)
-
-        for delta in [pd.offsets.YearBegin(2), timedelta(minutes=30),
-                      np.timedelta64(30, 's'), Timedelta(seconds=30)]:
-            rng = pd.period_range('2014-01-01 10:00', '2014-01-05 10:00',
-                                  freq='H')
-            msg = 'Input has different freq(=.+)? from PeriodIndex\\(freq=H\\)'
-            with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg):
-                result = rng + delta
-            with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg):
-                rng += delta
-
-        # int
-        rng = pd.period_range('2000-01-01 09:00', freq='H', periods=10)
-        result = rng + 1
-        expected = pd.period_range('2000-01-01 10:00', freq='H', periods=10)
-        tm.assert_index_equal(result, expected)
-        rng += 1
-        tm.assert_index_equal(rng, expected)
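The add/iadd cases above boil down to two rules: a frequency-compatible offset shifts every `Period`, and an incompatible one raises `IncompatibleFrequency`. A small sketch using the era's 'A' (annual) alias; since the exception's import path has moved over pandas versions, catching it via its `ValueError` base keeps the snippet version-agnostic:

```python
import pandas as pd

rng = pd.period_range('2014', '2024', freq='A')
print(rng + pd.offsets.YearEnd(5))   # every annual period shifted by 5 years

try:
    rng + pd.offsets.MonthBegin(1)   # monthly offset vs annual frequency
except ValueError as err:            # IncompatibleFrequency subclasses ValueError
    print(type(err).__name__)
```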
-    def test_difference(self):
-        # diff
-        rng1 = pd.period_range('1/1/2000', freq='D', periods=5)
-        other1 = pd.period_range('1/6/2000', freq='D', periods=5)
-        expected1 = pd.period_range('1/1/2000', freq='D', periods=5)
-
-        rng2 = pd.period_range('1/1/2000', freq='D', periods=5)
-        other2 = pd.period_range('1/4/2000', freq='D', periods=5)
-        expected2 = pd.period_range('1/1/2000', freq='D', periods=3)
-
-        rng3 = pd.period_range('1/1/2000', freq='D', periods=5)
-        other3 = pd.PeriodIndex([], freq='D')
-        expected3 = pd.period_range('1/1/2000', freq='D', periods=5)
-
-        rng4 = pd.period_range('2000-01-01 09:00', freq='H', periods=5)
-        other4 = pd.period_range('2000-01-02 09:00', freq='H', periods=5)
-        expected4 = rng4
-
-        rng5 = pd.PeriodIndex(['2000-01-01 09:01', '2000-01-01 09:03',
-                               '2000-01-01 09:05'], freq='T')
-        other5 = pd.PeriodIndex(
-            ['2000-01-01 09:01', '2000-01-01 09:05'], freq='T')
-        expected5 = pd.PeriodIndex(['2000-01-01 09:03'], freq='T')
-
-        rng6 = pd.period_range('2000-01-01', freq='M', periods=7)
-        other6 = pd.period_range('2000-04-01', freq='M', periods=7)
-        expected6 = pd.period_range('2000-01-01', freq='M', periods=3)
-
-        rng7 = pd.period_range('2003-01-01', freq='A', periods=5)
-        other7 = pd.period_range('1998-01-01', freq='A', periods=8)
-        expected7 = pd.period_range('2006-01-01', freq='A', periods=2)
-
-        for rng, other, expected in [(rng1, other1, expected1),
-                                     (rng2, other2, expected2),
-                                     (rng3, other3, expected3),
-                                     (rng4, other4, expected4),
-                                     (rng5, other5, expected5),
-                                     (rng6, other6, expected6),
-                                     (rng7, other7, expected7)]:
-            result_difference = rng.difference(other)
-            tm.assert_index_equal(result_difference, expected)
-
-    def test_sub_isub(self):
-
-        # previously performed setop, now raises TypeError (GH14164)
-        # TODO needs to wait on #13077 for decision on result type
-        rng = pd.period_range('1/1/2000', freq='D', periods=5)
-        other = pd.period_range('1/6/2000', freq='D', periods=5)
-
-        with tm.assertRaises(TypeError):
-            rng - other
-
-        with tm.assertRaises(TypeError):
-            rng -= other
-
-        # offset
-        # DateOffset
-        rng = pd.period_range('2014', '2024', freq='A')
-        result = rng - pd.offsets.YearEnd(5)
-        expected = pd.period_range('2009', '2019', freq='A')
-        tm.assert_index_equal(result, expected)
-        rng -= pd.offsets.YearEnd(5)
-        tm.assert_index_equal(rng, expected)
-
-        for o in [pd.offsets.YearBegin(2), pd.offsets.MonthBegin(1),
-                  pd.offsets.Minute(), np.timedelta64(365, 'D'),
-                  timedelta(365)]:
-            rng = pd.period_range('2014', '2024', freq='A')
-            msg = ('Input has different freq(=.+)? 
' - 'from PeriodIndex\\(freq=A-DEC\\)') - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - rng - o - - rng = pd.period_range('2014-01', '2016-12', freq='M') - result = rng - pd.offsets.MonthEnd(5) - expected = pd.period_range('2013-08', '2016-07', freq='M') - tm.assert_index_equal(result, expected) - rng -= pd.offsets.MonthEnd(5) - tm.assert_index_equal(rng, expected) - - for o in [pd.offsets.YearBegin(2), pd.offsets.MonthBegin(1), - pd.offsets.Minute(), np.timedelta64(365, 'D'), - timedelta(365)]: - rng = pd.period_range('2014-01', '2016-12', freq='M') - msg = 'Input has different freq(=.+)? from PeriodIndex\\(freq=M\\)' - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - rng - o - - # Tick - offsets = [pd.offsets.Day(3), timedelta(days=3), - np.timedelta64(3, 'D'), pd.offsets.Hour(72), - timedelta(minutes=60 * 24 * 3), np.timedelta64(72, 'h')] - for delta in offsets: - rng = pd.period_range('2014-05-01', '2014-05-15', freq='D') - result = rng - delta - expected = pd.period_range('2014-04-28', '2014-05-12', freq='D') - tm.assert_index_equal(result, expected) - rng -= delta - tm.assert_index_equal(rng, expected) - - for o in [pd.offsets.YearBegin(2), pd.offsets.MonthBegin(1), - pd.offsets.Minute(), np.timedelta64(4, 'h'), - timedelta(hours=23)]: - rng = pd.period_range('2014-05-01', '2014-05-15', freq='D') - msg = 'Input has different freq(=.+)? from PeriodIndex\\(freq=D\\)' - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - rng - o - - offsets = [pd.offsets.Hour(2), timedelta(hours=2), - np.timedelta64(2, 'h'), pd.offsets.Minute(120), - timedelta(minutes=120), np.timedelta64(120, 'm')] - for delta in offsets: - rng = pd.period_range('2014-01-01 10:00', '2014-01-05 10:00', - freq='H') - result = rng - delta - expected = pd.period_range('2014-01-01 08:00', '2014-01-05 08:00', - freq='H') - tm.assert_index_equal(result, expected) - rng -= delta - tm.assert_index_equal(rng, expected) - - for delta in [pd.offsets.YearBegin(2), timedelta(minutes=30), - np.timedelta64(30, 's')]: - rng = pd.period_range('2014-01-01 10:00', '2014-01-05 10:00', - freq='H') - msg = 'Input has different freq(=.+)? 
from PeriodIndex\\(freq=H\\)'
-            with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg):
-                result = rng - delta
-            with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg):
-                rng -= delta
-
-        # int
-        rng = pd.period_range('2000-01-01 09:00', freq='H', periods=10)
-        result = rng - 1
-        expected = pd.period_range('2000-01-01 08:00', freq='H', periods=10)
-        tm.assert_index_equal(result, expected)
-        rng -= 1
-        tm.assert_index_equal(rng, expected)
-
-    def test_comp_nat(self):
-        left = pd.PeriodIndex([pd.Period('2011-01-01'), pd.NaT,
-                               pd.Period('2011-01-03')])
-        right = pd.PeriodIndex([pd.NaT, pd.NaT, pd.Period('2011-01-03')])
-
-        for l, r in [(left, right), (left.asobject, right.asobject)]:
-            result = l == r
-            expected = np.array([False, False, True])
-            tm.assert_numpy_array_equal(result, expected)
-
-            result = l != r
-            expected = np.array([True, True, False])
-            tm.assert_numpy_array_equal(result, expected)
-
-            expected = np.array([False, False, False])
-            tm.assert_numpy_array_equal(l == pd.NaT, expected)
-            tm.assert_numpy_array_equal(pd.NaT == r, expected)
-
-            expected = np.array([True, True, True])
-            tm.assert_numpy_array_equal(l != pd.NaT, expected)
-            tm.assert_numpy_array_equal(pd.NaT != l, expected)
-
-            expected = np.array([False, False, False])
-            tm.assert_numpy_array_equal(l < pd.NaT, expected)
-            tm.assert_numpy_array_equal(pd.NaT > l, expected)
-
-    def test_value_counts_unique(self):
-        # GH 7735
-        idx = pd.period_range('2011-01-01 09:00', freq='H', periods=10)
-        # create repeated values, 'n'th element is repeated by n+1 times
-        idx = PeriodIndex(np.repeat(idx.values, range(1, len(idx) + 1)),
-                          freq='H')
-
-        exp_idx = PeriodIndex(['2011-01-01 18:00', '2011-01-01 17:00',
-                               '2011-01-01 16:00', '2011-01-01 15:00',
-                               '2011-01-01 14:00', '2011-01-01 13:00',
-                               '2011-01-01 12:00', '2011-01-01 11:00',
-                               '2011-01-01 10:00',
-                               '2011-01-01 09:00'], freq='H')
-        expected = Series(range(10, 0, -1), index=exp_idx, dtype='int64')
-
-        for obj in [idx, Series(idx)]:
-            tm.assert_series_equal(obj.value_counts(), expected)
-
-        expected = pd.period_range('2011-01-01 09:00', freq='H',
-                                   periods=10)
-        tm.assert_index_equal(idx.unique(), expected)
-
-        idx = PeriodIndex(['2013-01-01 09:00', '2013-01-01 09:00',
-                           '2013-01-01 09:00', '2013-01-01 08:00',
-                           '2013-01-01 08:00', pd.NaT], freq='H')
-
-        exp_idx = PeriodIndex(['2013-01-01 09:00', '2013-01-01 08:00'],
-                              freq='H')
-        expected = Series([3, 2], index=exp_idx)
-
-        for obj in [idx, Series(idx)]:
-            tm.assert_series_equal(obj.value_counts(), expected)
-
-        exp_idx = PeriodIndex(['2013-01-01 09:00', '2013-01-01 08:00',
-                               pd.NaT], freq='H')
-        expected = Series([3, 2, 1], index=exp_idx)
-
-        for obj in [idx, Series(idx)]:
-            tm.assert_series_equal(obj.value_counts(dropna=False), expected)
-
-        tm.assert_index_equal(idx.unique(), exp_idx)
-
-    def test_drop_duplicates_metadata(self):
-        # GH 10115
-        idx = pd.period_range('2011-01-01', '2011-01-31', freq='D', name='idx')
-        result = idx.drop_duplicates()
-        self.assert_index_equal(idx, result)
-        self.assertEqual(idx.freq, result.freq)
-
-        idx_dup = idx.append(idx)  # freq will not be reset
-        result = idx_dup.drop_duplicates()
-        self.assert_index_equal(idx, result)
-        self.assertEqual(idx.freq, result.freq)
-
-    def test_drop_duplicates(self):
-        # to check Index/Series compat
-        base = pd.period_range('2011-01-01', '2011-01-31', freq='D',
-                               name='idx')
-        idx = base.append(base[:5])
-
-        res = idx.drop_duplicates()
-        tm.assert_index_equal(res, base)
-        res = Series(idx).drop_duplicates()
-        tm.assert_series_equal(res,
Series(base)) - - res = idx.drop_duplicates(keep='last') - exp = base[5:].append(base[:5]) - tm.assert_index_equal(res, exp) - res = Series(idx).drop_duplicates(keep='last') - tm.assert_series_equal(res, Series(exp, index=np.arange(5, 36))) - - res = idx.drop_duplicates(keep=False) - tm.assert_index_equal(res, base[5:]) - res = Series(idx).drop_duplicates(keep=False) - tm.assert_series_equal(res, Series(base[5:], index=np.arange(5, 31))) - - def test_order_compat(self): - def _check_freq(index, expected_index): - if isinstance(index, PeriodIndex): - self.assertEqual(index.freq, expected_index.freq) - - pidx = PeriodIndex(['2011', '2012', '2013'], name='pidx', freq='A') - # for compatibility check - iidx = Index([2011, 2012, 2013], name='idx') - for idx in [pidx, iidx]: - ordered = idx.sort_values() - self.assert_index_equal(ordered, idx) - _check_freq(ordered, idx) - - ordered = idx.sort_values(ascending=False) - self.assert_index_equal(ordered, idx[::-1]) - _check_freq(ordered, idx[::-1]) - - ordered, indexer = idx.sort_values(return_indexer=True) - self.assert_index_equal(ordered, idx) - self.assert_numpy_array_equal(indexer, - np.array([0, 1, 2]), - check_dtype=False) - _check_freq(ordered, idx) - - ordered, indexer = idx.sort_values(return_indexer=True, - ascending=False) - self.assert_index_equal(ordered, idx[::-1]) - self.assert_numpy_array_equal(indexer, - np.array([2, 1, 0]), - check_dtype=False) - _check_freq(ordered, idx[::-1]) - - pidx = PeriodIndex(['2011', '2013', '2015', '2012', - '2011'], name='pidx', freq='A') - pexpected = PeriodIndex( - ['2011', '2011', '2012', '2013', '2015'], name='pidx', freq='A') - # for compatibility check - iidx = Index([2011, 2013, 2015, 2012, 2011], name='idx') - iexpected = Index([2011, 2011, 2012, 2013, 2015], name='idx') - for idx, expected in [(pidx, pexpected), (iidx, iexpected)]: - ordered = idx.sort_values() - self.assert_index_equal(ordered, expected) - _check_freq(ordered, idx) - - ordered = idx.sort_values(ascending=False) - self.assert_index_equal(ordered, expected[::-1]) - _check_freq(ordered, idx) - - ordered, indexer = idx.sort_values(return_indexer=True) - self.assert_index_equal(ordered, expected) - - exp = np.array([0, 4, 3, 1, 2]) - self.assert_numpy_array_equal(indexer, exp, check_dtype=False) - _check_freq(ordered, idx) - - ordered, indexer = idx.sort_values(return_indexer=True, - ascending=False) - self.assert_index_equal(ordered, expected[::-1]) - - exp = np.array([2, 1, 3, 4, 0]) - self.assert_numpy_array_equal(indexer, exp, - check_dtype=False) - _check_freq(ordered, idx) - - pidx = PeriodIndex(['2011', '2013', 'NaT', '2011'], name='pidx', - freq='D') - - result = pidx.sort_values() - expected = PeriodIndex(['NaT', '2011', '2011', '2013'], - name='pidx', freq='D') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, 'D') - - result = pidx.sort_values(ascending=False) - expected = PeriodIndex( - ['2013', '2011', '2011', 'NaT'], name='pidx', freq='D') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, 'D') - - def test_order(self): - for freq in ['D', '2D', '4D']: - idx = PeriodIndex(['2011-01-01', '2011-01-02', '2011-01-03'], - freq=freq, name='idx') - - ordered = idx.sort_values() - self.assert_index_equal(ordered, idx) - self.assertEqual(ordered.freq, idx.freq) - - ordered = idx.sort_values(ascending=False) - expected = idx[::-1] - self.assert_index_equal(ordered, expected) - self.assertEqual(ordered.freq, expected.freq) - self.assertEqual(ordered.freq, freq) - - ordered, 
indexer = idx.sort_values(return_indexer=True) - self.assert_index_equal(ordered, idx) - self.assert_numpy_array_equal(indexer, - np.array([0, 1, 2]), - check_dtype=False) - self.assertEqual(ordered.freq, idx.freq) - self.assertEqual(ordered.freq, freq) - - ordered, indexer = idx.sort_values(return_indexer=True, - ascending=False) - expected = idx[::-1] - self.assert_index_equal(ordered, expected) - self.assert_numpy_array_equal(indexer, - np.array([2, 1, 0]), - check_dtype=False) - self.assertEqual(ordered.freq, expected.freq) - self.assertEqual(ordered.freq, freq) - - idx1 = PeriodIndex(['2011-01-01', '2011-01-03', '2011-01-05', - '2011-01-02', '2011-01-01'], freq='D', name='idx1') - exp1 = PeriodIndex(['2011-01-01', '2011-01-01', '2011-01-02', - '2011-01-03', '2011-01-05'], freq='D', name='idx1') - - idx2 = PeriodIndex(['2011-01-01', '2011-01-03', '2011-01-05', - '2011-01-02', '2011-01-01'], - freq='D', name='idx2') - exp2 = PeriodIndex(['2011-01-01', '2011-01-01', '2011-01-02', - '2011-01-03', '2011-01-05'], - freq='D', name='idx2') - - idx3 = PeriodIndex([pd.NaT, '2011-01-03', '2011-01-05', - '2011-01-02', pd.NaT], freq='D', name='idx3') - exp3 = PeriodIndex([pd.NaT, pd.NaT, '2011-01-02', '2011-01-03', - '2011-01-05'], freq='D', name='idx3') - - for idx, expected in [(idx1, exp1), (idx2, exp2), (idx3, exp3)]: - ordered = idx.sort_values() - self.assert_index_equal(ordered, expected) - self.assertEqual(ordered.freq, 'D') - - ordered = idx.sort_values(ascending=False) - self.assert_index_equal(ordered, expected[::-1]) - self.assertEqual(ordered.freq, 'D') - - ordered, indexer = idx.sort_values(return_indexer=True) - self.assert_index_equal(ordered, expected) - - exp = np.array([0, 4, 3, 1, 2]) - self.assert_numpy_array_equal(indexer, exp, check_dtype=False) - self.assertEqual(ordered.freq, 'D') - - ordered, indexer = idx.sort_values(return_indexer=True, - ascending=False) - self.assert_index_equal(ordered, expected[::-1]) - - exp = np.array([2, 1, 3, 4, 0]) - self.assert_numpy_array_equal(indexer, exp, check_dtype=False) - self.assertEqual(ordered.freq, 'D') - - def test_getitem(self): - idx1 = pd.period_range('2011-01-01', '2011-01-31', freq='D', - name='idx') - - for idx in [idx1]: - result = idx[0] - self.assertEqual(result, pd.Period('2011-01-01', freq='D')) - - result = idx[-1] - self.assertEqual(result, pd.Period('2011-01-31', freq='D')) - - result = idx[0:5] - expected = pd.period_range('2011-01-01', '2011-01-05', freq='D', - name='idx') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, expected.freq) - self.assertEqual(result.freq, 'D') - - result = idx[0:10:2] - expected = pd.PeriodIndex(['2011-01-01', '2011-01-03', - '2011-01-05', - '2011-01-07', '2011-01-09'], - freq='D', name='idx') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, expected.freq) - self.assertEqual(result.freq, 'D') - - result = idx[-20:-5:3] - expected = pd.PeriodIndex(['2011-01-12', '2011-01-15', - '2011-01-18', - '2011-01-21', '2011-01-24'], - freq='D', name='idx') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, expected.freq) - self.assertEqual(result.freq, 'D') - - result = idx[4::-1] - expected = PeriodIndex(['2011-01-05', '2011-01-04', '2011-01-03', - '2011-01-02', '2011-01-01'], - freq='D', name='idx') - self.assert_index_equal(result, expected) - self.assertEqual(result.freq, expected.freq) - self.assertEqual(result.freq, 'D') - - def test_take(self): - # GH 10295 - idx1 = pd.period_range('2011-01-01', '2011-01-31', 
freq='D',
- name='idx')
-
- for idx in [idx1]:
- result = idx.take([0])
- self.assertEqual(result, pd.Period('2011-01-01', freq='D'))
-
- result = idx.take([5])
- self.assertEqual(result, pd.Period('2011-01-06', freq='D'))
-
- result = idx.take([0, 1, 2])
- expected = pd.period_range('2011-01-01', '2011-01-03', freq='D',
- name='idx')
- self.assert_index_equal(result, expected)
- self.assertEqual(result.freq, 'D')
- self.assertEqual(result.freq, expected.freq)
-
- result = idx.take([0, 2, 4])
- expected = pd.PeriodIndex(['2011-01-01', '2011-01-03',
- '2011-01-05'], freq='D', name='idx')
- self.assert_index_equal(result, expected)
- self.assertEqual(result.freq, expected.freq)
- self.assertEqual(result.freq, 'D')
-
- result = idx.take([7, 4, 1])
- expected = pd.PeriodIndex(['2011-01-08', '2011-01-05',
- '2011-01-02'],
- freq='D', name='idx')
- self.assert_index_equal(result, expected)
- self.assertEqual(result.freq, expected.freq)
- self.assertEqual(result.freq, 'D')
-
- result = idx.take([3, 2, 5])
- expected = PeriodIndex(['2011-01-04', '2011-01-03', '2011-01-06'],
- freq='D', name='idx')
- self.assert_index_equal(result, expected)
- self.assertEqual(result.freq, expected.freq)
- self.assertEqual(result.freq, 'D')
-
- result = idx.take([-3, 2, 5])
- expected = PeriodIndex(['2011-01-29', '2011-01-03', '2011-01-06'],
- freq='D', name='idx')
- self.assert_index_equal(result, expected)
- self.assertEqual(result.freq, expected.freq)
- self.assertEqual(result.freq, 'D')
-
- def test_nat_new(self):
-
- idx = pd.period_range('2011-01', freq='M', periods=5, name='x')
- result = idx._nat_new()
- exp = pd.PeriodIndex([pd.NaT] * 5, freq='M', name='x')
- tm.assert_index_equal(result, exp)
-
- result = idx._nat_new(box=False)
- exp = np.array([tslib.iNaT] * 5, dtype=np.int64)
- tm.assert_numpy_array_equal(result, exp)
-
- def test_shift(self):
- # GH 9903
- idx = pd.PeriodIndex([], name='xxx', freq='H')
-
- with tm.assertRaises(TypeError):
- # period shift doesn't accept freq
- idx.shift(1, freq='H')
-
- tm.assert_index_equal(idx.shift(0), idx)
- tm.assert_index_equal(idx.shift(3), idx)
-
- idx = pd.PeriodIndex(['2011-01-01 10:00', '2011-01-01 11:00',
- '2011-01-01 12:00'], name='xxx', freq='H')
- tm.assert_index_equal(idx.shift(0), idx)
- exp = pd.PeriodIndex(['2011-01-01 13:00', '2011-01-01 14:00',
- '2011-01-01 15:00'], name='xxx', freq='H')
- tm.assert_index_equal(idx.shift(3), exp)
- exp = pd.PeriodIndex(['2011-01-01 07:00', '2011-01-01 08:00',
- '2011-01-01 09:00'], name='xxx', freq='H')
- tm.assert_index_equal(idx.shift(-3), exp)
-
- def test_repeat(self):
- index = pd.period_range('2001-01-01', periods=2, freq='D')
- exp = pd.PeriodIndex(['2001-01-01', '2001-01-01',
- '2001-01-02', '2001-01-02'], freq='D')
- for res in [index.repeat(2), np.repeat(index, 2)]:
- tm.assert_index_equal(res, exp)
-
- index = pd.period_range('2001-01-01', periods=2, freq='2D')
- exp = pd.PeriodIndex(['2001-01-01', '2001-01-01',
- '2001-01-03', '2001-01-03'], freq='2D')
- for res in [index.repeat(2), np.repeat(index, 2)]:
- tm.assert_index_equal(res, exp)
-
- index = pd.PeriodIndex(['2001-01', 'NaT', '2003-01'], freq='M')
- exp = pd.PeriodIndex(['2001-01', '2001-01', '2001-01',
- 'NaT', 'NaT', 'NaT',
- '2003-01', '2003-01', '2003-01'], freq='M')
- for res in [index.repeat(3), np.repeat(index, 3)]:
- tm.assert_index_equal(res, exp)
-
- def test_nat(self):
- self.assertIs(pd.PeriodIndex._na_value, pd.NaT)
- self.assertIs(pd.PeriodIndex([], freq='M')._na_value, pd.NaT)
-
- idx = pd.PeriodIndex(['2011-01-01', '2011-01-02'], 
freq='D')
- self.assertTrue(idx._can_hold_na)
-
- tm.assert_numpy_array_equal(idx._isnan, np.array([False, False]))
- self.assertFalse(idx.hasnans)
- tm.assert_numpy_array_equal(idx._nan_idxs,
- np.array([], dtype=np.intp))
-
- idx = pd.PeriodIndex(['2011-01-01', 'NaT'], freq='D')
- self.assertTrue(idx._can_hold_na)
-
- tm.assert_numpy_array_equal(idx._isnan, np.array([False, True]))
- self.assertTrue(idx.hasnans)
- tm.assert_numpy_array_equal(idx._nan_idxs,
- np.array([1], dtype=np.intp))
-
- def test_equals(self):
- # GH 13107
- for freq in ['D', 'M']:
- idx = pd.PeriodIndex(['2011-01-01', '2011-01-02', 'NaT'],
- freq=freq)
- self.assertTrue(idx.equals(idx))
- self.assertTrue(idx.equals(idx.copy()))
- self.assertTrue(idx.equals(idx.asobject))
- self.assertTrue(idx.asobject.equals(idx))
- self.assertTrue(idx.asobject.equals(idx.asobject))
- self.assertFalse(idx.equals(list(idx)))
- self.assertFalse(idx.equals(pd.Series(idx)))
-
- idx2 = pd.PeriodIndex(['2011-01-01', '2011-01-02', 'NaT'],
- freq='H')
- self.assertFalse(idx.equals(idx2))
- self.assertFalse(idx.equals(idx2.copy()))
- self.assertFalse(idx.equals(idx2.asobject))
- self.assertFalse(idx.asobject.equals(idx2))
- self.assertFalse(idx.equals(list(idx2)))
- self.assertFalse(idx.equals(pd.Series(idx2)))
-
- # same internal, different freq
- idx3 = pd.PeriodIndex._simple_new(idx.asi8, freq='H')
- tm.assert_numpy_array_equal(idx.asi8, idx3.asi8)
- self.assertFalse(idx.equals(idx3))
- self.assertFalse(idx.equals(idx3.copy()))
- self.assertFalse(idx.equals(idx3.asobject))
- self.assertFalse(idx.asobject.equals(idx3))
- self.assertFalse(idx.equals(list(idx3)))
- self.assertFalse(idx.equals(pd.Series(idx3)))
-
-
-if __name__ == '__main__':
- import nose
-
- nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
- exit=False)
diff --git a/pandas/tseries/tests/test_daterange.py b/pandas/tseries/tests/test_daterange.py
deleted file mode 100644
index 87f9f55e0189c..0000000000000
--- a/pandas/tseries/tests/test_daterange.py
+++ /dev/null
@@ -1,824 +0,0 @@
-from datetime import datetime
-from pandas.compat import range
-import nose
-import numpy as np
-
-from pandas.core.index import Index
-from pandas.tseries.index import DatetimeIndex
-
-from pandas import Timestamp
-from pandas.tseries.offsets import (BDay, BMonthEnd, CDay, MonthEnd,
- generate_range, DateOffset, Minute)
-from pandas.tseries.index import cdate_range, bdate_range, date_range
-
-from pandas.core import common as com
-from pandas.util.testing import assertRaisesRegexp
-import pandas.util.testing as tm
-
-
-def eq_gen_range(kwargs, expected):
- rng = generate_range(**kwargs)
- assert (np.array_equal(list(rng), expected))
-
-
-START, END = datetime(2009, 1, 1), datetime(2010, 1, 1)
-
-
-class TestGenRangeGeneration(tm.TestCase):
-
- def test_generate(self):
- rng1 = list(generate_range(START, END, offset=BDay()))
- rng2 = list(generate_range(START, END, time_rule='B'))
- self.assertEqual(rng1, rng2)
-
- def test_generate_cday(self):
- rng1 = list(generate_range(START, END, offset=CDay()))
- rng2 = list(generate_range(START, END, time_rule='C'))
- self.assertEqual(rng1, rng2)
-
- def test_1(self):
- eq_gen_range(dict(start=datetime(2009, 3, 25), periods=2),
- [datetime(2009, 3, 25), datetime(2009, 3, 26)])
-
- def test_2(self):
- eq_gen_range(dict(start=datetime(2008, 1, 1),
- end=datetime(2008, 1, 3)),
- [datetime(2008, 1, 1),
- datetime(2008, 1, 2),
- datetime(2008, 1, 3)])
-
- def test_3(self):
- eq_gen_range(dict(start=datetime(2008, 1, 5),
- end=datetime(2008, 
1, 6)), - []) - - def test_precision_finer_than_offset(self): - # GH 9907 - result1 = DatetimeIndex(start='2015-04-15 00:00:03', - end='2016-04-22 00:00:00', freq='Q') - result2 = DatetimeIndex(start='2015-04-15 00:00:03', - end='2015-06-22 00:00:04', freq='W') - expected1_list = ['2015-06-30 00:00:03', '2015-09-30 00:00:03', - '2015-12-31 00:00:03', '2016-03-31 00:00:03'] - expected2_list = ['2015-04-19 00:00:03', '2015-04-26 00:00:03', - '2015-05-03 00:00:03', '2015-05-10 00:00:03', - '2015-05-17 00:00:03', '2015-05-24 00:00:03', - '2015-05-31 00:00:03', '2015-06-07 00:00:03', - '2015-06-14 00:00:03', '2015-06-21 00:00:03'] - expected1 = DatetimeIndex(expected1_list, dtype='datetime64[ns]', - freq='Q-DEC', tz=None) - expected2 = DatetimeIndex(expected2_list, dtype='datetime64[ns]', - freq='W-SUN', tz=None) - self.assert_index_equal(result1, expected1) - self.assert_index_equal(result2, expected2) - - -class TestDateRange(tm.TestCase): - def setUp(self): - self.rng = bdate_range(START, END) - - def test_constructor(self): - bdate_range(START, END, freq=BDay()) - bdate_range(START, periods=20, freq=BDay()) - bdate_range(end=START, periods=20, freq=BDay()) - self.assertRaises(ValueError, date_range, '2011-1-1', '2012-1-1', 'B') - self.assertRaises(ValueError, bdate_range, '2011-1-1', '2012-1-1', 'B') - - def test_naive_aware_conflicts(self): - naive = bdate_range(START, END, freq=BDay(), tz=None) - aware = bdate_range(START, END, freq=BDay(), - tz="Asia/Hong_Kong") - assertRaisesRegexp(TypeError, "tz-naive.*tz-aware", naive.join, aware) - assertRaisesRegexp(TypeError, "tz-naive.*tz-aware", aware.join, naive) - - def test_cached_range(self): - DatetimeIndex._cached_range(START, END, offset=BDay()) - DatetimeIndex._cached_range(START, periods=20, offset=BDay()) - DatetimeIndex._cached_range(end=START, periods=20, offset=BDay()) - - assertRaisesRegexp(TypeError, "offset", DatetimeIndex._cached_range, - START, END) - - assertRaisesRegexp(TypeError, "specify period", - DatetimeIndex._cached_range, START, - offset=BDay()) - - assertRaisesRegexp(TypeError, "specify period", - DatetimeIndex._cached_range, end=END, - offset=BDay()) - - assertRaisesRegexp(TypeError, "start or end", - DatetimeIndex._cached_range, periods=20, - offset=BDay()) - - def test_cached_range_bug(self): - rng = date_range('2010-09-01 05:00:00', periods=50, - freq=DateOffset(hours=6)) - self.assertEqual(len(rng), 50) - self.assertEqual(rng[0], datetime(2010, 9, 1, 5)) - - def test_timezone_comparaison_bug(self): - start = Timestamp('20130220 10:00', tz='US/Eastern') - try: - date_range(start, periods=2, tz='US/Eastern') - except AssertionError: - self.fail() - - def test_timezone_comparaison_assert(self): - start = Timestamp('20130220 10:00', tz='US/Eastern') - self.assertRaises(AssertionError, date_range, start, periods=2, - tz='Europe/Berlin') - - def test_comparison(self): - d = self.rng[10] - - comp = self.rng > d - self.assertTrue(comp[11]) - self.assertFalse(comp[9]) - - def test_copy(self): - cp = self.rng.copy() - repr(cp) - self.assert_index_equal(cp, self.rng) - - def test_repr(self): - # only really care that it works - repr(self.rng) - - def test_getitem(self): - smaller = self.rng[:5] - exp = DatetimeIndex(self.rng.view(np.ndarray)[:5]) - self.assert_index_equal(smaller, exp) - - self.assertEqual(smaller.offset, self.rng.offset) - - sliced = self.rng[::5] - self.assertEqual(sliced.offset, BDay() * 5) - - fancy_indexed = self.rng[[4, 3, 2, 1, 0]] - self.assertEqual(len(fancy_indexed), 5) - 
tm.assertIsInstance(fancy_indexed, DatetimeIndex) - self.assertIsNone(fancy_indexed.freq) - - # 32-bit vs. 64-bit platforms - self.assertEqual(self.rng[4], self.rng[np.int_(4)]) - - def test_getitem_matplotlib_hackaround(self): - values = self.rng[:, None] - expected = self.rng.values[:, None] - self.assert_numpy_array_equal(values, expected) - - def test_shift(self): - shifted = self.rng.shift(5) - self.assertEqual(shifted[0], self.rng[5]) - self.assertEqual(shifted.offset, self.rng.offset) - - shifted = self.rng.shift(-5) - self.assertEqual(shifted[5], self.rng[0]) - self.assertEqual(shifted.offset, self.rng.offset) - - shifted = self.rng.shift(0) - self.assertEqual(shifted[0], self.rng[0]) - self.assertEqual(shifted.offset, self.rng.offset) - - rng = date_range(START, END, freq=BMonthEnd()) - shifted = rng.shift(1, freq=BDay()) - self.assertEqual(shifted[0], rng[0] + BDay()) - - def test_pickle_unpickle(self): - unpickled = self.round_trip_pickle(self.rng) - self.assertIsNotNone(unpickled.offset) - - def test_union(self): - # overlapping - left = self.rng[:10] - right = self.rng[5:10] - - the_union = left.union(right) - tm.assertIsInstance(the_union, DatetimeIndex) - - # non-overlapping, gap in middle - left = self.rng[:5] - right = self.rng[10:] - - the_union = left.union(right) - tm.assertIsInstance(the_union, Index) - - # non-overlapping, no gap - left = self.rng[:5] - right = self.rng[5:10] - - the_union = left.union(right) - tm.assertIsInstance(the_union, DatetimeIndex) - - # order does not matter - tm.assert_index_equal(right.union(left), the_union) - - # overlapping, but different offset - rng = date_range(START, END, freq=BMonthEnd()) - - the_union = self.rng.union(rng) - tm.assertIsInstance(the_union, DatetimeIndex) - - def test_outer_join(self): - # should just behave as union - - # overlapping - left = self.rng[:10] - right = self.rng[5:10] - - the_join = left.join(right, how='outer') - tm.assertIsInstance(the_join, DatetimeIndex) - - # non-overlapping, gap in middle - left = self.rng[:5] - right = self.rng[10:] - - the_join = left.join(right, how='outer') - tm.assertIsInstance(the_join, DatetimeIndex) - self.assertIsNone(the_join.freq) - - # non-overlapping, no gap - left = self.rng[:5] - right = self.rng[5:10] - - the_join = left.join(right, how='outer') - tm.assertIsInstance(the_join, DatetimeIndex) - - # overlapping, but different offset - rng = date_range(START, END, freq=BMonthEnd()) - - the_join = self.rng.join(rng, how='outer') - tm.assertIsInstance(the_join, DatetimeIndex) - self.assertIsNone(the_join.freq) - - def test_union_not_cacheable(self): - rng = date_range('1/1/2000', periods=50, freq=Minute()) - rng1 = rng[10:] - rng2 = rng[:25] - the_union = rng1.union(rng2) - self.assert_index_equal(the_union, rng) - - rng1 = rng[10:] - rng2 = rng[15:35] - the_union = rng1.union(rng2) - expected = rng[10:] - self.assert_index_equal(the_union, expected) - - def test_intersection(self): - rng = date_range('1/1/2000', periods=50, freq=Minute()) - rng1 = rng[10:] - rng2 = rng[:25] - the_int = rng1.intersection(rng2) - expected = rng[10:25] - self.assert_index_equal(the_int, expected) - tm.assertIsInstance(the_int, DatetimeIndex) - self.assertEqual(the_int.offset, rng.offset) - - the_int = rng1.intersection(rng2.view(DatetimeIndex)) - self.assert_index_equal(the_int, expected) - - # non-overlapping - the_int = rng[:10].intersection(rng[10:]) - expected = DatetimeIndex([]) - self.assert_index_equal(the_int, expected) - - def test_intersection_bug(self): - # GH #771 - a = 
bdate_range('11/30/2011', '12/31/2011') - b = bdate_range('12/10/2011', '12/20/2011') - result = a.intersection(b) - self.assert_index_equal(result, b) - - def test_summary(self): - self.rng.summary() - self.rng[2:2].summary() - - def test_summary_pytz(self): - tm._skip_if_no_pytz() - import pytz - bdate_range('1/1/2005', '1/1/2009', tz=pytz.utc).summary() - - def test_summary_dateutil(self): - tm._skip_if_no_dateutil() - import dateutil - bdate_range('1/1/2005', '1/1/2009', tz=dateutil.tz.tzutc()).summary() - - def test_misc(self): - end = datetime(2009, 5, 13) - dr = bdate_range(end=end, periods=20) - firstDate = end - 19 * BDay() - - assert len(dr) == 20 - assert dr[0] == firstDate - assert dr[-1] == end - - def test_date_parse_failure(self): - badly_formed_date = '2007/100/1' - - self.assertRaises(ValueError, Timestamp, badly_formed_date) - - self.assertRaises(ValueError, bdate_range, start=badly_formed_date, - periods=10) - self.assertRaises(ValueError, bdate_range, end=badly_formed_date, - periods=10) - self.assertRaises(ValueError, bdate_range, badly_formed_date, - badly_formed_date) - - def test_equals(self): - self.assertFalse(self.rng.equals(list(self.rng))) - - def test_identical(self): - t1 = self.rng.copy() - t2 = self.rng.copy() - self.assertTrue(t1.identical(t2)) - - # name - t1 = t1.rename('foo') - self.assertTrue(t1.equals(t2)) - self.assertFalse(t1.identical(t2)) - t2 = t2.rename('foo') - self.assertTrue(t1.identical(t2)) - - # freq - t2v = Index(t2.values) - self.assertTrue(t1.equals(t2v)) - self.assertFalse(t1.identical(t2v)) - - def test_daterange_bug_456(self): - # GH #456 - rng1 = bdate_range('12/5/2011', '12/5/2011') - rng2 = bdate_range('12/2/2011', '12/5/2011') - rng2.offset = BDay() - - result = rng1.union(rng2) - tm.assertIsInstance(result, DatetimeIndex) - - def test_error_with_zero_monthends(self): - self.assertRaises(ValueError, date_range, '1/1/2000', '1/1/2001', - freq=MonthEnd(0)) - - def test_range_bug(self): - # GH #770 - offset = DateOffset(months=3) - result = date_range("2011-1-1", "2012-1-31", freq=offset) - - start = datetime(2011, 1, 1) - exp_values = [start + i * offset for i in range(5)] - tm.assert_index_equal(result, DatetimeIndex(exp_values)) - - def test_range_tz_pytz(self): - # GH 2906 - tm._skip_if_no_pytz() - from pytz import timezone - - tz = timezone('US/Eastern') - start = tz.localize(datetime(2011, 1, 1)) - end = tz.localize(datetime(2011, 1, 3)) - - dr = date_range(start=start, periods=3) - self.assertEqual(dr.tz.zone, tz.zone) - self.assertEqual(dr[0], start) - self.assertEqual(dr[2], end) - - dr = date_range(end=end, periods=3) - self.assertEqual(dr.tz.zone, tz.zone) - self.assertEqual(dr[0], start) - self.assertEqual(dr[2], end) - - dr = date_range(start=start, end=end) - self.assertEqual(dr.tz.zone, tz.zone) - self.assertEqual(dr[0], start) - self.assertEqual(dr[2], end) - - def test_range_tz_dst_straddle_pytz(self): - - tm._skip_if_no_pytz() - from pytz import timezone - tz = timezone('US/Eastern') - dates = [(tz.localize(datetime(2014, 3, 6)), - tz.localize(datetime(2014, 3, 12))), - (tz.localize(datetime(2013, 11, 1)), - tz.localize(datetime(2013, 11, 6)))] - for (start, end) in dates: - dr = date_range(start, end, freq='D') - self.assertEqual(dr[0], start) - self.assertEqual(dr[-1], end) - self.assertEqual(np.all(dr.hour == 0), True) - - dr = date_range(start, end, freq='D', tz='US/Eastern') - self.assertEqual(dr[0], start) - self.assertEqual(dr[-1], end) - self.assertEqual(np.all(dr.hour == 0), True) - - dr = 
date_range(start.replace(tzinfo=None), end.replace( - tzinfo=None), freq='D', tz='US/Eastern') - self.assertEqual(dr[0], start) - self.assertEqual(dr[-1], end) - self.assertEqual(np.all(dr.hour == 0), True) - - def test_range_tz_dateutil(self): - # GH 2906 - tm._skip_if_no_dateutil() - # Use maybe_get_tz to fix filename in tz under dateutil. - from pandas.tslib import maybe_get_tz - tz = lambda x: maybe_get_tz('dateutil/' + x) - - start = datetime(2011, 1, 1, tzinfo=tz('US/Eastern')) - end = datetime(2011, 1, 3, tzinfo=tz('US/Eastern')) - - dr = date_range(start=start, periods=3) - self.assertTrue(dr.tz == tz('US/Eastern')) - self.assertTrue(dr[0] == start) - self.assertTrue(dr[2] == end) - - dr = date_range(end=end, periods=3) - self.assertTrue(dr.tz == tz('US/Eastern')) - self.assertTrue(dr[0] == start) - self.assertTrue(dr[2] == end) - - dr = date_range(start=start, end=end) - self.assertTrue(dr.tz == tz('US/Eastern')) - self.assertTrue(dr[0] == start) - self.assertTrue(dr[2] == end) - - def test_month_range_union_tz_pytz(self): - tm._skip_if_no_pytz() - from pytz import timezone - tz = timezone('US/Eastern') - - early_start = datetime(2011, 1, 1) - early_end = datetime(2011, 3, 1) - - late_start = datetime(2011, 3, 1) - late_end = datetime(2011, 5, 1) - - early_dr = date_range(start=early_start, end=early_end, tz=tz, - freq=MonthEnd()) - late_dr = date_range(start=late_start, end=late_end, tz=tz, - freq=MonthEnd()) - - early_dr.union(late_dr) - - def test_month_range_union_tz_dateutil(self): - tm._skip_if_windows_python_3() - tm._skip_if_no_dateutil() - from pandas.tslib import _dateutil_gettz as timezone - tz = timezone('US/Eastern') - - early_start = datetime(2011, 1, 1) - early_end = datetime(2011, 3, 1) - - late_start = datetime(2011, 3, 1) - late_end = datetime(2011, 5, 1) - - early_dr = date_range(start=early_start, end=early_end, tz=tz, - freq=MonthEnd()) - late_dr = date_range(start=late_start, end=late_end, tz=tz, - freq=MonthEnd()) - - early_dr.union(late_dr) - - def test_range_closed(self): - begin = datetime(2011, 1, 1) - end = datetime(2014, 1, 1) - - for freq in ["1D", "3D", "2M", "7W", "3H", "A"]: - closed = date_range(begin, end, closed=None, freq=freq) - left = date_range(begin, end, closed="left", freq=freq) - right = date_range(begin, end, closed="right", freq=freq) - expected_left = left - expected_right = right - - if end == closed[-1]: - expected_left = closed[:-1] - if begin == closed[0]: - expected_right = closed[1:] - - self.assert_index_equal(expected_left, left) - self.assert_index_equal(expected_right, right) - - def test_range_closed_with_tz_aware_start_end(self): - # GH12409, GH12684 - begin = Timestamp('2011/1/1', tz='US/Eastern') - end = Timestamp('2014/1/1', tz='US/Eastern') - - for freq in ["1D", "3D", "2M", "7W", "3H", "A"]: - closed = date_range(begin, end, closed=None, freq=freq) - left = date_range(begin, end, closed="left", freq=freq) - right = date_range(begin, end, closed="right", freq=freq) - expected_left = left - expected_right = right - - if end == closed[-1]: - expected_left = closed[:-1] - if begin == closed[0]: - expected_right = closed[1:] - - self.assert_index_equal(expected_left, left) - self.assert_index_equal(expected_right, right) - - begin = Timestamp('2011/1/1') - end = Timestamp('2014/1/1') - begintz = Timestamp('2011/1/1', tz='US/Eastern') - endtz = Timestamp('2014/1/1', tz='US/Eastern') - - for freq in ["1D", "3D", "2M", "7W", "3H", "A"]: - closed = date_range(begin, end, closed=None, freq=freq, - tz='US/Eastern') - left = 
date_range(begin, end, closed="left", freq=freq, - tz='US/Eastern') - right = date_range(begin, end, closed="right", freq=freq, - tz='US/Eastern') - expected_left = left - expected_right = right - - if endtz == closed[-1]: - expected_left = closed[:-1] - if begintz == closed[0]: - expected_right = closed[1:] - - self.assert_index_equal(expected_left, left) - self.assert_index_equal(expected_right, right) - - def test_range_closed_boundary(self): - # GH 11804 - for closed in ['right', 'left', None]: - right_boundary = date_range('2015-09-12', '2015-12-01', - freq='QS-MAR', closed=closed) - left_boundary = date_range('2015-09-01', '2015-09-12', - freq='QS-MAR', closed=closed) - both_boundary = date_range('2015-09-01', '2015-12-01', - freq='QS-MAR', closed=closed) - expected_right = expected_left = expected_both = both_boundary - - if closed == 'right': - expected_left = both_boundary[1:] - if closed == 'left': - expected_right = both_boundary[:-1] - if closed is None: - expected_right = both_boundary[1:] - expected_left = both_boundary[:-1] - - self.assert_index_equal(right_boundary, expected_right) - self.assert_index_equal(left_boundary, expected_left) - self.assert_index_equal(both_boundary, expected_both) - - def test_years_only(self): - # GH 6961 - dr = date_range('2014', '2015', freq='M') - self.assertEqual(dr[0], datetime(2014, 1, 31)) - self.assertEqual(dr[-1], datetime(2014, 12, 31)) - - def test_freq_divides_end_in_nanos(self): - # GH 10885 - result_1 = date_range('2005-01-12 10:00', '2005-01-12 16:00', - freq='345min') - result_2 = date_range('2005-01-13 10:00', '2005-01-13 16:00', - freq='345min') - expected_1 = DatetimeIndex(['2005-01-12 10:00:00', - '2005-01-12 15:45:00'], - dtype='datetime64[ns]', freq='345T', - tz=None) - expected_2 = DatetimeIndex(['2005-01-13 10:00:00', - '2005-01-13 15:45:00'], - dtype='datetime64[ns]', freq='345T', - tz=None) - self.assert_index_equal(result_1, expected_1) - self.assert_index_equal(result_2, expected_2) - - -class TestCustomDateRange(tm.TestCase): - def setUp(self): - self.rng = cdate_range(START, END) - - def test_constructor(self): - cdate_range(START, END, freq=CDay()) - cdate_range(START, periods=20, freq=CDay()) - cdate_range(end=START, periods=20, freq=CDay()) - self.assertRaises(ValueError, date_range, '2011-1-1', '2012-1-1', 'C') - self.assertRaises(ValueError, cdate_range, '2011-1-1', '2012-1-1', 'C') - - def test_cached_range(self): - DatetimeIndex._cached_range(START, END, offset=CDay()) - DatetimeIndex._cached_range(START, periods=20, - offset=CDay()) - DatetimeIndex._cached_range(end=START, periods=20, - offset=CDay()) - - self.assertRaises(Exception, DatetimeIndex._cached_range, START, END) - - self.assertRaises(Exception, DatetimeIndex._cached_range, START, - freq=CDay()) - - self.assertRaises(Exception, DatetimeIndex._cached_range, end=END, - freq=CDay()) - - self.assertRaises(Exception, DatetimeIndex._cached_range, periods=20, - freq=CDay()) - - def test_comparison(self): - d = self.rng[10] - - comp = self.rng > d - self.assertTrue(comp[11]) - self.assertFalse(comp[9]) - - def test_copy(self): - cp = self.rng.copy() - repr(cp) - self.assert_index_equal(cp, self.rng) - - def test_repr(self): - # only really care that it works - repr(self.rng) - - def test_getitem(self): - smaller = self.rng[:5] - exp = DatetimeIndex(self.rng.view(np.ndarray)[:5]) - self.assert_index_equal(smaller, exp) - self.assertEqual(smaller.offset, self.rng.offset) - - sliced = self.rng[::5] - self.assertEqual(sliced.offset, CDay() * 5) - - 
fancy_indexed = self.rng[[4, 3, 2, 1, 0]] - self.assertEqual(len(fancy_indexed), 5) - tm.assertIsInstance(fancy_indexed, DatetimeIndex) - self.assertIsNone(fancy_indexed.freq) - - # 32-bit vs. 64-bit platforms - self.assertEqual(self.rng[4], self.rng[np.int_(4)]) - - def test_getitem_matplotlib_hackaround(self): - values = self.rng[:, None] - expected = self.rng.values[:, None] - self.assert_numpy_array_equal(values, expected) - - def test_shift(self): - - shifted = self.rng.shift(5) - self.assertEqual(shifted[0], self.rng[5]) - self.assertEqual(shifted.offset, self.rng.offset) - - shifted = self.rng.shift(-5) - self.assertEqual(shifted[5], self.rng[0]) - self.assertEqual(shifted.offset, self.rng.offset) - - shifted = self.rng.shift(0) - self.assertEqual(shifted[0], self.rng[0]) - self.assertEqual(shifted.offset, self.rng.offset) - - with tm.assert_produces_warning(com.PerformanceWarning): - rng = date_range(START, END, freq=BMonthEnd()) - shifted = rng.shift(1, freq=CDay()) - self.assertEqual(shifted[0], rng[0] + CDay()) - - def test_pickle_unpickle(self): - unpickled = self.round_trip_pickle(self.rng) - self.assertIsNotNone(unpickled.offset) - - def test_union(self): - # overlapping - left = self.rng[:10] - right = self.rng[5:10] - - the_union = left.union(right) - tm.assertIsInstance(the_union, DatetimeIndex) - - # non-overlapping, gap in middle - left = self.rng[:5] - right = self.rng[10:] - - the_union = left.union(right) - tm.assertIsInstance(the_union, Index) - - # non-overlapping, no gap - left = self.rng[:5] - right = self.rng[5:10] - - the_union = left.union(right) - tm.assertIsInstance(the_union, DatetimeIndex) - - # order does not matter - self.assert_index_equal(right.union(left), the_union) - - # overlapping, but different offset - rng = date_range(START, END, freq=BMonthEnd()) - - the_union = self.rng.union(rng) - tm.assertIsInstance(the_union, DatetimeIndex) - - def test_outer_join(self): - # should just behave as union - - # overlapping - left = self.rng[:10] - right = self.rng[5:10] - - the_join = left.join(right, how='outer') - tm.assertIsInstance(the_join, DatetimeIndex) - - # non-overlapping, gap in middle - left = self.rng[:5] - right = self.rng[10:] - - the_join = left.join(right, how='outer') - tm.assertIsInstance(the_join, DatetimeIndex) - self.assertIsNone(the_join.freq) - - # non-overlapping, no gap - left = self.rng[:5] - right = self.rng[5:10] - - the_join = left.join(right, how='outer') - tm.assertIsInstance(the_join, DatetimeIndex) - - # overlapping, but different offset - rng = date_range(START, END, freq=BMonthEnd()) - - the_join = self.rng.join(rng, how='outer') - tm.assertIsInstance(the_join, DatetimeIndex) - self.assertIsNone(the_join.freq) - - def test_intersection_bug(self): - # GH #771 - a = cdate_range('11/30/2011', '12/31/2011') - b = cdate_range('12/10/2011', '12/20/2011') - result = a.intersection(b) - self.assert_index_equal(result, b) - - def test_summary(self): - self.rng.summary() - self.rng[2:2].summary() - - def test_summary_pytz(self): - tm._skip_if_no_pytz() - import pytz - cdate_range('1/1/2005', '1/1/2009', tz=pytz.utc).summary() - - def test_summary_dateutil(self): - tm._skip_if_no_dateutil() - import dateutil - cdate_range('1/1/2005', '1/1/2009', tz=dateutil.tz.tzutc()).summary() - - def test_misc(self): - end = datetime(2009, 5, 13) - dr = cdate_range(end=end, periods=20) - firstDate = end - 19 * CDay() - - assert len(dr) == 20 - assert dr[0] == firstDate - assert dr[-1] == end - - def test_date_parse_failure(self): - 
badly_formed_date = '2007/100/1'
-
- self.assertRaises(ValueError, Timestamp, badly_formed_date)
-
- self.assertRaises(ValueError, cdate_range, start=badly_formed_date,
- periods=10)
- self.assertRaises(ValueError, cdate_range, end=badly_formed_date,
- periods=10)
- self.assertRaises(ValueError, cdate_range, badly_formed_date,
- badly_formed_date)
-
- def test_equals(self):
- self.assertFalse(self.rng.equals(list(self.rng)))
-
- def test_daterange_bug_456(self):
- # GH #456
- rng1 = cdate_range('12/5/2011', '12/5/2011')
- rng2 = cdate_range('12/2/2011', '12/5/2011')
- rng2.offset = CDay()
-
- result = rng1.union(rng2)
- tm.assertIsInstance(result, DatetimeIndex)
-
- def test_cdaterange(self):
- rng = cdate_range('2013-05-01', periods=3)
- xp = DatetimeIndex(['2013-05-01', '2013-05-02', '2013-05-03'])
- self.assert_index_equal(xp, rng)
-
- def test_cdaterange_weekmask(self):
- rng = cdate_range('2013-05-01', periods=3,
- weekmask='Sun Mon Tue Wed Thu')
- xp = DatetimeIndex(['2013-05-01', '2013-05-02', '2013-05-05'])
- self.assert_index_equal(xp, rng)
-
- def test_cdaterange_holidays(self):
- rng = cdate_range('2013-05-01', periods=3, holidays=['2013-05-01'])
- xp = DatetimeIndex(['2013-05-02', '2013-05-03', '2013-05-06'])
- self.assert_index_equal(xp, rng)
-
- def test_cdaterange_weekmask_and_holidays(self):
- rng = cdate_range('2013-05-01', periods=3,
- weekmask='Sun Mon Tue Wed Thu',
- holidays=['2013-05-01'])
- xp = DatetimeIndex(['2013-05-02', '2013-05-05', '2013-05-06'])
- self.assert_index_equal(xp, rng)
-
-
-if __name__ == '__main__':
- nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
- exit=False)
diff --git a/pandas/tseries/tests/test_period.py b/pandas/tseries/tests/test_period.py
deleted file mode 100644
index a707cc3eb74ce..0000000000000
--- a/pandas/tseries/tests/test_period.py
+++ /dev/null
@@ -1,4975 +0,0 @@
-"""Test suite for Period handling.
-
-Parts derived from scikits.timeseries code, original authors:
-- Pierre Gerard-Marchant & Matt Knox
-- pierregm_at_uga_dot_edu - mattknow_ca_at_hotmail_dot_com
-
-"""
-
-from datetime import datetime, date, timedelta
-
-from pandas import Timestamp, _period
-from pandas.tseries.frequencies import MONTHS, DAYS, _period_code_map
-from pandas.tseries.period import Period, PeriodIndex, period_range
-from pandas.tseries.index import DatetimeIndex, date_range, Index
-from pandas.tseries.tools import to_datetime
-import pandas.tseries.period as period
-import pandas.tseries.offsets as offsets
-
-import pandas as pd
-import numpy as np
-from numpy.random import randn
-from pandas.compat import range, lrange, lmap, zip, text_type, PY3, iteritems
-from pandas.compat.numpy import np_datetime64_compat
-
-from pandas import (Series, DataFrame,
- _np_version_under1p9, _np_version_under1p10,
- _np_version_under1p12)
-from pandas import tslib
-import pandas.util.testing as tm
-
-
-class TestPeriodProperties(tm.TestCase):
- "Test properties such as year, month, weekday, etc." 
- - def test_quarterly_negative_ordinals(self): - p = Period(ordinal=-1, freq='Q-DEC') - self.assertEqual(p.year, 1969) - self.assertEqual(p.quarter, 4) - self.assertIsInstance(p, Period) - - p = Period(ordinal=-2, freq='Q-DEC') - self.assertEqual(p.year, 1969) - self.assertEqual(p.quarter, 3) - self.assertIsInstance(p, Period) - - p = Period(ordinal=-2, freq='M') - self.assertEqual(p.year, 1969) - self.assertEqual(p.month, 11) - self.assertIsInstance(p, Period) - - def test_period_cons_quarterly(self): - # bugs in scikits.timeseries - for month in MONTHS: - freq = 'Q-%s' % month - exp = Period('1989Q3', freq=freq) - self.assertIn('1989Q3', str(exp)) - stamp = exp.to_timestamp('D', how='end') - p = Period(stamp, freq=freq) - self.assertEqual(p, exp) - - stamp = exp.to_timestamp('3D', how='end') - p = Period(stamp, freq=freq) - self.assertEqual(p, exp) - - def test_period_cons_annual(self): - # bugs in scikits.timeseries - for month in MONTHS: - freq = 'A-%s' % month - exp = Period('1989', freq=freq) - stamp = exp.to_timestamp('D', how='end') + timedelta(days=30) - p = Period(stamp, freq=freq) - self.assertEqual(p, exp + 1) - self.assertIsInstance(p, Period) - - def test_period_cons_weekly(self): - for num in range(10, 17): - daystr = '2011-02-%d' % num - for day in DAYS: - freq = 'W-%s' % day - - result = Period(daystr, freq=freq) - expected = Period(daystr, freq='D').asfreq(freq) - self.assertEqual(result, expected) - self.assertIsInstance(result, Period) - - def test_period_from_ordinal(self): - p = pd.Period('2011-01', freq='M') - res = pd.Period._from_ordinal(p.ordinal, freq='M') - self.assertEqual(p, res) - self.assertIsInstance(res, Period) - - def test_period_cons_nat(self): - p = Period('NaT', freq='M') - self.assertIs(p, pd.NaT) - - p = Period('nat', freq='W-SUN') - self.assertIs(p, pd.NaT) - - p = Period(tslib.iNaT, freq='D') - self.assertIs(p, pd.NaT) - - p = Period(tslib.iNaT, freq='3D') - self.assertIs(p, pd.NaT) - - p = Period(tslib.iNaT, freq='1D1H') - self.assertIs(p, pd.NaT) - - p = Period('NaT') - self.assertIs(p, pd.NaT) - - p = Period(tslib.iNaT) - self.assertIs(p, pd.NaT) - - def test_cons_null_like(self): - # check Timestamp compat - self.assertIs(Timestamp('NaT'), pd.NaT) - self.assertIs(Period('NaT'), pd.NaT) - - self.assertIs(Timestamp(None), pd.NaT) - self.assertIs(Period(None), pd.NaT) - - self.assertIs(Timestamp(float('nan')), pd.NaT) - self.assertIs(Period(float('nan')), pd.NaT) - - self.assertIs(Timestamp(np.nan), pd.NaT) - self.assertIs(Period(np.nan), pd.NaT) - - def test_period_cons_mult(self): - p1 = Period('2011-01', freq='3M') - p2 = Period('2011-01', freq='M') - self.assertEqual(p1.ordinal, p2.ordinal) - - self.assertEqual(p1.freq, offsets.MonthEnd(3)) - self.assertEqual(p1.freqstr, '3M') - - self.assertEqual(p2.freq, offsets.MonthEnd()) - self.assertEqual(p2.freqstr, 'M') - - result = p1 + 1 - self.assertEqual(result.ordinal, (p2 + 3).ordinal) - self.assertEqual(result.freq, p1.freq) - self.assertEqual(result.freqstr, '3M') - - result = p1 - 1 - self.assertEqual(result.ordinal, (p2 - 3).ordinal) - self.assertEqual(result.freq, p1.freq) - self.assertEqual(result.freqstr, '3M') - - msg = ('Frequency must be positive, because it' - ' represents span: -3M') - with tm.assertRaisesRegexp(ValueError, msg): - Period('2011-01', freq='-3M') - - msg = ('Frequency must be positive, because it' ' represents span: 0M') - with tm.assertRaisesRegexp(ValueError, msg): - Period('2011-01', freq='0M') - - def test_period_cons_combined(self): - p = [(Period('2011-01', 
freq='1D1H'), - Period('2011-01', freq='1H1D'), - Period('2011-01', freq='H')), - (Period(ordinal=1, freq='1D1H'), - Period(ordinal=1, freq='1H1D'), - Period(ordinal=1, freq='H'))] - - for p1, p2, p3 in p: - self.assertEqual(p1.ordinal, p3.ordinal) - self.assertEqual(p2.ordinal, p3.ordinal) - - self.assertEqual(p1.freq, offsets.Hour(25)) - self.assertEqual(p1.freqstr, '25H') - - self.assertEqual(p2.freq, offsets.Hour(25)) - self.assertEqual(p2.freqstr, '25H') - - self.assertEqual(p3.freq, offsets.Hour()) - self.assertEqual(p3.freqstr, 'H') - - result = p1 + 1 - self.assertEqual(result.ordinal, (p3 + 25).ordinal) - self.assertEqual(result.freq, p1.freq) - self.assertEqual(result.freqstr, '25H') - - result = p2 + 1 - self.assertEqual(result.ordinal, (p3 + 25).ordinal) - self.assertEqual(result.freq, p2.freq) - self.assertEqual(result.freqstr, '25H') - - result = p1 - 1 - self.assertEqual(result.ordinal, (p3 - 25).ordinal) - self.assertEqual(result.freq, p1.freq) - self.assertEqual(result.freqstr, '25H') - - result = p2 - 1 - self.assertEqual(result.ordinal, (p3 - 25).ordinal) - self.assertEqual(result.freq, p2.freq) - self.assertEqual(result.freqstr, '25H') - - msg = ('Frequency must be positive, because it' - ' represents span: -25H') - with tm.assertRaisesRegexp(ValueError, msg): - Period('2011-01', freq='-1D1H') - with tm.assertRaisesRegexp(ValueError, msg): - Period('2011-01', freq='-1H1D') - with tm.assertRaisesRegexp(ValueError, msg): - Period(ordinal=1, freq='-1D1H') - with tm.assertRaisesRegexp(ValueError, msg): - Period(ordinal=1, freq='-1H1D') - - msg = ('Frequency must be positive, because it' - ' represents span: 0D') - with tm.assertRaisesRegexp(ValueError, msg): - Period('2011-01', freq='0D0H') - with tm.assertRaisesRegexp(ValueError, msg): - Period(ordinal=1, freq='0D0H') - - # You can only combine together day and intraday offsets - msg = ('Invalid frequency: 1W1D') - with tm.assertRaisesRegexp(ValueError, msg): - Period('2011-01', freq='1W1D') - msg = ('Invalid frequency: 1D1W') - with tm.assertRaisesRegexp(ValueError, msg): - Period('2011-01', freq='1D1W') - - def test_timestamp_tz_arg(self): - tm._skip_if_no_pytz() - import pytz - for case in ['Europe/Brussels', 'Asia/Tokyo', 'US/Pacific']: - p = Period('1/1/2005', freq='M').to_timestamp(tz=case) - exp = Timestamp('1/1/2005', tz='UTC').tz_convert(case) - exp_zone = pytz.timezone(case).normalize(p) - - self.assertEqual(p, exp) - self.assertEqual(p.tz, exp_zone.tzinfo) - self.assertEqual(p.tz, exp.tz) - - p = Period('1/1/2005', freq='3H').to_timestamp(tz=case) - exp = Timestamp('1/1/2005', tz='UTC').tz_convert(case) - exp_zone = pytz.timezone(case).normalize(p) - - self.assertEqual(p, exp) - self.assertEqual(p.tz, exp_zone.tzinfo) - self.assertEqual(p.tz, exp.tz) - - p = Period('1/1/2005', freq='A').to_timestamp(freq='A', tz=case) - exp = Timestamp('31/12/2005', tz='UTC').tz_convert(case) - exp_zone = pytz.timezone(case).normalize(p) - - self.assertEqual(p, exp) - self.assertEqual(p.tz, exp_zone.tzinfo) - self.assertEqual(p.tz, exp.tz) - - p = Period('1/1/2005', freq='A').to_timestamp(freq='3H', tz=case) - exp = Timestamp('1/1/2005', tz='UTC').tz_convert(case) - exp_zone = pytz.timezone(case).normalize(p) - - self.assertEqual(p, exp) - self.assertEqual(p.tz, exp_zone.tzinfo) - self.assertEqual(p.tz, exp.tz) - - def test_timestamp_tz_arg_dateutil(self): - from pandas.tslib import _dateutil_gettz as gettz - from pandas.tslib import maybe_get_tz - for case in ['dateutil/Europe/Brussels', 'dateutil/Asia/Tokyo', - 
'dateutil/US/Pacific']: - p = Period('1/1/2005', freq='M').to_timestamp( - tz=maybe_get_tz(case)) - exp = Timestamp('1/1/2005', tz='UTC').tz_convert(case) - self.assertEqual(p, exp) - self.assertEqual(p.tz, gettz(case.split('/', 1)[1])) - self.assertEqual(p.tz, exp.tz) - - p = Period('1/1/2005', - freq='M').to_timestamp(freq='3H', tz=maybe_get_tz(case)) - exp = Timestamp('1/1/2005', tz='UTC').tz_convert(case) - self.assertEqual(p, exp) - self.assertEqual(p.tz, gettz(case.split('/', 1)[1])) - self.assertEqual(p.tz, exp.tz) - - def test_timestamp_tz_arg_dateutil_from_string(self): - from pandas.tslib import _dateutil_gettz as gettz - p = Period('1/1/2005', - freq='M').to_timestamp(tz='dateutil/Europe/Brussels') - self.assertEqual(p.tz, gettz('Europe/Brussels')) - - def test_timestamp_mult(self): - p = pd.Period('2011-01', freq='M') - self.assertEqual(p.to_timestamp(how='S'), pd.Timestamp('2011-01-01')) - self.assertEqual(p.to_timestamp(how='E'), pd.Timestamp('2011-01-31')) - - p = pd.Period('2011-01', freq='3M') - self.assertEqual(p.to_timestamp(how='S'), pd.Timestamp('2011-01-01')) - self.assertEqual(p.to_timestamp(how='E'), pd.Timestamp('2011-03-31')) - - def test_period_constructor(self): - i1 = Period('1/1/2005', freq='M') - i2 = Period('Jan 2005') - - self.assertEqual(i1, i2) - - i1 = Period('2005', freq='A') - i2 = Period('2005') - i3 = Period('2005', freq='a') - - self.assertEqual(i1, i2) - self.assertEqual(i1, i3) - - i4 = Period('2005', freq='M') - i5 = Period('2005', freq='m') - - self.assertRaises(ValueError, i1.__ne__, i4) - self.assertEqual(i4, i5) - - i1 = Period.now('Q') - i2 = Period(datetime.now(), freq='Q') - i3 = Period.now('q') - - self.assertEqual(i1, i2) - self.assertEqual(i1, i3) - - # Biz day construction, roll forward if non-weekday - i1 = Period('3/10/12', freq='B') - i2 = Period('3/10/12', freq='D') - self.assertEqual(i1, i2.asfreq('B')) - i2 = Period('3/11/12', freq='D') - self.assertEqual(i1, i2.asfreq('B')) - i2 = Period('3/12/12', freq='D') - self.assertEqual(i1, i2.asfreq('B')) - - i3 = Period('3/10/12', freq='b') - self.assertEqual(i1, i3) - - i1 = Period(year=2005, quarter=1, freq='Q') - i2 = Period('1/1/2005', freq='Q') - self.assertEqual(i1, i2) - - i1 = Period(year=2005, quarter=3, freq='Q') - i2 = Period('9/1/2005', freq='Q') - self.assertEqual(i1, i2) - - i1 = Period(year=2005, month=3, day=1, freq='D') - i2 = Period('3/1/2005', freq='D') - self.assertEqual(i1, i2) - - i3 = Period(year=2005, month=3, day=1, freq='d') - self.assertEqual(i1, i3) - - i1 = Period(year=2012, month=3, day=10, freq='B') - i2 = Period('3/12/12', freq='B') - self.assertEqual(i1, i2) - - i1 = Period('2005Q1') - i2 = Period(year=2005, quarter=1, freq='Q') - i3 = Period('2005q1') - self.assertEqual(i1, i2) - self.assertEqual(i1, i3) - - i1 = Period('05Q1') - self.assertEqual(i1, i2) - lower = Period('05q1') - self.assertEqual(i1, lower) - - i1 = Period('1Q2005') - self.assertEqual(i1, i2) - lower = Period('1q2005') - self.assertEqual(i1, lower) - - i1 = Period('1Q05') - self.assertEqual(i1, i2) - lower = Period('1q05') - self.assertEqual(i1, lower) - - i1 = Period('4Q1984') - self.assertEqual(i1.year, 1984) - lower = Period('4q1984') - self.assertEqual(i1, lower) - - i1 = Period('1982', freq='min') - i2 = Period('1982', freq='MIN') - self.assertEqual(i1, i2) - i2 = Period('1982', freq=('Min', 1)) - self.assertEqual(i1, i2) - - expected = Period('2007-01', freq='M') - i1 = Period('200701', freq='M') - self.assertEqual(i1, expected) - - i1 = Period('200701', freq='M') - 
self.assertEqual(i1, expected) - - i1 = Period(200701, freq='M') - self.assertEqual(i1, expected) - - i1 = Period(ordinal=200701, freq='M') - self.assertEqual(i1.year, 18695) - - i1 = Period(datetime(2007, 1, 1), freq='M') - i2 = Period('200701', freq='M') - self.assertEqual(i1, i2) - - i1 = Period(date(2007, 1, 1), freq='M') - i2 = Period(datetime(2007, 1, 1), freq='M') - i3 = Period(np.datetime64('2007-01-01'), freq='M') - i4 = Period(np_datetime64_compat('2007-01-01 00:00:00Z'), freq='M') - i5 = Period(np_datetime64_compat('2007-01-01 00:00:00.000Z'), freq='M') - self.assertEqual(i1, i2) - self.assertEqual(i1, i3) - self.assertEqual(i1, i4) - self.assertEqual(i1, i5) - - i1 = Period('2007-01-01 09:00:00.001') - expected = Period(datetime(2007, 1, 1, 9, 0, 0, 1000), freq='L') - self.assertEqual(i1, expected) - - expected = Period(np_datetime64_compat( - '2007-01-01 09:00:00.001Z'), freq='L') - self.assertEqual(i1, expected) - - i1 = Period('2007-01-01 09:00:00.00101') - expected = Period(datetime(2007, 1, 1, 9, 0, 0, 1010), freq='U') - self.assertEqual(i1, expected) - - expected = Period(np_datetime64_compat('2007-01-01 09:00:00.00101Z'), - freq='U') - self.assertEqual(i1, expected) - - self.assertRaises(ValueError, Period, ordinal=200701) - - self.assertRaises(ValueError, Period, '2007-1-1', freq='X') - - def test_period_constructor_offsets(self): - self.assertEqual(Period('1/1/2005', freq=offsets.MonthEnd()), - Period('1/1/2005', freq='M')) - self.assertEqual(Period('2005', freq=offsets.YearEnd()), - Period('2005', freq='A')) - self.assertEqual(Period('2005', freq=offsets.MonthEnd()), - Period('2005', freq='M')) - self.assertEqual(Period('3/10/12', freq=offsets.BusinessDay()), - Period('3/10/12', freq='B')) - self.assertEqual(Period('3/10/12', freq=offsets.Day()), - Period('3/10/12', freq='D')) - - self.assertEqual(Period(year=2005, quarter=1, - freq=offsets.QuarterEnd(startingMonth=12)), - Period(year=2005, quarter=1, freq='Q')) - self.assertEqual(Period(year=2005, quarter=2, - freq=offsets.QuarterEnd(startingMonth=12)), - Period(year=2005, quarter=2, freq='Q')) - - self.assertEqual(Period(year=2005, month=3, day=1, freq=offsets.Day()), - Period(year=2005, month=3, day=1, freq='D')) - self.assertEqual(Period(year=2012, month=3, day=10, - freq=offsets.BDay()), - Period(year=2012, month=3, day=10, freq='B')) - - expected = Period('2005-03-01', freq='3D') - self.assertEqual(Period(year=2005, month=3, day=1, - freq=offsets.Day(3)), expected) - self.assertEqual(Period(year=2005, month=3, day=1, freq='3D'), - expected) - - self.assertEqual(Period(year=2012, month=3, day=10, - freq=offsets.BDay(3)), - Period(year=2012, month=3, day=10, freq='3B')) - - self.assertEqual(Period(200701, freq=offsets.MonthEnd()), - Period(200701, freq='M')) - - i1 = Period(ordinal=200701, freq=offsets.MonthEnd()) - i2 = Period(ordinal=200701, freq='M') - self.assertEqual(i1, i2) - self.assertEqual(i1.year, 18695) - self.assertEqual(i2.year, 18695) - - i1 = Period(datetime(2007, 1, 1), freq='M') - i2 = Period('200701', freq='M') - self.assertEqual(i1, i2) - - i1 = Period(date(2007, 1, 1), freq='M') - i2 = Period(datetime(2007, 1, 1), freq='M') - i3 = Period(np.datetime64('2007-01-01'), freq='M') - i4 = Period(np_datetime64_compat('2007-01-01 00:00:00Z'), freq='M') - i5 = Period(np_datetime64_compat('2007-01-01 00:00:00.000Z'), freq='M') - self.assertEqual(i1, i2) - self.assertEqual(i1, i3) - self.assertEqual(i1, i4) - self.assertEqual(i1, i5) - - i1 = Period('2007-01-01 09:00:00.001') - expected = 
Period(datetime(2007, 1, 1, 9, 0, 0, 1000), freq='L')
- self.assertEqual(i1, expected)
-
- expected = Period(np_datetime64_compat(
- '2007-01-01 09:00:00.001Z'), freq='L')
- self.assertEqual(i1, expected)
-
- i1 = Period('2007-01-01 09:00:00.00101')
- expected = Period(datetime(2007, 1, 1, 9, 0, 0, 1010), freq='U')
- self.assertEqual(i1, expected)
-
- expected = Period(np_datetime64_compat('2007-01-01 09:00:00.00101Z'),
- freq='U')
- self.assertEqual(i1, expected)
-
- self.assertRaises(ValueError, Period, ordinal=200701)
-
- self.assertRaises(ValueError, Period, '2007-1-1', freq='X')
-
- def test_freq_str(self):
- i1 = Period('1982', freq='Min')
- self.assertEqual(i1.freq, offsets.Minute())
- self.assertEqual(i1.freqstr, 'T')
-
- def test_period_deprecated_freq(self):
- cases = {"M": ["MTH", "MONTH", "MONTHLY", "Mth", "month", "monthly"],
- "B": ["BUS", "BUSINESS", "BUSINESSLY", "WEEKDAY", "bus"],
- "D": ["DAY", "DLY", "DAILY", "Day", "Dly", "Daily"],
- "H": ["HR", "HOUR", "HRLY", "HOURLY", "hr", "Hour", "HRly"],
- "T": ["minute", "MINUTE", "MINUTELY", "minutely"],
- "S": ["sec", "SEC", "SECOND", "SECONDLY", "second"],
- "L": ["MILLISECOND", "MILLISECONDLY", "millisecond"],
- "U": ["MICROSECOND", "MICROSECONDLY", "microsecond"],
- "N": ["NANOSECOND", "NANOSECONDLY", "nanosecond"]}
-
- msg = pd.tseries.frequencies._INVALID_FREQ_ERROR
- for exp, freqs in iteritems(cases):
- for freq in freqs:
- with self.assertRaisesRegexp(ValueError, msg):
- Period('2016-03-01 09:00', freq=freq)
- with self.assertRaisesRegexp(ValueError, msg):
- Period(ordinal=1, freq=freq)
-
- # check supported freq-aliases still work
- p1 = Period('2016-03-01 09:00', freq=exp)
- p2 = Period(ordinal=1, freq=exp)
- tm.assertIsInstance(p1, Period)
- tm.assertIsInstance(p2, Period)
-
- def test_hash(self):
- self.assertEqual(hash(Period('2011-01', freq='M')),
- hash(Period('2011-01', freq='M')))
-
- self.assertNotEqual(hash(Period('2011-01-01', freq='D')),
- hash(Period('2011-01', freq='M')))
-
- self.assertNotEqual(hash(Period('2011-01', freq='3M')),
- hash(Period('2011-01', freq='2M')))
-
- self.assertNotEqual(hash(Period('2011-01', freq='M')),
- hash(Period('2011-02', freq='M')))
-
- def test_repr(self):
- p = Period('Jan-2000')
- self.assertIn('2000-01', repr(p))
-
- p = Period('2000-12-15')
- self.assertIn('2000-12-15', repr(p))
-
- def test_repr_nat(self):
- p = Period('nat', freq='M')
- self.assertIn(repr(tslib.NaT), repr(p))
-
- def test_millisecond_repr(self):
- p = Period('2000-01-01 12:15:02.123')
-
- self.assertEqual("Period('2000-01-01 12:15:02.123', 'L')", repr(p))
-
- def test_microsecond_repr(self):
- p = Period('2000-01-01 12:15:02.123567')
-
- self.assertEqual("Period('2000-01-01 12:15:02.123567', 'U')", repr(p))
-
- def test_strftime(self):
- p = Period('2000-1-1 12:34:12', freq='S')
- res = p.strftime('%Y-%m-%d %H:%M:%S')
- self.assertEqual(res, '2000-01-01 12:34:12')
- tm.assertIsInstance(res, text_type) # GH3363
-
- def test_sub_delta(self):
- left, right = Period('2011', freq='A'), Period('2007', freq='A')
- result = left - right
- self.assertEqual(result, 4)
-
- with self.assertRaises(period.IncompatibleFrequency):
- left - Period('2007-01', freq='M')
-
- def test_to_timestamp(self):
- p = Period('1982', freq='A')
- start_ts = p.to_timestamp(how='S')
- aliases = ['s', 'StarT', 'BEGIn']
- for a in aliases:
- self.assertEqual(start_ts, p.to_timestamp('D', how=a))
- # freq with mult should not affect the result
- self.assertEqual(start_ts, p.to_timestamp('3D', how=a))
-
- end_ts = 
p.to_timestamp(how='E')
- aliases = ['e', 'end', 'FINIsH']
- for a in aliases:
- self.assertEqual(end_ts, p.to_timestamp('D', how=a))
- self.assertEqual(end_ts, p.to_timestamp('3D', how=a))
-
- from_lst = ['A', 'Q', 'M', 'W', 'B', 'D', 'H', 'Min', 'S']
-
- def _ex(p):
- return Timestamp((p + 1).start_time.value - 1)
-
- for i, fcode in enumerate(from_lst):
- p = Period('1982', freq=fcode)
- result = p.to_timestamp().to_period(fcode)
- self.assertEqual(result, p)
-
- self.assertEqual(p.start_time, p.to_timestamp(how='S'))
-
- self.assertEqual(p.end_time, _ex(p))
-
- # Frequency other than daily
-
- p = Period('1985', freq='A')
-
- result = p.to_timestamp('H', how='end')
- expected = datetime(1985, 12, 31, 23)
- self.assertEqual(result, expected)
- result = p.to_timestamp('3H', how='end')
- self.assertEqual(result, expected)
-
- result = p.to_timestamp('T', how='end')
- expected = datetime(1985, 12, 31, 23, 59)
- self.assertEqual(result, expected)
- result = p.to_timestamp('2T', how='end')
- self.assertEqual(result, expected)
-
- result = p.to_timestamp(how='end')
- expected = datetime(1985, 12, 31)
- self.assertEqual(result, expected)
-
- expected = datetime(1985, 1, 1)
- result = p.to_timestamp('H', how='start')
- self.assertEqual(result, expected)
- result = p.to_timestamp('T', how='start')
- self.assertEqual(result, expected)
- result = p.to_timestamp('S', how='start')
- self.assertEqual(result, expected)
- result = p.to_timestamp('3H', how='start')
- self.assertEqual(result, expected)
- result = p.to_timestamp('5S', how='start')
- self.assertEqual(result, expected)
-
- def test_start_time(self):
- freq_lst = ['A', 'Q', 'M', 'D', 'H', 'T', 'S']
- xp = datetime(2012, 1, 1)
- for f in freq_lst:
- p = Period('2012', freq=f)
- self.assertEqual(p.start_time, xp)
- self.assertEqual(Period('2012', freq='B').start_time,
- datetime(2012, 1, 2))
- self.assertEqual(Period('2012', freq='W').start_time,
- datetime(2011, 12, 26))
-
- def test_end_time(self):
- p = Period('2012', freq='A')
-
- def _ex(*args):
- return Timestamp(Timestamp(datetime(*args)).value - 1)
-
- xp = _ex(2013, 1, 1)
- self.assertEqual(xp, p.end_time)
-
- p = Period('2012', freq='Q')
- xp = _ex(2012, 4, 1)
- self.assertEqual(xp, p.end_time)
-
- p = Period('2012', freq='M')
- xp = _ex(2012, 2, 1)
- self.assertEqual(xp, p.end_time)
-
- p = Period('2012', freq='D')
- xp = _ex(2012, 1, 2)
- self.assertEqual(xp, p.end_time)
-
- p = Period('2012', freq='H')
- xp = _ex(2012, 1, 1, 1)
- self.assertEqual(xp, p.end_time)
-
- p = Period('2012', freq='B')
- xp = _ex(2012, 1, 3)
- self.assertEqual(xp, p.end_time)
-
- p = Period('2012', freq='W')
- xp = _ex(2012, 1, 2)
- self.assertEqual(xp, p.end_time)
-
- # Test for GH 11738
- p = Period('2012', freq='15D')
- xp = _ex(2012, 1, 16)
- self.assertEqual(xp, p.end_time)
-
- p = Period('2012', freq='1D1H')
- xp = _ex(2012, 1, 2, 1)
- self.assertEqual(xp, p.end_time)
-
- p = Period('2012', freq='1H1D')
- xp = _ex(2012, 1, 2, 1)
- self.assertEqual(xp, p.end_time)
-
- def test_anchor_week_end_time(self):
- def _ex(*args):
- return Timestamp(Timestamp(datetime(*args)).value - 1)
-
- p = Period('2013-1-1', 'W-SAT')
- xp = _ex(2013, 1, 6)
- self.assertEqual(p.end_time, xp)
-
- def test_properties_annually(self):
- # Test properties on Periods with annual frequency.
- a_date = Period(freq='A', year=2007)
- self.assertEqual(a_date.year, 2007)
-
- def test_properties_quarterly(self):
- # Test properties on Periods with quarterly frequency. 
- qedec_date = Period(freq="Q-DEC", year=2007, quarter=1)
- qejan_date = Period(freq="Q-JAN", year=2007, quarter=1)
- qejun_date = Period(freq="Q-JUN", year=2007, quarter=1)
- #
- for x in range(3):
- for qd in (qedec_date, qejan_date, qejun_date):
- self.assertEqual((qd + x).qyear, 2007)
- self.assertEqual((qd + x).quarter, x + 1)
-
- def test_properties_monthly(self):
- # Test properties on Periods with monthly frequency.
- m_date = Period(freq='M', year=2007, month=1)
- for x in range(11):
- m_ival_x = m_date + x
- self.assertEqual(m_ival_x.year, 2007)
- if 1 <= x + 1 <= 3:
- self.assertEqual(m_ival_x.quarter, 1)
- elif 4 <= x + 1 <= 6:
- self.assertEqual(m_ival_x.quarter, 2)
- elif 7 <= x + 1 <= 9:
- self.assertEqual(m_ival_x.quarter, 3)
- elif 10 <= x + 1 <= 12:
- self.assertEqual(m_ival_x.quarter, 4)
- self.assertEqual(m_ival_x.month, x + 1)
-
- def test_properties_weekly(self):
- # Test properties on Periods with weekly frequency.
- w_date = Period(freq='W', year=2007, month=1, day=7)
- #
- self.assertEqual(w_date.year, 2007)
- self.assertEqual(w_date.quarter, 1)
- self.assertEqual(w_date.month, 1)
- self.assertEqual(w_date.week, 1)
- self.assertEqual((w_date - 1).week, 52)
- self.assertEqual(w_date.days_in_month, 31)
- self.assertEqual(Period(freq='W', year=2012,
- month=2, day=1).days_in_month, 29)
-
- def test_properties_weekly_legacy(self):
- # Test properties on Periods with weekly frequency.
- w_date = Period(freq='W', year=2007, month=1, day=7)
- self.assertEqual(w_date.year, 2007)
- self.assertEqual(w_date.quarter, 1)
- self.assertEqual(w_date.month, 1)
- self.assertEqual(w_date.week, 1)
- self.assertEqual((w_date - 1).week, 52)
- self.assertEqual(w_date.days_in_month, 31)
-
- exp = Period(freq='W', year=2012, month=2, day=1)
- self.assertEqual(exp.days_in_month, 29)
-
- msg = pd.tseries.frequencies._INVALID_FREQ_ERROR
- with self.assertRaisesRegexp(ValueError, msg):
- Period(freq='WK', year=2007, month=1, day=7)
-
- def test_properties_daily(self):
- # Test properties on Periods with daily frequency.
- b_date = Period(freq='B', year=2007, month=1, day=1)
- #
- self.assertEqual(b_date.year, 2007)
- self.assertEqual(b_date.quarter, 1)
- self.assertEqual(b_date.month, 1)
- self.assertEqual(b_date.day, 1)
- self.assertEqual(b_date.weekday, 0)
- self.assertEqual(b_date.dayofyear, 1)
- self.assertEqual(b_date.days_in_month, 31)
- self.assertEqual(Period(freq='B', year=2012,
- month=2, day=1).days_in_month, 29)
- #
- d_date = Period(freq='D', year=2007, month=1, day=1)
- #
- self.assertEqual(d_date.year, 2007)
- self.assertEqual(d_date.quarter, 1)
- self.assertEqual(d_date.month, 1)
- self.assertEqual(d_date.day, 1)
- self.assertEqual(d_date.weekday, 0)
- self.assertEqual(d_date.dayofyear, 1)
- self.assertEqual(d_date.days_in_month, 31)
- self.assertEqual(Period(freq='D', year=2012, month=2,
- day=1).days_in_month, 29)
-
- def test_properties_hourly(self):
- # Test properties on Periods with hourly frequency. 
- h_date1 = Period(freq='H', year=2007, month=1, day=1, hour=0) - h_date2 = Period(freq='2H', year=2007, month=1, day=1, hour=0) - - for h_date in [h_date1, h_date2]: - self.assertEqual(h_date.year, 2007) - self.assertEqual(h_date.quarter, 1) - self.assertEqual(h_date.month, 1) - self.assertEqual(h_date.day, 1) - self.assertEqual(h_date.weekday, 0) - self.assertEqual(h_date.dayofyear, 1) - self.assertEqual(h_date.hour, 0) - self.assertEqual(h_date.days_in_month, 31) - self.assertEqual(Period(freq='H', year=2012, month=2, day=1, - hour=0).days_in_month, 29) - - def test_properties_minutely(self): - # Test properties on Periods with minutely frequency. - t_date = Period(freq='Min', year=2007, month=1, day=1, hour=0, - minute=0) - # - self.assertEqual(t_date.quarter, 1) - self.assertEqual(t_date.month, 1) - self.assertEqual(t_date.day, 1) - self.assertEqual(t_date.weekday, 0) - self.assertEqual(t_date.dayofyear, 1) - self.assertEqual(t_date.hour, 0) - self.assertEqual(t_date.minute, 0) - self.assertEqual(t_date.days_in_month, 31) - self.assertEqual(Period(freq='D', year=2012, month=2, day=1, hour=0, - minute=0).days_in_month, 29) - - def test_properties_secondly(self): - # Test properties on Periods with secondly frequency. - s_date = Period(freq='Min', year=2007, month=1, day=1, hour=0, - minute=0, second=0) - # - self.assertEqual(s_date.year, 2007) - self.assertEqual(s_date.quarter, 1) - self.assertEqual(s_date.month, 1) - self.assertEqual(s_date.day, 1) - self.assertEqual(s_date.weekday, 0) - self.assertEqual(s_date.dayofyear, 1) - self.assertEqual(s_date.hour, 0) - self.assertEqual(s_date.minute, 0) - self.assertEqual(s_date.second, 0) - self.assertEqual(s_date.days_in_month, 31) - self.assertEqual(Period(freq='Min', year=2012, month=2, day=1, hour=0, - minute=0, second=0).days_in_month, 29) - - def test_properties_nat(self): - p_nat = Period('NaT', freq='M') - t_nat = pd.Timestamp('NaT') - self.assertIs(p_nat, t_nat) - - # confirm Period('NaT') work identical with Timestamp('NaT') - for f in ['year', 'month', 'day', 'hour', 'minute', 'second', 'week', - 'dayofyear', 'quarter', 'days_in_month']: - self.assertTrue(np.isnan(getattr(p_nat, f))) - self.assertTrue(np.isnan(getattr(t_nat, f))) - - def test_pnow(self): - dt = datetime.now() - - val = period.pnow('D') - exp = Period(dt, freq='D') - self.assertEqual(val, exp) - - val2 = period.pnow('2D') - exp2 = Period(dt, freq='2D') - self.assertEqual(val2, exp2) - self.assertEqual(val.ordinal, val2.ordinal) - self.assertEqual(val.ordinal, exp2.ordinal) - - def test_constructor_corner(self): - expected = Period('2007-01', freq='2M') - self.assertEqual(Period(year=2007, month=1, freq='2M'), expected) - - self.assertRaises(ValueError, Period, datetime.now()) - self.assertRaises(ValueError, Period, datetime.now().date()) - self.assertRaises(ValueError, Period, 1.6, freq='D') - self.assertRaises(ValueError, Period, ordinal=1.6, freq='D') - self.assertRaises(ValueError, Period, ordinal=2, value=1, freq='D') - self.assertIs(Period(None), pd.NaT) - self.assertRaises(ValueError, Period, month=1) - - p = Period('2007-01-01', freq='D') - - result = Period(p, freq='A') - exp = Period('2007', freq='A') - self.assertEqual(result, exp) - - def test_constructor_infer_freq(self): - p = Period('2007-01-01') - self.assertEqual(p.freq, 'D') - - p = Period('2007-01-01 07') - self.assertEqual(p.freq, 'H') - - p = Period('2007-01-01 07:10') - self.assertEqual(p.freq, 'T') - - p = Period('2007-01-01 07:10:15') - self.assertEqual(p.freq, 'S') - - p = Period('2007-01-01 
07:10:15.123') - self.assertEqual(p.freq, 'L') - - p = Period('2007-01-01 07:10:15.123000') - self.assertEqual(p.freq, 'L') - - p = Period('2007-01-01 07:10:15.123400') - self.assertEqual(p.freq, 'U') - - def test_asfreq_MS(self): - initial = Period("2013") - - self.assertEqual(initial.asfreq(freq="M", how="S"), - Period('2013-01', 'M')) - - msg = pd.tseries.frequencies._INVALID_FREQ_ERROR - with self.assertRaisesRegexp(ValueError, msg): - initial.asfreq(freq="MS", how="S") - - with tm.assertRaisesRegexp(ValueError, msg): - pd.Period('2013-01', 'MS') - - self.assertTrue(_period_code_map.get("MS") is None) - - -def noWrap(item): - return item - - -class TestFreqConversion(tm.TestCase): - "Test frequency conversion of date objects" - - def test_asfreq_corner(self): - val = Period(freq='A', year=2007) - result1 = val.asfreq('5t') - result2 = val.asfreq('t') - expected = Period('2007-12-31 23:59', freq='t') - self.assertEqual(result1.ordinal, expected.ordinal) - self.assertEqual(result1.freqstr, '5T') - self.assertEqual(result2.ordinal, expected.ordinal) - self.assertEqual(result2.freqstr, 'T') - - def test_conv_annual(self): - # frequency conversion tests: from Annual Frequency - - ival_A = Period(freq='A', year=2007) - - ival_AJAN = Period(freq="A-JAN", year=2007) - ival_AJUN = Period(freq="A-JUN", year=2007) - ival_ANOV = Period(freq="A-NOV", year=2007) - - ival_A_to_Q_start = Period(freq='Q', year=2007, quarter=1) - ival_A_to_Q_end = Period(freq='Q', year=2007, quarter=4) - ival_A_to_M_start = Period(freq='M', year=2007, month=1) - ival_A_to_M_end = Period(freq='M', year=2007, month=12) - ival_A_to_W_start = Period(freq='W', year=2007, month=1, day=1) - ival_A_to_W_end = Period(freq='W', year=2007, month=12, day=31) - ival_A_to_B_start = Period(freq='B', year=2007, month=1, day=1) - ival_A_to_B_end = Period(freq='B', year=2007, month=12, day=31) - ival_A_to_D_start = Period(freq='D', year=2007, month=1, day=1) - ival_A_to_D_end = Period(freq='D', year=2007, month=12, day=31) - ival_A_to_H_start = Period(freq='H', year=2007, month=1, day=1, hour=0) - ival_A_to_H_end = Period(freq='H', year=2007, month=12, day=31, - hour=23) - ival_A_to_T_start = Period(freq='Min', year=2007, month=1, day=1, - hour=0, minute=0) - ival_A_to_T_end = Period(freq='Min', year=2007, month=12, day=31, - hour=23, minute=59) - ival_A_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0, - minute=0, second=0) - ival_A_to_S_end = Period(freq='S', year=2007, month=12, day=31, - hour=23, minute=59, second=59) - - ival_AJAN_to_D_end = Period(freq='D', year=2007, month=1, day=31) - ival_AJAN_to_D_start = Period(freq='D', year=2006, month=2, day=1) - ival_AJUN_to_D_end = Period(freq='D', year=2007, month=6, day=30) - ival_AJUN_to_D_start = Period(freq='D', year=2006, month=7, day=1) - ival_ANOV_to_D_end = Period(freq='D', year=2007, month=11, day=30) - ival_ANOV_to_D_start = Period(freq='D', year=2006, month=12, day=1) - - self.assertEqual(ival_A.asfreq('Q', 'S'), ival_A_to_Q_start) - self.assertEqual(ival_A.asfreq('Q', 'e'), ival_A_to_Q_end) - self.assertEqual(ival_A.asfreq('M', 's'), ival_A_to_M_start) - self.assertEqual(ival_A.asfreq('M', 'E'), ival_A_to_M_end) - self.assertEqual(ival_A.asfreq('W', 'S'), ival_A_to_W_start) - self.assertEqual(ival_A.asfreq('W', 'E'), ival_A_to_W_end) - self.assertEqual(ival_A.asfreq('B', 'S'), ival_A_to_B_start) - self.assertEqual(ival_A.asfreq('B', 'E'), ival_A_to_B_end) - self.assertEqual(ival_A.asfreq('D', 'S'), ival_A_to_D_start) - self.assertEqual(ival_A.asfreq('D', 'E'), 
ival_A_to_D_end) - self.assertEqual(ival_A.asfreq('H', 'S'), ival_A_to_H_start) - self.assertEqual(ival_A.asfreq('H', 'E'), ival_A_to_H_end) - self.assertEqual(ival_A.asfreq('min', 'S'), ival_A_to_T_start) - self.assertEqual(ival_A.asfreq('min', 'E'), ival_A_to_T_end) - self.assertEqual(ival_A.asfreq('T', 'S'), ival_A_to_T_start) - self.assertEqual(ival_A.asfreq('T', 'E'), ival_A_to_T_end) - self.assertEqual(ival_A.asfreq('S', 'S'), ival_A_to_S_start) - self.assertEqual(ival_A.asfreq('S', 'E'), ival_A_to_S_end) - - self.assertEqual(ival_AJAN.asfreq('D', 'S'), ival_AJAN_to_D_start) - self.assertEqual(ival_AJAN.asfreq('D', 'E'), ival_AJAN_to_D_end) - - self.assertEqual(ival_AJUN.asfreq('D', 'S'), ival_AJUN_to_D_start) - self.assertEqual(ival_AJUN.asfreq('D', 'E'), ival_AJUN_to_D_end) - - self.assertEqual(ival_ANOV.asfreq('D', 'S'), ival_ANOV_to_D_start) - self.assertEqual(ival_ANOV.asfreq('D', 'E'), ival_ANOV_to_D_end) - - self.assertEqual(ival_A.asfreq('A'), ival_A) - - def test_conv_quarterly(self): - # frequency conversion tests: from Quarterly Frequency - - ival_Q = Period(freq='Q', year=2007, quarter=1) - ival_Q_end_of_year = Period(freq='Q', year=2007, quarter=4) - - ival_QEJAN = Period(freq="Q-JAN", year=2007, quarter=1) - ival_QEJUN = Period(freq="Q-JUN", year=2007, quarter=1) - - ival_Q_to_A = Period(freq='A', year=2007) - ival_Q_to_M_start = Period(freq='M', year=2007, month=1) - ival_Q_to_M_end = Period(freq='M', year=2007, month=3) - ival_Q_to_W_start = Period(freq='W', year=2007, month=1, day=1) - ival_Q_to_W_end = Period(freq='W', year=2007, month=3, day=31) - ival_Q_to_B_start = Period(freq='B', year=2007, month=1, day=1) - ival_Q_to_B_end = Period(freq='B', year=2007, month=3, day=30) - ival_Q_to_D_start = Period(freq='D', year=2007, month=1, day=1) - ival_Q_to_D_end = Period(freq='D', year=2007, month=3, day=31) - ival_Q_to_H_start = Period(freq='H', year=2007, month=1, day=1, hour=0) - ival_Q_to_H_end = Period(freq='H', year=2007, month=3, day=31, hour=23) - ival_Q_to_T_start = Period(freq='Min', year=2007, month=1, day=1, - hour=0, minute=0) - ival_Q_to_T_end = Period(freq='Min', year=2007, month=3, day=31, - hour=23, minute=59) - ival_Q_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0, - minute=0, second=0) - ival_Q_to_S_end = Period(freq='S', year=2007, month=3, day=31, hour=23, - minute=59, second=59) - - ival_QEJAN_to_D_start = Period(freq='D', year=2006, month=2, day=1) - ival_QEJAN_to_D_end = Period(freq='D', year=2006, month=4, day=30) - - ival_QEJUN_to_D_start = Period(freq='D', year=2006, month=7, day=1) - ival_QEJUN_to_D_end = Period(freq='D', year=2006, month=9, day=30) - - self.assertEqual(ival_Q.asfreq('A'), ival_Q_to_A) - self.assertEqual(ival_Q_end_of_year.asfreq('A'), ival_Q_to_A) - - self.assertEqual(ival_Q.asfreq('M', 'S'), ival_Q_to_M_start) - self.assertEqual(ival_Q.asfreq('M', 'E'), ival_Q_to_M_end) - self.assertEqual(ival_Q.asfreq('W', 'S'), ival_Q_to_W_start) - self.assertEqual(ival_Q.asfreq('W', 'E'), ival_Q_to_W_end) - self.assertEqual(ival_Q.asfreq('B', 'S'), ival_Q_to_B_start) - self.assertEqual(ival_Q.asfreq('B', 'E'), ival_Q_to_B_end) - self.assertEqual(ival_Q.asfreq('D', 'S'), ival_Q_to_D_start) - self.assertEqual(ival_Q.asfreq('D', 'E'), ival_Q_to_D_end) - self.assertEqual(ival_Q.asfreq('H', 'S'), ival_Q_to_H_start) - self.assertEqual(ival_Q.asfreq('H', 'E'), ival_Q_to_H_end) - self.assertEqual(ival_Q.asfreq('Min', 'S'), ival_Q_to_T_start) - self.assertEqual(ival_Q.asfreq('Min', 'E'), ival_Q_to_T_end) - 
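- # 'S'/'E' (start/end) choose the first or last sub-period of the
- # quarter, here the first and last second of 2007Q1.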
self.assertEqual(ival_Q.asfreq('S', 'S'), ival_Q_to_S_start) - self.assertEqual(ival_Q.asfreq('S', 'E'), ival_Q_to_S_end) - - self.assertEqual(ival_QEJAN.asfreq('D', 'S'), ival_QEJAN_to_D_start) - self.assertEqual(ival_QEJAN.asfreq('D', 'E'), ival_QEJAN_to_D_end) - self.assertEqual(ival_QEJUN.asfreq('D', 'S'), ival_QEJUN_to_D_start) - self.assertEqual(ival_QEJUN.asfreq('D', 'E'), ival_QEJUN_to_D_end) - - self.assertEqual(ival_Q.asfreq('Q'), ival_Q) - - def test_conv_monthly(self): - # frequency conversion tests: from Monthly Frequency - - ival_M = Period(freq='M', year=2007, month=1) - ival_M_end_of_year = Period(freq='M', year=2007, month=12) - ival_M_end_of_quarter = Period(freq='M', year=2007, month=3) - ival_M_to_A = Period(freq='A', year=2007) - ival_M_to_Q = Period(freq='Q', year=2007, quarter=1) - ival_M_to_W_start = Period(freq='W', year=2007, month=1, day=1) - ival_M_to_W_end = Period(freq='W', year=2007, month=1, day=31) - ival_M_to_B_start = Period(freq='B', year=2007, month=1, day=1) - ival_M_to_B_end = Period(freq='B', year=2007, month=1, day=31) - ival_M_to_D_start = Period(freq='D', year=2007, month=1, day=1) - ival_M_to_D_end = Period(freq='D', year=2007, month=1, day=31) - ival_M_to_H_start = Period(freq='H', year=2007, month=1, day=1, hour=0) - ival_M_to_H_end = Period(freq='H', year=2007, month=1, day=31, hour=23) - ival_M_to_T_start = Period(freq='Min', year=2007, month=1, day=1, - hour=0, minute=0) - ival_M_to_T_end = Period(freq='Min', year=2007, month=1, day=31, - hour=23, minute=59) - ival_M_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0, - minute=0, second=0) - ival_M_to_S_end = Period(freq='S', year=2007, month=1, day=31, hour=23, - minute=59, second=59) - - self.assertEqual(ival_M.asfreq('A'), ival_M_to_A) - self.assertEqual(ival_M_end_of_year.asfreq('A'), ival_M_to_A) - self.assertEqual(ival_M.asfreq('Q'), ival_M_to_Q) - self.assertEqual(ival_M_end_of_quarter.asfreq('Q'), ival_M_to_Q) - - self.assertEqual(ival_M.asfreq('W', 'S'), ival_M_to_W_start) - self.assertEqual(ival_M.asfreq('W', 'E'), ival_M_to_W_end) - self.assertEqual(ival_M.asfreq('B', 'S'), ival_M_to_B_start) - self.assertEqual(ival_M.asfreq('B', 'E'), ival_M_to_B_end) - self.assertEqual(ival_M.asfreq('D', 'S'), ival_M_to_D_start) - self.assertEqual(ival_M.asfreq('D', 'E'), ival_M_to_D_end) - self.assertEqual(ival_M.asfreq('H', 'S'), ival_M_to_H_start) - self.assertEqual(ival_M.asfreq('H', 'E'), ival_M_to_H_end) - self.assertEqual(ival_M.asfreq('Min', 'S'), ival_M_to_T_start) - self.assertEqual(ival_M.asfreq('Min', 'E'), ival_M_to_T_end) - self.assertEqual(ival_M.asfreq('S', 'S'), ival_M_to_S_start) - self.assertEqual(ival_M.asfreq('S', 'E'), ival_M_to_S_end) - - self.assertEqual(ival_M.asfreq('M'), ival_M) - - def test_conv_weekly(self): - # frequency conversion tests: from Weekly Frequency - ival_W = Period(freq='W', year=2007, month=1, day=1) - - ival_WSUN = Period(freq='W', year=2007, month=1, day=7) - ival_WSAT = Period(freq='W-SAT', year=2007, month=1, day=6) - ival_WFRI = Period(freq='W-FRI', year=2007, month=1, day=5) - ival_WTHU = Period(freq='W-THU', year=2007, month=1, day=4) - ival_WWED = Period(freq='W-WED', year=2007, month=1, day=3) - ival_WTUE = Period(freq='W-TUE', year=2007, month=1, day=2) - ival_WMON = Period(freq='W-MON', year=2007, month=1, day=1) - - ival_WSUN_to_D_start = Period(freq='D', year=2007, month=1, day=1) - ival_WSUN_to_D_end = Period(freq='D', year=2007, month=1, day=7) - ival_WSAT_to_D_start = Period(freq='D', year=2006, month=12, day=31) - 
ival_WSAT_to_D_end = Period(freq='D', year=2007, month=1, day=6) - ival_WFRI_to_D_start = Period(freq='D', year=2006, month=12, day=30) - ival_WFRI_to_D_end = Period(freq='D', year=2007, month=1, day=5) - ival_WTHU_to_D_start = Period(freq='D', year=2006, month=12, day=29) - ival_WTHU_to_D_end = Period(freq='D', year=2007, month=1, day=4) - ival_WWED_to_D_start = Period(freq='D', year=2006, month=12, day=28) - ival_WWED_to_D_end = Period(freq='D', year=2007, month=1, day=3) - ival_WTUE_to_D_start = Period(freq='D', year=2006, month=12, day=27) - ival_WTUE_to_D_end = Period(freq='D', year=2007, month=1, day=2) - ival_WMON_to_D_start = Period(freq='D', year=2006, month=12, day=26) - ival_WMON_to_D_end = Period(freq='D', year=2007, month=1, day=1) - - ival_W_end_of_year = Period(freq='W', year=2007, month=12, day=31) - ival_W_end_of_quarter = Period(freq='W', year=2007, month=3, day=31) - ival_W_end_of_month = Period(freq='W', year=2007, month=1, day=31) - ival_W_to_A = Period(freq='A', year=2007) - ival_W_to_Q = Period(freq='Q', year=2007, quarter=1) - ival_W_to_M = Period(freq='M', year=2007, month=1) - - if Period(freq='D', year=2007, month=12, day=31).weekday == 6: - ival_W_to_A_end_of_year = Period(freq='A', year=2007) - else: - ival_W_to_A_end_of_year = Period(freq='A', year=2008) - - if Period(freq='D', year=2007, month=3, day=31).weekday == 6: - ival_W_to_Q_end_of_quarter = Period(freq='Q', year=2007, quarter=1) - else: - ival_W_to_Q_end_of_quarter = Period(freq='Q', year=2007, quarter=2) - - if Period(freq='D', year=2007, month=1, day=31).weekday == 6: - ival_W_to_M_end_of_month = Period(freq='M', year=2007, month=1) - else: - ival_W_to_M_end_of_month = Period(freq='M', year=2007, month=2) - - ival_W_to_B_start = Period(freq='B', year=2007, month=1, day=1) - ival_W_to_B_end = Period(freq='B', year=2007, month=1, day=5) - ival_W_to_D_start = Period(freq='D', year=2007, month=1, day=1) - ival_W_to_D_end = Period(freq='D', year=2007, month=1, day=7) - ival_W_to_H_start = Period(freq='H', year=2007, month=1, day=1, hour=0) - ival_W_to_H_end = Period(freq='H', year=2007, month=1, day=7, hour=23) - ival_W_to_T_start = Period(freq='Min', year=2007, month=1, day=1, - hour=0, minute=0) - ival_W_to_T_end = Period(freq='Min', year=2007, month=1, day=7, - hour=23, minute=59) - ival_W_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0, - minute=0, second=0) - ival_W_to_S_end = Period(freq='S', year=2007, month=1, day=7, hour=23, - minute=59, second=59) - - self.assertEqual(ival_W.asfreq('A'), ival_W_to_A) - self.assertEqual(ival_W_end_of_year.asfreq('A'), - ival_W_to_A_end_of_year) - self.assertEqual(ival_W.asfreq('Q'), ival_W_to_Q) - self.assertEqual(ival_W_end_of_quarter.asfreq('Q'), - ival_W_to_Q_end_of_quarter) - self.assertEqual(ival_W.asfreq('M'), ival_W_to_M) - self.assertEqual(ival_W_end_of_month.asfreq('M'), - ival_W_to_M_end_of_month) - - self.assertEqual(ival_W.asfreq('B', 'S'), ival_W_to_B_start) - self.assertEqual(ival_W.asfreq('B', 'E'), ival_W_to_B_end) - - self.assertEqual(ival_W.asfreq('D', 'S'), ival_W_to_D_start) - self.assertEqual(ival_W.asfreq('D', 'E'), ival_W_to_D_end) - - self.assertEqual(ival_WSUN.asfreq('D', 'S'), ival_WSUN_to_D_start) - self.assertEqual(ival_WSUN.asfreq('D', 'E'), ival_WSUN_to_D_end) - self.assertEqual(ival_WSAT.asfreq('D', 'S'), ival_WSAT_to_D_start) - self.assertEqual(ival_WSAT.asfreq('D', 'E'), ival_WSAT_to_D_end) - self.assertEqual(ival_WFRI.asfreq('D', 'S'), ival_WFRI_to_D_start) - self.assertEqual(ival_WFRI.asfreq('D', 'E'), 
ival_WFRI_to_D_end)
- self.assertEqual(ival_WTHU.asfreq('D', 'S'), ival_WTHU_to_D_start)
- self.assertEqual(ival_WTHU.asfreq('D', 'E'), ival_WTHU_to_D_end)
- self.assertEqual(ival_WWED.asfreq('D', 'S'), ival_WWED_to_D_start)
- self.assertEqual(ival_WWED.asfreq('D', 'E'), ival_WWED_to_D_end)
- self.assertEqual(ival_WTUE.asfreq('D', 'S'), ival_WTUE_to_D_start)
- self.assertEqual(ival_WTUE.asfreq('D', 'E'), ival_WTUE_to_D_end)
- self.assertEqual(ival_WMON.asfreq('D', 'S'), ival_WMON_to_D_start)
- self.assertEqual(ival_WMON.asfreq('D', 'E'), ival_WMON_to_D_end)
-
- self.assertEqual(ival_W.asfreq('H', 'S'), ival_W_to_H_start)
- self.assertEqual(ival_W.asfreq('H', 'E'), ival_W_to_H_end)
- self.assertEqual(ival_W.asfreq('Min', 'S'), ival_W_to_T_start)
- self.assertEqual(ival_W.asfreq('Min', 'E'), ival_W_to_T_end)
- self.assertEqual(ival_W.asfreq('S', 'S'), ival_W_to_S_start)
- self.assertEqual(ival_W.asfreq('S', 'E'), ival_W_to_S_end)
-
- self.assertEqual(ival_W.asfreq('W'), ival_W)
-
- msg = pd.tseries.frequencies._INVALID_FREQ_ERROR
- with self.assertRaisesRegexp(ValueError, msg):
- ival_W.asfreq('WK')
-
- def test_conv_weekly_legacy(self):
- # frequency conversion tests: from Weekly Frequency
- msg = pd.tseries.frequencies._INVALID_FREQ_ERROR
- with self.assertRaisesRegexp(ValueError, msg):
- Period(freq='WK', year=2007, month=1, day=1)
-
- with self.assertRaisesRegexp(ValueError, msg):
- Period(freq='WK-SAT', year=2007, month=1, day=6)
- with self.assertRaisesRegexp(ValueError, msg):
- Period(freq='WK-FRI', year=2007, month=1, day=5)
- with self.assertRaisesRegexp(ValueError, msg):
- Period(freq='WK-THU', year=2007, month=1, day=4)
- with self.assertRaisesRegexp(ValueError, msg):
- Period(freq='WK-WED', year=2007, month=1, day=3)
- with self.assertRaisesRegexp(ValueError, msg):
- Period(freq='WK-TUE', year=2007, month=1, day=2)
- with self.assertRaisesRegexp(ValueError, msg):
- Period(freq='WK-MON', year=2007, month=1, day=1)
-
- def test_conv_business(self):
- # frequency conversion tests: from Business Frequency
-
- ival_B = Period(freq='B', year=2007, month=1, day=1)
- ival_B_end_of_year = Period(freq='B', year=2007, month=12, day=31)
- ival_B_end_of_quarter = Period(freq='B', year=2007, month=3, day=30)
- ival_B_end_of_month = Period(freq='B', year=2007, month=1, day=31)
- ival_B_end_of_week = Period(freq='B', year=2007, month=1, day=5)
-
- ival_B_to_A = Period(freq='A', year=2007)
- ival_B_to_Q = Period(freq='Q', year=2007, quarter=1)
- ival_B_to_M = Period(freq='M', year=2007, month=1)
- ival_B_to_W = Period(freq='W', year=2007, month=1, day=7)
- ival_B_to_D = Period(freq='D', year=2007, month=1, day=1)
- ival_B_to_H_start = Period(freq='H', year=2007, month=1, day=1, hour=0)
- ival_B_to_H_end = Period(freq='H', year=2007, month=1, day=1, hour=23)
- ival_B_to_T_start = Period(freq='Min', year=2007, month=1, day=1,
- hour=0, minute=0)
- ival_B_to_T_end = Period(freq='Min', year=2007, month=1, day=1,
- hour=23, minute=59)
- ival_B_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0,
- minute=0, second=0)
- ival_B_to_S_end = Period(freq='S', year=2007, month=1, day=1, hour=23,
- minute=59, second=59)
-
- self.assertEqual(ival_B.asfreq('A'), ival_B_to_A)
- self.assertEqual(ival_B_end_of_year.asfreq('A'), ival_B_to_A)
- self.assertEqual(ival_B.asfreq('Q'), ival_B_to_Q)
- self.assertEqual(ival_B_end_of_quarter.asfreq('Q'), ival_B_to_Q)
- self.assertEqual(ival_B.asfreq('M'), ival_B_to_M)
- self.assertEqual(ival_B_end_of_month.asfreq('M'), ival_B_to_M)
- self.assertEqual(ival_B.asfreq('W'), ival_B_to_W)
- self.assertEqual(ival_B_end_of_week.asfreq('W'), ival_B_to_W)
-
- self.assertEqual(ival_B.asfreq('D'), ival_B_to_D)
-
- self.assertEqual(ival_B.asfreq('H', 'S'), ival_B_to_H_start)
- self.assertEqual(ival_B.asfreq('H', 'E'), ival_B_to_H_end)
- self.assertEqual(ival_B.asfreq('Min', 'S'), ival_B_to_T_start)
- self.assertEqual(ival_B.asfreq('Min', 'E'), ival_B_to_T_end)
- self.assertEqual(ival_B.asfreq('S', 'S'), ival_B_to_S_start)
- self.assertEqual(ival_B.asfreq('S', 'E'), ival_B_to_S_end)
-
- self.assertEqual(ival_B.asfreq('B'), ival_B)
-
- def test_conv_daily(self):
- # frequency conversion tests: from Daily Frequency
-
- ival_D = Period(freq='D', year=2007, month=1, day=1)
- ival_D_end_of_year = Period(freq='D', year=2007, month=12, day=31)
- ival_D_end_of_quarter = Period(freq='D', year=2007, month=3, day=31)
- ival_D_end_of_month = Period(freq='D', year=2007, month=1, day=31)
- ival_D_end_of_week = Period(freq='D', year=2007, month=1, day=7)
-
- ival_D_friday = Period(freq='D', year=2007, month=1, day=5)
- ival_D_saturday = Period(freq='D', year=2007, month=1, day=6)
- ival_D_sunday = Period(freq='D', year=2007, month=1, day=7)
-
- # TODO: unused?
- # ival_D_monday = Period(freq='D', year=2007, month=1, day=8)
-
- ival_B_friday = Period(freq='B', year=2007, month=1, day=5)
- ival_B_monday = Period(freq='B', year=2007, month=1, day=8)
-
- ival_D_to_A = Period(freq='A', year=2007)
-
- ival_Deoq_to_AJAN = Period(freq='A-JAN', year=2008)
- ival_Deoq_to_AJUN = Period(freq='A-JUN', year=2007)
- ival_Deoq_to_ADEC = Period(freq='A-DEC', year=2007)
-
- ival_D_to_QEJAN = Period(freq="Q-JAN", year=2007, quarter=4)
- ival_D_to_QEJUN = Period(freq="Q-JUN", year=2007, quarter=3)
- ival_D_to_QEDEC = Period(freq="Q-DEC", year=2007, quarter=1)
-
- ival_D_to_M = Period(freq='M', year=2007, month=1)
- ival_D_to_W = Period(freq='W', year=2007, month=1, day=7)
-
- ival_D_to_H_start = Period(freq='H', year=2007, month=1, day=1, hour=0)
- ival_D_to_H_end = Period(freq='H', year=2007, month=1, day=1, hour=23)
- ival_D_to_T_start = Period(freq='Min', year=2007, month=1, day=1,
- hour=0, minute=0)
- ival_D_to_T_end = Period(freq='Min', year=2007, month=1, day=1,
- hour=23, minute=59)
- ival_D_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0,
- minute=0, second=0)
- ival_D_to_S_end = Period(freq='S', year=2007, month=1, day=1, hour=23,
- minute=59, second=59)
-
- self.assertEqual(ival_D.asfreq('A'), ival_D_to_A)
-
- self.assertEqual(ival_D_end_of_quarter.asfreq('A-JAN'),
- ival_Deoq_to_AJAN)
- self.assertEqual(ival_D_end_of_quarter.asfreq('A-JUN'),
- ival_Deoq_to_AJUN)
- self.assertEqual(ival_D_end_of_quarter.asfreq('A-DEC'),
- ival_Deoq_to_ADEC)
-
- self.assertEqual(ival_D_end_of_year.asfreq('A'), ival_D_to_A)
- self.assertEqual(ival_D_end_of_quarter.asfreq('Q'), ival_D_to_QEDEC)
- self.assertEqual(ival_D.asfreq("Q-JAN"), ival_D_to_QEJAN)
- self.assertEqual(ival_D.asfreq("Q-JUN"), ival_D_to_QEJUN)
- self.assertEqual(ival_D.asfreq("Q-DEC"), ival_D_to_QEDEC)
- self.assertEqual(ival_D.asfreq('M'), ival_D_to_M)
- self.assertEqual(ival_D_end_of_month.asfreq('M'), ival_D_to_M)
- self.assertEqual(ival_D.asfreq('W'), ival_D_to_W)
- self.assertEqual(ival_D_end_of_week.asfreq('W'), ival_D_to_W)
-
- self.assertEqual(ival_D_friday.asfreq('B'), ival_B_friday)
- self.assertEqual(ival_D_saturday.asfreq('B', 'S'), ival_B_friday)
- self.assertEqual(ival_D_saturday.asfreq('B', 'E'), ival_B_monday)
- self.assertEqual(ival_D_sunday.asfreq('B', 'S'), ival_B_friday)
- self.assertEqual(ival_D_sunday.asfreq('B', 'E'), ival_B_monday)
-
- self.assertEqual(ival_D.asfreq('H', 'S'), ival_D_to_H_start)
- self.assertEqual(ival_D.asfreq('H', 'E'), ival_D_to_H_end)
- self.assertEqual(ival_D.asfreq('Min', 'S'), ival_D_to_T_start)
- self.assertEqual(ival_D.asfreq('Min', 'E'), ival_D_to_T_end)
- self.assertEqual(ival_D.asfreq('S', 'S'), ival_D_to_S_start)
- self.assertEqual(ival_D.asfreq('S', 'E'), ival_D_to_S_end)
-
- self.assertEqual(ival_D.asfreq('D'), ival_D)
-
- def test_conv_hourly(self):
- # frequency conversion tests: from Hourly Frequency
-
- ival_H = Period(freq='H', year=2007, month=1, day=1, hour=0)
- ival_H_end_of_year = Period(freq='H', year=2007, month=12, day=31,
- hour=23)
- ival_H_end_of_quarter = Period(freq='H', year=2007, month=3, day=31,
- hour=23)
- ival_H_end_of_month = Period(freq='H', year=2007, month=1, day=31,
- hour=23)
- ival_H_end_of_week = Period(freq='H', year=2007, month=1, day=7,
- hour=23)
- ival_H_end_of_day = Period(freq='H', year=2007, month=1, day=1,
- hour=23)
- ival_H_end_of_bus = Period(freq='H', year=2007, month=1, day=1,
- hour=23)
-
- ival_H_to_A = Period(freq='A', year=2007)
- ival_H_to_Q = Period(freq='Q', year=2007, quarter=1)
- ival_H_to_M = Period(freq='M', year=2007, month=1)
- ival_H_to_W = Period(freq='W', year=2007, month=1, day=7)
- ival_H_to_D = Period(freq='D', year=2007, month=1, day=1)
- ival_H_to_B = Period(freq='B', year=2007, month=1, day=1)
-
- ival_H_to_T_start = Period(freq='Min', year=2007, month=1, day=1,
- hour=0, minute=0)
- ival_H_to_T_end = Period(freq='Min', year=2007, month=1, day=1, hour=0,
- minute=59)
- ival_H_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0,
- minute=0, second=0)
- ival_H_to_S_end = Period(freq='S', year=2007, month=1, day=1, hour=0,
- minute=59, second=59)
-
- self.assertEqual(ival_H.asfreq('A'), ival_H_to_A)
- self.assertEqual(ival_H_end_of_year.asfreq('A'), ival_H_to_A)
- self.assertEqual(ival_H.asfreq('Q'), ival_H_to_Q)
- self.assertEqual(ival_H_end_of_quarter.asfreq('Q'), ival_H_to_Q)
- self.assertEqual(ival_H.asfreq('M'), ival_H_to_M)
- self.assertEqual(ival_H_end_of_month.asfreq('M'), ival_H_to_M)
- self.assertEqual(ival_H.asfreq('W'), ival_H_to_W)
- self.assertEqual(ival_H_end_of_week.asfreq('W'), ival_H_to_W)
- self.assertEqual(ival_H.asfreq('D'), ival_H_to_D)
- self.assertEqual(ival_H_end_of_day.asfreq('D'), ival_H_to_D)
- self.assertEqual(ival_H.asfreq('B'), ival_H_to_B)
- self.assertEqual(ival_H_end_of_bus.asfreq('B'), ival_H_to_B)
-
- self.assertEqual(ival_H.asfreq('Min', 'S'), ival_H_to_T_start)
- self.assertEqual(ival_H.asfreq('Min', 'E'), ival_H_to_T_end)
- self.assertEqual(ival_H.asfreq('S', 'S'), ival_H_to_S_start)
- self.assertEqual(ival_H.asfreq('S', 'E'), ival_H_to_S_end)
-
- self.assertEqual(ival_H.asfreq('H'), ival_H)
-
- def test_conv_minutely(self):
- # frequency conversion tests: from Minutely Frequency
-
- ival_T = Period(freq='Min', year=2007, month=1, day=1, hour=0,
- minute=0)
- ival_T_end_of_year = Period(freq='Min', year=2007, month=12, day=31,
- hour=23, minute=59)
- ival_T_end_of_quarter = Period(freq='Min', year=2007, month=3, day=31,
- hour=23, minute=59)
- ival_T_end_of_month = Period(freq='Min', year=2007, month=1, day=31,
- hour=23, minute=59)
- ival_T_end_of_week = Period(freq='Min', year=2007, month=1, day=7,
- hour=23, minute=59)
- ival_T_end_of_day = Period(freq='Min', year=2007, month=1, day=1,
- hour=23, minute=59)
- ival_T_end_of_bus = Period(freq='Min', year=2007, month=1, day=1,
- hour=23, minute=59)
- ival_T_end_of_hour = Period(freq='Min', year=2007, month=1, day=1,
- hour=0, minute=59)
-
- ival_T_to_A = Period(freq='A', year=2007)
- ival_T_to_Q = Period(freq='Q', year=2007, quarter=1)
- ival_T_to_M = Period(freq='M', year=2007, month=1)
- ival_T_to_W = Period(freq='W', year=2007, month=1, day=7)
- ival_T_to_D = Period(freq='D', year=2007, month=1, day=1)
- ival_T_to_B = Period(freq='B', year=2007, month=1, day=1)
- ival_T_to_H = Period(freq='H', year=2007, month=1, day=1, hour=0)
-
- ival_T_to_S_start = Period(freq='S', year=2007, month=1, day=1, hour=0,
- minute=0, second=0)
- ival_T_to_S_end = Period(freq='S', year=2007, month=1, day=1, hour=0,
- minute=0, second=59)
-
- self.assertEqual(ival_T.asfreq('A'), ival_T_to_A)
- self.assertEqual(ival_T_end_of_year.asfreq('A'), ival_T_to_A)
- self.assertEqual(ival_T.asfreq('Q'), ival_T_to_Q)
- self.assertEqual(ival_T_end_of_quarter.asfreq('Q'), ival_T_to_Q)
- self.assertEqual(ival_T.asfreq('M'), ival_T_to_M)
- self.assertEqual(ival_T_end_of_month.asfreq('M'), ival_T_to_M)
- self.assertEqual(ival_T.asfreq('W'), ival_T_to_W)
- self.assertEqual(ival_T_end_of_week.asfreq('W'), ival_T_to_W)
- self.assertEqual(ival_T.asfreq('D'), ival_T_to_D)
- self.assertEqual(ival_T_end_of_day.asfreq('D'), ival_T_to_D)
- self.assertEqual(ival_T.asfreq('B'), ival_T_to_B)
- self.assertEqual(ival_T_end_of_bus.asfreq('B'), ival_T_to_B)
- self.assertEqual(ival_T.asfreq('H'), ival_T_to_H)
- self.assertEqual(ival_T_end_of_hour.asfreq('H'), ival_T_to_H)
-
- self.assertEqual(ival_T.asfreq('S', 'S'), ival_T_to_S_start)
- self.assertEqual(ival_T.asfreq('S', 'E'), ival_T_to_S_end)
-
- self.assertEqual(ival_T.asfreq('Min'), ival_T)
-
- def test_conv_secondly(self):
- # frequency conversion tests: from Secondly Frequency
-
- ival_S = Period(freq='S', year=2007, month=1, day=1, hour=0, minute=0,
- second=0)
- ival_S_end_of_year = Period(freq='S', year=2007, month=12, day=31,
- hour=23, minute=59, second=59)
- ival_S_end_of_quarter = Period(freq='S', year=2007, month=3, day=31,
- hour=23, minute=59, second=59)
- ival_S_end_of_month = Period(freq='S', year=2007, month=1, day=31,
- hour=23, minute=59, second=59)
- ival_S_end_of_week = Period(freq='S', year=2007, month=1, day=7,
- hour=23, minute=59, second=59)
- ival_S_end_of_day = Period(freq='S', year=2007, month=1, day=1,
- hour=23, minute=59, second=59)
- ival_S_end_of_bus = Period(freq='S', year=2007, month=1, day=1,
- hour=23, minute=59, second=59)
- ival_S_end_of_hour = Period(freq='S', year=2007, month=1, day=1,
- hour=0, minute=59, second=59)
- ival_S_end_of_minute = Period(freq='S', year=2007, month=1, day=1,
- hour=0, minute=0, second=59)
-
- ival_S_to_A = Period(freq='A', year=2007)
- ival_S_to_Q = Period(freq='Q', year=2007, quarter=1)
- ival_S_to_M = Period(freq='M', year=2007, month=1)
- ival_S_to_W = Period(freq='W', year=2007, month=1, day=7)
- ival_S_to_D = Period(freq='D', year=2007, month=1, day=1)
- ival_S_to_B = Period(freq='B', year=2007, month=1, day=1)
- ival_S_to_H = Period(freq='H', year=2007, month=1, day=1, hour=0)
- ival_S_to_T = Period(freq='Min', year=2007, month=1, day=1, hour=0,
- minute=0)
-
- self.assertEqual(ival_S.asfreq('A'), ival_S_to_A)
- self.assertEqual(ival_S_end_of_year.asfreq('A'), ival_S_to_A)
- self.assertEqual(ival_S.asfreq('Q'), ival_S_to_Q)
- self.assertEqual(ival_S_end_of_quarter.asfreq('Q'), ival_S_to_Q)
- self.assertEqual(ival_S.asfreq('M'), ival_S_to_M)
- self.assertEqual(ival_S_end_of_month.asfreq('M'), ival_S_to_M)
- self.assertEqual(ival_S.asfreq('W'),
ival_S_to_W) - self.assertEqual(ival_S_end_of_week.asfreq('W'), ival_S_to_W) - self.assertEqual(ival_S.asfreq('D'), ival_S_to_D) - self.assertEqual(ival_S_end_of_day.asfreq('D'), ival_S_to_D) - self.assertEqual(ival_S.asfreq('B'), ival_S_to_B) - self.assertEqual(ival_S_end_of_bus.asfreq('B'), ival_S_to_B) - self.assertEqual(ival_S.asfreq('H'), ival_S_to_H) - self.assertEqual(ival_S_end_of_hour.asfreq('H'), ival_S_to_H) - self.assertEqual(ival_S.asfreq('Min'), ival_S_to_T) - self.assertEqual(ival_S_end_of_minute.asfreq('Min'), ival_S_to_T) - - self.assertEqual(ival_S.asfreq('S'), ival_S) - - def test_asfreq_mult(self): - # normal freq to mult freq - p = Period(freq='A', year=2007) - # ordinal will not change - for freq in ['3A', offsets.YearEnd(3)]: - result = p.asfreq(freq) - expected = Period('2007', freq='3A') - - self.assertEqual(result, expected) - self.assertEqual(result.ordinal, expected.ordinal) - self.assertEqual(result.freq, expected.freq) - # ordinal will not change - for freq in ['3A', offsets.YearEnd(3)]: - result = p.asfreq(freq, how='S') - expected = Period('2007', freq='3A') - - self.assertEqual(result, expected) - self.assertEqual(result.ordinal, expected.ordinal) - self.assertEqual(result.freq, expected.freq) - - # mult freq to normal freq - p = Period(freq='3A', year=2007) - # ordinal will change because how=E is the default - for freq in ['A', offsets.YearEnd()]: - result = p.asfreq(freq) - expected = Period('2009', freq='A') - - self.assertEqual(result, expected) - self.assertEqual(result.ordinal, expected.ordinal) - self.assertEqual(result.freq, expected.freq) - # ordinal will not change - for freq in ['A', offsets.YearEnd()]: - result = p.asfreq(freq, how='S') - expected = Period('2007', freq='A') - - self.assertEqual(result, expected) - self.assertEqual(result.ordinal, expected.ordinal) - self.assertEqual(result.freq, expected.freq) - - p = Period(freq='A', year=2007) - for freq in ['2M', offsets.MonthEnd(2)]: - result = p.asfreq(freq) - expected = Period('2007-12', freq='2M') - - self.assertEqual(result, expected) - self.assertEqual(result.ordinal, expected.ordinal) - self.assertEqual(result.freq, expected.freq) - for freq in ['2M', offsets.MonthEnd(2)]: - result = p.asfreq(freq, how='S') - expected = Period('2007-01', freq='2M') - - self.assertEqual(result, expected) - self.assertEqual(result.ordinal, expected.ordinal) - self.assertEqual(result.freq, expected.freq) - - p = Period(freq='3A', year=2007) - for freq in ['2M', offsets.MonthEnd(2)]: - result = p.asfreq(freq) - expected = Period('2009-12', freq='2M') - - self.assertEqual(result, expected) - self.assertEqual(result.ordinal, expected.ordinal) - self.assertEqual(result.freq, expected.freq) - for freq in ['2M', offsets.MonthEnd(2)]: - result = p.asfreq(freq, how='S') - expected = Period('2007-01', freq='2M') - - self.assertEqual(result, expected) - self.assertEqual(result.ordinal, expected.ordinal) - self.assertEqual(result.freq, expected.freq) - - def test_asfreq_combined(self): - # normal freq to combined freq - p = Period('2007', freq='H') - - # ordinal will not change - expected = Period('2007', freq='25H') - for freq, how in zip(['1D1H', '1H1D'], ['E', 'S']): - result = p.asfreq(freq, how=how) - self.assertEqual(result, expected) - self.assertEqual(result.ordinal, expected.ordinal) - self.assertEqual(result.freq, expected.freq) - - # combined freq to normal freq - p1 = Period(freq='1D1H', year=2007) - p2 = Period(freq='1H1D', year=2007) - - # ordinal will change because how=E is the default - result1 = 
p1.asfreq('H') - result2 = p2.asfreq('H') - expected = Period('2007-01-02', freq='H') - self.assertEqual(result1, expected) - self.assertEqual(result1.ordinal, expected.ordinal) - self.assertEqual(result1.freq, expected.freq) - self.assertEqual(result2, expected) - self.assertEqual(result2.ordinal, expected.ordinal) - self.assertEqual(result2.freq, expected.freq) - - # ordinal will not change - result1 = p1.asfreq('H', how='S') - result2 = p2.asfreq('H', how='S') - expected = Period('2007-01-01', freq='H') - self.assertEqual(result1, expected) - self.assertEqual(result1.ordinal, expected.ordinal) - self.assertEqual(result1.freq, expected.freq) - self.assertEqual(result2, expected) - self.assertEqual(result2.ordinal, expected.ordinal) - self.assertEqual(result2.freq, expected.freq) - - def test_is_leap_year(self): - # GH 13727 - for freq in ['A', 'M', 'D', 'H']: - p = Period('2000-01-01 00:00:00', freq=freq) - self.assertTrue(p.is_leap_year) - self.assertIsInstance(p.is_leap_year, bool) - - p = Period('1999-01-01 00:00:00', freq=freq) - self.assertFalse(p.is_leap_year) - - p = Period('2004-01-01 00:00:00', freq=freq) - self.assertTrue(p.is_leap_year) - - p = Period('2100-01-01 00:00:00', freq=freq) - self.assertFalse(p.is_leap_year) - - -class TestPeriodIndex(tm.TestCase): - def setUp(self): - pass - - def test_hash_error(self): - index = period_range('20010101', periods=10) - with tm.assertRaisesRegexp(TypeError, "unhashable type: %r" % - type(index).__name__): - hash(index) - - def test_make_time_series(self): - index = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009') - series = Series(1, index=index) - tm.assertIsInstance(series, Series) - - def test_constructor_use_start_freq(self): - # GH #1118 - p = Period('4/2/2012', freq='B') - index = PeriodIndex(start=p, periods=10) - expected = PeriodIndex(start='4/2/2012', periods=10, freq='B') - tm.assert_index_equal(index, expected) - - def test_constructor_field_arrays(self): - # GH #1264 - - years = np.arange(1990, 2010).repeat(4)[2:-2] - quarters = np.tile(np.arange(1, 5), 20)[2:-2] - - index = PeriodIndex(year=years, quarter=quarters, freq='Q-DEC') - expected = period_range('1990Q3', '2009Q2', freq='Q-DEC') - tm.assert_index_equal(index, expected) - - index2 = PeriodIndex(year=years, quarter=quarters, freq='2Q-DEC') - tm.assert_numpy_array_equal(index.asi8, index2.asi8) - - index = PeriodIndex(year=years, quarter=quarters) - tm.assert_index_equal(index, expected) - - years = [2007, 2007, 2007] - months = [1, 2] - self.assertRaises(ValueError, PeriodIndex, year=years, month=months, - freq='M') - self.assertRaises(ValueError, PeriodIndex, year=years, month=months, - freq='2M') - self.assertRaises(ValueError, PeriodIndex, year=years, month=months, - freq='M', start=Period('2007-01', freq='M')) - - years = [2007, 2007, 2007] - months = [1, 2, 3] - idx = PeriodIndex(year=years, month=months, freq='M') - exp = period_range('2007-01', periods=3, freq='M') - tm.assert_index_equal(idx, exp) - - def test_constructor_U(self): - # U was used as undefined period - self.assertRaises(ValueError, period_range, '2007-1-1', periods=500, - freq='X') - - def test_constructor_nano(self): - idx = period_range(start=Period(ordinal=1, freq='N'), - end=Period(ordinal=4, freq='N'), freq='N') - exp = PeriodIndex([Period(ordinal=1, freq='N'), - Period(ordinal=2, freq='N'), - Period(ordinal=3, freq='N'), - Period(ordinal=4, freq='N')], freq='N') - tm.assert_index_equal(idx, exp) - - def test_constructor_arrays_negative_year(self): - years = np.arange(1960, 
2000, dtype=np.int64).repeat(4)
- quarters = np.tile(np.array([1, 2, 3, 4], dtype=np.int64), 40)
-
- pindex = PeriodIndex(year=years, quarter=quarters)
-
- self.assert_numpy_array_equal(pindex.year, years)
- self.assert_numpy_array_equal(pindex.quarter, quarters)
-
- def test_constructor_invalid_quarters(self):
- self.assertRaises(ValueError, PeriodIndex, year=lrange(2000, 2004),
- quarter=lrange(4), freq='Q-DEC')
-
- def test_constructor_corner(self):
- self.assertRaises(ValueError, PeriodIndex, periods=10, freq='A')
-
- start = Period('2007', freq='A-JUN')
- end = Period('2010', freq='A-DEC')
- self.assertRaises(ValueError, PeriodIndex, start=start, end=end)
- self.assertRaises(ValueError, PeriodIndex, start=start)
- self.assertRaises(ValueError, PeriodIndex, end=end)
-
- result = period_range('2007-01', periods=10.5, freq='M')
- exp = period_range('2007-01', periods=10, freq='M')
- tm.assert_index_equal(result, exp)
-
- def test_constructor_fromarraylike(self):
- idx = period_range('2007-01', periods=20, freq='M')
-
- # values is an array of Period, thus can retrieve freq
- tm.assert_index_equal(PeriodIndex(idx.values), idx)
- tm.assert_index_equal(PeriodIndex(list(idx.values)), idx)
-
- self.assertRaises(ValueError, PeriodIndex, idx._values)
- self.assertRaises(ValueError, PeriodIndex, list(idx._values))
- self.assertRaises(ValueError, PeriodIndex,
- data=Period('2007', freq='A'))
-
- result = PeriodIndex(iter(idx))
- tm.assert_index_equal(result, idx)
-
- result = PeriodIndex(idx)
- tm.assert_index_equal(result, idx)
-
- result = PeriodIndex(idx, freq='M')
- tm.assert_index_equal(result, idx)
-
- result = PeriodIndex(idx, freq=offsets.MonthEnd())
- tm.assert_index_equal(result, idx)
- self.assertEqual(result.freq, 'M')
-
- result = PeriodIndex(idx, freq='2M')
- tm.assert_index_equal(result, idx.asfreq('2M'))
- self.assertEqual(result.freq, '2M')
-
- result = PeriodIndex(idx, freq=offsets.MonthEnd(2))
- tm.assert_index_equal(result, idx.asfreq('2M'))
- self.assertEqual(result.freq, '2M')
-
- result = PeriodIndex(idx, freq='D')
- exp = idx.asfreq('D', 'e')
- tm.assert_index_equal(result, exp)
-
- def test_constructor_datetime64arr(self):
- vals = np.arange(100000, 100000 + 10000, 100, dtype=np.int64)
- vals = vals.view(np.dtype('M8[us]'))
-
- self.assertRaises(ValueError, PeriodIndex, vals, freq='D')
-
- def test_constructor_dtype(self):
- # passing a period dtype should construct the index with that freq
- idx = PeriodIndex(['2013-01', '2013-03'], dtype='period[M]')
- exp = PeriodIndex(['2013-01', '2013-03'], freq='M')
- tm.assert_index_equal(idx, exp)
- self.assertEqual(idx.dtype, 'period[M]')
-
- idx = PeriodIndex(['2013-01-05', '2013-03-05'], dtype='period[3D]')
- exp = PeriodIndex(['2013-01-05', '2013-03-05'], freq='3D')
- tm.assert_index_equal(idx, exp)
- self.assertEqual(idx.dtype, 'period[3D]')
-
- # if we already have a freq and it's not the same, then asfreq
- # (not changed)
- idx = PeriodIndex(['2013-01-01', '2013-01-02'], freq='D')
-
- res = PeriodIndex(idx, dtype='period[M]')
- exp = PeriodIndex(['2013-01', '2013-01'], freq='M')
- tm.assert_index_equal(res, exp)
- self.assertEqual(res.dtype, 'period[M]')
-
- res = PeriodIndex(idx, freq='M')
- tm.assert_index_equal(res, exp)
- self.assertEqual(res.dtype, 'period[M]')
-
- msg = 'specified freq and dtype are different'
- with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg):
- PeriodIndex(['2011-01'], freq='M', dtype='period[D]')
-
- def test_constructor_empty(self):
- idx = pd.PeriodIndex([], freq='M')
- tm.assertIsInstance(idx, PeriodIndex)
-
self.assertEqual(len(idx), 0) - self.assertEqual(idx.freq, 'M') - - with tm.assertRaisesRegexp(ValueError, 'freq not specified'): - pd.PeriodIndex([]) - - def test_constructor_pi_nat(self): - idx = PeriodIndex([Period('2011-01', freq='M'), pd.NaT, - Period('2011-01', freq='M')]) - exp = PeriodIndex(['2011-01', 'NaT', '2011-01'], freq='M') - tm.assert_index_equal(idx, exp) - - idx = PeriodIndex(np.array([Period('2011-01', freq='M'), pd.NaT, - Period('2011-01', freq='M')])) - tm.assert_index_equal(idx, exp) - - idx = PeriodIndex([pd.NaT, pd.NaT, Period('2011-01', freq='M'), - Period('2011-01', freq='M')]) - exp = PeriodIndex(['NaT', 'NaT', '2011-01', '2011-01'], freq='M') - tm.assert_index_equal(idx, exp) - - idx = PeriodIndex(np.array([pd.NaT, pd.NaT, - Period('2011-01', freq='M'), - Period('2011-01', freq='M')])) - tm.assert_index_equal(idx, exp) - - idx = PeriodIndex([pd.NaT, pd.NaT, '2011-01', '2011-01'], freq='M') - tm.assert_index_equal(idx, exp) - - with tm.assertRaisesRegexp(ValueError, 'freq not specified'): - PeriodIndex([pd.NaT, pd.NaT]) - - with tm.assertRaisesRegexp(ValueError, 'freq not specified'): - PeriodIndex(np.array([pd.NaT, pd.NaT])) - - with tm.assertRaisesRegexp(ValueError, 'freq not specified'): - PeriodIndex(['NaT', 'NaT']) - - with tm.assertRaisesRegexp(ValueError, 'freq not specified'): - PeriodIndex(np.array(['NaT', 'NaT'])) - - def test_constructor_incompat_freq(self): - msg = "Input has different freq=D from PeriodIndex\\(freq=M\\)" - - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - PeriodIndex([Period('2011-01', freq='M'), pd.NaT, - Period('2011-01', freq='D')]) - - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - PeriodIndex(np.array([Period('2011-01', freq='M'), pd.NaT, - Period('2011-01', freq='D')])) - - # first element is pd.NaT - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - PeriodIndex([pd.NaT, Period('2011-01', freq='M'), - Period('2011-01', freq='D')]) - - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - PeriodIndex(np.array([pd.NaT, Period('2011-01', freq='M'), - Period('2011-01', freq='D')])) - - def test_constructor_mixed(self): - idx = PeriodIndex(['2011-01', pd.NaT, Period('2011-01', freq='M')]) - exp = PeriodIndex(['2011-01', 'NaT', '2011-01'], freq='M') - tm.assert_index_equal(idx, exp) - - idx = PeriodIndex(['NaT', pd.NaT, Period('2011-01', freq='M')]) - exp = PeriodIndex(['NaT', 'NaT', '2011-01'], freq='M') - tm.assert_index_equal(idx, exp) - - idx = PeriodIndex([Period('2011-01-01', freq='D'), pd.NaT, - '2012-01-01']) - exp = PeriodIndex(['2011-01-01', 'NaT', '2012-01-01'], freq='D') - tm.assert_index_equal(idx, exp) - - def test_constructor_simple_new(self): - idx = period_range('2007-01', name='p', periods=2, freq='M') - result = idx._simple_new(idx, 'p', freq=idx.freq) - tm.assert_index_equal(result, idx) - - result = idx._simple_new(idx.astype('i8'), 'p', freq=idx.freq) - tm.assert_index_equal(result, idx) - - result = idx._simple_new([pd.Period('2007-01', freq='M'), - pd.Period('2007-02', freq='M')], - 'p', freq=idx.freq) - self.assert_index_equal(result, idx) - - result = idx._simple_new(np.array([pd.Period('2007-01', freq='M'), - pd.Period('2007-02', freq='M')]), - 'p', freq=idx.freq) - self.assert_index_equal(result, idx) - - def test_constructor_simple_new_empty(self): - # GH13079 - idx = PeriodIndex([], freq='M', name='p') - result = idx._simple_new(idx, name='p', freq='M') - tm.assert_index_equal(result, idx) - - def test_constructor_simple_new_floats(self): - # 
GH13079
- for floats in [[1.1], np.array([1.1])]:
- with self.assertRaises(TypeError):
- pd.PeriodIndex._simple_new(floats, freq='M')
-
- def test_shallow_copy_empty(self):
-
- # GH13067
- idx = PeriodIndex([], freq='M')
- result = idx._shallow_copy()
- expected = idx
-
- tm.assert_index_equal(result, expected)
-
- def test_constructor_nat(self):
- self.assertRaises(ValueError, period_range, start='NaT',
- end='2011-01-01', freq='M')
- self.assertRaises(ValueError, period_range, start='2011-01-01',
- end='NaT', freq='M')
-
- def test_constructor_year_and_quarter(self):
- year = pd.Series([2001, 2002, 2003])
- quarter = year - 2000
- idx = PeriodIndex(year=year, quarter=quarter)
- strs = ['%dQ%d' % t for t in zip(quarter, year)]
- lops = list(map(Period, strs))
- p = PeriodIndex(lops)
- tm.assert_index_equal(p, idx)
-
- def test_constructor_freq_mult(self):
- # GH #7811
- for func in [PeriodIndex, period_range]:
- # must be the same, but for sure...
- pidx = func(start='2014-01', freq='2M', periods=4)
- expected = PeriodIndex(['2014-01', '2014-03',
- '2014-05', '2014-07'], freq='2M')
- tm.assert_index_equal(pidx, expected)
-
- pidx = func(start='2014-01-02', end='2014-01-15', freq='3D')
- expected = PeriodIndex(['2014-01-02', '2014-01-05',
- '2014-01-08', '2014-01-11',
- '2014-01-14'], freq='3D')
- tm.assert_index_equal(pidx, expected)
-
- pidx = func(end='2014-01-01 17:00', freq='4H', periods=3)
- expected = PeriodIndex(['2014-01-01 09:00', '2014-01-01 13:00',
- '2014-01-01 17:00'], freq='4H')
- tm.assert_index_equal(pidx, expected)
-
- msg = ('Frequency must be positive, because it'
- ' represents span: -1M')
- with tm.assertRaisesRegexp(ValueError, msg):
- PeriodIndex(['2011-01'], freq='-1M')
-
- msg = ('Frequency must be positive, because it'
- ' represents span: 0M')
- with tm.assertRaisesRegexp(ValueError, msg):
- PeriodIndex(['2011-01'], freq='0M')
-
- msg = ('Frequency must be positive, because it'
- ' represents span: 0M')
- with tm.assertRaisesRegexp(ValueError, msg):
- period_range('2011-01', periods=3, freq='0M')
-
- def test_constructor_freq_mult_dti_compat(self):
- import itertools
- mults = [1, 2, 3, 4, 5]
- freqs = ['A', 'M', 'D', 'T', 'S']
- for mult, freq in itertools.product(mults, freqs):
- freqstr = str(mult) + freq
- pidx = PeriodIndex(start='2014-04-01', freq=freqstr, periods=10)
- expected = date_range(start='2014-04-01', freq=freqstr,
- periods=10).to_period(freqstr)
- tm.assert_index_equal(pidx, expected)
-
- def test_constructor_freq_combined(self):
- for freq in ['1D1H', '1H1D']:
- pidx = PeriodIndex(['2016-01-01', '2016-01-02'], freq=freq)
- expected = PeriodIndex(['2016-01-01 00:00', '2016-01-02 00:00'],
- freq='25H')
- tm.assert_index_equal(pidx, expected)
- for freq, func in zip(['1D1H', '1H1D'], [PeriodIndex, period_range]):
- pidx = func(start='2016-01-01', periods=2, freq=freq)
- expected = PeriodIndex(['2016-01-01 00:00', '2016-01-02 01:00'],
- freq='25H')
- tm.assert_index_equal(pidx, expected)
-
- def test_dtype_str(self):
- pi = pd.PeriodIndex([], freq='M')
- self.assertEqual(pi.dtype_str, 'period[M]')
- self.assertEqual(pi.dtype_str, str(pi.dtype))
-
- pi = pd.PeriodIndex([], freq='3M')
- self.assertEqual(pi.dtype_str, 'period[3M]')
- self.assertEqual(pi.dtype_str, str(pi.dtype))
-
- def test_view_asi8(self):
- idx = pd.PeriodIndex([], freq='M')
-
- exp = np.array([], dtype=np.int64)
- tm.assert_numpy_array_equal(idx.view('i8'), exp)
- tm.assert_numpy_array_equal(idx.asi8, exp)
-
- idx = pd.PeriodIndex(['2011-01', pd.NaT], freq='M')
-
- exp = np.array([492, -9223372036854775808],
dtype=np.int64) - tm.assert_numpy_array_equal(idx.view('i8'), exp) - tm.assert_numpy_array_equal(idx.asi8, exp) - - exp = np.array([14975, -9223372036854775808], dtype=np.int64) - idx = pd.PeriodIndex(['2011-01-01', pd.NaT], freq='D') - tm.assert_numpy_array_equal(idx.view('i8'), exp) - tm.assert_numpy_array_equal(idx.asi8, exp) - - def test_values(self): - idx = pd.PeriodIndex([], freq='M') - - exp = np.array([], dtype=np.object) - tm.assert_numpy_array_equal(idx.values, exp) - tm.assert_numpy_array_equal(idx.get_values(), exp) - exp = np.array([], dtype=np.int64) - tm.assert_numpy_array_equal(idx._values, exp) - - idx = pd.PeriodIndex(['2011-01', pd.NaT], freq='M') - - exp = np.array([pd.Period('2011-01', freq='M'), pd.NaT], dtype=object) - tm.assert_numpy_array_equal(idx.values, exp) - tm.assert_numpy_array_equal(idx.get_values(), exp) - exp = np.array([492, -9223372036854775808], dtype=np.int64) - tm.assert_numpy_array_equal(idx._values, exp) - - idx = pd.PeriodIndex(['2011-01-01', pd.NaT], freq='D') - - exp = np.array([pd.Period('2011-01-01', freq='D'), pd.NaT], - dtype=object) - tm.assert_numpy_array_equal(idx.values, exp) - tm.assert_numpy_array_equal(idx.get_values(), exp) - exp = np.array([14975, -9223372036854775808], dtype=np.int64) - tm.assert_numpy_array_equal(idx._values, exp) - - def test_asobject_like(self): - idx = pd.PeriodIndex([], freq='M') - - exp = np.array([], dtype=object) - tm.assert_numpy_array_equal(idx.asobject.values, exp) - tm.assert_numpy_array_equal(idx._mpl_repr(), exp) - - idx = pd.PeriodIndex(['2011-01', pd.NaT], freq='M') - - exp = np.array([pd.Period('2011-01', freq='M'), pd.NaT], dtype=object) - tm.assert_numpy_array_equal(idx.asobject.values, exp) - tm.assert_numpy_array_equal(idx._mpl_repr(), exp) - - exp = np.array([pd.Period('2011-01-01', freq='D'), pd.NaT], - dtype=object) - idx = pd.PeriodIndex(['2011-01-01', pd.NaT], freq='D') - tm.assert_numpy_array_equal(idx.asobject.values, exp) - tm.assert_numpy_array_equal(idx._mpl_repr(), exp) - - def test_is_(self): - create_index = lambda: PeriodIndex(freq='A', start='1/1/2001', - end='12/1/2009') - index = create_index() - self.assertEqual(index.is_(index), True) - self.assertEqual(index.is_(create_index()), False) - self.assertEqual(index.is_(index.view()), True) - self.assertEqual( - index.is_(index.view().view().view().view().view()), True) - self.assertEqual(index.view().is_(index), True) - ind2 = index.view() - index.name = "Apple" - self.assertEqual(ind2.is_(index), True) - self.assertEqual(index.is_(index[:]), False) - self.assertEqual(index.is_(index.asfreq('M')), False) - self.assertEqual(index.is_(index.asfreq('A')), False) - self.assertEqual(index.is_(index - 2), False) - self.assertEqual(index.is_(index - 0), False) - - def test_comp_period(self): - idx = period_range('2007-01', periods=20, freq='M') - - result = idx < idx[10] - exp = idx.values < idx.values[10] - self.assert_numpy_array_equal(result, exp) - - def test_getitem_index(self): - idx = period_range('2007-01', periods=10, freq='M', name='x') - - result = idx[[1, 3, 5]] - exp = pd.PeriodIndex(['2007-02', '2007-04', '2007-06'], - freq='M', name='x') - tm.assert_index_equal(result, exp) - - result = idx[[True, True, False, False, False, - True, True, False, False, False]] - exp = pd.PeriodIndex(['2007-01', '2007-02', '2007-06', '2007-07'], - freq='M', name='x') - tm.assert_index_equal(result, exp) - - def test_getitem_partial(self): - rng = period_range('2007-01', periods=50, freq='M') - ts = Series(np.random.randn(len(rng)), rng) - 
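- # Partial string indexing: a year or year-month label selects every
- # period it contains, so ts['2008'] returns all twelve monthly points
- # of 2008, while a label before the index start ('2006') raises
- # KeyError.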
- self.assertRaises(KeyError, ts.__getitem__, '2006') - - result = ts['2008'] - self.assertTrue((result.index.year == 2008).all()) - - result = ts['2008':'2009'] - self.assertEqual(len(result), 24) - - result = ts['2008-1':'2009-12'] - self.assertEqual(len(result), 24) - - result = ts['2008Q1':'2009Q4'] - self.assertEqual(len(result), 24) - - result = ts[:'2009'] - self.assertEqual(len(result), 36) - - result = ts['2009':] - self.assertEqual(len(result), 50 - 24) - - exp = result - result = ts[24:] - tm.assert_series_equal(exp, result) - - ts = ts[10:].append(ts[10:]) - self.assertRaisesRegexp(KeyError, - "left slice bound for non-unique " - "label: '2008'", - ts.__getitem__, slice('2008', '2009')) - - def test_getitem_datetime(self): - rng = period_range(start='2012-01-01', periods=10, freq='W-MON') - ts = Series(lrange(len(rng)), index=rng) - - dt1 = datetime(2011, 10, 2) - dt4 = datetime(2012, 4, 20) - - rs = ts[dt1:dt4] - tm.assert_series_equal(rs, ts) - - def test_getitem_nat(self): - idx = pd.PeriodIndex(['2011-01', 'NaT', '2011-02'], freq='M') - self.assertEqual(idx[0], pd.Period('2011-01', freq='M')) - self.assertIs(idx[1], tslib.NaT) - - s = pd.Series([0, 1, 2], index=idx) - self.assertEqual(s[pd.NaT], 1) - - s = pd.Series(idx, index=idx) - self.assertEqual(s[pd.Period('2011-01', freq='M')], - pd.Period('2011-01', freq='M')) - self.assertIs(s[pd.NaT], tslib.NaT) - - def test_getitem_list_periods(self): - # GH 7710 - rng = period_range(start='2012-01-01', periods=10, freq='D') - ts = Series(lrange(len(rng)), index=rng) - exp = ts.iloc[[1]] - tm.assert_series_equal(ts[[Period('2012-01-02', freq='D')]], exp) - - def test_slice_with_negative_step(self): - ts = Series(np.arange(20), - period_range('2014-01', periods=20, freq='M')) - SLC = pd.IndexSlice - - def assert_slices_equivalent(l_slc, i_slc): - tm.assert_series_equal(ts[l_slc], ts.iloc[i_slc]) - tm.assert_series_equal(ts.loc[l_slc], ts.iloc[i_slc]) - tm.assert_series_equal(ts.loc[l_slc], ts.iloc[i_slc]) - - assert_slices_equivalent(SLC[Period('2014-10')::-1], SLC[9::-1]) - assert_slices_equivalent(SLC['2014-10'::-1], SLC[9::-1]) - - assert_slices_equivalent(SLC[:Period('2014-10'):-1], SLC[:8:-1]) - assert_slices_equivalent(SLC[:'2014-10':-1], SLC[:8:-1]) - - assert_slices_equivalent(SLC['2015-02':'2014-10':-1], SLC[13:8:-1]) - assert_slices_equivalent(SLC[Period('2015-02'):Period('2014-10'):-1], - SLC[13:8:-1]) - assert_slices_equivalent(SLC['2015-02':Period('2014-10'):-1], - SLC[13:8:-1]) - assert_slices_equivalent(SLC[Period('2015-02'):'2014-10':-1], - SLC[13:8:-1]) - - assert_slices_equivalent(SLC['2014-10':'2015-02':-1], SLC[:0]) - - def test_slice_with_zero_step_raises(self): - ts = Series(np.arange(20), - period_range('2014-01', periods=20, freq='M')) - self.assertRaisesRegexp(ValueError, 'slice step cannot be zero', - lambda: ts[::0]) - self.assertRaisesRegexp(ValueError, 'slice step cannot be zero', - lambda: ts.loc[::0]) - self.assertRaisesRegexp(ValueError, 'slice step cannot be zero', - lambda: ts.loc[::0]) - - def test_contains(self): - rng = period_range('2007-01', freq='M', periods=10) - - self.assertTrue(Period('2007-01', freq='M') in rng) - self.assertFalse(Period('2007-01', freq='D') in rng) - self.assertFalse(Period('2007-01', freq='2M') in rng) - - def test_contains_nat(self): - # GH13582 - idx = period_range('2007-01', freq='M', periods=10) - self.assertFalse(pd.NaT in idx) - self.assertFalse(None in idx) - self.assertFalse(float('nan') in idx) - self.assertFalse(np.nan in idx) - - idx = 
pd.PeriodIndex(['2011-01', 'NaT', '2011-02'], freq='M') - self.assertTrue(pd.NaT in idx) - self.assertTrue(None in idx) - self.assertTrue(float('nan') in idx) - self.assertTrue(np.nan in idx) - - def test_sub(self): - rng = period_range('2007-01', periods=50) - - result = rng - 5 - exp = rng + (-5) - tm.assert_index_equal(result, exp) - - def test_periods_number_check(self): - with tm.assertRaises(ValueError): - period_range('2011-1-1', '2012-1-1', 'B') - - def test_tolist(self): - index = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009') - rs = index.tolist() - [tm.assertIsInstance(x, Period) for x in rs] - - recon = PeriodIndex(rs) - tm.assert_index_equal(index, recon) - - def test_to_timestamp(self): - index = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009') - series = Series(1, index=index, name='foo') - - exp_index = date_range('1/1/2001', end='12/31/2009', freq='A-DEC') - result = series.to_timestamp(how='end') - tm.assert_index_equal(result.index, exp_index) - self.assertEqual(result.name, 'foo') - - exp_index = date_range('1/1/2001', end='1/1/2009', freq='AS-JAN') - result = series.to_timestamp(how='start') - tm.assert_index_equal(result.index, exp_index) - - def _get_with_delta(delta, freq='A-DEC'): - return date_range(to_datetime('1/1/2001') + delta, - to_datetime('12/31/2009') + delta, freq=freq) - - delta = timedelta(hours=23) - result = series.to_timestamp('H', 'end') - exp_index = _get_with_delta(delta) - tm.assert_index_equal(result.index, exp_index) - - delta = timedelta(hours=23, minutes=59) - result = series.to_timestamp('T', 'end') - exp_index = _get_with_delta(delta) - tm.assert_index_equal(result.index, exp_index) - - result = series.to_timestamp('S', 'end') - delta = timedelta(hours=23, minutes=59, seconds=59) - exp_index = _get_with_delta(delta) - tm.assert_index_equal(result.index, exp_index) - - index = PeriodIndex(freq='H', start='1/1/2001', end='1/2/2001') - series = Series(1, index=index, name='foo') - - exp_index = date_range('1/1/2001 00:59:59', end='1/2/2001 00:59:59', - freq='H') - result = series.to_timestamp(how='end') - tm.assert_index_equal(result.index, exp_index) - self.assertEqual(result.name, 'foo') - - def test_to_timestamp_quarterly_bug(self): - years = np.arange(1960, 2000).repeat(4) - quarters = np.tile(lrange(1, 5), 40) - - pindex = PeriodIndex(year=years, quarter=quarters) - - stamps = pindex.to_timestamp('D', 'end') - expected = DatetimeIndex([x.to_timestamp('D', 'end') for x in pindex]) - tm.assert_index_equal(stamps, expected) - - def test_to_timestamp_preserve_name(self): - index = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009', - name='foo') - self.assertEqual(index.name, 'foo') - - conv = index.to_timestamp('D') - self.assertEqual(conv.name, 'foo') - - def test_to_timestamp_repr_is_code(self): - zs = [Timestamp('99-04-17 00:00:00', tz='UTC'), - Timestamp('2001-04-17 00:00:00', tz='UTC'), - Timestamp('2001-04-17 00:00:00', tz='America/Los_Angeles'), - Timestamp('2001-04-17 00:00:00', tz=None)] - for z in zs: - self.assertEqual(eval(repr(z)), z) - - def test_to_timestamp_pi_nat(self): - # GH 7228 - index = PeriodIndex(['NaT', '2011-01', '2011-02'], freq='M', - name='idx') - - result = index.to_timestamp('D') - expected = DatetimeIndex([pd.NaT, datetime(2011, 1, 1), - datetime(2011, 2, 1)], name='idx') - tm.assert_index_equal(result, expected) - self.assertEqual(result.name, 'idx') - - result2 = result.to_period(freq='M') - tm.assert_index_equal(result2, index) - self.assertEqual(result2.name, 'idx') - - result3 = 
-        result3 = result.to_period(freq='3M')
-        exp = PeriodIndex(['NaT', '2011-01', '2011-02'], freq='3M', name='idx')
-        self.assert_index_equal(result3, exp)
-        self.assertEqual(result3.freqstr, '3M')
-
-        msg = ('Frequency must be positive, because it'
-               ' represents span: -2A')
-        with tm.assertRaisesRegexp(ValueError, msg):
-            result.to_period(freq='-2A')
-
-    def test_to_timestamp_pi_mult(self):
-        idx = PeriodIndex(['2011-01', 'NaT', '2011-02'], freq='2M', name='idx')
-        result = idx.to_timestamp()
-        expected = DatetimeIndex(
-            ['2011-01-01', 'NaT', '2011-02-01'], name='idx')
-        self.assert_index_equal(result, expected)
-        result = idx.to_timestamp(how='E')
-        expected = DatetimeIndex(
-            ['2011-02-28', 'NaT', '2011-03-31'], name='idx')
-        self.assert_index_equal(result, expected)
-
-    def test_to_timestamp_pi_combined(self):
-        idx = PeriodIndex(start='2011', periods=2, freq='1D1H', name='idx')
-        result = idx.to_timestamp()
-        expected = DatetimeIndex(
-            ['2011-01-01 00:00', '2011-01-02 01:00'], name='idx')
-        self.assert_index_equal(result, expected)
-        result = idx.to_timestamp(how='E')
-        expected = DatetimeIndex(
-            ['2011-01-02 00:59:59', '2011-01-03 01:59:59'], name='idx')
-        self.assert_index_equal(result, expected)
-        result = idx.to_timestamp(how='E', freq='H')
-        expected = DatetimeIndex(
-            ['2011-01-02 00:00', '2011-01-03 01:00'], name='idx')
-        self.assert_index_equal(result, expected)
-
-    def test_to_timestamp_to_period_astype(self):
-        idx = DatetimeIndex([pd.NaT, '2011-01-01', '2011-02-01'], name='idx')
-
-        res = idx.astype('period[M]')
-        exp = PeriodIndex(['NaT', '2011-01', '2011-02'], freq='M', name='idx')
-        tm.assert_index_equal(res, exp)
-
-        res = idx.astype('period[3M]')
-        exp = PeriodIndex(['NaT', '2011-01', '2011-02'], freq='3M', name='idx')
-        self.assert_index_equal(res, exp)
-
-    def test_start_time(self):
-        index = PeriodIndex(freq='M', start='2016-01-01', end='2016-05-31')
-        expected_index = date_range('2016-01-01', end='2016-05-31', freq='MS')
-        tm.assert_index_equal(index.start_time, expected_index)
-
-    def test_end_time(self):
-        index = PeriodIndex(freq='M', start='2016-01-01', end='2016-05-31')
-        expected_index = date_range('2016-01-01', end='2016-05-31', freq='M')
-        tm.assert_index_equal(index.end_time, expected_index)
-
-    def test_as_frame_columns(self):
-        rng = period_range('1/1/2000', periods=5)
-        df = DataFrame(randn(10, 5), columns=rng)
-
-        ts = df[rng[0]]
-        tm.assert_series_equal(ts, df.iloc[:, 0])
-
-        # GH # 1211
-        repr(df)
-
-        ts = df['1/1/2000']
-        tm.assert_series_equal(ts, df.iloc[:, 0])
-
-    def test_indexing(self):
-
-        # GH 4390, iat incorrectly indexing
-        index = period_range('1/1/2001', periods=10)
-        s = Series(randn(10), index=index)
-        expected = s[index[0]]
-        result = s.iat[0]
-        self.assertEqual(expected, result)
-
-    def test_frame_setitem(self):
-        rng = period_range('1/1/2000', periods=5, name='index')
-        df = DataFrame(randn(5, 3), index=rng)
-
-        df['Index'] = rng
-        rs = Index(df['Index'])
-        tm.assert_index_equal(rs, rng, check_names=False)
-        self.assertEqual(rs.name, 'Index')
-        self.assertEqual(rng.name, 'index')
-
-        rs = df.reset_index().set_index('index')
-        tm.assertIsInstance(rs.index, PeriodIndex)
-        tm.assert_index_equal(rs.index, rng)
-
-    def test_period_set_index_reindex(self):
-        # GH 6631
-        df = DataFrame(np.random.random(6))
-        idx1 = period_range('2011/01/01', periods=6, freq='M')
-        idx2 = period_range('2013', periods=6, freq='A')
-
-        df = df.set_index(idx1)
-        tm.assert_index_equal(df.index, idx1)
-        df = df.set_index(idx2)
-        tm.assert_index_equal(df.index, idx2)
-
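The `to_timestamp`/`to_period`/`astype` tests above all pin the same round trip: `to_timestamp(how=...)` anchors each period at its start or end, and `astype('period[M]')` is the index-level equivalent of `to_period('M')`, with `NaT` surviving both directions. A small standalone sketch, assuming a contemporaneous pandas (the exact end-of-period resolution differs across versions):

```python
import pandas as pd

pi = pd.PeriodIndex(['2011-01', '2011-02'], freq='M')

print(pi.to_timestamp(how='start'))  # 2011-01-01, 2011-02-01
print(pi.to_timestamp(how='end'))    # last instant of each month

# Round trip back to periods; NaT survives the conversion.
dti = pd.DatetimeIndex(['2011-01-01', pd.NaT])
print(dti.to_period('M'))            # PeriodIndex(['2011-01', 'NaT'], freq='M')
print(dti.astype('period[M]'))       # same result via astype
```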
-    def test_frame_to_time_stamp(self):
-        K = 5
-        index = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009')
-        df = DataFrame(randn(len(index), K), index=index)
-        df['mix'] = 'a'
-
-        exp_index = date_range('1/1/2001', end='12/31/2009', freq='A-DEC')
-        result = df.to_timestamp('D', 'end')
-        tm.assert_index_equal(result.index, exp_index)
-        tm.assert_numpy_array_equal(result.values, df.values)
-
-        exp_index = date_range('1/1/2001', end='1/1/2009', freq='AS-JAN')
-        result = df.to_timestamp('D', 'start')
-        tm.assert_index_equal(result.index, exp_index)
-
-        def _get_with_delta(delta, freq='A-DEC'):
-            return date_range(to_datetime('1/1/2001') + delta,
-                              to_datetime('12/31/2009') + delta, freq=freq)
-
-        delta = timedelta(hours=23)
-        result = df.to_timestamp('H', 'end')
-        exp_index = _get_with_delta(delta)
-        tm.assert_index_equal(result.index, exp_index)
-
-        delta = timedelta(hours=23, minutes=59)
-        result = df.to_timestamp('T', 'end')
-        exp_index = _get_with_delta(delta)
-        tm.assert_index_equal(result.index, exp_index)
-
-        result = df.to_timestamp('S', 'end')
-        delta = timedelta(hours=23, minutes=59, seconds=59)
-        exp_index = _get_with_delta(delta)
-        tm.assert_index_equal(result.index, exp_index)
-
-        # columns
-        df = df.T
-
-        exp_index = date_range('1/1/2001', end='12/31/2009', freq='A-DEC')
-        result = df.to_timestamp('D', 'end', axis=1)
-        tm.assert_index_equal(result.columns, exp_index)
-        tm.assert_numpy_array_equal(result.values, df.values)
-
-        exp_index = date_range('1/1/2001', end='1/1/2009', freq='AS-JAN')
-        result = df.to_timestamp('D', 'start', axis=1)
-        tm.assert_index_equal(result.columns, exp_index)
-
-        delta = timedelta(hours=23)
-        result = df.to_timestamp('H', 'end', axis=1)
-        exp_index = _get_with_delta(delta)
-        tm.assert_index_equal(result.columns, exp_index)
-
-        delta = timedelta(hours=23, minutes=59)
-        result = df.to_timestamp('T', 'end', axis=1)
-        exp_index = _get_with_delta(delta)
-        tm.assert_index_equal(result.columns, exp_index)
-
-        result = df.to_timestamp('S', 'end', axis=1)
-        delta = timedelta(hours=23, minutes=59, seconds=59)
-        exp_index = _get_with_delta(delta)
-        tm.assert_index_equal(result.columns, exp_index)
-
-        # invalid axis
-        tm.assertRaisesRegexp(ValueError, 'axis', df.to_timestamp, axis=2)
-
-        result1 = df.to_timestamp('5t', axis=1)
-        result2 = df.to_timestamp('t', axis=1)
-        expected = pd.date_range('2001-01-01', '2009-01-01', freq='AS')
-        self.assertTrue(isinstance(result1.columns, DatetimeIndex))
-        self.assertTrue(isinstance(result2.columns, DatetimeIndex))
-        self.assert_numpy_array_equal(result1.columns.asi8, expected.asi8)
-        self.assert_numpy_array_equal(result2.columns.asi8, expected.asi8)
-        # PeriodIndex.to_timestamp always uses 'infer'
-        self.assertEqual(result1.columns.freqstr, 'AS-JAN')
-        self.assertEqual(result2.columns.freqstr, 'AS-JAN')
-
-    def test_index_duplicate_periods(self):
-        # monotonic
-        idx = PeriodIndex([2000, 2007, 2007, 2009, 2009], freq='A-JUN')
-        ts = Series(np.random.randn(len(idx)), index=idx)
-
-        result = ts[2007]
-        expected = ts[1:3]
-        tm.assert_series_equal(result, expected)
-        result[:] = 1
-        self.assertTrue((ts[1:3] == 1).all())
-
-        # not monotonic
-        idx = PeriodIndex([2000, 2007, 2007, 2009, 2007], freq='A-JUN')
-        ts = Series(np.random.randn(len(idx)), index=idx)
-
-        result = ts[2007]
-        expected = ts[idx == 2007]
-        tm.assert_series_equal(result, expected)
-    def test_index_unique(self):
-        idx = PeriodIndex([2000, 2007, 2007, 2009, 2009], freq='A-JUN')
-        expected = PeriodIndex([2000, 2007, 2009], freq='A-JUN')
-        self.assert_index_equal(idx.unique(), expected)
-        self.assertEqual(idx.nunique(), 3)
-
-        idx = PeriodIndex([2000, 2007, 2007, 2009, 2007], freq='A-JUN',
-                          tz='US/Eastern')
-        expected = PeriodIndex([2000, 2007, 2009], freq='A-JUN',
-                               tz='US/Eastern')
-        self.assert_index_equal(idx.unique(), expected)
-        self.assertEqual(idx.nunique(), 3)
-
-    def test_constructor(self):
-        pi = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009')
-        self.assertEqual(len(pi), 9)
-
-        pi = PeriodIndex(freq='Q', start='1/1/2001', end='12/1/2009')
-        self.assertEqual(len(pi), 4 * 9)
-
-        pi = PeriodIndex(freq='M', start='1/1/2001', end='12/1/2009')
-        self.assertEqual(len(pi), 12 * 9)
-
-        pi = PeriodIndex(freq='D', start='1/1/2001', end='12/31/2009')
-        self.assertEqual(len(pi), 365 * 9 + 2)
-
-        pi = PeriodIndex(freq='B', start='1/1/2001', end='12/31/2009')
-        self.assertEqual(len(pi), 261 * 9)
-
-        pi = PeriodIndex(freq='H', start='1/1/2001', end='12/31/2001 23:00')
-        self.assertEqual(len(pi), 365 * 24)
-
-        pi = PeriodIndex(freq='Min', start='1/1/2001', end='1/1/2001 23:59')
-        self.assertEqual(len(pi), 24 * 60)
-
-        pi = PeriodIndex(freq='S', start='1/1/2001', end='1/1/2001 23:59:59')
-        self.assertEqual(len(pi), 24 * 60 * 60)
-
-        start = Period('02-Apr-2005', 'B')
-        i1 = PeriodIndex(start=start, periods=20)
-        self.assertEqual(len(i1), 20)
-        self.assertEqual(i1.freq, start.freq)
-        self.assertEqual(i1[0], start)
-
-        end_intv = Period('2006-12-31', 'W')
-        i1 = PeriodIndex(end=end_intv, periods=10)
-        self.assertEqual(len(i1), 10)
-        self.assertEqual(i1.freq, end_intv.freq)
-        self.assertEqual(i1[-1], end_intv)
-
-        end_intv = Period('2006-12-31', '1w')
-        i2 = PeriodIndex(end=end_intv, periods=10)
-        self.assertEqual(len(i1), len(i2))
-        self.assertTrue((i1 == i2).all())
-        self.assertEqual(i1.freq, i2.freq)
-
-        end_intv = Period('2006-12-31', ('w', 1))
-        i2 = PeriodIndex(end=end_intv, periods=10)
-        self.assertEqual(len(i1), len(i2))
-        self.assertTrue((i1 == i2).all())
-        self.assertEqual(i1.freq, i2.freq)
-
-        try:
-            PeriodIndex(start=start, end=end_intv)
-            raise AssertionError('Cannot allow mixed freq for start and end')
-        except ValueError:
-            pass
-
-        end_intv = Period('2005-05-01', 'B')
-        i1 = PeriodIndex(start=start, end=end_intv)
-
-        try:
-            PeriodIndex(start=start)
-            raise AssertionError(
-                'Must specify periods if missing start or end')
-        except ValueError:
-            pass
-
-        # infer freq from first element
-        i2 = PeriodIndex([end_intv, Period('2005-05-05', 'B')])
-        self.assertEqual(len(i2), 2)
-        self.assertEqual(i2[0], end_intv)
-
-        i2 = PeriodIndex(np.array([end_intv, Period('2005-05-05', 'B')]))
-        self.assertEqual(len(i2), 2)
-        self.assertEqual(i2[0], end_intv)
-
-        # Mixed freq should fail
-        vals = [end_intv, Period('2006-12-31', 'w')]
-        self.assertRaises(ValueError, PeriodIndex, vals)
-        vals = np.array(vals)
-        self.assertRaises(ValueError, PeriodIndex, vals)
-
-    def test_numpy_repeat(self):
-        index = period_range('20010101', periods=2)
-        expected = PeriodIndex([Period('2001-01-01'), Period('2001-01-01'),
-                                Period('2001-01-02'), Period('2001-01-02')])
-
-        tm.assert_index_equal(np.repeat(index, 2), expected)
-
-        msg = "the 'axis' parameter is not supported"
-        tm.assertRaisesRegexp(ValueError, msg, np.repeat, index, 2, axis=1)
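The constructor test encodes some easy-to-check calendar arithmetic: nine years of annual periods is 9, quarterly is 36, monthly is 108, and daily over 2001-2009 is `365 * 9 + 2` because 2004 and 2008 are leap years; mixing frequencies between the endpoints is rejected. A sketch using `period_range`, the public analogue of the `PeriodIndex(start=..., end=...)` form used in these tests (my own illustration, era-dependent):

```python
import pandas as pd

print(len(pd.period_range('2001-01-01', '2009-12-01', freq='A')))  # 9
print(len(pd.period_range('2001-01-01', '2009-12-01', freq='Q')))  # 36
print(len(pd.period_range('2001-01-01', '2009-12-31', freq='D')))  # 3287 == 365 * 9 + 2

# Endpoints with different frequencies are rejected.
try:
    pd.period_range(start=pd.Period('2005-04-02', 'B'),
                    end=pd.Period('2006-12-31', 'W'))
except ValueError as err:
    print('ValueError:', err)
```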
-    def test_shift(self):
-        pi1 = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009')
-        pi2 = PeriodIndex(freq='A', start='1/1/2002', end='12/1/2010')
-
-        tm.assert_index_equal(pi1.shift(0), pi1)
-
-        self.assertEqual(len(pi1), len(pi2))
-        self.assert_index_equal(pi1.shift(1), pi2)
-
-        pi1 = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009')
-        pi2 = PeriodIndex(freq='A', start='1/1/2000', end='12/1/2008')
-        self.assertEqual(len(pi1), len(pi2))
-        self.assert_index_equal(pi1.shift(-1), pi2)
-
-        pi1 = PeriodIndex(freq='M', start='1/1/2001', end='12/1/2009')
-        pi2 = PeriodIndex(freq='M', start='2/1/2001', end='1/1/2010')
-        self.assertEqual(len(pi1), len(pi2))
-        self.assert_index_equal(pi1.shift(1), pi2)
-
-        pi1 = PeriodIndex(freq='M', start='1/1/2001', end='12/1/2009')
-        pi2 = PeriodIndex(freq='M', start='12/1/2000', end='11/1/2009')
-        self.assertEqual(len(pi1), len(pi2))
-        self.assert_index_equal(pi1.shift(-1), pi2)
-
-        pi1 = PeriodIndex(freq='D', start='1/1/2001', end='12/1/2009')
-        pi2 = PeriodIndex(freq='D', start='1/2/2001', end='12/2/2009')
-        self.assertEqual(len(pi1), len(pi2))
-        self.assert_index_equal(pi1.shift(1), pi2)
-
-        pi1 = PeriodIndex(freq='D', start='1/1/2001', end='12/1/2009')
-        pi2 = PeriodIndex(freq='D', start='12/31/2000', end='11/30/2009')
-        self.assertEqual(len(pi1), len(pi2))
-        self.assert_index_equal(pi1.shift(-1), pi2)
-
-    def test_shift_nat(self):
-        idx = PeriodIndex(['2011-01', '2011-02', 'NaT',
-                           '2011-04'], freq='M', name='idx')
-        result = idx.shift(1)
-        expected = PeriodIndex(['2011-02', '2011-03', 'NaT',
-                                '2011-05'], freq='M', name='idx')
-        tm.assert_index_equal(result, expected)
-        self.assertEqual(result.name, expected.name)
-
-    def test_shift_ndarray(self):
-        idx = PeriodIndex(['2011-01', '2011-02', 'NaT',
-                           '2011-04'], freq='M', name='idx')
-        result = idx.shift(np.array([1, 2, 3, 4]))
-        expected = PeriodIndex(['2011-02', '2011-04', 'NaT',
-                                '2011-08'], freq='M', name='idx')
-        tm.assert_index_equal(result, expected)
-
-        idx = PeriodIndex(['2011-01', '2011-02', 'NaT',
-                           '2011-04'], freq='M', name='idx')
-        result = idx.shift(np.array([1, -2, 3, -4]))
-        expected = PeriodIndex(['2011-02', '2010-12', 'NaT',
-                                '2010-12'], freq='M', name='idx')
-        tm.assert_index_equal(result, expected)
-
-    def test_asfreq(self):
-        pi1 = PeriodIndex(freq='A', start='1/1/2001', end='1/1/2001')
-        pi2 = PeriodIndex(freq='Q', start='1/1/2001', end='1/1/2001')
-        pi3 = PeriodIndex(freq='M', start='1/1/2001', end='1/1/2001')
-        pi4 = PeriodIndex(freq='D', start='1/1/2001', end='1/1/2001')
-        pi5 = PeriodIndex(freq='H', start='1/1/2001', end='1/1/2001 00:00')
-        pi6 = PeriodIndex(freq='Min', start='1/1/2001', end='1/1/2001 00:00')
-        pi7 = PeriodIndex(freq='S', start='1/1/2001', end='1/1/2001 00:00:00')
-
-        self.assertEqual(pi1.asfreq('Q', 'S'), pi2)
-        self.assertEqual(pi1.asfreq('Q', 's'), pi2)
-        self.assertEqual(pi1.asfreq('M', 'start'), pi3)
-        self.assertEqual(pi1.asfreq('D', 'StarT'), pi4)
-        self.assertEqual(pi1.asfreq('H', 'beGIN'), pi5)
-        self.assertEqual(pi1.asfreq('Min', 'S'), pi6)
-        self.assertEqual(pi1.asfreq('S', 'S'), pi7)
-
-        self.assertEqual(pi2.asfreq('A', 'S'), pi1)
-        self.assertEqual(pi2.asfreq('M', 'S'), pi3)
-        self.assertEqual(pi2.asfreq('D', 'S'), pi4)
-        self.assertEqual(pi2.asfreq('H', 'S'), pi5)
-        self.assertEqual(pi2.asfreq('Min', 'S'), pi6)
-        self.assertEqual(pi2.asfreq('S', 'S'), pi7)
-
-        self.assertEqual(pi3.asfreq('A', 'S'), pi1)
-        self.assertEqual(pi3.asfreq('Q', 'S'), pi2)
-        self.assertEqual(pi3.asfreq('D', 'S'), pi4)
-        self.assertEqual(pi3.asfreq('H', 'S'), pi5)
-        self.assertEqual(pi3.asfreq('Min', 'S'), pi6)
-        self.assertEqual(pi3.asfreq('S', 'S'), pi7)
-
-        self.assertEqual(pi4.asfreq('A', 'S'), pi1)
-        self.assertEqual(pi4.asfreq('Q', 'S'), pi2)
-        self.assertEqual(pi4.asfreq('M', 'S'), pi3)
-        self.assertEqual(pi4.asfreq('H', 'S'), pi5)
-        self.assertEqual(pi4.asfreq('Min', 'S'), pi6)
-        self.assertEqual(pi4.asfreq('S', 'S'), pi7)
-
-        self.assertEqual(pi5.asfreq('A', 'S'), pi1)
-        self.assertEqual(pi5.asfreq('Q', 'S'), pi2)
-        self.assertEqual(pi5.asfreq('M', 'S'), pi3)
-        self.assertEqual(pi5.asfreq('D', 'S'), pi4)
-        self.assertEqual(pi5.asfreq('Min', 'S'), pi6)
-        self.assertEqual(pi5.asfreq('S', 'S'), pi7)
-
-        self.assertEqual(pi6.asfreq('A', 'S'), pi1)
-        self.assertEqual(pi6.asfreq('Q', 'S'), pi2)
-        self.assertEqual(pi6.asfreq('M', 'S'), pi3)
-        self.assertEqual(pi6.asfreq('D', 'S'), pi4)
-        self.assertEqual(pi6.asfreq('H', 'S'), pi5)
-        self.assertEqual(pi6.asfreq('S', 'S'), pi7)
-
-        self.assertEqual(pi7.asfreq('A', 'S'), pi1)
-        self.assertEqual(pi7.asfreq('Q', 'S'), pi2)
-        self.assertEqual(pi7.asfreq('M', 'S'), pi3)
-        self.assertEqual(pi7.asfreq('D', 'S'), pi4)
-        self.assertEqual(pi7.asfreq('H', 'S'), pi5)
-        self.assertEqual(pi7.asfreq('Min', 'S'), pi6)
-
-        self.assertRaises(ValueError, pi7.asfreq, 'T', 'foo')
-        result1 = pi1.asfreq('3M')
-        result2 = pi1.asfreq('M')
-        expected = PeriodIndex(freq='M', start='2001-12', end='2001-12')
-        self.assert_numpy_array_equal(result1.asi8, expected.asi8)
-        self.assertEqual(result1.freqstr, '3M')
-        self.assert_numpy_array_equal(result2.asi8, expected.asi8)
-        self.assertEqual(result2.freqstr, 'M')
-
-    def test_asfreq_nat(self):
-        idx = PeriodIndex(['2011-01', '2011-02', 'NaT', '2011-04'], freq='M')
-        result = idx.asfreq(freq='Q')
-        expected = PeriodIndex(['2011Q1', '2011Q1', 'NaT', '2011Q2'], freq='Q')
-        tm.assert_index_equal(result, expected)
-
-    def test_asfreq_mult_pi(self):
-        pi = PeriodIndex(['2001-01', '2001-02', 'NaT', '2001-03'], freq='2M')
-
-        for freq in ['D', '3D']:
-            result = pi.asfreq(freq)
-            exp = PeriodIndex(['2001-02-28', '2001-03-31', 'NaT',
-                               '2001-04-30'], freq=freq)
-            self.assert_index_equal(result, exp)
-            self.assertEqual(result.freq, exp.freq)
-
-            result = pi.asfreq(freq, how='S')
-            exp = PeriodIndex(['2001-01-01', '2001-02-01', 'NaT',
-                               '2001-03-01'], freq=freq)
-            self.assert_index_equal(result, exp)
-            self.assertEqual(result.freq, exp.freq)
-
-    def test_asfreq_combined_pi(self):
-        pi = pd.PeriodIndex(['2001-01-01 00:00', '2001-01-02 02:00', 'NaT'],
-                            freq='H')
-        exp = PeriodIndex(['2001-01-01 00:00', '2001-01-02 02:00', 'NaT'],
-                          freq='25H')
-        for freq, how in zip(['1D1H', '1H1D'], ['S', 'E']):
-            result = pi.asfreq(freq, how=how)
-            self.assert_index_equal(result, exp)
-            self.assertEqual(result.freq, exp.freq)
-
-        for freq in ['1D1H', '1H1D']:
-            pi = pd.PeriodIndex(['2001-01-01 00:00', '2001-01-02 02:00',
-                                 'NaT'], freq=freq)
-            result = pi.asfreq('H')
-            exp = PeriodIndex(['2001-01-02 00:00', '2001-01-03 02:00', 'NaT'],
-                              freq='H')
-            self.assert_index_equal(result, exp)
-            self.assertEqual(result.freq, exp.freq)
-
-            pi = pd.PeriodIndex(['2001-01-01 00:00', '2001-01-02 02:00',
-                                 'NaT'], freq=freq)
-            result = pi.asfreq('H', how='S')
-            exp = PeriodIndex(['2001-01-01 00:00', '2001-01-02 02:00', 'NaT'],
-                              freq='H')
-            self.assert_index_equal(result, exp)
-            self.assertEqual(result.freq, exp.freq)
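The `asfreq` matrix above is easier to read with the rule in mind: `how='S'` maps a period to the first sub-period it contains and `how='E'` (the default) to the last, with the spelling case-insensitive ('s', 'start', 'beGIN' all normalize to the start convention), and `NaT` passing through unchanged. A short sketch (standalone illustration):

```python
import pandas as pd

p = pd.Period('2001', freq='A')
print(p.asfreq('M', how='S'))  # Period('2001-01', 'M')
print(p.asfreq('M', how='E'))  # Period('2001-12', 'M')

# On an index, NaT is preserved by the conversion.
pi = pd.PeriodIndex(['2011-01', 'NaT'], freq='M')
print(pi.asfreq('Q'))          # PeriodIndex(['2011Q1', 'NaT'], freq='Q-DEC')
```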
-    def test_period_index_length(self):
-        pi = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2009')
-        self.assertEqual(len(pi), 9)
-
-        pi = PeriodIndex(freq='Q', start='1/1/2001', end='12/1/2009')
-        self.assertEqual(len(pi), 4 * 9)
-
-        pi = PeriodIndex(freq='M', start='1/1/2001', end='12/1/2009')
-        self.assertEqual(len(pi), 12 * 9)
-
-        start = Period('02-Apr-2005', 'B')
-        i1 = PeriodIndex(start=start, periods=20)
-        self.assertEqual(len(i1), 20)
-        self.assertEqual(i1.freq, start.freq)
-        self.assertEqual(i1[0], start)
-
-        end_intv = Period('2006-12-31', 'W')
-        i1 = PeriodIndex(end=end_intv, periods=10)
-        self.assertEqual(len(i1), 10)
-        self.assertEqual(i1.freq, end_intv.freq)
-        self.assertEqual(i1[-1], end_intv)
-
-        end_intv = Period('2006-12-31', '1w')
-        i2 = PeriodIndex(end=end_intv, periods=10)
-        self.assertEqual(len(i1), len(i2))
-        self.assertTrue((i1 == i2).all())
-        self.assertEqual(i1.freq, i2.freq)
-
-        end_intv = Period('2006-12-31', ('w', 1))
-        i2 = PeriodIndex(end=end_intv, periods=10)
-        self.assertEqual(len(i1), len(i2))
-        self.assertTrue((i1 == i2).all())
-        self.assertEqual(i1.freq, i2.freq)
-
-        try:
-            PeriodIndex(start=start, end=end_intv)
-            raise AssertionError('Cannot allow mixed freq for start and end')
-        except ValueError:
-            pass
-
-        end_intv = Period('2005-05-01', 'B')
-        i1 = PeriodIndex(start=start, end=end_intv)
-
-        try:
-            PeriodIndex(start=start)
-            raise AssertionError(
-                'Must specify periods if missing start or end')
-        except ValueError:
-            pass
-
-        # infer freq from first element
-        i2 = PeriodIndex([end_intv, Period('2005-05-05', 'B')])
-        self.assertEqual(len(i2), 2)
-        self.assertEqual(i2[0], end_intv)
-
-        i2 = PeriodIndex(np.array([end_intv, Period('2005-05-05', 'B')]))
-        self.assertEqual(len(i2), 2)
-        self.assertEqual(i2[0], end_intv)
-
-        # Mixed freq should fail
-        vals = [end_intv, Period('2006-12-31', 'w')]
-        self.assertRaises(ValueError, PeriodIndex, vals)
-        vals = np.array(vals)
-        self.assertRaises(ValueError, PeriodIndex, vals)
-
-    def test_frame_index_to_string(self):
-        index = PeriodIndex(['2011-1', '2011-2', '2011-3'], freq='M')
-        frame = DataFrame(np.random.randn(3, 4), index=index)
-
-        # it works!
-        frame.to_string()
-
-    def test_asfreq_ts(self):
-        index = PeriodIndex(freq='A', start='1/1/2001', end='12/31/2010')
-        ts = Series(np.random.randn(len(index)), index=index)
-        df = DataFrame(np.random.randn(len(index), 3), index=index)
-
-        result = ts.asfreq('D', how='end')
-        df_result = df.asfreq('D', how='end')
-        exp_index = index.asfreq('D', how='end')
-        self.assertEqual(len(result), len(ts))
-        tm.assert_index_equal(result.index, exp_index)
-        tm.assert_index_equal(df_result.index, exp_index)
-
-        result = ts.asfreq('D', how='start')
-        self.assertEqual(len(result), len(ts))
-        tm.assert_index_equal(result.index, index.asfreq('D', how='start'))
-
-    def test_badinput(self):
-        self.assertRaises(ValueError, Period, '-2000', 'A')
-        self.assertRaises(tslib.DateParseError, Period, '0', 'A')
-        self.assertRaises(tslib.DateParseError, Period, '1/1/-2000', 'A')
-
-    def test_negative_ordinals(self):
-        Period(ordinal=-1000, freq='A')
-        Period(ordinal=0, freq='A')
-
-        idx1 = PeriodIndex(ordinal=[-1, 0, 1], freq='A')
-        idx2 = PeriodIndex(ordinal=np.array([-1, 0, 1]), freq='A')
-        tm.assert_index_equal(idx1, idx2)
-
-    def test_dti_to_period(self):
-        dti = DatetimeIndex(start='1/1/2005', end='12/1/2005', freq='M')
-        pi1 = dti.to_period()
-        pi2 = dti.to_period(freq='D')
-        pi3 = dti.to_period(freq='3D')
-
-        self.assertEqual(pi1[0], Period('Jan 2005', freq='M'))
-        self.assertEqual(pi2[0], Period('1/31/2005', freq='D'))
-        self.assertEqual(pi3[0], Period('1/31/2005', freq='3D'))
-
-        self.assertEqual(pi1[-1], Period('Nov 2005', freq='M'))
-        self.assertEqual(pi2[-1], Period('11/30/2005', freq='D'))
-        self.assertEqual(pi3[-1], Period('11/30/2005', freq='3D'))
-
-        tm.assert_index_equal(pi1, period_range('1/1/2005', '11/1/2005',
-                                                freq='M'))
-        tm.assert_index_equal(pi2, period_range('1/1/2005', '11/1/2005',
-                                                freq='M').asfreq('D'))
-        tm.assert_index_equal(pi3, period_range('1/1/2005', '11/1/2005',
-                                                freq='M').asfreq('3D'))
-    def test_pindex_slice_index(self):
-        pi = PeriodIndex(start='1/1/10', end='12/31/12', freq='M')
-        s = Series(np.random.rand(len(pi)), index=pi)
-        res = s['2010']
-        exp = s[0:12]
-        tm.assert_series_equal(res, exp)
-        res = s['2011']
-        exp = s[12:24]
-        tm.assert_series_equal(res, exp)
-
-    def test_getitem_day(self):
-        # GH 6716
-        # Confirm DatetimeIndex and PeriodIndex works identically
-        didx = DatetimeIndex(start='2013/01/01', freq='D', periods=400)
-        pidx = PeriodIndex(start='2013/01/01', freq='D', periods=400)
-
-        for idx in [didx, pidx]:
-            # getitem against index should raise ValueError
-            values = ['2014', '2013/02', '2013/01/02', '2013/02/01 9H',
-                      '2013/02/01 09:00']
-            for v in values:
-
-                if _np_version_under1p9:
-                    with tm.assertRaises(ValueError):
-                        idx[v]
-                else:
-                    # GH7116
-                    # these show deprecations as we are trying
-                    # to slice with non-integer indexers
-                    # with tm.assertRaises(IndexError):
-                    #     idx[v]
-                    continue
-
-            s = Series(np.random.rand(len(idx)), index=idx)
-            tm.assert_series_equal(s['2013/01'], s[0:31])
-            tm.assert_series_equal(s['2013/02'], s[31:59])
-            tm.assert_series_equal(s['2014'], s[365:])
-
-            invalid = ['2013/02/01 9H', '2013/02/01 09:00']
-            for v in invalid:
-                with tm.assertRaises(KeyError):
-                    s[v]
-
-    def test_range_slice_day(self):
-        # GH 6716
-        didx = DatetimeIndex(start='2013/01/01', freq='D', periods=400)
-        pidx = PeriodIndex(start='2013/01/01', freq='D', periods=400)
-
-        # changed to TypeError in 1.12
-        # https://github.com/numpy/numpy/pull/6271
-        exc = IndexError if _np_version_under1p12 else TypeError
-
-        for idx in [didx, pidx]:
-            # slices against index should raise IndexError
-            values = ['2014', '2013/02', '2013/01/02', '2013/02/01 9H',
-                      '2013/02/01 09:00']
-            for v in values:
-                with tm.assertRaises(exc):
-                    idx[v:]
-
-            s = Series(np.random.rand(len(idx)), index=idx)
-
-            tm.assert_series_equal(s['2013/01/02':], s[1:])
-            tm.assert_series_equal(s['2013/01/02':'2013/01/05'], s[1:5])
-            tm.assert_series_equal(s['2013/02':], s[31:])
-            tm.assert_series_equal(s['2014':], s[365:])
-
-            invalid = ['2013/02/01 9H', '2013/02/01 09:00']
-            for v in invalid:
-                with tm.assertRaises(exc):
-                    idx[v:]
-
-    def test_getitem_seconds(self):
-        # GH 6716
-        didx = DatetimeIndex(start='2013/01/01 09:00:00', freq='S',
-                             periods=4000)
-        pidx = PeriodIndex(start='2013/01/01 09:00:00', freq='S', periods=4000)
-
-        for idx in [didx, pidx]:
-            # getitem against index should raise ValueError
-            values = ['2014', '2013/02', '2013/01/02', '2013/02/01 9H',
-                      '2013/02/01 09:00']
-            for v in values:
-                if _np_version_under1p9:
-                    with tm.assertRaises(ValueError):
-                        idx[v]
-                else:
-                    # GH7116
-                    # these show deprecations as we are trying
-                    # to slice with non-integer indexers
-                    # with tm.assertRaises(IndexError):
-                    #     idx[v]
-                    continue
-
-            s = Series(np.random.rand(len(idx)), index=idx)
-            tm.assert_series_equal(s['2013/01/01 10:00'], s[3600:3660])
-            tm.assert_series_equal(s['2013/01/01 9H'], s[:3600])
-            for d in ['2013/01/01', '2013/01', '2013']:
-                tm.assert_series_equal(s[d], s)
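The getitem/range-slice tests encode the resolution rule behind string indexing: a string coarser than the index frequency selects the whole span it covers, string slices are endpoint-inclusive, and a string below the index resolution has no match and raises. A condensed standalone sketch of that distinction:

```python
import numpy as np
import pandas as pd

# Second-resolution index: coarser strings select a span.
pidx = pd.period_range('2013-01-01 09:00:00', periods=4000, freq='S')
s = pd.Series(np.arange(4000), index=pidx)
print(len(s['2013-01-01 10:00']))                     # 60, the whole minute
print(len(s['2013-01-01 09:05':'2013-01-01 09:10']))  # 360, endpoints inclusive

# Daily index: a sub-daily string is finer than the index and raises.
daily = pd.Series(np.arange(400),
                  index=pd.period_range('2013-01-01', periods=400, freq='D'))
try:
    daily['2013-02-01 09:00']
except KeyError as err:
    print('KeyError:', err)
```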
-    def test_range_slice_seconds(self):
-        # GH 6716
-        didx = DatetimeIndex(start='2013/01/01 09:00:00', freq='S',
-                             periods=4000)
-        pidx = PeriodIndex(start='2013/01/01 09:00:00', freq='S', periods=4000)
-
-        # changed to TypeError in 1.12
-        # https://github.com/numpy/numpy/pull/6271
-        exc = IndexError if _np_version_under1p12 else TypeError
-
-        for idx in [didx, pidx]:
-            # slices against index should raise IndexError
-            values = ['2014', '2013/02', '2013/01/02', '2013/02/01 9H',
-                      '2013/02/01 09:00']
-            for v in values:
-                with tm.assertRaises(exc):
-                    idx[v:]
-
-            s = Series(np.random.rand(len(idx)), index=idx)
-
-            tm.assert_series_equal(s['2013/01/01 09:05':'2013/01/01 09:10'],
-                                   s[300:660])
-            tm.assert_series_equal(s['2013/01/01 10:00':'2013/01/01 10:05'],
-                                   s[3600:3960])
-            tm.assert_series_equal(s['2013/01/01 10H':], s[3600:])
-            tm.assert_series_equal(s[:'2013/01/01 09:30'], s[:1860])
-            for d in ['2013/01/01', '2013/01', '2013']:
-                tm.assert_series_equal(s[d:], s)
-
-    def test_range_slice_outofbounds(self):
-        # GH 5407
-        didx = DatetimeIndex(start='2013/10/01', freq='D', periods=10)
-        pidx = PeriodIndex(start='2013/10/01', freq='D', periods=10)
-
-        for idx in [didx, pidx]:
-            df = DataFrame(dict(units=[100 + i for i in range(10)]), index=idx)
-            empty = DataFrame(index=idx.__class__([], freq='D'),
-                              columns=['units'])
-            empty['units'] = empty['units'].astype('int64')
-
-            tm.assert_frame_equal(df['2013/09/01':'2013/09/30'], empty)
-            tm.assert_frame_equal(df['2013/09/30':'2013/10/02'], df.iloc[:2])
-            tm.assert_frame_equal(df['2013/10/01':'2013/10/02'], df.iloc[:2])
-            tm.assert_frame_equal(df['2013/10/02':'2013/09/30'], empty)
-            tm.assert_frame_equal(df['2013/10/15':'2013/10/17'], empty)
-            tm.assert_frame_equal(df['2013-06':'2013-09'], empty)
-            tm.assert_frame_equal(df['2013-11':'2013-12'], empty)
-
-    def test_astype_asfreq(self):
-        pi1 = PeriodIndex(['2011-01-01', '2011-02-01', '2011-03-01'], freq='D')
-        exp = PeriodIndex(['2011-01', '2011-02', '2011-03'], freq='M')
-        tm.assert_index_equal(pi1.asfreq('M'), exp)
-        tm.assert_index_equal(pi1.astype('period[M]'), exp)
-
-        exp = PeriodIndex(['2011-01', '2011-02', '2011-03'], freq='3M')
-        tm.assert_index_equal(pi1.asfreq('3M'), exp)
-        tm.assert_index_equal(pi1.astype('period[3M]'), exp)
-
-    def test_pindex_fieldaccessor_nat(self):
-        idx = PeriodIndex(['2011-01', '2011-02', 'NaT',
-                           '2012-03', '2012-04'], freq='D')
-
-        exp = np.array([2011, 2011, -1, 2012, 2012], dtype=np.int64)
-        self.assert_numpy_array_equal(idx.year, exp)
-        exp = np.array([1, 2, -1, 3, 4], dtype=np.int64)
-        self.assert_numpy_array_equal(idx.month, exp)
-
-    def test_pindex_qaccess(self):
-        pi = PeriodIndex(['2Q05', '3Q05', '4Q05', '1Q06', '2Q06'], freq='Q')
-        s = Series(np.random.rand(len(pi)), index=pi).cumsum()
-        # Todo: fix these accessors!
-        self.assertEqual(s['05Q4'], s[2])
-
-    def test_period_dt64_round_trip(self):
-        dti = date_range('1/1/2000', '1/7/2002', freq='B')
-        pi = dti.to_period()
-        tm.assert_index_equal(pi.to_timestamp(), dti)
-
-        dti = date_range('1/1/2000', '1/7/2002', freq='B')
-        pi = dti.to_period(freq='H')
-        tm.assert_index_equal(pi.to_timestamp(), dti)
-
-    def test_period_astype_to_timestamp(self):
-        pi = pd.PeriodIndex(['2011-01', '2011-02', '2011-03'], freq='M')
-
-        exp = pd.DatetimeIndex(['2011-01-01', '2011-02-01', '2011-03-01'])
-        tm.assert_index_equal(pi.astype('datetime64[ns]'), exp)
-
-        exp = pd.DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31'])
-        tm.assert_index_equal(pi.astype('datetime64[ns]', how='end'), exp)
-
-        exp = pd.DatetimeIndex(['2011-01-01', '2011-02-01', '2011-03-01'],
-                               tz='US/Eastern')
-        res = pi.astype('datetime64[ns, US/Eastern]')
-        tm.assert_index_equal(pi.astype('datetime64[ns, US/Eastern]'), exp)
-
-        exp = pd.DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31'],
-                               tz='US/Eastern')
-        res = pi.astype('datetime64[ns, US/Eastern]', how='end')
-        tm.assert_index_equal(res, exp)
-
-    def test_to_period_quarterly(self):
-        # make sure we can make the round trip
-        for month in MONTHS:
-            freq = 'Q-%s' % month
-            rng = period_range('1989Q3', '1991Q3', freq=freq)
-            stamps = rng.to_timestamp()
-            result = stamps.to_period(freq)
-            tm.assert_index_equal(rng, result)
-
-    def test_to_period_quarterlyish(self):
-        offsets = ['BQ', 'QS', 'BQS']
-        for off in offsets:
-            rng = date_range('01-Jan-2012', periods=8, freq=off)
-            prng = rng.to_period()
-            self.assertEqual(prng.freq, 'Q-DEC')
-
-    def test_to_period_annualish(self):
-        offsets = ['BA', 'AS', 'BAS']
-        for off in offsets:
-            rng = date_range('01-Jan-2012', periods=8, freq=off)
-            prng = rng.to_period()
-            self.assertEqual(prng.freq, 'A-DEC')
-
-    def test_to_period_monthish(self):
-        offsets = ['MS', 'BM']
-        for off in offsets:
-            rng = date_range('01-Jan-2012', periods=8, freq=off)
-            prng = rng.to_period()
-            self.assertEqual(prng.freq, 'M')
-
-        rng = date_range('01-Jan-2012', periods=8, freq='M')
-        prng = rng.to_period()
-        self.assertEqual(prng.freq, 'M')
-
-        msg = pd.tseries.frequencies._INVALID_FREQ_ERROR
-        with self.assertRaisesRegexp(ValueError, msg):
-            date_range('01-Jan-2012', periods=8, freq='EOM')
-
-    def test_multiples(self):
-        result1 = Period('1989', freq='2A')
-        result2 = Period('1989', freq='A')
-        self.assertEqual(result1.ordinal, result2.ordinal)
-        self.assertEqual(result1.freqstr, '2A-DEC')
-        self.assertEqual(result2.freqstr, 'A-DEC')
-        self.assertEqual(result1.freq, offsets.YearEnd(2))
-        self.assertEqual(result2.freq, offsets.YearEnd())
-
-        self.assertEqual((result1 + 1).ordinal, result1.ordinal + 2)
-        self.assertEqual((1 + result1).ordinal, result1.ordinal + 2)
-        self.assertEqual((result1 - 1).ordinal, result2.ordinal - 2)
-        self.assertEqual((-1 + result1).ordinal, result2.ordinal - 2)
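`test_multiples` pins down how multiplied frequencies behave: a '2A' period shares its ordinal with the corresponding 'A' period, but integer arithmetic moves in units of the full span. A sketch, assuming the pandas of this era (multiplied annual frequencies have since been restricted in newer versions):

```python
import pandas as pd

p1 = pd.Period('1989', freq='2A')
p2 = pd.Period('1989', freq='A')
print(p1.ordinal == p2.ordinal)  # True: same anchor point

print(p1 + 1)  # Period('1991', '2A-DEC'): one step == two years
print(p2 + 1)  # Period('1990', 'A-DEC')
```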
-    def test_pindex_multiples(self):
-        pi = PeriodIndex(start='1/1/11', end='12/31/11', freq='2M')
-        expected = PeriodIndex(['2011-01', '2011-03', '2011-05', '2011-07',
-                                '2011-09', '2011-11'], freq='2M')
-        tm.assert_index_equal(pi, expected)
-        self.assertEqual(pi.freq, offsets.MonthEnd(2))
-        self.assertEqual(pi.freqstr, '2M')
-
-        pi = period_range(start='1/1/11', end='12/31/11', freq='2M')
-        tm.assert_index_equal(pi, expected)
-        self.assertEqual(pi.freq, offsets.MonthEnd(2))
-        self.assertEqual(pi.freqstr, '2M')
-
-        pi = period_range(start='1/1/11', periods=6, freq='2M')
-        tm.assert_index_equal(pi, expected)
-        self.assertEqual(pi.freq, offsets.MonthEnd(2))
-        self.assertEqual(pi.freqstr, '2M')
-
-    def test_iteration(self):
-        index = PeriodIndex(start='1/1/10', periods=4, freq='B')
-
-        result = list(index)
-        tm.assertIsInstance(result[0], Period)
-        self.assertEqual(result[0].freq, index.freq)
-
-    def test_take(self):
-        index = PeriodIndex(start='1/1/10', end='12/31/12', freq='D',
-                            name='idx')
-        expected = PeriodIndex([datetime(2010, 1, 6), datetime(2010, 1, 7),
-                                datetime(2010, 1, 9), datetime(2010, 1, 13)],
-                               freq='D', name='idx')
-
-        taken1 = index.take([5, 6, 8, 12])
-        taken2 = index[[5, 6, 8, 12]]
-
-        for taken in [taken1, taken2]:
-            tm.assert_index_equal(taken, expected)
-            tm.assertIsInstance(taken, PeriodIndex)
-            self.assertEqual(taken.freq, index.freq)
-            self.assertEqual(taken.name, expected.name)
-
-    def test_take_fill_value(self):
-        # GH 12631
-        idx = pd.PeriodIndex(['2011-01-01', '2011-02-01', '2011-03-01'],
-                             name='xxx', freq='D')
-        result = idx.take(np.array([1, 0, -1]))
-        expected = pd.PeriodIndex(['2011-02-01', '2011-01-01', '2011-03-01'],
-                                  name='xxx', freq='D')
-        tm.assert_index_equal(result, expected)
-
-        # fill_value
-        result = idx.take(np.array([1, 0, -1]), fill_value=True)
-        expected = pd.PeriodIndex(['2011-02-01', '2011-01-01', 'NaT'],
-                                  name='xxx', freq='D')
-        tm.assert_index_equal(result, expected)
-
-        # allow_fill=False
-        result = idx.take(np.array([1, 0, -1]), allow_fill=False,
-                          fill_value=True)
-        expected = pd.PeriodIndex(['2011-02-01', '2011-01-01', '2011-03-01'],
-                                  name='xxx', freq='D')
-        tm.assert_index_equal(result, expected)
-
-        msg = ('When allow_fill=True and fill_value is not None, '
-               'all indices must be >= -1')
-        with tm.assertRaisesRegexp(ValueError, msg):
-            idx.take(np.array([1, 0, -2]), fill_value=True)
-        with tm.assertRaisesRegexp(ValueError, msg):
-            idx.take(np.array([1, 0, -5]), fill_value=True)
-
-        with tm.assertRaises(IndexError):
-            idx.take(np.array([1, -5]))
-
-    def test_joins(self):
-        index = period_range('1/1/2000', '1/20/2000', freq='D')
-
-        for kind in ['inner', 'outer', 'left', 'right']:
-            joined = index.join(index[:-5], how=kind)
-
-            tm.assertIsInstance(joined, PeriodIndex)
-            self.assertEqual(joined.freq, index.freq)
-
-    def test_join_self(self):
-        index = period_range('1/1/2000', '1/20/2000', freq='D')
-
-        for kind in ['inner', 'outer', 'left', 'right']:
-            res = index.join(index, how=kind)
-            self.assertIs(index, res)
-
-    def test_join_does_not_recur(self):
-        df = tm.makeCustomDataframe(
-            3, 2, data_gen_f=lambda *args: np.random.randint(2),
-            c_idx_type='p', r_idx_type='dt')
-        s = df.iloc[:2, 0]
-
-        res = s.index.join(df.columns, how='outer')
-        expected = Index([s.index[0], s.index[1],
-                          df.columns[0], df.columns[1]], object)
-        tm.assert_index_equal(res, expected)
-
-    def test_align_series(self):
-        rng = period_range('1/1/2000', '1/1/2010', freq='A')
-        ts = Series(np.random.randn(len(rng)), index=rng)
-
-        result = ts + ts[::2]
-        expected = ts + ts
-        expected[1::2] = np.nan
-        tm.assert_series_equal(result, expected)
-
-        result = ts + _permute(ts[::2])
-        tm.assert_series_equal(result, expected)
-
-        # it works!
-        for kind in ['inner', 'outer', 'left', 'right']:
-            ts.align(ts[::2], join=kind)
-        msg = "Input has different freq=D from PeriodIndex\\(freq=A-DEC\\)"
-        with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg):
-            ts + ts.asfreq('D', how="end")
-
-    def test_align_frame(self):
-        rng = period_range('1/1/2000', '1/1/2010', freq='A')
-        ts = DataFrame(np.random.randn(len(rng), 3), index=rng)
-
-        result = ts + ts[::2]
-        expected = ts + ts
-        expected.values[1::2] = np.nan
-        tm.assert_frame_equal(result, expected)
-
-        result = ts + _permute(ts[::2])
-        tm.assert_frame_equal(result, expected)
-
-    def test_union(self):
-        index = period_range('1/1/2000', '1/20/2000', freq='D')
-
-        result = index[:-5].union(index[10:])
-        tm.assert_index_equal(result, index)
-
-        # not in order
-        result = _permute(index[:-5]).union(_permute(index[10:]))
-        tm.assert_index_equal(result, index)
-
-        # raise if different frequencies
-        index = period_range('1/1/2000', '1/20/2000', freq='D')
-        index2 = period_range('1/1/2000', '1/20/2000', freq='W-WED')
-        with tm.assertRaises(period.IncompatibleFrequency):
-            index.union(index2)
-
-        msg = 'can only call with other PeriodIndex-ed objects'
-        with tm.assertRaisesRegexp(ValueError, msg):
-            index.join(index.to_timestamp())
-
-        index3 = period_range('1/1/2000', '1/20/2000', freq='2D')
-        with tm.assertRaises(period.IncompatibleFrequency):
-            index.join(index3)
-
-    def test_union_dataframe_index(self):
-        rng1 = pd.period_range('1/1/1999', '1/1/2012', freq='M')
-        s1 = pd.Series(np.random.randn(len(rng1)), rng1)
-
-        rng2 = pd.period_range('1/1/1980', '12/1/2001', freq='M')
-        s2 = pd.Series(np.random.randn(len(rng2)), rng2)
-        df = pd.DataFrame({'s1': s1, 's2': s2})
-
-        exp = pd.period_range('1/1/1980', '1/1/2012', freq='M')
-        self.assert_index_equal(df.index, exp)
-
-    def test_intersection(self):
-        index = period_range('1/1/2000', '1/20/2000', freq='D')
-
-        result = index[:-5].intersection(index[10:])
-        tm.assert_index_equal(result, index[10:-5])
-
-        # not in order
-        left = _permute(index[:-5])
-        right = _permute(index[10:])
-        result = left.intersection(right).sort_values()
-        tm.assert_index_equal(result, index[10:-5])
-
-        # raise if different frequencies
-        index = period_range('1/1/2000', '1/20/2000', freq='D')
-        index2 = period_range('1/1/2000', '1/20/2000', freq='W-WED')
-        with tm.assertRaises(period.IncompatibleFrequency):
-            index.intersection(index2)
-
-        index3 = period_range('1/1/2000', '1/20/2000', freq='2D')
-        with tm.assertRaises(period.IncompatibleFrequency):
-            index.intersection(index3)
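The set-operation tests above all reduce to one rule: union, intersection, and join require identical frequency including the multiplier (so 'D' vs '2D' is incompatible), and a mismatch raises `IncompatibleFrequency`, a `ValueError` subclass. A sketch under the pandas of this era (the exception's import path has moved between versions, so it is caught here as `ValueError`):

```python
import pandas as pd

a = pd.period_range('2000-01-01', '2000-01-20', freq='D')
b = pd.period_range('2000-01-01', '2000-01-20', freq='2D')

# Same frequency: overlapping pieces recombine into the full index.
print(a[:-5].union(a[10:]).equals(a))  # True

try:
    a.intersection(b)                  # 'D' vs '2D': incompatible
except ValueError as err:              # IncompatibleFrequency subclasses ValueError
    print(type(err).__name__, err)
```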
-    def test_intersection_cases(self):
-        base = period_range('6/1/2000', '6/30/2000', freq='D', name='idx')
-
-        # if target has the same name, it is preserved
-        rng2 = period_range('5/15/2000', '6/20/2000', freq='D', name='idx')
-        expected2 = period_range('6/1/2000', '6/20/2000', freq='D',
-                                 name='idx')
-
-        # if target name is different, it will be reset
-        rng3 = period_range('5/15/2000', '6/20/2000', freq='D', name='other')
-        expected3 = period_range('6/1/2000', '6/20/2000', freq='D',
-                                 name=None)
-
-        rng4 = period_range('7/1/2000', '7/31/2000', freq='D', name='idx')
-        expected4 = PeriodIndex([], name='idx', freq='D')
-
-        for (rng, expected) in [(rng2, expected2), (rng3, expected3),
-                                (rng4, expected4)]:
-            result = base.intersection(rng)
-            tm.assert_index_equal(result, expected)
-            self.assertEqual(result.name, expected.name)
-            self.assertEqual(result.freq, expected.freq)
-
-        # non-monotonic
-        base = PeriodIndex(['2011-01-05', '2011-01-04', '2011-01-02',
-                            '2011-01-03'], freq='D', name='idx')
-
-        rng2 = PeriodIndex(['2011-01-04', '2011-01-02',
-                            '2011-02-02', '2011-02-03'],
-                           freq='D', name='idx')
-        expected2 = PeriodIndex(['2011-01-04', '2011-01-02'], freq='D',
-                                name='idx')
-
-        rng3 = PeriodIndex(['2011-01-04', '2011-01-02', '2011-02-02',
-                            '2011-02-03'],
-                           freq='D', name='other')
-        expected3 = PeriodIndex(['2011-01-04', '2011-01-02'], freq='D',
-                                name=None)
-
-        rng4 = period_range('7/1/2000', '7/31/2000', freq='D', name='idx')
-        expected4 = PeriodIndex([], freq='D', name='idx')
-
-        for (rng, expected) in [(rng2, expected2), (rng3, expected3),
-                                (rng4, expected4)]:
-            result = base.intersection(rng)
-            tm.assert_index_equal(result, expected)
-            self.assertEqual(result.name, expected.name)
-            self.assertEqual(result.freq, 'D')
-
-        # empty same freq
-        rng = date_range('6/1/2000', '6/15/2000', freq='T')
-        result = rng[0:0].intersection(rng)
-        self.assertEqual(len(result), 0)
-
-        result = rng.intersection(rng[0:0])
-        self.assertEqual(len(result), 0)
-
-    def test_fields(self):
-        # year, month, day, hour, minute
-        # second, weekofyear, week, dayofweek, weekday, dayofyear, quarter
-        # qyear
-        pi = PeriodIndex(freq='A', start='1/1/2001', end='12/1/2005')
-        self._check_all_fields(pi)
-
-        pi = PeriodIndex(freq='Q', start='1/1/2001', end='12/1/2002')
-        self._check_all_fields(pi)
-
-        pi = PeriodIndex(freq='M', start='1/1/2001', end='1/1/2002')
-        self._check_all_fields(pi)
-
-        pi = PeriodIndex(freq='D', start='12/1/2001', end='6/1/2001')
-        self._check_all_fields(pi)
-
-        pi = PeriodIndex(freq='B', start='12/1/2001', end='6/1/2001')
-        self._check_all_fields(pi)
-
-        pi = PeriodIndex(freq='H', start='12/31/2001', end='1/1/2002 23:00')
-        self._check_all_fields(pi)
-
-        pi = PeriodIndex(freq='Min', start='12/31/2001', end='1/1/2002 00:20')
-        self._check_all_fields(pi)
-
-        pi = PeriodIndex(freq='S', start='12/31/2001 00:00:00',
-                         end='12/31/2001 00:05:00')
-        self._check_all_fields(pi)
-
-        end_intv = Period('2006-12-31', 'W')
-        i1 = PeriodIndex(end=end_intv, periods=10)
-        self._check_all_fields(i1)
-
-    def _check_all_fields(self, periodindex):
-        fields = ['year', 'month', 'day', 'hour', 'minute', 'second',
-                  'weekofyear', 'week', 'dayofweek', 'weekday', 'dayofyear',
-                  'quarter', 'qyear', 'days_in_month', 'is_leap_year']
-
-        periods = list(periodindex)
-        s = pd.Series(periodindex)
-
-        for field in fields:
-            field_idx = getattr(periodindex, field)
-            self.assertEqual(len(periodindex), len(field_idx))
-            for x, val in zip(periods, field_idx):
-                self.assertEqual(getattr(x, field), val)
-
-            if len(s) == 0:
-                continue
-
-            field_s = getattr(s.dt, field)
-            self.assertEqual(len(periodindex), len(field_s))
-            for x, val in zip(periods, field_s):
-                self.assertEqual(getattr(x, field), val)
-
-    def test_is_full(self):
-        index = PeriodIndex([2005, 2007, 2009], freq='A')
-        self.assertFalse(index.is_full)
-
-        index = PeriodIndex([2005, 2006, 2007], freq='A')
-        self.assertTrue(index.is_full)
-
-        index = PeriodIndex([2005, 2005, 2007], freq='A')
-        self.assertFalse(index.is_full)
-
-        index = PeriodIndex([2005, 2005, 2006], freq='A')
-        self.assertTrue(index.is_full)
-
-        index = PeriodIndex([2006, 2005, 2005], freq='A')
-        self.assertRaises(ValueError, getattr, index, 'is_full')
-
-        self.assertTrue(index[:0].is_full)
-
-    def test_map(self):
-        index = PeriodIndex([2005, 2007, 2009], freq='A')
-        result = index.map(lambda x: x + 1)
-        expected = index + 1
-        tm.assert_index_equal(result, expected)
-
-        result = index.map(lambda x: x.ordinal)
-        exp = Index([x.ordinal for x in index])
-        tm.assert_index_equal(result, exp)
-    def test_map_with_string_constructor(self):
-        raw = [2005, 2007, 2009]
-        index = PeriodIndex(raw, freq='A')
-        types = str,
-
-        if PY3:
-            # unicode
-            types += text_type,
-
-        for t in types:
-            expected = Index(lmap(t, raw))
-            res = index.map(t)
-
-            # should return an Index
-            tm.assertIsInstance(res, Index)
-
-            # preserve element types
-            self.assertTrue(all(isinstance(resi, t) for resi in res))
-
-            # lastly, values should compare equal
-            tm.assert_index_equal(res, expected)
-
-    def test_convert_array_of_periods(self):
-        rng = period_range('1/1/2000', periods=20, freq='D')
-        periods = list(rng)
-
-        result = pd.Index(periods)
-        tm.assertIsInstance(result, PeriodIndex)
-
-    def test_with_multi_index(self):
-        # #1705
-        index = date_range('1/1/2012', periods=4, freq='12H')
-        index_as_arrays = [index.to_period(freq='D'), index.hour]
-
-        s = Series([0, 1, 2, 3], index_as_arrays)
-
-        tm.assertIsInstance(s.index.levels[0], PeriodIndex)
-
-        tm.assertIsInstance(s.index.values[0][0], Period)
-
-    def test_to_timestamp_1703(self):
-        index = period_range('1/1/2012', periods=4, freq='D')
-
-        result = index.to_timestamp()
-        self.assertEqual(result[0], Timestamp('1/1/2012'))
-
-    def test_to_datetime_depr(self):
-        index = period_range('1/1/2012', periods=4, freq='D')
-
-        with tm.assert_produces_warning(FutureWarning,
-                                        check_stacklevel=False):
-            result = index.to_datetime()
-        self.assertEqual(result[0], Timestamp('1/1/2012'))
-
-    def test_get_loc_msg(self):
-        idx = period_range('2000-1-1', freq='A', periods=10)
-        bad_period = Period('2012', 'A')
-        self.assertRaises(KeyError, idx.get_loc, bad_period)
-
-        try:
-            idx.get_loc(bad_period)
-        except KeyError as inst:
-            self.assertEqual(inst.args[0], bad_period)
-
-    def test_get_loc_nat(self):
-        didx = DatetimeIndex(['2011-01-01', 'NaT', '2011-01-03'])
-        pidx = PeriodIndex(['2011-01-01', 'NaT', '2011-01-03'], freq='M')
-
-        # check DatetimeIndex compat
-        for idx in [didx, pidx]:
-            self.assertEqual(idx.get_loc(pd.NaT), 1)
-            self.assertEqual(idx.get_loc(None), 1)
-            self.assertEqual(idx.get_loc(float('nan')), 1)
-            self.assertEqual(idx.get_loc(np.nan), 1)
-
-    def test_append_concat(self):
-        # #1815
-        d1 = date_range('12/31/1990', '12/31/1999', freq='A-DEC')
-        d2 = date_range('12/31/2000', '12/31/2009', freq='A-DEC')
-
-        s1 = Series(np.random.randn(10), d1)
-        s2 = Series(np.random.randn(10), d2)
-
-        s1 = s1.to_period()
-        s2 = s2.to_period()
-
-        # drops index
-        result = pd.concat([s1, s2])
-        tm.assertIsInstance(result.index, PeriodIndex)
-        self.assertEqual(result.index[0], s1.index[0])
-
-    def test_pickle_freq(self):
-        # GH2891
-        prng = period_range('1/1/2011', '1/1/2012', freq='M')
-        new_prng = self.round_trip_pickle(prng)
-        self.assertEqual(new_prng.freq, offsets.MonthEnd())
-        self.assertEqual(new_prng.freqstr, 'M')
-
-    def test_slice_keep_name(self):
-        idx = period_range('20010101', periods=10, freq='D', name='bob')
-        self.assertEqual(idx.name, idx[1:].name)
-
-    def test_factorize(self):
-        idx1 = PeriodIndex(['2014-01', '2014-01', '2014-02', '2014-02',
-                            '2014-03', '2014-03'], freq='M')
-
-        exp_arr = np.array([0, 0, 1, 1, 2, 2], dtype=np.intp)
-        exp_idx = PeriodIndex(['2014-01', '2014-02', '2014-03'], freq='M')
-
-        arr, idx = idx1.factorize()
-        self.assert_numpy_array_equal(arr, exp_arr)
-        tm.assert_index_equal(idx, exp_idx)
-
-        arr, idx = idx1.factorize(sort=True)
-        self.assert_numpy_array_equal(arr, exp_arr)
-        tm.assert_index_equal(idx, exp_idx)
-
-        idx2 = pd.PeriodIndex(['2014-03', '2014-03', '2014-02', '2014-01',
-                               '2014-03', '2014-01'], freq='M')
-        exp_arr = np.array([2, 2, 1, 0, 2, 0], dtype=np.intp)
-        arr, idx = idx2.factorize(sort=True)
-        self.assert_numpy_array_equal(arr, exp_arr)
-        tm.assert_index_equal(idx, exp_idx)
-
-        exp_arr = np.array([0, 0, 1, 2, 0, 2], dtype=np.intp)
-        exp_idx = PeriodIndex(['2014-03', '2014-02', '2014-01'], freq='M')
-        arr, idx = idx2.factorize()
-        self.assert_numpy_array_equal(arr, exp_arr)
-        tm.assert_index_equal(idx, exp_idx)
-
-    def test_recreate_from_data(self):
-        for o in ['M', 'Q', 'A', 'D', 'B', 'T', 'S', 'L', 'U', 'N', 'H']:
-            org = PeriodIndex(start='2001/04/01', freq=o, periods=1)
-            idx = PeriodIndex(org.values, freq=o)
-            tm.assert_index_equal(idx, org)
-
-    def test_combine_first(self):
-        # GH 3367
-        didx = pd.DatetimeIndex(start='1950-01-31', end='1950-07-31', freq='M')
-        pidx = pd.PeriodIndex(start=pd.Period('1950-1'),
-                              end=pd.Period('1950-7'), freq='M')
-        # check to be consistent with DatetimeIndex
-        for idx in [didx, pidx]:
-            a = pd.Series([1, np.nan, np.nan, 4, 5, np.nan, 7], index=idx)
-            b = pd.Series([9, 9, 9, 9, 9, 9, 9], index=idx)
-
-            result = a.combine_first(b)
-            expected = pd.Series([1, 9, 9, 4, 5, 9, 7], index=idx,
-                                 dtype=np.float64)
-            tm.assert_series_equal(result, expected)
-
-    def test_searchsorted(self):
-        for freq in ['D', '2D']:
-            pidx = pd.PeriodIndex(['2014-01-01', '2014-01-02', '2014-01-03',
-                                   '2014-01-04', '2014-01-05'], freq=freq)
-
-            p1 = pd.Period('2014-01-01', freq=freq)
-            self.assertEqual(pidx.searchsorted(p1), 0)
-
-            p2 = pd.Period('2014-01-04', freq=freq)
-            self.assertEqual(pidx.searchsorted(p2), 3)
-
-            msg = "Input has different freq=H from PeriodIndex"
-            with self.assertRaisesRegexp(period.IncompatibleFrequency, msg):
-                pidx.searchsorted(pd.Period('2014-01-01', freq='H'))
-
-            msg = "Input has different freq=5D from PeriodIndex"
-            with self.assertRaisesRegexp(period.IncompatibleFrequency, msg):
-                pidx.searchsorted(pd.Period('2014-01-01', freq='5D'))
-
-        with tm.assert_produces_warning(FutureWarning):
-            pidx.searchsorted(key=p2)
-
-    def test_round_trip(self):
-
-        p = Period('2000Q1')
-        new_p = self.round_trip_pickle(p)
-        self.assertEqual(new_p, p)
-
-
-def _permute(obj):
-    return obj.take(np.random.permutation(len(obj)))
-
-
-class TestMethods(tm.TestCase):
-
-    def test_add(self):
-        dt1 = Period(freq='D', year=2008, month=1, day=1)
-        dt2 = Period(freq='D', year=2008, month=1, day=2)
-        self.assertEqual(dt1 + 1, dt2)
-        self.assertEqual(1 + dt1, dt2)
-
-    def test_add_pdnat(self):
-        p = pd.Period('2011-01', freq='M')
-        self.assertIs(p + pd.NaT, pd.NaT)
-        self.assertIs(pd.NaT + p, pd.NaT)
-
-        p = pd.Period('NaT', freq='M')
-        self.assertIs(p + pd.NaT, pd.NaT)
-        self.assertIs(pd.NaT + p, pd.NaT)
-
-    def test_add_raises(self):
-        # GH 4731
-        dt1 = Period(freq='D', year=2008, month=1, day=1)
-        dt2 = Period(freq='D', year=2008, month=1, day=2)
-        msg = r"unsupported operand type\(s\)"
-        with tm.assertRaisesRegexp(TypeError, msg):
-            dt1 + "str"
-
-        msg = r"unsupported operand type\(s\)"
-        with tm.assertRaisesRegexp(TypeError, msg):
-            "str" + dt1
-
-        with tm.assertRaisesRegexp(TypeError, msg):
-            dt1 + dt2
-
-    def test_sub(self):
-        dt1 = Period('2011-01-01', freq='D')
-        dt2 = Period('2011-01-15', freq='D')
-
-        self.assertEqual(dt1 - dt2, -14)
-        self.assertEqual(dt2 - dt1, 14)
-
-        msg = r"Input has different freq=M from Period\(freq=D\)"
-        with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg):
-            dt1 - pd.Period('2011-02', freq='M')
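`TestMethods` pins the scalar behaviour: adding an integer moves a `Period` by that many of its own spans, addition is symmetric, and any operation involving `NaT` yields `NaT`. A minimal standalone sketch:

```python
import pandas as pd

p = pd.Period('2008-01-01', freq='D')
print(p + 1)   # Period('2008-01-02', 'D')
print(1 + p)   # same: integer addition is symmetric

# NaT propagates through Period arithmetic.
print(p + pd.NaT)                       # NaT
print(pd.Period('NaT', freq='D') + 1)   # NaT
```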
-    def test_add_offset(self):
-        # freq is DateOffset
-        for freq in ['A', '2A', '3A']:
-            p = Period('2011', freq=freq)
-            exp = Period('2013', freq=freq)
-            self.assertEqual(p + offsets.YearEnd(2), exp)
-            self.assertEqual(offsets.YearEnd(2) + p, exp)
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(365, 'D'),
-                      timedelta(365)]:
-                with tm.assertRaises(period.IncompatibleFrequency):
-                    p + o
-
-                if isinstance(o, np.timedelta64):
-                    with tm.assertRaises(TypeError):
-                        o + p
-                else:
-                    with tm.assertRaises(period.IncompatibleFrequency):
-                        o + p
-
-        for freq in ['M', '2M', '3M']:
-            p = Period('2011-03', freq=freq)
-            exp = Period('2011-05', freq=freq)
-            self.assertEqual(p + offsets.MonthEnd(2), exp)
-            self.assertEqual(offsets.MonthEnd(2) + p, exp)
-
-            exp = Period('2012-03', freq=freq)
-            self.assertEqual(p + offsets.MonthEnd(12), exp)
-            self.assertEqual(offsets.MonthEnd(12) + p, exp)
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(365, 'D'),
-                      timedelta(365)]:
-                with tm.assertRaises(period.IncompatibleFrequency):
-                    p + o
-
-                if isinstance(o, np.timedelta64):
-                    with tm.assertRaises(TypeError):
-                        o + p
-                else:
-                    with tm.assertRaises(period.IncompatibleFrequency):
-                        o + p
-
-        # freq is Tick
-        for freq in ['D', '2D', '3D']:
-            p = Period('2011-04-01', freq=freq)
-
-            exp = Period('2011-04-06', freq=freq)
-            self.assertEqual(p + offsets.Day(5), exp)
-            self.assertEqual(offsets.Day(5) + p, exp)
-
-            exp = Period('2011-04-02', freq=freq)
-            self.assertEqual(p + offsets.Hour(24), exp)
-            self.assertEqual(offsets.Hour(24) + p, exp)
-
-            exp = Period('2011-04-03', freq=freq)
-            self.assertEqual(p + np.timedelta64(2, 'D'), exp)
-            with tm.assertRaises(TypeError):
-                np.timedelta64(2, 'D') + p
-
-            exp = Period('2011-04-02', freq=freq)
-            self.assertEqual(p + np.timedelta64(3600 * 24, 's'), exp)
-            with tm.assertRaises(TypeError):
-                np.timedelta64(3600 * 24, 's') + p
-
-            exp = Period('2011-03-30', freq=freq)
-            self.assertEqual(p + timedelta(-2), exp)
-            self.assertEqual(timedelta(-2) + p, exp)
-
-            exp = Period('2011-04-03', freq=freq)
-            self.assertEqual(p + timedelta(hours=48), exp)
-            self.assertEqual(timedelta(hours=48) + p, exp)
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(4, 'h'),
-                      timedelta(hours=23)]:
-                with tm.assertRaises(period.IncompatibleFrequency):
-                    p + o
-
-                if isinstance(o, np.timedelta64):
-                    with tm.assertRaises(TypeError):
-                        o + p
-                else:
-                    with tm.assertRaises(period.IncompatibleFrequency):
-                        o + p
-
-        for freq in ['H', '2H', '3H']:
-            p = Period('2011-04-01 09:00', freq=freq)
-
-            exp = Period('2011-04-03 09:00', freq=freq)
-            self.assertEqual(p + offsets.Day(2), exp)
-            self.assertEqual(offsets.Day(2) + p, exp)
-
-            exp = Period('2011-04-01 12:00', freq=freq)
-            self.assertEqual(p + offsets.Hour(3), exp)
-            self.assertEqual(offsets.Hour(3) + p, exp)
-
-            exp = Period('2011-04-01 12:00', freq=freq)
-            self.assertEqual(p + np.timedelta64(3, 'h'), exp)
-            with tm.assertRaises(TypeError):
-                np.timedelta64(3, 'h') + p
-
-            exp = Period('2011-04-01 10:00', freq=freq)
-            self.assertEqual(p + np.timedelta64(3600, 's'), exp)
-            with tm.assertRaises(TypeError):
-                np.timedelta64(3600, 's') + p
-
-            exp = Period('2011-04-01 11:00', freq=freq)
-            self.assertEqual(p + timedelta(minutes=120), exp)
-            self.assertEqual(timedelta(minutes=120) + p, exp)
-
-            exp = Period('2011-04-05 12:00', freq=freq)
-            self.assertEqual(p + timedelta(days=4, minutes=180), exp)
-            self.assertEqual(timedelta(days=4, minutes=180) + p, exp)
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(3200, 's'),
-                      timedelta(hours=23, minutes=30)]:
-                with tm.assertRaises(period.IncompatibleFrequency):
-                    p + o
-
-                if isinstance(o, np.timedelta64):
-                    with tm.assertRaises(TypeError):
-                        o + p
-                else:
-                    with tm.assertRaises(period.IncompatibleFrequency):
-                        o + p
-
-    def test_add_offset_nat(self):
-        # freq is DateOffset
-        for freq in ['A', '2A', '3A']:
-            p = Period('NaT', freq=freq)
-            for o in [offsets.YearEnd(2)]:
-                self.assertIs(p + o, tslib.NaT)
-                self.assertIs(o + p, tslib.NaT)
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(365, 'D'),
-                      timedelta(365)]:
-                self.assertIs(p + o, tslib.NaT)
-
-                if isinstance(o, np.timedelta64):
-                    with tm.assertRaises(TypeError):
-                        o + p
-                else:
-                    self.assertIs(o + p, tslib.NaT)
-
-        for freq in ['M', '2M', '3M']:
-            p = Period('NaT', freq=freq)
-            for o in [offsets.MonthEnd(2), offsets.MonthEnd(12)]:
-                self.assertIs(p + o, tslib.NaT)
-
-                if isinstance(o, np.timedelta64):
-                    with tm.assertRaises(TypeError):
-                        o + p
-                else:
-                    self.assertIs(o + p, tslib.NaT)
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(365, 'D'),
-                      timedelta(365)]:
-                self.assertIs(p + o, tslib.NaT)
-
-                if isinstance(o, np.timedelta64):
-                    with tm.assertRaises(TypeError):
-                        o + p
-                else:
-                    self.assertIs(o + p, tslib.NaT)
-
-        # freq is Tick
-        for freq in ['D', '2D', '3D']:
-            p = Period('NaT', freq=freq)
-            for o in [offsets.Day(5), offsets.Hour(24), np.timedelta64(2, 'D'),
-                      np.timedelta64(3600 * 24, 's'), timedelta(-2),
-                      timedelta(hours=48)]:
-                self.assertIs(p + o, tslib.NaT)
-
-                if isinstance(o, np.timedelta64):
-                    with tm.assertRaises(TypeError):
-                        o + p
-                else:
-                    self.assertIs(o + p, tslib.NaT)
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(4, 'h'),
-                      timedelta(hours=23)]:
-                self.assertIs(p + o, tslib.NaT)
-
-                if isinstance(o, np.timedelta64):
-                    with tm.assertRaises(TypeError):
-                        o + p
-                else:
-                    self.assertIs(o + p, tslib.NaT)
-
-        for freq in ['H', '2H', '3H']:
-            p = Period('NaT', freq=freq)
-            for o in [offsets.Day(2), offsets.Hour(3), np.timedelta64(3, 'h'),
-                      np.timedelta64(3600, 's'), timedelta(minutes=120),
-                      timedelta(days=4, minutes=180)]:
-                self.assertIs(p + o, tslib.NaT)
-
-                if not isinstance(o, np.timedelta64):
-                    self.assertIs(o + p, tslib.NaT)
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(3200, 's'),
-                      timedelta(hours=23, minutes=30)]:
-                self.assertIs(p + o, tslib.NaT)
-
-                if isinstance(o, np.timedelta64):
-                    with tm.assertRaises(TypeError):
-                        o + p
-                else:
-                    self.assertIs(o + p, tslib.NaT)
-
-    def test_sub_pdnat(self):
-        # GH 13071
-        p = pd.Period('2011-01', freq='M')
-        self.assertIs(p - pd.NaT, pd.NaT)
-        self.assertIs(pd.NaT - p, pd.NaT)
-
-        p = pd.Period('NaT', freq='M')
-        self.assertIs(p - pd.NaT, pd.NaT)
-        self.assertIs(pd.NaT - p, pd.NaT)
-
-    def test_sub_offset(self):
-        # freq is DateOffset
-        for freq in ['A', '2A', '3A']:
-            p = Period('2011', freq=freq)
-            self.assertEqual(p - offsets.YearEnd(2), Period('2009', freq=freq))
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(365, 'D'),
-                      timedelta(365)]:
-                with tm.assertRaises(period.IncompatibleFrequency):
-                    p - o
-
-        for freq in ['M', '2M', '3M']:
-            p = Period('2011-03', freq=freq)
-            self.assertEqual(p - offsets.MonthEnd(2),
-                             Period('2011-01', freq=freq))
-            self.assertEqual(p - offsets.MonthEnd(12),
-                             Period('2010-03', freq=freq))
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(365, 'D'),
-                      timedelta(365)]:
-                with tm.assertRaises(period.IncompatibleFrequency):
-                    p - o
-
-        # freq is Tick
-        for freq in ['D', '2D', '3D']:
-            p = Period('2011-04-01', freq=freq)
-            self.assertEqual(p - offsets.Day(5),
-                             Period('2011-03-27', freq=freq))
-            self.assertEqual(p - offsets.Hour(24),
-                             Period('2011-03-31', freq=freq))
-            self.assertEqual(p - np.timedelta64(2, 'D'),
-                             Period('2011-03-30', freq=freq))
-            self.assertEqual(p - np.timedelta64(3600 * 24, 's'),
-                             Period('2011-03-31', freq=freq))
-            self.assertEqual(p - timedelta(-2),
-                             Period('2011-04-03', freq=freq))
-            self.assertEqual(p - timedelta(hours=48),
-                             Period('2011-03-30', freq=freq))
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(4, 'h'),
-                      timedelta(hours=23)]:
-                with tm.assertRaises(period.IncompatibleFrequency):
-                    p - o
-
-        for freq in ['H', '2H', '3H']:
-            p = Period('2011-04-01 09:00', freq=freq)
-            self.assertEqual(p - offsets.Day(2),
-                             Period('2011-03-30 09:00', freq=freq))
-            self.assertEqual(p - offsets.Hour(3),
-                             Period('2011-04-01 06:00', freq=freq))
-            self.assertEqual(p - np.timedelta64(3, 'h'),
-                             Period('2011-04-01 06:00', freq=freq))
-            self.assertEqual(p - np.timedelta64(3600, 's'),
-                             Period('2011-04-01 08:00', freq=freq))
-            self.assertEqual(p - timedelta(minutes=120),
-                             Period('2011-04-01 07:00', freq=freq))
-            self.assertEqual(p - timedelta(days=4, minutes=180),
-                             Period('2011-03-28 06:00', freq=freq))
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(3200, 's'),
-                      timedelta(hours=23, minutes=30)]:
-                with tm.assertRaises(period.IncompatibleFrequency):
-                    p - o
-
-    def test_sub_offset_nat(self):
-        # freq is DateOffset
-        for freq in ['A', '2A', '3A']:
-            p = Period('NaT', freq=freq)
-            for o in [offsets.YearEnd(2)]:
-                self.assertIs(p - o, tslib.NaT)
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(365, 'D'),
-                      timedelta(365)]:
-                self.assertIs(p - o, tslib.NaT)
-
-        for freq in ['M', '2M', '3M']:
-            p = Period('NaT', freq=freq)
-            for o in [offsets.MonthEnd(2), offsets.MonthEnd(12)]:
-                self.assertIs(p - o, tslib.NaT)
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(365, 'D'),
-                      timedelta(365)]:
-                self.assertIs(p - o, tslib.NaT)
-
-        # freq is Tick
-        for freq in ['D', '2D', '3D']:
-            p = Period('NaT', freq=freq)
-            for o in [offsets.Day(5), offsets.Hour(24), np.timedelta64(2, 'D'),
-                      np.timedelta64(3600 * 24, 's'), timedelta(-2),
-                      timedelta(hours=48)]:
-                self.assertIs(p - o, tslib.NaT)
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(4, 'h'),
-                      timedelta(hours=23)]:
-                self.assertIs(p - o, tslib.NaT)
-
-        for freq in ['H', '2H', '3H']:
-            p = Period('NaT', freq=freq)
-            for o in [offsets.Day(2), offsets.Hour(3), np.timedelta64(3, 'h'),
-                      np.timedelta64(3600, 's'), timedelta(minutes=120),
-                      timedelta(days=4, minutes=180)]:
-                self.assertIs(p - o, tslib.NaT)
-
-            for o in [offsets.YearBegin(2), offsets.MonthBegin(1),
-                      offsets.Minute(), np.timedelta64(3200, 's'),
-                      timedelta(hours=23, minutes=30)]:
-                self.assertIs(p - o, tslib.NaT)
-
-    def test_nat_ops(self):
-        for freq in ['M', '2M', '3M']:
-            p = Period('NaT', freq=freq)
-            self.assertIs(p + 1, tslib.NaT)
-            self.assertIs(1 + p, tslib.NaT)
-            self.assertIs(p - 1, tslib.NaT)
-            self.assertIs(p - Period('2011-01', freq=freq), tslib.NaT)
-            self.assertIs(Period('2011-01', freq=freq) - p, tslib.NaT)
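The offset tests above encode a compatibility rule: a `DateOffset` or timedelta can be added to or subtracted from a `Period` only if it is expressible as a whole number of the period's own frequency units; anything else raises `IncompatibleFrequency` (a `ValueError` subclass). A standalone sketch:

```python
import pandas as pd
from pandas.tseries import offsets

p = pd.Period('2011-04-01', freq='D')
print(p + offsets.Day())      # Period('2011-04-02', 'D')
print(p - offsets.Hour(24))   # 24h == 1 day, so still compatible

try:
    p + offsets.Hour(2)       # 2h is not a whole number of days
except ValueError as err:     # IncompatibleFrequency subclasses ValueError
    print(type(err).__name__, err)
```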
= p + offsets.Day() - exp = pd.Period('2011-04-02', freq='D') - self.assertEqual(result, exp) - - result = p - offsets.Day(2) - exp = pd.Period('2011-03-30', freq='D') - self.assertEqual(result, exp) - - msg = r"Input cannot be converted to Period\(freq=D\)" - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - p + offsets.Hour(2) - - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - p - offsets.Hour(2) - - -class TestPeriodIndexSeriesMethods(tm.TestCase): - """ Test PeriodIndex and Period Series Ops consistency """ - - def _check(self, values, func, expected): - idx = pd.PeriodIndex(values) - result = func(idx) - if isinstance(expected, pd.Index): - tm.assert_index_equal(result, expected) - else: - # comp op results in bool - tm.assert_numpy_array_equal(result, expected) - - s = pd.Series(values) - result = func(s) - - exp = pd.Series(expected, name=values.name) - tm.assert_series_equal(result, exp) - - def test_pi_ops(self): - idx = PeriodIndex(['2011-01', '2011-02', '2011-03', - '2011-04'], freq='M', name='idx') - - expected = PeriodIndex(['2011-03', '2011-04', - '2011-05', '2011-06'], freq='M', name='idx') - self._check(idx, lambda x: x + 2, expected) - self._check(idx, lambda x: 2 + x, expected) - - self._check(idx + 2, lambda x: x - 2, idx) - result = idx - Period('2011-01', freq='M') - exp = pd.Index([0, 1, 2, 3], name='idx') - tm.assert_index_equal(result, exp) - - result = Period('2011-01', freq='M') - idx - exp = pd.Index([0, -1, -2, -3], name='idx') - tm.assert_index_equal(result, exp) - - def test_pi_ops_errors(self): - idx = PeriodIndex(['2011-01', '2011-02', '2011-03', - '2011-04'], freq='M', name='idx') - s = pd.Series(idx) - - msg = r"unsupported operand type\(s\)" - - for obj in [idx, s]: - for ng in ["str", 1.5]: - with tm.assertRaisesRegexp(TypeError, msg): - obj + ng - - with tm.assertRaises(TypeError): - # error message differs between PY2 and 3 - ng + obj - - with tm.assertRaisesRegexp(TypeError, msg): - obj - ng - - with tm.assertRaises(TypeError): - np.add(obj, ng) - - if _np_version_under1p10: - self.assertIs(np.add(ng, obj), NotImplemented) - else: - with tm.assertRaises(TypeError): - np.add(ng, obj) - - with tm.assertRaises(TypeError): - np.subtract(obj, ng) - - if _np_version_under1p10: - self.assertIs(np.subtract(ng, obj), NotImplemented) - else: - with tm.assertRaises(TypeError): - np.subtract(ng, obj) - - def test_pi_ops_nat(self): - idx = PeriodIndex(['2011-01', '2011-02', 'NaT', - '2011-04'], freq='M', name='idx') - expected = PeriodIndex(['2011-03', '2011-04', - 'NaT', '2011-06'], freq='M', name='idx') - self._check(idx, lambda x: x + 2, expected) - self._check(idx, lambda x: 2 + x, expected) - self._check(idx, lambda x: np.add(x, 2), expected) - - self._check(idx + 2, lambda x: x - 2, idx) - self._check(idx + 2, lambda x: np.subtract(x, 2), idx) - - # freq with mult - idx = PeriodIndex(['2011-01', '2011-02', 'NaT', - '2011-04'], freq='2M', name='idx') - expected = PeriodIndex(['2011-07', '2011-08', - 'NaT', '2011-10'], freq='2M', name='idx') - self._check(idx, lambda x: x + 3, expected) - self._check(idx, lambda x: 3 + x, expected) - self._check(idx, lambda x: np.add(x, 3), expected) - - self._check(idx + 3, lambda x: x - 3, idx) - self._check(idx + 3, lambda x: np.subtract(x, 3), idx) - - def test_pi_ops_array_int(self): - idx = PeriodIndex(['2011-01', '2011-02', 'NaT', - '2011-04'], freq='M', name='idx') - f = lambda x: x + np.array([1, 2, 3, 4]) - exp = PeriodIndex(['2011-02', '2011-04', 'NaT', - '2011-08'], freq='M', 
name='idx') - self._check(idx, f, exp) - - f = lambda x: np.add(x, np.array([4, -1, 1, 2])) - exp = PeriodIndex(['2011-05', '2011-01', 'NaT', - '2011-06'], freq='M', name='idx') - self._check(idx, f, exp) - - f = lambda x: x - np.array([1, 2, 3, 4]) - exp = PeriodIndex(['2010-12', '2010-12', 'NaT', - '2010-12'], freq='M', name='idx') - self._check(idx, f, exp) - - f = lambda x: np.subtract(x, np.array([3, 2, 3, -2])) - exp = PeriodIndex(['2010-10', '2010-12', 'NaT', - '2011-06'], freq='M', name='idx') - self._check(idx, f, exp) - - def test_pi_ops_offset(self): - idx = PeriodIndex(['2011-01-01', '2011-02-01', '2011-03-01', - '2011-04-01'], freq='D', name='idx') - f = lambda x: x + offsets.Day() - exp = PeriodIndex(['2011-01-02', '2011-02-02', '2011-03-02', - '2011-04-02'], freq='D', name='idx') - self._check(idx, f, exp) - - f = lambda x: x + offsets.Day(2) - exp = PeriodIndex(['2011-01-03', '2011-02-03', '2011-03-03', - '2011-04-03'], freq='D', name='idx') - self._check(idx, f, exp) - - f = lambda x: x - offsets.Day(2) - exp = PeriodIndex(['2010-12-30', '2011-01-30', '2011-02-27', - '2011-03-30'], freq='D', name='idx') - self._check(idx, f, exp) - - def test_pi_offset_errors(self): - idx = PeriodIndex(['2011-01-01', '2011-02-01', '2011-03-01', - '2011-04-01'], freq='D', name='idx') - s = pd.Series(idx) - - # Series op is applied per Period instance, thus error is raised - # from Period - msg_idx = r"Input has different freq from PeriodIndex\(freq=D\)" - msg_s = r"Input cannot be converted to Period\(freq=D\)" - for obj, msg in [(idx, msg_idx), (s, msg_s)]: - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - obj + offsets.Hour(2) - - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - offsets.Hour(2) + obj - - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - obj - offsets.Hour(2) - - def test_pi_sub_period(self): - # GH 13071 - idx = PeriodIndex(['2011-01', '2011-02', '2011-03', - '2011-04'], freq='M', name='idx') - - result = idx - pd.Period('2012-01', freq='M') - exp = pd.Index([-12, -11, -10, -9], name='idx') - tm.assert_index_equal(result, exp) - - result = np.subtract(idx, pd.Period('2012-01', freq='M')) - tm.assert_index_equal(result, exp) - - result = pd.Period('2012-01', freq='M') - idx - exp = pd.Index([12, 11, 10, 9], name='idx') - tm.assert_index_equal(result, exp) - - result = np.subtract(pd.Period('2012-01', freq='M'), idx) - if _np_version_under1p10: - self.assertIs(result, NotImplemented) - else: - tm.assert_index_equal(result, exp) - - exp = pd.TimedeltaIndex([np.nan, np.nan, np.nan, np.nan], name='idx') - tm.assert_index_equal(idx - pd.Period('NaT', freq='M'), exp) - tm.assert_index_equal(pd.Period('NaT', freq='M') - idx, exp) - - def test_pi_sub_pdnat(self): - # GH 13071 - idx = PeriodIndex(['2011-01', '2011-02', 'NaT', - '2011-04'], freq='M', name='idx') - exp = pd.TimedeltaIndex([pd.NaT] * 4, name='idx') - tm.assert_index_equal(pd.NaT - idx, exp) - tm.assert_index_equal(idx - pd.NaT, exp) - - def test_pi_sub_period_nat(self): - # GH 13071 - idx = PeriodIndex(['2011-01', 'NaT', '2011-03', - '2011-04'], freq='M', name='idx') - - result = idx - pd.Period('2012-01', freq='M') - exp = pd.Index([-12, np.nan, -10, -9], name='idx') - tm.assert_index_equal(result, exp) - - result = pd.Period('2012-01', freq='M') - idx - exp = pd.Index([12, np.nan, 10, 9], name='idx') - tm.assert_index_equal(result, exp) - - exp = pd.TimedeltaIndex([np.nan, np.nan, np.nan, np.nan], name='idx') - tm.assert_index_equal(idx - pd.Period('NaT', freq='M'), 
exp) - tm.assert_index_equal(pd.Period('NaT', freq='M') - idx, exp) - - def test_pi_comp_period(self): - idx = PeriodIndex(['2011-01', '2011-02', '2011-03', - '2011-04'], freq='M', name='idx') - - f = lambda x: x == pd.Period('2011-03', freq='M') - exp = np.array([False, False, True, False], dtype=np.bool) - self._check(idx, f, exp) - f = lambda x: pd.Period('2011-03', freq='M') == x - self._check(idx, f, exp) - - f = lambda x: x != pd.Period('2011-03', freq='M') - exp = np.array([True, True, False, True], dtype=np.bool) - self._check(idx, f, exp) - f = lambda x: pd.Period('2011-03', freq='M') != x - self._check(idx, f, exp) - - f = lambda x: pd.Period('2011-03', freq='M') >= x - exp = np.array([True, True, True, False], dtype=np.bool) - self._check(idx, f, exp) - - f = lambda x: x > pd.Period('2011-03', freq='M') - exp = np.array([False, False, False, True], dtype=np.bool) - self._check(idx, f, exp) - - f = lambda x: pd.Period('2011-03', freq='M') >= x - exp = np.array([True, True, True, False], dtype=np.bool) - self._check(idx, f, exp) - - def test_pi_comp_period_nat(self): - idx = PeriodIndex(['2011-01', 'NaT', '2011-03', - '2011-04'], freq='M', name='idx') - - f = lambda x: x == pd.Period('2011-03', freq='M') - exp = np.array([False, False, True, False], dtype=np.bool) - self._check(idx, f, exp) - f = lambda x: pd.Period('2011-03', freq='M') == x - self._check(idx, f, exp) - - f = lambda x: x == tslib.NaT - exp = np.array([False, False, False, False], dtype=np.bool) - self._check(idx, f, exp) - f = lambda x: tslib.NaT == x - self._check(idx, f, exp) - - f = lambda x: x != pd.Period('2011-03', freq='M') - exp = np.array([True, True, False, True], dtype=np.bool) - self._check(idx, f, exp) - f = lambda x: pd.Period('2011-03', freq='M') != x - self._check(idx, f, exp) - - f = lambda x: x != tslib.NaT - exp = np.array([True, True, True, True], dtype=np.bool) - self._check(idx, f, exp) - f = lambda x: tslib.NaT != x - self._check(idx, f, exp) - - f = lambda x: pd.Period('2011-03', freq='M') >= x - exp = np.array([True, False, True, False], dtype=np.bool) - self._check(idx, f, exp) - - f = lambda x: x < pd.Period('2011-03', freq='M') - exp = np.array([True, False, False, False], dtype=np.bool) - self._check(idx, f, exp) - - f = lambda x: x > tslib.NaT - exp = np.array([False, False, False, False], dtype=np.bool) - self._check(idx, f, exp) - - f = lambda x: tslib.NaT >= x - exp = np.array([False, False, False, False], dtype=np.bool) - self._check(idx, f, exp) - - -class TestPeriodRepresentation(tm.TestCase): - """ - Wish to match NumPy units - """ - - def test_annual(self): - self._check_freq('A', 1970) - - def test_monthly(self): - self._check_freq('M', '1970-01') - - def test_weekly(self): - self._check_freq('W-THU', '1970-01-01') - - def test_daily(self): - self._check_freq('D', '1970-01-01') - - def test_business_daily(self): - self._check_freq('B', '1970-01-01') - - def test_hourly(self): - self._check_freq('H', '1970-01-01') - - def test_minutely(self): - self._check_freq('T', '1970-01-01') - - def test_secondly(self): - self._check_freq('S', '1970-01-01') - - def test_millisecondly(self): - self._check_freq('L', '1970-01-01') - - def test_microsecondly(self): - self._check_freq('U', '1970-01-01') - - def test_nanosecondly(self): - self._check_freq('N', '1970-01-01') - - def _check_freq(self, freq, base_date): - rng = PeriodIndex(start=base_date, periods=10, freq=freq) - exp = np.arange(10, dtype=np.int64) - self.assert_numpy_array_equal(rng._values, exp) - 
self.assert_numpy_array_equal(rng.asi8, exp) - - def test_negone_ordinals(self): - freqs = ['A', 'M', 'Q', 'D', 'H', 'T', 'S'] - - period = Period(ordinal=-1, freq='D') - for freq in freqs: - repr(period.asfreq(freq)) - - for freq in freqs: - period = Period(ordinal=-1, freq=freq) - repr(period) - self.assertEqual(period.year, 1969) - - period = Period(ordinal=-1, freq='B') - repr(period) - period = Period(ordinal=-1, freq='W') - repr(period) - - -class TestComparisons(tm.TestCase): - def setUp(self): - self.january1 = Period('2000-01', 'M') - self.january2 = Period('2000-01', 'M') - self.february = Period('2000-02', 'M') - self.march = Period('2000-03', 'M') - self.day = Period('2012-01-01', 'D') - - def test_equal(self): - self.assertEqual(self.january1, self.january2) - - def test_equal_Raises_Value(self): - with tm.assertRaises(period.IncompatibleFrequency): - self.january1 == self.day - - def test_notEqual(self): - self.assertNotEqual(self.january1, 1) - self.assertNotEqual(self.january1, self.february) - - def test_greater(self): - self.assertTrue(self.february > self.january1) - - def test_greater_Raises_Value(self): - with tm.assertRaises(period.IncompatibleFrequency): - self.january1 > self.day - - def test_greater_Raises_Type(self): - with tm.assertRaises(TypeError): - self.january1 > 1 - - def test_greaterEqual(self): - self.assertTrue(self.january1 >= self.january2) - - def test_greaterEqual_Raises_Value(self): - with tm.assertRaises(period.IncompatibleFrequency): - self.january1 >= self.day - - with tm.assertRaises(TypeError): - print(self.january1 >= 1) - - def test_smallerEqual(self): - self.assertTrue(self.january1 <= self.january2) - - def test_smallerEqual_Raises_Value(self): - with tm.assertRaises(period.IncompatibleFrequency): - self.january1 <= self.day - - def test_smallerEqual_Raises_Type(self): - with tm.assertRaises(TypeError): - self.january1 <= 1 - - def test_smaller(self): - self.assertTrue(self.january1 < self.february) - - def test_smaller_Raises_Value(self): - with tm.assertRaises(period.IncompatibleFrequency): - self.january1 < self.day - - def test_smaller_Raises_Type(self): - with tm.assertRaises(TypeError): - self.january1 < 1 - - def test_sort(self): - periods = [self.march, self.january1, self.february] - correctPeriods = [self.january1, self.february, self.march] - self.assertEqual(sorted(periods), correctPeriods) - - def test_period_nat_comp(self): - p_nat = Period('NaT', freq='D') - p = Period('2011-01-01', freq='D') - - nat = pd.Timestamp('NaT') - t = pd.Timestamp('2011-01-01') - # confirm Period('NaT') works identically to Timestamp('NaT') - for left, right in [(p_nat, p), (p, p_nat), (p_nat, p_nat), (nat, t), - (t, nat), (nat, nat)]: - self.assertEqual(left < right, False) - self.assertEqual(left > right, False) - self.assertEqual(left == right, False) - self.assertEqual(left != right, True) - self.assertEqual(left <= right, False) - self.assertEqual(left >= right, False) - - def test_pi_pi_comp(self): - - for freq in ['M', '2M', '3M']: - base = PeriodIndex(['2011-01', '2011-02', - '2011-03', '2011-04'], freq=freq) - p = Period('2011-02', freq=freq) - - exp = np.array([False, True, False, False]) - self.assert_numpy_array_equal(base == p, exp) - self.assert_numpy_array_equal(p == base, exp) - - exp = np.array([True, False, True, True]) - self.assert_numpy_array_equal(base != p, exp) - self.assert_numpy_array_equal(p != base, exp) - - exp = np.array([False, False, True, True]) - self.assert_numpy_array_equal(base > p, exp) - 
self.assert_numpy_array_equal(p < base, exp) - - exp = np.array([True, False, False, False]) - self.assert_numpy_array_equal(base < p, exp) - self.assert_numpy_array_equal(p > base, exp) - - exp = np.array([False, True, True, True]) - self.assert_numpy_array_equal(base >= p, exp) - self.assert_numpy_array_equal(p <= base, exp) - - exp = np.array([True, True, False, False]) - self.assert_numpy_array_equal(base <= p, exp) - self.assert_numpy_array_equal(p >= base, exp) - - idx = PeriodIndex(['2011-02', '2011-01', '2011-03', - '2011-05'], freq=freq) - - exp = np.array([False, False, True, False]) - self.assert_numpy_array_equal(base == idx, exp) - - exp = np.array([True, True, False, True]) - self.assert_numpy_array_equal(base != idx, exp) - - exp = np.array([False, True, False, False]) - self.assert_numpy_array_equal(base > idx, exp) - - exp = np.array([True, False, False, True]) - self.assert_numpy_array_equal(base < idx, exp) - - exp = np.array([False, True, True, False]) - self.assert_numpy_array_equal(base >= idx, exp) - - exp = np.array([True, False, True, True]) - self.assert_numpy_array_equal(base <= idx, exp) - - # different base freq - msg = "Input has different freq=A-DEC from PeriodIndex" - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - base <= Period('2011', freq='A') - - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - Period('2011', freq='A') >= base - - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - idx = PeriodIndex(['2011', '2012', '2013', '2014'], freq='A') - base <= idx - - # different mult - msg = "Input has different freq=4M from PeriodIndex" - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - base <= Period('2011', freq='4M') - - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - Period('2011', freq='4M') >= base - - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - idx = PeriodIndex(['2011', '2012', '2013', '2014'], freq='4M') - base <= idx - - def test_pi_nat_comp(self): - for freq in ['M', '2M', '3M']: - idx1 = PeriodIndex( - ['2011-01', '2011-02', 'NaT', '2011-05'], freq=freq) - - result = idx1 > Period('2011-02', freq=freq) - exp = np.array([False, False, False, True]) - self.assert_numpy_array_equal(result, exp) - result = Period('2011-02', freq=freq) < idx1 - self.assert_numpy_array_equal(result, exp) - - result = idx1 == Period('NaT', freq=freq) - exp = np.array([False, False, False, False]) - self.assert_numpy_array_equal(result, exp) - result = Period('NaT', freq=freq) == idx1 - self.assert_numpy_array_equal(result, exp) - - result = idx1 != Period('NaT', freq=freq) - exp = np.array([True, True, True, True]) - self.assert_numpy_array_equal(result, exp) - result = Period('NaT', freq=freq) != idx1 - self.assert_numpy_array_equal(result, exp) - - idx2 = PeriodIndex(['2011-02', '2011-01', '2011-04', - 'NaT'], freq=freq) - result = idx1 < idx2 - exp = np.array([True, False, False, False]) - self.assert_numpy_array_equal(result, exp) - - result = idx1 == idx2 - exp = np.array([False, False, False, False]) - self.assert_numpy_array_equal(result, exp) - - result = idx1 != idx2 - exp = np.array([True, True, True, True]) - self.assert_numpy_array_equal(result, exp) - - result = idx1 == idx1 - exp = np.array([True, True, False, True]) - self.assert_numpy_array_equal(result, exp) - - result = idx1 != idx1 - exp = np.array([False, False, True, False]) - self.assert_numpy_array_equal(result, exp) - - diff = PeriodIndex(['2011-02', '2011-01', '2011-04', - 'NaT'], freq='4M') - msg 
= "Input has different freq=4M from PeriodIndex" - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - idx1 > diff - - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - idx1 == diff - - -class TestSeriesPeriod(tm.TestCase): - - def setUp(self): - self.series = Series(period_range('2000-01-01', periods=10, freq='D')) - - def test_auto_conversion(self): - series = Series(list(period_range('2000-01-01', periods=10, freq='D'))) - self.assertEqual(series.dtype, 'object') - - series = pd.Series([pd.Period('2011-01-01', freq='D'), - pd.Period('2011-02-01', freq='D')]) - self.assertEqual(series.dtype, 'object') - - def test_getitem(self): - self.assertEqual(self.series[1], pd.Period('2000-01-02', freq='D')) - - result = self.series[[2, 4]] - exp = pd.Series([pd.Period('2000-01-03', freq='D'), - pd.Period('2000-01-05', freq='D')], - index=[2, 4]) - self.assert_series_equal(result, exp) - self.assertEqual(result.dtype, 'object') - - def test_constructor_cant_cast_period(self): - with tm.assertRaises(TypeError): - Series(period_range('2000-01-01', periods=10, freq='D'), - dtype=float) - - def test_constructor_cast_object(self): - s = Series(period_range('1/1/2000', periods=10), dtype=object) - exp = Series(period_range('1/1/2000', periods=10)) - tm.assert_series_equal(s, exp) - - def test_isnull(self): - # GH 13737 - s = Series([pd.Period('2011-01', freq='M'), - pd.Period('NaT', freq='M')]) - tm.assert_series_equal(s.isnull(), Series([False, True])) - tm.assert_series_equal(s.notnull(), Series([True, False])) - - def test_fillna(self): - # GH 13737 - s = Series([pd.Period('2011-01', freq='M'), - pd.Period('NaT', freq='M')]) - - res = s.fillna(pd.Period('2012-01', freq='M')) - exp = Series([pd.Period('2011-01', freq='M'), - pd.Period('2012-01', freq='M')]) - tm.assert_series_equal(res, exp) - self.assertEqual(res.dtype, 'object') - - res = s.fillna('XXX') - exp = Series([pd.Period('2011-01', freq='M'), 'XXX']) - tm.assert_series_equal(res, exp) - self.assertEqual(res.dtype, 'object') - - def test_dropna(self): - # GH 13737 - s = Series([pd.Period('2011-01', freq='M'), - pd.Period('NaT', freq='M')]) - tm.assert_series_equal(s.dropna(), - Series([pd.Period('2011-01', freq='M')])) - - def test_series_comparison_scalars(self): - val = pd.Period('2000-01-04', freq='D') - result = self.series > val - expected = pd.Series([x > val for x in self.series]) - tm.assert_series_equal(result, expected) - - val = self.series[5] - result = self.series > val - expected = pd.Series([x > val for x in self.series]) - tm.assert_series_equal(result, expected) - - def test_between(self): - left, right = self.series[[2, 7]] - result = self.series.between(left, right) - expected = (self.series >= left) & (self.series <= right) - tm.assert_series_equal(result, expected) - - # --------------------------------------------------------------------- - # NaT support - - """ - # ToDo: Enable when support period dtype - def test_NaT_scalar(self): - series = Series([0, 1000, 2000, iNaT], dtype='period[D]') - - val = series[3] - self.assertTrue(isnull(val)) - - series[2] = val - self.assertTrue(isnull(series[2])) - - def test_NaT_cast(self): - result = Series([np.nan]).astype('period[D]') - expected = Series([NaT]) - tm.assert_series_equal(result, expected) - """ - - def test_set_none_nan(self): - # currently Period is stored as object dtype, not as NaT - self.series[3] = None - self.assertIs(self.series[3], None) - - self.series[3:5] = None - self.assertIs(self.series[4], None) - - self.series[5] = np.nan - 
self.assertTrue(np.isnan(self.series[5])) - - self.series[5:7] = np.nan - self.assertTrue(np.isnan(self.series[6])) - - def test_intercept_astype_object(self): - expected = self.series.astype('object') - - df = DataFrame({'a': self.series, - 'b': np.random.randn(len(self.series))}) - - result = df.values.squeeze() - self.assertTrue((result[:, 0] == expected.values).all()) - - df = DataFrame({'a': self.series, 'b': ['foo'] * len(self.series)}) - - result = df.values.squeeze() - self.assertTrue((result[:, 0] == expected.values).all()) - - def test_ops_series_timedelta(self): - # GH 13043 - s = pd.Series([pd.Period('2015-01-01', freq='D'), - pd.Period('2015-01-02', freq='D')], name='xxx') - self.assertEqual(s.dtype, object) - - exp = pd.Series([pd.Period('2015-01-02', freq='D'), - pd.Period('2015-01-03', freq='D')], name='xxx') - tm.assert_series_equal(s + pd.Timedelta('1 days'), exp) - tm.assert_series_equal(pd.Timedelta('1 days') + s, exp) - - tm.assert_series_equal(s + pd.tseries.offsets.Day(), exp) - tm.assert_series_equal(pd.tseries.offsets.Day() + s, exp) - - def test_ops_series_period(self): - # GH 13043 - s = pd.Series([pd.Period('2015-01-01', freq='D'), - pd.Period('2015-01-02', freq='D')], name='xxx') - self.assertEqual(s.dtype, object) - - p = pd.Period('2015-01-10', freq='D') - # dtype will be object because of original dtype - exp = pd.Series([9, 8], name='xxx', dtype=object) - tm.assert_series_equal(p - s, exp) - tm.assert_series_equal(s - p, -exp) - - s2 = pd.Series([pd.Period('2015-01-05', freq='D'), - pd.Period('2015-01-04', freq='D')], name='xxx') - self.assertEqual(s2.dtype, object) - - exp = pd.Series([4, 2], name='xxx', dtype=object) - tm.assert_series_equal(s2 - s, exp) - tm.assert_series_equal(s - s2, -exp) - - def test_comp_series_period_scalar(self): - # GH 13200 - for freq in ['M', '2M', '3M']: - base = Series([Period(x, freq=freq) for x in - ['2011-01', '2011-02', '2011-03', '2011-04']]) - p = Period('2011-02', freq=freq) - - exp = pd.Series([False, True, False, False]) - tm.assert_series_equal(base == p, exp) - tm.assert_series_equal(p == base, exp) - - exp = pd.Series([True, False, True, True]) - tm.assert_series_equal(base != p, exp) - tm.assert_series_equal(p != base, exp) - - exp = pd.Series([False, False, True, True]) - tm.assert_series_equal(base > p, exp) - tm.assert_series_equal(p < base, exp) - - exp = pd.Series([True, False, False, False]) - tm.assert_series_equal(base < p, exp) - tm.assert_series_equal(p > base, exp) - - exp = pd.Series([False, True, True, True]) - tm.assert_series_equal(base >= p, exp) - tm.assert_series_equal(p <= base, exp) - - exp = pd.Series([True, True, False, False]) - tm.assert_series_equal(base <= p, exp) - tm.assert_series_equal(p >= base, exp) - - # different base freq - msg = "Input has different freq=A-DEC from Period" - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - base <= Period('2011', freq='A') - - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - Period('2011', freq='A') >= base - - def test_comp_series_period_series(self): - # GH 13200 - for freq in ['M', '2M', '3M']: - base = Series([Period(x, freq=freq) for x in - ['2011-01', '2011-02', '2011-03', '2011-04']]) - - s = Series([Period(x, freq=freq) for x in - ['2011-02', '2011-01', '2011-03', '2011-05']]) - - exp = Series([False, False, True, False]) - tm.assert_series_equal(base == s, exp) - - exp = Series([True, True, False, True]) - tm.assert_series_equal(base != s, exp) - - exp = Series([False, True, False, False]) - 
tm.assert_series_equal(base > s, exp) - - exp = Series([True, False, False, True]) - tm.assert_series_equal(base < s, exp) - - exp = Series([False, True, True, False]) - tm.assert_series_equal(base >= s, exp) - - exp = Series([True, False, True, True]) - tm.assert_series_equal(base <= s, exp) - - s2 = Series([Period(x, freq='A') for x in - ['2011', '2011', '2011', '2011']]) - - # different base freq - msg = "Input has different freq=A-DEC from Period" - with tm.assertRaisesRegexp(period.IncompatibleFrequency, msg): - base <= s2 - - def test_comp_series_period_object(self): - # GH 13200 - base = Series([Period('2011', freq='A'), Period('2011-02', freq='M'), - Period('2013', freq='A'), Period('2011-04', freq='M')]) - - s = Series([Period('2012', freq='A'), Period('2011-01', freq='M'), - Period('2013', freq='A'), Period('2011-05', freq='M')]) - - exp = Series([False, False, True, False]) - tm.assert_series_equal(base == s, exp) - - exp = Series([True, True, False, True]) - tm.assert_series_equal(base != s, exp) - - exp = Series([False, True, False, False]) - tm.assert_series_equal(base > s, exp) - - exp = Series([True, False, False, True]) - tm.assert_series_equal(base < s, exp) - - exp = Series([False, True, True, False]) - tm.assert_series_equal(base >= s, exp) - - exp = Series([True, False, True, True]) - tm.assert_series_equal(base <= s, exp) - - def test_ops_frame_period(self): - # GH 13043 - df = pd.DataFrame({'A': [pd.Period('2015-01', freq='M'), - pd.Period('2015-02', freq='M')], - 'B': [pd.Period('2014-01', freq='M'), - pd.Period('2014-02', freq='M')]}) - self.assertEqual(df['A'].dtype, object) - self.assertEqual(df['B'].dtype, object) - - p = pd.Period('2015-03', freq='M') - # dtype will be object because of original dtype - exp = pd.DataFrame({'A': np.array([2, 1], dtype=object), - 'B': np.array([14, 13], dtype=object)}) - tm.assert_frame_equal(p - df, exp) - tm.assert_frame_equal(df - p, -exp) - - df2 = pd.DataFrame({'A': [pd.Period('2015-05', freq='M'), - pd.Period('2015-06', freq='M')], - 'B': [pd.Period('2015-05', freq='M'), - pd.Period('2015-06', freq='M')]}) - self.assertEqual(df2['A'].dtype, object) - self.assertEqual(df2['B'].dtype, object) - - exp = pd.DataFrame({'A': np.array([4, 4], dtype=object), - 'B': np.array([16, 16], dtype=object)}) - tm.assert_frame_equal(df2 - df, exp) - tm.assert_frame_equal(df - df2, -exp) - - -class TestPeriodField(tm.TestCase): - def test_get_period_field_raises_on_out_of_range(self): - self.assertRaises(ValueError, _period.get_period_field, -1, 0, 0) - - def test_get_period_field_array_raises_on_out_of_range(self): - self.assertRaises(ValueError, _period.get_period_field_arr, -1, - np.empty(1), 0) - - -if __name__ == '__main__': - import nose - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tseries/tests/test_timedeltas.py b/pandas/tseries/tests/test_timedeltas.py deleted file mode 100644 index 6efa024d81b98..0000000000000 --- a/pandas/tseries/tests/test_timedeltas.py +++ /dev/null @@ -1,2058 +0,0 @@ -# pylint: disable-msg=E1101,W0612 - -from __future__ import division -from datetime import timedelta, time -import nose - -from distutils.version import LooseVersion -import numpy as np -import pandas as pd - -from pandas import (Index, Series, DataFrame, Timestamp, Timedelta, - TimedeltaIndex, isnull, date_range, - timedelta_range, Int64Index) -from pandas.compat import range -from pandas import compat, to_timedelta, tslib -from pandas.tseries.timedeltas import 
_coerce_scalar_to_timedelta_type as ct -from pandas.util.testing import (assert_series_equal, assert_frame_equal, - assert_almost_equal, assert_index_equal) -from pandas.tseries.offsets import Day, Second -import pandas.util.testing as tm -from numpy.random import randn -from pandas import _np_version_under1p8 - -iNaT = tslib.iNaT - - -class TestTimedeltas(tm.TestCase): - _multiprocess_can_split_ = True - - def setUp(self): - pass - - def test_get_loc_nat(self): - tidx = TimedeltaIndex(['1 days 01:00:00', 'NaT', '2 days 01:00:00']) - - self.assertEqual(tidx.get_loc(pd.NaT), 1) - self.assertEqual(tidx.get_loc(None), 1) - self.assertEqual(tidx.get_loc(float('nan')), 1) - self.assertEqual(tidx.get_loc(np.nan), 1) - - def test_contains(self): - # Checking for any NaT-like objects - # GH 13603 - td = to_timedelta(range(5), unit='d') + pd.offsets.Hour(1) - for v in [pd.NaT, None, float('nan'), np.nan]: - self.assertFalse((v in td)) - - td = to_timedelta([pd.NaT]) - for v in [pd.NaT, None, float('nan'), np.nan]: - self.assertTrue((v in td)) - - def test_construction(self): - - expected = np.timedelta64(10, 'D').astype('m8[ns]').view('i8') - self.assertEqual(Timedelta(10, unit='d').value, expected) - self.assertEqual(Timedelta(10.0, unit='d').value, expected) - self.assertEqual(Timedelta('10 days').value, expected) - self.assertEqual(Timedelta(days=10).value, expected) - self.assertEqual(Timedelta(days=10.0).value, expected) - - expected += np.timedelta64(10, 's').astype('m8[ns]').view('i8') - self.assertEqual(Timedelta('10 days 00:00:10').value, expected) - self.assertEqual(Timedelta(days=10, seconds=10).value, expected) - self.assertEqual( - Timedelta(days=10, milliseconds=10 * 1000).value, expected) - self.assertEqual( - Timedelta(days=10, microseconds=10 * 1000 * 1000).value, expected) - - # test construction with np dtypes - # GH 8757 - timedelta_kwargs = {'days': 'D', - 'seconds': 's', - 'microseconds': 'us', - 'milliseconds': 'ms', - 'minutes': 'm', - 'hours': 'h', - 'weeks': 'W'} - npdtypes = [np.int64, np.int32, np.int16, np.float64, np.float32, - np.float16] - for npdtype in npdtypes: - for pykwarg, npkwarg in timedelta_kwargs.items(): - expected = np.timedelta64(1, - npkwarg).astype('m8[ns]').view('i8') - self.assertEqual( - Timedelta(**{pykwarg: npdtype(1)}).value, expected) - - # rounding cases - self.assertEqual(Timedelta(82739999850000).value, 82739999850000) - self.assertTrue('0 days 22:58:59.999850' in str(Timedelta( - 82739999850000))) - self.assertEqual(Timedelta(123072001000000).value, 123072001000000) - self.assertTrue('1 days 10:11:12.001' in str(Timedelta( - 123072001000000))) - - # string conversion with/without leading zero - # GH 9570 - self.assertEqual(Timedelta('0:00:00'), timedelta(hours=0)) - self.assertEqual(Timedelta('00:00:00'), timedelta(hours=0)) - self.assertEqual(Timedelta('-1:00:00'), -timedelta(hours=1)) - self.assertEqual(Timedelta('-01:00:00'), -timedelta(hours=1)) - - # more strings & abbrevs - # GH 8190 - self.assertEqual(Timedelta('1 h'), timedelta(hours=1)) - self.assertEqual(Timedelta('1 hour'), timedelta(hours=1)) - self.assertEqual(Timedelta('1 hr'), timedelta(hours=1)) - self.assertEqual(Timedelta('1 hours'), timedelta(hours=1)) - self.assertEqual(Timedelta('-1 hours'), -timedelta(hours=1)) - self.assertEqual(Timedelta('1 m'), timedelta(minutes=1)) - self.assertEqual(Timedelta('1.5 m'), timedelta(seconds=90)) - self.assertEqual(Timedelta('1 minute'), timedelta(minutes=1)) - self.assertEqual(Timedelta('1 minutes'), timedelta(minutes=1)) - 
self.assertEqual(Timedelta('1 s'), timedelta(seconds=1)) - self.assertEqual(Timedelta('1 second'), timedelta(seconds=1)) - self.assertEqual(Timedelta('1 seconds'), timedelta(seconds=1)) - self.assertEqual(Timedelta('1 ms'), timedelta(milliseconds=1)) - self.assertEqual(Timedelta('1 milli'), timedelta(milliseconds=1)) - self.assertEqual(Timedelta('1 millisecond'), timedelta(milliseconds=1)) - self.assertEqual(Timedelta('1 us'), timedelta(microseconds=1)) - self.assertEqual(Timedelta('1 micros'), timedelta(microseconds=1)) - self.assertEqual(Timedelta('1 microsecond'), timedelta(microseconds=1)) - self.assertEqual(Timedelta('1.5 microsecond'), - Timedelta('00:00:00.000001500')) - self.assertEqual(Timedelta('1 ns'), Timedelta('00:00:00.000000001')) - self.assertEqual(Timedelta('1 nano'), Timedelta('00:00:00.000000001')) - self.assertEqual(Timedelta('1 nanosecond'), - Timedelta('00:00:00.000000001')) - - # combos - self.assertEqual(Timedelta('10 days 1 hour'), - timedelta(days=10, hours=1)) - self.assertEqual(Timedelta('10 days 1 h'), timedelta(days=10, hours=1)) - self.assertEqual(Timedelta('10 days 1 h 1m 1s'), timedelta( - days=10, hours=1, minutes=1, seconds=1)) - self.assertEqual(Timedelta('-10 days 1 h 1m 1s'), - - timedelta(days=10, hours=1, minutes=1, seconds=1)) - self.assertEqual(Timedelta('-10 days 1 h 1m 1s'), - - timedelta(days=10, hours=1, minutes=1, seconds=1)) - self.assertEqual(Timedelta('-10 days 1 h 1m 1s 3us'), - - timedelta(days=10, hours=1, minutes=1, - seconds=1, microseconds=3)) - self.assertEqual(Timedelta('-10 days 1 h 1.5m 1s 3us'), - - timedelta(days=10, hours=1, minutes=1, - seconds=31, microseconds=3)) - - # currently invalid as it has a - on the hhmmdd part (only allowed on - # the days) - self.assertRaises(ValueError, - lambda: Timedelta('-10 days -1 h 1.5m 1s 3us')) - - # only leading neg signs are allowed - self.assertRaises(ValueError, - lambda: Timedelta('10 days -1 h 1.5m 1s 3us')) - - # no units specified - self.assertRaises(ValueError, lambda: Timedelta('3.1415')) - - # invalid construction - tm.assertRaisesRegexp(ValueError, "cannot construct a Timedelta", - lambda: Timedelta()) - tm.assertRaisesRegexp(ValueError, "unit abbreviation w/o a number", - lambda: Timedelta('foo')) - tm.assertRaisesRegexp(ValueError, - "cannot construct a Timedelta from the passed " - "arguments, allowed keywords are ", - lambda: Timedelta(day=10)) - - # roundtripping both for string and value - for v in ['1s', '-1s', '1us', '-1us', '1 day', '-1 day', - '-23:59:59.999999', '-1 days +23:59:59.999999', '-1ns', - '1ns', '-23:59:59.999999999']: - - td = Timedelta(v) - self.assertEqual(Timedelta(td.value), td) - - # str does not normally display nanos - if not td.nanoseconds: - self.assertEqual(Timedelta(str(td)), td) - self.assertEqual(Timedelta(td._repr_base(format='all')), td) - - # floats - expected = np.timedelta64( - 10, 's').astype('m8[ns]').view('i8') + np.timedelta64( - 500, 'ms').astype('m8[ns]').view('i8') - self.assertEqual(Timedelta(10.5, unit='s').value, expected) - - # nat - self.assertEqual(Timedelta('').value, iNaT) - self.assertEqual(Timedelta('nat').value, iNaT) - self.assertEqual(Timedelta('NAT').value, iNaT) - self.assertEqual(Timedelta(None).value, iNaT) - self.assertEqual(Timedelta(np.nan).value, iNaT) - self.assertTrue(isnull(Timedelta('nat'))) - - # offset - self.assertEqual(to_timedelta(pd.offsets.Hour(2)), - Timedelta('0 days, 02:00:00')) - self.assertEqual(Timedelta(pd.offsets.Hour(2)), - Timedelta('0 days, 02:00:00')) - 
self.assertEqual(Timedelta(pd.offsets.Second(2)), - Timedelta('0 days, 00:00:02')) - - # unicode - # GH 11995 - expected = Timedelta('1H') - result = pd.Timedelta(u'1H') - self.assertEqual(result, expected) - self.assertEqual(to_timedelta(pd.offsets.Hour(2)), - Timedelta(u'0 days, 02:00:00')) - - self.assertRaises(ValueError, lambda: Timedelta(u'foo bar')) - - def test_round(self): - - t1 = Timedelta('1 days 02:34:56.789123456') - t2 = Timedelta('-1 days 02:34:56.789123456') - - for (freq, s1, s2) in [('N', t1, t2), - ('U', Timedelta('1 days 02:34:56.789123000'), - Timedelta('-1 days 02:34:56.789123000')), - ('L', Timedelta('1 days 02:34:56.789000000'), - Timedelta('-1 days 02:34:56.789000000')), - ('S', Timedelta('1 days 02:34:57'), - Timedelta('-1 days 02:34:57')), - ('2S', Timedelta('1 days 02:34:56'), - Timedelta('-1 days 02:34:56')), - ('5S', Timedelta('1 days 02:34:55'), - Timedelta('-1 days 02:34:55')), - ('T', Timedelta('1 days 02:35:00'), - Timedelta('-1 days 02:35:00')), - ('12T', Timedelta('1 days 02:36:00'), - Timedelta('-1 days 02:36:00')), - ('H', Timedelta('1 days 03:00:00'), - Timedelta('-1 days 03:00:00')), - ('d', Timedelta('1 days'), - Timedelta('-1 days'))]: - r1 = t1.round(freq) - self.assertEqual(r1, s1) - r2 = t2.round(freq) - self.assertEqual(r2, s2) - - # invalid - for freq in ['Y', 'M', 'foobar']: - self.assertRaises(ValueError, lambda: t1.round(freq)) - - t1 = timedelta_range('1 days', periods=3, freq='1 min 2 s 3 us') - t2 = -1 * t1 - t1a = timedelta_range('1 days', periods=3, freq='1 min 2 s') - t1c = pd.TimedeltaIndex([1, 1, 1], unit='D') - - # note that negative times round DOWN! so don't give whole numbers - for (freq, s1, s2) in [('N', t1, t2), - ('U', t1, t2), - ('L', t1a, - TimedeltaIndex(['-1 days +00:00:00', - '-2 days +23:58:58', - '-2 days +23:57:56'], - dtype='timedelta64[ns]', - freq=None) - ), - ('S', t1a, - TimedeltaIndex(['-1 days +00:00:00', - '-2 days +23:58:58', - '-2 days +23:57:56'], - dtype='timedelta64[ns]', - freq=None) - ), - ('12T', t1c, - TimedeltaIndex(['-1 days', - '-1 days', - '-1 days'], - dtype='timedelta64[ns]', - freq=None) - ), - ('H', t1c, - TimedeltaIndex(['-1 days', - '-1 days', - '-1 days'], - dtype='timedelta64[ns]', - freq=None) - ), - ('d', t1c, - pd.TimedeltaIndex([-1, -1, -1], unit='D') - )]: - - r1 = t1.round(freq) - tm.assert_index_equal(r1, s1) - r2 = t2.round(freq) - tm.assert_index_equal(r2, s2) - - # invalid - for freq in ['Y', 'M', 'foobar']: - self.assertRaises(ValueError, lambda: t1.round(freq)) - - def test_repr(self): - - self.assertEqual(repr(Timedelta(10, unit='d')), - "Timedelta('10 days 00:00:00')") - self.assertEqual(repr(Timedelta(10, unit='s')), - "Timedelta('0 days 00:00:10')") - self.assertEqual(repr(Timedelta(10, unit='ms')), - "Timedelta('0 days 00:00:00.010000')") - self.assertEqual(repr(Timedelta(-10, unit='ms')), - "Timedelta('-1 days +23:59:59.990000')") - - def test_identity(self): - - td = Timedelta(10, unit='d') - self.assertTrue(isinstance(td, Timedelta)) - self.assertTrue(isinstance(td, timedelta)) - - def test_conversion(self): - - for td in [Timedelta(10, unit='d'), - Timedelta('1 days, 10:11:12.012345')]: - pydt = td.to_pytimedelta() - self.assertTrue(td == Timedelta(pydt)) - self.assertEqual(td, pydt) - self.assertTrue(isinstance(pydt, timedelta) and not isinstance( - pydt, Timedelta)) - - self.assertEqual(td, np.timedelta64(td.value, 'ns')) - td64 = td.to_timedelta64() - self.assertEqual(td64, np.timedelta64(td.value, 'ns')) - self.assertEqual(td, td64) - 
self.assertTrue(isinstance(td64, np.timedelta64)) - - # this is NOT equal and cannot be round-tripped (because of the nanos) - td = Timedelta('1 days, 10:11:12.012345678') - self.assertTrue(td != td.to_pytimedelta()) - - def test_ops(self): - - td = Timedelta(10, unit='d') - self.assertEqual(-td, Timedelta(-10, unit='d')) - self.assertEqual(+td, Timedelta(10, unit='d')) - self.assertEqual(td - td, Timedelta(0, unit='ns')) - self.assertTrue((td - pd.NaT) is pd.NaT) - self.assertEqual(td + td, Timedelta(20, unit='d')) - self.assertTrue((td + pd.NaT) is pd.NaT) - self.assertEqual(td * 2, Timedelta(20, unit='d')) - self.assertTrue((td * pd.NaT) is pd.NaT) - self.assertEqual(td / 2, Timedelta(5, unit='d')) - self.assertEqual(abs(td), td) - self.assertEqual(abs(-td), td) - self.assertEqual(td / td, 1) - self.assertTrue((td / pd.NaT) is np.nan) - - # invert - self.assertEqual(-td, Timedelta('-10d')) - self.assertEqual(td * -1, Timedelta('-10d')) - self.assertEqual(-1 * td, Timedelta('-10d')) - self.assertEqual(abs(-td), Timedelta('10d')) - - # invalid - self.assertRaises(TypeError, lambda: Timedelta(11, unit='d') // 2) - - # invalid multiply with another timedelta - self.assertRaises(TypeError, lambda: td * td) - - # can't operate with integers - self.assertRaises(TypeError, lambda: td + 2) - self.assertRaises(TypeError, lambda: td - 2) - - def test_ops_offsets(self): - td = Timedelta(10, unit='d') - self.assertEqual(Timedelta(241, unit='h'), td + pd.offsets.Hour(1)) - self.assertEqual(Timedelta(241, unit='h'), pd.offsets.Hour(1) + td) - self.assertEqual(240, td / pd.offsets.Hour(1)) - self.assertEqual(1 / 240.0, pd.offsets.Hour(1) / td) - self.assertEqual(Timedelta(239, unit='h'), td - pd.offsets.Hour(1)) - self.assertEqual(Timedelta(-239, unit='h'), pd.offsets.Hour(1) - td) - - def test_freq_conversion(self): - - td = Timedelta('1 days 2 hours 3 ns') - result = td / np.timedelta64(1, 'D') - self.assertEqual(result, td.value / float(86400 * 1e9)) - result = td / np.timedelta64(1, 's') - self.assertEqual(result, td.value / float(1e9)) - result = td / np.timedelta64(1, 'ns') - self.assertEqual(result, td.value) - - def test_ops_ndarray(self): - td = Timedelta('1 day') - - # timedelta, timedelta - other = pd.to_timedelta(['1 day']).values - expected = pd.to_timedelta(['2 days']).values - self.assert_numpy_array_equal(td + other, expected) - if LooseVersion(np.__version__) >= '1.8': - self.assert_numpy_array_equal(other + td, expected) - self.assertRaises(TypeError, lambda: td + np.array([1])) - self.assertRaises(TypeError, lambda: np.array([1]) + td) - - expected = pd.to_timedelta(['0 days']).values - self.assert_numpy_array_equal(td - other, expected) - if LooseVersion(np.__version__) >= '1.8': - self.assert_numpy_array_equal(-other + td, expected) - self.assertRaises(TypeError, lambda: td - np.array([1])) - self.assertRaises(TypeError, lambda: np.array([1]) - td) - - expected = pd.to_timedelta(['2 days']).values - self.assert_numpy_array_equal(td * np.array([2]), expected) - self.assert_numpy_array_equal(np.array([2]) * td, expected) - self.assertRaises(TypeError, lambda: td * other) - self.assertRaises(TypeError, lambda: other * td) - - self.assert_numpy_array_equal(td / other, - np.array([1], dtype=np.float64)) - if LooseVersion(np.__version__) >= '1.8': - self.assert_numpy_array_equal(other / td, - np.array([1], dtype=np.float64)) - - # timedelta, datetime - other = pd.to_datetime(['2000-01-01']).values - expected = pd.to_datetime(['2000-01-02']).values - self.assert_numpy_array_equal(td + 
other, expected) - if LooseVersion(np.__version__) >= '1.8': - self.assert_numpy_array_equal(other + td, expected) - - expected = pd.to_datetime(['1999-12-31']).values - self.assert_numpy_array_equal(-td + other, expected) - if LooseVersion(np.__version__) >= '1.8': - self.assert_numpy_array_equal(other - td, expected) - - def test_ops_series(self): - # regression test for GH8813 - td = Timedelta('1 day') - other = pd.Series([1, 2]) - expected = pd.Series(pd.to_timedelta(['1 day', '2 days'])) - tm.assert_series_equal(expected, td * other) - tm.assert_series_equal(expected, other * td) - - def test_ops_series_object(self): - # GH 13043 - s = pd.Series([pd.Timestamp('2015-01-01', tz='US/Eastern'), - pd.Timestamp('2015-01-01', tz='Asia/Tokyo')], - name='xxx') - self.assertEqual(s.dtype, object) - - exp = pd.Series([pd.Timestamp('2015-01-02', tz='US/Eastern'), - pd.Timestamp('2015-01-02', tz='Asia/Tokyo')], - name='xxx') - tm.assert_series_equal(s + pd.Timedelta('1 days'), exp) - tm.assert_series_equal(pd.Timedelta('1 days') + s, exp) - - # object series & object series - s2 = pd.Series([pd.Timestamp('2015-01-03', tz='US/Eastern'), - pd.Timestamp('2015-01-05', tz='Asia/Tokyo')], - name='xxx') - self.assertEqual(s2.dtype, object) - exp = pd.Series([pd.Timedelta('2 days'), pd.Timedelta('4 days')], - name='xxx') - tm.assert_series_equal(s2 - s, exp) - tm.assert_series_equal(s - s2, -exp) - - s = pd.Series([pd.Timedelta('01:00:00'), pd.Timedelta('02:00:00')], - name='xxx', dtype=object) - self.assertEqual(s.dtype, object) - - exp = pd.Series([pd.Timedelta('01:30:00'), pd.Timedelta('02:30:00')], - name='xxx') - tm.assert_series_equal(s + pd.Timedelta('00:30:00'), exp) - tm.assert_series_equal(pd.Timedelta('00:30:00') + s, exp) - - def test_compare_timedelta_series(self): - # regression test for GH5963 - s = pd.Series([timedelta(days=1), timedelta(days=2)]) - actual = s > timedelta(days=1) - expected = pd.Series([False, True]) - tm.assert_series_equal(actual, expected) - - def test_compare_timedelta_ndarray(self): - # GH11835 - periods = [Timedelta('0 days 01:00:00'), Timedelta('0 days 01:00:00')] - arr = np.array(periods) - result = arr[0] > arr - expected = np.array([False, False]) - self.assert_numpy_array_equal(result, expected) - - def test_ops_notimplemented(self): - class Other: - pass - - other = Other() - - td = Timedelta('1 day') - self.assertTrue(td.__add__(other) is NotImplemented) - self.assertTrue(td.__sub__(other) is NotImplemented) - self.assertTrue(td.__truediv__(other) is NotImplemented) - self.assertTrue(td.__mul__(other) is NotImplemented) - self.assertTrue(td.__floordiv__(td) is NotImplemented) - - def test_ops_error_str(self): - # GH 13624 - td = Timedelta('1 day') - - for l, r in [(td, 'a'), ('a', td)]: - - with tm.assertRaises(TypeError): - l + r - - with tm.assertRaises(TypeError): - l > r - - self.assertFalse(l == r) - self.assertTrue(l != r) - - def test_fields(self): - def check(value): - # that we are int/long like - self.assertTrue(isinstance(value, (int, compat.long))) - - # compat to datetime.timedelta - rng = to_timedelta('1 days, 10:11:12') - self.assertEqual(rng.days, 1) - self.assertEqual(rng.seconds, 10 * 3600 + 11 * 60 + 12) - self.assertEqual(rng.microseconds, 0) - self.assertEqual(rng.nanoseconds, 0) - - self.assertRaises(AttributeError, lambda: rng.hours) - self.assertRaises(AttributeError, lambda: rng.minutes) - self.assertRaises(AttributeError, lambda: rng.milliseconds) - - # GH 10050 - check(rng.days) - check(rng.seconds) - check(rng.microseconds) - 
check(rng.nanoseconds) - - td = Timedelta('-1 days, 10:11:12') - self.assertEqual(abs(td), Timedelta('13:48:48')) - self.assertTrue(str(td) == "-1 days +10:11:12") - self.assertEqual(-td, Timedelta('0 days 13:48:48')) - self.assertEqual(-Timedelta('-1 days, 10:11:12').value, 49728000000000) - self.assertEqual(Timedelta('-1 days, 10:11:12').value, -49728000000000) - - rng = to_timedelta('-1 days, 10:11:12.100123456') - self.assertEqual(rng.days, -1) - self.assertEqual(rng.seconds, 10 * 3600 + 11 * 60 + 12) - self.assertEqual(rng.microseconds, 100 * 1000 + 123) - self.assertEqual(rng.nanoseconds, 456) - self.assertRaises(AttributeError, lambda: rng.hours) - self.assertRaises(AttributeError, lambda: rng.minutes) - self.assertRaises(AttributeError, lambda: rng.milliseconds) - - # components - tup = pd.to_timedelta(-1, 'us').components - self.assertEqual(tup.days, -1) - self.assertEqual(tup.hours, 23) - self.assertEqual(tup.minutes, 59) - self.assertEqual(tup.seconds, 59) - self.assertEqual(tup.milliseconds, 999) - self.assertEqual(tup.microseconds, 999) - self.assertEqual(tup.nanoseconds, 0) - - # GH 10050 - check(tup.days) - check(tup.hours) - check(tup.minutes) - check(tup.seconds) - check(tup.milliseconds) - check(tup.microseconds) - check(tup.nanoseconds) - - tup = Timedelta('-1 days 1 us').components - self.assertEqual(tup.days, -2) - self.assertEqual(tup.hours, 23) - self.assertEqual(tup.minutes, 59) - self.assertEqual(tup.seconds, 59) - self.assertEqual(tup.milliseconds, 999) - self.assertEqual(tup.microseconds, 999) - self.assertEqual(tup.nanoseconds, 0) - - def test_timedelta_range(self): - - expected = to_timedelta(np.arange(5), unit='D') - result = timedelta_range('0 days', periods=5, freq='D') - tm.assert_index_equal(result, expected) - - expected = to_timedelta(np.arange(11), unit='D') - result = timedelta_range('0 days', '10 days', freq='D') - tm.assert_index_equal(result, expected) - - expected = to_timedelta(np.arange(5), unit='D') + Second(2) + Day() - result = timedelta_range('1 days, 00:00:02', '5 days, 00:00:02', - freq='D') - tm.assert_index_equal(result, expected) - - expected = to_timedelta([1, 3, 5, 7, 9], unit='D') + Second(2) - result = timedelta_range('1 days, 00:00:02', periods=5, freq='2D') - tm.assert_index_equal(result, expected) - - expected = to_timedelta(np.arange(50), unit='T') * 30 - result = timedelta_range('0 days', freq='30T', periods=50) - tm.assert_index_equal(result, expected) - - # GH 11776 - arr = np.arange(10).reshape(2, 5) - df = pd.DataFrame(np.arange(10).reshape(2, 5)) - for arg in (arr, df): - with tm.assertRaisesRegexp(TypeError, "1-d array"): - to_timedelta(arg) - for errors in ['ignore', 'raise', 'coerce']: - with tm.assertRaisesRegexp(TypeError, "1-d array"): - to_timedelta(arg, errors=errors) - - # issue10583 - df = pd.DataFrame(np.random.normal(size=(10, 4))) - df.index = pd.timedelta_range(start='0s', periods=10, freq='s') - expected = df.loc[pd.Timedelta('0s'):, :] - result = df.loc['0s':, :] - assert_frame_equal(expected, result) - - def test_numeric_conversions(self): - self.assertEqual(ct(0), np.timedelta64(0, 'ns')) - self.assertEqual(ct(10), np.timedelta64(10, 'ns')) - self.assertEqual(ct(10, unit='ns'), np.timedelta64( - 10, 'ns').astype('m8[ns]')) - - self.assertEqual(ct(10, unit='us'), np.timedelta64( - 10, 'us').astype('m8[ns]')) - self.assertEqual(ct(10, unit='ms'), np.timedelta64( - 10, 'ms').astype('m8[ns]')) - self.assertEqual(ct(10, unit='s'), np.timedelta64( - 10, 's').astype('m8[ns]')) - self.assertEqual(ct(10, unit='d'), 
np.timedelta64( - 10, 'D').astype('m8[ns]')) - - def test_timedelta_conversions(self): - self.assertEqual(ct(timedelta(seconds=1)), - np.timedelta64(1, 's').astype('m8[ns]')) - self.assertEqual(ct(timedelta(microseconds=1)), - np.timedelta64(1, 'us').astype('m8[ns]')) - self.assertEqual(ct(timedelta(days=1)), - np.timedelta64(1, 'D').astype('m8[ns]')) - - def test_short_format_converters(self): - def conv(v): - return v.astype('m8[ns]') - - self.assertEqual(ct('10'), np.timedelta64(10, 'ns')) - self.assertEqual(ct('10ns'), np.timedelta64(10, 'ns')) - self.assertEqual(ct('100'), np.timedelta64(100, 'ns')) - self.assertEqual(ct('100ns'), np.timedelta64(100, 'ns')) - - self.assertEqual(ct('1000'), np.timedelta64(1000, 'ns')) - self.assertEqual(ct('1000ns'), np.timedelta64(1000, 'ns')) - self.assertEqual(ct('1000NS'), np.timedelta64(1000, 'ns')) - - self.assertEqual(ct('10us'), np.timedelta64(10000, 'ns')) - self.assertEqual(ct('100us'), np.timedelta64(100000, 'ns')) - self.assertEqual(ct('1000us'), np.timedelta64(1000000, 'ns')) - self.assertEqual(ct('1000Us'), np.timedelta64(1000000, 'ns')) - self.assertEqual(ct('1000uS'), np.timedelta64(1000000, 'ns')) - - self.assertEqual(ct('1ms'), np.timedelta64(1000000, 'ns')) - self.assertEqual(ct('10ms'), np.timedelta64(10000000, 'ns')) - self.assertEqual(ct('100ms'), np.timedelta64(100000000, 'ns')) - self.assertEqual(ct('1000ms'), np.timedelta64(1000000000, 'ns')) - - self.assertEqual(ct('-1s'), -np.timedelta64(1000000000, 'ns')) - self.assertEqual(ct('1s'), np.timedelta64(1000000000, 'ns')) - self.assertEqual(ct('10s'), np.timedelta64(10000000000, 'ns')) - self.assertEqual(ct('100s'), np.timedelta64(100000000000, 'ns')) - self.assertEqual(ct('1000s'), np.timedelta64(1000000000000, 'ns')) - - self.assertEqual(ct('1d'), conv(np.timedelta64(1, 'D'))) - self.assertEqual(ct('-1d'), -conv(np.timedelta64(1, 'D'))) - self.assertEqual(ct('1D'), conv(np.timedelta64(1, 'D'))) - self.assertEqual(ct('10D'), conv(np.timedelta64(10, 'D'))) - self.assertEqual(ct('100D'), conv(np.timedelta64(100, 'D'))) - self.assertEqual(ct('1000D'), conv(np.timedelta64(1000, 'D'))) - self.assertEqual(ct('10000D'), conv(np.timedelta64(10000, 'D'))) - - # space - self.assertEqual(ct(' 10000D '), conv(np.timedelta64(10000, 'D'))) - self.assertEqual(ct(' - 10000D '), -conv(np.timedelta64(10000, 'D'))) - - # invalid - self.assertRaises(ValueError, ct, '1foo') - self.assertRaises(ValueError, ct, 'foo') - - def test_full_format_converters(self): - def conv(v): - return v.astype('m8[ns]') - - d1 = np.timedelta64(1, 'D') - - self.assertEqual(ct('1days'), conv(d1)) - self.assertEqual(ct('1days,'), conv(d1)) - self.assertEqual(ct('- 1days,'), -conv(d1)) - - self.assertEqual(ct('00:00:01'), conv(np.timedelta64(1, 's'))) - self.assertEqual(ct('06:00:01'), conv( - np.timedelta64(6 * 3600 + 1, 's'))) - self.assertEqual(ct('06:00:01.0'), conv( - np.timedelta64(6 * 3600 + 1, 's'))) - self.assertEqual(ct('06:00:01.01'), conv( - np.timedelta64(1000 * (6 * 3600 + 1) + 10, 'ms'))) - - self.assertEqual(ct('- 1days, 00:00:01'), - conv(-d1 + np.timedelta64(1, 's'))) - self.assertEqual(ct('1days, 06:00:01'), conv( - d1 + np.timedelta64(6 * 3600 + 1, 's'))) - self.assertEqual(ct('1days, 06:00:01.01'), conv( - d1 + np.timedelta64(1000 * (6 * 3600 + 1) + 10, 'ms'))) - - # invalid - self.assertRaises(ValueError, ct, '- 1days, 00') - - def test_nat_converters(self): - self.assertEqual(to_timedelta( - 'nat', box=False).astype('int64'), tslib.iNaT) - self.assertEqual(to_timedelta( - 'nan', 
box=False).astype('int64'), tslib.iNaT) - - def test_to_timedelta(self): - def conv(v): - return v.astype('m8[ns]') - - d1 = np.timedelta64(1, 'D') - - self.assertEqual(to_timedelta('1 days 06:05:01.00003', box=False), - conv(d1 + np.timedelta64(6 * 3600 + - 5 * 60 + 1, 's') + - np.timedelta64(30, 'us'))) - self.assertEqual(to_timedelta('15.5us', box=False), - conv(np.timedelta64(15500, 'ns'))) - - # empty string - result = to_timedelta('', box=False) - self.assertEqual(result.astype('int64'), tslib.iNaT) - - result = to_timedelta(['', '']) - self.assertTrue(isnull(result).all()) - - # pass thru - result = to_timedelta(np.array([np.timedelta64(1, 's')])) - expected = pd.Index(np.array([np.timedelta64(1, 's')])) - tm.assert_index_equal(result, expected) - - # ints - result = np.timedelta64(0, 'ns') - expected = to_timedelta(0, box=False) - self.assertEqual(result, expected) - - # Series - expected = Series([timedelta(days=1), timedelta(days=1, seconds=1)]) - result = to_timedelta(Series(['1d', '1days 00:00:01'])) - tm.assert_series_equal(result, expected) - - # with units - result = TimedeltaIndex([np.timedelta64(0, 'ns'), np.timedelta64( - 10, 's').astype('m8[ns]')]) - expected = to_timedelta([0, 10], unit='s') - tm.assert_index_equal(result, expected) - - # single element conversion - v = timedelta(seconds=1) - result = to_timedelta(v, box=False) - expected = np.timedelta64(timedelta(seconds=1)) - self.assertEqual(result, expected) - - v = np.timedelta64(timedelta(seconds=1)) - result = to_timedelta(v, box=False) - expected = np.timedelta64(timedelta(seconds=1)) - self.assertEqual(result, expected) - - # arrays of various dtypes - arr = np.array([1] * 5, dtype='int64') - result = to_timedelta(arr, unit='s') - expected = TimedeltaIndex([np.timedelta64(1, 's')] * 5) - tm.assert_index_equal(result, expected) - - arr = np.array([1] * 5, dtype='int64') - result = to_timedelta(arr, unit='m') - expected = TimedeltaIndex([np.timedelta64(1, 'm')] * 5) - tm.assert_index_equal(result, expected) - - arr = np.array([1] * 5, dtype='int64') - result = to_timedelta(arr, unit='h') - expected = TimedeltaIndex([np.timedelta64(1, 'h')] * 5) - tm.assert_index_equal(result, expected) - - arr = np.array([1] * 5, dtype='timedelta64[s]') - result = to_timedelta(arr) - expected = TimedeltaIndex([np.timedelta64(1, 's')] * 5) - tm.assert_index_equal(result, expected) - - arr = np.array([1] * 5, dtype='timedelta64[D]') - result = to_timedelta(arr) - expected = TimedeltaIndex([np.timedelta64(1, 'D')] * 5) - tm.assert_index_equal(result, expected) - - # Test with lists as input when box=false - expected = np.array(np.arange(3) * 1000000000, dtype='timedelta64[ns]') - result = to_timedelta(range(3), unit='s', box=False) - tm.assert_numpy_array_equal(expected, result) - - result = to_timedelta(np.arange(3), unit='s', box=False) - tm.assert_numpy_array_equal(expected, result) - - result = to_timedelta([0, 1, 2], unit='s', box=False) - tm.assert_numpy_array_equal(expected, result) - - # Tests with fractional seconds as input: - expected = np.array( - [0, 500000000, 800000000, 1200000000], dtype='timedelta64[ns]') - result = to_timedelta([0., 0.5, 0.8, 1.2], unit='s', box=False) - tm.assert_numpy_array_equal(expected, result) - - def testit(unit, transform): - - # array - result = to_timedelta(np.arange(5), unit=unit) - expected = TimedeltaIndex([np.timedelta64(i, transform(unit)) - for i in np.arange(5).tolist()]) - tm.assert_index_equal(result, expected) - - # scalar - result = to_timedelta(2, unit=unit) - expected = 
Timedelta(np.timedelta64(2, transform(unit)).astype( - 'timedelta64[ns]')) - self.assertEqual(result, expected) - - # validate all units - # GH 6855 - for unit in ['Y', 'M', 'W', 'D', 'y', 'w', 'd']: - testit(unit, lambda x: x.upper()) - for unit in ['days', 'day', 'Day', 'Days']: - testit(unit, lambda x: 'D') - for unit in ['h', 'm', 's', 'ms', 'us', 'ns', 'H', 'S', 'MS', 'US', - 'NS']: - testit(unit, lambda x: x.lower()) - - # offsets - - # m - testit('T', lambda x: 'm') - - # ms - testit('L', lambda x: 'ms') - - def test_to_timedelta_invalid(self): - - # bad value for errors parameter - msg = "errors must be one of" - tm.assertRaisesRegexp(ValueError, msg, to_timedelta, - ['foo'], errors='never') - - # these will error - self.assertRaises(ValueError, lambda: to_timedelta([1, 2], unit='foo')) - self.assertRaises(ValueError, lambda: to_timedelta(1, unit='foo')) - - # time not supported ATM - self.assertRaises(ValueError, lambda: to_timedelta(time(second=1))) - self.assertTrue(to_timedelta( - time(second=1), errors='coerce') is pd.NaT) - - self.assertRaises(ValueError, lambda: to_timedelta(['foo', 'bar'])) - tm.assert_index_equal(TimedeltaIndex([pd.NaT, pd.NaT]), - to_timedelta(['foo', 'bar'], errors='coerce')) - - tm.assert_index_equal(TimedeltaIndex(['1 day', pd.NaT, '1 min']), - to_timedelta(['1 day', 'bar', '1 min'], - errors='coerce')) - - # gh-13613: these should not error because errors='ignore' - invalid_data = 'apple' - self.assertEqual(invalid_data, to_timedelta( - invalid_data, errors='ignore')) - - invalid_data = ['apple', '1 days'] - tm.assert_numpy_array_equal( - np.array(invalid_data, dtype=object), - to_timedelta(invalid_data, errors='ignore')) - - invalid_data = pd.Index(['apple', '1 days']) - tm.assert_index_equal(invalid_data, to_timedelta( - invalid_data, errors='ignore')) - - invalid_data = Series(['apple', '1 days']) - tm.assert_series_equal(invalid_data, to_timedelta( - invalid_data, errors='ignore')) - - def test_to_timedelta_via_apply(self): - # GH 5458 - expected = Series([np.timedelta64(1, 's')]) - result = Series(['00:00:01']).apply(to_timedelta) - tm.assert_series_equal(result, expected) - - result = Series([to_timedelta('00:00:01')]) - tm.assert_series_equal(result, expected) - - def test_timedelta_ops(self): - # GH4984 - # make sure ops return Timedelta - s = Series([Timestamp('20130101') + timedelta(seconds=i * i) - for i in range(10)]) - td = s.diff() - - result = td.mean() - expected = to_timedelta(timedelta(seconds=9)) - self.assertEqual(result, expected) - - result = td.to_frame().mean() - self.assertEqual(result[0], expected) - - result = td.quantile(.1) - expected = Timedelta(np.timedelta64(2600, 'ms')) - self.assertEqual(result, expected) - - result = td.median() - expected = to_timedelta('00:00:09') - self.assertEqual(result, expected) - - result = td.to_frame().median() - self.assertEqual(result[0], expected) - - # GH 6462 - # consistency in returned values for sum - result = td.sum() - expected = to_timedelta('00:01:21') - self.assertEqual(result, expected) - - result = td.to_frame().sum() - self.assertEqual(result[0], expected) - - # std - result = td.std() - expected = to_timedelta(Series(td.dropna().values).std()) - self.assertEqual(result, expected) - - result = td.to_frame().std() - self.assertEqual(result[0], expected) - - # invalid ops - for op in ['skew', 'kurt', 'sem', 'prod']: - self.assertRaises(TypeError, getattr(td, op)) - - # GH 10040 - # make sure NaT is properly handled by median() - s = Series([Timestamp('2015-02-03'), 
Timestamp('2015-02-07')])
-        self.assertEqual(s.diff().median(), timedelta(days=4))
-
-        s = Series([Timestamp('2015-02-03'), Timestamp('2015-02-07'),
-                    Timestamp('2015-02-15')])
-        self.assertEqual(s.diff().median(), timedelta(days=6))
-
-    def test_overflow(self):
-        # GH 9442
-        s = Series(pd.date_range('20130101', periods=100000, freq='H'))
-        s[0] += pd.Timedelta('1s 1ms')
-
-        # mean
-        result = (s - s.min()).mean()
-        expected = pd.Timedelta((pd.DatetimeIndex((s - s.min())).asi8 / len(s)
-                                 ).sum())
-
-        # the computation is converted to float so might be some loss of
-        # precision
-        self.assertTrue(np.allclose(result.value / 1000, expected.value /
-                                    1000))
-
-        # sum
-        self.assertRaises(ValueError, lambda: (s - s.min()).sum())
-        s1 = s[0:10000]
-        self.assertRaises(ValueError, lambda: (s1 - s1.min()).sum())
-        s2 = s[0:1000]
-        result = (s2 - s2.min()).sum()
-
-    def test_overflow_on_construction(self):
-        # xref https://github.com/statsmodels/statsmodels/issues/3374
-        value = pd.Timedelta('1day').value * 20169940
-        self.assertRaises(OverflowError, pd.Timedelta, value)
-
-    def test_timedelta_ops_scalar(self):
-        # GH 6808
-        base = pd.to_datetime('20130101 09:01:12.123456')
-        expected_add = pd.to_datetime('20130101 09:01:22.123456')
-        expected_sub = pd.to_datetime('20130101 09:01:02.123456')
-
-        for offset in [pd.to_timedelta(10, unit='s'), timedelta(seconds=10),
-                       np.timedelta64(10, 's'),
-                       np.timedelta64(10000000000, 'ns'),
-                       pd.offsets.Second(10)]:
-            result = base + offset
-            self.assertEqual(result, expected_add)
-
-            result = base - offset
-            self.assertEqual(result, expected_sub)
-
-        base = pd.to_datetime('20130102 09:01:12.123456')
-        expected_add = pd.to_datetime('20130103 09:01:22.123456')
-        expected_sub = pd.to_datetime('20130101 09:01:02.123456')
-
-        for offset in [pd.to_timedelta('1 day, 00:00:10'),
-                       pd.to_timedelta('1 days, 00:00:10'),
-                       timedelta(days=1, seconds=10),
-                       np.timedelta64(1, 'D') + np.timedelta64(10, 's'),
-                       pd.offsets.Day() + pd.offsets.Second(10)]:
-            result = base + offset
-            self.assertEqual(result, expected_add)
-
-            result = base - offset
-            self.assertEqual(result, expected_sub)
-
-    def test_to_timedelta_on_missing_values(self):
-        # GH5438
-        timedelta_NaT = np.timedelta64('NaT')
-
-        actual = pd.to_timedelta(Series(['00:00:01', np.nan]))
-        expected = Series([np.timedelta64(1000000000, 'ns'),
-                           timedelta_NaT], dtype='<m8[ns]')
-        assert_series_equal(actual, expected)
-
-        result = idx2 > idx1
-        expected = np.array([True, False, False, False, True, False])
-        self.assert_numpy_array_equal(result, expected)
-
-        result = idx1 <= idx2
-        expected = np.array([True, False, False, False, True, True])
-        self.assert_numpy_array_equal(result, expected)
-
-        result = idx2 >= idx1
-        expected = np.array([True, False, False, False, True, True])
-        self.assert_numpy_array_equal(result, expected)
-
-        result = idx1 == idx2
-        expected = np.array([False, False, False, False, False, True])
-        self.assert_numpy_array_equal(result, expected)
-
-        result = idx1 != idx2
-        expected = np.array([True, True, True, True, True, False])
-        self.assert_numpy_array_equal(result, expected)
-
-    def test_ops_error_str(self):
-        # GH 13624
-        tdi = TimedeltaIndex(['1 day', '2 days'])
-
-        for l, r in [(tdi, 'a'), ('a', tdi)]:
-            with tm.assertRaises(TypeError):
-                l + r
-
-            with tm.assertRaises(TypeError):
-                l > r
-
-            with tm.assertRaises(TypeError):
-                l == r
-
-            with tm.assertRaises(TypeError):
-                l != r
-
-    def test_map(self):
-
-        rng = timedelta_range('1 day', periods=10)
-
-        f = lambda x: x.days
-        result = rng.map(f)
-        exp = Int64Index([f(x) for x in rng])
-        tm.assert_index_equal(result, exp)
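The assertions above pin down two behaviors worth calling out for anyone reading this removal: `to_timedelta` converts missing inputs to `NaT` while keeping the `timedelta64[ns]` dtype, and comparisons involving `NaT` behave like comparisons involving `np.nan`. A minimal standalone sketch of both, illustrative only and assuming nothing beyond the public pandas API:

```python
import numpy as np
import pandas as pd

# Missing values become NaT and the dtype stays timedelta64[ns] (GH 5438).
tds = pd.to_timedelta(pd.Series(['00:00:01', np.nan]))
assert tds.dtype == 'timedelta64[ns]'
assert pd.isnull(tds[1])

# NaT compares like np.nan: every comparison is False except !=.
idx = pd.TimedeltaIndex(['1 day', pd.NaT])
assert not (idx == pd.NaT).any()
assert (idx != pd.NaT).all()
```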
- - def test_misc_coverage(self): - - rng = timedelta_range('1 day', periods=5) - result = rng.groupby(rng.days) - tm.assertIsInstance(list(result.values())[0][0], Timedelta) - - idx = TimedeltaIndex(['3d', '1d', '2d']) - self.assertFalse(idx.equals(list(idx))) - - non_td = Index(list('abc')) - self.assertFalse(idx.equals(list(non_td))) - - def test_union(self): - - i1 = timedelta_range('1day', periods=5) - i2 = timedelta_range('3day', periods=5) - result = i1.union(i2) - expected = timedelta_range('1day', periods=7) - self.assert_index_equal(result, expected) - - i1 = Int64Index(np.arange(0, 20, 2)) - i2 = TimedeltaIndex(start='1 day', periods=10, freq='D') - i1.union(i2) # Works - i2.union(i1) # Fails with "AttributeError: can't set attribute" - - def test_union_coverage(self): - - idx = TimedeltaIndex(['3d', '1d', '2d']) - ordered = TimedeltaIndex(idx.sort_values(), freq='infer') - result = ordered.union(idx) - self.assert_index_equal(result, ordered) - - result = ordered[:0].union(ordered) - self.assert_index_equal(result, ordered) - self.assertEqual(result.freq, ordered.freq) - - def test_union_bug_1730(self): - - rng_a = timedelta_range('1 day', periods=4, freq='3H') - rng_b = timedelta_range('1 day', periods=4, freq='4H') - - result = rng_a.union(rng_b) - exp = TimedeltaIndex(sorted(set(list(rng_a)) | set(list(rng_b)))) - self.assert_index_equal(result, exp) - - def test_union_bug_1745(self): - - left = TimedeltaIndex(['1 day 15:19:49.695000']) - right = TimedeltaIndex(['2 day 13:04:21.322000', - '1 day 15:27:24.873000', - '1 day 15:31:05.350000']) - - result = left.union(right) - exp = TimedeltaIndex(sorted(set(list(left)) | set(list(right)))) - self.assert_index_equal(result, exp) - - def test_union_bug_4564(self): - - left = timedelta_range("1 day", "30d") - right = left + pd.offsets.Minute(15) - - result = left.union(right) - exp = TimedeltaIndex(sorted(set(list(left)) | set(list(right)))) - self.assert_index_equal(result, exp) - - def test_intersection_bug_1708(self): - index_1 = timedelta_range('1 day', periods=4, freq='h') - index_2 = index_1 + pd.offsets.Hour(5) - - result = index_1 & index_2 - self.assertEqual(len(result), 0) - - index_1 = timedelta_range('1 day', periods=4, freq='h') - index_2 = index_1 + pd.offsets.Hour(1) - - result = index_1 & index_2 - expected = timedelta_range('1 day 01:00:00', periods=3, freq='h') - tm.assert_index_equal(result, expected) - - def test_get_duplicates(self): - idx = TimedeltaIndex(['1 day', '2 day', '2 day', '3 day', '3day', - '4day']) - - result = idx.get_duplicates() - ex = TimedeltaIndex(['2 day', '3day']) - self.assert_index_equal(result, ex) - - def test_argmin_argmax(self): - idx = TimedeltaIndex(['1 day 00:00:05', '1 day 00:00:01', - '1 day 00:00:02']) - self.assertEqual(idx.argmin(), 1) - self.assertEqual(idx.argmax(), 0) - - def test_sort_values(self): - - idx = TimedeltaIndex(['4d', '1d', '2d']) - - ordered = idx.sort_values() - self.assertTrue(ordered.is_monotonic) - - ordered = idx.sort_values(ascending=False) - self.assertTrue(ordered[::-1].is_monotonic) - - ordered, dexer = idx.sort_values(return_indexer=True) - self.assertTrue(ordered.is_monotonic) - self.assert_numpy_array_equal(dexer, - np.array([1, 2, 0]), - check_dtype=False) - - ordered, dexer = idx.sort_values(return_indexer=True, ascending=False) - self.assertTrue(ordered[::-1].is_monotonic) - self.assert_numpy_array_equal(dexer, - np.array([0, 2, 1]), - check_dtype=False) - - def test_insert(self): - - idx = TimedeltaIndex(['4day', '1day', '2day'], name='idx') - 
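The union and intersection tests above mostly encode one rule: set operations on `TimedeltaIndex` work by label, so overlapping regular ranges combine into the single covering range, and shifted ranges intersect only where labels coincide. A short sketch of that rule (illustrative, not part of the removed file):

```python
import pandas as pd

# Overlapping daily ranges union into the single covering range.
i1 = pd.timedelta_range('1 day', periods=5)
i2 = pd.timedelta_range('3 day', periods=5)
assert i1.union(i2).equals(pd.timedelta_range('1 day', periods=7))

# Hourly ranges shifted by one hour intersect everywhere but the ends.
a = pd.timedelta_range('1 day', periods=4, freq='h')
b = a + pd.offsets.Hour(1)
assert len(a.intersection(b)) == 3
```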
-        result = idx.insert(2, timedelta(days=5))
-        exp = TimedeltaIndex(['4day', '1day', '5day', '2day'], name='idx')
-        self.assert_index_equal(result, exp)
-
-        # insertion of non-datetime should coerce to object index
-        result = idx.insert(1, 'inserted')
-        expected = Index([Timedelta('4day'), 'inserted', Timedelta('1day'),
-                          Timedelta('2day')], name='idx')
-        self.assertNotIsInstance(result, TimedeltaIndex)
-        tm.assert_index_equal(result, expected)
-        self.assertEqual(result.name, expected.name)
-
-        idx = timedelta_range('1day 00:00:01', periods=3, freq='s', name='idx')
-
-        # preserve freq
-        expected_0 = TimedeltaIndex(['1day', '1day 00:00:01', '1day 00:00:02',
-                                     '1day 00:00:03'],
-                                    name='idx', freq='s')
-        expected_3 = TimedeltaIndex(['1day 00:00:01', '1day 00:00:02',
-                                     '1day 00:00:03', '1day 00:00:04'],
-                                    name='idx', freq='s')
-
-        # reset freq to None
-        expected_1_nofreq = TimedeltaIndex(['1day 00:00:01', '1day 00:00:01',
-                                            '1day 00:00:02', '1day 00:00:03'],
-                                           name='idx', freq=None)
-        expected_3_nofreq = TimedeltaIndex(['1day 00:00:01', '1day 00:00:02',
-                                            '1day 00:00:03', '1day 00:00:05'],
-                                           name='idx', freq=None)
-
-        cases = [(0, Timedelta('1day'), expected_0),
-                 (-3, Timedelta('1day'), expected_0),
-                 (3, Timedelta('1day 00:00:04'), expected_3),
-                 (1, Timedelta('1day 00:00:01'), expected_1_nofreq),
-                 (3, Timedelta('1day 00:00:05'), expected_3_nofreq)]
-
-        for n, d, expected in cases:
-            result = idx.insert(n, d)
-            self.assert_index_equal(result, expected)
-            self.assertEqual(result.name, expected.name)
-            self.assertEqual(result.freq, expected.freq)
-
-    def test_delete(self):
-        idx = timedelta_range(start='1 Days', periods=5, freq='D', name='idx')
-
-        # preserve freq
-        expected_0 = timedelta_range(start='2 Days', periods=4, freq='D',
-                                     name='idx')
-        expected_4 = timedelta_range(start='1 Days', periods=4, freq='D',
-                                     name='idx')
-
-        # reset freq to None
-        expected_1 = TimedeltaIndex(
-            ['1 day', '3 day', '4 day', '5 day'], freq=None, name='idx')
-
-        cases = {0: expected_0,
-                 -5: expected_0,
-                 -1: expected_4,
-                 4: expected_4,
-                 1: expected_1}
-        for n, expected in compat.iteritems(cases):
-            result = idx.delete(n)
-            self.assert_index_equal(result, expected)
-            self.assertEqual(result.name, expected.name)
-            self.assertEqual(result.freq, expected.freq)
-
-        with tm.assertRaises((IndexError, ValueError)):
-            # either, depending on numpy version
-            result = idx.delete(5)
-
-    def test_delete_slice(self):
-        idx = timedelta_range(start='1 days', periods=10, freq='D', name='idx')
-
-        # preserve freq
-        expected_0_2 = timedelta_range(start='4 days', periods=7, freq='D',
-                                       name='idx')
-        expected_7_9 = timedelta_range(start='1 days', periods=7, freq='D',
-                                       name='idx')
-
-        # reset freq to None
-        expected_3_5 = TimedeltaIndex(['1 d', '2 d', '3 d',
-                                       '7 d', '8 d', '9 d', '10d'],
-                                      freq=None, name='idx')
-
-        cases = {(0, 1, 2): expected_0_2,
-                 (7, 8, 9): expected_7_9,
-                 (3, 4, 5): expected_3_5}
-        for n, expected in compat.iteritems(cases):
-            result = idx.delete(n)
-            self.assert_index_equal(result, expected)
-            self.assertEqual(result.name, expected.name)
-            self.assertEqual(result.freq, expected.freq)
-
-            result = idx.delete(slice(n[0], n[-1] + 1))
-            self.assert_index_equal(result, expected)
-            self.assertEqual(result.name, expected.name)
-            self.assertEqual(result.freq, expected.freq)
-
-    def test_take(self):
-
-        tds = ['1day 02:00:00', '1 day 04:00:00', '1 day 10:00:00']
-        idx = TimedeltaIndex(start='1d', end='2d', freq='H', name='idx')
-        expected = TimedeltaIndex(tds, freq=None, name='idx')
-
-        taken1 = idx.take([2, 4, 10])
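The `cases` tables in `test_insert` and `test_delete` above reduce to a single invariant: `freq` survives an insertion or deletion only when the remaining values are still evenly spaced, and is reset to `None` otherwise. A minimal sketch of that invariant, assuming only the public API:

```python
import pandas as pd

idx = pd.timedelta_range('1 day', periods=3, freq='D', name='idx')

# Extending the regular spacing at either end preserves freq ...
assert idx.insert(3, pd.Timedelta('4 days')).freq == idx.freq
# ... while an irregular insertion resets it to None.
assert idx.insert(1, pd.Timedelta('10 days')).freq is None

# The same rule applies to deletions: ends keep freq, the middle drops it.
assert idx.delete(0).freq == idx.freq
assert idx.delete(1).freq is None
```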
- taken2 = idx[[2, 4, 10]] - - for taken in [taken1, taken2]: - self.assert_index_equal(taken, expected) - tm.assertIsInstance(taken, TimedeltaIndex) - self.assertIsNone(taken.freq) - self.assertEqual(taken.name, expected.name) - - def test_take_fill_value(self): - # GH 12631 - idx = pd.TimedeltaIndex(['1 days', '2 days', '3 days'], - name='xxx') - result = idx.take(np.array([1, 0, -1])) - expected = pd.TimedeltaIndex(['2 days', '1 days', '3 days'], - name='xxx') - tm.assert_index_equal(result, expected) - - # fill_value - result = idx.take(np.array([1, 0, -1]), fill_value=True) - expected = pd.TimedeltaIndex(['2 days', '1 days', 'NaT'], - name='xxx') - tm.assert_index_equal(result, expected) - - # allow_fill=False - result = idx.take(np.array([1, 0, -1]), allow_fill=False, - fill_value=True) - expected = pd.TimedeltaIndex(['2 days', '1 days', '3 days'], - name='xxx') - tm.assert_index_equal(result, expected) - - msg = ('When allow_fill=True and fill_value is not None, ' - 'all indices must be >= -1') - with tm.assertRaisesRegexp(ValueError, msg): - idx.take(np.array([1, 0, -2]), fill_value=True) - with tm.assertRaisesRegexp(ValueError, msg): - idx.take(np.array([1, 0, -5]), fill_value=True) - - with tm.assertRaises(IndexError): - idx.take(np.array([1, -5])) - - def test_isin(self): - - index = tm.makeTimedeltaIndex(4) - result = index.isin(index) - self.assertTrue(result.all()) - - result = index.isin(list(index)) - self.assertTrue(result.all()) - - assert_almost_equal(index.isin([index[2], 5]), - np.array([False, False, True, False])) - - def test_does_not_convert_mixed_integer(self): - df = tm.makeCustomDataframe(10, 10, - data_gen_f=lambda *args, **kwargs: randn(), - r_idx_type='i', c_idx_type='td') - str(df) - - cols = df.columns.join(df.index, how='outer') - joined = cols.join(df.columns) - self.assertEqual(cols.dtype, np.dtype('O')) - self.assertEqual(cols.dtype, joined.dtype) - tm.assert_index_equal(cols, joined) - - def test_slice_keeps_name(self): - - # GH4226 - dr = pd.timedelta_range('1d', '5d', freq='H', name='timebucket') - self.assertEqual(dr[1:].name, dr.name) - - def test_join_self(self): - - index = timedelta_range('1 day', periods=10) - kinds = 'outer', 'inner', 'left', 'right' - for kind in kinds: - joined = index.join(index, how=kind) - tm.assert_index_equal(index, joined) - - def test_factorize(self): - idx1 = TimedeltaIndex(['1 day', '1 day', '2 day', '2 day', '3 day', - '3 day']) - - exp_arr = np.array([0, 0, 1, 1, 2, 2], dtype=np.intp) - exp_idx = TimedeltaIndex(['1 day', '2 day', '3 day']) - - arr, idx = idx1.factorize() - self.assert_numpy_array_equal(arr, exp_arr) - self.assert_index_equal(idx, exp_idx) - - arr, idx = idx1.factorize(sort=True) - self.assert_numpy_array_equal(arr, exp_arr) - self.assert_index_equal(idx, exp_idx) - - # freq must be preserved - idx3 = timedelta_range('1 day', periods=4, freq='s') - exp_arr = np.array([0, 1, 2, 3], dtype=np.intp) - arr, idx = idx3.factorize() - self.assert_numpy_array_equal(arr, exp_arr) - self.assert_index_equal(idx, idx3) - - -class TestSlicing(tm.TestCase): - def test_partial_slice(self): - rng = timedelta_range('1 day 10:11:12', freq='h', periods=500) - s = Series(np.arange(len(rng)), index=rng) - - result = s['5 day':'6 day'] - expected = s.iloc[86:134] - assert_series_equal(result, expected) - - result = s['5 day':] - expected = s.iloc[86:] - assert_series_equal(result, expected) - - result = s[:'6 day'] - expected = s.iloc[:134] - assert_series_equal(result, expected) - - result = s['6 days, 23:11:12'] - 
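`test_partial_slice`, which continues below, exercises partial string indexing against a `TimedeltaIndex`: a day-level string used as a slice bound selects the whole matching span (label slices are inclusive), while a fully specified string is an exact scalar lookup. A small illustrative sketch:

```python
import numpy as np
import pandas as pd

rng = pd.timedelta_range('1 day', periods=48, freq='h')
s = pd.Series(np.arange(48), index=rng)

# Label slices are inclusive at both ends: hours 00:00 through 05:00.
assert len(s['1 day':'1 day 05:00:00']) == 6

# A fully specified string selects a single element.
assert s['2 days 23:00:00'] == 47
```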
self.assertEqual(result, s.iloc[133]) - - self.assertRaises(KeyError, s.__getitem__, '50 days') - - def test_partial_slice_high_reso(self): - - # higher reso - rng = timedelta_range('1 day 10:11:12', freq='us', periods=2000) - s = Series(np.arange(len(rng)), index=rng) - - result = s['1 day 10:11:12':] - expected = s.iloc[0:] - assert_series_equal(result, expected) - - result = s['1 day 10:11:12.001':] - expected = s.iloc[1000:] - assert_series_equal(result, expected) - - result = s['1 days, 10:11:12.001001'] - self.assertEqual(result, s.iloc[1001]) - - def test_slice_with_negative_step(self): - ts = Series(np.arange(20), timedelta_range('0', periods=20, freq='H')) - SLC = pd.IndexSlice - - def assert_slices_equivalent(l_slc, i_slc): - assert_series_equal(ts[l_slc], ts.iloc[i_slc]) - assert_series_equal(ts.loc[l_slc], ts.iloc[i_slc]) - assert_series_equal(ts.loc[l_slc], ts.iloc[i_slc]) - - assert_slices_equivalent(SLC[Timedelta(hours=7)::-1], SLC[7::-1]) - assert_slices_equivalent(SLC['7 hours'::-1], SLC[7::-1]) - - assert_slices_equivalent(SLC[:Timedelta(hours=7):-1], SLC[:6:-1]) - assert_slices_equivalent(SLC[:'7 hours':-1], SLC[:6:-1]) - - assert_slices_equivalent(SLC['15 hours':'7 hours':-1], SLC[15:6:-1]) - assert_slices_equivalent(SLC[Timedelta(hours=15):Timedelta(hours=7):- - 1], SLC[15:6:-1]) - assert_slices_equivalent(SLC['15 hours':Timedelta(hours=7):-1], - SLC[15:6:-1]) - assert_slices_equivalent(SLC[Timedelta(hours=15):'7 hours':-1], - SLC[15:6:-1]) - - assert_slices_equivalent(SLC['7 hours':'15 hours':-1], SLC[:0]) - - def test_slice_with_zero_step_raises(self): - ts = Series(np.arange(20), timedelta_range('0', periods=20, freq='H')) - self.assertRaisesRegexp(ValueError, 'slice step cannot be zero', - lambda: ts[::0]) - self.assertRaisesRegexp(ValueError, 'slice step cannot be zero', - lambda: ts.loc[::0]) - self.assertRaisesRegexp(ValueError, 'slice step cannot be zero', - lambda: ts.loc[::0]) - - def test_tdi_ops_attributes(self): - rng = timedelta_range('2 days', periods=5, freq='2D', name='x') - - result = rng + 1 - exp = timedelta_range('4 days', periods=5, freq='2D', name='x') - tm.assert_index_equal(result, exp) - self.assertEqual(result.freq, '2D') - - result = rng - 2 - exp = timedelta_range('-2 days', periods=5, freq='2D', name='x') - tm.assert_index_equal(result, exp) - self.assertEqual(result.freq, '2D') - - result = rng * 2 - exp = timedelta_range('4 days', periods=5, freq='4D', name='x') - tm.assert_index_equal(result, exp) - self.assertEqual(result.freq, '4D') - - result = rng / 2 - exp = timedelta_range('1 days', periods=5, freq='D', name='x') - tm.assert_index_equal(result, exp) - self.assertEqual(result.freq, 'D') - - result = -rng - exp = timedelta_range('-2 days', periods=5, freq='-2D', name='x') - tm.assert_index_equal(result, exp) - self.assertEqual(result.freq, '-2D') - - rng = pd.timedelta_range('-2 days', periods=5, freq='D', name='x') - - result = abs(rng) - exp = TimedeltaIndex(['2 days', '1 days', '0 days', '1 days', - '2 days'], name='x') - tm.assert_index_equal(result, exp) - self.assertEqual(result.freq, None) - - def test_add_overflow(self): - # see gh-14068 - msg = "too (big|large) to convert" - with tm.assertRaisesRegexp(OverflowError, msg): - to_timedelta(106580, 'D') + Timestamp('2000') - with tm.assertRaisesRegexp(OverflowError, msg): - Timestamp('2000') + to_timedelta(106580, 'D') - - _NaT = int(pd.NaT) + 1 - msg = "Overflow in int64 addition" - with tm.assertRaisesRegexp(OverflowError, msg): - to_timedelta([106580], 'D') + 
Timestamp('2000') - with tm.assertRaisesRegexp(OverflowError, msg): - Timestamp('2000') + to_timedelta([106580], 'D') - with tm.assertRaisesRegexp(OverflowError, msg): - to_timedelta([_NaT]) - Timedelta('1 days') - with tm.assertRaisesRegexp(OverflowError, msg): - to_timedelta(['5 days', _NaT]) - Timedelta('1 days') - with tm.assertRaisesRegexp(OverflowError, msg): - (to_timedelta([_NaT, '5 days', '1 hours']) - - to_timedelta(['7 seconds', _NaT, '4 hours'])) - - # These should not overflow! - exp = TimedeltaIndex([pd.NaT]) - result = to_timedelta([pd.NaT]) - Timedelta('1 days') - tm.assert_index_equal(result, exp) - - exp = TimedeltaIndex(['4 days', pd.NaT]) - result = to_timedelta(['5 days', pd.NaT]) - Timedelta('1 days') - tm.assert_index_equal(result, exp) - - exp = TimedeltaIndex([pd.NaT, pd.NaT, '5 hours']) - result = (to_timedelta([pd.NaT, '5 days', '1 hours']) + - to_timedelta(['7 seconds', pd.NaT, '4 hours'])) - tm.assert_index_equal(result, exp) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tseries/tests/test_timeseries.py b/pandas/tseries/tests/test_timeseries.py deleted file mode 100644 index b5daf1ac0ec68..0000000000000 --- a/pandas/tseries/tests/test_timeseries.py +++ /dev/null @@ -1,5701 +0,0 @@ -# pylint: disable-msg=E1101,W0612 -import locale -import calendar -import operator -import sys -import warnings -from datetime import datetime, time, timedelta -from numpy.random import rand - -import nose -import numpy as np -import pandas.index as _index -import pandas.lib as lib -import pandas.tslib as tslib - -from pandas.types.common import is_datetime64_ns_dtype -import pandas as pd -import pandas.compat as compat -import pandas.core.common as com -import pandas.tseries.frequencies as frequencies -import pandas.tseries.offsets as offsets -import pandas.tseries.tools as tools -import pandas.util.testing as tm -from pandas import ( - Index, Series, DataFrame, isnull, date_range, Timestamp, Period, - DatetimeIndex, Int64Index, to_datetime, bdate_range, Float64Index, - NaT, timedelta_range, Timedelta, _np_version_under1p8, concat) -from pandas.compat import range, long, StringIO, lrange, lmap, zip, product -from pandas.compat.numpy import np_datetime64_compat -from pandas.core.common import PerformanceWarning -from pandas.tslib import iNaT -from pandas.util.testing import ( - assert_frame_equal, assert_series_equal, assert_almost_equal, - _skip_if_has_locale, slow) - -randn = np.random.randn - - -class TestTimeSeriesDuplicates(tm.TestCase): - _multiprocess_can_split_ = True - - def setUp(self): - dates = [datetime(2000, 1, 2), datetime(2000, 1, 2), - datetime(2000, 1, 2), datetime(2000, 1, 3), - datetime(2000, 1, 3), datetime(2000, 1, 3), - datetime(2000, 1, 4), datetime(2000, 1, 4), - datetime(2000, 1, 4), datetime(2000, 1, 5)] - - self.dups = Series(np.random.randn(len(dates)), index=dates) - - def test_constructor(self): - tm.assertIsInstance(self.dups, Series) - tm.assertIsInstance(self.dups.index, DatetimeIndex) - - def test_is_unique_monotonic(self): - self.assertFalse(self.dups.index.is_unique) - - def test_index_unique(self): - uniques = self.dups.index.unique() - expected = DatetimeIndex([datetime(2000, 1, 2), datetime(2000, 1, 3), - datetime(2000, 1, 4), datetime(2000, 1, 5)]) - self.assertEqual(uniques.dtype, 'M8[ns]') # sanity - tm.assert_index_equal(uniques, expected) - self.assertEqual(self.dups.index.nunique(), 4) - - # #2563 - self.assertTrue(isinstance(uniques, 
DatetimeIndex)) - - dups_local = self.dups.index.tz_localize('US/Eastern') - dups_local.name = 'foo' - result = dups_local.unique() - expected = DatetimeIndex(expected, name='foo') - expected = expected.tz_localize('US/Eastern') - self.assertTrue(result.tz is not None) - self.assertEqual(result.name, 'foo') - tm.assert_index_equal(result, expected) - - # NaT, note this is excluded - arr = [1370745748 + t for t in range(20)] + [iNaT] - idx = DatetimeIndex(arr * 3) - tm.assert_index_equal(idx.unique(), DatetimeIndex(arr)) - self.assertEqual(idx.nunique(), 20) - self.assertEqual(idx.nunique(dropna=False), 21) - - arr = [Timestamp('2013-06-09 02:42:28') + timedelta(seconds=t) - for t in range(20)] + [NaT] - idx = DatetimeIndex(arr * 3) - tm.assert_index_equal(idx.unique(), DatetimeIndex(arr)) - self.assertEqual(idx.nunique(), 20) - self.assertEqual(idx.nunique(dropna=False), 21) - - def test_index_dupes_contains(self): - d = datetime(2011, 12, 5, 20, 30) - ix = DatetimeIndex([d, d]) - self.assertTrue(d in ix) - - def test_duplicate_dates_indexing(self): - ts = self.dups - - uniques = ts.index.unique() - for date in uniques: - result = ts[date] - - mask = ts.index == date - total = (ts.index == date).sum() - expected = ts[mask] - if total > 1: - assert_series_equal(result, expected) - else: - assert_almost_equal(result, expected[0]) - - cp = ts.copy() - cp[date] = 0 - expected = Series(np.where(mask, 0, ts), index=ts.index) - assert_series_equal(cp, expected) - - self.assertRaises(KeyError, ts.__getitem__, datetime(2000, 1, 6)) - - # new index - ts[datetime(2000, 1, 6)] = 0 - self.assertEqual(ts[datetime(2000, 1, 6)], 0) - - def test_range_slice(self): - idx = DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/3/2000', - '1/4/2000']) - - ts = Series(np.random.randn(len(idx)), index=idx) - - result = ts['1/2/2000':] - expected = ts[1:] - assert_series_equal(result, expected) - - result = ts['1/2/2000':'1/3/2000'] - expected = ts[1:4] - assert_series_equal(result, expected) - - def test_groupby_average_dup_values(self): - result = self.dups.groupby(level=0).mean() - expected = self.dups.groupby(self.dups.index).mean() - assert_series_equal(result, expected) - - def test_indexing_over_size_cutoff(self): - import datetime - # #1821 - - old_cutoff = _index._SIZE_CUTOFF - try: - _index._SIZE_CUTOFF = 1000 - - # create large list of non periodic datetime - dates = [] - sec = datetime.timedelta(seconds=1) - half_sec = datetime.timedelta(microseconds=500000) - d = datetime.datetime(2011, 12, 5, 20, 30) - n = 1100 - for i in range(n): - dates.append(d) - dates.append(d + sec) - dates.append(d + sec + half_sec) - dates.append(d + sec + sec + half_sec) - d += 3 * sec - - # duplicate some values in the list - duplicate_positions = np.random.randint(0, len(dates) - 1, 20) - for p in duplicate_positions: - dates[p + 1] = dates[p] - - df = DataFrame(np.random.randn(len(dates), 4), - index=dates, - columns=list('ABCD')) - - pos = n * 3 - timestamp = df.index[pos] - self.assertIn(timestamp, df.index) - - # it works! - df.loc[timestamp] - self.assertTrue(len(df.loc[[timestamp]]) > 0) - finally: - _index._SIZE_CUTOFF = old_cutoff - - def test_indexing_unordered(self): - # GH 2437 - rng = date_range(start='2011-01-01', end='2011-01-15') - ts = Series(randn(len(rng)), index=rng) - ts2 = concat([ts[0:4], ts[-4:], ts[4:-4]]) - - for t in ts.index: - # TODO: unused? 
- s = str(t) # noqa - - expected = ts[t] - result = ts2[t] - self.assertTrue(expected == result) - - # GH 3448 (ranges) - def compare(slobj): - result = ts2[slobj].copy() - result = result.sort_index() - expected = ts[slobj] - assert_series_equal(result, expected) - - compare(slice('2011-01-01', '2011-01-15')) - compare(slice('2010-12-30', '2011-01-15')) - compare(slice('2011-01-01', '2011-01-16')) - - # partial ranges - compare(slice('2011-01-01', '2011-01-6')) - compare(slice('2011-01-06', '2011-01-8')) - compare(slice('2011-01-06', '2011-01-12')) - - # single values - result = ts2['2011'].sort_index() - expected = ts['2011'] - assert_series_equal(result, expected) - - # diff freq - rng = date_range(datetime(2005, 1, 1), periods=20, freq='M') - ts = Series(np.arange(len(rng)), index=rng) - ts = ts.take(np.random.permutation(20)) - - result = ts['2005'] - for t in result.index: - self.assertTrue(t.year == 2005) - - def test_indexing(self): - - idx = date_range("2001-1-1", periods=20, freq='M') - ts = Series(np.random.rand(len(idx)), index=idx) - - # getting - - # GH 3070, make sure semantics work on Series/Frame - expected = ts['2001'] - expected.name = 'A' - - df = DataFrame(dict(A=ts)) - result = df['2001']['A'] - assert_series_equal(expected, result) - - # setting - ts['2001'] = 1 - expected = ts['2001'] - expected.name = 'A' - - df.loc['2001', 'A'] = 1 - - result = df['2001']['A'] - assert_series_equal(expected, result) - - # GH3546 (not including times on the last day) - idx = date_range(start='2013-05-31 00:00', end='2013-05-31 23:00', - freq='H') - ts = Series(lrange(len(idx)), index=idx) - expected = ts['2013-05'] - assert_series_equal(expected, ts) - - idx = date_range(start='2013-05-31 00:00', end='2013-05-31 23:59', - freq='S') - ts = Series(lrange(len(idx)), index=idx) - expected = ts['2013-05'] - assert_series_equal(expected, ts) - - idx = [Timestamp('2013-05-31 00:00'), - Timestamp(datetime(2013, 5, 31, 23, 59, 59, 999999))] - ts = Series(lrange(len(idx)), index=idx) - expected = ts['2013'] - assert_series_equal(expected, ts) - - # GH14826, indexing with a seconds resolution string / datetime object - df = DataFrame(randn(5, 5), - columns=['open', 'high', 'low', 'close', 'volume'], - index=date_range('2012-01-02 18:01:00', - periods=5, tz='US/Central', freq='s')) - expected = df.loc[[df.index[2]]] - - # this is a single date, so will raise - self.assertRaises(KeyError, df.__getitem__, '2012-01-02 18:01:02', ) - self.assertRaises(KeyError, df.__getitem__, df.index[2], ) - - def test_recreate_from_data(self): - freqs = ['M', 'Q', 'A', 'D', 'B', 'BH', 'T', 'S', 'L', 'U', 'H', 'N', - 'C'] - - for f in freqs: - org = DatetimeIndex(start='2001/02/01 09:00', freq=f, periods=1) - idx = DatetimeIndex(org, freq=f) - tm.assert_index_equal(idx, org) - - org = DatetimeIndex(start='2001/02/01 09:00', freq=f, - tz='US/Pacific', periods=1) - idx = DatetimeIndex(org, freq=f, tz='US/Pacific') - tm.assert_index_equal(idx, org) - - -def assert_range_equal(left, right): - assert (left.equals(right)) - assert (left.freq == right.freq) - assert (left.tz == right.tz) - - -class TestTimeSeries(tm.TestCase): - _multiprocess_can_split_ = True - - def test_is_(self): - dti = DatetimeIndex(start='1/1/2005', end='12/1/2005', freq='M') - self.assertTrue(dti.is_(dti)) - self.assertTrue(dti.is_(dti.view())) - self.assertFalse(dti.is_(dti.copy())) - - def test_dti_slicing(self): - dti = DatetimeIndex(start='1/1/2005', end='12/1/2005', freq='M') - dti2 = dti[[1, 3, 5]] - - v1 = dti2[0] - v2 = dti2[1] - v3 = 
dti2[2] - - self.assertEqual(v1, Timestamp('2/28/2005')) - self.assertEqual(v2, Timestamp('4/30/2005')) - self.assertEqual(v3, Timestamp('6/30/2005')) - - # don't carry freq through irregular slicing - self.assertIsNone(dti2.freq) - - def test_pass_datetimeindex_to_index(self): - # Bugs in #1396 - rng = date_range('1/1/2000', '3/1/2000') - idx = Index(rng, dtype=object) - - expected = Index(rng.to_pydatetime(), dtype=object) - - self.assert_numpy_array_equal(idx.values, expected.values) - - def test_contiguous_boolean_preserve_freq(self): - rng = date_range('1/1/2000', '3/1/2000', freq='B') - - mask = np.zeros(len(rng), dtype=bool) - mask[10:20] = True - - masked = rng[mask] - expected = rng[10:20] - self.assertIsNotNone(expected.freq) - assert_range_equal(masked, expected) - - mask[22] = True - masked = rng[mask] - self.assertIsNone(masked.freq) - - def test_getitem_median_slice_bug(self): - index = date_range('20090415', '20090519', freq='2B') - s = Series(np.random.randn(13), index=index) - - indexer = [slice(6, 7, None)] - result = s[indexer] - expected = s[indexer[0]] - assert_series_equal(result, expected) - - def test_series_box_timestamp(self): - rng = date_range('20090415', '20090519', freq='B') - s = Series(rng) - - tm.assertIsInstance(s[5], Timestamp) - - rng = date_range('20090415', '20090519', freq='B') - s = Series(rng, index=rng) - tm.assertIsInstance(s[5], Timestamp) - - tm.assertIsInstance(s.iat[5], Timestamp) - - def test_series_box_timedelta(self): - rng = timedelta_range('1 day 1 s', periods=5, freq='h') - s = Series(rng) - tm.assertIsInstance(s[1], Timedelta) - tm.assertIsInstance(s.iat[2], Timedelta) - - def test_date_range_ambiguous_arguments(self): - # #2538 - start = datetime(2011, 1, 1, 5, 3, 40) - end = datetime(2011, 1, 1, 8, 9, 40) - - self.assertRaises(ValueError, date_range, start, end, freq='s', - periods=10) - - def test_timestamp_to_datetime(self): - tm._skip_if_no_pytz() - rng = date_range('20090415', '20090519', tz='US/Eastern') - - stamp = rng[0] - dtval = stamp.to_pydatetime() - self.assertEqual(stamp, dtval) - self.assertEqual(stamp.tzinfo, dtval.tzinfo) - - def test_timestamp_to_datetime_dateutil(self): - tm._skip_if_no_pytz() - rng = date_range('20090415', '20090519', tz='dateutil/US/Eastern') - - stamp = rng[0] - dtval = stamp.to_pydatetime() - self.assertEqual(stamp, dtval) - self.assertEqual(stamp.tzinfo, dtval.tzinfo) - - def test_timestamp_to_datetime_explicit_pytz(self): - tm._skip_if_no_pytz() - import pytz - rng = date_range('20090415', '20090519', - tz=pytz.timezone('US/Eastern')) - - stamp = rng[0] - dtval = stamp.to_pydatetime() - self.assertEqual(stamp, dtval) - self.assertEqual(stamp.tzinfo, dtval.tzinfo) - - def test_timestamp_to_datetime_explicit_dateutil(self): - tm._skip_if_windows_python_3() - tm._skip_if_no_dateutil() - from pandas.tslib import _dateutil_gettz as gettz - rng = date_range('20090415', '20090519', tz=gettz('US/Eastern')) - - stamp = rng[0] - dtval = stamp.to_pydatetime() - self.assertEqual(stamp, dtval) - self.assertEqual(stamp.tzinfo, dtval.tzinfo) - - def test_index_convert_to_datetime_array(self): - tm._skip_if_no_pytz() - - def _check_rng(rng): - converted = rng.to_pydatetime() - tm.assertIsInstance(converted, np.ndarray) - for x, stamp in zip(converted, rng): - tm.assertIsInstance(x, datetime) - self.assertEqual(x, stamp.to_pydatetime()) - self.assertEqual(x.tzinfo, stamp.tzinfo) - - rng = date_range('20090415', '20090519') - rng_eastern = date_range('20090415', '20090519', tz='US/Eastern') - rng_utc = 
date_range('20090415', '20090519', tz='utc') - - _check_rng(rng) - _check_rng(rng_eastern) - _check_rng(rng_utc) - - def test_index_convert_to_datetime_array_explicit_pytz(self): - tm._skip_if_no_pytz() - import pytz - - def _check_rng(rng): - converted = rng.to_pydatetime() - tm.assertIsInstance(converted, np.ndarray) - for x, stamp in zip(converted, rng): - tm.assertIsInstance(x, datetime) - self.assertEqual(x, stamp.to_pydatetime()) - self.assertEqual(x.tzinfo, stamp.tzinfo) - - rng = date_range('20090415', '20090519') - rng_eastern = date_range('20090415', '20090519', - tz=pytz.timezone('US/Eastern')) - rng_utc = date_range('20090415', '20090519', tz=pytz.utc) - - _check_rng(rng) - _check_rng(rng_eastern) - _check_rng(rng_utc) - - def test_index_convert_to_datetime_array_dateutil(self): - tm._skip_if_no_dateutil() - import dateutil - - def _check_rng(rng): - converted = rng.to_pydatetime() - tm.assertIsInstance(converted, np.ndarray) - for x, stamp in zip(converted, rng): - tm.assertIsInstance(x, datetime) - self.assertEqual(x, stamp.to_pydatetime()) - self.assertEqual(x.tzinfo, stamp.tzinfo) - - rng = date_range('20090415', '20090519') - rng_eastern = date_range('20090415', '20090519', - tz='dateutil/US/Eastern') - rng_utc = date_range('20090415', '20090519', tz=dateutil.tz.tzutc()) - - _check_rng(rng) - _check_rng(rng_eastern) - _check_rng(rng_utc) - - def test_ctor_str_intraday(self): - rng = DatetimeIndex(['1-1-2000 00:00:01']) - self.assertEqual(rng[0].second, 1) - - def test_series_ctor_plus_datetimeindex(self): - rng = date_range('20090415', '20090519', freq='B') - data = dict((k, 1) for k in rng) - - result = Series(data, index=rng) - self.assertIs(result.index, rng) - - def test_series_pad_backfill_limit(self): - index = np.arange(10) - s = Series(np.random.randn(10), index=index) - - result = s[:2].reindex(index, method='pad', limit=5) - - expected = s[:2].reindex(index).fillna(method='pad') - expected[-3:] = np.nan - assert_series_equal(result, expected) - - result = s[-2:].reindex(index, method='backfill', limit=5) - - expected = s[-2:].reindex(index).fillna(method='backfill') - expected[:3] = np.nan - assert_series_equal(result, expected) - - def test_series_fillna_limit(self): - index = np.arange(10) - s = Series(np.random.randn(10), index=index) - - result = s[:2].reindex(index) - result = result.fillna(method='pad', limit=5) - - expected = s[:2].reindex(index).fillna(method='pad') - expected[-3:] = np.nan - assert_series_equal(result, expected) - - result = s[-2:].reindex(index) - result = result.fillna(method='bfill', limit=5) - - expected = s[-2:].reindex(index).fillna(method='backfill') - expected[:3] = np.nan - assert_series_equal(result, expected) - - def test_frame_pad_backfill_limit(self): - index = np.arange(10) - df = DataFrame(np.random.randn(10, 4), index=index) - - result = df[:2].reindex(index, method='pad', limit=5) - - expected = df[:2].reindex(index).fillna(method='pad') - expected.values[-3:] = np.nan - tm.assert_frame_equal(result, expected) - - result = df[-2:].reindex(index, method='backfill', limit=5) - - expected = df[-2:].reindex(index).fillna(method='backfill') - expected.values[:3] = np.nan - tm.assert_frame_equal(result, expected) - - def test_frame_fillna_limit(self): - index = np.arange(10) - df = DataFrame(np.random.randn(10, 4), index=index) - - result = df[:2].reindex(index) - result = result.fillna(method='pad', limit=5) - - expected = df[:2].reindex(index).fillna(method='pad') - expected.values[-3:] = np.nan - 
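All of these `limit` tests follow one pattern: with `method='pad'` or `method='backfill'`, at most `limit` consecutive missing values are filled and anything beyond that stays `NaN`. A compact sketch of the Series case (illustrative only; the `fillna(method=...)` form matches the pandas API of this era):

```python
import pandas as pd

# Two known values, eight missing ones.
s = pd.Series([1.0, 2.0]).reindex(range(10))

filled = s.fillna(method='pad', limit=5)
assert filled.notnull().sum() == 7   # 2 original + 5 forward-filled
assert filled[8:].isnull().all()     # gaps beyond the limit remain NaN
```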
tm.assert_frame_equal(result, expected) - - result = df[-2:].reindex(index) - result = result.fillna(method='backfill', limit=5) - - expected = df[-2:].reindex(index).fillna(method='backfill') - expected.values[:3] = np.nan - tm.assert_frame_equal(result, expected) - - def test_frame_setitem_timestamp(self): - # 2155 - columns = DatetimeIndex(start='1/1/2012', end='2/1/2012', - freq=offsets.BDay()) - index = lrange(10) - data = DataFrame(columns=columns, index=index) - t = datetime(2012, 11, 1) - ts = Timestamp(t) - data[ts] = np.nan # works - - def test_sparse_series_fillna_limit(self): - index = np.arange(10) - s = Series(np.random.randn(10), index=index) - - ss = s[:2].reindex(index).to_sparse() - result = ss.fillna(method='pad', limit=5) - expected = ss.fillna(method='pad', limit=5) - expected = expected.to_dense() - expected[-3:] = np.nan - expected = expected.to_sparse() - assert_series_equal(result, expected) - - ss = s[-2:].reindex(index).to_sparse() - result = ss.fillna(method='backfill', limit=5) - expected = ss.fillna(method='backfill') - expected = expected.to_dense() - expected[:3] = np.nan - expected = expected.to_sparse() - assert_series_equal(result, expected) - - def test_sparse_series_pad_backfill_limit(self): - index = np.arange(10) - s = Series(np.random.randn(10), index=index) - s = s.to_sparse() - - result = s[:2].reindex(index, method='pad', limit=5) - expected = s[:2].reindex(index).fillna(method='pad') - expected = expected.to_dense() - expected[-3:] = np.nan - expected = expected.to_sparse() - assert_series_equal(result, expected) - - result = s[-2:].reindex(index, method='backfill', limit=5) - expected = s[-2:].reindex(index).fillna(method='backfill') - expected = expected.to_dense() - expected[:3] = np.nan - expected = expected.to_sparse() - assert_series_equal(result, expected) - - def test_sparse_frame_pad_backfill_limit(self): - index = np.arange(10) - df = DataFrame(np.random.randn(10, 4), index=index) - sdf = df.to_sparse() - - result = sdf[:2].reindex(index, method='pad', limit=5) - - expected = sdf[:2].reindex(index).fillna(method='pad') - expected = expected.to_dense() - expected.values[-3:] = np.nan - expected = expected.to_sparse() - tm.assert_frame_equal(result, expected) - - result = sdf[-2:].reindex(index, method='backfill', limit=5) - - expected = sdf[-2:].reindex(index).fillna(method='backfill') - expected = expected.to_dense() - expected.values[:3] = np.nan - expected = expected.to_sparse() - tm.assert_frame_equal(result, expected) - - def test_sparse_frame_fillna_limit(self): - index = np.arange(10) - df = DataFrame(np.random.randn(10, 4), index=index) - sdf = df.to_sparse() - - result = sdf[:2].reindex(index) - result = result.fillna(method='pad', limit=5) - - expected = sdf[:2].reindex(index).fillna(method='pad') - expected = expected.to_dense() - expected.values[-3:] = np.nan - expected = expected.to_sparse() - tm.assert_frame_equal(result, expected) - - result = sdf[-2:].reindex(index) - result = result.fillna(method='backfill', limit=5) - - expected = sdf[-2:].reindex(index).fillna(method='backfill') - expected = expected.to_dense() - expected.values[:3] = np.nan - expected = expected.to_sparse() - tm.assert_frame_equal(result, expected) - - def test_pad_require_monotonicity(self): - rng = date_range('1/1/2000', '3/1/2000', freq='B') - - # neither monotonic increasing or decreasing - rng2 = rng[[1, 0, 2]] - - self.assertRaises(ValueError, rng2.get_indexer, rng, method='pad') - - def test_frame_ctor_datetime64_column(self): - rng = 
date_range('1/1/2000 00:00:00', '1/1/2000 1:59:50', freq='10s') - dates = np.asarray(rng) - - df = DataFrame({'A': np.random.randn(len(rng)), 'B': dates}) - self.assertTrue(np.issubdtype(df['B'].dtype, np.dtype('M8[ns]'))) - - def test_frame_add_datetime64_column(self): - rng = date_range('1/1/2000 00:00:00', '1/1/2000 1:59:50', freq='10s') - df = DataFrame(index=np.arange(len(rng))) - - df['A'] = rng - self.assertTrue(np.issubdtype(df['A'].dtype, np.dtype('M8[ns]'))) - - def test_frame_datetime64_pre1900_repr(self): - df = DataFrame({'year': date_range('1/1/1700', periods=50, - freq='A-DEC')}) - # it works! - repr(df) - - def test_frame_add_datetime64_col_other_units(self): - n = 100 - - units = ['h', 'm', 's', 'ms', 'D', 'M', 'Y'] - - ns_dtype = np.dtype('M8[ns]') - - for unit in units: - dtype = np.dtype('M8[%s]' % unit) - vals = np.arange(n, dtype=np.int64).view(dtype) - - df = DataFrame({'ints': np.arange(n)}, index=np.arange(n)) - df[unit] = vals - - ex_vals = to_datetime(vals.astype('O')).values - - self.assertEqual(df[unit].dtype, ns_dtype) - self.assertTrue((df[unit].values == ex_vals).all()) - - # Test insertion into existing datetime64 column - df = DataFrame({'ints': np.arange(n)}, index=np.arange(n)) - df['dates'] = np.arange(n, dtype=np.int64).view(ns_dtype) - - for unit in units: - dtype = np.dtype('M8[%s]' % unit) - vals = np.arange(n, dtype=np.int64).view(dtype) - - tmp = df.copy() - - tmp['dates'] = vals - ex_vals = to_datetime(vals.astype('O')).values - - self.assertTrue((tmp['dates'].values == ex_vals).all()) - - def test_to_datetime_unit(self): - - epoch = 1370745748 - s = Series([epoch + t for t in range(20)]) - result = to_datetime(s, unit='s') - expected = Series([Timestamp('2013-06-09 02:42:28') + timedelta( - seconds=t) for t in range(20)]) - assert_series_equal(result, expected) - - s = Series([epoch + t for t in range(20)]).astype(float) - result = to_datetime(s, unit='s') - expected = Series([Timestamp('2013-06-09 02:42:28') + timedelta( - seconds=t) for t in range(20)]) - assert_series_equal(result, expected) - - s = Series([epoch + t for t in range(20)] + [iNaT]) - result = to_datetime(s, unit='s') - expected = Series([Timestamp('2013-06-09 02:42:28') + timedelta( - seconds=t) for t in range(20)] + [NaT]) - assert_series_equal(result, expected) - - s = Series([epoch + t for t in range(20)] + [iNaT]).astype(float) - result = to_datetime(s, unit='s') - expected = Series([Timestamp('2013-06-09 02:42:28') + timedelta( - seconds=t) for t in range(20)] + [NaT]) - assert_series_equal(result, expected) - - # GH13834 - s = Series([epoch + t for t in np.arange(0, 2, .25)] + - [iNaT]).astype(float) - result = to_datetime(s, unit='s') - expected = Series([Timestamp('2013-06-09 02:42:28') + timedelta( - seconds=t) for t in np.arange(0, 2, .25)] + [NaT]) - assert_series_equal(result, expected) - - s = concat([Series([epoch + t for t in range(20)] - ).astype(float), Series([np.nan])], - ignore_index=True) - result = to_datetime(s, unit='s') - expected = Series([Timestamp('2013-06-09 02:42:28') + timedelta( - seconds=t) for t in range(20)] + [NaT]) - assert_series_equal(result, expected) - - result = to_datetime([1, 2, 'NaT', pd.NaT, np.nan], unit='D') - expected = DatetimeIndex([Timestamp('1970-01-02'), - Timestamp('1970-01-03')] + ['NaT'] * 3) - tm.assert_index_equal(result, expected) - - with self.assertRaises(ValueError): - to_datetime([1, 2, 'foo'], unit='D') - with self.assertRaises(ValueError): - to_datetime([1, 2, 111111111], unit='D') - - # coerce we can process - 
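The deleted comment above introduces the coercion case asserted just below: with `unit=`, integers are interpreted as offsets from the Unix epoch, and `errors='coerce'` maps unparseable entries to `NaT` instead of raising. A standalone sketch (illustrative only):

```python
import pandas as pd

# Integers with unit='D' count days from the epoch.
assert pd.to_datetime(1, unit='D') == pd.Timestamp('1970-01-02')

# errors='coerce' turns bad values into NaT rather than raising.
result = pd.to_datetime([1, 2, 'foo'], unit='D', errors='coerce')
assert pd.isnull(result[2])
```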
expected = DatetimeIndex([Timestamp('1970-01-02'), - Timestamp('1970-01-03')] + ['NaT'] * 1) - result = to_datetime([1, 2, 'foo'], unit='D', errors='coerce') - tm.assert_index_equal(result, expected) - - result = to_datetime([1, 2, 111111111], unit='D', errors='coerce') - tm.assert_index_equal(result, expected) - - def test_series_ctor_datetime64(self): - rng = date_range('1/1/2000 00:00:00', '1/1/2000 1:59:50', freq='10s') - dates = np.asarray(rng) - - series = Series(dates) - self.assertTrue(np.issubdtype(series.dtype, np.dtype('M8[ns]'))) - - def test_index_cast_datetime64_other_units(self): - arr = np.arange(0, 100, 10, dtype=np.int64).view('M8[D]') - - idx = Index(arr) - - self.assertTrue((idx.values == tslib.cast_to_nanoseconds(arr)).all()) - - def test_reindex_series_add_nat(self): - rng = date_range('1/1/2000 00:00:00', periods=10, freq='10s') - series = Series(rng) - - result = series.reindex(lrange(15)) - self.assertTrue(np.issubdtype(result.dtype, np.dtype('M8[ns]'))) - - mask = result.isnull() - self.assertTrue(mask[-5:].all()) - self.assertFalse(mask[:-5].any()) - - def test_reindex_frame_add_nat(self): - rng = date_range('1/1/2000 00:00:00', periods=10, freq='10s') - df = DataFrame({'A': np.random.randn(len(rng)), 'B': rng}) - - result = df.reindex(lrange(15)) - self.assertTrue(np.issubdtype(result['B'].dtype, np.dtype('M8[ns]'))) - - mask = com.isnull(result)['B'] - self.assertTrue(mask[-5:].all()) - self.assertFalse(mask[:-5].any()) - - def test_series_repr_nat(self): - series = Series([0, 1000, 2000, iNaT], dtype='M8[ns]') - - result = repr(series) - expected = ('0 1970-01-01 00:00:00.000000\n' - '1 1970-01-01 00:00:00.000001\n' - '2 1970-01-01 00:00:00.000002\n' - '3 NaT\n' - 'dtype: datetime64[ns]') - self.assertEqual(result, expected) - - def test_fillna_nat(self): - series = Series([0, 1, 2, iNaT], dtype='M8[ns]') - - filled = series.fillna(method='pad') - filled2 = series.fillna(value=series.values[2]) - - expected = series.copy() - expected.values[3] = expected.values[2] - - assert_series_equal(filled, expected) - assert_series_equal(filled2, expected) - - df = DataFrame({'A': series}) - filled = df.fillna(method='pad') - filled2 = df.fillna(value=series.values[2]) - expected = DataFrame({'A': expected}) - assert_frame_equal(filled, expected) - assert_frame_equal(filled2, expected) - - series = Series([iNaT, 0, 1, 2], dtype='M8[ns]') - - filled = series.fillna(method='bfill') - filled2 = series.fillna(value=series[1]) - - expected = series.copy() - expected[0] = expected[1] - - assert_series_equal(filled, expected) - assert_series_equal(filled2, expected) - - df = DataFrame({'A': series}) - filled = df.fillna(method='bfill') - filled2 = df.fillna(value=series[1]) - expected = DataFrame({'A': expected}) - assert_frame_equal(filled, expected) - assert_frame_equal(filled2, expected) - - def test_string_na_nat_conversion(self): - # GH #999, #858 - - from pandas.compat import parse_date - - strings = np.array(['1/1/2000', '1/2/2000', np.nan, - '1/4/2000, 12:34:56'], dtype=object) - - expected = np.empty(4, dtype='M8[ns]') - for i, val in enumerate(strings): - if com.isnull(val): - expected[i] = iNaT - else: - expected[i] = parse_date(val) - - result = tslib.array_to_datetime(strings) - assert_almost_equal(result, expected) - - result2 = to_datetime(strings) - tm.assertIsInstance(result2, DatetimeIndex) - tm.assert_numpy_array_equal(result, result2.values) - - malformed = np.array(['1/100/2000', np.nan], dtype=object) - - # GH 10636, default is now 'raise' - 
self.assertRaises(ValueError, - lambda: to_datetime(malformed, errors='raise')) - - result = to_datetime(malformed, errors='ignore') - tm.assert_numpy_array_equal(result, malformed) - - self.assertRaises(ValueError, to_datetime, malformed, errors='raise') - - idx = ['a', 'b', 'c', 'd', 'e'] - series = Series(['1/1/2000', np.nan, '1/3/2000', np.nan, - '1/5/2000'], index=idx, name='foo') - dseries = Series([to_datetime('1/1/2000'), np.nan, - to_datetime('1/3/2000'), np.nan, - to_datetime('1/5/2000')], index=idx, name='foo') - - result = to_datetime(series) - dresult = to_datetime(dseries) - - expected = Series(np.empty(5, dtype='M8[ns]'), index=idx) - for i in range(5): - x = series[i] - if isnull(x): - expected[i] = iNaT - else: - expected[i] = to_datetime(x) - - assert_series_equal(result, expected, check_names=False) - self.assertEqual(result.name, 'foo') - - assert_series_equal(dresult, expected, check_names=False) - self.assertEqual(dresult.name, 'foo') - - def test_to_datetime_iso8601(self): - result = to_datetime(["2012-01-01 00:00:00"]) - exp = Timestamp("2012-01-01 00:00:00") - self.assertEqual(result[0], exp) - - result = to_datetime(['20121001']) # bad iso 8601 - exp = Timestamp('2012-10-01') - self.assertEqual(result[0], exp) - - def test_to_datetime_default(self): - rs = to_datetime('2001') - xp = datetime(2001, 1, 1) - self.assertTrue(rs, xp) - - # dayfirst is essentially broken - - # to_datetime('01-13-2012', dayfirst=True) - # self.assertRaises(ValueError, to_datetime('01-13-2012', - # dayfirst=True)) - - def test_to_datetime_on_datetime64_series(self): - # #2699 - s = Series(date_range('1/1/2000', periods=10)) - - result = to_datetime(s) - self.assertEqual(result[0], s[0]) - - def test_to_datetime_with_space_in_series(self): - # GH 6428 - s = Series(['10/18/2006', '10/18/2008', ' ']) - tm.assertRaises(ValueError, lambda: to_datetime(s, errors='raise')) - result_coerce = to_datetime(s, errors='coerce') - expected_coerce = Series([datetime(2006, 10, 18), - datetime(2008, 10, 18), - pd.NaT]) - tm.assert_series_equal(result_coerce, expected_coerce) - result_ignore = to_datetime(s, errors='ignore') - tm.assert_series_equal(result_ignore, s) - - def test_to_datetime_with_apply(self): - # this is only locale tested with US/None locales - _skip_if_has_locale() - - # GH 5195 - # with a format and coerce a single item to_datetime fails - td = Series(['May 04', 'Jun 02', 'Dec 11'], index=[1, 2, 3]) - expected = pd.to_datetime(td, format='%b %y') - result = td.apply(pd.to_datetime, format='%b %y') - assert_series_equal(result, expected) - - td = pd.Series(['May 04', 'Jun 02', ''], index=[1, 2, 3]) - self.assertRaises(ValueError, - lambda: pd.to_datetime(td, format='%b %y', - errors='raise')) - self.assertRaises(ValueError, - lambda: td.apply(pd.to_datetime, format='%b %y', - errors='raise')) - expected = pd.to_datetime(td, format='%b %y', errors='coerce') - - result = td.apply( - lambda x: pd.to_datetime(x, format='%b %y', errors='coerce')) - assert_series_equal(result, expected) - - def test_nat_vector_field_access(self): - idx = DatetimeIndex(['1/1/2000', None, None, '1/4/2000']) - - fields = ['year', 'quarter', 'month', 'day', 'hour', 'minute', - 'second', 'microsecond', 'nanosecond', 'week', 'dayofyear', - 'days_in_month', 'is_leap_year'] - - for field in fields: - result = getattr(idx, field) - expected = [getattr(x, field) for x in idx] - self.assert_numpy_array_equal(result, np.array(expected)) - - s = pd.Series(idx) - - for field in fields: - result = getattr(s.dt, field) - 
expected = [getattr(x, field) for x in idx] - self.assert_series_equal(result, pd.Series(expected)) - - def test_nat_scalar_field_access(self): - fields = ['year', 'quarter', 'month', 'day', 'hour', 'minute', - 'second', 'microsecond', 'nanosecond', 'week', 'dayofyear', - 'days_in_month', 'daysinmonth', 'dayofweek', 'weekday_name'] - for field in fields: - result = getattr(NaT, field) - self.assertTrue(np.isnan(result)) - - def test_NaT_methods(self): - # GH 9513 - raise_methods = ['astimezone', 'combine', 'ctime', 'dst', - 'fromordinal', 'fromtimestamp', 'isocalendar', - 'strftime', 'strptime', 'time', 'timestamp', - 'timetuple', 'timetz', 'toordinal', 'tzname', - 'utcfromtimestamp', 'utcnow', 'utcoffset', - 'utctimetuple'] - nat_methods = ['date', 'now', 'replace', 'to_datetime', 'today'] - nan_methods = ['weekday', 'isoweekday'] - - for method in raise_methods: - if hasattr(NaT, method): - self.assertRaises(ValueError, getattr(NaT, method)) - - for method in nan_methods: - if hasattr(NaT, method): - self.assertTrue(np.isnan(getattr(NaT, method)())) - - for method in nat_methods: - if hasattr(NaT, method): - # see gh-8254 - exp_warning = None - if method == 'to_datetime': - exp_warning = FutureWarning - with tm.assert_produces_warning( - exp_warning, check_stacklevel=False): - self.assertIs(getattr(NaT, method)(), NaT) - - # GH 12300 - self.assertEqual(NaT.isoformat(), 'NaT') - - def test_to_datetime_types(self): - - # empty string - result = to_datetime('') - self.assertIs(result, NaT) - - result = to_datetime(['', '']) - self.assertTrue(isnull(result).all()) - - # ints - result = Timestamp(0) - expected = to_datetime(0) - self.assertEqual(result, expected) - - # GH 3888 (strings) - expected = to_datetime(['2012'])[0] - result = to_datetime('2012') - self.assertEqual(result, expected) - - # array = ['2012','20120101','20120101 12:01:01'] - array = ['20120101', '20120101 12:01:01'] - expected = list(to_datetime(array)) - result = lmap(Timestamp, array) - tm.assert_almost_equal(result, expected) - - # currently fails ### - # result = Timestamp('2012') - # expected = to_datetime('2012') - # self.assertEqual(result, expected) - - def test_to_datetime_unprocessable_input(self): - # GH 4928 - self.assert_numpy_array_equal( - to_datetime([1, '1'], errors='ignore'), - np.array([1, '1'], dtype='O') - ) - self.assertRaises(TypeError, to_datetime, [1, '1'], errors='raise') - - def test_to_datetime_other_datetime64_units(self): - # 5/25/2012 - scalar = np.int64(1337904000000000).view('M8[us]') - as_obj = scalar.astype('O') - - index = DatetimeIndex([scalar]) - self.assertEqual(index[0], scalar.astype('O')) - - value = Timestamp(scalar) - self.assertEqual(value, as_obj) - - def test_to_datetime_list_of_integers(self): - rng = date_range('1/1/2000', periods=20) - rng = DatetimeIndex(rng.values) - - ints = list(rng.asi8) - - result = DatetimeIndex(ints) - - tm.assert_index_equal(rng, result) - - def test_to_datetime_freq(self): - xp = bdate_range('2000-1-1', periods=10, tz='UTC') - rs = xp.to_datetime() - self.assertEqual(xp.freq, rs.freq) - self.assertEqual(xp.tzinfo, rs.tzinfo) - - def test_range_edges(self): - # GH 13672 - idx = DatetimeIndex(start=Timestamp('1970-01-01 00:00:00.000000001'), - end=Timestamp('1970-01-01 00:00:00.000000004'), - freq='N') - exp = DatetimeIndex(['1970-01-01 00:00:00.000000001', - '1970-01-01 00:00:00.000000002', - '1970-01-01 00:00:00.000000003', - '1970-01-01 00:00:00.000000004']) - tm.assert_index_equal(idx, exp) - - idx = DatetimeIndex(start=Timestamp('1970-01-01 
00:00:00.000000004'), - end=Timestamp('1970-01-01 00:00:00.000000001'), - freq='N') - exp = DatetimeIndex([]) - tm.assert_index_equal(idx, exp) - - idx = DatetimeIndex(start=Timestamp('1970-01-01 00:00:00.000000001'), - end=Timestamp('1970-01-01 00:00:00.000000001'), - freq='N') - exp = DatetimeIndex(['1970-01-01 00:00:00.000000001']) - tm.assert_index_equal(idx, exp) - - idx = DatetimeIndex(start=Timestamp('1970-01-01 00:00:00.000001'), - end=Timestamp('1970-01-01 00:00:00.000004'), - freq='U') - exp = DatetimeIndex(['1970-01-01 00:00:00.000001', - '1970-01-01 00:00:00.000002', - '1970-01-01 00:00:00.000003', - '1970-01-01 00:00:00.000004']) - tm.assert_index_equal(idx, exp) - - idx = DatetimeIndex(start=Timestamp('1970-01-01 00:00:00.001'), - end=Timestamp('1970-01-01 00:00:00.004'), - freq='L') - exp = DatetimeIndex(['1970-01-01 00:00:00.001', - '1970-01-01 00:00:00.002', - '1970-01-01 00:00:00.003', - '1970-01-01 00:00:00.004']) - tm.assert_index_equal(idx, exp) - - idx = DatetimeIndex(start=Timestamp('1970-01-01 00:00:01'), - end=Timestamp('1970-01-01 00:00:04'), freq='S') - exp = DatetimeIndex(['1970-01-01 00:00:01', '1970-01-01 00:00:02', - '1970-01-01 00:00:03', '1970-01-01 00:00:04']) - tm.assert_index_equal(idx, exp) - - idx = DatetimeIndex(start=Timestamp('1970-01-01 00:01'), - end=Timestamp('1970-01-01 00:04'), freq='T') - exp = DatetimeIndex(['1970-01-01 00:01', '1970-01-01 00:02', - '1970-01-01 00:03', '1970-01-01 00:04']) - tm.assert_index_equal(idx, exp) - - idx = DatetimeIndex(start=Timestamp('1970-01-01 01:00'), - end=Timestamp('1970-01-01 04:00'), freq='H') - exp = DatetimeIndex(['1970-01-01 01:00', '1970-01-01 02:00', - '1970-01-01 03:00', '1970-01-01 04:00']) - tm.assert_index_equal(idx, exp) - - idx = DatetimeIndex(start=Timestamp('1970-01-01'), - end=Timestamp('1970-01-04'), freq='D') - exp = DatetimeIndex(['1970-01-01', '1970-01-02', - '1970-01-03', '1970-01-04']) - tm.assert_index_equal(idx, exp) - - def test_range_misspecified(self): - # GH #1095 - - self.assertRaises(ValueError, date_range, '1/1/2000') - self.assertRaises(ValueError, date_range, end='1/1/2000') - self.assertRaises(ValueError, date_range, periods=10) - - self.assertRaises(ValueError, date_range, '1/1/2000', freq='H') - self.assertRaises(ValueError, date_range, end='1/1/2000', freq='H') - self.assertRaises(ValueError, date_range, periods=10, freq='H') - - def test_reasonable_keyerror(self): - # GH #1062 - index = DatetimeIndex(['1/3/2000']) - try: - index.get_loc('1/1/2000') - except KeyError as e: - self.assertIn('2000', str(e)) - - def test_reindex_with_datetimes(self): - rng = date_range('1/1/2000', periods=20) - ts = Series(np.random.randn(20), index=rng) - - result = ts.reindex(list(ts.index[5:10])) - expected = ts[5:10] - tm.assert_series_equal(result, expected) - - result = ts[list(ts.index[5:10])] - tm.assert_series_equal(result, expected) - - def test_asfreq_keep_index_name(self): - # GH #9854 - index_name = 'bar' - index = pd.date_range('20130101', periods=20, name=index_name) - df = pd.DataFrame([x for x in range(20)], columns=['foo'], index=index) - - self.assertEqual(index_name, df.index.name) - self.assertEqual(index_name, df.asfreq('10D').index.name) - - def test_promote_datetime_date(self): - rng = date_range('1/1/2000', periods=20) - ts = Series(np.random.randn(20), index=rng) - - ts_slice = ts[5:] - ts2 = ts_slice.copy() - ts2.index = [x.date() for x in ts2.index] - - result = ts + ts2 - result2 = ts2 + ts - expected = ts + ts[5:] - assert_series_equal(result, expected) - 
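`test_range_misspecified` above encodes the construction rule for `date_range`: two of `start`, `end`, and `periods` must be given to pin the range down, and an under-specified call raises `ValueError`. A minimal sketch (illustrative, not part of the removed file):

```python
import pandas as pd

# Under-specified: a start date alone cannot determine a range (GH 1095).
try:
    pd.date_range('1/1/2000')
except ValueError:
    pass  # expected

# Fully specified: both endpoints are included.
idx = pd.date_range(start='1970-01-01', end='1970-01-04', freq='D')
assert len(idx) == 4
```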
assert_series_equal(result2, expected) - - # test asfreq - result = ts2.asfreq('4H', method='ffill') - expected = ts[5:].asfreq('4H', method='ffill') - assert_series_equal(result, expected) - - result = rng.get_indexer(ts2.index) - expected = rng.get_indexer(ts_slice.index) - self.assert_numpy_array_equal(result, expected) - - def test_asfreq_normalize(self): - rng = date_range('1/1/2000 09:30', periods=20) - norm = date_range('1/1/2000', periods=20) - vals = np.random.randn(20) - ts = Series(vals, index=rng) - - result = ts.asfreq('D', normalize=True) - norm = date_range('1/1/2000', periods=20) - expected = Series(vals, index=norm) - - assert_series_equal(result, expected) - - vals = np.random.randn(20, 3) - ts = DataFrame(vals, index=rng) - - result = ts.asfreq('D', normalize=True) - expected = DataFrame(vals, index=norm) - - assert_frame_equal(result, expected) - - def test_date_range_gen_error(self): - rng = date_range('1/1/2000 00:00', '1/1/2000 00:18', freq='5min') - self.assertEqual(len(rng), 4) - - def test_date_range_negative_freq(self): - # GH 11018 - rng = date_range('2011-12-31', freq='-2A', periods=3) - exp = pd.DatetimeIndex(['2011-12-31', '2009-12-31', - '2007-12-31'], freq='-2A') - tm.assert_index_equal(rng, exp) - self.assertEqual(rng.freq, '-2A') - - rng = date_range('2011-01-31', freq='-2M', periods=3) - exp = pd.DatetimeIndex(['2011-01-31', '2010-11-30', - '2010-09-30'], freq='-2M') - tm.assert_index_equal(rng, exp) - self.assertEqual(rng.freq, '-2M') - - def test_date_range_bms_bug(self): - # #1645 - rng = date_range('1/1/2000', periods=10, freq='BMS') - - ex_first = Timestamp('2000-01-03') - self.assertEqual(rng[0], ex_first) - - def test_date_range_businesshour(self): - idx = DatetimeIndex(['2014-07-04 09:00', '2014-07-04 10:00', - '2014-07-04 11:00', - '2014-07-04 12:00', '2014-07-04 13:00', - '2014-07-04 14:00', - '2014-07-04 15:00', '2014-07-04 16:00'], - freq='BH') - rng = date_range('2014-07-04 09:00', '2014-07-04 16:00', freq='BH') - tm.assert_index_equal(idx, rng) - - idx = DatetimeIndex( - ['2014-07-04 16:00', '2014-07-07 09:00'], freq='BH') - rng = date_range('2014-07-04 16:00', '2014-07-07 09:00', freq='BH') - tm.assert_index_equal(idx, rng) - - idx = DatetimeIndex(['2014-07-04 09:00', '2014-07-04 10:00', - '2014-07-04 11:00', - '2014-07-04 12:00', '2014-07-04 13:00', - '2014-07-04 14:00', - '2014-07-04 15:00', '2014-07-04 16:00', - '2014-07-07 09:00', '2014-07-07 10:00', - '2014-07-07 11:00', - '2014-07-07 12:00', '2014-07-07 13:00', - '2014-07-07 14:00', - '2014-07-07 15:00', '2014-07-07 16:00', - '2014-07-08 09:00', '2014-07-08 10:00', - '2014-07-08 11:00', - '2014-07-08 12:00', '2014-07-08 13:00', - '2014-07-08 14:00', - '2014-07-08 15:00', '2014-07-08 16:00'], - freq='BH') - rng = date_range('2014-07-04 09:00', '2014-07-08 16:00', freq='BH') - tm.assert_index_equal(idx, rng) - - def test_first_subset(self): - ts = _simple_ts('1/1/2000', '1/1/2010', freq='12h') - result = ts.first('10d') - self.assertEqual(len(result), 20) - - ts = _simple_ts('1/1/2000', '1/1/2010') - result = ts.first('10d') - self.assertEqual(len(result), 10) - - result = ts.first('3M') - expected = ts[:'3/31/2000'] - assert_series_equal(result, expected) - - result = ts.first('21D') - expected = ts[:21] - assert_series_equal(result, expected) - - result = ts[:0].first('3M') - assert_series_equal(result, ts[:0]) - - def test_last_subset(self): - ts = _simple_ts('1/1/2000', '1/1/2010', freq='12h') - result = ts.last('10d') - self.assertEqual(len(result), 20) - - ts = 
_simple_ts('1/1/2000', '1/1/2010') - result = ts.last('10d') - self.assertEqual(len(result), 10) - - result = ts.last('21D') - expected = ts['12/12/2009':] - assert_series_equal(result, expected) - - result = ts.last('21D') - expected = ts[-21:] - assert_series_equal(result, expected) - - result = ts[:0].last('3M') - assert_series_equal(result, ts[:0]) - - def test_format_pre_1900_dates(self): - rng = date_range('1/1/1850', '1/1/1950', freq='A-DEC') - rng.format() - ts = Series(1, index=rng) - repr(ts) - - def test_at_time(self): - rng = date_range('1/1/2000', '1/5/2000', freq='5min') - ts = Series(np.random.randn(len(rng)), index=rng) - rs = ts.at_time(rng[1]) - self.assertTrue((rs.index.hour == rng[1].hour).all()) - self.assertTrue((rs.index.minute == rng[1].minute).all()) - self.assertTrue((rs.index.second == rng[1].second).all()) - - result = ts.at_time('9:30') - expected = ts.at_time(time(9, 30)) - assert_series_equal(result, expected) - - df = DataFrame(np.random.randn(len(rng), 3), index=rng) - - result = ts[time(9, 30)] - result_df = df.loc[time(9, 30)] - expected = ts[(rng.hour == 9) & (rng.minute == 30)] - exp_df = df[(rng.hour == 9) & (rng.minute == 30)] - - # expected.index = date_range('1/1/2000', '1/4/2000') - - assert_series_equal(result, expected) - tm.assert_frame_equal(result_df, exp_df) - - chunk = df.loc['1/4/2000':] - result = chunk.loc[time(9, 30)] - expected = result_df[-1:] - tm.assert_frame_equal(result, expected) - - # midnight, everything - rng = date_range('1/1/2000', '1/31/2000') - ts = Series(np.random.randn(len(rng)), index=rng) - - result = ts.at_time(time(0, 0)) - assert_series_equal(result, ts) - - # time doesn't exist - rng = date_range('1/1/2012', freq='23Min', periods=384) - ts = Series(np.random.randn(len(rng)), rng) - rs = ts.at_time('16:00') - self.assertEqual(len(rs), 0) - - def test_at_time_frame(self): - rng = date_range('1/1/2000', '1/5/2000', freq='5min') - ts = DataFrame(np.random.randn(len(rng), 2), index=rng) - rs = ts.at_time(rng[1]) - self.assertTrue((rs.index.hour == rng[1].hour).all()) - self.assertTrue((rs.index.minute == rng[1].minute).all()) - self.assertTrue((rs.index.second == rng[1].second).all()) - - result = ts.at_time('9:30') - expected = ts.at_time(time(9, 30)) - assert_frame_equal(result, expected) - - result = ts.loc[time(9, 30)] - expected = ts.loc[(rng.hour == 9) & (rng.minute == 30)] - - assert_frame_equal(result, expected) - - # midnight, everything - rng = date_range('1/1/2000', '1/31/2000') - ts = DataFrame(np.random.randn(len(rng), 3), index=rng) - - result = ts.at_time(time(0, 0)) - assert_frame_equal(result, ts) - - # time doesn't exist - rng = date_range('1/1/2012', freq='23Min', periods=384) - ts = DataFrame(np.random.randn(len(rng), 2), rng) - rs = ts.at_time('16:00') - self.assertEqual(len(rs), 0) - - def test_between_time(self): - rng = date_range('1/1/2000', '1/5/2000', freq='5min') - ts = Series(np.random.randn(len(rng)), index=rng) - stime = time(0, 0) - etime = time(1, 0) - - close_open = product([True, False], [True, False]) - for inc_start, inc_end in close_open: - filtered = ts.between_time(stime, etime, inc_start, inc_end) - exp_len = 13 * 4 + 1 - if not inc_start: - exp_len -= 5 - if not inc_end: - exp_len -= 4 - - self.assertEqual(len(filtered), exp_len) - for rs in filtered.index: - t = rs.time() - if inc_start: - self.assertTrue(t >= stime) - else: - self.assertTrue(t > stime) - - if inc_end: - self.assertTrue(t <= etime) - else: - self.assertTrue(t < etime) - - result = ts.between_time('00:00', 
'01:00') - expected = ts.between_time(stime, etime) - assert_series_equal(result, expected) - - # across midnight - rng = date_range('1/1/2000', '1/5/2000', freq='5min') - ts = Series(np.random.randn(len(rng)), index=rng) - stime = time(22, 0) - etime = time(9, 0) - - close_open = product([True, False], [True, False]) - for inc_start, inc_end in close_open: - filtered = ts.between_time(stime, etime, inc_start, inc_end) - exp_len = (12 * 11 + 1) * 4 + 1 - if not inc_start: - exp_len -= 4 - if not inc_end: - exp_len -= 4 - - self.assertEqual(len(filtered), exp_len) - for rs in filtered.index: - t = rs.time() - if inc_start: - self.assertTrue((t >= stime) or (t <= etime)) - else: - self.assertTrue((t > stime) or (t <= etime)) - - if inc_end: - self.assertTrue((t <= etime) or (t >= stime)) - else: - self.assertTrue((t < etime) or (t >= stime)) - - def test_between_time_frame(self): - rng = date_range('1/1/2000', '1/5/2000', freq='5min') - ts = DataFrame(np.random.randn(len(rng), 2), index=rng) - stime = time(0, 0) - etime = time(1, 0) - - close_open = product([True, False], [True, False]) - for inc_start, inc_end in close_open: - filtered = ts.between_time(stime, etime, inc_start, inc_end) - exp_len = 13 * 4 + 1 - if not inc_start: - exp_len -= 5 - if not inc_end: - exp_len -= 4 - - self.assertEqual(len(filtered), exp_len) - for rs in filtered.index: - t = rs.time() - if inc_start: - self.assertTrue(t >= stime) - else: - self.assertTrue(t > stime) - - if inc_end: - self.assertTrue(t <= etime) - else: - self.assertTrue(t < etime) - - result = ts.between_time('00:00', '01:00') - expected = ts.between_time(stime, etime) - assert_frame_equal(result, expected) - - # across midnight - rng = date_range('1/1/2000', '1/5/2000', freq='5min') - ts = DataFrame(np.random.randn(len(rng), 2), index=rng) - stime = time(22, 0) - etime = time(9, 0) - - close_open = product([True, False], [True, False]) - for inc_start, inc_end in close_open: - filtered = ts.between_time(stime, etime, inc_start, inc_end) - exp_len = (12 * 11 + 1) * 4 + 1 - if not inc_start: - exp_len -= 4 - if not inc_end: - exp_len -= 4 - - self.assertEqual(len(filtered), exp_len) - for rs in filtered.index: - t = rs.time() - if inc_start: - self.assertTrue((t >= stime) or (t <= etime)) - else: - self.assertTrue((t > stime) or (t <= etime)) - - if inc_end: - self.assertTrue((t <= etime) or (t >= stime)) - else: - self.assertTrue((t < etime) or (t >= stime)) - - def test_between_time_types(self): - # GH11818 - rng = date_range('1/1/2000', '1/5/2000', freq='5min') - self.assertRaises(ValueError, rng.indexer_between_time, - datetime(2010, 1, 2, 1), datetime(2010, 1, 2, 5)) - - frame = DataFrame({'A': 0}, index=rng) - self.assertRaises(ValueError, frame.between_time, - datetime(2010, 1, 2, 1), datetime(2010, 1, 2, 5)) - - series = Series(0, index=rng) - self.assertRaises(ValueError, series.between_time, - datetime(2010, 1, 2, 1), datetime(2010, 1, 2, 5)) - - def test_between_time_formats(self): - # GH11818 - _skip_if_has_locale() - - rng = date_range('1/1/2000', '1/5/2000', freq='5min') - ts = DataFrame(np.random.randn(len(rng), 2), index=rng) - - strings = [("2:00", "2:30"), ("0200", "0230"), ("2:00am", "2:30am"), - ("0200am", "0230am"), ("2:00:00", "2:30:00"), - ("020000", "023000"), ("2:00:00am", "2:30:00am"), - ("020000am", "023000am")] - expected_length = 28 - - for time_string in strings: - self.assertEqual(len(ts.between_time(*time_string)), - expected_length, - "%s - %s" % time_string) - - def test_dti_constructor_preserve_dti_freq(self): - 
rng = date_range('1/1/2000', '1/2/2000', freq='5min') - - rng2 = DatetimeIndex(rng) - self.assertEqual(rng.freq, rng2.freq) - - def test_dti_constructor_years_only(self): - # GH 6961 - for tz in [None, 'UTC', 'Asia/Tokyo', 'dateutil/US/Pacific']: - rng1 = date_range('2014', '2015', freq='M', tz=tz) - expected1 = date_range('2014-01-31', '2014-12-31', freq='M', tz=tz) - - rng2 = date_range('2014', '2015', freq='MS', tz=tz) - expected2 = date_range('2014-01-01', '2015-01-01', freq='MS', - tz=tz) - - rng3 = date_range('2014', '2020', freq='A', tz=tz) - expected3 = date_range('2014-12-31', '2019-12-31', freq='A', tz=tz) - - rng4 = date_range('2014', '2020', freq='AS', tz=tz) - expected4 = date_range('2014-01-01', '2020-01-01', freq='AS', - tz=tz) - - for rng, expected in [(rng1, expected1), (rng2, expected2), - (rng3, expected3), (rng4, expected4)]: - tm.assert_index_equal(rng, expected) - - def test_dti_constructor_small_int(self): - # GH 13721 - exp = DatetimeIndex(['1970-01-01 00:00:00.00000000', - '1970-01-01 00:00:00.00000001', - '1970-01-01 00:00:00.00000002']) - - for dtype in [np.int64, np.int32, np.int16, np.int8]: - arr = np.array([0, 10, 20], dtype=dtype) - tm.assert_index_equal(DatetimeIndex(arr), exp) - - def test_dti_constructor_numpy_timeunits(self): - # GH 9114 - base = pd.to_datetime(['2000-01-01T00:00', '2000-01-02T00:00', 'NaT']) - - for dtype in ['datetime64[h]', 'datetime64[m]', 'datetime64[s]', - 'datetime64[ms]', 'datetime64[us]', 'datetime64[ns]']: - values = base.values.astype(dtype) - - tm.assert_index_equal(DatetimeIndex(values), base) - tm.assert_index_equal(to_datetime(values), base) - - def test_normalize(self): - rng = date_range('1/1/2000 9:30', periods=10, freq='D') - - result = rng.normalize() - expected = date_range('1/1/2000', periods=10, freq='D') - tm.assert_index_equal(result, expected) - - rng_ns = pd.DatetimeIndex(np.array([1380585623454345752, - 1380585612343234312]).astype( - "datetime64[ns]")) - rng_ns_normalized = rng_ns.normalize() - expected = pd.DatetimeIndex(np.array([1380585600000000000, - 1380585600000000000]).astype( - "datetime64[ns]")) - tm.assert_index_equal(rng_ns_normalized, expected) - - self.assertTrue(result.is_normalized) - self.assertFalse(rng.is_normalized) - - def test_to_period(self): - from pandas.tseries.period import period_range - - ts = _simple_ts('1/1/2000', '1/1/2001') - - pts = ts.to_period() - exp = ts.copy() - exp.index = period_range('1/1/2000', '1/1/2001') - assert_series_equal(pts, exp) - - pts = ts.to_period('M') - exp.index = exp.index.asfreq('M') - tm.assert_index_equal(pts.index, exp.index.asfreq('M')) - assert_series_equal(pts, exp) - - # GH 7606 without freq - idx = DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', - '2011-01-04']) - exp_idx = pd.PeriodIndex(['2011-01-01', '2011-01-02', '2011-01-03', - '2011-01-04'], freq='D') - - s = Series(np.random.randn(4), index=idx) - expected = s.copy() - expected.index = exp_idx - assert_series_equal(s.to_period(), expected) - - df = DataFrame(np.random.randn(4, 4), index=idx, columns=idx) - expected = df.copy() - expected.index = exp_idx - assert_frame_equal(df.to_period(), expected) - - expected = df.copy() - expected.columns = exp_idx - assert_frame_equal(df.to_period(axis=1), expected) - - def create_dt64_based_index(self): - data = [Timestamp('2007-01-01 10:11:12.123456Z'), - Timestamp('2007-01-01 10:11:13.789123Z')] - index = DatetimeIndex(data) - return index - - def test_to_period_millisecond(self): - index = self.create_dt64_based_index() - - period = 
index.to_period(freq='L') - self.assertEqual(2, len(period)) - self.assertEqual(period[0], Period('2007-01-01 10:11:12.123Z', 'L')) - self.assertEqual(period[1], Period('2007-01-01 10:11:13.789Z', 'L')) - - def test_to_period_microsecond(self): - index = self.create_dt64_based_index() - - period = index.to_period(freq='U') - self.assertEqual(2, len(period)) - self.assertEqual(period[0], Period('2007-01-01 10:11:12.123456Z', 'U')) - self.assertEqual(period[1], Period('2007-01-01 10:11:13.789123Z', 'U')) - - def test_to_period_tz_pytz(self): - tm._skip_if_no_pytz() - from dateutil.tz import tzlocal - from pytz import utc as UTC - - xp = date_range('1/1/2000', '4/1/2000').to_period() - - ts = date_range('1/1/2000', '4/1/2000', tz='US/Eastern') - - result = ts.to_period()[0] - expected = ts[0].to_period() - - self.assertEqual(result, expected) - tm.assert_index_equal(ts.to_period(), xp) - - ts = date_range('1/1/2000', '4/1/2000', tz=UTC) - - result = ts.to_period()[0] - expected = ts[0].to_period() - - self.assertEqual(result, expected) - tm.assert_index_equal(ts.to_period(), xp) - - ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal()) - - result = ts.to_period()[0] - expected = ts[0].to_period() - - self.assertEqual(result, expected) - tm.assert_index_equal(ts.to_period(), xp) - - def test_to_period_tz_explicit_pytz(self): - tm._skip_if_no_pytz() - import pytz - from dateutil.tz import tzlocal - - xp = date_range('1/1/2000', '4/1/2000').to_period() - - ts = date_range('1/1/2000', '4/1/2000', tz=pytz.timezone('US/Eastern')) - - result = ts.to_period()[0] - expected = ts[0].to_period() - - self.assertTrue(result == expected) - tm.assert_index_equal(ts.to_period(), xp) - - ts = date_range('1/1/2000', '4/1/2000', tz=pytz.utc) - - result = ts.to_period()[0] - expected = ts[0].to_period() - - self.assertTrue(result == expected) - tm.assert_index_equal(ts.to_period(), xp) - - ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal()) - - result = ts.to_period()[0] - expected = ts[0].to_period() - - self.assertTrue(result == expected) - tm.assert_index_equal(ts.to_period(), xp) - - def test_to_period_tz_dateutil(self): - tm._skip_if_no_dateutil() - import dateutil - from dateutil.tz import tzlocal - - xp = date_range('1/1/2000', '4/1/2000').to_period() - - ts = date_range('1/1/2000', '4/1/2000', tz='dateutil/US/Eastern') - - result = ts.to_period()[0] - expected = ts[0].to_period() - - self.assertTrue(result == expected) - tm.assert_index_equal(ts.to_period(), xp) - - ts = date_range('1/1/2000', '4/1/2000', tz=dateutil.tz.tzutc()) - - result = ts.to_period()[0] - expected = ts[0].to_period() - - self.assertTrue(result == expected) - tm.assert_index_equal(ts.to_period(), xp) - - ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal()) - - result = ts.to_period()[0] - expected = ts[0].to_period() - - self.assertTrue(result == expected) - tm.assert_index_equal(ts.to_period(), xp) - - def test_frame_to_period(self): - K = 5 - from pandas.tseries.period import period_range - - dr = date_range('1/1/2000', '1/1/2001') - pr = period_range('1/1/2000', '1/1/2001') - df = DataFrame(randn(len(dr), K), index=dr) - df['mix'] = 'a' - - pts = df.to_period() - exp = df.copy() - exp.index = pr - assert_frame_equal(pts, exp) - - pts = df.to_period('M') - tm.assert_index_equal(pts.index, exp.index.asfreq('M')) - - df = df.T - pts = df.to_period(axis=1) - exp = df.copy() - exp.columns = pr - assert_frame_equal(pts, exp) - - pts = df.to_period('M', axis=1) - tm.assert_index_equal(pts.columns, exp.columns.asfreq('M')) - - 
self.assertRaises(ValueError, df.to_period, axis=2) - - def test_timestamp_fields(self): - # extra fields from DatetimeIndex like quarter and week - idx = tm.makeDateIndex(100) - - fields = ['dayofweek', 'dayofyear', 'week', 'weekofyear', 'quarter', - 'days_in_month', 'is_month_start', 'is_month_end', - 'is_quarter_start', 'is_quarter_end', 'is_year_start', - 'is_year_end', 'weekday_name'] - for f in fields: - expected = getattr(idx, f)[-1] - result = getattr(Timestamp(idx[-1]), f) - self.assertEqual(result, expected) - - self.assertEqual(idx.freq, Timestamp(idx[-1], idx.freq).freq) - self.assertEqual(idx.freqstr, Timestamp(idx[-1], idx.freq).freqstr) - - def test_woy_boundary(self): - # make sure weeks at year boundaries are correct - d = datetime(2013, 12, 31) - result = Timestamp(d).week - expected = 1 # ISO standard - self.assertEqual(result, expected) - - d = datetime(2008, 12, 28) - result = Timestamp(d).week - expected = 52 # ISO standard - self.assertEqual(result, expected) - - d = datetime(2009, 12, 31) - result = Timestamp(d).week - expected = 53 # ISO standard - self.assertEqual(result, expected) - - d = datetime(2010, 1, 1) - result = Timestamp(d).week - expected = 53 # ISO standard - self.assertEqual(result, expected) - - d = datetime(2010, 1, 3) - result = Timestamp(d).week - expected = 53 # ISO standard - self.assertEqual(result, expected) - - result = np.array([Timestamp(datetime(*args)).week - for args in [(2000, 1, 1), (2000, 1, 2), ( - 2005, 1, 1), (2005, 1, 2)]]) - self.assertTrue((result == [52, 52, 53, 53]).all()) - - def test_timestamp_date_out_of_range(self): - self.assertRaises(ValueError, Timestamp, '1676-01-01') - self.assertRaises(ValueError, Timestamp, '2263-01-01') - - # 1475 - self.assertRaises(ValueError, DatetimeIndex, ['1400-01-01']) - self.assertRaises(ValueError, DatetimeIndex, [datetime(1400, 1, 1)]) - - def test_compat_replace(self): - # https://github.com/statsmodels/statsmodels/issues/3349 - # replace should take ints/longs for compat - - for f in [compat.long, int]: - result = date_range(Timestamp('1960-04-01 00:00:00', - freq='QS-JAN'), - periods=f(76), - freq='QS-JAN') - self.assertEqual(len(result), 76) - - def test_timestamp_repr(self): - # pre-1900 - stamp = Timestamp('1850-01-01', tz='US/Eastern') - repr(stamp) - - iso8601 = '1850-01-01 01:23:45.012345' - stamp = Timestamp(iso8601, tz='US/Eastern') - result = repr(stamp) - self.assertIn(iso8601, result) - - def test_timestamp_from_ordinal(self): - - # GH 3042 - dt = datetime(2011, 4, 16, 0, 0) - ts = Timestamp.fromordinal(dt.toordinal()) - self.assertEqual(ts.to_pydatetime(), dt) - - # with a tzinfo - stamp = Timestamp('2011-4-16', tz='US/Eastern') - dt_tz = stamp.to_pydatetime() - ts = Timestamp.fromordinal(dt_tz.toordinal(), tz='US/Eastern') - self.assertEqual(ts.to_pydatetime(), dt_tz) - - def test_datetimeindex_integers_shift(self): - rng = date_range('1/1/2000', periods=20) - - result = rng + 5 - expected = rng.shift(5) - tm.assert_index_equal(result, expected) - - result = rng - 5 - expected = rng.shift(-5) - tm.assert_index_equal(result, expected) - - def test_astype_object(self): - # NumPy 1.6.1 weak ns support - rng = date_range('1/1/2000', periods=20) - - casted = rng.astype('O') - exp_values = list(rng) - - tm.assert_index_equal(casted, Index(exp_values, dtype=np.object_)) - self.assertEqual(casted.tolist(), exp_values) - - def test_catch_infinite_loop(self): - offset = offsets.DateOffset(minute=5) - # blow up, don't loop forever - self.assertRaises(Exception, date_range, 
datetime(2011, 11, 11), - datetime(2011, 11, 12), freq=offset) - - def test_append_concat(self): - rng = date_range('5/8/2012 1:45', periods=10, freq='5T') - ts = Series(np.random.randn(len(rng)), rng) - df = DataFrame(np.random.randn(len(rng), 4), index=rng) - - result = ts.append(ts) - result_df = df.append(df) - ex_index = DatetimeIndex(np.tile(rng.values, 2)) - tm.assert_index_equal(result.index, ex_index) - tm.assert_index_equal(result_df.index, ex_index) - - appended = rng.append(rng) - tm.assert_index_equal(appended, ex_index) - - appended = rng.append([rng, rng]) - ex_index = DatetimeIndex(np.tile(rng.values, 3)) - tm.assert_index_equal(appended, ex_index) - - # different index names - rng1 = rng.copy() - rng2 = rng.copy() - rng1.name = 'foo' - rng2.name = 'bar' - self.assertEqual(rng1.append(rng1).name, 'foo') - self.assertIsNone(rng1.append(rng2).name) - - def test_append_concat_tz(self): - # GH 2938 - tm._skip_if_no_pytz() - - rng = date_range('5/8/2012 1:45', periods=10, freq='5T', - tz='US/Eastern') - rng2 = date_range('5/8/2012 2:35', periods=10, freq='5T', - tz='US/Eastern') - rng3 = date_range('5/8/2012 1:45', periods=20, freq='5T', - tz='US/Eastern') - ts = Series(np.random.randn(len(rng)), rng) - df = DataFrame(np.random.randn(len(rng), 4), index=rng) - ts2 = Series(np.random.randn(len(rng2)), rng2) - df2 = DataFrame(np.random.randn(len(rng2), 4), index=rng2) - - result = ts.append(ts2) - result_df = df.append(df2) - tm.assert_index_equal(result.index, rng3) - tm.assert_index_equal(result_df.index, rng3) - - appended = rng.append(rng2) - tm.assert_index_equal(appended, rng3) - - def test_append_concat_tz_explicit_pytz(self): - # GH 2938 - tm._skip_if_no_pytz() - from pytz import timezone as timezone - - rng = date_range('5/8/2012 1:45', periods=10, freq='5T', - tz=timezone('US/Eastern')) - rng2 = date_range('5/8/2012 2:35', periods=10, freq='5T', - tz=timezone('US/Eastern')) - rng3 = date_range('5/8/2012 1:45', periods=20, freq='5T', - tz=timezone('US/Eastern')) - ts = Series(np.random.randn(len(rng)), rng) - df = DataFrame(np.random.randn(len(rng), 4), index=rng) - ts2 = Series(np.random.randn(len(rng2)), rng2) - df2 = DataFrame(np.random.randn(len(rng2), 4), index=rng2) - - result = ts.append(ts2) - result_df = df.append(df2) - tm.assert_index_equal(result.index, rng3) - tm.assert_index_equal(result_df.index, rng3) - - appended = rng.append(rng2) - tm.assert_index_equal(appended, rng3) - - def test_append_concat_tz_dateutil(self): - # GH 2938 - tm._skip_if_no_dateutil() - rng = date_range('5/8/2012 1:45', periods=10, freq='5T', - tz='dateutil/US/Eastern') - rng2 = date_range('5/8/2012 2:35', periods=10, freq='5T', - tz='dateutil/US/Eastern') - rng3 = date_range('5/8/2012 1:45', periods=20, freq='5T', - tz='dateutil/US/Eastern') - ts = Series(np.random.randn(len(rng)), rng) - df = DataFrame(np.random.randn(len(rng), 4), index=rng) - ts2 = Series(np.random.randn(len(rng2)), rng2) - df2 = DataFrame(np.random.randn(len(rng2), 4), index=rng2) - - result = ts.append(ts2) - result_df = df.append(df2) - tm.assert_index_equal(result.index, rng3) - tm.assert_index_equal(result_df.index, rng3) - - appended = rng.append(rng2) - tm.assert_index_equal(appended, rng3) - - def test_set_dataframe_column_ns_dtype(self): - x = DataFrame([datetime.now(), datetime.now()]) - self.assertEqual(x[0].dtype, np.dtype('M8[ns]')) - - def test_groupby_count_dateparseerror(self): - dr = date_range(start='1/1/2012', freq='5min', periods=10) - - # BAD Example, datetimes first - s = 
Series(np.arange(10), index=[dr, lrange(10)]) - grouped = s.groupby(lambda x: x[1] % 2 == 0) - result = grouped.count() - - s = Series(np.arange(10), index=[lrange(10), dr]) - grouped = s.groupby(lambda x: x[0] % 2 == 0) - expected = grouped.count() - - assert_series_equal(result, expected) - - def test_datetimeindex_repr_short(self): - dr = date_range(start='1/1/2012', periods=1) - repr(dr) - - dr = date_range(start='1/1/2012', periods=2) - repr(dr) - - dr = date_range(start='1/1/2012', periods=3) - repr(dr) - - def test_constructor_int64_nocopy(self): - # #1624 - arr = np.arange(1000, dtype=np.int64) - index = DatetimeIndex(arr) - - arr[50:100] = -1 - self.assertTrue((index.asi8[50:100] == -1).all()) - - arr = np.arange(1000, dtype=np.int64) - index = DatetimeIndex(arr, copy=True) - - arr[50:100] = -1 - self.assertTrue((index.asi8[50:100] != -1).all()) - - def test_series_interpolate_method_values(self): - # #1646 - ts = _simple_ts('1/1/2000', '1/20/2000') - ts[::2] = np.nan - - result = ts.interpolate(method='values') - exp = ts.interpolate() - assert_series_equal(result, exp) - - def test_frame_datetime64_handling_groupby(self): - # it works! - df = DataFrame([(3, np.datetime64('2012-07-03')), - (3, np.datetime64('2012-07-04'))], - columns=['a', 'date']) - result = df.groupby('a').first() - self.assertEqual(result['date'][3], Timestamp('2012-07-03')) - - def test_series_interpolate_intraday(self): - # #1698 - index = pd.date_range('1/1/2012', periods=4, freq='12D') - ts = pd.Series([0, 12, 24, 36], index) - new_index = index.append(index + pd.DateOffset(days=1)).sort_values() - - exp = ts.reindex(new_index).interpolate(method='time') - - index = pd.date_range('1/1/2012', periods=4, freq='12H') - ts = pd.Series([0, 12, 24, 36], index) - new_index = index.append(index + pd.DateOffset(hours=1)).sort_values() - result = ts.reindex(new_index).interpolate(method='time') - - self.assert_numpy_array_equal(result.values, exp.values) - - def test_frame_dict_constructor_datetime64_1680(self): - dr = date_range('1/1/2012', periods=10) - s = Series(dr, index=dr) - - # it works! - DataFrame({'a': 'foo', 'b': s}, index=dr) - DataFrame({'a': 'foo', 'b': s.values}, index=dr) - - def test_frame_datetime64_mixed_index_ctor_1681(self): - dr = date_range('2011/1/1', '2012/1/1', freq='W-FRI') - ts = Series(dr) - - # it works! - d = DataFrame({'A': 'foo', 'B': ts}, index=dr) - self.assertTrue(d['B'].isnull().all()) - - def test_frame_timeseries_to_records(self): - index = date_range('1/1/2000', periods=10) - df = DataFrame(np.random.randn(10, 3), index=index, - columns=['a', 'b', 'c']) - - result = df.to_records() - result['index'].dtype == 'M8[ns]' - - result = df.to_records(index=False) - - def test_frame_datetime64_duplicated(self): - dates = date_range('2010-07-01', end='2010-08-05') - - tst = DataFrame({'symbol': 'AAA', 'date': dates}) - result = tst.duplicated(['date', 'symbol']) - self.assertTrue((-result).all()) - - tst = DataFrame({'date': dates}) - result = tst.duplicated() - self.assertTrue((-result).all()) - - def test_timestamp_compare_with_early_datetime(self): - # e.g. 
datetime.min - stamp = Timestamp('2012-01-01') - - self.assertFalse(stamp == datetime.min) - self.assertFalse(stamp == datetime(1600, 1, 1)) - self.assertFalse(stamp == datetime(2700, 1, 1)) - self.assertNotEqual(stamp, datetime.min) - self.assertNotEqual(stamp, datetime(1600, 1, 1)) - self.assertNotEqual(stamp, datetime(2700, 1, 1)) - self.assertTrue(stamp > datetime(1600, 1, 1)) - self.assertTrue(stamp >= datetime(1600, 1, 1)) - self.assertTrue(stamp < datetime(2700, 1, 1)) - self.assertTrue(stamp <= datetime(2700, 1, 1)) - - def test_to_html_timestamp(self): - rng = date_range('2000-01-01', periods=10) - df = DataFrame(np.random.randn(10, 4), index=rng) - - result = df.to_html() - self.assertIn('2000-01-01', result) - - def test_to_csv_numpy_16_bug(self): - frame = DataFrame({'a': date_range('1/1/2000', periods=10)}) - - buf = StringIO() - frame.to_csv(buf) - - result = buf.getvalue() - self.assertIn('2000-01-01', result) - - def test_series_map_box_timestamps(self): - # #2689, #2627 - s = Series(date_range('1/1/2000', periods=10)) - - def f(x): - return (x.hour, x.day, x.month) - - # it works! - s.map(f) - s.apply(f) - DataFrame(s).applymap(f) - - def test_series_map_box_timedelta(self): - # GH 11349 - s = Series(timedelta_range('1 day 1 s', periods=5, freq='h')) - - def f(x): - return x.total_seconds() - - s.map(f) - s.apply(f) - DataFrame(s).applymap(f) - - def test_concat_datetime_datetime64_frame(self): - # #2624 - rows = [] - rows.append([datetime(2010, 1, 1), 1]) - rows.append([datetime(2010, 1, 2), 'hi']) - - df2_obj = DataFrame.from_records(rows, columns=['date', 'test']) - - ind = date_range(start="2000/1/1", freq="D", periods=10) - df1 = DataFrame({'date': ind, 'test': lrange(10)}) - - # it works! - pd.concat([df1, df2_obj]) - - def test_asfreq_resample_set_correct_freq(self): - # GH5613 - # we test if .asfreq() and .resample() set the correct value for .freq - df = pd.DataFrame({'date': ["2012-01-01", "2012-01-02", "2012-01-03"], - 'col': [1, 2, 3]}) - df = df.set_index(pd.to_datetime(df.date)) - - # testing the settings before calling .asfreq() and .resample() - self.assertEqual(df.index.freq, None) - self.assertEqual(df.index.inferred_freq, 'D') - - # does .asfreq() set .freq correctly? - self.assertEqual(df.asfreq('D').index.freq, 'D') - - # does .resample() set .freq correctly? 
- self.assertEqual(df.resample('D').asfreq().index.freq, 'D') - - def test_pickle(self): - - # GH4606 - p = self.round_trip_pickle(NaT) - self.assertTrue(p is NaT) - - idx = pd.to_datetime(['2013-01-01', NaT, '2014-01-06']) - idx_p = self.round_trip_pickle(idx) - self.assertTrue(idx_p[0] == idx[0]) - self.assertTrue(idx_p[1] is NaT) - self.assertTrue(idx_p[2] == idx[2]) - - # GH11002 - # don't infer freq - idx = date_range('1750-1-1', '2050-1-1', freq='7D') - idx_p = self.round_trip_pickle(idx) - tm.assert_index_equal(idx, idx_p) - - def test_timestamp_equality(self): - - # GH 11034 - s = Series([Timestamp('2000-01-29 01:59:00'), 'NaT']) - result = s != s - assert_series_equal(result, Series([False, True])) - result = s != s[0] - assert_series_equal(result, Series([False, True])) - result = s != s[1] - assert_series_equal(result, Series([True, True])) - - result = s == s - assert_series_equal(result, Series([True, False])) - result = s == s[0] - assert_series_equal(result, Series([True, False])) - result = s == s[1] - assert_series_equal(result, Series([False, False])) - - -def _simple_ts(start, end, freq='D'): - rng = date_range(start, end, freq=freq) - return Series(np.random.randn(len(rng)), index=rng) - - -class TestToDatetime(tm.TestCase): - _multiprocess_can_split_ = True - - def test_to_datetime_dt64s(self): - in_bound_dts = [ - np.datetime64('2000-01-01'), - np.datetime64('2000-01-02'), - ] - - for dt in in_bound_dts: - self.assertEqual(pd.to_datetime(dt), Timestamp(dt)) - - oob_dts = [np.datetime64('1000-01-01'), np.datetime64('5000-01-02'), ] - - for dt in oob_dts: - self.assertRaises(ValueError, pd.to_datetime, dt, errors='raise') - self.assertRaises(ValueError, tslib.Timestamp, dt) - self.assertIs(pd.to_datetime(dt, errors='coerce'), NaT) - - def test_to_datetime_array_of_dt64s(self): - dts = [np.datetime64('2000-01-01'), np.datetime64('2000-01-02'), ] - - # Assuming all datetimes are in bounds, to_datetime() returns - # an array that is equal to Timestamp() parsing - self.assert_numpy_array_equal( - pd.to_datetime(dts, box=False), - np.array([Timestamp(x).asm8 for x in dts]) - ) - - # A list of datetimes where the last one is out of bounds - dts_with_oob = dts + [np.datetime64('9999-01-01')] - - self.assertRaises(ValueError, pd.to_datetime, dts_with_oob, - errors='raise') - - self.assert_numpy_array_equal( - pd.to_datetime(dts_with_oob, box=False, errors='coerce'), - np.array( - [ - Timestamp(dts_with_oob[0]).asm8, - Timestamp(dts_with_oob[1]).asm8, - iNaT, - ], - dtype='M8' - ) - ) - - # With errors='ignore', out of bounds datetime64s - # are converted to their .item(), which depending on the version of - # numpy is either a python datetime.datetime or datetime.date - self.assert_numpy_array_equal( - pd.to_datetime(dts_with_oob, box=False, errors='ignore'), - np.array( - [dt.item() for dt in dts_with_oob], - dtype='O' - ) - ) - - def test_to_datetime_tz(self): - - # xref 8260 - # uniform returns a DatetimeIndex - arr = [pd.Timestamp('2013-01-01 13:00:00-0800', tz='US/Pacific'), - pd.Timestamp('2013-01-02 14:00:00-0800', tz='US/Pacific')] - result = pd.to_datetime(arr) - expected = DatetimeIndex( - ['2013-01-01 13:00:00', '2013-01-02 14:00:00'], tz='US/Pacific') - tm.assert_index_equal(result, expected) - - # mixed tzs will raise - arr = [pd.Timestamp('2013-01-01 13:00:00', tz='US/Pacific'), - pd.Timestamp('2013-01-02 14:00:00', tz='US/Eastern')] - self.assertRaises(ValueError, lambda: pd.to_datetime(arr)) - - def test_to_datetime_tz_pytz(self): - - # xref 8260 - 
tm._skip_if_no_pytz()
-        import pytz
-
-        us_eastern = pytz.timezone('US/Eastern')
-        arr = np.array([us_eastern.localize(datetime(year=2000, month=1, day=1,
-                                                     hour=3, minute=0)),
-                        us_eastern.localize(datetime(year=2000, month=6, day=1,
-                                                     hour=3, minute=0))],
-                       dtype=object)
-        result = pd.to_datetime(arr, utc=True)
-        expected = DatetimeIndex(['2000-01-01 08:00:00+00:00',
-                                  '2000-06-01 07:00:00+00:00'],
-                                 dtype='datetime64[ns, UTC]', freq=None)
-        tm.assert_index_equal(result, expected)
-
-    def test_to_datetime_utc_is_true(self):
-        # See gh-11934
-        start = pd.Timestamp('2014-01-01', tz='utc')
-        end = pd.Timestamp('2014-01-03', tz='utc')
-        date_range = pd.bdate_range(start, end)
-
-        result = pd.to_datetime(date_range, utc=True)
-        expected = pd.DatetimeIndex(data=date_range)
-        tm.assert_index_equal(result, expected)
-
-    def test_to_datetime_tz_psycopg2(self):
-
-        # xref 8260
-        try:
-            import psycopg2
-        except ImportError:
-            raise nose.SkipTest("no psycopg2 installed")
-
-        # misc cases
-        tz1 = psycopg2.tz.FixedOffsetTimezone(offset=-300, name=None)
-        tz2 = psycopg2.tz.FixedOffsetTimezone(offset=-240, name=None)
-        arr = np.array([datetime(2000, 1, 1, 3, 0, tzinfo=tz1),
-                        datetime(2000, 6, 1, 3, 0, tzinfo=tz2)],
-                       dtype=object)
-
-        result = pd.to_datetime(arr, errors='coerce', utc=True)
-        expected = DatetimeIndex(['2000-01-01 08:00:00+00:00',
-                                  '2000-06-01 07:00:00+00:00'],
-                                 dtype='datetime64[ns, UTC]', freq=None)
-        tm.assert_index_equal(result, expected)
-
-        # dtype coercion
-        i = pd.DatetimeIndex([
-            '2000-01-01 08:00:00+00:00'
-        ], tz=psycopg2.tz.FixedOffsetTimezone(offset=-300, name=None))
-        self.assertTrue(is_datetime64_ns_dtype(i))
-
-        # tz coercion
-        result = pd.to_datetime(i, errors='coerce')
-        tm.assert_index_equal(result, i)
-
-        result = pd.to_datetime(i, errors='coerce', utc=True)
-        expected = pd.DatetimeIndex(['2000-01-01 13:00:00'],
-                                    dtype='datetime64[ns, UTC]')
-        tm.assert_index_equal(result, expected)
-
-    def test_datetime_bool(self):
-        # GH13176
-        with self.assertRaises(TypeError):
-            to_datetime(False)
-        self.assertTrue(to_datetime(False, errors="coerce") is tslib.NaT)
-        self.assertEqual(to_datetime(False, errors="ignore"), False)
-        with self.assertRaises(TypeError):
-            to_datetime(True)
-        self.assertTrue(to_datetime(True, errors="coerce") is tslib.NaT)
-        self.assertEqual(to_datetime(True, errors="ignore"), True)
-        with self.assertRaises(TypeError):
-            to_datetime([False, datetime.today()])
-        with self.assertRaises(TypeError):
-            to_datetime(['20130101', True])
-        tm.assert_index_equal(to_datetime([0, False, tslib.NaT, 0.0],
-                                          errors="coerce"),
-                              DatetimeIndex([to_datetime(0), tslib.NaT,
-                                             tslib.NaT, to_datetime(0)]))
-
-    def test_datetime_invalid_datatype(self):
-        # GH13176
-
-        with self.assertRaises(TypeError):
-            pd.to_datetime(bool)
-        with self.assertRaises(TypeError):
-            pd.to_datetime(pd.to_datetime)
-
-    def test_unit(self):
-        # GH 11758
-        # test proper behavior with errors
-
-        with self.assertRaises(ValueError):
-            to_datetime([1], unit='D', format='%Y%m%d')
-
-        values = [11111111, 1, 1.0, tslib.iNaT, pd.NaT, np.nan,
-                  'NaT', '']
-        result = to_datetime(values, unit='D', errors='ignore')
-        expected = Index([11111111, Timestamp('1970-01-02'),
-                          Timestamp('1970-01-02'), pd.NaT,
-                          pd.NaT, pd.NaT, pd.NaT, pd.NaT],
-                         dtype=object)
-        tm.assert_index_equal(result, expected)
-
-        result = to_datetime(values, unit='D', errors='coerce')
-        expected = DatetimeIndex(['NaT', '1970-01-02', '1970-01-02',
-                                  'NaT', 'NaT', 'NaT', 'NaT', 'NaT'])
-        tm.assert_index_equal(result, expected)
-
-        with
self.assertRaises(tslib.OutOfBoundsDatetime): - to_datetime(values, unit='D', errors='raise') - - values = [1420043460000, tslib.iNaT, pd.NaT, np.nan, 'NaT'] - - result = to_datetime(values, errors='ignore', unit='s') - expected = Index([1420043460000, pd.NaT, pd.NaT, - pd.NaT, pd.NaT], dtype=object) - tm.assert_index_equal(result, expected) - - result = to_datetime(values, errors='coerce', unit='s') - expected = DatetimeIndex(['NaT', 'NaT', 'NaT', 'NaT', 'NaT']) - tm.assert_index_equal(result, expected) - - with self.assertRaises(tslib.OutOfBoundsDatetime): - to_datetime(values, errors='raise', unit='s') - - # if we have a string, then we raise a ValueError - # and NOT an OutOfBoundsDatetime - for val in ['foo', Timestamp('20130101')]: - try: - to_datetime(val, errors='raise', unit='s') - except tslib.OutOfBoundsDatetime: - raise AssertionError("incorrect exception raised") - except ValueError: - pass - - def test_unit_consistency(self): - - # consistency of conversions - expected = Timestamp('1970-05-09 14:25:11') - result = pd.to_datetime(11111111, unit='s', errors='raise') - self.assertEqual(result, expected) - self.assertIsInstance(result, Timestamp) - - result = pd.to_datetime(11111111, unit='s', errors='coerce') - self.assertEqual(result, expected) - self.assertIsInstance(result, Timestamp) - - result = pd.to_datetime(11111111, unit='s', errors='ignore') - self.assertEqual(result, expected) - self.assertIsInstance(result, Timestamp) - - def test_unit_with_numeric(self): - - # GH 13180 - # coercions from floats/ints are ok - expected = DatetimeIndex(['2015-06-19 05:33:20', - '2015-05-27 22:33:20']) - arr1 = [1.434692e+18, 1.432766e+18] - arr2 = np.array(arr1).astype('int64') - for errors in ['ignore', 'raise', 'coerce']: - result = pd.to_datetime(arr1, errors=errors) - tm.assert_index_equal(result, expected) - - result = pd.to_datetime(arr2, errors=errors) - tm.assert_index_equal(result, expected) - - # but we want to make sure that we are coercing - # if we have ints/strings - expected = DatetimeIndex(['NaT', - '2015-06-19 05:33:20', - '2015-05-27 22:33:20']) - arr = ['foo', 1.434692e+18, 1.432766e+18] - result = pd.to_datetime(arr, errors='coerce') - tm.assert_index_equal(result, expected) - - expected = DatetimeIndex(['2015-06-19 05:33:20', - '2015-05-27 22:33:20', - 'NaT', - 'NaT']) - arr = [1.434692e+18, 1.432766e+18, 'foo', 'NaT'] - result = pd.to_datetime(arr, errors='coerce') - tm.assert_index_equal(result, expected) - - def test_unit_mixed(self): - - # mixed integers/datetimes - expected = DatetimeIndex(['2013-01-01', 'NaT', 'NaT']) - arr = [pd.Timestamp('20130101'), 1.434692e+18, 1.432766e+18] - result = pd.to_datetime(arr, errors='coerce') - tm.assert_index_equal(result, expected) - - with self.assertRaises(ValueError): - pd.to_datetime(arr, errors='raise') - - expected = DatetimeIndex(['NaT', - 'NaT', - '2013-01-01']) - arr = [1.434692e+18, 1.432766e+18, pd.Timestamp('20130101')] - result = pd.to_datetime(arr, errors='coerce') - tm.assert_index_equal(result, expected) - - with self.assertRaises(ValueError): - pd.to_datetime(arr, errors='raise') - - def test_index_to_datetime(self): - idx = Index(['1/1/2000', '1/2/2000', '1/3/2000']) - - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - result = idx.to_datetime() - expected = DatetimeIndex(pd.to_datetime(idx.values)) - tm.assert_index_equal(result, expected) - - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - today = datetime.today() - idx = Index([today], dtype=object) 
- result = idx.to_datetime() - expected = DatetimeIndex([today]) - tm.assert_index_equal(result, expected) - - def test_dataframe(self): - - df = DataFrame({'year': [2015, 2016], - 'month': [2, 3], - 'day': [4, 5], - 'hour': [6, 7], - 'minute': [58, 59], - 'second': [10, 11], - 'ms': [1, 1], - 'us': [2, 2], - 'ns': [3, 3]}) - - result = to_datetime({'year': df['year'], - 'month': df['month'], - 'day': df['day']}) - expected = Series([Timestamp('20150204 00:00:00'), - Timestamp('20160305 00:0:00')]) - assert_series_equal(result, expected) - - # dict-like - result = to_datetime(df[['year', 'month', 'day']].to_dict()) - assert_series_equal(result, expected) - - # dict but with constructable - df2 = df[['year', 'month', 'day']].to_dict() - df2['month'] = 2 - result = to_datetime(df2) - expected2 = Series([Timestamp('20150204 00:00:00'), - Timestamp('20160205 00:0:00')]) - assert_series_equal(result, expected2) - - # unit mappings - units = [{'year': 'years', - 'month': 'months', - 'day': 'days', - 'hour': 'hours', - 'minute': 'minutes', - 'second': 'seconds'}, - {'year': 'year', - 'month': 'month', - 'day': 'day', - 'hour': 'hour', - 'minute': 'minute', - 'second': 'second'}, - ] - - for d in units: - result = to_datetime(df[list(d.keys())].rename(columns=d)) - expected = Series([Timestamp('20150204 06:58:10'), - Timestamp('20160305 07:59:11')]) - assert_series_equal(result, expected) - - d = {'year': 'year', - 'month': 'month', - 'day': 'day', - 'hour': 'hour', - 'minute': 'minute', - 'second': 'second', - 'ms': 'ms', - 'us': 'us', - 'ns': 'ns'} - - result = to_datetime(df.rename(columns=d)) - expected = Series([Timestamp('20150204 06:58:10.001002003'), - Timestamp('20160305 07:59:11.001002003')]) - assert_series_equal(result, expected) - - # coerce back to int - result = to_datetime(df.astype(str)) - assert_series_equal(result, expected) - - # passing coerce - df2 = DataFrame({'year': [2015, 2016], - 'month': [2, 20], - 'day': [4, 5]}) - with self.assertRaises(ValueError): - to_datetime(df2) - result = to_datetime(df2, errors='coerce') - expected = Series([Timestamp('20150204 00:00:00'), - pd.NaT]) - assert_series_equal(result, expected) - - # extra columns - with self.assertRaises(ValueError): - df2 = df.copy() - df2['foo'] = 1 - to_datetime(df2) - - # not enough - for c in [['year'], - ['year', 'month'], - ['year', 'month', 'second'], - ['month', 'day'], - ['year', 'day', 'second']]: - with self.assertRaises(ValueError): - to_datetime(df[c]) - - # duplicates - df2 = DataFrame({'year': [2015, 2016], - 'month': [2, 20], - 'day': [4, 5]}) - df2.columns = ['year', 'year', 'day'] - with self.assertRaises(ValueError): - to_datetime(df2) - - df2 = DataFrame({'year': [2015, 2016], - 'month': [2, 20], - 'day': [4, 5], - 'hour': [4, 5]}) - df2.columns = ['year', 'month', 'day', 'day'] - with self.assertRaises(ValueError): - to_datetime(df2) - - def test_dataframe_dtypes(self): - # #13451 - df = DataFrame({'year': [2015, 2016], - 'month': [2, 3], - 'day': [4, 5]}) - - # int16 - result = to_datetime(df.astype('int16')) - expected = Series([Timestamp('20150204 00:00:00'), - Timestamp('20160305 00:00:00')]) - assert_series_equal(result, expected) - - # mixed dtypes - df['month'] = df['month'].astype('int8') - df['day'] = df['day'].astype('int8') - result = to_datetime(df) - expected = Series([Timestamp('20150204 00:00:00'), - Timestamp('20160305 00:00:00')]) - assert_series_equal(result, expected) - - # float - df = DataFrame({'year': [2000, 2001], - 'month': [1.5, 1], - 'day': [1, 1]}) - with 
self.assertRaises(ValueError): - to_datetime(df) - - -class TestDatetimeIndex(tm.TestCase): - _multiprocess_can_split_ = True - - def test_hash_error(self): - index = date_range('20010101', periods=10) - with tm.assertRaisesRegexp(TypeError, "unhashable type: %r" % - type(index).__name__): - hash(index) - - def test_stringified_slice_with_tz(self): - # GH2658 - import datetime - start = datetime.datetime.now() - idx = DatetimeIndex(start=start, freq="1d", periods=10) - df = DataFrame(lrange(10), index=idx) - df["2013-01-14 23:44:34.437768-05:00":] # no exception here - - def test_append_join_nondatetimeindex(self): - rng = date_range('1/1/2000', periods=10) - idx = Index(['a', 'b', 'c', 'd']) - - result = rng.append(idx) - tm.assertIsInstance(result[0], Timestamp) - - # it works - rng.join(idx, how='outer') - - def test_to_period_nofreq(self): - idx = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-04']) - self.assertRaises(ValueError, idx.to_period) - - idx = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03'], - freq='infer') - self.assertEqual(idx.freqstr, 'D') - expected = pd.PeriodIndex(['2000-01-01', '2000-01-02', - '2000-01-03'], freq='D') - tm.assert_index_equal(idx.to_period(), expected) - - # GH 7606 - idx = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03']) - self.assertEqual(idx.freqstr, None) - tm.assert_index_equal(idx.to_period(), expected) - - def test_000constructor_resolution(self): - # 2252 - t1 = Timestamp((1352934390 * 1000000000) + 1000000 + 1000 + 1) - idx = DatetimeIndex([t1]) - - self.assertEqual(idx.nanosecond[0], t1.nanosecond) - - def test_constructor_coverage(self): - rng = date_range('1/1/2000', periods=10.5) - exp = date_range('1/1/2000', periods=10) - tm.assert_index_equal(rng, exp) - - self.assertRaises(ValueError, DatetimeIndex, start='1/1/2000', - periods='foo', freq='D') - - self.assertRaises(ValueError, DatetimeIndex, start='1/1/2000', - end='1/10/2000') - - self.assertRaises(ValueError, DatetimeIndex, '1/1/2000') - - # generator expression - gen = (datetime(2000, 1, 1) + timedelta(i) for i in range(10)) - result = DatetimeIndex(gen) - expected = DatetimeIndex([datetime(2000, 1, 1) + timedelta(i) - for i in range(10)]) - tm.assert_index_equal(result, expected) - - # NumPy string array - strings = np.array(['2000-01-01', '2000-01-02', '2000-01-03']) - result = DatetimeIndex(strings) - expected = DatetimeIndex(strings.astype('O')) - tm.assert_index_equal(result, expected) - - from_ints = DatetimeIndex(expected.asi8) - tm.assert_index_equal(from_ints, expected) - - # string with NaT - strings = np.array(['2000-01-01', '2000-01-02', 'NaT']) - result = DatetimeIndex(strings) - expected = DatetimeIndex(strings.astype('O')) - tm.assert_index_equal(result, expected) - - from_ints = DatetimeIndex(expected.asi8) - tm.assert_index_equal(from_ints, expected) - - # non-conforming - self.assertRaises(ValueError, DatetimeIndex, - ['2000-01-01', '2000-01-02', '2000-01-04'], freq='D') - - self.assertRaises(ValueError, DatetimeIndex, start='2011-01-01', - freq='b') - self.assertRaises(ValueError, DatetimeIndex, end='2011-01-01', - freq='B') - self.assertRaises(ValueError, DatetimeIndex, periods=10, freq='D') - - def test_constructor_datetime64_tzformat(self): - # GH 6572 - tm._skip_if_no_pytz() - import pytz - # ISO 8601 format results in pytz.FixedOffset - for freq in ['AS', 'W-SUN']: - idx = date_range('2013-01-01T00:00:00-05:00', - '2016-01-01T23:59:59-05:00', freq=freq) - expected = date_range('2013-01-01T00:00:00', '2016-01-01T23:59:59', - freq=freq, 
tz=pytz.FixedOffset(-300)) - tm.assert_index_equal(idx, expected) - # Unable to use `US/Eastern` because of DST - expected_i8 = date_range('2013-01-01T00:00:00', - '2016-01-01T23:59:59', freq=freq, - tz='America/Lima') - self.assert_numpy_array_equal(idx.asi8, expected_i8.asi8) - - idx = date_range('2013-01-01T00:00:00+09:00', - '2016-01-01T23:59:59+09:00', freq=freq) - expected = date_range('2013-01-01T00:00:00', '2016-01-01T23:59:59', - freq=freq, tz=pytz.FixedOffset(540)) - tm.assert_index_equal(idx, expected) - expected_i8 = date_range('2013-01-01T00:00:00', - '2016-01-01T23:59:59', freq=freq, - tz='Asia/Tokyo') - self.assert_numpy_array_equal(idx.asi8, expected_i8.asi8) - - tm._skip_if_no_dateutil() - - # Non ISO 8601 format results in dateutil.tz.tzoffset - for freq in ['AS', 'W-SUN']: - idx = date_range('2013/1/1 0:00:00-5:00', '2016/1/1 23:59:59-5:00', - freq=freq) - expected = date_range('2013-01-01T00:00:00', '2016-01-01T23:59:59', - freq=freq, tz=pytz.FixedOffset(-300)) - tm.assert_index_equal(idx, expected) - # Unable to use `US/Eastern` because of DST - expected_i8 = date_range('2013-01-01T00:00:00', - '2016-01-01T23:59:59', freq=freq, - tz='America/Lima') - self.assert_numpy_array_equal(idx.asi8, expected_i8.asi8) - - idx = date_range('2013/1/1 0:00:00+9:00', - '2016/1/1 23:59:59+09:00', freq=freq) - expected = date_range('2013-01-01T00:00:00', '2016-01-01T23:59:59', - freq=freq, tz=pytz.FixedOffset(540)) - tm.assert_index_equal(idx, expected) - expected_i8 = date_range('2013-01-01T00:00:00', - '2016-01-01T23:59:59', freq=freq, - tz='Asia/Tokyo') - self.assert_numpy_array_equal(idx.asi8, expected_i8.asi8) - - def test_constructor_dtype(self): - - # passing a dtype with a tz should localize - idx = DatetimeIndex(['2013-01-01', '2013-01-02'], - dtype='datetime64[ns, US/Eastern]') - expected = DatetimeIndex(['2013-01-01', '2013-01-02'] - ).tz_localize('US/Eastern') - tm.assert_index_equal(idx, expected) - - idx = DatetimeIndex(['2013-01-01', '2013-01-02'], - tz='US/Eastern') - tm.assert_index_equal(idx, expected) - - # if we already have a tz and its not the same, then raise - idx = DatetimeIndex(['2013-01-01', '2013-01-02'], - dtype='datetime64[ns, US/Eastern]') - - self.assertRaises(ValueError, - lambda: DatetimeIndex(idx, - dtype='datetime64[ns]')) - - # this is effectively trying to convert tz's - self.assertRaises(TypeError, - lambda: DatetimeIndex(idx, - dtype='datetime64[ns, CET]')) - self.assertRaises(ValueError, - lambda: DatetimeIndex( - idx, tz='CET', - dtype='datetime64[ns, US/Eastern]')) - result = DatetimeIndex(idx, dtype='datetime64[ns, US/Eastern]') - tm.assert_index_equal(idx, result) - - def test_constructor_name(self): - idx = DatetimeIndex(start='2000-01-01', periods=1, freq='A', - name='TEST') - self.assertEqual(idx.name, 'TEST') - - def test_comparisons_coverage(self): - rng = date_range('1/1/2000', periods=10) - - # raise TypeError for now - self.assertRaises(TypeError, rng.__lt__, rng[3].value) - - result = rng == list(rng) - exp = rng == rng - self.assert_numpy_array_equal(result, exp) - - def test_comparisons_nat(self): - - fidx1 = pd.Index([1.0, np.nan, 3.0, np.nan, 5.0, 7.0]) - fidx2 = pd.Index([2.0, 3.0, np.nan, np.nan, 6.0, 7.0]) - - didx1 = pd.DatetimeIndex(['2014-01-01', pd.NaT, '2014-03-01', pd.NaT, - '2014-05-01', '2014-07-01']) - didx2 = pd.DatetimeIndex(['2014-02-01', '2014-03-01', pd.NaT, pd.NaT, - '2014-06-01', '2014-07-01']) - darr = np.array([np_datetime64_compat('2014-02-01 00:00Z'), - np_datetime64_compat('2014-03-01 00:00Z'), - 
np_datetime64_compat('nat'), np.datetime64('nat'),
-                         np_datetime64_compat('2014-06-01 00:00Z'),
-                         np_datetime64_compat('2014-07-01 00:00Z')])
-
-        if _np_version_under1p8:
-            # cannot test array because np.datetime64('nat') returns
-            # today's date
-            cases = [(fidx1, fidx2), (didx1, didx2)]
-        else:
-            cases = [(fidx1, fidx2), (didx1, didx2), (didx1, darr)]
-
-        # Check pd.NaT is handled the same as np.nan
-        with tm.assert_produces_warning(None):
-            for idx1, idx2 in cases:
-
-                result = idx1 < idx2
-                expected = np.array([True, False, False, False, True, False])
-                self.assert_numpy_array_equal(result, expected)
-
-                result = idx2 > idx1
-                expected = np.array([True, False, False, False, True, False])
-                self.assert_numpy_array_equal(result, expected)
-
-                result = idx1 <= idx2
-                expected = np.array([True, False, False, False, True, True])
-                self.assert_numpy_array_equal(result, expected)
-
-                result = idx2 >= idx1
-                expected = np.array([True, False, False, False, True, True])
-                self.assert_numpy_array_equal(result, expected)
-
-                result = idx1 == idx2
-                expected = np.array([False, False, False, False, False, True])
-                self.assert_numpy_array_equal(result, expected)
-
-                result = idx1 != idx2
-                expected = np.array([True, True, True, True, True, False])
-                self.assert_numpy_array_equal(result, expected)
-
-        with tm.assert_produces_warning(None):
-            for idx1, val in [(fidx1, np.nan), (didx1, pd.NaT)]:
-                result = idx1 < val
-                expected = np.array([False, False, False, False, False, False])
-                self.assert_numpy_array_equal(result, expected)
-                result = idx1 > val
-                self.assert_numpy_array_equal(result, expected)
-
-                result = idx1 <= val
-                self.assert_numpy_array_equal(result, expected)
-                result = idx1 >= val
-                self.assert_numpy_array_equal(result, expected)
-
-                result = idx1 == val
-                self.assert_numpy_array_equal(result, expected)
-
-                result = idx1 != val
-                expected = np.array([True, True, True, True, True, True])
-                self.assert_numpy_array_equal(result, expected)
-
-        # Check pd.NaT is handled the same as np.nan
-        with tm.assert_produces_warning(None):
-            for idx1, val in [(fidx1, 3), (didx1, datetime(2014, 3, 1))]:
-                result = idx1 < val
-                expected = np.array([True, False, False, False, False, False])
-                self.assert_numpy_array_equal(result, expected)
-                result = idx1 > val
-                expected = np.array([False, False, False, False, True, True])
-                self.assert_numpy_array_equal(result, expected)
-
-                result = idx1 <= val
-                expected = np.array([True, False, True, False, False, False])
-                self.assert_numpy_array_equal(result, expected)
-                result = idx1 >= val
-                expected = np.array([False, False, True, False, True, True])
-                self.assert_numpy_array_equal(result, expected)
-
-                result = idx1 == val
-                expected = np.array([False, False, True, False, False, False])
-                self.assert_numpy_array_equal(result, expected)
-
-                result = idx1 != val
-                expected = np.array([True, True, False, True, True, True])
-                self.assert_numpy_array_equal(result, expected)
-
-    def test_map(self):
-        rng = date_range('1/1/2000', periods=10)
-
-        f = lambda x: x.strftime('%Y%m%d')
-        result = rng.map(f)
-        exp = Index([f(x) for x in rng], dtype='<U8')
-        tm.assert_index_equal(result, exp)
-
-    def test_take_fill_value(self):
-        idx = pd.DatetimeIndex(['2011-01-01', '2011-02-01', '2011-03-01'],
-                               name='xxx')
-        result = idx.take(np.array([1, 0, -1]))
-        expected = pd.DatetimeIndex(['2011-02-01', '2011-01-01', '2011-03-01'],
-                                    name='xxx')
-        tm.assert_index_equal(result, expected)
-
-        # fill_value
-        result = idx.take(np.array([1, 0, -1]), fill_value=True)
-        expected = pd.DatetimeIndex(['2011-02-01', '2011-01-01', 'NaT'],
-                                    name='xxx')
-        tm.assert_index_equal(result, expected)
-
-        # allow_fill=False
-        result = idx.take(np.array([1, 0, -1]), allow_fill=False,
-                          fill_value=True)
-        expected = pd.DatetimeIndex(['2011-02-01', '2011-01-01', '2011-03-01'],
-                                    name='xxx')
-        tm.assert_index_equal(result, expected)
-
-        msg = ('When allow_fill=True and fill_value is not None, '
-               'all indices must be >= -1')
-        with tm.assertRaisesRegexp(ValueError, msg):
-            idx.take(np.array([1, 0, -2]), fill_value=True)
-        with tm.assertRaisesRegexp(ValueError, msg):
-            idx.take(np.array([1, 0, -5]), fill_value=True)
-
-        with tm.assertRaises(IndexError):
-            idx.take(np.array([1, -5]))
-
-    def test_take_fill_value_with_timezone(self):
-        idx = pd.DatetimeIndex(['2011-01-01', '2011-02-01', '2011-03-01'],
-                               name='xxx', tz='US/Eastern')
-        result =
idx.take(np.array([1, 0, -1])) - expected = pd.DatetimeIndex(['2011-02-01', '2011-01-01', '2011-03-01'], - name='xxx', tz='US/Eastern') - tm.assert_index_equal(result, expected) - - # fill_value - result = idx.take(np.array([1, 0, -1]), fill_value=True) - expected = pd.DatetimeIndex(['2011-02-01', '2011-01-01', 'NaT'], - name='xxx', tz='US/Eastern') - tm.assert_index_equal(result, expected) - - # allow_fill=False - result = idx.take(np.array([1, 0, -1]), allow_fill=False, - fill_value=True) - expected = pd.DatetimeIndex(['2011-02-01', '2011-01-01', '2011-03-01'], - name='xxx', tz='US/Eastern') - tm.assert_index_equal(result, expected) - - msg = ('When allow_fill=True and fill_value is not None, ' - 'all indices must be >= -1') - with tm.assertRaisesRegexp(ValueError, msg): - idx.take(np.array([1, 0, -2]), fill_value=True) - with tm.assertRaisesRegexp(ValueError, msg): - idx.take(np.array([1, 0, -5]), fill_value=True) - - with tm.assertRaises(IndexError): - idx.take(np.array([1, -5])) - - def test_map_bug_1677(self): - index = DatetimeIndex(['2012-04-25 09:30:00.393000']) - f = index.asof - - result = index.map(f) - expected = Index([f(index[0])]) - tm.assert_index_equal(result, expected) - - def test_groupby_function_tuple_1677(self): - df = DataFrame(np.random.rand(100), - index=date_range("1/1/2000", periods=100)) - monthly_group = df.groupby(lambda x: (x.year, x.month)) - - result = monthly_group.mean() - tm.assertIsInstance(result.index[0], tuple) - - def test_append_numpy_bug_1681(self): - # another datetime64 bug - dr = date_range('2011/1/1', '2012/1/1', freq='W-FRI') - a = DataFrame() - c = DataFrame({'A': 'foo', 'B': dr}, index=dr) - - result = a.append(c) - self.assertTrue((result['B'] == dr).all()) - - def test_isin(self): - index = tm.makeDateIndex(4) - result = index.isin(index) - self.assertTrue(result.all()) - - result = index.isin(list(index)) - self.assertTrue(result.all()) - - assert_almost_equal(index.isin([index[2], 5]), - np.array([False, False, True, False])) - - def test_union(self): - i1 = Int64Index(np.arange(0, 20, 2)) - i2 = Int64Index(np.arange(10, 30, 2)) - result = i1.union(i2) - expected = Int64Index(np.arange(0, 30, 2)) - tm.assert_index_equal(result, expected) - - def test_union_with_DatetimeIndex(self): - i1 = Int64Index(np.arange(0, 20, 2)) - i2 = DatetimeIndex(start='2012-01-03 00:00:00', periods=10, freq='D') - i1.union(i2) # Works - i2.union(i1) # Fails with "AttributeError: can't set attribute" - - def test_time(self): - rng = pd.date_range('1/1/2000', freq='12min', periods=10) - result = pd.Index(rng).time - expected = [t.time() for t in rng] - self.assertTrue((result == expected).all()) - - def test_date(self): - rng = pd.date_range('1/1/2000', freq='12H', periods=10) - result = pd.Index(rng).date - expected = [t.date() for t in rng] - self.assertTrue((result == expected).all()) - - def test_does_not_convert_mixed_integer(self): - df = tm.makeCustomDataframe(10, 10, - data_gen_f=lambda *args, **kwargs: randn(), - r_idx_type='i', c_idx_type='dt') - cols = df.columns.join(df.index, how='outer') - joined = cols.join(df.columns) - self.assertEqual(cols.dtype, np.dtype('O')) - self.assertEqual(cols.dtype, joined.dtype) - tm.assert_numpy_array_equal(cols.values, joined.values) - - def test_slice_keeps_name(self): - # GH4226 - st = pd.Timestamp('2013-07-01 00:00:00', tz='America/Los_Angeles') - et = pd.Timestamp('2013-07-02 00:00:00', tz='America/Los_Angeles') - dr = pd.date_range(st, et, freq='H', name='timebucket') - self.assertEqual(dr[1:].name, dr.name) 
- - def test_join_self(self): - index = date_range('1/1/2000', periods=10) - kinds = 'outer', 'inner', 'left', 'right' - for kind in kinds: - joined = index.join(index, how=kind) - self.assertIs(index, joined) - - def assert_index_parameters(self, index): - assert index.freq == '40960N' - assert index.inferred_freq == '40960N' - - def test_ns_index(self): - nsamples = 400 - ns = int(1e9 / 24414) - dtstart = np.datetime64('2012-09-20T00:00:00') - - dt = dtstart + np.arange(nsamples) * np.timedelta64(ns, 'ns') - freq = ns * offsets.Nano() - index = pd.DatetimeIndex(dt, freq=freq, name='time') - self.assert_index_parameters(index) - - new_index = pd.DatetimeIndex(start=index[0], end=index[-1], - freq=index.freq) - self.assert_index_parameters(new_index) - - def test_join_with_period_index(self): - df = tm.makeCustomDataframe( - 10, 10, data_gen_f=lambda *args: np.random.randint(2), - c_idx_type='p', r_idx_type='dt') - s = df.iloc[:5, 0] - joins = 'left', 'right', 'inner', 'outer' - - for join in joins: - with tm.assertRaisesRegexp(ValueError, 'can only call with other ' - 'PeriodIndex-ed objects'): - df.columns.join(s.index, how=join) - - def test_factorize(self): - idx1 = DatetimeIndex(['2014-01', '2014-01', '2014-02', '2014-02', - '2014-03', '2014-03']) - - exp_arr = np.array([0, 0, 1, 1, 2, 2], dtype=np.intp) - exp_idx = DatetimeIndex(['2014-01', '2014-02', '2014-03']) - - arr, idx = idx1.factorize() - self.assert_numpy_array_equal(arr, exp_arr) - tm.assert_index_equal(idx, exp_idx) - - arr, idx = idx1.factorize(sort=True) - self.assert_numpy_array_equal(arr, exp_arr) - tm.assert_index_equal(idx, exp_idx) - - # tz must be preserved - idx1 = idx1.tz_localize('Asia/Tokyo') - exp_idx = exp_idx.tz_localize('Asia/Tokyo') - - arr, idx = idx1.factorize() - self.assert_numpy_array_equal(arr, exp_arr) - tm.assert_index_equal(idx, exp_idx) - - idx2 = pd.DatetimeIndex(['2014-03', '2014-03', '2014-02', '2014-01', - '2014-03', '2014-01']) - - exp_arr = np.array([2, 2, 1, 0, 2, 0], dtype=np.intp) - exp_idx = DatetimeIndex(['2014-01', '2014-02', '2014-03']) - arr, idx = idx2.factorize(sort=True) - self.assert_numpy_array_equal(arr, exp_arr) - tm.assert_index_equal(idx, exp_idx) - - exp_arr = np.array([0, 0, 1, 2, 0, 2], dtype=np.intp) - exp_idx = DatetimeIndex(['2014-03', '2014-02', '2014-01']) - arr, idx = idx2.factorize() - self.assert_numpy_array_equal(arr, exp_arr) - tm.assert_index_equal(idx, exp_idx) - - # freq must be preserved - idx3 = date_range('2000-01', periods=4, freq='M', tz='Asia/Tokyo') - exp_arr = np.array([0, 1, 2, 3], dtype=np.intp) - arr, idx = idx3.factorize() - self.assert_numpy_array_equal(arr, exp_arr) - tm.assert_index_equal(idx, idx3) - - def test_factorize_tz(self): - # GH 13750 - for tz in [None, 'UTC', 'US/Eastern', 'Asia/Tokyo']: - base = pd.date_range('2016-11-05', freq='H', periods=100, tz=tz) - idx = base.repeat(5) - - exp_arr = np.arange(100, dtype=np.intp).repeat(5) - - for obj in [idx, pd.Series(idx)]: - arr, res = obj.factorize() - self.assert_numpy_array_equal(arr, exp_arr) - tm.assert_index_equal(res, base) - - def test_factorize_dst(self): - # GH 13750 - idx = pd.date_range('2016-11-06', freq='H', periods=12, - tz='US/Eastern') - - for obj in [idx, pd.Series(idx)]: - arr, res = obj.factorize() - self.assert_numpy_array_equal(arr, np.arange(12, dtype=np.intp)) - tm.assert_index_equal(res, idx) - - idx = pd.date_range('2016-06-13', freq='H', periods=12, - tz='US/Eastern') - - for obj in [idx, pd.Series(idx)]: - arr, res = obj.factorize() - 
self.assert_numpy_array_equal(arr, np.arange(12, dtype=np.intp)) - tm.assert_index_equal(res, idx) - - def test_slice_with_negative_step(self): - ts = Series(np.arange(20), - date_range('2014-01-01', periods=20, freq='MS')) - SLC = pd.IndexSlice - - def assert_slices_equivalent(l_slc, i_slc): - assert_series_equal(ts[l_slc], ts.iloc[i_slc]) - assert_series_equal(ts.loc[l_slc], ts.iloc[i_slc]) - assert_series_equal(ts.loc[l_slc], ts.iloc[i_slc]) - - assert_slices_equivalent(SLC[Timestamp('2014-10-01')::-1], SLC[9::-1]) - assert_slices_equivalent(SLC['2014-10-01'::-1], SLC[9::-1]) - - assert_slices_equivalent(SLC[:Timestamp('2014-10-01'):-1], SLC[:8:-1]) - assert_slices_equivalent(SLC[:'2014-10-01':-1], SLC[:8:-1]) - - assert_slices_equivalent(SLC['2015-02-01':'2014-10-01':-1], - SLC[13:8:-1]) - assert_slices_equivalent(SLC[Timestamp('2015-02-01'):Timestamp( - '2014-10-01'):-1], SLC[13:8:-1]) - assert_slices_equivalent(SLC['2015-02-01':Timestamp('2014-10-01'):-1], - SLC[13:8:-1]) - assert_slices_equivalent(SLC[Timestamp('2015-02-01'):'2014-10-01':-1], - SLC[13:8:-1]) - - assert_slices_equivalent(SLC['2014-10-01':'2015-02-01':-1], SLC[:0]) - - def test_slice_with_zero_step_raises(self): - ts = Series(np.arange(20), - date_range('2014-01-01', periods=20, freq='MS')) - self.assertRaisesRegexp(ValueError, 'slice step cannot be zero', - lambda: ts[::0]) - self.assertRaisesRegexp(ValueError, 'slice step cannot be zero', - lambda: ts.loc[::0]) - self.assertRaisesRegexp(ValueError, 'slice step cannot be zero', - lambda: ts.loc[::0]) - - def test_slice_bounds_empty(self): - # GH 14354 - empty_idx = DatetimeIndex(freq='1H', periods=0, end='2015') - - right = empty_idx._maybe_cast_slice_bound('2015-01-02', 'right', 'loc') - exp = Timestamp('2015-01-02 23:59:59.999999999') - self.assertEqual(right, exp) - - left = empty_idx._maybe_cast_slice_bound('2015-01-02', 'left', 'loc') - exp = Timestamp('2015-01-02 00:00:00') - self.assertEqual(left, exp) - - -class TestDatetime64(tm.TestCase): - """ - Also test support for datetime64[ns] in Series / DataFrame - """ - - def setUp(self): - dti = DatetimeIndex(start=datetime(2005, 1, 1), - end=datetime(2005, 1, 10), freq='Min') - self.series = Series(rand(len(dti)), dti) - - def test_datetimeindex_accessors(self): - dti = DatetimeIndex(freq='D', start=datetime(1998, 1, 1), periods=365) - - self.assertEqual(dti.year[0], 1998) - self.assertEqual(dti.month[0], 1) - self.assertEqual(dti.day[0], 1) - self.assertEqual(dti.hour[0], 0) - self.assertEqual(dti.minute[0], 0) - self.assertEqual(dti.second[0], 0) - self.assertEqual(dti.microsecond[0], 0) - self.assertEqual(dti.dayofweek[0], 3) - - self.assertEqual(dti.dayofyear[0], 1) - self.assertEqual(dti.dayofyear[120], 121) - - self.assertEqual(dti.weekofyear[0], 1) - self.assertEqual(dti.weekofyear[120], 18) - - self.assertEqual(dti.quarter[0], 1) - self.assertEqual(dti.quarter[120], 2) - - self.assertEqual(dti.days_in_month[0], 31) - self.assertEqual(dti.days_in_month[90], 30) - - self.assertEqual(dti.is_month_start[0], True) - self.assertEqual(dti.is_month_start[1], False) - self.assertEqual(dti.is_month_start[31], True) - self.assertEqual(dti.is_quarter_start[0], True) - self.assertEqual(dti.is_quarter_start[90], True) - self.assertEqual(dti.is_year_start[0], True) - self.assertEqual(dti.is_year_start[364], False) - self.assertEqual(dti.is_month_end[0], False) - self.assertEqual(dti.is_month_end[30], True) - self.assertEqual(dti.is_month_end[31], False) - self.assertEqual(dti.is_month_end[364], True) - 
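The accessor assertions continuing below are all instances of a single rule: every datetime field accessor on a `DatetimeIndex` is vectorized, returning one value per element. A short sketch using the same 1998 daily range as this test (1998-01-01 was a Thursday, hence `dayofweek == 3`):

```python
import pandas as pd

dti = pd.date_range('1998-01-01', periods=365, freq='D')

# One value per element, positionally indexable like the index itself.
assert len(dti.dayofweek) == 365
assert dti.dayofweek[0] == 3
assert bool(dti.is_month_start[0]) and not bool(dti.is_month_start[1])
```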
self.assertEqual(dti.is_quarter_end[0], False) - self.assertEqual(dti.is_quarter_end[30], False) - self.assertEqual(dti.is_quarter_end[89], True) - self.assertEqual(dti.is_quarter_end[364], True) - self.assertEqual(dti.is_year_end[0], False) - self.assertEqual(dti.is_year_end[364], True) - - # GH 11128 - self.assertEqual(dti.weekday_name[4], u'Monday') - self.assertEqual(dti.weekday_name[5], u'Tuesday') - self.assertEqual(dti.weekday_name[6], u'Wednesday') - self.assertEqual(dti.weekday_name[7], u'Thursday') - self.assertEqual(dti.weekday_name[8], u'Friday') - self.assertEqual(dti.weekday_name[9], u'Saturday') - self.assertEqual(dti.weekday_name[10], u'Sunday') - - self.assertEqual(Timestamp('2016-04-04').weekday_name, u'Monday') - self.assertEqual(Timestamp('2016-04-05').weekday_name, u'Tuesday') - self.assertEqual(Timestamp('2016-04-06').weekday_name, u'Wednesday') - self.assertEqual(Timestamp('2016-04-07').weekday_name, u'Thursday') - self.assertEqual(Timestamp('2016-04-08').weekday_name, u'Friday') - self.assertEqual(Timestamp('2016-04-09').weekday_name, u'Saturday') - self.assertEqual(Timestamp('2016-04-10').weekday_name, u'Sunday') - - self.assertEqual(len(dti.year), 365) - self.assertEqual(len(dti.month), 365) - self.assertEqual(len(dti.day), 365) - self.assertEqual(len(dti.hour), 365) - self.assertEqual(len(dti.minute), 365) - self.assertEqual(len(dti.second), 365) - self.assertEqual(len(dti.microsecond), 365) - self.assertEqual(len(dti.dayofweek), 365) - self.assertEqual(len(dti.dayofyear), 365) - self.assertEqual(len(dti.weekofyear), 365) - self.assertEqual(len(dti.quarter), 365) - self.assertEqual(len(dti.is_month_start), 365) - self.assertEqual(len(dti.is_month_end), 365) - self.assertEqual(len(dti.is_quarter_start), 365) - self.assertEqual(len(dti.is_quarter_end), 365) - self.assertEqual(len(dti.is_year_start), 365) - self.assertEqual(len(dti.is_year_end), 365) - self.assertEqual(len(dti.weekday_name), 365) - - dti = DatetimeIndex(freq='BQ-FEB', start=datetime(1998, 1, 1), - periods=4) - - self.assertEqual(sum(dti.is_quarter_start), 0) - self.assertEqual(sum(dti.is_quarter_end), 4) - self.assertEqual(sum(dti.is_year_start), 0) - self.assertEqual(sum(dti.is_year_end), 1) - - # Ensure is_start/end accessors throw ValueError for CustomBusinessDay, - # CBD requires np >= 1.7 - bday_egypt = offsets.CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu') - dti = date_range(datetime(2013, 4, 30), periods=5, freq=bday_egypt) - self.assertRaises(ValueError, lambda: dti.is_month_start) - - dti = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03']) - - self.assertEqual(dti.is_month_start[0], 1) - - tests = [ - (Timestamp('2013-06-01', freq='M').is_month_start, 1), - (Timestamp('2013-06-01', freq='BM').is_month_start, 0), - (Timestamp('2013-06-03', freq='M').is_month_start, 0), - (Timestamp('2013-06-03', freq='BM').is_month_start, 1), - (Timestamp('2013-02-28', freq='Q-FEB').is_month_end, 1), - (Timestamp('2013-02-28', freq='Q-FEB').is_quarter_end, 1), - (Timestamp('2013-02-28', freq='Q-FEB').is_year_end, 1), - (Timestamp('2013-03-01', freq='Q-FEB').is_month_start, 1), - (Timestamp('2013-03-01', freq='Q-FEB').is_quarter_start, 1), - (Timestamp('2013-03-01', freq='Q-FEB').is_year_start, 1), - (Timestamp('2013-03-31', freq='QS-FEB').is_month_end, 1), - (Timestamp('2013-03-31', freq='QS-FEB').is_quarter_end, 0), - (Timestamp('2013-03-31', freq='QS-FEB').is_year_end, 0), - (Timestamp('2013-02-01', freq='QS-FEB').is_month_start, 1), - (Timestamp('2013-02-01', freq='QS-FEB').is_quarter_start, 
1), - (Timestamp('2013-02-01', freq='QS-FEB').is_year_start, 1), - (Timestamp('2013-06-30', freq='BQ').is_month_end, 0), - (Timestamp('2013-06-30', freq='BQ').is_quarter_end, 0), - (Timestamp('2013-06-30', freq='BQ').is_year_end, 0), - (Timestamp('2013-06-28', freq='BQ').is_month_end, 1), - (Timestamp('2013-06-28', freq='BQ').is_quarter_end, 1), - (Timestamp('2013-06-28', freq='BQ').is_year_end, 0), - (Timestamp('2013-06-30', freq='BQS-APR').is_month_end, 0), - (Timestamp('2013-06-30', freq='BQS-APR').is_quarter_end, 0), - (Timestamp('2013-06-30', freq='BQS-APR').is_year_end, 0), - (Timestamp('2013-06-28', freq='BQS-APR').is_month_end, 1), - (Timestamp('2013-06-28', freq='BQS-APR').is_quarter_end, 1), - (Timestamp('2013-03-29', freq='BQS-APR').is_year_end, 1), - (Timestamp('2013-11-01', freq='AS-NOV').is_year_start, 1), - (Timestamp('2013-10-31', freq='AS-NOV').is_year_end, 1), - (Timestamp('2012-02-01').days_in_month, 29), - (Timestamp('2013-02-01').days_in_month, 28)] - - for ts, value in tests: - self.assertEqual(ts, value) - - def test_nanosecond_field(self): - dti = DatetimeIndex(np.arange(10)) - - self.assert_numpy_array_equal(dti.nanosecond, - np.arange(10, dtype=np.int32)) - - def test_datetimeindex_diff(self): - dti1 = DatetimeIndex(freq='Q-JAN', start=datetime(1997, 12, 31), - periods=100) - dti2 = DatetimeIndex(freq='Q-JAN', start=datetime(1997, 12, 31), - periods=98) - self.assertEqual(len(dti1.difference(dti2)), 2) - - def test_fancy_getitem(self): - dti = DatetimeIndex(freq='WOM-1FRI', start=datetime(2005, 1, 1), - end=datetime(2010, 1, 1)) - - s = Series(np.arange(len(dti)), index=dti) - - self.assertEqual(s[48], 48) - self.assertEqual(s['1/2/2009'], 48) - self.assertEqual(s['2009-1-2'], 48) - self.assertEqual(s[datetime(2009, 1, 2)], 48) - self.assertEqual(s[lib.Timestamp(datetime(2009, 1, 2))], 48) - self.assertRaises(KeyError, s.__getitem__, '2009-1-3') - - assert_series_equal(s['3/6/2009':'2009-06-05'], - s[datetime(2009, 3, 6):datetime(2009, 6, 5)]) - - def test_fancy_setitem(self): - dti = DatetimeIndex(freq='WOM-1FRI', start=datetime(2005, 1, 1), - end=datetime(2010, 1, 1)) - - s = Series(np.arange(len(dti)), index=dti) - s[48] = -1 - self.assertEqual(s[48], -1) - s['1/2/2009'] = -2 - self.assertEqual(s[48], -2) - s['1/2/2009':'2009-06-05'] = -3 - self.assertTrue((s[48:54] == -3).all()) - - def test_datetimeindex_constructor(self): - arr = ['1/1/2005', '1/2/2005', 'Jn 3, 2005', '2005-01-04'] - self.assertRaises(Exception, DatetimeIndex, arr) - - arr = ['1/1/2005', '1/2/2005', '1/3/2005', '2005-01-04'] - idx1 = DatetimeIndex(arr) - - arr = [datetime(2005, 1, 1), '1/2/2005', '1/3/2005', '2005-01-04'] - idx2 = DatetimeIndex(arr) - - arr = [lib.Timestamp(datetime(2005, 1, 1)), '1/2/2005', '1/3/2005', - '2005-01-04'] - idx3 = DatetimeIndex(arr) - - arr = np.array(['1/1/2005', '1/2/2005', '1/3/2005', - '2005-01-04'], dtype='O') - idx4 = DatetimeIndex(arr) - - arr = to_datetime(['1/1/2005', '1/2/2005', '1/3/2005', '2005-01-04']) - idx5 = DatetimeIndex(arr) - - arr = to_datetime(['1/1/2005', '1/2/2005', 'Jan 3, 2005', '2005-01-04' - ]) - idx6 = DatetimeIndex(arr) - - idx7 = DatetimeIndex(['12/05/2007', '25/01/2008'], dayfirst=True) - idx8 = DatetimeIndex(['2007/05/12', '2008/01/25'], dayfirst=False, - yearfirst=True) - tm.assert_index_equal(idx7, idx8) - - for other in [idx2, idx3, idx4, idx5, idx6]: - self.assertTrue((idx1.values == other.values).all()) - - sdate = datetime(1999, 12, 25) - edate = datetime(2000, 1, 1) - idx = DatetimeIndex(start=sdate, freq='1B', periods=20) 
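The `DatetimeIndex(start=..., periods=..., freq=...)` spelling used in the construction just above (its assertions resume below) was later deprecated; `date_range` is the equivalent modern spelling. A hedched sketch of the replacement, which is an assumption about the reader's pandas version rather than part of this diff:

```python
import pandas as pd

# Equivalent of DatetimeIndex(start=sdate, freq='1B', periods=20)
# for sdate = 1999-12-25.
idx = pd.date_range(start='1999-12-25', periods=20, freq='B')
assert len(idx) == 20 and idx.freqstr == 'B'
```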
- self.assertEqual(len(idx), 20) - self.assertEqual(idx[0], sdate + 0 * offsets.BDay()) - self.assertEqual(idx.freq, 'B') - - idx = DatetimeIndex(end=edate, freq=('D', 5), periods=20) - self.assertEqual(len(idx), 20) - self.assertEqual(idx[-1], edate) - self.assertEqual(idx.freq, '5D') - - idx1 = DatetimeIndex(start=sdate, end=edate, freq='W-SUN') - idx2 = DatetimeIndex(start=sdate, end=edate, - freq=offsets.Week(weekday=6)) - self.assertEqual(len(idx1), len(idx2)) - self.assertEqual(idx1.offset, idx2.offset) - - idx1 = DatetimeIndex(start=sdate, end=edate, freq='QS') - idx2 = DatetimeIndex(start=sdate, end=edate, - freq=offsets.QuarterBegin(startingMonth=1)) - self.assertEqual(len(idx1), len(idx2)) - self.assertEqual(idx1.offset, idx2.offset) - - idx1 = DatetimeIndex(start=sdate, end=edate, freq='BQ') - idx2 = DatetimeIndex(start=sdate, end=edate, - freq=offsets.BQuarterEnd(startingMonth=12)) - self.assertEqual(len(idx1), len(idx2)) - self.assertEqual(idx1.offset, idx2.offset) - - def test_dayfirst(self): - # GH 5917 - arr = ['10/02/2014', '11/02/2014', '12/02/2014'] - expected = DatetimeIndex([datetime(2014, 2, 10), datetime(2014, 2, 11), - datetime(2014, 2, 12)]) - idx1 = DatetimeIndex(arr, dayfirst=True) - idx2 = DatetimeIndex(np.array(arr), dayfirst=True) - idx3 = to_datetime(arr, dayfirst=True) - idx4 = to_datetime(np.array(arr), dayfirst=True) - idx5 = DatetimeIndex(Index(arr), dayfirst=True) - idx6 = DatetimeIndex(Series(arr), dayfirst=True) - tm.assert_index_equal(expected, idx1) - tm.assert_index_equal(expected, idx2) - tm.assert_index_equal(expected, idx3) - tm.assert_index_equal(expected, idx4) - tm.assert_index_equal(expected, idx5) - tm.assert_index_equal(expected, idx6) - - def test_dti_snap(self): - dti = DatetimeIndex(['1/1/2002', '1/2/2002', '1/3/2002', '1/4/2002', - '1/5/2002', '1/6/2002', '1/7/2002'], freq='D') - - res = dti.snap(freq='W-MON') - exp = date_range('12/31/2001', '1/7/2002', freq='w-mon') - exp = exp.repeat([3, 4]) - self.assertTrue((res == exp).all()) - - res = dti.snap(freq='B') - - exp = date_range('1/1/2002', '1/7/2002', freq='b') - exp = exp.repeat([1, 1, 1, 2, 2]) - self.assertTrue((res == exp).all()) - - def test_dti_reset_index_round_trip(self): - dti = DatetimeIndex(start='1/1/2001', end='6/1/2001', freq='D') - d1 = DataFrame({'v': np.random.rand(len(dti))}, index=dti) - d2 = d1.reset_index() - self.assertEqual(d2.dtypes[0], np.dtype('M8[ns]')) - d3 = d2.set_index('index') - assert_frame_equal(d1, d3, check_names=False) - - # #2329 - stamp = datetime(2012, 11, 22) - df = DataFrame([[stamp, 12.1]], columns=['Date', 'Value']) - df = df.set_index('Date') - - self.assertEqual(df.index[0], stamp) - self.assertEqual(df.reset_index()['Date'][0], stamp) - - def test_dti_set_index_reindex(self): - # GH 6631 - df = DataFrame(np.random.random(6)) - idx1 = date_range('2011/01/01', periods=6, freq='M', tz='US/Eastern') - idx2 = date_range('2013', periods=6, freq='A', tz='Asia/Tokyo') - - df = df.set_index(idx1) - tm.assert_index_equal(df.index, idx1) - df = df.reindex(idx2) - tm.assert_index_equal(df.index, idx2) - - # 11314 - # with tz - index = date_range(datetime(2015, 10, 1), - datetime(2015, 10, 1, 23), - freq='H', tz='US/Eastern') - df = DataFrame(np.random.randn(24, 1), columns=['a'], index=index) - new_index = date_range(datetime(2015, 10, 2), - datetime(2015, 10, 2, 23), - freq='H', tz='US/Eastern') - - # TODO: unused? 
- result = df.set_index(new_index) # noqa - - self.assertEqual(new_index.freq, index.freq) - - def test_datetimeindex_union_join_empty(self): - dti = DatetimeIndex(start='1/1/2001', end='2/1/2001', freq='D') - empty = Index([]) - - result = dti.union(empty) - tm.assertIsInstance(result, DatetimeIndex) - self.assertIs(result, result) - - result = dti.join(empty) - tm.assertIsInstance(result, DatetimeIndex) - - def test_series_set_value(self): - # #1561 - - dates = [datetime(2001, 1, 1), datetime(2001, 1, 2)] - index = DatetimeIndex(dates) - - s = Series().set_value(dates[0], 1.) - s2 = s.set_value(dates[1], np.nan) - - exp = Series([1., np.nan], index=index) - - assert_series_equal(s2, exp) - - # s = Series(index[:1], index[:1]) - # s2 = s.set_value(dates[1], index[1]) - # self.assertEqual(s2.values.dtype, 'M8[ns]') - - @slow - def test_slice_locs_indexerror(self): - times = [datetime(2000, 1, 1) + timedelta(minutes=i * 10) - for i in range(100000)] - s = Series(lrange(100000), times) - s.loc[datetime(1900, 1, 1):datetime(2100, 1, 1)] - - def test_slicing_datetimes(self): - - # GH 7523 - - # unique - df = DataFrame(np.arange(4., dtype='float64'), - index=[datetime(2001, 1, i, 10, 00) - for i in [1, 2, 3, 4]]) - result = df.loc[datetime(2001, 1, 1, 10):] - assert_frame_equal(result, df) - result = df.loc[:datetime(2001, 1, 4, 10)] - assert_frame_equal(result, df) - result = df.loc[datetime(2001, 1, 1, 10):datetime(2001, 1, 4, 10)] - assert_frame_equal(result, df) - - result = df.loc[datetime(2001, 1, 1, 11):] - expected = df.iloc[1:] - assert_frame_equal(result, expected) - result = df.loc['20010101 11':] - assert_frame_equal(result, expected) - - # duplicates - df = pd.DataFrame(np.arange(5., dtype='float64'), - index=[datetime(2001, 1, i, 10, 00) - for i in [1, 2, 2, 3, 4]]) - - result = df.loc[datetime(2001, 1, 1, 10):] - assert_frame_equal(result, df) - result = df.loc[:datetime(2001, 1, 4, 10)] - assert_frame_equal(result, df) - result = df.loc[datetime(2001, 1, 1, 10):datetime(2001, 1, 4, 10)] - assert_frame_equal(result, df) - - result = df.loc[datetime(2001, 1, 1, 11):] - expected = df.iloc[1:] - assert_frame_equal(result, expected) - result = df.loc['20010101 11':] - assert_frame_equal(result, expected) - - -class TestSeriesDatetime64(tm.TestCase): - def setUp(self): - self.series = Series(date_range('1/1/2000', periods=10)) - - def test_auto_conversion(self): - series = Series(list(date_range('1/1/2000', periods=10))) - self.assertEqual(series.dtype, 'M8[ns]') - - def test_constructor_cant_cast_datetime64(self): - msg = "Cannot cast datetime64 to " - with tm.assertRaisesRegexp(TypeError, msg): - Series(date_range('1/1/2000', periods=10), dtype=float) - - with tm.assertRaisesRegexp(TypeError, msg): - Series(date_range('1/1/2000', periods=10), dtype=int) - - def test_constructor_cast_object(self): - s = Series(date_range('1/1/2000', periods=10), dtype=object) - exp = Series(date_range('1/1/2000', periods=10)) - tm.assert_series_equal(s, exp) - - def test_series_comparison_scalars(self): - val = datetime(2000, 1, 4) - result = self.series > val - expected = Series([x > val for x in self.series]) - self.assert_series_equal(result, expected) - - val = self.series[5] - result = self.series > val - expected = Series([x > val for x in self.series]) - self.assert_series_equal(result, expected) - - def test_between(self): - left, right = self.series[[2, 7]] - - result = self.series.between(left, right) - expected = (self.series >= left) & (self.series <= right) - assert_series_equal(result, 
expected) - - # --------------------------------------------------------------------- - # NaT support - - def test_NaT_scalar(self): - series = Series([0, 1000, 2000, iNaT], dtype='M8[ns]') - - val = series[3] - self.assertTrue(com.isnull(val)) - - series[2] = val - self.assertTrue(com.isnull(series[2])) - - def test_NaT_cast(self): - # GH10747 - result = Series([np.nan]).astype('M8[ns]') - expected = Series([NaT]) - assert_series_equal(result, expected) - - def test_set_none_nan(self): - self.series[3] = None - self.assertIs(self.series[3], NaT) - - self.series[3:5] = None - self.assertIs(self.series[4], NaT) - - self.series[5] = np.nan - self.assertIs(self.series[5], NaT) - - self.series[5:7] = np.nan - self.assertIs(self.series[6], NaT) - - def test_intercept_astype_object(self): - - # this test no longer makes sense as series is by default already - # M8[ns] - expected = self.series.astype('object') - - df = DataFrame({'a': self.series, - 'b': np.random.randn(len(self.series))}) - exp_dtypes = pd.Series([np.dtype('datetime64[ns]'), - np.dtype('float64')], index=['a', 'b']) - tm.assert_series_equal(df.dtypes, exp_dtypes) - - result = df.values.squeeze() - self.assertTrue((result[:, 0] == expected.values).all()) - - df = DataFrame({'a': self.series, 'b': ['foo'] * len(self.series)}) - - result = df.values.squeeze() - self.assertTrue((result[:, 0] == expected.values).all()) - - def test_nat_operations(self): - # GH 8617 - s = Series([0, pd.NaT], dtype='m8[ns]') - exp = s[0] - self.assertEqual(s.median(), exp) - self.assertEqual(s.min(), exp) - self.assertEqual(s.max(), exp) - - def test_round_nat(self): - # GH14940 - s = Series([pd.NaT]) - expected = Series(pd.NaT) - for method in ["round", "floor", "ceil"]: - round_method = getattr(s.dt, method) - for freq in ["s", "5s", "min", "5min", "h", "5h"]: - assert_series_equal(round_method(freq), expected) - - -class TestTimestamp(tm.TestCase): - def test_class_ops_pytz(self): - tm._skip_if_no_pytz() - from pytz import timezone - - def compare(x, y): - self.assertEqual(int(Timestamp(x).value / 1e9), - int(Timestamp(y).value / 1e9)) - - compare(Timestamp.now(), datetime.now()) - compare(Timestamp.now('UTC'), datetime.now(timezone('UTC'))) - compare(Timestamp.utcnow(), datetime.utcnow()) - compare(Timestamp.today(), datetime.today()) - current_time = calendar.timegm(datetime.now().utctimetuple()) - compare(Timestamp.utcfromtimestamp(current_time), - datetime.utcfromtimestamp(current_time)) - compare(Timestamp.fromtimestamp(current_time), - datetime.fromtimestamp(current_time)) - - date_component = datetime.utcnow() - time_component = (date_component + timedelta(minutes=10)).time() - compare(Timestamp.combine(date_component, time_component), - datetime.combine(date_component, time_component)) - - def test_class_ops_dateutil(self): - tm._skip_if_no_dateutil() - from dateutil.tz import tzutc - - def compare(x, y): - self.assertEqual(int(np.round(Timestamp(x).value / 1e9)), - int(np.round(Timestamp(y).value / 1e9))) - - compare(Timestamp.now(), datetime.now()) - compare(Timestamp.now('UTC'), datetime.now(tzutc())) - compare(Timestamp.utcnow(), datetime.utcnow()) - compare(Timestamp.today(), datetime.today()) - current_time = calendar.timegm(datetime.now().utctimetuple()) - compare(Timestamp.utcfromtimestamp(current_time), - datetime.utcfromtimestamp(current_time)) - compare(Timestamp.fromtimestamp(current_time), - datetime.fromtimestamp(current_time)) - - date_component = datetime.utcnow() - time_component = (date_component + 
timedelta(minutes=10)).time() - compare(Timestamp.combine(date_component, time_component), - datetime.combine(date_component, time_component)) - - def test_basics_nanos(self): - val = np.int64(946684800000000000).view('M8[ns]') - stamp = Timestamp(val.view('i8') + 500) - self.assertEqual(stamp.year, 2000) - self.assertEqual(stamp.month, 1) - self.assertEqual(stamp.microsecond, 0) - self.assertEqual(stamp.nanosecond, 500) - - # GH 14415 - val = np.iinfo(np.int64).min + 80000000000000 - stamp = Timestamp(val) - self.assertEqual(stamp.year, 1677) - self.assertEqual(stamp.month, 9) - self.assertEqual(stamp.day, 21) - self.assertEqual(stamp.microsecond, 145224) - self.assertEqual(stamp.nanosecond, 192) - - def test_unit(self): - - def check(val, unit=None, h=1, s=1, us=0): - stamp = Timestamp(val, unit=unit) - self.assertEqual(stamp.year, 2000) - self.assertEqual(stamp.month, 1) - self.assertEqual(stamp.day, 1) - self.assertEqual(stamp.hour, h) - if unit != 'D': - self.assertEqual(stamp.minute, 1) - self.assertEqual(stamp.second, s) - self.assertEqual(stamp.microsecond, us) - else: - self.assertEqual(stamp.minute, 0) - self.assertEqual(stamp.second, 0) - self.assertEqual(stamp.microsecond, 0) - self.assertEqual(stamp.nanosecond, 0) - - ts = Timestamp('20000101 01:01:01') - val = ts.value - days = (ts - Timestamp('1970-01-01')).days - - check(val) - check(val / long(1000), unit='us') - check(val / long(1000000), unit='ms') - check(val / long(1000000000), unit='s') - check(days, unit='D', h=0) - - # using truediv, so these are like floats - if compat.PY3: - check((val + 500000) / long(1000000000), unit='s', us=500) - check((val + 500000000) / long(1000000000), unit='s', us=500000) - check((val + 500000) / long(1000000), unit='ms', us=500) - - # get chopped in py2 - else: - check((val + 500000) / long(1000000000), unit='s') - check((val + 500000000) / long(1000000000), unit='s') - check((val + 500000) / long(1000000), unit='ms') - - # ok - check((val + 500000) / long(1000), unit='us', us=500) - check((val + 500000000) / long(1000000), unit='ms', us=500000) - - # floats - check(val / 1000.0 + 5, unit='us', us=5) - check(val / 1000.0 + 5000, unit='us', us=5000) - check(val / 1000000.0 + 0.5, unit='ms', us=500) - check(val / 1000000.0 + 0.005, unit='ms', us=5) - check(val / 1000000000.0 + 0.5, unit='s', us=500000) - check(days + 0.5, unit='D', h=12) - - # nan - result = Timestamp(np.nan) - self.assertIs(result, NaT) - - result = Timestamp(None) - self.assertIs(result, NaT) - - result = Timestamp(iNaT) - self.assertIs(result, NaT) - - result = Timestamp(NaT) - self.assertIs(result, NaT) - - result = Timestamp('NaT') - self.assertIs(result, NaT) - - self.assertTrue(isnull(Timestamp('nat'))) - - def test_roundtrip(self): - - # test value to string and back conversions - # further test accessors - base = Timestamp('20140101 00:00:00') - - result = Timestamp(base.value + pd.Timedelta('5ms').value) - self.assertEqual(result, Timestamp(str(base) + ".005000")) - self.assertEqual(result.microsecond, 5000) - - result = Timestamp(base.value + pd.Timedelta('5us').value) - self.assertEqual(result, Timestamp(str(base) + ".000005")) - self.assertEqual(result.microsecond, 5) - - result = Timestamp(base.value + pd.Timedelta('5ns').value) - self.assertEqual(result, Timestamp(str(base) + ".000000005")) - self.assertEqual(result.nanosecond, 5) - self.assertEqual(result.microsecond, 0) - - result = Timestamp(base.value + pd.Timedelta('6ms 5us').value) - self.assertEqual(result, Timestamp(str(base) + ".006005")) - 
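The roundtrip assertions here (they continue directly below) rest on one fact: `Timestamp.value` is an integer count of nanoseconds since the Unix epoch, so adding a `Timedelta.value` shifts a stamp exactly. A self-contained sketch:

```python
import pandas as pd

base = pd.Timestamp('2014-01-01 00:00:00')

# .value is integer nanoseconds since the epoch, on both types.
bumped = pd.Timestamp(base.value + pd.Timedelta('5ns').value)
assert bumped.nanosecond == 5 and bumped.microsecond == 0
```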
self.assertEqual(result.microsecond, 5 + 6 * 1000) - - result = Timestamp(base.value + pd.Timedelta('200ms 5us').value) - self.assertEqual(result, Timestamp(str(base) + ".200005")) - self.assertEqual(result.microsecond, 5 + 200 * 1000) - - def test_comparison(self): - # 5-18-2012 00:00:00.000 - stamp = long(1337299200000000000) - - val = Timestamp(stamp) - - self.assertEqual(val, val) - self.assertFalse(val != val) - self.assertFalse(val < val) - self.assertTrue(val <= val) - self.assertFalse(val > val) - self.assertTrue(val >= val) - - other = datetime(2012, 5, 18) - self.assertEqual(val, other) - self.assertFalse(val != other) - self.assertFalse(val < other) - self.assertTrue(val <= other) - self.assertFalse(val > other) - self.assertTrue(val >= other) - - other = Timestamp(stamp + 100) - - self.assertNotEqual(val, other) - self.assertNotEqual(val, other) - self.assertTrue(val < other) - self.assertTrue(val <= other) - self.assertTrue(other > val) - self.assertTrue(other >= val) - - def test_compare_invalid(self): - - # GH 8058 - val = Timestamp('20130101 12:01:02') - self.assertFalse(val == 'foo') - self.assertFalse(val == 10.0) - self.assertFalse(val == 1) - self.assertFalse(val == long(1)) - self.assertFalse(val == []) - self.assertFalse(val == {'foo': 1}) - self.assertFalse(val == np.float64(1)) - self.assertFalse(val == np.int64(1)) - - self.assertTrue(val != 'foo') - self.assertTrue(val != 10.0) - self.assertTrue(val != 1) - self.assertTrue(val != long(1)) - self.assertTrue(val != []) - self.assertTrue(val != {'foo': 1}) - self.assertTrue(val != np.float64(1)) - self.assertTrue(val != np.int64(1)) - - # ops testing - df = DataFrame(randn(5, 2)) - a = df[0] - b = Series(randn(5)) - b.name = Timestamp('2000-01-01') - tm.assert_series_equal(a / b, 1 / (b / a)) - - def test_cant_compare_tz_naive_w_aware(self): - tm._skip_if_no_pytz() - # #1404 - a = Timestamp('3/12/2012') - b = Timestamp('3/12/2012', tz='utc') - - self.assertRaises(Exception, a.__eq__, b) - self.assertRaises(Exception, a.__ne__, b) - self.assertRaises(Exception, a.__lt__, b) - self.assertRaises(Exception, a.__gt__, b) - self.assertRaises(Exception, b.__eq__, a) - self.assertRaises(Exception, b.__ne__, a) - self.assertRaises(Exception, b.__lt__, a) - self.assertRaises(Exception, b.__gt__, a) - - if sys.version_info < (3, 3): - self.assertRaises(Exception, a.__eq__, b.to_pydatetime()) - self.assertRaises(Exception, a.to_pydatetime().__eq__, b) - else: - self.assertFalse(a == b.to_pydatetime()) - self.assertFalse(a.to_pydatetime() == b) - - def test_cant_compare_tz_naive_w_aware_explicit_pytz(self): - tm._skip_if_no_pytz() - from pytz import utc - # #1404 - a = Timestamp('3/12/2012') - b = Timestamp('3/12/2012', tz=utc) - - self.assertRaises(Exception, a.__eq__, b) - self.assertRaises(Exception, a.__ne__, b) - self.assertRaises(Exception, a.__lt__, b) - self.assertRaises(Exception, a.__gt__, b) - self.assertRaises(Exception, b.__eq__, a) - self.assertRaises(Exception, b.__ne__, a) - self.assertRaises(Exception, b.__lt__, a) - self.assertRaises(Exception, b.__gt__, a) - - if sys.version_info < (3, 3): - self.assertRaises(Exception, a.__eq__, b.to_pydatetime()) - self.assertRaises(Exception, a.to_pydatetime().__eq__, b) - else: - self.assertFalse(a == b.to_pydatetime()) - self.assertFalse(a.to_pydatetime() == b) - - def test_cant_compare_tz_naive_w_aware_dateutil(self): - tm._skip_if_no_dateutil() - from dateutil.tz import tzutc - utc = tzutc() - # #1404 - a = Timestamp('3/12/2012') - b = Timestamp('3/12/2012', tz=utc) - - 
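The three naive-versus-aware variants in this stretch of removed tests (the assertions resume right after this note) pin the same rule: ordering and equality comparisons between a tz-naive and a tz-aware `Timestamp` refuse to mix. The tests only demand `Exception`; current pandas raises `TypeError`, which is what this sketch assumes:

```python
import pandas as pd

a = pd.Timestamp('2012-03-12')            # naive
b = pd.Timestamp('2012-03-12', tz='UTC')  # aware

try:
    a < b
    raise AssertionError('expected the mixed comparison to raise')
except TypeError:
    pass
```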
self.assertRaises(Exception, a.__eq__, b) - self.assertRaises(Exception, a.__ne__, b) - self.assertRaises(Exception, a.__lt__, b) - self.assertRaises(Exception, a.__gt__, b) - self.assertRaises(Exception, b.__eq__, a) - self.assertRaises(Exception, b.__ne__, a) - self.assertRaises(Exception, b.__lt__, a) - self.assertRaises(Exception, b.__gt__, a) - - if sys.version_info < (3, 3): - self.assertRaises(Exception, a.__eq__, b.to_pydatetime()) - self.assertRaises(Exception, a.to_pydatetime().__eq__, b) - else: - self.assertFalse(a == b.to_pydatetime()) - self.assertFalse(a.to_pydatetime() == b) - - def test_delta_preserve_nanos(self): - val = Timestamp(long(1337299200000000123)) - result = val + timedelta(1) - self.assertEqual(result.nanosecond, val.nanosecond) - - def test_frequency_misc(self): - self.assertEqual(frequencies.get_freq_group('T'), - frequencies.FreqGroup.FR_MIN) - - code, stride = frequencies.get_freq_code(offsets.Hour()) - self.assertEqual(code, frequencies.FreqGroup.FR_HR) - - code, stride = frequencies.get_freq_code((5, 'T')) - self.assertEqual(code, frequencies.FreqGroup.FR_MIN) - self.assertEqual(stride, 5) - - offset = offsets.Hour() - result = frequencies.to_offset(offset) - self.assertEqual(result, offset) - - result = frequencies.to_offset((5, 'T')) - expected = offsets.Minute(5) - self.assertEqual(result, expected) - - self.assertRaises(ValueError, frequencies.get_freq_code, (5, 'baz')) - - self.assertRaises(ValueError, frequencies.to_offset, '100foo') - - self.assertRaises(ValueError, frequencies.to_offset, ('', '')) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result = frequencies.get_standard_freq(offsets.Hour()) - self.assertEqual(result, 'H') - - def test_hash_equivalent(self): - d = {datetime(2011, 1, 1): 5} - stamp = Timestamp(datetime(2011, 1, 1)) - self.assertEqual(d[stamp], 5) - - def test_timestamp_compare_scalars(self): - # case where ndim == 0 - lhs = np.datetime64(datetime(2013, 12, 6)) - rhs = Timestamp('now') - nat = Timestamp('nat') - - ops = {'gt': 'lt', - 'lt': 'gt', - 'ge': 'le', - 'le': 'ge', - 'eq': 'eq', - 'ne': 'ne'} - - for left, right in ops.items(): - left_f = getattr(operator, left) - right_f = getattr(operator, right) - expected = left_f(lhs, rhs) - - result = right_f(rhs, lhs) - self.assertEqual(result, expected) - - expected = left_f(rhs, nat) - result = right_f(nat, rhs) - self.assertEqual(result, expected) - - def test_timestamp_compare_series(self): - # make sure we can compare Timestamps on the right AND left hand side - # GH4982 - s = Series(date_range('20010101', periods=10), name='dates') - s_nat = s.copy(deep=True) - - s[0] = pd.Timestamp('nat') - s[3] = pd.Timestamp('nat') - - ops = {'lt': 'gt', 'le': 'ge', 'eq': 'eq', 'ne': 'ne'} - - for left, right in ops.items(): - left_f = getattr(operator, left) - right_f = getattr(operator, right) - - # no nats - expected = left_f(s, Timestamp('20010109')) - result = right_f(Timestamp('20010109'), s) - tm.assert_series_equal(result, expected) - - # nats - expected = left_f(s, Timestamp('nat')) - result = right_f(Timestamp('nat'), s) - tm.assert_series_equal(result, expected) - - # compare to timestamp with series containing nats - expected = left_f(s_nat, Timestamp('20010109')) - result = right_f(Timestamp('20010109'), s_nat) - tm.assert_series_equal(result, expected) - - # compare to nat with series containing nats - expected = left_f(s_nat, Timestamp('nat')) - result = right_f(Timestamp('nat'), s_nat) - tm.assert_series_equal(result, expected) - - def 
test_is_leap_year(self):
-        # GH 13727
-        for tz in [None, 'UTC', 'US/Eastern', 'Asia/Tokyo']:
-            dt = Timestamp('2000-01-01 00:00:00', tz=tz)
-            self.assertTrue(dt.is_leap_year)
-            self.assertIsInstance(dt.is_leap_year, bool)
-
-            dt = Timestamp('1999-01-01 00:00:00', tz=tz)
-            self.assertFalse(dt.is_leap_year)
-
-            dt = Timestamp('2004-01-01 00:00:00', tz=tz)
-            self.assertTrue(dt.is_leap_year)
-
-            dt = Timestamp('2100-01-01 00:00:00', tz=tz)
-            self.assertFalse(dt.is_leap_year)
-
-        self.assertFalse(pd.NaT.is_leap_year)
-        self.assertIsInstance(pd.NaT.is_leap_year, bool)
-
-    def test_round_nat(self):
-        # GH14940
-        ts = Timestamp('nat')
-        for method in ["round", "floor", "ceil"]:
-            round_method = getattr(ts, method)
-            for freq in ["s", "5s", "min", "5min", "h", "5h"]:
-                self.assertIs(round_method(freq), ts)
-
-
-class TestSlicing(tm.TestCase):
-    def test_slice_year(self):
-        dti = DatetimeIndex(freq='B', start=datetime(2005, 1, 1), periods=500)
-
-        s = Series(np.arange(len(dti)), index=dti)
-        result = s['2005']
-        expected = s[s.index.year == 2005]
-        assert_series_equal(result, expected)
-
-        df = DataFrame(np.random.rand(len(dti), 5), index=dti)
-        result = df.loc['2005']
-        expected = df[df.index.year == 2005]
-        assert_frame_equal(result, expected)
-
-        rng = date_range('1/1/2000', '1/1/2010')
-
-        result = rng.get_loc('2009')
-        expected = slice(3288, 3653)
-        self.assertEqual(result, expected)
-
-    def test_slice_quarter(self):
-        dti = DatetimeIndex(freq='D', start=datetime(2000, 6, 1), periods=500)
-
-        s = Series(np.arange(len(dti)), index=dti)
-        self.assertEqual(len(s['2001Q1']), 90)
-
-        df = DataFrame(np.random.rand(len(dti), 5), index=dti)
-        self.assertEqual(len(df.loc['1Q01']), 90)
-
-    def test_slice_month(self):
-        dti = DatetimeIndex(freq='D', start=datetime(2005, 1, 1), periods=500)
-        s = Series(np.arange(len(dti)), index=dti)
-        self.assertEqual(len(s['2005-11']), 30)
-
-        df = DataFrame(np.random.rand(len(dti), 5), index=dti)
-        self.assertEqual(len(df.loc['2005-11']), 30)
-
-        assert_series_equal(s['2005-11'], s['11-2005'])
-
-    def test_partial_slice(self):
-        rng = DatetimeIndex(freq='D', start=datetime(2005, 1, 1), periods=500)
-        s = Series(np.arange(len(rng)), index=rng)
-
-        result = s['2005-05':'2006-02']
-        expected = s['20050501':'20060228']
-        assert_series_equal(result, expected)
-
-        result = s['2005-05':]
-        expected = s['20050501':]
-        assert_series_equal(result, expected)
-
-        result = s[:'2006-02']
-        expected = s[:'20060228']
-        assert_series_equal(result, expected)
-
-        result = s['2005-1-1']
-        self.assertEqual(result, s.iloc[0])
-
-        self.assertRaises(Exception, s.__getitem__, '2004-12-31')
-
-    def test_partial_slice_daily(self):
-        rng = DatetimeIndex(freq='H', start=datetime(2005, 1, 31), periods=500)
-        s = Series(np.arange(len(rng)), index=rng)
-
-        result = s['2005-1-31']
-        assert_series_equal(result, s.iloc[:24])
-
-        self.assertRaises(Exception, s.__getitem__, '2004-12-31 00')
-
-    def test_partial_slice_hourly(self):
-        rng = DatetimeIndex(freq='T', start=datetime(2005, 1, 1, 20, 0, 0),
-                            periods=500)
-        s = Series(np.arange(len(rng)), index=rng)
-
-        result = s['2005-1-1']
-        assert_series_equal(result, s.iloc[:60 * 4])
-
-        result = s['2005-1-1 20']
-        assert_series_equal(result, s.iloc[:60])
-
-        self.assertEqual(s['2005-1-1 20:00'], s.iloc[0])
-        self.assertRaises(Exception, s.__getitem__, '2004-12-31 00:15')
-
-    def test_partial_slice_minutely(self):
-        rng = DatetimeIndex(freq='S', start=datetime(2005, 1, 1, 23, 59, 0),
-                            periods=500)
-        s = Series(np.arange(len(rng)),
index=rng) - - result = s['2005-1-1 23:59'] - assert_series_equal(result, s.iloc[:60]) - - result = s['2005-1-1'] - assert_series_equal(result, s.iloc[:60]) - - self.assertEqual(s[Timestamp('2005-1-1 23:59:00')], s.iloc[0]) - self.assertRaises(Exception, s.__getitem__, '2004-12-31 00:00:00') - - def test_partial_slice_second_precision(self): - rng = DatetimeIndex(start=datetime(2005, 1, 1, 0, 0, 59, - microsecond=999990), - periods=20, freq='US') - s = Series(np.arange(20), rng) - - assert_series_equal(s['2005-1-1 00:00'], s.iloc[:10]) - assert_series_equal(s['2005-1-1 00:00:59'], s.iloc[:10]) - - assert_series_equal(s['2005-1-1 00:01'], s.iloc[10:]) - assert_series_equal(s['2005-1-1 00:01:00'], s.iloc[10:]) - - self.assertEqual(s[Timestamp('2005-1-1 00:00:59.999990')], s.iloc[0]) - self.assertRaisesRegexp(KeyError, '2005-1-1 00:00:00', - lambda: s['2005-1-1 00:00:00']) - - def test_partial_slicing_dataframe(self): - # GH14856 - # Test various combinations of string slicing resolution vs. - # index resolution - # - If string resolution is less precise than index resolution, - # string is considered a slice - # - If string resolution is equal to or more precise than index - # resolution, string is considered an exact match - formats = ['%Y', '%Y-%m', '%Y-%m-%d', '%Y-%m-%d %H', - '%Y-%m-%d %H:%M', '%Y-%m-%d %H:%M:%S'] - resolutions = ['year', 'month', 'day', 'hour', 'minute', 'second'] - for rnum, resolution in enumerate(resolutions[2:], 2): - # we check only 'day', 'hour', 'minute' and 'second' - unit = Timedelta("1 " + resolution) - middate = datetime(2012, 1, 1, 0, 0, 0) - index = DatetimeIndex([middate - unit, - middate, middate + unit]) - values = [1, 2, 3] - df = DataFrame({'a': values}, index, dtype=np.int64) - self.assertEqual(df.index.resolution, resolution) - - # Timestamp with the same resolution as index - # Should be exact match for Series (return scalar) - # and raise KeyError for Frame - for timestamp, expected in zip(index, values): - ts_string = timestamp.strftime(formats[rnum]) - # make ts_string as precise as index - result = df['a'][ts_string] - self.assertIsInstance(result, np.int64) - self.assertEqual(result, expected) - self.assertRaises(KeyError, df.__getitem__, ts_string) - - # Timestamp with resolution less precise than index - for fmt in formats[:rnum]: - for element, theslice in [[0, slice(None, 1)], - [1, slice(1, None)]]: - ts_string = index[element].strftime(fmt) - - # Series should return slice - result = df['a'][ts_string] - expected = df['a'][theslice] - assert_series_equal(result, expected) - - # Frame should return slice as well - result = df[ts_string] - expected = df[theslice] - assert_frame_equal(result, expected) - - # Timestamp with resolution more precise than index - # Compatible with existing key - # Should return scalar for Series - # and raise KeyError for Frame - for fmt in formats[rnum + 1:]: - ts_string = index[1].strftime(fmt) - result = df['a'][ts_string] - self.assertIsInstance(result, np.int64) - self.assertEqual(result, 2) - self.assertRaises(KeyError, df.__getitem__, ts_string) - - # Not compatible with existing key - # Should raise KeyError - for fmt, res in list(zip(formats, resolutions))[rnum + 1:]: - ts = index[1] + Timedelta("1 " + res) - ts_string = ts.strftime(fmt) - self.assertRaises(KeyError, df['a'].__getitem__, ts_string) - self.assertRaises(KeyError, df.__getitem__, ts_string) - - def test_partial_slicing_with_multiindex(self): - - # GH 4758 - # partial string indexing with a multi-index buggy - df = DataFrame({'ACCOUNT': 
["ACCT1", "ACCT1", "ACCT1", "ACCT2"], - 'TICKER': ["ABC", "MNP", "XYZ", "XYZ"], - 'val': [1, 2, 3, 4]}, - index=date_range("2013-06-19 09:30:00", - periods=4, freq='5T')) - df_multi = df.set_index(['ACCOUNT', 'TICKER'], append=True) - - expected = DataFrame([ - [1] - ], index=Index(['ABC'], name='TICKER'), columns=['val']) - result = df_multi.loc[('2013-06-19 09:30:00', 'ACCT1')] - assert_frame_equal(result, expected) - - expected = df_multi.loc[ - (pd.Timestamp('2013-06-19 09:30:00', tz=None), 'ACCT1', 'ABC')] - result = df_multi.loc[('2013-06-19 09:30:00', 'ACCT1', 'ABC')] - assert_series_equal(result, expected) - - # this is a KeyError as we don't do partial string selection on - # multi-levels - def f(): - df_multi.loc[('2013-06-19', 'ACCT1', 'ABC')] - - self.assertRaises(KeyError, f) - - # GH 4294 - # partial slice on a series mi - s = pd.DataFrame(randn(1000, 1000), index=pd.date_range( - '2000-1-1', periods=1000)).stack() - - s2 = s[:-1].copy() - expected = s2['2000-1-4'] - result = s2[pd.Timestamp('2000-1-4')] - assert_series_equal(result, expected) - - result = s[pd.Timestamp('2000-1-4')] - expected = s['2000-1-4'] - assert_series_equal(result, expected) - - df2 = pd.DataFrame(s) - expected = df2.xs('2000-1-4') - result = df2.loc[pd.Timestamp('2000-1-4')] - assert_frame_equal(result, expected) - - def test_date_range_normalize(self): - snap = datetime.today() - n = 50 - - rng = date_range(snap, periods=n, normalize=False, freq='2D') - - offset = timedelta(2) - values = DatetimeIndex([snap + i * offset for i in range(n)]) - - tm.assert_index_equal(rng, values) - - rng = date_range('1/1/2000 08:15', periods=n, normalize=False, - freq='B') - the_time = time(8, 15) - for val in rng: - self.assertEqual(val.time(), the_time) - - def test_timedelta(self): - # this is valid too - index = date_range('1/1/2000', periods=50, freq='B') - shifted = index + timedelta(1) - back = shifted + timedelta(-1) - self.assertTrue(tm.equalContents(index, back)) - self.assertEqual(shifted.freq, index.freq) - self.assertEqual(shifted.freq, back.freq) - - result = index - timedelta(1) - expected = index + timedelta(-1) - tm.assert_index_equal(result, expected) - - # GH4134, buggy with timedeltas - rng = date_range('2013', '2014') - s = Series(rng) - result1 = rng - pd.offsets.Hour(1) - result2 = DatetimeIndex(s - np.timedelta64(100000000)) - result3 = rng - np.timedelta64(100000000) - result4 = DatetimeIndex(s - pd.offsets.Hour(1)) - tm.assert_index_equal(result1, result4) - tm.assert_index_equal(result2, result3) - - def test_shift(self): - ts = Series(np.random.randn(5), - index=date_range('1/1/2000', periods=5, freq='H')) - - result = ts.shift(1, freq='5T') - exp_index = ts.index.shift(1, freq='5T') - tm.assert_index_equal(result.index, exp_index) - - # GH #1063, multiple of same base - result = ts.shift(1, freq='4H') - exp_index = ts.index + offsets.Hour(4) - tm.assert_index_equal(result.index, exp_index) - - idx = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-04']) - self.assertRaises(ValueError, idx.shift, 1) - - def test_setops_preserve_freq(self): - for tz in [None, 'Asia/Tokyo', 'US/Eastern']: - rng = date_range('1/1/2000', '1/1/2002', name='idx', tz=tz) - - result = rng[:50].union(rng[50:100]) - self.assertEqual(result.name, rng.name) - self.assertEqual(result.freq, rng.freq) - self.assertEqual(result.tz, rng.tz) - - result = rng[:50].union(rng[30:100]) - self.assertEqual(result.name, rng.name) - self.assertEqual(result.freq, rng.freq) - self.assertEqual(result.tz, rng.tz) - - result = 
rng[:50].union(rng[60:100]) - self.assertEqual(result.name, rng.name) - self.assertIsNone(result.freq) - self.assertEqual(result.tz, rng.tz) - - result = rng[:50].intersection(rng[25:75]) - self.assertEqual(result.name, rng.name) - self.assertEqual(result.freqstr, 'D') - self.assertEqual(result.tz, rng.tz) - - nofreq = DatetimeIndex(list(rng[25:75]), name='other') - result = rng[:50].union(nofreq) - self.assertIsNone(result.name) - self.assertEqual(result.freq, rng.freq) - self.assertEqual(result.tz, rng.tz) - - result = rng[:50].intersection(nofreq) - self.assertIsNone(result.name) - self.assertEqual(result.freq, rng.freq) - self.assertEqual(result.tz, rng.tz) - - def test_min_max(self): - rng = date_range('1/1/2000', '12/31/2000') - rng2 = rng.take(np.random.permutation(len(rng))) - - the_min = rng2.min() - the_max = rng2.max() - tm.assertIsInstance(the_min, Timestamp) - tm.assertIsInstance(the_max, Timestamp) - self.assertEqual(the_min, rng[0]) - self.assertEqual(the_max, rng[-1]) - - self.assertEqual(rng.min(), rng[0]) - self.assertEqual(rng.max(), rng[-1]) - - def test_min_max_series(self): - rng = date_range('1/1/2000', periods=10, freq='4h') - lvls = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'] - df = DataFrame({'TS': rng, 'V': np.random.randn(len(rng)), 'L': lvls}) - - result = df.TS.max() - exp = Timestamp(df.TS.iat[-1]) - self.assertTrue(isinstance(result, Timestamp)) - self.assertEqual(result, exp) - - result = df.TS.min() - exp = Timestamp(df.TS.iat[0]) - self.assertTrue(isinstance(result, Timestamp)) - self.assertEqual(result, exp) - - def test_from_M8_structured(self): - dates = [(datetime(2012, 9, 9, 0, 0), datetime(2012, 9, 8, 15, 10))] - arr = np.array(dates, - dtype=[('Date', 'M8[us]'), ('Forecasting', 'M8[us]')]) - df = DataFrame(arr) - - self.assertEqual(df['Date'][0], dates[0][0]) - self.assertEqual(df['Forecasting'][0], dates[0][1]) - - s = Series(arr['Date']) - self.assertTrue(s[0], Timestamp) - self.assertEqual(s[0], dates[0][0]) - - s = Series.from_array(arr['Date'], Index([0])) - self.assertEqual(s[0], dates[0][0]) - - def test_get_level_values_box(self): - from pandas import MultiIndex - - dates = date_range('1/1/2000', periods=4) - levels = [dates, [0, 1]] - labels = [[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]] - - index = MultiIndex(levels=levels, labels=labels) - - self.assertTrue(isinstance(index.get_level_values(0)[0], Timestamp)) - - def test_frame_apply_dont_convert_datetime64(self): - from pandas.tseries.offsets import BDay - df = DataFrame({'x1': [datetime(1996, 1, 1)]}) - - df = df.applymap(lambda x: x + BDay()) - df = df.applymap(lambda x: x + BDay()) - - self.assertTrue(df.x1.dtype == 'M8[ns]') - - def test_date_range_fy5252(self): - dr = date_range(start="2013-01-01", periods=2, freq=offsets.FY5253( - startingMonth=1, weekday=3, variation="nearest")) - self.assertEqual(dr[0], Timestamp('2013-01-31')) - self.assertEqual(dr[1], Timestamp('2014-01-30')) - - def test_partial_slice_doesnt_require_monotonicity(self): - # For historical reasons. 
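For the non-monotonicity test that resumes right after this note, the behavior being pinned down (in the pandas of this era) is asymmetric: partial-string slicing past the end of a non-monotonic index returns an empty result, while the same slice spelled with an explicit `Timestamp` raises `KeyError`. A sketch of that asymmetry:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10), pd.date_range('2014-01-01', periods=10))
nonmonotonic = s[[3, 5, 4]]

# String slices past the end come back empty rather than raising ...
print(nonmonotonic['2014-01-10':])  # empty Series in this era of pandas

# ... while an explicit Timestamp bound raises KeyError.
try:
    nonmonotonic[pd.Timestamp('2014-01-10'):]
except KeyError:
    pass
```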
- s = pd.Series(np.arange(10), pd.date_range('2014-01-01', periods=10)) - - nonmonotonic = s[[3, 5, 4]] - expected = nonmonotonic.iloc[:0] - timestamp = pd.Timestamp('2014-01-10') - - assert_series_equal(nonmonotonic['2014-01-10':], expected) - self.assertRaisesRegexp(KeyError, - r"Timestamp\('2014-01-10 00:00:00'\)", - lambda: nonmonotonic[timestamp:]) - - assert_series_equal(nonmonotonic.loc['2014-01-10':], expected) - self.assertRaisesRegexp(KeyError, - r"Timestamp\('2014-01-10 00:00:00'\)", - lambda: nonmonotonic.loc[timestamp:]) - - -class TimeConversionFormats(tm.TestCase): - def test_to_datetime_format(self): - values = ['1/1/2000', '1/2/2000', '1/3/2000'] - - results1 = [Timestamp('20000101'), Timestamp('20000201'), - Timestamp('20000301')] - results2 = [Timestamp('20000101'), Timestamp('20000102'), - Timestamp('20000103')] - for vals, expecteds in [(values, (Index(results1), Index(results2))), - (Series(values), - (Series(results1), Series(results2))), - (values[0], (results1[0], results2[0])), - (values[1], (results1[1], results2[1])), - (values[2], (results1[2], results2[2]))]: - - for i, fmt in enumerate(['%d/%m/%Y', '%m/%d/%Y']): - result = to_datetime(vals, format=fmt) - expected = expecteds[i] - - if isinstance(expected, Series): - assert_series_equal(result, Series(expected)) - elif isinstance(expected, Timestamp): - self.assertEqual(result, expected) - else: - tm.assert_index_equal(result, expected) - - def test_to_datetime_format_YYYYMMDD(self): - s = Series([19801222, 19801222] + [19810105] * 5) - expected = Series([Timestamp(x) for x in s.apply(str)]) - - result = to_datetime(s, format='%Y%m%d') - assert_series_equal(result, expected) - - result = to_datetime(s.apply(str), format='%Y%m%d') - assert_series_equal(result, expected) - - # with NaT - expected = Series([Timestamp("19801222"), Timestamp("19801222")] + - [Timestamp("19810105")] * 5) - expected[2] = np.nan - s[2] = np.nan - - result = to_datetime(s, format='%Y%m%d') - assert_series_equal(result, expected) - - # string with NaT - s = s.apply(str) - s[2] = 'nat' - result = to_datetime(s, format='%Y%m%d') - assert_series_equal(result, expected) - - # coercion - # GH 7930 - s = Series([20121231, 20141231, 99991231]) - result = pd.to_datetime(s, format='%Y%m%d', errors='ignore') - expected = Series([datetime(2012, 12, 31), - datetime(2014, 12, 31), datetime(9999, 12, 31)], - dtype=object) - self.assert_series_equal(result, expected) - - result = pd.to_datetime(s, format='%Y%m%d', errors='coerce') - expected = Series(['20121231', '20141231', 'NaT'], dtype='M8[ns]') - assert_series_equal(result, expected) - - # GH 10178 - def test_to_datetime_format_integer(self): - s = Series([2000, 2001, 2002]) - expected = Series([Timestamp(x) for x in s.apply(str)]) - - result = to_datetime(s, format='%Y') - assert_series_equal(result, expected) - - s = Series([200001, 200105, 200206]) - expected = Series([Timestamp(x[:4] + '-' + x[4:]) for x in s.apply(str) - ]) - - result = to_datetime(s, format='%Y%m') - assert_series_equal(result, expected) - - def test_to_datetime_format_microsecond(self): - - # these are locale dependent - lang, _ = locale.getlocale() - month_abbr = calendar.month_abbr[4] - val = '01-{}-2011 00:00:01.978'.format(month_abbr) - - format = '%d-%b-%Y %H:%M:%S.%f' - result = to_datetime(val, format=format) - exp = datetime.strptime(val, format) - self.assertEqual(result, exp) - - def test_to_datetime_format_time(self): - data = [ - ['01/10/2010 15:20', '%m/%d/%Y %H:%M', - Timestamp('2010-01-10 15:20')], - 
-            ['01/10/2010 05:43', '%m/%d/%Y %I:%M',
-             Timestamp('2010-01-10 05:43')],
-            ['01/10/2010 13:56:01', '%m/%d/%Y %H:%M:%S',
-             Timestamp('2010-01-10 13:56:01')]  # ,
-            # ['01/10/2010 08:14 PM', '%m/%d/%Y %I:%M %p',
-            #  Timestamp('2010-01-10 20:14')],
-            # ['01/10/2010 07:40 AM', '%m/%d/%Y %I:%M %p',
-            #  Timestamp('2010-01-10 07:40')],
-            # ['01/10/2010 09:12:56 AM', '%m/%d/%Y %I:%M:%S %p',
-            #  Timestamp('2010-01-10 09:12:56')]
-        ]
-        for s, format, dt in data:
-            self.assertEqual(to_datetime(s, format=format), dt)
-
-    def test_to_datetime_with_non_exact(self):
-        # GH 10834
-        _skip_if_has_locale()
-
-        # 8904
-        # exact kw
-        if sys.version_info < (2, 7):
-            raise nose.SkipTest('on python version < 2.7')
-
-        s = Series(['19MAY11', 'foobar19MAY11', '19MAY11:00:00:00',
-                    '19MAY11 00:00:00Z'])
-        result = to_datetime(s, format='%d%b%y', exact=False)
-        expected = to_datetime(s.str.extract(r'(\d+\w+\d+)', expand=False),
-                               format='%d%b%y')
-        assert_series_equal(result, expected)
-
-    def test_parse_nanoseconds_with_formula(self):
-
-        # GH8989
-        # truncating the nanoseconds when a format was provided
-        for v in ["2012-01-01 09:00:00.000000001",
-                  "2012-01-01 09:00:00.000001",
-                  "2012-01-01 09:00:00.001",
-                  "2012-01-01 09:00:00.001000",
-                  "2012-01-01 09:00:00.001000000", ]:
-            expected = pd.to_datetime(v)
-            result = pd.to_datetime(v, format="%Y-%m-%d %H:%M:%S.%f")
-            self.assertEqual(result, expected)
-
-    def test_to_datetime_format_weeks(self):
-        data = [
-            ['2009324', '%Y%W%w', Timestamp('2009-08-13')],
-            ['2013020', '%Y%U%w', Timestamp('2013-01-13')]
-        ]
-        for s, format, dt in data:
-            self.assertEqual(to_datetime(s, format=format), dt)
-
-
-class TestToDatetimeInferFormat(tm.TestCase):
-
-    def test_to_datetime_infer_datetime_format_consistent_format(self):
-        s = pd.Series(pd.date_range('20000101', periods=50, freq='H'))
-
-        test_formats = ['%m-%d-%Y', '%m/%d/%Y %H:%M:%S.%f',
-                        '%Y-%m-%dT%H:%M:%S.%f']
-
-        for test_format in test_formats:
-            s_as_dt_strings = s.apply(lambda x: x.strftime(test_format))
-
-            with_format = pd.to_datetime(s_as_dt_strings, format=test_format)
-            no_infer = pd.to_datetime(s_as_dt_strings,
-                                      infer_datetime_format=False)
-            yes_infer = pd.to_datetime(s_as_dt_strings,
-                                       infer_datetime_format=True)
-
-            # Whether the format is explicitly passed, it is inferred, or
-            # it is not inferred, the results should all be the same
-            self.assert_series_equal(with_format, no_infer)
-            self.assert_series_equal(no_infer, yes_infer)
-
-    def test_to_datetime_infer_datetime_format_inconsistent_format(self):
-        s = pd.Series(np.array(['01/01/2011 00:00:00',
-                                '01-02-2011 00:00:00',
-                                '2011-01-03T00:00:00']))
-
-        # When the format is inconsistent, infer_datetime_format should just
-        # fall back to the default parsing
-        tm.assert_series_equal(pd.to_datetime(s, infer_datetime_format=False),
-                               pd.to_datetime(s, infer_datetime_format=True))
-
-        s = pd.Series(np.array(['Jan/01/2011', 'Feb/01/2011', 'Mar/01/2011']))
-
-        tm.assert_series_equal(pd.to_datetime(s, infer_datetime_format=False),
-                               pd.to_datetime(s, infer_datetime_format=True))
-
-    def test_to_datetime_infer_datetime_format_series_with_nans(self):
-        s = pd.Series(np.array(['01/01/2011 00:00:00', np.nan,
-                                '01/03/2011 00:00:00', np.nan]))
-        tm.assert_series_equal(pd.to_datetime(s, infer_datetime_format=False),
-                               pd.to_datetime(s, infer_datetime_format=True))
-
-    def test_to_datetime_infer_datetime_format_series_starting_with_nans(self):
-        s = pd.Series(np.array([np.nan, np.nan, '01/01/2011 00:00:00',
-                                '01/02/2011 00:00:00', '01/03/2011 00:00:00']))
-
tm.assert_series_equal(pd.to_datetime(s, infer_datetime_format=False), - pd.to_datetime(s, infer_datetime_format=True)) - - def test_to_datetime_iso8601_noleading_0s(self): - # GH 11871 - s = pd.Series(['2014-1-1', '2014-2-2', '2015-3-3']) - expected = pd.Series([pd.Timestamp('2014-01-01'), - pd.Timestamp('2014-02-02'), - pd.Timestamp('2015-03-03')]) - tm.assert_series_equal(pd.to_datetime(s), expected) - tm.assert_series_equal(pd.to_datetime(s, format='%Y-%m-%d'), expected) - - -class TestGuessDatetimeFormat(tm.TestCase): - - def test_guess_datetime_format_with_parseable_formats(self): - tm._skip_if_not_us_locale() - dt_string_to_format = (('20111230', '%Y%m%d'), - ('2011-12-30', '%Y-%m-%d'), - ('30-12-2011', '%d-%m-%Y'), - ('2011-12-30 00:00:00', '%Y-%m-%d %H:%M:%S'), - ('2011-12-30T00:00:00', '%Y-%m-%dT%H:%M:%S'), - ('2011-12-30 00:00:00.000000', - '%Y-%m-%d %H:%M:%S.%f'), ) - - for dt_string, dt_format in dt_string_to_format: - self.assertEqual( - tools._guess_datetime_format(dt_string), - dt_format - ) - - def test_guess_datetime_format_with_dayfirst(self): - ambiguous_string = '01/01/2011' - self.assertEqual( - tools._guess_datetime_format(ambiguous_string, dayfirst=True), - '%d/%m/%Y' - ) - self.assertEqual( - tools._guess_datetime_format(ambiguous_string, dayfirst=False), - '%m/%d/%Y' - ) - - def test_guess_datetime_format_with_locale_specific_formats(self): - # The month names will vary depending on the locale, in which - # case these wont be parsed properly (dateutil can't parse them) - _skip_if_has_locale() - - dt_string_to_format = (('30/Dec/2011', '%d/%b/%Y'), - ('30/December/2011', '%d/%B/%Y'), - ('30/Dec/2011 00:00:00', '%d/%b/%Y %H:%M:%S'), ) - - for dt_string, dt_format in dt_string_to_format: - self.assertEqual( - tools._guess_datetime_format(dt_string), - dt_format - ) - - def test_guess_datetime_format_invalid_inputs(self): - # A datetime string must include a year, month and a day for it - # to be guessable, in addition to being a string that looks like - # a datetime - invalid_dts = [ - '2013', - '01/2013', - '12:00:00', - '1/1/1/1', - 'this_is_not_a_datetime', - '51a', - 9, - datetime(2011, 1, 1), - ] - - for invalid_dt in invalid_dts: - self.assertTrue(tools._guess_datetime_format(invalid_dt) is None) - - def test_guess_datetime_format_nopadding(self): - # GH 11142 - dt_string_to_format = (('2011-1-1', '%Y-%m-%d'), - ('30-1-2011', '%d-%m-%Y'), - ('1/1/2011', '%m/%d/%Y'), - ('2011-1-1 00:00:00', '%Y-%m-%d %H:%M:%S'), - ('2011-1-1 0:0:0', '%Y-%m-%d %H:%M:%S'), - ('2011-1-3T00:00:0', '%Y-%m-%dT%H:%M:%S')) - - for dt_string, dt_format in dt_string_to_format: - self.assertEqual( - tools._guess_datetime_format(dt_string), - dt_format - ) - - def test_guess_datetime_format_for_array(self): - tm._skip_if_not_us_locale() - expected_format = '%Y-%m-%d %H:%M:%S.%f' - dt_string = datetime(2011, 12, 30, 0, 0, 0).strftime(expected_format) - - test_arrays = [ - np.array([dt_string, dt_string, dt_string], dtype='O'), - np.array([np.nan, np.nan, dt_string], dtype='O'), - np.array([dt_string, 'random_string'], dtype='O'), - ] - - for test_array in test_arrays: - self.assertEqual( - tools._guess_datetime_format_for_array(test_array), - expected_format - ) - - format_for_string_of_nans = tools._guess_datetime_format_for_array( - np.array( - [np.nan, np.nan, np.nan], dtype='O')) - self.assertTrue(format_for_string_of_nans is None) - - -class TestTimestampToJulianDate(tm.TestCase): - def test_compare_1700(self): - r = Timestamp('1700-06-23').to_julian_date() - self.assertEqual(r, 
2342145.5) - - def test_compare_2000(self): - r = Timestamp('2000-04-12').to_julian_date() - self.assertEqual(r, 2451646.5) - - def test_compare_2100(self): - r = Timestamp('2100-08-12').to_julian_date() - self.assertEqual(r, 2488292.5) - - def test_compare_hour01(self): - r = Timestamp('2000-08-12T01:00:00').to_julian_date() - self.assertEqual(r, 2451768.5416666666666666) - - def test_compare_hour13(self): - r = Timestamp('2000-08-12T13:00:00').to_julian_date() - self.assertEqual(r, 2451769.0416666666666666) - - -class TestDateTimeIndexToJulianDate(tm.TestCase): - def test_1700(self): - r1 = Float64Index([2345897.5, 2345898.5, 2345899.5, 2345900.5, - 2345901.5]) - r2 = date_range(start=Timestamp('1710-10-01'), periods=5, - freq='D').to_julian_date() - self.assertIsInstance(r2, Float64Index) - tm.assert_index_equal(r1, r2) - - def test_2000(self): - r1 = Float64Index([2451601.5, 2451602.5, 2451603.5, 2451604.5, - 2451605.5]) - r2 = date_range(start=Timestamp('2000-02-27'), periods=5, - freq='D').to_julian_date() - self.assertIsInstance(r2, Float64Index) - tm.assert_index_equal(r1, r2) - - def test_hour(self): - r1 = Float64Index( - [2451601.5, 2451601.5416666666666666, 2451601.5833333333333333, - 2451601.625, 2451601.6666666666666666]) - r2 = date_range(start=Timestamp('2000-02-27'), periods=5, - freq='H').to_julian_date() - self.assertIsInstance(r2, Float64Index) - tm.assert_index_equal(r1, r2) - - def test_minute(self): - r1 = Float64Index( - [2451601.5, 2451601.5006944444444444, 2451601.5013888888888888, - 2451601.5020833333333333, 2451601.5027777777777777]) - r2 = date_range(start=Timestamp('2000-02-27'), periods=5, - freq='T').to_julian_date() - self.assertIsInstance(r2, Float64Index) - tm.assert_index_equal(r1, r2) - - def test_second(self): - r1 = Float64Index( - [2451601.5, 2451601.500011574074074, 2451601.5000231481481481, - 2451601.5000347222222222, 2451601.5000462962962962]) - r2 = date_range(start=Timestamp('2000-02-27'), periods=5, - freq='S').to_julian_date() - self.assertIsInstance(r2, Float64Index) - tm.assert_index_equal(r1, r2) - - -class TestDaysInMonth(tm.TestCase): - # tests for issue #10154 - def test_day_not_in_month_coerce(self): - self.assertTrue(isnull(to_datetime('2015-02-29', errors='coerce'))) - self.assertTrue(isnull(to_datetime('2015-02-29', format="%Y-%m-%d", - errors='coerce'))) - self.assertTrue(isnull(to_datetime('2015-02-32', format="%Y-%m-%d", - errors='coerce'))) - self.assertTrue(isnull(to_datetime('2015-04-31', format="%Y-%m-%d", - errors='coerce'))) - - def test_day_not_in_month_raise(self): - self.assertRaises(ValueError, to_datetime, '2015-02-29', - errors='raise') - self.assertRaises(ValueError, to_datetime, '2015-02-29', - errors='raise', format="%Y-%m-%d") - self.assertRaises(ValueError, to_datetime, '2015-02-32', - errors='raise', format="%Y-%m-%d") - self.assertRaises(ValueError, to_datetime, '2015-04-31', - errors='raise', format="%Y-%m-%d") - - def test_day_not_in_month_ignore(self): - self.assertEqual(to_datetime( - '2015-02-29', errors='ignore'), '2015-02-29') - self.assertEqual(to_datetime( - '2015-02-29', errors='ignore', format="%Y-%m-%d"), '2015-02-29') - self.assertEqual(to_datetime( - '2015-02-32', errors='ignore', format="%Y-%m-%d"), '2015-02-32') - self.assertEqual(to_datetime( - '2015-04-31', errors='ignore', format="%Y-%m-%d"), '2015-04-31') - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tseries/tests/test_timeseries_legacy.py 
b/pandas/tseries/tests/test_timeseries_legacy.py deleted file mode 100644 index d8c01c53fb2e5..0000000000000 --- a/pandas/tseries/tests/test_timeseries_legacy.py +++ /dev/null @@ -1,226 +0,0 @@ -# pylint: disable-msg=E1101,W0612 -from datetime import datetime -import sys -import os -import nose -import numpy as np - -from pandas import (Index, Series, date_range, Timestamp, - DatetimeIndex, Int64Index, to_datetime) - -from pandas.tseries.frequencies import get_offset, to_offset -from pandas.tseries.offsets import BDay, Micro, Milli, MonthBegin -import pandas as pd - -from pandas.util.testing import assert_series_equal, assert_almost_equal -import pandas.util.testing as tm - -from pandas.compat import StringIO, cPickle as pickle -from pandas import read_pickle -from numpy.random import rand -import pandas.compat as compat - -randn = np.random.randn - - -# Unfortunately, too much has changed to handle these legacy pickles -# class TestLegacySupport(unittest.TestCase): -class LegacySupport(object): - - _multiprocess_can_split_ = True - - @classmethod - def setUpClass(cls): - if compat.PY3: - raise nose.SkipTest("not compatible with Python >= 3") - - pth, _ = os.path.split(os.path.abspath(__file__)) - filepath = os.path.join(pth, 'data', 'frame.pickle') - - with open(filepath, 'rb') as f: - cls.frame = pickle.load(f) - - filepath = os.path.join(pth, 'data', 'series.pickle') - with open(filepath, 'rb') as f: - cls.series = pickle.load(f) - - def test_pass_offset_warn(self): - buf = StringIO() - - sys.stderr = buf - DatetimeIndex(start='1/1/2000', periods=10, offset='H') - sys.stderr = sys.__stderr__ - - def test_unpickle_legacy_frame(self): - dtindex = DatetimeIndex(start='1/3/2005', end='1/14/2005', - freq=BDay(1)) - - unpickled = self.frame - - self.assertEqual(type(unpickled.index), DatetimeIndex) - self.assertEqual(len(unpickled), 10) - self.assertTrue((unpickled.columns == Int64Index(np.arange(5))).all()) - self.assertTrue((unpickled.index == dtindex).all()) - self.assertEqual(unpickled.index.offset, BDay(1, normalize=True)) - - def test_unpickle_legacy_series(self): - unpickled = self.series - - dtindex = DatetimeIndex(start='1/3/2005', end='1/14/2005', - freq=BDay(1)) - - self.assertEqual(type(unpickled.index), DatetimeIndex) - self.assertEqual(len(unpickled), 10) - self.assertTrue((unpickled.index == dtindex).all()) - self.assertEqual(unpickled.index.offset, BDay(1, normalize=True)) - - def test_unpickle_legacy_len0_daterange(self): - pth, _ = os.path.split(os.path.abspath(__file__)) - filepath = os.path.join(pth, 'data', 'series_daterange0.pickle') - - result = pd.read_pickle(filepath) - - ex_index = DatetimeIndex([], freq='B') - - self.assert_index_equal(result.index, ex_index) - tm.assertIsInstance(result.index.freq, BDay) - self.assertEqual(len(result), 0) - - def test_arithmetic_interaction(self): - index = self.frame.index - obj_index = index.asobject - - dseries = Series(rand(len(index)), index=index) - oseries = Series(dseries.values, index=obj_index) - - result = dseries + oseries - expected = dseries * 2 - tm.assertIsInstance(result.index, DatetimeIndex) - assert_series_equal(result, expected) - - result = dseries + oseries[:5] - expected = dseries + dseries[:5] - tm.assertIsInstance(result.index, DatetimeIndex) - assert_series_equal(result, expected) - - def test_join_interaction(self): - index = self.frame.index - obj_index = index.asobject - - def _check_join(left, right, how='inner'): - ra, rb, rc = left.join(right, how=how, return_indexers=True) - ea, eb, ec = 
left.join(DatetimeIndex(right), how=how, - return_indexers=True) - - tm.assertIsInstance(ra, DatetimeIndex) - self.assert_index_equal(ra, ea) - - assert_almost_equal(rb, eb) - assert_almost_equal(rc, ec) - - _check_join(index[:15], obj_index[5:], how='inner') - _check_join(index[:15], obj_index[5:], how='outer') - _check_join(index[:15], obj_index[5:], how='right') - _check_join(index[:15], obj_index[5:], how='left') - - def test_join_nonunique(self): - idx1 = to_datetime(['2012-11-06 16:00:11.477563', - '2012-11-06 16:00:11.477563']) - idx2 = to_datetime(['2012-11-06 15:11:09.006507', - '2012-11-06 15:11:09.006507']) - rs = idx1.join(idx2, how='outer') - self.assertTrue(rs.is_monotonic) - - def test_unpickle_daterange(self): - pth, _ = os.path.split(os.path.abspath(__file__)) - filepath = os.path.join(pth, 'data', 'daterange_073.pickle') - - rng = read_pickle(filepath) - tm.assertIsInstance(rng[0], datetime) - tm.assertIsInstance(rng.offset, BDay) - self.assertEqual(rng.values.dtype, object) - - def test_setops(self): - index = self.frame.index - obj_index = index.asobject - - result = index[:5].union(obj_index[5:]) - expected = index - tm.assertIsInstance(result, DatetimeIndex) - self.assert_index_equal(result, expected) - - result = index[:10].intersection(obj_index[5:]) - expected = index[5:10] - tm.assertIsInstance(result, DatetimeIndex) - self.assert_index_equal(result, expected) - - result = index[:10] - obj_index[5:] - expected = index[:5] - tm.assertIsInstance(result, DatetimeIndex) - self.assert_index_equal(result, expected) - - def test_index_conversion(self): - index = self.frame.index - obj_index = index.asobject - - conv = DatetimeIndex(obj_index) - self.assert_index_equal(conv, index) - - self.assertRaises(ValueError, DatetimeIndex, ['a', 'b', 'c', 'd']) - - def test_tolist(self): - rng = date_range('1/1/2000', periods=10) - - result = rng.tolist() - tm.assertIsInstance(result[0], Timestamp) - - def test_object_convert_fail(self): - idx = DatetimeIndex([np.NaT]) - self.assertRaises(ValueError, idx.astype, 'O') - - def test_setops_conversion_fail(self): - index = self.frame.index - - right = Index(['a', 'b', 'c', 'd']) - - result = index.union(right) - expected = Index(np.concatenate([index.asobject, right])) - self.assert_index_equal(result, expected) - - result = index.intersection(right) - expected = Index([]) - self.assert_index_equal(result, expected) - - def test_legacy_time_rules(self): - rules = [('WEEKDAY', 'B'), ('EOM', 'BM'), ('W@MON', 'W-MON'), - ('W@TUE', 'W-TUE'), ('W@WED', 'W-WED'), ('W@THU', 'W-THU'), - ('W@FRI', 'W-FRI'), ('Q@JAN', 'BQ-JAN'), ('Q@FEB', 'BQ-FEB'), - ('Q@MAR', 'BQ-MAR'), ('A@JAN', 'BA-JAN'), ('A@FEB', 'BA-FEB'), - ('A@MAR', 'BA-MAR'), ('A@APR', 'BA-APR'), ('A@MAY', 'BA-MAY'), - ('A@JUN', 'BA-JUN'), ('A@JUL', 'BA-JUL'), ('A@AUG', 'BA-AUG'), - ('A@SEP', 'BA-SEP'), ('A@OCT', 'BA-OCT'), ('A@NOV', 'BA-NOV'), - ('A@DEC', 'BA-DEC'), ('WOM@1FRI', 'WOM-1FRI'), - ('WOM@2FRI', 'WOM-2FRI'), ('WOM@3FRI', 'WOM-3FRI'), - ('WOM@4FRI', 'WOM-4FRI')] - - start, end = '1/1/2000', '1/1/2010' - - for old_freq, new_freq in rules: - old_rng = date_range(start, end, freq=old_freq) - new_rng = date_range(start, end, freq=new_freq) - self.assert_index_equal(old_rng, new_rng) - - def test_ms_vs_MS(self): - left = get_offset('ms') - right = get_offset('MS') - self.assertEqual(left, Milli()) - self.assertEqual(right, MonthBegin()) - - def test_rule_aliases(self): - rule = to_offset('10us') - self.assertEqual(rule, Micro(10)) - - -if __name__ == '__main__': - 
nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tseries/tests/test_tslib.py b/pandas/tseries/tests/test_tslib.py deleted file mode 100644 index 58ec1561b2535..0000000000000 --- a/pandas/tseries/tests/test_tslib.py +++ /dev/null @@ -1,1546 +0,0 @@ -import nose -from distutils.version import LooseVersion -import numpy as np - -from pandas import tslib, lib -import pandas._period as period -import datetime - -import pandas as pd -from pandas.core.api import (Timestamp, Index, Series, Timedelta, Period, - to_datetime) -from pandas.tslib import get_timezone -from pandas._period import period_asfreq, period_ordinal -from pandas.tseries.index import date_range, DatetimeIndex -from pandas.tseries.frequencies import ( - get_freq, - RESO_US, RESO_MS, RESO_SEC, RESO_HR, RESO_DAY, RESO_MIN -) -import pandas.tseries.tools as tools -import pandas.tseries.offsets as offsets -import pandas.util.testing as tm -import pandas.compat as compat -from pandas.compat.numpy import (np_datetime64_compat, - np_array_datetime64_compat) - -from pandas.util.testing import assert_series_equal, _skip_if_has_locale - - -class TestTsUtil(tm.TestCase): - - def test_try_parse_dates(self): - from dateutil.parser import parse - arr = np.array(['5/1/2000', '6/1/2000', '7/1/2000'], dtype=object) - - result = lib.try_parse_dates(arr, dayfirst=True) - expected = [parse(d, dayfirst=True) for d in arr] - self.assertTrue(np.array_equal(result, expected)) - - def test_min_valid(self): - # Ensure that Timestamp.min is a valid Timestamp - Timestamp(Timestamp.min) - - def test_max_valid(self): - # Ensure that Timestamp.max is a valid Timestamp - Timestamp(Timestamp.max) - - def test_to_datetime_bijective(self): - # Ensure that converting to datetime and back only loses precision - # by going from nanoseconds to microseconds. 
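# --- editor's sketch, not part of the original patch ---
# A public-API illustration of the comment above: datetime.datetime only
# stores microseconds, so the round trip through to_pydatetime() is exact
# only down to microsecond resolution (nanoseconds are discarded, not rounded).
import pandas as pd

ts_max = pd.Timestamp.max                            # carries a nonzero nanosecond part
roundtripped = pd.Timestamp(ts_max.to_pydatetime())  # may emit a UserWarning: ns discarded
assert roundtripped.value // 1000 == ts_max.value // 1000   # equal at us resolution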
- exp_warning = None if Timestamp.max.nanosecond == 0 else UserWarning - with tm.assert_produces_warning(exp_warning, check_stacklevel=False): - self.assertEqual( - Timestamp(Timestamp.max.to_pydatetime()).value / 1000, - Timestamp.max.value / 1000) - - exp_warning = None if Timestamp.min.nanosecond == 0 else UserWarning - with tm.assert_produces_warning(exp_warning, check_stacklevel=False): - self.assertEqual( - Timestamp(Timestamp.min.to_pydatetime()).value / 1000, - Timestamp.min.value / 1000) - - -class TestTimestamp(tm.TestCase): - - def test_constructor(self): - base_str = '2014-07-01 09:00' - base_dt = datetime.datetime(2014, 7, 1, 9) - base_expected = 1404205200000000000 - - # confirm base representation is correct - import calendar - self.assertEqual(calendar.timegm(base_dt.timetuple()) * 1000000000, - base_expected) - - tests = [(base_str, base_dt, base_expected), - ('2014-07-01 10:00', datetime.datetime(2014, 7, 1, 10), - base_expected + 3600 * 1000000000), - ('2014-07-01 09:00:00.000008000', - datetime.datetime(2014, 7, 1, 9, 0, 0, 8), - base_expected + 8000), - ('2014-07-01 09:00:00.000000005', - Timestamp('2014-07-01 09:00:00.000000005'), - base_expected + 5)] - - tm._skip_if_no_pytz() - tm._skip_if_no_dateutil() - import pytz - import dateutil - timezones = [(None, 0), ('UTC', 0), (pytz.utc, 0), ('Asia/Tokyo', 9), - ('US/Eastern', -4), ('dateutil/US/Pacific', -7), - (pytz.FixedOffset(-180), -3), - (dateutil.tz.tzoffset(None, 18000), 5)] - - for date_str, date, expected in tests: - for result in [Timestamp(date_str), Timestamp(date)]: - # only with timestring - self.assertEqual(result.value, expected) - self.assertEqual(tslib.pydt_to_i8(result), expected) - - # re-creation shouldn't affect the internal value - result = Timestamp(result) - self.assertEqual(result.value, expected) - self.assertEqual(tslib.pydt_to_i8(result), expected) - - # with timezone - for tz, offset in timezones: - for result in [Timestamp(date_str, tz=tz), Timestamp(date, - tz=tz)]: - expected_tz = expected - offset * 3600 * 1000000000 - self.assertEqual(result.value, expected_tz) - self.assertEqual(tslib.pydt_to_i8(result), expected_tz) - - # should preserve tz - result = Timestamp(result) - self.assertEqual(result.value, expected_tz) - self.assertEqual(tslib.pydt_to_i8(result), expected_tz) - - # should convert to UTC - result = Timestamp(result, tz='UTC') - expected_utc = expected - offset * 3600 * 1000000000 - self.assertEqual(result.value, expected_utc) - self.assertEqual(tslib.pydt_to_i8(result), expected_utc) - - def test_constructor_with_stringoffset(self): - # GH 7833 - base_str = '2014-07-01 11:00:00+02:00' - base_dt = datetime.datetime(2014, 7, 1, 9) - base_expected = 1404205200000000000 - - # confirm base representation is correct - import calendar - self.assertEqual(calendar.timegm(base_dt.timetuple()) * 1000000000, - base_expected) - - tests = [(base_str, base_expected), - ('2014-07-01 12:00:00+02:00', - base_expected + 3600 * 1000000000), - ('2014-07-01 11:00:00.000008000+02:00', base_expected + 8000), - ('2014-07-01 11:00:00.000000005+02:00', base_expected + 5)] - - tm._skip_if_no_pytz() - tm._skip_if_no_dateutil() - import pytz - import dateutil - timezones = [(None, 0), ('UTC', 0), (pytz.utc, 0), ('Asia/Tokyo', 9), - ('US/Eastern', -4), ('dateutil/US/Pacific', -7), - (pytz.FixedOffset(-180), -3), - (dateutil.tz.tzoffset(None, 18000), 5)] - - for date_str, expected in tests: - for result in [Timestamp(date_str)]: - # only with timestring - self.assertEqual(result.value, expected) - 
self.assertEqual(tslib.pydt_to_i8(result), expected) - - # re-creation shouldn't affect the internal value - result = Timestamp(result) - self.assertEqual(result.value, expected) - self.assertEqual(tslib.pydt_to_i8(result), expected) - - # with timezone - for tz, offset in timezones: - result = Timestamp(date_str, tz=tz) - expected_tz = expected - self.assertEqual(result.value, expected_tz) - self.assertEqual(tslib.pydt_to_i8(result), expected_tz) - - # should preserve tz - result = Timestamp(result) - self.assertEqual(result.value, expected_tz) - self.assertEqual(tslib.pydt_to_i8(result), expected_tz) - - # should convert to UTC - result = Timestamp(result, tz='UTC') - expected_utc = expected - self.assertEqual(result.value, expected_utc) - self.assertEqual(tslib.pydt_to_i8(result), expected_utc) - - # This should be 2013-11-01 05:00 in UTC - # converted to Chicago tz - result = Timestamp('2013-11-01 00:00:00-0500', tz='America/Chicago') - self.assertEqual(result.value, Timestamp('2013-11-01 05:00').value) - expected = "Timestamp('2013-11-01 00:00:00-0500', tz='America/Chicago')" # noqa - self.assertEqual(repr(result), expected) - self.assertEqual(result, eval(repr(result))) - - # This should be 2013-11-01 05:00 in UTC - # converted to Tokyo tz (+09:00) - result = Timestamp('2013-11-01 00:00:00-0500', tz='Asia/Tokyo') - self.assertEqual(result.value, Timestamp('2013-11-01 05:00').value) - expected = "Timestamp('2013-11-01 14:00:00+0900', tz='Asia/Tokyo')" - self.assertEqual(repr(result), expected) - self.assertEqual(result, eval(repr(result))) - - # GH11708 - # This should be 2015-11-18 10:00 in UTC - # converted to Asia/Katmandu - result = Timestamp("2015-11-18 15:45:00+05:45", tz="Asia/Katmandu") - self.assertEqual(result.value, Timestamp("2015-11-18 10:00").value) - expected = "Timestamp('2015-11-18 15:45:00+0545', tz='Asia/Katmandu')" - self.assertEqual(repr(result), expected) - self.assertEqual(result, eval(repr(result))) - - # This should be 2015-11-18 10:00 in UTC - # converted to Asia/Kolkata - result = Timestamp("2015-11-18 15:30:00+05:30", tz="Asia/Kolkata") - self.assertEqual(result.value, Timestamp("2015-11-18 10:00").value) - expected = "Timestamp('2015-11-18 15:30:00+0530', tz='Asia/Kolkata')" - self.assertEqual(repr(result), expected) - self.assertEqual(result, eval(repr(result))) - - def test_constructor_invalid(self): - with tm.assertRaisesRegexp(TypeError, 'Cannot convert input'): - Timestamp(slice(2)) - with tm.assertRaisesRegexp(ValueError, 'Cannot convert Period'): - Timestamp(Period('1000-01-01')) - - def test_constructor_positional(self): - # GH 10758 - with tm.assertRaises(TypeError): - Timestamp(2000, 1) - with tm.assertRaises(ValueError): - Timestamp(2000, 0, 1) - with tm.assertRaises(ValueError): - Timestamp(2000, 13, 1) - with tm.assertRaises(ValueError): - Timestamp(2000, 1, 0) - with tm.assertRaises(ValueError): - Timestamp(2000, 1, 32) - - # GH 11630 - self.assertEqual( - repr(Timestamp(2015, 11, 12)), - repr(Timestamp('20151112'))) - - self.assertEqual( - repr(Timestamp(2015, 11, 12, 1, 2, 3, 999999)), - repr(Timestamp('2015-11-12 01:02:03.999999'))) - - self.assertIs(Timestamp(None), pd.NaT) - - def test_constructor_keyword(self): - # GH 10758 - with tm.assertRaises(TypeError): - Timestamp(year=2000, month=1) - with tm.assertRaises(ValueError): - Timestamp(year=2000, month=0, day=1) - with tm.assertRaises(ValueError): - Timestamp(year=2000, month=13, day=1) - with tm.assertRaises(ValueError): - Timestamp(year=2000, month=1, day=0) - with 
tm.assertRaises(ValueError): - Timestamp(year=2000, month=1, day=32) - - self.assertEqual( - repr(Timestamp(year=2015, month=11, day=12)), - repr(Timestamp('20151112'))) - - self.assertEqual( - repr(Timestamp(year=2015, month=11, day=12, - hour=1, minute=2, second=3, microsecond=999999)), - repr(Timestamp('2015-11-12 01:02:03.999999'))) - - def test_constructor_fromordinal(self): - base = datetime.datetime(2000, 1, 1) - - ts = Timestamp.fromordinal(base.toordinal(), freq='D') - self.assertEqual(base, ts) - self.assertEqual(ts.freq, 'D') - self.assertEqual(base.toordinal(), ts.toordinal()) - - ts = Timestamp.fromordinal(base.toordinal(), tz='US/Eastern') - self.assertEqual(pd.Timestamp('2000-01-01', tz='US/Eastern'), ts) - self.assertEqual(base.toordinal(), ts.toordinal()) - - def test_constructor_offset_depr(self): - # GH 12160 - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - ts = Timestamp('2011-01-01', offset='D') - self.assertEqual(ts.freq, 'D') - - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - self.assertEqual(ts.offset, 'D') - - msg = "Can only specify freq or offset, not both" - with tm.assertRaisesRegexp(TypeError, msg): - Timestamp('2011-01-01', offset='D', freq='D') - - def test_constructor_offset_depr_fromordinal(self): - # GH 12160 - base = datetime.datetime(2000, 1, 1) - - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - ts = Timestamp.fromordinal(base.toordinal(), offset='D') - self.assertEqual(pd.Timestamp('2000-01-01'), ts) - self.assertEqual(ts.freq, 'D') - self.assertEqual(base.toordinal(), ts.toordinal()) - - msg = "Can only specify freq or offset, not both" - with tm.assertRaisesRegexp(TypeError, msg): - Timestamp.fromordinal(base.toordinal(), offset='D', freq='D') - - def test_conversion(self): - # GH 9255 - ts = Timestamp('2000-01-01') - - result = ts.to_pydatetime() - expected = datetime.datetime(2000, 1, 1) - self.assertEqual(result, expected) - self.assertEqual(type(result), type(expected)) - - result = ts.to_datetime64() - expected = np.datetime64(ts.value, 'ns') - self.assertEqual(result, expected) - self.assertEqual(type(result), type(expected)) - self.assertEqual(result.dtype, expected.dtype) - - def test_repr(self): - tm._skip_if_no_pytz() - tm._skip_if_no_dateutil() - - dates = ['2014-03-07', '2014-01-01 09:00', - '2014-01-01 00:00:00.000000001'] - - # dateutil zone change (only matters for repr) - import dateutil - if (dateutil.__version__ >= LooseVersion('2.3') and - (dateutil.__version__ <= LooseVersion('2.4.0') or - dateutil.__version__ >= LooseVersion('2.6.0'))): - timezones = ['UTC', 'Asia/Tokyo', 'US/Eastern', - 'dateutil/US/Pacific'] - else: - timezones = ['UTC', 'Asia/Tokyo', 'US/Eastern', - 'dateutil/America/Los_Angeles'] - - freqs = ['D', 'M', 'S', 'N'] - - for date in dates: - for tz in timezones: - for freq in freqs: - - # avoid matching the timezone name - freq_repr = "'{0}'".format(freq) - if tz.startswith('dateutil'): - tz_repr = tz.replace('dateutil', '') - else: - tz_repr = tz - - date_only = Timestamp(date) - self.assertIn(date, repr(date_only)) - self.assertNotIn(tz_repr, repr(date_only)) - self.assertNotIn(freq_repr, repr(date_only)) - self.assertEqual(date_only, eval(repr(date_only))) - - date_tz = Timestamp(date, tz=tz) - self.assertIn(date, repr(date_tz)) - self.assertIn(tz_repr, repr(date_tz)) - self.assertNotIn(freq_repr, repr(date_tz)) - self.assertEqual(date_tz, eval(repr(date_tz))) - - date_freq = Timestamp(date, freq=freq) - self.assertIn(date, 
repr(date_freq)) - self.assertNotIn(tz_repr, repr(date_freq)) - self.assertIn(freq_repr, repr(date_freq)) - self.assertEqual(date_freq, eval(repr(date_freq))) - - date_tz_freq = Timestamp(date, tz=tz, freq=freq) - self.assertIn(date, repr(date_tz_freq)) - self.assertIn(tz_repr, repr(date_tz_freq)) - self.assertIn(freq_repr, repr(date_tz_freq)) - self.assertEqual(date_tz_freq, eval(repr(date_tz_freq))) - - # this can cause the tz field to be populated, but it's redundant to - # information in the datestring - tm._skip_if_no_pytz() - import pytz # noqa - date_with_utc_offset = Timestamp('2014-03-13 00:00:00-0400', tz=None) - self.assertIn('2014-03-13 00:00:00-0400', repr(date_with_utc_offset)) - self.assertNotIn('tzoffset', repr(date_with_utc_offset)) - self.assertIn('pytz.FixedOffset(-240)', repr(date_with_utc_offset)) - expr = repr(date_with_utc_offset).replace("'pytz.FixedOffset(-240)'", - 'pytz.FixedOffset(-240)') - self.assertEqual(date_with_utc_offset, eval(expr)) - - def test_bounds_with_different_units(self): - out_of_bounds_dates = ('1677-09-21', '2262-04-12', ) - - time_units = ('D', 'h', 'm', 's', 'ms', 'us') - - for date_string in out_of_bounds_dates: - for unit in time_units: - self.assertRaises(ValueError, Timestamp, np.datetime64( - date_string, dtype='M8[%s]' % unit)) - - in_bounds_dates = ('1677-09-23', '2262-04-11', ) - - for date_string in in_bounds_dates: - for unit in time_units: - Timestamp(np.datetime64(date_string, dtype='M8[%s]' % unit)) - - def test_tz(self): - t = '2014-02-01 09:00' - ts = Timestamp(t) - local = ts.tz_localize('Asia/Tokyo') - self.assertEqual(local.hour, 9) - self.assertEqual(local, Timestamp(t, tz='Asia/Tokyo')) - conv = local.tz_convert('US/Eastern') - self.assertEqual(conv, Timestamp('2014-01-31 19:00', tz='US/Eastern')) - self.assertEqual(conv.hour, 19) - - # preserves nanosecond - ts = Timestamp(t) + offsets.Nano(5) - local = ts.tz_localize('Asia/Tokyo') - self.assertEqual(local.hour, 9) - self.assertEqual(local.nanosecond, 5) - conv = local.tz_convert('US/Eastern') - self.assertEqual(conv.nanosecond, 5) - self.assertEqual(conv.hour, 19) - - def test_tz_localize_ambiguous(self): - - ts = Timestamp('2014-11-02 01:00') - ts_dst = ts.tz_localize('US/Eastern', ambiguous=True) - ts_no_dst = ts.tz_localize('US/Eastern', ambiguous=False) - - rng = date_range('2014-11-02', periods=3, freq='H', tz='US/Eastern') - self.assertEqual(rng[1], ts_dst) - self.assertEqual(rng[2], ts_no_dst) - self.assertRaises(ValueError, ts.tz_localize, 'US/Eastern', - ambiguous='infer') - - # GH 8025 - with tm.assertRaisesRegexp(TypeError, - 'Cannot localize tz-aware Timestamp, use ' - 'tz_convert for conversions'): - Timestamp('2011-01-01', tz='US/Eastern').tz_localize('Asia/Tokyo') - - with tm.assertRaisesRegexp(TypeError, - 'Cannot convert tz-naive Timestamp, use ' - 'tz_localize to localize'): - Timestamp('2011-01-01').tz_convert('Asia/Tokyo') - - def test_tz_localize_nonexistent(self): - # See issue 13057 - from pytz.exceptions import NonExistentTimeError - times = ['2015-03-08 02:00', '2015-03-08 02:30', - '2015-03-29 02:00', '2015-03-29 02:30'] - timezones = ['US/Eastern', 'US/Pacific', - 'Europe/Paris', 'Europe/Belgrade'] - for t, tz in zip(times, timezones): - ts = Timestamp(t) - self.assertRaises(NonExistentTimeError, ts.tz_localize, - tz) - self.assertRaises(NonExistentTimeError, ts.tz_localize, - tz, errors='raise') - self.assertIs(ts.tz_localize(tz, errors='coerce'), - pd.NaT) - - def test_tz_localize_errors_ambiguous(self): - # See issue 13057 - from 
pytz.exceptions import AmbiguousTimeError - ts = pd.Timestamp('2015-11-1 01:00') - self.assertRaises(AmbiguousTimeError, - ts.tz_localize, 'US/Pacific', errors='coerce') - - def test_tz_localize_roundtrip(self): - for tz in ['UTC', 'Asia/Tokyo', 'US/Eastern', 'dateutil/US/Pacific']: - for t in ['2014-02-01 09:00', '2014-07-08 09:00', - '2014-11-01 17:00', '2014-11-05 00:00']: - ts = Timestamp(t) - localized = ts.tz_localize(tz) - self.assertEqual(localized, Timestamp(t, tz=tz)) - - with tm.assertRaises(TypeError): - localized.tz_localize(tz) - - reset = localized.tz_localize(None) - self.assertEqual(reset, ts) - self.assertTrue(reset.tzinfo is None) - - def test_tz_convert_roundtrip(self): - for tz in ['UTC', 'Asia/Tokyo', 'US/Eastern', 'dateutil/US/Pacific']: - for t in ['2014-02-01 09:00', '2014-07-08 09:00', - '2014-11-01 17:00', '2014-11-05 00:00']: - ts = Timestamp(t, tz='UTC') - converted = ts.tz_convert(tz) - - reset = converted.tz_convert(None) - self.assertEqual(reset, Timestamp(t)) - self.assertTrue(reset.tzinfo is None) - self.assertEqual(reset, - converted.tz_convert('UTC').tz_localize(None)) - - def test_barely_oob_dts(self): - one_us = np.timedelta64(1).astype('timedelta64[us]') - - # By definition we can't go out of bounds in [ns], so we - # convert the datetime64s to [us] so we can go out of bounds - min_ts_us = np.datetime64(Timestamp.min).astype('M8[us]') - max_ts_us = np.datetime64(Timestamp.max).astype('M8[us]') - - # No error for the min/max datetimes - Timestamp(min_ts_us) - Timestamp(max_ts_us) - - # One us less than the minimum is an error - self.assertRaises(ValueError, Timestamp, min_ts_us - one_us) - - # One us more than the maximum is an error - self.assertRaises(ValueError, Timestamp, max_ts_us + one_us) - - def test_utc_z_designator(self): - self.assertEqual(get_timezone( - Timestamp('2014-11-02 01:00Z').tzinfo), 'UTC') - - def test_now(self): - # #9000 - ts_from_string = Timestamp('now') - ts_from_method = Timestamp.now() - ts_datetime = datetime.datetime.now() - - ts_from_string_tz = Timestamp('now', tz='US/Eastern') - ts_from_method_tz = Timestamp.now(tz='US/Eastern') - - # Check that the delta between the times is less than 1s (arbitrarily - # small) - delta = Timedelta(seconds=1) - self.assertTrue(abs(ts_from_method - ts_from_string) < delta) - self.assertTrue(abs(ts_datetime - ts_from_method) < delta) - self.assertTrue(abs(ts_from_method_tz - ts_from_string_tz) < delta) - self.assertTrue(abs(ts_from_string_tz.tz_localize(None) - - ts_from_method_tz.tz_localize(None)) < delta) - - def test_today(self): - - ts_from_string = Timestamp('today') - ts_from_method = Timestamp.today() - ts_datetime = datetime.datetime.today() - - ts_from_string_tz = Timestamp('today', tz='US/Eastern') - ts_from_method_tz = Timestamp.today(tz='US/Eastern') - - # Check that the delta between the times is less than 1s (arbitrarily - # small) - delta = Timedelta(seconds=1) - self.assertTrue(abs(ts_from_method - ts_from_string) < delta) - self.assertTrue(abs(ts_datetime - ts_from_method) < delta) - self.assertTrue(abs(ts_from_method_tz - ts_from_string_tz) < delta) - self.assertTrue(abs(ts_from_string_tz.tz_localize(None) - - ts_from_method_tz.tz_localize(None)) < delta) - - def test_asm8(self): - np.random.seed(7960929) - ns = [Timestamp.min.value, Timestamp.max.value, 1000, ] - for n in ns: - self.assertEqual(Timestamp(n).asm8.view('i8'), - np.datetime64(n, 'ns').view('i8'), n) - self.assertEqual(Timestamp('nat').asm8.view('i8'), - np.datetime64('nat', 'ns').view('i8')) - - def 
test_fields(self): - def check(value, equal): - # check that the value is int/long-like - self.assertTrue(isinstance(value, (int, compat.long))) - self.assertEqual(value, equal) - - # GH 10050 - ts = Timestamp('2015-05-10 09:06:03.000100001') - check(ts.year, 2015) - check(ts.month, 5) - check(ts.day, 10) - check(ts.hour, 9) - check(ts.minute, 6) - check(ts.second, 3) - self.assertRaises(AttributeError, lambda: ts.millisecond) - check(ts.microsecond, 100) - check(ts.nanosecond, 1) - check(ts.dayofweek, 6) - check(ts.quarter, 2) - check(ts.dayofyear, 130) - check(ts.week, 19) - check(ts.daysinmonth, 31) - check(ts.days_in_month, 31) - - def test_nat_fields(self): - # GH 10050 - ts = Timestamp('NaT') - self.assertTrue(np.isnan(ts.year)) - self.assertTrue(np.isnan(ts.month)) - self.assertTrue(np.isnan(ts.day)) - self.assertTrue(np.isnan(ts.hour)) - self.assertTrue(np.isnan(ts.minute)) - self.assertTrue(np.isnan(ts.second)) - self.assertTrue(np.isnan(ts.microsecond)) - self.assertTrue(np.isnan(ts.nanosecond)) - self.assertTrue(np.isnan(ts.dayofweek)) - self.assertTrue(np.isnan(ts.quarter)) - self.assertTrue(np.isnan(ts.dayofyear)) - self.assertTrue(np.isnan(ts.week)) - self.assertTrue(np.isnan(ts.daysinmonth)) - self.assertTrue(np.isnan(ts.days_in_month)) - - def test_pprint(self): - # GH12622 - import pprint - nested_obj = {'foo': 1, - 'bar': [{'w': {'a': Timestamp('2011-01-01')}}] * 10} - result = pprint.pformat(nested_obj, width=50) - expected = r"""{'bar': [{'w': {'a': Timestamp('2011-01-01 00:00:00')}}, - {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, - {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, - {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, - {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, - {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, - {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, - {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, - {'w': {'a': Timestamp('2011-01-01 00:00:00')}}, - {'w': {'a': Timestamp('2011-01-01 00:00:00')}}], - 'foo': 1}""" - self.assertEqual(result, expected) - - def test_to_datetime_depr(self): - # see gh-8254 - ts = Timestamp('2011-01-01') - - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - expected = datetime.datetime(2011, 1, 1) - result = ts.to_datetime() - self.assertEqual(result, expected) - - def test_to_pydatetime_nonzero_nano(self): - ts = Timestamp('2011-01-01 9:00:00.123456789') - - # Warn the user of data loss (nanoseconds). 
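# --- editor's sketch, not part of the original patch ---
# What the warning above guards: the sub-microsecond part of a Timestamp
# cannot survive conversion to datetime.datetime.
import pandas as pd

ts = pd.Timestamp('2011-01-01 09:00:00.123456789')
assert ts.nanosecond == 789                        # ns component beyond microseconds
assert ts.to_pydatetime().microsecond == 123456    # the trailing 789 ns are dropped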
- with tm.assert_produces_warning(UserWarning, - check_stacklevel=False): - expected = datetime.datetime(2011, 1, 1, 9, 0, 0, 123456) - result = ts.to_pydatetime() - self.assertEqual(result, expected) - - -class TestDatetimeParsingWrappers(tm.TestCase): - def test_does_not_convert_mixed_integer(self): - bad_date_strings = ('-50000', '999', '123.1234', 'm', 'T') - - for bad_date_string in bad_date_strings: - self.assertFalse(tslib._does_string_look_like_datetime( - bad_date_string)) - - good_date_strings = ('2012-01-01', - '01/01/2012', - 'Mon Sep 16, 2013', - '01012012', - '0101', - '1-1', ) - - for good_date_string in good_date_strings: - self.assertTrue(tslib._does_string_look_like_datetime( - good_date_string)) - - def test_parsers(self): - - # https://github.com/dateutil/dateutil/issues/217 - import dateutil - yearfirst = dateutil.__version__ >= LooseVersion('2.5.0') - - cases = {'2011-01-01': datetime.datetime(2011, 1, 1), - '2Q2005': datetime.datetime(2005, 4, 1), - '2Q05': datetime.datetime(2005, 4, 1), - '2005Q1': datetime.datetime(2005, 1, 1), - '05Q1': datetime.datetime(2005, 1, 1), - '2011Q3': datetime.datetime(2011, 7, 1), - '11Q3': datetime.datetime(2011, 7, 1), - '3Q2011': datetime.datetime(2011, 7, 1), - '3Q11': datetime.datetime(2011, 7, 1), - - # quarterly without space - '2000Q4': datetime.datetime(2000, 10, 1), - '00Q4': datetime.datetime(2000, 10, 1), - '4Q2000': datetime.datetime(2000, 10, 1), - '4Q00': datetime.datetime(2000, 10, 1), - '2000q4': datetime.datetime(2000, 10, 1), - '2000-Q4': datetime.datetime(2000, 10, 1), - '00-Q4': datetime.datetime(2000, 10, 1), - '4Q-2000': datetime.datetime(2000, 10, 1), - '4Q-00': datetime.datetime(2000, 10, 1), - '00q4': datetime.datetime(2000, 10, 1), - '2005': datetime.datetime(2005, 1, 1), - '2005-11': datetime.datetime(2005, 11, 1), - '2005 11': datetime.datetime(2005, 11, 1), - '11-2005': datetime.datetime(2005, 11, 1), - '11 2005': datetime.datetime(2005, 11, 1), - '200511': datetime.datetime(2020, 5, 11), - '20051109': datetime.datetime(2005, 11, 9), - '20051109 10:15': datetime.datetime(2005, 11, 9, 10, 15), - '20051109 08H': datetime.datetime(2005, 11, 9, 8, 0), - '2005-11-09 10:15': datetime.datetime(2005, 11, 9, 10, 15), - '2005-11-09 08H': datetime.datetime(2005, 11, 9, 8, 0), - '2005/11/09 10:15': datetime.datetime(2005, 11, 9, 10, 15), - '2005/11/09 08H': datetime.datetime(2005, 11, 9, 8, 0), - "Thu Sep 25 10:36:28 2003": datetime.datetime(2003, 9, 25, 10, - 36, 28), - "Thu Sep 25 2003": datetime.datetime(2003, 9, 25), - "Sep 25 2003": datetime.datetime(2003, 9, 25), - "January 1 2014": datetime.datetime(2014, 1, 1), - - # GH 10537 - '2014-06': datetime.datetime(2014, 6, 1), - '06-2014': datetime.datetime(2014, 6, 1), - '2014-6': datetime.datetime(2014, 6, 1), - '6-2014': datetime.datetime(2014, 6, 1), - - '20010101 12': datetime.datetime(2001, 1, 1, 12), - '20010101 1234': datetime.datetime(2001, 1, 1, 12, 34), - '20010101 123456': datetime.datetime(2001, 1, 1, 12, 34, 56), - } - - for date_str, expected in compat.iteritems(cases): - result1, _, _ = tools.parse_time_string(date_str, - yearfirst=yearfirst) - result2 = to_datetime(date_str, yearfirst=yearfirst) - result3 = to_datetime([date_str], yearfirst=yearfirst) - # result5 is used below - result4 = to_datetime(np.array([date_str], dtype=object), - yearfirst=yearfirst) - result6 = DatetimeIndex([date_str], yearfirst=yearfirst) - # result7 is used below - result8 = DatetimeIndex(Index([date_str]), yearfirst=yearfirst) - result9 = DatetimeIndex(Series([date_str]), 
yearfirst=yearfirst) - - for res in [result1, result2]: - self.assertEqual(res, expected) - for res in [result3, result4, result6, result8, result9]: - exp = DatetimeIndex([pd.Timestamp(expected)]) - tm.assert_index_equal(res, exp) - - # these really need to have yearfirst, but we don't support it - if not yearfirst: - result5 = Timestamp(date_str) - self.assertEqual(result5, expected) - result7 = date_range(date_str, freq='S', periods=1, - yearfirst=yearfirst) - self.assertEqual(result7, expected) - - # NaT - result1, _, _ = tools.parse_time_string('NaT') - result2 = to_datetime('NaT') - result3 = Timestamp('NaT') - result4 = DatetimeIndex(['NaT'])[0] - self.assertTrue(result1 is tslib.NaT) - self.assertTrue(result2 is tslib.NaT) - self.assertTrue(result3 is tslib.NaT) - self.assertTrue(result4 is tslib.NaT) - - def test_parsers_quarter_invalid(self): - - cases = ['2Q 2005', '2Q-200A', '2Q-200', '22Q2005', '6Q-20', '2Q200.'] - for case in cases: - self.assertRaises(ValueError, tools.parse_time_string, case) - - def test_parsers_dayfirst_yearfirst(self): - tm._skip_if_no_dateutil() - - # OK - # 2.5.1 10-11-12 [dayfirst=0, yearfirst=0] -> 2012-10-11 00:00:00 - # 2.5.2 10-11-12 [dayfirst=0, yearfirst=0] -> 2012-10-11 00:00:00 - # 2.5.3 10-11-12 [dayfirst=0, yearfirst=0] -> 2012-10-11 00:00:00 - - # OK - # 2.5.1 10-11-12 [dayfirst=0, yearfirst=1] -> 2010-11-12 00:00:00 - # 2.5.2 10-11-12 [dayfirst=0, yearfirst=1] -> 2010-11-12 00:00:00 - # 2.5.3 10-11-12 [dayfirst=0, yearfirst=1] -> 2010-11-12 00:00:00 - - # bug fix in 2.5.2 - # 2.5.1 10-11-12 [dayfirst=1, yearfirst=1] -> 2010-11-12 00:00:00 - # 2.5.2 10-11-12 [dayfirst=1, yearfirst=1] -> 2010-12-11 00:00:00 - # 2.5.3 10-11-12 [dayfirst=1, yearfirst=1] -> 2010-12-11 00:00:00 - - # OK - # 2.5.1 10-11-12 [dayfirst=1, yearfirst=0] -> 2012-11-10 00:00:00 - # 2.5.2 10-11-12 [dayfirst=1, yearfirst=0] -> 2012-11-10 00:00:00 - # 2.5.3 10-11-12 [dayfirst=1, yearfirst=0] -> 2012-11-10 00:00:00 - - # OK - # 2.5.1 20/12/21 [dayfirst=0, yearfirst=0] -> 2021-12-20 00:00:00 - # 2.5.2 20/12/21 [dayfirst=0, yearfirst=0] -> 2021-12-20 00:00:00 - # 2.5.3 20/12/21 [dayfirst=0, yearfirst=0] -> 2021-12-20 00:00:00 - - # OK - # 2.5.1 20/12/21 [dayfirst=0, yearfirst=1] -> 2020-12-21 00:00:00 - # 2.5.2 20/12/21 [dayfirst=0, yearfirst=1] -> 2020-12-21 00:00:00 - # 2.5.3 20/12/21 [dayfirst=0, yearfirst=1] -> 2020-12-21 00:00:00 - - # revert of bug in 2.5.2 - # 2.5.1 20/12/21 [dayfirst=1, yearfirst=1] -> 2020-12-21 00:00:00 - # 2.5.2 20/12/21 [dayfirst=1, yearfirst=1] -> month must be in 1..12 - # 2.5.3 20/12/21 [dayfirst=1, yearfirst=1] -> 2020-12-21 00:00:00 - - # OK - # 2.5.1 20/12/21 [dayfirst=1, yearfirst=0] -> 2021-12-20 00:00:00 - # 2.5.2 20/12/21 [dayfirst=1, yearfirst=0] -> 2021-12-20 00:00:00 - # 2.5.3 20/12/21 [dayfirst=1, yearfirst=0] -> 2021-12-20 00:00:00 - - import dateutil - is_lt_253 = dateutil.__version__ < LooseVersion('2.5.3') - - # str : dayfirst, yearfirst, expected - cases = {'10-11-12': [(False, False, - datetime.datetime(2012, 10, 11)), - (True, False, - datetime.datetime(2012, 11, 10)), - (False, True, - datetime.datetime(2010, 11, 12)), - (True, True, - datetime.datetime(2010, 12, 11))], - '20/12/21': [(False, False, - datetime.datetime(2021, 12, 20)), - (True, False, - datetime.datetime(2021, 12, 20)), - (False, True, - datetime.datetime(2020, 12, 21)), - (True, True, - datetime.datetime(2020, 12, 21))]} - - from dateutil.parser import parse - for date_str, values in compat.iteritems(cases): - for dayfirst, yearfirst, expected in values: - - # 
odd comparisons across versions - # let's just skip - if dayfirst and yearfirst and is_lt_253: - continue - - # compare with dateutil result - dateutil_result = parse(date_str, dayfirst=dayfirst, - yearfirst=yearfirst) - self.assertEqual(dateutil_result, expected) - - result1, _, _ = tools.parse_time_string(date_str, - dayfirst=dayfirst, - yearfirst=yearfirst) - - # we don't support dayfirst/yearfirst here: - if not dayfirst and not yearfirst: - result2 = Timestamp(date_str) - self.assertEqual(result2, expected) - - result3 = to_datetime(date_str, dayfirst=dayfirst, - yearfirst=yearfirst) - - result4 = DatetimeIndex([date_str], dayfirst=dayfirst, - yearfirst=yearfirst)[0] - - self.assertEqual(result1, expected) - self.assertEqual(result3, expected) - self.assertEqual(result4, expected) - - def test_parsers_timestring(self): - tm._skip_if_no_dateutil() - from dateutil.parser import parse - - # must be the same as dateutil result - cases = {'10:15': (parse('10:15'), datetime.datetime(1, 1, 1, 10, 15)), - '9:05': (parse('9:05'), datetime.datetime(1, 1, 1, 9, 5))} - - for date_str, (exp_now, exp_def) in compat.iteritems(cases): - result1, _, _ = tools.parse_time_string(date_str) - result2 = to_datetime(date_str) - result3 = to_datetime([date_str]) - result4 = Timestamp(date_str) - result5 = DatetimeIndex([date_str])[0] - # parse_time_string returns the time on the default date; the - # others use the current date, and this can't be changed because - # it is relied on by time series plotting - self.assertEqual(result1, exp_def) - self.assertEqual(result2, exp_now) - self.assertEqual(result3, exp_now) - self.assertEqual(result4, exp_now) - self.assertEqual(result5, exp_now) - - def test_parsers_time(self): - # GH11818 - _skip_if_has_locale() - strings = ["14:15", "1415", "2:15pm", "0215pm", "14:15:00", "141500", - "2:15:00pm", "021500pm", datetime.time(14, 15)] - expected = datetime.time(14, 15) - - for time_string in strings: - self.assertEqual(tools.to_time(time_string), expected) - - new_string = "14.15" - self.assertRaises(ValueError, tools.to_time, new_string) - self.assertEqual(tools.to_time(new_string, format="%H.%M"), expected) - - arg = ["14:15", "20:20"] - expected_arr = [datetime.time(14, 15), datetime.time(20, 20)] - self.assertEqual(tools.to_time(arg), expected_arr) - self.assertEqual(tools.to_time(arg, format="%H:%M"), expected_arr) - self.assertEqual(tools.to_time(arg, infer_time_format=True), - expected_arr) - self.assertEqual(tools.to_time(arg, format="%I:%M%p", errors="coerce"), - [None, None]) - - res = tools.to_time(arg, format="%I:%M%p", errors="ignore") - self.assert_numpy_array_equal(res, np.array(arg, dtype=np.object_)) - - with tm.assertRaises(ValueError): - tools.to_time(arg, format="%I:%M%p", errors="raise") - - self.assert_series_equal(tools.to_time(Series(arg, name="test")), - Series(expected_arr, name="test")) - - res = tools.to_time(np.array(arg)) - self.assertIsInstance(res, list) - self.assert_equal(res, expected_arr) - - def test_parsers_monthfreq(self): - cases = {'201101': datetime.datetime(2011, 1, 1, 0, 0), - '200005': datetime.datetime(2000, 5, 1, 0, 0)} - - for date_str, expected in compat.iteritems(cases): - result1, _, _ = tools.parse_time_string(date_str, freq='M') - self.assertEqual(result1, expected) - - def test_parsers_quarterly_with_freq(self): - msg = ('Incorrect quarterly string is given, quarter ' - 'must be between 1 and 4: 2013Q5') - with tm.assertRaisesRegexp(tslib.DateParseError, msg): - tools.parse_time_string('2013Q5') - - # GH 5418 - msg = ('Unable to retrieve month 
information from given freq: ' - 'INVLD-L-DEC-SAT') - with tm.assertRaisesRegexp(tslib.DateParseError, msg): - tools.parse_time_string('2013Q1', freq='INVLD-L-DEC-SAT') - - cases = {('2013Q2', None): datetime.datetime(2013, 4, 1), - ('2013Q2', 'A-APR'): datetime.datetime(2012, 8, 1), - ('2013-Q2', 'A-DEC'): datetime.datetime(2013, 4, 1)} - - for (date_str, freq), exp in compat.iteritems(cases): - result, _, _ = tools.parse_time_string(date_str, freq=freq) - self.assertEqual(result, exp) - - def test_parsers_timezone_minute_offsets_roundtrip(self): - # GH11708 - base = to_datetime("2013-01-01 00:00:00") - dt_strings = [ - ('2013-01-01 05:45+0545', - "Asia/Katmandu", - "Timestamp('2013-01-01 05:45:00+0545', tz='Asia/Katmandu')"), - ('2013-01-01 05:30+0530', - "Asia/Kolkata", - "Timestamp('2013-01-01 05:30:00+0530', tz='Asia/Kolkata')") - ] - - for dt_string, tz, dt_string_repr in dt_strings: - dt_time = to_datetime(dt_string) - self.assertEqual(base, dt_time) - converted_time = dt_time.tz_localize('UTC').tz_convert(tz) - self.assertEqual(dt_string_repr, repr(converted_time)) - - def test_parsers_iso8601(self): - # GH 12060 - # test only the iso parser - flexibility to different - # separators and leading 0s - # Timestamp construction falls back to dateutil - cases = {'2011-01-02': datetime.datetime(2011, 1, 2), - '2011-1-2': datetime.datetime(2011, 1, 2), - '2011-01': datetime.datetime(2011, 1, 1), - '2011-1': datetime.datetime(2011, 1, 1), - '2011 01 02': datetime.datetime(2011, 1, 2), - '2011.01.02': datetime.datetime(2011, 1, 2), - '2011/01/02': datetime.datetime(2011, 1, 2), - '2011\\01\\02': datetime.datetime(2011, 1, 2), - '2013-01-01 05:30:00': datetime.datetime(2013, 1, 1, 5, 30), - '2013-1-1 5:30:00': datetime.datetime(2013, 1, 1, 5, 30)} - for date_str, exp in compat.iteritems(cases): - actual = tslib._test_parse_iso8601(date_str) - self.assertEqual(actual, exp) - - # separators must all match - YYYYMM not valid - invalid_cases = ['2011-01/02', '2011^11^11', - '201401', '201111', '200101', - # mixed separated and unseparated - '2005-0101', '200501-01', - '20010101 12:3456', '20010101 1234:56', - # HHMMSS must have two digits in each component - # if unseparated - '20010101 1', '20010101 123', '20010101 12345', - '20010101 12345Z', - # wrong separator for HHMMSS - '2001-01-01 12-34-56'] - for date_str in invalid_cases: - with tm.assertRaises(ValueError): - tslib._test_parse_iso8601(date_str) - # If no ValueError raised, let me know which case failed. - raise Exception(date_str)
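# --- editor's sketch, not part of the original patch ---
# The separator rule above, seen through the public constructor: consistent
# separators parse fine, while unseparated YYYYMM strings are rejected by the
# fast ISO path because they are ambiguous (compare the '200511' -> 2020-05-11
# dateutil result in test_parsers above).
import pandas as pd

assert pd.Timestamp('2011-01-02') == pd.Timestamp('2011/01/02')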
- - -class TestArrayToDatetime(tm.TestCase): - def test_parsing_valid_dates(self): - arr = np.array(['01-01-2013', '01-02-2013'], dtype=object) - self.assert_numpy_array_equal( - tslib.array_to_datetime(arr), - np_array_datetime64_compat( - [ - '2013-01-01T00:00:00.000000000-0000', - '2013-01-02T00:00:00.000000000-0000' - ], - dtype='M8[ns]' - ) - ) - - arr = np.array(['Mon Sep 16 2013', 'Tue Sep 17 2013'], dtype=object) - self.assert_numpy_array_equal( - tslib.array_to_datetime(arr), - np_array_datetime64_compat( - [ - '2013-09-16T00:00:00.000000000-0000', - '2013-09-17T00:00:00.000000000-0000' - ], - dtype='M8[ns]' - ) - ) - - def test_number_looking_strings_not_into_datetime(self): - # #4601 - # These strings don't look like datetimes, so no conversion - # should be attempted - arr = np.array(['-352.737091', '183.575577'], dtype=object) - self.assert_numpy_array_equal( - tslib.array_to_datetime(arr, errors='ignore'), arr) - - arr = np.array(['1', '2', '3', '4', '5'], dtype=object) - self.assert_numpy_array_equal( - tslib.array_to_datetime(arr, errors='ignore'), arr) - - def test_coercing_dates_outside_of_datetime64_ns_bounds(self): - invalid_dates = [ - datetime.date(1000, 1, 1), - datetime.datetime(1000, 1, 1), - '1000-01-01', - 'Jan 1, 1000', - np.datetime64('1000-01-01'), - ] - - for invalid_date in invalid_dates: - self.assertRaises(ValueError, - tslib.array_to_datetime, - np.array( - [invalid_date], dtype='object'), - errors='raise', ) - self.assert_numpy_array_equal( - tslib.array_to_datetime( - np.array([invalid_date], dtype='object'), - errors='coerce'), - np.array([tslib.iNaT], dtype='M8[ns]') - ) - - arr = np.array(['1/1/1000', '1/1/2000'], dtype=object) - self.assert_numpy_array_equal( - tslib.array_to_datetime(arr, errors='coerce'), - np_array_datetime64_compat( - [ - tslib.iNaT, - '2000-01-01T00:00:00.000000000-0000' - ], - dtype='M8[ns]' - ) - ) - - def test_coerce_of_invalid_datetimes(self): - arr = np.array(['01-01-2013', 'not_a_date', '1'], dtype=object) - - # Without coercing, the presence of any invalid dates prevents - # any values from being converted - self.assert_numpy_array_equal( - tslib.array_to_datetime(arr, errors='ignore'), arr) - - # With coercing, the invalid dates become iNaT - self.assert_numpy_array_equal( - tslib.array_to_datetime(arr, errors='coerce'), - np_array_datetime64_compat( - [ - '2013-01-01T00:00:00.000000000-0000', - tslib.iNaT, - tslib.iNaT - ], - dtype='M8[ns]' - ) - ) - - def test_parsing_timezone_offsets(self): - # All of these datetime strings with offsets are equivalent - # to the same datetime after the timezone offset is added - dt_strings = [ - '01-01-2013 08:00:00+08:00', - '2013-01-01T08:00:00.000000000+0800', - '2012-12-31T16:00:00.000000000-0800', - '12-31-2012 23:00:00-01:00' - ] - - expected_output = tslib.array_to_datetime(np.array( - ['01-01-2013 00:00:00'], dtype=object)) - - for dt_string in dt_strings: - self.assert_numpy_array_equal( - tslib.array_to_datetime( - np.array([dt_string], dtype=object) - ), - expected_output - ) - - -class TestTimestampNsOperations(tm.TestCase): - def setUp(self): - self.timestamp = Timestamp(datetime.datetime.utcnow()) - - def assert_ns_timedelta(self, modified_timestamp, expected_value): - value = self.timestamp.value - modified_value = modified_timestamp.value - - self.assertEqual(modified_value - value, expected_value) - - def test_timedelta_ns_arithmetic(self): - self.assert_ns_timedelta(self.timestamp + np.timedelta64(-123, 'ns'), - -123) - - def 
test_timedelta_ns_based_arithmetic(self): - self.assert_ns_timedelta(self.timestamp + np.timedelta64( - 1234567898, 'ns'), 1234567898) - - def test_timedelta_us_arithmetic(self): - self.assert_ns_timedelta(self.timestamp + np.timedelta64(-123, 'us'), - -123000) - - def test_timedelta_ms_arithmetic(self): - time = self.timestamp + np.timedelta64(-123, 'ms') - self.assert_ns_timedelta(time, -123000000) - - def test_nanosecond_string_parsing(self): - ts = Timestamp('2013-05-01 07:15:45.123456789') - # GH 7878 - expected_repr = '2013-05-01 07:15:45.123456789' - expected_value = 1367392545123456789 - self.assertEqual(ts.value, expected_value) - self.assertIn(expected_repr, repr(ts)) - - ts = Timestamp('2013-05-01 07:15:45.123456789+09:00', tz='Asia/Tokyo') - self.assertEqual(ts.value, expected_value - 9 * 3600 * 1000000000) - self.assertIn(expected_repr, repr(ts)) - - ts = Timestamp('2013-05-01 07:15:45.123456789', tz='UTC') - self.assertEqual(ts.value, expected_value) - self.assertIn(expected_repr, repr(ts)) - - ts = Timestamp('2013-05-01 07:15:45.123456789', tz='US/Eastern') - self.assertEqual(ts.value, expected_value + 4 * 3600 * 1000000000) - self.assertIn(expected_repr, repr(ts)) - - # GH 10041 - ts = Timestamp('20130501T071545.123456789') - self.assertEqual(ts.value, expected_value) - self.assertIn(expected_repr, repr(ts)) - - def test_nanosecond_timestamp(self): - # GH 7610 - expected = 1293840000000000005 - t = Timestamp('2011-01-01') + offsets.Nano(5) - self.assertEqual(repr(t), "Timestamp('2011-01-01 00:00:00.000000005')") - self.assertEqual(t.value, expected) - self.assertEqual(t.nanosecond, 5) - - t = Timestamp(t) - self.assertEqual(repr(t), "Timestamp('2011-01-01 00:00:00.000000005')") - self.assertEqual(t.value, expected) - self.assertEqual(t.nanosecond, 5) - - t = Timestamp(np_datetime64_compat('2011-01-01 00:00:00.000000005Z')) - self.assertEqual(repr(t), "Timestamp('2011-01-01 00:00:00.000000005')") - self.assertEqual(t.value, expected) - self.assertEqual(t.nanosecond, 5) - - expected = 1293840000000000010 - t = t + offsets.Nano(5) - self.assertEqual(repr(t), "Timestamp('2011-01-01 00:00:00.000000010')") - self.assertEqual(t.value, expected) - self.assertEqual(t.nanosecond, 10) - - t = Timestamp(t) - self.assertEqual(repr(t), "Timestamp('2011-01-01 00:00:00.000000010')") - self.assertEqual(t.value, expected) - self.assertEqual(t.nanosecond, 10) - - t = Timestamp(np_datetime64_compat('2011-01-01 00:00:00.000000010Z')) - self.assertEqual(repr(t), "Timestamp('2011-01-01 00:00:00.000000010')") - self.assertEqual(t.value, expected) - self.assertEqual(t.nanosecond, 10) - - def test_nat_arithmetic(self): - # GH 6873 - i = 2 - f = 1.5 - - for (left, right) in [(pd.NaT, i), (pd.NaT, f), (pd.NaT, np.nan)]: - self.assertIs(left / right, pd.NaT) - self.assertIs(left * right, pd.NaT) - self.assertIs(right * left, pd.NaT) - with tm.assertRaises(TypeError): - right / left - - # Timestamp / datetime - t = Timestamp('2014-01-01') - dt = datetime.datetime(2014, 1, 1) - for (left, right) in [(pd.NaT, pd.NaT), (pd.NaT, t), (pd.NaT, dt)]: - # NaT __add__ or __sub__ Timestamp-like (or inverse) returns NaT - self.assertIs(right + left, pd.NaT) - self.assertIs(left + right, pd.NaT) - self.assertIs(left - right, pd.NaT) - self.assertIs(right - left, pd.NaT) - - # timedelta-like - # offsets are tested in test_offsets.py - - delta = datetime.timedelta(3600) - td = Timedelta('5s') - - for (left, right) in [(pd.NaT, delta), (pd.NaT, td)]: - # NaT + timedelta-like returns NaT - self.assertIs(right + left, 
pd.NaT) - self.assertIs(left + right, pd.NaT) - self.assertIs(right - left, pd.NaT) - self.assertIs(left - right, pd.NaT) - - # GH 11718 - tm._skip_if_no_pytz() - import pytz - - t_utc = Timestamp('2014-01-01', tz='UTC') - t_tz = Timestamp('2014-01-01', tz='US/Eastern') - dt_tz = pytz.timezone('Asia/Tokyo').localize(dt) - - for (left, right) in [(pd.NaT, t_utc), (pd.NaT, t_tz), - (pd.NaT, dt_tz)]: - # NaT __add__ or __sub__ Timestamp-like (or inverse) returns NaT - self.assertIs(right + left, pd.NaT) - self.assertIs(left + right, pd.NaT) - self.assertIs(left - right, pd.NaT) - self.assertIs(right - left, pd.NaT) - - # int addition / subtraction - for (left, right) in [(pd.NaT, 2), (pd.NaT, 0), (pd.NaT, -3)]: - self.assertIs(right + left, pd.NaT) - self.assertIs(left + right, pd.NaT) - self.assertIs(left - right, pd.NaT) - self.assertIs(right - left, pd.NaT) - - def test_nat_arithmetic_index(self): - # GH 11718 - - # datetime - tm._skip_if_no_pytz() - - dti = pd.DatetimeIndex(['2011-01-01', '2011-01-02'], name='x') - exp = pd.DatetimeIndex([pd.NaT, pd.NaT], name='x') - self.assert_index_equal(dti + pd.NaT, exp) - self.assert_index_equal(pd.NaT + dti, exp) - - dti_tz = pd.DatetimeIndex(['2011-01-01', '2011-01-02'], - tz='US/Eastern', name='x') - exp = pd.DatetimeIndex([pd.NaT, pd.NaT], name='x', tz='US/Eastern') - self.assert_index_equal(dti_tz + pd.NaT, exp) - self.assert_index_equal(pd.NaT + dti_tz, exp) - - exp = pd.TimedeltaIndex([pd.NaT, pd.NaT], name='x') - for (left, right) in [(pd.NaT, dti), (pd.NaT, dti_tz)]: - self.assert_index_equal(left - right, exp) - self.assert_index_equal(right - left, exp) - - # timedelta - tdi = pd.TimedeltaIndex(['1 day', '2 day'], name='x') - exp = pd.DatetimeIndex([pd.NaT, pd.NaT], name='x') - for (left, right) in [(pd.NaT, tdi)]: - self.assert_index_equal(left + right, exp) - self.assert_index_equal(right + left, exp) - self.assert_index_equal(left - right, exp) - self.assert_index_equal(right - left, exp) - - -class TestTslib(tm.TestCase): - def test_intraday_conversion_factors(self): - self.assertEqual(period_asfreq( - 1, get_freq('D'), get_freq('H'), False), 24) - self.assertEqual(period_asfreq( - 1, get_freq('D'), get_freq('T'), False), 1440) - self.assertEqual(period_asfreq( - 1, get_freq('D'), get_freq('S'), False), 86400) - self.assertEqual(period_asfreq(1, get_freq( - 'D'), get_freq('L'), False), 86400000) - self.assertEqual(period_asfreq(1, get_freq( - 'D'), get_freq('U'), False), 86400000000) - self.assertEqual(period_asfreq(1, get_freq( - 'D'), get_freq('N'), False), 86400000000000) - - self.assertEqual(period_asfreq( - 1, get_freq('H'), get_freq('T'), False), 60) - self.assertEqual(period_asfreq( - 1, get_freq('H'), get_freq('S'), False), 3600) - self.assertEqual(period_asfreq(1, get_freq('H'), - get_freq('L'), False), 3600000) - self.assertEqual(period_asfreq(1, get_freq( - 'H'), get_freq('U'), False), 3600000000) - self.assertEqual(period_asfreq(1, get_freq( - 'H'), get_freq('N'), False), 3600000000000) - - self.assertEqual(period_asfreq( - 1, get_freq('T'), get_freq('S'), False), 60) - self.assertEqual(period_asfreq( - 1, get_freq('T'), get_freq('L'), False), 60000) - self.assertEqual(period_asfreq(1, get_freq( - 'T'), get_freq('U'), False), 60000000) - self.assertEqual(period_asfreq(1, get_freq( - 'T'), get_freq('N'), False), 60000000000) - - self.assertEqual(period_asfreq( - 1, get_freq('S'), get_freq('L'), False), 1000) - self.assertEqual(period_asfreq(1, get_freq('S'), - get_freq('U'), False), 1000000) - 
self.assertEqual(period_asfreq(1, get_freq( - 'S'), get_freq('N'), False), 1000000000) - - self.assertEqual(period_asfreq( - 1, get_freq('L'), get_freq('U'), False), 1000) - self.assertEqual(period_asfreq(1, get_freq('L'), - get_freq('N'), False), 1000000) - - self.assertEqual(period_asfreq( - 1, get_freq('U'), get_freq('N'), False), 1000) - - def test_period_ordinal_start_values(self): - # information for 1.1.1970 - self.assertEqual(0, period_ordinal(1970, 1, 1, 0, 0, 0, 0, 0, - get_freq('A'))) - self.assertEqual(0, period_ordinal(1970, 1, 1, 0, 0, 0, 0, 0, - get_freq('M'))) - self.assertEqual(1, period_ordinal(1970, 1, 1, 0, 0, 0, 0, 0, - get_freq('W'))) - self.assertEqual(0, period_ordinal(1970, 1, 1, 0, 0, 0, 0, 0, - get_freq('D'))) - self.assertEqual(0, period_ordinal(1970, 1, 1, 0, 0, 0, 0, 0, - get_freq('B'))) - - def test_period_ordinal_week(self): - self.assertEqual(1, period_ordinal(1970, 1, 4, 0, 0, 0, 0, 0, - get_freq('W'))) - self.assertEqual(2, period_ordinal(1970, 1, 5, 0, 0, 0, 0, 0, - get_freq('W'))) - - self.assertEqual(2284, period_ordinal(2013, 10, 6, 0, 0, 0, 0, 0, - get_freq('W'))) - self.assertEqual(2285, period_ordinal(2013, 10, 7, 0, 0, 0, 0, 0, - get_freq('W'))) - - def test_period_ordinal_business_day(self): - # Thursday - self.assertEqual(11415, period_ordinal(2013, 10, 3, 0, 0, 0, 0, 0, - get_freq('B'))) - # Friday - self.assertEqual(11416, period_ordinal(2013, 10, 4, 0, 0, 0, 0, 0, - get_freq('B'))) - # Saturday - self.assertEqual(11417, period_ordinal(2013, 10, 5, 0, 0, 0, 0, 0, - get_freq('B'))) - # Sunday - self.assertEqual(11417, period_ordinal(2013, 10, 6, 0, 0, 0, 0, 0, - get_freq('B'))) - # Monday - self.assertEqual(11417, period_ordinal(2013, 10, 7, 0, 0, 0, 0, 0, - get_freq('B'))) - # Tuesday - self.assertEqual(11418, period_ordinal(2013, 10, 8, 0, 0, 0, 0, 0, - get_freq('B'))) - - def test_tslib_tz_convert(self): - def compare_utc_to_local(tz_didx, utc_didx): - f = lambda x: tslib.tz_convert_single(x, 'UTC', tz_didx.tz) - result = tslib.tz_convert(tz_didx.asi8, 'UTC', tz_didx.tz) - result_single = np.vectorize(f)(tz_didx.asi8) - self.assert_numpy_array_equal(result, result_single) - - def compare_local_to_utc(tz_didx, utc_didx): - f = lambda x: tslib.tz_convert_single(x, tz_didx.tz, 'UTC') - result = tslib.tz_convert(utc_didx.asi8, tz_didx.tz, 'UTC') - result_single = np.vectorize(f)(utc_didx.asi8) - self.assert_numpy_array_equal(result, result_single) - - for tz in ['UTC', 'Asia/Tokyo', 'US/Eastern', 'Europe/Moscow']: - # US: 2014-03-09 - 2014-11-11 - # MOSCOW: 2014-10-26 / 2014-12-31 - tz_didx = date_range('2014-03-01', '2015-01-10', freq='H', tz=tz) - utc_didx = date_range('2014-03-01', '2015-01-10', freq='H') - compare_utc_to_local(tz_didx, utc_didx) - # local tz to UTC can differ in hourly (or higher) freqs because - # of DST - compare_local_to_utc(tz_didx, utc_didx) - - tz_didx = date_range('2000-01-01', '2020-01-01', freq='D', tz=tz) - utc_didx = date_range('2000-01-01', '2020-01-01', freq='D') - compare_utc_to_local(tz_didx, utc_didx) - compare_local_to_utc(tz_didx, utc_didx) - - tz_didx = date_range('2000-01-01', '2100-01-01', freq='A', tz=tz) - utc_didx = date_range('2000-01-01', '2100-01-01', freq='A') - compare_utc_to_local(tz_didx, utc_didx) - compare_local_to_utc(tz_didx, utc_didx) - - # Check empty array - result = tslib.tz_convert(np.array([], dtype=np.int64), - tslib.maybe_get_tz('US/Eastern'), - tslib.maybe_get_tz('Asia/Tokyo')) - self.assert_numpy_array_equal(result, np.array([], dtype=np.int64)) - - # Check all-NaT array - 
result = tslib.tz_convert(np.array([tslib.iNaT], dtype=np.int64), - tslib.maybe_get_tz('US/Eastern'), - tslib.maybe_get_tz('Asia/Tokyo')) - self.assert_numpy_array_equal(result, np.array( - [tslib.iNaT], dtype=np.int64)) - - def test_shift_months(self): - s = DatetimeIndex([Timestamp('2000-01-05 00:15:00'), Timestamp( - '2000-01-31 00:23:00'), Timestamp('2000-01-01'), Timestamp( - '2000-02-29'), Timestamp('2000-12-31')]) - for years in [-1, 0, 1]: - for months in [-2, 0, 2]: - actual = DatetimeIndex(tslib.shift_months(s.asi8, years * 12 + - months)) - expected = DatetimeIndex([x + offsets.DateOffset( - years=years, months=months) for x in s]) - tm.assert_index_equal(actual, expected) - - def test_round(self): - stamp = Timestamp('2000-01-05 05:09:15.13') - - def _check_round(freq, expected): - result = stamp.round(freq=freq) - self.assertEqual(result, expected) - - for freq, expected in [('D', Timestamp('2000-01-05 00:00:00')), - ('H', Timestamp('2000-01-05 05:00:00')), - ('S', Timestamp('2000-01-05 05:09:15'))]: - _check_round(freq, expected) - - msg = pd.tseries.frequencies._INVALID_FREQ_ERROR - with self.assertRaisesRegexp(ValueError, msg): - stamp.round('foo') - - -class TestTimestampOps(tm.TestCase): - def test_timestamp_and_datetime(self): - self.assertEqual((Timestamp(datetime.datetime( - 2013, 10, 13)) - datetime.datetime(2013, 10, 12)).days, 1) - self.assertEqual((datetime.datetime(2013, 10, 12) - - Timestamp(datetime.datetime(2013, 10, 13))).days, -1) - - def test_timestamp_and_series(self): - timestamp_series = Series(date_range('2014-03-17', periods=2, freq='D', - tz='US/Eastern')) - first_timestamp = timestamp_series[0] - - delta_series = Series([np.timedelta64(0, 'D'), np.timedelta64(1, 'D')]) - assert_series_equal(timestamp_series - first_timestamp, delta_series) - assert_series_equal(first_timestamp - timestamp_series, -delta_series) - - def test_addition_subtraction_types(self): - # Assert on the types resulting from Timestamp +/- various date/time - # objects - datetime_instance = datetime.datetime(2014, 3, 4) - timedelta_instance = datetime.timedelta(seconds=1) - # build a timestamp with a frequency, since then it supports - # addition/subtraction of integers - timestamp_instance = date_range(datetime_instance, periods=1, - freq='D')[0] - - self.assertEqual(type(timestamp_instance + 1), Timestamp) - self.assertEqual(type(timestamp_instance - 1), Timestamp) - - # Timestamp + datetime not supported, though subtraction is supported - # and yields timedelta; more tests in tseries/base/tests/test_base.py - self.assertEqual( - type(timestamp_instance - datetime_instance), Timedelta) - self.assertEqual( - type(timestamp_instance + timedelta_instance), Timestamp) - self.assertEqual( - type(timestamp_instance - timedelta_instance), Timestamp) - - # Timestamp +/- datetime64 not supported, so not tested (could possibly - # assert error raised?) 
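# --- editor's sketch, not part of the original patch ---
# The same type contract the assertions above spell out, in miniature:
# Timestamp - datetime -> Timedelta; Timestamp +/- timedelta -> Timestamp.
import datetime
import pandas as pd

ts = pd.Timestamp('2014-03-04')
assert isinstance(ts - datetime.datetime(2014, 3, 3), pd.Timedelta)
assert isinstance(ts + datetime.timedelta(seconds=1), pd.Timestamp)
assert isinstance(ts - datetime.timedelta(seconds=1), pd.Timestamp)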
- timedelta64_instance = np.timedelta64(1, 'D') - self.assertEqual( - type(timestamp_instance + timedelta64_instance), Timestamp) - self.assertEqual( - type(timestamp_instance - timedelta64_instance), Timestamp) - - def test_addition_subtraction_preserve_frequency(self): - timestamp_instance = date_range('2014-03-05', periods=1, freq='D')[0] - timedelta_instance = datetime.timedelta(days=1) - original_freq = timestamp_instance.freq - self.assertEqual((timestamp_instance + 1).freq, original_freq) - self.assertEqual((timestamp_instance - 1).freq, original_freq) - self.assertEqual( - (timestamp_instance + timedelta_instance).freq, original_freq) - self.assertEqual( - (timestamp_instance - timedelta_instance).freq, original_freq) - - timedelta64_instance = np.timedelta64(1, 'D') - self.assertEqual( - (timestamp_instance + timedelta64_instance).freq, original_freq) - self.assertEqual( - (timestamp_instance - timedelta64_instance).freq, original_freq) - - def test_resolution(self): - - for freq, expected in zip(['A', 'Q', 'M', 'D', 'H', 'T', - 'S', 'L', 'U'], - [RESO_DAY, RESO_DAY, - RESO_DAY, RESO_DAY, - RESO_HR, RESO_MIN, - RESO_SEC, RESO_MS, - RESO_US]): - for tz in [None, 'Asia/Tokyo', 'US/Eastern', - 'dateutil/US/Eastern']: - idx = date_range(start='2013-04-01', periods=30, freq=freq, - tz=tz) - result = period.resolution(idx.asi8, idx.tz) - self.assertEqual(result, expected) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tseries/tests/test_util.py b/pandas/tseries/tests/test_util.py deleted file mode 100644 index 96da32a4a845c..0000000000000 --- a/pandas/tseries/tests/test_util.py +++ /dev/null @@ -1,132 +0,0 @@ -from pandas.compat import range -import nose - -import numpy as np - -from pandas import Series, date_range -import pandas.util.testing as tm - -from datetime import datetime, date - -from pandas.tseries.tools import normalize_date -from pandas.tseries.util import pivot_annual, isleapyear - - -class TestPivotAnnual(tm.TestCase): - """ - New pandas version of scikits.timeseries pivot_annual - """ - - def test_daily(self): - rng = date_range('1/1/2000', '12/31/2004', freq='D') - ts = Series(np.random.randn(len(rng)), index=rng) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - annual = pivot_annual(ts, 'D') - - doy = ts.index.dayofyear - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - doy[(~isleapyear(ts.index.year)) & (doy >= 60)] += 1 - - for i in range(1, 367): - subset = ts[doy == i] - subset.index = [x.year for x in subset.index] - - result = annual[i].dropna() - tm.assert_series_equal(result, subset, check_names=False) - self.assertEqual(result.name, i) - - # check leap days - leaps = ts[(ts.index.month == 2) & (ts.index.day == 29)] - day = leaps.index.dayofyear[0] - leaps.index = leaps.index.year - leaps.name = 60 - tm.assert_series_equal(annual[day].dropna(), leaps) - - def test_hourly(self): - rng_hourly = date_range('1/1/1994', periods=(18 * 8760 + 4 * 24), - freq='H') - data_hourly = np.random.randint(100, 350, rng_hourly.size) - ts_hourly = Series(data_hourly, index=rng_hourly) - - grouped = ts_hourly.groupby(ts_hourly.index.year) - hoy = grouped.apply(lambda x: x.reset_index(drop=True)) - hoy = hoy.index.droplevel(0).values - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - hoy[~isleapyear(ts_hourly.index.year) & (hoy >= 1416)] += 24 - hoy += 1 - - with tm.assert_produces_warning(FutureWarning,
check_stacklevel=False): - annual = pivot_annual(ts_hourly) - - ts_hourly = ts_hourly.astype(float) - for i in [1, 1416, 1417, 1418, 1439, 1440, 1441, 8784]: - subset = ts_hourly[hoy == i] - subset.index = [x.year for x in subset.index] - - result = annual[i].dropna() - tm.assert_series_equal(result, subset, check_names=False) - self.assertEqual(result.name, i) - - leaps = ts_hourly[(ts_hourly.index.month == 2) & ( - ts_hourly.index.day == 29) & (ts_hourly.index.hour == 0)] - hour = leaps.index.dayofyear[0] * 24 - 23 - leaps.index = leaps.index.year - leaps.name = 1417 - tm.assert_series_equal(annual[hour].dropna(), leaps) - - def test_weekly(self): - pass - - def test_monthly(self): - rng = date_range('1/1/2000', '12/31/2004', freq='M') - ts = Series(np.random.randn(len(rng)), index=rng) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - annual = pivot_annual(ts, 'M') - - month = ts.index.month - for i in range(1, 13): - subset = ts[month == i] - subset.index = [x.year for x in subset.index] - result = annual[i].dropna() - tm.assert_series_equal(result, subset, check_names=False) - self.assertEqual(result.name, i) - - def test_period_monthly(self): - pass - - def test_period_daily(self): - pass - - def test_period_weekly(self): - pass - - def test_isleapyear_deprecate(self): - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assertTrue(isleapyear(2000)) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assertFalse(isleapyear(2001)) - - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - self.assertTrue(isleapyear(2004)) - - -def test_normalize_date(): - value = date(2012, 9, 7) - - result = normalize_date(value) - assert (result == datetime(2012, 9, 7)) - - value = datetime(2012, 9, 7, 12) - - result = normalize_date(value) - assert (result == datetime(2012, 9, 7)) - - -if __name__ == '__main__': - nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'], - exit=False) diff --git a/pandas/tseries/util.py b/pandas/tseries/util.py index dc460dee8415b..5934f5843736c 100644 --- a/pandas/tseries/util.py +++ b/pandas/tseries/util.py @@ -2,7 +2,7 @@ from pandas.compat import lrange import numpy as np -from pandas.types.common import _ensure_platform_int +from pandas.core.dtypes.common import _ensure_platform_int from pandas.core.frame import DataFrame import pandas.core.algorithms as algorithms @@ -54,7 +54,7 @@ def pivot_annual(series, freq=None): if freq == 'D': width = 366 - offset = index.dayofyear - 1 + offset = np.asarray(index.dayofyear) - 1 # adjust for leap year offset[(~isleapyear(year)) & (offset >= 59)] += 1 @@ -63,7 +63,7 @@ def pivot_annual(series, freq=None): # todo: strings like 1/1, 1/25, etc.? 
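The hunk above (and the matching monthly hunk just below) wraps the accessor result in `np.asarray` before the in-place leap-year adjustment, presumably because datetime accessors such as `.dayofyear` can return an immutable `Index`, on which masked in-place assignment fails. A hedged sketch of the distinction (the mask here is illustrative only):

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2000-02-27', periods=5, freq='D')

offset = np.asarray(idx.dayofyear) - 1  # writable ndarray copy
offset[offset >= 59] += 1               # in-place update is fine on an ndarray
# whereas the same masked assignment on an immutable Index would raise
```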
elif freq in ('M', 'BM'): width = 12 - offset = index.month - 1 + offset = np.asarray(index.month) - 1 columns = lrange(1, 13) elif freq == 'H': width = 8784 diff --git a/pandas/tslib.py b/pandas/tslib.py new file mode 100644 index 0000000000000..c960a4eaf59ad --- /dev/null +++ b/pandas/tslib.py @@ -0,0 +1,7 @@ +# flake8: noqa + +import warnings +warnings.warn("The pandas.tslib module is deprecated and will be " + "removed in a future version.", FutureWarning, stacklevel=2) +from pandas._libs.tslib import (Timestamp, Timedelta, + NaT, NaTType, OutOfBoundsDatetime) diff --git a/pandas/types/common.py b/pandas/types/common.py index e58e0826ea49a..a125c27d04596 100644 --- a/pandas/types/common.py +++ b/pandas/types/common.py @@ -1,493 +1,8 @@ -""" common type operations """ +import warnings -import numpy as np -from pandas.compat import (string_types, text_type, binary_type, - PY3, PY36) -from pandas import lib, algos -from .dtypes import (CategoricalDtype, CategoricalDtypeType, - DatetimeTZDtype, DatetimeTZDtypeType, - PeriodDtype, PeriodDtypeType, - ExtensionDtype) -from .generic import (ABCCategorical, ABCPeriodIndex, - ABCDatetimeIndex, ABCSeries, - ABCSparseArray, ABCSparseSeries) -from .inference import is_string_like -from .inference import * # noqa +warnings.warn("pandas.types.common is deprecated and will be " + "removed in a future version, import " + "from pandas.api.types", + DeprecationWarning, stacklevel=3) - -_POSSIBLY_CAST_DTYPES = set([np.dtype(t).name - for t in ['O', 'int8', 'uint8', 'int16', 'uint16', - 'int32', 'uint32', 'int64', 'uint64']]) - -_NS_DTYPE = np.dtype('M8[ns]') -_TD_DTYPE = np.dtype('m8[ns]') -_INT64_DTYPE = np.dtype(np.int64) - -_ensure_float64 = algos.ensure_float64 -_ensure_float32 = algos.ensure_float32 - - -def _ensure_float(arr): - if issubclass(arr.dtype.type, (np.integer, np.bool_)): - arr = arr.astype(float) - return arr - - -_ensure_uint64 = algos.ensure_uint64 -_ensure_int64 = algos.ensure_int64 -_ensure_int32 = algos.ensure_int32 -_ensure_int16 = algos.ensure_int16 -_ensure_int8 = algos.ensure_int8 -_ensure_platform_int = algos.ensure_platform_int -_ensure_object = algos.ensure_object - - -def _ensure_categorical(arr): - if not is_categorical(arr): - from pandas import Categorical - arr = Categorical(arr) - return arr - - -def is_object_dtype(arr_or_dtype): - tipo = _get_dtype_type(arr_or_dtype) - return issubclass(tipo, np.object_) - - -def is_sparse(array): - """ return if we are a sparse array """ - return isinstance(array, (ABCSparseArray, ABCSparseSeries)) - - -def is_categorical(array): - """ return if we are a categorical possibility """ - return isinstance(array, ABCCategorical) or is_categorical_dtype(array) - - -def is_datetimetz(array): - """ return if we are a datetime with tz array """ - return ((isinstance(array, ABCDatetimeIndex) and - getattr(array, 'tz', None) is not None) or - is_datetime64tz_dtype(array)) - - -def is_period(array): - """ return if we are a period array """ - return isinstance(array, ABCPeriodIndex) or is_period_arraylike(array) - - -def is_datetime64_dtype(arr_or_dtype): - try: - tipo = _get_dtype_type(arr_or_dtype) - except TypeError: - return False - return issubclass(tipo, np.datetime64) - - -def is_datetime64tz_dtype(arr_or_dtype): - return DatetimeTZDtype.is_dtype(arr_or_dtype) - - -def is_timedelta64_dtype(arr_or_dtype): - tipo = _get_dtype_type(arr_or_dtype) - return issubclass(tipo, np.timedelta64) - - -def is_period_dtype(arr_or_dtype): - return PeriodDtype.is_dtype(arr_or_dtype) - - -def 
is_categorical_dtype(arr_or_dtype): - return CategoricalDtype.is_dtype(arr_or_dtype) - - -def is_string_dtype(arr_or_dtype): - dtype = _get_dtype(arr_or_dtype) - return dtype.kind in ('O', 'S', 'U') and not is_period_dtype(dtype) - - -def is_period_arraylike(arr): - """ return if we are period arraylike / PeriodIndex """ - if isinstance(arr, ABCPeriodIndex): - return True - elif isinstance(arr, (np.ndarray, ABCSeries)): - return arr.dtype == object and lib.infer_dtype(arr) == 'period' - return getattr(arr, 'inferred_type', None) == 'period' - - -def is_datetime_arraylike(arr): - """ return if we are datetime arraylike / DatetimeIndex """ - if isinstance(arr, ABCDatetimeIndex): - return True - elif isinstance(arr, (np.ndarray, ABCSeries)): - return arr.dtype == object and lib.infer_dtype(arr) == 'datetime' - return getattr(arr, 'inferred_type', None) == 'datetime' - - -def is_datetimelike(arr): - return (is_datetime64_dtype(arr) or is_datetime64tz_dtype(arr) or - is_timedelta64_dtype(arr) or - isinstance(arr, ABCPeriodIndex) or - is_datetimetz(arr)) - - -def is_dtype_equal(source, target): - """ return a boolean if the dtypes are equal """ - try: - source = _get_dtype(source) - target = _get_dtype(target) - return source == target - except (TypeError, AttributeError): - - # invalid comparison - # object == category will hit this - return False - - -def is_any_int_dtype(arr_or_dtype): - tipo = _get_dtype_type(arr_or_dtype) - return issubclass(tipo, np.integer) - - -def is_integer_dtype(arr_or_dtype): - tipo = _get_dtype_type(arr_or_dtype) - return (issubclass(tipo, np.integer) and - not issubclass(tipo, (np.datetime64, np.timedelta64))) - - -def is_signed_integer_dtype(arr_or_dtype): - tipo = _get_dtype_type(arr_or_dtype) - return (issubclass(tipo, np.signedinteger) and - not issubclass(tipo, (np.datetime64, np.timedelta64))) - - -def is_unsigned_integer_dtype(arr_or_dtype): - tipo = _get_dtype_type(arr_or_dtype) - return (issubclass(tipo, np.unsignedinteger) and - not issubclass(tipo, (np.datetime64, np.timedelta64))) - - -def is_int64_dtype(arr_or_dtype): - tipo = _get_dtype_type(arr_or_dtype) - return issubclass(tipo, np.int64) - - -def is_int_or_datetime_dtype(arr_or_dtype): - tipo = _get_dtype_type(arr_or_dtype) - return (issubclass(tipo, np.integer) or - issubclass(tipo, (np.datetime64, np.timedelta64))) - - -def is_datetime64_any_dtype(arr_or_dtype): - return (is_datetime64_dtype(arr_or_dtype) or - is_datetime64tz_dtype(arr_or_dtype)) - - -def is_datetime64_ns_dtype(arr_or_dtype): - try: - tipo = _get_dtype(arr_or_dtype) - except TypeError: - if is_datetime64tz_dtype(arr_or_dtype): - tipo = _get_dtype(arr_or_dtype.dtype) - else: - return False - return tipo == _NS_DTYPE or getattr(tipo, 'base', None) == _NS_DTYPE - - -def is_timedelta64_ns_dtype(arr_or_dtype): - tipo = _get_dtype(arr_or_dtype) - return tipo == _TD_DTYPE - - -def is_datetime_or_timedelta_dtype(arr_or_dtype): - tipo = _get_dtype_type(arr_or_dtype) - return issubclass(tipo, (np.datetime64, np.timedelta64)) - - -def _is_unorderable_exception(e): - """ - return a boolean if we have an unorderable exception error message - - These are different error messages for PY>=3<=3.5 and PY>=3.6 - """ - if PY36: - return "'>' not supported between instances of" in str(e) - - elif PY3: - return 'unorderable' in str(e) - return False - - -def is_numeric_v_string_like(a, b): - """ - numpy doesn't like to compare numeric arrays vs scalar string-likes - - return a boolean result if this is the case for a,b or b,a - - """ - is_a_array =
isinstance(a, np.ndarray) - is_b_array = isinstance(b, np.ndarray) - - is_a_numeric_array = is_a_array and is_numeric_dtype(a) - is_b_numeric_array = is_b_array and is_numeric_dtype(b) - is_a_string_array = is_a_array and is_string_like_dtype(a) - is_b_string_array = is_b_array and is_string_like_dtype(b) - - is_a_scalar_string_like = not is_a_array and is_string_like(a) - is_b_scalar_string_like = not is_b_array and is_string_like(b) - - return ((is_a_numeric_array and is_b_scalar_string_like) or - (is_b_numeric_array and is_a_scalar_string_like) or - (is_a_numeric_array and is_b_string_array) or - (is_b_numeric_array and is_a_string_array)) - - -def is_datetimelike_v_numeric(a, b): - # return if we have an i8 convertible and numeric comparison - if not hasattr(a, 'dtype'): - a = np.asarray(a) - if not hasattr(b, 'dtype'): - b = np.asarray(b) - - def is_numeric(x): - return is_integer_dtype(x) or is_float_dtype(x) - - is_datetimelike = needs_i8_conversion - return ((is_datetimelike(a) and is_numeric(b)) or - (is_datetimelike(b) and is_numeric(a))) - - -def is_datetimelike_v_object(a, b): - # return if we have an i8 convertible and object comparison - if not hasattr(a, 'dtype'): - a = np.asarray(a) - if not hasattr(b, 'dtype'): - b = np.asarray(b) - - def is_object(x): - return is_object_dtype(x) - - is_datetimelike = needs_i8_conversion - return ((is_datetimelike(a) and is_object(b)) or - (is_datetimelike(b) and is_object(a))) - - -def needs_i8_conversion(arr_or_dtype): - return (is_datetime_or_timedelta_dtype(arr_or_dtype) or - is_datetime64tz_dtype(arr_or_dtype) or - is_period_dtype(arr_or_dtype)) - - -def is_numeric_dtype(arr_or_dtype): - tipo = _get_dtype_type(arr_or_dtype) - return (issubclass(tipo, (np.number, np.bool_)) and - not issubclass(tipo, (np.datetime64, np.timedelta64))) - - -def is_string_like_dtype(arr_or_dtype): - # exclude object as it's a mixed dtype - dtype = _get_dtype(arr_or_dtype) - return dtype.kind in ('S', 'U') - - -def is_float_dtype(arr_or_dtype): - tipo = _get_dtype_type(arr_or_dtype) - return issubclass(tipo, np.floating) - - -def is_floating_dtype(arr_or_dtype): - tipo = _get_dtype_type(arr_or_dtype) - return isinstance(tipo, np.floating) - - -def is_bool_dtype(arr_or_dtype): - try: - tipo = _get_dtype_type(arr_or_dtype) - except ValueError: - # this isn't even a dtype - return False - return issubclass(tipo, np.bool_) - - -def is_extension_type(value): - """ - if we are a klass that is preserved by the internals - these are internal klasses that we represent (and don't use a np.array) - """ - if is_categorical(value): - return True - elif is_sparse(value): - return True - elif is_datetimetz(value): - return True - return False - - -def is_complex_dtype(arr_or_dtype): - tipo = _get_dtype_type(arr_or_dtype) - return issubclass(tipo, np.complexfloating) - - -def _coerce_to_dtype(dtype): - """ coerce a string / np.dtype to a dtype """ - if is_categorical_dtype(dtype): - dtype = CategoricalDtype() - elif is_datetime64tz_dtype(dtype): - dtype = DatetimeTZDtype(dtype) - elif is_period_dtype(dtype): - dtype = PeriodDtype(dtype) - else: - dtype = np.dtype(dtype) - return dtype - - -def _get_dtype(arr_or_dtype): - if isinstance(arr_or_dtype, np.dtype): - return arr_or_dtype - elif isinstance(arr_or_dtype, type): - return np.dtype(arr_or_dtype) - elif isinstance(arr_or_dtype, CategoricalDtype): - return arr_or_dtype - elif isinstance(arr_or_dtype, DatetimeTZDtype): - return arr_or_dtype - elif
isinstance(arr_or_dtype, PeriodDtype): - return arr_or_dtype - elif isinstance(arr_or_dtype, string_types): - if is_categorical_dtype(arr_or_dtype): - return CategoricalDtype.construct_from_string(arr_or_dtype) - elif is_datetime64tz_dtype(arr_or_dtype): - return DatetimeTZDtype.construct_from_string(arr_or_dtype) - elif is_period_dtype(arr_or_dtype): - return PeriodDtype.construct_from_string(arr_or_dtype) - - if hasattr(arr_or_dtype, 'dtype'): - arr_or_dtype = arr_or_dtype.dtype - return np.dtype(arr_or_dtype) - - -def _get_dtype_type(arr_or_dtype): - if isinstance(arr_or_dtype, np.dtype): - return arr_or_dtype.type - elif isinstance(arr_or_dtype, type): - return np.dtype(arr_or_dtype).type - elif isinstance(arr_or_dtype, CategoricalDtype): - return CategoricalDtypeType - elif isinstance(arr_or_dtype, DatetimeTZDtype): - return DatetimeTZDtypeType - elif isinstance(arr_or_dtype, PeriodDtype): - return PeriodDtypeType - elif isinstance(arr_or_dtype, string_types): - if is_categorical_dtype(arr_or_dtype): - return CategoricalDtypeType - elif is_datetime64tz_dtype(arr_or_dtype): - return DatetimeTZDtypeType - elif is_period_dtype(arr_or_dtype): - return PeriodDtypeType - return _get_dtype_type(np.dtype(arr_or_dtype)) - try: - return arr_or_dtype.dtype.type - except AttributeError: - return type(None) - - -def _get_dtype_from_object(dtype): - """Get a numpy dtype.type-style object. This handles the datetime64[ns] - and datetime64[ns, TZ] compat - - Notes - ----- - If nothing can be found, returns ``object``. - """ - - # type object from a dtype - if isinstance(dtype, type) and issubclass(dtype, np.generic): - return dtype - elif is_categorical(dtype): - return CategoricalDtype().type - elif is_datetimetz(dtype): - return DatetimeTZDtype(dtype).type - elif isinstance(dtype, np.dtype): # dtype object - try: - _validate_date_like_dtype(dtype) - except TypeError: - # should still pass if we don't have a datelike - pass - return dtype.type - elif isinstance(dtype, string_types): - if dtype in ['datetimetz', 'datetime64tz']: - return DatetimeTZDtype.type - elif dtype in ['period']: - raise NotImplementedError - - if dtype == 'datetime' or dtype == 'timedelta': - dtype += '64' - - try: - return _get_dtype_from_object(getattr(np, dtype)) - except (AttributeError, TypeError): - # handles cases like _get_dtype(int) - # i.e., python objects that are valid dtypes (unlike user-defined - # types, in general) - # TypeError handles the float16 typecode of 'e' - # further handle internal types - pass - - return _get_dtype_from_object(np.dtype(dtype)) - - -def _validate_date_like_dtype(dtype): - try: - typ = np.datetime_data(dtype)[0] - except ValueError as e: - raise TypeError('%s' % e) - if typ != 'generic' and typ != 'ns': - raise ValueError('%r is too specific of a frequency, try passing %r' % - (dtype.name, dtype.type.__name__)) - - -_string_dtypes = frozenset(map(_get_dtype_from_object, (binary_type, - text_type))) - - -def pandas_dtype(dtype): - """ - Converts input into a pandas only dtype object or a numpy dtype object. 
- - Parameters - ---------- - dtype : object to be converted - - Returns - ------- - np.dtype or a pandas dtype - """ - if isinstance(dtype, DatetimeTZDtype): - return dtype - elif isinstance(dtype, PeriodDtype): - return dtype - elif isinstance(dtype, CategoricalDtype): - return dtype - elif isinstance(dtype, string_types): - try: - return DatetimeTZDtype.construct_from_string(dtype) - except TypeError: - pass - - if dtype.startswith('period[') or dtype.startswith('Period['): - # do not parse string like U as period[U] - try: - return PeriodDtype.construct_from_string(dtype) - except TypeError: - pass - - try: - return CategoricalDtype.construct_from_string(dtype) - except TypeError: - pass - elif isinstance(dtype, ExtensionDtype): - return dtype - - return np.dtype(dtype) +from pandas.core.dtypes.common import * # noqa diff --git a/pandas/types/concat.py b/pandas/types/concat.py index 827eb160c452d..477156b38d56d 100644 --- a/pandas/types/concat.py +++ b/pandas/types/concat.py @@ -1,480 +1,11 @@ -""" -Utility functions related to concat -""" +import warnings -import numpy as np -import pandas.tslib as tslib -from pandas import compat -from pandas.core.algorithms import take_1d -from .common import (is_categorical_dtype, - is_sparse, - is_datetimetz, - is_datetime64_dtype, - is_timedelta64_dtype, - is_period_dtype, - is_object_dtype, - is_bool_dtype, - is_dtype_equal, - _NS_DTYPE, - _TD_DTYPE) -from pandas.types.generic import (ABCDatetimeIndex, ABCTimedeltaIndex, - ABCPeriodIndex) - -def get_dtype_kinds(l): - """ - Parameters - ---------- - l : list of arrays - - Returns - ------- - a set of kinds that exist in this list of arrays - """ - - typs = set() - for arr in l: - - dtype = arr.dtype - if is_categorical_dtype(dtype): - typ = 'category' - elif is_sparse(arr): - typ = 'sparse' - elif is_datetimetz(arr): - # if to_concat contains different tz, - # the result must be object dtype - typ = str(arr.dtype) - elif is_datetime64_dtype(dtype): - typ = 'datetime' - elif is_timedelta64_dtype(dtype): - typ = 'timedelta' - elif is_object_dtype(dtype): - typ = 'object' - elif is_bool_dtype(dtype): - typ = 'bool' - elif is_period_dtype(dtype): - typ = str(arr.dtype) - else: - typ = dtype.kind - typs.add(typ) - return typs - - -def _get_series_result_type(result): - """ - return appropriate class of Series concat - input is either dict or array-like - """ - if isinstance(result, dict): - # concat Series with axis 1 - if all(is_sparse(c) for c in compat.itervalues(result)): - from pandas.sparse.api import SparseDataFrame - return SparseDataFrame - else: - from pandas.core.frame import DataFrame - return DataFrame - - elif is_sparse(result): - # concat Series with axis 1 - from pandas.sparse.api import SparseSeries - return SparseSeries - else: - from pandas.core.series import Series - return Series - - -def _get_frame_result_type(result, objs): - """ - return appropriate class of DataFrame-like concat - if any block is SparseBlock, return SparseDataFrame - otherwise, return 1st obj - """ - if any(b.is_sparse for b in result.blocks): - from pandas.sparse.api import SparseDataFrame - return SparseDataFrame - else: - return objs[0] - - -def _concat_compat(to_concat, axis=0): - """ - provide concatenation of an array of arrays each of which is a single - 'normalized' dtype (in that, for example, if it's object, then it is - non-datetimelike), and provide a combined dtype for the resulting array that - preserves the overall dtype if possible - - Parameters - ---------- - to_concat : array of arrays -
axis : axis to provide concatenation - - Returns - ------- - a single array, preserving the combined dtypes - """ - - # filter empty arrays - # 1-d dtypes always are included here - def is_nonempty(x): - try: - return x.shape[axis] > 0 - except Exception: - return True - - nonempty = [x for x in to_concat if is_nonempty(x)] - - # If all arrays are empty, there's nothing to convert, just short-cut to - # the concatenation, #3121. - # - # Creating an empty array directly is tempting, but the winnings would be - # marginal given that it would still require shape & dtype calculation and - # np.concatenate which has them both implemented is compiled. - - typs = get_dtype_kinds(to_concat) - - _contains_datetime = any(typ.startswith('datetime') for typ in typs) - _contains_period = any(typ.startswith('period') for typ in typs) - - if 'category' in typs: - # this must be prior to _concat_datetime, - # to support Categorical + datetime-like - return _concat_categorical(to_concat, axis=axis) - - elif _contains_datetime or 'timedelta' in typs or _contains_period: - return _concat_datetime(to_concat, axis=axis, typs=typs) - - # these are mandated to handle empties as well - elif 'sparse' in typs: - return _concat_sparse(to_concat, axis=axis, typs=typs) - - if not nonempty: - # we have all empties, but may need to coerce the result dtype to - # object if we have non-numeric type operands (numpy would otherwise - # cast this to float) - typs = get_dtype_kinds(to_concat) - if len(typs) != 1: - - if (not len(typs - set(['i', 'u', 'f'])) or - not len(typs - set(['bool', 'i', 'u']))): - # let numpy coerce - pass - else: - # coerce to object - to_concat = [x.astype('object') for x in to_concat] - - return np.concatenate(to_concat, axis=axis) - - -def _concat_categorical(to_concat, axis=0): - """Concatenate an object/categorical array of arrays, each of which is a - single dtype - - Parameters - ---------- - to_concat : array of arrays - axis : int - Axis to provide concatenation; in the current implementation this is - always 0, e.g. we only have 1D categoricals - - Returns - ------- - Categorical - A single array, preserving the combined dtypes - """ - - def _concat_asobject(to_concat): - to_concat = [x.get_values() if is_categorical_dtype(x.dtype) - else x.ravel() for x in to_concat] - res = _concat_compat(to_concat) - if axis == 1: - return res.reshape(1, len(res)) - else: - return res - - # we could have object blocks and categoricals here - # if we only have categoricals then combine everything - # else it's a non-compat categorical - categoricals = [x for x in to_concat if is_categorical_dtype(x.dtype)] - - # validate the categories - if len(categoricals) != len(to_concat): - pass - else: - # when all categories are identical - first = to_concat[0] - if all(first.is_dtype_equal(other) for other in to_concat[1:]): - return union_categoricals(categoricals) - - return _concat_asobject(to_concat) - - -def union_categoricals(to_union, sort_categories=False): - """ - Combine list-like of Categorical-like, unioning categories. All - categories must have the same dtype. - - .. versionadded:: 0.19.0 - - Parameters - ---------- - to_union : list-like of Categorical, CategoricalIndex, - or Series with dtype='category' - sort_categories : boolean, default False - If true, resulting categories will be lexsorted, otherwise - they will be ordered as they appear in the data.
- - Returns - ------- - result : Categorical - - Raises - ------ - TypeError - - all inputs do not have the same dtype - - all inputs do not have the same ordered property - - all inputs are ordered and their categories are not identical - - sort_categories=True and Categoricals are ordered - ValueError - Empty list of categoricals passed - """ - from pandas import Index, Categorical, CategoricalIndex, Series - - if len(to_union) == 0: - raise ValueError('No Categoricals to union') - - def _maybe_unwrap(x): - if isinstance(x, (CategoricalIndex, Series)): - return x.values - elif isinstance(x, Categorical): - return x - else: - raise TypeError("all components to combine must be Categorical") - - to_union = [_maybe_unwrap(x) for x in to_union] - first = to_union[0] - - if not all(is_dtype_equal(other.categories.dtype, first.categories.dtype) - for other in to_union[1:]): - raise TypeError("dtype of categories must be the same") - - ordered = False - if all(first.is_dtype_equal(other) for other in to_union[1:]): - # identical categories - fastpath - categories = first.categories - ordered = first.ordered - new_codes = np.concatenate([c.codes for c in to_union]) - - if sort_categories and ordered: - raise TypeError("Cannot use sort_categories=True with " - "ordered Categoricals") - - if sort_categories and not categories.is_monotonic_increasing: - categories = categories.sort_values() - indexer = categories.get_indexer(first.categories) - new_codes = take_1d(indexer, new_codes, fill_value=-1) - elif all(not c.ordered for c in to_union): - # different categories - union and recode - cats = first.categories.append([c.categories for c in to_union[1:]]) - categories = Index(cats.unique()) - if sort_categories: - categories = categories.sort_values() - - new_codes = [] - for c in to_union: - if len(c.categories) > 0: - indexer = categories.get_indexer(c.categories) - new_codes.append(take_1d(indexer, c.codes, fill_value=-1)) - else: - # must be all NaN - new_codes.append(c.codes) - new_codes = np.concatenate(new_codes) - else: - # ordered - to show a proper error message - if all(c.ordered for c in to_union): - msg = ("to union ordered Categoricals, " - "all categories must be the same") - raise TypeError(msg) - else: - raise TypeError('Categorical.ordered must be the same') - - return Categorical(new_codes, categories=categories, ordered=ordered, - fastpath=True) - - -def _concat_datetime(to_concat, axis=0, typs=None): - """ - provide concatenation of a datetimelike array of arrays each of which is a - single M8[ns], datetime64[ns, tz] or m8[ns] dtype - - Parameters - ---------- - to_concat : array of arrays - axis : axis to provide concatenation - typs : set of to_concat dtypes - - Returns - ------- - a single array, preserving the combined dtypes - """ - - def convert_to_pydatetime(x, axis): - # coerce to an object dtype - - # if dtype is of datetimetz or timezone - if x.dtype.kind == _NS_DTYPE.kind: - if getattr(x, 'tz', None) is not None: - x = x.asobject.values - else: - shape = x.shape - x = tslib.ints_to_pydatetime(x.view(np.int64).ravel(), - box=True) - x = x.reshape(shape) - - elif x.dtype == _TD_DTYPE: - shape = x.shape - x = tslib.ints_to_pytimedelta(x.view(np.int64).ravel(), box=True) - x = x.reshape(shape) - - if axis == 1: - x = np.atleast_2d(x) - return x - - if typs is None: - typs = get_dtype_kinds(to_concat) - - # must be single dtype - if len(typs) == 1: - _contains_datetime = any(typ.startswith('datetime') for typ in typs) - _contains_period = any(typ.startswith('period')
for typ in typs) - - if _contains_datetime: - - if 'datetime' in typs: - new_values = np.concatenate([x.view(np.int64) for x in - to_concat], axis=axis) - return new_values.view(_NS_DTYPE) - else: - # when to_concat has different tz, len(typs) > 1. - # thus no need to care - return _concat_datetimetz(to_concat) - - elif 'timedelta' in typs: - new_values = np.concatenate([x.view(np.int64) for x in to_concat], - axis=axis) - return new_values.view(_TD_DTYPE) - - elif _contains_period: - # PeriodIndex must be handled by PeriodIndex, - # Thus can't meet this condition ATM - # Must be changed when we add PeriodDtype - raise NotImplementedError - - # need to coerce to object - to_concat = [convert_to_pydatetime(x, axis) for x in to_concat] - return np.concatenate(to_concat, axis=axis) - - -def _concat_datetimetz(to_concat, name=None): - """ - concat DatetimeIndex with the same tz - all inputs must be DatetimeIndex - it is used in DatetimeIndex.append also - """ - # do not pass tz to set because tzlocal cannot be hashed - if len(set([str(x.dtype) for x in to_concat])) != 1: - raise ValueError('to_concat must have the same tz') - tz = to_concat[0].tz - # no need to localize because internal repr will not be changed - new_values = np.concatenate([x.asi8 for x in to_concat]) - return to_concat[0]._simple_new(new_values, tz=tz, name=name) - - -def _concat_index_asobject(to_concat, name=None): - """ - concat all inputs as object. DatetimeIndex, TimedeltaIndex and - PeriodIndex are converted to object dtype before concatenation - """ - - klasses = ABCDatetimeIndex, ABCTimedeltaIndex, ABCPeriodIndex - to_concat = [x.asobject if isinstance(x, klasses) else x - for x in to_concat] - - from pandas import Index - self = to_concat[0] - attribs = self._get_attributes_dict() - attribs['name'] = name - - to_concat = [x._values if isinstance(x, Index) else x - for x in to_concat] - return self._shallow_copy_with_infer(np.concatenate(to_concat), **attribs) - - -def _concat_sparse(to_concat, axis=0, typs=None): - """ - provide concatenation of a sparse/dense array of arrays each of which is a - single dtype - - Parameters - ---------- - to_concat : array of arrays - axis : axis to provide concatenation - typs : set of to_concat dtypes - - Returns - ------- - a single array, preserving the combined dtypes - """ - - from pandas.sparse.array import SparseArray, _make_index - - def convert_sparse(x, axis): - # coerce to native type - if isinstance(x, SparseArray): - x = x.get_values() - x = x.ravel() - if axis > 0: - x = np.atleast_2d(x) - return x - - if typs is None: - typs = get_dtype_kinds(to_concat) - - if len(typs) == 1: - # concat input as it is if all inputs are sparse - # and have the same fill_value - fill_values = set(c.fill_value for c in to_concat) - if len(fill_values) == 1: - sp_values = [c.sp_values for c in to_concat] - indexes = [c.sp_index.to_int_index() for c in to_concat] - - indices = [] - loc = 0 - for idx in indexes: - indices.append(idx.indices + loc) - loc += idx.length - sp_values = np.concatenate(sp_values) - indices = np.concatenate(indices) - sp_index = _make_index(loc, indices, kind=to_concat[0].sp_index) - - return SparseArray(sp_values, sparse_index=sp_index, - fill_value=to_concat[0].fill_value) - - # input may be sparse / dense mixed and may have different fill_value - # input must contain at least 1 sparse array - sparses = [c for c in to_concat if is_sparse(c)] - fill_values = [c.fill_value for c in sparses] - sp_indexes = [c.sp_index for c in sparses] - - # densify and regular
concat - to_concat = [convert_sparse(x, axis) for x in to_concat] - result = np.concatenate(to_concat, axis=axis) - - if not len(typs - set(['sparse', 'f', 'i'])): - # sparsify if inputs are sparse and dense numerics - # first sparse input's fill_value and SparseIndex is used - result = SparseArray(result.ravel(), fill_value=fill_values[0], - kind=sp_indexes[0]) - else: - # coerce to object if needed - result = result.astype('object') - return result +def union_categoricals(to_union, sort_categories=False, ignore_order=False): + warnings.warn("pandas.types.concat.union_categoricals is " + "deprecated and will be removed in a future version.\n" + "use pandas.api.types.union_categoricals", + FutureWarning, stacklevel=2) + from pandas.api.types import union_categoricals + return union_categoricals( + to_union, sort_categories=sort_categories, ignore_order=ignore_order) diff --git a/pandas/types/inference.py b/pandas/types/inference.py deleted file mode 100644 index d2a2924b27659..0000000000000 --- a/pandas/types/inference.py +++ /dev/null @@ -1,106 +0,0 @@ -""" basic inference routines """ - -import collections -import re -import numpy as np -from numbers import Number -from pandas.compat import (string_types, text_type, - string_and_binary_types) -from pandas import lib - -is_bool = lib.is_bool - -is_integer = lib.is_integer - -is_float = lib.is_float - -is_complex = lib.is_complex - -is_scalar = lib.isscalar - -is_decimal = lib.is_decimal - - -def is_number(obj): - return isinstance(obj, (Number, np.number)) - - -def is_string_like(obj): - return isinstance(obj, (text_type, string_types)) - - -def _iterable_not_string(x): - return (isinstance(x, collections.Iterable) and - not isinstance(x, string_types)) - - -def is_iterator(obj): - # python 3 generators have __next__ instead of next - return hasattr(obj, 'next') or hasattr(obj, '__next__') - - -def is_re(obj): - return isinstance(obj, re._pattern_type) - - -def is_re_compilable(obj): - try: - re.compile(obj) - except TypeError: - return False - else: - return True - - -def is_list_like(arg): - return (hasattr(arg, '__iter__') and - not isinstance(arg, string_and_binary_types)) - - -def is_dict_like(arg): - return hasattr(arg, '__getitem__') and hasattr(arg, 'keys') - - -def is_named_tuple(arg): - return isinstance(arg, tuple) and hasattr(arg, '_fields') - - -def is_hashable(arg): - """Return True if hash(arg) will succeed, False otherwise. - - Some types will pass a test against collections.Hashable but fail when they - are actually hashed with hash(). - - Distinguish between these and other types by trying the call to hash() and - seeing if they raise TypeError. 
- - Examples - -------- - >>> a = ([],) - >>> isinstance(a, collections.Hashable) - True - >>> is_hashable(a) - False - """ - # unfortunately, we can't use isinstance(arg, collections.Hashable), which - # can be faster than calling hash, because numpy scalars on Python 3 fail - # this test - - # reconsider this decision once this numpy bug is fixed: - # https://github.com/numpy/numpy/issues/5562 - - try: - hash(arg) - except TypeError: - return False - else: - return True - - -def is_sequence(x): - try: - iter(x) - len(x) # it has a length - return not isinstance(x, string_and_binary_types) - except (TypeError, AttributeError): - return False diff --git a/pandas/util/__init__.py b/pandas/util/__init__.py index e69de29bb2d1d..202e58c916e47 100644 --- a/pandas/util/__init__.py +++ b/pandas/util/__init__.py @@ -0,0 +1,2 @@ +from pandas.util._decorators import Appender, Substitution, cache_readonly # noqa +from pandas.core.util.hashing import hash_pandas_object, hash_array # noqa diff --git a/pandas/util/_decorators.py b/pandas/util/_decorators.py new file mode 100644 index 0000000000000..bb7ffe45c689b --- /dev/null +++ b/pandas/util/_decorators.py @@ -0,0 +1,298 @@ +from pandas.compat import callable, signature +from pandas._libs.lib import cache_readonly # noqa +import types +import warnings +from textwrap import dedent +from functools import wraps, update_wrapper + + +def deprecate(name, alternative, alt_name=None, klass=None, + stacklevel=2): + """ + Return a new function that emits a deprecation warning on use. + + Parameters + ---------- + name : str + Name of function to deprecate + alternative : str + Name of function to use instead + alt_name : str, optional + Name to use in preference of alternative.__name__ + klass : Warning, default FutureWarning + stacklevel : int, default 2 + """ + + alt_name = alt_name or alternative.__name__ + klass = klass or FutureWarning + + def wrapper(*args, **kwargs): + msg = "{name} is deprecated. Use {alt_name} instead".format( + name=name, alt_name=alt_name) + warnings.warn(msg, klass, stacklevel=stacklevel) + return alternative(*args, **kwargs) + return wrapper + + +def deprecate_kwarg(old_arg_name, new_arg_name, mapping=None, stacklevel=2): + """ + Decorator to deprecate a keyword argument of a function. + + Parameters + ---------- + old_arg_name : str + Name of argument in function to deprecate + new_arg_name : str + Name of preferred argument in function + mapping : dict or callable + If mapping is present, use it to translate old arguments to + new arguments. A callable must do its own value checking; + values not found in a dict will be forwarded unchanged. + + Examples + -------- + The following deprecates 'cols', using 'columns' instead + + >>> @deprecate_kwarg(old_arg_name='cols', new_arg_name='columns') + ... def f(columns=''): + ... print(columns) + ... + >>> f(columns='should work ok') + should work ok + >>> f(cols='should raise warning') + FutureWarning: cols is deprecated, use columns instead + warnings.warn(msg, FutureWarning) + should raise warning + >>> f(cols='should error', columns="can\'t pass do both") + TypeError: Can only specify 'cols' or 'columns', not both + >>> @deprecate_kwarg('old', 'new', {'yes': True, 'no': False}) + ... def f(new=False): + ... print('yes!' if new else 'no!') + ... + >>> f(old='yes') + FutureWarning: old='yes' is deprecated, use new=True instead + warnings.warn(msg, FutureWarning) + yes! 
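The doctest above exercises the dict form of `mapping`; per the parameter description, a callable is also accepted and must do its own value checking. A hedged sketch of the callable form (the decorated function and its keyword names are invented for illustration):

```python
from pandas.util._decorators import deprecate_kwarg

# hypothetical API: translate a deprecated string flag into a bool
@deprecate_kwarg(old_arg_name='how', new_arg_name='keep',
                 mapping=lambda v: v == 'first')
def dedupe(keep=True):
    return keep

dedupe(how='first')  # warns, then forwards as dedupe(keep=True)
```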
+ """ + + if mapping is not None and not hasattr(mapping, 'get') and \ + not callable(mapping): + raise TypeError("mapping from old to new argument values " + "must be dict or callable!") + + def _deprecate_kwarg(func): + @wraps(func) + def wrapper(*args, **kwargs): + old_arg_value = kwargs.pop(old_arg_name, None) + if old_arg_value is not None: + if mapping is not None: + if hasattr(mapping, 'get'): + new_arg_value = mapping.get(old_arg_value, + old_arg_value) + else: + new_arg_value = mapping(old_arg_value) + msg = ("the {old_name}={old_val!r} keyword is deprecated, " + "use {new_name}={new_val!r} instead" + ).format(old_name=old_arg_name, + old_val=old_arg_value, + new_name=new_arg_name, + new_val=new_arg_value) + else: + new_arg_value = old_arg_value + msg = ("the '{old_name}' keyword is deprecated, " + "use '{new_name}' instead" + ).format(old_name=old_arg_name, + new_name=new_arg_name) + + warnings.warn(msg, FutureWarning, stacklevel=stacklevel) + if kwargs.get(new_arg_name, None) is not None: + msg = ("Can only specify '{old_name}' or '{new_name}', " + "not both").format(old_name=old_arg_name, + new_name=new_arg_name) + raise TypeError(msg) + else: + kwargs[new_arg_name] = new_arg_value + return func(*args, **kwargs) + return wrapper + return _deprecate_kwarg + + +# Substitution and Appender are derived from matplotlib.docstring (1.1.0) +# module http://matplotlib.org/users/license.html + + +class Substitution(object): + """ + A decorator to take a function's docstring and perform string + substitution on it. + + This decorator should be robust even if func.__doc__ is None + (for example, if -OO was passed to the interpreter) + + Usage: construct a docstring.Substitution with a sequence or + dictionary suitable for performing substitution; then + decorate a suitable function with the constructed object. e.g. + + sub_author_name = Substitution(author='Jason') + + @sub_author_name + def some_function(x): + "%(author)s wrote this function" + + # note that some_function.__doc__ is now "Jason wrote this function" + + One can also use positional arguments. + + sub_first_last_names = Substitution('Edgar Allen', 'Poe') + + @sub_first_last_names + def some_function(x): + "%s %s wrote the Raven" + """ + + def __init__(self, *args, **kwargs): + if (args and kwargs): + raise AssertionError("Only positional or keyword args are allowed") + + self.params = args or kwargs + + def __call__(self, func): + func.__doc__ = func.__doc__ and func.__doc__ % self.params + return func + + def update(self, *args, **kwargs): + """ + Update self.params with supplied args. + + If called, we assume self.params is a dict. + """ + + self.params.update(*args, **kwargs) + + @classmethod + def from_params(cls, params): + """ + In the case where the params is a mutable sequence (list or dictionary) + and it may change before this class is called, one may explicitly use a + reference to the params rather than using *args or **kwargs which will + copy the values and not reference them. + """ + result = cls() + result.params = params + return result + + +class Appender(object): + """ + A function decorator that will append an addendum to the docstring + of the target function. + + This decorator should be robust even if func.__doc__ is None + (for example, if -OO was passed to the interpreter). + + Usage: construct a docstring.Appender with a string to be joined to + the original docstring. An optional 'join' parameter may be supplied + which will be used to join the docstring and addendum. e.g. 
+ + add_copyright = Appender("Copyright (c) 2009", join='\n') + + @add_copyright + def my_dog(has='fleas'): + "This docstring will have a copyright below" + pass + """ + + def __init__(self, addendum, join='', indents=0): + if indents > 0: + self.addendum = indent(addendum, indents=indents) + else: + self.addendum = addendum + self.join = join + + def __call__(self, func): + func.__doc__ = func.__doc__ if func.__doc__ else '' + self.addendum = self.addendum if self.addendum else '' + docitems = [func.__doc__, self.addendum] + func.__doc__ = dedent(self.join.join(docitems)) + return func + + +def indent(text, indents=1): + if not text or not isinstance(text, str): + return '' + jointext = ''.join(['\n'] + [' '] * indents) + return jointext.join(text.split('\n')) + + +def make_signature(func): + """ + Returns a string repr of the arg list of a func call, with any defaults. + + Examples + -------- + >>> def f(a,b,c=2) : + >>> return a*b*c + >>> print(_make_signature(f)) + a,b,c=2 + """ + + spec = signature(func) + if spec.defaults is None: + n_wo_defaults = len(spec.args) + defaults = ('',) * n_wo_defaults + else: + n_wo_defaults = len(spec.args) - len(spec.defaults) + defaults = ('',) * n_wo_defaults + spec.defaults + args = [] + for i, (var, default) in enumerate(zip(spec.args, defaults)): + args.append(var if default == '' else var + '=' + repr(default)) + if spec.varargs: + args.append('*' + spec.varargs) + if spec.keywords: + args.append('**' + spec.keywords) + return args, spec.args + + +class docstring_wrapper(object): + """ + Decorator to wrap a function and provide + a dynamically evaluated doc-string. + + Parameters + ---------- + func : callable + creator : callable + return the doc-string + default : str, optional + return this doc-string on error + """ + _attrs = ['__module__', '__name__', + '__qualname__', '__annotations__'] + + def __init__(self, func, creator, default=None): + self.func = func + self.creator = creator + self.default = default + update_wrapper( + self, func, [attr for attr in self._attrs + if hasattr(func, attr)]) + + def __get__(self, instance, cls=None): + + # we are called with a class + if instance is None: + return self + + # we want to return the actual passed instance + return types.MethodType(self, instance) + + def __call__(self, *args, **kwargs): + return self.func(*args, **kwargs) + + @property + def __doc__(self): + try: + return self.creator() + except Exception as exc: + msg = self.default or str(exc) + return msg diff --git a/pandas/util/_depr_module.py b/pandas/util/_depr_module.py new file mode 100644 index 0000000000000..9c648b76fdad1 --- /dev/null +++ b/pandas/util/_depr_module.py @@ -0,0 +1,103 @@ +""" +This module houses a utility class for mocking deprecated modules. +It is for internal use only and should not be used beyond this purpose. +""" + +import warnings +import importlib + + +class _DeprecatedModule(object): + """ Class for mocking deprecated modules. + + Parameters + ---------- + deprmod : name of module to be deprecated. + deprmodto : name of module as a replacement, optional. + If not given, the __module__ attribute will + be used when needed. + removals : objects or methods in module that will no longer be + accessible once module is removed. 
+ moved : dict, optional + dictionary of function name -> new location for moved + objects + """ + + def __init__(self, deprmod, deprmodto=None, removals=None, + moved=None): + self.deprmod = deprmod + self.deprmodto = deprmodto + self.removals = removals + if self.removals is not None: + self.removals = frozenset(self.removals) + self.moved = moved + + # For introspection purposes. + self.self_dir = frozenset(dir(self.__class__)) + + def __dir__(self): + deprmodule = self._import_deprmod() + return dir(deprmodule) + + def __repr__(self): + deprmodule = self._import_deprmod() + return repr(deprmodule) + + __str__ = __repr__ + + def __getattr__(self, name): + if name in self.self_dir: + return object.__getattribute__(self, name) + + try: + deprmodule = self._import_deprmod(self.deprmod) + except ImportError: + if self.deprmodto is None: + raise + + # a rename + deprmodule = self._import_deprmod(self.deprmodto) + + obj = getattr(deprmodule, name) + + if self.removals is not None and name in self.removals: + warnings.warn( + "{deprmod}.{name} is deprecated and will be removed in " + "a future version.".format(deprmod=self.deprmod, name=name), + FutureWarning, stacklevel=2) + elif self.moved is not None and name in self.moved: + warnings.warn( + "{deprmod} is deprecated and will be removed in " + "a future version.\nYou can access {name} as {moved}".format( + deprmod=self.deprmod, + name=name, + moved=self.moved[name]), + FutureWarning, stacklevel=2) + else: + deprmodto = self.deprmodto + if deprmodto is False: + warnings.warn( + "{deprmod}.{name} is deprecated and will be removed in " + "a future version.".format( + deprmod=self.deprmod, name=name), + FutureWarning, stacklevel=2) + else: + if deprmodto is None: + deprmodto = obj.__module__ + # The object is actually located in another module. + warnings.warn( + "{deprmod}.{name} is deprecated. 
Please use " + "{deprmodto}.{name} instead.".format( + deprmod=self.deprmod, name=name, deprmodto=deprmodto), + FutureWarning, stacklevel=2) + + return obj + + def _import_deprmod(self, mod=None): + if mod is None: + mod = self.deprmod + + with warnings.catch_warnings(): + warnings.filterwarnings('ignore', category=FutureWarning) + deprmodule = importlib.import_module(mod) + return deprmodule diff --git a/pandas/util/doctools.py b/pandas/util/_doctools.py similarity index 96% rename from pandas/util/doctools.py rename to pandas/util/_doctools.py index 62dcba1405581..cbc9518b96416 100644 --- a/pandas/util/doctools.py +++ b/pandas/util/_doctools.py @@ -113,12 +113,12 @@ def _insert_index(self, data): else: for i in range(idx_nlevels): data.insert(i, 'Index{0}'.format(i), - data.index.get_level_values(i)) + data.index._get_level_values(i)) col_nlevels = data.columns.nlevels if col_nlevels > 1: - col = data.columns.get_level_values(0) - values = [data.columns.get_level_values(i).values + col = data.columns._get_level_values(0) + values = [data.columns._get_level_values(i).values for i in range(1, col_nlevels)] col_df = pd.DataFrame(values) data.columns = col_df.columns @@ -131,7 +131,7 @@ def _make_table(self, ax, df, title, height=None): ax.set_visible(False) return - import pandas.tools.plotting as plotting + import pandas.plotting as plotting idx_nlevels = df.index.nlevels col_nlevels = df.columns.nlevels diff --git a/pandas/util/print_versions.py b/pandas/util/_print_versions.py similarity index 79% rename from pandas/util/print_versions.py rename to pandas/util/_print_versions.py index c7b9f4bdea6b2..83c1433bf5c39 100644 --- a/pandas/util/print_versions.py +++ b/pandas/util/_print_versions.py @@ -38,18 +38,17 @@ def get_sys_info(): (sysname, nodename, release, version, machine, processor) = platform.uname() blob.extend([ - ("python", "%d.%d.%d.%s.%s" % sys.version_info[:]), + ("python", '.'.join(map(str, sys.version_info))), ("python-bits", struct.calcsize("P") * 8), - ("OS", "%s" % (sysname)), - ("OS-release", "%s" % (release)), - # ("Version", "%s" % (version)), - ("machine", "%s" % (machine)), - ("processor", "%s" % (processor)), - ("byteorder", "%s" % sys.byteorder), - ("LC_ALL", "%s" % os.environ.get('LC_ALL', "None")), - ("LANG", "%s" % os.environ.get('LANG', "None")), - ("LOCALE", "%s.%s" % locale.getlocale()), - + ("OS", "{sysname}".format(sysname=sysname)), + ("OS-release", "{release}".format(release=release)), + # ("Version", "{version}".format(version=version)), + ("machine", "{machine}".format(machine=machine)), + ("processor", "{processor}".format(processor=processor)), + ("byteorder", "{byteorder}".format(byteorder=sys.byteorder)), + ("LC_ALL", "{lc}".format(lc=os.environ.get('LC_ALL', "None"))), + ("LANG", "{lang}".format(lang=os.environ.get('LANG', "None"))), + ("LOCALE", '.'.join(map(str, locale.getlocale()))), ]) except: pass @@ -63,13 +62,13 @@ def show_versions(as_json=False): deps = [ # (MODULE_NAME, f(mod) -> mod version) ("pandas", lambda mod: mod.__version__), - ("nose", lambda mod: mod.__version__), + ("pytest", lambda mod: mod.__version__), ("pip", lambda mod: mod.__version__), ("setuptools", lambda mod: mod.__version__), ("Cython", lambda mod: mod.__version__), ("numpy", lambda mod: mod.version.version), ("scipy", lambda mod: mod.version.version), - ("statsmodels", lambda mod: mod.__version__), + ("pyarrow", lambda mod: mod.__version__), ("xarray", lambda mod: mod.__version__), ("IPython", lambda mod: mod.__version__), ("sphinx", lambda mod: mod.__version__), @@ 
-89,14 +88,14 @@ def show_versions(as_json=False): ("lxml", lambda mod: mod.etree.__version__), ("bs4", lambda mod: mod.__version__), ("html5lib", lambda mod: mod.__version__), - ("httplib2", lambda mod: mod.__version__), - ("apiclient", lambda mod: mod.__version__), ("sqlalchemy", lambda mod: mod.__version__), ("pymysql", lambda mod: mod.__version__), ("psycopg2", lambda mod: mod.__version__), ("jinja2", lambda mod: mod.__version__), ("s3fs", lambda mod: mod.__version__), - ("pandas_datareader", lambda mod: mod.__version__) + ("fastparquet", lambda mod: mod.__version__), + ("pandas_gbq", lambda mod: mod.__version__), + ("pandas_datareader", lambda mod: mod.__version__), ] deps_blob = list() @@ -131,11 +130,11 @@ def show_versions(as_json=False): print("------------------") for k, stat in sys_info: - print("%s: %s" % (k, stat)) + print("{k}: {stat}".format(k=k, stat=stat)) print("") for k, stat in deps_blob: - print("%s: %s" % (k, stat)) + print("{k}: {stat}".format(k=k, stat=stat)) def main(): @@ -154,5 +153,6 @@ def main(): return 0 + if __name__ == "__main__": sys.exit(main()) diff --git a/pandas/util/_tester.py b/pandas/util/_tester.py new file mode 100644 index 0000000000000..aeb4259a9edae --- /dev/null +++ b/pandas/util/_tester.py @@ -0,0 +1,27 @@ +""" +Entrypoint for testing from the top-level namespace +""" +import os +import sys + +PKG = os.path.dirname(os.path.dirname(__file__)) + + +try: + import pytest +except ImportError: + def test(): + raise ImportError("Need pytest>=3.0 to run tests") +else: + def test(extra_args=None): + cmd = ['--skip-slow', '--skip-network'] + if extra_args: + if not isinstance(extra_args, list): + extra_args = [extra_args] + cmd = extra_args + cmd += [PKG] + print("running: pytest {}".format(' '.join(cmd))) + sys.exit(pytest.main(cmd)) + + +__all__ = ['test'] diff --git a/pandas/util/validators.py b/pandas/util/_validators.py similarity index 96% rename from pandas/util/validators.py rename to pandas/util/_validators.py index f22412a2bcd17..2661e4a98aedf 100644 --- a/pandas/util/validators.py +++ b/pandas/util/_validators.py @@ -3,7 +3,7 @@ for validating data or function arguments """ -from pandas.types.common import is_bool +from pandas.core.dtypes.common import is_bool def _check_arg_length(fname, args, max_fname_arg_count, compat_args): @@ -220,7 +220,7 @@ def validate_args_and_kwargs(fname, args, kwargs, def validate_bool_kwarg(value, arg_name): """ Ensures that argument passed in arg_name is of type bool. """ if not (is_bool(value) or value is None): - raise ValueError('For argument "%s" expected type bool, ' - 'received type %s.' % - (arg_name, type(value).__name__)) + raise ValueError('For argument "{arg}" expected type bool, received ' + 'type {typ}.'.format(arg=arg_name, + typ=type(value).__name__)) return value diff --git a/pandas/util/decorators.py b/pandas/util/decorators.py index e1888a3ffd62a..54bb834e829f3 100644 --- a/pandas/util/decorators.py +++ b/pandas/util/decorators.py @@ -1,292 +1,8 @@ -from pandas.compat import StringIO, callable, signature -from pandas.lib import cache_readonly # noqa -import sys import warnings -from textwrap import dedent -from functools import wraps +warnings.warn("pandas.util.decorators is deprecated and will be " + "removed in a future version, import " + "from pandas.util", + DeprecationWarning, stacklevel=3) -def deprecate(name, alternative, alt_name=None): - alt_name = alt_name or alternative.__name__ - - def wrapper(*args, **kwargs): - warnings.warn("%s is deprecated. 
Use %s instead" % (name, alt_name), - FutureWarning, stacklevel=2) - return alternative(*args, **kwargs) - return wrapper - - -def deprecate_kwarg(old_arg_name, new_arg_name, mapping=None, stacklevel=2): - """Decorator to deprecate a keyword argument of a function - - Parameters - ---------- - old_arg_name : str - Name of argument in function to deprecate - new_arg_name : str - Name of preferred argument in function - mapping : dict or callable - If mapping is present, use it to translate old arguments to - new arguments. A callable must do its own value checking; - values not found in a dict will be forwarded unchanged. - - Examples - -------- - The following deprecates 'cols', using 'columns' instead - - >>> @deprecate_kwarg(old_arg_name='cols', new_arg_name='columns') - ... def f(columns=''): - ... print(columns) - ... - >>> f(columns='should work ok') - should work ok - >>> f(cols='should raise warning') - FutureWarning: cols is deprecated, use columns instead - warnings.warn(msg, FutureWarning) - should raise warning - >>> f(cols='should error', columns="can\'t pass do both") - TypeError: Can only specify 'cols' or 'columns', not both - >>> @deprecate_kwarg('old', 'new', {'yes': True, 'no': False}) - ... def f(new=False): - ... print('yes!' if new else 'no!') - ... - >>> f(old='yes') - FutureWarning: old='yes' is deprecated, use new=True instead - warnings.warn(msg, FutureWarning) - yes! - - """ - if mapping is not None and not hasattr(mapping, 'get') and \ - not callable(mapping): - raise TypeError("mapping from old to new argument values " - "must be dict or callable!") - - def _deprecate_kwarg(func): - @wraps(func) - def wrapper(*args, **kwargs): - old_arg_value = kwargs.pop(old_arg_name, None) - if old_arg_value is not None: - if mapping is not None: - if hasattr(mapping, 'get'): - new_arg_value = mapping.get(old_arg_value, - old_arg_value) - else: - new_arg_value = mapping(old_arg_value) - msg = "the %s=%r keyword is deprecated, " \ - "use %s=%r instead" % \ - (old_arg_name, old_arg_value, - new_arg_name, new_arg_value) - else: - new_arg_value = old_arg_value - msg = "the '%s' keyword is deprecated, " \ - "use '%s' instead" % (old_arg_name, new_arg_name) - - warnings.warn(msg, FutureWarning, stacklevel=stacklevel) - if kwargs.get(new_arg_name, None) is not None: - msg = ("Can only specify '%s' or '%s', not both" % - (old_arg_name, new_arg_name)) - raise TypeError(msg) - else: - kwargs[new_arg_name] = new_arg_value - return func(*args, **kwargs) - return wrapper - return _deprecate_kwarg - - -# Substitution and Appender are derived from matplotlib.docstring (1.1.0) -# module http://matplotlib.org/users/license.html - - -class Substitution(object): - """ - A decorator to take a function's docstring and perform string - substitution on it. - - This decorator should be robust even if func.__doc__ is None - (for example, if -OO was passed to the interpreter) - - Usage: construct a docstring.Substitution with a sequence or - dictionary suitable for performing substitution; then - decorate a suitable function with the constructed object. e.g. - - sub_author_name = Substitution(author='Jason') - - @sub_author_name - def some_function(x): - "%(author)s wrote this function" - - # note that some_function.__doc__ is now "Jason wrote this function" - - One can also use positional arguments.
- - sub_first_last_names = Substitution('Edgar Allen', 'Poe') - - @sub_first_last_names - def some_function(x): - "%s %s wrote the Raven" - """ - def __init__(self, *args, **kwargs): - if (args and kwargs): - raise AssertionError("Only positional or keyword args are allowed") - - self.params = args or kwargs - - def __call__(self, func): - func.__doc__ = func.__doc__ and func.__doc__ % self.params - return func - - def update(self, *args, **kwargs): - "Assume self.params is a dict and update it with supplied args" - self.params.update(*args, **kwargs) - - @classmethod - def from_params(cls, params): - """ - In the case where the params is a mutable sequence (list or dictionary) - and it may change before this class is called, one may explicitly use a - reference to the params rather than using *args or **kwargs which will - copy the values and not reference them. - """ - result = cls() - result.params = params - return result - - -class Appender(object): - """ - A function decorator that will append an addendum to the docstring - of the target function. - - This decorator should be robust even if func.__doc__ is None - (for example, if -OO was passed to the interpreter). - - Usage: construct a docstring.Appender with a string to be joined to - the original docstring. An optional 'join' parameter may be supplied - which will be used to join the docstring and addendum. e.g. - - add_copyright = Appender("Copyright (c) 2009", join='\n') - - @add_copyright - def my_dog(has='fleas'): - "This docstring will have a copyright below" - pass - """ - def __init__(self, addendum, join='', indents=0): - if indents > 0: - self.addendum = indent(addendum, indents=indents) - else: - self.addendum = addendum - self.join = join - - def __call__(self, func): - func.__doc__ = func.__doc__ if func.__doc__ else '' - self.addendum = self.addendum if self.addendum else '' - docitems = [func.__doc__, self.addendum] - func.__doc__ = dedent(self.join.join(docitems)) - return func - - -def indent(text, indents=1): - if not text or not isinstance(text, str): - return '' - jointext = ''.join(['\n'] + [' '] * indents) - return jointext.join(text.split('\n')) - - -def suppress_stdout(f): - def wrapped(*args, **kwargs): - try: - sys.stdout = StringIO() - f(*args, **kwargs) - finally: - sys.stdout = sys.__stdout__ - - return wrapped - - -class KnownFailureTest(Exception): - """Raise this exception to mark a test as a known failing test.""" - pass - - -def knownfailureif(fail_condition, msg=None): - """ - Make function raise KnownFailureTest exception if given condition is true. - - If the condition is a callable, it is used at runtime to dynamically - make the decision. This is useful for tests that may require costly - imports, to delay the cost until the test suite is actually executed. - - Parameters - ---------- - fail_condition : bool or callable - Flag to determine whether to mark the decorated test as a known - failure (if True) or not (if False). - msg : str, optional - Message to give on raising a KnownFailureTest exception. - Default is None. - - Returns - ------- - decorator : function - Decorator, which, when applied to a function, causes SkipTest - to be raised when `skip_condition` is True, and the function - to be called normally otherwise. - - Notes - ----- - The decorator itself is decorated with the ``nose.tools.make_decorator`` - function in order to transmit function name, and various other metadata. 
- - """ - if msg is None: - msg = 'Test skipped due to known failure' - - # Allow for both boolean or callable known failure conditions. - if callable(fail_condition): - fail_val = fail_condition - else: - fail_val = lambda: fail_condition - - def knownfail_decorator(f): - # Local import to avoid a hard nose dependency and only incur the - # import time overhead at actual test-time. - import nose - - def knownfailer(*args, **kwargs): - if fail_val(): - raise KnownFailureTest(msg) - else: - return f(*args, **kwargs) - return nose.tools.make_decorator(f)(knownfailer) - - return knownfail_decorator - - -def make_signature(func): - """ - Returns a string repr of the arg list of a func call, with any defaults - - Examples - -------- - - >>> def f(a,b,c=2) : - >>> return a*b*c - >>> print(_make_signature(f)) - a,b,c=2 - """ - spec = signature(func) - if spec.defaults is None: - n_wo_defaults = len(spec.args) - defaults = ('',) * n_wo_defaults - else: - n_wo_defaults = len(spec.args) - len(spec.defaults) - defaults = ('',) * n_wo_defaults + spec.defaults - args = [] - for i, (var, default) in enumerate(zip(spec.args, defaults)): - args.append(var if default == '' else var + '=' + repr(default)) - if spec.varargs: - args.append('*' + spec.varargs) - if spec.keywords: - args.append('**' + spec.keywords) - return args, spec.args +from pandas.util._decorators import * # noqa diff --git a/pandas/util/depr_module.py b/pandas/util/depr_module.py deleted file mode 100644 index 736d2cdaab31c..0000000000000 --- a/pandas/util/depr_module.py +++ /dev/null @@ -1,64 +0,0 @@ -""" -This module houses a utility class for mocking deprecated modules. -It is for internal use only and should not be used beyond this purpose. -""" - -import warnings -import importlib - - -class _DeprecatedModule(object): - """ Class for mocking deprecated modules. - - Parameters - ---------- - deprmod : name of module to be deprecated. - removals : objects or methods in module that will no longer be - accessible once module is removed. - """ - def __init__(self, deprmod, removals=None): - self.deprmod = deprmod - self.removals = removals - if self.removals is not None: - self.removals = frozenset(self.removals) - - # For introspection purposes. - self.self_dir = frozenset(dir(self.__class__)) - - def __dir__(self): - deprmodule = self._import_deprmod() - return dir(deprmodule) - - def __repr__(self): - deprmodule = self._import_deprmod() - return repr(deprmodule) - - __str__ = __repr__ - - def __getattr__(self, name): - if name in self.self_dir: - return object.__getattribute__(self, name) - - deprmodule = self._import_deprmod() - obj = getattr(deprmodule, name) - - if self.removals is not None and name in self.removals: - warnings.warn( - "{deprmod}.{name} is deprecated and will be removed in " - "a future version.".format(deprmod=self.deprmod, name=name), - FutureWarning, stacklevel=2) - else: - # The object is actually located in another module. - warnings.warn( - "{deprmod}.{name} is deprecated. 
Please use " - "{modname}.{name} instead.".format( - deprmod=self.deprmod, modname=obj.__module__, name=name), - FutureWarning, stacklevel=2) - - return obj - - def _import_deprmod(self): - with warnings.catch_warnings(): - warnings.filterwarnings('ignore', category=FutureWarning) - deprmodule = importlib.import_module(self.deprmod) - return deprmodule diff --git a/pandas/util/hashing.py b/pandas/util/hashing.py new file mode 100644 index 0000000000000..f97a7ac507407 --- /dev/null +++ b/pandas/util/hashing.py @@ -0,0 +1,18 @@ +import warnings +import sys + +m = sys.modules['pandas.util.hashing'] +for t in ['hash_pandas_object', 'hash_array']: + + def outer(t=t): + + def wrapper(*args, **kwargs): + from pandas import util + warnings.warn("pandas.util.hashing is deprecated and will be " + "removed in a future version, import " + "from pandas.util", + DeprecationWarning, stacklevel=3) + return getattr(util, t)(*args, **kwargs) + return wrapper + + setattr(m, t, outer(t)) diff --git a/pandas/util/nosetester.py b/pandas/util/nosetester.py deleted file mode 100644 index 1bdaaff99fd50..0000000000000 --- a/pandas/util/nosetester.py +++ /dev/null @@ -1,261 +0,0 @@ -""" -Nose test running. - -This module implements ``test()`` function for pandas modules. - -""" -from __future__ import division, absolute_import, print_function - -import os -import sys -import warnings -from pandas.compat import string_types -from numpy.testing import nosetester - - -def get_package_name(filepath): - """ - Given a path where a package is installed, determine its name. - - Parameters - ---------- - filepath : str - Path to a file. If the determination fails, "pandas" is returned. - - Examples - -------- - >>> pandas.util.nosetester.get_package_name('nonsense') - 'pandas' - - """ - - pkg_name = [] - while 'site-packages' in filepath or 'dist-packages' in filepath: - filepath, p2 = os.path.split(filepath) - if p2 in ('site-packages', 'dist-packages'): - break - pkg_name.append(p2) - - # if package name determination failed, just default to pandas - if not pkg_name: - return "pandas" - - # otherwise, reverse to get correct order and return - pkg_name.reverse() - - # don't include the outer egg directory - if pkg_name[0].endswith('.egg'): - pkg_name.pop(0) - - return '.'.join(pkg_name) - -import_nose = nosetester.import_nose -run_module_suite = nosetester.run_module_suite - - -class NoseTester(nosetester.NoseTester): - """ - Nose test runner. - - This class is made available as pandas.util.nosetester.NoseTester, and - a test function is typically added to a package's __init__.py like so:: - - from numpy.testing import Tester - test = Tester().test - - Calling this test function finds and runs all tests associated with the - package and all its sub-packages. - - Attributes - ---------- - package_path : str - Full path to the package to test. - package_name : str - Name of the package to test. - - Parameters - ---------- - package : module, str or None, optional - The package to test. If a string, this should be the full path to - the package. If None (default), `package` is set to the module from - which `NoseTester` is initialized. - raise_warnings : None, str or sequence of warnings, optional - This specifies which warnings to configure as 'raise' instead - of 'warn' during the test execution. Valid strings are: - - - "develop" : equals ``(DeprecationWarning, RuntimeWarning)`` - - "release" : equals ``()``, don't raise on any warnings. - - See Notes for more details. 
diff --git a/pandas/util/nosetester.py b/pandas/util/nosetester.py
deleted file mode 100644
index 1bdaaff99fd50..0000000000000
--- a/pandas/util/nosetester.py
+++ /dev/null
@@ -1,261 +0,0 @@
-"""
-Nose test running.
-
-This module implements ``test()`` function for pandas modules.
-
-"""
-from __future__ import division, absolute_import, print_function
-
-import os
-import sys
-import warnings
-from pandas.compat import string_types
-from numpy.testing import nosetester
-
-
-def get_package_name(filepath):
-    """
-    Given a path where a package is installed, determine its name.
-
-    Parameters
-    ----------
-    filepath : str
-        Path to a file. If the determination fails, "pandas" is returned.
-
-    Examples
-    --------
-    >>> pandas.util.nosetester.get_package_name('nonsense')
-    'pandas'
-
-    """
-
-    pkg_name = []
-    while 'site-packages' in filepath or 'dist-packages' in filepath:
-        filepath, p2 = os.path.split(filepath)
-        if p2 in ('site-packages', 'dist-packages'):
-            break
-        pkg_name.append(p2)
-
-    # if package name determination failed, just default to pandas
-    if not pkg_name:
-        return "pandas"
-
-    # otherwise, reverse to get correct order and return
-    pkg_name.reverse()
-
-    # don't include the outer egg directory
-    if pkg_name[0].endswith('.egg'):
-        pkg_name.pop(0)
-
-    return '.'.join(pkg_name)
-
-import_nose = nosetester.import_nose
-run_module_suite = nosetester.run_module_suite
-
-
-class NoseTester(nosetester.NoseTester):
-    """
-    Nose test runner.
-
-    This class is made available as pandas.util.nosetester.NoseTester, and
-    a test function is typically added to a package's __init__.py like so::
-
-      from numpy.testing import Tester
-      test = Tester().test
-
-    Calling this test function finds and runs all tests associated with the
-    package and all its sub-packages.
-
-    Attributes
-    ----------
-    package_path : str
-        Full path to the package to test.
-    package_name : str
-        Name of the package to test.
-
-    Parameters
-    ----------
-    package : module, str or None, optional
-        The package to test. If a string, this should be the full path to
-        the package. If None (default), `package` is set to the module from
-        which `NoseTester` is initialized.
-    raise_warnings : None, str or sequence of warnings, optional
-        This specifies which warnings to configure as 'raise' instead
-        of 'warn' during the test execution. Valid strings are:
-
-        - "develop" : equals ``(DeprecationWarning, RuntimeWarning)``
-        - "release" : equals ``()``, don't raise on any warnings.
-
-        See Notes for more details.
-
-    Notes
-    -----
-    The default for `raise_warnings` is
-    ``(DeprecationWarning, RuntimeWarning)`` for development versions of
-    pandas, and ``()`` for released versions. The purpose of this switching
-    behavior is to catch as many warnings as possible during development,
-    but not give problems for packaging of released versions.
-
-    """
-    excludes = []
-
-    def _show_system_info(self):
-        nose = import_nose()
-
-        import pandas
-        print("pandas version %s" % pandas.__version__)
-        import numpy
-        print("numpy version %s" % numpy.__version__)
-        pddir = os.path.dirname(pandas.__file__)
-        print("pandas is installed in %s" % pddir)
-
-        pyversion = sys.version.replace('\n', '')
-        print("Python version %s" % pyversion)
-        print("nose version %d.%d.%d" % nose.__versioninfo__)
-
-    def _get_custom_doctester(self):
-        """ Return instantiated plugin for doctests
-
-        Allows subclassing of this class to override doctester
-
-        A return value of None means use the nose builtin doctest plugin
-        """
-        return None
-
-    def _test_argv(self, label, verbose, extra_argv):
-        """
-        Generate argv for nosetest command
-
-        Parameters
-        ----------
-        label : {'fast', 'full', '', attribute identifier}, optional
-            see ``test`` docstring
-        verbose : int, optional
-            Verbosity value for test outputs, in the range 1-10. Default is 1.
-        extra_argv : list, optional
-            List with any extra arguments to pass to nosetests.
-
-        Returns
-        -------
-        argv : list
-            command line arguments that will be passed to nose
-        """
-
-        argv = [__file__, self.package_path]
-        if label and label != 'full':
-            if not isinstance(label, string_types):
-                raise TypeError('Selection label should be a string')
-            if label == 'fast':
-                label = 'not slow and not network and not disabled'
-            argv += ['-A', label]
-        argv += ['--verbosity', str(verbose)]
-
-        # When installing with setuptools, and also in some other cases, the
-        # test_*.py files end up marked +x executable. Nose, by default, does
-        # not run files marked with +x as they might be scripts. However, in
-        # our case nose only looks for test_*.py files under the package
-        # directory, which should be safe.
-        argv += ['--exe']
-
-        if extra_argv:
-            argv += extra_argv
-        return argv
-
-    def test(self, label='fast', verbose=1, extra_argv=None,
-             doctests=False, coverage=False, raise_warnings=None):
-        """
-        Run tests for module using nose.
-
-        Parameters
-        ----------
-        label : {'fast', 'full', '', attribute identifier}, optional
-            Identifies the tests to run. This can be a string to pass to
-            the nosetests executable with the '-A' option, or one of several
-            special values. Special values are:
-
-            * 'fast' - the default - which corresponds to the ``nosetests -A``
-              option of 'not slow'.
-            * 'full' - fast (as above) and slow tests as in the
-              'no -A' option to nosetests - this is the same as ''.
-            * None or '' - run all tests.
-            * attribute_identifier - string passed directly to nosetests
-              as '-A'.
-
-        verbose : int, optional
-            Verbosity value for test outputs, in the range 1-10. Default is 1.
-        extra_argv : list, optional
-            List with any extra arguments to pass to nosetests.
-        doctests : bool, optional
-            If True, run doctests in module. Default is False.
-        coverage : bool, optional
-            If True, report coverage of NumPy code. Default is False.
-            (This requires the `coverage module
-            `_).
-        raise_warnings : str or sequence of warnings, optional
-            This specifies which warnings to configure as 'raise' instead
-            of 'warn' during the test execution. Valid strings are:
-
-              - 'develop' : equals ``(DeprecationWarning, RuntimeWarning)``
-              - 'release' : equals ``()``, don't raise on any warnings.
-
-        Returns
-        -------
-        result : object
-            Returns the result of running the tests as a
-            ``nose.result.TextTestResult`` object.
-
-        """
-
-        # cap verbosity at 3 because nose becomes *very* verbose beyond that
-        verbose = min(verbose, 3)
-
-        if doctests:
-            print("Running unit tests and doctests for %s" % self.package_name)
-        else:
-            print("Running unit tests for %s" % self.package_name)
-
-        self._show_system_info()
-
-        # reset doctest state on every run
-        import doctest
-        doctest.master = None
-
-        if raise_warnings is None:
-
-            # default based on if we are released
-            from pandas import __version__
-            from distutils.version import StrictVersion
-            try:
-                StrictVersion(__version__)
-                raise_warnings = 'release'
-            except ValueError:
-                raise_warnings = 'develop'
-
-        _warn_opts = dict(develop=(DeprecationWarning, RuntimeWarning),
-                          release=())
-        if isinstance(raise_warnings, string_types):
-            raise_warnings = _warn_opts[raise_warnings]
-
-        with warnings.catch_warnings():
-
-            if len(raise_warnings):
-
-                # Reset the warning filters to the default state,
-                # so that running the tests is more repeatable.
-                warnings.resetwarnings()
-                # Set all warnings to 'warn', this is because the default
-                # 'once' has the bad property of possibly shadowing later
-                # warnings.
-                warnings.filterwarnings('always')
-                # Force the requested warnings to raise
-                for warningtype in raise_warnings:
-                    warnings.filterwarnings('error', category=warningtype)
-                # Filter out annoying import messages.
-                warnings.filterwarnings("ignore", category=FutureWarning)
-
-            from numpy.testing.noseclasses import NumpyTestProgram
-            argv, plugins = self.prepare_test_args(
-                label, verbose, extra_argv, doctests, coverage)
-            t = NumpyTestProgram(argv=argv, exit=False, plugins=plugins)
-
-        return t.result
diff --git a/pandas/util/testing.py b/pandas/util/testing.py
index 6ea91543677a7..7dac83953ad8f 100644
--- a/pandas/util/testing.py
+++ b/pandas/util/testing.py
@@ -10,7 +10,6 @@
 import os
 import subprocess
 import locale
-import unittest
 import traceback
 from datetime import datetime
@@ -19,42 +18,44 @@
 from distutils.version import LooseVersion
 
 from numpy.random import randn, rand
-from numpy.testing.decorators import slow  # noqa
 import numpy as np
 
 import pandas as pd
-from pandas.types.missing import array_equivalent
-from pandas.types.common import (is_datetimelike_v_numeric,
-                                 is_datetimelike_v_object,
-                                 is_number, is_bool,
-                                 needs_i8_conversion,
-                                 is_categorical_dtype,
-                                 is_sequence,
-                                 is_list_like)
-from pandas.formats.printing import pprint_thing
+from pandas.core.dtypes.missing import array_equivalent
+from pandas.core.dtypes.common import (
+    is_datetimelike_v_numeric,
+    is_datetimelike_v_object,
+    is_number, is_bool,
+    needs_i8_conversion,
+    is_categorical_dtype,
+    is_interval_dtype,
+    is_sequence,
+    is_list_like)
+from pandas.io.formats.printing import pprint_thing
 from pandas.core.algorithms import take_1d
 
 import pandas.compat as compat
 from pandas.compat import (
     filter, map, zip, range, unichr, lrange, lmap, lzip, u, callable, Counter,
     raise_with_traceback, httplib, is_platform_windows, is_platform_32bit,
-    PY3
+    StringIO, PY3
 )
 
-from pandas.computation import expressions as expr
+from pandas.core.computation import expressions as expr
 
-from pandas import (bdate_range, CategoricalIndex, Categorical, DatetimeIndex,
-                    TimedeltaIndex, PeriodIndex, RangeIndex, Index, MultiIndex,
+from pandas import (bdate_range, CategoricalIndex, Categorical, IntervalIndex,
+                    DatetimeIndex, TimedeltaIndex, PeriodIndex, RangeIndex,
+                    Index, MultiIndex,
                     Series, DataFrame, Panel, Panel4D)
-from pandas.util.decorators import deprecate
-from pandas import _testing
+
+from pandas._libs import testing as _testing
 from pandas.io.common import urlopen
 
+
 N = 30
 K = 4
 _RAISE_NETWORK_ERROR_DEFAULT = False
 
-
 # set testing_mode
 _testing_mode_warnings = (DeprecationWarning, compat.ResourceWarning)
 
@@ -72,62 +73,110 @@ def reset_testing_mode():
     if 'deprecate' in testing_mode:
         warnings.simplefilter('ignore', _testing_mode_warnings)
 
+
 set_testing_mode()
 
 
-class TestCase(unittest.TestCase):
+def reset_display_options():
+    """
+    Reset the display options for printing and representing objects.
+    """
 
-    @classmethod
-    def setUpClass(cls):
-        pd.set_option('chained_assignment', 'raise')
+    pd.reset_option('^display.', silent=True)
+
+
+def round_trip_pickle(obj, path=None):
+    """
+    Pickle an object and then read it again.
+
+    Parameters
+    ----------
+    obj : pandas object
+        The object to pickle and then re-read.
+    path : str, default None
+        The path where the pickled object is written and then read.
+
+    Returns
+    -------
+    round_trip_pickled_object : pandas object
+        The original object that was pickled and then re-read.
+    """
+
+    if path is None:
+        path = u('__{random_bytes}__.pickle'.format(random_bytes=rands(10)))
+    with ensure_clean(path) as path:
+        pd.to_pickle(obj, path)
+        return pd.read_pickle(path)
 
-    @classmethod
-    def tearDownClass(cls):
-        pass
 
-    def reset_display_options(self):
-        # reset the display options
-        pd.reset_option('^display.', silent=True)
+def round_trip_pathlib(writer, reader, path=None):
+    """
+    Write an object to file specified by a pathlib.Path and read it back
+
+    Parameters
+    ----------
+    writer : callable bound to pandas object
+        IO writing function (e.g. DataFrame.to_csv )
+    reader : callable
+        IO reading function (e.g. pd.read_csv )
+    path : str, default None
+        The path where the object is written and then read.
 
-    def round_trip_pickle(self, obj, path=None):
-        if path is None:
-            path = u('__%s__.pickle' % rands(10))
-        with ensure_clean(path) as path:
-            pd.to_pickle(obj, path)
-            return pd.read_pickle(path)
+    Returns
+    -------
+    round_trip_object : pandas object
+        The original object that was serialized and then re-read.
+    """
 
-    # https://docs.python.org/3/library/unittest.html#deprecated-aliases
-    def assertEquals(self, *args, **kwargs):
-        return deprecate('assertEquals',
-                         self.assertEqual)(*args, **kwargs)
+    import pytest
+    Path = pytest.importorskip('pathlib').Path
+    if path is None:
+        path = '___pathlib___'
+    with ensure_clean(path) as path:
+        writer(Path(path))
+        obj = reader(Path(path))
+    return obj
 
-    def assertNotEquals(self, *args, **kwargs):
-        return deprecate('assertNotEquals',
-                         self.assertNotEqual)(*args, **kwargs)
 
-    def assert_(self, *args, **kwargs):
-        return deprecate('assert_',
-                         self.assertTrue)(*args, **kwargs)
+def round_trip_localpath(writer, reader, path=None):
+    """
+    Write an object to file specified by a py.path LocalPath and read it back
 
-    def assertAlmostEquals(self, *args, **kwargs):
-        return deprecate('assertAlmostEquals',
-                         self.assertAlmostEqual)(*args, **kwargs)
+    Parameters
+    ----------
+    writer : callable bound to pandas object
+        IO writing function (e.g. DataFrame.to_csv )
+    reader : callable
+        IO reading function (e.g. pd.read_csv )
+    path : str, default None
+        The path where the object is written and then read.
 
-    def assertNotAlmostEquals(self, *args, **kwargs):
-        return deprecate('assertNotAlmostEquals',
-                         self.assertNotAlmostEqual)(*args, **kwargs)
+    Returns
+    -------
+    round_trip_object : pandas object
+        The original object that was serialized and then re-read.
+    """
+    import pytest
+    LocalPath = pytest.importorskip('py.path').local
+    if path is None:
+        path = '___localpath___'
+    with ensure_clean(path) as path:
+        writer(LocalPath(path))
+        obj = reader(LocalPath(path))
+    return obj
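A quick sketch of how the new module-level round-trip helpers are meant to be used in tests (the objects are illustrative):

```python
import pandas as pd
import pandas.util.testing as tm

s = pd.Series([1, 2, 3], name='x')

# Pickle to a temporary file and read it back; ensure_clean removes it.
result = tm.round_trip_pickle(s)
tm.assert_series_equal(result, s)

# pathlib round trip: pass a bound writer and the matching reader.
df = pd.DataFrame({'a': [1, 2]})
result = tm.round_trip_pathlib(df.to_pickle, pd.read_pickle)
tm.assert_frame_equal(result, df)
```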
raise nose.SkipTest("no scipy.stats module") - try: - import scipy.interpolate # noqa - except ImportError: - import nose - raise nose.SkipTest('scipy.interpolate missing') - + import pytest -def _skip_if_scipy_0_17(): - import scipy - v = scipy.__version__ - if v >= LooseVersion("0.17.0"): - import nose - raise nose.SkipTest("scipy 0.17") + pytest.importorskip("scipy.stats") + pytest.importorskip("scipy.sparse") + pytest.importorskip("scipy.interpolate") -def _skip_if_no_lzma(): +def _check_if_lzma(): try: return compat.import_lzma() except ImportError: - import nose - raise nose.SkipTest('need backports.lzma to run') + return False -def _skip_if_no_xarray(): - try: - import xarray - except ImportError: - import nose - raise nose.SkipTest("xarray not installed") - - v = xarray.__version__ - if v < LooseVersion('0.7.0'): - import nose - raise nose.SkipTest("xarray not version is too low: {0}".format(v)) +def _skip_if_no_lzma(): + import pytest + return _check_if_lzma() or pytest.skip('need backports.lzma to run') -def _skip_if_no_pytz(): - try: - import pytz # noqa - except ImportError: - import nose - raise nose.SkipTest("pytz not installed") +def _skip_if_no_xarray(): + import pytest + xarray = pytest.importorskip("xarray") + v = xarray.__version__ -def _skip_if_no_dateutil(): - try: - import dateutil # noqa - except ImportError: - import nose - raise nose.SkipTest("dateutil not installed") + if v < LooseVersion('0.7.0'): + import pytest + pytest.skip("xarray version is too low: {version}".format(version=v)) def _skip_if_windows_python_3(): if PY3 and is_platform_windows(): - import nose - raise nose.SkipTest("not used on python 3/win32") + import pytest + pytest.skip("not used on python 3/win32") def _skip_if_windows(): if is_platform_windows(): - import nose - raise nose.SkipTest("Running on Windows") + import pytest + pytest.skip("Running on Windows") def _skip_if_no_pathlib(): try: from pathlib import Path # noqa except ImportError: - import nose - raise nose.SkipTest("pathlib not available") + import pytest + pytest.skip("pathlib not available") def _skip_if_no_localpath(): try: from py.path import local as LocalPath # noqa except ImportError: - import nose - raise nose.SkipTest("py.path not installed") + import pytest + pytest.skip("py.path not installed") def _incompat_bottleneck_version(method): @@ -385,32 +419,49 @@ def _incompat_bottleneck_version(method): def skip_if_no_ne(engine='numexpr'): - from pandas.computation.expressions import (_USE_NUMEXPR, - _NUMEXPR_INSTALLED) + from pandas.core.computation.expressions import ( + _USE_NUMEXPR, + _NUMEXPR_INSTALLED) if engine == 'numexpr': if not _USE_NUMEXPR: - import nose - raise nose.SkipTest("numexpr enabled->{enabled}, " - "installed->{installed}".format( - enabled=_USE_NUMEXPR, - installed=_NUMEXPR_INSTALLED)) + import pytest + pytest.skip("numexpr enabled->{enabled}, " + "installed->{installed}".format( + enabled=_USE_NUMEXPR, + installed=_NUMEXPR_INSTALLED)) def _skip_if_has_locale(): import locale lang, _ = locale.getlocale() if lang is not None: - import nose - raise nose.SkipTest("Specific locale is set {0}".format(lang)) + import pytest + pytest.skip("Specific locale is set {lang}".format(lang=lang)) def _skip_if_not_us_locale(): import locale lang, _ = locale.getlocale() if lang != 'en_US': - import nose - raise nose.SkipTest("Specific locale is set {0}".format(lang)) + import pytest + pytest.skip("Specific locale is set {lang}".format(lang=lang)) + + +def _skip_if_no_mock(): + try: + import mock # noqa + except 
ImportError: + try: + from unittest import mock # noqa + except ImportError: + import pytest + raise pytest.skip("mock is not installed") + + +def _skip_if_no_ipython(): + import pytest + pytest.importorskip("IPython") # ----------------------------------------------------------------------------- # locale utilities @@ -455,8 +506,8 @@ def _default_locale_getter(): try: raw_locales = check_output(['locale -a'], shell=True) except subprocess.CalledProcessError as e: - raise type(e)("%s, the 'locale -a' command cannot be found on your " - "system" % e) + raise type(e)("{exception}, the 'locale -a' command cannot be found " + "on your system".format(exception=e)) return raw_locales @@ -513,7 +564,8 @@ def get_locales(prefix=None, normalize=True, if prefix is None: return _valid_locales(out_locales, normalize) - found = re.compile('%s.*' % prefix).findall('\n'.join(out_locales)) + found = re.compile('{prefix}.*'.format(prefix=prefix)) \ + .findall('\n'.join(out_locales)) return _valid_locales(found, normalize) @@ -597,6 +649,105 @@ def _valid_locales(locales, normalize): return list(filter(_can_set_locale, map(normalizer, locales))) +# ----------------------------------------------------------------------------- +# Stdout / stderr decorators + + +def capture_stdout(f): + """ + Decorator to capture stdout in a buffer so that it can be checked + (or suppressed) during testing. + + Parameters + ---------- + f : callable + The test that is capturing stdout. + + Returns + ------- + f : callable + The decorated test ``f``, which captures stdout. + + Examples + -------- + + >>> from pandas.util.testing import capture_stdout + >>> + >>> import sys + >>> + >>> @capture_stdout + ... def test_print_pass(): + ... print("foo") + ... out = sys.stdout.getvalue() + ... assert out == "foo\n" + >>> + >>> @capture_stdout + ... def test_print_fail(): + ... print("foo") + ... out = sys.stdout.getvalue() + ... assert out == "bar\n" + ... + AssertionError: assert 'foo\n' == 'bar\n' + """ + + @wraps(f) + def wrapper(*args, **kwargs): + try: + sys.stdout = StringIO() + f(*args, **kwargs) + finally: + sys.stdout = sys.__stdout__ + + return wrapper + + +def capture_stderr(f): + """ + Decorator to capture stderr in a buffer so that it can be checked + (or suppressed) during testing. + + Parameters + ---------- + f : callable + The test that is capturing stderr. + + Returns + ------- + f : callable + The decorated test ``f``, which captures stderr. + + Examples + -------- + + >>> from pandas.util.testing import capture_stderr + >>> + >>> import sys + >>> + >>> @capture_stderr + ... def test_stderr_pass(): + ... sys.stderr.write("foo") + ... out = sys.stderr.getvalue() + ... assert out == "foo\n" + >>> + >>> @capture_stderr + ... def test_stderr_fail(): + ... sys.stderr.write("foo") + ... out = sys.stderr.getvalue() + ... assert out == "bar\n" + ... 
+    AssertionError: assert 'foo\n' == 'bar\n'
+    """
+
+    @wraps(f)
+    def wrapper(*args, **kwargs):
+        try:
+            sys.stderr = StringIO()
+            f(*args, **kwargs)
+        finally:
+            sys.stderr = sys.__stderr__
+
+    return wrapper
+
 
 # -----------------------------------------------------------------------------
 # Console debugging tools
 
@@ -660,8 +811,8 @@ def ensure_clean(filename=None, return_filelike=False):
         try:
             fd, filename = tempfile.mkstemp(suffix=filename)
         except UnicodeEncodeError:
-            import nose
-            raise nose.SkipTest('no unicode file names on this system')
+            import pytest
+            pytest.skip('no unicode file names on this system')
 
         try:
             yield filename
@@ -669,13 +820,13 @@ def ensure_clean(filename=None, return_filelike=False):
             try:
                 os.close(fd)
             except Exception as e:
-                print("Couldn't close file descriptor: %d (file: %s)" %
-                      (fd, filename))
+                print("Couldn't close file descriptor: {fdesc} (file: {fname})"
+                      .format(fdesc=fd, fname=filename))
             try:
                 if os.path.exists(filename):
                     os.remove(filename)
             except Exception as e:
-                print("Exception on removing file: %s" % e)
+                print("Exception on removing file: {error}".format(error=e))
 
 
 def get_data_path(f=''):
@@ -697,23 +848,6 @@ def equalContents(arr1, arr2):
     return frozenset(arr1) == frozenset(arr2)
 
 
-def assert_equal(a, b, msg=""):
-    """asserts that a equals b, like nose's assert_equal,
-    but allows custom message to start. Passes a and b to
-    format string as well. So you can use '{0}' and '{1}'
-    to display a and b.
-
-    Examples
-    --------
-    >>> assert_equal(2, 2, "apples")
-    >>> assert_equal(5.2, 1.2, "{0} was really a dead parrot")
-    Traceback (most recent call last):
-        ...
-    AssertionError: 5.2 was really a dead parrot: 5.2 != 1.2
-    """
-    assert a == b, "%s: %r != %r" % (msg.format(a, b), a, b)
-
-
 def assert_index_equal(left, right, exact='equiv', check_names=True,
                        check_less_precise=False, check_exact=True,
                        check_categorical=True, obj='Index'):
@@ -725,8 +859,8 @@ def assert_index_equal(left, right, exact='equiv', check_names=True,
     right : Index
     exact : bool / string {'equiv'}, default False
         Whether to check the Index class, dtype and inferred_type
-        are identical. If 'equiv', then RangeIndex can be substitued for
-        Int64Index as well
+        are identical. If 'equiv', then RangeIndex can be substituted for
+        Int64Index as well.
     check_names : bool, default True
         Whether to check the names attribute.
     check_less_precise : bool or int, default False
@@ -748,7 +882,7 @@ def _check_types(l, r, obj='Index'):
             assert_attr_equal('dtype', l, r, obj=obj)
             # allow string-like to have different inferred_types
             if l.inferred_type in ('string', 'unicode'):
-                assertIn(r.inferred_type, ('string', 'unicode'))
+                assert r.inferred_type in ('string', 'unicode')
             else:
                 assert_attr_equal('inferred_type', l, r, obj=obj)
 
@@ -761,23 +895,24 @@ def _get_ilevel_values(index, level):
         return values
 
     # instance validation
-    assertIsInstance(left, Index, '[index] ')
-    assertIsInstance(right, Index, '[index] ')
+    _check_isinstance(left, right, Index)
 
     # class / dtype comparison
     _check_types(left, right, obj=obj)
 
     # level comparison
     if left.nlevels != right.nlevels:
-        raise_assert_detail(obj, '{0} levels are different'.format(obj),
-                            '{0}, {1}'.format(left.nlevels, left),
-                            '{0}, {1}'.format(right.nlevels, right))
+        msg1 = '{obj} levels are different'.format(obj=obj)
+        msg2 = '{nlevels}, {left}'.format(nlevels=left.nlevels, left=left)
+        msg3 = '{nlevels}, {right}'.format(nlevels=right.nlevels, right=right)
+        raise_assert_detail(obj, msg1, msg2, msg3)
 
     # length comparison
     if len(left) != len(right):
-        raise_assert_detail(obj, '{0} length are different'.format(obj),
-                            '{0}, {1}'.format(len(left), left),
-                            '{0}, {1}'.format(len(right), right))
+        msg1 = '{obj} length are different'.format(obj=obj)
+        msg2 = '{length}, {left}'.format(length=len(left), left=left)
+        msg3 = '{length}, {right}'.format(length=len(right), right=right)
+        raise_assert_detail(obj, msg1, msg2, msg3)
 
     # MultiIndex special comparison for little-friendly error messages
     if left.nlevels > 1:
         for level in range(left.nlevels):
             llevel = _get_ilevel_values(left, level)
             rlevel = _get_ilevel_values(right, level)
 
-            lobj = 'MultiIndex level [{0}]'.format(level)
+            lobj = 'MultiIndex level [{level}]'.format(level=level)
             assert_index_equal(llevel, rlevel,
                                exact=exact, check_names=check_names,
                               check_less_precise=check_less_precise,
@@ -798,8 +933,8 @@ def _get_ilevel_values(index, level):
         if not left.equals(right):
             diff = np.sum((left.values != right.values)
                           .astype(int)) * 100.0 / len(left)
-            msg = '{0} values are different ({1} %)'\
-                .format(obj, np.round(diff, 5))
+            msg = '{obj} values are different ({pct} %)'.format(
+                obj=obj, pct=np.round(diff, 5))
             raise_assert_detail(obj, msg, left, right)
     else:
         _testing.assert_almost_equal(left.values, right.values,
@@ -812,11 +947,14 @@ def _get_ilevel_values(index, level):
         assert_attr_equal('names', left, right, obj=obj)
     if isinstance(left, pd.PeriodIndex) or isinstance(right, pd.PeriodIndex):
         assert_attr_equal('freq', left, right, obj=obj)
+    if (isinstance(left, pd.IntervalIndex) or
+            isinstance(right, pd.IntervalIndex)):
+        assert_attr_equal('closed', left, right, obj=obj)
 
     if check_categorical:
         if is_categorical_dtype(left) or is_categorical_dtype(right):
             assert_categorical_equal(left.values, right.values,
-                                     obj='{0} category'.format(obj))
+                                     obj='{obj} category'.format(obj=obj))
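To illustrate the `exact='equiv'` wording fixed above, a short sketch (the error text is abbreviated):

```python
import pandas as pd
import pandas.util.testing as tm

# Under exact='equiv', a RangeIndex and an Int64Index holding the
# same values are considered interchangeable.
tm.assert_index_equal(pd.RangeIndex(3), pd.Index([0, 1, 2]))

# exact=True demands the same Index subclass, so this raises.
try:
    tm.assert_index_equal(pd.RangeIndex(3), pd.Index([0, 1, 2]),
                          exact=True)
except AssertionError:
    pass  # "Index classes are different"
```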
 
 
 def assert_class_equal(left, right, exact=True, obj='Input'):
@@ -837,12 +975,12 @@ def repr_class(x):
             # allow equivalence of Int64Index/RangeIndex
             types = set([type(left).__name__, type(right).__name__])
             if len(types - set(['Int64Index', 'RangeIndex'])):
-                msg = '{0} classes are not equivalent'.format(obj)
+                msg = '{obj} classes are not equivalent'.format(obj=obj)
                 raise_assert_detail(obj, msg, repr_class(left),
                                     repr_class(right))
     elif exact:
         if type(left) != type(right):
-            msg = '{0} classes are different'.format(obj)
+            msg = '{obj} classes are different'.format(obj=obj)
             raise_assert_detail(obj, msg, repr_class(left),
                                 repr_class(right))
 
@@ -882,23 +1020,22 @@ def assert_attr_equal(attr, left, right, obj='Attributes'):
     if result:
         return True
     else:
-        raise_assert_detail(obj, 'Attribute "{0}" are different'.format(attr),
-                            left_attr, right_attr)
+        msg = 'Attribute "{attr}" are different'.format(attr=attr)
+        raise_assert_detail(obj, msg, left_attr, right_attr)
 
 
 def assert_is_valid_plot_return_object(objs):
     import matplotlib.pyplot as plt
     if isinstance(objs, (pd.Series, np.ndarray)):
         for el in objs.ravel():
-            msg = ('one of \'objs\' is not a matplotlib Axes instance, '
-                   'type encountered {0!r}')
-            assert isinstance(el, (plt.Axes, dict)), msg.format(
-                el.__class__.__name__)
+            msg = ('one of \'objs\' is not a matplotlib Axes instance, type '
+                   'encountered {name!r}').format(name=el.__class__.__name__)
+            assert isinstance(el, (plt.Axes, dict)), msg
     else:
         assert isinstance(objs, (plt.Artist, tuple, dict)), \
             ('objs is neither an ndarray of Artist instances nor a '
-             'single Artist instance, tuple, or dict, "objs" is a {0!r} '
-             ''.format(objs.__class__.__name__))
+             'single Artist instance, tuple, or dict, "objs" is a {name!r}'
+             ).format(name=objs.__class__.__name__)
 
 
 def isiterable(obj):
@@ -912,66 +1049,9 @@ def is_sorted(seq):
     return assert_numpy_array_equal(seq, np.sort(np.array(seq)))
 
 
-def assertIs(first, second, msg=''):
-    """Checks that 'first' is 'second'"""
-    a, b = first, second
-    assert a is b, "%s: %r is not %r" % (msg.format(a, b), a, b)
-
-
-def assertIsNot(first, second, msg=''):
-    """Checks that 'first' is not 'second'"""
-    a, b = first, second
-    assert a is not b, "%s: %r is %r" % (msg.format(a, b), a, b)
-
-
-def assertIn(first, second, msg=''):
-    """Checks that 'first' is in 'second'"""
-    a, b = first, second
-    assert a in b, "%s: %r is not in %r" % (msg.format(a, b), a, b)
-
-
-def assertNotIn(first, second, msg=''):
-    """Checks that 'first' is not in 'second'"""
-    a, b = first, second
-    assert a not in b, "%s: %r is in %r" % (msg.format(a, b), a, b)
-
-
-def assertIsNone(expr, msg=''):
-    """Checks that 'expr' is None"""
-    return assertIs(expr, None, msg)
-
-
-def assertIsNotNone(expr, msg=''):
-    """Checks that 'expr' is not None"""
-    return assertIsNot(expr, None, msg)
-
-
-def assertIsInstance(obj, cls, msg=''):
-    """Test that obj is an instance of cls
-    (which can be a class or a tuple of classes,
-    as supported by isinstance())."""
-    if not isinstance(obj, cls):
-        err_msg = "{0}Expected type {1}, found {2} instead"
-        raise AssertionError(err_msg.format(msg, cls, type(obj)))
-
-
-def assert_isinstance(obj, class_type_or_tuple, msg=''):
-    return deprecate('assert_isinstance', assertIsInstance)(
-        obj, class_type_or_tuple, msg=msg)
-
-
-def assertNotIsInstance(obj, cls, msg=''):
-    """Test that obj is not an instance of cls
-    (which can be a class or a tuple of classes,
-    as supported by isinstance())."""
-    if isinstance(obj, cls):
-        err_msg = "{0}Input must not be type {1}"
-        raise AssertionError(err_msg.format(msg, cls))
-
-
 def assert_categorical_equal(left, right, check_dtype=True,
                              obj='Categorical', check_category_order=True):
-    """Test that categoricals are eqivalent
+    """Test that Categoricals are equivalent.
 
     Parameters
     ----------
@@ -988,22 +1068,21 @@ def assert_categorical_equal(left, right, check_dtype=True,
         values are compared.  The ordered attribute is checked regardless.
     """
-    assertIsInstance(left, pd.Categorical, '[Categorical] ')
-    assertIsInstance(right, pd.Categorical, '[Categorical] ')
+    _check_isinstance(left, right, Categorical)
 
     if check_category_order:
         assert_index_equal(left.categories, right.categories,
-                           obj='{0}.categories'.format(obj))
+                           obj='{obj}.categories'.format(obj=obj))
         assert_numpy_array_equal(left.codes, right.codes,
                                  check_dtype=check_dtype,
-                                 obj='{0}.codes'.format(obj))
+                                 obj='{obj}.codes'.format(obj=obj))
     else:
         assert_index_equal(left.categories.sort_values(),
                            right.categories.sort_values(),
-                           obj='{0}.categories'.format(obj))
+                           obj='{obj}.categories'.format(obj=obj))
         assert_index_equal(left.categories.take(left.codes),
                            right.categories.take(right.codes),
-                           obj='{0}.values'.format(obj))
+                           obj='{obj}.values'.format(obj=obj))
 
     assert_attr_equal('ordered', left, right, obj=obj)
 
@@ -1014,14 +1093,14 @@ def raise_assert_detail(obj, message, left, right, diff=None):
     if isinstance(right, np.ndarray):
         right = pprint_thing(right)
 
-    msg = """{0} are different
+    msg = """{obj} are different
 
-{1}
-[left]:  {2}
-[right]: {3}""".format(obj, message, left, right)
+{message}
+[left]:  {left}
+[right]: {right}""".format(obj=obj, message=message, left=left, right=right)
 
     if diff is not None:
-        msg = msg + "\n[diff]: {diff}".format(diff=diff)
+        msg += "\n[diff]: {diff}".format(diff=diff)
 
     raise AssertionError(msg)
 
@@ -1049,25 +1128,33 @@
     """
 
     # instance validation
-    # to show a detailed erorr message when classes are different
+    # Show a detailed error message when classes are different
     assert_class_equal(left, right, obj=obj)
 
     # both classes must be an np.ndarray
-    assertIsInstance(left, np.ndarray, '[ndarray] ')
-    assertIsInstance(right, np.ndarray, '[ndarray] ')
+    _check_isinstance(left, right, np.ndarray)
 
     def _get_base(obj):
         return obj.base if getattr(obj, 'base', None) is not None else obj
 
+    left_base = _get_base(left)
+    right_base = _get_base(right)
+
     if check_same == 'same':
-        assertIs(_get_base(left), _get_base(right))
+        if left_base is not right_base:
+            msg = "{left!r} is not {right!r}".format(
+                left=left_base, right=right_base)
+            raise AssertionError(msg)
     elif check_same == 'copy':
-        assertIsNot(_get_base(left), _get_base(right))
+        if left_base is right_base:
+            msg = "{left!r} is {right!r}".format(
+                left=left_base, right=right_base)
+            raise AssertionError(msg)
 
     def _raise(left, right, err_msg):
         if err_msg is None:
             if left.shape != right.shape:
-                raise_assert_detail(obj, '{0} shapes are different'
-                                    .format(obj), left.shape, right.shape)
+                raise_assert_detail(obj, '{obj} shapes are different'
                    .format(obj=obj), left.shape, right.shape)
 
             diff = 0
             for l, r in zip(left, right):
@@ -1076,8 +1163,8 @@ def _raise(left, right, err_msg):
                     diff += 1
 
             diff = diff * 100.0 / left.size
-            msg = '{0} values are different ({1} %)'\
-                .format(obj, np.round(diff, 5))
+            msg = '{obj} values are different ({pct} %)'.format(
+                obj=obj, pct=np.round(diff, 5))
             raise_assert_detail(obj, msg, left, right)
 
         raise AssertionError(err_msg)
 
@@ -1103,7 +1190,6 @@ def assert_series_equal(left, right, check_dtype=True,
                         check_datetimelike_compat=False,
                         check_categorical=True,
                         obj='Series'):
-
     """Check that left and right Series are equal.
 
     Parameters
@@ -1125,7 +1211,7 @@ def assert_series_equal(left, right, check_dtype=True,
         Whether to compare number exactly.
     check_names : bool, default True
         Whether to check the Series and Index names attribute.
-    check_dateteimelike_compat : bool, default False
+    check_datetimelike_compat : bool, default False
         Compare datetime-like which is comparable ignoring dtype.
     check_categorical : bool, default True
         Whether to compare internal Categorical exactly.
@@ -1135,20 +1221,19 @@ def assert_series_equal(left, right, check_dtype=True,
     """
 
     # instance validation
-    assertIsInstance(left, Series, '[Series] ')
-    assertIsInstance(right, Series, '[Series] ')
+    _check_isinstance(left, right, Series)
 
     if check_series_type:
         # ToDo: There are some tests using rhs is sparse
        # lhs is dense. Should use assert_class_equal in future
-        assertIsInstance(left, type(right))
+        assert isinstance(left, type(right))
         # assert_class_equal(left, right, obj=obj)
 
     # length comparison
     if len(left) != len(right):
-        raise_assert_detail(obj, 'Series length are different',
-                            '{0}, {1}'.format(len(left), left.index),
-                            '{0}, {1}'.format(len(right), right.index))
+        msg1 = '{len}, {left}'.format(len=len(left), left=left.index)
+        msg2 = '{len}, {right}'.format(len=len(right), right=right.index)
+        raise_assert_detail(obj, 'Series length are different', msg1, msg2)
 
     # index comparison
     assert_index_equal(left.index, right.index, exact=check_index_type,
                        check_names=check_names,
                        check_less_precise=check_less_precise,
                        check_exact=check_exact,
                        check_categorical=check_categorical,
-                       obj='{0}.index'.format(obj))
+                       obj='{obj}.index'.format(obj=obj))
 
     if check_dtype:
         assert_attr_equal('dtype', left, right)
 
     if check_exact:
         assert_numpy_array_equal(left.get_values(), right.get_values(),
                                  check_dtype=check_dtype,
-                                 obj='{0}'.format(obj),)
+                                 obj='{obj}'.format(obj=obj),)
     elif check_datetimelike_compat:
         # we want to check only if we have compat dtypes
         # e.g. integer and M|m are NOT compat, but we can simply check
@@ -1177,16 +1262,23 @@ def assert_series_equal(left, right, check_dtype=True,
         # datetimelike may have different objects (e.g. datetime.datetime
         # vs Timestamp) but will compare equal
         if not Index(left.values).equals(Index(right.values)):
-            msg = '[datetimelike_compat=True] {0} is not equal to {1}.'
-            raise AssertionError(msg.format(left.values, right.values))
+            msg = ('[datetimelike_compat=True] {left} is not equal to '
+                   '{right}.').format(left=left.values, right=right.values)
+            raise AssertionError(msg)
         else:
             assert_numpy_array_equal(left.get_values(), right.get_values(),
                                      check_dtype=check_dtype)
+    elif is_interval_dtype(left) or is_interval_dtype(right):
+        # TODO: big hack here
+        l = pd.IntervalIndex(left)
+        r = pd.IntervalIndex(right)
+        assert_index_equal(l, r, obj='{obj}.index'.format(obj=obj))
+
     else:
         _testing.assert_almost_equal(left.get_values(), right.get_values(),
                                      check_less_precise=check_less_precise,
                                      check_dtype=check_dtype,
-                                     obj='{0}'.format(obj))
+                                     obj='{obj}'.format(obj=obj))
 
     # metadata comparison
     if check_names:
         assert_attr_equal('name', left, right, obj=obj)
 
     if check_categorical:
         if is_categorical_dtype(left) or is_categorical_dtype(right):
             assert_categorical_equal(left.values, right.values,
-                                     obj='{0} category'.format(obj))
+                                     obj='{obj} category'.format(obj=obj))
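A sketch of the dtype handling in assert_series_equal (illustrative values):

```python
import pandas as pd
import pandas.util.testing as tm

left = pd.Series([1, 2, 3])          # int64
right = pd.Series([1.0, 2.0, 3.0])   # float64

# Values compare almost-equal once dtype checking is turned off.
tm.assert_series_equal(left, right, check_dtype=False)

# With the default check_dtype=True the dtype attributes must match,
# so the same comparison would raise an AssertionError.
```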
 
 
 # This could be refactored to use the NDFrame.equals method
 def assert_frame_equal(left, right, check_dtype=True,
@@ -1211,7 +1303,6 @@ def assert_frame_equal(left, right, check_dtype=True,
                        check_categorical=True,
                        check_like=False,
                        obj='DataFrame'):
-
     """Check that left and right DataFrame are equal.
 
     Parameters
@@ -1239,53 +1330,43 @@ def assert_frame_equal(left, right, check_dtype=True,
         If True, compare by blocks.
     check_exact : bool, default False
         Whether to compare number exactly.
-    check_dateteimelike_compat : bool, default False
+    check_datetimelike_compat : bool, default False
         Compare datetime-like which is comparable ignoring dtype.
     check_categorical : bool, default True
         Whether to compare internal Categorical exactly.
     check_like : bool, default False
-        If true, then reindex_like operands
+        If true, ignore the order of rows & columns
     obj : str, default 'DataFrame'
         Specify object name being compared, internally used to show
         appropriate assertion message
     """
 
     # instance validation
-    assertIsInstance(left, DataFrame, '[DataFrame] ')
-    assertIsInstance(right, DataFrame, '[DataFrame] ')
+    _check_isinstance(left, right, DataFrame)
 
     if check_frame_type:
         # ToDo: There are some tests using rhs is SparseDataFrame
         # lhs is DataFrame. Should use assert_class_equal in future
-        assertIsInstance(left, type(right))
+        assert isinstance(left, type(right))
         # assert_class_equal(left, right, obj=obj)
 
+    # shape comparison
+    if left.shape != right.shape:
+        raise_assert_detail(obj,
+                            'DataFrame shape mismatch',
+                            '{shape!r}'.format(shape=left.shape),
+                            '{shape!r}'.format(shape=right.shape))
+
     if check_like:
         left, right = left.reindex_like(right), right
 
-    # shape comparison (row)
-    if left.shape[0] != right.shape[0]:
-        raise_assert_detail(obj,
-                            'DataFrame shape (number of rows) are different',
-                            '{0}, {1}'.format(left.shape[0], left.index),
-                            '{0}, {1}'.format(right.shape[0], right.index))
-    # shape comparison (columns)
-    if left.shape[1] != right.shape[1]:
-        raise_assert_detail(obj,
-                            'DataFrame shape (number of columns) '
-                            'are different',
-                            '{0}, {1}'.format(left.shape[1],
-                                              left.columns),
-                            '{0}, {1}'.format(right.shape[1],
-                                              right.columns))
-
     # index comparison
     assert_index_equal(left.index, right.index, exact=check_index_type,
                        check_names=check_names,
                        check_less_precise=check_less_precise,
                        check_exact=check_exact,
                        check_categorical=check_categorical,
-                       obj='{0}.index'.format(obj))
+                       obj='{obj}.index'.format(obj=obj))
 
     # column comparison
     assert_index_equal(left.columns, right.columns, exact=check_column_type,
@@ -1293,7 +1374,7 @@ def assert_frame_equal(left, right, check_dtype=True,
                        check_less_precise=check_less_precise,
                        check_exact=check_exact,
                       check_categorical=check_categorical,
-                       obj='{0}.columns'.format(obj))
+                       obj='{obj}.columns'.format(obj=obj))
 
     # compare by blocks
     if by_blocks:
@@ -1318,7 +1399,7 @@ def assert_frame_equal(left, right, check_dtype=True,
                 check_exact=check_exact, check_names=check_names,
                 check_datetimelike_compat=check_datetimelike_compat,
                 check_categorical=check_categorical,
-                obj='DataFrame.iloc[:, {0}]'.format(i))
+                obj='DataFrame.iloc[:, {idx}]'.format(idx=i))
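A sketch of the clarified `check_like` semantics (illustrative frame):

```python
import pandas as pd
import pandas.util.testing as tm

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['r0', 'r1'])
shuffled = df.loc[['r1', 'r0'], ['b', 'a']]

# check_like=True ignores row/column order by reindexing internally.
tm.assert_frame_equal(df, shuffled, check_like=True)
```

Note that shape mismatches now fail fast with a single 'DataFrame shape mismatch' message instead of the previous separate row-count and column-count checks.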
 
 
 def assert_panelnd_equal(left, right,
@@ -1373,13 +1454,16 @@ def assert_panelnd_equal(left, right,
     # can potentially be slow
     for i, item in enumerate(left._get_axis(0)):
-        assert item in right, "non-matching item (right) '%s'" % item
+        msg = "non-matching item (right) '{item}'".format(item=item)
+        assert item in right, msg
         litem = left.iloc[i]
         ritem = right.iloc[i]
         assert_func(litem, ritem, check_less_precise=check_less_precise)
 
     for i, item in enumerate(right._get_axis(0)):
-        assert item in left, "non-matching item (left) '%s'" % item
+        msg = "non-matching item (left) '{item}'".format(item=item)
+        assert item in left, msg
+
 
 # TODO: strangely check_names fails in py3 ?
 _panel_frame_equal = partial(assert_frame_equal, check_names=False)
@@ -1404,15 +1488,14 @@ def assert_sp_array_equal(left, right, check_dtype=True):
         Whether to check the data dtype is identical.
     """
 
-    assertIsInstance(left, pd.SparseArray, '[SparseArray]')
-    assertIsInstance(right, pd.SparseArray, '[SparseArray]')
+    _check_isinstance(left, right, pd.SparseArray)
 
     assert_numpy_array_equal(left.sp_values, right.sp_values,
                              check_dtype=check_dtype)
 
     # SparseIndex comparison
-    assertIsInstance(left.sp_index, pd._sparse.SparseIndex, '[SparseIndex]')
-    assertIsInstance(right.sp_index, pd._sparse.SparseIndex, '[SparseIndex]')
+    assert isinstance(left.sp_index, pd._libs.sparse.SparseIndex)
+    assert isinstance(right.sp_index, pd._libs.sparse.SparseIndex)
 
     if not left.sp_index.equals(right.sp_index):
         raise_assert_detail('SparseArray.index', 'index are not equal',
@@ -1445,14 +1528,13 @@ def assert_sp_series_equal(left, right, check_dtype=True, exact_indices=True,
         Specify the object name being compared, internally used to show
         the appropriate assertion message.
     """
-    assertIsInstance(left, pd.SparseSeries, '[SparseSeries]')
-    assertIsInstance(right, pd.SparseSeries, '[SparseSeries]')
+    _check_isinstance(left, right, pd.SparseSeries)
 
     if check_series_type:
         assert_class_equal(left, right, obj=obj)
 
     assert_index_equal(left.index, right.index,
-                       obj='{0}.index'.format(obj))
+                       obj='{obj}.index'.format(obj=obj))
 
     assert_sp_array_equal(left.block.values, right.block.values)
 
@@ -1483,16 +1565,15 @@ def assert_sp_frame_equal(left, right, check_dtype=True, exact_indices=True,
         Specify the object name being compared, internally used to show
         the appropriate assertion message.
     """
-    assertIsInstance(left, pd.SparseDataFrame, '[SparseDataFrame]')
-    assertIsInstance(right, pd.SparseDataFrame, '[SparseDataFrame]')
+    _check_isinstance(left, right, pd.SparseDataFrame)
 
     if check_frame_type:
         assert_class_equal(left, right, obj=obj)
 
     assert_index_equal(left.index, right.index,
-                       obj='{0}.index'.format(obj))
+                       obj='{obj}.index'.format(obj=obj))
     assert_index_equal(left.columns, right.columns,
-                       obj='{0}.columns'.format(obj))
+                       obj='{obj}.columns'.format(obj=obj))
 
     for col, series in compat.iteritems(left):
         assert (col in right)
@@ -1515,8 +1596,8 @@ def assert_sp_frame_equal(left, right, check_dtype=True, exact_indices=True,
 
 
 def assert_sp_list_equal(left, right):
-    assertIsInstance(left, pd.SparseList, '[SparseList]')
-    assertIsInstance(right, pd.SparseList, '[SparseList]')
+    assert isinstance(left, pd.SparseList)
+    assert isinstance(right, pd.SparseList)
 
     assert_sp_array_equal(left.to_array(), right.to_array())
 
@@ -1526,7 +1607,7 @@ def assert_sp_list_equal(left, right):
 
 def assert_contains_all(iterable, dic):
     for k in iterable:
-        assert k in dic, "Did not contain item: '%r'" % k
+        assert k in dic, "Did not contain item: '{key!r}'".format(key=k)
 
 
 def assert_copy(iter1, iter2, **eql_kwargs):
@@ -1540,10 +1621,10 @@ def assert_copy(iter1, iter2, **eql_kwargs):
     """
     for elem1, elem2 in zip(iter1, iter2):
         assert_almost_equal(elem1, elem2, **eql_kwargs)
-        assert elem1 is not elem2, ("Expected object %r and "
-                                    "object %r to be different "
-                                    "objects, were same."
-                                    % (type(elem1), type(elem2)))
+        msg = ("Expected object {obj1!r} and object {obj2!r} to be "
+               "different objects, but they were the same object."
+               ).format(obj1=type(elem1), obj2=type(elem2))
+        assert elem1 is not elem2, msg
 
 
 def getCols(k):
@@ -1569,6 +1650,12 @@ def makeCategoricalIndex(k=10, n=3, name=None):
     return CategoricalIndex(np.random.choice(x, k), name=name)
 
 
+def makeIntervalIndex(k=10, name=None):
+    """ make a length k IntervalIndex """
+    x = np.linspace(0, 100, num=(k + 1))
+    return IntervalIndex.from_breaks(x, name=name)
+
+
 def makeBoolIndex(k=10, name=None):
     if k == 1:
         return Index([True], name=name)
@@ -1716,20 +1803,24 @@ def makePeriodFrame(nper=None):
 
 
 def makePanel(nper=None):
-    cols = ['Item' + c for c in string.ascii_uppercase[:K - 1]]
-    data = dict((c, makeTimeDataFrame(nper)) for c in cols)
-    return Panel.fromDict(data)
+    with warnings.catch_warnings(record=True):
+        cols = ['Item' + c for c in string.ascii_uppercase[:K - 1]]
+        data = dict((c, makeTimeDataFrame(nper)) for c in cols)
+        return Panel.fromDict(data)
 
 
 def makePeriodPanel(nper=None):
-    cols = ['Item' + c for c in string.ascii_uppercase[:K - 1]]
-    data = dict((c, makePeriodFrame(nper)) for c in cols)
-    return Panel.fromDict(data)
+    with warnings.catch_warnings(record=True):
+        cols = ['Item' + c for c in string.ascii_uppercase[:K - 1]]
+        data = dict((c, makePeriodFrame(nper)) for c in cols)
+        return Panel.fromDict(data)
 
 
 def makePanel4D(nper=None):
-    return Panel4D(dict(l1=makePanel(nper), l2=makePanel(nper),
-                        l3=makePanel(nper)))
+    with warnings.catch_warnings(record=True):
+        d = dict(l1=makePanel(nper), l2=makePanel(nper),
+                 l3=makePanel(nper))
+        return Panel4D(d)
 
 
 def makeCustomIndex(nentries, nlevels, prefix='#', names=False, ndupe_l=None,
@@ -1787,8 +1878,9 @@ def makeCustomIndex(nentries, nlevels, prefix='#', names=False, ndupe_l=None,
             idx.name = names[0]
         return idx
     elif idx_type is not None:
-        raise ValueError('"%s" is not a legal value for `idx_type`, use '
-                         '"i"/"f"/"s"/"u"/"dt/"p"/"td".' % idx_type)
+        raise ValueError('"{idx_type}" is not a legal value for `idx_type`, '
+                         'use "i"/"f"/"s"/"u"/"dt/"p"/"td".'
+                         .format(idx_type=idx_type))
 
     if len(ndupe_l) < nlevels:
         ndupe_l.extend([1] * (nlevels - len(ndupe_l)))
@@ -1807,7 +1899,7 @@ def keyfunc(x):
         div_factor = nentries // ndupe_l[i] + 1
         cnt = Counter()
         for j in range(div_factor):
-            label = prefix + '_l%d_g' % i + str(j)
+            label = '{prefix}_l{i}_g{j}'.format(prefix=prefix, i=i, j=j)
             cnt[label] = ndupe_l[i]
 
         # cute Counter trick
         result = list(sorted(cnt.elements(), key=keyfunc))[:nentries]
@@ -1817,7 +1909,11 @@ def keyfunc(x):
 
     # convert tuples to index
     if nentries == 1:
+        # we have a single level of tuples, i.e. a regular Index
         index = Index(tuples[0], name=names[0])
+    elif nlevels == 1:
+        name = None if names is None else names[0]
+        index = Index((x[0] for x in tuples), name=name)
     else:
         index = MultiIndex.from_tuples(tuples, names=names)
     return index
@@ -1900,7 +1996,7 @@ def makeCustomDataframe(nrows, ncols, c_idx_names=True, r_idx_names=True,
 
     # by default, generate data based on location
     if data_gen_f is None:
-        data_gen_f = lambda r, c: "R%dC%d" % (r, c)
+        data_gen_f = lambda r, c: "R{rows}C{cols}".format(rows=r, cols=c)
 
     data = [[data_gen_f(r, c) for c in range(ncols)]
             for r in range(nrows)]
@@ -1995,87 +2091,60 @@ def __init__(self, *args, **kwargs):
         dict.__init__(self, *args, **kwargs)
 
 
-# Dependency checks. Copied this from Nipy/Nipype (Copyright of
-# respective developers, license: BSD-3)
-def package_check(pkg_name, version=None, app='pandas', checker=LooseVersion,
-                  exc_failed_import=ImportError,
-                  exc_failed_check=RuntimeError):
-    """Check that the minimal version of the required package is installed.
+# Dependency checker when running tests.
+#
+# Copied this from nipy/nipype
+# Copyright of respective developers, License: BSD-3
+def skip_if_no_package(pkg_name, min_version=None, max_version=None,
+                       app='pandas', checker=LooseVersion):
+    """Check that the min/max version of the required package is installed.
+
+    If the package check fails, the test is automatically skipped.
 
     Parameters
     ----------
     pkg_name : string
         Name of the required package.
-    version : string, optional
+    min_version : string, optional
         Minimal version number for required package.
+    max_version : string, optional
+        Max version number for required package.
     app : string, optional
-        Application that is performing the check. For instance, the
+        Application that is performing the check.  For instance, the
         name of the tutorial being executed that depends on specific
         packages.
     checker : object, optional
-        The class that will perform the version checking. Default is
+        The class that will perform the version checking.  Default is
         distutils.version.LooseVersion.
-    exc_failed_import : Exception, optional
-        Class of the exception to be thrown if import failed.
-    exc_failed_check : Exception, optional
-        Class of the exception to be thrown if version check failed.
 
     Examples
     --------
     package_check('numpy', '1.3')
-    package_check('networkx', '1.0', 'tutorial1')
 
     """
+    import pytest
     if app:
-        msg = '%s requires %s' % (app, pkg_name)
+        msg = '{app} requires {pkg_name}'.format(app=app, pkg_name=pkg_name)
    else:
-        msg = 'module requires %s' % pkg_name
-    if version:
-        msg += ' with version >= %s' % (version,)
+        msg = 'module requires {pkg_name}'.format(pkg_name=pkg_name)
+    if min_version:
+        msg += ' with version >= {min_version}'.format(min_version=min_version)
+    if max_version:
+        msg += ' with version < {max_version}'.format(max_version=max_version)
     try:
         mod = __import__(pkg_name)
     except ImportError:
-        raise exc_failed_import(msg)
-    if not version:
-        return
+        mod = None
     try:
         have_version = mod.__version__
     except AttributeError:
-        raise exc_failed_check('Cannot find version for %s' % pkg_name)
-    if checker(have_version) < checker(version):
-        raise exc_failed_check(msg)
-
-
-def skip_if_no_package(*args, **kwargs):
-    """Raise SkipTest if package_check fails
-
-    Parameters
-    ----------
-    *args Positional parameters passed to `package_check`
-    *kwargs Keyword parameters passed to `package_check`
-    """
-    from nose import SkipTest
-    package_check(exc_failed_import=SkipTest,
-                  exc_failed_check=SkipTest,
-                  *args, **kwargs)
-
-
-def skip_if_no_package_deco(pkg_name, version=None, app='pandas'):
-    from nose import SkipTest
-
-    def deco(func):
-        @wraps(func)
-        def wrapper(*args, **kwargs):
-            package_check(pkg_name, version=version, app=app,
-                          exc_failed_import=SkipTest,
-                          exc_failed_check=SkipTest)
-            return func(*args, **kwargs)
-        return wrapper
-    return deco
-#
-# Additional tags decorators for nose
-#
+        pytest.skip('Cannot find version for {pkg_name}'
+                    .format(pkg_name=pkg_name))
+    if min_version and checker(have_version) < checker(min_version):
+        pytest.skip(msg)
+    if max_version and checker(have_version) >= checker(max_version):
+        pytest.skip(msg)
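A sketch of the reworked dependency check in use (version bounds illustrative); unlike the old package_check, a failed check now skips the test instead of raising:

```python
import pandas.util.testing as tm


def test_needs_numpy_in_range():
    # Skips when numpy is missing or outside the requested bounds.
    tm.skip_if_no_package('numpy', min_version='1.9', max_version='2.0')

    import numpy as np
    assert np.arange(3).sum() == 3
```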
+
 # skip tests on exceptions with this message
 _network_error_messages = (
     # 'urlopen error timed out',
@@ -2120,6 +2190,7 @@ def dec(f):
     'Temporary failure in name resolution',
     'Name or service not known',
     'Connection refused',
+    'certificate verify',
 )

 # or this e.errno/e.reason.errno
@@ -2255,18 +2326,17 @@ def network(t, url="http://www.google.com",
     >>> test_something()
     Traceback (most recent call last):
         ...
-    SkipTest

     Errors not related to networking will always be raised.
     """
-    from nose import SkipTest
+    from pytest import skip
     t.network = True

     @wraps(t)
     def wrapper(*args, **kwargs):
         if check_before_test and not raise_on_error:
             if not can_connect(url, error_classes):
-                raise SkipTest
+                skip()
         try:
             return t(*args, **kwargs)
         except Exception as e:
@@ -2275,8 +2345,8 @@ def wrapper(*args, **kwargs):
                 errno = getattr(e.reason, 'errno', None)

             if errno in skip_errnos:
-                raise SkipTest("Skipping test due to known errno"
-                               " and error %s" % e)
+                skip("Skipping test due to known errno"
+                     " and error {error}".format(error=e))

             try:
                 e_str = traceback.format_exc(e)
@@ -2284,8 +2354,8 @@ def wrapper(*args, **kwargs):
                 e_str = str(e)

             if any([m.lower() in e_str.lower() for m in _skip_on_messages]):
-                raise SkipTest("Skipping test because exception "
-                               "message is known and error %s" % e)
+                skip("Skipping test because exception "
+                     "message is known and error {error}".format(error=e))

             if not isinstance(e, error_classes):
                 raise
@@ -2293,8 +2363,8 @@ def wrapper(*args, **kwargs):
             if raise_on_error or can_connect(url, error_classes):
                 raise
             else:
-                raise SkipTest("Skipping test due to lack of connectivity"
-                               " and error %s" % e)
+                skip("Skipping test due to lack of connectivity"
+                     " and error {error}".format(error=e))

     return wrapper
@@ -2355,78 +2425,37 @@ def stdin_encoding(encoding=None):
     sys.stdin = _stdin


-def assertRaises(_exception, _callable=None, *args, **kwargs):
-    """assertRaises that is usable as context manager or in a with statement
-
-    Exceptions that don't match the given Exception type fall through::
-
-    >>> with assertRaises(ValueError):
-    ...     raise TypeError("banana")
-    ...
-    Traceback (most recent call last):
-        ...
-    TypeError: banana
-
-    If it raises the given Exception type, the test passes
-    >>> with assertRaises(KeyError):
-    ...     dct = dict()
-    ...     dct["apple"]
-
-    If the expected error doesn't occur, it raises an error.
-    >>> with assertRaises(KeyError):
-    ...     dct = {'apple':True}
-    ...     dct["apple"]
-    Traceback (most recent call last):
-        ...
-    AssertionError: KeyError not raised.
-
-    In addition to using it as a contextmanager, you can also use it as a
-    function, just like the normal assertRaises
-
-    >>> assertRaises(TypeError, ",".join, [1, 3, 5])
+def assert_raises_regex(_exception, _regexp, _callable=None,
+                        *args, **kwargs):
     """
-    manager = _AssertRaisesContextmanager(exception=_exception)
-    # don't return anything if used in function form
-    if _callable is not None:
-        with manager:
-            _callable(*args, **kwargs)
-    else:
-        return manager
+    Check that the specified Exception is raised and that the error message
+    matches a given regular expression pattern. This may be a regular
+    expression object or a string containing a regular expression suitable
+    for use by `re.search()`. This is a port of the `assertRaisesRegexp`
+    function from unittest in Python 2.7.

-
-def assertRaisesRegexp(_exception, _regexp, _callable=None, *args, **kwargs):
-    """ Port of assertRaisesRegexp from unittest in
-    Python 2.7 - used in with statement.
- - Explanation from standard library: - Like assertRaises() but also tests that regexp matches on the - string representation of the raised exception. regexp may be a - regular expression object or a string containing a regular - expression suitable for use by re.search(). - - You can pass either a regular expression - or a compiled regular expression object. - >>> assertRaisesRegexp(ValueError, 'invalid literal for.*XYZ', - ... int, 'XYZ') + Examples + -------- + >>> assert_raises_regex(ValueError, 'invalid literal for.*XYZ', int, 'XYZ') >>> import re - >>> assertRaisesRegexp(ValueError, re.compile('literal'), int, 'XYZ') + >>> assert_raises_regex(ValueError, re.compile('literal'), int, 'XYZ') If an exception of a different type is raised, it bubbles up. - >>> assertRaisesRegexp(TypeError, 'literal', int, 'XYZ') + >>> assert_raises_regex(TypeError, 'literal', int, 'XYZ') Traceback (most recent call last): ... ValueError: invalid literal for int() with base 10: 'XYZ' >>> dct = dict() - >>> assertRaisesRegexp(KeyError, 'pear', dct.__getitem__, 'apple') + >>> assert_raises_regex(KeyError, 'pear', dct.__getitem__, 'apple') Traceback (most recent call last): ... AssertionError: "pear" does not match "'apple'" You can also use this in a with statement. - >>> with assertRaisesRegexp(TypeError, 'unsupported operand type\(s\)'): + >>> with assert_raises_regex(TypeError, 'unsupported operand type\(s\)'): ... 1 + {} - >>> with assertRaisesRegexp(TypeError, 'banana'): + >>> with assert_raises_regex(TypeError, 'banana'): ... 'apple'[0] = 'b' Traceback (most recent call last): ... @@ -2443,39 +2472,80 @@ def assertRaisesRegexp(_exception, _regexp, _callable=None, *args, **kwargs): class _AssertRaisesContextmanager(object): """ - Handles the behind the scenes work - for assertRaises and assertRaisesRegexp + Context manager behind `assert_raises_regex`. """ - def __init__(self, exception, regexp=None, *args, **kwargs): + + def __init__(self, exception, regexp=None): + """ + Initialize an _AssertRaisesContextManager instance. + + Parameters + ---------- + exception : class + The expected Exception class. + regexp : str, default None + The regex to compare against the Exception message. 
+ """ + self.exception = exception + if regexp is not None and not hasattr(regexp, "search"): regexp = re.compile(regexp, re.DOTALL) + self.regexp = regexp def __enter__(self): return self - def __exit__(self, exc_type, exc_value, traceback): + def __exit__(self, exc_type, exc_value, trace_back): expected = self.exception - if not exc_type: - name = getattr(expected, "__name__", str(expected)) - raise AssertionError("{0} not raised.".format(name)) - if issubclass(exc_type, expected): - return self.handle_success(exc_type, exc_value, traceback) - return self.handle_failure(exc_type, exc_value, traceback) - - def handle_failure(*args, **kwargs): - # Failed, so allow Exception to bubble up - return False - def handle_success(self, exc_type, exc_value, traceback): - if self.regexp is not None: - val = str(exc_value) - if not self.regexp.search(val): - e = AssertionError('"%s" does not match "%s"' % - (self.regexp.pattern, str(val))) - raise_with_traceback(e, traceback) - return True + if not exc_type: + exp_name = getattr(expected, "__name__", str(expected)) + raise AssertionError("{name} not raised.".format(name=exp_name)) + + return self.exception_matches(exc_type, exc_value, trace_back) + + def exception_matches(self, exc_type, exc_value, trace_back): + """ + Check that the Exception raised matches the expected Exception + and expected error message regular expression. + + Parameters + ---------- + exc_type : class + The type of Exception raised. + exc_value : Exception + The instance of `exc_type` raised. + trace_back : stack trace object + The traceback object associated with `exc_value`. + + Returns + ------- + is_matched : bool + Whether or not the Exception raised matches the expected + Exception class and expected error message regular expression. + + Raises + ------ + AssertionError : The error message provided does not match + the expected error message regular expression. + """ + + if issubclass(exc_type, self.exception): + if self.regexp is not None: + val = str(exc_value) + + if not self.regexp.search(val): + msg = '"{pat}" does not match "{val}"'.format( + pat=self.regexp.pattern, val=val) + e = AssertionError(msg) + raise_with_traceback(e, trace_back) + + return True + else: + # Failed, so allow Exception to bubble up. + return False @contextmanager @@ -2536,23 +2606,20 @@ def assert_produces_warning(expected_warning=Warning, filter_level="always", from inspect import getframeinfo, stack caller = getframeinfo(stack()[2][0]) msg = ("Warning not set with correct stacklevel. " - "File where warning is raised: {0} != {1}. " - "Warning message: {2}".format( - actual_warning.filename, caller.filename, - actual_warning.message)) + "File where warning is raised: {actual} != " + "{caller}. Warning message: {message}" + ).format(actual=actual_warning.filename, + caller=caller.filename, + message=actual_warning.message) assert actual_warning.filename == caller.filename, msg else: extra_warnings.append(actual_warning.category.__name__) if expected_warning: - assert saw_warning, ("Did not see expected warning of class %r." - % expected_warning.__name__) - assert not extra_warnings, ("Caused unexpected warning(s): %r." - % extra_warnings) - - -def disabled(t): - t.disabled = True - return t + msg = "Did not see expected warning of class {name!r}.".format( + name=expected_warning.__name__) + assert saw_warning, msg + assert not extra_warnings, ("Caused unexpected warning(s): {extra!r}." 
+ ).format(extra=extra_warnings) class RNGContext(object): @@ -2596,12 +2663,6 @@ def use_numexpr(use, min_elements=expr._MIN_ELEMENTS): expr.set_use_numexpr(olduse) -# Also provide all assert_* functions in the TestCase class -for name, obj in inspect.getmembers(sys.modules[__name__]): - if inspect.isfunction(obj) and name.startswith('assert'): - setattr(TestCase, name, staticmethod(obj)) - - def test_parallel(num_threads=2, kwargs_list=None): """Decorator to run the same function multiple times in parallel. @@ -2779,8 +2840,8 @@ def set_timezone(tz): 'EDT' """ if is_platform_windows(): - import nose - raise nose.SkipTest("timezone setting not supported on windows") + import pytest + pytest.skip("timezone setting not supported on windows") import os import time diff --git a/pyproject.toml b/pyproject.toml new file mode 100644 index 0000000000000..f0d57d1d808a2 --- /dev/null +++ b/pyproject.toml @@ -0,0 +1,9 @@ +[build-system] +requires = [ + "wheel", + "setuptools", + "Cython", # required for VCS build, optional for released source + "numpy==1.9.3; python_version=='3.5'", + "numpy==1.12.1; python_version=='3.6'", + "numpy==1.13.1; python_version>='3.7'", +] diff --git a/scripts/api_rst_coverage.py b/scripts/api_rst_coverage.py index cc456f03c02ec..6bb5383509be6 100644 --- a/scripts/api_rst_coverage.py +++ b/scripts/api_rst_coverage.py @@ -4,11 +4,11 @@ def main(): # classes whose members to check - classes = [pd.Series, pd.DataFrame, pd.Panel, pd.Panel4D] + classes = [pd.Series, pd.DataFrame, pd.Panel] def class_name_sort_key(x): if x.startswith('Series'): - # make sure Series precedes DataFrame, Panel, and Panel4D + # make sure Series precedes DataFrame, and Panel. return ' ' + x else: return x diff --git a/scripts/bench_join.py b/scripts/bench_join.py index 1ce5c94130e85..f9d43772766d8 100644 --- a/scripts/bench_join.py +++ b/scripts/bench_join.py @@ -1,6 +1,6 @@ from pandas.compat import range, lrange import numpy as np -import pandas.lib as lib +import pandas._libs.lib as lib from pandas import * from copy import deepcopy import time diff --git a/scripts/bench_join_multi.py b/scripts/bench_join_multi.py index 7b93112b7f869..b19da6a2c47d8 100644 --- a/scripts/bench_join_multi.py +++ b/scripts/bench_join_multi.py @@ -3,7 +3,7 @@ import numpy as np from pandas.compat import zip, range, lzip from pandas.util.testing import rands -import pandas.lib as lib +import pandas._libs.lib as lib N = 100000 diff --git a/scripts/build_dist.sh b/scripts/build_dist.sh index c9c36c18bed9c..c3f849ce7a6eb 100755 --- a/scripts/build_dist.sh +++ b/scripts/build_dist.sh @@ -10,9 +10,7 @@ read -p "Ok to continue (y/n)? 
" answer case ${answer:0:1} in y|Y ) echo "Building distribution" - python setup.py clean - python setup.py build_ext --inplace - python setup.py sdist --formats=gztar + ./build_dist_for_release.sh ;; * ) echo "Not building distribution" diff --git a/scripts/build_dist_for_release.sh b/scripts/build_dist_for_release.sh new file mode 100644 index 0000000000000..e77974ae08b0c --- /dev/null +++ b/scripts/build_dist_for_release.sh @@ -0,0 +1,10 @@ +#!/bin/bash + +# this requires cython to be installed + +# this builds the release cleanly & is building on the current checkout +rm -rf dist +git clean -xfd +python setup.py clean +python setup.py cython +python setup.py sdist --formats=gztar diff --git a/scripts/find_commits_touching_func.py b/scripts/find_commits_touching_func.py index 099761f38bb44..74ea120bf0b64 100755 --- a/scripts/find_commits_touching_func.py +++ b/scripts/find_commits_touching_func.py @@ -4,7 +4,7 @@ # copryright 2013, y-p @ github from __future__ import print_function -from pandas.compat import range, lrange, map +from pandas.compat import range, lrange, map, string_types, text_type """Search the git history for all commits touching a named method @@ -94,7 +94,7 @@ def get_hits(defname,files=()): def get_commit_info(c,fmt,sep='\t'): r=sh.git('log', "--format={}".format(fmt), '{}^..{}'.format(c,c),"-n","1",_tty_out=False) - return compat.text_type(r).split(sep) + return text_type(r).split(sep) def get_commit_vitals(c,hlen=HASH_LEN): h,s,d= get_commit_info(c,'%H\t%s\t%ci',"\t") @@ -183,11 +183,11 @@ def main(): !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! """) return - if isinstance(args.file_masks,compat.string_types): + if isinstance(args.file_masks, string_types): args.file_masks = args.file_masks.split(',') - if isinstance(args.path_masks,compat.string_types): + if isinstance(args.path_masks, string_types): args.path_masks = args.path_masks.split(',') - if isinstance(args.dir_masks,compat.string_types): + if isinstance(args.dir_masks, string_types): args.dir_masks = args.dir_masks.split(',') logger.setLevel(getattr(logging,args.debug_level)) diff --git a/scripts/groupby_test.py b/scripts/groupby_test.py index 5acf7da7534a3..f640a6ed79503 100644 --- a/scripts/groupby_test.py +++ b/scripts/groupby_test.py @@ -5,7 +5,7 @@ from pandas import * -import pandas.lib as tseries +import pandas._libs.lib as tseries import pandas.core.groupby as gp import pandas.util.testing as tm from pandas.compat import range diff --git a/scripts/merge-py.py b/scripts/merge-pr.py similarity index 83% rename from scripts/merge-py.py rename to scripts/merge-pr.py index b9350f8feceb8..1fc4eef3d0583 100755 --- a/scripts/merge-py.py +++ b/scripts/merge-pr.py @@ -99,6 +99,14 @@ def continue_maybe(prompt): fail("Okay, exiting") +def continue_maybe2(prompt): + result = input("\n%s (y/n): " % prompt) + if result.lower() != "y": + return False + else: + return True + + original_head = run_cmd("git rev-parse HEAD")[:8] @@ -193,6 +201,40 @@ def merge_pr(pr_num, target_ref): return merge_hash +def update_pr(pr_num, user_login, base_ref): + + pr_branch_name = "%s_MERGE_PR_%s" % (BRANCH_PREFIX, pr_num) + + run_cmd("git fetch %s pull/%s/head:%s" % (PR_REMOTE_NAME, pr_num, + pr_branch_name)) + run_cmd("git checkout %s" % pr_branch_name) + + continue_maybe("Update ready (local ref %s)? Push to %s/%s?" 
% ( + pr_branch_name, user_login, base_ref)) + + push_user_remote = "https://github.com/%s/pandas.git" % user_login + + try: + run_cmd('git push %s %s:%s' % (push_user_remote, pr_branch_name, + base_ref)) + except Exception as e: + + if continue_maybe2("Force push?"): + try: + run_cmd( + 'git push -f %s %s:%s' % (push_user_remote, pr_branch_name, + base_ref)) + except Exception as e: + fail("Exception while pushing: %s" % e) + clean_up() + else: + fail("Exception while pushing: %s" % e) + clean_up() + + clean_up() + print("Pull request #%s updated!" % pr_num) + + def cherry_pick(pr_num, merge_hash, default_branch): pick_ref = input("Enter a branch name [%s]: " % default_branch) if pick_ref == "": @@ -257,8 +299,17 @@ def fix_version_from_branch(branch, versions): print("\n=== Pull Request #%s ===" % pr_num) print("title\t%s\nsource\t%s\ntarget\t%s\nurl\t%s" % (title, pr_repo_desc, target_ref, url)) -continue_maybe("Proceed with merging pull request #%s?" % pr_num) + + merged_refs = [target_ref] -merge_hash = merge_pr(pr_num, target_ref) +print("\nProceed with updating or merging pull request #%s?" % pr_num) +update = input("Update PR and push to remote (r), merge locally (l), " + "or do nothing (n) ?") +update = update.lower() + +if update == 'r': + merge_hash = update_pr(pr_num, user_login, base_ref) +elif update == 'l': + merge_hash = merge_pr(pr_num, target_ref) diff --git a/scripts/roll_median_leak.py b/scripts/roll_median_leak.py index 07161cc6499bf..03f39e2b18372 100644 --- a/scripts/roll_median_leak.py +++ b/scripts/roll_median_leak.py @@ -7,7 +7,7 @@ from vbench.api import Benchmark from pandas.util.testing import rands from pandas.compat import range -import pandas.lib as lib +import pandas._libs.lib as lib import pandas._sandbox as sbx import time diff --git a/scripts/windows_builder/build_27-32.bat b/scripts/windows_builder/build_27-32.bat deleted file mode 100644 index 37eb4d436d567..0000000000000 --- a/scripts/windows_builder/build_27-32.bat +++ /dev/null @@ -1,25 +0,0 @@ -@echo off -echo "starting 27-32" - -setlocal EnableDelayedExpansion -set MSSdk=1 -CALL "C:\Program Files\Microsoft SDKs\Windows\v7.0\Bin\SetEnv.cmd" /x86 /release -set DISTUTILS_USE_SDK=1 - -title 27-32 build -echo "building" -cd "c:\users\Jeff Reback\documents\github\pandas" -C:\python27-32\python.exe setup.py build > build.27-32.log 2>&1 - -title "installing" -C:\python27-32\python.exe setup.py bdist --formats=wininst > install.27-32.log 2>&1 - -echo "testing" -C:\python27-32\scripts\nosetests -A "not slow" build\lib.win32-2.7\pandas > test.27-32.log 2>&1 - -echo "versions" -cd build\lib.win32-2.7 -C:\python27-32\python.exe ../../ci/print_versions.py > ../../versions.27-32.log 2>&1 - -exit - diff --git a/scripts/windows_builder/build_27-64.bat b/scripts/windows_builder/build_27-64.bat deleted file mode 100644 index e76e25d0ef39c..0000000000000 --- a/scripts/windows_builder/build_27-64.bat +++ /dev/null @@ -1,25 +0,0 @@ -@echo off -echo "starting 27-64" - -setlocal EnableDelayedExpansion -set MSSdk=1 -CALL "C:\Program Files\Microsoft SDKs\Windows\v7.0\Bin\SetEnv.cmd" /x64 /release -set DISTUTILS_USE_SDK=1 - -title 27-64 build -echo "building" -cd "c:\users\Jeff Reback\documents\github\pandas" -C:\python27-64\python.exe setup.py build > build.27-64.log 2>&1 - -echo "installing" -C:\python27-64\python.exe setup.py bdist --formats=wininst > install.27-64.log 2>&1 - -echo "testing" -C:\python27-64\scripts\nosetests -A "not slow" build\lib.win-amd64-2.7\pandas > test.27-64.log 2>&1 - -echo "versions" -cd 
build\lib.win-amd64-2.7 -C:\python27-64\python.exe ../../ci/print_versions.py > ../../versions.27-64.log 2>&1 - -exit - diff --git a/scripts/windows_builder/build_34-32.bat b/scripts/windows_builder/build_34-32.bat deleted file mode 100644 index 8e060e000bc8f..0000000000000 --- a/scripts/windows_builder/build_34-32.bat +++ /dev/null @@ -1,27 +0,0 @@ -@echo off -echo "starting 34-32" - -setlocal EnableDelayedExpansion -set MSSdk=1 -CALL "C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\SetEnv.cmd" /x86 /release -set DISTUTILS_USE_SDK=1 - -title 34-32 build -echo "building" -cd "c:\users\Jeff Reback\documents\github\pandas" -C:\python34-32\python.exe setup.py build > build.34-32.log 2>&1 - -echo "installing" -C:\python34-32\python.exe setup.py bdist --formats=wininst > install.34-32.log 2>&1 - -echo "testing" -C:\python34-32\scripts\nosetests -A "not slow" build\lib.win32-3.4\pandas > test.34-32.log 2>&1 - -echo "versions" -cd build\lib.win32-3.4 -C:\python34-32\python.exe ../../ci/print_versions.py > ../../versions.34-32.log 2>&1 - -exit - - - diff --git a/scripts/windows_builder/build_34-64.bat b/scripts/windows_builder/build_34-64.bat deleted file mode 100644 index 3a8512b730346..0000000000000 --- a/scripts/windows_builder/build_34-64.bat +++ /dev/null @@ -1,27 +0,0 @@ -@echo off -echo "starting 34-64" - -setlocal EnableDelayedExpansion -set MSSdk=1 -CALL "C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\SetEnv.cmd" /x64 /release -set DISTUTILS_USE_SDK=1 - -title 34-64 build -echo "building" -cd "c:\users\Jeff Reback\documents\github\pandas" -C:\python34-64\python.exe setup.py build > build.34-64.log 2>&1 - -echo "installing" -C:\python34-64\python.exe setup.py bdist --formats=wininst > install.34-64.log 2>&1 - -echo "testing" -C:\python34-64\scripts\nosetests -A "not slow" build\lib.win-amd64-3.4\pandas > test.34-64.log 2>&1 - -echo "versions" -cd build\lib.win-amd64-3.4 -C:\python34-64\python.exe ../../ci/print_versions.py > ../../versions.34-64.log 2>&1 - -exit - - - diff --git a/scripts/windows_builder/check_and_build.bat b/scripts/windows_builder/check_and_build.bat deleted file mode 100644 index 32be1bde1f7f3..0000000000000 --- a/scripts/windows_builder/check_and_build.bat +++ /dev/null @@ -1,2 +0,0 @@ -set PYTHONPATH=c:/python27-64/lib -c:/python27-64/python.exe c:/Builds/check_and_build.py %1 %2 %3 %4 %4 %6 %7 %8 %9 diff --git a/scripts/windows_builder/check_and_build.py b/scripts/windows_builder/check_and_build.py deleted file mode 100644 index 2eb32fb4265d9..0000000000000 --- a/scripts/windows_builder/check_and_build.py +++ /dev/null @@ -1,194 +0,0 @@ -import datetime -import git -import logging -import os, re, time -import subprocess -import argparse -import pysftp - -# parse the args -parser = argparse.ArgumentParser(description='build, test, and install updated versions of master pandas') -parser.add_argument('-b', '--build', - help='run just this build', - dest='build') -parser.add_argument('-u', '--update', - help='get a git update', - dest='update', - action='store_true', - default=False) -parser.add_argument('-t', '--test', - help='run the tests', - dest='test', - action='store_true', - default=False) -parser.add_argument('-c', '--compare', - help='show the last tests compare', - dest='compare', - action='store_true', - default=False) -parser.add_argument('-v', '--version', - help='show the last versions', - dest='version', - action='store_true', - default=False) -parser.add_argument('-i', '--install', - help='run the install', - dest='install', - action='store_true', - 
default=False) -parser.add_argument('--dry', - help='dry run', - dest='dry', - action='store_true', - default=False) - -args = parser.parse_args() -dry_run = args.dry - -builds = ['27-32','27-64','34-32','34-64'] -base_dir = "C:\Users\Jeff Reback\Documents\GitHub\pandas" -remote_host='pandas.pydata.org' -username='pandas' -password=############ - -# drop python from our environment to avoid -# passing this onto sub-processes -env = os.environ -del env['PYTHONPATH'] - -# the stdout logger -fmt = '%(asctime)s: %(message)s' -logger = logging.getLogger('check_and_build') -logger.setLevel(logging.DEBUG) -stream_handler = logging.StreamHandler() -stream_handler.setFormatter(logging.Formatter(fmt)) -logger.addHandler(stream_handler) - -def run_all(test=False,compare=False,install=False,version=False,build=None): - # run everything - - for b in builds: - if build is not None and build != b: - continue - if test: - do_rebuild(b) - if compare or test: - try: - do_compare(b) - except (Exception) as e: - logger.info("ERROR COMPARE {0} : {1}".format(b,e)) - if version: - try: - do_version(b) - except (Exception) as e: - logger.info("ERROR VERSION {0} : {1}".format(b,e)) - - if install: - run_install() - -def do_rebuild(build): - # trigger the rebuild - - cmd = "c:/Builds/build_{0}.bat".format(build) - logger.info("rebuild : {0}".format(cmd)) - p = subprocess.Popen("start /wait /min {0}".format(cmd),env=env,shell=True,close_fds=True) - ret = p.wait() - -def do_compare(build): - # print the test outputs - - f = os.path.join(base_dir,"test.{0}.log".format(build)) - with open(f,'r') as fh: - for l in fh: - l = l.rstrip() - if l.startswith('ERROR:'): - logger.info("{0} : {1}".format(build,l)) - if l.startswith('Ran') or l.startswith('OK') or l.startswith('FAIL'): - logger.info("{0} : {1}".format(build,l)) - -def do_version(build): - # print the version strings - - f = os.path.join(base_dir,"versions.{0}.log".format(build)) - with open(f,'r') as fh: - for l in fh: - l = l.rstrip() - logger.info("{0} : {1}".format(build,l)) - -def do_update(is_verbose=True): - # update git; return True if the commit has changed - - repo = git.Repo(base_dir) - master = repo.heads.master - origin = repo.remotes.origin - start_commit = master.commit - - if is_verbose: - logger.info("current commit : {0}".format(start_commit)) - - try: - origin.update() - except (Exception) as e: - logger.info("update exception : {0}".format(e)) - try: - origin.pull() - except (Exception) as e: - logger.info("pull exception : {0}".format(e)) - - result = start_commit != master.commit - if result: - if is_verbose: - logger.info("commits changed : {0} -> {1}".format(start_commit,master.commit)) - return result - -def run_install(): - # send the installation binaries - - repo = git.Repo(base_dir) - master = repo.heads.master - commit = master.commit - short_hash = str(commit)[:7] - - logger.info("sending files : {0}".format(commit)) - d = os.path.join(base_dir,"dist") - files = [ f for f in os.listdir(d) if re.search(short_hash,f) ] - srv = pysftp.Connection(host=remote_host,username=username,password=password) - srv.chdir("www/pandas-build/dev") - - # get current files - remote_files = set(srv.listdir(path='.')) - - for f in files: - if f not in remote_files: - logger.info("sending: {0}".format(f)) - local = os.path.join(d,f) - srv.put(localpath=local) - - srv.close() - logger.info("sending files: done") - -# just perform the action -if args.update or args.test or args.compare or args.install or args.version: - if args.update: - do_update() - 
run_all(test=args.test,compare=args.compare,install=args.install,version=args.version,build=args.build) - exit(0) - -# file logging -file_handler = logging.FileHandler("C:\Builds\logs\check_and_build.log") -file_handler.setFormatter(logging.Formatter(fmt)) -logger.addHandler(file_handler) - -logger.info("start") - -# main loop -while(True): - - if do_update(): - run_all(test=True,install=False) - - time.sleep(60*60) - -logger.info("exit") -file_handler.close() - diff --git a/scripts/windows_builder/readme.txt b/scripts/windows_builder/readme.txt deleted file mode 100644 index 789e2a9ee0c63..0000000000000 --- a/scripts/windows_builder/readme.txt +++ /dev/null @@ -1,17 +0,0 @@ -This is a collection of windows batch scripts (and a python script) -to rebuild the binaries, test, and upload the binaries for public distribution -upon a commit on github. - -Obviously requires that these be setup on windows -Requires an install of Windows SDK 3.5 and 4.0 -Full python installs for each version with the deps - -Currently supporting - -27-32,27-64,34-32,34-64 - -Note that 34 use the 4.0 SDK, while the other suse 3.5 SDK - -I installed these scripts in C:\Builds - -Installed libaries in C:\Installs diff --git a/setup.cfg b/setup.cfg index f69e256b80869..0123078523b6f 100644 --- a/setup.cfg +++ b/setup.cfg @@ -12,10 +12,19 @@ tag_prefix = v parentdir_prefix = pandas- [flake8] -ignore = E731 +ignore = E731,E402 +max-line-length = 79 [yapf] based_on_style = pep8 split_before_named_assigns = false split_penalty_after_opening_bracket = 1000000 split_penalty_logical_operator = 30 + +[tool:pytest] +testpaths = pandas +markers = + single: mark a test as single cpu only + slow: mark a test as slow + network: mark a test as network + highmemory: mark a test as a high-memory only diff --git a/setup.py b/setup.py index 0c4dd33a70482..444db5bc4d275 100755 --- a/setup.py +++ b/setup.py @@ -45,7 +45,7 @@ def is_platform_mac(): _have_setuptools = False setuptools_kwargs = {} -min_numpy_ver = '1.7.0' +min_numpy_ver = '1.9.0' if sys.version_info[0] >= 3: setuptools_kwargs = { @@ -109,20 +109,23 @@ def is_platform_mac(): from os.path import join as pjoin -_pxipath = pjoin('pandas', 'src') _pxi_dep_template = { - 'algos': ['algos_common_helper.pxi.in', 'algos_groupby_helper.pxi.in', - 'algos_take_helper.pxi.in', 'algos_rank_helper.pxi.in'], - '_join': ['join_helper.pxi.in', 'joins_func_helper.pxi.in'], - 'hashtable': ['hashtable_class_helper.pxi.in', - 'hashtable_func_helper.pxi.in'], - 'index': ['index_class_helper.pxi.in'], - '_sparse': ['sparse_op_helper.pxi.in'] + 'algos': ['_libs/algos_common_helper.pxi.in', + '_libs/algos_take_helper.pxi.in', '_libs/algos_rank_helper.pxi.in'], + 'groupby': ['_libs/groupby_helper.pxi.in'], + 'join': ['_libs/join_helper.pxi.in', '_libs/join_func_helper.pxi.in'], + 'reshape': ['_libs/reshape_helper.pxi.in'], + 'hashtable': ['_libs/hashtable_class_helper.pxi.in', + '_libs/hashtable_func_helper.pxi.in'], + 'index': ['_libs/index_class_helper.pxi.in'], + 'sparse': ['_libs/sparse_op_helper.pxi.in'], + 'interval': ['_libs/intervaltree.pxi.in'] } + _pxifiles = [] _pxi_dep = {} for module, files in _pxi_dep_template.items(): - pxi_files = [pjoin(_pxipath, x) for x in files] + pxi_files = [pjoin('pandas', x) for x in files] _pxifiles.extend(pxi_files) _pxi_dep[module] = pxi_files @@ -243,7 +246,6 @@ def build_extensions(self): 'Programming Language :: Python :: 2', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 2.7', - 'Programming Language :: Python :: 3.4', 
'Programming Language :: Python :: 3.5', 'Programming Language :: Python :: 3.6', 'Programming Language :: Cython', @@ -260,7 +262,7 @@ def initialize_options(self): self._clean_me = [] self._clean_trees = [] - base = pjoin('pandas','src') + base = pjoin('pandas','_libs', 'src') dt = pjoin(base,'datetime') src = base util = pjoin('pandas','util') @@ -326,19 +328,20 @@ def run(self): class CheckSDist(sdist_class): """Custom sdist that ensures Cython has compiled all pyx files to c.""" - _pyxfiles = ['pandas/lib.pyx', - 'pandas/hashtable.pyx', - 'pandas/tslib.pyx', - 'pandas/index.pyx', - 'pandas/algos.pyx', - 'pandas/join.pyx', - 'pandas/window.pyx', - 'pandas/parser.pyx', - 'pandas/src/period.pyx', - 'pandas/src/sparse.pyx', - 'pandas/src/testing.pyx', - 'pandas/src/hash.pyx', - 'pandas/io/sas/saslib.pyx'] + _pyxfiles = ['pandas/_libs/lib.pyx', + 'pandas/_libs/hashtable.pyx', + 'pandas/_libs/tslib.pyx', + 'pandas/_libs/period.pyx', + 'pandas/_libs/index.pyx', + 'pandas/_libs/algos.pyx', + 'pandas/_libs/join.pyx', + 'pandas/_libs/interval.pyx', + 'pandas/_libs/hashing.pyx', + 'pandas/_libs/testing.pyx', + 'pandas/_libs/window.pyx', + 'pandas/_libs/sparse.pyx', + 'pandas/_libs/parsers.pyx', + 'pandas/io/sas/sas.pyx'] def initialize_options(self): sdist_class.initialize_options(self) @@ -373,6 +376,7 @@ def check_cython_extensions(self, extensions): for ext in extensions: for src in ext.sources: if not os.path.exists(src): + print("{}: -> [{}]".format(ext.name, ext.sources)) raise Exception("""Cython-generated file '%s' not found. Cython is required to compile pandas from a development branch. Please install Cython or download a release package of pandas. @@ -438,13 +442,13 @@ def srcpath(name=None, suffix='.pyx', subdir='src'): return pjoin('pandas', subdir, name + suffix) if suffix == '.pyx': - lib_depends = [srcpath(f, suffix='.pyx') for f in lib_depends] - lib_depends.append('pandas/src/util.pxd') + lib_depends = [srcpath(f, suffix='.pyx', subdir='_libs/src') for f in lib_depends] + lib_depends.append('pandas/_libs/src/util.pxd') else: lib_depends = [] plib_depends = [] -common_include = ['pandas/src/klib', 'pandas/src'] +common_include = ['pandas/_libs/src/klib', 'pandas/_libs/src'] def pxd(name): @@ -456,69 +460,76 @@ def pxd(name): else: extra_compile_args=['-Wno-unused-function'] -lib_depends = lib_depends + ['pandas/src/numpy_helper.h', - 'pandas/src/parse_helper.h'] +lib_depends = lib_depends + ['pandas/_libs/src/numpy_helper.h', + 'pandas/_libs/src/parse_helper.h', + 'pandas/_libs/src/compat_helper.h'] -tseries_depends = ['pandas/src/datetime/np_datetime.h', - 'pandas/src/datetime/np_datetime_strings.h', - 'pandas/src/datetime_helper.h', - 'pandas/src/period_helper.h', - 'pandas/src/datetime.pxd'] +tseries_depends = ['pandas/_libs/src/datetime/np_datetime.h', + 'pandas/_libs/src/datetime/np_datetime_strings.h', + 'pandas/_libs/src/period_helper.h', + 'pandas/_libs/src/datetime.pxd'] # some linux distros require it libraries = ['m'] if not is_platform_windows() else [] -ext_data = dict( - lib={'pyxfile': 'lib', - 'pxdfiles': [], - 'depends': lib_depends}, - hashtable={'pyxfile': 'hashtable', - 'pxdfiles': ['hashtable'], - 'depends': (['pandas/src/klib/khash_python.h'] - + _pxi_dep['hashtable'])}, - tslib={'pyxfile': 'tslib', - 'depends': tseries_depends, - 'sources': ['pandas/src/datetime/np_datetime.c', - 'pandas/src/datetime/np_datetime_strings.c', - 'pandas/src/period_helper.c']}, - _period={'pyxfile': 'src/period', - 'depends': tseries_depends, - 'sources': 
['pandas/src/datetime/np_datetime.c', - 'pandas/src/datetime/np_datetime_strings.c', - 'pandas/src/period_helper.c']}, - index={'pyxfile': 'index', - 'sources': ['pandas/src/datetime/np_datetime.c', - 'pandas/src/datetime/np_datetime_strings.c'], - 'pxdfiles': ['src/util', 'hashtable'], - 'depends': _pxi_dep['index']}, - algos={'pyxfile': 'algos', - 'pxdfiles': ['src/util', 'hashtable'], - 'depends': _pxi_dep['algos']}, - _join={'pyxfile': 'src/join', - 'pxdfiles': ['src/util', 'hashtable'], - 'depends': _pxi_dep['_join']}, - _window={'pyxfile': 'window', - 'pxdfiles': ['src/skiplist', 'src/util'], - 'depends': ['pandas/src/skiplist.pyx', - 'pandas/src/skiplist.h']}, - parser={'pyxfile': 'parser', - 'depends': ['pandas/src/parser/tokenizer.h', - 'pandas/src/parser/io.h', - 'pandas/src/numpy_helper.h'], - 'sources': ['pandas/src/parser/tokenizer.c', - 'pandas/src/parser/io.c']}, - _sparse={'pyxfile': 'src/sparse', - 'depends': ([srcpath('sparse', suffix='.pyx')] + - _pxi_dep['_sparse'])}, - _testing={'pyxfile': 'src/testing', - 'depends': [srcpath('testing', suffix='.pyx')]}, - _hash={'pyxfile': 'src/hash', - 'depends': [srcpath('hash', suffix='.pyx')]}, -) - -ext_data["io.sas.saslib"] = {'pyxfile': 'io/sas/saslib'} +ext_data = { + '_libs.lib': {'pyxfile': '_libs/lib', + 'depends': lib_depends + tseries_depends}, + '_libs.hashtable': {'pyxfile': '_libs/hashtable', + 'pxdfiles': ['_libs/hashtable'], + 'depends': (['pandas/_libs/src/klib/khash_python.h'] + + _pxi_dep['hashtable'])}, + '_libs.tslib': {'pyxfile': '_libs/tslib', + 'pxdfiles': ['_libs/src/util', '_libs/lib'], + 'depends': tseries_depends, + 'sources': ['pandas/_libs/src/datetime/np_datetime.c', + 'pandas/_libs/src/datetime/np_datetime_strings.c', + 'pandas/_libs/src/period_helper.c']}, + '_libs.period': {'pyxfile': '_libs/period', + 'depends': tseries_depends, + 'sources': ['pandas/_libs/src/datetime/np_datetime.c', + 'pandas/_libs/src/datetime/np_datetime_strings.c', + 'pandas/_libs/src/period_helper.c']}, + '_libs.index': {'pyxfile': '_libs/index', + 'sources': ['pandas/_libs/src/datetime/np_datetime.c', + 'pandas/_libs/src/datetime/np_datetime_strings.c'], + 'pxdfiles': ['_libs/src/util', '_libs/hashtable'], + 'depends': _pxi_dep['index']}, + '_libs.algos': {'pyxfile': '_libs/algos', + 'pxdfiles': ['_libs/src/util', '_libs/algos', '_libs/hashtable'], + 'depends': _pxi_dep['algos']}, + '_libs.groupby': {'pyxfile': '_libs/groupby', + 'pxdfiles': ['_libs/src/util', '_libs/algos'], + 'depends': _pxi_dep['groupby']}, + '_libs.join': {'pyxfile': '_libs/join', + 'pxdfiles': ['_libs/src/util', '_libs/hashtable'], + 'depends': _pxi_dep['join']}, + '_libs.reshape': {'pyxfile': '_libs/reshape', + 'depends': _pxi_dep['reshape']}, + '_libs.interval': {'pyxfile': '_libs/interval', + 'pxdfiles': ['_libs/hashtable'], + 'depends': _pxi_dep['interval']}, + '_libs.window': {'pyxfile': '_libs/window', + 'pxdfiles': ['_libs/src/skiplist', '_libs/src/util'], + 'depends': ['pandas/_libs/src/skiplist.pyx', + 'pandas/_libs/src/skiplist.h']}, + '_libs.parsers': {'pyxfile': '_libs/parsers', + 'depends': ['pandas/_libs/src/parser/tokenizer.h', + 'pandas/_libs/src/parser/io.h', + 'pandas/_libs/src/numpy_helper.h'], + 'sources': ['pandas/_libs/src/parser/tokenizer.c', + 'pandas/_libs/src/parser/io.c']}, + '_libs.sparse': {'pyxfile': '_libs/sparse', + 'depends': (['pandas/_libs/sparse.pyx'] + + _pxi_dep['sparse'])}, + '_libs.testing': {'pyxfile': '_libs/testing', + 'depends': ['pandas/_libs/testing.pyx']}, + '_libs.hashing': {'pyxfile': '_libs/hashing', + 
'depends': ['pandas/_libs/hashing.pyx']}, + 'io.sas._sas': {'pyxfile': 'io/sas/sas'}, + } extensions = [] @@ -549,25 +560,25 @@ def pxd(name): else: macros = [('__LITTLE_ENDIAN__', '1')] -packer_ext = Extension('pandas.msgpack._packer', - depends=['pandas/src/msgpack/pack.h', - 'pandas/src/msgpack/pack_template.h'], +packer_ext = Extension('pandas.io.msgpack._packer', + depends=['pandas/_libs/src/msgpack/pack.h', + 'pandas/_libs/src/msgpack/pack_template.h'], sources = [srcpath('_packer', suffix=suffix if suffix == '.pyx' else '.cpp', - subdir='msgpack')], + subdir='io/msgpack')], language='c++', - include_dirs=['pandas/src/msgpack'] + common_include, + include_dirs=['pandas/_libs/src/msgpack'] + common_include, define_macros=macros, extra_compile_args=extra_compile_args) -unpacker_ext = Extension('pandas.msgpack._unpacker', - depends=['pandas/src/msgpack/unpack.h', - 'pandas/src/msgpack/unpack_define.h', - 'pandas/src/msgpack/unpack_template.h'], +unpacker_ext = Extension('pandas.io.msgpack._unpacker', + depends=['pandas/_libs/src/msgpack/unpack.h', + 'pandas/_libs/src/msgpack/unpack_define.h', + 'pandas/_libs/src/msgpack/unpack_template.h'], sources = [srcpath('_unpacker', suffix=suffix if suffix == '.pyx' else '.cpp', - subdir='msgpack')], + subdir='io/msgpack')], language='c++', - include_dirs=['pandas/src/msgpack'] + common_include, + include_dirs=['pandas/_libs/src/msgpack'] + common_include, define_macros=macros, extra_compile_args=extra_compile_args) extensions.append(packer_ext) @@ -583,20 +594,19 @@ def pxd(name): root, _ = os.path.splitext(ext.sources[0]) ext.sources[0] = root + suffix -ujson_ext = Extension('pandas.json', - depends=['pandas/src/ujson/lib/ultrajson.h', - 'pandas/src/datetime_helper.h', - 'pandas/src/numpy_helper.h'], - sources=['pandas/src/ujson/python/ujson.c', - 'pandas/src/ujson/python/objToJSON.c', - 'pandas/src/ujson/python/JSONtoObj.c', - 'pandas/src/ujson/lib/ultrajsonenc.c', - 'pandas/src/ujson/lib/ultrajsondec.c', - 'pandas/src/datetime/np_datetime.c', - 'pandas/src/datetime/np_datetime_strings.c'], - include_dirs=['pandas/src/ujson/python', - 'pandas/src/ujson/lib', - 'pandas/src/datetime'] + common_include, +ujson_ext = Extension('pandas._libs.json', + depends=['pandas/_libs/src/ujson/lib/ultrajson.h', + 'pandas/_libs/src/numpy_helper.h'], + sources=['pandas/_libs/src/ujson/python/ujson.c', + 'pandas/_libs/src/ujson/python/objToJSON.c', + 'pandas/_libs/src/ujson/python/JSONtoObj.c', + 'pandas/_libs/src/ujson/lib/ultrajsonenc.c', + 'pandas/_libs/src/ujson/lib/ultrajsondec.c', + 'pandas/_libs/src/datetime/np_datetime.c', + 'pandas/_libs/src/datetime/np_datetime_strings.c'], + include_dirs=['pandas/_libs/src/ujson/python', + 'pandas/_libs/src/ujson/lib', + 'pandas/_libs/src/datetime'] + common_include, extra_compile_args=['-D_GNU_SOURCE'] + extra_compile_args) @@ -622,70 +632,90 @@ def pxd(name): version=versioneer.get_version(), packages=['pandas', 'pandas.api', - 'pandas.api.tests', 'pandas.api.types', 'pandas.compat', 'pandas.compat.numpy', - 'pandas.computation', - 'pandas.computation.tests', 'pandas.core', - 'pandas.indexes', + 'pandas.core.dtypes', + 'pandas.core.indexes', + 'pandas.core.computation', + 'pandas.core.reshape', + 'pandas.core.sparse', + 'pandas.core.tools', + 'pandas.core.util', + 'pandas.computation', + 'pandas.errors', + 'pandas.formats', 'pandas.io', + 'pandas.io.json', 'pandas.io.sas', - 'pandas.formats', - 'pandas.sparse', - 'pandas.sparse.tests', + 'pandas.io.msgpack', + 'pandas.io.formats', + 'pandas.io.clipboard', + 
'pandas._libs', + 'pandas.plotting', 'pandas.stats', + 'pandas.types', 'pandas.util', 'pandas.tests', + 'pandas.tests.api', + 'pandas.tests.dtypes', + 'pandas.tests.computation', + 'pandas.tests.sparse', 'pandas.tests.frame', + 'pandas.tests.indexing', 'pandas.tests.indexes', + 'pandas.tests.indexes.datetimes', + 'pandas.tests.indexes.timedeltas', + 'pandas.tests.indexes.period', + 'pandas.tests.internals', + 'pandas.tests.io', + 'pandas.tests.io.json', + 'pandas.tests.io.parser', + 'pandas.tests.io.sas', + 'pandas.tests.io.msgpack', + 'pandas.tests.io.formats', 'pandas.tests.groupby', + 'pandas.tests.reshape', 'pandas.tests.series', - 'pandas.tests.formats', - 'pandas.tests.types', - 'pandas.tests.test_msgpack', + 'pandas.tests.scalar', + 'pandas.tests.tseries', 'pandas.tests.plotting', + 'pandas.tests.tools', + 'pandas.tests.util', 'pandas.tools', - 'pandas.tools.tests', 'pandas.tseries', - 'pandas.tseries.tests', - 'pandas.types', - 'pandas.io.tests', - 'pandas.io.tests.json', - 'pandas.io.tests.parser', - 'pandas.io.tests.sas', - 'pandas.stats.tests', - 'pandas.msgpack', - 'pandas.util.clipboard' ], - package_data={'pandas.io': ['tests/data/legacy_hdf/*.h5', - 'tests/data/legacy_pickle/*/*.pickle', - 'tests/data/legacy_msgpack/*/*.msgpack', - 'tests/data/*.csv*', - 'tests/data/*.dta', - 'tests/data/*.pickle', - 'tests/data/*.txt', - 'tests/data/*.xls', - 'tests/data/*.xlsx', - 'tests/data/*.xlsm', - 'tests/data/*.table', - 'tests/parser/data/*.csv', - 'tests/parser/data/*.gz', - 'tests/parser/data/*.bz2', - 'tests/parser/data/*.txt', - 'tests/sas/data/*.csv', - 'tests/sas/data/*.xpt', - 'tests/sas/data/*.sas7bdat', - 'tests/data/*.html', - 'tests/data/html_encoding/*.html', - 'tests/json/data/*.json'], - 'pandas.tools': ['tests/data/*.csv'], - 'pandas.tests': ['data/*.csv'], - 'pandas.tests.formats': ['data/*.csv'], + package_data={'pandas.tests': ['data/*.csv'], 'pandas.tests.indexes': ['data/*.pickle'], - 'pandas.tseries.tests': ['data/*.pickle', - 'data/*.csv'] + 'pandas.tests.io': ['data/legacy_hdf/*.h5', + 'data/legacy_pickle/*/*.pickle', + 'data/legacy_msgpack/*/*.msgpack', + 'data/*.csv*', + 'data/*.dta', + 'data/*.pickle', + 'data/*.txt', + 'data/*.xls', + 'data/*.xlsx', + 'data/*.xlsm', + 'data/*.table', + 'parser/data/*.csv', + 'parser/data/*.gz', + 'parser/data/*.bz2', + 'parser/data/*.txt', + 'parser/data/*.tar', + 'parser/data/*.tar.gz', + 'sas/data/*.csv', + 'sas/data/*.xpt', + 'sas/data/*.sas7bdat', + 'data/*.html', + 'data/html_encoding/*.html', + 'json/data/*.json'], + 'pandas.tests.io.formats': ['data/*.csv'], + 'pandas.tests.io.msgpack': ['data/*.mp'], + 'pandas.tests.reshape': ['data/*.csv'], + 'pandas.tests.tseries': ['data/*.pickle'], + 'pandas.io.formats': ['templates/*.tpl'] }, ext_modules=extensions, maintainer_email=EMAIL, diff --git a/test.bat b/test.bat index 16aa6c9105ec3..6c69f83866ffd 100644 --- a/test.bat +++ b/test.bat @@ -1,3 +1,3 @@ :: test on windows -nosetests --exe -A "not slow and not network and not disabled" pandas %* +pytest --skip-slow --skip-network pandas -n 2 %* diff --git a/test.sh b/test.sh index 4a9ffd7be98b1..23c7ff52d2ce9 100755 --- a/test.sh +++ b/test.sh @@ -1,11 +1,4 @@ #!/bin/sh command -v coverage >/dev/null && coverage erase command -v python-coverage >/dev/null && python-coverage erase -# nosetests pandas/tests/test_index.py --with-coverage --cover-package=pandas.core --pdb-failure --pdb -#nosetests -w pandas --with-coverage --cover-package=pandas --pdb-failure --pdb #--cover-inclusive -#nosetests -A "not slow" -w pandas/tseries 
--with-coverage --cover-package=pandas.tseries $* #--cover-inclusive
-nosetests -w pandas --with-coverage --cover-package=pandas $*
-# nosetests -w pandas/io --with-coverage --cover-package=pandas.io --pdb-failure --pdb
-# nosetests -w pandas/core --with-coverage --cover-package=pandas.core --pdb-failure --pdb
-# nosetests -w pandas/stats --with-coverage --cover-package=pandas.stats
-# coverage run runtests.py
+pytest pandas --cov=pandas
diff --git a/test_fast.bat b/test_fast.bat
new file mode 100644
index 0000000000000..17dc54b580137
--- /dev/null
+++ b/test_fast.bat
@@ -0,0 +1,3 @@
+:: test on windows
+set PYTHONHASHSEED=314159265
+pytest --skip-slow --skip-network -m "not single" -n 4 pandas
diff --git a/test_fast.sh b/test_fast.sh
index b390705f901ad..9b984156a796c 100755
--- a/test_fast.sh
+++ b/test_fast.sh
@@ -1 +1,8 @@
-nosetests -A "not slow and not network" pandas --with-id $*
+#!/bin/bash
+
+# Workaround for pytest-xdist flaky collection order
+# https://github.com/pytest-dev/pytest/issues/920
+# https://github.com/pytest-dev/pytest/issues/1075
+export PYTHONHASHSEED=$(python -c 'import random; print(random.randint(1, 4294967295))')
+
+pytest pandas --skip-slow --skip-network -m "not single" -n 4 "$@"
diff --git a/test_multi.sh b/test_multi.sh
deleted file mode 100755
index 5d77945c66a26..0000000000000
--- a/test_multi.sh
+++ /dev/null
@@ -1 +0,0 @@
-nosetests -A "not slow and not network" pandas --processes=4 $*
diff --git a/test_perf.sh b/test_perf.sh
deleted file mode 100755
index 022de25bca8fc..0000000000000
--- a/test_perf.sh
+++ /dev/null
@@ -1,5 +0,0 @@
-#!/bin/sh
-
-CURDIR=$(pwd)
-BASEDIR=$(cd "$(dirname "$0")"; pwd)
-python "$BASEDIR"/vb_suite/test_perf.py $@
diff --git a/test_rebuild.sh b/test_rebuild.sh
index d3710c5ff67d3..65aa1098811a1 100755
--- a/test_rebuild.sh
+++ b/test_rebuild.sh
@@ -3,10 +3,4 @@
 python setup.py clean
 python setup.py build_ext --inplace
 coverage erase
-# nosetests pandas/tests/test_index.py --with-coverage --cover-package=pandas.core --pdb-failure --pdb
-#nosetests -w pandas --with-coverage --cover-package=pandas --pdb-failure --pdb #--cover-inclusive
-nosetests -w pandas --with-coverage --cover-package=pandas $* #--cover-inclusive
-# nosetests -w pandas/io --with-coverage --cover-package=pandas.io --pdb-failure --pdb
-# nosetests -w pandas/core --with-coverage --cover-package=pandas.core --pdb-failure --pdb
-# nosetests -w pandas/stats --with-coverage --cover-package=pandas.stats
-# coverage run runtests.py
+pytest pandas --cov=pandas
diff --git a/tox.ini b/tox.ini
index 5d6c8975307b6..45ad7fc451e76 100644
--- a/tox.ini
+++ b/tox.ini
@@ -4,12 +4,13 @@
 # and then run "tox" from this directory.

 [tox]
-envlist = py27, py34, py35
+envlist = py27, py35, py36

 [testenv]
 deps =
     cython
     nose
+    pytest
     pytz>=2011k
     python-dateutil
     beautifulsoup4
@@ -26,7 +27,7 @@ changedir = {envdir}

 commands =
     # TODO: --exe because of GH #761
-    {envbindir}/nosetests --exe pandas {posargs:-A "not network and not disabled"}
+    {envbindir}/pytest pandas {posargs:-m "not network"}

    # cleanup the temp.
build dir created by the tox build # /bin/rm -rf {toxinidir}/build @@ -48,14 +49,14 @@ deps = bigquery {[testenv]deps} -[testenv:py34] +[testenv:py35] deps = - numpy==1.8.0 + numpy==1.10.0 {[testenv]deps} -[testenv:py35] +[testenv:py36] deps = - numpy==1.10.0 + numpy {[testenv]deps} [testenv:openpyxl1] @@ -63,18 +64,18 @@ usedevelop = True deps = {[testenv]deps} openpyxl<2.0.0 -commands = {envbindir}/nosetests {toxinidir}/pandas/io/tests/test_excel.py +commands = {envbindir}/pytest {toxinidir}/pandas/io/tests/test_excel.py [testenv:openpyxl20] usedevelop = True deps = {[testenv]deps} openpyxl<2.2.0 -commands = {envbindir}/nosetests {posargs} {toxinidir}/pandas/io/tests/test_excel.py +commands = {envbindir}/pytest {posargs} {toxinidir}/pandas/io/tests/test_excel.py [testenv:openpyxl22] usedevelop = True deps = {[testenv]deps} openpyxl>=2.2.0 -commands = {envbindir}/nosetests {posargs} {toxinidir}/pandas/io/tests/test_excel.py +commands = {envbindir}/pytest {posargs} {toxinidir}/pandas/io/tests/test_excel.py diff --git a/vb_suite/.gitignore b/vb_suite/.gitignore deleted file mode 100644 index cc110f04e1225..0000000000000 --- a/vb_suite/.gitignore +++ /dev/null @@ -1,4 +0,0 @@ -benchmarks.db -build/* -source/vbench/* -source/*.rst \ No newline at end of file diff --git a/vb_suite/attrs_caching.py b/vb_suite/attrs_caching.py deleted file mode 100644 index a7e3ed7094ed6..0000000000000 --- a/vb_suite/attrs_caching.py +++ /dev/null @@ -1,20 +0,0 @@ -from vbench.benchmark import Benchmark - -common_setup = """from .pandas_vb_common import * -""" - -#---------------------------------------------------------------------- -# DataFrame.index / columns property lookup time - -setup = common_setup + """ -df = DataFrame(np.random.randn(10, 6)) -cur_index = df.index -""" -stmt = "foo = df.index" - -getattr_dataframe_index = Benchmark(stmt, setup, - name="getattr_dataframe_index") - -stmt = "df.index = cur_index" -setattr_dataframe_index = Benchmark(stmt, setup, - name="setattr_dataframe_index") diff --git a/vb_suite/binary_ops.py b/vb_suite/binary_ops.py deleted file mode 100644 index 7c821374a83ab..0000000000000 --- a/vb_suite/binary_ops.py +++ /dev/null @@ -1,199 +0,0 @@ -from vbench.benchmark import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -""" - -SECTION = 'Binary ops' - -#---------------------------------------------------------------------- -# binary ops - -#---------------------------------------------------------------------- -# add - -setup = common_setup + """ -df = DataFrame(np.random.randn(20000, 100)) -df2 = DataFrame(np.random.randn(20000, 100)) -""" -frame_add = \ - Benchmark("df + df2", setup, name='frame_add', - start_date=datetime(2012, 1, 1)) - -setup = common_setup + """ -import pandas.computation.expressions as expr -df = DataFrame(np.random.randn(20000, 100)) -df2 = DataFrame(np.random.randn(20000, 100)) -expr.set_numexpr_threads(1) -""" - -frame_add_st = \ - Benchmark("df + df2", setup, name='frame_add_st',cleanup="expr.set_numexpr_threads()", - start_date=datetime(2013, 2, 26)) - -setup = common_setup + """ -import pandas.computation.expressions as expr -df = DataFrame(np.random.randn(20000, 100)) -df2 = DataFrame(np.random.randn(20000, 100)) -expr.set_use_numexpr(False) -""" -frame_add_no_ne = \ - Benchmark("df + df2", setup, name='frame_add_no_ne',cleanup="expr.set_use_numexpr(True)", - start_date=datetime(2013, 2, 26)) - -#---------------------------------------------------------------------- -# mult - -setup = common_setup + 
""" -df = DataFrame(np.random.randn(20000, 100)) -df2 = DataFrame(np.random.randn(20000, 100)) -""" -frame_mult = \ - Benchmark("df * df2", setup, name='frame_mult', - start_date=datetime(2012, 1, 1)) - -setup = common_setup + """ -import pandas.computation.expressions as expr -df = DataFrame(np.random.randn(20000, 100)) -df2 = DataFrame(np.random.randn(20000, 100)) -expr.set_numexpr_threads(1) -""" -frame_mult_st = \ - Benchmark("df * df2", setup, name='frame_mult_st',cleanup="expr.set_numexpr_threads()", - start_date=datetime(2013, 2, 26)) - -setup = common_setup + """ -import pandas.computation.expressions as expr -df = DataFrame(np.random.randn(20000, 100)) -df2 = DataFrame(np.random.randn(20000, 100)) -expr.set_use_numexpr(False) -""" -frame_mult_no_ne = \ - Benchmark("df * df2", setup, name='frame_mult_no_ne',cleanup="expr.set_use_numexpr(True)", - start_date=datetime(2013, 2, 26)) - -#---------------------------------------------------------------------- -# division - -setup = common_setup + """ -df = DataFrame(np.random.randn(1000, 1000)) -""" -frame_float_div_by_zero = \ - Benchmark("df / 0", setup, name='frame_float_div_by_zero') - -setup = common_setup + """ -df = DataFrame(np.random.randn(1000, 1000)) -""" -frame_float_floor_by_zero = \ - Benchmark("df // 0", setup, name='frame_float_floor_by_zero') - -setup = common_setup + """ -df = DataFrame(np.random.random_integers(np.iinfo(np.int16).min, np.iinfo(np.int16).max, size=(1000, 1000))) -""" -frame_int_div_by_zero = \ - Benchmark("df / 0", setup, name='frame_int_div_by_zero') - -setup = common_setup + """ -df = DataFrame(np.random.randn(1000, 1000)) -df2 = DataFrame(np.random.randn(1000, 1000)) -""" -frame_float_div = \ - Benchmark("df // df2", setup, name='frame_float_div') - -#---------------------------------------------------------------------- -# modulo - -setup = common_setup + """ -df = DataFrame(np.random.randn(1000, 1000)) -df2 = DataFrame(np.random.randn(1000, 1000)) -""" -frame_float_mod = \ - Benchmark("df / df2", setup, name='frame_float_mod') - -setup = common_setup + """ -df = DataFrame(np.random.random_integers(np.iinfo(np.int16).min, np.iinfo(np.int16).max, size=(1000, 1000))) -df2 = DataFrame(np.random.random_integers(np.iinfo(np.int16).min, np.iinfo(np.int16).max, size=(1000, 1000))) -""" -frame_int_mod = \ - Benchmark("df / df2", setup, name='frame_int_mod') - -#---------------------------------------------------------------------- -# multi and - -setup = common_setup + """ -df = DataFrame(np.random.randn(20000, 100)) -df2 = DataFrame(np.random.randn(20000, 100)) -""" -frame_multi_and = \ - Benchmark("df[(df>0) & (df2>0)]", setup, name='frame_multi_and', - start_date=datetime(2012, 1, 1)) - -setup = common_setup + """ -import pandas.computation.expressions as expr -df = DataFrame(np.random.randn(20000, 100)) -df2 = DataFrame(np.random.randn(20000, 100)) -expr.set_numexpr_threads(1) -""" -frame_multi_and_st = \ - Benchmark("df[(df>0) & (df2>0)]", setup, name='frame_multi_and_st',cleanup="expr.set_numexpr_threads()", - start_date=datetime(2013, 2, 26)) - -setup = common_setup + """ -import pandas.computation.expressions as expr -df = DataFrame(np.random.randn(20000, 100)) -df2 = DataFrame(np.random.randn(20000, 100)) -expr.set_use_numexpr(False) -""" -frame_multi_and_no_ne = \ - Benchmark("df[(df>0) & (df2>0)]", setup, name='frame_multi_and_no_ne',cleanup="expr.set_use_numexpr(True)", - start_date=datetime(2013, 2, 26)) - -#---------------------------------------------------------------------- -# timeseries - 
-setup = common_setup + """ -N = 1000000 -halfway = N // 2 - 1 -s = Series(date_range('20010101', periods=N, freq='T')) -ts = s[halfway] -""" - -timestamp_series_compare = Benchmark("ts >= s", setup, - start_date=datetime(2013, 9, 27)) -series_timestamp_compare = Benchmark("s <= ts", setup, - start_date=datetime(2012, 2, 21)) - -setup = common_setup + """ -N = 1000000 -s = Series(date_range('20010101', periods=N, freq='s')) -""" - -timestamp_ops_diff1 = Benchmark("s.diff()", setup, - start_date=datetime(2013, 1, 1)) -timestamp_ops_diff2 = Benchmark("s-s.shift()", setup, - start_date=datetime(2013, 1, 1)) - -#---------------------------------------------------------------------- -# timeseries with tz - -setup = common_setup + """ -N = 10000 -halfway = N // 2 - 1 -s = Series(date_range('20010101', periods=N, freq='T', tz='US/Eastern')) -ts = s[halfway] -""" - -timestamp_tz_series_compare = Benchmark("ts >= s", setup, - start_date=datetime(2013, 9, 27)) -series_timestamp_tz_compare = Benchmark("s <= ts", setup, - start_date=datetime(2012, 2, 21)) - -setup = common_setup + """ -N = 10000 -s = Series(date_range('20010101', periods=N, freq='s', tz='US/Eastern')) -""" - -timestamp_tz_ops_diff1 = Benchmark("s.diff()", setup, - start_date=datetime(2013, 1, 1)) -timestamp_tz_ops_diff2 = Benchmark("s-s.shift()", setup, - start_date=datetime(2013, 1, 1)) diff --git a/vb_suite/categoricals.py b/vb_suite/categoricals.py deleted file mode 100644 index a08d479df20cb..0000000000000 --- a/vb_suite/categoricals.py +++ /dev/null @@ -1,16 +0,0 @@ -from vbench.benchmark import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -""" - -#---------------------------------------------------------------------- -# Series constructors - -setup = common_setup + """ -s = pd.Series(list('aabbcd') * 1000000).astype('category') -""" - -concat_categorical = \ - Benchmark("concat([s, s])", setup=setup, name='concat_categorical', - start_date=datetime(year=2015, month=7, day=15)) diff --git a/vb_suite/ctors.py b/vb_suite/ctors.py deleted file mode 100644 index 8123322383f0a..0000000000000 --- a/vb_suite/ctors.py +++ /dev/null @@ -1,39 +0,0 @@ -from vbench.benchmark import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -""" - -#---------------------------------------------------------------------- -# Series constructors - -setup = common_setup + """ -data = np.random.randn(100) -index = Index(np.arange(100)) -""" - -ctor_series_ndarray = \ - Benchmark("Series(data, index=index)", setup=setup, - name='series_constructor_ndarray') - -setup = common_setup + """ -arr = np.random.randn(100, 100) -""" - -ctor_frame_ndarray = \ - Benchmark("DataFrame(arr)", setup=setup, - name='frame_constructor_ndarray') - -setup = common_setup + """ -data = np.array(['foo', 'bar', 'baz'], dtype=object) -""" - -ctor_index_array_string = Benchmark('Index(data)', setup=setup) - -# index constructors -setup = common_setup + """ -s = Series([Timestamp('20110101'),Timestamp('20120101'),Timestamp('20130101')]*1000) -""" -index_from_series_ctor = Benchmark('Index(s)', setup=setup) - -dtindex_from_series_ctor = Benchmark('DatetimeIndex(s)', setup=setup) diff --git a/vb_suite/eval.py b/vb_suite/eval.py deleted file mode 100644 index bf80aad956184..0000000000000 --- a/vb_suite/eval.py +++ /dev/null @@ -1,150 +0,0 @@ -from vbench.benchmark import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -import pandas as pd -df = 
DataFrame(np.random.randn(20000, 100))
-df2 = DataFrame(np.random.randn(20000, 100))
-df3 = DataFrame(np.random.randn(20000, 100))
-df4 = DataFrame(np.random.randn(20000, 100))
-"""
-
-setup = common_setup + """
-import pandas.computation.expressions as expr
-expr.set_numexpr_threads(1)
-"""
-
-SECTION = 'Eval'
-
-#----------------------------------------------------------------------
-# binary ops
-
-#----------------------------------------------------------------------
-# add
-eval_frame_add_all_threads = \
-    Benchmark("pd.eval('df + df2 + df3 + df4')", common_setup,
-              name='eval_frame_add_all_threads',
-              start_date=datetime(2013, 7, 21))
-
-
-eval_frame_add_one_thread = \
-    Benchmark("pd.eval('df + df2 + df3 + df4')", setup,
-              name='eval_frame_add_one_thread',
-              start_date=datetime(2013, 7, 26))
-
-eval_frame_add_python = \
-    Benchmark("pd.eval('df + df2 + df3 + df4', engine='python')", common_setup,
-              name='eval_frame_add_python', start_date=datetime(2013, 7, 21))
-
-eval_frame_add_python_one_thread = \
-    Benchmark("pd.eval('df + df2 + df3 + df4', engine='python')", setup,
-              name='eval_frame_add_python_one_thread',
-              start_date=datetime(2013, 7, 26))
-#----------------------------------------------------------------------
-# mult
-
-eval_frame_mult_all_threads = \
-    Benchmark("pd.eval('df * df2 * df3 * df4')", common_setup,
-              name='eval_frame_mult_all_threads',
-              start_date=datetime(2013, 7, 21))
-
-eval_frame_mult_one_thread = \
-    Benchmark("pd.eval('df * df2 * df3 * df4')", setup,
-              name='eval_frame_mult_one_thread',
-              start_date=datetime(2013, 7, 26))
-
-eval_frame_mult_python = \
-    Benchmark("pd.eval('df * df2 * df3 * df4', engine='python')",
-              common_setup,
-              name='eval_frame_mult_python', start_date=datetime(2013, 7, 21))
-
-eval_frame_mult_python_one_thread = \
-    Benchmark("pd.eval('df * df2 * df3 * df4', engine='python')", setup,
-              name='eval_frame_mult_python_one_thread',
-              start_date=datetime(2013, 7, 26))
-
-#----------------------------------------------------------------------
-# multi and
-
-eval_frame_and_all_threads = \
-    Benchmark("pd.eval('(df > 0) & (df2 > 0) & (df3 > 0) & (df4 > 0)')",
-              common_setup,
-              name='eval_frame_and_all_threads',
-              start_date=datetime(2013, 7, 21))
-
-eval_frame_and_one_thread = \
-    Benchmark("pd.eval('(df > 0) & (df2 > 0) & (df3 > 0) & (df4 > 0)')", setup,
-              name='eval_frame_and_one_thread',
-              start_date=datetime(2013, 7, 26))
-
-eval_frame_and_python = \
-    Benchmark("pd.eval('(df > 0) & (df2 > 0) & (df3 > 0) & (df4 > 0)', engine='python')",
-              common_setup, name='eval_frame_and_python',
-              start_date=datetime(2013, 7, 21))
-
-eval_frame_and_python_one_thread = \
-    Benchmark("pd.eval('(df > 0) & (df2 > 0) & (df3 > 0) & (df4 > 0)', engine='python')",
-              setup,
-              name='eval_frame_and_python_one_thread',
-              start_date=datetime(2013, 7, 26))
-
-#--------------------------------------------------------------------
-# chained comp
-eval_frame_chained_cmp_all_threads = \
-    Benchmark("pd.eval('df < df2 < df3 < df4')", common_setup,
-              name='eval_frame_chained_cmp_all_threads',
-              start_date=datetime(2013, 7, 21))
-
-eval_frame_chained_cmp_one_thread = \
-    Benchmark("pd.eval('df < df2 < df3 < df4')", setup,
-              name='eval_frame_chained_cmp_one_thread',
-              start_date=datetime(2013, 7, 26))
-
-eval_frame_chained_cmp_python = \
-    Benchmark("pd.eval('df < df2 < df3 < df4', engine='python')",
-              common_setup, name='eval_frame_chained_cmp_python',
-              start_date=datetime(2013, 7, 26))
-
-eval_frame_chained_cmp_python_one_thread = \
-    Benchmark("pd.eval('df < df2 < df3 < df4', 
engine='python')", setup, - name='eval_frame_chained_cmp_python_one_thread', - start_date=datetime(2013, 7, 26)) - - -common_setup = """from .pandas_vb_common import * -""" - -setup = common_setup + """ -N = 1000000 -halfway = N // 2 - 1 -index = date_range('20010101', periods=N, freq='T') -s = Series(index) -ts = s.iloc[halfway] -""" - -series_setup = setup + """ -df = DataFrame({'dates': s.values}) -""" - -query_datetime_series = Benchmark("df.query('dates < @ts')", - series_setup, - start_date=datetime(2013, 9, 27)) - -index_setup = setup + """ -df = DataFrame({'a': np.random.randn(N)}, index=index) -""" - -query_datetime_index = Benchmark("df.query('index < @ts')", - index_setup, start_date=datetime(2013, 9, 27)) - -setup = setup + """ -N = 1000000 -df = DataFrame({'a': np.random.randn(N)}) -min_val = df['a'].min() -max_val = df['a'].max() -""" - -query_with_boolean_selection = Benchmark("df.query('(a >= @min_val) & (a <= @max_val)')", - setup, start_date=datetime(2013, 9, 27)) - diff --git a/vb_suite/frame_ctor.py b/vb_suite/frame_ctor.py deleted file mode 100644 index 0d57da7b88d3b..0000000000000 --- a/vb_suite/frame_ctor.py +++ /dev/null @@ -1,123 +0,0 @@ -from vbench.benchmark import Benchmark -from datetime import datetime -try: - import pandas.tseries.offsets as offsets -except: - import pandas.core.datetools as offsets - -common_setup = """from .pandas_vb_common import * -try: - from pandas.tseries.offsets import * -except: - from pandas.core.datetools import * -""" - -#---------------------------------------------------------------------- -# Creation from nested dict - -setup = common_setup + """ -N, K = 5000, 50 -index = tm.makeStringIndex(N) -columns = tm.makeStringIndex(K) -frame = DataFrame(np.random.randn(N, K), index=index, columns=columns) - -try: - data = frame.to_dict() -except: - data = frame.toDict() - -some_dict = data.values()[0] -dict_list = [dict(zip(columns, row)) for row in frame.values] -""" - -frame_ctor_nested_dict = Benchmark("DataFrame(data)", setup) - -# From JSON-like stuff -frame_ctor_list_of_dict = Benchmark("DataFrame(dict_list)", setup, - start_date=datetime(2011, 12, 20)) - -series_ctor_from_dict = Benchmark("Series(some_dict)", setup) - -# nested dict, integer indexes, regression described in #621 -setup = common_setup + """ -data = dict((i,dict((j,float(j)) for j in range(100))) for i in xrange(2000)) -""" -frame_ctor_nested_dict_int64 = Benchmark("DataFrame(data)", setup) - -# dynamically generate benchmarks for every offset -# -# get_period_count & get_index_for_offset are there because blindly taking each -# offset times 1000 can easily go out of Timestamp bounds and raise errors. 
-dynamic_benchmarks = {} -n_steps = [1, 2] -offset_kwargs = {'WeekOfMonth': {'weekday': 1, 'week': 1}, - 'LastWeekOfMonth': {'weekday': 1, 'week': 1}, - 'FY5253': {'startingMonth': 1, 'weekday': 1}, - 'FY5253Quarter': {'qtr_with_extra_week': 1, 'startingMonth': 1, 'weekday': 1}} - -offset_extra_cases = {'FY5253': {'variation': ['nearest', 'last']}, - 'FY5253Quarter': {'variation': ['nearest', 'last']}} - -for offset in offsets.__all__: - for n in n_steps: - kwargs = {} - if offset in offset_kwargs: - kwargs = offset_kwargs[offset] - - if offset in offset_extra_cases: - extras = offset_extra_cases[offset] - else: - extras = {'': ['']} - - for extra_arg in extras: - for extra in extras[extra_arg]: - if extra: - kwargs[extra_arg] = extra - setup = common_setup + """ - -def get_period_count(start_date, off): - ten_offsets_in_days = ((start_date + off * 10) - start_date).days - if ten_offsets_in_days == 0: - return 1000 - else: - return min(9 * ((Timestamp.max - start_date).days // - ten_offsets_in_days), - 1000) - -def get_index_for_offset(off): - start_date = Timestamp('1/1/1900') - return date_range(start_date, - periods=min(1000, get_period_count(start_date, off)), - freq=off) - -idx = get_index_for_offset({}({}, **{})) -df = DataFrame(np.random.randn(len(idx),10), index=idx) -d = dict([ (col,df[col]) for col in df.columns ]) -""".format(offset, n, kwargs) - key = 'frame_ctor_dtindex_{}x{}'.format(offset, n) - if extra: - key += '__{}_{}'.format(extra_arg, extra) - dynamic_benchmarks[key] = Benchmark("DataFrame(d)", setup, name=key) - -# Have to stuff them in globals() so vbench detects them -globals().update(dynamic_benchmarks) - -# from a mi-series -setup = common_setup + """ -mi = MultiIndex.from_tuples([(x,y) for x in range(100) for y in range(100)]) -s = Series(randn(10000), index=mi) -""" -frame_from_series = Benchmark("DataFrame(s)", setup) - -#---------------------------------------------------------------------- -# get_numeric_data - -setup = common_setup + """ -df = DataFrame(randn(10000, 25)) -df['foo'] = 'bar' -df['bar'] = 'baz' -df = df.consolidate() -""" - -frame_get_numeric_data = Benchmark('df._get_numeric_data()', setup, - start_date=datetime(2011, 11, 1)) diff --git a/vb_suite/frame_methods.py b/vb_suite/frame_methods.py deleted file mode 100644 index 46343e9c607fd..0000000000000 --- a/vb_suite/frame_methods.py +++ /dev/null @@ -1,525 +0,0 @@ -from vbench.api import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -""" - -#---------------------------------------------------------------------- -# lookup - -setup = common_setup + """ -df = DataFrame(np.random.randn(10000, 8), columns=list('abcdefgh')) -df['foo'] = 'bar' - -row_labels = list(df.index[::10])[:900] -col_labels = list(df.columns) * 100 -row_labels_all = np.array(list(df.index) * len(df.columns), dtype='object') -col_labels_all = np.array(list(df.columns) * len(df.index), dtype='object') -""" - -frame_fancy_lookup = Benchmark('df.lookup(row_labels, col_labels)', setup, - start_date=datetime(2012, 1, 12)) - -frame_fancy_lookup_all = Benchmark('df.lookup(row_labels_all, col_labels_all)', - setup, - start_date=datetime(2012, 1, 12)) - -#---------------------------------------------------------------------- -# fillna in place - -setup = common_setup + """ -df = DataFrame(randn(10000, 100)) -df.values[::2] = np.nan -""" - -frame_fillna_inplace = Benchmark('df.fillna(0, inplace=True)', setup, - start_date=datetime(2012, 4, 4)) - - 
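Editor's aside: the `fillna` benchmark above punches NaNs in through `.values`, which works because a single-dtype float frame is backed by one block whose array `.values` exposes; that view behaviour is an implementation detail of block-based pandas, not a guarantee. A standalone sketch of the same pattern:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(10, 4))
    df.values[::2] = np.nan            # relies on .values being a writable view here
    df.fillna(0, inplace=True)         # mutates df in place and returns None
    assert not df.isnull().values.any()
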
-#----------------------------------------------------------------------
-# reindex both axes
-
-setup = common_setup + """
-df = DataFrame(randn(10000, 10000))
-idx = np.arange(4000, 7000)
-"""
-
-frame_reindex_axis0 = Benchmark('df.reindex(idx)', setup)
-
-frame_reindex_axis1 = Benchmark('df.reindex(columns=idx)', setup)
-
-frame_reindex_both_axes = Benchmark('df.reindex(index=idx, columns=idx)',
-                                    setup, start_date=datetime(2011, 1, 1))
-
-frame_reindex_both_axes_ix = Benchmark('df.ix[idx, idx]', setup,
-                                       start_date=datetime(2011, 1, 1))
-
-#----------------------------------------------------------------------
-# reindex with upcasts
-setup = common_setup + """
-df=DataFrame(dict([(c, {
-    0: randint(0, 2, 1000).astype(np.bool_),
-    1: randint(0, 1000, 1000).astype(np.int16),
-    2: randint(0, 1000, 1000).astype(np.int32),
-    3: randint(0, 1000, 1000).astype(np.int64)
-    }[randint(0, 4)]) for c in range(1000)]))
-"""
-
-frame_reindex_upcast = Benchmark('df.reindex(permutation(range(1200)))', setup)
-
-#----------------------------------------------------------------------
-# boolean indexing
-
-setup = common_setup + """
-df = DataFrame(randn(10000, 100))
-bool_arr = np.zeros(10000, dtype=bool)
-bool_arr[:1000] = True
-"""
-
-frame_boolean_row_select = Benchmark('df[bool_arr]', setup,
-                                     start_date=datetime(2011, 1, 1))
-
-#----------------------------------------------------------------------
-# iteritems (monitor no-copying behaviour)
-
-setup = common_setup + """
-df = DataFrame(randn(10000, 1000))
-df2 = DataFrame(randn(3000,1),columns=['A'])
-df3 = DataFrame(randn(3000,1))
-
-def f():
-    if hasattr(df, '_item_cache'):
-        df._item_cache.clear()
-    for name, col in df.iteritems():
-        pass
-
-def g():
-    for name, col in df.iteritems():
-        pass
-
-def h():
-    for i in range(10000):
-        df2['A']
-
-def j():
-    for i in range(10000):
-        df3[0]
-
-"""
-
-# as far back as the earliest test currently in the suite
-frame_iteritems = Benchmark('f()', setup,
-                            start_date=datetime(2010, 6, 1))
-
-frame_iteritems_cached = Benchmark('g()', setup,
-                                   start_date=datetime(2010, 6, 1))
-
-frame_getitem_single_column = Benchmark('h()', setup,
-                                        start_date=datetime(2010, 6, 1))
-
-frame_getitem_single_column2 = Benchmark('j()', setup,
-                                         start_date=datetime(2010, 6, 1))
-
-#----------------------------------------------------------------------
-# assignment
-
-setup = common_setup + """
-idx = date_range('1/1/2000', periods=100000, freq='D')
-df = DataFrame(randn(100000, 1),columns=['A'],index=idx)
-def f(df):
-    x = df.copy()
-    x['date'] = x.index
-"""
-
-frame_assign_timeseries_index = Benchmark('f(df)', setup,
-                                          start_date=datetime(2013, 10, 1))
-
-
-#----------------------------------------------------------------------
-# to_string
-
-setup = common_setup + """
-df = DataFrame(randn(100, 10))
-"""
-
-frame_to_string_floats = Benchmark('df.to_string()', setup,
-                                   start_date=datetime(2010, 6, 1))
-
-#----------------------------------------------------------------------
-# to_html
-
-setup = common_setup + """
-nrows=500
-df = DataFrame(randn(nrows, 10))
-df[0]=period_range("2000","2010",nrows)
-df[1]=range(nrows)
-
-"""
-
-frame_to_html_mixed = Benchmark('df.to_html()', setup,
-                                start_date=datetime(2011, 11, 18))
-
-
-# truncated repr_html, MultiIndex
-
-setup = common_setup + """
-nrows=10000
-data=randn(nrows,10)
-idx=MultiIndex.from_arrays(np.tile(randn(3,nrows/100),100))
-df=DataFrame(data,index=idx)
-
-"""
-
-frame_html_repr_trunc_mi = Benchmark('df._repr_html_()', setup,
-                                     start_date=datetime(2013, 11, 25))
-
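Editor's aside on the two `_repr_html_` benchmarks around this point: Jupyter calls `_repr_html_()` to render a frame, and pandas truncates that output to the display options rather than serialising every row, which is why timing it on a 10,000-row frame is worthwhile. A standalone sketch (assumes the usual `display.max_rows` default of 60):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(10000, 10))
    html = df._repr_html_()            # what the notebook displays
    print(len(html))                   # far smaller than a full 10,000-row table
    print(pd.get_option('display.max_rows'))
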
-# truncated repr_html, single index
-
-setup = common_setup + """
-nrows=10000
-data=randn(nrows,10)
-idx=randn(nrows)
-df=DataFrame(data,index=idx)
-
-"""
-
-frame_html_repr_trunc_si = Benchmark('df._repr_html_()', setup,
-                                     start_date=datetime(2013, 11, 25))
-
-
-# insert many columns
-
-setup = common_setup + """
-N = 1000
-
-def f(K=500):
-    df = DataFrame(index=range(N))
-    new_col = np.random.randn(N)
-    for i in range(K):
-        df[i] = new_col
-"""
-
-frame_insert_500_columns_end = Benchmark('f()', setup, start_date=datetime(2011, 1, 1))
-
-setup = common_setup + """
-N = 1000
-
-def f(K=100):
-    df = DataFrame(index=range(N))
-    new_col = np.random.randn(N)
-    for i in range(K):
-        df.insert(0,i,new_col)
-"""
-
-frame_insert_100_columns_begin = Benchmark('f()', setup, start_date=datetime(2011, 1, 1))
-
-#----------------------------------------------------------------------
-# strings methods, #2602
-
-setup = common_setup + """
-s = Series(['abcdefg', np.nan]*500000)
-"""
-
-series_string_vector_slice = Benchmark('s.str[:5]', setup,
-                                       start_date=datetime(2012, 8, 1))
-
-#----------------------------------------------------------------------
-# df.info() and get_dtype_counts() # 2807
-
-setup = common_setup + """
-df = pandas.DataFrame(np.random.randn(10,10000))
-"""
-
-frame_get_dtype_counts = Benchmark('df.get_dtype_counts()', setup,
-                                   start_date=datetime(2012, 8, 1))
-
-##
-setup = common_setup + """
-df = pandas.DataFrame(np.random.randn(10,10000))
-"""
-
-frame_repr_wide = Benchmark('repr(df)', setup,
-                            start_date=datetime(2012, 8, 1))
-
-##
-setup = common_setup + """
-df = pandas.DataFrame(np.random.randn(10000, 10))
-"""
-
-frame_repr_tall = Benchmark('repr(df)', setup,
-                            start_date=datetime(2012, 8, 1))
-
-##
-setup = common_setup + """
-df = DataFrame(randn(100000, 1))
-"""
-
-frame_xs_row = Benchmark('df.xs(50000)', setup)
-
-##
-setup = common_setup + """
-df = DataFrame(randn(1,100000))
-"""
-
-frame_xs_col = Benchmark('df.xs(50000,axis = 1)', setup)
-
-#----------------------------------------------------------------------
-# nulls/masking
-
-## masking
-setup = common_setup + """
-data = np.random.randn(1000, 500)
-df = DataFrame(data)
-df = df.where(df > 0) # create nans
-bools = df > 0
-mask = isnull(df)
-"""
-
-frame_mask_bools = Benchmark('bools.mask(mask)', setup,
-                             start_date=datetime(2013,1,1))
-
-frame_mask_floats = Benchmark('bools.astype(float).mask(mask)', setup,
-                              start_date=datetime(2013,1,1))
-
-## isnull
-setup = common_setup + """
-data = np.random.randn(1000, 1000)
-df = DataFrame(data)
-"""
-frame_isnull = Benchmark('isnull(df)', setup,
-                         start_date=datetime(2012,1,1))
-
-## dropna
-dropna_setup = common_setup + """
-data = np.random.randn(10000, 1000)
-df = DataFrame(data)
-df.ix[50:1000,20:50] = np.nan
-df.ix[2000:3000] = np.nan
-df.ix[:,60:70] = np.nan
-"""
-frame_dropna_axis0_any = Benchmark('df.dropna(how="any",axis=0)', dropna_setup,
-                                   start_date=datetime(2012,1,1))
-frame_dropna_axis0_all = Benchmark('df.dropna(how="all",axis=0)', dropna_setup,
-                                   start_date=datetime(2012,1,1))
-
-frame_dropna_axis1_any = Benchmark('df.dropna(how="any",axis=1)', dropna_setup,
-                                   start_date=datetime(2012,1,1))
-
-frame_dropna_axis1_all = Benchmark('df.dropna(how="all",axis=1)', dropna_setup,
-                                   start_date=datetime(2012,1,1))
-
-# dropna on mixed dtypes
-dropna_mixed_setup = common_setup + """
-data = np.random.randn(10000, 1000)
-df = DataFrame(data)
-df.ix[50:1000,20:50] = np.nan
-df.ix[2000:3000] = np.nan
-df.ix[:,60:70] = np.nan
-df['foo'] = 'bar'
-"""
-frame_dropna_axis0_any_mixed_dtypes = Benchmark('df.dropna(how="any",axis=0)', dropna_mixed_setup,
-                                                start_date=datetime(2012,1,1))
-frame_dropna_axis0_all_mixed_dtypes = Benchmark('df.dropna(how="all",axis=0)', dropna_mixed_setup,
-                                                start_date=datetime(2012,1,1))
-
-frame_dropna_axis1_any_mixed_dtypes = Benchmark('df.dropna(how="any",axis=1)', dropna_mixed_setup,
-                                                start_date=datetime(2012,1,1))
-
-frame_dropna_axis1_all_mixed_dtypes = Benchmark('df.dropna(how="all",axis=1)', dropna_mixed_setup,
-                                                start_date=datetime(2012,1,1))
-
-## count with level on a MultiIndex
-dropna_setup = common_setup + """
-data = np.random.randn(10000, 1000)
-df = DataFrame(data)
-df.ix[50:1000,20:50] = np.nan
-df.ix[2000:3000] = np.nan
-df.ix[:,60:70] = np.nan
-df.index = MultiIndex.from_tuples(df.index.map(lambda x: (x, x)))
-df.columns = MultiIndex.from_tuples(df.columns.map(lambda x: (x, x)))
-"""
-frame_count_level_axis0_multi = Benchmark('df.count(axis=0, level=1)', dropna_setup,
-                                          start_date=datetime(2012,1,1))
-
-frame_count_level_axis1_multi = Benchmark('df.count(axis=1, level=1)', dropna_setup,
-                                          start_date=datetime(2012,1,1))
-
-# count with level on mixed dtypes
-dropna_mixed_setup = common_setup + """
-data = np.random.randn(10000, 1000)
-df = DataFrame(data)
-df.ix[50:1000,20:50] = np.nan
-df.ix[2000:3000] = np.nan
-df.ix[:,60:70] = np.nan
-df['foo'] = 'bar'
-df.index = MultiIndex.from_tuples(df.index.map(lambda x: (x, x)))
-df.columns = MultiIndex.from_tuples(df.columns.map(lambda x: (x, x)))
-"""
-frame_count_level_axis0_mixed_dtypes_multi = Benchmark('df.count(axis=0, level=1)', dropna_mixed_setup,
-                                                       start_date=datetime(2012,1,1))
-
-frame_count_level_axis1_mixed_dtypes_multi = Benchmark('df.count(axis=1, level=1)', dropna_mixed_setup,
-                                                       start_date=datetime(2012,1,1))
-
-#----------------------------------------------------------------------
-# apply
-
-setup = common_setup + """
-s = Series(np.arange(1028.))
-df = DataFrame({ i:s for i in range(1028) })
-"""
-frame_apply_user_func = Benchmark('df.apply(lambda x: np.corrcoef(x,s)[0,1])', setup,
-                                  name='frame_apply_user_func',
-                                  start_date=datetime(2012,1,1))
-
-setup = common_setup + """
-df = DataFrame(np.random.randn(1000,100))
-"""
-frame_apply_lambda_mean = Benchmark('df.apply(lambda x: x.mean())', setup,
-                                    name='frame_apply_lambda_mean',
-                                    start_date=datetime(2012,1,1))
-setup = common_setup + """
-df = DataFrame(np.random.randn(1000,100))
-"""
-frame_apply_np_mean = Benchmark('df.apply(np.mean)', setup,
-                                name='frame_apply_np_mean',
-                                start_date=datetime(2012,1,1))
-
-setup = common_setup + """
-df = DataFrame(np.random.randn(1000,100))
-"""
-frame_apply_pass_thru = Benchmark('df.apply(lambda x: x)', setup,
-                                  name='frame_apply_pass_thru',
-                                  start_date=datetime(2012,1,1))
-
-setup = common_setup + """
-df = DataFrame(np.random.randn(1000,100))
-"""
-frame_apply_axis_1 = Benchmark('df.apply(lambda x: x+1,axis=1)', setup,
-                               name='frame_apply_axis_1',
-                               start_date=datetime(2012,1,1))
-
-setup = common_setup + """
-df = DataFrame(np.random.randn(1000,3),columns=list('ABC'))
-"""
-frame_apply_ref_by_name = Benchmark('df.apply(lambda x: x["A"] + x["B"],axis=1)', setup,
-                                    name='frame_apply_ref_by_name',
-                                    start_date=datetime(2012,1,1))
-
-#----------------------------------------------------------------------
-# dtypes
-
-setup = common_setup + """
-df = DataFrame(np.random.randn(1000,1000))
-"""
-frame_dtypes = Benchmark('df.dtypes', setup,
-                         start_date=datetime(2012,1,1))
-
-#----------------------------------------------------------------------
-# equals
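Editor's aside before the `equals` benchmarks that follow: `DataFrame.equals` exists because element-wise `==` cannot answer "are these frames identical?" once NaNs are present. A minimal standalone illustration:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a': [1.0, np.nan]})
    df2 = df.copy()
    print(df.equals(df2))              # True: NaNs in the same slots count as equal
    print((df == df2).all().all())     # False: element-wise ==, NaN != NaN
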
-setup = common_setup + """ -def make_pair(frame): - df = frame - df2 = df.copy() - df2.ix[-1,-1] = np.nan - return df, df2 - -def test_equal(name): - df, df2 = pairs[name] - return df.equals(df) - -def test_unequal(name): - df, df2 = pairs[name] - return df.equals(df2) - -float_df = DataFrame(np.random.randn(1000, 1000)) -object_df = DataFrame([['foo']*1000]*1000) -nonunique_cols = object_df.copy() -nonunique_cols.columns = ['A']*len(nonunique_cols.columns) - -pairs = dict([(name, make_pair(frame)) - for name, frame in (('float_df', float_df), ('object_df', object_df), ('nonunique_cols', nonunique_cols))]) -""" -frame_float_equal = Benchmark('test_equal("float_df")', setup) -frame_object_equal = Benchmark('test_equal("object_df")', setup) -frame_nonunique_equal = Benchmark('test_equal("nonunique_cols")', setup) - -frame_float_unequal = Benchmark('test_unequal("float_df")', setup) -frame_object_unequal = Benchmark('test_unequal("object_df")', setup) -frame_nonunique_unequal = Benchmark('test_unequal("nonunique_cols")', setup) - -#----------------------------------------------------------------------------- -# interpolate -# this is the worst case, where every column has NaNs. -setup = common_setup + """ -df = DataFrame(randn(10000, 100)) -df.values[::2] = np.nan -""" - -frame_interpolate = Benchmark('df.interpolate()', setup, - start_date=datetime(2014, 2, 7)) - -setup = common_setup + """ -df = DataFrame({'A': np.arange(0, 10000), - 'B': np.random.randint(0, 100, 10000), - 'C': randn(10000), - 'D': randn(10000)}) -df.loc[1::5, 'A'] = np.nan -df.loc[1::5, 'C'] = np.nan -""" - -frame_interpolate_some_good = Benchmark('df.interpolate()', setup, - start_date=datetime(2014, 2, 7)) -frame_interpolate_some_good_infer = Benchmark('df.interpolate(downcast="infer")', - setup, - start_date=datetime(2014, 2, 7)) - - -#------------------------------------------------------------------------- -# frame shift speedup issue-5609 - -setup = common_setup + """ -df = DataFrame(np.random.rand(10000,500)) -# note: df._data.blocks are f_contigous -""" -frame_shift_axis0 = Benchmark('df.shift(1,axis=0)', setup, - start_date=datetime(2014,1,1)) -frame_shift_axis1 = Benchmark('df.shift(1,axis=1)', setup, - name = 'frame_shift_axis_1', - start_date=datetime(2014,1,1)) - - -#----------------------------------------------------------------------------- -# from_records issue-6700 - -setup = common_setup + """ -def get_data(n=100000): - return ((x, x*20, x*100) for x in range(n)) -""" - -frame_from_records_generator = Benchmark('df = DataFrame.from_records(get_data())', - setup, - name='frame_from_records_generator', - start_date=datetime(2013,10,4)) # issue-4911 - -frame_from_records_generator_nrows = Benchmark('df = DataFrame.from_records(get_data(), nrows=1000)', - setup, - name='frame_from_records_generator_nrows', - start_date=datetime(2013,10,04)) # issue-4911 - -#----------------------------------------------------------------------------- -# duplicated - -setup = common_setup + ''' -n = 1 << 20 - -t = date_range('2015-01-01', freq='S', periods=n // 64) -xs = np.random.randn(n // 64).round(2) - -df = DataFrame({'a':np.random.randint(- 1 << 8, 1 << 8, n), - 'b':np.random.choice(t, n), - 'c':np.random.choice(xs, n)}) -''' - -frame_duplicated = Benchmark('df.duplicated()', setup, - name='frame_duplicated') diff --git a/vb_suite/generate_rst_files.py b/vb_suite/generate_rst_files.py deleted file mode 100644 index 92e7cd4d59b71..0000000000000 --- a/vb_suite/generate_rst_files.py +++ /dev/null @@ -1,2 +0,0 @@ -from 
suite import benchmarks, generate_rst_files
-generate_rst_files(benchmarks)
diff --git a/vb_suite/gil.py b/vb_suite/gil.py
deleted file mode 100644
index df2bd2dcd8db4..0000000000000
--- a/vb_suite/gil.py
+++ /dev/null
@@ -1,110 +0,0 @@
-from vbench.api import Benchmark
-from datetime import datetime
-
-common_setup = """from .pandas_vb_common import *
-"""
-
-basic = common_setup + """
-try:
-    from pandas.util.testing import test_parallel
-    have_real_test_parallel = True
-except ImportError:
-    have_real_test_parallel = False
-    def test_parallel(num_threads=1):
-        def wrapper(fname):
-            return fname
-
-        return wrapper
-
-N = 1000000
-ngroups = 1000
-np.random.seed(1234)
-
-df = DataFrame({'key' : np.random.randint(0,ngroups,size=N),
-                'data' : np.random.randn(N) })
-
-if not have_real_test_parallel:
-    raise NotImplementedError
-"""
-
-setup = basic + """
-
-def f():
-    df.groupby('key')['data'].sum()
-
-# run consecutively
-def g2():
-    for i in range(2):
-        f()
-def g4():
-    for i in range(4):
-        f()
-def g8():
-    for i in range(8):
-        f()
-
-# run in parallel
-@test_parallel(num_threads=2)
-def pg2():
-    f()
-
-@test_parallel(num_threads=4)
-def pg4():
-    f()
-
-@test_parallel(num_threads=8)
-def pg8():
-    f()
-
-"""
-
-nogil_groupby_sum_4 = Benchmark(
-    'pg4()', setup,
-    start_date=datetime(2015, 1, 1))
-
-nogil_groupby_sum_8 = Benchmark(
-    'pg8()', setup,
-    start_date=datetime(2015, 1, 1))
-
-
-#### test all groupby funcs ####
-
-setup = basic + """
-
-@test_parallel(num_threads=2)
-def pg2():
-    df.groupby('key')['data'].func()
-
-"""
-
-for f in ['sum','prod','var','count','min','max','mean','last']:
-
-    name = "nogil_groupby_{f}_2".format(f=f)
-    bmark = Benchmark('pg2()', setup.replace('func',f), start_date=datetime(2015, 1, 1))
-    bmark.name = name
-    globals()[name] = bmark
-
-del bmark
-
-
-#### test take_1d ####
-setup = basic + """
-from pandas.core import common as com
-
-N = 1e7
-df = DataFrame({'int64' : np.arange(N,dtype='int64'),
-                'float64' : np.arange(N,dtype='float64')})
-indexer = np.arange(100,len(df)-100)
-
-@test_parallel(num_threads=2)
-def take_1d_pg2_int64():
-    com.take_1d(df.int64.values,indexer)
-
-@test_parallel(num_threads=2)
-def take_1d_pg2_float64():
-    com.take_1d(df.float64.values,indexer)
-
-"""
-
-nogil_take1d_float64 = Benchmark('take_1d_pg2_float64()', setup, start_date=datetime(2015, 1, 1))
-nogil_take1d_int64 = Benchmark('take_1d_pg2_int64()', setup, start_date=datetime(2015, 1, 1))
diff --git a/vb_suite/groupby.py b/vb_suite/groupby.py
deleted file mode 100644
index 268d71f864823..0000000000000
--- a/vb_suite/groupby.py
+++ /dev/null
@@ -1,620 +0,0 @@
-from vbench.api import Benchmark
-from datetime import datetime
-
-common_setup = """from .pandas_vb_common import *
-"""
-
-setup = common_setup + """
-N = 100000
-ngroups = 100
-
-def get_test_data(ngroups=100, n=100000):
-    unique_groups = range(ngroups)
-    arr = np.asarray(np.tile(unique_groups, n / ngroups), dtype=object)
-
-    if len(arr) < n:
-        arr = np.asarray(list(arr) + unique_groups[:n - len(arr)],
-                         dtype=object)
-
-    random.shuffle(arr)
-    return arr
-
-# aggregate multiple columns
-df = DataFrame({'key1' : get_test_data(ngroups=ngroups),
-                'key2' : get_test_data(ngroups=ngroups),
-                'data1' : np.random.randn(N),
-                'data2' : np.random.randn(N)})
-def f():
-    df.groupby(['key1', 'key2']).agg(lambda x: x.values.sum())
-
-simple_series = Series(np.random.randn(N))
-key1 = df['key1']
-"""
-
-stmt1 = "df.groupby(['key1', 'key2'])['data1'].agg(lambda x: x.values.sum())"
-groupby_multi_python = Benchmark(stmt1, 
setup,
-                                 start_date=datetime(2011, 7, 1))
-
-stmt3 = "df.groupby(['key1', 'key2']).sum()"
-groupby_multi_cython = Benchmark(stmt3, setup,
-                                 start_date=datetime(2011, 7, 1))
-
-stmt = "df.groupby(['key1', 'key2'])['data1'].agg(np.std)"
-groupby_multi_series_op = Benchmark(stmt, setup,
-                                    start_date=datetime(2011, 8, 1))
-
-groupby_series_simple_cython = \
-    Benchmark('simple_series.groupby(key1).sum()', setup,
-              start_date=datetime(2011, 3, 1))
-
-
-stmt4 = "df.groupby('key1').rank(pct=True)"
-groupby_series_simple_rank = Benchmark(stmt4, setup,
-                                       start_date=datetime(2014, 1, 16))
-
-#----------------------------------------------------------------------
-# 2d grouping, aggregate many columns
-
-setup = common_setup + """
-labels = np.random.randint(0, 100, size=1000)
-df = DataFrame(randn(1000, 1000))
-"""
-
-groupby_frame_cython_many_columns = Benchmark(
-    'df.groupby(labels).sum()', setup,
-    start_date=datetime(2011, 8, 1),
-    logy=True)
-
-#----------------------------------------------------------------------
-# single key, long, integer key
-
-setup = common_setup + """
-data = np.random.randn(100000, 1)
-labels = np.random.randint(0, 1000, size=100000)
-df = DataFrame(data)
-"""
-
-groupby_frame_singlekey_integer = \
-    Benchmark('df.groupby(labels).sum()', setup,
-              start_date=datetime(2011, 8, 1), logy=True)
-
-#----------------------------------------------------------------------
-# group with different functions per column
-
-setup = common_setup + """
-fac1 = np.array(['A', 'B', 'C'], dtype='O')
-fac2 = np.array(['one', 'two'], dtype='O')
-
-df = DataFrame({'key1': fac1.take(np.random.randint(0, 3, size=100000)),
-                'key2': fac2.take(np.random.randint(0, 2, size=100000)),
-                'value1' : np.random.randn(100000),
-                'value2' : np.random.randn(100000),
-                'value3' : np.random.randn(100000)})
-"""
-
-groupby_multi_different_functions = \
-    Benchmark("""df.groupby(['key1', 'key2']).agg({'value1' : 'mean',
-                                                   'value2' : 'var',
-                                                   'value3' : 'sum'})""",
-              setup, start_date=datetime(2011, 9, 1))
-
-groupby_multi_different_numpy_functions = \
-    Benchmark("""df.groupby(['key1', 'key2']).agg({'value1' : np.mean,
-                                                   'value2' : np.var,
-                                                   'value3' : np.sum})""",
-              setup, start_date=datetime(2011, 9, 1))
-
-#----------------------------------------------------------------------
-# size() speed
-
-setup = common_setup + """
-n = 100000
-offsets = np.random.randint(n, size=n).astype('timedelta64[ns]')
-dates = np.datetime64('now') + offsets
-df = DataFrame({'key1': np.random.randint(0, 500, size=n),
-                'key2': np.random.randint(0, 100, size=n),
-                'value1' : np.random.randn(n),
-                'value2' : np.random.randn(n),
-                'value3' : np.random.randn(n),
-                'dates' : dates})
-"""
-
-groupby_multi_size = Benchmark("df.groupby(['key1', 'key2']).size()",
-                               setup, start_date=datetime(2011, 10, 1))
-
-groupby_dt_size = Benchmark("df.groupby(['dates']).size()",
-                            setup, start_date=datetime(2011, 10, 1))
-
-groupby_dt_timegrouper_size = Benchmark("df.groupby(TimeGrouper(key='dates', freq='M')).size()",
-                                        setup, start_date=datetime(2011, 10, 1))
-
-#----------------------------------------------------------------------
-# count() speed
-
-setup = common_setup + """
-n = 10000
-offsets = np.random.randint(n, size=n).astype('timedelta64[ns]')
-
-dates = np.datetime64('now') + offsets
-dates[np.random.rand(n) > 0.5] = np.datetime64('nat')
-
-offsets[np.random.rand(n) > 0.5] = np.timedelta64('nat')
-
-value2 = np.random.randn(n)
-value2[np.random.rand(n) > 0.5] = np.nan
-
-obj = np.random.choice(list('ab'), size=n).astype(object)
-obj[np.random.randn(n) > 0.5] = np.nan - -df = DataFrame({'key1': np.random.randint(0, 500, size=n), - 'key2': np.random.randint(0, 100, size=n), - 'dates': dates, - 'value2' : value2, - 'value3' : np.random.randn(n), - 'ints': np.random.randint(0, 1000, size=n), - 'obj': obj, - 'offsets': offsets}) -""" - -groupby_multi_count = Benchmark("df.groupby(['key1', 'key2']).count()", - setup, name='groupby_multi_count', - start_date=datetime(2014, 5, 5)) - -setup = common_setup + """ -n = 10000 - -df = DataFrame({'key1': randint(0, 500, size=n), - 'key2': randint(0, 100, size=n), - 'ints': randint(0, 1000, size=n), - 'ints2': randint(0, 1000, size=n)}) -""" - -groupby_int_count = Benchmark("df.groupby(['key1', 'key2']).count()", - setup, name='groupby_int_count', - start_date=datetime(2014, 5, 6)) -#---------------------------------------------------------------------- -# Series.value_counts - -setup = common_setup + """ -s = Series(np.random.randint(0, 1000, size=100000)) -""" - -series_value_counts_int64 = Benchmark('s.value_counts()', setup, - start_date=datetime(2011, 10, 21)) - -# value_counts on lots of strings - -setup = common_setup + """ -K = 1000 -N = 100000 -uniques = tm.makeStringIndex(K).values -s = Series(np.tile(uniques, N // K)) -""" - -series_value_counts_strings = Benchmark('s.value_counts()', setup, - start_date=datetime(2011, 10, 21)) - -#value_counts on float dtype - -setup = common_setup + """ -s = Series(np.random.randint(0, 1000, size=100000)).astype(float) -""" - -series_value_counts_float64 = Benchmark('s.value_counts()', setup, - start_date=datetime(2015, 8, 17)) - -#---------------------------------------------------------------------- -# pivot_table - -setup = common_setup + """ -fac1 = np.array(['A', 'B', 'C'], dtype='O') -fac2 = np.array(['one', 'two'], dtype='O') - -ind1 = np.random.randint(0, 3, size=100000) -ind2 = np.random.randint(0, 2, size=100000) - -df = DataFrame({'key1': fac1.take(ind1), -'key2': fac2.take(ind2), -'key3': fac2.take(ind2), -'value1' : np.random.randn(100000), -'value2' : np.random.randn(100000), -'value3' : np.random.randn(100000)}) -""" - -stmt = "df.pivot_table(index='key1', columns=['key2', 'key3'])" -groupby_pivot_table = Benchmark(stmt, setup, start_date=datetime(2011, 12, 15)) - - -#---------------------------------------------------------------------- -# dict return values - -setup = common_setup + """ -labels = np.arange(1000).repeat(10) -data = Series(randn(len(labels))) -f = lambda x: {'first': x.values[0], 'last': x.values[-1]} -""" - -groupby_apply_dict_return = Benchmark('data.groupby(labels).apply(f)', - setup, start_date=datetime(2011, 12, 15)) - -#---------------------------------------------------------------------- -# First / last functions - -setup = common_setup + """ -labels = np.arange(10000).repeat(10) -data = Series(randn(len(labels))) -data[::3] = np.nan -data[1::3] = np.nan -data2 = Series(randn(len(labels)),dtype='float32') -data2[::3] = np.nan -data2[1::3] = np.nan -labels = labels.take(np.random.permutation(len(labels))) -""" - -groupby_first_float64 = Benchmark('data.groupby(labels).first()', setup, - start_date=datetime(2012, 5, 1)) - -groupby_first_float32 = Benchmark('data2.groupby(labels).first()', setup, - start_date=datetime(2013, 1, 1)) - -groupby_last_float64 = Benchmark('data.groupby(labels).last()', setup, - start_date=datetime(2012, 5, 1)) - -groupby_last_float32 = Benchmark('data2.groupby(labels).last()', setup, - start_date=datetime(2013, 1, 1)) - -groupby_nth_float64_none = 
Benchmark('data.groupby(labels).nth(0)', setup, - start_date=datetime(2012, 5, 1)) -groupby_nth_float32_none = Benchmark('data2.groupby(labels).nth(0)', setup, - start_date=datetime(2013, 1, 1)) -groupby_nth_float64_any = Benchmark('data.groupby(labels).nth(0,dropna="all")', setup, - start_date=datetime(2012, 5, 1)) -groupby_nth_float32_any = Benchmark('data2.groupby(labels).nth(0,dropna="all")', setup, - start_date=datetime(2013, 1, 1)) - -# with datetimes (GH7555) -setup = common_setup + """ -df = DataFrame({'a' : date_range('1/1/2011',periods=100000,freq='s'),'b' : range(100000)}) -""" - -groupby_first_datetimes = Benchmark('df.groupby("b").first()', setup, - start_date=datetime(2013, 5, 1)) -groupby_last_datetimes = Benchmark('df.groupby("b").last()', setup, - start_date=datetime(2013, 5, 1)) -groupby_nth_datetimes_none = Benchmark('df.groupby("b").nth(0)', setup, - start_date=datetime(2013, 5, 1)) -groupby_nth_datetimes_any = Benchmark('df.groupby("b").nth(0,dropna="all")', setup, - start_date=datetime(2013, 5, 1)) - -# with object -setup = common_setup + """ -df = DataFrame({'a' : ['foo']*100000,'b' : range(100000)}) -""" - -groupby_first_object = Benchmark('df.groupby("b").first()', setup, - start_date=datetime(2013, 5, 1)) -groupby_last_object = Benchmark('df.groupby("b").last()', setup, - start_date=datetime(2013, 5, 1)) -groupby_nth_object_none = Benchmark('df.groupby("b").nth(0)', setup, - start_date=datetime(2013, 5, 1)) -groupby_nth_object_any = Benchmark('df.groupby("b").nth(0,dropna="any")', setup, - start_date=datetime(2013, 5, 1)) - -#---------------------------------------------------------------------- -# groupby_indices replacement, chop up Series - -setup = common_setup + """ -try: - rng = date_range('1/1/2000', '12/31/2005', freq='H') - year, month, day = rng.year, rng.month, rng.day -except: - rng = date_range('1/1/2000', '12/31/2000', offset=datetools.Hour()) - year = rng.map(lambda x: x.year) - month = rng.map(lambda x: x.month) - day = rng.map(lambda x: x.day) - -ts = Series(np.random.randn(len(rng)), index=rng) -""" - -groupby_indices = Benchmark('len(ts.groupby([year, month, day]))', - setup, start_date=datetime(2012, 1, 1)) - -#---------------------------------------------------------------------- -# median - -#---------------------------------------------------------------------- -# single key, long, integer key - -setup = common_setup + """ -data = np.random.randn(100000, 2) -labels = np.random.randint(0, 1000, size=100000) -df = DataFrame(data) -""" - -groupby_frame_median = \ - Benchmark('df.groupby(labels).median()', setup, - start_date=datetime(2011, 8, 1), logy=True) - - -setup = common_setup + """ -data = np.random.randn(1000000, 2) -labels = np.random.randint(0, 1000, size=1000000) -df = DataFrame(data) -""" - -groupby_simple_compress_timing = \ - Benchmark('df.groupby(labels).mean()', setup, - start_date=datetime(2011, 8, 1)) - - -#---------------------------------------------------------------------- -# DataFrame Apply overhead - -setup = common_setup + """ -N = 10000 -labels = np.random.randint(0, 2000, size=N) -labels2 = np.random.randint(0, 3, size=N) -df = DataFrame({'key': labels, -'key2': labels2, -'value1': randn(N), -'value2': ['foo', 'bar', 'baz', 'qux'] * (N / 4)}) -def f(g): - return 1 -""" - -groupby_frame_apply_overhead = Benchmark("df.groupby('key').apply(f)", setup, - start_date=datetime(2011, 10, 1)) - -groupby_frame_apply = Benchmark("df.groupby(['key', 'key2']).apply(f)", setup, - start_date=datetime(2011, 10, 1)) - - 
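Editor's aside on the apply-overhead benchmarks just above: `groupby(...).apply(f)` drops into Python once per group, so with roughly 2000 groups even a constant function is dominated by per-group overhead, which is exactly what those two benchmarks isolate. A standalone sketch:

    import numpy as np
    import pandas as pd

    n = 10000
    df = pd.DataFrame({'key': np.random.randint(0, 2000, size=n),
                       'value': np.random.randn(n)})
    out = df.groupby('key').apply(lambda g: 1)   # trivial work, ~2000 Python calls
    print(out.head())
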
-#---------------------------------------------------------------------- -# DataFrame nth - -setup = common_setup + """ -df = DataFrame(np.random.randint(1, 100, (10000, 2))) -""" - -# Not really a fair test as behaviour has changed! -groupby_frame_nth_none = Benchmark("df.groupby(0).nth(0)", setup, - start_date=datetime(2014, 3, 1)) - -groupby_series_nth_none = Benchmark("df[1].groupby(df[0]).nth(0)", setup, - start_date=datetime(2014, 3, 1)) -groupby_frame_nth_any= Benchmark("df.groupby(0).nth(0,dropna='any')", setup, - start_date=datetime(2014, 3, 1)) - -groupby_series_nth_any = Benchmark("df[1].groupby(df[0]).nth(0,dropna='any')", setup, - start_date=datetime(2014, 3, 1)) - - -#---------------------------------------------------------------------- -# Sum booleans #2692 - -setup = common_setup + """ -N = 500 -df = DataFrame({'ii':range(N),'bb':[True for x in range(N)]}) -""" - -groupby_sum_booleans = Benchmark("df.groupby('ii').sum()", setup) - - -#---------------------------------------------------------------------- -# multi-indexed group sum #9049 - -setup = common_setup + """ -N = 50 -df = DataFrame({'A': range(N) * 2, 'B': range(N*2), 'C': 1}).set_index(["A", "B"]) -""" - -groupby_sum_multiindex = Benchmark("df.groupby(level=[0, 1]).sum()", setup) - - -#---------------------------------------------------------------------- -# Transform testing - -setup = common_setup + """ -n_dates = 400 -n_securities = 250 -n_columns = 3 -share_na = 0.1 - -dates = date_range('1997-12-31', periods=n_dates, freq='B') -dates = Index(map(lambda x: x.year * 10000 + x.month * 100 + x.day, dates)) - -secid_min = int('10000000', 16) -secid_max = int('F0000000', 16) -step = (secid_max - secid_min) // (n_securities - 1) -security_ids = map(lambda x: hex(x)[2:10].upper(), range(secid_min, secid_max + 1, step)) - -data_index = MultiIndex(levels=[dates.values, security_ids], - labels=[[i for i in range(n_dates) for _ in xrange(n_securities)], range(n_securities) * n_dates], - names=['date', 'security_id']) -n_data = len(data_index) - -columns = Index(['factor{}'.format(i) for i in range(1, n_columns + 1)]) - -data = DataFrame(np.random.randn(n_data, n_columns), index=data_index, columns=columns) - -step = int(n_data * share_na) -for column_index in range(n_columns): - index = column_index - while index < n_data: - data.set_value(data_index[index], columns[column_index], np.nan) - index += step - -f_fillna = lambda x: x.fillna(method='pad') -""" - -groupby_transform = Benchmark("data.groupby(level='security_id').transform(f_fillna)", setup) -groupby_transform_ufunc = Benchmark("data.groupby(level='date').transform(np.max)", setup) - -setup = common_setup + """ -np.random.seed(0) - -N = 120000 -N_TRANSITIONS = 1400 - -# generate groups -transition_points = np.random.permutation(np.arange(N))[:N_TRANSITIONS] -transition_points.sort() -transitions = np.zeros((N,), dtype=np.bool) -transitions[transition_points] = True -g = transitions.cumsum() - -df = DataFrame({ 'signal' : np.random.rand(N)}) -""" -groupby_transform_series = Benchmark("df['signal'].groupby(g).transform(np.mean)", setup) - -setup = common_setup + """ -np.random.seed(0) - -df=DataFrame( { 'id' : np.arange( 100000 ) / 3, - 'val': np.random.randn( 100000) } ) -""" - -groupby_transform_series2 = Benchmark("df.groupby('id')['val'].transform(np.mean)", setup) - -setup = common_setup + ''' -np.random.seed(2718281) -n = 20000 -df = DataFrame(np.random.randint(1, n, (n, 3)), - columns=['jim', 'joe', 'jolie']) -''' - -stmt = "df.groupby(['jim', 
'joe'])['jolie'].transform('max')"
-groupby_transform_multi_key1 = Benchmark(stmt, setup)
-groupby_transform_multi_key2 = Benchmark(stmt, setup + "df['jim'] = df['joe']")
-
-setup = common_setup + '''
-np.random.seed(2718281)
-n = 200000
-df = DataFrame(np.random.randint(1, n / 10, (n, 3)),
-               columns=['jim', 'joe', 'jolie'])
-'''
-groupby_transform_multi_key3 = Benchmark(stmt, setup)
-groupby_transform_multi_key4 = Benchmark(stmt, setup + "df['jim'] = df['joe']")
-
-setup = common_setup + '''
-np.random.seed(27182)
-n = 100000
-df = DataFrame(np.random.randint(1, n / 100, (n, 3)),
-               columns=['jim', 'joe', 'jolie'])
-'''
-
-groupby_agg_builtins1 = Benchmark("df.groupby('jim').agg([sum, min, max])", setup)
-groupby_agg_builtins2 = Benchmark("df.groupby(['jim', 'joe']).agg([sum, min, max])", setup)
-
-
-setup = common_setup + '''
-arr = np.random.randint(- 1 << 12, 1 << 12, (1 << 17, 5))
-i = np.random.choice(len(arr), len(arr) * 5)
-arr = np.vstack((arr, arr[i]))  # add some duplicate rows
-
-i = np.random.permutation(len(arr))
-arr = arr[i]  # shuffle rows
-
-df = DataFrame(arr, columns=list('abcde'))
-df['jim'], df['joe'] = np.random.randn(2, len(df)) * 10
-'''
-
-groupby_int64_overflow = Benchmark("df.groupby(list('abcde')).max()", setup,
-                                   name='groupby_int64_overflow')
-
-
-setup = common_setup + '''
-from itertools import product
-from string import ascii_letters, digits
-
-n = 5 * 7 * 11 * (1 << 9)
-alpha = list(map(''.join, product(ascii_letters + digits, repeat=4)))
-f = lambda k: np.repeat(np.random.choice(alpha, n // k), k)
-
-df = DataFrame({'a': f(11), 'b': f(7), 'c': f(5), 'd': f(1)})
-df['joe'] = (np.random.randn(len(df)) * 10).round(3)
-
-i = np.random.permutation(len(df))
-df = df.iloc[i].reset_index(drop=True).copy()
-'''
-
-groupby_multi_index = Benchmark("df.groupby(list('abcd')).max()", setup,
-                                name='groupby_multi_index')
-
-#----------------------------------------------------------------------
-# groupby with a variable value for ngroups
-
-
-ngroups_list = [100, 10000]
-no_arg_func_list = [
-    'all',
-    'any',
-    'count',
-    'cumcount',
-    'cummax',
-    'cummin',
-    'cumprod',
-    'cumsum',
-    'describe',
-    'diff',
-    'first',
-    'head',
-    'last',
-    'mad',
-    'max',
-    'mean',
-    'median',
-    'min',
-    'nunique',
-    'pct_change',
-    'prod',
-    'rank',
-    'sem',
-    'size',
-    'skew',
-    'std',
-    'sum',
-    'tail',
-    'unique',
-    'var',
-    'value_counts',
-]
-
-
-_stmt_template = "df.groupby('value')['timestamp'].%s"
-_setup_template = common_setup + """
-np.random.seed(1234)
-ngroups = %s
-size = ngroups * 2
-rng = np.arange(ngroups)
-df = DataFrame(dict(
-    timestamp=rng.take(np.random.randint(0, ngroups, size=size)),
-    value=np.random.randint(0, size, size=size)
-))
-"""
-START_DATE = datetime(2011, 7, 1)
-
-
-def make_large_ngroups_bmark(ngroups, func_name, func_args=''):
-    bmark_name = 'groupby_ngroups_%s_%s' % (ngroups, func_name)
-    stmt = _stmt_template % ('%s(%s)' % (func_name, func_args))
-    setup = _setup_template % ngroups
-    bmark = Benchmark(stmt, setup, start_date=START_DATE)
-    # MUST set name
-    bmark.name = bmark_name
-    return bmark
-
-
-def inject_bmark_into_globals(bmark):
-    if not bmark.name:
-        raise AssertionError('benchmark must have a name')
-    globals()[bmark.name] = bmark
-
-
-for ngroups in ngroups_list:
-    for func_name in no_arg_func_list:
-        bmark = make_large_ngroups_bmark(ngroups, func_name)
-        inject_bmark_into_globals(bmark)
-
-# avoid bmark being collected as a Benchmark object
-del bmark
diff --git a/vb_suite/hdfstore_bench.py b/vb_suite/hdfstore_bench.py
deleted file mode 
100644 index 393fd4cc77e66..0000000000000 --- a/vb_suite/hdfstore_bench.py +++ /dev/null @@ -1,278 +0,0 @@ -from vbench.api import Benchmark -from datetime import datetime - -start_date = datetime(2012, 7, 1) - -common_setup = """from .pandas_vb_common import * -import os - -f = '__test__.h5' -def remove(f): - try: - os.remove(f) - except: - pass - -""" - -#---------------------------------------------------------------------- -# get from a store - -setup1 = common_setup + """ -index = tm.makeStringIndex(25000) -df = DataFrame({'float1' : randn(25000), - 'float2' : randn(25000)}, - index=index) -remove(f) -store = HDFStore(f) -store.put('df1',df) -""" - -read_store = Benchmark("store.get('df1')", setup1, cleanup="store.close()", - start_date=start_date) - - -#---------------------------------------------------------------------- -# write to a store - -setup2 = common_setup + """ -index = tm.makeStringIndex(25000) -df = DataFrame({'float1' : randn(25000), - 'float2' : randn(25000)}, - index=index) -remove(f) -store = HDFStore(f) -""" - -write_store = Benchmark( - "store.put('df2',df)", setup2, cleanup="store.close()", - start_date=start_date) - -#---------------------------------------------------------------------- -# get from a store (mixed) - -setup3 = common_setup + """ -index = tm.makeStringIndex(25000) -df = DataFrame({'float1' : randn(25000), - 'float2' : randn(25000), - 'string1' : ['foo'] * 25000, - 'bool1' : [True] * 25000, - 'int1' : np.random.randint(0, 250000, size=25000)}, - index=index) -remove(f) -store = HDFStore(f) -store.put('df3',df) -""" - -read_store_mixed = Benchmark( - "store.get('df3')", setup3, cleanup="store.close()", - start_date=start_date) - - -#---------------------------------------------------------------------- -# write to a store (mixed) - -setup4 = common_setup + """ -index = tm.makeStringIndex(25000) -df = DataFrame({'float1' : randn(25000), - 'float2' : randn(25000), - 'string1' : ['foo'] * 25000, - 'bool1' : [True] * 25000, - 'int1' : np.random.randint(0, 250000, size=25000)}, - index=index) -remove(f) -store = HDFStore(f) -""" - -write_store_mixed = Benchmark( - "store.put('df4',df)", setup4, cleanup="store.close()", - start_date=start_date) - -#---------------------------------------------------------------------- -# get from a table (mixed) - -setup5 = common_setup + """ -N=10000 -index = tm.makeStringIndex(N) -df = DataFrame({'float1' : randn(N), - 'float2' : randn(N), - 'string1' : ['foo'] * N, - 'bool1' : [True] * N, - 'int1' : np.random.randint(0, N, size=N)}, - index=index) - -remove(f) -store = HDFStore(f) -store.append('df5',df) -""" - -read_store_table_mixed = Benchmark( - "store.select('df5')", setup5, cleanup="store.close()", - start_date=start_date) - - -#---------------------------------------------------------------------- -# write to a table (mixed) - -setup6 = common_setup + """ -index = tm.makeStringIndex(25000) -df = DataFrame({'float1' : randn(25000), - 'float2' : randn(25000), - 'string1' : ['foo'] * 25000, - 'bool1' : [True] * 25000, - 'int1' : np.random.randint(0, 25000, size=25000)}, - index=index) -remove(f) -store = HDFStore(f) -""" - -write_store_table_mixed = Benchmark( - "store.append('df6',df)", setup6, cleanup="store.close()", - start_date=start_date) - -#---------------------------------------------------------------------- -# select from a table - -setup7 = common_setup + """ -index = tm.makeStringIndex(25000) -df = DataFrame({'float1' : randn(25000), - 'float2' : randn(25000) }, - index=index) - -remove(f) -store = 
HDFStore(f) -store.append('df7',df) -""" - -read_store_table = Benchmark( - "store.select('df7')", setup7, cleanup="store.close()", - start_date=start_date) - - -#---------------------------------------------------------------------- -# write to a table - -setup8 = common_setup + """ -index = tm.makeStringIndex(25000) -df = DataFrame({'float1' : randn(25000), - 'float2' : randn(25000) }, - index=index) -remove(f) -store = HDFStore(f) -""" - -write_store_table = Benchmark( - "store.append('df8',df)", setup8, cleanup="store.close()", - start_date=start_date) - -#---------------------------------------------------------------------- -# get from a table (wide) - -setup9 = common_setup + """ -df = DataFrame(np.random.randn(25000,100)) - -remove(f) -store = HDFStore(f) -store.append('df9',df) -""" - -read_store_table_wide = Benchmark( - "store.select('df9')", setup9, cleanup="store.close()", - start_date=start_date) - - -#---------------------------------------------------------------------- -# write to a table (wide) - -setup10 = common_setup + """ -df = DataFrame(np.random.randn(25000,100)) - -remove(f) -store = HDFStore(f) -""" - -write_store_table_wide = Benchmark( - "store.append('df10',df)", setup10, cleanup="store.close()", - start_date=start_date) - -#---------------------------------------------------------------------- -# get from a table (wide) - -setup11 = common_setup + """ -index = date_range('1/1/2000', periods = 25000) -df = DataFrame(np.random.randn(25000,100), index = index) - -remove(f) -store = HDFStore(f) -store.append('df11',df) -""" - -query_store_table_wide = Benchmark( - "store.select('df11', [ ('index', '>', df.index[10000]), ('index', '<', df.index[15000]) ])", setup11, cleanup="store.close()", - start_date=start_date) - - -#---------------------------------------------------------------------- -# query from a table - -setup12 = common_setup + """ -index = date_range('1/1/2000', periods = 25000) -df = DataFrame({'float1' : randn(25000), - 'float2' : randn(25000) }, - index=index) - -remove(f) -store = HDFStore(f) -store.append('df12',df) -""" - -query_store_table = Benchmark( - "store.select('df12', [ ('index', '>', df.index[10000]), ('index', '<', df.index[15000]) ])", setup12, cleanup="store.close()", - start_date=start_date) - -#---------------------------------------------------------------------- -# select from a panel table - -setup13 = common_setup + """ -p = Panel(randn(20, 1000, 25), items= [ 'Item%03d' % i for i in range(20) ], - major_axis=date_range('1/1/2000', periods=1000), minor_axis = [ 'E%03d' % i for i in range(25) ]) - -remove(f) -store = HDFStore(f) -store.append('p1',p) -""" - -read_store_table_panel = Benchmark( - "store.select('p1')", setup13, cleanup="store.close()", - start_date=start_date) - - -#---------------------------------------------------------------------- -# write to a panel table - -setup14 = common_setup + """ -p = Panel(randn(20, 1000, 25), items= [ 'Item%03d' % i for i in range(20) ], - major_axis=date_range('1/1/2000', periods=1000), minor_axis = [ 'E%03d' % i for i in range(25) ]) - -remove(f) -store = HDFStore(f) -""" - -write_store_table_panel = Benchmark( - "store.append('p2',p)", setup14, cleanup="store.close()", - start_date=start_date) - -#---------------------------------------------------------------------- -# write to a table (data_columns) - -setup15 = common_setup + """ -df = DataFrame(np.random.randn(10000,10),columns = [ 'C%03d' % i for i in range(10) ]) - -remove(f) -store = HDFStore(f) -""" - 
-write_store_table_dc = Benchmark( - "store.append('df15',df,data_columns=True)", setup15, cleanup="store.close()", - start_date=start_date) - diff --git a/vb_suite/index_object.py b/vb_suite/index_object.py deleted file mode 100644 index 2ab2bc15f3853..0000000000000 --- a/vb_suite/index_object.py +++ /dev/null @@ -1,173 +0,0 @@ -from vbench.benchmark import Benchmark -from datetime import datetime - -SECTION = "Index / MultiIndex objects" - - -common_setup = """from .pandas_vb_common import * -""" - -#---------------------------------------------------------------------- -# intersection, union - -setup = common_setup + """ -rng = DatetimeIndex(start='1/1/2000', periods=10000, freq=datetools.Minute()) -if rng.dtype == object: - rng = rng.view(Index) -else: - rng = rng.asobject -rng2 = rng[:-1] -""" - -index_datetime_intersection = Benchmark("rng.intersection(rng2)", setup) -index_datetime_union = Benchmark("rng.union(rng2)", setup) - -setup = common_setup + """ -rng = date_range('1/1/2000', periods=10000, freq='T') -rng2 = rng[:-1] -""" - -datetime_index_intersection = Benchmark("rng.intersection(rng2)", setup, - start_date=datetime(2013, 9, 27)) -datetime_index_union = Benchmark("rng.union(rng2)", setup, - start_date=datetime(2013, 9, 27)) - -# integers -setup = common_setup + """ -N = 1000000 -options = np.arange(N) - -left = Index(options.take(np.random.permutation(N)[:N // 2])) -right = Index(options.take(np.random.permutation(N)[:N // 2])) -""" - -index_int64_union = Benchmark('left.union(right)', setup, - start_date=datetime(2011, 1, 1)) - -index_int64_intersection = Benchmark('left.intersection(right)', setup, - start_date=datetime(2011, 1, 1)) - -#---------------------------------------------------------------------- -# string index slicing -setup = common_setup + """ -idx = tm.makeStringIndex(1000000) - -mask = np.arange(1000000) % 3 == 0 -series_mask = Series(mask) -""" -index_str_slice_indexer_basic = Benchmark('idx[:-1]', setup) -index_str_slice_indexer_even = Benchmark('idx[::2]', setup) -index_str_boolean_indexer = Benchmark('idx[mask]', setup) -index_str_boolean_series_indexer = Benchmark('idx[series_mask]', setup) - -#---------------------------------------------------------------------- -# float64 index -#---------------------------------------------------------------------- -# construction -setup = common_setup + """ -baseidx = np.arange(1e6) -""" - -index_float64_construct = Benchmark('Index(baseidx)', setup, - name='index_float64_construct', - start_date=datetime(2014, 4, 13)) - -setup = common_setup + """ -idx = tm.makeFloatIndex(1000000) - -mask = np.arange(idx.size) % 3 == 0 -series_mask = Series(mask) -""" -#---------------------------------------------------------------------- -# getting -index_float64_get = Benchmark('idx[1]', setup, name='index_float64_get', - start_date=datetime(2014, 4, 13)) - - -#---------------------------------------------------------------------- -# slicing -index_float64_slice_indexer_basic = Benchmark('idx[:-1]', setup, - name='index_float64_slice_indexer_basic', - start_date=datetime(2014, 4, 13)) -index_float64_slice_indexer_even = Benchmark('idx[::2]', setup, - name='index_float64_slice_indexer_even', - start_date=datetime(2014, 4, 13)) -index_float64_boolean_indexer = Benchmark('idx[mask]', setup, - name='index_float64_boolean_indexer', - start_date=datetime(2014, 4, 13)) -index_float64_boolean_series_indexer = Benchmark('idx[series_mask]', setup, - name='index_float64_boolean_series_indexer', - start_date=datetime(2014, 4, 13)) - 
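Editor's aside before the arithmetic-ops benchmarks below: the float64-index benchmarks above all slice or mask an `Index` directly, which returns a new `Index` rather than a Series. A standalone sketch of the indexed operations being timed:

    import numpy as np
    import pandas as pd

    idx = pd.Index(np.arange(1e6))        # float64-dtype index
    print(idx[1])                         # scalar get
    print(idx[:-1].size, idx[::2].size)   # slicing returns new Index objects
    mask = np.arange(idx.size) % 3 == 0
    print(idx[mask].size)                 # boolean masking works on an Index too
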
-#---------------------------------------------------------------------- -# arith ops -index_float64_mul = Benchmark('idx * 2', setup, name='index_float64_mul', - start_date=datetime(2014, 4, 13)) -index_float64_div = Benchmark('idx / 2', setup, name='index_float64_div', - start_date=datetime(2014, 4, 13)) - - -# Constructing MultiIndex from cartesian product of iterables -# - -setup = common_setup + """ -iterables = [tm.makeStringIndex(10000), range(20)] -""" - -multiindex_from_product = Benchmark('MultiIndex.from_product(iterables)', - setup, name='multiindex_from_product', - start_date=datetime(2014, 6, 30)) - -#---------------------------------------------------------------------- -# MultiIndex with DatetimeIndex level - -setup = common_setup + """ -level1 = range(1000) -level2 = date_range(start='1/1/2012', periods=100) -mi = MultiIndex.from_product([level1, level2]) -""" - -multiindex_with_datetime_level_full = \ - Benchmark("mi.copy().values", setup, - name='multiindex_with_datetime_level_full', - start_date=datetime(2014, 10, 11)) - - -multiindex_with_datetime_level_sliced = \ - Benchmark("mi[:10].values", setup, - name='multiindex_with_datetime_level_sliced', - start_date=datetime(2014, 10, 11)) - -# multi-index duplicated -setup = common_setup + """ -n, k = 200, 5000 -levels = [np.arange(n), tm.makeStringIndex(n).values, 1000 + np.arange(n)] -labels = [np.random.choice(n, k * n) for lev in levels] -mi = MultiIndex(levels=levels, labels=labels) -""" - -multiindex_duplicated = Benchmark('mi.duplicated()', setup, - name='multiindex_duplicated') - -#---------------------------------------------------------------------- -# repr - -setup = common_setup + """ -dr = pd.date_range('20000101', freq='D', periods=100000) -""" - -datetime_index_repr = \ - Benchmark("dr._is_dates_only", setup, - start_date=datetime(2012, 1, 11)) - -setup = common_setup + """ -n = 3 * 5 * 7 * 11 * (1 << 10) -low, high = - 1 << 12, 1 << 12 -f = lambda k: np.repeat(np.random.randint(low, high, n // k), k) - -i = np.random.permutation(n) -mi = MultiIndex.from_arrays([f(11), f(7), f(5), f(3), f(1)])[i] -""" - -multiindex_sortlevel_int64 = Benchmark('mi.sortlevel()', setup, - name='multiindex_sortlevel_int64') diff --git a/vb_suite/indexing.py b/vb_suite/indexing.py deleted file mode 100644 index 3d95d52dccd71..0000000000000 --- a/vb_suite/indexing.py +++ /dev/null @@ -1,292 +0,0 @@ -from vbench.benchmark import Benchmark -from datetime import datetime - -SECTION = 'Indexing and scalar value access' - -common_setup = """from .pandas_vb_common import * -""" - -#---------------------------------------------------------------------- -# Series.__getitem__, get_value, __getitem__(slice) - -setup = common_setup + """ -tm.N = 1000 -ts = tm.makeTimeSeries() -dt = ts.index[500] -""" -statement = "ts[dt]" -bm_getitem = Benchmark(statement, setup, ncalls=100000, - name='time_series_getitem_scalar') - -setup = common_setup + """ -index = tm.makeStringIndex(1000) -s = Series(np.random.rand(1000), index=index) -idx = index[100] -""" -statement = "s.get_value(idx)" -bm_get_value = Benchmark(statement, setup, - name='series_get_value', - start_date=datetime(2011, 11, 12)) - - -setup = common_setup + """ -index = tm.makeStringIndex(1000000) -s = Series(np.random.rand(1000000), index=index) -""" -series_getitem_pos_slice = Benchmark("s[:800000]", setup, - name="series_getitem_pos_slice") - - -setup = common_setup + """ -index = tm.makeStringIndex(1000000) -s = Series(np.random.rand(1000000), index=index) -lbl = s.index[800000] -""" 
-series_getitem_label_slice = Benchmark("s[:lbl]", setup, - name="series_getitem_label_slice") - - -#---------------------------------------------------------------------- -# DataFrame __getitem__ - -setup = common_setup + """ -index = tm.makeStringIndex(1000) -columns = tm.makeStringIndex(30) -df = DataFrame(np.random.rand(1000, 30), index=index, - columns=columns) -idx = index[100] -col = columns[10] -""" -statement = "df[col][idx]" -bm_df_getitem = Benchmark(statement, setup, - name='dataframe_getitem_scalar') - -setup = common_setup + """ -try: - klass = DataMatrix -except: - klass = DataFrame - -index = tm.makeStringIndex(1000) -columns = tm.makeStringIndex(30) -df = klass(np.random.rand(1000, 30), index=index, columns=columns) -idx = index[100] -col = columns[10] -""" -statement = "df[col][idx]" -bm_df_getitem2 = Benchmark(statement, setup, - name='datamatrix_getitem_scalar') - - -#---------------------------------------------------------------------- -# ix get scalar - -setup = common_setup + """ -index = tm.makeStringIndex(1000) -columns = tm.makeStringIndex(30) -df = DataFrame(np.random.randn(1000, 30), index=index, columns=columns) -idx = index[100] -col = columns[10] -""" - -indexing_frame_get_value_ix = Benchmark("df.ix[idx,col]", setup, - name='indexing_frame_get_value_ix', - start_date=datetime(2011, 11, 12)) - -indexing_frame_get_value = Benchmark("df.get_value(idx,col)", setup, - name='indexing_frame_get_value', - start_date=datetime(2011, 11, 12)) - -setup = common_setup + """ -mi = MultiIndex.from_tuples([(x,y) for x in range(1000) for y in range(1000)]) -s = Series(np.random.randn(1000000), index=mi) -""" - -series_xs_mi_ix = Benchmark("s.ix[999]", setup, - name='series_xs_mi_ix', - start_date=datetime(2013, 1, 1)) - -setup = common_setup + """ -mi = MultiIndex.from_tuples([(x,y) for x in range(1000) for y in range(1000)]) -s = Series(np.random.randn(1000000), index=mi) -df = DataFrame(s) -""" - -frame_xs_mi_ix = Benchmark("df.ix[999]", setup, - name='frame_xs_mi_ix', - start_date=datetime(2013, 1, 1)) - -#---------------------------------------------------------------------- -# Boolean DataFrame row selection - -setup = common_setup + """ -df = DataFrame(np.random.randn(10000, 4), columns=['A', 'B', 'C', 'D']) -indexer = df['B'] > 0 -obj_indexer = indexer.astype('O') -""" -indexing_dataframe_boolean_rows = \ - Benchmark("df[indexer]", setup, name='indexing_dataframe_boolean_rows') - -indexing_dataframe_boolean_rows_object = \ - Benchmark("df[obj_indexer]", setup, - name='indexing_dataframe_boolean_rows_object') - -setup = common_setup + """ -df = DataFrame(np.random.randn(50000, 100)) -df2 = DataFrame(np.random.randn(50000, 100)) -""" -indexing_dataframe_boolean = \ - Benchmark("df > df2", setup, name='indexing_dataframe_boolean', - start_date=datetime(2012, 1, 1)) - -setup = common_setup + """ -try: - import pandas.computation.expressions as expr -except: - expr = None - -if expr is None: - raise NotImplementedError -df = DataFrame(np.random.randn(50000, 100)) -df2 = DataFrame(np.random.randn(50000, 100)) -expr.set_numexpr_threads(1) -""" - -indexing_dataframe_boolean_st = \ - Benchmark("df > df2", setup, name='indexing_dataframe_boolean_st',cleanup="expr.set_numexpr_threads()", - start_date=datetime(2013, 2, 26)) - - -setup = common_setup + """ -try: - import pandas.computation.expressions as expr -except: - expr = None - -if expr is None: - raise NotImplementedError -df = DataFrame(np.random.randn(50000, 100)) -df2 = DataFrame(np.random.randn(50000, 100)) 
-expr.set_use_numexpr(False) -""" - -indexing_dataframe_boolean_no_ne = \ - Benchmark("df > df2", setup, name='indexing_dataframe_boolean_no_ne',cleanup="expr.set_use_numexpr(True)", - start_date=datetime(2013, 2, 26)) -#---------------------------------------------------------------------- -# MultiIndex sortlevel - -setup = common_setup + """ -a = np.repeat(np.arange(100), 1000) -b = np.tile(np.arange(1000), 100) -midx = MultiIndex.from_arrays([a, b]) -midx = midx.take(np.random.permutation(np.arange(100000))) -""" -sort_level_zero = Benchmark("midx.sortlevel(0)", setup, - start_date=datetime(2012, 1, 1)) -sort_level_one = Benchmark("midx.sortlevel(1)", setup, - start_date=datetime(2012, 1, 1)) - -#---------------------------------------------------------------------- -# Panel subset selection - -setup = common_setup + """ -p = Panel(np.random.randn(100, 100, 100)) -inds = range(0, 100, 10) -""" - -indexing_panel_subset = Benchmark('p.ix[inds, inds, inds]', setup, - start_date=datetime(2012, 1, 1)) - -#---------------------------------------------------------------------- -# Iloc - -setup = common_setup + """ -df = DataFrame({'A' : [0.1] * 3000, 'B' : [1] * 3000}) -idx = np.array(range(30)) * 99 -df2 = DataFrame({'A' : [0.1] * 1000, 'B' : [1] * 1000}) -df2 = concat([df2, 2*df2, 3*df2]) -""" - -frame_iloc_dups = Benchmark('df2.iloc[idx]', setup, - start_date=datetime(2013, 1, 1)) - -frame_loc_dups = Benchmark('df2.loc[idx]', setup, - start_date=datetime(2013, 1, 1)) - -setup = common_setup + """ -df = DataFrame(dict( A = [ 'foo'] * 1000000)) -""" - -frame_iloc_big = Benchmark('df.iloc[:100,0]', setup, - start_date=datetime(2013, 1, 1)) - -#---------------------------------------------------------------------- -# basic tests for [], .loc[], .iloc[] and .ix[] - -setup = common_setup + """ -s = Series(np.random.rand(1000000)) -""" - -series_getitem_scalar = Benchmark("s[800000]", setup) -series_getitem_slice = Benchmark("s[:800000]", setup) -series_getitem_list_like = Benchmark("s[[800000]]", setup) -series_getitem_array = Benchmark("s[np.arange(10000)]", setup) - -series_loc_scalar = Benchmark("s.loc[800000]", setup) -series_loc_slice = Benchmark("s.loc[:800000]", setup) -series_loc_list_like = Benchmark("s.loc[[800000]]", setup) -series_loc_array = Benchmark("s.loc[np.arange(10000)]", setup) - -series_iloc_scalar = Benchmark("s.iloc[800000]", setup) -series_iloc_slice = Benchmark("s.iloc[:800000]", setup) -series_iloc_list_like = Benchmark("s.iloc[[800000]]", setup) -series_iloc_array = Benchmark("s.iloc[np.arange(10000)]", setup) - -series_ix_scalar = Benchmark("s.ix[800000]", setup) -series_ix_slice = Benchmark("s.ix[:800000]", setup) -series_ix_list_like = Benchmark("s.ix[[800000]]", setup) -series_ix_array = Benchmark("s.ix[np.arange(10000)]", setup) - - -# multi-index slicing -setup = common_setup + """ -np.random.seed(1234) -idx=pd.IndexSlice -n=100000 -mdt = pandas.DataFrame() -mdt['A'] = np.random.choice(range(10000,45000,1000), n) -mdt['B'] = np.random.choice(range(10,400), n) -mdt['C'] = np.random.choice(range(1,150), n) -mdt['D'] = np.random.choice(range(10000,45000), n) -mdt['x'] = np.random.choice(range(400), n) -mdt['y'] = np.random.choice(range(25), n) - - -test_A = 25000 -test_B = 25 -test_C = 40 -test_D = 35000 - -eps_A = 5000 -eps_B = 5 -eps_C = 5 -eps_D = 5000 -mdt2 = mdt.set_index(['A','B','C','D']).sortlevel() -""" - -multiindex_slicers = Benchmark('mdt2.loc[idx[test_A-eps_A:test_A+eps_A,test_B-eps_B:test_B+eps_B,test_C-eps_C:test_C+eps_C,test_D-eps_D:test_D+eps_D],:]', 
setup, - start_date=datetime(2015, 1, 1)) - -#---------------------------------------------------------------------- -# take - -setup = common_setup + """ -s = Series(np.random.rand(100000)) -ts = Series(np.random.rand(100000), - index=date_range('2011-01-01', freq='S', periods=100000)) -indexer = [True, False, True, True, False] * 20000 -""" - -series_take_intindex = Benchmark("s.take(indexer)", setup) -series_take_dtindex = Benchmark("ts.take(indexer)", setup) diff --git a/vb_suite/inference.py b/vb_suite/inference.py deleted file mode 100644 index aaa51aa5163ce..0000000000000 --- a/vb_suite/inference.py +++ /dev/null @@ -1,36 +0,0 @@ -from vbench.api import Benchmark -from datetime import datetime -import sys - -# from GH 7332 - -setup = """from .pandas_vb_common import * -import pandas as pd -N = 500000 -df_int64 = DataFrame(dict(A = np.arange(N,dtype='int64'), B = np.arange(N,dtype='int64'))) -df_int32 = DataFrame(dict(A = np.arange(N,dtype='int32'), B = np.arange(N,dtype='int32'))) -df_uint32 = DataFrame(dict(A = np.arange(N,dtype='uint32'), B = np.arange(N,dtype='uint32'))) -df_float64 = DataFrame(dict(A = np.arange(N,dtype='float64'), B = np.arange(N,dtype='float64'))) -df_float32 = DataFrame(dict(A = np.arange(N,dtype='float32'), B = np.arange(N,dtype='float32'))) -df_datetime64 = DataFrame(dict(A = pd.to_datetime(np.arange(N,dtype='int64'),unit='ms'), - B = pd.to_datetime(np.arange(N,dtype='int64'),unit='ms'))) -df_timedelta64 = DataFrame(dict(A = df_datetime64['A']-df_datetime64['B'], - B = df_datetime64['B'])) -""" - -dtype_infer_int64 = Benchmark('df_int64["A"] + df_int64["B"]', setup, - start_date=datetime(2014, 1, 1)) -dtype_infer_int32 = Benchmark('df_int32["A"] + df_int32["B"]', setup, - start_date=datetime(2014, 1, 1)) -dtype_infer_uint32 = Benchmark('df_uint32["A"] + df_uint32["B"]', setup, - start_date=datetime(2014, 1, 1)) -dtype_infer_float64 = Benchmark('df_float64["A"] + df_float64["B"]', setup, - start_date=datetime(2014, 1, 1)) -dtype_infer_float32 = Benchmark('df_float32["A"] + df_float32["B"]', setup, - start_date=datetime(2014, 1, 1)) -dtype_infer_datetime64 = Benchmark('df_datetime64["A"] - df_datetime64["B"]', setup, - start_date=datetime(2014, 1, 1)) -dtype_infer_timedelta64_1 = Benchmark('df_timedelta64["A"] + df_timedelta64["B"]', setup, - start_date=datetime(2014, 1, 1)) -dtype_infer_timedelta64_2 = Benchmark('df_timedelta64["A"] + df_timedelta64["A"]', setup, - start_date=datetime(2014, 1, 1)) diff --git a/vb_suite/io_bench.py b/vb_suite/io_bench.py deleted file mode 100644 index af5f6076515cc..0000000000000 --- a/vb_suite/io_bench.py +++ /dev/null @@ -1,150 +0,0 @@ -from vbench.api import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -from io import StringIO -""" - -#---------------------------------------------------------------------- -# read_csv - -setup1 = common_setup + """ -index = tm.makeStringIndex(10000) -df = DataFrame({'float1' : randn(10000), - 'float2' : randn(10000), - 'string1' : ['foo'] * 10000, - 'bool1' : [True] * 10000, - 'int1' : np.random.randint(0, 100000, size=10000)}, - index=index) -df.to_csv('__test__.csv') -""" - -read_csv_standard = Benchmark("read_csv('__test__.csv')", setup1, - start_date=datetime(2011, 9, 15)) - -#---------------------------------- -# skiprows - -setup1 = common_setup + """ -index = tm.makeStringIndex(20000) -df = DataFrame({'float1' : randn(20000), - 'float2' : randn(20000), - 'string1' : ['foo'] * 20000, - 'bool1' : [True] * 20000, - 'int1' : 
np.random.randint(0, 200000, size=20000)}, - index=index) -df.to_csv('__test__.csv') -""" - -read_csv_skiprows = Benchmark("read_csv('__test__.csv', skiprows=10000)", setup1, - start_date=datetime(2011, 9, 15)) - -#---------------------------------------------------------------------- -# write_csv - -setup2 = common_setup + """ -index = tm.makeStringIndex(10000) -df = DataFrame({'float1' : randn(10000), - 'float2' : randn(10000), - 'string1' : ['foo'] * 10000, - 'bool1' : [True] * 10000, - 'int1' : np.random.randint(0, 100000, size=10000)}, - index=index) -""" - -write_csv_standard = Benchmark("df.to_csv('__test__.csv')", setup2, - start_date=datetime(2011, 9, 15)) - -#---------------------------------- -setup = common_setup + """ -df = DataFrame(np.random.randn(3000, 30)) -""" -frame_to_csv = Benchmark("df.to_csv('__test__.csv')", setup, - start_date=datetime(2011, 1, 1)) -#---------------------------------- - -setup = common_setup + """ -df=DataFrame({'A':range(50000)}) -df['B'] = df.A + 1.0 -df['C'] = df.A + 2.0 -df['D'] = df.A + 3.0 -""" -frame_to_csv2 = Benchmark("df.to_csv('__test__.csv')", setup, - start_date=datetime(2011, 1, 1)) - -#---------------------------------- -setup = common_setup + """ -from pandas import concat, Timestamp - -def create_cols(name): - return [ "%s%03d" % (name,i) for i in range(5) ] -df_float = DataFrame(np.random.randn(5000, 5),dtype='float64',columns=create_cols('float')) -df_int = DataFrame(np.random.randn(5000, 5),dtype='int64',columns=create_cols('int')) -df_bool = DataFrame(True,index=df_float.index,columns=create_cols('bool')) -df_object = DataFrame('foo',index=df_float.index,columns=create_cols('object')) -df_dt = DataFrame(Timestamp('20010101'),index=df_float.index,columns=create_cols('date')) - -# add in some nans -df_float.ix[30:500,1:3] = np.nan - -df = concat([ df_float, df_int, df_bool, df_object, df_dt ], axis=1) - -""" -frame_to_csv_mixed = Benchmark("df.to_csv('__test__.csv')", setup, - start_date=datetime(2012, 6, 1)) - -#---------------------------------------------------------------------- -# parse dates, ISO8601 format - -setup = common_setup + """ -rng = date_range('1/1/2000', periods=1000) -data = '\\n'.join(rng.map(lambda x: x.strftime("%Y-%m-%d %H:%M:%S"))) -""" - -stmt = ("read_csv(StringIO(data), header=None, names=['foo'], " - " parse_dates=['foo'])") -read_parse_dates_iso8601 = Benchmark(stmt, setup, - start_date=datetime(2012, 3, 1)) - -setup = common_setup + """ -rng = date_range('1/1/2000', periods=1000) -data = DataFrame(rng, index=rng) -""" - -stmt = ("data.to_csv('__test__.csv', date_format='%Y%m%d')") - -frame_to_csv_date_formatting = Benchmark(stmt, setup, - start_date=datetime(2013, 9, 1)) - -#---------------------------------------------------------------------- -# infer datetime format - -setup = common_setup + """ -rng = date_range('1/1/2000', periods=1000) -data = '\\n'.join(rng.map(lambda x: x.strftime("%Y-%m-%d %H:%M:%S"))) -""" - -stmt = ("read_csv(StringIO(data), header=None, names=['foo'], " - " parse_dates=['foo'], infer_datetime_format=True)") - -read_csv_infer_datetime_format_iso8601 = Benchmark(stmt, setup) - -setup = common_setup + """ -rng = date_range('1/1/2000', periods=1000) -data = '\\n'.join(rng.map(lambda x: x.strftime("%Y%m%d"))) -""" - -stmt = ("read_csv(StringIO(data), header=None, names=['foo'], " - " parse_dates=['foo'], infer_datetime_format=True)") - -read_csv_infer_datetime_format_ymd = Benchmark(stmt, setup) - -setup = common_setup + """ -rng = date_range('1/1/2000', periods=1000) -data = 
'\\n'.join(rng.map(lambda x: x.strftime("%m/%d/%Y %H:%M:%S.%f"))) -""" - -stmt = ("read_csv(StringIO(data), header=None, names=['foo'], " - " parse_dates=['foo'], infer_datetime_format=True)") - -read_csv_infer_datetime_format_custom = Benchmark(stmt, setup) diff --git a/vb_suite/io_sql.py b/vb_suite/io_sql.py deleted file mode 100644 index ba8367e7e356b..0000000000000 --- a/vb_suite/io_sql.py +++ /dev/null @@ -1,126 +0,0 @@ -from vbench.api import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -import sqlite3 -import sqlalchemy -from sqlalchemy import create_engine - -engine = create_engine('sqlite:///:memory:') -con = sqlite3.connect(':memory:') -""" - -sdate = datetime(2014, 6, 1) - - -#------------------------------------------------------------------------------- -# to_sql - -setup = common_setup + """ -index = tm.makeStringIndex(10000) -df = DataFrame({'float1' : randn(10000), - 'float2' : randn(10000), - 'string1' : ['foo'] * 10000, - 'bool1' : [True] * 10000, - 'int1' : np.random.randint(0, 100000, size=10000)}, - index=index) -""" - -sql_write_sqlalchemy = Benchmark("df.to_sql('test1', engine, if_exists='replace')", - setup, start_date=sdate) - -sql_write_fallback = Benchmark("df.to_sql('test1', con, if_exists='replace')", - setup, start_date=sdate) - - -#------------------------------------------------------------------------------- -# read_sql - -setup = common_setup + """ -index = tm.makeStringIndex(10000) -df = DataFrame({'float1' : randn(10000), - 'float2' : randn(10000), - 'string1' : ['foo'] * 10000, - 'bool1' : [True] * 10000, - 'int1' : np.random.randint(0, 100000, size=10000)}, - index=index) -df.to_sql('test2', engine, if_exists='replace') -df.to_sql('test2', con, if_exists='replace') -""" - -sql_read_query_sqlalchemy = Benchmark("read_sql_query('SELECT * FROM test2', engine)", - setup, start_date=sdate) - -sql_read_query_fallback = Benchmark("read_sql_query('SELECT * FROM test2', con)", - setup, start_date=sdate) - -sql_read_table_sqlalchemy = Benchmark("read_sql_table('test2', engine)", - setup, start_date=sdate) - - -#------------------------------------------------------------------------------- -# type specific write - -setup = common_setup + """ -df = DataFrame({'float' : randn(10000), - 'string' : ['foo'] * 10000, - 'bool' : [True] * 10000, - 'datetime' : date_range('2000-01-01', periods=10000, freq='s')}) -df.loc[1000:3000, 'float'] = np.nan -""" - -sql_float_write_sqlalchemy = \ - Benchmark("df[['float']].to_sql('test_float', engine, if_exists='replace')", - setup, start_date=sdate) - -sql_float_write_fallback = \ - Benchmark("df[['float']].to_sql('test_float', con, if_exists='replace')", - setup, start_date=sdate) - -sql_string_write_sqlalchemy = \ - Benchmark("df[['string']].to_sql('test_string', engine, if_exists='replace')", - setup, start_date=sdate) - -sql_string_write_fallback = \ - Benchmark("df[['string']].to_sql('test_string', con, if_exists='replace')", - setup, start_date=sdate) - -sql_datetime_write_sqlalchemy = \ - Benchmark("df[['datetime']].to_sql('test_datetime', engine, if_exists='replace')", - setup, start_date=sdate) - -#sql_datetime_write_fallback = \ -# Benchmark("df[['datetime']].to_sql('test_datetime', con, if_exists='replace')", -# setup3, start_date=sdate) - -#------------------------------------------------------------------------------- -# type specific read - -setup = common_setup + """ -df = DataFrame({'float' : randn(10000), - 'datetime' : date_range('2000-01-01', periods=10000, 
freq='s')}) -df['datetime_string'] = df['datetime'].map(str) - -df.to_sql('test_type', engine, if_exists='replace') -df[['float', 'datetime_string']].to_sql('test_type', con, if_exists='replace') -""" - -sql_float_read_query_sqlalchemy = \ - Benchmark("read_sql_query('SELECT float FROM test_type', engine)", - setup, start_date=sdate) - -sql_float_read_table_sqlalchemy = \ - Benchmark("read_sql_table('test_type', engine, columns=['float'])", - setup, start_date=sdate) - -sql_float_read_query_fallback = \ - Benchmark("read_sql_query('SELECT float FROM test_type', con)", - setup, start_date=sdate) - -sql_datetime_read_as_native_sqlalchemy = \ - Benchmark("read_sql_table('test_type', engine, columns=['datetime'])", - setup, start_date=sdate) - -sql_datetime_read_and_parse_sqlalchemy = \ - Benchmark("read_sql_table('test_type', engine, columns=['datetime_string'], parse_dates=['datetime_string'])", - setup, start_date=sdate) diff --git a/vb_suite/join_merge.py b/vb_suite/join_merge.py deleted file mode 100644 index 238a129552e90..0000000000000 --- a/vb_suite/join_merge.py +++ /dev/null @@ -1,270 +0,0 @@ -from vbench.benchmark import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -""" - -setup = common_setup + """ -level1 = tm.makeStringIndex(10).values -level2 = tm.makeStringIndex(1000).values -label1 = np.arange(10).repeat(1000) -label2 = np.tile(np.arange(1000), 10) - -key1 = np.tile(level1.take(label1), 10) -key2 = np.tile(level2.take(label2), 10) - -shuf = np.arange(100000) -random.shuffle(shuf) -try: - index2 = MultiIndex(levels=[level1, level2], labels=[label1, label2]) - index3 = MultiIndex(levels=[np.arange(10), np.arange(100), np.arange(100)], - labels=[np.arange(10).repeat(10000), - np.tile(np.arange(100).repeat(100), 10), - np.tile(np.tile(np.arange(100), 100), 10)]) - df_multi = DataFrame(np.random.randn(len(index2), 4), index=index2, - columns=['A', 'B', 'C', 'D']) -except: # pre-MultiIndex - pass - -try: - DataFrame = DataMatrix -except: - pass - -df = pd.DataFrame({'data1' : np.random.randn(100000), - 'data2' : np.random.randn(100000), - 'key1' : key1, - 'key2' : key2}) - - -df_key1 = pd.DataFrame(np.random.randn(len(level1), 4), index=level1, - columns=['A', 'B', 'C', 'D']) -df_key2 = pd.DataFrame(np.random.randn(len(level2), 4), index=level2, - columns=['A', 'B', 'C', 'D']) - -df_shuf = df.reindex(df.index[shuf]) -""" - -#---------------------------------------------------------------------- -# DataFrame joins on key - -join_dataframe_index_single_key_small = \ - Benchmark("df.join(df_key1, on='key1')", setup, - name='join_dataframe_index_single_key_small') - -join_dataframe_index_single_key_bigger = \ - Benchmark("df.join(df_key2, on='key2')", setup, - name='join_dataframe_index_single_key_bigger') - -join_dataframe_index_single_key_bigger_sort = \ - Benchmark("df_shuf.join(df_key2, on='key2', sort=True)", setup, - name='join_dataframe_index_single_key_bigger_sort', - start_date=datetime(2012, 2, 5)) - -join_dataframe_index_multi = \ - Benchmark("df.join(df_multi, on=['key1', 'key2'])", setup, - name='join_dataframe_index_multi', - start_date=datetime(2011, 10, 20)) - -#---------------------------------------------------------------------- -# Joins on integer keys -setup = common_setup + """ -df = pd.DataFrame({'key1': np.tile(np.arange(500).repeat(10), 2), - 'key2': np.tile(np.arange(250).repeat(10), 4), - 'value': np.random.randn(10000)}) -df2 = pd.DataFrame({'key1': np.arange(500), 'value2': randn(500)}) -df3 = df[:5000] -""" - 
- -join_dataframe_integer_key = Benchmark("merge(df, df2, on='key1')", setup, - start_date=datetime(2011, 10, 20)) -join_dataframe_integer_2key = Benchmark("merge(df, df3)", setup, - start_date=datetime(2011, 10, 20)) - -#---------------------------------------------------------------------- -# DataFrame joins on index - - -#---------------------------------------------------------------------- -# Merges -setup = common_setup + """ -N = 10000 - -indices = tm.makeStringIndex(N).values -indices2 = tm.makeStringIndex(N).values -key = np.tile(indices[:8000], 10) -key2 = np.tile(indices2[:8000], 10) - -left = pd.DataFrame({'key' : key, 'key2':key2, - 'value' : np.random.randn(80000)}) -right = pd.DataFrame({'key': indices[2000:], 'key2':indices2[2000:], - 'value2' : np.random.randn(8000)}) -""" - -merge_2intkey_nosort = Benchmark('merge(left, right, sort=False)', setup, - start_date=datetime(2011, 10, 20)) - -merge_2intkey_sort = Benchmark('merge(left, right, sort=True)', setup, - start_date=datetime(2011, 10, 20)) - -#---------------------------------------------------------------------- -# Appending DataFrames - -setup = common_setup + """ -df1 = pd.DataFrame(np.random.randn(10000, 4), columns=['A', 'B', 'C', 'D']) -df2 = df1.copy() -df2.index = np.arange(10000, 20000) -mdf1 = df1.copy() -mdf1['obj1'] = 'bar' -mdf1['obj2'] = 'bar' -mdf1['int1'] = 5 -try: - mdf1.consolidate(inplace=True) -except: - pass -mdf2 = mdf1.copy() -mdf2.index = df2.index -""" - -stmt = "df1.append(df2)" -append_frame_single_homogenous = \ - Benchmark(stmt, setup, name='append_frame_single_homogenous', - ncalls=500, repeat=1) - -stmt = "mdf1.append(mdf2)" -append_frame_single_mixed = Benchmark(stmt, setup, - name='append_frame_single_mixed', - ncalls=500, repeat=1) - -#---------------------------------------------------------------------- -# data alignment - -setup = common_setup + """n = 1000000 -# indices = tm.makeStringIndex(n) -def sample(values, k): - sampler = np.random.permutation(len(values)) - return values.take(sampler[:k]) -sz = 500000 -rng = np.arange(0, 10000000000000, 10000000) -stamps = np.datetime64(datetime.now()).view('i8') + rng -idx1 = np.sort(sample(stamps, sz)) -idx2 = np.sort(sample(stamps, sz)) -ts1 = Series(np.random.randn(sz), idx1) -ts2 = Series(np.random.randn(sz), idx2) -""" -stmt = "ts1 + ts2" -series_align_int64_index = \ - Benchmark(stmt, setup, - name="series_align_int64_index", - start_date=datetime(2010, 6, 1), logy=True) - -stmt = "ts1.align(ts2, join='left')" -series_align_left_monotonic = \ - Benchmark(stmt, setup, - name="series_align_left_monotonic", - start_date=datetime(2011, 12, 1), logy=True) - -#---------------------------------------------------------------------- -# Concat Series axis=1 - -setup = common_setup + """ -n = 1000 -indices = tm.makeStringIndex(1000) -s = Series(n, index=indices) -pieces = [s[i:-i] for i in range(1, 10)] -pieces = pieces * 50 -""" - -concat_series_axis1 = Benchmark('concat(pieces, axis=1)', setup, - start_date=datetime(2012, 2, 27)) - -setup = common_setup + """ -df = pd.DataFrame(randn(5, 4)) -""" - -concat_small_frames = Benchmark('concat([df] * 1000)', setup, - start_date=datetime(2012, 1, 1)) - - -#---------------------------------------------------------------------- -# Concat empty - -setup = common_setup + """ -df = pd.DataFrame(dict(A = range(10000)),index=date_range('20130101',periods=10000,freq='s')) -empty = pd.DataFrame() -""" - -concat_empty_frames1 = Benchmark('concat([df,empty])', setup, - start_date=datetime(2012, 1, 1)) 
-concat_empty_frames2 = Benchmark('concat([empty,df])', setup,
-                                 start_date=datetime(2012, 1, 1))
-
-
-#----------------------------------------------------------------------
-# Ordered merge
-
-setup = common_setup + """
-groups = tm.makeStringIndex(10).values
-
-left = pd.DataFrame({'group': groups.repeat(5000),
-                     'key' : np.tile(np.arange(0, 10000, 2), 10),
-                     'lvalue': np.random.randn(50000)})
-
-right = pd.DataFrame({'key' : np.arange(10000),
-                      'rvalue' : np.random.randn(10000)})
-
-"""
-
-stmt = "ordered_merge(left, right, on='key', left_by='group')"
-
-#----------------------------------------------------------------------
-# outer join of non-unique
-# GH 6329
-
-setup = common_setup + """
-date_index = date_range('01-Jan-2013', '23-Jan-2013', freq='T')
-daily_dates = date_index.to_period('D').to_timestamp('S','S')
-fracofday = date_index.view(np.ndarray) - daily_dates.view(np.ndarray)
-fracofday = fracofday.astype('timedelta64[ns]').astype(np.float64)/864e11
-fracofday = TimeSeries(fracofday, daily_dates)
-index = date_range(date_index.min().to_period('A').to_timestamp('D','S'),
-                   date_index.max().to_period('A').to_timestamp('D','E'),
-                   freq='D')
-temp = TimeSeries(1.0, index)
-"""
-
-join_non_unique_equal = Benchmark('fracofday * temp[fracofday.index]', setup,
-                                  start_date=datetime(2013, 1, 1))
-
-
-setup = common_setup + '''
-np.random.seed(2718281)
-n = 50000
-
-left = pd.DataFrame(np.random.randint(1, n/500, (n, 2)),
-                    columns=['jim', 'joe'])
-
-right = pd.DataFrame(np.random.randint(1, n/500, (n, 2)),
-                     columns=['jolie', 'jolia']).set_index('jolie')
-'''
-
-left_outer_join_index = Benchmark("left.join(right, on='jim')", setup,
-                                  name='left_outer_join_index')
-
-
-setup = common_setup + """
-low, high, n = -1 << 10, 1 << 10, 1 << 20
-left = pd.DataFrame(np.random.randint(low, high, (n, 7)),
-                    columns=list('ABCDEFG'))
-left['left'] = left.sum(axis=1)
-
-i = np.random.permutation(len(left))
-right = left.iloc[i].copy()
-right.columns = right.columns[:-1].tolist() + ['right']
-right.index = np.arange(len(right))
-right['right'] *= -1
-"""
-
-i8merge = Benchmark("merge(left, right, how='outer')", setup,
-                    name='i8merge')
diff --git a/vb_suite/make.py b/vb_suite/make.py
deleted file mode 100755
index 5a8a8215db9a4..0000000000000
--- a/vb_suite/make.py
+++ /dev/null
@@ -1,167 +0,0 @@
-#!/usr/bin/env python
-
-"""
-Python script for building documentation.
-
-To build the docs you must have all optional dependencies for pandas
-installed. See the installation instructions for a list of these.
-
-Note: currently latex builds do not work because of table formats that are not
-supported in the latex generation.
-
-Usage
------
-python make.py clean
-python make.py html
-"""
-
-import glob
-import os
-import shutil
-import sys
-import sphinx
-
-os.environ['PYTHONPATH'] = '..'
-
-SPHINX_BUILD = 'sphinxbuild'
-
-
-def upload():
-    'push a copy to the site'
-    os.system('cd build/html; rsync -avz . 
pandas@pandas.pydata.org' - ':/usr/share/nginx/pandas/pandas-docs/vbench/ -essh') - - -def clean(): - if os.path.exists('build'): - shutil.rmtree('build') - - if os.path.exists('source/generated'): - shutil.rmtree('source/generated') - - -def html(): - check_build() - if os.system('sphinx-build -P -b html -d build/doctrees ' - 'source build/html'): - raise SystemExit("Building HTML failed.") - - -def check_build(): - build_dirs = [ - 'build', 'build/doctrees', 'build/html', - 'build/plots', 'build/_static', - 'build/_templates'] - for d in build_dirs: - try: - os.mkdir(d) - except OSError: - pass - - -def all(): - clean() - html() - - -def auto_update(): - msg = '' - try: - clean() - html() - upload() - sendmail() - except (Exception, SystemExit), inst: - msg += str(inst) + '\n' - sendmail(msg) - - -def sendmail(err_msg=None): - from_name, to_name = _get_config() - - if err_msg is None: - msgstr = 'Daily vbench uploaded successfully' - subject = "VB: daily update successful" - else: - msgstr = err_msg - subject = "VB: daily update failed" - - import smtplib - from email.MIMEText import MIMEText - msg = MIMEText(msgstr) - msg['Subject'] = subject - msg['From'] = from_name - msg['To'] = to_name - - server_str, port, login, pwd = _get_credentials() - server = smtplib.SMTP(server_str, port) - server.ehlo() - server.starttls() - server.ehlo() - - server.login(login, pwd) - try: - server.sendmail(from_name, to_name, msg.as_string()) - finally: - server.close() - - -def _get_dir(subdir=None): - import getpass - USERNAME = getpass.getuser() - if sys.platform == 'darwin': - HOME = '/Users/%s' % USERNAME - else: - HOME = '/home/%s' % USERNAME - - if subdir is None: - subdir = '/code/scripts' - conf_dir = '%s%s' % (HOME, subdir) - return conf_dir - - -def _get_credentials(): - tmp_dir = _get_dir() - cred = '%s/credentials' % tmp_dir - with open(cred, 'r') as fh: - server, port, un, domain = fh.read().split(',') - port = int(port) - login = un + '@' + domain + '.com' - - import base64 - with open('%s/cron_email_pwd' % tmp_dir, 'r') as fh: - pwd = base64.b64decode(fh.read()) - - return server, port, login, pwd - - -def _get_config(): - tmp_dir = _get_dir() - with open('%s/addresses' % tmp_dir, 'r') as fh: - from_name, to_name = fh.read().split(',') - return from_name, to_name - -funcd = { - 'html': html, - 'clean': clean, - 'upload': upload, - 'auto_update': auto_update, - 'all': all, -} - -small_docs = False - -# current_dir = os.getcwd() -# os.chdir(os.path.dirname(os.path.join(current_dir, __file__))) - -if len(sys.argv) > 1: - for arg in sys.argv[1:]: - func = funcd.get(arg) - if func is None: - raise SystemExit('Do not know how to handle %s; valid args are %s' % ( - arg, funcd.keys())) - func() -else: - small_docs = False - all() -# os.chdir(current_dir) diff --git a/vb_suite/measure_memory_consumption.py b/vb_suite/measure_memory_consumption.py deleted file mode 100755 index bb73cf5da4302..0000000000000 --- a/vb_suite/measure_memory_consumption.py +++ /dev/null @@ -1,55 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- - -from __future__ import print_function - -"""Short one-line summary - -long summary -""" - - -def main(): - import shutil - import tempfile - import warnings - - from pandas import Series - - from vbench.api import BenchmarkRunner - from suite import (REPO_PATH, BUILD, DB_PATH, PREPARE, - dependencies, benchmarks) - - from memory_profiler import memory_usage - - warnings.filterwarnings('ignore', category=FutureWarning) - - try: - TMP_DIR = tempfile.mkdtemp() - runner = 
BenchmarkRunner( - benchmarks, REPO_PATH, REPO_PATH, BUILD, DB_PATH, - TMP_DIR, PREPARE, always_clean=True, - # run_option='eod', start_date=START_DATE, - module_dependencies=dependencies) - results = {} - for b in runner.benchmarks: - k = b.name - try: - vs = memory_usage((b.run,)) - v = max(vs) - # print(k, v) - results[k] = v - except Exception as e: - print("Exception caught in %s\n" % k) - print(str(e)) - - s = Series(results) - s.sort() - print((s)) - - finally: - shutil.rmtree(TMP_DIR) - - -if __name__ == "__main__": - main() diff --git a/vb_suite/miscellaneous.py b/vb_suite/miscellaneous.py deleted file mode 100644 index da2c736e79ea7..0000000000000 --- a/vb_suite/miscellaneous.py +++ /dev/null @@ -1,32 +0,0 @@ -from vbench.benchmark import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -""" - -#---------------------------------------------------------------------- -# cache_readonly - -setup = common_setup + """ -from pandas.util.decorators import cache_readonly - -class Foo: - - @cache_readonly - def prop(self): - return 5 -obj = Foo() -""" -misc_cache_readonly = Benchmark("obj.prop", setup, name="misc_cache_readonly", - ncalls=2000000) - -#---------------------------------------------------------------------- -# match - -setup = common_setup + """ -uniques = tm.makeStringIndex(1000).values -all = uniques.repeat(10) -""" - -match_strings = Benchmark("match(all, uniques)", setup, - start_date=datetime(2012, 5, 12)) diff --git a/vb_suite/packers.py b/vb_suite/packers.py deleted file mode 100644 index 69ec10822b392..0000000000000 --- a/vb_suite/packers.py +++ /dev/null @@ -1,252 +0,0 @@ -from vbench.api import Benchmark -from datetime import datetime - -start_date = datetime(2013, 5, 1) - -common_setup = """from .pandas_vb_common import * -import os -import pandas as pd -from pandas.core import common as com -from pandas.compat import BytesIO -from random import randrange - -f = '__test__.msg' -def remove(f): - try: - os.remove(f) - except: - pass - -N=100000 -C=5 -index = date_range('20000101',periods=N,freq='H') -df = DataFrame(dict([ ("float{0}".format(i),randn(N)) for i in range(C) ]), - index=index) - -N=100000 -C=5 -index = date_range('20000101',periods=N,freq='H') -df2 = DataFrame(dict([ ("float{0}".format(i),randn(N)) for i in range(C) ]), - index=index) -df2['object'] = ['%08x'%randrange(16**8) for _ in range(N)] -remove(f) -""" - -#---------------------------------------------------------------------- -# msgpack - -setup = common_setup + """ -df2.to_msgpack(f) -""" - -packers_read_pack = Benchmark("pd.read_msgpack(f)", setup, start_date=start_date) - -setup = common_setup + """ -""" - -packers_write_pack = Benchmark("df2.to_msgpack(f)", setup, cleanup="remove(f)", start_date=start_date) - -#---------------------------------------------------------------------- -# pickle - -setup = common_setup + """ -df2.to_pickle(f) -""" - -packers_read_pickle = Benchmark("pd.read_pickle(f)", setup, start_date=start_date) - -setup = common_setup + """ -""" - -packers_write_pickle = Benchmark("df2.to_pickle(f)", setup, cleanup="remove(f)", start_date=start_date) - -#---------------------------------------------------------------------- -# csv - -setup = common_setup + """ -df.to_csv(f) -""" - -packers_read_csv = Benchmark("pd.read_csv(f)", setup, start_date=start_date) - -setup = common_setup + """ -""" - -packers_write_csv = Benchmark("df.to_csv(f)", setup, cleanup="remove(f)", start_date=start_date) - 
-#---------------------------------------------------------------------- -# hdf store - -setup = common_setup + """ -df2.to_hdf(f,'df') -""" - -packers_read_hdf_store = Benchmark("pd.read_hdf(f,'df')", setup, start_date=start_date) - -setup = common_setup + """ -""" - -packers_write_hdf_store = Benchmark("df2.to_hdf(f,'df')", setup, cleanup="remove(f)", start_date=start_date) - -#---------------------------------------------------------------------- -# hdf table - -setup = common_setup + """ -df2.to_hdf(f,'df',format='table') -""" - -packers_read_hdf_table = Benchmark("pd.read_hdf(f,'df')", setup, start_date=start_date) - -setup = common_setup + """ -""" - -packers_write_hdf_table = Benchmark("df2.to_hdf(f,'df',table=True)", setup, cleanup="remove(f)", start_date=start_date) - -#---------------------------------------------------------------------- -# sql - -setup = common_setup + """ -import sqlite3 -from sqlalchemy import create_engine -engine = create_engine('sqlite:///:memory:') - -df2.to_sql('table', engine, if_exists='replace') -""" - -packers_read_sql= Benchmark("pd.read_sql_table('table', engine)", setup, start_date=start_date) - -setup = common_setup + """ -import sqlite3 -from sqlalchemy import create_engine -engine = create_engine('sqlite:///:memory:') -""" - -packers_write_sql = Benchmark("df2.to_sql('table', engine, if_exists='replace')", setup, start_date=start_date) - -#---------------------------------------------------------------------- -# json - -setup_int_index = """ -import numpy as np -df.index = np.arange(N) -""" - -setup = common_setup + """ -df.to_json(f,orient='split') -""" -packers_read_json_date_index = Benchmark("pd.read_json(f, orient='split')", setup, start_date=start_date) -setup = setup + setup_int_index -packers_read_json = Benchmark("pd.read_json(f, orient='split')", setup, start_date=start_date) - -setup = common_setup + """ -""" -packers_write_json_date_index = Benchmark("df.to_json(f,orient='split')", setup, cleanup="remove(f)", start_date=start_date) - -setup = setup + setup_int_index -packers_write_json = Benchmark("df.to_json(f,orient='split')", setup, cleanup="remove(f)", start_date=start_date) -packers_write_json_T = Benchmark("df.to_json(f,orient='columns')", setup, cleanup="remove(f)", start_date=start_date) - -setup = common_setup + """ -from numpy.random import randint -from collections import OrderedDict - -cols = [ - lambda i: ("{0}_timedelta".format(i), [pd.Timedelta('%d seconds' % randrange(1e6)) for _ in range(N)]), - lambda i: ("{0}_int".format(i), randint(1e8, size=N)), - lambda i: ("{0}_timestamp".format(i), [pd.Timestamp( 1418842918083256000 + randrange(1e9, 1e18, 200)) for _ in range(N)]) - ] -df_mixed = DataFrame(OrderedDict([cols[i % len(cols)](i) for i in range(C)]), - index=index) -""" -packers_write_json_mixed_delta_int_tstamp = Benchmark("df_mixed.to_json(f,orient='split')", setup, cleanup="remove(f)", start_date=start_date) - -setup = common_setup + """ -from numpy.random import randint -from collections import OrderedDict -cols = [ - lambda i: ("{0}_float".format(i), randn(N)), - lambda i: ("{0}_int".format(i), randint(1e8, size=N)) - ] -df_mixed = DataFrame(OrderedDict([cols[i % len(cols)](i) for i in range(C)]), - index=index) -""" -packers_write_json_mixed_float_int = Benchmark("df_mixed.to_json(f,orient='index')", setup, cleanup="remove(f)", start_date=start_date) -packers_write_json_mixed_float_int_T = Benchmark("df_mixed.to_json(f,orient='columns')", setup, cleanup="remove(f)", start_date=start_date) - -setup = 
common_setup + """ -from numpy.random import randint -from collections import OrderedDict -cols = [ - lambda i: ("{0}_float".format(i), randn(N)), - lambda i: ("{0}_int".format(i), randint(1e8, size=N)), - lambda i: ("{0}_str".format(i), ['%08x'%randrange(16**8) for _ in range(N)]) - ] -df_mixed = DataFrame(OrderedDict([cols[i % len(cols)](i) for i in range(C)]), - index=index) -""" -packers_write_json_mixed_float_int_str = Benchmark("df_mixed.to_json(f,orient='split')", setup, cleanup="remove(f)", start_date=start_date) - -#---------------------------------------------------------------------- -# stata - -setup = common_setup + """ -df.to_stata(f, {'index': 'tc'}) -""" -packers_read_stata = Benchmark("pd.read_stata(f)", setup, start_date=start_date) - -packers_write_stata = Benchmark("df.to_stata(f, {'index': 'tc'})", setup, cleanup="remove(f)", start_date=start_date) - -setup = common_setup + """ -df['int8_'] = [randint(np.iinfo(np.int8).min, np.iinfo(np.int8).max - 27) for _ in range(N)] -df['int16_'] = [randint(np.iinfo(np.int16).min, np.iinfo(np.int16).max - 27) for _ in range(N)] -df['int32_'] = [randint(np.iinfo(np.int32).min, np.iinfo(np.int32).max - 27) for _ in range(N)] -df['float32_'] = np.array(randn(N), dtype=np.float32) -df.to_stata(f, {'index': 'tc'}) -""" - -packers_read_stata_with_validation = Benchmark("pd.read_stata(f)", setup, start_date=start_date) - -packers_write_stata_with_validation = Benchmark("df.to_stata(f, {'index': 'tc'})", setup, cleanup="remove(f)", start_date=start_date) - -#---------------------------------------------------------------------- -# Excel - alternative writers -setup = common_setup + """ -bio = BytesIO() -""" - -excel_writer_bench = """ -bio.seek(0) -writer = pd.io.excel.ExcelWriter(bio, engine='{engine}') -df[:2000].to_excel(writer) -writer.save() -""" - -benchmark_xlsxwriter = excel_writer_bench.format(engine='xlsxwriter') - -packers_write_excel_xlsxwriter = Benchmark(benchmark_xlsxwriter, setup) - -benchmark_openpyxl = excel_writer_bench.format(engine='openpyxl') - -packers_write_excel_openpyxl = Benchmark(benchmark_openpyxl, setup) - -benchmark_xlwt = excel_writer_bench.format(engine='xlwt') - -packers_write_excel_xlwt = Benchmark(benchmark_xlwt, setup) - - -#---------------------------------------------------------------------- -# Excel - reader - -setup = common_setup + """ -bio = BytesIO() -writer = pd.io.excel.ExcelWriter(bio, engine='xlsxwriter') -df[:2000].to_excel(writer) -writer.save() -""" - -benchmark_read_excel=""" -bio.seek(0) -pd.read_excel(bio) -""" - -packers_read_excel = Benchmark(benchmark_read_excel, setup) diff --git a/vb_suite/pandas_vb_common.py b/vb_suite/pandas_vb_common.py deleted file mode 100644 index a1326d63a112a..0000000000000 --- a/vb_suite/pandas_vb_common.py +++ /dev/null @@ -1,30 +0,0 @@ -from pandas import * -import pandas as pd -from datetime import timedelta -from numpy.random import randn -from numpy.random import randint -from numpy.random import permutation -import pandas.util.testing as tm -import random -import numpy as np -try: - from pandas.compat import range -except ImportError: - pass - -np.random.seed(1234) -try: - import pandas._tseries as lib -except: - import pandas.lib as lib - -try: - Panel = WidePanel -except Exception: - pass - -# didn't add to namespace until later -try: - from pandas.core.index import MultiIndex -except ImportError: - pass diff --git a/vb_suite/panel_ctor.py b/vb_suite/panel_ctor.py deleted file mode 100644 index 9f497e7357a61..0000000000000 --- 
a/vb_suite/panel_ctor.py +++ /dev/null @@ -1,76 +0,0 @@ -from vbench.benchmark import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -""" - -#---------------------------------------------------------------------- -# Panel.from_dict homogenization time - -START_DATE = datetime(2011, 6, 1) - -setup_same_index = common_setup + """ -# create 100 dataframes with the same index -dr = np.asarray(DatetimeIndex(start=datetime(1990,1,1), end=datetime(2012,1,1), - freq=datetools.Day(1))) -data_frames = {} -for x in range(100): - df = DataFrame({"a": [0]*len(dr), "b": [1]*len(dr), - "c": [2]*len(dr)}, index=dr) - data_frames[x] = df -""" - -panel_from_dict_same_index = \ - Benchmark("Panel.from_dict(data_frames)", - setup_same_index, name='panel_from_dict_same_index', - start_date=START_DATE, repeat=1, logy=True) - -setup_equiv_indexes = common_setup + """ -data_frames = {} -for x in range(100): - dr = np.asarray(DatetimeIndex(start=datetime(1990,1,1), end=datetime(2012,1,1), - freq=datetools.Day(1))) - df = DataFrame({"a": [0]*len(dr), "b": [1]*len(dr), - "c": [2]*len(dr)}, index=dr) - data_frames[x] = df -""" - -panel_from_dict_equiv_indexes = \ - Benchmark("Panel.from_dict(data_frames)", - setup_equiv_indexes, name='panel_from_dict_equiv_indexes', - start_date=START_DATE, repeat=1, logy=True) - -setup_all_different_indexes = common_setup + """ -data_frames = {} -start = datetime(1990,1,1) -end = datetime(2012,1,1) -for x in range(100): - end += timedelta(days=1) - dr = np.asarray(date_range(start, end)) - df = DataFrame({"a": [0]*len(dr), "b": [1]*len(dr), - "c": [2]*len(dr)}, index=dr) - data_frames[x] = df -""" -panel_from_dict_all_different_indexes = \ - Benchmark("Panel.from_dict(data_frames)", - setup_all_different_indexes, - name='panel_from_dict_all_different_indexes', - start_date=START_DATE, repeat=1, logy=True) - -setup_two_different_indexes = common_setup + """ -data_frames = {} -start = datetime(1990,1,1) -end = datetime(2012,1,1) -for x in range(100): - if x == 50: - end += timedelta(days=1) - dr = np.asarray(date_range(start, end)) - df = DataFrame({"a": [0]*len(dr), "b": [1]*len(dr), - "c": [2]*len(dr)}, index=dr) - data_frames[x] = df -""" -panel_from_dict_two_different_indexes = \ - Benchmark("Panel.from_dict(data_frames)", - setup_two_different_indexes, - name='panel_from_dict_two_different_indexes', - start_date=START_DATE, repeat=1, logy=True) diff --git a/vb_suite/panel_methods.py b/vb_suite/panel_methods.py deleted file mode 100644 index 28586422a66e3..0000000000000 --- a/vb_suite/panel_methods.py +++ /dev/null @@ -1,28 +0,0 @@ -from vbench.api import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -""" - -#---------------------------------------------------------------------- -# shift - -setup = common_setup + """ -index = date_range(start="2000", freq="D", periods=1000) -panel = Panel(np.random.randn(100, len(index), 1000)) -""" - -panel_shift = Benchmark('panel.shift(1)', setup, - start_date=datetime(2012, 1, 12)) - -panel_shift_minor = Benchmark('panel.shift(1, axis="minor")', setup, - start_date=datetime(2012, 1, 12)) - -panel_pct_change_major = Benchmark('panel.pct_change(1, axis="major")', setup, - start_date=datetime(2014, 4, 19)) - -panel_pct_change_minor = Benchmark('panel.pct_change(1, axis="minor")', setup, - start_date=datetime(2014, 4, 19)) - -panel_pct_change_items = Benchmark('panel.pct_change(1, axis="items")', setup, - start_date=datetime(2014, 4, 19)) diff --git 
a/vb_suite/parser_vb.py b/vb_suite/parser_vb.py deleted file mode 100644 index bb9ccbdb5e854..0000000000000 --- a/vb_suite/parser_vb.py +++ /dev/null @@ -1,112 +0,0 @@ -from vbench.api import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -from pandas import read_csv, read_table -""" - -setup = common_setup + """ -import os -N = 10000 -K = 8 -df = DataFrame(np.random.randn(N, K) * np.random.randint(100, 10000, (N, K))) -df.to_csv('test.csv', sep='|') -""" - -read_csv_vb = Benchmark("read_csv('test.csv', sep='|')", setup, - cleanup="os.remove('test.csv')", - start_date=datetime(2012, 5, 7)) - - -setup = common_setup + """ -import os -N = 10000 -K = 8 -format = lambda x: '{:,}'.format(x) -df = DataFrame(np.random.randn(N, K) * np.random.randint(100, 10000, (N, K))) -df = df.applymap(format) -df.to_csv('test.csv', sep='|') -""" - -read_csv_thou_vb = Benchmark("read_csv('test.csv', sep='|', thousands=',')", - setup, - cleanup="os.remove('test.csv')", - start_date=datetime(2012, 5, 7)) - -setup = common_setup + """ -data = ['A,B,C'] -data = data + ['1,2,3 # comment'] * 100000 -data = '\\n'.join(data) -""" - -stmt = "read_csv(StringIO(data), comment='#')" -read_csv_comment2 = Benchmark(stmt, setup, - start_date=datetime(2011, 11, 1)) - -setup = common_setup + """ -try: - from cStringIO import StringIO -except ImportError: - from io import StringIO - -import os -N = 10000 -K = 8 -data = '''\ -KORD,19990127, 19:00:00, 18:56:00, 0.8100, 2.8100, 7.2000, 0.0000, 280.0000 -KORD,19990127, 20:00:00, 19:56:00, 0.0100, 2.2100, 7.2000, 0.0000, 260.0000 -KORD,19990127, 21:00:00, 20:56:00, -0.5900, 2.2100, 5.7000, 0.0000, 280.0000 -KORD,19990127, 21:00:00, 21:18:00, -0.9900, 2.0100, 3.6000, 0.0000, 270.0000 -KORD,19990127, 22:00:00, 21:56:00, -0.5900, 1.7100, 5.1000, 0.0000, 290.0000 -''' -data = data * 200 -""" -cmd = ("read_table(StringIO(data), sep=',', header=None, " - "parse_dates=[[1,2], [1,3]])") -sdate = datetime(2012, 5, 7) -read_table_multiple_date = Benchmark(cmd, setup, start_date=sdate) - -setup = common_setup + """ -try: - from cStringIO import StringIO -except ImportError: - from io import StringIO - -import os -N = 10000 -K = 8 -data = '''\ -KORD,19990127 19:00:00, 18:56:00, 0.8100, 2.8100, 7.2000, 0.0000, 280.0000 -KORD,19990127 20:00:00, 19:56:00, 0.0100, 2.2100, 7.2000, 0.0000, 260.0000 -KORD,19990127 21:00:00, 20:56:00, -0.5900, 2.2100, 5.7000, 0.0000, 280.0000 -KORD,19990127 21:00:00, 21:18:00, -0.9900, 2.0100, 3.6000, 0.0000, 270.0000 -KORD,19990127 22:00:00, 21:56:00, -0.5900, 1.7100, 5.1000, 0.0000, 290.0000 -''' -data = data * 200 -""" -cmd = "read_table(StringIO(data), sep=',', header=None, parse_dates=[1])" -sdate = datetime(2012, 5, 7) -read_table_multiple_date_baseline = Benchmark(cmd, setup, start_date=sdate) - -setup = common_setup + """ -try: - from cStringIO import StringIO -except ImportError: - from io import StringIO - -data = '''\ -0.1213700904466425978256438611,0.0525708283766902484401839501,0.4174092731488769913994474336 -0.4096341697147408700274695547,0.1587830198973579909349496119,0.1292545832485494372576795285 -0.8323255650024565799327547210,0.9694902427379478160318626578,0.6295047811546814475747169126 -0.4679375305798131323697930383,0.2963942381834381301075609371,0.5268936082160610157032465394 -0.6685382761849776311890991564,0.6721207066140679753374342908,0.6519975277021627935170045020 -''' -data = data * 200 -""" -cmd = "read_csv(StringIO(data), sep=',', header=None, float_precision=None)" -sdate = datetime(2014, 8, 
20)
-read_csv_default_converter = Benchmark(cmd, setup, start_date=sdate)
-cmd = "read_csv(StringIO(data), sep=',', header=None, float_precision='high')"
-read_csv_precise_converter = Benchmark(cmd, setup, start_date=sdate)
-cmd = "read_csv(StringIO(data), sep=',', header=None, float_precision='round_trip')"
-read_csv_roundtrip_converter = Benchmark(cmd, setup, start_date=sdate)
diff --git a/vb_suite/perf_HEAD.py b/vb_suite/perf_HEAD.py
deleted file mode 100755
index 143d943b9eadf..0000000000000
--- a/vb_suite/perf_HEAD.py
+++ /dev/null
@@ -1,243 +0,0 @@
-#!/usr/bin/env python
-# -*- coding: utf-8 -*-
-
-from __future__ import print_function
-
-"""Run all the vbenches in `suite`, and post the results as a json blob to gist
-
-"""
-
-import urllib2
-from contextlib import closing
-from urllib2 import urlopen
-import json
-
-import pandas as pd
-
-WEB_TIMEOUT = 10
-
-
-def get_travis_data():
-    """figure out what worker we're running on, and the number of jobs it's running
-    """
-    import os
-    jobid = os.environ.get("TRAVIS_JOB_ID")
-    if not jobid:
-        return None, None
-
-    with closing(urlopen("https://api.travis-ci.org/workers/")) as resp:
-        workers = json.loads(resp.read())
-
-    host = njobs = None
-    for item in workers:
-        host = item.get("host")
-        id = ((item.get("payload") or {}).get("job") or {}).get("id")
-        if id and str(id) == str(jobid):
-            break
-    if host:
-        njobs = len(
-            [x for x in workers if host in x['host'] and x['payload']])
-
-    return host, njobs
-
-
-def get_utcdatetime():
-    try:
-        from datetime import datetime
-        return datetime.utcnow().isoformat(" ")
-    except:
-        pass
-
-
-def dump_as_gist(data, desc="The Commit", njobs=None):
-    host, njobs2 = get_travis_data()[:2]
-
-    if njobs:  # be slightly more reliable
-        njobs = max(njobs, njobs2)
-
-    content = dict(version="0.1.1",
-                   timings=data,
-                   datetime=get_utcdatetime(),  # added in 0.1.1
-                   hostname=host,  # added in 0.1.1
-                   njobs=njobs  # added in 0.1.1, a measure of load on the travis box
-                   )
-
-    payload = dict(description=desc,
-                   public=True,
-                   files={'results.json': dict(content=json.dumps(content))})
-    try:
-        with closing(urlopen("https://api.github.com/gists",
-                             json.dumps(payload), timeout=WEB_TIMEOUT)) as r:
-            if 200 <= r.getcode() < 300:
-                print("\n\n" + "-" * 80)
-
-                gist = json.loads(r.read())
-                file_raw_url = gist['files'].items()[0][1]['raw_url']
-                print("[vbench-gist-raw_url] %s" % file_raw_url)
-                print("[vbench-html-url] %s" % gist['html_url'])
-                print("[vbench-api-url] %s" % gist['url'])
-
-                print("-" * 80 + "\n\n")
-            else:
-                print("api.github.com returned status %d" % r.getcode())
-    except:
-        print("Error occurred while dumping to gist")
-
-
-def main():
-    import warnings
-    from suite import benchmarks
-
-    exit_code = 0
-    warnings.filterwarnings('ignore', category=FutureWarning)
-
-    host, njobs = get_travis_data()[:2]
-    results = []
-    for b in benchmarks:
-        try:
-            d = b.run()
-            d.update(dict(name=b.name))
-            results.append(d)
-            msg = "{name:<40}: {timing:> 10.4f} [ms]"
-            print(msg.format(name=results[-1]['name'],
-                             timing=results[-1]['timing']))
-
-        except Exception as e:
-            exit_code = 1
-            if (type(e) == KeyboardInterrupt or
-                    'KeyboardInterrupt' in str(d)):
-                raise KeyboardInterrupt()
-
-            msg = "{name:<40}: ERROR:\n<-------"
-            print(msg.format(name=b.name))
-            if isinstance(d, dict):
-                if d['succeeded']:
-                    print("\nException:\n%s\n" % str(e))
-                else:
-                    for k, v in sorted(d.iteritems()):
-                        print("{k}: {v}".format(k=k, v=v))
-
-            print("------->\n")
-
-    dump_as_gist(results, "testing", njobs=njobs)
-
-    return exit_code
-
-
-if __name__ == "__main__":
-    import sys
-    sys.exit(main())
-
-#####################################################
-# functions for retrieving and processing the results
-
-
-def get_vbench_log(build_url):
-    with closing(urllib2.urlopen(build_url)) as r:
-        if not (200 <= r.getcode() < 300):
-            return
-
-        s = json.loads(r.read())
-        s = [x for x in s['matrix'] if "VBENCH" in ((x.get('config', {})
-                                                     or {}).get('env', {}) or {})]
-        # s=[x for x in s['matrix']]
-        if not s:
-            return
-        id = s[0]['id']  # should be just one for now
-        with closing(urllib2.urlopen("https://api.travis-ci.org/jobs/%s" % id)) as r2:
-            if not 200 <= r.getcode() < 300:
-                return
-            s2 = json.loads(r2.read())
-            return s2.get('log')
-
-
-def get_results_raw_url(build):
-    "Takes a Travis build number, retrieves the build log and extracts the gist url"
-    import re
-    log = get_vbench_log("https://api.travis-ci.org/builds/%s" % build)
-    if not log:
-        return
-    l = [x.strip(
-    ) for x in log.split("\n") if re.match(".vbench-gist-raw_url", x)]
-    if l:
-        s = l[0]
-        m = re.search("(https://[^\s]+)", s)
-        if m:
-            return m.group(0)
-
-
-def convert_json_to_df(results_url):
-    """retrieve json results file from url and return df
-
-    df contains timings for all successful vbenchmarks
-    """
-
-    with closing(urlopen(results_url)) as resp:
-        res = json.loads(resp.read())
-    timings = res.get("timings")
-    if not timings:
-        return
-    res = [x for x in timings if x.get('succeeded')]
-    df = pd.DataFrame(res)
-    df = df.set_index("name")
-    return df
-
-
-def get_build_results(build):
-    "Returns a df with the results of the VBENCH job associated with the travis build"
-    r_url = get_results_raw_url(build)
-    if not r_url:
-        return
-
-    return convert_json_to_df(r_url)
-
-
-def get_all_results(repo_id=53976):  # travis pandas-dev/pandas id
-    """Fetches the VBENCH results for all travis builds, and returns a list of result df
-
-    unsuccessful individual vbenches are dropped.
- """ - from collections import OrderedDict - - def get_results_from_builds(builds): - dfs = OrderedDict() - for build in builds: - build_id = build['id'] - build_number = build['number'] - print(build_number) - res = get_build_results(build_id) - if res is not None: - dfs[build_number] = res - return dfs - - base_url = 'https://api.travis-ci.org/builds?url=%2Fbuilds&repository_id={repo_id}' - url = base_url.format(repo_id=repo_id) - url_after = url + '&after_number={after}' - dfs = OrderedDict() - - while True: - with closing(urlopen(url)) as r: - if not (200 <= r.getcode() < 300): - break - builds = json.loads(r.read()) - res = get_results_from_builds(builds) - if not res: - break - last_build_number = min(res.keys()) - dfs.update(res) - url = url_after.format(after=last_build_number) - - return dfs - - -def get_all_results_joined(repo_id=53976): - def mk_unique(df): - for dupe in df.index.get_duplicates(): - df = df.ix[df.index != dupe] - return df - dfs = get_all_results(repo_id) - for k in dfs: - dfs[k] = mk_unique(dfs[k]) - ss = [pd.Series(v.timing, name=k) for k, v in dfs.iteritems()] - results = pd.concat(reversed(ss), 1) - return results diff --git a/vb_suite/plotting.py b/vb_suite/plotting.py deleted file mode 100644 index 79e81e9eea8f4..0000000000000 --- a/vb_suite/plotting.py +++ /dev/null @@ -1,25 +0,0 @@ -from vbench.benchmark import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * - -try: - from pandas import date_range -except ImportError: - def date_range(start=None, end=None, periods=None, freq=None): - return DatetimeIndex(start, end, periods=periods, offset=freq) - -""" - -#----------------------------------------------------------------------------- -# Timeseries plotting - -setup = common_setup + """ -N = 2000 -M = 5 -df = DataFrame(np.random.randn(N,M), index=date_range('1/1/1975', periods=N)) -""" - -plot_timeseries_period = Benchmark("df.plot()", setup=setup, - name='plot_timeseries_period') - diff --git a/vb_suite/reindex.py b/vb_suite/reindex.py deleted file mode 100644 index 443eb43835745..0000000000000 --- a/vb_suite/reindex.py +++ /dev/null @@ -1,225 +0,0 @@ -from vbench.benchmark import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -""" - -#---------------------------------------------------------------------- -# DataFrame reindex columns - -setup = common_setup + """ -df = DataFrame(index=range(10000), data=np.random.rand(10000,30), - columns=range(30)) -""" -statement = "df.reindex(columns=df.columns[1:5])" - -frame_reindex_columns = Benchmark(statement, setup) - -#---------------------------------------------------------------------- - -setup = common_setup + """ -rng = DatetimeIndex(start='1/1/1970', periods=10000, freq=datetools.Minute()) -df = DataFrame(np.random.rand(10000, 10), index=rng, - columns=range(10)) -df['foo'] = 'bar' -rng2 = Index(rng[::2]) -""" -statement = "df.reindex(rng2)" -dataframe_reindex = Benchmark(statement, setup) - -#---------------------------------------------------------------------- -# multiindex reindexing - -setup = common_setup + """ -N = 1000 -K = 20 - -level1 = tm.makeStringIndex(N).values.repeat(K) -level2 = np.tile(tm.makeStringIndex(K).values, N) -index = MultiIndex.from_arrays([level1, level2]) - -s1 = Series(np.random.randn(N * K), index=index) -s2 = s1[::2] -""" -statement = "s1.reindex(s2.index)" -reindex_multi = Benchmark(statement, setup, - name='reindex_multiindex', - start_date=datetime(2011, 9, 1)) - 
-#---------------------------------------------------------------------- -# Pad / backfill - -def pad(source_series, target_index): - try: - source_series.reindex(target_index, method='pad') - except: - source_series.reindex(target_index, fillMethod='pad') - -def backfill(source_series, target_index): - try: - source_series.reindex(target_index, method='backfill') - except: - source_series.reindex(target_index, fillMethod='backfill') - -setup = common_setup + """ -rng = date_range('1/1/2000', periods=100000, freq=datetools.Minute()) - -ts = Series(np.random.randn(len(rng)), index=rng) -ts2 = ts[::2] -ts3 = ts2.reindex(ts.index) -ts4 = ts3.astype('float32') - -def pad(source_series, target_index): - try: - source_series.reindex(target_index, method='pad') - except: - source_series.reindex(target_index, fillMethod='pad') -def backfill(source_series, target_index): - try: - source_series.reindex(target_index, method='backfill') - except: - source_series.reindex(target_index, fillMethod='backfill') -""" - -statement = "pad(ts2, ts.index)" -reindex_daterange_pad = Benchmark(statement, setup, - name="reindex_daterange_pad") - -statement = "backfill(ts2, ts.index)" -reindex_daterange_backfill = Benchmark(statement, setup, - name="reindex_daterange_backfill") - -reindex_fillna_pad = Benchmark("ts3.fillna(method='pad')", setup, - name="reindex_fillna_pad", - start_date=datetime(2011, 3, 1)) - -reindex_fillna_pad_float32 = Benchmark("ts4.fillna(method='pad')", setup, - name="reindex_fillna_pad_float32", - start_date=datetime(2013, 1, 1)) - -reindex_fillna_backfill = Benchmark("ts3.fillna(method='backfill')", setup, - name="reindex_fillna_backfill", - start_date=datetime(2011, 3, 1)) -reindex_fillna_backfill_float32 = Benchmark("ts4.fillna(method='backfill')", setup, - name="reindex_fillna_backfill_float32", - start_date=datetime(2013, 1, 1)) - -#---------------------------------------------------------------------- -# align on level - -setup = common_setup + """ -index = MultiIndex(levels=[np.arange(10), np.arange(100), np.arange(100)], - labels=[np.arange(10).repeat(10000), - np.tile(np.arange(100).repeat(100), 10), - np.tile(np.tile(np.arange(100), 100), 10)]) -random.shuffle(index.values) -df = DataFrame(np.random.randn(len(index), 4), index=index) -df_level = DataFrame(np.random.randn(100, 4), index=index.levels[1]) -""" - -reindex_frame_level_align = \ - Benchmark("df.align(df_level, level=1, copy=False)", setup, - name='reindex_frame_level_align', - start_date=datetime(2011, 12, 27)) - -reindex_frame_level_reindex = \ - Benchmark("df_level.reindex(df.index, level=1)", setup, - name='reindex_frame_level_reindex', - start_date=datetime(2011, 12, 27)) - - -#---------------------------------------------------------------------- -# sort_index, drop_duplicates - -# pathological, but realistic -setup = common_setup + """ -N = 10000 -K = 10 - -key1 = tm.makeStringIndex(N).values.repeat(K) -key2 = tm.makeStringIndex(N).values.repeat(K) - -df = DataFrame({'key1' : key1, 'key2' : key2, - 'value' : np.random.randn(N * K)}) -col_array_list = list(df.values.T) -""" -statement = "df.sort_index(by=['key1', 'key2'])" -frame_sort_index_by_columns = Benchmark(statement, setup, - start_date=datetime(2011, 11, 1)) - -# drop_duplicates - -statement = "df.drop_duplicates(['key1', 'key2'])" -frame_drop_duplicates = Benchmark(statement, setup, - start_date=datetime(2011, 11, 15)) - -statement = "df.drop_duplicates(['key1', 'key2'], inplace=True)" -frame_drop_dup_inplace = Benchmark(statement, setup, - 
start_date=datetime(2012, 5, 16)) - -lib_fast_zip = Benchmark('lib.fast_zip(col_array_list)', setup, - name='lib_fast_zip', - start_date=datetime(2012, 1, 1)) - -setup = setup + """ -df.ix[:10000, :] = np.nan -""" -statement2 = "df.drop_duplicates(['key1', 'key2'])" -frame_drop_duplicates_na = Benchmark(statement2, setup, - start_date=datetime(2012, 5, 15)) - -lib_fast_zip_fillna = Benchmark('lib.fast_zip_fillna(col_array_list)', setup, - start_date=datetime(2012, 5, 15)) - -statement2 = "df.drop_duplicates(['key1', 'key2'], inplace=True)" -frame_drop_dup_na_inplace = Benchmark(statement2, setup, - start_date=datetime(2012, 5, 16)) - -setup = common_setup + """ -s = Series(np.random.randint(0, 1000, size=10000)) -s2 = Series(np.tile(tm.makeStringIndex(1000).values, 10)) -""" - -series_drop_duplicates_int = Benchmark('s.drop_duplicates()', setup, - start_date=datetime(2012, 11, 27)) - -series_drop_duplicates_string = \ - Benchmark('s2.drop_duplicates()', setup, - start_date=datetime(2012, 11, 27)) - -#---------------------------------------------------------------------- -# fillna, many columns - - -setup = common_setup + """ -values = np.random.randn(1000, 1000) -values[::2] = np.nan -df = DataFrame(values) -""" - -frame_fillna_many_columns_pad = Benchmark("df.fillna(method='pad')", - setup, - start_date=datetime(2011, 3, 1)) - -#---------------------------------------------------------------------- -# blog "pandas escaped the zoo" - -setup = common_setup + """ -n = 50000 -indices = tm.makeStringIndex(n) - -def sample(values, k): - from random import shuffle - sampler = np.arange(len(values)) - shuffle(sampler) - return values.take(sampler[:k]) - -subsample_size = 40000 - -x = Series(np.random.randn(50000), indices) -y = Series(np.random.randn(subsample_size), - index=sample(indices, subsample_size)) -""" - -series_align_irregular_string = Benchmark("x + y", setup, - start_date=datetime(2010, 6, 1)) diff --git a/vb_suite/replace.py b/vb_suite/replace.py deleted file mode 100644 index 9326aa5becca9..0000000000000 --- a/vb_suite/replace.py +++ /dev/null @@ -1,36 +0,0 @@ -from vbench.api import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -from datetime import timedelta - -N = 1000000 - -try: - rng = date_range('1/1/2000', periods=N, freq='min') -except NameError: - rng = DatetimeIndex('1/1/2000', periods=N, offset=datetools.Minute()) - date_range = DateRange - -ts = Series(np.random.randn(N), index=rng) -""" - -large_dict_setup = """from .pandas_vb_common import * -from pandas.compat import range -n = 10 ** 6 -start_value = 10 ** 5 -to_rep = dict((i, start_value + i) for i in range(n)) -s = Series(np.random.randint(n, size=10 ** 3)) -""" - -replace_fillna = Benchmark('ts.fillna(0., inplace=True)', common_setup, - name='replace_fillna', - start_date=datetime(2012, 4, 4)) -replace_replacena = Benchmark('ts.replace(np.nan, 0., inplace=True)', - common_setup, - name='replace_replacena', - start_date=datetime(2012, 5, 15)) -replace_large_dict = Benchmark('s.replace(to_rep, inplace=True)', - large_dict_setup, - name='replace_large_dict', - start_date=datetime(2014, 4, 6)) diff --git a/vb_suite/reshape.py b/vb_suite/reshape.py deleted file mode 100644 index daab96103f2c5..0000000000000 --- a/vb_suite/reshape.py +++ /dev/null @@ -1,65 +0,0 @@ -from vbench.api import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -index = MultiIndex.from_arrays([np.arange(100).repeat(100), - np.roll(np.tile(np.arange(100), 
100), 25)])
-df = DataFrame(np.random.randn(10000, 4), index=index)
-"""
-
-reshape_unstack_simple = Benchmark('df.unstack(1)', common_setup,
-                                   start_date=datetime(2011, 10, 1))
-
-setup = common_setup + """
-udf = df.unstack(1)
-"""
-
-reshape_stack_simple = Benchmark('udf.stack()', setup,
-                                 start_date=datetime(2011, 10, 1))
-
-setup = common_setup + """
-def unpivot(frame):
-    N, K = frame.shape
-    data = {'value' : frame.values.ravel('F'),
-            'variable' : np.asarray(frame.columns).repeat(N),
-            'date' : np.tile(np.asarray(frame.index), K)}
-    return DataFrame(data, columns=['date', 'variable', 'value'])
-index = date_range('1/1/2000', periods=10000, freq='h')
-df = DataFrame(randn(10000, 50), index=index, columns=range(50))
-pdf = unpivot(df)
-f = lambda: pdf.pivot('date', 'variable', 'value')
-"""
-
-reshape_pivot_time_series = Benchmark('f()', setup,
-                                      start_date=datetime(2012, 5, 1))
-
-# Sparse key space, re: #2278
-
-setup = common_setup + """
-NUM_ROWS = 1000
-for iter in range(10):
-    df = DataFrame({'A' : np.random.randint(50, size=NUM_ROWS),
-                    'B' : np.random.randint(50, size=NUM_ROWS),
-                    'C' : np.random.randint(-10,10, size=NUM_ROWS),
-                    'D' : np.random.randint(-10,10, size=NUM_ROWS),
-                    'E' : np.random.randint(10, size=NUM_ROWS),
-                    'F' : np.random.randn(NUM_ROWS)})
-    idf = df.set_index(['A', 'B', 'C', 'D', 'E'])
-    if len(idf.index.unique()) == NUM_ROWS:
-        break
-"""
-
-unstack_sparse_keyspace = Benchmark('idf.unstack()', setup,
-                                    start_date=datetime(2011, 10, 1))
-
-# Melt
-
-setup = common_setup + """
-from pandas.core.reshape import melt
-df = DataFrame(np.random.randn(10000, 3), columns=['A', 'B', 'C'])
-df['id1'] = np.random.randint(0, 10, 10000)
-df['id2'] = np.random.randint(100, 1000, 10000)
-"""
-
-melt_dataframe = Benchmark("melt(df, id_vars=['id1', 'id2'])", setup,
-                           start_date=datetime(2012, 8, 1))
diff --git a/vb_suite/run_suite.py b/vb_suite/run_suite.py
deleted file mode 100755
index 43bf24faae43a..0000000000000
--- a/vb_suite/run_suite.py
+++ /dev/null
@@ -1,15 +0,0 @@
-#!/usr/bin/env python
-from vbench.api import BenchmarkRunner
-from suite import *
-
-
-def run_process():
-    runner = BenchmarkRunner(benchmarks, REPO_PATH, REPO_URL,
-                             BUILD, DB_PATH, TMP_DIR, PREPARE,
-                             always_clean=True,
-                             run_option='eod', start_date=START_DATE,
-                             module_dependencies=dependencies)
-    runner.run()
-
-if __name__ == '__main__':
-    run_process()
diff --git a/vb_suite/series_methods.py b/vb_suite/series_methods.py
deleted file mode 100644
index cd8688495fa09..0000000000000
--- a/vb_suite/series_methods.py
+++ /dev/null
@@ -1,39 +0,0 @@
-from vbench.api import Benchmark
-from datetime import datetime
-
-common_setup = """from .pandas_vb_common import *
-"""
-
-setup = common_setup + """
-s1 = Series(np.random.randn(10000))
-s2 = Series(np.random.randint(1, 10, 10000))
-s3 = Series(np.random.randint(1, 10, 100000)).astype('int64')
-values = [1,2]
-s4 = s3.astype('object')
-"""
-
-series_nlargest1 = Benchmark('s1.nlargest(3, take_last=True);'
-                             's1.nlargest(3, take_last=False)',
-                             setup,
-                             start_date=datetime(2014, 1, 25))
-series_nlargest2 = Benchmark('s2.nlargest(3, take_last=True);'
-                             's2.nlargest(3, take_last=False)',
-                             setup,
-                             start_date=datetime(2014, 1, 25))
-
-series_nsmallest1 = Benchmark('s1.nsmallest(3, take_last=True);'
-                              's1.nsmallest(3, take_last=False)',
-                              setup,
-                              start_date=datetime(2014, 1, 25))
-
-series_nsmallest2 = Benchmark('s2.nsmallest(3, take_last=True);'
-                              's2.nsmallest(3, take_last=False)',
-                              setup,
-                              start_date=datetime(2014, 1, 25))
-
-series_isin_int64 = 
Benchmark('s3.isin(values)', - setup, - start_date=datetime(2014, 1, 25)) -series_isin_object = Benchmark('s4.isin(values)', - setup, - start_date=datetime(2014, 1, 25)) diff --git a/vb_suite/source/conf.py b/vb_suite/source/conf.py deleted file mode 100644 index d83448fd97d09..0000000000000 --- a/vb_suite/source/conf.py +++ /dev/null @@ -1,225 +0,0 @@ -# -*- coding: utf-8 -*- -# -# pandas documentation build configuration file, created by -# -# This file is execfile()d with the current directory set to its containing dir. -# -# Note that not all possible configuration values are present in this -# autogenerated file. -# -# All configuration values have a default; values that are commented out -# serve to show the default. - -import sys -import os - -# If extensions (or modules to document with autodoc) are in another directory, -# add these directories to sys.path here. If the directory is relative to the -# documentation root, use os.path.abspath to make it absolute, like shown here. -# sys.path.append(os.path.abspath('.')) -sys.path.insert(0, os.path.abspath('../sphinxext')) - -sys.path.extend([ - - # numpy standard doc extensions - os.path.join(os.path.dirname(__file__), - '..', '../..', - 'sphinxext') - -]) - -# -- General configuration ----------------------------------------------- - -# Add any Sphinx extension module names here, as strings. They can be extensions -# coming with Sphinx (named 'sphinx.ext.*') or your custom ones. sphinxext. - -extensions = ['sphinx.ext.autodoc', - 'sphinx.ext.doctest'] - -# Add any paths that contain templates here, relative to this directory. -templates_path = ['_templates', '_templates/autosummary'] - -# The suffix of source filenames. -source_suffix = '.rst' - -# The encoding of source files. -# source_encoding = 'utf-8' - -# The master toctree document. -master_doc = 'index' - -# General information about the project. -project = u'pandas' -copyright = u'2008-2011, the pandas development team' - -# The version info for the project you're documenting, acts as replacement for -# |version| and |release|, also used in various other places throughout the -# built documents. -# -# The short X.Y version. -import pandas - -# version = '%s r%s' % (pandas.__version__, svn_version()) -version = '%s' % (pandas.__version__) - -# The full version, including alpha/beta/rc tags. -release = version - -# JP: added from sphinxdocs -autosummary_generate = True - -# The language for content autogenerated by Sphinx. Refer to documentation -# for a list of supported languages. -# language = None - -# There are two options for replacing |today|: either, you set today to some -# non-false value, then it is used: -# today = '' -# Else, today_fmt is used as the format for a strftime call. -# today_fmt = '%B %d, %Y' - -# List of documents that shouldn't be included in the build. -# unused_docs = [] - -# List of directories, relative to source directory, that shouldn't be searched -# for source files. -exclude_trees = [] - -# The reST default role (used for this markup: `text`) to use for all documents. -# default_role = None - -# If true, '()' will be appended to :func: etc. cross-reference text. -# add_function_parentheses = True - -# If true, the current module name will be prepended to all description -# unit titles (such as .. function::). -# add_module_names = True - -# If true, sectionauthor and moduleauthor directives will be shown in the -# output. They are ignored by default. -# show_authors = False - -# The name of the Pygments (syntax highlighting) style to use. 
-pygments_style = 'sphinx' - -# A list of ignored prefixes for module index sorting. -# modindex_common_prefix = [] - - -# -- Options for HTML output --------------------------------------------- - -# The theme to use for HTML and HTML Help pages. Major themes that come with -# Sphinx are currently 'default' and 'sphinxdoc'. -html_theme = 'agogo' - -# The style sheet to use for HTML and HTML Help pages. A file of that name -# must exist either in Sphinx' static/ path, or in one of the custom paths -# given in html_static_path. -# html_style = 'statsmodels.css' - -# Theme options are theme-specific and customize the look and feel of a theme -# further. For a list of options available for each theme, see the -# documentation. -# html_theme_options = {} - -# Add any paths that contain custom themes here, relative to this directory. -html_theme_path = ['themes'] - -# The name for this set of Sphinx documents. If None, it defaults to -# " v documentation". -html_title = 'Vbench performance benchmarks for pandas' - -# A shorter title for the navigation bar. Default is the same as html_title. -# html_short_title = None - -# The name of an image file (relative to this directory) to place at the top -# of the sidebar. -# html_logo = None - -# The name of an image file (within the static path) to use as favicon of the -# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 -# pixels large. -# html_favicon = None - -# Add any paths that contain custom static files (such as style sheets) here, -# relative to this directory. They are copied after the builtin static files, -# so a file named "default.css" will overwrite the builtin "default.css". -html_static_path = ['_static'] - -# If not '', a 'Last updated on:' timestamp is inserted at every page bottom, -# using the given strftime format. -# html_last_updated_fmt = '%b %d, %Y' - -# If true, SmartyPants will be used to convert quotes and dashes to -# typographically correct entities. -# html_use_smartypants = True - -# Custom sidebar templates, maps document names to template names. -# html_sidebars = {} - -# Additional templates that should be rendered to pages, maps page names to -# template names. -# html_additional_pages = {} - -# If false, no module index is generated. -html_use_modindex = True - -# If false, no index is generated. -# html_use_index = True - -# If true, the index is split into individual pages for each letter. -# html_split_index = False - -# If true, links to the reST sources are added to the pages. -# html_show_sourcelink = True - -# If true, an OpenSearch description file will be output, and all pages will -# contain a tag referring to it. The value of this option must be the -# base URL from which the finished HTML is served. -# html_use_opensearch = '' - -# If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml"). -# html_file_suffix = '' - -# Output file base name for HTML help builder. -htmlhelp_basename = 'performance' - - -# -- Options for LaTeX output -------------------------------------------- - -# The paper size ('letter' or 'a4'). -# latex_paper_size = 'letter' - -# The font size ('10pt', '11pt' or '12pt'). -# latex_font_size = '10pt' - -# Grouping the document tree into LaTeX files. List of tuples -# (source start file, target name, title, author, documentclass [howto/manual]). 
-latex_documents = [ - ('index', 'performance.tex', - u'pandas vbench Performance Benchmarks', - u'Wes McKinney', 'manual'), -] - -# The name of an image file (relative to this directory) to place at the top of -# the title page. -# latex_logo = None - -# For "manual" documents, if this is true, then toplevel headings are parts, -# not chapters. -# latex_use_parts = False - -# Additional stuff for the LaTeX preamble. -# latex_preamble = '' - -# Documents to append as an appendix to all manuals. -# latex_appendices = [] - -# If false, no module index is generated. -# latex_use_modindex = True - - -# Example configuration for intersphinx: refer to the Python standard library. -# intersphinx_mapping = {'http://docs.scipy.org/': None} -import glob -autosummary_generate = glob.glob("*.rst") diff --git a/vb_suite/source/themes/agogo/layout.html b/vb_suite/source/themes/agogo/layout.html deleted file mode 100644 index cd0f3d7ffc9c7..0000000000000 --- a/vb_suite/source/themes/agogo/layout.html +++ /dev/null @@ -1,95 +0,0 @@ -{# - agogo/layout.html - ~~~~~~~~~~~~~~~~~ - - Sphinx layout template for the agogo theme, originally written - by Andi Albrecht. - - :copyright: Copyright 2007-2011 by the Sphinx team, see AUTHORS. - :license: BSD, see LICENSE for details. -#} -{% extends "basic/layout.html" %} - -{% block header %} -
-    <div class="header-wrapper">
-      <div class="header">
-        {%- if logo %}
-          <p class="logo"><a href="{{ pathto(master_doc) }}">
-            <img class="logo" src="{{ pathto('_static/' + logo, 1) }}" alt="Logo"/>
-          </a></p>
-        {%- endif %}
-        {%- block headertitle %}
-        <h1><a href="{{ pathto(master_doc) }}">{{ shorttitle|e }}</a></h1>
-        {%- endblock %}
-        <div class="rel">
-          {%- for rellink in rellinks|reverse %}
-          <a href="{{ pathto(rellink[0]) }}" title="{{ rellink[1]|striptags|e }}"
-             {{ accesskey(rellink[2]) }}>{{ rellink[3] }}</a>
-          {%- if not loop.last %}{{ reldelim2 }}{% endif %}
-          {%- endfor %}
-        </div>
-      </div>
-    </div>
-{% endblock %}
-
-{% block content %}
-    <div class="content-wrapper">
-      <div class="content">
-        <div class="document">
-          {%- block document %}
-            {{ super() }}
-          {%- endblock %}
-        </div>
-        <div class="clearer"></div>
-      </div>
-    </div>
-{% endblock %} - -{% block footer %} - -{% endblock %} - -{% block relbar1 %}{% endblock %} -{% block relbar2 %}{% endblock %} diff --git a/vb_suite/source/themes/agogo/static/agogo.css_t b/vb_suite/source/themes/agogo/static/agogo.css_t deleted file mode 100644 index ef909b72e20f6..0000000000000 --- a/vb_suite/source/themes/agogo/static/agogo.css_t +++ /dev/null @@ -1,476 +0,0 @@ -/* - * agogo.css_t - * ~~~~~~~~~~~ - * - * Sphinx stylesheet -- agogo theme. - * - * :copyright: Copyright 2007-2011 by the Sphinx team, see AUTHORS. - * :license: BSD, see LICENSE for details. - * - */ - -* { - margin: 0px; - padding: 0px; -} - -body { - font-family: {{ theme_bodyfont }}; - line-height: 1.4em; - color: black; - background-color: {{ theme_bgcolor }}; -} - - -/* Page layout */ - -div.header, div.content, div.footer { - max-width: {{ theme_pagewidth }}; - margin-left: auto; - margin-right: auto; -} - -div.header-wrapper { - background: {{ theme_headerbg }}; - padding: 1em 1em 0; - border-bottom: 3px solid #2e3436; - min-height: 0px; -} - - -/* Default body styles */ -a { - color: {{ theme_linkcolor }}; -} - -div.bodywrapper a, div.footer a { - text-decoration: underline; -} - -.clearer { - clear: both; -} - -.left { - float: left; -} - -.right { - float: right; -} - -.line-block { - display: block; - margin-top: 1em; - margin-bottom: 1em; -} - -.line-block .line-block { - margin-top: 0; - margin-bottom: 0; - margin-left: 1.5em; -} - -h1, h2, h3, h4 { - font-family: {{ theme_headerfont }}; - font-weight: normal; - color: {{ theme_headercolor2 }}; - margin-bottom: .8em; -} - -h1 { - color: {{ theme_headercolor1 }}; -} - -h2 { - padding-bottom: .5em; - border-bottom: 1px solid {{ theme_headercolor2 }}; -} - -a.headerlink { - visibility: hidden; - color: #dddddd; - padding-left: .3em; -} - -h1:hover > a.headerlink, -h2:hover > a.headerlink, -h3:hover > a.headerlink, -h4:hover > a.headerlink, -h5:hover > a.headerlink, -h6:hover > a.headerlink, -dt:hover > a.headerlink { - visibility: visible; -} - -img { - border: 0; -} - -pre { - background-color: #EEE; - padding: 0.5em; -} - -div.admonition { - margin-top: 10px; - margin-bottom: 10px; - padding: 2px 7px 1px 7px; - border-left: 0.2em solid black; -} - -p.admonition-title { - margin: 0px 10px 5px 0px; - font-weight: bold; -} - -dt:target, .highlighted { - background-color: #fbe54e; -} - -/* Header */ - -/* -div.header { - padding-top: 10px; - padding-bottom: 10px; -} -*/ - -div.header {} - -div.header h1 { - font-family: {{ theme_headerfont }}; - font-weight: normal; - font-size: 180%; - letter-spacing: .08em; -} - -div.header h1 a { - color: white; -} - -div.header div.rel { - text-decoration: none; -} -/* margin-top: 1em; */ - -div.header div.rel a { - margin-top: 1em; - color: {{ theme_headerlinkcolor }}; - letter-spacing: .1em; - text-transform: uppercase; - padding: 3px 1em; -} - -p.logo { - float: right; -} - -img.logo { - border: 0; -} - - -/* Content */ -div.content-wrapper { - background-color: white; - padding: 1em; -} -/* - padding-top: 20px; - padding-bottom: 20px; -*/ - -/* float: left; */ - -div.document { - max-width: {{ theme_documentwidth }}; -} - -div.body { - padding-right: 2em; - text-align: {{ theme_textalign }}; -} - -div.document ul { - margin: 1.5em; - list-style-type: square; -} - -div.document dd { - margin-left: 1.2em; - margin-top: .4em; - margin-bottom: 1em; -} - -div.document .section { - margin-top: 1.7em; -} -div.document .section:first-child { - margin-top: 0px; -} - -div.document div.highlight { - padding: 3px; - 
background-color: #eeeeec; - border-top: 2px solid #dddddd; - border-bottom: 2px solid #dddddd; - margin-top: .8em; - margin-bottom: .8em; -} - -div.document h2 { - margin-top: .7em; -} - -div.document p { - margin-bottom: .5em; -} - -div.document li.toctree-l1 { - margin-bottom: 1em; -} - -div.document .descname { - font-weight: bold; -} - -div.document .docutils.literal { - background-color: #eeeeec; - padding: 1px; -} - -div.document .docutils.xref.literal { - background-color: transparent; - padding: 0px; -} - -div.document blockquote { - margin: 1em; -} - -div.document ol { - margin: 1.5em; -} - - -/* Sidebar */ - - -div.sidebar { - width: {{ theme_sidebarwidth }}; - padding: 0 1em; - float: right; - font-size: .93em; -} - -div.sidebar a, div.header a { - text-decoration: none; -} - -div.sidebar a:hover, div.header a:hover { - text-decoration: underline; -} - -div.sidebar h3 { - color: #2e3436; - text-transform: uppercase; - font-size: 130%; - letter-spacing: .1em; -} - -div.sidebar ul { - list-style-type: none; -} - -div.sidebar li.toctree-l1 a { - display: block; - padding: 1px; - border: 1px solid #dddddd; - background-color: #eeeeec; - margin-bottom: .4em; - padding-left: 3px; - color: #2e3436; -} - -div.sidebar li.toctree-l2 a { - background-color: transparent; - border: none; - margin-left: 1em; - border-bottom: 1px solid #dddddd; -} - -div.sidebar li.toctree-l3 a { - background-color: transparent; - border: none; - margin-left: 2em; - border-bottom: 1px solid #dddddd; -} - -div.sidebar li.toctree-l2:last-child a { - border-bottom: none; -} - -div.sidebar li.toctree-l1.current a { - border-right: 5px solid {{ theme_headerlinkcolor }}; -} - -div.sidebar li.toctree-l1.current li.toctree-l2 a { - border-right: none; -} - - -/* Footer */ - -div.footer-wrapper { - background: {{ theme_footerbg }}; - border-top: 4px solid #babdb6; - padding-top: 10px; - padding-bottom: 10px; - min-height: 80px; -} - -div.footer, div.footer a { - color: #888a85; -} - -div.footer .right { - text-align: right; -} - -div.footer .left { - text-transform: uppercase; -} - - -/* Styles copied from basic theme */ - -img.align-left, .figure.align-left, object.align-left { - clear: left; - float: left; - margin-right: 1em; -} - -img.align-right, .figure.align-right, object.align-right { - clear: right; - float: right; - margin-left: 1em; -} - -img.align-center, .figure.align-center, object.align-center { - display: block; - margin-left: auto; - margin-right: auto; -} - -.align-left { - text-align: left; -} - -.align-center { - clear: both; - text-align: center; -} - -.align-right { - text-align: right; -} - -/* -- search page ----------------------------------------------------------- */ - -ul.search { - margin: 10px 0 0 20px; - padding: 0; -} - -ul.search li { - padding: 5px 0 5px 20px; - background-image: url(file.png); - background-repeat: no-repeat; - background-position: 0 7px; -} - -ul.search li a { - font-weight: bold; -} - -ul.search li div.context { - color: #888; - margin: 2px 0 0 30px; - text-align: left; -} - -ul.keywordmatches li.goodmatch a { - font-weight: bold; -} - -/* -- index page ------------------------------------------------------------ */ - -table.contentstable { - width: 90%; -} - -table.contentstable p.biglink { - line-height: 150%; -} - -a.biglink { - font-size: 1.3em; -} - -span.linkdescr { - font-style: italic; - padding-top: 5px; - font-size: 90%; -} - -/* -- general index --------------------------------------------------------- */ - -table.indextable td { - text-align: left; - 
vertical-align: top; -} - -table.indextable dl, table.indextable dd { - margin-top: 0; - margin-bottom: 0; -} - -table.indextable tr.pcap { - height: 10px; -} - -table.indextable tr.cap { - margin-top: 10px; - background-color: #f2f2f2; -} - -img.toggler { - margin-right: 3px; - margin-top: 3px; - cursor: pointer; -} - -/* -- viewcode extension ---------------------------------------------------- */ - -.viewcode-link { - float: right; -} - -.viewcode-back { - float: right; - font-family:: {{ theme_bodyfont }}; -} - -div.viewcode-block:target { - margin: -1px -3px; - padding: 0 3px; - background-color: #f4debf; - border-top: 1px solid #ac9; - border-bottom: 1px solid #ac9; -} - -th.field-name { - white-space: nowrap; -} diff --git a/vb_suite/source/themes/agogo/static/bgfooter.png b/vb_suite/source/themes/agogo/static/bgfooter.png deleted file mode 100644 index 9ce5bdd902943..0000000000000 Binary files a/vb_suite/source/themes/agogo/static/bgfooter.png and /dev/null differ diff --git a/vb_suite/source/themes/agogo/static/bgtop.png b/vb_suite/source/themes/agogo/static/bgtop.png deleted file mode 100644 index a0d4709bac8f7..0000000000000 Binary files a/vb_suite/source/themes/agogo/static/bgtop.png and /dev/null differ diff --git a/vb_suite/source/themes/agogo/theme.conf b/vb_suite/source/themes/agogo/theme.conf deleted file mode 100644 index 3fc88580f1ab4..0000000000000 --- a/vb_suite/source/themes/agogo/theme.conf +++ /dev/null @@ -1,19 +0,0 @@ -[theme] -inherit = basic -stylesheet = agogo.css -pygments_style = tango - -[options] -bodyfont = "Verdana", Arial, sans-serif -headerfont = "Georgia", "Times New Roman", serif -pagewidth = 70em -documentwidth = 50em -sidebarwidth = 20em -bgcolor = #eeeeec -headerbg = url(bgtop.png) top left repeat-x -footerbg = url(bgfooter.png) top left repeat-x -linkcolor = #ce5c00 -headercolor1 = #204a87 -headercolor2 = #3465a4 -headerlinkcolor = #fcaf3e -textalign = justify \ No newline at end of file diff --git a/vb_suite/sparse.py b/vb_suite/sparse.py deleted file mode 100644 index 53e2778ee0865..0000000000000 --- a/vb_suite/sparse.py +++ /dev/null @@ -1,65 +0,0 @@ -from vbench.benchmark import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -""" - -#---------------------------------------------------------------------- - -setup = common_setup + """ -from pandas.core.sparse import SparseSeries, SparseDataFrame - -K = 50 -N = 50000 -rng = np.asarray(date_range('1/1/2000', periods=N, - freq='T')) - -# rng2 = np.asarray(rng).astype('M8[ns]').astype('i8') - -series = {} -for i in range(1, K + 1): - data = np.random.randn(N)[:-i] - this_rng = rng[:-i] - data[100:] = np.nan - series[i] = SparseSeries(data, index=this_rng) -""" -stmt = "SparseDataFrame(series)" - -bm_sparse1 = Benchmark(stmt, setup, name="sparse_series_to_frame", - start_date=datetime(2011, 6, 1)) - - -setup = common_setup + """ -from pandas.core.sparse import SparseDataFrame -""" - -stmt = "SparseDataFrame(columns=np.arange(100), index=np.arange(1000))" - -sparse_constructor = Benchmark(stmt, setup, name="sparse_frame_constructor", - start_date=datetime(2012, 6, 1)) - - -setup = common_setup + """ -s = pd.Series([np.nan] * 10000) -s[0] = 3.0 -s[100] = -1.0 -s[999] = 12.1 -s.index = pd.MultiIndex.from_product((range(10), range(10), range(10), range(10))) -ss = s.to_sparse() -""" - -stmt = "ss.to_coo(row_levels=[0, 1], column_levels=[2, 3], sort_labels=True)" - -sparse_series_to_coo = Benchmark(stmt, setup, name="sparse_series_to_coo", - 
start_date=datetime(2015, 1, 3)) - -setup = common_setup + """ -import scipy.sparse -import pandas.sparse.series -A = scipy.sparse.coo_matrix(([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(100, 100)) -""" - -stmt = "ss = pandas.sparse.series.SparseSeries.from_coo(A)" - -sparse_series_from_coo = Benchmark(stmt, setup, name="sparse_series_from_coo", - start_date=datetime(2015, 1, 3)) diff --git a/vb_suite/stat_ops.py b/vb_suite/stat_ops.py deleted file mode 100644 index 8d7c30dc9fdcf..0000000000000 --- a/vb_suite/stat_ops.py +++ /dev/null @@ -1,126 +0,0 @@ -from vbench.benchmark import Benchmark -from datetime import datetime - -common_setup = """from .pandas_vb_common import * -""" - -#---------------------------------------------------------------------- -# nanops - -setup = common_setup + """ -s = Series(np.random.randn(100000), index=np.arange(100000)) -s[::2] = np.nan -""" - -stat_ops_series_std = Benchmark("s.std()", setup) - -#---------------------------------------------------------------------- -# ops by level - -setup = common_setup + """ -index = MultiIndex(levels=[np.arange(10), np.arange(100), np.arange(100)], - labels=[np.arange(10).repeat(10000), - np.tile(np.arange(100).repeat(100), 10), - np.tile(np.tile(np.arange(100), 100), 10)]) -random.shuffle(index.values) -df = DataFrame(np.random.randn(len(index), 4), index=index) -df_level = DataFrame(np.random.randn(100, 4), index=index.levels[1]) -""" - -stat_ops_level_frame_sum = \ - Benchmark("df.sum(level=1)", setup, - start_date=datetime(2011, 11, 15)) - -stat_ops_level_frame_sum_multiple = \ - Benchmark("df.sum(level=[0, 1])", setup, repeat=1, - start_date=datetime(2011, 11, 15)) - -stat_ops_level_series_sum = \ - Benchmark("df[1].sum(level=1)", setup, - start_date=datetime(2011, 11, 15)) - -stat_ops_level_series_sum_multiple = \ - Benchmark("df[1].sum(level=[0, 1])", setup, repeat=1, - start_date=datetime(2011, 11, 15)) - -sum_setup = common_setup + """ -df = DataFrame(np.random.randn(100000, 4)) -dfi = DataFrame(np.random.randint(1000, size=df.shape)) -""" - -stat_ops_frame_sum_int_axis_0 = \ - Benchmark("dfi.sum()", sum_setup, start_date=datetime(2013, 7, 25)) - -stat_ops_frame_sum_float_axis_0 = \ - Benchmark("df.sum()", sum_setup, start_date=datetime(2013, 7, 25)) - -stat_ops_frame_mean_int_axis_0 = \ - Benchmark("dfi.mean()", sum_setup, start_date=datetime(2013, 7, 25)) - -stat_ops_frame_mean_float_axis_0 = \ - Benchmark("df.mean()", sum_setup, start_date=datetime(2013, 7, 25)) - -stat_ops_frame_sum_int_axis_1 = \ - Benchmark("dfi.sum(1)", sum_setup, start_date=datetime(2013, 7, 25)) - -stat_ops_frame_sum_float_axis_1 = \ - Benchmark("df.sum(1)", sum_setup, start_date=datetime(2013, 7, 25)) - -stat_ops_frame_mean_int_axis_1 = \ - Benchmark("dfi.mean(1)", sum_setup, start_date=datetime(2013, 7, 25)) - -stat_ops_frame_mean_float_axis_1 = \ - Benchmark("df.mean(1)", sum_setup, start_date=datetime(2013, 7, 25)) - -#---------------------------------------------------------------------- -# rank - -setup = common_setup + """ -values = np.concatenate([np.arange(100000), - np.random.randn(100000), - np.arange(100000)]) -s = Series(values) -""" - -stats_rank_average = Benchmark('s.rank()', setup, - start_date=datetime(2011, 12, 12)) - -stats_rank_pct_average = Benchmark('s.rank(pct=True)', setup, - start_date=datetime(2014, 1, 16)) -stats_rank_pct_average_old = Benchmark('s.rank() / len(s)', setup, - start_date=datetime(2014, 1, 16)) -setup = common_setup + """ -values = np.random.randint(0, 100000, size=200000) -s = 
Series(values) -""" - -stats_rank_average_int = Benchmark('s.rank()', setup, - start_date=datetime(2011, 12, 12)) - -setup = common_setup + """ -df = DataFrame(np.random.randn(5000, 50)) -""" - -stats_rank2d_axis1_average = Benchmark('df.rank(1)', setup, - start_date=datetime(2011, 12, 12)) - -stats_rank2d_axis0_average = Benchmark('df.rank()', setup, - start_date=datetime(2011, 12, 12)) - -# rolling functions - -setup = common_setup + """ -arr = np.random.randn(100000) -""" - -stats_rolling_mean = Benchmark('rolling_mean(arr, 100)', setup, - start_date=datetime(2011, 6, 1)) - -# spearman correlation - -setup = common_setup + """ -df = DataFrame(np.random.randn(1000, 30)) -""" - -stats_corr_spearman = Benchmark("df.corr(method='spearman')", setup, - start_date=datetime(2011, 12, 4)) diff --git a/vb_suite/strings.py b/vb_suite/strings.py deleted file mode 100644 index 0948df5673a0d..0000000000000 --- a/vb_suite/strings.py +++ /dev/null @@ -1,59 +0,0 @@ -from vbench.api import Benchmark - -common_setup = """from .pandas_vb_common import * -""" - -setup = common_setup + """ -import string -import itertools as IT - -def make_series(letters, strlen, size): - return Series( - [str(x) for x in np.fromiter(IT.cycle(letters), count=size*strlen, dtype='|S1') - .view('|S{}'.format(strlen))]) - -many = make_series('matchthis'+string.ascii_uppercase, strlen=19, size=10000) # 31% matches -few = make_series('matchthis'+string.ascii_uppercase*42, strlen=19, size=10000) # 1% matches -""" - -strings_cat = Benchmark("many.str.cat(sep=',')", setup) -strings_title = Benchmark("many.str.title()", setup) -strings_count = Benchmark("many.str.count('matchthis')", setup) -strings_contains_many = Benchmark("many.str.contains('matchthis')", setup) -strings_contains_few = Benchmark("few.str.contains('matchthis')", setup) -strings_contains_many_noregex = Benchmark( - "many.str.contains('matchthis', regex=False)", setup) -strings_contains_few_noregex = Benchmark( - "few.str.contains('matchthis', regex=False)", setup) -strings_startswith = Benchmark("many.str.startswith('matchthis')", setup) -strings_endswith = Benchmark("many.str.endswith('matchthis')", setup) -strings_lower = Benchmark("many.str.lower()", setup) -strings_upper = Benchmark("many.str.upper()", setup) -strings_replace = Benchmark("many.str.replace(r'(matchthis)', r'\1\1')", setup) -strings_repeat = Benchmark( - "many.str.repeat(list(IT.islice(IT.cycle(range(1,4)),len(many))))", setup) -strings_match = Benchmark("many.str.match(r'mat..this')", setup) -strings_extract = Benchmark("many.str.extract(r'(\w*)matchthis(\w*)')", setup) -strings_join_split = Benchmark("many.str.join(r'--').str.split('--')", setup) -strings_join_split_expand = Benchmark("many.str.join(r'--').str.split('--',expand=True)", setup) -strings_len = Benchmark("many.str.len()", setup) -strings_findall = Benchmark("many.str.findall(r'[A-Z]+')", setup) -strings_pad = Benchmark("many.str.pad(100, side='both')", setup) -strings_center = Benchmark("many.str.center(100)", setup) -strings_slice = Benchmark("many.str.slice(5,15,2)", setup) -strings_strip = Benchmark("many.str.strip('matchthis')", setup) -strings_lstrip = Benchmark("many.str.lstrip('matchthis')", setup) -strings_rstrip = Benchmark("many.str.rstrip('matchthis')", setup) -strings_get = Benchmark("many.str.get(0)", setup) - -setup = setup + """ -s = make_series(string.ascii_uppercase, strlen=10, size=10000).str.join('|') -""" -strings_get_dummies = Benchmark("s.str.get_dummies('|')", setup) - -setup = common_setup + """ -import 
pandas.util.testing as testing -ser = Series(testing.makeUnicodeIndex()) -""" - -strings_encode_decode = Benchmark("ser.str.encode('utf-8').str.decode('utf-8')", setup) diff --git a/vb_suite/suite.py b/vb_suite/suite.py deleted file mode 100644 index 45053b6610896..0000000000000 --- a/vb_suite/suite.py +++ /dev/null @@ -1,164 +0,0 @@ -from vbench.api import Benchmark, GitRepo -from datetime import datetime - -import os - -modules = ['attrs_caching', - 'binary_ops', - 'ctors', - 'frame_ctor', - 'frame_methods', - 'groupby', - 'index_object', - 'indexing', - 'io_bench', - 'io_sql', - 'inference', - 'hdfstore_bench', - 'join_merge', - 'gil', - 'miscellaneous', - 'panel_ctor', - 'packers', - 'parser_vb', - 'panel_methods', - 'plotting', - 'reindex', - 'replace', - 'sparse', - 'strings', - 'reshape', - 'stat_ops', - 'timeseries', - 'timedelta', - 'eval'] - -by_module = {} -benchmarks = [] - -for modname in modules: - ref = __import__(modname) - by_module[modname] = [v for v in ref.__dict__.values() - if isinstance(v, Benchmark)] - benchmarks.extend(by_module[modname]) - -for bm in benchmarks: - assert(bm.name is not None) - -import getpass -import sys - -USERNAME = getpass.getuser() - -if sys.platform == 'darwin': - HOME = '/Users/%s' % USERNAME -else: - HOME = '/home/%s' % USERNAME - -try: - import ConfigParser - - config = ConfigParser.ConfigParser() - config.readfp(open(os.path.expanduser('~/.vbenchcfg'))) - - REPO_PATH = config.get('setup', 'repo_path') - REPO_URL = config.get('setup', 'repo_url') - DB_PATH = config.get('setup', 'db_path') - TMP_DIR = config.get('setup', 'tmp_dir') -except: - REPO_PATH = os.path.abspath(os.path.join(os.path.dirname(__file__), "../")) - REPO_URL = 'git@github.com:pandas-dev/pandas.git' - DB_PATH = os.path.join(REPO_PATH, 'vb_suite/benchmarks.db') - TMP_DIR = os.path.join(HOME, 'tmp/vb_pandas') - -PREPARE = """ -python setup.py clean -""" -BUILD = """ -python setup.py build_ext --inplace -""" -dependencies = ['pandas_vb_common.py'] - -START_DATE = datetime(2010, 6, 1) - -# repo = GitRepo(REPO_PATH) - -RST_BASE = 'source' - -# HACK! - -# timespan = [datetime(2011, 1, 1), datetime(2012, 1, 1)] - - -def generate_rst_files(benchmarks): - import matplotlib as mpl - mpl.use('Agg') - import matplotlib.pyplot as plt - - vb_path = os.path.join(RST_BASE, 'vbench') - fig_base_path = os.path.join(vb_path, 'figures') - - if not os.path.exists(vb_path): - print('creating %s' % vb_path) - os.makedirs(vb_path) - - if not os.path.exists(fig_base_path): - print('creating %s' % fig_base_path) - os.makedirs(fig_base_path) - - for bmk in benchmarks: - print('Generating rst file for %s' % bmk.name) - rst_path = os.path.join(RST_BASE, 'vbench/%s.txt' % bmk.name) - - fig_full_path = os.path.join(fig_base_path, '%s.png' % bmk.name) - - # make the figure - plt.figure(figsize=(10, 6)) - ax = plt.gca() - bmk.plot(DB_PATH, ax=ax) - - start, end = ax.get_xlim() - - plt.xlim([start - 30, end + 30]) - plt.savefig(fig_full_path, bbox_inches='tight') - plt.close('all') - - fig_rel_path = 'vbench/figures/%s.png' % bmk.name - rst_text = bmk.to_rst(image_path=fig_rel_path) - with open(rst_path, 'w') as f: - f.write(rst_text) - - with open(os.path.join(RST_BASE, 'index.rst'), 'w') as f: - print >> f, """ -Performance Benchmarks -====================== - -These historical benchmark graphs were produced with `vbench -`__. - -The ``.pandas_vb_common`` setup script can be found here_ - -.. 
_here: https://github.com/pandas-dev/pandas/tree/master/vb_suite - -Produced on a machine with - - - Intel Core i7 950 processor - - (K)ubuntu Linux 12.10 - - Python 2.7.2 64-bit (Enthought Python Distribution 7.1-2) - - NumPy 1.6.1 - -.. toctree:: - :hidden: - :maxdepth: 3 -""" - for modname, mod_bmks in sorted(by_module.items()): - print >> f, ' vb_%s' % modname - modpath = os.path.join(RST_BASE, 'vb_%s.rst' % modname) - with open(modpath, 'w') as mh: - header = '%s\n%s\n\n' % (modname, '=' * len(modname)) - print >> mh, header - - for bmk in mod_bmks: - print >> mh, bmk.name - print >> mh, '-' * len(bmk.name) - print >> mh, '.. include:: vbench/%s.txt\n' % bmk.name diff --git a/vb_suite/test.py b/vb_suite/test.py deleted file mode 100644 index da30c3e1a5f76..0000000000000 --- a/vb_suite/test.py +++ /dev/null @@ -1,67 +0,0 @@ -from pandas import * -import matplotlib.pyplot as plt - -import sqlite3 - -from vbench.git import GitRepo - - -REPO_PATH = '/home/adam/code/pandas' -repo = GitRepo(REPO_PATH) - -con = sqlite3.connect('vb_suite/benchmarks.db') - -bmk = '36900a889961162138c140ce4ae3c205' -# bmk = '9d7b8c04b532df6c2d55ef497039b0ce' -bmk = '4481aa4efa9926683002a673d2ed3dac' -bmk = '00593cd8c03d769669d7b46585161726' -bmk = '3725ab7cd0a0657d7ae70f171c877cea' -bmk = '3cd376d6d6ef802cdea49ac47a67be21' -bmk2 = '459225186023853494bc345fd180f395' -bmk = 'c22ca82e0cfba8dc42595103113c7da3' -bmk = 'e0e651a8e9fbf0270ab68137f8b9df5f' -bmk = '96bda4b9a60e17acf92a243580f2a0c3' - - -def get_results(bmk): - results = con.execute( - "select * from results where checksum='%s'" % bmk).fetchall() - x = Series(dict((t[1], t[3]) for t in results)) - x.index = x.index.map(repo.timestamps.get) - x = x.sort_index() - return x - -x = get_results(bmk) - - -def graph1(): - dm_getitem = get_results('459225186023853494bc345fd180f395') - dm_getvalue = get_results('c22ca82e0cfba8dc42595103113c7da3') - - plt.figure() - ax = plt.gca() - - dm_getitem.plot(label='df[col][idx]', ax=ax) - dm_getvalue.plot(label='df.get_value(idx, col)', ax=ax) - - plt.ylabel('ms') - plt.legend(loc='best') - - -def graph2(): - bm = get_results('96bda4b9a60e17acf92a243580f2a0c3') - plt.figure() - ax = plt.gca() - - bm.plot(ax=ax) - plt.ylabel('ms') - -bm = get_results('36900a889961162138c140ce4ae3c205') -fig = plt.figure() -ax = plt.gca() -bm.plot(ax=ax) -fig.autofmt_xdate() - -plt.xlim([bm.dropna().index[0] - datetools.MonthEnd(), - bm.dropna().index[-1] + datetools.MonthEnd()]) -plt.ylabel('ms') diff --git a/vb_suite/test_perf.py b/vb_suite/test_perf.py deleted file mode 100755 index be546b72f9465..0000000000000 --- a/vb_suite/test_perf.py +++ /dev/null @@ -1,616 +0,0 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- - -""" -What ----- -vbench is a library which can be used to benchmark the performance -of a codebase over time. -Although vbench can collect data over many commites, generate plots -and other niceties, for Pull-Requests the important thing is the -performance of the HEAD commit against a known-good baseline. - -This script tries to automate the process of comparing these -two commits, and is meant to run out of the box on a fresh -clone. - -How ---- -These are the steps taken: -1) create a temp directory into which vbench will clone the temporary repo. -2) instantiate a vbench runner, using the local repo as the source repo. -3) perform a vbench run for the baseline commit, then the target commit. -4) pull the results for both commits from the db. 
use pandas to align
-everything and calculate a ratio for the timing information.
-5) print the results to the log file and to stdout.
-
-"""
-
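Step 4 above boils down to one pandas alignment. A self-contained toy version of the head-vs-baseline comparison (invented numbers, written against the current pandas API rather than the py2-era calls used in this script):

```python
import pandas as pd

head = pd.Series({"reindex_multiindex": 2.1, "frame_reindex_columns": 0.9})
base = pd.Series({"reindex_multiindex": 2.0, "frame_reindex_columns": 1.2})

head, base = head.align(base)             # align on benchmark name
totals = pd.DataFrame({"head[ms]": head,  # mirrors HEAD_COL below
                       "base[ms]": base,  # mirrors BASE_COL below
                       "ratio": head / base})
totals = totals.dropna().sort_values("ratio")
print(totals)                             # ratio < 1.0: target is faster
```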
-# IMPORTANT NOTE
-#
-# This script should run on pandas versions at least as far back as 0.9.1.
-# devs should be able to use the latest version of this script with
-# any dusty old commit and expect it to "just work".
-# One way in which this is useful is when collecting historical data,
-# where writing some logic around this script may prove easier
-# in some cases than running vbench directly (think perf bisection).
-#
-# *please*, when you modify this script for whatever reason,
-# make sure you do not break its functionality when running under older
-# pandas versions.
-# Note that deprecation warnings are turned off in main(), so there's
-# no need to change the actual code to suppress such warnings.
-
-import shutil
-import os
-import sys
-import argparse
-import tempfile
-import time
-import re
-
-import random
-import numpy as np
-
-import pandas as pd
-from pandas import DataFrame, Series
-
-from suite import REPO_PATH
-VB_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
-DEFAULT_MIN_DURATION = 0.01
-HEAD_COL="head[ms]"
-BASE_COL="base[ms]"
-
-try:
-    import git  # gitpython
-except Exception:
-    print("Error: Please install the `gitpython` package\n")
-    sys.exit(1)
-
-class RevParseAction(argparse.Action):
-    def __call__(self, parser, namespace, values, option_string=None):
-        import subprocess
-        cmd = 'git rev-parse --short --verify {0}^{{commit}}'.format(values)
-        rev_parse = subprocess.check_output(cmd, shell=True)
-        setattr(namespace, self.dest, rev_parse.strip())
-
-
-parser = argparse.ArgumentParser(description='Use vbench to measure and compare the performance of commits.')
-parser.add_argument('-H', '--head',
-                    help='Execute vbenches using the currently checked out copy.',
-                    dest='head',
-                    action='store_true',
-                    default=False)
-parser.add_argument('-b', '--base-commit',
-                    help='The commit serving as performance baseline ',
-                    type=str, action=RevParseAction)
-parser.add_argument('-t', '--target-commit',
-                    help='The commit to compare against the baseline (default: HEAD).',
-                    type=str, action=RevParseAction)
-parser.add_argument('--base-pickle',
-                    help='name of pickle file with timings data generated by a former `-H -d FILE` run. '\
-                         'filename must be of the form <commit>-*.* or specify --base-commit separately',
-                    type=str)
-parser.add_argument('--target-pickle',
-                    help='name of pickle file with timings data generated by a former `-H -d FILE` run. '\
-                         'filename must be of the form <commit>-*.* or specify --target-commit separately',
-                    type=str)
-parser.add_argument('-m', '--min-duration',
-                    help='Minimum duration (in ms) of baseline test for inclusion in report (default: %.3f).' % DEFAULT_MIN_DURATION,
-                    type=float,
-                    default=0.01)
-parser.add_argument('-o', '--output',
-                    metavar="<file>",
-                    dest='log_file',
-                    help='Path of file in which to save the textual report (default: vb_suite.log).')
-parser.add_argument('-d', '--outdf',
-                    metavar="FNAME",
-                    dest='outdf',
-                    default=None,
-                    help='Name of file to df.save() the result table into. Will overwrite')
-parser.add_argument('-r', '--regex',
-                    metavar="REGEX",
-                    dest='regex',
-                    default="",
-                    help='Regex pattern; only tests whose name matches the regex will be run.')
-parser.add_argument('-s', '--seed',
-                    metavar="SEED",
-                    dest='seed',
-                    default=1234,
-                    type=int,
-                    help='Integer value to seed PRNG with')
-parser.add_argument('-n', '--repeats',
-                    metavar="N",
-                    dest='repeats',
-                    default=3,
-                    type=int,
-                    help='Number of times to run each vbench, result value is the best of')
-parser.add_argument('-c', '--ncalls',
-                    metavar="N",
-                    dest='ncalls',
-                    default=3,
-                    type=int,
-                    help='Number of calls in each repetition of a vbench')
-parser.add_argument('-N', '--hrepeats',
-                    metavar="N",
-                    dest='hrepeats',
-                    default=1,
-                    type=int,
-                    help='implies -H, number of times to run the vbench suite on the head commit.\n'
-                         'Each iteration will yield another column in the output')
-parser.add_argument('-a', '--affinity',
-                    metavar="a",
-                    dest='affinity',
-                    default=1,
-                    type=int,
-                    help='Set processor affinity of the process; by default bind to cpu/core #1 only. '
-                         'Requires the "affinity" or "psutil" python module, will raise Warning otherwise')
-parser.add_argument('-u', '--burnin',
-                    metavar="u",
-                    dest='burnin',
-                    default=1,
-                    type=int,
-                    help='Number of extra iterations per benchmark to perform first, then throw away. ')
-
-parser.add_argument('-S', '--stats',
-                    default=False,
-                    action='store_true',
-                    help='when specified with -N, prints the output of describe() per vbench results. ')
-
-parser.add_argument('--temp-dir',
-                    metavar="PATH",
-                    default=None,
-                    help='Specify temp work dir to use. ccache depends on builds being invoked from consistent directory.')
-
-parser.add_argument('-q', '--quiet',
-                    default=False,
-                    action='store_true',
-                    help='Suppress report output to stdout. ')
-
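Putting the options together, a typical comparative run might have looked like this (the commits, regex, and file names are placeholders):

    python vb_suite/test_perf.py -b upstream/master -t HEAD -r groupby -o perf.log
    python vb_suite/test_perf.py -H -d head.pickle   # time only the current checkout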
-def get_results_df(db, rev):
-    """Takes a git commit hash and returns a DataFrame of benchmark results
-    """
-    bench = DataFrame(db.get_benchmarks())
-    results = DataFrame(map(list,db.get_rev_results(rev).values()))
-
-    # Since vbench.db._reg_rev_results returns an unlabeled dict,
-    # we have to break encapsulation a bit.
-    results.columns = db._results.c.keys()
-    results = results.join(bench['name'], on='checksum').set_index("checksum")
-    return results
-
-
-def prprint(s):
-    print("*** %s" % s)
-
-def pre_hook():
-    import gc
-    gc.disable()
-
-def post_hook():
-    import gc
-    gc.enable()
-
-def profile_comparative(benchmarks):
-
-    from vbench.api import BenchmarkRunner
-    from vbench.db import BenchmarkDB
-    from vbench.git import GitRepo
-    from suite import BUILD, DB_PATH, PREPARE, dependencies
-
-    TMP_DIR = args.temp_dir or tempfile.mkdtemp()
-
-    try:
-
-        prprint("Opening DB at '%s'...\n" % DB_PATH)
-        db = BenchmarkDB(DB_PATH)
-
-        prprint("Initializing Runner...")
-
-        # all in a good cause...
-        GitRepo._parse_commit_log = _parse_wrapper(args.base_commit)
-
-        runner = BenchmarkRunner(
-            benchmarks, REPO_PATH, REPO_PATH, BUILD, DB_PATH,
-            TMP_DIR, PREPARE, always_clean=True,
-            # run_option='eod', start_date=START_DATE,
-            module_dependencies=dependencies)
-
-        repo = runner.repo  # (steal the parsed git repo used by runner)
-        h_head = args.target_commit or repo.shas[-1]
-        h_baseline = args.base_commit
-
-        # ARGH. reparse the repo, without discarding any commits,
-        # then overwrite the previous parse results
-        # prprint("Slaughtering kittens...")
-        (repo.shas, repo.messages,
-         repo.timestamps, repo.authors) = _parse_commit_log(None,REPO_PATH,
-                                                            args.base_commit)
-
-        prprint('Target [%s] : %s\n' % (h_head, repo.messages.get(h_head, "")))
-        prprint('Baseline [%s] : %s\n' % (h_baseline,
-                repo.messages.get(h_baseline, "")))
-
-        prprint("Removing any previous measurements for the commits.")
-        db.delete_rev_results(h_baseline)
-        db.delete_rev_results(h_head)
-
-        # TODO: we could skip this, but we need to make sure all
-        # results are in the DB, which is a little tricky with
-        # start dates and so on.
-        prprint("Running benchmarks for baseline [%s]" % h_baseline)
-        runner._run_and_write_results(h_baseline)
-
-        prprint("Running benchmarks for target [%s]" % h_head)
-        runner._run_and_write_results(h_head)
-
-        prprint('Processing results...')
-
-        head_res = get_results_df(db, h_head)
-        baseline_res = get_results_df(db, h_baseline)
-
-        report_comparative(head_res,baseline_res)
-
-    finally:
-        # print("Disposing of TMP_DIR: %s" % TMP_DIR)
-        shutil.rmtree(TMP_DIR)
-
-def prep_pickle_for_total(df, agg_name='median'):
-    """
-    Accepts a DataFrame resulting from invocation with -H -d o.pickle.
-    If multiple data columns are present (-N was used), the
-    `agg_name` attr of the DataFrame will be used to reduce
-    them to a single value per vbench; df.median is used by default.
-
-    Returns a DataFrame of the form expected by prep_totals.
-    """
-    def prep(df):
-        agg = getattr(df,agg_name)
-        df = DataFrame(agg(1))
-        cols = list(df.columns)
-        cols[0]='timing'
-        df.columns=cols
-        df['name'] = list(df.index)
-        return df
-
-    return prep(df)
-
-def prep_totals(head_res, baseline_res):
-    """
-    Each argument should be a DataFrame with 'timing' and 'name' columns
-    where name is the name of the vbench.
-
-    Returns a 'totals' DataFrame, suitable as input for print_report.
- """ - head_res, baseline_res = head_res.align(baseline_res) - ratio = head_res['timing'] / baseline_res['timing'] - totals = DataFrame({HEAD_COL:head_res['timing'], - BASE_COL:baseline_res['timing'], - 'ratio':ratio, - 'name':baseline_res.name}, - columns=[HEAD_COL, BASE_COL, "ratio", "name"]) - totals = totals.ix[totals[HEAD_COL] > args.min_duration] - # ignore below threshold - totals = totals.dropna( - ).sort("ratio").set_index('name') # sort in ascending order - return totals - -def report_comparative(head_res,baseline_res): - try: - r=git.Repo(VB_DIR) - except: - import pdb - pdb.set_trace() - - totals = prep_totals(head_res,baseline_res) - - h_head = args.target_commit - h_baseline = args.base_commit - h_msg = b_msg = "Unknown" - try: - h_msg = r.commit(h_head).message.strip() - except git.exc.BadObject: - pass - try: - b_msg = r.commit(h_baseline).message.strip() - except git.exc.BadObject: - pass - - - print_report(totals,h_head=h_head,h_msg=h_msg, - h_baseline=h_baseline,b_msg=b_msg) - - if args.outdf: - prprint("The results DataFrame was written to '%s'\n" % args.outdf) - totals.save(args.outdf) - -def profile_head_single(benchmark): - import gc - results = [] - - # just in case - gc.collect() - - try: - from ctypes import cdll, CDLL - cdll.LoadLibrary("libc.so.6") - libc = CDLL("libc.so.6") - libc.malloc_trim(0) - except: - pass - - - N = args.hrepeats + args.burnin - - results = [] - try: - for i in range(N): - gc.disable() - d=dict() - - try: - d = benchmark.run() - - except KeyboardInterrupt: - raise - except Exception as e: # if a single vbench bursts into flames, don't die. - err="" - try: - err = d.get("traceback") - if err is None: - err = str(e) - except: - pass - print("%s died with:\n%s\nSkipping...\n" % (benchmark.name, err)) - - results.append(d.get('timing',np.nan)) - gc.enable() - gc.collect() - - finally: - gc.enable() - - if results: - # throw away the burn_in - results = results[args.burnin:] - sys.stdout.write('.') - sys.stdout.flush() - return Series(results, name=benchmark.name) - - # df = DataFrame(results) - # df.columns = ["name",HEAD_COL] - # return df.set_index("name")[HEAD_COL] - -def profile_head(benchmarks): - print( "Performing %d benchmarks (%d runs each)" % ( len(benchmarks), args.hrepeats)) - - ss= [profile_head_single(b) for b in benchmarks] - print("\n") - - results = DataFrame(ss) - results.columns=[ "#%d" %i for i in range(args.hrepeats)] - # results.index = ["#%d" % i for i in range(len(ss))] - # results = results.T - - shas, messages, _,_ = _parse_commit_log(None,REPO_PATH,base_commit="HEAD^") - print_report(results,h_head=shas[-1],h_msg=messages[-1]) - - - if args.outdf: - prprint("The results DataFrame was written to '%s'\n" % args.outdf) - DataFrame(results).save(args.outdf) - -def print_report(df,h_head=None,h_msg="",h_baseline=None,b_msg=""): - - name_width=45 - col_width = 10 - - hdr = ("{:%s}" % name_width).format("Test name") - hdr += ("|{:^%d}" % col_width)* len(df.columns) - hdr += "|" - hdr = hdr.format(*df.columns) - hdr = "-"*len(hdr) + "\n" + hdr + "\n" + "-"*len(hdr) + "\n" - ftr=hdr - s = "\n" - s+= "Invoked with :\n" - s+= "--ncalls: %s\n" % (args.ncalls or 'Auto') - s+= "--repeats: %s\n" % (args.repeats) - s+= "\n\n" - - s += hdr - # import ipdb - # ipdb.set_trace() - for i in range(len(df)): - lfmt = ("{:%s}" % name_width) - lfmt += ("| {:%d.4f} " % (col_width-2))* len(df.columns) - lfmt += "|\n" - s += lfmt.format(df.index[i],*list(df.iloc[i].values)) - - s+= ftr + "\n" - - s += "Ratio < 1.0 means the target commit is 
faster than the baseline.\n"
-    s += "Seed used: %d\n\n" % args.seed
-
-    if h_head:
-        s += 'Target [%s] : %s\n' % (h_head, h_msg)
-    if h_baseline:
-        s += 'Base [%s] : %s\n\n' % (
-            h_baseline, b_msg)
-
-    stats_footer = "\n"
-    if args.stats:
-        try:
-            pd.options.display.expand_frame_repr=False
-        except:
-            pass
-        stats_footer += str(df.T.describe().T) + "\n\n"
-
-    s+= stats_footer
-    logfile = open(args.log_file, 'w')
-    logfile.write(s)
-    logfile.close()
-
-    if not args.quiet:
-        prprint(s)
-
-    if args.stats and args.quiet:
-        prprint(stats_footer)
-
-    prprint("Results were also written to the logfile at '%s'" %
-            args.log_file)
-
-
-
-def main():
-    from suite import benchmarks
-
-    if not args.log_file:
-        args.log_file = os.path.abspath(
-            os.path.join(REPO_PATH, 'vb_suite.log'))
-
-    saved_dir = os.path.curdir
-    if args.outdf:
-        # not bullet-proof but enough for us
-        args.outdf = os.path.realpath(args.outdf)
-
-    if args.log_file:
-        # not bullet-proof but enough for us
-        args.log_file = os.path.realpath(args.log_file)
-
-    random.seed(args.seed)
-    np.random.seed(args.seed)
-
-    if args.base_pickle and args.target_pickle:
-        baseline_res = prep_pickle_for_total(pd.load(args.base_pickle))
-        target_res = prep_pickle_for_total(pd.load(args.target_pickle))
-
-        report_comparative(target_res, baseline_res)
-        sys.exit(0)
-
-    if args.affinity is not None:
-        try:  # use psutil rather than the stale affinity module. Thanks @yarikoptic
-            import psutil
-            if hasattr(psutil.Process, 'set_cpu_affinity'):
-                psutil.Process(os.getpid()).set_cpu_affinity([args.affinity])
-                print("CPU affinity set to %d" % args.affinity)
-        except ImportError:
-            print("-a/--affinity specified, but the 'psutil' module is not available, aborting.\n")
-            sys.exit(1)
-
-    print("\n")
-    prprint("LOG_FILE = %s" % args.log_file)
-    if args.outdf:
-        prprint("PICKLE_FILE = %s" % args.outdf)
-
-    print("\n")
-
-    # move away from the pandas root dir, to avoid possible import
-    # surprises
-    os.chdir(os.path.dirname(os.path.abspath(__file__)))
-
-    benchmarks = [x for x in benchmarks if re.search(args.regex,x.name)]
-
-    for b in benchmarks:
-        b.repeat = args.repeats
-        if args.ncalls:
-            b.ncalls = args.ncalls
-
-    if benchmarks:
-        if args.head:
-            profile_head(benchmarks)
-        else:
-            profile_comparative(benchmarks)
-    else:
-        print("No matching benchmarks")
-
-    os.chdir(saved_dir)
-
-# hack: vbench.git ignores some commits, but we
-# need to be able to reference any commit.
-# modified from vbench.git
-def _parse_commit_log(this,repo_path,base_commit=None):
-    from vbench.git import _convert_timezones
-    from pandas import Series
-    from dateutil import parser as dparser
-
-    git_cmd = 'git --git-dir=%s/.git --work-tree=%s ' % (repo_path, repo_path)
-    githist = git_cmd + ('log --graph --pretty=format:'+
-                         '\"::%h::%cd::%s::%an\"'+
-                         ('%s..' 
-
-# hack: vbench.git ignores some commits, but we
-# need to be able to reference any commit.
-# modified from vbench.git
-def _parse_commit_log(this, repo_path, base_commit=None):
-    from vbench.git import _convert_timezones
-    from pandas import Series
-    from dateutil import parser as dparser
-
-    git_cmd = 'git --git-dir=%s/.git --work-tree=%s ' % (repo_path, repo_path)
-    githist = git_cmd + ('log --graph --pretty=format:' +
-                         '\"::%h::%cd::%s::%an\"' +
-                         ('%s..' % base_commit) +
-                         '> githist.txt')
-    os.system(githist)
-    githist = open('githist.txt').read()
-    os.remove('githist.txt')
-
-    shas = []
-    timestamps = []
-    messages = []
-    authors = []
-    for line in githist.split('\n'):
-        if '*' not in line.split("::")[0]:  # skip non-commit lines
-            continue
-
-        _, sha, stamp, message, author = line.split('::', 4)
-
-        # parse timestamp into datetime object
-        stamp = dparser.parse(stamp)
-
-        shas.append(sha)
-        timestamps.append(stamp)
-        messages.append(message)
-        authors.append(author)
-
-    # to UTC for now
-    timestamps = _convert_timezones(timestamps)
-
-    shas = Series(shas, timestamps)
-    messages = Series(messages, shas)
-    timestamps = Series(timestamps, shas)
-    authors = Series(authors, shas)
-    return shas[::-1], messages[::-1], timestamps[::-1], authors[::-1]
-
-# even worse, monkey patch vbench
-def _parse_wrapper(base_commit):
-    def inner(repo_path):
-        return _parse_commit_log(repo_path, base_commit)
-    return inner
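`_parse_commit_log` shells out through a `githist.txt` temp file; the same parse can be done without the temp file via `subprocess`. A rough sketch of the idea (not the original implementation, and it drops the `--graph` marker handling), runnable inside any git checkout:

```python
import subprocess

from pandas import Series

out = subprocess.check_output(
    ['git', 'log', '--pretty=format:%h::%cd::%s', '-n', '5'], text=True)
rows = [line.split('::', 2) for line in out.splitlines() if line]
messages = Series([msg for _, _, msg in rows],
                  index=[sha for sha, _, _ in rows])
```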
-
-if __name__ == '__main__':
-    args = parser.parse_args()
-    if (not args.head
-        and not (args.base_commit and args.target_commit)
-        and not (args.base_pickle and args.target_pickle)):
-        parser.print_help()
-        sys.exit(1)
-    elif ((args.base_pickle or args.target_pickle) and not
-          (args.base_pickle and args.target_pickle)):
-        print("Must specify both --base-pickle and --target-pickle.")
-        sys.exit(1)
-
-    if ((args.base_pickle or args.target_pickle) and not
-        (args.base_commit and args.target_commit)):
-        if not args.base_commit:
-            print("base_commit not specified, assuming base_pickle is named <commit>-foo.*")
-            args.base_commit = args.base_pickle.split('-')[0]
-        if not args.target_commit:
-            print("target_commit not specified, assuming target_pickle is named <commit>-foo.*")
-            args.target_commit = args.target_pickle.split('-')[0]
-
-    import warnings
-    warnings.filterwarnings('ignore', category=FutureWarning)
-    warnings.filterwarnings('ignore', category=DeprecationWarning)
-
-    if args.base_commit and args.target_commit:
-        print("Verifying specified commits exist in repo...")
-        r = git.Repo(VB_DIR)
-        for c in [args.base_commit, args.target_commit]:
-            try:
-                msg = r.commit(c).message.strip()
-            except git.exc.BadObject:
-                print("The commit '%s' was not found, aborting..." % c)
-                sys.exit(1)
-            else:
-                print("%s: %s" % (c, msg))
-
-    main()
diff --git a/vb_suite/timedelta.py b/vb_suite/timedelta.py
deleted file mode 100644
index 378968ea1379a..0000000000000
--- a/vb_suite/timedelta.py
+++ /dev/null
@@ -1,32 +0,0 @@
-from vbench.api import Benchmark
-from datetime import datetime
-
-common_setup = """from .pandas_vb_common import *
-from pandas import to_timedelta
-"""
-
-#----------------------------------------------------------------------
-# conversion
-
-setup = common_setup + """
-arr = np.random.randint(0,1000,size=10000)
-"""
-
-stmt = "to_timedelta(arr,unit='s')"
-timedelta_convert_int = Benchmark(stmt, setup, start_date=datetime(2014, 1, 1))
-
-setup = common_setup + """
-arr = np.random.randint(0,1000,size=10000)
-arr = [ '{0} days'.format(i) for i in arr ]
-"""
-
-stmt = "to_timedelta(arr)"
-timedelta_convert_string = Benchmark(stmt, setup, start_date=datetime(2014, 1, 1))
-
-setup = common_setup + """
-arr = np.random.randint(0,60,size=10000)
-arr = [ '00:00:{0:02d}'.format(i) for i in arr ]
-"""
-
-stmt = "to_timedelta(arr)"
-timedelta_convert_string_seconds = Benchmark(stmt, setup, start_date=datetime(2014, 1, 1))
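Stripped of the vbench scaffolding, the three `timedelta.py` benchmarks above reduce to these `to_timedelta` calls, runnable directly:

```python
import numpy as np
from pandas import to_timedelta

arr = np.random.randint(0, 1000, size=10000)
to_timedelta(arr, unit='s')                        # integer seconds
to_timedelta(['{0} days'.format(i) for i in arr])  # string parsing path
to_timedelta(['00:00:{0:02d}'.format(i)
              for i in np.random.randint(0, 60, size=100)])
```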
common_setup + """ -N = 100000 -rng = date_range(start='1/1/2000', periods=N, freq='s') -rng = rng.take(np.random.permutation(N)) -ts = Series(np.random.randn(N), index=rng) -""" - -timeseries_sort_index = Benchmark('ts.sort_index()', setup, - start_date=datetime(2012, 4, 1)) - -#---------------------------------------------------------------------- -# Shifting, add offset - -setup = common_setup + """ -rng = date_range(start='1/1/2000', periods=10000, freq='T') -""" - -datetimeindex_add_offset = Benchmark('rng + timedelta(minutes=2)', setup, - start_date=datetime(2012, 4, 1)) - -setup = common_setup + """ -N = 10000 -rng = date_range(start='1/1/1990', periods=N, freq='53s') -ts = Series(np.random.randn(N), index=rng) -dates = date_range(start='1/1/1990', periods=N * 10, freq='5s') -""" -timeseries_asof_single = Benchmark('ts.asof(dates[0])', setup, - start_date=datetime(2012, 4, 27)) - -timeseries_asof = Benchmark('ts.asof(dates)', setup, - start_date=datetime(2012, 4, 27)) - -setup = setup + 'ts[250:5000] = np.nan' - -timeseries_asof_nan = Benchmark('ts.asof(dates)', setup, - start_date=datetime(2012, 4, 27)) - -#---------------------------------------------------------------------- -# Time zone - -setup = common_setup + """ -rng = date_range(start='1/1/2000', end='3/1/2000', tz='US/Eastern') -""" - -timeseries_timestamp_tzinfo_cons = \ - Benchmark('rng[0]', setup, start_date=datetime(2012, 5, 5)) - -#---------------------------------------------------------------------- -# Resampling period - -setup = common_setup + """ -rng = period_range(start='1/1/2000', end='1/1/2001', freq='T') -ts = Series(np.random.randn(len(rng)), index=rng) -""" - -timeseries_period_downsample_mean = \ - Benchmark("ts.resample('D', how='mean')", setup, - start_date=datetime(2012, 4, 25)) - -setup = common_setup + """ -rng = date_range(start='1/1/2000', end='1/1/2001', freq='T') -ts = Series(np.random.randn(len(rng)), index=rng) -""" - -timeseries_timestamp_downsample_mean = \ - Benchmark("ts.resample('D', how='mean')", setup, - start_date=datetime(2012, 4, 25)) - -# GH 7754 -setup = common_setup + """ -rng = date_range(start='2000-01-01 00:00:00', - end='2000-01-01 10:00:00', freq='555000U') -int_ts = Series(5, rng, dtype='int64') -ts = int_ts.astype('datetime64[ns]') -""" - -timeseries_resample_datetime64 = Benchmark("ts.resample('1S', how='last')", setup) - -#---------------------------------------------------------------------- -# to_datetime - -setup = common_setup + """ -rng = date_range(start='1/1/2000', periods=20000, freq='H') -strings = [x.strftime('%Y-%m-%d %H:%M:%S') for x in rng] -""" - -timeseries_to_datetime_iso8601 = \ - Benchmark('to_datetime(strings)', setup, - start_date=datetime(2012, 7, 11)) - -timeseries_to_datetime_iso8601_format = \ - Benchmark("to_datetime(strings, format='%Y-%m-%d %H:%M:%S')", setup, - start_date=datetime(2012, 7, 11)) - -setup = common_setup + """ -rng = date_range(start='1/1/2000', periods=10000, freq='D') -strings = Series(rng.year*10000+rng.month*100+rng.day,dtype=np.int64).apply(str) -""" - -timeseries_to_datetime_YYYYMMDD = \ - Benchmark('to_datetime(strings,format="%Y%m%d")', setup, - start_date=datetime(2012, 7, 1)) - -setup = common_setup + """ -s = Series(['19MAY11','19MAY11:00:00:00']*100000) -""" -timeseries_with_format_no_exact = Benchmark("to_datetime(s,format='%d%b%y',exact=False)", \ - setup, start_date=datetime(2014, 11, 26)) -timeseries_with_format_replace = Benchmark("to_datetime(s.str.replace(':\S+$',''),format='%d%b%y')", \ - setup, 
-
-# ---- infer_freq
-
-setup = common_setup + """
-from pandas.tseries.frequencies import infer_freq
-rng = date_range(start='1/1/1700', freq='D', periods=100000)
-a = rng[:50000].append(rng[50002:])
-"""
-
-timeseries_infer_freq = \
-    Benchmark('infer_freq(a)', setup, start_date=datetime(2012, 7, 1))
-
-# setitem PeriodIndex
-
-setup = common_setup + """
-rng = period_range(start='1/1/1990', freq='S', periods=20000)
-df = DataFrame(index=range(len(rng)))
-"""
-
-period_setitem = \
-    Benchmark("df['col'] = rng", setup,
-              start_date=datetime(2012, 8, 1))
-
-setup = common_setup + """
-rng = date_range(start='1/1/2000 9:30', periods=10000, freq='S', tz='US/Eastern')
-"""
-
-datetimeindex_normalize = \
-    Benchmark('rng.normalize()', setup,
-              start_date=datetime(2012, 9, 1))
-
-setup = common_setup + """
-from pandas.tseries.offsets import Second
-s1 = date_range(start='1/1/2000', periods=100, freq='S')
-curr = s1[-1]
-slst = []
-for i in range(100):
-    slst.append(date_range(curr + Second(), periods=100, freq='S'))
-    curr = slst[-1][-1]
-"""
-
-# dti_append_tz = \
-#     Benchmark('s1.append(slst)', setup, start_date=datetime(2012, 9, 1))
-
-setup = common_setup + """
-rng = date_range(start='1/1/2000', periods=1000, freq='H')
-df = DataFrame(np.random.randn(len(rng), 2), rng)
-"""
-
-dti_reset_index = \
-    Benchmark('df.reset_index()', setup, start_date=datetime(2012, 9, 1))
-
-setup = common_setup + """
-rng = date_range(start='1/1/2000', periods=1000, freq='H',
-                 tz='US/Eastern')
-df = DataFrame(np.random.randn(len(rng), 2), index=rng)
-"""
-
-dti_reset_index_tz = \
-    Benchmark('df.reset_index()', setup, start_date=datetime(2012, 9, 1))
-
-setup = common_setup + """
-rng = date_range(start='1/1/2000', periods=1000, freq='T')
-index = rng.repeat(10)
-"""
-
-datetimeindex_unique = Benchmark('index.unique()', setup,
-                                 start_date=datetime(2012, 7, 1))
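For reference, the `infer_freq` benchmarked at the top of this hunk is also exposed as `pd.infer_freq`; it has to scan the values, and a single gap (like the two elements dropped in the benchmark's setup) is enough to defeat it:

```python
import pandas as pd

rng = pd.date_range('1700-01-01', periods=10, freq='D')
pd.infer_freq(rng)            # 'D'
pd.infer_freq(rng.delete(5))  # None -- one missing day breaks the pattern
```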
-
-# tz_localize with infer argument. This is an attempt to emulate the results
-# of read_csv with duplicated data. Not passing infer_dst will fail
-setup = common_setup + """
-dst_rng = date_range(start='10/29/2000 1:00:00',
-                     end='10/29/2000 1:59:59', freq='S')
-index = date_range(start='10/29/2000', end='10/29/2000 00:59:59', freq='S')
-index = index.append(dst_rng)
-index = index.append(dst_rng)
-index = index.append(date_range(start='10/29/2000 2:00:00',
-                                end='10/29/2000 3:00:00', freq='S'))
-"""
-
-datetimeindex_infer_dst = \
-    Benchmark('index.tz_localize("US/Eastern", infer_dst=True)',
-              setup, start_date=datetime(2013, 9, 30))
-
-#----------------------------------------------------------------------
-# Resampling: fast-path various functions
-
-setup = common_setup + """
-rng = date_range(start='20130101',periods=100000,freq='50L')
-df = DataFrame(np.random.randn(100000,2),index=rng)
-"""
-
-dataframe_resample_mean_string = \
-    Benchmark("df.resample('1s', how='mean')", setup)
-
-dataframe_resample_mean_numpy = \
-    Benchmark("df.resample('1s', how=np.mean)", setup)
-
-dataframe_resample_min_string = \
-    Benchmark("df.resample('1s', how='min')", setup)
-
-dataframe_resample_min_numpy = \
-    Benchmark("df.resample('1s', how=np.min)", setup)
-
-dataframe_resample_max_string = \
-    Benchmark("df.resample('1s', how='max')", setup)
-
-dataframe_resample_max_numpy = \
-    Benchmark("df.resample('1s', how=np.max)", setup)
-
-#----------------------------------------------------------------------
-# DatetimeConverter
-
-setup = common_setup + """
-from pandas.tseries.converter import DatetimeConverter
-"""
-
-datetimeindex_converter = \
-    Benchmark('DatetimeConverter.convert(rng, None, None)',
-              setup, start_date=datetime(2013, 1, 1))
-
-# Adding custom business day
-setup = common_setup + """
-import datetime as dt
-import pandas as pd
-try:
-    import pandas.tseries.holiday
-except ImportError:
-    pass
-import numpy as np
-
-date = dt.datetime(2011,1,1)
-dt64 = np.datetime64('2011-01-01 09:00Z')
-hcal = pd.tseries.holiday.USFederalHolidayCalendar()
-
-day = pd.offsets.Day()
-year = pd.offsets.YearBegin()
-cday = pd.offsets.CustomBusinessDay()
-cmb = pd.offsets.CustomBusinessMonthBegin(calendar=hcal)
-cme = pd.offsets.CustomBusinessMonthEnd(calendar=hcal)
-
-cdayh = pd.offsets.CustomBusinessDay(calendar=hcal)
-"""
-timeseries_day_incr = Benchmark("date + day",setup)
-
-timeseries_day_apply = Benchmark("day.apply(date)",setup)
-
-timeseries_year_incr = Benchmark("date + year",setup)
-
-timeseries_year_apply = Benchmark("year.apply(date)",setup)
-
-timeseries_custom_bday_incr = \
-    Benchmark("date + cday",setup)
-
-timeseries_custom_bday_decr = \
-    Benchmark("date - cday",setup)
-
-timeseries_custom_bday_apply = \
-    Benchmark("cday.apply(date)",setup)
-
-timeseries_custom_bday_apply_dt64 = \
-    Benchmark("cday.apply(dt64)",setup)
-
-timeseries_custom_bday_cal_incr = \
-    Benchmark("date + 1 * cdayh",setup)
-
-timeseries_custom_bday_cal_decr = \
-    Benchmark("date - 1 * cdayh",setup)
-
-timeseries_custom_bday_cal_incr_n = \
-    Benchmark("date + 10 * cdayh",setup)
-
-timeseries_custom_bday_cal_incr_neg_n = \
-    Benchmark("date - 10 * cdayh",setup)
-
-# Increment custom business month
-timeseries_custom_bmonthend_incr = \
-    Benchmark("date + cme",setup)
-
-timeseries_custom_bmonthend_incr_n = \
-    Benchmark("date + 10 * cme",setup)
-
-timeseries_custom_bmonthend_decr_n = \
-    Benchmark("date - 10 * cme",setup)
-
-timeseries_custom_bmonthbegin_incr_n = \
-    Benchmark("date + 10 * cmb",setup)
-
-timeseries_custom_bmonthbegin_decr_n = \
-    Benchmark("date - 10 * cmb",setup)
-
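`infer_dst=True` in the `datetimeindex_infer_dst` benchmark predates the current spelling; later pandas versions replaced it with the `ambiguous` keyword. The core operation, rebuilt with the modern API over the same duplicated DST hour as the benchmark's setup:

```python
import pandas as pd

dst = pd.date_range('2000-10-29 01:00:00', '2000-10-29 01:59:59', freq='S')
idx = (pd.date_range('2000-10-29', '2000-10-29 00:59:59', freq='S')
       .append(dst).append(dst)          # each ambiguous second appears twice
       .append(pd.date_range('2000-10-29 02:00:00',
                             '2000-10-29 03:00:00', freq='S')))
idx.tz_localize('US/Eastern', ambiguous='infer')
```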
-
-#----------------------------------------------------------------------
-# month/quarter/year start/end accessors
-
-setup = common_setup + """
-N = 10000
-rng = date_range(start='1/1/1', periods=N, freq='B')
-"""
-
-timeseries_is_month_start = Benchmark('rng.is_month_start', setup,
-                                      start_date=datetime(2014, 4, 1))
-
-#----------------------------------------------------------------------
-# iterate over DatetimeIndex/PeriodIndex
-setup = common_setup + """
-N = 1000000
-M = 10000
-idx1 = date_range(start='20140101', freq='T', periods=N)
-idx2 = period_range(start='20140101', freq='T', periods=N)
-
-def iter_n(iterable, n=None):
-    i = 0
-    for _ in iterable:
-        i += 1
-        if n is not None and i > n:
-            break
-"""
-
-timeseries_iter_datetimeindex = Benchmark('iter_n(idx1)', setup)
-
-timeseries_iter_periodindex = Benchmark('iter_n(idx2)', setup)
-
-timeseries_iter_datetimeindex_preexit = Benchmark('iter_n(idx1, M)', setup)
-
-timeseries_iter_periodindex_preexit = Benchmark('iter_n(idx2, M)', setup)
-
-#----------------------------------------------------------------------
-# apply an Offset to a DatetimeIndex
-setup = common_setup + """
-N = 100000
-idx1 = date_range(start='20140101', freq='T', periods=N)
-delta_offset = pd.offsets.Day()
-fast_offset = pd.offsets.DateOffset(months=2, days=2)
-slow_offset = pd.offsets.BusinessDay()
-
-"""
-
-timeseries_datetimeindex_offset_delta = Benchmark('idx1 + delta_offset', setup)
-timeseries_datetimeindex_offset_fast = Benchmark('idx1 + fast_offset', setup)
-timeseries_datetimeindex_offset_slow = Benchmark('idx1 + slow_offset', setup)
-
-# apply an Offset to a Series containing datetime64 values
-setup = common_setup + """
-N = 100000
-s = Series(date_range(start='20140101', freq='T', periods=N))
-delta_offset = pd.offsets.Day()
-fast_offset = pd.offsets.DateOffset(months=2, days=2)
-slow_offset = pd.offsets.BusinessDay()
-
-"""
-
-timeseries_series_offset_delta = Benchmark('s + delta_offset', setup)
-timeseries_series_offset_fast = Benchmark('s + fast_offset', setup)
-timeseries_series_offset_slow = Benchmark('s + slow_offset', setup)
diff --git a/versioneer.py b/versioneer.py
index c010f63e3ead8..104e8e97c6bd6 100644
--- a/versioneer.py
+++ b/versioneer.py
@@ -1130,7 +1130,9 @@ def versions_from_parentdir(parentdir_prefix, root, verbose):
 # unpacked source archive. Distribution tarballs contain a pre-generated copy
 # of this file.
 
-import json
+from warnings import catch_warnings
+with catch_warnings(record=True):
+    import json
 import sys
 
 version_json = '''